Recently I found myself trying to restore corrupted data to its original form. A query gone wrong had erased valuable data from a MongoDB collection and it needed to be restored. This post records what I learned from that incident, for my future self and others.
If you’re using a MongoDB replica set, you can use the MongoDB oplog to undo almost any type of recent data loss. If you’ve never heard of it, go read the docs, as I’ll skip to the restoring part.
The oplog is a special capped collection that keeps a rolling record of all operations that modify the data stored in your databases.
Restore the latest backup you have from before the data was corrupted. It’ll be used as a base. Make sure the oplog still contains operations from at or before the time the backup was taken; otherwise there is a gap between the backup and the oplog, and the restore may not work.
Now you need to browse the contents of the Oplog and look for the query which corrupted the data in the first place. Oplog is a special collection, but still a collection. So, you can query it to narrow down your results like this:
Replica:PRIMARY> use local
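As a rough sketch of the kind of filter that narrows the oplog down, here is a small Python helper that builds a filter document on the ns (namespace) and op (operation type) fields of oplog entries. The namespace shop.orders, the helper names and the sample entries are all hypothetical, and the in-memory matches function only stands in for the server-side query.

```python
def oplog_filter(namespace, ops=("u", "d")):
    """Build a MongoDB filter document for oplog entries.

    namespace: "<database>.<collection>", as stored in the oplog's ns field.
    ops: oplog operation codes ("i" insert, "u" update, "d" delete).
    """
    return {"ns": namespace, "op": {"$in": list(ops)}}


def matches(entry, flt):
    """In-memory stand-in for the server-side match, for illustration only."""
    return entry["ns"] == flt["ns"] and entry["op"] in flt["op"]["$in"]


# Hypothetical sample entries, shaped like oplog documents.
# ts is modelled as a (seconds, increment) tuple.
sample = [
    {"ts": (1528225000, 1), "op": "i", "ns": "shop.orders", "o": {"_id": 1}},
    {"ts": (1528225054, 1), "op": "d", "ns": "shop.orders", "o": {"_id": 1}},
    {"ts": (1528225060, 1), "op": "u", "ns": "shop.users", "o": {"$set": {"n": 2}}},
]

flt = oplog_filter("shop.orders")
suspects = [e for e in sample if matches(e, flt)]
print(flt)
print([e["ts"] for e in suspects])
```

The same filter document could be passed to a find on the oplog.rs collection of the local database, whether from a driver or from the shell.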
Let’s say you found the query and it looked like:
Note down the value of the ts field. It will be needed later.
Before you can restore your data from the oplog, you need to dump it to a file.
mongodump -d local -c oplog.rs -o oplogD
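Now you have a dump of the oplog in the oplogD directory. mongorestore looks for the oplog in a file named oplog.bson at the top level of the directory you point it at, so copy the dumped oplog.rs.bson file into a new directory, oplogR, under the name oplog.bson.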
You want to restore the data prior to the faulty write operation.
mongorestore --oplogReplay --oplogLimit 1528225054:1 oplogR
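The value of the oplogLimit parameter is the ts value of the faulty query you noted earlier, written as seconds:increment. mongorestore only applies oplog entries older than this timestamp, so the faulty operation itself is not replayed.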
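To make the cut-off concrete, here is a minimal Python sketch of how a seconds:increment limit splits the oplog. It is only a model of the behaviour, not mongorestore's code: the sample entries are hypothetical, and ts is represented as a plain (seconds, increment) tuple rather than a BSON timestamp.

```python
from datetime import datetime, timezone


def parse_oplog_limit(value):
    """Split a 'seconds:increment' string into a comparable tuple."""
    seconds, increment = value.split(":")
    return int(seconds), int(increment)


def replayed(entries, limit):
    """Keep only entries strictly older than the limit."""
    cutoff = parse_oplog_limit(limit)
    return [e for e in entries if e["ts"] < cutoff]


# Hypothetical oplog entries for illustration.
entries = [
    {"ts": (1528225000, 1), "op": "i"},
    {"ts": (1528225054, 1), "op": "d"},  # the faulty operation
    {"ts": (1528225054, 2), "op": "u"},
]

kept = replayed(entries, "1528225054:1")
print([e["ts"] for e in kept])

# The seconds part is a Unix timestamp, which is handy for sanity-checking
# that the limit really falls at the time of the incident.
when = datetime.fromtimestamp(1528225054, tz=timezone.utc)
print(when.isoformat())
```

Converting the seconds part to a date is a cheap way to confirm you copied the right ts before running the restore.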
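Oplog entries are idempotent, i.e. they can be applied multiple times without duplicating data or changing the end result. This is why you need not give mongorestore a start time: replaying operations that the backup already contains does no harm.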
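A toy model helps show why replaying the same entries twice is harmless. The Python sketch below applies heavily simplified oplog-style entries to an in-memory dictionary. The entry shapes and the apply_entry and replay helpers are assumptions made for illustration and are not how the server applies operations internally.

```python
def apply_entry(store, entry):
    """Apply one simplified oplog entry to an in-memory {_id: document} store."""
    op, doc = entry["op"], entry["o"]
    if op == "i":
        # In this sketch an insert overwrites by _id, so a replay cannot duplicate.
        store[doc["_id"]] = dict(doc)
    elif op == "u":
        # The update carries resulting values ($set), not relative changes.
        target = store.get(entry["o2"]["_id"])
        if target is not None:
            target.update(doc["$set"])
    elif op == "d":
        # Deleting a document that is already gone is a no-op.
        store.pop(doc["_id"], None)


def replay(store, entries):
    """Apply entries in order and return the store."""
    for entry in entries:
        apply_entry(store, entry)
    return store


# Hypothetical entries for illustration.
entries = [
    {"op": "i", "o": {"_id": 1, "qty": 5}},
    {"op": "u", "o2": {"_id": 1}, "o": {"$set": {"qty": 6}}},
    {"op": "i", "o": {"_id": 2, "qty": 1}},
    {"op": "d", "o": {"_id": 2}},
]

once = replay({}, entries)
twice = replay(replay({}, entries), entries)
print(once == twice)
```

The key design point is that each entry describes an end state rather than a delta, so the store converges to the same contents however many times the entries are applied.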
The previous method helps when the good and bad operations are clearly separated by time. But that is often not the case. It’s likely that the good and bad writes happened around the same time.
Here, you need to remember that the oplog is a collection too, and in addition to being queried, its contents can also be modified. You need to copy the oplog to an uncapped collection and delete the bad write operations from the copy.
After you’ve removed all the bad operations, you can continue the restoration process from a dump of the modified oplog. But this time, no limit argument is required.
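The selection step can be sketched in Python on plain dictionaries. This is only a model of the idea, assuming the bad writes can be recognised by a rule. The is_bad rule, the shop.orders namespace and the sample entries are hypothetical, and in practice you would run the equivalent delete against the uncapped copy of the oplog.

```python
def without_bad_ops(entries, is_bad):
    """Return a copy of the oplog entries with bad ones removed, order preserved."""
    return [entry for entry in entries if not is_bad(entry)]


def is_bad(entry):
    # Hypothetical rule: the faulty query deleted documents from shop.orders.
    return entry["ns"] == "shop.orders" and entry["op"] == "d"


# Hypothetical copy of the oplog, with good and bad writes interleaved in time.
oplog_copy = [
    {"ts": (1528225050, 1), "op": "i", "ns": "shop.orders", "o": {"_id": 7}},
    {"ts": (1528225054, 1), "op": "d", "ns": "shop.orders", "o": {"_id": 3}},
    {"ts": (1528225054, 2), "op": "u", "ns": "shop.users", "o": {"$set": {"n": 2}}},
    {"ts": (1528225055, 1), "op": "d", "ns": "shop.orders", "o": {"_id": 4}},
]

cleaned = without_bad_ops(oplog_copy, is_bad)
print([e["ts"] for e in cleaned])
```

Because the good writes on either side of the bad ones are kept in their original order, the cleaned oplog can be replayed in full, which is why no limit is needed.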
Hope this helps!
This post is inspired by resources like Asya’s Stack Overflow answer, which was used to solve the problem at hand.