Restore deleted MongoDB documents using Oplog
Recently I found myself trying to restore corrupted data to its original form. A query gone wrong had erased valuable data from a MongoDB collection, and it needed to be restored. This post records what I learned from that incident, for my future self and others.
Oplog is amazing
If you’re using a MongoDB replica set, you can use the MongoDB oplog to undo almost any type of recent data loss. If you’ve never heard of it, go read the docs, as I’ll skip straight to the restoring part.
The oplog is a special capped collection that keeps a rolling record of all operations that modify the data stored in your databases.
Restore backup
Restore the latest backup you have from before the data was corrupted. It will be used as a base. Make sure the oplog still contains operations from the time the backup was taken or earlier; otherwise there is a gap between the backup and the oldest oplog entry, and the restore may not work.
Find the faulty query/queries
Now you need to browse the contents of the Oplog and look for the query which corrupted the data in the first place. Oplog is a special collection, but still a collection. So, you can query it to narrow down your results like this:
Replica:PRIMARY> use local
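As a rough sketch of what such a narrowing query might look like, the helper below builds a filter on the two oplog fields that are usually most useful here: `ns` (the namespace, `<database>.<collection>`) and `op` (the operation type). The helper name `buildOplogFilter` and the `mydb`/`users` names are placeholders of mine, not something from the incident.

```javascript
// Sketch: build a filter for local.oplog.rs that matches writes of the given
// kinds against one namespace. `buildOplogFilter` is a hypothetical helper.
function buildOplogFilter(dbName, collName, ops = ["d"]) {
  // Oplog entries store the target as "<database>.<collection>" in `ns`
  // and the operation kind in `op`: "i" insert, "u" update, "d" delete.
  return { ns: `${dbName}.${collName}`, op: { $in: ops } };
}

// Placeholder names; substitute your own database and collection.
const filter = buildOplogFilter("mydb", "users", ["u", "d"]);
console.log(JSON.stringify(filter));

// In the mongo shell, connected to a replica set member, newest entries first:
//   db.getSiblingDB("local").oplog.rs.find(filter).sort({ $natural: -1 }).limit(20)
```

Sorting by `$natural` in reverse walks the capped collection from the most recent entry backwards, which is usually where a recent bad write will be.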
Let’s say you found the query and it looked like:
...
Note down the value of the ts field. It will be needed later.
Take a dump of oplog
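Before you can restore your data from the oplog, you need to dump it to a file. The oplog lives in the local database, so that is the database to dump.
mongodump -d local -c oplog.rs -o oplogD
mongodump writes the dump to oplogD/local/oplog.rs.bson, but mongorestore --oplogReplay looks for a file named oplog.bson at the top level of the directory it is given. Move the file to oplogR/oplog.bson before continuing.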
Restore from Oplog
You want to restore the data prior to the faulty write operation.
mongorestore --oplogReplay --oplogLimit 1528225054:1 oplogR
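The value of the oplogLimit parameter is the ts property of the faulty query you noted earlier, written as <seconds>:<increment>. mongorestore applies only entries older than this timestamp, so the faulty operation itself is skipped.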
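To avoid typos when copying the timestamp by hand, a small formatter can help. This is only a sketch: `toOplogLimit` is a hypothetical helper, and it assumes the timestamp has been written out as a plain object with integer fields `t` (seconds) and `i` (increment).

```javascript
// Sketch: format an oplog timestamp as the "<seconds>:<increment>" string
// that --oplogLimit expects. `toOplogLimit` is a hypothetical helper, and the
// plain { t, i } input shape is an assumption made for illustration.
function toOplogLimit(ts) {
  if (!Number.isInteger(ts.t) || !Number.isInteger(ts.i)) {
    throw new TypeError("expected integer fields t (seconds) and i (increment)");
  }
  return `${ts.t}:${ts.i}`;
}

// The ts used in the mongorestore command above: seconds 1528225054, increment 1.
console.log(toOplogLimit({ t: 1528225054, i: 1 })); // prints 1528225054:1
```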
The MongoDB oplog is idempotent, i.e. its entries can be applied multiple times without duplicating data. This is why you need not give mongorestore a start time.
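To make the idempotency point concrete, here is a toy model of replay. It is not how MongoDB implements it, and the entry shapes are simplified. The key idea it tries to show is that entries record resulting values (an increment is logged as the new value, not as "+1"), so replaying entries the backup already contains does no harm.

```javascript
// Toy model of oplog replay (not MongoDB's implementation; entry shapes are
// simplified). Entries carry resulting values, so replaying them is harmless.
function applyOp(collection, entry) {
  const next = new Map(collection);
  if (entry.op === "i") {
    next.set(entry.o._id, { ...entry.o }); // insert behaves like an upsert
  } else if (entry.op === "u" && next.has(entry.o2._id)) {
    next.set(entry.o2._id, { ...next.get(entry.o2._id), ...entry.o.$set });
  } else if (entry.op === "d") {
    next.delete(entry.o._id); // deleting a missing document is a no-op
  }
  return next;
}

function replay(collection, entries) {
  return entries.reduce(applyOp, collection);
}

const entries = [
  { op: "i", o: { _id: 1, visits: 1 } },
  { op: "u", o2: { _id: 1 }, o: { $set: { visits: 2 } } }, // increment logged as its result
  { op: "i", o: { _id: 2, visits: 1 } },
  { op: "d", o: { _id: 2 } },
];

const once = replay(new Map(), entries);
const twice = replay(once, entries); // replay the same entries on top of the result
console.log(JSON.stringify([...once]) === JSON.stringify([...twice])); // prints true
```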
Selective restore
The previous method helps when the good and bad operations are clearly separated in time. That is often not the case, though: the good and bad writes may well have happened around the same time.
Here, you need to remember that the oplog is a collection too: in addition to being queried, it can also be modified. Copy the oplog to an uncapped collection and delete the bad write operations from the copy.
After you’ve removed all bad operations, you can continue the restoration process from a dump of the modified oplog. This time, no limit argument is required.
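The filtering step can be sketched on plain objects as follows. The helper name `keepGoodOps`, the sample entries, and the plain `{ t, i }` timestamp shape are all made up for illustration; in practice the same idea would be a delete by `ts` on the uncapped copy.

```javascript
// Sketch: drop known-bad entries from a copy of the oplog before replaying it.
// `keepGoodOps` and the { t, i } timestamp shape are assumptions for illustration.
function keepGoodOps(entries, badTimestamps) {
  // Each oplog entry has a unique ts, so "<seconds>:<increment>" works as a key.
  const bad = new Set(badTimestamps.map((ts) => `${ts.t}:${ts.i}`));
  return entries.filter((e) => !bad.has(`${e.ts.t}:${e.ts.i}`));
}

// Made-up sample: a faulty delete sandwiched between two good writes.
const oplogCopy = [
  { ts: { t: 1528225050, i: 1 }, op: "u", ns: "mydb.users" },
  { ts: { t: 1528225054, i: 1 }, op: "d", ns: "mydb.users" }, // the faulty delete
  { ts: { t: 1528225054, i: 2 }, op: "i", ns: "mydb.users" },
];

const cleaned = keepGoodOps(oplogCopy, [{ t: 1528225054, i: 1 }]);
console.log(cleaned.map((e) => `${e.ts.t}:${e.ts.i} ${e.op}`));
```

Because the good write at `1528225054:2` comes after the bad one, a plain `--oplogLimit` cut-off would have lost it; removing only the bad entry keeps it.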
Hope this helps!
This post is inspired by resources like Asya’s Stack Overflow answer, which was used to solve the problem at hand.