Minimizing data loss when accidents happens

Overview
"Help! I just deleted all rows in the invoice table! Can you undo the operation?"
You might have been there? Chances are that the operation wasn't performed inside a transaction, which mean that you are probably in for a RESTORE operation.
Say that the mistake was performed at 15:45. The user is well trained and came to you immediately after the mistake (we allow 5 minutes for some agony), which mean the user came to you at 15:50.
A database backup is performed every night at 01:00 and the transaction log is backed up every hour.

Options

  1. Restore the most recent database backup.
  2. Restore the most recent database backup and all subsequent log backups.
  3. Perform a log backup; restore the most recent database backup and all subsequent log backups (including the very last one).
  4. Restore into another database and copy the affected data to the production database.
  5. Use some tool which can generate "undo" operations from the transaction log and apply that script to your production database.
For all options, you want to start with kicking all users out of the database, so they can't do any further modifications between the time the user who did the mistake came to you and the time you start to act on the mistake (start the restore operation, for instance). Say that this takes you 2 minutes, the time is now 15:52.

1. Restore the most recent database backup.
This one is obvious. You will lose all work performed that day. Not acceptable in most situations.

2. Restore the most recent database backup and all subsequent log backups.
(Requires full or bulk_logged recovery model and proper handling of transaction log backups.)
This is also rather obvious. You will lose all work performed since 15:00. Might be acceptable, but we can do better.

3. Perform a log backup, restore the most recent database backup and all subsequent log backups (including the very last one).
(Requires full or bulk_logged recovery model and proper handling of transaction log backups.)
First you perform a regular transaction log backup.

You now want to restore the database backup and all subsequent log backups. But you also need to specify STOPAT for the last log backup, else you will bring the database to the state of 15:52, and the accident has already happened. If you know when the accident happened, you just use that time (a few moments before, of course) for the STOPAT parameter in the final RESTORE LOG command.
But what if you don't know the time when the accident happened?

3a. Get the time to stop from the transaction log backup.
Unfortunately, SQL Server doesn't come with any log reader tools usable for this purpose (see note below). Sure, you have DBCC LOG and fn_loginfo(), but these will not provide you with readable information. If you want to read the log backup when the accident happened, you need to use some 3:rd party tool. I have listed the ones I know of on my links page.
Note:I have just learned (2011-07-14) from Edwin Sarmiento that there is an undocumented function that allow us to somewhat decipher log records in a log backup file: fn_dump_dblog(). I haven't played with this function (yet) myself, but this blog post by Janice C. Lee discusses it and has a good example.

3b. Restore the same last log backup several times to find out when the mistake happened.
Another approach is to restore the same last log backup several times, pushing forward the time for the STOPAT parameter a little each time. After each restore, you do a SELECT or similar to find out whether the accident has happened yet. You now know when the accident happened (15:45), so you restart the restore operations and specify this time for the final log restore.

The ability to restore the same log backup several times, pushing forward the STOPAT parameter a little, is a well hidden feature in SQL Server. I have checked with MS and this is a supported restore method. At then end of this article, you'll find a script which displays this.

4. Restore into another database and copy the affected data into the production database.
The advantage with this technique is that you will only lose data for the affected table(s). Use any of above options, but restore into a new database. Then use INSERT SELECT, SELECT INTO, DTS or some other techniques to copy the data to your production database.

The downside with this technique is that the database now has data from two different time periods. I.e., the database is potentially out-of-sync if you have relations inside the database or other time related dependencies (this should have been allowed unless that happened first, so you can't just unto "that"). And most databases do indeed have such relationships and dependencies. Ether explicitly defined using FOREIGN KEY constraints, transactions etc. or implicitly and hopefully maintained by the applications. Needles to say, this is only for the more advanced DBA who understand how the data relate and the consequences of doing this.

5. Use some tool which can generate "undo" operations from the transaction log and apply that script to your production database.
(Don't expect this to be an option if you are in simple recovery model since SQL Server will empty the log by itself - i.e., log records most probably no longer around.)
This step doesn't involve any restore. You use a 3:rd party tool which read the transaction log and based on that generates operations that effectively undo the mistake. See my links page for such tools. As with above, the data is at risk as there could be operations performed after the mistake that was based on the mistake actually took place!

How much data did we lose?
Common to scenario 1 to 3 is that we lose at least 7 minutes of work, from 15:45 (the accident) to 15:52 (when you got to work).
  • If the user (probably a developer or DBA in this example) performed the operation inside a transaction, all needed would be a ROLLBACK and we would have zero data loss.
  • Scenario 1: We lost 15 hours and 52 minutes of work.
  • Scenario 2: We lost 52 minutes of work.
  • Scenario 3: We lost 7 minutes of work.
  • Scenario 4: Same as 1-3, but only for the table(s) involved in the mistake. But you might have a messed up database.
  • Scenario 5: Same as 3, but only for the table(s) involved in the mistake. But you might have a messed up database.
Any lessons learned?
  • Always perform transaction log backup regularly if you want to minimize data loss.
  • Train your DBAs and developers to protect risky operations inside transactions.
  • Don't let developers work on production systems.
  • Train your users to come to you immediately after a mistake happened. The longer they wait the more data you will lose.
  • Perform a log backup immediately. This gives you the ability to go the restore route as specified in 3b. Once you started the restore database operation, it is too late for this.
Script to RESTORE the same log backup several times, using different time for STOPAT
The script below creates a database with a table in it, and then does a BACKUP DATABASE. It also creates a temp table to keep track on when each INSERT operation is performed, to be used for the RESTORE operations. Then a number of INSERT operations are performed, with a delay of 5 seconds. The initial stage finishes with a BACKUP LOG.

Finally, the RESTORE operations. First RESTORE DATABASE. Then a cursor is used over the temp table so we can restore the same log backup, using STANDBY so we can do SELECT in between each RESTORE and , of course, using a different time for each RESTORE operation.
Warning: Below will delete a database named 'test', if such exists!

-- Initial stage: create the database, tables, BACKUP DATABASE,  
-- some INSERT operations and finally a BACKUP LOG.
SET NOCOUNT ON
USE master
IF OBJECT_ID('tempdb..#dt') IS NOT NULL DROP TABLE #dt
CREATE TABLE #dt(seq INT, when_ datetime)
GO
IF EXISTS (SELECT * FROM sysdatabases WHERE name = 'test') DROP DATABASE test
GO
CREATE DATABASE test
GO
CREATE TABLE test..t(c1 INT)
BACKUP DATABASE test TO DISK = 'C:\test.bak' WITH INIT
GO
INSERT test..t VALUES(1)
INSERT #dt (seq, when_) SELECT 1, GETDATE()
WAITFOR DELAY '00:00:5'
INSERT test..t VALUES(2)
INSERT #dt (seq, when_) SELECT 2, GETDATE()
WAITFOR DELAY '00:00:5'
INSERT test..t VALUES(3)
INSERT #dt (seq, when_) SELECT 3, GETDATE()
WAITFOR DELAY '00:00:5'
INSERT test..t VALUES(4)
INSERT #dt (seq, when_) SELECT 4, GETDATE()
BACKUP LOG test TO DISK = 'C:\test.bak'
GO

-- RESTORE stage: First RESTORE DATABASE,  
-- then use a CURSOR so we can construct the RESTORE LOG command.  
-- Use both STANDBY and STOPAT and restore the same backup several times,
-- with different time for each.
RESTORE DATABASE test FROM DISK = 'C:\test.bak' WITH NORECOVERY
DECLARE c CURSOR FOR SELECT when_ FROM #dt ORDER BY seq
DECLARE @dt datetime
OPEN c
FETCH NEXT FROM c INTO @dt
WHILE @@FETCH_STATUS = 0
BEGIN
SET @dt = DATEADD(ms, 50, @dt)
RESTORE LOG test FROM DISK = 'c:\test.bak'
WITH FILE = 2, STANDBY = 'C:\test.und'
,STOPAT = @dt
SELECT * FROM test..t
FETCH NEXT FROM c INTO @dt
END
CLOSE c
DEALLOCATE c
GO

--Clean up  
DROP DATABASE test