Thursday 17 May 2012

Killing processes

There's a fairly standard way of killing out of control or overrunning processes - identify the culprit, work out if it can be killed, and let it roll back the transaction. But what do you do if this doesn't roll back quickly?

Having not had this immediately to the forefront of my mind today, I've collated some information on killing and rollback of connections.

To identify the blocking process SPID:

EXEC sp_who2

See what's long running or blocking, assess it, then if appropriate kill off the culprit with the command:

KILL XX

(Where XX is the ID of the SPID)

Great, but then it rolls back.
So, how can you tell how long that will take?

select wait_type, wait_time, percent_complete from sys.dm_exec_requests where session_id = XX

Alternatively

KILL XX WITH STATUSONLY

This latter query also gives you an estimation of the time remaining for the rollback to complete.

If you need to run it more urgently, you're limited to doing a dirty rollback.

This can be accomplished by connecting to the database and setting it to single user mode with rollback immediate. Note however this leaves the data in an uncertain state and therefore is NOT recommended for production databases, as it is a "dirty" rollback.

If you needed to do this (e.g. if a rollback has hung) on a production server you could set the database offline with rollback immediate and then restore the database from backups.

Rollback is mostly single-threaded, which means what a rollback can take significantly longer than the query was running before the rollback was issued.

Rollback can also take longer because the SQL may use cached plans for the initial query but be unable to use them for rollback.

Restarting the server isn't always the answer either - and in any case is a very drastic step on a production server! (It also drops the connection which can be problematic for determining remaining time).

See this MSDN blog post for more information.

Note that if an external process is 'killed' then it may not rollback all the way. To avoid this, try to kill the external process at the OS level.

There is some discussion of that in this MSDN social post.

This connect article has more information on what to do about unresponsive rollbacks.

Of course the ideal is that we don't get into this situation in production - and code reviews etc help here, but the world I inhabit isn't always perfect.