Load and logevity testing, caused some new issues to be reported by our software. The main symptom was failed database calls with the error message [ODBC SQL server Driver]Timeout Expired.
How do we debug this ? Well, the very obvious thing to do is to find out what that error message signifies. A brief search on the net showed a plausible list of issues –
1. You are using a remote server and there is a connectivity issue
2. Your queries are blocking and causing timeouts
3. The query execution is very slow (pure Db design / queries or server performance issues and/or too many results)
We were running our system on a local machine, which ruled out connectivity issues. The queries that were timing out, where heavy ones, but ones that normally work under normal circumstances. However the test being extrememly loaded in nature, how do we pin point the issue to be a load issue vs a blocking issue ?
Well the only thing to do when faced with multiple such options is to try and rule out the options one by one until you are left with the true cause.
So the first thing we tried was to check if lock timeouts exist in the system at all.
Perfmon counters (MSSQL$YourDbname:Locks – Lock Timouts/sec) : told us that lock timeouts where indeed present in the system.
Now how would you find out what is blocked ? MSDN resources can show you the SQL server process(spid) that blocks other spid’s – but it never tells what is getting blocked.
Fortunately for us, Erland Sommarskog has a cool utility that combines both information to show exactly which connection is causing the blocks. It also tells you the resource type, leaving no room for confusion.
All that i had to do to use the tool is to download it and run the sql file which creates the stored procedure to monitor the locking. After that i simplay ran the tool as “EXEC aba_lockinfo”
Next i exported the results as a csv file and imported them into excel, which allowed me to sort the data very easily. The basic thing to look for is, who is blocking whom. This can be checked by looking at the blkby column. This told me which of my spids where being blocked and which resource it was waiting for.
From here it was a short path into finding the actual issue in the code base.
Happy Debugging !!!