SQL-Server


Some time back I encountered a scenario in which a chain of stored procedures was being called recursively, and this was eating into our performance (90% CPU time). All of a sudden, the obscure micro-issue of whether to use table variables or temp tables became important.
After ferreting out some good sources of information comparing the two, we managed to reduce the CPU consumption:
  • Making all our stored procedures use temp tables rather than table variables made a difference for us, because our procedures were being invoked frequently and stored procedures are recompiled on every execution when table variables are used.
  • The extra DB writes caused by temp tables had minimal impact, because very little data was actually involved.
Ultimately the biggest improvement came from changes to the algorithm to make sure that the procedures were not invoked needlessly, or at least that parts of them did not run (the recursive algorithm was flattened out).

Learning: If micro-optimizations start to look important, it is time to look for improvements in the algorithms that are used.
NOTES – Information gleaned from the exercise
  1. Most often, table variables will be all you require.
  2. Table variables cause FEWER recompilations than temporary tables (source: MSDN).
  3. Table variables seem to be scoped at the stored procedure level. So if you declare a table variable inside a while loop, it retains the old data across iterations (painfully found out from a bug).
  4. Table variables cannot have indexes created on them, except for the primary key specified when the table is declared.
  5. The CREATE INDEX command can cause query recompilations.
  6. Table variables are held in memory and involve no disk IO – therefore no rollbacks are possible on them. On the upside, no locks are acquired by using table variables.
  7. Table variables cannot have statistics created on them, so for huge data sets they COULD be inefficient.
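To make the distinction concrete, here is a minimal sketch (hypothetical table and column names) contrasting the two; note that only the temp table allows an index to be added after creation:

  -- Table variable: only constraints declared inline (e.g. the primary key) are allowed
  DECLARE @Orders TABLE
  (
      OrderId INT PRIMARY KEY,
      Total   MONEY
  )

  -- Temp table: lives in tempdb and can have indexes (and statistics) added afterwards
  CREATE TABLE #Orders
  (
      OrderId INT PRIMARY KEY,
      Total   MONEY
  )
  CREATE INDEX IX_Orders_Total ON #Orders (Total)   -- not possible on @Orders

  DROP TABLE #Orders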

Load and longevity testing caused some new issues to be reported by our software. The main symptom was failed database calls with the error message [ODBC SQL Server Driver] Timeout expired.

How do we debug this? Well, the obvious first thing to do is to find out what that error message signifies. A brief search on the net showed a plausible list of causes –

1. You are using a remote server and there is a connectivity issue
2. Your queries are blocking and causing timeouts
3. The query execution is very slow (poor DB design / queries, server performance issues and/or too many results)

We were running our system on a local machine, which ruled out connectivity issues. The queries that were timing out were heavy ones, but ones that work fine under normal circumstances. However, with the test being extremely load-heavy in nature, how do we pinpoint whether this is a load issue or a blocking issue?

Well, the only thing to do when faced with multiple such options is to rule them out one by one until you are left with the true cause.

So the first thing we tried was to check if lock timeouts exist in the system at all.

Perfmon counters (MSSQL$YourDbname:Locks – Lock Timeouts/sec) told us that lock timeouts were indeed present in the system.

Now how would you find out what is blocked? MSDN resources can show you the SQL Server process (spid) that blocks other spids – but they never tell you what is getting blocked.
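For reference, the kind of query those resources point you at is a look at master..sysprocesses on SQL Server 2000 – a sketch that lists the waiting spids and the spid blocking each, but not the statement or object involved:

  SELECT spid, blocked AS blocking_spid, waittime, lastwaittype, cmd, hostname, program_name
  FROM master..sysprocesses
  WHERE blocked <> 0    -- sessions currently waiting on another spid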

Fortunately for us, Erland Sommarskog has a cool utility that combines both pieces of information to show exactly which connection is causing the blocks. It also tells you the resource type involved, leaving no room for confusion.

All that I had to do to use the tool was download it and run the SQL file, which creates the stored procedure that monitors the locking. After that I simply ran the tool as “EXEC aba_lockinfo”.

Next I exported the results as a CSV file and imported them into Excel, which allowed me to sort the data very easily. The basic thing to look for is who is blocking whom, which can be checked by looking at the blkby column. This told me which of my spids were being blocked and which resource each was waiting for.

From here it was a short path into finding the actual issue in the code base.

Happy Debugging !!!

I turned off constraint checks for foreign keys in my central table – this worked wonders for me. But your mileage may vary depending on the amount of contention you face.

In this case, my table was frequently updated even though none of the updates modified the foreign keys. But by removing the foreign key constraint check (the check, not the foreign key relationship), I was able to see significant improvements.

I do not have the exact figures, but I believe it was to the tune of 40-50% or so.

Disclaimer – Of course, I hope you know what removing the FK constraint check implies.

Disclaimer – I turn off the checks only in the weeks before the final release, and run a range of unit tests apart from having a dedicated test team run more tests. If you rely on the FK constraint to catch errors, this is not recommended.

That’s quite easily done.

  1. For a foreign key, explicitly set “Enforce Foreign Key Relationship” = No (Default is yes)
  2. Set Cascade Deletes to be No Action instead of Delete

There, now you have a table that will not enforce foreign key relationships !!!
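If you prefer T-SQL over the designer, the same effect can be had by disabling the constraint with NOCHECK – a sketch with hypothetical table and constraint names:

  -- Stop enforcing an existing FK constraint (hypothetical names)
  ALTER TABLE dbo.Orders NOCHECK CONSTRAINT FK_Orders_Customers

  -- Re-enable it later; WITH CHECK also validates the rows added in the meantime
  ALTER TABLE dbo.Orders WITH CHECK CHECK CONSTRAINT FK_Orders_Customers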

ps: By default, when a foreign key relationship is created, the relationship fields are set to enforce the FK. You have to explicitly set them to No.

Disclaimer : I can think of situations where this behavior is required, as in report tables or some funky de-normalized high-performance corner cases. But I would ask you to rethink those cases if you ever come up against them.

The MSDE workload governor kicks in whenever there are more than 8 concurrent queries running against the SQL Server engine. It then proceeds to introduce artificial delays in your query results, such that the performance of the entire system is affected.

We faced this issue with the Cisco NetManager tool that we were building, and in the end, after a long period of trial and error, we managed to cure ourselves of this persistent little “feature” MSDE has.

So how does one go about beating / disabling the Workload Governor?

  1. Optimize all DB operations.
  2. Reduce the number of round trips to the database by fetching data you know will be required later in one shot – in a single stored procedure or by performing multiple operations together (see the sketch after this list).
  3. Use a semaphore to guard the number of concurrent accesses to the database – make sure all DB access has to pass through this guard. The ADO methods one would have to guard are
    1. Execute
    2. Connection Open
    3. Connection Close
    4. Recordset – Open / Update / NextRecordset / MoveFirst
    5. BeginTransaction / CommitTransaction / Rollback
  4. Perform application-level caching of DB connections if the number of connection open/close operations is too high.
  5. Reduce the load on the SQL server – how? By caching the most frequently required DB data in memory. Not all applications can do this, as data integrity becomes an issue, but if properly done this is the single most important step and gives the biggest bang for your buck.

    However please do note that this is also the single most dangerous act that can be done on a DB application. Managing a cache and controlling concurrent updates is not an easy task. We managed to achieve this by using only volatile reads and interlocked updates. Since no explicit locks are involved, this gives us a high level of concurrency.
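As an illustration of point 2, a single (hypothetical) stored procedure can return several related result sets in one round trip, which the client then walks through with NextRecordset:

  -- Hypothetical procedure returning two related result sets in one call
  CREATE PROCEDURE dbo.Get_Device_Snapshot
      @DeviceId INT
  AS
  BEGIN
      SELECT * FROM dbo.Devices    WHERE DeviceId = @DeviceId
      SELECT * FROM dbo.Interfaces WHERE DeviceId = @DeviceId
  END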

That’s it – these 5 small steps are all it takes to transform your slow-moving implementation into a killer app, at least with respect to the DB. Verifying these 5 steps took 6 months of skunkworks-style investigation. But now that I have a working application with these features built in, I can confidently vouch for their efficiency.

Had it not been for the MSDE workload governor, we might have shipped a super-duper unoptimized version of our Cisco NetManager tool, which stressed the disk, stressed the CPU and created an artificial bottleneck in SQL Server.

The restrictions that the workload governor imposed on us drove us to better the design and optimize the process as much as was required, and we have finally managed to beat the workload governor. (Shall say how in the next installment.)

Had this workload governor issue never come onto our radar, we would have gone ahead and deployed the unoptimized version and encountered the issues with our design much later in the product cycle, by which time the code would have been more entrenched and harder to root out.

I should say that for any future SQL Server product that I encounter, MSDE would be made the preferred initial platform, and only after reaching an acceptable level of performance on this little server would I want to use the full version – and that only if the product requires that amount of performance.

From a product management point of view, this saves around $1000 on the product cost at the very minimum, depending on your scale of resource consumption. But the real savings come when you take the same system and try to make it big, aka scale.

When you try to scale an inefficient system, you scale your inefficiencies for free along with it. So instead of, say, a $1000 SQL Server Workgroup edition running on $4000 hardware, you might have to go for two such servers running on much bigger systems with more tweaking on the DB end (partitioned tables, disk arrays), requiring a complex installation – which again has to be coded and documented – and perhaps a DBA to handle all of this. Talk about expense !!!! It would take a long time to convert the poor end user, even if there are no comparable alternatives in the market.

I wish there were comparable restrictions in place for all our tools, so that we could try and get under them and better ourselves in the process.

SQL Server manages memory quite unlike other applications. This has been the cause of much heartburn, especially when it comes to explaining to customers and test teams why SQL Server consumes so much memory, or why the “memory leak” that is observed is not really a leak.

One key factor that contributes to this perceived memory leak is SQL Server’s aggressive caching policy. It primarily caches data and SQL execution plans and – dig this – “never gives up the memory unless and until it feels the system is starved of memory”. However, this behavior is not all that great when it comes to less-than-perfect applications. There seems to be a flaw (at least with respect to SQL Server 2000) in the way it releases memory, in that the operating system seems to become aware of memory starvation well before SQL Server does, and expands its paging file to increase the amount of “total memory” available in the system.

So it is in the best interest of the application not to eat up unnecessary cache unless it is required. This post covers the plan cache and how its growth can be curtailed without sacrificing any performance – in fact, you gain performance through better cache utilization, in terms of both CPU and memory. Read ahead to find out how this can be done.

Facts of life

  • SQL Server will create execution plans for all queries and stored procedures it executes. Query plans are the detailed steps for getting the results of the query specified – which tables to query, which indexes to use, how to join or merge the results, and so on. Quite understandably, all this magic does not come free.
  • Creating query plans consumes CPU, as the server needs to know what is asked of it, and quite probably run it through tons of algorithms and data structures to create a reusable generic plan that is going to fetch the results.
  • This means that if SQL Server were able to reuse the plans it creates, quite a bit of savings in CPU consumption could be had whenever it runs the same queries again and again. Therefore, it follows that

SQL SERVER TRIES TO CACHE ITS QUERY PLANS

What does this imply for applications and administrators?

  1. Cache means store. So if you are going to store the plans (in memory, of course), your memory requirements are going to go up. However, if that saves CPU consumption, that is good.
  2. However, you have a problem if the store grows uncontrollably, or if the savings in CPU do not measure up to the memory cost.
  3. You can measure the amount of memory used by the procedure cache from Perfmon: SQLInstance – Buffer Manager – Procedure Cache Pages (8 KB pages). Do check out the other counters in the Buffer Manager object to get a relative measure of how much memory the procedure cache occupies with respect to the other pages (see the query sketch below).
  4. Most installations face this problem, i.e. the memory that SQL Server consumes for query plans goes up, and unless you know exactly what you are looking for, it is hard to keep this under control, much less account for what eats up this memory. Therefore,

IT’S IMPERATIVE THAT ONE UNDERSTANDS THE QUERY PLAN CACHE
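As an aside, the Buffer Manager counters mentioned in point 3 can also be read from within SQL Server 2000 itself via master..sysperfinfo – a sketch, assuming the default counter naming:

  -- Procedure cache size as seen by the server (value is in 8 KB pages)
  SELECT object_name, counter_name, cntr_value
  FROM master..sysperfinfo
  WHERE object_name LIKE '%Buffer Manager%'
    AND counter_name LIKE 'Procedure cache pages%'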

How does query caching work?

  • Exec “Proc_Get_Orders”
    1. Plan for the stored procedure “Proc_Get_Orders” is created once and used multiple times.
  • sql_string = “select * from orders where customerid=” + customerid
    Execute(sql_string)

    1. Plan gets created for the string ‘select * from orders where customerid=1’
    2. Plan gets created for the string ‘select * from orders where customerid=2’
    3. ….ad -infinitum

    SQL Server creates n plans for n ad-hoc queries. THIS is what bites you. THIS is what contributes to the bulging memory pages of the query plan cache. To control this growth, either switch to stored procedures or switch to prepared statements with parameter placeholders, which have the added benefit of preventing SQL injection attacks.

  • Prepared statements
    sql_string = ‘select * from orders where customerid=?’
    AdoCommandObject.CreateParameter(customerid)
    AdoCommandObject.AddParameter()
    AdoCommandObject.Execute(sql_string)

    Plan gets created for the string ‘select * from orders where customerid=?’. Each time this query is executed using parameters, the cached plan gets reused, with the customer id as the varying input.
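On the T-SQL side, the same plan reuse can be obtained with sp_executesql, assuming an orders table with a customerid column as in the examples above:

  -- One parameterised plan is compiled and then reused for every customerid value
  EXEC sp_executesql
      N'select * from orders where customerid = @customerid',
      N'@customerid int',
      @customerid = 1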

How does one see what is inside the plan cache?

  • Or in other words, how can one verify that these recommendations work?
  • Use the query shown after this list (against syscacheobjects) to display the contents of the plan cache.
  • The great thing about this is that you can actually see the exact text of the plan that is in the cache – talk about accountability !!!
  • [Screenshot: query_cache_plan]
  • The key to understanding the output is the usecounts column. Good applications will have a high reuse ratio for their plans.
  • Note : It is easy to see from this output how much memory is wasted by a particular ad-hoc statement. You can estimate the savings from converting a particular module to prepared statements by adding up the memory usage of its cached plans.
  • Use the MSDN explanation of syscacheobjects to further understand what the different fields signify and to find out other issues that may be plaguing your application.
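A minimal sketch of such a query on SQL Server 2000 (syscacheobjects was replaced by DMVs in later versions):

  -- Cached plans, how big they are and how often they have been reused
  SELECT cacheobjtype, objtype, usecounts, pagesused, sql
  FROM master..syscacheobjects
  ORDER BY pagesused DESC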

Once you are done optimizing all your SQL statements, the memory growth – or leak, if you prefer – from the procedure cache will be well under control for your application. Happy optimization !!! 🙂

For further detailed reference –

Technet Article on MSDN

Minding Memory, from Microsoft’s Customer Advisory Team
