The piece of code we are about to discuss was used in production software, and I was tasked with reviewing the patch. See if you can spot *any* issues with the code. Later we will look at the exact observations, which are hugely interesting and instructive regarding system behavior.

The Query

This query retains the top n records of a particular type and deletes all other rows of the same type from a table in a Sybase database.

for (int i = 0; i < ListOfTypes; i++)
    ExecuteQuery("DELETE FROM table WHERE PrimaryKey NOT IN (SELECT TOP n PrimaryKey FROM table WHERE Id = " + i + ") AND Id = " + i)

Observation – No Data to be Deleted in the database

  1. CPU consumption during query execution is negligible when there are no matching rows to be deleted. The fact that this code issues multiple dynamic queries, each of which has to be parsed, does not seem to be much of a problem for Sybase.
  2. 2000 queries took around 3 seconds overall, at around 12% CPU, on a Xeon box.

Observation – delete finds rows for deletion

  1. With data to be deleted, 30 iterations of the loop took 5 minutes; extrapolated to 2000 iterations, that is over 5 hours of deleting.
  2. The CPU consumption figure remained the same.
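The extrapolation above is easy to check with back-of-the-envelope arithmetic (the 5 minutes and 30 loops are the measurements reported above):

```python
# Back-of-the-envelope check of the extrapolation above.
loops_measured = 30
time_measured_s = 5 * 60                          # 5 minutes for 30 delete loops
per_delete_s = time_measured_s / loops_measured   # 10 s per delete
total_s = per_delete_s * 2000                     # extrapolate to 2000 deletes

print(per_delete_s)    # 10.0 seconds per delete
print(total_s / 3600)  # ~5.6 hours, i.e. "over 5 hours"
```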

Why are Database Deletes So Slow?

Relational databases track everything using logs, called transaction logs because they record all the transactions, i.e. changes, that happen to the database. The ACID nature of transactions is guaranteed using these logs. So when you make a change to the database, a log record is written to disk.

The actual change is only made in memory and is not written to disk immediately, for efficiency's sake. But logs are always written promptly and without fail, so that if the machine crashes or a rollback is required, the log can be used to reconstruct the state of the database with minimal loss of data. This means that every individual delete makes the disk spin, and spinning disks are horridly slower than anything else you can think of.

How slow can the disks be?

Numbers from a talk by Jeff Dean of Google (who has worked on both MapReduce and BigTable), as reported here

Comparative slowness, aka latency, in terms of CPU cycles

Floating point calculation = 0.25 to 3 cycles
L1 cache access            = 3 cycles
L2 cache access            = 14 cycles
DRAM access                = 200 cycles
Flash                      = 3 million cycles
Disk                       = 15 million cycles

Comparative slowness, aka latency, in absolute time

L1 cache reference                                 0.5 ns
Branch mispredict                                    5 ns
L2 cache reference                                   7 ns
Mutex lock/unlock                                  100 ns
Main memory reference                              100 ns
Compress 1K bytes with XXX                      10,000 ns
Send 2K bytes over 1 Gbps network               20,000 ns
Read 1 MB sequentially from memory             250,000 ns
Round trip within same datacenter              500,000 ns
Disk seek                                   10,000,000 ns
Read 1 MB sequentially from network         10,000,000 ns
Read 1 MB sequentially from disk            30,000,000 ns
Send packet CA->Netherlands->CA            150,000,000 ns
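Plugging a couple of values from the table above into ratios makes the gap concrete (all numbers are taken directly from the table):

```python
# Ratios derived from the latency table above.
main_memory_ref_ns = 100
read_1mb_memory_ns = 250_000
disk_seek_ns = 10_000_000
read_1mb_disk_ns = 30_000_000

# Reading 1 MB from disk vs from memory:
print(read_1mb_disk_ns // read_1mb_memory_ns)  # 120x slower
# One disk seek costs as much as this many main-memory references:
print(disk_seek_ns // main_memory_ref_ns)      # 100,000
```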

Can you speed up the disks?

SCSI disk array on a server


You bet! We tried to speed up the whole experiment by moving the tests from an IDE-based system (with a 7,500 RPM drive) to a system with a single 15K RPM SCSI disk plus a SCSI controller with no on-board memory (the picture shows six disks, but we used just one of them).

The results from moving to the SCSI system:

  1. The delete time improved by a factor of 4-5, and the worst-case time dropped to around 133 minutes from 5 hours.
  2. However, with the SCSI system in use, CPU consumption during query execution also jumped by the same factor of 4-5: the engine blocks on the disk less often, so it gets through 4-5 times as much query-execution work in the same time.

As you can see, it would now be easy to calculate how many disks we would need to bring the deletes down to a manageable time. I might have to go with a couple of SCSI disks in a RAID configuration to achieve the speeds I want. At the same time, the database engine's CPU consumption would have to be addressed separately, so that it does not have to do as much work finding the rows to be deleted.

Is there any way to improve this?

  1. Use Transactions –

    The SQL Server engine flushes its transaction log to disk once per transaction rather than once per statement. I am not sure how this works vis-à-vis Java and the Sybase engine on which this query was tested. But if you ensure all your changes are made as part of a single transaction, the disks are hit far less than if every statement commits on its own.

  2. Do not use a database for real-time stuff

    An even more optimal solution is to not rely on the database at all and instead do all this in a specialized in-memory component, such as a thread-safe hash or something equivalent, using the database only as a slow backing store for failure recovery. This gives throughput incomparable to any RDBMS / disk-based solution.
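The single-transaction idea in point 1 can be sketched with SQLite from Python's standard library. SQLite here is only to illustrate the principle, not the Sybase engine the query actually ran on, and the table and column names are made up:

```python
import sqlite3

# Illustration of point 1: wrap the whole delete loop in one transaction
# so the log is flushed once at COMMIT, not once per DELETE statement.
conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transaction control
conn.execute("CREATE TABLE events (pk INTEGER PRIMARY KEY, id INTEGER)")
conn.executemany("INSERT INTO events (id) VALUES (?)",
                 [(i % 5,) for i in range(500)])  # 100 rows for each of 5 types

conn.execute("BEGIN")            # one transaction around all the deletes
for i in range(5):
    conn.execute(
        "DELETE FROM events WHERE id = ? AND pk NOT IN "
        "(SELECT pk FROM events WHERE id = ? ORDER BY pk DESC LIMIT 20)",
        (i, i))
conn.execute("COMMIT")           # single log flush happens here

remaining = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(remaining)  # 100 (5 types x 20 retained rows)
```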
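A minimal sketch of the in-memory idea in point 2 (the class name and shape are my own invention, not the production component): keep only the newest n records per type under a lock, so the "delete" becomes a list trim that never touches a disk.

```python
import threading
from collections import defaultdict

class TopNStore:
    """Keep only the newest n records per type, thread-safely."""
    def __init__(self, n):
        self.n = n
        self._lock = threading.Lock()
        self._records = defaultdict(list)  # type id -> newest-first records

    def add(self, type_id, record):
        with self._lock:
            bucket = self._records[type_id]
            bucket.insert(0, record)       # newest first
            del bucket[self.n:]            # trim: this replaces the SQL DELETE

    def top(self, type_id):
        with self._lock:
            return list(self._records[type_id])

store = TopNStore(3)
for event in range(5):
    store.add(0, event)
print(store.top(0))  # [4, 3, 2]
```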

My suggestion to improve CPU (untested)

  1. Move the entire loop into a stored procedure.
  2. Run a select to get the top 20 rows and extract the ID of the bottom one.
  3. Run a delete to remove all rows with ID greater than the extracted ID. The advantages: you avoid possible query compilation for 2000 separate queries, and you remove the possibly highly inefficient NOT IN operator. This needs to be verified though.
  4. Remove all foreign key references to the current table – although that of course goes against the grain of the RDBMS concept itself. For desperate situations, though, it might work.
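Steps 2 and 3 above can be sketched against SQLite (again, just to show the shape; the table and column names are invented, and Sybase's TOP n becomes ORDER BY ... LIMIT here): one SELECT finds the ID of the bottom retained row, then a plain range delete replaces NOT IN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (pk INTEGER PRIMARY KEY, id INTEGER)")
conn.executemany("INSERT INTO events (id) VALUES (?)", [(1,)] * 100)  # pk 1..100

# Step 2: the cutoff is the ID of the bottom row among the top 20 of this type.
(cutoff,) = conn.execute(
    "SELECT MAX(pk) FROM "
    "(SELECT pk FROM events WHERE id = 1 ORDER BY pk LIMIT 20)").fetchone()

# Step 3: a simple range delete -- no NOT IN, no per-row subquery.
conn.execute("DELETE FROM events WHERE id = 1 AND pk > ?", (cutoff,))
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # 20
```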

What was implemented in the end?

Well, the production software had this delete sequence running every 30 minutes, and the number of events the system claims to support is only up to 25 per hour. At that rate there would be no pile-up and no big deletes happening. So we don't have a problem!