The other day, while automating our load tests (you heard that right: we are even automating load tests as part of our agile efforts), I came across an interesting scenario.

LoadRunner, which we were using to automate the clustered web application, spread all its output across tons of small files, each containing one or more similar lines. The 2-3 lines each file contained would look something like this:

Report.c(199): Error: App Error: refresh report. DataSet took longer than the refresh rate to load.  This may indicate a slow database (e.g. HDS) [loadTime: 37.708480, refreshRate: 26].      [MsgId: MERR-17999]
Report.c(83): Error: App Error: report took longer than refresh rate: 26; iter: 32;  vuser: ?; total time 41.796206    [MsgId: MERR-17999]

But what we wanted to know at the end of the test was a summary of errors, as in:

  • Report.c(199): Error: App Error: refresh report. DataSet took longer than the refresh rate to load.  This may indicate a slow database (e.g. HDS) – 100 errors
  • Report.c(200) Error Null arguments – 50 errors

Approaching the solution

How would you solve this problem? Think for a moment, and no more ...

Well, being the C++ junkie that I am, I immediately saw an opportunity to apply my sword and utility belt of hash tables, efficient string comparisons etc., and whipped out Visual Studio 2010 (I haven’t used 2010 too much and wanted to use it). And then it occurred to me:

Why not use Java ? I had recently learned it and it might be easier.

The Final Solution

cat *.log | sed 's/total time \([0-9]*\)\.\([0-9]*\)/total time/;s/loadTime: \([0-9]*\)\.\([0-9]*\)/loadTime: /;s/iter: \([0-9]*\)/iter: /;s/refresh rate: \([0-9]*\)/refresh rate:  /;s/refreshRate: \([0-9]*\)/refreshRate:  /' | sort | uniq -c

I should probably tidy up the list of sed commands so that the replacements are passed to sed from a file. But I left it at that, happy that my solution was ready to use, would work all the time, required no maintenance and had saved me a lot of time.
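For anyone who would rather keep the substitution rules in a list than in one long sed string, here is a minimal Python sketch of the same idea. It is not what I actually ran; the names RULES, normalize and summarize are made up for illustration, and the patterns simply mirror the sed expressions above (each one replaces only the first match on a line, as sed does without the g flag).

```python
import re
from collections import Counter

# Each rule mirrors one sed substitution above: the volatile number is
# dropped so that otherwise identical messages collapse into one key.
# (RULES, normalize and summarize are illustrative names, not an existing tool.)
RULES = [
    (re.compile(r"total time [0-9]*\.[0-9]*"), "total time"),
    (re.compile(r"loadTime: [0-9]*\.[0-9]*"), "loadTime: "),
    (re.compile(r"iter: [0-9]*"), "iter: "),
    (re.compile(r"refresh rate: [0-9]*"), "refresh rate:  "),
    (re.compile(r"refreshRate: [0-9]*"), "refreshRate:  "),
]


def normalize(line):
    """Apply every rule once, like a chain of sed s/// commands without g."""
    for pattern, replacement in RULES:
        line = pattern.sub(replacement, line, count=1)
    return line


def summarize(lines):
    """Count identical normalized lines: the equivalent of sort | uniq -c."""
    return Counter(normalize(line.rstrip("\n")) for line in lines)
```

Feeding it the lines of every log file and printing summarize(lines).most_common() gives the same kind of error summary, with the most frequent message first.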

The end

Thinking about the episode, I was reminded of the hit Japanese samurai movie Zatoichi and its happy ending.

  • The villain (problem) is dead
  • Samurai is happy with his effort
  • The subjects (files/compiler/system/software/hardware) are happy due to the efficiency created
  • Karma (company + job) is happy due to the time saved.

And to boot, it nearly follows the Art of War zen, which advises the clever warrior to win without pulling out his sword ...

Happy coding!!!

I was reading the VC++ team blog ten minutes back and took their poll, the results of which will go towards improving the next Visual Studio release.

The interesting question (one of many) from the survey, which prompted me to write this post, is as follows:

How much of your time do you spend on each of the following?

  • Defining the problem
  • Requirements gathering
  • Designing solution
  • Writing code
  • Building code
  • Refactoring code
  • Debugging code
  • Writing tests
  • Testing
  • Deployment Support

The options for each are: Not at all, Some, About 50% and A lot. There can be overlaps. So what would be your answer?

Here is mine –

  1. About 50% of the time designing the solution
  2. About 50% of the time writing code (base code writing time)
  3. About 50% of the time refactoring it (I always keep improving the structure)
  4. Some time testing it (I write a lot of functional tests to test my code)
  5. Some time debugging (I pride myself on having the lowest bug counts in most teams I have worked in)

What does your distribution look like?

PS: If you are in a big corporation, the time you spend on the coding-related activities above might be only 30-40% of the overall time you have. This poll is meant to measure how your activity is spread within the little time you really get to work with the code.

If you ever needed a reason to enforce clean code, here it is, right out of MIT and researched to the hilt. The video is extremely interesting from a behavioral standpoint and extremely educational. As highly recommended as the famous Steve Jobs video.

I can vouch for what is being said because I have seen this happen many times in development. The implied upshot?

Clean code bases beget new code which is cleaner or clean enough. Dirty code bases attract more code which is at least just as dirty or usually even more so.

In the first installment of this four-part series I essentially described how a good pure-play software company operates, and how the process, or rather the lack of process, enabled a lot of code to be created efficiently.

Compare this to most corporate environments. In this post I want to detail how software development happens in such highly charged but disabled environments. This should also be a scary but educational read for any uninitiated newbies who want to shift into such an environment but do not know what to expect.

Requirements

After the *soft* in-principle decision to create a product is made, a master list of un-meet-able requirements is drawn up. This is called the PRD, aka the Project Requirements Document.

Evaluate Requirements (1 month)

The managers task their teams with evaluating the PRD and coming up with an estimated time and a list of doable items. The developers then go into an investigative mode, which again is only loosely time-bound. After a rather relaxed period, the developers come up with some arbitrary numbers.

Estimation (1 hour – 1 day)

These numbers might or might not get shortened, depending on how much effort the manager arbitrarily decides the task is going to take. The numbers are communicated informally, and then a rough priority of the doable items is decided. Once this rough estimation is done, someone gets tasked with creating the final list of items that will be taken up.

This phase takes a minimum of one month, and two months on average, to complete.

SFS & SFS Review ( > 1 month)

The new task list is communicated in a new document called the SFS (System Functional Spec). During all this time, meetings keep happening twice a week among developers, and between development, management, program management and marketing, to evaluate the concerns and overall progress.

Once completed, the SFS is sent out for review, where developers, testers and anyone else remotely connected with the program take a dig at the rationale behind the features, the implications and so on. In short, everything that can be criticized will be criticized, and the necessary concerns are addressed and the document updated accordingly. This phase generally takes over a month to complete.

Executive Decision (2-4 weeks)

Once the SFS has been reviewed over and over again, the program starts moving towards what is known as the EC (Execution Commit): the final go-ahead that management has to get from the upper echelons who are ready to fund the program, for the features they are going to commit (aka promise) for the release.

As EC dates approach (and keep slipping), everyone goes into overdrive and actually starts investigating the features they are supposed to investigate. More issues crop up all over the place which, in all likelihood, delay the EC plans. After a number of delays, the SFS is finally frozen and the EC happens. Most of the SFS reviews and answers actually happen during this period.

Committed (i.e. promised) features have to be released without fail, so everyone, managers included, takes on features only after adding a liberal amount of buffer to their individual plans, to make failure a low-probability affair.

Design ( < 1 month)


Once EC is made, the design starts and this is when the actual troubles that a feature could run into are encountered (assuming the design is properly done). This might cause further delays as the issues are communicated back and forth to the management. The manager ensures that the teams actually do the designs by ensuring design review meetings happen. During this time the testing teams also start writing test cases, which the developers responsible for the modules are supposed to review and ok, apart from being reviewed by their own peers.

This phase can take from two to four weeks overall. When controversies erupt, the design could be stretched by another two weeks.

Coding (2-3 months)

Once this phase is over, a hectic phase of coding starts, which on average never lasts more than 2-3 months.

Developer Testing ( > 1 month)

Once this short affair is over, development testing, integration testing etc. start. One or two of the developers do this, to ensure things are fine before handing over the first cut of the product to the testing team.

During this phase not much happens with the other developers, who sporadically fix bugs that appear off and on in their individual modules. The integrated pieces are handed over to the testing team, who now start on each of their test cases and again report issues, which basically reflect the health of the system. Many of the issues come up during this phase, and there are some or many delays depending on how sloppy the coding was. One more release is then made within a short span of time, addressing the issues discovered and finishing whatever features were unfinished in the first release.

This developer-run testing and fixing phase takes well over a month in total.

Formal Testing (2 months)


Next, formal testing, backed by formal test cases, starts in earnest and a HOST of issues are discovered. The issues that consume the most time are those related to system performance, and this drags on for quite a while until the test team is finally pressured into releasing the product and sweeping some of the critical issues under the rug to at least enable a delayed shipment.

The number of emails sent during this phase, and the meetings between the test and development teams haggling over the actual nature of the issues observed, make for a tumultuous affair. The entire testing phase takes up at least a third of the entire development time, so you can imagine how long this affair drags on.

In the meanwhile, folks who have done their work fast and to a high quality can actually relax and drag their feet while the rest of the modules come up to speed. Most of the documentation effort starts towards the end of this phase and gets done in a few days’ time.

Release (1 month)

Once the entire affair is finished, the release is finally OK’d, and then starts a phase where the CD, licenses and other nitty-gritty project management tasks are taken up and completed. Chalk up another month for this to finish.

Next Phase = Next Year


Once this is completed, a new release and a new PRD are announced and circulated, and the whole routine starts all over again. In my experience only two such phases ever get completed in one year, with a little bit of the next release overlapping into the same year.

Total amount of time spent coding = 2-3 months.

No matter how you look at this, there is no way a company that works along these lines can produce new code or new features that matter without a HUGE amount of time. Some companies beat this by regularly acquiring small companies and customizing their well-written products in whatever ways they think suitable. Sometimes the results are terrible, but at times they are good. Hardware folks can also get away with this, since the over-priced items can always be packaged along with hardware solutions and sold to enterprise-y folks.

But never would a pure-play software firm be able to survive along these lines. For that matter, never would an enterprise be able to create quality products the same way that these startups and pure-play software firms do.

Key observations

  1. Coding happens for a maximum of four months a year
  2. Design happens for a maximum of 2-4 weeks a year, and estimation takes only an hour or a day at the most.
  3. Documents are generated at each phase mostly as a way of keeping tabs
  4. Since these documents are control points, they need to be reviewed.
  5. Since review is work, and by reviewing you are agreeing to whatever is being said, no one individual is allowed to do this. Instead the “team” does this, which forms a type of shared responsibility.
  6. Creating the SFS and design documents, reviewing the said documents, coding and some amount of formal testing are done by the engineers themselves.
  7. Since reviews are public and everyone reviews everyone else’s designs, there is a certain softness in the design comments. Maybe this applies only to India, but it is common to let pass docs that are even mediocre, up to a point, due to a softness in cultural values that prevents one from being rude to one’s colleagues. No one but managers can therefore really enforce design quality beyond a certain basic level.
  8. No formal time is allocated for design reviews

All of this has many implications for the way an organization engages in product development. But for this series, we are interested only in what the low levels of coding time imply. That brings us to the final post in this series.

When I was asked to create a custom server monitoring solution, in a start-up environment run by a very wise team, the only requirements list that I received was a call from my overseas manager.

“Have you seen Task Manager on XP?”
“Yes ?? !!”
“Well, we need the exact same data, except your process must broadcast it to our messaging network.”
“Get cracking on it.”

It was my sole responsibility to figure out

  1. How to extract this data
  2. How to efficiently store and compare successive values for % calculations
  3. How to make the program faster while also using little memory
  4. Eventually, how to port the code to Solaris & Linux (I received 2 more calls)
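To give a flavour of the second item, here is a minimal Python sketch of how a percentage can be derived from two successive readings of a cumulative counter. This is not the original agent code; the Sample type, its fields and the cpu_percent function are hypothetical names, and the sketch assumes the OS reports cumulative CPU seconds per process.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One reading of the counters (hypothetical structure for this sketch)."""
    timestamp: float    # wall-clock time of the reading, in seconds
    cpu_seconds: dict   # pid -> cumulative CPU seconds used so far


def cpu_percent(previous, current):
    """CPU usage per process between two samples, as a percentage of elapsed time."""
    elapsed = current.timestamp - previous.timestamp
    if elapsed <= 0:
        return {}  # samples out of order or identical; nothing to report
    result = {}
    for pid, used in current.cpu_seconds.items():
        before = previous.cpu_seconds.get(pid)
        if before is None:
            continue  # new process: no baseline to compare against yet
        # Clamp at zero in case a pid was reused and its counter restarted.
        result[pid] = 100.0 * max(used - before, 0.0) / elapsed
    return result
```

Only the previous sample has to be kept around, which is one way to keep the memory footprint small.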

The Process

We had frequent commits into the code server, and bi-weekly status checks to ensure things progressed as required. The output of my code was used by another programmer for his work, and occasionally he would alert me to a bug, which I would then fix.

Working Managers

My managers and HIS supervisors were the ones who actually verified this code and made sure the performance and data were correct. The initial effort that these guys put in ensured that I understood the goals of the program from the beginning, and what was important vs what was not.

They frequently re-aligned what I developed with what they thought the customer might want. There were no super designs, with a super-human architect / program manager deciding beforehand what the customer would like to see. Our program grew in bits and pieces, but relevant bits and pieces.

Deciding the next small bit, ensuring coding standards were met and making sure the output looked neat and clean was what the management did, along with other stuff that I never knew or cared about. Oh, and almost all discussions were over the phone, and emails summarized whatever we discussed.

Formal Testing


Once the program was big enough (3 processes and 3 operating systems), we had a dedicated person to verify that everything was correct and also integrated well. Of course the managers still did their bit at least every 2 months. But basically this single person’s ass was fried if something went badly wrong, and he did a damn good job of making sure the stuff I said I had done actually worked.

What did i do – Hands Free

During all this time, all that I did was code, learn, investigate and code again. Of course the code had to be of the highest possible standard, and I had enough time to make this happen.

Lots of Good Work

I ended up creating 3 different agents for 3 different OSes, collecting ALL sorts of data you could imagine about the internals of processes / memory architecture / networking throughput / files and ports opened / disk sub-system / versioning information and what not. The agents worked across almost all in-production server versions and editions of the OSes concerned.

Lots of Education

Additionally, all this information was standardized; that is to say, memory free as reported by Linux might not logically mean the same thing on Solaris. Our agents normalized this information, and this required a lot of peering at kernel code where it was available, or reading heaps of internal documentation.

Designing all this stuff and constantly striving to improve quality again gave me a lot in terms of learnings. The fact that I had some top-notch designers / coders to review and suggest changes made things even better.

Code smells / data structures / design decisions / optimization concerns / OS internals are a few of the things I eventually learned.

Quality

On Solaris my agent could beat top in its CPU consumption, and on the whole I might have received about 2 bugs at most for the entire code base after it had shipped to the customer. (We had a single huge customer for this program at the time.)

Initiatives

In between all this, I had time to work on another side project (to get real-time feeds into Excel), on formal investigations like creating a distributed real-time Excel prototype (you change a cell on your laptop and anyone viewing the same sheet is immediately updated), and on other work my managers deemed worthwhile to investigate.

Personal Initiatives

I had time in the interim to investigate something I thought was worth doing, on scaling our middle-level layers. It never took off, but the investigations taught me a lot about scalability and scales of efficiency in database-related work, which has helped a lot with my current project and current job.

I also ended up creating a prototype trouble-shooter for our stock trading network that listened to the different TROUBLE broadcasts our individual processes made and showed them in real time to the administrator in a tree view / LIFO ordering, hashed and searchable overall. That again never took off, but I have fond memories of it and I’m sure the folks who were in charge still remember the tool.
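The "LIFO ordering, hashed and searchable" part is easy to picture with a small Python sketch. This is only an illustration of the data structure, not the prototype itself; the TroubleLog class and its method names are invented here, and the network listening and tree view are left out.

```python
from collections import defaultdict


class TroubleLog:
    """In-memory store for trouble messages (illustrative names throughout)."""

    def __init__(self):
        self._messages = []                  # arrival order; read backwards for LIFO
        self._by_source = defaultdict(list)  # hash index: source -> positions in _messages

    def add(self, source, text):
        """Record one broadcast from the given source process."""
        self._by_source[source].append(len(self._messages))
        self._messages.append((source, text))

    def latest(self, limit=10):
        """Most recent messages first (LIFO ordering)."""
        return self._messages[::-1][:limit]

    def from_source(self, source):
        """Messages from one source, newest first, found via the hash index."""
        positions = self._by_source.get(source, [])
        return [self._messages[i][1] for i in reversed(positions)]

    def search(self, needle):
        """Case-insensitive substring search over all messages, newest first."""
        needle = needle.lower()
        return [m for m in reversed(self._messages) if needle in m[1].lower()]
```

Grouping by source is what a tree view would hang off: one node per process, with its messages underneath.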

Results

It’s been over 3 years since I left the place, and I’m yet to receive or hear of any important bugs affecting the functionality of the program, apart from, of course, keeping the agents up to date with the release of new OS versions. My wife joined the same place I used to work, and she sometimes tells me that the people I worked with, the engineers over there, thought the world of me. That is good karma any way you look at it.

The other fact that got me into writing this blog, from a corporate environment, was the amount of code that we wrote and how hassle-free the whole process was, which brings me to the next installment of this series.