hassle-free1

When i was asked to create a custom, server   monitoring solution, in a start-up environment,  run by a very wise team, the only requirements list that i received, was a call from my overseas manager.

“Have you seen task manager on XP”
“Yes ?? !!”
“Well, we need the exact same data, except, your process must broadcast it to our messaging network”
“Get cracking on it.”

It was my sole responsibility to figure out

  1. How to extract this data
  2. How to efficiently store and compare successive values for % calculations
  3. How to make the program faster and also consume small memory
  4. Eventually, how to port the code to Solaris & Linux (I received 2 more calls)

The Process

We had frequent commits into the code server, and bi-weekly status checks to ensure things progressed as required. The output of my code was used by  another programmer for his work and occasionally i would be alerted to a bug  from him which i would then fix.

Working Managers

My managers and HIS supervisors where the ones who actually verified this code and made sure the performance and data where correct. The initial effort that these guys put in ensured that i understood the goals of the program from the beginning and what was important vs what was not.

They frequently re-aligned what i developed, with what they thought the customer might want. There was no super designs with a super-human architect / program manager deciding before hand what the customer would like to see. Our program grew in bits and pieces but relevant bits and pieces.

Deciding the next small bit and ensuring coding standards where met and that the output looked neat and clean was what the management did and other stuff that i never cared about / knew. Oh and almost all discussions was over the phone and  emails summarized whatever we discussed.

Formal Testing


Once the size of the program was big enough, (3 processes and 3 operating systems) we had a dedicated person to verify everything was correct and also integrated well. Of-course the managers still did their bit at-least every 2 months.  But basically this single person’s ass was fry if something went badly wrong and he did and damn good job of making sure the stuff i said i had done actually worked.

What did i do – Hands Free

During all this time, all that i did was code, learn, investigate and code again. Of-course the code had to be of the highest possible standard and i had enough time to make this happen.

Lots of Good Work

I ended up creating 3 different agents for 3 different OS’s, collecting ALL sorts of data you could imagine about the internals of processes / memory architecture / networking throughput / files and ports opened / disk sub-system / version-ing information and what not.The agents worked across almost all in-production servers versions and editions of the OSes concerned.

Lots of Education

Additionally all this information was standardized ie to say memory free reported by Linux might not be the same logically for Solaris.  Our agents normalized this information and for this a lot of kernel code peering where available, or reading heaps on internal documentation was required.

Designing all this stuff and constantly striving to improve quality again gave a lot in terms of the learnings. The fact that i had some top notch designers / coders to review and suggest changes made things even better.

Code smells / data structures / design decisions / optimization concerns / OS internals are few of the things i eventually learned.

Quality

On Solaris my agent could beat top in its CPU consumption and on the whole i might have recieved about 2 bugs max overall for the entire code base after it had shipped to the customer. (We had a single huge customer for this program at the time)

Initiatives

In between all this, i had time to work on another other side project (to get real time feeds into excel) and formal investigations like creating a distributed real-time excel prototype (you change a cell in your laptop and anyone viewing the same sheet would be immediately updated) and other works my managers deemed worthwhile to investigate.

Personal Initiatives

I had time in the interim to investogate something i thought was worth doing, on scaling our middle level layers. It never took off, but the investigations tought me a lot about scalability and scales of efficiency in database related work that helped a lot with my current project and current job.

I also ended up creating a prototype trouble-shooter for our stock trading network that listened to different TROUBLE broadcasts our individual processes made and showed them in real time to the administrator in a tree view / LIFO ordering, hashed and searchable overall. That again never took off, but i have fond memories of the same and I’m sure the folks who were in charge still remember the tool. 

Results

Its been over 3 years since i left the place and I’m yet to receive / know of any important bugs affecting the functionality of the program apart from of-course keeping the agents up to date with the release of new versions. My wife joined the same place i used to work and she sometimes tell me the people i worked with, the engineers, over there thought the world of me. That, is good karma any way you look.

The other fact, that got me into writing this blog, from a co-orporate environment, was the amount of code that we did and how hassle free the whole process was, which brings me to the next installment of this series.

Advertisements