Cisco NetManager


What yardstick do you use to decide trade-offs while designing features? This post looks at two radically different points of view on design trade-offs, by way of a feature we are currently implementing and an example taken from Vista’s user interface changes.

The feature

Our network management tool, Cisco NetManager, performs fault monitoring and notification for network devices. One fault it can sense is the link-down state of an individual port, which is when that network port appears to be offline or non-operational.

Quite understandably, Link Down is an important event to sense, and NetManager does this in three different ways.

  1. By listening for ad-hoc Link-Down SNMP traps sent from the device.
  2. By periodic SNMP-based polling of all devices, which senses that a particular port is down.
  3. By a CDP-based discovery process that runs every once in a while to discover the physical connectivity of the devices being monitored.
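
Since three independent methods can report the same port, one way to picture the result is as a merge keyed by device and port. The snippet below is only a minimal sketch under that assumption; the `LinkDownReport` type, the source labels, and `merge_reports` are hypothetical names for illustration and are not NetManager's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkDownReport:
    """One link-down observation (hypothetical structure)."""
    device: str
    port: str
    source: str  # assumed labels: "trap", "poll" or "cdp"


def merge_reports(reports):
    """Collapse reports about the same (device, port) into one entry,
    remembering which detection methods saw the port as down."""
    merged = {}
    for report in reports:
        merged.setdefault((report.device, report.port), set()).add(report.source)
    return merged


reports = [
    LinkDownReport("switch-1", "Gi0/1", "trap"),
    LinkDownReport("switch-1", "Gi0/1", "poll"),
    LinkDownReport("router-2", "Fa0/0", "cdp"),
]
merged = merge_reports(reports)
```

Here `merged` holds two entries, one per port, even though three reports came in.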

Issue – Flood of Correct but Unwanted Messages

Because these detection methods are fast and overlap, NetManager senses network port status changes almost in real time. However, real networks have switches and routers with plenty of unused ports, which appear to be down simply because they are not in use. Users get inundated with notifications about the down ports of such devices when the notification module sends out the initial list of failures it detects right after installation. For big networks this runs into many thousands of useless but correct alerts, and it needs to be controlled.

The New Requirement

We plan to fix this problem of unnecessary alerts in the upcoming version by sensing unused interfaces and not reporting any alerts on them. What follows is the list of facts we considered in designing this feature and how we plan to accommodate them in the final design.

  1. Other Network Port Related Features – The next release of NetManager will also include a voice utilization feature for voice gateways, which can be used to properly size the available bandwidth using Erlang-based calculations. For this reason, some of the ports discarded by the current code will need to be processed.
  2. Only Specific Ports are Currently Alerted – Not all ports are considered for alerting. This is because many port types do not have a strict up/down status, or they may switch to other status types frequently, depending on the nature of operations. For this reason, the current set of valid port types is hard-coded into the source code, which is really a bad programming practice. One goal of the current set of modifications is to have this list of types looked up from some external source in an easily configurable manner.
  3. Future Requirements – There is a definite possibility that some customers might not want the same types of ports to be alerted upon by default, for reasons specific to their deployment. The challenge is to consider (not implement) this requirement along with the current feature, so that the design is flexible enough when the time comes to implement it.
  4. Familiarity with Code – An additional advantage of expanding the design at this stage, rather than later, is that we are already familiar with the relevant module. Modifications are easier now than if the code base were to be looked at afresh during the next code cycle, after a gap of 6-7 months.
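
To make point 2 concrete, here is a minimal sketch of looking the alert-able port types up from an external source instead of hard-coding them. The JSON layout, the key `alertable_port_types` and the function names are assumptions made for illustration; the type names are only sample values, not the list NetManager uses.

```python
import json

# Hypothetical external configuration; in practice this text would be
# read from a file or a database table rather than embedded here.
CONFIG_TEXT = '{"alertable_port_types": ["ethernetCsmacd", "gigabitEthernet"]}'


def load_alertable_types(config_text):
    """Parse the configuration and return the set of alert-able port types."""
    data = json.loads(config_text)
    return frozenset(data["alertable_port_types"])


def is_alertable(port_type, alertable_types):
    """A port is considered for alerting only if its type is in the configured set."""
    return port_type in alertable_types


alertable_types = load_alertable_types(CONFIG_TEXT)
```

Changing which port types are alerted on then becomes a configuration edit rather than a code change.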

Designing for Configurable Network Port Types

The complete design, which makes the default list of network port types lookup-based, would ideally:

  1. Make the default list of port types configurable AND
  2. Make the alerting for individual ports customizable, based on the user’s preference for the specific port.

Design Side Effects

Let’s inspect what happens if the user changes the global alert-able setting for any port type.

  • What happens to devices with ports of the same type that the user has already configured?
    • Leave the customized ports alone – in this case, ports on some devices would behave differently from those on other devices.
    • Change the alert settings irrespective of individual device settings – here the user would lose all the customizations they have made, and their expectation of how a previously configured device behaves would be violated.
  • What happens if the system makes no changes to devices that are already present and instead affects only the devices added in the future? In this case some devices would behave totally differently from others, leading to confusion overall.

No matter which option we choose, we will be left with inconsistencies in the way the system behaves with respect to the alert-able ports. Even though it is the user who changes these settings, in the end they would be left needing explanations as to why the system behaves the way it does.

Rules Rules Rules

Features that behave this way force users to learn how the system reacts before they can use it effectively. When multiple pieces of a system are designed in this manner, with possible inter-relationships to boot, the overall complexity shoots up quite fast.

Designs of this sort were the norm until recently. But ideas about design quality, aimed at simpler and more usable products, have been around in the background for some time too; they were perhaps most famously pioneered by Apple, in its extremely user-friendly and hence popular, if at times restrictive, products. Here is a sample.

XP sound control

Notice how the sound control talks about SW Synth and Wave?

What are they, and how are they different from the volume? This is the sort of design that’s typically promoted by feature-rich craziness.

Designing for Usability – Vista

Contrast the above with the new Vista sound control. This is all that turns up whether you single-click or double-click the sound icon on the Vista taskbar. Notice how the other “features” have been cleaned up and the icon made more realistic, so as to leave no doubt about the intention.

More detail-oriented folks will notice another clickable option below, labelled Mixer, which brings up the more advanced user interface. Basically, Microsoft seems to be relying on the ubiquitous web-style (hyperlink) interaction, which everyone already knows and favours, as a basic guideline for program-user interaction.

Look below for what comes up, if you click on the mixer label.

The Real Feature – Mixer

That dialog lets you control the volume of each application running on your computer. Now, this is definitely what can be termed a feature. Another rule, piece of complexity, or menu is not what makes a new feature; what does is something totally new that makes life easier for the user without adding more rules.

Apple’s famous designs became favoured in the same manner, through simplification rather than more complexity or features than other players offered. This unmistakable popular appeal of usability, as opposed to pointless features, is what I will use to guide our design on the network port configurability issue.

Design for Usability – Network Ports Decision

How do we factor this into the network port customization issue discussed above?

  • When devices are added, each port will inherit its alert-able setting from the global defaults. BUT THE DEFAULTS WILL NOT BE CUSTOMIZABLE BY THE USER.
  • The user can override the alert-able setting for individual ports on a device.
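
The two rules above can be sketched roughly as follows. This is an illustration under assumed names (`GLOBAL_DEFAULTS`, `Device`, `Port`, `add_port`, `override`) and sample port types, not the actual NetManager implementation.

```python
from dataclasses import dataclass, field

# Fixed defaults shipped with the product (hypothetical values).
# In this design the user cannot edit them.
GLOBAL_DEFAULTS = {"ethernetCsmacd": True, "softwareLoopback": False}


@dataclass
class Port:
    name: str
    port_type: str
    alertable: bool


@dataclass
class Device:
    name: str
    ports: dict = field(default_factory=dict)

    def add_port(self, name, port_type):
        # The setting is copied from the defaults once, when the port is added.
        # Unknown port types are assumed here to be non-alert-able.
        default = GLOBAL_DEFAULTS.get(port_type, False)
        self.ports[name] = Port(name, port_type, default)

    def override(self, name, alertable):
        # The only user-facing knob: a per-port override.
        self.ports[name].alertable = alertable


switch = Device("switch-1")
switch.add_port("Gi0/1", "ethernetCsmacd")
switch.add_port("Lo0", "softwareLoopback")
switch.override("Gi0/1", False)  # e.g. the user knows this port is unused
```

Because the defaults never change at run time, there is no question of what happens to already-configured devices when a default changes, which is the inconsistency the fuller design ran into.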

I believe this solution is simple enough and gives the user a feature that is required, without complicating usability any more than necessary. It does give a feeling of lost features compared to the full design, as might be the case with the Vista sound control, which lost SW Synth, Wave, etc.

But I feel that the usability improvements and the reduction in complexity more than offset the loss of these minor details, for the simple reason that the product is easier to use and less complex, and hence is better value for money.


PS: Customers who require more control can tweak these settings themselves using external scripts, which can be provided for this purpose.

PS: NetManager 1.1 has a notification filtering feature that lets you decide the type of alerts and the exact group of devices you want to be alerted for.

  1. Ever racked your brain over how your devices are connected to each other?
  2. Ever wondered if the ideas you had in your head still hold after the new guy crossed those cables while you were away on vacation?
  3. Ever wished you could monitor and administer all your devices with a view of how your network is organized and connected?

Cisco NetManager is a network monitoring tool that can help you do this and much more. Here is the Flash-based dashboard it uses to let you manage your devices from a connectivity-oriented view.

PS: This high-end, patented, Flash-based dynamic view is taken out of its big $$$ CUOM, so you get one thing that’s cool in CUOM without the entire CUOM baggage.

NOTE: We are in the process of scaling the CNM tool to handle more devices and more phones, and the next version might just have a view that’s even more exciting than the current one.

One cool thing about NetManager is the way data is organized around the concept of portlets and views.

Each view would show you a different set of portlets by default (which you can customize). The typical views are

  1. The device specific view (eg Communication manager attributes)
  2. General Inventory (CPU Memory etc)
  3. Problem Areas
  4. Device Inventory (Flash IOS Modules etc)

Here is a sample. Note that you can –

  1. Move those portlets around and re-arrange how they appear inside the view.
  2. Add other portlets from a multitude of different information sets such that they appear in the view you have customized.

Cisco NetManager is a network monitoring tool aimed squarely at vastly improving the user experience of network management tasks. For folks used to the eyesore of CUOM and its complicated ways of achieving what it does, this should be a breath of fresh air.

Folks who went to Clarus for want of a better user experience might want to take another look at this tool.

The official CNM page is a bit shy about the features we support, and does not clarify the license we have.  So let me try to present all that out here.

Licensing

The items indicated in the pink cells below, in the feature chart, are present only if you add a voice license to the base image. The voice license is called Unified Communications.  The base license is termed Ip Infrastructure.

Downloads and Customer Enquiry

The software is not available as a download and can only be ordered on a CD (877-204-3975 or e-mail). Or drop me a comment on this page below and I will get back to you.

Features

You can click on the individual features to get a more detailed write-up and some screenshots. But they are a work in progress, so bear with me until then.

  • Automatic Device Discovery (IP range scan, file import, device classification)
  • Collects Inventory of Device Components (25 types – CPU, memory, disk, services, ports, modules, fans, mail boxes, flash files, etc.)
  • Seriously Cool User Interface (web portlet based, customizable, embedded charts)
  • Periodic Fault Monitoring (through repeated polling of components and protocols identified during discovery)
  • Customizable Fault Notification (specify what kind of alerts and for which devices)
  • Physical Node Connectivity Graphs
  • Real-time Trending Graphs (used to trend resource consumption like CPU, port usage, etc.)
  • Trap + Syslog based Fault Monitoring
  • User-creatable custom Monitors via scripts (e.g. a WMI script to monitor a particular registry key on a remote machine)
  • Tons of reports (performance, problems, grouped reports, e.g. CPU consumption of all devices on one page, etc.)
  • User-customizable Dynamic Device Grouping via SQL (e.g. all devices which have location attribute = NY as “Trading-Servers”)
  • 16 types of protocol-based monitors (ping, SNMP, telnet, HTTP, NNTP, etc.)
  • Phone Discovery + Tracking
  • Out-of-the-box support for Multiple Device Type Families (routers, switches, printers, firewalls, wireless, web servers, workstations, etc.)
  • Monitoring for all types of Cisco Voice Applications (latest versions of Unity and CallManager applications and Voice Gateway routers)
  • Logical Connectivity Graphs (show the connections between CallManagers, Voice Gateways, Unity and phones)

Current limitation = Can support only 100 devices and up to 1000 phones.

PS: I forgot to mention that the product has its own secret troubleshooting tool (developed independently by its developers) that automates the task of debugging an installation. The tool does not exist because the product has many issues.

Rather it exists so that you, the user, will not have to sit through big and many debug sessions with developers and marketers any longer, unlike other enterprisy products ;).

One of the biggest additions in the new release of Cisco NetManager is the spanking new notification FILTERING module. What does it have?

Features

  1. The obvious – You can now create rules to dictate which events you want and for which devices you want them.

  2. The implementation is real time, i.e. it sends out the event as soon as polling detects it. (But when does polling detect the event? That could take a maximum of 2.5-3 minutes on a system with 100 devices and 1000 phones.)
  3. Real time also means that you are notified of trap-based events as they happen (authentication failures, Call Manager Code Red / Yellow, etc.).
  4. You have the choice of using a different SMTP server for each rule.
  5. Each rule can have different recipients. So, for example, all Call Manager related failures can be sent to poor fellow A and all router related warnings delivered to poor fellow B.
  6. You will not get multiple emails even if you are part of two rules that match the same event.
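
Points 1, 5 and 6 can be pictured with a small rule-matching sketch. The `Rule` structure, its fields and `recipients_for` are hypothetical names chosen for illustration; the real module's data model is not described in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A filtering rule (hypothetical structure)."""
    name: str
    event_types: frozenset
    devices: frozenset
    recipients: tuple


def recipients_for(event_type, device, rules):
    """Return (recipients, matched rule names) for one event.

    A recipient who appears in several matching rules is listed only once,
    so the same event never produces duplicate emails for that person.
    """
    recipients = []
    matched = []
    for rule in rules:
        if event_type in rule.event_types and device in rule.devices:
            matched.append(rule.name)
            for address in rule.recipients:
                if address not in recipients:
                    recipients.append(address)
    return recipients, matched


rules = [
    Rule("cm-failures", frozenset({"ServiceDown"}), frozenset({"cm-1"}),
         ("a@example.com", "ops@example.com")),
    Rule("all-critical", frozenset({"ServiceDown", "LinkDown"}),
         frozenset({"cm-1", "router-1"}), ("ops@example.com",)),
]
to_notify, matched_rules = recipients_for("ServiceDown", "cm-1", rules)
```

In this example `ops@example.com` belongs to both matching rules but appears once in `to_notify`, and `matched_rules` records which rules fired.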

Maintenance

  1. Whenever you get a notification, it tells you which rule caused the email to be sent. When faced with many rules and many emails, you can always find the culprit behind the tons of emails.
  2. You can tune how much load your SMTP server has to take when notification volume is high. You can either give notification plenty of juice or ensure your Exchange server does not get overwhelmed.
  3. You can configure the timeouts used for communication between NetManager and your SMTP server. This lets you handle slow servers, such as ones across the WAN.

Performance

  1. The implementation is quite fast. It has surpassed the notification engine specs required of the next-gen management tool from the Cisco stables intended for the ISP market.
  2. The implementation costs nothing CPU-wise when there are no events.
  3. The implementation is extremely efficient at saving DB load. All it does is mark an event as processed whenever it is sent out. The rest stays loaded in memory.
  4. Each additional recipient does not cause a new mail to be sent out. Everyone is in the cc’s, so you know who else received the event. This is also kind to the SMTP server’s resources.
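
Point 4 amounts to building one message per event and putting the extra recipients on the Cc line. Here is a minimal sketch using Python's standard `email.message.EmailMessage`; the addresses and the `build_notification` helper are made up for illustration, and actually sending the message (for example through `smtplib`) is left out.

```python
from email.message import EmailMessage


def build_notification(sender, recipients, subject, body):
    """Build a single message for all recipients of an event.

    The first recipient goes on the To line and the rest are copied,
    so everyone can see who else received the notification.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipients[0]
    if len(recipients) > 1:
        msg["Cc"] = ", ".join(recipients[1:])
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


message = build_notification(
    "netmanager@example.com",
    ["a@example.com", "b@example.com", "c@example.com"],
    "Link Down on switch-1 Gi0/1",
    "Rule: all-critical",
)
```

One message object is created regardless of how many recipients the event has.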

New features customers are asking for [add your responses and I shall update them here and take them up later]

  1. Conditional emailing – send emails to mail ID 1 between 9 and 10, and to another mail ID at other times.
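
As a thought experiment only (this feature is a request, not something that exists), conditional emailing could boil down to a time-window check like the one below. The function name and parameters are hypothetical.

```python
from datetime import time


def pick_recipient(now, window_start, window_end, in_window, otherwise):
    """Choose a mail ID depending on whether `now` falls inside the window.

    The window is half-open: window_start is included, window_end is not.
    """
    if window_start <= now < window_end:
        return in_window
    return otherwise


chosen = pick_recipient(time(9, 30), time(9, 0), time(10, 0),
                        "id1@example.com", "id2@example.com")
```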

You can find the installation (383 KB) for the Cisco NetManager troubleshooting tool, here.  (Warning:  you have to scroll down to find the download link)

The tool

The output

The need ?

Read on …

Ever Experienced this before ?

Day 1

Customer (New York)   – “Hyper drive feature doesn’t work right”
Tech Support (Sydney) – “Have you switched on the feature?”
Customer – “Yes I have”
Tech Support – “Please send the logs from xyz directory”

Day 2

Logs and Installation get verified. Nothing turns up.

Day 3

Developer (Bangalore)  – “Check if the registry entries are present …”
Customer – “Registry what ?”
Tech Support – “We need to look into your box – when can we do it”
Customer – “Can’t let you – but I ordered this *@* software – let me get the permissions”

Day 5

Agree to have a meeting

Day 6

Developer – “Feature seems to be started – What exactly did you modify?”
Translation – It’s 11.45 pm and I don’t feel too bright

Customer – “err .. not sure – can’t you figure it out”
Translation – Hell if I knew – It’s been 6 days – I don’t even remember if I came to work on that day

Tech support – “Let us get back to you”
Translation – This call sucked

Day 7

Customer – Any F@@@@ idea what has happened ???!!!??
Tech support – I want a new job ….
Developer – I have to attend more customer calls. Somebody help …

Sounds familiar ?

How well does this sort of support scale when your software has shipped to thousands of customers and has a good number of bugs? I would say you would have a problem on your hands.

Our product, NetManager, is intended for smaller setups and priced as such, and hence is expected to sell in higher volumes than pricey “enterprise” class software. The rate at which bugs come in may therefore be higher during the early stages, when the tool really starts shipping in volume, and so a higher bug fix rate and faster turnaround time are required.

This thought is what got me to collate the multiple tools I had internally into a single diagnostic tool for use with NetManager. I had been using many scripts (for exporting the device list, getting the DB state for reporting purposes, etc.), which I combined into a single stand-alone program. That’s one reason everyone should be writing automated tools for everything, including unit test cases.

Using a tool gives a better chance of capturing the state of the problem than investigating the issue 3-4 days after it occurred. This might increase the chances of the bug being resolved faster.

State information that cannot be duplicated at a later time, like

1. CPU consumption figures
2. State of the disks
3. The logs
4. The database state / event logs

are captured in a timely fashion. This ensures that the developers have a unique window into the state of the system during the time which the problem was observed.

Then again by using a tool we reduce the loops of communication between the user and tech support which is the other factor that contributes to a long delay in the time it takes to fix the bugs. Information like

1. System configuration
2. Software configuration
3. Startup services
4. patches installed

can be easily captured by an automated tool. No bug should be delayed by requests for such information, which are often needed as a bug fixing session unfolds.
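
To give a flavour of what such a capture looks like, here is a minimal sketch that records a timestamped snapshot of basic system information and disk state using only the Python standard library. It is not the actual NetManager troubleshooting tool; the function names and the choice of fields are assumptions, and items such as CPU figures, logs, database state, services and patches are left out because collecting them is platform specific.

```python
import json
import platform
import shutil
import time


def capture_snapshot(path="."):
    """Collect a small snapshot of the machine at the moment of the problem."""
    usage = shutil.disk_usage(path)
    return {
        "captured_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "system": platform.system(),
        "release": platform.release(),
        "machine": platform.machine(),
        "disk_total_bytes": usage.total,
        "disk_free_bytes": usage.free,
    }


def snapshot_to_json(snapshot):
    """Serialize the snapshot so it can be attached to a support case."""
    return json.dumps(snapshot, indent=2, sort_keys=True)


snapshot = capture_snapshot()
report = snapshot_to_json(snapshot)
```

The point of the timestamp is that the snapshot reflects the system when the problem was seen, not several days later when someone finally asks for it.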

Further, using a tool is so much more professional and builds goodwill. The product also benefits from fewer exchanges between users and tech support, which can often rub users the wrong way depending on how the communication is handled.

On the whole, it’s one more tool to make life easier for everyone involved. Even for me, since I think the cost of supporting the tool is well repaid by the smaller number of support calls I might have to take.

Update: The tool might be included in the next SKU.

Finally, we have managed to bring to pass one more release of our Network Management tool aimed at small and medium enterprises, Cisco NetManager version 1.1

The major features are

  1. A notification filtering feature that can process a large number of events extremely fast and deliver notifications to multiple email aliases / SMTP servers. No more event spam: get notified only about the events you care about, for the devices you need to know about.
  2. New device support has been added – these are
    • CM 6.1, 7.0
    • CM Business edition 6.1, 7.0
    • Unity Connection 7.0
    • IP Comm 2.1, 3.0
    • CUPC 1.2
    • Cisco Unified Communication 500 aka UC 500 aka 1861 ISR
    • New MCS servers – I2, I3 and H2,H3 MCS server families
  3. Code optimizations – 1.0 could poll about 100 devices using a voice-only license in about 2 minutes 20 seconds. 1.1 can do the same in about 1 minute 50 seconds with a combined license, which typically involves about 60-70 extra polling tasks per 100 devices. Some more memory leaks have been plugged too.
  4. Database growth due to log tables and graphed data is more actively contained and rolled up.
  5. The status of certain components, such as fan, temperature and power supply, could be reported wrongly; this is now taken care of.
  6. Auto restart of critical program components upon failure.
  7. Certain device capability discovery and logical view related bugs fixed.
  8. Service status alerting had some bugs which are now fixed.
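
For a sense of scale, the polling figures quoted in item 3 work out as follows. This is just arithmetic on the numbers above and ignores the extra 60-70 polling tasks that 1.1 also handles in that time.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes/seconds pair into seconds."""
    return minutes * 60 + seconds


old = to_seconds(2, 20)  # 1.0, voice-only license, about 100 devices
new = to_seconds(1, 50)  # 1.1, combined license, about 100 devices
reduction = (old - new) / old

print(f"{old}s -> {new}s, {reduction:.1%} less time per polling cycle")
# prints "140s -> 110s, 21.4% less time per polling cycle"
```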

Evaluation versions (90 day) will be available in a couple of weeks’ time.

How to upgrade to NetManager 1.1

  1. Go to http://www.cisco.com
  2. Log in with CCO UserID and Password
  3. Select Ordering on the top navigation bar
  4. Select Ordering Tool
  5. Fill in the required fields and enter the NetManager 1.1 SKU into the ordering tool

If you have issues with orderability or anything else in particular, the quickest way to get a response is through Cisco Customer Service.

We are beginning to think about new features for the next versions and are conducting some polls to determine the same. If you are interested and would like to have a feature added, drop a line in the comments.