Computerworld

SLA Enforcement Tools to the Rescue

FRAMINGHAM (04/03/2000) - If Michael Jordan, Bugs Bunny, Candice Bergen and Paul Reiser advertised a nickel-a-minute rate for T-1 and high-bandwidth frame relay connections, you wouldn't have to constantly monitor your remote links to make sure you're getting your money's worth from your WAN provider. You could afford to lease extra lines for increased capacity, quicker transaction response times and backup purposes. Unfortunately, linking remote sites via T-1 or frame relay is still a relatively expensive proposition.

Until linking remote sites becomes a reliable and cheap process, you need monitoring tools to verify you're getting the service levels your WAN provider promised in exchange for your WAN dollars. Typically, a service-level agreement (SLA) between you and a provider identifies the quality of service (QoS) you should expect. SLAs are written contracts guaranteeing an uptime percentage and minimum bandwidth for specified IT-based business processes such as e-mail, groupware, e-commerce and industry-specific business applications.

Your WAN provider may show its compliance by sending you monthly or even weekly reports. However, historical logs aren't enough. When the inevitable yet unexpected slowdowns or intermittent outages occur, quickly identifying a broken or sick WAN link is the first step toward getting it fixed. Frame relay places a special burden on monitoring systems because a single physical interface may fan out to many remote sites. From this perspective, frame relay's complexity cries out for management. Moreover, understanding its complexity and arcane terminology takes considerable effort. For example, a Data Link Connection Identifier (DLCI), which defines a permanent virtual circuit (PVC) end point, denotes each remote site. When a frame relay network becomes congested, it sets the Backward Explicit Congestion Notification (BECN) bit in frames traveling back toward the transmitting device, telling that device to initiate congestion-avoidance procedures. Each PVC may have unique bandwidth (for example, committed information rate, or CIR), priority and QoS metrics.

To help you find the best tool, we invited vendors to submit SLA monitoring products for this review. We specified that a tool must be able to monitor WAN links in a heterogeneous environment across a variety of vendors' hardware devices. The product has to work independently of any underlying systems management framework, although integration with that framework would be a plus.

Visual Networks Inc. sent us two DSU/CSUs, an Internet probe and a Dell OptiPlex G1 computer preloaded with Visual UpTime 5.2.3 software. Concord Communications Inc. submitted its Network Health software with FR Module 4.5.1, and Lucent supplied us with its VitalSuite 7.1 software. NetReality Inc. delivered a WiseWAN 200 probe and WiseWAN 3.1.1 software. Paradyne Corp. shipped its OpenLane 5.1 software and two FrameSaver SLV DSU/CSUs.

In addition, NetScout provided us with two NetScout probes and NetScout Manager Plus software. We used the probes in our tests of the other products and along the way examined the NetScout Manager Plus reports to verify the current configuration and health of our network. We didn't grade NetScout Manager Plus because it's not an SLA monitoring tool but rather a general-purpose network monitoring tool that works with NetScout probes and other vendors' SNMP devices to produce useful network status reports.

Many of the software vendors provide interfaces to each other's reporting modules, and many of the hardware vendors have partnered with the software vendors to increase the number of ways customers can view and manipulate a hardware device's statistics. For example, Paradyne will put a copy of NetScout Manager Plus in the box with its DSU/CSUs if you wish. NetScout recommends the use of Paradyne's DSU/CSUs with its software (as well as the company's own probe hardware, of course), and NetReality's WiseWAN software can export data into Concord's Network Health.

With its soup-to-nuts approach to WAN monitoring, Visual UpTime proved the best SLA monitoring tool in our evaluation and takes home our Blue Ribbon Award.

Other tools, such as Lucent's VitalSuite and Concord's Network Health, were better at overall network monitoring. But strictly from an SLA-compliance standpoint, Visual UpTime is our clear choice for keeping WAN links up and running at the lowest possible cost.

Private Network reporting for duty, sir

Visual UpTime's Visual Service Advisor component gave us exactly the data we needed to track the SLA compliance of our WAN links. Its display of round-trip delay, throughput and availability statistics was timely and incredibly accurate. Visual UpTime is the perfect tool for understanding WAN link bandwidth usage, determining SLA compliance, isolating WAN link faults and monitoring frame relay expenses.

Visual Service Advisor presented current and 14-day moving averages for several key SLA-compliance criteria, such as frame delivery (the ratio of successfully delivered frames to total frames offered, excluding traffic offered above the excess burst rate); elapsed time for a test packet to traverse the network; PVC uptime; and PVC throughput (delivery success above and below the CIR).
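The frame delivery metric is a simple ratio, and the 14-day figure is a moving average of daily values. The sketch below shows both calculations. It assumes the per-day counters already exclude traffic offered above the excess burst rate, and the function and parameter names are hypothetical; Visual UpTime's own calculation may differ in detail.

```python
from collections import deque


def frame_delivery_ratio(offered_within_be: int, delivered_within_be: int) -> float:
    """Delivered frames / offered frames, counting only traffic at or below the excess burst rate."""
    if offered_within_be == 0:
        return 1.0  # assumption: an interval with no eligible traffic counts as full delivery
    return delivered_within_be / offered_within_be


class MovingAverage:
    """Mean of the most recent `window` daily values (14 days by default)."""

    def __init__(self, window: int = 14):
        self.values = deque(maxlen=window)

    def add(self, value: float) -> float:
        """Record today's value and return the updated moving average."""
        self.values.append(value)
        return sum(self.values) / len(self.values)


avg = MovingAverage()
for offered, delivered in [(1000, 990), (1000, 1000), (800, 792)]:
    today = frame_delivery_ratio(offered, delivered)
    print(f"today {today:.3f}, 14-day average {avg.add(today):.3f}")
```

The bounded deque drops the oldest day automatically, so the average always covers at most the last 14 entries.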

Visual UpTime's reporting of real-time and historical metrics by PVC provided just the right level of operational detail. Additionally, a well-chosen summary of those metrics gave us an executive summary we'd be proud to hand to a CIO or CEO. Because Visual UpTime stores network device data, thresholds and statistics in the included Microsoft SQL Server 7.0 database, we were able to create additional customized reports for our own use by simply invoking standard query tools to mine Visual UpTime's database.
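Mining the database for a custom report is ordinary SQL. The following sketch is only illustrative: the `pvc_stats` table and its columns are hypothetical, not Visual UpTime's real schema, and it uses Python's built-in SQLite module as a stand-in for a connection to SQL Server 7.0.

```python
import sqlite3

# Hypothetical schema standing in for Visual UpTime's SQL Server tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pvc_stats (dlci INTEGER, day TEXT, availability_pct REAL)")
conn.executemany(
    "INSERT INTO pvc_stats VALUES (?, ?, ?)",
    [(16, "2000-03-01", 99.98), (16, "2000-03-02", 99.50),
     (17, "2000-03-01", 100.0), (17, "2000-03-02", 99.99)],
)


def pvcs_below(conn, threshold_pct):
    """Return (dlci, average availability) for each PVC whose average falls below the threshold."""
    return conn.execute(
        """SELECT dlci, AVG(availability_pct) AS avg_avail
           FROM pvc_stats
           GROUP BY dlci
           HAVING AVG(availability_pct) < ?
           ORDER BY dlci""",
        (threshold_pct,),
    ).fetchall()


print(pvcs_below(conn, 99.9))  # PVCs missing a hypothetical 99.9 percent SLA target
```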

The product also offers full packet capture and protocol decode as well as Visual Burst Advisor, a useful bandwidth planning feature. Visual UpTime did a superior job of helping us correctly determine the bandwidth we needed for our WAN links. With a fine granularity that let us see and account for even the shortest bursts of activity, it accurately measured WAN usage to show our peaks and valleys of data transmission as a percentage of each port's speed and each PVC's CIR. Visual Burst Advisor used this information to display a recommendation for the bandwidth we should have for each WAN link.
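Expressing each interval's load as a percentage of both port speed and CIR is straightforward arithmetic. Visual Burst Advisor's actual recommendation logic isn't described, so the `recommend_cir` rule below (a nearest-rank 95th percentile of observed rates plus 20 percent headroom) is a hypothetical stand-in, shown only to illustrate how fine-grained samples can feed a sizing decision.

```python
import math


def utilization(bits_per_interval, interval_s, port_bps, cir_bps):
    """Return per-interval load as percentages of port speed and of the PVC's CIR."""
    rates = [bits / interval_s for bits in bits_per_interval]
    port_pct = [100 * r / port_bps for r in rates]
    cir_pct = [100 * r / cir_bps for r in rates]
    return port_pct, cir_pct


def recommend_cir(bits_per_interval, interval_s, percentile=95, headroom=1.2):
    """Hypothetical sizing rule: nearest-rank percentile of observed rates, plus headroom."""
    rates = sorted(bits / interval_s for bits in bits_per_interval)
    rank = max(1, math.ceil(percentile / 100 * len(rates)))
    return rates[rank - 1] * headroom


samples = [64_000, 128_000, 256_000, 32_000]  # bits seen in each 1-second interval
# 1,536,000 bit/s is the T-1 payload rate (24 x 64 kbit/s channels).
port_pct, cir_pct = utilization(samples, 1, port_bps=1_536_000, cir_bps=128_000)
print(cir_pct, recommend_cir(samples, 1))
```

Short intervals matter here: averaging the same traffic over a minute would hide the interval that ran at 200 percent of CIR.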

Measuring and analyzing WAN delay is one of Visual UpTime's strong suits. On a per-circuit basis, the WAN Delay tool determined end-to-end delay times in a nonintrusive manner. From one WAN link telephone company network interface to the other, the tool separated the customer premises equipment latency from the WAN latency to reveal the exact WAN delay we experienced in our WAN links.
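Separating equipment latency from network latency amounts to subtracting the time the frame spends inside the customer premises equipment at each end. This is a minimal sketch of that decomposition. It assumes the near-end and far-end equipment latencies are known, and that the path is symmetric when it halves the result into a one-way estimate; the function name and parameters are hypothetical.

```python
def wan_delay_ms(round_trip_ms, near_cpe_ms, far_cpe_turnaround_ms):
    """Remove customer-premises latency from a round-trip measurement.

    round_trip_ms: send-to-reply time for a test frame, measured at the near-end CPE
    near_cpe_ms: latency inside the near-end equipment (both directions combined)
    far_cpe_turnaround_ms: time the far-end equipment held the frame before replying
    Returns (wan_round_trip_ms, estimated_one_way_ms); one-way assumes a symmetric path.
    """
    wan_rtt = round_trip_ms - near_cpe_ms - far_cpe_turnaround_ms
    if wan_rtt < 0:
        raise ValueError("CPE latency exceeds the measured round trip")
    return wan_rtt, wan_rtt / 2


print(wan_delay_ms(82.0, 4.0, 6.0))
```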

Finding the OpenLane

Paradyne's OpenLane is an excellent Java-based SLA monitoring tool if you pair Paradyne's FrameSaver SLV DSU/CSUs at both ends of a WAN link, but it's limited in its ability to detect and display detail from other vendors' frame relay equipment. In our tests, OpenLane's Network Navigator component correctly discovered the FrameSaver DSU/CSUs as well as the NetScout probes and Visual DSU/CSUs. However, for non-Paradyne devices, OpenLane's accuracy suffered. For example, the detailed Local Management Interface statistics report, Link Integrity report and PVC Congestion report didn't show counts of latency problems, signaling errors, frame errors or dropped frames for the Visual DSU/CSUs.

When used in pairs, the FrameSaver DSU/CSUs intelligently coordinate with each other to measure and record SLA-significant data such as latency and dropped packets. The units monitor the active frame relay link, out of band, to ensure that what's sent is exactly what's received, and they note the slightest discrepancy. (A WAN link with different vendors' equipment at each site doesn't yet offer this level of detail.) The paired devices shared the information with each other, which let us monitor the link from either endpoint, and they made it available to tools such as OpenLane and NetScout Manager Plus.
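The value of a matched pair is that each end can compare what it sent with what the other end actually received. The sketch below shows only that comparison. It assumes both units sample their counters over the same interval; the `PvcCounters` fields are hypothetical and don't describe how FrameSaver units exchange data.

```python
from dataclasses import dataclass


@dataclass
class PvcCounters:
    """Frame counters one endpoint reports for a single PVC over one interval (hypothetical fields)."""
    frames_sent: int
    frames_received: int


def dropped_frames(near: PvcCounters, far: PvcCounters) -> dict:
    """Frames lost in each direction: what one end sent minus what the other end received."""
    return {
        "near_to_far": near.frames_sent - far.frames_received,
        "far_to_near": far.frames_sent - near.frames_received,
    }


print(dropped_frames(PvcCounters(1000, 495), PvcCounters(500, 997)))
```

With equipment from one vendor at only one end, only half of each subtraction is available, which is why mixed-vendor links yield less detail.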

Additionally, we were able to load SLA performance parameters directly into the FrameSaver SLV units. With OpenLane, we viewed the DSU/CSUs' transmitted alerts for each specific WAN link. With rather fine granularity, each device tracked traffic against the CIR packet by packet and recorded SLA-aware PVC statistics for both directions of the link, which OpenLane also displayed.
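Tracking traffic against the CIR usually follows the standard frame relay rate model: the committed burst Bc equals CIR multiplied by the measurement interval Tc, and the excess burst Be is allowed on top of it. The sketch below classifies each frame under that general model. It is not FrameSaver's internal implementation, and it assumes frames arrive sorted by timestamp and that every offered frame, including those above Bc + Be, counts toward the interval's total.

```python
def classify_frames(frames, cir_bps, be_bits, tc_s=1.0):
    """Classify frames against the frame relay committed/excess burst model.

    frames: list of (timestamp_s, size_bytes), sorted by time.
    Bc (committed burst) = CIR x Tc. Within each Tc interval, cumulative traffic up to Bc is
    'committed', up to Bc + Be is 'excess' (eligible for DE marking), and beyond that 'above'.
    """
    bc_bits = cir_bps * tc_s
    results = []
    interval, used_bits = None, 0
    for ts, size in frames:
        current = int(ts // tc_s)
        if current != interval:  # each new Tc interval resets the burst allowance
            interval, used_bits = current, 0
        used_bits += size * 8
        if used_bits <= bc_bits:
            results.append("committed")
        elif used_bits <= bc_bits + be_bits:
            results.append("excess")
        else:
            results.append("above")
    return results


# 64 kbit/s CIR with Tc = 1 s gives Bc = 8,000 bytes; Be = 32,000 bits = 4,000 bytes.
frames = [(0.1, 4000), (0.2, 4000), (0.3, 2000), (0.4, 3000), (1.5, 1000)]
print(classify_frames(frames, cir_bps=64_000, be_bits=32_000))
```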

The FrameSaver unit includes frame relay diagnostic tools, such as a nondisruptive PVC loopback for testing and verifying DLCI configurations. The unit offered a direct interface to Concord's Network Health software package, which let us combine these reviewed products in yet more interesting ways.

For storing network device, threshold and statistical information, Paradyne bundles the CloudScape relational DBMS with OpenLane. It can also interface with an Oracle database. Both storage mechanisms worked well in our tests, with Oracle8i naturally offering greater scalability.

Scheduling OpenLane or NetScout Manager Plus reports also satisfies the daily need to download FrameSaver's 24-hour statistics buffer to avoid losing its accumulated performance and utilization data.
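The daily download matters because a fixed-size statistics buffer overwrites its oldest entries once it fills. This small model shows that effect. The 15-minute sample size (96 samples per 24 hours) is an assumption, since the review doesn't say how FrameSaver divides its buffer.

```python
from collections import deque


class StatsBuffer:
    """Keeps only the newest `capacity` samples; older samples are silently overwritten."""

    def __init__(self, capacity=96):  # hypothetical: 15-minute samples covering 24 hours
        self.samples = deque(maxlen=capacity)

    def record(self, sample):
        self.samples.append(sample)

    def download(self):
        """Return and clear the buffered samples, as a scheduled daily report would."""
        data = list(self.samples)
        self.samples.clear()
        return data


buf = StatsBuffer(capacity=4)
for hour in range(6):  # six samples arrive before anyone downloads
    buf.record(hour)
print(buf.download())  # only the newest four survive; samples 0 and 1 are gone
```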

Complex equations

Lucent's VitalSuite is complex software for monitoring complex networks. Like Network Health, it's a general-purpose network-monitoring tool that includes components for tracking SLA compliance. The suite consists of VitalNet, VitalAnalysis, VitalHelp and VitalAgent.

VitalNet gathers information from SNMP-based devices and from desktop machines on which you've installed VitalAgent, and relays it to VitalAnalysis and VitalHelp.

VitalAnalysis performs service-level monitoring and historical analysis of system and application performance and trends. It maintains a year's worth of data in the included Sybase database or, optionally, in a Microsoft SQL Server database you buy separately. VitalHelp assesses the health of TCP/IP-based applications. When it determines the cause of a problem, it posts alerts to a network administrator.

The Network Heat Chart is a VitalSuite tool that's particularly useful for tracking SLA compliance. A historical report of availability and response time data, the Heat Chart provides a visual, high-level summary of network quality for five VitalNet resource classes: routers, WANs, LANs, frame relay links and ATM links. The report shows the performance of devices within each resource class, characterized by availability, utilization, congestion and errors. Each Heat Chart cell corresponds to a resource class and a performance metric. As with WiseWAN's WanXplorer, Heat Chart cells change color depending on the health of the underlying resources that make up the corresponding resource class.
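A heat chart reduces to a grid keyed by resource class and metric, with each cell colored by threshold. The following sketch is a hypothetical model, not VitalSuite's algorithm. The warning and critical thresholds are invented, each cell takes the color of its worst device, and availability is omitted because lower values are worse for it and it would need inverted thresholds.

```python
# Hypothetical (warning, critical) thresholds; higher values are worse for these metrics.
THRESHOLDS = {"utilization": (70, 90), "congestion": (1, 5), "errors": (0.1, 1.0)}
STATUS_ORDER = ["green", "yellow", "red"]


def status(metric, value):
    """Map one device's metric value to a color."""
    warn, crit = THRESHOLDS[metric]
    if value >= crit:
        return "red"
    if value >= warn:
        return "yellow"
    return "green"


def heat_chart(devices):
    """devices: list of (resource_class, {metric: value}). Each cell shows its worst device."""
    chart = {}
    for resource_class, metrics in devices:
        for metric, value in metrics.items():
            key = (resource_class, metric)
            old = chart.get(key, "green")
            chart[key] = max(old, status(metric, value), key=STATUS_ORDER.index)
    return chart


print(heat_chart([("frame relay", {"utilization": 95, "errors": 0.0}),
                  ("frame relay", {"utilization": 50, "errors": 0.5}),
                  ("routers", {"utilization": 75})]))
```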

Health check

If you like customizable, flexible and useful reports on network activity, Concord's Network Health is for you. The product excelled at discovering all devices on our network, and its frame relay module efficiently and accurately collected network statistics from the DSU/CSUs in our WAN links. Automatically analyzing each WAN circuit for traffic congestion and packet discards, the frame relay module recorded SLA-sensitive activity and stored the result in the Computer Associates OpenIngres database bundled with Network Health.

Network Health's at-a-glance reports provided an overview of the WAN links using thumbnail graphs. Clicking on each graph let us drill down for more detail. More useful reports on trends and exceptions were available after we had run Network Health over time to generate baseline data. The reports made useful comparisons between current data and the accumulated baseline. For consistency, each report compared data against the baseline for the same time of day, the same day of the week and the same class of devices. After Network Health's first poll of the network, we told it the speeds of the frame relay devices, including CIR and burst rates.
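Comparing like with like means keying the history by device class, day of week and hour of day. This sketch shows that bookkeeping. The class and method names are hypothetical, and it assumes a plain mean of past values for each slot; Network Health's actual baselining method isn't described in the review.

```python
from collections import defaultdict
from datetime import datetime


class Baseline:
    """Accumulates history keyed by device class, day of week and hour of day."""

    def __init__(self):
        self.history = defaultdict(list)

    @staticmethod
    def key(device_class, when):
        return (device_class, when.weekday(), when.hour)

    def add(self, device_class, when, value):
        self.history[self.key(device_class, when)].append(value)

    def deviation_pct(self, device_class, when, value):
        """Percent difference from the mean of the matching slot, or None if there is no history."""
        past = self.history.get(self.key(device_class, when))
        if not past:
            return None
        mean = sum(past) / len(past)
        return 100 * (value - mean) / mean if mean else None


b = Baseline()
b.add("frame relay", datetime(2000, 3, 6, 9), 40.0)   # Monday, 9 a.m.
b.add("frame relay", datetime(2000, 3, 13, 9), 60.0)  # the following Monday, 9 a.m.
print(b.deviation_pct("frame relay", datetime(2000, 4, 3, 9), 75.0))
```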

Thereafter, monitoring for SLA compliance was just a matter of scheduling the reports we wanted to see. The service-level reports combined and presented daily and long-term reports for various user levels, ranging from an IT manager to an executive. Each report provided generalized and specific information about the WAN links.

Shaping up with WiseWAN

In addition to monitoring WAN links, NetReality's WiseWAN software and WiseWAN 200 probe can shape them. Shaping is shorthand for WAN traffic control and bandwidth allocation based on settable parameters. The parameters express how much bandwidth the probe should make available to different kinds of traffic.

The WiseWAN 200 probe uses the parameters to line up, prioritize, sort and retransmit packets so more important traffic comes out of the unit first. By specifying corporate policies regarding which application's network packets (identified by protocol, packet type and port) are important, we were able to dynamically prioritize traffic flow. For example, we were able to force less-important packets, such as Microsoft Exchange e-mail traffic, to take a back seat while allowing critical application traffic, such as SAP R/3 or Oracle database transactions, to always have a green light - that is, be the first packets to cross the WAN link. WiseWAN 200's Adaptive Circuit-based Shaping algorithm detected intervals of congestion in the high-speed link and managed traffic accordingly. We configured the probe with NetReality's Java-based WanXplorer client software. Thereafter, WiseWAN 200 automatically found the available DLCIs and began controlling the flow of WAN traffic. Of course, the shaping feature is only effective if your bursts of traffic are a mixture of low- and high-priority packets.
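Policy-based prioritization of this kind can be modeled as strict priority queuing: classify each packet by port, then always send from the highest-priority queue first while keeping arrival order within a priority level. The sketch below uses that simple technique and makes no attempt to reproduce NetReality's proprietary Adaptive Circuit-based Shaping algorithm or any rate limiting. The port-to-priority policy is hypothetical.

```python
import heapq
import itertools

# Hypothetical policy: destination TCP port -> priority (lower number leaves first).
# 1521 is Oracle's default listener port, 3200 an SAP dispatcher port, 25 is SMTP mail.
POLICY = {1521: 0, 3200: 0, 25: 2}
DEFAULT_PRIORITY = 1


class PriorityShaper:
    """Queues packets by policy priority; FIFO order is kept within one priority level."""

    def __init__(self, policy, default=DEFAULT_PRIORITY):
        self.policy = policy
        self.default = default
        self.queue = []
        self.counter = itertools.count()  # tie-breaker that preserves arrival order

    def enqueue(self, dst_port, payload):
        prio = self.policy.get(dst_port, self.default)
        heapq.heappush(self.queue, (prio, next(self.counter), payload))

    def dequeue(self):
        """Return the next packet to transmit, or None if the queue is empty."""
        return heapq.heappop(self.queue)[2] if self.queue else None


s = PriorityShaper(POLICY)
for port, name in [(25, "mail-1"), (80, "web"), (1521, "oracle"), (25, "mail-2"), (3200, "sap")]:
    s.enqueue(port, name)
print([s.dequeue() for _ in range(5)])
```

As the review notes, reordering only helps when a burst mixes priorities; a queue holding nothing but mail traffic drains in plain arrival order.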

WiseWAN's standard reports show the health of the WAN link, top DLCIs and DLCI utilization. Network protocol distribution reports reveal the relative traffic levels of WAN protocols. WiseWAN's History reports show activity for longer intervals, while the Typical reports present daily or weekly averages for simple trend analysis. The primary SLA reports are Line Availability, SLA Breaches Summary and SLA Breaches Details. Other SLA-related reports include Line Statistics, DLCI Traffic by Bandwidth Consumption, PVC by CIR Load, DLCI Performance and Response Times. Because the reports relied on data from the WiseWAN 200 probe, they, like OpenLane's reports, contained less data and were less useful for circuits managed by other vendors' equipment.

WiseWAN's alarm feature can be set to notify you, for example, when WAN link congestion occurs, a link fails entirely or link utilization increases beyond a threshold you configure.
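Threshold alarms on link utilization are commonly built with separate rising and falling thresholds so that a value hovering near one limit doesn't flood the console with alerts. This sketch follows the style of the RMON alarm group and is offered as an assumption about how such alarms typically work, not as WiseWAN's documented behavior.

```python
class ThresholdAlarm:
    """Rising/falling threshold alarm with hysteresis, in the style of the RMON alarm group.

    A rising event fires when the value reaches rising_threshold; no further rising event
    fires until the value drops to falling_threshold, which fires a falling event.
    """

    def __init__(self, rising_threshold, falling_threshold):
        if falling_threshold >= rising_threshold:
            raise ValueError("falling threshold must be below rising threshold")
        self.rising = rising_threshold
        self.falling = falling_threshold
        self.armed_for = "rising"

    def sample(self, value):
        """Feed one utilization sample; return 'rising', 'falling' or None."""
        if self.armed_for == "rising" and value >= self.rising:
            self.armed_for = "falling"
            return "rising"
        if self.armed_for == "falling" and value <= self.falling:
            self.armed_for = "rising"
            return "falling"
        return None


alarm = ThresholdAlarm(rising_threshold=80, falling_threshold=60)  # percent utilization
print([alarm.sample(v) for v in [50, 85, 90, 70, 55, 82]])
```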

WiseWAN identifies and summarizes the different protocols flowing through a WAN link, but it doesn't capture or decode packets. It can export data directly into Concord's Network Health, and it stores network device data, thresholds and statistics in the bundled Sybase database.

Administering the tools

Visual UpTime measures availability on a continuous basis, with a resolution of 1 second. Furthermore, its calculation of round-trip delays excludes router serialization and insertion delay, and thus is a true measure of network delay for each PVC. We even found that for the sake of accuracy, we could exclude scheduled maintenance periods from Visual UpTime's calculations of uptime and bandwidth utilization.
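Excluding scheduled maintenance from an availability figure means removing those seconds from both the numerator and the denominator. The sketch below shows the calculation at 1-second resolution. It assumes outages are recorded as integer timestamps and maintenance windows as half-open (start, end) pairs; these input formats are hypothetical.

```python
def availability_pct(down_seconds, period_start, period_end, maintenance=()):
    """Availability at 1-second resolution, excluding scheduled maintenance windows.

    down_seconds: iterable of integer timestamps (seconds) when the PVC was down.
    maintenance: iterable of half-open (start, end) windows removed from the calculation.
    """
    def in_maintenance(t):
        return any(start <= t < end for start, end in maintenance)

    measured = [t for t in range(period_start, period_end) if not in_maintenance(t)]
    if not measured:
        return None
    down = set(down_seconds)
    up = sum(1 for t in measured if t not in down)
    return 100 * up / len(measured)


outages = {150, 160, 500, 501}
print(availability_pct(outages, 0, 1000))                          # maintenance outages count
print(availability_pct(outages, 0, 1000, maintenance=[(100, 200)]))  # maintenance excluded
```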

We were impressed by how easily we could configure Visual UpTime to automatically collect, interpret and present SLA management information for each of our WAN links. However, drilling down to see probe-level detail was easier with NetReality's WanXplorer and Lucent's VitalSuite interfaces.

VitalSuite organizes its network usage reporting in three views: Business, Network and Reports. You can configure the Business view as either My Vital or My Business, each offering a different way to look at performance metrics drawn from application and network statistics.

The Network view groups tab-indexed information into Router, WAN, LAN, ATM, Availability/Response Time, Servers and Other (such as Remote Monitoring [RMON] statistics) categories. Each tab index displays device statistics such as speed, average utilization, peak utilization, errors and discards. Clicking on a column heading sorted the table by that statistic, which made it easy to identify problems.

The Reports view is a high-level menu of available reports, categorized by job description. These descriptions include management, application monitoring, network monitoring and capacity planning. To show network usage trends, VitalSuite's planning report uses a simple trending arrow, pointing up or down, along with the current average and the one-month, three-month, six-month and one-year utilizations.
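The planning report's figures are windowed averages over the same daily history. This is a rough sketch of that summary. The window lengths are approximated as 30, 90, 180 and 365 days, and the arrow rule (one-month average versus one-year average) is a hypothetical choice, since the review doesn't say how VitalSuite decides the arrow's direction.

```python
def trend_summary(daily_utilization):
    """Summarize daily average utilization (oldest first, most recent last).

    Windows are approximated as 30, 90, 180 and 365 days (an assumption). The arrow
    compares the one-month average with the one-year average (also an assumption).
    """
    def avg(days):
        window = daily_utilization[-days:]
        return sum(window) / len(window)

    summary = {
        "current": daily_utilization[-1],
        "1 month": avg(30),
        "3 months": avg(90),
        "6 months": avg(180),
        "1 year": avg(365),
    }
    summary["trend"] = "up" if summary["1 month"] > summary["1 year"] else "down"
    return summary


history = [10.0] * 335 + [40.0] * 30  # a year of daily utilization, busier last month
print(trend_summary(history))
```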

VitalSuite's user interface is intuitive and easy to use. Once we became familiar with VitalSuite's many options and features, dealing with its complexity was not a problem. The My Vital personal Web page is highly configurable and password-protected, so you can restrict who views and configures it.

VitalSuite's folder metaphor let us store icons, shown as thumbnail graphs, representing devices we wanted to monitor. The icons themselves are a quick overview of critical interface and device performance. Clicking on an icon called up detailed statistics on that device.

Like Concord's Network Health, VitalSuite groups devices by type (router, WAN, LAN, server, frame relay link or ATM link). It also lets you create custom groups, perhaps based on IP address or device name, and those groups can overlap. We used this handy device-grouping feature to examine our network from multiple perspectives. VitalSuite's groups are dynamic, easy to maintain and subject to user permissions you can grant or deny.

Network Health's reporting tool let us choose from myriad ways to present and relate the WAN link statistics, including Health reports, Service Level reports, At-A-Glance reports, Trend reports, Top N reports and Traffic Accountant reports. The Traffic Accountant reports provide a great view of network nodes and probes. Via a Web browser, we accessed the tool's easy interface to create circuit-specific presentations of uptime and bandwidth utilization. We then quickly created impressive executive summary reports that showed an aggregate picture of our WAN links. With just a few additional mouse clicks, we easily exported our statistics into Microsoft Excel. Additionally, Network Health's Java-based LiveTrend reporting component for real-time display of just-polled performance data is an impressive tool. Like NetScout Manager Plus and VitalSuite, Network Health's usefulness extends far beyond monitoring for SLA compliance.

Via SNMP, Network Health polled our routers, smart hubs and switches, DSU/CSUs and RMON probes in an especially bandwidth-frugal manner. The product let us configure the polling rate (fast, medium or slow) for each network element. In most cases, only two small packets totaling about 250 bytes traversed the network during a poll - Network Health's SNMP request and the agent's response.
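At roughly 250 bytes per poll, the WAN cost of polling is easy to estimate. The sketch below does that arithmetic. The fast, medium and slow intervals are hypothetical placeholders, because the review doesn't give Network Health's actual polling periods.

```python
# Hypothetical polling intervals in seconds; Network Health's real values are not given here.
POLL_INTERVAL_S = {"fast": 60, "medium": 300, "slow": 900}
BYTES_PER_POLL = 250  # SNMP request plus agent response, as measured in the review


def polling_load_bps(elements):
    """elements: dict mapping a polling rate name to the number of elements polled at that rate."""
    return sum(
        count * BYTES_PER_POLL * 8 / POLL_INTERVAL_S[rate]
        for rate, count in elements.items()
    )


print(polling_load_bps({"fast": 30, "medium": 60}))  # average bits per second of polling traffic
```

Under these assumed intervals, 90 polled elements add about 1.4 kbit/s, a small fraction of even a 64 kbit/s CIR.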

Network Health somewhat eases the process of identifying network devices by letting you categorize network elements by class or IP address grouping, and performs a discovery process to find those elements on the network. More importantly, Network Health was able to understand and interpret every vendor's Management Information Base that we asked it to examine.

Through its well-designed WanXplorer user interface, NetReality's WiseWAN displays WAN topology information in a tree view that makes selecting and working with particular WAN links a breeze. We found we could move objects via drag-and-drop, sort columns of data by clicking on the column header and right-click to display the consistent and intuitive pop-up menus. Best of all, WanXplorer color codes currently set alarms to show a rising status (red) or a falling status (gray).

WanXplorer features a range of reporting options. Real-time reports show network events soon after they occur, based on the polling of each remote probe every 60 seconds. The ability to pause, rewind and replay a real-time report was helpful in our troubleshooting of WAN link errors.

The WiseWAN reporting tool has several different chart styles you can choose, but it doesn't offer the sophisticated trend tracking found in the other products we reviewed.

All the products were easy to install, came with adequate documentation and integrated in our tests with Hewlett-Packard's OpenView.

Nance, a software developer and consultant for 29 years, is the author of Introduction to Networking, 4th Edition and Client/Server LAN Programming. You can contact him at barryn@erols.com.