DOE/MICS Mid Year Project Report

Date: December 1, 2003

Project Title:  Self-Configuring Network Monitor

Project Type: Base

PIs:  Brian Tierney and Deb Agarwal

__________________________________________________________________

Executive summary

Achieving the goals of high performance distributed data access and computing will require wringing the best possible performance from the networks. Most existing tools for end-to-end tests of network performance provide the end user little or no information from intermediate hops within the network. Without this information, the end-to-end system is unable to identify and diagnose problems within the network.  The goal of this project is to design and implement a self-configuring monitoring system that uses special request packets to automatically activate monitors deployed at the Layer three ingress and egress routers of the ESNet network and within the end site networks. A principal design goal of the system is to provide components that are secure, easy to install, and easy to maintain so that the system does not add a burden to the network’s administration. This architecture will not require modifications to the application or network routing infrastructure. Archived monitoring data will help point the way beyond the handcrafted systems of network testbeds to a production environment that can routinely support high performance distributed applications. This passive monitoring system will integrate with active monitoring efforts and provide an essential component in a complete end-to-end network test and monitoring capability.

 

We now have SCNM hosts installed at LBNL, NERSC, SLAC, and ORNL.

Accomplishments June 1, 2003 to December 1, 2003

 


The focus of the past six months has been on functional enhancements to the core SCNM software and on data analysis. The software architecture has been redesigned to minimize maintenance cost, increase stability, and make it easy to add new functionality. Time control functions, such as when to start and stop monitoring, and deactivation messaging capabilities have been added to the activation control and the capture engine. The capture engine has been updated to work with the latest FreeBSD stable release, version 4.9. BPF has been updated to work with four bonded gigabit Ethernet channels. A number of new statistical functions have been added to the analysis tools fc2xp and fc2xg for network traffic analysis. SCNM revision 2.0 has been released based on the above enhancements.


We used SCNM to assist network troubleshooting on the path between LBNL, NERSC, ORNL, and BNL. In particular, we worked with networking engineers at NERSC and BNL to diagnose performance problems on this path. We will continue this analysis when the SCNM box at BNL is installed.

We used SCNM to support the development of network bandwidth estimation algorithms and tools. We also used SCNM for the study of network transmission protocols.

We are also researching hardware capabilities and strategies for capturing network traffic on ultra high-speed networks, and for using port replication at locations in ESNet where deploying a fiber splitter is not possible.

 

We configured and shipped SCNM monitoring hosts to ANL, BNL, and PSC. These should all be set up soon, providing us with a much larger testbed for collecting SCNM results.

We designed the security model for the "administrator mode", and will start the implementation soon.

We completed the integration of SCNM with NTAF and netarchd, so that SCNM results are collected regularly. This has been very useful for verifying the stability of the SCNM software components, and for providing the data necessary to establish baselines for what is "normal" and what is "abnormal".

We completed initial work on some "automatic" analysis tools, allowing us to generate alarms if we detect behavior that is significantly different from "normal" results.
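As an illustration of the kind of check such a tool might perform, the following is a minimal sketch that flags a new measurement when it deviates strongly from archived baseline values. It is not the SCNM analysis code: the function name `is_abnormal`, the three-standard-deviation threshold, and the sample throughput values are all assumptions made for this example.

```python
from statistics import mean, stdev

def is_abnormal(baseline, value, threshold=3.0):
    """Return True if `value` lies more than `threshold` sample standard
    deviations from the mean of the `baseline` measurements.

    Hypothetical helper for illustration; the threshold is an assumption.
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any different value is a deviation.
        return value != mu
    return abs(value - mu) / sigma > threshold

# Made-up throughput samples (Mbit/s) standing in for archived results.
baseline_mbps = [90.0, 92.0, 91.0, 89.0, 93.0]
for sample in (91.5, 40.0):
    if is_abnormal(baseline_mbps, sample):
        print(f"ALARM: {sample} Mbit/s deviates from the baseline")
```

A simple mean and standard deviation test is only one possible definition of "normal"; a production tool would likely need to account for time-of-day effects and non-Gaussian traffic.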

We began testing the performance limits of SCNM on new hardware platforms, experimenting with bonded gigabit Ethernet, and estimating the potential capability for 10 Gbit Ethernet.
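For a rough sense of the packet rates a capture host must sustain, the sketch below computes the theoretical maximum Ethernet frame rate for a given link speed. These are link-level upper bounds derived from standard Ethernet framing (a 64-byte minimum frame plus 20 bytes of preamble and inter-frame gap), not SCNM measurements; the function name is hypothetical.

```python
# Per-frame overhead on the wire: 8-byte preamble/SFD + 12-byte inter-frame gap.
ETHERNET_OVERHEAD_BYTES = 20

def max_frames_per_second(link_bits_per_second, frame_bytes=64):
    """Theoretical upper bound on frames per second for an Ethernet link.

    `frame_bytes` is the frame size including the FCS (64 is the minimum).
    """
    wire_bits = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return link_bits_per_second / wire_bits

for label, rate in (("1 Gbit/s", 1e9),
                    ("4 x 1 Gbit/s bonded", 4e9),
                    ("10 Gbit/s", 10e9)):
    print(f"{label}: {max_frames_per_second(rate):,.0f} minimum-size frames/s")
```

With minimum-size frames, a single gigabit link peaks at roughly 1.49 million frames per second, so four bonded channels approach 6 million and a 10 Gbit link approaches 15 million. Real traffic with larger frames produces far lower packet rates.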

 

We continue to improve the documentation of all the SCNM components (see: http://www-itg.lbl.gov/Net-Mon/SCNM_hw_sw.html).

 

Future Plans (Next six months)

Research interactions

We are working closely with the Net100 project and the LBNL portion of the Bandwidth Estimation project to coordinate opportunities to monitor their active measurement traffic using the self-configuring network monitors.  We are also working with NERSC, ORNL, SLAC, PSC and ESNet personnel to identify issues and deploy monitors.