DOE/MICS Project Report

Date: March 1, 2004

Project Title:  Self-Configuring Network Monitor

Project Type: Base

PIs:  Brian Tierney and Deb Agarwal

__________________________________________________________________

Executive summary

Achieving the goals of high performance distributed data access and computing will require wringing the best possible performance from the networks. Most existing tools for end-to-end tests of network performance give the end user little or no information about intermediate hops within the network. Without this information, the end-to-end system is unable to identify and diagnose problems within the network. The goal of this project is to design and implement a self-configuring monitoring system that uses special request packets to automatically activate monitors deployed at the Layer 3 ingress and egress routers of the ESNet network and within the end site networks. A principal design goal of the system is to provide components that are secure, easy to install, and easy to maintain, so that the system does not add a burden to the network's administration. This architecture will not require modifications to the application or to the network routing infrastructure. Archived monitoring data will help point the way beyond the handcrafted systems of network testbeds to a production environment that can routinely support high performance distributed applications. This passive monitoring system will integrate with active monitoring efforts and provide an essential component of a complete end-to-end network test and monitoring capability.

 

We now have SCNM hosts installed at LBNL, NERSC, SLAC, PSC, and ORNL, and have shipped hosts to BNL and ANL, where installation is pending.

Accomplishments for period ending March 1, 2004

 


The focus of the past 6 months has been on functional enhancements to the core SCNM software, gathering more data, setting up more sites, and analyzing data. The software architecture has been redesigned to minimize maintenance cost, increase stability, and make it easy to add new functionality. Time control functions, such as when to start and stop monitoring, and deactivation messaging capabilities have been added to the activation control and the capture engine. The capture engine has been updated to work with the latest FreeBSD stable release, version 5.2. BPF has been updated to work with four bonded gigabit Ethernet channels. A number of new statistical functions for network traffic analysis have been added to the analysis tools fc2xp and fc2xg.


We have been using SCNM to assist with network troubleshooting on the paths among LBNL, NERSC, ORNL, and BNL. In particular, we have been working with network engineers at NERSC and BNL to diagnose performance problems on these paths. We have shipped an SCNM box to BNL and are waiting for BNL to finish the installation. In the meantime, we have been using the SCNM box at NERSC to help analyze traffic to and from BNL, and we will continue this analysis once the SCNM box at BNL is installed.

We have been using SCNM to assist in the design and testing of a new network bandwidth estimation algorithm and in tool development. SCNM is also being used for research on network transmission protocols. A pre-proposal for this work has been submitted to DOE.

We are also researching hardware capabilities and strategies for capturing network traffic on ultra high-speed networks, and the use of port replication at locations in ESNet where deploying a fiber splitter is not possible. We have implemented an updated BPF infrastructure that allows us to use multiple bonded gigabit Ethernet channels to increase the aggregate amount of traffic we can monitor.

We have also enhanced SCNM to do analysis on UDP packet streams in addition to traditional TCP streams, allowing for much more flexibility in our application monitoring.

We have enhanced the GUI to display multiple SCNM capture files of the same stream, so that one can view the same stream from different vantage points in the network.

 

We have configured and shipped additional SCNM monitoring hosts to ANL, BNL, and PSC. The hosts at ANL and BNL should both be set up soon, providing us with a much larger testbed for collecting SCNM results.

The SCNM box at PSC has been set up and is being actively used for testing and profiling applications over several networks.

We have started to implement the security model for the "administrator mode".

We have been using the SCNM data collected in NTAF to verify the stability of the SCNM software components, and to provide the data necessary to compute baselines for what could be considered "normal" and what might be considered "abnormal".
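
As an illustration only, the sketch below shows one simple way a baseline could be computed from archived measurements and used to label new observations. The report does not describe the method actually used; the mean and standard deviation rule, the threshold k, the function names, and the sample throughput values are all assumptions made for this example.

```python
from statistics import mean, stdev


def compute_baseline(samples):
    """Return (mean, sample standard deviation) of historical measurements."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate spread")
    return mean(samples), stdev(samples)


def is_abnormal(value, baseline, k=3.0):
    """Flag a value that lies more than k standard deviations from the mean.

    The k-sigma rule and the default k=3.0 are assumptions for this sketch.
    """
    mu, sigma = baseline
    if sigma == 0:
        # All historical samples were identical; any deviation is unusual.
        return value != mu
    return abs(value - mu) > k * sigma


# Hypothetical archived throughput samples in Mbit/s (not real SCNM data).
history_mbps = [412.0, 398.5, 405.2, 420.1, 401.7, 409.9]
baseline = compute_baseline(history_mbps)

for observed in (407.0, 150.0):
    label = "abnormal" if is_abnormal(observed, baseline) else "normal"
    print(f"{observed:.1f} Mbit/s -> {label}")
```

With these made-up samples, an observation near the historical mean is labeled normal and a sharp drop is labeled abnormal. A robust statistic such as the median would be a reasonable alternative if the archive contains outliers.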

We continue to test SCNM performance limitations on various hardware platforms, to experiment with bonded gigabit Ethernet, and to investigate the potential of 10 Gbit Ethernet.

 

We continue to improve the documentation of all the SCNM components (see: http://www.dsd.lbl.gov/Net-Mon/SCNM_hw_sw.html) and now have several papers about our work on the website.

 

Future Plans (Final 3 months)

Research interactions

We are working closely with the Net100 project and the LBNL portion of the Bandwidth Estimation project to coordinate opportunities to monitor their active measurement traffic using the self-configuring network monitors. We are also working with NERSC, ORNL, SLAC, PSC, BNL, and ESNet personnel to identify issues and deploy monitors at additional locations.