DOE/MICS Mid Year Project Report

Date: June 1, 2003

Project Title:  Self-Configuring Network Monitor

Project Type: Base

PIs:  Brian Tierney and Deb Agarwal

__________________________________________________________________

Executive summary

Achieving the goals of high performance distributed data access and computing will require wringing the best possible performance from the networks. Most existing tools for end-to-end tests of network performance provide the end user little or no information from intermediate hops within the network. Without this information, the end-to-end system is unable to identify and diagnose problems within the network.  The goal of this project is to design and implement a self-configuring monitoring system that uses special request packets to automatically activate monitors deployed at the Layer three ingress and egress routers of the ESNet network and within the end site networks. A principal design goal of the system is to provide components that are secure, easy to install, and easy to maintain so that the system does not add a burden to the network’s administration. This architecture will not require modifications to the application or network routing infrastructure. Archived monitoring data will help point the way beyond the handcrafted systems of network testbeds to a production environment that can routinely support high performance distributed applications. This passive monitoring system will integrate with active monitoring efforts and provide an essential component in a complete end-to-end network test and monitoring capability.

 

We now have SCNM hosts installed at LBNL, NERSC, SLAC, and ORNL.

Accomplishments Dec 2002 to May 2003

 

The focus of the past 6 months has been on disseminating our results from the first half of the project. We completed two papers (one on SCNM and one on one of the software components, pktd) for the Passive and Active Monitoring Workshop, and presented this work at the workshop. Several attendees were very interested in SCNM technology, especially our modified GigE device driver.

 

We performed a large number of tests on the LBNL to ORNL and LBNL to SLAC paths, which helped us understand the types of information that SCNM can provide, and helped uncover a few bugs in the SCNM software, which we fixed.

 

We gave presentations to ESNet, NERSC, and SLAC representatives on the capabilities and use of SCNM. After the meeting with ESNet, SCNM was used to help track down a misconfigured Juniper router on the LBNL to ORNL path.

 

We installed an SCNM host at SLAC and began using it to test the LBNL to SLAC link. We found that the WAN between LBNL and SLAC is generally quite good, but saw loss on both sites' LANs. We have begun discussions with ANL about installing an SCNM host at their site, and hope to install it in the next 3 months.

 

We began adding SCNM data capture to the iperf tests that are run regularly from NTAF as part of the Net100 project, and storing the results in an archive. This will allow us to track SCNM results over time and look for trends. It will also help verify that the SCNM software is stable.
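
As an illustration only, one simple way to look for a trend in archived results is a least-squares slope over time. The sketch below assumes hypothetical records of the form (time, value), for example a per-test loss count; the function name and record layout are assumptions and are not part of the SCNM or NTAF software.

```python
def trend_slope(samples):
    """Least-squares slope of value against time for (time, value) pairs.

    The record layout is a hypothetical one chosen for this sketch; it is
    not the format of the SCNM archive.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    # Covariance of time and value, and variance of time (both unnormalized).
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("all samples share the same time")
    return num / den
```

A positive slope would suggest that the measured quantity is growing across test runs, and a slope near zero would suggest that it is stable.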

 

We found that SCNM is very good at finding TCP implementation problems such as the Linux 2.4 SACK issue. The Linux 2.4 SACK implementation is extremely inefficient: it looks up ACKed data instead of SACK holes. As a result, it overwhelms even a high-end Linux host when losses coincide with large congestion windows (e.g., 2000 packets).
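
To give a rough sense of why this matters at large windows, the following toy model counts how many segments each approach would visit. It is a simplified sketch, not the Linux kernel code: it assumes a single lost segment at the start of the window, one SACK-bearing ACK per later segment, and a sender that walks either all SACKed segments or only the holes on each ACK. The function name is hypothetical.

```python
def count_visits(window):
    """Toy model: return (acked_visits, hole_visits) for one window.

    Assumes the first segment of the window is lost and every later
    segment arrives in order, each triggering one ACK with SACK info.
    """
    sacked = 0        # segments reported as received so far
    holes = 1         # the single lost segment
    acked_visits = 0  # work done when walking the SACKed data on each ACK
    hole_visits = 0   # work done when walking only the holes on each ACK
    for _ in range(1, window):
        sacked += 1
        acked_visits += sacked
        hole_visits += holes
    return acked_visits, hole_visits


if __name__ == "__main__":
    print(count_visits(2000))
```

Under these assumptions, walking the SACKed data grows quadratically with the window size, while walking the holes grows linearly.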

 

Other work included:

We continue to work with Tom Dunigan of ORNL to install and test the SCNM box at ORNL, and have been looking at data from the LBNL to ORNL network path. We continue to work with NERSC personnel to test and use the SCNM box at NERSC.

We continue to improve the documentation of all the SCNM components (see: http://www-itg.lbl.gov/Net-Mon/SCNM_hw_sw.html).

Future Plans (Next six months)

Research interactions

We are working closely with the Net100 project and the LBNL portion of the Bandwidth Estimation project to coordinate opportunities to monitor their active measurement traffic using the self-configuring network monitors.  We are also working with NERSC, ORNL, SLAC, PSC and ESNet personnel to identify issues and deploy monitors.