DOE/MICS Mid Year Project Report
Date: Dec 1, 2002
Project Title: Self-Configuring Network Monitor
Project Type: Base
PI: Deb Agarwal and Brian Tierney
__________________________________________________________________
Executive summary
Achieving the goals of high performance distributed data
access and computing will require wringing the best possible performance from
the networks. Most existing tools for end-to-end tests of network performance
provide the end user little or no information from intermediate hops within the
network. Without this information, the end-to-end system is unable to identify
and diagnose problems within the network.
The goal of this project is to design and implement a self-configuring
monitoring system that uses special request packets to automatically activate
monitors deployed at the Layer 3 ingress and egress routers of the ESNet
network and within the end site networks. A principal design goal of the system
is to provide components that are secure, easy to install, and easy to maintain
so that the system does not add a burden to the network’s administration. This
architecture will not require modifications to the application or network
routing infrastructure. Archived monitoring data will help point the way beyond
the handcrafted systems of network testbeds to a production environment that
can routinely support high performance distributed applications. This passive
monitoring system will integrate with active monitoring efforts and provide an
essential component in a complete end-to-end network test and monitoring
capability.
Accomplishments May 2002 to Dec 2002
The focus of the past six months has been on performance
issues, both for packet capture and for the SCNMPlot analysis tool.
We found that the capture daemon was sometimes missing
packets. A thorough analysis of the system showed that this was mainly because
we were making too many memory copies of the packet header and
generating too much data. To address these issues we did the following:
- Performed a detailed analysis of the SCNM data path and identified the
hardware/software requirements for high-speed data capture.
- Analyzed and tuned the performance of the packet capture daemon, pktd.
We found and addressed several key issues associated with performing
packet capture on a high bitrate data stream. These issues include:
- daemon overhead: the packet daemon has to
be as efficient as possible, so that the minimum amount of
work is carried out on a per-packet basis.
- kernel buffer size: in the BPF network
device, the default kernel buffer size is 32 KB. This is far
too small to support capture of high speed data streams.
- buffer management: an important aspect of
delivering high volumes of monitoring data is that the data transmission
can interfere with the data capture. We found we needed to buffer packet
headers before sending them out.
- user-level buffering: under FreeBSD,
the performance of the standard C stdio library for small writes
is 10% worse than providing your own user-level buffering. We modified
our routines to provide user-level buffering.
- packet compression: another problem with
capturing very high data rate streams is that the resulting stream of
captured packet headers accounts for enough data to affect the
measured link. This indicated the need for packet header compression. An
important consideration here is the daemon overhead required to compress the
data (i.e., no extra memory copies). The packet compression we implemented has
improved the overall monitoring performance of the daemon.
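The user-level buffering item above can be illustrated with a minimal sketch. This is not the pktd code (the daemon itself is not shown in this report); it is a Python illustration of the general idea, and the class name, the 64 KB flush threshold, and the 64-byte record size are all assumptions chosen for the example. Small records are accumulated in memory and handed to the output in large chunks, so the per-record cost is an append rather than a write call.

```python
import io


class BufferedRecordWriter:
    """Accumulate small records and pass them to the sink in large chunks."""

    def __init__(self, sink, capacity=64 * 1024):
        self.sink = sink          # any object with a write(bytes) method
        self.capacity = capacity  # flush threshold in bytes (assumed value)
        self.buf = bytearray()
        self.flushes = 0          # number of write calls issued to the sink

    def write(self, record: bytes) -> None:
        # Flush first if this record would push the buffer past capacity.
        if self.buf and len(self.buf) + len(record) > self.capacity:
            self.flush()
        self.buf += record

    def flush(self) -> None:
        if self.buf:
            self.sink.write(bytes(self.buf))
            self.flushes += 1
            self.buf.clear()


def demo(num_records=10_000, record_size=64, capacity=64 * 1024):
    """Write fixed-size dummy records; return the output and the flush count."""
    sink = io.BytesIO()
    writer = BufferedRecordWriter(sink, capacity)
    for i in range(num_records):
        # Dummy record: 4-byte sequence number followed by zero padding.
        writer.write(i.to_bytes(4, "big") + bytes(record_size - 4))
    writer.flush()  # push out whatever remains in the buffer
    return sink.getvalue(), writer.flushes
```

Calling demo() pushes 10,000 small records through the writer; the sink receives a small number of large writes instead of one write per record, which is the effect the buffering change is meant to achieve.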
Other work includes:
- Finished SCNMPlot: addressed many performance issues and improved the
ability to overlay and compare multiple plots. Sample results are shown here.
- used SCNM traces to help debug and test the network measurement
tools netest and ncsd
- Designed, implemented, and improved the following post-capture and
pre-visualization processing tools:
- SCNM format to tcpdump format (and
reverse) filters
- SCNM data graphical analysis
preprocessing tools:
    fc2xg -- convert SCNM data to xgraph data
    fc2xp -- convert SCNM data to xplot data
- indexing filter for faster SCNM data
segment search
- simple SCNM data analysis tool to
verify the accuracy and correctness of captured data
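To show why an index speeds up segment search, here is a minimal sketch in Python. The SCNM file format is not described in this report, so the record layout below (an 8-byte timestamp plus a 4-byte packet length) is purely hypothetical, as are the function names and the stride of 100 records. The sketch keeps a sparse list of (timestamp, byte offset) pairs and uses binary search to jump close to the start of a requested time range instead of scanning the whole trace. It assumes timestamps are non-decreasing.

```python
import bisect
import struct

# Hypothetical fixed-size record: float64 timestamp, uint32 packet length.
RECORD = struct.Struct("!dI")


def build_index(data: bytes, stride: int = 100):
    """Return parallel lists (times, offsets) sampling every stride-th record."""
    times, offsets = [], []
    for n, off in enumerate(range(0, len(data), RECORD.size)):
        if n % stride == 0:
            ts, _ = RECORD.unpack_from(data, off)
            times.append(ts)
            offsets.append(off)
    return times, offsets


def read_segment(data: bytes, index, t_start: float, t_end: float):
    """Yield (timestamp, length) records with t_start <= timestamp < t_end."""
    times, offsets = index
    if not offsets:
        return
    # Start at the last indexed record strictly before t_start, so that
    # no record with timestamp == t_start is skipped.
    i = max(bisect.bisect_left(times, t_start) - 1, 0)
    for off in range(offsets[i], len(data), RECORD.size):
        ts, length = RECORD.unpack_from(data, off)
        if ts >= t_end:
            break
        if ts >= t_start:
            yield ts, length
```

With a stride of 100, a lookup reads at most about 100 records before reaching the requested range, regardless of how long the trace is. A denser index trades memory for a shorter forward scan.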
We continue to work with Tom Dunigan of ORNL to install and test the SCNM
box at ORNL, and have been looking at data from the LBNL to ORNL network path. We
continue to work with NERSC personnel to test and use the SCNM box at NERSC. We
also continue to work with Tom Lehman from ISI, who installed SCNM boxes at ISI
East (Arlington, VA) and ISI West (Marina del Rey, CA), and is planning to move
these boxes to PSC and possibly StarTap. We have begun discussions with the
network and security staff at SLAC to get approval to install a box there.
We continue to improve the documentation of all the SCNM
components (see http://www-itg.lbl.gov/Net-Mon/SCNM_hw_sw.html).
Future Plans (Next six months)
- Continue working on BPF enhancements, in particular, packet header
compression. While there are some well-known approaches to this
problem (e.g., CSLIP, CRTP, ROHC), none of them is particularly well suited
to trace compression, where timestamps are a major issue but
preserving complete packet semantics is not.
- Begin design and implementation of the secure sensor activation
protocol
- Continue to explore scalability issues, and enhance SCNM tools
where necessary
- Continue to enhance the activation options of SCNM
- Continue to use SCNM to help debug and test network measurement
tools such as netest and ncsd
- Continue to develop tools to analyze the SCNM dump files
- Continue design and implementation of Berkeley Packet Filter (BPF)
enhancements to improve handling of large numbers of flows
- Explore the use of the Grid Monitoring Architecture for the
collection and publication of monitoring data
- Explore the possibility of storing the results in the Net100
monitoring data archive
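The trace-compression item in the plans above notes that timestamps matter more than full header semantics. One common way to shrink a timestamp stream, shown here only as an illustration, is delta encoding with variable-length integers. This sketch is not CSLIP, CRTP, or ROHC, and it is not the compression scheme used in SCNM. The function names and the choice of integer microsecond timestamps are assumptions made for the example, and the input is assumed to be non-decreasing.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low-order group first."""
    if value < 0:
        raise ValueError("varint requires a non-negative value")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def compress_timestamps(timestamps_us):
    """Delta-encode a non-decreasing list of integer microsecond timestamps."""
    out = bytearray()
    previous = 0
    for ts in timestamps_us:
        # The first delta is the absolute value of the first timestamp.
        out += encode_varint(ts - previous)
        previous = ts
    return bytes(out)


def decompress_timestamps(blob: bytes):
    """Invert compress_timestamps and return the original list."""
    timestamps, current, value, shift = [], 0, 0, 0
    for byte in blob:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7           # continuation byte: keep accumulating
        else:
            current += value     # final byte of this delta
            timestamps.append(current)
            value, shift = 0, 0
    return timestamps
```

Because packets in a high-rate stream arrive microseconds apart, most deltas fit in a single byte, compared with eight bytes for a full 64-bit timestamp. The encoding is lossless, so timestamp accuracy is unaffected.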
Research interactions
We are working closely with the Net100 project and the LBNL
portion of the Bandwidth Estimation project to coordinate opportunities to
monitor their active measurement traffic using the self-configuring network
monitors. We are also working with
NERSC, ORNL, SLAC, PSC and ESNet personnel to identify issues and deploy
monitors.