Quarterly Report for the Distributed Monitoring Framework (DMF) project, October 2002
Progress: July 2002 to Oct 2002
The focus of the DMF project for FY02 was to lead efforts in the HEP and GGF communities to define requirements for a Grid monitoring system, to finish a prototype GMA implementation, to improve the performance and fault tolerance of NetLogger, and to design and implement a prototype monitoring event archive. More details on each of these topics follow.
Brian Tierney continued co-leading the "Glue Schema Work Group", which is tasked with defining common schemas for interoperability between the EU physics Grid projects (focusing on EDG and DataTag) and the US physics Grid projects (focusing on PPDG, GriPhyN and iVDGL). The web page for this project is http://www.hicb.org/glue/glue-schema/schema.htm. This work is part of the Grid Laboratory Uniform Environment (GLUE) Phase I task (http://www.hicb.org/glue/GLUE-v0.04.doc).
The Glue Schema Group has now completed defining the schemas for the Compute Element (CE) and the Storage Element (SE). Defining the SE schema involved a lot of interaction with Arie Shoshani's Data Management Group, to ensure that all the concepts defined by the Storage Resource Manager (SRM) architecture were represented in the schema. The Globus MDS and EU DataGrid R-GMA Grid Information Services will use these schemas. The group is now working on schemas for network monitoring information. The goal is to have common schemas defined, deployed, and tested in time for the EU DataGrid Testbed 2 release in November 2002. The GLUE schema work will provide input into multiple GGF working groups that are addressing various schema issues.
We continued implementing a prototype event archive for monitoring data, based on an open source relational database (MySQL). This quarter we completed the final version of a paper on the archive for the IEEE Supercomputing conference (see http://www-didc.lbl.gov/papers/Monitoring-archive-SC02.pdf). The archive provides GMA consumer and producer web services interfaces. We also continued development of a browser-based GUI for analyzing and visualizing archive events.
We completed documentation and packaging for NetLogger version 2.0, which includes many improvements to NetLogger. Details are in the last quarterly report, and are also described in a paper accepted to the 2002 High Performance Distributed Computing Conference (see http://www-didc.lbl.gov/papers/HPDC02-HP-monitoring.pdf). We also finalized the open source legal issues to enable NetLogger to be included as part of the Globus Toolkit.
We continued work on a GMA-based "activation service" for NetLogger. This works as follows: a consumer sends a GMA subscribe request to an activation service for one or more types of monitoring events. The activation service creates an entry in a "trigger file". The application, via NetLogger library calls, checks the trigger file and logs the requested event data. The activation service buffers, filters, and forwards the requested event data back to the consumer. The design is flexible: multiple consumers can subscribe to the same set of events, and multiple applications can provide events to a single consumer. We are working with the Atlas software group to add this to their Athena framework, and will demonstrate this functionality at SC2002.
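The subscribe, trigger, log, and forward flow described above can be illustrated with a minimal Python sketch. This is not the NetLogger or GMA API: the class names (ActivationService, InstrumentedApp), the JSON trigger file format, and the event names are all hypothetical, and consumers are modeled as plain callables rather than remote GMA subscribers.

```python
import json
import os
import tempfile


class ActivationService:
    """Hypothetical stand-in for the activation service (not the real API)."""

    def __init__(self, trigger_path):
        self.trigger_path = trigger_path
        self.subscriptions = {}  # event name -> list of consumer callables
        self.buffer = []         # events received but not yet forwarded
        self._write_trigger_file()

    def subscribe(self, event_names, consumer):
        # A consumer asks for particular event types; record the request
        # and update the trigger file that applications poll.
        for name in event_names:
            self.subscriptions.setdefault(name, []).append(consumer)
        self._write_trigger_file()

    def _write_trigger_file(self):
        # Assumed format: a JSON list of the currently requested event names.
        with open(self.trigger_path, "w") as f:
            json.dump(sorted(self.subscriptions), f)

    def receive(self, event):
        # Buffer events coming from instrumented applications.
        self.buffer.append(event)

    def flush(self):
        # Filter buffered events by subscription and forward them.
        pending, self.buffer = self.buffer, []
        for event in pending:
            for consumer in self.subscriptions.get(event["event"], []):
                consumer(event)


class InstrumentedApp:
    """Hypothetical application that checks the trigger file before logging."""

    def __init__(self, name, trigger_path, service):
        self.name = name
        self.trigger_path = trigger_path
        self.service = service

    def log(self, event_name, **fields):
        with open(self.trigger_path) as f:
            active = set(json.load(f))
        if event_name not in active:
            return False  # nobody asked for this event, so skip it
        self.service.receive({"event": event_name, "source": self.name, **fields})
        return True


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "triggers.json")
        service = ActivationService(path)
        consumer_a, consumer_b = [], []
        service.subscribe(["io.read"], consumer_a.append)
        service.subscribe(["io.read", "io.write"], consumer_b.append)
        app1 = InstrumentedApp("app1", path, service)
        app2 = InstrumentedApp("app2", path, service)
        app1.log("io.read", nbytes=1024)
        app2.log("io.write", nbytes=512)
        app1.log("cpu.load", value=0.5)  # not requested, never sent
        service.flush()
        return consumer_a, consumer_b
```

In this sketch both consumers receive the io.read event from app1, which mirrors multiple consumers subscribing to the same events, and consumer_b receives events from both applications, which mirrors multiple applications feeding a single consumer.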
We were invited to give numerous talks on various aspects of our work. These included a talk on TCP tuning at UC Berkeley (URL here), and a series of talks at a mini-workshop with Les Cottrell's group at SLAC.
We continue to be extremely active in the Global Grid Forum. Dan Gunter is co-chair of the DAMED working group, and Brian Tierney is co-chair of the Network Measurements working group. Both of these working groups have new documents that were discussed at the July and October GGFs. The DAMED working group has determined that its work is complete, and the Network Measurements working group has decided to start working on schemas for network measurements; several members of the DAMED working group will join this effort. Dan Gunter will likely co-lead a BOF on generalized Grid event notification at the next GGF.
We continue to collaborate with several groups, including NLANR, EU DataGrid, Globus, and the IEPM project at SLAC on the possible use of NetLogger to collect monitoring data for their projects.
We have continued to work with the Globus project to define instrumentation and monitoring services for Globus and the new "Open Grid Services Architecture" (OGSA), based on NetLogger.
We worked with several groups to help add NetLogger instrumentation to their software. This quarter these included Globus (ANL), EU DataGrid R-GMA, R. Reddy at PSC, and the Atlas Athena software.