Progress: October 2002 to January 2003
This quarter, the DMF project focused on four things: leading efforts in the HEP and GGF communities to define requirements for a Grid monitoring system, finishing a prototype GMA implementation, improving the performance and fault tolerance of NetLogger, and designing and implementing a prototype monitoring event archive. More details on each of these topics follow.
Most of the work this quarter went into a demonstration of the pyGMA Activation Service at SC02. This involved working closely with Craig Tull of the Atlas Software group to define requirements and to integrate Athena with pyGMA. We encountered a number of performance issues with SOAP and Python, so several components had to be redesigned and performance-critical pieces reimplemented in C. We learned a great deal from this exercise, and next quarter we will release updates to both NetLogger and pyGMA that reflect these changes.
We improved the performance of a prototype event archive for monitoring data that is based on an open source relational database (MySQL). This quarter we presented a paper on the archive at the IEEE Supercomputing conference, SC02 (see http://www-didc.lbl.gov/papers/Monitoring-archive-SC02.pdf ). The archive provides GMA consumer and producer web service interfaces. We also continued development of a browser-based GUI for analyzing and visualizing archived events.
We began working with the DOE Science Grid and the Atlas Project to define a very early prototype Grid troubleshooting system to let us better understand the issues and requirements of Grid troubleshooting.
We were invited to give numerous talks on various aspects of our work. This included a talk on NetLogger at the LBNL booth at SC02, and a talk on using the DMF for Grid troubleshooting at a DOE workshop on Grid troubleshooting issues.
Brian Tierney continued co-leading the "Glue Schema Work Group", which is tasked with defining common schemas for interoperability between the EU physics Grid projects (focusing on EDG and DataTAG) and the US physics Grid projects (focusing on PPDG, GriPhyN and iVDGL). The web page for this project is http://www.hicb.org/glue/glue-schema/schema.htm . This work is part of the Grid Laboratory Uniform Environment (GLUE) Phase I task (http://www.hicb.org/glue/GLUE-v0.04.doc ). The Glue Schema Work Group has been refining the schema for the Compute Element (CE), and has begun work on a schema for representing network entities.
We continue to be very active in the Global Grid Forum. Dan Gunter is co-chair of the DAMED working group, and Brian Tierney is co-chair of the Network Measurements working group. Both working groups presented new documents for discussion at the October GGF. The DAMED working group has determined that its work is complete, and the Network Measurements working group has decided to begin work on schemas for network measurements; several members of the DAMED working group will join this effort. Dan Gunter will likely co-lead a BOF on generalized Grid event notification at the next GGF.
We continue to collaborate with several groups, including NLANR, EU DataGrid, Globus, and the IEPM project at SLAC, on the possible use of NetLogger to collect monitoring data for their projects. We are also working to add NetLogger to the Virtual Data Toolkit (VDT).
We helped several groups add NetLogger instrumentation to their software. This quarter these included Globus (ANL), EU DataGrid R-GMA, R. Reddy at PSC, and the Atlas Athena software.