Progress:
November 2003 to January 2004
The focus of the DMF project for this quarter has been to integrate NetLogger with pyGlobus for a Grid Troubleshooting demonstration at SuperComputing 2003, to continue work with the GGF to develop interoperable schemas for network monitoring, to document the "continuous query" component for NetLogger messages, and to work with Grid3/PPDG to add NetLogger to the Grid3 effort. More details on each of these topics follow.
At Supercomputing 2003, the Distributed Systems Department demonstrated
integration of functionality from three DSD projects to provide a
securely authenticated, grid-enabled, and network-monitored
prototype job submission and monitoring system. We added NetLogger
message logging calls into the pyGlobus file staging and remote
execution modules, as well as the Globus gatekeeper and jobmanager, and
the Akenti access control policy library. The GUI acted as a
message sink for NetLogger log messages from all the instrumented
components. The NetLogger messages were used to visualize the
file staging, remote execution, and access control steps of the
demonstration. The GUI used the status and error codes from the
NetLogger messages to draw a “timeline” of events which indicate to the
user the progression of the file staging and job execution
components. The GUI inspected the NetLogger messages to determine
if errors occurred during the procedure and visually flagged the events
that failed. The user can drill down into a failed event to
inspect the status code and error message and determine why the
procedure failed.
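As a rough illustration of how a message sink can turn log messages into a flagged timeline, here is a minimal Python sketch. It is not the demo GUI's code: the simplified `key=value` line format, the field names (`ts`, `prog`, `event`, `status`, `msg`), and the convention that a nonzero status means failure are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Event:
    ts: float        # timestamp in seconds (hypothetical field)
    component: str   # instrumented component that emitted the message
    name: str        # event name
    status: int      # assumed convention: 0 = success, nonzero = failure
    message: str = ""


def parse_line(line):
    """Parse one whitespace-separated 'key=value' line (assumed format)."""
    fields = dict(token.split("=", 1) for token in line.split())
    return Event(
        ts=float(fields["ts"]),
        component=fields["prog"],
        name=fields["event"],
        status=int(fields.get("status", "0")),
        # Underscores stand in for spaces so values stay single tokens.
        message=fields.get("msg", "").replace("_", " "),
    )


def build_timeline(lines):
    """Return all events ordered by timestamp, whatever the arrival order."""
    return sorted((parse_line(l) for l in lines if l.strip()),
                  key=lambda e: e.ts)


def failed_events(timeline):
    """Select the events a viewer would flag as failed."""
    return [e for e in timeline if e.status != 0]


def render(timeline):
    """Produce one text row per event, marking failures."""
    return [
        f"{e.ts:6.1f} {e.component:<12} {e.name:<12} "
        f"{'FAILED: ' + e.message if e.status else 'ok'}"
        for e in timeline
    ]


# Hypothetical messages from two components, arriving out of order.
SAMPLE = [
    "ts=3.0 prog=jobmanager event=job.run status=1 msg=permission_denied",
    "ts=1.0 prog=stager event=stage.start status=0",
    "ts=2.0 prog=stager event=stage.end status=0",
]

if __name__ == "__main__":
    for row in render(build_timeline(SAMPLE)):
        print(row)
```

Sorting by timestamp at the sink means the components do not need to deliver their messages in order; the drill-down step corresponds to reading `status` and `message` on a flagged event.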
The demonstration went very well and provided excellent design
feedback for the NetLogger Activation Service. We made some changes
for the demo itself, and in the coming months we will make further
modifications to increase the robustness and ease of use of several
Activation Service components.
With the GGF Network Measurements WG, we continued to lead work on providing a
framework for interoperable querying, scheduling, and reporting of
network measurements. This work builds upon the NM-WG's GGF draft
recommendation, "A Hierarchy of Network Performance Characteristics for
Grid Applications and Services", an abstract classification and
description of the most important network measurements. The principal
elements of the framework are two XML schemas: one to query and/or
request a measurement (immediately or in the future), and the other to
report the result. These schemas will be combined into a single Web
Services request-response operation, thus providing a Web Services
front-end to existing network monitoring systems. For this effort, we continue to be extremely active in the Global Grid Forum.
Dan will attend GGF-10 in Berlin to continue schema discussions.
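The request/report pairing described above can be sketched in Python with the standard library's `xml.etree.ElementTree`. This is only an illustration of the pattern: the element names (`measurementRequest`, `measurementReport`, `characteristic`, and so on), the characteristic label, the host names, and the `fake_measure` back end are all invented for this example and are not taken from the NM-WG schemas.

```python
import xml.etree.ElementTree as ET


def make_request(characteristic, source, destination, start_time=None):
    """Build a request document; element names are hypothetical."""
    req = ET.Element("measurementRequest")
    ET.SubElement(req, "characteristic").text = characteristic
    ET.SubElement(req, "source").text = source
    ET.SubElement(req, "destination").text = destination
    if start_time is not None:
        # A start time asks for a future measurement rather than an
        # immediate one.
        ET.SubElement(req, "startTime").text = start_time
    return ET.tostring(req, encoding="unicode")


def handle_request(request_xml, measure):
    """Front end: parse a request, call a monitoring back end, report."""
    req = ET.fromstring(request_xml)
    characteristic = req.findtext("characteristic")
    src = req.findtext("source")
    dst = req.findtext("destination")

    # 'measure' stands in for an existing network monitoring system.
    value, units = measure(characteristic, src, dst)

    rep = ET.Element("measurementReport")
    ET.SubElement(rep, "characteristic").text = characteristic
    ET.SubElement(rep, "source").text = src
    ET.SubElement(rep, "destination").text = dst
    result = ET.SubElement(rep, "result", units=units)
    result.text = str(value)
    return ET.tostring(rep, encoding="unicode")


def fake_measure(characteristic, src, dst):
    """Placeholder back end returning a fixed, made-up value."""
    return 42.5, "ms"


if __name__ == "__main__":
    request = make_request("delay.roundTrip",
                           "hostA.example.org", "hostB.example.org")
    print(request)
    print(handle_request(request, fake_measure))
```

Keeping the back end behind a single callable is what lets one front end sit in front of different monitoring systems, which is the role the combined request-response operation is meant to play.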
We finished the first stage of implementing "continuous" (i.e.,
streaming, real-time) relational queries over NetLogger data streams,
which promises to add powerful real-time filtering to NetLogger.
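To make the idea of a continuous query concrete, the following Python sketch applies a relational-style selection (a "where" predicate) and projection (a "select" column list) to records as they arrive, instead of querying a stored table. It is a simplified illustration, not the NetLogger component: the record fields and the function names are assumptions.

```python
def continuous_query(stream, where, select):
    """Lazily filter and project records from a possibly endless stream.

    stream -- any iterable of dict records
    where  -- predicate deciding which records pass (selection)
    select -- list of field names to keep (projection)
    """
    for record in stream:
        if where(record):
            yield {key: record[key] for key in select if key in record}


def sample_stream():
    """Hypothetical stream of already-parsed log records."""
    yield {"ts": 1.0, "prog": "stager", "event": "stage.end", "status": 0}
    yield {"ts": 2.0, "prog": "jobmanager", "event": "job.run", "status": 1}
    yield {"ts": 3.0, "prog": "gatekeeper", "event": "auth.check", "status": 0}


if __name__ == "__main__":
    # Roughly: SELECT ts, prog, event FROM stream WHERE status != 0
    errors = continuous_query(
        sample_stream(),
        where=lambda r: r["status"] != 0,
        select=["ts", "prog", "event"],
    )
    for row in errors:
        print(row)
```

Because the query is a generator, each result is produced as soon as its input record arrives, so the same code works on a live stream that never ends.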
We presented several talks related to our work with DMF. At
Grid 2003, Brian Tierney presented our accepted paper,
"On-Demand Grid Application Tuning and Debugging with the NetLogger
Activation Service". At Globus World, Brian Tierney gave
an invited talk about Grid troubleshooting with NetLogger.
Based on our experience with the SC2003 demo, we have submitted a paper for
consideration at High Performance Distributed Computing 13, describing
the importance of unique Grid workflow identifiers (GIDs) for
monitoring Grid applications.
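The value of a shared workflow identifier can be shown with a small Python sketch: if every component stamps its messages with the same identifier, a monitor can reassemble one workflow's events from interleaved logs. The sketch is illustrative only. Generating the identifier with `uuid.uuid4()`, the `gid` field name, and the record layout are assumptions, not the scheme described in the submitted paper.

```python
import uuid
from collections import defaultdict


def new_gid():
    """Create an identifier for one workflow (UUID chosen as an assumption)."""
    return str(uuid.uuid4())


def group_by_gid(events):
    """Group event records by workflow identifier, ordered by time."""
    workflows = defaultdict(list)
    for event in events:
        workflows[event["gid"]].append(event)
    for records in workflows.values():
        records.sort(key=lambda e: e["ts"])
    return dict(workflows)


if __name__ == "__main__":
    job_a, job_b = new_gid(), new_gid()
    # Interleaved messages from different components and two workflows.
    log = [
        {"gid": job_a, "ts": 2.0, "prog": "jobmanager", "event": "job.run"},
        {"gid": job_b, "ts": 1.5, "prog": "stager", "event": "stage.start"},
        {"gid": job_a, "ts": 1.0, "prog": "stager", "event": "stage.start"},
    ]
    for gid, records in group_by_gid(log).items():
        print(gid, [r["event"] for r in records])
```

Without a common identifier, the monitor would have to guess which staging event belongs to which job from timestamps and host names alone.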
For the iVDGL/PPDG Grid3 effort, we continued running our NetLogger repository service for use by all of Grid3. See http://www-didc.lbl.gov/NetLogger/netlogger-grid3.html. We are using this service to monitor the status of Grid3 sites.
We continue to collaborate with several groups, including NLANR, EU
DataGrid, Globus, and the IEPM project at SLAC, on the possible use of NetLogger to collect monitoring data for their projects.
We continue working with several groups to help add NetLogger instrumentation to their software. This quarter, these include Bill Allcock and Jenny (ANL), Constantinos Dovrolis (GA Tech), IEPM (SLAC), EU DataGrid R-GMA, and the ATLAS Athena software.