Parsing SRM logs
I’ve updated the SRM parser to make it easier to link related events together: events that belong together now share a unique identifier, stored in the ‘guid’ attribute.
To get the newest code, follow the cookbook instructions for checking out the code from Subversion and setting up your environment.
Then you can try the parser on this sample log. Save the log to a file (we’ll call it $LOGFILE) and run:
nl_parser -m bestman -p version=2 $LOGFILE
This should print 14 lines of fairly unreadable log output to standard output. If you save that output to a file, say $LOGFILE2, you can load it into a database with nl_loader. Here’s how to load it into an SQLite database file called $DBFILE (SQLite needs no extra installation):
nl_parser -m bestman -p version=2 $LOGFILE > $LOGFILE2
DBFILE=example.sqlite
nl_loader -u sqlite://$DBFILE -C -i $LOGFILE2
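If you want a quick look at $LOGFILE2 before (or instead of) loading it, the following Python sketch groups event names by guid. It assumes each line is a set of key=value pairs in NetLogger’s Best Practices style, with `event=` and `guid=` keys; the script and function names are hypothetical, and lines without a guid are lumped under a placeholder key.

```python
import shlex
import sys
from collections import defaultdict


def parse_bp_line(line):
    """Split one key=value log line into a dict (quoted values may contain spaces)."""
    try:
        tokens = shlex.split(line)
    except ValueError:  # e.g. an unbalanced quote; fall back to plain whitespace split
        tokens = line.split()
    fields = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep:  # ignore stray tokens without '='
            fields[key] = value
    return fields


def events_by_guid(lines):
    """Group event names by their 'guid' attribute, keeping file order."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = parse_bp_line(line)
        groups[rec.get("guid", "<no guid>")].append(rec.get("event", "?"))
    return groups


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python group_by_guid.py $LOGFILE2
    with open(sys.argv[1]) as f:
        for guid, names in events_by_guid(f).items():
            print(guid, " ".join(names))
```

Using `shlex.split` keeps a quoted value such as `msg="a b"` together as one field; anything fancier (escapes, multi-line records) is out of scope for this sketch.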
To check that the events actually made it into the database, run a simple query:
sqlite3 $DBFILE "select * from event"
# Output:
1|207deecdec2f9a3f0c8d33ccd7298072|1225524876.722|srm.server.copy.in|2|4
2|0b468528bc886b71176a1f4bb1f8a0a4|1225524876.731|srm.server.list|2|4
3|106f4156e5517858ec0f4684dfe73a9d|1225524876.731|srm.server.req|0|4
4|597e7ba2a8cb6383cc871e383d05e350|1225524876.732|srm.server.req.to.queued|2|4
5|82973ccd386f6cee9ebf5fe33ef71e69|1225524876.732|srm.server.req.to.scheduled|2|4
6|f8fa6278a718101c9a4a4482744f5548|1225524876.733|srm.server.copy.out|2|4
7|0a50c7178490b8f8904e52ea863afd7c|1225524876.894|srm.server.req.to.status|2|4
8|014e3fef040a8e048d09e059ae7987b4|1225524876.894|srm.server.TSRMRequestCopyToRemote.upload|2|4
9|ab2bb06914ce590266bca5117c11ed91|1225524876.894|srm.server.req.to.status|2|4
10|d16c585253b2e420be8322fc64f759fc|1225524876.895|srm.server.tx.push|0|4
11|7b7370e35a8917ae5f976dc6f4f611f9|1225524876.895|srm.server.tx.push.size|2|4
12|dc1d0b07b1b404be2be3c2a7ddf23148|1225524935.733|srm.server.tx.push|1|4
13|ee8bc3e776910fae76f3ec310eaba879|1225524935.733|srm.server.req.to.status|2|4
14|d2c23970199473453820ac07aec5eeb7|1225524935.733|srm.server.req.to|1|4
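The third column is a Unix timestamp, and the fifth column appears to mark a start (0), an end (1), or a single point-in-time event (2). That reading is inferred from the sample (srm.server.tx.push shows up once with 0 and once with 1), not taken from nl_loader’s documentation. Under that assumption, this Python sketch reads the rows with the standard sqlite3 module and pairs starts with ends of the same event name to get durations; it relies only on column positions, not on column names.

```python
import sqlite3
import sys


def pair_durations(rows):
    """Match start/end rows that share an event name; return (name, seconds) pairs.

    Assumes each row is ordered like the output of "select * from event" above:
    (id, hash, time, name, flag, ...), with flag 0 = start and flag 1 = end.
    Both the column order and the flag meaning are inferred from the sample.
    """
    pending = {}  # event name -> time of the not-yet-matched start
    result = []
    for row in sorted(rows, key=lambda r: r[2]):
        t, name, flag = row[2], row[3], row[4]
        if flag == 0:
            pending[name] = t
        elif flag == 1 and name in pending:
            result.append((name, t - pending.pop(name)))
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python pair_durations.py $DBFILE
    conn = sqlite3.connect(sys.argv[1])
    try:
        rows = conn.execute("select * from event").fetchall()
    finally:
        conn.close()
    for name, seconds in pair_durations(rows):
        print(f"{name}: {seconds:.3f} s")
```

On the 14 sample rows, the only pair it finds is srm.server.tx.push, from 1225524876.895 to 1225524935.733, roughly 58.8 seconds. Matching by name alone cannot connect srm.server.req with srm.server.req.to, which is where joining on guid comes in.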
You can also try the R code in trunk/R/bestman.R in the NetLogger Subversion repository. Even just reading the SQL it contains should give you an idea of how this data can be queried and joined on the ‘guid’ attribute, so that you can see all the information for one transfer together. Note that not every event ends up with the same GUID yet; in particular, the ‘queued’ event (srm.server.req.to.queued) doesn’t seem to get the same GUID as the rest.
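The idea of joining on guid can be sketched in Python with the standard sqlite3 module. The two-table schema below (an event table plus an attr table of name/value pairs) is a hypothetical stand-in, not nl_loader’s actual schema, and the guid values are made up; the demo deliberately gives the queued event its own guid to mimic the issue just described.

```python
import sqlite3
from collections import defaultdict

# Hypothetical, simplified schema for illustration only; the tables that
# nl_loader actually creates may be named and laid out differently.
SCHEMA = """
create table event (id integer primary key, time real, name text);
create table attr  (e_id integer references event(id), name text, value text);
"""


def events_per_transfer(conn):
    """Return {guid: [event names in time order]} by joining events to their guid."""
    query = """
        select a.value, e.name
        from event e join attr a on a.e_id = e.id
        where a.name = 'guid'
        order by a.value, e.time, e.id
    """
    groups = defaultdict(list)
    for guid, name in conn.execute(query):
        groups[guid].append(name)
    return dict(groups)


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Ids, times and names come from the sample output; the guids are invented.
    rows = [
        (3, 1225524876.731, "srm.server.req", "g1"),
        (4, 1225524876.732, "srm.server.req.to.queued", "g2"),
        (5, 1225524876.732, "srm.server.req.to.scheduled", "g1"),
        (14, 1225524935.733, "srm.server.req.to", "g1"),
    ]
    for eid, t, name, guid in rows:
        conn.execute("insert into event values (?, ?, ?)", (eid, t, name))
        conn.execute("insert into attr values (?, 'guid', ?)", (eid, guid))
    result = events_per_transfer(conn)
    conn.close()
    return result


if __name__ == "__main__":
    for guid, names in demo().items():
        print(guid, names)
```

Here the request, scheduled, and final events come back together under g1, while the queued event sits alone under g2, which is how a guid mismatch shows up when you group by guid.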
To read the parser’s source code, look at trunk/python/netlogger/parsers/modules/bestman.py, in particular the section that mentions “version 2”. Hopefully it’s reasonably clear how events are mapped and processed.
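To make the guid idea concrete, here is a hypothetical Python sketch of one way a parser could assign guids: it remembers a per-request key and reuses the same guid for every event carrying that key. The class name, the request key, and the use of `uuid4` are all assumptions for illustration; bestman.py may do this quite differently.

```python
import uuid


class GuidAssigner:
    """Hypothetical helper: give every event of the same request one shared guid.

    The request key (e.g. some request token pulled out of the raw log line)
    is an assumption for illustration, not how bestman.py is known to work.
    """

    def __init__(self):
        self._by_key = {}  # request key -> guid (32 hex characters)

    def guid_for(self, request_key):
        # Mint a new guid the first time a key is seen; reuse it afterwards.
        if request_key not in self._by_key:
            self._by_key[request_key] = uuid.uuid4().hex
        return self._by_key[request_key]

    def tag(self, event, request_key):
        """Return a copy of the event dict with a 'guid' field added."""
        tagged = dict(event)
        tagged["guid"] = self.guid_for(request_key)
        return tagged
```

With this approach, a log line that exposes a different key for the same request would get a fresh guid, which is one possible way to end up with a symptom like the ‘queued’ event’s mismatched GUID.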
