
Parsing log files with python and the pyparsing module

I got a wild hair and have been trying to work out the best way to parse Java thread dumps with Python. After a few failed attempts, I came across the pyparsing module. Pyparsing is fairly simple to use, considering what it does, and it makes quick work of cryptic log files.

I was trying to parse a typical Java thread dump, like this one, saved as "web81.prod.dump":

"FILE Message Writer" daemon prio=5 tid=0x0093d7c0 nid=0xf in Object.wait() [a4e81000..a4e819c0]
at java.lang.Object.wait(Native Method)
- waiting on <0xbe71cb70> (a java.util.LinkedList)
at java.lang.Object.wait(Object.java:429)
at com.sitraka.pas.common.util.queue.ListQueue.dequeue(ListQueue.java:137)
- locked <0xbe71cb70> (a java.util.LinkedList)
at com.sitraka.pas.common.log.FileLogTarget$MessageWriter.run(FileLogTarget.java:359)
at java.lang.Thread.run(Thread.java:534)

"Timestamp Updater" daemon prio=10 tid=0x00ab9fa0 nid=0xe waiting on condition [a4f81000..a4f819c0]
at java.lang.Thread.sleep(Native Method)
at com.sitraka.pas.agent.recording.Timestamp$Updater.run(Timestamp.java:159)
at java.lang.Thread.run(Thread.java:534)

"Signal Dispatcher" daemon prio=10 tid=0x000f3660 nid=0x8 waiting on condition [0..0]

"Finalizer" daemon prio=8 tid=0x000f0488 nid=0x6 in Object.wait() [f9581000..f95819c0]
at java.lang.Object.wait(Native Method)
- waiting on <0xbe71cd70> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:111)
- locked <0xbe71cd70> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:127)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x000eeb58 nid=0x5 in Object.wait() [f9681000..f96819c0]
at java.lang.Object.wait(Native Method)
- waiting on <0xbe71cdd8> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:429)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:115)
- locked <0xbe71cdd8> (a java.lang.ref.Reference$Lock)

"main" prio=5 tid=0x000387e0 nid=0x1 in Object.wait() [ffbee000..ffbee4a4]
at java.lang.Object.wait(Native Method)
- waiting on <0xbe71d2a0> (a weblogic.t3.srvr.T3Srvr)
at java.lang.Object.wait(Object.java:429)
at weblogic.t3.srvr.T3Srvr.waitForDeath(T3Srvr.java:1208)
- locked <0xbe71d2a0> (a weblogic.t3.srvr.T3Srvr)
at weblogic.t3.srvr.T3Srvr.run(T3Srvr.java:390)
at weblogic.Server.main(Server.java:32)

And here's the Python code:

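#!/usr/bin/env python

from pyparsing import (Combine, Group, Literal, SkipTo, Word, ZeroOrMore,
                       alphanums, dblQuotedString, nums, restOfLine)

# Read the whole thread dump into memory
with open("web81.prod.dump", "r") as dump_file:
    data = dump_file.read()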

#------------------------------------------------------------------------
# Define Grammars
#------------------------------------------------------------------------

integer = Word(nums)
hexnums = Word(alphanums)                  # values such as 0x0093d7c0
end = Literal("\n").suppress()
rest = SkipTo(end)                         # everything up to the end of the line
threadname = dblQuotedString
daemon = Literal("daemon")
objectwait = Literal("in Object.wait()")
waitmon = Literal("waiting for monitor entry")
waitcon = Literal("waiting on condition")
runnable = Literal("runnable")
runstate = objectwait | runnable | waitmon | waitcon
memloc = Word(alphanums + "[].")           # e.g. [a4e81000..a4e819c0]
waitlock = Combine(Group(Literal("- waiting to lock") + rest))
waiton = Combine(Group(Literal("- waiting on") + rest))
locked = Combine(Group(Literal("- locked") + rest))
verbline = Combine(Group("at " + rest))    # stack frame lines
condition = waitlock | waiton | locked
cond = ZeroOrMore(condition + restOfLine).setResultsName("condition")
cond.ignore(verbline)                      # skip stack frames between lock lines

priority = "prio=" + integer.setResultsName("prio")
tidref = "tid=" + hexnums.setResultsName("tid")
nidref = "nid=" + hexnums.setResultsName("nid")

logEntry = (threadname.setResultsName("threadname") + daemon + priority + tidref + nidref
            + runstate.setResultsName("runstate") + memloc.setResultsName("memloc")
            + cond)

#------------------------------------------------------------------------

# Print the named fields of every thread entry found in the dump
for tokens in logEntry.searchString(data):
    print()
    print("THREADNAME =\t " + tokens.threadname)
    print("PRIORITY =\t " + tokens.prio)
    print("TID =\t\t " + tokens.tid)
    print("NID =\t\t " + tokens.nid)
    print("RUNSTATE =\t " + tokens.runstate)
    print("MEMORY ADDRESS = " + tokens.memloc)
    print()
    print("CONDITIONS:")
    print()
    for x in tokens.condition:
        print(x)
    print(50 * "-")

which outputs this:

THREADNAME = "FILE Message Writer"
PRIORITY = 5
TID = 0x0093d7c0
NID = 0xf
RUNSTATE = in Object.wait()
MEMORY ADDRESS = [a4e81000..a4e819c0]

CONDITIONS:

- waiting on <0xbe71cb70> (a java.util.LinkedList)

- locked <0xbe71cb70> (a java.util.LinkedList)

--------------------------------------------------

THREADNAME = "Timestamp Updater"
PRIORITY = 10
TID = 0x00ab9fa0
NID = 0xe
RUNSTATE = waiting on condition
MEMORY ADDRESS = [a4f81000..a4f819c0]

CONDITIONS:

--------------------------------------------------

THREADNAME = "Signal Dispatcher"
PRIORITY = 10
TID = 0x000f3660
NID = 0x8
RUNSTATE = waiting on condition
MEMORY ADDRESS = [0..0]

CONDITIONS:

--------------------------------------------------

THREADNAME = "Finalizer"
PRIORITY = 8
TID = 0x000f0488
NID = 0x6
RUNSTATE = in Object.wait()
MEMORY ADDRESS = [f9581000..f95819c0]

CONDITIONS:

- waiting on <0xbe71cd70> (a java.lang.ref.ReferenceQueue$Lock)

- locked <0xbe71cd70> (a java.lang.ref.ReferenceQueue$Lock)

--------------------------------------------------

THREADNAME = "Reference Handler"
PRIORITY = 10
TID = 0x000eeb58
NID = 0x5
RUNSTATE = in Object.wait()
MEMORY ADDRESS = [f9681000..f96819c0]

CONDITIONS:

- waiting on <0xbe71cdd8> (a java.lang.ref.Reference$Lock)

- locked <0xbe71cdd8> (a java.lang.ref.Reference$Lock)

--------------------------------------------------
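
One thing the output shows: the "main" thread from the dump is missing. Its header line has no daemon keyword, and the grammar requires one. A minimal sketch of one way around that, assuming non-daemon threads simply omit the keyword, is to wrap the literal in pyparsing's Optional. The `header` expression below is a cut-down, illustrative version of logEntry (header fields only), not the full grammar:

```python
from pyparsing import Literal, Optional, Word, alphanums, dblQuotedString, nums

# Assumption: non-daemon threads just leave the keyword out, so make it optional.
daemon = Optional(Literal("daemon"))
priority = "prio=" + Word(nums).setResultsName("prio")
tidref = "tid=" + Word(alphanums).setResultsName("tid")
nidref = "nid=" + Word(alphanums).setResultsName("nid")

# "header" is a hypothetical name; it covers only the start of each thread line.
header = dblQuotedString.setResultsName("threadname") + daemon + priority + tidref + nidref

lines = [
    '"main" prio=5 tid=0x000387e0 nid=0x1 in Object.wait() [ffbee000..ffbee4a4]',
    '"Finalizer" daemon prio=8 tid=0x000f0488 nid=0x6 in Object.wait() [f9581000..f95819c0]',
]

# parseString() matches from the start of each line and ignores the unparsed tail.
parsed = [header.parseString(line) for line in lines]
for tokens in parsed:
    is_daemon = "daemon" in tokens.asList()
    print(tokens.threadname, "prio=" + tokens.prio, "daemon" if is_daemon else "non-daemon")
```

Making the same change to the daemon expression in the full grammar should let searchString() pick up the "main" entry as well, though I have not checked that against other dump formats.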

The next step is to put these tokens into a dictionary so I can start searching for problems.
You can download pyparsing here
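
As a rough idea of what the dictionary step could look like, here is a minimal sketch using only the standard library. The records are hypothetical stand-ins with field values copied from the output above; a real script would build one dict per match inside the searchString() loop. The key names and the "holds a lock" check are my own assumptions about what is worth searching for:

```python
from collections import defaultdict

# Hypothetical records: values copied from three of the threads shown above.
threads = [
    {"threadname": "FILE Message Writer", "prio": 5, "runstate": "in Object.wait()",
     "conditions": ["- waiting on <0xbe71cb70> (a java.util.LinkedList)",
                    "- locked <0xbe71cb70> (a java.util.LinkedList)"]},
    {"threadname": "Timestamp Updater", "prio": 10, "runstate": "waiting on condition",
     "conditions": []},
    {"threadname": "Finalizer", "prio": 8, "runstate": "in Object.wait()",
     "conditions": ["- waiting on <0xbe71cd70> (a java.lang.ref.ReferenceQueue$Lock)",
                    "- locked <0xbe71cd70> (a java.lang.ref.ReferenceQueue$Lock)"]},
]

# Index by thread name for direct lookup.
by_name = {t["threadname"]: t for t in threads}

# Group thread names by run state to see where threads pile up.
by_state = defaultdict(list)
for t in threads:
    by_state[t["runstate"]].append(t["threadname"])

# Threads holding at least one lock are a natural place to start looking.
lock_holders = [t["threadname"] for t in threads
                if any(c.startswith("- locked") for c in t["conditions"])]

for state, names in sorted(by_state.items()):
    print(f"{state}: {len(names)} thread(s): {', '.join(names)}")
print("holding locks:", lock_holders)
```

With a full dump loaded this way, a large group under "waiting for monitor entry" would be the first thing to check.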
