Processing Exchange 2010 Message Tracking Log files with Python
So my day job includes doing some work on MS Exchange 2010. I'm a stats guy and like digging in the log files to get details on the systems I work with.
I was asked to pull some stats out of Exchange and went straight into the Message Tracking logs.
By the unwritten laws of log files you will get a ton of information, some of which you don't need. I identified the fields important to me and wrote this Python script to parse a folder full of log files into a summarised set of .csv files that you can process further.
In my test case, I copy the log files from 4 production servers to a dev box and then run the script there. This way there is minimal performance / disk impact on the production users.
I am *NOT* a Python programmer, so make fun of the code if you wish; I won't be offended. This script is basically a product of much googling and reading of docs.python.org.
# parse Exchange 2010 Message Tracking log files
# (c) Ian Stoffberg
# 0) make folders source and output in the same folder as the script
# 1) put all the exchange logs you want to strip down into the source folder
# 2) delete any old output\output.csv and output\messages.csv (the script appends to them)
# 3) run script (i use cygwin)
# 4) import output.csv into excel
import os

# sourcefolder
logdir = "source"
# outputfile contains the message events
outputfile = os.path.join("output", "output.csv")
# outputfile2 contains the expanded list of recipients and the domain per message
outputfile2 = os.path.join("output", "messages.csv")
writer = open(outputfile, "a")
messagewriter = open(outputfile2, "a")
# wanted is the list of columns you want to keep (1-based column numbers)
wanted = [1,3,5,9,10,14,15,18,19,20,22]
logs = os.listdir(logdir)
messagewriter.write('emaildate,messageid,recipientemail,domain\n')
for log in logs:
    logfile = open(os.path.join(logdir, log))
    for line in logfile:
        try:
            # next line discards commented lines
            if not line.startswith('#'):
                myinputrow = line.rstrip('\r\n').split(',')
                eventid = myinputrow[8]
                # I discard other event types (you might want to see those)
                if eventid == 'SEND' or eventid == 'RECEIVE':
                    myoutputline = ''
                    for item in wanted:
                        myoutputline = myoutputline + myinputrow[item-1] + ','
                    writer.write(myoutputline + '\n')
                    # the date is the part of the timestamp before the 'T'
                    emaildate = myinputrow[0].split('T')[0]
                    recips = myinputrow[11].split(';')
                    for recip in recips:
                        # a trailing ';' leaves an empty entry, skip it
                        if recip != '':
                            domain = recip.split('@')[1]
                            messagewriter.write(emaildate + ',' + myinputrow[9] + ',' + recip + ',' + domain + '\n')
        except IndexError:
            # row too short, or a recipient without an '@': skip it
            print('skipped a malformed line in ' + log)
    logfile.close()
writer.close()
messagewriter.close()
--------------------
This is the basic version I first wrote. I have since tweaked it for my needs, but I figure any scripting guy could adapt this to their needs.
Once you have this info, the sky's the limit.
I imported the data into SQL, created some views based on it and linked to other system data.
I then created reports in Reporting Services which will be available for Managers/Administrators to query.
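If you just want a quick number without a database, here is a minimal sketch that counts recipients per domain. It assumes rows shaped like the messages.csv file the script writes (emaildate, messageid, recipientemail, domain). The sample rows and the function name count_by_domain are invented for illustration.

```python
import csv
from collections import Counter

def count_by_domain(rows):
    # rows: any iterable of CSV lines, the first line being the header
    counts = Counter()
    for row in csv.DictReader(rows):
        counts[row['domain']] += 1
    return counts

# invented sample data in the same shape as messages.csv
sample = [
    'emaildate,messageid,recipientemail,domain',
    '2011-03-01,1001,alice@example.com,example.com',
    '2011-03-01,1001,bob@example.org,example.org',
    '2011-03-02,1002,carol@example.com,example.com',
]

# most_common() lists the busiest domains first
for domain, total in count_by_domain(sample).most_common():
    print(domain + ',' + str(total))
```

To try it on real data, pass an open file object for messages.csv instead of the sample list.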
Why do I prefer Python to other scripting tools in this instance? The code executes fairly quickly. It's very readable and logical to follow. It's easy to move the Python interpreter between machines.
There is a big bug which I will fix and post later. If the subject field has commas in it, the script splits the line in the wrong places and the columns shift, so it might get confused. I still need to test this.
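One possible way around the comma problem is to let Python's csv module split the line instead of a plain split on commas. This is only a sketch: it assumes the log wraps any field containing a comma in double quotes, which I have not confirmed for every Exchange version. The sample row and the helper name split_log_line are made up for illustration.

```python
import csv

def split_log_line(line):
    # csv.reader takes any iterable of lines, so wrap the single line in a list
    return next(csv.reader([line]))

# made-up sample row: the third field is a quoted subject containing a comma
sample = '2011-03-01T08:15:00.000Z,SEND,"Hello, world",user@example.com'

naive = sample.split(',')        # cuts the quoted subject in two
parsed = split_log_line(sample)  # keeps the quoted subject in one piece

print(len(naive))
print(len(parsed))
print(parsed[2])
```

With the naive split every column after the subject moves one place to the right, which is exactly the kind of confusion described above.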
Final note: If you're not too familiar with Python, some words of advice. WHITE SPACE MATTERS. Every indent is there for a reason. The if, while and for statements use indents to indicate program structure. Indentation is not there just to make the code readable; it changes the logic and how the code runs. Mix spaces & tabs at your peril.
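A small made-up example of what that means: the two functions below differ only in the indentation of one line, and they give different answers.

```python
def count_sends_a(events):
    total = 0
    for event in events:
        if event == 'SEND':
            total = total + 1   # inside the if: only SEND events are counted
    return total

def count_sends_b(events):
    total = 0
    for event in events:
        if event == 'SEND':
            pass
        total = total + 1       # one level out: every event is counted
    return total

events = ['SEND', 'RECEIVE', 'DELIVER', 'SEND']
print(count_sends_a(events))    # 2
print(count_sends_b(events))    # 4
```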
I hope this is of use to admins out there and can easily be adapted for any type of .csv log.
Tested on Exchange 2010 SP1. Log formats may differ in other systems.
----
edit: version 2 on the way. substantially improved. fixed the subject comma bug, etc.