Given a combined web server access log, such as the ones generated by Apache, it can be useful to know the total amount of data transferred across all requests in that log. The task is simple: extract the field listing the number of bytes sent for each request, and add the values up. For something so simple, there is an odd lack of examples or pre-made scripts that do this. Or, at least, I couldn’t find any.
I wrote my solution, calculate-data-transfer.py, in Python:
import re
import sys

fileName = sys.argv[1]

# Capture the bytes-sent field, which follows the quoted request and the status code.
compiledExpression = re.compile(r'.*".*" [-0-9]* ([0-9]*)')

totalBytes = 0
with open(fileName, errors="replace") as fpFullLog:
    for line in fpFullLog:
        matches = compiledExpression.match(line)
        if matches is None:
            continue
        byteCount = matches.group(1)
        if len(byteCount) > 0:  # avoid zero-length matches
            totalBytes += int(byteCount)

print("%.2f MiB" % (totalBytes / 2.0**20))

Usage is simple:
% python calculate-data-transfer.py access.log
The script prints the total data transfer in MiB, that is, in units of 2^20 bytes (a power of 2) rather than 10^6 bytes (a power of 10).
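To make the field extraction and the unit choice concrete, here is a minimal sketch. The helper names (`bytes_sent`, `to_mib`, `to_mb`) and the two sample log lines are made up for illustration and are not part of the script; the pattern is the same one the script uses.

```python
import re

# Same pattern as the script: quoted request, status code, then bytes sent.
BYTES_PATTERN = re.compile(r'.*".*" [-0-9]* ([0-9]*)')

# Illustrative combined-format lines (hypothetical data, not from a real log).
SAMPLE_LINE = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
               '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
               '"http://www.example.com/start.html" '
               '"Mozilla/4.08 [en] (Win98; I ;Nav)"')
NO_BODY_LINE = ('127.0.0.1 - - [10/Oct/2000:13:55:40 -0700] '
                '"GET /index.html HTTP/1.0" 304 - "-" "-"')


def bytes_sent(line):
    """Return the bytes-sent field of one log line, or 0 if it is absent."""
    match = BYTES_PATTERN.match(line)
    if match is None or not match.group(1):
        # No match at all, or the field was "-" (zero-length capture).
        return 0
    return int(match.group(1))


def to_mib(byte_count):
    """Convert bytes to mebibytes (units of 2^20 bytes)."""
    return byte_count / 2**20


def to_mb(byte_count):
    """Convert bytes to megabytes (units of 10^6 bytes)."""
    return byte_count / 10**6


if __name__ == "__main__":
    total = bytes_sent(SAMPLE_LINE) + bytes_sent(NO_BODY_LINE)
    print("total bytes:", total)
    # The same byte count gives a slightly smaller number in MiB than in MB.
    print("%.2f MiB vs %.2f MB" % (to_mib(50000000), to_mb(50000000)))
```

A line whose size field is "-" (for example a 304 response) contributes nothing to the total, which is what the zero-length check in the script is for.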
Comments
was it so easy? :))
i tried this script and saw that it is really working.
is bandwidth calculation so easy? :)
thanks!
your regex appears to be off.
Also, it is much easier to do something like:
awk '{ sum += $10 } END { print sum }' access_log
From the command line. You are guaranteed that *nix will have awk. Never know if you'll have python.
mmmm google
It is scary when you google search something and come across people you know. You saved me 5 minutes of work ;)