Given a web server access log in the combined log format, such as the ones Apache generates, it can be useful to know the total amount of data transferred by all the requests in that log. The task is simple: extract the field that records the number of bytes sent for each request, and add them all up. For something so simple, there is an odd lack of examples or ready-made scripts that do this. Or, at least, I couldn't find any.
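To make "the field that records the number of bytes sent" concrete, here is a sketch that pulls that field out of a single combined-format line. The sample line is the example from the Apache documentation. The named-group regular expression and the `bytes_sent` helper are hypothetical illustrations, not part of any library or of the script that follows. In this format the size comes right after the quoted request and the status code. Apache writes `-` instead of `0` when no body was sent, and the field counts only the response body, not the HTTP headers.

```python
import re

# One line in Apache's combined log format (the example from the Apache docs).
SAMPLE = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
          '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"')

# Hypothetical pattern: host, ident, user, [time], "request", status, bytes.
# The request may contain backslash-escaped quotes; referer and user agent
# are not matched at all, so they cannot be mistaken for the size field.
LINE_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]*)\] '
    r'"(?P<request>(?:[^"\\]|\\.)*)" (?P<status>\d{3}|-) (?P<bytes>\d+|-)'
)


def bytes_sent(line):
    """Return the response size recorded on one log line, treating '-' as 0.

    Returns None if the line does not look like a combined-format entry.
    """
    m = LINE_RE.match(line)
    if m is None:
        return None
    size = m.group("bytes")
    return 0 if size == "-" else int(size)


print(bytes_sent(SAMPLE))  # 2326
```

Matching the leading fields explicitly, rather than relying on a greedy `.*`, ties the size to the position right after the request, so a quote followed by two numbers inside a referer or user agent cannot be picked up by accident.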
I wrote my solution, calculate-data-transfer.py, in Python:
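```python
import re
import sys

file_name = sys.argv[1]

# Find a quoted string followed by a status code and capture the digits of the
# response-size field. In the combined log format only the quoted request is
# followed by those two fields. If the size is "-" (no body sent), the group
# matches an empty string.
expression = re.compile(r'.*".*" [-0-9]* ([0-9]*)')

total_bytes = 0
# errors="replace" keeps stray non-UTF-8 bytes in a log from aborting the run.
with open(file_name, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = expression.match(line)
        if match is None:
            continue
        size = match.group(1)
        if size:  # skip zero-length matches ("-" in the size field)
            total_bytes += int(size)

print(f"{total_bytes / 2**20:.2f} MiB")
```
Use is simple: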
```shell
% python calculate-data-transfer.py access.log
```
The script prints the total in MiB (mebibytes), based on a power of 2 (2^20 = 1,048,576 bytes) rather than a power of 10 (10^6 bytes).
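The gap between the two units grows with the total, so it is worth seeing on a concrete number. The `describe` helper below is a hypothetical illustration, not part of the script; it formats the same byte count in both binary and decimal megabytes.

```python
def describe(total_bytes):
    """Format a byte count in binary (MiB) and decimal (MB) megabytes."""
    mib = total_bytes / 2**20   # 1 MiB = 1,048,576 bytes
    mb = total_bytes / 10**6    # 1 MB  = 1,000,000 bytes
    return f"{mib:.2f} MiB = {mb:.2f} MB"


print(describe(500_000_000))  # 476.84 MiB = 500.00 MB
```

At this size the binary figure is already about 5% smaller than the decimal one, which matters when comparing against a hosting provider's bandwidth quota that may be stated in either unit.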