Parsing HTTP server logs
Problem
I have a relatively simple project to parse some HTTP server logs using Python and SQLite. I wrote up the code but I'm always looking for tips on being a better Python scripter. Though this is a simple task and my code works as written, I was hoping for some pointers to improve my coding ability.
Here is some sample data:
```
99.122.86.237 - - [14/Oct/2012:00:01:06 -0400] "GET /epic/running_epic_tier_cover_300w.jpg HTTP/1.1" 200 81804 "http://images.google.com/search?num=10&hl=en&site=&tbm=isch&source=hp&q=epic&oq=epic&gs_l=img.3..0l10.2603.4210.0.4470.5.5.0.0.0.0.137.412.4j1.5.0...0.0...1ac.1.Ycx4MqWP66w&biw=1024&bih=672&sei=mzd6UIjlKMrcqQHM3YGwAg" "Mozilla/5.0 (iPad; CPU O
```

Here is my code:

```
import re
import sqlite3

## Load in the log file
file = open("logs")
rawdata = file.read()

## Create a database
conn = sqlite3.connect("parsed_logs.sqlite3")
c = conn.cursor()

# Build the SQLite database if needed
#c.execute('''CREATE TABLE requests (ip text, date text, requested_url text, response_code int, referer text, agent text);''')
#conn.commit()

## Prepare data
lines = rawdata.split("\n")
for line in lines:
    #print line
    # Parse data from each line
    date = re.findall("\[.*?\]", line)
    date = re.sub("\[", "", date[0])
    date = re.sub("\]", "", date)
    quoted_data = re.findall("\".*?\"", line)
    #print quoted_data
    requested_url = quoted_data[0]
    referer = quoted_data[1]
    agent = quoted_data[2]
    unquoted_data_stream = re.sub("\".*?\"", "", line)
    unquoted_data = unquoted_data_stream.split(" ")
    ip = unquoted_data[0]
    response_code = unquoted_data[6]
    #print ip
    #print date
    #print requested_url
    #print response_code
    #print referer
    #print agent

    ## Insert elements into rows
    c.execute("INSERT INTO requests VALUES (?, ?, ?, ?, ?, ?)", [ip, date, requested_url, response_code, referer, agent])
    conn.commit()

## Check to see if it worked
for row in c.execute("SELECT count(*) from requests"):
    print row
```
Solution
It would probably be better not to read the whole log file into memory at once. A file object can be iterated line by line, so you could try something like this instead:
```
with open('logs', 'r') as f:
    for line in f:
        #print line
        # Parse data from each line
        date = re.findall("\[.*?\]", line)
        ...
```
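To show how the line-by-line approach might look end to end, here is a minimal sketch written for Python 3. It is not the original script: the names `LINE_RE`, `parse_lines` and `load_log` are made up for illustration, and a single regular expression replaces the separate `findall`/`sub` calls. It assumes every line follows the layout of the sample line, and unlike the original code it stores the quoted fields without their surrounding quotes and skips lines that do not match.

```python
import re
import sqlite3

# Hypothetical single pattern for the whole line; assumes the layout of the
# sample line (ip, two ignored fields, [date], "request", status, size,
# "referer", "agent") and no escaped quotes inside the quoted fields.
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_lines(lines):
    """Yield one tuple per line that matches LINE_RE; skip the rest."""
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            yield (
                m.group("ip"),
                m.group("date"),
                m.group("request"),
                int(m.group("status")),
                m.group("referer"),
                m.group("agent"),
            )


def load_log(path, conn):
    """Stream the log file at `path` into the requests table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS requests (ip text, date text, "
        "requested_url text, response_code int, referer text, agent text)"
    )
    with open(path, "r") as f:
        # The file object is consumed lazily, one line at a time,
        # so the whole log never has to sit in memory.
        conn.executemany(
            "INSERT INTO requests VALUES (?, ?, ?, ?, ?, ?)",
            parse_lines(f),
        )
    conn.commit()
```

Calling `load_log("logs", sqlite3.connect("parsed_logs.sqlite3"))` would then fill the same table as the original script. Because `parse_lines` is a generator, the parsing can also be checked on a short list of strings without touching a file or a database.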
Context
StackExchange Code Review Q#17592, answer score: 2