
Parsing HTTP server logs

Submitted by: @import:stackexchange-codereview

Problem

I have a relatively simple project: parsing some HTTP server logs with Python and SQLite. The code below works as written, but I'm always looking for tips on becoming a better Python scripter, so I'd appreciate any pointers on how to improve it.

```python
import re
import sqlite3

## Load in the log file

file = open("logs")
rawdata = file.read()

## Create a database

conn = sqlite3.connect("parsed_logs.sqlite3")
c = conn.cursor()

# Build the SQLite database if needed
#c.execute('''CREATE TABLE requests (ip text, date text, requested_url text, response_code int, referer text, agent text);''')
#conn.commit()

## Prepare data

lines = rawdata.split("\n")
for line in lines:
    #print(line)
    # Parse data from each line
    date = re.findall(r"\[.*?\]", line)
    date = re.sub(r"\[", "", date[0])
    date = re.sub(r"\]", "", date)
    quoted_data = re.findall(r'".*?"', line)
    #print(quoted_data)
    requested_url = quoted_data[0]
    referer = quoted_data[1]
    agent = quoted_data[2]
    unquoted_data_stream = re.sub(r'".*?"', "", line)
    unquoted_data = unquoted_data_stream.split(" ")
    ip = unquoted_data[0]
    response_code = unquoted_data[6]

    #print(ip)
    #print(date)
    #print(requested_url)
    #print(response_code)
    #print(referer)
    #print(agent)

    ## Insert elements into rows
    c.execute("INSERT INTO requests VALUES (?, ?, ?, ?, ?, ?)", [ip, date, requested_url, response_code, referer, agent])

conn.commit()

## Check to see if it worked

for row in c.execute("SELECT count(*) from requests"):
    print(row)
```


Here is some sample data:

```
99.122.86.237 - - [14/Oct/2012:00:01:06 -0400] "GET /epic/running_epic_tier_cover_300w.jpg HTTP/1.1" 200 81804 "http://images.google.com/search?num=10&hl=en&site=&tbm=isch&source=hp&q=epic&oq=epic&gs_l=img.3..0l10.2603.4210.0.4470.5.5.0.0.0.0.137.412.4j1.5.0...0.0...1ac.1.Ycx4MqWP66w&biw=1024&bih=672&sei=mzd6UIjlKMrcqQHM3YGwAg" "Mozilla/5.0 (iPad; CPU O
```

Solution

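It would probably be better not to read the whole log file into memory at once. Iterating over the file object yields one line at a time, and the `with` statement closes the file when the block exits (the original code never calls `file.close()`). You could try something like this instead: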

```python
with open('logs', 'r') as f:
    for line in f:
        #print(line)
        # Parse data from each line
        date = re.findall(r"\[.*?\]", line)
        ...
```
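Taking the streaming idea a step further, here is a minimal sketch that loads the log into SQLite without ever holding the whole file in memory. It assumes the logs are in Apache's "combined" log format (the sample line appears to be), replaces the chain of `findall`/`sub`/`split` calls with one compiled regex using named groups, and feeds a generator to `executemany`. The names `LOG_PATTERN`, `parse_lines` and `load_log` are hypothetical. Unlike the original, it stores the request, referer and agent without their surrounding quotes, stores the status as an integer, and silently skips lines that don't match.

```python
import re
import sqlite3

# Assumption: each line follows the Apache/NCSA combined log format:
# ip ident user [date] "request" status size "referer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ '
    r'\[(?P<date>[^\]]*)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" '
    r'"(?P<agent>[^"]*)"'
)


def parse_lines(lines):
    """Yield (ip, date, request, status, referer, agent) tuples from an iterable of lines."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            # Blank or malformed line (e.g. a trailing empty line), where
            # indexing into an empty findall() result would raise IndexError.
            continue
        yield (
            match.group("ip"),
            match.group("date"),
            match.group("request"),
            int(match.group("status")),
            match.group("referer"),
            match.group("agent"),
        )


def load_log(log_path, db_path):
    """Stream log_path into the requests table of db_path and return the row count."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS requests (ip text, date text, requested_url text, "
            "response_code int, referer text, agent text)"
        )
        # The file is read line by line; `with conn` commits once on success
        # and rolls back if an exception escapes the block.
        with open(log_path) as f, conn:
            conn.executemany(
                "INSERT INTO requests VALUES (?, ?, ?, ?, ?, ?)", parse_lines(f)
            )
        return conn.execute("SELECT count(*) FROM requests").fetchone()[0]
    finally:
        conn.close()
```

With the file and database names from the question, usage would be `print(load_log("logs", "parsed_logs.sqlite3"))`. Keeping parsing in a generator separates it from the database code, so `parse_lines` can be tested on a plain list of strings.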


Context

StackExchange Code Review Q#17592, answer score: 2
