HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Extracting a conversation from a Skype sqlite database

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
conversationskypesqlitedatabaseextractingfrom

Problem

I've made this code to extract a conversation from a Skype database file with a certain user ID, all this being specified as a command line argument.

Since Skype uses SQLite to store the messages I can use the sqlite3 Python module to retrieve the messages.

The code first checks for the database file, opens it if it exists, parses it and finally outputs the chat either to stdout or a specified output file (it uses the partner's ID by default if nothing is provided). The output looks somewhat like what you would expect from an IRC conversation.

I'm quite satisfied with the script in general however I had several issues with unicode characters which made me have to use some "cheats". I'm not satisfied with all these unicode() conversions, and the fact I have to specify once more at the end via encode('utf-8') that I am indeed going to output other things than ASCII. I also think this code could be made more elegant, especially with the SQL query. If I ever want to add more options, such as sorting, I feel like putting everything in the SQL query looks ugly.

```
#!/usr/bin/env python2
# charset=utf-8

from __future__ import print_function

import sys
import sqlite3
import os.path
import datetime
import HTMLParser

def eprint(*args, **kwargs):
print(*args, file=sys.stderr, **kwargs)

def eprint_use():
eprint("usage : " + sys.argv[0] + " [output file]")

# actual code here
# first of all check argv and print the help message if it's wrong

if not len(sys.argv) in [3, 4]:
eprint_use()

else:
database_path = sys.argv[1]
partner_id = sys.argv[2]
output_path = sys.argv[3] if len(sys.argv) == 4 else partner_id + '.txt'

if not os.path.isfile(database_path):
sys.exit('the file %s does not exist' % (database_path))

output_file = sys.stdout if output_path == '-' else open(output_path, 'w')

connection = sqlite3.connect(database_path)
cursor = connection.cursor()

parser = HTMLParser.HTMLParser()

cursor.execut

Solution

First off, you should be using argparse for any argument parsing more complicated than python script.py arg1. It has automatic parsing of sys.argv, allow default values, custom help messages, custom argument types, prints a help message if any arguments mismatch and exits, ...

import argparse
import os.path
import sys

def is_file(f):
    if os.path.isfile(f):
        return f
    raise OSError("{} does not exist".format(f))

parser = argparse.ArgumentParser(descr="Skype history reader")
parser.add_argument('db_path', type=is_file, help="Database path")
parser.add_argument('partner_id', help="Partner to show history of")
parser.add_argument('output_path', help="Location to save output to", default="-")
args = parser.parse_args()

args.output_path = args.output_path or "{.partner_id}.txt".format(args)
output = sys.stdout if args.output == "-" else open(args.output_path, "w")


For your SQL, you should be using Prepared Statements (or the Python implementation of it)

cursor.execute("""
    SELECT timestamp, from_dispname, body_xml
    FROM Messages
    WHERE dialog_partner = ?
    ORDER BY timestamp
    """, partner_id)


This needs paramstyle='qmark'. Otherwise use %s instead of ?. It ensures that partner_id is correctly escaped, so you can't do SQL injection with something like

partner_id = "'; DROP TABLE Messages; SELECT * FROM USERS '"


(Not sure this exact string would work, but something similar to it would be able to delete your whole database. Or print all your chat partners. Or add swearwords to all your messages.)

You should always close your files, otherwise you could get file corruption. Normally I would advise using with open(file_name, "w") as output:, but here we need to also use sys.stdout. Therefore you need to add an explicit output_file.close() at the end, ideally by wrapping your code in a try...finally to ensure it is also closed if some code in between raises an exception.

Likewise, you should close your connection to the db at the end:
connection.close().

Code Snippets

import argparse
import os.path
import sys

def is_file(f):
    if os.path.isfile(f):
        return f
    raise OSError("{} does not exist".format(f))

parser = argparse.ArgumentParser(descr="Skype history reader")
parser.add_argument('db_path', type=is_file, help="Database path")
parser.add_argument('partner_id', help="Partner to show history of")
parser.add_argument('output_path', help="Location to save output to", default="-")
args = parser.parse_args()

args.output_path = args.output_path or "{.partner_id}.txt".format(args)
output = sys.stdout if args.output == "-" else open(args.output_path, "w")
cursor.execute("""
    SELECT timestamp, from_dispname, body_xml
    FROM Messages
    WHERE dialog_partner = ?
    ORDER BY timestamp
    """, partner_id)
partner_id = "'; DROP TABLE Messages; SELECT * FROM USERS '"

Context

StackExchange Code Review Q#148284, answer score: 4

Revisions (0)

No revisions yet.