Recent Entries 10
- gotcha major 32d agoSQLite FTS5 porter stemmer silently over-counts short keywords; special chars in MATCH throw syntax errorsWhen using a SQLite FTS5 virtual table with tokenize='porter' to count keyword/brand mentions in a text corpus, the Porter stemmer conflates short keywords with their stemmed roots. A short keyword gets stemmed to a common English root and MATCH then returns every document containing the unrelated common word. Example seen in the wild: MATCH 'hims' == MATCH 'him' (5,414 rows) because both stem to 'him' — so a keyword's signal is computed against thousands of pronoun-noise documents. Separately, FTS5 MATCH treats &, ., -, and other punctuation as query syntax: passing a token like 'a&f', 'e.l.f.', or 'coca-cola' verbatim raises 'fts5: syntax error near ...' or 'no such column: cola'. If the caller catches that error and returns [], those keywords silently produce ZERO matches on every run — total, invisible signal loss for exactly the hand-curated terms you care most about.
- pattern major 112d agoBulk importing StackExchange XML data dumps into SQLite with streaming XML parsingNeed to import tens of thousands of Q&A entries from StackExchange data dumps (archive.org/details/stackexchange) into a SQLite database. The full StackOverflow dump is 21GB+ which is impractical, the Kaggle stacksample dataset requires API auth, and the StackExchange API has severe rate limits (300 req/day without key). Need an efficient approach to get 50K+ high-quality entries.
- pattern moderate 112d agoFetch live data from SQLiteI'm writing a small Python script which needs to fetch live data from a database (I'm using SQLite 3 for now). I may reuse this function for a lot of scripts. Can this code be made better? ``` def fetch_database(database, table, filter_dict, case=None): """ Takes filter parameter as a dict dict = {'column_name':['filter', 'filter2'], 'column_name2'=[], ..} case = wrap around the dictionary key(UPPER, LOWER, etc..), None database (with pathname ex: database.db table column = dict.keys() """ conn = sqlite3.connect(database) c = conn.cursor() filter_dict = dict_format(filter_dict) # A separate function deletes empty keys from dictionary keys_list = filter_dict.keys() statement = 'SELECT * FROM ' + table if len(keys_list) > 0: statement += ' WHERE ' for keys in keys_list: if case == None: key = keys else: key = case + '(' + keys + ')' temp_data = filter_dict[keys] temp_size = len(temp_data) if keys_list.index(keys) != 0: statement += ' AND ' if temp_size > 0: for data in temp_data: if temp_data.index(data) == 0: statement += key + '="' + data + '"' else: statement += ' OR ' + key + '="' + data + '"' dataset = c.execute(statement) dataset = dataset.fetchall() return dataset ```
- pattern minor 112d agoExpress routing with a login action using SQLiteI'm new to Express and SQL, so I don't know the conventional ways of combining the two. Right now I have done it the following way: ``` app.get('/login', function (req, res) { res.render('login'); }); app.get('/home', function (req, res) { res.render('home'); }); app.post('/login', function (req, res) { db = new sqlite3.Database(file); db.serialize(function () { [...] db.all(query, function (err, rows) { if(rows.length == 1) { [...] res.render('home', { username: rows[0].username }); } else { res.render('login', { message: "Login not successful!" }); } }); }); db.close(); }); ``` However, I feel like the routing should be separated from the database stuff. What should I do different? Or is this normal?
- pattern minor 112d agoBackup a SQLite databaseI want to backup a single SQLite database daily up to 30 days back, but I also want to keep at least 2 backups at all times (i.e. if there have been no backups in the last 30 days because the database didn't change, I don't want to delete old backups). I came up with this simple script that is supposed to run as a daily cronjob: ``` #!/bin/bash BACKUP_FILE="/path/to/db.sqlite3" BACKUP_DIR="$HOME/backups" today=`date "+%Y-%m-%d"` # Less than 31 days old, i.e. 30 days or younger if find "$BACKUP_FILE" -type f -mtime -31 | grep -q . then find "$BACKUP_DIR" -type f -mtime +30 -exec rm {} \; fi last_backup=$(ls -t "$BACKUP_DIR" | head -1) if [ -n "$last_backup" ] && diff "$BACKUP_FILE" "$BACKUP_DIR/$(ls -t "$BACKUP_DIR" | head -1)" >/dev/null then : else cp "$BACKUP_FILE" "$BACKUP_DIR/$today.sqlite3" fi ``` I'm not sure if using a nop in the second `then` clause makes sense, but it seemed cleaner to me than wrapping the condition in a `test` and checking `$?`. The database is used by <20 people and not changed very often, but I'm not sure if I should lock the database anyways - and I'm also not sure how to lock it from a shell script. Since this is the first bash script I've ever written for serious use, I'd appreciate any feedback on how it could be improved.
- pattern minor 112d agoSqlite.swift -- Selecting all rows and returning them in an array of tuplesLink to SQLite wrapper ``` func queryAll() -> [(String, String, String)] { do { let sql = try DB?.prepare("SELECT city, zip , temp FROM weather") var arr : [(city : String, zip : String, temp : String)] = [] for row in sql! { arr += [(city: row[0] as! String, zip: row[1] as! String, temp: row[2] as! String)] } return arr; } catch { print("\(error)") } return [("","","")] } ``` Grabbing all the rows, and returning them as a tuple for later use. I've read the docs but I cannot think of a more efficient way to return all the rows to a TableView ``` override func tableView(_ tableView: UITableView, cellForRowAt indexPath: IndexPath) -> UITableViewCell { let cell : CustomTableViewController = self.tableView.dequeueReusableCell(withIdentifier: "Cell")! as! CustomTableViewController let DB : WeatherDataStorage = WeatherDataStorage() var arr = [(city : String, zip : String, temp : String)]() arr = DB.queryAll() cell.cityLabel.text = arr[i].city cell.zipLabel.text = arr[i].zip cell.temperatureLabel.text = arr[i].temp i += 1 return cell } ``` EDIT: GIST
- pattern minor 112d agoFind my colleaguesI recently finished the Using Databases with Python course. To figure out whether I really understood what was being taught, I build an application which does roughly the same but uses a different formatting. The idea is to have a list with addresses, look up the corresponding coordinates using the Google Maps API, store name/address/coordinates in a SQLite database, retrieve the data from the database and show the retrieved locations on a map. To show my colleagues what keeps me busy during the weekends, I inserted their addresses, plotted a map and showed them the might of Python. Of-course, all data here is example data and not the actual dataset. The real set is less than 200 addresses though, so the difference shouldn't matter much. All code is real. `geoload.py` queries the API for all addresses provided in `locations.data`. The results will be stored in `geodata.sqlite`. All data previously encountered will be skipped instead of retrieved again to allow multi-stage loading (which is helpful in cases of rate limiting). `geodump.py` will retrieve the data from `geodata.sqlite` and put data ready for display in JSON format in `locations.js`. `locations.html` will display a map with markers placed on the locations provided. Hovering over the markers will tell who's stationed there. locations.data `Headquarters, Havenmeesterweg 1, Haarlemmermeer Alice, Spoorstraat 2, Leeuwarden Bob, Stadskanaal Charlie, Maastricht Devon, Woerden Eddy, P.J. Jongstraat, Lutjebroek Freddy, Roosendaal Giles, Almere Harry, Spoorstraat 4, Winterswijk Igor, Middelburg Janine, Terschelling ` locations.js `locations = [ ['Headquarters',52.3046539,4.7588565], ['Alice',53.1975889,5.8055371], ['Bob',52.9919853,6.9462217], ['Charlie',50.8513682,5.6909725], ['Devon',52.0798287,4.8627239], ['Eddy',52.6979589,5.2007523], ['Freddy',51.535849,4.4653213], ['Giles',52.3507849,5.2647016], ['Harry',51.9698835,6.7204984], ['Igor',51.4987962,3.610998], ['Janine',53.3978747,5.3466786] ]; ` geoload
- pattern minor 112d agoExtracting a conversation from a Skype sqlite databaseI've made this code to extract a conversation from a Skype database file with a certain user ID, all this being specified as a command line argument. Since Skype uses SQLite to store the messages I can use the `sqlite3` Python module to retrieve the messages. The code first checks for the database file, opens it if it exists, parses it and finally outputs the chat either to stdout or a specified output file (it uses the partner's ID by default if nothing is provided). The output looks somewhat like what you would expect from an IRC conversation. I'm quite satisfied with the script in general however I had several issues with unicode characters which made me have to use some "cheats". I'm not satisfied with all these `unicode()` conversions, and the fact I have to specify once more at the end via `encode('utf-8')` that I am indeed going to output other things than ASCII. I also think this code could be made more elegant, especially with the SQL query. If I ever want to add more options, such as sorting, I feel like putting everything in the SQL query looks ugly. ``` #!/usr/bin/env python2 # charset=utf-8 from __future__ import print_function import sys import sqlite3 import os.path import datetime import HTMLParser def eprint(*args, **kwargs): print(*args, file=sys.stderr, **kwargs) def eprint_use(): eprint("usage : " + sys.argv[0] + " [output file]") # actual code here # first of all check argv and print the help message if it's wrong if not len(sys.argv) in [3, 4]: eprint_use() else: database_path = sys.argv[1] partner_id = sys.argv[2] output_path = sys.argv[3] if len(sys.argv) == 4 else partner_id + '.txt' if not os.path.isfile(database_path): sys.exit('the file %s does not exist' % (database_path)) output_file = sys.stdout if output_path == '-' else open(output_path, 'w') connection = sqlite3.connect(database_path) cursor = connection.cursor() parser = HTMLParser.HTMLParser() cursor.execut
- pattern minor 112d agoCar leasing system - add lease to databaseI just came back to programming and decided to create a car leasing system (GitHub page here.) to teach myself about databases, gui programming and web development. And I would like some input on what I can imporve on a specific function that I just rewrote. What the function does is add a car lease to the customers name, makes the car unavailable for others to lease and adds a record of the lease. All to the database. I am pretty happy with it but I know there is parts where I can improve. Here is the function: ``` import sqlite3 import datetime conn = sqlite3.connect("db/test.db") c = conn.cursor() current_datetime = datetime.datetime.now() def add_lease(customer_id, car_id, lease_expire): """ Adds lease to customers account and removes the car from available cars list. """ lease_start = current_datetime.strftime("%Y-%m-%d") car_active = c.execute("SELECT IS_LEASED FROM CARS WHERE ID=?", (car_id,)).fetchone()[0] lease_active = c.execute("SELECT ACTIVE_LEASE FROM CUSTOMERS WHERE ID=?", (customer_id,)).fetchone()[0] if not car_active or car_active == None: if not lease_active or lease_active == None: c.execute("UPDATE CARS SET IS_LEASED = 1 WHERE ID = ?", (car_id,)) c.execute("""INSERT INTO LEASES (CAR_ID, LEASE_START, LEASE_EXPIRE, CUSTOMER, IS_ACTIVE) VALUES (?, ?, ?, ?, ?)""", (car_id, lease_start, lease_expire, customer_id, 1,)) lease_id = c.execute("SELECT ID FROM LEASES").fetchall() lease_id = ''.join(c for c in lease_id if c not in '[](),') c.execute("UPDATE CUSTOMERS SET LEASE_ID = ? WHERE ID = ?", (lease_id, customer_id,)) c.execute("UPDATE CUSTOMERS SET ACTIVE_LEASE = 1 WHERE ID = ?", (customer_id,)) conn.commit() else: return 2 ``` If you want to try it out yourself here is the db_setup.py file: ``` import sqlite3 as lite con = lite.connect('db/test.db') with con: cur
- pattern minor 112d agoActivity time & duration trackerThis is my first attempt at creating a simple Python tool that tracks the time of certain activities retrieved from a database. After stopping the timer, this tool writes the duration of the activity into the database. Can you please provide me some tricks to make this code more Pythonic? objects.py ``` #!/usr/bin/env python # -*- coding: utf-8 -*- class Project: '''Class that holds the projects''' projects = [] def __init__(self, name): self.name = name self.tasks = [] type(self).projects.append(self) def __str__(self): return self.name @property def name(self): return self._name @name.setter def name(self, name): if not name: raise ValueError(_('You need to specify a value!')) self._name = name def add_task(self, task): if (task not in self.tasks): self.tasks.append(task) def get_tasks(self): return self.tasks @staticmethod def get_project_by_name(name): for project in Project.projects: if name == project.name: return project class Task: '''A task for a project''' def __init__(self, id, name): self.id = id self.name = name def __str__(self): return '{0}:{1}'.format(self.id, self.name) def __eq__(self, other): return self.id == other.id and self.name == other.name @property def name(self): return self._name @name.setter def name(self, name): if not name: raise ValueError(_('You need to specify a value!')) self._name = name class Entry: '''A timesheet entry that contains the task, the date and the time spent''' def __init__(self, task, date, duration): self.task = task self.date = date self.time_in_minutes = self._get_dur