patternsqlMinor
SQLite: choosing a unique separator
Viewed 0 times
uniquesqliteseparatorchoosing
Problem
I'm working on a project that will result in an SQLite database of about 6 GB of text content (encoded UTF-8). The text will be diverse: it will have a great deal of plain text writing, but also a significant amount of special characters (tildes, backticks, section symbols, mdash's, endash's etc.). There will also be math formulas.
It looks like
Question: What could I use as an
I've grepped the future text with a few ideas and have not identified a separator that isn't in the actual content.
I suppose I could escape whatever separators may be in the text. But I prefer to avoid that option if I can.
It looks like
.import will be the load method.Question: What could I use as an
.separator value that won't be in my text?I've grepped the future text with a few ideas and have not identified a separator that isn't in the actual content.
I suppose I could escape whatever separators may be in the text. But I prefer to avoid that option if I can.
Solution
I was just about to suggest using a multi-character separator or a character from a foreign language (e.g.
I would use a multi-character separator with very low probability to be in the text to import and then parse the document using a custom script. Here is an example in Python3 using
import.csv
sqliteimport.py
日) but .separator does not allow multi-character strings or even a multi-byte characters.I would use a multi-character separator with very low probability to be in the text to import and then parse the document using a custom script. Here is an example in Python3 using
$$$$$$ as a separator, but consider strings like [ŠĐć~^˘°˛˙€] if needed.import.csv
hello $$$ world $$$ 1
foo $$$ bar $$$ 2sqliteimport.py
import sqlite3
import_file_name = "import.csv"
cell_separator = "$$$"
lines = []
insert_query = "INSERT INTO imported (a, b, c) VALUES (?, ?, ?);"
db_connection = sqlite3.connect("database.sqlite")
db_cursor = db_connection.cursor()
db_cursor.execute("CREATE TABLE imported (a TEXT, b TEXT, c INTEGER);")
db_cursor.execute("BEGIN TRANSACTION")
with open(import_file_name, 'r') as import_file:
for line in import_file:
# Split each line at each cell_separator into a list of strings.
# Strip each one of the columns to remove whitespaces.
cleaned_columns = [column.strip()
for column in line.split(cell_separator)]
db_cursor.execute(insert_query, tuple(cleaned_columns))
db_connection.commit()
db_cursor.execute("VACUUM;")
db_cursor.close()
db_connection.close()Code Snippets
hello $$$$$$ world $$$$$$ 1
foo $$$$$$ bar $$$$$$ 2import sqlite3
import_file_name = "import.csv"
cell_separator = "$$$$$$"
lines = []
insert_query = "INSERT INTO imported (a, b, c) VALUES (?, ?, ?);"
db_connection = sqlite3.connect("database.sqlite")
db_cursor = db_connection.cursor()
db_cursor.execute("CREATE TABLE imported (a TEXT, b TEXT, c INTEGER);")
db_cursor.execute("BEGIN TRANSACTION")
with open(import_file_name, 'r') as import_file:
for line in import_file:
# Split each line at each cell_separator into a list of strings.
# Strip each one of the columns to remove whitespaces.
cleaned_columns = [column.strip()
for column in line.split(cell_separator)]
db_cursor.execute(insert_query, tuple(cleaned_columns))
db_connection.commit()
db_cursor.execute("VACUUM;")
db_cursor.close()
db_connection.close()Context
StackExchange Database Administrators Q#124524, answer score: 3
Revisions (0)
No revisions yet.