Myth-busting SQLite3 performance w. pysqlite
Tags: pysqlite, sqlite3, myth, performance, busting
Problem
I've read most of the posts I could find on optimizing SQLite3 performance, such as:
- How do I improve the performance of SQLite?
- Is it possible to insert multiple rows at a time in an SQLite database?
- What are the performance characteristics of sqlite with very large database files?
Which are good, but they have not been updated for a few SQLite releases, so I went for "testing before believing". While testing, I became suspicious of whether my approach to benchmarking the different modes of using SQLite is correct. The reason for asking is that I see a much smaller difference in performance, which is somewhat contrary to the posts I've read, where some report orders of magnitude of difference.
For example, http://www.sqlite.org/pragma.html#pragma_journal_mode states:
The TRUNCATE journaling mode commits transactions by truncating the rollback journal to zero-length instead of deleting it. On many systems, truncating a file is much faster than deleting the file since the containing directory does not need to be changed.
But I cannot replicate the drop/increase in performance that one would "expect" from performing single transactions in the different modes. So I wonder: has SQLite3 evolved so much that these differences have become a legend of the past, or did I just miss something?
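For reference, the kind of benchmark involved can be sketched with the standard sqlite3 module (the file name, row count, and set of journal modes tested here are my own choices for illustration, not taken from my actual test):

```python
import os
import sqlite3
import time

def time_inserts(journal_mode, n=100):
    """Time n single-statement transactions under the given journal mode."""
    path = "journal_test.db"
    for f in (path, path + "-journal"):
        if os.path.exists(f):
            os.remove(f)
    conn = sqlite3.connect(path, isolation_level=None)  # autocommit mode
    conn.execute("PRAGMA journal_mode = %s" % journal_mode)
    conn.execute("CREATE TABLE t (x INTEGER)")
    start = time.time()
    for i in range(n):
        # in autocommit mode each INSERT is its own transaction, so the
        # rollback journal is created and deleted/truncated per statement
        conn.execute("INSERT INTO t VALUES (?)", (i,))
    elapsed = time.time() - start
    conn.close()
    for f in (path, path + "-journal"):
        if os.path.exists(f):
            os.remove(f)
    return elapsed

for mode in ("DELETE", "TRUNCATE", "PERSIST"):
    print("%-9s %.3f s" % (mode, time_inserts(mode)))
```

With every INSERT in its own transaction, any journal-handling difference should be maximally visible; wrapping all inserts in one transaction hides it almost entirely.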
Update: Yes something was missing:
The documentation states that:
By default, the sqlite3 module opens transactions implicitly before a Data Modification Language (DML) statement (i.e. INSERT/UPDATE/DELETE/REPLACE), and commits transactions implicitly before a non-DML, non-query statement (i.e. anything other than SELECT or the aforementioned).
So if you are within a transaction and issue a command like CREATE TABLE ..., VACUUM, PRAGMA, the sqlite3 module will commit implicitly before executing that command.
This might not be the case, as pysqlite appears to cycle through the operations without executing a commit between "execute" statements. I don't know why.
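On a modern Python 3 (3.6+) sqlite3 module, this can be observed directly via `Connection.in_transaction`; a minimal sketch:

```python
import sqlite3

# default isolation_level: the module issues an implicit BEGIN before DML
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")

conn.execute("INSERT INTO t VALUES (1)")
assert conn.in_transaction       # a transaction was opened implicitly

conn.execute("INSERT INTO t VALUES (2)")
assert conn.in_transaction       # ...and no commit happened in between

conn.commit()
assert not conn.in_transaction   # only an explicit commit() ends it
```

So consecutive execute() calls share one transaction until commit() is called, which is exactly why per-statement journal-mode costs may not show up in a naive benchmark.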
Solution
I have just had a similar experience using Another Python SQLite Wrapper (APSW), which is an alternative to pysqlite that tends to be a bit faster. The PRAGMA options make practically no difference on a database of ~2 million rows (6 columns, also doing some string formatting on the values being entered):
- 753,347 rows/minute with all PRAGMAs at default (i.e. nothing set)
- 764,967 rows/minute with:
- synchronous = OFF
- locking_mode = Exclusive
- page_size = 4096
- journal_mode = WAL
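For reference, the settings listed above could be applied from Python like this (a sketch using the standard sqlite3 module rather than APSW; the file location is arbitrary):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.db")
conn = sqlite3.connect(path)

conn.execute("PRAGMA page_size = 4096")          # must be set before the first write
conn.execute("PRAGMA journal_mode = WAL")        # persistent: stored in the database file
conn.execute("PRAGMA synchronous = OFF")         # per-connection
conn.execute("PRAGMA locking_mode = EXCLUSIVE")  # per-connection

# querying journal_mode reports the mode actually in effect
mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
print(mode)  # "wal"
```

Note that synchronous and locking_mode must be re-applied on every new connection, whereas journal_mode = WAL sticks with the database file.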
Tests were run three times, values averaged. I determined the PRAGMA values for each of these settings iteratively by testing a range of values, each of which was slightly faster, but the overall improvement is negligible.
Although not exactly relevant to your post (but relevant to people using Python with SQLite), one thing that makes a massive performance difference is generating table indexes after adding values. Including the time to create two indexes on the table after the values are inserted, the numbers above drop to:
- 691,774 rows/minute with all PRAGMAs at default (i.e. nothing set)
- 711,995 rows/minute with PRAGMAs set as described above
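The create-indexes-after-load pattern looks like this with the standard sqlite3 module (the table layout and row count are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")

rows = [(i, str(i)) for i in range(100000)]

# bulk-load first: no index b-trees need updating on each INSERT
# (executemany runs inside one implicit transaction here)
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
conn.commit()

# then build each index in a single pass over the loaded data
conn.execute("CREATE INDEX idx_a ON t (a)")
conn.execute("CREATE INDEX idx_b ON t (b)")
conn.commit()
```

Creating the indexes first instead would force SQLite to update both index b-trees on every inserted row, which is where the slowdown reported below comes from.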
However, creating two indexes on the table before adding the values, and running the same tests as above, I get a ~7-fold reduction:
- 119,005 rows/minute with all PRAGMAs at default (i.e. nothing set)
- 114,209 rows/minute with PRAGMAs set as described above
This shows a slight speed decrease with the PRAGMAs set; however, that could just be within the margin of error (due to disk load, file fragmentation, etc.), as I only ran this latter test once for each condition.
Context
StackExchange Code Review Q#26822, answer score: 8