HiveBrain v1.2.0
Get Started
← Back to all entries
debugsqlMinor

Is COMPACT a better format for fixed length rows than DYNAMIC?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
rowsformatlengththanbetterfordynamicfixedcompact

Problem

In MySQL / MariaDB I have this table with fixed-length rows (no VARCHAR, TEXT, etc)
CREATE TABLE trigram (
id BIGINT(20) NOT NULL,
trigram CHAR(3) NOT NULL COLLATE 'utf8mb4_general_ci',
PRIMARY KEY (trigram, id) USING BTREE,
INDEX id (id) USING BTREE
)
COLLATE='utf8mb4_general_ci' ENGINE=InnoDB ROW_FORMAT=COMPACT;


The table has tens of megarows, and gets production queries of this form
SELECT id FROM trigram
WHERE trigram IN ('dba', 'ba.', 'a.s', '.st', 'sta', 'tac', 'ack')
GROUP BY ID HAVING COUNT(*) = 7


as well as INSERTs and DELETE FROM trigram WHERE id = 12345 maintenance queries. The indexes are appropriate for the table's query patterns.

This table is a poor-man's trigram index. (This poor man can't upgrade to postgreSQL and use their built in trigram indexes, sigh.) The sample query looks for ids that have 'dba.stack' strings in them. It's waaaay faster than content_column LIKE '%dba.stack%' once the trigram table is built.

Edit: What do I mean by "better"? Faster, more reliable, less buffer pool flushing in production, less maintenance burden on non-DBA users.

Question: Should I define this fixed-length-row table with ROW_FORMAT=COMPACT? Or is DYNAMIC called for? I've noticed it takes quite a bit less disk space with COMPACT.

Question: Any other suggestions or performance things to worry about?

My users (WordPress.org software users) are mostly on MariaDB 10.3+, but some are on MySQL 8, and a few are on MySQL 5.7-. I don't need to support Antelope or MyISAM legacy stuff.

Another edit:

My IN() query does a range scan on a test data set with 180K rows in the table. The JOIN to the UNION table suggested in an answer does a nested loop. The range scan takes less time. True on MariaDB 10.11, MySQL 8, and MySQL 5.7. For what it's worth. Looks like the skip-scan optimization works pretty well.

Solution

Another option would be to use multiple joins, to avoid grouping and counting and leveraging the fact that after the first few joins there are less and less rows satisfying the conditions:

SELECT t1.id FROM trigram t1
JOIN trigram t2 ON (t1.id=t2.id)
JOIN trigram t3 ON (t1.id=t3.id)
JOIN trigram t4 ON (t1.id=t4.id)
JOIN trigram t5 ON (t1.id=t5.id)
JOIN trigram t6 ON (t1.id=t6.id)
JOIN trigram t7 ON (t1.id=t7.id)
WHERE t1.trigram = 'dba' and t2.trigram ='ba.' 
  and t3.trigram = 'a.s' and t4.trigram ='.st' 
  and t5.trigram = 'sta' and t6.trigram ='tac' 
  and t7.trigram = 'ack'

Code Snippets

SELECT t1.id FROM trigram t1
JOIN trigram t2 ON (t1.id=t2.id)
JOIN trigram t3 ON (t1.id=t3.id)
JOIN trigram t4 ON (t1.id=t4.id)
JOIN trigram t5 ON (t1.id=t5.id)
JOIN trigram t6 ON (t1.id=t6.id)
JOIN trigram t7 ON (t1.id=t7.id)
WHERE t1.trigram = 'dba' and t2.trigram ='ba.' 
  and t3.trigram = 'a.s' and t4.trigram ='.st' 
  and t5.trigram = 'sta' and t6.trigram ='tac' 
  and t7.trigram = 'ack'

Context

StackExchange Database Administrators Q#338029, answer score: 3

Revisions (0)

No revisions yet.