patternsqlMinor
Most efficient (or meaningful) way to index grid-based data
Viewed 0 times
gridmostefficientwaybasedindexdatameaningful
Problem
I have a MySQL table which I expect to be in surplus of a few million rows and 99% select statements. The problem I am having is coming up with a meaningful way to determine the Primary Key. (I have provided a table dump at the bottom for reference)
For some background, I am working with a 2-D grid, whose ranges vary from approximately -800000 to +800000 in each direction. Each row is identified by its X/Z coordinate, for which each coordinate may have 1-30 associated bitstrings (type).
The concerns I have are:
1) I lack a meaningful way to Primary key this table. While I know I can create an
2) I intend to index on multiple columns (the logical x/z), via:
While I feel this appropriately addresses my real-world selects, I am constantly concerned it is insufficient indexing, based on the dozens of posts saying 'YES, YOU NEED A PK'. Is this a case where a PK can be ignored or is the arbitrary
As a side note, I have absolutely no limitations on completely restructuring this table, if there is are any more practical, proven ways to store this sort of data.
For some background, I am working with a 2-D grid, whose ranges vary from approximately -800000 to +800000 in each direction. Each row is identified by its X/Z coordinate, for which each coordinate may have 1-30 associated bitstrings (type).
The concerns I have are:
1) I lack a meaningful way to Primary key this table. While I know I can create an
id field with auto_increment, I know that in practice this key will never be used, as 100% of SELECTS will be in the form of:SELECT `type`, `offset`, `bitstring` WHERE `x` = 0 AND `z` = 0;2) I intend to index on multiple columns (the logical x/z), via:
CREATE INDEX coordinate ON bitstrings(x, z)While I feel this appropriately addresses my real-world selects, I am constantly concerned it is insufficient indexing, based on the dozens of posts saying 'YES, YOU NEED A PK'. Is this a case where a PK can be ignored or is the arbitrary
id still ultimately going to provide some behind-the-scenes optimization well worth the additional table size and column?As a side note, I have absolutely no limitations on completely restructuring this table, if there is are any more practical, proven ways to store this sort of data.
CREATE TABLE `bitstrings` (
`x` int(11) NOT NULL COMMENT 'roughly +/- 10^6 range',
`z` int(11) NOT NULL COMMENT 'roughly +/- 10^6 range',
`type` smallint(6) NOT NULL COMMENT 'range: 1-4096',
`offset` smallint(6) NOT NULL COMMENT 'range: 1-65535',
`bitstring` blob NOT NULL COMMENT 'binary data len: 1-8192'
) ENGINE=MyISAM DEFAULT CHARSET=utf-8;
--
-- Dumping data for table `bitstrings`
--
INSERT INTO `bitstrings` (`x`, `z`, `type`, `offset`, `bitstring`) VALUES
(0, 0, 1, 0, 0x52),
(0, 0, 2, 1878, 0x52);Solution
Two things
Make these chnages
When it comes to indexes, you may have a choice
CHOICE #1 : Use a primary key of ID
Setup the table like this
This could allow you to query using x and z, while retrieving all ids to have a quick reference back to the row:
If you ever have to retrieve the data for a given id, you could select it as
This will retrieve the specific info for whatever x, z and type are there.
CHOICE #2 : Use x,z,type as the PRIMARY KEY
Run this query
If this query comes back with no rows at all, this can be your primary key
CONCLUSION
Since all queries are based on mainly on x and z, either choice would be OK
- Since range of x and z are +/- 1000000, use
meduimint(Range of meduimint is +/- 8388608)
- Since ranges of offset and type is 1-65535, use
smallint unsigned
Make these chnages
CREATE TABLE `bitstrings` (
`x` meduimint(11) NOT NULL COMMENT 'roughly +/- 10^6 range',
`z` meduimint(11) NOT NULL COMMENT 'roughly +/- 10^6 range',
`type` smallint(6) UNSIGNED NOT NULL COMMENT 'range: 1-4096',
`offset` smallint(6) UNSIGNED NOT NULL COMMENT 'range: 1-65535',
`bitstring` blob NOT NULL COMMENT 'binary data len: 1-8192'
) ENGINE=MyISAM DEFAULT CHARSET=utf-8;When it comes to indexes, you may have a choice
CHOICE #1 : Use a primary key of ID
Setup the table like this
CREATE TABLE `bitstrings` (
`id` int unsigned not null auto_incrmenet,
`x` meduimint(11) NOT NULL COMMENT 'roughly +/- 10^6 range',
`z` meduimint(11) NOT NULL COMMENT 'roughly +/- 10^6 range',
`type` smallint(6) UNSIGNED NOT NULL COMMENT 'range: 1-4096',
`offset` smallint(6) UNSIGNED NOT NULL COMMENT 'range: 1-65535',
`bitstring` blob NOT NULL COMMENT 'binary data len: 1-8192',
primary key (id),
key xztype (x,z,type)
) ENGINE=MyISAM DEFAULT CHARSET=utf-8;This could allow you to query using x and z, while retrieving all ids to have a quick reference back to the row:
SELECT id FROM bitstrings WHERE x=0 and z=0;If you ever have to retrieve the data for a given id, you could select it as
SELECT * FROM bitstrings WHERE id = 12;This will retrieve the specific info for whatever x, z and type are there.
CHOICE #2 : Use x,z,type as the PRIMARY KEY
Run this query
SELECT COUNT(1) rcount,x,z,type FROM bitstrings GROUP BY x,z,type HAVING COUNT(1) > 1;If this query comes back with no rows at all, this can be your primary key
CREATE TABLE `bitstrings` (
`x` meduimint(11) NOT NULL COMMENT 'roughly +/- 10^6 range',
`z` meduimint(11) NOT NULL COMMENT 'roughly +/- 10^6 range',
`type` smallint(6) UNSIGNED NOT NULL COMMENT 'range: 1-4096',
`offset` smallint(6) UNSIGNED NOT NULL COMMENT 'range: 1-65535',
`bitstring` blob NOT NULL COMMENT 'binary data len: 1-8192',
primary key (x,z,type)
) ENGINE=MyISAM DEFAULT CHARSET=utf-8;CONCLUSION
Since all queries are based on mainly on x and z, either choice would be OK
- The first choice would yield a bigger .MYI file, but two ways to gather data (grouped by x,z or granularly by id).
- The second choice only gathers by x and z but a smaller .MYI file.
Code Snippets
CREATE TABLE `bitstrings` (
`x` meduimint(11) NOT NULL COMMENT 'roughly +/- 10^6 range',
`z` meduimint(11) NOT NULL COMMENT 'roughly +/- 10^6 range',
`type` smallint(6) UNSIGNED NOT NULL COMMENT 'range: 1-4096',
`offset` smallint(6) UNSIGNED NOT NULL COMMENT 'range: 1-65535',
`bitstring` blob NOT NULL COMMENT 'binary data len: 1-8192'
) ENGINE=MyISAM DEFAULT CHARSET=utf-8;CREATE TABLE `bitstrings` (
`id` int unsigned not null auto_incrmenet,
`x` meduimint(11) NOT NULL COMMENT 'roughly +/- 10^6 range',
`z` meduimint(11) NOT NULL COMMENT 'roughly +/- 10^6 range',
`type` smallint(6) UNSIGNED NOT NULL COMMENT 'range: 1-4096',
`offset` smallint(6) UNSIGNED NOT NULL COMMENT 'range: 1-65535',
`bitstring` blob NOT NULL COMMENT 'binary data len: 1-8192',
primary key (id),
key xztype (x,z,type)
) ENGINE=MyISAM DEFAULT CHARSET=utf-8;SELECT id FROM bitstrings WHERE x=0 and z=0;SELECT * FROM bitstrings WHERE id = 12;SELECT COUNT(1) rcount,x,z,type FROM bitstrings GROUP BY x,z,type HAVING COUNT(1) > 1;Context
StackExchange Database Administrators Q#16086, answer score: 5
Revisions (0)
No revisions yet.