patternsqlMinor
Indexes/optimization for IN, JOIN, GROUP BY, ORDER BY query
Viewed 0 times
ordergroupqueryjoinindexesoptimizationfor
Problem
I'm working on a query where I need to use
Some considerations
Table Structures
Table 1
Table 2
Most Basic Data I want to retreive
Want to get 20 (Pagination) unique visitors (
Query 1
First query if I write down for this then it would be:
```
SELECT
t1.visitor_id
FROM table_1 AS t1
INNER JOIN table_2 AS t2 ON t2.visitor_id = t1.visitor_id
WHERE
t2.company_id IN (528,211,1275,521,1299,493,492,852,868,869,1235,486,485,1238,855,1237,651,538,1241,1240,548,543,1247,1253,490,468,582,583,569,477,488,802,1294,518,
IN, BETWEEN, GROUP BY, JOIN, ORDER BY all in one query. I am struggling with performance for that query, so I need help to choose indexes or to make changes to table structures if indexes won't help.Some considerations
- Number of rows for both below tables are in
millions.
- There is functionality where user can filter list by
name,age,genderetc.
- There is functionality where user can sort list by some metrics like
age,visits_countetc.
- Need Pagination for list.
Table Structures
Table 1
CREATE TABLE `table_1` (
`visitor_id` varchar(32) CHARACTER SET ascii NOT NULL,
`name` varchar(200) NOT NULL,
`gender` varchar(1) NOT NULL DEFAULT 'M',
`mobile_number` int(10) unsigned DEFAULT NULL,
`age` tinyint(1) unsigned NOT NULL DEFAULT '1',
`visits_count` mediumint(5) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`visitor_id`),
KEY `indx_t1_test` (`visitor_id`,`visits_count`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8Table 2
CREATE TABLE `table_2` (
`company_id` bigint(20) unsigned NOT NULL,
`visitor_id` varchar(32) CHARACTER SET ascii NOT NULL,
`time_duration` mediumint(5) unsigned NOT NULL DEFAULT '0',
`visited_on` date NOT NULL,
PRIMARY KEY (`company_id`,`visitor_id`,`visited_on`),
KEY `indx_t2_test` (`visited_on`,`company_id`,`visitor_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8Most Basic Data I want to retreive
Want to get 20 (Pagination) unique visitors (
GROUP BY / DISTINCT) that visited particular group of companies (IN part) between selected time (BETWEEN part) period order by their age (ORDER BY part).Query 1
First query if I write down for this then it would be:
```
SELECT
t1.visitor_id
FROM table_1 AS t1
INNER JOIN table_2 AS t2 ON t2.visitor_id = t1.visitor_id
WHERE
t2.company_id IN (528,211,1275,521,1299,493,492,852,868,869,1235,486,485,1238,855,1237,651,538,1241,1240,548,543,1247,1253,490,468,582,583,569,477,488,802,1294,518,
Solution
age int(3) unsigned - That allows you to store ages up to 4 billion and wastes 4 bytes. Change to TINYINT UNSIGNED (1 bytes).Ascii for names? Limited to the US? Even so, disallowing some odd names.
I'm puzzled by t2's
PRIMARY KEY. Since the PK is Unique, this disallows recording more than one visit to a company for a person. If the restriction is OK, add this (in case the Optimizer decides that the data range is the best filter):INDEX(visited_on, conpany_id, visitor_id)If my hunch is correct then change the PK and add an index:
PRIMARY KEY(`company_id`, `visitor_id`, visited_on),
INDEX(visited_on, conpany_id, visitor_id)Then check out your various Queries.
Code Snippets
INDEX(visited_on, conpany_id, visitor_id)PRIMARY KEY(`company_id`, `visitor_id`, visited_on),
INDEX(visited_on, conpany_id, visitor_id)Context
StackExchange Database Administrators Q#163999, answer score: 2
Revisions (0)
No revisions yet.