HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Indexes/optimization for IN, JOIN, GROUP BY, ORDER BY query

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
ordergroupqueryjoinindexesoptimizationfor

Problem

I'm working on a query where I need to use IN, BETWEEN, GROUP BY, JOIN, ORDER BY all in one query. I am struggling with performance for that query, so I need help to choose indexes or to make changes to table structures if indexes won't help.

Some considerations

  • Number of rows for both below tables are in millions.



  • There is functionality where user can filter list by name, age, gender etc.



  • There is functionality where user can sort list by some metrics like age, visits_count etc.



  • Need Pagination for list.



Table Structures

Table 1

CREATE TABLE `table_1` (
  `visitor_id` varchar(32) CHARACTER SET ascii NOT NULL,
  `name` varchar(200) NOT NULL,
  `gender` varchar(1) NOT NULL DEFAULT 'M',
  `mobile_number` int(10) unsigned DEFAULT NULL,
  `age` tinyint(1) unsigned NOT NULL DEFAULT '1',
  `visits_count` mediumint(5) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`visitor_id`),
  KEY `indx_t1_test` (`visitor_id`,`visits_count`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8


Table 2

CREATE TABLE `table_2` (
  `company_id` bigint(20) unsigned NOT NULL,
  `visitor_id` varchar(32) CHARACTER SET ascii NOT NULL,
  `time_duration` mediumint(5) unsigned NOT NULL DEFAULT '0',
  `visited_on` date NOT NULL,
  PRIMARY KEY (`company_id`,`visitor_id`,`visited_on`),
  KEY `indx_t2_test` (`visited_on`,`company_id`,`visitor_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8


Most Basic Data I want to retreive

Want to get 20 (Pagination) unique visitors (GROUP BY / DISTINCT) that visited particular group of companies (IN part) between selected time (BETWEEN part) period order by their age (ORDER BY part).

Query 1

First query if I write down for this then it would be:

```
SELECT
t1.visitor_id
FROM table_1 AS t1
INNER JOIN table_2 AS t2 ON t2.visitor_id = t1.visitor_id
WHERE
t2.company_id IN (528,211,1275,521,1299,493,492,852,868,869,1235,486,485,1238,855,1237,651,538,1241,1240,548,543,1247,1253,490,468,582,583,569,477,488,802,1294,518,

Solution

age int(3) unsigned - That allows you to store ages up to 4 billion and wastes 4 bytes. Change to TINYINT UNSIGNED (1 bytes).

Ascii for names? Limited to the US? Even so, disallowing some odd names.

I'm puzzled by t2's PRIMARY KEY. Since the PK is Unique, this disallows recording more than one visit to a company for a person. If the restriction is OK, add this (in case the Optimizer decides that the data range is the best filter):

INDEX(visited_on, conpany_id, visitor_id)


If my hunch is correct then change the PK and add an index:

PRIMARY KEY(`company_id`, `visitor_id`, visited_on),
INDEX(visited_on, conpany_id, visitor_id)


Then check out your various Queries.

Code Snippets

INDEX(visited_on, conpany_id, visitor_id)
PRIMARY KEY(`company_id`, `visitor_id`, visited_on),
INDEX(visited_on, conpany_id, visitor_id)

Context

StackExchange Database Administrators Q#163999, answer score: 2

Revisions (0)

No revisions yet.