patternsqlMinor
sql indexing and performance
Viewed 0 times
sqlindexingandperformance
Problem
I am using mysql5.1, i have table which has about 15 lakh (1.5 million) records.This table has records for different entities i.e child records for all master entities.
There are 8 columns in this table , out of which 6 columns are clubbed to make a primary key. These columns could be individual foreign keys but due to performance we have made this change.
Below is the show create table output
As you can see I have four indexes
Below are the two queries which take long time 4.26 and 0.64 respectively.
And this one
```
SELECT *
FROM hybrid_exp_trait_dtl h
where h.experiment_id='NSKOK12K1102'
and h.TRAIT
There are 8 columns in this table , out of which 6 columns are clubbed to make a primary key. These columns could be individual foreign keys but due to performance we have made this change.
Below is the show create table output
CREATE TABLE `hybrid_exp_trait_dtl` (
`HYBRID_NUMBER` varchar(100) NOT NULL,
`TRIAL_STATUS` char(4) NOT NULL,
`LOCATION_CODE` char(5) NOT NULL,
`EXPERIMENT_ID` char(20) NOT NULL,
`REPLICATION_NUMBER` int(3) unsigned NOT NULL,
`TRAIT_NAME` varchar(30) NOT NULL,
`TRAIT_VALUE` varchar(50) DEFAULT NULL,
`MACHINE_NO` char(2) NOT NULL,
`USER_ID` char(8) NOT NULL,
`MYTIME` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
KEY `FK_HYBRID_EXP_TRAIT_DTL_2` (`LOCATION_CODE`),
KEY `FK_hybrid_exp_trait_dtl_3` (`USER_ID`),
KEY `Index_4` (`HYBRID_NUMBER`, `EXPERIMENT_ID`, `TRIAL_STATUS`, `LOCATION_CODE`, `REPLICATION_NUMBER`, `TRAIT_NAME`, `USER_ID`) USING BTREE,
CONSTRAINT `FK_HYBRID_EXP_TRAIT_DTL_2` FOREIGN KEY (`LOCATION_CODE`) REFERENCES `location_mst` (`LOCATION_CODE`),
CONSTRAINT `FK_hybrid_exp_trait_dtl_3` FOREIGN KEY (`USER_ID`) REFERENCES `user_data` (`USER_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1As you can see I have four indexes
- KEY
FK_HYBRID_EXP_TRAIT_DTL_2(LOCATION_CODE)
- KEY
FK_hybrid_exp_trait_dtl_3(USER_ID)
- KEY
Index_4(HYBRID_NUMBER,EXPERIMENT_ID,TRIAL_STATUS,LOCATION_CODE,REPLICATION_NUMBER,TRAIT_NAME,USER_ID) USING BTREE
Below are the two queries which take long time 4.26 and 0.64 respectively.
select distinct
location_code,
Max(replication_number) as replication
from hybrid_exp_trait_dtl t
where t.experiment_id='NSKOK12K1102' and t.trial_status='MLT';And this one
```
SELECT *
FROM hybrid_exp_trait_dtl h
where h.experiment_id='NSKOK12K1102'
and h.TRAIT
Solution
I took the liberty of editing your SQL so the clauses are on separate lines. I violated CodeReview group discipline by doing that. Sorry! My suggestion to you is to format your queries so the various clauses are clearly visible, as I did. This kind of formatting, in my experience, makes it easier to understand the logic of queries and to spot errors.
You've done a good job declaring your columns
Your first query seems to be illogical, as it combines an aggregate function with a
It looks like you're looking for a resultset with a row for each location that shows the the largest replication_number at that location. Do you want this query instead?
Your
Your second query will most likely be a lot faster if you change your fourth compound key (the one you've named
That is because your query looks for exact values of
Now, you may have some other query that needs to have
You asked what effect these indexes will have on load performance. It's hard to answer that question exactly without knowing a lot about your load process.
You don't have any indexes that call for uniqueness. That is very good for loading performance.
But, there are a few things you can do to speed up loading. They are detailed here. The most important speed trick is probably to avoid autocommit. Start your import process with this SQL command.
Then do a few hundred
Don't try to do too many
You can also turn of foreign key checks while loading. See the web page mentioned above. But be careful. If your loaded data violates foreign key constraints, when you try to turn checking back on you'll have a mess.
When you're done with a load operation that changes a large fraction of the rows, do
Your table will be unavailable while this
You've done a good job declaring your columns
NOT NULL where possible. That helps performance a lot.Your first query seems to be illogical, as it combines an aggregate function with a
DISTINCT qualifier. MySQL allows a lot of sloppiness in aggregate queries unfortunately.It looks like you're looking for a resultset with a row for each location that shows the the largest replication_number at that location. Do you want this query instead?
select location_code,
Max(replication_number) as replication
from hybrid_exp_trait_dtl t
where t.experiment_id='NSKOK12K1102' and t.trial_status='MLT'
group by location_codeYour
Index_4 will allow this query to be satisfied using a so-called loose index scan, which is really very fast.Your second query will most likely be a lot faster if you change your fourth compound key (the one you've named
hbrid) to have the following order.(EXPERIMENT_ID,TRAIT_NAME,HYBRID_NUMBER)That is because your query looks for exact values of
EXPERIMENT_ID and TRAIT_NAME. When those two columns come first in the index, the query can jump to the exact position in the index and then sequentially reel off your results.Now, you may have some other query that needs to have
HYBRID_NUMBER first, but you didn't mention it in your question.You asked what effect these indexes will have on load performance. It's hard to answer that question exactly without knowing a lot about your load process.
You don't have any indexes that call for uniqueness. That is very good for loading performance.
But, there are a few things you can do to speed up loading. They are detailed here. The most important speed trick is probably to avoid autocommit. Start your import process with this SQL command.
SET autocommit=0;Then do a few hundred
INSERT or UPDATE statements in a row, then do COMMIT;Don't try to do too many
INSERT or UPDATE statements before COMMIT or you'll use up a lot of RAM with transaction buffers.You can also turn of foreign key checks while loading. See the web page mentioned above. But be careful. If your loaded data violates foreign key constraints, when you try to turn checking back on you'll have a mess.
When you're done with a load operation that changes a large fraction of the rows, do
OPTIMIZE LOCAL TABLE hybrid_exp_trait_dtl;Your table will be unavailable while this
OPTIMIZE operation runs: it rebuilds the table and indexes.Code Snippets
select location_code,
Max(replication_number) as replication
from hybrid_exp_trait_dtl t
where t.experiment_id='NSKOK12K1102' and t.trial_status='MLT'
group by location_code(EXPERIMENT_ID,TRAIT_NAME,HYBRID_NUMBER)SET autocommit=0;OPTIMIZE LOCAL TABLE hybrid_exp_trait_dtl;Context
StackExchange Database Administrators Q#63417, answer score: 2
Revisions (0)
No revisions yet.