HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

sql indexing and performance

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
sqlindexingandperformance

Problem

I am using mysql5.1, i have table which has about 15 lakh (1.5 million) records.This table has records for different entities i.e child records for all master entities.

There are 8 columns in this table , out of which 6 columns are clubbed to make a primary key. These columns could be individual foreign keys but due to performance we have made this change.

Below is the show create table output

CREATE TABLE `hybrid_exp_trait_dtl` (
  `HYBRID_NUMBER` varchar(100) NOT NULL,
  `TRIAL_STATUS` char(4) NOT NULL,
  `LOCATION_CODE` char(5) NOT NULL,
  `EXPERIMENT_ID` char(20) NOT NULL,
  `REPLICATION_NUMBER` int(3) unsigned NOT NULL,
  `TRAIT_NAME` varchar(30) NOT NULL,
  `TRAIT_VALUE` varchar(50) DEFAULT NULL,
  `MACHINE_NO` char(2) NOT NULL,
  `USER_ID` char(8) NOT NULL,
  `MYTIME` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  KEY `FK_HYBRID_EXP_TRAIT_DTL_2` (`LOCATION_CODE`),
  KEY `FK_hybrid_exp_trait_dtl_3` (`USER_ID`),
  KEY `Index_4` (`HYBRID_NUMBER`, `EXPERIMENT_ID`, `TRIAL_STATUS`, `LOCATION_CODE`, `REPLICATION_NUMBER`, `TRAIT_NAME`, `USER_ID`) USING BTREE,
  CONSTRAINT `FK_HYBRID_EXP_TRAIT_DTL_2` FOREIGN KEY (`LOCATION_CODE`) REFERENCES `location_mst` (`LOCATION_CODE`),
  CONSTRAINT `FK_hybrid_exp_trait_dtl_3` FOREIGN KEY (`USER_ID`) REFERENCES `user_data` (`USER_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1


As you can see I have four indexes

  • KEY FK_HYBRID_EXP_TRAIT_DTL_2 (LOCATION_CODE)



  • KEY FK_hybrid_exp_trait_dtl_3 (USER_ID)



  • KEY Index_4 (HYBRID_NUMBER, EXPERIMENT_ID, TRIAL_STATUS, LOCATION_CODE, REPLICATION_NUMBER, TRAIT_NAME, USER_ID) USING BTREE



Below are the two queries which take long time 4.26 and 0.64 respectively.

select  distinct 
        location_code, 
        Max(replication_number) as replication
from hybrid_exp_trait_dtl  t
where t.experiment_id='NSKOK12K1102' and t.trial_status='MLT';


And this one

```
SELECT *
FROM hybrid_exp_trait_dtl h
where h.experiment_id='NSKOK12K1102'
and h.TRAIT

Solution

I took the liberty of editing your SQL so the clauses are on separate lines. I violated CodeReview group discipline by doing that. Sorry! My suggestion to you is to format your queries so the various clauses are clearly visible, as I did. This kind of formatting, in my experience, makes it easier to understand the logic of queries and to spot errors.

You've done a good job declaring your columns NOT NULL where possible. That helps performance a lot.

Your first query seems to be illogical, as it combines an aggregate function with a DISTINCT qualifier. MySQL allows a lot of sloppiness in aggregate queries unfortunately.

It looks like you're looking for a resultset with a row for each location that shows the the largest replication_number at that location. Do you want this query instead?

select  location_code, 
        Max(replication_number) as replication
   from hybrid_exp_trait_dtl  t
  where t.experiment_id='NSKOK12K1102' and t.trial_status='MLT'
  group by location_code


Your Index_4 will allow this query to be satisfied using a so-called loose index scan, which is really very fast.

Your second query will most likely be a lot faster if you change your fourth compound key (the one you've named hbrid) to have the following order.

(EXPERIMENT_ID,TRAIT_NAME,HYBRID_NUMBER)


That is because your query looks for exact values of EXPERIMENT_ID and TRAIT_NAME. When those two columns come first in the index, the query can jump to the exact position in the index and then sequentially reel off your results.

Now, you may have some other query that needs to have HYBRID_NUMBER first, but you didn't mention it in your question.

You asked what effect these indexes will have on load performance. It's hard to answer that question exactly without knowing a lot about your load process.

You don't have any indexes that call for uniqueness. That is very good for loading performance.

But, there are a few things you can do to speed up loading. They are detailed here. The most important speed trick is probably to avoid autocommit. Start your import process with this SQL command.

SET autocommit=0;


Then do a few hundred INSERT or UPDATE statements in a row, then do

COMMIT;


Don't try to do too many INSERT or UPDATE statements before COMMIT or you'll use up a lot of RAM with transaction buffers.

You can also turn of foreign key checks while loading. See the web page mentioned above. But be careful. If your loaded data violates foreign key constraints, when you try to turn checking back on you'll have a mess.

When you're done with a load operation that changes a large fraction of the rows, do

OPTIMIZE LOCAL TABLE hybrid_exp_trait_dtl;


Your table will be unavailable while this OPTIMIZE operation runs: it rebuilds the table and indexes.

Code Snippets

select  location_code, 
        Max(replication_number) as replication
   from hybrid_exp_trait_dtl  t
  where t.experiment_id='NSKOK12K1102' and t.trial_status='MLT'
  group by location_code
(EXPERIMENT_ID,TRAIT_NAME,HYBRID_NUMBER)
SET autocommit=0;
OPTIMIZE LOCAL TABLE hybrid_exp_trait_dtl;

Context

StackExchange Database Administrators Q#63417, answer score: 2

Revisions (0)

No revisions yet.