HiveBrain v1.2.0

MySQL optimization - year column grouping - using temporary table, filesort

Submitted by: @import:stackexchange-dba

Problem

I have a transactions table with 600,000 records, and I need to list counts for a dashboard on a financial-year basis. The table uses the MyISAM engine. I tried adding an index on the transaction date (tran_date). Even though the query uses the index, it creates a temporary table, and the temporary table plus the filesort make it slow. Is there any way to optimize the query to improve its run time?

SELECT COUNT(*) AS cnt,
       CASE WHEN MONTH(tran_date) >= 3
            THEN CONCAT(YEAR(tran_date), '-', YEAR(tran_date) + 1)
            ELSE CONCAT(YEAR(tran_date) - 1, '-', YEAR(tran_date))
       END AS financial_year
FROM transactions1
WHERE tran_date >= '2010-06-01'
GROUP BY financial_year

Showing rows 0 - 4 (5 total, Query took 1.2095 sec)

id: 1  select_type: SIMPLE  table: transactions1  type: range
possible_keys: PRIMARY,tran_date  key: tran_date  key_len: 8  ref: NULL
rows: 346485  Extra: Using where; Using index; Using temporary; Using filesort

Keyname       Type   Unique  Packed  Field          Cardinality  Collation
PRIMARY       BTREE  Yes     No      tran_date      205720       A
                                     tran_ID        617162       A
coupon_No     BTREE  No      No      coupon_No      617162       A
account_type  BTREE  No      No      account_type   3            A
prodCode      BTREE  No      No      prodCode       430          A
                                     tran_date      308581       A
tran_date     BTREE  No      No      tran_date      205720       A
cust_ID       BTREE  No      No      cust_ID        3265         A
                                     tran_date      308581       A
                                     account_type   308581       A
                                     points_earned  617162       A

Update:

I tried partitioning the table, but it was not much more helpful than the non-partitioned version. Would replication help for reads against this table?

Solution

I don't see a lot of opportunity for improvement.

The index you added was probably a big help, because it's being used for the range matching on the WHERE clause (type => range, key => tran_date), and it's being used as a covering index (extra => using index), avoiding the need to seek into the table to fetch the row data.

But since you're using functions to construct the financial_year value for the GROUP BY, neither the "Using filesort" nor the "Using temporary" can be avoided. Those aren't the real problem, though. The real cost is that you're evaluating MONTH(tran_date) 346,485 times and YEAR(tran_date) at least that many times... still, ~700,000 function calls in about one second doesn't seem too bad.
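As an aside, those per-row function calls can be reduced: shifting each date back two months makes the year of the shifted date the financial year's starting year, collapsing the whole CASE to one date expression (in MySQL, roughly YEAR(tran_date - INTERVAL 2 MONTH)). A minimal sketch, using Python's sqlite3 with SQLite standing in for MySQL (the date-function syntax differs; the sample table and dates are made up for illustration):

```python
import sqlite3

# SQLite stands in for MySQL here; in MySQL the shift would be written
# tran_date - INTERVAL 2 MONTH. Table contents are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions1 (tran_date TEXT)")
conn.executemany(
    "INSERT INTO transactions1 (tran_date) VALUES (?)",
    [("2010-06-15",), ("2011-02-28",), ("2011-03-01",), ("2011-07-04",)],
)

# The original CASE logic, translated to SQLite date functions.
case_sql = """
    SELECT COUNT(*) AS cnt,
           CASE WHEN CAST(strftime('%m', tran_date) AS INTEGER) >= 3
                THEN strftime('%Y', tran_date) || '-'
                     || (CAST(strftime('%Y', tran_date) AS INTEGER) + 1)
                ELSE (CAST(strftime('%Y', tran_date) AS INTEGER) - 1) || '-'
                     || strftime('%Y', tran_date)
           END AS financial_year
    FROM transactions1
    WHERE tran_date >= '2010-06-01'
    GROUP BY financial_year
    ORDER BY financial_year
"""

# Same grouping with a single date shift: the year of (tran_date - 2 months)
# is the financial year's starting year.
shift_sql = """
    SELECT COUNT(*) AS cnt,
           strftime('%Y', date(tran_date, '-2 months')) || '-'
           || (CAST(strftime('%Y', date(tran_date, '-2 months')) AS INTEGER) + 1)
           AS financial_year
    FROM transactions1
    WHERE tran_date >= '2010-06-01'
    GROUP BY financial_year
    ORDER BY financial_year
"""

case_rows = conn.execute(case_sql).fetchall()
shift_rows = conn.execute(shift_sql).fetchall()
```

Both queries group February 2011 into 2010-2011 and March 2011 into 2011-2012, so the rewrite is behavior-preserving; it still won't remove the temporary table or the filesort, it just does less work per row.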

Plan B: I am definitely not a fan of storing redundant data, and I'm dead-set against making the application responsible for maintaining it... but one option I might be tempted to try would be to create a dashboard_stats_by_financial_year table, and use after-insert/update/delete triggers on the transactions1 table to manage keeping those stats current.

That option has a cost, of course -- adding to the amount of time it takes to update/insert/delete a transaction... but, waiting > 1200 milliseconds for stats for your dashboard is a cost, too. So it may come down to whether you want to pay for it now or pay for it later.
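That Plan B can be sketched in a few lines, again with sqlite3 standing in for MySQL (SQLite's trigger syntax is close but not identical; the table and trigger names here are hypothetical, and the after-update trigger is omitted for brevity):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions1 (tran_ID INTEGER PRIMARY KEY, tran_date TEXT);

    -- Hypothetical stats table: one row per financial year.
    CREATE TABLE dashboard_stats_by_financial_year (
        financial_year TEXT PRIMARY KEY,
        cnt INTEGER NOT NULL DEFAULT 0
    );

    -- After-insert: bump the count for the new row's financial year.
    CREATE TRIGGER transactions1_ai AFTER INSERT ON transactions1
    BEGIN
        INSERT OR IGNORE INTO dashboard_stats_by_financial_year
            (financial_year, cnt)
        VALUES (strftime('%Y', date(NEW.tran_date, '-2 months')) || '-'
                || (CAST(strftime('%Y', date(NEW.tran_date, '-2 months'))
                         AS INTEGER) + 1), 0);
        UPDATE dashboard_stats_by_financial_year
        SET cnt = cnt + 1
        WHERE financial_year =
              strftime('%Y', date(NEW.tran_date, '-2 months')) || '-'
              || (CAST(strftime('%Y', date(NEW.tran_date, '-2 months'))
                       AS INTEGER) + 1);
    END;

    -- After-delete: reverse it.
    CREATE TRIGGER transactions1_ad AFTER DELETE ON transactions1
    BEGIN
        UPDATE dashboard_stats_by_financial_year
        SET cnt = cnt - 1
        WHERE financial_year =
              strftime('%Y', date(OLD.tran_date, '-2 months')) || '-'
              || (CAST(strftime('%Y', date(OLD.tran_date, '-2 months'))
                       AS INTEGER) + 1);
    END;
""")

conn.executemany("INSERT INTO transactions1 (tran_date) VALUES (?)",
                 [("2010-06-15",), ("2011-02-28",), ("2011-03-01",)])
conn.execute("DELETE FROM transactions1 WHERE tran_date = '2011-02-28'")

# The dashboard now reads precomputed counts instead of scanning the table.
stats = conn.execute(
    "SELECT financial_year, cnt FROM dashboard_stats_by_financial_year"
    " ORDER BY financial_year").fetchall()
```

The dashboard query then becomes a primary-key lookup on a five-row table, which is where the "pay now" (slower writes) versus "pay later" (slow dashboard) trade-off lands.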

Context

StackExchange Database Administrators Q#27074, answer score: 3
