snippetsqlMinor
How to optimize a very slow query with joins and group by?
Viewed 0 times
howgroupwithqueryslowoptimizeveryandjoins
Problem
I have this GTFS database :
All the
So basically the link between a stop and a route is
I have created and filled a table to mention all stops that are physically close to each other, filling
Now what I want is to get from route A all other routes that can be reached by one of the stops of this route.
So basically I want to go like:
then use
I would get a table similar to this:
This is my query:
```
-- get from/to route IDs
select
r.route_id as from_route_id,
c_t.route_id as to_route_id
from routes r
-- going down on all trips on route A
left join trips t on t.route_id = r.route_id
-- going down on all stop_times for A's trips
left join stop_times st on st.trip_id = t.trip_id
-- going down on all stops use by A's trips/vehicles
left join stops s on s.stop_id = st.stop_id
-- get connected stops
inner join stop_connections c_s on c_s.from_stop_id = s.stop_id
-- going up to stop_times
inner join stop_times c_st on c_st.stop_id = c_s.to_stop_id
-- going up to trips
inner join trips c_t on c_t.trip_id = c_st.trip_id and c_t.route_id <> r.route_id
-- no need to actually go up to routes because we already have the r
All the
***_id columns are indexed, and there are foreign keys between tables (such as a FK on trips.route_id to routes.route_id).So basically the link between a stop and a route is
routes→trips→stop_times→stops.I have created and filled a table to mention all stops that are physically close to each other, filling
stop_connections. Pretty simple.Now what I want is to get from route A all other routes that can be reached by one of the stops of this route.
So basically I want to go like:
A.route → A.trips (many) → A.trips.stop_times(many^2) → A.trips.stop_times.stops(many^3)then use
stop_connections to get related stops, and then go backwards to get the routes from these stops:connected_stops → connected_stops.stop_times → connected_stops.stop_times.trips → connected_stops.stop_times.trips.routesI would get a table similar to this:
+---------------+-------------+
| from_route_id | to_route_id |
+---------------+-------------+
| 0001 | 0002 |
| 0001 | 0004 |
| 0001 | 0005 |
| 0002 | 0001 |
| 0002 | 0005 |
+---------------+-------------+This is my query:
```
-- get from/to route IDs
select
r.route_id as from_route_id,
c_t.route_id as to_route_id
- start from route A
from routes r
-- going down on all trips on route A
left join trips t on t.route_id = r.route_id
-- going down on all stop_times for A's trips
left join stop_times st on st.trip_id = t.trip_id
-- going down on all stops use by A's trips/vehicles
left join stops s on s.stop_id = st.stop_id
-- get connected stops
inner join stop_connections c_s on c_s.from_stop_id = s.stop_id
-- going up to stop_times
inner join stop_times c_st on c_st.stop_id = c_s.to_stop_id
-- going up to trips
inner join trips c_t on c_t.trip_id = c_st.trip_id and c_t.route_id <> r.route_id
-- no need to actually go up to routes because we already have the r
Solution
OK, I ended up adding another lookup table:
Filling it was fairly fast:
Using it is blazingly fast:
Now I can fill my last lookup table (set of connections between every routes on my GTFS network):
Amazingly fast. I guess the engine couldn't break up the steps to optimize this.
I'd still be interested to know if it would be possible to get the same result (from route to route connections table) using only one sub-second or sub-minute query.
CREATE TABLE IF NOT EXISTS `stops_routes` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`stop_id` varchar(100) NOT NULL,
`route_id` varchar(100) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `stop_route` (`stop_id`,`route_id`),
KEY `stop_id` (`stop_id`),
KEY `route_id` (`route_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;Filling it was fairly fast:
mysql> insert into stops_routes (stop_id, route_id)
->
-> select
-> s.stop_id,
-> r.route_id as from_route_id
->
-> from routes r
-> left join trips t on t.route_id = r.route_id
-> left join stop_times st on st.trip_id = t.trip_id
-> left join stops s on s.stop_id = st.stop_id
-> group by s.stop_id, r.route_id;
Query OK, 3496 rows affected (8.38 sec)
Records: 3496 Duplicates: 0 Warnings: 0Using it is blazingly fast:
mysql> select
-> r.route_id as from_route_id,
-> c_sr.route_id as to_route_id
->
-> from routes r
->
-> left join stops_routes sr on sr.route_id = r.route_id
-> left join stop_connections c_s on c_s.from_stop_id = sr.stop_id
-> left join stops_routes c_sr on c_sr.stop_id = c_s.to_stop_id
->
-> where r.route_id <> c_sr.route_id
-> group by r.route_id, c_sr.route_id
-> limit 10;
+---------------+-------------+
| from_route_id | to_route_id |
+---------------+-------------+
| 0001 | 0002 |
| 0001 | 0003 |
| 0001 | 0004 |
| 0001 | 0005 |
| 0001 | 0006 |
| 0001 | 0008 |
| 0001 | 0009 |
| 0001 | 0011 |
| 0001 | 0014 |
| 0001 | 0031 |
+---------------+-------------+
10 rows in set (0.63 sec)Now I can fill my last lookup table (set of connections between every routes on my GTFS network):
mysql> insert into route_connections (from_route_id, to_route_id)
-> select
-> r.route_id as from_route_id,
-> c_sr.route_id as to_route_id
->
-> from routes r
->
-> left join stops_routes sr on sr.route_id = r.route_id
-> left join stop_connections c_s on c_s.from_stop_id = sr.stop_id
-> left join stops_routes c_sr on c_sr.stop_id = c_s.to_stop_id
->
-> where r.route_id <> c_sr.route_id
-> group by r.route_id, c_sr.route_id;
Query OK, 2848 rows affected (0.31 sec)
Records: 2848 Duplicates: 0 Warnings: 0Amazingly fast. I guess the engine couldn't break up the steps to optimize this.
I'd still be interested to know if it would be possible to get the same result (from route to route connections table) using only one sub-second or sub-minute query.
Code Snippets
CREATE TABLE IF NOT EXISTS `stops_routes` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`stop_id` varchar(100) NOT NULL,
`route_id` varchar(100) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `stop_route` (`stop_id`,`route_id`),
KEY `stop_id` (`stop_id`),
KEY `route_id` (`route_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;mysql> insert into stops_routes (stop_id, route_id)
->
-> select
-> s.stop_id,
-> r.route_id as from_route_id
->
-> from routes r
-> left join trips t on t.route_id = r.route_id
-> left join stop_times st on st.trip_id = t.trip_id
-> left join stops s on s.stop_id = st.stop_id
-> group by s.stop_id, r.route_id;
Query OK, 3496 rows affected (8.38 sec)
Records: 3496 Duplicates: 0 Warnings: 0mysql> select
-> r.route_id as from_route_id,
-> c_sr.route_id as to_route_id
->
-> from routes r
->
-> left join stops_routes sr on sr.route_id = r.route_id
-> left join stop_connections c_s on c_s.from_stop_id = sr.stop_id
-> left join stops_routes c_sr on c_sr.stop_id = c_s.to_stop_id
->
-> where r.route_id <> c_sr.route_id
-> group by r.route_id, c_sr.route_id
-> limit 10;
+---------------+-------------+
| from_route_id | to_route_id |
+---------------+-------------+
| 0001 | 0002 |
| 0001 | 0003 |
| 0001 | 0004 |
| 0001 | 0005 |
| 0001 | 0006 |
| 0001 | 0008 |
| 0001 | 0009 |
| 0001 | 0011 |
| 0001 | 0014 |
| 0001 | 0031 |
+---------------+-------------+
10 rows in set (0.63 sec)mysql> insert into route_connections (from_route_id, to_route_id)
-> select
-> r.route_id as from_route_id,
-> c_sr.route_id as to_route_id
->
-> from routes r
->
-> left join stops_routes sr on sr.route_id = r.route_id
-> left join stop_connections c_s on c_s.from_stop_id = sr.stop_id
-> left join stops_routes c_sr on c_sr.stop_id = c_s.to_stop_id
->
-> where r.route_id <> c_sr.route_id
-> group by r.route_id, c_sr.route_id;
Query OK, 2848 rows affected (0.31 sec)
Records: 2848 Duplicates: 0 Warnings: 0Context
StackExchange Database Administrators Q#77597, answer score: 2
Revisions (0)
No revisions yet.