principlesqlMinor
MongoDB performance vs. PostgreSQL with 5.5 million rows / documents
Viewed 0 times
postgresqlmillionrowswithmongodbperformancedocuments
Problem
Can someone help me compare these queries and explain why the PostgreSQL query executes in just under 2000ms and the MongoDB aggregate query takes almost 9000ms and sometimes as high as 130K ms?
PostgreSQL query
MongoDB Query
Both the PostgreSQL table and the MongoDB collection are indexed on datetime : 1 and locomotive_id : 1
These queries are being testing on an iMac with a 2TB hybrid drive and 16GB of memory. I have received comparable results on a Windows 7 machine with 8GB of memory and a 256GB SSD.
Thanks!
** Update : I am posting the EXPLAIN (BUFFERS, ANALYZE) results after my question was posted
```
"Sort (cost=146036.84..146036.88 rows=19 width=24) (actual time=2182.443..2182.457 rows=152 loops=1)"
" Sort Key: locomotive_id"
" Sort Method: quicksort Memory: 36kB"
" Buffers: shared hit=13095"
" -> HashAggregate (cost=146036.24..146036.43 rows=19 width=24) (actual time=2182.144..2182.360 rows=152 loops=1)"
" Buffers: shared hit=13095"
" -> Bitmap Heap Scan on bpkdmp (cost=12393.84..138736.97 rows=583942 width=24) (actu
PostgreSQL 9.3.2 on x86_64-apple-darwin, compiled by i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.9.00), 64-bitPostgreSQL query
SELECT locomotive_id,
SUM(date_trunc('second', datetime) - date_trunc('second', prevDatetime)) AS utilization_time
FROM bpkdmp
WHERE datetime >= '2013-7-26 00:00:00.0000'
AND datetime <= '2013-7-26 23:59:59.9999'
GROUP BY locomotive_id
order by locomotive_idMongoDB Query
db.bpkdmp.aggregate([
{
$match : {
datetime : { $gte : new Date(2013,6,26, 0, 0, 0, 0), $lt : new Date(2013,6,26, 23, 59, 59, 9999) }
}
},
{
$project: {
locomotive_id : "$locomotive_id",
loco_time : { $subtract : ["$datetime", "$prevdatetime"] },
}
},
{
$group : {
_id : "$locomotive_id",
utilization_time : { $sum : "$loco_time" }
}
},
{
$sort : {_id : 1}
}
])Both the PostgreSQL table and the MongoDB collection are indexed on datetime : 1 and locomotive_id : 1
These queries are being testing on an iMac with a 2TB hybrid drive and 16GB of memory. I have received comparable results on a Windows 7 machine with 8GB of memory and a 256GB SSD.
Thanks!
** Update : I am posting the EXPLAIN (BUFFERS, ANALYZE) results after my question was posted
```
"Sort (cost=146036.84..146036.88 rows=19 width=24) (actual time=2182.443..2182.457 rows=152 loops=1)"
" Sort Key: locomotive_id"
" Sort Method: quicksort Memory: 36kB"
" Buffers: shared hit=13095"
" -> HashAggregate (cost=146036.24..146036.43 rows=19 width=24) (actual time=2182.144..2182.360 rows=152 loops=1)"
" Buffers: shared hit=13095"
" -> Bitmap Heap Scan on bpkdmp (cost=12393.84..138736.97 rows=583942 width=24) (actu
Solution
All PostgreSQL is doing here is a bitmap heap scan on
There's no parallelism anywhere in that query, either; it'll all happen on one core.
I can only assume that MongoDB is using a less efficient method, or is not benefiting from an appropriate index. You'd need to show the
bpkdmp_datetime_ix to find blocks that might contain matching rows, then a heap scan of those blocks to find matching rows in bpkdmp. It then groups the rows into hash buckets using hashes of the grouping key, sums each bucket, and sorts the results. It's a simple, basic query plan - it might perform better if you throw lots of work_mem at it, but it might not, too.There's no parallelism anywhere in that query, either; it'll all happen on one core.
I can only assume that MongoDB is using a less efficient method, or is not benefiting from an appropriate index. You'd need to show the
explain for the MongoDB query for a useful comment there to be possible; see cursor.explain.Context
StackExchange Database Administrators Q#57448, answer score: 9
Revisions (0)
No revisions yet.