HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

MongoDB: is order guaranteed deep into the aggregation pipeline?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
aggregationordertheintomongodbdeepguaranteedpipeline

Problem

I'm trying to get the "top" 2 items for any number of foreign keys, sorted by some key. What I have is the following, which appears to work:

db.getCollection('col1').aggregate([
    {$match: {fk: {$in: [1, 2]}}}
    {$sort: {name: -1}}, 
    {$group: {_id: "$fk", items: {$push: "$ROOT"} }}, 
    {$project: {items: {$slice: ["$items", 2]} }}
])


My question is: does MongoDB guarantee that the sorting I apply in the second step in my aggregation via $sort will be maintained when I get down to the $slice inside my $project in the last step of my aggregation?

The data I'm seeing indicates that MongoDB is currently behaving this way, but I'm wondering if this is spec'd or if MongoDB is free to return the items array in a different order if the optimizer ever decides it would be advantageous to do so.

Solution

While the accepted answer is correct in that $group stage will not preserver order, I think it left some important information out and can be misleading.

If you look at the comments on the Jira Ticket linked it clears it up there to. Basically it says that $group maintains the order within the 'groups' but order will not be guaranteed for stages after the group stage.

So $items will be sorted by name: -1 in the example and calling $slice 2 on it in the $project stage will result in the first 2 items sorted by name.

So I guess for example if you have the following data
[
{ fk: 1, name: 'James' },
{ fk: 2, name: 'Zelda' },
{ fk: 2, name: 'Victor' },
{ fk: 1, name: 'Becky' },
{ fk: 1, name: 'Carlos' },
{ fk: 2, name: 'Katie' },
]


Running the example query
db.col1.aggregate([
{ $match: { fk: { $in: [1, 2] } } },
{ $sort: { name: -1 } },
{ $group: { _id: "$fk", items: { $push: "$$ROOT" } } },
{ $project: { items: {$slice: ["$items", 2] } } }
])


You would get

{ _id: 2, items: [ { fk: 2, name: "Zelda" }, { fk: 2, name: "Victor" }] }
{ _id: 1, items: [ { fk: 1, name: "James" }, { fk: 1, name: "Carlos" }] }


which returns the first 2 names in descending order

But if you changed the sort by to be on fk
db.col1.aggregate([
{ $match: { fk: { $in: [1, 2] } } },
{ $sort: { fk: -1 } },
{ $group: { _id: "$fk", items: { $push: "$$ROOT" } } },
{ $project: { items: {$slice: ["$items", 2] } } }
])


you could see something like
{ _id: 1, items: [ { fk: 1, name: "James" }, { fk: 1, name: "Becky" } ] }
{ _id: 2, items: [ { fk: 2, name: "Zelda" }, { fk: 2, name: "Victor" } ] }


where you can see that _id (which is fk) is not sorted in descending order. If you wanted it to be sorted by _id you could just add that sorting after the $group stage

Code Snippets

{ _id: 2, items: [ { fk: 2, name: "Zelda" }, { fk: 2, name: "Victor" }] }
{ _id: 1, items: [ { fk: 1, name: "James" }, { fk: 1, name: "Carlos" }] }

Context

StackExchange Database Administrators Q#197434, answer score: 4

Revisions (0)

No revisions yet.