patternjavascriptMinor
Query for finding collection entries matching an arbitrary set of URLs
Viewed 0 times
setarbitraryquerycollectionforurlsfindingmatchingentries
Problem
Most of my slow queries on my server are due to a single "find" query. I have a collection that represents website pages. I want to be able to look up a page in the collection by URL. The caveat is that a website might have URLs that redirect to it.
I've modeled my schema (with Node.js/Mongoose) like this:
In other words, one entry might have the URL
Later, I want to query the collection to see if we have an entry which matches an arbitrary set of URLs. I don't know if the URLs are redirected URLs or what -- I just want to find all matches in my collection that represent these URLs.
So I do this:
Unfortunately, I might be looking for around 200 URLs at once, so this query sometimes takes > 1 second.
Is there further optimization I can do, or would it be better to split the query into multiple queries, capping the search-size each time?
I've modeled my schema (with Node.js/Mongoose) like this:
{
'url': { type: String, index: {unique: true, dropDups: true} },
'redirects': { type: Array, index: true },
// ... other stuff
}In other words, one entry might have the URL
http://codereview.stackexchange.com/ but also have an array of 5 other URLs (such as bit.ly URLs) that redirect to it.Later, I want to query the collection to see if we have an entry which matches an arbitrary set of URLs. I don't know if the URLs are redirected URLs or what -- I just want to find all matches in my collection that represent these URLs.
So I do this:
// urls is an array of URL strings we want to find...
model.find({$or: [{'url': {$in: urls}}, {'redirects': {$in: urls}}]}).lean().exec( ... );Unfortunately, I might be looking for around 200 URLs at once, so this query sometimes takes > 1 second.
Is there further optimization I can do, or would it be better to split the query into multiple queries, capping the search-size each time?
Solution
I haven't used MongoDB, but I have a few suggestions based on experiences with optimization and other databases.
-
Index hashes of the URLs and search for those instead. Using a simple MD5 hash would probably speed up searching with the cost of dealing with false positives (unlikely but possible).
-
Store every URL as a top-level object and add a
Here's some psuedocode to clarify what I mean:
-
Update your question with the higher-level problem you're trying to solve. I often find that when I'm trying to optimize slow task
-
Index hashes of the URLs and search for those instead. Using a simple MD5 hash would probably speed up searching with the cost of dealing with false positives (unlikely but possible).
-
Store every URL as a top-level object and add a
redirectsTo attribute. This alleviates the need to index or search the redirects collections. While now you need to perform two queries--one for the original URLs and another for the redirected URLs--this could end up being faster if only a small percentage of URLs are redirects.Here's some psuedocode to clarify what I mean:
all = []
found = model.find({'url': {$in: urls}})...
redirects = []
for each item in found
if item.redirectsTo
redirects += item.redirectsTo
else
all += item
redirected += model.find({'url': {$in: redirects}})...
for each item in redirected
all += item-
Update your question with the higher-level problem you're trying to solve. I often find that when I'm trying to optimize slow task
T to solve problem P, the better approach is to rethink P.Code Snippets
all = []
found = model.find({'url': {$in: urls}})...
redirects = []
for each item in found
if item.redirectsTo
redirects += item.redirectsTo
else
all += item
redirected += model.find({'url': {$in: redirects}})...
for each item in redirected
all += itemContext
StackExchange Code Review Q#23542, answer score: 3
Revisions (0)
No revisions yet.