patternCritical
Boolean search explained
Viewed 0 times
searchexplainedboolean
Problem
My mother is taking some online course in order to be a librarian of sorts, in this course they cover boolean searches, so they can search databases efficiently, however, she got a question sounding something like this:
The search "x OR y" will result in 105 000 hits, while a search for only x will result in 80 000 hits, and a search for only y will get 35 000 hits. Why does the search "x OR y" give 105 000 hits, when the combined individual searches gives 115 000 hits?
For me this sounded strange, so I tested this myself, using the words bacon and sandwich.
But for me it adds up: 179 000 000 (bacon) + 312 000 000 (sandwich) = 491 000 000 (bacon OR sandwich)
Why could an OR query result in fewer hits than both individual queries combined?
The search "x OR y" will result in 105 000 hits, while a search for only x will result in 80 000 hits, and a search for only y will get 35 000 hits. Why does the search "x OR y" give 105 000 hits, when the combined individual searches gives 115 000 hits?
For me this sounded strange, so I tested this myself, using the words bacon and sandwich.
- Only bacon yielded 179 000 000 results
- Only sandwich yielded 312 000 000 results
- bacon OR sandwich gave 491 000 000 results
But for me it adds up: 179 000 000 (bacon) + 312 000 000 (sandwich) = 491 000 000 (bacon OR sandwich)
Why could an OR query result in fewer hits than both individual queries combined?
Solution
The counting principle that applies here is inclusion-exclusion.
$$ \left|X \cup Y\right| = \left|X\right| + \left|Y\right| - \left|X \cap Y \right|$$
To make the numbers work out, $\left|X \cap Y \right|$ must be 10000.
A Venn diagram may be more convincing to someone who may be intimidated by the notation.
$$ \left|X \cup Y\right| = \left|X\right| + \left|Y\right| - \left|X \cap Y \right|$$
To make the numbers work out, $\left|X \cap Y \right|$ must be 10000.
A Venn diagram may be more convincing to someone who may be intimidated by the notation.
Context
StackExchange Computer Science Q#57003, answer score: 95
Revisions (0)
No revisions yet.