HiveBrain v1.2.0
Get Started
← Back to all entries
patternCritical

Boolean search explained

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
searchexplainedboolean

Problem

My mother is taking some online course in order to be a librarian of sorts, in this course they cover boolean searches, so they can search databases efficiently, however, she got a question sounding something like this:

The search "x OR y" will result in 105 000 hits, while a search for only x will result in 80 000 hits, and a search for only y will get 35 000 hits. Why does the search "x OR y" give 105 000 hits, when the combined individual searches gives 115 000 hits?

For me this sounded strange, so I tested this myself, using the words bacon and sandwich.

  • Only bacon yielded 179 000 000 results



  • Only sandwich yielded 312 000 000 results



  • bacon OR sandwich gave 491 000 000 results



But for me it adds up: 179 000 000 (bacon) + 312 000 000 (sandwich) = 491 000 000 (bacon OR sandwich)

Why could an OR query result in fewer hits than both individual queries combined?

Solution

The counting principle that applies here is inclusion-exclusion.

$$ \left|X \cup Y\right| = \left|X\right| + \left|Y\right| - \left|X \cap Y \right|$$

To make the numbers work out, $\left|X \cap Y \right|$ must be 10000.

A Venn diagram may be more convincing to someone who may be intimidated by the notation.

Context

StackExchange Computer Science Q#57003, answer score: 95

Revisions (0)

No revisions yet.