patternMinor
Estimate result size of Natural Join
Viewed 0 times
estimateresultsizejoinnatural
Problem
I have the following tables
R1(A, B, C) with 1.000 rows
R2(C, D, E) with 1.500 rows
R3(E, F) with 750 rows
where a bold letter denotes a primary key.
I need to estimate the number of rows of the natural join R1 |x| R2 |x| R3.
My textbook has proposed the following solution
The Natural join of R1, R2 and R3 will be the same no matter which way we join them (join is both associative and commutative). Size can be estimated using the strategy of first joining R1 and R2, and then joining the result with R3.
Joining R1 with R2 will yield a table of at most 1.000 rows, since C is a key for R2. Likewise, joining that result with R3 will yield a table of at most 1.000 rows because E is a key for R3. Therefore the final relation will have at most 1.000 rows!
I would have thought that the final relation will have at most 750 rows, since R3 only has 750 rows.
Is the textbook solution incorrect, or am I missing something?
R1(A, B, C) with 1.000 rows
R2(C, D, E) with 1.500 rows
R3(E, F) with 750 rows
where a bold letter denotes a primary key.
I need to estimate the number of rows of the natural join R1 |x| R2 |x| R3.
My textbook has proposed the following solution
The Natural join of R1, R2 and R3 will be the same no matter which way we join them (join is both associative and commutative). Size can be estimated using the strategy of first joining R1 and R2, and then joining the result with R3.
Joining R1 with R2 will yield a table of at most 1.000 rows, since C is a key for R2. Likewise, joining that result with R3 will yield a table of at most 1.000 rows because E is a key for R3. Therefore the final relation will have at most 1.000 rows!
I would have thought that the final relation will have at most 750 rows, since R3 only has 750 rows.
Is the textbook solution incorrect, or am I missing something?
Solution
You have at most 1000 rows from the natural join of
In the first case you have exactly 1000 rows in the join, since the for each row of
In the second case you have instead n in rows in the result, with 0 ≤ n ≤ 1000, where n is the number of row in
Combining these two possibilities, we can say that the number of rows in
Now consider the second join. Since you have at most 1000 rows in
Putting together these results, we know that for m, the number of rows of
Added
(Based on the comment of @ypercubeᵀᴹ).
Note that the result can be easily generalized. If you have a natural join or an inner equijoin between k tables Ri, i = 1, ..., k, and if you can reorder the join as: R1 ⨝ R2 ⨝ ... ⨝ Rk, such that Ri ⨝ Ri+1 is on the primary key (or unique attribute) of Ri+1, then the cardinality of the result will be always less than or equal to the cardinality of R1.
R1 and R2 because there are two possible cases: either 1) all the values of R1.C are present in R2.C (i.e. R1.C is a “foreign key” for R2), or, 2) there are values in R1.C not present in R2.C. In the first case you have exactly 1000 rows in the join, since the for each row of
R1 there is exactly one row in R2 with the same value of C, since C is a key of R2, and for a certain value of C there is only one row in it, so you can join a row in R1 with only one row in R2, and the result has exactly 1000 rows.In the second case you have instead n in rows in the result, with 0 ≤ n ≤ 1000, where n is the number of row in
R1 whose value of C is present in R2: in fact, when a row of R1 has a value of C present in R2, you obtain again a single row in the result.Combining these two possibilities, we can say that the number of rows in
R1 ⨝ R2 is a value n, with 0 ≤ n ≤ 1000.Now consider the second join. Since you have at most 1000 rows in
R1 ⨝ R2, in each of those 1000 rows you can have a value of E which is present or not in table R3. But since E is a primary key in R3, when you join each of those n rows with R3 you can find at maximum one corresponding row in R3. So your result will have a number of rows which will be equal to m, such that m ≤ n. Putting together these results, we know that for m, the number of rows of
R1 ⨝ R2 ⨝ R2, holds the following: 0 ≤ m ≤ 1000.Added
(Based on the comment of @ypercubeᵀᴹ).
Note that the result can be easily generalized. If you have a natural join or an inner equijoin between k tables Ri, i = 1, ..., k, and if you can reorder the join as: R1 ⨝ R2 ⨝ ... ⨝ Rk, such that Ri ⨝ Ri+1 is on the primary key (or unique attribute) of Ri+1, then the cardinality of the result will be always less than or equal to the cardinality of R1.
Context
StackExchange Database Administrators Q#139204, answer score: 5
Revisions (0)
No revisions yet.