patternsqlMinor
Find the second highest salary of employees
Viewed 0 times
salaryfindtheemployeessecondhighest
Problem
Imagine there is a table with these fields:
We need to write a query in order to find the second highest salary in this table.
I found this query :
I'm a bit confused by this part
Can anyone here explain more about the query and the part
Actually this is the execution plan can anyone explain the steps more?
Employee: E-Number
E-Name
Department
SalaryWe need to write a query in order to find the second highest salary in this table.
I found this query :
select distinct salary
from Employee e1
where 2 = ( select count(distinct salary)
from Employee e2
where e1.salary < e2.salary)I'm a bit confused by this part
where 2 = because I have not seen something like that before.Can anyone here explain more about the query and the part
where 2 =? What is the result of the subquery and in what way are table rows (e1, e2) being compared with each other?Actually this is the execution plan can anyone explain the steps more?
Solution
I'm going to address this part of the question:
...this is the execution plan can anyone explain the steps more?
Assuming the table definition looks something like this:
Empty table
The following query (not my recommendation, just an example):
Gives this execution plan (with no data in the table):
The numbers correspond to the Node ID property of each operator.
With duplicate Salary values
If we add a few rows to the table, with some duplicate
The query above produces a plan with a couple of new operators (highlighted):
The new Sort orders rows by
The Lazy Index Spool saves the output from the operators below it as they arrive. These are the Stream Aggregate, Distinct Sort, and
Because the Sort ordered rows from
The index on the spool is keyed by
If you look at the Actual Executions value for each operator in the plan, you will see that the Lazy Index Spool executes 6 times (once per outer row), whereas the Stream Aggregate, Distinct Sort, and
The optimizer introduces the extra Sort and Spool as a performance optimization, based on automatically collected statistical information about the data in the table.
...this is the execution plan can anyone explain the steps more?
Assuming the table definition looks something like this:
CREATE TABLE dbo.Employee
(
[E-Number] integer NOT NULL,
[E-Name] nvarchar(50) NOT NULL,
Department nvarchar(30) NOT NULL,
Salary smallmoney NOT NULL
);Empty table
The following query (not my recommendation, just an example):
SELECT DISTINCT
E1.Salary
FROM dbo.Employee AS E1
WHERE
1 =
(
SELECT
COUNT(DISTINCT E2.Salary)
FROM dbo.Employee AS E2
WHERE
E2.Salary > E1.Salary
);Gives this execution plan (with no data in the table):
The numbers correspond to the Node ID property of each operator.
- The
E1Table Scan (4) returns all rows (in no particular order). For each row:
- The Nested Loops join (3) executes the inner side, passing the current
E1.Salaryvalue as a parameter.
- The
E2Table Scan (8) finds rows matchingE2.Salary > E1.Salary
- The Distinct Sort (7) groups rows by
E2.Salary.
- The Stream Aggregate (6) counts the number of rows (1 per distinct
E2.Salary). The new count column is given the labelExpr1008.
- The Compute Scalar (5) converts the native count data type (
bigint) tointegerbecause the query specifiesCOUNTrather thanCOUNT_BIG. The new column is given the labelExpr1004.
- As rows emerge from the Nested Loops join, the Filter (2) applies the predicate
Expr1004 = 1
- The final Distinct Sort returns the distinct
E1.Salaryvalues.
With duplicate Salary values
If we add a few rows to the table, with some duplicate
Salary values:INSERT dbo.Employee
([E-Number], [E-Name], Department, Salary)
VALUES
(1, 'A', 'D1', $1),
(2, 'B', 'D1', $1),
(3, 'C', 'D1', $2),
(4, 'D', 'D1', $2),
(5, 'E', 'D1', $3),
(6, 'F', 'D1', $3);The query above produces a plan with a couple of new operators (highlighted):
The new Sort orders rows by
E1.Salary. The ensures that any duplicate Salary values are presented to the join sequentially.The Lazy Index Spool saves the output from the operators below it as they arrive. These are the Stream Aggregate, Distinct Sort, and
E2 Table Scan (with predicate), which as described before filter E2 (using the current value of E1.Salary), remove duplicate E2.Salary values, and count the resulting rows. The Lazy Index Spool saves recomputing the result in case the same Salary value from table E1 is presented on the next iteration of the Nested Loops join.Because the Sort ordered rows from
E1 by E1.Salary, any duplicates are presented to the join one after the other, so the saved result in the spool can be reused instead of re-running the Stream Aggregate, Distinct Sort, and E2 Table Scan each time.The index on the spool is keyed by
Salary, making it quick to find the saved COUNT(DISTINCT E2.Salary) result for the current E1.Salary value.If you look at the Actual Executions value for each operator in the plan, you will see that the Lazy Index Spool executes 6 times (once per outer row), whereas the Stream Aggregate, Distinct Sort, and
E2 Table Scan execute only 3 times (once per distinct salary value).The optimizer introduces the extra Sort and Spool as a performance optimization, based on automatically collected statistical information about the data in the table.
Code Snippets
CREATE TABLE dbo.Employee
(
[E-Number] integer NOT NULL,
[E-Name] nvarchar(50) NOT NULL,
Department nvarchar(30) NOT NULL,
Salary smallmoney NOT NULL
);SELECT DISTINCT
E1.Salary
FROM dbo.Employee AS E1
WHERE
1 =
(
SELECT
COUNT(DISTINCT E2.Salary)
FROM dbo.Employee AS E2
WHERE
E2.Salary > E1.Salary
);INSERT dbo.Employee
([E-Number], [E-Name], Department, Salary)
VALUES
(1, 'A', 'D1', $1),
(2, 'B', 'D1', $1),
(3, 'C', 'D1', $2),
(4, 'D', 'D1', $2),
(5, 'E', 'D1', $3),
(6, 'F', 'D1', $3);Context
StackExchange Database Administrators Q#185795, answer score: 7
Revisions (0)
No revisions yet.