patternsqlMinor
Query challenge: Creating even sized buckets, based on a measure not row count
Viewed 0 times
countbucketschallengecreatingquerysizedbasedevenmeasurenot
Problem
I'll describe the problem in terms of loading a fixed number of trucks with orders, as evenly as possible.
Inputs:
A set:
The challenge here is to assign a
A single order cannot be split across trucks.
Trucks should be as evenly* loaded as possible, measured by
* Evenly: The smallest achievable delta between the least loaded truck and most loaded truck. By this definition, 1,2,3 is more evenly distributed than 1,1,4. If it helps, pretend you are stats algorithm, creating even height histograms.
There is no consideration for maximum truck load. These are magic elastic trucks. The number of trucks however is fixed.
There’s an obviously solution which is iterative - round robin allocate orders.
But can it be done as set based logic?
My main interest is for SQL Server 2014 or later. But set based solutions for other platforms could also be interesting.
This feels like Itzik Ben-Gan territory :)
My real world application is distributing a processing workload into a number of buckets to match the number of logical CPUs. Hence each bucket has no maximum size. Stats updates, specifically. I just thought it was more fun to abstract the problem into trucks as a way of framing the challenge.
```
CREATE TABLE #OrderDetail (
OrderId int NOT NULL,
OrderDetailId int NOT NULL PRIMARY KEY,
OrderDetailSize tinyint NOT NULL,
TruckId tinyint NULL)
-- Sample Data
INSERT #OrderDetail (OrderId, OrderDetailId, OrderDetailSize)
VALUES
(1 ,100 ,75 ),
(2 ,101 ,5 ),
(2 ,102 ,5 ),
(2 ,103 ,5 ),
(2 ,104 ,5 ),
(2 ,105 ,5 ),
(3 ,106 ,100),
(4 ,107 ,1 ),
(5 ,108 ,11 ),
(6 ,109 ,21 ),
(7 ,110 ,49 ),
(8 ,111 ,25 ),
(8 ,112 ,25 ),
(9 ,113 ,40 ),
(10 ,114 ,49 ),
(11 ,115 ,10 ),
(11 ,11
Inputs:
@TruckCount - the number of empty trucks to fillA set:
OrderId,
OrderDetailId,
OrderDetailSize,
TruckId (initially null)Orders are composed of one or more OrderDetails.The challenge here is to assign a
TruckId to each record.A single order cannot be split across trucks.
Trucks should be as evenly* loaded as possible, measured by
sum(OrderDetailSize).* Evenly: The smallest achievable delta between the least loaded truck and most loaded truck. By this definition, 1,2,3 is more evenly distributed than 1,1,4. If it helps, pretend you are stats algorithm, creating even height histograms.
There is no consideration for maximum truck load. These are magic elastic trucks. The number of trucks however is fixed.
There’s an obviously solution which is iterative - round robin allocate orders.
But can it be done as set based logic?
My main interest is for SQL Server 2014 or later. But set based solutions for other platforms could also be interesting.
This feels like Itzik Ben-Gan territory :)
My real world application is distributing a processing workload into a number of buckets to match the number of logical CPUs. Hence each bucket has no maximum size. Stats updates, specifically. I just thought it was more fun to abstract the problem into trucks as a way of framing the challenge.
```
CREATE TABLE #OrderDetail (
OrderId int NOT NULL,
OrderDetailId int NOT NULL PRIMARY KEY,
OrderDetailSize tinyint NOT NULL,
TruckId tinyint NULL)
-- Sample Data
INSERT #OrderDetail (OrderId, OrderDetailId, OrderDetailSize)
VALUES
(1 ,100 ,75 ),
(2 ,101 ,5 ),
(2 ,102 ,5 ),
(2 ,103 ,5 ),
(2 ,104 ,5 ),
(2 ,105 ,5 ),
(3 ,106 ,100),
(4 ,107 ,1 ),
(5 ,108 ,11 ),
(6 ,109 ,21 ),
(7 ,110 ,49 ),
(8 ,111 ,25 ),
(8 ,112 ,25 ),
(9 ,113 ,40 ),
(10 ,114 ,49 ),
(11 ,115 ,10 ),
(11 ,11
Solution
My first thought was
The "best solution" part is defined in the question - the smallest difference between the most loaded and least loaded trucks. The other bit - all combinations - caused me pause for thought.
Consider a situation where we have three orders A, B and C and three trucks. The possibilities are
Many of these are symmetric. The first six rows, for example, differ only in which truck each order is placed. Since the trucks are fungible these arrangemets will produce the same outcome. I shall ignore this for now.
There are known queries for producing permutations and combinations. However, these will produce arrangements within a single bucket. For this problem I need arrangements across multiple buckets.
Looking at the output from the standard "all combinations" query
I noted the results formed the same pattern as Table A. By making the congnitive leap of considering each column to be an Order1, the values to say which truck will hold that Order, and a row to be an arrangement of Orders within trucks. The query then becomes
Expaning this to cover the fourteen Orders in the example data, and simplifying the names we get this:
I choose to hold the intermediate results in temporary tables for convenience.
Subsequent steps will be much easier if the data is first UNPIVOTED.
```
select
Arrangement,
TruckNumber,
ItemNumber = case NewColumn
when 'First' then 1
when 'Second' then 2
when 'Third' then 3
when 'Fourth' then 4
when 'Fifth' then 5
when 'Sixth' then 6
when 'Seventh' then 7
when 'Eigth' then 8
when 'Ninth' then 9
when 'Tenth' then 10
when 'Eleventh' then 11
when 'Twelth' then 12
when 'Thirteenth' then 13
when 'Fourteenth' then 14
else -1
end
into #FilledTrucks
from #Arrangements
unpivot
(
TruckNumber
for NewColumn IN
(
First,
Second,
Third,
Fourth,
Fifth,
Sixth,
select
from
The "best solution" part is defined in the question - the smallest difference between the most loaded and least loaded trucks. The other bit - all combinations - caused me pause for thought.
Consider a situation where we have three orders A, B and C and three trucks. The possibilities are
Truck 1 Truck 2 Truck 3
------- ------- -------
A B C
A C B
B A C
B C A
C A B
C B A
AB C -
AB - C
C AB -
- AB C
C - AB
- C AB
AC B -
AC - B
B AC -
- AC B
B - AC
- B AC
BC A -
BC - A
A BC -
- BC A
A - BC
- A BC
ABC - -
- ABC -
- - ABC
Table A: all permutations.Many of these are symmetric. The first six rows, for example, differ only in which truck each order is placed. Since the trucks are fungible these arrangemets will produce the same outcome. I shall ignore this for now.
There are known queries for producing permutations and combinations. However, these will produce arrangements within a single bucket. For this problem I need arrangements across multiple buckets.
Looking at the output from the standard "all combinations" query
;with Numbers as
(
select n = 1
union
select 2
union
select 3
)
select
a.n,
b.n,
c.n
from Numbers as a
cross join Numbers as b
cross join Numbers as c
order by 1, 2, 3;
n n n
--- --- ---
1 1 1
1 1 2
1 1 3
1 2 1
3 2 3
3 3 1
3 3 2
3 3 3
Table B: cross join of three values.I noted the results formed the same pattern as Table A. By making the congnitive leap of considering each column to be an Order1, the values to say which truck will hold that Order, and a row to be an arrangement of Orders within trucks. The query then becomes
select
Arrangement = ROW_NUMBER() over(order by (select null)),
First_order_goes_in = a.TruckNumber,
Second_order_goes_in = b.TruckNumber,
Third_order_goes_in = c.TruckNumber
from Trucks a -- aka Numbers in Table B
cross join Trucks b
cross join Trucks c
Arrangement First_order_goes_in Second_order_goes_in Third_order_goes_in
----------- ------------------- -------------------- -------------------
1 1 1 1
2 1 1 2
3 1 1 3
4 1 2 1
Query C: Orders in trucks.Expaning this to cover the fourteen Orders in the example data, and simplifying the names we get this:
;with Trucks as
(
select *
from (values (1), (2), (3)) as T(TruckNumber)
)
select
arrangement = ROW_NUMBER() over(order by (select null)),
First = a.TruckNumber,
Second = b.TruckNumber,
Third = c.TruckNumber,
Fourth = d.TruckNumber,
Fifth = e.TruckNumber,
Sixth = f.TruckNumber,
Seventh = g.TruckNumber,
Eigth = h.TruckNumber,
Ninth = i.TruckNumber,
Tenth = j.TruckNumber,
Eleventh = k.TruckNumber,
Twelth = l.TruckNumber,
Thirteenth = m.TruckNumber,
Fourteenth = n.TruckNumber
into #Arrangements
from Trucks a
cross join Trucks b
cross join Trucks c
cross join Trucks d
cross join Trucks e
cross join Trucks f
cross join Trucks g
cross join Trucks h
cross join Trucks i
cross join Trucks j
cross join Trucks k
cross join Trucks l
cross join Trucks m
cross join Trucks n;
Query D: Orders spread over trucks.I choose to hold the intermediate results in temporary tables for convenience.
Subsequent steps will be much easier if the data is first UNPIVOTED.
```
select
Arrangement,
TruckNumber,
ItemNumber = case NewColumn
when 'First' then 1
when 'Second' then 2
when 'Third' then 3
when 'Fourth' then 4
when 'Fifth' then 5
when 'Sixth' then 6
when 'Seventh' then 7
when 'Eigth' then 8
when 'Ninth' then 9
when 'Tenth' then 10
when 'Eleventh' then 11
when 'Twelth' then 12
when 'Thirteenth' then 13
when 'Fourteenth' then 14
else -1
end
into #FilledTrucks
from #Arrangements
unpivot
(
TruckNumber
for NewColumn IN
(
First,
Second,
Third,
Fourth,
Fifth,
Sixth,
Code Snippets
select
<best solution>
from
<all possible combinations>Truck 1 Truck 2 Truck 3
------- ------- -------
A B C
A C B
B A C
B C A
C A B
C B A
AB C -
AB - C
C AB -
- AB C
C - AB
- C AB
AC B -
AC - B
B AC -
- AC B
B - AC
- B AC
BC A -
BC - A
A BC -
- BC A
A - BC
- A BC
ABC - -
- ABC -
- - ABC
Table A: all permutations.;with Numbers as
(
select n = 1
union
select 2
union
select 3
)
select
a.n,
b.n,
c.n
from Numbers as a
cross join Numbers as b
cross join Numbers as c
order by 1, 2, 3;
n n n
--- --- ---
1 1 1
1 1 2
1 1 3
1 2 1
<snip>
3 2 3
3 3 1
3 3 2
3 3 3
Table B: cross join of three values.select
Arrangement = ROW_NUMBER() over(order by (select null)),
First_order_goes_in = a.TruckNumber,
Second_order_goes_in = b.TruckNumber,
Third_order_goes_in = c.TruckNumber
from Trucks a -- aka Numbers in Table B
cross join Trucks b
cross join Trucks c
Arrangement First_order_goes_in Second_order_goes_in Third_order_goes_in
----------- ------------------- -------------------- -------------------
1 1 1 1
2 1 1 2
3 1 1 3
4 1 2 1
<snip>
Query C: Orders in trucks.;with Trucks as
(
select *
from (values (1), (2), (3)) as T(TruckNumber)
)
select
arrangement = ROW_NUMBER() over(order by (select null)),
First = a.TruckNumber,
Second = b.TruckNumber,
Third = c.TruckNumber,
Fourth = d.TruckNumber,
Fifth = e.TruckNumber,
Sixth = f.TruckNumber,
Seventh = g.TruckNumber,
Eigth = h.TruckNumber,
Ninth = i.TruckNumber,
Tenth = j.TruckNumber,
Eleventh = k.TruckNumber,
Twelth = l.TruckNumber,
Thirteenth = m.TruckNumber,
Fourteenth = n.TruckNumber
into #Arrangements
from Trucks a
cross join Trucks b
cross join Trucks c
cross join Trucks d
cross join Trucks e
cross join Trucks f
cross join Trucks g
cross join Trucks h
cross join Trucks i
cross join Trucks j
cross join Trucks k
cross join Trucks l
cross join Trucks m
cross join Trucks n;
Query D: Orders spread over trucks.Context
StackExchange Database Administrators Q#211510, answer score: 5
Revisions (0)
No revisions yet.