HiveBrain v1.2.0
Get Started
← Back to all entries
principleMajor

Can random suitless $52$ playing card data be compressed to approach, match, or even beat entropy encoding storage? If so, how?

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
randomplayingcanhowbeatmatchcardstoragesuitlessentropy

Problem

I have real data I am using for a simulated card game. I am only interested in the ranks of the cards, not the suits. However it is a standard $52$ card deck so there are only $4$ of each rank possible in the deck. The deck is shuffled well for each hand, and then I output the entire deck to a file. So there are only $13$ possible symbols in the output file which are $2,3,4,5,6,7,8,9,T,J,Q,K,A$. ($T$ = ten rank). So of course we can bitpack these using $4$ bits per symbol, but then we are wasting $3$ of the $16$ possible encodings. We can do better if we group $4$ symbols at a time, and then compress them, because $13^4$ = $28,561$ and that can fit rather "snugly" into $15$ bits instead of $16$. The theoretical bitpacking limit is log($13$) / log($2$) = $3.70044$ for data with $13$ random symbols for each possible card. However we cannot have $52$ kings for example in this deck. We MUST have only $4$ of each rank in each deck so the entropy encoding drops by about half a bit per symbol to about $3.2$.

Ok, so here is what I am thinking. This data is not totally random. We know there are $4$ of each rank so in each block of $52$ cards (call it a shuffled deck), so we can make several assumptions and optimizations. One of those being we do not have to encode the very last card, because we will know what it should be. Another savings would be if we end on a single rank; for example, if the last $3$ cards in the deck are $777$, we wouldn't have to encode those because the decoder would be counting cards up to that point and see that all the other ranks have been filled, and will assume the $3$ "missing" cards are all $7$s.

So my question to this site is, what other optimizations are possible to get an even smaller output file on this type of data, and if we use them, can we ever beat the theoretical (simple) bitpacking entropy of $3.70044$ bits per symbol, or even approach the ultimate entropy limit of about $3.2$ bits per symbol on average? If so, how?

Solution

Here is a complete algorithm which reaches the theoretical limit.

Prologue: Encoding integer sequences

A 13-integer sequence "integer with upper limit $a-1$, integer with upper limit $b-1$, "integer with upper limit $c-1$, integer with upper limit $d-1$,... integer with upper limit $m-1$" can always be coded with perfect efficiency.

  • Take the first integer, multiply it by $b$, add the second, multiply the result by $c$, add the third, multiply the result by $d$,… multiply the result by $m$, add the thirteenth – and that will produce a unique number between $0$ and $abcdefghijklm-1$.



  • Write down that number in binary.



The reverse is easy as well. Divide by $m$ and the remainder is the thirteenth integer. Divide the result by $l$ and the remainder is the twelfth integer. Carry on until you have divided by $b$: the remainder is the second integer and the quotient is the first integer.

So to code your cards in the best possible way, all we have to do is find a perfect correspondence between 13-integer sequences (with given upper limits) and arrangements of your shuffled cards.

Here is how to do it.

Correspondence between shufflings and integer sequences

Start with a sequence of 0 cards on the table in front of you.

Step 1

Take the four 2s in your pack and place them on the table.

What choices do you have? A card or cards may be placed either at the beginning of the sequence already on the table, or after any one of the cards in that sequence. In that case, this means that there are $1+0=1$ possible places to put cards.

The total number of ways of placing 4 cards in 1 places is $1$. Encode each of those ways as a number between $0$ and $1-1$. There is 1 such number.

I got 1 by considering the ways of writing 0 as the sum of 5 integers: it is $\frac{4\times 3\times 2 \times 1}{4!}$.

Step 2

Take the four 3s in your pack and place them on the table.

What choices do you have? A card or cards may be placed either at the beginning of the sequence already on the table, or after any one of the cards in that sequence. In that case, this means that there are $1+4=5$ possible places to put cards.

The total number of ways of placing 4 cards in 5 places is $70$. Encode each of those ways as a number between $0$ and $70-1$. There are 70 such numbers.

I got 70 by considering the ways of writing 4 as the sum of 5 integers: it is $\frac{8\times 7\times 6 \times 5}{4!}$.

Step 3

Take the four 4s in your pack and place them on the table.

What choices do you have? A card or cards may be placed either at the beginning of the sequence already on the table, or after any one of the cards in that sequence. In that case, this means that there are $1+8=9$ possible places to put cards.

The total number of ways of placing 4 cards in 9 places is $495$. Encode each of those ways as a number between $0$ and $495-1$. There are 495 such numbers.

I got 495 by considering the ways of writing 8 as the sum of 5 integers: it is $\frac{12\times 11\times 10 \times 9}{4!}$.

And so on, until...

Step 13

Take the four aces in your pack and place them on the table.

What choices do you have? A card or cards may be placed either at the beginning of the sequence already on the table, or after any one of the cards in that sequence. In that case, this means that there are $1+48=49$ possible places to put cards.

The total number of ways of placing 4 cards in 49 places is $270725$. Encode each of those ways as a number between $0$ and $270725-1$. There are 270725 such numbers.

I got 270725 by considering the ways of writing 48 as the sum of 5 integers: it is $\frac{52\times 51\times 50 \times 49}{4!}$.

This procedure yields a 1-to-1 correspondence between (a) shufflings of cards where you don't care about the suit and (b) sequences of integers where the first is between $0$ and $1-1$, the second is between $0$ and $70-1$, the third is between $0$ and $495-1$, and so on until the thirteenth, which is between $0$ and $270725-1$.

Referring to "Encoding integer sequences", you can see that such a sequence of integers is in 1-1 correspondence with the numbers between $0$ and $(1\times 70\times 495\times … \times 270725)-1$. If you look at the "product divided by a factorial" expression of each of the integers (as described in italics at the end of each step) then you will see that this means the numbers between $0$ and $$\frac{52!}{(4!)^{13}}-1\text,$$ which my previous answer showed was the best possible.

So we have a perfect method for compressing your shuffled cards.

The algorithm

Precompute a list of all the ways of writing 0 as the sum of 5 integers, of writing 4 as the sum of 5 integers, of writing 8 as the sum of 5 integers,… of writing 48 as the sum of 5 integers. The longest list has 270725 elements, so it is not particularly big. (Precomputation isn't strictly necessary because you can easily synthesize each list as and when you need it: trying with Microsoft QuickBasic, even going through the 270725-element list wa

Context

StackExchange Computer Science Q#62697, answer score: 33

Revisions (0)

No revisions yet.