HiveBrain v1.2.0
Get Started
← Back to all entries
patternMajor

Efficient compression of simple binary data

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
simplecompressionefficientbinarydata

Problem

I have a file containing ordered binary numbers from $0$ to $2^n - 1$:

0000000000
0000000001
0000000010
0000000011
0000000100
...
1111111111


7z did not compress this file very efficiently (for n = 20, 22 MB were compressed to 300 kB).

Are there algorithms that can recognize very simple structure of data and compress file to several bytes? Also I want to know what area of CS or information theory studies such smart algorithms. "AI" would be too broad, please suggest more concrete keywords.

Notion of symmetry should play fundamental role in data compression, but search queries "symmetry in data compression" and "group theory in data compression" surprisingly return almost nothing relevant.

Solution

Sure, of course there are algorithms. Here is my algorithm:

-
First, check if the file contains ordered binary numbers from $0$ to $2^n-1$, for some $n$. If so, write out a 0 bit followed by $n$ one bits followed by a 0 bit.

-
If not, write out a 1 bit, then write out the 7z-compression of the file.

This is extremely efficient for files of that particular structure.

The point is: there is no free lunch in data compression. You might be able to build a compression algorithm that compresses one type of file well, at the cost of compressing others worse. But, if you know a priori something about the nature of the files you will be compressing, you can optimize your algorithm for that particular type of file.

The area is "data compression". See our data-compression tag, and read textbooks on data compression.

Context

StackExchange Computer Science Q#75046, answer score: 44

Revisions (0)

No revisions yet.