patternMajor
Data compression using prime numbers
Viewed 0 times
compressionnumbersdatausingprime
Problem
I have recently stumbled upon the following interesting article which claims to efficiently compress random data sets by always more than 50%, regardless of the type and format of the data.
Basically it uses prime numbers to uniquely construct a representation of 4-byte data chunks which are easy to decompress given that every number is a unique product of primes. In order to associate these sequences with the primes it utilizes a dictionary.
My question is:
Basically it uses prime numbers to uniquely construct a representation of 4-byte data chunks which are easy to decompress given that every number is a unique product of primes. In order to associate these sequences with the primes it utilizes a dictionary.
My question is:
- Is this really feasible as the authors suggest it? According to the paper, their results are very efficient and always compress data to a smaller size. Won't the dictionary size be enormous?
- Couldn't this be used to iteratively re-compress the compressed data using the same algorithm? It is obvious, and has been demonstrated, that such techniques (where the compressed data is re-compressed as many times as possible, dramatically reducing the file size) are impossible; indeed, there would be no bijection between the set of all random data and the compressed data. So why does this feel like it would be possible?
- Even if the technique is not perfect as of yet, it can obviously be optimized and strongly improved. Why is this not more widely known/studied? If indeed these claims and experimental results are true, couldn't this revolutionalize computing?
Solution
always compress random data sets by more than 50%
That's impossible. You can't compress random data, you need some structure to take advantage of. Compression must be reversible, so you can't possibly compress everything by 50% because there are far less strings of length $n/2$ than there are of length $n$.
There are some major issues with the paper:
-
They use 10 test files without any indication of their content. Is the data really random? How were they generated?
-
They claim to achieve compression ratios of at least 50%, while their test data shows they achieve at most 50%.
This algorithm defines a lossless strategy which makes use of the prime numbers present in the decimal number
system
-
What? Prime numbers are prime numbers regardless of the base.
-
Issue #1 with decompression: prime factorization is a hard problem, how do they do it efficiently?
-
Issue #2 with decompression (this is the kicker): they multiply the prime numbers together, but doing so you lose any information about the order, since $2\cdot 5 = 10 = 5\cdot 2$. I don't think it is possible to decompress at all using their technique.
I don't think this paper is very good.
That's impossible. You can't compress random data, you need some structure to take advantage of. Compression must be reversible, so you can't possibly compress everything by 50% because there are far less strings of length $n/2$ than there are of length $n$.
There are some major issues with the paper:
-
They use 10 test files without any indication of their content. Is the data really random? How were they generated?
-
They claim to achieve compression ratios of at least 50%, while their test data shows they achieve at most 50%.
This algorithm defines a lossless strategy which makes use of the prime numbers present in the decimal number
system
-
What? Prime numbers are prime numbers regardless of the base.
-
Issue #1 with decompression: prime factorization is a hard problem, how do they do it efficiently?
-
Issue #2 with decompression (this is the kicker): they multiply the prime numbers together, but doing so you lose any information about the order, since $2\cdot 5 = 10 = 5\cdot 2$. I don't think it is possible to decompress at all using their technique.
I don't think this paper is very good.
Context
StackExchange Computer Science Q#44594, answer score: 34
Revisions (0)
No revisions yet.