patternMinor
Storing a file by converting it into $n$ files, some of which can be lost
Viewed 0 times
canfilelostintofilessomewhichconvertingstoring
Problem
Let's say that I have a file that I really don't want to lose.
I want to convert the original file into $n$ files in the way that will allow me to recover the original file if I have any $m$ of the converted files. I want an algorithm that will solve this problem for any given $m$ and $n$, where $0 < m < n$.
A trivial way to do it would be to just have $n$ copies of the original file but I want the converted files to be as small as possible.
Is there a name for the problem that I am trying to describe? I tried googling things like "redundant encoding" but I was not able to find what I am looking for. Error-correction algorithms seem to deal with some bits of the file lost or corrupted but in my case each of the converted files is either lost completely or fully available and uncorrupt.
I would really appreciate if someone would tell me where to look for existing approaches to solving this problem. Thanks in advance!
I want to convert the original file into $n$ files in the way that will allow me to recover the original file if I have any $m$ of the converted files. I want an algorithm that will solve this problem for any given $m$ and $n$, where $0 < m < n$.
A trivial way to do it would be to just have $n$ copies of the original file but I want the converted files to be as small as possible.
Is there a name for the problem that I am trying to describe? I tried googling things like "redundant encoding" but I was not able to find what I am looking for. Error-correction algorithms seem to deal with some bits of the file lost or corrupted but in my case each of the converted files is either lost completely or fully available and uncorrupt.
I would really appreciate if someone would tell me where to look for existing approaches to solving this problem. Thanks in advance!
Solution
Let $p \geq n$ be prime, and suppose that $p \leq 2^k$; we would like $p$ to be as close to $2^k$ as possible. Convert your input into chunks of $mk$ bits, and interpret each chunk as $m$ integers $0 \leq a_0,\ldots,a_{m-1} < p$. Define a polynomial
$$
P(x) = a_0 + a_1 x + \cdots + a_{m-1} x^{m-1}.
$$
For $0 \leq i \leq n-1$, the $i$'th copy will encode this chunk as $P(i) \bmod p$.
The idea is that given the values of $P(i)$ for $m$ many values of $i$, we can use Lagrange interpolation to recover $a_0,\ldots,a_{m-1}$. This amount to solving a certain system of linear equations.
For the experts: we can use any finite field of size at least $n$. If we choose a finite field of characteristic 2, then computations become more difficult, but the encoding scheme becomes completely lossless.
$$
P(x) = a_0 + a_1 x + \cdots + a_{m-1} x^{m-1}.
$$
For $0 \leq i \leq n-1$, the $i$'th copy will encode this chunk as $P(i) \bmod p$.
The idea is that given the values of $P(i)$ for $m$ many values of $i$, we can use Lagrange interpolation to recover $a_0,\ldots,a_{m-1}$. This amount to solving a certain system of linear equations.
For the experts: we can use any finite field of size at least $n$. If we choose a finite field of characteristic 2, then computations become more difficult, but the encoding scheme becomes completely lossless.
Context
StackExchange Computer Science Q#146359, answer score: 2
Revisions (0)
No revisions yet.