patternMajor
Algorithm for merging arrays while keeping their order
Viewed 0 times
keepingarrayswhileordermergingalgorithmfortheir
Problem
My question is generalizable to arrays of any type but I'll use strings to keep it short.
We take a couple of strings as an input. Let's denote them ${S_1,...S_n}$. (Ex: "ABEF" and "CDE"). The output $U$ must fulfill the following conditions:
The problem can have multiple solutions or no solutions. I'm fine with finding just one of them. Some examples might be:
inputs
possible solutions
ABEF, CDE
ABCDEF, CDABEF, CABDEF...
ABEF, CDE, BC
ABCDEF (only solution)
ABEF, CDE, CB
CDABEF, CADBEF, ...
I found this question with a similar algorithm. However, their algorithm allows repeated elements in their output.
Is there a name for this algorithm? What's the time complexity? Does anyone know an implementation in a Python library? Thanks!
Clarifications:
We take a couple of strings as an input. Let's denote them ${S_1,...S_n}$. (Ex: "ABEF" and "CDE"). The output $U$ must fulfill the following conditions:
- The output must contain the entire "vocabulary" of all of the inputs and not more: $x\in U \Leftrightarrow \exists i : x\in S_i$
- Each element in the output appears only once in the output.
- The elements in the output keep their relative ordering from their original strings. So if 'A' comes before 'B' in the string $S_i$, then 'A' must come before 'B' in the output $U$.
The problem can have multiple solutions or no solutions. I'm fine with finding just one of them. Some examples might be:
inputs
possible solutions
ABEF, CDE
ABCDEF, CDABEF, CABDEF...
ABEF, CDE, BC
ABCDEF (only solution)
ABEF, CDE, CB
CDABEF, CADBEF, ...
I found this question with a similar algorithm. However, their algorithm allows repeated elements in their output.
Is there a name for this algorithm? What's the time complexity? Does anyone know an implementation in a Python library? Thanks!
Clarifications:
- the inputs do not have an ordering. The ordering is given by the inputs themselves. That's why "CDABEF" is a valid solution in the first example.
- If the problem does not have a solution, throw an exception.
Solution
You can easily solve your problem by creating a directed graph $G=(V,E)$ in which $V$ is the set of all characters that appear in some input string and $E$ contains the edge $(u,v)$ if character $u$ is immediately followed by character $v$ in some string.
This graph has $O(n)$ edges where $n$ is the total length of the input (i.e., the sum of the strings' lengths) and can be built in $O( \min\{n+|\Sigma|, n \log n\} )$ worst-case time, where $\Sigma$ is your alphabet (if the alphabet has constant size then this is just $O(n)$) or $O(n)$ expected time.
Then your problem admits a solution if and only if $G$ is a directed acyclic graph (DAG). This can be tested in $O(n)$ time. If $G$ is indeed a DAG, any topological ordering of the vertices in $G$ is a valid solution to your problem. A topological ordering of $G$ can be found in $O(n)$ time.
This graph has $O(n)$ edges where $n$ is the total length of the input (i.e., the sum of the strings' lengths) and can be built in $O( \min\{n+|\Sigma|, n \log n\} )$ worst-case time, where $\Sigma$ is your alphabet (if the alphabet has constant size then this is just $O(n)$) or $O(n)$ expected time.
Then your problem admits a solution if and only if $G$ is a directed acyclic graph (DAG). This can be tested in $O(n)$ time. If $G$ is indeed a DAG, any topological ordering of the vertices in $G$ is a valid solution to your problem. A topological ordering of $G$ can be found in $O(n)$ time.
Context
StackExchange Computer Science Q#152572, answer score: 23
Revisions (0)
No revisions yet.