Index matching algorithm without hash-based data structures?
Problem
I am programming in C, so I do not want to implement a hash-based data structure such as a hashset or hashmap/dictionary. However, I need to solve the following task in linear time.
Given two arrays $a$ and $b$ which contain the same set of distinct integers, determine for every element of $a$ the index of the same element in $b$.
For example, if $a=[9,4,3,7]$ and $b=[3,4,7,9]$, then the output should be $[3,1,0,2]$.
Note that this becomes a very easy task when you have a hashmap: you can simply store the index of every element of $b$ under that element as key, and then query the hashmap for every element of $a$.
So my question is whether there is a linear algorithm for this task that does not use any hashsets.
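For comparison, the hash-based approach the question describes can be sketched in C as follows. This is a minimal sketch, not a tuned implementation: the names `IndexMap`, `hash_int`, `map_put`, `map_get` and `match_indices_hash` are hypothetical, the values are assumed to be `int`, and the mixing function (the 64-bit finalizer of MurmurHash3) is an assumption; any good integer hash would do. Its running time is linear only on typical inputs, since a fixed hash function can be defeated by adversarial ones.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical open-addressing table (linear probing) from values of b to indices. */
typedef struct {
    int *keys;
    size_t *vals;
    unsigned char *used;   /* used[i] != 0 iff slot i holds a key */
    size_t cap;            /* power of two, at least 2n, so probing always reaches a free slot */
} IndexMap;

/* Integer mixer: the 64-bit finalizer of MurmurHash3 (an assumption; any good integer hash works). */
static size_t hash_int(int x) {
    uint64_t h = (uint64_t)(unsigned)x;
    h ^= h >> 33;
    h *= 0xff51afd7ed558ccdULL;
    h ^= h >> 33;
    h *= 0xc4ceb9fe1a85ec53ULL;
    h ^= h >> 33;
    return (size_t)h;
}

static int map_init(IndexMap *m, size_t n) {
    m->cap = 1;
    while (m->cap < 2 * n) m->cap <<= 1;
    m->keys = malloc(m->cap * sizeof *m->keys);
    m->vals = malloc(m->cap * sizeof *m->vals);
    m->used = calloc(m->cap, 1);
    return m->keys != NULL && m->vals != NULL && m->used != NULL;
}

static void map_free(IndexMap *m) {
    free(m->keys);
    free(m->vals);
    free(m->used);
}

static void map_put(IndexMap *m, int key, size_t val) {
    size_t i = hash_int(key) & (m->cap - 1);
    while (m->used[i] && m->keys[i] != key) i = (i + 1) & (m->cap - 1);  /* probe linearly */
    m->used[i] = 1;
    m->keys[i] = key;
    m->vals[i] = val;
}

static size_t map_get(const IndexMap *m, int key) {
    size_t i = hash_int(key) & (m->cap - 1);
    while (m->used[i]) {
        if (m->keys[i] == key) return m->vals[i];
        i = (i + 1) & (m->cap - 1);
    }
    assert(!"key not found: a and b must hold the same set of values");
    return SIZE_MAX;
}

/* Writes into out[k] the index j with b[j] == a[k]; returns 0 on allocation failure. */
int match_indices_hash(const int *a, const int *b, size_t n, size_t *out) {
    IndexMap m;
    int ok = map_init(&m, n);
    if (ok) {
        for (size_t j = 0; j < n; j++) map_put(&m, b[j], j);        /* value -> index in b */
        for (size_t k = 0; k < n; k++) out[k] = map_get(&m, a[k]);  /* look up each a[k] */
    }
    map_free(&m);
    return ok;
}
```

For the example above, `a = {9, 4, 3, 7}` and `b = {3, 4, 7, 9}`, the call fills `out` with `3, 1, 0, 2`.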
Solution
If the only operation allowed between any two (possibly identical) elements of the two arrays is to determine which one is smaller, then the worst-case time complexity of this task is $\Theta(n\log n)$; in particular, no algorithm can solve it in linear time.
This can be seen by considering inputs where array $a$ is sorted while array $b$ is arbitrary. Knowing, for all $k$, the index $I(k)$ of the element of $b$ that equals the $k$-th element of $a$, we can sort $b$ in $O(n)$ time by simply putting $b_{I(k)}$ in the $k$-th position (using one temporary working space or a new result array of length $n$). However, it is well known that any comparison-based algorithm needs $\Omega(n\log n)$ time (comparisons) to sort $b$ in the worst case. So obtaining the indices $I(k)$ for all $k$ must
take at least $$\Omega(n\log n)- O(n)=\Omega(n\log n)$$ time in the worst case as well.
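The $O(n)$ step of this reduction is just a gather into a new array. A minimal C sketch, with the hypothetical name `sort_b_using_index`, assuming the values are `int` and the indices $I(k)$ are given as an array of `size_t`:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper for the reduction: if a is sorted and b[I[k]] == a[k] for all k,
   then gathering b[I[0]], b[I[1]], ... into a new array yields b in sorted order.
   This takes O(n) time and makes no comparisons between elements. */
void sort_b_using_index(const int *b, const size_t *I, size_t n, int *sorted) {
    for (size_t k = 0; k < n; k++) {
        assert(I[k] < n);      /* each I[k] must be a valid index into b */
        sorted[k] = b[I[k]];   /* equals a[k], the k-th smallest element */
    }
}
```

With the sorted `a = {3, 4, 7, 9}` and `b = {9, 4, 3, 7}`, the indices are `I = {2, 1, 3, 0}`, and the function writes `3, 4, 7, 9`, which is `b` in sorted order, obtained without a single comparison.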
The following is a formal formulation of the conclusion above in the comparison computation model.
Let $\mathcal O$ be an oracle for a fixed strict linear order $\prec$ on $E$, a set of $n$ elements. That is, on input $e,f\in E$, $\mathcal O$ outputs $-1$ if $e\prec f$, $0$ if $e=f$, and $1$ otherwise. Let $a$ and $b$ be two bijections from $\{0, 1,\ldots, n-1\}$ to $E$. Then outputting $I(0), I(1), \ldots, I(n-1)$ in that order, such that $a(k)=b(I(k))$ for all $0\le k\le n-1$, requires $\Theta(n\log n)$ queries to $\mathcal O$ in the worst case.
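A standard decision-tree count gives the lower half of this statement directly; it is offered here as a supplementary sketch. Fix $a$. Since $b=a\circ I^{-1}$, each of the $n!$ bijections $b$ requires a different output $I$, while each query to $\mathcal O$ has only three possible answers, so an algorithm that always stops after at most $q$ queries can produce at most $3^q$ different outputs. Hence

$$3^{q}\ \ge\ n!\quad\Longrightarrow\quad q\ \ge\ \log_3 n!\ \ge\ \log_3\!\Big(\big(\tfrac n2\big)^{n/2}\Big)=\tfrac n2\log_3\tfrac n2=\Omega(n\log n),$$

where $n!\ge (n/2)^{n/2}$ holds for $n\ge 2$ because each of the largest $\lceil n/2\rceil$ factors of $n!$ is at least $n/2$.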
Regarding the question "whether there is a linear algorithm for this task that does not use any hashsets":
A computation model defined only by "no hashset is used" is not a well-defined computation model. How could one check that no hashset is used? There are literally hundreds of ways to implement a data structure that is a hashset, looks like a hashset, or partially looks like a hashset. In general, a well-defined computation model must be defined formally by the operations it allows.
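Within the comparison model of the formal statement, the matching $O(n\log n)$ upper bound needs no hashing at all: sort the (value, index) pairs of $b$ by value, then binary-search each element of $a$. Below is a minimal C sketch; the names `Pair`, `cmp_pair` and `match_indices_sort` are hypothetical and the values are assumed to be `int`. It uses the standard library's `qsort` and `bsearch`; the C standard does not specify the complexity of `qsort`, so a guaranteed $O(n\log n)$ worst case would need, for example, a heapsort or mergesort in its place.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical pair type: a value of b together with its position in b. */
typedef struct {
    int value;
    size_t index;
} Pair;

/* Three-way comparison on values; (u > v) - (u < v) avoids the overflow of u - v. */
static int cmp_pair(const void *x, const void *y) {
    int u = ((const Pair *)x)->value;
    int v = ((const Pair *)y)->value;
    return (u > v) - (u < v);
}

/* Writes into out[k] the index j with b[j] == a[k], using only comparisons.
   Returns 0 on allocation failure or if some a[k] is missing from b, 1 on success. */
int match_indices_sort(const int *a, const int *b, size_t n, size_t *out) {
    Pair *p = malloc((n > 0 ? n : 1) * sizeof *p);
    if (p == NULL) return 0;
    for (size_t j = 0; j < n; j++) {
        p[j].value = b[j];
        p[j].index = j;
    }
    qsort(p, n, sizeof *p, cmp_pair);           /* sort b's (value, index) pairs by value */
    for (size_t k = 0; k < n; k++) {
        Pair key = { a[k], 0 };                 /* index field is ignored by cmp_pair */
        const Pair *hit = bsearch(&key, p, n, sizeof *p, cmp_pair);
        assert(hit != NULL && "a and b must hold the same set of values");
        if (hit == NULL) { free(p); return 0; }
        out[k] = hit->index;
    }
    free(p);
    return 1;
}
```

For the question's example, `a = {9, 4, 3, 7}` and `b = {3, 4, 7, 9}`, the call fills `out` with `3, 1, 0, 2`.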
Context
StackExchange Computer Science Q#105808, answer score: 3