
Is the algorithm implemented by git bisect optimal?

Submitted by: @import:stackexchange-cs

Problem

Let $G$ be a DAG. We know that some nodes in $G$ are "bad", while the others are "good"; every descendant of a bad node is bad, and every ancestor of a good node is good. We also know that the bad nodes have a unique minimal element in $G$, which we would like to find while querying as few nodes as possible, using queries of the form "Are you good or bad?".

This problem is solved in Git, the popular version control system, by the command git-bisect, which helps a programmer find the first commit in which a bug was introduced.

At the start, the algorithm implemented by Git assumes that a single bad commit and one or more good commits are known. At each step of its execution, the algorithm chooses the next commit to test using the following steps (taken from here):

1. Keep only the commits that:

   a) are an ancestor of the bad commit (including the bad commit itself), and

   b) are not an ancestor of a good commit (excluding the good commits).

2. Starting from the good ends of the resulting graph, associate to each commit the number of ancestors it has plus one.

3. Associate to each commit $\min(X, N-X)$, where $X$ is the value associated to the commit in step 2, and $N$ is the total number of commits in the graph (after it was reduced in step 1).

4. The best bisection point is the commit with the highest associated number.
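The four steps above can be sketched in Python. This is a minimal illustration, not Git's actual implementation: the DAG is assumed to be given as a dictionary `parents` mapping each commit to a list of its parents, and the names `ancestors` and `best_bisection_point` are hypothetical helpers introduced here. Ties are broken by commit name, which is also an assumption.

```python
def ancestors(parents, node):
    """Return the set containing `node` and all of its ancestors."""
    seen = set()
    stack = [node]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(parents.get(n, ()))
    return seen


def best_bisection_point(parents, bad, goods):
    """Return (best_commit, scores) following the four steps above."""
    # Step 1: keep ancestors of the bad commit (inclusive) that are not
    # ancestors of any good commit (the good commits are dropped too).
    candidates = ancestors(parents, bad)
    for g in goods:
        candidates -= ancestors(parents, g)
    n = len(candidates)

    scores = {}
    for c in candidates:
        # Step 2: X = number of ancestors inside the reduced graph, plus one
        # (the commit itself is counted by `ancestors`).
        x = len(ancestors(parents, c) & candidates)
        # Step 3: associate min(X, N - X) to the commit.
        scores[c] = min(x, n - x)

    # Step 4: pick the commit with the highest score
    # (assumption: ties are broken by commit name).
    best = max(sorted(scores), key=scores.get)
    return best, scores


# Linear history c1 <- c2 <- c3 <- c4 <- c5, with c1 good and c5 bad.
history = {"c1": [], "c2": ["c1"], "c3": ["c2"], "c4": ["c3"], "c5": ["c4"]}
best, scores = best_bisection_point(history, "c5", ["c1"])
print(best)  # c3
```

On a linear history the sketch reduces to ordinary binary search: the commit in the middle of the remaining range gets the highest score. If the reduced graph is empty, `max` raises a `ValueError`; the sketch does not handle that case.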

This algorithm is essentially finding the commit that achieves the "worst best case": in fact, $\min(X,N-X)$ is the number of nodes in the DAG at the next iteration in the best case, thus $\max\min(X,N-X)$ is the worst best case.

I'm wondering:

  • Does it make any difference if we select the "best worst case", that is, the node that achieves $\min\max(X,N-X)$?

  • Is this algorithm worst-case optimal?

EDIT: I've noticed that this problem has an $\Omega(N)$ lower bound. Consider the DAG formed by a single node $b$ with $N-1$ parents called $g_1,\dots,g_{N-1}$. If we know that $b$ is bad, then we have to check each of the parents to see if it is the minimal bad node.

EDIT 2: The previous is ac

Solution

Here's some intuition for what $X$ and $N$ are doing. Focus on a particular commit $c$. Suppose we test $c$ and classify it as either "good" or "bad". Until we test it, we don't know whether it is good or bad, but we can predict in advance how much smaller the graph will get in each of those two cases. In particular, $X$ is the number of commits that would be trimmed away if commit $c$ turns out to be good, and $N-X$ is the number of commits that would be trimmed away if commit $c$ turns out to be bad.

Therefore, the value $\min(X,N-X)$ is a lower bound on the number of commits we'll be able to trim away in the next step, no matter how the test turns out. The idea of the Git algorithm is to maximize this metric. In other words, Git picks a threshold $t$ that is as large as possible, and a commit $c$ to test next, such that Git can be sure that it'll be able to trim away at least $t$ commits in the next step.
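On the question's first point, a short derivation suggests that the "best worst case" rule selects the same commits as Git's rule. For any commit, with $X$ and $N$ as defined in the question, the larger and the smaller of the two values $X$ and $N-X$ always sum to $N$:

$$\max(X, N-X) + \min(X, N-X) = N \quad\Longrightarrow\quad \max(X, N-X) = N - \min(X, N-X).$$

Since $N$ is the same for every commit in the reduced graph, a commit minimizes $\max(X, N-X)$ exactly when it maximizes $\min(X, N-X)$. The two criteria can therefore differ only in how ties are broken.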

If we have no prior information about which outcome is more likely, so that a tested commit is equally likely to turn out good or bad, then this looks like a locally optimal choice: it maximizes the number of commits guaranteed to be trimmed away by the next test. Thus, the Git algorithm is a greedy algorithm.
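To make the greedy structure concrete, here is a small self-contained simulation. It is a sketch under stated assumptions, not Git's code: the DAG is a dictionary `parents` mapping each commit to its parents, `is_bad` is a hypothetical oracle standing in for the programmer's test, and `ancestors` and `greedy_bisect` are illustrative names. Ties are broken by commit name.

```python
def ancestors(parents, node):
    """Return the set containing `node` and all of its ancestors."""
    seen = set()
    stack = [node]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(parents.get(n, ()))
    return seen


def greedy_bisect(parents, bad, goods, is_bad):
    """Repeat the greedy choice until one candidate remains.

    Returns (first_bad_commit, number_of_queries).
    """
    goods = set(goods)
    queries = 0
    while True:
        # Reduce the graph: ancestors of `bad` not reachable from a good commit.
        candidates = ancestors(parents, bad)
        for g in goods:
            candidates -= ancestors(parents, g)
        n = len(candidates)
        if n == 1:
            return bad, queries

        def score(c):
            x = len(ancestors(parents, c) & candidates)
            return min(x, n - x)

        # Greedy step: test the commit with the largest guaranteed trim.
        pick = max(sorted(candidates), key=score)
        queries += 1
        if is_bad(pick):
            bad = pick          # everything that is not an ancestor of pick goes
        else:
            goods.add(pick)     # pick and its ancestors go


# Linear history c0 <- c1 <- ... <- c7; c0 is good, the bug appears in c5.
chain = {f"c{i}": ([f"c{i-1}"] if i else []) for i in range(8)}
print(greedy_bisect(chain, "c7", ["c0"], lambda c: int(c[1:]) >= 5))
# ('c5', 3)
```

On the star-shaped DAG from the question's first EDIT (a bad node `b` with four parents that are all good), every parent has score 1, so the same loop removes one commit per test and needs four queries.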

Is the Git algorithm globally optimal? That depends on the definition of "optimal" and, most likely, on the distribution of DAGs one encounters in practice. There is probably no simple characterization of that distribution, so I'd expect an optimality result for this problem to be difficult to find.

Context

StackExchange Computer Science Q#22451, answer score: 5
