patterncsharpMinor
Tarjan's algorithm for Strongly Connected Components without keeping SCC in stack
Viewed 0 times
keepingwithoutstacktarjanstronglysccalgorithmcomponentsconnectedfor
Problem
I've been looking for an implementation of Tarjan's algorithm in C#/C++ for a strongly-connected components in graphs, and couldn't find any implementation without use of additional Stack - most of them just follow wikipedia pseudocode. I've decided to create one in case if someone is looking for it. Please take a look and codereview. The main point is that mark the SCC as completed once I'm done with descendand processing, and that any cross-edge towards completed SCC will be ignored.
```
using System;
using System.Collections.Generic;
using System.Linq;
public class Graph
{
private int _verticesCount;
private List[] _vertexAdjancedVertices; // i-th element contains info about all adjanced vertices of vertex #i
public Graph(int[,] edges)
{
_verticesCount = edges.Cast().Max() + 1;
_vertexAdjancedVertices = new List[_verticesCount];
for (int i = 0; i ();
for(int i = 0; i > GetStronglyConnectedComponents()
{
//DFS
var processed = new bool[_verticesCount];
var minConnectedValue = new int[_verticesCount];
var sccCompleted = new bool[_verticesCount];
int currentTime = 0;
for(int startingVertex = 0; startingVertex new {Vertex = i, MinConnectedValue = mcv})
.GroupBy(vmcv => vmcv.MinConnectedValue)
.Select(g => g.Select(vmcv => vmcv.Vertex).ToList()).ToList();
return res;
}
private void GetStronglyConnectedComponents(int vertex, ref int currentTime, bool[] processed, int[] minConnectedValue, bool[] sccCompleted)
{
processed[vertex] = true;
++currentTime;
//var currentDiscoveryTime = currentTime;
minConnectedValue[vertex] = currentTime; // initialize to current time
sccCompleted[vertex] = false;
foreach (var neighbour in _vertexAdjancedVertices[vertex])
{
if (!processed[neighbour])
{
GetStronglyConnectedComponents(neighbour, ref cur
```
using System;
using System.Collections.Generic;
using System.Linq;
public class Graph
{
private int _verticesCount;
private List[] _vertexAdjancedVertices; // i-th element contains info about all adjanced vertices of vertex #i
public Graph(int[,] edges)
{
_verticesCount = edges.Cast().Max() + 1;
_vertexAdjancedVertices = new List[_verticesCount];
for (int i = 0; i ();
for(int i = 0; i > GetStronglyConnectedComponents()
{
//DFS
var processed = new bool[_verticesCount];
var minConnectedValue = new int[_verticesCount];
var sccCompleted = new bool[_verticesCount];
int currentTime = 0;
for(int startingVertex = 0; startingVertex new {Vertex = i, MinConnectedValue = mcv})
.GroupBy(vmcv => vmcv.MinConnectedValue)
.Select(g => g.Select(vmcv => vmcv.Vertex).ToList()).ToList();
return res;
}
private void GetStronglyConnectedComponents(int vertex, ref int currentTime, bool[] processed, int[] minConnectedValue, bool[] sccCompleted)
{
processed[vertex] = true;
++currentTime;
//var currentDiscoveryTime = currentTime;
minConnectedValue[vertex] = currentTime; // initialize to current time
sccCompleted[vertex] = false;
foreach (var neighbour in _vertexAdjancedVertices[vertex])
{
if (!processed[neighbour])
{
GetStronglyConnectedComponents(neighbour, ref cur
Solution
Brace Usage
It is generally considered best practice to always use braces with control statements (such as "if" and "for" statements). If you are concerned with whitespace, consider that placing the opening brace on the same line as the control statement means that the total increase in code size is only 1 line (for the closing brace). It has the added benefit of causing the compiler to complain if you later delete the statement, without deleting the entire block of code attached to it.
Preincrement vs. Postincrement
While both preincrement (++i) and postincrement (i++) do the same thing in isolation, I generally prefer to read the latter. The latter also happens to behave the way most people expect when refactoring the code (render the value, then increment), whereas the former tends to be a very special case (increment first, then render the value) which can cause errors and confusion during refactoring and maintenance. As such, I would recommend using preincrement operators only in the rare occasions they are actually needed (they're a flag that you need to slow down and read that chunk of code carefully).
Algorithm Behavior
The algorithm's output varies based on the order that the edges are presented to the class. Either the class needs to ensure that its internal edge lists are always in the appropriate order assumed/required by the algorithm, or the algorithm needs to be modified to handle edges in an arbitrary order (which the stack-based algorithm does).
The original question's test vector results in:
0
1,2,3,4
5,6,7
8
Whereas reordering the edges to be:
Results in an output of:
0
1,2,3,4
5,6,7,8
There may be additional deficiencies/broken cases surrounding this particular implementation. I ceased investigation upon finding this problem, but I would strongly suggest extensive verification against a known-working stack-based implementation after taking any corrective actions on your stackless algorithm.
Container Types
As a general rule of thumb, when returning a container, you should return the most-generic container interface type appropriate. It is almost never appropriate to return a concrete container, such as a
Comments
Comments are far easier to read when they are above the line they are commenting, rather than to the right. Comments on the right of lines tend to contribute to overly-long lines, they tend to not be well-aligned (and come out of alignment upon refactoring), and tend to get lost/pushed off the side of the screen. All of this makes it much more difficult to do a comments-only skim of the code to get the gist of what it's supposed to be doing, prior to analyzing the code proper, or when quickly trying to find the part of the code you need to modify. The only benefit of comments on the right is that they take up less horizontal space -- but the hit to readability, and therefore usefulness, makes this placement unacceptable to me in most cases.
Some of the comments could also use improvement. "If we will ever find cycle" is confusing; it's either an incomplete thought, or placed on the wrong line. "Initialize to current time" is self-evident; it would be more useful to know why initializing to the current time is important (and possibly comment what currentTime is/does). "DFS" I assume means "depth-first search," and if so, should go into the recursive GetStronglyConnectedComponents call (right before the if statement that kicks off the actual depth-first searching), not the root call.
The "var currentDiscoveryTime[...]" comment is also dead code, and should be removed.
Structures
It would be nice from an aesthetic standpoint to, e.g., pass around a single array of a structure containing your working data, rather than three same-sized arrays.
Speed
If you're simply aiming to get to an \$O(1)\$ sidestep of the stack lookup, consider the easier alternative of keeping a secondary data structure (such as your
It is generally considered best practice to always use braces with control statements (such as "if" and "for" statements). If you are concerned with whitespace, consider that placing the opening brace on the same line as the control statement means that the total increase in code size is only 1 line (for the closing brace). It has the added benefit of causing the compiler to complain if you later delete the statement, without deleting the entire block of code attached to it.
Preincrement vs. Postincrement
While both preincrement (++i) and postincrement (i++) do the same thing in isolation, I generally prefer to read the latter. The latter also happens to behave the way most people expect when refactoring the code (render the value, then increment), whereas the former tends to be a very special case (increment first, then render the value) which can cause errors and confusion during refactoring and maintenance. As such, I would recommend using preincrement operators only in the rare occasions they are actually needed (they're a flag that you need to slow down and read that chunk of code carefully).
Algorithm Behavior
The algorithm's output varies based on the order that the edges are presented to the class. Either the class needs to ensure that its internal edge lists are always in the appropriate order assumed/required by the algorithm, or the algorithm needs to be modified to handle edges in an arbitrary order (which the stack-based algorithm does).
The original question's test vector results in:
0
1,2,3,4
5,6,7
8
Whereas reordering the edges to be:
{4, 8}, {1, 2}, {8, 6}, {2, 5}, {3, 1}, {7, 5}, {4, 1}, {2, 4}, {4, 6}, {5, 6}, {6, 7}, {2, 3}Results in an output of:
0
1,2,3,4
5,6,7,8
There may be additional deficiencies/broken cases surrounding this particular implementation. I ceased investigation upon finding this problem, but I would strongly suggest extensive verification against a known-working stack-based implementation after taking any corrective actions on your stackless algorithm.
Container Types
As a general rule of thumb, when returning a container, you should return the most-generic container interface type appropriate. It is almost never appropriate to return a concrete container, such as a
List or Dictionary. In the case of public List GetStronglyConnectedComponents(), it should probably return ICollection (since knowing how many groups and how many vertices are in each group makes sense enough to not use IEnumerable, but all of the IList functionality seems overkill).Comments
Comments are far easier to read when they are above the line they are commenting, rather than to the right. Comments on the right of lines tend to contribute to overly-long lines, they tend to not be well-aligned (and come out of alignment upon refactoring), and tend to get lost/pushed off the side of the screen. All of this makes it much more difficult to do a comments-only skim of the code to get the gist of what it's supposed to be doing, prior to analyzing the code proper, or when quickly trying to find the part of the code you need to modify. The only benefit of comments on the right is that they take up less horizontal space -- but the hit to readability, and therefore usefulness, makes this placement unacceptable to me in most cases.
Some of the comments could also use improvement. "If we will ever find cycle" is confusing; it's either an incomplete thought, or placed on the wrong line. "Initialize to current time" is self-evident; it would be more useful to know why initializing to the current time is important (and possibly comment what currentTime is/does). "DFS" I assume means "depth-first search," and if so, should go into the recursive GetStronglyConnectedComponents call (right before the if statement that kicks off the actual depth-first searching), not the root call.
The "var currentDiscoveryTime[...]" comment is also dead code, and should be removed.
Structures
It would be nice from an aesthetic standpoint to, e.g., pass around a single array of a structure containing your working data, rather than three same-sized arrays.
Speed
If you're simply aiming to get to an \$O(1)\$ sidestep of the stack lookup, consider the easier alternative of keeping a secondary data structure (such as your
List) that you can modify to mirror the stack, allowing you \$O(1)\$ insertion and lookup in \$2N\$ space. This would eliminate your need to get clever with edge orderings (which will take time to sort) or subtle algorithm alterations (translating the stack to effectively an array), while getting you the speedup you want. It's not as cool or elegant as the algorithmic gymnastics of doing away with the stack altogether, but it's much more likely to be maintainable, fast, and correct in the future. ;)Code Snippets
{4, 8}, {1, 2}, {8, 6}, {2, 5}, {3, 1}, {7, 5}, {4, 1}, {2, 4}, {4, 6}, {5, 6}, {6, 7}, {2, 3}Context
StackExchange Code Review Q#47846, answer score: 5
Revisions (0)
No revisions yet.