patterncppMinor
Implementation of Kosaraju's algorithm for strongly connected components
Viewed 0 times
kosarajustronglyalgorithmcomponentsconnectedforimplementation
Problem
To learn and practice coding in C++, I wrote an implementation of Kosaraju's two-pass algorithm for computing the strongly connected components in a directed graph, using depth-first search.
This was my first time touching any C++ code in a while (I mainly program in Python or C#). As such, any critique or advice on style, layout, readability, maintainability, and best practice would be greatly appreciated. And, although performance wasn't a primary goal, I am highly interested in any optimizations that could be made (currently it can process about 800 thousand nodes with 5 million edges in around 10 seconds).
```
#include
#include
#include
#include
#include
using std::vector;
using std::map;
using std::list;
using std::ifstream;
using std::cout;
using std::endl;
// Constants
//-------------------------------
const char FILENAME[] = "SCC.txt";
// Prototypes
//-------------------------------
long get_node_count(const char filename[]);
vector > parse_file(const char filename[]);
map > compute_scc(vector > &graph);
vector > reverse_graph(const vector > &graph);
void dfs_loop(const vector > &graph, vector &finishTime, vector &leader);
long dfs(const vector > &graph, long nodeIndex, vector &expanded, vector &finishTime, long t, vector &leader, long s);
list get_largest_components(const map > scc, long size);
/**
* Main
*/
int main() {
// Get the sequential graph representation from the file
vector > graph = parse_file(FILENAME);
// Compute the strongly-connected components
map > scc = compute_scc(graph);
// Compute the largest 5 components and print them out
list largestComponents = get_largest_components(scc, 5);
list::iterator it;
for (it = largestComponents.begin(); it != largestComponents.end(); it++) {
cout > parse_file(const char filename[]) {
// Get the node count and prepare the graph
long nodeCount = get_node_count(filename);
vector > graph(nodeCount);
// Open file and extract the data
This was my first time touching any C++ code in a while (I mainly program in Python or C#). As such, any critique or advice on style, layout, readability, maintainability, and best practice would be greatly appreciated. And, although performance wasn't a primary goal, I am highly interested in any optimizations that could be made (currently it can process about 800 thousand nodes with 5 million edges in around 10 seconds).
```
#include
#include
#include
#include
#include
using std::vector;
using std::map;
using std::list;
using std::ifstream;
using std::cout;
using std::endl;
// Constants
//-------------------------------
const char FILENAME[] = "SCC.txt";
// Prototypes
//-------------------------------
long get_node_count(const char filename[]);
vector > parse_file(const char filename[]);
map > compute_scc(vector > &graph);
vector > reverse_graph(const vector > &graph);
void dfs_loop(const vector > &graph, vector &finishTime, vector &leader);
long dfs(const vector > &graph, long nodeIndex, vector &expanded, vector &finishTime, long t, vector &leader, long s);
list get_largest_components(const map > scc, long size);
/**
* Main
*/
int main() {
// Get the sequential graph representation from the file
vector > graph = parse_file(FILENAME);
// Compute the strongly-connected components
map > scc = compute_scc(graph);
// Compute the largest 5 components and print them out
list largestComponents = get_largest_components(scc, 5);
list::iterator it;
for (it = largestComponents.begin(); it != largestComponents.end(); it++) {
cout > parse_file(const char filename[]) {
// Get the node count and prepare the graph
long nodeCount = get_node_count(filename);
vector > graph(nodeCount);
// Open file and extract the data
Solution
Fair enough. Lazy but OK I suppose. Personally when using
The idea of making it
This is very obnoxious to read. A bit of work formatting these lines to make it easy to read would have gone a long way:
Nothing wrong with this:
But you don' think it would have been more intuitive to read as:
A tiny bit of work wrapping your data into a class goes a long way to make the code more readable. Your style you are using the classes available in C++ but really your style is more C than C++.
Learn to use the standard algorithms:
Main is special (you don't actually need to return anything. If there is no return in main() it is equivalent to return 0).
If you're code never returns anything but 0 then I would not return anything (let the language do it stuff). Do this to distinguish it from code that can return an error code.
I hope you profiled to make sure it was worth reading the file twice.
The vector automatically re-sizes as required. It uses a heuristic to prevent it re-sizing too many times. If you are using a C++11 compiler then the cost of resizing will be minimized as it will be using move rather than copy to reduce the cost.
This is almost always wrong:
The last successfull read will read
The correct way to write the loop is:
Learn the standard functions:
There is no point in calculating
using like this I bind it to the tightest scope possible.using std::vector;
using std::map;
using std::list;
using std::ifstream;
using std::cout;
using std::endl;The idea of making it
std and not standard was so that it would not be too obnoxious when going std::cout in the code.This is very obnoxious to read. A bit of work formatting these lines to make it easy to read would have gone a long way:
long get_node_count(const char filename[]);
vector > parse_file(const char filename[]);
map > compute_scc(vector > &graph);
vector > reverse_graph(const vector > &graph);
void dfs_loop(const vector > &graph, vector &finishTime, vector &leader);
long dfs(const vector > &graph, long nodeIndex, vector &expanded, vector &finishTime, long t, vector &leader, long s);
list get_largest_components(const map > scc, long size);Nothing wrong with this:
// Get the sequential graph representation from the file
vector > graph = parse_file(FILENAME);But you don' think it would have been more intuitive to read as:
MyGraph graph(FILENAME);A tiny bit of work wrapping your data into a class goes a long way to make the code more readable. Your style you are using the classes available in C++ but really your style is more C than C++.
Learn to use the standard algorithms:
list::iterator it;
for (it = largestComponents.begin(); it != largestComponents.end(); it++) {
cout (std::cout, " ");
std::cout << std::endl;Main is special (you don't actually need to return anything. If there is no return in main() it is equivalent to return 0).
return 0;If you're code never returns anything but 0 then I would not return anything (let the language do it stuff). Do this to distinguish it from code that can return an error code.
I hope you profiled to make sure it was worth reading the file twice.
long nodeCount = get_node_count(filename);
vector > graph(nodeCount);The vector automatically re-sizes as required. It uses a heuristic to prevent it re-sizing too many times. If you are using a C++11 compiler then the cost of resizing will be minimized as it will be using move rather than copy to reduce the cost.
This is almost always wrong:
while (graphFile) {The last successfull read will read
up-to but not past the end of file. Thus leaving the stream in a good state even though there is no data left to read from the stream. Thus the loop will be enetered but the first read will fail. But inside your loop you don't check for failure. As a result you push_back() one more node than exists in the file (though the last node is a copy of the previous node).The correct way to write the loop is:
while(graphFile >> nodeIndex >> outIndex)
{
// Read of both values was successful
// Add it to the graph.
graph[nodeIndex - 1].push_back(outIndex - 1);
}Learn the standard functions:
if (nodeIndex > maxNodeIndex) {
maxNodeIndex = nodeIndex;
}
// Easier to read as:
maxNodeIndex = std::max(maxNodeIndex, nodeIndex);There is no point in calculating
nodeIndex. Either use the iterators, or use the index into the array the combination of both techniques is untidy.for (it = graph.begin(); it != graph.end(); it++) {
long nodeIndex = it - graph.begin();
// Loop through all outgoing edges, and reverse them in new graph
vector::const_iterator eit;
for (eit = graph[nodeIndex].begin(); eit != graph[nodeIndex].end(); eit++) {
reversed[*eit].push_back(nodeIndex);
}Code Snippets
using std::vector;
using std::map;
using std::list;
using std::ifstream;
using std::cout;
using std::endl;long get_node_count(const char filename[]);
vector< vector<long> > parse_file(const char filename[]);
map< long, vector<long> > compute_scc(vector< vector<long> > &graph);
vector< vector<long> > reverse_graph(const vector< vector<long> > &graph);
void dfs_loop(const vector< vector<long> > &graph, vector<long> &finishTime, vector<long> &leader);
long dfs(const vector< vector<long> > &graph, long nodeIndex, vector<bool> &expanded, vector<long> &finishTime, long t, vector<long> &leader, long s);
list<unsigned long> get_largest_components(const map< long, vector<long> > scc, long size);// Get the sequential graph representation from the file
vector< vector<long> > graph = parse_file(FILENAME);MyGraph graph(FILENAME);list<unsigned long>::iterator it;
for (it = largestComponents.begin(); it != largestComponents.end(); it++) {
cout << *it << ' ';
}
cout << endl;
// Easier to write as
std::copy(largestComponents.begin(), largestComponents.end(),
std::ostream_iterator<unsigned long>(std::cout, " ");
std::cout << std::endl;Context
StackExchange Code Review Q#10942, answer score: 8
Revisions (0)
No revisions yet.