pattern csharp Minor
Removing equivalent authors in a list
list authors equivalent removing
Problem
I have a list of authors in which I look for duplicates. The indexes of these duplicates are stored in a list of lists named `duplicity`. For example, index 0 of `duplicity` holds a nested list of the indexes in the authors list at which author A appears, index 1 holds the same for author B, and so on.
I also have a list `edge`, which stores the indexes of the nodes (authors) that are connected by an edge. The program walks through it and removes duplicate authors using the `duplicity` list.
For example, suppose `duplicity` stores the indexes 1, 2, 3. These refer to the authors list, where the authors' names are stored, and also to the `edge` list, where the start and end points of the edges are stored. If the authors at indexes 1, 2, 3 are all connected to each other, then the edge list is 1,2, 1,3, 2,3. Because the authors at indexes 1, 2, 3 are the same person, I need to merge them into one, so I walk the edge list and replace every 2 and 3 with 1.
The problem is that the code below is very slow. I'm sure something unnecessary in it is raising the complexity, but I can't figure out what. I think the main weakness is the first part with the for loops: it passes over the authors list many times, and with tens of thousands of items it's really slow. Can you help me optimize this code?
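To make the example concrete, here is a small sketch of the three lists and of the result the merge should produce. It assumes `edge` is a flat list of endpoint indexes (two entries per edge); the sample names and the `MergeGroup` helper are made up for illustration and are not the code being asked about.

```csharp
using System;
using System.Collections.Generic;

public static class EdgeMergeExample
{
    // Hypothetical helper: rewrites every edge endpoint that belongs to
    // `group` to the first index of that group.
    public static void MergeGroup(List<int> edge, List<int> group)
    {
        var members = new HashSet<int>(group);
        for (int j = 0; j < edge.Count; j++)
        {
            if (members.Contains(edge[j]))
            {
                edge[j] = group[0];
            }
        }
    }

    public static void Main()
    {
        // Sample data only: index 0 is another author, indexes 1..3 are the same author.
        var authors = new List<string> { "B", "A", "A", "A" };
        var duplicity = new List<List<int>> { new List<int> { 1, 2, 3 } };
        // Edges as a flat list of endpoint pairs: (1,2), (1,3), (2,3).
        var edge = new List<int> { 1, 2, 1, 3, 2, 3 };

        foreach (var group in duplicity)
        {
            MergeGroup(edge, group);
        }

        Console.WriteLine(authors[duplicity[0][0]]);  // A
        Console.WriteLine(string.Join(",", edge));    // 1,1,1,1,1,1
    }
}
```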
```
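// The start of this listing was lost in extraction: a `for (int i = 0; i < ...` loop
// ran here, followed by the signature of the extension method below.
// Generic type arguments are restored from how the values are used.
public static List<List<int>> GetDuplicates(this IList<string> source)
{
    HashSet<string> itemsSeen = new HashSet<string>();
    HashSet<string> itemsYielded = new HashSet<string>();
    List<List<int>> duplicates = new List<List<int>>();
    List<int> dupLow = new List<int>();
    HashSet<int> temp = new HashSet<int>();
    int c = 0;
    foreach (string item in source)
    {
        if (!itemsSeen.Add(item))        // item was seen before, so it is a duplicate
        {
            if (itemsYielded.Add(item))  // first time this duplicate is handled
            {
                if (item != "-")         // "-" entries are skipped
                {
                    int w = 0;
                    // The body of the `for (int j = 0; j < ...` loop was lost in extraction;
                    // it presumably filled `temp` before the copy below.
                    dupLow = new List<int>(temp);
                    duplicates.Add(dupLow);
                    temp.Clear();
                }
            }
        }
        // The remainder of the method was not captured.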
```
Solution
I'm not sure I follow the code, but I would try removing the repeated calls to `GetDuplicates`:
```
foreach (var duplicate in authors.GetDuplicates())
{
    for (int j = 0; j < edge.Count; j++)
    {
        for (int k = 0; k < duplicate.Count - 1; k++)
        {
            if (edge[j] == duplicate[k])
            {
                edge[j] = duplicate[k + 1];
            }
        }
    }
}
```
I've also replaced the calls to `ElementAt` with use of the indexer, as I think it's clearer.
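If that is still too slow, one further option (not part of the original code, just a sketch) is to avoid the nested loops altogether: group the indexes by name in a single pass, build a lookup from each duplicate index to one representative index, and then walk `edge` once. The names `GetDuplicateGroups`, `MergeEdges` and `AuthorMerge` are hypothetical, and the sketch assumes `edge` is a flat `List<int>` of endpoint indexes. It maps every duplicate to the first index of its group, as in the question's example, whereas the loop above, as far as I can tell, ends up on the last one.

```csharp
using System;
using System.Collections.Generic;

public static class AuthorMerge
{
    // Groups the indexes of equal names in one pass over `source`.
    // Only names that occur more than once are returned; "-" is skipped,
    // mirroring the check in the question's code.
    public static List<List<int>> GetDuplicateGroups(IList<string> source)
    {
        var indexesByName = new Dictionary<string, List<int>>();
        for (int i = 0; i < source.Count; i++)
        {
            if (source[i] == "-") continue;
            if (!indexesByName.TryGetValue(source[i], out var indexes))
            {
                indexes = new List<int>();
                indexesByName[source[i]] = indexes;
            }
            indexes.Add(i);
        }

        var groups = new List<List<int>>();
        foreach (var indexes in indexesByName.Values)
        {
            if (indexes.Count > 1) groups.Add(indexes);
        }
        return groups;
    }

    // Rewrites each endpoint in `edge` to the first index of its group,
    // visiting every edge entry exactly once.
    public static void MergeEdges(List<int> edge, List<List<int>> groups)
    {
        var canonical = new Dictionary<int, int>();
        foreach (var group in groups)
        {
            foreach (int index in group) canonical[index] = group[0];
        }
        for (int j = 0; j < edge.Count; j++)
        {
            if (canonical.TryGetValue(edge[j], out int target)) edge[j] = target;
        }
    }

    public static void Main()
    {
        // Sample data only.
        var authors = new List<string> { "-", "A", "A", "A", "B", "-" };
        var edge = new List<int> { 1, 2, 1, 3, 2, 3, 3, 4 };
        MergeEdges(edge, GetDuplicateGroups(authors));
        Console.WriteLine(string.Join(",", edge)); // 1,1,1,1,1,1,1,4
    }
}
```

With hash-based lookups the work is roughly proportional to the number of authors plus the number of edge entries, instead of growing with the product of groups, edges and group size.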
Code Snippets
```
foreach (var duplicate in authors.GetDuplicates())
{
    for (int j = 0; j < edge.Count; j++)
    {
        for (int k = 0; k < duplicate.Count - 1; k++)
        {
            if (edge[j] == duplicate[k])
            {
                edge[j] = duplicate[k + 1];
            }
        }
    }
}
```
Context
StackExchange Code Review Q#72271, answer score: 3