HiveBrain v1.2.0
Get Started
← Back to all entries
patterncsharpMinor

Better method for updating data table

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
methodupdatingbetterfordatatable

Problem

I have completed an application which gathers specific records and marks them as Collected. As of now, the application runs perfectly and does exactly what I need it to, but the problem comes in when a large data set is reviewed. If I am pulling records for one day or even maybe a week, it runs at a decent pace, but once you get a month of data or more, it takes a fairly long amount of time.

DataTable ModelData = getModelData() // Returns all records to search.
...
EnumerableRowCollection modelRows = (from model in ModelData.AsEnumerable()
                                              where (model.Field(GeographicalKey) ?? (object)String.Empty).ToString() == GeographicCode
                                              select model);
ModelResults = modelRows.Any() ? modelRows.CopyToDataTable() : ModelData.Clone();

for (int i = 0; i < ModelResults.Rows.Count; i++)
{
    for (int j = 0; j < ModelData.Rows.Count - 1; j++)
    {
        if (ModelResults.Rows[i]["Request ID"].ToString() == ModelData.Rows[j]["Request ID"].ToString())
        {
            ModelData.Rows[j]["Collected"] = "1";
        }
    }
}


The only part that executes slowly would be the for and the nested for. Is there possibly a better way that I can code this? I can't imagine this is the optimal coding for data table update operations, but it might be. Is this a possibility in LINQ (I'm completely new to LINQ)? I think this is enough code to make sense of what I'm doing, but if more is needed just let me know and I can post it up as well.

Solution

The only part that executes slowly would be the for and the nested
for. Is there possibly a better way that I can code this?

I take you ask us to optimize the nested loops:

The most obvious problem, as @Antonio also noted, is the O(n^2) filtering in the nested fors. Also as he noted, you need some kind of data structure with O(1) Contains method. However since you do not use other fields of from the rows of ModelResults, you only need a HashSet of ["Request ID"]s.


I can't imagine this is the optimal coding for data table update
operations, maybe though? Is this a possibility in LINQ (I'm
completely new to LINQ)?

I take you are asking for a more compact, more LINQy way of updating a collection, in this case a DataTable:

LINQ is a query language and a LINQ query does not modify underlying IEnumerable. (In most cases you would select a modified copy of the underlying IEnumerable, instead.)

However, you can use List.ForEach on the result of such a query, as a more compact replacement of for loop.

Applying two suggestions above, we get:

var collectedRequestIds = new HashSet(
    modelResults.AsEnumerable().Select(row => row["Request ID"].ToString()));

modelData.AsEnumerable()
    .Where(collectedRequestIds.Contains(row => row["Request ID"].ToString())))
    .ToList().ForEach(row => row["Collected"] = "1");

Code Snippets

var collectedRequestIds = new HashSet<string>(
    modelResults.AsEnumerable().Select(row => row["Request ID"].ToString()));

modelData.AsEnumerable()
    .Where(collectedRequestIds.Contains(row => row["Request ID"].ToString())))
    .ToList().ForEach(row => row["Collected"] = "1");

Context

StackExchange Code Review Q#44973, answer score: 4

Revisions (0)

No revisions yet.