HiveBrain v1.2.0
Get Started
← Back to all entries
patterncsharpMinor

Loops to merge documents and renumber pages

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
mergeloopsrenumberandpagesdocuments

Problem

I have a for loop implementation for merging documents and page re-numbering. But since I am modifying the collection while iterating, I suspect that the code might break. As foreach doesn't allow collection modification, I have implemented it with for.

Is there any better way to implement/optimize this part of code? Also, is it possible to reduce these 3 loops to a single one?

// Merge Document Sequence
for (int k = 0; k < response.documents.Count; k++)
{
    string presentDocument = string.Empty;
    string laterDocument = string.Empty;

    presentDocument = response.documents[k].docType;
    if ((response.documents.Count - 1) != k)
        laterDocument = response.documents[k + 1].docType;

    if ((laterDocument == presentDocument))
    {
        response.documents[k].pages.AddRange(response.documents[k + 1].pages);
        response.documents.RemoveAt(k + 1);
        k = k - 1;
    }
}

// Page Re-numbering                    
for (int i = 0; i < response.documents.Count; i++)
{
    for (int j = 0; j < response.documents[i].pages.Count; j++)
        response.documents[i].pages[j].pageNo= j + 1;
}

Solution

One way to make it easier to follow is to construct your merged documents result separately from your input (above code is not tested).

Construct a dictionary to easily get the document to append to, based on type. It also ensures a fast operation. For each document from the input, append its pages in the dictionary:

// type => merged document mapping
var mergedDocsDict = new Dictionary();

response.documents.ForEach(doc => 
    {
       if (!mergedDocDict.ContainsKey(doc.docType))
          mergedDocDict.Add(doc.docType, new Document());

       mergedDocDict[doc.docType].pages.AddRange(doc.pages);
    };


Finally renumbering all the pages:

mergedDocDict.Keys.ForEach(key => 
   { 
       var doc = mergedDocDict[key];
       int pageCount = doc.pages.Count;
       for (int i = 0; i < pageCount; i ++)
          doc.pages[i].pageNo = i + 1;
   };


You result can be obtain from the dictionary:

mergedDocDict.Values.ToList();


This solution should be easier to understand, but uses extra memory for the result.

This can be overcome, by reusing the same documents:

response.documents.ForEach(doc => 
    {
       if (!mergedDocDict.ContainsKey(doc.docType))
          mergedDocDict.Add(doc.docType, doc);
       else
          mergedDocDict[doc.docType].pages.AddRange(doc.pages);
    };


Also, page renumbering can be done in the first loop, obtaining a single foreach:

response.documents.ForEach(doc => 
    {
       if (!mergedDocDict.ContainsKey(doc.docType))
          mergedDocDict.Add(doc.docType, doc);
       else
       {
          int existingCount = mergedDocDict[doc.docType].pages.Count;
          mergedDocDict[doc.docType].pages.AddRange(doc.pages);
          int newCount = doc.pages.Count + existingCount;
          for (int i = existingCount; i < newCount; i ++)
            mergedDocDict[doc.docType].pages[i].pageNo = i + 1; 
       }
    };


Complexity should be O(number_of_docs * number_of_pages) as all dictionary operations are done in O(1).

If your documents were copied from some other data structure to your response, I think it is better to construct the dictionary from that data structure and obtain respose.documents from the merged data. Otherwise, you have to use a for loop to remove documents already merged (somewhat more convoluted than foreach version).

Code Snippets

// type => merged document mapping
var mergedDocsDict = new Dictionary<String, Document>();

response.documents.ForEach(doc => 
    {
       if (!mergedDocDict.ContainsKey(doc.docType))
          mergedDocDict.Add(doc.docType, new Document());

       mergedDocDict[doc.docType].pages.AddRange(doc.pages);
    };
mergedDocDict.Keys.ForEach(key => 
   { 
       var doc = mergedDocDict[key];
       int pageCount = doc.pages.Count;
       for (int i = 0; i < pageCount; i ++)
          doc.pages[i].pageNo = i + 1;
   };
mergedDocDict.Values.ToList();
response.documents.ForEach(doc => 
    {
       if (!mergedDocDict.ContainsKey(doc.docType))
          mergedDocDict.Add(doc.docType, doc);
       else
          mergedDocDict[doc.docType].pages.AddRange(doc.pages);
    };
response.documents.ForEach(doc => 
    {
       if (!mergedDocDict.ContainsKey(doc.docType))
          mergedDocDict.Add(doc.docType, doc);
       else
       {
          int existingCount = mergedDocDict[doc.docType].pages.Count;
          mergedDocDict[doc.docType].pages.AddRange(doc.pages);
          int newCount = doc.pages.Count + existingCount;
          for (int i = existingCount; i < newCount; i ++)
            mergedDocDict[doc.docType].pages[i].pageNo = i + 1; 
       }
    };

Context

StackExchange Code Review Q#117291, answer score: 3

Revisions (0)

No revisions yet.