HiveBrain v1.2.0
Get Started
← Back to all entries
patterncsharpMinor

Optimisation XML handling within vb.net application

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
handlingapplicationoptimisationxmlwithinnet

Problem

I need to optimize this snippet to be faster:

Dim lst = (From t In DocElet.ChildNodes Select ID = t.item("ID").outerxml).Distinct().ToList()
       Parallel.For(0, lst.Count, Sub(i)
         Dim P As XmlElement = GetElement(lst(i))
       Dim ls = (From t In DocElet.ChildNodes Where t.item("ID").innerText = P.InnerText Select t)
         Parallel.ForEach(ls, Sub(D)
         Dim verif_date As String = D.Item("DAD").InnerText
        Sej.ID = D.Item("ID").InnerText
         End Sub)
            End Sub)


This is the XML structure:


  


I'm asking how I can optimize my code because it takes a lot of time (50 sec) in the case where the list contains 20000 elements.

Solution

The indentation makes your snippet quite hard to read - the VB.NET syntax for inline delegates is fairly bulky (at least compared to C#), I'd give it more air; being able to read the code is step one.

Dim lst = (From t In DocElet.ChildNodes 
           Select ID = t.item("ID").outerxml).Distinct()
                                             .ToList()

Parallel.For(0, lst.Count, 
         Sub(i)
             Dim P As XmlElement = GetElement(lst(i))
             Dim ls = (From t In DocElet.ChildNodes 
                       Where t.item("ID").innerText = P.InnerText 
                       Select t)

             Parallel.ForEach(ls, 
                      Sub(D)
                          Dim verif_date As String = D.Item("DAD").InnerText
                          Sej.ID = D.Item("ID").InnerText
                      End Sub)
         End Sub)


Ok. So you're running a Parallel.For, and inside that, you're running a Parallel.ForEach. That's quite a lot of overhead just there, it's likely that you're shooting yourself in the foot here, I'd first remove the inner Parallel.ForEach:

Parallel.For(0, lst.Count, 
         Sub(i)
             Dim P As XmlElement = GetElement(lst(i))
             Dim ls = (From t In DocElet.ChildNodes 
                       Where t.item("ID").innerText = P.InnerText 
                       Select t)

             For Each foo In ls
                 Dim verif_date As String = D.Item("DAD").InnerText
                 Sej.ID = D.Item("ID").InnerText
             Next
         End Sub)


What's the purpose of verif_date? If it's not used, remove it:

For Each foo In ls
                 Sej.ID = D.Item("ID").InnerText
             Next


Now the outer loop is making yet another selection:

Dim ls = (From t In DocElet.ChildNodes 
                       Where t.item("ID").innerText = P.InnerText 
                       Select t)


I'd try to flatten that. I'm not very familiar with the LINQ syntax in VB.NET, but what you want is a SelectMany, where you'd select the child nodes where "ID"'s inner text matches the element's inner text for the node you're iterating - in other words the ls query gets outside the outer loop and into the lst query, so you only need a single loop (which can probably run in parallel).

If the XML is massive, it could be worth trying some PLINQ:

Dim lst = (From t In DocElet.ChildNodes 
           Select ID = t.item("ID").outerxml)
          .Distinct()
          .AsParallel()
          .SelectMany(...)
          .ToList()


Now you can iterate lst with Parallel.ForEach (time it with a normal For Each first) and do your thing.

As an aside, I have to say your naming is inconsistent and hard to follow: Sub(i) looks good (Sub(index) would be better), Sub(D) doesn't. I realize VB.NET isn't case-sensitive, but if you're going to have a convention to make function parameters camelCase, by all means stick to it. lst would probably be better off as elements or whatever makes sense - use real, readable words for your identifiers, not just arbitrary abbreviations.

Code Snippets

Dim lst = (From t In DocElet.ChildNodes 
           Select ID = t.item("ID").outerxml).Distinct()
                                             .ToList()

Parallel.For(0, lst.Count, 
         Sub(i)
             Dim P As XmlElement = GetElement(lst(i))
             Dim ls = (From t In DocElet.ChildNodes 
                       Where t.item("ID").innerText = P.InnerText 
                       Select t)

             Parallel.ForEach(ls, 
                      Sub(D)
                          Dim verif_date As String = D.Item("DAD").InnerText
                          Sej.ID = D.Item("ID").InnerText
                      End Sub)
         End Sub)
Parallel.For(0, lst.Count, 
         Sub(i)
             Dim P As XmlElement = GetElement(lst(i))
             Dim ls = (From t In DocElet.ChildNodes 
                       Where t.item("ID").innerText = P.InnerText 
                       Select t)

             For Each foo In ls
                 Dim verif_date As String = D.Item("DAD").InnerText
                 Sej.ID = D.Item("ID").InnerText
             Next
         End Sub)
For Each foo In ls
                 Sej.ID = D.Item("ID").InnerText
             Next
Dim ls = (From t In DocElet.ChildNodes 
                       Where t.item("ID").innerText = P.InnerText 
                       Select t)
Dim lst = (From t In DocElet.ChildNodes 
           Select ID = t.item("ID").outerxml)
          .Distinct()
          .AsParallel()
          .SelectMany(...)
          .ToList()

Context

StackExchange Code Review Q#55475, answer score: 2

Revisions (0)

No revisions yet.