HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Download Images as fast as possible

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
downloadimagespossiblefast

Problem

I have a spreed sheet and its something like 550+ columns and I need to pull URLs from it. Previously I was using LinqToExcel when the sheet was less than 255 columns a recent update has produced this monstrosity, so I have moved on to interop with excel to parse the data.

Needless to say I have a list of a string I want to name all the images to, plus the index of that image for this item(uniqueness of file names). Currently I use 10 threads to do the queries for the data. 10 HttpClients, 1 on each thread, and then I use a paralle.ForEach and Partitioner to dole out batches of 100 items to each parallel thread. This is not a Image pull from 1 website, many websites may be covered in the items.

Is This the Fastest way to do it?
Old Code

Relevant new code:

```
Imports System.Collections
Imports System.Collections.Generic
Imports System.Collections.Concurrent
Imports System.Data
Imports System.Linq
Imports System.Net
Imports System.Net.Http
Imports System.IO
Imports System.Text
Imports System.Xml.Linq
Imports Microsoft.Office.Interop.Excel
Imports System.Threading.Tasks

Public Class MDI
Public Class ImgThreadParams
Public gtin As String
Public lUrls As List(Of String) = New List(Of String)()
End Class
Public Shared Function GetXlsxFiles(targetFolder As String) As String()
Dim files As String()
If targetFolder = vbNullString Then
files = System.IO.Directory.GetFiles(System.IO.Directory.GetCurrentDirectory(), "Export*.xlsx")
Else
If System.IO.Directory.Exists(targetFolder) = True Then
files = System.IO.Directory.GetFiles(targetFolder, "Export*.xls")
Else
files = System.IO.Directory.GetFiles(System.IO.Directory.GetCurrentDirectory(), "Export*.xlsx")
End If
End If
Return files
End Function
Public Shared Sub ProcessFolder(targetFolder As String)
Dim ListParams As New List(Of ImgThreadParams)()
Dim gtin As String = vbNullString
Dim files = GetXlsxFiles(targetFolder)
Di

Solution

Just a few quick remarks (for now):

-
vbNullString is a remnant of VB6 (which I think would be under the Microsoft.VisualBasic.Constants namespace), and used to be a null string pointer often confused with "", an empty string. In VB.NET vbNullString is Nothing, so when you do this:

If targetFolder = vbNullString Then


You're actually verifying whether targetFolder contains a reference at all... but you probably mean to check whether the provided string is null or empty. A more idiomatic way to test for this is String.IsNullOrEmpty:

If String.IsNullOrEmpty(targetFolder) Then


VB.NET's String type overrides the = operator such as Not String.IsNullOrEmpty(s), s <> "", s <> String.Empty and s <> vbNullString are all equivalent.

-
The following only apply if your project defaults options aren't customized:

  • You're not specifying Option Explicit explicitly, but that's fine because in VB.NET that's the default behavior, contrary to VB6/VBA where implicit declarations are allowed by default.



  • You're not specifying Option Strict either, and that's less fine, because by default, the VB.NET compiler won't enforce strict data typing - VB6/VBA was very loose with type conversions, and without Option Strict, VB.NET keeps a lot of that looseness. Note that Option Strict implies Option Explicit, so you don't have to specify both.



-
You're specifying Imports statements, but then you fully-qualify things in these imported namespaces - this defeats the purpose: either you Imports namespaces, or you fully-qualify stuff. For example, you have Imports System.IO, so this code:

files = System.IO.Directory.GetFiles(System.IO.Directory.GetCurrentDirectory(), "Export*.xlsx")


Could read like this instead:

files = Directory.GetFiles(Directory.GetCurrentDirectory(), "Export*.xlsx")


There really isn't much of a reason to directly work with a Thread anymore, especially given Imports System.Threading.Tasks. Let the framework deal with the metal-level threading stuff: it's very unlikely that starting 10 threads is buying you anything in reality - no computer has 10 CPU cores anyway.

It's impossible to read ProcessFolder and know what's going on at a glance. The procedure is doing WAY too many things, at a WAY too low abstraction level. Extract methods out of code blocks, reduce the nesting: this is a clear sign that something isn't right:

That ThreadedGetGtinImages procedure is running on a thread already - and then you start a Parallel.ForEach loop inside that to start an Async Sub that does its thing... Get rid of the threads, and launch asynchronous tasks instead: I suspect your code has a TON of threading overhead that's nullifying the benefits of it all.

Also... whenever you feel the need to name something t1, t2, t3, ..., t12 ...it means you've missed an opportunity to use an appropriate data structure, e.g. some array or some List(Of Something) - that would have eliminated all that huge chunk of repeated code there, where you populate the Ln lists - IMO that should be a function in its own right.

The identifiers you choose aren't consistently meaningful - I get ProcessFolder and excel and sheet, but t7 and itp and r and w and hn and hlare terrible names that will definitely make you want to pull your hair off if you drop this and come back to it in 6 months from now. By the way hl is pure evil, especially in a code base where it might just as well be h1.

Code Snippets

If targetFolder = vbNullString Then
If String.IsNullOrEmpty(targetFolder) Then
files = System.IO.Directory.GetFiles(System.IO.Directory.GetCurrentDirectory(), "Export*.xlsx")
files = Directory.GetFiles(Directory.GetCurrentDirectory(), "Export*.xlsx")

Context

StackExchange Code Review Q#141438, answer score: 9

Revisions (0)

No revisions yet.