HiveBrain v1.2.0
Get Started
← Back to all entries
patterncsharpMinor

Iterating thousands of times over large collections: advices

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
iteratingadviceslargecollectionsthousandstimesover

Problem

This is a pet project :)

I have hundreds of playlist files .m3u whose many entries have a wrong path. I'd like to fix them programmatically. Suppose one of my .m3U contains the following entry:

z:\wrong-folder\Michael Jackson - Thriller.mp3


The file is stored in:

G:\my-collection-of-mp3\any-folder\Michael Jackson - Thriller.mp3


The goal is to update the wrong path with the right one :)

I have to create a collection of all my mp3 (around 6000 files) and their path. As I said, I have hundreds of .mu3 ; each .m3u has 15-20 entries. I have to check each entry against my 6000-entries-collection until the name matches. Well it is a plethora of loops :)

To ease the process I would create an ordered dictionary of dictionary whose the key would be the first letter of the name i.e. "M" (for Michael Jackson - Thriller.mp3) and the sub dictionary would be:

- key: Michael Jackson - Thriller.mp3 
- value: G:\my-collection-of-mp3\any-folder\Michael Jackson - Thriller.mp3


Doing so I would only to iterate over a subset of the collection (looking up for the first letter of the name) instead of iterating over the whole non ordered collection.

A / Albert Jones - Song 1.mp3 -> G:\my-collection-of-mp3\any-folder\Albert Jones - Song 1.mp3

M / Madonna - Song 1.mp3 -> D:\somewhere-else\Madonna - Song 1.mp3
M / Michael Jackson - Thriller -> G:\my-collection-of-mp3\any-folder\Michael Jackson - Thriller.mp3


Do you think it is ok? What else would you recommend?

Solution

If you are particularly interested in doing substring look-ups, you could look into the Trie data structure. You will be stuck writing your own, however, as there is no implementation I know of in the BCL.

Alternatively, if you just want to put something together quick, you could just use a normal, single-level-deep dictionary keyed by file name. If you build your dictionary for existing MP3s first using an actual System.Collections.Hashtable or (preferably) System.Collections.Generic.Dictionary, the key lookups will be constant-time.

The only loops you will need are the initial loop to build the correct file path dictionary, then the loops over .m3u files with sub-loops over their referenced files. Retrieving the correct mp3 reference for each file in a m3u is just value = dictionary [key] or dictionary.TryGetValue (key, out value). The lookups themselves are not loops - rather, they generate hash codes for the keys which ultimately translate to array indexes in the underlying data structure.

Context

StackExchange Code Review Q#14951, answer score: 4

Revisions (0)

No revisions yet.