patterncsharpMinor
Iterating thousands of times over large collections: advices
Viewed 0 times
iteratingadviceslargecollectionsthousandstimesover
Problem
This is a pet project :)
I have hundreds of playlist files .m3u whose many entries have a wrong path. I'd like to fix them programmatically. Suppose one of my .m3U contains the following entry:
The file is stored in:
The goal is to update the wrong path with the right one :)
I have to create a collection of all my mp3 (around 6000 files) and their path. As I said, I have hundreds of .mu3 ; each .m3u has 15-20 entries. I have to check each entry against my 6000-entries-collection until the name matches. Well it is a plethora of loops :)
To ease the process I would create an ordered dictionary of dictionary whose the key would be the first letter of the name i.e. "M" (for Michael Jackson - Thriller.mp3) and the sub dictionary would be:
Doing so I would only to iterate over a subset of the collection (looking up for the first letter of the name) instead of iterating over the whole non ordered collection.
Do you think it is ok? What else would you recommend?
I have hundreds of playlist files .m3u whose many entries have a wrong path. I'd like to fix them programmatically. Suppose one of my .m3U contains the following entry:
z:\wrong-folder\Michael Jackson - Thriller.mp3The file is stored in:
G:\my-collection-of-mp3\any-folder\Michael Jackson - Thriller.mp3The goal is to update the wrong path with the right one :)
I have to create a collection of all my mp3 (around 6000 files) and their path. As I said, I have hundreds of .mu3 ; each .m3u has 15-20 entries. I have to check each entry against my 6000-entries-collection until the name matches. Well it is a plethora of loops :)
To ease the process I would create an ordered dictionary of dictionary whose the key would be the first letter of the name i.e. "M" (for Michael Jackson - Thriller.mp3) and the sub dictionary would be:
- key: Michael Jackson - Thriller.mp3
- value: G:\my-collection-of-mp3\any-folder\Michael Jackson - Thriller.mp3Doing so I would only to iterate over a subset of the collection (looking up for the first letter of the name) instead of iterating over the whole non ordered collection.
A / Albert Jones - Song 1.mp3 -> G:\my-collection-of-mp3\any-folder\Albert Jones - Song 1.mp3
M / Madonna - Song 1.mp3 -> D:\somewhere-else\Madonna - Song 1.mp3
M / Michael Jackson - Thriller -> G:\my-collection-of-mp3\any-folder\Michael Jackson - Thriller.mp3Do you think it is ok? What else would you recommend?
Solution
If you are particularly interested in doing substring look-ups, you could look into the Trie data structure. You will be stuck writing your own, however, as there is no implementation I know of in the BCL.
Alternatively, if you just want to put something together quick, you could just use a normal, single-level-deep dictionary keyed by file name. If you build your dictionary for existing MP3s first using an actual System.Collections.Hashtable or (preferably) System.Collections.Generic.Dictionary, the key lookups will be constant-time.
The only loops you will need are the initial loop to build the correct file path dictionary, then the loops over .m3u files with sub-loops over their referenced files. Retrieving the correct mp3 reference for each file in a m3u is just
Alternatively, if you just want to put something together quick, you could just use a normal, single-level-deep dictionary keyed by file name. If you build your dictionary for existing MP3s first using an actual System.Collections.Hashtable or (preferably) System.Collections.Generic.Dictionary, the key lookups will be constant-time.
The only loops you will need are the initial loop to build the correct file path dictionary, then the loops over .m3u files with sub-loops over their referenced files. Retrieving the correct mp3 reference for each file in a m3u is just
value = dictionary [key] or dictionary.TryGetValue (key, out value). The lookups themselves are not loops - rather, they generate hash codes for the keys which ultimately translate to array indexes in the underlying data structure.Context
StackExchange Code Review Q#14951, answer score: 4
Revisions (0)
No revisions yet.