Adding many items generated from Taglib# is incredibly slow and expensive, how can I increase the performance?
Problem
I'm trying to make an audio file tag editor, but I ran into some serious performance issues. Here's my method for loading files:
```csharp
private void LoadFiles(params string[] fileNames) {
    foreach (string fileName in fileNames) {
        // Copy the loop variable before capturing it in the lambda below.
        string path = fileName;
        if (loadedSongs.ContainsKey(path))
            continue;

        // One new thread per file.
        new Thread(new ThreadStart(() => {
            using (TagFile file = TagFile.Create(path)) {
                Song song = new Song() {
                    Album = file.Tag.Album,
                    AlbumArtists = file.Tag.AlbumArtists,
                    Artists = file.Tag.Performers,
                    BeatsPerMinute = (file.Tag.BeatsPerMinute != 0 ?
                        (uint?)file.Tag.BeatsPerMinute : null),
                    // ...snip...
                };

                lock (this.loadedSongs) {
                    this.loadedSongs.Add(path, song);
                }

                // Marshal the grid update back to the UI thread.
                this.Invoke((MethodInvoker)delegate {
                    int rowId = songDataGrid.Rows.Add();
                    DataGridViewRow row = songDataGrid.Rows[rowId];
                    UpdateRow(row, song);
                });
            }
        })).Start();
    }
}
```

So I have a Song class defined, which is really just a container for the various tags that TagLib.File provides, so I don't need to keep a handle on the file.
I have a few problems with this function, though:
- It is incredibly slow. Even with threading, it takes about 12 seconds before the DataGridView even updates and adds the first rows when I load 2GB of files (on my machine). That may not seem long, but imagine if a user adds 10 or 20GB of music: they could be waiting several minutes before they can use the application.
- The songDataGrid.Rows.Add() call doesn't seem to execute immediately; it seems to happen in "batches." I don't know whether the problem is the TagLib Sharp library or the DataGridView control, or if I'm just imagining things. It's probably t…
Solution
There are a couple of suggestions:
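- First, you do not need your path variable: it is simply fileName repackaged. This is probably of negligible impact, but there's no reason to keep the extra variable around, either. (This holds for C# 5 and later, where each foreach iteration gets a fresh loop variable. Under C# 4 and earlier, the lambda would capture the single shared fileName variable, and copying it into path was necessary.)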
- Next, multi-threading access to the filesystem is not going to give you much in the way of performance gains. In fact, if you're doing this on a spinning (spindle) hard drive, you are probably making things worse, because concurrent reads force the drive to seek back and forth between files. Instead of spawning a thread for every file, move your entire loop into a separate async method and have the loop run on a single background thread. You still get the UI responsiveness of a background task this way.
- You may want to avoid hand-building threads. It is generally preferable to use one of the higher-level mechanisms .NET provides instead, such as the ThreadPool, Task, the Begin…/End… asynchronous pattern (e.g. a delegate's BeginInvoke), or BackgroundWorker (since you seem to be in a UI).
- Consider having your async loop build its own collection and then doing a batch add to loadedSongs afterwards. This removes the need to synchronize access to loadedSongs (and the corresponding locking overhead) until the very end.
- Finally, I would suspend layout on your grid until the update is complete. DataGridView is pretty poor at updating itself quickly. You almost always win by suspending, making all your updates in a batch, and then resuming layout.
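The background-loop and batch-collection suggestions above can be sketched roughly like this. It is a minimal sketch, assuming .NET 4.5+ (Task.Run, async/await); BatchLoader, LoadAsync, and the readSong delegate are hypothetical names, with readSong standing in for the TagLib-based code that builds a Song. All files are read sequentially on one background task into a local list, and nothing shared is touched until the caller merges the results.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper (not part of TagLib# or WinForms): reads every file
// sequentially on one background task and hands back all results at once,
// so no per-file locking or per-file UI invocation is needed.
public static class BatchLoader
{
    public static Task<List<KeyValuePair<string, TSong>>> LoadAsync<TSong>(
        IEnumerable<string> fileNames,
        ISet<string> alreadyLoaded,   // e.g. a snapshot of loadedSongs.Keys taken on the UI thread
        Func<string, TSong> readSong) // stands in for the TagLib-based code that builds a Song
    {
        // Filter on the calling thread so the background task never reads
        // shared state; Distinct() also drops duplicate paths within one call.
        List<string> pending = fileNames
            .Where(f => !alreadyLoaded.Contains(f))
            .Distinct()
            .ToList();

        return Task.Run(() =>
        {
            var results = new List<KeyValuePair<string, TSong>>(pending.Count);
            foreach (string path in pending)
            {
                // One file at a time: no competing reads on the same disk.
                results.Add(new KeyValuePair<string, TSong>(path, readSong(path)));
            }
            return results;
        });
    }
}
```

On the UI thread you could then await the task, add each pair to loadedSongs, and add all the grid rows between songDataGrid.SuspendLayout() and songDataGrid.ResumeLayout(), so the grid handles one batch instead of one Invoke per file. The snapshot parameter (for example new HashSet<string>(loadedSongs.Keys)) exists because a Dictionary is not safe to read while another thread writes to it.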
However, my suspicion is that the biggest gain will come from implementing your own tag reader rather than using TagLib, assuming you are correct that it reads the entire file. I/O is one of the most expensive things you can do on a computer. ID3v2 tags generally sit at the start of the file, and their header records the tag's size, while ID3v1 tags occupy the last 128 bytes of the file; so you only have to read those tag regions rather than the entire file.
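A minimal sketch of locating the tag regions without touching the audio data, assuming the standard ID3v2 header layout (10 bytes: "ID3", major version, revision, flags, and a 4-byte "syncsafe" size that excludes the header) and the fixed 128-byte ID3v1 block starting with "TAG" at the end of the file. Id3Probe and its methods are hypothetical names; frame parsing, unsynchronisation, extended headers, and non-ID3 formats (Vorbis comments, MP4 atoms, APE) are deliberately left out.

```csharp
using System;
using System.IO;
using System.Text;

// Sketch only: finds where ID3 tags live so a reader can skip the audio data.
public static class Id3Probe
{
    // Returns the total ID3v2 tag length in bytes (header + body [+ footer]),
    // or 0 if the stream does not start with an ID3v2 header.
    public static long GetId3v2Length(Stream stream)
    {
        var header = new byte[10];
        stream.Seek(0, SeekOrigin.Begin);
        if (ReadFully(stream, header) < 10) return 0;
        if (header[0] != (byte)'I' || header[1] != (byte)'D' || header[2] != (byte)'3') return 0;

        // Syncsafe integer: 4 bytes, only the low 7 bits of each are used.
        long size = (header[6] & 0x7F) << 21 | (header[7] & 0x7F) << 14
                  | (header[8] & 0x7F) << 7 | (header[9] & 0x7F);
        bool hasFooter = (header[5] & 0x10) != 0; // ID3v2.4 "footer present" flag
        return 10 + size + (hasFooter ? 10 : 0);
    }

    // Returns the 30-byte title field of an ID3v1 tag (last 128 bytes),
    // or null if the stream has no ID3v1 tag.
    public static string ReadId3v1Title(Stream stream)
    {
        if (stream.Length < 128) return null;
        var tag = new byte[128];
        stream.Seek(-128, SeekOrigin.End);
        if (ReadFully(stream, tag) < 128) return null;
        if (tag[0] != (byte)'T' || tag[1] != (byte)'A' || tag[2] != (byte)'G') return null;

        // ID3v1 fields are fixed-width and padded with NULs or spaces.
        return Encoding.GetEncoding("ISO-8859-1").GetString(tag, 3, 30).TrimEnd('\0', ' ');
    }

    // Stream.Read may return fewer bytes than requested, so loop until full or EOF.
    private static int ReadFully(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = stream.Read(buffer, total, buffer.Length - total);
            if (n == 0) break;
            total += n;
        }
        return total;
    }
}
```

Opened on a FileStream, GetId3v2Length tells you how many bytes to read before the audio begins (usually small, though embedded cover art can make it larger), which is far less than reading a whole multi-megabyte file. The caller owns the stream and is responsible for disposing it.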
As with any performance issue, though, you need to run this through a profiler. The results may very well point you in a completely different direction. For example, it may reveal that TagLib isn't actually reading the entire file. At the very least, it provides baselines to use in determining if you are making meaningful gains.
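A profiler is the right tool for finding where the time goes, but even a crude Stopwatch harness is enough to record a baseline for a fixed set of files and compare it after each change. Baseline.Measure below is a hypothetical helper, not a profiler: it only reports wall-clock time per run.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Crude timing harness (not a profiler): runs an action several times and
// returns the wall-clock time of each run for before/after comparisons.
public static class Baseline
{
    public static List<TimeSpan> Measure(Action action, int runs)
    {
        if (runs < 1) throw new ArgumentOutOfRangeException(nameof(runs));
        var timings = new List<TimeSpan>(runs);
        for (int i = 0; i < runs; i++)
        {
            var sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            timings.Add(sw.Elapsed);
        }
        return timings;
    }
}
```

Expect the first run to be slower because of JIT compilation and a cold OS file cache; later runs mostly measure cached reads, which understates the real disk cost that matters for a freshly added music folder.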
Context
StackExchange Code Review Q#9785, answer score: 4