FileSystemWatcher with threaded FIFO processing
Problem
I have a single-threaded C# console application that uses `FileSystemWatcher` to watch a folder for new files.

My app sees a file being written, waits until it is released, then picks it up and processes it. At the same time it writes a rolling log to a text file:
2014-08-06 16:20:1.500 - Found file : C:\test1.pdf
2014-08-06 16:20:1.510 - Waiting for file to become available
2014-08-06 16:20:2.010 - Processing file
2014-08-06 16:20:8.256 - Finished processing file: C:\test1.pdf
2014-08-06 16:20:8.785 - File C:\test2.pdf found!
etc..etc..
The current method ensures the following:
- Each file is processed in the order that `FileSystemWatcher` sees it.
- The log file is written in a linear fashion, one file at a time.
Files created in the folder seem to be cached by `FileSystemWatcher` until the current file has finished processing.

This works OK, but I don't like the fact that `FileSystemWatcher` is caching it; it must have a buffer limit somewhere, and files might drop off the end.

I think I need two threads:
- `FileSystemWatcher` thread that sees the new files and passes them to a 'Files to Process' collection.
- File processing thread that sees there are items in the 'Files to Process' collection and processes them, FIFO style.
I started work on this and decided to use an `ObservableCollection` for the 'Files to Process' list. I thought I could hang methods off the `NotifyCollectionChangedAction` events, but I am a bit stuck as to where to put the thread now.

My questions are:
- Do I need two threads?
- Is `ObservableCollection` the best object to use to manage the list of files to process?
- Where do I put a second thread? I am guessing that I need a new thread each time a new file is added to the `ObservableCollection`, but won't that trigger a thread for each file, so that the log will be jumbled up with each file that is added to the collection?
The files that `FileSystemWatcher` picks up become `Document` objects.

Here is the code I have so far:
`//Make th
Solution
Files created in the folder seem to be cached by `FileSystemWatcher` until the current file has finished processing.

Not really cached; `FileSystemWatcher` just raises its events one at a time. So, until your event handler returns, you won't get another notification. And since you execute all your code in that event handler, that can take a very long time.

The docs say this:
To avoid missing events, follow these guidelines:
- […]
- Keep your event handling code as short as possible.
So you shouldn't do it this way.
Do I need two threads?
If you want to make reasonably sure you won't miss anything, yes, you need (at least) two threads.
Is `ObservableCollection` the best object to use to manage the list of files to process?

In the current state, `ObservableCollection` doesn't make much sense: there is no list, because you will only ever have one item in the collection.

If you switched to two (or more) threads, `ObservableCollection` is still not a great choice, since it's not thread-safe.

Where do I put a second thread? I am guessing that I need a new thread each time a new file is added to the `ObservableCollection` but won't that trigger a thread for each file, then the log will be jumbled up with each file that is added to the collection?

If you wanted to process each file in parallel, you would need more threads. To keep the log clean (and to perform any other operations that are not thread-safe), you should use a lock.

In any case, you shouldn't directly use `Thread`s. Instead, you should use `Task`s, or some higher-level constructs (see below for more), since they are more efficient and easier to work with.

But if you don't need parallelism, a single processing thread (i.e. two threads in total) is enough and it also means you don't need any locks (assuming only this thread writes to the log).
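For the parallel case, a lock around the log writes might look like the sketch below. Everything named here (`LockedLogDemo`, `WriteLog`, `ProcessFile`, the in-memory list standing in for the log file) is hypothetical and not taken from the question's code. Note that a lock only keeps individual writes intact; to stop entries for different files from interleaving, this sketch writes all lines for one file inside a single locked call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class LockedLogDemo
{
    // Assumption: a list stands in for the rolling log file.
    private static readonly object LogLock = new object();
    private static readonly List<string> Log = new List<string>();

    // Writes a group of lines while holding the lock, so the group
    // cannot be interleaved with lines written by other tasks.
    private static void WriteLog(params string[] lines)
    {
        lock (LogLock)
        {
            Log.AddRange(lines);
        }
    }

    // Placeholder for the real per-file work.
    private static void ProcessFile(string path)
    {
        // ... real processing would happen here ...
        WriteLog("Processing file: " + path,
                 "Finished processing file: " + path);
    }

    // Processes every path on its own Task and returns a copy of the log.
    public static List<string> Run(IEnumerable<string> paths)
    {
        lock (LogLock) { Log.Clear(); }

        Task[] tasks = paths
            .Select(p => Task.Run(() => ProcessFile(p)))
            .ToArray();
        Task.WaitAll(tasks);

        lock (LogLock) { return new List<string>(Log); }
    }
}
```

The order in which files appear in the log is not deterministic here, which is the trade-off of parallel processing; only the two lines belonging to each file are guaranteed to stay together.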
Some specific options on how you could implement this:
- Use `ActionBlock` from TPL Dataflow (requires .NET 4.5). This could be the simplest option, since it means you don't need to create any `Thread`s or `Task`s manually.

  In the event handler, you would call `Post()`, and the work would be handled by the delegate that you passed to `ActionBlock`'s constructor. `ActionBlock` can also work in parallel, if you set its `MaxDegreeOfParallelism`.

- Use `BlockingCollection` and a `Task` with a loop.

  In the event handler, you would call `Add()`. You would then also create a `Task` (using `Task.Run()` or `Task.Factory.StartNew()`) with a `foreach` loop over `GetConsumingEnumerable()` that processes the files.

  This will work best if there is only a single `Task`, which means it won't be parallel.

- Create a separate `Task` for each file.

  In the event handler, you would create a `Task` with a delegate that processes the file. This way, files will be processed in parallel.
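As an illustration of the `BlockingCollection` option, which fits the FIFO requirement from the question, here is a minimal sketch. The names `FileQueue`, `WatcherDemo`, and `Watch`, the `*.pdf` filter, and the `processFile` delegate are all assumptions made for the example; the question's own `Document` handling and its wait-until-released logic would go inside the delegate.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper: queues file paths and processes them one at a
// time, in the order they were added.
public sealed class FileQueue : IDisposable
{
    // BlockingCollection wraps a ConcurrentQueue by default, so it is FIFO.
    private readonly BlockingCollection<string> _pending =
        new BlockingCollection<string>();
    private readonly Task _consumer;

    public FileQueue(Action<string> processFile)
    {
        // Single consumer Task: blocks while the queue is empty and
        // exits once CompleteAdding() has been called and the queue drains.
        _consumer = Task.Run(() =>
        {
            foreach (string path in _pending.GetConsumingEnumerable())
            {
                processFile(path);
            }
        });
    }

    // Cheap enough to call directly from a FileSystemWatcher event handler.
    public void Enqueue(string path) => _pending.Add(path);

    // Stops accepting new items and waits for the queued ones to finish.
    public void Complete()
    {
        _pending.CompleteAdding();
        _consumer.Wait();
    }

    public void Dispose() => _pending.Dispose();
}

public static class WatcherDemo
{
    // Wires a watcher to the queue; the handler only enqueues the path.
    public static FileSystemWatcher Watch(string folder, FileQueue queue)
    {
        var watcher = new FileSystemWatcher(folder, "*.pdf");
        watcher.Created += (sender, e) => queue.Enqueue(e.FullPath);
        watcher.EnableRaisingEvents = true;
        return watcher;
    }
}
```

Because only the consumer task calls `processFile`, the log can be written from there without a lock, and the event handler returns almost immediately.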
Context
StackExchange Code Review Q#59385, answer score: 8