File System Read Buffer Efficiency
Problem
Context
I'm working on a new Enterprise Library at my company and I am implementing some code that will be used to read files into memory. Since this is an enterprise library to be shared across the organization, the size of the file to be read in is unknown. I chose to manually read the contents of the files into memory (using a buffer) instead of using File.ReadAllText in order to have more control over the read process. For now, the buffer size is fixed at 512 bytes, but may be configurable in the future to allow a given use-case to have a larger or smaller buffer size. I have come up with the following code (which appears to work well):
Code
```
///
/// Class that provides access to the file system by wrapping system level file IO interactions.
///
public class FileSystemProvider : IFileSystemProvider {
    ///
    /// Initializes a new instance of the class.
    ///
    public FileSystemProvider() {
    } // end default constructor

    ///
    /// Returns a value indicating whether or not the specified file exists.
    ///
    /// string: The fully qualified path to the file.
    /// true if the specified file is found; false otherwise
    public virtual bool FileExists(string filePath) {
        bool returnValue = false;
        if (this.IsValidPath(filePath)) {
            returnValue = File.Exists(filePath);
        }
        return returnValue;
    } // end function FileExists

    ///
    /// Returns a value indicating whether or not the given path is valid. This method is provided as a means to keep the code
    /// DRY by providing a common place to verify the contents of a string prior to performing any file IO operations. This
    /// method can / should be overridden by a derived type to perform any additional checks (regex, length, etc...). The
    /// default implementation just checks for a null or whitespace-trimmed string.
    ///
    /// string: The path to check.
    /// true if the path is not null or an empty, whitespace-tr
```
Solution
Guard Clauses
You have code like:
```
public virtual string ReadFileContents(string filePath, Encoding fileEncoding) {
    string returnValue = string.Empty;
    if (this.FileExists(filePath)) {
        ... do stuff
        returnValue = .....
    }
    return returnValue;
}
```
This would be better written as a guard clause:
```
public virtual string ReadFileContents(string filePath, Encoding fileEncoding) {
    if (!this.FileExists(filePath)) {
        return string.Empty;
    }
    ... do stuff
    return ....;
}
```
Performance
Your concepts of 'large' and 'very large' are unconventional. I would consider 'large' to be in the > 100MiB ballpark, and 'very large' to be > 4GiB (more than a 32-bit size can hold).
This has impacted the features you consider to be performance-enhancing. All file systems I know of use at least a 512-byte extent, with the extents merged into at least 4KiB blocks. If you are buffering data, I would recommend at least a 4KiB buffer. I have done similar things in the past, and I now typically use a 1MiB buffer to get queued reads happening.
I would thus have a buffer like:
```
// Math.Min(int, long) returns a long; the cast is safe because the result never exceeds 1MiB.
int bufferSize = (int)Math.Min(1024 * 1024, fs.Length);
byte[] bufferBlock = new byte[bufferSize];
```
That will set a buffer that can read all, or big chunks, of the file.
If you do it that way, you can also remove the code path for files that are smaller than the buffer; they become irrelevant.
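To make the single code path concrete, here is a minimal sketch of a text read that uses one buffer capped at 1MiB. The class and method names (`BufferedReadSketch`, `ReadAllTextBuffered`) are hypothetical and not from the question's code, and the `Decoder` approach is one assumption about how the bytes might be turned into a string.

```csharp
using System;
using System.IO;
using System.Text;

public static class BufferedReadSketch
{
    private const int MaxBufferSize = 1024 * 1024; // 1MiB cap, as suggested above

    // Hypothetical helper: reads the whole file as text through one reusable buffer.
    public static string ReadAllTextBuffered(string filePath, Encoding encoding)
    {
        using (var fs = new FileStream(filePath, FileMode.Open, FileAccess.Read))
        {
            // Never allocate more than the file needs; keep at least one byte
            // so an empty file does not produce a zero-length buffer.
            int bufferSize = (int)Math.Max(1, Math.Min(MaxBufferSize, fs.Length));
            byte[] buffer = new byte[bufferSize];

            // A Decoder keeps state between calls, so a multi-byte character
            // split across two chunks is still decoded correctly.
            Decoder decoder = encoding.GetDecoder();
            char[] chars = new char[encoding.GetMaxCharCount(bufferSize)];
            var result = new StringBuilder();

            int bytesRead;
            while ((bytesRead = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                int charCount = decoder.GetChars(buffer, 0, bytesRead, chars, 0);
                result.Append(chars, 0, charCount);
            }
            return result.ToString();
        }
    }
}
```

Small and large files go through the same loop: a small file is consumed in one iteration, a large one in 1MiB chunks. Unlike File.ReadAllText, this sketch does not detect or strip a byte-order mark.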
Byte[] method:
```
byte[] ReadFileContents(string filePath)
```
This method is horrible overkill. Since you have to return all the bytes anyway, you may as well just allocate a single large buffer for the file size, populate it, and return it.
```
if (fs.Length > (long)Int32.MaxValue)
{
    ... throw appropriate exception
}
byte[] returnValue = new byte[(int)fs.Length];
```
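A minimal sketch of the full single-allocation read might look like the following. The names `ReadAllBytesSketch` and `ReadAllBytesSingleBuffer` are hypothetical, and the choice of `IOException` for the oversized case is an assumption, since the snippet above leaves the exception type open.

```csharp
using System;
using System.IO;

public static class ReadAllBytesSketch
{
    // Hypothetical helper: loads the whole file into one array sized from the file length.
    public static byte[] ReadAllBytesSingleBuffer(string filePath)
    {
        using (var fs = new FileStream(filePath, FileMode.Open, FileAccess.Read))
        {
            if (fs.Length > int.MaxValue)
            {
                // Assumption: IOException is treated as the "appropriate" exception here.
                throw new IOException("File is too large to fit in a single byte array.");
            }

            byte[] returnValue = new byte[(int)fs.Length];
            int offset = 0;

            // Stream.Read may return fewer bytes than requested, so loop until the array is full.
            while (offset < returnValue.Length)
            {
                int bytesRead = fs.Read(returnValue, offset, returnValue.Length - offset);
                if (bytesRead == 0)
                {
                    throw new EndOfStreamException("File ended before the expected length was read.");
                }
                offset += bytesRead;
            }
            return returnValue;
        }
    }
}
```

The loop is there because a single `Read` call is not guaranteed to fill the array. If no extra control over the read is needed, the framework's File.ReadAllBytes already does this job.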
Context
StackExchange Code Review Q#72256, answer score: 7