patterncsharpMinor
Guessing file type from content
Viewed 0 times
filetypeguessingcontentfrom
Problem
This code is to guess_file type searching its content for known signatures. It is intended to be used both from command line (searching entire media for mismatched extensions) and from code to determine and validate type of uploaded files which will be stored in a database (and eventually downloaded by someone else and possibly viewed).
One simple usage case:
For each file I have one or more
```
public sealed class InspectionResult
{
public enum GuessQuality
{
VeryGood,
Good,
Low,
VeryLow,
}
public InspectionResult(string fileName, string description, string[] expectedExtensions,
GuessQuality quality = GuessQuality.VeryGood)
{
if (expectedExtensions == null)
throw new ArgumentNullException(nameof(expectedExtensions));
FileName = fileName ?? "";
Description = description ?? "";
Quality = quality;
string fileExtension = Path.GetExtension(fileName);
ExtensionMatchesContent = expectedExtensions.Any(x => String.Equals(x, fileExtension,
StringComparison.CurrentCultureIgnoreCase));
}
public InspectionResult(string fileName, string description, string expectedExtension,
GuessQuality quality = GuessQuality.VeryGood)
: this(fileName, description, new string[] { expectedExtension }, quality)
{
}
public string FileName { get; }
public string Description { get; }
public string MimeType
{
get
{
if (_mimeType == null)
_mimeType = ResolveMimeTypeByExtension();
return _mimeType;
}
set
{
One simple usage case:
var inspector = new Inspector();
inspector.Sources.Add(@"c:\users\adriano\documents\*.*");
var mismatches = inspector.Inspect.Where(x => !x.ExtensionMatchesContent);
foreach (var mismatch in mismatches)
Console.WriteLine(mismatch.FileName);For each file I have one or more
InspectionResult (in case it matches multiple types, for any reason):```
public sealed class InspectionResult
{
public enum GuessQuality
{
VeryGood,
Good,
Low,
VeryLow,
}
public InspectionResult(string fileName, string description, string[] expectedExtensions,
GuessQuality quality = GuessQuality.VeryGood)
{
if (expectedExtensions == null)
throw new ArgumentNullException(nameof(expectedExtensions));
FileName = fileName ?? "";
Description = description ?? "";
Quality = quality;
string fileExtension = Path.GetExtension(fileName);
ExtensionMatchesContent = expectedExtensions.Any(x => String.Equals(x, fileExtension,
StringComparison.CurrentCultureIgnoreCase));
}
public InspectionResult(string fileName, string description, string expectedExtension,
GuessQuality quality = GuessQuality.VeryGood)
: this(fileName, description, new string[] { expectedExtension }, quality)
{
}
public string FileName { get; }
public string Description { get; }
public string MimeType
{
get
{
if (_mimeType == null)
_mimeType = ResolveMimeTypeByExtension();
return _mimeType;
}
set
{
Solution
FileName = fileName ?? "";This doesn't look right. Does an empty file name really makes sense? I think you should prevent it from being null-or-empty or add another constructor that doesn't require it and the one that does, should not allow a null.
public void AddFromAssembly(params Assembly[] assemblies)This method shouldn't be part of the
SleuthCollection. It should be implemented in a factory. The collection should just store the items. Knowing how to load an assembly and even create instances is another responsibility.After all you actually don't need the
SleuthCollection at all because if it wasn't a collection it meets everything to be a factory already.There are two possibilities: either a dedicated factory class or a factory method on the
Sleuth type itself. You could use it like:Sleuth.FromAssembly(...) // returns IEnumerableBut I find a dedicated factory would be better.
public sealed class GifSleuth : Sleuth
public abstract class MultipleSleuth : SleuthNested public classes should be avoided. If they don't need an access to private members of the parent type there is not reason for them to be inside another class. A namespace like
Sleuths would be more appropriate.public sealed class DataSourceCollection : CollectionThis type actually stores
DataSources but internaly it depends on a hard-coded FileDataSource. This isn't good. If you want to have a collection of file-data-sources then this should be the type of the collection. Also in this case a List<DataSource would be enough and the FileDataSource should have a factory method FileDataSource.From(string path). The DataSourceCollection shouldn't know how to create a FileDataSource or even how to check any wildcards. It's the responsibility of the FileDataSource.public enum GuessQualityA public enum nested in another class, no no no ;-)
return ""This
"" is dangerous. Use string.Empty in such cases.if (_mimeType == null)Since
_mimeType is a string you should use string.IsNullOrEmpty. You might be sure now that it is either null or a valid string but this is a dangerous assumption.I think it's very strange that the
InspectionResult is only 99% immutable. The MimeType shouldn't have a setter and you could initialize it like the other properties inside the construtor.Code Snippets
FileName = fileName ?? "";public void AddFromAssembly(params Assembly[] assemblies)Sleuth.FromAssembly(...) // returns IEnumerable<Sleuth>public sealed class GifSleuth : Sleuth
public abstract class MultipleSleuth : Sleuthpublic sealed class DataSourceCollection : Collection<DataSource>Context
StackExchange Code Review Q#146506, answer score: 2
Revisions (0)
No revisions yet.