HiveBrain v1.2.0
Get Started
← Back to all entries
patterncsharpMinor

Guessing file type from content

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
filetypeguessingcontentfrom

Problem

This code is to guess_file type searching its content for known signatures. It is intended to be used both from command line (searching entire media for mismatched extensions) and from code to determine and validate type of uploaded files which will be stored in a database (and eventually downloaded by someone else and possibly viewed).

One simple usage case:

var inspector = new Inspector();
inspector.Sources.Add(@"c:\users\adriano\documents\*.*");

var mismatches = inspector.Inspect.Where(x => !x.ExtensionMatchesContent);
foreach (var mismatch in mismatches)
    Console.WriteLine(mismatch.FileName);


For each file I have one or more InspectionResult (in case it matches multiple types, for any reason):

```
public sealed class InspectionResult
{
public enum GuessQuality
{
VeryGood,
Good,
Low,
VeryLow,
}

public InspectionResult(string fileName, string description, string[] expectedExtensions,
GuessQuality quality = GuessQuality.VeryGood)
{
if (expectedExtensions == null)
throw new ArgumentNullException(nameof(expectedExtensions));

FileName = fileName ?? "";
Description = description ?? "";
Quality = quality;

string fileExtension = Path.GetExtension(fileName);
ExtensionMatchesContent = expectedExtensions.Any(x => String.Equals(x, fileExtension,
StringComparison.CurrentCultureIgnoreCase));
}

public InspectionResult(string fileName, string description, string expectedExtension,
GuessQuality quality = GuessQuality.VeryGood)
: this(fileName, description, new string[] { expectedExtension }, quality)
{
}

public string FileName { get; }

public string Description { get; }

public string MimeType
{
get
{
if (_mimeType == null)
_mimeType = ResolveMimeTypeByExtension();

return _mimeType;
}
set
{

Solution

FileName = fileName ?? "";


This doesn't look right. Does an empty file name really makes sense? I think you should prevent it from being null-or-empty or add another constructor that doesn't require it and the one that does, should not allow a null.

public void AddFromAssembly(params Assembly[] assemblies)


This method shouldn't be part of the SleuthCollection. It should be implemented in a factory. The collection should just store the items. Knowing how to load an assembly and even create instances is another responsibility.

After all you actually don't need the SleuthCollection at all because if it wasn't a collection it meets everything to be a factory already.

There are two possibilities: either a dedicated factory class or a factory method on the Sleuth type itself. You could use it like:

Sleuth.FromAssembly(...) // returns IEnumerable


But I find a dedicated factory would be better.

public sealed class GifSleuth : Sleuth
public abstract class MultipleSleuth : Sleuth


Nested public classes should be avoided. If they don't need an access to private members of the parent type there is not reason for them to be inside another class. A namespace like Sleuths would be more appropriate.

public sealed class DataSourceCollection : Collection


This type actually stores DataSources but internaly it depends on a hard-coded FileDataSource. This isn't good. If you want to have a collection of file-data-sources then this should be the type of the collection. Also in this case a List<DataSource would be enough and the FileDataSource should have a factory method FileDataSource.From(string path). The DataSourceCollection shouldn't know how to create a FileDataSource or even how to check any wildcards. It's the responsibility of the FileDataSource.

public enum GuessQuality


A public enum nested in another class, no no no ;-)

return ""


This "" is dangerous. Use string.Empty in such cases.

if (_mimeType == null)


Since _mimeType is a string you should use string.IsNullOrEmpty. You might be sure now that it is either null or a valid string but this is a dangerous assumption.

I think it's very strange that the InspectionResult is only 99% immutable. The MimeType shouldn't have a setter and you could initialize it like the other properties inside the construtor.

Code Snippets

FileName = fileName ?? "";
public void AddFromAssembly(params Assembly[] assemblies)
Sleuth.FromAssembly(...) // returns IEnumerable<Sleuth>
public sealed class GifSleuth : Sleuth
public abstract class MultipleSleuth : Sleuth
public sealed class DataSourceCollection : Collection<DataSource>

Context

StackExchange Code Review Q#146506, answer score: 2

Revisions (0)

No revisions yet.