patterncsharpMinor
Reddit RepostFinder Bot
Viewed 0 times
repostfinderredditbot
Problem
I've been making a bot for Reddit that finds reposts (based on image MD5 comparing) and posts a small comment on the thread (that part will be later, right now it downloads images and compares MD5). If the image is original content, it will add it to its list of hashes. The source code is here. Any suggestions as to what I can improve?
```
public static bool DownloadImage(string url, bool isDirect) // returns if repost
{ // example url: http://imgur.com/UndN77d
if (url.Contains('?'))
{
url = url.Split('?')[0]; // if it has a query at the end like ?1
}
string filename = null;
if (url.Contains("imgur"))
{ // is imgur, so use ImgurAPI class for it
string imgurID = url.Split('/')[url.Split('/').Length - 1];
Console.WriteLine("Processing imgur id {0}", imgurID);
if (!isDirect)
{ // the sole goal for this block is to get Vars.picToCompare
string details = Imgur.GetImageDetails(imgurID);
if (details == null)
{
return false;
}
Console.WriteLine("Got response: {0}", details); // json
JObject imgur = JObject.Parse(details);
if (imgur["success"].ToString() == "false")
{ // auth failed
Console.WriteLine("Imgur processing failed.. Printing JSON contents: ");
Console.WriteLine(details);
Console.ReadKey();
return false;
}
else
{
Console.WriteLine("Imgur details recieved successfully!");
filename = imgur["data"]["link"].ToString();
url = filename;
Console.WriteLine("Processed direct URL: " + filename);
filename = filename.Split('/')[filename.Split('/').Length - 1];
Console.WriteLine("Processed image filename: " + filename);
Vars.picToCompare = Path.Combine(Vars.picsFolder, filename);
}
```
public static bool DownloadImage(string url, bool isDirect) // returns if repost
{ // example url: http://imgur.com/UndN77d
if (url.Contains('?'))
{
url = url.Split('?')[0]; // if it has a query at the end like ?1
}
string filename = null;
if (url.Contains("imgur"))
{ // is imgur, so use ImgurAPI class for it
string imgurID = url.Split('/')[url.Split('/').Length - 1];
Console.WriteLine("Processing imgur id {0}", imgurID);
if (!isDirect)
{ // the sole goal for this block is to get Vars.picToCompare
string details = Imgur.GetImageDetails(imgurID);
if (details == null)
{
return false;
}
Console.WriteLine("Got response: {0}", details); // json
JObject imgur = JObject.Parse(details);
if (imgur["success"].ToString() == "false")
{ // auth failed
Console.WriteLine("Imgur processing failed.. Printing JSON contents: ");
Console.WriteLine(details);
Console.ReadKey();
return false;
}
else
{
Console.WriteLine("Imgur details recieved successfully!");
filename = imgur["data"]["link"].ToString();
url = filename;
Console.WriteLine("Processed direct URL: " + filename);
filename = filename.Split('/')[filename.Split('/').Length - 1];
Console.WriteLine("Processed image filename: " + filename);
Vars.picToCompare = Path.Combine(Vars.picsFolder, filename);
}
Solution
public static bool DownloadImage(string url, bool isDirect) // returns if repostC# supports something called XML Documentation Comments which allow you to document your code directly while writing it in a processor (as in IDE) friendly way.
public static bool DownloadImage(string url, bool isDirect) // returns if repostLiar! This does not download the image but does checking on the URL, downloads the image, creates a MD5 sum for that image and then returns if it is already in the list of hashes.
You have one function that does everything, that's bad. Ideally you would split the responsibilities between multiple methods in one class, in some kind of such a structure (pseudo code):
class RepostChecker
byte[] ComputeMD5(byte[] data)
byte[] GetImage(string url)
bool IsKnown(byte[] hash)All of these methods can be static. So you can chain them like this:
IsKnown(ComputeMD5(GetImage(url)))Or you can write a convenience method:
bool IsRepost(string url)Which does the chaining for you. Then you define a class which does hold your hashes:
class Hashes
static string HashToString(byte[] hash)
void Add(string hash)
bool Contains(string hash)
void Load(whateverDatastoreYouUse)
void Save(whateverDatastoreYouUse)Internally this should use a
HashSet, not a normal List like your implementation.These changes would allow a more structured approach to the problem.
if (url.Contains('?'))
{
url = url.Split('?')[0]; // if it has a query at the end like ?1
}
string filename = null;
if (url.Contains("imgur"))You should use the Uri class if you work with URLs.
Uri uri = new Uri(url); // http://imgur.com/blabla?whatever
uri.GetLeftPart(UriPartial.Path); // http://imgur.com/blabla
uri.Host; // imgur.comIdeally you would not have one
if {} else {} construct to handle this situation, but an interface which implementations and a static factory:interface ImageHosterInteractor // Notice that the name is not optimal
byte[] GetImage(string url)From this interface you can now derive implementations for imgur and/or any other site. The static factory would look like this:
public static class ImageHosterInteractorFactory
{
public static ImageHosterInteractor Create(Uri uri)
{
if (uri.PathAndQuery.EndsWith(".png")) // Or any other for that matter
{
// Implementation which simply loads the data from
// the given Uri.
return new ImageHosterInteractors.Plain();
}
// The uri does not end with an explicit ending,
// so there's a good chance that we need to follow
// redirections or perform some other magic.
// So we use one of the explicit implementations.
switch(uri.Host)
{
case "imgur.com":
return new ImageHosterInteractors.Imgur();
break;
case "whatever.com":
return new ImageHosterIntactors.Whatever();
break;
default:
throw new NotImplementedException("There is no implementation assigned to handle: " + uri.Host);
}
}
}That allows your code to simply process the URL, pass it into the factory and get something back that knows how to treat that site and you don't need to worry about it further down.
Uri uri = new Uri(fromwhatever);
ImagerHosterInteractor interactor = ImageHosterInteractorFactory.Create(uri);
byte[] image = interactor.GetImage(uri);Also adding new sites to handle is as easy as implementing
ImageHosterInteractor and adding it to the factory method.At the moment it looks like you're saving the images you download, is that necessary or is it enough to simply save the hashes?
Code Snippets
public static bool DownloadImage(string url, bool isDirect) // returns if repostpublic static bool DownloadImage(string url, bool isDirect) // returns if repostclass RepostChecker
byte[] ComputeMD5(byte[] data)
byte[] GetImage(string url)
bool IsKnown(byte[] hash)IsKnown(ComputeMD5(GetImage(url)))bool IsRepost(string url)Context
StackExchange Code Review Q#40927, answer score: 5
Revisions (0)
No revisions yet.