Parallelized download of multiple images based on URL
Problem
I decided to rewrite a previous program of mine from scratch. The result is a lot better than the previous one, but now that it seems to be working, I want to optimize it (hopefully it is not too early).
Provided a UrlList.txt, it reads the URLs of manga pages line by line, e.g. http://www.readmanga.today/naruto, and downloads everything in parallel. Only the list is gone through synchronously.
```
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Net;
using System.Threading.Tasks;

namespace MangaRipper
{
    internal class Manga
    {
        public string mainUrl { get; private set; }
        public List<string> chapterList { get; private set; }
        public Dictionary<string, List<string>> chapterImages { get; private set; }

        public Manga(string url)
        {
            this.mainUrl = url;
            this.chapterList = new List<string>();
            this.chapterImages = new Dictionary<string, List<string>>();
        }
    }

    internal class Program
    {
        private static string currentWorkingDirectory = Directory.GetCurrentDirectory();
        private static string outputDirectory = "";
        private static int degreeOfParallelism = Environment.ProcessorCount;

        private static void Main(string[] args)
        {
            Stopwatch sw = Stopwatch.StartNew();
            Init();
            List<string> Urls = ReadUrlListFromFile();
            List<Manga> mangaList = new List<Manga>();
            if (Urls != null)
            {
                foreach (var url in Urls)
                {
                    Console.WriteLine(url);
                    mangaList.Add(new Manga(url));
                }
                mangaList = PopulateChapterList(mangaList);
                mangaList = PopulateChapterImages(mangaList);
            }
            // Download
            foreach (var manga in mangaList)
            {
                var mangaName = Directory.CreateDirectory(outputDirectory + @"\" + GetStringUntilSlashFromBack(m
```
Solution
Three programmers are ordering Christmas presents online.
The first programmer orders a present, and goes to wait by the mailbox. He waits there for 24 hours until the first present arrives. He takes the present inside, orders a second present, and heads out to wait by the mailbox again. This goes on for several days until all his presents have arrived.
The second programmer sees this and thinks to himself, "How inefficient." He invites some friends around, and they each order one present online. He and his friends go outside together to wait by the mailbox. After 24 hours all the presents have arrived, and the programmer thanks his friends for saving him so much time.
The third programmer orders all his presents, puts on a movie, tidies the house, and makes some eggnog. The next day, when he has a break between chores, he goes outside to check his mailbox and finds that all his presents have arrived.
It's not the perfect analogy, but I hope it gets the idea across.
PopulateChapterList is like the first programmer -- ordering gifts one at a time, and waiting by the mailbox for each one to arrive.
In Main, we have the second programmer, waiting by the mailbox with his friends.
WebClient provides asynchronous methods for downloading files, so you can send off all your orders at once, and then wait for them all to arrive. If you want, you can even do other work in the meantime.
Eric Lippert has a series of blog posts about asynchrony in C#, if you're not familiar with it. I would also recommend reading Stephen Cleary's "There is no thread".
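As a rough illustration of the third programmer's approach, here is a minimal sketch, not a drop-in replacement for the program above. The class AsyncDownloadSketch, the method names, and the fetch parameter are all hypothetical; only WebClient.DownloadDataTaskAsync and Task.WhenAll are real framework calls. It assumes each download gets its own WebClient, since a single instance does not support concurrent operations.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Threading.Tasks;

internal static class AsyncDownloadSketch
{
    // Starts every download before awaiting any of them ("order all the presents"),
    // then waits for the whole batch. 'fetch' is a hypothetical hook so the actual
    // download call can be swapped out.
    public static async Task<byte[][]> DownloadAllAsync(
        IEnumerable<string> urls, Func<string, Task<byte[]>> fetch)
    {
        // ToList() forces the lazy Select to run, so every request is started here.
        List<Task<byte[]>> pending = urls.Select(fetch).ToList();

        // No thread sits blocked while the requests are outstanding.
        // The results come back in the same order as the input URLs.
        return await Task.WhenAll(pending);
    }

    // One WebClient per request, because a single instance
    // does not support concurrent I/O operations.
    public static async Task<byte[]> FetchWithWebClient(string url)
    {
        using (var client = new WebClient())
        {
            return await client.DownloadDataTaskAsync(url);
        }
    }
}
```

Calling AsyncDownloadSketch.DownloadAllAsync(imageUrls, AsyncDownloadSketch.FetchWithWebClient) from an async method and awaiting the result would fetch every image concurrently. Passing the download function in as a parameter is only a design choice to keep the batching logic separate from WebClient; for a large number of images you would probably also want to cap how many requests are in flight at once.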
Context
StackExchange Code Review Q#113958, answer score: 12