Manga ripper with performance and blocking issues

Submitted by: @import:stackexchange-codereview

Problem

I am writing a C# program that downloads every chapter of a manga from a given URL. HTML parsing is done with HtmlAgilityPack.

Three issues remain. The whole program blocks while it runs. GetPagesLink() is slow because it calls LoadHtmlCode(), which creates a new WebClient for every page of every chapter. Memory use also keeps climbing: it starts at about 14 MB and grows without bound. Apart from that, everything works.

```
using HtmlAgilityPack;
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Diagnostics;
using System.Drawing;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using System.Web;
using System.Windows.Forms;
using Treasure;

namespace MangaRipper
{
    public partial class Form1 : Form
    {
        #region Properties

        private Uri _Uri;

        public Uri Uri
        {
            get { return _Uri; }
            set { _Uri = value; }
        }

        private List> _Chapters = new List>();

        public List> Chapters
        {
            get { return _Chapters; }
            set { _Chapters = value; }
        }

        private string _MangaName;

        public string MangaName
        {
            get { return _MangaName; }
            set { _MangaName = value; }
        }

        #endregion Properties

        public Form1()
        {
            InitializeComponent();
        }

        private void exitToolStripMenuItem_Click(object sender, EventArgs e)
        {
            Application.Exit();
        }

        private string LoadHtmlCode(string url)
        {
            using (WebClient client = new WebClient())
            {
                try
                {
                    // Avoid too many connection requests at once to prevent website from blocking us
```

Solution

A large part of the problem is calling Thread.Sleep on the UI thread, which freezes the window for as long as the sleep lasts. For a non-blocking delay, mark the methods async and use await Task.Delay instead.

```
public async Task MethodThatRunsOnUIThread()
{
    //Do stuff
    //Wait
    await Task.Delay(150);
    //Do more stuff
}
```


If I were designing this, I would run the entire workload asynchronously. That takes a little more work, but you could start with something simple:

  • Disable part of the UI.
  • await a Task.Run on the entire workload.
  • Re-enable part of the UI to allow a second run.

Something like this:

```
private async void btnLoad_Click(object sender, EventArgs e)
{
    btnLoad.Enabled = false;
    string url = txtURL.Text;
    await Task.Run(() => DoLoad(url));
    btnLoad.Enabled = true;
}
```


Then, while the work runs in the background, you can marshal back to the UI thread with BeginInvoke to update the status.
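To make the BeginInvoke step concrete, here is a minimal sketch rather than a drop-in replacement. It assumes a label named lblStatus for the status text, and its DoLoad is a stand-in that only simulates the per-chapter work. Both are hypothetical, since the original form's designer code is not shown.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Windows.Forms;

public class StatusForm : Form
{
    // Hypothetical controls standing in for the ones on the original form.
    private readonly Button btnLoad = new Button { Text = "Load", Dock = DockStyle.Top };
    private readonly Label lblStatus = new Label { Dock = DockStyle.Bottom };

    public StatusForm()
    {
        Controls.Add(btnLoad);
        Controls.Add(lblStatus);
        btnLoad.Click += btnLoad_Click;
    }

    private async void btnLoad_Click(object sender, EventArgs e)
    {
        btnLoad.Enabled = false;
        try
        {
            // The workload runs on a thread-pool thread; the UI stays responsive.
            await Task.Run(() => DoLoad());
        }
        finally
        {
            // Re-enable the button even if the workload throws.
            btnLoad.Enabled = true;
        }
    }

    // Stand-in for the real workload. It runs off the UI thread,
    // so it must not touch controls directly.
    private void DoLoad()
    {
        for (int chapter = 1; chapter <= 3; chapter++)
        {
            Thread.Sleep(500); // simulated download; blocking is fine on a worker thread
            ReportStatus("Finished chapter " + chapter);
        }
    }

    private void ReportStatus(string message)
    {
        // BeginInvoke queues the delegate on the UI thread and returns immediately.
        BeginInvoke(new Action(() => lblStatus.Text = message));
    }
}

public static class Program
{
    [STAThread]
    public static void Main()
    {
        Application.Run(new StatusForm());
    }
}
```

The worker never reads or writes a control itself; every status update goes through ReportStatus, which keeps the cross-thread access in one place.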

I'm not sure why you're creating so many WebClient instances, sometimes without even using them:

```
using (WebClient client = new WebClient())
{
    string htmlCode = LoadHtmlCode(Uri.AbsoluteUri);
    LoadAllChapters(htmlCode);
    Download();
}
```


There's nothing stopping you from sharing a WebClient instance.
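One way to share the instance is sketched below. The PageLoader class and the URL are hypothetical names used only for illustration. The sketch also assumes pages are downloaded one at a time, because a single WebClient does not support concurrent operations.

```csharp
using System;
using System.Net;

// Hypothetical helper that owns one WebClient for the whole run.
public sealed class PageLoader : IDisposable
{
    // Created once and reused for every request instead of once per page.
    private readonly WebClient client = new WebClient();

    public string LoadHtmlCode(string url)
    {
        return client.DownloadString(url);
    }

    public void Dispose()
    {
        client.Dispose();
    }
}

public static class Program
{
    public static void Main()
    {
        // Placeholder URL; the real chapter and page links come from the parsed HTML.
        string[] pageUrls = { "https://example.com/" };

        using (var loader = new PageLoader())
        {
            foreach (string url in pageUrls)
            {
                try
                {
                    string html = loader.LoadHtmlCode(url);
                    Console.WriteLine(url + ": " + html.Length + " characters");
                }
                catch (WebException ex)
                {
                    Console.WriteLine(url + " failed: " + ex.Message);
                }
            }
        }
    }
}
```

If the downloads are later run in parallel, this single shared instance no longer fits and each concurrent worker would need its own client.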

Context

StackExchange Code Review Q#91099, answer score: 2