Scraping HTML via async controller & classes + HTML agility pack
Problem
I've developed a simple application to grab golfers' handicap index scores from a website that has no API. The application works but is very slow: updating 6 users takes 60 seconds. I've tried to make my web requests asynchronous to offset some of this lag, but that only improved performance by about 15%.
Description of the code:
On my view I have an anchor tag that, when clicked, hides all the elements in the DOM and loads a preloader. After the preloader is appended to the DOM, an AJAX call is executed that calls the `UpdateHandicap` method on a controller in my project. From there we await a static method, `GrabIndexValue`. All the code works; it's just very slow.
Possible solution:
This website allows me to input multiple GHIN #s; however, the result set is in a table with strangely generated XPaths:

- `//*[@id="ctl00_bodyMP_gvLookupResults_ctl02_lblHI"]` returns an index of 10.5
- `//*[@id="ctl00_bodyMP_gvLookupResults_ctl03_lblHI"]` returns an index of 9
- `//*[@id="ctl00_bodyMP_gvLookupResults_ctl04_lblHI"]` returns an index of 13.5

I don't know how to grab those result sets dynamically and parse them properly, so I feel like I'm stuck doing one web request per index value.
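One way to avoid hard-coding the generated `ctl02`, `ctl03` (and so on) ids is to select every element whose id contains both `gvLookupResults` and `_lblHI`. The sketch below is a minimal, hypothetical helper (`HandicapParser.ParseIndexValues` is not part of the question's code) that assumes HTML Agility Pack is referenced and that the results page renders one such label per row, as the XPaths above suggest. It only extracts the index text in document order; matching each value back to a golfer would still need another cell from the same row, which the question doesn't show.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class HandicapParser
{
    // Hypothetical helper: extracts every handicap index label from one results page.
    // Assumes each row renders an element whose id looks like
    // "ctl00_bodyMP_gvLookupResults_ctlNN_lblHI".
    public static List<string> ParseIndexValues(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // XPath 1.0 has no ends-with(), so match two substrings of the id
        // instead of the generated ctl02, ctl03, ... row numbers.
        var nodes = doc.DocumentNode.SelectNodes(
            "//*[contains(@id, 'gvLookupResults') and contains(@id, '_lblHI')]");

        var values = new List<string>();
        if (nodes == null)
        {
            return values; // SelectNodes returns null when nothing matches
        }

        foreach (var node in nodes)
        {
            // Decode entities such as &nbsp; and trim surrounding whitespace.
            values.Add(HtmlEntity.DeEntitize(node.InnerText).Trim());
        }
        return values;
    }
}
```

For a page containing the three rows listed above, the helper would yield `"10.5"`, `"9"` and `"13.5"` in that order. It returns strings rather than numbers so the caller decides how to parse values that may not be plain decimals.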
Async controller method:
```
public async Task<ActionResult> UpdateHandicap()
{
    // Fetch all the golfers
    var results = db.Users.ToList();
    // Iterate through the golfers and update their index value based on their GHIN #. We store this
    // value in the database to make our handicap calculation
    foreach (Users user in results)
    {
        user.Index = await Calculations.GrabIndexValue(user.GHID);
        db.Entry(user).State = EntityState.Modified;
        db.SaveChanges();
    }
    return RedirectToAction("Index", "Users");
}
```

Method to actually grab the data:
```
public static async Task GrabIndexValue(int ghin)
{
    string url = $"http://xxxxxxxxx/Widgets/HandicapLookupResults.aspx?entry=1&ghinno={ghin}&css=default&dynamic=&small=0&mode=&tab=0";
    HtmlWeb w
```
Solution
```
public async Task<ActionResult> UpdateHandicap()
```

We add the `Async` suffix to methods that are marked with the `async` keyword. This is a naming convention that, as you'll see in a moment, Entity Framework follows too.

```
var results = db.Users.ToList();
```

You can use `db.Users` directly in the loop; you don't have to call `ToList` first.

```
foreach (Users user in results)
{
    user.Index = await Calculations.GrabIndexValue(user.GHID);
    db.Entry(user).State = EntityState.Modified;
    db.SaveChanges();
}
```

The `async` calls are incomplete. `SaveChanges` would still block, so you also want to use

```
await db.SaveChangesAsync();
```

but do you really need to call `SaveChanges` in a loop? This could be bad for performance. I think you should do it once, after the loop:

```
foreach (var user in db.Users)
{
    user.Index = await Calculations.GrabIndexValue(user.GHID);
    db.Entry(user).State = EntityState.Modified;
}
await db.SaveChangesAsync();
```
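Both the question's loop and the loop suggested in this answer wait for each lookup to finish before starting the next one, so the six requests still run one after another; that may explain why switching to `async` gave only a modest gain. The following is a minimal sketch, not taken from the answer, of issuing the lookups concurrently with `Task.WhenAll`. `ConcurrentLookup.FetchAllAsync` is a hypothetical helper, `fetch` stands in for something like `Calculations.GrabIndexValue` (assuming it returns a `Task<T>` of the index value), and `maxConcurrency` is an assumed cap so the remote site is not hit with every request at once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ConcurrentLookup
{
    // Hypothetical helper: starts one lookup per distinct GHIN and awaits them together,
    // instead of awaiting each request before starting the next.
    public static async Task<Dictionary<int, TResult>> FetchAllAsync<TResult>(
        IEnumerable<int> ghins,
        Func<int, Task<TResult>> fetch,
        int maxConcurrency = 4)
    {
        // Limits how many requests are in flight at the same time (assumed default of 4).
        using var gate = new SemaphoreSlim(maxConcurrency);

        async Task<KeyValuePair<int, TResult>> FetchOneAsync(int ghin)
        {
            await gate.WaitAsync();
            try
            {
                return new KeyValuePair<int, TResult>(ghin, await fetch(ghin));
            }
            finally
            {
                gate.Release();
            }
        }

        // Materialize the task list so every request is started before we wait.
        var tasks = ghins.Distinct().Select(g => FetchOneAsync(g)).ToList();
        var results = await Task.WhenAll(tasks);
        return results.ToDictionary(kv => kv.Key, kv => kv.Value);
    }
}
```

In the controller you would load the users first (a `DbContext` must not be used by several operations at once, so only the HTTP lookups run concurrently), pass their `GHID` values and the lookup method to `FetchAllAsync`, copy each result onto `user.Index`, and call `await db.SaveChangesAsync()` once at the end.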
Context
StackExchange Code Review Q#152442, answer score: 5