
Manipulate HTML document loaded into WebBrowser control

Problem

I have developed my own solution for this. As it turned out, the first solution uses XPath queries, and the second, conceptually similar to the first, uses CSS queries processed by sizzle.js.

Here is the sample code for the second solution:

```
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Windows.Forms;

namespace myTest.WinFormsApp
{
    public partial class MainFormForSizzleTesting : Form
    {
        public MainFormForSizzleTesting()
        {
            InitializeComponent();
        }

        private void MainForm_Load(object sender, EventArgs e)
        {
            webBrowser1.DocumentText = @"




Product Details

Paperback: 648 pages
Publisher: Wiley; Unlimited Edition edition (October 15, 2001)
Language: English
ISBN-10: 0764547763

";
        }

        private void cmdTest_Click(object sender, EventArgs e)
        {
            var processor = new WebBrowserControlCSSQueriesProcessor(webBrowser1);

            // change attributes of the first element of the list
            {
                var li = processor.GetHtmlElement("li");
                li.innerHTML = string.Format("<span style='text-transform: uppercase;font-family:verdana;color:green;'>{0}</span>", li.innerText);
            }

            // change attributes of the elements with class = "test"
            var list = processor.GetHtmlElements("li.test");
            foreach (var li in list)
            {
                li.innerHTML = string.Format("<span style='text-transform: uppercase;font-family:verdana;color:blue;'>{0}</span>", li.innerText);
            }
        }

        /// <summary>
        /// Enables IE WebBrowser control to evaluate CSS queries
        /// by injecting sizzle.js (http://cdnjs.cloudflare.com/ajax/libs/sizzle/1.9.1/sizzle.min.js)
        /// and to return CSS queries results to the calling C# code as strongly typed
        /// mshtml.IHTMLElement and IEnumerable&lt;mshtml.IHTMLElement&gt;
        /// </summary>
        // …
```

Solution

```
private void cmdTest_Click(object sender, EventArgs e)
```


This suggests your button is called cmdTest. Why? Hungarian notation is generally considered bad practice, and even if you do use it, why cmd for a button? I think a better name for that button would be TestButton.

```
WebBrowserControlCSSQueriesProcessor
```


That name is way too long. Why not shorten it to something like WebBrowserCssProcessor?

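```
li.innerHTML = string.Format("<span style='text-transform: uppercase;font-family:verdana;color:green;'>{0}</span>", li.innerText);
li.innerHTML = string.Format("<span style='text-transform: uppercase;font-family:verdana;color:blue;'>{0}</span>", li.innerText);
```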


These two lines are almost identical, so consider extracting them into a method:

```
private static void ChangeStyle(mshtml.IHTMLElement element, string color)
{
    element.innerHTML = string.Format(
        "<span style='text-transform: uppercase;font-family:verdana;color:{1};'>{0}</span>",
        element.innerText, color);
}
```


And use it like this:

```
ChangeStyle(li, "green");
ChangeStyle(li, "blue");
```
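ChangeStyle still needs a live mshtml element, so its markup can't be checked without a browser. One option is to move the string building into a pure helper that ChangeStyle calls. The sketch below is an assumption, not part of the original code: the class name StyledSpan is hypothetical, and it additionally HTML-encodes the text with System.Net.WebUtility.HtmlEncode, because innerText is being written back into innerHTML and a character such as < in the text would otherwise be parsed as markup.

```csharp
using System.Net;

// Hypothetical helper: builds the same markup as ChangeStyle, but without
// touching mshtml, so it can be tested on its own.
public static class StyledSpan
{
    public static string Build(string text, string color)
    {
        // Encode the text because the result is assigned to innerHTML;
        // otherwise characters such as '<' or '&' would be treated as markup.
        return string.Format(
            "<span style='text-transform: uppercase;font-family:verdana;color:{1};'>{0}</span>",
            WebUtility.HtmlEncode(text), color);
    }
}
```

With such a helper, the body of ChangeStyle reduces to element.innerHTML = StyledSpan.Build(element.innerText, color);.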


```
/// Enables IE WebBrowser control to evaluate CSS queries
/// by injecting sizzle.js (http://cdnjs.cloudflare.com/ajax/libs/sizzle/1.9.1/sizzle.min.js)
```


Why link to the specific version of the file here? If I need to know the version, I can look at the source code. I would either give no link at all (and assume people can google sizzle.js) or link to the project's main page.

```
System.Windows.Forms.WebBrowser
```


There is no need to spell out the whole namespace every time when you already have using System.Windows.Forms at the top of your file.

The same applies to the mshtml namespace: you should add a using directive for it.

```
HtmlElement scriptEl = _webBrowser.Document.CreateElement("script");
mshtml.IHTMLScriptElement element = (mshtml.IHTMLScriptElement)scriptEl.DomElement;
element.src = "http://cdnjs.cloudflare.com/ajax/libs/sizzle/1.9.1/sizzle.min.js";

…

scriptEl = _webBrowser.Document.CreateElement("script");
element = (mshtml.IHTMLScriptElement)scriptEl.DomElement;
element.text = javaScriptText;
```


I don't have much experience with WebBrowser or mshtml, but why are you using mshtml here in the first place? Why not just use the HtmlElement directly:

```
scriptEl.SetAttribute("src", "http://cdnjs.cloudflare.com/ajax/libs/sizzle/1.9.1/sizzle.min.js");

scriptEl.InnerText = javaScriptText;
```


Also, reusing variables (scriptEl and element) like this is not great. You should use different variables here (e.g. sizzleScriptElement and functionScriptElement).

```
_webBrowser.Document.InvokeScript("eval", new object[] { code }) as mshtml.IHTMLElement
_webBrowser.Document.InvokeScript("eval", new object[] { string.Format("GetElementsByCSSQuery('{0}')", cssQuery) })
```


This is repeated code again, so extract it into a method as well:

```
public T Eval<T>(string code)
{
    return (T)_webBrowser.Document.InvokeScript("eval", new object[] { code });
}
```


Notice that I used a cast and not as. That's because when the result has the wrong type, a cast immediately gives you a clear InvalidCastException, while as silently returns null and you get a confusing NullReferenceException later.
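The difference is easy to reproduce without a browser. In the minimal sketch below, a hypothetical Func&lt;string, object&gt; stands in for _webBrowser.Document.InvokeScript (an assumption for illustration only; FakeScriptHost and EvalAs are made-up names), and the "script" returns a string where the caller expects some other type.

```csharp
using System;

// Hypothetical stand-in for the browser: "evaluates" code and returns an object.
// In the real class this would be _webBrowser.Document.InvokeScript("eval", ...).
public class FakeScriptHost
{
    private readonly Func<string, object> _invoke;

    public FakeScriptHost(Func<string, object> invoke)
    {
        _invoke = invoke;
    }

    // Same shape as the Eval helper suggested above: a direct cast,
    // so a wrong result type fails right here with InvalidCastException.
    public T Eval<T>(string code)
    {
        return (T)_invoke(code);
    }

    // The 'as' variant for comparison: a wrong result type silently
    // becomes null, and the failure only surfaces when it is dereferenced.
    public T EvalAs<T>(string code) where T : class
    {
        return _invoke(code) as T;
    }
}
```

With the cast, the exception points at the Eval call itself; with as, it surfaces wherever the null result is first used, for example at li.innerHTML.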

```
public IEnumerable<mshtml.IHTMLElement> GetHtmlElements(string cssQuery)
{
    // Thanks to: http://stackoverflow.com/questions/5278275/accessing-properties-of-javascript-objects-using-type-dynamic-in-c-sharp-4
    var comObject = _webBrowser.Document.InvokeScript("eval", new object[] { string.Format("GetElementsByCSSQuery('{0}')", cssQuery) });
    Type type = comObject.GetType();
    int length = (int)type.InvokeMember("length", BindingFlags.GetProperty, null, comObject, null);

    for (int i = 1; i <= length; i++)
    {
        yield return type.InvokeMember(i.ToString(), BindingFlags.GetProperty, null, comObject, null) as mshtml.IHTMLElement;
    }
}
```


From the linked question, it seems that accessing length through dynamic works, provided you do it from JS first. I would go that way, since it lets you avoid writing all that reflection code.


Context

StackExchange Code Review Q#60426, answer score: 3
