Manipulate HTML document loaded into WebBrowser control
Problem
I have developed two custom solutions for this. The first uses XPath queries; the second, conceptually similar to the first, uses CSS queries processed by sizzle.js.
Here is the sample code for the second solution:
```
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Windows.Forms;

namespace myTest.WinFormsApp
{
    public partial class MainFormForSizzleTesting : Form
    {
        public MainFormForSizzleTesting()
        {
            InitializeComponent();
        }

        private void MainForm_Load(object sender, EventArgs e)
        {
            webBrowser1.DocumentText = @"
Product Details
Paperback: 648 pages
Publisher: Wiley; Unlimited Edition edition (October 15, 2001)
Language: English
ISBN-10: 0764547763
";
        }

        private void cmdTest_Click(object sender, EventArgs e)
        {
            var processor = new WebBrowserControlCSSQueriesProcessor(webBrowser1);

            // change attributes of the first element of the list
            {
                var li = processor.GetHtmlElement("li");
                li.innerHTML = string.Format("<span style='text-transform: uppercase;font-family:verdana;color:green;'>{0}</span>", li.innerText);
            }

            // change attributes of the elements with class = "test"
            var list = processor.GetHtmlElements("li.test");
            foreach (var li in list)
            {
                li.innerHTML = string.Format("<span style='text-transform: uppercase;font-family:verdana;color:blue;'>{0}</span>", li.innerText);
            }
        }

        /// <summary>
        /// Enables IE WebBrowser control to evaluate CSS queries
        /// by injecting sizzle.js (http://cdnjs.cloudflare.com/ajax/libs/sizzle/1.9.1/sizzle.min.js)
        /// and to return CSS queries results to the calling C# code as strongly typed
        /// mshtml.IHTMLElement and IEnumerable<mshtml.IHTMLElement>
        /// </summary>
```
Solution
    private void cmdTest_Click(object sender, EventArgs e)

This looks like your button is called `cmdTest`, why? Hungarian notation is generally considered a bad thing and even then, why `cmd` for a button? I think a good name for that button would be `TestButton`.

    WebBrowserControlCSSQueriesProcessor

That name is way too long, why not shorten it to something like `WebBrowserCssProcessor`?

    li.innerHTML = string.Format("<span style='text-transform: uppercase;font-family:verdana;color:green;'>{0}</span>", li.innerText);
    li.innerHTML = string.Format("<span style='text-transform: uppercase;font-family:verdana;color:blue;'>{0}</span>", li.innerText);

These two lines are almost the same, consider extracting them into a method:
    private static void ChangeStyle(mshtml.IHTMLElement element, string color)
    {
        element.innerHTML = string.Format(
            "<span style='text-transform: uppercase;font-family:verdana;color:{1};'>{0}</span>",
            element.innerText, color);
    }

And use it like this:
    ChangeStyle(li, "green");
    ChangeStyle(li, "blue");

    /// Enables IE WebBrowser control to evaluate CSS queries
    /// by injecting sizzle.js (http://cdnjs.cloudflare.com/ajax/libs/sizzle/1.9.1/sizzle.min.js)

Why link the included version of the file here? If I need to know that, I can look at the source code. I would either give no link here (and assume people can google sizzle.js) or link to the main page.
    System.Windows.Forms.WebBrowser

No need to spell out the whole namespace every time, when you have `using System.Windows.Forms` at the top of your file.

The same applies to the `mshtml` namespace: you should put that into a `using`.

    HtmlElement scriptEl = _webBrowser.Document.CreateElement("script");
    mshtml.IHTMLScriptElement element = (mshtml.IHTMLScriptElement)scriptEl.DomElement;
    element.src = "http://cdnjs.cloudflare.com/ajax/libs/sizzle/1.9.1/sizzle.min.js";
    …
    scriptEl = _webBrowser.Document.CreateElement("script");
    element = (mshtml.IHTMLScriptElement)scriptEl.DomElement;
    element.text = javaScriptText;

I don't have much experience with `WebBrowser` or `mshtml`, but why are you using `mshtml` here in the first place? Why not just use the `HtmlElement` directly:

    scriptEl.SetAttribute("src", "http://cdnjs.cloudflare.com/ajax/libs/sizzle/1.9.1/sizzle.min.js");
    scriptEl.InnerText = javaScriptText;

Also, reusing variables (`scriptEl` and `element`) like this is not great. You should use different variables here (e.g. `sizzleScriptElement` and `functionScriptElement`).

    _webBrowser.Document.InvokeScript("eval", new object[] { code }) as mshtml.IHTMLElement
    _webBrowser.Document.InvokeScript("eval", new object[] { string.Format("GetElementsByCSSQuery('{0}')", cssQuery) })

Repeated code again, so extract it into a method again:
    public T Eval<T>(string code)
    {
        return (T)_webBrowser.Document.InvokeScript("eval", new object[] { code });
    }

Notice that I used a cast and not `as`. That's because when an error happens, a cast immediately gives you a clear `InvalidCastException`, while `as` gives you a confusing `NullReferenceException` later.

    public IEnumerable<mshtml.IHTMLElement> GetHtmlElements(string cssQuery)
    {
        // Thanks to: http://stackoverflow.com/questions/5278275/accessing-properties-of-javascript-objects-using-type-dynamic-in-c-sharp-4
        var comObject = _webBrowser.Document.InvokeScript("eval", new object[] { string.Format("GetElementsByCSSQuery('{0}')", cssQuery) });
        Type type = comObject.GetType();
        int length = (int)type.InvokeMember("length", BindingFlags.GetProperty, null, comObject, null);
        for (int i = 1; i <= length; i++)
        {
            yield return type.InvokeMember(i.ToString(), BindingFlags.GetProperty, null, comObject, null) as mshtml.IHTMLElement;
        }
    }

From the linked question, it seems that accessing `length` using `dynamic` works, if you do it from JS first. I would do that, since it means you avoid writing all that reflection code.

Code Snippets
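As a small illustration of the cast-versus-`as` remark in the answer, here is a minimal, self-contained sketch. It deliberately avoids `WebBrowser` and `mshtml`: `string` stands in for the expected element type, and the class and method names (`CastVersusAsDemo`, `AsYieldsNull`, `CastOrReport`) are hypothetical, not part of the reviewed code.

```csharp
using System;

public static class CastVersusAsDemo
{
    // Hypothetical helper: `as` quietly produces null when the runtime type
    // does not match, so the failure only surfaces when the result is used.
    public static bool AsYieldsNull(object value)
    {
        string viaAs = value as string;
        return viaAs == null;
    }

    // Hypothetical helper: an explicit cast fails right at the cast site
    // with an InvalidCastException when the runtime type does not match.
    public static string CastOrReport(object value)
    {
        try
        {
            string viaCast = (string)value;
            return "cast succeeded: " + viaCast;
        }
        catch (InvalidCastException ex)
        {
            return ex.GetType().Name;
        }
    }

    public static void Main()
    {
        object notAString = 42; // boxed int, not a string

        Console.WriteLine(AsYieldsNull(notAString)); // True
        Console.WriteLine(CastOrReport(notAString)); // InvalidCastException
    }
}
```

With `as`, the mismatch would only show up later as a `NullReferenceException` wherever the null result is first dereferenced, which is why the cast is easier to debug.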
    private void cmdTest_Click(object sender, EventArgs e)

    WebBrowserControlCSSQueriesProcessor

    li.innerHTML = string.Format("<span style='text-transform: uppercase;font-family:verdana;color:green;'>{0}</span>", li.innerText);
    li.innerHTML = string.Format("<span style='text-transform: uppercase;font-family:verdana;color:blue;'>{0}</span>", li.innerText);

    private static void ChangeStyle(mshtml.IHTMLElement element, string color)
    {
        element.innerHTML = string.Format(
            "<span style='text-transform: uppercase;font-family:verdana;color:{1};'>{0}</span>",
            element.innerText, color);
    }

    ChangeStyle(li, "green");
    ChangeStyle(li, "blue");

Context
StackExchange Code Review Q#60426, answer score: 3