Extracting text fields from <span> tags in an HTML message
Problem
What I'm doing
I have a string containing HTML like this:
<p><span class="fieldText" fieldId="field-4">Some text</span> this is a test</p>
My goal in the method is to create a dictionary containing this entry:
| key | value |
|---|---|
| field-4 | Some text |

This is the code that I'm using to accomplish my task:
public static Dictionary<int, string> getFields(String mensaje)
{
Dictionary<int, string> fields = new Dictionary<int, string>();
Match m = Regex.Match(mensaje, @"^(.*?(.*?).*?)+$", RegexOptions.Singleline);
for (int i = 0; i .*?)+$", RegexOptions.Singleline);
String fieldId = m2.Groups[2].Captures[0].Value;
fieldId = fieldId.Replace("field-", String.Empty);
fields.Add(int.Parse(fieldId),m.Groups[2].Captures[i].Value);
}
return fields;
}
How can I improve my code?
Solution
I know this is Code Review, not Rewrite My Code, but I would suggest using a third-party HTML parser (such as the Html Agility Pack) instead of regular expressions, if that's an option.
I realize you're doing very trivial parsing here, but in my experience regular expressions become unmaintainable faster than almost anything else in software development.
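For comparison, the question's two nested `Regex.Match` calls could be collapsed into a single `Regex.Matches` pass with named groups. The sketch below is a hypothetical illustration (the `SpanFieldRegex` class and the exact pattern are assumptions, not the asker's code). Even in this tidier form it only works while every field uses a double-quoted `fieldId="field-N"` attribute, contains no nested `<span>`, and has no `>` inside an attribute value, which is the kind of hidden assumption that makes regex-based HTML parsing hard to maintain.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SpanFieldRegex
{
    // Assumed input shape: <span ... fieldId="field-N" ...>text</span>
    // - the fieldId attribute may appear anywhere inside the opening tag
    // - its value must be double-quoted and start with "field-"
    // - no '>' inside attribute values and no nested <span> elements
    private static readonly Regex FieldSpan = new Regex(
        @"<span\b[^>]*\bfieldId=""field-(?<id>\d+)""[^>]*>(?<text>.*?)</span>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static Dictionary<int, string> GetFields(string message)
    {
        var fields = new Dictionary<int, string>();

        // One pass over the whole message: each match is one field span.
        foreach (Match match in FieldSpan.Matches(message))
        {
            int id = int.Parse(match.Groups["id"].Value);

            // Add throws ArgumentException on a duplicate id, like the question's code.
            fields.Add(id, match.Groups["text"].Value);
        }

        return fields;
    }
}
```

Called with the answer's two-paragraph sample string, `SpanFieldRegex.GetFields` returns 4 → "Some text" and 5 → "Some more text".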
If you were to use an HTML parser, you could do something like this:
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

string htmlToParse = "<p><span class=\"fieldText\" fieldId=\"field-4\">Some text</span> this is a test</p><p><span class=\"fieldText\" fieldId=\"field-5\">Some more text</span> this is another test</p>";
const string ElementToParse = "span";
const string IdField = "FieldId";
HtmlDocument htmlDocument = new HtmlDocument();
htmlDocument.LoadHtml(htmlToParse);
int fieldId = default( int );
Dictionary<int,string> fieldValuesTable =
    (
        from
            htmlNode in htmlDocument.DocumentNode.DescendantNodes()
        where
            htmlNode.Name.Equals( ElementToParse, StringComparison.InvariantCultureIgnoreCase )
            &&
            htmlNode.Attributes.Contains( IdField )
        let
            id = htmlNode.Attributes[ IdField ].Value
        where
            Int32.TryParse( id.Substring( id.IndexOf( "-" ) + 1 ), out fieldId ) // this is still not ideal
        select
            new { Id = fieldId, Text = htmlNode.InnerText }
    ).ToDictionary( f => f.Id, f => f.Text );
You get the output:
4 : Some text
5 : Some more text
IMHO, it's much cleaner and more maintainable.
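If pulling in a third-party package isn't an option and the input is guaranteed to be well-formed XML (XHTML), a similar query can be sketched with LINQ to XML (`System.Xml.Linq`) from the base class library. This swaps in a different technique from the Html Agility Pack used above: `XElement.Parse` rejects ordinary HTML such as unclosed `<br>` tags or entities like `&nbsp;`. The sketch also moves the `field-` prefix handling into a hypothetical `ParseFieldId` helper, so the query does not need a captured `out` variable; the `SpanFieldXml` name is likewise an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class SpanFieldXml
{
    private const string IdPrefix = "field-";

    // Hypothetical helper: turns "field-4" into 4 and returns null for
    // a missing attribute, a different prefix, or a non-numeric suffix.
    private static int? ParseFieldId(string value)
    {
        if (value == null || !value.StartsWith(IdPrefix, StringComparison.Ordinal))
            return null;

        int id;
        return int.TryParse(value.Substring(IdPrefix.Length), out id) ? id : (int?)null;
    }

    // Assumption: the markup is well-formed XML. XElement.Parse needs a single
    // root element, so the fragment is wrapped in a dummy <root> element.
    public static Dictionary<int, string> GetFields(string html)
    {
        XElement root = XElement.Parse("<root>" + html + "</root>");

        return root
            .Descendants("span")
            // The explicit string conversion yields null when the attribute is absent.
            .Select(span => new { Id = ParseFieldId((string)span.Attribute("fieldId")), Text = span.Value })
            .Where(f => f.Id.HasValue)
            // ToDictionary throws if two spans share the same id.
            .ToDictionary(f => f.Id.Value, f => f.Text);
    }
}
```

Called with the same `htmlToParse` string, `SpanFieldXml.GetFields` returns the same two entries, 4 → "Some text" and 5 → "Some more text". Note that XML attribute names are case-sensitive, so the lookup uses `fieldId` exactly as it appears in the markup.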
Context
StackExchange Code Review Q#3547, answer score: 9