patterncsharpMinor
Getting information of countries out of a website that isn't using consistent verbiage
Viewed 0 times
countrieswebsiteconsistentverbiagegettingthatusingisnoutinformation
Problem
From this website I needed to grab the information for each country and insert it into an Excel spreadsheet.
My original plan was to use my program and search each website for the text and later parse that text, but I ran out of time. So what I ended up doing was copying and pasting the information into a text box. With a few characters I was able to separate the countries without too much trouble.
Here is an example of two countries:
>>>Ethiopia
Overview
Some overview information.
Children Available: more information.
Parent Qualifications: more information.
Travel: more information.
Timeline: more information. >>Ghana
Overview
Some overview information.
Children Available: more information.
Parent Qualifications : more information. //Note the space between
Qualifications and the ':'
Travel: more information.
Timeline: more information. >>' > Copy > Paste > add '
-
How would I be able to get that information?
-
Would I need to find every case and then use the
-
Lastly, am I getting OOP right?
My original plan was to use my program and search each website for the text and later parse that text, but I ran out of time. So what I ended up doing was copying and pasting the information into a text box. With a few characters I was able to separate the countries without too much trouble.
Here is an example of two countries:
>>>Ethiopia
Overview
Some overview information.
Children Available: more information.
Parent Qualifications: more information.
Travel: more information.
Timeline: more information. >>Ghana
Overview
Some overview information.
Children Available: more information.
Parent Qualifications : more information. //Note the space between
Qualifications and the ':'
Travel: more information.
Timeline: more information. >>' > Copy > Paste > add '
-
How would I be able to get that information?
-
Would I need to find every case and then use the
IndexFinder() with every case?-
Lastly, am I getting OOP right?
Solution
Form1
Please, please, please rename this class. There is nothing more uterlessly insignificant than a class named
Good to hear. But this line is dead code, and the comment is pure noise; code in a code base shouldn't be its own history - there's source control for that. Delete dead code, without mercy.
This is by no means obligatory, but c# conventions place the scope-opening curly brace on the next line. I like that you are consistent about your brace style though, so kudos for that!
Naming conventions also state that methods should be
Reading the code file top to bottom is a little off-putting, since this
XML comments are awesome... on public members (to document the API, as XML comments are picked up by IntelliSense). In private methods, they're more or less useless. And this particular comment isn't very useful at all, as it states what the code does - good comments should say why, not what. I'd remove it altogether.
Again the comments are just noise here, they say nothing that the code doesn't already say.
Event handlers shouldn't implement logic; there shouldn't be more than a handful of lines of code in there, delegating the work to more specialized methods.
Well that's surprising. I'd have expected
Actually... why do we even need a
That said, having both a
I'll get back to the parsers. Now back to the crowded event handler.
Don't do that. Instead, place a breakpoint, and *inspect
Actually, that seems like a pretty good reason to write a number of unit tests for your
The method then creates an Excel instance, creates a new workbook (I don't think you need to specify a missing value here), gets the first worksheet, and then proceeds to populate it with the list of countries.
Divide and conquer - move that code elsewhere, it doesn't belong in a button click event handler! Each bullet point above should be the body of a distinct method.
Once you've got a bunch of small methods that do very few things, look at what you've got, and consider moving them to specialized classes - the Single Responsibility Principle applies at that level, too. Good luck!
Country
The
```
public string Continent, CountryName, Overview, ParentQualifications, ChildrenProfiles, Tra
Please, please, please rename this class. There is nothing more uterlessly insignificant than a class named
Class1, an event named Event, a method named Method1, a button named Button1, ...or a form named Form1. Use meaningful names.public List Countries = new List(); //This never ended up working for me.Good to hear. But this line is dead code, and the comment is pure noise; code in a code base shouldn't be its own history - there's source control for that. Delete dead code, without mercy.
private void releaseObject(object obj) {This is by no means obligatory, but c# conventions place the scope-opening curly brace on the next line. I like that you are consistent about your brace style though, so kudos for that!
Naming conventions also state that methods should be
PascalCase - and the code you're showing us is not consistent about that - sometimes it's camelCase, sometimes it's PascalCase - I double-checked the tags to make sure I wasn't answering a java question!Reading the code file top to bottom is a little off-putting, since this
releaseObject function just pops out of nowhere and so one comes across the function and mentally scans the next couple of lines underneath to try and find a usage. I'd put private methods like this below where they're used, for a better reading flow.///
/// This is where we start everything. We take the text from the textbox and move forward.
///
private void parse_Click(object sender, EventArgs e) {XML comments are awesome... on public members (to document the API, as XML comments are picked up by IntelliSense). In private methods, they're more or less useless. And this particular comment isn't very useful at all, as it states what the code does - good comments should say why, not what. I'd remove it altogether.
parse_Click sounds like you've got a Button named parse - "parse" is a verb, which makes it a good name for a method... but not for an object. I'd rename it ParseButton, and rename the handler OnParseButtonClick - give that Ctrl+R, R shortcut some lovin'!Again the comments are just noise here, they say nothing that the code doesn't already say.
Event handlers shouldn't implement logic; there shouldn't be more than a handful of lines of code in there, delegating the work to more specialized methods.
Parser parser = new Parser(textBox1.Text);//We create a new parser object and pass it the text from the textbox.
parser.parseDocument();Well that's surprising. I'd have expected
parseDocument() to return something, and the line to read something like this:var result = parser.Parse();Actually... why do we even need a
Parser instance? And what is it parsing anyway? "Parser" is as vague as it gets. Perhaps CountryParser could be clearer, with a static IEnumerable Parse(string text) signature... but from scrolling further down your code I see that there's also a CountryParser class! That's a little bit confusing.That said, having both a
Parser and a CountryParser class is also confusing because at a glance, there seems to be an inheritance relationship between the two: it sounds as if CountryParser would be derived from Parser... yet it's not the case.I'll get back to the parsers. Now back to the crowded event handler.
foreach (var item in parser.CountryTexts) {//This was so I could see we actually had items to deal with.
Console.WriteLine(item);
}Don't do that. Instead, place a breakpoint, and *inspect
parser.CountryTexts while debugging, by hovering the identifier. Visual Studio has pretty good debugger tools, use them - don't mix test/debug code with production code!Actually, that seems like a pretty good reason to write a number of unit tests for your
Parser class, so that you can pass it various inputs and see what tests pass and what tests fail, and why.The method then creates an Excel instance, creates a new workbook (I don't think you need to specify a missing value here), gets the first worksheet, and then proceeds to populate it with the list of countries.
- Getting an Excel instance is one thing.
- Creating a new workbook is another.
- Populating a worksheet with some values, is yet another thing.
- Formatting the worksheet to your liking is another thing.
- [Getting a filename, ] Saving & closing a workbook is something else.
- Quitting & releasing the Excel instance is another thing.
Divide and conquer - move that code elsewhere, it doesn't belong in a button click event handler! Each bullet point above should be the body of a distinct method.
Once you've got a bunch of small methods that do very few things, look at what you've got, and consider moving them to specialized classes - the Single Responsibility Principle applies at that level, too. Good luck!
Country
The
Country class, as written, is a BIG HUGE NO-NO!```
public string Continent, CountryName, Overview, ParentQualifications, ChildrenProfiles, Tra
Code Snippets
public List<Country> Countries = new List<Country>(); //This never ended up working for me.private void releaseObject(object obj) {/// <summary>
/// This is where we start everything. We take the text from the textbox and move forward.
/// </summary>
private void parse_Click(object sender, EventArgs e) {Parser parser = new Parser(textBox1.Text);//We create a new parser object and pass it the text from the textbox.
parser.parseDocument();var result = parser.Parse();Context
StackExchange Code Review Q#68098, answer score: 7
Revisions (0)
No revisions yet.