HiveBrain v1.2.0
Get Started
← Back to all entries
patterncsharpMinor

Loading and converting data from CSV

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
csvloadingandconvertingfromdata

Problem

My application imports a CSV file and parses each row, turning it into an object in my code.

The speed for a 500,000 row x 68 column takes 2+ minutes to read. I've narrowed down the bottleneck to the part where the strings are being converted into their respective types using the converter I created. I need some pointers on how I can optimize my code.

This is the code I use to load:

```
public static void Load(object target, string[] fields, bool supressErrors)
{
Type targetType = target.GetType();
PropertyInfo[] properties = targetType.GetProperties();

foreach (PropertyInfo property in properties)
{
object[] attributes = property.GetCustomAttributes(typeof(CSVPositionAttribute), false);

if (attributes.Length > 0)
{
CSVPositionAttribute positionAttr = (CSVPositionAttribute)attributes[0];

int position = positionAttr.Position;
try
{
object data = fields[position];

if (positionAttr.DataTransform != string.Empty)
{
MethodInfo method = targetType.GetMethod(positionAttr.DataTransform);
data = method.Invoke(target, new object[] { data });
}
var prop = property.PropertyType;

//bottleneck here:
if (prop == typeof(string))
{
property.SetValue(target, CheckAndConvert.CheckString(data), null);
}
else if (prop == typeof(int))
{
property.SetValue(target, CheckAndConvert.CheckInt(data.ToString()), null);
}
else if (prop == typeof(double))
{
property.SetValue(target, CheckAndConvert.CheckDouble(data.ToString()), null);
}
else if (prop == typeof(long))
{
property.SetValue(target, CheckAndConvert.CheckLon

Solution

There's a few ways you could improve your CheckAndConvert class and functions.

Design Suggestions

  • Fluency - It's not a requirement by any means but fluency helps readability. CheckAndConvert.CheckInt is almost a sentence, perhaps something shorter Parse.Int or Parse.Bool.



  • Naming Simplicity - On the lines of the previous suggestion, your class/function names imply that it's doing more than simply parsing a given value. You're checking if the parse succeeds but then returning a default value if it fails. The user of this function needs not to know about the check part at all. And the word check is repeated (class and function name). Programmers don't typically like redundancy.



  • Naming Accuracy - CheckInt and CheckDouble they check each other if their primary data type TryParse fails. In the case of CheckDouble I don't think it's ever going to be true for the second if-statement. In the case of CheckInt since int.TryParse can fail on decimals that you want to be integers I would just always use double.TryParse with a cast. (Remember: Cast truncates and Convert.ToInt32 rounds).



I would be surprised if any of the above suggestions other than maybe the removal of int.TryParse would actually improve performance but it would still make for some cleaner simpler code.

Performance

If you've narrowed down that the parsing of values into their respective properties is the time consumer research alternative ways to parse numbers (they're ugly but claim they're faster) additionally this might be helpful for possibly speeding up your SetValue calls (be warned it's a long read).

Link Information

  • The parse numbers link decribes parsing the string yourself as .Parse uses CultureInfo and stuff that can slows it down.



  • The SetValue link describes research into using delegates for method calls in reflection and shows some promising performance boosts that I've used myself.

Context

StackExchange Code Review Q#109057, answer score: 4

Revisions (0)

No revisions yet.