patterncsharpMinor
Loading and converting data from CSV
Viewed 0 times
csvloadingandconvertingfromdata
Problem
My application imports a CSV file and parses each row, turning it into an object in my code.
The speed for a 500,000 row x 68 column takes 2+ minutes to read. I've narrowed down the bottleneck to the part where the strings are being converted into their respective types using the converter I created. I need some pointers on how I can optimize my code.
This is the code I use to load:
```
public static void Load(object target, string[] fields, bool supressErrors)
{
Type targetType = target.GetType();
PropertyInfo[] properties = targetType.GetProperties();
foreach (PropertyInfo property in properties)
{
object[] attributes = property.GetCustomAttributes(typeof(CSVPositionAttribute), false);
if (attributes.Length > 0)
{
CSVPositionAttribute positionAttr = (CSVPositionAttribute)attributes[0];
int position = positionAttr.Position;
try
{
object data = fields[position];
if (positionAttr.DataTransform != string.Empty)
{
MethodInfo method = targetType.GetMethod(positionAttr.DataTransform);
data = method.Invoke(target, new object[] { data });
}
var prop = property.PropertyType;
//bottleneck here:
if (prop == typeof(string))
{
property.SetValue(target, CheckAndConvert.CheckString(data), null);
}
else if (prop == typeof(int))
{
property.SetValue(target, CheckAndConvert.CheckInt(data.ToString()), null);
}
else if (prop == typeof(double))
{
property.SetValue(target, CheckAndConvert.CheckDouble(data.ToString()), null);
}
else if (prop == typeof(long))
{
property.SetValue(target, CheckAndConvert.CheckLon
The speed for a 500,000 row x 68 column takes 2+ minutes to read. I've narrowed down the bottleneck to the part where the strings are being converted into their respective types using the converter I created. I need some pointers on how I can optimize my code.
This is the code I use to load:
```
public static void Load(object target, string[] fields, bool supressErrors)
{
Type targetType = target.GetType();
PropertyInfo[] properties = targetType.GetProperties();
foreach (PropertyInfo property in properties)
{
object[] attributes = property.GetCustomAttributes(typeof(CSVPositionAttribute), false);
if (attributes.Length > 0)
{
CSVPositionAttribute positionAttr = (CSVPositionAttribute)attributes[0];
int position = positionAttr.Position;
try
{
object data = fields[position];
if (positionAttr.DataTransform != string.Empty)
{
MethodInfo method = targetType.GetMethod(positionAttr.DataTransform);
data = method.Invoke(target, new object[] { data });
}
var prop = property.PropertyType;
//bottleneck here:
if (prop == typeof(string))
{
property.SetValue(target, CheckAndConvert.CheckString(data), null);
}
else if (prop == typeof(int))
{
property.SetValue(target, CheckAndConvert.CheckInt(data.ToString()), null);
}
else if (prop == typeof(double))
{
property.SetValue(target, CheckAndConvert.CheckDouble(data.ToString()), null);
}
else if (prop == typeof(long))
{
property.SetValue(target, CheckAndConvert.CheckLon
Solution
There's a few ways you could improve your
Design Suggestions
I would be surprised if any of the above suggestions other than maybe the removal of
Performance
If you've narrowed down that the parsing of values into their respective properties is the time consumer research alternative ways to parse numbers (they're ugly but claim they're faster) additionally this might be helpful for possibly speeding up your SetValue calls (be warned it's a long read).
Link Information
CheckAndConvert class and functions. Design Suggestions
- Fluency - It's not a requirement by any means but fluency helps readability.
CheckAndConvert.CheckIntis almost a sentence, perhaps something shorterParse.IntorParse.Bool.
- Naming Simplicity - On the lines of the previous suggestion, your class/function names imply that it's doing more than simply parsing a given value. You're checking if the parse succeeds but then returning a default value if it fails. The user of this function needs not to know about the check part at all. And the word check is repeated (class and function name). Programmers don't typically like redundancy.
- Naming Accuracy -
CheckIntandCheckDoublethey check each other if their primary data type TryParse fails. In the case ofCheckDoubleI don't think it's ever going to be true for the second if-statement. In the case ofCheckIntsinceint.TryParsecan fail on decimals that you want to be integers I would just always usedouble.TryParsewith a cast. (Remember: Cast truncates and Convert.ToInt32 rounds).
I would be surprised if any of the above suggestions other than maybe the removal of
int.TryParse would actually improve performance but it would still make for some cleaner simpler code.Performance
If you've narrowed down that the parsing of values into their respective properties is the time consumer research alternative ways to parse numbers (they're ugly but claim they're faster) additionally this might be helpful for possibly speeding up your SetValue calls (be warned it's a long read).
Link Information
- The parse numbers link decribes parsing the string yourself as .Parse uses CultureInfo and stuff that can slows it down.
- The SetValue link describes research into using delegates for method calls in reflection and shows some promising performance boosts that I've used myself.
Context
StackExchange Code Review Q#109057, answer score: 4
Revisions (0)
No revisions yet.