Regex to parse semicolon-delimited fields is too slow
Problem
I have a file with just 3500 lines like these:
    filecontent= "13P397;Fotostuff;t;IBM;IBM lalala 123|IBM lalala 1234;28.000 things;;IBMlalala123|IBMlalala1234"

Then I want to grab every line from the filecontent that matches a certain string (with python 2.7):

    this_item= "IBMlalala123"
    matchingitems = re.findall(".*?;.*?;.*?;.*?;.*?;.*?;.*?"+this_item,filecontent)

It needs 17 seconds for each findall. I need to search 4000 times in these 3500 lines. It takes forever. Any idea how to speed it up?

Solution
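.*?;.*? will cause catastrophic backtracking. Because . can also match a semicolon, each time the trailing literal fails to match, the engine retries many different ways of dividing the line among the seven lazy groups before giving up.

To resolve the performance issues, replace each .*?; with [^;]*; which should be much faster, since [^;]* can never run past a field boundary.

Context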
StackExchange Code Review Q#32449, answer score: 35
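For illustration, here is a minimal sketch of the replacement suggested in the Solution, applied to the sample line from the question. The second input line is a made-up non-matching record, and the use of re.compile, re.escape, the ^ anchor with re.MULTILINE, and the \n inside the character class are assumptions added here to keep each match within one line; they are not part of the original answer.

```python
import re

# First line is the sample from the question; the second is a hypothetical
# non-matching record added only for this sketch.
filecontent = (
    "13P397;Fotostuff;t;IBM;IBM lalala 123|IBM lalala 1234;28.000 things;;"
    "IBMlalala123|IBMlalala1234\n"
    "13P398;Fotostuff;t;HP;HP lalala 1|HP lalala 2;5 things;;HPlalala1|HPlalala2"
)
this_item = "IBMlalala123"

# Seven fields of "anything except a semicolon" followed by ';', then the item.
# [^;\n]* cannot cross a field (or line) boundary, so there is only one way to
# consume the first seven fields and a failed match is rejected quickly.
# Assumptions beyond the answer: ^ with re.MULTILINE, \n in the class, re.escape.
pattern = re.compile(r"^(?:[^;\n]*;){7}" + re.escape(this_item), re.MULTILINE)

matchingitems = pattern.findall(filecontent)
print(matchingitems)
```

Compiling the pattern once per search term avoids re-parsing it, and anchoring at the start of the line stops the engine from retrying the match at every character offset of a line that cannot match. Like the original call, this returns the matched text up to and including the search term rather than the full line.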