Regex to parse semicolon-delimited fields is too slow

Submitted by: @import:stackexchange-codereview

Problem

I have a file with just 3500 lines like these:

filecontent= "13P397;Fotostuff;t;IBM;IBM lalala 123|IBM lalala 1234;28.000 things;;IBMlalala123|IBMlalala1234"


Then I want to grab every line from filecontent that matches a certain string (with Python 2.7):

import re

this_item = "IBMlalala123"
matchingitems = re.findall(".*?;.*?;.*?;.*?;.*?;.*?;.*?" + this_item, filecontent)


Each findall takes 17 seconds, and I need to run 4000 searches over these 3500 lines, so it takes forever. Any ideas on how to speed it up?

Solution

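.*?;.*? will cause catastrophic backtracking. The lazy .*? can also match semicolons, so on a line that does not contain the search string the engine tries a huge number of ways to distribute the text across the fields before giving up.

To resolve the performance issue, replace each .*?; with [^;]*;, which should be much faster, because [^;]* can never run past a field boundary.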
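A minimal sketch of that substitution, assuming each record sits on its own line as in the question. The names `build_pattern` and `leading_fields` are hypothetical, and the `^` anchor with `re.MULTILINE`, the `\n` exclusion in the character class, and the `re.escape` call are additions for illustration rather than part of the original answer. The second sample line is made up to show a non-match.

```python
import re

# First line is the sample from the question; the second is an invented non-matching record.
filecontent = (
    "13P397;Fotostuff;t;IBM;IBM lalala 123|IBM lalala 1234;28.000 things;;IBMlalala123|IBMlalala1234\n"
    "99X000;Other;f;ACME;ACME foo;1.000 things;;ACMEfoo\n"
)
this_item = "IBMlalala123"


def build_pattern(item, leading_fields=6):
    """Compile a pattern that skips `leading_fields` fields, then looks for `item`.

    [^;\n]*; consumes exactly one field and its delimiter, so the engine has
    no alternative ways to split the line when a match fails.
    """
    field = r"[^;\n]*;"
    return re.compile(
        "^" + field * leading_fields + ".*?" + re.escape(item),
        re.MULTILINE,  # assumption: one record per line, so ^ anchors each record
    )


pattern = build_pattern(this_item)
matchingitems = pattern.findall(filecontent)
print(matchingitems)
```

Compiling the pattern once per search string and reusing it across all lines keeps the per-search cost down. Like the original expression, the match covers the line only up to the end of the search string.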

Context

StackExchange Code Review Q#32449, answer score: 35
