patternpythonMinor
Blocking production lines in Python
Viewed 0 times
pythonlinesblockingproduction
Problem
I have the following code that works great, but takes 40 mins to 1 hours to finish the task. It has at least 30 rules for blocking lines, and 30k+ orders to check. I'm new to Python (3 months working with it), so I don't know what can I change to make it run faster. The version in my computer is Python 2.7.9.
```
import codecs
import re
def find_word(text, search):
result = re.findall('\\b'+search+'\\b', text, flags=re.IGNORECASE)
if len(result)>0:
return True
else:
return False
with codecs.open('C:/inetpub/wwwroot/BloqueoLinea/DATA/FO.txt','r',encoding = 'utf-8') as FO:
finalorder = FO.readlines()
with codecs.open('C:/inetpub/wwwroot/BloqueoLinea/DATA/Table-Familias.txt','r',encoding = 'utf-8') as familias:
list_familias = familias.readlines()
fo = ''
with codecs.open('C:/inetpub/wwwroot/BloqueoLinea/DATA/data.txt','w',encoding = 'utf-8') as data:
for forder in finalorder:
order = forder.split(';')
orden = [order[0],order[1],order[15],order[24],order[2],'','']
for familia in list_familias:
fam = familia.split(';')
if order[2] in fam[0]:
orden[5] = fam[1]
orden[6] = fam[2]
fo = ';'.join(orden)
data.write(fo+'\n')
with codecs.open('C:/inetpub/wwwroot/BloqueoLinea/DATA/data.txt','r',encoding = 'utf-8') as cross:
cr = cross.readlines()
with codecs.open('C:/inetpub/wwwroot/BloqueoLinea/App_Data/BloqLines.txt','r',encoding = 'utf-8') as bloqueos:
block = bloqueos.readlines()
with codecs.open('C:/inetpub/wwwroot/BloqueoLinea/DATA/nodos.txt','r',encoding = 'utf-8') as node:
nodos = node.readlines()
with codecs.open('C:/inetpub/wwwroot/BloqueoLinea/DATA/Table-Lineas.txt','r',encoding = 'utf-8') as listlineas:
tlines = listlineas.readlines()
count = 0
fileblock = codecs.open('C:/inetpub/wwwroot/BloqueoLinea/DATA/Restricciones_Ruteo.txt','w',encoding = 'utf-8')
for forder in cr:
fo = forder.split(';')
for bloqueo in block:
check = bloqu
```
import codecs
import re
def find_word(text, search):
result = re.findall('\\b'+search+'\\b', text, flags=re.IGNORECASE)
if len(result)>0:
return True
else:
return False
with codecs.open('C:/inetpub/wwwroot/BloqueoLinea/DATA/FO.txt','r',encoding = 'utf-8') as FO:
finalorder = FO.readlines()
with codecs.open('C:/inetpub/wwwroot/BloqueoLinea/DATA/Table-Familias.txt','r',encoding = 'utf-8') as familias:
list_familias = familias.readlines()
fo = ''
with codecs.open('C:/inetpub/wwwroot/BloqueoLinea/DATA/data.txt','w',encoding = 'utf-8') as data:
for forder in finalorder:
order = forder.split(';')
orden = [order[0],order[1],order[15],order[24],order[2],'','']
for familia in list_familias:
fam = familia.split(';')
if order[2] in fam[0]:
orden[5] = fam[1]
orden[6] = fam[2]
fo = ';'.join(orden)
data.write(fo+'\n')
with codecs.open('C:/inetpub/wwwroot/BloqueoLinea/DATA/data.txt','r',encoding = 'utf-8') as cross:
cr = cross.readlines()
with codecs.open('C:/inetpub/wwwroot/BloqueoLinea/App_Data/BloqLines.txt','r',encoding = 'utf-8') as bloqueos:
block = bloqueos.readlines()
with codecs.open('C:/inetpub/wwwroot/BloqueoLinea/DATA/nodos.txt','r',encoding = 'utf-8') as node:
nodos = node.readlines()
with codecs.open('C:/inetpub/wwwroot/BloqueoLinea/DATA/Table-Lineas.txt','r',encoding = 'utf-8') as listlineas:
tlines = listlineas.readlines()
count = 0
fileblock = codecs.open('C:/inetpub/wwwroot/BloqueoLinea/DATA/Restricciones_Ruteo.txt','w',encoding = 'utf-8')
for forder in cr:
fo = forder.split(';')
for bloqueo in block:
check = bloqu
Solution
I'm going to give a frank assessment: I think that this code is effectively unmaintainable, and that you would be better of rewriting it from scratch, because you are using the wrong tool for the job. Essentially, you want to combine data from five tables to produce a query result. There are better ways to do that than writing 400 lines of
Option A: SQL
A lot of what you are doing is basically equivalent to a join in SQL. For example, this code…
… is just a textbook example of a LEFT OUTER JOIN operation:
(I don't know what all of your columns represent. I've used identifiers like
I'm not going to reverse-engineer all of your code, but I would be willing to bet that the entire program can be reduced to the
So how would you go about rewriting the code? If you already have a database at your company, such as sql-server or postgresql, then great, use that instead of your .txt files.
If you don't have a database handy, then just install sqlite, which is an SQL engine that doesn't require a database server. Import your .txt files to SQLite, either using the
Option B: PANDAS
If you prefer to avoid SQL and stick to Python and semicolon-delimited files, then there are libraries that can help you. In particular, pandas is designed to work with tabular data sets. The first page of code would look more like this:
Personally, I'd prefer to go with SQL than PANDAS for this task. Even if you aren't using a database now, I'm guessing that you will eventually wish that you had one, because you're already part way to building a database system yourself anyway.
Additional observations
There are other general issues with your program.
-
Each row of data is "stringly typed". That is, you're keeping each row as a semicolon-delimited string. As a result, you have
What you want to do instead is get it into 2D array form as soon as possible:
-
You're using the
For example, when you wrote
for loops and if blocks. This is a solved problem in computing, and there is no sense in reinventing the wheel.Option A: SQL
A lot of what you are doing is basically equivalent to a join in SQL. For example, this code…
fo = ''
with codecs.open('data.txt','w',encoding = 'utf-8') as data:
for forder in finalorder:
order = forder.split(';')
orden = [order[0],order[1],order[15],order[24],order[2],'','']
for familia in list_familias:
fam = familia.split(';')
if order[2] in fam[0]:
orden[5] = fam[1]
orden[6] = fam[2]
fo = ';'.join(orden)
data.write(fo+'\n')
with codecs.open('data.txt','r',encoding = 'utf-8') as cross:
cr = cross.readlines()… is just a textbook example of a LEFT OUTER JOIN operation:
CREATE VIEW CrossOrders AS
SELECT Order.col0 AS order_id
, Order.col1 AS item_rep
, Order.col15 AS est_com
, Order.col24
, Order.col2
, COALESCE(Familia.col1, '')
, COALESCE(Familia.col2, '')
FROM Order
LEFT OUTER JOIN Familia
ON Order.col2 = Familia.col0;(I don't know what all of your columns represent. I've used identifiers like
col24 as placeholders, but of course you should define your database tables with column names that make sense, like quantity or supplier).I'm not going to reverse-engineer all of your code, but I would be willing to bet that the entire program can be reduced to the
VIEW I defined above, plus one giant SQL query.So how would you go about rewriting the code? If you already have a database at your company, such as sql-server or postgresql, then great, use that instead of your .txt files.
If you don't have a database handy, then just install sqlite, which is an SQL engine that doesn't require a database server. Import your .txt files to SQLite, either using the
sqlite shell or Python code. Even if you have to re-create a new SQLite database from scratch with every batch run, import all of the CSV data, and throw away the database after the query, it's still worth it for the massive productivity gains. The resulting program would have a chance of being maintainable. It would likely be faster as well, as SQL engines are specifically optimized to do this kind of thing.Option B: PANDAS
If you prefer to avoid SQL and stick to Python and semicolon-delimited files, then there are libraries that can help you. In particular, pandas is designed to work with tabular data sets. The first page of code would look more like this:
import pandas as pd
finalorders = pd.read_csv('FO.txt', encoding='UTF-8', sep=';', header=None)
familias = pd.read_csv('Table-Familias.txt', encoding='UTF-8', sep=';', header=None)
orden = pd.merge(
left=finalorders, right=familias, how='left',
left_on=0, right_on=2
)[['0_x', '1_x', 15, 24, '2_x', '1_y', '2_y']]
normalize_caps = lambda s: s.lower().capitalize()
blocks = pd.read_csv('BloqLines.txt', encoding='UTF-8', sep=';', header=None, converters={5:normalize_caps, 6:normalize_caps})
tlines = pd.read_csv('Table-Lineas.txt', encoding='UTF-8', sep=';', header=None)Personally, I'd prefer to go with SQL than PANDAS for this task. Even if you aren't using a database now, I'm guessing that you will eventually wish that you had one, because you're already part way to building a database system yourself anyway.
Additional observations
There are other general issues with your program.
- Use more list comprehensions. Not only do they make your code more compact and readable, they also reduce the number of variables floating around in your code. (And you have a lot of similarly named variables like
node,nodos, andnod, which makes it confusing.)
-
Each row of data is "stringly typed". That is, you're keeping each row as a semicolon-delimited string. As a result, you have
line.split(';') littered all over the code, which is both inefficient and hard to follow.What you want to do instead is get it into 2D array form as soon as possible:
with codecs.open('Table-Familias.txt', 'r', encoding='utf-8') as familias:
list_familias = [line.rstrip().split(';') for line in familias]- As specified in PEP 8, indentation should be 4 spaces, not 3. Since whitespace is significant in Python, this is a pretty strong convention that you should follow.
-
You're using the
in keyword rather carelessly in many places.For example, when you wrote
if order[2] in fam[0]:, you really meant if order[2] == fam[0]:. Also, instead of if 'True' in check[0]:, you should write `if 'TrCode Snippets
fo = ''
with codecs.open('data.txt','w',encoding = 'utf-8') as data:
for forder in finalorder:
order = forder.split(';')
orden = [order[0],order[1],order[15],order[24],order[2],'','']
for familia in list_familias:
fam = familia.split(';')
if order[2] in fam[0]:
orden[5] = fam[1]
orden[6] = fam[2]
fo = ';'.join(orden)
data.write(fo+'\n')
with codecs.open('data.txt','r',encoding = 'utf-8') as cross:
cr = cross.readlines()CREATE VIEW CrossOrders AS
SELECT Order.col0 AS order_id
, Order.col1 AS item_rep
, Order.col15 AS est_com
, Order.col24
, Order.col2
, COALESCE(Familia.col1, '')
, COALESCE(Familia.col2, '')
FROM Order
LEFT OUTER JOIN Familia
ON Order.col2 = Familia.col0;import pandas as pd
finalorders = pd.read_csv('FO.txt', encoding='UTF-8', sep=';', header=None)
familias = pd.read_csv('Table-Familias.txt', encoding='UTF-8', sep=';', header=None)
orden = pd.merge(
left=finalorders, right=familias, how='left',
left_on=0, right_on=2
)[['0_x', '1_x', 15, 24, '2_x', '1_y', '2_y']]
normalize_caps = lambda s: s.lower().capitalize()
blocks = pd.read_csv('BloqLines.txt', encoding='UTF-8', sep=';', header=None, converters={5:normalize_caps, 6:normalize_caps})
tlines = pd.read_csv('Table-Lineas.txt', encoding='UTF-8', sep=';', header=None)with codecs.open('Table-Familias.txt', 'r', encoding='utf-8') as familias:
list_familias = [line.rstrip().split(';') for line in familias]Context
StackExchange Code Review Q#111747, answer score: 6
Revisions (0)
No revisions yet.