HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Blocking production lines in Python

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
pythonlinesblockingproduction

Problem

I have the following code that works great, but takes 40 mins to 1 hours to finish the task. It has at least 30 rules for blocking lines, and 30k+ orders to check. I'm new to Python (3 months working with it), so I don't know what can I change to make it run faster. The version in my computer is Python 2.7.9.

```
import codecs
import re

def find_word(text, search):
result = re.findall('\\b'+search+'\\b', text, flags=re.IGNORECASE)
if len(result)>0:
return True
else:
return False

with codecs.open('C:/inetpub/wwwroot/BloqueoLinea/DATA/FO.txt','r',encoding = 'utf-8') as FO:
finalorder = FO.readlines()
with codecs.open('C:/inetpub/wwwroot/BloqueoLinea/DATA/Table-Familias.txt','r',encoding = 'utf-8') as familias:
list_familias = familias.readlines()

fo = ''
with codecs.open('C:/inetpub/wwwroot/BloqueoLinea/DATA/data.txt','w',encoding = 'utf-8') as data:
for forder in finalorder:
order = forder.split(';')
orden = [order[0],order[1],order[15],order[24],order[2],'','']
for familia in list_familias:
fam = familia.split(';')
if order[2] in fam[0]:
orden[5] = fam[1]
orden[6] = fam[2]
fo = ';'.join(orden)
data.write(fo+'\n')
with codecs.open('C:/inetpub/wwwroot/BloqueoLinea/DATA/data.txt','r',encoding = 'utf-8') as cross:
cr = cross.readlines()
with codecs.open('C:/inetpub/wwwroot/BloqueoLinea/App_Data/BloqLines.txt','r',encoding = 'utf-8') as bloqueos:
block = bloqueos.readlines()
with codecs.open('C:/inetpub/wwwroot/BloqueoLinea/DATA/nodos.txt','r',encoding = 'utf-8') as node:
nodos = node.readlines()
with codecs.open('C:/inetpub/wwwroot/BloqueoLinea/DATA/Table-Lineas.txt','r',encoding = 'utf-8') as listlineas:
tlines = listlineas.readlines()
count = 0
fileblock = codecs.open('C:/inetpub/wwwroot/BloqueoLinea/DATA/Restricciones_Ruteo.txt','w',encoding = 'utf-8')
for forder in cr:
fo = forder.split(';')
for bloqueo in block:
check = bloqu

Solution

I'm going to give a frank assessment: I think that this code is effectively unmaintainable, and that you would be better of rewriting it from scratch, because you are using the wrong tool for the job. Essentially, you want to combine data from five tables to produce a query result. There are better ways to do that than writing 400 lines of for loops and if blocks. This is a solved problem in computing, and there is no sense in reinventing the wheel.

Option A: SQL

A lot of what you are doing is basically equivalent to a join in SQL. For example, this code…

fo = ''
with codecs.open('data.txt','w',encoding = 'utf-8') as data:
   for forder in finalorder:
      order = forder.split(';')
      orden = [order[0],order[1],order[15],order[24],order[2],'','']
      for familia in list_familias:
         fam = familia.split(';')
         if order[2] in fam[0]:
            orden[5] = fam[1]
            orden[6] = fam[2]
            fo = ';'.join(orden)
            data.write(fo+'\n')
with codecs.open('data.txt','r',encoding = 'utf-8') as cross:
   cr = cross.readlines()


… is just a textbook example of a LEFT OUTER JOIN operation:

CREATE VIEW CrossOrders AS
    SELECT Order.col0 AS order_id
         , Order.col1 AS item_rep
         , Order.col15 AS est_com
         , Order.col24
         , Order.col2
         , COALESCE(Familia.col1, '')
         , COALESCE(Familia.col2, '')
        FROM Order
            LEFT OUTER JOIN Familia
                ON Order.col2 = Familia.col0;


(I don't know what all of your columns represent. I've used identifiers like col24 as placeholders, but of course you should define your database tables with column names that make sense, like quantity or supplier).

I'm not going to reverse-engineer all of your code, but I would be willing to bet that the entire program can be reduced to the VIEW I defined above, plus one giant SQL query.

So how would you go about rewriting the code? If you already have a database at your company, such as sql-server or postgresql, then great, use that instead of your .txt files.

If you don't have a database handy, then just install sqlite, which is an SQL engine that doesn't require a database server. Import your .txt files to SQLite, either using the sqlite shell or Python code. Even if you have to re-create a new SQLite database from scratch with every batch run, import all of the CSV data, and throw away the database after the query, it's still worth it for the massive productivity gains. The resulting program would have a chance of being maintainable. It would likely be faster as well, as SQL engines are specifically optimized to do this kind of thing.

Option B: PANDAS

If you prefer to avoid SQL and stick to Python and semicolon-delimited files, then there are libraries that can help you. In particular, pandas is designed to work with tabular data sets. The first page of code would look more like this:

import pandas as pd

finalorders = pd.read_csv('FO.txt', encoding='UTF-8', sep=';', header=None)
familias = pd.read_csv('Table-Familias.txt', encoding='UTF-8', sep=';', header=None)
orden = pd.merge(
    left=finalorders, right=familias, how='left',
    left_on=0, right_on=2
)[['0_x', '1_x', 15, 24, '2_x', '1_y', '2_y']]

normalize_caps = lambda s: s.lower().capitalize()
blocks = pd.read_csv('BloqLines.txt', encoding='UTF-8', sep=';', header=None, converters={5:normalize_caps, 6:normalize_caps})
tlines = pd.read_csv('Table-Lineas.txt', encoding='UTF-8', sep=';', header=None)


Personally, I'd prefer to go with SQL than PANDAS for this task. Even if you aren't using a database now, I'm guessing that you will eventually wish that you had one, because you're already part way to building a database system yourself anyway.

Additional observations

There are other general issues with your program.

  • Use more list comprehensions. Not only do they make your code more compact and readable, they also reduce the number of variables floating around in your code. (And you have a lot of similarly named variables like node, nodos, and nod, which makes it confusing.)



-
Each row of data is "stringly typed". That is, you're keeping each row as a semicolon-delimited string. As a result, you have line.split(';') littered all over the code, which is both inefficient and hard to follow.

What you want to do instead is get it into 2D array form as soon as possible:

with codecs.open('Table-Familias.txt', 'r', encoding='utf-8') as familias:
    list_familias = [line.rstrip().split(';') for line in familias]


  • As specified in PEP 8, indentation should be 4 spaces, not 3. Since whitespace is significant in Python, this is a pretty strong convention that you should follow.



-
You're using the in keyword rather carelessly in many places.

For example, when you wrote if order[2] in fam[0]:, you really meant if order[2] == fam[0]:. Also, instead of if 'True' in check[0]:, you should write `if 'Tr

Code Snippets

fo = ''
with codecs.open('data.txt','w',encoding = 'utf-8') as data:
   for forder in finalorder:
      order = forder.split(';')
      orden = [order[0],order[1],order[15],order[24],order[2],'','']
      for familia in list_familias:
         fam = familia.split(';')
         if order[2] in fam[0]:
            orden[5] = fam[1]
            orden[6] = fam[2]
            fo = ';'.join(orden)
            data.write(fo+'\n')
with codecs.open('data.txt','r',encoding = 'utf-8') as cross:
   cr = cross.readlines()
CREATE VIEW CrossOrders AS
    SELECT Order.col0 AS order_id
         , Order.col1 AS item_rep
         , Order.col15 AS est_com
         , Order.col24
         , Order.col2
         , COALESCE(Familia.col1, '')
         , COALESCE(Familia.col2, '')
        FROM Order
            LEFT OUTER JOIN Familia
                ON Order.col2 = Familia.col0;
import pandas as pd

finalorders = pd.read_csv('FO.txt', encoding='UTF-8', sep=';', header=None)
familias = pd.read_csv('Table-Familias.txt', encoding='UTF-8', sep=';', header=None)
orden = pd.merge(
    left=finalorders, right=familias, how='left',
    left_on=0, right_on=2
)[['0_x', '1_x', 15, 24, '2_x', '1_y', '2_y']]

normalize_caps = lambda s: s.lower().capitalize()
blocks = pd.read_csv('BloqLines.txt', encoding='UTF-8', sep=';', header=None, converters={5:normalize_caps, 6:normalize_caps})
tlines = pd.read_csv('Table-Lineas.txt', encoding='UTF-8', sep=';', header=None)
with codecs.open('Table-Familias.txt', 'r', encoding='utf-8') as familias:
    list_familias = [line.rstrip().split(';') for line in familias]

Context

StackExchange Code Review Q#111747, answer score: 6

Revisions (0)

No revisions yet.