HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Fetching settings for batches of wafers using loops

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
settingswafersloopsbatchesforusingfetching

Problem

I have this code below that needs to query multiple database tables.

import mysql.connector
import itertools, time
cnx = mysql.connector.connect(host="host", port=3306, user="user", passwd="pass", db="factory")

cursor = cnx.cursor()

cursor.execute("select distinct(batch) from batchdata where equipment = 'DUV001'")
batch = cursor.fetchall()
cursor.execute("select distinct(settingname) from runsettings where runnumber = 1 and equipment = 'DUV001'")
settings = cursor.fetchall()

start =  time.time()

settings = list(itertools.chain(*settings))
batches = list(itertools.chain(*batch))

lstoflst = [] 
for batch in batches:
    cursor.execute("select distinct(wafer) from batchdata where batch=%s and equipment = 'DUV001'",(batch,))
    wafers = cursor.fetchall()
    wafers = list(itertools.chain(*wafers))

for wafer in wafers:
    waflist = {}
    for i in settings:
        cursor.execute("select float8value from runsettings where settingname = %s and wafer = %s",(i,wafer))
        vals = cursor.fetchall()
        waflist[i] = vals[0][0]

    lstoflst.append(waflist)

print time.time() - start


I know there are a lot of things to improve like:

  • Error handling



  • Database connection



  • prepared statements on the query



But I want to focus first on the performance of this code. It works fine, but its taking at least 5 to 10 seconds minimum before it can finish running the loop.

In what way can I improve the performance of the code above?

Solution

Your program reminds me of a Java applet that I wrote during my first summer internship. I spent a significant chunk of that summer writing code to query several database tables and match values from one table with values from another table. The performance sucked, but it was tolerable. Then the applet was deployed to an overseas office — and it took minutes to display one screen.

The lesson is, any time you execute queries in a loop, and the number of those queries depends on how much data you have, you're going to have a bad time. The overhead of issuing a query and retrieving the results is significant. What you should do instead is issue one query (or a small fixed number of queries) that gets all of the information you want.

I later learned that all of those troubles could have been avoided by writing a proper SQL query in the first place, namely a join query.

Here, it looks like you are interested in constructing lstoflst, which contains the runsettings used for run 1 of each wafer that was in a batch that went through equipment DUV001. The program should look more or less like this:

cnx = …
cursor = cnx.cursor()
cursor.execute("""
    SELECT batchdata.wafer
         , runsettings.settingname
         , runsettings.float8value
        FROM runsettings
            INNER JOIN batchdata
                ON batchdata.equipment = runsettings.equipment
                AND batchdata.wafer = runsettings.wafer
        WHERE
            runsettings.equipment = 'DUV001' AND
            runsettings.runnumber = 1
        ORDER BY batchdata.wafer;
""")
wafer_groups = itertools.groupby(cursor.fetchall(), key=operator.itemgetter(0))
lstoflst = [dict(row[1:] for row in group[1]) for group in wafer_groups]

Code Snippets

cnx = …
cursor = cnx.cursor()
cursor.execute("""
    SELECT batchdata.wafer
         , runsettings.settingname
         , runsettings.float8value
        FROM runsettings
            INNER JOIN batchdata
                ON batchdata.equipment = runsettings.equipment
                AND batchdata.wafer = runsettings.wafer
        WHERE
            runsettings.equipment = 'DUV001' AND
            runsettings.runnumber = 1
        ORDER BY batchdata.wafer;
""")
wafer_groups = itertools.groupby(cursor.fetchall(), key=operator.itemgetter(0))
lstoflst = [dict(row[1:] for row in group[1]) for group in wafer_groups]

Context

StackExchange Code Review Q#111680, answer score: 4

Revisions (0)

No revisions yet.