patternpythonMinor
Fetching settings for batches of wafers using loops
Viewed 0 times
settingswafersloopsbatchesforusingfetching
Problem
I have this code below that needs to query multiple database tables.
I know there are a lot of things to improve like:
But I want to focus first on the performance of this code. It works fine, but its taking at least 5 to 10 seconds minimum before it can finish running the loop.
In what way can I improve the performance of the code above?
import mysql.connector
import itertools, time
cnx = mysql.connector.connect(host="host", port=3306, user="user", passwd="pass", db="factory")
cursor = cnx.cursor()
cursor.execute("select distinct(batch) from batchdata where equipment = 'DUV001'")
batch = cursor.fetchall()
cursor.execute("select distinct(settingname) from runsettings where runnumber = 1 and equipment = 'DUV001'")
settings = cursor.fetchall()
start = time.time()
settings = list(itertools.chain(*settings))
batches = list(itertools.chain(*batch))
lstoflst = []
for batch in batches:
cursor.execute("select distinct(wafer) from batchdata where batch=%s and equipment = 'DUV001'",(batch,))
wafers = cursor.fetchall()
wafers = list(itertools.chain(*wafers))
for wafer in wafers:
waflist = {}
for i in settings:
cursor.execute("select float8value from runsettings where settingname = %s and wafer = %s",(i,wafer))
vals = cursor.fetchall()
waflist[i] = vals[0][0]
lstoflst.append(waflist)
print time.time() - startI know there are a lot of things to improve like:
- Error handling
- Database connection
- prepared statements on the query
But I want to focus first on the performance of this code. It works fine, but its taking at least 5 to 10 seconds minimum before it can finish running the loop.
In what way can I improve the performance of the code above?
Solution
Your program reminds me of a Java applet that I wrote during my first summer internship. I spent a significant chunk of that summer writing code to query several database tables and match values from one table with values from another table. The performance sucked, but it was tolerable. Then the applet was deployed to an overseas office — and it took minutes to display one screen.
The lesson is, any time you execute queries in a loop, and the number of those queries depends on how much data you have, you're going to have a bad time. The overhead of issuing a query and retrieving the results is significant. What you should do instead is issue one query (or a small fixed number of queries) that gets all of the information you want.
I later learned that all of those troubles could have been avoided by writing a proper SQL query in the first place, namely a join query.
Here, it looks like you are interested in constructing
The lesson is, any time you execute queries in a loop, and the number of those queries depends on how much data you have, you're going to have a bad time. The overhead of issuing a query and retrieving the results is significant. What you should do instead is issue one query (or a small fixed number of queries) that gets all of the information you want.
I later learned that all of those troubles could have been avoided by writing a proper SQL query in the first place, namely a join query.
Here, it looks like you are interested in constructing
lstoflst, which contains the runsettings used for run 1 of each wafer that was in a batch that went through equipment DUV001. The program should look more or less like this:cnx = …
cursor = cnx.cursor()
cursor.execute("""
SELECT batchdata.wafer
, runsettings.settingname
, runsettings.float8value
FROM runsettings
INNER JOIN batchdata
ON batchdata.equipment = runsettings.equipment
AND batchdata.wafer = runsettings.wafer
WHERE
runsettings.equipment = 'DUV001' AND
runsettings.runnumber = 1
ORDER BY batchdata.wafer;
""")
wafer_groups = itertools.groupby(cursor.fetchall(), key=operator.itemgetter(0))
lstoflst = [dict(row[1:] for row in group[1]) for group in wafer_groups]Code Snippets
cnx = …
cursor = cnx.cursor()
cursor.execute("""
SELECT batchdata.wafer
, runsettings.settingname
, runsettings.float8value
FROM runsettings
INNER JOIN batchdata
ON batchdata.equipment = runsettings.equipment
AND batchdata.wafer = runsettings.wafer
WHERE
runsettings.equipment = 'DUV001' AND
runsettings.runnumber = 1
ORDER BY batchdata.wafer;
""")
wafer_groups = itertools.groupby(cursor.fetchall(), key=operator.itemgetter(0))
lstoflst = [dict(row[1:] for row in group[1]) for group in wafer_groups]Context
StackExchange Code Review Q#111680, answer score: 4
Revisions (0)
No revisions yet.