HiveBrain v1.2.0

Efficient Pandas to MySQL "UPDATE... WHERE"

Submitted by: @import:stackexchange-codereview
Tags: update, pandas, where, efficient, mysql

Problem

I have a pandas DataFrame and a (MySQL) database with the same columns. The database is not managed by me.

I want to update the values in the database in an "UPDATE... WHERE" style, updating only some columns wherever some other columns match.

Here's my code:

import sqlalchemy as sqla
from sqlalchemy.orm import sessionmaker

def save_to_db(final_df, passwd):
    engine_str = 'mysql+mysqldb://username:{}@localhost/mydb'.format(passwd)
    engine = sqla.create_engine(engine_str)

    sm = sessionmaker(bind=engine)
    session = sm()

    # Reflect the existing table; the schema is not managed here.
    metadata = sqla.MetaData(bind=engine)
    datatable = sqla.Table('AdcsLogForProduct', metadata, autoload=True)

    # One UPDATE statement per DataFrame row.
    for ind, row in final_df.iterrows():
        u = sqla.sql.update(datatable) \
            .values({"q_ECI_B_x": row.q_ECI_B_x,
                     "q_ECI_B_y": row.q_ECI_B_y,
                     "q_ECI_B_z": row.q_ECI_B_z,
                     "q_ECI_B_s": row.q_ECI_B_s}) \
            .where(sqla.and_(datatable.c.year == row.year,
                             datatable.c.month == row.month,
                             datatable.c.day == row.day,
                             datatable.c.hours == row.hours,
                             datatable.c.minutes == row.minutes,
                             datatable.c.seconds == row.seconds,
                             datatable.c.milliseconds == row.milliseconds,
                             datatable.c.microseconds == row.microseconds))
        session.execute(u)

    session.flush()
    session.commit()


I'm doing this with plain SQLAlchemy because pandas' built-in SQL support (DataFrame.to_sql) can only insert, append, or replace rows; it cannot express "UPDATE... WHERE". However, this is really slow.

Isn't there a more efficient way to do this?

Solution

You have eight conditions to match for every UPDATE. A typical solution would store timestamps using a DATETIME or TIMESTAMP column, so that there is only one value to match.
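On the pandas side, the eight component columns can be collapsed into one datetime column before writing the updates. A minimal sketch, assuming a new column named ts (the name is hypothetical; the real table would need a matching DATETIME column): pd.to_datetime accepts a DataFrame of component columns, and the plural names used in the question (hours, minutes, ..., microseconds) are all recognized unit keywords.

```python
import pandas as pd

# Hypothetical frame with the same split time columns as the table.
df = pd.DataFrame({
    "year": [2015], "month": [3], "day": [14],
    "hours": [9], "minutes": [26], "seconds": [53],
    "milliseconds": [589], "microseconds": [793],
})

# Collapse the eight components into a single datetime64 column.
df["ts"] = pd.to_datetime(df[["year", "month", "day", "hours",
                              "minutes", "seconds",
                              "milliseconds", "microseconds"]])
```

Each UPDATE then needs only a single equality (e.g. datatable.c.ts == row.ts) in its WHERE clause instead of an eight-way AND.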

For reasonable performance, ensure that the timestamp field is indexed.
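Independent of the schema fix, the per-row session.execute loop can also be collapsed into a single executemany round trip by passing a list of parameter dictionaries. A minimal sketch, using an in-memory SQLite table as a stand-in for the MySQL one (table and column names follow the question; the ts column is an assumption):

```python
import pandas as pd
import sqlalchemy as sqla

# Stand-in for 'mysql+mysqldb://username:...@localhost/mydb'.
engine = sqla.create_engine("sqlite://")

with engine.begin() as conn:
    conn.execute(sqla.text(
        "CREATE TABLE AdcsLogForProduct "
        "(ts DATETIME PRIMARY KEY, q_ECI_B_x FLOAT)"))
    conn.execute(sqla.text(
        "INSERT INTO AdcsLogForProduct VALUES ('2015-03-14 09:26:53', 0.0)"))

final_df = pd.DataFrame({"ts": ["2015-03-14 09:26:53"],
                         "q_ECI_B_x": [0.25]})

# Passing a list of dicts makes SQLAlchemy use DBAPI executemany:
# one round trip for all rows instead of one UPDATE per row.
with engine.begin() as conn:
    conn.execute(
        sqla.text("UPDATE AdcsLogForProduct "
                  "SET q_ECI_B_x = :q_ECI_B_x WHERE ts = :ts"),
        final_df.to_dict("records"),
    )
```

With ts as the primary key (or otherwise indexed), each UPDATE is a single indexed lookup rather than a scan over eight unindexed columns.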

Context

StackExchange Code Review Q#81795, answer score: 7

Revisions (0)

No revisions yet.