HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Build a MySQL upsert query to insert/update multiple records in Python

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
insertupdatequeryupsertrecordsmysqlpythonmultiplebuild

Problem

I'm generating a multi-row insert/update with the MySQLdb/MySQL-python module from lists of data rows and field names. For reference, see the executemany() example in the docs.

You can see some revised code on GitHub.

def upsert(db,table,fields,object_list):
    cursor = db.cursor();
    placeholders = ["%s" for x in fields]
    assignments = ["{x} = VALUES({x})".format(x=x) for x in fields]

    query_string = """INSERT INTO
    {table}
    ({fields})
    VALUES
    ({placeholders})
    ON DUPLICATE KEY UPDATE {assignments}"""

    cursor.executemany(query_string.format(
        table = table,
        fields = ", ".join(fields),
        placeholders = ", ".join(placeholders),
        assignments = ", ".join(assignments)
    ),object_list)
    db.commit()


Should I be quoting or escaping some strings/fields? The data is made safe by parameterized queries, but the table and field names could still cause trouble. I'm less concerned about security than failed queries due to odd characters.

Is there already a library out there for this? The existing solutions for upsert and batch-insert seem more generalized, and therefore slower, than implementing a MySQL-specific solution.

Also, I'm clearly in love with str.format()—is this the best way of handling string composition?

Here's some code to run the function, assuming a database named demo exists and mysql is listening on the localhost socket.

import MySQLdb
db = MySQLdb.connect(host="localhost", user="root", passwd="", db="demo", charset="utf8")
c = db.cursor()

c.execute("""DROP TABLE IF EXISTS upsert_demo""")

c.execute("""CREATE TABLE upsert_demo (
  `id` int(11) unsigned NOT NULL,
  `foo` varchar(100) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8""")

c.execute("""INSERT INTO upsert_demo
  (`id`,`foo`)
  VALUES (1, 'baz')""")

demo_fields = ("id","foo")
demo_objects = [(1,"upserted!"),(2,"new record")]

upsert(db,"upsert_demo",demo_fields,demo_objects)

Solution

Should I be quoting or escaping some strings/fields?

Yes, you should definitely escape table and field names. Currently this is trivial to exploit. It should not be possible to produce invalid SQL with any input parameters. If you get to that point there might not be any exploits left.


The existing solutions for upsert and batch-insert seem more generalized, and therefore slower, than implementing a MySQL-specific solution.

Which options are these? And have you tested them? I would be very surprised if there are no options available which would save you much more time overall than rolling your own.


Also, I'm clearly in love with str.format()—is this the best way of handling string composition?

str.format() is very nice for relatively simple format strings, but in this code it's hard to see what the actual result will be for some given input. I would pull out variables for the various .join()ed strings, but even then it's a big string to compose.

Some general things:

  • Use more descriptive variable names. database = MySQLdb.connect(… and cursor = database.cursor() are much more descriptive.



  • Use transactions by initialising your cursors using with database.cursor() as cursor:



  • You'll want to run your code through pep8 to improve readability.

Context

StackExchange Code Review Q#158227, answer score: 8

Revisions (0)

No revisions yet.