HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

MySQL-to-PostgreSQL migration script

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
postgresqlmysqlmigrationscript

Problem

I'm working on a Python script to migrate a MySQL database into a PostgreSQL database with a different Schema (different table structures, different datatypes and so on).

I'm a sysadmin and unfortunately I don't code very often. So I'm having some doubts about this initial programming phase.

I begin with the tables that are easy to be migrate (almost the same structure), but very soon I will have to transfer tables that need more operations to be converted for compatability.

My code actually looks like this:

```
#!/usr/bin/python

# Script Name: database-migration_msql-psql.py
# Description: Migrate mysql database a2
# into postgresql database a3.
# Created By: phphil.
# Date: 7 Oct 2015.
#

# ------------------------------
# Import standard libraries |
# ------------------------------
#
import os
import sys
import mysql.connector
import psycopg2
from pprint import pprint
import MySQLdb

# ------------------------------
# Import internal snippets |
# ------------------------------
#
from include.db_config import *
#from include.MySQLCursorDict import *

# ------------------------------
# Open database connections |
# ------------------------------
#
# Mysql connection
try:
cnx_msql = mysql.connector.connect( host=host_mysql, user=user_mysql, passwd=pswd_mysql, db=dbna_mysql )
except mysql.connector.Error as e:
print "MYSQL: Unable to connect!", e.msg
sys.exit(1)

# Postgresql connection
try:
cnx_psql = psycopg2.connect(conn_string_psql)
except psycopg2.Error as e:
print('PSQL: Unable to connect!\n{0}').format(e)
sys.exit(1)

# Cursors initializations
cur_msql = cnx_msql.cursor(dictionary=True)
cur_psql = cnx_psql.cursor()

# ---------------------------------
# A2.msql-table1 > A3.psql-table1 |
# ---------------------------------
#
cur_msql.execute("SELECT field1, field2, field3, field4, field5 FROM msql-table1")

for row in cur_msql:

### trasformation/ conversion of mysql data OR in other cases type casting
if r

Solution

This code is fine. As far as scripts go, I'm assuming this is something that is run once and then forgotten about. But of course, in the interest of code review...

Don't Repeat Yourself

You have seven blocks of code that look something like:

cur_msql.execute("SELECT ...")

for row in cur_msql:
  try:
    cur_psql.execute("INSERT INTO ...", row)
  except psycopg2.Error as e:
    print "cannot execute that query!!", e.pgerror
    sys.exit(...)


That's just crying out for a function:

def select_and_insert(select_query, insert_query):
    cur_msql.execute(select_query)

    try:
        for row in cur_msql:
            cur_psql.execute(insert_query, row)
        except psycopg2.Error as e:
            print 'failed to execute query', e.pgerror
            sys.exit(...)


That way, you just have 7 calls to select_and_insert, and all you're writing is the various queries without having to repeat all the extra logic.

If statements

In Python, unlike some functional languages, not every statement needs to evaluate to something. So in this check:

if row['user_id'] == 0:
  row['user_id'] = row['group_id']
else:
  pass


The else: pass is unnecessary and redundant. You could simply have done the row check.

Of course, this looks like it breaks the don't repeat yourself idea of the function - but we can simply move this if statement into the SQL query:

select ..., case when user_id == 0 then group_id else user_id end as user_id, ...


No if necessary.

Efficiency

Running one insert statement per row is very inefficient, since you can run one insert statement for all the rows. Now that we've refactored this, we only have to change it in one function. Check out this answer for how we might go about doing such a thing and what the performance implication is. We're talking order of magnitude.

Code Snippets

cur_msql.execute("SELECT ...")

for row in cur_msql:
  try:
    cur_psql.execute("INSERT INTO ...", row)
  except psycopg2.Error as e:
    print "cannot execute that query!!", e.pgerror
    sys.exit(...)
def select_and_insert(select_query, insert_query):
    cur_msql.execute(select_query)

    try:
        for row in cur_msql:
            cur_psql.execute(insert_query, row)
        except psycopg2.Error as e:
            print 'failed to execute query', e.pgerror
            sys.exit(...)
if row['user_id'] == 0:
  row['user_id'] = row['group_id']
else:
  pass
select ..., case when user_id == 0 then group_id else user_id end as user_id, ...

Context

StackExchange Code Review Q#107289, answer score: 8

Revisions (0)

No revisions yet.