patternpythonMinor
MySQL-to-PostgreSQL migration script
Viewed 0 times
postgresqlmysqlmigrationscript
Problem
I'm working on a Python script to migrate a MySQL database into a PostgreSQL database with a different Schema (different table structures, different datatypes and so on).
I'm a sysadmin and unfortunately I don't code very often. So I'm having some doubts about this initial programming phase.
I begin with the tables that are easy to be migrate (almost the same structure), but very soon I will have to transfer tables that need more operations to be converted for compatability.
My code actually looks like this:
```
#!/usr/bin/python
# Script Name: database-migration_msql-psql.py
# Description: Migrate mysql database a2
# into postgresql database a3.
# Created By: phphil.
# Date: 7 Oct 2015.
#
# ------------------------------
# Import standard libraries |
# ------------------------------
#
import os
import sys
import mysql.connector
import psycopg2
from pprint import pprint
import MySQLdb
# ------------------------------
# Import internal snippets |
# ------------------------------
#
from include.db_config import *
#from include.MySQLCursorDict import *
# ------------------------------
# Open database connections |
# ------------------------------
#
# Mysql connection
try:
cnx_msql = mysql.connector.connect( host=host_mysql, user=user_mysql, passwd=pswd_mysql, db=dbna_mysql )
except mysql.connector.Error as e:
print "MYSQL: Unable to connect!", e.msg
sys.exit(1)
# Postgresql connection
try:
cnx_psql = psycopg2.connect(conn_string_psql)
except psycopg2.Error as e:
print('PSQL: Unable to connect!\n{0}').format(e)
sys.exit(1)
# Cursors initializations
cur_msql = cnx_msql.cursor(dictionary=True)
cur_psql = cnx_psql.cursor()
# ---------------------------------
# A2.msql-table1 > A3.psql-table1 |
# ---------------------------------
#
cur_msql.execute("SELECT field1, field2, field3, field4, field5 FROM msql-table1")
for row in cur_msql:
### trasformation/ conversion of mysql data OR in other cases type casting
if r
I'm a sysadmin and unfortunately I don't code very often. So I'm having some doubts about this initial programming phase.
I begin with the tables that are easy to be migrate (almost the same structure), but very soon I will have to transfer tables that need more operations to be converted for compatability.
My code actually looks like this:
```
#!/usr/bin/python
# Script Name: database-migration_msql-psql.py
# Description: Migrate mysql database a2
# into postgresql database a3.
# Created By: phphil.
# Date: 7 Oct 2015.
#
# ------------------------------
# Import standard libraries |
# ------------------------------
#
import os
import sys
import mysql.connector
import psycopg2
from pprint import pprint
import MySQLdb
# ------------------------------
# Import internal snippets |
# ------------------------------
#
from include.db_config import *
#from include.MySQLCursorDict import *
# ------------------------------
# Open database connections |
# ------------------------------
#
# Mysql connection
try:
cnx_msql = mysql.connector.connect( host=host_mysql, user=user_mysql, passwd=pswd_mysql, db=dbna_mysql )
except mysql.connector.Error as e:
print "MYSQL: Unable to connect!", e.msg
sys.exit(1)
# Postgresql connection
try:
cnx_psql = psycopg2.connect(conn_string_psql)
except psycopg2.Error as e:
print('PSQL: Unable to connect!\n{0}').format(e)
sys.exit(1)
# Cursors initializations
cur_msql = cnx_msql.cursor(dictionary=True)
cur_psql = cnx_psql.cursor()
# ---------------------------------
# A2.msql-table1 > A3.psql-table1 |
# ---------------------------------
#
cur_msql.execute("SELECT field1, field2, field3, field4, field5 FROM msql-table1")
for row in cur_msql:
### trasformation/ conversion of mysql data OR in other cases type casting
if r
Solution
This code is fine. As far as scripts go, I'm assuming this is something that is run once and then forgotten about. But of course, in the interest of code review...
Don't Repeat Yourself
You have seven blocks of code that look something like:
That's just crying out for a function:
That way, you just have 7 calls to
If statements
In Python, unlike some functional languages, not every statement needs to evaluate to something. So in this check:
The
Of course, this looks like it breaks the don't repeat yourself idea of the function - but we can simply move this if statement into the SQL query:
No
Efficiency
Running one
Don't Repeat Yourself
You have seven blocks of code that look something like:
cur_msql.execute("SELECT ...")
for row in cur_msql:
try:
cur_psql.execute("INSERT INTO ...", row)
except psycopg2.Error as e:
print "cannot execute that query!!", e.pgerror
sys.exit(...)That's just crying out for a function:
def select_and_insert(select_query, insert_query):
cur_msql.execute(select_query)
try:
for row in cur_msql:
cur_psql.execute(insert_query, row)
except psycopg2.Error as e:
print 'failed to execute query', e.pgerror
sys.exit(...)That way, you just have 7 calls to
select_and_insert, and all you're writing is the various queries without having to repeat all the extra logic.If statements
In Python, unlike some functional languages, not every statement needs to evaluate to something. So in this check:
if row['user_id'] == 0:
row['user_id'] = row['group_id']
else:
passThe
else: pass is unnecessary and redundant. You could simply have done the row check.Of course, this looks like it breaks the don't repeat yourself idea of the function - but we can simply move this if statement into the SQL query:
select ..., case when user_id == 0 then group_id else user_id end as user_id, ...No
if necessary. Efficiency
Running one
insert statement per row is very inefficient, since you can run one insert statement for all the rows. Now that we've refactored this, we only have to change it in one function. Check out this answer for how we might go about doing such a thing and what the performance implication is. We're talking order of magnitude.Code Snippets
cur_msql.execute("SELECT ...")
for row in cur_msql:
try:
cur_psql.execute("INSERT INTO ...", row)
except psycopg2.Error as e:
print "cannot execute that query!!", e.pgerror
sys.exit(...)def select_and_insert(select_query, insert_query):
cur_msql.execute(select_query)
try:
for row in cur_msql:
cur_psql.execute(insert_query, row)
except psycopg2.Error as e:
print 'failed to execute query', e.pgerror
sys.exit(...)if row['user_id'] == 0:
row['user_id'] = row['group_id']
else:
passselect ..., case when user_id == 0 then group_id else user_id end as user_id, ...Context
StackExchange Code Review Q#107289, answer score: 8
Revisions (0)
No revisions yet.