HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Replacing MySQL's AUTO_INCREMENT with Postgres' SERIAL

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
auto_incrementpostgreswithserialmysqlreplacing

Problem

I am making changes to a Python script that converts MySQL scripts to PostgreSQL, and I want to replace strings such as id INTEGER NOT NULL AUTO_INCREMENT with id SERIAL NOT NULL.

This is the code I got to work:

import re

line = 'id INTEGER(11) NOT NULL AUTO_INCREMENT,'

numeric_types = ['(BIG|MEDIUM|SMALL|TINY)*INT(EGER)*(\(.*?\))*',
'DEC(IMAL)*(\(.*?\))*', 'NUMERIC(\(.*\))*', 'FIXED(\(.*\))*', 
'FLOAT(\(.*\))*', 'DOUBLE( PRECISION)*(\(.*?\))*', 'REAL(\(.*\))*', 
'BIT', 'BOOL(EAN)*']

for i in range(len(numeric_types)):
    type = numeric_types[i]
    if (re.search(type, line)):
        line = re.sub(type, "SERIAL", line).replace(" AUTO_INCREMENT", "")
        print line
        break


Notes:

  • line will be a column from a CREATE TABLE statement inputted by the user



  • I could probably join all regular expressions into one using ORs, but I do not know if that would be a good practice or not

Solution

The MySQL syntax for a CREATE TABLE command is, in part:

column_definition:
    data_type [NOT NULL | NULL] [DEFAULT default_value]
      [AUTO_INCREMENT] [UNIQUE [KEY] | [PRIMARY] KEY]
      [COMMENT 'string']
      [COLUMN_FORMAT {FIXED|DYNAMIC|DEFAULT}]
      [STORAGE {DISK|MEMORY|DEFAULT}]
      [reference_definition]
  | data_type [GENERATED ALWAYS] AS (expression)
      [VIRTUAL | STORED] [UNIQUE [KEY]] [COMMENT comment]
      [NOT NULL | NULL] [[PRIMARY] KEY]

data_type:
    BIT[(length)]
  | TINYINT[(length)] [UNSIGNED] [ZEROFILL]
  | SMALLINT[(length)] [UNSIGNED] [ZEROFILL]
  | MEDIUMINT[(length)] [UNSIGNED] [ZEROFILL]
  | INT[(length)] [UNSIGNED] [ZEROFILL]
  | INTEGER[(length)] [UNSIGNED] [ZEROFILL]
  | BIGINT[(length)] [UNSIGNED] [ZEROFILL]
  | REAL[(length,decimals)] [UNSIGNED] [ZEROFILL]
  | DOUBLE[(length,decimals)] [UNSIGNED] [ZEROFILL]
  | FLOAT[(length,decimals)] [UNSIGNED] [ZEROFILL]
  | DECIMAL[(length[,decimals])] [UNSIGNED] [ZEROFILL]
  | NUMERIC[(length[,decimals])] [UNSIGNED] [ZEROFILL]
  | DATE
  | TIME[(fsp)]
  | TIMESTAMP[(fsp)]
  | DATETIME[(fsp)]
  | YEAR
  | CHAR[(length)] [BINARY]
      [CHARACTER SET charset_name] [COLLATE collation_name]
  | VARCHAR(length) [BINARY]
      [CHARACTER SET charset_name] [COLLATE collation_name]
  | BINARY[(length)]
  | VARBINARY(length)
  | TINYBLOB
  | BLOB
  | MEDIUMBLOB
  | LONGBLOB
  | TINYTEXT [BINARY]
      [CHARACTER SET charset_name] [COLLATE collation_name]
  | TEXT [BINARY]
      [CHARACTER SET charset_name] [COLLATE collation_name]
  | MEDIUMTEXT [BINARY]
      [CHARACTER SET charset_name] [COLLATE collation_name]
  | LONGTEXT [BINARY]
      [CHARACTER SET charset_name] [COLLATE collation_name]
  | ENUM(value1,value2,value3,...)
      [CHARACTER SET charset_name] [COLLATE collation_name]
  | SET(value1,value2,value3,...)
      [CHARACTER SET charset_name] [COLLATE collation_name]
  | JSON
  | spatial_type


To handle the general case is a non-trivial task! I highly recommend writing your parsing regexes such that they are recognizably based on the documentation. (One question I have is why you look for BOOL(EAN)* when MySQL has no boolean type.)

According to the PostgreSQL documentation:


The data types smallserial, serial and bigserial are not true types, but merely a notational convenience for creating unique identifier columns (similar to the AUTO_INCREMENT property supported by some other databases). In the current implementation, specifying:

CREATE TABLE tablename (
    colname SERIAL
);




is equivalent to specifying:

CREATE SEQUENCE tablename_colname_seq;
CREATE TABLE tablename (
    colname integer NOT NULL DEFAULT nextval('tablename_colname_seq')
);
ALTER SEQUENCE tablename_colname_seq OWNED BY tablename.colname;


Note that the equivalent MySQL syntax for that would be

CREATE TABLE tablename (
    colname integer NOT NULL AUTO_INCREMENT
);


If I were trying to machine-translate a database schema specification, I would prefer to independently translate the size-and-range aspect of the numeric type and the auto-incrementing nature of the column. Otherwise, trying to handle the column name, the data type, any NOT NULL constraints, primary and foreign keys, etc. all at once will result in a mess.

Specifically, I would say that MySQL's AUTO_INCREMENT keyword translates directly to PostgreSQL's DEFAULT nextval(…), with a CREATE SEQUENCE … preamble and an ALTER SEQUENCE … OWNED BY … postamble.

Such a translation might not result in pretty and idiomatic PostgreSQL code, but the translation code would be more robust and maintainable.

Code Snippets

column_definition:
    data_type [NOT NULL | NULL] [DEFAULT default_value]
      [AUTO_INCREMENT] [UNIQUE [KEY] | [PRIMARY] KEY]
      [COMMENT 'string']
      [COLUMN_FORMAT {FIXED|DYNAMIC|DEFAULT}]
      [STORAGE {DISK|MEMORY|DEFAULT}]
      [reference_definition]
  | data_type [GENERATED ALWAYS] AS (expression)
      [VIRTUAL | STORED] [UNIQUE [KEY]] [COMMENT comment]
      [NOT NULL | NULL] [[PRIMARY] KEY]

data_type:
    BIT[(length)]
  | TINYINT[(length)] [UNSIGNED] [ZEROFILL]
  | SMALLINT[(length)] [UNSIGNED] [ZEROFILL]
  | MEDIUMINT[(length)] [UNSIGNED] [ZEROFILL]
  | INT[(length)] [UNSIGNED] [ZEROFILL]
  | INTEGER[(length)] [UNSIGNED] [ZEROFILL]
  | BIGINT[(length)] [UNSIGNED] [ZEROFILL]
  | REAL[(length,decimals)] [UNSIGNED] [ZEROFILL]
  | DOUBLE[(length,decimals)] [UNSIGNED] [ZEROFILL]
  | FLOAT[(length,decimals)] [UNSIGNED] [ZEROFILL]
  | DECIMAL[(length[,decimals])] [UNSIGNED] [ZEROFILL]
  | NUMERIC[(length[,decimals])] [UNSIGNED] [ZEROFILL]
  | DATE
  | TIME[(fsp)]
  | TIMESTAMP[(fsp)]
  | DATETIME[(fsp)]
  | YEAR
  | CHAR[(length)] [BINARY]
      [CHARACTER SET charset_name] [COLLATE collation_name]
  | VARCHAR(length) [BINARY]
      [CHARACTER SET charset_name] [COLLATE collation_name]
  | BINARY[(length)]
  | VARBINARY(length)
  | TINYBLOB
  | BLOB
  | MEDIUMBLOB
  | LONGBLOB
  | TINYTEXT [BINARY]
      [CHARACTER SET charset_name] [COLLATE collation_name]
  | TEXT [BINARY]
      [CHARACTER SET charset_name] [COLLATE collation_name]
  | MEDIUMTEXT [BINARY]
      [CHARACTER SET charset_name] [COLLATE collation_name]
  | LONGTEXT [BINARY]
      [CHARACTER SET charset_name] [COLLATE collation_name]
  | ENUM(value1,value2,value3,...)
      [CHARACTER SET charset_name] [COLLATE collation_name]
  | SET(value1,value2,value3,...)
      [CHARACTER SET charset_name] [COLLATE collation_name]
  | JSON
  | spatial_type
CREATE TABLE tablename (
    colname SERIAL
);
CREATE SEQUENCE tablename_colname_seq;
CREATE TABLE tablename (
    colname integer NOT NULL DEFAULT nextval('tablename_colname_seq')
);
ALTER SEQUENCE tablename_colname_seq OWNED BY tablename.colname;
CREATE TABLE tablename (
    colname integer NOT NULL AUTO_INCREMENT
);

Context

StackExchange Code Review Q#156174, answer score: 5

Revisions (0)

No revisions yet.