patternpythonMinor
Replacing MySQL's AUTO_INCREMENT with Postgres' SERIAL
Viewed 0 times
auto_incrementpostgreswithserialmysqlreplacing
Problem
I am making changes to a Python script that converts MySQL scripts to PostgreSQL, and I want to replace strings such as
This is the code I got to work:
Notes:
id INTEGER NOT NULL AUTO_INCREMENT with id SERIAL NOT NULL.This is the code I got to work:
import re
line = 'id INTEGER(11) NOT NULL AUTO_INCREMENT,'
numeric_types = ['(BIG|MEDIUM|SMALL|TINY)*INT(EGER)*(\(.*?\))*',
'DEC(IMAL)*(\(.*?\))*', 'NUMERIC(\(.*\))*', 'FIXED(\(.*\))*',
'FLOAT(\(.*\))*', 'DOUBLE( PRECISION)*(\(.*?\))*', 'REAL(\(.*\))*',
'BIT', 'BOOL(EAN)*']
for i in range(len(numeric_types)):
type = numeric_types[i]
if (re.search(type, line)):
line = re.sub(type, "SERIAL", line).replace(" AUTO_INCREMENT", "")
print line
breakNotes:
linewill be a column from a CREATE TABLE statement inputted by the user
- I could probably join all regular expressions into one using
ORs, but I do not know if that would be a good practice or not
Solution
The MySQL syntax for a
To handle the general case is a non-trivial task! I highly recommend writing your parsing regexes such that they are recognizably based on the documentation. (One question I have is why you look for
According to the PostgreSQL documentation:
The data types
is equivalent to specifying:
Note that the equivalent MySQL syntax for that would be
If I were trying to machine-translate a database schema specification, I would prefer to independently translate the size-and-range aspect of the numeric type and the auto-incrementing nature of the column. Otherwise, trying to handle the column name, the data type, any
Specifically, I would say that MySQL's
Such a translation might not result in pretty and idiomatic PostgreSQL code, but the translation code would be more robust and maintainable.
CREATE TABLE command is, in part:column_definition:
data_type [NOT NULL | NULL] [DEFAULT default_value]
[AUTO_INCREMENT] [UNIQUE [KEY] | [PRIMARY] KEY]
[COMMENT 'string']
[COLUMN_FORMAT {FIXED|DYNAMIC|DEFAULT}]
[STORAGE {DISK|MEMORY|DEFAULT}]
[reference_definition]
| data_type [GENERATED ALWAYS] AS (expression)
[VIRTUAL | STORED] [UNIQUE [KEY]] [COMMENT comment]
[NOT NULL | NULL] [[PRIMARY] KEY]
data_type:
BIT[(length)]
| TINYINT[(length)] [UNSIGNED] [ZEROFILL]
| SMALLINT[(length)] [UNSIGNED] [ZEROFILL]
| MEDIUMINT[(length)] [UNSIGNED] [ZEROFILL]
| INT[(length)] [UNSIGNED] [ZEROFILL]
| INTEGER[(length)] [UNSIGNED] [ZEROFILL]
| BIGINT[(length)] [UNSIGNED] [ZEROFILL]
| REAL[(length,decimals)] [UNSIGNED] [ZEROFILL]
| DOUBLE[(length,decimals)] [UNSIGNED] [ZEROFILL]
| FLOAT[(length,decimals)] [UNSIGNED] [ZEROFILL]
| DECIMAL[(length[,decimals])] [UNSIGNED] [ZEROFILL]
| NUMERIC[(length[,decimals])] [UNSIGNED] [ZEROFILL]
| DATE
| TIME[(fsp)]
| TIMESTAMP[(fsp)]
| DATETIME[(fsp)]
| YEAR
| CHAR[(length)] [BINARY]
[CHARACTER SET charset_name] [COLLATE collation_name]
| VARCHAR(length) [BINARY]
[CHARACTER SET charset_name] [COLLATE collation_name]
| BINARY[(length)]
| VARBINARY(length)
| TINYBLOB
| BLOB
| MEDIUMBLOB
| LONGBLOB
| TINYTEXT [BINARY]
[CHARACTER SET charset_name] [COLLATE collation_name]
| TEXT [BINARY]
[CHARACTER SET charset_name] [COLLATE collation_name]
| MEDIUMTEXT [BINARY]
[CHARACTER SET charset_name] [COLLATE collation_name]
| LONGTEXT [BINARY]
[CHARACTER SET charset_name] [COLLATE collation_name]
| ENUM(value1,value2,value3,...)
[CHARACTER SET charset_name] [COLLATE collation_name]
| SET(value1,value2,value3,...)
[CHARACTER SET charset_name] [COLLATE collation_name]
| JSON
| spatial_typeTo handle the general case is a non-trivial task! I highly recommend writing your parsing regexes such that they are recognizably based on the documentation. (One question I have is why you look for
BOOL(EAN)* when MySQL has no boolean type.)According to the PostgreSQL documentation:
The data types
smallserial, serial and bigserial are not true types, but merely a notational convenience for creating unique identifier columns (similar to the AUTO_INCREMENT property supported by some other databases). In the current implementation, specifying:CREATE TABLE tablename (
colname SERIAL
);is equivalent to specifying:
CREATE SEQUENCE tablename_colname_seq;
CREATE TABLE tablename (
colname integer NOT NULL DEFAULT nextval('tablename_colname_seq')
);
ALTER SEQUENCE tablename_colname_seq OWNED BY tablename.colname;Note that the equivalent MySQL syntax for that would be
CREATE TABLE tablename (
colname integer NOT NULL AUTO_INCREMENT
);If I were trying to machine-translate a database schema specification, I would prefer to independently translate the size-and-range aspect of the numeric type and the auto-incrementing nature of the column. Otherwise, trying to handle the column name, the data type, any
NOT NULL constraints, primary and foreign keys, etc. all at once will result in a mess.Specifically, I would say that MySQL's
AUTO_INCREMENT keyword translates directly to PostgreSQL's DEFAULT nextval(…), with a CREATE SEQUENCE … preamble and an ALTER SEQUENCE … OWNED BY … postamble.Such a translation might not result in pretty and idiomatic PostgreSQL code, but the translation code would be more robust and maintainable.
Code Snippets
column_definition:
data_type [NOT NULL | NULL] [DEFAULT default_value]
[AUTO_INCREMENT] [UNIQUE [KEY] | [PRIMARY] KEY]
[COMMENT 'string']
[COLUMN_FORMAT {FIXED|DYNAMIC|DEFAULT}]
[STORAGE {DISK|MEMORY|DEFAULT}]
[reference_definition]
| data_type [GENERATED ALWAYS] AS (expression)
[VIRTUAL | STORED] [UNIQUE [KEY]] [COMMENT comment]
[NOT NULL | NULL] [[PRIMARY] KEY]
data_type:
BIT[(length)]
| TINYINT[(length)] [UNSIGNED] [ZEROFILL]
| SMALLINT[(length)] [UNSIGNED] [ZEROFILL]
| MEDIUMINT[(length)] [UNSIGNED] [ZEROFILL]
| INT[(length)] [UNSIGNED] [ZEROFILL]
| INTEGER[(length)] [UNSIGNED] [ZEROFILL]
| BIGINT[(length)] [UNSIGNED] [ZEROFILL]
| REAL[(length,decimals)] [UNSIGNED] [ZEROFILL]
| DOUBLE[(length,decimals)] [UNSIGNED] [ZEROFILL]
| FLOAT[(length,decimals)] [UNSIGNED] [ZEROFILL]
| DECIMAL[(length[,decimals])] [UNSIGNED] [ZEROFILL]
| NUMERIC[(length[,decimals])] [UNSIGNED] [ZEROFILL]
| DATE
| TIME[(fsp)]
| TIMESTAMP[(fsp)]
| DATETIME[(fsp)]
| YEAR
| CHAR[(length)] [BINARY]
[CHARACTER SET charset_name] [COLLATE collation_name]
| VARCHAR(length) [BINARY]
[CHARACTER SET charset_name] [COLLATE collation_name]
| BINARY[(length)]
| VARBINARY(length)
| TINYBLOB
| BLOB
| MEDIUMBLOB
| LONGBLOB
| TINYTEXT [BINARY]
[CHARACTER SET charset_name] [COLLATE collation_name]
| TEXT [BINARY]
[CHARACTER SET charset_name] [COLLATE collation_name]
| MEDIUMTEXT [BINARY]
[CHARACTER SET charset_name] [COLLATE collation_name]
| LONGTEXT [BINARY]
[CHARACTER SET charset_name] [COLLATE collation_name]
| ENUM(value1,value2,value3,...)
[CHARACTER SET charset_name] [COLLATE collation_name]
| SET(value1,value2,value3,...)
[CHARACTER SET charset_name] [COLLATE collation_name]
| JSON
| spatial_typeCREATE TABLE tablename (
colname SERIAL
);CREATE SEQUENCE tablename_colname_seq;
CREATE TABLE tablename (
colname integer NOT NULL DEFAULT nextval('tablename_colname_seq')
);
ALTER SEQUENCE tablename_colname_seq OWNED BY tablename.colname;CREATE TABLE tablename (
colname integer NOT NULL AUTO_INCREMENT
);Context
StackExchange Code Review Q#156174, answer score: 5
Revisions (0)
No revisions yet.