patternsqlCritical
What is the best way to store an email address in PostgreSQL?
Viewed 0 times
postgresqltheaddresswhatemailwaystorebest
Problem
What would be the right datatype to store email addresses in PostgreSQL?
I can use
I can use
varchar (or even text), but I wonder if there is a more specific data type for emails.Solution
Custom
I don't think using
Using HTML5
Currently the most correct answer to the question what is an e-mail address is specified in RFC5322. That spec is insanely complex[2], so much so that everything breaks it. HTML5 contains a different spec for email,
This requirement is a willful violation of RFC 5322, which defines a syntax for e-mail addresses that is simultaneously too strict (before the "@" character), too vague (after the "@" character), and too lax (allowing comments, whitespace characters, and quoted strings in manners unfamiliar to most users) to be of practical use here. [...] The following JavaScript- and Perl-compatible regular expression is an implementation of the above definition.
This is likely what you want, and if it's good enough for HTML5, it's probably good enough for you. We can make use of that directly in PostgreSQL. I also use
Now you can do...
%%CODEBLOCK_2%%
But not
%%CODEBLOCK_3%%
Because both of those return
%%CODEBLOCK_4%%
Because this is based on citext too
%%CODEBLOCK_5%%
returns true by default.
Using
As an important note, there is a more correct method of doing this that is far more complex using
%%CODEBLOCK_6%%
Then create the function, notice we mark at as an
%%CODEBLOCK_7%%
Then create the domain,
%%CODEBLOCK_8%%
Footnotes
Neither of these regexes enforce length limits on the overall email address or the local part or the domain names. RFC 5322 does not specify any length limitations. Those stem from limitations in other protocols like the SMTP protocol for actually sending email. RFC 1035 does state that domains must be 63 characters or less, but does not include that in its syntax specification. The reason is that a true regular language cannot enforce a length limit and disallow consecutive hyphens at the same time.
DOMAINsI don't think using
citext (case-insensitive) is enough[1]. Using PostgreSQL we can create a custom domain which is essentially some defined constraints over a type. We can create a domain for instance over the citext type, or over text.Using HTML5
type=email specCurrently the most correct answer to the question what is an e-mail address is specified in RFC5322. That spec is insanely complex[2], so much so that everything breaks it. HTML5 contains a different spec for email,
This requirement is a willful violation of RFC 5322, which defines a syntax for e-mail addresses that is simultaneously too strict (before the "@" character), too vague (after the "@" character), and too lax (allowing comments, whitespace characters, and quoted strings in manners unfamiliar to most users) to be of practical use here. [...] The following JavaScript- and Perl-compatible regular expression is an implementation of the above definition.
/^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/This is likely what you want, and if it's good enough for HTML5, it's probably good enough for you. We can make use of that directly in PostgreSQL. I also use
citext here (which technically means you can simply the regex a bit visually by removing either the upper-case or lower-case).CREATE EXTENSION citext;
CREATE DOMAIN email AS citext
CHECK ( value ~ '^[a-zA-Z0-9.!#$%&''*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*
Now you can do...
SELECT 'asdf@foobar.com'::email;
But not
SELECT 'asdf@foob,,ar.com'::email;
SELECT 'asd@f@foobar.com'::email;
Because both of those return
ERROR: value for domain email violates check constraint "email_check"
Because this is based on citext too
SELECT 'asdf@foobar.com'::email = 'ASdf@fooBAR.com';
returns true by default.
Using plperlu/Email::Valid
As an important note, there is a more correct method of doing this that is far more complex using plperlu. If you need this level of correctness you do not want citext. Email::Valid can even check if the domain has an MX record (example in docs of Email::Valid)! First, add plperlu (requires superuser).
CREATE EXTENSION plperlu;
Then create the function, notice we mark at as an IMMUTABLE:
CREATE FUNCTION valid_email(text)
RETURNS boolean
LANGUAGE plperlu
IMMUTABLE LEAKPROOF STRICT AS
$
use Email::Valid;
my $email = shift;
Email::Valid->address($email) or die "Invalid email address: $email\n";
return 'true';
$;
Then create the domain,
CREATE DOMAIN validemail AS text NOT NULL
CONSTRAINT validemail_check CHECK (valid_email(VALUE));
Footnotes
- Using
citext is technically wrong. SMTP defines local-part as being case sensitive. But, again, this is a case of the spec being stupid. It contains its own identity crises. The spec says local-part (the part before the @) "MAY be case-sensitive" ... "MUST BE treated as case sensitive" ... and yet "exploiting the case sensitivity of mailbox local-parts impedes interoperability and is discouraged."
- The spec for an email address is so complex, it's not even self-contained. Complex is truly an understatement, those making the spec don't even understand it.. From the docs on regular-expression.info
Neither of these regexes enforce length limits on the overall email address or the local part or the domain names. RFC 5322 does not specify any length limitations. Those stem from limitations in other protocols like the SMTP protocol for actually sending email. RFC 1035 does state that domains must be 63 characters or less, but does not include that in its syntax specification. The reason is that a true regular language cannot enforce a length limit and disallow consecutive hyphens at the same time. );Now you can do...
%%CODEBLOCK_2%%
But not
%%CODEBLOCK_3%%
Because both of those return
%%CODEBLOCK_4%%
Because this is based on citext too
%%CODEBLOCK_5%%
returns true by default.
Using
plperlu/Email::ValidAs an important note, there is a more correct method of doing this that is far more complex using
plperlu. If you need this level of correctness you do not want citext. Email::Valid can even check if the domain has an MX record (example in docs of Email::Valid)! First, add plperlu (requires superuser).%%CODEBLOCK_6%%
Then create the function, notice we mark at as an
IMMUTABLE:%%CODEBLOCK_7%%
Then create the domain,
%%CODEBLOCK_8%%
Footnotes
- Using
citextis technically wrong. SMTP defineslocal-partas being case sensitive. But, again, this is a case of the spec being stupid. It contains its own identity crises. The spec sayslocal-part(the part before the@) "MAY be case-sensitive" ... "MUST BE treated as case sensitive" ... and yet "exploiting the case sensitivity of mailbox local-parts impedes interoperability and is discouraged."
- The spec for an email address is so complex, it's not even self-contained. Complex is truly an understatement, those making the spec don't even understand it.. From the docs on regular-expression.info
Neither of these regexes enforce length limits on the overall email address or the local part or the domain names. RFC 5322 does not specify any length limitations. Those stem from limitations in other protocols like the SMTP protocol for actually sending email. RFC 1035 does state that domains must be 63 characters or less, but does not include that in its syntax specification. The reason is that a true regular language cannot enforce a length limit and disallow consecutive hyphens at the same time.
Code Snippets
/^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/CREATE EXTENSION citext;
CREATE DOMAIN email AS citext
CHECK ( value ~ '^[a-zA-Z0-9.!#$%&''*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$' );SELECT 'asdf@foobar.com'::email;SELECT 'asdf@foob,,ar.com'::email;
SELECT 'asd@f@foobar.com'::email;ERROR: value for domain email violates check constraint "email_check"Context
StackExchange Database Administrators Q#68266, answer score: 78
Revisions (0)
No revisions yet.