
Accurate email syntax validation (no, seriously)

Submitted by: @import:stackexchange-codereview

Problem

A friend happened to show me how odd and specific the general email syntax rules are. For instance, addresses can contain "comments": characters in parentheses that are simply ignored. So email(this seems extremely redundant)@email.com is not only valid, it is the same address as email@email.com.
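To make the comment rule concrete, here is a minimal sketch of stripping such a comment before comparing addresses. The helper name `strip_comments` is hypothetical (it is not part of the module under review), and the sketch assumes a single, non-nested comment at the very start or very end of the local part; the full RFC grammar also allows nesting and escaped characters inside comments, which this ignores.

```python
import re

# Hypothetical helper, not from the reviewed module.
# Matches one non-nested "(...)" comment at the very start or very end
# of the local part. Nested comments and escapes are NOT handled.
_COMMENT = re.compile(r"^\([^()]*\)|\([^()]*\)$")


def strip_comments(address: str) -> str:
    """Return the address with a leading/trailing local-part comment removed."""
    # Split on the last "@" so an "@" inside a comment stays in the local part.
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("address contains no '@'")
    return _COMMENT.sub("", local) + "@" + domain


print(strip_comments("email(this seems extremely redundant)@email.com"))
# prints: email@email.com
```

With this, both spellings of the address from the example above normalise to the same string.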

Now, most email providers impose simpler, easier-to-handle restrictions (such as allowing only ASCII letters, digits, dots and dashes). But I thought it'd be a fun exercise to follow the exact guidelines as best I could. I won't spell out every rule here, as I have (hopefully) made them all clear in the code itself.

I did heavily consult the font of all knowledge, Wikipedia, for its summary of the rules.

I'm particularly interested in feedback on how robust this is and on how I handled the testing and the separation of functions. In theory this should be a module people could import and call (though I have no idea when someone would actually want to use it), so I'd like reviews to focus on that. Feedback about better or more efficient methods is, of course, welcome.

```
"""This module will evaluate whether a string is a valid email or not.

It is based on the criteria laid out in RFC documents, summarised here:
https://en.wikipedia.org/wiki/Email_address#Syntax

Many email providers will restrict these further, but this module is primarily
for testing whether an email is syntactically valid or not.

Calling validate() will run all tests in intelligent order.
Any error found will raise an InvalidEmail error, but this also inherits from
ValueError, so errors can be caught with either of them.

If you're using any other functions, note that some of the tests will return
a modified string for the convenience of how the default tests are structured.
Just calling valid_quotes(string) will work fine, just don't use the assigned
value unless you want the quoted sections removed.
Errors will be raised from the function regardless.

>>> validate("local-part@domain")
>>> validate("
```

Solution

"@"@example.com and "\ "@example.com both fail, but they are valid.

" "@example.com passes, but it is, in fact, invalid.*

You probably skipped checking your understanding against the relevant RFCs, whose rules a conforming implementation must follow. While Wikipedia is quite reliable nowadays, it is by no means a normative source.

*RFC 5322 describes quoted-string as follows:

quoted-string   =   [CFWS]
                    DQUOTE *([FWS] qcontent) [FWS] DQUOTE
                    [CFWS]


FWS means "folding white space". It consists of an optional sequence of whitespace characters followed by a single CRLF, and then a mandatory part of one or more whitespace characters. While an address's local part can legally begin and end with a space, both spaces need to be separated by at least one character forming qcontent.
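As a rough illustration, the quoted-string rule above can be transcribed fairly literally into a Python regular expression. This is a sketch under stated assumptions, not a conforming implementation: the names (`QTEXT`, `is_quoted_local_part`, and so on) are mine, the optional CFWS is left out, FWS is reduced to plain spaces and tabs with no CRLF folding, and the obsolete `obs-` forms are ignored.

```python
import re

# qtext: printable ASCII except the double quote (0x22) and backslash (0x5c).
QTEXT = r"[\x21\x23-\x5b\x5d-\x7e]"
# quoted-pair: a backslash followed by a printable character, space or tab.
QUOTED_PAIR = r"\\[\x21-\x7e \t]"
QCONTENT = rf"(?:{QTEXT}|{QUOTED_PAIR})"
# Simplification (assumption): FWS is reduced to 1*WSP, with no CRLF folding.
FWS = r"[ \t]+"

# DQUOTE *([FWS] qcontent) [FWS] DQUOTE, with the optional CFWS left out.
QUOTED_STRING = re.compile(rf'"(?:(?:{FWS})?{QCONTENT})*(?:{FWS})?"')


def is_quoted_local_part(local_part: str) -> bool:
    """Return True if local_part fits the simplified quoted-string pattern."""
    return QUOTED_STRING.fullmatch(local_part) is not None


print(is_quoted_local_part('"@"'))     # prints: True
print(is_quoted_local_part('"\\ "'))   # prints: True (quote, backslash, space, quote)
print(is_quoted_local_part('"a"b"'))   # prints: False (unescaped inner quote)
```

The two local parts reported above as wrongly rejected both match this pattern. Because the pattern transcribes the grammar literally, its verdict on whitespace-only content reflects that literal reading alone; check such edge cases against the RFC text and its errata rather than against this sketch.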


Context

StackExchange Code Review Q#117584, answer score: 33
