Accurate email syntax validation (no seriously)
Problem
So a friend happened to show me how odd and specific the general email syntax rules are. For instance, emails can have "comments": you can put characters in parentheses that are simply ignored. So not only is it valid, email(this seems extremely redundant)@email.com is the same email as email@email.com.
Now most email providers have simpler restrictions that are easier to work with (like only ASCII, digits, dots and dashes). But I thought it'd be a fun exercise to follow the exact guidelines as best I could. I won't delineate every specific here, as I (hopefully) have made it all clear in the code itself.
I did heavily consult the font of all knowledge, Wikipedia, for its summary of the rules.
I'm particularly interested in feedback on how robust I made this and how I did the testing and separation of functions. In theory this should be a module people could import and call (though I have no idea when someone would actually want to use it), so I'd like reviews to focus on that. Feedback about better or more efficient methods is, of course, welcome.
```
"""This module will evaluate whether a string is a valid email or not.
It is based on the criteria laid out in RFC documents, summarised here:
https://en.wikipedia.org/wiki/Email_address#Syntax
Many email providers will restrict these further, but this module is primarily
for testing whether an email is syntactically valid or not.
Calling validate() will run all tests in intelligent order.
Any error found will raise an InvalidEmail error, but this also inherits from
ValueError, so errors can be caught with either of them.
If you're using any other functions, note that some of the tests will return
a modified string for the convenience of how the default tests are structured.
Just calling valid_quotes(string) will work fine, just don't use the assigned
value unless you want the quoted sections removed.
Errors will be raised from the function regardless.
>>> validate("local-part@domain")
>>> validate("
```
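The exception design described in the docstring (an InvalidEmail error that inherits from ValueError) could look roughly like the sketch below. The body of validate() here is a hypothetical stand-in that only checks for a non-empty local part and domain around the last "@"; it is not the module's actual logic, which is not shown in full above.

```python
class InvalidEmail(ValueError):
    """Raised when a string is not a syntactically valid email address."""


def validate(address: str) -> None:
    # Hypothetical stand-in for the real checks: split on the LAST "@",
    # since a quoted local part may itself contain "@".
    local, separator, domain = address.rpartition("@")
    if not separator or not local or not domain:
        raise InvalidEmail(f"missing local part, '@' or domain: {address!r}")


# Because InvalidEmail subclasses ValueError, callers can catch either one.
for candidate in ("local-part@domain", "no-at-sign"):
    try:
        validate(candidate)
    except ValueError as error:
        print(f"{candidate!r}: rejected ({error})")
    else:
        print(f"{candidate!r}: accepted")
```

Subclassing ValueError keeps the module friendly to callers who already handle ValueError generically, while still letting stricter callers catch InvalidEmail alone.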
Solution
"@"@example.com and "\ "@example.com both fail, but they are valid.
" "@example.com passes, but it is, in fact, invalid.*
You probably skipped confirming your knowledge against the relevant RFCs; a conforming implementation should abide by the rules described therein. While Wikipedia is quite reliable nowadays, it is by no means a normative source.
*RFC 5322 describes quoted-string as follows:
    quoted-string = [CFWS]
                    DQUOTE *([FWS] qcontent) [FWS] DQUOTE
                    [CFWS]
FWS means "folding white space": an optional sequence of whitespace characters followed by a single CRLF, and then a mandatory part consisting of at least one whitespace character. While an address's local part can legally begin and end with a space, both spaces need to be separated by at least one character forming qcontent.
Code Snippets
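As a rough illustration of the quoted-string rule discussed in this answer, here is a minimal Python sketch that checks only a quoted local part. The names QTEXT, QUOTED_PAIR, QCONTENT, FWS, QUOTED_LOCAL and is_valid_quoted_local are hypothetical. The sketch ignores CFWS (comments), CRLF line folding and the obsolete syntax, and it assumes this answer's reading that at least one qcontent character must appear between the quotes.

```python
import re

# qtext (RFC 5322): printable US-ASCII except '"' and '\'.
QTEXT = r'[\x21\x23-\x5b\x5d-\x7e]'
# quoted-pair: a backslash followed by a visible character, space or tab.
QUOTED_PAIR = r'\\[\x21-\x7e \t]'
QCONTENT = rf'(?:{QTEXT}|{QUOTED_PAIR})'
# Simplified FWS: one or more spaces/tabs (CRLF folding is not handled).
FWS = r'[ \t]+'

# Assumption: require at least one ([FWS] qcontent) group, per the reading
# given in this answer, followed by optional trailing FWS.
QUOTED_LOCAL = re.compile(rf'"(?:(?:{FWS})?{QCONTENT})+(?:{FWS})?"')


def is_valid_quoted_local(local: str) -> bool:
    """Return True if `local` is a quoted local part under the assumptions above."""
    return QUOTED_LOCAL.fullmatch(local) is not None


# The three local parts mentioned in the answer.
for local in ('"@"', '"\\ "', '" "'):
    print(repr(local), is_valid_quoted_local(local))
```

Under these assumptions the first two local parts are accepted and the third is rejected, matching the verdicts given above.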
quoted-string = [CFWS]
                DQUOTE *([FWS] qcontent) [FWS] DQUOTE
                [CFWS]
Context
StackExchange Code Review Q#117584, answer score: 33