Validating email address in JavaScript

Wai Park Soon
Published in NoteToPS
Jul 13, 2015

Web apps often ask users to enter email addresses for authentication or referrals. For a good user experience, it is worth validating an email address before using it: for example, to catch typos in the UI and give immediate feedback, or to check that the address is well-formed before actually sending an email to it.

The Standards

Alright, so what makes an email address valid? Following the definition in the RFCs (RFC 5322 and its references) gets overly complicated. For common use cases, the regex provided by WHATWG is sufficient:

/^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/
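In JavaScript the pattern can be used directly with `RegExp.prototype.test`. Below is a minimal sketch that wraps it in a helper; the name `isValidEmail` and the check that the input is a string are my own choices, not part of the WHATWG text.

```javascript
// Regex from the WHATWG HTML specification ("valid e-mail address").
const WHATWG_EMAIL =
  /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;

// Hypothetical helper: true only for strings that match the whole pattern.
function isValidEmail(input) {
  return typeof input === "string" && WHATWG_EMAIL.test(input);
}

console.log(isValidEmail("jane.doe@example.com")); // true
console.log(isValidEmail("jane.doe@example"));     // true (single-label domains pass)
console.log(isValidEmail("jane doe@example.com")); // false (space is not allowed)
console.log(isValidEmail("jane@-example.com"));    // false (label starts with a hyphen)
```

The regex deliberately has no `g` flag: a global regex keeps `lastIndex` between `test` calls, so reusing it for several inputs can give inconsistent results.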

Let's break down the regex:

  • There should be exactly one @ symbol.
  • The basic form is local-part@domain.
  • The local part must contain at least one character, and every character must be alphanumeric or one of the special characters . ! # $ % & ' * + / = ? ^ _ ` { | } ~ -
  • The domain must contain at least one label; labels are separated by dots.
  • A label is limited to 63 characters. It should contain only alphanumeric characters or hyphens. However, the first and the last character cannot be hyphens.
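To make the rules in the list concrete, here is a sketch that checks them one by one without a regex. It is intended to agree with the WHATWG pattern; `checkEmailRules`, `isValidLabel`, `isAlnum` and `LOCAL_SPECIALS` are hypothetical names used only for illustration, not a replacement for the regex.

```javascript
// Special characters allowed in the local part, besides letters and digits.
const LOCAL_SPECIALS = new Set(".!#$%&'*+/=?^_`{|}~-");

// ASCII letter or digit; `c` is a single-character string.
function isAlnum(c) {
  return (c >= "a" && c <= "z") || (c >= "A" && c <= "Z") || (c >= "0" && c <= "9");
}

// Label: 1 to 63 characters, letters/digits/hyphens, no hyphen at either end.
function isValidLabel(label) {
  if (label.length < 1 || label.length > 63) return false;
  if (label.startsWith("-") || label.endsWith("-")) return false;
  for (const c of label) {
    if (!isAlnum(c) && c !== "-") return false;
  }
  return true;
}

function checkEmailRules(email) {
  // Exactly one "@" splits the address into local part and domain.
  const parts = email.split("@");
  if (parts.length !== 2) return false;
  const [local, domain] = parts;

  // Local part: at least one character, each one alphanumeric or special.
  if (local.length === 0) return false;
  for (const c of local) {
    if (!isAlnum(c) && !LOCAL_SPECIALS.has(c)) return false;
  }

  // Domain: one or more labels separated by dots (an empty label fails).
  return domain.split(".").every(isValidLabel);
}

console.log(checkEmailRules("jane.doe@example.com")); // true
console.log(checkEmailRules("jane@example..com"));    // false (empty label)
```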

Technically,

email = local-part "@" domain
local-part = 1*(atext / ".")
domain = label *("." label)
label = let-dig [[ldh-str] let-dig] ;limited to 63 characters
let-dig = ALPHA / DIGIT
ldh-str = *(let-dig / "-")

atext = ALPHA / DIGIT / ! # $ % & ' * + / = ? ^ _ ` { | } ~ -
ALPHA = a-z A-Z
DIGIT = 0-9

*(n) = 0 or more (n)
1*(n) = 1 or more (n)
(n){x,y} = x-y occurrence of (n)
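The grammar maps almost one-to-one onto the regex. As a sketch, the pattern can be assembled from constants named after the grammar rules; the constant names are my own, and the assembled pattern is meant to be character-for-character the WHATWG regex.

```javascript
// One character of atext, or ".", as allowed in the local part.
const ATEXT_OR_DOT = "[a-zA-Z0-9.!#$%&'*+\\/=?^_`{|}~-]";
// let-dig = ALPHA / DIGIT
const LET_DIG = "[a-zA-Z0-9]";
// One character of ldh-str: let-dig / "-"
const LDH_CHAR = "[a-zA-Z0-9-]";

// label = let-dig [[ldh-str] let-dig]; 1 + 61 + 1 keeps it at 63 characters max.
const LABEL = `${LET_DIG}(?:${LDH_CHAR}{0,61}${LET_DIG})?`;
// local-part = 1*(atext / ".")
const LOCAL_PART = `${ATEXT_OR_DOT}+`;
// domain = label *("." label)
const DOMAIN = `${LABEL}(?:\\.${LABEL})*`;
// email = local-part "@" domain, anchored at both ends
const EMAIL = new RegExp(`^${LOCAL_PART}@${DOMAIN}$`);

console.log(EMAIL.test("jane.doe@mail.example.com")); // true
console.log(EMAIL.test("jane.doe@mail..example.com")); // false
```

Building the pattern this way keeps the 63-character rule visible: one let-dig, up to 61 ldh characters, and a closing let-dig.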

As the WHATWG specification itself states, this regex is

a willful violation of RFC 5322, which defines a syntax for e-mail addresses that is simultaneously too strict (before the "@" character), too vague (after the "@" character), and too lax (allowing comments, whitespace characters, and quoted strings in manners unfamiliar to most users) to be of practical use here.
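A few inputs show what this "willful violation" means in practice. The results are what the regex returns; the notes on RFC 5322 are my own summary of its grammar (it allows quoted strings, comments and domain literals, and its dot-atom form does not allow two consecutive dots), so treat them as illustrative.

```javascript
const WHATWG_EMAIL =
  /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;

// [address, how RFC 5322 treats it]
const cases = [
  ["john..doe@example.com", "consecutive dots: not a valid dot-atom in RFC 5322"],
  ['"john doe"@example.com', "quoted string: allowed by RFC 5322"],
  ["john(work)@example.com", "comment: allowed by RFC 5322"],
  ["john@[192.0.2.1]", "domain literal: allowed by RFC 5322"],
];

for (const [address, note] of cases) {
  console.log(`${WHATWG_EMAIL.test(address)}\t${address}\t(${note})`);
}
// Only the first line prints true; the other three print false.
```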

Supporting Unicode

With the extension of SMTP in RFC 6531, Unicode characters are allowed in email addresses. This extension is highly relevant if you have an internationalized product. In fact, I have received feature requests from users to allow Unicode in their email addresses. In most cases, we only need to support umlauts, but sometimes you might need to support Chinese or Japanese characters too. In theory, an email address like 客服@买卖.商务 is perfectly fine.

I searched through various sources without finding a satisfactory answer, so I decided to modify the WHATWG regex to support Unicode characters.

/^(?:[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]|[^\u0000-\u007F])+@(?:[a-zA-Z0-9]|[^\u0000-\u007F])(?:(?:[a-zA-Z0-9-]|[^\u0000-\u007F]){0,61}(?:[a-zA-Z0-9]|[^\u0000-\u007F]))?(?:\.(?:[a-zA-Z0-9]|[^\u0000-\u007F])(?:(?:[a-zA-Z0-9-]|[^\u0000-\u007F]){0,61}(?:[a-zA-Z0-9]|[^\u0000-\u007F]))?)*$/
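A quick side-by-side comparison of the two patterns on a few addresses, including the example from the previous section. Both regexes are copied from this article; the test addresses other than 客服@买卖.商务 are made up for illustration.

```javascript
const WHATWG_EMAIL =
  /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;
const UNICODE_EMAIL =
  /^(?:[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]|[^\u0000-\u007F])+@(?:[a-zA-Z0-9]|[^\u0000-\u007F])(?:(?:[a-zA-Z0-9-]|[^\u0000-\u007F]){0,61}(?:[a-zA-Z0-9]|[^\u0000-\u007F]))?(?:\.(?:[a-zA-Z0-9]|[^\u0000-\u007F])(?:(?:[a-zA-Z0-9-]|[^\u0000-\u007F]){0,61}(?:[a-zA-Z0-9]|[^\u0000-\u007F]))?)*$/;

const addresses = ["user@example.com", "jürgen@müller.de", "客服@买卖.商务", "客服 @买卖.商务"];
for (const address of addresses) {
  console.log(address, WHATWG_EMAIL.test(address), UNICODE_EMAIL.test(address));
}
// user@example.com true true
// jürgen@müller.de false true
// 客服@买卖.商务 false true
// 客服 @买卖.商务 false false   (an ASCII space is still rejected)
```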


This regex is essentially the same as the one above, except that it allows any non-ASCII character in any part of an email address. There are further restrictions on the use of Unicode characters in the domain part, but those are rare cases, so we can safely ignore them (until somebody requests it). To break it down:

  • The only characters added are non-ASCII ones: the new class excludes everything from \u0000 to \u007F and accepts anything else.
  • The other rules follow the original one.
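One detail about the first rule: without the `u` flag, a JavaScript character class matches one UTF-16 code unit at a time. Umlauts and CJK characters are single units, but a character outside the Basic Multilingual Plane (such as an emoji) is a surrogate pair of two units. The regex still accepts such characters, though each one then counts twice toward the 63-character label limit. A small check, using the same class as the Unicode regex:

```javascript
// The added class, anchored so it must match exactly one UTF-16 code unit.
const ONE_NON_ASCII_UNIT = /^[^\u0000-\u007F]$/;

console.log(ONE_NON_ASCII_UNIT.test("ü"));  // true
console.log(ONE_NON_ASCII_UNIT.test("客")); // true
console.log(ONE_NON_ASCII_UNIT.test("-"));  // false (ASCII)

const emoji = "\u{1F600}";
console.log(emoji.length);                        // 2 (surrogate pair)
console.log(ONE_NON_ASCII_UNIT.test(emoji));      // false: one code point, two units
console.log(/^[^\u0000-\u007F]{2}$/.test(emoji)); // true
```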

Technically,

email = local-part "@" domain
local-part = 1*(atext / "." / utf8-non-ascii)
domain = label *("." label)
label = let-dig [[ldh-str] let-dig] ;limited to 63 characters
let-dig = ALPHA / DIGIT / utf8-non-ascii
ldh-str = *(let-dig / "-")
atext = ALPHA / DIGIT / ! # $ % & ' * + / = ? ^ _ ` { | } ~ -
utf8-non-ascii = NOT(\u0000-\u007F)
ALPHA = a-z A-Z
DIGIT = 0-9

*(n) = 0 or more (n)
1*(n) = 1 or more (n)
(n){x,y} = x-y occurrence of (n)
NOT(n) = any character not in (n)

Basically, [^\u0000-\u007F] is injected into every part of the regex using the pattern (?:<original>|[^\u0000-\u007F]). Let me know if this regex can be improved further. Thanks!
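The injection can also be done mechanically. In this sketch (the helper name `allowNonAscii` is hypothetical), every bracketed character class in the WHATWG pattern is wrapped as (?:<class>|[^\u0000-\u007F]). It relies on none of the classes containing a literal "]", which holds for this particular pattern, so treat it as a one-off transformation rather than a general regex rewriter.

```javascript
const WHATWG_EMAIL =
  /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;

// Wrap each [...] class so it also accepts any non-ASCII UTF-16 code unit.
// Assumes no class contains a literal "]" (true for the WHATWG pattern).
function allowNonAscii(regex) {
  const source = regex.source.replace(
    /\[[^\]]*\]/g,
    (cls) => `(?:${cls}|[^\\u0000-\\u007F])` // cls is one whole class, e.g. [a-zA-Z0-9]
  );
  return new RegExp(source, regex.flags);
}

const UNICODE_EMAIL = allowNonAscii(WHATWG_EMAIL);
console.log(UNICODE_EMAIL.test("客服@买卖.商务"));  // true
console.log(UNICODE_EMAIL.test("客服 @买卖.商务")); // false
```

The derived pattern is intended to be identical to the hand-written Unicode regex above, so either form can be used.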

PS: A valid email address is not necessarily an address that exists. We still need to actually send an email to find out whether it exists.
