email regex

Old forum URL: forums.lhotka.net/forums/t/205.aspx


RockfordLhotka posted on Thursday, May 25, 2006

From what I've seen on the regex web sites, I'm risking life and limb by asking this question - but here goes...

The regex I used in CommonRules to validate email addresses is not quite right, because it only works against upper-case email addresses. I could fix this in RegExMatch() by upper-casing the property value, but that would be a hack.

This is the regex I am using:

\b[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b

Is there a better expression that works for all email addresses - including lower-case ones?
I'm not a regex person, but would this work?

\b[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}\b

or is that total garbage?
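A quick sketch to check the case problem and the proposed fix. Python's `re` module stands in for .NET's `Regex`, and for these plain ASCII character classes the two should behave the same. The third column uses a case-insensitive flag (`re.IGNORECASE`, comparable to `RegexOptions.IgnoreCase` in .NET). Whether the CommonRules rule lets you pass such an option is an assumption, not something stated in this thread.

```python
import re

# Pattern quoted above from CommonRules: letter classes are upper-case only.
UPPER_ONLY = r"\b[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"
# Proposed fix: add a-z to every letter class.
MIXED_CASE = r"\b[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}\b"

def matches(pattern, value, flags=0):
    """True if the pattern is found anywhere in value (search semantics)."""
    return re.search(pattern, value, flags) is not None

for address in ["JOHN@EXAMPLE.COM", "john@example.com"]:
    print(address,
          matches(UPPER_ONLY, address),
          matches(MIXED_CASE, address),
          matches(UPPER_ONLY, address, re.IGNORECASE))
```

The upper-case-only pattern rejects `john@example.com`, while both the widened classes and the case-insensitive flag accept it, so the fix does not require upper-casing the property value.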

cds replied on Thursday, May 25, 2006

Hey Rocky

From what I know of regex expressions that should work fine.

Don't get too worried about them - they're very elegant, but the syntax is hard to remember. I used to teach them to students a dozen or so years ago and I still have to look them up; it's always trial and error to get them right unless you're using them daily.

Craig

tetranz replied on Thursday, May 25, 2006

Here's what I've been happily using for a few years.

(([A-Za-z0-9]+_+)|([A-Za-z0-9]+\-+)|([A-Za-z0-9]+\.+)|([A-Za-z0-9]+\++))*[A-Za-z0-9]+@((\w+\-+)|(\w+\.))*\w{1,63}\.[a-zA-Z]{2,6}

I think it came from a regex recipe site. I've never really tried to understand it.

I have heard rumors and folklore about addresses of the form x@xx, which are theoretically possible if someone puts an MX record on a country's TLD. The above regex fails on those, but so far it hasn't been a problem. :)

Cheers
Ross
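A small check of Ross's pattern, with Python's `re` as a stand-in for .NET. The pattern has no `^`/`$` anchors, so this sketch uses `re.fullmatch` to require the whole string to be an address; how a given validation rule applies it (match anywhere vs. whole string) is an assumption you would need to confirm.

```python
import re

# Ross's pattern, copied verbatim and split across two raw strings for width.
ROSS = (r"(([A-Za-z0-9]+_+)|([A-Za-z0-9]+\-+)|([A-Za-z0-9]+\.+)|([A-Za-z0-9]+\++))*"
        r"[A-Za-z0-9]+@((\w+\-+)|(\w+\.))*\w{1,63}\.[a-zA-Z]{2,6}")

def is_valid(address):
    """Whole-string match, since the pattern itself is unanchored."""
    return re.fullmatch(ROSS, address) is not None

for address in ["john.smith@domain.com", "first+tag@mail.example.org", "x@xx"]:
    print(address, is_valid(address))
```

As Ross says, the `x@xx` form is rejected, because the domain part must contain at least one dot followed by a 2 to 6 letter suffix.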

zythra replied on Thursday, May 25, 2006

Here is one I've used and have written several tests against:

^([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$

It handles both upper and lower case, as well as numbers in any allowed position within an address. The following is a list of some variations that pass and fail with it:

Pass:
john_smith@domain.com
john.smith@domain.com.au

Fail:
john_smith.domain.com.au
john smith@domain.com.au
john_smith@domain
john_smith@

It's long and ugly, but I've never gotten a false positive or negative with it. 

Ross' looks like it would probably do a good job as well.

Dustin
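Dustin's pass/fail list can be turned into a tiny test harness. One assumption: Python's `re` rejects `[\w-\.]` as a bad character range, so the sketch rewrites it as `[\w.-]`, assuming the hyphen was meant as a literal (which is how it reads in the original pattern).

```python
import re

# Dustin's pattern with [\w-\.] rewritten as [\w.-] (literal hyphen assumed).
ZYTHRA = (r"^([\w.-]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))"
          r"([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$")

SHOULD_PASS = ["john_smith@domain.com", "john.smith@domain.com.au"]
SHOULD_FAIL = ["john_smith.domain.com.au", "john smith@domain.com.au",
               "john_smith@domain", "john_smith@"]

def check(pattern):
    """Return every address whose result disagrees with the expected lists."""
    wrong = [a for a in SHOULD_PASS if not re.match(pattern, a)]
    wrong += [a for a in SHOULD_FAIL if re.match(pattern, a)]
    return wrong

print(check(ZYTHRA))  # an empty list means every case behaved as listed
```

The bracketed branch of the pattern also accepts IP-literal domains such as `john@[192.168.0.1]`.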

Fabio replied on Thursday, May 25, 2006

I found this one:
^[a-zA-Z][\w\.-]*[a-zA-Z0-9]@[a-zA-Z0-9][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z]$

bye.
Fabio.
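One property of Fabio's pattern worth checking (sketched with Python's `re` in place of .NET): the local part must start with a letter and be at least two characters long, so single-character or digit-first local parts are rejected.

```python
import re

# Fabio's pattern, copied as posted and split across two raw strings.
FABIO = (r"^[a-zA-Z][\w\.-]*[a-zA-Z0-9]@"
         r"[a-zA-Z0-9][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z]$")

def is_valid(address):
    return re.match(FABIO, address) is not None

for address in ["john.smith@domain.com.au", "a@example.com", "1john@example.com"]:
    print(address, is_valid(address))
```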

pfeds replied on Friday, May 26, 2006

Well that would explain why my e-mail property was invalid when I tried that yesterday ;o)

Rocky, as you said, the following RegEx would solve the upper/lowercase issue:

\b[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}\b

The problem I find with RegExes that you find on the internet is that they are normally overly complex and long-winded. I *think* the above RegEx would resolve almost all e-mail addresses (I can't think of any exceptions to the rule).

A tool that I find very useful (and very usefully free) is the Sells Brothers RegexDesigner, which can be found here:

http://www.sellsbrothers.com/tools/#regexd

 

pfeds replied on Friday, May 26, 2006

In fact, this would be better:

^[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$

This is because the previous regex would validate something like this:

"John Doe@hotmail.com"

Because nothing anchors the match to the start and end of the string, it ignores the "John " and simply validates "Doe@hotmail.com".
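A sketch of the difference between the two versions, using Python's `re.search`, which (like .NET's `Regex.IsMatch`) succeeds if the pattern matches anywhere in the input. The last check shows a separate limit shared by both patterns: `{2,4}` rejects top-level domains longer than four letters, such as `.museum`.

```python
import re

WORD_BOUNDED = r"\b[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}\b"
ANCHORED = r"^[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$"

def found(pattern, value):
    # Match-anywhere semantics, as with Regex.IsMatch in .NET.
    return re.search(pattern, value) is not None

value = "John Doe@hotmail.com"
print(found(WORD_BOUNDED, value))  # True: matches the substring "Doe@hotmail.com"
print(found(ANCHORED, value))      # False: ^ and $ force the whole string to match
print(found(ANCHORED, "user@example.museum"))  # False: "museum" exceeds {2,4}
```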

 

Dave.Erwin replied on Friday, May 26, 2006

I found this on CodeProject at http://www.codeproject.com/csharp/rfc822validator.asp. I was concerned that some of the simpler expressions would miss something or not allow something. Since this purports to handle RFC822, I'm assuming it's pretty complete and will handle some of the possible oddball email addresses. He based his C# version on an expression by Jeffrey Friedl (author of Mastering Regular Expressions). I translated it to VB.NET and modified it slightly, since it seemed like StringBuilder would help.

I'm no regex guru so this expression is a little beyond me <g> but it has worked well here. Someday when I have time I'd like to tear it apart and see how it works. I haven't had any bad addresses show up and no complaints that a valid address couldn't be entered. That said I'm only using it internally at a very small company so "buyer beware".

I don't see a way to attach a file otherwise I'd send you a copy of the VB.NET version. I'd post the code in the message but it's about 300 lines which would be kind of annoying. If you'd like a copy I can email it to you.

I can't even begin to tell you how much help your CSLA books have been to me. Thanks for the effort in putting them out. I'm sure some days it would be easier just to develop CSLA for your own use and not bother with the books.

Thanks again.

Michael Hildner replied on Friday, June 02, 2006

I won't pretend to offer a solution, but this is a pretty good read: http://www.regular-expressions.info/email.html

It also claims to have a valid RFC 822 regex, although it's different from the one posted above. I like how the article explains why email regexes are a tradeoff, and why you may not want to target RFC 822 in the first place.

Mike

tetranz replied on Tuesday, June 06, 2006

Something that managed to bite me this morning on this subject is that you need to allow addresses with an apostrophe in the name for names like O'Reilly etc. Apparently it is legal but lots of systems reject it.

Ross
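A sketch of Ross's point using pfeds's anchored pattern (Python's `re` standing in for .NET). The second pattern is a hypothetical variant that simply adds `'` to the local-part character class; it is not a pattern anyone in the thread posted.

```python
import re

ANCHORED = r"^[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$"
# Hypothetical variant: same pattern with an apostrophe allowed before the @.
WITH_APOSTROPHE = r"^[A-Za-z0-9._%'-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$"

address = "o'reilly@example.com"
print(bool(re.match(ANCHORED, address)))         # False
print(bool(re.match(WITH_APOSTROPHE, address)))  # True
```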

SoftwareArchitect replied on Friday, June 09, 2006

What if I want to validate against a regex pattern (such as email) ONLY if the user supplied a value? In other words, my email property is NOT required, but if the user enters it, it should be validated.

This has got to be simple...do I need to create my own version of RegExMatch that ignores a zero length string?  Or should my email pattern specify that?  If so, how would I get that email expression to allow it?

Thanks,
Mike

xal replied on Friday, June 09, 2006

You could create a rule that checks the length: if it's zero, return true; if not, call the email validation rule...


Andrés
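The shape of Andrés's suggestion, as a language-neutral sketch in Python. In CSLA this would be a custom rule method whose exact signature isn't shown in this thread, so `email_rule` and `optional_email_rule` are hypothetical names.

```python
import re

EMAIL = r"^[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$"

def email_rule(value):
    """Hypothetical stand-in for the existing email validation rule."""
    return re.match(EMAIL, value) is not None

def optional_email_rule(value):
    """Hypothetical wrapper: an empty or missing value passes; otherwise delegate."""
    if value is None or len(value) == 0:
        return True
    return email_rule(value)

for value in ["", "mike@example.com", "not an email"]:
    print(repr(value), optional_email_rule(value))
```

Keeping the emptiness check in a wrapper leaves the email rule itself unchanged, so the same pattern can still be used for required fields.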

SoftwareArchitect replied on Monday, June 12, 2006

Thanks much.  That is exactly what I will do.

Mike

mcfin replied on Friday, September 22, 2006

Was reading this thread looking for a way to do the same thing.

If you add the following to your regular expression it will match on an empty string as well as the match you are looking for:

|^[\S\s]{0,0}$
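Because `{0,0}` matches zero characters, the added branch behaves like plain `^$`, which is what this sketch uses (Python's `re` standing in for .NET). Alternation has the lowest precedence, so each branch keeps its own anchors and the empty-string branch cannot loosen the email branch.

```python
import re

EMAIL = r"^[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$"
# Either a full email address or the empty string.
EMAIL_OR_EMPTY = EMAIL + r"|^$"

for value in ["", "mike@example.com", "not an email"]:
    print(repr(value), re.search(EMAIL_OR_EMPTY, value) is not None)
```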

 

Skafa replied on Friday, September 22, 2006

There's an official regex for email addresses, but it's kinda useless (around 6,300 characters). See http://www.regular-expressions.info/email.html (last paragraph).

pelinville replied on Friday, September 22, 2006

That is funny.

Adam replied on Wednesday, July 08, 2009

Hi peeps

Just a note on something I have noticed: when using RegExMatch with the default common email regex, the email field becomes mandatory. Is this correct, even for nullable email fields? I have updated mine as follows, but wondered whether this was the intended behavior.

ValidationRules.AddRule(CommonRules.RegExMatch, new CommonRules.RegExRuleArgs(SalesContactEmailProperty, @"^$|^[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$"));


RockfordLhotka replied on Wednesday, July 08, 2009

I'll be removing all "default" expressions from the next release of CSLA.

There are so many variations and opinions on expressions that it is clear to me that I can't win. And there's no value to me in trying - the RegEx rule lets you specify the expression directly, so there's simply no value at all in CSLA providing "default" expressions that (from what I can tell) most people find to be insufficient.

Clearly there's just no industry-wide consensus on how to validate anything :)

Copyright (c) Marimer LLC