Unique regex for first name and last name

I have a single input where users should enter their first name and surname, and I need to validate it with a regex. Here is the list of requirements:

  1. The name should start with a capital letter (not a space).
  2. There can't be runs of consecutive spaces.
  3. It must support names and surnames like the ones below (everyone should be able to enter their own name). Example:

    John Smith
    and
    Armirat Bair Hossan

  4. The last character must not be a space.

Please help,

At the moment I have a regex like

^\p{L}\[p{L} ,.'-]+$

but it rejects ALL input, which is not what I want.

Thanks for helping me

UPDATE:

CORRECT INPUT: 
"John Smith"
"Alberto del Muerto"

INCORRECT
"   John Smith   "
" John Smith"

Answer

You can use

^[\p{Lu}\p{M}][\p{L}\p{M},.'-]+(?: [\p{L}\p{M},.'-]+)*$

or

^\p{Lu}\p{M}*+(?:\p{L}\p{M}*+|[,.'-])++(?: (?:\p{L}\p{M}*+|[,.'-])++)*+$

See the regex demo and demo 2

Java declaration:

// Backslashes must be doubled inside a Java string literal.
if (str.matches("[\\p{Lu}\\p{M}][\\p{L}\\p{M},.'-]+(?: [\\p{L}\\p{M},.'-]+)*")) { ... }
// or if (str.matches("\\p{Lu}\\p{M}*+(?:\\p{L}\\p{M}*+|[,.'-])++(?: (?:\\p{L}\\p{M}*+|[,.'-])++)*+")) { ... }

The first regex breakdown:

  • ^ – start of string (not necessary with the matches() method)
  • [\p{Lu}\p{M}] – 1 uppercase Unicode letter or diacritic mark (\p{Lu} matches any uppercase Unicode letter, precomposed ones included, and \p{M} matches combining diacritics)
  • [\p{L}\p{M},.'-]+ – matches 1 or more Unicode letters, diacritics, or a ,, ., ' or - (if 1-letter names are valid, replace + with * at the end here)
  • (?: [\p{L}\p{M},.'-]+)* – 0 or more sequences of
    • a single literal space, followed by
    • [\p{L}\p{M},.'-]+ – 1 or more characters that are either Unicode letters, diacritics, commas, periods, apostrophes or -.
  • $ – end of string (not necessary with the matches() method)
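
As a quick check of the breakdown above, here is a minimal sketch that runs the first regex against the inputs from the question. The class name NameValidator and the helper isValidName are made up for this illustration; only Pattern and Matcher come from the JDK.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class NameValidator {
    // First regex from the answer; backslashes are doubled for the Java string literal.
    private static final Pattern NAME = Pattern.compile(
            "[\\p{Lu}\\p{M}][\\p{L}\\p{M},.'-]+(?: [\\p{L}\\p{M},.'-]+)*");

    static boolean isValidName(String input) {
        // matches() must consume the whole input, so ^ and $ are not needed.
        return NAME.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "John Smith", "Alberto del Muerto", "Armirat Bair Hossan", // expected: accepted
            "   John Smith   ", " John Smith", "John  Smith"           // expected: rejected
        };
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isValidName(s));
        }
    }
}
```

The double-space sample is rejected because each repetition of the group consumes exactly one space and then needs at least one name character.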

NOTE: Names sometimes contain curly apostrophes (‘ and ’); you can add them to the character classes.
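
One possible way to do that, sketched with the first regex: the two curly apostrophes are written as the escapes \u2018 and \u2019 so the source file encoding does not matter. The names CurlyApostropheNames, NAME_CHARS and isValidName are assumptions made for this example only.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class CurlyApostropheNames {
    // Character class from the first regex, extended with U+2018 and U+2019.
    private static final String NAME_CHARS = "[\\p{L}\\p{M},.'\u2018\u2019-]+";

    private static final Pattern NAME = Pattern.compile(
            "[\\p{Lu}\\p{M}]" + NAME_CHARS + "(?: " + NAME_CHARS + ")*");

    static boolean isValidName(String input) {
        return NAME.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Straight and curly apostrophes are both accepted by this variant.
        System.out.println(isValidName("Conan O'Brien"));
        System.out.println(isValidName("Conan O\u2019Brien"));
    }
}
```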

The 2nd regex is less efficient but more accurate, as it will only match diacritics after base letters. See more about matching Unicode letters at regular-expressions.info:

To match a letter including any diacritics, use \p{L}\p{M}*+.
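
To make the difference between the two regexes concrete, the sketch below feeds both a name that begins with a stray combining acute accent (U+0301) and a name where the accent follows its base letter. The class name StrictNameValidator and the constant names LOOSE and STRICT are invented for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical comparison class, not part of the original answer.
class StrictNameValidator {
    // First regex: a combining mark may appear anywhere a letter may.
    static final Pattern LOOSE = Pattern.compile(
            "[\\p{Lu}\\p{M}][\\p{L}\\p{M},.'-]+(?: [\\p{L}\\p{M},.'-]+)*");

    // Second regex: combining marks are only allowed right after a base letter.
    static final Pattern STRICT = Pattern.compile(
            "\\p{Lu}\\p{M}*+(?:\\p{L}\\p{M}*+|[,.'-])++(?: (?:\\p{L}\\p{M}*+|[,.'-])++)*+");

    public static void main(String[] args) {
        String decomposed = "E\u0301ric Dupont"; // E followed by a combining acute accent
        String strayMark = "\u0301ric Dupont";   // starts with the accent, no base letter

        // Both patterns accept the accent when it follows a base letter.
        System.out.println(LOOSE.matcher(decomposed).matches());
        System.out.println(STRICT.matcher(decomposed).matches());

        // Only the first pattern accepts a name that starts with a bare accent.
        System.out.println(LOOSE.matcher(strayMark).matches());
        System.out.println(STRICT.matcher(strayMark).matches());
    }
}
```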
