Regex

Ab 1    literals — letters, digits, and spaces match themselves

Character classes

.       any character except a line break [poor perf]
[Ab1]   any one character from the set
[a-z]   any one character in the range a to z
[0-9]   any one digit
[^x]    any one character except x (^ negates the class)
\s      whitespace char (negation is \S)
\d      single digit char (negation is \D)
\w      single word char (negation is \W)

Most special characters are literal inside a character class. Only ] and \, ^ at the start, and - between two characters need escaping.
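
A minimal sketch of the classes above. Python's `re` module and the sample strings are assumptions chosen for illustration; other flavors behave the same way for these basics.

```python
import re

# [Ab1] matches exactly one character out of the set.
set_hits = re.findall(r"[Ab1]", "Abc 123")

# [^0-9] matches any one character that is not a digit.
non_digits = re.findall(r"[^0-9]", "a1b2")

# \d and \w are shorthand classes; \D and \W negate them.
digits = re.findall(r"\d", "room 42")
words = re.findall(r"\w+", "foo_bar baz!")

# Inside a class, ] has to be escaped to be taken literally.
brackets = re.findall(r"[\[\]]", "a[0]")

print(set_hits, non_digits, digits, words, brackets)
```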

Quantifiers

*       zero or more [greedy]
+       one or more [greedy]
?       zero or one [greedy]
{2}     exactly 2 instances
{2,}    2 or more instances [greedy]
{2,5}   2 to 5 instances [greedy]
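
A short sketch of the quantifiers, again assuming Python's `re`. The helper name `is_short_code` and the sample inputs are made up for the example.

```python
import re

# {2,5} accepts 2 to 5 repetitions; fullmatch requires the whole string to match.
def is_short_code(text: str) -> bool:
    return re.fullmatch(r"\d{2,5}", text) is not None

# ? makes the preceding item optional.
spellings = re.findall(r"colou?r", "color colour colr")

# * allows zero repetitions, + requires at least one.
star_ok = re.fullmatch(r"ab*", "a") is not None
plus_ok = re.fullmatch(r"ab+", "a") is not None

print(is_short_code("42"), spellings, star_ok, plus_ok)
```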

Anchors

^       beginning of string (line if /m flag)
$       end of string (line if /m flag)
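
A sketch of how the multi-line flag changes the anchors, assuming Python, where the m flag is spelled `re.MULTILINE`. The sample text is invented.

```python
import re

text = "first line\nsecond line"

# Without the flag, ^ only matches at the very start of the string.
no_flag = re.findall(r"^\w+", text)

# With re.MULTILINE, ^ also matches right after each newline.
with_flag = re.findall(r"^\w+", text, re.MULTILINE)

# $ behaves the same way for line ends.
line_ends = re.findall(r"\w+$", text, re.MULTILINE)

print(no_flag, with_flag, line_ends)
```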

Grouping and capturing

( )     create a capturing group
(?: )   create a non-capturing group (groups without capturing)
(?<foo> ) create a named group
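
A sketch of the three group kinds, assuming Python's `re`. Note that Python spells a named group `(?P<foo> )` rather than `(?<foo> )`; the sample strings are invented.

```python
import re

# Two capturing groups, numbered from 1 in order of their opening parenthesis.
m = re.search(r"(\d{4})-(\d{2})", "released 2024-05")
year, month = m.group(1), m.group(2)

# (?: ) groups the alternation without adding a capture.
n = re.search(r"(?:jpg|png)-(\d+)", "png-7")
only_groups = n.groups()

# Named group, Python spelling.
p = re.search(r"(?P<year>\d{4})", "released 2024-05")
named = p.group("year")

print(year, month, only_groups, named)
```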

Back-references

\1      refer to the text matched by group 1 (some flavors use $1 in replacement strings)
\k<foo> refer to a named matched group
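
A sketch of back-references, assuming Python's `re`. Python spells the named form `(?P=foo)` instead of `\k<foo>`, and uses `\1` in replacement strings where some other flavors use `$1`. The sample strings are invented.

```python
import re

# \1 must match the same text that group 1 captured: finds a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "it was was fine")
repeated_word = doubled.group(1)

# Named back-reference: the closing quote must equal the opening quote.
quoted = re.search(r"(?P<q>['\"]).*?(?P=q)", "say 'hi' now")
quoted_text = quoted.group(0)

# Back-references in a replacement string swap the two captured parts.
swapped = re.sub(r"(\w+)@(\w+)", r"\2@\1", "user@host")

print(repeated_word, quoted_text, swapped)
```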

Look-ahead, Look-behind

b(?=c)  match b only if followed by c
(?<=c)b match b only if preceded by c
b(?!c)  match b only if not followed by c
(?<!c)b match b only if not preceded by c
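
A sketch of all four look-around forms, assuming Python's `re`; negative look-behind is written `(?<!c)b`. The sample text is invented, and the lists hold the index of each matched b.

```python
import re

text = "ab cb bc bd"  # b occurs at indexes 1, 4, 6 and 9

# b(?=c): b followed by c; the c itself is not part of the match.
followed = [m.start() for m in re.finditer(r"b(?=c)", text)]

# (?<=c)b: b preceded by c.
preceded = [m.start() for m in re.finditer(r"(?<=c)b", text)]

# b(?!c): b not followed by c.
not_followed = [m.start() for m in re.finditer(r"b(?!c)", text)]

# (?<!c)b: b not preceded by c.
not_preceded = [m.start() for m in re.finditer(r"(?<!c)b", text)]

print(followed, preceded, not_followed, not_preceded)
```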

Misc

\       escape any special character
o|r     o or r
\t      tab
\n      new-line
\r      carriage return
\b      word boundary: one side is \w and the other is \W or the start/end of the string (negation is \B)
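
A sketch of escaping, alternation, and word boundaries, assuming Python's `re` and invented sample strings.

```python
import re

# \. matches a literal dot; an unescaped . would also match the x in "2x5".
literal_dot = re.findall(r"\d\.\d", "1.5 2x5")

# | chooses between alternatives.
either = re.findall(r"cat|dog", "cat dog bird")

# \b keeps "cat" from matching inside "concatenate".
whole_word = re.findall(r"\bcat\b", "cat concatenate cat.")

# \t matches a tab character.
columns = re.split(r"\t", "a\tb\tc")

print(literal_dot, either, whole_word, columns)
```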

Flags

m       multi-line
i       case insensitive
g       global: find every match; each search resumes at the end of the previous match
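
A sketch of the flags, assuming Python's `re`, where i is `re.IGNORECASE` and m is `re.MULTILINE`. Python has no g flag: `re.search` stops at the first match, while `re.findall` and `re.finditer` continue after each previous match. The sample text is invented.

```python
import re

text = "Cat\ncat\nCAT"

# i flag: letter case is ignored.
any_case = re.findall(r"cat", text, re.IGNORECASE)

# Flags combine with |; here ^ matches at every line start, in any case.
line_starts = re.findall(r"^c\w+", text, re.IGNORECASE | re.MULTILINE)

# First match only, the equivalent of searching without g.
first = re.search(r"cat", text, re.IGNORECASE).group(0)

print(any_case, line_starts, first)
```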

Greedy, Lazy matching
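
A greedy quantifier takes as much text as it can while still letting the overall pattern match. Appending ? (as in *? or +?) makes it lazy, so it takes as little as it can. A minimal sketch, assuming Python's `re` and an invented sample string:

```python
import re

html = "<b>one</b> <b>two</b>"

# Greedy: .* runs to the last </b>, so there is a single long match.
greedy = re.findall(r"<b>.*</b>", html)

# Lazy: .*? stops at the first </b>, so each element matches separately.
lazy = re.findall(r"<b>.*?</b>", html)

print(greedy, lazy)
```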

Grammar

Each line under a rule name is one alternative.

expression = term
             term | expression

term = factor
       factor term

factor = atom
         atom metacharacter

atom = character
       .
       ( expression )
       [ characterclass ]
       [ ^ characterclass ]
       { min }
       { min , }
       { min , max }

characterclass = characterrange
                 characterrange characterclass

characterrange = begincharacter
                 begincharacter - endcharacter

begincharacter = character

endcharacter = character

character = anycharacterexceptmetacharacters
            \ anycharacterexceptspecialcharacters

metacharacter = ?
                * {=0 or more, greedy}
                *? {=0 or more, non-greedy}
                + {=1 or more, greedy}
                +? {=1 or more, non-greedy}
                ^ {=begin of line character}
                $ {=end of line character}
                $` {=the characters to the left of the match}
                $' {=the characters to the right of the match}
                $& {=the characters that are matched}
                $N {=the characters in Nth tag (if not on match side)}
                \S {=not a whitespace, [^ \t\n\r\f]}
                \U {=change characters to uppercase, until \E}

min = integer

max = integer

integer = digit
          digit integer

anycharacter = ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~
               0 1 2 3 4 5 6 7 8 9
               A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
               a b c d e f g h i j k l m n o p q r s t u v w x y z
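
The recursive shape of the grammar maps directly onto a recursive-descent recognizer. The sketch below is hypothetical and covers only a simplified subset (alternation, concatenation, the quantifiers ? * +, the dot, and parenthesized groups). It leaves out character classes, brace counts, escapes, and lazy quantifiers, and the name `is_valid` is made up for the example.

```python
# Simplified subset of the grammar handled here (an assumption, not the full grammar):
#   expression = term | term "|" expression
#   term       = factor | factor term
#   factor     = atom | atom quantifier
#   atom       = character | "." | "(" expression ")"
QUANTIFIERS = set("?*+")
SPECIAL = set("|()?*+.")


def is_valid(pattern: str) -> bool:
    pos = 0  # index of the next unread character

    def peek():
        return pattern[pos] if pos < len(pattern) else None

    def expression() -> bool:
        nonlocal pos
        if not term():
            return False
        if peek() == "|":
            pos += 1
            return expression()
        return True

    def term() -> bool:
        # One or more factors, ending at "|", ")" or the end of input.
        if not factor():
            return False
        while peek() is not None and peek() not in "|)":
            if not factor():
                return False
        return True

    def factor() -> bool:
        nonlocal pos
        if not atom():
            return False
        if peek() in QUANTIFIERS:
            pos += 1  # consume a single trailing quantifier
        return True

    def atom() -> bool:
        nonlocal pos
        ch = peek()
        if ch is None:
            return False
        if ch == "(":
            pos += 1
            if not expression() or peek() != ")":
                return False
            pos += 1
            return True
        if ch == "." or ch not in SPECIAL:
            pos += 1
            return True
        return False

    # Valid only if the whole pattern was consumed.
    return expression() and pos == len(pattern)


print(is_valid("a(b|c)*d"), is_valid("a(b"), is_valid("*a"))
```

Each grammar rule becomes one function, and each alternative becomes one branch, which is why the right-recursive form of the rules (term = factor term) is convenient for this style of parser.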

References

https://stackoverflow.com/questions/265457/regex-bnf-grammar