| $var =~ /regexp/options or $var =~ m/regexp/options | returns "true" if regexp is found in $var; options can be: i = ignore letter case; o = only expand scalar variables within regexp once, the first time it is executed. |
| $var =~ s/regexp/text/options | replaces the first occurrence of regexp with text; options can be any of the above, plus: e = interpret text as an expression; g = replace every occurrence. |
If the "$var =~" portion is omitted, the operator works on the default $_ variable.
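The operators and options above can be sketched in a few lines. This is a minimal illustration only; the sample strings and variable names ($found, $first, $all, $doubled, @hits) are invented for the example.

```perl
use strict;
use warnings;

my $var = "Hello World, hello Perl";

# Match: the i option ignores letter case.
my $found = ($var =~ /WORLD/i) ? 1 : 0;

# Substitute: without g, only the first occurrence is replaced.
(my $first = $var) =~ s/hello/Bye/i;

# With g, every occurrence is replaced.
(my $all = $var) =~ s/hello/Bye/gi;

# With e, the replacement text is evaluated as a Perl expression.
(my $doubled = "3 apples, 5 pears") =~ s/(\d+)/$1 * 2/ge;

# With no "$var =~" portion, the operator works on $_.
my @hits;
for ("apple", "banana", "cherry") {
    push @hits, $_ if /an/;
}

print "$found\n$first\n$all\n$doubled\n@hits\n";
```

The "(my $copy = $original) =~ s/.../.../" form copies the string first, so the original is left unchanged.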
| Single characters | ||
| x | matches the single character 'x' | |
| [xyz] | matches the single character 'x' or 'y' or 'z' | |
| [^xyz] | matches any single character except 'x' or 'y' or 'z' | |
| [a-z] | matches any single character that is in the range 'a' to 'z' inclusive | |
| . | matches any single character (except '\n') | |
| \d | matches any single digit | equivalent to [0-9] |
| \w | matches any single word character (alphanumeric or underscore) | equivalent to [a-zA-Z0-9_] |
| \s | matches any single whitespace character | equivalent to [ \r\t\n\f] |
| \D | matches any single non-digit | equivalent to [^0-9] |
| \W | matches any single non-word character (neither alphanumeric nor underscore) | equivalent to [^a-zA-Z0-9_] |
| \S | matches any single non-whitespace character | equivalent to [^ \r\t\n\f] |
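As a rough illustration of the single-character patterns in the table, the sketch below collects matches with the g option, which returns every match when used in list context. The test strings and variable names are made up for the example.

```perl
use strict;
use warnings;

my $word = "regexp";

# [aeiou] matches one listed character; [^aeiou] matches one character not listed.
my @vowels     = $word =~ /[aeiou]/g;    # e, e
my @consonants = $word =~ /[^aeiou]/g;   # r, g, x, p

# A range followed by a digit.
my $in_range = ("b2" =~ /[a-c]\d/) ? 1 : 0;

# . matches any single character except the newline.
my @dots = "a\nb" =~ /./g;               # a, b

# \s matches whitespace, so deleting every \s leaves only the \S characters.
(my $no_space = "a \t b\n") =~ s/\s//g;

# \w and \W split the characters of a string into two groups.
my @word_chars     = "x_1 + y-2" =~ /\w/g;
my @non_word_chars = "x_1 + y-2" =~ /\W/g;

print join("", @vowels), " ", join("", @consonants), " $no_space ",
      join("", @word_chars), " ", scalar(@non_word_chars), "\n";
```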
| Multiple characters | ||
| x? | matches zero or one 'x' characters | equivalent to x{0,1} |
| x* | matches zero or more 'x' characters | equivalent to x{0,} |
| x+ | matches one or more 'x' characters | equivalent to x{1,} |
| x{n} | matches exactly n 'x' characters | |
| x{n,} | matches n or more 'x' characters | |
| x{n,m} | matches between n and m 'x' characters, inclusive | |
By default, these patterns are "greedy": they match as many characters as possible. Add a ? suffix (e.g. x*?) to make them "lazy", so that they match as few characters as possible.
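The difference between greedy and lazy matching is easiest to see with a substitution, since the replaced text shows exactly how much was matched. The following is a small sketch with invented sample strings and variable names.

```perl
use strict;
use warnings;

my $html = "<b>one</b> and <b>two</b>";

# Greedy: .* runs to the last </b>, so the whole string is replaced.
(my $greedy = $html) =~ s/<b>.*<\/b>/X/;

# Lazy: .*? stops at the first </b>.
(my $lazy = $html) =~ s/<b>.*?<\/b>/X/;

# The same applies to counted repetition.
(my $most  = "aaaaa") =~ s/a{2,3}/-/;    # takes three 'a' characters
(my $least = "aaaaa") =~ s/a{2,3}?/-/;   # takes two 'a' characters

# u? makes the 'u' optional, so both spellings match.
my $optional = ("color" =~ /colou?r/ && "colour" =~ /colou?r/) ? 1 : 0;

print "$greedy\n$lazy\n$most\n$least\n$optional\n";
```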
| Anchors | ||
| ^xxx | "xxx" must be at the beginning of the string | |
| xxx$ | "xxx" must be at the end of the string | |
| \bxxx | "xxx" must be at the beginning of a word (boundary between \w and \W) | |
| xxx\b | "xxx" must be at the end of a word (boundary between \w and \W) | |
| \Bxxx | "xxx" must not be at the beginning of a word | |
| xxx\B | "xxx" must not be at the end of a word | |
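To make the anchors concrete, the sketch below counts how often "cat" matches in a sample string under each anchor. The string and the variable names are illustrative assumptions, not part of the original page.

```perl
use strict;
use warnings;

my $s = "cat concat catalog";

my $at_start = ($s =~ /^cat/) ? 1 : 0;   # the string begins with "cat"
my $at_end   = ($s =~ /cat$/) ? 1 : 0;   # the string ends with "catalog", not "cat"

# With g in list context, each match is returned, so the list size is the match count.
my @word_start = $s =~ /\bcat/g;     # "cat" and "catalog"
my @word_end   = $s =~ /cat\b/g;     # "cat" and "concat"
my @whole_word = $s =~ /\bcat\b/g;   # "cat" only
my @not_start  = $s =~ /\Bcat/g;     # "concat" only
my @not_end    = $s =~ /cat\B/g;     # "catalog" only

print "$at_start $at_end ", scalar(@word_start), " ", scalar(@word_end), " ",
      scalar(@whole_word), " ", scalar(@not_start), " ", scalar(@not_end), "\n";
```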
| Memory | ||
| xxx(yyy)zzz | The portion of the regular expression matching yyy is "memorised". |
The first parenthesised ("memorised") portion can be "recalled" later in the regular expression using \1. A second parenthesised portion can be recalled using \2, and so on.
The first parenthesised portion is also stored in the scalar variable $1, for use by later statements. A second parenthesised portion is stored in $2, and so on.
Finally, when the match is evaluated in list context, all parenthesised portions are returned as a list. For example:
    ($before, $after) = $var =~ /\s*(.*?)\s*=\s*(.*)/;
extracts the text (in $var) before and after the first equals sign, ignoring leading whitespace and any whitespace around the equals sign.
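The three ways of using memorised portions described above (the \1 backreference, the $1 and $2 variables, and the returned list) might look like this in practice. The input strings and the names $line, $swapped and $dup are assumptions made for the sketch.

```perl
use strict;
use warnings;

# List return: capture the text on either side of the first equals sign.
my $line = "  colour = dark red";
my ($before, $after) = $line =~ /\s*(.*?)\s*=\s*(.*)/;

# $1 and $2 hold the captures after a successful match.
my $swapped = "";
if ("John Smith" =~ /(\w+)\s+(\w+)/) {
    $swapped = "$2, $1";
}

# \1 recalls the first capture inside the same regexp: find a repeated word.
my ($dup) = "this is is a test" =~ /\b(\w+)\s+\1\b/;

print "$before|$after\n$swapped\n$dup\n";
```

Checking that the match succeeded before reading $1 and $2 matters, because a failed match leaves them holding values from an earlier successful match.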