Perl: Regular expressions
$var =~ /regexp/options or $var =~ m/regexp/options | returns "true" if regexp is found in $var; options can be: i = ignore letter case. o = expand scalar variables within regexp only once, the first time it is executed. |
$var =~ s/regexp/text/options | replaces the first occurrence of regexp with text; options can be any of the above, plus: e = interpret text as an expression. g = replace every occurrence. |
If the "$var =~" portion is omitted, the operator works on the default $_ variable.
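The operators and options above can be sketched as follows. This is a minimal illustration only; the sample strings and variable names ($line, $first, $all, $doubled, @hits) are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical sample data, for illustration only.
my $line = "Hello World, hello Perl";

# m// with the i option: case-insensitive match.
my $found = ($line =~ /HELLO/i) ? 1 : 0;

# s/// without g replaces only the first occurrence.
(my $first = $line) =~ s/hello/Goodbye/i;

# s/// with g (and i) replaces every occurrence.
(my $all = $line) =~ s/hello/Goodbye/gi;

# The e option evaluates the replacement text as a Perl expression.
(my $doubled = "3 apples and 5 pears") =~ s/(\d+)/$1 * 2/ge;

# With no "$var =~" portion, the match operator works on $_.
my @hits;
for ("alpha", "beta", "gamma") {
    push @hits, $_ if /^[ab]/;
}

print "$found\n$first\n$all\n$doubled\n@hits\n";
```

The "(my $copy = $original) =~ s/.../.../" form is used here so that the original string is left untouched; applying s/// directly to $line would modify it in place.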
Single characters | ||
x | matches the single character 'x' | |
[xyz] | matches the single character 'x' or 'y' or 'z' | |
[^xyz] | matches any single character except 'x' or 'y' or 'z' | |
[a-z] | matches any single character that is in the range 'a' to 'z' inclusive | |
. | matches any single character (except '\n') | |
\d | matches any single digit | equivalent to [0-9] |
\w | matches any single word character (letter, digit or underscore) | equivalent to [a-zA-Z0-9_] |
\s | matches any single whitespace character | equivalent to [ \r\t\n\f] |
\D | matches any single non-digit | equivalent to [^0-9] |
\W | matches any single non-word character (anything other than a letter, digit or underscore) | equivalent to [^a-zA-Z0-9_] |
\S | matches any single non-whitespace character | equivalent to [^ \r\t\n\f] |
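A short sketch of the single-character patterns in use. The sample string and variable names are assumptions chosen for the example, and the text is plain ASCII so that the "equivalent to" column applies as written.

```perl
use strict;
use warnings;

my $text = "Room 42b, Floor_3";    # hypothetical sample string

# In list context with the g option, a match returns every hit.
my @digits = $text =~ /\d/g;       # each single digit
my @vowels = $text =~ /[aeiou]/g;  # any one of the listed characters

# First character that is not a letter, digit or underscore.
my ($nonword) = $text =~ /(\W)/;

# Delete every whitespace character.
(my $squeezed = $text) =~ s/\s//g;

# Replace everything except digits with a dot.
(my $masked = $text) =~ s/[^0-9]/./g;

print "@digits\n@vowels\n[$nonword]\n$squeezed\n$masked\n";
```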
Multiple characters | ||
x? | matches zero or one 'x' characters | equivalent to x{0,1} |
x* | matches zero or more 'x' characters | equivalent to x{0,} |
x+ | matches one or more 'x' characters | equivalent to x{1,} |
x{n} | matches exactly n 'x' characters | |
x{n,} | matches n or more 'x' characters | |
x{n,m} | matches between n and m 'x' characters | |
By default, these patterns are "greedy" - they match as many characters as possible. Add a ? suffix (e.g. x*?) to make them "lazy" - match as few characters as possible. |
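The difference between greedy and lazy matching, and the counted forms, can be seen in a small sketch. The sample strings and variable names here are hypothetical.

```perl
use strict;
use warnings;

my $html = "<b>bold</b> and <i>italic</i>";   # hypothetical sample

# Greedy: .+ runs as far as it can, so the match ends at the LAST '>'.
my ($greedy) = $html =~ /<(.+)>/;

# Lazy: .+? stops as soon as it can, so the match ends at the FIRST '>'.
my ($lazy) = $html =~ /<(.+?)>/;

# {n,m}: between 3 and 5 digits, and nothing else in the string.
my $five_ok = ("12345" =~ /^\d{3,5}$/) ? 1 : 0;
my $two_ok  = ("12"    =~ /^\d{3,5}$/) ? 1 : 0;

# {2,3} is greedy as well: it takes three 'a' characters when available.
my ($run) = "aaaa" =~ /(a{2,3})/;

print "$greedy\n$lazy\n$five_ok $two_ok\n$run\n";
```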
Anchors | ||
^xxx | "xxx" must be at the beginning of the string | |
xxx$ | "xxx" must be at the end of the string | |
\bxxx | "xxx" must be at the beginning of a word (boundary between \w and \W) | |
xxx\b | "xxx" must be at the end of a word (boundary between \w and \W) | |
\Bxxx | "xxx" must not be at the beginning of a word | |
xxx\B | "xxx" must not be at the end of a word | |
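A brief sketch of the anchors, counting matches in a made-up string. The "() =" in the middle forces list context so that the number of matches can be assigned to a scalar; all names are assumptions for the example.

```perl
use strict;
use warnings;

my $s = "cat concatenate cat";   # hypothetical sample string

# ^ and $ tie the pattern to the start and end of the string.
my $starts = ($s =~ /^cat/)    ? 1 : 0;
my $ends   = ($s =~ /cat$/)    ? 1 : 0;
my $concat = ($s =~ /^concat/) ? 1 : 0;

# \b...\b: "cat" as a whole word only.
my $whole_words = () = $s =~ /\bcat\b/g;

# \B...\B: "cat" strictly inside a longer word.
my $inside = () = $s =~ /\Bcat\B/g;

print "$starts $ends $concat $whole_words $inside\n";
```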
Memory | ||
xxx(yyy)zzz | The portion of the regular expression matching yyy is "memorised". |
The first parenthesised ("memorised") portion can be "recalled" later in the regular expression using \1. A second parenthesised portion can be recalled using \2, and so on.
The first parenthesised portion is also stored in the scalar variable $1, for use by later statements. A second parenthesised portion is stored in $2, and so on.
Finally, in list context the match returns all parenthesised portions as a list. For example:
($before, $after) = $var =~ /\s*(.*?)\s*=\s*(.*)/;
extracts the text (in $var) before and after the first equals sign, dropping the whitespace around the equals sign.
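The three ways of using memorised portions can be sketched together. The input strings and variable names below are hypothetical; $1 and $2 are copied into ordinary variables straight away, since a later successful match overwrites them.

```perl
use strict;
use warnings;

# List context: the match returns the parenthesised portions.
my $var = "  colour = dark = blue";   # hypothetical input
my ($before, $after) = $var =~ /\s*(.*?)\s*=\s*(.*)/;

# \1 inside the pattern recalls what the first group matched,
# so this finds the first doubled word character.
my $repeated;
if ("hello wood" =~ /(\w)\1/) {
    $repeated = $1;    # $1 holds the first group after a successful match
}

# $1 and $2 hold the first and second groups respectively.
my ($year, $month);
if ("2024-06" =~ /(\d+)-(\d+)/) {
    ($year, $month) = ($1, $2);
}

print "[$before] [$after]\n$repeated\n$year $month\n";
```

Because (.*?) is lazy and (.*) is greedy, the split happens at the first equals sign and any later ones stay in $after.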