- Regular Expressions
- Matches and replacements return a quantity.
- Don't use capture variables without checking that the match succeeded.
- XXX m// in list context gives a list of matches
- Common match flags
- Capture variables $1 and friends
- Avoid capture with ?:
- Allow easier reading with the /x switch
- Automatically quote your regexes with \Q and \E
- Execute code with /e flag to s///
- Know when to use study
- Handle multi-line regexes
- Use re => debug
Regular Expressions
Regular expressions are too large a topic to introduce here, but make sure that you understand these concepts. For tutorials, see perlrequick or perlretut. For the definitive documentation, see perlre.
Matches and replacements return a quantity.
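The m// and s/// operators return a value you can test. In scalar context, m// returns true if it matched and false if it did not, and s/// returns the number of replacements it made (or false if it made none). You can either use the value directly or check it for truth.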
- if ( $str =~ /Diggle|Shelley/ ) {
- print "We found Pete or Steve!\n";
- }
- if ( my $n = ($str =~ s/this/that/g) ) {
- print qq{Replaced $n occurrence(s) of "this"\n};
- }
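Because a scalar-context m// only reports success or failure, counting matches takes a little extra work. The following is a minimal sketch, using a made-up $text variable, of one common idiom: force list context with an empty list, then take the count.

```perl
use strict;
use warnings;

# $text is a hypothetical sample string, not part of the original examples.
my $text = 'this and this and this';

# A match in scalar context is only true or false.
my $matched = ( $text =~ /this/ ) ? 1 : 0;

# Assigning the list of matches to () and then to a scalar yields the count.
my $count = () = $text =~ /this/g;

# s///g returns the number of substitutions it made.
my $replaced = ( $text =~ s/this/that/g );

print "matched=$matched count=$count replaced=$replaced\n";
# prints "matched=1 count=3 replaced=3"
```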
Don't use capture variables without checking that the match succeeded.
The capture variables, $1, etc., are not valid unless the match succeeded, and they are not cleared by a failed match, either.
- # BAD: Not checked, but at least it "works".
- my $str = 'Perl 101 rocks.';
- $str =~ /(\d+)/;
- print "Number: $1"; # Prints "Number: 101";
- # WORSE: Not checked, and the result is not what you'd expect
- $str =~ /(Python|Ruby)/;
- print "Language: $1"; # Prints "Language: 101";
Instead, you must check the return value from the match:
- # GOOD: Check the results
- my $str = 'Perl 101 rocks.';
- if ( $str =~ /(\d+)/ ) {
- print "Number: $1"; # Prints "Number: 101";
- }
- if ( $str =~ /(Python|Ruby)/ ) {
- print "Language: $1"; # Never gets here
- }
XXX m// in list context gives a list of matches
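As a rough illustration of what this heading refers to, here is a sketch with a hypothetical sample string. In list context, m// returns the captured groups, or every match when /g is used, and an empty list on failure.

```perl
use strict;
use warnings;

my $str = 'Perl 101 rocks 202 times.';   # hypothetical sample string

# With capture groups and no /g, list context returns the captures.
my ( $word, $num ) = $str =~ /(\w+)\s+(\d+)/;

# With /g, list context returns every match.
my @numbers = $str =~ /(\d+)/g;

# On failure the list is empty, so the variable stays undefined.
my ($lang) = $str =~ /(Python|Ruby)/;

print "word=$word num=$num numbers=@numbers\n";
# prints "word=Perl num=101 numbers=101 202"
print "no language found\n" unless defined $lang;
```

Assigning the result to a list also sidesteps the stale $1 problem, since a failed match leaves the target undefined instead of reusing an old capture.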
Common match flags
/i
- case-insensitive match
/g
- match multiple times
- $var = "match match match";
- while ($var =~ /match/g) { $a++; }
- print "$a\n"; # prints 3
- $a = 0;
- $a++ foreach ($var =~ /match/g);
- print "$a\n"; # prints 3
/m
- ^ and $ change meaning
- Ordinarily, ^ means "start of string" and $ means "end of string"
- /m makes them mean start and end of line, respectively
- $str = "one\ntwo\nthree";
- @a = $str =~ /^\w+/g; # @a = ("one");
- @b = $str =~ /^\w+/gm; # @b = ("one","two","three")
- Use \A and \z for start and end of string regardless of /m
- \Z is the same as \z except that it also matches just before a final newline
/s
- . also matches newline
- $str = "one\ntwo\nthree\n";
- $str =~ /^(.{8})/s;
- print $1; # prints "one\ntwo\n"
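The difference between \A, \z, and \Z is easier to see side by side. This is a small sketch with made-up strings and variable names, not code from the original page.

```perl
use strict;
use warnings;

my $line = "three\n";   # hypothetical string ending in a newline

my $Z_matches = ( $line =~ /three\Z/ ) ? 1 : 0;   # \Z tolerates the final newline
my $z_matches = ( $line =~ /three\z/ ) ? 1 : 0;   # \z demands the true end of string

my $text = "one\ntwo\nthree";
my @with_caret = $text =~ /^\w+/gm;     # ^ matches at every line start under /m
my @with_A     = $text =~ /\A\w+/gm;    # \A still matches only at the very start

print "Z=$Z_matches z=$z_matches\n";    # prints "Z=1 z=0"
print "caret: @with_caret\n";           # prints "caret: one two three"
print "A: @with_A\n";                   # prints "A: one"
```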
Capture variables $1 and friends
- Sets of capturing parentheses are stored in numeric variables
- Parentheses are numbered left to right, in the order of their opening parenthesis:
- my $str = "abc";
- $str =~ /(((a)(b))(c))/;
- print "1: $1 2: $2 3: $3 4: $4 5: $5\n";
- # prints: 1: abc 2: ab 3: a 4: b 5: c
- No upper limit on the number of capturing parentheses and variables
Avoid capture with ?:
- If an opening parenthesis is followed by ?:, the group will not be captured
- Useful if you don't want the matches to be saved
- my $str = "abc";
- $str =~ /(?:a(b)c)/;
- print "$1\n"; # prints "b"
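Non-capturing groups matter most when a group exists only for alternation or repetition. The sketch below, using a hypothetical $name string, shows how (?:...) keeps the wanted capture in $1 and keeps the returned list short.

```perl
use strict;
use warnings;

my $name = 'Ms. Lovelace';   # hypothetical sample string

# Both groups capture, so the title takes up the first slot.
my @capturing = $name =~ /(Mr|Ms)\. (\w+)/;

# The alternation is grouped but not captured; only the surname is returned.
my @non_capturing = $name =~ /(?:Mr|Ms)\. (\w+)/;

print "capturing: @capturing\n";           # prints "capturing: Ms Lovelace"
print "non-capturing: @non_capturing\n";   # prints "non-capturing: Lovelace"
```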
Allow easier reading with the /x switch
- If you're doing something tricky with a regex, comment it.
- You can do this with the /x flag.
This ugly behemoth
- my ($num) = $ARGV[0] =~ m/^\+?((?:(?<!\+)-)?(?:\d*\.)?\d+)$/x;
is more readable with whitespace and comments, as allowed by the /x flag.
- my ($num) =
- $ARGV[0] =~ m/^ \+? # An optional plus sign, to be discarded
- ( # Capture...
- (?:(?<!\+)-)? # a negative sign, if there's no plus behind it,
- (?:\d*\.)? # optional digits followed by a decimal point,
- \d+ # then one or more digits.
- )$/x;
- Whitespace and comments are stripped unless escaped.
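The flip side of /x is that a space you actually want to match has to be escaped or put inside a character class. A minimal sketch, with a made-up $phrase and flag variables:

```perl
use strict;
use warnings;

my $phrase = 'hello world';   # hypothetical sample string

# The bare space is ignored under /x, so this pattern is really "helloworld".
my $plain = ( $phrase =~ /hello world/x ) ? 1 : 0;

# A backslash-escaped space is kept.
my $escaped = ( $phrase =~ /hello\ world/x ) ? 1 : 0;

# A space inside a bracketed character class is kept as well.
my $in_class = ( $phrase =~ /hello [ ] world/x ) ? 1 : 0;

print "plain=$plain escaped=$escaped in_class=$in_class\n";
# prints "plain=0 escaped=1 in_class=1"
```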
Automatically quote your regexes with \Q and \E
- Automatically escapes regex metacharacters
- Won't escape dollar signs: variables are interpolated first, so a literal $ cannot be written inside \Q...\E
- my $num = '3.1415';
- print "ok 1\n" if $num =~ /\Q3.14\E/;
- $num = '3X1415';
- print "ok 2\n" if $num =~ /\Q3.14\E/;
- print "ok 3\n" if $num =~ /3.14/;
prints
- ok 1
- ok 3
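In practice \Q is mostly used around an interpolated variable, not a literal. The sketch below assumes a hypothetical $needle holding text that should be matched literally, and also shows quotemeta, the function form of \Q.

```perl
use strict;
use warnings;

my $needle = 'a.b*c';   # hypothetical text containing regex metacharacters

# Quoted: the dot and star are matched literally.
my $safe = ( 'xx a.b*c yy' =~ /\Q$needle\E/ ) ? 1 : 0;

# Unquoted: the dot and star keep their regex meaning, so this matches too.
my $loose = ( 'aXbbbc' =~ /$needle/ ) ? 1 : 0;

# Quoted again: the same string no longer matches.
my $strict = ( 'aXbbbc' =~ /\Q$needle\E/ ) ? 1 : 0;

# quotemeta returns the escaped string itself.
my $quoted = quotemeta $needle;

print "safe=$safe loose=$loose strict=$strict\n";   # prints "safe=1 loose=1 strict=0"
print "$quoted\n";                                  # prints "a\.b\*c"
```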
Execute code with /e flag to s///
- Treats the replacement side of s/// as Perl code and uses its result as the replacement string
- my $str = "AbCdE\n";
- $str =~ s/(\w)/lc $1/eg;
- print $str; # prints "abcde"
- Use $1 and friends if necessary
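The replacement code can be any expression, not just a single function call. A short sketch with made-up sample data:

```perl
use strict;
use warnings;

my $prices = 'apple 3, pear 10';   # hypothetical sample data

# Copy first, then substitute on the copy; /e evaluates "$1 * 2" as code.
( my $doubled = $prices ) =~ s/(\d+)/$1 * 2/eg;

# Function calls with arguments work too.
( my $padded = $prices ) =~ s/(\d+)/sprintf '%03d', $1/eg;

print "$doubled\n";   # prints "apple 6, pear 20"
print "$padded\n";    # prints "apple 003, pear 010"
```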
Know when to use study
study is not helpful in the vast majority of cases, and in Perl 5.16 and later it does nothing at all. In older versions, all it does is make a table of where the first occurrence of each of the 256 possible byte values is in the string. This means that if you have a 1,000-character string and you search it for many patterns that begin with a constant character, the matcher can jump right to that character. For example:
"This is a very long [… 900 characters skipped…] string that I have here, ending at position 1000"
Now, if you are matching this against the regex /Icky/, the matcher will try to find the first letter "I" that matches. That may take scanning through the first 900+ characters until you get to it. But what study does is build a table of the 256 possible bytes and where they first appear, so that in this case, the scanner can jump right to that position and start matching.
Handle multi-line regexes
Use re => debug
- -Mre=debug
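The same debugging output is available from inside a script with the re pragma, which is lexically scoped. This is a minimal sketch with a made-up match; the trace of how the regex is compiled and executed goes to STDERR.

```perl
use strict;
use warnings;

my $matched;
{
    # Only regexes compiled inside this block are traced.
    use re 'debug';
    $matched = ( 'Perl 101' =~ /(\d+)/ ) ? 1 : 0;
}

# Outside the block, matching is back to normal and silent.
print "matched=$matched\n";
```

Keeping the pragma inside a small block limits the (often very long) trace to the one regex under investigation.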
Want to contribute?
Submit a PR to github.com/petdance/perl101