Monday, April 21, 2014

Saving a variable number of matches from a Perl regex

Intuitively, it seems like the regex /(.)+/ should return a list of all the characters in a string.  It's a capture group with a quantifier allowing it to repeat, after all.  But in fact it returns only the last character matched: each repetition of the group overwrites the previous capture.
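A quick way to see this for yourself, as a minimal sketch (the variable name @captures is just for illustration):

```perl
use strict;
use warnings;

# A quantified group is still a single group, so the match returns one value.
my @captures = ('asdf' =~ /(.)+/);

print scalar(@captures), "\n";   # how many values the match returned
print "$captures[0]\n";          # what the group held after its final repetition
```

The group matched four times, but only the final repetition is kept, so the list has a single element.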

My best guess is that it's done this way to preserve the rule that you can count left parens to figure out which position a captured value will be returned in.

To get around this, remove the quantifier and use /g.  For example, in list context 'asdf' =~ /(.)/g returns ('a', 's', 'd', 'f') as expected.
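Here is a small sketch of that, showing both list context and the scalar-context loop form of /g (the names @chars, $text, and @seen are illustrative only):

```perl
use strict;
use warnings;

# List context: /g returns the captures from every match at once.
my @chars = ('asdf' =~ /(.)/g);
print join(',', @chars), "\n";

# Scalar context: each evaluation resumes where the previous match ended.
my $text = 'asdf';
my @seen;
while ($text =~ /(.)/g) {
    push @seen, $1;
}
print join(',', @seen), "\n";
```

Both forms collect the same four characters; the loop version is handy when each match needs extra processing.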

The downside is that I can't use fancier expressions the way I had hoped.  What I originally wanted was for an expression like 'a12345' =~ /^([a-z])(\d)+$/ to return ('a', '1', '2', '3', '4', '5').  Instead it returns ('a', '5').  I don't see an obvious way to get multiple matches like that with /g, since /g has to match the entire regex each time.

The best I can come up with for that scenario is to split into multiple regexes and handle the input in chunks:

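$_ = 'a12345';
# Strip the leading letter; bail out if it is missing so $1 is never stale.
s/^([a-z])// or die "no leading letter";
my $letter = $1;
# What remains in $_ is the digits; /g in list context returns each one.
my @numbers = /(\d)/g;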
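One other possibility, sketched here under the assumption that the digits always form one contiguous run: match the whole string once, capture the run as a single group, and split it afterwards.  The names $input and $digits are mine, purely for illustration.

```perl
use strict;
use warnings;

my $input = 'a12345';

# Validate the whole string in one pass; the digit run is a single capture.
my ($letter, $digits) = $input =~ /^([a-z])(\d+)$/
    or die "unexpected input: $input";

# An empty pattern makes split break the string into individual characters.
my @numbers = split //, $digits;

print join(',', $letter, @numbers), "\n";
```

Unlike the chunked version, this keeps the ^...$ anchoring, so trailing junk is rejected rather than silently ignored.  It is still two steps, though, not a single regex.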

Is there a better solution I'm missing?
