

In a dynamic expression, you can make the pattern that you want `regexp` to
match dependent on the content of the input string. In this way, you
can more closely match varying input patterns in the string being
parsed. You can also use dynamic expressions in replacement strings
for use with the `regexprep` function. This gives
you the ability to adapt the replacement text to the parsed input.

You can include any number of dynamic expressions in the `match_expr` or `replace_expr` arguments
of these commands:

    regexp(string, match_expr)
    regexpi(string, match_expr)
    regexprep(string, match_expr, replace_expr)

As an example of a dynamic expression, the following `regexprep` command
correctly replaces the term `internationalization` with
its abbreviated form, `i18n`. However, to use it
on a different term such as `globalization`, you
have to use a different replacement expression:

    match_expr = '(^\w)(\w*)(\w$)';
    replace_expr1 = '$118$3';
    regexprep('internationalization', match_expr, replace_expr1)

    ans = i18n

    replace_expr2 = '$111$3';
    regexprep('globalization', match_expr, replace_expr2)

    ans = g11n

Using a dynamic expression `${num2str(length($2))}` enables
you to base the replacement expression on the input string so that
you do not have to change the expression each time. This example uses
the dynamic replacement syntax `${cmd}`.

    match_expr = '(^\w)(\w*)(\w$)';
    replace_expr = '$1${num2str(length($2))}$3';
    regexprep('internationalization', match_expr, replace_expr)

    ans = i18n

    regexprep('globalization', match_expr, replace_expr)

    ans = g11n
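Because the replacement expression no longer hard-codes the letter count, the same call can be reused for any word. As a minimal sketch, the call can be wrapped in an anonymous function and applied to a list of words. The handle name `abbreviate` and the extra word `localization` are illustrative assumptions, not part of the original example.

```matlab
% Hypothetical helper: wraps the regexprep call with the dynamic replacement.
match_expr   = '(^\w)(\w*)(\w$)';
replace_expr = '$1${num2str(length($2))}$3';
abbreviate   = @(word) regexprep(word, match_expr, replace_expr);

% Apply the same expression to several words without editing it.
words  = {'internationalization', 'globalization', 'localization'};
abbrev = cellfun(abbreviate, words, 'UniformOutput', false);
% abbrev should be {'i18n', 'g11n', 'l10n'}
```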

When parsed, a dynamic expression must correspond to a complete,
valid regular expression. In addition, dynamic match expressions that
use the backslash escape character (`\`) require
two backslashes: one for the initial parsing of the expression, and
one for the complete match. The parentheses that enclose dynamic expressions
do *not* create a capturing group.
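As a hedged illustration of both points, the sketch below uses `\\w` inside a dynamic match expression and wraps the dynamic expression in a second pair of parentheses so that its match is captured as a token. The input string `'5abcde'` is an assumption chosen for illustration.

```matlab
str = '5abcde';   % illustrative input: a count followed by that many word characters

% \\w is written with two backslashes inside (??...): one is consumed when
% the dynamic expression is parsed, leaving \w for the final match.
% The outer parentheses around (??\\w{$1}) create the second token, because
% the parentheses of (??...) itself do not capture.
tok = regexp(str, '^(\d+)((??\\w{$1}))', 'tokens', 'once');
% tok{1} should be '5' and tok{2} should be 'abcde'
```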

There are three forms of dynamic expressions that you can use in match expressions, and one form for replacement expressions, as described in the following sections.

The `(??expr)` operator parses expression `expr`,
and inserts the results back into the match expression. MATLAB® then
evaluates the modified match expression.

Here is an example of the type of expression that you can use with this operator:

    str = {'5XXXXX', '8XXXXXXXX', '1X'};
    regexp(str, '^(\d+)(??X{$1})$', 'match', 'once');

The purpose of this particular command is to locate a series
of `X` characters in each of the strings stored in
the input cell array. Note however that the number of `X`s
varies in each string. If the count did not vary, you could use the
expression `X{n}` to indicate that you want to match `n` of
these characters. But, a constant value of `n` does
not work in this case.

The solution used here is to capture the leading count number
(e.g., the `5` in the first string of the cell array)
in a token, and then to use that count in a dynamic expression. The
dynamic expression in this example is `(??X{$1})`,
where `$1` is the value captured by the token `\d+`.
The operator `{$1}` makes a quantifier of that token
value. Because the expression is dynamic, the same pattern works on
all three of the input strings in the cell array. With the first input
string, `regexp` looks for five `X` characters;
with the second, it looks for eight, and with the third, it looks
for just one:

    regexp(str, '^(\d+)(??X{$1})$', 'match', 'once')

    ans =
        '5XXXXX'    '8XXXXXXXX'    '1X'
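The same expression also rejects strings whose leading count disagrees with the number of `X` characters that follow. The following is a small sketch of that behavior. The inputs `'3XXX'` and `'3XX'` and the variable names `good` and `bad` are illustrative assumptions.

```matlab
expr = '^(\d+)(??X{$1})$';

% The count 3 agrees with the three X characters, so the whole string matches.
good = regexp('3XXX', expr, 'match', 'once');

% The count 3 disagrees with the two X characters, so no match is expected
% and regexp returns an empty result.
bad = regexp('3XX', expr, 'match', 'once');
```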

MATLAB uses the `(??@cmd)` operator to
include the results of a MATLAB command in the match expression.
This command must return a string that can be used within the match
expression.

For example, use the dynamic expression `(??@fliplr($1))` to
locate a palindrome string, "Never Odd or Even", that
has been embedded into a larger string.

First, create the input string. Make sure that all letters are lowercase, and remove all nonword characters.

    str = lower(...
        'Find the palindrome Never Odd or Even in this string');
    str = regexprep(str, '\W*', '')

    str = findthepalindromeneveroddoreveninthisstring

Locate the palindrome within the string using the dynamic expression:

    palstr = regexp(str, '(.{3,}).?(??@fliplr($1))', 'match')

    palstr = 'neveroddoreven'

The dynamic expression reverses the order of the letters that
make up the string, and then attempts to match as much of the reversed-order
string as possible. This requires a dynamic expression because the
value for `$1` relies on the value of the token `(.{3,})`.

Dynamic expressions in MATLAB have access to the currently active workspace. This means that you can change any of the functions or variables used in a dynamic expression just by changing variables in the workspace. Repeat the last command of the example above, but this time define the function to be called within the expression using a function handle stored in the base workspace:

    fun = @fliplr;
    palstr = regexp(str, '(.{3,}).?(??@fun($1))', 'match')

    palstr = 'neveroddoreven'
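Any command that returns a string usable in the match expression can take the place of `fliplr`. As a hedged sketch, the following uses `upper` to find a lowercase word that is immediately followed by its own uppercase form. The input string and the variable name `hit` are assumptions made for illustration.

```matlab
% Illustrative input: one word pair that qualifies and one that does not.
str = 'abc ABC, abc abd';

% $1 holds the lowercase word; (??@upper($1)) inserts its uppercase form
% into the match expression, which must then match the text that follows.
hit = regexp(str, '([a-z]+) (??@upper($1))', 'match');
% hit should be {'abc ABC'}; 'abc abd' does not qualify
```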

The `(?@cmd)` operator specifies a MATLAB command
that `regexp` or `regexprep` is to run while parsing the
overall match expression. Unlike the other dynamic expressions in MATLAB,
this operator does not alter the contents of the expression it is
used in. Instead, you can use this functionality to get MATLAB to
report just what steps it is taking as it parses the contents of one
of your regular expressions. This functionality can be useful in diagnosing
your regular expressions.

The following example parses a word for zero or more characters followed by two identical characters followed again by zero or more characters:

    regexp('mississippi', '\w*(\w)\1\w*', 'match')

    ans = 'mississippi'

To track the exact steps that MATLAB takes in determining
the match, the example inserts a short script `(?@disp($1))` in
the expression to display the characters that finally constitute the
match. Because the example uses greedy quantifiers, MATLAB attempts
to match as much of the string as possible. So, even though MATLAB finds
a match toward the beginning of the string, it continues to look for
more matches until it arrives at the very end of the string. From
there, it backs up through the letters `i` then `p` and
the next `p`, stopping at that point because the
match is finally satisfied:

    regexp('mississippi', '\w*(\w)(?@disp($1))\1\w*', 'match')

    i
    p
    p

    ans = 'mississippi'

Now try the same example again, this time making the first quantifier
lazy (`*?`). Again, MATLAB makes the same match:

    regexp('mississippi', '\w*?(\w)\1\w*', 'match')

    ans = 'mississippi'

But by inserting a dynamic script, you can see that this time, MATLAB has matched the string quite differently. In this case, MATLAB uses the very first match it can find, and does not even consider the rest of the string:

    regexp('mississippi', '\w*?(\w)(?@disp($1))\1\w*', 'match')

    m
    i
    s

    ans = 'mississippi'
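Instead of printing each candidate token, the command can record it in a workspace variable for later inspection. This is only a sketch: the variable names `greedyTrace` and `lazyTrace` are hypothetical, and the expected values are inferred from the traces displayed above rather than from a separate run.

```matlab
% Hypothetical variables that collect each candidate token as it is tried.
greedyTrace = '';
regexp('mississippi', ...
    '\w*(\w)(?@greedyTrace = [greedyTrace $1];)\1\w*', 'match');

lazyTrace = '';
regexp('mississippi', ...
    '\w*?(\w)(?@lazyTrace = [lazyTrace $1];)\1\w*', 'match');

% Going by the traces displayed above, greedyTrace would be 'ipp'
% and lazyTrace would be 'mis'.
```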

To demonstrate how versatile this type of dynamic expression
can be, consider the next example that progressively assembles a cell
array as MATLAB iteratively parses the input string. The `(?!)` operator
found at the end of the expression is an empty negative lookahead, which
always fails and so forces a failure at each iteration. This forced failure
is necessary if you want to trace the steps that MATLAB is taking
to resolve the expression.

MATLAB makes a number of passes through the input string,
each time trying another combination of letters to see if a better fit
than the last match can be found. On any passes in which no matches are
found, the test results in an empty string. The dynamic script `(?@if(~isempty($&)))` serves
to omit these strings from the `matches` cell array:

    matches = {};
    expr = ['(Euler\s)?(Cauchy\s)?(Boole)?(?@if(~isempty($&)),' ...
        'matches{end+1}=$&;end)(?!)'];
    regexp('Euler Cauchy Boole', expr);
    matches

    matches =
        'Euler Cauchy Boole'    'Euler Cauchy '    'Euler '    'Cauchy Boole'    'Cauchy '    'Boole'

The operators `$&` (or the equivalent `$0`), `` $` ``,
and `$'` refer to that part of the input string that
is currently a match, all characters that precede the current match,
and all characters to follow the current match, respectively. These
operators are sometimes useful when working with dynamic expressions,
particularly those that employ the `(?@cmd)` operator.

This example parses the input string looking for the letter `g`.
At each iteration through the string, `regexp` compares
the current character with `g`, and not finding it,
advances to the next character. The example tracks the progress of
the scan through the string by marking the current location being parsed
with a `^` character.

(The `` $` `` and `$'` operators
capture the parts of the string that precede and follow the current
parsing location. You need two single-quotation marks (`$''`)
to express the sequence `$'` when it appears
within a string.)

    str = 'abcdefghij';
    expr = '(?@disp(sprintf(''starting match: [%s^%s]'',$`,$'')))g';
    regexp(str, expr, 'once');

    starting match: [^abcdefghij]
    starting match: [a^bcdefghij]
    starting match: [ab^cdefghij]
    starting match: [abc^defghij]
    starting match: [abcd^efghij]
    starting match: [abcde^fghij]
    starting match: [abcdef^ghij]

The `${cmd}` operator modifies the contents
of a regular expression replacement string, making this string adaptable
to parameters in the input string that might vary from one use to
the next. As with the other dynamic expressions used in MATLAB,
you can include any number of these expressions within the overall
replacement expression.

In the `regexprep` call shown here, the replacement
string is `'${convertMe($1,$2)}'`. In this case,
the entire replacement string is a dynamic expression:

    regexprep('This highway is 125 miles long', ...
        '(\d+\.?\d*)\W(\w+)', '${convertMe($1,$2)}');

The dynamic expression tells MATLAB to execute a function
named `convertMe` using the two tokens `(\d+\.?\d*)` and `(\w+)`,
derived from the string being matched, as input arguments in the call
to `convertMe`. The replacement string requires a
dynamic expression because the values of `$1` and `$2` are
generated at runtime.

The following example defines a function file named `convertMe.m` that
converts measurements from imperial units to metric.

    function valout = convertMe(valin, units)
    switch(units)
        case 'inches'
            fun = @(in)in .* 2.54;
            uout = 'centimeters';
        case 'miles'
            fun = @(mi)mi .* 1.6093;
            uout = 'kilometers';
        case 'pounds'
            fun = @(lb)lb .* 0.4536;
            uout = 'kilograms';
        case 'pints'
            fun = @(pt)pt .* 0.4731;
            uout = 'litres';
        case 'ounces'
            fun = @(oz)oz .* 28.35;
            uout = 'grams';
    end
    val = fun(str2num(valin));
    valout = [num2str(val) ' ' uout];
    end

At the command line, call the `convertMe` function
from `regexprep`, passing in values for the quantity
to be converted and the name of the imperial unit:

    regexprep('This highway is 125 miles long', ...
        '(\d+\.?\d*)\W(\w+)', '${convertMe($1,$2)}')

    ans = This highway is 201.1625 kilometers long

    regexprep('This pitcher holds 2.5 pints of water', ...
        '(\d+\.?\d*)\W(\w+)', '${convertMe($1,$2)}')

    ans = This pitcher holds 1.1828 litres of water

    regexprep('This stone weighs about 10 pounds', ...
        '(\d+\.?\d*)\W(\w+)', '${convertMe($1,$2)}')

    ans = This stone weighs about 4.536 kilograms
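A dynamic replacement does not have to be the entire replacement string, and it does not need a user-defined function file. As a minimal sketch, the following combines the built-in `upper` function with an ordinary token reference to capitalize each word. The input string and the variable name `out` are illustrative assumptions.

```matlab
str = 'hello world';   % illustrative input

% $1 is the first letter of each word and $2 is the rest of the word.
% Only $1 passes through the dynamic expression; $2 is inserted unchanged.
out = regexprep(str, '(\w)(\w*)', '${upper($1)}$2');
% out should be 'Hello World'
```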

As with the `(??@ )` operator discussed in
an earlier section, the `${ }` operator has access
to variables in the currently active workspace. The following `regexprep` command
uses the array `A` defined in the base workspace:

    A = magic(3)

    A =
         8     1     6
         3     5     7
         4     9     2

    regexprep('The columns of matrix _nam are _val', ...
        {'_nam', '_val'}, ...
        {'A', '${sprintf(''%d%d%d '', A)}'})

    ans = The columns of matrix A are 834 159 672
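Workspace variables can also be combined with captured tokens inside the same dynamic expression. The sketch below multiplies every number in a string by a factor held in a variable. The variable name `scale` and the input string are hypothetical, and the example assumes it runs in the workspace where `scale` is defined.

```matlab
scale = 2;   % hypothetical workspace variable read by the dynamic expression

% Each captured number is converted to a double, multiplied by scale,
% and converted back to text for the replacement.
out = regexprep('3 apples and 10 pears', '(\d+)', ...
    '${num2str(str2double($1) * scale)}');
% out should be '6 apples and 20 pears'
```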
