Extract regexp tokens with regexpPattern

Jan Kappen on 29 Feb 2024
Commented: Jan Kappen on 29 Feb 2024
With regexp I could extract the tokens of my capture groups via
regexp("abcd3e", "\w+(\d)+\w", "tokens")
ans = 1×1 cell array
{["3"]}
The result is a cell array. With the new regexpPattern and extract functions, the return values usually are string (arrays) which is something I prefer.
Question: Is there an analogue of the above regexp call, something like extract("abcd3e", regexpPattern("\w+(\d)+\w"), "tokens")? This syntax does not work in R2023b, but is there a standard way to rewrite such patterns so that they return my tokens?
Thanks,
Jan
EDIT: This is just a toy example. I do not only want to extract digits, which could be done with digitsPattern. Ideally, I'd like to understand how to translate the regexps directly.
To show a more realistic example:
str = [
"42652Z_HEX"
"42652X"
"42652Y"
"42652Z"
"42652GYRO-X_HEX"
"42652GYRO-Y_HEX"
"42652GYRO-Z_HEX"
"42351Temp_HEX"
"42652Temp_HEX"
"42652GYRO-X"
"42652GYRO-Y"
"42652GYRO-Z"
"42351Temp"
"42652Temp"
];
res = string(regexp(str, "\d+(?:GYRO-)?([XYZ])?.*", "tokens"))
res = 14×1 string array
"Z"
"X"
"Y"
"Z"
"X"
"Y"
"Z"
""
""
"X"
"Y"
"Z"
""
""
% how to get the same result with matches and regexpPattern?
  2 Comments
Dyuman Joshi on 29 Feb 2024
Moved: Dyuman Joshi on 29 Feb 2024
If you just want to extract numbers between letters:
str = "abcd3e57xyz";
out = extract(str, digitsPattern)
out = 2×1 string array
"3"
"57"
Jan Kappen on 29 Feb 2024
Moved: Dyuman Joshi on 29 Feb 2024
Thanks for your answer.
No, I do not only want to extract numbers; that was a toy example. I'd like to translate the regexps that already exist into the new regexpPattern, if possible. The regexps might get more complicated than the one shown. I'll edit my question accordingly.

Answers (1)

the cyclist on 29 Feb 2024
I realize that this is not really an answer to your question, but I just wanted to make sure you are aware that one option is to wrap the string function around the regexp:
string(regexp("abcd3e fghi4j", "\w+(\d)+\w", "tokens"))
ans = 1×2 string array
"3" "4"
Also, if you are guaranteed to have only one match, you could do
regexp("abcd3e", "\w+(\d)+\w", "tokens","once")
ans = "3"
but that's somewhat fragile coding, I would say.
I'm not yet sure if there is a more "direct" way with more recent functions.
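A possible workaround, offered as a sketch rather than a confirmed equivalent: extract returns the whole match, not the capture groups, so the context around the token can be moved into lookaround assertions, leaving only the former token as the match. This assumes regexpPattern accepts the same lookaround syntax as regexp, and it rewrites the toy pattern (one digit with a word character on each side) instead of translating it one-to-one.

```matlab
% Sketch only: assumes regexpPattern supports (?<=...) and (?=...) like regexp.
str = "abcd3e fghi4j";

% The lookbehind and lookahead check the surrounding word characters
% without consuming them, so the match itself is just the digit.
pat = regexpPattern("(?<=\w)\d(?=\w)");

% extract returns the matches as a string array, no cell unwrapping needed.
tok = extract(str, pat)
```

Patterns with several capture groups, or with optional groups as in the updated question, do not map onto this trick as easily, so regexp with "tokens" may still be the more practical tool there.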
  2 Comments
the cyclist on 29 Feb 2024
Your updated question clarifies that my answer is not what you are looking for, but I'll leave it here anyway. :-)
Jan Kappen on 29 Feb 2024
Thank you very much for your answer.
Yes, I updated the question to clarify a bit, sorry.
There were cases in the past where I could not cast to string; I'll need to check why. In fact, that's not a terrible solution, but I'm wondering how to use the new regexpPattern properly, and maybe I'm missing something.
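One plausible reason the cast can fail (an assumption, not something confirmed in this thread): when some element has no match, or more than one, the nested cell array returned by "tokens" is no longer uniform and string() may not be able to convert it. A minimal sketch that sidesteps this by taking only the first token of the first match for each element; the input strings here are made up for illustration.

```matlab
% Illustrative inputs; the last one deliberately has no match at all.
str  = ["42652Z_HEX"; "42652GYRO-X"; "42351Temp"; "no digits here"];
expr = "\d+(?:GYRO-)?([XYZ])?.*";

% Preallocate with "" so elements without a match keep an empty string.
tok = strings(size(str));
for k = 1:numel(str)
    % "once" returns the tokens of the first match only (empty if no match).
    t = regexp(str(k), expr, "tokens", "once");
    if ~isempty(t)
        tok(k) = string(t(1));   % first capture group of the first match
    end
end
tok
```

Note that this does not distinguish "no match" from "group did not participate"; both end up as "". If that distinction matters, the array could be preallocated with missing values instead.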

Release

R2023b
