PCRE -- Perl Compatible Regular Expressions

s243a
Posts: 2580
Joined: Tue 02 Sep 2014, 04:48
Contact:

PCRE -- Perl Compatible Regular Expressions

#1 Post by s243a »

In this thread, let's give some examples of interesting Perl compatible regular expressions and some tips on how to understand them.

We of course could use either perl or ssed (see post). Here's an example. Consider:

Code: Select all

a.so.b
I want to match some text that does not contain ".so.", followed by ".so.", followed by "b". Here is my attempt, which appears to work:

Code: Select all

^((?![.]so[.]).)*([.]so[.])(.*)$
https://regex101.com/r/aUa8Ml/1/
**This uses a negative lookahead. See:
https://www.regular-expressions.info/lookaround.htm
https://www.regular-expressions.info/refadv.html

Now say we don't know whether there will be a ".so.b". Then we might try this:

Code: Select all

^((?![.]so[.]).)*(?:([.]so[.])(.*))?$
https://regex101.com/r/aUa8Ml/2

but with the test string "ab" it doesn't work. It only matches "b"...well, sort of. The full match is "ab", but there is only one capture group, which is "b". So what am I missing?

Here are some related links:
https://stackoverflow.com/questions/752 ... y-is-not-x
https://unix.stackexchange.com/question ... -was-found
https://stackoverflow.com/questions/977 ... cter-group
https://www.perlmonks.org/?node_id=229044/
https://stackoverflow.com/questions/234 ... ng-pattern
Last edited by s243a on Sun 08 Mar 2020, 01:07, edited 2 times in total.
Find me on [url=https://www.minds.com/ns_tidder]minds[/url] and on [url=https://www.pearltrees.com/s243a/puppy-linux/id12399810]pearltrees[/url].

s243a

#2 Post by s243a »

Anyway, once you have a good grasp of Perl compatible regular expressions (PCRE), note that they can be used in several places. For instance, grep can use PCRE with the right option (-P in GNU grep), you can use ssed (see post) instead of sed, or you can write perl one-liners.

For example:

Code: Select all

# echo abc | perl -pe 's/(ab)/12/'
12c
See:
https://stackoverflow.com/questions/227 ... er-to-perl

I'm pretty sure puppy typically comes with a minimal amount of perl functionality out of the box, so I presume that you can do something like the above out of the box with puppy.

s243a

Re: PCRE -- Perl Compatible Regular Expressions

#3 Post by s243a »

s243a wrote:
Now say we don't know whether there will be a ".so.b". Then we might try this:

Code: Select all

^((?![.]so[.]).)*(?:([.]so[.])(.*))?$
https://regex101.com/r/aUa8Ml/2

but with the test string "ab" it doesn't work. It only matches "b"...well, sort of. The full match is "ab", but there is only one capture group, which is "b". So what am I missing?
The regular expression tester (above link) gave me a good clue. It says that a repeated capture group only captures the last iteration; to capture the whole thing, put a capture group around the repeated group. This is what I came up with:

Code: Select all

(^(?:(?![.]so[.]).)+)(?:([.]so[.])(.*))?$
https://regex101.com/r/aUa8Ml/3

In the test string "a.so.b" we have:

Code: Select all

Full match	0-6	a.so.b
Group 1.	0-1	a
Group 2.	1-5	.so.
Group 3.	5-6	b
and with the test string ab we have:

Code: Select all

Full match	0-2	ab
Group 1.	0-2	ab
Does this look like a good regular expression or can someone find issues with it?
Last edited by s243a on Sun 08 Mar 2020, 02:54, edited 1 time in total.

s243a

Re: PCRE -- Perl Compatible Regular Expressions

#4 Post by s243a »

Now here is a way that you can assign capture groups #1 and #3 to two variables "a" and "b"

Code: Select all

# read -r -d '' a b < <(echo a.so.b | perl -pe 's/(^(?:(?![.]so[.]).)+)(?:([.]so[.])(.*))?$/\1\n\3/')
# echo $a
a
# echo $b
b
**The above will work on bash but not ash since ash doesn't support process substitution.

s243a

Re: PCRE -- Perl Compatible Regular Expressions

#5 Post by s243a »

I wrote some code to do the same thing as above but without using regular expressions. I'm not sure which approach is faster and/or more readable.

Code: Select all

function split_on_so(){
  local str=$1
  local len=${#str}
  local len_m=$((len-1))
  local index
  local s1
  local s2
  local p1
  local p2

    ind=$(expr index $str .so)
    len=${#str}
    [ $ind -eq 0 ] && ind=$len
    p1=$((ind-1))
    s1=${str:0:$p1}
    if [ $ind -lt $len ]; then
      p2=$((ind+2))
      if [ ${str:p2:1} = '.' ]; then
        p2=$((p2+1))
      fi
    else
      p2=len
    fi
    s2=${str:$p2}

  echo "$s1"
  echo "$s2"
}

s243a

Re: PCRE -- Perl Compatible Regular Expressions

#6 Post by s243a »

s243a wrote:
Now here is a way that you can assign capture groups #1 and #3 to two variables "a" and "b"

Code: Select all

# read -r -d '' a b < <(echo a.so.b | perl -pe 's/(^(?:(?![.]so[.]).)+)(?:([.]so[.])(.*))?$/\1\n\3/')
# echo $a
a
# echo $b
b
**The above will work on bash but not ash since ash doesn't support process substitution.
Here's a slightly improved regular expression:

Code: Select all

# echo a.so.b | perl -pe 's/(^(?:(?![.]so[.]?).)+)(?:([.]so[.]?)(.*))?$/\1\n\3/'
a
b
# echo a.so | perl -pe 's/(^(?:(?![.]so[.]?).)+)(?:([.]so[.]?)(.*))?$/\1\n\3/'
a

# echo ab | perl -pe 's/(^(?:(?![.]so[.]?).)+)(?:([.]so[.]?)(.*))?$/\1\n\3/'
ab
What I added was the question mark at the end of "[.]so[.]?", so that it would also work with an input such as "a.so".

MochiMoppel
Posts: 2084
Joined: Wed 26 Jan 2011, 09:06
Location: Japan

Re: PCRE -- Perl Compatible Regular Expressions

#7 Post by MochiMoppel »

s243a wrote:I wrote some code to do the same thing as above
Not the same. The latter may produce wrong results. Hint: Check the documentation for the expr command:
  "index STRING CHARS   Index in STRING where any CHARS is found, or 0"
Try 's.so.b' or 'o.so.b' to see what "any CHARS" means.
s243a wrote:I'm not sure which approach is faster
The latter
s243a wrote:and/or more readable.
:lol:
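
To make the hint concrete: expr index treats its second argument as a set of characters, not as a substring. The sketch below contrasts that with a substring search done through parameter expansion. The function name so_position is invented for this example, and because expr index is an extension (it is in GNU coreutils but not in POSIX), it is only shown in comments.

```shell
# With GNU expr, "index" reports the first position of ANY listed character:
#   expr index a.so.b .so   -> 2  (the "." at position 2)
#   expr index s.so.b .so   -> 1  (the leading "s" is in the set ". s o")
#   expr index o.so.b .so   -> 1  (the leading "o" is in the set too)

# Hypothetical helper: 1-based position of the substring ".so", or 0 if absent.
so_position() {
  prefix=${1%%.so*}              # strip from the first ".so" to the end
  if [ "$prefix" = "$1" ]; then
    echo 0                       # nothing was stripped: no ".so" present
  else
    echo $(( ${#prefix} + 1 ))
  fi
}

so_position a.so.b   # prints 2
so_position s.so.b   # prints 2
so_position ab       # prints 0
```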

s243a

Re: PCRE -- Perl Compatible Regular Expressions

#8 Post by s243a »

MochiMoppel wrote:
s243a wrote:I wrote some code to do the same thing as above
Not the same. The latter may produce wrong results. Hint: Check the documentation for the expr command:
  "index STRING CHARS   Index in STRING where any CHARS is found, or 0"
Try 's.so.b' or 'o.so.b' to see what "any CHARS" means.
It should be fixed now:

Code: Select all

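function split_on_so(){
  local str=$1
  local len=${#str}
  local s1
  local s2
  local p1
  local p2

    s1=${str%%.so*}    # everything before the first ".so"
    p1=${#s1}

    if [ $p1 -lt $len ]; then
      p2=$((p1+3))     # skip over ".so"
      # Quoted so the test still works when nothing follows ".so"
      if [ "${str:p2:1}" = '.' ]; then
        p2=$((p2+1))   # also skip the dot after ".so"
      fi
    else
      p2=$len          # no ".so" found: second part is empty
    fi
    s2=${str:$p2}

  echo "$s1"
  echo "$s2"
}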
MochiMoppel wrote:
s243a wrote:I'm not sure which approach is faster
The latter
s243a wrote:and/or more readable.
:lol:
I suppose then I'll have to find a better application of PCRE.
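
For comparison, the same split can probably be written with parameter expansion alone, which should also run in ash. This is a sketch with an invented function name, not a drop-in replacement. Like the function above, it treats any ".so" as the separator, and it sets the global variables s1 and s2 because POSIX sh has no "local".

```shell
# Hypothetical POSIX-sh variant of the split.
split_on_so_px() {
  s1=${1%%.so*}        # text before the first ".so" (whole string if absent)
  s2=${1#"$s1"}        # remainder, starting at ".so" (empty if absent)
  s2=${s2#.so}         # drop the ".so" itself
  s2=${s2#.}           # drop one following "." if there is one
  printf '%s\n%s\n' "$s1" "$s2"
}

split_on_so_px a.so.b   # prints "a" then "b"
split_on_so_px ab       # prints "ab" then an empty line
```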

GustavoYz
Posts: 883
Joined: Wed 07 Jul 2010, 05:11
Location: .ar

#9 Post by GustavoYz »

s243a wrote:
Code: Select all

^((?![.]so[.]).)*(?:([.]so[.])(.*))?$
but with the test string "ab" it doesn't work. It only matches "b"...well, sort of. The full match is "ab", but there is only one capture group, which is "b". So what am I missing?
Not sure I understood the problem correctly, but because the capture group itself is repeated,
$1 is overwritten on every iteration: it first holds "a" and is then replaced by "b".

Using your expression, I think this works:

Code: Select all

^((?![.]so[.]).+?)(?:([.]so[.])(.*))?$
It puts everything before '.so.' in $1, '.so.' in $2 and everything after it in $3.
If there is no '.so.', everything goes to $1. However, be aware that you can expect a
bazillion backtracks if the input string is something like 'bunchoftextcuzwhynot.so.nowsomemore'.

If you're using Perl, I'd recommend Regexp::Debugger, which is awesome and has a nice
interface that shows you the steps of the matching process.
