Puppy Linux Discussion Forum
All times are UTC - 4
AWK: match($2,/^(.*[^:digit:])([:digit:]*$|$)/,pkg_split)
s243a

Joined: 02 Sep 2014
Posts: 2199

PostPosted: Fri 20 Sep 2019, 23:19    Post subject:  AWK: match($2,/^(.*[^:digit:])([:digit:]*$|$)/,pkg_split)
Subject description: Matches libcg but 'g' is not a digit????? (solved)
 

I want to use AWK to match libcN, where 'N' is the major version number. I'm using AWK on a Puppy database file for a package repo. These repo files follow the petspet format, where the second field (i.e. $2) is the package name. To the best of my understanding, the regular expression to do this should be:
Code:

match($2,/^(.*[^:digit:])([:digit:]*$|$)/,pkg_split)

https://www.gnu.org/software/gawk/manual/html_node/String-Functions.html

but for some inexplicable reason it appears to be matching 'g' as a numeric digit even though the docs say the following:

Quote:

A character class is only valid in a regexp inside the brackets of a bracket expression. Character classes consist of ‘[:’, a keyword denoting the class, and ‘:]’. Table 3.1 lists the character classes defined by the POSIX standard.
....
[:digit:] Numeric characters

https://www.gnu.org/software/gawk/manual/html_node/Bracket-Expressions.html#Bracket-Expressions

Here is my debugging output which shows the awk program:

Code:

++ cat /var/packages/repo/Packages-devuan-ascii-non-free
++ awk '    BEGIN{FS="|"}
    {
      match($2,/^(.*[^:digit:])([:digit:]*$|$)/,pkg_split)
      if ( pkg_split[1] == "libc" ) {
        print
      }
    }'
+ awk_result='libcg_3.1.0013-2+b1|libcg|3.1.0013-2+b1||BuildingBlock|11609K|pool/DEBIAN/non-free/n/nvidia-cg-toolkit|libcg_3.1.0013-2+b1_i386.deb|+libc6&ge2.3.6-6|Nvidia Cg core runtime library|devuan|ascii|'
+ '[' '!' -z 'libcg_3.1.0013-2+b1|libcg|3.1.0013-2+b1||BuildingBlock|11609K|pool/DEBIAN/non-free/n/nvidia-cg-toolkit|libcg_3.1.0013-2+b1_i386.deb|+libc6&ge2.3.6-6|Nvidia Cg core runtime library|devuan|ascii|' ']'



The debugging output is produced as follows:
Code:

bash -x /usr/sbin/pkg-list-alias libc 2>&1 | tee pkg_list_alias.log

and my script can be found at:
https://pastebin.com/Yb7gNV2r

which is an updated version of a script which I discussed at:
http://murga-linux.com/puppy/viewtopic.php?p=1037047#1037047

Here is the line of code which calls the AWK program:

Code:

awk_result="$(cat $aRepoDB | awk "$AWK_PRG")"

_________________
Find me on minds and on pearltrees.

Last edited by s243a on Sat 21 Sep 2019, 08:55; edited 6 times in total
technosaurus


Joined: 18 May 2008
Posts: 4872
Location: Blue Springs, MO

PostPosted: Sat 21 Sep 2019, 00:25    Post subject:  

It may save you some time to test it here:
https://regex101.com/

When you use parens, you can usually print out the matches with \N for debugging (where N is the Nth set of parens). I don't recall how to do it in awk, though.

_________________
Check out my github repositories. I may eventually get around to updating my blogspot.
MochiMoppel


Joined: 26 Jan 2011
Posts: 1943
Location: Japan

PostPosted: Sat 21 Sep 2019, 00:56    Post subject:  

Please post a simple example. Your input string and the expected output.
Your regex pattern looks wrong as you may need an additional set of square brackets.
Burunduk

Joined: 21 Aug 2011
Posts: 76

PostPosted: Sat 21 Sep 2019, 07:58    Post subject:  

As MochiMoppel has pointed out, you definitely need additional square brackets here. [^:digit:] is the same as [^:dgit] and it matches anything but 'g' and the other four characters.
So, if all you want to match is something like AAANNN where A is not a digit and N is, then this will do:

Code:
/([^0-9]+)([0-9]*)/


or using POSIX character classes:

Code:
/([^[:digit:]]+)([[:digit:]]*)/
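
To see the difference between the two bracket forms in isolation, here is a small sketch. The classify helper and the two sample names are made up for illustration, and it assumes an awk that supports POSIX character classes; some gawk versions may also print a warning about the single-bracket form on stderr.

```sh
#!/bin/sh
# classify is a hypothetical helper: for each name it reports whether the
# last character matches the single-bracket and the double-bracket form.
classify() {
  printf '%s\n' "$@" | awk '{
    single = ($0 ~ /[:digit:]$/)   ? "yes" : "no"   # set of the characters : d i g t
    double = ($0 ~ /[[:digit:]]$/) ? "yes" : "no"   # the real digit class
    print $0, "single=" single, "double=" double
  }'
}

result=$(classify libcg libc6)
printf '%s\n' "$result"
```

With the single brackets, the trailing 'g' of libcg counts as a match and the trailing '6' of libc6 does not, which is the reverse of what was intended.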


Note also that an array argument is a GAWK extension not supported by the busybox awk.
s243a

Joined: 02 Sep 2014
Posts: 2199

PostPosted: Sat 21 Sep 2019, 08:53    Post subject:  

Burunduk wrote:
As MochiMoppel has pointed out, you definitely need additional square brackets here. [^:digit:] is the same as [^:dgit] and it matches anything but 'g' and the other four characters.
So, if all you want to match is something like AAANNN where A is not a digit and N is, then this will do:

Code:
/([^0-9]+)([0-9]*)/


or using POSIX character classes:

Code:
/([^[:digit:]]+)([[:digit:]]*)/



Thank you. That was most helpful. Now I get the correct debugging output:

Code:

++ cat /var/packages/repo/Packages-devuan-ascii-main
++ awk '    BEGIN{FS="|"}
    {
      match($2,/^(.*[^[:digit:]])([[:digit:]]*$|$)/,pkg_split)
      if ( pkg_split[1] == "libc" ) {
        print
      }
    }'
+ awk_result='libc6_2.24-11+deb9u4|libc6|2.24-11+deb9u4||BuildingBlock|9579K|pool/DEBIAN/main/g/glibc|libc6_2.24-11+deb9u4_i386.deb|+libgcc1|GNU C Library: Shared libraries|devuan|ascii|'

At first I didn't read your post carefully enough, so I only fixed the first set of square brackets, [^[:digit:]], and didn't realize that I also needed to double up on the second set, [[:digit:]]. In hindsight, I can see how this would make parsing easier for awk.

Quote:
Note also that an array argument is a GAWK extension not supported by the busybox awk.


I was wondering that. This is good to know. We can do something similar with grep, if we have the full version of grep but not awk. However, awk is more efficient for this application.

Edit: an updated version of the original script (with the bracket fix) can be found at: https://pastebin.com/KtUikhdS


Last edited by s243a on Sat 21 Sep 2019, 13:44; edited 1 time in total
MochiMoppel


Joined: 26 Jan 2011
Posts: 1943
Location: Japan

PostPosted: Sat 21 Sep 2019, 08:58    Post subject:  

Burunduk wrote:
As MochiMoppel has pointed out, you definitely need additional square brackets here.
Yes, there.
Quote:
Note also that an array argument is a GAWK extension not supported by the busybox awk.
Methinks that s243a doesn't need array arguments at all. For pulling out a leading non numeric string something like this should do
Code:
       sub(/[0-9].*/,"",$2)
       print $2
but I don't know what his strings look like and what he wants to achieve.
s243a

Joined: 02 Sep 2014
Posts: 2199

PostPosted: Sat 21 Sep 2019, 09:23    Post subject:  

MochiMoppel wrote:
Burunduk wrote:
As MochiMoppel has pointed out, you definitely need additional square brackets here.
Yes, there.
Quote:
Note also that an array argument is a GAWK extension not supported by the busybox awk.
Methinks that s243a doesn't need array arguments at all. For pulling out a leading non numeric string something like this should do
Code:
       sub(/[0-9].*/,"",$2)
       print $2
but I don't know what his strings look like and what he wants to achieve.

I want the entire repo db record. So something like the following might also work (untested):

Code:

pkg=gensub(/[0-9].*/,"","g",$2)   # gensub is a gawk extension
if ( pkg == "libc" ) {            # compare with ==; a single = would assign
  print
}

https://www.gnu.org/software/gawk/manual/html_node/String-Functions.html

Note that I like the array syntax because it is more general and more efficient than the gensub approach, even if it isn't as widely supported.
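
For awk implementations without the gawk extensions, a portable sketch of the same filter might look like the following. It assumes that the second field is the package name and that only trailing digits should be stripped; filter_pkg is a hypothetical name, and the two sample records are shortened versions of the ones in the debugging output earlier in the thread.

```sh
#!/bin/sh
# filter_pkg (hypothetical): print whole records whose name field, minus any
# trailing digits, equals the requested base name. POSIX awk features only.
filter_pkg() {
  awk -F'|' -v want="$1" '{
    name = $2                  # work on a copy so the record stays intact
    sub(/[0-9]+$/, "", name)   # strip trailing digits: libc6 -> libc
    if (name == want) print
  }'
}

# Shortened sample records (only the first three fields are kept).
records='libcg_3.1.0013-2+b1|libcg|3.1.0013-2+b1|
libc6_2.24-11+deb9u4|libc6|2.24-11+deb9u4|'

matched=$(printf '%s\n' "$records" | filter_pkg libc)
printf '%s\n' "$matched"
```

Because sub() works on a copy of $2, the record itself is never rebuilt and print emits the original line unchanged. Only the libc6 record is printed here.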

As a side note I thought the '6' in libc6 was the major package version but I see from above that the package version is 2.24-11+deb9u4. However, I note that if I look at the file names in the package that the actual lib is called:
Code:

/lib/i386-linux-gnu/libc.so.6

https://packages.debian.org/stretch/i386/libc6/filelist

By Linux convention, the '6' should be the version of the lib rather than the version of the package:

Quote:

3.1.1. Shared Library Names

Every shared library has a special name called the ``soname''. The soname has the prefix ``lib'', the name of the library, the phrase ``.so'', followed by a period and a version number that is incremented whenever the interface changes (as a special exception, the lowest-level C libraries don't start with ``lib''). A fully-qualified soname includes as a prefix the directory it's in; on a working system a fully-qualified soname is simply a symbolic link to the shared library's ``real name''.

http://tldp.org/HOWTO/Program-Library-HOWTO/shared-libraries.html

I didn't realize that the package versions were different than the lib versions for some linux packages. I will think about the implications of this distinction.

I will note that some package repos do not use the lib version as a suffix in the package name.

technosaurus


Joined: 18 May 2008
Posts: 4872
Location: Blue Springs, MO

PostPosted: Sat 21 Sep 2019, 12:45    Post subject:  

Burunduk wrote:
As MochiMoppel has pointed out, you definitely need additional square brackets here. [^:digit:] is the same as [^:dgit] and it matches anything but 'g' and the other four characters.
So, if all you want to match is something like AAANNN where A is not a digit and N is, then this will do:

Code:
/([^0-9]+)([0-9]*)/


or using POSIX character classes:

Code:
/([^[:digit:]]+)([[:digit:]]*)/


Note also that an array argument is a GAWK extension not supported by the busybox awk.

\d is also the same as [[:digit:]] in some regex engines.
You can emulate n-dimensional arrays in busybox awk by separating the subscripts with commas; awk joins them into a single string key.
... so instead of array[i][j][k] you'd use array[i,j,k] (works in gawk and mawk too)
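
A minimal sketch of that comma-subscript form, using a made-up array called cell; awk joins the subscripts with its built-in SUBSEP string to build each key:

```sh
#!/bin/sh
# Emulated two-dimensional array: awk joins the subscripts i and j with
# SUBSEP to form a single string key. The array name cell is illustrative.
grid=$(awk 'BEGIN {
  for (i = 1; i <= 2; i++)
    for (j = 1; j <= 2; j++)
      cell[i, j] = i * 10 + j
  # (i, j) in cell tests membership without creating the element
  if ((2, 1) in cell) print "cell[2,1] =", cell[2, 1]
  # count the stored keys
  for (key in cell) count++
  print "keys:", count
}')
printf '%s\n' "$grid"
```

This prints "cell[2,1] = 21" followed by "keys: 4".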

The link I posted is useful to build and test your regex, but I used to always just do a Ctrl+h in geany.

s243a

Joined: 02 Sep 2014
Posts: 2199

PostPosted: Mon 23 Sep 2019, 01:10    Post subject:  

technosaurus wrote:
It may save you some time to test it here:
https://regex101.com/

When you use parens, you can usually print out the matches with \N for debugging (where N is the Nth set of parens), I don't recall how to do it in awk though.


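I tried an example to check whether awk handles non-capturing groups the way this test program does.

Code:

# echo ac | awk "{match(\$1,/(:?a|b)(c|d)/,matches); print matches[1]}"
a


If AWK supported non-capturing groups, then the result would be "c". Note that the usual non-capturing syntax is (?:a|b) rather than (:?a|b): as typed above, the first group is an ordinary capturing group that matches an optional colon followed by 'a', or 'b', so this test alone does not prove the point. Even so, AWK uses POSIX extended regular expressions, which have no non-capturing groups. The following link seems to agree with my claim about AWK's limitation here: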
https://comp.unix.programmer.narkive.com/pvJASnhW/is-there-a-non-capturing-grouping-in-gawk-regex
