Puppy Linux Discussion Forum
Puppy HOME page : puppylinux.com
"THE" alternative forum : puppylinux.info
 
All times are UTC - 4
AWK Based Version Comparison
s243a

Joined: 02 Sep 2014
Posts: 2144

Posted: Mon 23 Sep 2019, 16:25    Post subject:  AWK Based Version Comparison
Subject description: Some draft code (untested changes)
 

Intro

The woof-CE team has noticed that Jamesbond's repo database parser is faster than the Puppy Package Manager at parsing this database. Aside from speed, another advantage that AWK has over languages like Perl and Python is that it is more stable and more widely available on a system.

I'm exploring the idea of doing version comparison in AWK so that we can filter the database to include only results within a given version range. Note that there is a difference between the lib version, taken from the so-name of the primary lib within a package (e.g. libc6, which has a lib version of 6), and the package version (e.g. glibc-2.24) (See post).

We could build tools to construct a database which equates the lib version with the package version either by downloading packages and using ldd or in some cases (e.g. Debian Repos) extracting the lib version from the package name.

Anyway, in the version comparison tool that I'm working on, we strip the lib version when searching for repo records.

Bash Code:
Code:

stripped="$(echo "$pkg_name" | sed -e 's/[0-9]*$//')"


AWK code:
Code:

           match(\$2,/^(.*[^[:digit:]])([[:digit:]]*$|$)/,pkg_split)
           if ( pkg_split[1] == \"$stripped\" $CMP_Function ) {
             print
           }
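
To make the stripping step concrete, here is a minimal sketch of what the sed expression does to a few names. The helper name strip_lib_version is made up for this example and is not part of the posted code.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): strip the trailing
# digits, i.e. the lib version, from a package name.
strip_lib_version() {
  printf '%s\n' "$1" | sed -e 's/[0-9]*$//'
}

# Names with a trailing lib version lose it; names without one are unchanged.
for name in libc6 libpng16 bash; do
  printf '%s -> %s\n' "$name" "$(strip_lib_version "$name")"
done
```

One thing to keep in mind is that this also strips digits that are really part of a name (python3 would become python), so the stripped match is best treated as a fallback to the exact match.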



Methodology

My methodology here is to build my AWK code in a Bash function called "mk_AWK_prg()".

The version comparisons are taken from the inputs to this function and used to create arrays that will be used by the AWK functions.

Code:

    nl=$'\n'   # $'\n' is not expanded inside double quotes, so keep a newline in a variable
    options="$(getopt -o f:l:m:np:su:v: --long full-version:,lib-version:,min-version:,no-strip,package:,stripped,version:,gt:,ge:,lt:,le: -- "$@")"
    eval set -- "$options"
    while [ $# -gt 0 ]; do
      case "$1" in
...
      -m|--min-version|--ge)
         n_cmp=$((n_cmp+1))
         awk_cmp_ary_op="$awk_cmp_ary_op${nl}awk_cmp_ary_op[$n_cmp]=\"ge\""
         awk_cmp_ary_val="$awk_cmp_ary_val${nl}awk_cmp_ary_val[$n_cmp]=\"$2\""
         shift 2  ;;
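
As a rough illustration of how these accumulated strings end up as AWK arrays, the sketch below builds two comparisons and splices them into a BEGIN block. The helper names add_cmp and show_cmps are assumptions made for this example; only the variable names come from the code above.

```shell
#!/bin/sh
# Hypothetical sketch: collect (operator, value) pairs as AWK assignment
# lines, then splice them into the BEGIN block of a generated program.
nl='
'
n_cmp=0
awk_cmp_ary_op=''
awk_cmp_ary_val=''

add_cmp() {  # usage: add_cmp OP VERSION
  n_cmp=$((n_cmp+1))
  awk_cmp_ary_op="${awk_cmp_ary_op}${nl}awk_cmp_ary_op[$n_cmp]=\"$1\""
  awk_cmp_ary_val="${awk_cmp_ary_val}${nl}awk_cmp_ary_val[$n_cmp]=\"$2\""
}

add_cmp ge 2.1.12
add_cmp lt 9.10

# The shell expands the two variables and n_cmp before awk sees the program,
# so the loop bound is a literal number and length(array) is not needed.
show_cmps() {
  awk "BEGIN{
    $awk_cmp_ary_op
    $awk_cmp_ary_val
    for (i = 1; i <= $n_cmp; i++)
      print awk_cmp_ary_op[i], awk_cmp_ary_val[i]
  }"
}
show_cmps
```

Interpolating the loop bound keeps the generated program within POSIX awk, since length() on an array is not available in every awk.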


The version comparison function uses regular expressions to parse the package version:

Code:

      AWK_Functions="\
         function version_split(version1,version_array,split_chars,       i1,i2,remainder1,matches,num1_epoch_split){
           # Split off the epoch (the digits before the first ':'), if there is one.
           if (match(version1,/^([[:digit:]]*):(.*)$/,num1_epoch_split)){
             version_array[1]=num1_epoch_split[1]
             remainder1=num1_epoch_split[2]
           } else {
             version_array[1]=0
             remainder1=version1
           }
           split_chars[1]=\":\"
           i1=2
           # Caveat: only the last capture of a repeated group is saved, so this
           # draft cannot recover every field (see the later post in this thread).
           match(remainder1,/^([^+.~:-]+)(([+.~])([^+.~:-]+))*((-)([^+.~:-]+))?$/,matches)
           version_array[i1]=matches[1]
           
           for (i2 = 4; i2<length(matches); i2=i2+3){
             i1=i1+1
             version_array[i1]=matches[i2]
             split_chars[i1-1]=matches[i2-1]
           }
         }
...

see: https://linux.die.net/man/5/deb-version

Note that the epoch is split first because we want to successively match up the other delimiters.
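
For illustration, the epoch split on its own can be sketched in POSIX awk as below. The helper name split_epoch is made up for this example, and the sketch assumes the epoch is simply the run of digits before the first colon, as described in deb-version.

```shell
#!/bin/sh
# Hypothetical helper: print "EPOCH REST" for a Debian-style version string.
split_epoch() {
  printf '%s\n' "$1" | awk '{
    epoch = 0; rest = $0
    # The epoch is the unsigned integer before the first colon, if present.
    if (match($0, /^[0-9]+:/)) {
      epoch = substr($0, 1, RLENGTH - 1)
      rest  = substr($0, RLENGTH + 1)
    }
    print epoch, rest
  }'
}

split_epoch 1:2.24-11   # version with an explicit epoch
split_epoch 4.4-5       # no epoch, so it defaults to 0
```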

We test each element in the array of version comparisons against the repo record. If any test fails we return '0'; otherwise we return '1'.

AWK Code (part of the bash string "AWK_Functions")
Code:

         function arry_cmp(version,ops_array,val_array,    i){
           for (i=1; i<=length(ops_array); i++){
             #https://www.gnu.org/software/gawk/manual/gawk.html#Switch-Statement
             switch(ops_array[i]){
             case \"<\":
             case \"lt\":
               if (v_lt(version,val_array[i]) == 0 ){
                 return 0
               }
               break


If any version comparisons were supplied as inputs to mk_AWK_prg(), then we add AWK_Functions and the related arrays to our code. Otherwise these functions aren't added to the AWK code.

Code:

    CMP_Function="&& arry_cmp(\$3,awk_cmp_ary_op,awk_cmp_ary_val)"
    else
      AWK_Functions=''
      CMP_Function=''
    fi
   
    #https://www.gnu.org/software/gawk/manual/html_node/String-Functions.html
    if [ $stripped_match -eq 1 ]; then
        AWK_PRG="$AWK_Functions
        BEGIN{FS=\"|\"
         $awk_cmp_ary_op
         $awk_cmp_ary_val
       }
       {
         if( \$2 == \"$pkg_name\" $CMP_Function ) {
           print
         }
         else {
           # Only try the stripped name when the exact name did not match.
           match(\$2,/^(.*[^[:digit:]])([[:digit:]]*$|$)/,pkg_split)
           if ( pkg_split[1] == \"$stripped\" $CMP_Function ) {
             print
           }
         }
       }"


Note that this code is very preliminary, with untested changes.

Further Work
1. Aside from the epoch, I'm not making much of a distinction based on the type of version separator. This will require further research. However, in most cases the version separator that we will be primarily interested in is the period (i.e. '.'). Therefore the code should be useful even before I do this investigation.
2. .... (more to follow)

Link to preliminary code: https://pastebin.com/wXQ6uLwY

_________________
Find me on minds and on pearltrees.
musher0

Joined: 04 Jan 2009
Posts: 14455
Location: Gatineau (Qc), Canada

Posted: Mon 23 Sep 2019, 19:24    Post subject:  

Hi, s243a.

Sorry for being so dumb, but you wish to compare the version of what with what,
exactly?

Within what? The PPM database?

If you wish to compare the versions of execs and libs from one Puppy to the next,
I hope you have lots of time on your hands.

I at least got that you do NOT want to check the version of AWK itself. Phew! :)
That said, if you really are after speed, use mawk, not the (usually) native gawk.
Ref.: https://brenocon.com/blog/2009/09/dont-mawk-awk-the-fastest-and-most-elegant-big-data-munging-language

Respectfully.

_________________
musher0
~~~~~~~~~~
Je suis né pour aimer et non pas pour haïr. (Sophocle) /
I was born to love and not to hate. (Sophocles)
s243a

Joined: 02 Sep 2014
Posts: 2144

Posted: Mon 23 Sep 2019, 21:04    Post subject:  

musher0 wrote:
Hi, s243a.

Sorry for being so dumb, but you wish to compare the version of what with what,
exactly?


If we look at the repo db record for bash we see:

Code:

bash_4.4-5|bash|4.4-5||BuildingBlock|5949K|pool/DEBIAN/main/b/bash|bash_4.4-5_i386.deb|+base-files&ge2.1.12,+debianutils&ge2.15|GNU Bourne Again SHell|devuan|ascii|


We see from the dependencies that base-files must be greater than or equal to 2.1.12 and debianutils must be greater than or equal to 2.15. So when we are resolving dependencies we might want to verify that a candidate's version meets these requirements before installing it.
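
As a sketch of how the dependency field of such a record could be pulled apart in POSIX awk: the helper name list_deps is made up for this example, and it assumes that every operator is a two-letter code like the "ge" seen in the bash record and that the dependencies sit in field 9.

```shell
#!/bin/sh
# Hypothetical helper: read repo records on stdin and print one
# "NAME OP VERSION" line per versioned dependency found in field 9.
list_deps() {
  awk -F'|' '{
    n = split($9, deps, ",")
    for (i = 1; i <= n; i++) {
      dep = deps[i]
      sub(/^\+/, "", dep)                     # drop the leading "+"
      amp = index(dep, "&")
      if (amp == 0) { print dep; continue }   # no version constraint
      name = substr(dep, 1, amp - 1)
      op   = substr(dep, amp + 1, 2)          # assumed two-letter operator
      ver  = substr(dep, amp + 3)
      print name, op, ver
    }
  }'
}

record='bash_4.4-5|bash|4.4-5||BuildingBlock|5949K|pool/DEBIAN/main/b/bash|bash_4.4-5_i386.deb|+base-files&ge2.1.12,+debianutils&ge2.15|GNU Bourne Again SHell|devuan|ascii|'
printf '%s\n' "$record" | list_deps
```

Each printed operator and version pair could then be fed to the comparison arrays as one more constraint.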

Quote:
Within what? The PPM database?

If you wish to compare the versions of execs and libs from one Puppy to the next,
I hope you have lots of time on your hands.


I want to be able to mix different repos. Puppy actually does this already, because we have Package-puppy-common64-official and Package-puppy-common32-official, but these Puppy-specific binaries are compiled in such a way as to be widely compatible.

I have two rules of thumb, which are:
1. repos which appear first in the list have priority
2. stick primarily to binary-compatible repos (e.g. similar versions of glibc)

The repos to which I give top priority are those of the binary-compatible distro. However, say we end up installing a package from a repo of lesser priority: which versions of the dependencies should we install? My default preference is to install the dependencies whose versions match the binary-compatible distro (when available), but perhaps this behaviour can be made configurable via settings, and where available other information can be used to resolve conflicts (e.g. the more stringent version range).
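
The first rule of thumb can be sketched as follows: pass the repo database files to awk in priority order and stop at the first record whose name matches. The helper name pick_by_priority and the sample records are invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the first record for PKG, searching the repo
# files in the order given (earlier files have higher priority).
pick_by_priority() {  # usage: pick_by_priority PKG REPO_FILE...
  pkg=$1; shift
  awk -F'|' -v pkg="$pkg" '$2 == pkg { print; exit }' "$@"
}

# Made-up sample data: repo-a is listed first, so it wins for "foo".
tmp=$(mktemp -d)
printf '%s\n' 'foo_1.0|foo|1.0||' > "$tmp/repo-a"
printf '%s\n' 'foo_2.0|foo|2.0||' 'bar_0.3|bar|0.3||' > "$tmp/repo-b"
pick_by_priority foo "$tmp/repo-a" "$tmp/repo-b"   # record from repo-a
pick_by_priority bar "$tmp/repo-a" "$tmp/repo-b"   # only repo-b has it
rm -rf "$tmp"
```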

Quote:
I at least got that you do NOT want to check the version of AWK itself. Phew! :)
That said, if you really are after speed, use mawk, not the (usually) native gawk.
Ref.: https://brenocon.com/blog/2009/09/dont-mawk-awk-the-fastest-and-most-elegant-big-data-munging-language

Respectfully.


Thank you for the tip. I don't know whether this will be helpful to me or not, because I'm probably using gawk features that aren't available in mawk. However, it might be more efficient to pre-process with mawk.

_________________
Find me on minds and on pearltrees.
musher0

Joined: 04 Jan 2009
Posts: 14455
Location: Gatineau (Qc), Canada

Posted: Mon 23 Sep 2019, 22:34    Post subject:  


Thanks for the example. That clears things up a bit.

As to the GNU EXTENSIONS, there is a list at
http://man7.org/linux/man-pages/man1/gawk.1.html
May I suggest working with < gawk --traditional > to avoid
gawk peculiarities in the code.
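
As one example of what avoiding a gawk peculiarity could look like: the three-argument match() used for pkg_split is a gawk extension, but the same name/lib-version split can be sketched with the standard RSTART and RLENGTH variables. The helper name split_name is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print "BASE LIBVER" using only POSIX awk features
# (two-argument match, RSTART, RLENGTH, substr).
split_name() {
  printf '%s\n' "$1" | awk '{
    base = $0; libver = ""
    if (match($0, /[0-9]+$/)) {            # sets RSTART and RLENGTH
      base   = substr($0, 1, RSTART - 1)
      libver = substr($0, RSTART)
    }
    print base, libver
  }'
}

split_name libc6
split_name libpng16
```

This form should run under mawk and under gawk --traditional as well as plain gawk.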

BFN.

_________________
musher0
~~~~~~~~~~
Je suis né pour aimer et non pas pour haïr. (Sophocle) /
I was born to love and not to hate. (Sophocles)
sc0ttman


Joined: 16 Sep 2009
Posts: 2741
Location: UK

Posted: Tue 24 Sep 2019, 17:52    Post subject: Re: AWK Based Version Comparison
Subject description: Some draft code (untested changes)
 

s243a wrote:
I'm exploring the idea of doing version comparison in AWK

if it's just the version comparison, and not about extracting it, why not use vercmp?

(sorry if i missed the point..)

_________________
Pkg, mdsh, Woofy, Akita, VLC-GTK, Search
s243a

Joined: 02 Sep 2014
Posts: 2144

Posted: Sat 28 Sep 2019, 23:18    Post subject: Re: AWK Based Version Comparison
Subject description: Some draft code (untested changes)
 

sc0ttman wrote:
s243a wrote:
I'm exploring the idea of doing version comparison in AWK

if it's just the version comparison, and not about extracting it, why not use vercmp?

(sorry if i missed the point..)


I want the whole repo-db record. I'm using AWK to filter the repo-db. I actually considered your suggestion, because I was having a bit of trouble debugging my AWK code, but I'm making progress. I'll keep your idea in mind though as a fallback (e.g. in case gawk isn't installed).

Anyway, I started writing some test code:

Test file "~/repo_test_file"
Code:

base-files_9.9+devuan2.5|base-files|9.9+devuan2.5||Network|368K|pool/DEVUAN/main/b/base-files|base-files_9.9+devuan2.5_all.deb||Devuan base system miscellaneous files|devuan|ascii|


Test command
Code:

cat ~/repo_test_file 2>/dev/null | ~/Find_Base_Files


Test AWK code "~/Find_Base_Files"
** Contains lots of debugging print statements and some commented out old code.
Code:

#!/usr/bin/gawk -f
        function version_split(version1,version_array,split_chars,       i1,remainder1,matches,num1_epoch_split){
          print "version1=" version1
          match(version1,/^([[:digit:]]*):(.*)$/,num1_epoch_split)
          if (length(num1_epoch_split) > 0 ){
            version_array[1]=num1_epoch_split[1]
            remainder1=num1_epoch_split[2]
          } else {
              version_array[1]=0
              remainder1=version1
          }
          delete num1_epoch_split
          split_chars[1]=":"
          i1=2
          #match(remainder1,/^([^+\.\-~:]+)(([+\.~])([^+\.~\-:]+))*((\-)([^+\.~\-:])+)?$/,matches)   
          #match(remainder1,/^([^+\.\-~:]+)(([+\.\-~:])(.*))?$/,matches)
          match(remainder1,/^([[:digit:]]+)(([+\.\-~:])(.*))?$/,matches)           
          while (length(matches) > 0) {
            version_array[i1]=matches[1]   
             print "version_array[" i1 "]=" version_array[i1]      
             if (length(matches)>1){       
              split_chars[i1]=matches[3]
               print "split_chars[" i1 "]=" split_chars[i1]
               remainder1=matches[4]
               print "remainder1=" remainder1
               #match(remainder1,/^([^+\.\-~:]+)(([+\.\-~:]+)(.*))?$/,matches)
               match(remainder1,/^([[:digit:]]+)(([+\.\-~:]+)(.*))?$/,matches)
             }
             else{
               break
             }      
                   
             i1=i1+1                                   
          }
        }
        function v_le(ver_split, val_split,       len_ver){
          return v_ge(val_split,ver_split)
        }
        function v_ge(ver_split, val_split,       len_ver, i){
          print "v_ge"
          if (length(ver_split)<length(val_split)){
            len_ver=length(ver_split)
          }
          else{
           len_ver=length(val_split)
          }
           for (i=1; i<=len_ver; i++){
             print "ver_split[" i "]=" ver_split[i]
             print "val_split[" i "]=" val_split[i]
             if ( ver_split[i] < val_split[i] ) return 0
             if ( ver_split[i] > val_split[i] ) return 1
          }
          print "finished ge compare"
          print "length_ver_split=" length(ver_split)
          print "length_val_split=" length(val_split)
          if ( length(ver_split) >= length(val_split) )
            return 1
          else
            return 0
        }
        function v_gt(num1, num2,    ge){
          ge = v_ge(num2,num1)
          if ( ge == 1 ){
            return 0
          }
          else {
            return 1
          }
        }
        function v_lt(num1, num2){
          return v_gt(num2, num1)
        }
        #An equal-ish function: equal if the common leading fields match.
        function v_e(ver_split, val_split,       len_ver, i){
          if (length(ver_split)<length(val_split)) len_ver=length(ver_split)
          else len_ver=length(val_split)
           for (i=1; i<=len_ver; i++){
             if (ver_split[i] != val_split[i])
               return 0
          }
          return 1
        }
        function arry_cmp(version,ops_array,val_array,       i,ver_split,ver_split_chars,val_split,val_split_chars){
          print "test"
          version_split(version,ver_split,ver_split_chars)        
          for (i=1; i<=length(ops_array); i++){
            print "Ops_array " ops_array[i] " " val_array[i]
             version_split(val_array[i],val_split,val_split_chars)   
            #https://www.gnu.org/software/gawk/manual/gawk.html#Switch-Statement
            switch(ops_array[i]){
            case "<":
            case "lt":
              if (v_lt(ver_split,val_split) == 0 ){
                return 0
             }
             break
           case ">":
           case "gt":
              if (v_gt(ver_split,val_split) == 0 ){
                return 0
             }
             break
           case "<=":
            case "le":
              if (v_le(ver_split,val_split) == 0 ){
                return 0
             }
             break          
           case ">=":
            case "ge":
              if (v_ge(ver_split,val_split) == 0 ){
                return 0
             }
             break          
           case "==":
            case "e":
              if (v_e(ver_split,val_split) == 0 ){
                return 0
             }
             break          
           }
           #https://unix.stackexchange.com/questions/147957/delete-an-array-in-awk
           delete val_split
           delete val_split_chars
          }
          print "returning result=1"
          return 1
        }
             BEGIN{FS="|"
        
awk_cmp_ary_op[1]="lt"
        
awk_cmp_ary_val[1]="9.10" #"2.1.12"
        }
        {
          #print "wtf"
          if( $2 == "base-files") {
                          if ( arry_cmp($3,awk_cmp_ary_op,awk_cmp_ary_val) == 1 ){
                 print "printing result 1"                           
                print
              }
          }
          else{
            match($2,/^(.*[^[:digit:]])([[:digit:]]*$|$)/,pkg_split)
            if ( pkg_split[1] == "base-files" ) {
                            if ( arry_cmp($3,awk_cmp_ary_op,awk_cmp_ary_val) == 1 ){
                print "printing result 2"           
                print
              }
            }
          }
          delete pkg_split
        }

https://pastebin.com/dyPMD49F

In line 133
Code:

awk_cmp_ary_val[1]="9.9" #"2.1.12"


the version number can be changed to compare against the test file. The test code seems to work so I need to integrate it back into my bash test code then eventually back into my pkg fork.

One thing that I had to give up on is repeated capture groups in my regular expressions. AWK seems to only save the last capture (in regular expressions) of a repeated capture group.

Note that my code behaves differently than vercmp. I noticed the version number of the record that I was looking at ended in "+devuan2.5". Presumably this is the revision of Devuan rather than the revision of the package. This information isn't useful for comparing packages between different versions of Linux, but it might be indirectly useful for deducing binary compatibility (e.g. glibc requirements). What I decided to do is to stop parsing as soon as the next non-separator token is non-numeric.
Code:

match(remainder1,/^([[:digit:]]+)(([+\.\-~:]+)(.*))?$/,matches)
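
To show the "consume one field at a time and stop at the first non-numeric token" idea in isolation, here is a rough sketch in POSIX awk. It is not the code above: the names ver_lt and numeric_fields are made up, and it ignores the epoch and treats all separators alike.

```shell
#!/bin/sh
# Hypothetical helper: exit status 0 when version A sorts before version B,
# comparing only the leading numeric fields of each.
ver_lt() {  # usage: ver_lt A B
  awk -v a="$1" -v b="$2" '
    # Fill out[] with the leading numeric fields of v and stop at the first
    # field that does not start with a digit. Returns the field count.
    function numeric_fields(v, out,    n) {
      n = 0
      while (match(v, /^[0-9]+/)) {
        out[++n] = substr(v, 1, RLENGTH) + 0
        v = substr(v, RLENGTH + 1)
        if (!match(v, /^[-+.~:]+/)) break   # no separator: end of version
        v = substr(v, RLENGTH + 1)
      }
      return n
    }
    BEGIN {
      na = numeric_fields(a, fa); nb = numeric_fields(b, fb)
      n = (na < nb) ? na : nb
      for (i = 1; i <= n; i++) {
        if (fa[i] < fb[i]) exit 0
        if (fa[i] > fb[i]) exit 1
      }
      # Common fields are equal: the version with fewer fields is the lesser.
      exit ((na < nb) ? 0 : 1)
    }'
}

if ver_lt 9.9+devuan2.5 9.10; then echo "9.9+devuan2.5 < 9.10"; fi
```

Because each pass consumes one number and one separator from the front of the string, no repeated capture group is needed, and the "+devuan2.5" suffix is dropped as soon as "devuan" fails the numeric test.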

_________________
Find me on minds and on pearltrees.