Puppy Linux Discussion Forum
All times are UTC - 4
awk-wordness simplify and speed up your code with awk
technosaurus


Joined: 18 May 2008
Posts: 4380

Posted: Wed 02 May 2012, 19:44

awk is a powerful tool that is often overlooked for more familiar tools like sh,sed,grep,wc,head,tail,cat,cut,...
Quite often we can drastically speed up a command by using awk built-ins (it takes time to call external programs). I will try to show the major ones here.

grep STRING FILE ==> awk '/STRING/{print}' FILE
grep -v STRING FILE ==> awk '!/STRING/{print}' FILE

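cut -f2,3 FILE ==> awk -F'\t' -v OFS='\t' '{print $2, $3}' FILE
(tab is already the default delimiter for cut; in awk, set the separator with -F so it applies from the first line, and set OFS so the output stays tab-separated)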

sed 's/string/newstring/' FILE ==> awk '{sub(/string/,"newstring");print}' FILE
sed 's/string/newstring/g' FILE ==> awk '{gsub(/string/,"newstring");print}' FILE

cat FILE ==> awk '{print}' FILE
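
A quick way to convince yourself that these pairs really are equivalent is to run both sides on the same input and compare. The following is only a sketch: the sample data and the helper name `same` are made up for the demo.

```shell
#!/bin/sh
# Hypothetical sample file: three tab-separated columns per line.
f=$(mktemp)
printf 'a\tfoo bar\tfoo\nb\tbaz\tqux\n' > "$f"

# same LABEL OUTPUT1 OUTPUT2: report whether the two outputs are identical
same() { [ "$2" = "$3" ] && echo "ok: $1" || echo "DIFFERENT: $1"; }

same 'grep'    "$(grep foo "$f")"        "$(awk '/foo/{print}' "$f")"
same 'grep -v' "$(grep -v foo "$f")"     "$(awk '!/foo/{print}' "$f")"
same 'cut'     "$(cut -f2,3 "$f")"       "$(awk -F'\t' -v OFS='\t' '{print $2, $3}' "$f")"
same 'sed'     "$(sed 's/foo/X/' "$f")"  "$(awk '{sub(/foo/,"X");print}' "$f")"
same 'sed g'   "$(sed 's/foo/X/g' "$f")" "$(awk '{gsub(/foo/,"X");print}' "$f")"
same 'cat'     "$(cat "$f")"             "$(awk '{print}' "$f")"
```

Keep in mind that awk patterns are extended regular expressions, so a STRING containing characters such as + or ? may need escaping that plain grep would not need.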

That's all the time I have right now, so I will add a ...
TODO
show how to execute an external program using builtin - system() command
show how to do head/tail-like operations
show how to do math operations
show how to store variables and arrays
show how to do various loops
show how to do other stuff I am forgetting
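
Until those get written up, here is a rough sketch of what a few of the TODO items could look like. These are generic awk idioms, not the author's own follow-up, and `seq` is only used to generate throwaway input.

```shell
#!/bin/sh
# head -n 3: print the first three lines, then stop reading
seq 1 10 | awk 'NR>3{exit} {print}'

# tail -n 1: remember each line, print the last one at the end
seq 1 10 | awk '{last=$0} END{print last}'

# math: sum and average of a column
seq 1 10 | awk '{sum+=$1} END{print sum, sum/NR}'

# associative array plus a for-in loop: count how often each word occurs
printf 'a\nb\na\n' | awk '{n[$1]++} END{for (w in n) print w, n[w]}' | sort

# system(): run an external command from inside awk and keep its exit status
awk 'BEGIN{ status = system("true"); print "exit status:", status }'
```

The order of a for-in loop over an array is unspecified, which is why the word count is piped through sort.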

_________________
Web Programming - Pet Packaging 100 & 101

technosaurus
Posted: Wed 09 May 2012, 11:07

here is an example that is simple in awk, but would be pretty complex without it

example: targress file.tar.{gz,bz2,xz,...}
Code:
[ -f "$1" ] && tar -tvf "$1" >/tmp/tarfiles || exit
tar -xvf "$1" |awk '{
if ( $3 != "" ){
   size[$6]=$3
   tot+=$3
}else{
   subtot+=size[$1]
   printf "%d\n", 100 * subtot / tot
}}' /tmp/tarfiles -


explanation:
[ -f "$1" ] && tar -tvf "$1" >/tmp/tarfiles || exit
if the input is a file, get a long listing containing all of the file sizes (or exit)

tar -xvf "$1" |awk '{ ... }' /tmp/tarfiles -
go through the file /tmp/tarfiles first and then - (stdin from the tar -xvf "$1", which lists the files as they are decompressed)

if ( $3 != "" ){
size[$6]=$3
tot+=$3

if there is a size field (true only for lines from /tmp/tarfiles), store the size (field $3) in an associative array keyed by the file name (field $6), then add that size to the running total

}else{
subtot+=size[$1]
printf "%d\n", 100 * subtot / tot

since there is no field 3, we are processing the verbose output from extracting the tarball (each file name as it is extracted). We use that name as the key into the associative array, add the stored size to the subtotal, then use that subtotal to print the percentage

you can make this into a standalone script for use by yad or {,X,gtk}dialog by adding a #!/bin/sh to the top or as a function like
targress(){
#code here
}
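
To try the idea without a real package, here is a self-contained sketch that builds a tiny tarball and runs the same awk program over it. The file names, sizes and the function name `progress` are made up for the demo, and it assumes GNU or busybox tar, where `tar -tvf` puts the size in field 3 and the name in field 6.

```shell
#!/bin/sh
# Build a small throwaway tarball (all names here are hypothetical).
work=$(mktemp -d)
mkdir "$work/src" "$work/out"
printf '%100s' '' > "$work/src/a.txt"   # 100 bytes of spaces
printf '%300s' '' > "$work/src/b.txt"   # 300 bytes of spaces
tar -C "$work/src" -cf "$work/demo.tar" a.txt b.txt

# progress LISTING: read the tar -tvf listing first, then the names on stdin
progress() {
awk '{
if ( $3 != "" ){
   size[$6]=$3
   tot+=$3
}else{
   subtot+=size[$1]
   printf "%d\n", 100 * subtot / tot
}}' "$1" -
}

# Pass 1: long listing with sizes. Pass 2: names as they are extracted.
tar -tvf "$work/demo.tar" > "$work/tarfiles"
tar -C "$work/out" -xvf "$work/demo.tar" | progress "$work/tarfiles"
```

With this archive the output should be 25 followed by 100, since a.txt is a quarter of the total size. File names containing spaces would break the field numbering, so treat this as an illustration of the two-input trick rather than a robust tool.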

disciple

Joined: 20 May 2006
Posts: 6458
Location: Auckland, New Zealand

Posted: Thu 10 May 2012, 02:00

It would be good if you could add a one-sentence explanation of the purpose of that targress script :)

Quote:
Quite often we can drastically speed up a command by using awk built-ins (it takes time to call external programs)

So you're not saying for example that an awk script is faster than a sed script, just that a single awk script is faster than a script which calls multiple tools e.g. both awk and sed. And that awk is a particularly powerful tool, so there is a good chance you can do the whole task with it. Is that right?
What about the issue that many of those tools are built in to busybox? Is awk still faster?

_________________
DEATH TO SPREADSHEETS
- - -
Classic Puppy quotes
- - -
Beware the demented serfers!
sunburnt


Joined: 08 Jun 2005
Posts: 5042
Location: Arizona, U.S.A.

Posted: Thu 10 May 2012, 17:01

technosaurus brought this up a while back when I was writing a script library.
Awk was always faster than multiple piped commands.

Plus you can specify awk on the #! line at the start of the script to save an additional call.
The whole script needs to be written in awk this way, but it's fast.

technosaurus
Posted: Thu 10 May 2012, 21:11

the targress code just outputs the percentage that a tarball has been extracted ... good for use in a package installer to show the user how far it has to go ... for example
targress tarball.tar | Xdialog --progress "message here" 0 0 -

busybox gets somewhat closer if you build it with "prefer applets" (most distros do not ... nor does Barry, due to compatibility issues), but for many applets it still has to fork /proc/self/exe, and the extra streams and pipes still have to be set up ... if that happens in a loop of any kind, awk will win hands down. The busybox version of awk is extremely fast and nearly as capable as gawk, nawk and standard awk

awk seems difficult at first, but once you get it, it's actually pretty capable of doing things that we often use several other tools for. If you can use only one tool for a task, it is often both simpler and faster (not always: my jwm_menu_create only uses busybox ash, and it does a lot of stuff that would be faster/easier in awk... I just knew shell scripting better at the time)
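
To make the "loop of any kind" point concrete, here is a small sketch that computes the same sum twice: once in a shell loop that forks expr for every line, and once in a single awk process. The input data and the function names `sum_shell` and `sum_awk` are invented for the illustration.

```shell
#!/bin/sh
# Hypothetical input: 500 lines of "key value".
data=$(mktemp)
awk 'BEGIN{ for (i = 1; i <= 500; i++) print "k" i, i }' > "$data"

# Shell loop: starts one external expr process per input line.
sum_shell() {
  total=0
  while read -r key val; do
    total=$(expr "$total" + "$val")
  done < "$1"
  echo "$total"
}

# Single awk process: the loop over lines happens inside awk, with no forks.
sum_awk() { awk '{ total += $2 } END { print total }' "$1"; }

sum_shell "$data"
sum_awk "$data"
```

Both functions print the same total. To compare speed, put each one in its own script and run it under time; the gap grows with the number of lines because the shell version pays for one fork per line.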


disciple
Posted: Thu 10 May 2012, 22:29

Is http://awk.info/ the closest equivalent to http://sed.sourceforge.net/ (but presumably not written itself in awk)?

technosaurus
Posted: Tue 15 May 2012, 02:38

Ok, here it is as something somewhat useful - a tarball extractor that _should_ work with all versions of gtkdialog (2,3,4 and the backport to gtk1)... I used Xdialog for the progress bar, since not all versions of gtkdialog have it. This could probably be extended to work for other types of archives as well by using if (FILENAME ~ ".tar")...

Code:
targress(){
[ -f "$1" ] && TARBALL="$1" || return 1
shift
[ "$1" ] && tar -tvf "$TARBALL" $@ >/tmp/tarfiles
tar -xvf "$TARBALL" $@ |awk '{
if ( $3 != "" ){
   size[$6]=$3
   tot+=$3
}else{
   subtot+=size[$1]
   printf "%d\n", 100 * subtot / tot
}}' /tmp/tarfiles - |Xdialog --wmclass xmessage --progress "extracting ${TARBALL##*/}" 0 0 -
}

tar -tvf "$1" >/tmp/tarfiles

eval `awk 'BEGIN{print "<vbox><hbox><button><input file>/usr/share/mini-icons/pupget.xpm</input> \
<label>Extract</label></button><button cancel></button></hbox> \
<tree><label>Permissions|Size|Date|Time|Filename</label> \
<variable>TREE1</variable><width>400</width><height>150</height>" }
{print "<item>" $1 "|" $3 "|" $4 "|" $5 "|" $6 "</item>" }
END{print "</tree></vbox>" }' /tmp/tarfiles |gtkdialog1 -s`
[ "$EXIT" == "Extract" ] && targress "$1" $TREE1

Edit: fixed formatting related typo that broke the script

Last edited by technosaurus on Tue 15 May 2012, 23:12; edited 1 time in total
goingnuts

Joined: 07 Dec 2008
Posts: 783

Posted: Tue 15 May 2012, 15:06

Can't get it running (named the script "test"):
Code:
# ./test disktype-9.tar.bz2
./test: line 23: Error: command not found

line 23 is the line with:
Code:
END{print "</tree></vbox>" }' /tmp/tarfiles |gtkdialog1 -s`

my gtkdialog1 works and /tmp/tarfiles contains:
Code:
drwxr-xr-x root/root         0 2012-05-15 20:33 bin/
-rwxr-xr-x root/root    411644 2012-05-15 20:33 bin/disktype


technosaurus
Posted: Tue 15 May 2012, 23:10

oops, I must have accidentally deleted the "<" before the height tag after pasting it ... here it is working with busybox applets and gtk1 versions of gtkdialog and Xdialog
Code:
#!/bin/ash

targress(){
[ -f "$1" ] && TARBALL="$1" || return 1
shift
[ "$1" ] && busybox tar -tvf "$TARBALL" $@ >/tmp/tarfiles
busybox tar -xvf "$TARBALL" $@ |awk '{
if ( $3 != "" ){
   size[$6]=$3
   tot+=$3
}else{
   subtot+=size[$1]
   printf "%d\n", 100 * subtot / tot
}}' /tmp/tarfiles - |Xdialog --wmclass xmessage --progress "extracting ${TARBALL##*/}" 0 0 -
}

busybox tar -tvf "$1" >/tmp/tarfiles

eval `busybox awk '
BEGIN{print "<vbox><hbox><button><input file>/usr/share/mini-icons/pupget.xpm</input> \
<label>Extract</label></button><button cancel></button></hbox> \
<tree><label>Permissions|Size|Date|Time|Filename</label> \
<variable>TREE1</variable><width>400</width><height>150</height>" }
{print "<item>" $1 "|" $3 "|" $4 "|" $5 "|" $6 "</item>" }
END{print "</tree></vbox>" }' /tmp/tarfiles |gtkdialog1 -s`
[ "$EXIT" == "Extract" ] && targress "$1" $TREE1


goingnuts
Posted: Wed 16 May 2012, 11:12

Thanks! It works now - I had to make a slight modification, as gtkdialog1 only reports the value of the first column - and it seems that the gtkdialog2 tree code won't show the content, so I changed tree to table. I haven't got gtkdialog3 to run yet.
Code:
eval `busybox awk '
BEGIN{print "<vbox><hbox><button><input file>/usr/share/mini-icons/pupget.xpm</input> \
<label>Extract</label></button><button cancel></button></hbox> \
<table><label>Filename|Size     |Date       |Time     |Permissions</label> \
<variable>TREE1</variable><width>400</width><height>150</height>" }
{print "<item>" $6 "|" $3 "|" $4 "|" $5 "|" $1 "</item>" }
END{print "</table></vbox>" }' /tmp/tarfiles |gtkdialog1 -s`
[ "$EXIT" == "Extract" ] && echo "TREE is $TREE1" && targress "$1" $TREE1


I had to load a kernel-pkg to actually view the progress bar - but it works very well!
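
The column reordering can be tried on its own, without gtkdialog. This is a sketch that feeds the two listing lines quoted earlier in the thread through the same print statement; the function name `rows` is made up.

```shell
#!/bin/sh
# Two listing lines copied from the /tmp/tarfiles sample earlier in the thread.
listing='drwxr-xr-x root/root         0 2012-05-15 20:33 bin/
-rwxr-xr-x root/root    411644 2012-05-15 20:33 bin/disktype'

# Put the file name ($6) first so that it is the value gtkdialog1 reports.
rows() { awk '{print "<item>" $6 "|" $3 "|" $4 "|" $5 "|" $1 "</item>"}'; }

printf '%s\n' "$listing" | rows
```

This prints one <item> row per archive member with the file name in the first column, which is the form the table widget expects.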

technosaurus
Posted: Sat 19 May 2012, 12:59

goingnuts wrote:
Thanks! It works now - I had to make a slight modification, as gtkdialog1 only reports the value of the first column - and it seems that the gtkdialog2 tree code won't show the content, so I changed tree to table. I haven't got gtkdialog3 to run yet.
...
I had to load a kernel-pkg to actually view the progress bar - but it works very well!

good catch, I was only testing with the name ($6) & forgot to modify that when it changed. It would be nice if the first column would expand instead of the last, since it is the one that is actually used

yeah, the second part is pretty quick, but I may need to add
| tee /tmp/tarfile |
rather than pre-generating it, but I don't know if it will help - I don't think gtkdialog draws until the whole tree is loaded - I need to look into fixing that


goingnuts
Posted: Sat 19 May 2012, 17:05

The loading of the kernel pkg was rather slow...maybe also use the progress-bar when loading into gtkdialog? Or load part of the archive listing into gtkdialog - and then refresh the table when all files available? Or both...
Well - maybe this gets too much off topic - sorry for that!

technosaurus
Posted: Sat 19 May 2012, 23:49

goingnuts wrote:
The loading of the kernel pkg was rather slow...maybe also use the progress-bar when loading into gtkdialog? Or load part of the archive listing into gtkdialog - and then refresh the table when all files available? Or both...
Well - maybe this gets too much off topic - sorry for that!


with awk you can limit how many lines to list by testing FNR or NR < gtkdialog_limit (maybe 40 or so?) ... I'm not sure what the gtkdialog command to refresh is, or how to signal it to refresh, other than just redrawing in the END section using variables after the listing is complete (I'm pretty sure the tar listing is what takes so long) ... there is no real way to do a percent bar though, just a spinning hourglass (AFAIK, unless there is a way to quickly get the raw number of files in a tarball)
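
A minimal sketch of that NR test, assuming a limit of 40 (the variable `limit` stands in for "gtkdialog_limit", and the fake listing is generated only for the demo):

```shell
#!/bin/sh
limit=40   # assumed value; tune to whatever gtkdialog handles comfortably

# first_rows: print name|size for the first $limit entries, then a summary line
first_rows() {
  awk -v limit="$limit" '
    NR <= limit { print $6 "|" $3 }
    END { if (NR > limit) print "... and " (NR - limit) " more" }'
}

# Fake listing of 100 entries laid out like tar -tvf output.
awk 'BEGIN{ for (i = 1; i <= 100; i++)
  print "-rw-r--r-- root/root", i, "2012-05-15 20:33 file" i }' | first_rows
```

awk still reads the whole listing here so that END can report the total; if only the first rows matter, adding an exit once NR passes the limit would stop it early.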


goingnuts
Posted: Sun 20 May 2012, 10:55

gtkdialog refresh needs a trigger from one of the widgets, so it will not work...
It seems that gtkdialog1 autosizes the first column but gtkdialog2 does not...
The code below works in gtkdialog1 & 2; maybe some awk script can be used for the loading progress bar...
Code:
#!/bin/ash

echo "
targress(){   
[ -f \"\$1\" ] && TARBALL=\"\$1\" || return 1
shift
[ \"\$1\" ] && busybox tar -tvf \"\$TARBALL\" \$@ >/tmp/tarfiles
busybox tar -xvf \"\$TARBALL\" \$@ | awk '{
if ( \$3 != \"\" ){
   size[\$6]=\$3
   tot+=\$3
}else{   
   subtot+=size[\$1]
   printf \"%d\n\", 100 * subtot / tot
}}' /tmp/tarfiles - | Xdialog --title \"Archiver\" --wmclass gtkdialog2 --progress \"extracting \${TARBALL##*/}\" 0 0 -   
}
"> /tmp/ashfunc

tarfile="$1"

busybox tar -tvf "$tarfile" | busybox awk '{print $6"|"$3"|"$4"|"$5"|"$1}' > /tmp/tarfiles &

count=10
(while ps | grep -q '[t]ar -tvf';do sleep 1; count=$(expr $count + 2); echo $count; done) | Xdialog --title "Archiver" --wmclass gtkdialog2 --progress "Loading archive - Please wait..." 0 0

export MAIN_DIALOG='
<wtitle>Archiver</wtitle>
<vbox>
   <hbox>
      <button><input file>/usr/share/mini-icons/pupget.xpm</input>
      <label>Extract</label>
      <action>targress '$tarfile' $TREE1</action>
      </button>
      <button cancel></button>
   </hbox>
   <table>
      <label>Filename|Size     |Date           |Time       |Permissions</label>
      <variable>TREE1</variable>
      <width>500</width><height>200</height>
      <input>cat /tmp/tarfiles</input>
   </table>
</vbox>'


gtkdialog1 --program=MAIN_DIALOG -i /tmp/ashfunc
[Attachment: snap0001.png, 82.09 KB]


technosaurus
Posted: Wed 23 May 2012, 01:22

@goingnuts thanks for the fixes, I guess I'll try to come up with some more examples ... probably some kind of universal daemon process that works with sit (my simple icon tray) to handle multiple tray applets in one process ... btw I could probably make something similar to sit for gtk1, but it would need to be swallowed by the tray... didn't you already do one using only xlib and xpm though?