[bash] variable persistence in piping
neurino


Joined: 15 Oct 2009
Posts: 360

Posted: Wed 28 Jul 2010, 09:39    Subject: [bash] variable persistence in piping

I'm learning a bit of shell scripting and trying to set up a simple Gmail checker script.

This is what I got reading here and there:

Code:

#!/bin/bash

opentag=\<title\>
closetag=\</title\>

#get Gmail rss atom
rss=$(curl -su USER:PASS https://mail.google.com/mail/feed/atom)

#find <title> lines
lines=$(echo "$rss" | grep "$opentag")

#line number
lnum=0

#iterate each line
echo "$lines" | while read -r line
do
    #strip open tag
    line=${line#*$opentag}
    #strip close tag
    line=${line%$closetag*}
    #skip the first occurrence since it's the title of the whole XML feed
    if (( lnum )); then
        echo "$lnum $line"
    fi
    (( lnum ++ ))
done

#show number of messages
echo "$lnum messages"



The problem is that the changes made to lnum inside the piped while loop are not visible outside it, so at the end I get "0 messages".

The only solution I found is to save the data to files and avoid piping, like this:

Code:

#!/bin/bash

opentag=\<title\>
closetag=\</title\>

#get Gmail rss atom on temp file
curl -su USER:PASS https://mail.google.com/mail/feed/atom > /tmp/gmailrss

#find <title> lines
grep "$opentag" /tmp/gmailrss > /tmp/gmaillines

#line number
lnum=0

#iterate each line
while read -r line
do
    #strip open tag
    line=${line#*$opentag}
    #strip close tag
    line=${line%$closetag*}
    #skip the first occurrence since it's the title of the whole XML feed
    if (( lnum )); then
        echo "$lnum $line"
    fi
    (( lnum ++ ))
done < /tmp/gmaillines

echo "$lnum messages"


I'm used to programming in Python, and using two files instead of two variables is... well... I won't even start listing the cons.

I hope there's a better way than piping to feed the content of my variables to read and grep, so that no new subshell is started and I can use my lnum counter.

I know this is more of a topic for a bash scripting forum, but I know there are a lot of good shell script writers here, so why not?
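The difference between the two scripts above can be reproduced without the Gmail feed. This is only a minimal sketch, assuming bash with its default settings (the lastpipe option off); the function names and the three-line sample input are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper names; the sample input stands in for the feed titles.

# The loop is the last stage of a pipeline, so it runs in a subshell:
# the increments happen on a copy of n and are lost afterwards.
count_with_pipe() {
    n=0
    printf 'a\nb\nc\n' | while read -r line; do
        n=$(( n + 1 ))
    done
    echo "$n"
}

# The same loop fed by a redirection runs in the current shell,
# so n keeps its final value.
count_with_redirect() {
    n=0
    tmp=$(mktemp)
    printf 'a\nb\nc\n' > "$tmp"
    while read -r line; do
        n=$(( n + 1 ))
    done < "$tmp"
    rm -f "$tmp"
    echo "$n"
}

count_with_pipe       # prints 0
count_with_redirect   # prints 3
```

The only difference between the two functions is how the loop gets its input, which is what decides whether the counter survives.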
neurino


Joined: 15 Oct 2009
Posts: 360

Posted: Wed 28 Jul 2010, 09:59

This can be a workaround, but I might need to get more information out of the loop, which I still run...

Code:

#!/bin/bash

opentag=\<title\>
closetag=\</title\>

#get Gmail rss atom
rss=$(curl -su USER:PASS https://mail.google.com/mail/feed/atom)

#find <title> lines
lines=$(echo "$rss" | grep "$opentag")

#iterate each line
echo "$lines" | while read -r line
do
    #strip open tag
    line=${line#*$opentag}
    #strip close tag
    line=${line%$closetag*}
    #skip the first occurrence since it's the title of the whole XML feed
    if (( lnum )); then echo "$lnum $line"
    else lnum=0
    fi
    (( lnum ++ ))
done

#show number of messages
lnum=$(echo "$lines" | wc -l)
echo "Total $lnum messages"

ken geometrics

Joined: 23 Jan 2009
Posts: 76
Location: California

Posted: Wed 28 Jul 2010, 10:43    Subject: Re: [bash] variable persistence in piping

neurino wrote:
I'm learning a bit of shell scripting and trying to set up a simple Gmail checker script.

This is what I got reading here and there:

Code:

echo "$lines" | while read -r line


This line causes bash to run the while loop in a new subshell and feed the output of the echo into its stdin. Try a "here-document" instead.

Code:

while read -r line ; do
echo "do stuff"
done <<XYZZY
$lines
XYZZY


This shouldn't start a new shell. If you don't start a new shell, you don't lose your variables.
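Applied to the counter from the first post, the here-document idea might look like the sketch below. The sample titles are made-up stand-ins for the grep output, and EOF is just an arbitrary delimiter like XYZZY above.

```shell
#!/bin/bash
# Sample data standing in for the grep output (hypothetical values).
lines='<title>Gmail - Inbox</title>
<title>First subject</title>
<title>Second subject</title>'

lnum=0
# The here-document is a redirection, not a pipeline stage,
# so the loop body runs in the current shell.
while read -r line; do
    lnum=$(( lnum + 1 ))
done <<EOF
$lines
EOF

echo "$lnum lines read"   # prints "3 lines read"
```

In bash specifically, a here-string (`done <<< "$lines"`) should behave the same way with less typing.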
neurino


Joined: 15 Oct 2009
Posts: 360

Posted: Wed 28 Jul 2010, 10:53

Pretty brilliant!

thank you
technosaurus


Joined: 18 May 2008
Posts: 4134

Posted: Wed 28 Jul 2010, 20:38

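You can set up each section of code as a function and have it hand back values of your choosing (return itself only carries a numeric exit status, so anything else has to be printed and captured), or you can use export, although export only passes variables down to child processes and is probably not secure to use with the user and password.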
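A rough sketch of the function approach, assuming the value is handed back on stdout and captured with command substitution. The name count_titles and the sample feed are hypothetical and not part of the original script.

```shell
#!/bin/bash
# count_titles is a hypothetical helper, not part of the original script.
# "return" can only carry a numeric exit status (0-255), so the count is
# written to stdout and captured with $(...).
count_titles() {
    printf '%s\n' "$1" | grep -c '<title>'
}

feed='<title>Gmail - Inbox</title>
<title>First subject</title>
<summary>no title here</summary>'

total=$(count_titles "$feed")
echo "$total title lines"   # prints "2 title lines"
```

The function body still runs in a subshell because of $(...), but that no longer matters: the result travels through its output instead of through a shared variable.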
neurino


Joined: 15 Oct 2009
Posts: 360

Posted: Thu 29 Jul 2010, 03:47

You're right, but for what I needed in this case (just keeping variables inside and outside a loop) the use of functions would be overkill.
Just think if I had to change the value of 10 existing vars inside the loop: I would have to pass them to the function and get them all returned back...

A good solution for other cases though... thanks for answering.
potong

Joined: 06 Mar 2009
Posts: 88

Posted: Thu 29 Jul 2010, 09:39

I have found that bash scripts can be a lot simpler when you can arrange the data the way you want to end up using it.

Bash is the "glue" for the countless other command line tools.

For instance: why stop at grep, having retrieved information from the web?

Other little languages can do far more, think sed/awk/perl etc

If you are going to invoke a 2nd process, make it count!

Here's an alternative:
Code:
#!/bin/bash

# set some variables
authority="USER:PASSWORD@"
url="https://${authority}mail.google.com/mail/feed/atom"
db_file=/tmp/debug.txt
tag="title"

# set IFS to newline for slurping lines into arrays
IFS="
"
# save result of gmail messages into an array
# save results of curl into a debug file (drop when goes to production)
# use sed to drop first tag (line 3) and strip remaining ones
line=($(curl -s "$url" | tee "$db_file" | sed -nr '3d;s|<('"$tag"'>)(.*)</\1|\2|p'))

# show number of messages
echo "${#line[@]} messages"
# show messages (subjects are passed as arguments so a % in them is not read as a format)
for (( i=0; i<${#line[@]}; i++ )); do printf '[%02d] %s\n' "$((i+1))" "${line[i]}"; done


Also see here for more bash best practices.

HTH

Potong
neurino


Joined: 15 Oct 2009
Posts: 360

Posted: Thu 29 Jul 2010, 10:38

Thank you, I'll have to take a closer look at sed and awk.

Since I'm trying to learn a bit of shell scripting, I'd prefer to avoid other 'proper' languages like Perl and similar: this would be an easy task for me in Python, and I could use bash just to call a Python script, but I wouldn't learn any bash that way ^^

Anyway, using bash only as glue is THE right way... and thanks for the very useful wiki you linked.

P.S.: your script needs a fix, since it outputs a new mail for each word in the subject:

Code:

# set IFS to newline for slurping lines into arrays
IFS=$'\n'


found googling around for IFS
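A small sketch of what the IFS setting changes when a string is split into an array. The two sample subjects are made up, and the set -f guard against glob expansion is an addition that does not appear in the scripts above.

```shell
#!/bin/bash
# Hypothetical sample subjects; each line should become one array element.
subjects='Meeting at noon
Re: your order'

# Default IFS (space, tab, newline): every word becomes an element.
words=($subjects)

# IFS limited to newline: one element per line.
# set -f keeps characters like * in a subject from being expanded as a glob.
old_ifs=$IFS
IFS=$'\n'
set -f
lines=($subjects)
set +f
IFS=$old_ifs

echo "${#words[@]} elements with default IFS"   # prints "6 elements with default IFS"
echo "${#lines[@]} elements with newline IFS"   # prints "2 elements with newline IFS"
```

Restoring IFS afterwards keeps the change from leaking into the rest of a longer script.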
neurino


Joined: 15 Oct 2009
Posts: 360

Posted: Thu 29 Jul 2010, 11:08

Another thing I hate about shell scripting and its tools is whitespace handling: readability is awful... even more so for someone coming from Python!

I had to re-indent and space out your code to understand how it works.
Then I had to revert that, because otherwise it doesn't work...

Edit: extra spaces don't work, but new lines do... sorry...

The way you suggested is way way simpler... here are names and links extracted the same way

Code:


#!/bin/bash

# set some variables
authority="USER:PASS@"
url="https://${authority}mail.google.com/mail/feed/atom"
db_file=/tmp/debug.txt

titletag="title"
nametag="name"
linktag="link"

# set IFS to newline for slurping lines into arrays
IFS=$'\n'

#get feed
feed=$(curl -s $url)
 
# save subjects into an array
titles=($(echo "$feed" | \
    sed -nr '
        #cut 3rd line
        3d
        #get only 2nd group ( \2 ) in regexp
        s|<('$titletag'>)(.*)</\1|\2|p
        ') \
    )

# save names into another array
names=($(echo "$feed" | \
    sed -nr '
        #get only 2nd group ( \2 ) in regexp
        s|<('$nametag'>)(.*)</\1|\2|p
        ') \
    )

# save links to conversations
links=($(echo "$feed" | \
    sed -nr '
        s|<'$linktag'.*href="(.*)".*/>|\1|p
        ') \
    )

# show number of messages
echo "${#titles[@]} messages"
# show messages
for (( i=0; i<${#titles[@]}; i++ )); do
    printf '[%02d] %s | %s\n' "$(( i + 1 ))" "${names[i]}" "${titles[i]}"
    printf 'URL: %s\n' "${links[i]}"
done


Last edited by neurino on Thu 29 Jul 2010, 12:03; edited 2 times in total
neurino


Joined: 15 Oct 2009
Posts: 360

Posted: Thu 29 Jul 2010, 11:34

Just a question: does sed work on a line-by-line basis?

If so, there's no chance to expand the regexp to match only <title> tags enclosed in <entry> tags:

Code:

sed -nr 's|<entry>.*<('$titletag'>)(.*)</\1.*</entry>|\2|p'



P.S.: Forum admins, please switch code blocks to a monospaced font! It's all about readability... come on, this isn't Linuxish!
Doesn't your terminal use that kind of font? I guess there's a precise reason for that...


Sorry for the big font.
potong

Joined: 06 Mar 2009
Posts: 88

Posted: Fri 30 Jul 2010, 02:51

neurino:

If you look at the XML file provided by Google via the curl command, you will see that the <entry> and </entry> tags are each on separate lines. So....
Code:
tag1=entry tag2=title
line=($(curl -s $url |tee $db_file |sed -nr '/<'$tag1'>/,/<\/'$tag1'>/s|<('$tag2'>)(.*)</\1|\2|p'))

should fit the bill

However, if you have a not-so-nicely formatted XML file, try:
Code:
tag1=entry tag2=title
line=($(curl -s $url |xmllint --format - |tee $db_file |sed -nr '/<'$tag1'>/,/<\/'$tag1'>/s|<('$tag2'>)(.*)</\1|\2|p'))

A good sed tutorial can be found here.
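To see the address range at work without hitting Gmail, here is a minimal sketch on a made-up miniature feed with one tag per line. The function name is hypothetical, and -E is used in place of -r (GNU sed accepts both for extended regular expressions).

```shell
#!/bin/bash
# Made-up miniature of the feed layout: one tag per line.
feed='<feed>
<title>Gmail - Inbox</title>
<entry>
<title>First subject</title>
</entry>
<entry>
<title>Second subject</title>
</entry>
</feed>'

# /<entry>/,/<\/entry>/ limits the substitution to lines between the two tags,
# so the feed-level <title> on line 2 is never printed.
extract_entry_titles() {
    sed -n -E '/<entry>/,/<\/entry>/s|<title>(.*)</title>|\1|p'
}

printf '%s\n' "$feed" | extract_entry_titles
# prints:
# First subject
# Second subject
```

This only works because each tag sits on its own line, which is what the xmllint --format step above is meant to guarantee.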

HTH

Potong

p.s. a tip to grab code from the browser is to select it then switch to a terminal and type:
Code:
xclip -o>filename && chmod +x filename && ./filename

of course make sure the code is not malicious first!
neurino


Joined: 15 Oct 2009
Posts: 360

Posted: Fri 30 Jul 2010, 11:05

potong wrote:
neurino:

If you look at the XML file provided by Google via the curl command, you will see that the <entry> and </entry> tags are each on separate lines. So....
Code:
tag1=entry tag2=title
line=($(curl -s $url |tee $db_file |sed -nr '/<'$tag1'>/,/<\/'$tag1'>/s|<('$tag2'>)(.*)</\1|\2|p'))

should fit the bill



That's going to come in handy, since the "name" tag is used not only for the mail author but also for "contributors", so I need some way to parse the XML properly...
ljfr

Joined: 23 Apr 2009
Posts: 176

Posted: Fri 30 Jul 2010, 14:00    Subject: xmllint shell

Hi,

To do basic parsing you could have a look at xmllint shell:
Code:
echo "cat //contributors/name" | xmllint $my_xml_file_path --shell

...but the results would need some formatting,
or you could use a small XML parser built using libxml2 (find an example attached),

regards,
Attachment: xmltool.c.tar.bz2 (2.87 KB)
technosaurus


Joined: 18 May 2008
Posts: 4134

Posted: Fri 30 Jul 2010, 18:26

you may also be able to use sed. See the script here:
http://www.dotkam.com/2007/04/04/sed-to-parse-and-modify-xml-element-nodes/

Edit
for multiple parameters within the same tag you'd need another function that does something like

Code:
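my_untested_function(){
    OUT=""
    # $@ is left unquoted on purpose so the tag contents split into fields
    for x in $@; do
        # keep fields matching PARAM and replace everything after "=" with NEWVALUE
        x=`echo "$x" | grep "$PARAM" | sed "s/=.*/=$NEWVALUE/"` && [ -n "$x" ] && OUT="$OUT $x"
    done
    # return only accepts a numeric status, so print the result instead
    echo "$OUT"
}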


where $@ (input) is the entire contents of the tag, PARAM is the field before the "=" (name, etc...) and $NEWVALUE is the new value that you want to assign to the field

neurino


Joined: 15 Oct 2009
Posts: 360

Posted: Sat 31 Jul 2010, 08:43

Thank you all. Reading the sed tutorial linked above, I found out how to filter tags according to the preceding line (for example, only <name> tags that follow <author> ones) using the N command, and also how to match empty subjects / author names, which otherwise cause wrong array splits.

Edit: here is the code, if someone wants to test it:

Code:

#!/bin/bash

# set some variables
authority="USER:PASSWD@"
url="https://${authority}mail.google.com/mail/feed/atom"

# set IFS to newline for slurping lines into arrays
IFS=$'\n'

#get feed
feed=$(curl -s "$url")
 
#<author>=><name>
authors=($(echo "$feed" | \
    sed -nr '\|<author>| {
            N
            \|<author>.*<name>.*</name>| {
            #in case of NOT empty tag
            s|<author>.*<name>(.+)</name>|\1| p
            #in case of empty tag
            s|<author>.*<name></name>|--| p
        }
    }'
))

#<entry>=><title>
subjects=($(echo "$feed" | \
    sed -nr '\|<entry>| {
            N
            \|<entry>.*<title>.*</title>| {
            #in case of NOT empty tag
            s|<entry>.*<title>(.+)</title>|\1| p
            #in case of empty tag
            s|<entry>.*<title></title>|no subject| p
        }
    }'
))

#<summary>=><link>
links=($(echo "$feed" | \
    sed -nr '\|<summary>| {
            N
            \|<summary>.*<link.*/>| {
            #([^"]*) instead of .* is sed way for non greedy matches
            s|<summary>.*<link.*href="([^"]*)".*/>|\1| p
        }
    }'
))

# show number of messages
echo "${#authors[@]} messages"
# show messages
for (( i=0; i<${#authors[@]}; i++ )); do
    printf '[%02d] %s | %s\n' "$(( i + 1 ))" "${authors[i]}" "${subjects[i]}"
    printf 'URL: %s\n\n' "${links[i]}"
done
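For anyone who wants to try the N trick without a Gmail account, the sketch below runs the same idea on a made-up fragment. The sample data and the function name are assumptions, and the empty-tag handling from the script above is left out to keep it short.

```shell
#!/bin/bash
# Made-up fragment: <name> appears under both <author> and <contributor>.
feed='<entry>
<title>Hello</title>
<author>
<name>Alice</name>
</author>
<contributor>
<name>Bob</name>
</contributor>
</entry>'

# On a line containing <author>, N appends the next input line to the
# pattern space, so the substitution sees both lines at once and only
# a <name> that directly follows <author> is printed.
author_names() {
    sed -n -E '/<author>/{
        N
        s|<author>.*<name>(.*)</name>|\1|p
    }'
}

printf '%s\n' "$feed" | author_names   # prints "Alice"
```

The contributor's name is skipped because its line is never joined to an <author> line.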