[bash] variable persistence in piping

neurino
Posts: 362
Joined: Thu 15 Oct 2009, 13:08

[bash] variable persistence in piping

#1 Post by neurino »

I'm learning a bit of shell scripting and trying to set up a simple Gmail checker script.

This is what I got reading here and there:

Code: Select all

#!/bin/bash

opentag=\<title\>
closetag=\</title\>

#get Gmail rss atom
rss=$(curl -su USER:PASS https://mail.google.com/mail/feed/atom)

#find <title> lines
lines=$(echo "$rss" | grep "$opentag")

#line number
lnum=0

#iterate each line
echo "$lines" | while read -r line
do
    #strip open tag
    line=${line#*$opentag}
    #strip close tag
    line=${line%$closetag*}
    #skip the first occurrence since it's the title of the whole XML feed
    if (( lnum )); then
        echo "$lnum $line"
    fi
    (( lnum ++ ))
done

#show number of messages
echo "$lnum messages"

The problem is that the lnum variable I declare seems not to exist inside the piped while loop, and it doesn't get updated outside it either, so at the end I get "0 messages".

The only solution I found is to save variables on files and avoid piping like this:

Code: Select all

#!/bin/bash

opentag=\<title\>
closetag=\</title\>

#get Gmail rss atom into a temp file
curl -su USER:PASS https://mail.google.com/mail/feed/atom > /tmp/gmailrss

#find <title> lines
grep "$opentag" /tmp/gmailrss > /tmp/gmaillines

#line number
lnum=0

#iterate each line
while read -r line
do
    #strip open tag
    line=${line#*$opentag}
    #strip close tag
    line=${line%$closetag*}
    #skip the first occurrence since it's the title of the whole XML feed
    if (( lnum )); then
        echo "$lnum $line"
    fi
    (( lnum ++ ))
done < /tmp/gmaillines

echo "$lnum messages"
I'm used to programming in Python, and using two files instead of two variables is... well... I won't even start listing the cons.

I hope there's a better way than piping to feed the content of my variables to read and grep, so that no new subshell is started and I can use my lnum counter.

I know this topic belongs on a bash scripting forum, but I know there are a lot of good shell script writers here, so why not? :roll:

neurino
Posts: 362
Joined: Thu 15 Oct 2009, 13:08

#2 Post by neurino »

This can be a workaround, but I may still need to get more information out of the loop I run...

Code: Select all

#!/bin/bash

opentag=\<title\>
closetag=\</title\>

#get Gmail rss atom
rss=$(curl -su USER:PASS https://mail.google.com/mail/feed/atom)

#find <title> lines
lines=$(echo "$rss" | grep "$opentag")

#iterate each line
echo "$lines" | while read -r line
do
    #strip open tag
    line=${line#*$opentag}
    #strip close tag
    line=${line%$closetag*}
    #skip the first occurrence since it's the title of the whole XML feed
    if (( lnum )); then echo "$lnum $line"
    else lnum=0
    fi
    (( lnum ++ ))
done

#show number of messages
lnum=$(echo "$lines" | wc -l)
echo "Total $lnum messages"


ken geometrics
Posts: 76
Joined: Fri 23 Jan 2009, 14:59
Location: California

Re: [bash] variable persistence in piping

#3 Post by ken geometrics »

neurino wrote:I'm learning a bit of shell scripting and trying to set up a simple Gmail checker script.

This is what I got reading here and there:

Code: Select all

echo "$lines" | while read -r line
This line makes bash run the while loop in a new subshell, with the output of the echo fed into its stdin. Try a "here-document" instead.

Code: Select all

while read -r line ; do
echo "do stuff"
done <<XYZZY
$lines
XYZZY
Shouldn't cause a new shell. If you don't cause the new shell, you don't lose your variables.
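
A minimal sketch of the difference, using made-up sample titles in place of the real feed (the names piped and kept are only for illustration):

```bash
#!/bin/bash
# Sample data standing in for the grep output (made-up titles).
lines='<title>Gmail - Inbox</title>
<title>first subject</title>
<title>second subject</title>'

# Piped loop: the while body runs in a subshell, so the increments are lost.
piped=0
echo "$lines" | while read -r line; do piped=$(( piped + 1 )); done

# Here-document loop: the while body runs in the current shell.
kept=0
while read -r line; do
    kept=$(( kept + 1 ))
done <<EOF
$lines
EOF

echo "piped=$piped kept=$kept"
```

With bash's default settings the first counter should still be 0 after its loop, while the second one reflects the three lines read.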

neurino
Posts: 362
Joined: Thu 15 Oct 2009, 13:08

#4 Post by neurino »

pretty genial 8)

thank you

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO
Contact:

#5 Post by technosaurus »

You can set up each section of code as a function and have it hand back the values you choose (return itself only carries a numeric exit status, so print anything else and capture it with $(...)), or you can use export (probably not secure to do export with user and password, though).
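
A rough sketch of the function approach, assuming the value is printed on stdout and captured by the caller. count_titles and the sample data are hypothetical, not part of the scripts above:

```bash
#!/bin/bash
# count_titles is a hypothetical helper for illustration only.
# It prints its result on stdout; the exit status only signals success or failure.
count_titles() {
    local feed="$1" n
    # grep -c prints the number of matching lines (0 if there are none)
    n=$(grep -c '<title>' <<<"$feed")
    echo "$n"
    # exit status: 0 if at least one title line was found
    [ "$n" -gt 0 ]
}

sample='<title>Gmail - Inbox</title>
<title>hello</title>'

if total=$(count_titles "$sample"); then
    echo "$total title lines"
fi
```

Each value passed back this way costs a command substitution, which is part of why this gets clumsy when many variables have to change at once.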

neurino
Posts: 362
Joined: Thu 15 Oct 2009, 13:08

#6 Post by neurino »

You're right, but for what I needed in this case (just keeping variables inside and outside a loop) the use of functions would be overkill.
Just think if I had to change the value of 10 existing vars inside the loop: I would have to pass them all to the function and get them returned back...

A good solution for other cases though... :idea: Thanks for answering.

potong
Posts: 88
Joined: Fri 06 Mar 2009, 04:01

#7 Post by potong »

I have found that bash scripts can be a lot simpler when you can arrange the data the way you want to end up using it.

Bash is the "glue" for the countless other command line tools.

For instance: why stop at grep, having retrieved information from the web?

Other little languages can do far more, think sed/awk/perl etc

If you are going to invoke a 2nd process, make it count!

Here's an alternative:

Code: Select all

#!/bin/bash

# set some variables
authority="USER:PASSWORD@"
url="https://${authority}mail.google.com/mail/feed/atom"
db_file=/tmp/debug.txt
tag="title"

# set IFS to newline for slurping lines into arrays
IFS="
"
# save result of gmail messages into an array
# save results of curl into a debug file (drop when goes to production)
# use sed to drop first tag (line 3) and strip remaining ones
line=($(curl -s $url |tee $db_file |sed -nr '3d;s|<('$tag'>)(.*)</\1|\2|p'))

# show number of messages
echo "${#line[@]} messages" 
# show messages
for (( i=0;i<${#line[@]};i++)){ printf "[%02d] %s\n" $((i+1)) "${line[i]}"; }
Also see here for more bash best practices.

HTH

Potong

neurino
Posts: 362
Joined: Thu 15 Oct 2009, 13:08

#8 Post by neurino »

Thank you, I'll have to take a closer look at sed and awk.

Since I'm trying to learn a bit of shell scripting I'd prefer to avoid other 'proper' languages like Perl and similar: this would be an easy task for me in Python, and I could use bash just to call the Python script, but I wouldn't learn any bash that way ^^

Anyway, using bash only as glue is THE right way... and thanks for the very useful wiki you linked.

P.S.: your script needs a fix, it outputs a new mail for each word in the subject:

Code: Select all

# set IFS to newline for slurping lines into arrays 
IFS=$'\n'
(found by googling around for IFS)
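
A small sketch of what IFS changes when a string is split into an array. The two subjects are made up, and old_ifs is just an illustrative name:

```bash
#!/bin/bash
subjects='Re: lunch on Friday
Your order has shipped'

# Default IFS (space, tab, newline): every word becomes an element.
words=($subjects)

# Newline-only IFS: every line becomes an element.
old_ifs=$IFS
IFS=$'\n'
lines=($subjects)
IFS=$old_ifs

echo "${#words[@]} words, ${#lines[@]} lines"
```

On bash 4 or later, mapfile -t lines <<<"$subjects" is another option that reads one line per element without touching IFS.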

neurino
Posts: 362
Joined: Thu 15 Oct 2009, 13:08

#9 Post by neurino »

Another thing I hate about shell scripting and its tools is whitespace handling: readability is awful... even more so for someone coming from Python!

I had to reindent and space out your code to understand how it works.
Given that I had to reset otherwise it doesn't work... :?

edit: no spaces but new lines work... :roll: sorry...

The way you suggested is way, way simpler... here are names and links extracted the same way:

Code: Select all


#!/bin/bash 

# set some variables 
authority="USER:PASS@" 
url="https://${authority}mail.google.com/mail/feed/atom" 
db_file=/tmp/debug.txt 

titletag="title" 
nametag="name"
linktag="link"

# set IFS to newline for slurping lines into arrays 
IFS=$'\n'

#get feed
feed=$(curl -s $url)
 
# save subjects into an array 
titles=($(echo "$feed" | \
    sed -nr '
        #cut 3rd line
        3d
        #get only 2nd group ( \2 ) in regexp
        s|<('$titletag'>)(.*)</\1|\2|p 
        ') \
    )

# save names into another array
names=($(echo "$feed" | \
    sed -nr '
        #get only 2nd group ( \2 ) in regexp
        s|<('$nametag'>)(.*)</\1|\2|p 
        ') \
    )

# save links to conversations
links=($(echo "$feed" | \
    sed -nr '
        s|<'$linktag'.*href="(.*)".*/>|\1|p 
        ') \
    )

# show number of messages 
echo "${#titles[@]} messages" 
# show messages 
for (( i=0; i<${#titles[@]}; i++)){
    printf "[%02d] %s | %s\n" $(( i + 1 )) "${names[i]}" "${titles[i]}";
    printf "URL: %s\n" "${links[i]}";
}

Last edited by neurino on Thu 29 Jul 2010, 16:03, edited 2 times in total.

neurino
Posts: 362
Joined: Thu 15 Oct 2009, 13:08

#10 Post by neurino »

Just a question: does sed work on a line-by-line basis?

If so, there's no chance of expanding the regexp to match only <title> tags enclosed in <entry> tags:

Code: Select all

sed -nr 's|<entry>.*<('$titletag'>)(.*)</\1.*</entry>|\2|p'

P.S.: Forum admins, please switch code blocks to a monospaced font! It's all about readability... come on, it's not Linuxish!
Doesn't your terminal use that kind of font? I guess there's a precise reason for that... :roll:


sorry for the big font

potong
Posts: 88
Joined: Fri 06 Mar 2009, 04:01

#11 Post by potong »

neurino:

If you look at the XML file provided by Google via the curl command, you will see that the <entry> and </entry> tags are each on separate lines. So...

Code: Select all

tag1=entry tag2=title
line=($(curl -s $url |tee $db_file |sed -nr '/<'$tag1'>/,/<\/'$tag1'>/s|<('$tag2'>)(.*)</\1|\2|p'))
should fit the bill

However if you have a not so nicely formatted xml file try:

Code: Select all

tag1=entry tag2=title
line=($(curl -s $url |xmllint --format - |tee $db_file |sed -nr '/<'$tag1'>/,/<\/'$tag1'>/s|<('$tag2'>)(.*)</\1|\2|p'))
A good sed tutorial can be found here.
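
A sketch of the address-range idea on a made-up feed fragment (the real Gmail feed has more tags per entry). It uses -E, which GNU sed treats the same as -r:

```bash
#!/bin/bash
# Made-up feed fragment for illustration.
feed='<feed>
<title>Gmail - Inbox for USER</title>
<entry>
<title>first subject</title>
</entry>
<entry>
<title>second subject</title>
</entry>
</feed>'

tag1=entry tag2=title
# The address range limits the substitution to lines between <entry> and </entry>,
# so the feed-level <title> on the second line is never touched or printed.
titles=$(sed -nE "/<$tag1>/,/<\/$tag1>/ s|<($tag2)>(.*)</\1>|\2|p" <<<"$feed")

echo "$titles"
```

sed still works one line at a time here; the range only decides which lines the substitution is tried on.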

HTH

Potong

p.s. A tip to grab code from the browser: select it, then switch to a terminal and type:

Code: Select all

xclip -o>filename && chmod +x filename && ./filename
of course make sure the code is not malicious first!

neurino
Posts: 362
Joined: Thu 15 Oct 2009, 13:08

#12 Post by neurino »

potong wrote:neurino:

If you look at the xml file provided by google via the curl command you will see that each <entry>/ </entry> tags are on separate lines. So....

Code: Select all

tag1=entry tag2=title
line=($(curl -s $url |tee $db_file |sed -nr '/<'$tag1'>/,/<\/'$tag1'>/s|<('$tag2'>)(.*)</\1|\2|p'))
should fit the bill
It comes in handy, since the "name" tag is used not only for the mail author but also for "contributors", so I need some way to parse the XML...

ljfr
Posts: 176
Joined: Thu 23 Apr 2009, 08:35

xmllint shell

#13 Post by ljfr »

Hi,

To do basic parsing you could have a look at xmllint shell:

Code: Select all

echo "cat //contributors/name" | xmllint $my_xml_file_path --shell
...but the results would need some formatting,
or you could use a small XML parser built using libxml2 (find an example attached),

regards,
Attachments
xmltool.c.tar.bz2
(2.87 KiB) Downloaded 419 times

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO
Contact:

#14 Post by technosaurus »

you may also be able to use sed. See the script here:
http://www.dotkam.com/2007/04/04/sed-to ... ent-nodes/

Edit
for multiple parameters within the same tag you'd need another function that does something like

Code: Select all

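my_untested_function(){
    # print each field of the tag that matches $PARAM, with its value replaced
    OUT=""
    for x in $@; do
        # sed -n ... p only prints when the field name matches
        x=$(echo "$x" | sed -n "s/^\($PARAM\)=.*/\1=$NEWVALUE/p")
        [ -n "$x" ] && OUT="$OUT $x"
    done
    echo "$OUT"   # return can only pass back a numeric status, so print instead
}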
where $@ (input) is the entire contents of the tag, PARAM is the field before the "=" (name, etc...) and $NEWVALUE is the new value that you want to assign to the field

neurino
Posts: 362
Joined: Thu 15 Oct 2009, 13:08

#15 Post by neurino »

Thank you all. Reading the sed tutorial linked above, I found out how to filter tags according to the preceding line (like only <name> tags after <author> ones) using the N command, also matching empty subjects / author names, which would otherwise cause wrong array splits.

edit, here is the code, if someone wants to test it:

Code: Select all

#!/bin/bash 

# set some variables 
authority="USER:PASSWD@" 
url="https://${authority}mail.google.com/mail/feed/atom" 

# set IFS to newline for slurping lines into arrays 
IFS=$'\n'

#get feed
feed=$(curl -s "$url")
 
#<author>=><name>
authors=($(echo "$feed" | \
    sed -nr '\|<author>| {
            N
            \|<author>.*<name>.*</name>| {
            #in case of NOT empty tag
            s|<author>.*<name>(.+)</name>|\1| p
            #in case of empty tag
            s|<author>.*<name></name>|--| p
        }
    }'
))

#<entry>=><title>
subjects=($(echo "$feed" | \
    sed -nr '\|<entry>| {
            N
            \|<entry>.*<title>.*</title>| {
            #in case of NOT empty tag
            s|<entry>.*<title>(.+)</title>|\1| p
            #in case of empty tag
            s|<entry>.*<title></title>|no subject| p
        }
    }'
))

#<summary>=><link>
links=($(echo "$feed" | \
    sed -nr '\|<summary>| {
            N
            \|<summary>.*<link.*/>| {
            #([^"]*) instead of .* is the sed way to get non-greedy matches
            s|<summary>.*<link.*href="([^"]*)".*/>|\1| p
        }
    }'
))

# show number of messages 
echo "${#authors[@]} messages" 
# show messages 
for (( i=0; i<${#authors[@]}; i++)){
    printf "[%02d] %s | %s\n" $(( i + 1 )) "${authors[i]}" "${subjects[i]}";
    printf "URL: %s\n\n" "${links[i]}";
}
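
A stripped-down sketch of the N technique on made-up data, assuming each <name> sits on the line right after its parent tag with no indentation (which is why the pattern here uses \n where the script above uses .*):

```bash
#!/bin/bash
# Made-up fragment: two authors (one with an empty name) and one contributor.
feed='<author>
<name>Alice</name>
</author>
<contributor>
<name>Bob</name>
</contributor>
<author>
<name></name>
</author>'

authors=$(sed -nE '/<author>/ {
    # append the next input line to the pattern space
    N
    # non-empty name: keep only the name
    s|<author>\n<name>(.+)</name>|\1|p
    # empty name: print a placeholder so the arrays stay aligned
    s|<author>\n<name></name>|--|p
}' <<<"$feed")

echo "$authors"
```

The contributor's <name> is skipped because its preceding line is not <author>, so N is never run for it.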

potong
Posts: 88
Joined: Fri 06 Mar 2009, 04:01

#16 Post by potong »

neurino:

It's looking good!

Perhaps a bit of refactoring with regard to the three calls to sed:

Code: Select all

#!/bin/bash

# set some variables
authority="USER:PASS@"
url="https://${authority}mail.google.com/mail/feed/atom"
db_file=/tmp/db.txt

# set IFS to newline for slurping lines into arrays
OIFS="$IFS" IFS=$'\n'

# get messages
message=($(curl -s $url | tee $db_file| sed -nr '
    #
    # each message should resemble this: title@@link@@name
    #
    # only interested in tags between entries
    /<entry>/,/<\/entry>/{ 
        # bodge link tag to look like title or name
        s/<link.*href="([^"]*)".*\/>/<link>\1<\/link>/ 
        # extract details and add place holder
        s/<(title|link|name)>(.*)<\/\1>/\2@@/ 
        # save lines with a placeholder into HS
        /@@/ H 
        # end of an entry: process HS
        /<\/entry>/ { 
            # overwrite PS with HS
            g
            # drop newlines and final placeholder
            s/amp;|\n|@@$//gp
            # clear PS
            s/.*//
            # swap empty PS into HS
            x 
        }
    }'))

# reset IFS (be kind to the maintainer!)
IFS="$OIFS"
 
# show number of messages
echo "${#message[@]} messages"
# show messages
for (( i=0; i<${#message[@]}; i++)){
    msg=${message[i]} tmp=${msg#*@@}
    author=${msg##*@@} subject=${msg%%@@*} link=${tmp%@@*}
    printf "[%02d] %s | %s\nURL: %s\n\n" $(( i + 1 )) ${author:-"no author"} \
            ${subject:-"no subject"} ${link:-"no link"}
} 
After all, the less code you write, hopefully the fewer bugs you have to fix.

The definitive sed site is here
Check out the lookup table method here

HTH

Potong

p.s. you often see

Code: Select all

echo "$some_variable"| whatever ....
it can usually be replaced by a here-string

Code: Select all

whatever... <<<"$some_variable"
p.p.s PS and HS stand for Pattern Space and Hold Space in sed
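
A toy sketch of the hold space mechanics, fed through a here-string. The records and the END marker are made up for illustration; in the script above the closing </entry> tag plays the role of END:

```bash
#!/bin/bash
records='title one
name one
END
title two
name two
END'

joined=$(sed -n '
    # not an END marker: append the line to the hold space
    /^END$/! H
    /^END$/ {
        # copy the hold space over the pattern space
        g
        # drop the leading newline, then join the remaining lines with @@
        s/^\n//
        s/\n/@@/g
        p
        # empty the pattern space and swap it into the hold space
        s/.*//
        x
    }' <<<"$records")

echo "$joined"
```

Each group of lines comes out as one @@-separated record, which is the shape the bash loop then splits with parameter expansion.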

neurino
Posts: 362
Joined: Thu 15 Oct 2009, 13:08

#17 Post by neurino »

Uh-oh a lot to read and understand!
I'm copying your code in geany in order to see how it works.

About your p.s.:

Code: Select all

whatever... <<<"$some_variable"
I'll use this from now on. I never liked the idea of using echo that way (even more so being used to Pythonic coding...) but it was way better than storing strings in files...

neurino
Posts: 362
Joined: Thu 15 Oct 2009, 13:08

#18 Post by neurino »

Ok, I almost got it.

Since the feed reports all the email thread contributors, the <entry> may look like this:

Code: Select all

...
<entry>
<title>mail title</title>
<summary>mail summary blah blah</summary>
<link rel="alternate" href="http://mail.google.com/mail?account_id..." type="text/html" />
<modified>2010-07-31T20:00:46Z</modified>
<issued>2010-07-31T20:00:46Z</issued>
<id>tag:gmail.google.com,2004:1342813184987251</id>
<author>
<name>author name</name>
<email>author@email.com</email>
</author>
<contributor>
<name>contrib 1 name</name>
<email>contrib1@hotmail.com</email>
</contributor>
<contributor>
<name>contrib 2</name>
<email>contrib 2@libero.it</email>
</contributor>
</entry>
...
I added a rule to rename the contributors' <name> tag to <cname>, leaving the rest of the code unchanged. Otherwise we could end up with arrays of different lengths.

I also added quotes to the printf parameters, since it broke with multiple words; I don't know if there's a better/safer way.
:D
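
A small sketch of the rename rule in isolation, on a cut-down made-up entry. Unlike the rule in the script, it adds the g flag so the closing tag is renamed as well:

```bash
#!/bin/bash
# Cut-down entry for illustration.
entry='<entry>
<author>
<name>author name</name>
</author>
<contributor>
<name>contrib 1 name</name>
</contributor>
</entry>'

# Only lines between <contributor> and </contributor> are rewritten.
renamed=$(sed -E '/<contributor>/,/<\/contributor>/ s/<(\/?)name>/<\1cname>/g' <<<"$entry")

# Count the <name> lines that are left: only the author should remain.
names=$(grep -c '<name>' <<<"$renamed")
echo "$names name tag(s) left"
```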

Code: Select all

#!/bin/bash 

# set some variables 
authority="USER:PASS@" 
url="https://${authority}mail.google.com/mail/feed/atom" 
db_file=/tmp/db.txt 

# set IFS to newline for slurping lines into arrays 
OIFS="$IFS" IFS=$'\n' 

# get messages 
message=($(curl -s $url | tee $db_file| sed -nr ' 
    # 
    # each message should resemble this: title@@link@@name 
    # 
    # only interested in tags between entries 
    /<entry>/,/<\/entry>/{ 
        #rename (we may need it after) contributor <name> to NOT match author <name>
        /<contributor>/,/<\/contributor>/{
                s/<(\/?)name>/<\1cname>/
        }
        # bodge link tag to look like title or name 
        s/<link.*href="([^"]*)".*\/>/<link>\1<\/link>/
        # extract details and add place holder 
        s/<(title|link|name)>(.*)<\/\1>/\2@@/ 
        # save lines with a placeholder into HS 
        /@@/ H 
        # end of an entry: process HS 
        /<\/entry>/ { 
            # overwrite PS with HS 
            g 
            # drop newlines and final placeholder 
            s/amp;|\n|@@$//gp 
            # clear PS 
            s/.*// 
            # swap empty PS into HS 
            x 
        } 
    }')) 

# reset IFS (be kind to the maintainer!) 
IFS="$OIFS" 
  
# show number of messages 
echo "${#message[@]} messages" 
# show messages 
for (( i=0; i<${#message[@]}; i++)){
    #~ echo ${message[i]}
    msg=${message[i]} tmp=${msg#*@@} 
    author=${msg##*@@} subject=${msg%%@@*} link=${tmp%@@*} 
    printf "[%02d] %s | %s\nURL: %s\n\n" $(( i + 1 )) "${author:-"no author"}" \
        "${subject:-"no subject"}" "${link:-"no link"}" 
}
P.S.: is there an easier way to do what in Python is tuple unpacking?

Something like:

Code: Select all

>>> title, name, summary, link = ('the title', 'the name', 'the summary', 'http://mail.google.com/mail?account_id...')
but for bash

Code: Select all

# title, name, summary, link = ("the title@@the name@@the summary@@http://mail.google.com/mail?account_id...")
:?: :?:

Thanks for teaching!
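
One possible approach, sketched under the assumption that the data never contains the ASCII unit separator character: read splits on single characters only, so the "@@" placeholder is first swapped for one character and the fields are then read into separate variables. The name sep is only for illustration.

```bash
#!/bin/bash
record='the title@@the name@@the summary@@http://mail.google.com/mail?account_id...'

# Assumption: \037 (ASCII unit separator) does not occur in the data.
sep=$'\037'

# IFS is changed only for this one read; a non-whitespace separator
# also keeps empty fields in place instead of collapsing them.
IFS=$sep read -r title name summary link <<<"${record//@@/$sep}"

printf '%s | %s | %s | %s\n' "$title" "$name" "$summary" "$link"
```

Because the IFS assignment is a prefix to read, the rest of the script keeps its normal word splitting.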
