[bash] variable persistence in piping

potong
Posts: 88
Joined: Fri 06 Mar 2009, 04:01

#16 Post by potong »

neurino:

It's looking good!

Perhaps a bit of refactoring is in order, to merge the three calls to sed into one:

Code: Select all

#!/bin/bash

# set some variables
authority="USER:PASS@"
url="https://${authority}mail.google.com/mail/feed/atom"
db_file=/tmp/db.txt

# set IFS to newline for slurping lines into arrays
OIFS="$IFS" IFS=$'\n'

# get messages
message=($(curl -s "$url" | tee "$db_file" | sed -nr '
    #
    # each message should resemble this: title@@link@@name
    #
    # only interested in tags between entries
    /<entry>/,/<\/entry>/{ 
        # bodge link tag to look like title or name
        s/<link.*href="([^"]*)".*\/>/<link>\1<\/link>/ 
        # extract details and add place holder
        s/<(title|link|name)>(.*)<\/\1>/\2@@/ 
        # save lines with a placeholder into HS
        /@@/ H 
        # end of an entry: process HS
        /<\/entry>/ { 
            # overwrite PS with HS
            g
            # drop newlines and final placeholder
            s/amp;|\n|@@$//gp
            # clear PS
            s/.*//
            # swap empty PS into HS
            x 
        }
    }'))

# reset IFS (be kind to the maintainer!)
IFS="$OIFS"
 
# show number of messages
echo "${#message[@]} messages"
# show messages
for (( i=0; i<${#message[@]}; i++)){
    msg=${message[i]} tmp=${msg#*@@}
    author=${msg##*@@} subject=${msg%%@@*} link=${tmp%@@*}
    printf "[%02d] %s | %s\nURL: %s\n\n" $(( i + 1 )) ${author:-"no author"} \
            ${subject:-"no subject"} ${link:-"no link"}
} 
After all, the less code you write, the fewer bugs you (hopefully) have to fix.
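
The parameter expansions in the display loop can be checked in isolation. This is only a minimal sketch: the record below is made up, and assumes the title@@link@@name layout that the sed script is meant to produce.

```bash
#!/bin/bash
# Hypothetical sample record in the title@@link@@name layout
msg='Hello there@@http://example.com/mail@@Some Author'

subject=${msg%%@@*}   # delete the longest suffix matching "@@*": keeps the title
author=${msg##*@@}    # delete the longest prefix matching "*@@": keeps the name
tmp=${msg#*@@}        # delete the shortest prefix matching "*@@": leaves link@@name
link=${tmp%@@*}       # delete the shortest suffix matching "@@*": keeps the link

printf '%s\n' "$subject" "$link" "$author"
```

The doubled operators (%% and ##) are greedy and the single ones (% and #) are not, which is why the middle field needs the intermediate tmp variable.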

The definitive sed site is here
Check out the lookup table method here

HTH

Potong

p.s. you often see

Code: Select all

echo "$some_variable"| whatever ....
It can usually be replaced by a here-string:

Code: Select all

whatever... <<<"$some_variable"
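
The here-string also matters for the topic of this thread. In bash with default settings, every part of a pipeline runs in a subshell, so variables set by a read at the end of a pipe do not persist. A minimal sketch with a made-up value:

```bash
#!/bin/bash
some_variable='first second'   # hypothetical sample value

# pipe: read runs in a subshell, so a and b are still empty afterwards
a='' b=''
echo "$some_variable" | read -r a b
echo "after pipe: a='$a' b='$b'"

# here-string: read runs in the current shell, so the values persist
c='' d=''
read -r c d <<<"$some_variable"
echo "after here-string: c='$c' d='$d'"
```

This assumes the lastpipe shell option is off, which is the bash default.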
p.p.s. PS and HS stand for pattern space and hold space in sed.
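
The collect-in-HS, flush-at-the-end pattern used in the script above can be tried on a toy input. This is a rough sketch only: the END marker and the sample lines are invented for illustration.

```bash
#!/bin/bash
# Hypothetical input: two groups of lines, each closed by an END marker
result=$(printf '%s\n' a b END c d END | sed -n '
    # append every non-marker line to the hold space (HS)
    /END/! H
    /END/ {
        # copy HS over the pattern space (PS)
        g
        # join the collected lines: drop the newlines and print
        s/\n//gp
        # empty PS, then swap it into HS ready for the next group
        s/.*//
        x
    }')
echo "$result"
```

Each group comes out as one joined line (ab, then cd). Without the final clear-and-swap, the second group would still carry the lines of the first.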

neurino
Posts: 362
Joined: Thu 15 Oct 2009, 13:08

#17 Post by neurino »

Uh-oh a lot to read and understand!
I'm copying your code in geany in order to see how it works.

About your p.s.:

Code: Select all

whatever... <<<"$some_variable"
I'll use this from now on. I never liked the idea of using echo that way (all the more so being used to pythonic coding...), but it was way better than storing strings in files...

neurino
Posts: 362
Joined: Thu 15 Oct 2009, 13:08

#18 Post by neurino »

Ok, I almost got it.

Since the feed reports all contributors to an email thread, an <entry> may look like this:

Code: Select all

...
<entry>
<title>mail title</title>
<summary>mail summary blah blah</summary>
<link rel="alternate" href="http://mail.google.com/mail?account_id..." type="text/html" />
<modified>2010-07-31T20:00:46Z</modified>
<issued>2010-07-31T20:00:46Z</issued>
<id>tag:gmail.google.com,2004:1342813184987251</id>
<author>
<name>author name</name>
<email>author@email.com</email>
</author>
<contributor>
<name>contrib 1 name</name>
<email>contrib1@hotmail.com</email>
</contributor>
<contributor>
<name>contrib 2</name>
<email>contrib 2@libero.it</email>
</contributor>
</entry>
...
I added a rule that renames the contributors' <name> tag to <cname>, so the rest of the code stays unchanged. Otherwise an entry could yield more than one name, and the records would end up with different numbers of fields.
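
The renaming rule can be tried on its own. This is only a sketch on a made-up, cut-down entry; the g flag is an addition here so that the closing tag is renamed as well.

```bash
#!/bin/bash
# Hypothetical cut-down entry with one author and one contributor
entry='<entry>
<author>
<name>author name</name>
</author>
<contributor>
<name>contrib 1 name</name>
</contributor>
</entry>'

renamed=$(sed -r '
    # only inside a contributor block: <name> becomes <cname>,
    # and </name> becomes </cname>
    /<contributor>/,/<\/contributor>/ {
        s/<(\/?)name>/<\1cname>/g
    }' <<<"$entry")
echo "$renamed"
```

The author's <name> line is outside the address range, so it passes through untouched and is still picked up by the later title|link|name rule.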

I also added quotes around the printf arguments, since it broke on values with multiple words. I don't know if there's a better/safer way.
:D

Code: Select all

#!/bin/bash 

# set some variables 
authority="USER:PASS@" 
url="https://${authority}mail.google.com/mail/feed/atom" 
db_file=/tmp/db.txt 

# set IFS to newline for slurping lines into arrays 
OIFS="$IFS" IFS=$'\n' 

# get messages 
message=($(curl -s "$url" | tee "$db_file" | sed -nr ' 
    # 
    # each message should resemble this: title@@link@@name 
    # 
    # only interested in tags between entries 
    /<entry>/,/<\/entry>/{ 
        #rename (we may need it after) contributor <name> to NOT match author <name>
        /<contributor>/,/<\/contributor>/{
                # g flag so the closing </name> on the same line is renamed too
                s/<(\/?)name>/<\1cname>/g
        }
        # bodge link tag to look like title or name 
        s/<link.*href="([^"]*)".*\/>/<link>\1<\/link>/
        # extract details and add place holder 
        s/<(title|link|name)>(.*)<\/\1>/\2@@/ 
        # save lines with a placeholder into HS 
        /@@/ H 
        # end of an entry: process HS 
        /<\/entry>/ { 
            # overwrite PS with HS 
            g 
            # drop newlines and final placeholder 
            s/amp;|\n|@@$//gp 
            # clear PS 
            s/.*// 
            # swap empty PS into HS 
            x 
        } 
    }')) 

# reset IFS (be kind to the maintainer!) 
IFS="$OIFS" 
  
# show number of messages 
echo "${#message[@]} messages" 
# show messages 
for (( i=0; i<${#message[@]}; i++)){
    #~ echo ${message[i]}
    msg=${message[i]} tmp=${msg#*@@} 
    author=${msg##*@@} subject=${msg%%@@*} link=${tmp%@@*} 
    printf "[%02d] %s | %s\nURL: %s\n\n" $(( i + 1 )) "${author:-"no author"}" \
        "${subject:-"no subject"}" "${link:-"no link"}" 
}
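
Why the quotes matter can be seen in a small sketch, assuming a made-up two-word author and the default IFS:

```bash
#!/bin/bash
author='John Smith'   # hypothetical two-word value

# unquoted: the value splits into two arguments, so printf has more
# arguments than format specifiers and reuses the format for the extra one
unquoted=$(printf '[%s]\n' $author)

# quoted: the value stays a single argument
quoted=$(printf '[%s]\n' "$author")

echo "$unquoted"
echo "$quoted"
```

The unquoted call prints [John] and [Smith] on separate lines, while the quoted call prints [John Smith]. Quoting every expansion passed to printf is the usual fix.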
P.S.: is there an easier way to do what Python calls tuple unpacking?

Something like:

Code: Select all

>>> title, name, summary, link = ('the title', 'the name', 'the summary', 'http://mail.google.com/mail?account_id...')
but for bash:

Code: Select all

# title, name, summary, link = ("the title@@the name@@the summary@@http://mail.google.com/mail?account_id...")
:?: :?:

Thanks for teaching!
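
One possible approach to the unpacking question is read with a custom IFS, fed by a here-string. This is a sketch rather than a definitive answer. The record, and the choice of the ASCII unit separator as a stand-in delimiter, are assumptions made for illustration.

```bash
#!/bin/bash
# Hypothetical record using the same @@ separator as the script above
record='the title@@the name@@the summary@@http://example.com/mail'

# read splits on single characters only, so first swap each @@ for one
# character assumed never to occur in the data (ASCII unit separator)
sep=$'\x1f'
IFS=$sep read -r title name summary link <<<"${record//@@/$sep}"

printf '%s\n' "$title" "$name" "$summary" "$link"
```

The IFS assignment applies only to that one read command, so there is nothing to restore afterwards. A non-whitespace separator also keeps empty fields in place, which a tab or space would not.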
