Puppy Linux Discussion Forum
Puppy HOME page : puppylinux.com
"THE" alternative forum : puppylinux.info
 
All times are UTC - 4
 Forum index » Off-Topic Area » Programming
how to grep multiple lines into 1? [SOLVED]
sc0ttman


Joined: 16 Sep 2009
Posts: 2376
Location: UK

Posted: Tue 19 Feb 2013, 22:05    Post subject: how to grep multiple lines into 1? [SOLVED]
Subject description: what's the easiest way? (see last few posts)
 

I want to get various fields from an XML file, output them on a single line, and repeat that for each item in the XML list. For example, I want only name, url and genre:

file:

Code:
<item01>
<name>01n</name>
<url>01u</url>
<notinterested>01ni</notinterested>
<genre>01g</genre>
<unneeded>01un</unneeded>
</item01>

<item02>
<name>02n</name>
<url>02u</url>
<notinterested>02ni</notinterested>
<genre>02g</genre>
<unneeded>02un</unneeded>
</item02>

...

and all I want in my output file is `name|url|genre` for each item, one item per line:

01n|01u|01g
02n|02u|02g
...

I can't seem to write a fast way of doing this... But then I am crap at this sort of thing...

EDIT: SOLVED!

_________________
Akita Linux, VLC-GTK, Pup Search, Pup File Search

Last edited by sc0ttman on Fri 22 Feb 2013, 09:32; edited 1 time in total
Ibidem

Joined: 25 May 2010
Posts: 484
Location: State of Jefferson

Posted: Tue 19 Feb 2013, 22:33

grep won't cut it: it doesn't transform text.
Off the top of my head:
Code:
#!/bin/sh
# Read the XML on stdin, one tag per line.
while read -r line
do
  case $line in
    # Every tag before the last interesting one: outputs "content|".
    # < and > are quoted; unquoted, the shell would parse them as redirections.
    (*'<name>'*|*'<url>'*)
      printf '%s|' "$(echo "$line" | sed -e 's/.*>\(..*\)<.*/\1/')"
      ;;
    # The last tag of interest: outputs "content" followed by a newline.
    (*'<genre>'*)
      echo "$line" | sed -e 's/.*>\(..*\)<.*/\1/'
      ;;
    (*) ;;
  esac
done

This assumes that for every tag you're interested in, both the opening and closing tags are on the same line, and that the tags appear in the order name, url, genre.
You need to feed the XML file to the script on standard input:
Code:
./example.sh < sample.xml
#or
cat sample.xml | ./example.sh

No idea how fast it is, though.
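For comparison, here is a sketch that avoids starting a new sed for every matching line: one grep selects the three wanted tags, one sed strips the markup, and `paste` joins every three lines with `|`. Like the loop above, it assumes one tag per line; it additionally assumes every item has exactly one name, url and genre, in that order, since a missing field would shift every later record. The function name `extract_fields` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: grep selects the wanted lines, sed strips the tags,
# paste joins each group of three lines with '|'.
# Assumes one tag per line and exactly one name, url and genre per item,
# always in that order.
extract_fields() {
  grep -E '<(name|url|genre)>' |
    sed -e 's/^[[:space:]]*<[^>]*>//' -e 's/<\/[^>]*>[[:space:]]*$//' |
    paste -d'|' - - -
}
```

Once the function is defined, run it as `extract_fields < sample.xml`. Three processes in total, regardless of file size, should make it noticeably quicker on large inputs than a per-line sed.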

Last edited by Ibidem on Thu 21 Feb 2013, 21:24; edited 1 time in total
amigo

Joined: 02 Apr 2007
Posts: 2238

Posted: Wed 20 Feb 2013, 03:42

If the XML files are small and you are *sure* that the XML is clean, one tag per line, then some variation of Ibidem's solution will work. Otherwise, you'll need to use a proper XML parser to retrieve tag values.
jamesbond

Joined: 26 Feb 2007
Posts: 2134
Location: The Blue Marble

Posted: Wed 20 Feb 2013, 10:38

Code:
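#!/bin/dash
# Wrap the input in a single root element so xml2 sees well-formed XML,
# then let awk pick the wanted fields out of the path=value lines.
{ echo "<root>"; cat -; echo "</root>"; } | xml2 | awk -F= '
$1~/name$/ || $1~/url$/ { printf "%s|", $2 }
$1~/genre$/ { print $2 }'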

Get xml2 from www.ofb.net/~egnor/xml2/

Timing on my lousy machine for 1 million records:
Code:
 time ./x.sh < y.xml  > /dev/null

real   0m21.785s
user   0m34.710s
sys   0m2.857s


where y.xml is this
Code:
<item01>
<name>01n</name>
<url>http://host1.com/url1</url>
<notinterested>01ni</notinterested>
<genre>01g</genre>
<unneeded>01un</unneeded>
</item01>

<item02>
<name>02n</name>
<url>http://host2.com/url2</url>
<notinterested>02ni</notinterested>
<genre>02g</genre>
<unneeded>02un</unneeded>
</item02>
repeated 500 thousand times (thus a million records).

Output:
Code:
01n|http://host1.com/url1|01g
02n|http://host2.com/url2|02g


Limitation: you'd better make sure that none of the fields you want to extract contains an equals sign (=), or things will break: awk splits each xml2 line on =, so the value would be cut off at that point.
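If values may contain `=`, one option is to split each xml2 line on the first `=` only. The sketch below replaces just the awk stage and assumes xml2's `path=value` output lines on stdin; the function name `join_xml2_fields` is hypothetical.

```shell
#!/bin/sh
# Hypothetical replacement for the awk stage: reads path=value lines and
# treats everything after the FIRST '=' as the value.
join_xml2_fields() {
  awk '
    {
      # split on the first "=" only: left is the path, right is the value
      eq = index($0, "=")
      if (eq == 0) next   # no "=" means no text value on this line
      path = substr($0, 1, eq - 1)
      val  = substr($0, eq + 1)
    }
    path ~ /name$/ || path ~ /url$/ { printf "%s|", val }
    path ~ /genre$/ { print val }
  '
}
```

It would slot into the pipeline as `{ echo "<root>"; cat -; echo "</root>"; } | xml2 | join_xml2_fields`, so a URL such as `http://host/x?a=1&b=2` comes through whole.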

_________________
Fatdog64, Slacko and Puppeee user. Puppy user since 2.13.
Contributed Fatdog64 packages thread
seaside

Joined: 11 Apr 2007
Posts: 886

Posted: Wed 20 Feb 2013, 21:18

sc0ttman,

As everyone mentioned, this could be simple if you can count on the xml being in the format you've shown.

Code:

while read -r line
do
  case $line in
    # ${line%<*} drops the closing tag, ${nline#*>} drops the opening tag
    *'<name>'*)  nline=${line%<*}; name=${nline#*>} ;;
    *'<url>'*)   nline=${line%<*}; url=${nline#*>} ;;
    *'<genre>'*) nline=${line%<*}; genre=${nline#*>}
                 printf '%s|%s|%s\n' "$name" "$url" "$genre" >>pipe-sep-file ;;
  esac
done <xmlfile


Regards,
s
amigo

Joined: 02 Apr 2007
Posts: 2238

Posted: Thu 21 Feb 2013, 06:09

Here's a more complete example:
Code:
#!/bin/bash

while read -r LINE ; do
   # if both NAME and GENRE are set, then the output is ready
   # otherwise, we are just beginning or still composing output
   case $NAME in
      '') : ;;
      *)   if [[ $GENRE ]] ; then
            echo "$NAME|$URL|$GENRE"
            NAME= URL= GENRE=
         fi
      ;;
   esac
   
   case $LINE in
      '<name'*) NAME=${LINE#*>} ; NAME=${NAME%%<*}
      ;;
      
      '<url'*) URL=${LINE#*>} ; URL=${URL%%<*}
      ;;
      '<genre'*) GENRE=${LINE#*>} ; GENRE=${GENRE%%<*}
      ;;
   esac
done <test.xml
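The loop above prints a record only when it reads the line after the one that completed it, so if a `<genre>` line were the very last line of the input, that final record would never be printed. Also, `read` reports failure on a last line that lacks a trailing newline, so such a line is skipped. A sketch that addresses both, keeping the same parameter-expansion parsing (the function name `parse_items` is hypothetical, and it still assumes name and url come before genre in each item):

```shell
#!/bin/sh
# Hypothetical variant: emit the record as soon as <genre> is parsed,
# and still process a final line that has no trailing newline.
parse_items() {
  NAME= URL= GENRE=
  while read -r LINE || [ -n "$LINE" ]; do
    case $LINE in
      '<name'*)  NAME=${LINE#*>};  NAME=${NAME%%<*} ;;
      '<url'*)   URL=${LINE#*>};   URL=${URL%%<*} ;;
      '<genre'*) GENRE=${LINE#*>}; GENRE=${GENRE%%<*}
                 # genre closes the record: print it and start afresh
                 [ -n "$NAME" ] && printf '%s|%s|%s\n' "$NAME" "$URL" "$GENRE"
                 NAME= URL= GENRE= ;;
    esac
  done
}
```

Usage: `parse_items < test.xml`.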
sc0ttman


Joined: 16 Sep 2009
Posts: 2376
Location: UK

Posted: Thu 21 Feb 2013, 19:44

Thanks for all the responses. So far I've gone with the snippet that's easiest to understand, being the lazy soul I am... I used amigo's snippet, as it fitted my purposes most closely, and so far I have this (it'll end up in VLC-GTK when it's done):

Code:
#!/bin/bash
get_icecasts () {
   rm -f /tmp/icecastlist /tmp/icecast.xml
   # only parse the file if the download succeeded
   if wget -4 -O /tmp/icecast.xml "http://dir.xiph.org/yp.xml"; then
      while read -r LINE ; do
         # if both NAME and GENRE are set, then the output is ready
         # otherwise, we are just beginning or still composing output
         case $NAME in
           '') : ;;
           *) if [[ $GENRE ]];then
               echo "IceCast Radio ($GENRE): $NAME|$URL" >> /tmp/icecastlist
               NAME= URL= GENRE=
             fi
           ;;
         esac
         case $LINE in
           '<server_name'*) NAME="${LINE#*>}" ; NAME="${NAME%%<*}" ;;
           '<listen_url'*) URL="${LINE#*>}" ; URL="${URL%%<*}" ;;
           '<genre'*) GENRE="${LINE#*>}" ; GENRE="${GENRE%%<*}" ;;
         esac
      done </tmp/icecast.xml
      icecastlist="`sort /tmp/icecastlist | sed 's/&\#039;//g' | uniq`" #clean up a bit
      echo "$icecastlist"
      rm /tmp/icecast.xml
   fi
}
LIST="`get_icecasts`"
echo "$LIST"
(Note: the real sed command has no backslash; I added one so the forum would display the literal HTML entity being removed.)

Anyway... ideally, just because I think I can get away with it, I want to remove the need to write to /tmp/icecastlist and write to a variable instead, while keeping the `sort | sed` stuff. Something like changing:

Code:
echo "IceCast Radio ($GENRE): $NAME|$URL" >> /tmp/icecastlist
...
LIST=`cat /tmp/icecastlist | sort | ...`


to

Code:
LIST="$LIST
IceCast Radio ($GENRE): $NAME|$URL"
...
LIST_CLEANED=`echo "$LIST" | sort | ...`



I tried various things that I expected to work (I can see it shouldn't be hard!), but even that has stumped me: I get a frozen script and no output.
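A sketch of the accumulate-in-a-variable idea from the post, reading the XML on stdin (an assumption; the post reads /tmp/icecast.xml). Two shell behaviours are worth knowing here: a `while read` loop fed by a pipe (`cat file | while ...`) runs in a subshell, so a variable set inside it is empty afterwards, whereas `done < file` keeps it; and a `while read` loop whose input redirect has gone missing waits for keyboard input, which looks just like a freeze. The function name `collect_sorted` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build the list in $LIST instead of /tmp/icecastlist,
# then sort it once at the end.
collect_sorted() {
  nl='
'
  LIST=
  NAME= URL= GENRE=
  while read -r LINE; do
    case $LINE in
      '<server_name'*) NAME=${LINE#*>};  NAME=${NAME%%<*} ;;
      '<listen_url'*)  URL=${LINE#*>};   URL=${URL%%<*} ;;
      '<genre'*)       GENRE=${LINE#*>}; GENRE=${GENRE%%<*}
                       # append one entry plus a newline to the variable
                       LIST="${LIST}IceCast Radio ($GENRE): $NAME|$URL$nl"
                       NAME= URL= GENRE= ;;
    esac
  done
  # printf '%s' adds no extra newline, so sort sees no trailing blank line
  printf '%s' "$LIST" | sort
}
```

Because it reads stdin, the download can be streamed straight in with no temporary files at all, e.g. `LIST=$(wget -4 -q -O - "http://dir.xiph.org/yp.xml" | collect_sorted)`.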

_________________
Akita Linux, VLC-GTK, Pup Search, Pup File Search
amigo

Joined: 02 Apr 2007
Posts: 2238

Posted: Fri 22 Feb 2013, 07:05

Everything should be as simple as possible, but no simpler than it is.

Code:
get_icecasts () {
while read -r LINE ; do
   # if both NAME and GENRE are set, then the output is ready
   # otherwise, we are just beginning or still composing output
   case $NAME in
      '') : ;;
      *)   if [[ $GENRE ]] ; then
            #echo "$NAME|$URL|$GENRE"
            echo "IceCast Radio ($GENRE): $NAME|$URL"
            NAME= URL= GENRE=
         fi
      ;;
   esac
   
   case $LINE in
      '<name'*) NAME=${LINE#*>} ; NAME=${NAME%%<*}
      ;;
      
      '<url'*) URL=${LINE#*>} ; URL=${URL%%<*}
      ;;
      '<genre'*) GENRE=${LINE#*>} ; GENRE=${GENRE%%<*}
      ;;
   esac
done <test.xml
}

#get_icecasts
get_icecasts | sort | sed 's/&\#039;//g' | uniq



Although, do you really need to sort and uniq it? Are the entries in the input file out of order, or duplicated?
Why do you need the final echo? As written above, the function already echoes each entry, and if you need to send the output to a file or another program, simply pipe or redirect it.
sc0ttman


Joined: 16 Sep 2009
Posts: 2376
Location: UK

Posted: Fri 22 Feb 2013, 09:31

Thanks amigo, yep, that'll do it... I do need to sort it, as I want genres grouped together, but I didn't need uniq; force of habit.

The last echo was just so I could test the output... there was another stray one in there as well.

I'll mark the thread SOLVED. Thanks everyone.

Just for completeness:

Code:
get_icecasts () {
   wget -4 -O /tmp/icecast.xml "http://dir.xiph.org/yp.xml" || return 1
   while read -r LINE ; do
      # if both NAME and GENRE are set, then the output is ready
      # otherwise, we are just beginning or still composing output
      case $NAME in
        '') : ;;
        *)   if [[ $GENRE ]] ; then
            #echo "$NAME|$URL|$GENRE"
            echo "IceCast Radio ($GENRE): $NAME|$URL"
            NAME= URL= GENRE=
          fi
        ;;
      esac
      
      case $LINE in
        '<server_name'*) NAME="${LINE#*>}" ; NAME="${NAME%%<*}"
        ;;
         
        '<listen_url'*) URL="${LINE#*>}" ; URL="${URL%%<*}"
        ;;
        '<genre'*) GENRE="${LINE#*>}" ; GENRE="${GENRE%%<*}"
        ;;
      esac
   done </tmp/icecast.xml
}

get_icecasts | sort | sed 's/&\#039;//g'


The above will get all Icecast radio stations and build a sorted list in this format:

IceCast Radio ($GENRE): $NAME|$URL
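The final pipeline deletes `&#039;` outright, but that entity stands for an apostrophe, so station names like "Bob's" lose a character. A sketch of a sed filter that decodes a few common XML entities instead; the function name `decode_entities` is hypothetical, and only these five entities are handled.

```shell
#!/bin/sh
# Hypothetical filter: decode common XML entities rather than deleting them.
# &amp; is decoded last so that "&amp;lt;" becomes "&lt;", not "<".
# In the sed replacement, \& is a literal ampersand (bare & means "the match").
decode_entities() {
  sed -e "s/&#039;/'/g" \
      -e 's/&quot;/"/g' \
      -e 's/&lt;/</g' \
      -e 's/&gt;/>/g' \
      -e 's/&amp;/\&/g'
}
```

Used as `get_icecasts | decode_entities | sort`, the sort then orders the decoded text.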

_________________
Akita Linux, VLC-GTK, Pup Search, Pup File Search