All times are UTC - 4 |
sc0ttman

Joined: 16 Sep 2009 Posts: 2175 Location: UK
|
Posted: Tue 19 Feb 2013, 22:05 Post subject:
how to grep multiple lines into 1? [SOLVED] Subject description: whats the easiest way? (see last few posts) |
|
I want to get various fields from an XML file and output them on a single line, repeated for each item in the XML list. For example, I want only name, url and genre:
file:
| Code: | <item01>
<name>01n</name>
<url>01u</url>
<notinterested>01ni</notinterested>
<genre>01g</genre>
<unneeded>01un</unneeded>
</item01>
<item02>
<name>02n</name>
<url>02u</url>
<notinterested>02ni</notinterested>
<genre>02g</genre>
<unneeded>02un</unneeded>
</item02>
|
...
and all I want in my output file is `name|url|genre` of each item, on a new line:
01n|01u|01g
02n|02u|02g
...
I can't seem to find a fast way of doing this... But then I am crap at this sort of thing...
EDIT:: SOLVED!
_________________ Akita Linux, VLC-GTK, Pup Search, Pup File Search
Last edited by sc0ttman on Fri 22 Feb 2013, 09:32; edited 1 time in total
Ibidem
Joined: 25 May 2010 Posts: 249
|
Posted: Tue 19 Feb 2013, 22:33 Post subject:
|
|
grep won't cut it: it doesn't transform text.
Off the top of my head:
| Code: | #!/bin/sh
while IFS= read -r line
do
case $line in
#This is for every tag before the last interesting one:
#outputs "content|"
#(the < and > must be quoted, or the shell parses them as redirections)
*'<name>'*|*'<url>'*)
printf '%s|' "$(printf '%s\n' "$line" |sed -e 's/.*>\(..*\)<.*/\1/')"
;;
#This is for the last tag of interest:
#outputs "content\n"
*'<genre>'*)
printf '%s\n' "$line" |sed -e 's/.*>\(..*\)<.*/\1/'
;;
*) ;;
esac
done
|
This assumes that for every tag you're interested in, both opening and closing are on the same line.
You need to redirect the input to the XML file:
| Code: | sh example.sh < sample.xml
#or
cat sample.xml | sh example.sh
|
No idea how fast it is, though.
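On speed: the loop above starts a new sed process for every matching line, so a pipeline that runs each tool once over the whole file may be quicker. The following is only a sketch under the same one-tag-per-line assumption, and it additionally assumes every item has exactly name, url and genre, in that order. The function name fields_to_lines is made up for illustration.

```shell
# fields_to_lines: hypothetical helper, reads XML on stdin.
# Assumes one tag per line and exactly <name>, <url>, <genre> per item,
# in that order.
fields_to_lines() {
    grep -E '<(name|url|genre)>' |    # keep only the lines of interest
        sed -e 's/<[^>]*>//g' \
            -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' |
        paste -d '|' - - -            # join every three lines into one
}
```

Usage would be something like `fields_to_lines < sample.xml`. If an item is missing one of the three tags, every following output line will be shifted, so this is only safe on regular input.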
Last edited by Ibidem on Thu 21 Feb 2013, 21:24; edited 1 time in total
amigo
Joined: 02 Apr 2007 Posts: 1759
|
Posted: Wed 20 Feb 2013, 03:42 Post subject:
|
|
If the xml files are small and you are *sure* that the xml is one-line-one-tag clean, then some variation of Ibidem's solution will work. Otherwise, you'll need to use a good xml-parser to retrieve tag values.
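A rough way to check the one-line-one-tag assumption before trusting a line-based script might look like this. It is a sketch only: the function name check_one_line_tags and the tag list (name, url, genre) are assumptions taken from the example in this thread, and it only catches a wanted tag that opens without closing on the same line.

```shell
# check_one_line_tags FILE: hypothetical helper.
# Prints every line where <name>, <url> or <genre> opens but the
# matching close tag is not on the same line; exits 1 if any is found.
check_one_line_tags() {
    awk '
        /<(name|url|genre)>/ && !/<\/(name|url|genre)>/ {
            printf "%s:%d: %s\n", FILENAME, FNR, $0
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}
```

Something like `check_one_line_tags sample.xml || echo "use a real parser"` would then decide which route to take.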
jamesbond
Joined: 26 Feb 2007 Posts: 1542 Location: The Blue Marble
|
Posted: Wed 20 Feb 2013, 10:38 Post subject:
|
|
| Code: | #!/bin/dash
{ echo "<root>"; cat -; echo "</root>"; } | xml2 | awk -F= '
$1~/name$/ || $1~/url$/ { printf "%s|", $2 }
$1~/genre$/ { print $2 }'
|
Get xml2 from www.ofb.net/~egnor/xml2/
Timing on my lousy machine for 1 million records:
| Code: | time ./x.sh < y.xml > /dev/null
real 0m21.785s
user 0m34.710s
sys 0m2.857s
|
where y.xml is this | Code: | <item01>
<name>01n</name>
<url>http://host1.com/url1</url>
<notinterested>01ni</notinterested>
<genre>01g</genre>
<unneeded>01un</unneeded>
</item01>
<item02>
<name>02n</name>
<url>http://host2.com/url2</url>
<notinterested>02ni</notinterested>
<genre>02g</genre>
<unneeded>02un</unneeded>
</item02> | repeated 500 thousand times (thus a million records).
Output:
| Code: | 01n|http://host1.com/url1|01g
02n|http://host2.com/url2|02g |
Limitation: make sure none of the fields you want to extract contains an equals sign (=), or things will break.
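The equals-sign limitation comes from splitting on every "=" with -F=. A possible workaround, sketched here, is to split each line at the first "=" only. This assumes xml2 prints lines of the form /path/to/tag=value; the function name extract_fields is made up, and the match is on the exact last path component (/name, /url, /genre), which is slightly stricter than the patterns above.

```shell
# extract_fields: hypothetical helper, reads xml2-style lines on stdin.
# Assumes each line looks like /path/to/tag=value.
extract_fields() {
    awk '
    {
        pos = index($0, "=")          # position of the first "="
        if (pos == 0) next            # no value on this line
        path  = substr($0, 1, pos - 1)
        value = substr($0, pos + 1)   # everything after the first "="
    }
    path ~ /\/(name|url)$/ { printf "%s|", value }
    path ~ /\/genre$/      { print value }
    '
}
```

It would slot into the pipeline in place of the awk command, e.g. `{ echo "<root>"; cat -; echo "</root>"; } | xml2 | extract_fields`.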
_________________ Fatdog64, Slacko and Puppeee user. Puppy user since 2.13
seaside
Joined: 11 Apr 2007 Posts: 837
|
Posted: Wed 20 Feb 2013, 21:18 Post subject:
|
|
sc0ttman,
As everyone mentioned, this could be simple if you can count on the xml being in the format you've shown.
| Code: |
while read -r line
do
case $line in
*'<name>'*) nline=${line%<*} name=${nline#*>} name="$name|" ;;
*'<url>'*) nline=${line%<*} url=${nline#*>} url="$url|" ;;
*'<genre>'*) nline=${line%<*} genre=${nline#*>}
echo "$name$url$genre" >>pipe-sep-file ;;
esac
done <xmlfile
|
Regards,
s
amigo
Joined: 02 Apr 2007 Posts: 1759
|
Posted: Thu 21 Feb 2013, 06:09 Post subject:
|
|
Here's a more complete example:
| Code: | #!/bin/bash
while read LINE ; do
# if both NAME and GENRE are set, then the output is ready
# otherwise, we are just beginning or still composing output
case $NAME in
'') : ;;
*) if [[ $GENRE ]] ; then
echo "$NAME|$URL|$GENRE"
NAME= URL= GENRE=
fi
;;
esac
case $LINE in
'<name'*) NAME=${LINE#*>} ; NAME=${NAME%%<*}
;;
'<url'*) URL=${LINE#*>} ; URL=${URL%%<*}
;;
'<genre'*) GENRE=${LINE#*>} ; GENRE=${GENRE%%<*}
;;
esac
done <test.xml
|
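One edge case in the loop above: a record is only printed on the iteration after its genre line has been read, so if the genre line happens to be the very last line of the input, that record is never printed. A variation that prints as soon as genre is seen avoids this. It is a sketch that assumes genre is always the last of the three fields within an item; the function name parse_items is made up.

```shell
# parse_items: hypothetical helper, reads XML on stdin.
# Assumes one tag per line and that <genre> comes after <name> and <url>.
parse_items() {
    NAME= URL= GENRE=
    # "|| [ -n "$LINE" ]" also handles a final line with no trailing newline
    while read -r LINE || [ -n "$LINE" ]; do
        case $LINE in
            '<name'*)  NAME=${LINE#*>};  NAME=${NAME%%<*} ;;
            '<url'*)   URL=${LINE#*>};   URL=${URL%%<*} ;;
            '<genre'*) GENRE=${LINE#*>}; GENRE=${GENRE%%<*}
                       # genre closes the record: print it and reset
                       printf '%s|%s|%s\n' "$NAME" "$URL" "$GENRE"
                       NAME= URL= GENRE= ;;
        esac
    done
}
```

Called as `parse_items < test.xml`, it should give the same output as the loop above on the sample file.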
sc0ttman

Joined: 16 Sep 2009 Posts: 2175 Location: UK
|
Posted: Thu 21 Feb 2013, 19:44 Post subject:
|
|
Thanks for all the responses. So far I've gone with the snippet that's easiest to understand, being the lazy soul I am... I used amigo's snippet, which fitted my purposes most closely, and so far I have this (it'll end up in vlc-gtk when it's done):
| Code: | #!/bin/bash
get_icecasts () {
rm -f /tmp/icecastlist /tmp/icecast.xml
wget -4 -O /tmp/icecast.xml "http://dir.xiph.org/yp.xml"
if [ -f /tmp/icecast.xml ];then
while read LINE ; do
# if both NAME and GENRE are set, then the output is ready
# otherwise, we are just beginning or still composing output
case $NAME in
'') : ;;
*) if [[ $GENRE ]];then
echo "IceCast Radio ($GENRE): $NAME|$URL" >> /tmp/icecastlist
NAME= URL= GENRE=
fi
;;
esac
case $LINE in
'<server_name'*) NAME="${LINE#*>}" ; NAME="${NAME%%<*}" ;;
'<listen_url'*) URL="${LINE#*>}" ; URL="${URL%%<*}" ;;
'<genre'*) GENRE="${LINE#*>}" ; GENRE="${GENRE%%<*}" ;;
esac
done </tmp/icecast.xml
icecastlist="$(sort /tmp/icecastlist | sed 's/&\#039;//g' | uniq)" #clean up a bit
echo "$icecastlist"
rm /tmp/icecast.xml
fi
}
LIST="`get_icecasts`"
echo "$LIST"
|
(Note: the sed expression actually has no backslash; I added one here so the forum shows the HTML entity I'm removing instead of rendering it.)
Anyway.. ideally, just cos I think I can get away with it, I wanna remove the need to write to /tmp/icecastlist and just write to a variable, while retaining the `sort | sed` stuff ... Something like changing:
| Code: | echo "IceCast Radio ($GENRE): $NAME|$URL" >> /tmp/icecastlist
...
LIST=`cat /tmp/icecastlist | sort | ...` |
to
| Code: | LIST="$LIST
IceCast Radio ($GENRE): $NAME|$URL"
...
LIST_CLEANED=`echo "$LIST" | sort | ...` |
I tried various things that I expected to work (I can see it shouldn't be hard!) but even that has stumped me - i get a frozen script, no output..
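For reference, accumulating into a variable can work along these lines. This is a sketch, not a diagnosis of the frozen script: it assumes the loop is fed by a redirect (a `cat file | while ...` pipeline runs the loop in a subshell in most shells, so LIST would come out empty afterwards) and that genre is the last of the three tags in each entry. The name build_list is made up.

```shell
# build_list FILE: hypothetical helper; leaves the sorted result in
# LIST_CLEANED. Assumes <genre> follows <server_name> and <listen_url>.
build_list() {
    LIST= NAME= URL= GENRE=
    NL='
'
    while read -r LINE; do
        case $LINE in
            '<server_name'*) NAME=${LINE#*>};  NAME=${NAME%%<*} ;;
            '<listen_url'*)  URL=${LINE#*>};   URL=${URL%%<*} ;;
            '<genre'*)       GENRE=${LINE#*>}; GENRE=${GENRE%%<*}
                             # append one record plus a newline
                             LIST="${LIST}IceCast Radio ($GENRE): $NAME|$URL$NL"
                             NAME= URL= GENRE= ;;
        esac
    done < "$1"    # redirect keeps the loop (and LIST) in the current shell
    LIST_CLEANED=$(printf '%s' "$LIST" | sort)
}
```

After `build_list /tmp/icecast.xml`, the sorted list would be in "$LIST_CLEANED"; any extra sed clean-up can be added to the pipeline after sort.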
amigo
Joined: 02 Apr 2007 Posts: 1759
|
Posted: Fri 22 Feb 2013, 07:05 Post subject:
|
|
Everything should be as simple as possible, but no simpler than it is.
| Code: | get_icecasts () {
while read LINE ; do
# if both NAME and GENRE are set, then the output is ready
# otherwise, we are just beginning or still composing output
case $NAME in
'') : ;;
*) if [[ $GENRE ]] ; then
#echo "$NAME|$URL|$GENRE"
echo "IceCast Radio ($GENRE): $NAME|$URL"
NAME= URL= GENRE=
fi
;;
esac
case $LINE in
'<name'*) NAME=${LINE#*>} ; NAME=${NAME%%<*}
;;
'<url'*) URL=${LINE#*>} ; URL=${URL%%<*}
;;
'<genre'*) GENRE=${LINE#*>} ; GENRE=${GENRE%%<*}
;;
esac
done <test.xml
}
#get_icecasts
get_icecasts | sort | sed 's/&\#039;//g' | uniq
|
Although, do you really need to sort and uniq it? Is the input file out of order, and does it contain duplicates?
Why do you need echo? As above, the function already echoes each record. And if you need to send the output to a file or another program, simply pipe or redirect it.
sc0ttman

Joined: 16 Sep 2009 Posts: 2175 Location: UK
|
Posted: Fri 22 Feb 2013, 09:31 Post subject:
|
|
Thanks amigo, yep that'll do it... I do need to sort it, as I want genres grouped together.. but I didn't need uniq, force of habit..
The last echo was just so I can test the output... there was another stray one in there as well..
I'll mark the thread SOLVED. Thanks everyone.
Just for completeness:
| Code: | get_icecasts () {
wget -4 -O /tmp/icecast.xml "http://dir.xiph.org/yp.xml"
while read LINE ; do
# if both NAME and GENRE are set, then the output is ready
# otherwise, we are just beginning or still composing output
case $NAME in
'') : ;;
*) if [[ $GENRE ]] ; then
#echo "$NAME|$URL|$GENRE"
echo "IceCast Radio ($GENRE): $NAME|$URL"
NAME= URL= GENRE=
fi
;;
esac
case $LINE in
'<server_name'*) NAME="${LINE#*>}" ; NAME="${NAME%%<*}"
;;
'<listen_url'*) URL="${LINE#*>}" ; URL="${URL%%<*}"
;;
'<genre'*) GENRE="${LINE#*>}" ; GENRE="${GENRE%%<*}"
;;
esac
done </tmp/icecast.xml
}
get_icecasts | sort | sed 's/&\#039;//g'
|
The above will get all Icecast radio stations, and build a sorted list, in this format:
IceCast Radio ($GENRE): $NAME|$URL