PaulR
Joined: 04 May 2005 Posts: 202 Location: UK
|
Posted: Wed 22 May 2013, 16:22 Post subject:
Can grep do something like this? |
|
Using BaCon with some shell commands (a total novice in the scripting department!)...
I'm trying to parse a dynamically generated (aspx) html file to get at data stored in a table.
First I'm using wget with a custom-built url to get the page to a local file (page.txt). Next I'm using grep like this to just get the lines containing data...
SYSTEM "grep '<tr class=\"aclass\">' page.txt > data.txt"
Then I'm stepping through the lines in data.txt and getting the information between the <td></td> tags.
It works great except that the first line contains a single record AND lots of stuff before it (the table tags, table headers etc).
My question therefore: is it possible for grep to return only the part of the line after the match rather than the whole line? For example, if the whole line were like this:
<tr class="aclass"><td>some data here</td></tr>
Can grep return:
<td>some data here</td></tr>
Ideally I'd like to do this in one line rather than execute a separate script
TIA
Paul
|
seaside
Joined: 11 Apr 2007 Posts: 841
|
Posted: Wed 22 May 2013, 18:09 Post subject:
|
|
PaulR,
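You can do this:
| Code: | echo '<tr class="aclass"><td>some data here</td></tr>' | grep -o '<td>.*'
<td>some data here</td></tr> |
"grep -o" prints only the part of the line that matches the pattern. Here the pattern starts at <td>, and .* extends the match to the end of the line, so everything before <td> is dropped.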
Regards,
s
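To make that behaviour concrete, here is a minimal shell sketch of the grep -o approach. The sample line is the one from the question; the variable names line and rest are only illustrative. Note that if earlier cells on the same line also use <td>, the match starts at the first of them.

```shell
# Sample row as it would appear in page.txt (data taken from the question).
line='<tr class="aclass"><td>some data here</td></tr>'

# -o prints only the matched text; '<td>.*' matches from the first <td>
# to the end of the line, so the <tr ...> tag in front is not printed.
rest=$(printf '%s\n' "$line" | grep -o '<td>.*')

printf '%s\n' "$rest"
# prints: <td>some data here</td></tr>
```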
|
some1
Joined: 17 Jan 2013 Posts: 8
|
Posted: Wed 22 May 2013, 18:49 Post subject:
|
|
Hi
This looks like a job for awk, which is good at tag parsing. It may be as simple as this:
| Quote: |
awk '/<td>/,/<\/td>/' "$filein" > "$fileout"
|
This prints the lines from one matching <td> through one matching </td>. Note the escaped / in the second pattern.
Cheers
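A range pattern like the one above selects whole lines, so on a single-line row it prints the entire line. One possible way to pull out just the cell text is to split the line on the tags instead. This is only a sketch, assuming one <td> cell per row; the variable names whole and cell are illustrative.

```shell
line='<tr class="aclass"><td>some data here</td></tr>'

# Range pattern: prints every line from a <td> match through a </td> match.
# Both tags are on the same line here, so the whole line is printed.
whole=$(printf '%s\n' "$line" | awk '/<td>/,/<\/td>/')

# Field-separator approach: split on <td> or </td>, so field 2 is the cell text.
cell=$(printf '%s\n' "$line" | awk -F'</?td>' '{ print $2 }')

printf '%s\n' "$whole"
# prints: <tr class="aclass"><td>some data here</td></tr>
printf '%s\n' "$cell"
# prints: some data here
```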
|
PaulR
Joined: 04 May 2005 Posts: 202 Location: UK
|
Posted: Thu 23 May 2013, 12:46 Post subject:
|
|
They look promising! Thanks both, I'll try them out later
Paul
|
Karl Godt

Joined: 20 Jun 2010 Posts: 2736 Location: Kiel,Germany
|
Posted: Thu 23 May 2013, 13:49 Post subject:
|
|
| Quote: | Can grep return:
<td>some data here</td></tr> |
grep, like many Linux tools, has a --help option.
Its output includes this entry:
| Quote: | -o, --only-matching show only the part of a line matching PATTERN
|
| Code: | SYSTEM "grep -o '<tr class=\"aclass\">' page.txt > data.txt" |
Disclaimer: I am no BaCon expert.
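For comparison, here is a sketch of what -o prints with the tag alone as the pattern, and one way to get only the text after the tag. The second command assumes GNU grep built with PCRE support (the -P option), which not every grep has (BusyBox grep, for example, may lack it). The shell quoting shown here would need adapting inside a BaCon SYSTEM string.

```shell
line='<tr class="aclass"><td>some data here</td></tr>'

# With only the tag as the pattern, -o prints just the tag itself.
tag_only=$(printf '%s\n' "$line" | grep -o '<tr class="aclass">')

# With -P, \K discards everything matched so far, so only the text
# after the tag is reported as the match.
after=$(printf '%s\n' "$line" | grep -oP '<tr class="aclass">\K.*')

printf '%s\n' "$tag_only"
# prints: <tr class="aclass">
printf '%s\n' "$after"
# prints: <td>some data here</td></tr>
```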
|
seaside
Joined: 11 Apr 2007 Posts: 841
|
Posted: Thu 23 May 2013, 15:34 Post subject:
|
|
PaulR,
You could code it all in BaCon, using just one system call to wget. For example, Vovchik has coded Yweather, which parses XML data and is written entirely in BaCon.
http://murga-linux.com/puppy/viewtopic.php?mode=attach&id=63295
Regards,
s
|
PaulR
Joined: 04 May 2005 Posts: 202 Location: UK
|
Posted: Fri 24 May 2013, 13:39 Post subject:
|
|
I couldn't get awk or grep to work - both kept returning the entire line (including the search target) rather than text from the target onwards.
In the end I made a work-around with two lines of BaCon code. It'll be fine as long as the format of the web page doesn't change radically!
Paul
|
Ibidem
Joined: 25 May 2010 Posts: 262
|
Posted: Fri 24 May 2013, 19:32 Post subject:
|
|
I know it's late, but...
I would have done something like this:
sed -ne 's#.*<tr class="aclass">\(<td>.*</td></tr>\).*#\1#p'
(# is used as the delimiter so that the / in </td> does not end the pattern.)
grep cannot modify matches; it prints the whole line, the whole match (with -o), or nothing.
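For anyone unfamiliar with the syntax, here is a commented sketch of the sed approach. It uses # as the delimiter and the class name aclass from the question. The sample line is made up to mimic the problem case, where table markup precedes the first data row.

```shell
# Hypothetical first line: table markup followed by the first data row.
line='<table><tr><th>Header</th></tr><tr class="aclass"><td>some data here</td></tr>'

# -n       : do not print lines by default
# s#A#B#p  : replace A with B and print the line only if the replacement happened
# \( ... \): capture the cells after the <tr class="aclass"> tag
# \1       : replace the whole line with the captured text
row=$(printf '%s\n' "$line" | sed -n 's#.*<tr class="aclass">\(<td>.*</td></tr>\).*#\1#p')

printf '%s\n' "$row"
# prints: <td>some data here</td></tr>
```

Lines that do not contain the tag produce no output, so this can replace the separate grep step.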
|
PaulR
Joined: 04 May 2005 Posts: 202 Location: UK
|
Posted: Sat 25 May 2013, 08:24 Post subject:
|
|
Thanks for that, I don't pretend to understand the syntax at present but it's always useful to have an alternative!
Paul
|