Can grep do something like this?


PaulR · #1 Post by **PaulR** » Wed 22 May 2013, 20:22

Using BaCon with some shell commands (a total novice in the scripting department!)...

I'm trying to parse a dynamically generated (aspx) html file to get at data stored in a table.

First I'm using wget with a custom-built URL to fetch the page to a local file (page.txt). Next I'm using grep like this to get just the lines containing data:

SYSTEM "grep '<tr class=\"aclass\">' page.txt > data.txt"

Then I'm stepping through the lines in data.txt and getting the information between the <td></td> tags.

It works great except that the first line contains a single record AND lots of stuff before it (the table tags, table headers etc).

My question therefore: is it possible for grep to return only the part of the line after the match rather than the whole line? For example, if the whole line was like this:

<tr class=\"aclass\"><td>some data here</td></tr>

Can grep return:

<td>some data here</td></tr>

Ideally I'd like to do this in one line rather than execute a separate script.

TIA

Paul

seaside · #2 Post by **seaside** » Wed 22 May 2013, 22:09

PaulR,

You can do this-

Code:

 echo '<tr class=\"aclass\"><td>some data here</td></tr>'|grep -o '<td>.*'
<td>some data here</td></tr>

"grep -o 'target.*'" prints only the matched part of the line: the match starts at the target, and .* extends it to the end of the line.
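
To see how this behaves on a line like the first one in data.txt, with table markup in front of the record, here is a minimal sketch. The sample line is made up for illustration, and it assumes a grep that supports -o.

```shell
# Hypothetical first line: table tags and headers precede the record.
line='<table><tr><th>Head</th></tr><tr class="aclass"><td>some data here</td></tr>'

# -o prints only the matched text. The match starts at the first <td>
# and .* extends it to the end of the line, so the leading markup is dropped.
result=$(printf '%s\n' "$line" | grep -o '<td>.*')
printf '%s\n' "$result"
```

Note that if a row has several cells, the output starts at the first <td> and includes all of them.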

Regards,
s

some1 · #3 Post by **some1** » Wed 22 May 2013, 22:49

Hi
looks like an awk-job
awk is good for tag-parsing

-maybe as simple as this:

awk '/<td>/,/<\/td>/' "$filein">"$fileout"

This gets the lines from one matching <td> through one matching </td>; note that the / has to be escaped.
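
The range pattern above selects whole lines, so on its own it will not cut anything off the front of a line. A possible variation, sketched here on a made-up sample line, uses awk's sub() to delete everything up to and including the row tag before printing. The class name "aclass" is taken from the original question.

```shell
# Hypothetical first line: table tags and headers precede the record.
line='<table><tr><th>Head</th></tr><tr class="aclass"><td>some data here</td></tr>'

# For each line containing the row tag, remove everything from the start
# of the line through the tag itself, then print what remains.
result=$(printf '%s\n' "$line" |
  awk '/<tr class="aclass">/ { sub(/^.*<tr class="aclass">/, ""); print }')
printf '%s\n' "$result"
```

With real files, the printf would be replaced by passing "$filein" to awk and redirecting to "$fileout" as in the command above.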

Cheers

PaulR · #4 Post by **PaulR** » Thu 23 May 2013, 16:46

They look promising! Thanks both, I'll try them out later

Paul

Karl Godt · #5 Post by **Karl Godt** » Thu 23 May 2013, 17:49

Can grep return:

<td>some data here</td></tr>

Like many Linux tools, grep has a --help option.

Its output includes this option:

-o, --only-matching show only the part of a line matching PATTERN

Code:

SYSTEM "grep -o  '<tr class=\"aclass\">' page.txt > data.txt"

Disclaimer: I am no BaCon expert.
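
One thing to watch with -o: it prints exactly what the pattern matched, so a pattern that is only the row tag prints only the row tag. A rough sketch of one way around that, on a made-up sample line, is to extend the pattern to the end of the line and then drop the tag with cut. Plain shell is shown here; the BaCon quoting inside SYSTEM is left out.

```shell
# Hypothetical first line: table tags and headers precede the record.
line='<table><tr><th>Head</th></tr><tr class="aclass"><td>some data here</td></tr>'

# Pattern is just the tag, so -o prints just the tag.
tag_only=$(printf '%s\n' "$line" | grep -o '<tr class="aclass">')

# Extend the match to the end of the line, then use cut to drop
# everything up to and including the first '>' (the row tag itself).
rest=$(printf '%s\n' "$line" | grep -o '<tr class="aclass">.*' | cut -d'>' -f2-)

printf '%s\n' "$tag_only" "$rest"
```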

seaside · #6 Post by **seaside** » Thu 23 May 2013, 19:34

PaulR,

You could code it all in BaCon using just one system call to wget. For example, Vovchik has coded Yweather, which parses XML data and is written entirely in BaCon BASIC.

http://murga-linux.com/puppy/viewtopic. ... h&id=63295

Regards,
s

PaulR · #7 Post by **PaulR** » Fri 24 May 2013, 17:39

I couldn't get awk or grep to work - both kept returning the entire line (including the search target) rather than text from the target onwards.

In the end I made a work-around with two lines of BaCon code. It'll be fine as long as the format of the web page doesn't change radically!

Paul

Ibidem · #8 Post by **Ibidem** » Fri 24 May 2013, 23:32

I know it's late, but...
I would have done something like this:
sed -ne 's/.*<tr class="class">\(<td>.*<\/td><\/tr>\).*/\1/gp'

grep cannot modify matches; you print everything, the whole match, or nothing.
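
For anyone puzzling over the syntax, here is a small sketch of the same idea on a made-up sample line. It uses the class name "aclass" from the original question and | as the sed delimiter so the slashes in the closing tags need no escaping. The part between \( and \) is captured, and \1 puts back only that part.

```shell
# Hypothetical first line: table tags and headers precede the record.
line='<table><tr><th>Head</th></tr><tr class="aclass"><td>some data here</td></tr>'

# -n suppresses default output; the trailing p prints only lines where
# the substitution happened. The whole line is replaced by the captured
# group, i.e. the cells that follow the row tag.
result=$(printf '%s\n' "$line" |
  sed -n 's|.*<tr class="aclass">\(<td>.*</td></tr>\).*|\1|p')
printf '%s\n' "$result"
```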

PaulR · #9 Post by **PaulR** » Sat 25 May 2013, 12:24

Thanks for that, I don't pretend to understand the syntax at present but it's always useful to have an alternative!

Paul
