Can grep do something like this?

PaulR
Posts: 249
Joined: Wed 04 May 2005, 18:45
Location: UK


#1 Post by PaulR »

Using BaCon with some shell commands (a total novice in the scripting department!)...

I'm trying to parse a dynamically generated (ASPX) HTML page to get at data stored in a table.

First I use wget with a custom-built URL to save the page to a local file (page.txt). Next I use grep like this to keep only the lines containing data...

SYSTEM "grep '<tr class=\"aclass\">' page.txt > data.txt"


Then I'm stepping through the lines in data.txt and getting the information between the <td></td> tags.
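For comparison, the same cell-extraction step can also be done in the shell. This is a minimal sketch, assuming each cell holds plain text with no nested tags; the function name extract_cells and the sample row are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the text of each <td>...</td> cell, one per line.
# Assumes cells hold plain text (no nested tags such as <b> or <a>).
extract_cells() {
    # grep -o prints every match on its own line; [^<]* stops at the
    # next '<', so each match is exactly one cell.
    printf '%s\n' "$1" |
        grep -o '<td>[^<]*</td>' |
        sed -e 's/^<td>//' -e 's/<\/td>$//'
}

# Example: a made-up row in the same shape as the lines in data.txt
extract_cells '<tr class="aclass"><td>Alpha</td><td>42</td></tr>'
```

Using [^<]* rather than .* keeps each match inside a single cell; a greedy .* would run from the first <td> to the last </td> on the line.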

It works great, except that the first line contains a single record AND lots of stuff before it (the table tags, table headers, etc.).

So my question is: can grep return only the part of the line after the match, rather than the whole line? For example, if the whole line looked like this:

<tr class="aclass"><td>some data here</td></tr>

Can grep return:

<td>some data here</td></tr>

Ideally I'd like to do this in one line rather than execute a separate script :)

TIA

Paul

seaside
Posts: 934
Joined: Thu 12 Apr 2007, 00:19

#2 Post by seaside »

PaulR,

You can do this-


echo '<tr class="aclass"><td>some data here</td></tr>' | grep -o '<td>.*'
<td>some data here</td></tr>
With "grep -o target.*", grep prints only the part of the line that matches the pattern: the target itself plus, via .*, the rest of the line after it.

Regards,
s
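This also takes care of the problem first line. Below is a small sketch on a made-up copy of data.txt whose first line carries table and header markup in front of the record. It assumes the header cells use <th> rather than <td>; if the header also used <td>, the match would start at the first header cell instead.

```shell
#!/bin/sh
# Made-up stand-in for data.txt: the first line has table/header markup
# before the first record, as described in the question.
sample='<table><tr><th>Name</th></tr><tr class="aclass"><td>Alpha</td></tr>
<tr class="aclass"><td>Beta</td></tr>'

# grep -o prints only the matched text: from the first <td> on each line
# to the end of that line (.* is greedy), so the leading markup is dropped.
printf '%s\n' "$sample" | grep -o '<td>.*'
```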

some1
Posts: 117
Joined: Thu 17 Jan 2013, 11:07

#3 Post by some1 »

Hi
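looks like a job for awk;
awk is good for tag parsing.

Maybe as simple as this:
awk '/<td>/,/<\/td>/' "$filein" > "$fileout"
This prints whole lines: every line from one containing <td> through the next one containing </td> (when both are on the same line, just that line). Note that the / in </td> has to be escaped inside the /.../ pattern.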
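If the goal is the text after the target rather than whole lines, a minimal awk sketch uses sub() to delete everything from the start of the line through the target. The sample lines are made up, and the pattern assumes the target occurs once per line (with several, the greedy .* would strip up to the last one).

```shell
#!/bin/sh
# Hypothetical stand-in for page.txt: the first line carries header markup.
sample='<table><tr><th>Name</th></tr><tr class="aclass"><td>Alpha</td></tr>
<tr class="aclass"><td>Beta</td></tr>
</table>'

# Only lines containing the target are processed. sub() replaces the
# longest match of .*<tr class="aclass"> (start of line through the target)
# with an empty string, and the remainder of the line is printed.
printf '%s\n' "$sample" |
    awk '/<tr class="aclass">/ { sub(/.*<tr class="aclass">/, ""); print }'
```

Against the real file, the same awk program would read from page.txt and redirect its output to data.txt, the file names used in the first post.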

Cheers

PaulR
Posts: 249
Joined: Wed 04 May 2005, 18:45
Location: UK

#4 Post by PaulR »

They look promising! Thanks both, I'll try them out later :D

Paul

Karl Godt
Posts: 4199
Joined: Sun 20 Jun 2010, 13:52
Location: Kiel,Germany

#5 Post by Karl Godt »

PaulR wrote:
Can grep return:

<td>some data here</td></tr>
grep, like many Linux tools, has a --help option.

This would show the option
-o, --only-matching       show only the part of a line matching PATTERN


SYSTEM "grep -o '<tr class=\"aclass\">.*' page.txt > data.txt"
Disclaimer: I am no BaCon expert.
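To see what -o does with different patterns, here is a small sketch on a made-up line shaped like the first line of page.txt. The bare target prints only the target itself, adding .* keeps everything from the target to the end of the line, and since grep cannot drop the target, a sed substitution is one way to remove it too.

```shell
#!/bin/sh
# Hypothetical line: header markup, then the target, then one record.
line='<table><tr><th>Name</th></tr><tr class="aclass"><td>some data here</td></tr>'

# 1. Pattern is just the target: -o prints only the target text.
printf '%s\n' "$line" | grep -o '<tr class="aclass">'

# 2. Target followed by .*: -o prints the target and the rest of the line,
#    so the markup before the target is gone.
printf '%s\n' "$line" | grep -o '<tr class="aclass">.*'

# 3. Dropping the target as well: sed deletes everything up to and
#    including it; -n plus the p flag prints only lines that were changed.
printf '%s\n' "$line" | sed -n 's/.*<tr class="aclass">//p'
```

Case 3 prints <td>some data here</td></tr>, which is the one-line result asked for in the first post.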

seaside
Posts: 934
Joined: Thu 12 Apr 2007, 00:19

#6 Post by seaside »

PaulR,

You could code it all in BaCon, using just one system call to wget. For example, Vovchik wrote Yweather, which parses XML data and is written entirely in BaCon.

http://murga-linux.com/puppy/viewtopic. ... h&id=63295

Regards,
s

PaulR
Posts: 249
Joined: Wed 04 May 2005, 18:45
Location: UK

#7 Post by PaulR »

I couldn't get awk or grep to work - both kept returning the entire line (including the search target) rather than text from the target onwards.

In the end I made a work-around with two lines of BaCon code. It'll be fine as long as the format of the web page doesn't change radically!

Paul

Ibidem
Posts: 549
Joined: Wed 26 May 2010, 03:31
Location: State of Jefferson

#8 Post by Ibidem »

I know it's late, but...
I would have done something like this:
sed -ne 's|.*<tr class="aclass">\(<td>.*</td></tr>\).*|\1|p' page.txt

grep cannot modify what it matches: it prints the whole line, only the matched part (with -o), or nothing.
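Since sed syntax can look cryptic, here is the capture-group approach spelled out piece by piece on a made-up line. Note that </td> contains a /, so the substitution cannot use / as its delimiter; this version uses | instead (sed accepts any character after s), with the class name aclass from the first post.

```shell
#!/bin/sh
# Hypothetical line in the shape of the first line of page.txt.
line='<table><tr><th>Name</th></tr><tr class="aclass"><td>some data here</td></tr>'

# -n                    : do not print lines automatically
# s|...|...|p           : substitute, then print only if a substitution happened
#                         ('|' is the delimiter because </td> contains a '/')
# .*<tr class="aclass"> : everything up to and including the target
# \(<td>.*</td></tr>\)  : capture group 1, the cells plus the closing </tr>
# .*                    : anything left after that
# \1                    : replace the whole line with capture group 1
printf '%s\n' "$line" |
    sed -n 's|.*<tr class="aclass">\(<td>.*</td></tr>\).*|\1|p'
```

Inside a BaCon SYSTEM string, the double quotes would presumably need the same \" escaping used in the first post.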

PaulR
Posts: 249
Joined: Wed 04 May 2005, 18:45
Location: UK

#9 Post by PaulR »

Thanks for that, I don't pretend to understand the syntax at present but it's always useful to have an alternative! :D

Paul
