Puppy Linux Discussion Forum
All times are UTC - 4
Can grep do something like this?
PaulR

Joined: 04 May 2005
Posts: 246
Location: UK

Posted: Wed 22 May 2013, 16:22    Subject: Can grep do something like this?

Using BaCon with some shell commands (a total novice in the scripting department!)...

I'm trying to parse a dynamically generated (aspx) html file to get at data stored in a table.

First I'm using wget with a custom-built url to get the page to a local file (page.txt). Next I'm using grep like this to just get the lines containing data...

SYSTEM "grep '<tr class=\"aclass\">' page.txt > data.txt"


Then I'm stepping through the lines in data.txt and getting the information between the <td></td> tags.

It works great, except that the first line contains a single record AND lots of stuff before it (the table tags, table headers etc).

My question therefore: is it possible for grep to return only the contents of the line after the match rather than the whole line? For example, if the whole line was like this;

<tr class="aclass"><td>some data here</td></tr>

Can grep return:

<td>some data here</td></tr>

Ideally I'd like to do this in one line rather than execute a separate script :)

TIA

Paul
seaside

Joined: 11 Apr 2007
Posts: 887

Posted: Wed 22 May 2013, 18:09

PaulR,

You can do this:
Code:
 echo '<tr class=\"aclass\"><td>some data here</td></tr>'|grep -o '<td>.*'
<td>some data here</td></tr>


"grep -o 'target.*'" prints only the matched part of the line: the target itself plus, because of the .*, everything after it up to the end of the line.
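
If the search target itself should also be dropped from the output, something like the following might work. This is only a sketch: it assumes a GNU grep built with PCRE support (the -P option), which the busybox grep found on some systems does not provide, and the sample line is made up for illustration.

```shell
# Assumption: GNU grep with -P (PCRE) support; not every grep build has it.
# Hypothetical sample line with table markup in front of the data row.
line='<table><tr><th>Header</th></tr><tr class="aclass"><td>some data here</td></tr>'

# \K discards everything matched so far from the reported match,
# so -o prints only the text that follows the target.
result=$(printf '%s\n' "$line" | grep -oP '<tr class="aclass">\K.*')
printf '%s\n' "$result"
```

With the sample line above this prints <td>some data here</td></tr>, i.e. the rest of the line without the target.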

Regards,
s
some1

Joined: 17 Jan 2013
Posts: 34

Posted: Wed 22 May 2013, 18:49

Hi
looks like an awk-job
awk is good for tag-parsing

-maybe as simple as this:

Quote:

awk '/<td>/,/<\/td>/' "$filein">"$fileout"

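This range pattern prints every line from one matching <td> through the next line matching </td> (whole lines, not just the text between the tags). Note that the / in </td> has to be escaped.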
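
When the opening and closing tags sit on the same line, awk can also split the line on the tags themselves. The following is a rough sketch, not a tested solution for the real page: the sample file contents are invented, and it assumes the header cells use <th> so that only data cells are wrapped in <td>.

```shell
# Hypothetical sample data standing in for the downloaded page.
filein=$(mktemp)
printf '%s\n' \
  '<table><tr><th>Name</th><th>Value</th></tr>' \
  '<tr class="aclass"><td>some data here</td><td>42</td></tr>' \
  '</table>' > "$filein"

# Use <td> or </td> as the field separator on matching rows;
# the even-numbered fields are then the cell contents.
cells=$(awk -F'</?td>' '/<tr class="aclass">/ { for (i = 2; i < NF; i += 2) print $i }' "$filein")
printf '%s\n' "$cells"
rm -f "$filein"
```

For the sample row this prints "some data here" and "42" on separate lines, which saves stepping through the tags by hand afterwards.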

Cheers
PaulR

Joined: 04 May 2005
Posts: 246
Location: UK

Posted: Thu 23 May 2013, 12:46

They look promising! Thanks both, I'll try them out later :D

Paul
Karl Godt


Joined: 20 Jun 2010
Posts: 3972
Location: Kiel,Germany

Posted: Thu 23 May 2013, 13:49

Quote:
Can grep return:

<td>some data here</td></tr>

grep, like many Linux tools, has a --help option.

Its output includes this option:
Quote:
-o, --only-matching show only the part of a line matching PATTERN


Code:
SYSTEM "grep -o '<tr class=\"aclass\">.*' page.txt > data.txt"

Disclaimer: I am no BaCon expert.
seaside

Joined: 11 Apr 2007
Posts: 887

Posted: Thu 23 May 2013, 15:34

PaulR,

You could code it all in BaCon, using just one system call to wget. For example, Vovchik has coded Yweather, which parses XML data and is written entirely in BaCon BASIC.

http://murga-linux.com/puppy/viewtopic.php?mode=attach&id=63295

Regards,
s
PaulR

Joined: 04 May 2005
Posts: 246
Location: UK

Posted: Fri 24 May 2013, 13:39

I couldn't get awk or grep to work - both kept returning the entire line (including the search target) rather than text from the target onwards.

In the end I made a work-around with two lines of BaCon code. It'll be fine as long as the format of the web page doesn't change radically!

Paul
Ibidem

Joined: 25 May 2010
Posts: 501
Location: State of Jefferson

Posted: Fri 24 May 2013, 19:32

I know it's late, but...
I would have done something like this:
sed -ne 's|.*<tr class="aclass">\(<td>.*</td></tr>\).*|\1|p'

grep cannot modify matches; it prints the whole line, only the matched part (with -o), or nothing.
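
A simpler variant of the same sed idea is to delete everything up to and including the target, and print only the lines where that substitution happened. This is a minimal sketch using a made-up sample line; it assumes the target appears once per line (the greedy .* would otherwise cut up to the last occurrence).

```shell
# Hypothetical sample line with table markup in front of the data row.
line='<table><tr><th>Header</th></tr><tr class="aclass"><td>some data here</td></tr>'

# -n suppresses the default output; the p flag prints only lines
# on which the substitution actually took place.
rest=$(printf '%s\n' "$line" | sed -n 's/.*<tr class="aclass">//p')
printf '%s\n' "$rest"
```

For the sample line this leaves <td>some data here</td></tr>. Because the pattern contains no slashes, the default / delimiter can be kept here.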
PaulR

Joined: 04 May 2005
Posts: 246
Location: UK

Posted: Sat 25 May 2013, 08:24

Thanks for that, I don't pretend to understand the syntax at present but it's always useful to have an alternative! :D

Paul