Puppy Linux Discussion Forum
Puppy HOME page : puppylinux.com
"THE" alternative forum : puppylinux.info
 
A sed expression to deal with parsing wikitext [SOLVED]
thunor


Joined: 14 Oct 2010
Posts: 350
Location: Minas Tirith, in the Pelennor Fields fighting the Easterlings

PostPosted: Fri 03 May 2013, 10:13    Post subject:  A sed expression to deal with parsing wikitext [SOLVED]  

I've written and am tweaking a wikitext parser using sed and I want to make it as compatible with Creole 1.0 as possible but I'm having problems with //italic//.

I can't find a way to deal with this:
Code:
//some text [[http://www.murga-linux.com/puppy|Puppy Linux Discussion Forum]] some more text//


All I've got is this, which italicises a run of one or more non-slash characters:
Code:
sed -e 's|//\([^/]\+\)//|<em>\1</em>|g'


To be honest this is acceptable anyway:
Code:
//some text// [[http://www.murga-linux.com/puppy|//Puppy Linux Discussion Forum//]] //some more text//


but I just wondered if there's a sed wizard about who knows how to deal with "// .* not:// .* //" because it might help me tweak some other stuff. I basically want to negate a string of more than one char, and I think you can only negate single chars.

Regards,
Thunor

Last edited by thunor on Fri 03 May 2013, 17:45; edited 1 time in total
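
To make the limitation concrete, here is a minimal sketch that runs the single-expression idea against a plain span and against a span containing a URL. The helper name naive_italic and the example.org link are assumptions made for illustration, and [^/][^/]* is used as a portable spelling of the GNU-only [^/]\+.

```shell
#!/bin/sh
# naive_italic: hypothetical helper wrapping the single-expression approach.
# [^/][^/]* means "one or more non-slash characters" in portable sed syntax.
naive_italic() {
    sed -e 's|//\([^/][^/]*\)//|<em>\1</em>|g'
}

# Plain span: converted as intended.
echo '//some text//' | naive_italic

# Span containing a URL: the "//" after "http:" is taken as the closing marker,
# so the <em> element ends in the middle of the link.
echo '//some [[http://example.org|link]] text//' | naive_italic
```

The second call closes the emphasis right after "http:", which is the behaviour the question is trying to avoid.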
seaside

Joined: 11 Apr 2007
Posts: 886

PostPosted: Fri 03 May 2013, 11:52    Post subject:  

Hey thunor,

I don't have a sed answer, but perhaps a bash solution would do....
Code:
# line='//some text [[http://www.murga-linux.com/puppy|Puppy Linux Discussion Forum]] some more text//'
# line=${line/#\/\//<em>}  line=${line/%\/\//</em>}
# echo "$line"
<em>some text [[http://www.murga-linux.com/puppy|Puppy Linux Discussion Forum]] some more text</em>


Best regards,
s
EDIT: A little experimenting and
Code:
echo "$line" | sed 's|^//|<em>|;s|//$|</em>|'
works :)
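
As a rough illustration of what the anchored form covers, the sketch below applies it to a line with one span and to a line with two. The helper name wrap_line is hypothetical; the two expressions only look at the very start and the very end of the line.

```shell
#!/bin/sh
# wrap_line: hypothetical helper; converts only a leading "//" and a trailing "//".
wrap_line() {
    sed -e 's|^//|<em>|' -e 's|//$|</em>|'
}

# A single span covering the whole line is converted.
echo '//only span//' | wrap_line

# With two spans, the inner markers are left as they are.
echo '//first// plain //second//' | wrap_line
```

So this form fits input where each line is exactly one italic span, and it leaves any "//" inside the line (including the one in "http://") untouched.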
thunor


Joined: 14 Oct 2010
Posts: 350
Location: Minas Tirith, in the Pelennor Fields fighting the Easterlings

PostPosted: Fri 03 May 2013, 17:42    Post subject:  

Thanks seaside but it needs to deal with multiples on the same line which I should've mentioned.

It did get me thinking, though, about handling it before or after the sed pass, with something like you've done or with a case statement. Then I thought about temporarily substituting "://" and putting it back afterwards. In the end I managed it in sed using temporary string substitution:
Code:
echo '//some text [[http://www.murga-linux.com/puppy|Puppy Linux Discussion Forum]] some more text//' | sed \
   -e 's|://|@COLON_SLASH_SLASH@|g' \
   -e 's|//|@SLASH_SLASH@|g' \
   -e 's|/|@SLASH@|g' \
   -e 's|@SLASH_SLASH@|//|g' \
\
   -e 's|//\([^/]\+\)//|<em>\1</em>|g' \
\
   -e 's|@SLASH@|/|g' \
   -e 's|@COLON_SLASH_SLASH@|://|g'

Cheers and regards,
Thunor
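
The placeholder technique can be wrapped in a function and tried on a line with more than one span. This is only a sketch: the name italicise is hypothetical, [^/][^/]* stands in for the GNU-only [^/]\+, and it assumes the input never contains the literal placeholder strings.

```shell
#!/bin/sh
# italicise: hypothetical wrapper around the placeholder technique.
# Order of the expressions:
#   1. hide "://" so URL slashes are not mistaken for italic markers
#   2. hide the remaining "//" markers, then hide every single "/" (paths),
#      then bring the "//" markers back
#   3. convert each "//...//" pair; the text in between now has no "/" left
#   4. restore the single slashes and the "://" sequences
italicise() {
    sed \
        -e 's|://|@COLON_SLASH_SLASH@|g' \
        -e 's|//|@SLASH_SLASH@|g' \
        -e 's|/|@SLASH@|g' \
        -e 's|@SLASH_SLASH@|//|g' \
        -e 's|//\([^/][^/]*\)//|<em>\1</em>|g' \
        -e 's|@SLASH@|/|g' \
        -e 's|@COLON_SLASH_SLASH@|://|g'
}

echo '//one [[http://linux.com/learn|Learn Linux]] two// plain //three//' | italicise
```

Hiding the single slashes matters because the middle expression can only cross characters that are not "/", and a path such as /learn would otherwise stop the match.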
sunburnt


Joined: 08 Jun 2005
Posts: 5016
Location: Arizona, U.S.A.

PostPosted: Fri 03 May 2013, 18:39    Post subject:  

thunor; You're not very clear about what you're trying to do.

You posted an example input line, can you post what you want it to look like?

Or is this what you want?
Quote:
Input: //some text [[http://www.murga-linux.com/puppy|Puppy Linux Discussion Forum]] some more text//

Output: //some text// [[http://www.murga-linux.com/puppy|//Puppy Linux Discussion Forum//]] //some more text//

If so then this does the trick: echo "$input" | sed 's# \[\[#// \[\[#;s#|#|//#;s#\]\] #//\]\] //#'
The "[" characters are escaped with "\" because sed treats an unescaped "[" in the pattern as the start of a bracket expression; inside single quotes Bash passes them through untouched.

### But maybe you're trying to italicize the "some text" parts?
.
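
A small sketch of the bracket escaping, under the assumption that the sed program is single-quoted so the shell does not interpret anything in it. The helper name mark_links and the "<<" / ">>" replacement text are made up for the example.

```shell
#!/bin/sh
# mark_links: hypothetical helper showing literal "[[" and "]]" in a sed pattern.
# "\[" is needed because a bare "[" opens a bracket expression in the regex.
# A "]" that does not close a bracket expression is an ordinary character.
mark_links() {
    sed -e 's|\[\[|<<|' -e 's|]]|>>|'
}

echo 'see [[target]] here' | mark_links
```

Escaping "]" as well, as in the one-liner above, is harmless with GNU sed and keeps both brackets visually symmetric.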
thunor


Joined: 14 Oct 2010
Posts: 350
Location: Minas Tirith, in the Pelennor Fields fighting the Easterlings

PostPosted: Fri 03 May 2013, 18:51    Post subject:  

sunburnt wrote:
thunor; You're not very clear about what you're trying to do.

You posted an example input line, can you post what you want it to look like?...

Hi sunburnt

This (I'll give you an example using multiples on the same line which needs to be supported):
Code:
//some italicised text [[http://linux.com/learn|Learn Linux]] some italicised text// some non-italicised text //some italicised text [[http://linux.com/learn|Learn Linux]] some italicised text//

to:
Code:
<em>some italicised text [[http://linux.com/learn|Learn Linux]] some italicised text</em> some non-italicised text <em>some italicised text [[http://linux.com/learn|Learn Linux]] some italicised text</em>

and ultimately once I've processed the wikitext formatted external URLs it'll output as:

some italicised text Learn Linux some italicised text some non-italicised text some italicised text Learn Linux some italicised text

I did solve it by substituting the conflicting slashes with something else and then putting them back afterwards which seems the logical thing to do.

This is just an example of the problem I had. I need to be able to italicise //everything and anything// that appear inside double slashes //multiple times// on the same line.

Regards,
Thunor
seaside

Joined: 11 Apr 2007
Posts: 886

PostPosted: Fri 03 May 2013, 20:26    Post subject:  

Thunor,

I guess this could be done with sed's hold space and buffer manipulations, which I don't really comprehend. Your solution is to the point and much easier to understand (none of those strange char combinations that require a lookup) :)

Best Regards,
s
(You must be the sed wizard you were looking for :) )
technosaurus


Joined: 18 May 2008
Posts: 4348

PostPosted: Fri 03 May 2013, 22:07    Post subject:  

I recommend posting this to Stack Overflow if you can't already find the answer there.

Using awk, and assuming the spans don't cross lines (if they can, just set RS="EOF" or something in the BEGIN section):

Code:
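awk '
BEGIN{FS="//"}
{
for(i=1;i<=NF;i++){
    printf "%s", $i                 # odd fields: text outside the markers
    i++
    if(i<NF){
        printf "<em>%s</em>", $i    # even fields: text between a pair of "//"
    }else if(i==NF){
        printf "//%s", $i           # unmatched "//": put it back unchanged
    }
}
print ""                            # end the output line
}
'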

_________________
Web Programming - Pet Packaging 100 & 101
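
To show what the awk approach relies on, this sketch lists the fields awk produces when the field separator is "//". The helper name show_fields is hypothetical. With balanced markers, the even-numbered fields are the ones that sat between a pair of "//".

```shell
#!/bin/sh
# show_fields: hypothetical helper that prints each field with its index
# after splitting the line on "//".
show_fields() {
    awk 'BEGIN { FS = "//" } { for (i = 1; i <= NF; i++) printf "%d:[%s]\n", i, $i }'
}

echo '//a// b //c//' | show_fields
# prints:
# 1:[]
# 2:[a]
# 3:[ b ]
# 4:[c]
# 5:[]
```

The same split also happens at the "//" inside "http://", which is why a URL shifts the odd/even pattern unless it is hidden first.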
sunburnt


Joined: 08 Jun 2005
Posts: 5016
Location: Arizona, U.S.A.

PostPosted: Sat 04 May 2013, 14:12    Post subject:  

That's essentially what I was going to offer up,

A Bash loop to handle the <em></em> tag pairs and ignore "http://".

techysaurus ;) is always spot on for the most succinct script code...
technosaurus


Joined: 18 May 2008
Posts: 4348

PostPosted: Sat 04 May 2013, 15:18    Post subject:  

sunburnt wrote:
and ignore "http://"
for that you'd need something before the i++ like:
Code:
if(substr($i,length($i),1)==":"){printf "//";continue}

seaside

Joined: 11 Apr 2007
Posts: 886

PostPosted: Sat 04 May 2013, 20:18    Post subject:  

technosaurus,

I tried to get this part -
Code:
if(substr($i,length($i),1)==":"){printf "//";continue}
to work and couldn't. So here's a crossover "Thunor-@colon_slash_slash@" awk version.
Code:
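awk '
BEGIN{FS="//"}

# Hide "://" first; changing $0 makes awk split the fields again.
{gsub("://","@colon_slash_slash@")}

{
out=""
for(i=1;i<=NF;i++){
    out=out $i                          # text outside the markers
    i++
    if(i<NF){
        out=out "<em>" $i "</em>"       # text between a pair of "//"
    }else if(i==NF){
        out=out "//" $i                 # unmatched "//": keep it as it was
    }
}
gsub("@colon_slash_slash@","://",out)   # put every "://" back
print out
}
'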


No speed difference between sed and awk versions.

Best regards,
s
(Hmmm..."@colon_slash_slash@" sounds more like a colonoscopy, only more comfortable in code than in person) :)
technosaurus


Joined: 18 May 2008
Posts: 4348

PostPosted: Sat 04 May 2013, 22:14    Post subject:  

damn,... I was trying to do it in my head again without running the code - wasn't 100% sure continue was supported the way it is in shell ... anyhow consider it pseudo code
Quote:
@colon_slash_slash@" sounds more like a colonoscopy
reminds me of a scene in the movie Seven
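
For what it's worth, awk does accept continue inside a for loop, so the one-liner probably failed for another reason, possibly because it prints "//" without printing the field it has just tested. A minimal sketch of continue on its own, with the hypothetical helper name odd_only:

```shell
#!/bin/sh
# odd_only: hypothetical demo of "continue" in an awk for loop.
# A BEGIN-only program reads no input, so it can be run on its own.
odd_only() {
    awk 'BEGIN {
        for (i = 1; i <= 5; i++) {
            if (i % 2 == 0)
                continue        # skip even values and go on with the next i
            out = out i         # collect the odd values
        }
        print out
    }'
}

odd_only
# prints: 135
```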