Puppy Linux Discussion Forum
Geany - automate "replace" with regular expressions??
greengeek

Joined: 20 Jul 2010
Posts: 4326
Location: New Zealand

Posted: Fri 26 Jun 2015, 15:52    Post subject: Geany - automate "replace" with regular expressions??
Subject description: Can it be done via shell script?
 

.
Is it possible to use a script to automate some of Geany's inbuilt editing functions so that I don't have to work through a long file manually driving the find/replace function?

********************************************************************
EDIT : Here is an important more recent note from technosaurus:
technosaurus wrote:
Just thought I would mention that as of 1.25 geany (released this month) has a checkbox to allow multiline regex or otherwise uses sed-style matching.
Quoted from other thread here
********************************************************************

Continued original thread:
I use Geany to edit a bunch of text messages that I copy off my cellphone. First I copy the two data lines out of each backed-up text message, then I use the "replace" function to strip the extra words/numbers/jargon/formatting etc. that accompany the text message, so that I end up with something tidy and easily readable.

I have learned how to set the "use regular expressions" function in the "replace" menu so now I can search for word strings and insert/delete line feeds, tabs etc but it would be nice to be able to automate this in a script of some sort.

Here is an example of the sort of data formatting I am working with:
Code:
Date:02.06.2015 10.12.35
TEXT:Hey,did you find the 2nd wheel?
Date:02.06.2015 10.14.17
TEXT:Look in the garage
Date:02.06.2015 10.16.00
TEXT:Its definitely there!!


and here is an example of what I want my formatting to achieve:
Code:
02.06   10.12.35      Hey,did you find the 2nd wheel?
02.06   10.14.17      Look in the garage
02.06   10.16.00      Its definitely there!!


I would like to know if there is some way to use a script to open Geany and have it access my file, do several "replace" functions then save the new file. Here are the notes I have made for myself to follow when I do the manual processing:

Code:
Open the Geany 'replace' menu then turn on "Replace all - in document". Next turn on "regular expressions" in the replace dialog and then do the following steps:

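1) Replace Date: with \         (I treat \ as a 'backspace'; strictly speaking \ is the regex escape character, so leaving the replacement field empty is the safer way to delete "Date:")
2) Replace .2015(followed by space) with \t (\t is the escape sequence that means tab; an unescaped . matches any character, so \.2015 is the stricter pattern)
3) Replace \nTEXT: with \t\t   (\n means "the newline or linefeed preceding TEXT:") (\t\t means double tab)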


This method of using 'regular expressions' does let me edit the document manually exactly as I want, but I would like an automatic way to chain several replace operations one after the other, so that I can watch TV while Geany works through what will be a long file.
(Maybe there is a better way than Geany? - sed seems like an appropriate method but I struggle with the level of technical accuracy that sed requires so I do feel more comfortable with Geany at this stage)

Here is the typical 'replace' menu I am using in Geany (with "regular expressions" ticked):
[Attached image: Replace_regular_expressions.jpg]

Last edited by greengeek on Wed 22 Jul 2015, 15:36; edited 1 time in total
6502coder


Joined: 23 Mar 2009
Posts: 297
Location: Western United States

Posted: Fri 26 Jun 2015, 17:48

Better to use something like sed or awk for this.

I wrote an awk script named "dt.awk" to do the text processing you seem to want, while passing everything else through unchanged. The file "data.txt" is my guinea pig to test it out.

Code:

$ cat data.txt
Date:02.06.2015 10.12.35
TEXT:Hey,did you find the 2nd wheel?
Foo candy remains onto
Date:02.06.2015 10.14.17
TEXT:Look in the garage
jabba the hutt Date:green
Date:02.06.2015 10.16.00
TEXT:Its definitely there!!     
remarkably sound TEXT:mark
all right now

$ cat dt.awk
{   if (substr($1, 1, 5) == "Date:")
    {
        printf( "%s\t%s", substr($1,6), $2);
    }
    else if (substr($1, 1, 5) == "TEXT:")
    {
        printf( "\t%s\n", substr($0,6));
    }
    else
        print
}

$ awk -f dt.awk data.txt > data2.txt

$ cat data2.txt
02.06.2015      10.12.35        Hey,did you find the 2nd wheel?
Foo candy remains onto
02.06.2015      10.14.17        Look in the garage
jabba the hutt Date:green
02.06.2015      10.16.00        Its definitely there!! 
remarkably sound TEXT:mark
all right now
01micko


Joined: 11 Oct 2008
Posts: 8590
Location: qld

Posted: Fri 26 Jun 2015, 18:08

no need for geany or sed, just shell (ash) and /bin/echo

Code:
#!/bin/ash

while read line;do
   if echo "$line" | grep -q "^Date";then
      dline=${line#*:}
      dline=${dline%% *}
      date=${dline%.*}
      echo -n "$date" >> result
      time=${line##* }
      echo -en "\t $time" >> result
   else
      text=${line#*:}
      echo -e "\t\t $text" >> result
   fi
done < msgs


PS: you won't have time to watch TV, as a largish file should be done in seconds.

_________________
Puppy Linux Blog - contact me for access
MochiMoppel


Joined: 26 Jan 2011
Posts: 1149
Location: Japan

Posted: Fri 26 Jun 2015, 23:18    Post subject: Re: Geany - automate "replace" with regular expressions??
Subject description: Can it be done via shell script?
 

greengeek wrote:
I would like to have an automatic method to concatenate several replace functions
Not possible with Geany. But since you have already mastered the most difficult task, creating regex patterns, all there is left is to find a tool to apply these patterns to your text. If you have come so far, sed is simple. In your example you don't even need regular expressions (you could use Geany's option "Use escape sequences") and you could do it with pure bash (see below).

01micko wrote:
no need for geany or sed, just shell
Do I see grep? Methinks you are cheating :lol:

Here is a way to use "just shell":
Code:
#!/bin/bash
IN=$(cat msgs.txt)
OUT=${IN//Date:/}
OUT=${OUT//.2015 /$'\t'}
OUT=${OUT//$'\n'TEXT:/$'\t\t'}
echo "$OUT" > msgs_formatted.txt
01micko


Joined: 11 Oct 2008
Posts: 8590
Location: qld

Posted: Fri 26 Jun 2015, 23:24    Post subject: Re: Geany - automate "replace" with regular expressions??
Subject description: Can it be done via shell script?
 

MochiMoppel wrote:


01micko wrote:
no need for geany or sed, just shell
Do I see grep? Methinks you are cheating :lol:


Indeed!

Still not pure shell but printf is a busybox applet..

Code:
#!/bin/ash

while read line;do
   if [ "${line%:*}" = "Date" ];then
      dline=${line#*:}
      dline=${dline%% *}
      date=${dline%.*}
      time=${line##* }
      printf "${date}\t${time}\t\t" >> result
   else
      text=${line#*:}
      echo "$text" >> result
   fi
done < msgs

MochiMoppel


Joined: 26 Jan 2011
Posts: 1149
Location: Japan

Posted: Sat 27 Jun 2015, 02:40

If it really has to be Geany: Geany provides the option of sending text through custom commands. The manual gives an example of how to build a "replace all" command using sed. You can build on that and create more complex commands, but it all boils down to using shell scripting syntax.

Using my previous script you can try it:
Go to Edit > Format > Send selection to > Set custom commands
Double click on an empty command spot (probably command 1) and paste this glorious one-liner:
Code:
/bin/sh -c "IN=\"$(</dev/stdin)\";OUT=${IN//Date:/};OUT=${OUT//.2015 /$'\t'};OUT=${OUT//$'\n'TEXT:/$'\t\t'};echo -n \"$OUT\""

Preferences > Key bindings lets you define keyboard shortcuts for custom commands 1 ~ 3. I used <Primary>1 which translates into Ctrl+1
Now you can select your messages and after hitting Ctrl+1 the custom command can do the rest.
seaside

Joined: 11 Apr 2007
Posts: 911

Posted: Sat 27 Jun 2015, 12:44

Here's a one liner sed...

Code:
sed  -e 'N;s/\(.*\)\n\(.*\)/\1\2/' -e 's/Date:/ /' -e 's/TEXT:/ /' -e 's/\.[0-9]\{4\}//' msgs


It just appends the line below to the one above and removes unwanted items.

Cheers,
s
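
A variant of the same join-then-substitute idea that produces the tab-separated layout asked for in the first post. This is only a sketch: it assumes the input strictly alternates Date: and TEXT: lines, and the function name format_msgs and the inline sample data are made up for illustration.

```shell
#!/bin/sh
# Sketch: join each Date:/TEXT: pair and emit "date<TAB>time<TAB><TAB>text".
# Assumes the input strictly alternates Date: and TEXT: lines.
format_msgs() {
    tab=$(printf '\t')   # literal tab, so we do not rely on sed understanding \t
    sed -e 'N' \
        -e 's/^Date://' \
        -e "s/\.[0-9]\{4\} /$tab/" \
        -e "s/\nTEXT:/$tab$tab/"
}

out=$(format_msgs <<'EOF'
Date:02.06.2015 10.12.35
TEXT:Hey,did you find the 2nd wheel?
Date:02.06.2015 10.14.17
TEXT:Look in the garage
EOF
)
printf '%s\n' "$out"
```

To run it against a real file, redirect the file into the function instead of the here-document, e.g. format_msgs < msgs > msgs_formatted.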
greengeek

Joined: 20 Jul 2010
Posts: 4326
Location: New Zealand

Posted: Sat 27 Jun 2015, 15:41

Wow! Thank you all for the options. I am working my way through them now; I have tried the first three so far and they all do the business really well. This is whetting my appetite to extend the script to pick the relevant text lines out of a whole directory full of individual sms text files. This is going to be a real time saver.

I will report back with a summary of which option I find most usable and customisable for my future needs.
cheers!
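
A rough sketch of what the "whole directory" extension could look like, using awk to keep only the Date: and TEXT: lines of each file. Everything about the layout is an assumption: one message per *.txt file, the function name extract_dir, and the From:/Folder: lines in the demo data, which merely stand in for whatever other lines the raw backups contain.

```shell
#!/bin/sh
# Sketch: run the Date:/TEXT: extraction over every file in a directory.
# The directory layout and file names here are hypothetical.
extract_dir() {
    # $1 = directory holding one backed-up message per *.txt file (assumed)
    for f in "$1"/*.txt; do
        [ -f "$f" ] || continue   # skip if the glob matched nothing
        awk '
            /^Date:/ {
                d = substr($1, 6)          # strip the "Date:" prefix
                sub(/\.[0-9]+$/, "", d)    # drop the trailing .year
                printf "%s\t%s", d, $2     # date, tab, time
            }
            /^TEXT:/ { printf "\t\t%s\n", substr($0, 6) }   # message body
        ' "$f"
    done
}

# Demo on a throwaway directory with made-up message files
dir=$(mktemp -d)
printf 'From:someone\nDate:02.06.2015 10.12.35\nTEXT:Look in the garage\n' > "$dir/sms1.txt"
printf 'Date:02.06.2015 10.16.00\nFolder:inbox\nTEXT:Its definitely there!!\n' > "$dir/sms2.txt"
out=$(extract_dir "$dir")
printf '%s\n' "$out"
rm -rf "$dir"
```

Lines that match neither pattern are simply not printed, so nothing needs to be redirected or discarded explicitly.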
technosaurus


Joined: 18 May 2008
Posts: 4694

Posted: Sun 28 Jun 2015, 04:36

Geany can still do multiple matches; use this:
Code:
Date:([.0-9]*)[.][0-9]* ([.0-9]*).*\nTEXT:

Code:
\1 \2

each parenthesis is a match, which you can use similar to $1 $2 in shell
... regexes are about 1000% more useful with this 1 feature
sed can do the same thing, but thought it may be helpful to others in geany, because I use it all the time
... especially when I am dealing with stuff that spans multiple lines (where sed is less effective)

_________________
Web Programming - Pet Packaging 100 & 101
greengeek

Joined: 20 Jul 2010
Posts: 4326
Location: New Zealand

Posted: Sun 28 Jun 2015, 05:16

6502coder wrote:

Code:
$ cat dt.awk
{   if (substr($1, 1, 5) == "Date:")
    {
        printf( "%s\t%s", substr($1,6), $2);
    }
    else if (substr($1, 1, 5) == "TEXT:")
    {
        printf( "\t%s\n", substr($0,6));
    }
    else
        print
}


I am now trialling this method against the original unedited sms text files, which contain other text lines that I had edited out and did not show in my first post above. You mentioned that this code will be "passing everything else through unchanged", which I have decided I need to avoid. I have tried modifying the last line of the code as follows:

Code:
$ cat dt.awk
{   if (substr($1, 1, 5) == "Date:")
    {
        printf( "%s\t%s", substr($1,6), $2);
    }
    else if (substr($1, 1, 5) == "TEXT:")
    {
        printf( "\t%s\n", substr($0,6));
    }
    else
        print > /dev/null
}

This gives me the result I need in data2.txt although it does also create a file called "0" in /root (where I am doing my testing). Is there a better way for me to dump the extra unneeded data rather than using > /dev/null?
6502coder


Joined: 23 Mar 2009
Posts: 297
Location: Western United States

Posted: Sun 28 Jun 2015, 07:09

greengeek wrote:
Is there a better way for me to dump the extra unneeded data rather than using > /dev/null?


Yes, simply delete the two lines
Code:

     else
            print


That "0" file was being created by your "print > /dev/null" which is not correct awk syntax. (Awk syntax is not the same as shell syntax).
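
A likely explanation of the "0" file, offered as a reading of how awk parses the line rather than as something stated in this thread: unquoted, /dev/ is taken as a regular expression tested against the current line and null as an empty variable, so on lines that do not contain "dev" the redirect target evaluates to the string 0. Quoting the file name gives the redirect that was intended. The sketch below keeps the explicit discard only to show the syntax; the function name filter and the sample input are made up, and deleting the else branch remains the cleaner fix.

```shell
#!/bin/sh
# Sketch: discard unwanted lines from inside awk; sample data is made up.
filter() {
    awk '
    {
        if (substr($1, 1, 5) == "Date:")
            printf("%s\t%s", substr($1, 6), $2)
        else if (substr($1, 1, 5) == "TEXT:")
            printf("\t%s\n", substr($0, 6))
        else
            print > "/dev/null"   # quoted, so awk sees a file name and not a regex
    }'
}

out=$(printf 'Date:02.06.2015 10.12.35\nTEXT:Look in the garage\njabba the hutt Date:green\n' | filter)
printf '%s\n' "$out"
```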
technosaurus


Joined: 18 May 2008
Posts: 4694

Posted: Mon 29 Jun 2015, 01:06

technosaurus wrote:
Geany can still do multiple matches; use this:
Date:([.0-9]*)[.][0-9]* ([.0-9]*).*\nTEXT:
\1 \2
each parenthesis is a match, which you can use similar to $1 $2 in shell
... regexes are about 1000% more useful with this 1 feature
sed can do the same thing, but thought it may be helpful to others in geany, because I use it all the time
... especially when I am dealing with stuff that spans multiple lines (where sed is less effective)

Has anyone tried the parentheses matches?

greengeek

Joined: 20 Jul 2010
Posts: 4326
Location: New Zealand

Posted: Mon 29 Jun 2015, 04:16

technosaurus wrote:
Has anyone tried the parentheses matches?
I haven't yet - I'm still working my way through each of the options suggested. I will get through them all eventually.

As I try each one I am also trying to extend the functionality a bit to handle the original "raw" and unedited sms file format so it's going to take me a while...

EDIT: Now that I look at your post again I can see that I misinterpreted it at first - I thought you were extending MochiMoppel's custom commands post (which I have not fully understood yet) but now I see that your strings go into the "replace" dialog fields.

I just tried it and it works! Although I could do with another tab just before the text field. And I simply cannot figure out how you managed to dump the .2015

The power of these regexes leaves me speechless. I'm finding this stuff really interesting!
[Attached image: geany_multiple_regex.jpg]

Last edited by greengeek on Mon 29 Jun 2015, 04:29; edited 1 time in total
MochiMoppel


Joined: 26 Jan 2011
Posts: 1149
Location: Japan

Posted: Mon 29 Jun 2015, 04:23

technosaurus wrote:
Has anyone tried the parentheses matches?
Yes. Works nicely, but needs a very good command of regex syntax. Maybe a bit too complex for simple search & replace operations.
greengeek

Joined: 20 Jul 2010
Posts: 4326
Location: New Zealand

Posted: Mon 29 Jun 2015, 05:37

01micko wrote:
Code:
#!/bin/ash

while read line;do
   if echo "$line" | grep -q "^Date";then
      dline=${line#*:}
      dline=${dline%% *}
      date=${dline%.*}
      echo -n "$date" >> result
      time=${line##* }
      echo -en "\t $time" >> result
   else
      text=${line#*:}
      echo -e "\t\t $text" >> result
   fi
done < msgs


Code:
#!/bin/ash

while read line;do
   if [ "${line%:*}" = "Date" ];then
      dline=${line#*:}
      dline=${dline%% *}
      date=${dline%.*}
      time=${line##* }
      printf "${date}\t${time}\t\t" >> result
   else
      text=${line#*:}
      echo "$text" >> result
   fi
done < msgs


The first time I tried these they seemed to work OK, but now that I try them again they do not process the final text field. No matter how many Date/TEXT lines I add, the last TEXT field is always missing. Is it just me?
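
One possible cause, offered as a guess since the test file is not shown: if the last line of msgs has no trailing newline (easy to end up with when saving from an editor), read returns a non-zero status for that line and the loop body never runs for it. Below is a sketch of a loop that tolerates this, using the same parameter expansions as the scripts quoted above. The function name format_msgs and the sample input, which deliberately ends without a newline, are made up for illustration.

```shell
#!/bin/sh
# Sketch: the extra `|| [ -n "$line" ]` test also processes a final line
# that is not terminated by a newline.
format_msgs() {
    while IFS= read -r line || [ -n "$line" ]; do
        if [ "${line%%:*}" = "Date" ]; then
            dline=${line#*:}      # drop "Date:"
            dline=${dline%% *}    # keep the date part, e.g. 02.06.2015
            date=${dline%.*}      # drop the year
            time=${line##* }      # everything after the last space
            printf '%s\t%s\t\t' "$date" "$time"
        else
            printf '%s\n' "${line#*:}"   # drop "TEXT:"
        fi
    done
}

# The input below has no newline after the last TEXT: line
out=$(printf 'Date:02.06.2015 10.16.00\nTEXT:Its definitely there!!' | format_msgs)
printf '%s\n' "$out"
```

A quick way to check the guess is to add an empty line at the end of msgs and rerun the original script; if the last TEXT field then appears, the missing newline was the culprit.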