Puppy Linux Discussion Forum
Puppy HOME page : puppylinux.com
"THE" alternative forum : puppylinux.info
 
All times are UTC - 4
How to extract a sub-string in the middle of a main string?
musher0

Joined: 04 Jan 2009
Posts: 13160
Location: Gatineau (Qc), Canada

Posted: Sat 03 Mar 2018, 17:39    Post subject: How to extract a sub-string in the middle of a main string?
Subject description: Following up on a question BarryK asked a couple of years ago
 

Hello all.

As the title says.
I am creating a new thread on this subject because I cannot find BK's original.
(# Edit, an hour later: please see below, MochiMoppel found it.)

Barry asked an interesting question, because we often need to fish out some info
in the middle of a main string, and it seems there is no straightforward way to do it
in Bash.

I did provide a tentative solution at the time, based on the delimiter in Barry's
example, which was a double-colon, IIRC.
(# Edit, an hour later: Some other developers did as well.)

The following is NOT based on a delimiter, but please see the concluding note in
the example below.

Feedback welcome.

BFN

~~~~~~~~~~~~~~~~~~~~~
Code:

###############
# -- Trying to mimic "Intersection" in Boolean logic --
#
# (Please pardon my French, "Intersection" may have another name in English.
# I mean the C part when circles A and B overlap.)

x=2 ########################## Divisor that sets, roughly, how many char. we
Z="ba;be;bi;bo;bu;by" ######## get out of the middle of this string. (Acts as
############################## an "accordion".)
############################## Note 1) : x should not be 1.
############################## Note 2) : not considering delimiters. The
############################## string could as well be "Fish and potatoes",
############################## which has the same length.

a="${#Z}" #################### We fetch the length of the string.
echo $a

b="`echo "($a/$x)-1" | bc`" ## Start offset: the length of the string divided
############################## by x (integer division), minus one because
echo $b ###################### position one of the string in human terms is
############################## position zero as bash understands it.

c="`echo "$a-($x*$b)" | bc`" # This is the actual number of characters that
echo $c ###################### we will get from the middle of the string.

echo "${Z:$b:$c}" ############ If x=1 above, we get the last character

# b="`expr $b + $c`" ######### Same intersection starting from the end of the
# echo "${Z: -$b:$c}" ######## string, with the new $b to compensate for the
############################## backwards calculation.

Results:
17
7 (char. 8 in human terms)
3 (size of sub-string)
i;b (the contents in the middle of the string)

############################## About delimiters: to provide a more general
############################## solution, this script should contain a "delimiter
############################## detector". This was the idea in my 1st example;
############################## is the delimiter "b" or ";"?
############################## (With a view of confusing the script and the
############################## script writer!)
#
############################## Once we have this "delimiter selector" (ideally),
############################## the script could use awk or cut.
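
As an illustration of that last point, here is a minimal sketch of the cut approach, assuming the delimiter is already known and is a single character. The name middle_field is hypothetical, and taking field N/2+1 when the number of fields is even is an assumption made for this example only.

```shell
#!/bin/bash
# Hypothetical helper (not part of the script above): print the middle
# field of a string when the delimiter is known.
middle_field() {
    local str=$1 sep=$2
    local only=${str//[^"$sep"]/}       # keep only the delimiter characters
    local nfields=$(( ${#only} + 1 ))   # N delimiters separate N+1 fields
    local mid=$(( nfields / 2 + 1 ))    # assumption: upper middle if even
    printf '%s\n' "$str" | cut -d"$sep" -f"$mid"
}

middle_field "ba;be;bi;bo;bu;by" ";"    # prints: bo
```

With six fields the sketch returns the fourth one; a different rounding rule would return "bi" instead.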

_________________
musher0
~~~~~~~~~~
Je suis né pour aimer et non pas pour haïr. (Sophocle) /
I was born to love and not to hate. (Sophocles)

Last edited by musher0 on Sat 03 Mar 2018, 20:38; edited 1 time in total
MochiMoppel


Joined: 26 Jan 2011
Posts: 1713
Location: Japan

Posted: Sat 03 Mar 2018, 19:32

http://www.murga-linux.com/puppy/viewtopic.php?p=875403
musher0


Posted: Sat 03 Mar 2018, 20:33

Thanks for the reference, MochiMoppel.
MochiMoppel



Posted: Sun 04 Mar 2018, 07:22

Not sure if I comprehend what you are trying to achieve, but when I try x=3 it extracts "e;bi;", the bracketed part of
ba;b[e;bi;]bo;bu;by

Shouldn't it be the centred
ba;be;[bi;bo];bu;by ?

And if your intention is to use bash then there is no point in using bc:
Code:
b="`echo "($a/$x)-1" | bc`"
should be the same as the pure bash arithmetic
Code:
b=$((a/x-1))
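
Both points can be combined in a short sketch: a window of n characters centred in the string, computed with bash arithmetic only. This assumes that a centred window is what is wanted; middle_chars and its two parameters are hypothetical names, not taken from the script above.

```shell
#!/bin/bash
# Hypothetical helper: print n characters centred in a string,
# using only bash arithmetic (no bc, no expr).
middle_chars() {
    local str=$1 n=$2
    local len=${#str}
    if (( n > len )); then
        n=$len                          # never ask for more than there is
    fi
    # Equal margins on both sides; when they cannot be equal,
    # the left margin is one character shorter.
    local start=$(( (len - n) / 2 ))
    printf '%s\n' "${str:start:n}"
}

Z="ba;be;bi;bo;bu;by"
middle_chars "$Z" 3    # prints: i;b
middle_chars "$Z" 5    # prints: bi;bo
```

The caller asks for a number of characters directly, so the result stays centred whatever the size of the window.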
musher0


Posted: Sun 04 Mar 2018, 07:42

Thanks, MochiMoppel.

I'll give it another look.

~~~~~~~~~~~
@all:

The following could be a complement to the above, or be used separately. I think it is
fairly well commented, but if you have questions, please ask them.

I spent the night on this, so I am tired and I will not do much introduction at this point. It
will have to wait until I am rested.

Here goes.
Code:
#!/bin/bash
# /opt/local/bin/FishPotatoes.sh # Alpha version.
# (Or place this script in any "/bin" IN YOUR $PATH.)
#
# Goal: Find the delimiter in a text or csv file.
#
# Uses Puppy-provided utilities: awk, paste, seq, sort. (I.e., no outside dependencies.)
#
# Usage: 1) copy your *.txt or *.csv file to, or paste it into,
# a general file called "text" in /root;
# 2) run this script from terminal.
#
# Example: open a terminal, copy your *.txt or *.csv file to "text" and type
# < FishPotatoes.sh >. # (That's it!)
#
# IMPORTANT -- Maximum size for the input text or csv file : 2Mb.
# CAUTION -- This script looks somewhat complete, and it is, but it is not perfect.
# E.g., I still do not know why a "|" delimiter shows up in the list when there is none
# in the originating text or string.  It does the job; however I feel that it has not
# been tested enough. So... NO guarantees whatsoever.  You are more than
# welcome to do some tests with it and report back on the thread: TIA.
#
# © Christian L'Écuyer, Gatineau (Qc), Canada, 2018-03-04. GPL3.
# (Alias musher0 [forum Puppy].) #
#################   # https://opensource.org/licenses/GPL-3.0
#    This program is free software: you can redistribute it and/or modify it under the
#    terms of the GNU General Public License as published by the Free Software
#    Foundation, either version 3 of the License, or  (at your option) any later version.
#         This program is distributed in the hope that it will be useful, but WITHOUT ANY
#    WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
#    A PARTICULAR PURPOSE. See the GNU General Public License for more details.
#         You should have received a copy of the GNU General Public License along with
#   this program. If not, see <http://www.gnu.org/licenses/>.
##########
#   Ce programme est libre : vous pouvez le redistribuer ou modifier selon les termes
#   de la Licence Publique Générale GNU publiée par la Free Software Foundation (v. 3
#   ou toute version ultérieure choisie par vous).
#       Ce programme est distribué dans l'espoir qu'il sera utile, mais SANS AUCUNE
#   GARANTIE, ni explicite ni implicite, y compris des garanties de commercialisation
#   ou d'adaptation à un but spécifique. Pour plus de détails, veuillez vous reporter
#   au texte officiel de cette licence à https://opensource.org/licenses/GPL-3.0, à
#   http://www.linux-france.org/article/these/gpl.html pour une traduction et, pour une
#   explication en français, à https://fr.wikipedia.org/wiki/Licence_publique_générale_GNU.
################ # set -xe
# Input # Getting the text externally is more practical.
Text="";Text="`paste -sd'\0' ~/text`"  # Text="ba;be;bi;bo;bu;by" # For test.
echo -e "The text is:\n`cat ~/text`\n"  # echo -e "The text is:\n$Text" # For test.

# Process
delim="";delim=(';' ',' '|' '\t' ' ' ':' '-') ## Standard delimiters that one comes across in csv files.
Sommaire="";champ=0;fois=0
for i in `seq ${#delim[@]}`;do
     fois="`echo -e "$Text" | tr "${delim[$champ]}" "\n" | wc -l`" # Gives us N occurrences.
     fois="`expr $fois - 1`" # Removing the LF.
     Sommaire="$Sommaire$fois -${delim[$champ]}-\n" # Gathers data for the report.
     champ="`expr $champ + 1`" # We prepare to query the next delimiter.
done

# Report
echo -e "~~~~~~~~~~~~~~~~~~~~~~\n\nDelimiter statistics:\n$Sommaire"

Several="`echo -e "$Sommaire" | awk '$1 > 0 { print }' | wc -l`"
if [ "$Several" -gt "1" ];then # In case there is more than one delimiter in this text.
     echo -e "~~~~~~~~~~~~~~~~~~~~~~\n\nThere are several standard delimiters in this text:"
     echo -e "$Sommaire" | awk '$1 > 0 { print }' | sort -n -k 1 -r
fi

echo -e "\n~~~~~~~~~~~~~~~~~~~~~~\n\nThe main delimiter is:"
MainDlmtr="`echo -e "$Sommaire" | sort -n -k 1 -r | head -1 | cut -d" " -f2`" # Quoted, so the shell cannot split or glob the report.
if [ "${MainDlmtr}" = "-" ];then
     echo "$MainDlmtr -" # To make obvious the < space > delimiter.
else
     echo "$MainDlmtr"
fi
echo -e "\n~~~~~~~~~~~~~~~~~~~~~~" # set +xe
exit
### 30 ###

#######################################
# To bring a chuckle out of you (hopefully!), here is the text
# that I used for my main test.  The apparent "confusion" between
# punctuation marks and standard csv delimiters is intentional.
#
# In case you are wondering, this part does not need to be
# commented out because of the exit command above.
#
   -- MUSHER0'S DINER --

--- Today's Dinner Menu ---

Appetizer -- Your choice of
soup, or salad, or vegetable juice;

Main course -- Your choice of
Dad's generous steak and mashed potatoes
with pepper gravy, or Aunt Audette's
beautiful whitefish and carrots in lemon
sauce;

Dessert -- Your choice of
Cousin Norman's "Drip" (vanilla ice cream
with caramel sauce), a slice of Mom's tasty
fruit cake, or one of my succulent donuts in
maple syrup;

Beverage -- Your choice of
coffee, tea, or home-made spruce beer.

Please ask your waitress or waiter if you wish
to see our full menu.
#
# @Sailor Enceladus: I had fun writing it! ;)
#
# BTW, first dinner will be free for visiting Puppyists if ever I
# open that diner North of Lake Superior, near Montreal River
# and Highway 17. :-)
#######################################
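
For comparison, the counting step can also be done without tr, wc or expr. What follows is a minimal sketch, assuming bash and a text small enough to hold in a variable. The names find_main_delim, main_delim and main_count are hypothetical, and on a tie the first candidate in the list wins, which is an arbitrary choice.

```shell
#!/bin/bash
# Hypothetical sketch: count candidate delimiters with parameter expansion.
# Sets two globals: main_delim (most frequent candidate) and main_count.
find_main_delim() {
    local text=$1 d only
    # Same candidates as in the script above; $'\t' is a real tab.
    local candidates=(';' ',' '|' $'\t' ' ' ':' '-')
    main_delim=""
    main_count=0
    for d in "${candidates[@]}"; do
        only=${text//[^"$d"]/}          # delete everything that is not $d
        if (( ${#only} > main_count )); then
            main_count=${#only}
            main_delim=$d
        fi
    done
}

find_main_delim "AA;BB;CC;DD;EE"
printf 'Main delimiter: -%s- (%d occurrences)\n' "$main_delim" "$main_count"
# prints: Main delimiter: -;- (4 occurrences)
```

Newlines are never counted this way, so the text does not have to be joined into one line first.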


BFN.
[Attached image: Diner-Menu.txt.jpg. Description: The top is missing, but I am sure that you get the idea.]


musher0


Posted: Wed 07 Mar 2018, 05:49

Hi.

This is where I'm at.

This version provides the true middle of the text and the middle of the lines.
(Please see attached illustration.)

It works fine with txt and csv files, as far as I can tell. However, I feel that the results
are not reliable with quasi-text file types such as xml and sh files.

Feedback welcome.

BFN.
~~~~~~~~~~~~~~~~~~~~~~~~
Code:
#!/bin/bash
# /opt/local/bin/FishInTheMiddle.sh # (Or place this
# script in any "/bin" directory in your $PATH.)
#
# Goal -- Find the delimiter in a text or csv file and show the middle field(s)
# -- of the entire text and of the individual lines.
#
# Uses Puppy-provided utilities: awk, paste, seq, sort. (I.e., no outside dependencies.)
#
# Usage -- In terminal, type < FishInTheMiddle.sh filename >,
# or just < FishInTheMiddle.sh >, and answer the prompt.
#
# Limitations at this time: plain *.txt and *.csv files; not reliable for *.xml or *.sh files.
# (Other quasi-text file types may also display strange results.)
#
# IMPORTANT -- Maximum size of the input file: 2Mb.
#
# © Christian L'Écuyer, Gatineau (Qc), Canada, 2018-03-04 and 07. GPL3.
# (Alias musher0 [forum Puppy].) #
#################   # https://opensource.org/licenses/GPL-3.0
#    This program is free software: you can redistribute it and/or modify it under the
#    terms of the GNU General Public License as published by the Free Software
#    Foundation, either version 3 of the License, or  (at your option) any later version.
#         This program is distributed in the hope that it will be useful, but WITHOUT ANY
#    WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
#    A PARTICULAR PURPOSE. See the GNU General Public License for more details.
#         You should have received a copy of the GNU General Public License along with
#   this program. If not, see <http://www.gnu.org/licenses/>.
##########
#   Ce programme est libre : vous pouvez le redistribuer ou modifier selon les termes
#   de la Licence Publique Générale GNU publiée par la Free Software Foundation (v. 3
#   ou toute version ultérieure choisie par vous).
#       Ce programme est distribué dans l'espoir qu'il sera utile, mais SANS AUCUNE
#   GARANTIE, ni explicite ni implicite, y compris des garanties de commercialisation
#   ou d'adaptation à un but spécifique. Pour plus de détails, veuillez vous reporter
#   au texte officiel de cette licence à https://opensource.org/licenses/GPL-3.0, à
#   http://www.linux-france.org/article/these/gpl.html pour une traduction et, pour une
#   explication en français, à https://fr.wikipedia.org/wiki/Licence_publique_générale_GNU.
################ # set -xe
Text="$1"
if [ "$Text" = "" ];then
     echo -e "\n\t\e[36m\e[4m\e[1mPlease type the filename of the text that\e[24m
\t \e[4myou wish to examine. Type the full path
\t\t\e[4mif not in this directory.\e[0m"
     read Text
fi
if [ ! -f "$Text" ];then
     echo -e "\n\t\e[1m\e[5m\e[7m\e[31mFile $Text does not exist. Please retry.\e[0m\n"
     sleep 5;clear
     exit
fi
echo

Pasted="`paste -sd'\0' "$Text"`"  # Pasted="ba;be;bi;bo;bu;by" # For test.
echo -e "The text is:\n"
more "$Text"
sleep 2s

# Process
delim="";delim=(';' ',' '|' ' ' ':' '-') ## Standard delimiters that one comes across in csv files.
Sommaire="";champ=0;fois=0
for i in `seq ${#delim[@]}`;do
     fois="`echo -e "$Pasted" | tr "${delim[$champ]}" "\n" | wc -l`" # Gives us N occurrences.
     fois="`expr $fois - 1`" # Removing the LF.
     Sommaire="$Sommaire$fois -${delim[$champ]}-\n" # Gathers data for the report.
     champ="`expr $champ + 1`" # To query the next delimiter.
done

# Report
echo -e "~~~~~~~~~~~~~~~~~~~~~~\n\nDelimiter statistics:\n$Sommaire"

Several="`echo -e "$Sommaire" | awk '$1 > 0 { print }' | wc -l`"
if [ "$Several" -gt "1" ];then # In case there is more than one delimiter in this text.
     echo -e "~~~~~~~~~~~~~~~~~~~~~~\n\nThere are several standard delimiters in this text:"
     echo -e "$Sommaire" | awk '$1 > 0 { print }' | sort -n -k 1 -r
fi

echo -e "\n~~~~~~~~~~~~~~~~~~~~~~\n\nThe main delimiter is:"
MainDlmtr="`echo -e "$Sommaire" | sort -n -k 1 -r | head -1 | cut -d" " -f2`" # Quoted, so the shell cannot split or glob the report.
if [ "${MainDlmtr}" = "-" ];then
     MainDlmtr="- -"
     echo "$MainDlmtr" # To make obvious the < space > delimiter.
else
     echo "$MainDlmtr"
fi
echo -e "\n~~~~~~~~~~~~~~~~~~~~~~"
sep="${MainDlmtr:1:1}"
Fish="`echo "$Pasted" | awk -F"$sep" '{print $(int(NF/2)+1)}'`" # int() makes the field number a whole number.
MidLine="`awk -F"$sep" '{print "\t"$(int(NF/2)+1)}' "$Text"`"
echo -e "\nThe field in the middle of the text is:\n\n\t$Fish"
echo -e "\n~~~~~~~~~~~~~~~~~~~~~~\n"
if [ "`wc -l < "$Text"`" -gt "1" ];then
     echo -e "The fields in the middle of each line are:\n\n$MidLine"
     echo -e "\n~~~~~~~~~~~~~~~~~~~~~~\n"
fi # set +xe
exit
### 30 ###

~~~~~~~~~~~~~~~~~~~~~~~~

The test texts were:
Quote:
AA;BB;CC;DD;EE
---;---;---;---;---
ba;ze;ci;xo;du
bark;meow;howl;buzz;snort
along with the "restaurant menu" previously provided.
[Attached image: Where-Im-at.jpg]


slavvo67

Joined: 12 Oct 2012
Posts: 1574
Location: The other Mr. 305

Posted: Wed 07 Mar 2018, 11:30

@Musher0

Interesting. I never thought to have the computer search for delimiters. Generally, I know them going in, but it is an interesting concept for sure.

Thanks,

Slavvo67
musher0


Posted: Wed 07 Mar 2018, 12:26

Hi slavvo67.

You're welcome.

Like you, I generally know what the delimiter is in a csv file. And if I don't, I can screen
the file with cat or more to find out.

The goal of these two scripts is to try to discover the "middle" of a text file or string
automatically. We have the head and tail utilities for tops and bottoms, but nothing for
middles, AFAIK.
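
For whole lines at least, a "middle" can be pieced together from wc and sed. A minimal sketch, assuming a text file that ends with a newline; middle_line is a hypothetical name, and picking the upper of the two middle lines when the count is even is an arbitrary choice.

```shell
#!/bin/bash
# Hypothetical helper: print the middle line of a file, a counterpart to
# head and tail. With an even line count, the upper middle line is printed.
middle_line() {
    local file=$1
    local total mid
    total=$(wc -l < "$file")
    (( total > 0 )) || return 1         # nothing to print for an empty file
    mid=$(( (total + 1) / 2 ))
    sed -n "${mid}{p;q;}" "$file"       # print line $mid, then stop reading
}

tmp=$(mktemp)
printf '%s\n' "AA;BB;CC;DD;EE" "---;---;---;---;---" \
    "ba;ze;ci;xo;du" "bark;meow;howl;buzz;snort" > "$tmp"
middle_line "$tmp"                      # prints: ---;---;---;---;---
rm -f "$tmp"
```

The same line could be reached with head piped into tail; sed is used here only because it stops reading as soon as the line has been printed.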

Now, to get to the middle, one method is to know the delimiter, so that the text can be
parsed coherently. This second script does that, and it turns out to be quite precise.
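
Reduced to its core, the delimiter-based method looks like the sketch below. It assumes the delimiter is already known and is a single character; middle_fields is a hypothetical name, and int() is used so that the field number is always a whole number.

```shell
#!/bin/bash
# Hypothetical helper: print the middle field of every non-empty line,
# given a known single-character delimiter. Reads files, or stdin if none.
middle_fields() {
    local sep=$1
    shift
    awk -F"$sep" 'NF { print $(int(NF / 2) + 1) }' "$@"
}

printf '%s\n' "AA;BB;CC;DD;EE" "ba;ze;ci;xo;du" "bark;meow;howl;buzz;snort" |
    middle_fields ";"
# prints CC, ci and howl, one per line
```

Note that awk treats a single space as "any run of blanks", so results for space-delimited text can differ from what cut would give.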

The first script tried a different method, based on the length of the text, plus the
proportion (or fraction) of the middle that the user wants to see. That first script worked
like an accordion, an elastic or a small window relative to the entire text. Less precise, but
for running texts (not databases), this "elastic" approach may perhaps produce more
"meaning".

I'm still not sure if these scripts will be useful. For now I consider them "Studies".
(Chopin wrote more beautiful ones, I know!!!)

BFN.

slavvo67


Posted: Wed 07 Mar 2018, 13:00

Really cool stuff. It's funny because I see from the past that you and I look at very similar things, like file conversions, for example, and how to make them work well in Puppy or in Linux in general.

I like seeing your work. Thanks...

Slavvo67