Split Index and File Name from string?

amigo
Posts: 2629
Joined: Mon 02 Apr 2007, 06:52

#16 Post by amigo »

Offhand I don't even remember what this bash-centric feature is called! :oops: Maybe it's 'variable substitution'.

But this one means "everything(*) to the left of the first '-'":
INDEX=${OUTSTR%%-*}

And this one means "everything to the right of the first '-'":
FILENAME=${OUTSTR#*-}

You can play with it in a terminal to understand it better:

Code: Select all

TEST=59170-LazY-Fred-English-Locals.tar.gz
echo ${TEST%%-*}
echo ${TEST#*-}
Then vary that like this:

Code: Select all

echo ${TEST%-*}
echo ${TEST##*-}
Using one or two '#' characters strips the match from the left (the beginning) of the string. Using one or two '%' chars strips it from the right (the end). Using double characters means the longest match is removed; using a single char removes the shortest match.
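
For reference, here is a small sketch of all four forms run against the same TEST value, with the result of each noted in a comment. The names first_field, rest, all_but_last and last_field are only illustrative and do not come from the posts above.

```bash
TEST=59170-LazY-Fred-English-Locals.tar.gz

first_field=${TEST%%-*}   # longest "-*" removed from the end    -> 59170
rest=${TEST#*-}           # shortest "*-" removed from the start -> LazY-Fred-English-Locals.tar.gz
all_but_last=${TEST%-*}   # shortest "-*" removed from the end   -> 59170-LazY-Fred-English
last_field=${TEST##*-}    # longest "*-" removed from the start  -> Locals.tar.gz

printf '%s\n' "$first_field" "$rest" "$all_but_last" "$last_field"
```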

Anyway, a little trick like that can make your code run hundreds of times faster than some multi-command pipeline. For a single instance you'd not notice the difference, but if that line is being used inside a loop which runs many times, then the difference can be dramatic.
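
To make the loop case concrete, here is a minimal sketch that splits each line of some made-up sample input without starting any external command. The sample lines, and the count and names variables, are assumptions for illustration only. The pipeline equivalent would run something like echo piped into cut twice per line, and that process startup is where the time goes.

```bash
count=0
names=""

# Each input line is assumed to have the form INDEX-FILENAME.
while IFS= read -r OUTSTR; do
    INDEX=${OUTSTR%%-*}      # everything left of the first '-'
    FILENAME=${OUTSTR#*-}    # everything right of the first '-'
    count=$((count + 1))
    names="$names$INDEX:$FILENAME "
done <<'EOF'
59170-LazY-Fred-English-Locals.tar.gz
1234-attached.file.pet
EOF

printf '%s lines: %s\n' "$count" "$names"
```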

Karl Godt
Posts: 4199
Joined: Sun 20 Jun 2010, 13:52
Location: Kiel,Germany

#17 Post by Karl Godt »

There are many possibilities using cut, awk, sed, grep. Since cut is already known in combination with echo, here are the others:

Code: Select all

STRING="1234-\"attached.file.pet\""
awk :

Code: Select all

echo "$STRING" |awk -F '"' '{print "\""$2"\""}'
sed :

Code: Select all

echo "$STRING" |sed 's:.*-::'
grep :

Code: Select all

echo "$STRING" |grep -o '".*"'
*

On a large database file :

Code: Select all

awk -F '"' '{print "\""$2"\""}' /database_file.db >/newfile.filenames-only.awk.list

Code: Select all

sed 's:.*-::' /database_file.db >/newfile.filenames-only.sed.list

Code: Select all

grep -o '".*"' /database_file.db >/newfile.filenames-only.grep.list
might be faster than a loop .
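
One caveat with the sed form, sketched below on a made-up sample entry: `.*-` is greedy, so it strips everything up to the last '-'. That matters when the file name itself contains dashes. Matching `[^-]*-` instead stops at the first '-'. The SAMPLE value and the variable names are hypothetical.

```bash
SAMPLE='59170-LazY-Fred.tar.gz'   # hypothetical entry whose file name contains a '-'

greedy=$(printf '%s\n' "$SAMPLE" | sed 's:.*-::')          # -> Fred.tar.gz
first_only=$(printf '%s\n' "$SAMPLE" | sed 's:[^-]*-::')   # -> LazY-Fred.tar.gz

printf '%s\n' "$greedy" "$first_only"
```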

http://www.grymoire.com/Unix/Sed.html is the reference I download whenever I have sed questions.
http://www.grymoire.com/Unix/Awk.html is, in my impression, not as good as the sed tutorial.
I still have to check out http://www.grymoire.com/Unix/Grep.html.

Googling for "$COMMAND tutorial" brings up a lot of stuff.

*

Other chars for shell variable substitution are '/' and '//' (search and replace) and, since bash-4, '^' and ',' plus '^^' and ',,' (case conversion):

Code: Select all

STRING="ABCDCEFG-123.456.tar"
echo "${STRING//C/cccCccc}"
echo "${STRING/C/cccCccc}"

Code: Select all

echo "${STRING,C}"
Learn here :
http://www.gnu.org/software/bash/manual ... -Expansion
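
As a rough sketch of what these operators produce (this assumes bash 4 or newer; the variable names are only illustrative): the single ',' form tests just the first character of the value, so `${STRING,C}` leaves this particular STRING unchanged because it starts with 'A'.

```bash
STRING="ABCDCEFG-123.456.tar"

replace_all=${STRING//C/cccCccc}    # every C replaced -> ABcccCcccDcccCcccEFG-123.456.tar
replace_first=${STRING/C/cccCccc}   # first C only     -> ABcccCcccDCEFG-123.456.tar
first_char=${STRING,C}              # first character is 'A', not 'C', so nothing changes
all_c_lower=${STRING,,C}            # every C lowercased -> ABcDcEFG-123.456.tar
all_lower=${STRING,,}               # everything lowercased -> abcdcefg-123.456.tar

printf '%s\n' "$replace_all" "$replace_first" "$first_char" "$all_c_lower" "$all_lower"
```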

Bruce B

#18 Post by Bruce B »

You can mostly automate this by downloading the page as html only. Then run the script against the page(s). And have that script download the attachments with both numbers and filenames.

Is this even what you are thinking about?

~

seaside
Posts: 934
Joined: Thu 12 Apr 2007, 00:19

#19 Post by seaside »

amigo wrote:Offhand I don't even remember what this bash-centric feature is called! :oops: Maybe it's 'variable substitution'.

................................

Anyway, a little trick like that can make your code run hundreds of times faster than some multi-command pipeline. For a single instance you'd not notice the difference, but if that line is being used inside a loop which runs many times, then the difference can be dramatic.
amigo,

Yes, I spent time avoiding bash string manipulations. When I read the explanations I thought I understood how they worked, but later, when I went to use them, I'd draw a blank and have to play extensively with the expressions in a terminal to get them right. That lasted until this change in thought process......

Think of what you don't want in the string (what you want to cut out)

#  = left side (beginning) of string, first instance encountered
## = left side (beginning) of string, last instance encountered
%  = right side (end) of string, first instance encountered
%% = right side (end) of string, last instance encountered

str="don't do what I do, do what I say"
eliminate 'don't do' from the left (return "what I do, do what I say")
${str#*do }
Starting from the left (#), eliminate all chars (*) up to and including the first "do " encountered. Notice the space after do: if it's not there, the expansion only eliminates the first "do" in "don't", leaving "n't do what I do.....".

eliminate "do what I say" from the right
${str%do*}
Starting from the right (%), eliminate all chars (*) back to and including the first "do" encountered.

In addition to speed and efficiency, another advantage over "cut" is that you can use more than one char as a delimiter.
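
Here is a short sketch of the examples above, plus one with a two-character delimiter. The rec value and the result variable names are made up for illustration.

```bash
str="don't do what I do, do what I say"

a=${str#*do }    # -> what I do, do what I say
b=${str#*do}     # -> n't do what I do, do what I say
c=${str%do*}     # -> "don't do what I do, " (note the trailing space)
d=${str##*do }   # longest match from the left -> what I say

rec='alpha::beta::gamma'   # hypothetical record using "::" as the delimiter
first=${rec%%::*}          # -> alpha
remainder=${rec#*::}       # -> beta::gamma

printf '[%s]\n' "$a" "$b" "$c" "$d" "$first" "$remainder"
```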

Regards,
s

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO
Contact:

#20 Post by technosaurus »

stu91 wrote:
amigo wrote:All these loops and extraneous use of echo, sed and cut... Bash makes this very simple:

Code: Select all

INDEX=${OUTSTR%%-*}
FILENAME=${OUTSTR#*-}
Hi Amigo,
Could you give a breakdown of what the different characters represent in your code, or any links etc. that might expand on such code further?

Thanks in advance.
see "substring manipulation" in the advanced bash scripting guide.
Check out my github repositories (https://github.com/technosaurus). I may eventually get around to updating my blogspot (http://bashismal.blogspot.com).

Moose On The Loose
Posts: 965
Joined: Thu 24 Feb 2011, 14:54

#21 Post by Moose On The Loose »

technosaurus wrote:
stu91 wrote:
amigo wrote:All these loops and extraneous use of echo, sed and cut... Bash makes this very simple:

Code: Select all

INDEX=${OUTSTR%%-*}
FILENAME=${OUTSTR#*-}
see "substring manipulation" in the advanced bash scripting guide.
Just as a BTW, a trick I use fairly often is to make the string I am working on contain spaces at key points and then do:

Code: Select all

MyFunctionName  $TheStringInQuestion
This way all the bits of the string come into the function as $1 ...

Inside the function I can then do all manner of stuff to the string. This is good when what you want to do is more complex than the example here because it makes it easier to document what is done.
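
A minimal sketch of that idea, assuming the string from earlier in the thread: show_parts is a hypothetical function name, and the space is put in place of the first '-' using the expansions discussed above. The expansion is left unquoted on purpose so the shell splits it into words. This also means a name containing spaces or wildcard characters would be split or expanded further, so it only suits input you control.

```bash
# show_parts is a hypothetical function; it just records its first two arguments.
show_parts() {
    index=$1
    filename=$2
}

raw=59170-LazY-Fred-English-Locals.tar.gz
spaced="${raw%%-*} ${raw#*-}"   # put a space where the first '-' was

# Unquoted on purpose: word splitting turns the two parts into $1 and $2.
show_parts $spaced
printf '%s | %s\n' "$index" "$filename"
```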
