Split Index and File Name from string?

RSH · #1 Post by **RSH** » Sat 08 Sep 2012, 03:51

Hi.

I have list files containing strings like: 59170-"LazY-Fred-English-Locals.tar.gz" which is the last entry in one list file.

This command COUNT=$((`tail -1 $INDEXFILE | cut -d "-" -f1`)) gives me the 59170 as a result in $COUNT - coming from the last entry of the list file.

Let's say i have a single string (no file) 59170-"LazY-Fred-English-Locals.tar.gz" in $OUTSTR

What do i have to change in COUNT=$((`tail -1 $INDEXFILE | cut -d "-" -f1`)) to get the Index and File Name (without the double quotes) as a result in $INDEX and $FILENAME?

INDEX= ? ? ?
FILENAME= ? ? ?

Thanks

RSH

Bruce B · #2 Post by **Bruce B** » Sat 08 Sep 2012, 05:48

It could be easier if all your 'indexes' were 5 digits like in the example above. Are they? If so, then something like this:

Code: Select all

STRING="59170-LazY-Fred-English-Locals.tar.gz"

# strip the five digits and the '-' (six characters in total)
FILENAME=`echo $STRING | sed 's/^......//'`

COUNT=`echo $STRING | cut -d - -f 1`

#test content of variables

echo \$STRING = $STRING
echo \$FILENAME = $FILENAME
echo \$COUNT = $COUNT

testing outputs
$STRING = 59170-LazY-Fred-English-Locals.tar.gz
$FILENAME = LazY-Fred-English-Locals.tar.gz
$COUNT = 59170

RSH · #3 Post by **RSH** » Sat 08 Sep 2012, 05:58

Bruce B wrote:It could be easier if all your 'indexes' were 5 digits like in the example above. Are they?

~

Unfortunately not.

Is there any bash function that gives me the position of - ? I could do the rest using string functions that I know and have already used. I just need to find the - ! I could do it in a loop, but I thought there would be an easier way to do that.

Bruce B · #4 Post by **Bruce B** » Sat 08 Sep 2012, 06:11

I think a loop would be helpful

Please post about 10 or more actual filenames

~

RSH · #5 Post by **RSH** » Sat 08 Sep 2012, 06:52

Bruce B wrote:I think a loop would be helpful

Please post about 10 or more actual filenames

~

Hi Bruce B.

Thanks for your code example. Please take the files below. It would be nice to have a smarter solution.

How I solved this:

Code: Select all

# Note: the shell removes the inner double quotes, so INDEXSTR holds 108-grubfd01.zip
INDEXSTR="108-"grubfd01.zip""
#INDEXSTR="1168-"FreeSans.zip""
#INDEXSTR="58992-"LazY-FReD-1.0.2.sfs.gz""
#INDEXSTR="58993-"LazY-FReD-1.0.2.pet""
#INDEXSTR="59170-"LazY-Fred-English-Locals.tar.gz""
nlen=${#INDEXSTR}
echo $nlen

doloop="true"
start=0
while $doloop; do
	s0=${INDEXSTR:$start:1}
	echo $s0
	if [ "$s0" = "-" ]; then
		doloop="false"
	fi
	((start++))
	echo $start
done

s1=${INDEXSTR:0:$start-1}
s2=${INDEXSTR:$start-1:$nlen-$start+1}
echo $s2
nlen=${#s2}
echo $nlen
s3=${s2:1:$nlen-1}
echo $s1
echo $s3

I wonder whether this code could be turned into a function in a separate script that is called from another script and returns $s1 and $s3 ---> or similarly $COUNT and $FILENAME?
RSH
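
The position that the loop above searches for can also be derived without iterating. A minimal sketch, assuming the string always contains at least one '-': cut everything from the first '-' onward and take the length of what remains. The names prefix and pos are only illustrative.

```shell
# Single quotes keep the literal double quotes, as in the index file.
INDEXSTR='108-"grubfd01.zip"'

# Remove the longest suffix that starts with '-'; what is left is the index.
prefix=${INDEXSTR%%-*}

# The length of that prefix is the zero-based position of the first '-'.
pos=${#prefix}

echo "$prefix"   # 108
echo "$pos"      # 3
```

With pos known, the existing ${INDEXSTR:offset:length} style of slicing can take over for the rest of the string.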

Bruce B · #6 Post by **Bruce B** » Sat 08 Sep 2012, 07:28

What the files have in common is the - after the numbers.

To get just the numbers you could use cut -d - -f 1

Bruce B · #7 Post by **Bruce B** » Sat 08 Sep 2012, 07:36

I don't understand what you are trying to accomplish; if I knew, it would be most helpful.

BTW, if you want to remove the quotes from the file name in the variable, I have a better idea: don't use quotes, special characters, or spaces in file names. I don't, and it makes writing scripts to handle files much easier.

~

RSH · #8 Post by **RSH** » Sat 08 Sep 2012, 07:50

I am working on a program that can download the attached files from the murga forum, selectable from a list in a GUI. Therefore I read out the database using attach&id, which is in each download link of each attached file.

After building the index file I get the listed names as a combination of the index and the file name, formatted as shown: 108-"grubfd01.zip"

To download the file I need its index number. But the file is downloaded as viewtopic.php?mode=attach&id=108, which gives me absolutely no information about what file type it is. Therefore I need the file name. After downloading as viewtopic.php?mode=attach&id=108 I move the file and give it a new name ---> the file name.
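
The workflow described here could be sketched roughly as follows. This is only an illustration: the base URL is an assumption pieced together from links in this thread, the sample index file is made up from names posted earlier, the function name make_commands is hypothetical, and the script prints the wget commands instead of running them.

```shell
# Assumed base URL; adjust to the real forum address.
BASEURL='http://www.murga-linux.com/puppy/viewtopic.php?mode=attach&id='

# Hypothetical index file with entries formatted like 108-"grubfd01.zip"
INDEXFILE=$(mktemp)
cat > "$INDEXFILE" <<'EOF'
108-"grubfd01.zip"
59170-"LazY-Fred-English-Locals.tar.gz"
EOF

# Print one download command per entry, saving under the real file name.
make_commands() {
    while IFS= read -r line; do
        index=${line%%-*}        # digits before the first '-'
        name=${line#*-}          # everything after the first '-'
        name=${name#\"}          # drop the leading double quote
        name=${name%\"}          # drop the trailing double quote
        printf 'wget -O "%s" "%s%s"\n' "$name" "$BASEURL" "$index"
    done < "$1"
}

COMMANDS=$(make_commands "$INDEXFILE")
echo "$COMMANDS"
rm -f "$INDEXFILE"
```

Saving directly under the final name with wget -O would make the separate move/rename step unnecessary.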

This is the Project of it all.

RSH · #9 Post by **RSH** » Sat 08 Sep 2012, 09:05

And this is the Application

SFR · #10 Post by **SFR** » Sat 08 Sep 2012, 10:12

RSH wrote:
Bruce B wrote:I think a loop would be helpful

Please post about 10 or more actual filenames

~
Hi Bruce B.

Thanks for your code example. Please take the files below. Would be nice, to have a smarter solution.

Maybe something like this:

Code: Select all

#!/bin/bash

# Single quotes keep the literal double quotes of the index entries
INDEXSTR='108-"grubfd01.zip"'
#INDEXSTR='1168-"FreeSans.zip"'
#INDEXSTR='58992-"LazY-FReD-1.0.2.sfs.gz"'
#INDEXSTR='58993-"LazY-FReD-1.0.2.pet"'
#INDEXSTR='59170-"LazY-Fred-English-Locals.tar.gz"'

COUNT=`echo "$INDEXSTR" | cut -d '-' -f1`
# -f2- keeps every field after the first '-'; tr removes the double quotes
FILENAME=`echo "$INDEXSTR" | cut -d '-' -f2- | tr -d '"'`

echo "$INDEXSTR"
echo "$COUNT"
echo "$FILENAME"

RSH wrote:I wonder, if it could be done to make this shown code a function in a single script that would be called from another script and would return $s1 and $s3 ---> or similar $COUNT and $FILENAME ?

As far as I know there's no simple way to communicate between scripts.
You can use a temporary file (script1 writes to it -> script2 reads from it) or you can try this:
http://www.murga-linux.com/puppy/viewtopic.php?t=75778
I have never tried it, however, so I don't know exactly how to use it.

Greetings!

rcrsn51 · #11 Post by **rcrsn51** » Sat 08 Sep 2012, 15:14

RSH wrote:I wonder, if it could be done to make this shown code a function in a single script that would be called from another script and would return $s1 and $s3 ---> or similar $COUNT and $FILENAME ?
RSH

If script1 sends its data to stdout using echo statements, then script2 can retrieve it with code like

Code: Select all

OUT=$(script1)
COUNT=$(echo $OUT | cut -d " " -f 1)
FILENAME=$(echo $OUT | cut -d " " -f 2)
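
A sketch of this idea in a single file, with a shell function standing in for script1 (the name split_entry is hypothetical). It prints the index and the file name on separate lines, so that a file name containing spaces would survive, and the caller picks the two values up with read.

```shell
# Stand-in for script1: prints the index, then the file name without quotes.
split_entry() {
    entry=$1
    name=${entry#*-}     # everything after the first '-'
    name=${name#\"}      # drop the leading double quote
    name=${name%\"}      # drop the trailing double quote
    printf '%s\n%s\n' "${entry%%-*}" "$name"
}

# Stand-in for script2: capture the output and read one value per line.
OUT=$(split_entry '59170-"LazY-Fred-English-Locals.tar.gz"')
{
    IFS= read -r COUNT
    IFS= read -r FILENAME
} <<EOF
$OUT
EOF

echo "$COUNT"
echo "$FILENAME"
```

If the function body lived in its own script, the call inside $( ) would simply name that script instead of the function.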

amigo · #12 Post by **amigo** » Sat 08 Sep 2012, 15:33

All these loops and extraneous use of echo, sed and cut... Bash makes this very simple:

Code: Select all

INDEX=${OUTSTR%%-*}
FILENAME=${OUTSTR#*-}
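
Since the entries in the list file carry double quotes around the name, one more step is needed to get the bare file name. A minimal sketch, assuming the name is always wrapped in exactly one pair of double quotes:

```shell
OUTSTR='59170-"LazY-Fred-English-Locals.tar.gz"'

INDEX=${OUTSTR%%-*}        # longest match of -* cut from the end
FILENAME=${OUTSTR#*-}      # shortest match of *- cut from the start
FILENAME=${FILENAME#\"}    # drop the leading double quote
FILENAME=${FILENAME%\"}    # drop the trailing double quote

echo "$INDEX"       # 59170
echo "$FILENAME"    # LazY-Fred-English-Locals.tar.gz
```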

RSH · #13 Post by **RSH** » Sat 08 Sep 2012, 15:54

amigo wrote:All these loops and extraneous use of echo, sed and cut... Bash makes this very simple:
Code: Select all
INDEX=${OUTSTR%%-*}
FILENAME=${OUTSTR#*-}

Thanks amigo.

Looks like this could be what i'm looking for. Will test it later.

RSH

rcrsn51 · #14 Post by **rcrsn51** » Sat 08 Sep 2012, 16:04

[Edit] My mistake.

stu91 · #15 Post by **stu91** » Sat 08 Sep 2012, 17:03

amigo wrote:All these loops and extraneous use of echo, sed and cut... Bash makes this very simple:
Code: Select all
INDEX=${OUTSTR%%-*}
FILENAME=${OUTSTR#*-}

Hi Amigo,
Could you give a breakdown on what the different characters represent in your code - or any links etc that might expand on such code further.

Thanks in advance.

amigo · #16 Post by **amigo** » Sat 08 Sep 2012, 17:16

Offhand I don't even remember what this bash-centric feature is called!

Maybe it's 'variable substitution'.

But this one means "everything(*) to the left of the first '-'":
INDEX=${OUTSTR%%-*}

And this one means "everything to the right of the first '-'":
FILENAME=${OUTSTR#*-}

You can play with it in a terminal to understand it better:

Code: Select all

TEST=59170-LazY-Fred-English-Locals.tar.gz
echo ${TEST%%-*}
echo ${TEST#*-}

Then vary that like this:

Code: Select all

echo ${TEST%-*}
echo ${TEST##*-}

Using one or two '#' characters removes a matching pattern from the left end of the string. Using one or two '%' chars removes it from the right end. Using double characters removes the longest match, while a single char removes the shortest match.
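
The four forms side by side on the same TEST string, with the result of each shown as a comment (the variable names A to D are only illustrative):

```shell
TEST=59170-LazY-Fred-English-Locals.tar.gz

A=${TEST#*-}     # shortest match cut from the left:  LazY-Fred-English-Locals.tar.gz
B=${TEST##*-}    # longest match cut from the left:   Locals.tar.gz
C=${TEST%-*}     # shortest match cut from the right: 59170-LazY-Fred-English
D=${TEST%%-*}    # longest match cut from the right:  59170

printf '%s\n' "$A" "$B" "$C" "$D"
```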

Anyway, a little trick like that can make your code run hundreds of times faster than some multi-command pipeline. For a single instance you'd not notice the difference, but if that line is being used inside a loop which runs many times, then the difference can be dramatic.

Karl Godt · #17 Post by **Karl Godt** » Sun 09 Sep 2012, 02:35

There are many possibilities using cut, awk, sed, and grep. Since cut in combination with echo has already been shown,

here are the others:

Code: Select all

STRING="1234-\"attached.file.pet\""

awk :

Code: Select all

echo "$STRING" |awk -F '"' '{print "\""$2"\""}'

sed :

Code: Select all

# match only up to the first '-', so hyphens inside the file name survive
echo "$STRING" |sed 's:^[^-]*-::'

grep :

Code: Select all

echo "$STRING" |grep -o '".*"'

*

On a large database file :

Code: Select all

awk -F '"' '{print "\""$2"\""}' /database_file.db >/newfile.filenames-only.awk.list

Code: Select all

sed 's:^[^-]*-::' /database_file.db >/newfile.filenames-only.sed.list

Code: Select all

grep -o '".*"' /database_file.db >/newfile.filenames-only.grep.list

might be faster than a loop .
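
If both the index and the bare file name are wanted from a large list in one pass, awk can do the whole split. A sketch, assuming one entry per line in the form 108-"grubfd01.zip"; the temporary file here is only a stand-in for the real database file.

```shell
# Hypothetical sample data standing in for the database file.
DB=$(mktemp)
cat > "$DB" <<'EOF'
108-"grubfd01.zip"
58992-"LazY-FReD-1.0.2.sfs.gz"
EOF

# Split each line at the first '-', strip the double quotes from the name,
# and print index and name separated by a tab.
RESULT=$(awk '{
    pos = index($0, "-")
    name = substr($0, pos + 1)
    gsub(/"/, "", name)
    print substr($0, 1, pos - 1) "\t" name
}' "$DB")

echo "$RESULT"
rm -f "$DB"
```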

http://www.grymoire.com/Unix/Sed.html is the reference I download whenever I have sed questions.
http://www.grymoire.com/Unix/Awk.html is, in my impression, not as good as the sed tutorial.
I still have to check out
http://www.grymoire.com/Unix/Grep.html .

Googling for "$COMMAND tutorial" brings up a lot of stuff.

*

Other chars for shell variable substitution are '/' and '//' and, since bash 4, '^' and ',' as well as '^^' and ',,':

Code: Select all

STRING="ABCDCEFG-123.456.tar"
echo "${STRING//C/cccCccc}"
echo "${STRING/C/cccCccc}"

Code: Select all

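echo "${STRING,C}"    # no change here: only the first character is tested, and it is 'A'
echo "${STRING,,C}"   # every 'C' in the string is converted to lowercase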

Learn here :
http://www.gnu.org/software/bash/manual ... -Expansion
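
The / and // forms applied to the same STRING, plus one use that fits this thread: deleting the double quotes from a file name. This sketch is bash-specific; the names FIRST, ALL, NAME and BARE are only illustrative.

```shell
STRING="ABCDCEFG-123.456.tar"

FIRST=${STRING/C/cccCccc}    # only the first C is replaced
ALL=${STRING//C/cccCccc}     # every C is replaced

NAME='"grubfd01.zip"'
BARE=${NAME//\"/}            # an empty replacement deletes every double quote

echo "$FIRST"    # ABcccCcccDCEFG-123.456.tar
echo "$ALL"      # ABcccCcccDcccCcccEFG-123.456.tar
echo "$BARE"     # grubfd01.zip
```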

Bruce B · #18 Post by **Bruce B** » Sun 09 Sep 2012, 06:10

You can mostly automate this by downloading the page as html only. Then run the script against the page(s). And have that script download the attachments with both numbers and filenames.

Is this even what you are thinking about?

~

seaside · #19 Post by **seaside** » Sun 09 Sep 2012, 16:41

amigo wrote:Offhand I don't even remember what this bash-centric feature is called! Maybe it's 'variable substitution'.

................................

Anyway, a little trick like that can make your code run hundreds of times faster than some multi-command pipeline. For a single instance you'd not notice the difference, but if that line is being used inside a loop which runs many times, then the difference can be dramatic.

amigo,

Yes, I spent time avoiding bash string manipulations: when I read the explanations I thought I understood how they worked, but later, when I went to use them, I'd draw a blank and have to play extensively with the expressions in a terminal to get them right. That lasted until this change in thought process......

Think of what you don't want in the string (what you want to cut out)

# (first instance) = left side (beginning) of string ##=last instance encountered
% (first instance) = right side (end) of string %%=last instance encountered

str="don't do what I do, do what I say"
eliminate 'don't do' from the left (return "what I do, do what I say")
${str#*do }
starting from the left (#) eliminate *(all chars) up to the first "do " encountered -notice the space after do, if it's not there, it will only eliminate the first "do" in "don't" , leaving "n't do what I do....."

eliminate "do what I say" from the right
${str%do*}
starting from the right (%) eliminate all chars (*) up to the first encountered "do"

In addition to speed and efficiency, another advantage over "cut" is that you can use more than one char as a delimiter.
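
The examples above as a runnable sketch, with the result of each expansion in the comment. The last part illustrates the multi-character delimiter; the rec string with '::' as separator is made up for the illustration.

```shell
str="don't do what I do, do what I say"

LEFT=${str#*do }      # what I do, do what I say
NOSPACE=${str#*do}    # n't do what I do, do what I say
RIGHT=${str%do*}      # don't do what I do,   (a trailing space remains)

# A delimiter of more than one character, which cut -d does not accept.
rec='108::grubfd01.zip'
KEY=${rec%%::*}       # 108
VALUE=${rec#*::}      # grubfd01.zip

printf '[%s]\n' "$LEFT" "$NOSPACE" "$RIGHT" "$KEY" "$VALUE"
```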

Regards,
s

technosaurus · #20 Post by **technosaurus** » Sun 09 Sep 2012, 17:30

stu91 wrote:
amigo wrote:All these loops and extraneous use of echo, sed and cut... Bash makes this very simple:
Code: Select all
INDEX=${OUTSTR%%-*}
FILENAME=${OUTSTR#*-}
Hi Amigo,
Could you give a breakdown on what the different characters represent in your code - or any links etc that might expand on such code further.

Thanks in advance.

see "substring manipulation" in the advanced bash scripting guide.
