Bash: Count numbers in list

For discussions about programming, programming questions/advice, and projects that don't really have anything to do with Puppy.
Message
Author
akash_rawal
Posts: 229
Joined: Wed 25 Aug 2010, 15:38
Location: ISM Dhanbad, Jharkhand, India

#21 Post by akash_rawal »

jamesbond wrote: My submission:
...
I tested your solution, it takes only one second to process 100 songs with 100000 timestamps. Faster by orders of magnitude :)

User avatar
L18L
Posts: 3479
Joined: Sat 19 Jun 2010, 18:56
Location: www.eussenheim.de/

Count numbers in list

#22 Post by L18L »

I have been testing jamesbond´s submission, too.

change
print a[1], min
to
print a[1], NF-min
to get TIMES from FIND to NOW as required.

Speed: wow, it is TIME to learn awk now. 8)

User avatar
technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO
Contact:

#23 Post by technosaurus »

@L18L
the awk would be even faster if it only computed stuff for the match:

Code: Select all

FILEPATH=${FILEPATH//\//\\\/} #escape the slashes for awk ... if you care about paths
#FILEPATH=${FILEPATH##*/} #remove the path for awk ... if you only care about names
awk -F, -v TS=$FIND '/'$FILEPATH'/{...
not sure how well awk works on non-ascii filenames though
Check out my [url=https://github.com/technosaurus]github repositories[/url]. I may eventually get around to updating my [url=http://bashismal.blogspot.com]blogspot[/url].

User avatar
L18L
Posts: 3479
Joined: Sat 19 Jun 2010, 18:56
Location: www.eussenheim.de/

Count numbers in list

#24 Post by L18L »

technosaurus wrote:the awk would be even faster if it only computed stuff for the match
Yes, and slower if we care about accuracy.

The binary search algorithm is not accurate if case of some identic values.
But on a single user system there are no doubles :lol:

User avatar
zigbert
Posts: 6621
Joined: Wed 29 Mar 2006, 18:13
Location: Valåmoen, Norway
Contact:

#25 Post by zigbert »

Test results so far with the following file

Code: Select all

/root/file1.mp3| 1345641102,1345644563,1345647584,1345647585,1345647586,1345647587,1345647588,1345647589,1345647590,1345740286,1345740287,1345740288,1345740290,1345740291,1345740291,1345740292,1345740293,1345740293,1345740294,1345740295,1345740296,1345740297,1345740298,1345740299,1345740300
/root/file2.mp3| 1345623045,1345643786,1345648695,1345648696,1345648697,1345648698,1345648699,1345648700,1345648701,1345740286,1345740287,1345740288,1345740290,1345740291,1345740291,1345740292,1345740293,1345740293,1345740294,1345740295,1345740296,1345740297,1345740298,1345740299,1345740300
[...]
/root/file1999.mp3| 1345641102,1345644563,1345647584,1345647585,1345647586,1345647587,1345647588,1345647589,1345647590,1345740286,1345740287,1345740288,1345740290,1345740291,1345740291,1345740292,1345740293,1345740293,1345740294,1345740295,1345740296,1345740297,1345740298,1345740299,1345740300
/root/file2000øæå.mp3| 1345623045,1345643786,1345648695,1345648696,1345648697,1345648698,1345648699,1345648700,1345648701,1345740286,1345740287,1345740288,1345740290,1345740291,1345740291,1345740292,1345740293,1345740293,1345740294,1345740295,1345740296,1345740297,1345740298,1345740299,1345740300
########################################

Code: Select all

#!/bin/bash

#usage: played_since_foreach time

time="$1";

ifs_bak="$IFS"
IFS=""

while read -d "|" record; do
   IFS=","
   #Get all timestamps in array
   read -a times
   #Binary search
   i=$(( ${#times[*]}/2 ))
   d=$(( $i/2 ))
   test "$d" -eq "0" && d=1 #precaution
   ulim=$(( ${#times[*]}-1 ))
   count=-1
   while test "$i" -ge "0" -a "$i" -lt "$ulim"; do #check whether index is in range
      if test "$time" -lt "${times[$i]}"; then
         i=$(( $i-$d ))
      elif test "$time" -gt "${times[$(( $i+1 ))]}"; then
         i=$(( $i+$d ))
      else
         #found, tell no. of records greater than this
         count="$(( $ulim-$i ))"
         break
      fi
      d=$(( ($d/2) ))
      test "$d" -eq "0" && d=1 #precaution
   done
   #In case of 'out of range'
   if test "$count" -lt "0"; then
      if test "$ulim" -lt "0"; then
         count=0
      else
         if test "$time" -lt "$times"; then
            count="${#times[*]}"
         elif test "$time" -gt "${times[$ulim]}"; then
            count=0
         else
            count="bug" #This should not be reached
         fi
      fi
   fi
   #Show result for this record
   echo "$record: $count"
done < database.txt

IFS="$ifs_bak" 

Code: Select all

real	0m1.129s
user	0m1.080s
sys	0m0.047s
########################################

Code: Select all

#!/bin/ash
timestamp=$1

ifs_bak=$IFS
IFS=","

while read LOGENTRY; do

filename=${LOGENTRY%%|*}
#LOGENTRY=`grep $filename database.txt`
#grep is faster than while read case on large files

STAMPS=${LOGENTRY#*|}
#you could use bash arrays instead - set allows "arrays" in sh, ash, etc...
#set ${STAMPS//,/ }
set ${STAMPS}
#for speed change IFS to "," vs. the //,/ } but don't forget to reset it
while ( [ $1 ] && [ $1 -lt $timestamp ]) do
shift #a hacky way to remove all entries < timestamp
done
# $# is the number of args passed or in this case set with "set ..."
echo The number of times played $filename since $timestamp is $#

done < database.txt

IFS=$ifs_bak 

Code: Select all

real	0m1.434s
user	0m0.190s
sys	0m0.633s
########################################

Code: Select all

#!/bin/ash
LANG=C
FIND=$1

cat /root/database.txt | 
awk -F, -v TS=$FIND '{
      split($1, a, "|"); $1=a[2] # fix first entry
      max=NF; min=1;
      
      while (max-min > 1) {
         i=int( (max+min)/2 )
         if (TS >= $i) min = i
         else max = i
      }
      print a[1], NF-min 
}' 

Code: Select all

real	0m0.026s
user	0m0.020s
sys	0m0.003s
########################################

All worked with non-english filename /root/file2000øæå.mp3


Thank you all
Sigmund

User avatar
zigbert
Posts: 6621
Joined: Wed 29 Mar 2006, 18:13
Location: Valåmoen, Norway
Contact:

#26 Post by zigbert »

Ok, next stage
I don't understand a bit of the awk-code, so I need some help to integrate it fully into pMusic code.

It now works ok (only ok) to find the last played songs for a given period:
My NEW index file looks right now like this:

Code: Select all

  /mnt/sdb1/musikk/mp3/Judas priest - Monsters of rock.mp3|    Judas priest - Monsters of rock.mp3|:,1345762237
  /mnt/sdb1/musikk/mp3/Iron maiden - Seventh son of a seventh son.mp3|    Iron maiden - Seventh son of a seventh son.mp3|:,1345762833
  /mnt/sdb1/musikk/mp3/Savatage - Dead winter dead.mp3|    Savatage - Dead winter dead.mp3|:,1345763094
  /mnt/sdb1/musikk/mp3/Twisted sister - Hot love.mp3|    Twisted sister - Hot love.mp3|:,1345763569
  /mnt/sdb1/musikk/mp3/Twisted sister - I believe in you.mp3|    Twisted sister - I believe in you.mp3|:,1345763892
  /mnt/sdb1/musikk/mp3/Twisted sister - I'm so hot for you.mp3|    Twisted sister - I'm so hot for you.mp3|:,1345764140
  /mnt/sdb1/musikk/mp3/Twisted sister - I wanna rock.mp3|    Twisted sister - I wanna rock.mp3|:,1345764326
  /mnt/sdb1/musikk/mp3/Twisted sister - King of the fools.mp3|    Twisted sister - King of the fools.mp3|:,1345764715
  /mnt/sdb1/musikk/mp3/Twisted sister - Leader of the pack.mp3|    Twisted sister - Leader of the pack.mp3|:,1345764940
  /mnt/sdb1/musikk/mp3/Twisted sister - Lookin out for no. 1.mp3|    Twisted sister - Lookin out for no. 1.mp3|:,1345765130
  /mnt/sdb1/musikk/mp3/Twisted sister - One bad habit.mp3|    Twisted sister - One bad habit.mp3|:,1345765538
  /mnt/sdb1/musikk/mp3/Twisted sister - Run for your life.mp3|    Twisted sister - Run for your life.mp3|:,1345766019
  /mnt/sdb1/musikk/mp3/Twisted sister - The beast.mp3|    Twisted sister - The beast.mp3|:,1345760757,1345766230
  /mnt/sdb1/musikk/mp3/Twisted sister - The price.mp3|    Twisted sister - The price.mp3|:,1345766462
  /mnt/sdb1/musikk/mp3/Twisted sister - Under the blade.mp3|    Twisted sister - Under the blade.mp3|:,1345766745
  /mnt/sdb1/musikk/mp3/Twisted sister - Wake up (the sleeping giant).mp3|    Twisted sister - Wake up (the sleeping giant).mp3|:,1345767008
  /mnt/sdb1/musikk/mp3/Twisted sister - We're not gonna take it.mp3|    Twisted sister - We're not gonna take it.mp3|:,1345767230
  /mnt/sdb1/musikk/mp3/Twisted sister - You can't stop rock'n'roll.mp3|    Twisted sister - You can't stop rock'n'roll.mp3|:,1345767514
  /mnt/sdb1/musikk/mp3/Twisted sister - You're not alone (Suzette's song).mp3|    Twisted sister - You're not alone (Suzette's song).mp3|:,1345767762
  /mnt/sdb1/musikk/mp3/Twisted sister - You want what we got.mp3|    Twisted sister - You want what we got.mp3|:,1345767984
  /mnt/sdb1/musikk/mp3/Saxon - Wheels of steel.mp3|    Saxon - Wheels of steel.mp3|:,1345789853
  /mnt/sdb1/musikk/mp3/Judas priest - Parental guidance (LP).mp3|    Judas priest - Parental guidance (LP).mp3|:,1345790062
  /mnt/sdb1/musikk/mp3/Judas priest - Turbo lover (LP).mp3|    Judas priest - Turbo lover (LP).mp3|:,1345790660
  /mnt/sdb1/musikk/mp3/Alice Cooper - Poison.mp3|    Alice Cooper - Poison.mp3|:,1345790933
  /mnt/sdb1/musikk/mp3/Faith no more - Easy.mp3|    Faith no more - Easy.mp3|:,1345791129
  /mnt/sdb1/musikk/mp3/Bangles - Manic monday.mp3|    Bangles - Manic monday.mp3|:,1345791315
  /mnt/sdb1/musikk/mp3/Secret garden - Nocturne.mp3|    Secret garden - Nocturne.mp3|:,1345791511
  /mnt/sdb1/musikk/mp3/Dimmu borgir - Vredesbyrd.mp3|    Dimmu borgir - Vredesbyrd.mp3|:,1345760201,1345760237,1345794966
  /mnt/sdb1/musikk/mp3/Black sabbath - Heaven and hell.mp3|    Black sabbath - Heaven and hell.mp3|:,1345760180,1345761905,1345796352
  /mnt/sdb1/musikk/mp3/Twisted sister - Love is for suckers.mp3|    Twisted sister - Love is for suckers.mp3|:,1345765338,1345797921
  /mnt/sdb1/musikk/mp3/Dimmu borgir - Burn in hell.mp3|    Dimmu borgir - Burn in hell.mp3|:,1345760219,1345760545,1345795930,1345798229
  /mnt/sdb1/musikk/mp3/Twisted sister - Destroyer.mp3|    Twisted sister - Destroyer.mp3|:,1345763352,1346008426
  /mnt/sdb1/musikk/flac/Candlemass - At the gallows end.flac|    Candlemass - At the gallows end.flac|:,1345761248,1345796972,1346008710
  /mnt/sdb1/musikk/mp3/Saint deamon - My heart.mp3|    Saint deamon - My heart.mp3|:,1345761483,1345797207,1346008944
  /mnt/sdb1/musikk/mp3/Ræva Rockers - Depp (LP).mp3|    Ræva Rockers - Depp (LP).mp3|:,1345795310,1345797401,1346009139
To get it work I made a workaround and used awk separator |: instead of simply |
... I couldn't get the awk code to read column 3 instead of column 2 (as in the example in the main post.)
Some #comments would be great :D

The function in pMusic:

Code: Select all

-index_rating_buildlist)
	TIMESTAMP="$2"
	cat "$3" | 
	awk -F, -v TS=$TIMESTAMP '{
		  split($1, a, "|:"); $1=a[2] # fix first entry
		  max=NF; min=1;
		  
		  while (max-min > 1) {
			 i=int( (max+min)/2 )
			 if (TS >= $i) min = i
			 else max = i
		  }
		  print a[1], NF-min 
	}' 
	;;
Also, what we want is to define a alternative max value to find timestamps in a specific period. - Like february 2012. Let's say that is $4.


Anyone???
Sigmund

User avatar
L18L
Posts: 3479
Joined: Sat 19 Jun 2010, 18:56
Location: www.eussenheim.de/

Awk: Count numbers in list

#27 Post by L18L »

@zigbert - faster than
read LINE; do ..... done < database.txt
is your
cat /root/database.txt | awk -F, -v TS=$FIND '...'

But even faster (10%) is:
awk -F, -v TS=$FIND ' ...' database.txt

User avatar
L18L
Posts: 3479
Joined: Sat 19 Jun 2010, 18:56
Location: www.eussenheim.de/

Awk: Count numbers in list

#28 Post by L18L »

zigbert wrote:...My NEW index file looks right now like this:

Code: Select all

  /mnt/sdb1/musikk/mp3/Judas priest - Monsters of rock.mp3|    Judas priest - Monsters of rock.mp3|:,1345762237
To get it work I made a workaround and used awk separator |: instead of simply |
... I couldn't get the awk code to read column 3 instead of column 2 (as in the example in the main post.)
Some #comments would be great :D
Why 3 columns?
I have successfully tried to print file and basename from 2 column database:
changing just
print a[1], NF-min
to
num=split (a[1], b , "/")
print a[1], b[num], NF-min

Learning awk:
split(s, a [, r])
Splits the string s into the array a on the
regular expression r, and returns the number of
fields. If r is omitted, FS is used instead.
The array a is cleared first. Splitting behaves
identically to field splitting.

Go back to 2 column database :?:

User avatar
L18L
Posts: 3479
Joined: Sat 19 Jun 2010, 18:56
Location: www.eussenheim.de/

Awk: Count numbers in list

#29 Post by L18L »

zigbert wrote:Also, what we want is to define a alternative max value to find timestamps in a specific period. - Like february 2012. Let's say that is $4.
Where is $1 ?

Here is a solution using $1 and $2 timestamp from and -to
#!/bin/ash
#additional output filename only=basename
#TIMESTAMP from to
#
LANG=C
#
FIND_FROM=$1 #timestamp
FIND_TO=$2

awk -F, -v TS1=$FIND_FROM -v TS2=$FIND_TO '{
split($1, a, "|"); $1=a[2] # fix first entry

max=NF; min=0;
while (max-min > 1) {
i=int( (max+min)/2 )
if (TS1 >= $i) min = i
else max = i
}
from=NF-min

max=NF; min=0;
while (max-min > 1) {
i=int( (max+min)/2 )
if (TS2 >= $i) min = i
else max = i
}
to=NF-min


num=split (a[1], b , "/")
print a[1], b[num], from-to

}' database.txt

User avatar
zigbert
Posts: 6621
Joined: Wed 29 Mar 2006, 18:13
Location: Valåmoen, Norway
Contact:

Re: Awk: Count numbers in list

#30 Post by zigbert »

L18L wrote:But even faster (10%) is:
awk -F, -v TS=$FIND ' ...' database.txt
Done :D

User avatar
zigbert
Posts: 6621
Joined: Wed 29 Mar 2006, 18:13
Location: Valåmoen, Norway
Contact:

#31 Post by zigbert »

L18L wrote:Why 3 columns?
Column 2 is not always basename.
Preferable it collects information from meta tags or m3u-file.

User avatar
zigbert
Posts: 6621
Joined: Wed 29 Mar 2006, 18:13
Location: Valåmoen, Norway
Contact:

Re: Awk: Count numbers in list

#32 Post by zigbert »

L18L wrote:
zigbert wrote:Also, what we want is to define a alternative max value to find timestamps in a specific period. - Like february 2012. Let's say that is $4.
Where is $1 ? $1 is for pMusic internal redirection

Here is a solution using $1 and $2 timestamp from and -to
#!/bin/ash
#additional output filename only=basename
#TIMESTAMP from to
#
LANG=C
#
FIND_FROM=$1 #timestamp
FIND_TO=$2

awk -F, -v TS1=$FIND_FROM -v TS2=$FIND_TO '{
split($1, a, "|"); $1=a[2] # fix first entry

max=NF; min=0;
while (max-min > 1) {
i=int( (max+min)/2 )
if (TS1 >= $i) min = i
else max = i
}
from=NF-min

max=NF; min=0;
while (max-min > 1) {
i=int( (max+min)/2 )
if (TS2 >= $i) min = i
else max = i
}
to=NF-min

variable 'to' is never 0, even if zero hits

num=split (a[1], b , "/")
print a[1], b[num], from-to

}' database.txt
Else, I find this excellent!!!!!

User avatar
L18L
Posts: 3479
Joined: Sat 19 Jun 2010, 18:56
Location: www.eussenheim.de/

Re: Awk: Count numbers in list

#33 Post by L18L »

zigbert wrote:Else, I find this excellent!!!!!
And our thanks go to: jamesbond :!:

Post Reply