Puppy Linux Discussion Forum Forum Index Puppy Linux Discussion Forum
Puppy HOME page : puppylinux.com
"THE" alternative forum : puppylinux.info
 
 FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 

The time now is Thu 17 Apr 2014, 02:58
All times are UTC - 4
 Forum index » Off-Topic Area » Programming
Bash: Count numbers in list
Post new topic   Reply to topic View previous topic :: View next topic
Page 1 of 3 [33 Posts]   Goto page: 1, 2, 3 Next
Author Message
zigbert


Joined: 29 Mar 2006
Posts: 5561
Location: Valåmoen, Norway

PostPosted: Wed 22 Aug 2012, 10:55    Post subject:  Bash: Count numbers in list  

Hello!
I am working on a new rating system in pMusic. I want to see how many times the track is played last week or last month.
The idea is to add 'date +%s' to the rating-index each time the track has played. The index file could look like this
Code:
/root/file1.mp3| 1345641102,1345644563,1345647584,...
/root/file2.mp3| 1345623045,1345643786,1345648695,...

Now the question is:
How many times has file2 played since 1345643390?

In other words:
How many numbers in line 2 is above 1345643390?

The list will be long, so we need a fast calculation


Thank you
Sigmund

_________________
Stardust resources
Back to top
View user's profile Send private message Visit poster's website 
Flash
Official Dog Handler


Joined: 04 May 2005
Posts: 10656
Location: Arizona USA

PostPosted: Wed 22 Aug 2012, 11:20    Post subject:  

So those numbers are time stamps that show each time the file was played? Are they in chronological order?

Couldn't you just find the number of commas in each line? (The reverse of counting sheep by counting their legs and dividing by four. Laughing )
Back to top
View user's profile Send private message 
L18L

Joined: 19 Jun 2010
Posts: 2473
Location: Burghaslach, Germany somewhere also known as "Hosla"

PostPosted: Wed 22 Aug 2012, 12:24    Post subject: Re: Bash: Count numbers in list  

Code:
#!/bin/ash
#
log="/root/file2.mp3| 1345623045,1345643786,1345648695"
#
#How many numbers in log is below 1345643390?
#How many numbers in log is below oneTIME
oneTIME=1345643390 # seconds since 1970-1-1
COUNT=0
for VAR in `echo $log | cut -d'|' -f2 | tr ',' '\n'`; do
 [ $(( $VAR - $oneTIME )) -gt 0 ] &&  COUNT=$(( $COUNT + 1 ))
done
echo $COUNT numbers are above $oneTIME

The following should be faster if in chronological order:
Code:
#!/bin/ash
#
log="/root/file2.mp3| 1345623045,1345643786,1345648695"
#
#How many numbers in log is above 1345643390?
#How many numbers in log is above oneTIME
oneTIME=1345643390 # seconds since 1970-1-1
COUNT=0
for VAR in `echo $log | cut -d'|' -f2 | tr ',' '\n'`; do
 [ $(( $VAR - $oneTIME )) -lt 0 ] &&  COUNT=$(( $COUNT + 1 )) || break
done
echo $COUNT numbers are below $oneTIME

You have to subtract this from the total number of commas. Cool
Back to top
View user's profile Send private message 
seaside

Joined: 11 Apr 2007
Posts: 878

PostPosted: Wed 22 Aug 2012, 15:46    Post subject:  

zigbert,

If you don't care about dates, you might just do:
Code:
grep <file name> music_play_log | wc -l


Regards,
s
Back to top
View user's profile Send private message 
zigbert


Joined: 29 Mar 2006
Posts: 5561
Location: Valåmoen, Norway

PostPosted: Wed 22 Aug 2012, 16:54    Post subject:  

Thank you guys for your feedback

Yes, the numbers will be in in chronological order.

@seaside
The rating system today shows how many times you have played the track - with no dates. I want more.....

@L18L
This is sure a solution, but it will be slow when rating-index grows.


Sigmund

_________________
Stardust resources
Back to top
View user's profile Send private message Visit poster's website 
SFR


Joined: 26 Oct 2011
Posts: 876

PostPosted: Wed 22 Aug 2012, 17:04    Post subject:  

zigbert wrote:
[...]it will be slow when rating-index grows.

How about binary search algorithm?
Although it looks pretty complex to implement (at least for me) when using such a type of index (item1, item2, item3...), but it should be easier with arrays.
Here's the snippet I found:
http://www.bashguru.com/2011/01/binarysearch-shell-script.html

Greetings!

_________________
[O]bdurate [R]ules [D]estroy [E]nthusiastic [R]ebels => [C]reative [H]umans [A]lways [O]pen [S]ource
Omnia mea mecum porto.
Back to top
View user's profile Send private message 
L18L

Joined: 19 Jun 2010
Posts: 2473
Location: Burghaslach, Germany somewhere also known as "Hosla"

PostPosted: Thu 23 Aug 2012, 09:02    Post subject: Count numbers in list
Subject description: binary search
 

SFR wrote:
How about binary search algorithm?
Although it looks pretty complex to implement (at least for me) when using such a type of index (item1, item2, item3...), but it should be easier with arrays.
Here's the snippet I found:
http://www.bashguru.com/2011/01/binarysearch-shell-script.html


Here is what I made from that snippet:
Code:
#!/bin/sh
# SCRIPT : derived from: binarysearch.sh
# SCRIPT : bsearch.sh
# USAGE  : bsearch.sh
# PURPOSE: Searches given number in a sorted list.
#                         \\\\ ////
#                         \\ - - //
#                            @ @
#                    ---oOOo-( )-oOOo---
#
#####################################################################
#                      Define Functions Here                        #
#####################################################################

binarysearch()
{
    status=-1
    i=1
    array=($(echo "$@"))
    LowIndex=0
    HeighIndex=$((${#array[@]}-1))
    while [ $LowIndex -lt $HeighIndex ]
    do
        MidIndex=$(($LowIndex+($HeighIndex-$LowIndex)/2))
        MidElement=${array[$MidIndex]}
        case $(($MidElement - $SearchedItem)) in
        -*) LowIndex=$(($MidIndex+1)) ;;
         0) status=0
            searches=$i
            return ;;
        *)  HeighIndex=$(($MidIndex-1)) ;;
        esac
        let i++
    done
}

#####################################################################
#                       Variable Declaration                       #
#####################################################################

clear
echo "Enter Sorted Array Elements : "
read -a ARRAY
count=${#ARRAY[@]}

#####################################################################
#                       Main Script Starts Here                     #
#####################################################################

while [ : ]
do
     echo -n "
     Type in a number or break using CTRL-C: "
     read SearchedItem
     binarysearch "${ARRAY[@]}"
     [ $SearchedItem -gt $MidElement ] \
      && echo "nothing found equal or later than $SearchedItem" \
      || echo "$SearchedItem found after $searches searches, MidIndex=$MidIndex MidElement=$MidElement"

done
Back to top
View user's profile Send private message 
akash_rawal

Joined: 25 Aug 2010
Posts: 232
Location: ISM Dhanbad, Jharkhand, India

PostPosted: Thu 23 Aug 2012, 11:50    Post subject: Binary search application  

And here's it's application in zigbert's problem:

Code:

#!/bin/bash

#usage: played_since filename time

#data stored in database.txt

file="$1";
time="$2";

ifs_bak="$IFS"
IFS=""

while read -d "|" record; do
   #check for file
   if test "$record" = "$file"; then
      IFS=","
      #Get all timestamps in array
      read -a times
      #Binary search
      i=$(( ${#times[*]}/2 ))
      d=$(( $i/2 ))
      test "$d" -eq "0" && d=1 #precaution
      ulim=$(( ${#times[*]}-1 ))
      while test "$i" -ge "0" -a "$i" -lt "$ulim"; do #check whether index is in range
         if test "$time" -lt "${times[$i]}"; then
            i=$(( $i-$d ))
         elif test "$time" -gt "${times[$(( $i+1 ))]}"; then
            i=$(( $i+$d ))
         else
            #found, tell no. of records greater than this
            echo "$(( $ulim-$i ))"
            exit
         fi
         d=$(( ($d/2) ))
         test "$d" -eq "0" && d=1 #precaution
      done
      #In case of 'out of range'
      if test "$ulim" -lt "0"; then
         echo "0"
      else
         if test "$time" -lt "$times"; then
            echo "${#times[*]}"
         elif test "$time" -gt "${times[$ulim]}"; then
            echo "0"
         else
            echo "bug" #This should not be reached
         fi
      fi
      exit
   fi
   read trash
done < database.txt

IFS="$ifs_bak"

echo "File not found"



I have found that overhead of reading data from file into array is much larger than time required for binary search, we have to do something there.

Plus, if the lists are going to be really large it would be performance advantage to store timestamps of each track in a separate file.

I am using a c++ program to generate about 87 MB test file to search in.

Code:

//g++ -Wall -o data_gen source.cpp

#include <iostream>
#include <fstream>
#include <cstdlib>

using namespace std;

int main()
{
   ofstream strm("database.txt", ios::out);
   
   int i, j;
   for (i = 0; i < 100; i++)
   {
      unsigned long rand_time = 0;
      strm << "file" << i << ".mp3|";
      for (j = 0; j < 100000; j++)
      {
         rand_time += (rand() % 1000);
         strm << rand_time << ",";
      }
      rand_time += (rand() % 1000);
      strm << rand_time << "\n";
   }
   
   strm.close();
   
   return 0;
}


Time taken for last file roughly equals time to generate the database.
Code:

# time ./data_gen

real   0m6.873s
user   0m3.936s
sys   0m0.257s
# time ./played_since file0.mp3 48000000
3881

real   0m0.269s
user   0m0.173s
sys   0m0.027s
# time ./played_since file99.mp3 48000000
3672

real   0m6.849s
user   0m5.696s
sys   0m0.677s
#
Back to top
View user's profile Send private message 
zigbert


Joined: 29 Mar 2006
Posts: 5561
Location: Valåmoen, Norway

PostPosted: Thu 23 Aug 2012, 12:59    Post subject:  

Thank you guys
Pretty amazing you want to spend so much time helping out!!!!!

I made a test.
1.) Checking my existing rating index - there are about 2000 tracks in the index.
2.) Make a fake index based on the new structure including 2000 tracks and 25 timestamps for each track. All together 574kb.
3.) Run the played_since script to get an assumption of how fast it can build an updated list of favorite tracks.

Code:
# cut -d'|' -f1 ./database.txt > /tmp/played_since; time while read I; do ./played_since.sh $I 1345643390; done < /tmp/played_since
...
...
...
24
24
24
24
24
24
24
24
24
24

real   0m30.799s
user   0m21.339s
sys   0m5.560s
#

I have a modern system, and still it takes half a minute to build a rating list. Is this an understanding why ratings in the big music-apps uses stars for rating instead of a exact status.

But still, it would be awesome to define a given time (last week, januar 2011, or whatever) and then get My favorite tracks in that period.


Sigmund

_________________
Stardust resources
Back to top
View user's profile Send private message Visit poster's website 
akash_rawal

Joined: 25 Aug 2010
Posts: 232
Location: ISM Dhanbad, Jharkhand, India

PostPosted: Thu 23 Aug 2012, 13:55    Post subject:  

If you want to process all files then why not go this way:
Code:

#!/bin/bash

#usage: played_since_foreach time

time="$1";

ifs_bak="$IFS"
IFS=""

while read -d "|" record; do
   IFS=","
   #Get all timestamps in array
   read -a times
   #Binary search
   i=$(( ${#times[*]}/2 ))
   d=$(( $i/2 ))
   test "$d" -eq "0" && d=1 #precaution
   ulim=$(( ${#times[*]}-1 ))
   count=-1
   while test "$i" -ge "0" -a "$i" -lt "$ulim"; do #check whether index is in range
      if test "$time" -lt "${times[$i]}"; then
         i=$(( $i-$d ))
      elif test "$time" -gt "${times[$(( $i+1 ))]}"; then
         i=$(( $i+$d ))
      else
         #found, tell no. of records greater than this
         count="$(( $ulim-$i ))"
         break
      fi
      d=$(( ($d/2) ))
      test "$d" -eq "0" && d=1 #precaution
   done
   #In case of 'out of range'
   if test "$count" -lt "0"; then
      if test "$ulim" -lt "0"; then
         count=0
      else
         if test "$time" -lt "$times"; then
            count="${#times[*]}"
         elif test "$time" -gt "${times[$ulim]}"; then
            count=0
         else
            count="bug" #This should not be reached
         fi
      fi
   fi
   #Show result for this record
   echo "$record: $count"
done < database.txt

IFS="$ifs_bak"


This takes about 20 seconds to process my 84 mb test file.

zigbert wrote:

2.) Make a fake index based on the new structure including 2000 tracks and 25 timestamps for each track. All together 574kb.

I think I overestimated the size Laughing, such a file takes about 2s on my system using the above script. But I think it could be even faster.
Back to top
View user's profile Send private message 
zigbert


Joined: 29 Mar 2006
Posts: 5561
Location: Valåmoen, Norway

PostPosted: Thu 23 Aug 2012, 14:32    Post subject:  

1.1 sec here Smile
Quote:
But I think it could be even faster.
Would be good for those with older hardware. - And those are many in the Puppy world.


Thanks a lot (so far)
Sigmund

_________________
Stardust resources
Back to top
View user's profile Send private message Visit poster's website 
zigbert


Joined: 29 Mar 2006
Posts: 5561
Location: Valåmoen, Norway

PostPosted: Thu 23 Aug 2012, 15:15    Post subject:  

akash_rawal
I don't think 5 sec is problematic if user request building a specific rating list....

The speed issue arrives when pMusic wants to give a general rating list of the most popular songs in the Overview mode. A workaround could be to always put the last played track at the bottom of the rating list. When later requesting a quick rating-list we could just 'tail -n 200 > database.txt' before executing played_since_foreach.

That means your code will somehow do the job as is, but if speed improvements is possible, it would sure help.


Sigmund

_________________
Stardust resources
Back to top
View user's profile Send private message Visit poster's website 
technosaurus


Joined: 18 May 2008
Posts: 4133

PostPosted: Thu 23 Aug 2012, 21:32    Post subject:  

something like:
Code:

LOGENTRY=`grep $filename log.txt`
#grep is faster than while read case on large files
STAMPS=${LOGENTRY#*|}
#you could use bash arrays instead - set allows "arrays" in sh, ash, etc...
set ${STAMPS//,/ }
#for speed change IFS to "," vs. the //,/ } but don't forget to reset it
while ([ $1 -lt $timestamp ]) do
shift #a hacky way to remove all entries < timestamp
done
# $# is the number of args passed or in this case set with "set ..."
echo The number of times played since $timestamp is $#

haven't tested for accuracy, but something along these lines should be pretty fast... even faster using busybox ash, based on past experience

_________________
Web Programming - Pet Packaging 100 & 101
Back to top
View user's profile Send private message 
L18L

Joined: 19 Jun 2010
Posts: 2473
Location: Burghaslach, Germany somewhere also known as "Hosla"

PostPosted: Fri 24 Aug 2012, 08:27    Post subject: Bash: Count numbers in list  

technosaurus wrote:
...haven't tested for accuracy, but ...

Hope this will suffice for accuracy...using zigbert´s original database
# cat database.txt
/root/file1.mp3| 1345641102,1345644563,1345647584
/root/file2.mp3| 1345623045,1345643786,1345648695
#
# ./file_times_played_since 1345644563
The number of times played /root/file1.mp3 since 1345644563 is 2
The number of times played /root/file2.mp3 since 1345644563 is 1
#
# ./file_times_played_since 1345648694
ash: 1345648694: unknown operand
The number of times played /root/file1.mp3 since 1345648694 is 0
The number of times played /root/file2.mp3 since 1345648694 is 1
#

file_times_played_since:
Code:
#!/bin/ash
timestamp=$1

ifs_bak=$IFS
IFS=","

while read LOGENTRY; do

filename=${LOGENTRY%%|*}
#LOGENTRY=`grep $filename database.txt`
#grep is faster than while read case on large files

STAMPS=${LOGENTRY#*|}
#you could use bash arrays instead - set allows "arrays" in sh, ash, etc...
#set ${STAMPS//,/ }
set ${STAMPS}
#for speed change IFS to "," vs. the //,/ } but don't forget to reset it
while ([ $1 -lt $timestamp ]) do
shift #a hacky way to remove all entries < timestamp
done
# $# is the number of args passed or in this case set with "set ..."
echo The number of times played $filename since $timestamp is $#

done < database.txt

IFS=$ifs_bak
Smile
The only issue I have found is when result is zero:
ash: 1345648694: unknown operand
Back to top
View user's profile Send private message 
technosaurus


Joined: 18 May 2008
Posts: 4133

PostPosted: Fri 24 Aug 2012, 10:43    Post subject:  

Oops need to prepend the -lt test with a [ $1 ] && ... For the case where it has not been played in the time period
_________________
Web Programming - Pet Packaging 100 & 101
Back to top
View user's profile Send private message 
Display posts from previous:   Sort by:   
Page 1 of 3 [33 Posts]   Goto page: 1, 2, 3 Next
Post new topic   Reply to topic View previous topic :: View next topic
 Forum index » Off-Topic Area » Programming
Jump to:  

You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Powered by phpBB © 2001, 2005 phpBB Group
[ Time: 0.1080s ][ Queries: 12 (0.0049s) ][ GZIP on ]