Bash: sort
zigbert


Joined: 29 Mar 2006
Posts: 5562
Location: Valåmoen, Norway

PostPosted: Sat 29 Sep 2012, 09:12    Post subject:  Bash: sort
Subject description: Is it possible to sort a file based on another file ?
 

Let's say file1 looks like this:
Code:
03:50|Artist - Title|001 /path/Artist_Title.mp3
04:16|Bartist - Title|002 /path/Bartist_Title.mp3
03:32|Cartist - Title|003 /path/Cartist_Title.mp3
...and file2 contains the correct order:
Code:
002 /path/Bartist_Title.mp3
001 /path/Artist_Title.mp3
003 /path/Cartist_Title.mp3
Yes, this order is correct even though 002 comes before 001.


How can I sort file1 based on file2? ... at the speed of light ;)


Thank you
Sigmund

_________________
Stardust resources
SFR


Joined: 26 Oct 2011
Posts: 879

PostPosted: Sat 29 Sep 2012, 09:59    Post subject:  

Interesting problem...
Ok, here's the first attempt.

This one will work only if the numbers in file2 (001, 002 ...) correspond exactly to the line numbers in file1, as in your examples.

Code:
#!/bin/bash

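# -v passes the number as a string, so a zero-padded value such as 010 is
# compared as decimal 10 (as a literal in the program, gawk may read it as octal)
for i in $(awk '{print $1}' file2); do
  awk -v n="$i" 'NR == n' file1
done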

I don't know about "the speed of light"; it would need testing on something larger, I guess ;)

Greetings!

_________________
[O]bdurate [R]ules [D]estroy [E]nthusiastic [R]ebels => [C]reative [H]umans [A]lways [O]pen [S]ource
Omnia mea mecum porto.
L18L

Joined: 19 Jun 2010
Posts: 2473
Location: Burghaslach, Germany somewhere also known as "Hosla"

PostPosted: Sat 29 Sep 2012, 11:25    Post subject: sort  

# time sort -t '|' -k 2 file1
03:50|Artist - Title|001 /path/Artist_Title.mp3
04:16|Bartist - Title|002 /path/Bartist_Title.mp3
03:32|Cartist - Title|003 /path/Cartist_Title.mp3

real 0m0.030s
user 0m0.007s
sys 0m0.007s
#

# time for i in `awk '{print $1}' file2`; do
> awk 'NR=='$i'' file1
> done
04:16|Bartist - Title|002 /path/Bartist_Title.mp3
03:50|Artist - Title|001 /path/Artist_Title.mp3
03:32|Cartist - Title|003 /path/Cartist_Title.mp3

real 0m0.158s
user 0m0.053s
sys 0m0.013s
#
technosaurus


Joined: 18 May 2008
Posts: 4134

PostPosted: Sat 29 Sep 2012, 17:27    Post subject:  

btw if you are using awk you can right justify text like this:
Code:
echo -e "1 hello\n2 world\n256 last" |awk '{printf "%5s %s\n",$1, $2}'


In awk you can use associative arrays: fill the array while processing the first file, then use it while processing the 2nd file (the order matters, since awk parses the 1st file first). You can also do stuff before and/or after all files are read.

(By associative arrays, I mean that you can index an array by any string you like, e.g. a[filename]=number, b[filename]=time ...)
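
To make the two-file idea concrete, here is a minimal sketch (not technosaurus's own code). The file names times.txt and order.txt and the array name t are made up for illustration; NR == FNR is a common idiom that is true only while awk is reading the first file.

```shell
#!/bin/sh
# Sketch: fill an awk associative array from the first file,
# then look values up while reading the second file.
# times.txt, order.txt and the array name "t" are hypothetical.
work=$(mktemp -d)
printf '%s\n' 'a.mp3 03:50' 'b.mp3 04:16' > "$work/times.txt"
printf '%s\n' 'b.mp3' 'a.mp3' > "$work/order.txt"

result=$(awk '
  NR == FNR { t[$1] = $2; next }   # first file: remember the time per filename
  { print $1, t[$1] }              # second file: look each filename up
' "$work/times.txt" "$work/order.txt")

printf '%s\n' "$result"
rm -rf "$work"
```

The lookups come out in the order of the second file, regardless of the order in which the array was filled.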

Here is the template I use for when I forget all the random features:

Code:
#!/bin/awk -f
#FILENAME (name of current file)   #$0 (current line of current file)
#NF number of fields, $NF last field
#NR line number in all files      #FNR line number in current file
#ORS (default is "\n")            #RS  (default is "\n")
#OFS (default is " ")            #FS (default is " ": runs of blanks/tabs)
#system(command) run a command      #close(filename) close(command)
#ARGC, ARGV similar to C, but skips some stuff
#IGNORECASE (default is 0) set to non-0 or use toupper() or tolower()
#ENVIRON array of env vars ex. ENVIRON["SHELL"] (equivalent of $SHELL)
#getline var < file ... close file or command | getline var
#index(haystack, needle) find needle in haystack
#length(string)
#match(string, regexp) returns where the regex starts, or 0
#RLENGTH length of /match/ substring or -1
#RSTART position where the /match/ substring starts, or 0
#split(string, array, fieldsep) split string into an array separated by fieldsep
#printf(format, expression1,...) print format-ted replacing %* with expressions
#%{c,d/i,e,f,g,o,s,x,X,%} char, decimal int, exp notation, float, shortest of
#   exp/float, octal, string, hex int, capitalized hex int, a '%' character
#sprintf(format, expression1,...) store printf in a variable
#sub(regexp, replacement, target) replace first regex with replacement in target
#gsub(regexp, replacement, target) like sub but for all regex in target
#substr(string, start, length)get substring of string from start to start+length
#print > /dev/stdin, /dev/stdout, /dev/stderr, /dev/fd/# or filename
#output can be piped like print $0 | command
#comparisons <,>,<=,>=,==,!=,~,!~,in use && for AND, || for OR, ! for NOT
#   (~ is for regexp and "in" looks for subscript in array)
#/word/{...} like if match(...) {...} equivalent of grep
#(condition) ? if-true-exp : if-false-exp or use if (condition){}
#math +,-,*,/,%,^ (or ** in gawk),log(x),exp(x),sqrt(x),cos(x),sin(x),atan2(y,x),
#rand(),srand(x); gawk also has systime() and strftime()
#
#function name (parameter-list) {
#     body-of-function
#}

BEGIN {
#actions that happen before any files are read in
}
#
{
#actions to do on files
}
#
END {
#actions to do after all files are done
}


_________________
Web Programming - Pet Packaging 100 & 101

Last edited by technosaurus on Sat 20 Oct 2012, 14:21; edited 1 time in total
zigbert


Joined: 29 Mar 2006
Posts: 5562
Location: Valåmoen, Norway

PostPosted: Sun 30 Sep 2012, 05:55    Post subject:  

I am thankful for all tips and input.
There are many ways to solve this, but I am still searching for brilliance ;)


Sigmund

akash_rawal

Joined: 25 Aug 2010
Posts: 232
Location: ISM Dhanbad, Jharkhand, India

PostPosted: Sun 30 Sep 2012, 13:42    Post subject:  

Here's my attempt:
Code:

#!/bin/bash

#Utility
function endl()
{
   cat
   echo
}

#Our own private directory
tmp="/tmp/sort2"
mkdir -p "$tmp"

#Index file2
i=0
ifsbak="$IFS"
IFS=""
while read line; do
   echo "$i|$line"
   i=$(( $i+1 ))
done < "./file2" > "$tmp/file2_indexed"

#Sort both files alphabetically
sort -t '|' -k 3 -o "$tmp/file1_sorted" "./file1"
sort -t '|' -k 2 -o "$tmp/file2_sorted" "$tmp/file2_indexed"

#Load 'sorted' indices into array
IFS='|'
cut -d '|' -f 1 "$tmp/file2_sorted" | tr '\n' '|' | endl | while read -a indices; do
   #read -a has to feed a loop here: each part of a pipeline runs in a
   #subshell, so an array filled by a bare "| read -a" would be lost
   
   #Attach indices to file1_sorted
   IFS=""
   i=0
   while read line; do
      echo "${indices[$i]}|$line"
      i=$(( $i+1 ))
   done < "$tmp/file1_sorted" > "$tmp/file1_indexed"
   #Sort it by attached index
   sort -t '|' -k 1 -n -o "$tmp/file1_sorted_final" "$tmp/file1_indexed"
   #Final output
   cut -d '|' -f 2- "$tmp/file1_sorted_final"
   break
done


For 100000 lines it takes 15 s.

I believe translating the script, or parts of it, into awk could speed it up, but I'm too lazy to learn awk :oops:
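
For comparison, the same index / sort / strip idea can be sketched with awk doing the indexing. This is an illustration under the assumption that every file2 line equals the third "|" field of exactly one file1 line, as in the sample data; it is not akash_rawal's script, and the array name pos is made up.

```shell
#!/bin/sh
# Sketch: prefix each file1 line with its position in file2,
# sort numerically on that prefix, then strip the prefix again.
work=$(mktemp -d)
cat > "$work/file1" <<'EOF'
03:50|Artist - Title|001 /path/Artist_Title.mp3
04:16|Bartist - Title|002 /path/Bartist_Title.mp3
03:32|Cartist - Title|003 /path/Cartist_Title.mp3
EOF
cat > "$work/file2" <<'EOF'
002 /path/Bartist_Title.mp3
001 /path/Artist_Title.mp3
003 /path/Cartist_Title.mp3
EOF

result=$(awk -F'|' '
  NR == FNR { pos[$0] = FNR; next }   # file2: remember the position of each line
  { print pos[$3] "|" $0 }            # file1: prefix the line with that position
' "$work/file2" "$work/file1" | sort -t '|' -k 1,1n | cut -d '|' -f 2-)

printf '%s\n' "$result"
rm -rf "$work"
```

No temporary files are needed for the intermediate steps, but the external sort is still there, so it is unlikely to beat a pure lookup.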
rcrsn51


Joined: 05 Sep 2006
Posts: 8555
Location: Stratford, Ontario

PostPosted: Sun 30 Sep 2012, 22:46    Post subject: Re: Bash: sort
Subject description: Is it possible to sort a file based on another file ?
 

zigbert wrote:
How can I sort file1 based on file2? ... at the speed of light

That means coding it in C. See attached.

Code:
# time ./zigsort file1 file2
04:16|Bartist - Title|002 /path/Bartist_Title.mp3
03:50|Artist - Title|001 /path/Artist_Title.mp3
03:32|Cartist - Title|003 /path/Cartist_Title.mp3

real   0m0.001s
user   0m0.000s
sys   0m0.000s

Attachment: zigsort-1.0.tar.gz (gz, 3.3 KB, downloaded 141 times)
technosaurus


Joined: 18 May 2008
Posts: 4134

PostPosted: Sun 30 Sep 2012, 23:33    Post subject:  

I _was_ too lazy to learn awk; now I'm too lazy to write 100 lines of shell to do what 3 lines of awk can do.
The first argument is the unsorted file and the second is the sorted file:
Code:
#!/bin/awk -f
BEGIN{FS="|"}{if($3){d[$3]=$0;}else{print d[$1]}}

in a shell script it would be:
Code:
awk 'BEGIN{FS="|"}{if($3){d[$3]=$0;}else{print d[$1]}}' unsorted_file sorted_file
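
As a worked example, the one-liner can be run on the three sample lines from the first post. This is only a sketch: the temporary directory is an assumption added for the demo.

```shell
#!/bin/sh
# Sketch: run the one-liner above on the sample data from the first post.
work=$(mktemp -d)
cat > "$work/file1" <<'EOF'
03:50|Artist - Title|001 /path/Artist_Title.mp3
04:16|Bartist - Title|002 /path/Bartist_Title.mp3
03:32|Cartist - Title|003 /path/Cartist_Title.mp3
EOF
cat > "$work/file2" <<'EOF'
002 /path/Bartist_Title.mp3
001 /path/Artist_Title.mp3
003 /path/Cartist_Title.mp3
EOF

# file1 lines have a third "|" field, so they are stored under that key;
# file2 lines contain no "|", so $3 is empty and $1 is the whole line (the key).
sorted=$(awk 'BEGIN{FS="|"}{if($3){d[$3]=$0;}else{print d[$1]}}' "$work/file1" "$work/file2")

printf '%s\n' "$sorted"
rm -rf "$work"
```

Note that a file2 line with no matching entry in file1 would print as an empty line, because d[$1] is then unset.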

amigo

Joined: 02 Apr 2007
Posts: 2167

PostPosted: Mon 01 Oct 2012, 03:27    Post subject:  

The suggested 'sort' command seemed to me the best:
"sort -t '|' -k 2 file1"
if that produces the desired result. The OP doesn't state how the order is pre-determined. If the order is completely arbitrary, then one of the other suggestions would be best.
Is the order arbitrary, or is it based on the data in column 2 of file1? Otherwise, how do you *produce* file2?
technosaurus


Joined: 18 May 2008
Posts: 4134

PostPosted: Mon 01 Oct 2012, 06:32    Post subject:  

To me it sounded as if the order is simply the order in which lines appear in the sorted file and has nothing to do with the contents (the sorted file is just the last column of the unsorted file in a user-defined order?). AFAICT everything faster than my awk one-liner (with the exception of the compiled C, which is 40 times larger) sorted by the numeric values instead of by the order in which they appear in the file.
- I was just solving the problem, not the underlying cause (padded zeroes, the sort category at the end of the line instead of the beginning, arbitrary fields, order and names...)

the time was:

    real 0m0.009s
    user 0m0.004s
    sys 0m0.004s


and time shouldn't increase significantly based on file length, since that is about the same time it takes awk to BEGIN{print .}

rcrsn51


Joined: 05 Sep 2006
Posts: 8555
Location: Stratford, Ontario

PostPosted: Mon 01 Oct 2012, 10:09    Post subject:  

technosaurus wrote:
Code:
awk 'BEGIN{FS="|"}{if($3){d[$3]=$0;}else{print d[$1]}}' unsorted_file sorted_file

Clever. And surprisingly fast. With a test set of 999 records, it was 1/3 the speed of zigsort.

I wonder if there is any memory penalty for building an associative array that big?
zigbert


Joined: 29 Mar 2006
Posts: 5562
Location: Valåmoen, Norway

PostPosted: Mon 01 Oct 2012, 12:09    Post subject:  

technosaurus wrote:
Code:
awk 'BEGIN{FS="|"}{if($3){d[$3]=$0;}else{print d[$1]}}' unsorted_file sorted_file
Now we're talking :D


Thanks a lot
Sigmund

rcrsn51


Joined: 05 Sep 2006
Posts: 8555
Location: Stratford, Ontario

PostPosted: Mon 01 Oct 2012, 15:44    Post subject:  

As another test, I generated a data set of 9999 records.
Code:
# time ./zigsort file1 file2 > file3

real   0m0.031s
user   0m0.008s
sys   0m0.020s

# time ./technosort file1 file2 > file3

real   0m0.039s
user   0m0.036s
sys   0m0.000s
#

Technosort has caught up. It's holding all its data in memory so it only needs one pass through the files. Zigsort's need to re-read file1 is slowing it down.
rcrsn51


Joined: 05 Sep 2006
Posts: 8555
Location: Stratford, Ontario

PostPosted: Mon 01 Oct 2012, 16:28    Post subject:  

But if I modify Zigsort to hold all its data internally, it yields
Code:
# time ./zigsort file1 file2 > file3

real   0m0.011s
user   0m0.004s
sys   0m0.004s
jamesbond

Joined: 26 Feb 2007
Posts: 1875
Location: The Blue Marble

PostPosted: Tue 02 Oct 2012, 10:03    Post subject:  

My entry. It doesn't assume file1 is already sorted; it matches "012" from file1 exactly with "012" from file2.

Code:
#!/bin/ash

ENTRIES=10000
FILE1=/tmp/file1
FILE2=/tmp/file2
OUTFILE=/tmp/outfile

generate_file1() {
   for a in $(seq 1 $ENTRIES); do
      printf "03:50|Artist - Title|%.3d /path/Artist_Title.mp3\n" $a
   done > $FILE1
}

generate_file2() {
   for a in $(seq 1 $ENTRIES); do
      printf "%.3d /path/Artist_Title.mp3\n" $a
   done | sort -R > $FILE2
}

# generate fake data for testing
generate_file1
generate_file2

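time -p -- awk -F"|" '
NR > FNR {
   # sort: look up each file2 key in the order it appears
   # (assigning FS here would only take effect from the next record,
   # so the key is cut out of $0 instead)
   key=$0
   sub(/ .*/,"",key)
   print file1[key]
   next
}
{
   # scan: index each file1 line by the number at the start of field 3
   line=$0
   sub(/ .*/,"",$3)
   file1[$3]=line
}
' $FILE1 $FILE2 > $OUTFILE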
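
One way to sanity-check the output of any of the solutions in this thread is to compare the third field of the result with file2, line for line. A minimal sketch, assuming (as in the generated test data) that field 3 of each file1 line is identical to one file2 line; the variable name status and the tiny sample files are made up for the demo.

```shell
#!/bin/sh
# Sketch: check that a reordered file follows the order given in file2.
work=$(mktemp -d)
cat > "$work/file2" <<'EOF'
002 /path/Artist_Title.mp3
001 /path/Artist_Title.mp3
EOF
cat > "$work/outfile" <<'EOF'
03:50|Artist - Title|002 /path/Artist_Title.mp3
03:50|Artist - Title|001 /path/Artist_Title.mp3
EOF

# Field 3 of every output line should reproduce file2 exactly;
# a blank or misplaced line makes cmp report a difference.
if cut -d '|' -f 3 "$work/outfile" | cmp -s - "$work/file2"; then
   status="order matches"
else
   status="order differs"
fi

echo "$status"
rm -rf "$work"
```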

_________________
Fatdog64, Slacko and Puppeee user. Puppy user since 2.13