(OLD) (ARCHIVED) Puppy Linux Discussion Forum
Puppy HOME page : puppylinux.com
"THE" alternative forum : puppylinux.info

This forum can also be accessed as http://oldforum.puppylinux.com
It is now read-only and serves only as archives.

Please register over the NEW forum
https://forum.puppylinux.com
and continue your work there. Thank you.

 

All times are UTC - 4
 Forum index » House Training » Beginners Help ( Start Here)
How can I sort Duplicate Files?
Moderators: Flash, Ian, JohnMurga
This forum is locked: you cannot post, reply to, or edit topics.   This topic is locked: you cannot edit posts or make replies.
Page 1 of 2 [16 Posts]   Goto page: 1, 2 Next
Author Message
p310don

Joined: 19 May 2009
Posts: 1502
Location: Brisbane, Australia

Posted: Thu 07 May 2015, 18:04    Post subject:  How can I sort Duplicate Files?

I have a number of hard drives in my PC.

As one gets full, I buy another, bigger hard drive and put it in. Often, I'll copy some things from the old to the new so there is a backup. However, I am realising that I have done this with every new hard drive, and probably have 6 copies of the same file across 6 hard drives. It seems I also have multiple copies on the same hard drive. All in all it is a complete waste of space.

Are there any easy-to-use programs for Puppy that can help me sort out the mess I have?
Ted Dog


Joined: 13 Sep 2005
Posts: 4013
Location: Heart of Texas

Posted: Thu 07 May 2015, 18:31    Post subject:  

Are we related, or twins separated at birth? That is what I do too. There are two software packages to choose from. I did not have this issue before moving to newer hardware that did not have a DVD drive for multisession backups.
What version of Puppy Linux are you running? Would you mind running an older version, since the tools are not easily located for newer versions?
Also, expect this to take days if you are into the terabyte range, plus a few hours deleting duplicates after the full scans finish. Wink But it's better than hunting and pecking around. If you mix Windows drives into the scans, be aware that Windows contains thousands of identical files under different names and subfolders. So if you have room, maybe just move the files from your personal area onto a different drive and ignore the Windows system files as a whole.
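The "skip the Windows system files" idea above can be sketched with a plain find command. The mount point and pruned directory names below are illustrative, not from the post:

```shell
#!/bin/sh
# List candidate files on a mixed drive while skipping the Windows
# system tree, as suggested above. /mnt/sdb1 and the directory names
# are illustrative; adjust them to your own mounts.
find /mnt/sdb1 \
  \( -iname WINDOWS -o -iname 'Program Files' \) -prune \
  -o -type f -print
```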
Ted Dog


Joined: 13 Sep 2005
Posts: 4013
Location: Heart of Texas

Posted: Thu 07 May 2015, 18:46    Post subject:  

Also, have as many of the hard drives connected at the same time as you can, since duplicate removal keeps all but one copy. Likewise, have the largest drive as sda and then step down by size, since that speeds up selection and file removal, assuming you want to keep each file on the drive where the other non-duplicated files are. After removing duplicates I then copy the non-duplicates from the next-largest drive onto the largest, and systematically free up all the space on the smaller drives this way. I find a whole bunch of freed-up smaller hard drives more useful for other stuff, since I use small-capacity hard drives where most people waste money and time on USB flash.
That also reminds me: you should scan all USB hard drives at the same time too.
p310don

Joined: 19 May 2009
Posts: 1502
Location: Brisbane, Australia

Posted: Thu 07 May 2015, 19:24    Post subject:  

Quote:
there are two software packages to choose from


And they are...?

Quote:
Also expect this to take days if you are into the TByte range and a few hours deleting duplicates after it finishes full scans


Yup, I have a total of 12 terabytes of storage Surprised Will take a while, but will free up a load of space.

So far, a basic test I've done is to load the folder /mnt as a playlist into Clementine. That loads all the music and videos, even though it doesn't play the videos. I can see each duplicate of the media files, which take up the most space.

For example, it seems I have 5 copies of the movie Dances With Wolves existing on my PC. Seems wasteful.

The Puppy I use is almost irrelevant; I am happy to boot into any of them. I use Saluki 021 daily, but have about 10 variants installed, and can frugal-install anything else too.
Ted Dog


Joined: 13 Sep 2005
Posts: 4013
Location: Heart of Texas

Posted: Thu 07 May 2015, 20:40    Post subject:  

OK, great. Just do a live CD/DVD boot into an older version of Puppy Linux; I will look for those programs and get back to you with links. You may wish to get a UPS, since this could take a month Confused
Also, if you can bulk up your RAM you will get faster matches; with a month of scanning, that slight improvement in speed couldn't hurt.

Sorry, I can't recall the names of those two; both are poorly named and have not been updated in years (that is why an older version of Puppy Linux works better: newer lib modules are not as compatible).

I go through this every other year or so, so I will have to dig around a bit.

Last edited by Ted Dog on Thu 07 May 2015, 20:47; edited 1 time in total
starhawk

Joined: 22 Nov 2010
Posts: 5056
Location: Everybody knows this is nowhere...

Posted: Thu 07 May 2015, 20:43    Post subject:  

p310don wrote:
I have a total of 12 terabytes of storage.


Holy crap, you could open a decently sized Wal*Mart in that kind of space.

Seriously, dude, that's a small datacenter you've got going there. You don't run a hosting service, do you? Razz

...naaaaaaah, just ribbin' ya a little. Don't mind me...

slavvo67

Joined: 12 Oct 2012
Posts: 1625
Location: The other Mr. 305

Posted: Thu 07 May 2015, 21:09    Post subject:  

I have a script that I'd be happy to share. The problem is that it doesn't let you choose which copy to keep. So let's say I renamed one file to the actual name I want, like invest1.doc, and I run the script; I might end up with 1gg67-94873.doc, which would be the exact same file but with a name I do not wish to keep. If that's not an issue, I'll send it over.

It runs in the terminal only, for speed.
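For anyone curious what such a terminal-only duplicate finder can look like, here is a minimal sketch along the same lines. This is an illustrative one-liner, not slavvo67's actual script, and it needs GNU uniq for the -w/--all-repeated options:

```shell
#!/bin/sh
# Checksum every file under the given directory (default: current dir)
# and print groups of files whose MD5 sums match, one blank line
# between groups. Illustrative sketch only -- not slavvo67's script.
# uniq -w32 compares only the 32-character checksum column.
find "${1:-.}" -type f -exec md5sum {} + \
  | sort \
  | uniq -w32 --all-repeated=separate
```

Like the script described above, it reports duplicates but does not pick which copy to keep; that part is left to the user.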
p310don

Joined: 19 May 2009
Posts: 1502
Location: Brisbane, Australia

Posted: Thu 07 May 2015, 21:18    Post subject:  

I have found a program called dupeGuru which apparently works on Linux, but also in Windows. If anyone can compile it for Puppy, that would be cool. I've tried, but it turns out I can't compile anything useful.

I have booted WinXP in VirtualBox and run it on a shared hard drive. After 2 hours it has found 3875 duplicate files totalling 66 GB. It takes a long time. I have a number of PCs; my MO is to turn them all on and have each PC work on a different drive to share the load.

It is actually too hard to go through the duplicates when there are too many.

I'm finding things like
/backup/123/xyz.jpg
/123/xyz.jpg
xyz.jpg
/holiday/xyz.jpg

and so on. Frustrating. I really need to spend about a year or so fixing this....
Ted Dog


Joined: 13 Sep 2005
Posts: 4013
Location: Heart of Texas

Posted: Thu 07 May 2015, 21:24    Post subject:  

Compiled and tweaked code will perform much better than a script. Thanks, but a dozen terabytes of data screams for highly tuned, fast code. The compiled code is tiny anyway, only a few hundred KB.

Could you place the hard drives in a single motherboard? The programs pull data in parallel as fast as the CPUs can process it, so a multicore machine with lots of SATA ports is a bonus.

The setup you described would actually be the slowest; shared drives have a lot of overhead. The problem is not the CPU but data flow: a CPU can keep up with 6 to 8 hard drives. The USB drives will be slower, but with the CPU processing data from directly connected SATA drives, the beginning may be slow; as the smaller drives finish being read they no longer take any time, until the bulk deletes, which are FAST.
Ted Dog


Joined: 13 Sep 2005
Posts: 4013
Location: Heart of Texas

Posted: Thu 07 May 2015, 21:53    Post subject:  

How they work: the software reads all the files from all the drives and sorts them by size, then processes the largest files first, comparing and constructing checksums as it works through buffers full of data in RAM. Data is continually added, and the CPUs churn away, holding as much as they can in RAM.

Now, if you can wait, let me find that software. Linux, and especially Puppy Linux, will make this the fastest, since it also has little to no fluff; being able to run in RAM frees up all temp writing to the OS hard drive, thereby keeping the drives in read mode.
Also, if you only need to remove duplicate video and music files, you can set it to process only files larger than, say, 1 GB. That could be processed within a day or so and give more realistic bang for your time. Dances With Wolves is a long movie and a good size to hunt for duplicates.
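The size-first idea described above can be sketched in plain shell: only files that share an exact byte size can possibly be duplicates, so everything else is skipped before any checksumming. The mount point and the 1 GB threshold below are illustrative, and the awk/uniq plumbing is my own sketch, not the tool Ted Dog means:

```shell
#!/bin/sh
# Size-first duplicate scan: list big files with their byte sizes, keep
# only sizes that occur more than once, then checksum just those
# candidates. /mnt and the +1G threshold are illustrative.
find /mnt -type f -size +1G -printf '%s\t%p\n' \
  | sort -n \
  | awk -F'\t' '{ count[$1]++; line[NR] = $0 }
      END { for (i = 1; i <= NR; i++) {
              split(line[i], f, "\t")
              if (count[f[1]] > 1) print f[2] } }' \
  | while IFS= read -r path; do md5sum "$path"; done \
  | sort | uniq -w32 --all-repeated=separate
```

Two same-size files with different contents survive the size pass but are separated by the final checksum comparison, which is the same two-stage behaviour described above.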
Ted Dog


Joined: 13 Sep 2005
Posts: 4013
Location: Heart of Texas

Posted: Thu 07 May 2015, 22:51    Post subject:  

Found it in an archive of murga-linux that is not indexed by the current search, so I bumped it from oblivion:

http://murga-linux.com/puppy/viewtopic.php?p=844312#844312

This is what I use, and this link shows how to update the missing lib symlink.
Ted Dog


Joined: 13 Sep 2005
Posts: 4013
Location: Heart of Texas

Posted: Thu 07 May 2015, 22:53    Post subject:  

The other is fdups, which I could not find using the search here.
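If "fdups" is the tool more commonly packaged as fdupes (an assumption on my part, not confirmed by the post), basic usage looks like this; the mount points are illustrative:

```shell
# Assuming "fdups" refers to the widely packaged fdupes utility --
# an assumption, not confirmed above. Paths are illustrative.
fdupes -r -S /mnt/sda1 /mnt/sdb1   # recursively list duplicate sets with sizes
fdupes -r -d /mnt/sda1             # prompt for which copy to keep, delete the rest
```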
p310don

Joined: 19 May 2009
Posts: 1502
Location: Brisbane, Australia

Posted: Thu 07 May 2015, 22:58    Post subject:  

Ted Dog, that might be a good program.

The link you used doesn't have the attachment any more, but a quick Google search found it here:
http://murga-linux.com/puppy/viewtopic.php?t=89334

Will see how I go
SeeSpotRun

Joined: 13 Jan 2015
Posts: 2

Posted: Fri 08 May 2015, 17:21    Post subject:  

Try rmlint, which is designed with just your case in mind. In particular, it reads multiple disks in parallel for improved speed, and it has some nice options to help decide which file to keep as the "original" out of any set of duplicates.
There is currently no PET file for rmlint; you can either compile it yourself or, if you're not comfortable with that, I'll see if I can make a PET file.
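A minimal rmlint invocation for the multi-drive case might look like the following. The paths are illustrative, and the "//" tagging and --keep-all-tagged flag are from rmlint's own option set as I understand it; check rmlint's documentation before running the generated script:

```shell
# Sketch of an rmlint run over two drives. Paths after the "//"
# separator are tagged as preferred "originals"; with --keep-all-tagged
# duplicates are only marked for removal elsewhere. rmlint writes a
# removal script (rmlint.sh) rather than deleting directly.
rmlint --keep-all-tagged /mnt/sdb1 // /mnt/sda1
# Inspect the generated script first, then run it to delete:
# sh ./rmlint.sh
```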
p310don

Joined: 19 May 2009
Posts: 1502
Location: Brisbane, Australia

Posted: Sat 09 May 2015, 01:01    Post subject:  

I have been using dupfinder as recommended by Ted Dog. So far I have cleaned one directory and its subdirectories, and I have gone from 178 GB of free space on the drive to 261 GB. Nice.


Powered by phpBB © 2001, 2005 phpBB Group