Pbackup needs GURU-help

A home for all kinds of Puppy related projects
zigbert
Posts: 6621
Joined: Wed 29 Mar 2006, 18:13
Location: Valåmoen, Norway

Pbackup needs GURU-help

#1 Post by zigbert »

If Pbackup is to continue to develop, it will push my skills far beyond my existing universe. If you see the point of pushing a small backup system further, please contribute with your knowledge. There are mainly 3 areas I see as natural to include.

1. Sync tool
Pbackup today has a tool to keep 'destination' equal to 'source' (Mirror). If I understand it right, a sync tool should work both ways. It should also handle the files in the destination. I looked at rsync, and as far as I understand, the only option is to not act on new files in the destination. Is this synchronisation???

It would be rather easy to offer options for destination files created since the last mirror:
- Do not act on files (default)
- Copy files back to source
- Delete files

Would this make a sync tool? If you have experience with syncing, let me hear from you.
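The three policies above could be dispatched with a small case statement. A minimal sketch with purely illustrative names (MODE, SRC, DST and handle_dest_file are not Pbackup's actual variables or functions):

```shell
# Toy setup: a source and a destination with one file created in the
# destination since the last mirror. All names are illustrative.
SRC=$(mktemp -d); DST=$(mktemp -d)
echo "new" > "$DST/new_in_dest.txt"

# Dispatch one of the three destination-file policies.
handle_dest_file() {   # $1 = policy, $2 = filename relative to $DST
    case "$1" in
        ignore)   ;;                            # default: do not act
        copyback) cp "$DST/$2" "$SRC/$2" ;;     # copy back to source
        delete)   rm "$DST/$2" ;;               # delete from destination
    esac
}

handle_dest_file copyback new_in_dest.txt
ls "$SRC"
```

With the `copyback` policy the destination-only file reappears in the source; `ignore` and `delete` fall out of the same dispatcher.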

2. Make recovery disk
There have been some wishes for a recovery function. I thought it could be doable with the following steps:
- Burn the backup on a bootable disk.
- Boot an absolute mini linux.
- Drop to terminal.
- Autostart a small script giving the choice to copy data back to a chosen disk.

Do any presets of such a boot image exist? Do any of you gurus have knowledge of this?
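The restore step in the list above might reduce to very little code. A toy demo where temp directories stand in for the backup medium and the chosen disk (restore_backup is a hypothetical name, not part of any existing tool):

```shell
# Toy demo: restore a "backup" to a chosen target directory.
BACKUP_DIR=$(mktemp -d); TARGET=$(mktemp -d)
echo "hello" > "$BACKUP_DIR/file.txt"

restore_backup() {   # $1 = backup dir, $2 = target disk mount point
    cp -a "$1/." "$2/"   # preserve attributes while copying everything
}

restore_backup "$BACKUP_DIR" "$TARGET"
cat "$TARGET/file.txt"
```

On a real rescue disk, $2 would be the mount point of the disk the user picked at the prompt.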

3. Backup over net
I do not know anything about networking, and I do not have any clue how to start.
- How would you set up a copy to FTP?
- When using ssh, are remote disks mounted like a local disk?

Sigmund

HairyWill
Posts: 2928
Joined: Fri 26 May 2006, 23:29
Location: Southampton, UK

#2 Post by HairyWill »

I'm 75% of the way to addressing point 3.
I have a script that runs every night and copies some of my 20 GB photo collection to my webspace using curlftpfs. My bandwidth is unmetered overnight, so each night the script fires up and copies a few more gigabytes.

At the moment it compares file sizes reported by ls -l to decide whether to copy. If the file sizes differ, it renames the remote file with a datetime-based extension and copies the new file in its place. It optimises by comparing whole directories at a time and only checks individual files if the directories differ.
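That directory-level shortcut could be sketched like this: build a per-directory summary of file sizes and names, and only descend to individual files when the summaries differ (dir_summary is an illustrative helper, not pftpcopy's actual code):

```shell
# Two directories with identical contents; their summaries should match.
A=$(mktemp -d); B=$(mktemp -d)
echo data > "$A/f"; cp "$A/f" "$B/f"

# One line per file: "size name", sorted so the order is stable.
dir_summary() { find "$1" -type f -printf '%s %f\n' | sort; }

if [ "$(dir_summary "$A")" = "$(dir_summary "$B")" ]; then
    echo "directories match, skip per-file checks"
else
    echo "summaries differ, compare individual files"
fi
```

One round-trip per directory instead of one per file is what keeps the nightly scan cheap when little has changed.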

I will have this looking neater in a couple of days and will post it then.
Will
contribute: [url=http://www.puppylinux.org]community website[/url], [url=http://tinyurl.com/6c3nm6]screenshots[/url], [url=http://tinyurl.com/6j2gbz]puplets[/url], [url=http://tinyurl.com/57gykn]wiki[/url], [url=http://tinyurl.com/5dgr83]rss[/url]

zigbert
Posts: 6621
Joined: Wed 29 Mar 2006, 18:13
Location: Valåmoen, Norway

#3 Post by zigbert »

:D Looking forward to it.

HairyWill
Posts: 2928
Joined: Fri 26 May 2006, 23:29
Location: Southampton, UK

#4 Post by HairyWill »

OK, here it is.
http://users.ecs.soton.ac.uk/~wmd04r/puppy/pftpcopy-0.1
pftpcopy
This script can be used to mirror(ish) a local directory to a remote ftp server. It is designed to be restartable. When it starts, it scans directories comparing file sizes; where there are differences, or files that are not on the remote, they are copied. No files are ever deleted. If a file's size has changed, the remote version has a timestamp appended to its name before the new version is copied over.
It currently reveals the password to your ftp account in the process list; the curlftpfs website has a suggestion to avoid this.
It would probably be good to compare files using more than just size (last-saved time would be good).
The fourth parameter, minutes_to_run, allows you to specify when you want the transfer to terminate. This is useful for me as my bandwidth is free overnight. On termination the script unmounts the curlftpfs connection. If you hit CTRL-C, the program will finish transferring the current file (avoiding leaving a mess on the server) before stopping, but won't unmount the curlftpfs mount.
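The CTRL-C behaviour described (finish the current file, then stop) is typically done with a trap that only raises a flag. A sketch of the idea, not pftpcopy's actual code:

```shell
STOP=0
trap 'STOP=1' INT      # on CTRL-C, just raise a flag; don't kill mid-copy

copy_one() { echo "copied $1"; }   # stand-in for the real transfer

for f in a b c; do
    copy_one "$f"                  # the current file always completes
    [ "$STOP" -eq 1 ] && break     # then check the flag and stop cleanly
done
echo "all files handled"
```

Because the signal only flips a variable, the copy in flight is never interrupted halfway, which is what avoids leaving a partial file on the server.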

Usage: pftpcopy ftp_connection_string local_directory remote_directory minutes_to_run

example: pftpcopy bob:bobspassword@ftp.test.org /root/my-pictures my-pictures 60

You will need curlftpfs. Install both of Gekko's dotpups from here:
http://murga-linux.com/puppy/viewtopic. ... 4776#94776

Here's a link to the reason why I wrote this.
http://www.murga-linux.com/puppy/viewto ... 322#111322
Will

Lobster
Official Crustacean
Posts: 15522
Joined: Wed 04 May 2005, 06:06
Location: Paradox Realm

#5 Post by Lobster »

:) Very interesting and very useful

DSL offers online storage
and I wonder if this or something similar will develop such a potential?

Do you plan a front end? Is there a command-line ftp facility in Puppy? I know part of curl is in Puppy (or will be).

Is this a Plan?:

Universal Backup
  • Back up (choose files / directories)
  • Auto backup
  • Back up to (HD / USB device / Puppy server / other server)
:)
Puppy Raspup 8.2Final 8)
Puppy Links Page http://www.smokey01.com/bruceb/puppy.html :D

zigbert
Posts: 6621
Joined: Wed 29 Mar 2006, 18:13
Location: Valåmoen, Norway

#6 Post by zigbert »

:D This looks interesting. I really have to look at this one. Thanks a lot to HairyWill and Gekko.

Sigmund

HairyWill
Posts: 2928
Joined: Fri 26 May 2006, 23:29
Location: Southampton, UK

#7 Post by HairyWill »

zigbert I'm glad you're interested.
Lobster wrote:Do you plan a front end? Is there a command line ftp facility in Puppy?
There is no command-line ftp in Puppy. I'd rather let curlftpfs handle that layer, as it has inbuilt support for maintaining/resuming connections, and then I don't have to deal with scary ftp stuff.

Why would I spoil a perfectly good command line program by building a GUI?
No, to be honest, zigbert expressed a need, I had the code already, and it wanted to be free. If he wants to incorporate it into Pbackup then that's great, though I'd imagine he doesn't really want the dependencies. The curl libraries dotpup is over 3 MB, so it's not cheap.

The program could still do with a lot of fleshing out. I wanted to see if anyone else was interested in it.

One area in particular that needs improvement is the file comparison; just using file sizes is not really good enough. I am considering creating a hidden directory .pftpcopy in each directory on the server; this will contain the md5sums of each file in the parent directory. This way it can compare md5s without having to download the entire file from the server.

The algorithm goes like this.

Code: Select all

if file doesn't exist on server
     copy file and md5 to server
else
     compare md5 of local file with the stored md5 on server
     if md5s differ
          rename remote file
          rename remote md5
          copy new version of file and new md5 to server
     fi
fi
It will need some consistency checks in case there are somehow files without md5s, and vice-versa, on the server.
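A runnable sketch of that algorithm, with two local temp directories standing in for the local tree and the curlftpfs-mounted server (curlftpfs presents the server as an ordinary filesystem, so plain cp/mv apply). sync_file and the directory layout are illustrative, not pftpcopy's actual code:

```shell
# LOCAL stands in for the local tree, REMOTE for the mounted ftp server.
LOCAL=$(mktemp -d); REMOTE=$(mktemp -d)
mkdir -p "$REMOTE/.pftpcopy"
echo "v1" > "$LOCAL/foo"

sync_file() {   # $1 = filename
    local sum=$(md5sum "$LOCAL/$1" | cut -d ' ' -f 1)
    if [ ! -f "$REMOTE/$1" ]; then
        cp "$LOCAL/$1" "$REMOTE/$1"                  # copy file and md5
        echo "$sum" > "$REMOTE/.pftpcopy/$1.md5"
    elif [ "$sum" != "$(cat "$REMOTE/.pftpcopy/$1.md5")" ]; then
        stamp=$(date +%Y%m%d%H%M%S)
        mv "$REMOTE/$1" "$REMOTE/$1.$stamp"          # rename remote file
        mv "$REMOTE/.pftpcopy/$1.md5" "$REMOTE/.pftpcopy/$1.$stamp.md5"
        cp "$LOCAL/$1" "$REMOTE/$1"                  # copy new version
        echo "$sum" > "$REMOTE/.pftpcopy/$1.md5"
    fi
}

sync_file foo          # first run: plain copy plus stored md5
echo "v2" > "$LOCAL/foo"
sync_file foo          # second run: rename old version, copy new
ls "$REMOTE"
```

Only the small .md5 files ever need to be read back from the server, which is the whole point of storing the hashes remotely.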

The busybox cp in Puppy seems to support the --backup=numbered switch (even though it isn't documented), so I might start using that format for file versioning:
foo
foo.~1~
foo.~2~
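That naming is easy to check with GNU cp; busybox's undocumented support may behave differently, so treat this as a GNU coreutils demo:

```shell
D=$(mktemp -d); cd "$D"
echo 1 > foo             # existing version
echo 2 > foo.new         # incoming version
cp --backup=numbered foo.new foo   # old foo is kept as foo.~1~
ls
```

After the copy, foo holds the new contents and foo.~1~ the previous version; a second run would produce foo.~2~, and so on.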

If anyone has any suggestions or requests I'd be very interested.
Will

zigbert
Posts: 6621
Joined: Wed 29 Mar 2006, 18:13
Location: Valåmoen, Norway

#8 Post by zigbert »

HairyWill wrote:One area in particular that needs improvement is the file comparison, just using filesizes is not really good enough.
For Pbackup 2.2 I have once again dropped mirdir (too inflexible) and rewritten the internal mirror function. It now scans very fast and compares several attributes (size, time, user, group, permissions, link status). It's all VERY simple:

1. Make a text file with info on the files in source and the files in mirror.

Code: Select all

find / -printf "%p|%s|%u|%g|%m|%l\n" >> /tmpfile
Output looks like this.

Code: Select all

/root/my-documents/tmp|31|root|root|755|
/root/my-documents/tmp/README-tmp.txt|119|root|root|644|
2. Check the differences between source and mirror.

Code: Select all

diff /source /mirror > /tmpfile
Output looks like this.

Code: Select all

1,5c1,57
< /root/my-documents/tmp|31|root|root|755|
< /root/my-documents/tmp/README-tmp.txt|119|root|root|644|
---
> /root/my-applications/bin/unrpm|41|root|root|755|
> /root/my-applications/bin/undeb|35|root|root|755|
3. Divide the output (the differences) into 2 new text files:
A. Lines that are in source but not in mirror: the files are updated or new --> copy to mirror.
B. Lines that are in mirror but not in source: the files are moved or deleted --> remove/rename...

Code: Select all

sed -i -e "s/ /{SPACE}/g" /tmpfile
for I in `cat /tmpfile`; do
	TMP=`echo $I | grep "^<{SPACE}/"` #files only in source
	if test "$TMP"; then
		echo "$TMP" | sed "s/^<{SPACE}//" | cut -d "|" -f 1 >> /copyfiles
	fi
	TMP=`echo $I | grep "^>{SPACE}/"` #files only in mirror
	if test "$TMP"; then
		echo "$TMP" | sed "s/^>{SPACE}//" | cut -d "|" -f 1 >> /renamefiles
	fi
done
sed -i -e "s/{SPACE}/ /g" /copyfiles
sed -i -e "s/{SPACE}/ /g" /renamefiles
Now you have 2 nice lists of files to be copied and renamed. Note that the diff marker ("< " or "> ") must be stripped so the lists hold clean paths. Be aware that the for-loop uses spaces as delimiters; that's why sed temporarily converts them to {SPACE}.
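A hedged alternative to the {SPACE} round-trip: a while read loop splits on newlines only, so spaces in filenames survive untouched. The paths and file locations below are illustrative:

```shell
# Fake diff output with a space in one filename.
DIR=$(mktemp -d)
printf '%s\n' \
  '< /root/my docs/a.txt|31|root|root|755|' \
  '> /root/other/b.txt|35|root|root|755|' > "$DIR/difflist"

: > "$DIR/copyfiles"; : > "$DIR/renamefiles"
while IFS= read -r line; do
    case "$line" in
        "< /"*) echo "${line#< }" | cut -d '|' -f 1 >> "$DIR/copyfiles" ;;   # only in source
        "> /"*) echo "${line#> }" | cut -d '|' -f 1 >> "$DIR/renamefiles" ;; # only in mirror
    esac
done < "$DIR/difflist"
cat "$DIR/copyfiles"
```

Reading line-by-line also means the diff marker can be stripped with plain parameter expansion instead of a second sed pass.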
=======================================
HairyWill wrote:zigbert expressed a need, I had the code already and it wanted to be free. If he wants to incoporate it into pbackup then that's great, though I'd imagine he doesn't really want the dependencies. The curl libraries dotpup is over 3MB so its not cheap.
Absolutely right. Pbackup is now 17 kb compressed, and the intention is to keep Puppy small, isn't it? But I wonder, what is the difference between the curl in Puppy and Gekko's package?
The program could still do with a lot of fleshing out.
I guess, and hope, you are thinking of using Puppy's curl. :? :D

Sigmund

HairyWill
Posts: 2928
Joined: Fri 26 May 2006, 23:29
Location: Southampton, UK

#9 Post by HairyWill »

good news :D
The curl libraries in puppy214 seem to be sufficient, though they are not present in 215 :-(
All that is needed above the standard 214 install is the curlftpfs binary, which is 78k uncompressed. OK, it's bigger than the whole of Pbackup, but it looks a lot more reasonable. Still not sure if Barry would buy it for default inclusion; this seems a bit of a niche market. Maybe it could be an installable extension.

Your comparison code is excellent and your coding much neater than mine.
I liked the idea of doing comparisons on md5s, but on balance it is probably better not to write extra metadata to the server.

There is a significant lag in scanning for changes on my current setup, 30-60 seconds. I have ADSL, 512K up / 8MB down, and the server holds 1600 files in a 4-tier directory structure. My final use will involve 20 times as many files; a 10-20 minute startup time before any copying gets done is not good. This would be acceptable while I know there are just a few changes, but initially there are whole leaf directories to copy, and it would be more efficient to jump straight to those.

:idea: Ahh. I suppose it could persist the change list from one session to the next. I'm not sure that it is safe to assume that the list is 100% accurate but it could be used to home in on particular areas that definitely need copying. Maybe it could keep a logfile of any directories that were copied completely and not scan these whilst there were still others known to be incomplete.

will code some more tomorrow
must sleep :)
Will

HairyWill
Posts: 2928
Joined: Fri 26 May 2006, 23:29
Location: Southampton, UK

#10 Post by HairyWill »

problem

I don't think it is possible to guarantee much control over file attributes using ftp. Using cp -p doesn't work, at least on the ftp server I am using. I may still have to maintain some extra file metadata in a separate file.
Will

zigbert
Posts: 6621
Joined: Wed 29 Mar 2006, 18:13
Location: Valåmoen, Norway

#11 Post by zigbert »

HairyWill wrote: I don't think it is possible to guarantee much control over file attributes using ftp. Using cp -p doesn't work, at least on the ftp server I am using. I may still have to maintain some extra file metadata in a separate file.
It may be solved with the 'cmp' command.
It compares 2 files byte for byte, and doesn't bother about attributes.

Sigmund

HairyWill
Posts: 2928
Joined: Fri 26 May 2006, 23:29
Location: Southampton, UK

#12 Post by HairyWill »

Thanks
zigbert wrote:It may be solved with the 'cmp' command.
Ah, but that is the architectural problem: to run cmp would require downloading the file from the file server.
On ADSL the download speed may be an order of magnitude greater than the upload speed, but having to download the file to do the comparison is still a significant problem.

rsync solves this problem by being able to calculate MD5s (or something similar, I think) on the server. To perform a comparison, only the hashes need to be transferred. rsync is not an option here, as it requires either an rsync server or a shell on the remote machine; all we have is ftp.

As I see it the options are:
1) Just do comparison based on size reported by ls -l
This works well for my application of creating a copy of all my digital photos as I don't edit the files ever.

2) Store file metadata locally
Only works if there is only ever one client that uploads to the ftp server.
The ftp server only holds the actual data you want to store.

3) Store file metadata on the server
Would allow more than one client to be able to upload (not sure this is sensible anyway as it is supposed to be a backup)
This makes the client stateless (apart from the rather large consideration that the client holds the data that needs copying)

I'm envisioning something similar to CVS or SVN, where each directory contains a .pftpcopy directory holding the metadata. Storing an MD5 hash seems preferable to the last-saved date, as it actually analyses the file's contents. It is probably important that the metadata files are small and are written and closed as soon as possible after a file copy is made; this should help to ensure consistency.

Personally I'd rather not store the metadata locally, as the network share I'm copying is mounted read-only. Though I suppose it could be down to user choice.

There are some interesting architectural choices here, I'm not sure how many people will use this.
Will

zigbert
Posts: 6621
Joined: Wed 29 Mar 2006, 18:13
Location: Valåmoen, Norway

#13 Post by zigbert »

Yes, I see. 'cmp' just dropped into my mind, and I didn't think it all through.
Headfound (forum name) reported trouble when using Pbackup 2.0.0 to sync across different filesystems. I have now solved that nearly as your option 1 (ls -l). Instead of checking "all" attributes (find / -printf "%p|%s|%u|%g|%m|%l\n"), the user has the option to only check size (find / -printf "%p|%s\n"). Of course this is not 100 percent, and I wonder if cmp could be an answer. But on the other hand, I guess speed matters, and cmp on heavy data doesn't sound that good.

Hmmm! :roll:
