Puppy Linux Discussion Forum Forum Index Puppy Linux Discussion Forum
Puppy HOME page : puppylinux.com
"THE" alternative forum : puppylinux.info
 
Pbackup needs GURU-help
Page 1 of 1 [13 Posts]  
zigbert


Joined: 29 Mar 2006
Posts: 5753
Location: Valåmoen, Norway

Posted: Mon 16 Apr 2007, 09:37    Post subject: Pbackup needs GURU-help
Subject description: Sync, Recovery, FTP
 

If Pbackup is to continue to develop, it will push my skills far beyond my existing universe. If you see the point of pushing a small backup system further, please contribute your knowledge. There are mainly 3 areas I see as natural to include.

1. Sync tool
Pbackup today has a tool to keep 'destination' equal to 'source' (Mirror). If I understand it right, a sync tool should work both ways: it should also handle the files in the destination. I looked at rsync, and as far as I understand, the only option is to not act on new files in the destination. Is this synchronisation???

It would be rather easy to offer options for destination files created since the last mirror:
- Do not act on files (default)
- Copy files back to source
- Delete files

Would this make a sync tool? If you have experience with syncing, let me hear from you.
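If it helps the discussion, the three destination-file options above could be sketched in plain shell roughly like this (a toy sketch with made-up paths under /tmp, not Pbackup code; the demo file is created by the script itself):

```shell
#!/bin/sh
# Sketch: handle files created in DEST since the last mirror run.
# Hypothetical paths, not Pbackup code.
SRC=/tmp/sync_src; DEST=/tmp/sync_dest; STAMP=/tmp/sync_stamp
MODE=copyback                    # one of: ignore | copyback | delete

rm -rf "$SRC" "$DEST" "$STAMP"   # demo setup: start from a clean slate
mkdir -p "$SRC" "$DEST"
touch -t 200001010000 "$STAMP"   # pretend the last mirror ran long ago
echo "hello" > "$DEST/new.txt"   # demo: a file that appeared in DEST since then

# act on every file created in DEST after the last mirror run
find "$DEST" -type f -newer "$STAMP" | while read F; do
    REL=${F#$DEST/}
    case $MODE in
        ignore)   ;;                                   # default: leave it alone
        copyback) mkdir -p "$SRC/$(dirname "$REL")"
                  cp -p "$F" "$SRC/$REL" ;;            # bring it back to source
        delete)   rm -f "$F" ;;                        # keep DEST a strict mirror
    esac
done
touch "$STAMP"                   # remember this run for next time
```

With MODE=copyback the demo file ends up back in the source tree; the other two modes just need the one-line change.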

2. Make recovery disk
There have been some wishes for a recovery function. I think it could be doable with the following steps:
- Burn the backup on a bootable disk.
- Boot an absolutely minimal Linux.
- Drop to terminal.
- Autostart a small script giving the choice to copy data back to a chosen disk.

Do any presets of such a boot image exist? Do any of you gurus have knowledge of this?

3. Backup over net
I do not know anything about networking, and I have no clue where to start.
- How would you set up a copy to FTP?
- When using ssh, are remote disks mounted like a local disk?

Sigmund
HairyWill


Joined: 26 May 2006
Posts: 2949
Location: Southampton, UK

Posted: Mon 16 Apr 2007, 12:25

I'm 75% of the way to addressing point 3.
I have a script that runs every night and copies some of my 20 GB photo collection to my webspace using curlftpfs. My bandwidth is unmetered overnight, so each night the script fires up and copies a few gigs more.

At the moment it compares filesizes reported by ls -l to decide whether to copy. If the filesizes differ it renames the remote file with a datetime based extension and copies the new file in its place. It optimises by comparing whole directories at a time and only checks individual files if the directories differ.

I will have this looking neater in a couple of days and will post it then.

_________________
Will
contribute: community website, screenshots, puplets, wiki, rss
zigbert


Joined: 29 Mar 2006
Posts: 5753
Location: Valåmoen, Norway

Posted: Tue 17 Apr 2007, 02:01

Very Happy Looking forward to it.
HairyWill


Joined: 26 May 2006
Posts: 2949
Location: Southampton, UK

Posted: Wed 18 Apr 2007, 21:36

OK, here it is.
http://users.ecs.soton.ac.uk/~wmd04r/puppy/pftpcopy-0.1
pftpcopy
This script can be used to mirror(ish) a local directory to a remote FTP server. It is designed to be restartable. When it starts it scans directories comparing file sizes; where there are differences, or files that do not exist on the remote, they are copied. No files are ever deleted. If a file's size has changed, the remote version has a timestamp appended to its name before the new version is copied over.
It currently reveals the password to your FTP account in the process list; the curlftpfs website has a suggestion for avoiding this.
It would probably be good to compare files using more than just size (last-saved time would be good).
The fourth parameter, minutes_to_run, lets you specify when you want the transfer to terminate. This is useful for me as my bandwidth is free overnight. On termination it unmounts the curlftpfs connection. If you hit CTRL-C, the program will finish transferring the current file (to avoid leaving a mess on the server) before stopping, but won't unmount the curlftpfs mount.

Usage: pftpcopy ftp_connection_string local_directory remote_directory minutes_to_run

example: pftpcopy bob:bobspassword@ftp.test.org /root/my-pictures my-pictures 60
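The minutes_to_run / CTRL-C behaviour described above could look roughly like this in shell (a toy sketch, not pftpcopy's actual code; the "transfer" is just an echo into a hypothetical /tmp/timed_log):

```shell
#!/bin/sh
# Sketch of a time-limited transfer loop with a clean CTRL-C,
# in the spirit of pftpcopy's minutes_to_run option.
rm -f /tmp/timed_log
MINUTES=1
DEADLINE=$(( $(date +%s) + MINUTES * 60 ))
STOP=0
trap 'STOP=1' INT                # on CTRL-C, finish the current file first

for F in one two three; do
    [ "$STOP" = 1 ] && break                     # user asked to stop
    [ "$(date +%s)" -ge "$DEADLINE" ] && break   # out of time: stop copying
    echo "copying $F" >> /tmp/timed_log          # stand-in for the real transfer
done
```

Because the deadline and interrupt flag are only checked between files, a file in flight is always completed before the loop exits.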

You will need curlftpfs. Install both of Gekko's dotpups from here:
http://murga-linux.com/puppy/viewtopic.php?p=94776#94776

Here's a link to the reason why I wrote this.
http://www.murga-linux.com/puppy/viewtopic.php?p=111322#111322

_________________
Will
contribute: community website, screenshots, puplets, wiki, rss
Lobster
Official Crustacean


Joined: 04 May 2005
Posts: 15117
Location: Paradox Realm

Posted: Thu 19 Apr 2007, 00:58

Smile Very interesting and very useful

DSL offers online storage,
and I wonder if this, or something similar, could develop such a potential?

Do you plan a front end? Is there a command-line FTP facility in Puppy? I know part of curl is in Puppy (or will be).

Is this a plan?

Universal Backup

    Back up (choose files /directories)
    Auto Backup
    Backup to (HD / USB Device / Puppy Server / Other Server)


Smile

_________________
Puppy WIKI
zigbert


Joined: 29 Mar 2006
Posts: 5753
Location: Valåmoen, Norway

Posted: Thu 19 Apr 2007, 16:52

Very Happy This looks interesting. I really have to look at this one. Thanks a lot to HairyWill and Gekko.

Sigmund
HairyWill


Joined: 26 May 2006
Posts: 2949
Location: Southampton, UK

Posted: Fri 20 Apr 2007, 05:19

zigbert, I'm glad you're interested.
Lobster wrote:
Do you plan a front end? Is there a command line ftp facility in Puppy?
There is no command-line FTP client in Puppy. I'd rather let curlftpfs handle that layer, as it has built-in support for maintaining/resuming connections, and then I don't have to deal with scary FTP stuff.

Why would I spoil a perfectly good command-line program by building a GUI?
No, to be honest: zigbert expressed a need, I had the code already, and it wanted to be free. If he wants to incorporate it into Pbackup then that's great, though I'd imagine he doesn't really want the dependencies. The curl libraries dotpup is over 3 MB, so it's not cheap.

The program could still do with a lot of fleshing out. I wanted to see if anyone else was interested in it.

One area in particular that needs improvement is the file comparison; just using file sizes is not really good enough. I am considering creating a hidden directory, .pftpcopy, in each directory on the server; it would contain the md5sums of each file in the parent directory. That way it can compare md5s without having to download the entire file from the server.

The algorithm goes like this:
Code:
if file doesn't exist on server
     copy file and md5 to server
else
     compare md5 of local file with the stored md5 on server
     if md5s differ
          rename remote file
          rename remote md5
          copy new version of file and new md5 to server
     fi
fi

It will need some consistency checks in case there are somehow files without md5s, and vice versa, on the server.
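Treating the curlftpfs mount as an ordinary directory, the pseudocode above might be sketched like this (a toy sketch with hypothetical /tmp paths, the remote simulated by a local directory; the .pftpcopy layout is only the proposal above, not existing code):

```shell
#!/bin/sh
# Sketch of the md5 scheme: REMOTE stands in for the curlftpfs mount.
LOCAL=/tmp/md5_local; REMOTE=/tmp/md5_remote
rm -rf "$LOCAL" "$REMOTE"
mkdir -p "$LOCAL" "$REMOTE/.pftpcopy"

sync_one() {
    F=$1
    SUM=$(md5sum "$LOCAL/$F" | cut -d ' ' -f 1)
    if [ ! -f "$REMOTE/$F" ]; then
        # file doesn't exist on server: copy file and md5
        cp "$LOCAL/$F" "$REMOTE/$F"
        echo "$SUM" > "$REMOTE/.pftpcopy/$F.md5"
    elif [ "$SUM" != "$(cat "$REMOTE/.pftpcopy/$F.md5")" ]; then
        # md5s differ: version the old copy, then upload the new one
        TS=$(date +%Y%m%d%H%M%S)
        mv "$REMOTE/$F" "$REMOTE/$F.$TS"
        mv "$REMOTE/.pftpcopy/$F.md5" "$REMOTE/.pftpcopy/$F.md5.$TS"
        cp "$LOCAL/$F" "$REMOTE/$F"
        echo "$SUM" > "$REMOTE/.pftpcopy/$F.md5"
    fi
}

echo "version 1" > "$LOCAL/doc.txt"; sync_one doc.txt   # first upload
echo "version 2" > "$LOCAL/doc.txt"; sync_one doc.txt   # changed: old copy renamed
```

After the second call the server holds the new doc.txt, the timestamped old copy, and a matching pair of md5 files, so only the hashes ever need reading for a comparison.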

The busybox cp in Puppy seems to support the --backup=numbered switch (even though it isn't documented), so I might start using that format for file versioning:
foo
foo.~1~
foo.~2~

If anyone has any suggestions or requests I'd be very interested.

_________________
Will
contribute: community website, screenshots, puplets, wiki, rss
zigbert


Joined: 29 Mar 2006
Posts: 5753
Location: Valåmoen, Norway

Posted: Fri 20 Apr 2007, 12:49

HairyWill wrote:
One area in particular that needs improvement is the file comparison, just using filesizes is not really good enough.
For Pbackup 2.2 I have once again dropped mirdir (too inflexible) and rewritten the internal mirror function. It now scans very fast and compares several attributes (size, time, user, group, permissions, link status). It's all VERY simple:

1. Make a text file with info on the files in the source and the files in the mirror.
Code:
find / -printf "%p|%s|%u|%g|%m|%l\n" >> /tmpfile
Output looks like this.
Code:
/root/my-documents/tmp|31|root|root|755|
/root/my-documents/tmp/README-tmp.txt|119|root|root|644|

2. Check the differences between source and mirror.
Code:
diff /source /mirror > /tmpfile
Output looks like this.
Code:
1,5c1,57
< /root/my-documents/tmp|31|root|root|755|
< /root/my-documents/tmp/README-tmp.txt|119|root|root|644|
---
> /root/my-applications/bin/unrpm|41|root|root|755|
> /root/my-applications/bin/undeb|35|root|root|755|

3. Divide the output (the differences) into 2 new text files:
A. Lines which are in the source but not in the mirror - the file is updated or new --> copy to mirror.
B. Lines which are in the mirror but not in the source - the file was moved or deleted --> remove/rename...
Code:
sed -i -e "s/ /{SPACE}/g" /tmpfile
for I in `cat /tmpfile`; do
   TMP=`echo $I | grep "^<{SPACE}/" | sed "s/^<{SPACE}//"` #files only in source, diff marker stripped
   if test "$TMP"; then
      echo "$TMP" | cut -d "|" -f 1 >> /copyfiles
   fi
   TMP=`echo $I | grep "^>{SPACE}/" | sed "s/^>{SPACE}//"` #files only in mirror, diff marker stripped
   if test "$TMP"; then
      echo "$TMP" | cut -d "|" -f 1 >> /renamefiles
   fi
done
sed -i -e "s/{SPACE}/ /g" /copyfiles
sed -i -e "s/{SPACE}/ /g" /renamefiles
Now you have 2 nice lists of files to be copied and renamed. Be aware that the for loop uses spaces as delimiters; that's why sed temporarily converts them to {SPACE}.
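A possible step 4, consuming the copy list, might look like this (a sketch with hypothetical /tmp paths and a one-entry demo list; not actual Pbackup code):

```shell
#!/bin/sh
# Sketch: act on the list of files that need copying to the mirror.
SRC=/tmp/mf_src; MIR=/tmp/mf_mirror
rm -rf "$SRC" "$MIR" /tmp/copyfiles
mkdir -p "$SRC/sub" "$MIR"
echo "new data" > "$SRC/sub/a.txt"
printf '%s\n' "$SRC/sub/a.txt" > /tmp/copyfiles   # demo list with one entry

while read F; do
    REL=${F#$SRC/}                      # path relative to the source root
    mkdir -p "$MIR/$(dirname "$REL")"   # recreate the directory structure
    cp -p "$F" "$MIR/$REL"              # -p keeps time/permissions
done < /tmp/copyfiles
```

Reading the list with `while read` (instead of a for loop) also sidesteps the {SPACE} dance, since read takes each line whole.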
=======================================

HairyWill wrote:
zigbert expressed a need, I had the code already and it wanted to be free. If he wants to incoporate it into pbackup then that's great, though I'd imagine he doesn't really want the dependencies. The curl libraries dotpup is over 3MB so its not cheap.
Absolutely right. Pbackup is now 17 kb compressed, and the intention is to keep Puppy small, isn't it? But I wonder, what is the difference between the curl in Puppy and Gekko's package?
Quote:
The program could still do with a lot of fleshing out.
I guess, and hope, you are thinking of using Puppy's curl. Confused Very Happy

Sigmund
HairyWill


Joined: 26 May 2006
Posts: 2949
Location: Southampton, UK

Posted: Fri 20 Apr 2007, 22:49

good news Very Happy
The curl libraries in Puppy 2.14 seem to be sufficient, though they are not present in 2.15 Sad
All that is needed on top of a standard 2.14 install is the curlftpfs binary, which is 78k uncompressed. OK, it's bigger than the whole of Pbackup, but it looks a lot more reasonable. I'm still not sure Barry would buy it for default inclusion; this seems a bit of a niche market. Maybe it could be an installable extension.

Your comparison code is excellent, and your coding is much neater than mine.
I liked the idea of doing comparisons on md5s, but on balance it is probably better not to write extra metadata to the server.

There is a significant lag scanning for changes on my current setup: 30-60 seconds. I have ADSL (512k up / 8Mb down) and the server holds 1600 files in a 4-tier directory structure. My final use will involve 20 times as many files; a 10-20 minute startup before any copying gets done is not good. This would be acceptable once I know there are just a few changes, but initially there are whole leaf directories to copy, and it would be more efficient to jump straight to those.

Idea Ahh, I suppose it could persist the change list from one session to the next. I'm not sure it is safe to assume the list is 100% accurate, but it could be used to home in on areas that definitely need copying. Maybe it could keep a logfile of directories that were copied completely and skip scanning those while others are known to be incomplete.

will code some more tomorrow
must sleep Smile

_________________
Will
contribute: community website, screenshots, puplets, wiki, rss
HairyWill


Joined: 26 May 2006
Posts: 2949
Location: Southampton, UK

Posted: Mon 23 Apr 2007, 05:04

problem

I don't think it is possible to guarantee much control over file attributes using FTP. Using cp -p doesn't work, at least on the FTP server I am using. I may still have to maintain some extra file metadata in a separate file.

_________________
Will
contribute: community website, screenshots, puplets, wiki, rss
zigbert


Joined: 29 Mar 2006
Posts: 5753
Location: Valåmoen, Norway

Posted: Tue 01 May 2007, 03:30

Quote:
I don't think it is possible to guarantee much control over file attributes using ftp. Using cp -p doesn't work, at least on the ftp server I am using. I may still have to maintain some extra file metadata in a separate file.

It may be solved with the 'cmp' command.
It compares 2 files byte for byte, and doesn't bother about attributes.
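For example, a quick demonstration with throwaway files (hypothetical /tmp paths):

```shell
#!/bin/sh
# cmp compares two files byte for byte; exit status 0 means identical.
# -s keeps it quiet so it can drive shell logic directly.
echo "abc" > /tmp/cmp_a
echo "abc" > /tmp/cmp_b
echo "abd" > /tmp/cmp_c

cmp -s /tmp/cmp_a /tmp/cmp_b && echo "a and b match"   > /tmp/cmp_result
cmp -s /tmp/cmp_a /tmp/cmp_c || echo "a and c differ" >> /tmp/cmp_result
```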

Sigmund
HairyWill


Joined: 26 May 2006
Posts: 2949
Location: Southampton, UK

Posted: Tue 01 May 2007, 06:51

Thanks
zigbert wrote:
It may be solved with the 'cmp' command.

Ah, but that is the architectural problem: to use cmp you would have to download the file from the file server.
On ADSL the download speed may be an order of magnitude greater than the upload speed, but having to download the file just to do the comparison is still a significant problem.

rsync solves this problem by calculating MD5s (or something similar, I think) on the server, so to perform a comparison only the hashes need to be transferred. But rsync is not an option, as it requires either an rsync server or a shell on the remote machine, and all we have is FTP.

As I see it the options are:
1) Just do the comparison based on the size reported by ls -l
This works well for my application of copying all my digital photos, as I never edit the files.

2) Store file metadata locally
This only works if there is only ever one client uploading to the FTP server.
The FTP server then holds only the actual data you want to store.

3) Store file metadata on the server
This would allow more than one client to upload (not sure that is sensible anyway, as it is supposed to be a backup).
It makes the client stateless (apart from the rather large consideration that the client holds the data that needs copying).

I'm envisioning something similar to CVS or SVN, where each directory contains a .pftpcopy directory holding the metadata. Storing an MD5 hash seems preferable to a last-saved date, as it actually analyses the file's contents. It is probably important that the metadata files are small and are written and closed as soon as possible after a file copy is made; this should help ensure consistency.

Personally I'd rather not store the metadata locally, as the network share I'm copying is mounted read-only. Though I suppose it could be down to user choice.

There are some interesting architectural choices here, I'm not sure how many people will use this.

_________________
Will
contribute: community website, screenshots, puplets, wiki, rss
zigbert


Joined: 29 Mar 2006
Posts: 5753
Location: Valåmoen, Norway

Posted: Tue 01 May 2007, 07:44

Yes, I see. 'cmp' just dropped into my mind, and I didn't think it all through.
Headfound (forum name) reported trouble when using Pbackup 2.0.0 to sync across different filesystems. I have now solved that nearly as in your option 1 (ls -l). Instead of checking "all" attributes (find / -printf "%p|%s|%u|%g|%m|%l\n"), the user has the option to check only size (find / -printf "%p|%s\n"). Of course this is not 100 percent, and I wonder if cmp could be an answer. But on the other hand, I guess speed matters, and cmp on heavy data doesn't sound that good.

Hmmm! Rolling Eyes
Powered by phpBB © 2001, 2005 phpBB Group