shinobar, sorry to hijack your thread. I'll move off your lawn very quickly after this.
Been thinking about it too ... I'm comparing the situation that requires snapmergepuppy with the one where /pup_rw is mounted directly on the pupsave file. In the latter case, no management of whiteout files is done (as shinobar said) - and yet things work correctly.
In the specific PUPMODE where the merge script is required, these are the conditions:
a) there are, effectively, two pupsave files - the tmpfs layer, and the real pupsave (mounted ro by aufs)
b) we want to create the impression that these two pupsaves work as one
c) we don't want to duplicate items from pupsave to tmpfs
d) optionally, tmpfs and pupsave are allowed to have different sizes
a) & b) are rather easy to accomplish; it's c) & d) which cause the most headache and the need for the merge script. Actually, c) is also the cause of a problem if your real pupsave file is almost full, yet the tmpfs is empty (ie fresh boot). One can keep adding things without knowing that one cannot save the stuff anymore. Kinda like vmware thin provisioning, but without enough backing storage.
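To make the "almost full pupsave" problem concrete, here is a minimal Python sketch of a pre-save check. The function names (tree_size, can_save) and the directory arguments are made up for illustration, not anything from snapmergepuppy, and the check is deliberately pessimistic: it counts every file in tmpfs as new data, even ones that would just overwrite an existing copy in pupsave.

```python
import os
import shutil


def tree_size(path):
    """Total size in bytes of all non-directory entries under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            # lstat so a symlink counts as itself, not as its target
            total += os.lstat(os.path.join(root, name)).st_size
    return total


def can_save(tmpfs_dir, pupsave_dir, free_bytes=None):
    """Return True if everything in tmpfs_dir would fit in pupsave_dir.

    free_bytes is a hypothetical override (handy for testing); by default
    the free space of the filesystem holding pupsave_dir is used.
    """
    if free_bytes is None:
        free_bytes = shutil.disk_usage(pupsave_dir).free
    return tree_size(tmpfs_dir) <= free_bytes
```

Running something like can_save() periodically during the session would at least warn the user before the tmpfs layer grows past what the pupsave can take.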
If it's only a) & b) - easy - just load pupsave to tmpfs at start, and then rsync everything to pupsave during shutdown (or during merge). The real pupsave doesn't even need to be part of the branch.
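For illustration, the simple a) & b) scheme could be sketched like this in Python. It is only a sketch under assumptions: load_session and save_session are hypothetical names, the trees are assumed to contain only regular files and directories (no symlinks, device nodes or ownership handling), and save_session is a hand-rolled stand-in for an rsync-with-delete run, not rsync itself. It needs Python 3.8+ for dirs_exist_ok.

```python
import os
import shutil


def load_session(pupsave_dir, tmpfs_dir):
    """At boot: copy the whole pupsave content into the tmpfs layer."""
    shutil.copytree(pupsave_dir, tmpfs_dir, dirs_exist_ok=True)


def save_session(tmpfs_dir, pupsave_dir):
    """At shutdown (or merge): make pupsave an exact mirror of tmpfs."""
    # Pass 1: remove anything in pupsave that is gone from tmpfs
    # (or that changed type between file and directory).
    for root, dirs, files in os.walk(pupsave_dir, topdown=True):
        rel = os.path.relpath(root, pupsave_dir)
        src_root = os.path.join(tmpfs_dir, rel)
        for d in list(dirs):
            if not os.path.isdir(os.path.join(src_root, d)):
                shutil.rmtree(os.path.join(root, d))
                dirs.remove(d)  # do not descend into the removed directory
        for f in files:
            if not os.path.isfile(os.path.join(src_root, f)):
                os.remove(os.path.join(root, f))
    # Pass 2: copy new and updated files over the top.
    shutil.copytree(tmpfs_dir, pupsave_dir, dirs_exist_ok=True)
```

Note that no whiteout handling appears anywhere: since tmpfs holds a full copy, a deleted file is simply absent and gets removed from pupsave in pass 1. The price is exactly what c) and d) are meant to avoid: everything is duplicated in RAM, and tmpfs must be at least as big as the pupsave content.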
But we need to do c) and d) since those are the agreed design criteria for now. Based on the above, I think the only checks needed are as follows, for each combination of a "real file" and its corresponding whiteout file:
1. whiteout file exists in tmpfs, real file exists in pupsave
Cause ==> the file has just been deleted during user session.
Action ==> delete real file in pupsave & create the whiteout file (to prevent any file from lower layer getting exposed).
Then delete the whiteout in tmpfs.
2. whiteout file exists in tmpfs, real file doesn't exist in pupsave
Cause ==> whiteout is for a file in lower layer
Action ==> create whiteout file in pupsave
Then delete the whiteout in tmpfs.
3. real file exists in tmpfs, whiteout exists in pupsave
Cause ==> new file created over previously deleted file (from previous session)
Action ==> copy file from tmpfs to pupsave, and delete whiteout in pupsave
Then delete the real file in tmpfs.
4. real file exists in tmpfs, whiteout doesn't exist in pupsave
Cause ==> new file created in this session
Action ==> copy file from tmpfs to pupsave,
Then delete the real file in tmpfs.
5. real file exists in tmpfs, real file also exists in pupsave
Cause ==> file is updated in this session
Action ==> copy file from tmpfs to pupsave,
Then delete the real file in tmpfs.
Of course when I say "file" it also applies to directories.
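The five cases above can be sketched as a small Python routine. This is a rough illustration, not the snapmergepuppy logic: merge_entry and merge_top_level are hypothetical names, the ".wh." prefix is assumed to be how aufs names whiteout files, entries starting with ".wh..wh." are assumed to be aufs-internal and are skipped, and only regular files in a single directory are handled (directories would need the same checks applied recursively).

```python
import os
import shutil

WH = ".wh."  # assumption: aufs whiteout for "foo" is a file named ".wh.foo"


def _remove(path):
    """Delete a file or a whole directory tree, if it exists."""
    if os.path.isdir(path) and not os.path.islink(path):
        shutil.rmtree(path)
    elif os.path.lexists(path):
        os.remove(path)


def merge_entry(tmpfs_dir, pupsave_dir, name):
    """Merge one name from tmpfs into pupsave.

    Returns the case number (1-5) from the list above, or 0 if tmpfs
    holds neither the file nor its whiteout.
    """
    t_real = os.path.join(tmpfs_dir, name)
    t_wh = os.path.join(tmpfs_dir, WH + name)
    p_real = os.path.join(pupsave_dir, name)
    p_wh = os.path.join(pupsave_dir, WH + name)

    if os.path.lexists(t_wh):
        # Cases 1 and 2: the name was deleted during this session.
        case = 1 if os.path.lexists(p_real) else 2
        _remove(p_real)            # no-op in case 2
        open(p_wh, "w").close()    # keep lower layers hidden
        os.remove(t_wh)
        return case

    if os.path.lexists(t_real):
        # Cases 3, 4 and 5: the name was created or updated.
        if os.path.lexists(p_wh):
            case = 3
            os.remove(p_wh)
        elif os.path.lexists(p_real):
            case = 5
        else:
            case = 4
        shutil.copy2(t_real, p_real)  # regular files only in this sketch
        os.remove(t_real)
        return case

    return 0


def merge_top_level(tmpfs_dir, pupsave_dir):
    """Run merge_entry for every name found directly in tmpfs_dir."""
    names = set()
    for entry in os.listdir(tmpfs_dir):
        if entry.startswith(WH + WH):
            continue  # assumed aufs bookkeeping entry, left alone
        names.add(entry[len(WH):] if entry.startswith(WH) else entry)
    return {n: merge_entry(tmpfs_dir, pupsave_dir, n) for n in sorted(names)}
```

The point of the sketch is that every decision is made from tmpfs and pupsave alone; nothing in it ever looks at the SFS layers.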
I think that should handle 90% of the cases. We skip the corner case of "save a whiteout file only if a lower-layer SFS has the real file" - I don't really see why this is necessary.
If the slowness comes from checking all those files in the SFS layers, then by dealing only with tmpfs and pupsave, this delay should be greatly reduced. If it doesn't, then the above may not help. In fact, I'm doubting the need to have c) and d) in the first place ... I mean, if you have that very important big file you need to save, you can always save it in /mnt/home (ie the real storage).
Ok, I'm off - jemimah we can start another thread on this if you want to.
Shinobar, thanks for the update, I'll test it and get back to you.