Append text to tar archive, get it with 'strings'

For discussions about programming, programming questions/advice, and projects that don't really have anything to do with Puppy.
sc0ttman
Posts: 2812
Joined: Wed 16 Sep 2009, 05:44
Location: UK


#1 Post by sc0ttman »

I was able to append the following onto the end of a .pet archive:

PKGNAME=blah
PKGVER=0.1

I was then able to run strings filename.pet | tail -2

And it printed out exactly as above.. nice..
But it messed up the archive.. same thing happened with .tar.gz files..

Is there any way to append strings (such as 'VAR=blah') to the end of archives, and then get them back using `cat` or `strings` or similar?

If so, I might make a tmp copy of the akita repo, run it on all pkgs, then if all is well, I will be able to get package info straight from pkgs themselves, without having to extract it, or refer to the repo database file...

EDIT::

Yet to test this fully.. But got this doing what I want:

Code: Select all

echo "
PKG_NAME=var1
PKG_VER=var2" >> pkg_name.pet
(The extra blank line is needed at the top, cos Barry creates/created pets with echo -n $md5 at the end.. )

then

Code: Select all

strings pkg_name.pet | tail -3
will show something like

Code: Select all

901eeb44ea34f71d2da0b7a8a5e14d02
PKG_NAME=bsme
PKG_VER=1.0
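
For anyone following along, here is a minimal sketch of the append-and-read-back step, using a throwaway file in place of a real .pet (the file name and payload are made up). It uses `tail -n` on its own rather than `strings | tail`, on the assumption that the appended lines are the very last lines of the file, so the whole archive does not need to be scanned.

```shell
#!/bin/sh
# Sketch only: demo.pet is a stand-in, not a real package.
tmp=$(mktemp -d)
pet="$tmp/demo.pet"

# Fake payload followed by a 32-character md5 with no trailing newline,
# mimicking the `echo -n $md5` ending described above.
printf 'fake-archive-bytes' > "$pet"
printf '%s' '901eeb44ea34f71d2da0b7a8a5e14d02' >> "$pet"

# The leading \n keeps PKG_NAME from being glued onto the md5 line.
printf '\nPKG_NAME=bsme\nPKG_VER=1.0\n' >> "$pet"

# tail works from the end of the file, so only the last lines are read.
meta=$(tail -n 2 "$pet")
printf '%s\n' "$meta"

rm -rf "$tmp"
```

Without the leading newline the first variable would end up on the same line as the md5, which is why the blank line matters.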
[b][url=https://bit.ly/2KjtxoD]Pkg[/url], [url=https://bit.ly/2U6dzxV]mdsh[/url], [url=https://bit.ly/2G49OE8]Woofy[/url], [url=http://goo.gl/bzBU1]Akita[/url], [url=http://goo.gl/SO5ug]VLC-GTK[/url], [url=https://tiny.cc/c2hnfz]Search[/url][/b]

sc0ttman
Posts: 2812
Joined: Wed 16 Sep 2009, 05:44
Location: UK

#2 Post by sc0ttman »

which leads me to another question..

why does barry only append the md5 to the end???

Sure, that could be done last, but a wealth of useful info could go in there at the same time right before the md5sum... Or am I missing something??

Would it be useful to have lots of pkg specs and info avail through :

Code: Select all

strings $pkgname | tail -10
.. or not?

Karl Godt
Posts: 4199
Joined: Sun 20 Jun 2010, 13:52
Location: Kiel,Germany

#3 Post by Karl Godt »

pet2tgz wrote:#truncate is a little app I wrote. format: truncate newsize filename
truncate $ORIGSIZE "$1"
The script uses
ORIGSIZE=`expr $FULLSIZE - 32`
dd if="${1}" of=/tmp/petmd5sum bs=1 skip=${ORIGSIZE}

The md5sum is 32 bytes long; without the -n option to echo it would probably be 33 (one extra for the newline).

If you create a footer that always has the same byte size, padded with spaces, then truncating should probably work with something like
ORIGSIZE=`expr $FULLSIZE - 3232` for example .
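
A rough sketch of the fixed-size footer idea, assuming GNU coreutils (`stat -c`, `truncate -s`, which takes its arguments differently from the `truncate newsize filename` app quoted above). The 256-byte size and the helper names are made up for the example.

```shell
#!/bin/sh
# Sketch only: 256 is an arbitrary footer size chosen for this example.
FOOTER_SIZE=256

# Append TEXT (one line, ASCII) padded with spaces so the footer is always
# exactly FOOTER_SIZE bytes; longer text is cut to fit.
write_footer() {
    w=$((FOOTER_SIZE - 1))
    printf "%-${w}.${w}s\n" "$2" >> "$1"
}

# Read the footer back and drop the padding.
read_footer() {
    tail -c "$FOOTER_SIZE" "$1" | sed 's/ *$//'
}

# Cut the footer off again, restoring the original archive.
strip_footer() {
    size=$(stat -c %s "$1")
    truncate -s "$((size - FOOTER_SIZE))" "$1"
}

tmp=$(mktemp -d)
f="$tmp/demo.tar"
printf 'fake-archive-bytes' > "$f"
before=$(md5sum < "$f")

write_footer "$f" "PKGNAME=blah PKGVER=0.1"
footer=$(read_footer "$f")
printf '%s\n' "$footer"

strip_footer "$f"
after=$(md5sum < "$f")
rm -rf "$tmp"
```

Because the footer length is constant, nothing has to be stored to find it again; the cost is wasted padding and a hard limit on how much text fits.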

OR
stat -c %s the unaltered archive first and echo that VALUE as last value at it, probably tail -n1 might fetch the value , so that can be used to truncate back to the pure archive

[14:52 0 /bin/sh 29650 29 TEST ]
[puppypc]# stat -c %s glib-1.2.10-i686.tar
665600

echo "BULLSHIT" >>!$
echo "BULLSHIT" >>glib-1.2.10-i686.tar

echo 665600 >>!$
echo 665600 >>glib-1.2.10-i686.tar

tail -n1 !$
tail -n1 glib-1.2.10-i686.tar
665600

truncate 665600 glib-1.2.10-i686.tar

WORKS FOR ME .
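
The same steps wrapped into two small functions, as a sketch. It assumes GNU coreutils `truncate -s SIZE FILE` rather than the `truncate newsize filename` app, and the function names are invented for the example.

```shell
#!/bin/sh
# Sketch only; assumes GNU coreutils (stat -c, truncate -s).

# Append TEXT, then the original size as the very last line.
append_with_size() {
    size=$(stat -c %s "$1")
    printf '\n%s\n%s\n' "$2" "$size" >> "$1"
}

# Read the recorded size from the last line and cut back to it.
restore_original() {
    size=$(tail -n 1 "$1")
    case $size in
        ''|*[!0-9]*) echo "no size footer found in $1" >&2; return 1 ;;
    esac
    truncate -s "$size" "$1"
}

tmp=$(mktemp -d)
f="$tmp/demo.tar"
printf 'fake-archive-bytes' > "$f"
before=$(md5sum < "$f")

append_with_size "$f" "PKGNAME=blah"
recorded=$(tail -n 1 "$f")

restore_original "$f"
after=$(md5sum < "$f")
rm -rf "$tmp"
```

The digits-only check is there so that a file without a footer is not truncated to some random length by accident.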


cat glib-1.2.10-i686.files >>glib-1.2.10-i686.tar.gz
xarchive glib-1.2.10-i686.tar.gz
gzip: stdin: decompression OK, trailing garbage ignored
tar: Child returned status 2
tar: Error is not recoverable: exiting now
wrapper exited with: 0
- Despite the error message, it seems to work though.

seaside
Posts: 934
Joined: Thu 12 Apr 2007, 00:19

#4 Post by seaside »

Another approach might be using tar file output for the pet spec file-

Code: Select all

tar xfz mypet.pet ./mypet/pet.specs -O
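
A self-contained sketch of that approach, with a made-up package called mypet. It assumes the pet is a gzipped tarball with a 32-character md5 appended, and decompresses with gzip separately so the "trailing garbage" warning can be silenced before tar sees the stream.

```shell
#!/bin/sh
# Sketch only: "mypet" and its pet.specs content are made up.
tmp=$(mktemp -d)
mkdir "$tmp/mypet"
printf 'PKGNAME=mypet\nPKGVER=0.1\n' > "$tmp/mypet/pet.specs"

# Build a gzipped tarball, then append its md5 with no newline to get a
# .pet-like file.
( cd "$tmp" && tar -czf mypet.tar.gz ./mypet )
cp "$tmp/mypet.tar.gz" "$tmp/mypet.pet"
md5sum < "$tmp/mypet.tar.gz" | cut -d' ' -f1 | tr -d '\n' >> "$tmp/mypet.pet"

# gzip warns about the appended md5 but still emits the full tar stream;
# tar then prints just the one member to stdout (-O).
specs=$(gzip -dc "$tmp/mypet.pet" 2>/dev/null | tar -xO -f - ./mypet/pet.specs)
printf '%s\n' "$specs"

rm -rf "$tmp"
```

Nothing is written to disk by the extraction, but the whole archive is still decompressed to find the file.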
Cheers,
s

sc0ttman
Posts: 2812
Joined: Wed 16 Sep 2009, 05:44
Location: UK

#5 Post by sc0ttman »

Karl Godt wrote:stat -c %s the unaltered archive first and echo that VALUE as last value at it, probably tail -n1 might fetch the value , so that can be used to truncate back to the pure archive
can you explain your code?
echo "BULLSHIT" >>!$
echo 665600 >>!$
what is that?
tail -n1 !$
tail -n1 glib-1.2.10-i686.tar
and that...
truncate 665600 glib-1.2.10-i686.tar
and what does this do..

if it works for you, can you make it into a script or func that allows appending a given string to an existing pet, then appends the correct new md5sum...?? .. might help a bit more..

mikeb
Posts: 11297
Joined: Thu 23 Nov 2006, 13:56

#6 Post by mikeb »

How about.....

forget pets... supply all software as sfs...... can be used as is... but if the user wants to or has a full install it can be installed just like a pet.... mount and copy boogie.... the pet.specs can still be present and any install script needed and you can tag on info to an sfs like you are trying to do here.

Compatibility...use older squash and puppy comes with version 3 handler anyway..

Advantages... only one format for both needs...smaller download, no space used during install process over and above the added files, simple.

sorry felt the urge to throw that in there.

mike

sc0ttman
Posts: 2812
Joined: Wed 16 Sep 2009, 05:44
Location: UK

#7 Post by sc0ttman »

.. maybe i'll be clear about why i'm looking into this.. i want the .pets in the akita repo to be the same as normal pets, except that they contain a tiny bit more info:

i wanna append new info onto the end of akita pets.. I want the pets './config' and compile time options put onto the end of pets by the 'new2dir' script, and by whatever scripts Pkg/buildpet uses to package up at the end..

Something like:

# strings $PKGNAME | tail -5
PKG_MD5=ewasgdw9grg32riu93rg3rg20
PKG_BUILD_HOST='Lucid 528, gcc 4.3.4,libc-2.11,make 2.81'
PKG_OPT='--prefix=/usr --no-python --more-blah'
PKG_BUILD=0
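
A sketch of how a package manager might read one of those fields back without sourcing or eval'ing anything from the file. `get_pkg_field` is a hypothetical helper, and the 10-line window is an arbitrary choice; it assumes the fields sit in the last lines of the file as plain KEY=value text.

```shell
#!/bin/sh
# Sketch only: get_pkg_field is a hypothetical helper, not part of Puppy.

# Print the value of KEY from the last 10 lines of FILE.
# Values are treated as plain text and are never eval'd or sourced.
get_pkg_field() {
    tail -n 10 "$1" | while IFS='=' read -r key value; do
        if [ "$key" = "$2" ]; then
            value=${value#\'}   # drop one leading single quote, if any
            value=${value%\'}   # drop one trailing single quote, if any
            printf '%s\n' "$value"
        fi
    done
}

tmp=$(mktemp)
printf 'fake-archive-bytes\n' > "$tmp"
cat >> "$tmp" <<'EOF'
PKG_BUILD_HOST='Lucid 528, gcc 4.3.4,libc-2.11,make 2.81'
PKG_OPT='--prefix=/usr --no-python --more-blah'
PKG_BUILD=0
EOF

opt=$(get_pkg_field "$tmp" PKG_OPT)
build=$(get_pkg_field "$tmp" PKG_BUILD)
printf 'options: %s\nbuild: %s\n' "$opt" "$build"
rm -f "$tmp"
```

Only the first `=` on each line splits key from value, so options like `--prefix=/usr` survive intact.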


This would finally give users (at least in akita) the option of checking out how the packages were compiled (which could be easily done by a package manager if the pets contained the info).. this would be helpful for things like vlc, mplayer, gtk, and anything you wanted to re-compile, etc ..

... if the './config' options made it into all .pets in general, then all pets, even those randomly made and posted on this forum should have decent info attached, or be conspicuous by its absence...

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#8 Post by technosaurus »

Check out my [url=https://github.com/technosaurus]github repositories[/url]. I may eventually get around to updating my [url=http://bashismal.blogspot.com]blogspot[/url].

sc0ttman
Posts: 2812
Joined: Wed 16 Sep 2009, 05:44
Location: UK

#9 Post by sc0ttman »

sorry guys... none of these examples are making any sense to me at all.. sorry..

.. can i please have an example function, where i give a PET and a multi-line string, and i get a pet with that string added at start or end, including corrected md5 inside the file.. so that i can simply use head -5 or tail -5 to get vars (inc new, correct md5sum), each on their own line ..

example add_pkg_specs func usage:

Code: Select all

LIST="PKGNAME=name
PKGVER=0.1
PKGREV=0
PKGCONF='--prefix=/usr'
PKGHOST='Lucid 528, libc 2.11, gcc 4.3.4, make 2.18'"

add_pkg_specs $pkgname "$LIST"
So I can then use this to get the details (including correct md5sum):

strings $pkg | head -5

or

strings $pkg | tail -5

etc
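
One possible shape for such a function, as a sketch only. It assumes the pet currently ends with a 32-character md5, that GNU coreutils are available, and that the new md5 should cover everything before it (archive plus spec lines). Whether the stock pet tools accept the result is not confirmed here: after they strip the last 32 bytes, the spec lines would still be sitting behind the gzip data.

```shell
#!/bin/sh
# Sketch only. Assumes the pet ends with a 32-character md5 and that GNU
# coreutils (stat -c, truncate -s) are available.

# Usage: add_pkg_specs PETFILE "multi-line specs"
add_pkg_specs() {
    pet=$1
    specs=$2
    size=$(stat -c %s "$pet")
    truncate -s "$((size - 32))" "$pet"        # remove the old md5
    printf '\n%s\n' "$specs" >> "$pet"         # add the spec lines
    md5=$(md5sum < "$pet" | cut -d' ' -f1)     # md5 of archive + specs
    printf '%s' "$md5" >> "$pet"               # no newline, like echo -n
}

tmp=$(mktemp -d)
pet="$tmp/demo.pet"
printf 'fake-archive-bytes' > "$pet"
md5sum < "$pet" | cut -d' ' -f1 | tr -d '\n' >> "$pet"

LIST="PKGNAME=name
PKGVER=0.1"
add_pkg_specs "$pet" "$LIST"

# The spec lines plus the new md5 are now the last three lines.
last=$(tail -n 3 "$pet")
printf '%s\n' "$last"

# Check: the stored md5 matches everything that precedes it.
size=$(stat -c %s "$pet")
stored=$(tail -c 32 "$pet")
actual=$(head -c "$((size - 32))" "$pet" | md5sum | cut -d' ' -f1)
rm -rf "$tmp"
```

With N spec lines, `tail -n $((N + 1))` returns the specs followed by the md5, which stays at the very end like in other pets.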

.. and I don't understand something else... does the md5sum need to be at the end of the file? i would like it to be, to keep it the same as other pets...

EDIT: OK, I'm sure I am missing something here... the md5sums appended inside the archives/pets don't match what you get when you run the md5sum command on them... i assume this is normal... i checked with pets coming straight from the wary repo, i didn't edit them - the md5s didn't match (inside the pet and from the md5 cmd)...

.... what is the point of appending an md5 to a pet, if it changes the md5 of that pet when you do it?

.. can i just append the vars i want (inc the 'soon to be old' md5sum), and not worry about a mismatching md5sum?!?!

.. If so, what is the truncate command for?? what exactly is going on inside pet2tgz/tgz2pet??

seaside
Posts: 934
Joined: Thu 12 Apr 2007, 00:19

#10 Post by seaside »

technosaurus gives an excellent example above of how to add data to binary files (that's for steganography, but it applies to anything)

If you add data to a pet file, you must know how many bytes are added in order to truncate the "enhanced" pet file back to its original state in order for the pethandling to work.

You can see how pets are installed by looking at the puppy install scripts like "/usr/local/petget/installpkg.sh" and others in that directory. The md5sums can only be matched on the original files.

It seems as if you want to create a different kind of package which would require its own handler and extension.

Cheers,
s

sc0ttman
Posts: 2812
Joined: Wed 16 Sep 2009, 05:44
Location: UK

#11 Post by sc0ttman »


Code: Select all

#LIST="PKGNAME=blah PKGVER=blah"
# OLDSIZE="`stat -c %s babl-0.1.2-w5.pet`"
# truncate --size 32 babl-0.1.2-w5.pet
# OLDSIZE="`stat -c %s babl-0.1.2-w5.pet`"
# echo $OLDSIZE
67478
# echo $LIST >> babl-0.1.2-w5.pet
# echo -n $OLDSIZE >> babl-0.1.2-w5.pet
# tail -n1 babl-0.1.2-w5.pet
# truncate --size $OLDSIZE babl-0.1.2-w5.pet
# stat -c %s babl-0.1.2-w5.pet
67522
# tail -n1 babl-0.1.2-w5.pet
67478#
.... is that right??

amigo
Posts: 2629
Joined: Mon 02 Apr 2007, 06:52

#12 Post by amigo »

If you expect the pets to be compatible with existing installation/management routines, then there needs to be an md5sum at the end of the archive. Barry's routine is simpler since the size of the md5sum string itself is known -32 bytes.
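
To make the 32-byte point concrete, here is a sketch of the round trip under the assumption that a pet is simply a gzipped tarball with its own md5 appended. It is not the pet2tgz script itself, and it uses GNU coreutils `truncate -s`.

```shell
#!/bin/sh
# Sketch only; assumes GNU coreutils. This is not the pet2tgz script itself.
tmp=$(mktemp -d)
mkdir "$tmp/demo"
echo hello > "$tmp/demo/file"
( cd "$tmp" && tar -czf demo.tar.gz demo )
tgz_size=$(stat -c %s "$tmp/demo.tar.gz")

# tgz -> pet: copy the tarball and append its md5 without a newline.
cp "$tmp/demo.tar.gz" "$tmp/demo.pet"
md5sum < "$tmp/demo.tar.gz" | cut -d' ' -f1 | tr -d '\n' >> "$tmp/demo.pet"

# pet -> tgz: the md5 is the last 32 bytes, so no searching is needed.
fullsize=$(stat -c %s "$tmp/demo.pet")
origsize=$((fullsize - 32))
stored=$(tail -c 32 "$tmp/demo.pet")
truncate -s "$origsize" "$tmp/demo.pet"
actual=$(md5sum < "$tmp/demo.pet" | cut -d' ' -f1)

if [ "$stored" = "$actual" ]; then
    echo "md5 matches the archive without its footer"
fi
rm -rf "$tmp"
```

This is also why running md5sum on a whole .pet gives a different value from the one stored inside it: the stored value describes the file before the 32 bytes were added.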

It is rarely feasible to extend the functionality of a package format/specification -unless the format was designed with that in mind. I find the whole idea of adding data to the end of the package absurd -why not just have all the info you like in a file or files included inside the package.

Checking the size of an archive by counting the numbers of chars can be inaccurate and is costly and slow. The same goes for using tail to retrieve info from the end of an archive -what about a 190MB package -how long is that gonna take?

I definitely agree that a package should contain/provide as much info about itself as is useful. The only other alternative is to have simple archives -signed or with published md5sum along with a central database document(s). The latter is clumsy to use and hard-to-keep up to date.

It's worth mentioning, that rpm archives put all the meta-data into the file *header* with the payload following that. Other self-extracting archives do the same. However, such archive formats are then 'proprietary' in, at least, the sense that you are forced to use their tool for handling and creating them.

sc0ttman
Posts: 2812
Joined: Wed 16 Sep 2009, 05:44
Location: UK

#13 Post by sc0ttman »

how ludicrous is it to put the info at the start, and get it with head?

amigo
Posts: 2629
Joined: Mon 02 Apr 2007, 06:52

#14 Post by amigo »

It's not ludicrous at all -it just means that your archives will be unknown 'data' to any one or tool who examines it. Maybe you do want that. But I still doubt it. That would still mean that the installer tool has to be present on the other end -a self-extractor sounds better, except they have extra up-front overhead and are not perfectly binary-compatible.

The real problem with this approach is that the archives/packages lack transparency. It also means that any changes to your approach require changes all across your build system and management system.

Don't get me wrong, I'm not making fun. There are up-sides and down-sides to each and every possible solution to each facet of the problem. And each facet has an effect on the other. As in this case, where it is quite easy to modify the final archive to add some information to it. But, on the installer end, it makes for a lot of redundant processing in order to retrieve that information.

I find the idea of including info inside the archive as the most straightforward and useful -one still has to debate whether it is best to include all that info in a single file or in several files. Since the data is made up of varying types of info, having dissimilar data types in one file makes for hard, slow parsing. This matters because the installer must keep some of the info of the package -dependency info, md5sums, file-list, etc. The data needs to be as accessible as possible -nothing is simpler than spreading things out in several files so it can be easily examined using common tools like grep.

A list of the files included in the package is the most important bit of data and is the essential basis for any sort of dependency information or resolution. A full dependency tree can only arise out of building a complete system in the proper order -keeping those file-lists all the way through so that, as each package gets built, it accesses that database to show which packages contain the files that it needs. Even if you plan to create assembled bundles/sfs/fsimages, you will still need this back-end database to tell you where the files are which you plan to arbitrarily package/bundle.

I propose that a package should be a recognizable file type, which will properly unpack using commonly-available tools. A bundle/image is something different and the creation and management of them is a separate matter -although mostly the same concepts of 'package' management are used with them.

There was a huge row here on the forum once, where a budding developer discovered that directly decompressing *.pet archives with tar returns an error condition and warning. He never could deal with a warning not being the same as a failure. So, catting data onto the tail, or writing it into the head is gonna produce blobs which are mostly unrecognizable, or worse, look 'funny' to anyone in-the-know. And putting things in the header is actually worse -the most annoying and challenging thing about unpacking an rpm archive is that we don't know how long the header is. Since the header contains the file-list and all installer scripts its size can vary hugely.
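
To illustrate the header-length problem, here is a sketch of a made-up layout where the first line states how long the metadata block is, so a reader knows where the payload starts. This layout is invented for the example; it is not rpm's format or any existing Puppy format, and the result is no longer recognizable as a tar or gzip file until it is unpacked again.

```shell
#!/bin/sh
# Sketch only: this layout is invented for illustration.
# Layout: line 1 = byte length of the metadata block, then the metadata,
# then the untouched archive bytes.

pack() {   # pack "METADATA" ARCHIVE OUTFILE
    len=$(( $(printf '%s\n' "$1" | wc -c) ))
    { printf '%s\n' "$len"; printf '%s\n' "$1"; cat "$2"; } > "$3"
}

read_meta() {   # read_meta FILE
    len=$(head -n 1 "$1")
    skip=$(( ${#len} + 1 ))          # bytes used by the length line
    tail -c "+$((skip + 1))" "$1" | head -c "$len"
}

unpack() {   # unpack FILE OUTARCHIVE
    len=$(head -n 1 "$1")
    skip=$(( ${#len} + 1 ))
    tail -c "+$((skip + len + 1))" "$1" > "$2"
}

tmp=$(mktemp -d)
printf 'fake-archive-bytes' > "$tmp/in.tar"
pack "PKGNAME=blah
PKGVER=0.1" "$tmp/in.tar" "$tmp/out.pkg"

meta=$(read_meta "$tmp/out.pkg")
unpack "$tmp/out.pkg" "$tmp/back.tar"
before=$(md5sum < "$tmp/in.tar")
after=$(md5sum < "$tmp/back.tar")
printf '%s\n' "$meta"
rm -rf "$tmp"
```

Reading the metadata only touches the start of the file, but every tool on the receiving end has to know this layout, which is the transparency cost described above.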

I guess I'm being the devil's advocate here, but I surely don't mean to make anyone mad, or harshly criticize their ideas or code. I've gotten a PM from you about using src2pkg to create pets, but in this thread you are talking about using a modified package format -I simply see that you need first a better package format which can communicate more info to the installer. So, then you need an installer which can find and use this info when installing packages -and hopefully upgrading or removing them. Since the consistency of the packages and the meta-data is essential for the installer, you're gonna need an automated way to produce the packages. Of course, one's first instinct may be to create a new package format and at least an installer, but there is always the idea of using someone else's package format and tools.

Sometimes I run out of steam and don't finish explaining myself or both/all sides of a question. Really, my only intention is to make you think. I have, obviously, spent a lot of time thinking about the packaging and inter-compatibility subjects, since src2pkg creates 6 different kinds of packages. One of those package formats is the *.tpkg format which I designed myself -basically an extended slackware format using an 'install' directory for build script and description files, except that *.tpkg packages have several additional files which provide a *lot* more info. And I have the 'tpkg' installer tools which use the *.tpkg format.

Of course, src2pkg knows how to produce these packages and will automatically produce the minimal amount of any needed files for the database. src2pkg can produce packages which are at least 'sane' -as far as the package format defines 'sane'. Producing packages which need special steps will always require a certain amount of human input, but src2pkg has reduced this to a minimum. On the other hand, src2pkg is also ideal for creating highly-complicated packages -it does not limit what steps you can add or skip while building the 'perfect' package.

Gotta go for now...

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#15 Post by technosaurus »

@ scottman if you finagled dir2pet to add your data file first you could get it fast with wget + tar. ... didn't I post something similar in one of your threads for .desktop files?
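
A sketch of the "data file first" idea with a made-up package. It only shows the ordering; it is not how dir2pet currently works, and the download side (wget) is left out.

```shell
#!/bin/sh
# Sketch only: the package name and contents are made up.
tmp=$(mktemp -d)
mkdir -p "$tmp/pkg/usr/bin"
printf 'PKGNAME=pkg\n' > "$tmp/pkg/pet.specs"
printf '#!/bin/sh\n' > "$tmp/pkg/usr/bin/tool"

# tar stores members in the order they are named, so list the specs first.
( cd "$tmp" && tar -czf pkg.tar.gz pkg/pet.specs pkg/usr )

# The first member is now the specs file...
first=$(tar -tzf "$tmp/pkg.tar.gz" | head -n 1)
# ...and it can be printed without unpacking anything to disk.
specs=$(tar -xzO -f "$tmp/pkg.tar.gz" pkg/pet.specs)
printf '%s: %s\n' "$first" "$specs"
rm -rf "$tmp"
```

With the specs at the front, a reader working on a stream meets them right at the start instead of after the rest of the package.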

sc0ttman
Posts: 2812
Joined: Wed 16 Sep 2009, 05:44
Location: UK

#16 Post by sc0ttman »

Thanks for the answer amigo.. Very helpful.. Yeah, I would *like* pets to contain more info, but if it's a hassle for someone like yourself to manage, it's definitely beyond me.. And I have no intention of creating a new package format..

Techno, i believe you did actually, to help me get info from the pets in the wary repo.. i will have to dig it out and have a look..


...All i really want is a nice way of adding more info to the pets like compile options, as akita still uses the old pet specs format file, i know i could whack it in there, and *guess* (?) it wouldn't break the pets *if* anyone else wanted them...

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#17 Post by technosaurus »

You can always add fields to the pet specs file... I would recommend using awk to parse them and setting FS="|", or if you are stuck with shell, set IFS="|" before your while read LINE; ... Puppy's existing spec parsers use a bunch of slow piped commands and are really bad examples...part of the reason ppm is so damn slow
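
Both variants as a small sketch. The field layout in the sample line is an assumption for illustration only, not the real pet.specs column order, with a compile-options field added as an example of an extra field.

```shell
#!/bin/sh
# Sketch only: the field layout below is an assumption for illustration,
# not a statement of the real pet.specs column order.
specs='bsme-1.0|bsme|1.0|Utility|--prefix=/usr|'

# awk: split on "|" and pick fields by number.
name=$(printf '%s\n' "$specs" | awk -F'|' '{ print $2 }')

# Plain shell: IFS="|" applies only to this read; extra fields land in "rest".
conf=$(printf '%s\n' "$specs" |
    while IFS='|' read -r full nameonly version category config rest; do
        printf '%s\n' "$config"
    done)

printf 'name=%s config=%s\n' "$name" "$conf"
```

Either way a line is split in a single step, rather than through a chain of cut/grep/sed calls per field.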
