Universal package database format


Would you support a new XDG spec for packages?

Yes: 3 (100%)
No: 0 (no votes)
Maybe (with comment): 0 (no votes)

Total votes: 3

amigo
Posts: 2629
Joined: Mon 02 Apr 2007, 06:52

#16 Post by amigo »

First, the tpm stuff is still incomplete and unavailable on my site.
tpkg is fully implemented -the whole of KISS is built using it. But the KISS stuff online is not current.

Sharp fellow that you are -even with no sleep, you've found some of the sore points in the tpkg spec and usage. As to the depends (requires), they are concrete and inflexible, as you note. KISS does assume that a new distro version will consist of completely re-built packages -incremental building is a slippery slope which can lead to a slack-like situation where a large number of the included binary packages in fact cannot be re-compiled on the current system.

tpkg is using a 'provides' file which tries to list all run-ables and shared objects. I was trying to avoid having to parse a package's full file-list and figure out from there whether something could be a 'requires' for another package. However, 'provides', as used by Debian, gives some flexibility regarding upgrading without 'breaking' the version numbers of dependent items.

What I mean is, that instead of saying that "prog X version 0.1 needs lib A version 0.1", the provides for A is actually the so-name of the library, not the name+version. This means, quite sanely, that as long as the newer version of A maintains the same so-name (and API), then prog X can still use the new A version without breaking the depends rules. This is something that will be done differently with tpm -I'm still formulating how to implement that. For tpkg, it was in keeping with the idea of not having version dependency 'ranges'. Managing these necessarily involves human-input which I try to avoid when possible.
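To make the so-name idea concrete, here is a rough sketch (not tpkg's actual generator -the staging directory and the filename-based so-name detection are illustrative assumptions) of deriving a 'provides' list from a freshly built package tree:

Code: Select all

# Hypothetical staging directory of a freshly built package
STAGE=/tmp/pkg-stage

# run-ables: anything in the usual bin directories, listed by basename
for d in bin sbin usr/bin usr/sbin; do
  [ -d "$STAGE/$d" ] && find "$STAGE/$d" -type f -exec basename {} \;
done

# shared objects: keep only libname.so.N, so that "prog X needs libA.so.1"
# keeps holding across library upgrades that preserve the so-name
# (the authoritative so-name lives in the ELF SONAME field; trimming the
# filename here is only an approximation)
find "$STAGE" -name 'lib*.so.*' -type f 2>/dev/null |
  sed 's|.*/||; s|\(\.so\.[0-9][0-9]*\).*|\1|' | sort -u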

Using a single file has real advantages vis-a-vis space-saving and convenience. However, when a single file must contain multiple entries, with each entry having multiple fields holding different types of data, it quickly becomes a parsing nightmare. The Puppy specs file is a good (bad) example -multiple fields, some with numeric values, some with text which might contain spaces and/or special characters (possibly including your separator char). While it is simple to cut the wanted field, when you *look* at the file it is basically unreadable -especially since several fields contain exactly the same values as other fields. And the specs file leaves out the most valuable info of all: the list of files contained in the package.

Now, it gets even worse if you try to include the file-list in the specs file, and worse still if, on the local machine, you concatenate all the specs files into a single file for the whole installation. You get a file unreadable by humans and difficult to parse by machine because of the various data types; any time you want info, the whole file must be processed for even the tiniest bit of data, and when making changes to the installation the whole file must be parsed and then re-written -again, even for the slightest change.

Both tpkg and tpm use multi-file databases -at package-level, repo-level and at the local installed level. Small bits of info which can easily be contained as variable values can all be packed into a single file. Using shell syntax we can simply source the file and instantly have all the info as variables. But for long lists, like the lists of files contained in a package, it behooves us to split the info into a separate file which can be processed as a line-oriented list. The file remains human-readable, and by keeping these lists in their own directory we can search for data more easily. This is key to keeping search times short. tpkg and tpm list files this way. There is a directory named something like 'file-lists', which contains one file per package. Each file is named with the full name-version-arch-release of its package. So, when we want to know, for instance, which package a certain file belongs to, we cd into the file-lists dir and simply grep:
grep name-of-file-we-search *
and we get back the name of the package or packages which contain the file of interest.
Of course, knowing which files belong to which package is the most useful bit of info we can have -you can't even remove or upgrade a package without it. It allows us to discover depends in the first place. Then, once we have a list of requirements for this package, we create another flat file (also in its own dir) which lists the requirements of that package. This gets used extensively for things like finding 'orphaned' packages, or for reverse-depends (who needs this package).
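As a sketch of how cheap those queries become with one file per package (the 'file-lists' and 'requires' directory names and the database root are illustrative, not tpkg's exact layout):

Code: Select all

PKGDB=/var/lib/pkgdb            # hypothetical database root

# which installed package(s) own a given file?
# (assumes list entries are stored exactly as queried, e.g. usr/bin/gcc)
owner_of(){
  grep -lxF "$1" "$PKGDB"/file-lists/* 2>/dev/null | sed 's|.*/||'
}

# reverse-depends: which installed packages list this package as a requirement?
who_needs(){
  grep -lxF "$1" "$PKGDB"/requires/* 2>/dev/null | sed 's|.*/||'
}

# usage:
#   owner_of usr/bin/gcc
#   who_needs libjpeg-8d-i486-1

Because grep -l only prints the names of matching files, and each file is named after its package, the package names fall straight out of a single-pass query.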

For all package formats, src2pkg tries to concentrate heavy workloads on the build side. Generating and parsing info at install-time can make package installation painfully slow -especially when installing a full distro. For instance, some packages contain a MANIFEST file which uses tar -vvt to create a data-rich listing. This is useful for being able to determine whether a file has changed after being installed. The listings can also include a checksum for each file in the package. This means we take the output of tar -tvv and combine the checksums with it, maybe re-arranging the data fields for readability. It gets quite lengthy, and for packages like webmin or the kernel sources it can take a half-hour just to generate the manifest. This problem led to src2pkg being able to omit the manifest for such packages, and to tpkg (the installer) not insisting on having the file. Anyway, if you were to create manifests at install-time -and there might be an advantage to this- the process could take a whole morning or day.
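A minimal sketch of that build-time approach (not tpkg's exact manifest format; the package name, staging directory and choice of md5 are illustrative assumptions):

Code: Select all

PKG=foo-1.0-i486-1.tar.gz       # hypothetical package
STAGE=/tmp/foo-1.0-stage        # hypothetical staging dir it was built from

# data-rich listing: permissions, owner, size, date, path
tar -tzvvf "$PKG" > "${PKG%.tar.gz}.MANIFEST"

# one "checksum  ./path" line per file, appended to the manifest
( cd "$STAGE" && find . -type f -exec md5sum {} + ) >> "${PKG%.tar.gz}.MANIFEST"

# later, the checksum lines allow verifying installed files, e.g.:
#   grep '^[0-9a-f]\{32\} ' manifest | ( cd / && md5sum -c - )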

tpkg uses a scalable database: manifests are optional. Not having them means less info is available -for instance, we can't check whether files have changed since installation. Since the manifests are the biggest files in the database, the space savings are significant. I really was thinking of Puppy (users) when I made them optional.

I'm gonna stop now, I hope you'll get some sleep if you haven't already done so. Your feedback is very valuable to me -not many folks can get their head around the subject, so this interaction with you is important to me.

wanderer
Posts: 1098
Joined: Sat 20 Oct 2007, 23:17

#17 Post by wanderer »

technosaurus

please do not be offended by my ignorance and intrusion
no need to answer this

but can you use flatpak

wanderer

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#18 Post by technosaurus »

Re: flatpak, I actually like the idea (most of it anyhow) but:
developed by people with a long involvement in the GNOME community
having spent time inside Red Hat working on container technologies
relies on certain session services, such as a dbus session bus and a systemd --user instance
I could go on.... so, just no. From portaudio to the gtk3 snafu to systemd, that team manages to take great boilerplate ideas and use their waterfall development system to turn them into something unrecognizable and awful. That is the complete opposite of how we do things here. Hell, half our ideas start off as crap or from a lack of awareness of a better existing solution, but fresh eyes and minds on an existing problem, with a different set of constraints, often bring about a better, more portable solution.

On a side note, I often intentionally box myself into unreasonably strict constraints - for shell scripts I limit myself to busybox compatibility and for C, I only use features supported by musl and uclibc and try to limit the number of dependencies.

@amigo
package file list =
tar -ztvf package.tar.gz

Having access to this information is critical if you want to be able to uninstall a package, and since most small distros will delete the package after installation, it needs to be kept somewhere. The easiest way is to keep one file listing per package, but that gets to be pretty cumbersome, and it isn't too hard to put them into a single file. Puppy had a file like this for years before I finally got tired of people complaining about not being able to uninstall packages - I think the shell script was in the single-digit lines of code, but now that I know awk, I could probably do it in a one-liner.

The great part about awk is the ease with which you can set record and field separators, even to multiple characters - Arch's package database format uses "\n\n" as the record separator and "\n" as the field separator, with the first field being its field type surrounded by "%%". I _could_ parse that with a shell script, but I would probably be the only one who could actually read it; with awk, it's pretty straightforward (besides, busybox shells don't support arrays).

Anyhow, back to file listings: busybox supports cksum, md5sum, sha1sum, sha256sum and sha512sum, so any metapackage format should provide each of those and let the downstream choose which to use, but the list should be "$Sum $File" on each line, because that format allows busybox to check them all with the "-c" flag. It doesn't have to be distributed in a separate file though - that's what here-docs are for. In fact, I can't see a reason not to use shell functions for each item, so that the package manager can just source it and run the functions. For example:

Code: Select all

# I forget the alias syntax, so just using a variable
# ($PkgName is assumed to be set by the package manager before sourcing this)
[ "$DEBUG" ] && heredoc="tee -a ${PkgName}.debug" || heredoc="cat"
# tee -a appends, so successive list_* calls don't overwrite the debug file

list_lib_deps(){
  $heredoc <<EOF
libc.so.1
libX11.so.1
EOF
}

list_bin_deps(){
  $heredoc <<EOF
X
wm
EOF
}

list_data_deps(){
  $heredoc <<EOF
myfont
thispackage-data
EOF
}

list_deps(){
  list_lib_deps
  list_bin_deps
  list_data_deps
}
The way I look at "provides" is as a way for distros that often split packages into separate lib, dev, bin, data/common, etc. packages to basically say: hey, it's OK if some of our package maintainers want to keep their package together. It can also serve a dual purpose for things like busybox, which could "provide" everything from ash to zcat for which it has a builtin applet, but which, being a multicall binary, cannot easily be split up (unless you want a package containing a single symlink).
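For the busybox case, that kind of 'provides' list could even be generated from the binary itself -a hedged sketch, assuming a busybox built with the --list feature and an illustrative output format:

Code: Select all

# emit one 'provides' entry per applet compiled into this busybox
list_provides(){
  busybox --list | while read -r applet; do
    echo "bin:$applet"          # the 'bin:' prefix is illustrative, not a standard
  done
}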

WRT multi-part fields, the human-readable way is to do something like Puppy: use '|' to separate fields, then ';' to separate items inside the fields, and then ',' or '\t' within those - but that's just from downloading several different package databases and counting ASCII characters; only Alpine's contained a '|' character, and it should have been a '/'. I suggest ';' for the second separator because that is what XDG uses in the desktop file spec... and then comma/tab for obvious reasons.
Keep in mind that ASCII has specific characters allotted to exactly this:

Code: Select all

28	1C	00011100	FS	file separator
29	1D	00011101	GS	group separator
30	1E	00011110	RS	record separator
31	1F	00011111	US	unit separator
However, I don't know how well most text editors would handle them.
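To show the record/field separator flexibility mentioned above, here's a minimal sketch of pulling an Arch-style entry apart with POSIX awk's paragraph mode (the 'desc' filename and the FIELD=value output are just illustrative):

Code: Select all

# records are blank-line-separated blocks; the first line of each block is
# the field name wrapped in '%', the remaining lines are its values
awk -v RS= -v FS='\n' '
  {
    gsub(/%/, "", $1)                 # strip the %...% around the field name
    for (i = 2; i <= NF; i++)
      printf "%s=%s\n", $1, $i        # emit FIELD=value, one per line
  }' desc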
Check out my [url=https://github.com/technosaurus]github repositories[/url]. I may eventually get around to updating my [url=http://bashismal.blogspot.com]blogspot[/url].

amigo
Posts: 2629
Joined: Mon 02 Apr 2007, 06:52

#19 Post by amigo »

Surely you didn't think I don't know how to list files in a package...? The list of files contained in a package is the first, most basic bit of data we can have. Can you imagine a package manager, or anything that calls itself a package manager, which would be *unable* to produce a package file-list or remove a package? Oh wait, there is a distro that is famous for this... Still, I have some theoretical examples that are way more absurd.

I really re-posted here just to include the text of an email I just got from the rox-users mailing list -some very interesting and pertinent topics they are going to explore:
Message: 1
Date: Sat, 4 Nov 2017 14:10:40 +0100
From: Liam Proven <lproven@gmail.com>
To: ROX Desktop Mailing List <rox-users@lists.sourceforge.net>
Subject: [rox-users] FosDem
Message-ID:
<CAMTenCH=dOEUh45U3GBMbTU=CO+PMjsZTtH-xT_qxucwrKcaCA@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"

Will any ROX people be at FOSdem? Prime material for a presentation, I think...

Liam Proven • Profile: https://about.me/liamproven
Email: lproven@cix.co.uk • Google Mail/Talk/Plus: lproven@gmail.com
Twitter/Facebook/Flickr: lproven • Skype/LinkedIn/AIM/Yahoo: liamproven
UK: +44 7939-087884 • ČR/WhatsApp/Telegram/Signal: +420 702 829 053

---------- Forwarded message ----------

Online at:
https://lists.fosdem.org/pipermail/fosd ... 02648.html

The Distributions devroom will take place Sunday 4 February 2018 at
FOSDEM, in Brussels, Belgium at the Université Libre de Bruxelles.

For this year's distributions devroom, we want to focus on the ways that
distribution technologies can be leveraged to allow for easier
creation of a multi-verse of artifacts from single source trees. We also
want to continue to highlight the huge efforts being made in shared
environments around Build/Test/Release cycles.

We welcome submissions targeted at contributors interested in issues
unique to distributions, especially in the following topics:

- Distribution and Community collaborations, eg: how does code flow from
developers to end users across communities, ensuring trust and code
auditability

- Automating building software for redistribution to minimize human
involvement, eg: bots that branch and build software, bots that
participate as team members extending human involvement

- Cross-distribution collaboration on common issues, eg: content
distribution, infrastructure, and documentation

- Growing distribution communities, eg: onboarding new users, helping
new contributors learn community values and technology, increasing
contributor technical skills, recognizing and rewarding contribution

- Principles of Rolling Releases, Long Term Supported Releases (LTS),
Feature gated releases, and calendar releases

- Distribution construction, installation, deployment, packaging and
content management

- Balancing new code and active upstreams versus security updates, back-porting
and minimization of user-breaking changes

- Delivering architecture independent software universally across
architectures within the confines of distribution systems

- Effectively communicating the difference in experience across
architectures for developers, packagers, and users

- Working with vendors and including them in the community

- The future of distributions, emerging trends and evolving user demands
from the idea of a platform

Ideal submissions are actionable and opinionated. Submissions may
be in the form of 25 or 50 minute talks, panel sessions, round-table
discussions, or Birds of a Feather (BoF) sessions.

Dates
------
Submission Deadline: 03-Dec-2017 @ 2359 GMT
Acceptance Notification: 8-Dec-2017
Final Schedule Posted: 15-Dec-2017

How to submit
--------------
Visit https://penta.fosdem.org/submission/FOSDEM18

1.) If you do not have an account, create one here
2.) Click 'Create Event'
3.) Enter your presentation details
4.) Be sure to select the Distributions Devroom track!
5.) Submit

What to include
---------------
- The title of your submission
- A 1-paragraph Abstract
- A longer description including the benefit of your talk to your target
audience, including a definition of your target audience.
- Approximate length / type of submission (talk, BoF, ...)
- Links to related websites/blogs/talk material (if any)

Administrative Notes
----------------
We will be live-streaming and recording the Distributions Devroom.
Presenting at FOSDEM implies permission to record your session and
distribute the recording afterwards. All videos will be made available
under the standard FOSDEM content license (CC-BY).

If you have any questions, feel free to contact the
devroom organizers: distributions-devroom@lists.fosdem.org
(https://lists.fosdem.org/listinfo/distributions-devroom)

Cheers!

Brian Exelbierd (twitter: @bexelbie) and Brian Stinson (twitter:
@bstinsonmhk) for and on behalf of The Distributions Devroom Program
Committee
_______________________________________________
rox-users mailing list
rox-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rox-users

End of rox-users Digest, Vol 78, Issue 1

amigo
Posts: 2629
Joined: Mon 02 Apr 2007, 06:52

#20 Post by amigo »

Just remembering that when src2pkg generates a *.desktop file, when none is present in the sources, it auto-detects some of the categories, like GTK and Qt. Since by that point during package creation src2pkg already knows which packages are needed by the new package, it checks whether those include gtk or other GUI libraries and adds the corresponding categories to the new *.desktop file. I do see the utility of desktop files/specs -at least from the point of view of desktop environments.

I'm pretty sure that desktop-file-validate 'dislikes' *.desktop files which have no categories anyway, so I had to be able to discover some category for the package. BTW, the absence of any dependence on GUI libraries indicates to src2pkg that the package being created probably doesn't need a *.desktop file at all. While it might be possible to auto-assign other categories to packages, each bit of info would need to be found or generated. Of course src2pkg will use existing *.desktop files, if wanted. They are checked for validity and edited to meet the specs for modern *.desktop files.

That said, lots of info which can be had at package build-time would serve no sane purpose if passed through *.desktop files to the run-time side of things like a Desktop Environment.

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#21 Post by technosaurus »

amigo wrote:Surely you didn't think I don't know how to list files in a package...? The list of files contained in a package is the first, most basic bit of data we can have. Can you imagine a package manager, or anything that calls itself a package manager, which would be *unable* to produce a package file-list or remove a package? Oh wait, there is a distro that is famous for this... Still, I have some theoretical examples that are way more absurd.
Half of my posts are more of a note to myself than anything else. I forget simple things like that all the time (I still have to use stackoverflow almost every time I do a complex regex or any non-standard printf format). The rest of the post follows my broken train of thought: if we want to ensure each file is un-corrupted, a tarball listing won't be enough, since it doesn't include checksums. I have been browsing the repositories of different distros and really like the idea of having the package info in a separate file from the package; Tiny Core, for instance, uses squashfs packages and keeps the package info in a separate file. That seems to be the simplest way. I like simple.
I really re-posted here just to include the text of an email I just got from the rox-users mailing list -some very interesting and pertinent topics they are going to explore:
Message: 1
Date: Sat, 4 Nov 2017 14:10:40 +0100
From: Liam Proven <lproven@gmail.com>
To: ROX Desktop Mailing List <rox-users@lists.sourceforge.net>
Subject: [rox-users] FosDem
Message-ID:
<CAMTenCH=dOEUh45U3GBMbTU=CO+PMjsZTtH-xT_qxucwrKcaCA@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"

Will any ROX people be at FOSdem? Prime material for a presentation, I think...

Liam Proven • Profile: https://about.me/liamproven
Email: lproven@cix.co.uk • Google Mail/Talk/Plus: lproven@gmail.com
Twitter/Facebook/Flickr: lproven • Skype/LinkedIn/AIM/Yahoo: liamproven
UK: +44 7939-087884 • ČR/WhatsApp/Telegram/Signal: +420 702 829 053

---------- Forwarded message ----------

Online at:
https://lists.fosdem.org/pipermail/fosd ... 02648.html
...
I wonder if a live video feed / recorded lightning talk would be a possibility, since there's no way I could make it to Brussels. Rox specifically could benefit from the mime-type + package manager integration parts. I'd also like to integrate the suggests/recommends items into the start menu via a right-click menu... I wonder if I should submit a rewritten version of my previously mentioned Rox start menu? I stopped working on it after realizing that the XDG spec for .directory files was ambiguous to the point that there is very little cross-distro compatibility, and the icon naming spec and desktop spec need some additional work to sync them up, but I don't recall the exact details. Though it would probably be better to patch Rox to properly handle mimeapps.list according to the spec... instead of (or in addition to) a directory full of shell scripts or links for handlers, use:

Code: Select all

[Default Applications]
mimetype1=default1.desktop;default2.desktop;

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#22 Post by technosaurus »

amigo wrote:That said, lots of info which can be had at package build-time would serve no sane purpose if passed through *.desktop files to the run-time side of things like a Desktop Environment.
I must have miscommunicated something. What I meant was to use much of the data from a package's .desktop file(s) in the package database entry, if it has one. It doesn't even need to be in the same format, just a format that is easily usable by the programming languages commonly used by package managers; thus it should be easy to parse in shell, awk, C/C++, Lua or Tcl, and easily dumped to an sqlite db. Amazingly enough, that is something Puppy's format is really good at, though tab separators would be more conventional. Rather than having umpteen (possibly unused) additional fields for translations of each translatable field, though, I would generate a separate database file for each language and use Google Translate (or similar) to fill in any gaps.

P.S. I think the desktop file should split Mimetype into multiple fields like read, write and edit (maybe more), so that right-click menus could properly deduce whether an application can edit, only view, or only convert to a specific mimetype... which would also trigger an update to the mime-apps spec.

On a similar note, dependencies are not specific enough either. It's not just build- vs run-time: there is no way to tell whether a missing dep will simply reduce functionality or render the program inoperable... which goes toward the ambiguity of suggests, recommends and other similar terms. There is also some difference between binary, library and data dependencies, as well as module and plugin dependencies, which vary across programming languages. Finding those outside of the package build environment can become extremely complex - from parsing the output of strace to parsing an entire shell script to find out what binaries it calls that aren't shell functions or builtins (how many shell scripts do you think leave sed or grep off their dependencies even though they use them?). This was (is) a big problem with user-built pet packages, because they often only specify the dependencies that are not in the standard Puppy (if any at all).

Edit: Note to self - Look into getting a cert for signing packages... and figure out how to sign them.

amigo
Posts: 2629
Joined: Mon 02 Apr 2007, 06:52

#23 Post by amigo »

Yes, package info is best kept in a separate file. With care, this can be formatted to be sourceable -just a bunch of variables which need no parsing, etc.

The list of files included in a package should also be in separate files -and in their own subdir for easy, one-pass grepping to find out which package owns a certain file. This is used at build-time of new packages to determine dependencies (requirements).

These requirements are also best kept in their own file, in their own directory. This makes queries for reverse-depends a single-pass grep operation and also means that the requirements of a package are readily listed.

All file lists are simply that, without any comments or blank lines, so they can be 'while read ... do'-ed -hehe. The list-of-all-files should be included in the package -lest anyone get the idea they could simply keep the package around, list it (the package itself), and use that for removepkg functions. The list-of-all-files is the basic unit of real package 'management'.
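A minimal sketch of that 'while read' style in a removal routine (the database path and the assumption that list entries are paths relative to / are mine, not tpkg's spec):

Code: Select all

remove_pkg(){
  list=/var/lib/pkgdb/file-lists/"$1"       # hypothetical location
  [ -f "$list" ] || { echo "no such package: $1" >&2; return 1; }
  # delete the files themselves
  while read -r f; do
    rm -f "/${f#./}"
  done < "$list"
  # then prune any directories the removal left empty
  while read -r f; do
    rmdir -p "$(dirname "/${f#./}")" 2>/dev/null
  done < "$list"
  rm -f "$list"
}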

Very important: dependency/requires lists must be generated at build-time, on a native build system. Otherwise they can never be depended on to be accurate. Period. How specific and detailed the info is, is debatable. I mentioned that tpkg does away with any >= or <= definitions -as a way of avoiding that next level of dependency hell. src2pkg uses a bash patch which can read scripts and determine which external programs they call; these are then tracked by package, so a script which is part of a package will have its deps checked and recorded. This is not perfect, but still pretty killer.
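A rough approximation of that idea without patching bash (this is not src2pkg's method, just a sketch; it will over-match on words that merely happen to be command names):

Code: Select all

script=./some-build-script.sh      # hypothetical input script
tr -cs 'A-Za-z0-9_.+-' '\n' < "$script" | sort -u | while read -r w; do
  case $w in ''|-*) continue ;; esac
  p=$(command -v "$w" 2>/dev/null) || continue
  case $p in
    /*) echo "$w => $p" ;;         # resolves to a path => external program the script calls
  esac
done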

Degrees of dependency: for most PMs, a dependency is something that must be available for the program to successfully start -or run effectively. Obviously, the first usually means some library that a program must have in order to start and run. The second part, 'run effectively', means something like the program 'man', which *must* have the executable 'groff' in order to do anything useful. 'suggests' is used for things like binary plugins, runtime options, or accessory scripts which aren't really part of the package's main functionality.

The tpkg system is scalable in that one can choose to include, or not, manifest files. These are generated at package build-time using the long output of 'tar', but re-arranging the data fields while, at the same time, generating hashes of each file in the package and adding that data field to the manifest entries. Again, the info can only be valid and believed if generated at build-time and included in the package. This provides a way to check, at any time, whether a file on the system has been modified. The scalability is that the user can configure tpkg/tpm to discard the manifests on installation. They are the largest files in the database.

Usually, lists of included files and dependencies do *not* even fill 2K of file size. However, this info provides most of the data needed for queries and package management. It's worth every bit of 'wasted' space to be able to easily and *very* quickly list files or cross-reference deps and reverse-deps.

"every time I do a complex regex or any non-stanard printf format" Well, I really dislike code which can't be easily read and understood and eventually, fixed or modified. I really hate regex and avoid it if possible. A tiny bit of egrep-regex and with sed. But even sed lets us use '-e' to separate different passes. All my packaging tools, end-to-end, use grep/fgrep often, sed rarely and awk never. I admire awk, but I don't think it should be required for handling or building packages.

Build-time dependencies are a bit harder to get, beyond deducing that the build requires the devel files of any libs it needs. Pretty sure most build systems, like src2pkg, use at least some manual input or reference lists. src2pkg runs potentially *lots* of commands figuring out what the sources consist of and how to get them to configure and build. It does track some of its success -being able to write the build script for you when an auto-build succeeds. No routines yet for writing out/amending build-depends from that.

T2 has a tool which parses the output from configure scripts/cmake, etc. to determine header calls. That tool is the big 'gem' from T2 -along with their lua-bash plugin. I have lua-bash working here. Could a real C guru like yourself pull that configure-parser out to stand alone? Probably a 15-minute job, no?

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#24 Post by technosaurus »

Crap... just found this
