PPKG C-Programming Contest

For discussions about programming, programming questions/advice, and projects that don't really have anything to do with Puppy.
Post Reply
Message
Author
User avatar
Wosh
Posts: 60
Joined: Mon 09 Jul 2007, 19:20
Location: Nuremberg Germany

PPKG C-Programming Contest

#1 Post by Wosh »

Preface
This topic originally started in the Puppy Projects area:
http://www.murga-linux.com/puppy/viewtopic.php?t=47763

But there were two topics mixed:
1. PPKG Puppy Linux Package Handler
2. PPKG C-Programming Contest

So I have decided to split them: All questions about the PPKG Puppy Linux Package Handler are handled in the above URL.

This topic continues with the C-Programming Contest that should mainly regarded as fun, though it's results may contribute to make Puppy Linux programs more effective.

How it started
Playing around with Woof I decided to create a Debian based Puppy Linux. Everything run rather smoothly, congrats Barry, until it came to the conversion of the Debian-Package-Info files to the Puppy-Linux-Database-Package files. On my Intel Pentium-4 3GHz it took more than 1 hour for this conversion.

After a look into the shell script "0setup" that does the conversion, the reason for this behavior was found fast. To generate one database line for Debian the script must do this number of calls:
echo: 12, grep: 9, cut: 8, sed: 16, head: 3, tr: 2, tail: 1 find_cat: 1.
All commands together must be controlled by bash that has to use 36 pipes to transfer the results from one program to the other.

As the Debian Lenny main-package-file contains more than 22000 packages it is clear that there should be found another approach to solve this problem spoiling so much fun in Woof and the Pet Package Manager.

The Program
As I thought that it's not a big thing to do the conversion I wrote a small C-program. I called it ppkg (Puppy-Package-Handler) in the style of Debians dpkg.

After a few hours the code for the basic conversion was written, but then I realized that the problem wasn't as easy to fix as I had thought first for one simple reason:
The Debian packages must be assigned to the menu categories in Puppy Linux. As there is no real clue how that can be done, Barry has written a vala application called find_cat that tries to categorize the packages. To achieve it's goal it has to speculate on keywords in the package description. The results tend to get faulty and so I chose another approach. I created a new kind of simple file called puppy-category-file (extension ctg), that assigns the packages to their category. It looks like:

[Category-Name]
package-1
package-2
...
Using this concept I created the category file lenny.ctg, which contains the category assignment cut out of the puppy-package-database-files created by 0setup and find_cat as I wanted to compare the results of my program to those of 0setup.

As described before this assignment tends to be faulty and should be revised manually. A good job for someone who thinks, "I can not program but I want to contribute to Puppy Linux", would be to optimize this precategorization.

Let's go on. After I had solved this job there was only one more thing to do:
Exclude debug-packages via an option which wasn't a big thing. After completion, I compared the results of ppkg with those of 0setup and they really looked same except of one difference which I consider a bug in 0setup. 0setup cuts the symbols []()- from the package descriptions. Example:

ppkg: [incr Tcl] OOP extension for Tcl - development files
0setup: incr Tcl OOP extension for Tcl development files

Driven by Ambition
Now it may seem the problem was solved. But it wasn't for me. Maybe because of the poor performance of the original my programmer's ambition had been raised. I took it as a sporting task to create something that no real live programmer's client ever would pay (except maybe the owner of a search engine):

I decided to create:
The fastest possible Debian-to-Puppy-Package-Info-Converter that I can write in C.

So I used the most advanced software-techniques that I have in my repertoire to achieve the goal:
1. Buffering input and output
2. Avoiding system calls wherever possible
3. Using pointers to mark text segments
4. Minimizing copying
5. Creation of appropriate types
6. Creation of conversion tables
7. Using quicksort and binary search algorithmns
8. Optimizing loops
9. Anticipating the compiled machine code
10. ...

I maybe 7 or more times revised each line of code and maybe created as many different prototypes until I was satisfied. It has consumed around four days of my leisure time until the program became as it is now.

The result is meagre, at least regarding the 10K size of the stripped program.
So what about the performance that originally was the reason to create this program?
To compare the performance of ppkg and 0setup I wrote a small wrapper that measures time before and after a system call to the tested program. In addition it also can repeat this several times to minimize OS influences. I have called it prc (performance-check), syntax:

prc n program parameter-1 parameter-2 ... (n is the number of times to loop)

To start with fair conditions I converted 0setup to 0setup-cut and removed those parts that do not concern the database conversion. Then both programs had to complete this task:

Convert Packages-Lenny-Debian size: 24,358,983 Bytes
to Packages-Lenny-Puppy size: around 4,670,000 Bytes

Command lines:

0setup-cut debian Packages Packages-debian-lenny-main
ppkg -c debian -i lenny.ctg -x debug Packages Packages-debian-lenny-main

The result was: ppkg is slightly faster.

For those interested in dry statistics:
0setup-cut: 3,746,641 ms = 1 hour 2 minutes 26 seconds 641 ms
ppkg: 355 ms
Result: ppkg is 10,553 times faster than 0setup.

I have decided to donate ppkg to the the Puppy Linux community for these reasons:
1. I want to contribute to make this really witty distribution still a bit more witty.
2. I want to show you an example how a task can be solved using advanced software techniques
3. I want to promote the use of C in the Puppy Linux community especially for time consuming demands
4. I want to encourage people to write quality code (a lot of the open source code looks rather poor)
5. You can test your C programming skills. If you understand all concepts used straightaway then you might be a candidate for the next point.
6. I want to start a programming contest

The Contest
ppkg has been created as an all-out effort of my software techniques. Me personally can not solve this task in a more efficent way. The task itself is a good possibility to test software skills because it is not to trivial and also not to complicated. The basic demand is to convert Debian-Package-Infos like:

Package: abiword
Priority: optional
Section: editors
Installed-Size: 7352
Maintainer: Masayuki Hatta (mhatta) <mhatta@debian.org>
Architecture: i386
Version: 2.6.4-5
Replaces: abiword-gnome
Provides: abiword-gnome
Depends: libaiksaurus-1.2-0c2a (>= 1.2.1+dev-0.12), libaiksaurusgtk-1.2-0c2a (>= 1.2.1+dev-0.12)
Recommends: abiword-plugin-grammar, abiword-plugin-mathview, abiword-help
Suggests: abiword-plugin-goffice
Conflicts: abiword-gnome
Filename: pool/main/a/abiword/abiword_2.6.4-5_i386.deb
Size: 2882324
MD5sum: 7fabfdf5ea014d67541441b930674ff0
SHA1: 792d8d83177ef23cc802b7c249b47b12fa797031
SHA256: c642cd84e17d9e0e88c539f10e812ea187d8a2861acc2566ff1ca21a555ead3d
Description: efficient, featureful word processor with collaboration

to Puppy-Package-Database rows:
abiword_2.6.4|abiword|2.6.4|5|Document|7352K|pool/main/a/abiword|
abiword_2.6.4-5_i386.deb|+libaiksaurus-1.2-0c2a,+libaiksaurusgtk-1.2-0c2a,
+libart-2.0-2|efficient featureful word processor with collaboration|

I personally have reached my limits, I can not perform this task in a faster way than I have done in the released source code. But maybe you can. For you the task even might be a bit more easy. You do not need start from the scratch, though it maybe could be better to start without the influence of my implementation.

The challenge is:
Can you do the conversion of the Lenny packages with a better performance than me. Do you know methods that outperform this program or is it really the limit that can be reached with the algorithms available at present.

Conditions
1. Everything must be coded in C (It would be to simple to optimize parts in assembler).
2. The program must supply a proper error handling.
3. That's it.

As I haven't got a sponsor yet the revenue will be glory:
You will get an entry in the Hall-of-Fame of the Puppy Linux C-Programmers Champions-League that I lead unchallenged at the moment. As gamblers like to reach gigantic high scores I wanted to use the speed win to 0setup as reference. But it turns out to become to boring if you always have to wait one hour for the comparision result. So the high score will be expressed as the performance relation of ppkg Version 0.0.0 to your implementation times 1000.
So here it is:

Puppy Linux C-Programmers Champions-League
Hall-of-Fame

Name...............Date.......................Score
1. Wosh............2009 Oct 13...........1000


I hope some people out there like like this kind of contest. If not I will live a while under the illusion to have written the best solution world wide for this problem. And if you really beat me, I will take my hat off to you.

For those who can not or do not want to compete, there's still an interesting job to do I am not really interested in:

ppkg at it's present state can only convert Debian and Ubuntu packages. Using the concepts and algorithms I used for Debian it shouldn't be a big problem to realize the Slackware or Arch conversion thus helping Barry in his really open minded efforts to democratize a operating system.

The appended software includes all files needed to compile the program and do the described tests.

The Lenny Package-Info-file may be downloaded from:
http://http.us.debian.org/debian/dists/ ... ckages.bz2

All files needed for the contest may be downloaded from:
http://www.murga-linux.com/puppy/viewto ... h&id=22627

Cheers
Wosh

Post Reply