All times are UTC - 4 |
musher0
Joined: 04 Jan 2009 Posts: 14529 Location: Gatineau (Qc), Canada
|
Posted: Wed 14 Feb 2018, 06:06 Post subject:
|
|
Hello everyone.
I'll say it politely, but I am fuming:
I just tested them on the pekwm xml material, and xmlstarlet, xml2 and the like
are a complete waste of time and intelligence when what you want is a complete
txt file from a complete xml file.
Those utilities are basically designed to extract precise data from xml files. You
want the whole thing, you're out of luck.
This is nothing personal addressed at any of you nice people who shared your
findings. Again, thanks.
But how is it that no one in the (Linux only?) world ever thought that a complete
xml to txt conversion utility might someday become a need? Flabbergasting!!!
I'm going back to my scripts!!!
BFN.
_________________ musher0
~~~~~~~~~~
Je suis né pour aimer et non pas pour haïr. (Sophocle) /
I was born to love and not to hate. (Sophocles)
puppy_apprentice

Joined: 07 Feb 2012 Posts: 300
|
Posted: Wed 14 Feb 2018, 07:48 Post subject:
|
|
Have you used the DocBook XSLT stylesheets with those tools? An XSLT stylesheet is a set of rules that helps convert an XML file into another document.
https://www.oxygenxml.com/forum/topic10767.html
Code: | xml tr oxygen.xsl your-xml.xml >test.txt |
where oxygen.xsl is:
Code: |
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>
<xsl:template match="text()[string-length(normalize-space()) = 0]">
<xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="@*"/>
</xsl:stylesheet>
|
The output may not be as beautiful as yours, but maybe it is time to learn XSLT and write a much better stylesheet.
Or try Pandoc:
https://pandoc.org/demos.html
Example 31:
Code: | pandoc -f docbook -t markdown -s howto.xml -o example31.text |
Last edited by puppy_apprentice on Wed 14 Feb 2018, 12:06; edited 3 times in total
jamesbond
Joined: 26 Feb 2007 Posts: 3387 Location: The Blue Marble
|
Posted: Wed 14 Feb 2018, 11:58 Post subject:
|
|
My apologies for offering the wrong tool for the job. I didn't read the first post carefully.
All that xml2 does is flatten the .xml files so you can process them further with the familiar awk/sed/grep set of tools (which are line-based and cannot work with hierarchical structures, which is what .xml files are). It's a generic tool to pre-process generic .xml files for further processing.
You need a tool to convert a bunch of very specific .xml files (that is, pekwm doc files), in a very specific way, into text files. This is a very specific requirement for which no generic tool would do, or exist. I believe that your script is the first tool ever created to accomplish that job, and being the only tool in its class, I would say it's the best tool there is.
@puppy_apprentice: The pekwm doc is written in an old version of DocBook. With a newer DocBook all you need is an XSLT processor (the one from xmlstarlet should do) and the DocBook XSL stylesheets; but pekwm uses DSSSL, which requires the OpenJade tool to convert it into PDF or HTML. The author of pekwm said so himself: https://github.com/pekdon/pekwm/blob/master/doc/tools/mkdocs.sh.
_________________ Fatdog64 forum links: Latest version | Contributed packages | ISO builder
puppy_apprentice

Joined: 07 Feb 2012 Posts: 300
|
Posted: Wed 14 Feb 2018, 12:04 Post subject:
|
|
I believe it is no problem to convert those XMLs to HTML with a proper XSLT stylesheet (PDF is another story, but you can convert it to PDF using my CSS from Firefox via CUPS-PDF).
musher0
Joined: 04 Jan 2009 Posts: 14529 Location: Gatineau (Qc), Canada
|
Posted: Wed 14 Feb 2018, 15:41 Post subject:
|
|
Many thanks for the encouragement, guys.
Text format is two-thirds done!
Once they are finished, I will have to "attack" the original xml files with
puppy_apprentice's css code! (Again thanks.)
So I'm far from through yet, with this project. (Learning a lot along the way.)
I intend to send the final edit of the files back to the pekwm authors, and
hopefully they will like what they see.
~~~~~~~~~~~~~~
Not that puppy_apprentice is wrong, but James is also right: you can have an
excellent general tool, but there are always details to adjust. Not to mention
personal preferences of this author versus personal preferences of that other author.
General tools can and do save editors a lot of time, but the finishing touches always
have to be done by hand.
~~~~~~~~~~~~~~
Can I ask any of you guys a favor? For the last 4-5 days, I've been getting a
pekwm.org site that's for sale when I go there. But a week ago I could still access
the pekwm docs online. Could someone double-check? TIA.
BFN.
puppy_apprentice

Joined: 07 Feb 2012 Posts: 300
|
Posted: Wed 14 Feb 2018, 17:12 Post subject:
|
|
A few years ago I made a few XSLT stylesheets. So I looked at them again, read some Stack Exchange hints, and here is a solution:
Convert to text, save as text.xsl
Code: |
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" indent="yes" encoding="UTF-8"/>
<xsl:template match="title">
<xsl:value-of select="translate(text(), 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')"/>
</xsl:template>
<xsl:template match="para[1]">
<xsl:value-of select="normalize-space(text())"/>
</xsl:template>
<xsl:template match="para[2]/variablelist/title">
<xsl:value-of select="translate(text(), 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')"/>
</xsl:template>
<xsl:template match="para[2]/variablelist/varlistentry/term">
<xsl:value-of select="normalize-space(text())"/>
</xsl:template>
<xsl:template match="para[2]/variablelist/varlistentry/listitem/para">
<xsl:value-of select="concat(normalize-space(text()), '&#10;')"/>
<xsl:for-each select="itemizedlist/listitem/para">
<xsl:value-of select="concat('+ ', normalize-space(text()), '&#10;')"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
|
usage:
Code: |
xml tr text.xsl pek-xml.xml >test.txt
|
Convert to HTML, save as html.xsl
Code: |
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" indent="yes" encoding="UTF-8"/>
<xsl:template match="/">
<html>
<head>
<title>HTML Page from PekWM XML Docs</title>
</head>
<body>
<xsl:apply-templates/>
</body>
</html>
</xsl:template>
<xsl:template match="title">
<h1><xsl:value-of select="text()"/></h1>
</xsl:template>
<xsl:template match="para[1]">
<p><xsl:value-of select="text()"/></p>
</xsl:template>
<xsl:template match="para[2]/variablelist/title">
<h1><xsl:value-of select="text()"/></h1>
</xsl:template>
<xsl:template match="para[2]/variablelist/varlistentry/term">
<p><b><xsl:value-of select="text()"/></b></p>
</xsl:template>
<xsl:template match="para[2]/variablelist/varlistentry/listitem/para">
<p><i><xsl:value-of select="text()"/></i></p>
<ul>
<xsl:for-each select="itemizedlist/listitem/para">
<li><xsl:value-of select="text()"/></li>
</xsl:for-each>
</ul>
</xsl:template>
</xsl:stylesheet>
|
usage:
Code: |
xml tr html.xsl pek-xml.xml >test.html
|
Description: xstarlet and those 2 XSLT sheets | Filename: xstarlet.tar.gz | Filesize: 129.63 KB | Downloaded: 134 Time(s)
musher0
Joined: 04 Jan 2009 Posts: 14529 Location: Gatineau (Qc), Canada
|
Posted: Wed 14 Feb 2018, 19:24 Post subject:
|
|
puppy_apprentice,
your nick is misleading! ; ) With xml, you are top-notch!
Your code did the conversion to txt format instantly for all xml files in the theme
section of the pekwm docs, except the top one, the one with &bla-bla-bla; references in
it. I did that one by hand, but it was the shortest file of the bunch.
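A side note for anyone hitting the same snag: before fixing a file by hand, it can help to see which &name; references it actually contains. The sketch below is only an illustration. `list_entities` is a made-up helper name, and it assumes a grep that supports `-o` (GNU, BSD and busybox grep do). The five predefined XML entities are filtered out, since any XML parser already knows those.

```shell
#!/bin/sh
# list_entities FILE (hypothetical helper, not from this thread):
# print each distinct &name; reference found in FILE, one per line.
list_entities() {
    grep -o '&[A-Za-z_][A-Za-z0-9._-]*;' "$1" |
        grep -v -x -E '&(amp|lt|gt|apos|quot);' |   # drop the predefined XML entities
        LC_ALL=C sort -u                            # de-duplicate, stable byte order
}
```

Running `list_entities somefile.xml` on each doc shows at a glance which files need the by-hand treatment and which names have to be looked up in the sources.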
Impressive result attached (with source files).
I concatenated all resulting files in a main "theme.txt" file, keeping individual components.
My additional step after that was to use .
Many thanks.
BFN.
Filename: pekwm-theme-section.zip | Filesize: 14.15 KB | Downloaded: 66 Time(s)
puppy_apprentice

Joined: 07 Feb 2012 Posts: 300
|
Posted: Thu 15 Feb 2018, 04:49 Post subject:
|
|
Those two stylesheets need some tweaking. I made them especially for your first posted example. The other XML files have other tags and a slightly different structure, so they don't look as nice as expected. It is possible to write a more general stylesheet, but that is an exercise for others.
Some references:
http://www.xsltfunctions.com/xsl/
http://scraping.pro/5-best-xpath-cheat-sheets-and-quick-references/
And XMLStarlet only understands XSLT v1.0, so not all functions from those resources will work with it.
So we learn here some bash, some CSS, XML, XSLT, XPath. Nice!
Description: Destination output for all XML files should look more like this. | Filename: test.tar.gz | Filesize: 3.76 KB | Downloaded: 111 Time(s)
puppy_apprentice

Joined: 07 Feb 2012 Posts: 300
|
Posted: Thu 15 Feb 2018, 17:48 Post subject:
|
|
OK musher0, I've found proper XSLT stylesheets for the PekWM docs.
Unzip the archive and go to the xsl folder. Read the info file for instructions. Now you will see nice html docs for the theme section. Could you check the rest of the PekWM xml docs files?
Description: structure.html from structure.xml | Filesize: 53.61 KB | Viewed: 208 Time(s)
Description: I think that I found proper XSLT sheets for PekWM XML docs. | Filename: looks-good.tar.gz | Filesize: 181.09 KB | Downloaded: 113 Time(s)
musher0
Joined: 04 Jan 2009 Posts: 14529 Location: Gatineau (Qc), Canada
|
Posted: Thu 15 Feb 2018, 19:32 Post subject:
|
|
Hi, puppy_apprentice.
You did indeed get beautiful results.
I am using your main.xsl file within the script below in dir /usr/share/doc/pekwm, crudely
drilling down into the subdirs to get the results. My only worry is that there might be
links within the docs in the subdirs.
Talk with you later.
~~~~~~~~~~
Code: |
#!/bin/sh
# formula-PA.sh
# Convert every .xml file in the current directory to text and HTML.
####
for f in *.xml; do
    [ -e "$f" ] || continue   # no .xml files here: skip the unmatched pattern
    doc=${f%.xml}             # strip only the .xml suffix, keep any other dots
    # replaceit --input="$doc.xml" "&" "-+- "
    # puppy_apprentice's formula, from the Puppy forum
    xml tr ~/my-applications/text.xsl "$doc.xml" > "$doc.PA.txt"
    xml tr ~/my-applications/main.xsl "$doc.xml" > "$doc.PA.html"
done
|
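Since the script above only handles the current directory, here is one possible way to cover the subdirs in a single pass instead of cd-ing into each one. It is a sketch only: `plan_conversions` is a made-up name, and it just prints the source and target paths so the plan can be checked before anything is converted.

```shell
#!/bin/sh
# plan_conversions DIR (hypothetical helper, not part of formula-PA.sh):
# print "source<TAB>target" for every .xml file found under DIR.
# Assumes file names without newlines.
plan_conversions() {
    find "$1" -type f -name '*.xml' | LC_ALL=C sort |
    while IFS= read -r src; do
        # ${src%.xml} removes only the trailing .xml, so a.b.xml -> a.b.PA.txt
        printf '%s\t%s\n' "$src" "${src%.xml}.PA.txt"
    done
}
```

Each printed pair can then be fed to the same `xml tr ~/my-applications/text.xsl SOURCE > TARGET` call the script already uses.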
musher0
Joined: 04 Jan 2009 Posts: 14529 Location: Gatineau (Qc), Canada
|
Posted: Thu 15 Feb 2018, 20:16 Post subject:
|
|
Almost forgot: if the pekwm.org site stays down, it will be important to have those
docs in html up on another site.
I will send a PM to forum member augras to see if it is possible on his augras.eu site,
where he is hosting some of my stuff.
But where does not really matter. I am thinking that we would need some kind of
approval from the pekwm people. Maybe they can tell us what is really going on
with their site, too.
BFN.
musher0
Joined: 04 Jan 2009 Posts: 14529 Location: Gatineau (Qc), Canada
|
Posted: Thu 15 Feb 2018, 20:55 Post subject:
|
|
I should have provided these earlier, sorry.
Description: Complete pekwm docs in xml, from the source zip archive on github. | Filename: pekwm-docs.tar.gz | Filesize: 62.73 KB | Downloaded: 112 Time(s)
musher0
Joined: 04 Jan 2009 Posts: 14529 Location: Gatineau (Qc), Canada
|
Posted: Thu 15 Feb 2018, 22:01 Post subject:
|
|
Hello puppy_apprentice.
I have tried the system you suggested today (with the db2xhtml-master, etc.), and
got nowhere. Probably there is something I am not understanding.
However, I applied the tips you suggested yesterday and sometimes got pretty
good results. Two zip archives containing the full results of that run -- and the
script I used, are attached:
-- the *.PA.txt and *.PA.html files were created with your system. As you will see,
some have come out in outstanding fashion, others not so much, and others still
were not created at all.
My initial reaction would be to finish the job with a good html editor such as the
Kompozer in SeaMonkey.
But xml/html are more your field of expertise than mine, so if you can produce
the same quality of html as you have shown above, through your "xsl" conversion,
applying it on the full pekwm xml's (in the zip archive from the pekwm source,
above), I will certainly not complain!!! Your process is blazingly fast, but it takes
time and expertise to set up.
-- the plain *.txt files in the attached were created with my script. I have also
edited them, with the help of some GNU text utilities, and manually. I feel I have
almost finished. I will use the good text files obtained with your method as
a basis for comparison, and as filler if needed, during a couple of final editing sessions.
~~~~~~~~
In parallel, I have now evolved a reader script and a search script for the pekwm
doc in text format. A menu acts as the table of contents, and the "real less" utility
as the reader.
I think this reader and search system is exportable | adaptable to other big text
documents. I need another manual like the pekwm manual to test the scripts on
and confirm this.
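The reader and search scripts themselves are not posted in this thread, so the following is only a guess at the general idea, not the actual code. `search_docs` is a made-up name; it looks through the *.txt files under a directory for a fixed string, ignoring case, and prints file:line:text for each hit.

```shell
#!/bin/sh
# search_docs DIR TERM (hypothetical sketch, not the script described above):
# case-insensitive fixed-string search through every .txt file under DIR.
search_docs() {
    # /dev/null is passed as an extra file so grep always prints the file
    # name, even when find hands it a single file.
    find "$1" -type f -name '*.txt' -exec grep -i -n -F -- "$2" /dev/null {} +
}
```

Piping the result into the reader, as in `search_docs some-doc-dir focus | less` (directory name invented for the example), would give a crude search front end for any big set of text documents.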
~~~~~~~~
This ends my "status report". I hope it will help us collaborate on this pekwm
docs project.
BFN.
Filename: pekwm-d1.zip | Filesize: 135.16 KB | Downloaded: 67 Time(s)
Filename: pekwm-d2.zip | Filesize: 141.09 KB | Downloaded: 60 Time(s)
Description: Index for the two zip archives above | Filename: zipsplit.zip | Filesize: 1.52 KB | Downloaded: 63 Time(s)
puppy_apprentice

Joined: 07 Feb 2012 Posts: 300
|
Posted: Fri 16 Feb 2018, 13:29 Post subject:
|
|
I had to do some greps, cats, replaceits and xmls, and now everything is in HTML format.
There were some problems with lines that included the "&" sign, so I had to replace it in the XMLs first with "***AND***" and then, in the final HTML document, replace that with "&" again. There were also problems with the "<simplesect>" tag in the XMLs; I had to change those to "<section>" first.
Read the whole HTML documents, musher0, and correct any errors (there will be some variables starting with "&", like &copyright, so erase them or find out in the sources what those variables mean and replace them with the proper values).
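The exact commands are not shown in the post, so here is a rough sketch of how the "&" round trip and the tag rename described above could be done with plain sed. The function names `protect_xml` and `restore_html` are invented for this example; replaceit or any other search-and-replace tool would do the same job.

```shell
#!/bin/sh
# Hypothetical helpers; the thread does not show the actual commands used.

# protect_xml: read XML on stdin, hide every "&" behind the ***AND***
# placeholder and rename <simplesect> to <section>.
protect_xml() {
    sed -e 's/&/***AND***/g' \
        -e 's/<simplesect\([ >]\)/<section\1/g' \
        -e 's/<\/simplesect>/<\/section>/g'
}

# restore_html: read the generated HTML on stdin and put the "&" back.
# In the replacement, \& is a literal ampersand (a bare & would mean
# "the matched text").
restore_html() {
    sed -e 's/\*\*\*AND\*\*\*/\&/g'
}
```

A possible pipeline would be `protect_xml < doc.xml > doc.tmp.xml`, then the usual `xml tr` step on doc.tmp.xml, then `restore_html` on the result (file names made up). The restored "&name;" variables still have to be checked by hand afterwards.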
Description: Happy reading | Filename: pekwm-docs-html.tar.gz | Filesize: 58.07 KB | Downloaded: 260 Time(s)
musher0
Joined: 04 Jan 2009 Posts: 14529 Location: Gatineau (Qc), Canada
|
Posted: Fri 16 Feb 2018, 17:38 Post subject:
|
|
Beautiful work, puppy_apprentice!
A thousand thanks for this!
I have referenced your layout on the pekwm thread, here.
BFN.