A bash script to convert .xml files to .txt

musher0 · #1 Post by **musher0** » Mon 12 Feb 2018, 12:04

Hello all.

Please find below a script that I have created to transform the .xml files from the
pekwm documentation into plain text files. The result is available here. The
starting .xml material is in the doc directory of the pekwm source at github under
pekdon/pekwm.

I spent a couple of hours combing the Web, but could not find any satisfactory utility.
Most required Python; others required dishing out a lot of cash... I did find a nice little
Java applet in German, but it converts from xml to csv, which was not ideal.

Puppies do have the xmllint utility. It does a fair job of converting xml to html but
loses any paragraph structure. (A fine utility that is...) It was ok with the smaller
xml files in the pekwm docs, but not for the larger ones. This is the moment when
you realize that you are lost without the blank line separator!!!

So I took the bull by the horns, and decided to convert the pekwm docs to text
format using the replaceit utility and a couple of GNU standards.

This is the script. I know there are as many xml styles as there are stars in the
sky, but it may be useful to someone as a starting point.

It's still in crude form, I'm afraid. Any improvements and/or constructive
observations will be welcome.

BFN.

~~~~~~~~~~~~~~

Code: Select all

#!/bin/sh
# /opt/local/bin/xml2txt.sh
# (c) musher0, 12 February 2018. GPL3
#
# Usage: enter a directory where there are xml files and 
# issue the command < xml2txt.sh > (without the chevrons).
#
# All the xml files will be grouped in a text file
# bearing the name of the directory, and processed.
#
# Requires : replaceit.
#
# Comment: it leaves much to be desired, you will have to refine
# the output; but it does a basic job of eliminating most (?)
# of the XML tags.
#
####
name="`pwd | awk -F"/" '{ print $NF }'`"

cat -s *.xml > $name.tmp

RPLCT="replaceit --input=$name.tmp"

$RPLCT --wholeline simplesect "#";$RPLCT --wholeline listitem "#"
$RPLCT --wholeline varlistentry "#";$RPLCT --wholeline itemizedlist "#"
$RPLCT --wholeline variablelist "#";$RPLCT --wholeline xreflabel "#"
$RPLCT --wholeline dbhtml "#";$RPLCT "<para>" " "
$RPLCT "</para>" " ";$RPLCT formalpara " "
$RPLCT "<title>" " ¤¤ ";$RPLCT "</title>" " ¤¤ "
$RPLCT "<term>" " ¤ ";$RPLCT "</term>" " ¤ "
$RPLCT "<screen>" " -=- ";$RPLCT "</screen>" " -=- "
$RPLCT "<filename>" " ";$RPLCT "</filename>" " "
$RPLCT "</chapter>" "End of chapter"
$RPLCT "<chapter>" "Beginning of chapter"
$RPLCT "</note>" "End of note"
$RPLCT "<note>" "Beginning of note"
$RPLCT "</section>" "End of section"
$RPLCT "<section>" "Section:";$RPLCT "<partintro>" "Intro"
$RPLCT "</part" "#";$RPLCT "author>" "#"
$RPLCT "object>" "#";$RPLCT "subtitle>" " # "
$RPLCT "bookinfo>" "#";$RPLCT "abstract>" "#"
$RPLCT --wholeline "<authorgroup>" "#"

# Add more XML tags to be replaced or canceled out here.
# Simply follow the replaceit pattern above.

grep -vE "^#" $name.tmp | tr -s "\n" > $name.txt
# The commands in the line above drop the lines beginning
# with a "#" and then squeeze each run of newlines into one,
# which removes the truly empty lines.

rm -f $name.tmp # Obviously!
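
A quick way to see what that last pipeline does, as a sketch on made-up sample text (the file name demo.tmp is only for illustration): tr -s "\n" turns each run of newlines into a single newline, so truly empty lines vanish, while a line holding just a space (which is what the <para> replacements leave behind) survives and still separates paragraphs. cat -s, by contrast, keeps one empty line per run.

```shell
#!/bin/sh
# Sample standing in for $name.tmp after the replaceit passes (made up).
# Line 3 holds a single space; lines 5 and 6 are truly empty.
printf '%s\n' '#' 'First paragraph.' ' ' 'Second paragraph.' '' '' 'Third paragraph.' > demo.tmp

# Same filter as in the script: drop "#" lines, squeeze newline runs.
filtered=$(grep -vE '^#' demo.tmp | tr -s '\n')

# Alternative: cat -s keeps one empty line per run of empty lines.
squeezed=$(grep -vE '^#' demo.tmp | cat -s)

printf '%s\n' '--- tr -s:' "$filtered" '--- cat -s:' "$squeezed"
rm -f demo.tmp
```

Which of the two is preferable probably depends on whether the space-only lines are wanted as paragraph breaks in the final text.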

puppy_apprentice · #2 Post by **puppy_apprentice** » Mon 12 Feb 2018, 18:40

There are some XML tools like XMLStarlet:

http://xmlstar.sourceforge.net/overview.php

You can convert XML to HTML or TXT using XSLT sheet and XPath.

musher0 · #3 Post by **musher0** » Mon 12 Feb 2018, 20:37

Hi, puppy_apprentice.

Thanks but no thanks.

It's always the same thing with these apps that claim to convert from xml:
you need the style sheet for the xml file, and as I said, there are as many of
those as there are stars in the sky.

Of course the author of the original xml file seldom provides said style sheet. And
to really complicate the problem, some xml authors do not know their xml, they
create xml files with errors in them! Can you believe that? How is a simple non-xml
bloke like me supposed to react?

In short, any xml file sends me up the well-known creek without a paddle.

~~~~~~~~~~~~

Reading some of the unix and/or linux forums, I saw that some of those guys use
common tools like sed, grep or awk to make xml files readable by humans.
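
For illustration, here is a minimal sketch of that sed approach. It is not taken from any of those forums; the function name strip_tags is made up, and it assumes every tag opens and closes on the same line. It removes anything between < and >, decodes a handful of common entities afterwards (so that text such as &lt;width&gt; comes out as <width> instead of being deleted), and keeps one blank line where tag-only lines used to be.

```shell
#!/bin/sh
# strip_tags (hypothetical helper): read XML on stdin, write rough
# plain text on stdout. Assumes no tag spans more than one line.
strip_tags() {
    sed -e 's/<[^>]*>//g' \
        -e 's/&lt;/</g' -e 's/&gt;/>/g' \
        -e 's/&quot;/"/g' -e 's/&amp;/\&/g' \
        -e 's/^[[:space:]]*//' |
    cat -s    # tag-only lines are now empty: keep one per run
}

# Example: a one-line list item keeps its text.
printf '%s\n' '<listitem><para>Desktop</para></listitem>' | strip_tags
```

Comments, CDATA sections and tags broken across lines are not handled; a real XML parser would be needed for those.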

One of them remarked that xml works with pairs of <tag> and </tag>. Get it?

So in theory, you could come up with a script that scans the xml file for such tags,
notes what they are supposed to do, makes a little database of them, feeds them
to replaceit, maybe you have to use tr or awk or a similar utility to polish the
result, and voilà, now we would have a true text file readable by humans. Phew.
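
The first step of that idea, the "little database" of tags, could be sketched as below. This is only an assumption about how one might do it: the function name list_tags is made up, and grep -o is a GNU/BSD/busybox extension, not strict POSIX. It prints every element name found in the input with the number of times it opens, most frequent first.

```shell
#!/bin/sh
# list_tags (hypothetical helper): inventory of opening tags on stdin.
# Output: count, a tab, then the element name. Closing tags, comments
# and <?...?> instructions are ignored because they do not start
# with "<" followed by a letter.
list_tags() {
    grep -o '<[A-Za-z][A-Za-z0-9_:.-]*' |
    tr -d '<' |
    sort | uniq -c |
    awk '{ print $1 "\t" $2 }' |
    sort -k1,1nr -k2,2
}

# Example on a few made-up lines:
printf '%s\n' '<para>' '<listitem><para>x</para></listitem>' '</para>' | list_tags
```

Run over all the files (cat *.xml | list_tags), the list would show which tags still need a replacement rule.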

I hate xml. It was probably invented by some schizophrenic. No person with a sane
mind would have come up with such a contraption. IMO, xml is THE prime
example of: "why do simple when you can do complicated."

I think it's time to go to the pub with my good buddy Will Ockham, and send some
simple Belgian white down our gullets.

BFN.

puppy_apprentice · #4 Post by **puppy_apprentice** » Mon 12 Feb 2018, 21:29

Maybe you can apply CSS to those XML files and make them look nice in a browser?

https://www.w3schools.com/xml/xml_display.asp
https://www.htmlgoodies.com/beyond/css/ ... g-css.html
http://www.xmlplease.com/xml/xmlcss/
http://www.avajava.com/tutorials/lesson ... h-css.html
http://edutechwiki.unige.ch/en/CSS_for_XML_tutorial
https://www.quackit.com/xml/tutorial/xml_css.cfm
http://fabien.potencier.org/parsing-xml ... ctors.html

musher0 · #5 Post by **musher0** » Mon 12 Feb 2018, 22:54

Ok, fine, puppy_apprentice, so we need css files.

Why do the darn xml authors NOT provide them along with their *.xml files?
To prove they are superior to people who do not understand xml?

I notice that you yourself provided references to css files but NOT the actual stuff.

I'm still up that creek without a paddle.

I think I'm going to start the "DOWN WITH XML" movement !!!

Strip XML, strip XML, strip XML to txt, ra-ra-ra!!!

BFN.

puppy_apprentice · #6 Post by **puppy_apprentice** » Mon 12 Feb 2018, 23:38

From the first link:

Code: Select all

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="cd_catalog.css"?>
<CATALOG>
  <CD>
    <TITLE>Empire Burlesque</TITLE>
    <ARTIST>Bob Dylan</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>Columbia</COMPANY>
    <PRICE>10.90</PRICE>
    <YEAR>1985</YEAR>
  </CD>
  <CD>
    <TITLE>Hide your heart</TITLE>
    <ARTIST>Bonnie Tyler</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>CBS Records</COMPANY>
    <PRICE>9.90</PRICE>
    <YEAR>1988</YEAR>
  </CD>
.
.
.
</CATALOG>

where cd_catalog.css is:

Code: Select all

CATALOG {
    background-color: #ffffff;
    width: 100%;
}
CD {
    display: block;
    margin-bottom: 30pt;
    margin-left: 0;
}
TITLE {
    display: block;
    color: #ff0000;
    font-size: 20pt;
}
ARTIST {
    display: block;
    color: #0000ff;
    font-size: 20pt;
}
COUNTRY, PRICE, YEAR, COMPANY {
    display: block;
    color: #000000;
    margin-left: 20pt;
}

Try adding this to your xml files:

Code: Select all

<?xml-stylesheet type="text/css" href="my-xml.css"?>

where my-xml.css could be, e.g.:

Code: Select all

section {
    background-color: #ffffff;
    width: 100%;
}
chapter {
    display: block;
    margin-bottom: 30pt;
    margin-left: 0;
}
title {
    display: block;
    color: #ff0000;
    font-size: 20pt;
}
note {
    display: block;
    color: #0000ff;
    font-size: 20pt;
}
term {
    display: block;
    color: #000000;
    margin-left: 20pt;
}

The tags are taken from your bash script.
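
Adding that line by hand to many files gets tedious, so here is a hedged sketch of doing it with awk. The function name add_stylesheet and the styled- output prefix are made up for the example. The line is placed right after the <?xml ...?> declaration when the file starts with one, and on the first line otherwise; the function prints to stdout and leaves the original untouched.

```shell
#!/bin/sh
# add_stylesheet FILE CSS (hypothetical helper): print FILE with an
# xml-stylesheet line added. It does not check whether the file
# already has one.
add_stylesheet() {
    awk -v css="$2" '
        BEGIN { pi = "<?xml-stylesheet type=\"text/css\" href=\"" css "\"?>" }
        NR == 1 && /^<\?xml[ \t]/ { print; print pi; next }
        NR == 1 { print pi }
        { print }
    ' "$1"
}

# Possible use on a whole directory (writes copies, keeps originals):
# for f in *.xml; do add_stylesheet "$f" my-xml.css > "styled-$f"; done
```

Whether the browser then displays the file presumably still depends on the XML being well formed.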

slavvo67 · #7 Post by **slavvo67** » Tue 13 Feb 2018, 03:37

Musher0:

You may be better off with an "up with LibreOffice" movement. I think you can use LibreOffice or even AbiWord via the command line to do the conversions. Not sure how pretty the results would be, though.

Still, the more options, the better. - Thanks!

Slavvo67

musher0 · #8 Post by **musher0** » Tue 13 Feb 2018, 03:55

@slavvo67:
If the xml is malformed, OpenOffice won't take it on.

Once you have a good number of xml tags in your script, the conversion is not
that complicated.
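
To illustrate the point, here is a table-driven sketch. It swaps in sed for replaceit (so it is not the script below), and the table format and function names are made up. Each table line pairs a tag name with the marker that replaces it, and "-" means the tag is simply blanked out. One generated sed command handles <tag>, <tag attr="..."> and </tag> together.

```shell
#!/bin/sh
# tag_table (hypothetical): "tag marker" pairs, "-" = no marker.
# Markers must not contain "|", "&" or "\".
tag_table() {
    cat <<'EOF'
title ¤¤¤
term ¤
screen -=-
para -
filename -
EOF
}

# build_sed_script: one substitution per table line. The pattern wants
# ">" or a space right after the name, so "para" never hits "partintro".
build_sed_script() {
    tag_table | while read -r tag marker; do
        if [ "$marker" = "-" ]; then repl=" "; else repl=" $marker "; fi
        printf 's|</\\{0,1\\}%s\\( [^>]*\\)\\{0,1\\}>|%s|g\n' "$tag" "$repl"
    done
}

# xml_markers: apply the generated script to stdin.
xml_markers() {
    sed -e "$(build_sed_script)"
}

printf '%s\n' '<screen>ls -l</screen>' | xml_markers
```

Extending the conversion then means adding a line to the table, not another command to the script.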

Finally, there are so many GNU text tools we can use for this. I wonder why we
need xml anymore...

BFN.

~~~~~~~~~~

Second grind. Feedback most welcome.

Code: Select all

#!/bin/sh
# /opt/local/bin/xml2txt.sh ## 2nd grind
# Usage --
# Enter a directory where there are xml files and
# issue the command < xml2txt.sh > (without the chevrons).
#
# Explanation --
# All the xml files in the directory will be grouped in a text file
# bearing the name of the directory, and processed.
# If there is only one xml file, create a directory with a telling
# name, put the file in it, and use the script.
#
# Requires : replaceit.
#
# Comments --
# It leaves much to be desired: the output will have to be revised and
# refined; however it does the basic job of eliminating most XML tags.
#
# An "ultimate" version of this script would scan the file for XML tags,
# and create the routine that replaceit would process.
#
# Originally designed for the pekwm-0.1.18rc1 docs in xml format.
####
###############
# © Christian L'Écuyer, Gatineau (Qc), Canada, 12 February 2018. GPL3
# (Alias musher0 [forum Puppy].) # # https://opensource.org/licenses/GPL-3.0
#    This program is free software: you can redistribute it and/or modify it under the
#    terms of the GNU General Public License as published by the Free Software Foundation,
#    either version 3 of the License, or  (at your option) any later version.
#    	This program is distributed in the hope that it will be useful, but WITHOUT ANY
#    WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
#    A PARTICULAR PURPOSE. See the GNU General Public License for more details.
#    	You should have received a copy of the GNU General Public License along with
#   this program. If not, see <http://www.gnu.org/licenses/>.
###############
#   This program is free software: you may redistribute or modify it under the terms
#   of the GNU General Public License published by the Free Software Foundation (v. 3
#   or any later version of your choice).
#       This program is distributed in the hope that it will be useful, but WITHOUT ANY
#   WARRANTY, express or implied, including warranties of merchantability or fitness
#   for a particular purpose. For more details, please refer to the official text of
#   this license at https://opensource.org/licenses/GPL-3.0, to
#   http://www.linux-france.org/article/these/gpl.html for a French translation and, for
#   an explanation in French, to https://fr.wikipedia.org/wiki/Licence_publique_générale_GNU.
###############
####
name="`pwd | awk -F"/" '{ print $NF }'`"

cat -s *.xml > $name.tmp
# We could store this temporary file on a ramdisk
# during the process to speed things up.

RPLCT="replaceit --input=$name.tmp"

$RPLCT --wholeline simplesect "#";$RPLCT --wholeline listitem "#"
$RPLCT --wholeline varlistentry "#";$RPLCT --wholeline itemizedlist "#"
$RPLCT --wholeline variablelist "#";$RPLCT --wholeline xreflabel "#"
$RPLCT --wholeline dbhtml "#";
$RPLCT --wholeline "</abstract>" "#"
$RPLCT --wholeline "authorgroup>" "#"
$RPLCT --wholeline "object>" "#";$RPLCT --wholeline "<copyright" "#"
$RPLCT --wholeline "<year>" "#";$RPLCT --wholeline "<holder>" "#"
# $RPLCT --wholeline "           " "#" # NO
# $RPLCT --wholeline "</authorgroup>" "#"

$RPLCT "<abstract>" "Abstract"
$RPLCT "<ulink" " ";$RPLCT "</ulink>" " "
$RPLCT "<para>" " ";$RPLCT "</para>" " ";
$RPLCT formalpara " ";$RPLCT "&eacute;" "é"
$RPLCT "<title>" " ¤¤¤ ";$RPLCT "</title>" " ¤¤¤ "
$RPLCT "<subtitle>" " ¤¤ ";$RPLCT "</subtitle>" " ¤¤ "
$RPLCT "<term>" " ¤ ";$RPLCT "</term>" " ¤ "
$RPLCT "<screen>" " -=- ";$RPLCT "</screen>" " -=- "
$RPLCT "<filename>" " ";$RPLCT "</filename>" " "

$RPLCT "<chapter>" "Chapter"
$RPLCT "</chapter>" "End of chapter"

$RPLCT "<note>" "Note"
$RPLCT "</note>" "End of note"

$RPLCT "<section>" "Section"
$RPLCT "</section>" "End of section"

$RPLCT "<partintro>" "Intro"
$RPLCT "</part" "#";$RPLCT "author>" "#"
$RPLCT "object>" "#";$RPLCT "subtitle>" " # "
$RPLCT "bookinfo>" "#" # ;$RPLCT "abstract>" "#"
$RPLCT "<firstname>" " ";$RPLCT "</firstname>" " "
$RPLCT "<surname>" " ";$RPLCT "</surname>" " "
$RPLCT "legalnotice>" "#"

$RPLCT --wholeline "</#" "#";$RPLCT --wholeline "<#" "#"
$RPLCT --wholeline "</ #" "#";$RPLCT --wholeline "< #" "#"
$RPLCT --wholeline "</ >" "#"
# $RPLCT --wholeline "<!--" "#" # if needed
$RPLCT --wholeline "</copyright>" "#"

# Add more XML tags to be replaced or canceled out here.
# Simply follow the pattern above.

# Final polish
grep -vE "^#" $name.tmp | tr -s "\n" > $name.txt
rm -f $name.tmp

# Testing points:
# cat autoprops.txt | tr -s " " | tr -s "\n" | tr -s "\n" > autoprops.tx1
# would eliminate the tabs and more blank lines, but we would also lose some formatting.
# For advanced layout:
# test;while read line;do [ "$line" = "\s" ] || echo $line | tr -s "\n" >> test;done < advanced.txt
# >test1;while read line;do [ "${line:0:1}" = "\"" ] && echo -e "\t$line" >> test1 || echo $line >> test1;done < test
# fmt test1 > test2
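
The testing notes at the end of the script could be tidied into something like the following. This is only a sketch of the same idea in plain sh, not a tested replacement; the function name polish is made up. It squeezes runs of spaces, trims the space left at either end of a line, keeps a single blank line per run, and indents with a tab the lines that begin with a double quote. As the notes warn, squeezing spaces can also flatten layout that was intentional.

```shell
#!/bin/sh
# polish (hypothetical helper): tidy converted text read on stdin.
polish() {
    tr -s ' ' |                      # squeeze runs of spaces
    sed -e 's/^ //' -e 's/ $//' |    # trim the one space left at each end
    cat -s |                         # keep one blank line per run
    awk '/^"/ { printf "\t%s\n", $0; next } { print }'   # indent quoted items
}

# Example on a few made-up lines:
printf '%s\n' ' Skip   (string) ' ' ' ' ' '"Snap" (Do not snap)' | polish
```

The result can still be piped through fmt, as in the notes, if rewrapped lines are wanted.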

Example of output:

¤¤¤ Advanced Autoproperties ¤¤¤

Below is a list of the different actions available to you in your
autoproperties file.

These are the actual Auto Properties. They can take four types of
arguments:
bool, integer, string, or geom.

A bool is either True (1) or False (0). An Integer is a number, negative
or positive.

A string is any string, it's used as an identifier.

Finally, geom is an X Geometry String by the form:
"[=][<width>{xX}<height>][{+-}<xoffset>{+-}<yoffset>]"
(see: man 3 XParseGeometry). Examples are 200x300+0+0, 0x500+200+300,
20x10+0+50, et cetera.

¤¤¤ Exhaustive Autoprops List ¤¤¤
¤ AllowedActions (string), DisallowedActions (string)
A list of actions to allow/deny performing on a client.

"Move" ((Dis)allow moving of the client window)

"Resize" ((Dis)allow resizing of the client window)

"Iconify" ((Dis)allow iconifying of the client window)

"Shade" ((Dis)allow shading of the client window)

"Stick" ((Dis)allow setting sticky state on the client window)

"MaximizeHorizontal" ((Dis)allow maximizing the client window
horizontally)

"MaximizeVertical" ((Dis)allow maximizing the client window
vertically)

"Fullscreen" ((Dis)allow setting the client window in fullscreen
mode)

"SetWorkspace" ((Dis)allow changing of workspace)

"Close" ((Dis)allow closing)

¤ ApplyOn (string) ¤ A list of conditions of when to apply this autoprop
(so be sure to include this in your property), consisting of

"New" (Applies when the application first starts)

"Reload" (Apply when pekwm's config files are reloaded)

"Start" (Apply if window already exists before pekwm
starts/restarts. Note when using grouping Start will not take
workspaces in account)

"Transient" (Apply to Transient windows as well as normal
windows. Dialog boxes are commonly transient windows)

"TransientOnly" (Apply to Transient windows only. Dialog boxes
are commonly transient windows)

"Workspace" (Apply when the window is sent to another workspace)

¤ Border (bool) ¤ Window starts with a border

¤ CfgDeny (string) ¤ A list of conditions of when to deny things
requested by the client program, consisting of

"Above" (Ignore client request to always place window above
other windows)

"ActiveWindow" (Ignore client requests for showing and giving
input focus)

"Below" (Ignore client request to always place window below
other windows)

"Fullscreen" (Ignore client request to set window to fullscreen
mode)

"Hidden" (Ignore client request to show/hide window)

"MaximizedHorz" (Ignore client request to maximize window
horizontally)

"MaximizedVert" (Ignore client request to maximize window
vertically)

"Position" (Ignore client requested changes to window position)

"Size" (Ignore client requested changes to window size)

"Stacking" (Ignore client requested changes to window stacking)

"Strut" (Ignore client request for reserving space in the screen
corners, typically done by panels and the like)

"Tiling" (Tiling layouters should leave this window floating)

¤ ClientGeometry (geom) ¤
X Geometry String showing the initial size and position of the
client, excluding the possible pekwm titlebar and window borders.

¤ Decor (string) ¤
Use the specified decor for this window. The decor has to be
defined in the used theme. The decor is chosen by the first
match in order: AutoProperty, TypeRules, DecorRules.

¤ Focusable (bool) ¤
Toggles if this client can be focused while it's running.

¤ FocusNew (bool) ¤
Toggles if this client gets focused when it initially pops up
a window.

¤ FrameGeometry (geom) ¤
X Geometry String showing the initial size and position of
the window frame. Window frame includes the client window
and the possible pekwm titlebar and window borders. If both
ClientGeometry and FrameGeometry are present, FrameGeometry
overrides the ClientGeometry.

¤ Fullscreen (bool) ¤ Window starts in fullscreen mode

¤ Group (string) ¤
Defines the name of the group. Also the section that contains all
the grouping options. They are:

Behind (bool) - If true makes new clients of a group not to
become the active one in the group.

FocusedFirst (bool) - If true and there are more than one frame
where the window could be autogrouped into, the currently
focused frame is considered the first option.

Global (bool) - If true makes new clients start in a group even
if the group is on another workspace or iconified.

Raise (bool) - If true makes new clients raise the frame they
open in.

Size (integer) - How many clients should be grouped in one group.

¤ Iconified (bool) ¤ Window starts Iconified

¤ Layer (string) ¤ Windows layer.
Makes the window stay under or above other windows. Default layer
is "Normal". Possible parameters are (listed from the bottommost to
the uppermost):

¤ MaximizedHorizontal (bool) ¤
Window starts Maximized Horizontally

¤ MaximizedVertical (bool) ¤
Window starts Maximized Vertically

¤ Opacity (int int) ¤
Sets the focused and unfocused opacity values for the window. A value of
100 means completely opaque, while 0 stands for completely transparent.
Note that a Composite Manager needs to be running for this feature to
take effect.

¤ PlaceNew (bool) ¤
Toggles the use of placing rules for this client.

¤ Role (string) ¤
Apply this autoproperty on clients that have a WM_WINDOW_ROLE hint that
matches this string. String is a regexp like: "^Main".

¤ Shaded (bool) ¤
Window starts Shaded

¤ Skip (string) ¤
A list of situations when to ignore the defined application and let the user action
skip over it, consisting of

"Snap" (Do not snap to this window while moving windows)

"Menus" (Do not show this window in pekwm menus other than the icon menu)

"FocusToggle" (Do not focus to this window when doing Next/PrevFrame)

¤ Sticky (bool) ¤
Window starts Sticky (present on all workspaces)

¤ Title (string) ¤
Apply this autoproperty on clients that have a title that matches this string. String is
a regexp like: "^Saving".

¤ Titlebar (bool) ¤
Window starts with a TitleBar

¤ Workspace (integer) ¤
Which workspace to start program on.

The originating xml file has this:

<simplesect id="config-autoprops-adv" xreflabel="Advanced Autoproperties">
<title>Advanced Autoproperties</title>

<para>
Below is a list of the different actions available to you in your
autoproperties file; These are the actual Auto Properties. They
can take four types of arguments: bool, integer, string, or geom.
A bool is either True (1) or False (0). An Integer is a number,
negative or positive. A string is any string, it's used as an
identifier. Finally, geom is an X Geometry String by the form:
"[=][<width>{xX}<height>][{+-}<xoffset>{+-}<yoffset>]"
(see: man 3 XParseGeometry). Examples are 200x300+0+0,
0x500+200+300, 20x10+0+50, et cetera.
</para>

<para>
<variablelist>
<title>Exhaustive Autoprops List</title>
<varlistentry>
<term>AllowedActions (string) , DisallowedActions (string)</term>
<listitem>
<para>
A list of actions to allow/deny performing on a client.
<itemizedlist>
<listitem>
<para>
"Move" ((Dis)allow moving of the client window)
</para>
</listitem>
<listitem>
<para>
"Resize" ((Dis)allow resizing of the client window)
</para>
</listitem>
<listitem>
<para>
"Iconify" ((Dis)allow iconifying of the client window)
</para>
</listitem>
<listitem>
<para>
"Shade" ((Dis)allow shading of the client window)
</para>
</listitem>
<listitem>
<para>
"Stick" ((Dis)allow setting sticky state on the client window)
</para>
</listitem>
<listitem>
<para>
"MaximizeHorizontal" ((Dis)allow maximizing the client window
horizontally)
</para>
</listitem>
<listitem>
<para>
"MaximizeVertical" ((Dis)allow maximizing the client window
vertically)
</para>
</listitem>
<listitem>
<para>
"Fullscreen" ((Dis)allow setting the client window in
fullscreen mode)
</para>
</listitem>
<listitem>
<para>
"SetWorkspace" ((Dis)allow changing of workspace)
</para>
</listitem>
<listitem>
<para>
"Close" ((Dis)allow closing)
</para>
</listitem>
</itemizedlist>
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>ApplyOn (string)</term>
<listitem>
<para>
A list of conditions of when to apply this autoprop (so be
sure to include this in your property), consisting of
<itemizedlist>
<listitem>
<para>
"New" (Applies when the application first starts)
</para>
</listitem>
<listitem>
<para>
"Reload" (Apply when pekwm's config files are
reloaded)
</para>
</listitem>
<listitem>
<para>
"Start" (Apply if window already exists before pekwm
starts/restarts. Note when using grouping Start will
not take workspaces in account)
</para>
</listitem>
<listitem>
<para>
"Transient" (Apply to Transient windows as well as
normal windows. Dialog boxes are commonly transient
windows)
</para>
</listitem>
<listitem>
<para>
"TransientOnly" (Apply to Transient windows
only. Dialog boxes are commonly transient
windows)
</para>
</listitem>
<listitem>
<para>
"Workspace" (Apply when the window is sent to another workspace)
</para>
</listitem>
</itemizedlist>
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Border (bool)</term>
<listitem>
<para>Window starts with a border</para>
</listitem>
</varlistentry>
<varlistentry>
<term>CfgDeny (string)</term>
<listitem>
<para>
A list of conditions of when to deny things requested by the client program, consisting of
<itemizedlist>
<listitem>
<para>
"Above" (Ignore client request to always place window above other windows)
</para>
</listitem>
<listitem>
<para>
"ActiveWindow" (Ignore client requests for showing
and giving input focus)
</para>
</listitem>
<listitem>
<para>
"Below" (Ignore client request to always place window below other windows)
</para>
</listitem>
<listitem>
<para>
"Fullscreen" (Ignore client request to set window to fullscreen mode)
</para>
</listitem>
<listitem>
<para>
"Hidden" (Ignore client request to show/hide window)
</para>
</listitem>
<listitem>
<para>
"MaximizedHorz" (Ignore client request to maximize
window horizontally)
</para>
</listitem>
<listitem>
<para>
"MaximizedVert" (Ignore client request to maximize
window vertically)
</para>
</listitem>
<listitem>
<para>
"Position" (Ignore client requested changes to
window position)
</para>
</listitem>
<listitem>
<para>
"Size" (Ignore client requested changes to window
size)
</para>
</listitem>
<listitem>
<para>
"Stacking" (Ignore client requested changes to
window stacking)
</para>
</listitem>
<listitem>
<para>
"Strut" (Ignore client request for reserving space in the screen corners,
typically done by panels and the like)
</para>
</listitem>
<listitem>
<para>
"Tiling" (Tiling layouters should leave this window floating)
</para>
</listitem>
</itemizedlist>
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>ClientGeometry (geom)</term>
<listitem>
<para>
X Geometry String showing the initial size and position of
the client, excluding the possible pekwm titlebar and
window borders.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Decor (string)</term>
<listitem>
<para>
Use the specified decor for this window. The decor has to be defined in the used theme.
The decor is chosen by the first match in order: AutoProperty, TypeRules, DecorRules.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Focusable (bool)</term>
<listitem>
<para>Toggles if this client can be focused while it's running.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>FocusNew (bool)</term>
<listitem>
<para>Toggles if this client gets focused when it initially pops up a window.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>FrameGeometry (geom)</term>
<listitem>
<para>
X Geometry String showing the initial size and position of
the window frame. Window frame includes the client window
and the possible pekwm titlebar and window borders. If
both ClientGeometry and FrameGeometry are present,
FrameGeometry overrides the ClientGeometry.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Fullscreen (bool)</term>
<listitem>
<para>Window starts in fullscreen mode</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Group (string)</term>
<listitem>
<para>
Defines the name of the group. Also the section that
contains all the grouping options. They are:
</para>
<para>
<itemizedlist>
<listitem>
<para>
Behind (bool) - If true makes new clients of a group
not to become the active one in the group.
</para>
</listitem>
<listitem>
<para>
FocusedFirst (bool) - If true and there are more
than one frame where the window could be autogrouped
into, the currently focused frame is considered the
first option.
</para>
</listitem>
<listitem>
<para>
Global (bool) - If true makes new clients start in a
group even if the group is on another workspace or
iconified.
</para>
</listitem>
<listitem>
<para>
Raise (bool) - If true makes new clients raise the
frame they open in.
</para>
</listitem>
<listitem>
<para>
Size (integer) - How many clients should be grouped in one group.
</para>
</listitem>
</itemizedlist>
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Iconified (bool)</term>
<listitem>
<para>Window starts Iconified</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Layer (string)</term>
<listitem>
<para>Windows layer. Makes the window stay under or above other windows. Default layer is "Normal".
Possible parameters are (listed from the bottommost to the uppermost):
<itemizedlist>
<listitem><para>Desktop</para></listitem>
<listitem><para>Below</para></listitem>
<listitem><para>Normal</para></listitem>
<listitem><para>OnTop</para></listitem>
<listitem><para>Harbour</para></listitem>
<listitem><para>AboveHarbour</para></listitem>
<listitem><para>Menu</para></listitem>
</itemizedlist>
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>MaximizedHorizontal (bool)</term>
<listitem>
<para>Window starts Maximized Horizontally</para>
</listitem>
</varlistentry>
<varlistentry>
<term>MaximizedVertical (bool)</term>
<listitem>
<para>Window starts Maximized Vertically</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Opacity (int int)</term>
<listitem>
<para>Sets the focused and unfocused opacity values for the window. A value of 100 means completely opaque, while 0 stands for completely transparent.</para>
<para>Note that a Composite Manager needs to be running for this feature to take effect.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>PlaceNew (bool)</term>
<listitem>
<para>Toggles the use of placing rules for this client.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Role (string)</term>
<listitem>
<para>
Apply this autoproperty on clients that have a WM_WINDOW_ROLE hint that matches
this string. String is a regexp like: "^Main".
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Shaded (bool)</term>
<listitem>
<para>Window starts Shaded</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Skip (string)</term>
<listitem>
<para>A list of situations when to ignore the defined application and
let the user action skip over it, consisting of
<itemizedlist>
<listitem>
<para>"Snap" (Do not snap to this window while moving windows)</para>
</listitem>
<listitem>
<para>"Menus" (Do not show this window in pekwm menus other than the icon menu)</para>
</listitem>
<listitem>
<para>"FocusToggle" (Do not focus to this window when doing Next/PrevFrame)</para>
</listitem>
</itemizedlist>
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Sticky (bool)</term>
<listitem>
<para>Window starts Sticky (present on all workspaces)</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Title (string)</term>
<listitem>
<para>
Apply this autoproperty on clients that have a title that matches
this string. String is a regexp like: "^Saving".
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Titlebar (bool)</term>
<listitem>
<para>Window starts with a TitleBar</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Workspace (integer)</term>
<listitem>
<para>Which workspace to start program on.</para>
</listitem>
</varlistentry>
</variablelist>
</para>
</simplesect>

A most eloquent example of "why do simple when you can do complicated"!

(IMO)
I mention it again, because you have to think of the time it took the author to lay out the initial text file
(there had to be one, no?) in xml. And then only a specialized program can read it. Ha!

puppy_apprentice · #9 Post by **puppy_apprentice** » Tue 13 Feb 2018, 09:34

OK, I've made a CSS file for your XML file.
I've added 2 lines to your XML file. You have to add those lines to the other XML files to display them nicely in a browser.

musher0 · #10 Post by **musher0** » Tue 13 Feb 2018, 13:25

Many thanks, puppy_apprentice.

I will certainly try out your css snippet. (For the record, it's not "my" xml; that
document came from the pekwm devs.)

@all: Off the top, do any of you have an idea why authors of xml docs often do not
provide such css files?

BFN.

puppy_apprentice · #11 Post by **puppy_apprentice** » Tue 13 Feb 2018, 20:16

I meant "your" XML file as in the one you posted, not one you own.

Why don't they include a CSS stylesheet?

I don't know exactly, but maybe because of this:

https://en.wikipedia.org/wiki/Single_source_publishing

And they (or we) can use DocBook XSL to produce software documentation.

http://docbook.sourceforge.net/
http://archive.oreilly.com/pub/post/sof ... th_do.html
http://docbook.org/

So they use the XML format as a source that will later be compiled/converted to TXT, HTML, PDF, etc.

musher0 · #12 Post by **musher0** » Wed 14 Feb 2018, 03:18

Thanks for sharing your research, puppy_apprentice.

I wouldn't think that "single-source publishing" is a reason for not including a css file
along with an xml document. On the contrary, if I were the editor, I would strive to
make available multiple css files (or similar), one for each media.

In any case, I find this criticism of it interesting. (From the Wikipedia article.)

Criticism
Editors using single-source publishing have been criticized for below-standard work
quality, leading some critics to describe single-source publishing as the "conveyor
belt assembly" of content creation.[15]

While heavily used in technical translation, there are risks of error in regard to
indexing. While two words might be synonyms in English, they may not be
synonyms in another language. In a document produced via single-sourcing, the
index will be translated automatically and the two words will be rendered as
synonyms. This is because they are synonyms in the source language, while in the
target language they are not.

Food for thought.

I will look at your other articles later.

BFN.

MochiMoppel · #13 Post by **MochiMoppel** » Wed 14 Feb 2018, 05:27

musher0 wrote:On the contrary, if I were the editor, I would strive to
make available multiple css files (or similar), one for each media.

This would make sense for similarly structured HTML/XHTML documents. Here the authors almost always provide CSS files because they want to make sure that a page is rendered exactly the way they designed it and not in the way the browser renders page elements by default.
Less common is to provide CSS files for specific media. Apart from the extra work involved, authors can't be sure that browsers can handle media-dependent CSS.

DocBooks are a bit different. As puppy_apprentice already indicated, DocBooks are meant to be transformed to other formats like HTML, PDF, man pages or whatever (not sure though if 'whatever' includes TXT). This is done by XSL stylesheets, and those stylesheets are already developed and maintained by the DocBook community at Sourceforge, so there is really no need for DocBook authors to develop or supply them with their XML documents.

jamesbond · #14 Post by **jamesbond** » Wed 14 Feb 2018, 08:30

I'd recommend xml2, here: https://web.archive.org/web/20160730094 ... gnor/xml2/.
Sources are here: https://web.archive.org/web/20160427221 ... .net/gale/ and if you only trust github, a copy of the sources is kept here too: https://github.com/clone/xml2.

musher0 · #15 Post by **musher0** » Wed 14 Feb 2018, 09:02

Hi, James.

Writing those scripts, I somehow had a feeling that I was reinventing the four-hole button !!!

Many thanks.

musher0 · #16 Post by **musher0** » Wed 14 Feb 2018, 10:06

Hello everyone.

I'll say it politely, but I am fuming:

I just tested them on the pekwm xml material, and xmlstarlet, xml2 and the like
are a complete waste of time and intelligence when what you want is a complete
txt file from a complete xml file.

Those utilities are basically designed to extract precise data from xml files. You
want the whole thing, you're out of luck.

This is nothing personal addressed at any of you nice people who shared your
findings. Again, thanks.

But how is it that no one in the (Linux only?) world ever thought that a complete
xml to txt conversion utility might someday become a need? Flabbergasting!!!

I'm going back to my scripts!!!

BFN.
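
For readers after the "whole file to text" conversion described above, here is a minimal sketch using only sed and cat -s. It is not the script from the first post and does not use replaceit. The function name xml_to_text is made up for illustration, and the sketch assumes DocBook-style <para> elements, no tag split across several lines, and no CDATA sections. It keeps a blank line after each paragraph, which is the separator that was missing from the xmllint output.

```shell
#!/bin/sh
# Sketch only (hypothetical helper, not the thread's script):
# strip XML tags but keep a blank line after every </para>.
# Assumes no tag spans several lines and no CDATA sections.
xml_to_text() {
    sed -e '/<\/para>/G' \
        -e 's/<[^>]*>//g' \
        -e 's/^[[:blank:]]*//' \
        -e 's/&lt;/</g' -e 's/&gt;/>/g' -e 's/&quot;/"/g' \
        -e 's/&amp;/\&/g' |
    cat -s    # squeeze runs of blank lines left behind by tag-only lines
}

# Usage (file names are placeholders):
#   xml_to_text < some-doc.xml > some-doc.txt
```

The G command appends an empty line to any line that closes a paragraph, before the tags are removed. The &amp; entity is decoded last so that a literal "&amp;lt;" in the source does not turn into "<". Lines that held only tags become empty and are squeezed by cat -s, so the result still needs hand-editing.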

puppy_apprentice · #17 Post by **puppy_apprentice** » Wed 14 Feb 2018, 11:48

Have you used the DocBook XSLT stylesheets with those tools? An XSLT stylesheet is a set of rules that helps convert an XML file into another format.

https://www.oxygenxml.com/forum/topic10767.html

Code: Select all

xml tr oxygen.xsl your-xml.xml >test.txt

where oxygen.xsl is:

Code: Select all

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:template match="text()[string-length(normalize-space()) = 0]">
        <xsl:text>
</xsl:text>
    </xsl:template>
   
    <xsl:template match="@*"/>
</xsl:stylesheet>

You will get output that may not be as beautiful as yours, but wait, maybe it is time to learn XSLT and make a much better XSLT sheet.

Or try using Pandoc:

https://pandoc.org/demos.html

Example 31:

Code: Select all

pandoc -f docbook -t markdown -s howto.xml -o example31.text
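
To apply the Pandoc command above to a whole directory of .xml files, a loop along these lines might do. This is a sketch under assumptions: the function name convert_dir is invented, the output format "plain" is used instead of "markdown" to get plain text, and it has not been checked whether Pandoc's DocBook reader copes with the older DocBook markup in the pekwm docs.

```shell
#!/bin/sh
# Sketch only: convert every .xml file in a directory to plain text
# with pandoc. convert_dir is a hypothetical helper name.
convert_dir() {
    dir="${1:-.}"                     # default to the current directory
    for f in "$dir"/*.xml; do
        [ -e "$f" ] || continue       # skip if the glob matched nothing
        # Write foo.xml to foo.txt next to the source file.
        pandoc -f docbook -t plain "$f" -o "${f%.xml}.txt" ||
            echo "conversion failed: $f" >&2
    done
}

# Usage (the directory name is a placeholder):
#   convert_dir /path/to/xml/files
```

Unlike the script in the first post, this writes one .txt file per .xml file rather than grouping everything into a single file.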

jamesbond · #18 Post by **jamesbond** » Wed 14 Feb 2018, 15:58

My apologies for offering the wrong tool for the job. I didn't read the first post carefully.

All that xml2 does is flatten the .xml files so you can process them further with the familiar awk/sed/grep set of tools (which are line-based and cannot work with hierarchical structures, which is what .xml files are). It's a generic tool to pre-process generic .xml files for further processing.
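
To illustrate the flattening, here is a small sketch. It assumes xml2's output is one node per line in the form /path/to/element=text. The function name flat_paras and the sample lines are made up for illustration; they are not taken from the pekwm docs.

```shell
#!/bin/sh
# Sketch only: pull the text of every <para> out of xml2-style
# flattened output ("/path/to/element=text", one node per line).
flat_paras() {
    # Print only lines whose path (the part before the first '=')
    # ends in /para, with that path removed.
    sed -n 's|^[^=]*/para=||p'
}

# With xml2 installed, the pipeline would be:
#   xml2 < file.xml | flat_paras
# Hypothetical sample of flattened input, for illustration only:
flat_paras <<'EOF'
/chapter/title=Some title
/chapter/section/para=First paragraph.
/chapter/section/para
/chapter/section/para=Second paragraph.
EOF
```

This shows both the strength and the limit mentioned above: picking out one kind of element is easy once the file is line-based, but rebuilding a complete, readable text document from those lines is still left to the user.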

You need a tool to convert a bunch of very specific .xml files (that is, pekwm doc files), in a very specific way, into text files. This is a very specific requirement for which no generic tool would do, or exist. I believe that your script is the first tool ever created to accomplish that job, and being the only tool in its class, I would say it's the best tool there is.

@puppy_apprentice: The pekwm doc is written in an old version of DocBook. With a newer DocBook all you need is an XSLT processor (the one from xmlstarlet should do) and the DocBook XSLs; but pekwm uses DSSSL, and it requires the OpenJade tool to convert it into PDF or HTML. The author of pekwm said so himself: https://github.com/pekdon/pekwm/blob/ma ... /mkdocs.sh.

puppy_apprentice · #19 Post by **puppy_apprentice** » Wed 14 Feb 2018, 16:04

I believe it is no problem to convert those XMLs to HTML with a proper XSLT stylesheet (PDF is another story, but you can convert it to PDF using my CSS from Firefox via CUPS-PDF).

musher0 · #20 Post by **musher0** » Wed 14 Feb 2018, 19:41

Many thanks for the encouragement, guys.

Text format is two-thirds done!

Once they are finished, I will have to "attack" the original xml files with
puppy_apprentice's css code! (Again thanks.)

So I'm far from through yet, with this project. (Learning a lot along the way.)

I intend to send the final edit of the files back to the pekwm authors, and
hopefully they will like what they see.

~~~~~~~~~~~~~~

Not that puppy_apprentice is wrong, but James is also right: you can have an
excellent general tool, but there are always details to adjust. Not to mention
personal preferences of this author versus personal preferences of that other author.

General tools can and do save editors a lot of time, but the finishing touches always
have to be done by hand.

~~~~~~~~~~~~~~

Can I ask any of you guys a favour? For the last 4-5 days, I've been getting a
pekwm.org site that's for sale when I go there. But a week ago I could still access
the pekwm docs online. Could someone double-check? TIA.

BFN.
