
[closed] Do you already have an XML to comma-delimited script?

Posted: Wed 13 Feb 2013, 04:01
by scsijon
before I start to try to build it! I am NOT a good base coder, although I can adapt existing code fairly well.

Does anyone already have a script that can take an XML file and turn it into a comma-delimited file? I'm starting on an opensuse2ppm script similar to barryk's mageia2ppm and would like to start with an automated step1, as there are 19098 packages in the opensuse set at the moment (up from 18880 last month). I'm actually thinking of stripping out the lines we don't need as step0, as the source file is some 76 MB and that should make the rest quicker to process.

The source XML is in this format, if anyone's interested:

Code:

<package type="rpm">
    <name>844-ksc-pcf</name>
    <arch>noarch</arch>
    <version epoch="0" ver="19990207" rel="784.1.1"/>
    <checksum type="sha256" pkgid="YES">ec26988a001df41bd1752aeb035608edbf1ef5ec646569d63a7d938228a6ff4d</checksum>
    <summary>Korean 8x4x4 Johab Fonts</summary>
    <description>Korean 8x4x4 johab fonts.</description>
    <packager>http://bugs.opensuse.org</packager>
    <url>http://www.debian.or.kr/~cwryu/archive/fonttools/</url>
    <time file="1319310585" build="1319310562"/>
    <size package="2592518" installed="4382509" archive="4403032"/>
    <location href="noarch/844-ksc-pcf-19990207-784.1.1.noarch.rpm"/>
    <format>
      <rpm:license>Public Domain, Freeware</rpm:license>
      <rpm:vendor>openSUSE</rpm:vendor>
      <rpm:group>System/X11/Fonts</rpm:group>
      <rpm:buildhost>build25</rpm:buildhost>
      <rpm:sourcerpm>844-ksc-pcf-19990207-784.1.1.src.rpm</rpm:sourcerpm>
      <rpm:header-range start="872" end="39087"/>
      <rpm:provides>
        <rpm:entry name="locale(xorg-x11:ko)"/>
        <rpm:entry name="844-ksc-pcf" flags="EQ" epoch="0" ver="19990207" rel="784.1.1"/>
      </rpm:provides>
      <rpm:requires>
        <rpm:entry name="perl" pre="1"/>
        <rpm:entry name="/bin/sh"/>
        <rpm:entry name="aaa_base" pre="1"/>
        <rpm:entry name="/bin/sh" pre="1"/>
      </rpm:requires>
    </format>
  </package>
regards
scsijon
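
For illustration only, here is a minimal sketch of one way the conversion described above could be done with nothing but the Python standard library. It is not an existing Puppy script. The wrapper element, the `urn:example:rpm` namespace URI, the selection of fields, and the column order are all assumptions made for the example; the real file declares its own root element and namespaces, and the columns a PPM database needs may differ. Reading the file as a stream and keeping only the wanted fields also covers the "step0" idea of dropping unneeded lines.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumption: a made-up wrapper and namespace URI so the rpm: prefix parses.
SAMPLE = b"""<metadata xmlns:rpm="urn:example:rpm">
<package type="rpm">
  <name>844-ksc-pcf</name>
  <arch>noarch</arch>
  <version epoch="0" ver="19990207" rel="784.1.1"/>
  <summary>Korean 8x4x4 Johab Fonts</summary>
  <size package="2592518" installed="4382509" archive="4403032"/>
  <location href="noarch/844-ksc-pcf-19990207-784.1.1.noarch.rpm"/>
  <format>
    <rpm:group>System/X11/Fonts</rpm:group>
    <rpm:requires>
      <rpm:entry name="perl" pre="1"/>
      <rpm:entry name="/bin/sh"/>
      <rpm:entry name="aaa_base" pre="1"/>
      <rpm:entry name="/bin/sh" pre="1"/>
    </rpm:requires>
  </format>
</package>
</metadata>"""


def local(tag):
    """Drop any '{namespace}' prefix that ElementTree puts on a tag."""
    return tag.rsplit("}", 1)[-1]


def package_rows(source):
    """Yield one list of fields per <package>, reading the file as a stream."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "package":
            continue
        row = {}
        requires = []
        for node in elem.iter():
            tag = local(node.tag)
            if tag in ("name", "arch", "summary", "group"):
                row[tag] = (node.text or "").strip()
            elif tag == "version":
                row["version"] = node.get("ver", "") + "-" + node.get("rel", "")
            elif tag == "size":
                row["installed"] = node.get("installed", "")
            elif tag == "location":
                row["href"] = node.get("href", "")
            elif tag == "requires":
                names = [e.get("name") for e in node if e.get("name")]
                requires = list(dict.fromkeys(names))  # de-duplicate, keep order
        # Hypothetical column order, chosen only for this example.
        yield [row.get("name", ""), row.get("version", ""), row.get("arch", ""),
               row.get("installed", ""), row.get("group", ""), row.get("href", ""),
               " ".join(requires), row.get("summary", "")]
        elem.clear()  # release the children of the package just written


def to_csv(source):
    """Return the whole input as CSV text, one line per package."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in package_rows(source):
        writer.writerow(row)
    return out.getvalue()


print(to_csv(io.BytesIO(SAMPLE)), end="")
```

For a real run, the `io.BytesIO(SAMPLE)` argument would be replaced by the metadata file opened in binary mode. The `csv` module quotes any field that contains a comma, such as a licence string like "Public Domain, Freeware", so the output stays parseable.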

Posted: Wed 13 Feb 2013, 16:57
by amigo
Yeah, search before trying to build; parsing XML is 'heavy lifting'. A search would have gotten you lots of hits:
https://duckduckgo.com/?q=convert+xml+to+CSV

Posted: Wed 13 Feb 2013, 23:14
by scsijon
amigo wrote: Yeah, search before trying to build; parsing XML is 'heavy lifting'. A search would have gotten you lots of hits:
https://duckduckgo.com/?q=convert+xml+to+CSV
Thanks amigo. I had already done a search via SourceForge and looked through the results (all 52 pages of them). I found everything except something that actually did what was wanted; the few that said they did were for Windows or Mac, or required so many additional packages (for a puppyan) that it didn't make sense to use them. I shall try your link and see if I can do any better.

Posted: Thu 14 Feb 2013, 08:24
by amigo
Using XSLT would be the most obvious approach:
http://stackoverflow.com/questions/2516 ... ile-to-csv

Other standard XML tools are: XMLStarlet, xsltproc and perl xpath


This is interesting:
http://www.freesoftwaremagazine.com/art ... ties_linux

http://stackoverflow.com/questions/8935 ... ml-in-bash

Posted: Thu 14 Feb 2013, 16:26
by musher0
Hello, scsijon.

If you're into java, you may want to try one of these :

http://www.wenzlaff.de/xmltocsv.html
or
http://code.google.com/p/xml2csv-conv/

Best regards.

musher0

Posted: Thu 14 Feb 2013, 17:02
by seaside
scsijon,

You may find the following xml utilities of interest. Below is a quote from the simple-icon-tray thread weather program made with xml-printf.
Since there are libraries for many languages to handle xml code parsing, I was wondering if any xml help existed for shell programs in Linux and ran across this -
http://xml-coreutils.sourceforge.net/

It's a set of utilities aimed at emulating the standard shell text tools like sed, tr, cat, printf, find, etc... but specifically for xml.

I compiled xml-printf which can be used to capture the content between tags and made a tray icon program for weather.
Here's a download link for tweather.pet, which contains xml-printf.

http://murga-linux.com/puppy/viewtopic. ... h&id=63046

Regards,
s

Posted: Fri 15 Feb 2013, 06:52
by technosaurus
xml is the most ridiculous format, I've no idea how it caught on.

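That being said, if you are dealing with one tag per line, awk is pretty useful:

awk '
BEGIN { FS = "<|>"; OFS = "," }   # split every line on < and >
/<name>/ { name = $3 }            # $3 is the text between the tags
/<arch>/ { arch[name] = $3 }
# assumption: the " var=" placeholder is read as the ver attribute of <version .../>
/<version / { split($2, a, "\""); ver[name] = a[4] }
END { for (n in arch) print n, ver[n], arch[n] }
' file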

Posted: Sat 16 Feb 2013, 01:29
by scsijon
thank you all, and I agree technosaurus.

Unfortunately it's the better of the two options for opensuse's packages! The other requires three SQL files to be opened and integrated before extraction for ppm, and together they come to 300 MB+.

I think I have a lead from one of amigo's new links for something that will work easily. Thank you.

But I'm not marking this solved quite yet!

Posted: Sun 17 Feb 2013, 11:12
by jamesbond
I'm a little late to the game. But this is the tool that I use for getting stuff out of xml data: http://www.ofb.net/~egnor/xml2/. It converts XML into a flat file format that you can run grep, sed and awk on.
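
To make the "flat file" idea concrete, here is a rough Python sketch that turns each element and attribute into one `path=value` line. It only imitates the general idea and is not the xml2 tool itself; the exact line format (including the `@` marker for attributes) is an assumption of this sketch, and the function name `flatten` is made up for the example.

```python
import xml.etree.ElementTree as ET

SAMPLE = b"""<package type="rpm">
  <name>844-ksc-pcf</name>
  <arch>noarch</arch>
  <version epoch="0" ver="19990207" rel="784.1.1"/>
</package>"""


def flatten(elem, prefix=""):
    """Yield 'path=value' lines for an element, its attributes and its children."""
    path = prefix + "/" + elem.tag
    for key, value in elem.attrib.items():
        yield path + "/@" + key + "=" + value
    text = (elem.text or "").strip()
    if text:
        yield path + "=" + text
    for child in elem:
        yield from flatten(child, path)


lines = list(flatten(ET.fromstring(SAMPLE)))
print("\n".join(lines))
```

Because every value ends up on its own line with its full path, a plain `grep '/package/name='` or a short awk program is enough to pull out the fields of interest.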

Posted: Tue 19 Feb 2013, 03:32
by scsijon
thanks all,

I have changed methods and :oops: feel more than a bit of a fool :oops: after it was pointed out to me that all I need to do is modify the rpm2ppm script to match the opensuse format.