Simple converter for .doc/.odt to HTML?
Makoto


Joined: 03 Sep 2009
Posts: 1808
Location: Out wandering... maybe.

Posted: Tue 21 Aug 2012, 01:58    Post subject: Simple converter for .doc/.odt to HTML?

Is there anything simple that will convert document files (mostly .doc and .odt, I guess), with styles etc., to an HTML file? I can use (and have used) OpenOffice/LibreOffice, but it follows the MS Office way of creating an HTML file from a document and adds a LOT of unnecessary markup to the resulting file.

SeaMonkey's Composer won't directly open those document types. I can use it (among other HTML editors, of course) to try to strip out what the office programs have done to the text... but that's a massive undertaking. (Though, if there's an automatic way to 'optimize' the HTML page in SeaMonkey, I wouldn't mind.)

_________________
[ Puppy 4.3.1 JP, Frugal install | 1GB RAM | 1.3GB swap ] * My Pidgin Builds for Puppy 4.3.1+
In memory of our beloved American Eskimo puppy (1995-2010) and black Lab puppy (1997-2011).
don570


Joined: 10 Mar 2010
Posts: 3379
Location: Ontario

Posted: Tue 21 Aug 2012, 20:17

There is a nicely written word processor that opens
Microsoft docs and saves to various formats, including HTML.

SoftMaker 2012 beta is the latest version. It's a commercial product,
but there's a free trial so you can find out if it's good enough.

Here's more info:

http://murga-linux.com/puppy/viewtopic.php?p=647950#647950

Makoto


Joined: 03 Sep 2009
Posts: 1808
Location: Out wandering... maybe.

Posted: Wed 22 Aug 2012, 00:20

Yeah, but I'm a little reluctant to install another office suite just for that particular 'simple' feature.

Why MS Office/Word and Open/LibreOffice feel they have to add that much code, even for a simple text page, I don't know. I did just that recently: I had a monospace font set for the whole document, and the HTML page OpenOffice produced redefined the font on every single line of text. Among other things, of course.

...then again, I experimented with doing it in AbiWord. Not only did I lose some of the formatting, but it also insisted on adding CSS to the document. (It's just plain text, with the occasional italicized, bolded and maybe underlined word. That's all. No real need for a stylesheet, is there? (No, really. I'm not really sure.))

technosaurus


Joined: 18 May 2008
Posts: 4376

Posted: Sun 26 Aug 2012, 22:21

Abiword
_________________
Web Programming - Pet Packaging 100 & 101
Makoto


Joined: 03 Sep 2009
Posts: 1808
Location: Out wandering... maybe.

Posted: Mon 27 Aug 2012, 05:30

I did try AbiWord, as I mentioned above. Not only did it insist on adding CSS to the document, it still generated an HTML page around the same size as the versions Word and OpenOffice created.

All of them converted a roughly 66k (7-bit) text document into a roughly 166k HTML file. Text should not need 100k of markup.
(I used to do it manually, so I should know.)
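
For anyone who wants to check the same thing on their own files, here is a minimal sketch that compares the size of a text file with the HTML made from it. The function name markup_overhead is made up for this example, and both file names are whatever you pass in.

```shell
#!/bin/sh
# Print the size of a plain-text file, the size of the HTML file made from it,
# and the difference between the two (the markup overhead).
# markup_overhead is a hypothetical helper name, not part of any tool.
markup_overhead() {
    # $(( ... )) also trims the leading spaces some wc versions print
    txt_bytes=$(( $(wc -c < "$1") ))
    html_bytes=$(( $(wc -c < "$2") ))
    echo "text: $txt_bytes bytes, html: $html_bytes bytes, overhead: $((html_bytes - txt_bytes)) bytes"
}
```

Called as markup_overhead file.txt file.html. With the sizes quoted above (about 66k in, about 166k out) the overhead works out to about 100k, roughly one and a half times the text itself.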

technosaurus


Joined: 18 May 2008
Posts: 4376

Posted: Mon 27 Aug 2012, 10:02

That sounds about right, actually, to preserve formatting... they have to cover cases that aren't as simple as yours. If you don't care about preserving formatting at all, convert to text and then:
Code:
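# Write the HTML header; <pre> keeps the text's own line breaks and spacing.
echo '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
   <title></title>
</head>
<body>
<pre>' > file.html

# Append the text, escaping &, < and > so they are not parsed as markup.
sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' file.txt >>file.html

# Close the document.
echo '</pre>
</body>
</html>' >>file.html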

Makoto


Joined: 03 Sep 2009
Posts: 1808
Location: Out wandering... maybe.

Posted: Mon 27 Aug 2012, 22:49

I know, but there's usually something about the generated HTML that just seems... weird. Much more than it needs to be, maybe. Like OpenOffice's insistence on restating the font on every single line of text (sure, I set a monospace font for the entire document, but does it really need to be repeated on every line?). Or an earlier version of MS Word insisting on tokenizing practically everything.

Of course, I'll be the first to admit I'm not any sort of expert on HTML.
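
If the per-line font tags are the main complaint, something along these lines might remove them after the export. This is only a sketch: it assumes the repetition comes from plain font tags and that no tag is split across two lines, and strip_font_tags is a name made up for this example.

```shell
#!/bin/sh
# Remove every <font ...> opening tag and every </font> closing tag from an
# HTML file and print the result to stdout. The text and all other tags are
# left as they are.
# Assumption: no font tag is broken across two lines.
# strip_font_tags is a hypothetical helper name.
strip_font_tags() {
    sed -e 's/<[Ff][Oo][Nn][Tt][^>]*>//g' \
        -e 's/<\/[Ff][Oo][Nn][Tt]>//g' "$1"
}
```

Called as strip_font_tags export.html > smaller.html. The monospace setting goes away along with the tags, so it would have to be restated once for the whole page, for example by wrapping the body text in a single pre or tt element.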

technosaurus


Joined: 18 May 2008
Posts: 4376

Posted: Tue 28 Aug 2012, 22:29

Still sticking by my original suggestion. I just tested abiword-2.8.6 in Wary 5.3 on /usr/share/examples/test.doc ... just uncheck all of the boxes when you save as HTML. It actually reduced the total size fourfold and looks acceptable.
(BTW, AbiWord does have a command-line interface that you can batch process with.)
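
A rough sketch of what that batch processing could look like, using AbiWord's --to option. The function name convert_dir and the ABIWORD variable are made up for this example, and the command line may well use its default export settings instead of the unchecked boxes from the save dialog, so compare the output sizes before relying on it.

```shell
#!/bin/sh
# Convert every .doc and .odt file in one directory to HTML with AbiWord.
# As far as I know, the result is written next to the input with an .html
# extension. ABIWORD can be overridden if the binary has another name or path.
# convert_dir and ABIWORD are hypothetical names chosen for this sketch.
ABIWORD=${ABIWORD:-abiword}

convert_dir() {
    for f in "$1"/*.doc "$1"/*.odt; do
        [ -e "$f" ] || continue          # skip a pattern that matched nothing
        "$ABIWORD" --to=html "$f" || echo "failed: $f" >&2
    done
}
```

Called as convert_dir /path/to/docs. Files that fail to convert are reported on stderr and the loop carries on with the rest.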

Makoto


Joined: 03 Sep 2009
Posts: 1808
Location: Out wandering... maybe.

Posted: Wed 29 Aug 2012, 00:18

AbiWord eats (doesn't support) some of the simple formatting elements I use, though, like horizontal lines. They disappear from the document when I load it... and, of course, don't make it into the final HTML.

(That's aside from the fact that AbiWord usually behaves rather badly for me. I'm surprised I managed to get it to export a document to an HTML page without something bad happening, aside from the missing elements.)

Hmm... wonder how much of a dent HTML Tidy might make in it?
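
One way to find out would be to run Tidy over an exported page and compare the sizes. The sketch below assumes tidy is installed; the function name tidy_dent, the TIDY variable and the .tidy.html output name are all made up for this example, and the option set is a guess at what suits office exports, not a tested recipe.

```shell
#!/bin/sh
# Run HTML Tidy over one file and report the size before and after.
# The cleaned copy is written to <input>.tidy.html; the input is not changed.
# tidy_dent, TIDY and the output name are hypothetical choices for this sketch.
TIDY=${TIDY:-tidy}

tidy_dent() {
    out="${1%.html}.tidy.html"
    # -q: quiet; -clean: replace font/center/nobr tags with style rules;
    # --word-2000 yes: strip markup specific to Word's HTML export;
    # -o: output file.
    "$TIDY" -q -clean --word-2000 yes -o "$out" "$1"
    # Tidy exits 0 when clean, 1 on warnings, 2 on errors.
    [ $? -le 1 ] || { echo "tidy reported errors for $1" >&2; return 1; }
    echo "before: $(( $(wc -c < "$1") )) bytes, after: $(( $(wc -c < "$out") )) bytes"
}
```

Called as tidy_dent export.html. Note that -clean moves the font settings into style rules, so the result still contains CSS, though it should be stated once and not on every line.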
