Wiki-ing the Vista Monograph

From OLPC
Latest revision as of 03:11, 16 February 2008

I started with the MS Word version of the VistA Monograph, vista_monograph2005_06.doc (http://www.va.gov/vista_monograph/docs/vista_monograph2005_06.doc), and then:

  • opened it with OpenOffice
  • saved it as HTML
  • cleaned it up with Dave Raggett's HTML Tidy (http://tidy.sourceforge.net/)
    • the funny characters went away once I got the UTF-8 settings right in HTML Tidy
  • converted it to MediaWiki markup using HTML::WikiConverter (http://search.cpan.org/~diberri/HTML-WikiConverter-0.61/lib/HTML/WikiConverter.pm)
  • cleaned out a lot of junk HTML tags, at first by manual editing and later with sed scripts. Trust your browser to get things right without all this junk. Tags removed:
    • br
    • font
    • div
    • span
  • ran a script to remove excess blank lines
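
The sed scripts themselves are not posted here. As a rough illustration only, the two cleanup passes above (dropping the br, font, div and span tags, then squeezing runs of blank lines) could look something like the Python sketch below. This is not the script used for this page: the function names are made up for the example, and it assumes the tags are simple enough for a regular expression (no ">" inside attribute values).

```python
import re

# Tags named in the list above. Only the tags are dropped; the text
# between them is kept. (Hypothetical constant, chosen for this sketch.)
JUNK_TAGS = ("br", "font", "div", "span")

# Matches an opening, closing, or self-closing form of any junk tag,
# with optional attributes: <font size="2">, </span>, <br />.
# The \b stops "div" from matching inside a longer name like <divider>.
_JUNK_TAG_RE = re.compile(
    r"</?(?:%s)\b[^>]*>" % "|".join(JUNK_TAGS),
    re.IGNORECASE,
)


def strip_junk_tags(text):
    """Remove br/font/div/span tags, leaving their contents in place."""
    return _JUNK_TAG_RE.sub("", text)


def collapse_blank_lines(text):
    """Reduce each run of blank (or whitespace-only) lines to one blank line."""
    out = []
    previous_blank = False
    for line in text.split("\n"):
        blank = line.strip() == ""
        if blank and previous_blank:
            continue  # skip the second and later blank lines in a run
        out.append("" if blank else line)
        previous_blank = blank
    return "\n".join(out)


def clean_wikitext(text):
    """Run both passes: strip the junk tags first, then squeeze blank lines."""
    return collapse_blank_lines(strip_junk_tags(text))


if __name__ == "__main__":
    sample = '<div><font size="2">VistA</font> <span>Monograph</span><br />\n\n\n\n== Intro ==\n</div>'
    print(clean_wikitext(sample))
```

Stripping the tags before squeezing blank lines matters, because removing a tag that sat alone on a line leaves a new blank line behind.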

It's no wonder it has glitches. I'm really surprised it came out as well as it did.

I still need to polish the scripts and make them available here.