Converters from PC Textprocessors to LaTeX - Overview

Author: Wilfried Hennings (texconvfaq "at" gmx.de), last update (including subpages): March 3, 2011
The url of this page is http://tug.org/utilities/texconv/pctotex.html

I maintain these pages because I need converters between LaTeX and PC Textprocessors for my work and I want to share the information with others who need it. Because I maintain them in my spare time (uh, what is spare time?), I cannot answer individual questions.

This list is as good or as bad as its support, and I need YOUR support to update and supplement it. Please let me know if you know of more and/or better converters. There are some more converters on the CTAN sites, but the following seem to be the most promising for conversion to and from the current versions of wordprocessors.

Neither correctness nor completeness is guaranteed.
All opinions mentioned (if any) are my own, not my employer's. Please send corrections, enhancements and supplements (German is also welcome) to the following address:
texconvfaq "at" gmx.de

Note that this FAQ list contains information ONLY about converters between PC word processors and LaTeX. Converters to and from other formats may have their own FAQ lists; see, for example, the link for converters to and from HTML.


For the impatient, here is a table giving an overview of the features of the most recent converters.


General Remarks

Before looking for a converter, stop and think about a fundamental question:

What do you want to be converted in which way?

Do you want to convert the document structure, i.e. a heading should remain a heading, a list should remain a list etc., no matter what it will look like in the target format?
Or do you want to convert the appearance, i.e. what it looks like, no matter how it is represented in the target format?
Or do you want a mixture of both?
When using SGML as an intermediate format, you have to specify the translation rules yourself (as far as I understand). This makes sense, and it explains why different people have very different opinions about which converter best fits their needs: they simply have different demands and expectations about what should be converted and how.
So not only is there, in practice, no converter that is good for everyone and every purpose; such a converter is impossible in principle, because there are no well-defined requirements that a converter should meet.

So keep this in mind when looking through the following list of converters, try them yourself, and decide what you need.

Fundamental problems of wordprocessor to LaTeX conversion

One advantage of LaTeX is that it forces you to structure a document, whereas wordprocessors like Word/WordPerfect allow unstructured documents. It is hardly possible to automatically structure a document where there was no structure before.

It is nevertheless possible to write a structured document with a wordprocessor by consistently using styles. Wordprocessor documents that use styles can therefore be converted to a LaTeX document with an equivalent (but not necessarily identical) structure.
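
To illustrate why consistent styles matter, here is a minimal Python sketch of a style-driven mapping. It is not taken from any of the converters listed on this page; the style names, the STYLE_TO_LATEX table and the paragraphs_to_latex function are all assumptions made up for the example, and the input is assumed to be already extracted as (style name, text) pairs.

```python
# Hypothetical mapping from wordprocessor paragraph styles to LaTeX commands.
# Doubled braces are literal braces in str.format templates.
STYLE_TO_LATEX = {
    "Heading 1": "\\section{{{text}}}",
    "Heading 2": "\\subsection{{{text}}}",
    "Heading 3": "\\subsubsection{{{text}}}",
}


def paragraphs_to_latex(paragraphs):
    """Turn (style_name, text) pairs into LaTeX body text.

    Paragraphs whose style is not in the table are emitted as plain text:
    where the author applied no meaningful style, there is no structure
    that could be recovered.
    """
    blocks = []
    for style, text in paragraphs:
        template = STYLE_TO_LATEX.get(style, "{text}")
        blocks.append(template.format(text=text))
    # Blank lines separate paragraphs in LaTeX.
    return "\n\n".join(blocks)


if __name__ == "__main__":
    document = [
        ("Heading 1", "Introduction"),
        ("Body Text", "Some ordinary paragraph."),
        ("Heading 2", "Details"),
    ]
    print(paragraphs_to_latex(document))
```

The sketch does not escape LaTeX special characters and ignores character formatting. A document in which headings were produced only by manually enlarging the font would come out as plain paragraphs here.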

There are several ways to convert:

  1. Word binary format -> LaTeX
  2. RTF (Word ASCII format, use Word's own RTF export) -> LaTeX
  3. Open Office format -> LaTeX
  4. WordPerfect 5.1 format -> LaTeX
  5. HTML (use Wordprocessor's built-in or add-on html converter) -> LaTeX
  6. maybe other external format(s)

The converters that are most complete, are still being developed and are supported are:

rtf2latex2e - free standalone rtf to LaTeX converter for Mac, PC, and Unix

word2tex - shareware, MS Word export filter for PC

GrindEQ - shareware, MS Word export filter for PC

word-to-latex - shareware, MS Word export filter for PC

Writer2LaTeX - free export filter and standalone Open Office converter

WP2LaTeX - free standalone Word Perfect converter for PC


Using a Word macro

Free:

winw2ltx: A set of macros, originally for WinWord 2, adapted to WinWord 6 and 7 (95) and now (Aug. 2008) to WinWord 97 (and up)
See more detailed page

Commercial:

MathType: PC equation editor with export to LaTeX.
See more detailed page.
MathType home page (external link)


Using a Word export filter

Shareware:

Word-To-LaTeX: This converter can convert documents from Word2002(XP) or later to LaTeX.
The conversion can be run from the command-line (can be used for batch-processing), through the graphic interface, or directly from Word.
Besides Word2002(XP) or later it also needs MS .NET Framework 2.0 (external link)
Converts:

For a complete list of features, visit its homepage (external link).

The package can be downloaded from the homepage (external link). The trial is limited to 15 days.

Word2TeX: This converter can save documents from Word95 or later as LaTeX, including equation editor (!) objects, MathType objects and Word 2007-2010 equations.
Current version: 5.01, Feb. 2011.
Converts:

(*) Restrictions apply in unregistered Word2TeX: only the first 7 equations, the first table and the first figure are translated. The trial is limited to 30 days.

For a complete list of features, visit its homepage (external link).

GrindEQ Word-to-Latex: Shareware, 99EUR (49EUR academic)
converts Microsoft Word documents to LaTeX, AMS-LaTeX, Plain TeX, or AMS-TeX format.
Microsoft Equation 2007, Microsoft Equation 3.x, and MathType are supported.
Works with Microsoft Word 97/2000/XP/2003/2007 and Microsoft Windows 98/Me/NT/2000/XP/2003/x64/Vista.
Evaluation version is restricted to 10 launches.
See homepage (external link)

Converting from Word binary format

This means the .doc format that is used by Word 95, 97, 2000, XP and 2003, and in which Word 2007 can also save its documents if you tell it to do so. The new XML format that Word 2007 uses by default is (as far as I know) not yet supported.

Free:

The free (LGPL) office suite Open Office can import Word format and export to LaTeX format. Open Office runs on MacOSX, several Linux/Unix variants and also Windows95/98/NT/XP/2003 and Vista, and stores documents as XML.
My (and others') experiences with OO 3.1 are quite good, given the following prerequisites:
- You need to install the Writer2LaTeX extension.
- You need to check "Extras-Options-Load/Save-Microsoft Office-Load-MathType to OpenOffice Math".
See homepage (external link)

LAOLA can read Word6/Word7(=95) documents under Unix and extract the text.
See more detailed page
LAOLA homepage (external link)

word2x: Converts Word6/Word7(=95) documents to LaTeX or plain text.
See more detailed page
word2x homepage (external link)

antiword: A free MS Word reader for Linux, BeOS and RISC OS. It converts the binary files from Word 6, 7, 97 and 2000 to text and Postscript.
See antiword homepage (external link).
A user's comment: "It is still a bit incomplete, but I found it to be rather useful. Moreover, it is available for a wider-than-usual range of platforms."

wvWare is a library that can read the Word6/Word7(=95), Word8(=97) and Word9(=2000) binary file format. It works under most Unix systems.
See wvWare homepage (external link).
The wvWare library is used as import library in the wordprocessor AbiWord (see below).
Its predecessor MSWordView could only read Word8(=97) and convert word into html, which can then be read with a browser.

For the wvWare library, an API "wsW2LTX" and a GUI shell "wsW2LTXGUI" (for MS Windows) are available at http://www.winshell.de/modules/w2ltx_download/ (external link)

The free (GPL) wordprocessor AbiWord can import Word format (by using the aforementioned wvWare) and export to LaTeX format. AbiWord runs on BeOS, several Unix variants and also Windows95/98/NT, and stores documents as XML.
AbiWord homepage (external link)


Converting from RTF

To use an RTF converter, the wordprocessor document must first be "saved as" Rich Text Format. However, each new version of MS Word came with a new level of the RTF language, and most of the available converters cannot understand the current RTF version.

Free:

rtf2latex2e, version 2-0-1 (2011).
download from sourceforge (external link).
See more detailed page
If you are interested in the history of this converter, see this page.

RTF2LaTeX, a patch for WP2LaTeX that allows it to also convert RTF documents. Experimental Release 0.4 (works, but it knows only a small group of commands). See its homepage (external link).

GNU unRTF is a command-line program written in C which converts documents in Rich Text Format (.rtf) to several formats including LaTeX.
See its homepage (external link). The latest version 0.21.10 (Jan.17, 2010) is only available as source code.
A precompiled binary for Windows is available from sourceforge (external link), but only for an older version (0.19.3, Feb. 12, 2005).
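
Because unRTF is a command-line program, it can be driven from a script for batch processing. The following Python sketch assumes that an unrtf executable is on the search path, that it accepts the --latex option and that it writes its result to standard output; please check this against the manual of your installed version. The function names unrtf_command and convert_rtf are made up for this example.

```python
import subprocess
from pathlib import Path


def unrtf_command(rtf_path):
    """Build the argument list for one conversion.

    Assumption: --latex selects LaTeX output (see `man unrtf`).
    """
    return ["unrtf", "--latex", str(rtf_path)]


def convert_rtf(rtf_path):
    """Convert one .rtf file and write the result next to it as .tex."""
    rtf_path = Path(rtf_path)
    tex_path = rtf_path.with_suffix(".tex")
    # check=True raises CalledProcessError if unrtf exits with an error.
    result = subprocess.run(
        unrtf_command(rtf_path),
        capture_output=True,
        text=True,
        check=True,
    )
    tex_path.write_text(result.stdout, encoding="utf-8")
    return tex_path


if __name__ == "__main__":
    # Convert every .rtf file in the current directory.
    for path in sorted(Path(".").glob("*.rtf")):
        print("wrote", convert_rtf(path))
```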

The free (GPL) wordprocessor AbiWord can import rtf and also MS Word doc format and export to LaTeX format. AbiWord runs on BeOS, several Unix variants and also Windows95b or higher (up to XP), and stores documents as XML.
AbiWord homepage (external link)

Commercial:

Scientific Word: Win95/98/2000/NT4 based TeX/LaTeX system with graphical editor and rtf import capability including MS's equation editor equations. The rtf import converter is basically the same as rtf2latex2e.
See more detailed page
Scientific Word home page (external link)


Converting from Open Office format

Writer2LaTeX is a command-line utility written in Java. It converts OpenOffice.org/StarOffice Writer documents into LaTeX2e.

If you have OpenOffice or StarOffice installed, get the Writer2LaTeX extension from http://extensions.services.openoffice.org/project/writer2latex.

If you need Writer2LaTeX as a standalone converter, see its homepage (external link)

Supported operating systems: all on which Java is supported. Writer2LaTeX requires a Java Runtime Environment (JRE), version 1.4 or higher. A JRE is included in OpenOffice and can also be downloaded from java.sun.com (external link) (scroll down to "Java Runtime Environment (JRE)").


Converting from WordPerfect format

Free:

WP2LaTeX converts WordPerfect 1.x / 2.x / 3.x / 4.x / 5.x / 6-8.x, including equations, to LaTeX.
See more detailed page
homepage (external link)

TeXPerfect: WordPerfect 5.1 for DOS -> LaTeX Translator

Commercial:

Publishing Companion: converts Word/WordPerfect, including equations, to LaTeX. Comes with own equation editor.
See more detailed page
KTALK's home page (external link)


HTML as intermediate format

Wordprocessor to HTML

There are free HTML converters for Word 6 and 7(95) for Windows available from Microsoft:
Download... IA for Word 6 (external link) / IA for Word 7 (95) (external link) / IA for Word for Mac (external link)
Word 97 contains an html converter by default, but in contrast to the previous versions it only recognizes heading styles if they are first converted into the corresponding html styles. Also, it sometimes inserts unnecessary tags.
Word 2000 contains the html converter by default, but you should not use this default: it actually creates a sort of XML with many Word-specific elements. Instead, for saving as "clean" html, download and install the add-on converter from Microsoft (external link).
For Word XP (2002) and above, the "clean" html export can be installed with Word's setup. It is recommended to "save as" "html filtered". However, this still isn't "clean" enough; you should manually edit the saved html before feeding it to the html-to-LaTeX converter.

WordPerfect 7 and up have an integrated InternetPublisher.
For WordPerfect 6.1 for Windows, the InternetPublisher is available separately:
Download... InternetPublisher for WPWin 6.1 (external link)

There is also a tool for Unix which is intended to convert word6, word7(95) and word8(97) binary files to html. See http://www.su.shuttle.de/turbo/michael/projekte/software/word2html.c.gz (external link)

Also see www.w3.org for a list of converters between word processors and HTML (external link) - now outdated (last change March 1999).

HTML to LaTeX

Because HTML is a structured format, the conversion between HTML and LaTeX is rather straightforward. However, the limitations of HTML compared to LaTeX remain, i.e. there are many elements in LaTeX which cannot (yet?) be represented in HTML.
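
As a rough illustration of why the mapping is straightforward, here is a minimal Python sketch built on the standard library's html.parser module. It is not the method of any converter listed below; the OPEN and CLOSE tables, the class MiniHtmlToLatex and the function html_to_latex are assumptions for this example and cover only a handful of tags.

```python
from html.parser import HTMLParser

# Assumed tag-to-LaTeX tables; real converters cover far more elements.
OPEN = {
    "h1": "\\section{",
    "h2": "\\subsection{",
    "em": "\\emph{",
    "b": "\\textbf{",
    "ul": "\\begin{itemize}\n",
    "li": "\\item ",
}
CLOSE = {
    "h1": "}\n\n",
    "h2": "}\n\n",
    "em": "}",
    "b": "}",
    "ul": "\\end{itemize}\n\n",
    "li": "\n",
    "p": "\n\n",
}


class MiniHtmlToLatex(HTMLParser):
    """Collect LaTeX fragments while the parser walks the HTML."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Tags without an entry are silently dropped.
        self.parts.append(OPEN.get(tag, ""))

    def handle_endtag(self, tag):
        self.parts.append(CLOSE.get(tag, ""))

    def handle_data(self, data):
        # Text is passed through unchanged (no LaTeX escaping here).
        self.parts.append(data)


def html_to_latex(html):
    parser = MiniHtmlToLatex()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


if __name__ == "__main__":
    print(html_to_latex("<h1>Title</h1><p>Some <em>fine</em> text.</p>"))
```

The sketch does not escape characters such as & or %, and it relies on well-formed input, which is one reason why Word's html export should be cleaned up first.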

There are several HTML-to-LaTeX converters available. Without giving recommendations:

Frans Faase's html2tex (C source)
See homepage (external link)

Peter Thatcher's html2latex (Perl script)
See homepage at sourceforge.net (external link)

Jeffrey Schaefer's html2latex (Perl script)
See homepage at www.geom.umn.edu (external link)

Michal Kebrt's htmltolatex (Java Program)
See homepage at sourceforge.net (external link)

Some converters are available from CTAN (external link) ("Comprehensive TeX Archive Network"), e.g. in .../support/html2latex. However, what you can find in CTAN under .../support/html2latex/ is Nathan Torkington's converter of 1993 -- rather outdated.
(The ... stands for a host specific base directory, which often is either "/pub/tex" or "/tex-archive")


Other intermediate formats

There are ways to use SGML as an intermediate format, and others have used it successfully. Having had a quick look at it, I found it rather complicated; in particular, it seems that you have to define the translation rules yourself. So I did not put more effort into trying to use it. If anyone can give me a ready-to-use cookbook solution, I will include it here.

Another intermediate format is TeXML. It was designed to make conversion to (La)TeX as easy as possible, especially XSLT-conversion from XML format. A converter from TeXML to (La)TeX is available, see http://getfo.sourceforge.net/texml/ (external link). However, I don't yet know of any converter from a textprocessor format to TeXML.


Converting from PageMaker

Pmtolatex, a perl script to convert PageMaker files to LaTeX. See homepage (external link).


Converting from FrameMaker

FrameMaker Utilities (external link): Contains converters for both directions (LaTeX <-> FrameMaker) as well as templates which make conversion from FrameMaker to LaTeX easier.


Converting from NotaBene

NB4LATEX converts files from NotaBene4 for DOS (an old version) to LaTeX2e format. You can find it on CTAN in directory .../systems/msdos/nb4latex


Converting from ChiWriter

There are two converters on CTAN, but I don't know how good they are or whether they still work (they are DOS programs from 1993 and 1994). You can find them on CTAN in directory .../support/chi2ltx/ and in directory .../support/chi2tex/


Converting from Excel

There are two ways to do this:

1. Excel2LaTeX: Excel-macro to convert Excel to LaTeX. The generated LaTeX code uses the tabular environment.
On CTAN in .../support/excel2latex/, i.e. here

<citation from http://www.latex-community.org/forum/viewtopic.php?f=5&p=28364>
"when I use Excel2Latex, it says "Can't find project or library" after pressing on the conversion button. ..."
"This problem may be caused by a broken reference to REFEDIT.DLL. Excel2LaTeX.xla contains a reference to folder OFFICE12 (Office 2007). If you use the macro with Office 2003 (OFFICE11) you must fix the reference. REFEDIT.DLL is in c:\Program Files\Microsoft Office\OFFICE11\. In Excel go to Tools>Macros>Visual Basic Editor, select VBAProject (Excel2LaTeX.xla) in Project Explorer, then go to Tools>References and uncheck MISSING:Ref Edit Control, click Browse and Open REFEDIT.DLL, then use the priority button to raise the new Ref Edit Control entry to where the MISSING entry was (click the up arrow button once and then hold Space to raise)." </citation end>.

2. Import the Excel file into Gnumeric (external link), then export to LaTeX. I have no further information on the resulting LaTeX markup.
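
To show what tabular markup for spreadsheet data looks like, here is a minimal Python sketch. It is neither the Excel2LaTeX macro nor Gnumeric's exporter and it does not read Excel files; it assumes the cell values are already available as a list of rows. The names escape_cell and rows_to_tabular and the list of escaped characters are assumptions for this example.

```python
# Characters that need a backslash in LaTeX text (an assumed, incomplete list).
_SPECIALS = {"&": "\\&", "%": "\\%", "$": "\\$", "#": "\\#", "_": "\\_"}


def escape_cell(value):
    """Convert a cell value to text and escape a few LaTeX specials."""
    return "".join(_SPECIALS.get(ch, ch) for ch in str(value))


def rows_to_tabular(rows, column_spec=None):
    """Render a list of rows as a LaTeX tabular environment.

    column_spec defaults to one left-aligned column per cell of the
    first row.
    """
    if not rows:
        raise ValueError("need at least one row")
    spec = column_spec or "l" * len(rows[0])
    # Cells are joined with '&', and each row ends with '\\'.
    body = [" & ".join(escape_cell(c) for c in row) + " \\\\" for row in rows]
    return "\n".join(["\\begin{tabular}{" + spec + "}", *body, "\\end{tabular}"])


if __name__ == "__main__":
    print(rows_to_tabular([["Item", "Price"], ["Tea & milk", 2.5]]))
```

Merged cells, number formats and borders, which a real converter has to deal with, are ignored here.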

Converting from OpenOffice spreadsheet

OO macro to convert from OO to LaTeX: http://calc2latex.sourceforge.net/ (external link).


This HTML page is part of the texconv pages.
Copyright © 1998 ... 2011 Wilfried Hennings
You may copy and redistribute it under the following conditions:

Please also note the disclaimer.