The translation of a LaTeX source file into HTML involves loading tex4ht.sty and the *.4ht style files, choosing the desired options for the translation, compiling the source into DVI code with the native LaTeX engine, and postprocessing the outcome with the tex4ht and t4ht programs (see overview).
The htlatex command runs a script that invokes the different steps of the process without user intervention. The command assumes the form
htlatex filename "options1" "options2" "options3"
where the first set of options is for the tex4ht.sty and *.4ht style files, the second set is for the tex4ht postprocessor, and the third for the t4ht postprocessor. For instance,
In addition, the command requests that the output be broken up into separate web pages, in accordance with the two top sectioning levels of the document.
Moreover, it asks for a listing in the log file of the information available for the style files in use. That information, among other things, also introduces additional values available for the first list of options.
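The example that the preceding paragraphs describe is not reproduced here. As a minimal sketch, assuming the options in question are ‘2’ (split at the two top sectioning levels) and ‘info’ (list the style-file information in the log), and using the placeholder name ‘filename’ for the source, such a call might look as follows. The guard is only there to keep the snippet harmless on a machine without TeX4ht or without the source file.

```shell
#!/bin/sh
# Sketch only: 'filename' is a placeholder for your own source, filename.tex.
src=filename

# First list: options for tex4ht.sty and the *.4ht style files.
#   html  leading entry, required when the list is not empty
#   2     split the output at the two top sectioning levels
#   info  list the style-file information in the log file
opts_sty="html,2,info"
# Second and third lists: options for the tex4ht and t4ht postprocessors (none here).
opts_tex4ht=""
opts_t4ht=""

if command -v htlatex >/dev/null 2>&1 && [ -f "$src.tex" ]; then
    htlatex "$src" "$opts_sty" "$opts_tex4ht" "$opts_t4ht"
else
    # Nothing to compile here, so just show the command line.
    echo "would run: htlatex $src \"$opts_sty\" \"$opts_tex4ht\" \"$opts_t4ht\""
fi
```

Trailing empty lists can be left out, so ‘htlatex filename "html,2,info"’ should be equivalent.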
If the first list of options is not empty, it must start with the entry ‘html’, ‘xhtml’, or a name of a private configuration file.
To request a Unicode representation of symbols, the first list of options should include the ‘uni-html4’ entry, and the second list should include the ‘-cunihtf’ entry preceded by a space. For instance, ‘htlatex filename "xhtml,uni-html4" " -cunihtf"’.
TeX4ht has different configurations for different modes of output. It is distributed with pre-tailored base configurations for translating LaTeX math into MathML, and extra configurations for adjusting the outcome to Mozilla, MathPlayer, and PMathML CSS. Only presentational MathML is supported.
The mzlatex command is a shortcut for the command htlatex "xhtml,mozilla" " -cmozhtf". It takes into account the special needs of browsers. The xhmlatex command is a shortcut for the command htlatex "xhtml,mathml" " -cunihtf"; it does not make any compromises toward browsers.
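To make the two equivalences concrete, the following sketch prints the long form behind each shortcut. The function expand_shortcut is a hypothetical helper written for this illustration; it is not part of TeX4ht, and it only prints a command line rather than running it.

```shell
#!/bin/sh
# expand_shortcut NAME FILE
# Hypothetical helper (not part of TeX4ht): prints the htlatex call that the
# text above gives as the long form of the shortcut NAME.
expand_shortcut() {
    case $1 in
        mzlatex)  printf 'htlatex %s "%s" "%s"\n' "$2" "xhtml,mozilla" " -cmozhtf" ;;
        xhmlatex) printf 'htlatex %s "%s" "%s"\n' "$2" "xhtml,mathml" " -cunihtf" ;;
        *)        echo "unknown shortcut: $1" >&2; return 1 ;;
    esac
}

expand_shortcut mzlatex filename
expand_shortcut xhmlatex filename
```

Note the space that precedes ‘-cmozhtf’ and ‘-cunihtf’ inside the quotes of the second list.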
It is worth noting some of the more common sources of problems for MathML. The ‘mathml-’ option asks for a degraded MathML output that sidesteps some of these problems.
A translation into an OpenOffice format can be requested with the ‘oolatex’ command. The command is a variant of htlatex in which the first list of options holds the entries ‘xhtml,ooffice’, the second list holds the entry ‘-cmozhtf’ preceded by a space, and the third list contains ‘-coo’. The output of a command ‘oolatex filename’ is a zipped file with a ‘.sxw’ extension.
The OpenOffice code employs MathML for formulas, and XSL-FO for formatting. It can be viewed in the OpenOffice word processor which, in turn, can export RTF and other Microsoft-based formats (see also Maarten Wisse, “Hacking TeX4ht for XML Output: The Road toward a TeX to Word Convertor”, MAPS 28 (2002), pp. 28-35).
A command of the form ‘htlatex filename "html,word" "symbol/!"’ asks for HTML output tuned toward Microsoft Word. Such a format, however, relies on bitmaps for mathematical formulas.
The ‘dblatex’, ‘dbmlatex’, ‘teilatex’, and ‘teimlatex’ commands may be used for requesting DocBook and TEI output.
The leading entry in the first list of options of the htlatex-like commands can be ‘html’ or ‘xhtml’. Otherwise, the entry is taken to be the name of a configuration file. The extension ‘cfg’ is assumed for names of configuration files that are listed without their extension.
A configuration file should take the following form for LaTeX files.
It is up to the user to decide how to distribute the option entries between the \Preamble command and the htlatex-like commands.
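As a sketch of the customary layout of such a file, the snippet below writes a private configuration file and then names it as the leading entry of the first option list. The file name myconfig.cfg, the option choices, and the source name ‘filename’ are assumptions made for the illustration. Here ‘xhtml’ and ‘2’ are given to \Preamble, while ‘info’ is left to the command line, which is one possible distribution of the entries.

```shell
#!/bin/sh
# Write a hypothetical private configuration file, myconfig.cfg.
cat > myconfig.cfg <<'EOF'
% myconfig.cfg: a sketch of a private TeX4ht configuration for LaTeX sources
\Preamble{xhtml,2}
  % configurations to be processed before \begin{document} go here
\begin{document}
  % configurations to be processed after \begin{document} go here
\EndPreamble
EOF

# The leading entry names the configuration file; the 'cfg' extension is implied.
if command -v htlatex >/dev/null 2>&1 && [ -f filename.tex ]; then
    htlatex filename "myconfig,info"
else
    echo 'would run: htlatex filename "myconfig,info"'
fi
```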
Given a LaTeX file
the ‘htlatex filename’ command produces a call ‘latex filename’ to LaTeX on an implicit file of the following form.
Similarly, the command ‘htlatex filename "options"’ produces a call to a ‘latex filename’ command on an implicit file of the following form.
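The file listings that belong to the paragraphs above are not reproduced here. As a hedged sketch, assuming the implicit file differs from the source only in that the TeX4ht package is loaded with the requested options, a call such as ‘htlatex filename "html,2"’ would have LaTeX see something like the file written below. The file name implicit-sketch.tex and the document body are made up for the illustration.

```shell
#!/bin/sh
# Sketch of the file LaTeX is presumably asked to compile for
#   htlatex filename "html,2"
# (an assumption about the implicit form, not a listing from the original text).
cat > implicit-sketch.tex <<'EOF'
\documentclass{article}
% Line contributed by the htlatex call; the options are those of the first list.
% For a plain 'htlatex filename', the option list would presumably be just 'html'.
\usepackage[html,2]{tex4ht}
\begin{document}
Hello, hypertext.
\end{document}
EOF

echo "wrote implicit-sketch.tex"
```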
From the perspective of TeX4ht, the htlatex-like commands and the \usepackage instruction are indirect approaches for getting LaTeX files of the following form. Such files can be provided explicitly for compilations requested through the ‘ht latex filename’ command.
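A sketch of such an explicit file follows, assuming the customary layout in which tex4ht.sty is input directly and the configuration is bracketed by \Preamble and \EndPreamble around \begin{document}. The file name explicit-sketch.tex and the document body are invented for the illustration, and the layout is a reconstruction rather than a listing from the original text.

```shell
#!/bin/sh
# Sketch of a LaTeX file that loads TeX4ht explicitly (assumed layout).
cat > explicit-sketch.tex <<'EOF'
\documentclass{article}
\input tex4ht.sty
\Preamble{html}
  % configurations to be processed before \begin{document} go here
\begin{document}
  % configurations to be processed after \begin{document} go here
\EndPreamble
Hello, hypertext.
\end{document}
EOF

# 'ht latex FILE' compiles the file as it stands, without adding options.
if command -v ht >/dev/null 2>&1 && command -v latex >/dev/null 2>&1; then
    ht latex explicit-sketch || echo "ht reported errors"
else
    echo "would run: ht latex explicit-sketch"
fi
```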
Commands similar to those offered for LaTeX are also offered for TeX (dbmtex, dbtex, ht, httex, mztex, ootex, t4ht, teimtex, teitex, tex4ht, xhmtex, xhtex) and TeXi (dbmtexi, dbtexi, httexi, mztexi, ootexi, teimtexi, teitexi, xhmtexi, xhtexi). In the case of TeX, the fragment of code ‘\csname tex4ht\endcsname’ should be introduced into the source file, after the preamble of the file where the document definitions reside (example). In the case of TeXi, such a code fragment is introduced implicitly.
The private configuration files are similar to those of LaTeX, with the instruction ‘\begin{document}’ excluded.
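For plain TeX, the placement of the ‘\csname tex4ht\endcsname’ fragment can be sketched as follows. The file name plain-sketch.tex and its contents are assumptions made for the illustration. Since \csname turns an undefined name into \relax, the fragment is harmless when the same file is compiled by TeX without TeX4ht.

```shell
#!/bin/sh
# Sketch of a plain TeX source prepared for TeX4ht (hypothetical content).
cat > plain-sketch.tex <<'EOF'
% Document definitions, the "preamble" of a plain TeX file.
\def\greeting{Hello, hypertext.}
% Hook for TeX4ht, placed after the definitions.
\csname tex4ht\endcsname
\greeting
\bye
EOF

# httex is the plain TeX counterpart of htlatex.
if command -v httex >/dev/null 2>&1; then
    httex plain-sketch || echo "httex reported errors"
else
    echo "would run: httex plain-sketch"
fi
```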
The outcome of a translation should be checked with validators for proper syntax. With validators at hand, errors are typically easy to detect and correct, but correcting them requires human intervention.
TeX4ht doesn’t offer a built-in parser to verify the correctness of the outcome. However, external validators can quite easily be integrated into the compilation process.
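One way to add such a check after a translation is sketched below. It assumes that the xmllint tool from libxml2 is installed and that the output is XML-based (for instance, produced with the ‘xhtml’ option). The function name check_wellformed is hypothetical. The check covers well-formedness only, not validity against a DTD or schema, so it is a first filter rather than a full validation.

```shell
#!/bin/sh
# check_wellformed FILE...
# Hypothetical helper: reports files that xmllint does not accept as well-formed XML.
# Returns 0 if every file passes (or if xmllint is unavailable), 1 otherwise.
check_wellformed() {
    if ! command -v xmllint >/dev/null 2>&1; then
        echo "xmllint not found; skipping check" >&2
        return 0
    fi
    status=0
    for f in "$@"; do
        # --noout suppresses the serialized document; only errors are printed.
        if ! xmllint --noout "$f"; then
            echo "not well-formed: $f" >&2
            status=1
        fi
    done
    return $status
}

# Typical use after a translation, with 'filename' as a placeholder:
#   htlatex filename "xhtml" && check_wellformed filename*.html
```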
In keeping with the spirit of LaTeX and hypertext, in which style is assumed to be separated from content, users are encouraged to avoid inserting TeX4ht code into their source files. Instead, they should place their modifications to the default settings within private configuration files to be loaded by htlatex-like commands.
On the other hand, it should be noted that hypertext markings should adhere to strict rules specified by different standards. Consequently, it is strongly advised to check the output obtained from the default configurations, before trying to tailor new ones.