DocBook XML/SGML Processing Using OpenJade

Saqib Ali


saqib@seagate.com

Revision History                                                             
Revision v1.5           2002-08-11             Revised by: sa                
Added the XML section and the sample XML file.                               
Revision v1.4           2002-08-08             Revised by: sa                
Many valuable modifications/corrections suggested by Lloyd D Budd. Thanks    
Lloyd :)                                                                     
Revision v1.3           2002-08-02             Revised by: sa                
Added the "Additional Resources" section.
Revision v1.2           2002-07-23             Revised by: sa                
Added the section on converting HTML -> PDF using HTMLDOC. Thanks to Luc De  
Louw for the suggestion.                                                     
Revision v1.1           2002-07-19             Revised by: KET
Fixed grammatical errors, numbered processes.
Revision v1.0           2002-06-29             Revised by: sa                
Initial public release.                                                      


This HOWTO explains setting up OpenJade to process SGML/XML DocBook
documents.

-----------------------------------------------------------------------------
Table of Contents
1. Introduction
    1.1. Copyright and License
    1.2. Credits
    1.3. What is DocBook?
    1.4. What is DSSSL?
    1.5. What do we need?
    1.6. Assumptions
   
   
2. Requirements
    2.1. Pre-requirements
    2.2. OpenJade
    2.3. DocBook DTD
    2.4. ISO Entities
    2.5. Norman Walsh's DSSSL
    2.6. LDP customized DSSSL stylesheet
    2.7. HTMLDOC (Optional)
   
   
3. Installing Processing Tools
    3.1. Installing OpenJade
    3.2. Installing Norman Walsh's DSSSL
    3.3. Installing DocBook DTDs
    3.4. Installing the ISO Entities
    3.5. Installing LDP DSL
    3.6. Installing HTMLDOC
   
   
4. Using OpenJade
    4.1. Processing SGML
    4.2. Processing XML
    4.3. HTML to PDF (optional)
   
   
5. Further Information
    5.1. News groups
    5.2. Mailing Lists
    5.3. IRC
    5.4. Web Sites
    5.5. Commercial Tools
   
   

1. Introduction

Some Acronyms:

 1. SGML - Standard Generalized Markup Language
   
 2. XML - Extensible Markup Language
   
 3. RTF - Rich Text Format
   
 4. HTML - HyperText Markup Language
   
 5. PDF - Portable Document Format
   

The objective of this document is to set up OpenJade to convert DocBook 3.1
and 4.1 Standard Generalized Markup Language (SGML) and DocBook 4.1.2
Extensible Markup Language (XML) documents to HyperText Markup Language
(HTML), Rich Text Format (RTF), and Portable Document Format (PDF).
-----------------------------------------------------------------------------

1.1. Copyright and License

 This document is Copyright 2001 by Saqib Ali. Permission is granted to copy,
distribute and/or modify this document under the terms of the GNU Free
Documentation License, Version 1.1 or any later version published by the Free
Software Foundation; with no Invariant Sections, with no Front-Cover Texts,
and with no Back-Cover Texts. A copy of the license is available at [http://
www.gnu.org/copyleft/fdl.html] http://www.gnu.org/copyleft/fdl.html
-----------------------------------------------------------------------------

1.2. Credits

All praise is due to Allah, The Lord of the Worlds. All credits go to Allah.
Any mistake in this document is my own fault.

Additionally, I would like to acknowledge the following people for their
valuable contributions to this document:

 1. Greg Ferguson <gferg (at) hoop.timonium.sgi.com> - for very helpful hints
    and suggestions on the DocBook mailing list
   
 2. Kristin Thomas <kristint (at) us.ibm.com> - For the initial review of
    this document.
   
 3. Luc de Louw <luc@delouw.ch> - For suggestions on the HTMLDOC (HTML -> PDF)
    section
   
 4. Lloyd D Budd <ldp@foolswisdom.org> - For suggestions on improving most of
    the sections of the document
   

-----------------------------------------------------------------------------
1.3. What is DocBook?

DocBook is a document type definition (DTD) that describes the structures
and formats used in technical documents. It is commonly used because of its
simplicity and completeness.

A DTD defines the syntax of a document. Essentially, it is a 'rule book' that
describes the set of tags and attributes used to mark up specific kinds of
content. DocBook, then, is a rule book for writing documents: every tag used
in a DocBook document must be defined very specifically and formally in the
DTD.
-----------------------------------------------------------------------------

1.4. What is DSSSL?

The Document Style Semantics and Specification Language (DSSSL) is a
stylesheet language. A DSSSL stylesheet defines how to convert a Standard
Generalized Markup Language (SGML) document into a human-readable format.
-----------------------------------------------------------------------------

1.5. What do we need?

The tools needed to set up OpenJade for converting SGML and XML are:

*OpenJade
   
*ISO Entities
   
*Norman Walsh's DSSSL
   
*DocBook DTDs
   
*LDP DSL
   

Note
    All of these packages are free and available for download on the Internet.
    The next chapter explains how to download these packages.
-----------------------------------------------------------------------------

1.6. Assumptions

This document assumes that you already have the following installed on your
system:

*gzip - available from [http://www.gnu.org] http://www.gnu.org
   
*gcc and GNU make - available from [http://www.gnu.org] http://www.gnu.org
   
*unzip - needed to extract the DocBook DTD archives
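Before going further, you may want to confirm that the tools used later in this HOWTO are on your PATH. The following is a minimal POSIX shell sketch; the helper name missing_tools is hypothetical, and the tool list (gzip, tar, unzip, gcc, make) is simply the set of commands used in the later chapters.

```shell
# Print the name of every tool given as an argument that is not on PATH.
# missing_tools is a hypothetical helper, not part of any package.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Tools used by the commands in this HOWTO.
missing=$(missing_tools gzip tar unzip gcc make)
if [ -n "$missing" ]; then
    # $missing is left unquoted so the names print on a single line.
    echo "Missing tools:" $missing
else
    echo "All prerequisite tools were found."
fi
```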
   

-----------------------------------------------------------------------------
2. Requirements

Of the packages described below, only OpenJade (and, if you want PDF output,
the optional HTMLDOC) has to be compiled. This HOWTO explains the compilation
process, but you should be familiar with installing from source code.

Most of the packages that we need are located at [http://www.tldp.org/
authors/index.html#resources] The Linux Documentation Project (TLDP) website.
-----------------------------------------------------------------------------

2.1. Pre-requirements

 Create a directory /tmp/downloads. We will use this directory to store the
downloaded source code.
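A minimal sketch of this step. DOWNLOADS is a variable introduced here for illustration; it defaults to /tmp/downloads. The -p option makes the command safe to re-run, since mkdir then does not complain if the directory already exists.

```shell
# Create the download directory used throughout this HOWTO.
DOWNLOADS="${DOWNLOADS:-/tmp/downloads}"
mkdir -p "$DOWNLOADS"
ls -ld "$DOWNLOADS"
```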
-----------------------------------------------------------------------------

2.2. OpenJade

OpenJade will be used to process DocBook documents. OpenJade can be
downloaded from: [http://openjade.sourceforge.net/] http://
openjade.sourceforge.net/.

At the time of writing, OpenJade 1.3.1 was available. Download the
openjade-1.3.x.tar.gz file.
-----------------------------------------------------------------------------

2.3. DocBook DTD

All the DocBook DTDs are available from The Linux Documentation Project
website at [http://www.tldp.org/authors/index.html#resources] http://
www.tldp.org/authors/index.html#resources

Please download [http://www.tldp.org/authors/tools/docbk41.zip] DocBook SGML
v4.1, [http://www.tldp.org/authors/tools/docbk31.zip] DocBook SGML v3.1, and
[http://www.tldp.org/authors/tools/docbkx412.zip] DocBook XML v4.1.2.

Note
    Please download all three zip archives.
-----------------------------------------------------------------------------

2.4. ISO Entities

 [http://www.tldp.org] The Linux Documentation Project has packaged all the
Entities into one big tar file and placed it at [http://www.tldp.org/authors/
tools/entities.tar.gz] http://www.tldp.org/authors/tools/entities.tar.gz for
the convenience of the users. Thanks to TLDP for this.
-----------------------------------------------------------------------------

2.5. Norman Walsh's DSSSL

Norman Walsh's DSSSL can be downloaded from the DocBook project website at
[http://sourceforge.net/project/showfiles.php?group_id=21935] http://
sourceforge.net/project/showfiles.php?group_id=21935.

At the time of writing, docbook-dsssl-1.76 was available.
-----------------------------------------------------------------------------

2.6. LDP customized DSSSL stylesheet

LDP DSL (ldp.dsl) is a customized style sheet used by [http://www.tldp.org]
The Linux Documentation Project (TLDP). It is an extension to Norman Walsh's
DSSSL that adds things like a background and a table of contents. It can be
downloaded from [http://www.tldp.org/authors/tools/ldp.dsl] http://
www.tldp.org/authors/tools/ldp.dsl.

ldp.dsl requires Norman Walsh's DSSSL.
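ldp.dsl is a customization layer: a small DSSSL stylesheet that pulls in Norman Walsh's stylesheets and overrides some of their parameters. The sketch below writes a much smaller driver file in the same spirit, for HTML output only. It illustrates the technique and is not the contents of ldp.dsl; the file name mydocbook.dsl and the DSSL_DIR variable are hypothetical, and it assumes the layout from chapter 3, where Norman Walsh's HTML stylesheet ends up at /usr/local/dbtools/docbook-dsssl/html/docbook.dsl. The id="html" on the style-specification is what a #html suffix on the openjade -d option selects.

```shell
# Write a minimal DSSSL customization layer (hypothetical name: mydocbook.dsl).
DSSSL_DIR="${DSSSL_DIR:-/usr/local/dbtools/docbook-dsssl}"

# The heredoc delimiter is unquoted so that $DSSSL_DIR is expanded.
cat > mydocbook.dsl <<EOF
<!DOCTYPE style-sheet PUBLIC "-//James Clark//DTD DSSSL Style Sheet//EN" [
<!ENTITY docbook.dsl SYSTEM "$DSSSL_DIR/html/docbook.dsl" CDATA DSSSL>
]>
<style-sheet>
<style-specification id="html" use="docbook">
<style-specification-body>
;; Override one of Norman Walsh's parameters: add a TOC to articles.
(define %generate-article-toc% #t)
</style-specification-body>
</style-specification>
<external-specification id="docbook" document="docbook.dsl">
</style-sheet>
EOF
```

Such a file would then be used in place of ldp.dsl, for example with -d mydocbook.dsl#html.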
-----------------------------------------------------------------------------

2.7. HTMLDOC (Optional)

HTMLDOC can be used to convert HTML to PDF. If you would like to produce PDF
documents, please download HTMLDOC from [http://www.easysw.com/htmldoc/
software.php] http://www.easysw.com/htmldoc/software.php.
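Once everything is downloaded, a quick check that all the files from this chapter are in /tmp/downloads can save a failed step later. This is a minimal sketch: check_downloads is a hypothetical helper, and the glob patterns stand in for version numbers that the text leaves open (openjade-1.3.x) or that may change (docbook-dsssl). The optional HTMLDOC archive is not checked.

```shell
# Report which of the files from chapter 2 are missing from a directory.
# Usage: check_downloads [DIR]   (DIR defaults to /tmp/downloads)
# Returns 0 when every file is present, 1 otherwise.
check_downloads() {
    dir="${1:-/tmp/downloads}"
    rc=0
    for pattern in 'openjade-1.3*.tar.gz' 'docbook-dsssl-*.tar.gz' \
                   docbk31.zip docbk41.zip docbkx412.zip \
                   entities.tar.gz ldp.dsl; do
        # Expand the pattern inside $dir; if nothing matches, the shell
        # leaves the pattern unexpanded and the -e test fails.
        set -- "$dir"/$pattern
        if [ ! -e "$1" ]; then
            echo "missing: $pattern"
            rc=1
        fi
    done
    return $rc
}
```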
-----------------------------------------------------------------------------

3. Installing Processing Tools

In this section we will install all the tools in the appropriate directories.
All the tools go in the /usr/local/dbtools/ directory. Create this directory
using the following command:
mkdir /usr/local/dbtools                                                     
-----------------------------------------------------------------------------

3.1. Installing OpenJade

This is the easy part, but also the most time-consuming one: OpenJade takes
a long time to compile. To install OpenJade, complete the following steps:

 1. Change directories to /tmp/downloads.
            # cd /tmp/downloads                                              
                                                                             
   
 2. Unzip the file.
            # gzip -d openjade-1.3.x.tar.gz                                  
                                                                             
   
 3. Untar the file.
            # tar -xvf openjade-1.3.x.tar                                    
                                                                             
   
 4. Change directories to openjade-1.3.x.
            # cd openjade-1.3.x                                              
                                                                             
   
 5. Run the ./configure command, setting the installation prefix.
            # ./configure --prefix=/usr/local/dbtools/openjade
                                                                             
   
 6. Run the make command.
            # make                                                           
                                                                             
   
 7. Run the make install command. After this step the openjade binaries will
    be installed under /usr/local/dbtools/openjade.
            # make install                                                   
                                                                             
   
 8. Copy the dsssl directory from /tmp/downloads/openjade-1.3.x to
    /usr/local/dbtools/openjade.
            # cp -dpR dsssl /usr/local/dbtools/openjade/                     
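Steps 1 through 8 can be collected into a shell function, which is convenient if you need to rebuild. This is a sketch under a few assumptions: the function names unpack_tgz and build_openjade are hypothetical, the archive is assumed to unpack into a directory named after it (as step 4 implies), and steps 2 and 3 are merged into a single gzip -dc | tar -xf - pipeline, which also leaves the original .tar.gz in place. The && chain stops at the first failing step.

```shell
# Unpack a .tar.gz into a directory without deleting the archive
# (equivalent to steps 2 and 3, "gzip -d" followed by "tar -xvf").
unpack_tgz() {
    archive="$1"
    dest="${2:-.}"
    gzip -dc "$archive" | (cd "$dest" && tar -xf -)
}

# Build and install OpenJade (steps 1-8). Hypothetical helper.
# Usage: build_openjade /tmp/downloads/openjade-1.3.x.tar.gz [PREFIX]
build_openjade() {
    src_tgz="$1"
    prefix="${2:-/usr/local/dbtools/openjade}"
    workdir=$(dirname "$src_tgz")
    # Assumes the archive unpacks to a directory such as openjade-1.3.1/.
    srcdir="$workdir/$(basename "$src_tgz" .tar.gz)"

    unpack_tgz "$src_tgz" "$workdir" || return 1
    (
        cd "$srcdir" &&
        ./configure --prefix="$prefix" &&
        make &&
        make install &&
        cp -dpR dsssl "$prefix/"
    )
}
```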
                                                                             
   

-----------------------------------------------------------------------------
3.2. Installing Norman Walsh's DSSSL

In this step we will install Norman Walsh's DSSSL in the appropriate place.
The DSSSL does not have to be compiled.

 1. Change directories to /tmp/downloads
            # cd /tmp/downloads                                              
                                                                             
   
 2. Unzip the file.
            # gzip -d docbook-dsssl-1.76.tar.gz                              
                                                                             
   
 3. Untar the file.
            # tar -xvf docbook-dsssl-1.76.tar                                
                                                                             
   
 4. Move the extracted directory to /usr/local/dbtools/docbook-dsssl.
            # mv docbook-dsssl-1.76 /usr/local/dbtools/docbook-dsssl         
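Later commands in this HOWTO reference several files inside /usr/local/dbtools/docbook-dsssl: the catalog, the html/ and print/ stylesheet directories, and dtds/decls/xml.dcl. A small check such as the following (verify_dsssl is a hypothetical helper) can confirm that the move put everything where those commands expect it. It assumes the stylesheet distribution contains html/docbook.dsl and print/docbook.dsl, as Norman Walsh's releases do.

```shell
# Check that the files used later in this HOWTO exist under the
# DSSSL installation directory (default: /usr/local/dbtools/docbook-dsssl).
verify_dsssl() {
    base="${1:-/usr/local/dbtools/docbook-dsssl}"
    rc=0
    for f in catalog html/docbook.dsl print/docbook.dsl dtds/decls/xml.dcl; do
        if [ ! -f "$base/$f" ]; then
            echo "missing: $base/$f"
            rc=1
        fi
    done
    return $rc
}
```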
                                                                             
   

-----------------------------------------------------------------------------
3.3. Installing DocBook DTDs

In this section we will install DocBook DTDs.

 1. Change directories to /usr/local/dbtools.
            # cd /usr/local/dbtools                                          
                                                                             
   
 2. Create three new directories: dtd3.1, dtd4.1, and dtd4.1.2.
            # mkdir dtd3.1                                                   
            # mkdir dtd4.1                                                   
            # mkdir dtd4.1.2                                                 
                                                                             
   
 3. Change to the dtd3.1 directory.
                    # cd dtd3.1

   
 4. Unzip the DocBook SGML v3.1 archive in this directory.
                    # unzip /tmp/downloads/docbk31.zip

   
 5. Change to the dtd4.1 directory.
                    # cd ../dtd4.1

   
 6. Unzip the DocBook SGML v4.1 archive in this directory.
                    # unzip /tmp/downloads/docbk41.zip

   
 7. Change to the dtd4.1.2 directory.
                    # cd ../dtd4.1.2

   
 8. Unzip the DocBook XML v4.1.2 archive in this directory.
                    # unzip /tmp/downloads/docbkx412.zip
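Steps 2 through 8 repeat the same pattern three times, so they can also be written as a loop. A sketch, assuming the archives are in /tmp/downloads; install_dtds and the DBTOOLS and DOWNLOADS variables are introduced here for illustration. unzip -o overwrites existing files without prompting, so the function can be re-run safely.

```shell
# Install the three DocBook DTD archives, each into its own directory.
DBTOOLS="${DBTOOLS:-/usr/local/dbtools}"
DOWNLOADS="${DOWNLOADS:-/tmp/downloads}"

install_dtds() {
    # Each entry is DIRECTORY:ARCHIVE.
    for pair in dtd3.1:docbk31.zip dtd4.1:docbk41.zip dtd4.1.2:docbkx412.zip; do
        dir="${pair%%:*}"     # text before the colon
        zip="${pair#*:}"      # text after the colon
        mkdir -p "$DBTOOLS/$dir" || return 1
        (cd "$DBTOOLS/$dir" && unzip -o -q "$DOWNLOADS/$zip") || return 1
    done
}
```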
                                                                             
   

-----------------------------------------------------------------------------
3.4. Installing the ISO Entities

In this section we will install the ISO entities that we downloaded from the
LDP website.

First we install the ISO Entities for the 3.1 SGML DTD.

 1. Change directories to the /usr/local/dbtools/dtd3.1 directory.
            # cd /usr/local/dbtools/dtd3.1                                   
                                                                             
   
 2. Copy /tmp/downloads/entities.tar.gz to this directory.
            # cp /tmp/downloads/entities.tar.gz .
                                                                             
   
 3. Unzip the file.
            # gzip -d entities.tar.gz                                        
                                                                             
   
 4. Untar the file.
            # tar -xvf entities.tar                                          
                                                                             
   

Next we install the ISO Entities for the 4.1 SGML DTD.

 1. Change directories to the /usr/local/dbtools/dtd4.1 directory.
            # cd /usr/local/dbtools/dtd4.1                                   
                                                                             
   
 2. Copy /tmp/downloads/entities.tar.gz to this directory.
            # cp /tmp/downloads/entities.tar.gz .
                                                                             
   
 3. Unzip the file.
            # gzip -d entities.tar.gz                                        
                                                                             
   
 4. Untar the file.
            # tar -xvf entities.tar                                          
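The two procedures above differ only in the target directory, so they can be combined. This sketch (install_entities and the DBTOOLS and DOWNLOADS variables are hypothetical names) streams the archive through gzip -dc instead of copying and decompressing it in each directory, which leaves a single compressed copy in /tmp/downloads. It assumes the dtd3.1 and dtd4.1 directories from section 3.3 already exist.

```shell
DBTOOLS="${DBTOOLS:-/usr/local/dbtools}"
DOWNLOADS="${DOWNLOADS:-/tmp/downloads}"

# Extract the ISO entities into each SGML DTD directory.
install_entities() {
    for dir in dtd3.1 dtd4.1; do
        # The subshell keeps the cd from affecting the caller.
        (cd "$DBTOOLS/$dir" && gzip -dc "$DOWNLOADS/entities.tar.gz" | tar -xf -) ||
            return 1
    done
}
```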
                                                                             
   

-----------------------------------------------------------------------------
3.5. Installing LDP DSL

Finally we install the ldp.dsl.

 1. Change directories to the /tmp/downloads directory.
            # cd /tmp/downloads

   
 2. Copy the ldp.dsl file to the /usr/local/dbtools/docbook-dsssl/print/
    directory.
            # cp ldp.dsl /usr/local/dbtools/docbook-dsssl/print/ldp.dsl

   
 3. Copy the ldp.dsl file to the /usr/local/dbtools/docbook-dsssl/html/
    directory.
            # cp ldp.dsl /usr/local/dbtools/docbook-dsssl/html/ldp.dsl       
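The two copies can also be done in one loop. A minimal sketch using hypothetical DBTOOLS and DOWNLOADS variables and a hypothetical install_ldp_dsl function; it copies ldp.dsl into both the print/ and html/ stylesheet directories, which must already exist from section 3.2.

```shell
DBTOOLS="${DBTOOLS:-/usr/local/dbtools}"
DOWNLOADS="${DOWNLOADS:-/tmp/downloads}"

# Copy ldp.dsl next to both of Norman Walsh's stylesheets.
install_ldp_dsl() {
    for style in print html; do
        cp "$DOWNLOADS/ldp.dsl" "$DBTOOLS/docbook-dsssl/$style/ldp.dsl" || return 1
    done
}
```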
                                                                             
   

-----------------------------------------------------------------------------
3.6. Installing HTMLDOC

This step is optional. It is only required if you want to produce PDF
documents.

Change back to the downloads directory:
        # cd /tmp/downloads

Untar the source code for HTMLDOC:
        # gzip -d htmldoc-1.8.xx-source.tar.gz
        # tar -xvf htmldoc-1.8.xx-source.tar
        # cd htmldoc-1.8.xx-1

Run configure to set the installation location, then run make:
        # ./configure --prefix=/usr/local/dbtools/htmldoc
        # make

At the time of writing, HTMLDOC version 1.8.20-1 was available. This version
had a small problem in the fonts Makefile: installation fails while copying
the fonts, because some of the font files it tries to copy are not present.

Here is the error you will get while running make install:
        # make install                                                       
Making all in htmldoc...                                                     
Making all in doc...                                                         
Installing in fonts...                                                       
Installing font files in /usr/local/dbtools/htmldoc/share/htmldoc/fonts...   
/bin/cp: cannot stat `ZapfChancery.afm': No such file or directory           
/bin/cp: cannot stat `ZapfChancery.pfa': No such file or directory           
/bin/cp: cannot stat `ZapfDingbats.afm': No such file or directory           
/bin/cp: cannot stat `ZapfDingbats.pfa': No such file or directory           
make[1]: *** [install] Error 1                                               
                                                                             

To fix this installation issue, edit fonts/Makefile and comment out the lines
that refer to the ZapfChancery and ZapfDingbats fonts.
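If you prefer not to edit the file by hand, the change can be scripted with sed. This is a sketch, not a tested patch for HTMLDOC: the function name comment_out_zapf is hypothetical, and it assumes each reference to these fonts sits on its own line that can be commented out safely (for example, an install command rather than one line of a backslash-continued variable), so check the result before running make install. The original Makefile is kept as fonts/Makefile.orig.

```shell
# Comment out every line of a Makefile that mentions ZapfChancery or ZapfDingbats.
# Usage: comment_out_zapf fonts/Makefile
comment_out_zapf() {
    makefile="$1"
    cp "$makefile" "$makefile.orig" || return 1
    sed -e '/ZapfChancery/s/^/#/' -e '/ZapfDingbats/s/^/#/' \
        "$makefile.orig" > "$makefile" || return 1
    # Show the lines that are now commented out.
    grep -n '^#.*Zapf' "$makefile" || echo "no Zapf lines found in $makefile"
}
```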

Then execute the install:
# make install                                                               
Making all in htmldoc...                                                     
Making all in doc...                                                         
Installing in fonts...                                                       
Installing font files in /usr/local/dbtools/htmldoc/share/htmldoc/fonts...   
Installing in data...                                                        
Installing in doc...                                                         
Installing in htmldoc...                                                     
                                                                             
-----------------------------------------------------------------------------

4. Using OpenJade

In this section we will use OpenJade to convert SGML/XML documents to HTML,
RTF, and PDF.
-----------------------------------------------------------------------------

4.1. Processing SGML

You can download a sample DocBook 3.1 SGML file from [http://
docbook.sc-icc.org/DocBook-OpenJade-SGML-XML-HOWTO.sgml] http://
docbook.sc-icc.org/DocBook-OpenJade-SGML-XML-HOWTO.sgml
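If the sample file is not available, a minimal DocBook 3.1 document is enough to test the tool chain. The heredoc below writes one; the file name hello.sgml is hypothetical. The public identifier is the standard one for the DocBook 3.1 SGML DTD, which the docbook.cat catalog installed in section 3.3 maps to the local copy.

```shell
# Write a minimal DocBook 3.1 SGML article for testing.
cat > hello.sgml <<'EOF'
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook V3.1//EN">
<article>
  <artheader>
    <title>Hello, DocBook</title>
  </artheader>
  <sect1>
    <title>First Section</title>
    <para>If you can read this in HTML or RTF, OpenJade is working.</para>
  </sect1>
</article>
EOF
```

To use it, substitute hello.sgml for DocBook-OpenJade-SGML-XML-HOWTO.sgml in the commands that follow.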
-----------------------------------------------------------------------------

4.1.1. Setting the SGML_CATALOG_FILES Environment Variable for SGML

The SGML_CATALOG_FILES variable must be set to point to the appropriate
catalog files. To set the variable, use the following command:
# export SGML_CATALOG_FILES=/usr/local/dbtools/openjade/dsssl/catalog:/usr/local/dbtools/dtd3.1/docbook.cat:/usr/local/dbtools/docbook-dsssl/catalog 
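Typing the long colon-separated list is error-prone. A small function can build it from individual paths and warn about any catalog file that does not exist, a common cause of OpenJade failing to find a DTD or stylesheet. This is a sketch: set_catalogs and DBTOOLS are hypothetical names, and the paths are the ones from the command above.

```shell
DBTOOLS="${DBTOOLS:-/usr/local/dbtools}"

# Join the given catalog paths with ':' into SGML_CATALOG_FILES and export it.
# A warning is printed on stderr for any path that is not an existing file.
set_catalogs() {
    SGML_CATALOG_FILES=""
    for catalog in "$@"; do
        [ -f "$catalog" ] || echo "warning: catalog not found: $catalog" >&2
        SGML_CATALOG_FILES="${SGML_CATALOG_FILES:+$SGML_CATALOG_FILES:}$catalog"
    done
    export SGML_CATALOG_FILES
}

set_catalogs "$DBTOOLS/openjade/dsssl/catalog" \
             "$DBTOOLS/dtd3.1/docbook.cat" \
             "$DBTOOLS/docbook-dsssl/catalog"
```

For XML (section 4.2.1), the same function would be called with dtd4.1.2/docbook.cat in place of dtd3.1/docbook.cat.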
-----------------------------------------------------------------------------

4.1.2. SGML to HTML

To convert from SGML to HTML, use the following command:
# /usr/local/dbtools/openjade/bin/openjade -t sgml -d /usr/local/dbtools/docbook-dsssl/html/ldp.dsl#html DocBook-OpenJade-SGML-XML-HOWTO.sgml  

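To create non-chunked (all-in-one) output, add -V nochunks. In this mode the
HTML is written to standard output, so redirect it to a file:
# /usr/local/dbtools/openjade/bin/openjade -V nochunks -t sgml -d /usr/local/dbtools/docbook-dsssl/html/ldp.dsl#html DocBook-OpenJade-SGML-XML-HOWTO.sgml > DocBook-OpenJade-SGML-XML-HOWTO.html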
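To avoid retyping the full paths, you might wrap both variants in a shell function. A sketch: sgml2html, OPENJADE, and DSSSL_DIR are names introduced here, defaulting to the paths used in this HOWTO. In the non-chunked variant the output is redirected to a file named after the input, since the stylesheet writes that single page to standard output.

```shell
OPENJADE="${OPENJADE:-/usr/local/dbtools/openjade/bin/openjade}"
DSSSL_DIR="${DSSSL_DIR:-/usr/local/dbtools/docbook-dsssl}"

# Convert a DocBook SGML file to HTML with the LDP stylesheet.
# Usage: sgml2html FILE.sgml            (chunked: several HTML files)
#        sgml2html FILE.sgml nochunks   (single page, saved as FILE.html)
sgml2html() {
    if [ "$2" = nochunks ]; then
        "$OPENJADE" -V nochunks -t sgml -d "$DSSSL_DIR/html/ldp.dsl#html" "$1" \
            > "${1%.sgml}.html"
    else
        "$OPENJADE" -t sgml -d "$DSSSL_DIR/html/ldp.dsl#html" "$1"
    fi
}
```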
-----------------------------------------------------------------------------

4.1.3. SGML to RTF

To convert from SGML to RTF, use the following command:
# /usr/local/dbtools/openjade/bin/openjade -t rtf -d /usr/local/dbtools/docbook-dsssl/print/ldp.dsl#print DocBook-OpenJade-SGML-XML-HOWTO.sgml  
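If you have several documents, a loop such as this sketch converts every .sgml file in the current directory to RTF, using OpenJade's -o option to name each output file explicitly after its input. OPENJADE, DSSSL_DIR, and sgml2rtf_all are names introduced for this example.

```shell
OPENJADE="${OPENJADE:-/usr/local/dbtools/openjade/bin/openjade}"
DSSSL_DIR="${DSSSL_DIR:-/usr/local/dbtools/docbook-dsssl}"

# Convert every *.sgml file in the current directory to RTF.
sgml2rtf_all() {
    for doc in *.sgml; do
        [ -e "$doc" ] || continue    # no .sgml files: the glob stays literal
        echo "converting $doc"
        "$OPENJADE" -t rtf -d "$DSSSL_DIR/print/ldp.dsl#print" \
            -o "${doc%.sgml}.rtf" "$doc" || return 1
    done
}
```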
-----------------------------------------------------------------------------

4.2. Processing XML

You can download a sample DocBook 4.1.2 XML file from [http://
docbook.sc-icc.org/DocBook-OpenJade-SGML-XML-HOWTO.xml] http://
docbook.sc-icc.org/DocBook-OpenJade-SGML-XML-HOWTO.xml
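As with SGML, a minimal DocBook XML 4.1.2 document can stand in for the sample file. A sketch; hello.xml is a hypothetical name. The public identifier is the standard one for DocBook XML V4.1.2, the system identifier is the OASIS URL conventionally paired with it, and the docbook.cat catalog in dtd4.1.2 maps the public identifier to the local DTD.

```shell
# Write a minimal DocBook XML 4.1.2 article for testing.
cat > hello.xml <<'EOF'
<?xml version="1.0"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<article>
  <articleinfo>
    <title>Hello, DocBook XML</title>
  </articleinfo>
  <sect1>
    <title>First Section</title>
    <para>If you can read this in HTML or RTF, OpenJade is working.</para>
  </sect1>
</article>
EOF
```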
-----------------------------------------------------------------------------

4.2.1. Setting the SGML_CATALOG_FILES Environment Variable for XML

The SGML_CATALOG_FILES variable must be set to point to the appropriate
catalog files. To set the variable, use the following command:
# export SGML_CATALOG_FILES=/usr/local/dbtools/openjade/dsssl/catalog:/usr/local/dbtools/dtd4.1.2/docbook.cat:/usr/local/dbtools/docbook-dsssl/catalog 
                                                                                                                                                       
-----------------------------------------------------------------------------

4.2.2. XML to HTML

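To convert from XML to HTML, use the following command:
# /usr/local/dbtools/openjade/bin/openjade -t sgml -d /usr/local/dbtools/docbook-dsssl/html/ldp.dsl#html /usr/local/dbtools/docbook-dsssl/dtds/decls/xml.dcl DocBook-OpenJade-SGML-XML-HOWTO.xml

The -t option selects the output backend, not the input type: as in section
4.1.2, sgml is the backend used for HTML output. Listing xml.dcl before the
document is what tells OpenJade to parse the input as XML.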
-----------------------------------------------------------------------------

4.2.3. XML to RTF

To convert from XML to RTF, use the following command:
# /usr/local/dbtools/openjade/bin/openjade -t rtf -d /usr/local/dbtools/docbook-dsssl/print/ldp.dsl#print /usr/local/dbtools/docbook-dsssl/dtds/decls/xml.dcl DocBook-OpenJade-SGML-XML-HOWTO.xml 
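Both XML commands share the same structure: the output backend and stylesheet change, while the XML declaration xml.dcl is always listed before the document. A sketch of a wrapper; xml2, OPENJADE, and DSSSL_DIR are hypothetical names. It uses -t sgml for HTML, the backend the SGML commands in section 4.1.2 use with the html stylesheet.

```shell
OPENJADE="${OPENJADE:-/usr/local/dbtools/openjade/bin/openjade}"
DSSSL_DIR="${DSSSL_DIR:-/usr/local/dbtools/docbook-dsssl}"

# Convert a DocBook XML file. Usage: xml2 html|rtf FILE.xml
xml2() {
    case "$1" in
        html) backend=sgml; spec="$DSSSL_DIR/html/ldp.dsl#html" ;;
        rtf)  backend=rtf;  spec="$DSSSL_DIR/print/ldp.dsl#print" ;;
        *)    echo "usage: xml2 html|rtf FILE.xml" >&2; return 2 ;;
    esac
    # xml.dcl must come before the document so that it is parsed as XML.
    "$OPENJADE" -t "$backend" -d "$spec" "$DSSSL_DIR/dtds/decls/xml.dcl" "$2"
}
```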
-----------------------------------------------------------------------------

4.3. HTML to PDF (optional)

To convert to PDF, we use HTMLDOC. First create non-chunked HTML output of
the SGML document (non-chunked HTML is written to standard output, so
redirect it to a file):
# /usr/local/dbtools/openjade/bin/openjade -V nochunks -t sgml -d /usr/local/dbtools/docbook-dsssl/html/ldp.dsl#html DocBook-OpenJade-SGML-XML-HOWTO.sgml > DocBook-OpenJade-SGML-XML-HOWTO.html

Then run HTMLDOC on that HTML file to produce the PDF (replace input.html and
outfile.pdf with your own file names):
# /usr/local/dbtools/htmldoc/bin/htmldoc -f outfile.pdf input.html           
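The two steps can be chained in a small function. A sketch with hypothetical names (sgml2pdf, OPENJADE, DSSSL_DIR, HTMLDOC), defaulting to the paths used in this HOWTO. The intermediate HTML is saved next to the input and named after it, and HTMLDOC runs only if OpenJade succeeds.

```shell
OPENJADE="${OPENJADE:-/usr/local/dbtools/openjade/bin/openjade}"
DSSSL_DIR="${DSSSL_DIR:-/usr/local/dbtools/docbook-dsssl}"
HTMLDOC="${HTMLDOC:-/usr/local/dbtools/htmldoc/bin/htmldoc}"

# SGML -> single-page HTML -> PDF. Usage: sgml2pdf FILE.sgml
sgml2pdf() {
    base="${1%.sgml}"
    "$OPENJADE" -V nochunks -t sgml -d "$DSSSL_DIR/html/ldp.dsl#html" "$1" \
        > "$base.html" &&
    "$HTMLDOC" -f "$base.pdf" "$base.html"
}
```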
-----------------------------------------------------------------------------

5. Further Information

This section has some pointers to related resources on the Internet.

If you would like to suggest additional resources for this section, please
email me at <saqib@seagate.com>. Thanks.
-----------------------------------------------------------------------------

5.1. News groups

Some of the News groups of interest are:

 1. comp.text.sgml (easily accessible from [http://www.deja.com] http://
    www.deja.com)
   
 2. comp.text.xml (easily accessible from [http://www.deja.com] http://
    www.deja.com)
   
 3. htmldoc.general (server - nntp://news.easysw.com)
   

-----------------------------------------------------------------------------
5.2. Mailing Lists

Here are some relevant mailing lists:

 1. DocBook mailing list @ OASIS. Visit [http://www.oasis-open.org/committees
    /docbook/mailinglist/index.shtml] http://www.oasis-open.org/committees/
    docbook/mailinglist/index.shtml for more info.
   
 2. DocBook mailing list @ TLDP. Visit [http://www.tldp.org/mailinfo.html]
    http://www.tldp.org/mailinfo.html for more info.
   
 3. xml-doc @ Yahoo Groups. Visit [http://groups.yahoo.com/group/xml-doc/]
    http://groups.yahoo.com/group/xml-doc/ for more info.
   

-----------------------------------------------------------------------------
5.3. IRC

 1. DocBook IRC Channel. #docbook on irc://irc.openprojects.net
   

-----------------------------------------------------------------------------
5.4. Web Sites

 1. [http://www.oasis-open.org/] http://www.oasis-open.org/ OASIS maintains
    various DocBook DTDs
   
 2. [http://docbook.org/wiki/moin.cgi/] http://docbook.org/wiki/moin.cgi/ The
    DocBook Wiki
   
 3. [http://www.docbook.org/tdg/en/] http://www.docbook.org/tdg/en/ Online
    version of DocBook: The Definitive Guide
   
 4. [http://www-106.ibm.com/developerworks/library/l-docbk.html] http://
    www-106.ibm.com/developerworks/library/l-docbk.html A gentle guide to
    DocBook (very good introduction).
   
 5. [http://www.tldp.org/LDP/LDP-Author-Guide/index.html] http://www.tldp.org
    /LDP/LDP-Author-Guide/index.html The Linux Documentation Project (TLDP)
    Author Guide
   
 6. [http://www.tldp.org/authors/index.html#resources] http://www.tldp.org/
    authors/index.html#resources DocBook resources provided by TLDP
   

-----------------------------------------------------------------------------
5.5. Commercial Tools

 1. DocPro by Command Prompt, INC. [http://www.commandprompt.com/entry.lxp?
    lxpe=2] http://www.commandprompt.com/entry.lxp?lxpe=2
   
 2. YAWC Pro by XML Workshop LTD. [http://www.yawcpro.com/] http://
    www.yawcpro.com/. Can be used for converting MS Word to Simple DocBook
    XML.
   
 3. Logictran RTF Converter. [http://www.logictran.com/] http://
    www.logictran.com/. Word/RTF to HTML/XML
   
 4. MajiX - Word to XML converter. [http://tetrasys.dhs.org/] http://
    tetrasys.dhs.org/
   
 5. XMETAL by SoftQuad [http://www.softquad.com/] http://www.softquad.com/
   

