About HTML

Native documents on the World-Wide Web are written in HTML, the HyperText Markup Language. HTML defines the structural elements of a document (such as headings, citations and addresses), layout information (bold and italics), the use of inline graphics, and hypertext links.

A simple HTML document is illustrated in Figure 4-1.

<TITLE>The World-Wide Web</TITLE>
<H1>About The World-Wide Web</H1>
<P>The World-Wide Web is a <EM>distributed multimedia
hypertext</EM> system.</P>
Figure 4-1 A Simple HTML Document.

Structural elements in the document are identified by start and end tags. For example, the <TITLE> and </TITLE> tags are used to specify the title of the document, which is often displayed by a client. The <H1> and </H1> tags are used to define a first level heading. Clients will normally display headings differently from the body text: for example, a graphical client could display the heading using a larger or different font, whereas a text-based client could display a heading as centred text or in all capitals.

Figure 4-1 also illustrates the <EM> container. Text held in the container (which is defined by the <EM> start tag and the </EM> end tag) will be emphasised in some way. A graphical browser could render the emphasised text by displaying it in italics, whereas a browser with audio capabilities for the visually impaired could render the emphasis by a change in the tone of the voice output.
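The way a text-based client might treat these containers can be sketched in a few lines of Python. This is only an illustration, not how any particular browser works: the class name TextClient, the render function and the choice of asterisks to mark emphasis are all assumptions made for the example.

```python
import re
from html.parser import HTMLParser

FIGURE_4_1 = (
    "<TITLE>The World-Wide Web</TITLE>\n"
    "<H1>About The World-Wide Web</H1>\n"
    "<P>The World-Wide Web is a <EM>distributed multimedia\n"
    "hypertext</EM> system.</P>"
)


class TextClient(HTMLParser):
    """Toy text-only renderer (hypothetical): <H1> in capitals, <EM> in asterisks."""

    def __init__(self):
        super().__init__()
        self.in_heading = False
        self.in_title = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lower case.
        if tag == "h1":
            self.in_heading = True
        elif tag == "title":
            self.in_title = True
        elif tag == "em":
            self.parts.append("*")

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_heading = False
            self.parts.append("\n")  # a heading ends its own line
        elif tag == "title":
            self.in_title = False
        elif tag == "em":
            self.parts.append("*")

    def handle_data(self, data):
        if self.in_title:
            return  # a client would show the title elsewhere, not in the text
        text = re.sub(r"\s+", " ", data)  # line breaks in the source are not significant
        self.parts.append(text.upper() if self.in_heading else text)


def render(markup):
    client = TextClient()
    client.feed(markup)
    client.close()
    lines = [line.strip() for line in "".join(client.parts).split("\n")]
    return "\n".join(line for line in lines if line)


print(render(FIGURE_4_1))
```

The markup says only that the text is a heading or is emphasised; the decision to use capitals and asterisks belongs entirely to the client.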

Figure 4-1 also shows the paragraph container. It is important to understand that the <P> tag is the start tag of a paragraph container and is no longer a paragraph separator (as many people mistakenly believe). If the </P> end tag is omitted, the next <P> tag implies it. In future versions of HTML it will be possible to specify paragraph attributes: for example <P ALIGN=CENTER>.
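The implied end tag can be illustrated with a short Python sketch. It is a simplified reading of the rule described above, not a full HTML parser: the names ParagraphCollector and paragraphs are invented for the example, and only <P> is handled.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collects paragraph text, treating a new <P> as an implied </P> (sketch)."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self.current = None  # text of the open <P> container, or None

    def _close_paragraph(self):
        if self.current is not None:
            self.paragraphs.append(" ".join(self.current.split()))
            self.current = None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._close_paragraph()  # a new <P> implies </P> for the previous one
            self.current = ""

    def handle_endtag(self, tag):
        if tag == "p":
            self._close_paragraph()

    def handle_data(self, data):
        if self.current is not None:
            self.current += data

    def close(self):
        super().close()
        self._close_paragraph()  # the end of the document also ends an open paragraph


def paragraphs(markup):
    collector = ParagraphCollector()
    collector.feed(markup)
    collector.close()
    return collector.paragraphs


print(paragraphs("<P>First paragraph.<P>Second paragraph.</P>"))
```

Both paragraphs are recovered even though the first has no explicit </P>.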

Although browsers will display the HTML document shown in Figure 4-1, for reasons of performance and upwards compatibility it is strongly recommended that HTML documents contain additional elements including the <HTML>, <HEAD> and <BODY> tags, as shown in Figure 4-2.

<HTML><HEAD>
<TITLE>The World-Wide Web</TITLE>
</HEAD><BODY>
<H1>About The World-Wide Web</H1>
<P>Information about the World-Wide Web is available 
<A HREF="http://info.cern.ch/hypertext/WWW/TheProject.html"> at
CERN</A>.</P>
</BODY></HTML>
Figure 4-2 A Simple HTML Document.

The <HTML> container is used to define the extent of the HTML document. Within the HTML document there are two other containers: <HEAD> and <BODY>. The <HEAD> container provides information about the document itself. This can include the title of the document (as illustrated), copyright information, keywords and expiry dates (for use by caching software). It is important to use the <HEAD> container since, for example, an automatic indexing program which indexes the titles of HTML documents need parse only the information held in that container. If the <HEAD> container is not present the entire document may have to be parsed, which places unnecessary extra load on the server.
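The saving for an indexing program can be sketched as follows. This is an assumed, minimal indexer written for illustration (the names TitleIndexer, HeadDone and index_title are hypothetical); it reads a document in chunks and stops as soon as the </HEAD> end tag has been seen.

```python
from html.parser import HTMLParser


class HeadDone(Exception):
    """Raised to stop parsing once </HEAD> has been seen."""


class TitleIndexer(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False
        elif tag == "head":
            raise HeadDone  # nothing after </HEAD> needs to be read

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def index_title(chunks):
    """Return (title, number of chunks read), stopping at </HEAD>."""
    indexer = TitleIndexer()
    read = 0
    try:
        for chunk in chunks:
            read += 1
            indexer.feed(chunk)
    except HeadDone:
        pass
    return " ".join(indexer.title.split()), read


document = [
    "<HTML><HEAD><TITLE>The World-",
    "Wide Web</TITLE></HEAD>",
    "<BODY><H1>About The World-Wide Web</H1>",
    "</BODY></HTML>",
]
print(index_title(document))
```

Only the first two chunks are read. Without the <HEAD> container there would be no safe point at which to stop, and every chunk would have to be fetched and parsed.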

Figure 4-2 also illustrates the use of the anchor (<A>) container, which is used to provide hypertext links. In the example the text "at CERN", which is contained between the <A> and </A> tags, will be highlighted in some way by the browser. Selecting this highlighted phrase will cause the client to send a request for http://info.cern.ch/hypertext/WWW/TheProject.html. This request will use the HTTP protocol and will be sent to the server running on the system at info.cern.ch.
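How a client derives the protocol and the server from the link can be shown with Python's standard urllib.parse module. The function name describe_request and the dictionary keys are chosen for this illustration only.

```python
from urllib.parse import urlsplit


def describe_request(url):
    """Split a URL into the pieces a client needs to make the request."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,   # how to talk to the server
        "server": parts.hostname,   # which system to contact
        "document": parts.path,     # what to ask that server for
    }


print(describe_request("http://info.cern.ch/hypertext/WWW/TheProject.html"))
```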

HTML Authoring Tools

Initially, information providers on the World-Wide Web used standard editors such as vi and emacs to create HTML documents. As the World-Wide Web grew in popularity, authoring tools were developed to assist information providers. This section describes three authoring tools which are available for the Microsoft Windows environment: HTML Assistant, HTML Hyperedit and HTMLEd.

HTML Assistant

HTML Assistant is a simple authoring tool which can be used to create and edit HTML documents. A list of Frequently Asked Questions about HTML Assistant is available at the URL http://cs.dal.ca/ftp/htmlasst/htmlafaq.html. HTML Assistant itself is available at the URL ftp://ftp.cica.indiana.edu/pub/pc/win3/misc. In the UK it is available at the URL ftp://src.doc.ic.ac.uk/packages/WWW/tools/editing/ms-windows/html-assistant.

Figure 4-3 HTML Assistant.

HTML Hyperedit

HTML Hyperedit (which was developed using the Toolbook authoring system) not only provides an environment for producing HTML documents, but also contains a tutorial which gives an introduction to HTML. HTML Hyperedit is available at the URL ftp://info.curtin.edu.au/pub/internet/mswindows/hyperedit. In the UK it is available at the URL ftp://src.doc.ic.ac.uk/packages/WWW/tools/editing/ms-windows/win-htmledit.

Figure 4-4 HTML Hyperedit.


HTMLEd

HTMLEd is a simple authoring tool which can be used to create HTML documents. In the UK it is available at the URL ftp://src.doc.ic.ac.uk/packages/WWW/tools/editing/ms-windows/

Figure 4-5 HTMLEd.

Word Processing Tools

HTML Assistant and HTML Hyperedit are self-contained authoring tools. Another approach is to develop authoring tools which work within a word processing environment. These tools are normally implemented as macros for popular word processing packages, such as Word For Windows or WordPerfect. This section describes three tools which have been developed for use within Word For Windows: the GT_HTML, CU_HTML and ANT_HTML macros.

Word processing tools have the advantage that they provide a consistent environment for existing users of word processors. However they do have their disadvantages. Because they are normally implemented as macros, they can be very slow, especially when used with large or complicated documents. There is also a danger that HTML markup which is embedded as hidden text could cause conflicts with other word processing tools if, for example, the word processed document was used by other users.


GT_HTML

One of the first word processing macros which could be used to create HTML documents was the GT_HTML macro. This macro, written for Word For Windows, was developed at the Georgia Tech Research Institute. In the UK the software is available at the URL ftp://src.doc.ic.ac.uk/packages/WWW/tools/editing/macros/ms-winword

Figure 4-6 The GT_HTML Macro.


CU_HTML

CU_HTML is a template designed to work within Word For Windows. The template was written by Anton Lam. The software is available at the URL ftp://ftp.cuhk.hk/pub/www/windows/util

Figure 4-7 The CU_HTML Macro.


ANT_HTML

ANT_HTML is a template designed to work within Word For Windows 6.0. The template was written by Jill Swift (mailto:jswift@freenet.fsu.edu). The software is available at the URL ftp://ftp.einet.net/einet/pc/ANT_HTML.ZIP

Figure 4-8 The ANT_HTML Macro.

Browser Editing Tools

Another approach to editing HTML documents is provided by browsers which are integrated with editing tools. The Arena browser enables an external editor to be invoked to edit the displayed HTML document. Figure 4-9 illustrates the Arena browser used in conjunction with the Emacs editor.

Figure 4-9 Editing A Document From Arena.

HTML Document Conversion Tools

Authoring tools are normally used to create new HTML documents. Document conversion tools, on the other hand, can be used to convert existing documents to HTML format.


LaTeX2html

One of the first sophisticated document conversion tools to be developed was the LaTeX2html conversion program. This program was written by Nikos Drakos, Computer Based Learning Unit, University of Leeds. It set the standard for document converters, providing a wide range of features.

Figure 4-10 illustrates a document which has been converted by the LaTeX2html conversion program.

Figure 4-10 A Document Converted Using LaTeX2html.

LaTeX2html is available at the URL ftp://src.doc.ic.ac.uk/packages/WWW/tools/translators/latex2html. Further information is available at the URL http://cbl.leeds.ac.uk/nikos/doc/www94/www94.html


RTFtohtml

The RTFtohtml conversion program enables RTF files (which can be produced by word processing packages such as Word For Windows) to be converted to HTML. The program was written by Chris Hector (Cray), based on RTF parsing software developed by Paul DuBois.

RTFtohtml is available as a command line tool for a number of Unix platforms. In addition, an Apple Macintosh implementation is available. A beta version of an MS-DOS implementation was announced in November 1994.

An extension of the RTFtohtml program is known as RTFtoweb. This provides a number of additional features, including the creation of hypertext links at user-defined section breaks. Figure 4-11 illustrates a document converted in this way, "Exploring The World-Wide Web Using Mosaic For Windows", which is available at the URL http://www.leeds.ac.uk/ucs/docs/tut50/tut50.html

Figure 4-11 Document Converted Using RTFtoweb.

In Figure 4-11 it should be noted that the document is automatically split into a number of files and that a hypertext table of contents is automatically generated. Chevrons (>> and <<), which can be used to move to the next or previous section, are also generated automatically.
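The kind of navigation described above can be sketched in Python. This is not RTFtoweb's own code or file naming scheme: the function build_pages, the file name pattern and the layout of the generated links are all assumptions made for the illustration.

```python
from html import escape


def build_pages(titles, stem="doc"):
    """Return (table of contents markup, list of chevron lines, one per section).

    File names such as doc_1.html are hypothetical.
    """
    names = [f"{stem}_{i}.html" for i in range(1, len(titles) + 1)]
    items = "\n".join(
        f'<LI><A HREF="{name}">{escape(title)}</A>'
        for name, title in zip(names, titles)
    )
    nav = []
    for i in range(len(names)):
        links = []
        if i > 0:  # every section but the first has a previous section
            links.append(f'<A HREF="{names[i - 1]}">&lt;&lt;</A>')
        if i < len(names) - 1:  # every section but the last has a next section
            links.append(f'<A HREF="{names[i + 1]}">&gt;&gt;</A>')
        nav.append(" ".join(links))
    return f"<UL>\n{items}\n</UL>", nav


toc, nav = build_pages(["About HTML", "HTML Authoring Tools", "Writing Style"])
print(toc)
print(nav[1])
```

Note that the chevrons must be written as &lt;&lt; and &gt;&gt; in the generated markup so that they are not mistaken for tags.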

Further information about RTFtohtml is available at the URL ftp://ftp.cray.com/src/WWWstuff/RTF/rtftohtml_overview.html. The software is available at the URL ftp://ftp.cray.com/src/WWWstuff/RTF/latest/. In the UK it is available at the URL ftp://src.doc.ic.ac.uk/packages/WWW/tools/translators/rtftohtml

RTFtoweb is available at the URL ftp://ftp.rrzn.uni-hannover.de/pub/unix-local/misc/rtftoweb/html/rtftoweb.html

HTML Quality Tools

The HTML specification states that "HTML parsers should be liberal except when verifying code. HTML generators should generate strictly conforming HTML." Put simply this means that browsers should be capable of displaying documents which contain invalid HTML, but HTML authoring tools and document converters should generate HTML which conforms strictly to the standard.
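The difference between the two roles can be illustrated with a small Python sketch. Python's html.parser is liberal in the sense used above: it reports whatever tags it finds without complaint. A verifying tool has to add its own checks on top. The checker below is a deliberately tiny example, not a real validator: the name StrictChecker and the short list of containers it knows about are assumptions, and it checks only that containers are closed in the right order.

```python
from html.parser import HTMLParser

# Containers checked by this sketch; an assumed, deliberately small subset of HTML.
CONTAINERS = {"html", "head", "body", "h1", "em", "a"}


class StrictChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag in CONTAINERS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in CONTAINERS:
            return
        if tag not in self.open_tags:
            self.problems.append(f"unexpected </{tag.upper()}>")
            return
        # Anything opened after this tag was never closed.
        while self.open_tags[-1] != tag:
            self.problems.append(f"missing </{self.open_tags.pop().upper()}>")
        self.open_tags.pop()

    def close(self):
        super().close()
        while self.open_tags:
            self.problems.append(f"missing </{self.open_tags.pop().upper()}>")


def check(markup):
    checker = StrictChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems


print(check("<H1>About <EM>the Web</H1>"))
```

A browser would still display the heading in the last line; the checker reports the missing </EM> so that the author can correct it.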

A number of tools are available for validating HTML documents. Some popular ones are described below.


HoTMetaL

HoTMetaL is an HTML authoring tool and validator. It will provide feedback if it encounters invalid HTML, as illustrated in Figure 4-12.

Figure 4-12 HoTMetaL.

HoTMetaL is available for the X and Microsoft Windows platforms. Two versions of the software are available: a public domain version and a licensed version. HoTMetaL Pro, the licensed version, can be used to import and validate an existing document. The public domain version will give an error and refuse to load a document which contains invalid HTML.

HoTMetaL is available at the URL ftp://src.doc.ic.ac.uk/packages/WWW/Mosaic/html/hotmetal


weblint

A tool called weblint can be used to check HTML documents for invalid markup. This software is available from the URL ftp://ftp.khoros.unm.edu/pub/perl/www/weblint-1.000.tar.gz. In the UK it is available at the URL ftp://src.doc.ic.ac.uk/packages/WWW/tools/weblint


sgmls

sgmls is a tool which can be used to validate SGML documents. It is available at the URL ftp://sgml1.ex.ac.uk/pub/SGML/sgmls/. sgmls is used in a number of HTML validation services, such as those described below. Information on installing sgmls and also psgml (an SGML mode for emacs) is available at the URL http://web.nexor.co.uk/users/mak/doc/html/sgml-lib/html-sgml.html

HTML Validation Service

An HTML validation service is available at the URL http://www.hal.com/%7Econnolly/html-test/service/validation-form.html. This service makes use of HTML forms and a CGI script which runs an HTML validation program. The service can be used to check HTML syntax by entering the HTML markup to be checked. It can also be used to check an existing HTML document by entering the URL of the document.

Figure 4-13 HTML Validation Service.

A variation on this service is available at the URL http://www.cc.gatech.edu/grads/j/Kipp.Jones/HaLidation/validation-form.html

These services make use of the sgmls validation program.

The software can be installed on your local Unix system. It is available at the URL ftp://ftp.hal.com/pub/CGI/check-html.tar.Z

HTML Check Toolkit

The HTML Check Toolkit is another HTML validation program. The software can be installed using a WWW browser. The installation service, illustrated below, is based on the EIT Webmaster's Starter Kit. HTML Check Toolkit is available at the URL http://www.hal.com/~markg/HaLSoft/html-check/

Figure 4-14 Installing The Check_HTML Script.

Review of HTML Tools

Before choosing HTML authoring tools, document converters or quality tools for institutional use, the following issues should be considered:

Support: Who wrote the software - an experienced software developer or a student as part of a computer project? Will the software continue to be developed and supported?

Quality: Does the software produce valid HTML?

Functionality: What facilities does the software provide?

Other Issues: If the software is based on a word processing package, what happens if the word processed document needs to be used by another word processor?

Writing Style

Writing styles for WWW documents are still developing. However there are a number of guidelines which can be provided:

Finding Out More About HTML

This document does not provide an in-depth tutorial on HTML. Many WWW resources are available which give details on writing HTML. Some of these are listed below:

In addition to these documents the following resources are also available.

A review of Microsoft Windows HTML authoring tools is available at the URL http://werple.apana.org.au/~gabriel/html-editors/index.html

A list of HTML tools is available at the URL http://info.cern.ch/hypertext/WWW/Tools/Filters.html

Dan Connolly's HTML Design Notebook is available at the URL http://www.hal.com/%7Econnolly/html-design.html

The HTML specification is available at the URL http://www.hal.com/%7Econnolly/html-spec.html
