3. Content creation and modification

Table of Contents

3.1. Creating a new Website
3.1.1. A naming convention for files
3.1.2. Site navigation, the layout file
3.2. Adding a new webpage to an existing site
3.2.1. Adding an xml file to the Website, using the layout.xml file
3.2.2. Adding an external non-xml file to a Website
3.3. Deleting a webpage from an existing site
3.4. Moving a webpage within an existing site

3.1. Creating a new Website

What you will need.

  • A plan of what you want your site to look like

  • A naming convention for files

  • A copy of the Docbook DTD and stylesheets

  • A copy of the Website DTD and stylesheets

  • A build script and a layout file

Take these one at a time. First, the view of how you want your Website to look: a logical map of the webpages you want and how the parts of the Website link together. The structure could be as simple as the list below:


   index.html
    prepare.html
         ingredients.html
         teapot.html
         prep.kettle.html
    procedure.html
         proc.kettle.html
         brew.html
         serve.html


These html files are derived from the following xml source files, listed in the same order:

maketea.xml 
prep.xml 
ing.xml  
teapot.xml    
prep.k.xml    
proc.xml      
proc.k.xml    
brew.xml      
pour.xml      


The indentation in the first list shows the structure I want in the html output. The home page for the site is to be called index.html, with two major sections below it, each containing a number of files. The first section deals with preparing to make a cup of tea, the second with the procedure for making it. All very complicated! The second list shows the xml source files, with names suitably abbreviated since readers won't see them. This structure can be extended in two ways: further sections can be added in the same way as prepare.html and procedure.html, and depth can be created by nesting sections further. For example, brew.xml could be made into a section containing its own files. So although this is a simple example, there's nothing here that can't be extended further. I'll describe such an example in Section 3.2.1.

3.1.1. A naming convention for files

The file names of the html files in the above list are important if you intend readers to navigate directly to an individual page. Long names with underscores and other 'decorations' are a real nuisance when typing in a web address (URI), so it is worthwhile keeping names reasonably short. Further restrictions arise from their use as part of a URI. Case-sensitivity problems are avoided if you always use lower-case letters. Some characters are invalid and others need escaping, so again, avoid these problems by using only the characters a-z and 0-9, always starting with a letter. If you wish to group files, give them a shared prefix as in the example above. A hyphen [ - ] or period [ . ] makes a suitable joining character, producing filenames such as proc.k.xml for the kettle procedure and prep.k.xml for the kettle preparation. The details are less important than sticking to a convention that you can remember! It will become apparent later how a repeatable naming convention helps with site maintenance. The html files should keep clear, brief names so that readers can remember the URI for a page.
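
A convention like this is easy to check mechanically. The following is a minimal Python sketch under one strict reading of it (an assumption, not a Website rule): lower-case a-z and 0-9, starting with a letter, with a period or hyphen as the only joining character, and ending in .xml or .html. The pattern NAME_RE and the function follows_convention are illustrative names only. Under this reading an underscore, as in overall_index.html, is rejected.

```python
import re

# Assumed convention: start with a lower-case letter, then a-z/0-9; '.' or '-'
# join name parts (each part non-empty); the name ends in .xml or .html.
NAME_RE = re.compile(r"[a-z][a-z0-9]*(?:[.-][a-z0-9]+)*\.(?:xml|html)")


def follows_convention(filename: str) -> bool:
    """Return True if filename fits the assumed naming convention."""
    return NAME_RE.fullmatch(filename) is not None


if __name__ == "__main__":
    for name in ["prep.k.xml", "proc.kettle.html", "Teapot.html",
                 "overall index.html", "2cups.xml"]:
        print(f"{name:20} {follows_convention(name)}")
```

Running a check like this over a source directory before a build is one cheap way to keep to whatever convention you settle on.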

Alongside the html files I have listed the xml file from which each is derived.

For example, teapot.xml is transformed into teapot.html.

The xml files need not have the same names as the html files. Indeed, note that this example contains two files which could both be named kettle.xml, one dealing with preparation and one with procedure. You need to be aware of this if all the xml files are to reside in the same directory. I've used a convention to avoid the clash, prefixing each name with its section, so I have prep.k.xml and proc.k.xml.

All that is needed is a filename convention that you can remember. If you are from a Windows background, you may find it easier to restrict filenames to lower case with no whitespace. For example, rather than overall index.html, join or merge the words to remove the space: oaindex.html or overall_index.html.

There is, however, a good reason to keep the xml filenames and the html filenames related, if not identical. When you are maintaining the site and wish to change or move a file, you need the xml filename! If you have to look it up (in the layout file, of which more later), the job becomes just that little bit more laborious. If you can simply view the Website (the html files) and derive the xml name from that, the search is shorter. As with many things, it is a trade-off: the same names for the xml and html files, or more readable html names? The choice is yours.
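
If you do end up looking names up, a few lines of script can do it for you. Below is a minimal Python sketch that assumes the layout file format described in Section 3.1.2 (toc and tocentry elements with page and filename attributes) and embeds that section's example layout; the function xml_source_for is a hypothetical helper, not part of Website.

```python
import xml.etree.ElementTree as ET

# The example layout file from Section 3.1.2.
LAYOUT = """<layout>
  <toc page="maketea.xml" filename="index.html">
    <tocentry dir="prepare" page="prep.xml" filename="prepare.html">
      <tocentry page="ing.xml" filename="ingredients.html"/>
      <tocentry page="teapot.xml" filename="teapot.html"/>
      <tocentry page="prep.k.xml" filename="prep.kettle.html"/>
    </tocentry>
    <tocentry page="proc.xml" dir="procedure" filename="procedure.html">
      <tocentry page="proc.k.xml" filename="proc.kettle.html"/>
      <tocentry page="brew.xml" filename="brew.html"/>
      <tocentry page="pour.xml" filename="serve.html"/>
    </tocentry>
  </toc>
</layout>"""


def xml_source_for(layout_xml: str, html_name: str) -> list[str]:
    """Return the page (xml source) of every entry whose output is html_name.

    A list is returned because the same html name may legitimately appear
    in several directories, e.g. an index.html in each section.
    """
    root = ET.fromstring(layout_xml)
    return [el.get("page") for el in root.iter()
            if el.tag in ("toc", "tocentry") and el.get("filename") == html_name]


if __name__ == "__main__":
    print(xml_source_for(LAYOUT, "serve.html"))  # ['pour.xml']
```

Returning a list rather than a single name is a deliberate choice: it makes an ambiguous html name visible instead of silently picking one entry.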

Just be aware that you may overwrite files by being careless with filenaming!
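
One way to catch careless naming before it costs you a file is to scan the layout file for clashes. This Python sketch assumes that an entry's output path is its filename inside the directories named by the dir attributes of its enclosing entries (the behaviour described in Section 3.1.2), and reports source pages used more than once and output paths produced more than once. The layout below is a hypothetical one containing two deliberate mistakes, and find_clashes is an illustrative name.

```python
import posixpath
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical layout with two mistakes: prep.k.xml is listed twice, and two
# entries in the procedure section are both given filename="brew.html".
LAYOUT = """<layout>
  <toc page="maketea.xml" filename="index.html">
    <tocentry dir="prepare" page="prep.xml" filename="prepare.html">
      <tocentry page="prep.k.xml" filename="kettle.html"/>
    </tocentry>
    <tocentry dir="procedure" page="proc.xml" filename="procedure.html">
      <tocentry page="prep.k.xml" filename="kettle.html"/>
      <tocentry page="brew.xml" filename="brew.html"/>
      <tocentry page="pour.xml" filename="brew.html"/>
    </tocentry>
  </toc>
</layout>"""


def find_clashes(layout_xml: str) -> tuple[list[str], list[str]]:
    """Return (pages used more than once, output paths produced more than once)."""
    root = ET.fromstring(layout_xml)
    pages: list[str] = []
    outputs: list[str] = []

    def walk(elem: ET.Element, parent_dir: str) -> None:
        # Assumption: a dir attribute opens a sub-directory for this entry
        # and for everything nested inside it.
        subdir = elem.get("dir")
        here = posixpath.join(parent_dir, subdir) if subdir else parent_dir
        pages.append(elem.get("page"))
        outputs.append(posixpath.join(here, elem.get("filename")))
        for child in elem.findall("tocentry"):
            walk(child, here)

    walk(root.find("toc"), "")
    dup_pages = sorted(p for p, n in Counter(pages).items() if n > 1)
    dup_outputs = sorted(o for o, n in Counter(outputs).items() if n > 1)
    return dup_pages, dup_outputs


if __name__ == "__main__":
    print(find_clashes(LAYOUT))  # (['prep.k.xml'], ['procedure/brew.html'])
```

Note that prepare/kettle.html and procedure/kettle.html are not reported: under the assumed rule they land in different directories.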

That addresses the first and second items. Until you run into problems with filenames you won't appreciate how easy they are to avoid! It just takes a little forethought.

3.1.2. Site navigation, the layout file

One reason I listed the xml files alongside the html files above is that processing the xml into html requires this map to be available. The example file shown below is the layout file needed to process this simple example. Its purpose is to describe the site layout so that links between files can be created, the output file names can be determined, and a certain amount of control is also available. There is another example linked from Section 9.

First, the layout file. It is an XML file, valid against the layout DTD, which is part of the Website distribution that you downloaded (Website 2.6.0 at the time of writing). If your editor can make use of it, point it at the DTD or RELAX NG schema.

Example 1. An example layout file


<layout>
  <toc page="maketea.xml" filename="index.html">                          <!-- 1 -->
    <tocentry dir="prepare" page="prep.xml" filename="prepare.html">      <!-- 2 -->
      <tocentry page="ing.xml" filename="ingredients.html"/>
      <tocentry page="teapot.xml" filename="teapot.html"/>
      <tocentry page="prep.k.xml" filename="prep.kettle.html"/>           <!-- 3 -->
    </tocentry>
    <tocentry page="proc.xml" dir="procedure" filename="procedure.html">  <!-- 4 -->
      <tocentry page="proc.k.xml" filename="proc.kettle.html"/>
      <tocentry page="brew.xml" filename="brew.html"/>
      <tocentry page="pour.xml" filename="serve.html"/>                   <!-- 5 -->
    </tocentry>
  </toc>
</layout>

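1  The root of the Website, generated from maketea.xml as index.html

2  The first section, created in directory 'prepare', generated from prep.xml as prepare.html

3  A file in the first section (note: no children and no 'dir' attribute)

4  The second section, created in directory 'procedure', generated from proc.xml as procedure.html

5  The last file in the second section; pour.xml becomes serve.html, so source and output names need not match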


Although it does not hold everything it could, this provides the basic information the layout engine needs to generate the Website. The nesting of the XML elements, shown by the indentation, provides the 'structure' used to create the directories for the output html.
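
To make the effect of the nesting concrete, here is a minimal Python sketch that computes where each page of the example would end up. It assumes, as described for the prepare entry below, that an entry with a dir attribute, together with everything nested inside it, is written into that directory. It models the behaviour described in this section; it is not taken from the Website stylesheets, and output_paths is a hypothetical name.

```python
import posixpath
import xml.etree.ElementTree as ET

# The example layout file above, without the callout comments.
LAYOUT = """<layout>
  <toc page="maketea.xml" filename="index.html">
    <tocentry dir="prepare" page="prep.xml" filename="prepare.html">
      <tocentry page="ing.xml" filename="ingredients.html"/>
      <tocentry page="teapot.xml" filename="teapot.html"/>
      <tocentry page="prep.k.xml" filename="prep.kettle.html"/>
    </tocentry>
    <tocentry page="proc.xml" dir="procedure" filename="procedure.html">
      <tocentry page="proc.k.xml" filename="proc.kettle.html"/>
      <tocentry page="brew.xml" filename="brew.html"/>
      <tocentry page="pour.xml" filename="serve.html"/>
    </tocentry>
  </toc>
</layout>"""


def output_paths(layout_xml: str) -> dict[str, str]:
    """Map each source page to the output path this sketch derives for it."""
    root = ET.fromstring(layout_xml)
    paths: dict[str, str] = {}

    def walk(elem: ET.Element, parent_dir: str) -> None:
        # Assumption: a dir attribute creates a sub-directory (relative to the
        # parent's directory) holding this entry and all entries nested in it.
        subdir = elem.get("dir")
        here = posixpath.join(parent_dir, subdir) if subdir else parent_dir
        paths[elem.get("page")] = posixpath.join(here, elem.get("filename"))
        for child in elem.findall("tocentry"):
            walk(child, here)

    walk(root.find("toc"), "")
    return paths


if __name__ == "__main__":
    for page, path in output_paths(LAYOUT).items():
        print(f"{page:12} -> {path}")
```

For this layout the sketch places maketea.xml at index.html in the site root, prep.k.xml at prepare/prep.kettle.html and pour.xml at procedure/serve.html.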

Take one entry as an example:

<tocentry dir="prepare" page="prep.xml" filename="prepare.html">   

This entry says that the XML file prep.xml creates the html file prepare.html. Note the dir attribute, and note that this tocentry element has other elements nested within it. This indicates that it is a section header, i.e. the start of a deeper layer of the Website. A directory called prepare will be created below the root of the Website, and within that directory the first file will be prepare.html.

Note

If a user navigates to the site root and then to the prepare directory without specifying a filename within it, you're in trouble: depending on how the web server is configured, the user will see either a bare directory listing or an error. To avoid this problem, always name the 'top' file in a directory index.html. Most web servers are configured to serve index.html when a directory is requested without an explicit filename. That would mean changing the layout file entry to

  <tocentry dir="prepare" page="prep.xml" filename="index.html"> 

It is your choice whether to cater for readers who are used to this shortcut and don't type in the full path to a file. The example keeps filename="prepare.html" so that you can see what happens.
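
If you do decide to cater for the shortcut, a quick check of the layout file can tell you which sections still lack an index.html. A minimal Python sketch, assuming a 'section' is any tocentry carrying a dir attribute; sections_without_index is a hypothetical helper, and it reports the dir attribute rather than the full directory path.

```python
import xml.etree.ElementTree as ET

# The example layout file from this section.
LAYOUT = """<layout>
  <toc page="maketea.xml" filename="index.html">
    <tocentry dir="prepare" page="prep.xml" filename="prepare.html">
      <tocentry page="ing.xml" filename="ingredients.html"/>
      <tocentry page="teapot.xml" filename="teapot.html"/>
      <tocentry page="prep.k.xml" filename="prep.kettle.html"/>
    </tocentry>
    <tocentry page="proc.xml" dir="procedure" filename="procedure.html">
      <tocentry page="proc.k.xml" filename="proc.kettle.html"/>
      <tocentry page="brew.xml" filename="brew.html"/>
      <tocentry page="pour.xml" filename="serve.html"/>
    </tocentry>
  </toc>
</layout>"""


def sections_without_index(layout_xml: str) -> list[str]:
    """List the dir of every section entry whose top file is not index.html.

    Assumption: a section is any tocentry that carries a dir attribute.
    """
    root = ET.fromstring(layout_xml)
    return [el.get("dir") for el in root.iter("tocentry")
            if el.get("dir") and el.get("filename") != "index.html"]


if __name__ == "__main__":
    print(sections_without_index(LAYOUT))  # ['prepare', 'procedure']
    fixed = LAYOUT.replace('filename="prepare.html"', 'filename="index.html"')
    print(sections_without_index(fixed))   # ['procedure']
```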

Another feature that you might wish to know about: suppose proc.xml is a 'keeper' file, i.e. there is nothing to say in the section head, but the file is required to keep the structure regular. The user may just as well carry on to the first file in that section (proc.kettle.html). This can be achieved by setting the tocskip attribute on that entry to 1. For example:

   <tocentry page="proc.xml" dir="procedure" filename="procedure.html" tocskip="1">

This has the desired effect. When a user selects this from a table of contents, the navigation system directs the user straight past the file procedure.html and to the next one (proc.kettle.html). The XML file proc.xml must exist (but it can be empty of any real content).
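
The effect of tocskip on a table-of-contents link can be modelled in a few lines. This Python sketch assumes that a skipped entry's link resolves to its first nested tocentry, repeating if that entry is skipped too; it illustrates the behaviour described above and is not taken from the Website stylesheets. link_target is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# The example layout, with tocskip="1" added to the proc.xml entry.
LAYOUT = """<layout>
  <toc page="maketea.xml" filename="index.html">
    <tocentry dir="prepare" page="prep.xml" filename="prepare.html">
      <tocentry page="ing.xml" filename="ingredients.html"/>
      <tocentry page="teapot.xml" filename="teapot.html"/>
      <tocentry page="prep.k.xml" filename="prep.kettle.html"/>
    </tocentry>
    <tocentry page="proc.xml" dir="procedure" filename="procedure.html" tocskip="1">
      <tocentry page="proc.k.xml" filename="proc.kettle.html"/>
      <tocentry page="brew.xml" filename="brew.html"/>
      <tocentry page="pour.xml" filename="serve.html"/>
    </tocentry>
  </toc>
</layout>"""


def link_target(layout_xml: str, page: str) -> str:
    """Return the filename a table-of-contents link to page would lead to."""
    root = ET.fromstring(layout_xml)
    entry = next((el for el in root.iter() if el.get("page") == page), None)
    if entry is None:
        raise KeyError(page)
    # Assumption: tocskip="1" hands the link on to the first nested tocentry.
    while entry.get("tocskip") == "1":
        first_child = entry.find("tocentry")
        if first_child is None:
            break  # nothing nested to skip to, so stay on this entry
        entry = first_child
    return entry.get("filename")


if __name__ == "__main__":
    print(link_target(LAYOUT, "proc.xml"))  # proc.kettle.html
    print(link_target(LAYOUT, "prep.xml"))  # prepare.html
```

Stopping at an entry with no children keeps a misplaced tocskip from producing a dead link in this model.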

Once all the xml files are written, all that remains is to build the site; see Section 4 for that. Some customization is available, though it is not necessary for a basic Website. An introduction to Website customization is provided in Section 7.

That completes the creation of a simple Website. With that basic outline in place, files can be added, removed or changed. For that, see Section 3.2.1 later in this document.