Docbook Lessons

2007-07-03T15:12:36Z
Dave Pawson

Adapting Docbook

I was visiting some relatives recently and was shown their genealogy output, collated in a Windows program. It was about 20 years' worth, roughly 1000 individuals. What gave me pause for thought was the odd individual left floating, with no clear ties to anyone else on the tree. For whatever reason it made me think of RDF and its querying abilities. With that in mind I sought a Linux product and found Gramps. Having checked that it can export XML, I started to look for a way to get the Windows data (Family Tree Maker is the product), via XML, into RDF. My first task was to sketch out some RDF, which I managed with a lot of help from the semweb people on #swig, particularly sbp, for which thanks.
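
As a rough illustration of the "floating individual" check, here is a minimal Python sketch. It is not RDF, and it is not how Family Tree Maker or Gramps store their data: the people and families structures and the function name floating_individuals are invented for this example. It simply reports anyone who appears in no family, either as a parent or as a child.

```python
def floating_individuals(people, families):
    """Return the ids of people who are not referenced by any family."""
    linked = set()
    for family in families:
        linked.update(family.get("parents", []))
        linked.update(family.get("children", []))
    # Keep the input order of people so the result is stable.
    return [pid for pid in people if pid not in linked]


# Hypothetical sample data: ids mapped to display names.
people = {"I1": "Ann", "I2": "Bob", "I3": "Cy", "I4": "Dee"}
families = [{"parents": ["I1", "I2"], "children": ["I3"]}]

print(floating_individuals(people, families))
```

In RDF terms the same question would be a query for person resources that take part in no relationship statements, which is the kind of query that prompted the idea.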

Knowing I can get most XML into RDF, my next task was to look at Gramps. I joined the mailing list just in time to read an email which contained the following sentiments.

I am Hungarian, and so is about half my family.  The rest of them are
from one part or another of the English-speaking world.

This leaves me with a peculiar dilemma, because Hungarian (as well as
some Asian) names actually following a different word order than is
traditional elsewhere.

Basically, all my Hungarian relatives' names are like:

KISS János
KOVÁCS István
PÉNZVÁLTÓ Eszter Johanna Mária

and so on and so forth (family name is capitalized).

This is, in my mind, not just a *display convention*.  The name order
is a fundamental property of the name.  (e.g.: Mao Zedong's family
name is Mao [yes, really!], and Zedong is his first name.)  For more
info: http://en.wikipedia.org/wiki/Name_order#Name_order

The closest I can get to doing things properly at this time is to
treat the name order as if it were a display convention, and display
all names in the Eastern Name Order in the Hungarian reports of the
database, and in Western Name Order for the English reports of the
database.

I have submitted a feature request for this though...

My bigger problem is, that even by playing around with the name order,
I cannot have all my relative's names appear correctly, because of an
apparent lack of "functional" tagging of non-family/non-patronymic
names.

i.e.:

PÉNZVÁLTÓ Eszter Johanna Mária

PÉNZVÁLTÓ <--- family name
Eszter <--- Saint's Name chosen at Catholic Confirmation
Johanna <--- Secondary Given Name
Mária <---- Primary Given Name

... which basically means, that the proper way to "anglicize" or
"westernize" the word order would be:

Mária Johanna Eszter PÉNZVÁLTÓ

Basically reversing everything... of course, since given names are
lumped together, the closest I can do is:

Eszter Johanna Mária PÉNZVÁLTÓ

Which, I think, gives the impression that the person's "first name" /
"primary given name" is Eszter... the only name in her list of given
names that isn't even functionally a "first name", but is rather a
"middle name" in the religious sense.

It was far more than I knew about name order!
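
The "functional" tagging the email asks for can be sketched as a small data structure. This is a hypothetical Python illustration, not the Gramps data model: the class Name, the role labels and the two formatting functions are assumptions made for the example. Each given name carries a role, so the Western rendering can put the primary given name first instead of merely moving the family name to the end.

```python
from dataclasses import dataclass, field

# Hypothetical role labels, listed in the order the Western form presents them.
WESTERN_ROLE_ORDER = ["primary", "secondary", "confirmation"]


@dataclass
class Name:
    family: str
    # Given names in their native (stored) order, each tagged with a role.
    given: list = field(default_factory=list)  # list of (text, role) pairs


def eastern(name):
    """Family name first, then the given names in stored order."""
    return " ".join([name.family.upper()] + [text for text, _ in name.given])


def western(name):
    """Given names ordered by role, family name last."""
    ordered = sorted(name.given, key=lambda g: WESTERN_ROLE_ORDER.index(g[1]))
    return " ".join([text for text, _ in ordered] + [name.family.upper()])


eszter = Name("Pénzváltó", [("Eszter", "confirmation"),
                            ("Johanna", "secondary"),
                            ("Mária", "primary")])
print(eastern(eszter))
print(western(eszter))
```

With the roles recorded, the second line printed is the fully reversed form the email describes as correct, rather than the approximation that leaves Eszter looking like the first name.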

Not being shy, I immediately thought of the enormous flexibility I get with Docbook customisation, and made such a suggestion on the user mailing list. A French respondent pooh-poohed the idea, complaining about XSLT, memory usage and so on. I thought it could be done, and having had the gauntlet thrown down I started to look at it seriously, firstly from the Docbook end.

I've used Docbook for a number of years and I'm thoroughly impressed with the quality of the team's work. There are some seriously clever people involved. Bob (seemingly) came from nowhere and demonstrated more than a passing knowledge of the XSLT stylesheets, which are quite complex. Norm started them, after having designed the DSSSL ones, no mean feat in itself. I knew it would be hard. I really had no idea just how hard.

My starting point was to persuade the HTML stylesheets to work with XSLT 2.0 (why not, I asked myself, possibly rather foolishly) and with another input schema, which was namespaced. Luckily the schema was a piece of cake compared to Docbook: seven grouping elements, each holding fairly simple structures, though lots of them. My test piece was just over 2000 people.

The first job was to isolate the required stylesheets, once I'd pulled out the one holding the root template. That came to fourteen from the html directory, then ten from the common directory, plus the lib directory. This was my starting point, before making any modifications at all.

Once I'd found the root template (honest, that was no mean feat!) in chunk-code.xsl I was away. Yes, I wanted chunking; I didn't think a single HTML file would be much use. Hence I needed to grok the chunking scheme and its linking mechanism. I started chasing down the calling sequence once I realised it wouldn't be easy. Oh, and my other requirement was to make full use of (hence extend) the l10n aspects of Docbook, since Gramps is already l10n aware.

The hard part was knowing where to start tweaking and where I could leave well alone. The inter-page linking is robust and works well, so I wanted to keep that, even if it may not eventually be included. The other hard part was adding my elements in place of the Docbook ones, firstly the elements I wanted to be chunked. That turned out to be three places in chunk-code.xsl and two places in chunk-common.xsl. I then added my own file, gramps.xsl, in place of docbook.xsl to do the mix of including and importing. Nice one Norm. It is a seriously good example for anyone wanting an advanced lesson in XSLT.

I then decided, initially, to create a single stylesheet file to do the content processing, which I called gramps.groups.xsl. Original? OK, no. The key lesson here is to avoid applying templates where not needed, since the chunking logic needs to take precedence over mine. The reported error was one I'd never seen before. It implied I was writing to something other than the main output stream whilst writing into a variable! Remember that these stylesheets were designed with XSLT 1.0 in mind.
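
To make the chunking idea concrete, here is a minimal Python sketch of what chunked output has to do: give every chunked element its own file name and work out the previous and next links between pages. It is only an analogy for the stylesheets' behaviour, not a port of chunk-code.xsl. The function names chunk_filename and navigation, the ".html" naming rule and the sample ids are all assumptions for this example.

```python
def chunk_filename(element_id):
    """Map an element id to the file its chunk is written to (assumed rule)."""
    return f"{element_id}.html"


def navigation(chunk_ids):
    """For each chunk, compute the files of its previous and next neighbours."""
    nav = {}
    for i, cid in enumerate(chunk_ids):
        prev_id = chunk_ids[i - 1] if i > 0 else None
        next_id = chunk_ids[i + 1] if i + 1 < len(chunk_ids) else None
        nav[chunk_filename(cid)] = {
            "prev": chunk_filename(prev_id) if prev_id else None,
            "next": chunk_filename(next_id) if next_id else None,
        }
    return nav


# Hypothetical ids for three person pages.
for page, links in navigation(["I0001", "I0002", "I0003"]).items():
    print(page, links)
```

The point of the sketch is that the link computation needs to know the full set of chunks before any page is written, which is why the chunking logic has to stay in control of the output.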

Eventually I started to make sense of the linking logic, the key being (IMHO) the href.target and navig.content templates. Take care to distinguish the templates which are called by name from those which are invoked by applying templates to the current context. Once I'd made a start I quickly put together a framework which created the seven top-level (empty) files. I then wanted to create a table of contents, with each entry linked to the individual person's page. I decided to try one of the grouping techniques to present it as a table. Mistake. I also wanted to sort by name. The combination was complex. I spent over a day getting this wrong before I backed out and left it for later.
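
The combination of sorting by name and grouping entries into table rows is easier to see outside XSLT. The following Python sketch sorts first and then groups positionally. The function names toc_rows and render, the (surname, given name, file) tuples and the three-column width are assumptions for illustration, and it says nothing about how a stylesheet version would eventually be written.

```python
import html


def toc_rows(people, columns=3):
    """Sort people by (surname, given name), then split them into table rows."""
    ordered = sorted(people, key=lambda p: (p[0].casefold(), p[1].casefold()))
    # Positional grouping: every `columns` consecutive entries form one row.
    return [ordered[i:i + columns] for i in range(0, len(ordered), columns)]


def render(rows):
    """Render the rows as a minimal HTML table with one link per cell."""
    lines = ["<table>"]
    for row in rows:
        cells = "".join(
            f'<td><a href="{html.escape(href)}">'
            f"{html.escape(surname)}, {html.escape(given)}</a></td>"
            for surname, given, href in row
        )
        lines.append(f"<tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Hypothetical sample entries, deliberately out of order.
people = [
    ("Kovacs", "Istvan", "I0002.html"),
    ("Kiss", "Janos", "I0001.html"),
    ("Smith", "Ann", "I0004.html"),
    ("Penzvalto", "Maria", "I0003.html"),
]
print(render(toc_rows(people)))
```

Doing the sort as a separate first pass keeps the row grouping purely positional, which is the part that tends to get tangled when both are attempted in one step.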

With this base framework working, I then started the slog of putting the stylesheets for the new material together. This was fairly straightforward, since I'd already learned the linking lessons. I quickly learned about l:gentext('key'), which was my XSLT 2.0 replacement for the 1.0 called template in l10n.xsl.
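
The idea behind a keyed gentext lookup can be sketched in Python. This is not the Docbook l10n implementation and not the l:gentext function itself: the GENTEXT table, its keys and translations, and the fall-back-to-English rule are assumptions made for the example.

```python
# Hypothetical generated-text table: language code -> key -> text.
GENTEXT = {
    "en": {"birth": "Birth", "death": "Death", "family": "Family"},
    "fr": {"birth": "Naissance", "death": "Décès"},
}


def gentext(key, lang="en", default_lang="en"):
    """Look up generated text for `key`, falling back to the default language."""
    for code in (lang, default_lang):
        text = GENTEXT.get(code, {}).get(key)
        if text is not None:
            return text
    raise KeyError(f"no gentext for key {key!r}")


print(gentext("birth", "fr"))   # found in the French table
print(gentext("family", "fr"))  # missing in French, falls back to English
```

Keeping every label behind a key like this is what lets the same content templates serve each language the data is already localised for.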

I kept popping questions on the Gramps mailing list, which were promptly answered. I was stopped smartly when I tried to file a bug report at gramps-project.org. The project is using a CAPTCHA for anti-spam. With my accessibility background, I mailed the project lead and put the work on hold. No response as yet.

Summary: the Docbook XSLT is scarily complex, an aspect I'm led to believe is quite probably a result of its development over the years and the increasingly complex schema. It is also very competent. It can be adapted to a new schema. If I can do it, I'm sure others can.

The l10n and customisation aspects are in my view key to its success, nearly as much as the team of supporters it has. Thanks people.

Keywords: docbook
