2007-07-03T15:12:36Z
Dave Pawson.
Docbook Lessons
I was visiting some relatives recently and was shown their genealogy output, collated in a Windows program. It was about 20 years' worth, roughly 1000 individuals. What gave me pause for thought was the odd individual left floating, with no clear ties to anyone else on the tree. For whatever reason it made me think of RDF, and the querying abilities thereof. With that in mind I sought a Linux product and found GRAMPS. Having checked that it can export XML, I started to look for a way to get the Windows data (Family Tree Maker is the product), via XML, into RDF. My first task was to sketch out some RDF, which I managed with a lot of help from the semweb people on #swig, particularly sbp, for which thanks.
Knowing I can get most XML into RDF, I next looked at Gramps. I joined the mailing list just in time to read an email which contained the following sentiments.
I am Hungarian, and so is about half my family. The rest of them are from one part or another of the English-speaking world. This leaves me with a peculiar dilemma, because Hungarian (as well as some Asian) names actually following a different word order than is traditional elsewhere. Basically, all my Hungarian relatives' names are like:
KISS János
KOVÁCS István
PÉNZVÁLTÓ Eszter Johanna Mária
and so on and so forth (family name is capitalized). This is, in my mind, not just a *display convention*. The name order is a fundamental property of the name. (e.g.: Mao Zedong's family name is Mao [yes, really!], and Zedong is his first name.) For more info: http://en.wikipedia.org/wiki/Name_order#Name_order
The closest I can get to doing things properly at this time is to treat the name order as if it were a display convention, and display all names in the Eastern Name Order in the Hungarian reports of the database, and in Western Name Order for the English reports of the database. I have submitted a feature request for this though...
My bigger problem is, that even by playing around with the name order, I cannot have all my relative's names appear correctly, because of an apparent lack of "functional" tagging of non-family/non-patronymic names. i.e.:
PÉNZVÁLTÓ Eszter Johanna Mária
PÉNZVÁLTÓ <--- family name
Eszter <--- Saint's Name chosen at Catholic Confirmation
Johanna <--- Secondary Given Name
Mária <---- Primary Given Name
... which basically means, that the proper way to "anglicize" or "westernize" the word order would be:
Mária Johanna Eszter PÉNZVÁLTÓ
Basically reversing everything... of course, since given names are lumped together, the closest I can do is:
Eszter Johanna Mária PÉNZVÁLTÓ
Which, I think, gives the impression that the person's "first name" / "primary given name" is Eszter... the only name in her list of given names that isn't even functionally a "first name", but is rather a "middle name" in the religious sense.
It was far more than I knew about name order!
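The "functional tagging" the email asks for can be sketched in a few lines of Python. This is only an illustration of the idea, not anything Gramps does: the role labels (family, confirmation, secondary, primary) and the two ordering tuples are assumptions made up for the example, taken from the way the email describes one name.

```python
from dataclasses import dataclass

# Hypothetical role labels, listed in the order each convention displays them.
EASTERN = ("family", "confirmation", "secondary", "primary")
WESTERN = tuple(reversed(EASTERN))


@dataclass(frozen=True)
class NamePart:
    text: str
    role: str  # one of the hypothetical labels above


def render(parts, order):
    """Join the name parts in the sequence given by `order`."""
    rank = {role: i for i, role in enumerate(order)}
    return " ".join(p.text for p in sorted(parts, key=lambda p: rank[p.role]))


def primary_given(parts):
    """Return the name the person actually goes by."""
    return next(p.text for p in parts if p.role == "primary")


# The example name from the email, with each part tagged by function.
eszter = (
    NamePart("PÉNZVÁLTÓ", "family"),
    NamePart("Eszter", "confirmation"),
    NamePart("Johanna", "secondary"),
    NamePart("Mária", "primary"),
)

print(render(eszter, EASTERN))
print(render(eszter, WESTERN))
print(primary_given(eszter))
```

Once each part carries its role, the display order becomes a per-report choice, and the primary given name can be picked out without guessing from its position.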
Not being shy, I immediately thought of the enormous flexibility I get with docbook customisation, and made such a suggestion on the user mailing list. A French respondent pooh-poohed the idea, complaining about XSLT and memory usage etc. I thought it could be done, and having had the gauntlet thrown down I started to look at it seriously. Firstly from the docbook end.
I've used docbook for a number of years and I'm thoroughly impressed with the quality of the team's work. There are some seriously clever people involved. Bob (seemingly) came from nowhere and demonstrated more than a passing knowledge of the XSLT stylesheets, which are quite complex. Norm started them, after having designed the DSSSL ones, no mean feat in itself. I knew it would be hard. I really had no idea just how hard.
My starting point was to persuade the html stylesheets to work with XSLT 2.0 (why not, I asked myself, possibly rather foolishly) and with another input schema, which was namespaced. Luckily the schema was a piece of cake compared to docbook: seven grouping elements, each holding fairly simple structures, though lots of them. My test piece was just over 2000 people.
The first job was to isolate the required stylesheets, once I'd pulled out the one holding the root template. That came to fourteen from the html directory, then ten from the common directory, plus the lib directory. This was my starting point, without any modifications at all.
Once I'd found the root template (honest, that was no mean feat!) in chunk-code.xsl I was away. Yes, I wanted chunking; I didn't think a single html file would be much use. Hence I needed to grok the chunking schema and its linking mechanism. I started chasing down the calling sequence once I realised it wouldn't be easy. Oh, and my other requirement was to make full use of (hence extend) the l10n aspects of docbook, since Gramps is already l10n aware. The hard part was knowing where to start tweaking and where I could leave well alone. The inter-page linking is robust and works well, so I wanted to keep that, even if it may not eventually be included. The other hard part was adding my elements in place of the docbook ones, starting with the elements I wanted to be chunked. That turned out to be three places in chunk-code.xsl and two places in chunk-common.xsl. I then added my own file, gramps.xsl, in place of docbook.xsl to do the mix of including and importing. Nice one Norm. It is seriously good material for anyone wanting an advanced lesson in XSLT. I then decided, initially, to create a single stylesheet file to do the content processing; I called this gramps.groups.xsl. Original? OK, no. The key lesson here is to avoid applying templates where not needed, since the chunking logic needs to take precedence over mine. When I got this wrong, the reported error was one I'd never seen before. It implied I was writing out to other than the main output stream whilst writing into a variable! Remember that these stylesheets were designed with XSLT 1.0 in mind.
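The general shape of chunking, as opposed to the docbook implementation of it, can be sketched in Python: one function decides where a link to an element should point, and the page writer and the index both go through that same function. The element names, the id attribute and the filename pattern below are invented for illustration and are not taken from the Gramps schema or from chunk-code.xsl.

```python
import html
import xml.etree.ElementTree as ET

# Invented sample input; real Gramps XML is namespaced and far richer.
SAMPLE = """<database>
  <people>
    <person id="I0001"><name>KISS János</name></person>
    <person id="I0002"><name>KOVÁCS István</name></person>
  </people>
</database>"""

CHUNKED = {"person"}  # hypothetical: element names that get their own page


def link_target(elem):
    """Filename that any link to this element should point at."""
    return f"{elem.tag}-{elem.get('id')}.html"


def chunk(root):
    """Map output filename -> page body for every chunked element."""
    pages = {}
    for elem in root.iter():
        if elem.tag in CHUNKED:
            title = html.escape(elem.findtext("name", default=""))
            pages[link_target(elem)] = f"<html><body><h1>{title}</h1></body></html>"
    return pages


def index_page(root):
    """A top-level list whose links are computed by the same link_target."""
    items = [
        f'<li><a href="{link_target(e)}">{html.escape(e.findtext("name", default=""))}</a></li>'
        for e in root.iter("person")
    ]
    return "<ul>" + "".join(items) + "</ul>"


root = ET.fromstring(SAMPLE)
pages = chunk(root)
print(sorted(pages))
print(index_page(root))
```

Keeping the filename rule in one place is what lets links survive a change in how pages are split up.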
Eventually I started to make sense of the linking logic, the key being (IMHO) the href.target and navig.content templates. Take care to distinguish the templates that are called by name from those that are applied to the current context. Once I'd made a start I quickly put together a framework which created the 7 top level (empty) files. I then wanted to create a table of contents, with each entry linked to the individual person's page. I decided to try one of the grouping techniques to present it as a table. Mistake. I also wanted to sort by name. The combination was complex. I spent over a day getting this wrong before I backed out and left it for later.
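For comparison, the two steps that proved awkward to combine (sort by name, then group the sorted entries into table rows) look like this outside XSLT. This is a Python sketch, not a translation of any stylesheet; the three-column width is an arbitrary choice and "KISS Anna" is an invented entry added to show the secondary sort.

```python
from itertools import islice


def rows_of(items, width):
    """Yield consecutive rows of at most `width` items each."""
    it = iter(items)
    while True:
        row = list(islice(it, width))
        if not row:
            return
        yield row


# (family name, given name) pairs; the last one is invented for the example.
people = [
    ("KOVÁCS", "István"),
    ("KISS", "János"),
    ("PÉNZVÁLTÓ", "Mária"),
    ("KISS", "Anna"),
]

# Plain tuple sort: family name first, then given name. This compares by
# code point, so it is not a locale-aware (e.g. Hungarian) collation.
ordered = sorted(people)
table = list(rows_of(ordered, 3))

for row in table:
    print(" | ".join(f"{family} {given}" for family, given in row))
```

The point of the sketch is the ordering of the steps: sort the whole list first, then cut it into rows, so the grouping never has to know about the sort key.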
With this base framework working, I then started the slog: putting together the stylesheets for the new material. This was fairly straightforward, since I'd already learned the linking lessons. I quickly learned about l:gentext('key'), which was my XSLT 2.0 replacement for the 1.0 called template in l10n.xsl.
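The idea behind a gentext-style lookup is a keyed table of generated text per language. A minimal Python sketch follows, assuming a fallback to English when a key or language is missing. That fallback is a design choice for the example, not a claim about how l10n.xsl behaves, and the table contents and the Python gentext function are hypothetical.

```python
# Hypothetical generated-text tables; "born" is deliberately missing from "hu".
GENTEXT = {
    "en": {"person": "Person", "family": "Family", "born": "Born"},
    "hu": {"person": "Személy", "family": "Család"},
}


def gentext(key, lang="en", fallback="en"):
    """Look up generated text for `key`, trying the exact language,
    then its base language (hu-HU -> hu), then the fallback language."""
    for candidate in (lang, lang.split("-")[0], fallback):
        table = GENTEXT.get(candidate, {})
        if key in table:
            return table[key]
    raise KeyError(f"no generated text for {key!r} in {lang!r} or {fallback!r}")


print(gentext("family", "hu"))
print(gentext("born", "hu"))
print(gentext("person", "hu-HU"))
```

Wrapping the lookup in a single function means the content templates only ever mention keys, never literal strings, which is what makes adding a language a data change and not a code change.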
I kept popping questions on the Gramps mailing list, which were promptly answered. I was stopped smartly when I tried to file a bug report at gramps-project.org. The project uses a CAPTCHA for anti-spam. With my accessibility background, I mailed the project lead and put the work on hold. No response as yet.
Summary: Docbook XSLT is scarily complex, an aspect I'm led to believe is quite probably a result of its development over the years and the increasingly complex schema. It is also very competent. It can be adapted to a new schema. If I can do it, I'm sure others can.
The l10n and customisation aspects are in my view key to its success, nearly as much as the team of supporters it has. Thanks people.
Keywords: docbook