XSLT and Omnimark
Does anyone have any feedback on when and where they would recommend using XSLT instead of OmniMark or any other text-processing language?
Things which make XSLT better than Omnimark:
However, I still use OmniMark for some things, such as:
XSLT does not allow you to output anything to the document prolog aside from the doctype statement.
In OmniMark you have to be careful to avoid conflicting rules, and in large conversion scripts this makes the code incredibly cumbersome.
Noting: I would certainly hesitate to offer a hostage to fortune by suggesting that there are jobs which only one of the two languages could cope with. You can almost certainly, with greater or less elegance, solve any problem with both. Whether the solutions are always *sensible* is another matter. I wrote a KWIC concordance generator in XSLT. Possible, but was it wise?
Rick Geimer puts a size perspective on the comparison:
When file sizes vary greatly, so does the performance. I suggest finding the average size, doing some benchmarking between XSLT and OmniMark, and picking a cutoff point where you are more comfortable with one over the other. If your sizes range from 20 KB to 3 MB, then I bet your average size is somewhere around a few hundred KB, which is a gray area as far as I am concerned. I'm sure if all your files were 3 MB, XSLT would be bogging you down by now.
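The benchmarking advice above can be sketched in a few lines of Python. This is only an illustration of the idea, not anything Rick posted: `time_call` and `find_cutoff` are hypothetical helper names, the two converters are assumed to be wrapped as Python callables (for example, thin wrappers that shell out to each tool), and the sample timings are made-up numbers, not measurements.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def time_call(convert: Callable[[bytes], object], data: bytes, repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds over several runs of convert(data)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        convert(data)
        best = min(best, time.perf_counter() - start)
    return best


def find_cutoff(samples: Iterable[Tuple[int, float, float]]) -> Optional[int]:
    """Each sample is (size_bytes, tree_seconds, stream_seconds).

    Return the smallest size from which the streaming tool is faster at that
    size and at every larger sampled size, or None if it never stays ahead.
    """
    cutoff = None
    for size, tree_s, stream_s in sorted(samples):
        if stream_s < tree_s:
            if cutoff is None:
                cutoff = size
        else:
            cutoff = None  # the tree tool won again, so restart the search
    return cutoff


# Made-up timings for illustration only (NOT real benchmark results).
samples = [
    (20_000, 0.02, 0.05),
    (300_000, 0.40, 0.30),
    (3_000_000, 9.0, 1.5),
]
cutoff = find_cutoff(samples)  # 300000 for the made-up numbers above
```

In practice you would fill `samples` by running `time_call` on a representative set of your own files, then route documents above the cutoff to one tool and the rest to the other.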
Also, there are many ways to extend OmniMark besides system calls. You can use the C/C++ API, or web services. If I need to call Java from OmniMark, I write a web service servlet that talks XML-RPC or SOAP, then use omhttp to make the call. This creates a very extensible and scalable solution that I can spread over multiple servers if necessary. Heck, I can even call an XSLT processor to do the easy stuff, then use OmniMark for the more complicated cleanup. In short, I don't think the two are mutually exclusive.
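For readers who have not seen XML-RPC on the wire, here is a small Python sketch of the kind of request body such a call would POST to the servlet. It uses only the standard `xmlrpc.client` module. The method name `cleanup.transform` and its single string argument are assumptions made up for the example; the post does not describe the servlet's actual interface, and the OmniMark side (omhttp) is not shown.

```python
import xmlrpc.client

# Hypothetical method name and argument; the real servlet API is not given in the post.
fragment = "<para>raw text</para>"
request_body = xmlrpc.client.dumps((fragment,), methodname="cleanup.transform")

# The receiving side can decode the same payload back into arguments and method name.
params, method = xmlrpc.client.loads(request_body)
```

Because the payload is plain XML over HTTP, any language that can build a string and make an HTTP POST can act as the caller, which is what makes this a workable bridge between OmniMark and Java.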
XSLT vs Omnimark
I use both, but I tend to stick with OmniMark for anything complex. Here are some pros and cons as I see them:

OmniMark Pros:
* Built-in regular expression language
* Support for DTDs
* Complete control over the output
* Support for SGML as well as XML
* Fast

OmniMark Cons:
* Proprietary language
* Ignores some well-formedness errors in XML that are legal in SGML
* Will probably never be implementable on the client side (i.e. in a browser)
* Syntax can be a little confusing to newcomers

XSLT Pros:
* Random access to the entire tree
* Non-proprietary language with many evolving implementations
* Implementations are appearing on both the client and server side

XSLT Cons:
* No regular expressions
* Current implementations tend to be a little slow (this is improving, though)
* No support for DTDs
* Variables that don't vary - (I know this is by design, but it is a pain sometimes)
* Syntax can be a little confusing to newcomers

Basically, I like XSLT for the most part. It allows me to do 95% of what I need to do fairly easily, but trying to accomplish that last 5% of a complex job is a real pain, or in some cases virtually impossible. This is probably because the focus of XSLT doesn't meet my needs. XSLT is a tree transformation tool, and it is very good at what it does, but if you need to do more than just move nodes around, you would be better off looking elsewhere at this time. See http://www.omnimark.com

James Robertson adds:
* XSLT is becoming pretty common, so many people understand it.
* OmniMark is much more powerful, and extensible.
* OmniMark has regular expressions, which are vital for almost all real-world work. It also has much cleaner handling of multiple files, data structures, etc.
* Both have strange, bizarre syntaxes.
* Both are free.
* XSLT has better support for XML (OmniMark is primarily an SGML tool). OmniMark is improving in this area, though.
* OmniMark primarily works on valid documents (i.e. the ones with DTDs). XSLT works well on well-formed documents as well as valid ones.
* Both can be extended using external functions, in a variety of languages.
* OmniMark is streaming, and very fast. It doesn't require 40 MB of RAM for a 50 KB document (see earlier message re: XSLT).
* OmniMark can easily handle 100+ MB documents without requiring unreasonable amounts of RAM.

And simple user requirements get steadily more complex as time goes on, so I want a tool that has plenty of power, and few limitations. I would recommend trying both. Your biggest problem is that both tools have a steep learning curve.

Ken Holman adds:

There is only access to the current element and its ancestry (all currently open elements) and no access to other constructs of the source; thus, the programmer must accommodate forward referencing (your term "look-ahead pull"). It is OmniMark's responsibility (not the programmer's) to emit the final file with all the programmer-resolved referent values (it is an error if a referent's value is not defined by the programmer). While some term this "two-pass", I've heard "two-pass" reserved for when it is the programmer's responsibility to satisfy the second pass, which is not true in this case. The programmer only sees the result data once; the programmer only sees the source data once; OmniMark sees the result data the second time when filling in the placeholders and is *very* efficient doing so entirely behind the scenes without program intervention, thus I find the term "one-and-a-half-pass" quite apropos.

The streaming nature of OmniMark is great for some problems and there is no overhead for the source document (it is not maintained in memory), only for the result document (and the intermediate result is on disk, not memory; I think referent values are in memory, but I'm not sure and it doesn't affect me as a programmer).
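The "one-and-a-half-pass" referent idea described above can be illustrated with a toy Python model. This is not OmniMark code and not how OmniMark is implemented: `ReferentOutput` and its method names are invented for the sketch, and it buffers the intermediate result in memory, whereas Ken notes that OmniMark keeps it on disk. It only shows the division of labour: the program writes output once, leaving named placeholders, and the runtime fills them in at the end.

```python
class ReferentOutput:
    """Toy model of an output stream with late-resolved placeholders (referents)."""

    def __init__(self):
        self._parts = []    # ("text", literal) or ("ref", referent_name), in output order
        self._values = {}   # referent name -> value supplied by the program

    def write(self, text):
        self._parts.append(("text", text))

    def write_referent(self, name):
        # Emit a placeholder now; its value may not be known until later in the source.
        self._parts.append(("ref", name))

    def set_referent(self, name, value):
        self._values[name] = value

    def finish(self):
        # The "half pass": the runtime, not the program, walks the result once more.
        out = []
        for kind, payload in self._parts:
            if kind == "ref":
                if payload not in self._values:
                    raise KeyError(f"referent {payload!r} was never given a value")
                out.append(self._values[payload])
            else:
                out.append(payload)
        return "".join(out)


out = ReferentOutput()
out.write("See section ")
out.write_referent("sec-title")             # title not seen yet in the source
out.write(".")
out.set_referent("sec-title", "Streaming")  # resolved when the title is finally reached
result = out.finish()                       # "See section Streaming."
```

The program never revisits the source or the output; leaving a referent unresolved is an error, as in the description above.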
The tree nature of XSLT is great for some problems and, being result oriented, has no overhead for the result (it can be instantly serialized), but does for the source (the entire file has to be accessible at all times; currently this is in memory for the processors I'm aware of).

Two different approaches for transformation ... one isn't necessarily better or worse than the other in the general case or language definition, just different to the extent that a direct comparison of the two is difficult. I use both and I choose which one based on the requirement, the customer, the nature of the data, and the nature of the transformation.
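The memory trade-off between the two approaches can be seen in miniature with Python's standard `xml.etree.ElementTree` module. This is an analogy only, involving neither OmniMark nor an XSLT processor: `count_tree` and `count_streaming` are names made up for the sketch. The first builds the whole source tree before answering; the second looks at each element as it closes and then discards its content.

```python
import io
import xml.etree.ElementTree as ET


def count_tree(xml_bytes, tag):
    """Tree style: parse the entire document into memory, then query it freely."""
    root = ET.fromstring(xml_bytes)
    return sum(1 for _ in root.iter(tag))


def count_streaming(xml_bytes, tag):
    """Streaming style: handle each element when it ends, then drop its content."""
    count = 0
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag == tag:
            count += 1
        elem.clear()  # release text and children we no longer need
    return count


doc = b"<doc><para>a</para><sec><para>b</para></sec></doc>"
tree_count = count_tree(doc, "para")
stream_count = count_streaming(doc, "para")
```

Both functions give the same answer here. The difference only shows up with large inputs or with transformations that need to look at parts of the document that have already gone by, which is where the tree model earns its memory cost.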