Remove duplicate nodes
select="reference[not(@name=following::reference/@name)]" returns a list of reference elements in which no two have the same name. Where a name is repeated, only the last element with that name is kept.
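As a rough illustration of where that expression might sit, here is a minimal stylesheet sketch. The wrapper element name references and the text output are assumptions made for the example; the original post does not show the input document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- "references" is an assumed wrapper element, not taken from the original post -->
  <xsl:template match="references">
    <!-- keep a reference only if no later reference has the same @name -->
    <xsl:for-each select="reference[not(@name = following::reference/@name)]">
      <xsl:value-of select="@name"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Each distinct name is printed once, in the document order of its last occurrence.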
Remove duplicates from a list
Q: Expansion. A part of my xml looks like this:

<location>
  <state>xxxx</state>
</location>
<location>
  <state>yyyy</state>
</location>
<location>
  <state>xxxx</state>
</location>

The desired output is:

xxxx
yyyy

That is, duplicate values of state should not be printed. Can this be done?

<xsl:variable name="unique-list" select="//state[not(.=following::state)]" />
<xsl:for-each select="$unique-list">
  <xsl:value-of select="." />
</xsl:for-each>
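Note that testing against following:: keeps the last occurrence of each value, so for the sample input the values come out in the order yyyy, xxxx, and with no separator between them. If the first-occurrence order shown in the question matters, one possible variant tests against preceding:: instead. The sketch below assumes the location elements sit under a single root element and that plain text output with one value per line is wanted; neither is stated in the question.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- keep a state only if no earlier state has the same string value -->
    <xsl:for-each select="//state[not(. = preceding::state)]">
      <xsl:value-of select="."/>
      <!-- newline separator: an assumption about the wanted layout -->
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the three location elements in the question this prints xxxx and then yyyy, each on its own line.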
Testing for duplicate id values on two elements.
If there is an element that has the same @id attribute as the current element, then:
@id = (preceding::*/@id | following::*/@id)
would be true. You can get elements that have repeated ids with:
//*[@id = (preceding::*/@id | following::*/@id)]
Of course if you test the above expression as a boolean, then it will return true if there are any repeated ids in the document.
You can list the IDs that are repeated (once per ID) with:
//*[not(preceding::*/@id = @id) and following::*/@id = @id]
The preceding:: and following:: axes are, of course, horribly inefficient. You could use a recursive solution instead, or you could use keys:
<xsl:key name="ids" match="*" use="@id" />
If, for an element, the 'ids' key returns more than one node, then it's a repeated id, so you can test whether there are any repeats with:
//*[key('ids', @id)[2]]
[Aside: I've started using positional predicates to test whether a node set has more than a certain number of nodes, although perhaps any processor that optimises '$nodes[2]' will also optimise 'count($nodes) > 1'. Any thoughts?]
If you want to get a list (without repeats) of the repeated IDs using the key() method, then you can use the usual:
//*[key('ids', @id)[2] and count(.|key('ids', @id)[1]) = 1]
[or actually, by extension to the above aside:
//*[key('ids', @id)[2] and not((.|key('ids', @id)[1])[2])]
making identity-testing of nodes even more obscure! :)]
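Put together as a complete stylesheet, the key-based check might look like the sketch below. It uses generate-id() for the identity test rather than the count() idiom, purely as a matter of taste, and the text output with one id per line is an assumption made for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- index every element by its id attribute; elements without @id get no entry -->
  <xsl:key name="ids" match="*" use="@id"/>

  <xsl:template match="/">
    <!-- [2] is non-empty only when at least two elements share this id;
         the generate-id() comparison keeps just the first element of each such group -->
    <xsl:for-each select="//*[key('ids', @id)[2]]
                             [generate-id() = generate-id(key('ids', @id)[1])]">
      <xsl:value-of select="@id"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Each repeated id is listed once; ids that occur only once are not listed at all.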
Testing this across documents is harder because (a) nodes from different documents aren't related to each other through axes and (b) key() is scoped within the document of the context node. Probably the simplest solution is to create a node set variable holding copies of the nodes from the relevant documents, and then test that as if it were a document on its own:
<xsl:variable name="IDed-elements">
  <xsl:copy-of select="(/ | document('foo.xml'))//*[@id]" />
</xsl:variable>
<xsl:if test="$IDed-elements//*[key('ids', @id)[2]]">
  <!-- repeated IDs -->
</xsl:if>
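In strict XSLT 1.0 a variable built this way is a result tree fragment, so most processors need an extension function before a path can be applied to it. The sketch below assumes the processor supports the EXSLT exsl:node-set() function and that key() works on the converted tree, which is common but not guaranteed by the XSLT 1.0 specification. It also makes shallow copies (element plus its id attribute only), an adjustment made here so that nested elements carrying ids are not copied twice and reported as false repeats.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">
  <xsl:output method="text"/>

  <xsl:key name="ids" match="*" use="@id"/>

  <xsl:template match="/">
    <xsl:variable name="IDed-elements">
      <!-- foo.xml is the second document named in the text above -->
      <xsl:for-each select="(/ | document('foo.xml'))//*[@id]">
        <!-- shallow copy: the element itself and its id attribute, no descendants -->
        <xsl:copy>
          <xsl:copy-of select="@id"/>
        </xsl:copy>
      </xsl:for-each>
    </xsl:variable>

    <!-- key() is evaluated against the temporary tree because the
         context node inside the predicate belongs to that tree -->
    <xsl:if test="exsl:node-set($IDed-elements)/*[key('ids', @id)[2]]">
      <xsl:text>repeated IDs found&#10;</xsl:text>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```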
Alternatively, you could use a recursive method.
> can you elaborate your approach using count()?
It isn't really mine, but anyway: in the beginning, the general method of getting rid of duplicates (or, equivalently, getting the first item in each group of related items) was to go something like
select="foo[not(. = preceding::foo)]"
i.e. select all foo's that don't have the same value as an earlier one.
This works but has quadratic behaviour in the number of nodes being searched.
Steve Muench had an insight into using xsl:key to improve things. This was popularised by Jeni, who has a good description of it at her site. It's the method I used in my "dynamic" example. First you specify a key; then a node is the first item in a group (i.e. the one you want if you are discarding duplicates) if it is the first node in the node set returned by the key.
The only tricky part is that XSL doesn't have a node identity test. Given a node $x and a node set $Y, how do you tell if $x is in $Y?
There are two basic methods: one uses generate-id() to produce strings that you can compare, which gives a node identity test; the other uses set theory.
count($x | $Y) is the number of nodes in the union of the set $Y and the singleton set $x. This will be equal to count($Y) if $x is already in $Y, and equal to count($Y)+1 otherwise.
select="record/*[count(.|key('x',name())[1])=1]"
selects all element child nodes of record for which count(.|key('x',name())[1])=1 (rather than 2), i.e. it selects those nodes that are in the set key('x',name())[1], i.e. are the first node of each group. Subsequent duplicates are not selected.
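To make the count() test concrete, here is a small self-contained sketch. It assumes a document containing record elements whose child elements are to be listed once per distinct element name; the key name x follows the text above, while the text output and one-name-per-line layout are assumptions for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- group the children of every record by element name -->
  <xsl:key name="x" match="record/*" use="name()"/>

  <xsl:template match="/">
    <!-- the union of a node with the first node of its group has size 1
         only when the node is itself that first node -->
    <xsl:for-each select="//record/*[count(. | key('x', name())[1]) = 1]">
      <xsl:value-of select="name()"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
    <!-- the generate-id() form of the same test would be:
         generate-id() = generate-id(key('x', name())[1]) -->
  </xsl:template>
</xsl:stylesheet>
```

Both forms of the identity test select the same nodes; the choice between them is mostly a matter of readability.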
Sort by number of occurrences and remove duplicates.
Please try the XSL below --
We need to use the Muenchian method for grouping. I have used the xalan:nodeset extension function.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xalan="http://xml.apache.org/xalan"
    exclude-result-prefixes="xalan">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:key name="x" match="docroot/token/pageid" use="."/>

  <!-- Pass 1: build a temporary tree with one page per distinct pageid and its count -->
  <xsl:template match="/">
    <xsl:variable name="treefrag">
      <docroot>
        <xsl:for-each select="docroot/token/pageid">
          <!-- true only for the first pageid in each group -->
          <xsl:if test="generate-id(.) = generate-id(key('x', .))">
            <page>
              <pageid>
                <xsl:value-of select="."/>
              </pageid>
              <no_of_times>
                <xsl:value-of select="count(key('x', .))"/>
              </no_of_times>
            </page>
          </xsl:if>
        </xsl:for-each>
      </docroot>
    </xsl:variable>
    <xsl:call-template name="process_tree">
      <xsl:with-param name="tree" select="xalan:nodeset($treefrag)"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Pass 2: output the distinct pageids, most frequent first -->
  <xsl:template name="process_tree">
    <xsl:param name="tree"/>
    <docroot>
      <xsl:for-each select="$tree/docroot/page">
        <xsl:sort select="no_of_times" data-type="number" order="descending"/>
        <pageid>
          <xsl:value-of select="pageid"/>
        </pageid>
      </xsl:for-each>
    </docroot>
  </xsl:template>
</xsl:stylesheet>
List all unique element and attribute names
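In XSLT 2.0 it's simply a matter of applying distinct-values() to the names, for example distinct-values((//* | //@*)/name()).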
If you really need to do it with XSLT 1.0, eliminating duplicates is essentially the same problem as grouping, and you can use the Muenchian grouping approach.
The preceding-sibling grouping technique isn't going to work (a) because your nodes are not siblings of each other, and (b) because it only works where the grouping key is the string-value of the node, not where it is some other function of the node (here, its name). Muenchian grouping works for any string-valued function of a node.
If you are stuck with 1.0 then you need to use a key:
<xsl:key name="names" match="*|@*" use="name()"/>
<xsl:for-each select="//*|//*/@*">
  <xsl:if test="generate-id(.) = generate-id(key('names', name(.)))">
    <xsl:value-of select="local-name()"/>
  </xsl:if>
</xsl:for-each>
In 2.0 you can use distinct-values() or xsl:for-each-group, but it is always good to learn this technique.
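For comparison, a possible XSLT 2.0 version using xsl:for-each-group is sketched below. The alphabetical sort and the text output are assumptions added for the example. As with the key-based approach, an element and an attribute that share a name fall into the same group and are listed once.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- one group per distinct element or attribute name -->
    <xsl:for-each-group select="//* | //@*" group-by="name()">
      <!-- sorting is optional; without it groups appear in order of first occurrence -->
      <xsl:sort select="current-grouping-key()"/>
      <xsl:value-of select="current-grouping-key()"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

This needs an XSLT 2.0 processor; a 1.0 processor will reject xsl:for-each-group.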