Key word count in XSLT
I am looking at a keyword type of counting mechanism like the one posted in the question "Word frequency counter in XSLT".
My wrinkle is that the keywords may consist of multiple words, e.g.:
<xsl:variable name="stopwords" select="('audio codec', 'dual audio', 'audio switch' )"/>
I was playing with the code from the above question and ended up with this:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <xsl:variable name="stopwords" select="('audio codec', 'dual audio', 'audio switch')"/>
    <wordcount>
      <!-- Split every text node on non-word characters, drop stopwords, group the rest -->
      <xsl:for-each-group group-by="." select="
          for $w in //text()/tokenize(., '\W+')[not(. = $stopwords)] return $w">
        <word word="{current-grouping-key()}" frequency="{count(current-group())}"/>
      </xsl:for-each-group>
    </wordcount>
  </xsl:template>
</xsl:stylesheet>
Of course, tokenizing on '\W+' breaks the text into single words, so a token will never match my stopwords, which can span multiple words.
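To illustrate the problem, here is a minimal sketch (the sample string is made up for this example and is not from my real data). It prints the tokens that tokenize() produces for a string containing two of the phrases:

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Splitting on runs of non-word characters yields single words only,
         so neither 'dual audio' nor 'audio switch' survives as one token -->
    <xsl:value-of select="tokenize('dual audio switch', '\W+')" separator=" | "/>
  </xsl:template>
</xsl:stylesheet>
```

Run against any input document, this should print `dual | audio | switch`: three separate tokens, none of which equals a two-word stopword.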
Can anyone suggest an elegant way to do word counting when the keywords may consist of multiple words?
Thanks for any help with this!
Russ
Given this input XML:
<?xml version="1.0" ?>
<a>
  <b>match: audio switch</b>
  <c>no match:</c>
  <d>no match: audiocodec</d>
  <e attr="no match: sound codec"/>
  no match: sound switch/dual sound
  match x2: audio switch/dual audio/audio switch
  no match: <f>xxx audio</f><g>codec yyy</g>
</a>
this XSLT:
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:f="http://example.com/f"
    exclude-result-prefixes="f">
  <xsl:output method="xml" encoding="utf-8" indent="yes"/>
  <xsl:variable name="keyphrases" select="('audio codec', 'dual audio', 'audio switch')"/>
  <xsl:template match="/">
    <!-- Join all text nodes with '|' so a phrase cannot match across element boundaries -->
    <xsl:variable name="doctext" select="string-join(//text(), '|')"/>
    <keyphrases>
      <xsl:for-each select="$keyphrases">
        <keyphrase phrase="{.}" count="{f:substr-count($doctext, .)}"/>
      </xsl:for-each>
    </keyphrases>
  </xsl:template>
  <!-- Recursively counts non-overlapping occurrences of $substr in $s -->
  <xsl:function name="f:substr-count">
    <xsl:param name="s"/>
    <xsl:param name="substr"/>
    <xsl:value-of select="
        if ($s and $substr and contains($s, $substr))
        then f:substr-count(substring-after($s, $substr), $substr) + 1
        else 0"/>
  </xsl:function>
</xsl:stylesheet>
will produce this output XML, which counts the occurrences of the "stop" words (which I renamed to keyphrases):
<?xml version="1.0" encoding="utf-8"?>
<keyphrases>
  <keyphrase phrase="audio codec" count="0"/>
  <keyphrase phrase="dual audio" count="1"/>
  <keyphrase phrase="audio switch" count="3"/>
</keyphrases>
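If you prefer the counting function to return a real number rather than a text node, a typed variant might look like the sketch below. This is an assumption on my part, not something the stylesheet above needs: it adds the `xs` namespace, declares `as="xs:integer"`, and uses `xsl:sequence` instead of `xsl:value-of`. The literal string in the template is only there to exercise the function.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="http://example.com/f"
    exclude-result-prefixes="xs f">

  <xsl:output method="text"/>

  <!-- Typed variant (hypothetical, for illustration): same recursion as above,
       but the result is an xs:integer instead of a text node -->
  <xsl:function name="f:substr-count" as="xs:integer">
    <xsl:param name="s" as="xs:string?"/>
    <xsl:param name="substr" as="xs:string?"/>
    <xsl:sequence select="
        if ($s and $substr and contains($s, $substr))
        then f:substr-count(substring-after($s, $substr), $substr) + 1
        else 0"/>
  </xsl:function>

  <xsl:template match="/">
    <!-- Call trace for the literal below:
         'audio switch/dual audio/audio switch' gives 1 + count of '/dual audio/audio switch'
         '/dual audio/audio switch'             gives 1 + count of ''
         ''                                     gives 0 -->
    <xsl:value-of select="f:substr-count('audio switch/dual audio/audio switch', 'audio switch')"/>
  </xsl:template>
</xsl:stylesheet>
```

Run against any input document, this should print `2`. Because each recursive step continues with `substring-after`, occurrences are counted without overlap, and matching stays case-sensitive and whitespace-sensitive, just as in the untyped version.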
xml xslt xslt-2.0 word-count