xml - Key word count in XSLT




Hi, I am looking at the keyword type of counting mechanism posted here: "Word frequency counter in XSLT".

My wrinkle is that the keywords may consist of multiple words, e.g.:

<xsl:variable name="stopwords" select="('audio codec', 'dual audio', 'audio switch' )"/>

I was playing with the code from the above question, and had this:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <xsl:variable name="stopwords" select="('audio codec', 'dual audio', 'audio switch')"/>
    <wordcount>
      <xsl:for-each-group group-by="." select="for $w in //text()/tokenize(., '\W+')[not(. = $stopwords)] return $w">
        <word word="{current-grouping-key()}" frequency="{count(current-group())}"/>
      </xsl:for-each-group>
    </wordcount>
  </xsl:template>
</xsl:stylesheet>

Certainly, tokenizing on '\W+' breaks the text up into single words, so the tokens won't match the stopwords, which can consist of multiple words.
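To make the problem concrete, here is a minimal sketch. The sample string 'dual audio switch' is made up for illustration and is not taken from any real input. Splitting on non-word characters returns the three single words, so no token can ever equal a two-word phrase such as 'dual audio'.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- tokenize() splits on runs of non-word characters, giving the
         sequence ('dual', 'audio', 'switch'); the phrase 'dual audio'
         is never one of the tokens. -->
    <xsl:value-of select="tokenize('dual audio switch', '\W+')" separator=", "/>
  </xsl:template>
</xsl:stylesheet>
```

Run against any input document with an XSLT 2.0 processor, this prints the tokens separated by commas.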

Can anyone suggest an elegant way to do word counting when the keywords may consist of multiple words?

Thanks for any help with this!

Russ

Given this input XML:

<?xml version="1.0" ?>
<a>
  <b>match: audio switch</b>
  <c>no match:</c>
  <d>no match: audiocodec</d>
  <e attr="no match: audio codec"/>
  no match: Audio Switch/Dual Audio
  match x2: audio switch/dual audio/audio switch
  no match: <f>xxx audio</f><g>codec yyy</g>
</a>

this XSLT:

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:f="http://example.com/f"
    exclude-result-prefixes="f">
  <xsl:output method="xml" encoding="utf-8" indent="yes"/>
  <xsl:variable name="keyphrases" select="('audio codec', 'dual audio', 'audio switch')"/>
  <xsl:template match="/">
    <!-- Join all text nodes with '|' so a phrase cannot match across element boundaries -->
    <xsl:variable name="doctext" select="string-join(//text(), '|')"/>
    <keyphrases>
      <xsl:for-each select="$keyphrases">
        <keyphrase phrase="{.}" count="{f:substr-count($doctext, .)}"/>
      </xsl:for-each>
    </keyphrases>
  </xsl:template>
  <!-- Recursively count the non-overlapping occurrences of $substr in $s -->
  <xsl:function name="f:substr-count">
    <xsl:param name="s"/>
    <xsl:param name="substr"/>
    <xsl:value-of select="if ($s and $substr and contains($s, $substr)) then f:substr-count(substring-after($s, $substr), $substr) + 1 else 0"/>
  </xsl:function>
</xsl:stylesheet>

will produce this output XML, which counts the occurrences of the "stop" words (which I renamed to keyphrases):

<?xml version="1.0" encoding="utf-8"?>
<keyphrases>
  <keyphrase phrase="audio codec" count="0"/>
  <keyphrase phrase="dual audio" count="1"/>
  <keyphrase phrase="audio switch" count="3"/>
</keyphrases>
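Note that contains() compares case-sensitively, so a phrase written with different capitalization is not counted. If that is not what you want, one possible variation is sketched below. It is an assumption about what a reader might need, not part of the answer above: both the document text and each phrase are lower-cased before counting. The xs prefix and the as types are additions made for this sketch.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="http://example.com/f"
    exclude-result-prefixes="xs f">

  <xsl:output method="xml" encoding="utf-8" indent="yes"/>

  <xsl:variable name="keyphrases" select="('audio codec', 'dual audio', 'audio switch')"/>

  <xsl:template match="/">
    <!-- Lower-case the joined text once so that matching ignores case -->
    <xsl:variable name="doctext" select="lower-case(string-join(//text(), '|'))"/>
    <keyphrases>
      <xsl:for-each select="$keyphrases">
        <keyphrase phrase="{.}" count="{f:substr-count($doctext, lower-case(.))}"/>
      </xsl:for-each>
    </keyphrases>
  </xsl:template>

  <!-- Same recursion as f:substr-count above, typed to return an xs:integer.
       The empty-substring check prevents endless recursion. -->
  <xsl:function name="f:substr-count" as="xs:integer">
    <xsl:param name="s" as="xs:string"/>
    <xsl:param name="substr" as="xs:string"/>
    <xsl:sequence select="if ($substr ne '' and contains($s, $substr))
                          then f:substr-count(substring-after($s, $substr), $substr) + 1
                          else 0"/>
  </xsl:function>
</xsl:stylesheet>
```

This still counts plain substrings, so a phrase embedded in a longer word would be counted as well. Whether that matters depends on your data.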

xml xslt xslt-2.0 word-count
