c++ - How does this code for obtaining LCP from a Suffix Array work? -
c++ - How does this code for obtaining LCP from a Suffix Array work? -
can explain how code constructing lcp suffix array works? suffixarr[]
array such suffixarr[i]
holds value of index in string suffix rank i.
void lcpconstruct() { int i,c[1001],l; c[suffixarr[0]] = n; for(i=1;i<n;i++) c[suffixarr[i]] = suffixarr[i-1]; l = 0; for(i=0;i<n;i++) { if(c[i]==n) lcpadj[i] = 0; else { while(i+l<n && c[i]+l<n && s[i+l] == s[c[i]+l]) l++; lcpadj[i] = l; l = max(l-1,0); } } for(i=0;i<n;i++) cout<<lcpadj[suffixarr[i]]<<"\n"; }
first, it's of import realize algorithm processes suffixes in original order, i.e. order in appear in input string. not in lexicographical order.
so if input string abxabc
, first considers abxabc
, bxabc
, xabc
, forth.
for each suffix considers in order, determines position of suffix lexicographical predecessor(*) (so here, , here, uses concept of lexicographical order). first suffix abxabc
, lexicographical predecessor, i.e. suffix appears straight before in lexicographical ordering of suffixes, abc
. determines o(1) lookup in array c
, has been prepared purpose.
the inner loop compares characters of abxabc
, abc
1 1 , determines these 2 suffixes have first 2 characters in common. variable l
in code, , means entry in lcp suffix abxabc
must 2, set lcpadj[i] = l
. note i
here refers position of suffix in input string, not position in suffix array. lcpadj
not lcp array (yet). it's auxiliary info structure.
then proceeds next string, bxabc
. 1 time again uses c
find bc
lexicographical predecessor of , determines how many prefix characters 2 share. , here comes trick: can sure must at least many in previous step (i.e. 2), minus 1. why? because string consider, bxabc
, of course of study suffix of string considered (abxabc
), hence lexicographical predecessor of string (abc
) must have suffix 1 character shorter (bc
), , suffix must somewhere in suffix array, , must share prefix considered string, minus first character. moreover, there can't other suffix both shorter , lexicographically closer string considered. latter quite logical if think how lexicographical sorting works, there formal proofs of (e.g. lemma 5.10 in kärkkäinen's lecture here)
so describes main principle @ work here. there few things note code understand role of each variable:
as explained,c
auxiliary array (n
integers in length) stores, each suffix in input string, position of other suffix immediate lexicographical predecessor. array constructed not left right, (wisely) going through suffix array left right, because makes easy determine immediate lexicogaphical predecessor of string: immediate lexicographical predecessor of suffix starting @ position suffixarr[i]
must of course of study located @ position suffixarr[i-1]
. confirm in code how c
defined. as mentioned above, lcpadj
stores lcp values suffix in order in appear in input string, not order in appear in suffix array. why @ output time, lcpadj
not printed left right, going through suffix array left right, , printing lcpadj[i]
in order. confirm case. i hope helps. allow me know if not.
(*)by lexicographical predecessor mean immediate predecessor of suffix in lexicographically ordered list of suffixes, i.e. suffix immediate left in suffix array.
c++ algorithm suffix-array
Comments
Post a Comment