In postgresql, which part of locale causes problems with LIKE operations? -

there's lot of give-and-take around net locale other c or posix causing performance problems in postgresql. i'm not clear though part(s) of locale setting cause problems , why.

in manpage initdb(1), see:

--locale=locale  sets default locale database cluster. if  alternative not specified, locale inherited environment  initdb runs in. locale  back  upwards described in section 22.1, \u201clocale support\u201d, in documentation.  --lc-collate=locale, --lc-ctype=locale, --lc-messages=locale, --lc-monetary=locale, --lc-numeric=locale, --lc-time=locale   --locale, sets locale in specified category.

we see:

   alter default collation order or character set classes,  utilize --lc-collate , --lc-ctype options. collation orders other c or    posix have performance penalty. these reasons  of import  take right locale when running initdb.

does mean utilize --lc-collate posix --lc-ctype utf-8 , avoid performance penalties? or there other performance issues involved?

i'm not surprised collation affects sort performance, same issue arises comparisons not using indexes? can explain issue operator?

it sounds you're talking text_pattern_ops operator class , application databases in locales other c.

the issue not 1 of encoding, of collation.

a b-tree index requires have single, stable sort order next invariants, assumption if a < b b > a. comparing operators used sort tree when building , maintaining , index.

for text strings, collation rules language applied comparing operators when determining whether 1 string greater or less strings sort "correctly" far user concerned. these rules locale-dependent, , can things ignore punctuation , whitespace.

the like operator isn't interested in locales. wants find prefix string, , can't ignore punctuation. cannot utilize b-tree index created collation might ignore punctuation/whitespace, etc. like walks downwards index tree character character find match, , can't if index might ignore characters.

that's why, if db uses locale other "c" (posix) must create different indexes utilize like.

example of localised sorting, compare:

regress=> x(v) (values ('10'),('1'),('1.'),('2.'),('.2'),('.1'),('1-1'),('11') ) select v x order v collate "en_au";   v   -----  1  .1  1.  10  11  1-1  .2  2. (8 rows)  regress=> x(v) (values ('10'),('1'),('1.'),('2.'),('.2'),('.1'),('1-1'),('11') ) select v x order v collate "c";   v   -----  .1  .2  1  1-1  1.  10  11  2. (8 rows)

the text_pattern_ops opclass serves need. in newer postgresql releases can create index collate "c" on target column instead, serving same need, e.g.:

create index idx_c on t2(x collate "c");

like utilize such index, , can used faster sorting don't care locale given operation, e.g.

select x t2 order x collate "c";

postgresql locale database-performance

Search This Blog

Jaimee

In postgresql, which part of locale causes problems with LIKE operations? -

Comments

Post a Comment

Popular posts from this blog

c - Compilation of a code: unkown type name string -

java - Bypassing "final local variable defined in an enclosing type" -

json - Hibernate and Jackson (java.lang.IllegalStateException: Cannot call sendError() after the response has been committed) -