In postgresql, which part of locale causes problems with LIKE operations? -
In postgresql, which part of locale causes problems with LIKE operations? -
there's lot of give-and-take around net locale other c or posix causing performance problems in postgresql. i'm not clear though part(s) of locale setting cause problems , why.
in manpage initdb(1), see:
--locale=locale sets default locale database cluster. if alternative not specified, locale inherited environment initdb runs in. locale back upwards described in section 22.1, \u201clocale support\u201d, in documentation. --lc-collate=locale, --lc-ctype=locale, --lc-messages=locale, --lc-monetary=locale, --lc-numeric=locale, --lc-time=locale --locale, sets locale in specified category.
we see:
alter default collation order or character set classes, utilize --lc-collate , --lc-ctype options. collation orders other c or posix have performance penalty. these reasons of import take right locale when running initdb.
does mean utilize --lc-collate posix --lc-ctype utf-8
, avoid performance penalties? or there other performance issues involved?
i'm not surprised collation affects sort performance, same issue arises comparisons not using indexes? can explain issue operator?
it sounds you're talking text_pattern_ops
operator class , application databases in locales other c
.
the issue not 1 of encoding, of collation.
a b-tree index requires have single, stable sort order next invariants, assumption if a < b
b > a
. comparing operators used sort tree when building , maintaining , index.
for text strings, collation rules language applied comparing operators when determining whether 1 string greater or less strings sort "correctly" far user concerned. these rules locale-dependent, , can things ignore punctuation , whitespace.
the like
operator isn't interested in locales. wants find prefix string, , can't ignore punctuation. cannot utilize b-tree index created collation might ignore punctuation/whitespace, etc. like
walks downwards index tree character character find match, , can't if index might ignore characters.
that's why, if db uses locale other "c" (posix) must create different indexes utilize like
.
example of localised sorting, compare:
regress=> x(v) (values ('10'),('1'),('1.'),('2.'),('.2'),('.1'),('1-1'),('11') ) select v x order v collate "en_au"; v ----- 1 .1 1. 10 11 1-1 .2 2. (8 rows) regress=> x(v) (values ('10'),('1'),('1.'),('2.'),('.2'),('.1'),('1-1'),('11') ) select v x order v collate "c"; v ----- .1 .2 1 1-1 1. 10 11 2. (8 rows)
the text_pattern_ops
opclass serves need. in newer postgresql releases can create index collate "c"
on target column instead, serving same need, e.g.:
create index idx_c on t2(x collate "c");
like
utilize such index, , can used faster sorting don't care locale given operation, e.g.
select x t2 order x collate "c";
postgresql locale database-performance
Comments
Post a Comment