Oracle - Optimizing looping over all regex matches on CLOB column when only interested in the distinct matches
I'm making a stored procedure that loops over a table (with several thousand rows), and for each row there is a CLOB column from which I want to fetch all matches of a regular expression (say "fnr"). Thereafter, I want to insert each distinct match into a new table. A single CLOB column may contain thousands of matches, and oftentimes the same "fnr" repeats in the CLOB, i.e. there are much fewer distinct regex matches, and those are the ones I'm interested in. However, the procedure I've made takes a ridiculously long time, and I suspect that looping over all the matches is the time-consuming part.
My procedure looks like this:
create or replace procedure sp_mtv_finn_fnr
as
begin
  declare
    v_n         number;
    v_cnt       number;
    v_mtrid     number;
    v_regex_fnr varchar2(54) := '(((0[1-9]|[12]\d|3[01])(0[1-9]|1[012])(\d{2}))(\d{5}))';
    v_doc       clob;
    v_fnr       varchar2(11);
  begin
    -- rows in the table --
    select count(*) into v_n from table;
    if v_n > 0 then
      -- loop over all rows --
      for i in 1..v_n loop
        select doc, mtrid into v_doc, v_mtrid
        from (select doc as doc, id as mtrid, rownum as rnum from table where rownum <= i)
        where rnum >= i;
        if v_doc is not null then
          select regexp_count(v_doc, v_regex_fnr) into v_cnt from dual;
          if v_cnt >= 1 then
            -- for each regex match - time consuming, right? --
            for j in 1..v_cnt loop
              select regexp_substr(v_doc, v_regex_fnr, 1, j, 'm') into v_fnr from dual;
              if check_fnr(v_fnr) = 'true' then
                insert into table2(mtr_id, fnr) select v_mtrid, v_fnr from dual;
              end if;
            end loop;
          end if;
        end if;
        commit;
      end loop;
    end if;
  end loop;
exception
  when others then
    dbms_output.put_line('error - rollback');
    dbms_output.put_line('the error code is ' || sqlcode || '- ' || sqlerrm);
    rollback;
end;
/
Do you have any thoughts on how to optimize this procedure?
I'm using Oracle 11.2.0.3.0. (BTW, I know of the CTX_ENTITY package, but it is disabled on my version. Still, I'm thinking of enabling it.)
Update
After applying the helpful performance optimizing techniques given by nop77svk, I can say with certainty that regexp_substr() on the CLOB is the bottleneck, as there was unfortunately no performance improvement. However, I came up with a "hack/workaround" to minimize the number of regexp_substr() calls, which gave a tremendous performance improvement. I first thought of making an incrementally "trained" regex that excludes the previous matches, but Oracle doesn't support negative lookahead, so that didn't work. I ended up saving the CLOB in a variable and using regexp_replace() to remove all occurrences of each match as soon as it is found. Since there were a lot of repeated occurrences in the CLOB, this saved the procedure a lot of regexp_substr() calls, and simultaneously dealt with the distinct requirement.
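To illustrate the removal trick in isolation, here is a minimal sketch on a toy string. The sample text is made up for illustration, and the block only prints what it finds; it assumes server output is enabled (for example with set serveroutput on in SQL*Plus).

```sql
declare
  -- same pattern as in the procedure
  v_regex_fnr varchar2(54) := '(((0[1-9]|[12]\d|3[01])(0[1-9]|1[012])(\d{2}))(\d{5}))';
  -- made-up sample document: one value repeated three times, another one once
  v_doc       clob := '01019912345 foo 01019912345 bar 02029954321 01019912345';
  v_fnr       varchar2(11);
begin
  loop
    -- always ask for the FIRST remaining match
    v_fnr := regexp_substr(v_doc, v_regex_fnr, 1, 1, 'm');
    exit when v_fnr is null;
    dbms_output.put_line(v_fnr);
    -- occurrence = 0 removes every occurrence of this match,
    -- so each distinct value is visited only once
    v_doc := regexp_replace(v_doc, v_fnr, '', 1, 0);
  end loop;
end;
/
```

Each pass through the loop costs one regexp_substr() and one regexp_replace() call per distinct value, rather than one regexp_substr() call per occurrence.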
Below follows the result, based on nop77svk's contribution. And yes, I'm using dual in the MERGE statement; is there a way around that here?
create or replace procedure sp_mtv_finn_fnr2
as
begin
  declare
    v_regex_fnr varchar2(54) := '(((0[1-9]|[12]\d|3[01])(0[1-9]|1[012])(\d{2}))(\d{5}))';
    v_fnr       varchar2(11);
    v_doc       clob;
    type rec_table2 is record (
      mtr_id table2.mtr_id%type,
      fnr    table2.fnr%type
    );
    type arr_table2 is table of rec_table2 index by simple_integer;
    table2_bulk arr_table2;
    table2_row  rec_table2;
  begin
    for rec in (select doc, mtr_id as mtrid from table where doc is not null) loop
      v_doc := rec.doc;
      loop
        v_fnr := regexp_substr(v_doc, v_regex_fnr, 1, 1, 'm');
        exit when v_fnr is null;
        -- incrementally remove all occurrences of this match from the doc --
        v_doc := regexp_replace(v_doc, v_fnr, '', 1, 0);
        if check_fnr(v_fnr) = 'true' then
          table2_row.mtr_id := rec.mtrid;
          table2_row.fnr := v_fnr;
          table2_bulk(table2_bulk.count + 1) := table2_row;
        end if;
      end loop;
    end loop;
    forall i in indices of table2_bulk
      merge into table2 t
      using (select table2_bulk(i).mtr_id as mtrid, table2_bulk(i).fnr as fnr from dual) b
      on (t.mtr_id = b.mtrid and t.fnr = b.fnr)
      when not matched then
        insert (t.mtr_id, t.fnr) values (b.mtrid, b.fnr);
    commit;
  exception
    when others then
      dbms_output.put_line('error - rollback');
      dbms_output.put_line('the error code is ' || sqlcode || '- ' || sqlerrm);
      rollback;
  end;
end;
/
Iteratively tuning the PL/SQL block ...
Iteration 0: Fixing the syntax errors ...
create or replace procedure sp_mtv_finn_fnr
as
  v_n         number;
  v_cnt       number;
  v_mtrid     number;
  v_regex_fnr varchar2(54) := '(((0[1-9]|[12]\d|3[01])(0[1-9]|1[012])(\d{2}))(\d{5}))';
  v_doc       clob;
  v_fnr       varchar2(11);
begin
  -- rows in the table --
  select count(*) into v_n from table;
  if v_n > 0 then
    -- loop over all rows --
    for i in 1..v_n loop
      select doc, mtrid into v_doc, v_mtrid
      from (select doc as doc, id as mtrid, rownum as rnum from table where rownum <= i)
      where rnum >= i;
      if v_doc is not null then
        select regexp_count(v_doc, v_regex_fnr) into v_cnt from dual;
        if v_cnt >= 1 then
          -- for each regex match - time consuming, right? --
          for j in 1..v_cnt loop
            select regexp_substr(v_doc, v_regex_fnr, 1, j, 'm') into v_fnr from dual;
            if check_fnr(v_fnr) = 'true' then
              insert into table2(mtr_id, fnr) select v_mtrid, v_fnr from dual;
            end if;
          end loop;
        end if;
      end if;
      commit;
    end loop;
  end if;
exception
  when others then
    dbms_output.put_line('error - rollback');
    dbms_output.put_line('the error code is ' || sqlcode || '- ' || sqlerrm);
    rollback;
end;
/
Iteration 1: Removing the unnecessary context switches and the useless row counting ...
declare
  v_cnt       number;
  v_regex_fnr varchar2(54) := '(((0[1-9]|[12]\d|3[01])(0[1-9]|1[012])(\d{2}))(\d{5}))';
  v_fnr       varchar2(11);
begin
  for rec in (select doc, id as mtrid from table) loop
    if rec.doc is not null then
      v_cnt := regexp_count(rec.doc, v_regex_fnr);
      if v_cnt >= 1 then
        -- for each regex match - time consuming, right? --
        for j in 1..v_cnt loop
          v_fnr := regexp_substr(rec.doc, v_regex_fnr, 1, j, 'm');
          if check_fnr(v_fnr) = 'true' then
            insert into table2(mtr_id, fnr) values (rec.mtrid, v_fnr);
          end if;
        end loop;
      end if;
    end if;
  end loop;
  commit;
exception
  when others then
    dbms_output.put_line('error - rollback');
    dbms_output.put_line('the error code is ' || sqlcode || '- ' || sqlerrm);
    rollback;
end;
/
Iteration 2: Decreasing the number of outer loops ...
declare
  v_cnt       number;
  v_regex_fnr varchar2(54) := '(((0[1-9]|[12]\d|3[01])(0[1-9]|1[012])(\d{2}))(\d{5}))';
  v_fnr       varchar2(11);
begin
  for rec in (select doc, id as mtrid from table where doc is not null) loop
    v_cnt := regexp_count(rec.doc, v_regex_fnr);
    if v_cnt >= 1 then
      -- for each regex match - time consuming, right? --
      for j in 1..v_cnt loop
        v_fnr := regexp_substr(rec.doc, v_regex_fnr, 1, j, 'm');
        if check_fnr(v_fnr) = 'true' then
          insert into table2(mtr_id, fnr) values (rec.mtrid, v_fnr);
        end if;
      end loop;
    end if;
  end loop;
  commit;
exception
  when others then
    dbms_output.put_line('error - rollback');
    dbms_output.put_line('the error code is ' || sqlcode || '- ' || sqlerrm);
    rollback;
end;
/
Iteration 3: Shortening the code of iteration 2 ...
declare
  v_regex_fnr varchar2(54) := '(((0[1-9]|[12]\d|3[01])(0[1-9]|1[012])(\d{2}))(\d{5}))';
  v_fnr       varchar2(11);
begin
  for rec in (
    select doc, id as mtrid, regexp_count(doc, v_regex_fnr) as regexp_cnt
    from table
    where doc is not null
      and regexp_like(doc, v_regex_fnr)
  ) loop
    for j in 1..rec.regexp_cnt loop
      v_fnr := regexp_substr(rec.doc, v_regex_fnr, 1, j, 'm');
      if check_fnr(v_fnr) = 'true' then
        insert into table2(mtr_id, fnr) values (rec.mtrid, v_fnr);
      end if;
    end loop;
  end loop;
  commit;
exception
  when others then
    dbms_output.put_line('error - rollback');
    dbms_output.put_line('the error code is ' || sqlcode || '- ' || sqlerrm);
    rollback;
end;
/
Iteration 4: Removing the unnecessary regexp_count() counting ...
declare
  v_regex_fnr varchar2(54) := '(((0[1-9]|[12]\d|3[01])(0[1-9]|1[012])(\d{2}))(\d{5}))';
  v_fnr       varchar2(11);
  j           integer;
begin
  for rec in (select doc, id as mtrid from table where doc is not null) loop
    j := 1;
    loop
      v_fnr := regexp_substr(rec.doc, v_regex_fnr, 1, j, 'm');
      exit when v_fnr is null;
      if check_fnr(v_fnr) = 'true' then
        insert into table2(mtr_id, fnr) values (rec.mtrid, v_fnr);
      end if;
      j := j + 1;
    end loop;
  end loop;
  commit;
exception
  when others then
    dbms_output.put_line('error - rollback');
    dbms_output.put_line('the error code is ' || sqlcode || '- ' || sqlerrm);
    rollback;
end;
/
Iteration 5: Saving the results in memory and flushing them to the DB in one go (using collection binding), plus dealing with the distinct requirement ...
create or replace type obj_table2 as object (
  mtr_id integer,
  fnr    varchar2(4000)
);
/
create or replace type arr_table2 as table of obj_table2;
/
declare
  v_regex_fnr varchar2(54) := '(((0[1-9]|[12]\d|3[01])(0[1-9]|1[012])(\d{2}))(\d{5}))';
  v_fnr       varchar2(11);
  j           integer;
  table2_bulk arr_table2 := arr_table2();
begin
  for rec in (select doc, id as mtrid from table where doc is not null) loop
    j := 1;
    loop
      v_fnr := regexp_substr(rec.doc, v_regex_fnr, 1, j, 'm');
      exit when v_fnr is null;
      if check_fnr(v_fnr) = 'true' then
        table2_bulk.extend();
        table2_bulk(table2_bulk.last) := new obj_table2(mtr_id => rec.mtrid, fnr => v_fnr);
      end if;
      j := j + 1;
    end loop;
  end loop;
  -- minus removes duplicates as well as rows already present in table2
  insert into table2(mtr_id, fnr)
  select mtr_id, fnr from table(table2_bulk) x
  minus
  select mtr_id, fnr from table2;
  commit;
exception
  when others then
    dbms_output.put_line('error - rollback');
    dbms_output.put_line('the error code is ' || sqlcode || '- ' || sqlerrm);
    rollback;
end;
/
Iteration 6: Throwing it all away, having decided to show off detestably ...
insert into table2 (mtr_id, fnr)
with xyz (doc, mtrid, fnr, j) as (
  select doc, id as mtrid, cast(null as varchar2(4000)) as fnr, 0 as j
  from table
  where doc is not null
  --
  union all
  --
  select doc, mtrid,
         regexp_substr(doc, '(((0[1-9]|[12]\d|3[01])(0[1-9]|1[012])(\d{2}))(\d{5}))', 1, j+1, 'm') as fnr,
         j+1
  from xyz x
  where j = 0
     or j > 0 and x.fnr is not null
)
select distinct mtrid, fnr
from xyz
where j > 0
  and fnr is not null
  and check_fnr(fnr) = 'true'
;
commit;
Please note that these code snippets may not work. Since you did not provide a test data setup, I can only tune your code in a hypothetical way.
Please note that the slowest part of it all is still the regexp_substr() on the CLOB value. You might want to think about using the position parameter of regexp_substr() instead of the occurrence parameter when looking for the subsequent regexp matches.
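A rough, untested sketch of that idea follows. It swaps in regexp_instr(), which takes the same position parameter but returns where the match starts, so each search can resume right after the previous match instead of rescanning from the beginning of the CLOB. Several things here are assumptions made for illustration: src_table stands in for the source table, the fixed length of 11 relies on the pattern always matching exactly 11 digits, and the v_seen lookup is an extra in-memory way of handling the distinct requirement per row (it does not check for rows that already exist in table2).

```sql
declare
  v_regex_fnr varchar2(54) := '(((0[1-9]|[12]\d|3[01])(0[1-9]|1[012])(\d{2}))(\d{5}))';
  v_fnr       varchar2(11);
  v_pos       integer;
  -- assumption: matches already handled for the current row, keyed by the match itself
  type t_seen is table of boolean index by varchar2(11);
  v_seen      t_seen;
begin
  -- src_table is a hypothetical name for the source table
  for rec in (select doc, id as mtrid from src_table where doc is not null) loop
    v_seen.delete;
    v_pos := 1;
    loop
      -- start of the next match at or after v_pos; 0 means there is none
      v_pos := regexp_instr(rec.doc, v_regex_fnr, v_pos, 1, 0, 'm');
      exit when v_pos = 0;
      -- the pattern always matches exactly 11 digits, so a plain substring is enough
      v_fnr := dbms_lob.substr(rec.doc, 11, v_pos);
      if not v_seen.exists(v_fnr) then
        v_seen(v_fnr) := true;
        if check_fnr(v_fnr) = 'true' then
          insert into table2(mtr_id, fnr) values (rec.mtrid, v_fnr);
        end if;
      end if;
      -- resume the search right after this match
      v_pos := v_pos + 11;
    end loop;
  end loop;
  commit;
end;
/
```

Whether this actually beats the occurrence-based loops depends on how the regexp functions behave on your CLOB data, so it is worth timing on a representative sample first.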
Enjoy.
regex oracle performance loops distinct