c - fast string search to find string array element which matches the givven pattern -



c - fast string search to find string array element which matches the givven pattern -

i have array of constant strings iterate through find index of element string contains search pattern. search algorithm should take improve speed of finding element? not limited in time before running application preparing tables if necessary.

i corrected question - not doing exact string match - searching pattern within element, in array

array of strings: [0] reddish fox jumps on fence [1] bluish table [2] reddish flowers on fence

i need find element contains word 'table' - in case element 1

i 50000 iterations of set of 30 array contain 30000 strings of not less 128 characters. using good-old strstr brute forcefulness slow...

ok, posting part of function, first strstr - looks in uncut array of lines if there occurrences, brute search follows. know can speed part, not doing optimization on approach...

// sheets[i].buffer - contains page of text split lines // fullfunccall - pattern // sheets[i].cells[k] - pointers lines in buffer for( i=0; i<sheetcount; i++) { if( i!= currsheet && sheets[i].name && sheets[i].name[0] != '\0') { if( strstr(sheets[i].buffer, fullfunccall )) { usedexternally = 1; int foundatleastone = 0; for( k=0; k<sheets[i].numcells; k++ ) { strncpy_s(testline, max_line_size, sheets[i].cells[k]->line, sheets[i].cells[k]->linesize); testline[sheets[i].cells[k]->linesize] = '\0'; if( strstr(testline, fullfunccall )) { dependency_num++; if( dependency_num >= max_cell_dependencies-1) { printf("allocation sheet cell dependencies insuficcient\n"); return; } sheets[currsheet].cells[currcellposinsheet]->numdeps = dependency_num+1; foundatleastone++; sheets[currsheet].cells[currcellposinsheet]->deps[dependency_num] = &sheets[i].cells[k]; } } if( foundatleastone == 0 ) { printf("error locating dependency external func: %s\n", fullfunccall ); return; } } }; }

you wrote 'haystack' (the set of strings search through) 30000 strings approx. 200 characters each. wrote 'needle' (the term search for) either string of 5 or 20 characters.

based on this, precompute hashtable maps 5-character subsequence string(s) in haystack occurs in. 30000 strings (200 characters each) there @ 30000 * (200 - 5) = 5.850.000 different 5-character substrings. if hash each of 16bit checksum you'd need minimum 11mb of memory (for hash keys) plus pointers pointing string(s) in substring occurs.

for instance, given simplfied haystack of

static const char *haystack[] = { "12345", "123456", "23456", "345678" };

you precompute hash map maps possible 5-character string such that

12345 => haystack[0], haystack[1] 23456 => haystack[1], haystack[2] 34567 => haystack[3] 45678 => haystack[4]

with this, take first 5 characters of given key (either 5 or 20 characters long), hash , normal strstr through strings key mapped hash.

c fastsearch

Comments

Popular posts from this blog

Delphi change the assembly code of a running process -

json - Hibernate and Jackson (java.lang.IllegalStateException: Cannot call sendError() after the response has been committed) -

C++ 11 "class" keyword -