c - fast string search to find string array element which matches the givven pattern -
c - fast string search to find string array element which matches the givven pattern -
i have array of constant strings iterate through find index of element string contains search pattern. search algorithm should take improve speed of finding element? not limited in time before running application preparing tables if necessary.
i corrected question - not doing exact string match - searching pattern within element, in array
array of strings: [0] reddish fox jumps on fence [1] bluish table [2] reddish flowers on fence
i need find element contains word 'table' - in case element 1
i 50000 iterations of set of 30 array contain 30000 strings of not less 128 characters. using good-old strstr
brute forcefulness slow...
ok, posting part of function, first strstr
- looks in uncut array of lines if there occurrences, brute search follows. know can speed part, not doing optimization on approach...
// sheets[i].buffer - contains page of text split lines // fullfunccall - pattern // sheets[i].cells[k] - pointers lines in buffer for( i=0; i<sheetcount; i++) { if( i!= currsheet && sheets[i].name && sheets[i].name[0] != '\0') { if( strstr(sheets[i].buffer, fullfunccall )) { usedexternally = 1; int foundatleastone = 0; for( k=0; k<sheets[i].numcells; k++ ) { strncpy_s(testline, max_line_size, sheets[i].cells[k]->line, sheets[i].cells[k]->linesize); testline[sheets[i].cells[k]->linesize] = '\0'; if( strstr(testline, fullfunccall )) { dependency_num++; if( dependency_num >= max_cell_dependencies-1) { printf("allocation sheet cell dependencies insuficcient\n"); return; } sheets[currsheet].cells[currcellposinsheet]->numdeps = dependency_num+1; foundatleastone++; sheets[currsheet].cells[currcellposinsheet]->deps[dependency_num] = &sheets[i].cells[k]; } } if( foundatleastone == 0 ) { printf("error locating dependency external func: %s\n", fullfunccall ); return; } } }; }
you wrote 'haystack' (the set of strings search through) 30000 strings approx. 200 characters each. wrote 'needle' (the term search for) either string of 5 or 20 characters.
based on this, precompute hashtable maps 5-character subsequence string(s) in haystack occurs in. 30000 strings (200 characters each) there @ 30000 * (200 - 5) = 5.850.000 different 5-character substrings. if hash each of 16bit checksum you'd need minimum 11mb of memory (for hash keys) plus pointers pointing string(s) in substring occurs.
for instance, given simplfied haystack of
static const char *haystack[] = { "12345", "123456", "23456", "345678" };
you precompute hash map maps possible 5-character string such that
12345 => haystack[0], haystack[1] 23456 => haystack[1], haystack[2] 34567 => haystack[3] 45678 => haystack[4]
with this, take first 5 characters of given key (either 5 or 20 characters long), hash , normal strstr
through strings key mapped hash.
c fastsearch
Comments
Post a Comment