java - n-gram similarity for the words in the file




/*
 * To change this license header, choose License Headers in Project Properties.
 * To change this template file, choose Tools | Templates
 * and open the template in the editor.
 */
package sim;

import java.io.*;
import java.util.Arrays;
import java.util.Scanner;
import java.util.logging.Level;
import java.util.logging.Logger;
import static jdk.nashorn.internal.objects.NativeMath.max;

/**
 *
 * @author admin
 */
public class sim {

    public String[][] bigramizedwords = new String[500][100];
    public String[] words = new String[500];
    public File file1 = new File("file1.txt");
    public File file2 = new File("file2.txt");
    public int tracker = 0;
    public double matches = 0;
    public double denominator = 0; // this will hold the sum of the bigrams of the 2 words
    public double res;
    public double results;
    public Scanner a;
    public PrintWriter pw1;

    public sim() {
        intialize();
        // bigramize();
        results = max(res);
        System.out.println("\n\nThe bigram similarity value between " + words[0] + " and " + words[1] + " is " + res + ".");
        pw1.close();
    }

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        sim si = new sim();
        // TODO code application logic here
    }

    public void intialize() {
        int j[] = new int[35];
        try {
            File file1 = new File("input.txt");
            File file2 = new File("out.txt");
            Scanner a = new Scanner(file1);
            PrintWriter pw1 = new PrintWriter(file2);
            int i = 0, count = 0;
            while (a.hasNext()) {
                java.lang.String gram = a.next();
                if (gram.startsWith("question") || gram.endsWith("?")) {
                    count = 0;
                    count -= 1;
                }
                if (gram.startsWith("[") || gram.startsWith("answer") || gram.endsWith(" ")) {
                    //pw1.println(count);
                    j[i++] = count;
                    count = 0;
                    //pw1.println(gram);
                    //System.out.println(count);
                } else {
                    // System.out.println(count);
                    count += 1;
                    //System.out.println(count + " " + gram);
                }
                int line = gram.length();
                int sa_length;
                //int[] j = null;
                int refans_length = j[1];
                //System.out.println(refans_length);
                for (int k = 2; k <= 35; k++)
                    // System.out.println(j[k]);
                    //System.out.println(refans_length);
                    for (int m = 2; m <= 33; m++) {
                        sa_length = j[2];
                        //System.out.println(sa_length);
                        for (int s = 0; s <= refans_length; s++) {
                            for (int l = 0; l <= sa_length; l++) {
                                for (int x = 0; x <= line - 2; x++) {
                                    int tracker = 0;
                                    bigramizedwords[tracker][x] = gram.substring(x, x + 2);
                                    System.out.println(gram.substring(x, x + 2) + "");
                                    //bigramize();
                                }
                                // bigramize();
                            }
                        }
                    }
                bigramize();
                words[tracker] = gram;
                tracker++;
            }
            //pw1.close();
        } catch (FileNotFoundException ex) {
            Logger.getLogger(sim.class.getName()).log(Level.SEVERE, null, ex);
        }
    }

    public void bigramize() {
        //for(int p=0;p<=sa_length;p++)
        denominator = (words[0].length() - 1) + (words[1].length() - 1);
        for (int k = 0; k < bigramizedwords[0].length; k++) {
            if (bigramizedwords[0][k] != null) {
                for (int i = 0; i < bigramizedwords[1].length; i++) {
                    if (bigramizedwords[1][i] != null) {
                        if (bigramizedwords[0][k].equals(bigramizedwords[1][i])) {
                            matches++;
                        }
                    }
                }
            }
        }
        matches *= 2;
        res = matches / denominator;
    }
}

I have tried the above code for bigramizing the words in the file "input.txt". I get the bigrams as output, but I don't get the similarity value. For example, the input file contains:

answer: high risk simulate behaviour solution set rules
[2] rules outline high source knowledge
[1] set rules simulate behaviour

In the above example, I have to compare each word under "answer" with every word under [2], e.g. {high,rules} {high,outline} {high,high} {high,source} {high,knowledge}, and store the maximum value of these comparisons. Then the second word under "answer" is taken and the same process is repeated. At last, the mean of the maximum values of all iterations is taken.
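The comparison described above could be sketched roughly as follows. This is only an illustration under some assumptions: the similarity is taken to be the Dice coefficient over character bigrams (2 * matches / total number of bigrams, the same ratio `bigramize()` computes), the words of each section are assumed to be already read into lists, and each bigram of the second word is matched at most once, which differs from the pairwise counting in the code above. The names `BigramSimilarity`, `bigrams`, `dice` and `meanOfMaxima` are hypothetical and not part of the original code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BigramSimilarity {

    // Splits a word into overlapping two-character substrings,
    // e.g. "high" -> [hi, ig, gh]. Words shorter than 2 characters give an empty list.
    static List<String> bigrams(String word) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 2 <= word.length(); i++) {
            result.add(word.substring(i, i + 2));
        }
        return result;
    }

    // Dice coefficient over bigrams: 2 * shared / (bigrams in a + bigrams in b).
    // Assumption: a bigram of b is consumed once it has been matched.
    static double dice(String a, String b) {
        List<String> first = bigrams(a);
        List<String> second = bigrams(b);
        int total = first.size() + second.size();
        if (total == 0) {
            return 0.0; // neither word has a bigram
        }
        int matches = 0;
        for (String gram : first) {
            if (second.remove(gram)) { // List.remove(Object) returns true if found
                matches++;
            }
        }
        return 2.0 * matches / total;
    }

    // For every reference word, keep the best score against any candidate word,
    // then return the mean of those maxima.
    static double meanOfMaxima(List<String> reference, List<String> candidate) {
        if (reference.isEmpty() || candidate.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (String r : reference) {
            double best = 0.0;
            for (String c : candidate) {
                best = Math.max(best, dice(r, c));
            }
            sum += best;
        }
        return sum / reference.size();
    }

    public static void main(String[] args) {
        // Word lists taken from the sample input above.
        List<String> answer = Arrays.asList(
                "high", "risk", "simulate", "behaviour", "solution", "set", "rules");
        List<String> second = Arrays.asList(
                "rules", "outline", "high", "source", "knowledge");

        System.out.println("dice(high, rules) = " + dice("high", "rules"));
        System.out.println("dice(high, high)  = " + dice("high", "high"));
        System.out.println("mean of maxima    = " + meanOfMaxima(answer, second));
    }
}
```

Keeping the per-pair score in its own method means the maximum and the mean can be computed without shared fields such as `matches` or `tracker`, so no state has to be reset between word pairs.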

java similarity
