Convert RDD of Vector in LabeledPoint using Scala - MLLib in Apache Spark -
I'm using MLlib of Apache Spark with Scala. I need to convert a group of Vector

    import org.apache.spark.mllib.linalg.{Vector, Vectors}
    import org.apache.spark.mllib.regression.LabeledPoint

into LabeledPoint in order to apply the MLlib algorithms. Each Vector is composed of Double values of 0.0 (false) or 1.0 (true). The vectors are stored in an RDD, and the final RDD is of type
    val data_tmp: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector]
So, in the RDD there are Vectors created with
    def createArray(values: List[String]): Vector = {
        // One slot per tag: 1.0 if the tag appears in `values`, else 0.0.
        var arr: Array[Double] = new Array[Double](tags_table.size)
        tags_table.foreach(x => arr(x._2) = if (values.contains(x._1)) 1.0 else 0.0)
        val dv: Vector = Vectors.dense(arr)
        return dv
    }
    /* each element of result is a List[String] */
    val data_tmp = result.map(x => createArray(x._2))
    val data: RowMatrix = new RowMatrix(data_tmp)
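The one-hot encoding inside createArray can be exercised without Spark. A minimal plain-Scala sketch, assuming a hypothetical tagsTable mapping each tag name to its slot in the feature vector (the original tags_table is not shown in the question):

```scala
// Hypothetical tag table: tag name -> index in the feature array.
val tagsTable: Map[String, Int] = Map("spark" -> 0, "scala" -> 1, "mllib" -> 2)

// Same logic as createArray, minus Spark: every tag present in `values`
// sets its slot to 1.0; all other slots stay at the default 0.0.
def toDenseArray(values: List[String]): Array[Double] = {
  val arr = new Array[Double](tagsTable.size)
  tagsTable.foreach { case (tag, idx) =>
    arr(idx) = if (values.contains(tag)) 1.0 else 0.0
  }
  arr
}

println(toDenseArray(List("spark", "mllib")).mkString(","))  // 1.0,0.0,1.0
```

In the real job the resulting array would be wrapped with Vectors.dense(arr) before going into the RDD.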
How can I create, from the RDD (data_tmp) or from the RowMatrix (data), a LabeledPoint set for use with the MLlib algorithms? For example, I need to apply the linear SVM algorithm shown here.
I found the solution:
    def createArray(values: List[String]): Vector = {
        var arr: Array[Double] = new Array[Double](tags_table.size)
        tags_table.foreach(x => arr(x._2) = if (values.contains(x._1)) 1.0 else 0.0)
        val dv: Vector = Vectors.dense(arr)
        return dv
    }
    val data_tmp = result.map(x => createArray(x._2))
    val parsedData = data_tmp.map { line => LabeledPoint(1.0, line) }
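Note that this map assigns the constant label 1.0 to every point, while a binary classifier such as SVM needs examples of both classes (0.0 and 1.0). A minimal plain-Scala sketch of deriving the label from the data instead, assuming hypothetical records of (id, tags) pairs where a record counts as positive if it carries the tag "spark":

```scala
// Hypothetical input records: (id, tags).
val records = List(("a", List("spark", "mllib")), ("b", List("scala")))

// Mirror data_tmp.map { line => LabeledPoint(label, line) } on plain
// collections: the label comes from the record, not a hard-coded constant.
val labeled: List[(Double, List[String])] =
  records.map { case (_, tags) =>
    (if (tags.contains("spark")) 1.0 else 0.0, tags)
  }

println(labeled.map(_._1))  // List(1.0, 0.0)
```

With Spark, the same idea would produce `LabeledPoint(label, createArray(tags))` inside the map over the RDD.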
Tags: apache scala label apache-spark mllib