Convert RDD of Vector to LabeledPoint using Scala - MLlib in Apache Spark -


I'm using MLlib from Apache Spark with Scala. I need to convert a group of Vectors

import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.regression.LabeledPoint

into LabeledPoints in order to apply the algorithms of MLlib. Each Vector is composed of Double values of 0.0 (false) or 1.0 (true). The Vectors are saved in an RDD, so the final RDD is of type

val data_tmp: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector]

So, the Vectors in the RDD are created with

def createArray(values: List[String]): Vector = {
    val arr: Array[Double] = new Array[Double](tags_table.size)
    tags_table.foreach(x => arr(x._2) = if (values.contains(x._1)) 1.0 else 0.0)
    val dv: Vector = Vectors.dense(arr)
    return dv
}
/* each element of result is a List[String] */
val data_tmp = result.map(x => createArray(x._2))
val data: RowMatrix = new RowMatrix(data_tmp)
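The binary-encoding step inside createArray can be checked without Spark at all. The following is a minimal, self-contained sketch of that logic; the tags_table contents and the List[String] inputs are invented stand-ins for the question's data, which is not shown.

```scala
object EncodingSketch {
  // Hypothetical tag -> column-index map standing in for the question's tags_table.
  val tags_table: Map[String, Int] = Map("red" -> 0, "green" -> 1, "blue" -> 2)

  // Build a 0.0/1.0 array: 1.0 in the column of every tag present in `values`,
  // 0.0 everywhere else — the same encoding the question's createArray performs
  // before wrapping the array in Vectors.dense.
  def createArray(values: List[String]): Array[Double] = {
    val arr = new Array[Double](tags_table.size)
    tags_table.foreach { case (tag, idx) =>
      arr(idx) = if (values.contains(tag)) 1.0 else 0.0
    }
    arr
  }
}
```

For example, `EncodingSketch.createArray(List("red", "blue"))` yields `Array(1.0, 0.0, 1.0)` under the map above.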

How can I turn the RDD (data_tmp) or the RowMatrix (data) into a LabeledPoint set usable by the MLlib algorithms? For example, I need to apply the SVM linear algorithms shown here.

I found the solution:

def createArray(values: List[String]): Vector = {
    val arr: Array[Double] = new Array[Double](tags_table.size)
    tags_table.foreach(x => arr(x._2) = if (values.contains(x._1)) 1.0 else 0.0)
    val dv: Vector = Vectors.dense(arr)
    return dv
}
val data_tmp = result.map(x => createArray(x._2))
val parsedData = data_tmp.map { line => LabeledPoint(1.0, line) }
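Note that the solution above assigns the label 1.0 to every point, which produces a one-class dataset that a binary SVM cannot learn anything from. For training, the label should reflect the actual class of each example. Below is a hedged, non-runnable sketch of how that might look: it assumes a SparkContext, the question's `result` RDD and `createArray` function, and invents `x._1` as the place where the class is stored — substitute however your data actually encodes it.

```scala
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.classification.SVMWithSGD

// Hypothetical: derive the label from each element rather than hard-coding 1.0.
// `x._1 == "positive"` is an invented condition; adapt it to your data.
val parsedData = result.map { x =>
  val label = if (x._1 == "positive") 1.0 else 0.0
  LabeledPoint(label, createArray(x._2))
}
parsedData.cache()  // SGD iterates over the data many times

// Train a linear SVM, as in the MLlib linear-methods guide (100 iterations).
val model = SVMWithSGD.train(parsedData, 100)
```

With labels of both classes present, `model.predict` can then be applied to new Vectors built with the same createArray encoding.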

Tags: apache, scala, label, apache-spark, mllib