scala - How to ignore lines with missing fields in the database




So I'm following a tutorial on Spark using Scala, and I'm working with this dataset from Wikimedia. I'm interested in generating a histogram of total page views by language. The first column is the language, while the 3rd column is the page views. However, it seems some lines in the database do not have a field in the 3rd column, so I get an ArrayIndexOutOfBoundsException when I run the following code.

scala> val tuples = pagecounts.map(line => line.split(" "))
scala> val keyValuePairs = tuples.map(line => (line(0).substring(0, 2), line(2).toInt))
scala> keyValuePairs.reduceByKey(_ + _, 1).collect

Does anyone have an idea how to ignore the lines that have a missing field in the 3rd column, so that I can run the query against only the lines that do contain a 3rd column?

You want to filter the page counts down to the ones that have 3 fields before they are operated on. Use filter on the split lines to select those:

val tuples = pagecounts.map(line => line.split(" ")).filter(_.length == 3)
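
To see the whole pipeline in one place, here is a minimal sketch that runs without a cluster. It uses a plain Scala Seq in place of the pagecounts RDD, and the sample lines, the object name PageViewsByLanguage and the method name totals are all made up for illustration. It also assumes that a length check of >= 3 is acceptable, so that lines carrying extra columns are kept; use == 3 if you only want lines with exactly three fields. The key point is that filter is applied to the collection of split arrays, not to the fields inside a single line.

```scala
object PageViewsByLanguage {
  // Hypothetical sample data: "<language> <page title> <views>".
  // The third line has no page-view field and the last line is empty.
  val sampleLines: Seq[String] = Seq(
    "en Main_Page 10",
    "en Scala 5",
    "fr Accueil",
    "de Hauptseite 7",
    ""
  )

  def totals(lines: Seq[String]): Map[String, Int] =
    lines
      .map(_.split(" "))                            // split each line into fields
      .filter(_.length >= 3)                        // drop lines without a 3rd field
      .map(f => (f(0).take(2), f(2).toInt))         // (language prefix, page views)
      .groupBy(_._1)                                // local stand-in for reduceByKey
      .map { case (lang, pairs) => lang -> pairs.map(_._2).sum }
}
```

On a real RDD the same map and filter steps should apply unchanged, with the last two lines replaced by reduceByKey(_ + _). Note that toInt will still throw if the 3rd field is present but not a number; this sketch does not guard against that.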

database scala bigdata apache-spark
