scala - How to ignore lines with missing fields in the database
So I'm following a tutorial on Spark using Scala, and I'm working with a Wikimedia page-view dataset. I'm interested in generating a histogram of total page views by language. The first column is the language, while the 3rd column is the page views. However, it seems some lines in the dataset do not have a 3rd field, so I get an ArrayIndexOutOfBoundsException when I run the following code:
scala> val tuples = pagecounts.map(line => line.split(" "))
scala> val keyValuePairs = tuples.map(line => (line(0).substring(0, 2), line(2).toInt))
scala> keyValuePairs.reduceByKey(_ + _, 1).collect
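To see why the error appears, here is a minimal plain-Scala sketch (no Spark required) that applies the same parsing to one complete line and one truncated line. The object name WhyItFails and both sample lines are hypothetical, made up for illustration. Splitting "fr Accueil" on spaces yields only two fields, so fields(2) throws. In Spark the failure only surfaces once an action such as collect runs, because map is a lazy transformation.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical demo object: reproduces the question's parsing on plain strings, without Spark.
object WhyItFails {
  // Made-up sample lines, not taken from the real Wikimedia dump.
  val sampleLines: Seq[String] = Seq(
    "en Main_Page 42 1000", // 4 fields: index 2 ("42") exists
    "fr Accueil"            // 2 fields: index 2 does not exist
  )

  // Same logic as the two map steps in the question, applied to a single line.
  def parse(line: String): (String, Int) = {
    val fields = line.split(" ")
    (fields(0).substring(0, 2), fields(2).toInt)
  }

  def main(args: Array[String]): Unit =
    sampleLines.foreach { line =>
      Try(parse(line)) match {
        case Success(pair) => println(s"parsed $pair")
        case Failure(e)    => println(s"'$line' failed with ${e.getClass.getSimpleName}")
      }
    }
}
```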
Does anyone have an idea how to ignore the lines that are missing the 3rd field, so I can run the query only against the lines that contain it?
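You want to filter the page counts down to the lines that actually have a 3rd field (at least 3 fields) before operating on them. Apply filter to the split arrays, not to the individual tokens, to select those: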
val tuples = pagecounts.map(line => line.split(" ")).filter(fields => fields.length >= 3)
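A minimal sketch of the whole filter-then-aggregate pipeline, run on an ordinary Scala Seq instead of an RDD so it works without a cluster. Assumptions: the object name PageViewsByLanguage and the sample lines are hypothetical, and groupMapReduce (Scala 2.13+) stands in for reduceByKey. The robust function is an extra that is not part of the answer: it also skips lines whose 3rd field is not a number, which the length filter alone does not catch.

```scala
// Hypothetical sketch: the answer's filter-then-aggregate pipeline on a plain Seq instead of an RDD.
// groupMapReduce (Scala 2.13+) stands in for Spark's reduceByKey here.
object PageViewsByLanguage {
  // Made-up sample lines; "fr Accueil" is missing its 3rd field.
  val sampleLines: Seq[String] = Seq(
    "en Main_Page 42 1000",
    "en Special:Search 8 500",
    "fr Accueil",
    "fr Paris 5 300"
  )

  // As in the answer: split each line, then keep only arrays that have a 3rd field.
  def withFilter(lines: Seq[String]): Map[String, Int] =
    lines
      .map(_.split(" "))
      .filter(_.length >= 3)
      .map(fields => (fields(0).substring(0, 2), fields(2).toInt))
      .groupMapReduce(_._1)(_._2)(_ + _) // sum views per language prefix

  // Stricter variant (not from the answer): also drops lines whose 3rd field is not an integer,
  // and uses take(2) so a language code shorter than 2 characters cannot throw.
  def robust(lines: Seq[String]): Map[String, Int] =
    lines
      .map(_.split(" "))
      .flatMap { fields =>
        if (fields.length >= 3) fields(2).toIntOption.map(views => (fields(0).take(2), views))
        else None
      }
      .groupMapReduce(_._1)(_._2)(_ + _)

  def main(args: Array[String]): Unit = {
    println(withFilter(sampleLines))
    // withFilter would throw NumberFormatException on the extra line; robust skips it.
    println(robust(sampleLines :+ "de Berlin x 200"))
  }
}
```

With these sample lines both functions sum to 50 views for en and 5 for fr, because the truncated "fr Accueil" line is dropped. On a real RDD the same chain of map, filter, and flatMap calls applies unchanged, with reduceByKey(_ + _, 1) followed by collect in place of groupMapReduce.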