Spark Streaming UpdateStateByKey




I am running Spark Streaming 24x7 and using the updateStateByKey function to save the computed historical info, as in the NetworkWordCount example.

I tried to stream a file with 3 lakh (300,000) records, with a 1 sec sleep after every 1500 records. I am using 3 workers.

Over a period of time the updateStateByKey state keeps growing, and the program throws the following exception:

ERROR Executor: Exception in task ID 1635
java.lang.ArrayIndexOutOfBoundsException: 3

14/10/23 21:20:43 ERROR TaskSetManager: Task 29170.0:2 failed 1 times; aborting job
14/10/23 21:20:43 ERROR DiskBlockManager: Exception while deleting local spark dir: /var/folders/3j/9hjkw0890sx_qg9yvzlvg64cf5626b/t/spark-local-20141023204346-b232
java.io.IOException: Failed to delete: /var/folders/3j/9hjkw0890sx_qg9yvzlvg64cf5626b/t/spark-local-20141023204346-b232/24
14/10/23 21:20:43 ERROR Executor: Exception in task ID 8037
java.io.FileNotFoundException: /var/folders/3j/9hjkw0890sx_qg9yvzlvg64cf5626b/t/spark-local-20141023204346-b232/22/shuffle_81_0_1 (No such file or directory)
    at java.io.FileOutputStream.open(Native Method)

How do I handle this? I guess the updateStateByKey state should be reset periodically, since it is growing at a rapid rate. Please share an example of when and how to reset updateStateByKey. Or is there some other problem? Please shed some light.

Any help is much appreciated. Thanks for your time.

Did you set the checkpoint directory with ssc.checkpoint("path to checkpoint")?
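
On the "when and how to reset" part: as far as I know, updateStateByKey drops a key from the state when the update function returns None for it, and the function is also called for existing keys that got no new values in a batch. Below is a minimal sketch of an update function that uses this to expire idle keys. It is not a confirmed fix for the exception above. MAX_IDLE_BATCHES, the (count, idle) state layout and the name update_count are made up for illustration, and the PySpark wiring in the trailing comments assumes a DStream of (word, 1) pairs called pairs.

```python
from typing import List, Optional, Tuple

# Hypothetical threshold: drop a key after this many batches without new data.
MAX_IDLE_BATCHES = 3

# Assumed state layout: (running count, consecutive batches with no new values).
State = Tuple[int, int]


def update_count(new_values: List[int], state: Optional[State]) -> Optional[State]:
    """Update function in the shape updateStateByKey expects.

    Returning None asks Spark to remove the key, so the state
    cannot grow forever with keys that are no longer seen.
    """
    count, idle = state if state is not None else (0, 0)
    if new_values:
        # Fresh data for this key: add it up and reset the idle counter.
        return (count + sum(new_values), 0)
    idle += 1
    if idle >= MAX_IDLE_BATCHES:
        # Key has been idle for too long: drop it from the state.
        return None
    return (count, idle)


# Wiring sketch (needs a running StreamingContext `ssc`, not executed here):
#   ssc.checkpoint("path/to/checkpoint")   # placeholder directory
#   running_counts = pairs.updateStateByKey(update_count)
```

Whether expiring keys is acceptable depends on the job. If every key has to be kept forever, the state will keep growing, and checkpointing plus more memory or disk is the only lever left.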

spark-streaming
