Skip to content

Latest commit

 

History

History
61 lines (47 loc) · 1.8 KB

File metadata and controls

61 lines (47 loc) · 1.8 KB

第92课:(重要)SparkStreaming中Tanformations和状态管理解密

标签: sparkIMF


只要你会写任何一门编程语言的代码,大数据Spark和Hadoop你已经会60%左右了!

##介绍Spark Streaming中的Transformations和Actions

###Transformations on DStreams

  • map(func)
  • fletMap(func)
  • filter(func)
  • repartition(numPartitions)
  • union(other Stream)
  • count()
  • reduce(func)
  • countByValue()
  • reduceByKey(func,[number Tasks])
  • join(otherStream,[num Tasks])
  • cogroup(otherStream,[num Tasks])
  • transform(func) 直接对RDD进行操作
  • updateStateByKey(func) updateStateByKey的时候它有前面的状态,也有当前的状态,这个所谓的当前的状态,其实就是指过去进行的时间间隔,它会更新前面的值

###Window Operations

  • window(windowLength, slideInterval)
  • countByWindow(windowLength, slideInterval)
  • reduceByWindow(func, windowLength, slideInterval)
  • reduceByKeyAndWindow(func, windowLength, slideInterval, [numTasks])
  • reduceByKeyAndWindow(func, invFunc, windowLength, slideInterval, [numTasks])
  • countByValueAndWindow(windowLength, slideInterval, [numTasks])

###Output Operations on DStreams

  • print()
  • saveAsTextFiles(prefix, [suffix])
  • saveAsObjectFiles(prefix, [suffix])
  • saveAsHadoopFiles(prefix, [suffix])
  • foreachRDD(func)

##状态管理 1.6.x新推出mapWithState

def mapWithState[StateType: ClassTag, MappedType: ClassTag](
      spec: StateSpec[K, V, StateType, MappedType]
    ): MapWithStateDStream[K, V, StateType, MappedType] = {
    new MapWithStateDStreamImpl[K, V, StateType, MappedType](
      self,
      spec.asInstanceOf[StateSpecImpl[K, V, StateType, MappedType]]
    )
  }

mapWithState和updateStateByKey是同样一种类型的操作。

##阅读源码

  • PairDStreamFunctions
  • StateDStream