7/13/2018

Short Notes - Flink Architecture - Task Execution



Flink within YARN




Application Submission and Component interaction



Application


The application consists of a so-called JobGraph, a logical dataflow graph

Job Manager

Single application control by single Job Manager



For each job client is submitted , the session will spawn its own JobManager.

  • All Jobs run under session-user credentials
  •  Resource Manager holds on to containers for a certain time
  •  Internally, sessions built on the dispatcher component. 
  •  TaskManager == YARN Container


Data Flow Graph




Blue Nodes == tasks
Edge == data dependency

Data Exchange Strategies





Forward

Sending data from one task to a receiving task

Broadcast

Send all the data to all parallel tasks

Key-based

Guarantees that data items having the same key will be processed by the same task

Random

Data distribute randomly to tasks

Parallel Task Execution



  • C & A operators are data sources while E is a data sink.   
  • TaskManager runs only tasks of one application
  • Scheduling tasks as slices to slots has the advantage that many tasks are co-located on the TaskManager which means that they can efficiently exchange data without accessing the network.


No comments:

Post a Comment