Flink within YARN
Application Submission and Component interaction
Application
The application consists of a so-called JobGraph, a logical dataflow graph
Job Manager
Single application control by single Job Manager
For each job client is submitted , the session will spawn its own JobManager.
- All Jobs run under session-user credentials
- Resource Manager holds on to containers for a certain time
- Internally, sessions built on the dispatcher component.
- TaskManager == YARN Container
Data Flow Graph
Blue Nodes == tasks
Edge == data dependency
Data Exchange Strategies
Forward
Sending data from one task to a receiving task
Broadcast
Send all the data to all parallel tasks
Key-based
Guarantees that data items having the same key will be processed by the same task
Random
Data distribute randomly to tasks
Parallel Task Execution
- C & A operators are data sources while E is a data sink.
- TaskManager runs only tasks of one application
- Scheduling tasks as slices to slots has the advantage that many tasks are co-located on the TaskManager which means that they can efficiently exchange data without accessing the network.
No comments:
Post a Comment