docs/content/docs/deployment/elastic_scaling.md
Lines changed: 3 additions & 4 deletions
@@ -169,9 +169,9 @@ To use Adaptive Batch Scheduler, you need to:
 - Leave the [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" >}}#execution-batch-shuffle-mode) unset or explicitly set it to `ALL-EXCHANGES-BLOCKING` (default value) due to ["ALL-EXCHANGES-BLOCKING jobs only"](#limitations-2).
 
 In addition, there are several related configuration options that may need adjustment when using Adaptive Batch Scheduler:
-- [`jobmanager.adaptive-batch-scheduler.min-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-min-parallelism): The lower bound of allowed parallelism to set adaptively. Currently, this option should be configured as a power of 2, otherwise it will be rounded up to a power of 2 automatically.
-- [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-max-parallelism): The upper bound of allowed parallelism to set adaptively. Currently, this option should be configured as a power of 2, otherwise it will be rounded down to a power of 2 automatically.
-- [`jobmanager.adaptive-batch-scheduler.avg-data-volume-per-task`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-avg-data-volume-per-task): The average size of data volume to expect each task instance to process. Note that since the parallelism of the vertices is adjusted to a power of 2, the actual average size will be 0.75~1.5 times this value. It is also important to note that when data skew occurs, or the decided parallelism reaches the max parallelism (due to too much data), the data actually processed by some tasks may far exceed this value.
+- [`jobmanager.adaptive-batch-scheduler.min-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-min-parallelism): The lower bound of allowed parallelism to set adaptively.
+- [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-max-parallelism): The upper bound of allowed parallelism to set adaptively.
+- [`jobmanager.adaptive-batch-scheduler.avg-data-volume-per-task`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-avg-data-volume-per-task): The average size of data volume to expect each task instance to process. Note that when data skew occurs, or the decided parallelism reaches the max parallelism (due to too much data), the data actually processed by some tasks may far exceed this value.
 - [`jobmanager.adaptive-batch-scheduler.default-source-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-default-source-parallelism): The default parallelism of data source.
 
 #### Set the parallelism of operators to `-1`
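After this change, `min-parallelism`, `max-parallelism` and `avg-data-volume-per-task` interact roughly as sketched below. This is a simplified, hypothetical model, not Flink's implementation. It assumes the number of bytes a vertex will consume is known before its parallelism is decided, and it does not model `default-source-parallelism` or data skew. The class and method names are invented for illustration.

```java
/**
 * Hypothetical, simplified model of deciding a vertex's parallelism from the
 * volume of data it consumes. Not Flink's implementation; names are invented.
 */
public final class ParallelismSketch {

    private ParallelismSketch() {}

    /**
     * @param consumedBytes        assumed total bytes this vertex will consume
     * @param avgDataVolumePerTask stands in for avg-data-volume-per-task, in bytes
     * @param minParallelism       stands in for min-parallelism
     * @param maxParallelism       stands in for max-parallelism
     */
    public static int decideParallelism(
            long consumedBytes, long avgDataVolumePerTask, int minParallelism, int maxParallelism) {
        if (consumedBytes < 0 || avgDataVolumePerTask <= 0
                || minParallelism < 1 || maxParallelism < minParallelism) {
            throw new IllegalArgumentException("invalid arguments");
        }
        // Ceiling division: the smallest task count whose average load is <= avgDataVolumePerTask.
        long wanted = (consumedBytes + avgDataVolumePerTask - 1) / avgDataVolumePerTask;
        // Clamp into [min, max]; no power-of-2 rounding is applied.
        return (int) Math.max(minParallelism, Math.min(maxParallelism, wanted));
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        // 10 GiB at 1 GiB per task: 10 tasks.
        System.out.println(decideParallelism(10 * gib, gib, 1, 128));
        // 3 GiB gives 3 tasks: the result need not be a power of 2.
        System.out.println(decideParallelism(3 * gib, gib, 1, 128));
        // 1 TiB would need 1024 tasks but is capped at 128,
        // so each task processes about 8 GiB, far above the configured average.
        System.out.println(decideParallelism(1024 * gib, gib, 1, 128));
    }
}
```

The last call illustrates the caveat kept in the option text: once the cap is reached, the per-task volume is no longer bounded by the configured average.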
@@ -190,7 +190,6 @@ Adaptive Batch Scheduler will only decide parallelism for operators whose parall
 
 - **Batch jobs only**: Adaptive Batch Scheduler only supports batch jobs. Exception will be thrown if a streaming job is submitted.
 - **ALL-EXCHANGES-BLOCKING jobs only**: At the moment, Adaptive Batch Scheduler only supports jobs whose [shuffle mode]({{< ref "docs/deployment/config" >}}#execution-batch-shuffle-mode) is `ALL-EXCHANGES-BLOCKING`.
-- **The decided parallelism will be a power of 2**: In order to ensure downstream tasks to consume the same count of subpartitions, the configuration option [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-max-parallelism) should be set to be a power of 2 (2^N), and the decided parallelism will also be a power of 2 (2^M and M <= N).
 - **FileInputFormat sources are not supported**: FileInputFormat sources are not supported, including `StreamExecutionEnvironment#readFile(...)`, `StreamExecutionEnvironment#readTextFile(...)` and `StreamExecutionEnvironment#createInput(FileInputFormat, ...)`. Users should use the new sources ([FileSystem DataStream Connector]({{< ref "docs/connectors/datastream/filesystem.md" >}}) or [FileSystem SQL Connector]({{< ref "docs/connectors/table/filesystem.md" >}})) to read files when using the Adaptive Batch Scheduler.
 - **Inconsistent broadcast results metrics on WebUI**: In Adaptive Batch Scheduler, for broadcast results, the number of bytes/records sent by the upstream task counted by metric is not equal to the number of bytes/records received by the downstream task, which may confuse users when displayed on the Web UI. See [FLIP-187](https://cwiki.apache.org/confluence/display/FLINK/FLIP-187%3A+Adaptive+Batch+Job+Scheduler) for details.
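The removed "power of 2" limitation rests on simple arithmetic: with 2^N subpartitions and 2^M consumer tasks (M <= N), every task reads exactly 2^(N-M) subpartitions. The hypothetical sketch below assigns contiguous subpartition ranges proportionally. It shows that with an arbitrary parallelism the ranges stay within one subpartition of each other, which is one possible way such a constraint can be relaxed. It is not claimed to be how Flink assigns ranges, and the names are invented.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: split numSubpartitions subpartitions into contiguous
 * ranges, one range per consumer task. Not Flink's implementation.
 */
public final class SubpartitionRanges {

    /** Inclusive [start, end] range of subpartition indices read by one task. */
    public record Range(int start, int end) {
        public int size() {
            return end - start + 1;
        }
    }

    private SubpartitionRanges() {}

    public static List<Range> split(int numSubpartitions, int parallelism) {
        if (parallelism < 1 || parallelism > numSubpartitions) {
            throw new IllegalArgumentException("need 1 <= parallelism <= numSubpartitions");
        }
        List<Range> ranges = new ArrayList<>(parallelism);
        for (int i = 0; i < parallelism; i++) {
            // Proportional boundaries (long math avoids overflow); sizes differ by at most one.
            int start = (int) ((long) i * numSubpartitions / parallelism);
            int end = (int) ((long) (i + 1) * numSubpartitions / parallelism) - 1;
            ranges.add(new Range(start, end));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // 2^3 = 8 subpartitions, 2^2 = 4 tasks: every task reads exactly 2.
        System.out.println(split(8, 4));
        // 8 subpartitions, 3 tasks: sizes 2, 3, 3 (close, but not identical).
        System.out.println(split(8, 3));
    }
}
```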
-<td>The average size of data volume to expect each task instance to process if <code class="highlighter-rouge">jobmanager.scheduler</code> has been set to <code class="highlighter-rouge">AdaptiveBatch</code>. Note that since the parallelism of the vertices is adjusted to a power of 2, the actual average size will be 0.75~1.5 times this value. It is also important to note that when data skew occurs or the decided parallelism reaches the <code class="highlighter-rouge">jobmanager.adaptive-batch-scheduler.max-parallelism</code> (due to too much data), the data actually processed by some tasks may far exceed this value.</td>
+<td>The average size of data volume to expect each task instance to process if <code class="highlighter-rouge">jobmanager.scheduler</code> has been set to <code class="highlighter-rouge">AdaptiveBatch</code>. Note that when data skew occurs or the decided parallelism reaches the <code class="highlighter-rouge">jobmanager.adaptive-batch-scheduler.max-parallelism</code> (due to too much data), the data actually processed by some tasks may far exceed this value.</td>
-<td>The upper bound of allowed parallelism to set adaptively if <code class="highlighter-rouge">jobmanager.scheduler</code> has been set to <code class="highlighter-rouge">AdaptiveBatch</code>. Currently, this option should be configured as a power of 2, otherwise it will also be rounded down to a power of 2 automatically.</td>
+<td>The upper bound of allowed parallelism to set adaptively if <code class="highlighter-rouge">jobmanager.scheduler</code> has been set to <code class="highlighter-rouge">AdaptiveBatch</code></td>
-<td>The lower bound of allowed parallelism to set adaptively if <code class="highlighter-rouge">jobmanager.scheduler</code> has been set to <code class="highlighter-rouge">AdaptiveBatch</code>. Currently, this option should be configured as a power of 2, otherwise it will also be rounded up to a power of 2 automatically.</td>
+<td>The lower bound of allowed parallelism to set adaptively if <code class="highlighter-rouge">jobmanager.scheduler</code> has been set to <code class="highlighter-rouge">AdaptiveBatch</code></td>
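The removed option descriptions state three things about the old behaviour: `min-parallelism` was rounded up to a power of 2, `max-parallelism` was rounded down, and the actual per-task volume was 0.75~1.5 times `avg-data-volume-per-task`. The sketch below models this under an assumed rounding rule: round the ideal parallelism to the nearer power of 2, using 1.5 times the lower power as the cut-off. That rule is chosen because it reproduces the documented 0.75~1.5 range; it is not taken from Flink's source, and all names are hypothetical.

```java
/**
 * Hypothetical model of the removed power-of-2 behaviour. The midpoint
 * rounding rule is an assumption that reproduces the documented 0.75~1.5 range.
 */
public final class PowerOfTwoSketch {

    private PowerOfTwoSketch() {}

    /** Largest power of 2 that is <= x (requires x >= 1). */
    public static int floorPow2(int x) {
        return Integer.highestOneBit(x);
    }

    /** Smallest power of 2 that is >= x (requires 1 <= x <= 2^30). */
    public static int ceilPow2(int x) {
        int f = Integer.highestOneBit(x);
        return f == x ? x : f << 1;
    }

    /** Assumed rule: below 1.5 * lower power round down, otherwise round up. */
    public static int roundToPow2(int idealParallelism) {
        int down = floorPow2(idealParallelism);
        int up = ceilPow2(idealParallelism);
        return idealParallelism < (down + up) / 2 ? down : up;
    }

    public static void main(String[] args) {
        // Old-style bounds: min 3 rounds up to 4, max 100 rounds down to 64.
        System.out.println(ceilPow2(3) + " " + floorPow2(100));
        // Ratio of actual per-task volume to the configured average equals ideal / chosen.
        double lo = Double.MAX_VALUE;
        double hi = 0;
        for (int ideal = 1; ideal <= 1 << 20; ideal++) {
            double ratio = ideal / (double) roundToPow2(ideal);
            lo = Math.min(lo, ratio);
            hi = Math.max(hi, ratio);
        }
        // lo is 0.75; hi stays strictly below 1.5.
        System.out.println(lo + " " + hi);
    }
}
```

Ignoring the min/max clamp, a vertex that would ideally use 5 tasks gets 4 (each handling 1.25 times the average), while one that would ideally use 6 gets 8 (0.75 times).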
-<td>The average size of data volume to expect each task instance to process if <code class="highlighter-rouge">jobmanager.scheduler</code> has been set to <code class="highlighter-rouge">AdaptiveBatch</code>. Note that since the parallelism of the vertices is adjusted to a power of 2, the actual average size will be 0.75~1.5 times this value. It is also important to note that when data skew occurs or the decided parallelism reaches the <code class="highlighter-rouge">jobmanager.adaptive-batch-scheduler.max-parallelism</code> (due to too much data), the data actually processed by some tasks may far exceed this value.</td>
+<td>The average size of data volume to expect each task instance to process if <code class="highlighter-rouge">jobmanager.scheduler</code> has been set to <code class="highlighter-rouge">AdaptiveBatch</code>. Note that when data skew occurs or the decided parallelism reaches the <code class="highlighter-rouge">jobmanager.adaptive-batch-scheduler.max-parallelism</code> (due to too much data), the data actually processed by some tasks may far exceed this value.</td>
-<td>The upper bound of allowed parallelism to set adaptively if <code class="highlighter-rouge">jobmanager.scheduler</code> has been set to <code class="highlighter-rouge">AdaptiveBatch</code>. Currently, this option should be configured as a power of 2, otherwise it will also be rounded down to a power of 2 automatically.</td>
+<td>The upper bound of allowed parallelism to set adaptively if <code class="highlighter-rouge">jobmanager.scheduler</code> has been set to <code class="highlighter-rouge">AdaptiveBatch</code></td>
-<td>The lower bound of allowed parallelism to set adaptively if <code class="highlighter-rouge">jobmanager.scheduler</code> has been set to <code class="highlighter-rouge">AdaptiveBatch</code>. Currently, this option should be configured as a power of 2, otherwise it will also be rounded up to a power of 2 automatically.</td>
+<td>The lower bound of allowed parallelism to set adaptively if <code class="highlighter-rouge">jobmanager.scheduler</code> has been set to <code class="highlighter-rouge">AdaptiveBatch</code></td>
-<td>The average size of data volume to expect each task instance to process if <code class="highlighter-rouge">jobmanager.scheduler</code> has been set to <code class="highlighter-rouge">AdaptiveBatch</code>. Note that since the parallelism of the vertices is adjusted to a power of 2, the actual average size will be 0.75~1.5 times this value. It is also important to note that when data skew occurs or the decided parallelism reaches the <code class="highlighter-rouge">jobmanager.adaptive-batch-scheduler.max-parallelism</code> (due to too much data), the data actually processed by some tasks may far exceed this value.</td>
+<td>The average size of data volume to expect each task instance to process if <code class="highlighter-rouge">jobmanager.scheduler</code> has been set to <code class="highlighter-rouge">AdaptiveBatch</code>. Note that when data skew occurs or the decided parallelism reaches the <code class="highlighter-rouge">jobmanager.adaptive-batch-scheduler.max-parallelism</code> (due to too much data), the data actually processed by some tasks may far exceed this value.</td>
-<td>The upper bound of allowed parallelism to set adaptively if <code class="highlighter-rouge">jobmanager.scheduler</code> has been set to <code class="highlighter-rouge">AdaptiveBatch</code>. Currently, this option should be configured as a power of 2, otherwise it will also be rounded down to a power of 2 automatically.</td>
+<td>The upper bound of allowed parallelism to set adaptively if <code class="highlighter-rouge">jobmanager.scheduler</code> has been set to <code class="highlighter-rouge">AdaptiveBatch</code></td>
-<td>The lower bound of allowed parallelism to set adaptively if <code class="highlighter-rouge">jobmanager.scheduler</code> has been set to <code class="highlighter-rouge">AdaptiveBatch</code>. Currently, this option should be configured as a power of 2, otherwise it will also be rounded up to a power of 2 automatically.</td>
+<td>The lower bound of allowed parallelism to set adaptively if <code class="highlighter-rouge">jobmanager.scheduler</code> has been set to <code class="highlighter-rouge">AdaptiveBatch</code></td>