[feature](segcompaction) enable segcompaction by default (apache#18722)
Signed-off-by: freemandealer <[email protected]>
freemandealer committed Apr 17, 2023
1 parent 74d424e commit 16cdd9e
Showing 5 changed files with 16 additions and 16 deletions.
2 changes: 1 addition & 1 deletion be/src/common/config.h
@@ -875,7 +875,7 @@ CONF_String(be_node_role, "mix");
// Hide the be config page for webserver.
CONF_Bool(hide_webserver_config_page, "false");

CONF_Bool(enable_segcompaction, "false"); // currently only support vectorized storage
CONF_Bool(enable_segcompaction, "true");

// Trigger segcompaction if the num of segments in a rowset exceeds this threshold.
CONF_Int32(segcompaction_threshold_segment_num, "10");
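
`CONF_Bool`/`CONF_Int32` declare settings read from `be.conf` at startup, so the flipped default applies to any BE started without an explicit override. A minimal sketch of opting back out or tuning the trigger, where the threshold value shown is just its declared default:

```
# be.conf: restore the pre-change behavior
enable_segcompaction=false
# segments in a rowset that trigger segcompaction (declared default)
segcompaction_threshold_segment_num=10
```
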
10 changes: 5 additions & 5 deletions docs/en/docs/admin-manual/config/be-config.md
@@ -7,7 +7,7 @@
}
---

<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
@@ -246,7 +246,7 @@ There are two ways to configure BE configuration items:

* Description: This configuration is mainly used to modify the brpc parameter `socket_max_unwritten_bytes`.
  - Sometimes a query fails and the error `The server is overcrowded` appears in the BE log, meaning too many messages are buffered on the sender side; this can happen when a SQL query needs to send a large bitmap value. Increasing this configuration avoids the error.
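
A minimal `be.conf` sketch for this tuning; the key name `brpc_socket_max_unwritten_bytes` and the 2 GB value are assumptions for illustration, since the entry's own heading sits outside this hunk:

```
# raise brpc's send-side buffer limit to 2 GB
brpc_socket_max_unwritten_bytes=2147483648
```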

#### `transfer_large_data_by_brpc`

* Type: bool
@@ -279,7 +279,7 @@ There are two ways to configure BE configuration items:

* Type: string
* Description: This configuration indicates the service model used by the FE's Thrift service. The value is a case-insensitive string and must be consistent with the FE's `thrift_server_type` setting. Two values are currently supported, `THREADED` and `THREAD_POOL`; a config sketch follows this list.

- If the parameter is `THREADED`, the model is a non-blocking I/O model.

- If the parameter is `THREAD_POOL`, the model is a blocking I/O model.
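
A sketch of keeping the two sides consistent. The BE-side key name `thrift_server_type_of_fe` is an assumption here, since the entry's own heading sits outside this hunk:

```
# fe.conf
thrift_server_type=THREAD_POOL

# be.conf -- assumed key name; must match the FE setting above
thrift_server_type_of_fe=THREAD_POOL
```
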
@@ -628,8 +628,8 @@ Metrics: {"filtered_rows":0,"input_row_num":3346807,"input_rowsets_count":42,"in
#### `enable_segcompaction`

* Type: bool
- * Description: Enable segment compaction during loading
- * Default value: false
+ * Description: Enable segment compaction during loading to avoid the -238 error
+ * Default value: true

#### `segcompaction_threshold_segment_num`

4 changes: 2 additions & 2 deletions docs/en/docs/faq/data-faq.md
@@ -60,7 +60,7 @@ This error usually occurs during data import operations. The error code is -235.

This error is usually caused by an import frequency that is too high, outpacing the compaction of background data, so versions pile up and eventually exceed the limit. First run the `show tablet 27306172` statement, then execute the `show proc` statement from its result to check the status of each replica of the tablet. The `versionCount` in the result is the number of versions. If a replica has too many versions, reduce the import frequency or stop importing and watch whether the version count drops. If it does not drop after imports stop, check the be.INFO log on the corresponding BE node, search for the tablet id and the compaction keyword, and verify that compaction is running normally. For compaction tuning, refer to the ApacheDoris official account article: Doris Best Practices - Compaction Tuning (3)
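
A sketch of that check from a shell, assuming a MySQL client connected to the FE query port (9030 by default); the `SHOW PROC` path is a placeholder you copy from the `DetailCmd` field of the first result:

```sh
# Step 1: look up the tablet; the output includes a DetailCmd field
mysql -h <fe_host> -P 9030 -uroot -e "SHOW TABLET 27306172\G"

# Step 2: run the statement copied from DetailCmd and inspect the
# VersionCount column of each replica
mysql -h <fe_host> -P 9030 -uroot \
  -e "SHOW PROC '/dbs/<db_id>/<table_id>/partitions/<partition_id>/<index_id>/27306172';"
```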

- The -238 error usually occurs when the same batch of imported data is too large, resulting in too many segment files for a tablet (default is 200, controlled by the BE parameter `max_segment_num_per_rowset`). In that case, reduce the amount of data imported in one batch, or raise the BE parameter appropriately.
+ The -238 error usually occurs when the same batch of imported data is too large, resulting in too many segment files for a tablet (default is 200, controlled by the BE parameter `max_segment_num_per_rowset`). In that case, reduce the amount of data imported in one batch, or raise the BE parameter appropriately. Since version 2.0, users can also enable the segment compaction feature to reduce the number of segment files by setting `enable_segcompaction=true` in the BE config.
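
A `be.conf` sketch of the two remedies; the raised cap is an illustrative value, not a recommendation:

```
# lift the per-rowset segment cap (default 200; value illustrative)
max_segment_num_per_rowset=400
# since 2.0: merge small segments during load
enable_segcompaction=true
```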

### Q5. tablet 110309738 has few replicas: 1, alive backends: [10003]

@@ -152,7 +152,7 @@ broker_timeout_ms = 10000

Adding parameters here requires restarting the FE service.

### Q11. [ Routine load ] ReasonOfStateChanged: ErrorReason{code=errCode = 104, msg='be 10004 abort task with reason: fetch failed due to requested offset not available on the broker: Broker: Offset out of range'}

This problem occurs because Kafka's retention policy defaults to 7 days. When a routine load task is suspended for some reason and not resumed for a long time, the routine load still records the last consumed offset when the task resumes; if Kafka's cleanup has already removed that offset, this error appears.

10 changes: 5 additions & 5 deletions docs/zh-CN/docs/admin-manual/config/be-config.md
@@ -7,7 +7,7 @@
}
---

<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
@@ -190,7 +190,7 @@ This config will not survive a BE restart. To persist the modification, use the following
* Description: When BE starts, every path under the ``storage_root_path`` config is checked (see the sketch after this list).

- `ignore_broken_disk=true`

  If a path does not exist, or files under it cannot be read or written (broken disk), the path is ignored, and startup continues as long as other usable paths remain.

- `ignore_broken_disk=false`

  If a path does not exist, or files under it cannot be read or written (broken disk), startup is aborted.
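
A minimal `be.conf` sketch of this setting; the storage paths here are illustrative, not defaults:

```
# illustrative data paths
storage_root_path=/data1/doris;/data2/doris
# skip bad paths at startup instead of aborting
ignore_broken_disk=true
```
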
@@ -642,8 +642,8 @@ Metrics: {"filtered_rows":0,"input_row_num":3346807,"input_rowsets_count":42,"in
#### `enable_segcompaction`

* Type: bool
- * Description: Perform segment compaction during load to reduce the number of segments
- * Default value: false
+ * Description: Perform segment compaction during load to reduce the number of segments and avoid the -238 error at write time
+ * Default value: true

#### `segcompaction_threshold_segment_num`

@@ -1303,7 +1303,7 @@ load tablets from header failed, failed tablets size: xxx, path=xxx
#### `jvm_max_heap_size`

* Type: string
* Description: Maximum JVM heap memory used by BE, i.e. the JVM -Xmx parameter
* Default value: 1024M
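
A one-line `be.conf` sketch; the 2048M value is illustrative and follows the format of the documented default:

```
# passed to the BE JVM as -Xmx
jvm_max_heap_size=2048M
```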

</version>
6 changes: 3 additions & 3 deletions docs/zh-CN/docs/faq/data-faq.md
@@ -60,7 +60,7 @@ Tables with the Unique Key model are business-friendly, because of their distinctive

This error is usually caused by an import frequency that is too high, outpacing the compaction of background data, so versions pile up and eventually exceed the limit. First run the `show tablet 27306172` statement, then execute the `show proc` statement from its result to check the status of each replica of the tablet. The `versionCount` in the result is the number of versions. If a replica has too many versions, reduce the import frequency or stop importing and watch whether the version count drops. If it does not drop after imports stop, check the be.INFO log on the corresponding BE node, search for the tablet id and the compaction keyword, and verify that compaction is running normally. For compaction tuning, see the ApacheDoris official WeChat account article: Doris Best Practices - Compaction Tuning (3)

- The -238 error usually occurs when a single batch of imported data is too large, producing too many segment files for one tablet (default is 200, controlled by the BE parameter `max_segment_num_per_rowset`). In that case, reduce the amount of data imported per batch, or raise the BE parameter appropriately.
+ The -238 error usually occurs when a single batch of imported data is too large, producing too many segment files for one tablet (default is 200, controlled by the BE parameter `max_segment_num_per_rowset`). In that case, reduce the amount of data imported per batch, or raise the BE parameter appropriately. Since version 2.0, the segment compaction feature can be enabled to reduce the number of segment files (`enable_segcompaction=true` in the BE config).

### Q5. tablet 110309738 has few replicas: 1, alive backends: [10003]

@@ -92,7 +92,7 @@ Tables with the Unique Key model are business-friendly, because of their distinctive

You can upgrade to Doris 0.15 or later, where this problem has been fixed.

### Q8. Error -214 reported when executing import or query

When performing operations such as import or query, you may encounter the following error:

@@ -150,7 +150,7 @@ broker_timeout_ms = 10000

Adding this parameter requires restarting the FE service.

### Q11. [ Routine load ] ReasonOfStateChanged: ErrorReason{code=errCode = 104, msg='be 10004 abort task with reason: fetch failed due to requested offset not available on the broker: Broker: Offset out of range'}

This happens because Kafka's retention policy defaults to 7 days. When a routine load task is suspended for some reason and not resumed for a long time, the routine load still records the consumed offset when the task resumes; if Kafka's cleanup policy has already removed that offset, this problem occurs.

