Conversation

@zyk6271

zyk6271 commented Jul 14, 2025

Description

Related

Testing


Checklist

Before submitting a Pull Request, please ensure the following:

  • 🚨 This PR does not introduce breaking changes.
  • All CI checks (GH Actions) pass.
  • Documentation is updated as needed.
  • Tests are updated or added as necessary.
  • Code is well-commented, especially in complex areas.
  • Git history is clean — commits are squashed to the minimum necessary.

@CLAassistant

CLAassistant commented Jul 14, 2025

CLA assistant check
All committers have signed the CLA.

@zyk6271
Author

zyk6271 commented Aug 25, 2025

May I ask what conflict there is here?

@alisitsyn
Collaborator

alisitsyn commented Aug 26, 2025

May I ask what conflict there is here?

Thank you for contribution and patience.

Although the fix looks simple and clear, it may cause a serious issue with other configurations.

This behavior depends on the configured priority of the Modbus tasks, user application tasks, and other system tasks, as well as the setting of CONFIG_FMB_MASTER_TIMEOUT_MS_RESPOND. It may not work as expected and can cause a serious issue when some group events are incorrectly cleared after processing in the poll task. I think the existing way of clearing the events, using the xEventGroupWaitBits() function with its xClearOnExit parameter set to pdTRUE, should be kept. This is the most common and safest way to wait for event bits and then automatically clear them in a single, atomic operation, which prevents race conditions.
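
For illustration, a minimal sketch of this atomic wait-and-clear pattern with the FreeRTOS event group API (the group handle, bit names, and timeout here are hypothetical, not the actual stack internals):

    #include "freertos/FreeRTOS.h"
    #include "freertos/event_groups.h"

    #define EV_FRAME_RECEIVED  (1 << 0)   /* hypothetical event bits */
    #define EV_ERROR_PROCESS   (1 << 1)

    static EventGroupHandle_t mb_events;  /* created with xEventGroupCreate() */

    static void wait_transaction_events(void)
    {
        /* Wait for either bit; xClearOnExit = pdTRUE clears the returned bits
         * in the same atomic operation, so no stale bit leaks into the next wait. */
        EventBits_t bits = xEventGroupWaitBits(
            mb_events,
            EV_FRAME_RECEIVED | EV_ERROR_PROCESS,
            pdTRUE,                    /* xClearOnExit: clear bits on return */
            pdFALSE,                   /* xWaitForAllBits: any one bit suffices */
            pdMS_TO_TICKS(1000));      /* response timeout */
        if (bits & EV_ERROR_PROCESS) {
            /* handle the error event */
        } else if (bits & EV_FRAME_RECEIVED) {
            /* handle the received frame */
        }   /* else: timed out, no bits were set */
    }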

@zyk6271
Author

zyk6271 commented Aug 27, 2025

You can try it yourself: when working in master mode and communicating with multiple slaves, if you use an RS485 tool to inject some interference data into the line, all subsequent events become misaligned. See the two screenshots below: in one, communication keeps failing after the interference data is mixed in; the other shows the log of normal communication.
[Two screenshots attached: the log of communication failing continuously after interference is injected, and the log of normal communication]

@alisitsyn
Collaborator

@zyk6271,

Please also send your sdkconfig file. Thanks.

@zyk6271
Author

zyk6271 commented Aug 28, 2025

sdkconfig.txt
It's in the attachment.

@alisitsyn
Collaborator

alisitsyn commented Aug 28, 2025

@zyk6271,

According to the log, your sdkconfig is misleading (in the code the response time is 1000 ms, while in the sdkconfig it is 5000 ms). I guess you override it in the master. The situation is a typical Modbus race condition: you set the response time to 1000 ms, but your actual slave processing time is a bit longer, 1012 ms. The response therefore arrives from the slave slightly after the master has already started the next transaction, so the master drops it; and because the master then receives two responses during the ongoing transaction, the next response is discarded as well. In this situation the fix above is just a local solution: with slightly different settings and priorities, if the slave responds a bit later, the result will be the same.

This is a misconfiguration issue, and if you don't change CONFIG_FMB_MASTER_TIMEOUT_MS_RESPOND to a correct value, you will get the same result.
So, the CONFIG_FMB_MASTER_TIMEOUT_MS_RESPOND parameter in the sdkconfig file, or the corresponding field in the configuration structure, must be set to a value greater than the longest possible processing time of the slave. Setting this to 5000 ms in sdkconfig while the code overrides it to 1000 ms is the direct cause of the problem.
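
As a minimal sketch of the two places this timeout comes from (the fields are taken from the structure posted later in this thread; 2000 ms is only an example value and must exceed the slowest slave's processing time):

    /* sdkconfig default, used unless overridden per instance:
     *   CONFIG_FMB_MASTER_TIMEOUT_MS_RESPOND=2000
     */
    mb_communication_info_t comm = {
        .ser_opts.port = MB_PORT_NUM,
        .ser_opts.mode = MB_RTU,
        .ser_opts.baudrate = MB_DEV_SPEED,
        .ser_opts.parity = MB_PARITY_NONE,
        .ser_opts.uid = 0,
        /* Per-instance override: this value wins over the Kconfig default,
         * so it too must be greater than the worst-case slave response time. */
        .ser_opts.response_tout_ms = 2000,
        .ser_opts.data_bits = UART_DATA_8_BITS,
        .ser_opts.stop_bits = UART_STOP_BITS_1
    };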

Please read the important-configuration-aspects. As I said before, the fix you proposed is just a local solution: it may cause the master to discard the first (expired) response but accept the second one, and in some situations the actual response will be skipped instead. Since the master poll code is shared between all masters, the fix is not safe and can cause additional issues later.
This race condition is typical for Modbus due to the specifics of the protocol. I will think about a safer approach in addition to what is already implemented in master (an injected sequence number corresponding to a timestamp).

@zyk6271
Author

zyk6271 commented Aug 28, 2025

The current parameter configuration is indeed 1000; see the code below:

    mb_communication_info_t comm = {
        .ser_opts.port = MB_PORT_NUM,
        .ser_opts.mode = MB_RTU,
        .ser_opts.baudrate = MB_DEV_SPEED,
        .ser_opts.parity = MB_PARITY_NONE,
        .ser_opts.uid = 0,
        .ser_opts.response_tout_ms = 1000,
        .ser_opts.data_bits = UART_DATA_8_BITS,
        .ser_opts.stop_bits = UART_STOP_BITS_1
    };

Do you mean that response_tout_ms should be changed to 2000, and CONFIG_FMB_MASTER_TIMEOUT_MS_RESPOND should also be changed to 2000?

@alisitsyn
Collaborator

alisitsyn commented Aug 28, 2025

The line .ser_opts.response_tout_ms = 1000, overrides the master timeout to 1000 ms, so it is enough to change it in this line only, to ~2300 ms. This should solve the issue.

@zyk6271
Author

zyk6271 commented Aug 28, 2025

OK, I'll try it, thanks.

@zyk6271
Author

zyk6271 commented Aug 28, 2025

I just tried it: with .ser_opts.response_tout_ms = 2300, and with both CONFIG_FMB_MASTER_TIMEOUT_MS_RESPOND=5000 and CONFIG_FMB_MASTER_TIMEOUT_MS_RESPOND=2000, the problem can still be reproduced. Once this line appears, communication stops working: mb_object.master: 0x3fcb171c, drop data received outside of transaction.

@alisitsyn
Collaborator

alisitsyn commented Aug 28, 2025

Could you send me a debug log from when the issue happens and, if possible, the communication traffic between master and slave?
Which Modbus slave are you using? Does it happen with every request, or only with every second request?

@zyk6271
Author

zyk6271 commented Aug 29, 2025

modbus_log.txt
chip_log.txt
Please see the logs in the attachments. After sending FFFFFFFFFFFFFFF over Modbus to simulate interference, the bug can be reproduced; you should be able to reproduce it as well.

@alisitsyn
Collaborator

@zyk6271,
Thank you for the logs. I will try to check this as soon as possible and will let you know.

@zyk6271
Author

zyk6271 commented Sep 2, 2025

Is there any progress on this?

@alisitsyn
Collaborator

alisitsyn commented Sep 3, 2025

Sorry for the late reply. I have many other activities, and the response may take some time.
I checked your partial logs and can see some stress-testing aspects there. You have some unresponsive nodes, and incorrect frames are injected as well. The stack should be able to skip these frames and recover the communication from the next correct frame.

Request (the 收←◆ prefix is the serial tool's RX marker):

    [08:52:54.544]收←◆02 03 00 00 00 15 84 36

Response + incorrect next request:

    [08:52:54.602]收←◆02 03 2A 00 00 00 00 00 01 00 01 00 00 00 00 00 01 00 01 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 FF AA 00 20 00 06 C5 E0

The trailing "00 20 00 06 C5 E0" belongs to the next request, which should be: "03 03 00 20 00 06 c5 e0".
It is not clear why it was cut like this, because the log says the request was sent out correctly at the port level.
Judging by the target log, the device is not able to recover from the error and reports ESP_ERR_INVALID_RESPONSE in spite of a correct response. Is this correct? Does this apply to all subsequent requests on your side?
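
For reference, the expected frame tail can be checked against the standard Modbus RTU CRC-16 (polynomial 0xA001, init 0xFFFF, low byte transmitted first). A small self-contained sketch, not part of the stack, that reproduces the c5 e0 checksum of the expected request above:

    #include <stdint.h>
    #include <stdio.h>

    /* Standard Modbus RTU CRC-16. */
    static uint16_t mb_crc16(const uint8_t *buf, size_t len)
    {
        uint16_t crc = 0xFFFF;
        for (size_t i = 0; i < len; i++) {
            crc ^= buf[i];
            for (int bit = 0; bit < 8; bit++) {
                crc = (crc & 1) ? (crc >> 1) ^ 0xA001 : (crc >> 1);
            }
        }
        return crc;
    }

    int main(void)
    {
        /* "03 03 00 20 00 06" from the expected request in the log above. */
        const uint8_t req[] = { 0x03, 0x03, 0x00, 0x20, 0x00, 0x06 };
        uint16_t crc = mb_crc16(req, sizeof(req));
        printf("%02X %02X\n", crc & 0xFF, crc >> 8);  /* prints: C5 E0 */
        return 0;
    }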

This issue is clear to me. Your slave sends wrong data outside of the transaction at an arbitrary time (FF FF FF FF, which can also be an expired response from a slave). This causes the EV_ERROR_PROCESS event to be generated outside of the transaction, and the corresponding event bit is set. If this wrong event is not cleared, the next transaction completes with a failure immediately, even if no data comes from the slave at all. That failure should then clear all event bits, and the following transaction should be handled normally if proper data comes from the slave. If the slave response timeout is set incorrectly, this sequence with expired responses can repeat forever.

I agree this issue should be solved and approve your solution. The only note is that the current approach may cause some additional issues. I will try to find an appropriate solution that covers all possible cases. Thank you for your contribution; I will return to it once possible.
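
To illustrate the failure mode in code (a conceptual sketch only, with hypothetical handle and bit names, not the actual fix in the stack):

    #include "freertos/FreeRTOS.h"
    #include "freertos/event_groups.h"

    #define EV_FRAME_RECEIVED  (1 << 0)   /* hypothetical event bits */
    #define EV_ERROR_PROCESS   (1 << 1)

    static EventGroupHandle_t mb_events;  /* created with xEventGroupCreate() */

    static void start_transaction(void)
    {
        /* Data received outside of any transaction (noise like "FF FF FF FF",
         * or an expired slave response) may already have set EV_ERROR_PROCESS.
         * If the stale bit is not cleared before the request is sent, the next
         * xEventGroupWaitBits() returns immediately with the old error and the
         * transaction fails before any slave data arrives. */
        xEventGroupClearBits(mb_events, EV_ERROR_PROCESS | EV_FRAME_RECEIVED);
        /* ... send the request, then wait for the transaction events ... */
    }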

github-actions bot changed the title from "fix port event bug" to "fix port event bug (EPROT-6)" on Oct 1, 2025
@alisitsyn
Collaborator

alisitsyn commented Oct 6, 2025

@zyk6271 ,

Sorry for the late response. A workaround for the issue has been implemented and should solve the behavior you observed.
The fix is included in component v2.1.1.
It is recommended to wait for the feature freeze of esp-idf v6.0 for stable results.
Please let me know if you have any comments.
Thanks.

@zyk6271
Author

zyk6271 commented Oct 21, 2025 via email
