Elektordi

SUMMARY

I was having timeout problems on IOS XE and IOS XR devices with very long configurations (more than 50k lines / 2M bytes). On those same devices, a "show running-config" (with the pager disabled) over a plain OpenSSH client takes about 10-15 seconds.
Even for smaller configurations (20k lines / 700k bytes), I had to raise command_timeout to 300 for my Ansible playbook to succeed.

After some debugging, I found that the problem is in the regex search for errors and prompts in network_cli. The search runs on the full buffer after each 4096 bytes received from the wire, so the total parsing time grows roughly quadratically with the output size.
My change is to run the regex search only on the last 1k of the buffer; I think there is no chance of a prompt or error message being longer than 1k.
It prevents the timeout issues and also greatly speeds up many tasks on those devices.
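
The tail-only search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual network_cli patch: the names find_prompt, receive, and TAIL_BYTES, and the simplified prompt pattern in PROMPT_RES, are hypothetical.

```python
import re

# Hypothetical values for illustration; not taken from the network_cli source.
TAIL_BYTES = 1024  # assumed window size, matching the "last 1k" described above
PROMPT_RES = [re.compile(rb"[\w\-.:/]+[>#]\s*$")]  # simplified prompt pattern


def find_prompt(buffer, prompt_res=PROMPT_RES, tail_bytes=TAIL_BYTES):
    """Return True if a prompt regex matches within the last tail_bytes of buffer."""
    # Slicing keeps the whole buffer when it is shorter than tail_bytes.
    window = buffer[-tail_bytes:]
    return any(rx.search(window) for rx in prompt_res)


def receive(chunks, prompt_res=PROMPT_RES):
    """Accumulate chunks and stop as soon as the tail of the buffer shows a prompt."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Only the tail is searched, so the cost per chunk stays bounded
        # no matter how large the buffer has become.
        if find_prompt(buffer, prompt_res):
            return buffer
    raise TimeoutError("prompt not found in received data")
```

With this shape, each search looks at no more than TAIL_BYTES, so the work per received chunk no longer depends on how much output has already accumulated. The trade-off is that a prompt or error message longer than the window would be missed.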

Currently, for the proof of concept, I hardcoded the limit in the source code, but if someone thinks it would be better as an option, I could look into changing that.

My initial issue: https://forum.ansible.com/t/ios-config-and-iosxr-config-for-very-long-configs-50k-lines/40757

ISSUE TYPE
  • Bugfix Pull Request
COMPONENT NAME

network_cli

ADDITIONAL INFORMATION

Before changes:

2025-03-05 16:44:30,117 p=49599 u=elektordi n=ansible | jsonrpc request: b'{"jsonrpc": "2.0", "method": "run_commands", "id": "05b1a0f7-642a-4a03-92cb-ae5186b37d5c", "params": [[], {"commands": ["show running-config"], "check_rc": false}]}'
2025-03-05 16:49:30,642 p=49599 u=elektordi n=ansible | command timeout triggered, timeout value is 300 secs.
See the timeout setting options in the Network Debug and Troubleshooting Guide.
2025-03-05 16:49:30,644 p=49599 u=elektordi n=ansible | Traceback (most recent call last):
  File "/home/elektordi/.local/lib/python3.10/site-packages/ansible/utils/jsonrpc.py", line 45, in handle_request
    result = rpc_method(*args, **kwargs)
  File "/home/elektordi/.ansible/collections/ansible_collections/cisco/ios/plugins/cliconf/ios.py", line 537, in run_commands
    out = self.send_command(**cmd)
  File "/home/elektordi/.ansible/collections/ansible_collections/ansible/netcommon/plugins/plugin_utils/cliconf_base.py", line 148, in send_command
    resp = self._connection.send(**kwargs)
  File "/home/elektordi/.ansible/collections/ansible_collections/ansible/netcommon/plugins/connection/network_cli.py", line 345, in wrapped
    return func(self, *args, **kwargs)
  File "/home/elektordi/.ansible/collections/ansible_collections/ansible/netcommon/plugins/connection/network_cli.py", line 1005, in send
    response = self.receive(
  File "/home/elektordi/.ansible/collections/ansible_collections/ansible/netcommon/plugins/connection/network_cli.py", line 946, in receive
    response = self.receive_libssh(
  File "/home/elektordi/.ansible/collections/ansible_collections/ansible/netcommon/plugins/connection/network_cli.py", line 909, in receive_libssh
    if self._find_prompt(resp):
  File "/home/elektordi/.ansible/collections/ansible_collections/ansible/netcommon/plugins/connection/network_cli.py", line 1152, in _find_prompt
    match = stdout_regex.search(response)
  File "/home/elektordi/.local/lib/python3.10/site-packages/ansible/cli/scripts/ansible_connection_cli_stub.py", line 186, in command_timeout
    raise Exception(msg)
Exception: command timeout triggered, timeout value is 300 secs.
See the timeout setting options in the Network Debug and Troubleshooting Guide.

After change: OK, no error

I also did some timing checks on some devices using this command:
time ansible -vvvv -i inventory.yaml --playbook-dir . <DEVICE> -m cisco.ios.ios_command -a '{"commands":["sh runn"]}' -c ansible.netcommon.network_cli

Device with config of 55967 lines / 2188377 bytes
Before: Timeout
After: Ok in 25s

Device with config of 24651 lines / 717168 bytes
Before: Ok in 1m16s
After: Ok in 13s

Device with config of 2740 lines / 67080 bytes
Before: Ok in 13s
After: Ok in 11s
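
The gap between the large and small configurations is consistent with a rough count of how many bytes the regex has to look at. The sketch below is a simplified model, not a measurement: it assumes one search per 4096-byte chunk, that a failed search scans its whole input, and a 1024-byte window for the changed behaviour. The function name bytes_scanned is hypothetical.

```python
from typing import Optional

CHUNK = 4096  # bytes received per read, as described in the summary
TAIL = 1024   # assumed search window of the proposed change


def bytes_scanned(total: int, chunk: int = CHUNK, tail: Optional[int] = None) -> int:
    """Estimate the bytes searched when the regex is re-run after every chunk.

    tail=None models searching the full buffer each time;
    tail=N models searching only the last N bytes.
    """
    scanned = 0
    received = 0
    while received < total:
        received = min(received + chunk, total)
        scanned += received if tail is None else min(received, tail)
    return scanned


if __name__ == "__main__":
    # Config sizes in bytes, taken from the timing checks above.
    for size in (2188377, 717168, 67080):
        full = bytes_scanned(size)
        windowed = bytes_scanned(size, tail=TAIL)
        print(f"{size:>8} bytes: full={full:>12} windowed={windowed:>8}")
```

Under this model, the full-buffer total grows with the square of the output size, while the windowed total grows linearly, which matches the observation that the smallest device barely changes and the largest one goes from a timeout to completing.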

@KB-perByte
Collaborator

Hey @Elektordi, thanks for the contribution. I will keep an eye on the PR to check whether it impacts anything else.

@pinquin87

Hi @KB-perByte, we are running into the same timeout issue when running some commands on Cisco wireless controllers. When running 'show ap config general' or 'show ap config slots' on WLCs with a lot of APs, we get a timeout similar to the one @Elektordi reported.
We tested @Elektordi's fix, and it solves the issue. However, the local change will be lost when we update Ansible, so it would be nice if this fix could land in a release someday.
