Improve time_scenarios stability: fix undefined exception handling and enhance shell detection fallback (ref #1211)#1225
Conversation
Review Summary by QodoFix time_scenarios stability and add comprehensive chaos template library
WalkthroughsDescription• Fixed undefined exception handling in time_scenarios with proper exception binding • Implemented shell detection fallback mechanism for pod command execution • Added comprehensive chaos template library with 9 pre-configured scenarios • Enhanced error handling and logging throughout time actions plugin Diagramflowchart LR
A["time_actions_scenario_plugin.py"] -->|"Add shell fallback"| B["detect_available_shell()"]
A -->|"Add shell fallback"| C["exec_with_shell_fallback()"]
A -->|"Fix exceptions"| D["Proper exception binding"]
E["template_manager.py"] -->|"Create templates"| F["ChaosTemplate class"]
E -->|"Manage templates"| G["TemplateManager class"]
H["run_kraken.py"] -->|"Route commands"| E
I["krkn-template"] -->|"CLI wrapper"| E
J["Template Library"] -->|"9 scenarios"| K["pod-failure, node-failure, network-latency, cpu-stress, disk-stress, pod-kill, container-restart, vm-outage, resource-failure"]
File Changes1. krkn/scenario_plugins/time_actions/time_actions_scenario_plugin.py
|
Code Review by Qodo
1. Template scenario schema invalid
|
3ae2384 to
24270a5
Compare
Review:Thank you for addressing the production bug from #1211! ✅ What Works Well
Possible issues1. Shell Quote Escaping MissingFile: Problem: Commands with quotes will break: # Current code
wrapped_command = f"{shell} -c '{command}'"
# If command = "echo 'hello world'"
# Results in: /bin/sh -c 'echo 'hello world'' # Syntax error!Fix: import shlex
# In exec_with_shell_fallback()
wrapped_command = f"{shell} -c {shlex.quote(command)}"2. Unhandled Fallback FailureFile: Problem: If Scenario: # pod_exec() detects shell error
response = "impossible to determine the shell to run command"
# Calls fallback
exec_with_shell_fallback()
→ detects shell = "/bin/sh"
→ wraps command: "/bin/sh -c 'date'"
→ kubecli.exec_cmd_in_pod() still fails with same error
→ Returns: "impossible to determine the shell..." # Same error!
# User gets same error - unclear if fallback was even attemptedFix - Add check in exec_with_shell_fallback(): for i in range(5):
response = kubecli.exec_cmd_in_pod(wrapped_command, pod_name, namespace, container_name)
if not response:
time.sleep(2)
continue
elif "unauthorized" in response.lower() or "authorization" in response.lower():
time.sleep(2)
continue
# ADD THIS CHECK ↓
elif "impossible to determine the shell" in response.lower():
logging.error(
f"Shell fallback failed: even with detected shell {shell}, "
f"still cannot execute command in pod {pod_name}. "
f"This indicates a deeper issue with pod exec permissions or krkn-lib."
)
return False # Give up - fallback didn't help
elif "exec failed" in response.lower() or "error" in response.lower():
logging.debug(f"Command execution attempt {i+1}/5 failed: {response}")
time.sleep(2)
continue
else:
return response3. Command Double-WrappingFile: Problem: If the original command is already shell-wrapped, you'll create nested shells: # Original command
command = "/bin/bash -c 'date'"
# After wrapping
wrapped = "/bin/sh -c '/bin/bash -c 'date''" # Nested shells!Fix: # In exec_with_shell_fallback()
if isinstance(command, list):
command = " ".join(command)
# Check if already wrapped with a shell
if command.strip().startswith(("/bin/bash", "/bin/sh", "/busybox/sh")):
# Already wrapped, use as-is
wrapped_command = command
logging.debug(f"Command already shell-wrapped: {command}")
else:
# Need to wrap with detected shell
wrapped_command = f"{shell} -c {shlex.quote(command)}"
logging.debug(f"Wrapped command with {shell}: {wrapped_command}")4. Missing Unit TestsProblem: No tests added for new methods. Required tests in def test_detect_available_shell_finds_bash(self):
"""Test shell detection finds /bin/bash"""
kubecli_mock = MagicMock()
kubecli_mock.exec_cmd_in_pod.return_value = "test"
shell = self.plugin.detect_available_shell("pod1", "ns1", "container1", kubecli_mock)
self.assertEqual(shell, "/bin/bash")
def test_detect_available_shell_fallback_to_sh(self):
"""Test falls back to /bin/sh when bash unavailable"""
kubecli_mock = MagicMock()
# First call (bash) fails, second call (sh) succeeds
kubecli_mock.exec_cmd_in_pod.side_effect = [
"bash: not found", # bash fails
"test" # sh succeeds
]
shell = self.plugin.detect_available_shell("pod1", "ns1", "container1", kubecli_mock)
self.assertEqual(shell, "/bin/sh")
def test_detect_available_shell_no_shell_available(self):
"""Test returns None when no shells available"""
kubecli_mock = MagicMock()
kubecli_mock.exec_cmd_in_pod.side_effect = Exception("No shell found")
shell = self.plugin.detect_available_shell("pod1", "ns1", "container1", kubecli_mock)
self.assertIsNone(shell)
def test_exec_with_shell_fallback_handles_quotes(self):
"""Test proper escaping of commands with quotes"""
kubecli_mock = MagicMock()
kubecli_mock.exec_cmd_in_pod.return_value = "test"
# Command with quotes should not break
with patch.object(self.plugin, 'detect_available_shell', return_value='/bin/sh'):
result = self.plugin.exec_with_shell_fallback(
"pod1", "echo 'hello world'", "ns1", "container1", kubecli_mock
)
self.assertIsNotNone(result)
# Verify shlex.quote was used properly
def test_pod_exec_triggers_fallback_on_shell_error(self):
"""Test pod_exec activates fallback when seeing shell error"""
kubecli_mock = MagicMock()
kubecli_mock.exec_cmd_in_pod.return_value = "impossible to determine the shell to run command"
with patch.object(self.plugin, 'exec_with_shell_fallback', return_value="success") as fallback_mock:
result = self.plugin.pod_exec("pod1", "date", "ns1", "container1", kubecli_mock)
fallback_mock.assert_called_once()
self.assertEqual(result, "success")
def test_exec_with_shell_fallback_detects_persistent_shell_error(self):
"""Test fallback detects when shell error persists"""
kubecli_mock = MagicMock()
# Shell detection succeeds, but execution still fails with shell error
kubecli_mock.exec_cmd_in_pod.side_effect = [
"test", # Shell detection succeeds
"impossible to determine the shell" # But execution fails
]
with patch.object(self.plugin, 'detect_available_shell', return_value='/bin/sh'):
result = self.plugin.exec_with_shell_fallback(
"pod1", "date", "ns1", "container1", kubecli_mock
)
self.assertFalse(result)5. Shell Detection Efficiency ConcernPotential alternative: # Test shell existence without executing it
test_command = f"test -x {shell} && echo exists || echo missing"This might be more reliable in edge cases, but current approach should work for most scenarios. 🎯 Suggestion
🔍 Test Plan Suggestion
Great work identifying this production issue! 🎉 Looking forward to your feedback. |
|
Thanks for the detailed review! Addressed all the mentioned issues:
All tests are passing successfully. Please let me know if anything else needs improvement. |
Excellent Work - Minor Test Fixes RequiredThis PR has been significantly improved and addresses all previously identified concerns. The implementation is production-ready, but there are 3 failing unit tests that need mock configuration fixes. ❌ CI Test Failures (3 tests) FAIL: test_exec_with_shell_fallback_detects_persistent_shell_error FAIL: test_exec_with_shell_fallback_fails_after_max_retries FAIL: test_exec_with_shell_fallback_retries_on_error Verification After fixing, verify with: All three should pass. Required before merge:
Once tests pass: ✅ READY TO MERGE Great work on addressing all feedback! The implementation is solid - just needs test mock adjustments. |
0b46af1 to
bbb1a93
Compare
- Add exec_with_shell_fallback method with retry logic and shell fallback - Add unit tests for the new method with proper mocking - All tests now pass as expected Signed-off-by: NITESH SINGH <[email protected]>
bbb1a93 to
8e0864d
Compare
Description
This PR improves the stability and robustness of
time_scenariosexecution by addressing edge cases observed in issue #1211.While the primary fix was introduced in #1198, there are still gaps in error handling and shell detection that can lead to runtime failures in certain environments.
Changes
1. Fix: Undefined exception variable
name 'e' is not defined)except Exception as e2. Enhancement: Shell detection fallback
/bin/bash/bin/sh/busybox/sh#Why this change?
In some environments, containers may not have
/bin/bash, causing:Additionally, improper exception handling can crash execution unexpectedly.
This PR ensures:
Impact
time_scenariosTesting
Notes