submariner: make start hook idempotent with ensure model #2502

Open
raghavendra-talur wants to merge 1 commit into RamenDR:main from raghavendra-talur:rtalur-fix-submariner-install

@raghavendra-talur
Member

Refactor the submariner start hook to check whether the broker and cluster joins are already healthy before re-running them. This avoids the "existing joined cluster with the same ID" error when re-running start on a partially deployed environment.

  • Split deploy_broker/join_cluster into is_*/do_*/ensure_* functions
  • Add are_deployments_available() to check deployment health
  • Add clean_broker_registration() to remove stale broker-side state (clusters.submariner.io and endpoints.submariner.io) before re-joining
  • Add subctl.uninstall() wrapper in drenv/subctl.py
  • Fix typo: "deployuments" -> "deployments"
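
A minimal sketch of the is_*/do_*/ensure_* split listed above, not the code from this PR. It assumes simplified zero-argument signatures; the generic `ensure()` helper and the in-memory `state` dict standing in for the cluster are hypothetical and exist only to make the example runnable.

```python
from typing import Callable


def ensure(name: str, is_done: Callable[[], bool], do: Callable[[], None]) -> bool:
    """Run do() only when is_done() reports that the step is incomplete.

    Returns True if do() ran, False if the step was skipped.
    """
    if is_done():
        print(f"{name}: already healthy, skipping")
        return False
    print(f"{name}: not healthy, running")
    do()
    return True


# Hypothetical stand-in for real cluster state, for demonstration only.
state = {"broker": False}


def is_broker_deployed() -> bool:
    # The real check would inspect the broker cluster.
    return state["broker"]


def do_deploy_broker() -> None:
    # The real step would deploy the broker.
    state["broker"] = True


def ensure_broker() -> bool:
    return ensure("broker", is_broker_deployed, do_deploy_broker)
```

Calling `ensure_broker()` twice runs the deploy step once and skips it the second time, which is the property that makes re-running start safe.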

Assisted-by: Claude Code/claude-opus-4-6

Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>
@raghavendra-talur raghavendra-talur removed the request for review from parikshithb April 8, 2026 17:45
def deploy_broker(broker):
    print(f"Waiting until broker '{broker}' is ready")
    drenv_cluster.wait_until_ready(broker)
def is_broker_deployed(broker, broker_info):

I think this should be more robust: it could also validate the broker's health and check that the broker info file is valid. A corrupted or stale broker info file could cause the function to skip deployment when it shouldn't.
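
One possible shape for the file validity check suggested here, as a sketch only. It assumes the broker info file is base64-encoded JSON; if the actual format differs, the decoding step needs to change. The function name `is_broker_info_valid` is hypothetical and does not appear in the PR.

```python
import base64
import binascii
import json
import os


def is_broker_info_valid(path: str) -> bool:
    """Return True if path looks like a usable broker info file.

    Assumption: the file content is base64-encoded JSON.
    """
    try:
        with open(path, "rb") as f:
            raw = f.read().strip()
    except OSError:
        return False  # Missing or unreadable file.
    if not raw:
        return False  # Empty file, for example after an interrupted write.
    try:
        data = json.loads(base64.b64decode(raw, validate=True))
    except (binascii.Error, ValueError):
        return False  # Not base64, or not JSON after decoding.
    # Require a non-empty JSON object.
    return isinstance(data, dict) and bool(data)
```

This only catches a corrupted file. A well-formed but stale file still passes, so a check against the live broker would be needed to cover that case.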

)


BROKER_NAMESPACE = "submariner-k8s-broker"

Are we sure this is the best place for this constant?

pass # Not found is fine.


def are_deployments_available(cluster, names, namespace):

It looks like this doesn't distinguish between a deployment that doesn't exist and a deployment that exists but isn't available.
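
A sketch of how the two cases could be told apart, assuming the deployment is fetched with `kubectl get deploy/NAME --output=json --ignore-not-found`, which prints nothing and exits 0 when the resource does not exist. The `DeploymentState` enum and `classify_deployment()` are hypothetical names, not part of the PR.

```python
import enum
import json


class DeploymentState(enum.Enum):
    MISSING = "missing"
    UNAVAILABLE = "unavailable"
    AVAILABLE = "available"


def classify_deployment(output: str) -> DeploymentState:
    """Classify the output of
    `kubectl get deploy/NAME --output=json --ignore-not-found`.
    """
    if not output.strip():
        # Empty output: the deployment does not exist.
        return DeploymentState.MISSING
    deploy = json.loads(output)
    for cond in deploy.get("status", {}).get("conditions", []):
        if cond.get("type") == "Available" and cond.get("status") == "True":
            return DeploymentState.AVAILABLE
    # Exists, but has no Available=True condition (yet).
    return DeploymentState.UNAVAILABLE
```

A caller could then deploy on `MISSING` and wait or clean up on `UNAVAILABLE`, rather than treating both as one `False`.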

Comment on lines +178 to +189
    for name in names:
        try:
            out = kubectl.get(
                f"deploy/{name}",
                f"--namespace={namespace}",
                "--output=jsonpath={.status.conditions[?(@.type=='Available')].status}",
                context=cluster,
            )
            if out.strip() != "True":
                return False
        except Exception:
            return False

Maybe this can be optimized to minimize kubectl.get calls by fetching all deployments in advance and then checking against each one?
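
A sketch of the single-call variant suggested here, assuming the namespace's deployments are fetched once with `kubectl get deploy --namespace=NS --output=json` and then checked in memory. `unavailable_deployments()` is a hypothetical helper, not part of the PR.

```python
import json


def unavailable_deployments(names, deploy_list_json):
    """Return the requested names that are missing or not Available.

    deploy_list_json is the output of a single
    `kubectl get deploy --namespace=NS --output=json` call.
    """
    available = set()
    for item in json.loads(deploy_list_json).get("items", []):
        conditions = item.get("status", {}).get("conditions", [])
        if any(
            c.get("type") == "Available" and c.get("status") == "True"
            for c in conditions
        ):
            available.add(item["metadata"]["name"])
    # Preserve the caller's order so messages are predictable.
    return [name for name in names if name not in available]
```

Following the call shape in the snippet under review, the input would presumably come from something like `kubectl.get("deploy", f"--namespace={namespace}", "--output=json", context=cluster)`, and `are_deployments_available()` would reduce to checking that the returned list is empty.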

@nirs
Member

nirs commented Apr 9, 2026

> This avoids the "existing joined cluster with the same ID" error when re-running start on a partially deployed environment.

I have never seen such an issue. How do you reproduce it?

The start script should already be idempotent; deploying submariner twice works.

@nirs
Member

nirs commented May 4, 2026

@raghavendra-talur I just tried and submariner is idempotent:

% drenv start envs/submariner.yaml 
2026-05-04 16:57:59,926 INFO    [submariner] Starting environment
2026-05-04 16:57:59,974 INFO    [hub] Starting minikube cluster
2026-05-04 16:57:59,978 INFO    [dr1] Starting minikube cluster
2026-05-04 16:57:59,984 INFO    [dr2] Starting minikube cluster
2026-05-04 16:58:15,424 INFO    [dr2] Cluster started in 15.44 seconds
2026-05-04 16:58:15,764 INFO    [dr2] Configuring containerd
2026-05-04 16:58:18,643 INFO    [hub] Cluster started in 18.67 seconds
2026-05-04 16:58:18,978 INFO    [hub] Configuring containerd
2026-05-04 16:58:20,070 INFO    [hub/0] Running addons/submariner/start
2026-05-04 16:58:21,701 INFO    [dr1] Cluster started in 21.72 seconds
2026-05-04 16:58:22,044 INFO    [dr1] Configuring containerd
2026-05-04 16:59:19,887 INFO    [hub/0] addons/submariner/start completed in 59.82 seconds
2026-05-04 16:59:19,887 INFO    [hub/0] Running addons/submariner/test
2026-05-04 16:59:39,860 INFO    [hub/0] addons/submariner/test completed in 19.97 seconds
2026-05-04 16:59:39,861 INFO    [submariner] Environment started in 99.93 seconds

% drenv start envs/submariner.yaml
2026-05-04 17:02:09,311 INFO    [submariner] Starting environment
2026-05-04 17:02:09,629 INFO    [dr1] Starting minikube cluster
2026-05-04 17:02:09,634 INFO    [dr2] Starting minikube cluster
2026-05-04 17:02:09,649 INFO    [hub] Starting minikube cluster
2026-05-04 17:02:32,565 INFO    [dr1] Cluster started in 22.94 seconds
2026-05-04 17:02:32,678 INFO    [dr1] Waiting for fresh status
2026-05-04 17:02:39,585 INFO    [hub] Cluster started in 29.94 seconds
2026-05-04 17:02:39,664 INFO    [hub] Waiting for fresh status
2026-05-04 17:02:40,713 INFO    [dr2] Cluster started in 31.08 seconds
2026-05-04 17:02:40,788 INFO    [dr2] Waiting for fresh status
2026-05-04 17:03:02,671 INFO    [dr1] Looking up failed deployments
2026-05-04 17:03:09,664 INFO    [hub] Looking up failed deployments
2026-05-04 17:03:10,002 INFO    [hub/0] Running addons/submariner/start
2026-05-04 17:03:10,780 INFO    [dr2] Looking up failed deployments
2026-05-04 17:03:57,613 INFO    [hub/0] addons/submariner/start completed in 47.61 seconds
2026-05-04 17:03:57,613 INFO    [hub/0] Running addons/submariner/test
2026-05-04 17:04:16,690 INFO    [hub/0] addons/submariner/test completed in 19.08 seconds
2026-05-04 17:04:16,690 INFO    [submariner] Environment started in 127.39 seconds

Can you explain how to reproduce the issue you are trying to fix?

3 participants