clustermesh: fix panic if the etcd client cannot be created #32225

Merged: 1 commit, Apr 29, 2024

giorio94
Member

The blamed commit moved the start of the watchdog, which restarts the etcd connection to a remote cluster in case of errors, to an earlier point. However, this can lead to a panic if the etcd client cannot be created (e.g., due to an invalid config file): in that case the returned backend is nil, and its errors channel cannot be accessed.

Let's move this logic back below the error check, so that the backend is always valid at that point. At the same time, let's keep watching for possible reconnections during the initial connection establishment phase, so that we immediately restart it in case of issues. Otherwise, this phase may hang: the interceptor would keep returning an error, preventing the connection from ever being established.
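
For readers unfamiliar with the code, the ordering issue can be illustrated with a minimal sketch. This is not the actual Cilium code: `backend`, `newBackend`, `connect`, `errConnLost`, and the `restart` callback are hypothetical names introduced only for illustration, and the sketch covers just the "watchdog after the error check" part, not the reconnection handling during the initial connection establishment.

```go
package main

import (
	"errors"
	"fmt"
)

// errConnLost is a hypothetical error used to simulate a broken connection.
var errConnLost = errors.New("connection lost")

// backend is a hypothetical stand-in for the etcd client wrapper.
type backend struct {
	errs chan error    // errors reported by the connection
	done chan struct{} // closed to stop the watchdog
}

// Close stops the watchdog associated with this backend.
func (b *backend) Close() { close(b.done) }

// newBackend is a hypothetical constructor: on failure it returns a nil
// backend, mirroring the situation described in the PR.
func newBackend(validConfig bool) (*backend, error) {
	if !validConfig {
		return nil, errors.New("invalid config file")
	}
	return &backend{errs: make(chan error, 1), done: make(chan struct{})}, nil
}

// connect starts the watchdog only after the error check, so b.errs is never
// read through a nil pointer.
func connect(validConfig bool, restart func(error)) (*backend, error) {
	b, err := newBackend(validConfig)
	if err != nil {
		// b is nil here; starting the watchdog before this check would
		// dereference it and panic.
		return nil, fmt.Errorf("creating etcd client: %w", err)
	}

	go func() {
		select {
		case err := <-b.errs:
			restart(err) // ask the caller to restart the connection
		case <-b.done:
		}
	}()
	return b, nil
}

func main() {
	// Invalid config: an error is returned and no watchdog is started.
	if _, err := connect(false, func(error) {}); err != nil {
		fmt.Println("connect failed without panicking:", err)
	}

	// Valid config: the watchdog forwards the first connection error.
	restarted := make(chan error, 1)
	b, _ := connect(true, func(err error) { restarted <- err })
	b.errs <- errConnLost
	fmt.Println("watchdog requested restart:", <-restarted)
}
```

In this sketch, reading `b.errs` before the `err != nil` check would dereference a nil pointer in the invalid-config case, which is the kind of panic described above.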

Fixes: 174e721 ("ClusterMesh: validate etcd cluster ID")

Marking as release-note/misc, as the blamed commit has not yet been released, and marking for backport, as the blamed commit was backported through #32005.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94 giorio94 added area/clustermesh Relates to multi-cluster routing functionality in Cilium. release-note/misc This PR makes changes that have no direct user impact. needs-backport/1.15 This PR / issue needs backporting to the v1.15 branch labels Apr 29, 2024
@giorio94 giorio94 requested a review from thorn3r April 29, 2024 10:14
@giorio94 giorio94 requested a review from a team as a code owner April 29, 2024 10:14
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.15.5 Apr 29, 2024
@giorio94
Member Author

/test

Contributor

@thorn3r thorn3r left a comment

nice catch 👍

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Apr 29, 2024
@sayboras sayboras added this pull request to the merge queue Apr 29, 2024
Merged via the queue into cilium:main with commit 58b74f5 Apr 29, 2024
65 checks passed
@pippolo84 pippolo84 mentioned this pull request May 6, 2024
14 tasks
@pippolo84 pippolo84 added backport-pending/1.15 The backport for Cilium 1.15.x for this PR is in progress. and removed needs-backport/1.15 This PR / issue needs backporting to the v1.15 branch labels May 6, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.15 in 1.15.5 May 6, 2024
@github-actions github-actions bot added backport-done/1.15 The backport for Cilium 1.15.x for this PR is done. and removed backport-pending/1.15 The backport for Cilium 1.15.x for this PR is in progress. labels May 8, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.15 to Backport done to v1.15 in 1.15.5 May 8, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed this from Backport done to v1.15 in 1.15.5 May 8, 2024
Labels
area/clustermesh Relates to multi-cluster routing functionality in Cilium. backport-done/1.15 The backport for Cilium 1.15.x for this PR is done. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.
4 participants