fix_the_etcdv2_1000ErrorCodeEventIndexCleared_bug #1141

Open
wants to merge 1 commit into
base: master

@liucimin liucimin commented Jun 4, 2018

Description of the changes

Type of fix: Bug Fix

Fixes #1140

Please describe:

Add protection for the watcher.
When the watcher receives ErrorCodeEventIndexCleared from etcd, we should create a new watcher
to catch future events.
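
The recovery described above can be sketched roughly as follows. This is not the code from this PR: `etcdError`, `watcher`, `fakeWatcher`, `newFakeWatcher` and `watchLoop` are hypothetical stand-ins written against the standard library only, and the sketch assumes the 401 error carries the current etcd index so that the new watcher can start from there.

```go
package main

import (
	"errors"
	"fmt"
)

// errorCodeEventIndexCleared mirrors etcd v2 error code 401.
const errorCodeEventIndexCleared = 401

// etcdError is a hypothetical stand-in for the etcd v2 client error type.
type etcdError struct {
	Code  int
	Index uint64 // assumed: current etcd index reported with the error
}

func (e *etcdError) Error() string {
	return fmt.Sprintf("%d: The event in requested index is outdated and cleared [%d]", e.Code, e.Index)
}

// watcher is a hypothetical minimal watcher interface.
type watcher interface {
	Next() (string, error)
}

// fakeWatcher simulates an etcd that only keeps events from index `oldest` on.
type fakeWatcher struct {
	next, oldest, current uint64
}

func (f *fakeWatcher) Next() (string, error) {
	if f.next < f.oldest {
		// The requested index has already been dropped from the event history.
		return "", &etcdError{Code: errorCodeEventIndexCleared, Index: f.current}
	}
	ev := fmt.Sprintf("event@%d", f.next)
	f.next++
	return ev, nil
}

// newFakeWatcher builds a watcher that delivers events after afterIndex.
func newFakeWatcher(afterIndex uint64) watcher {
	return &fakeWatcher{next: afterIndex + 1, oldest: 1000, current: 2000}
}

// watchLoop collects up to max events. On error 401 it replaces the watcher
// with a new one that starts at the index reported in the error.
func watchLoop(newWatcher func(uint64) watcher, start uint64, max int) []string {
	w := newWatcher(start)
	var got []string
	for len(got) < max {
		ev, err := w.Next()
		var ee *etcdError
		if errors.As(err, &ee) && ee.Code == errorCodeEventIndexCleared {
			w = newWatcher(ee.Index) // old index is gone; watch future events only
			continue
		}
		if err != nil {
			return got // any other error ends this sketch
		}
		got = append(got, ev)
	}
	return got
}

func main() {
	fmt.Println(watchLoop(newFakeWatcher, 5, 2))
}
```

Events between the stale index and the reported index are lost in this sketch; a caller that needs them would have to re-read the current state before watching again.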

  • Type of testing done (both manual and automated)
    Manual test in my environment.

TODO

  • Tests
    1. Create a new network for contiv.
       Result: make sure a pod can be created by netplugin.

    2. Keep using contiv until the etcd index is at least 1000 higher than in step 1.
       Result: use curl to check etcd's index.

    3. Interrupt the network between netplugin and etcd.
       Result: "Error client:etcd cluster is unavailable or misconfigured during watch" appears in the netplugin log.

    4. Restore the network between netplugin and etcd.
       Result: "Error 401: The event in requested index is outdated and cleared (*) during watch" appears in the netplugin log.

    5. Create a new network for contiv.
       Result: creation succeeds.

    6. Create new pods.
       Result: make sure the pods can be created on the new network.

  • Documentation

liucimin commented Jun 4, 2018

@dseevr

	switch err.(type) {
	case *client.ClusterError:
		// retry and wait for etcd cluster to recover!
		time.Sleep(time.Second * 5)
Contributor

How did you choose this value for the sleep?

Author

The sleep can be set to any value, but it determines how many requests the watcher sends to etcd before the cluster recovers.
The etcd cluster may recover at any time after it breaks, so I chose the value based on my experience.

Development

Successfully merging this pull request may close these issues.

The etcdv2 Watcher bug in the contiv.