
Integration with Cilium kube-proxy replacement + update docs for latest cilium version #3908

Open
Smithx10 opened this issue Apr 11, 2024 · 11 comments


@Smithx10

While attempting to replace kube-proxy with Cilium's kube-proxy replacement, I ran into the following questions:

Do I need to follow the instructions here: https://kubeovn.github.io/docs/v1.12.x/en/advance/with-cilium/
From the kubeovn docs:
helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium --version 1.11.6 \
	--namespace kube-system \
	--set cni.chainingMode=generic-veth \
	--set cni.customConf=true \
	--set cni.configMap=cni-configuration \
	--set tunnel=disabled \
	--set enableIPv4Masquerade=false \
	--set enableIdentityMark=false

Do I need to provide the additional cilium kubeProxyReplacement settings here?
helm install cilium cilium/cilium --version 1.15.3 \
	--namespace kube-system \
	--set routingMode=native \
	--set kubeProxyReplacement=true \
	--set loadBalancer.algorithm=maglev \
	--set loadBalancer.mode=dsr \
	--set loadBalancer.dsrDispatch=opt \
	--set k8sServiceHost=${API_SERVER_IP} \
	--set k8sServicePort=${API_SERVER_PORT}

@zhangzujian
Member

If you want to use cilium v1.15.3, you can try the following steps (for IPv4 clusters):

  1. Create a k8s cluster without kube-proxy, or delete kube-proxy;
  2. Execute the following commands:
kubectl apply -f https://raw.githubusercontent.com/kubeovn/kube-ovn/master/yamls/cilium-chaining.yaml
helm repo add cilium https://helm.cilium.io/
helm repo update cilium
helm install cilium cilium/cilium --wait \
	--version 1.15.3 \
	--namespace kube-system \
	--set k8sServiceHost=${API_SERVER_IP} \
	--set k8sServicePort=${API_SERVER_PORT} \
	--set kubeProxyReplacement=partial \
	--set operator.replicas=1 \
	--set socketLB.enabled=true \
	--set nodePort.enabled=true \
	--set externalIPs.enabled=true \
	--set hostPort.enabled=false \
	--set routingMode=native \
	--set sessionAffinity=true \
	--set enableIPv4Masquerade=false \
	--set enableIPv6Masquerade=false \
	--set hubble.enabled=true \
	--set sctp.enabled=true \
	--set ipv4.enabled=true \
	--set ipv6.enabled=false \
	--set ipam.mode=cluster-pool \
	--set-json ipam.operator.clusterPoolIPv4PodCIDRList='["100.65.0.0/16"]' \
	--set-json ipam.operator.clusterPoolIPv6PodCIDRList='["fd00:100:65::/112"]' \
	--set cni.chainingMode=generic-veth \
	--set cni.chainingTarget=kube-ovn \
	--set cni.customConf=true \
	--set cni.configMap=cni-configuration
kubectl -n kube-system rollout status ds cilium
ENABLE_LB=false ENABLE_NP=false CNI_CONFIG_PRIORITY=10 WITHOUT_KUBE_PROXY=true bash dist/images/install.sh

This method has been tested in the master branch (v1.13.0), and should work for v1.12.x, too.
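
To sanity-check the result afterwards, something along these lines should confirm that cilium has taken over service handling (the exact output format varies between cilium versions):

kubectl -n kube-system exec ds/cilium -- cilium status | grep KubeProxyReplacement
kubectl -n kube-system exec ds/cilium -- cilium service list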

@zhangzujian
Member

To quickly try out cilium v1.15.3 + kube-ovn, you can run the following commands (on the master branch):

make kind-init-cilium-chaining
make kind-install-cilium-chaining
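
If the targets complete, a quick sanity check is to confirm both CNI components are running; the label selectors below are assumptions based on the default manifests:

kubectl -n kube-system get pods -l k8s-app=cilium -o wide
kubectl -n kube-system get pods -l app=kube-ovn-cni -o wide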

@Smithx10
Author

Is there a procedure to apply ENABLE_LB=false, ENABLE_NP=false, WITHOUT_KUBE_PROXY=true after an install?

What are these addresses used for? --set-json ipam.operator.clusterPoolIPv4PodCIDRList='["100.65.0.0/16"]'

We noticed our cluster asked for ipv4-native-routing-cidr: x.x.x.x/y ("Set the CIDR in which native routing can be performed").
Which addresses do folks use for this?

      --ipv4-native-routing-cidr string                           Allows to explicitly specify the IPv4 CIDR for native routing. When specified, Cilium assumes networking for this CIDR is preconfigured and hands traffic destined for that range to the Linux network stack without applying any SNAT. Generally speaking, specifying a native routing CIDR implies that Cilium can depend on the underlying networking stack to route packets to their destination. To offer a concrete example, if Cilium is configured to use direct routing and the Kubernetes CIDR is included in the native routing CIDR, the user must configure the routes to reach pods, either manually or by setting the auto-direct-node-routes flag.
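
For reference, that agent flag should correspond to the ipv4NativeRoutingCIDR helm value; a hedged sketch of setting it on an existing release, where the CIDR is only a placeholder reusing the cluster-pool range from the command above:

helm upgrade cilium cilium/cilium --version 1.15.3 \
	--namespace kube-system \
	--reuse-values \
	--set ipv4NativeRoutingCIDR=100.65.0.0/16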

@zhangzujian
Member

Is there a procedure to apply ENABLE_LB=false, ENABLE_NP=false, WITHOUT_KUBE_PROXY=true after an install?

These are controlled by the kube-ovn-controller parameters --enable-lb/--enable-np.
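
A hedged sketch of changing those flags on an existing cluster (the deployment name matches the default kube-ovn manifests; check the current args before editing):

kubectl -n kube-system get deployment kube-ovn-controller \
	-o jsonpath='{.spec.template.spec.containers[0].args}'
# edit the container args to set --enable-lb=false and/or --enable-np=false,
# then let the deployment roll out
kubectl -n kube-system edit deployment kube-ovn-controller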

What are these addresses used for? --set-json ipam.operator.clusterPoolIPv4PodCIDRList='["100.65.0.0/16"]'

They are used for the cilium host device. The value should be a new, unused CIDR.

@Smithx10
Author

We received some errors:

unable to determine direct routing device. Use --direct-routing-device to specify it

Do we need to create new devices on the host, or should cilium be reusing existing host devices?
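
In case it helps, the flag named in that error can usually be passed through helm; a hedged sketch where eth0 is only a placeholder for whichever host NIC carries node traffic, and whether devices, the extra agent arg, or both are needed depends on the environment:

helm upgrade cilium cilium/cilium --version 1.15.3 \
	--namespace kube-system \
	--reuse-values \
	--set devices='{eth0}' \
	--set-json extraArgs='["--direct-routing-device=eth0"]'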

@Smithx10
Author

Is it preferred to use Cilium for LB, Policy and Observation?

@zhangzujian
Member

Are you installing cilium in a k8s cluster where kube-ovn has already been installed?

@zhangzujian
Member

Is it preferred to use Cilium for LB, Policy and Observation?

Kube-OVN with cilium chaining is not a stable solution, although the combination has passed the k8s network e2e test suite.

@Smithx10
Author

@zhangzujian, we are currently hoping to use Kube-OVN and Kubernetes as our internal scheduler to deliver an internal private cloud solution for virtual machines / containers (similar to OpenStack / Triton).

We'd like to know the most stable / performant / operator-friendly way to run without the need for HW offloading.

Are people using the cilium chaining in production?

We are told that kube-proxy can really affect performance, and we were looking at ways to avoid that from day 1, but we don't want to introduce Cilium if it will be trouble in the future. Do you think there is a great risk in introducing Cilium for proxying / observation, or is it not necessary and we should just use Kube-OVN to avoid the performance issues around kube-proxy?

Your suggestions are very much welcome,
Thank You

@Smithx10
Author

While trying to use Cilium chaining, I noticed that deploying a pod on an underlay subnet resulted in the gateway check timing out. Am I missing some simple configuration?

10m                 Warning   FailedCreatePodSandBox         Pod/t2                Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "d618540e642604624ab2960d2d8ba01030324340b2d8d6123c65619323258dff": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: 'ContainerID:"d618540e642604624ab2960d2d8ba01030324340b2d8d6123c65619323258dff" Netns:"/var/run/netns/cni-4af0f876-9e12-6886-f7c0-7343a01bc3da" IfName:"eth0" Args:"K8S_POD_NAME=t2;K8S_POD_INFRA_CONTAINER_ID=d618540e642604624ab2960d2d8ba01030324340b2d8d6123c65619323258dff;K8S_POD_UID=59419201-b1fc-4dc2-afed-7423b1f139ca;IgnoreUnknown=1;K8S_POD_NAMESPACE=default" Path:"" ERRORED: error configuring pod [default/t2] networking: [default/t2/59419201-b1fc-4dc2-afed-7423b1f139ca:generic-veth]: error adding container to network "generic-veth": plugin type="kube-ovn" failed (add): RPC failed; request ip return 500 configure nic failed network 10.91.237.1/24 with gateway 10.91.237.254 is not ready for interface eth0 after 200 checks: resolve MAC address of 10.91.237.254 timeout: read packet 00:00:00:93:b7:10: i/o timeout
': StdinData: {"capabilities":{"portMappings":true},"clusterNetwork":"/host/etc/cni/net.d/05-cilium.conflist","cniVersion":"0.3.1","logLevel":"verbose","logToStderr":true,"name":"multus-cni-network","type":"multus-shim"}
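
For what it's worth, a hedged way to check from the node side whether the localnet mapping toward that underlay gateway is in place; the pod name and bridge name below are placeholders, and the app=ovs label is assumed from the default manifests:

kubectl -n kube-system get pods -l app=ovs -o wide
kubectl -n kube-system exec <ovs-ovn-pod-on-that-node> -- \
	ovs-vsctl get Open_vSwitch . external_ids:ovn-bridge-mappings
kubectl -n kube-system exec <ovs-ovn-pod-on-that-node> -- \
	ovs-vsctl list-ports br-<provider-network>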

@Smithx10
Author

Smithx10 commented Apr 16, 2024

While attempting to debug the previous message, I wanted to uninstall the Cilium integration, but I noticed that this isn't documented. I attempted to helm uninstall cilium and removed the files /etc/cni/net.d/00-multus.conf and 05-cilium.conf.

It seems like anything placed on an underlay subnet now cannot reach the gateway.

I'd like to uninstall the cilium integration and get back to normal functionality.

switch 88a16dca-c621-4801-b754-8393b7e78e16 (external2080)
    port t2.default
        addresses: ["00:00:00:AE:C3:21 10.91.237.1"]
    port localnet.external2080
        type: localnet
        tag: 2080
        addresses: ["unknown"]
switch 9a9d40e6-934b-487a-bbd7-98de0fa3cfce (external)
    port localnet.external
        type: localnet
        tag: 1998
        addresses: ["unknown"]
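
In case it helps future readers, a hedged rollback sketch based only on the pieces installed earlier in this thread (not verified; adjust to your environment):

helm uninstall cilium --namespace kube-system
kubectl delete -f https://raw.githubusercontent.com/kubeovn/kube-ovn/master/yamls/cilium-chaining.yaml
# re-run the installer without the chaining overrides so kube-ovn writes its
# own CNI conflist back at the default priority, then restart affected pods;
# note that if the cluster was created without kube-proxy, Services also need
# kube-proxy (or another replacement) restored
bash dist/images/install.sh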
