daemon: Run conntrack GC after Endpoint Restore #32012

joestringer · 2024-04-17T00:06:46Z

Relies on #32068 to debug test failures.

The reverse call tree for RestoreEndpoint, which exposes all restored
endpoints in the EndpointManager, is as follows:

INCOMING CALLS
- f RestoreEndpoint github.com/cilium/cilium/pkg/endpointmanager
  - f regenerateRestoredEndpoints github.com/cilium/cilium/pkg/endpointmanager
    - f initRestore github.com/cilium/cilium/daemon/cmd
      + f startDaemon github.com/cilium/cilium/daemon/cmd

Previously, the CTNATMapGC.Enable() call, which invokes
gc.endpointsManager.GetEndpoints(), would be called prior to exposing
these endpoints in the EndpointManager. As a result, the step where the
initial scan attempts to update each Endpoint's DNSHistory with the
latest CT GC timers would fail, leaving the timestamps empty.

The potential impact of this is that DNS entries that should expire soon
after a cilium-agent restart may not time out for an extra entire
conntrack garbage collection interval several minutes later.

joestringer · 2024-04-17T00:36:29Z

/test

squeed · 2024-04-17T12:17:40Z

Wow, easy.

Is this a mostly cosmetic change (as referenced in #32013), or is this also causing perceptible issues?

squeed · 2024-04-17T12:19:43Z

Because #31205 added an explicit revision counter, this change doesn't introduce any risk of GCing active connections. So that looks good 👍

joestringer · 2024-04-17T17:34:49Z

The main thing that I noticed with #32013 is that #32013 didn't work for the first several minutes after startup without this change. The reason is that #32013 relies on understanding the timing of previous and next conntrack GC events from this logic running against all endpoints. Because of the timing before this PR, the initial GC would not report those prev/next conntrack GC timestamps into the DNS history for each endpoint here (eps would be empty):

cilium/pkg/maps/ctmap/gc/gc.go

Lines 167 to 169 in a6b8538

    
           for _, e := range eps { 
        
           	e.MarkCTGCTime(gcStart) 
        
           }

In my local testing, if I waited another ~7m30s then the next conntrack GC would trigger and the DNS timers would be updated, then everything would work as expected.

The idea behind the conntrack GC on startup is to ensure that when Cilium starts, everything is as up-to-date and synchronized as it can be. So we trigger GC on startup, and for the most part this works. There's only a couple of things that rely on a correct list, (a) marking the alive time for individual connections in Endpoint.DNSZombies and (b) Setting the most recent run time for CT GC inside the Endpoint.DNSZombies. Both of these end up boiling down to a set of checks in the DNS cache GC which asserts that the connections are alive if the timestamp from (a) is older than the timestamp for (b). (a) is presumably restored on startup from a previous run, whereas (b) will remain zero during initial CT GC scan, so all entries are considered "alive" until the second CT GC run, when the timestamps will be refreshed. So it doesn't look like we will prematurely delete DNS entries on startup due to this bug.

$ git grep -C 1 lastCTGCUpdate
pkg/fqdn/cache.go-      // DNSZombieMappings.MarkAlive.
pkg/fqdn/cache.go:      // When AliveAt is later than DNSZombieMappings.lastCTGCUpdate the zombie is
pkg/fqdn/cache.go-      // considered alive.
--
pkg/fqdn/cache.go-      // multiple times.
pkg/fqdn/cache.go:      // When DNSZombieMappings.lastCTGCUpdate is earlier than DeletePendingAt a
pkg/fqdn/cache.go-      // zombie is alive.
--
pkg/fqdn/cache.go-      deletes        map[netip.Addr]*DNSZombieMapping
pkg/fqdn/cache.go:      lastCTGCUpdate time.Time
pkg/fqdn/cache.go-      nextCTGCUpdate time.Time // estimated
--
pkg/fqdn/cache.go-// Zombie is considered dead if all of these conditions apply:
pkg/fqdn/cache.go:// 1. CT GC has run after the DNS Expiry time and grace period (lastCTGCUpdate > DeletePendingAt + GracePeriod), and
pkg/fqdn/cache.go:// 2. The CT GC run did not mark the Zombie alive (lastCTGCUpdate > AliveAt)
pkg/fqdn/cache.go-// 3. CT GC has run at least 2 times since Zombie was entered
--
pkg/fqdn/cache.go-func (zombies *DNSZombieMappings) isConnectionAlive(zombie *DNSZombieMapping) bool {
pkg/fqdn/cache.go:      if !zombies.lastCTGCUpdate.After(zombie.DeletePendingAt.Add(option.Config.ToFQDNsIdleConnectionGracePeriod)) {
pkg/fqdn/cache.go-              return true
pkg/fqdn/cache.go-      }
pkg/fqdn/cache.go:      if !zombies.lastCTGCUpdate.After(zombie.AliveAt) {
pkg/fqdn/cache.go-              return true
--
pkg/fqdn/cache.go-      zombies.Lock()
pkg/fqdn/cache.go:      zombies.lastCTGCUpdate = ctGCStart
pkg/fqdn/cache.go-      zombies.nextCTGCUpdate = estNext

So then the main impact I can identify at this point is just that the garbage collection that we expect to happen within the first ~30s of Cilium startup will be deferred to the first ~10m of Cilium startup. Now I put it that way, the logic seems inconsistent with the intent of the code but it's not that severe of a bug.

joestringer · 2024-04-18T21:55:03Z

/ci-l4lb

nebril

Nice

aspsk · 2024-04-25T09:36:53Z

Could you please rebase?

The reverse call tree for RestoreEndpoint, which exposes all restored endpoints in the EndpointManager, is as follows: INCOMING CALLS - f RestoreEndpoint github.com/cilium/cilium/pkg/endpointmanager - f regenerateRestoredEndpoints github.com/cilium/cilium/pkg/endpointmanager - f initRestore github.com/cilium/cilium/daemon/cmd + f startDaemon github.com/cilium/cilium/daemon/cmd Previously, the `CTNATMapGC.Enable()` call, which invokes `gc.endpointsManager.GetEndpoints()`, would be called prior to exposing these endpoints in the EndpointManager. As a result, the step where the initial scan attempts to update each Endpoint's DNSHistory with the latest CT GC timers would fail, leaving the timestamps empty. The potential impact of this is that DNS entries that should expire soon after a cilium-agent restart may not time out for an extra entire conntrack garbage collection interval several minutes later. Signed-off-by: Joe Stringer <joe@cilium.io>

joestringer · 2024-04-25T16:07:30Z

/test

joestringer added release-note/bug This PR fixes an issue in a previous release of Cilium. needs-backport/1.15 This PR / issue needs backporting to the v1.15 branch labels Apr 17, 2024

joestringer requested a review from squeed April 17, 2024 00:06

joestringer mentioned this pull request Apr 17, 2024

Report estimated expiry timers for connection-based FQDN entries #32013

Merged

joestringer removed the needs-backport/1.15 This PR / issue needs backporting to the v1.15 branch label Apr 17, 2024

joestringer force-pushed the pr/joe/ct-gc-after-restore branch from d1135ed to 15772a8 Compare April 18, 2024 21:54

joestringer changed the base branch from main to pr/joe/improve-nat46x64-debugging April 18, 2024 21:54

joestringer added the dont-merge/blocked Another PR must be merged before this one. label Apr 18, 2024

joestringer marked this pull request as ready for review April 22, 2024 16:26

joestringer requested a review from a team as a code owner April 22, 2024 16:26

joestringer force-pushed the pr/joe/improve-nat46x64-debugging branch from 39c0a25 to e828af6 Compare April 22, 2024 21:34

joestringer requested review from a team as code owners April 22, 2024 21:34

joestringer requested review from aspsk, nebril and viktor-kurchenko and removed request for a team April 22, 2024 21:34

nebril approved these changes Apr 23, 2024

View reviewed changes

joestringer force-pushed the pr/joe/improve-nat46x64-debugging branch from e828af6 to 584a1b6 Compare April 25, 2024 03:06

Base automatically changed from pr/joe/improve-nat46x64-debugging to main April 25, 2024 07:57

joestringer force-pushed the pr/joe/ct-gc-after-restore branch from 15772a8 to 09fac0c Compare April 25, 2024 16:07

joestringer removed request for aspsk and viktor-kurchenko April 25, 2024 16:08

joestringer removed the dont-merge/blocked Another PR must be merged before this one. label Apr 25, 2024

joestringer enabled auto-merge April 25, 2024 16:09

joestringer added this pull request to the merge queue Apr 30, 2024

Merged via the queue into main with commit 6b0d76a Apr 30, 2024
261 of 265 checks passed

joestringer deleted the pr/joe/ct-gc-after-restore branch April 30, 2024 01:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

daemon: Run conntrack GC after Endpoint Restore #32012

daemon: Run conntrack GC after Endpoint Restore #32012

joestringer commented Apr 17, 2024 •

edited

joestringer commented Apr 17, 2024

squeed commented Apr 17, 2024

squeed commented Apr 17, 2024

joestringer commented Apr 17, 2024

joestringer commented Apr 18, 2024

nebril left a comment

aspsk commented Apr 25, 2024

joestringer commented Apr 25, 2024

daemon: Run conntrack GC after Endpoint Restore #32012

daemon: Run conntrack GC after Endpoint Restore #32012

Conversation

joestringer commented Apr 17, 2024 • edited

joestringer commented Apr 17, 2024

squeed commented Apr 17, 2024

squeed commented Apr 17, 2024

joestringer commented Apr 17, 2024

joestringer commented Apr 18, 2024

nebril left a comment

Choose a reason for hiding this comment

aspsk commented Apr 25, 2024

joestringer commented Apr 25, 2024

joestringer commented Apr 17, 2024 •

edited