Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

log_backup: TiDB may panic when canceling task. #50839

Closed
YuJuncen opened this issue Jan 31, 2024 · 1 comment · Fixed by #50869
Closed

log_backup: TiDB may panic when canceling task. #50839

YuJuncen opened this issue Jan 31, 2024 · 1 comment · Fixed by #50869
Labels
affects-6.5 affects-7.1 affects-7.5 component/br This issue is related to BR of TiDB. found/gs found by gs severity/major type/bug This issue is a bug.

Comments

@YuJuncen
Copy link
Contributor

YuJuncen commented Jan 31, 2024

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

  1. Register a task.
  2. Let TiKV make a flush.
  3. At the same time, remove the task.

2. What did you expect to see? (Required)

The task has been safely removed.

3. What did you see instead (Required)

TiDB panicked at the event handler of the subscriber.

goroutine 143 [running]:
github.com/pingcap/tidb/br/pkg/streamhelper/spans.(*ValueSortedFull).MergeAll(0x0, {0x140008d5f18, 0x1, 0x10673fc38?})
        /Volumes/eXternal/Developer/tidb/br/pkg/streamhelper/spans/value_sorted.go:49 +0xac
github.com/pingcap/tidb/br/pkg/streamhelper/spans.(*ValueSortedFull).Merge(...)
        /Volumes/eXternal/Developer/tidb/br/pkg/streamhelper/spans/value_sorted.go:38
github.com/pingcap/tidb/br/pkg/streamhelper.(*CheckpointAdvancer).SpawnSubscriptionHandler.func1()
        /Volumes/eXternal/Developer/tidb/br/pkg/streamhelper/advancer.go:483 +0x33c
created by github.com/pingcap/tidb/br/pkg/streamhelper.(*CheckpointAdvancer).SpawnSubscriptionHandler in goroutine 141
        /Volumes/eXternal/Developer/tidb/br/pkg/streamhelper/advancer.go:469 +0x2a0

4. What is your TiDB version? (Required)

v6.5.3

@YuJuncen YuJuncen added the type/bug This issue is a bug. label Jan 31, 2024
@kennytm kennytm added the component/br This issue is related to BR of TiDB. label Jan 31, 2024
@kennytm
Copy link
Contributor

kennytm commented Jan 31, 2024

For context: the nil pointer panic is likely due to this line being executed without a lock and/or before the subscriber event stream being cleared.

case EventDel:
utils.LogBackupTaskCountDec()
c.task = nil
c.taskRange = nil
c.checkpoints = nil
// This would be synced by `taskMu`, perhaps we'd better rename that to `tickMu`.
// Do the null check because some of test cases won't equip the advancer with subscriber.
if c.subscriber != nil {
c.subscriber.Clear()
}

this caused c.checkpoints.Merge() here to be called with a nil object and eventually crash the program.

case event, ok := <-es:
if !ok {
return
}
c.checkpointsMu.Lock()
log.Debug("Accepting region flush event.",
zap.Stringer("range", logutil.StringifyRange(event.Key)),
zap.Uint64("checkpoint", event.Value))
c.checkpoints.Merge(event)
c.checkpointsMu.Unlock()

Given that Go supports generics now I think we should refactor the code to replace sync.Mutex with packages like https://pkg.go.dev/github.com/btwiuse/gmx to prevent accidental unlocked access (similar to how we replaced the old procedural sync/atomic API with https://pkg.go.dev/go.uber.org/atomic (though the feature is now available in sync/atomic too)).

@kennytm kennytm added the found/gs found by gs label Jan 31, 2024
YuJuncen added a commit to YuJuncen/tidb that referenced this issue Feb 18, 2024
close pingcap#50839

Signed-off-by: Yu Juncen <yu745514916@live.com>
BornChanger pushed a commit to BornChanger/tidb that referenced this issue Feb 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects-6.5 affects-7.1 affects-7.5 component/br This issue is related to BR of TiDB. found/gs found by gs severity/major type/bug This issue is a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants