After upgrading Kafka from version 2.2.1 to 3.5.1, we observed issues related to leader switching. The specific issues and reproduction steps are as follows:
Process:In such cases, the metadata should update correctly, allowing proper leader switching and load balancing.
Actual Behavior:The metadata fails to update correctly, causing all requests to be incorrectly routed to a specific node, forming an error loop.
Additional Information:Relevant Kafka configuration and librdkafka version are as follows:
[2024-07-26 02:30:16.729] [info] ConsumerEvent|my-topic-name|LOG|Success|7|METADATA|[thrd:main]: Topic my-topic-name partition 0 Leader 1 Epoch -1|0||0|false
[2024-07-26 02:31:00.839] [info] ConsumerEvent|my-topic-name|LOG|Success|7|METADATA|[thrd:main]: Topic my-topic-name partition 0 Leader 1 Epoch -1|0||0|false
[2024-07-26 02:31:01.334] [info] ConsumerEvent|my-topic-name|LOG|Success|7|METADATA|[thrd:main]: Topic my-topic-name partition 0 Leader 2 Epoch -1|0||0|false
[2024-07-26 02:31:01.642] [info] ConsumerEvent|my-topic-name|LOG|Success|7|METADATA|[thrd:main]: Topic my-topic-name partition 0 Leader 2 Epoch -1|0||0|false
[2024-07-26 02:31:01.835] [info] ConsumerEvent|my-topic-name|LOG|Success|7|METADATA|[thrd:main]: Topic my-topic-name partition 0 Leader 1 Epoch 75|0||0|false
[2024-07-26 02:31:02.338] [info] ConsumerEvent|my-topic-name|LOG|Success|7|METADATA|[thrd:main]: Topic my-topic-name partition 0 Leader 1 Epoch 75|0||0|false
[2024-07-26 02:31:02.846] [info] ConsumerEvent|my-topic-name|LOG|Success|7|METADATA|[thrd:main]: Topic my-topic-name partition 0 Leader 2 Epoch -1|0||0|false
[2024-07-26 02:31:03.352] [info] ConsumerEvent|my-topic-name|LOG|Success|7|METADATA|[thrd:main]: Topic my-topic-name partition 0 Leader 2 Epoch -1|0||0|false
[2024-07-26 02:31:03.854] [info] ConsumerEvent|my-topic-name|LOG|Success|7|METADATA|[thrd:main]: Topic my-topic-name partition 0 Leader 2 Epoch -1|0||0|false
[2024-07-26 02:31:04.436] [info] ConsumerEvent|my-topic-name|LOG|Success|7|METADATA|[thrd:main]: Topic my-topic-name partition 0 Leader 2 Epoch -1|0||0|false
[2024-07-26 02:31:04.880] [info] ConsumerEvent|my-topic-name|LOG|Success|7|METADATA|[thrd:main]: Topic my-topic-name partition 0 Leader 2 Epoch -1|0||0|false
[2024-07-26 02:31:05.388] [info] ConsumerEvent|my-topic-name|LOG|Success|7|METADATA|[thrd:main]: Topic my-topic-name partition 0 Leader 2 Epoch -1|0||0|false
[2024-07-26 02:31:05.894] [info] ConsumerEvent|my-topic-name|LOG|Success|7|METADATA|[thrd:main]: Topic my-topic-name partition 0 Leader 2 Epoch -1|0||0|false
[2024-07-26 02:31:06.400] [info] ConsumerEvent|my-topic-name|LOG|Success|7|METADATA|[thrd:main]: Topic my-topic-name partition 0 Leader 2 Epoch -1|0||0|false
[2024-07-26 02:31:06.903] [info] ConsumerEvent|my-topic-name|LOG|Success|7|METADATA|[thrd:main]: Topic my-topic-name partition 0 Leader 2 Epoch -1|0||0|false
[2024-07-26 02:31:07.406] [info] ConsumerEvent|my-topic-name|LOG|Success|7|METADATA|[thrd:main]: Topic my-topic-name partition 0 Leader 2 Epoch -1|0||0|false
[2024-07-26 02:31:07.907] [info] ConsumerEvent|my-topic-name|LOG|Success|7|METADATA|[thrd:main]: Topic my-topic-name partition 0 Leader 2 Epoch -1|0||0|false
[2024-07-26 02:31:08.411] [info] ConsumerEvent|my-topic-name|LOG|Success|7|METADATA|[thrd:main]: Topic my-topic-name partition 0 Leader 2 Epoch -1|0||0|false
[2024-07-26 02:31:08.920] [info] ConsumerEvent|my-topic-name|LOG|Success|7|METADATA|[thrd:main]: Topic my-topic-name partition 0 Leader 2 Epoch -1|0||0|false
[2024-07-26 02:31:09.436] [info] ConsumerEvent|my-topic-name|LOG|Success|7|METADATA|[thrd:main]: Topic my-topic-name partition 0 Leader 2 Epoch -1|0||0|false
[2024-07-26 02:31:09.938] [info] ConsumerEvent|my-topic-name|LOG|Success|7|METADATA|[thrd:main]: Topic my-topic-name partition 0 Leader 2 Epoch -1|0||0|false
a lot of same info
[2024-07-26 02:46:03.228] [info] ConsumerEvent|my-topic-name|LOG|Success|7|METADATA|[thrd:main]: Topic my-topic-name partition 0 Leader 2 Epoch -1|0||0|false
[2024-07-26 02:46:03.733] [info] ConsumerEvent|my-topic-name|LOG|Success|7|METADATA|[thrd:main]: Topic my-topic-name partition 0 Leader 2 Epoch -1|0||0|false
[2024-07-26 02:46:04.236] [info] ConsumerEvent|my-topic-name|LOG|Success|7|METADATA|[thrd:main]: Topic my-topic-name partition 0 Leader 2 Epoch -1|0||0|false
[2024-07-26 02:46:04.748] [info] ConsumerEvent|my-topic-name|LOG|Success|7|METADATA|[thrd:main]: Topic my-topic-name partition 0 Leader 2 Epoch -1|0||0|false
[2024-07-26 02:46:05.241] [info] ConsumerEvent|my-topic-name|LOG|Success|7|METADATA|[thrd:main]: Topic my-topic-name partition 0 Leader 2 Epoch -1|0||0|false
[2024-07-26 02:46:05.746] [info] ConsumerEvent|my-topic-name|LOG|Success|7|METADATA|[thrd:main]: Topic my-topic-name partition 0 Leader 2 Epoch -1|0||0|false
[2024-07-26 02:46:06.640] [info] ConsumerEvent|my-topic-name|LOG|Success|7|METADATA|[thrd:main]: Topic my-topic-name partition 0 Leader 2 Epoch 77|0||0|false
code
static int rd_kafka_toppar_leader_update(rd_kafka_topic_t *rkt,
int32_t partition,
int32_t leader_id,
rd_kafka_broker_t *leader,
int32_t leader_epoch) {
rd_kafka_toppar_t *rktp;
rd_bool_t need_epoch_validation = rd_false;
int r = 0;
rktp = rd_kafka_toppar_get(rkt, partition, 0);
if (unlikely(!rktp)) {
/* Have only seen this in issue #132.
* Probably caused by corrupt broker state. */
rd_kafka_log(rkt->rkt_rk, LOG_WARNING, "BROKER",
"%s [%" PRId32
"] is unknown "
"(partition_cnt %i): "
"ignoring leader (%" PRId32 ") update",
rkt->rkt_topic->str, partition,
rkt->rkt_partition_cnt, leader_id);
return -1;
}
rd_kafka_toppar_lock(rktp);
if (leader_epoch < rktp->rktp_leader_epoch) {
rd_kafka_dbg(rktp->rktp_rkt->rkt_rk, TOPIC, "BROKER",
"%s [%" PRId32
"]: ignoring outdated metadata update with "
"leader epoch %" PRId32
" which is older than "
"our cached epoch %" PRId32,
rktp->rktp_rkt->rkt_topic->str,
rktp->rktp_partition, leader_epoch,
rktp->rktp_leader_epoch);
if (rktp->rktp_fetch_state !=
RD_KAFKA_TOPPAR_FETCH_VALIDATE_EPOCH_WAIT) {
rd_kafka_toppar_unlock(rktp);
rd_kafka_toppar_destroy(rktp); /* from get() */
return 0; // <----------------------------------------------------------it quit directly
}
}
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4