When rd_kafka_metadata_cache_topic_update()
is called with the same topic but a different internal topic ID as an existing metadata cache entry, the metadata cache AVL indexed by topic_id will retain a stale pointer to a now-freed metadata cache entry. This causes corruption and, usually in short order, crashes.
Suppose we start with the metadata cache containing a single entry at address a1
, with {topic, topic_id} being {"t1", 1}
. rd_kafka_metadata_cache_insert()
is called with topic "t1"
and topic_id 2
. Follow the code path through rd_kafka_metadata_cache_insert()
:
rkmce = rd_tmpabuf_alloc(&tbuf, sizeof(*rkmce));
rkmce->rkmce_mtopic = *mtopic;
rkmce->rkmce_metadata_internal_topic = *metadata_internal_topic;
rkmce
is the new metadata cache entry, with {topic, topic_id} being {"t1", 2}
.
/* Insert (and replace existing) entry. */
old = RD_AVL_INSERT(&rk->rk_metadata_cache.rkmc_avl, rkmce,
rkmce_avlnode);
old
is a1
, containing {"t1", 1}
. old
has been removed from the AVL indexed by topic; rkmce
is inserted in the AVL indexed by topic.
/* Insert (and replace existing) entry into the AVL tree sorted
* by topic id. */
if (!RD_KAFKA_UUID_IS_ZERO(
rkmce->rkmce_metadata_internal_topic.topic_id)) {
/* If topic id isn't zero insert cache entry into this tree */
old_by_id = RD_AVL_INSERT(&rk->rk_metadata_cache.rkmc_avl_by_id,
rkmce, rkmce_avlnode_by_id);
rkmce
is inserted in the AVL indexed by topic ID. Since there was no previous metadata cache entry with ID 2
, old_by_id
is null. The AVL indexed by topic ID continues to have an entry for ID 1
, still pointing to a1
.
} else if (old && !RD_KAFKA_UUID_IS_ZERO(
old->rkmce_metadata_internal_topic.topic_id)) {
/* If it had a topic id, remove it from the tree */
RD_AVL_REMOVE_ELM(&rk->rk_metadata_cache.rkmc_avl_by_id, old);
}
This is the else
clause to the preceding if
, so it does not run.
if (old) {
/* Delete and free old cache entry */
rd_kafka_metadata_cache_delete(rk, old, 0);
}
The last parameter to rd_kafka_metadata_cache_delete()
is unlink_avl
, whether it is to unlink the cache entry from the AVLs. Since unlink_avl
is 0, rd_kafka_metadata_cache_delete()
doesn't update the AVLs and just frees old
/ a1
.
At this point, the landmine has been armed: the AVL indexed by topic ID has a stale pointer to freed memory a1
. Depending on when and how that memory is reallocated, a variety of ill-effects can occur. Most of these ill-effects are crashes at inconsistent locations, but usually related to metadata cache operations.
if (old_by_id && old_by_id != old) {
/* If there was a different cache entry in this tree,
* remove and free it. */
RD_AVL_REMOVE_ELM(&rk->rk_metadata_cache.rkmc_avl, old_by_id);
rd_kafka_metadata_cache_delete(rk, old_by_id, 0);
}
Since old_by_id
is null, this code doesn't run.
I believe that the bug is that this:
} else if (old && !RD_KAFKA_UUID_IS_ZERO(
old->rkmce_metadata_internal_topic.topic_id)) {
/* If it had a topic id, remove it from the tree */
RD_AVL_REMOVE_ELM(&rk->rk_metadata_cache.rkmc_avl_by_id, old);
}
should not be an else
. Instead it should be:
if (old && !RD_KAFKA_UUID_IS_ZERO(
old->rkmce_metadata_internal_topic.topic_id) &&
rd_kafka_Uuid_cmp(rkmce->rkmce_metadata_internal_topic.topic_id,
old->rkmce_metadata_internal_topic.topic_id) != 0) {
/* If it had a different topic id, remove it from the tree */
RD_AVL_REMOVE_ELM(&rk->rk_metadata_cache.rkmc_avl_by_id, old);
}
That is, if old
had a topic ID which was different from rkmce
, remove old
from the AVL indexed by topic ID. If old
has the same topic ID as rkmce
, then from the preceding if
body, old_by_id
will be equal to old
and properly handled by the existing code.
This seems to solve the problem in my testing.
This was found using librdkafka v2.6.0, but, as far as I can see, it continues to exist on the most recent main.
I suspect this is also the root cause of issues #4904 and #4907.
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4