Using some internal QUIC tests, we see this timing in 1.1:
$ USE_GQUIC_VERSIONS=1 perf record -F 999 -g ./quic_lib_tests --gtest_repeat=100 --gtest_filter=*ZeroRttDisabled*
$ perf report
+ 31.73% 31.55% quic_lib_tests quic_lib_tests [.] bn_sqr8x_internal
+ 9.28% 9.28% quic_lib_tests quic_lib_tests [.] mul4x_internal
+ 4.91% 4.91% quic_lib_tests quic_lib_tests [.] sha256_block_data_order_avx
In 3.0 we see this:
$ USE_GQUIC_VERSIONS=1 perf record -F 999 -g ./quic_lib_tests --gtest_repeat=100 --gtest_filter=*ZeroRttDisabled*
$ perf report
+ 11.02% 10.99% quic_lib_tests quic_lib_tests [.] bn_sqr8x_internal
+ 8.38% 8.08% quic_lib_tests libpthread-2.31.so [.] __pthread_rwlock_rdlock
+ 7.65% 7.51% quic_lib_tests libpthread-2.31.so [.] __pthread_rwlock_unlock
+ 4.98% 4.78% quic_lib_tests quic_lib_tests [.] getrn
+ 4.14% 4.11% quic_lib_tests quic_lib_tests [.] mul4x_internal
+ 3.37% 2.57% quic_lib_tests quic_lib_tests [.] ossl_tolower
3.30% 3.30% quic_lib_tests quic_lib_tests [.] ossl_lh_strcasehash
+ 2.72% 2.13% quic_lib_tests quic_lib_tests [.] OPENSSL_strcasecmp
+ 2.29% 2.05% quic_lib_tests quic_lib_tests [.] ossl_lib_ctx_get_data
+ 1.93% 1.93% quic_lib_tests quic_lib_tests [.] sha256_block_data_order_avx
This seems to be part of OSSL_DECODER_CTX_new_for_pkey.
About 16% of the time is spent in locking, in a single-threaded binary, and about 10% is in a string hash table lookup.
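To make the arithmetic behind those figures easy to re-check, here is a minimal sketch that sums the 3.0 `perf report` rows above by category. The grouping of symbols into "locking" and "string/hash lookup" is my own assumption (in particular, counting `getrn` as part of the hash table lookup), and the Children column can count nested calls more than once, so the totals are a rough guide and need not match the rounded figures quoted above.

```python
import re

# The 3.0 perf report rows, copied from the output above.
PERF_30 = """\
+ 11.02% 10.99% quic_lib_tests quic_lib_tests [.] bn_sqr8x_internal
+ 8.38% 8.08% quic_lib_tests libpthread-2.31.so [.] __pthread_rwlock_rdlock
+ 7.65% 7.51% quic_lib_tests libpthread-2.31.so [.] __pthread_rwlock_unlock
+ 4.98% 4.78% quic_lib_tests quic_lib_tests [.] getrn
+ 4.14% 4.11% quic_lib_tests quic_lib_tests [.] mul4x_internal
+ 3.37% 2.57% quic_lib_tests quic_lib_tests [.] ossl_tolower
3.30% 3.30% quic_lib_tests quic_lib_tests [.] ossl_lh_strcasehash
+ 2.72% 2.13% quic_lib_tests quic_lib_tests [.] OPENSSL_strcasecmp
+ 2.29% 2.05% quic_lib_tests quic_lib_tests [.] ossl_lib_ctx_get_data
+ 1.93% 1.93% quic_lib_tests quic_lib_tests [.] sha256_block_data_order_avx
"""

# Hypothetical grouping, chosen for illustration only.
GROUPS = {
    "locking": {"__pthread_rwlock_rdlock", "__pthread_rwlock_unlock"},
    "string/hash lookup": {
        "getrn",
        "ossl_tolower",
        "ossl_lh_strcasehash",
        "OPENSSL_strcasecmp",
    },
}

# Columns: optional "+", Children %, Self %, command, shared object, "[.]", symbol.
LINE = re.compile(r"^\+?\s*([\d.]+)%\s+([\d.]+)%\s+\S+\s+\S+\s+\[\.\]\s+(\S+)$")


def parse(report):
    """Map each symbol to its (children %, self %) pair."""
    rows = {}
    for line in report.splitlines():
        match = LINE.match(line.strip())
        if match:
            rows[match.group(3)] = (float(match.group(1)), float(match.group(2)))
    return rows


def group_totals(rows, groups):
    """Sum the Children and Self columns for each group of symbols."""
    totals = {}
    for name, symbols in groups.items():
        children = sum(rows[s][0] for s in symbols if s in rows)
        self_pct = sum(rows[s][1] for s in symbols if s in rows)
        totals[name] = (round(children, 2), round(self_pct, 2))
    return totals


TOTALS = group_totals(parse(PERF_30), GROUPS)
for name, (children, self_pct) in TOTALS.items():
    print(f"{name}: children {children:.2f}%, self {self_pct:.2f}%")
```

The Self column is the safer one to add up, since each sample is attributed to exactly one symbol there.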
If anyone on the project is going to look at this, I will try to get a small reproducer. But the time for our QUIC tests is doubling.