A flaky test is a test that sometimes fails, but if you retry it enough times, it eventually passes.
What are the potential causes for a test to be flaky?
State leak
Label: flaky-test::state leak
Description: Data state has leaked from a previous test. The actual cause is probably not the flaky test here.
Difficulty to reproduce: Moderate. Usually, running the same spec files in the same order until reaching the failing one reproduces the problem.
Resolution: Fix the previous tests and/or places where the test data or environment is modified, so that it's reset to a pristine state after each test.
Examples:
- A model created with let_it_be is shared between test examples, while some test modifies the model, either deliberately or unwillingly, causing out-of-sync data in test examples. This can result in PG::QueryCanceled: ERROR in the subsequent test examples or retries. For more information about state leakages and resolution options, see GitLab testing best practices.
- A let_it_be depended on a stub defined in a before block. let_it_be executes during before(:all), so the stub was not yet set. This exposed the tests to the actual method call, which happened to use a method cache.
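To make the first failure mode concrete, here is a minimal hypothetical sketch; the spec, factory, and attribute names are illustrative and not taken from the examples above:
# Hypothetical sketch of a state leak: the shared record is mutated by one example.
RSpec.describe 'state leak between examples' do
  let_it_be(:user) { create(:user, name: 'Alice') }

  it 'updates the name' do
    user.update!(name: 'Bob')

    expect(user.reload.name).to eq('Bob')
  end

  it 'expects the original name' do
    # Flaky: only passes when it runs before the example above.
    expect(user.reload.name).to eq('Alice')
  end
end
Avoiding mutation of records created with let_it_be, or switching the mutated record to let/let! so each example gets its own copy, removes the order dependency.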
Dataset-specific
Label: flaky-test::dataset-specific
Description: The test assumes the dataset is in a particular (usually limited) state or order, which might not be true depending on when the test runs during the test suite.
Difficulty to reproduce: Moderate, as the amount of data needed to reproduce the issue might be difficult to achieve locally. Ordering issues are easier to reproduce by running the tests several times.
Resolution: Fix the test so that it doesn't assume a particular dataset state or ordering: don't hardcode record IDs, and add explicit ORDER BY clauses (or sort the results) before asserting on order.
Examples:
- A test that starts failing on master if the order of tests changes.
- A test that relies on a record having a specific ID (for example, 42). If the test is run early in the test suite, it might pass because not enough records were created before it, but as soon as it runs later in the suite, there could be a record that actually has the ID 42, hence the test would start to fail.
- Without an explicit ORDER BY, the database does not guarantee deterministic ordering, or a data race can happen in the tests.
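As a hypothetical illustration of the hardcoded-ID and ordering problems (the model and factory are placeholders, not taken from the examples above):
# Hypothetical sketch: the first expectation depends on global dataset state and
# implicit database ordering; the second only depends on data the spec created.
RSpec.describe 'listing issues' do
  let_it_be(:issues) { create_list(:issue, 3) }

  it 'is brittle' do
    expect(Issue.pluck(:id)).to eq([1, 2, 3]) # assumes specific IDs and implicit order
  end

  it 'is robust' do
    expect(Issue.pluck(:id)).to match_array(issues.map(&:id))
  end
end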
Too many SQL queries
Label: flaky-test::too-many-sql-queries
Description: The SQL query limit has been reached, triggering Gitlab::QueryLimiting::Transaction::ThresholdExceededError.
Difficulty to reproduce: Moderate; this failure may depend on the state of the query cache, which can be impacted by the order of specs.
Resolution: See query count limits docs.
Random input
Label: flaky-test::random input
Description: The test uses random values that sometimes match the expectations and sometimes do not.
Difficulty to reproduce: Easy, as the test can be modified locally to use the "random value" that was used at the time the test failed.
Resolution: Once the problem is reproduced, it should be easy to debug and fix either the test or the app.
Examples:
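For illustration, a hypothetical sketch of the pattern (discount_for is a made-up helper, not GitLab code):
# Hypothetical sketch: the first example's outcome depends on what rand returns.
RSpec.describe 'discount calculation' do
  def discount_for(quantity)
    quantity >= 10 ? 5 : 0
  end

  it 'is brittle' do
    quantity = rand(1..10)

    expect(discount_for(quantity)).to eq(0) # fails whenever rand returns 10
  end

  it 'is robust' do
    expect(discount_for(9)).to eq(0)
    expect(discount_for(10)).to eq(5)
  end
end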
Unreliable DOM selector
Label: flaky-test::unreliable dom selector
Description: The DOM selector used in the test is unreliable.
Difficulty to reproduce: Moderate to difficult, depending on whether the DOM selector is duplicated, appears only after a delay, and so on. Adding a delay in the API or controller could help reproduce the issue.
Resolution: It really depends on the problem. It could be waiting for requests to finish, scrolling down the page, and so on.
Examples:
- A test that fails with an element not found error.
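For illustration, a hypothetical feature-spec sketch (the route helper, data-testid, and expected count are assumptions, and setup such as sign-in is omitted). The key difference is that the have_text matcher retries until the content appears, while a captured .text value does not:
# Hypothetical sketch: reading .text eagerly races the async rendering,
# while have_text retries until Capybara's wait time is exhausted.
RSpec.describe 'issue list', :js do
  let_it_be(:project) { create(:project) }

  it 'is brittle' do
    visit project_issues_path(project)

    count_text = find('[data-testid="issue-count"]').text # read once, no retry
    expect(count_text).to eq('3')
  end

  it 'is robust' do
    visit project_issues_path(project)

    expect(find('[data-testid="issue-count"]')).to have_text('3')
  end
end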
Datetime-sensitive
Label: flaky-test::datetime-sensitive
Description: The test assumes a specific date or time.
Difficulty to reproduce: Easy to moderate, depending on whether the test consistently fails after a certain date, or only fails at a given time or date.
Resolution: Freezing the time is usually a good solution.
Examples:
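For illustration, a hypothetical sketch using ActiveSupport's freeze_time helper (this assumes ActiveSupport::Testing::TimeHelpers is included in the suite):
# Hypothetical sketch: the brittle example can fail if midnight passes between
# the two Date calls; freezing the time removes that window entirely.
it 'is brittle' do
  due_date = Date.tomorrow

  expect(due_date).to eq(Date.current + 1.day)
end

it 'is robust' do
  freeze_time do
    due_date = Date.tomorrow

    expect(due_date).to eq(Date.current + 1.day)
  end
end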
Unstable infrastructure
Label: flaky-test::unstable infrastructure
Description: The test fails from time to time due to infrastructure issues.
Difficulty to reproduce: Hard. It’s really hard to reproduce CI infrastructure issues. It might be possible by using containers locally.
Resolution: Starting a conversation with the Infrastructure department in a dedicated issue is usually a good idea.
Examples:
Improper synchronization
Label: flaky-test::improper synchronization
Description: A flaky test issue arising from timing-related factors, such as delays, eventual consistency, asynchronous operations, or race conditions. These issues may stem from shortcomings in the test logic, the system under test, or their interaction. While tests can sometimes address these issues through improved synchronization, they may also reveal underlying system bugs that require resolution.
Difficulty to reproduce: Moderate. It can be reproduced, for example, in feature tests by attempting to reference an element on a page that is not yet rendered, or in unit tests by failing to wait for an asynchronous operation to complete.
Resolution: In the end-to-end test suite, use an eventually matcher so the expectation is retried until it passes or times out.
Examples:
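For illustration, a hypothetical feature-spec sketch (the issue record, route helper, and form fields are made up); wait_for_requests is assumed to be the suite's helper for waiting on in-flight requests:
# Hypothetical sketch: asserting on a database side effect right after the click
# races the AJAX request; waiting for in-flight requests synchronizes first.
RSpec.describe 'editing an issue', :js do
  let_it_be(:issue) { create(:issue, title: 'Original') }

  it 'is brittle' do
    visit edit_project_issue_path(issue.project, issue)
    fill_in 'Title', with: 'Updated'
    click_button 'Save changes'

    # The request may still be in flight when this expectation runs.
    expect(issue.reload.title).to eq('Updated')
  end

  it 'is robust' do
    visit edit_project_issue_path(issue.project, issue)
    fill_in 'Title', with: 'Updated'
    click_button 'Save changes'

    wait_for_requests

    expect(issue.reload.title).to eq('Updated')
  end
end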
How to reproduce a flaky test locally
1. Find the seed from the CI job log, or run while :; do bin/rspec <spec> || break; done in a loop until it fails to find a seed.
2. Run bin/rspec --seed <previously found> --require ./config/initializers/macos.rb --bisect <spec> to narrow the failure down to a minimal combination of examples.
3. Fix the problem. let_it_be is a common source of problems.
4. Verify the fix: run the spec with the previously found seed, run scripts/rspec_check_order_dependence to ensure the spec can be run in random order, and run while :; do bin/rspec <spec> || break; done in a loop again (and grab lunch) to verify it's no longer flaky.

Quarantined tests
When we have a flaky test in master, create a ~"failure::flaky-test" issue for it and, if the test cannot be fixed in a timely fashion, quarantine it as described below.
Fast quarantine
Unless you really need to have a test disabled very fast (< 10 min), consider using the ~pipeline::expedited label instead.
To quickly quarantine a test without having to open a merge request and wait for pipelines, you can follow the fast quarantining process.
Always proceed to open a long-term quarantine merge request after fast-quarantining a test! This is to ensure the fast-quarantined test was correctly fixed by running tests from the CI/CD pipelines (which are not run in the context of the fast-quarantine project).
Long-term quarantine
Once a test is fast-quarantined, you can proceed with the long-term quarantining process. This can be done by opening a merge request.
First, make sure the test file has feature_category metadata, to ensure correct attribution of the test file.
Then, you can use the quarantine: '<issue url>' metadata with the URL of the ~"failure::flaky-test" issue you created previously.
# Quarantine a single spec
it 'succeeds', quarantine: 'https://gitlab.com/gitlab-org/gitlab/-/issues/12345' do
expect(response).to have_gitlab_http_status(:ok)
end
# Quarantine a describe/context block
describe '#flaky-method', quarantine: 'https://gitlab.com/gitlab-org/gitlab/-/issues/12345' do
[...]
end
This means the test will be skipped in CI. By default, quarantined tests still run locally.
We can skip them in local development as well by running with --tag ~quarantine:
# Bash
bin/rspec --tag ~quarantine
# ZSH
bin/rspec --tag \~quarantine
Also, ensure that:
Note that we should not quarantine a shared example/context, and we cannot quarantine a call to it_behaves_like or include_examples:
# Will be flagged by Rubocop
shared_examples 'loads all the users when opened', quarantine: 'https://gitlab.com/gitlab-org/gitlab/-/issues/12345' do
[...]
end
# Does not work
it_behaves_like 'a shared example', quarantine: 'https://gitlab.com/gitlab-org/gitlab/-/issues/12345'
# Does not work
include_examples 'a shared example', quarantine: 'https://gitlab.com/gitlab-org/gitlab/-/issues/12345'
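One possible workaround, based on how RSpec metadata inheritance works rather than on guidance from the section above, is to put the quarantine metadata on an enclosing block:
# Works: metadata on the context is inherited by the examples that
# it_behaves_like generates inside it.
context 'with a shared example', quarantine: 'https://gitlab.com/gitlab-org/gitlab/-/issues/12345' do
  it_behaves_like 'a shared example'
end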
After the long-term quarantining MR has reached production, you should revert the fast-quarantine MR you created earlier.
Find quarantined tests by feature category
To find all quarantined tests for a feature category, use ripgrep:
rg -l --multiline -w "(?s)feature_category:\s+:global_search.+quarantine:"
Jest
For Jest specs, you can use the .skip method along with the eslint-disable-next-line comment to disable the jest/no-disabled-tests ESLint rule and include the issue URL. Here's an example:
// quarantine: https://gitlab.com/gitlab-org/gitlab/-/issues/56789
// eslint-disable-next-line jest/no-disabled-tests
it.skip('should throw an error', () => {
expect(response).toThrowError(expected_error)
});
This means it is skipped unless the test suite is run with the --runInBand Jest command line option:
A list of files with quarantined specs in them can be found with the command:
For both test frameworks, make sure to add the ~"quarantined test" label to the issue.
Once a test is in quarantine, there are 3 choices:
- Fix the test (that is, get rid of its flakiness).
- Move the test to a lower level of testing.
- Remove the test entirely (for example, because there's already a lower-level test that covers the same behavior).
Automatic retries and flaky tests detection
On our CI, we use RSpec::Retry to automatically retry a failing example a few times (see spec/spec_helper.rb for the precise retries count).
We also use a custom Gitlab::RspecFlaky::Listener. This listener runs in the update-tests-metadata job in maintenance scheduled pipelines on the master branch, and saves flaky examples to rspec/flaky/report-suite.json. The report file is then retrieved by the retrieve-tests-metadata job in all pipelines.
This was originally implemented in: https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/13021.
If you want to enable retries locally, you can use the RETRIES environment variable. For instance, RETRIES=1 bin/rspec ... would retry the failing examples once.
To generate the reports locally, use the FLAKY_RSPEC_GENERATE_REPORT environment variable. For example, FLAKY_RSPEC_GENERATE_REPORT=1 bin/rspec ...
rspec/flaky/report-suite.json report
The rspec/flaky/report-suite.json report is imported into Snowflake once per day, for monitoring with the internal dashboard.
Problems we had in the past at GitLab
- rspec-retry is biting us when some API specs fail: https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/9825
- PG::UniqueViolation: https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/9846
- Make spec/mailers/notify_spec.rb more robust: https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/10015
- spec/requests/api/commits_spec.rb: https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/9944

Order-dependent flaky tests
To identify ordering issues in a single file, read about how to reproduce a flaky test locally.
Some flaky tests can fail depending on the order they run with other tests. For example:
To identify the ordering issues across different files, you can use scripts/rspec_bisect_flaky, which would give us the minimal test combination to reproduce the failure:
First, obtain the list of specs that ran before the flaky test. You can search for the list under Knapsack node specs: in the CI job output log.
Save the list of specs as a file, and run:
cat knapsack_specs.txt | xargs scripts/rspec_bisect_flaky
If there is an order-dependency issue, the script above will print the minimal reproduction.
Time-sensitive flaky tests

Hanging specs
If a spec hangs, or times out in CI, it might be caused by a LoadInterlockAwareMonitor deadlock bug in Rails.
To diagnose, you can use sigdump to print the Ruby thread dump:
1. Run the hanging spec locally.
2. Trigger the Ruby thread dump by sending sigdump's trigger signal (SIGCONT by default) to the spec process, for example kill -CONT <pid>.
3. The thread dump will be saved to the /tmp/sigdump-<pid>.log file.
If you see lines with load_interlock_aware_monitor.rb, this is likely related:
/builds/gitlab-org/gitlab/vendor/ruby/3.2.0/gems/activesupport-7.0.8.4/lib/active_support/concurrency/load_interlock_aware_monitor.rb:17:in `mon_enter'
/builds/gitlab-org/gitlab/vendor/ruby/3.2.0/gems/activesupport-7.0.8.4/lib/active_support/concurrency/load_interlock_aware_monitor.rb:22:in `block in synchronize'
/builds/gitlab-org/gitlab/vendor/ruby/3.2.0/gems/activesupport-7.0.8.4/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `handle_interrupt'
/builds/gitlab-org/gitlab/vendor/ruby/3.2.0/gems/activesupport-7.0.8.4/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `synchronize'
See examples where we worked around it by creating the factories before making requests:
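For illustration, a hypothetical request-spec sketch of that idea (not one of the linked examples): all records are created with let_it_be before any request is issued, so no factory runs while a request holds the load interlock.
RSpec.describe API::Projects do
  # Create everything up front, before the first request.
  let_it_be(:user) { create(:user) }
  let_it_be(:project) { create(:project, namespace: user.namespace) }

  it 'returns the project' do
    get api("/projects/#{project.id}", user)

    expect(response).to have_gitlab_http_status(:ok)
  end
end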
It could help to split large RSpec files into multiple files to narrow down the context and identify the problematic tests.
Recreate job failure in CI by forcing the job to run the same set of test files
Reproducing a job failure in CI always helps with troubleshooting why and how a test fails. This requires running the same test files with the same spec order. Since we use Knapsack to distribute tests across parallelized jobs, and files can be distributed differently between two pipelines, we can hardcode this job distribution through the following steps:
1. Reset your gitlab-org/gitlab branch to the same commit to ensure we are running with the same copy of the project.
2. In the CI job log, find the line starting with Running command: bundle exec rspec; the last argument of this command should contain a list of filenames. Copy this list.
3. Open tooling/lib/tooling/parallel_rspec_runner.rb, where the test file distribution happens. Have a look at this merge request as an example: store the file list you copied from step 2 into a TEST_FILES constant and have RSpec run this list by updating the rspec_command method as done in the example MR.
4. Adjust spec/tooling/lib/tooling/parallel_rspec_runner_spec.rb so it doesn't cause your pipeline to fail early.
5. Update the spec/support/rspec_order.rb file by hard-coding Kernel.srand with the value shown in the originally failing job, as done here. You can find the srand value in the job log by searching for Randomized with seed, which is followed by this value.
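As a rough sketch of steps 3 and 5 (this is not the actual code in tooling/lib/tooling/parallel_rspec_runner.rb or spec/support/rspec_order.rb; the file names and seed value are placeholders):
# Step 3 sketch: pin the exact spec files copied from the failing job's log,
# so every run gets the same file distribution instead of Knapsack's.
TEST_FILES = %w[
  spec/models/user_spec.rb
  spec/requests/api/projects_spec.rb
].freeze

def rspec_command
  # Run the pinned list instead of the dynamically assigned files.
  "bundle exec rspec #{TEST_FILES.join(' ')}"
end

# Step 5 sketch: pin the random order seed to the value reported after
# "Randomized with seed" in the failing job's log.
Kernel.srand(12345)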
Slow tests
We collect information about test duration in the rspec_profiling_stats project. The data is shown using GitLab Pages in this UI.
In this issue, we defined thresholds for test duration that can act as a guide.
For tests that are above the thresholds, we automatically report slowness occurrences in [Test] issues so that groups can improve them.
For tests that are slow for a legitimate reason, and to skip issue creation, add allowed_to_be_slow: true.
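For example (a hypothetical spec; import_large_export is a made-up helper, and only the allowed_to_be_slow: true metadata is the relevant part):
# Slow for a legitimate reason, so no slowness issue is created for it.
it 'imports a large project export', allowed_to_be_slow: true do
  expect { import_large_export }.to change { Project.count }.by(1)
end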
The process around these issues is very lightweight. Feel free to close them or not; they're managed automatically:
- If a [Test] issue isn't closed manually, it will be closed automatically after 30 days of inactivity.

Return to Testing documentation