Source: https://docs.gitlab.com/development/testing_guide/unhealthy_tests/

Unhealthy tests | GitLab Docs

Flaky tests

What’s a flaky test?

It’s a test that sometimes fails, but passes if you retry it enough times.

What are the potential causes for a test to be flaky?

State leak

Label: flaky-test::state leak

Description: Data state has leaked from a previous test. The actual cause is probably not the flaky test itself, but an earlier test that modified shared state.

Difficulty to reproduce: Moderate. Usually, running the same spec files up to and including the failing one reproduces the problem.

Resolution: Fix the previous tests and/or the places where the test data or environment is modified, so that it’s reset to a pristine state after each test.
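As a hedged, illustrative sketch (plain Ruby, not GitLab code), a state leak looks like this: one test mutates shared state, and a later test only passes when that state happens to be pristine.

```ruby
# Illustrative only: CACHE stands in for any shared state (database rows,
# class variables, feature flags) that outlives a single test.
CACHE = {}

def test_creates_user
  CACHE[:user] = 'alice' # mutates shared state and never cleans up
end

def current_user
  CACHE.fetch(:user, nil)
end

test_creates_user
current_user # => "alice" (leaked from the "previous test")

# The fix: reset shared state after each test, for example in an
# RSpec after(:each) hook.
CACHE.clear
current_user # => nil
```

In RSpec, the `CACHE.clear` step would live in an `after` hook so every test starts from a pristine state.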

Examples:

Dataset-specific

Label: flaky-test::dataset-specific

Description: The test assumes the dataset is in a particular (usually limited) state or order, which might not be true depending on when the test runs during the test suite.

Difficulty to reproduce: Moderate, as the amount of data needed to reproduce the issue might be difficult to achieve locally. Ordering issues are easier to reproduce by running the tests several times.

Resolution:
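As an illustrative sketch (plain Ruby, not GitLab code), an ordering assumption fails only for some datasets; ordering explicitly before asserting makes the expectation deterministic:

```ruby
# Rows fetched without an explicit ORDER BY can come back in any order,
# so asserting on the implicit order is dataset-specific and flaky.
rows = [{ name: 'bob' }, { name: 'alice' }] # simulated unordered result set

# Flaky: assumes the database returned rows in a particular order.
# expect(rows.map { |r| r[:name] }).to eq(%w[alice bob])

# Deterministic: order explicitly before asserting.
names = rows.map { |r| r[:name] }.sort
names # => ["alice", "bob"]
```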

Examples:

Too Many SQL queries

Label: flaky-test::too-many-sql-queries

Description: The SQL query limit has been reached, triggering Gitlab::QueryLimiting::Transaction::ThresholdExceededError.

Difficulty to reproduce: Moderate; this failure may depend on the state of the query cache, which can be impacted by the order of specs.

Resolution: See query count limits docs.

Random input

Label: flaky-test::random input

Description: The test uses random values that sometimes match the expectations and sometimes don’t.

Difficulty to reproduce: Easy, as the test can be modified locally to use the “random value” that was used when the test failed.

Resolution: Once the problem is reproduced, it should be easy to debug and fix either the test or the app.
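For example (an illustrative sketch in plain Ruby): seeding the random source makes the “random” input reproducible, so the failing value can be replayed deterministically while debugging.

```ruby
# Same seed => same sequence on every run, so a failure seen with a given
# seed can be replayed exactly. 1234 is a placeholder seed.
rng = Random.new(1234)
value = rng.rand(1..100)

# Replaying with the same seed yields the same "random" value.
Random.new(1234).rand(1..100) == value # => true
```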

Examples:

Unreliable DOM Selector

Label: flaky-test::unreliable dom selector

Description: The DOM selector used in the test is unreliable.

Difficulty to reproduce: Moderate to difficult, depending on whether the DOM selector is duplicated, appears after a delay, etc. Adding a delay in the API or controller could help reproduce the issue.

Resolution: It depends on the problem: it could be waiting for requests to finish, scrolling down the page, etc.

Examples:

Datetime-sensitive

Label: flaky-test::datetime-sensitive

Description: The test assumes a specific date or time.

Difficulty to reproduce: Easy to moderate, depending on whether the test consistently fails after a certain date, or only fails at a given time or date.

Resolution: Freezing the time is usually a good solution.
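GitLab specs typically freeze time with helpers such as ActiveSupport’s travel_to; as a dependency-free sketch of the same idea (the class name here is hypothetical), inject a clock so the test controls “now”:

```ruby
# Hypothetical class for illustration: it takes a clock so tests can freeze
# time instead of depending on the real Time.now.
class ExpiryChecker
  def initialize(clock: -> { Time.now })
    @clock = clock
  end

  def expired?(deadline)
    @clock.call > deadline
  end
end

frozen_now = Time.utc(2024, 1, 15)
checker = ExpiryChecker.new(clock: -> { frozen_now })

checker.expired?(Time.utc(2024, 1, 1)) # => true  (deadline is in the past)
checker.expired?(Time.utc(2024, 2, 1)) # => false (deadline is in the future)
```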

Examples:

Unstable infrastructure

Label: flaky-test::unstable infrastructure

Description: The test fails from time to time due to infrastructure issues.

Difficulty to reproduce: Hard. It’s really hard to reproduce CI infrastructure issues. It might be possible by using containers locally.

Resolution: Starting a conversation with the Infrastructure department in a dedicated issue is usually a good idea.

Examples:

Improper Synchronization

Label: flaky-test::improper synchronization

Description: A flaky test issue arising from timing-related factors, such as delays, eventual consistency, asynchronous operations, or race conditions. These issues may stem from shortcomings in the test logic, the system under test, or their interaction. While tests can sometimes address these issues through improved synchronization, they may also reveal underlying system bugs that require resolution.

Difficulty to reproduce: Moderate. It can be reproduced, for example, in feature tests by attempting to reference an element on a page that is not yet rendered, or in unit tests by failing to wait for an asynchronous operation to complete.

Resolution: In the end-to-end test suite, use an eventually matcher.
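An “eventually”-style matcher amounts to something like this sketch (illustrative plain Ruby, not GitLab’s actual API): poll the condition until it becomes truthy or a timeout elapses, instead of asserting immediately and racing the asynchronous operation.

```ruby
# Illustrative polling helper: returns the block's value once truthy,
# raises if the condition is not met before the timeout.
def wait_until(timeout: 2.0, interval: 0.01)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    value = yield
    return value if value
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "condition not met within #{timeout}s"
    end
    sleep interval
  end
end

# Usage: wait for a background operation instead of racing it.
flag = false
Thread.new { sleep 0.05; flag = true }
wait_until { flag } # returns once the background thread sets the flag
```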

Examples:

How to reproduce a flaky test locally?
  1. Reproduce the failure locally
  2. Reduce the examples by bisecting the spec failure with bin/rspec --seed <previously found> --require ./config/initializers/macos.rb --bisect <spec>
  3. Look at the remaining examples and watch for state leakage
  4. Once fixed, rerun the specs with the same seed
  5. Run scripts/rspec_check_order_dependence to ensure the spec can be run in random order
  6. Run while :; do bin/rspec <spec> || break; done (and grab lunch) to verify it’s no longer flaky
Quarantined tests

When we have a flaky test in master:

  1. Create a ~“failure::flaky-test” issue with the relevant group label.
  2. Quarantine the test after the first failure. If the test cannot be fixed in a timely fashion, there is an impact on the productivity of all the developers, so it should be quarantined.
RSpec Fast quarantine

Unless you really need to have a test disabled very fast (< 10min), consider using the ~pipeline::expedited label instead.

To quickly quarantine a test without having to open a merge request and wait for pipelines, you can follow the fast quarantining process.

Always proceed to open a long-term quarantine merge request after fast-quarantining a test! This is to ensure the fast-quarantined test was correctly fixed by running tests from the CI/CD pipelines (which are not run in the context of the fast-quarantine project).

Long-term quarantine

Once a test is fast-quarantined, you can proceed with the long-term quarantining process. This can be done by opening a merge request.

First, ensure the test file has feature_category metadata so that the test file is correctly attributed.

Then, you can use the quarantine: '<issue url>' metadata with the URL of the ~“failure::flaky-test” issue you created previously.

# Quarantine a single spec
it 'succeeds', quarantine: 'https://gitlab.com/gitlab-org/gitlab/-/issues/12345' do
  expect(response).to have_gitlab_http_status(:ok)
end

# Quarantine a describe/context block
describe '#flaky-method', quarantine: 'https://gitlab.com/gitlab-org/gitlab/-/issues/12345' do
  [...]
end

This means the test will be skipped in CI. By default, quarantined tests still run locally.

We can skip them in local development as well by running with --tag ~quarantine:

# Bash
bin/rspec --tag ~quarantine

# ZSH
bin/rspec --tag \~quarantine

Also, ensure that:

  1. The ~“quarantine” label is present on the merge request.
  2. The MR description mentions the flaky test issue with the usual terms to link a merge request to an issue.

Note that we should not quarantine a shared example/context, and we cannot quarantine a call to it_behaves_like or include_examples:

# Will be flagged by Rubocop
shared_examples 'loads all the users when opened', quarantine: 'https://gitlab.com/gitlab-org/gitlab/-/issues/12345' do
  [...]
end

# Does not work
it_behaves_like 'a shared example', quarantine: 'https://gitlab.com/gitlab-org/gitlab/-/issues/12345'

# Does not work
include_examples 'a shared example', quarantine: 'https://gitlab.com/gitlab-org/gitlab/-/issues/12345'

After the long-term quarantining MR has reached production, you should revert the fast-quarantine MR you created earlier.

Find quarantined tests by feature category

To find all quarantined tests for a feature category, use ripgrep:

rg -l --multiline -w "(?s)feature_category:\s+:global_search.+quarantine:"
Jest

For Jest specs, you can use the .skip method along with the eslint-disable-next-line comment to disable the jest/no-disabled-tests ESLint rule and include the issue URL. Here’s an example:

// quarantine: https://gitlab.com/gitlab-org/gitlab/-/issues/56789
// eslint-disable-next-line jest/no-disabled-tests
it.skip('should throw an error', () => {
  expect(response).toThrowError(expected_error)
});

This means it is skipped unless the test suite is run with the --runInBand Jest command-line option.

A list of files with quarantined specs in them can be found with the command:

For both test frameworks, make sure to add the ~"quarantined test" label to the issue.

Once a test is in quarantine, there are 3 choices:

Automatic retries and flaky tests detection

On our CI, we use RSpec::Retry to automatically retry a failing example a few times (see spec/spec_helper.rb for the precise retry count).
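The retry mechanism amounts to something like this sketch (illustrative plain Ruby; RSpec::Retry hooks into RSpec examples rather than plain blocks):

```ruby
# Rerun a flaky block up to `retries` extra times before giving up.
def with_retries(retries)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue StandardError
    retry if attempts <= retries
    raise
  end
end

# A block that fails on the first attempt and passes on the second:
result = with_retries(1) { |n| raise 'flaky!' if n == 1; :ok }
result # => :ok
```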

We also use a custom Gitlab::RspecFlaky::Listener. This listener runs in the update-tests-metadata job in maintenance scheduled pipelines on the master branch, and saves flaky examples to rspec/flaky/report-suite.json. The report file is then retrieved by the retrieve-tests-metadata job in all pipelines.

This was originally implemented in: https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/13021.

If you want to enable retries locally, you can use the RETRIES environment variable. For instance RETRIES=1 bin/rspec ... would retry the failing examples once.

To generate the reports locally, use the FLAKY_RSPEC_GENERATE_REPORT environment variable. For example, FLAKY_RSPEC_GENERATE_REPORT=1 bin/rspec ....

Usage of the rspec/flaky/report-suite.json report

The rspec/flaky/report-suite.json report is imported into Snowflake once per day, for monitoring with the internal dashboard.

Problems we had in the past at GitLab

Order-dependent flaky tests

To identify ordering issues in a single file, read about how to reproduce a flaky test locally.

Some flaky tests can fail depending on the order they run with other tests. For example:

To identify the ordering issues across different files, you can use scripts/rspec_bisect_flaky, which would give us the minimal test combination to reproduce the failure:

  1. First obtain the list of specs that ran before the flaky test. You can search for the list under Knapsack node specs: in the CI job output log.

  2. Save the list of specs as a file, and run:

    cat knapsack_specs.txt | xargs scripts/rspec_bisect_flaky

If there is an order-dependency issue, the script above will print the minimal reproduction.

Time-sensitive flaky tests

Array order expectation

Feature tests

Capybara expectation times out

Hanging specs

If a spec hangs, or times out in CI, it might be caused by a LoadInterlockAwareMonitor deadlock bug in Rails.

To diagnose, you can use sigdump to print the Ruby thread dump:

  1. Run the hanging spec locally.

  2. Trigger the Ruby thread dump by running this command:

  3. The thread dump will be saved to the /tmp/sigdump-<pid>.log file.

If you see lines with load_interlock_aware_monitor.rb, this is likely related:

/builds/gitlab-org/gitlab/vendor/ruby/3.2.0/gems/activesupport-7.0.8.4/lib/active_support/concurrency/load_interlock_aware_monitor.rb:17:in `mon_enter'
/builds/gitlab-org/gitlab/vendor/ruby/3.2.0/gems/activesupport-7.0.8.4/lib/active_support/concurrency/load_interlock_aware_monitor.rb:22:in `block in synchronize'
/builds/gitlab-org/gitlab/vendor/ruby/3.2.0/gems/activesupport-7.0.8.4/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `handle_interrupt'
/builds/gitlab-org/gitlab/vendor/ruby/3.2.0/gems/activesupport-7.0.8.4/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `synchronize'

See examples where we worked around this by creating the factories before making requests:

Suggestions

Split the test file

It can help to split large RSpec files into multiple files to narrow down the context and identify the problematic tests.

Recreate job failure in CI by forcing the job to run the same set of test files

Reproducing a job failure in CI always helps with troubleshooting why and how a test fails. This requires running the same test files in the same spec order. Since we use Knapsack to distribute tests across parallelized jobs, and files can be distributed differently between two pipelines, we can hardcode the job’s file distribution through the following steps:

  1. Find the job that you want to reproduce, identify the commit it ran against, and set your local gitlab-org/gitlab branch to the same commit to ensure you are running the same copy of the project.
  2. In the job log, locate the list of spec files that were distributed by Knapsack - you can search for Running command: bundle exec rspec, the last argument of this command should contain a list of filenames. Copy this list.
  3. Go to tooling/lib/tooling/parallel_rspec_runner.rb, where the test file distribution happens. Have a look at this merge request as an example: store the file list you copied in step 2 in a TEST_FILES constant, and have RSpec run this list by updating the rspec_command method as done in the example MR.
  4. Skip the tests in spec/tooling/lib/tooling/parallel_rspec_runner_spec.rb so it doesn’t cause your pipeline to fail early.
  5. Since we want to force the pipeline to run against a specific version, we do not want to run a merged results pipeline. We can introduce a merge conflict into the MR to achieve this.
  6. To preserve spec ordering, update the spec/support/rspec_order.rb file by hard-coding Kernel.srand with the value shown in the originally failing job, as done here. You can find the srand value in the job log by searching for Randomized with seed, which is followed by this value.
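Step 6 can look like the following sketch (12345 is a placeholder; use the seed from the failing job’s log):

```ruby
# In spec/support/rspec_order.rb (sketch): hard-code the seed so specs run
# in the same order as the failing CI job.
Kernel.srand 12345 # placeholder: replace with the "Randomized with seed" value
```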
Metrics & Tracking

Resources

Slow tests

Top slow tests

We collect information about test duration in the rspec_profiling_stats project. The data is displayed using GitLab Pages in this UI.

In this issue, we defined thresholds for test duration that can act as a guide.

For tests that are above the thresholds, we automatically report slowness occurrences in Test issues so that groups can improve them.

For tests that are slow for a legitimate reason, add allowed_to_be_slow: true to skip issue creation.
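The metadata is used the same way as the quarantine metadata shown earlier; the spec below is illustrative:

```ruby
it 'imports a large repository', allowed_to_be_slow: true do
  [...]
end
```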

Date       | Feature tests | Controllers and Requests tests | Unit          | Other         | Method
-----------|---------------|--------------------------------|---------------|---------------|-------
2023-02-15 | 67.42 seconds | 44.66 seconds                  | -             | 76.86 seconds | Top slow test eliminating the maximum
2023-06-15 | 50.13 seconds | 19.20 seconds                  | 27.12 seconds | 45.40 seconds | Avg for top 100 slow tests

Handling issues for flaky or slow tests

The process around these issues is very lightweight. Feel free to close them or not; they’re managed automatically:

Return to Testing documentation

