Showing content from https://github.com/lyft/atlantis below:

lyft/atlantis: Terraform automation for GitHub PRs (private fork of runatlantis/atlantis)

This was forked from runatlantis/atlantis at v0.17.3

Since then this version has diverged significantly from upstream and was therefore detached.

⚠️ This repo still contains a lot of code from upstream, which is slowly being phased out as we test our implementation in production. It is not ready for general consumption.

Functional Differences

Non Functional Differences

Most of the new code can be found in server/neptune. Neptune is the codename for our rebuild of Atlantis. Code outside of that package is mostly old code, which we are removing as we deprecate the upstream/legacy workers.

Currently, Atlantis can operate in 3 modes based on the configuration passed in:

In order for Neptune to work correctly, all of these must exist.

Receives webhook events from GitHub and acts on them accordingly. Gateway is stateless; however, each request spins up a goroutine that clones a repository to disk. This is the primary bottleneck here.

Responsible for speculative planning and policy checking within the PR. This code is relatively untouched from upstream atlantis and should eventually be nuked in favor of Temporal workflows.

Responsible for running 3 primary processes:

Deploy workflows are run on the granularity of a single repository root. It follows the ID pattern below:

<OWNER/REPO>||<ROOT_NAME>
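
As a minimal sketch of the ID pattern above, the helpers below build and parse such an ID. The function names and the example repository and root values are hypothetical and are not taken from the Neptune code.

```go
package main

import (
	"fmt"
	"strings"
)

// BuildDeployWorkflowID joins the repository's full name and the root name
// with the "||" separator. The name is hypothetical, for illustration only.
func BuildDeployWorkflowID(repoFullName, rootName string) string {
	return fmt.Sprintf("%s||%s", repoFullName, rootName)
}

// ParseDeployWorkflowID splits an ID of the form <OWNER/REPO>||<ROOT_NAME>
// back into its two parts and rejects anything that does not match.
func ParseDeployWorkflowID(id string) (repoFullName, rootName string, err error) {
	parts := strings.SplitN(id, "||", 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", "", fmt.Errorf("parsing workflow id %q: expected <OWNER/REPO>||<ROOT_NAME>", id)
	}
	return parts[0], parts[1], nil
}

func main() {
	// Example values are made up.
	id := BuildDeployWorkflowID("my-org/my-repo", "staging")
	fmt.Println(id)
}
```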

The following is a high level diagram of how this workflow is structured:

                                           ┌─────────────┐
                                           │             │
                                           │  deployment │
                                           │    store    │
┌─────────────────┐                        │             │
│     select      │                        └────▲───┬────┘
│                 │                             │   │
│                 │                             │   │
│                 │                             │   │
├┬───────────────┬┤                        ┌────┴───▼──────┐
││revision signal││    ┌──────────────┐    │               │
││   channel     │┼────►priority queue├────► queue worker  │
│┼───────────────┼│    └──────────────┘    │               │
│┼───────────────┼│                        └──────┬────────┘
││ timeout timer ││                               │
│┼───────────────┼│                               │
└─────────────────┘                       ┌───────▼─────────┐
                                          │     select      │
                                          │                 │
                                          │                 │
                                          │                 │
                                          ├┬───────────────┬┤
                                          ││ queue::CanPop │┼───────────────┐
                                          ││               ││               │
                                          │┼───────────────┼│      ┌────────▼─────────┐
                                          │┼───────────────┼│      │      select      │
                                          ││ unlock signal ││      │                  │         ┌─────────┐
                                          │┼───────────────┼│      │                  │         │ Github  │
                                          └─────────────────┘      │                  │         └▲────────┘
                                                                   ├┬────────────────┬┤          │
                                                                   ││ state change   ││          │
                                                                   ││ signal channel │┼──────────┴───┐
                                                                   │┼────────────────┼│              │
                                                                   │┼────────────────┼│              │
                                                                   ││ child workflow ││           ┌──▼──┐
                                                                   │┼────────────────┼│           │ SNS │
                                                                   └──────────────────┘           └─────┘

The deploy workflow is responsible for a few things:

In order to receive revisions our main workflow thread listens to a dedicated channel. If we haven't received a new revision in 60 minutes and our queue is empty, we instigate a shutdown of the workflow in its entirety.

The queue is modeled as a priority queue where manually triggered deployments always happen before merge triggered deployments. This queue can be in a locked state if the last deployment that happened was triggered manually. The queue lock applies only to deployments that have been triggered via merging and can be unlocked through the corresponding check run of an item that is blocked.

Items can only be popped off the queue if the queue is unlocked OR if the queue is locked but contains a manually triggered deployment.

By default, a new workflow starts up in an unlocked state.
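
A rough sketch of these queue rules follows. The type and method names are assumptions made for illustration (only CanPop appears in the diagram above), and the real implementation may be structured differently.

```go
package main

import "fmt"

// Trigger records how a deployment was requested.
type Trigger int

const (
	MergeTrigger Trigger = iota
	ManualTrigger
)

// Deployment is a minimal stand-in for a queued deploy request.
type Deployment struct {
	Revision string
	Trigger  Trigger
}

// Queue keeps manually triggered deployments ahead of merge triggered ones.
// The zero value is an empty, unlocked queue.
type Queue struct {
	manual []Deployment
	merged []Deployment
	locked bool
}

func (q *Queue) Push(d Deployment) {
	if d.Trigger == ManualTrigger {
		q.manual = append(q.manual, d)
		return
	}
	q.merged = append(q.merged, d)
}

// Lock and Unlock toggle the lock, which only blocks merge triggered items.
func (q *Queue) Lock()   { q.locked = true }
func (q *Queue) Unlock() { q.locked = false }

// CanPop is true if a manual deployment is waiting, or if the queue is
// unlocked and a merge triggered deployment is waiting.
func (q *Queue) CanPop() bool {
	if len(q.manual) > 0 {
		return true
	}
	return !q.locked && len(q.merged) > 0
}

// Pop returns the next eligible deployment, manual items first.
func (q *Queue) Pop() (Deployment, bool) {
	if len(q.manual) > 0 {
		d := q.manual[0]
		q.manual = q.manual[1:]
		return d, true
	}
	if !q.locked && len(q.merged) > 0 {
		d := q.merged[0]
		q.merged = q.merged[1:]
		return d, true
	}
	return Deployment{}, false
}

func main() {
	q := &Queue{}
	q.Push(Deployment{Revision: "merge-1", Trigger: MergeTrigger})
	q.Lock()
	fmt.Println("can pop while locked:", q.CanPop())
	q.Unlock()
	fmt.Println("can pop after unlock:", q.CanPop())
}
```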

Upon workflow startup, we start a queue worker goroutine which is responsible for popping items off our queue when it can, and for listening for unlock signals from gateway.

The worker also maintains local state on the latest deployment that has happened. This is used for validating that new revisions intended for deploy are ahead of the latest deployed revision. Once each deployment is complete, the worker persists this information in the configured blob store. The worker only fetches from this blob store on workflow startup and maintains the information locally for the lifetime of its execution.

A deploy consists of executing a terraform workflow. The worker blocks on execution of this "child" workflow and listens for state changes via a dedicated signal channel. Once the child is complete, we stop listening to this signal channel and move on.

State changes are reflected in the GitHub checks UI (i.e., plan in progress, plan failed, apply in progress, etc.). A single check run is used to represent the deployment state. The check run state is indicative of the completion state of the deployment, and the details of the deployment itself are rendered in the check run details section.

State changes for apply jobs specifically are sent to SNS for internal auditing purposes.

The Terraform workflow runs on the granularity of a single deployment. Its identifier is the deployment's identifier, which is randomly generated in the Deploy Workflow. Note: this means a single revision can be tied to multiple deployments.

The terraform workflow is stateful due to the fact that it keeps data on disk and references it throughout the workflow. Cleanup of that data only happens when that workflow is complete. In order to ensure this statefulness, the terraform workflow is aware of the worker it's running on and fetches this information as part of the first activity. Each successive activity task takes place on the same task queue.

Following this:

Before and after this job, the workflow signals its parent execution with a state object. At this point, the workflow either blocks on a dedicated plan review channel or proceeds to the apply if certain criteria are met. Currently, the only such criterion is that the plan contains no changes.

Plan review signals are received directly by this workflow from gateway which pulls the workflow ID from the check run's External ID field.

If the plan is approved, the workflow proceeds with the apply, all the while updating the parent execution with the status, before exiting the workflow. If the plan is rejected or times out (1 week timeout on plan reviews), the parent is notified and the workflow exits.

The workflow itself has no retries configured. All activities use the default retry policy except for Terraform Activities. Terraform Activities throw up a TerraformClientError if there is an error from the binary itself. This error is configured to be non-retryable since most of the time this is a user error.

For Terraform Applies, timeouts are not retried. Timeouts can happen from exceeding the ScheduleToClose threshold or from lack of heartbeat for over a minute. Instead of retrying the apply, which can have unpredictable results, we signal our parent that there has been a timeout and this is surfaced to the user.

Since Terraform activities can run long, we send heartbeats at 5-second intervals. If 1 minute goes by without a heartbeat, Temporal assumes the worker node is down and runs the configured retry policy.

Terraform operation logs are streamed to the local server process using go channels. Once the operation is complete, the channel is closed and the receiving process persists the logs to the configured job store.
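
A minimal sketch of this streaming pattern is shown below. The JobStore interface, the in-memory store, and ReceiveLogs are hypothetical names, and the log lines are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// JobStore is a hypothetical stand-in for the configured job store.
type JobStore interface {
	Save(jobID, logs string) error
}

// memoryStore keeps logs in a map, for illustration only.
type memoryStore struct{ jobs map[string]string }

func (m *memoryStore) Save(jobID, logs string) error {
	m.jobs[jobID] = logs
	return nil
}

// ReceiveLogs buffers lines until the channel is closed, then persists the
// complete log to the store.
func ReceiveLogs(jobID string, lines <-chan string, store JobStore) error {
	var buf []string
	for line := range lines { // the loop ends when the sender closes the channel
		buf = append(buf, line)
	}
	if err := store.Save(jobID, strings.Join(buf, "\n")); err != nil {
		return fmt.Errorf("persisting logs for job %s: %w", jobID, err)
	}
	return nil
}

func main() {
	store := &memoryStore{jobs: map[string]string{}}
	lines := make(chan string)
	go func() {
		defer close(lines) // closing signals that the operation is complete
		lines <- "first log line"
		lines <- "second log line"
	}()
	if err := ReceiveLogs("job-1", lines, store); err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Println(store.jobs["job-1"])
}
```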

Running Atlantis With Local Changes

The Atlantis worker can't yet be run locally given its dependency on SQS. However, Docker Compose is set up to run a gateway, a Temporal worker, temporalite, and ngrok all in the same network. Ngrok allows us to expose localhost to the public internet in order to test GitHub app integrations.

There is some setup that is required in order to have your containers running and receiving webhooks.

  1. Set up your own personal GitHub organization and test GitHub app.

  2. Install this app to a test repo within your organization with the following configuration.

  3. Download the key file and save it to the ~/.ssh directory. Note: ~/.ssh is mounted to allow for referencing any local ssh keys.

  4. Create the following files: ~/atlantis-gateway.env

ATLANTIS_PORT=4143
ATLANTIS_GH_APP_ID=<FILL THIS IN>
ATLANTIS_GH_APP_KEY_FILE=/.ssh/your-key-file.pem
ATLANTIS_GH_WEBHOOK_SECRET=<FILL THIS IN>
ATLANTIS_GH_APP_SLUG=<FILL THIS IN>

# The GitHub organization in which the feature flag repo resides
ATLANTIS_FF_OWNER=<FILL THIS IN>
# Name of the feature flag repo
ATLANTIS_FF_REPO=<FILL THIS IN>
# The path to the flags.yaml file
ATLANTIS_FF_PATH=<FILL THIS IN>

ATLANTIS_DATA_DIR=/tmp/
ATLANTIS_LYFT_MODE=gateway
ATLANTIS_REPO_CONFIG=/generated/repo-config.yaml
ATLANTIS_WRITE_GIT_CREDS=true
ATLANTIS_ENABLE_POLICY_CHECKS=true
ATLANTIS_ENABLE_DIFF_MARKDOWN_FORMAT=true

ATLANTIS_REPO_ALLOWLIST=<FILL THIS IN>
ALLOWED_REPOS=<FILL THIS IN>

~/atlantis-temporalworker.env

ATLANTIS_PORT=4142
ATLANTIS_GH_APP_ID=<FILL THIS IN>
ATLANTIS_GH_APP_KEY_FILE=/.ssh/your-key-file.pem
ATLANTIS_GH_WEBHOOK_SECRET=<FILL THIS IN>
ATLANTIS_GH_APP_SLUG=<FILL THIS IN>
ATLANTIS_FF_OWNER=<FILL THIS IN>
ATLANTIS_FF_REPO=<FILL THIS IN>
ATLANTIS_FF_PATH=<FILL THIS IN>
ATLANTIS_DATA_DIR=/tmp/
ATLANTIS_LYFT_MODE=temporalworker
ATLANTIS_REPO_CONFIG=/generated/repo-config.yaml
ATLANTIS_WRITE_GIT_CREDS=true
ATLANTIS_ENABLE_POLICY_CHECKS=true
ATLANTIS_ENABLE_DIFF_MARKDOWN_FORMAT=true
ATLANTIS_REPO_ALLOWLIST=<FILL THIS IN>
ALLOWED_REPOS=<FILL THIS IN>
ATLANTIS_DEFAULT_TF_VERSION=1.2.5

Once these steps are complete, everything should start up as normal. You just need to run:

make build-service
docker-compose build
docker-compose up

In order to have events routed to gateway, you'll need to visit http://localhost:4040/ and copy the https url into your GitHub app.

In order to see the temporal ui visit http://localhost:8233/.

If the ngrok container is restarted, the URL will change, which is a hassle. Fortunately, when we make a code change, we can rebuild and restart the Atlantis container easily without disrupting ngrok.

e.g.

make build-service
docker-compose up --detach --build

To run the tests locally, run make test. If you want to run the integration tests that actually run real terraform commands, run make test-all.

docker run --rm -v $(pwd):/go/src/github.com/runatlantis/atlantis -w /go/src/github.com/runatlantis/atlantis ghcr.io/runatlantis/testing-env:latest make test

Or to run the integration tests

docker run --rm -v $(pwd):/go/src/github.com/runatlantis/atlantis -w /go/src/github.com/runatlantis/atlantis ghcr.io/runatlantis/testing-env:latest make test-all

Calling Your Local Atlantis From GitHub

	- Metadata: Read-Only 
	- Pull Requests: Read and Write
	- Commit Statuses: Read and Write

Similarly, subscribe for the following events:

	- Pull Request
	- Issue Comment
atlantis server --gh-user <your username> --gh-token <your token> --repo-allowlist <your repo> --gh-webhook-secret <your webhook secret> --log-level debug
Error setting up workspace: failed to run git clone: could find git

and will instead look like

Error: setting up workspace: running git clone: no executable "git"

This is easier to read and more consistent.

We write our own mocks to reduce dependencies. Most of the old code uses pegomock which is unmaintained. We shouldn't use this for any new changes going forward.

