Detect Secrets Stream is a server tool that ingests metadata for all git pushes on your company's GitHub Enterprise server (public repositories by default; private repositories are opt-in only). For each push, it scans the push contents for secrets. Once a secret is found and verified, its metadata is stored in a database and the raw secret is stored in Vault.
There is a companion Admin tool which enables org admins to:
Under the hood, the server tool uses the developer tool to scan for secrets. Read IBM/detect-secrets for more info about the developer tool.
- python3. You can use pyenv to install Python 3: brew install pyenv; pyenv install 3.8.5; pyenv global 3.8.5. Then restart your shell. Run python --version to validate that the default Python version is 3.
- docker, see https://docs.docker.com/get-docker/
- skaffold, v1.12.1 and above
- kustomize, v3.8.1 and above. Do NOT use the version bundled with kubectl, as it does not support some options used by this project (e.g. replicas).
- container-structure-test, used for docker image validation (see its installation instructions).
- pipenv, used to manage all Python dependencies.
You can install all the tools except docker and python 3 with the one-liner below:
brew install kustomize skaffold container-structure-test pipenv
Python package dependencies
- Enter the pipenv shell with pipenv shell
- Install the dependencies with pipenv install --dev
- Set up the pre-commit tool with pre-commit install
Example secrets can be found in secrets.template (more docs coming). For your local dev environment, some secrets are auto-generated. For prod environments, you will need to supply the real secrets.
To set up your local dev secrets, first run ./kustomize_envs/dev/gen-secret.sh. It will create two hidden folders under /kustomize_envs/dev/:
- secret_generated/ containing automatically-generated secrets. Secrets get regenerated when re-running gen-secret.sh.
- secret_manual/ containing manually-entered secrets. Running gen-secret.sh will generate a template for secrets under this folder. You need to manually update each secret to match your environment. Re-running gen-secret.sh will not overwrite the files if they are non-empty.
This table contains information on what the values of the manually-entered secrets should be set to:
| File name | Value |
|---|---|
| app.key | The test GitHub App's private key. Download this from the GitHub App config UI. |
| env.txt | The test GitHub App's app ID. Obtained from the GitHub App config UI. |
| db2consv_zs.lic | The IBM DB2 license certificate file. You can read more about how to retrieve it here. This is only needed if looking for DB2 for z or DB2 for i secrets. |
| email.conf | Your company's internal email regex. Replace mycompany.com with your company's email domain. |
| ghe_revocation.token | The Jenkins job trigger token to revoke the GHE token. |
| github.conf | github.mycompany.com: replace this with your company's GHE domain. tokens: a list of GitHub API tokens (no scopes necessary). This token pool is necessary for when a single token's rate limit has been reached. <org> and <repo> should be replaced with the organization/repository containing the org set configuration for private repository scanning. |
| iam.conf | The IBM Cloud IAM API key for an admin account which can resolve an IBM Cloud IAM token owner. |
| kafka.conf | brokers_sasl: comma-separated Kafka broker list, for example broker-1:9093,broker-2:9093,broker-3:9093. api_key: Kafka API key to publish and consume from the queue. When using IBM Cloud Events Stream service, you can obtain these values from the Events Stream console by navigating to the Service credentials panel and creating a new service credential. brokers_sasl is the value of kafka_brokers_sasl (without " or spaces) from your service credential. api_key is the value of api_key from your service credential. |
| revoker_urls.conf | Replace github.mycompany.com with your company's GHE URL, artifactory with your company's artifactory URL, and jenkins with your company's Jenkins URL, which should contain a Jenkins job to revoke the GHE token. |
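Since gen-secret.sh only creates templates under secret_manual/, a file can easily be left unfilled. Below is a minimal sketch of a check for that. It assumes the folder layout and file names listed in the table above; the helper name find_unfilled_secrets is hypothetical and not part of this repo. It only catches missing or empty files, not template placeholders that were left unchanged.

```python
from pathlib import Path

# File names taken from the table above; adjust if your checkout differs.
MANUAL_SECRET_FILES = [
    "app.key", "env.txt", "db2consv_zs.lic", "email.conf",
    "ghe_revocation.token", "github.conf", "iam.conf",
    "kafka.conf", "revoker_urls.conf",
]


def find_unfilled_secrets(folder, names=MANUAL_SECRET_FILES):
    """Return the names that are missing or empty under `folder`."""
    base = Path(folder)
    unfilled = []
    for name in names:
        path = base / name
        # A secret counts as unfilled if the file is absent or has zero bytes.
        if not path.is_file() or path.stat().st_size == 0:
            unfilled.append(name)
    return unfilled
```

For example, find_unfilled_secrets("kustomize_envs/dev/secret_manual") would list the files that still need values. Note that db2consv_zs.lic is only required when scanning for DB2 for z or DB2 for i secrets, so it may legitimately appear in the result.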
In addition to the secrets described under Local dev secrets, production environments also require you to prepare the secrets that are auto-generated in the dev environment.
This table contains information on what the values of the dev secrets should be set to:
| File name | Value |
|---|---|
| basic_auth.conf | Basic auth info for ingestion and revoker layer. |
| dc_iv_file and dc_key_file | The key file used for deterministic encryption. This will be replaced by non-deterministic encryption later. |
| gd_db.conf | Database related secrets. |
| hmac.key | HMAC key used in hashing algorithm. |
| encryption.key and encryption.key.pub | Encryption key used in non-deterministic encryption. |
| vault.conf | Vault related secrets. DSS uses approle for auth and KV v1 as secret engine. |
You don't need to unlock secrets when running unit tests.
Run a subset of the unit tests
This provides faster feedback if you are just writing code for a module.
# The part after the last dash (-) corresponds to a folder name under detect_secrets_stream
# For example, run unit tests for just the files under bp_lookup
make test-unit-bp_lookup
# Run unit tests for just the files under pi_cleaner
make test-unit-pi_cleaner
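To see which per-module targets the naming rule above implies, the folder names can be listed programmatically. This is a sketch only: it assumes every non-private subfolder of detect_secrets_stream has a matching test-unit-<folder> make target, which this README does not confirm, and unit_test_targets is a hypothetical helper name.

```python
from pathlib import Path


def unit_test_targets(package_dir="detect_secrets_stream"):
    """List make targets of the form test-unit-<folder> for each subfolder."""
    root = Path(package_dir)
    if not root.is_dir():
        return []
    return sorted(
        f"test-unit-{p.name}"
        for p in root.iterdir()
        # Skip caches and hidden folders such as __pycache__ or .git.
        if p.is_dir() and not p.name.startswith(("_", "."))
    )
```

Each returned name can then be passed to make, for example make test-unit-bp_lookup.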
This requires a personal or staging environment. See kustomize_envs/dev/README.md for more details.
This repo provides a utility module which enables an admin to do many routine tasks. The utility is invoked through python -m detect_secrets_stream.util.secret_util
Running the utility requires several environment variables. An example environment variable file (.env.example) has been provided.
From a fish shell, you can do something like the following:
cp .env.example .env.prod
env (grep -v '^#' .env.prod | xargs -n1) python -m detect_secrets_stream.util.secret_util --help
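The fish one-liner relies on fish's command substitution syntax and will not work unchanged in other shells. A shell-independent sketch in Python is shown here. It assumes the env file holds simple KEY=VALUE lines without quoting; parse_env_file and run_secret_util are hypothetical helper names, not part of this repo.

```python
import os
import subprocess
import sys


def parse_env_file(text):
    """Parse KEY=VALUE lines, skipping blank lines and lines starting with '#'."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that have no '='
            env[key.strip()] = value.strip()
    return env


def run_secret_util(env_path, *args):
    """Run secret_util with the variables from env_path added to the environment."""
    with open(env_path) as f:
        extra = parse_env_file(f.read())
    cmd = [sys.executable, "-m", "detect_secrets_stream.util.secret_util", *args]
    return subprocess.run(cmd, env={**os.environ, **extra}).returncode
```

Calling run_secret_util(".env.prod", "--help") would then mirror the fish command above.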
Based on id
python -m detect_secrets_stream.util.secret_util decrypt-token-by-id [token_id]
Based on UUID
python -m detect_secrets_stream.util.secret_util decrypt-token-by-uuid [uuid]
Validating Admin Tool Org Admins
python -m detect_secrets_stream.util.secret_util get-org-admins [ORG_NAME]
# export required env vars
# macos
python -m detect_secrets_stream.util.secret_util backfill --size=10000 --from $(date -j -f "%a %b %d %T %Z %Y" "Wed Sep 11 00:00:00 EDT 2019" +"%s") --to $(date -j -f "%a %b %d %T %Z %Y" "Thu Sep 12 00:00:00 EDT 2019" +"%s")
# linux
python -m detect_secrets_stream.util.secret_util backfill --size=10000 --from $(date -d '06/12/2012 07:21:22' +"%s") --to $(date -d '06/12/2012 08:21:22' +"%s")
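The --from and --to values are Unix epoch seconds, and the date invocation differs between macOS and Linux. As a portable alternative, the same numbers can be computed in Python. This sketch uses a fixed UTC offset rather than a named time zone, and to_epoch is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def to_epoch(year, month, day, hour=0, minute=0, second=0, utc_offset_hours=0):
    """Return Unix epoch seconds for a wall-clock time at a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return int(datetime(year, month, day, hour, minute, second, tzinfo=tz).timestamp())


# EDT is UTC-4, matching the macOS example above.
start = to_epoch(2019, 9, 11, utc_offset_hours=-4)
end = to_epoch(2019, 9, 12, utc_offset_hours=-4)
print(f"--from {start} --to {end}")
```

The printed flags can be appended to the backfill command in place of the $(date ...) substitutions.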
Manually add a commit to the diff-scan queue.
Note: you must set the KAFKA_CLIENT_ID and GD_KAFKA_CONF environment variables.
- KAFKA_CLIENT_ID is the name of the Kafka client used for manual ingestion. It can be anything, such as manual-ingest.
- GD_KAFKA_CONF points to the Kafka configuration file. The production config is stored under kustomize_envs/prod-secrets/secret/kafka.conf. The one below is an example of what should be contained in the config file.
[kafka]
brokers_sasl = my_sasl1.us-east.containers.appdomain.cloud:9000,my_sasl2.us-east.containers.appdomain.cloud:9000,my_sasl3.us-east.containers.appdomain.cloud:9000
api_key = my_api_key
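A file in this shape is standard INI and can be sanity-checked with Python's configparser before use. The sketch below only illustrates the format. Whether the project itself reads the file this way is not confirmed by this README, and load_kafka_conf is a hypothetical helper name.

```python
import configparser


def load_kafka_conf(text):
    """Parse kafka.conf content into (broker_list, api_key)."""
    # Disable interpolation so that a '%' inside the API key is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser["kafka"]
    brokers = [b.strip() for b in section["brokers_sasl"].split(",") if b.strip()]
    return brokers, section["api_key"]


sample = """
[kafka]
brokers_sasl = broker-1:9093,broker-2:9093,broker-3:9093
api_key = my_api_key
"""
brokers, api_key = load_kafka_conf(sample)
print(brokers)
```

A missing [kafka] section or key raises KeyError, which makes a malformed file fail early.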
Sample usage:
# export required env vars
python -m detect_secrets_stream.util.secret_util ingest-commit -r <repo> -c <commit>
More options: running the command above with --help will reveal help info on more options. For example, if you know the branch and repository's visibility, you can also supply these. By default, it assumes the commit is from the master branch and the repository's visibility is public.
python -m detect_secrets_stream.util.secret_util --help
The mechanism for removing old secrets is a cronjob which cleans PI on a daily basis.
It removes the following PI for all tokens that have been remediated for over seven days:
Every four hours, a cronjob checks if all live tokens in the database are still live and updates their statuses accordingly.
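The seven-day rule used by the daily PI cleanup can be expressed as a small predicate. This is an illustration only: the real cronjob may apply the filter in SQL, and the function name and the convention that an unremediated token has no remediation date are assumptions.

```python
from datetime import datetime, timedelta, timezone

# "Remediated for over seven days", per the description of the daily cleanup.
RETENTION = timedelta(days=7)


def eligible_for_pi_cleanup(date_remediated, now=None):
    """Return True if the token was remediated more than RETENTION ago.

    date_remediated is None for tokens that have not been remediated (assumption).
    """
    if date_remediated is None:
        return False
    now = now or datetime.now(timezone.utc)
    return now - date_remediated > RETENTION
```

Both arguments should be timezone-aware datetimes, since subtracting an aware value from a naive one raises TypeError.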
CREATE ROLE scan_worker_role;
GRANT SELECT, INSERT ON ALL TABLES IN SCHEMA public TO scan_worker_role;
GRANT ALL ON ALL SEQUENCES IN SCHEMA public TO scan_worker_role;
GRANT TRUNCATE ON TABLE public.vmt_report TO scan_worker_role;
How to Provision a User with a Specific Role
CREATE ROLE vmt_role WITH LOGIN;
GRANT CONNECT ON DATABASE <db_name> TO vmt_role;
GRANT USAGE ON SCHEMA public TO vmt_role;
GRANT SELECT ON public.vmt_report TO vmt_role;
CREATE USER vmt_user WITH IN GROUP vmt_role PASSWORD [redacted]
| Column name | Type | Description |
|---|---|---|
| vuln_id | VARCHAR | The vulnerability ID |
| token_owner_email | VARCHAR | The token owner's email address |
| token_type | VARCHAR | The type of token, such as 'Slack', 'GHE', 'Softlayer'; (10 < types < 100) |
| vulnerability | VARCHAR | The vulnerability |
| pusher_email | VARCHAR | The commit pusher's email address |
| committer_email | VARCHAR | The committer's email address |
| author_email | VARCHAR | The author's email |
| date_last_tested | TIMESTAMPTZ | The date that the token was last tested |
| date_remediated | TIMESTAMPTZ | The date the token was remediated |
| security_focals | VARCHAR | The security focals |
| repo_public | BOOLEAN | Whether the token has been leaked in at least one public repository |
| repo_private | BOOLEAN | Whether the token has been leaked in at least one private repository |
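A consumer of vmt_report may find it convenient to mirror these columns in a typed record. This is a sketch under assumptions: the class name VmtReportRow and the helper open_public_leaks are hypothetical, nullability of the columns is not stated in this README, and treating a missing date_remediated as "not yet remediated" is a guess.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class VmtReportRow:
    """One row of public.vmt_report; field names follow the column list above."""
    vuln_id: str
    token_owner_email: Optional[str] = None
    token_type: Optional[str] = None
    vulnerability: Optional[str] = None
    pusher_email: Optional[str] = None
    committer_email: Optional[str] = None
    author_email: Optional[str] = None
    date_last_tested: Optional[datetime] = None
    date_remediated: Optional[datetime] = None
    security_focals: Optional[str] = None
    repo_public: Optional[bool] = None
    repo_private: Optional[bool] = None


def open_public_leaks(rows):
    """Return rows leaked in a public repository that have no remediation date."""
    return [r for r in rows if r.repo_public and r.date_remediated is None]
```

Rows fetched with any PostgreSQL driver could be mapped into this class by column name.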
See the additional docs for more information: