Analysis of the Trino project CI workflows.
This repo is an example of using Trino to perform ETL (extract, transform, and load) and generate basic reports. The complete workflow is:

1. Fetch data from the GitHub API using the trino-rest connector.
2. Save the data to an S3 bucket.
3. Run SQL queries to generate reports.

All of the above is repeatable and executed on a schedule using GitHub Actions.
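A scheduled GitHub Actions workflow that drives such a job could look roughly like the following sketch; the workflow name, cron expression, and secret names are illustrative, not taken from this repo.

```yaml
# Hypothetical workflow: run the ETL on a daily schedule.
name: etl
on:
  schedule:
    - cron: '0 6 * * *'  # every day at 06:00 UTC
jobs:
  etl:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run ETL
        env:
          AWS_REGION: ${{ secrets.AWS_REGION }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: ./bin/run-trino.sh
```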
Queries could be executed directly against the tables in the GitHub connector, but that has a few downsides: every query would call the GitHub API, so results would be slow, consume API rate limits, and change between runs.
Data is saved to an S3 bucket, since it's cheap and easy to set up. Since there's no database server running, there's no maintenance required.
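A catalog pointing at such a bucket would presumably be a Hive connector configuration along these lines; the metastore type and bucket path below are assumptions for illustration, not the repo's actual `catalog/trinocicd.properties` values.

```properties
# Sketch of a serverless Hive catalog backed by S3 (values are placeholders)
connector.name=hive
hive.metastore=file
hive.metastore.catalog.dir=s3://example-bucket/warehouse/
```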
Materialized views are not used, because incremental updates are tricky and differ across the many tables (GitHub API endpoints).
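To see why a generic incremental refresh is hard to define, consider a hypothetical incremental load; the schema, table, and column names below are assumptions, and the pattern only works for endpoints that expose an update timestamp, which many don't.

```sql
-- Hypothetical incremental load: append only rows changed since the last run.
-- Only possible when the endpoint exposes something like updated_at;
-- endpoints without such a column need a different (often full) refresh.
INSERT INTO hive.v2.issues
SELECT *
FROM github.default.issues
WHERE updated_at > (SELECT max(updated_at) FROM hive.v2.issues);
```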
To run the queries locally:

1. Run `aws configure` with the proper credentials to access the S3 bucket mentioned in `catalog/trinocicd.properties`; use the `trino-reports` profile name.
2. Have the `GITHUB_TOKEN` environment variable set.
3. Start Trino:

```shell
# ensure all these environment variables are set to correct values
export AWS_REGION AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY GITHUB_TOKEN
./bin/run-trino.sh
```
Now you can run any query from the `sql` directory using the Trino CLI:

```shell
trino trino://localhost:8080/trinocicd/v2 --output-format=ALIGNED < sql/pr/burndown.sql
```
Data is fetched using the trino-rest connector, and queries of the form `INSERT INTO hive.<table> SELECT * FROM github.<table>` save the data. See the Sync class from trino-rest for more details.
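Spelled out with concrete (but hypothetical) names, a single sync step boils down to a query like this; check the Sync class in trino-rest for the real table and column names.

```sql
-- Sketch of one sync step: copy one GitHub endpoint's rows into S3-backed storage.
-- Schema, table, and filter column names are assumptions for illustration.
INSERT INTO hive.v2.issues
SELECT *
FROM github.default.issues
WHERE owner = 'trinodb' AND repo = 'trino';
```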