Analysis of the Trino project CI workflows.
This repo is an example of using Trino to perform ETL (extract, transform, and load) and generate basic reports. The complete workflow is:

1. Fetch data from the GitHub API using the trino-rest connector.
2. Save the data to an S3 bucket.
3. Run SQL queries to generate reports.

All of the above is repeatable and executed on a schedule using GitHub Actions.
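A scheduled GitHub Actions workflow that drives such a job could look roughly like the following sketch; the workflow name, cron expression, and secret names are illustrative, not taken from this repo.

```yaml
# Hypothetical workflow: run the ETL on a daily schedule.
name: etl
on:
  schedule:
    - cron: '0 6 * * *'  # every day at 06:00 UTC
jobs:
  etl:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run ETL
        env:
          AWS_REGION: ${{ secrets.AWS_REGION }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: ./bin/run-trino.sh
```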
Queries could be executed directly against the tables in the GitHub connector, but that has a few downsides: every query would call the GitHub API, so results would be slow, consume API rate limits, and change between runs.
Data is saved to an S3 bucket, since it's cheap and easy to set up. Since there's no database server running, there's no maintenance required.
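A catalog pointing at such a bucket would presumably be a Hive connector configuration along these lines; the metastore type and bucket path below are assumptions for illustration, not the repo's actual `catalog/trinocicd.properties` values.

```properties
# Sketch of a serverless Hive catalog backed by S3 (values are placeholders)
connector.name=hive
hive.metastore=file
hive.metastore.catalog.dir=s3://example-bucket/warehouse/
```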
Materialized views are not used, because incremental updates are tricky and differ across the many tables (GitHub API endpoints).
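To see why a generic incremental refresh is hard to define, consider a hypothetical incremental load; the schema, table, and column names below are assumptions, and the pattern only works for endpoints that expose an update timestamp, which many don't.

```sql
-- Hypothetical incremental load: append only rows changed since the last run.
-- Only possible when the endpoint exposes something like updated_at;
-- endpoints without such a column need a different (often full) refresh.
INSERT INTO hive.v2.issues
SELECT *
FROM github.default.issues
WHERE updated_at > (SELECT max(updated_at) FROM hive.v2.issues);
```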
To run the queries locally:

1. Run `aws configure` with the proper credentials to access the S3 bucket mentioned in `catalog/trinocicd.properties`; use the `trino-reports` profile name.
2. Have the `GITHUB_TOKEN` environment variable set.
3. Start Trino:

```shell
# ensure all these environment variables are set to correct values
export AWS_REGION AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY GITHUB_TOKEN
./bin/run-trino.sh
```
Now you can run any query from the `sql` directory using the Trino CLI:

```shell
trino trino://localhost:8080/trinocicd/v2 --output-format=ALIGNED < sql/pr/burndown.sql
```
Data is fetched using the trino-rest connector, and queries of the form `INSERT INTO hive.<table> SELECT * FROM github.<table>` save the data. See the Sync class from trino-rest for more details.
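Spelled out with concrete (but hypothetical) names, a single sync step boils down to a query like this; check the Sync class in trino-rest for the real table and column names.

```sql
-- Sketch of one sync step: copy one GitHub endpoint's rows into S3-backed storage.
-- Schema, table, and filter column names are assumptions for illustration.
INSERT INTO hive.v2.issues
SELECT *
FROM github.default.issues
WHERE owner = 'trinodb' AND repo = 'trino';
```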