Write python locally, execute SQL in your data warehouse
≪ Read the Docs · Join Our Slack »
RasgoQL is a Python package that enables you to easily query and transform tables in your Data Warehouse directly from a notebook.
You can quickly create new features, sample data, apply complex aggregates... all without having to write SQL!
Choose from our library of predefined transformations or make your own to streamline the feature engineering process.
Why is this package useful?Data scientists spend much of their time in pandas preparing data for modelling. When they are ready to deploy or scale, two pain points arise:
We created RasgoQL to solve these two pain points.
Learn more at https://docs.rasgoql.com.
Under the covers, RasgoQL sends all processing to your Data Warehouse, enabling the efficient transformation of massive datasets. RasgoQL only needs basic metadata to execute transforms, so your private data remains secure.
RasgoQL does these things well:
Rasgo supports Snowflake, BigQuery, Postgres, and Amazon Redshift with more Data Warehouses being added soon. If you'd like to suggest another database type, submit your idea to our GitHub Discussions page so that other community members can weight in and show their support.
If you use pandas to build features, but you are working on a massive set of data that won't fit in your machine's memory. RasgoQL can help!
If your organization uses dbt or another SQL tool to run production data flows, but you prefer to build features in pandas. RasgoQL can help!
If you know pandas, but not SQL and want to learn how queries will translate. RasgoQL can help!
Just run a simple pip install.
pip install rasgoql~=1.0
Report Bug · Suggest Improvement · Request Feature
pip install rasgoql --upgrade # Connect to your data warehouse creds = rasgoql.SnowflakeCredentials( account="", user="", password="", role="", warehouse="", database="", schema="" ) # Connect to DW rql = rasgoql.connect(creds) # List available tables rql.list_tables('ADVENTUREWORKS').head(10) # Allow rasgoQL to interact with an existing Table in your Data Warehouse dataset = rql.dataset('ADVENTUREWORKS.PUBLIC.FACTINTERNETSALES') # Take a peek at the data dataset.preview() # Use the datetrunc transform to seperate things into weeks weekly_sales = dataset.datetrunc(dates={'ORDERDATE':'week'}) # Aggregate to sum of sales for each week agg_weekly_sales = weekly_sales.aggregate( group_by=['PRODUCTKEY', 'ORDERDATE_WEEK'], aggregations={'SALESAMOUNT': ['SUM']}, ) # Quickly validate output agg_weekly_sales.to_df() # Print the SQL print(agg_weekly_sales.sql())
The best way to get familiar with the RasgoQL basics is by running through these notebooks in the tutorials folder.
Easily join tables together using the join
transform.
sales_dataset = rasgoql.dataset('ADVENTUREWORKS.PUBLIC.FACTINTERNETSALES') sales_product_dataset = sales_dataset.join( join_table='DIM_PRODUCT', join_columns={'PRODUCTKEY': 'PRODUCTKEY'}, join_type='LEFT', join_prefix='PRODUCT') sales_product_dataset.sql() sales_product_dataset.preview()Chain transforms together
Create a rolling average aggregation and then drops unnecessary colomns.
sales_agg_drop = sales_dataset.rolling_agg( aggregations={"SALESAMOUNT": ["MAX", "MIN", "SUM"]}, order_by="ORDERDATE", offsets=[-7, 7], group_by=["PRODUCTKEY"], ).drop_columns(exclude_cols=["ORDERDATEKEY"]) sales_agg_drop.sql() sales_agg_drop.preview()Transpose unique values with pivots
Quickly generate pivot tables of your data.
sales_by_product = sales_dataset.pivot( dimensions=['ORDERDATE'], pivot_column='SALESAMOUNT', value_column='PRODUCTKEY', agg_method='SUM', list_of_vals=['310', '345'], ) sales_by_product.sql() sales_by_product.preview()Does any of my data get collected?
Rasgo will not collect any personal information. We log execution of methods in transforms.py
for success and failure so that we can more accurately track what's useful and what's problematic.
If you have any questions please:
Review the contributors guide
RasgoQL uses the GNU AGPL license, as found in the LICENSE file.
This project is sponspored by RasgoML. Find out at https://www.rasgoml.com/
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4