How to set up AWS Glue with dbt Developer Hub

Configure AWS IAM role & install dbt in Airflow to run dbt transformations. Add dependencies & create Airflow DAGs to automate workflows. Set up AWS profile for Glue Interactive Session.
Published
May 10, 2024

How to Create an IAM Role for AWS Glue Interactive Session?

Setting up AWS Glue with dbt Developer Hub requires an AWS Identity and Access Management (IAM) role with the permissions needed to run an AWS Glue interactive session. AWS Glue uses this role to access the resources your dbt models need and to perform tasks on your behalf.


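{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "glue:CreateSession",
        "glue:GetSession",
        "glue:ListSessions",
        "glue:StopSession",
        "glue:DeleteSession",
        "glue:RunStatement",
        "glue:GetStatement",
        "glue:ListStatements",
        "glue:CancelStatement"
      ],
      "Resource": "*"
    }
  ]
}

The above JSON policy document grants permission to create an AWS Glue interactive session, run statements in it, and stop or delete it. Attach it to the IAM role that you create for AWS Glue. Because the session runs as that role, the role also needs a trust policy that lets glue.amazonaws.com assume it, plus access to the data dbt works with (for example, Amazon S3 and the AWS Glue Data Catalog). The identity that starts the session additionally needs iam:PassRole on the role.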

  • Version: The policy language version
  • Statement: An array of individual statements
  • Effect: Specifies whether the statement results in an allow or an explicit deny
  • Action: Describes the specific action or actions that will be allowed or denied
  • Resource: Specifies the object or objects to which the action applies
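The role can also be created programmatically. The following is a minimal sketch using two boto3 IAM client calls, create_role and put_role_policy. The role name and inline policy name are hypothetical, the trust policy lets the AWS Glue service (glue.amazonaws.com) assume the role, and the inline policy grants the interactive session actions. It deliberately leaves out the Amazon S3 and Data Catalog permissions your models will need, and the IAM client is passed in as a parameter so the functions can be exercised without AWS credentials.

```python
import json

# Hypothetical names used for illustration only.
ROLE_NAME = "dbt-glue-interactive-session-role"
POLICY_NAME = "glue-interactive-sessions"

# IAM actions for AWS Glue interactive sessions.
SESSION_ACTIONS = [
    "glue:CreateSession", "glue:GetSession", "glue:ListSessions",
    "glue:StopSession", "glue:DeleteSession", "glue:RunStatement",
    "glue:GetStatement", "glue:ListStatements", "glue:CancelStatement",
]


def trust_policy() -> dict:
    """Trust policy that lets the AWS Glue service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }


def session_policy() -> dict:
    """Permissions policy granting the interactive session actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": SESSION_ACTIONS, "Resource": "*"}],
    }


def create_glue_role(iam_client, role_name: str = ROLE_NAME) -> str:
    """Create the role, attach the session policy inline, and return the role ARN.

    iam_client is expected to be a boto3 IAM client, e.g. boto3.client("iam").
    S3 and Data Catalog permissions are intentionally left out of this sketch.
    """
    response = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(trust_policy()),
    )
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=POLICY_NAME,
        PolicyDocument=json.dumps(session_policy()),
    )
    return response["Role"]["Arn"]
```

With boto3 installed and credentials that allow IAM changes, create_glue_role(boto3.client("iam")) would create the role and return its ARN, which is the value you would reference as role_arn in a dbt-glue profile.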

How to Install dbt in the New Airflow Environment?

After creating the IAM role, the next step is to install dbt in the new Airflow environment. dbt is a transformation tool that allows you to define, test, and execute data transformations in SQL.


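pip install dbt-core dbt-glue

The above command uses pip, the Python package installer, to install dbt Core together with dbt-glue, the dbt adapter for AWS Glue. Run it in the Python environment that your Airflow workers use; on a managed service such as Amazon MWAA, packages are installed from requirements.txt instead, as described in the next section.

  • pip: The Python package installer
  • install: The command to install a package
  • dbt-core: The dbt framework and command-line tool
  • dbt-glue: The dbt adapter that runs models through AWS Glue interactive sessions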

How to Add Dependencies to Your requirements.txt?

For AWS Glue to work with dbt Developer Hub, you need to add certain dependencies to your requirements.txt file: boto3, botocore, dbt-core, and dbt-glue. Python itself comes with the Airflow environment and is not listed.


boto3>=1.17.54
botocore>=1.20.54
dbt-core
dbt-glue

The above lines should be added to your requirements.txt file. A line with >= sets a minimum version for that package. For dbt-core and dbt-glue, pin a compatible pair of versions that you have tested together (for example with ==), so that rebuilding the environment does not silently upgrade them.

  • boto3: The AWS SDK for Python
  • botocore: The low-level, core functionality of boto3
  • dbt-core: The dbt framework and command-line tool
  • dbt-glue: The dbt adapter for AWS Glue interactive sessions
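To confirm that an environment actually satisfies these lines, the sketch below parses a requirements.txt and compares the minimums against what is installed, using only the standard library (importlib.metadata). The helper names are hypothetical, only bare names and >= pins are understood, pip options such as --constraint are skipped, and the version comparison looks at numeric release parts only (pre-release tags are ignored), so treat it as a quick sanity check rather than a replacement for pip's resolver.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Accepts "name" or "name>=X.Y.Z" only; other specifiers are out of scope here.
REQUIREMENT = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:>=\s*(\d+(?:\.\d+)*))?$")


def parse_requirements(text: str) -> dict:
    """Map each package name to its minimum version string, or None if unpinned."""
    reqs = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options such as --constraint
            continue
        match = REQUIREMENT.match(line)
        if match is None:
            raise ValueError(f"unsupported requirement line: {raw!r}")
        reqs[match.group(1).lower()] = match.group(2)
    return reqs


def release(v: str) -> tuple:
    """Numeric release tuple with trailing zeros removed, so 1.3 == 1.3.0."""
    m = re.match(r"\d+(?:\.\d+)*", v)
    parts = [int(p) for p in m.group(0).split(".")] if m else []
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def unmet_requirements(reqs: dict, lookup=version) -> list:
    """List packages that are missing or older than their minimum version."""
    problems = []
    for name, minimum in reqs.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if minimum and release(installed) < release(minimum):
            problems.append(f"{name}: {installed} is older than {minimum}")
    return problems
```

Run inside the Airflow environment, for example print(unmet_requirements(parse_requirements(Path("requirements.txt").read_text()))) after importing Path from pathlib; an empty list means every listed package is present at or above its minimum.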

How to Create DAGs Focusing on dbt Transformation?

Once the dependencies are installed, you can create Directed Acyclic Graphs (DAGs) that focus on dbt transformation. A DAG is a set of tasks whose dependencies determine the order in which they run; acyclic means that no task can, directly or indirectly, depend on itself.
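Airflow resolves task order itself, but the two properties in the name are easy to see with Python's standard-library graphlib module (Python 3.9+). In this illustrative sketch, which is not Airflow code, each hypothetical task lists the tasks it depends on; a topological sort yields an order that respects every dependency, and a cycle makes such an order impossible.

```python
from graphlib import CycleError, TopologicalSorter

# Each task maps to the set of tasks that must finish before it starts.
# Task names are hypothetical and loosely mirror a dbt pipeline.
PIPELINE = {
    "dummy_task": set(),
    "dbt_run": {"dummy_task"},
    "dbt_test": {"dbt_run"},
}


def run_order(graph: dict) -> list:
    """Return one valid execution order; raise CycleError if the graph has a cycle."""
    return list(TopologicalSorter(graph).static_order())


def is_acyclic(graph: dict) -> bool:
    """True if the dependency graph contains no cycles, i.e. it is a valid DAG."""
    try:
        TopologicalSorter(graph).prepare()
    except CycleError:
        return False
    return True


print(run_order(PIPELINE))  # ['dummy_task', 'dbt_run', 'dbt_test']
print(is_acyclic({"a": {"b"}, "b": {"a"}}))  # False: a and b wait on each other
```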


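import subprocess
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator  # Airflow 2.3+; replaces DummyOperator
from airflow.operators.python import PythonOperator

# Hypothetical paths; point these at your dbt project and the folder holding profiles.yml
DBT_PROJECT_DIR = "/usr/local/airflow/dags/dbt_project"
DBT_PROFILES_DIR = "/usr/local/airflow/dags/dbt_project"

def dbt_transform():
    # Run the dbt models; check=True makes the task fail if dbt exits with an error
    subprocess.run(
        ["dbt", "run", "--project-dir", DBT_PROJECT_DIR, "--profiles-dir", DBT_PROFILES_DIR],
        check=True,
    )

with DAG(
    dag_id="dbt_dag",
    description="A simple dbt DAG",
    schedule="0 12 * * *",  # every day at 12:00 (UTC unless Airflow's default timezone is changed)
    start_date=datetime(2017, 3, 20),
    catchup=False,
) as dag:
    dummy_operator = EmptyOperator(task_id="dummy_task", retries=3)
    dbt_operator = PythonOperator(task_id="dbt_transform", python_callable=dbt_transform)

    # dbt_transform runs only after dummy_task succeeds
    dummy_operator >> dbt_operator

The above Python script creates a simple Airflow DAG with two tasks. The first is a placeholder task that does nothing, and the second is a Python task that calls dbt_transform, which runs the dbt run command against the project. The script assumes Airflow 2.4 or later (for EmptyOperator and the schedule argument) and that the dbt executable is available on the workers' PATH.

  • DAG: A set of tasks that run in a particular order
  • EmptyOperator: An operator that does nothing (it replaces the deprecated DummyOperator)
  • PythonOperator: An operator that calls a Python function
  • dbt_transform: The function that runs the dbt transformation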
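As the DAG grows, it can help to build dbt invocations in one place. The sketch below is a hypothetical helper, not part of Airflow or dbt, that assembles an argument list for subprocess.run using the standard dbt flags --project-dir, --profiles-dir, --select, and --target; passing a list rather than a shell string avoids quoting problems.

```python
import subprocess
from typing import List, Optional


def build_dbt_command(
    command: str,
    project_dir: str,
    profiles_dir: str,
    select: Optional[str] = None,
    target: Optional[str] = None,
) -> List[str]:
    """Build an argument list for subprocess.run (no shell, so no quoting issues)."""
    args = ["dbt", command, "--project-dir", project_dir, "--profiles-dir", profiles_dir]
    if select:
        args += ["--select", select]  # e.g. a model name or "tag:daily"
    if target:
        args += ["--target", target]  # an output name defined in profiles.yml
    return args


def run_dbt(args: List[str]) -> None:
    """Run dbt; check=True raises CalledProcessError so the Airflow task fails."""
    subprocess.run(args, check=True)
```

For example, dbt_transform could call run_dbt(build_dbt_command("run", "/usr/local/airflow/dags/dbt_project", "/usr/local/airflow/dags/dbt_project", select="tag:daily")) to run only the models tagged daily; the paths and the tag are hypothetical.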

How to Configure Your AWS Profile for Glue Interactive Session?

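The final step in setting up AWS Glue with dbt Developer Hub is to configure your AWS profile for the Glue interactive session. The AWS CLI and SDKs (including boto3) read credentials, meaning your access key ID and secret access key, from the shared credentials file, ~/.aws/credentials, and settings such as the default region from the shared config file, ~/.aws/config.


# ~/.aws/credentials
[default]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY

# ~/.aws/config
[default]
region = YOUR_REGION

Add the above lines to the two files, replacing YOUR_ACCESS_KEY, YOUR_SECRET_KEY, and YOUR_REGION with your actual AWS access key ID, secret access key, and default region (for example, us-east-1), respectively. For a named profile, the credentials file uses [NAME] as the section header, while the config file uses [profile NAME]. Never commit these keys to version control; where Airflow runs on AWS, prefer the environment's IAM role over long-lived access keys.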

  • aws_access_key_id: Your AWS access key ID
  • aws_secret_access_key: Your AWS secret access key
  • region: Your default AWS region
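Before starting a session it can be worth checking that both files contain what the profile needs. The following minimal sketch uses Python's configparser; the helper names are hypothetical, and it only checks that the expected keys are present, not that the credentials are valid. It encodes the AWS section-naming convention: the credentials file uses [NAME], while the config file uses [default] or [profile NAME].

```python
import configparser
from pathlib import Path
from typing import List


def check_aws_profile(credentials_text: str, config_text: str, profile: str = "default") -> List[str]:
    """Return problems found for `profile`; an empty list means the entries look complete."""
    creds = configparser.ConfigParser()
    creds.read_string(credentials_text)
    conf = configparser.ConfigParser()
    conf.read_string(config_text)

    problems = []
    # The credentials file uses the bare profile name as its section header.
    for key in ("aws_access_key_id", "aws_secret_access_key"):
        if not creds.has_option(profile, key):
            problems.append(f"credentials: [{profile}] is missing {key}")
    # The config file uses [default], but [profile NAME] for named profiles.
    section = profile if profile == "default" else f"profile {profile}"
    if not conf.has_option(section, "region"):
        problems.append(f"config: [{section}] is missing region")
    return problems


def check_local_profile(profile: str = "default") -> List[str]:
    """Apply check_aws_profile to the files in ~/.aws (missing files read as empty)."""
    aws_dir = Path.home() / ".aws"
    texts = []
    for name in ("credentials", "config"):
        path = aws_dir / name
        texts.append(path.read_text() if path.exists() else "")
    return check_aws_profile(texts[0], texts[1], profile)
```

check_local_profile() applies the same check to ~/.aws/credentials and ~/.aws/config; to verify that the keys actually work, you could call get_caller_identity on a boto3 STS client created with the same profile.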
