2 minute read

1. Dataset Background

Olist is a Brazilian e-commerce marketplace that connects small businesses to major marketplaces, enabling them to sell their products online across Brazil.

This dataset includes 100,000+ orders placed between 2016 and 2018 with details including:

  • Seller performance
  • Customer profiles
  • Product details
  • Order timelines (purchase, payment, shipping)
  • Reviews and ratings
  • Geolocation (states and cities)
  • Marketing channel performance

In this project, the focus is on seller dynamics to derive actionable insights.

2. Project Overview

This project builds a complete ELT pipeline:

  • Extract: Raw CSV files are stored in Google Cloud Storage.
  • Load: Airflow DAGs load them into BigQuery staging tables.
  • Transform: dbt turns staging data into analytics-ready marts.
  • Visualize: Metabase connects to BigQuery for dashboarding.

The infrastructure is managed with Docker and Terraform.

diagram

3. Getting Started

3.1 Clone Repository

git clone https://github.com/fachry-isl/olist-ecom-elt-pipeline.git

3.2 Cloud Service Provisioning

Google Application Credential

Follow this guide to create your credentials.

Save your credentials file to: ~/.google/credentials/google_credentials.json

Then set in variables.tf:

variable "credentials"{
  description = "The path to the service account key file"
  default     = "~/.google/credentials/google_credentials.json"
}

variable "project" {
  description = "Project"
  default     = "gcp-refresh-2025"
}

Terraform Apply

cd terraform
terraform apply

3.3 Setup Environment Variables

Edit docker-compose.yaml or create .env file:

GOOGLE_APPLICATION_CREDENTIALS=/.google/credentials/google_credentials.json
GCP_PROJECT_ID=gcp-refresh-2025
GCP_GCS_BUCKET=gcp-refresh-2025-olist-ecom
BIGQUERY_DATASET=olist_ecom_all

3.4 Build & Run Docker Container

docker-compose up --build

4. Tools Setup and Preview

4.1 Airflow for ELT Pipeline

Airflow orchestrates:

  • Extract from CSV to GCS
  • Load from GCS to BigQuery staging
  • Transform with DBT

4.2 DBT with Cosmos in Airflow

DBT is used for transformations using star schema modeling.

  • Fact and Dimension modeling
  • Data quality checks (e.g. not null, unique)
  • Documentation autogenerated

Cosmos integrates DBT inside Airflow DAGs.

airflow_main_dag

You can access DBT docs from Airflow UI /dbt_docs.

dbt_docs

4.3 Metabase for Visualization

Metabase connects to the final marts in BigQuery to build dashboards.

dashboard

5. Key Insights

๐Ÿ“Š Sales Performance Q3 2018

  • Total Sales: $1.9M (โ†“ 37.95% QoQ, โ†“ 23.39% YoY)
  • Target Achievement: ~78% of $2.5M target
  • Sales Trend: Peaked at $3.1M in Q2 2018, dipped in Q3

๐Ÿ›๏ธ Top Sellers & Categories

  • Leading Categories: watches_gifts, cool_stuff, audio
  • Consistent Sellers: Strong contributors to Q2/Q3 sales

๐Ÿ“ฆ Product-Level Insights

  • Most Ordered: Health & Beauty โ€“ 15% of all orders (50k+)
  • Revenue Drops:
    • watches_gifts: $311k โ†’ $192k
    • bed_bath_table, sports_leisure also down

6. Summary

  • End-to-end ELT pipeline deployed using Terraform, Docker, Airflow, DBT, and Metabase.
  • Olist E-commerce dataset used to analyze seller and product dynamics.
  • Key focus: identifying sales trends, seller performance, and product insights.

7. Recommendations

  • Analyze seasonality to anticipate sales dips.
  • Revisit underperforming categories post-Q2.
  • Use seller/product insights to optimize listings.

Project by Fachry Ikhsal ยท 2025