30 Data Engineering Project Ideas

Luisprooc
4 min read · May 15, 2023


Photo by Markus Winkler on Unsplash

🎯 Beginners

  1. Data Pipeline for Weather Data: Build a data pipeline that pulls weather data from an API and stores it in a database.
  2. ETL for Movie Ratings: Create an ETL process that extracts movie ratings from a CSV file, transforms the data, and loads it into a database.
  3. Data Warehouse for Sales Data: Design and implement a data warehouse to store sales data from an e-commerce site.
  4. Data Pipeline for Social Media Data: Build a data pipeline that ingests social media data from the Twitter API and stores it in a database.
  5. ETL for Financial Data: Create an ETL process that extracts financial data from an Excel file, transforms the data, and loads it into a database.
  6. Data Pipeline for News Articles: Build a data pipeline that pulls news articles from an API and stores them in a database.
  7. ETL for Social Media Analytics: Create an ETL process that extracts social media data, transforms it, and loads it into a database for analytics.
  8. Data Lake for Satellite Imagery: Build a data lake to store satellite imagery for remote sensing applications.
  9. Data Pipeline for Financial News: Create a data pipeline that pulls financial news from an API and stores it in a database.
  10. ETL for Sensor Data: Create an ETL process that extracts sensor data from a CSV file, transforms the data, and loads it into a database.
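Several of the beginner ideas (the movie-ratings, financial and sensor ETL projects) share the same extract-transform-load shape. As a rough starting point, here is a minimal Python sketch of that shape. It assumes a CSV with hypothetical `movie_id`, `title` and `rating` columns and uses SQLite from the standard library as the target database. The sample data, column names and validation rules are made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample input standing in for a real movie-ratings CSV file.
SAMPLE_CSV = """movie_id,title,rating
1, The Matrix ,4.5
2,Inception,5
3,Broken Row,not-a-number
1, The Matrix ,4.0
"""


def extract(source):
    """Extract: read every CSV row from a file-like object as a dict."""
    return list(csv.DictReader(source))


def transform(rows):
    """Transform: trim text, cast types, and drop rows that fail validation."""
    clean = []
    for row in rows:
        try:
            movie_id = int(row["movie_id"])
            rating = float(row["rating"])
        except (KeyError, TypeError, ValueError):
            continue  # skip malformed rows instead of failing the whole run
        if not 0.0 <= rating <= 5.0:
            continue  # skip ratings outside the assumed 0-5 scale
        title = (row.get("title") or "").strip()
        clean.append((movie_id, title, rating))
    return clean


def load(conn, records):
    """Load: create the target table if needed and insert the cleaned records."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS ratings ("
            "movie_id INTEGER, title TEXT, rating REAL)"
        )
        conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)", records)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(io.StringIO(SAMPLE_CSV))))
    query = "SELECT movie_id, AVG(rating), COUNT(*) FROM ratings GROUP BY movie_id"
    for row in conn.execute(query):
        print(row)
```

Keeping extract, transform and load as separate functions makes each step easy to test on its own and to swap out, for example replacing the inline sample with a real file opened via `open(path, newline="")`, or SQLite with another database.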

🎯 Intermediate

  1. Data Pipeline for Clickstream Data: Build a data pipeline that ingests clickstream data from a website and stores it in a database for analytics.
  2. ETL for Customer Data: Create an ETL process that extracts customer data from a CRM system, transforms the data, and loads it into a database for analytics.
  3. Data Lake for Machine Learning: Build a data lake to store data for machine learning models, including training data, feature sets, and model outputs.
  4. Data Pipeline for Financial Transactions: Create a data pipeline that ingests financial transaction data from a bank’s API and stores it in a database for analytics.
  5. ETL for Social Media Influence: Create an ETL process that extracts social media data and computes a social media influence score for individuals.
  6. Data Warehouse for Healthcare Claims: Design and implement a data warehouse to store healthcare claims data from multiple sources for analysis.
  7. Data Pipeline for IoT Sensor Data: Build a data pipeline that ingests IoT sensor data from multiple sources and stores it in a database for analysis.
  8. ETL for Stock Market Data: Create an ETL process that extracts stock market data from multiple sources, transforms the data, and loads it into a database for analytics.
  9. Data Lake for Genomics Data: Build a data lake to store genomics data, including gene sequences, gene expression data, and other genomic features.
  10. Data Pipeline for Social Media Advertising: Create a data pipeline that ingests social media advertising data and stores it in a database for analysis.
  11. ETL for Customer Reviews: Create an ETL process that extracts customer review data from multiple sources, transforms the data, and loads it into a database for analysis.
  12. Data Warehouse for Marketing Data: Design and implement a data warehouse to store marketing data from multiple sources, including web analytics, advertising campaigns, and CRM data.
  13. Data Pipeline for Traffic Analysis: Build a data pipeline that ingests traffic data from multiple sources, including GPS data and traffic sensors, and stores it in a database for analysis.
  14. ETL for Social Network Analysis: Create an ETL process that extracts social network data from multiple sources, transforms the data, and loads it into a database for analysis.
  15. Data Lake for Log Data: Build a data lake to store log data from multiple sources, including web servers, application servers, and network devices.
  16. Data Pipeline for Supply Chain Optimization: Create a data pipeline that ingests supply chain data from multiple sources and stores it in a database for analysis and optimization.
  17. ETL for Energy Consumption Data: Create an ETL process that extracts energy consumption data from multiple sources, transforms the data, and loads it into a database for analysis.
  18. Data Warehouse for Inventory Management: Design and implement a data warehouse to store inventory data from multiple sources, including point-of-sale systems, warehouses, and distribution centers.
  19. Data Pipeline for Fraud Detection: Build a data pipeline that ingests financial transaction data from multiple sources and performs fraud detection using machine learning.
  20. ETL for Natural Language Processing: Create an ETL process that extracts text data from multiple sources, transforms it, and loads it into a database for analysis and for natural language processing (NLP) applications.
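Many of the intermediate ideas ingest data "from multiple sources" into one database. One common way to handle that, sketched here under stated assumptions, is to normalize each source into a shared schema and then upsert on a key so that re-running a batch does not create duplicates. The two sources, their field names, the `inventory` table and the "newest timestamp wins" rule are all hypothetical. The upsert uses SQLite's `INSERT ... ON CONFLICT DO UPDATE` syntax, which requires SQLite 3.24 or newer.

```python
import sqlite3

# Hypothetical records from two sources with different field names,
# e.g. a point-of-sale export and a warehouse system.
POS_ROWS = [{"sku": "A-1", "qty": 5, "updated": "2023-05-01"},
            {"sku": "B-2", "qty": 2, "updated": "2023-05-02"}]
WAREHOUSE_ROWS = [{"item_code": "a-1", "on_hand": 7, "ts": "2023-05-03"},
                  {"item_code": "C-3", "on_hand": 1, "ts": "2023-05-01"}]


def normalize_pos(row):
    """Map a POS record onto the shared (sku, quantity, updated_at) schema."""
    return (row["sku"].upper(), int(row["qty"]), row["updated"])


def normalize_warehouse(row):
    """Map a warehouse record onto the same shared schema."""
    return (row["item_code"].upper(), int(row["on_hand"]), row["ts"])


# ISO-8601 date strings compare correctly as text, so the WHERE clause
# only overwrites a row when the incoming record is strictly newer.
UPSERT = """
INSERT INTO inventory (sku, quantity, updated_at) VALUES (?, ?, ?)
ON CONFLICT(sku) DO UPDATE SET
    quantity = excluded.quantity,
    updated_at = excluded.updated_at
WHERE excluded.updated_at > inventory.updated_at
"""


def ingest(conn, records):
    """Idempotently upsert records; the newest timestamp per SKU wins."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS inventory ("
            "sku TEXT PRIMARY KEY, quantity INTEGER, updated_at TEXT)"
        )
        conn.executemany(UPSERT, records)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    records = ([normalize_pos(r) for r in POS_ROWS]
               + [normalize_warehouse(r) for r in WAREHOUSE_ROWS])
    ingest(conn, records)
    ingest(conn, records)  # re-running the same batch adds no duplicate rows
    print(conn.execute("SELECT * FROM inventory ORDER BY sku").fetchall())
```

Because `ingest` can be called repeatedly with the same records without changing the result, scheduled reruns and backfills are safe with this pattern. The same normalize-then-upsert idea carries over to other targets such as PostgreSQL, which has its own `ON CONFLICT` clause.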

๐ŸŒ Web Development - ๐Ÿง  AI & Machine Learning ๐Ÿ› ๏ธ Data Engineering - ๐Ÿ Python aficionado