Option 2 - Event-Driven Data Lake

This lab demonstrates an event-driven data lake ingestion strategy and shows how to use AWS services to transform the raw data, query the data lake, and visualize the results.

In this lab you will perform the following tasks:

  • Validate sample data
  • Configure an ETL pipeline using S3 event triggers and AWS Lambda
  • Catalog ingested data using AWS Glue
  • Transform ingested data into Parquet format using an AWS Glue ETL job
  • Execute ad-hoc queries on transformed data using Amazon Athena
  • Visualize formatted data using Amazon QuickSight
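
As a rough illustration of the S3 event trigger step, the sketch below shows how a Lambda function might read the bucket name and object key from an S3 event notification. It is not the lab's actual function: the names extract_s3_objects and lambda_handler, and the bucket and key in the usage note, are hypothetical, and the handler only logs each object where a real pipeline would start its ETL step.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs for each record in an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record.get("s3", {})
        bucket = s3_info.get("bucket", {}).get("name")
        key = s3_info.get("object", {}).get("key")
        if bucket and key:
            # Object keys arrive URL-encoded in S3 events (spaces as '+').
            objects.append((bucket, unquote_plus(key)))
    return objects


def lambda_handler(event, context):
    # Hypothetical handler: a real pipeline would start its ETL step here
    # instead of only logging the new object.
    objects = extract_s3_objects(event)
    for bucket, key in objects:
        print(f"New object: s3://{bucket}/{key}")
    return {"processed": len(objects)}
```

Calling lambda_handler with an event whose single record names bucket "raw-bucket" and key "incoming/sample+data.csv" logs the decoded key "incoming/sample data.csv" and returns a count of one processed object.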

Figure: Data lake architecture overview

Services Used

The tasks in this lab use Amazon S3, AWS Lambda, AWS Glue, Amazon Athena, and Amazon QuickSight.

Duration: approximately 2 to 2.5 hours