Validating sample data

In this section, we verify that the customer sample data exists in the S3 Holding bucket. This workshop uses the sample data to populate the data lake. The data contains fictitious personal customer information (e.g. first and last name, birth date, birth country) as well as each customer's first sales order date and customer ID.

  1. Refer to the CloudFormation Outputs tab and note the details of the S3 Holding bucket and the S3 Copy Lambda function.

  2. On the AWS Console, select the Amazon S3 service and open the S3 Holding bucket referenced in the CloudFormation output (note: the bucket has “<stackname>-holdingS3bucket” in its name). The bucket should contain 2 files, customer-table-day1 and customer-table-day2, each about 25 MB in size.
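
The two console checks above can also be scripted. The following is a minimal sketch, assuming boto3 is installed and AWS credentials are configured. The function names (`stack_outputs`, `list_object_sizes`, `missing_sample_files`) are hypothetical helpers, not part of the workshop, and the expected file names are taken from step 2. The clients are passed in as arguments so the helpers can be exercised without AWS access.

```python
# Expected object names, taken from step 2 above. Matching is by substring
# because the exact keys (extension, folder prefix) are an assumption.
EXPECTED_NAMES = ("customer-table-day1", "customer-table-day2")


def stack_outputs(cfn_client, stack_name):
    """Return the CloudFormation stack outputs as a {key: value} dict."""
    response = cfn_client.describe_stacks(StackName=stack_name)
    outputs = response["Stacks"][0].get("Outputs", [])
    return {o["OutputKey"]: o["OutputValue"] for o in outputs}


def list_object_sizes(s3_client, bucket):
    """Return {object key: size in bytes} for every object in the bucket."""
    sizes = {}
    kwargs = {"Bucket": bucket}
    while True:
        page = s3_client.list_objects_v2(**kwargs)
        for obj in page.get("Contents", []):
            sizes[obj["Key"]] = obj["Size"]
        if not page.get("IsTruncated"):
            return sizes
        # More results are available; request the next page.
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


def missing_sample_files(sizes, expected=EXPECTED_NAMES):
    """Return the expected names that appear in no object key."""
    return [name for name in expected if not any(name in key for key in sizes)]
```

With real credentials, the helpers might be called with `boto3.client("cloudformation")` and `boto3.client("s3")`, using the stack name and the holding bucket value shown in the Outputs tab. An empty list from `missing_sample_files` means both sample files were found.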

S3 Copy Lambda Function (Optional)

As part of the CloudFormation template deployment, the S3 Copy Lambda function runs automatically and copies the customer sample data into the S3 Holding bucket.

If you are interested in details about this Lambda function, complete the steps below. Alternatively, feel free to move on to the next section.

  1. On the AWS Console, select the AWS Lambda service and click the Lambda function S3CopyLambda created by CloudFormation.

  2. Click the Monitoring tab and scroll down to the CloudWatch Logs Insights section.

  3. Under the CloudWatch Logs Insights section, you can see details about recent Lambda executions: Duration, Memory Used, and CloudWatch Logs.
    Note: CloudWatch might take 1-2 minutes to populate details about a completed Lambda job.
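
Lambda also writes duration and memory figures to its CloudWatch log stream, in a REPORT line at the end of each invocation. If you copy such a line from the logs, a small parser can pull the numbers out. This is a sketch only: `parse_report_line` is a hypothetical helper, and the sample line uses made-up values in the standard REPORT layout rather than output from this workshop.

```python
import re

# Pattern for the REPORT line that Lambda writes at the end of an invocation.
# Fields may be separated by tabs or spaces.
REPORT_PATTERN = re.compile(
    r"REPORT RequestId:\s*(?P<request_id>\S+)\s+"
    r"Duration:\s*(?P<duration_ms>[\d.]+) ms\s+"
    r"Billed Duration:\s*(?P<billed_ms>[\d.]+) ms\s+"
    r"Memory Size:\s*(?P<memory_mb>\d+) MB\s+"
    r"Max Memory Used:\s*(?P<max_used_mb>\d+) MB"
)


def parse_report_line(line):
    """Return the metrics in a Lambda REPORT log line, or None if no match."""
    match = REPORT_PATTERN.search(line)
    if match is None:
        return None
    return {
        "request_id": match["request_id"],
        "duration_ms": float(match["duration_ms"]),
        "billed_ms": float(match["billed_ms"]),
        "memory_mb": int(match["memory_mb"]),
        "max_used_mb": int(match["max_used_mb"]),
    }


# Illustrative line with made-up values (not taken from the workshop logs).
sample = (
    "REPORT RequestId: 00000000-0000-0000-0000-000000000000\t"
    "Duration: 2500.00 ms\tBilled Duration: 2500 ms\t"
    "Memory Size: 128 MB\tMax Memory Used: 80 MB"
)
print(parse_report_line(sample))
```

Comparing `max_used_mb` with `memory_mb` gives a rough sense of how much headroom the function has at its configured memory size.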

Note:
  - AWS Lambda functions are charged based on the number of executions and the duration of compute usage (GB-seconds).
  - AWS Lambda provides a free usage tier, which includes 1 million free requests and 400,000 GB-seconds of compute time each month.
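
To make the GB-seconds unit concrete: usage is the function's configured memory (in GB) multiplied by its run time (in seconds), summed over invocations. The sketch below works through that arithmetic against the free-tier figures quoted above. The 128 MB memory size and 30 second duration are hypothetical inputs, not measured values for S3CopyLambda, and the helper names are illustrative.

```python
# Monthly free-tier allowances, as quoted in the note above.
FREE_REQUESTS_PER_MONTH = 1_000_000
FREE_GB_SECONDS_PER_MONTH = 400_000


def gb_seconds(memory_mb, duration_seconds, invocations=1):
    """Compute usage: configured memory in GB times seconds, per invocation."""
    return (memory_mb / 1024) * duration_seconds * invocations


def free_tier_fraction(memory_mb, duration_seconds, invocations):
    """Return the fraction of each monthly free-tier allowance consumed."""
    used = gb_seconds(memory_mb, duration_seconds, invocations)
    return {
        "requests": invocations / FREE_REQUESTS_PER_MONTH,
        "gb_seconds": used / FREE_GB_SECONDS_PER_MONTH,
    }


# Hypothetical figures: one invocation of a 128 MB function running for 30 s.
usage = gb_seconds(memory_mb=128, duration_seconds=30)
print(usage)  # 3.75
```

Under these assumed inputs, a single run uses 3.75 GB-seconds, a very small share of the 400,000 GB-second monthly allowance.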