Data analytics and AI are becoming fundamental driving forces behind business decisions. Even simple analytics and AI methods for sales distribution and forecasting, customer segmentation, and churn prediction can significantly impact a business. If you have substantial business data and are excited about getting started with AI using that data, then this blog is for you! We will help you understand how to use SageMaker, the analytics and AI platform from Amazon.

Introduction video:

Credits: Getting Started with Amazon SageMaker

What do you need to have?

  1. An AWS SSO or IAM account to log in to SageMaker Studio. Log in and explore the options to get familiar with the Studio UI.
  2. An Amazon S3 bucket, the Amazon SageMaker SDK, the AWS SDK for Python (boto3), and a local Anaconda installation for Jupyter notebooks, if you want to use SageMaker notebook instances.
  3. Supervised machine learning needs labeled data. If your data is not labeled, one of SageMaker's unique distinguishing features, Ground Truth, will help you prepare a labeled dataset with ease. We use existing open-source data in the sections below. (A quick environment check is sketched after this list.)
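
Before you start, it is worth verifying that the SDKs are available; a minimal sanity check, assuming you run it from a Studio or notebook-instance kernel:

import boto3
import sagemaker
from sagemaker import get_execution_role

# Print SDK versions to confirm the installation
print("boto3:", boto3.__version__)
print("sagemaker:", sagemaker.__version__)

# The execution role grants SageMaker access to S3 and other AWS services
role = get_execution_role()
print("Execution role:", role)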

What is Amazon SageMaker?

Amazon SageMaker provides an integrated Jupyter authoring notebook environment that lets developers and data scientists quickly and easily build, train, and deploy machine learning models.

The typical workflow for creating a machine learning application looks like this: fetch and label the data, build and train a model, then deploy it, evaluate the predictions, and feed the results back into the data and the model.

Using Studio to build Autopilot experiments (for beginners):

  1. In JupyterLab, on the File menu, choose New, then select Notebook. In the Select Kernel dialog, choose Python 3 (Data Science).
  2. Copy and paste the following code into a code cell and choose Run to download and extract the dataset.
%%sh
# Install unzip, download the sample direct-marketing dataset, and extract it
apt-get install -y unzip
wget https://sagemaker-sample-data-us-west-2.s3-us-west-2.amazonaws.com/autopilot/direct_marketing/bank-additional.zip
unzip -o bank-additional.zip
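
Optionally, take a quick look at the extracted file before uploading it; a short sketch, assuming pandas is available in the Data Science kernel:

import pandas as pd

# Peek at the extracted dataset: shape and target distribution
df = pd.read_csv('./bank-additional/bank-additional-full.csv')
print(df.shape)                   # (rows, columns)
print(df['y'].value_counts())     # distribution of the 'y' target attribute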

3. Upload the CSV dataset into an Amazon S3 bucket.

import sagemaker

# Upload the local CSV to the session's default S3 bucket under this prefix
prefix = 'sagemaker/tutorial-autopilot/input'
sess   = sagemaker.Session()

uri = sess.upload_data(path="./bank-additional/bank-additional-full.csv", key_prefix=prefix)

print(uri)

Cheers! You are done with the coding part. The code prints an S3 URI like the one below.

s3://sagemaker-us-east-2-ACCOUNT/sagemaker/tutorial-autopilot/input/bank-additional-full.csv

4. After you have uploaded the dataset, choose the Experiments icon in the left navigation pane and select Create Experiment.

5. Fill in the job settings as shown below and click Create Experiment. (If you prefer to script this step, see the boto3 sketch after the settings.)

Experiment Name: autopilot-exp

S3 location of input data: S3 URI you printed above
s3://sagemaker-us-east-2-ACCOUNT/sagemaker/tutorial-autopilot/input/bank-additional-full.csv

Target attribute name: y

S3 location for output data:
s3://sagemaker-us-east-2-ACCOUNT/sagemaker/autopilot-exp/output
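
As an alternative to the Studio form, the same experiment can be launched with boto3's create_auto_ml_job API; a minimal sketch, reusing the uri printed earlier (the job name and output path below are illustrative):

import boto3
from sagemaker import get_execution_role

sm = boto3.client('sagemaker')

sm.create_auto_ml_job(
    AutoMLJobName='autopilot-exp',                # experiment name
    InputDataConfig=[{
        'DataSource': {'S3DataSource': {
            'S3DataType': 'S3Prefix',
            'S3Uri': uri,                         # the S3 URI printed earlier
        }},
        'TargetAttributeName': 'y',               # the column to predict
    }],
    OutputDataConfig={'S3OutputPath': 's3://YOUR-BUCKET/sagemaker/autopilot-exp/output'},
    RoleArn=get_execution_role(),                 # role with access to the S3 data
)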

6. The Autopilot experiment then runs in three stages (you can follow its progress in Studio, or poll it from code as sketched below):

a. Analyzing Data

b. Feature Engineering

c. Model Tuning
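
To poll the experiment's progress from code, a small sketch using boto3's describe_auto_ml_job:

import time
import boto3

sm = boto3.client('sagemaker')

while True:
    job = sm.describe_auto_ml_job(AutoMLJobName='autopilot-exp')
    status = job['AutoMLJobStatus']             # InProgress / Completed / Failed ...
    stage  = job['AutoMLJobSecondaryStatus']    # AnalyzingData / FeatureEngineering / ModelTuning ...
    print(status, '-', stage)
    if status in ('Completed', 'Failed', 'Stopped'):
        break
    time.sleep(60)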

7. Select & click Deploy Model. Select the endpoint-name and click on Deploy Model.

8. Evaluate the deployed model using the code below.

import boto3

ep_name = 'tutorial-autopilot-best-model'   # name of the endpoint deployed above
sm_rt = boto3.Session().client('runtime.sagemaker')

with open('bank-additional/bank-additional-full.csv') as f:
    lines = f.readlines()

for l in lines[1:2000]:          # Skip the header row
    l = l.split(',')             # Split CSV line into features
    label = l[-1].strip()        # Store 'yes'/'no' label for validation
    l = ','.join(l[:-1])         # Rebuild CSV line without the label
    response = sm_rt.invoke_endpoint(EndpointName=ep_name,
                                     ContentType='text/csv',
                                     Accept='text/csv', Body=l)
    response = response['Body'].read().decode('utf-8')
    print("label %s response %s" % (label, response))

Congratulations!! You have successfully used AI! :-)

Please note: Autopilot currently supports only regression and classification problems.

Using a Studio notebook to build experiments (for experienced coders):

  1. Choose the file browser icon and navigate to "amazon-sagemaker-examples/aws_sagemaker_studio/getting_started".

2. Double-click xgboost_customer_churn_studio.ipynb to open the notebook. In the Select Kernel dialog, choose Python 3 (Data Science), then choose Select.

3. The notebook contains cells that perform each step of the machine learning workflow described above. Update the experiment name, session name, S3 bucket ID, container, instance_type, and other parameters to match your environment; a sketch of these typical parameters follows after this list.

4. Run the cells in sequence by pressing Shift + Enter to run your experiments, then deploy and evaluate the models.
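
The exact cells vary between notebook versions; purely as an illustration, the setup parameters you adjust in step 3 often look something like this (the bucket, prefix, and instance type here are placeholders, not values from the notebook):

import sagemaker
from sagemaker import get_execution_role

sess = sagemaker.Session()
role = get_execution_role()               # IAM role used by the Studio kernel

bucket = sess.default_bucket()            # or your own S3 bucket ID
prefix = 'sagemaker/xgboost-churn'        # hypothetical S3 prefix for this experiment
instance_type = 'ml.m5.xlarge'            # training instance size

# Resolve the region-specific XGBoost container image
container = sagemaker.image_uris.retrieve('xgboost', sess.boto_region_name, version='1.0-1')
print(container)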

Credits: Amazon SageMaker Studio 

How is Amazon SageMaker different?

  • Amazon Augmented AI (Amazon A2I) makes it easy to build and manage human reviews for machine learning applications.
  • Amazon SageMaker Neo makes it possible to train a model once and run it anywhere, in the cloud or at the edge, without worrying about the underlying software and hardware configuration.

Conclusion:

Though the simple drag-and-drop Azure ML platform caters to a wider audience, SageMaker's efficiency makes it the better choice of the two. We can conclude that Amazon's SageMaker is the better option if you are well versed in programming, and it serves well on complex, large-scale projects. In contrast, Microsoft's Azure Machine Learning Studio is more suitable for those with smaller, simpler goals.

Both products are rapidly evolving, so it is now up to you to gauge your requirements and narrow down your selection.