Azure Machine Learning Services

By Prashanth Yelsetty, Kenfront LLC
Azure Machine Learning service is a cloud service that allows us to train, deploy, automate, and manage machine learning models at the broad scale that the cloud provides. Microsoft launched the Azure Machine Learning service in September 2018 to help data scientists and machine learning engineers build end-to-end machine learning pipelines. The service provides SDKs and services to quickly pre-process data, train models, and deploy them.
This article focuses on the Azure Machine Learning service and covers its basic concepts, including an example of training and deploying your own machine learning model. The article is broken down as follows:
1) Introduction to Azure Machine Learning Services
2) Developing and Deploying Machine Learning Models using Azure Notebooks
Introduction to Azure Machine Learning Service
Azure Machine Learning Service (AML) provides a cloud-based environment you can use to prep data, train, test, deploy, manage, and track machine learning models.

 

Workflow

The machine learning workflow generally follows this sequence (a minimal SDK sketch of steps 3 to 5 appears after the list):

  1. Develop machine learning training scripts in Python.
  2. Create and configure a compute target.
  3. Submit the scripts to the configured compute target to run in that environment. During training, the scripts can read from or write to a datastore, and the records of execution are saved as runs in the workspace and grouped under experiments.
  4. Query the experiment for logged metrics from the current and past runs. If the metrics don’t indicate a desired outcome, loop back to step 1 and iterate on your scripts.
  5. After a satisfactory run is found, register the persisted model in the model registry.
  6. Develop a scoring script.
  7. Create an image and register it in the image registry.
  8. Deploy the image as a web service in Azure.
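Putting steps 3 to 5 together, a minimal sketch with the Python SDK might look like the following; the experiment name, script name, and model paths are illustrative, and a workspace config.json and a train.py script are assumed to exist already.

from azureml.core import Workspace, Experiment, ScriptRunConfig

# illustrative names; assumes a saved workspace config.json and a train.py script
ws = Workspace.from_config()
experiment = Experiment(workspace=ws, name="allstate-experiment")

# step 3: submit the training script (runs on the local compute target by default)
config = ScriptRunConfig(source_directory=".", script="train.py")
run = experiment.submit(config)
run.wait_for_completion(show_output=True)

# step 4: query the logged metrics for this run
print(run.get_metrics())

# step 5: register the persisted model from the run's outputs
model = run.register_model(model_name="AllStatemodel",
                           model_path="outputs/allstate_model.sav")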

Concepts

In order to work with AML, one needs to be aware of the following concepts:

Workspace: The workspace is the top-level resource for Azure Machine Learning service. It provides a centralized place to work with all the artefacts you create when you use Azure Machine Learning service. A workspace can be created from the Azure portal or directly from the SDK.
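As a minimal sketch, a workspace can be created from the Python SDK like this; the workspace name, resource group, subscription ID, and region shown are illustrative placeholders.

from azureml.core import Workspace

# illustrative values; substitute your own subscription ID, resource group, and region
ws = Workspace.create(name="my-aml-workspace",
                      subscription_id="<subscription-id>",
                      resource_group="my-resource-group",
                      create_resource_group=True,
                      location="eastus2")
ws.write_config()  # saves config.json so Workspace.from_config() can reload it later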

Experiment: Within the workspace, you can define experiments that contain individual training runs. Each training run you perform will associate itself with an experiment and a workspace. Defining logical high-level experiments will help you monitor various training runs and their outputs.

Pipeline: Pipelines are used to create and manage workflows that stitch together machine learning phases such as data preparation, model training, model deployment, and inferencing.

Compute: A compute target is the compute resource used to run your training script or host your service deployment. Compute targets are attached to a workspace, and compute targets other than the local machine are shared by users of the workspace.
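For example, a managed compute cluster can be provisioned and attached to the workspace ws created above; this is a hedged sketch in which the cluster name and VM size are illustrative choices.

from azureml.core.compute import AmlCompute, ComputeTarget

# illustrative cluster name and VM size; `ws` is the workspace created earlier
compute_config = AmlCompute.provisioning_configuration(vm_size="STANDARD_D2_V2",
                                                       min_nodes=0,
                                                       max_nodes=4)
compute_target = ComputeTarget.create(ws, "cpu-cluster", compute_config)
compute_target.wait_for_completion(show_output=True)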

Model:  At its simplest, a model is a piece of code that takes an input and produces output. Creating a machine learning model involves selecting an algorithm, providing it with data, and tuning hyperparameters.

Image: Images provide a way to reliably deploy a model, along with all components you need to use the model.

Deployment: A deployment is an instantiation of your image into either a web service that can be hosted in the cloud or an IoT module for integrated device deployments.

Developing a Machine Learning Pipeline with Azure Notebooks

The Allstate Claim Prediction Challenge is hosted by Kaggle. The goal of the competition is to predict Bodily Injury Liability Insurance claim payments based on the characteristics of the insured customer's vehicle. Many factors contribute to the frequency and severity of car accidents, including how, where, and under what conditions people drive, as well as what they are driving. Bodily Injury Liability Insurance covers other people's bodily injury or death for which the insured is responsible.

Launch the workspace that you created in the Azure portal, open Azure Notebooks, and create a new project with a memorable name.

In the project workspace we can store data, notebooks, pickle files, and so on. Open the created project and create a new Python 3.6 notebook.

Azure Notebooks is powered by Jupyter. Once the notebook is opened, everything looks like a Jupyter notebook, and we can develop scripts in different programming languages and language versions.

Step 1 – Import Data. A new Azure notebook opens, and within this notebook we develop the machine learning (ML) model. Anyone familiar with Jupyter notebooks will find no difference in Azure Notebooks functionality; in addition to the standard Jupyter features, we have the Azure ML libraries. We are using Azure Blob Storage to store the data, and the data is imported using the azureml dataprep library.

import azureml.dataprep as dprep

dataset_root = "https://databricksstg.blob.core.windows.net/allstate/allstate_2006_100k.csv"
data_az = dprep.read_csv(path=dataset_root, header=dprep.PromoteHeadersMode.GROUPED)
display(data_az.head(5))

We can use either the azureml library or regular Python libraries to process the data and develop the ML model; accordingly, we need to change the dataset type, and AML allows us to switch among dataset types. Here we are using regular Python libraries, so the imported dataset is converted into a pandas DataFrame.

data = data_az.to_pandas_dataframe()

Step 2 – Exploratory Data Analysis. Exploratory Data Analysis (EDA) refers to the process of performing initial investigations on data to discover patterns, spot anomalies, test hypotheses, and check assumptions with the help of summary statistics and graphical representations. EDA gives an idea of the data pre-processing steps and the ML algorithms to consider. We have performed EDA on the data.

In the EDA, the following things are checked (a minimal pandas sketch appears after the list):

  • The datatype of each variable
  • The number of categorical levels in each categorical variable
  • Statistics (mean, min, max, variance, etc.) of the numerical variables
  • Missing values in each variable
  • The distribution of the target variable
  • The distribution of the input numerical variables
  • Correlation among the numerical variables and the target variable
  • The relation between car age and the target variable
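A minimal pandas sketch of these checks might look like the following; it assumes data is the DataFrame created in Step 1, that the target column is named Claim_Amount, and that matplotlib is available for plotting.

import matplotlib.pyplot as plt

# `data` is the pandas DataFrame created in Step 1; Claim_Amount is the target column
print(data.dtypes)                              # datatype of each variable
print(data.select_dtypes("object").nunique())   # categorical levels per categorical variable
print(data.describe())                          # summary statistics of the numeric variables
print(data.isnull().sum())                      # missing values per variable

data["Claim_Amount"].hist(bins=50)              # distribution of the target variable
plt.show()

# correlation of the numeric variables with the target
print(data.select_dtypes("number").corr()["Claim_Amount"].sort_values())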

These are the observations from EDA:

  • The target variable follows a Tweedie distribution, which combines a Poisson distribution with a gamma distribution and is frequently used in the insurance industry
  • All the input numeric variables are normally distributed
  • No input variable is correlated with the target variable
  • Vehicle age impacts the claim amount
  • The data has missing values, and some variables have more than 40% missing values
  • The data has categorical variables, and some categorical variables have a large number of categorical levels

 

Step 3 – Data Preparation. Based on the observations from the EDA, the following data pre-processing is done (a minimal pandas sketch appears after the list):

  • Features that do not impact the target variable (Row ID, Household ID) are removed
  • Variables with more than 40% missing values are not used to build the model, as they may mislead it
  • Some new features are extracted, and redundant features are removed
  • Missing values are imputed using a central imputation method
  • Categorical features with fewer categorical levels are one-hot encoded
  • The dataset is divided into a train dataset and a test dataset
  • The remaining categorical features are converted into response (target) averages
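Under the assumption that the identifier columns are named Row_ID and Household_ID, that the target is Claim_Amount, and that "fewer levels" means ten or fewer, a minimal sketch of these steps could look like this:

import pandas as pd
from sklearn.model_selection import train_test_split

# drop identifiers that do not impact the target
df = data.drop(columns=["Row_ID", "Household_ID"])

# drop variables with more than 40% missing values
df = df.loc[:, df.isnull().mean() < 0.4]

# central imputation of the remaining missing numeric values
num_cols = df.select_dtypes("number").columns
df[num_cols] = df[num_cols].fillna(df[num_cols].mean())

# one-hot encode the categorical features with few levels
low_card = [c for c in df.select_dtypes("object") if df[c].nunique() <= 10]
df = pd.get_dummies(df, columns=low_card)

# split into train and test datasets
train, test = train_test_split(df, test_size=0.3, random_state=42)

# convert the remaining high-cardinality categoricals into response averages
for c in train.select_dtypes("object"):
    means = train.groupby(c)["Claim_Amount"].mean()
    train[c] = train[c].map(means)
    test[c] = test[c].map(means)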

Step 4 – Building the ML Model. The distribution of the target variable is Tweedie, so plain linear models are not effective here. Generalised Linear Models (GLM) with the Tweedie distribution and XGBoost with the Tweedie objective are used to build the model; the Tweedie variance power is the key hyperparameter for both algorithms. We experimented with different GLM and XGBoost models by tuning the hyperparameters, and the results are better with the XGBoost model than with GLM. Once the model is finalised, save it in ".pkl" or ".sav" format.
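As a minimal sketch, assuming X_train and y_train come from the prepared train set and that the variance power and other hyperparameter values shown are just illustrative starting points, an XGBoost model with the Tweedie objective can be trained and saved like this:

import pickle as pkl
from xgboost import XGBRegressor

X_train = train.drop(columns=["Claim_Amount"])
y_train = train["Claim_Amount"]

model = XGBRegressor(objective="reg:tweedie",
                     tweedie_variance_power=1.5,  # key hyperparameter, tuned experimentally
                     n_estimators=200,
                     learning_rate=0.05)
model.fit(X_train, y_train)

# persist the finalised model
pkl.dump(model, open("allstate_model.sav", "wb"))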

Step 5 – Evaluation Metric. The normalized Gini index is the evaluation metric here. The Gini index is widely used in statistics to measure inequality in a distribution; a higher Gini index indicates greater inequality. The normalized Gini evaluates the ordering of the predicted values, which is what matters in this case.
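A small helper for the normalized Gini, following the formulation commonly used in Kaggle competitions, might look like this (a sketch, not necessarily the exact code used here):

import numpy as np

def gini(actual, pred):
    # order actual losses by descending prediction and accumulate them
    actual = np.asarray(actual)[np.argsort(pred)[::-1]]
    n = len(actual)
    cum = np.cumsum(actual) / actual.sum()
    return cum.sum() / n - (n + 1) / (2.0 * n)

def normalized_gini(actual, pred):
    # normalize by the Gini of a perfect ordering
    return gini(actual, pred) / gini(actual, actual)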

Once the ML model is finalized, it is ready to be deployed. Up to this point the process is the same as in Jupyter notebooks or any other IDE. To deploy the model, we use the Azure ML library. We can deploy the model as a web service using Azure Container Instances, Azure Kubernetes Service, or FPGAs. We create the service from our model, scoring script, and associated files, which are encapsulated in an image. The deployment steps are as follows.

Step 6 – Register the ML Model. Set up the environment and authenticate to your workspace by providing the subscription ID of your AML workspace, the resource group name, and your AML workspace name. Retrieve the model saved earlier and choose a name under which to register it.

from azureml.core import Workspace

ws = Workspace(subscription_id="Subscription-ID-of-your-AML-workspace",
               resource_group="Your-Resource-group-name",
               workspace_name="Your-AML-workspace-name")

from azureml.core.model import Model

model = Model.register(model_path="allstate_model.sav",
                       model_name="AllStatemodel",
                       tags={"key": "0.1"},
                       description="test",
                       workspace=ws)

After successfully registering the model, we will be able to see the model in the AML workspace.

Step 7 – Register the Image. We need to create a scoring script, an environment file, and a deployment configuration file. The web service call uses the scoring script to show how to use the model. The environment file specifies all the script's package dependencies and is used to make sure that those dependencies are installed in the Docker image. The deployment configuration file specifies the number of CPU cores and gigabytes of RAM needed for the Container Instances container.

Scoring Script:

%%writefile score.py
import json
import os
import pickle as pkl

import numpy as np
import requests
from azureml.core.model import Model
from pip._internal import main

# install xgboost inside the scoring container at runtime
main(["install", "xgboost"])


def init():
    global model
    # retrieve the path to the model file using the registered model name
    model_path = Model.get_model_path("AllStatemodel")
    model = pkl.load(open(model_path, "rb"))


def run(raw_data):
    # parse the incoming JSON payload into a NumPy array
    data = np.array(json.loads(raw_data)["data"])
    # make prediction
    y_hat = model.predict(data)
    # you can return any data type as long as it is JSON-serializable
    return y_hat.tolist()

 

Environment File:

from azureml.core.conda_dependencies import CondaDependencies

myenv = CondaDependencies()

myenv.add_conda_package(“scikit-learn”)

myenv.add_conda_package(“xgboost”)

with open(“myenv.yml”,”w”) as f:

    f.write(myenv.serialize_to_string())

Deployment Configuration:

from azureml.core.webservice import AciWebservice

aciconfig = AciWebservice.deploy_configuration(cpu_cores=1,
                                               memory_gb=1,
                                               tags={"data": "x_test.csv", "method": "xgboost"},
                                               description="Predict Claim_Amount with XGBoost")

Once the image is registered, we will see it under Images in our AML workspace.

Step 8 – Deploy the Image. Configure the image and deploy it. Here we deploy the image to Azure Container Instances (ACI), which offers a simple way to run a container in Azure without having to provision virtual machines or adopt a higher-level service. After the image is successfully deployed, it will appear under Deployments in the AML workspace.

%%time
from azureml.core.webservice import Webservice
from azureml.core.image import ContainerImage

# configure the image
image_config = ContainerImage.image_configuration(execution_script="score.py",
                                                  runtime="python",
                                                  conda_file="myenv.yml")

service = Webservice.deploy_from_model(workspace=ws,
                                       name="allstate-xgboost1",
                                       deployment_config=aciconfig,
                                       models=[model],
                                       image_config=image_config)
service.wait_for_deployment(show_output=True)
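Once the deployment completes, the web service can be tested with a couple of rows. This is a hedged sketch that assumes x_test is the prepared test feature DataFrame; the JSON payload shape matches what the scoring script's run() function expects.

import json

# x_test is assumed to be the prepared test feature DataFrame
test_samples = json.dumps({"data": x_test[:2].values.tolist()})
predictions = service.run(input_data=test_samples)
print(service.scoring_uri)
print(predictions)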

 

 

 
