Using the AzureML R package to connect Azure ML Studio and R

This notebook demonstrates some of the capabilities of the AzureML package:

Read, download, and delete data in Azure ML workspace
Read intermediate data from Azure ML experiment
Allow for a concise way of publishing and consuming web services

The target audience should have a basic understanding of the Azure Machine Learning studio. Specifically, you should:

Know how to get the workspace ID and authorization token
- Note that this step is not necessary in Azure ML Jupyter notebooks, because the notebook service stores your credentials in the file system
Understand setting up web services on Azure

If you are completely new to Azure ML, the Tutorial for Data Scientists can help you get started. All results as shown here are from my own Azure ML workspace.

Note that you don't have to specify own workspace ID and authorization token to run the code.

Load the package

The package is already in the Jupyter service on Azure ML.

Start by loading the package.



In [1]:

    
library(AzureML)









    



Warning message:
: package 'AzureML' was built under R version 3.2.3

Work with Workspace

The AzureML R package allows you to work with workspaces directly. Specifically, with this package users can read, download, and delete data in an Azure ML workspace.

Connect with AzureML workspace

Start by loading the library and setting up connection with an AzureML workspace.



In [2]:

    
# Connect to the workspace
ws <- workspace()

List datasets

The datasets attribute of a workspace() variable contains information about all the datasets in the workspace, including the default datasets from Microsoft.



In [3]:

    
# list first several datasets in my workspace
head(datasets(ws, filter = "sample")$Name)









    Out[3]:





	"text.preprocessing.zip"
	"fraudTemplateUtil.zip"
	"testDataSource_a5e9faf817084997981d094ef040389b"
	"MetaAnalytics.Test.GlobalDataset.IntegerTSVFile"
	"MetaAnalytics.Test.GlobalDataset.IntegerCSVFile"
	"testDataSource_65885ed31a854d70962a02d5641d3c92"

Download a dataset

To download a dataset we can use the download.datasets() function.



In [4]:

    
# download datasets
movies <- download.datasets(ws, name = "Movie Ratings")
head(movies)









    Out[4]:





UserId MovieId Rating Timestamp

	1 1 68646 10 1381620027
	2 1 113277 10 1379466669
	3 2 454876 8 1394818630
	4 2 790636 7 1389963947
	5 2 816711 8 1379963769
	6 2 1091191 7 1391173869



In [5]:

    
options(repr.plot.width = 6, repr.plot.height = 4)
hist(movies$Rating, main = "Rating", xlab = NULL)

Upload a dataset

We'll use the air quality dataset that comes with base R to show how a dataset can be uploaded. Note that if dataset with the same name already exists in the workspace an error will be reported.



In [6]:

    
airquality[1:10,]









    Out[6]:





Ozone Solar.R Wind Temp Month Day

	1 41 190 7.4 67 5 1
	2 36 118 8 72 5 2
	3 12 149 12.6 74 5 3
	4 18 313 11.5 62 5 4
	5 NA NA 14.3 56 5 5
	6 28 NA 14.9 66 5 6
	7 23 299 8.6 65 5 7
	8 19 99 13.8 59 5 8
	9 8 19 20.1 61 5 9
	10 NA 194 8.6 69 5 10



In [7]:

    
# uploading R data frame to Azure ML workspace
mydata <- airquality[1:10,]
# information about the uploaded dataset in the workspace will be returned
upload.dataset(mydata, ws, name = "my air quality")









    Out[7]:





VisualizeEndPoint SchemaEndPoint SchemaStatus Id DataTypeId Name Description FamilyId ResourceUploadId SourceOrigin ellip.h PromotedFrom UploadedFromFilename ServiceVersion IsLatest Category DownloadLocation IsDeprecated Culture Batch CreatedDateTicks

	1 NANA NANANA Pending a2aba0dafad8436788401bbc8c22fe36.6faa388f56e6415492bec335192d2d00.v1-default-106 GenericTSV my air quality 6faa388f56e6415492bec335192d2d00 f5004e84bfef4733b8cdc63bf1db717e FromResourceUpload <8b> NA 0 TRUE NA https://esintussouthsus.blob.core.windows.net/uploadedresources/C7FDC_a2aba0dafad8436788401bbc8c22fe36_f5004e84bfef4733b8cdc63bf1db717e.tsv?sv=2015-02-21&sr=b&sig=tkHy%2BvjbJ5dDQgCVY%2Fv6b9mz%2BgwMOK5GPFbq79U%2Bhsk%3D&st=2016-03-13T18%3A21%3A10Z&se=2016-03-14T18%3A26%3A10Z&sp=r&rscd=attachment%3B%20filename%3D%22my%20air%20quality.tsv%22 FALSE default 106 6.359349e+17



In [8]:

    
# download to check its content
head(download.datasets(ws, name = "my air quality"))









    Out[8]:





Ozone Solar.R Wind Temp Month Day

	1 41 190 7.4 67 5 1
	2 36 118 8 72 5 2
	3 12 149 12.6 74 5 3
	4 18 313 11.5 62 5 4
	5 NA NA 14.3 56 5 5
	6 28 NA 14.9 66 5 6

Delete a dataset

If the delete action is successful the returned status value for Deleted should be TRUE.



In [9]:

    
# delete dataset
delete.datasets(ws, name = "my air quality")









    Out[9]:





Name Deleted status_code

	1 my air quality TRUE 204

The "Airport Codes Dataset" is one of the dafault datasets in Azure ML. This example shows that the default datasets cannot be deleted.



In [10]:

    
# delete Azure sample dataset: not allowed
# Uncomment the following line to see the failure report

# delete.datasets(ws, name = "Airport Codes Dataset")

Work with experiments

The AzureML package allows you to get a summary of the existing experiments and to download the intermediate datasets.

List existing experiments

Information for all experiments in the workspace, including the default ones from Microsoft, can be returned.



In [11]:

    
# experiments
exps <- experiments(ws)
head(
    with(exps, data.frame(Description, ExperimentId, Creator, stringsAsFactors = FALSE))
    )
#head(cbind(Description = exps$Description, ExperimentId = exps$ExperimentId, Creator = exps$Creator))









    Out[11]:





Description ExperimentId Creator

	1 Simple convert to TSV a2aba0dafad8436788401bbc8c22fe36.f-id.2237184cf35e42c4a41c0307f623a3f3 adevries
	2 Experiment created on â€Ž2â€Ž/â€Ž4â€Ž/â€Ž2016 a2aba0dafad8436788401bbc8c22fe36.f-id.26730c0bfd0948afb8eb8b405bd264ba adevries
	3 Experiment created on â€Ž12â€Ž/â€Ž18â€Ž/â€Ž2015 a2aba0dafad8436788401bbc8c22fe36.f-id.3292fcc8bcd14c29b75242f384190a41 adevries
	4 Experiment created on â€Ž2â€Ž/â€Ž9â€Ž/â€Ž2016 a2aba0dafad8436788401bbc8c22fe36.f-id.63199eaa2e8e4da1beb4105b79b74988 adevries
	5 Experiment created on â€Ž2â€Ž/â€Ž23â€Ž/â€Ž2016 a2aba0dafad8436788401bbc8c22fe36.f-id.9c98fd354ff2471bb6d3d5798ce35d16 adevries
	6 Simple experiment to test Jupyter a2aba0dafad8436788401bbc8c22fe36.f-id.a0fbd6ccad204e65825b5b7e574c9be8 adevries

You can also filter by using the experiments() function with the "filter" argument.



In [12]:

    
# check sample experiments
e <- experiments(ws, filter = "samples")
head(e$Creator)
head(cbind(e$Description, e$ExperimentId))









    Out[12]:





	"Microsoft Corporation"
	"Microsoft Corporation"
	"Microsoft Corporation"
	"Microsoft Corporation"
	"Microsoft Corporation"
	"Microsoft Corporation"








    Out[12]:






	Binary Classification: Network intrusion detection                    506153734175476c4f62416c57734963.f-id.02556bd24faf4f099dc88c1d21262f36
	Text Classification: Step 2 of 5, text preprocessing                  506153734175476c4f62416c57734963.f-id.07811aadbdca4669a2d129abb083d2f0
	Learning with Counts: Binary Classification                           506153734175476c4f62416c57734963.f-id.08b6ef9574de46e7a9fdc74dc5221fb2
	Clustering: Color quantization                                        506153734175476c4f62416c57734963.f-id.0a4458138dc140329a934393d7bbf2ed
	Retail Forecasting: Step 4 of 6, train regression models              506153734175476c4f62416c57734963.f-id.0ec7e45f59244b63a994b86ae81d7f93
	Binary Classification: Direct marketing                               506153734175476c4f62416c57734963.f-id.11e8958b4571436d8848f2a75b348fdc

Download intermediate data

You can also download intermediate data from an experiment. To do this you need information for four variables:

experiment
node_id
port_name
data_type_id.

To obtain this information, follow these simple steps

Click the output port of the "Convert to CSV" module in an experiment you get: Figure 1
Next, click "Generate Data Access Code...", to get the information in Figure 2
Copy and paste the code from the "R" tab to your R session

Step 1: Click on Convert to CSV module output

Step 2: Click Generate data access code, and "R" tab

Copying the information to your local R session

You can copy and paste the code in the "R" tab to a local R session. When you have evaluate the code, the data gets downloaded from the Azure ML Studio to your local session.



In [13]:

    
# download intermediate data

# Replace the code below with the snipped provided by the AzureML Studio
#exp_data <- download.intermediate.dataset(ws = ws, 
#            experiment  = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
#            node_id = "xxxxxxxx-xxx-xxx-xxxx-xxxxxxxxxxxx-xxx",
#            port_name = "Results dataset",
#            data_type_id = "GenericCSV")

#head(exp_data)

A concise way of consuming web service

The AzureMLpackage also allows a very concise way of consuming the web service. All you need is to provide the web service ID and the workspace information. Then use consume() to consume the service from any R terminal (as long as you have internet access).

For illustration purpose, fit a linear model and deploy a web service based on the model.

If you encounter the error Requires external zip utility. Please install zip, ensure it's on your path and try again while running this on Windows, you can install RTools and add the install directory to the system path. For example, if it's installed in C:\Tools, you should add C:\Tools\bin to your system path and then restart R.

Publishing the web service



In [14]:

    
# load the package
library(MASS)

# fit a model using all variables except medv as predictors
lm1 <- lm(medv ~ ., data = Boston)

# define predict function
mypredict <- function(newdata){
  predict(lm1, newdata)
}

# test the prediction function
newdata <- Boston[1:10, ]

# Publish the service
ep <- publishWebService(ws = ws, 
                        fun = mypredict, 
                        name = "HousePricePrediction", 
                        inputSchema = newdata)
str(ep)









    



Warning message:
: package 'MASS' was built under R version 3.2.3





    



Classes 'Endpoint' and 'data.frame':	1 obs. of  14 variables:
 $ Name                 : chr "default"
 $ Description          : chr ""
 $ CreationTime         : chr "2016-03-13T18:26:32.567Z"
 $ WorkspaceId          : chr "a2aba0dafad8436788401bbc8c22fe36"
 $ WebServiceId         : chr "126375f0e94911e58e50518e3763ef03"
 $ HelpLocation         : chr "https://studio.azureml-int.net/apihelp/workspaces/a2aba0dafad8436788401bbc8c22fe36/webservices/126375f0e94911e58e50518e3763ef03"| __truncated__
 $ PrimaryKey           : chr "rhEGPgds8G+Hptp8OC0sYFH7CqyyQx8j3nG1Tj6kocWKBaoa52rr4zJ+zAwj7DeCwMiAiLydGZXfPlcIF4uv2A=="
 $ SecondaryKey         : chr "lth3osRs3BjpoWrEn7urQ5F0Lxd4qnOl8TdUHw58PuBWImpC+DiZm1Ua9LoFS+u9ZsoyVyefM6dlO/BFcRMTBQ=="
 $ ApiLocation          : chr "https://ussouthcentral.services.azureml-int.net/workspaces/a2aba0dafad8436788401bbc8c22fe36/services/a150c8f086974477a6d9d63e12"| __truncated__
 $ PreventUpdate        : logi FALSE
 $ GlobalParameters     :List of 1
  ..$ : list()
 $ MaxConcurrentCalls   : int 4
 $ DiagnosticsTraceLevel: chr "None"
 $ ThrottleLevel        : chr "Low"

Consuming the web service

Now you are ready to consume the web service.



In [15]:

    
# consume
consume(ep, newdata)









    



Request failed with status 401. Waiting 1.7 seconds before retry
..





    Out[15]:





ans

	1 30.00384
	2 25.02556
	3 30.5676
	4 28.60704
	5 27.94352
	6 25.25628
	7 23.00181
	8 19.53599
	9 11.52364
	10 18.92026

Additional resources

The AzureML package also has a vignette Getting Started with the AzureML Package that covers a wider range of examples.

You can also take a look at the help for ?publishWebservice

	UserId	MovieId	Rating	Timestamp
1	1	68646	10	1381620027
2	1	113277	10	1379466669
3	2	454876	8	1394818630
4	2	790636	7	1389963947
5	2	816711	8	1379963769
6	2	1091191	7	1391173869

	Ozone	Solar.R	Wind	Temp	Month	Day
1	41	190	7.4	67	5	1
2	36	118	8	72	5	2
3	12	149	12.6	74	5	3
4	18	313	11.5	62	5	4
5	NA	NA	14.3	56	5	5
6	28	NA	14.9	66	5	6
7	23	299	8.6	65	5	7
8	19	99	13.8	59	5	8
9	8	19	20.1	61	5	9
10	NA	194	8.6	69	5	10

	Description	ExperimentId	Creator
1	Simple convert to TSV	a2aba0dafad8436788401bbc8c22fe36.f-id.2237184cf35e42c4a41c0307f623a3f3	adevries
2	Experiment created on â€Ž2â€Ž/â€Ž4â€Ž/â€Ž2016	a2aba0dafad8436788401bbc8c22fe36.f-id.26730c0bfd0948afb8eb8b405bd264ba	adevries
3	Experiment created on â€Ž12â€Ž/â€Ž18â€Ž/â€Ž2015	a2aba0dafad8436788401bbc8c22fe36.f-id.3292fcc8bcd14c29b75242f384190a41	adevries
4	Experiment created on â€Ž2â€Ž/â€Ž9â€Ž/â€Ž2016	a2aba0dafad8436788401bbc8c22fe36.f-id.63199eaa2e8e4da1beb4105b79b74988	adevries
5	Experiment created on â€Ž2â€Ž/â€Ž23â€Ž/â€Ž2016	a2aba0dafad8436788401bbc8c22fe36.f-id.9c98fd354ff2471bb6d3d5798ce35d16	adevries
6	Simple experiment to test Jupyter	a2aba0dafad8436788401bbc8c22fe36.f-id.a0fbd6ccad204e65825b5b7e574c9be8	adevries

Binary Classification: Network intrusion detection	506153734175476c4f62416c57734963.f-id.02556bd24faf4f099dc88c1d21262f36
Text Classification: Step 2 of 5, text preprocessing	506153734175476c4f62416c57734963.f-id.07811aadbdca4669a2d129abb083d2f0
Learning with Counts: Binary Classification	506153734175476c4f62416c57734963.f-id.08b6ef9574de46e7a9fdc74dc5221fb2
Clustering: Color quantization	506153734175476c4f62416c57734963.f-id.0a4458138dc140329a934393d7bbf2ed
Retail Forecasting: Step 4 of 6, train regression models	506153734175476c4f62416c57734963.f-id.0ec7e45f59244b63a994b86ae81d7f93
Binary Classification: Direct marketing	506153734175476c4f62416c57734963.f-id.11e8958b4571436d8848f2a75b348fdc

	ans
1	30.00384
2	25.02556
3	30.5676
4	28.60704
5	27.94352
6	25.25628
7	23.00181
8	19.53599
9	11.52364
10	18.92026