Name

Data preparation by deleting a cluster in Cloud Dataproc

Label

Cloud Dataproc, cluster, GCP, Cloud Storage, Kubeflow, Pipeline

Summary

A Kubeflow Pipeline component to delete a cluster in Cloud Dataproc.

Intended use

Use this component at the end of a Kubeflow Pipeline to delete a temporary Cloud Dataproc cluster that was used to run Cloud Dataproc jobs as steps in the pipeline. It is usually attached to an exit handler so that it runs at the end of the pipeline.
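
A minimal sketch of the exit-handler pattern mentioned above, assuming the KFP SDK's dsl.ExitHandler context manager. The names make_pipeline_with_cleanup and job_op are hypothetical and introduced only for illustration; delete_cluster_op stands for the component factory that is loaded in the steps further down, and job_op is assumed to take no arguments.

```python
def make_pipeline_with_cleanup(delete_cluster_op, job_op):
    """Return a pipeline function whose exit handler deletes the cluster.

    Both arguments are component factories. job_op is a hypothetical
    placeholder for any step that uses the cluster.
    """
    # Imported here so the sketch can be defined before the SDK is installed.
    import kfp.dsl as dsl

    @dsl.pipeline(
        name='Dataproc job with cleanup',
        description='Runs a job, then deletes the cluster'
    )
    def pipeline(project_id='', region='us-central1', name=''):
        # Create the exit task first, then wrap the main steps with it.
        delete_task = delete_cluster_op(
            project_id=project_id, region=region, name=name)
        with dsl.ExitHandler(delete_task):
            # Steps inside this block run first. delete_task runs when
            # they exit, whether or not they succeeded.
            job_op()

    return pipeline
```

Creating the delete step before the block and passing it to the handler is what keeps a failed job from leaving the temporary cluster running.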

Runtime arguments

Argument	Description	Optional	Data type	Default
project_id	The Google Cloud Platform (GCP) project ID that the cluster belongs to.	No	GCPProjectID
region	The Cloud Dataproc region in which to handle the request.	No	GCPRegion
name	The name of the cluster to delete.	No	String
wait_interval	The number of seconds to pause between polling the operation.	Yes	Integer	30
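
To illustrate what wait_interval controls, the sketch below polls a long-running operation until it reports completion, pausing between checks. This is not the component's actual implementation. fetch_operation is a hypothetical callable standing in for whatever retrieves the operation status, and the only property assumed of its result is the boolean done field found on Google long-running operations.

```python
import time


def wait_for_operation(fetch_operation, wait_interval=30, sleep=time.sleep):
    """Poll until the operation reports done, pausing between polls.

    fetch_operation is a hypothetical zero-argument callable that returns
    the current operation state as a dict with a boolean 'done' key.
    """
    while True:
        operation = fetch_operation()
        if operation.get('done'):
            return operation
        # Pause for wait_interval seconds before asking again.
        sleep(wait_interval)
```

A larger wait_interval means fewer status requests, at the cost of noticing completion later. The sleep parameter exists only so the loop can be exercised without real delays.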

Cautions & requirements

To use the component, you must:

Set up a GCP project by following this guide.
Ensure that the component can authenticate to GCP. Refer to Authenticating Pipelines to GCP for details.
Grant the Kubeflow user service account the role roles/dataproc.editor on the project.

Detailed description

This component deletes a Dataproc cluster by using the Dataproc delete cluster REST API.
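
For orientation, the delete cluster method of the Dataproc v1 REST API is an HTTP DELETE on a URL built from the same three values the component takes as arguments. The helper below is a hypothetical illustration of that mapping only. It does not send a request or handle authentication, and the component itself may well use a client library rather than building URLs by hand.

```python
from urllib.parse import quote


def delete_cluster_url(project_id, region, name):
    """Build the URL that a DELETE request for this cluster would target."""
    return (
        'https://dataproc.googleapis.com/v1'
        '/projects/{}/regions/{}/clusters/{}'
    ).format(
        # Percent-encode each path segment so unusual characters stay inside it.
        quote(project_id, safe=''),
        quote(region, safe=''),
        quote(name, safe=''),
    )
```

For example, delete_cluster_url('my-project', 'us-central1', 'my-cluster') produces a path ending in /projects/my-project/regions/us-central1/clusters/my-cluster.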

Follow these steps to use the component in a pipeline:

Install the Kubeflow Pipeline SDK:



In [ ]:

    
%%capture --no-stderr

KFP_PACKAGE = 'https://storage.googleapis.com/ml-pipeline/release/0.1.14/kfp.tar.gz'
!pip3 install $KFP_PACKAGE --upgrade

Load the component using the KFP SDK:



In [ ]:

    
import kfp.components as comp

dataproc_delete_cluster_op = comp.load_component_from_url(
    'https://raw.githubusercontent.com/kubeflow/pipelines/01a23ae8672d3b18e88adf3036071496aca3552d/components/gcp/dataproc/delete_cluster/component.yaml')
help(dataproc_delete_cluster_op)

Sample

Note: The following sample code works in an IPython notebook or directly in Python code. See the sample code below to learn how to execute the template.

Prerequisites

Create a Dataproc cluster before running the sample code.

Set sample parameters



In [ ]:

    
PROJECT_ID = '<Please put your project ID here>'
CLUSTER_NAME = '<Please put your existing cluster name here>'

REGION = 'us-central1'
EXPERIMENT_NAME = 'Dataproc - Delete Cluster'

Example pipeline that uses the component



In [ ]:

    
import kfp.dsl as dsl

@dsl.pipeline(
    name='Dataproc delete cluster pipeline',
    description='Dataproc delete cluster pipeline'
)
def dataproc_delete_cluster_pipeline(
    project_id=PROJECT_ID,
    region=REGION,
    name=CLUSTER_NAME
):
    # Single step: delete the named cluster.
    dataproc_delete_cluster_op(
        project_id=project_id,
        region=region,
        name=name)

Compile the pipeline



In [ ]:

    
import kfp.compiler as compiler

pipeline_func = dataproc_delete_cluster_pipeline
pipeline_filename = pipeline_func.__name__ + '.zip'
compiler.Compiler().compile(pipeline_func, pipeline_filename)

Submit the pipeline for execution



In [ ]:

    
import kfp

# Specify pipeline argument values
arguments = {}

# Get or create an experiment
client = kfp.Client()
experiment = client.create_experiment(EXPERIMENT_NAME)

# Submit a pipeline run
run_name = pipeline_func.__name__ + ' run'
run_result = client.run_pipeline(experiment.id, run_name, pipeline_filename, arguments)

License

By deploying or using this software you agree to comply with the AI Hub Terms of Service and the Google APIs Terms of Service. To the extent of a direct conflict of terms, the AI Hub Terms of Service will control.

Content source: kubeflow/kfp-tekton-backend
