Azure Setup

Setting up Cloudbreak on Azure is different from other cloud providers, for which we provide pre-built public images with Cloudbreak Deployer pre-installed. On Azure, you launch Cloudbreak Deployer using the Azure Resource Manager templates.

Deploy Using the Azure Portal

To get started with the Cloudbreak installation using the Azure Resource Manager template, click here: deploy on azure

VM Requirements

When selecting an instance type, consider these minimum and recommended requirements:
- 8GB RAM, 10GB disk, 2 cores - The minimum instance type suitable for Cloudbreak is D2

To learn about all requirements, see System Requirements.

Deployment Details

In addition to the default values, the following parameters are mandatory for the new cbd template:

On the Custom deployment panel:

On the Parameters panel:

Finally, you should review the Legal terms on the Custom deployment panel:

Deployment takes about 15-20 minutes. You can track the progress on the resource group details page. If any issue occurs, open the Audit logs under Settings. We have seen an interesting behaviour on the Azure Portal: every operation of the template deployment can report success, yet the overall deployment is marked as failed.

Under the Hood

While Azure is creating the deployment, review this information about what happens in the background:

Cloudbreak Deployer Highlights

Validate That Cloudbreak Deployer Has Started and the Profile Has the Public IP Properly Configured

  sudo su

This is a must on Azure, because the CustomScript extension that creates everything runs as root, and this is not modifiable.

  cd /var/lib/cloudbreak-deployment
  cbd doctor

If you need to run cbd update, refer to Cloudbreak Deployer Update. Most of the cbd commands require root permissions.

   cbd logs cloudbreak

Cloudbreak should start within a minute - you should see a line like this: Started CloudbreakApplication in 36.823 seconds
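If you don't want to watch the whole log stream, a quick filter like the following may help (a sketch; it simply greps the deployer's log output for the startup line above):

   cbd logs cloudbreak | grep "Started CloudbreakApplication"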

Provisioning Prerequisites

We use the new Azure ARM API to launch clusters. For this to work, an Active Directory application has to be created with the configured name and password, and it has to be granted the permissions needed to call the Azure Resource Manager API. Cloudbreak Deployer automates all of this for you.

If you skip these steps, you will not be able to create any resources with Cloudbreak.

Azure Access Setup

If you do not have an Active Directory (AD) user yet, you have to configure one before deploying a cluster with Cloudbreak:

Why do you need this? Read more here.


You receive a temporary password, so you have to change it before you start using the new user.


Azure Application Setup with Azure CLI

In order for Cloudbreak to be able to launch clusters on Azure on your behalf, you need to set up your Azure ARM application. If you do not want to create your ARM application via the Azure Web UI, you can create it with the Azure CLI.

You can find the Azure CLI installation documentation at the following link: Azure CLI

First, you have to log in with the command below:

az login

Then you can set up your Azure application with the following Azure CLI command:

az ad sp create-for-rbac --name cloudbreak-app --password "****" --role Owner

Response:

{
  "appId": "********-748c-4018-b445-************",
  "displayName": "cloudbreak-app",
  "name": "http://cloudbreak-app",
  "password": "****",
  "tenant": "********-d98e-4c64-9301-************"
}

Why do you need this? Read more here.

  1. It creates an Active Directory application with the configured name and password
  2. It grants permissions to call the Azure Resource Manager API

Please use the output of this command when creating your Azure credential in Cloudbreak.
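If you prefer to script this step, a sketch like the following captures the values needed for the credential (assuming the jq utility is installed; sp.json is just an example file name):

az ad sp create-for-rbac --name cloudbreak-app --password "****" --role Owner > sp.json
APP_ID=$(jq -r '.appId' sp.json)                      # App ID
TENANT_ID=$(jq -r '.tenant' sp.json)                  # App Owner Tenant ID
SUBSCRIPTION_ID=$(az account show --query id -o tsv)  # Subscription ID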

File System Configuration

When starting a cluster with Cloudbreak on Azure, the default file system is “Local HDFS”.

Cloudbreak supports the Azure Data Lake Store (ADLS) file system; selecting it from the drop-down list automatically configures the required properties in the cluster. ADLS is not supported as the default file system.

Hadoop has built-in support for the Windows Azure Blob Storage (WASB) file system, so it can easily be used as the default file system. To enable this behavior, Use File System As Default must be selected.

Disks and Storage

In Azure, every data disk attached to a virtual machine is stored as a virtual hard disk (VHD) in a page blob inside an Azure storage account. Because these are not local disks and every operation has to go through the VHD files, performance is degraded when they are used for HDFS.

When WASB is used as the Hadoop file system, files are full-value blobs in a storage account. This means better performance compared to data disks, and the WASB file system is very easy to configure, but Azure storage accounts have their own limitations. There is a space limit per storage account (500 TB), but the real bottleneck is the total request rate of only 20,000 IOPS, above which Azure starts to throw errors on I/O operations. For example, since a standard data disk is capped at roughly 500 IOPS, around 40 busy disks are enough to saturate a single account.

To bypass these limits, Azure Data Lake Store (ADLS) can be used. ADLS is an Apache Hadoop file system compatible with the Hadoop Distributed File System (HDFS) and works with the Hadoop ecosystem. To be able to use it, an ADLS account must be created in your Azure subscription. For more information on ADLS, refer to Overview of Azure Data Lake Store.

Containers Within the Storage Account

Cloudbreak creates a new container in the configured storage account for each cluster, with the name pattern cloudbreak-UNIQUE_ID. Re-using existing containers in the same account is not supported, as dirty data can lead to failing cluster installations. To take advantage of the WASB file system, your data does not have to be in the same storage account, nor in the same container. You can add as many accounts as you wish through Ambari by setting the properties described here. Once you have added the appropriate properties, you can use those storage accounts with their pre-existing data, for example:

hadoop fs -ls wasb://data@youraccount.blob.core.windows.net/terasort-input/
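If you have not yet added the account key through Ambari, you can also pass it ad hoc as a Hadoop generic option (a sketch; the property name follows the fs.azure.account.key.<account>.blob.core.windows.net pattern, and the key value is a placeholder):

hadoop fs -D fs.azure.account.key.youraccount.blob.core.windows.net=YOUR_ACCESS_KEY -ls wasb://data@youraccount.blob.core.windows.net/terasort-input/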

IMPORTANT: Make sure that your cloud account can launch instances using the new Azure ARM (a.k.a. V2) API and that you have sufficient quota (CPU, network, etc.) for the requested cluster size.

Generate a New SSH Key

All the instances created by Cloudbreak are configured to allow key-based SSH, so you'll need to provide an SSH public key that can be used later to SSH onto the instances in the clusters you'll create with Cloudbreak. You can use one of your existing keys or you can generate a new one.

To generate a new SSH keypair:

ssh-keygen -t rsa -b 4096 -C "your_email@example.com"
# Creates a new ssh key, using the provided email as a label
# Generating public/private rsa key pair.
# Enter file in which to save the key (/Users/you/.ssh/id_rsa): [Press enter]
You'll be asked to enter a passphrase, but you can leave it empty.

# Enter passphrase (empty for no passphrase): [Type a passphrase]
# Enter same passphrase again: [Type passphrase again]

After you enter a passphrase, the key pair is generated. The output should look something like this:

# Your identification has been saved in /Users/you/.ssh/id_rsa.
# Your public key has been saved in /Users/you/.ssh/id_rsa.pub.
# The key fingerprint is:
# 01:0f:f4:3b:ca:85:sd:17:sd:7d:sd:68:9d:sd:a2:sd your_email@example.com

Later you'll need to pass the .pub file's contents to Cloudbreak and use the private part to SSH to the instances.
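To copy the public part, just print it and paste the contents wherever Cloudbreak asks for the SSH public key (assuming the default key location used above):

cat ~/.ssh/id_rsa.pub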

Provisioning via Browser

You can log into the Cloudbreak application at https://<Public_IP>/.

The main goal of the Cloudbreak UI is to make it easy to create clusters on your own cloud provider account. This description details the AZURE setup - if you'd like to use a different cloud provider, check out its manual.

This document explains the four steps that need to be followed to create Cloudbreak clusters from the UI:

Setting up Azure Credentials

Cloudbreak works by connecting to your AZURE account through so-called Credentials, and then uses these credentials to create resources on your behalf. The credentials can be configured on the manage credentials panel of the Cloudbreak Dashboard.

Please read the Provisioning prerequisites, where you can find the steps for getting the mandatory Subscription ID, App ID, Password and App Owner Tenant ID for your Cloudbreak credential.

To create a new AZURE credential you have two options:

Interactive login

Before you can use interactive login, you have to provide your tenant ID and subscription ID in your Profile.
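The Profile is a plain shell file in the deployer's working directory, so the two values go in as exports. A sketch; the variable names are an assumption, so verify them against your deployer version:

export AZURE_SUBSCRIPTION_ID=your-azure-subscription-id
export AZURE_TENANT_ID=your-azure-application-tenant-id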

Steps

  1. Fill out the new credential Name

    • Only lowercase alphanumeric characters (min 5, max 100 characters) can be used
  2. Copy your SSH public key to the SSH public key field

    • The SSH public key must be in OpenSSH format, and its private key pair can be used later to SSH onto every instance of every cluster you'll create with this credential.
    • The SSH username for the AZURE instances is cloudbreak.
  3. Click next. You will then see a device code on the screen. Click the 'Azure login' button and enter the code on the Azure portal.

  4. Select the account you wish to use

    • This account must be in the same subscription and tenant that you previously provided in your Profile, otherwise the credential creation will fail.

  5. After that you should see a progress bar like the one in the image below

App based



Any other parameter is optional here.

Public in account means that all the users belonging to your account will be able to use this credential to create clusters, but cannot delete it.

Since version 1.0.4, Cloudbreak supports a simple RSA public key instead of an X509 certificate file.


Infrastructure Templates

After your AZURE account is linked to Cloudbreak you can start creating resource templates that describe your clusters' infrastructure:

When you create one of the above resources, Cloudbreak does not make any requests to AZURE. Resources are only created on AZURE after the create cluster button has been pushed. These templates are saved to Cloudbreak's database and can be reused with multiple clusters to describe the infrastructure.

Templates

Templates describe the instances of your cluster - the instance type and the attached volumes. A typical setup is to combine multiple templates in a cluster for the different types of nodes. For example you may want to attach multiple large disks to the datanodes or have memory optimized instances for Spark nodes.

The instance templates can be configured on the manage templates panel on the Cloudbreak Dashboard.

The Volume Type describes the Storage Account type that will be used for the attached disks. The only constraint is that Premium storage can only be used with DS instance types. For more details about Premium storage, read this.

If Public in account is checked, all the users belonging to your account will be able to use this resource to create clusters, but cannot delete it.

Networks

Your clusters can be created in their own networks or in one of your already existing ones. The subnet's IP range must be defined in the Subnet (CIDR) field using the general CIDR notation; for example, 10.0.0.0/24 provides 256 addresses.

Default AZURE Network

If you don't want to create or use your custom network, you can use the default-azure-network for all your Cloudbreak clusters. It will create a new network with a 10.0.0.0/16 subnet every time a cluster is created.

Custom AZURE Network

If you'd like to deploy a cluster to a custom network you'll have to create a new network template on the manage networks panel.

You have the following options:

IMPORTANT: In case of an existing subnet, make sure you have enough room within your network space for the new instances. The provided subnet CIDR will be ignored and the existing subnet's CIDR range will be used. The security group behavior changes in this case as well, as described in the security group section below.

If Public in account is checked, all the users belonging to your account will be able to use this network template to create clusters, but cannot delete it.

NOTE: The new networks are created on AZURE only after the cluster provisioning starts with the selected network template.


Security groups

Security group templates are very similar to security groups on Azure. They describe the allowed inbound traffic to the instances in the cluster. Currently only one security group template can be selected for a Cloudbreak cluster, and all the instances have a public IP address, so all the instances in the cluster will belong to the same security group. This may change in a later release.

Default Security Group

You can also use the two pre-defined security groups in Cloudbreak.

only-ssh-and-ssl: all ports are locked down except for SSH and the selected Ambari Server HTTPS port (you can't access Hadoop services outside of the virtual network):

Custom Security Group

You can define your own security group by adding all the ports, protocols and CIDR ranges you'd like to use. The rules defined here don't need to contain the internal rules; those are automatically added by Cloudbreak to the security group on Azure.

Hadoop services:

- Ambari (8080)
- Consul (8500)
- NN (50070)
- RM Web (8088)
- RM Scheduler (8030)
- RM IPC (8050)
- Job history server (19888)
- HBase master (60000)
- HBase master web (60010)
- HBase RS (16020)
- HBase RS info (60030)
- Falcon (15000)
- Storm (8744)
- Hive metastore (9083)
- Hive server (10000)
- Hive server HTTP (10001)
- Accumulo master (9999)
- Accumulo Tserver (9997)
- Atlas (21000)
- KNOX (8443)
- Oozie (11000)
- Spark HS (18080)
- NM Web (8042)
- Zeppelin WebSocket (9996)
- Zeppelin UI (9995)
- Kibana (3080)
- Elasticsearch (9200)

IMPORTANT: Ports 443, 9443 and 22 need to be present in every security group, otherwise Cloudbreak won't be able to communicate with the provisioned cluster.

If Public in account is checked, all the users belonging to your account will be able to use this security group template to create clusters, but cannot delete it.

NOTE: The security groups are created on Azure only after the cluster provisioning starts with the selected security group template.

IMPORTANT: If you use an existing virtual network and subnet, the selected security group will only be applied to the Ambari Server node, due to the lack of capability to attach multiple security groups to an existing subnet. If you'd like to open ports for Hadoop, you must do it on your existing security group.


Defining Cluster Services

Blueprints

Blueprints are your declarative definition of a Hadoop cluster. These are the same blueprints that are used by Ambari.

You can use the 3 default blueprints pre-defined in Cloudbreak or create your own. Blueprints can be added from a file, from a URL (an example blueprint), or the whole JSON can be written in the JSON text box.

The host groups in the JSON will be mapped to a set of instances when starting the cluster. Besides this, the services and components will also be installed on the corresponding nodes. Blueprints can be modified later from the Ambari UI.

NOTE: It is not necessary to define all the configuration in the blueprint. If a configuration is missing, Ambari will fill that with a default value.
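To illustrate the structure, here is a minimal, hypothetical blueprint sketch saved to a file for upload (the component names are standard Ambari components, but a production blueprint needs the full set of services and configurations):

cat > my-minimal-blueprint.json <<'EOF'
{
  "Blueprints": {
    "blueprint_name": "my-minimal-blueprint",
    "stack_name": "HDP",
    "stack_version": "2.3"
  },
  "host_groups": [
    {
      "name": "master",
      "cardinality": "1",
      "components": [ { "name": "NAMENODE" }, { "name": "RESOURCEMANAGER" } ]
    },
    {
      "name": "slave_1",
      "cardinality": "1+",
      "components": [ { "name": "DATANODE" }, { "name": "NODEMANAGER" } ]
    }
  ]
}
EOF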

If Public in account is checked, all the users belonging to your account will be able to use this blueprint to create clusters, but cannot delete or modify it.


A blueprint can be exported from a running Ambari cluster and reused in Cloudbreak with slight modifications. There is no automatic way to modify an exported blueprint and make it instantly usable in Cloudbreak; the modifications have to be done manually. When the blueprint is exported, some configurations are hardcoded (for example domain names and memory configurations) that won't be applicable to the Cloudbreak cluster.

Cluster Deployment

After all the cluster resources are configured you can deploy a new HDP cluster.

Here is a basic flow for cluster creation on Cloudbreak Web UI:

Configure Cluster tab

Setup Network and Security tab

Choose Blueprint tab

Add File System tab

Review and Launch tab

Cloudbreak uses Azure Resource Manager to create the resources - you can check out the resources created by Cloudbreak on the Azure Portal Resource groups page.

Besides these, you can check the progress on the Cloudbreak Web UI itself if you open the new cluster's Event History.

Advanced options

There are some advanced features available when deploying a new cluster; these are the following:

Ambari Username This user will be used as the admin user in Ambari. You can log in using this username on the Ambari UI.

Ambari Password The password associated with the Ambari username. This password will also be the default password for all required passwords that are not specified in the blueprint, e.g. the Hive DB password.

Minimum cluster size The provisioning strategy in case the cloud provider cannot allocate all the requested nodes.

Validate blueprint This is selected by default. Cloudbreak validates the Ambari blueprint in this case.

Custom Image If you enable this, you can override the default image used for provisioning.

Shipyard enabled cluster This is selected by default. Cloudbreak will start a Shipyard container which helps you to manage your containers.

Persistent Storage Name This is cbstore by default. Cloudbreak copies the VM image into a storage account that is not deleted on termination. When you start a new cluster, provisioning will be much faster because the image already exists.

Attached Storage Type This is a single storage account for all VMs by default. If you use the default option, your whole cluster resides in one storage account, which can be a bottleneck on Azure. If you use separate storage for every VM, Cloudbreak deploys as many storage accounts as you have nodes, so the IOPS limit applies to a single node only.

Config recommendation strategy Strategy for how configuration recommendations will be applied. Recommended configurations are gathered from the stack advisor's response.

Cluster Termination

You can terminate running or stopped clusters with the terminate button in the cluster details.

IMPORTANT: Always use Cloudbreak to terminate the cluster. If that fails for some reason, try to delete the Azure resource group first. Instances are started in an Auto Scaling Group so they may be restarted if you terminate an instance manually!

Sometimes Cloudbreak cannot synchronize its state with the cluster state at the cloud provider and the cluster can't be terminated. In this case, the Forced termination option can help terminate the cluster on the Cloudbreak side. If this happens:

  1. Check the related resources on the Azure Portal
  2. If needed, manually remove the resources from there
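If manual cleanup is needed, the Azure CLI shown earlier can remove a whole leftover resource group in one step (a sketch; substitute the resource group name displayed on the Azure Portal):

az group delete --name your-leftover-resource-group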


Interactive mode / Cloudbreak Shell

The goal of the Cloudbreak Shell (Cloudbreak CLI) is to provide an interactive command line tool which supports:

Start Cloudbreak Shell

To start the Cloudbreak CLI use the following commands:

   cd cloudbreak-deployment
   cbd start
   cbd util cloudbreak-shell

The very first time this will take a while, because all the necessary Docker images need to be downloaded.

This launches the Cloudbreak shell inside a Docker container, and then it is ready to use.

IMPORTANT: You have to copy all the files you would like to use in the shell into the cbd working directory. For example, if your cbd working directory is ~/cloudbreak-deployment, then copy your blueprint JSON, public SSH key file, etc. there. You can refer to these files by their names from the shell.

Autocomplete and Hints

Cloudbreak Shell helps you with hint messages from the very beginning, for example:

cloudbreak-shell>hint
Hint: Add a blueprint with the 'blueprint create' command or select an existing one with 'blueprint select'
cloudbreak-shell>

Beyond this you can use the autocompletion (double-TAB) as well:

cloudbreak-shell>credential create --
credential create --AWS          credential create --AZURE        credential create --EC2          credential create --GCP          credential create --OPENSTACK

Provisioning via CLI

Setting up Azure Credential

Cloudbreak works by connecting to your Azure account through so-called Credentials, and then uses these credentials to create resources on your behalf. Credentials can be configured with the following command, for example:

credential create --AZURE --name my-azure-credential --description "sample credential" --subscriptionId your-azure-subscription-id --tenantId your-azure-application-tenant-id --appId your-azure-application-id --password YourApplicationPassword --sshKeyString "ssh-rsa AAAAB3***etc."

Since version 1.0.4, Cloudbreak supports a simple RSA public key instead of an X509 certificate file.

NOTE: Cloudbreak does not set your cloud user details - we work around the concept of the Access Control Service (ACS). You should already have a valid Azure subscription and application. You can find further details here.

Alternative ways to provide the SSH key:

You can check whether the credential was created successfully

credential list

You can switch between your existing credentials

credential select --name my-azure-credential

Infrastructure Templates

After your Azure account is linked to Cloudbreak you can start creating resource templates that describe your clusters' infrastructure:

When you create one of the above resources, Cloudbreak does not make any requests to Azure. Resources are only created on Azure after the cluster create command has been issued. These templates are saved to Cloudbreak's database and can be reused with multiple clusters to describe the infrastructure.

Templates

Templates describe the instances of your cluster - the instance type and the attached volumes. A typical setup is to combine multiple templates in a cluster for the different types of nodes. For example you may want to attach multiple large disks to the datanodes or have memory optimized instances for Spark nodes.

A template can be used repeatedly to create identical copies of the same stack (or to use as a foundation to start a new stack). Templates can be configured with the following command for example:

template create --AZURE --name my-azure-template --description "sample description" --instanceType Standard_D4 --volumeSize 100 --volumeCount 2 --volumeType Standard_LRS

The Volume Type describes the Storage Account type that will be used for the attached disks. The only constraint is that Premium storage can only be used with DS instance types. For more details about Premium storage, read this.

Another available option here is --publicInAccount. If it is true, all the users belonging to your account will be able to use this template to create clusters, but cannot delete it.

You can check whether the template was created successfully

template list

Networks

Your clusters can be created in their own networks or in one of your already existing ones. If you choose an existing network, it is possible to create a new subnet within that network. The subnet's IP range must be defined in the Subnet (CIDR) field using the general CIDR notation.

Default AZURE Network

If you don't want to create or use your custom network, you can use the default-azure-network for all your Cloudbreak clusters. It will create a new network with a 10.0.0.0/16 subnet and a 10.0.0.0/8 address prefix every time a cluster is created.

Custom AZURE Network

If you'd like to deploy a cluster to a custom network you'll have to apply the following command:

network create --AZURE --name my-azure-network --addressPrefix 192.168.123.123 --subnet 10.0.0.0/16

IMPORTANT: Make sure that the subnet and address prefixes defined here don't overlap with any of your already deployed subnets and their address prefixes in the network, because the validation only happens after the cluster creation starts.

In case of an existing subnet, make sure you have enough room within your network space for the new instances. The provided subnet CIDR will be ignored and a proper CIDR range will be used.

You can check whether the network was created successfully

network list

--addressPrefix This list will be appended to the current list of address prefixes.

You can find more details about the AZURE Address Prefixes here.

If --publicInAccount is true, all the users belonging to your account will be able to use this network template to create clusters, but cannot delete it.

NOTE: The new networks are created on AZURE only after the cluster provisioning starts with the selected network template.

Defining Cluster Services

Blueprints

Blueprints are your declarative definition of a Hadoop cluster. These are the same blueprints that are used by Ambari.

You can use the 3 default blueprints pre-defined in Cloudbreak or create your own. Blueprints can be added from a file or a URL (an example blueprint).

The host groups in the JSON will be mapped to a set of instances when starting the cluster. Besides this, the services and components will also be installed on the corresponding nodes. Blueprints can be modified later from the Ambari UI.

NOTE: It is not necessary to define all the configuration in the blueprint. If a configuration is missing, Ambari will fill that with a default value.

blueprint create --name my-blueprint --description "sample description" --file <the path of the blueprint>

Other available options:

--url the url of the blueprint

--publicInAccount If it is true, all the users belonging to your account will be able to use this blueprint to create clusters, but cannot delete it.

You can check whether the blueprint was created successfully

blueprint list

A blueprint can be exported from a running Ambari cluster and reused in Cloudbreak with slight modifications. There is no automatic way to modify an exported blueprint and make it instantly usable in Cloudbreak; the modifications have to be done manually. When the blueprint is exported, some configurations are hardcoded (for example domain names and memory configurations) that won't be applicable to the Cloudbreak cluster.

Metadata Show

You can check the stack metadata with

stack metadata --name myawsstack --instancegroup master

Other available options:

--id In this case you can select a stack by its id.

--outputType In this case you can modify the output format of the command (RAW or JSON).

Cluster Deployment

After all the cluster resources are configured you can deploy a new HDP cluster. The following sub-sections show you a basic flow for cluster creation with Cloudbreak Shell.

Select Credential

Select one of your previously created Azure credentials:

credential select --name my-azure-credential

Select Blueprint

Select one of your previously created blueprints that fits your needs:

blueprint select --name multi-node-hdfs-yarn

Configure Instance Groups

You must configure instance groups before provisioning. An instance group defines a group of nodes with a specified template. Usually we create an instance group for each host group in the blueprint. Only 1 host group can be specified for the Ambari server. If you want to install the Ambari server on a separate node, you need to extend your blueprint with a new host group that contains only 1 service, HDFS_CLIENT, and select this host group for the Ambari server. Note: this host group cannot be scaled, so it is not advised to select a 'slave' host group for this purpose.

instancegroup configure --instanceGroup master --nodecount 1 --templateName minviable-aws --securityGroupName all-services-port --ambariServer true
instancegroup configure --instanceGroup slave_1 --nodecount 1 --templateName minviable-aws --securityGroupName all-services-port --ambariServer false

Another available option:

--templateId Id of the template

Select Network

Select one of your previously created networks that fits your needs, or a default one:

network select --name default-azure-network

Create Stack / Create Cloud Infrastructure

A stack is the running cloud infrastructure that is created based on the resources configured earlier (credential, instance groups, network, security group). As with the API or UI, the new cluster will use your templates and launch your cloud stack using Azure ARM. Use the following command to create a stack to be used with your Hadoop cluster:

stack create --AZURE --name myazurestack --region "North Europe"

The infrastructure is created asynchronously; the state of the stack can be checked with the stack show command. If it reports AVAILABLE, the virtual machines and the corresponding infrastructure are running at the cloud provider.
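For example (a sketch, assuming stack show accepts the same --name selector as the other stack commands):

stack show --name myazurestack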

Other available options:

--wait - in this case the create command will return only after the process has finished.

--persistentStorage - This is cbstore by default. Cloudbreak copies the VM image into a storage account that is not deleted on termination. When you start a new cluster, provisioning will be much faster because the image already exists.

--attachedStorageType - This is SINGLE by default. If you use the default option, your whole cluster resides in one storage account, which can be a bottleneck on Azure. If you use PER_VM, Cloudbreak deploys as many storage accounts as you have nodes, so the IOPS limit applies to a single node only.

Create a Hadoop Cluster / Cloud Provisioning

You are almost done! One more command and your Hadoop cluster is starting! Cloud provisioning is done once the cluster is up and running. The new cluster will use your selected blueprint and install your custom Hadoop cluster with the selected components and services.

cluster create --description "my first cluster"

Another available option is --wait - in this case the create command will return only after the process has finished.

You are done! You have several opportunities to check the progress during infrastructure creation and then provisioning:


   cluster show



Stop Cluster

You can stop your existing cluster and then its stack if you want to suspend work on it.

Select a stack for example with its name:

stack select --name my-stack

Another available option to define a stack is its --id.

Always stop the cluster first, then the stack. Apply the following commands to stop the previously selected stack:

cluster stop
stack stop

Restart Cluster

Select the stack that you would like to restart; after this you can apply:

stack start

After the stack has successfully restarted, you can restart the related cluster as well:

cluster start

Upscale Cluster

If you need more instances in your infrastructure, you can upscale your selected stack:

stack node --ADD --instanceGroup host_group_slave_1 --adjustment 6

Another available option is --withClusterUpScale - this also triggers a cluster upscale after the stack upscale. You can also upscale the related cluster separately if you want to:

cluster node --ADD --hostgroup host_group_slave_1 --adjustment 6

Downscale Cluster

You can also reduce the number of instances in your infrastructure. After you have selected your stack:

cluster node --REMOVE  --hostgroup host_group_slave_1 --adjustment -2

Another available option is --withStackDownScale - this also triggers a stack downscale after the cluster downscale. You can also downscale the related stack separately if you want to:

stack node --REMOVE  --instanceGroup host_group_slave_1 --adjustment -2

Cluster Termination

You can terminate running or stopped clusters with

stack delete --name myawsstack

Another available option is --wait - in this case the terminate command will return only after the process has finished.

IMPORTANT: Always use Cloudbreak to terminate the cluster. If that fails for some reason, try to delete the Azure resource group first. Instances are started in an Auto Scaling Group so they may be restarted if you terminate an instance manually!

Sometimes Cloudbreak cannot synchronize its state with the cluster state at the cloud provider and the cluster can't be terminated. In this case, the Forced termination option on the Cloudbreak Web UI can help terminate the cluster on the Cloudbreak side. If this happens:

  1. Check the related resources on the Azure Portal
  2. If needed, manually remove the resources from there

Silent Mode

With Cloudbreak Shell you can execute script files as well. A script file contains shell commands and can be executed with the script Cloudbreak shell command:

script <your script file>

or with the cbd util cloudbreak-shell-quiet command

cbd util cloudbreak-shell-quiet < example.sh

IMPORTANT: You have to copy all the files you would like to use in the shell into the cbd working directory. For example, if your cbd working directory is ~/cloudbreak-deployment, then copy your script file there.
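For instance, a trivial script file might look like this (hypothetical contents; a full end-to-end script follows in the Example section below):

credential select --name myazurecredential
blueprint list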

Example

The following example creates a Hadoop cluster with the hdp-small-default blueprint on Standard_D3 instances with 2x100G attached disks, on the default-azure-network network, using the all-services-port security group. You should copy your SSH public key file into your cbd working directory with the name id_rsa.pub, and replace the parts highlighted with <...> with your own Azure credentials.

credential create --AZURE --description "credential description" --name myazurecredential --subscriptionId <your Azure subscription id> --appId <your Azure application id> --tenantId <your tenant id> --password <your Azure application password> --sshKeyPath id_rsa.pub
credential select --name myazurecredential
template create --AZURE --name azuretemplate --description azure-template --instanceType Standard_D3 --volumeSize 100 --volumeCount 2
blueprint select --name hdp-small-default
instancegroup configure --instanceGroup host_group_master_1 --nodecount 1 --templateName azuretemplate --securityGroupName all-services-port --ambariServer true
instancegroup configure --instanceGroup host_group_master_2 --nodecount 1 --templateName azuretemplate --securityGroupName all-services-port --ambariServer false
instancegroup configure --instanceGroup host_group_master_3 --nodecount 1 --templateName azuretemplate --securityGroupName all-services-port --ambariServer false
instancegroup configure --instanceGroup host_group_client_1  --nodecount 1 --templateName azuretemplate --securityGroupName all-services-port --ambariServer false
instancegroup configure --instanceGroup host_group_slave_1 --nodecount 3 --templateName azuretemplate --securityGroupName all-services-port --ambariServer false
network select --name default-azure-network
stack create --AZURE --name my-first-stack --region "West US" --wait true
cluster create --description "My first cluster" --wait true

Congratulations! Your cluster should now be up and running this way as well. To learn more about Cloudbreak and provisioning, we have some interesting insights for you.
