Mesos introduction

Mesos support is currently TECHNICAL PREVIEW. It may not be suitable for production use.

The basic concepts of Cloudbreak's Mesos support are the same as in the other cloud provider implementations: HDP clusters are provisioned through Ambari with the help of blueprints, and the Ambari server and agents run in Docker containers. There are, however, some major differences, and to start working with Cloudbreak on Mesos these differences must be understood first.

Differences with the cloud provider implementations

1. The Mesos integration doesn't start new instances and doesn't build a new infrastructure on a cloud provider.

Cloudbreak's normal behavior is to build the infrastructure first, where Hadoop components are deployed later through Ambari. This involves creating or reusing the networking layer (virtual networks and subnets), provisioning new virtual machines in these networks from pre-existing cloud images, and starting the Ambari Docker containers on these nodes. The Mesos integration was designed not to include these steps, because in most cases users already have their own Mesos infrastructure and would like to deploy their cluster there, near their other components. That's why Cloudbreak expects you to "bring your own Mesos infrastructure" and configure access to this Mesos deployment in Cloudbreak first.

2. A Mesos credential on the Cloudbreak UI means configuring access to the Marathon API.

Cloudbreak needs a control system in Mesos through which it can communicate and start Ambari containers. The standard application scheduling framework for services in Mesos is Marathon, so we've chosen it as the solution for Cloudbreak. This means that to be able to communicate with Mesos, Cloudbreak needs a Marathon deployment on the Mesos cluster. When setting up access in Cloudbreak, a Marathon API endpoint must be specified. Basic authentication and TLS on the Marathon API are not yet supported in the tech preview.

3. A Mesos template on the Cloudbreak UI means constraints instead of new resources.

Cloudbreak templates describe the virtual machines in a cluster's hostgroup that will be provisioned through the cloud provider API. Templates can be created on the UI for Mesos and linked to a hostgroup as well, but these templates specify resource constraints that will be requested through the Marathon API instead of resources that will be created. Example:
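For example, a template could request 2 CPU cores, 4 GB of memory and 10 GB of disk for every node of a hostgroup. When the cluster is created, such a template translates into the resource fields of the Marathon app definition that Cloudbreak submits. The fragment below is only an illustration with made-up values; cpus, mem, disk and instances are standard Marathon app fields, with mem and disk measured in MB:

```json
{
  "cpus": 2,
  "mem": 4096,
  "disk": 10240,
  "instances": 3
}
```

Here instances corresponds to the number of nodes requested in the hostgroup that the template is attached to.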

4. Cloudbreak doesn't start a gateway instance.

On the cloud providers a gateway VM is deployed for every new cluster by Cloudbreak. It runs a few containers like the Ambari server but, most importantly, runs an Nginx server. All communication between a Cloudbreak deployment and a cluster deployed by Cloudbreak goes through this Nginx instance, over a two-way TLS channel where the Nginx server is responsible for TLS termination. Communication inside the cluster, such as between the Ambari server and agents, is not encrypted, but all communication from outside is secure. This enables Cloudbreak to be deployed outside of the cluster's private network. The Mesos integration doesn't have a solution like this, so all communication between Cloudbreak and the cluster goes through an unencrypted channel. This is one of the reasons why Cloudbreak should be deployed inside the same private network (or in the same Mesos cluster) where the clusters will be deployed.

Technical Preview Restrictions

1. No out-of-the-box DNS solution like Consul.

In the case of Mesos, Cloudbreak does not provide a custom DNS solution like on other cloud providers, where Consul is used to provide addresses for every node and some services like the Ambari server. In the Mesos tech preview containers are deployed with net=host, and Mesos nodes must be set up manually so that they can resolve each other's hostnames to IP addresses and vice versa with reverse DNS. This is a requirement of Hadoop and is usually accomplished by setting up the /etc/hosts file on each node in the cluster, but it can also be provided by some DNS servers, like Amazon's default DNS server in a virtual network. Example:

    10.0.0.2 node2
    10.0.0.3 node3
    10.0.0.4 node4
    10.0.0.5 node5

2. Cloudbreak must be able to resolve the addresses of the Mesos slaves.

Cloudbreak must be able to communicate with the Ambari server that's deployed in the Mesos cluster to make the API requests needed, for example, to create a cluster. After Cloudbreak instructs Marathon to deploy the Ambari server container somewhere in the Mesos cluster, it asks for the address of the node where the container was deployed and will try to communicate with the Ambari server through the address returned by Marathon. Take for example a Mesos cluster with 5 registered nodes (node1 to node5): if Marathon reports that the Ambari server was deployed on node3, Cloudbreak will try to reach Ambari through the node3 address, so node3 must be resolvable from the machine where Cloudbreak runs.

Because of the lack of a gateway node, communication between Cloudbreak and the clusters is unencrypted, so it is suggested that Cloudbreak be deployed in the same private network. In that case the above scenario is usually not a problem. If Cloudbreak is not in the same network, this can be solved by adding the hostnames with a reachable IP address to the /etc/hosts file of the machine where Cloudbreak is deployed.
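For example, if the node hostnames do not resolve from Cloudbreak's network, entries like the following (with made-up, externally reachable IP addresses) could be added to the /etc/hosts file on the Cloudbreak machine:

```
192.168.10.2 node2
192.168.10.3 node3
192.168.10.4 node4
192.168.10.5 node5
```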

3. Storage management needs to be improved

This is one of the two biggest limitations of the current Mesos integration. There is no specific volume management in the current integration, which means that data is stored inside the Docker containers. This approach has a lot of problems (most notably, data does not survive the loss of a container) that will be solved only in later releases.

4. IP-per-task is not supported yet

The other big limitation of the current integration is the lack of IP-per-task support. Currently containers are deployed with net=host, which means that only one container can be deployed per Mesos host because of possible port collisions, and that is the case even with multiple clusters. IP-per-task means that every task of an app (all the containers) deployed through Marathon gets its own network interface and IP address. This feature is already available in Mesos/Marathon but does not work in combination with Docker containers.

5. Recipes are not supported

Recipes are script extensions to an HDP cluster installation supported by Cloudbreak, but they are not supported with the Mesos integration because of the lack of a Consul deployment, as this feature depends heavily on Consul's HTTP API.

Cloudbreak deployer


Setup Cloudbreak Deployer

First, install the Cloudbreak Deployer manually on a VM inside your Mesos cluster's private network.

If you already have an installed VM, check the Initialize your Profile section below before starting the provisioning.

Open the cloudbreak-deployment directory:

cd cloudbreak-deployment

This is the directory of the configuration files and the supporting binaries for Cloudbreak Deployer.

Initialize your Profile

First initialize cbd by creating a Profile file:

cbd init

It will create a Profile file in the current directory. Open the Profile file and check the PUBLIC_IP variable. This setting is mandatory, because it is used to access the Cloudbreak UI. The cbd tool tries to guess it; if it cannot determine the IP address during initialization, set the appropriate value manually.
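A minimal Profile might look like the sketch below; the IP address is a placeholder and should be replaced with the address on which the Cloudbreak UI will be reachable:

```shell
# Profile - sourced by the cbd tool
# PUBLIC_IP is mandatory: it is used to access the Cloudbreak UI
export PUBLIC_IP=172.16.252.31
```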

Start Cloudbreak Deployer

To start the Cloudbreak application use the following command. This will start all the Docker containers and initialize the application.

cbd start

The very first run will take a while, because all the necessary Docker images have to be downloaded.

The cbd start command includes the cbd generate command, which generates the configuration files needed to start the Cloudbreak containers.

Validate the started Cloudbreak Deployer

After the cbd start command finishes, the following are worth checking:

   cbd doctor

If a cbd update is needed, please check the related documentation: Cloudbreak Deployer Update. Most of the cbd commands require root permissions.

   cbd logs cloudbreak

Cloudbreak should start within a minute - you should see a line like this: `Started CloudbreakApplication in 36.823 seconds`

Provisioning Prerequisites

A working Mesos cluster with Marathon

Provisioning a new Mesos cluster is out of Cloudbreak's scope, so an already working Mesos cluster is needed where Cloudbreak will be able to start HDP clusters. Marathon must also be installed, because Cloudbreak uses its API to schedule Docker containers.

Hostnames must be resolvable inside the Mesos cluster and also by Cloudbreak

Cloudbreak does not deploy a custom DNS solution like on other cloud providers, where Consul is used to provide addresses for every node. Containers are deployed with net=host, and Mesos nodes must be set up manually so that they can resolve each other's hostnames to IP addresses and vice versa with reverse DNS. This is a requirement of Hadoop and is usually accomplished by setting up the /etc/hosts file on each node in the cluster, but it can also be provided by some DNS servers, like Amazon's default DNS server in a virtual network.

Example:

    10.0.0.2 node2
    10.0.0.3 node3
    10.0.0.4 node4
    10.0.0.5 node5

Docker must be installed on Mesos slave nodes and Docker containerizer must be enabled

To be able to use the Docker containerizer, Docker must be installed on all the Mesos slave nodes. To install Docker, follow the instructions in the official Docker documentation.

After Docker is installed, it can be configured for the Mesos slaves by adding the Docker containerizer to each Mesos slave's configuration. To configure it, add docker,mesos to the file /etc/mesos-slave/containerizers on each of the slave nodes (or start mesos-slave with the --containerizers=mesos,docker flag, or set the environment variable MESOS_CONTAINERIZERS="mesos,docker"). You may also want to increase the executor timeout by writing 10mins to /etc/mesos-slave/executor_registration_timeout, which allows time for pulling large Docker images.
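The two settings above can be written with a couple of shell commands. The sketch below writes into a scratch directory so it can be run anywhere; on a real Mesos slave, replace CONF_DIR with /etc/mesos-slave, run as root, and restart the mesos-slave service afterwards:

```shell
# Demo directory; use /etc/mesos-slave on a real slave node (as root)
CONF_DIR="$(mktemp -d)"

# Enable the Docker containerizer alongside the default Mesos one
echo 'docker,mesos' > "$CONF_DIR/containerizers"

# Give executors more time to register, allowing for large image pulls
echo '10mins' > "$CONF_DIR/executor_registration_timeout"

cat "$CONF_DIR/containerizers"
cat "$CONF_DIR/executor_registration_timeout"
```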

Provisioning via Browser

You can log into the Cloudbreak application at http://<PUBLIC_IP>:3000.

The main goal of the Cloudbreak UI is to easily create clusters on your own cloud provider, or on your existing Mesos cluster. This description details the Mesos setup - if you'd like to use a different cloud provider check out its manual.

This document explains the four steps that need to be followed to create Cloudbreak clusters from the UI: setting up Marathon credentials, creating resource constraint templates, defining the cluster services with a blueprint, and deploying the cluster.

IMPORTANT Make sure that you have sufficient quota (CPU, memory) in your Mesos cluster for the requested cluster size.

Setting up Marathon credentials

Cloudbreak works by connecting to your Marathon API through so-called credentials, and then uses the API to schedule containers on your Mesos cluster. The credentials can be configured on the manage credentials panel of the Cloudbreak Dashboard.

To create a new Marathon credential follow these steps:

  1. Fill out the new credential Name
    • The name can contain only lowercase alphanumeric characters (min 5, max 100 characters)
  2. Add an optional description
  3. Specify the endpoint of your Marathon API in this format: http://<marathon-address>:<port>. Example: http://172.16.252.31:8080.
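Before saving the credential, it is worth checking that the endpoint answers. Marathon exposes its version and configuration as JSON on /v2/info, so a quick sanity check could look like the sketch below (the address is a placeholder; the check only prints a warning if it cannot connect):

```shell
# Placeholder Marathon endpoint - use your own address and port
MARATHON=http://172.16.252.31:8080

# /v2/info returns Marathon's version and configuration as JSON
curl -s --max-time 3 "$MARATHON/v2/info" || echo "Marathon API not reachable at $MARATHON"
```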

Public in account means that all the users belonging to your account will be able to use this credential to create clusters, but cannot delete it.

Authentication and HTTPS to a Marathon API are not yet supported by Cloudbreak.

Resource constraints

After your Marathon API is linked to Cloudbreak you can start creating resource constraint templates that describe the resources requested through the Marathon API when starting an Ambari container.

When you create a resource constraint template, Cloudbreak does not make any requests to Marathon. Resources are only requested after the create cluster button is pushed and Cloudbreak starts to orchestrate containers. These templates are saved to Cloudbreak's database and can be reused with multiple clusters to describe the same resource constraints.

A typical setup is to combine multiple templates in a cluster for the different types of nodes. For example you may want to request more memory for Spark nodes.

The resource constraint templates can be configured on the manage templates panel on the Cloudbreak Dashboard under the Mesos tab. You can specify the memory, CPU and disk needed by the nodes in a hostgroup. If Public in account is checked, all the users belonging to your account will be able to use this resource to create clusters, but cannot delete it.

Defining cluster services

Blueprints

Blueprints are your declarative definition of a Hadoop cluster. These are the same blueprints that are used by Ambari.

You can use the 3 default blueprints pre-defined in Cloudbreak or you can create your own. Blueprints can be added from a file or a URL, or the whole JSON can be written in the JSON text box.

The host groups in the JSON will be mapped to a set of instances when the cluster is started, and the corresponding services and components will be installed on the matching nodes. Blueprints can be modified later from the Ambari UI.
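A minimal sketch of the blueprint structure is shown below. The component lists are shortened for illustration, and the blueprint name and stack version are made up; a blueprint for a working HDP cluster needs considerably more components and configuration:

```json
{
  "Blueprints": {
    "blueprint_name": "hdp-minimal",
    "stack_name": "HDP",
    "stack_version": "2.4"
  },
  "host_groups": [
    {
      "name": "master",
      "cardinality": "1",
      "components": [ { "name": "NAMENODE" }, { "name": "AMBARI_SERVER" } ]
    },
    {
      "name": "worker",
      "cardinality": "3",
      "components": [ { "name": "DATANODE" } ]
    }
  ]
}
```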

NOTE It is not necessary to define every configuration property in the blueprint. If a property is missing, Ambari fills it in with a default value.

If Public in account is checked all the users belonging to your account will be able to use this blueprint to create clusters, but cannot delete or modify it.


A blueprint exported from a running Ambari cluster can be reused in Cloudbreak with slight modifications. There is no automatic way to modify an exported blueprint and make it instantly usable in Cloudbreak; the modifications have to be done manually. When a blueprint is exported, some configurations (for example domain names and memory settings) are hardcoded and won't be applicable to the Cloudbreak cluster.

Cluster deployment

After all the cluster resources are configured you can deploy a new HDP cluster.

Here is the basic flow for cluster creation on Cloudbreak's Web UI:

  1. Configure Cluster tab
  2. Choose Blueprint tab
  3. Review and Launch tab

You can check the progress on the Cloudbreak Web UI if you open the new cluster's Event History. It is available if you click on the cluster's name.

Advanced options

There are some advanced features available when deploying a new cluster:

Validate blueprint This is selected by default. Cloudbreak validates the Ambari blueprint in this case.

Config recommendation strategy Defines how the configuration recommendations, gathered from the Ambari stack advisor's response, are applied.

Cluster termination

You can terminate running or stopped clusters with the terminate button in the cluster details.

IMPORTANT Always use Cloudbreak to terminate a cluster instead of deleting its containers through the Marathon API. Deleting them directly would cause inconsistencies between Cloudbreak's database and the actual state, which could lead to errors.
