Hortonworks Data Platform (HDP) is a truly secure, enterprise-ready, open source Apache™ Hadoop® distribution based on a centralized architecture (YARN). HDP addresses the complete needs of data-at-rest, powers real-time customer applications and delivers robust big data analytics that accelerate decision making and innovation. (Source: https://hortonworks.com/products/data-platforms/hdp/).
I am going to install HDP on a 5-node cluster deployed on CloudSigma. CloudSigma provides easy deployment, a large library of operating system images and an easy-to-use interface for setting up a big data platform within minutes.
Step 1: Set up and Configure your Desired Server Infrastructure
To begin, I created five machines on CloudSigma, each with 16 GB RAM, 8 cores (2.5 GHz each) and a 256 GB SSD. Each machine with this configuration costs around 20 cents per hour to run on CloudSigma. I installed Ubuntu 16.04 on every machine by cloning the following drive from CloudSigma's library:
Ubuntu 16.04 with VirtIO drivers, Python 3 and 2.7.12, Pip 9.0.1, OpenSSL 1.0.2l, Cloud-init 0.7.9 and the latest updates until 2017-12-26
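For a quick sense of what the whole cluster adds up to, here are the totals as a small Bash sketch. It assumes the approximate per-machine figures above and that all five machines run around the clock; the variable names and the `cluster_summary` helper are illustrative, and the cost is printed in whatever currency the hourly price is quoted in.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope totals for the example cluster.
# Per-machine figures come from the text; the hourly price is approximate
# and the totals assume every node runs 24/7.
NODES=5
RAM_GB=16
CORES=8
SSD_GB=256
PRICE_PER_HOUR=0.20   # per machine, same currency as the quoted price

cluster_summary() {
  echo "Total RAM: $((NODES * RAM_GB)) GB"
  echo "Total cores: $((NODES * CORES))"
  echo "Total SSD: $((NODES * SSD_GB)) GB"
  # Bash arithmetic is integer-only, so awk handles the decimal price.
  awk -v n="$NODES" -v p="$PRICE_PER_HOUR" 'BEGIN {
    printf "Cost: %.2f per hour, %.2f per day, %.2f per 30 days\n", n*p, n*p*24, n*p*24*30
  }'
}

cluster_summary
```

With these inputs it reports 80 GB of RAM, 40 cores and 1280 GB of SSD in total, at about 1.00 per hour (24.00 per day) for the whole cluster.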
Step 2: Set up the Master/Slave Configuration
Next, for our big data tools to work properly, the host (master) must be able to communicate with each of the nodes (slaves). So, on each machine, we create another user account with sudo rights, say m1, with the following commands:
```shell
sudo adduser m1
sudo usermod -aG sudo m1
```
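Since the later steps rely on m1 having sudo rights, a quick check that the usermod call took effect can save debugging time. This is a minimal sketch; `in_group` is a hypothetical helper name, and it relies only on the standard `id -nG` command, which prints a user's group names separated by spaces.

```shell
#!/usr/bin/env bash
# Check that a user belongs to a group, e.g. that "usermod -aG sudo m1"
# took effect. in_group is an illustrative helper, not a system command.
in_group() {
  local user=$1 group=$2
  # One group name per line, then require an exact whole-line match.
  id -nG "$user" 2>/dev/null | tr ' ' '\n' | grep -qx "$group"
}

if in_group m1 sudo; then
  echo "m1 has sudo rights"
else
  echo "m1 is missing or not in the sudo group" >&2
fi
```

Run it on each machine after creating the account; it returns a non-zero status if the user does not exist or lacks the group.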
For the machines to be able to communicate with each other, we first give each machine a name in the /etc/hosts file:
```shell
sudo vi /etc/hosts
```
Add entries like the following on every machine, replacing IP_1 to IP_5 with your machines' IP addresses and using the names you want to give them:
```
IP_1 machine1.CloudSigma.dann machine1
IP_2 machine2.CloudSigma.dann machine2
IP_3 machine3.CloudSigma.dann machine3
IP_4 machine4.CloudSigma.dann machine4
IP_5 machine5.CloudSigma.dann machine5
```
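Typing the same five lines on every machine is error-prone, so here is a minimal sketch that generates them from a list of IP addresses. The addresses are placeholders from the 192.0.2.0/24 documentation range, and `hosts_entries`, `HOSTS_FILE` and the `hosts.cluster` scratch file are illustrative names, not part of any tool.

```shell
#!/usr/bin/env bash
# Build /etc/hosts entries for the five nodes from a list of IPs.
# The IPs are placeholders; replace them with your machines' addresses.
# HOSTS_FILE defaults to a scratch file so the sketch is safe to run.
set -euo pipefail

DOMAIN="CloudSigma.dann"
IPS=(192.0.2.11 192.0.2.12 192.0.2.13 192.0.2.14 192.0.2.15)
HOSTS_FILE="${HOSTS_FILE:-./hosts.cluster}"

hosts_entries() {
  local i name
  for i in "${!IPS[@]}"; do
    name="machine$((i + 1))"
    # Format: IP, fully qualified name, short alias
    printf '%s %s.%s %s\n' "${IPS[$i]}" "$name" "$DOMAIN" "$name"
  done
}

# Append only lines not already present, so re-running is harmless.
hosts_entries | while read -r line; do
  grep -qxF "$line" "$HOSTS_FILE" 2>/dev/null || echo "$line" >> "$HOSTS_FILE"
done
```

After reviewing hosts.cluster, you could append it on each node with `sudo tee -a /etc/hosts < hosts.cluster` and then confirm resolution with `getent hosts machine2`.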
Next, we want the m1 user on machine1 to be able to log in as m1 on the other machines without being asked for a password. For that, we set up passwordless SSH.
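On machine1:
- I. Log in as user m1:
```shell
su - m1
```
- II. Generate an SSH key pair; pressing Enter at each prompt accepts the default key location and an empty passphrase:
```shell
ssh-keygen
```
- III. Copy the public key to the m1 account on each of the other machines (you are asked for m1's password on each machine one last time):
```shell
ssh-copy-id m1@machine2
ssh-copy-id m1@machine3
ssh-copy-id m1@machine4
ssh-copy-id m1@machine5
```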
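The key setup can also be scripted, which helps when you rebuild the cluster. This sketch assumes it runs as m1 on machine1 and uses the example host names from /etc/hosts; `setup_keys`, `verify_keys` and the `DRY_RUN` switch are hypothetical names introduced here. Only standard OpenSSH commands are used: `ssh-keygen -N ""` creates a key with an empty passphrase, and `ssh -o BatchMode=yes` fails instead of prompting, which makes it a convenient check that no password is needed.

```shell
#!/usr/bin/env bash
# Passwordless SSH from m1@machine1 to the other nodes, as a script.
# DRY_RUN=1 prints each command instead of running it (a convention of
# this sketch, not an ssh option).
NODES=(machine2 machine3 machine4 machine5)
KEY="$HOME/.ssh/id_rsa"

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

setup_keys() {
  local host
  # Generate a key pair only once; -N "" sets an empty passphrase.
  if [ ! -f "$KEY" ]; then
    run mkdir -p -m 700 "$HOME/.ssh"
    run ssh-keygen -t rsa -b 4096 -N "" -f "$KEY"
  fi
  # ssh-copy-id asks for each node's m1 password this one time.
  for host in "${NODES[@]}"; do
    run ssh-copy-id "m1@$host"
  done
}

verify_keys() {
  local host
  # BatchMode=yes makes ssh fail rather than prompt for a password,
  # so any node that still needs one shows up as an error.
  for host in "${NODES[@]}"; do
    run ssh -o BatchMode=yes "m1@$host" hostname
  done
}
```

For example, source the script, preview with `DRY_RUN=1 setup_keys`, then run `setup_keys` followed by `verify_keys`; each node should print its hostname without asking for a password.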
Step 3: Get Ambari Up and Running
Go to Hortonworks' HDP download page and choose your preferred option. We are going to install HDP 2.6.4 (Automated) with Ambari 2.6.1. Click Download, which redirects you to the Apache Ambari installation page, and select your base OS; in our case, the machines run Ubuntu 16.
Then, log in to the host machine (machine1) as root:
```shell
sudo su -
```
Next, download the Ambari repository file by executing the commands given on that page; on Ubuntu, the file goes into /etc/apt/sources.list.d/ so that apt can find it.
Now that we have the repository file, we can install Ambari. The installation downloads around 750 MB of files, so a cloud platform with a fast connection is preferable for such clusters. On CloudSigma, with an average download speed of around 40 MBps, it takes only seconds:
```shell
apt-get install ambari-server
```
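To put those numbers in perspective, here is a rough download-time estimate. The 750 MB size and the 40 MBps rate are the approximate figures quoted above, so treat the result as a ballpark only; `download_seconds` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Rough download time = size / throughput, both in megabytes.
# awk is used because Bash arithmetic cannot divide with decimals.
download_seconds() {
  local size_mb=$1 rate_mb_per_s=$2
  awk -v s="$size_mb" -v r="$rate_mb_per_s" 'BEGIN { printf "%.2f\n", s / r }'
}

echo "About $(download_seconds 750 40) seconds"
```

At the quoted rate this comes to about 19 seconds (18.75); the same 750 MB over a 5 MBps link would take about 150 seconds.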
Next, set up the Ambari Server:
```shell
ambari-server setup
```
The setup asks several questions, but the default options are fine for our purposes, so you can just press Enter at each prompt to complete it.
Finally, start Ambari with the following command:
```shell
ambari-server start
```
To access the Ambari UI, open the following address in a browser on any computer or tablet, replacing <server-IP> with the public IP of your host machine:
```
http://<server-IP>:8080
```
For example, if my IP is 213.125.36.21, then I will go to the address http://213.125.36.21:8080.
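If the page does not load, it can help to check from a terminal whether the UI answers at all (for example, a firewall may be blocking port 8080). Below is a minimal sketch using curl; `wait_for_ui` and its defaults of 30 attempts 10 seconds apart are illustrative choices, not Ambari settings.

```shell
#!/usr/bin/env bash
# Poll a URL until it answers with a successful HTTP status.
# Arguments: URL, number of attempts (default 30), delay in seconds (default 10).
wait_for_ui() {
  local url=$1 tries=${2:-30} delay=${3:-10} i
  for ((i = 1; i <= tries; i++)); do
    # -s: silent, -f: fail on HTTP errors, -o: discard the page body
    if curl -sf -o /dev/null --max-time 5 "$url"; then
      echo "Ambari UI is up at $url"
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    [ "$i" -lt "$tries" ] && sleep "$delay"
  done
  echo "No answer from $url after $tries attempts" >&2
  return 1
}
```

For example, `wait_for_ui http://213.125.36.21:8080` prints a confirmation once the UI responds, or reports failure after 30 attempts.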
Once the Ambari UI loads, log in with the default username admin and password admin. You should change these credentials to something secure straight away.
And voilà, we are finally finished! This was our tutorial on how to set up a big cluster in 3 simple steps.
For more tutorials, go ahead and explore our Community Section on the website.
Happy Computing!