It's been a while since I've had the opportunity to blog so when I decided to install HDInsight on a VM, I figured what better opportunity to get back in the swing of it.
The Jumping Off Point
To get things started, I am using VirtualBox as my VM host and I am running a fully patched (all 150+ of them) version of Windows Server 2008R2. Not that its relevant, but to be thorough, I've also installed SQL Server 2012 SP 1 as it will be used in subsequent blogs.
A Tale of Two Installers
Before we dive in to the click-by-click walkthrough, it is important to understand that the HDInsight install actually consists of two separate installs. Technically, as you will see if you are installing it on a clean VM, there are actually a number of pre-requisites ranging from IIS to Python, but the two key ones are HortonWorks Data Platform and HDInsight.
The HortonWorks Data Platform is the core implementation of Hadoop. It includes HDFS, Map Reduce, Hive, Pig and HCatalog among others. The HDInsight installer on the other hand includes the HDInsight dashboard, Sqoop, Isotope.js, the sample materials that are available and PowerShell scripts to initialize the instance and configure the Head Node.
For those of us that live, breathe and bleed in the Microsoft world, the benefit of having two separate installers is obvious. Two standalone products can be updated on different intervals and since HDInsight is v0.0001 preview, a bad release won't cause you to lose your cluster. Awesome, if you ask me. Just remember that if you decide to punt on the HDInsight experiment, uninstall both the Hadoop Data Platform and HDInsight to completely remove it.
Step by Step Installation
1. HDInsight installed from the Web Platform Installer (WebPi). If WebPi is not installed on your machine you can install it by navigating to the following link: http://www.microsoft.com/web/downloads/platform.aspx
2. After you have installed and then launched WebPi, simply search for HDInsight and then click Add. Begin the installation process by clicking the Install button.
3. When the install process has started, you will be presented with a list of prerequisites which must be installed for HDInsight to function. The list of the prereqs may vary from machine to machine. As you can see by scrolling through the list HDInsight has multiple dependencies on IIS as well as on the Horton Data Platform previously discussed. Click the I Accept button to install the HDInsight and its dependencies.
4. The installer may take some time to progress through all the dependencies and a restart may be required if you do not have the Microsoft .Net 4.5 framework installed on your machine.
5. If a restart is required, the installation will automatically pick up where is left off. You may notice several command shell windows open and close at the installers set-ups both Python and the Hortonworks Data Platform before finally installing HDInsight.
6. When the installation is finished, you are presented an option to configure your installation. Clicking the Configure button launches the HDInsight Dashboard.
7. Browser your new Hadoop cluster by clicking on the local (hdfs) tile.
I experienced only one issue during the installation process. The first time I click the tile to view my cluster, I received an error saying the Page Cannot Be Displayed. A little research identified the problem which was the result of all the Apache services being stopped. I also noticed that all the services where set to manual. I changed the configure to automatic and started all the Apache services which resolved the issue.
There are two commands (start-onbox and stop-onebox) in the c:\Hadoop folder that call PowerShell scripts to perform the same task as above.
That's all it takes to get your own HDInsight/Hadoop cluster up an running!
Till next time,