Installing Mahout for HDInsight on Windows Server

  • 4 April 2013
  • Author: cprice1979

I am passionate about analytics, data mining and machine learning, and I think most organizations do too little in this area. That's why one of my favorite parts of the Hadoop ecosystem is Mahout.

Mahout is a scalable machine learning library that ships with multiple out-of-the-box machine learning and data mining algorithms, including clustering, classification, collaborative filtering and frequent pattern mining.

If you are using HDInsight in the cloud, Mahout comes pre-installed. Unfortunately, if you are running a local HDInsight instance on Windows Server, you must deploy Mahout yourself.

While this may sound like a daunting task, the fortunate thing is that under the covers HDInsight is a standard instance of Hadoop. Let's take a look at what it takes to get Mahout up and running.

Step-by-Step

1. Download the zipped Mahout 0.7 distribution from the Apache website: http://www.apache.org/dyn/closer.cgi/mahout/ 

2. Extract the contents of the zip file to c:\Hadoop and rename the extracted folder to mahout-0.7 for simplicity.

3. Now we are going to test the installation using the Simple Recommendation Engine demo: http://www.windowsazure.com/en-us/manage/services/hdinsight/recommendation-engine-using-mahout/ 

4. Follow the lab to generate the required files or, for expediency, download them here:

mUser.txt

user.txt 

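5. Once you have downloaded the files, place them in the c:\temp\ directory on your HDInsight instance. The commands below expect them to be named mInput.txt and users.txt, so rename them if your downloads are named differently.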
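
If you would rather build small test files by hand than download them, here is a minimal Python sketch of the layout I would expect RecommenderJob to accept: one userID,itemID,preference triple per line in mInput.txt and one user ID per line in users.txt. The function name, the sample_input folder and the sample values are hypothetical and only there for illustration; the lab's own generator may differ.

```python
import csv
from pathlib import Path


def write_sample_inputs(target_dir, ratings, user_ids):
    """Write mInput.txt (userID,itemID,preference) and users.txt (one user ID per line)."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    ratings_path = target / "mInput.txt"
    users_path = target / "users.txt"

    # One comma-separated triple per line, no header row.
    with ratings_path.open("w", newline="") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        for user_id, item_id, preference in ratings:
            writer.writerow([user_id, item_id, preference])

    # The users file lists the IDs we want recommendations for.
    with users_path.open("w", newline="") as handle:
        for user_id in user_ids:
            handle.write(f"{user_id}\n")

    return ratings_path, users_path


# Hypothetical ratings, for illustration only.
SAMPLE_RATINGS = [(1, 101, 5), (1, 102, 3), (2, 101, 4), (2, 103, 2), (3, 102, 5)]
SAMPLE_USERS = [1, 2, 3]

if __name__ == "__main__":
    write_sample_inputs("sample_input", SAMPLE_RATINGS, SAMPLE_USERS)
```

Copy the two generated files into c:\temp\ before moving on to the next step.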

6. Open the Hadoop Command Line console by clicking the link found either on the desktop or on the Start menu.

7. The first step as directed by the lab is to copy the test files from the local file system into HDFS. Use the following commands to deploy both text files to HDFS: 

hadoop dfs -copyFromLocal c:\temp\mInput.txt input\mInput.txt

hadoop dfs -copyFromLocal c:\temp\users.txt input\users.txt 

8. Browse HDFS and verify that the files now exist:

hadoop fs -ls input/ 

9. I won't explain what the sample job is doing since the lab referenced above does a good job of explaining that. We will simply use the sample job to verify the Mahout distribution is configured and ready for use:

hadoop jar C:\Hadoop\mahout-0.7\mahout-core-0.7-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -s SIMILARITY_COOCCURRENCE --input=input/mInput.txt --output=output --usersFile=input/users.txt
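
The command is long, so it can help to see its pieces laid out one per line. The following is just a sketch in Python that assembles the same argument list; build_recommender_command is a hypothetical helper of my own, not part of Mahout or Hadoop, and it only prints the command for pasting into the Hadoop Command Line rather than running it.

```python
def build_recommender_command(mahout_home, input_path, output_path, users_file,
                              similarity="SIMILARITY_COOCCURRENCE"):
    """Return the argument list for the RecommenderJob test run."""
    # The job jar sits in the root of the renamed Mahout folder.
    job_jar = mahout_home.rstrip("\\/") + "\\mahout-core-0.7-job.jar"
    return [
        "hadoop", "jar", job_jar,
        "org.apache.mahout.cf.taste.hadoop.item.RecommenderJob",
        "-s", similarity,               # similarity measure used between items
        f"--input={input_path}",        # ratings file in HDFS
        f"--output={output_path}",      # HDFS directory for the results
        f"--usersFile={users_file}",    # users to produce recommendations for
    ]


if __name__ == "__main__":
    command = build_recommender_command(
        r"C:\Hadoop\mahout-0.7", "input/mInput.txt", "output", "input/users.txt"
    )
    print(" ".join(command))
```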

10. The job will take several minutes to run to completion. When it finishes, let's dump the results to a text file in the temp directory:

hadoop fs -copyToLocal output/part-r-00000 c:\temp\output.txt 
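
To do something with output.txt beyond eyeballing it, here is a small Python sketch that reads it into a dictionary. It assumes each line holds a user ID, a tab, and a bracketed list of itemID:score pairs (for example a user 3 followed by [103:4.5,102:4.0]), which is the layout I would expect from this job; check it against your own file before relying on it. Both function names are my own.

```python
def parse_recommendations(line):
    """Parse one output line into (user_id, [(item_id, score), ...])."""
    user_part, _, items_part = line.strip().partition("\t")
    items = []
    # Drop the surrounding brackets, then split the itemID:score pairs.
    body = items_part.strip().strip("[]")
    if body:
        for entry in body.split(","):
            item_id, _, score = entry.partition(":")
            items.append((int(item_id), float(score)))
    return int(user_part), items


def load_recommendations(path):
    """Read a whole output file into {user_id: [(item_id, score), ...]}."""
    results = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():  # skip blank lines
                user_id, items = parse_recommendations(line)
                results[user_id] = items
    return results
```

Calling load_recommendations(r"c:\temp\output.txt") should then give you one entry per user listed in users.txt that received recommendations.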

11. Optionally, to clean up after the test, use the following commands to remove the temp and output directories from HDFS:

hadoop fs -rmr -skipTrash temp

hadoop fs -rmr -skipTrash output 

That's it. Your Hadoop instance now has Mahout support!

Till next time!

Chris
