Being Productive with HDInsight

  • 9 April 2013
  • Author: cprice1979

This post is a holding place for miscellaneous tools and tips for working with HDInsight.

Build Tools

1. Apache ANT (http://ant.apache.org/manual/install.html)

Extract the archive to c:\ant\, then set ANT_HOME and add Ant's bin folder to the PATH:

set ANT_HOME=c:\ant

set PATH=%PATH%;%ANT_HOME%\bin

2. Apache IVY (http://ant.apache.org/ivy/history/latest-milestone/install.html)

  • Copy the Ivy JAR into Ant's lib folder (%ANT_HOME%\lib)

3. Git Client (http://git-scm.com/downloads)
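
A quick way to confirm that the build tools above are wired up is a small check script. The following is a minimal Python sketch, not part of the original setup: the function name check_build_tools and the ivy*.jar file pattern are assumptions made for illustration, and c:\ant is simply the folder used in the Ant step.

```python
import glob
import os
import shutil


def check_build_tools(ant_home):
    """Return a dict describing which build prerequisites look usable."""
    lib_dir = os.path.join(ant_home, "lib")
    return {
        # ANT_HOME should point at the extracted Ant folder.
        "ant_home_exists": os.path.isdir(ant_home),
        # Ant's launcher must be reachable through PATH.
        "ant_on_path": shutil.which("ant") is not None,
        # Assumption: the Ivy JAR keeps a name starting with "ivy".
        "ivy_jar_in_ant_lib": bool(glob.glob(os.path.join(lib_dir, "ivy*.jar"))),
        # Git is needed to fetch sources.
        "git_on_path": shutil.which("git") is not None,
    }


if __name__ == "__main__":
    home = os.environ.get("ANT_HOME", r"c:\ant")
    for name, ok in check_build_tools(home).items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Run it from a new command prompt so that it sees the updated ANT_HOME and PATH values.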

 

Data Preparation/Research Tools

1. CURL (http://curl.haxx.se/download.html)

2. CYGWIN (http://cygwin.com)

3. Enthought Python Distribution (EPD) (http://www.enthought.com/products/epd.php)

4. GNU Parallel (ftp://ftp.gnu.org/gnu/parallel/)

 

PIGGYBANK

Community-contributed user-defined functions (UDFs) for Pig

  • Retrieve source from Git:
    git clone https://github.com/apache/pig.git
    
    cd pig
    
    git checkout -b branch-0.9 remotes/origin/branch-0
    
  • Build Pig and then PiggyBank using Ant
  • Pig Script:
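    -- myscript.pig
    -- Register the PiggyBank JAR built above so its UDFs can be resolved.
    REGISTER C:\Users\Administrator\pig\contrib\piggybank\java\piggybank.jar;
    
    A = LOAD 'student_data' AS (name: chararray, age: int, gpa: float);
    
    -- PiggyBank UDFs are referenced by their fully qualified class name.
    B = FOREACH A GENERATE org.apache.pig.piggybank.evaluation.string.UPPER(name);
    
    DUMP B;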
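
To try the script, Pig needs a student_data input with three tab-separated fields, since tab is the default delimiter for LOAD. The following is a minimal Python sketch for producing one: the sample rows and the helper names write_student_data and expected_upper_names are hypothetical, chosen only to match the (name, age, gpa) schema above. The second helper lists the names in upper case so they can be compared by eye with what DUMP B prints.

```python
import csv

# Hypothetical sample rows matching the schema (name, age, gpa).
SAMPLE_ROWS = [
    ("john", 18, 4.0),
    ("mary", 19, 3.8),
    ("bill", 20, 3.9),
]


def write_student_data(path, rows=SAMPLE_ROWS):
    """Write rows as tab-separated text, the default format for Pig's LOAD."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)


def expected_upper_names(rows=SAMPLE_ROWS):
    """Return the names upper-cased, for comparison with the output of DUMP B."""
    return [name.upper() for name, _age, _gpa in rows]


if __name__ == "__main__":
    write_student_data("student_data")
    print(expected_upper_names())
```

The generated file still has to be copied to a location the Pig script can read before running it.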
    
Categories: Analysis Services