Writing for BIDN

Looking to give back to the community or learn through teaching others? Anyone can post blogs by clicking Add Blog Post after contacting us for permission.

«August 2015»
MonTueWedThuFriSatSun
272829

Power BI and Big Data

If you’re worked in the wide and diverse field of information technology for almost any amount of time, it probably hasn’t taken you long to discover that the one thing constant about IT is that the technologies and strategies involved change faster than you can learn them. And if you work in business intelligence like I do, you don’t have to look very far at all to see change. The Microsoft Power BI team rolls out a software update every month! If I want to stay learned up on the technology, I have to really be on top of things.

About ten years ago when Hadoop was first being developed at Yahoo, I don’t think anyone could have anticipated the size of the ripples (more likes cannonball sized splashes) being able to access Big Data could and would have on the IT industry. Hadoop (and other advances in hardware and software technologies) gave us something we never had before: The ability to access and report on data in real time on a scale never previously imagined gives an organization to identify and understand trends and patterns in the data and gain previously unknown insights. The organizations that are able to leverage big data will be the organizations that leave their competition in the dust.

Set Up and Configure the Hortonworks Sandbox in Azure

Not only does Power BI Desktop give us the ability to connect to Hadoop Distributed File System (HDFS) for reporting we can also mash it up with other more traditional and structured data sources with minimal effort required. But that’s not what this blog post is all about. This post is about setting up a virtual machine in Azure running Hadoop and connecting to our Hortonworks Sandbox with Power BI Desktop :).

The first thing you do if you don’t have access to a Hadoop cluster is to set up the Hortonworks Sandbox on Azure. The good news is its free (for the duration of the trial) and its super easy. Just follow the instructions at this link to set up the Hortonworks Sandbox.

Hadoop in Azure

Once that’s set up, you’ll need to add mapping for the IP address and host name to your hosts file. Devin Knight has a blog on this that you’ll find helpful.

Connecting to Hadoop with Power BI Desktop

Once your Hortonworks Sandbox is set up, you’re ready to set up your connection to Hadoop with Power BI Query. Start up the Power BI Desktop and click Get Data. Scroll down and select Hadoop File (HDFS) and click Connect.

Get Data with Power BI

From there you can follow the rest of the wizard to load the data into the semantic model.

Load Data with Power BI

Once the data is loaded, you’ll need to modify the query to navigate to the data you wish to use in your model.

In Power BI Desktop, go to the Home ribbon and click Edit Queries.

Read more

30

Three Best Practices for Power BI

Since the release of Power BI Desktop this past week, I’ve been really spending my extra time digging into the application focusing on learning and experimenting as much as I can. When my wife has been watching Law and Order: SVU reruns at night after the rug rats are in bed, I’ve been right there next to her designing Power BI dashboards like the total data nerd that I am. When my kids have been taking their naps during the weekend, I’ve been writing calculations in the model for my test dashboards. Or when I’ve been riding in the car back and forth to work I’ve been thinking of new things to do with Power BI Desktop.

Since I’ve been spending a decent amount of time with Power BI Desktop, I thought I’d take a moment to share three things to know and remember when designing your Power BI models and dashboards that I think will help you make the most of this tool and be effective at providing the data your business needs to succeed.

1. Optimize your Power BI Semantic Model

It probably hasn’t taken you long to figure this one out if you’ve built Power Pivot/Tabular models or at least it won’t when you do start developing Power BI dashboards. The visualizations in Power BI and Power View are heavily meta-data driven which means that column names, table or query names, formatting and more are surfaced to the user in the dashboard. So if you using a really whacky naming convention in your data warehouse for your tables like “dim_Product_scd2_v2” and the column names aren’t much better, these naming conventions are going to be shown to the users in the report visualizations and field list.

For example, take a look at the following report.

Power BI Dashboard without formatting

Notice anything wonky about it? Check the field names, report titles and number formatting. Not very pretty, is it? Now take a look at this report.

Power BI Dashboard with formatting

See the difference a little cleaned up metadata makes? All I did was spend a few minutes giving the fields user-friendly name and formatting the data types. This obviously makes a huge difference in the way the dashboard appears to the users. By the way, I should get into the movie production business. ;)

My point is that the names of columns, formatting, data types, data categories and relationships are all super important to creating clean, meaningful and user friendly dashboards. The importance of a well-defined semantic model cannot be understated in my opinion. A good rule of thumb is to spend 80% to 90% of your time on the data model (besides, designing the reports is the easy part).

I’d also like the mention the importance of the relationships between the objects in the semantic model. Chance are you will have a small group of power users that will want to design their own dashboards to meet their job’s requirements and that’s one of the beauties of Power BI. But when users began developing reports, they may query your model in unexpected ways that will generate unexpected behaviors and results. I only want to mention this because the relationships between the objects in the model will impact the results your users will see in their reports. Double check your relationships and ensure that they are correct, especially after you add new objects to the model since the

Read more
31

Power BI Fantasy Football Player Stats Dashboards for Download

Every year at Pragmatic Works some coworkers, including consultants, marketing staff, support team members, software development staff and project management, partake in a company fantasy football league. And with the recent release of the new Power BI Desktop, I thought what better way is there to prepare to completely annihilate my coworkers and friends in an imaginary nonsensical game than by creating some nifty Power BI dashboards based on last years player stats as recorded by Yahoo! Sports. So I thought I’d walk you through some of the steps I followed to leverage the Yahoo! Sports NFL player stats page as a data source and some of the query transformations I applied to prepare the data for reporting.

Power BI dashboard with Power BI Desktop

Click here to download my Fantasy Football Dashboards Power BI .pbix file.

If you’re completed new to Power BI Desktop I highly suggest you watch my video walkthrough of Power BI Desktop or read my blog post which walks you through each step of creating your first Power BI dashboards with Power BI Desktop. Last Friday, I also blogged about my three best practices for designing a killer Power BI solution, so take a look at that.

To create these dashboards, I simply navigated to the Yahoo! Sports NFL stats page and found the page for each position I’m interested in for this fantasy football season. I copied the URL to my clipboard. In Power BI Desktop, click Get Data and then use the Web data source option. Then all you have to do is copy and paste the URL into the text box and click OK.

Get data from web with Power BI Desktop

Then select the HTML table that contains your data and click Edit. We need to edit our query because there are some issues with the data. By clicking Edit, we can apply transformations to our query which will allow us to do things like rename columns, remove unwanted columns, modify data types, create custom columns and much more.

Get data from web with Power BI Desktop

One thing you’ll notice in the above screen grab is that the column names are in the first row, so we need to fix that.

On the Home ribbon of the Query Editor, just click the Use First Row As Headers button. Pre

Read more
12
3456789
10111213141516
17181920212223
24252627282930
31123456

Using Parameters in Power Query Extracts

  • 7 August 2013
  • Author: DevinKnight
  • Number of views: 9492
  • 0 Comments

Introduction

By now you have likely heard a little about Power Query (formally known as Data Explorer). While you may have heard of the tool many still have not gotten their hands on it and started experiencing the potential it has. In case you’re new to Power Query here’s a couple quick points before continuing on:

My goal with this post is to show you how simple yet powerful Power Query really is. I’ll do this by showing you an example of solving a problem that would be fairly complex using traditional ETL tools like SSIS (SQL Server Integration Services) but made simple with Power Query.

Problem

Power Query has the ability to do some basic “screen scraping” of data from web pages and add this as a new data source to your Self-Service BI solution. The problem that often occurs when doing this is that there are often dozens of pages or filters that need to be modified to get a full historical view of the data on the web page. To solve this problem we can leverage parameters in the Power Query Formula Language to navigate through this data. Even though Power Query is new there are a couple very good resources for learning it that can be found here. To make this example fun we will be pulling our data from data from the National Football League website (www.nfl.com). When pulling historical data about teams from the league the website only permits users to view one year at a time. However, our goal is to view how teams have performed across all time.

clip_image002

To solve this we will create a parameter using the Power Query Formula Language to dynamically pass in the years that are needed to extract data across all time. Next we’ll walk through a beginning to end example on solving this problem.

Hypothesis

Once this data is collected I would like to prove or disprove a hypothesis of mine. I believe that the fewer penalty yards are accumulated by a team’s offensive will result in better performance. To determine if I’m right we will apply a visualization to the data once we’ve completed importing it.

Step by Step

  • Launch Excel 2010 or higher. My screenshots are all done using Excel 2013 but everything should look the same until the data visualization section.
  • Select the Power Query tab.
  • To complete this example we’ll need to enable advanced query editing. Under the Machine Settings section select Options.

clip_image004

  • Check the option called Enable Advanced Query Editing then click OK. This setting is turned off by default.

clip_image006

clip_image007

  • In the Navigator pane select Table 0, which has the content we need for this demonstration.

clip_image009

  • Rename the query by double-clicking on the query name in the top left of the Query Editor. Change the name from Query1 to Team Stats.
  • Now click the Edit Query button in the top right of the Query Edit.

clip_image011

  • Add the following parameter definition above the existing query:

(getYear) =>

  • Look at the URL string that is part of the query and replace the hardcoded year 2012 with the following expression:

" & Number.ToText(getYear) & " Click Done. The full query after these two changes should look like this: (getYear) => let Source = Web.Page(Web.Contents("http://www.nfl.com/stats/categorystats?tabSeq=2&statisticCategory=GAME_STATS&conference=ALL&role=TM&season=" & Number.ToText(getYear) & "&seasonType=REG")), Data0 = Source{0}[Data], ChangedType = Table.TransformColumnTypes(Data0,{{"Rk", type number}, {"Team", type text}, {"G", type number}, {"Pts/G", type number}, {"TotPts", type number}, {"Scrm Plys", type number}, {"Yds/G", type number}, {"Yds/P", type number}, {"1st/G", type number}, {"3rd Md", type number}, {"3rd Att", type number}, {"3rd Pct", type number}, {"4th Md", type number}, {"4th Att", type number}, {"4th Pct", type number}, {"Pen", type number}, {"Pen Yds", type number}, {"ToP/G", type text}, {"FUM", type number}, {"Lost", type number}, {"TO", type number}}) in ChangedType

  • You can now click the Invoke button to pass in any year value you want and get different results based on your selection. Click Invoke and type in the year 2009. Click OK.

clip_image013

  • This returns all data for the year 2009 for all teams. Our next step is to return data for every year for all teams. Expand the Steps pane on the right side of the Query Editor and click the X next to InvokedTeam Stats. This removes the last action taken, which was passing in the value of 2009 into our parameter. Click Done.

clip_image015

  • Select the Power Query tab.
  • Select From Other Source > Blank Query

clip_image017

  • In the Query Editor formula bar type the following expression:

= {2002..2012} This will automatically create a list of the last 10 years from 2002 to 2012

  • Right-click on the column header and select To Table to convert this list into a table. You will be prompted with some settings for changing this list to a table. Just accept the defaults and click OK.

clip_image019

  • Right-click on the column header again and select Insert Column > Custom
  • Use the following formula to pull in data from the parameter driven function we created earlier.

#"Team Stats"([Column1])

clip_image021 NOTE: The Power Query Formula Language is case sensitive

clip_image023

Double quotes are only needed here because there is a space in the name of the other query. Click OK.

  • Click the Expand button next to the new custom column then click OK.

clip_image025

  • This should give you a preview of all team stats across the last 10 years. Rename the query in the top left of the Query Editor from Query1 to Full Team Stats.
  • Multi-select the columns Column1, Custom.Team, Custom.Pts/G, Custom.TotPts, Custom.Yds/G, Custom.Pen, Custom.Pen Yds then right-click and select Remove Other Columns.

clip_image027

  • Rename Column1 to Year and rename all the other columns to remove the prefix of Custom. from the name. Columns can be renamed by right-clicking on them and selecting Rename.
  • Multi-Select the columns that have our aggregate data in it then right-click and select Change Type > Number as shown in the screen shot.

clip_image029

  • Finally, click Done to actually import the full dataset into Excel. Depending on the web site you’re extraction could take several minutes.
  • With the data now in Excel click the Load to data model button inside the Query Settings pane on the right of your screen. This sends to data directly into Power Pivot.

clip_image031

This is a feature only available in Excel 2013. If you’re using Excel 2010 then you must launch Power Pivot and go to the Design tab and chose from Existing Connections to do the same behavior.

Visualizing Data (Cherry on top)

Now that we have the data in Power Pivot it should be fairly simple to visualize it. We could go with a straight forward approach and use PivotTables, but since this demonstration was done using Excel 2013 let’s use Power View. If you’d like to replicate this demonstration but are using Excel 2010 you can do so by deploying your workbook to a Power Pivot gallery in SharePoint 2010 SP1 that uses the SQL Server 2012 Reporting Services add-in. Remember this is the section where I should be able to prove or disprove my hypothesis about the team’s offensive performance being impacted by penalties.

  • Go to the Insert tab in Excel 2013 and select Power View.
  • Delete any visualizations that Power View may have automatically tried to create for you.
  • Close the Filters section by hitting the next in the top right next to the Filters pane.
  • In the Power View Fields list expand Full Team Stats table and select Pen Yds, Pts/G, Yds/G, and Team.
  • With these fields selected change the visualization to a scatter chart by select Other Chart > Scatter in the Design tab.

clip_image033

  • Expand the scatter chart so it takes up the entire design surface except for the title.
  • From the Power View Fields list bring the Year column from the Full Team Stats table into the Play Axis of the Chart properties.

clip_image035

This chart is starting to tell us some interesting things. It looks like the least penalized team, the Atlanta Falcons, is on the upper half of offensive performance but not the best. The best offense is the New England Patriots and they’re about middle of the pack when it comes to penalties. Probably the most interesting thing I find here is that the Super Bowl winner from 2012 was the most penalized team and about middle of the pack when it comes to offensive output. That just goes to tell you that in American Football there’s another half of the game we’re not analyzing here. Baltimore is well known for have a great defense, which would explain the discrepancy. Overall it looks like my hypothesis cannot be proven right.

  • Now, if we hit the Play button in the bottom left of the chart we can see across the last 10 years.
  • It looks like the most consistent offensive team has been the New England Patriots. We can focus on the New England Patriots by clicking on their bubble in the scatter chart and it shows over the last 10 years Patriots have been an outstanding offensive team.

clip_image037

  • Click somewhere in the background of the chart to remove the filter on the Patriots.
  • Finally, give the report a title of NFL Offensive Performance to complete this example.

This is the kind of amazing analysis you can do with Power BI. I hope you’ve found this useful and can apply it to your own work! You can download the sample workbook for this example here.

Print
Tags:
Rate this article:
No rating
DevinKnight

DevinKnightDevinKnight

Other posts by DevinKnight

Please login or register to post comments.