Writing for BIDN

Looking to give back to the community or learn through teaching others? Anyone can post blogs: contact us for permission, then click Add Blog Post to get started.

Power BI and Big Data

If you’ve worked in the wide and diverse field of information technology for almost any amount of time, it probably hasn’t taken you long to discover that the one constant in IT is that the technologies and strategies involved change faster than you can learn them. And if you work in business intelligence like I do, you don’t have to look very far to see change. The Microsoft Power BI team rolls out a software update every month! If I want to stay current on the technology, I really have to stay on top of things.

About ten years ago, when Hadoop was first being developed at Yahoo, I don’t think anyone could have anticipated the size of the ripples (more like cannonball-sized splashes) that access to Big Data could and would have on the IT industry. Hadoop (along with other advances in hardware and software technologies) gave us something we never had before: the ability to access and report on data in real time on a scale never previously imagined, which allows an organization to identify and understand trends and patterns in the data and gain previously unknown insights. The organizations that are able to leverage big data will be the ones that leave their competition in the dust.

Set Up and Configure the Hortonworks Sandbox in Azure

Not only does Power BI Desktop give us the ability to connect to the Hadoop Distributed File System (HDFS) for reporting, we can also mash that data up with other, more traditional and structured data sources with minimal effort. But that’s not what this blog post is all about. This post is about setting up a virtual machine in Azure running Hadoop and connecting to our Hortonworks Sandbox with Power BI Desktop. :)

The first thing to do if you don’t have access to a Hadoop cluster is to set up the Hortonworks Sandbox on Azure. The good news is it’s free (for the duration of the trial) and it’s super easy. Just follow the instructions at this link to set up the Hortonworks Sandbox.

Hadoop in Azure

Once that’s set up, you’ll need to add a mapping for the IP address and host name to your hosts file. Devin Knight has a blog post on this that you’ll find helpful.

Connecting to Hadoop with Power BI Desktop

Once your Hortonworks Sandbox is set up, you’re ready to set up your connection to Hadoop in Power BI Desktop. Start up Power BI Desktop and click Get Data. Scroll down, select Hadoop File (HDFS), and click Connect.

Get Data with Power BI

From there you can follow the rest of the wizard to load the data into the semantic model.

Load Data with Power BI

Once the data is loaded, you’ll need to modify the query to navigate to the data you wish to use in your model.

In Power BI Desktop, go to the Home ribbon and click Edit Queries.
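Once the Query Editor opens, the Advanced Editor button will show you the script behind the query. Here is a minimal sketch of what that script might look like; the sandbox host name, port, folder path and file name are assumptions and will be different in your environment.

let
    // Connect to HDFS on the sandbox; host, port and path are placeholders
    Source = Hdfs.Files("hdfs://sandbox.hortonworks.com:8020/user/hue/data"),
    // Navigate to a single file in the folder listing
    SampleFile = Source{[#"Folder Path" = "hdfs://sandbox.hortonworks.com:8020/user/hue/data/", Name = "sample.csv"]}[Content],
    // Parse the file contents and promote the first row to column headers
    Imported = Csv.Document(SampleFile, [Delimiter = ",", Encoding = 65001]),
    Promoted = Table.PromoteHeaders(Imported)
in
    Promoted

Every transformation you apply in the Query Editor just adds another step to this script, so it’s a handy way to double-check what the wizard did for you.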

Read more


Three Best Practices for Power BI

Since the release of Power BI Desktop this past week, I’ve really been spending my extra time digging into the application, focusing on learning and experimenting as much as I can. When my wife has been watching Law and Order: SVU reruns at night after the rug rats are in bed, I’ve been right there next to her designing Power BI dashboards like the total data nerd that I am. When my kids have been taking their naps during the weekend, I’ve been writing calculations in the model for my test dashboards. Or when I’ve been riding in the car back and forth to work, I’ve been thinking of new things to do with Power BI Desktop.

Since I’ve been spending a decent amount of time with Power BI Desktop, I thought I’d take a moment to share three things to know and remember when designing your Power BI models and dashboards. I think they’ll help you make the most of this tool and be effective at providing the data your business needs to succeed.

1. Optimize your Power BI Semantic Model

It probably hasn’t taken you long to figure this one out if you’ve built Power Pivot/Tabular models, and it won’t take long once you start developing Power BI dashboards. The visualizations in Power BI and Power View are heavily metadata driven, which means that column names, table or query names, formatting and more are surfaced to the user in the dashboard. So if you’re using a really wacky naming convention in your data warehouse for your tables, like “dim_Product_scd2_v2”, and the column names aren’t much better, those names are going to show up in the report visualizations and the field list.

For example, take a look at the following report.

Power BI Dashboard without formatting

Notice anything wonky about it? Check the field names, report titles and number formatting. Not very pretty, is it? Now take a look at this report.

Power BI Dashboard with formatting

See the difference a little cleaned-up metadata makes? All I did was spend a few minutes giving the fields user-friendly names and formatting the data types. This obviously makes a huge difference in the way the dashboard appears to the users. By the way, I should get into the movie production business. ;)

My point is that the names of columns, formatting, data types, data categories and relationships are all super important to creating clean, meaningful and user-friendly dashboards. The importance of a well-defined semantic model cannot be overstated, in my opinion. A good rule of thumb is to spend 80% to 90% of your time on the data model (besides, designing the reports is the easy part).
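By the way, this cleanup doesn’t all have to happen by right-clicking in the field list; the Query Editor records the same renames and type changes as query steps. Here’s a minimal sketch of that idea against the ugly table name from above. The server, database and column names (prod_nm, list_prc) are made up for illustration.

let
    // Server and database names are placeholders
    Source = Sql.Database("localhost", "MyDataWarehouse"),
    Product = Source{[Schema = "dbo", Item = "dim_Product_scd2_v2"]}[Data],
    // Give the fields user-friendly names
    Renamed = Table.RenameColumns(Product, {{"prod_nm", "Product Name"}, {"list_prc", "List Price"}}),
    // Set sensible data types so the numbers format and aggregate correctly
    Typed = Table.TransformColumnTypes(Renamed, {{"Product Name", type text}, {"List Price", Currency.Type}})
in
    Typed

Rename the query itself to something friendly like Product, and then finish up with number formatting and data categories on the Modeling ribbon.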

I’d also like to mention the importance of the relationships between the objects in the semantic model. Chances are you will have a small group of power users who will want to design their own dashboards to meet their job’s requirements, and that’s one of the beauties of Power BI. But when users begin developing reports, they may query your model in unexpected ways that generate unexpected behaviors and results. I only mention this because the relationships between the objects in the model will impact the results your users see in their reports. Double-check your relationships and ensure that they are correct, especially after you add new objects to the model.

Read more

Power BI Fantasy Football Player Stats Dashboards for Download

Every year at Pragmatic Works, some coworkers, including consultants, marketing staff, support team members, software development staff and project management, partake in a company fantasy football league. And with the recent release of the new Power BI Desktop, I thought: what better way to prepare to completely annihilate my coworkers and friends in an imaginary, nonsensical game than by creating some nifty Power BI dashboards based on last year’s player stats as recorded by Yahoo! Sports? So I thought I’d walk you through some of the steps I followed to leverage the Yahoo! Sports NFL player stats page as a data source and some of the query transformations I applied to prepare the data for reporting.

Power BI dashboard with Power BI Desktop

Click here to download my Fantasy Football Dashboards Power BI .pbix file.

If you’re completely new to Power BI Desktop, I highly suggest you watch my video walkthrough of Power BI Desktop or read my blog post which walks you through each step of creating your first Power BI dashboards with Power BI Desktop. Last Friday, I also blogged about my three best practices for designing a killer Power BI solution, so take a look at that.

To create these dashboards, I simply navigated to the Yahoo! Sports NFL stats page, found the page for each position I’m interested in for this fantasy football season, and copied the URL to my clipboard. In Power BI Desktop, click Get Data and then use the Web data source option. Then all you have to do is paste the URL into the text box and click OK.

Get data from web with Power BI Desktop

Then select the HTML table that contains your data and click Edit. We need to edit our query because there are some issues with the data. By clicking Edit, we can apply transformations to our query which will allow us to do things like rename columns, remove unwanted columns, modify data types, create custom columns and much more.
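Behind the scenes, those couple of clicks generate a short query you can inspect in the Advanced Editor. Roughly speaking it looks like the sketch below; the URL and the table position in the page are assumptions, so use whichever position page you copied.

let
    // The stats page URL is a placeholder; paste the one you copied
    Source = Web.Page(Web.Contents("https://sports.yahoo.com/nfl/stats/")),
    // The HTML table you ticked in the Navigator; the index may differ
    StatsTable = Source{0}[Data]
in
    StatsTable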

Get data from web with Power BI Desktop

One thing you’ll notice in the above screen grab is that the column names are in the first row, so we need to fix that.

On the Home ribbon of the Query Editor, just click the Use First Row As Headers button.
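That button, along with the other cleanup transformations mentioned above, gets recorded as extra steps in the same query. Here’s a rough sketch of where the query ends up; the column names are assumptions rather than the exact Yahoo! Sports headers.

let
    Source = Web.Page(Web.Contents("https://sports.yahoo.com/nfl/stats/")),
    StatsTable = Source{0}[Data],
    // Use First Row As Headers
    Promoted = Table.PromoteHeaders(StatsTable),
    // Rename cryptic columns and drop the ones we don't need (names are placeholders)
    Renamed = Table.RenameColumns(Promoted, {{"Yds", "Passing Yards"}, {"TD", "Passing TDs"}}),
    Removed = Table.RemoveColumns(Renamed, {"GP"}),
    // Make the stats numeric so they aggregate correctly in the dashboards
    Typed = Table.TransformColumnTypes(Removed, {{"Passing Yards", Int64.Type}, {"Passing TDs", Int64.Type}})
in
    Typed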

Read more

Answers to Advanced SSIS Interview Questions

  • 7 November 2009
  • Author: DevinKnight

Last week I wrote a set of Advanced SSIS Interview Questions. Here are the answers I came up with for them. Of course, you may have some variation of what I have, and I would love to hear how you would answer these. Feel free to email me your answers.

I’ll skip the demonstrate/whiteboard part of the questions.

1. Demonstrate or whiteboard how you would suggest using configuration files in packages.  Would you consider it a best practice to create a configuration file for each connection manager or one for the entire package?

There should be a single configuration file for each connection manager in your packages, storing its connection string information. So if you have 6 connection managers, then you have 6 config files. You can use the same config file across all your packages that use the same connections.

If you have a single config file that stores all your connection managers, then all your packages must contain the connection managers that are stored in that config file. This means you may have to put connection managers in your package that you don’t even need.

2. Demonstrate or whiteboard how checkpoints work in a package.

When checkpoints are enabled on a package and the package fails, it will save the point at which the package failed. This way you can correct the problem and then rerun from the point of failure instead of rerunning the entire package. The obvious benefit is that if you load a million-record file just before the package fails, you don’t have to load it again.

3. Demonstrate or whiteboard using a loop in a package so each file in a directory with the .txt extension is loaded into a table.  Before demonstrating this, tell which task/container accomplishes this and which enumerator will be used.  (Big hint on which task/container to use: it requires an enumerator.)

This would require a Foreach Loop using the Foreach File Enumerator.  Inside the Foreach Loop Editor you need to set a variable to store the directory of the files that will be looped through.  Next select the connection manager used to load the files and add an expression to the connection string property that uses the variable created in the Foreach Loop.

4. Demonstrate or whiteboard how transactions work in a package.

If transactions are enabled on your package and tasks then when the package fails it will rollback everything that occurred during the package. First make sure MSDTC (Microsoft Distributed Transaction Coordinator) is enabled in the Control Panel -> Administrative Tools -> Component Services. Transactions must be enabled not only on the package level but also on each task you want included as part of the transaction. To have the entire package in a transaction set TransactionOption at the package level to Required and each task to Supported.

5. If you have a package that runs fine in Business Intelligence Development Studio (BIDS) but fails when running from a SQL Agent Job what would be your first guess on what the problem is?

The account that runs SQL Agent Jobs likely doesn’t have the needed permissions for one of the connections in your package. Either elevate the account permissions or create a proxy account.

To create a proxy account you need to first create new credentials with the appropriate permissions. Next assign those credentials to a proxy account. When you run the job now you will select Run As the newly created proxy account.

6. What techniques would you consider to add auditing to your packages?  You’re required to log when a package fails and how many rows were extracted and loaded in your sources and destinations.

I like to create a database that is designated for package auditing. Track the row counts coming from each source and how many rows actually make it to each destination. Row counts and package execution details are then all in one location, and you can optionally report off that database.

There are also third party tools that can accomplish this for you (Pragmatic Works BI xPress).

7. What techniques would you consider to add notification to your packages?  You’re required to send emails to essential staff members immediately after a package fails.

This could either be set up in the SQL Agent job that runs the package, or inside the package itself you could add a Send Mail Task to the event handlers to send a notification when the package fails.

There are also third party tools that can accomplish this for you (Pragmatic Works BI xPress).

8. Demonstrate or whiteboard techniques you would use for CDC (Change Data Capture).  Tell how you would write a package that loads data but first detects if the data already exists, exists but has changes, or is brand new data for a destination.

If for some reason you’ve avoided using a whiteboard to show your ideas up to this point, make sure you start on this question! For small amounts of data I may use the Slowly Changing Dimension transform.

More often than not, the data is too large to use such a slow transform. I prefer to do a lookup on the key of the target table: rows that don’t match are obviously new rows that can be inserted. If they do match, it’s possible they are updates or duplicates. Determine this with a Conditional Split that compares the incoming rows to the rows from the target. Send updates to a staging table that can then be applied with an Execute SQL Task.

Explain that putting updates in a staging table instead of updating with the OLE DB Command is much better for performance, because the Execute SQL Task performs a bulk, set-based operation rather than a row-by-row update.

9. Explain what breakpoints are and how you would use them.

Breakpoints put pauses in your package. They’re a great tool for debugging because you can place a breakpoint on a task and it will pause the package based on execution events.

One situation in which I have used breakpoints is when I have a looping container and I want to see how my variables are changed by the loop. I would add a watch window to the package and type the variable name in, then set a breakpoint on the container to stop after each iteration of the loop.
