Posted 5/2/2012 by MMilligan
The Analyze Key Influencers tool shows how the values of the other columns in a data set might determine the values of a specified target column. The process creates a temporary mining model in Microsoft SQL Server Analysis Services using the Naïve Bayes algorithm. It then produces a Main Influencers report, which lists the key influencers for each distinct value of the target column. You can also create one or more Discrimination Reports, each of which compares the influencers for any two distinct values of the target column. The Discrimination Reports are only useful if your target column contains more than two distinct states.
The Naïve Bayes algorithm is a simple probabilistic classifier based on applying Bayes' theorem with strong independence assumptions. The "naïve" part of the name comes from the assumption that all attributes are unrelated to each other and that each attribute contributes independently to the probabilities the model predicts. For example, a fruit may be considered an orange if it is round, is orange in color, has seeds, and grows on a tree. Even if some of these features depend on the existence of others, a Naïve Bayes classifier treats each of them as contributing independently to the probability that the fruit is an orange. One advantage of this algorithm is that it requires only a small set of data to estimate the means and variances of the variables required for classification.
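To make the independence assumption concrete, here is a minimal Python sketch of a Naïve Bayes classifier for categorical attributes. It is not the Analysis Services implementation. The function names (train, predict), the fruit rows, and the use of Laplace smoothing are all assumptions made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train(rows, target):
    """Count how often each class and each (attribute, value) pair occurs."""
    class_counts = Counter(row[target] for row in rows)
    # value_counts[attribute][class_label][value] -> number of matching rows
    value_counts = defaultdict(lambda: defaultdict(Counter))
    distinct_values = defaultdict(set)
    for row in rows:
        for attribute, value in row.items():
            if attribute != target:
                value_counts[attribute][row[target]][value] += 1
                distinct_values[attribute].add(value)
    return class_counts, value_counts, distinct_values


def predict(model, observation):
    """Return P(class | observation), treating every attribute as independent."""
    class_counts, value_counts, distinct_values = model
    total_rows = sum(class_counts.values())
    log_scores = {}
    for label, label_count in class_counts.items():
        score = math.log(label_count / total_rows)  # prior P(class)
        for attribute, value in observation.items():
            matches = value_counts[attribute][label][value]
            # Laplace smoothing (an assumption of this sketch) keeps an
            # unseen value from forcing the whole product to zero.
            smoothed = (matches + 1) / (label_count + len(distinct_values[attribute]))
            score += math.log(smoothed)  # one independent factor per attribute
        log_scores[label] = score
    # Convert the log scores back into probabilities that sum to 1.
    highest = max(log_scores.values())
    weights = {label: math.exp(s - highest) for label, s in log_scores.items()}
    normalizer = sum(weights.values())
    return {label: w / normalizer for label, w in weights.items()}


# Invented toy data, for illustration only.
fruit_rows = [
    {"shape": "round", "color": "orange", "fruit": "orange"},
    {"shape": "round", "color": "orange", "fruit": "orange"},
    {"shape": "round", "color": "green", "fruit": "orange"},
    {"shape": "round", "color": "red", "fruit": "apple"},
    {"shape": "round", "color": "green", "fruit": "apple"},
    {"shape": "long", "color": "yellow", "fruit": "banana"},
    {"shape": "long", "color": "green", "fruit": "banana"},
]
model = train(fruit_rows, target="fruit")
probabilities = predict(model, {"shape": "round", "color": "orange"})
print(max(probabilities, key=probabilities.get))
```

Each attribute adds its own term to the score without looking at any other attribute, which is exactly the "naïve" simplification described above.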
This blog post will work through two examples using the sample data provided with the Microsoft SQL Server 2012 Data Mining Add-ins and another example using data from the Contoso sample database.
Example One:
Which properties of a customer in the sample data help to predict a customer's level of education?
The Key Influencers Report for Education shows which columns, and which values of those columns, have a significant impact on the value of the Education column. According to this report, people between the ages of 37 and 46 who work in management are very likely to have a bachelor's degree. People who own only one car and work in a clerical profession are very likely to have attended only some college. People who own two cars, work in a manual occupation, and earn less than about 39K per year are likely to have attended only high school. Similar characteristics apply to those who received only a partial high school education. People who do not own an automobile are very likely to have completed a graduate degree.
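One rough way to picture what a report like this captures is to ask, for every column value, which target state it favors and by how much. The Python sketch below does that with a simple lift calculation, P(state | value) divided by P(state). This is only a stand-in: the add-in computes its own impact score internally, and the key_influencers function, the lift measure, and the eight sample rows here are all invented for illustration.

```python
from collections import Counter


def key_influencers(rows, target):
    """For each (column, value), find the target state it favors most.

    The score is lift = P(state | value) / P(state). This is a hypothetical
    substitute for the add-in's own impact score, not its real formula.
    """
    total = len(rows)
    state_counts = Counter(row[target] for row in rows)
    results = []
    for column in (c for c in rows[0] if c != target):
        value_counts = Counter(row[column] for row in rows)
        pair_counts = Counter((row[column], row[target]) for row in rows)
        for value, value_count in value_counts.items():
            best_state, best_lift = None, 0.0
            for state, state_count in state_counts.items():
                p_state_given_value = pair_counts[(value, state)] / value_count
                lift = p_state_given_value / (state_count / total)
                if lift > best_lift:
                    best_state, best_lift = state, lift
            results.append((column, value, best_state, round(best_lift, 2)))
    # Strongest influencers first.
    return sorted(results, key=lambda item: item[3], reverse=True)


# Invented rows that loosely mimic the shape of the sample data.
sample_rows = [
    {"Cars": 0, "Occupation": "Professional", "Education": "Graduate Degree"},
    {"Cars": 0, "Occupation": "Management", "Education": "Graduate Degree"},
    {"Cars": 1, "Occupation": "Clerical", "Education": "Partial College"},
    {"Cars": 1, "Occupation": "Clerical", "Education": "Partial College"},
    {"Cars": 2, "Occupation": "Manual", "Education": "High School"},
    {"Cars": 2, "Occupation": "Manual", "Education": "High School"},
    {"Cars": 1, "Occupation": "Management", "Education": "Bachelors"},
    {"Cars": 2, "Occupation": "Management", "Education": "Bachelors"},
]
influencers = key_influencers(sample_rows, target="Education")
for column, value, favors, lift in influencers:
    print(f"{column} = {value!r} favors {favors!r} (lift {lift})")
```

A lift well above 1 means the value is much more common alongside that state than the state's overall frequency would suggest.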
Now, back to the Discrimination Report dialog that we moved out of the way. Let's run a discrimination report that compares those with graduate degrees to those who completed only part of high school.
We can add as many discrimination reports as we want.
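The idea behind comparing two target states can be sketched in a few lines of Python: for each column value, estimate how often it appears among rows in state A versus rows in state B, and see which side it leans toward. This is a sketch under assumptions, not the add-in's actual calculation. The discrimination_report function, the smoothed log-ratio score, and the sample rows are all hypothetical.

```python
import math
from collections import Counter


def discrimination_report(rows, target, state_a, state_b):
    """List the column values that favor state_a or state_b.

    Score = log(P(value | state_a) / P(value | state_b)) with Laplace
    smoothing. Both the score and the smoothing are assumptions of this
    sketch. Rows in any other state are ignored.
    """
    rows_a = [row for row in rows if row[target] == state_a]
    rows_b = [row for row in rows if row[target] == state_b]
    report = []
    for column in (c for c in rows[0] if c != target):
        values = sorted({row[column] for row in rows}, key=str)
        counts_a = Counter(row[column] for row in rows_a)
        counts_b = Counter(row[column] for row in rows_b)
        for value in values:
            p_a = (counts_a[value] + 1) / (len(rows_a) + len(values))
            p_b = (counts_b[value] + 1) / (len(rows_b) + len(values))
            score = math.log(p_a / p_b)
            if score == 0:
                continue  # the value does not separate the two states
            favors = state_a if score > 0 else state_b
            report.append((column, value, favors, round(abs(score), 3)))
    return sorted(report, key=lambda item: item[3], reverse=True)


# Invented rows, for illustration only.
education_rows = [
    {"Cars": 0, "Income": "High", "Education": "Graduate Degree"},
    {"Cars": 0, "Income": "High", "Education": "Graduate Degree"},
    {"Cars": 1, "Income": "High", "Education": "Graduate Degree"},
    {"Cars": 2, "Income": "Low", "Education": "Partial High School"},
    {"Cars": 2, "Income": "Low", "Education": "Partial High School"},
    {"Cars": 1, "Income": "Low", "Education": "Partial High School"},
    {"Cars": 1, "Income": "High", "Education": "Bachelors"},
]
report = discrimination_report(
    education_rows, "Education", "Graduate Degree", "Partial High School"
)
for column, value, favors, strength in report:
    print(f"{column} = {value!r} favors {favors!r} (strength {strength})")
```

Values that appear equally often in both states (Cars = 1 in these rows) drop out, which mirrors the way a comparison report only highlights the attributes that actually separate the two groups.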
The Table Analysis Tools Sample worksheet contains only 1,000 rows. When we go through the exact same steps on the Source Data sheet, which has 10,000 rows, we get remarkably similar results.
Example Two:
Next, I'll run the tool to see what factors most strongly influence whether or not the customer is likely to purchase a bike.
The Key Influencers Report for BikeBuyer shows us that the strongest predictors that a customer will purchase a bike are that the customer doesn't own any cars and is between the ages of 36 and 46. The strongest predictors that a customer will not buy a bike are that the customer owns two cars and is 64 or older.
The discrimination report shows us essentially the same thing.
Example Three:
For the next example, I have imported the V_Customer view from the Contoso Retail demo database which you can download from Microsoft.
If you import the data using the From Other Sources button on the Data ribbon, Excel automatically formats it as a table, which the tool requires. If you import your data from a CSV file or copy and paste it into a spreadsheet, it may not be formatted as a table.
Here we see that MaritalStatus has the greatest influence on the value of HouseOwnerFlag. We also see that not having any children is a strong indicator of not owning a home.
I hope this sufficiently explains how to use the Analyze Key Influencers tool. If you have any questions, please use the comments section below.
Here are some additional links:
Analyze Key Influencers Video Tutorial
Microsoft BI - Data Mining - Analyze Key Influencers