Recommendation Engines have become a pervasive, daily part of our digitally connected lives. Whether you're shopping on Amazon or reading news articles on your Yahoo! home page, the products and news you are offered are the result of some implicit or explicit behavior. That behavior drives a computational engine that uses patterns to predict (hopefully successfully) your likes and dislikes in order to serve up recommendations. While this technology is nothing new, advances in toolsets have made these engines far more approachable without requiring a PhD in Statistics or Mathematics.
This post starts a series in which we build a Recommendation Engine using Mahout on HDInsight. In this first post, we will lay the foundation by exploring a few different flavors, or types, of recommendation engines.
Types of Recommenders
There is a vast field of techniques for delivering recommendations. For our purposes we can broadly group most techniques into three primary types of recommendation engines: Collaborative Filtering, Content-Based and Data Mining. We will briefly introduce each below.
One of the most common types of recommendation engine, Collaborative Filtering is a behavior-based system that functions solely on the assumption that people with similar interests share common preferences. To use a system of this type, some behavior, either implicit or explicit (see table below), must be captured in terms of a user/product relationship.
|Explicit Behaviors ||Implicit Behaviors
|Ephemeral Needs (Need for the moment)
Using the behavior data, the system will take the following generalized steps:
- Select the target user for whom the recommendations are being generated.
- Find collaborating users using similarity metrics such as Pearson Correlation, Cosine Similarity or Euclidean Distance.
- Make recommendations based on products or news articles that were rated favorably by the collaborating, or similar, users.
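The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not how Mahout implements it: the `ratings` data, the user names, and the `cosine_similarity` and `recommend` helpers are all made up for this example, and cosine similarity is computed over just the items both users rated.

```python
from math import sqrt

# Hypothetical explicit ratings (1-5 scale): user -> {item: rating}
ratings = {
    "alice": {"laptop": 5, "mouse": 4, "monitor": 1},
    "bob":   {"laptop": 5, "mouse": 5, "keyboard": 4},
    "carol": {"laptop": 1, "monitor": 5, "webcam": 4},
}

def cosine_similarity(a, b):
    """Cosine similarity restricted to the items both users rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)

def recommend(target, ratings, k=2):
    # Step 1: select the target user.
    mine = ratings[target]
    # Step 2: find the k most similar (collaborating) users.
    neighbors = sorted(
        ((cosine_similarity(mine, other), user)
         for user, other in ratings.items() if user != target),
        reverse=True,
    )[:k]
    # Step 3: score items the target has not rated, weighting each
    # neighbor's rating by how similar that neighbor is.
    scores = {}
    for sim, user in neighbors:
        for item, rating in ratings[user].items():
            if item not in mine:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("alice", ratings))
```

In this toy data set, bob rates the shared items much like alice does, so his keyboard rating carries more weight than carol's webcam rating.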
The type of collaborative filtering just described is referred to as user-to-user. A second type you may hear discussed is item-to-item, which applies the same behavior-based similarity to items instead of users. A good example of collaborative filtering, and proof that my daughter has been using my Netflix account, can be seen below:
Content-based recommendation engines are the easiest to understand. They function by recommending items that are similar to those that have been recently viewed or purchased. Similarity between items is based on the attributes of the item, such as a description or taxonomy. One of the best examples of a content-based recommendation engine can be found on Amazon in their Recommendation for You section. Note that for each item in the list there is a "Why recommended?" link, which shows you the item that caused the recommendation to be generated.
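To make the idea concrete, here is a minimal sketch of attribute-based similarity. The catalog, the attribute tags, and the `jaccard` and `similar_items` helpers are hypothetical, and Jaccard overlap is just one simple way to compare attribute sets; it is not a description of what Amazon actually does. Each result is paired with the item that triggered it, in the spirit of the "Why recommended?" link.

```python
# Hypothetical catalog: item -> set of descriptive attributes (taxonomy tags)
items = {
    "wireless mouse": {"electronics", "computer accessory", "wireless"},
    "wireless keyboard": {"electronics", "computer accessory", "wireless", "keyboard"},
    "usb cable": {"electronics", "cable"},
    "cookbook": {"book", "cooking"},
}

def jaccard(a, b):
    """Share of attributes two items have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similar_items(viewed, items, top_n=2):
    """Return (recommended item, reason) pairs for a recently viewed item."""
    scored = [
        (jaccard(items[viewed], attrs), name)
        for name, attrs in items.items()
        if name != viewed
    ]
    # Drop items that share no attributes at all, then rank by overlap.
    scored = sorted((pair for pair in scored if pair[0] > 0), reverse=True)
    return [(name, viewed) for _, name in scored[:top_n]]

for recommended, reason in similar_items("wireless mouse", items):
    print(f"{recommended} (because you viewed: {reason})")
```

Notice that no user behavior beyond the single viewed item is needed, which is what distinguishes this approach from collaborative filtering.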
While a number of data mining techniques can be used to generate recommendations, the two most common are Association Rules and Clustering. Association Rules are used to find rules that predict the occurrence of one item given the presence of one or more other items. This is the classic example of what's better known as market basket analysis, where we might recommend a mouse or monitor when a visitor adds a computer to their shopping cart.
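As a rough illustration, the strength of a rule such as "computer implies mouse" is commonly measured with support (how often the items appear together across all baskets) and confidence (how often the consequent appears when the antecedent does). Those two measures are standard in market basket analysis but are not covered in this post, and the baskets and helper names below are invented for the example.

```python
# Hypothetical shopping baskets, one set of items per order
baskets = [
    {"computer", "mouse", "monitor"},
    {"computer", "mouse"},
    {"computer", "monitor"},
    {"mouse", "mousepad"},
    {"computer", "mouse", "keyboard"},
]

def count(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    return sum(1 for basket in baskets if itemset <= basket)

def support(itemset, baskets):
    """Fraction of all baskets that contain the itemset."""
    return count(itemset, baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the fraction that
    also contain the consequent."""
    seen = count(antecedent, baskets)
    return count(antecedent | consequent, baskets) / seen if seen else 0.0

for candidate in ("mouse", "monitor"):
    rule_support = support({"computer", candidate}, baskets)
    rule_confidence = confidence({"computer"}, {candidate}, baskets)
    print(f"computer -> {candidate}: "
          f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

With this made-up data, the mouse rule has higher confidence than the monitor rule, so a mouse would be the stronger recommendation when a computer lands in the cart.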
Clustering, on the other hand, can be used to group either users or items together. An example of this algorithm in action is a news aggregator website that categorizes news articles so that it can either hide duplicate articles from different sources or recommend additional articles based on topic or genre.
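The news aggregator scenario can be approximated with a very small sketch. Note the assumptions: this is a greedy, single-pass grouping based on word overlap between headlines, not k-means or any of the clustering algorithms Mahout ships, and the headlines, the 0.5 threshold, and the function names are all hypothetical.

```python
def tokens(title):
    """Lowercased set of words in a headline."""
    return set(title.lower().split())

def overlap(a, b):
    """Jaccard overlap between two word sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_titles(titles, threshold=0.5):
    """Assign each headline to the first cluster whose representative
    (its first headline) is similar enough; otherwise start a new cluster."""
    clusters = []
    for title in titles:
        words = tokens(title)
        for cluster in clusters:
            if overlap(words, tokens(cluster[0])) >= threshold:
                cluster.append(title)
                break
        else:
            clusters.append([title])
    return clusters

headlines = [
    "Storm hits east coast overnight",
    "Overnight storm hits east coast",
    "Local team wins championship final",
    "Team wins championship final in overtime",
    "New phone released today",
]

for group in cluster_titles(headlines):
    print(group)
```

A site could then show only the first headline of each group to hide near-duplicates, or surface the rest of the group as related reading.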
In this post, we established a foundation of understanding on which we will build a recommendation engine in subsequent posts. It's important to understand the different types of recommenders available not only because they handle different scenarios but also because they are typically used together in an enterprise recommendation system.
Till next time!