Recommendation Engine


As businesses in nearly every industry look to leverage the benefits of A.I., a recommendation engine is quickly becoming a necessity for staying competitive. According to Netflix, 75% of its users watch what is recommended to them rather than searching for a particular title. Google News' recommendations increase an article's views by 38% (Das et al., 2007), and 35% of Amazon's revenue is generated by its recommendation engine. A recommendation engine doesn't just improve ROI; it's essential when the selection of products is so broad that browsing is impossible. Just try to browse every product on Amazon.


A Recommendation Engine is a tool that helps your business decide which of your products to present to each customer, and in which order. Whether your business sells gadgets, music, media or any other product or service, a recommendation engine readily outperforms a random ordering and can therefore increase revenue.


This demo uses the MovieLens dataset because it's public, clean and large enough (20 million ratings) to demonstrate what a recommendation engine does and how it works.


Use the widgets on the left to select a subset of films to plot on the scatterplot on the right. Hover over a circle to see the film's title. You can also filter by director or cast and change the variables on the X and Y axes.

A gold circle marks an Oscar winner, while a purple circle marks a Razzie winner.

Inspired by the Shiny Movie Explorer.


The two traditional approaches to building a recommendation engine are Content-based Filtering and Collaborative Filtering. We will demonstrate the latter using matrix factorization. The model starts from a table with users in the rows and movies in the columns, where each cell holds the rating that user gave that movie.
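As a minimal sketch of that starting table, the snippet below turns rating records into a sparse user-by-movie structure in which unrated cells are simply absent. It assumes the column layout of the MovieLens 20M `ratings.csv` file (`userId,movieId,rating,timestamp`); the function name `load_ratings_matrix` and the sample rows are hypothetical.

```python
import csv
import io


def load_ratings_matrix(csv_text):
    """Parse MovieLens-style CSV text (userId,movieId,rating,...) into a
    sparse table: user -> {movie -> rating}. Unrated cells are left out."""
    matrix = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        user = int(row["userId"])
        movie = int(row["movieId"])
        matrix.setdefault(user, {})[movie] = float(row["rating"])
    return matrix


# Hypothetical sample rows in the assumed ratings.csv layout.
sample = """userId,movieId,rating,timestamp
1,10,4.0,964982703
1,20,1.5,964981247
2,10,5.0,964982224
"""

ratings = load_ratings_matrix(sample)
print(ratings)
```

A dictionary of dictionaries is used here because a real ratings table is mostly empty: each user rates only a tiny fraction of all movies, and the empty cells are exactly what the model later fills in.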

Matrix factorization example

We use the known ratings to fill in the missing values in the table, and in the process the model learns latent factors (the colors): a short vector of numbers for each user and for each movie, chosen so that combining a user's vector with a movie's vector reproduces that user's rating. The only information we give the model is the rating each user gave each movie. Additional features, such as users' ages or movies' release years and casts, could be used for better results, but you don't always have all the data you want.
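The following is a minimal sketch of one common way to learn such factors: stochastic gradient descent on the squared error of the observed ratings only, with L2 regularization, where a prediction is the dot product of a user vector and a movie vector. This is a standard textbook variant, not necessarily the exact model behind this demo; the function names, hyperparameters and toy ratings are all hypothetical.

```python
import math
import random


def predict(P, Q, user, movie):
    """Predicted rating = dot product of the user and movie factor vectors."""
    return sum(a * b for a, b in zip(P[user], Q[movie]))


def factorize(ratings, n_factors=2, lr=0.02, reg=0.02, epochs=2000, seed=0):
    """Learn factor vectors P (per user) and Q (per movie) so that
    P[user] . Q[movie] approximates every *observed* rating.
    ratings: dict user -> {movie: rating}; missing cells are absent."""
    rng = random.Random(seed)
    movies = sorted({m for row in ratings.values() for m in row})
    # Small random starting values break the symmetry between factors.
    P = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in ratings}
    Q = {m: [rng.gauss(0, 0.1) for _ in range(n_factors)] for m in movies}
    observed = [(u, m, r) for u, row in ratings.items() for m, r in row.items()]
    for _ in range(epochs):
        rng.shuffle(observed)
        for u, m, r in observed:
            pu, qm = P[u], Q[m]
            err = r - predict(P, Q, u, m)
            for k in range(n_factors):
                pk, qk = pu[k], qm[k]
                # Gradient step on squared error with L2 regularization.
                pu[k] += lr * (err * qk - reg * pk)
                qm[k] += lr * (err * pk - reg * qk)
    return P, Q


# Hypothetical toy ratings on a 1-5 scale; unrated cells are simply missing.
ratings = {
    "Ann": {"Terminator": 5, "Robocop": 4, "Toy Story": 1},
    "Ben": {"Terminator": 4, "Toy Story": 2, "Shrek": 1},
    "Cara": {"Robocop": 1, "Toy Story": 5, "Shrek": 5},
    "Dan": {"Terminator": 1, "Shrek": 4},
}
P, Q = factorize(ratings)

n_observed = sum(len(row) for row in ratings.values())
rmse = math.sqrt(
    sum((r - predict(P, Q, u, m)) ** 2 for u, row in ratings.items() for m, r in row.items())
    / n_observed
)
print(f"training RMSE: {rmse:.3f}")
print(f"Dan / Toy Story (missing cell): {predict(P, Q, 'Dan', 'Toy Story'):.2f}")
```

Only observed cells contribute to the loss, which is what lets the learned factors fill in the missing ones. On the real 20-million-rating dataset you would normally use a vectorized or library implementation and hold out some ratings to check accuracy, rather than relying on the training error alone.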


Our model predicts the rating a user would likely give each movie they haven't rated yet, then presents movies to that user ranked by those predicted ratings.
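The ranking step can be sketched as follows, assuming factor vectors have already been learned by a matrix factorization model. The factor values, movie list and the `recommend` helper are hypothetical; the point is only that already-rated movies are skipped and the rest are sorted by predicted rating.

```python
def dot(a, b):
    """Dot product of two equal-length factor vectors."""
    return sum(x * y for x, y in zip(a, b))


def recommend(user, ratings, P, Q, top_n=2):
    """Rank movies the user has not rated by predicted rating, best first."""
    seen = ratings.get(user, {})
    scores = {m: dot(P[user], q) for m, q in Q.items() if m not in seen}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Hypothetical factors, as if already learned by a matrix factorization model.
Q = {
    "Terminator": [2.0, 0.1],
    "Robocop": [1.8, 0.0],
    "Toy Story": [0.1, 2.0],
    "Shrek": [0.0, 1.9],
    "Die Hard": [1.5, 0.3],
}
P = {"Ann": [2.2, 0.4]}
ratings = {"Ann": {"Terminator": 5, "Toy Story": 1}}

recs = recommend("Ann", ratings, P, Q)
for movie, score in recs:
    print(f"{movie}: {score:.2f}")  # Robocop: 3.96, then Die Hard: 3.42
```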


Look again at the table of users and movies above. A big red circle correlates with violent movies (Terminator and Robocop), so users who have big red circles of their own (Birger) are likely to give violent movies higher ratings. Even though film genres were never provided to the model, it may still pick up on a genre when that genre strongly shapes how people rate, as with the orange circle (animated movies). The blue circle is another example of how the model can segment movies from user ratings alone: it seems to correspond to movies starring Arnold Schwarzenegger.
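To see why a big red circle on both sides pushes a prediction up, it helps to split the dot product into one term per factor. The sketch below does this for a hypothetical user and a hypothetical Terminator vector; the numbers are invented, and the factor labels are human interpretations attached after training, since the model itself only sees ratings.

```python
# Hypothetical labels a person might attach to learned factors.
FACTOR_NAMES = ["red (violent)", "orange (animated)", "blue (Schwarzenegger)"]

user = [1.8, 0.2, 0.5]        # hypothetical user with a big "red" factor
terminator = [1.6, 0.0, 1.2]  # hypothetical movie factor values

# Each factor contributes user_value * movie_value to the predicted rating.
contributions = {
    name: u * m for name, u, m in zip(FACTOR_NAMES, user, terminator)
}
predicted = sum(contributions.values())

for name, value in contributions.items():
    print(f"{name:>22}: {value:.2f}")
print(f"{'predicted rating':>22}: {predicted:.2f}")
```

Here most of the predicted rating comes from the red factor, with a smaller boost from the blue one, which mirrors the reading of the table above.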


This graph shows the movies that correlate with a particular factor (like the colors in the table above). Move the slider to explore different factors, then drag the results to scroll through more movies.

The graph shows the films that score highest and lowest on that factor. Beneath the graph are each movie's genre tags. The fact that many genres repeat within a single factor shows that genres are a meaningful way to differentiate films and that our model picks up on them. Further down is the list of users with the highest and lowest scores on that factor; users with high scores tend to like those types of movies. Recognizing these trends gives us useful insights into users' preferences and raises questions we hadn't thought to ask. Such insights can be used, for example, to target advertising at users who are interested in particular products, or to discover niche interests that warrant further product development.
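A minimal sketch of how such a factor view could be produced: sort movies (or users) by their value on one factor, take both ends, and count the genre tags among the top movies. The factor values, genre tags and the `extremes` helper are hypothetical and stand in for whatever the real model and MovieLens metadata provide.

```python
from collections import Counter


def extremes(factors, k, n=2):
    """Return (highest, lowest) n items on latent factor k.
    factors: dict item -> factor vector (works for users or movies)."""
    ordered = sorted(factors, key=lambda item: factors[item][k], reverse=True)
    return ordered[:n], ordered[::-1][:n]


# Hypothetical learned factors and genre tags.
Q = {
    "Terminator": [1.9, -0.2],
    "Robocop": [1.7, -0.4],
    "Toy Story": [-0.8, 2.1],
    "Shrek": [-0.6, 1.8],
    "Die Hard": [1.2, 0.0],
}
P = {"Ann": [2.0, 0.1], "Ben": [0.3, 1.5], "Cara": [-0.5, 2.2]}
GENRES = {
    "Terminator": ["Action", "Sci-Fi"],
    "Robocop": ["Action", "Crime", "Sci-Fi"],
    "Toy Story": ["Animation", "Children", "Comedy"],
    "Shrek": ["Animation", "Children", "Comedy"],
    "Die Hard": ["Action", "Thriller"],
}

top, bottom = extremes(Q, 0)
genre_counts = Counter(g for movie in top for g in GENRES[movie])
top_users, bottom_users = extremes(P, 0, n=1)

print("highest on factor 0:", top, "lowest:", bottom)
print("genres among top movies:", dict(genre_counts))
print("users most / least drawn to factor 0:", top_users, bottom_users)
```

Repeated genres among the top movies (here Action and Sci-Fi) are the kind of signal the paragraph above describes; with real data you would look at many more movies per factor.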

For example:

See if you can find any other insights based on the latent factors.


Recommendation engines are a reliable way to increase revenue, and as more and more companies leverage A.I., they will become ubiquitous in every industry that can benefit from them. In this example, we used a matrix factorization model to predict the ratings customers would give movies they haven't seen yet. Then, by ranking those predictions, we produced effective, relevant recommendations tailored to individual customers. Finally, we used the latent factors that emerged from the model, driven solely by the data itself, to create meaningful movie and user segments.

Powered by KVASS.AI