Artificial intelligence (AI) and machine learning (ML) are becoming commonplace. They are used to perform tasks and help make critical decisions in a wide range of industries, including the energy, medical, and financial sectors.
Among other uses, AI and ML power recommender systems. These systems recommend products, content, or services consumers might like, whether they’re shopping online, choosing a movie or song to stream, or browsing through news articles.
Companies use recommender systems because they help them personalize the buyer experience, grow revenue, and improve customer retention and brand loyalty. Consumers generally appreciate suggestions that highlight items or features they might not have considered.
Types of recommender systems
There are three primary types of recommender systems.
- Content-based filtering uses similarities in product, service, or content features, together with information accumulated about the user, to make recommendations.
- Collaborative filtering relies on the preferences of similar users to offer recommendations to a particular user.
- Hybrid recommender systems combine two or more recommender strategies, using the advantages of each in different ways to make recommendations.
In this article, we’ll look at content-based recommender systems in particular, including how they work, their upsides and challenges, and the skills and technologies you may need to start developing one.
What is content-based filtering and how does it work?
Content-based filtering is a type of recommender system that attempts to guess what a user may like based on that user’s activity.
Content-based filtering makes recommendations by using keywords and attributes assigned to objects in a database (e.g., items in an online marketplace) and matching them to a user profile. The user profile is created based on data derived from a user’s actions, such as purchases, ratings (likes and dislikes), downloads, items searched for on a website and/or placed in a cart, and clicks on product links.
For example, suppose you’re recommending accessories to a user who just purchased a smartphone from your website and has previously bought smartphone accessories. Aside from keywords such as the smartphone manufacturer, make, and model, the user profile indicates prior purchases, including phone holders with sleeves for credit cards. Based on this information, the recommender system may suggest similar phone holders for the new phone with attributes such as an RFID-blocking fabric layer to help prevent unauthorized credit card scanning. In this example, the user would expect recommendations for similar phone holders, but the RFID-blocking feature may be something they didn’t expect yet still appreciate.
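As a rough illustration of matching item attributes to a user profile, the sketch below ranks a few catalog items by how many attributes they share with the profile. The Jaccard overlap measure, the brand and model keywords, and the catalog entries are all assumptions made for this example. A production system would likely use a richer similarity measure.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of attributes two sets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical profile keywords derived from the user's past purchases.
profile = {"phone holder", "card sleeve", "acme", "model x"}

# Hypothetical catalog: item name -> attributes assigned to that item.
catalog = {
    "holder with RFID card sleeve": {
        "phone holder", "card sleeve", "rfid blocking", "acme", "model x",
    },
    "plain holder": {"phone holder", "acme", "model x"},
    "laptop stand": {"stand", "aluminum"},
}

# Rank items from most to least similar to the profile.
ranked = sorted(
    catalog,
    key=lambda name: jaccard(profile, catalog[name]),
    reverse=True,
)
print(ranked)
```

Here the holder with the card sleeve ranks first even though it carries an attribute (RFID blocking) that the profile has never seen, which is how an unexpected feature can surface alongside an expected item.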
Assigning attributes
Content-based filtering relies on assigning attributes to database objects so the algorithm knows something about each object. These attributes depend primarily on the products, services, or content you’re recommending.
Assigning attributes can be a monumental undertaking. Many companies resort to using subject-matter expert teams to assign attributes to each item manually. For example, Netflix has hired screenwriters to rate shows on aspects ranging from shooting locations and actors to plotlines, tone, and emotional effects. The resulting tags, used by the recommender, are algorithmically combined to group films together that share similar aspects.
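Once tags exist, a common way to make them usable by an algorithm is to turn each item's tags into a fixed-length vector of ones and zeros. The sketch below shows that idea only. The film names and tags are invented, and it does not describe how Netflix or any other company actually stores or combines its tags.

```python
# Hypothetical tags assigned to three films by reviewers.
item_tags = {
    "film_a": {"heist", "dark tone", "new york"},
    "film_b": {"heist", "comedy"},
    "film_c": {"romance", "comedy", "new york"},
}

# A fixed, sorted vocabulary so every vector uses the same column order.
vocabulary = sorted(set().union(*item_tags.values()))

def to_vector(tags: set[str]) -> list[int]:
    """Multi-hot encoding: 1 where the item has the tag, 0 otherwise."""
    return [1 if tag in tags else 0 for tag in vocabulary]

vectors = {name: to_vector(tags) for name, tags in item_tags.items()}

def shared_tags(a: str, b: str) -> int:
    """Number of tags two items have in common (dot product of their vectors)."""
    return sum(x * y for x, y in zip(vectors[a], vectors[b]))

print(vocabulary)
print(vectors["film_a"])
print(shared_tags("film_a", "film_b"))
```

Items that share more tags end up with a larger dot product, which is one simple way to group films with similar aspects.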
Building a user profile
User profiles are another element crucial to content-based recommender systems. Profiles include the database objects the user has interacted with—purchased, browsed, read, watched, or listened to—as well as their assigned attributes.
Attributes appearing across multiple objects are weighted more heavily than those that show up less often. This helps establish a degree of importance because not all of an object’s attributes matter equally to the user. User feedback is also critical when weighting items, which is why websites that provide recommendations are continually asking you to rate products, services, or content.
Based on attribute weightings and histories, the recommender system produces a unique model of each user’s preferences. The model consists of attributes the user is likely to like or dislike based on past activities, weighted by importance. Each user model is compared against all database objects, which are then assigned scores based on their similarity to the user profile.
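The weighting and scoring steps can be sketched in a few lines. This is a minimal illustration under stated assumptions: attribute weights are simply the fraction of interacted items that carry the attribute, and similarity is measured with cosine similarity. The text above does not name a specific similarity measure, and the function names and sample data are hypothetical.

```python
import math
from collections import Counter

def build_profile(history: list[set[str]]) -> dict[str, float]:
    """Weight each attribute by the fraction of interacted items carrying it."""
    counts = Counter(tag for tags in history for tag in tags)
    return {tag: n / len(history) for tag, n in counts.items()}

def cosine(profile: dict[str, float], tags: set[str]) -> float:
    """Cosine similarity between a weighted profile and an item's tag set."""
    dot = sum(weight for tag, weight in profile.items() if tag in tags)
    norm_profile = math.sqrt(sum(w * w for w in profile.values()))
    norm_item = math.sqrt(len(tags))
    if norm_profile == 0 or norm_item == 0:
        return 0.0
    return dot / (norm_profile * norm_item)

# Hypothetical interaction history: attributes of three items the user bought.
history = [
    {"wireless", "black"},
    {"wireless", "red"},
    {"wireless", "black"},
]
profile = build_profile(history)

# Score two hypothetical candidate items against the profile.
close_match = cosine(profile, {"wireless", "black"})
weak_match = cosine(profile, {"wired", "red"})
print(profile, close_match, weak_match)
```

An attribute present in every interacted item ("wireless") gets the highest weight, so candidates that carry it score closer to the profile than candidates that only match a rarely seen attribute.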
Here’s an example: Let’s say you’ve listened to Taylor Swift’s “The Last Time,” Shakira’s “Can’t Remember to Forget You,” and “Me, Myself and I” by Beyoncé. A recommender system might recognize that you like female pop artists and breakup songs. You could expect to receive recommendations for more breakup songs by these and other female pop artists, such as Miley Cyrus’s “Slide Away.”
The recommender system may also suggest different types of songs by Miley Cyrus because you appear to like female pop artists. Still, since you didn’t choose to listen to this artist or songs unassociated with breakups before, these selections would receive a lower assigned score.
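The song example can be worked through with simple tag counting. All tags below are illustrative assumptions rather than real catalog metadata, the second candidate is a placeholder rather than an actual track, and summing tag counts is only one of many possible scoring rules.

```python
from collections import Counter

# Assumed tags for the three songs the user listened to.
listened = [
    {"taylor swift", "female artist", "pop", "breakup"},
    {"shakira", "female artist", "pop", "breakup"},
    {"beyonce", "female artist", "pop", "breakup"},
]

# The profile counts how often each tag appears in the listening history.
profile = Counter(tag for tags in listened for tag in tags)

def score(tags: set[str]) -> int:
    """Sum the profile counts of every tag the candidate carries."""
    return sum(profile[tag] for tag in tags)

candidates = {
    "Slide Away": {"miley cyrus", "female artist", "pop", "breakup"},
    # Placeholder for a non-breakup song by the same artist.
    "hypothetical upbeat Miley Cyrus song": {
        "miley cyrus", "female artist", "pop", "upbeat",
    },
}

scores = {title: score(tags) for title, tags in candidates.items()}
print(scores)
```

With these assumed tags, the breakup song scores 9 and the upbeat song scores 6, so both can be recommended but the second ranks lower.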
Why use content-based filtering?
Content-based filtering has many benefits compared to collaborative filtering, including:
- No data from other users is required to start making recommendations. Unlike collaborative filtering, content-based filtering doesn’t need data from other users to create recommendations. Once a user has searched on and browsed a few items and/or completed some purchases, a content-based filtering system can begin making relevant recommendations. This makes it ideal for businesses that don’t have an enormous pool of users to sample. It also works well for sellers that have many users but a small number of user interactions in specific categories or niches.
- Recommendations are highly relevant to the user. Content-based recommenders can be highly tailored to the user’s interests, including recommendations for niche items because the method relies on matching the characteristics or attributes of a database object with the user’s profile. For instance, content-based filtering will recognize a specific user’s preferences and tastes, such as hot sauces made in Texas with organic Scotch bonnet peppers, and recommend products with the same attributes. Content-based filtering is also valuable for businesses with extensive libraries containing a single type of product, such as smartphones, where recommendations need to be based on many discrete features.
- Recommendations are transparent to the user. Highly relevant recommendations project a sense of openness to the user, bolstering their trust in the recommendations offered. With collaborative filtering, by comparison, users are more likely to see recommendations they can’t explain. For example, let’s say a group of users who purchased an umbrella also happened to buy down puffer coats. A collaborative system may then recommend down puffer coats to other umbrella buyers, even if those users have no interest in puffer coats and have never browsed or purchased one.
- You avoid the “cold start” problem. Collaborative filtering creates a potential cold start scenario when a new website or community has few users and lacks user connections. Although content-based filtering needs some initial inputs from users to start making recommendations, the quality of early recommendations is generally better than that of a collaborative system, which requires the addition and correlation of millions of data points before becoming optimized.
- Content-based filtering systems are generally easier to create. The data science behind a content-based filtering system is relatively straightforward compared to collaborative filtering systems intended to mimic user-to-user recommendations. The real work in content-based filtering is assigning the attributes.
Challenges of content-based filtering
Like all recommender systems, content-based filtering has both pros and cons. We’ve covered some of the benefits. Here are a few disadvantages.
- There’s a lack of novelty and diversity. There’s more to recommendations than relevance. Suppose you liked the movie Tenet. Chances are you’ll like Inside Man, too. But there’s a high probability you don’t need a recommender system to tell you this. So, to be of value, recommendation engines must come up with diverse and unexpected results.
- Scalability is a challenge. Every time a new product or service or new content is added, its attributes must be defined and tagged. The arduous, never-ending nature of attribute assignments can make scalability difficult and time-consuming.
- Attributes may be incorrect or inconsistent. Content-based recommendations are only as good as the subject-matter experts tagging items. Potentially millions of items may need attributes assigned, and since attributes can be subjective, many may be incorrectly tagged. A process that ensures attributes are applied consistently and accurately is paramount. Otherwise, a content-based recommender system will not function as intended.
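One small piece of a tagging quality process could be an automated check that normalizes incoming tags and flags anything outside an agreed vocabulary. The sketch below assumes such a controlled vocabulary and alias table exist. The names `CANONICAL`, `ALIASES`, and `normalize` are hypothetical, and a check like this would complement review by subject-matter experts rather than replace it.

```python
# Hypothetical controlled vocabulary and known spelling variants.
CANONICAL = {"rfid blocking", "card sleeve", "leather"}
ALIASES = {
    "rfid-blocking": "rfid blocking",
    "rfid block": "rfid blocking",
    "card slot": "card sleeve",
}

def normalize(raw_tags: list[str]) -> tuple[set[str], set[str]]:
    """Return (accepted, rejected) tags after cleanup and alias mapping."""
    accepted, rejected = set(), set()
    for raw in raw_tags:
        # Lowercase and collapse stray whitespace before looking up aliases.
        tag = " ".join(raw.lower().split())
        tag = ALIASES.get(tag, tag)
        if tag in CANONICAL:
            accepted.add(tag)
        else:
            rejected.add(tag)
    return accepted, rejected

accepted, rejected = normalize(["RFID-Blocking", " Card  Slot", "lether"])
print(accepted, rejected)
```

Rejected tags, such as the misspelled "lether" here, can be routed back to a reviewer instead of silently entering the catalog.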
Skills and tech you need to build a content-based filtering system
Building a recommendation engine is a classic machine learning exercise. Not only should data scientists have experience with the tools of statistical analysis, but they should also be familiar with tools and frameworks that provide an infrastructure for building recommendation engines. These include programming languages like Python and Scala, and libraries, frameworks, and platforms such as Hadoop, Spark MLlib, LensKit, and Neo4j. Which ones are appropriate for your project will depend on what exactly you’re trying to accomplish.
Next steps
Recommender systems such as content-based filtering benefit both sellers and buyers. Buyers can spend less time searching through pages of different products in a digital marketplace. Sellers can better understand customer preferences, provide a more personalized buyer experience, increase sales, and build brand loyalty by using content-based filtering.



