Recommender systems in online education. Recommender System Implicit Data Collection Examples

Let's start by defining what a recommender system is. It is a program or service that tries to predict what users want to see and offers it to them (recommends it, hence the name). Most of us have come across such features on various sites. Below we describe the types of these systems, how they work, and give examples of the algorithms in action. Read to the end, it will be interesting!

Above we described what recommender systems are; now let's look at why they matter. These programs have changed how a site interacts with its visitors: instead of static information, the user gets an interactive experience.

Recommendations are generated individually for each person, based on their previous actions on a particular site or on their past activity elsewhere. The behavior of other, earlier visitors also matters.

For online stores this is an essential function, and for large catalogs such as Amazon it is one of the few ways to deliver quality service. Here recommendations are not an optional extra: they are what makes navigating the site practical. If a digital catalog contains more than 20,000 items, finding your way around is already prohibitively difficult, to say nothing of catalogs with millions of products.

How tiring is it for a potential buyer to deal with such a site? The answer is obvious. This is where a widget comes to the rescue: one that finds products visually similar to the one being viewed, products from the same group, or complementary products (for example, a handbag offered to go with a pair of shoes). This not only increases the number of page views but also has a positive effect on conversion.

As practice shows, online stores are not the only ones using this technique. Social networks are not far behind; VKontakte is one example.

Such techniques are easy to spot on social platforms, portals dedicated to literature and travel, news sites, and online stores; in a word, almost everywhere. The technique really is very popular. The Kinopoisk website is another accessible example.

Techniques

The first type is explicit data collection. As the name suggests, users provide the necessary data themselves. For example, the recommender systems of Yandex or other search engines may ask a person to rate various items, compile a list of favorites in some area, or answer a few questions. If a person declines to provide information voluntarily, the next technique becomes relevant.

The second type is implicit data collection. Loosely speaking, this is a spy mission: the program records the user's actions for later processing and use. What does it record? Purchases, ratings given on sites, page views, and comments. Of course, this technique raises ethical issues, because users expect search engines and other services to protect their personal data. The fact remains, however, that a kind of surveillance is possible, and ordinary site visitors cannot check whether it is actually taking place.

The first basic technique is called collaborative filtering. Recommendations are based on the behavior of one person or, more effectively, of a group of people. Groups are formed from people who are similar to each other in behavior and characteristics.

Let's give an example to make this easier to understand. Suppose we are building a website that recommends music to its audience. How would a recommendation service based on collaborative filtering work here? It takes a community whose members add tracks of the same genre to their playlists, determines the most popular tracks in that community, and recommends them to members of the group who have not yet listened to them.
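The music example can be sketched in a few lines of Python. This is only an illustration under assumptions: the `playlists` dictionary, the member names, and the `recommend_from_group` function are hypothetical, and a real service would form the community from behavioral data rather than receive it ready-made.

```python
from collections import Counter


def recommend_from_group(playlists, user, top_n=3):
    """Recommend the group's most popular tracks that `user` has not added yet.

    `playlists` maps each community member to the list of tracks in their
    playlist (a hypothetical data layout used only for this sketch).
    """
    counts = Counter()
    for member, tracks in playlists.items():
        if member != user:
            # Count each track once per member, even if it was added twice.
            counts.update(set(tracks))
    already_heard = set(playlists.get(user, []))
    # Most popular first; ties are broken alphabetically for determinism.
    ranked = sorted(counts, key=lambda track: (-counts[track], track))
    return [track for track in ranked if track not in already_heard][:top_n]


playlists = {
    "ann": ["a", "b"],
    "bob": ["a", "c"],
    "cat": ["a", "c", "d"],
    "dan": ["b"],
}
print(recommend_from_group(playlists, "dan", top_n=2))
```

Popularity inside the group is the only signal used here, which is why the approach works only once the community has accumulated enough playlists.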

The second approach is called content-based filtering. Here the recommendation is built from the individual user's own behavior: the system looks at the content a person has already consumed, for example their browsing history, and looks for similar content.

This time, take thematic online magazines as the example. If a person has previously read articles about mountain biking and regularly commented on blog posts on that topic, content filtering will use this history to identify similar materials and recommend them to this user.
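A minimal sketch of the magazine example, assuming every article is described by a set of topic tags. The tags, the article titles, and the choice of the Jaccard index as the similarity measure are assumptions made for illustration; the text does not say how similarity is computed.

```python
def jaccard(a, b):
    """Share of common tags among all tags of the two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_similar_articles(read_articles, candidates, top_n=3):
    """Rank unread articles by tag overlap with everything the user has read.

    `read_articles` is a list of tag sets, `candidates` maps a title to its
    tag set (both are hypothetical structures for this sketch).
    """
    profile = set().union(*read_articles)
    scores = {title: jaccard(profile, tags) for title, tags in candidates.items()}
    ranked = sorted(
        (title for title in scores if scores[title] > 0),
        key=lambda title: (-scores[title], title),
    )
    return ranked[:top_n]


history = [{"mountain biking", "trails"}, {"mountain biking", "gear"}]
unread = {
    "Best trail bikes": {"mountain biking", "gear", "reviews"},
    "City marathon guide": {"running"},
    "Alpine trails": {"trails", "hiking"},
}
print(recommend_similar_articles(history, unread))
```

Note that no other readers are involved: the ranking depends only on this user's history and on the descriptions of the articles.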

There are also mixed approaches to building a recommender system. A mixed approach combines collaborative and content filtering. Combining the two methods makes recommendation systems more effective: it noticeably increases the accuracy of predictions for specific people.

Algorithms

Pearson correlation

This algorithm finds what several users have in common. How? With simple mathematics: it measures the strength of the linear relationship between two sets of ratings. An important point: the coefficient compares users pairwise, so by itself it is not suited to describing a whole community of people.
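For readers who do want to see the arithmetic, here is a short sketch of the Pearson correlation coefficient between two users, computed over the items both have rated. The rating dictionaries and the decision to return 0.0 when there is too little overlap are assumptions of this example.

```python
from math import sqrt


def pearson(ratings_a, ratings_b):
    """Pearson correlation of two users over the items both have rated.

    Each argument maps an item to a numeric rating. Returns 0.0 when fewer
    than two items are shared or when one user rated everything identically.
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    n = len(common)
    if n < 2:
        return 0.0
    xs = [ratings_a[item] for item in common]
    ys = [ratings_b[item] for item in common]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return covariance / sqrt(var_x * var_y)


alice = {"film_1": 1, "film_2": 2, "film_3": 3}
boris = {"film_1": 2, "film_2": 4, "film_3": 6}
print(pearson(alice, boris))
```

A value close to 1 means the two users rate in the same direction, a value close to -1 means their tastes are opposite, and a value near 0 means no linear relationship.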

Clustering

This approach measures the similarity between elements (users) by calculating how close they are to each other in a so-called feature space. Features are the items on which users' interests converge (tracks for music services, films for movie portals). Users with similar characteristics are grouped into so-called clusters.
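The text does not name a particular clustering algorithm, so the sketch below assumes k-means, one of the simplest options. Each user is a point in the feature space (here, hypothetical ratings of four tracks), and the starting centroids are passed in explicitly to keep the example deterministic.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal length."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_index(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, initial_centroids, iterations=10):
    """Return a cluster label for every point after a fixed number of passes."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest_index(p, centroids)].append(p)
        # Move each centroid to the mean of its members; keep it if empty.
        centroids = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
    return [nearest_index(p, centroids) for p in points]


# Rows are users, columns are ratings of four tracks (made-up data).
users = [(5, 5, 0, 0), (4, 5, 0, 1), (0, 1, 5, 4), (0, 0, 4, 5)]
print(kmeans(users, initial_centroids=[users[0], users[2]]))
```

Once the labels are known, recommendations for a user can be drawn from what the other members of the same cluster liked.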

Collaborative Filtering Algorithm

Hard clustering can be replaced by another algorithm that uses a more complex formula and, like the previous ones, relies on the behavior of the users in a person's group. This technique has several significant disadvantages, however. First, it is hard to find recommendations for new or atypical users (those who do not fit any group). Second, there is the so-called "cold start": new objects that nobody has rated yet do not make it into the recommendations.

Content Filtering Algorithm

This algorithm mirrors the previous one. There we assumed that the user would like an object because his “classmates” (members of the same cluster) like it; here we recommend objects similar to those he has already marked for himself. Here, too, there are the usual problems: the same "cold start" and the fact that the recommendations are often mundane.

Instead of a conclusion

So, we have covered what a beginner or a layperson should know about recommender systems. To be honest, the algorithms are somewhat difficult for an unprepared reader, so this article contains no mathematical formulas, although the algorithms are based on them.

Recommendation services are useful for ordinary Internet users, researchers, and online businesses alike. Those who want to increase conversion and page views should pay attention to this technique and consider implementing it to make their website, especially an online store, more effective.

There are two main strategies for building recommender systems: content filtering and collaborative filtering.

With content filtering, profiles are created for users and objects.

User profiles may include demographic information or answers to a specific set of questions.
Object profiles can include genre names, actor names, performer names, and so on, depending on the type of object.

This approach is used in the Music Genome Project: a music analyst evaluates each song against hundreds of musical characteristics, which can then be used to reveal the user's musical preferences.
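The idea of object and user profiles can be illustrated with feature vectors. In this sketch each song is scored on three made-up characteristics, the user profile is assumed to be the average of the songs the user liked, and cosine similarity is assumed as the comparison measure. None of this is taken from the Music Genome Project itself, which uses hundreds of characteristics and does not publish its method here.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_user_profile(liked_vectors):
    """Average the feature vectors of the songs the user liked."""
    n = len(liked_vectors)
    return [sum(column) / n for column in zip(*liked_vectors)]


def rank_songs(user_profile, songs):
    """Order candidate songs from most to least similar to the profile."""
    return sorted(songs, key=lambda name: (-cosine(user_profile, songs[name]), name))


# Hypothetical characteristics: [tempo, distortion, acoustic sound].
liked = [[0.2, 0.0, 0.9]]
candidates = {"metal track": [0.9, 1.0, 0.0], "folk track": [0.3, 0.1, 0.8]}
profile = build_user_profile(liked)
print(rank_songs(profile, candidates))
```

Because both profiles live in the same feature space, a brand-new song can be recommended as soon as it has been described, before anyone has rated it.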

With collaborative filtering, information about past user behavior is used, for example purchases or ratings. In this case it does not matter what types of objects you are working with, and implicit characteristics that would be difficult to capture in a profile can be taken into account. The main problem with this type of recommender system is the “cold start”: the lack of data about users or objects that have only recently appeared in the system.

Methodology

Examples of Explicit Data Collection

asking the user to rate an object on a graded scale;
asking the user to rank a group of objects from best to worst;
presenting two objects to the user and asking which one is better;
asking the user to create a list of objects they love.

Examples of Implicit Data Collection

monitoring what the user views in online stores or other types of databases;
keeping records of the user's online behavior;
tracking the contents of the user's computer.
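Implicitly collected data usually arrives as a log of events rather than as ratings. The sketch below shows one way such a log might be turned into per-user interest scores; the event names and their weights are assumptions chosen for illustration, not values from any real system.

```python
from collections import defaultdict

# Hypothetical weights: a purchase says more about interest than a view.
EVENT_WEIGHTS = {"view": 1.0, "comment": 2.0, "purchase": 5.0}


def implicit_scores(events, weights=EVENT_WEIGHTS):
    """Aggregate (user, item, event_type) records into interest scores.

    Event types that are not listed in `weights` contribute nothing.
    """
    scores = defaultdict(lambda: defaultdict(float))
    for user, item, event_type in events:
        scores[user][item] += weights.get(event_type, 0.0)
    return {user: dict(items) for user, items in scores.items()}


log = [
    ("u1", "book", "view"),
    ("u1", "book", "purchase"),
    ("u1", "cd", "view"),
    ("u2", "book", "comment"),
    ("u2", "book", "scroll"),
]
print(implicit_scores(log))
```

The resulting scores can be fed into the same filtering algorithms as explicit ratings.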

Application

Recommender systems compare the same type of data from different people and calculate a list of recommendations for a particular user. Some examples of their commercial and non-commercial use are listed below. Recommender systems are a convenient alternative to search algorithms, as they allow you to discover objects that search alone would not find. Curiously, recommender systems are often built on search engines that index non-traditional data.

Imhonet (films, literature, photos)
Last.fm (music)
Ozon (books, CDs, etc.)
Software Informer (software)
Science Fiction Lab (Science Fiction and Fantasy website)
IMDb (movies; website in English)
Rechelper (movies)
Advizzer (places)
Mir4 (an experimental system capable of working with any content, including small-circulation content; for now, it only works with news)



On April 28, 2016, we officially announced the launch of the first adaptive course on Stepic.org, which selects Python tasks according to the student's level. Before that, we had also introduced recommended lessons on the platform, so that students would not forget what they had already completed and could discover new topics that might interest them.

This article covers two main topics:

online education: its pros, cons, and pitfalls;
the classification of recommender systems, their applicability in education, and examples.

About online education, its pros, cons and pitfalls

This part is introductory and describes online education; the exciting details of recommender systems come in the next part :)

In the modern world, online education is steadily growing in popularity. The opportunity to learn from leading professors at educational institutions, to explore new fields, and to gain the knowledge needed for work without leaving home attracts a large number of people.

One of the most common forms of online learning is the massive open online course (MOOC). MOOCs usually include videos, slides, and text prepared by the teacher, as well as assignments to test knowledge; these are usually checked automatically, although students can also review each other's work. Assignments come in many types: from simply choosing the correct answer to writing an essay and even, as we have on Stepik, programming tasks with automatic checking.

Online education has features that distinguish it from conventional, offline education. Among its advantages is, first, the already mentioned availability to anyone with Internet access. Second, it scales almost without limit: thanks to automated checking of assignments, thousands of people can take a course at the same time, far more than in a conventional classroom course. Third, each student can choose a convenient time and pace for working through the material. Fourth, educators have a wealth of data about how users take their courses, which they can use to analyze and improve their materials.

At the same time, online learning has its downsides. Unlike traditional education, where grades always give the student some motivation, online courses carry no penalty for dropping out. As a result, the share of enrolled students who finish a course rarely exceeds 10% (on Stepik, Anatoly Karpov’s “Fundamentals of Statistics”, named best course at the EdCrunch Awards 2015, had a record 17% completion rate among those who signed up for its first run, but this is rather an exception). In addition, because of the large number of students, the teacher cannot give each one individual attention matched to their level and abilities.

We set ourselves the task of creating a recommender system that could suggest content that interests the student while taking into account their level of preparation and gaps in knowledge. The system must also be able to estimate the difficulty of the content. This is needed, in particular, for adaptive recommendations, which help users study the material by adjusting flexibly to them and offering exactly the content they need at the moment. Such a system would benefit users with personalized lesson recommendations that help them learn a particular topic or suggest something new.

All in all, learning should become even more interesting!

One of the first modern examples of a recommender system is movielens.org, which suggests movies to users based on their preferences. The service is notable for making an extensive dataset of films and user ratings available to everyone. This dataset has been used in a great deal of recommender research over the past two decades.

Recommender systems can be divided into several types:

Systems based on content filtering. Such systems offer users content similar to what they have previously viewed or studied. Similarity is calculated from the characteristics of the objects being compared. For example, genre proximity or cast can be used to recommend movies. This approach is used by the Internet Movie Database, a service for rating, searching for, and recommending movies.
Systems using collaborative filtering. In this case, the user is offered content that similar users found interesting. The MovieLens recommendations are based on this approach.
Hybrid systems combining the two previous approaches. This type of system is used by Netflix, a service for watching movies and series online.

We have created a hybrid system with more active use of content filtering and less active use of collaborative filtering.
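One simple way to combine the two approaches is a weighted sum of their scores. The sketch below is an illustration only: the weight of 0.7 for content filtering, the lesson names, and the input scores are assumptions, and nothing here describes how the Stepik system actually mixes its signals.

```python
def hybrid_scores(content_scores, collaborative_scores, content_weight=0.7):
    """Blend two score dictionaries; an item missing from one side counts as 0."""
    items = set(content_scores) | set(collaborative_scores)
    w = content_weight
    return {
        item: w * content_scores.get(item, 0.0)
        + (1 - w) * collaborative_scores.get(item, 0.0)
        for item in items
    }


def top_n(scores, n=3):
    """Items with the highest blended score, ties broken alphabetically."""
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]


content = {"lesson_a": 0.9, "lesson_b": 0.2}
collaborative = {"lesson_b": 1.0, "lesson_c": 0.5}
print(top_n(hybrid_scores(content, collaborative), n=2))
```

Raising `content_weight` makes the system lean on item similarity; lowering it gives more say to the behavior of similar users.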

There is a lot of research on recommender systems for Technology Enhanced Learning. The specifics of this domain open up new directions for developing a recommender system.

What is distinctive about a recommender system for an educational project?

First, it is possible to build an adaptive recommender system that adjusts to the user's needs at a particular moment and offers the best way to study the material. Various trainers can be implemented in this format, for example in mathematics or a programming language, containing many tasks of varying difficulty, different ones of which will suit different students at any given time.

Second, dependencies between learning materials can be extracted from data on how users work through them. This data can help identify individual topics in the materials, the connections between these topics, and how they relate in difficulty.

Coursera, edX, and Udacity (online learning platforms) use recommender systems to suggest courses users might be interested in. The drawback of these recommendations is that they can only offer a whole course, not a part of it, even if the user is interested only in that part. Also, a system built this way cannot help users study the course they have chosen.

The recommender system of MathsGarden, on the other hand, works with the smallest pieces of content: individual tasks. It is an elementary arithmetic trainer for primary school students that offers each student the tasks whose difficulty suits them best at a given time.
To do this, the system calculates and dynamically updates a relative measure of the student's knowledge, as well as a measure of each task's difficulty, but more on this later.
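The text does not say how these two measures are updated. As a rough illustration, the sketch below assumes an Elo-style scheme borrowed from chess ratings: a correct answer raises the student's rating and lowers the task's, and vice versa. The constants (400, k = 32), the target success rate of 0.75, and all function names are assumptions of this example.

```python
def expected_success(ability, difficulty):
    """Probability that the student solves the task under an Elo-style model."""
    return 1.0 / (1.0 + 10 ** ((difficulty - ability) / 400))


def update_ratings(ability, difficulty, solved, k=32):
    """Return the new (ability, difficulty) pair after one attempt."""
    delta = k * ((1.0 if solved else 0.0) - expected_success(ability, difficulty))
    return ability + delta, difficulty - delta


def pick_task(ability, task_difficulties, target=0.75):
    """Choose the task whose expected success rate is closest to `target`."""
    return min(
        task_difficulties,
        key=lambda name: abs(expected_success(ability, task_difficulties[name]) - target),
    )


tasks = {"easy": 600, "medium": 800, "hard": 1200}
print(pick_task(1000, tasks))
print(update_ratings(1000, 1000, solved=True))
```

Because both ratings move after every attempt, the system keeps adjusting to the student without ever needing a manually assigned difficulty level.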

In the following articles we will describe in more detail how Stepic.org is organized and how the recommender system is implemented, define what an adaptive recommender system is, and analyze the results in detail. It will be fun :)

When recommendation systems were just beginning to appear unobtrusively on various sites, they seemed a nice addition to searching on your own. When the choice of products or content is large enough, searching turns into an exciting journey with often unpredictable results. For example, I was never interested in horror films, preferring films of a rather different kind; however, while rummaging at random through content, I once stumbled upon the classic Hellraiser, and a casual viewing left a strong and lasting impression on me. I am sure every reader has at least once been enriched culturally or aesthetically thanks to exactly this kind of random search. On the other hand, I have discovered many interesting things through the recommendations that thematic sites give me. Many films, books, music, and products became known (and interesting) to me only because a recommendation system did its job well. Tellingly, I now almost always rely on recommendations and search on my own much less often, because there is simply no time left for it!

This state of affairs is reinforced by how well recommender algorithms have come to understand me. Where successful hits used to be rare, today at least half of the recommended items interest me to some degree. And when, instead of apathetically accepting what is offered, I still try to find something worthwhile on my own, I quickly give up under the pressure of an incredible, unprecedented abundance. The further we go, the clearer the picture of a not-so-distant future becomes, in which the surrounding reality continuously adapts to your personality, constantly transforming and learning. Never before in human history has comfort been so ominously absolute. And never before have the loopholes for incredible chance finds been closed so quickly and categorically.

Accepting the coming future as it is, we should learn to evaluate it critically, identifying its dubious or even dark sides with the same zeal with which we adopt the innovations that make everyday life easier. Let's try to understand the subject of today's conversation.

Filtering methods used in recommender systems

Collaborative filtering

Collaborative filtering is widely used, not least because it is relatively easy to implement. Its principle is indeed simple, although it comes in two different approaches.

The user-based approach considers how similar a given user is to other users in the system. For example, if Vasily rated Lady Gaga, Oasis, and Led Zeppelin positively, then Anastasia, who loves Lady Gaga and Led Zeppelin, may well be offered Oasis.
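The Vasily and Anastasia example can be written down as a small sketch. The Jaccard index as the similarity measure, the 0.5 threshold, and the third user are assumptions added for illustration.

```python
def jaccard(a, b):
    """Overlap of two sets of liked artists, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_user_based(likes, target, min_similarity=0.5):
    """Suggest artists liked by sufficiently similar users but not by `target`."""
    target_likes = likes[target]
    scores = {}
    for other, other_likes in likes.items():
        if other == target:
            continue
        similarity = jaccard(target_likes, other_likes)
        if similarity < min_similarity:
            continue
        for artist in other_likes - target_likes:
            # Each similar user votes with a weight equal to the similarity.
            scores[artist] = scores.get(artist, 0.0) + similarity
    return sorted(scores, key=lambda artist: (-scores[artist], artist))


likes = {
    "Vasily": {"Lady Gaga", "Oasis", "Led Zeppelin"},
    "Anastasia": {"Lady Gaga", "Led Zeppelin"},
    "Pyotr": {"Mozart"},  # hypothetical user with unrelated tastes
}
print(recommend_user_based(likes, "Anastasia"))
```

Pyotr shares nothing with Anastasia, so his preferences do not influence her recommendations.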

The item-based approach, by contrast, analyzes the objects themselves and finds those similar to the ones Vasily once liked. In practice it looks like this: Vasily once liked Radiohead and Blur, so why not offer him Oasis as well?
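In the item-based variant, similarity is computed between artists rather than between users. The sketch below assumes that two artists are similar when the same people like both of them (cosine similarity over sets of fans); the users other than Vasily are invented for the example.

```python
from math import sqrt


def item_similarity(likes, item_a, item_b):
    """Cosine similarity between two items based on who likes them."""
    fans_a = {user for user, items in likes.items() if item_a in items}
    fans_b = {user for user, items in likes.items() if item_b in items}
    if not fans_a or not fans_b:
        return 0.0
    return len(fans_a & fans_b) / sqrt(len(fans_a) * len(fans_b))


def recommend_item_based(likes, target):
    """Rank unseen items by their total similarity to the target's liked items."""
    liked = likes[target]
    candidates = set().union(*likes.values()) - liked
    scores = {
        candidate: sum(item_similarity(likes, candidate, item) for item in liked)
        for candidate in candidates
    }
    return sorted(
        (c for c in scores if scores[c] > 0),
        key=lambda c: (-scores[c], c),
    )


likes = {
    "Vasily": {"Radiohead", "Blur"},
    "Olga": {"Radiohead", "Blur", "Oasis"},
    "Igor": {"Blur", "Oasis"},
    "Nina": {"ABBA"},
}
print(recommend_item_based(likes, "Vasily"))
```

Item similarities change more slowly than user tastes, so in practice they can be computed in advance and reused for every visitor.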

Collaborative filtering produces very accurate and relevant recommendations by analyzing and comparing the differences among users with similar behavior.

Vasily and Anastasia: mutual automatic recommendations based on differences in preferences.

Content filtering

Content filtering builds internal links between the goods or content on offer. This simple principle shows up as recommending objects similar to those the user has previously chosen. For example, if you buy a guitar manual in a bookstore, you will automatically be offered other popular tutorials or manuals by the same author. A big advantage of recommender systems based on content filtering is that they can engage a new user with offers from their very first steps as a customer. There is no need to collect data about a person's preferences over a long period; the visitor can be drawn into the site immediately. Another important advantage is the ability to recommend objects that other users have not rated and have passed over, something the collaborative method often fails to do.

Content filtering completely ignores users' opinions about objects. By building connections purely between the objects themselves, we can offer a person something similar to the item that interested them instantly, without collecting ratings or additional personal information. By excluding user experience as the foundation of the recommender system, we appear to solve the so-called "cold start" problem, in which sparse user data prevents the system from generating personalized recommendations. The flip side of content filtering, however, is completely inappropriate and sometimes simply ridiculous recommendations such as “Did you buy a Toyota RAV4? You might also be interested in a Toyota Highlander!”

Another difficulty with content filtering is the considerable amount of work needed to build relationships between all the objects in the system. But the main drawback of the method is that it hits the target rarely, and sometimes only nominally. Content filtering does not provide a high degree of personalization, so the accuracy of its recommendations is relatively low.

Knowledge-Based Filtering (knowledge-based systems)

Systems of this type are widely used in online stores. In essence, knowledge-based recommendations resemble content filtering, but these algorithms analyze objects more deeply, linking them not by simple similarity criteria but by the way certain product groups relate to one another.

In practice it looks like this: when you buy, say, a smartphone, the site offers accessories that work with your new device, such as cases, headphones, and memory cards. The buyer can be further encouraged with a discount on accessories, which is especially welcome alongside the purchase of a new device.
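A knowledge-based rule of this kind can be expressed as a table that links a product category to its accessory categories, plus a compatibility check. The table, the field names, and the products in this sketch are all hypothetical.

```python
# Hand-written domain knowledge: which accessory categories go with a purchase.
COMPLEMENTS = {"smartphone": ["case", "headphones", "memory card"]}


def recommend_accessories(purchased, catalog, complements=COMPLEMENTS):
    """Return names of catalog items that complement and fit the purchased product."""
    wanted = complements.get(purchased["category"], [])
    return [
        item["name"]
        for item in catalog
        if item["category"] in wanted and purchased["model"] in item["compatible_with"]
    ]


phone = {"category": "smartphone", "model": "Phone X"}
catalog = [
    {"name": "Leather case", "category": "case", "compatible_with": ["Phone X"]},
    {"name": "Slim case", "category": "case", "compatible_with": ["Phone Y"]},
    {"name": "Earbuds", "category": "headphones", "compatible_with": ["Phone X", "Phone Y"]},
    {"name": "Phone Z", "category": "smartphone", "compatible_with": []},
]
print(recommend_accessories(phone, catalog))
```

Unlike plain content filtering, the rule never suggests a second smartphone: the knowledge table encodes what goes with the purchase, not what resembles it.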

Knowledge-based recommendations show good results, raising the turnover of large online marketplaces by tens of percent. In addition, unlike content filtering, this type of recommendation is highly precise, offering users things that are genuinely useful to them.

If you are interested in accurate recommendations, you should definitely consider implementing a knowledge-based system on your site. Like content filtering, a knowledge-based recommender system analyzes the relationships between objects (products), but it also takes into account a number of additional parameters tied to the individual characteristics of a particular user.

a) User wishes. A situation familiar to everyone: the site prompts the user to specify the desired characteristics and then offers products that match the request.

Yandex.Market with its checkboxes is a clear example of a recommendation system guided by user requirements.

b) Demographic features. The largest social networks, such as Facebook, LinkedIn, VKontakte, and others, use demographic data to make recommendations.

Of course, implementing such a system takes hard work: you will have to collect and process a huge amount of data.

Hybrid Filtering

This is the most powerful tool and the most difficult to implement. The future apparently lies in combining various recommender mechanisms into a single powerful algorithm. The absolute comfort and personalized reality we talked about at the beginning of the article will be achieved through a hybrid of the most effective recommendation methods.

Netflix provides one such example: its complex hybrid recommendation system, which demonstrates exceptional accuracy, is constantly being improved and modernized. The development of such a powerful algorithm owes much to Netflix's own generous funding of research in this area: in 2006 it offered $1,000,000 for improving its recommender system by 10%.

The BellKor's Pragmatic Chaos development team, which managed to improve the Netflix algorithm by 10.09%.
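The improvement in the Netflix contest was measured by root-mean-square error (RMSE) between predicted and actual ratings, relative to the error of Netflix's existing system. The sketch below shows how such a figure is computed; the function names and the sample numbers are made up for illustration.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root-mean-square error between two equally long rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("rating lists must be non-empty and of equal length")
    squared_errors = ((p - a) ** 2 for p, a in zip(predicted, actual))
    return sqrt(sum(squared_errors) / len(actual))


def relative_improvement(baseline_rmse, new_rmse):
    """Share by which the new error is lower than the baseline error."""
    return (baseline_rmse - new_rmse) / baseline_rmse


actual = [3, 3]
print(rmse([1, 5], actual))
print(relative_improvement(1.0, 0.9))
```

A relative improvement of 0.10 or more corresponds to the 10% threshold that the contest required.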

A few words about practical steps as a conclusion

The choice of a specific type of filtering or a combination of several methods depends directly on two factors: the complexity of your project and its budget. For example, creating an algorithm for a network of thematic, interlinked blogs is a relatively simple and moderately costly task. Larger and more heterogeneous projects, such as online stores, are more expensive, especially if the goal is a really significant increase in conversion. As a rule, such projects cannot make do with a single type of recommender algorithm and have to use hybrid filtering, which raises the cost and complexity of development by orders of magnitude.

To create, implement, and debug a hybrid algorithm you will need a whole team of experienced developers who know linear and relational algebra well and have a range of other skills that effectively make building recommender algorithms a profession of its own.

One way or another, when developing a project that lets the user select specific objects from a larger set, you have to take into account the rapid progress of usability in absolutely all areas of human life, from optimizing sleep with devices that analyze what happens during sleep and issue recommendations for improving it, to the automatic selection of everyday products based on the user's current needs. As you know, an indispensable condition for the success of any undertaking is that it matches the spirit of the times.