A survey of collaborative filtering techniques

A widely employed approach for this purpose is the memory-based algorithm. The existing preferences of a user are represented in the form of a user-item matrix. The method uses the complete or partial user-item matrix to isolate the users nearest to the active user and then generate the prediction. Most early work on electronic commerce and recommender systems concentrates only on technical aspects such as algorithm design and the computational needs of such systems.

Little attention has been paid to whether such systems are needed or how effective they are at what they try to do. Along with examining the stages of a memory-based collaborative filtering system, we propose an experiment to check the effectiveness of the predictions or ratings generated by such systems. This is especially relevant in the current period of information explosion, when a customer is confounded by the variety of options available on every issue.

Recommender systems try to bridge the gap between the user and the market by mathematically determining what a user may prefer. The uniqueness of collaborative filtering lies in its versatility: the technique is unchanged for any type of data. Collaborative filtering can be applied through two different approaches: memory-based and model-based.

Memory-based collaborative filtering systems use the complete user-item rating matrix, or a part of it, to generate recommendations. The model-based approach attempts to determine a pattern or trend in the given ratings data and then constructs a model to generate recommendations [1].

The memory-based approach has been discussed at length and is predominantly used in commercial systems for several reasons. The first is its ease of use: since it works directly on the user-item database, it is easier to apply and to explain. The second is its intuitive nature: as the system keeps collecting data about a particular user, it takes this new information into account when generating recommendations.

Hence the predictions are always up to date. The third reason is cost: memory-based systems are less costly and hence outperform the model-based approach in speed and resource usage [2]. The approach also has limitations. The first is that it is rating dependent. The behavioral trends or tastes of a user may change over time. The user may also become reluctant during the rating process and rate items selectively or incorrectly. Another factor is the limited scope of ratings: data belonging to a particular domain can be used to generate predictions for that domain only.

It is difficult to predict the breakfast preferences of a user by analyzing the music that user listens to. The second limitation is data sparsity. When a new user is introduced to the system, it takes time to build a profile because no information about that user exists yet. This is called the cold start problem [2]. We observe the various stages of a general collaborative filtering algorithm and then try to analyze the effectiveness of the various techniques employed at each stage.

The input is the data gathered about the active user. Euclidean distance and the correlation coefficient are widely used similarity measures; angular distance can also be used. The output of the filter is the generated recommendation. The implicit method of data collection is concerned with checking the browsing history, tracking the number of clicks, and recording the time spent on a particular page.

The gathered data is represented in the form of the user-item matrix [4]. The next step is concerned with identifying the k nearest users to the active user.
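
As a minimal sketch of the representation described above, the following builds a user-item matrix from a list of explicit ratings. The function names, the `(user, item, rating)` tuple layout, and the sample movie ratings are assumptions made for illustration; they do not come from the study. Unrated cells are stored as `None`, which also makes it easy to measure how sparse the matrix is.

```python
from typing import List, Optional, Tuple

# (user, item, rating): an illustrative layout, not the study's data format.
Rating = Tuple[str, str, float]
Matrix = List[List[Optional[float]]]


def build_user_item_matrix(ratings: List[Rating]) -> Tuple[List[str], List[str], Matrix]:
    """Return (users, items, matrix) with one row per user and one column per item."""
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    row = {u: n for n, u in enumerate(users)}
    col = {i: n for n, i in enumerate(items)}
    # None marks an item the user has not rated.
    matrix: Matrix = [[None] * len(items) for _ in users]
    for u, i, r in ratings:
        matrix[row[u]][col[i]] = r
    return users, items, matrix


def sparsity(matrix: Matrix) -> float:
    """Fraction of cells that hold no rating."""
    cells = sum(len(r) for r in matrix)
    missing = sum(v is None for r in matrix for v in r)
    return missing / cells if cells else 0.0


# Hypothetical sample ratings.
sample = [
    ("alice", "Alien", 5.0),
    ("alice", "Brazil", 3.0),
    ("bob", "Alien", 4.0),
    ("carol", "Casablanca", 2.0),
]
users, items, matrix = build_user_item_matrix(sample)
for user, ratings_row in zip(users, matrix):
    print(user, ratings_row)
print("sparsity:", round(sparsity(matrix), 3))
```

In a real system the matrix is usually far sparser than in this toy example, which is why a sparse structure (for instance a dictionary of rated items per user) is often preferred over a dense table.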

These k users form the neighborhood of the active user. The rating is generated from the neighborhood of the active user [3]. Several methods can be used to calculate the similarity. With Euclidean distance, the users are represented as points on a graph and the distance between the points is measured using the Euclidean distance formula. Despite its simplicity, a disadvantage of this method is the two-dimensional nature of the measure.
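
The neighborhood step with a Euclidean measure might be sketched as follows. The text does not say how a distance is turned into a similarity score, so the mapping `1 / (1 + d)` used here is an assumption (one common convention); the function names and sample ratings are likewise hypothetical. Distances are computed over co-rated items only.

```python
import math
from typing import Dict, List, Tuple

Profile = Dict[str, float]  # item -> rating, rated items only


def euclidean_similarity(a: Profile, b: Profile) -> float:
    """Similarity from the Euclidean distance over items both users rated."""
    common = [item for item in a if item in b]
    if not common:
        return 0.0  # no co-rated items: treat as no evidence of similarity
    distance = math.sqrt(sum((a[i] - b[i]) ** 2 for i in common))
    # Assumed mapping: identical ratings give 1.0, large distances approach 0.
    return 1.0 / (1.0 + distance)


def k_nearest(active: str, ratings: Dict[str, Profile], k: int) -> List[Tuple[str, float]]:
    """Return the k users most similar to the active user, best first."""
    scored = [
        (user, euclidean_similarity(ratings[active], profile))
        for user, profile in ratings.items()
        if user != active
    ]
    # Sort by descending similarity; break ties by user name for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "alice": {"A": 5.0, "B": 3.0, "C": 4.0},
    "bob": {"A": 5.0, "B": 3.0, "C": 5.0},
    "carol": {"A": 1.0, "B": 5.0, "C": 2.0},
    "dave": {"D": 4.0},
}
neighbourhood = k_nearest("alice", ratings, k=2)
print(neighbourhood)
```

Here bob differs from alice by one point on a single item, so he ranks first; dave shares no items with alice and is never preferred over a user with co-rated items.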

The range of the Euclidean measure is [0,1]. The Pearson correlation coefficient determines the degree of association between two variables. The nearer the points are to a straight line, the stronger their association. Unlike the Euclidean distance, it has a wider range, [-1,1], and can take negative values.
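
A minimal sketch of the Pearson correlation coefficient between two users, computed over their co-rated items, is given below. Returning 0.0 when fewer than two items are shared or when one user's ratings do not vary is an assumption of this sketch, since the coefficient is undefined in those cases; the function name and sample profiles are hypothetical.

```python
import math
from typing import Dict

Profile = Dict[str, float]  # item -> rating, rated items only


def pearson_similarity(a: Profile, b: Profile) -> float:
    """Pearson correlation of two users' ratings over their co-rated items."""
    common = [item for item in a if item in b]
    n = len(common)
    if n < 2:
        return 0.0  # assumed fallback: correlation is undefined here
    mean_a = sum(a[i] for i in common) / n
    mean_b = sum(b[i] for i in common) / n
    covariance = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    spread_a = sum((a[i] - mean_a) ** 2 for i in common)
    spread_b = sum((b[i] - mean_b) ** 2 for i in common)
    if spread_a == 0.0 or spread_b == 0.0:
        return 0.0  # assumed fallback: a constant rater has no correlation
    return covariance / math.sqrt(spread_a * spread_b)


# Hypothetical profiles: bob rates two points higher than alice throughout,
# carol ranks the same items in the opposite order.
alice = {"A": 1.0, "B": 2.0, "C": 3.0}
bob = {"A": 3.0, "B": 4.0, "C": 5.0}
carol = {"A": 3.0, "B": 2.0, "C": 1.0}

print(pearson_similarity(alice, bob))
print(pearson_similarity(alice, carol))
```

Because the ratings are mean-centered before comparison, a user who rates consistently higher than another can still be perfectly correlated with them, while opposite preferences give a negative value.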

The strength of the correlation coefficient is that it can accommodate any form of scaling and can correct for non-normalized data. With vector-based cosine similarity, we try to establish the angle between the two rating vectors [6]. It ranges over [-1,1]. The cosine of 0 is 1, which indicates that the vectors overlap and hence that the users have similar tastes. This measure is particularly useful when data is sparse or the co-rated items are few, so that a useful relationship cannot be determined using other measures.
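
The cosine measure can be sketched as below. Restricting both vectors to the co-rated items is an assumption of this sketch (another convention treats unrated items as zeros), and the function name and sample vectors are hypothetical.

```python
import math
from typing import Dict

Profile = Dict[str, float]  # item -> rating, rated items only


def cosine_similarity(a: Profile, b: Profile) -> float:
    """Cosine of the angle between two rating vectors over co-rated items."""
    common = [item for item in a if item in b]
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumed fallback: the angle is undefined for a zero vector
    return dot / (norm_a * norm_b)


# Hypothetical vectors: v doubles u (same direction), x and y are orthogonal.
u = {"A": 1.0, "B": 2.0}
v = {"A": 2.0, "B": 4.0}
x = {"A": 3.0, "B": 0.0}
y = {"A": 0.0, "B": 4.0}

print(cosine_similarity(u, v))
print(cosine_similarity(x, y))
```

With non-negative ratings the result stays between 0 and 1; negative values only appear if the ratings are first shifted, for example by subtracting each user's mean.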

The prediction can be generated using various methods, the simplest of which is taking the mean of the neighbors' ratings. A more effective method is to take the weighted average of the available ratings. The aim is to calculate the expected rating for all items that have not yet been rated by the active user and then recommend the N highest-rated items in the neighborhood.
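
The weighted-average prediction and the top-N step might look like the following sketch. The neighbor similarities are supplied as given numbers, standing in for whichever similarity measure is used; the function names, sample ratings, and similarity values are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Profile = Dict[str, float]  # item -> rating, rated items only
Neighbour = Tuple[str, float]  # (user, similarity to the active user)


def predict(item: str, ratings: Dict[str, Profile],
            neighbours: List[Neighbour]) -> Optional[float]:
    """Similarity-weighted average of the neighbors' ratings for one item."""
    weighted_sum = 0.0
    weight_total = 0.0
    for user, similarity in neighbours:
        if similarity > 0.0 and item in ratings[user]:
            weighted_sum += similarity * ratings[user][item]
            weight_total += similarity
    # None means no neighbor with positive similarity rated the item.
    return weighted_sum / weight_total if weight_total else None


def top_n(active: str, ratings: Dict[str, Profile],
          neighbours: List[Neighbour], n: int) -> List[Tuple[str, float]]:
    """Predict every item the active user has not rated and keep the best n."""
    seen = set(ratings[active])
    candidates = {i for user, _ in neighbours for i in ratings[user]} - seen
    scored = []
    for item in candidates:
        estimate = predict(item, ratings, neighbours)
        if estimate is not None:
            scored.append((item, estimate))
    # Highest predicted rating first; ties broken by item name.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:n]


# Hypothetical data: bob is a much closer neighbor of alice than carol is.
ratings = {
    "alice": {"A": 5.0},
    "bob": {"A": 5.0, "B": 4.0, "C": 2.0},
    "carol": {"A": 4.0, "B": 2.0, "C": 5.0},
}
neighbours = [("bob", 0.75), ("carol", 0.25)]

print(predict("B", ratings, neighbours))
print(predict("C", ratings, neighbours))
print(top_n("alice", ratings, neighbours, n=1))
```

In this example a plain mean would score item C (3.5) above item B (3.0), whereas the weighted average favors B because the closer neighbor rated it highly.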

Hence our task is the evaluation of three techniques commonly used for memory-based collaborative filtering: Euclidean distance, the Pearson correlation coefficient, and vector-based cosine similarity. An empirical study was conducted to measure the effectiveness of the ratings generated using these similarity measures. For our study, we considered explicit input. A standard one-page questionnaire was prepared containing a list of common movies from the dataset, and users were asked to rate them.

DOI: Khoshgoftaar Published Computer Science Adv. As one of the most successful approaches to building recommender systems, collaborative filtering (CF) uses the known preferences of a group of users to make recommendations or predictions of the unknown preferences of other users.
