This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2019.2954861, IEEE Access
Multi-Criteria Review-Based Recommender System: The State of the Art
Sumaia Mohammed AL-Ghuribi1,2 and Shahrul Azman Mohd Noah1
1Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Malaysia
2Faculty of Applied Sciences, Department of Computer Science, Taiz University, Yemen
Corresponding author: Shahrul Azman (e-mail: [email protected]m.edu.my).
ABSTRACT Recommender systems (RSs) have recently gained considerable importance in academia, commercial
activities, and industry. They are widely used in various domains such as shopping (Amazon), music (Pandora), movies
(Netflix), travel (TripAdvisor), restaurants (Yelp), people (Facebook), and articles (TED). Most RS approaches rely
on a single-criterion rating (overall rating) as the primary source for the recommendation process. However, the overall rating
is not enough to achieve highly accurate recommendations because it cannot express the fine-grained analysis
behind the user's behavior. To solve this problem, multi-criteria recommender systems (MCRSs) have been developed to
improve the accuracy of RS performance. Additionally, a new source of information, user-generated
reviews, is incorporated in the recommendation process because of the rich and abundant information they contain (i.e. review
elements) related to the whole item, to a certain feature of the item, or to the user's preferences. The valuable review
elements are extracted using either text mining or sentiment analysis. MCRSs use the review elements of
user-generated reviews to build their criteria, forming multi-criteria review-based recommender systems. The review elements
improve the accuracy of RS performance and mitigate most RS problems such as cold start and sparsity. In
this review, we focus on the multi-criteria review-based recommender system and explain the user review elements in
detail and how they can be integrated into RSs to help develop their criteria and enhance RS performance. Finally,
based on the survey, we present four future trends for this type of RS to support researchers who wish to pursue
studies in this area.
INDEX TERMS Recommender System, Multi-Criteria Recommender System, User-generated reviews, Review elements,
Sentiment analysis, Text mining, Multi-Criteria Review-based Recommender System, Recommender system accuracy.
I. INTRODUCTION
At present, there is a vast flow of information on the Web,
and it continues to grow exponentially while providing users
or customers with various resources pertaining to services
such as products, hotels, and restaurants. Despite the benefits
of such data, the vast flow of information causes challenges
for users to deal with and choose from a huge number of
options made available to them. This causes an information
overload problem [1] and makes the decision-making process
more complex. In this case, it is important to filter the
information to a limited amount based on the current
user/customer preferences in order to assist them in making
the correct decision [2]. Such a filtering process is typically
done by RSs, which are developed to solve the information
overload problem by providing personalized suggestions of
services (i.e. items) to specific customers according to their
preferences [3].
RSs have proven crucial in many fields and are widely
used in various domains such as
shopping (Amazon), music (Pandora), movies (Netflix),
travel (TripAdvisor), restaurants (Yelp), people (Facebook)
and articles (TED). There are many definitions of RSs,
including:
a. A tool that mines items and/or collects users'
opinions to help users in their search process
and suggests items related to their preferences
[2], [4], [5].
b. A program or software for content filtering that
attempts to reduce the information overload
problem, in which users encounter a flood of
data on the Web, by recommending
personalized items to users depending on the
items' information and/or users' preferences [6],
[7], [8].
c. A system to manage information overload
problem by collecting information, guiding
users in a personalized way and providing
individualized recommendations as output when
there are many possible alternatives to choose
from [9].
The RS problem can be described as assisting
users/customers in discovering relevant items that suit their needs
and most likely match their preferences [10]. Generally, an RS
model consists of two sets and a utility function: the
Users set contains all the users, and the Items set contains all the
items that can be recommended to the users. The utility
function calculates the suitability of recommending an
item i ∈ Items to a user u ∈ Users and is declared as R:
Users × Items → R0, where R0 is either a real number
or a positive integer within a specific range [11].
Typically, RS works through three phases [11], [12],
[13] as follows:
a. Modeling Phase: This phase focuses on
preparing the data that will be used in the next two
phases. There are three cases. The first is
building a rating matrix that contains the users as
rows and the items as columns, where the value of each
cell is the rating given by a user to a
certain item. The second is building a user profile, which
is mostly a vector for each user that describes his
preferences for an item as a whole or for some
aspects of the item. The third is building an item profile
that contains the features of a specific item.
b. Prediction Phase: This phase aims to predict the
rating or score of unseen/unknown items for a
specific user through a utility function that depends on
the information extracted during the modeling
phase.
c. Recommendation Phase: This phase is an
extension of the prediction phase in which various
approaches are applied to support the user's
decision by filtering the most suitable items. It
recommends/proposes new items to the user (i.e. a
set of top-N items with the highest predicted
ratings) that are most likely to interest him.
Figure 1 shows the three main phases of RS.
FIGURE 1. The Phases of Recommender System
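As a rough illustration of the three phases, the following Python sketch builds a small rating matrix, predicts a missing rating, and returns a top-N list. The toy ratings, the function names (`build_rating_matrix`, `predict`, `recommend_top_n`), and the use of the item's mean rating as the utility function are assumptions made for illustration only; they are not taken from any of the surveyed systems.

```python
from collections import defaultdict

# Toy (user, item, rating) triples; hypothetical data for illustration only.
RATINGS = [
    ("u1", "i1", 5), ("u1", "i2", 3),
    ("u2", "i1", 4), ("u2", "i3", 2),
    ("u3", "i2", 4), ("u3", "i3", 5),
]


def build_rating_matrix(triples):
    """Modeling phase: users as rows, items as columns, ratings as cells."""
    matrix = defaultdict(dict)
    for user, item, rating in triples:
        matrix[user][item] = rating
    return dict(matrix)


def predict(matrix, user, item):
    """Prediction phase: estimate R(user, item) for an unseen item.

    Assumption: the utility function is simply the mean rating that
    other users gave to the item (a deliberately naive baseline).
    """
    others = [row[item] for u, row in matrix.items() if u != user and item in row]
    return sum(others) / len(others) if others else None


def recommend_top_n(matrix, user, n=2):
    """Recommendation phase: rank the user's unseen items by predicted rating."""
    all_items = {item for row in matrix.values() for item in row}
    unseen = all_items - set(matrix[user])
    scored = [(item, predict(matrix, user, item)) for item in unseen]
    scored = [(item, score) for item, score in scored if score is not None]
    # Highest predicted rating first; ties broken by item name.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:n]


matrix = build_rating_matrix(RATINGS)
print(recommend_top_n(matrix, "u1"))
```

In practice the naive mean in `predict` would be replaced by one of the approaches described in Section 2; the separation into three functions is only meant to mirror the phases.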
There are three main recommendation approaches, which
are content-based, collaborative-based and hybrid. Classical
approaches rely on the users' ratings as the main source of
input for the recommendation. Relying on a single-criterion
(i.e. overall) rating is insufficient to
give an accurate recommendation because the overall rating
expresses only a coarse-grained analysis and cannot capture
the fine-grained analysis behind the users' behaviors.
It cannot be determined why the user chose such a rating.
Thus, it is difficult to know the user's exact preferences. As a
result, multiple-criteria decision analysis is combined with
RS to form a multi-criteria recommender system (MCRS), in
which the recommendation is based on multiple criteria and
not just on a single criterion.
Besides the primary source (i.e. numeric rating) of the
recommendation input, user-generated reviews are also
used as an alternative source because of the valuable and rich
information they contain. The rich information in the
reviews can be extracted as elements such as topics, features,
overall score, and context by analyzing the reviews
using sentiment analysis or text analytics approaches. In this
survey, we emphasize MCRS, especially the multi-criteria
review-based RS, because of its effective role in
enhancing the accuracy of RS performance.
The remainder of this state-of-the-art review is organized
as follows: the RS approaches are described in Section 2,
then the multi-criteria recommender system is explained in
Section 3. After that, in Section 4, user-generated reviews
and the valuable elements that can be extracted from them
are discussed. This is followed by Section 5, which is the
main section because it covers the most recent research on
multi-criteria review-based recommender system
approaches. Finally, the discussion and the forthcoming
trends in MCRS are presented in Section 6, followed by the
conclusion in Section 7.
II. RECOMMENDER SYSTEM APPROACHES
Recommendation approaches are usually grouped
into four categories: content-based,
collaborative filtering, knowledge-based and hybrid.
A. CONTENT-BASED APPROACH
A content-based (CB) approach mines the appropriate
recommendations for a user based on his recent behaviors
according to what the user liked, bought or watched [14]. It
generates the user profile from previously selected items by
characterizing the user according to the item features and
recommends items to the user based on the items that have
similar features to the items that the user liked before [15]. It
characterizes each user without having to compare his
preferences to other users. Put differently, it does not use the
information about other users’ preferences or the similarities
with other users [15], [16]. The process of the CB approach can
be summarized in the following steps [2], [17], [18]:
a. Item representation: The information source of the item
description is used to extract the item’s characteristics
(i.e. features) to produce the structured item’s
representation.
b. Learning the user profile: A user profile is generated
from the user's previous behaviors (i.e. explicit and
implicit feedback) such as liking/disliking an item, assigning
a score to an item (rating) or writing a textual opinion
about an item (comment).
c. Recommendations' generation: A list of items is
recommended to the user by comparing the item's
features with the user's profile and the items that are
most likely to be interesting to the user are added to the
list (i.e. top-ranked items).
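The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any cited work: the hotel names, the binary feature vectors, the choice of averaging liked items to form the profile, and the use of cosine similarity for matching are all assumptions.

```python
import math

# Step a (item representation): hypothetical binary feature vectors over
# the features (beach, city, budget, luxury, pool).
ITEMS = {
    "hotel_a": [1, 0, 1, 0, 1],
    "hotel_b": [1, 0, 0, 1, 1],
    "hotel_c": [0, 1, 1, 0, 0],
    "hotel_d": [0, 1, 0, 1, 0],
}


def learn_profile(liked_items):
    """Step b: average the feature vectors of the items the user liked."""
    vectors = [ITEMS[name] for name in liked_items]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(liked_items, n=2):
    """Step c: rank items the user has not liked yet by similarity to the profile."""
    profile = learn_profile(liked_items)
    candidates = [name for name in ITEMS if name not in liked_items]
    ranked = sorted(candidates, key=lambda name: cosine(profile, ITEMS[name]), reverse=True)
    return ranked[:n]


print(recommend(["hotel_a"]))
```

Note that no other user's data appears anywhere in the sketch, which is the defining property of the CB approach described above.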
This type of approach has been implemented in many
domains [9], especially for recommending items that contain
textual information such as websites, news, and articles. It
is also used in domains such as travel, tourism,
e-commerce, and TV [15]. This approach is preferred for
moderate-sized items.
Some of the CB approach advantages are:
a. It can give an explanation for recommending specific
items (i.e. present the logic behind their
recommendations) through providing a list of content
features. This, in turn, can strengthen the user’s
confidence about the RS that reflects his own
preferences [16].
b. Since this approach relies on the content of each item,
not the ratings of other users, it gives several
advantages as follows [19]:
It offers a high level of personalization in the
recommendations.
It is scalable in terms of the number of users.
It can make recommendations for users with
peculiar interests.
It has high security from malicious item creation
and allows users to prevent viral marketing.
On the other hand, CB approaches have some
disadvantages such as:
The vast number of items is considered a major
problem because, when a recommendation is
made, the content of every item has to be
examined to discover the items that are most likely
related to the user's interests [19]. This task is
error-prone and time-consuming [20].
User profiles are built based on the static
characteristics of the items. As a result, there is a
high probability that different users have similar
profiles even if they have different preferences
among these items, just because they commented
on the same items [9].
The over-specialization problem occurs in this
type of approach: users do not receive
diverse or new items because their profiles
restrict recommendations to items with similar
descriptions [20].
Lack of serendipity. Over-specialization also
reduces serendipity, because users are
recommended only familiar items.
B. COLLABORATIVE FILTERING APPROACH
The collaborative filtering (CF) approach is the most popular
technique used in RSs [21]. It generates recommendations
for a user based on other users who had
similar preferences/interests in the past. This approach
is based on the following hypothesis: people who agreed with
a user in the past will also agree in the future [16]. It
identifies new user-item associations by determining the
relationships between users and the interdependencies
between items [21]. It uses the implicit knowledge of a
community of users about the items they have used to identify the
relationships of those items to other users within the community who have not
used/seen those items [15]. This can
be represented as a user × item matrix in which each cell
represents the user's rating of a particular item.
The first CF framework for RS, called GroupLens, was developed by
Resnick et al. [22]. It recommends articles
to Netnews clients using rating servers, named Better
Bit Bureaus (BBB), which gather users' ratings to predict
other users' scores on articles based on the heuristic
that clients who agreed on the rating of articles in the past
will probably agree in the future.
CF can be grouped into two classes: memory-based and
model-based [9]. The memory-based CF type is a heuristic
algorithm that predicts an item's rating based on other users'
ratings, and it can be classified into two methods [10]:
user-based and item-based. The former identifies a set of neighbors
(i.e. like-minded users) for a target user using ratings and then
recommends a set of items that interest his neighbors. The
latter recommends to a target user items that are similar
(i.e. have shared features) to the items that the user
purchased, viewed or liked before. Two approaches
are most frequently used to compute user/item
similarity: the Pearson correlation approach and the cosine-based
approach [1].
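To make the user-based variant concrete, the sketch below computes Pearson and cosine similarity over co-rated items and predicts a rating from the most similar users. The user names, ratings, and the prediction rule (a similarity-weighted average over the k most similar users with positive similarity) are assumptions chosen for brevity; other neighborhood and aggregation rules are equally common.

```python
import math

# Hypothetical ratings on a 1-5 scale, keyed by user and then by item.
RATINGS = {
    "alice": {"i1": 5, "i2": 3, "i3": 4},
    "bob": {"i1": 4, "i2": 2, "i3": 3, "i4": 5},
    "carol": {"i1": 1, "i2": 5, "i3": 2, "i4": 1},
}


def co_rated_vectors(a, b):
    """Ratings of users a and b restricted to the items both have rated."""
    items = sorted(set(RATINGS[a]) & set(RATINGS[b]))
    return [RATINGS[a][i] for i in items], [RATINGS[b][i] for i in items]


def pearson(a, b):
    """Pearson correlation over co-rated items (0.0 if undefined)."""
    xs, ys = co_rated_vectors(a, b)
    if len(xs) < 2:
        return 0.0
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs)) * math.sqrt(sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0


def cosine(a, b):
    """Cosine similarity over co-rated items (0.0 if undefined)."""
    xs, ys = co_rated_vectors(a, b)
    num = sum(x * y for x, y in zip(xs, ys))
    den = math.sqrt(sum(x * x for x in xs)) * math.sqrt(sum(y * y for y in ys))
    return num / den if den else 0.0


def predict(user, item, similarity=pearson, k=2):
    """Similarity-weighted average of the ratings of the k most similar users.

    Assumption: users with non-positive similarity are ignored.
    """
    candidates = [(similarity(user, v), v) for v in RATINGS if v != user and item in RATINGS[v]]
    neighbors = [(s, v) for s, v in sorted(candidates, reverse=True)[:k] if s > 0]
    if not neighbors:
        return None
    return sum(s * RATINGS[v][item] for s, v in neighbors) / sum(s for s, _ in neighbors)


print(predict("alice", "i4", similarity=pearson))
print(predict("alice", "i4", similarity=cosine))
```

With these toy ratings, Pearson treats carol as negatively correlated with alice and excludes her, whereas cosine on raw ratings is never negative and still gives her some weight, so the two measures can yield noticeably different predictions.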
On the other hand, the model-based CF type [23] predicts
a user's rating of unseen items by developing models using
different representative techniques such as
clustering models, Bayesian networks and Markov decision processes. A
survey by Su and Khoshgoftaar [23] provided a comparison
between the CF classes, as shown in Table 1.
TABLE 1: COMPARISON BETWEEN CF CLASSES
CF Class | Advantages | Shortcomings
Memory-based | Easy to implement. Easy to add new data incrementally. The content of recommended items need not be considered. Scales well with co-rated items. | Dependent on human ratings. Recommendation performance decreases when data are sparse. New items/users cannot be recommended (i.e. cold start problem). Scalability is limited for large datasets.
Model-based | The scalability and sparsity problems are better addressed. The prediction performance is improved. An intuitive rationale is given for recommendations. | Building the model is expensive. There is a trade-off between scalability and prediction performance. Useful information is lost through dimensionality reduction techniques.
CF approaches possess many advantages compared to
other approaches. Some of the main advantages are:
Serendipity where novel and unfamiliar items are
recommended.
Able to recommend more subtle items and can
capture more nuances around items.
Flexible and suitable for various domains.
No need to analyze the items contents.
Generally, the performance of the CF approach depends
on the availability of sufficient user participation [16]. It
performs satisfactorily only when there is adequate rating
information [23]. This dependence on ratings exposes the CF
approach to the following issues [21], [24]:
a. Sparsity Problem: One of the major problems that
complicate the personalized item ranking process is
data sparsity because items cannot be reliably linked
to users [25], causing a limitation in the
recommendation’s effectiveness and limited coverage
of recommendation space [26].
This problem occurs due to the following issues [26],
[27]:
Insufficient or missing information of either the
user or item or both in the dataset during the
process of filling the ratings (user-item) matrix.
The complexity of gathering the items' ratings.
Expressing user’s preferences about items as a
rating is a complicated process.
b. Cold Start Problem: This problem occurs in the case
of new users who have not yet provided any ratings or
new items that have not been rated [28]. It can be
considered a particular case of the sparsity problem
in which most of the cells of the item-user interaction
matrix contain null values [29]. The CF approach is
not able to generate accurate recommendations for
new users or items without sufficient existing data on
them [24].
c. Scalability: The number of users and items in a system
grows rapidly. For example, on some popular websites
the user behavior data stored per day may reach
the scale of terabytes [30].
Furthermore, the RS should respond in less than a
second to keep users satisfied and
continuously engaged with the RS [30]. As a result,
both large-scale datasets and response time create a
challenge in designing an efficient RS, which
demands colossal computing resources.
d. Rating bias: In the CF approach, recommendations are
based on users' ratings, but these ratings cannot show
users' preferences or their clear opinion on specific
criteria, which makes the ratings difficult to
interpret.
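The sparsity and cold start problems can be made measurable with a short sketch. The matrix below, the helper names, and the `min_ratings` threshold are hypothetical; the code only shows how the fraction of missing cells and the users or items without ratings might be computed.

```python
# Hypothetical user-item rating matrix; None marks a missing rating.
USERS = ["u1", "u2", "u3", "u4"]
ITEMS = ["i1", "i2", "i3", "i4", "i5"]
MATRIX = [
    [5, None, 3, None, None],
    [None, 4, None, None, None],
    [2, None, None, 1, None],
    [None, None, None, None, None],  # new user: no ratings yet
]


def sparsity(matrix):
    """Fraction of user-item cells that hold no rating."""
    cells = [cell for row in matrix for cell in row]
    return sum(1 for cell in cells if cell is None) / len(cells)


def cold_start_users(matrix, users, min_ratings=1):
    """Users with fewer than min_ratings ratings (assumed threshold)."""
    return [
        user for user, row in zip(users, matrix)
        if sum(cell is not None for cell in row) < min_ratings
    ]


def cold_start_items(matrix, items, min_ratings=1):
    """Items with fewer than min_ratings ratings (assumed threshold)."""
    columns = zip(*matrix)
    return [
        item for item, column in zip(items, columns)
        if sum(cell is not None for cell in column) < min_ratings
    ]


print(sparsity(MATRIX))
print(cold_start_users(MATRIX, USERS))
print(cold_start_items(MATRIX, ITEMS))
```

Real rating matrices are typically far sparser than this toy example, which is why pure CF struggles for the users and items that such a check flags.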
C. KNOWLEDGE-BASED APPROACH
This approach is applied in cases where both content-based
and collaborative approaches cannot work
properly because insufficient ratings are available for the
item at hand, which affects the recommendation
process [31]. Examples include recommending items that are
rarely purchased, such as cars, houses and financial services.
This approach uses the user's knowledge of the item
domain to recommend items that will best satisfy his
requirements [32]. The main strength of this
approach is the absence of the early-rater and
cold start problems. A corresponding drawback is
that it requires knowledge engineering, with all of its
attendant difficulties, to understand the item's domain
satisfactorily [33].
D. HYBRID APPROACH
This approach aims to mitigate the weaknesses of both CF
and CB and benefit from their strengths by integrating two
or more recommendation components or algorithm
implementations in a single recommendation system to
enhance RS accuracy and gain better performance [2], [34].
When a hybrid approach is built by combining
two or more algorithms, two major points must be taken
into account [2]: the first is the recommendation models,
which determine the required inputs and the basis of
the hybrid recommender. The
second is the hybridization strategy that will be used
within the hybrid recommender [35].
Although hybrid approaches may overcome the
limitations of both CB and CF approaches and enhance the
prediction performance, they are expensive to implement,
increase complexity, and need external information
that is mostly unavailable [23].
CONVENTIONAL SINGLE-RATING RECOMMENDATION PROBLEM
The aforementioned conventional approaches mostly rely
on a single-criterion rating for generating predictions. This
rating is considered the overall satisfaction of a user with
the item [2]. In other words, most RS approaches
work in a two-dimensional space (Users and Items), and the
RS uses the previous ratings made by a user to predict the
utility of an item for that user, where the utility function
R: Users × Items → R0 maps to a totally ordered set R0 [36]. However, the
overall rating is not enough to achieve high-performance
recommendations because it is only a
numeric rating on a specific scale that cannot express a
fine-grained analysis of the underlying rationale behind
the user's rating. It expresses a coarse-grained rating only
(i.e. the overall rating cannot reflect the details of user
preferences or interest toward each part of the item, which are
needed to understand users' opinions and analyze users' behaviors)
[8], [37], [38].
For example, when a user gives a high rating about an
item, it does not mean that the user likes the item as a
whole. There is still a probability that he dislikes some
specific features (i.e. aspects) of that item. Likewise, a low
rating does not imply that the user dislikes everything about
the item. Additionally, when the user gives an overall
rating, he places different emphasis on different aspects, and
this has a significant effect on the final decision made by
the user [21].
To overcome the shortcomings of using a single criterion (or
an overall rating) in RS, multiple-criteria decision analysis
is combined with RS to form a multi-criteria recommender
system (MCRS) that improves the overall accuracy and
performance of the RS [8], [36]. Thus, when multi-criteria
decision analysis is adopted, the item recommendation process
is the decision process, a potential user is the
decision-maker, the item attributes are the criteria and the items are
the decision alternatives [2]. The following section presents
a survey of various approaches used for supporting
multi-criteria recommendation.
III. MULTI-CRITERIA RECOMMENDER SYSTEM
Multiple-criteria decision-making or Multiple criteria
decision analysis (MCDA) is a sub-discipline of operations
research and management science. It aims to develop tools
and methodologies to construct a convincing and reliable
model for addressing complicated decision problems
including multiple criteria goals or multi-alternatives [8].
The idea of combining MCDA with RS is to recommend
items that meet users’ personalized needs. In this case,
personalization refers to “the ability to provide services and
content that are tailored to users depending on the knowledge
about their behaviors and preferences” [39]. As such, RSs
will be able to comprehend how the user thinks and why the
user likes an item and not only what the user likes [8].
Both single-criterion RS and MCRS have the same goal
which is to identify items that are suitable and relevant to fit
the user’s preferences. The difference between them is that
the MCRS has more detailed information about both the
items and users that can be used to efficiently enhance the
recommendation performance. Generally, the rating
function in the MCRS is described as follows:
R: Users × Items → R0 × R1 × … × Rk, where R0 is
the overall rating and R1, R2, …, Rk are the rating values
for each individual criterion.
As an illustration, consider a hotel RS meant to
recommend a suitable hotel based on the needs and
requirements of the target user. In a conventional
single-criterion RS, a user (U) provides one rating (the overall rating)
for a hotel (I) that he has visited, denoted R(U, I).
The RS calculates the predicted rating of an
unvisited hotel from the ratings of other users whose
preferences are similar to those of the target user. The precise choice of
these relevant users is crucial for an accurately predicted
rating and a high-performance recommendation. So, if two
users (U1) and (U2) have both rated their overall satisfaction with
a visited hotel as 5 out of 10, as presented in Table 2, they
will be considered neighbors, and the predicted rating of
user (U1) for the unvisited hotel (H4) is calculated
using the ratings of user (U2) and will be 9 out of 10.
On the other hand, in a multi-criteria rating, a user provides
ratings for multiple features (i.e. attributes) of an item. For
example, in a hotel RS with four criteria such as room,
price, location, and cleanliness, the users will provide ratings
for these four criteria.
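TABLE 2: MULTI-CRITERIA HOTEL RECOMMENDER SYSTEM
Each cell gives the overall rating followed by the criteria ratings (room, price, location, cleanliness).
User | H1 | H2 | H3
U1 | 5 (2, 2, 8, 8) | 7 (5, 5, 9, 9) | 7 (5, 5, 9, 9)
U2 | 5 (8, 8, 2, 2) | 7 (9, 9, 5, 5) | 7 (9, 9, 5, 5)
U3 | 6 (3, 3, 9, 9) | 7 (6, 6, 8, 8) | 6 (4, 4, 8, 8)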
Suppose we have three users' ratings for the four
features of the hotel (H2) plus an overall rating, as
illustrated in Table 2: U1 (overall 7; room 5, price 5, location 9, cleanliness 9),
U2 (overall 7; room 9, price 9, location 5, cleanliness 5), and U3 (overall 7; room 6,
price 6, location 8, cleanliness 8). If we recommend based on the
single-criterion rating only, all three users are considered
neighbors because all of them have an overall rating of 7 out
of 10 for hotel H2. Treating the three users as
neighbors despite the differences in their preferences will
reduce the accuracy of the recommendation.
In the MCRS, by contrast, when the choice of neighbors is based
on the rating of each item feature, users U1 and U2 are not
neighbors because they chose different ratings for the
hotel's features even though they have the same overall rating, so
the predicted rating of H4 for U1 will be 5. These
additional details of users' preferences regarding the item's
features help the RS to recommend items more accurately and
enhance RS performance in general.
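The effect of the criteria ratings on neighbor selection can be checked directly on the H1 to H3 ratings of Table 2. In the sketch below the ratings are copied from the table, while the use of Euclidean distance as the neighbor measure and the function names are assumptions made for illustration; the text above does not prescribe a particular distance.

```python
import math

# Ratings copied from Table 2: overall rating, then the four criteria
# ratings in the order (room, price, location, cleanliness).
TABLE_2 = {
    "U1": {"H1": (5, (2, 2, 8, 8)), "H2": (7, (5, 5, 9, 9)), "H3": (7, (5, 5, 9, 9))},
    "U2": {"H1": (5, (8, 8, 2, 2)), "H2": (7, (9, 9, 5, 5)), "H3": (7, (9, 9, 5, 5))},
    "U3": {"H1": (6, (3, 3, 9, 9)), "H2": (7, (6, 6, 8, 8)), "H3": (6, (4, 4, 8, 8))},
}


def overall_distance(a, b):
    """Euclidean distance using only the overall ratings (single criterion)."""
    return math.sqrt(sum(
        (TABLE_2[a][h][0] - TABLE_2[b][h][0]) ** 2 for h in TABLE_2[a]
    ))


def criteria_distance(a, b):
    """Euclidean distance over all criteria ratings of all hotels (multi-criteria)."""
    total = 0
    for h in TABLE_2[a]:
        for x, y in zip(TABLE_2[a][h][1], TABLE_2[b][h][1]):
            total += (x - y) ** 2
    return math.sqrt(total)


def nearest_neighbor(target, distance):
    """Return the other user with the smallest distance to the target."""
    others = [user for user in TABLE_2 if user != target]
    return min(others, key=lambda user: distance(target, user))


print(nearest_neighbor("U1", overall_distance))
print(nearest_neighbor("U1", criteria_distance))
```

On overall ratings alone, U2 is indistinguishable from U1 (distance 0), whereas on the criteria ratings U3 is much closer to U1 than U2 is, which matches the argument made in the text.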
MCRS has become a significant trend in RS research and
has gained the attention of both industry
and academia [11]. Numerous studies show that
MCRS achieves higher recommendation accuracy than
the single-rating RS [11], [40].
The item's criteria in MCRS are either explicitly
represented or implicitly represented in the user-generated
reviews. The next section will discuss these two types of
item criteria.
A. MULTI-CRITERIA RECOMMENDER SYSTEM USING
EXPLICIT USER PREFERENCES
In this type, the user gives ratings to each of the item's
features with or without the rating of the whole item. The
user’s preferences are known directly from the users’
ratings on the items’ features (explicitly stated). As an
example, Figure 2 shows two ratings for two hotels from
TripAdvisor; each rating contains an overall rating and
multiple criteria ratings, Hotel A contains 4 features/criteria
(location, cleanliness, service, and value) while hotel B
contains six features (cleanliness, dining, facilities,
location, rooms, and service).
FIGURE 2. Example of hotel rating from TripAdvisor. Panels: (A), (B). Figure text: The Lanai Langkawi Beach Resort.
A considerable body of research applies this type of
MCRS, as illustrated in the works of [1], [8], [40-42].
Additionally, several recent studies apply MCRS;
the following are three of them:
Wasid and Ali [43] proposed a multi-criteria RS
using a clustering approach. The main idea of this
approach is to find more similar neighbors of a user
within the user's cluster in order to improve the
recommendation set. First, the
users' preferences are extracted from the multi-criteria
ratings that they have given to items, and the user
cluster centers (C) are defined based on the extracted
preferences. Then, the Euclidean distance is used to
assign each user to the closest C, and the Mahalanobis
distance is used to compute the top-N neighbors of a
user within the same cluster. After that, the predicted
rating of an item for the user is computed from
the similar neighbors chosen from the
same cluster. The approach is evaluated on the Yahoo!
Movies dataset; users who have rated at
least 20 movies are chosen, yielding 484 users, 945
movies and 19,050 ratings. An experiment
compares the Mean Absolute Error (MAE) with
and without clustering. The results show
that the clustering method produces the best result,
with an MAE of 2.175.
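The two distance computations and the MAE metric mentioned in this summary can be sketched with NumPy as follows. This is not the authors' implementation: the preference vectors and cluster centers are hypothetical fixed values (the summary does not say how the centers are computed), the covariance matrix for the Mahalanobis distance is estimated from the members of the user's cluster, and a pseudo-inverse is used in case that matrix is singular.

```python
import numpy as np

# Hypothetical user preference vectors (one row per user, three criteria).
PREFS = np.array([
    [5.0, 4.0, 1.0],
    [4.5, 4.2, 1.5],
    [4.0, 5.0, 1.0],
    [4.8, 3.5, 2.0],
    [1.0, 2.0, 5.0],
    [1.5, 1.0, 4.5],
])
# Hypothetical cluster centers C, fixed here for illustration.
CENTERS = np.array([[4.5, 4.0, 1.5], [1.0, 1.5, 5.0]])


def assign_clusters(prefs, centers):
    """Assign each user to the closest center by Euclidean distance."""
    dists = np.linalg.norm(prefs[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def top_n_neighbors(prefs, labels, user, n=2):
    """Rank the other members of the user's cluster by Mahalanobis distance."""
    members = np.flatnonzero(labels == labels[user])
    # Assumption: covariance estimated from the cluster members; pinv
    # guards against a singular covariance matrix.
    cov_inv = np.linalg.pinv(np.cov(prefs[members], rowvar=False))
    others = members[members != user]
    diffs = prefs[others] - prefs[user]
    squared = np.einsum("ij,jk,ik->i", diffs, cov_inv, diffs)
    dists = np.sqrt(np.maximum(squared, 0.0))
    return others[np.argsort(dists)][:n].tolist()


def mean_absolute_error(actual, predicted):
    """MAE between two equally long sequences of ratings."""
    return float(np.mean(np.abs(np.asarray(actual) - np.asarray(predicted))))


labels = assign_clusters(PREFS, CENTERS)
print(labels.tolist())
print(top_n_neighbors(PREFS, labels, user=0, n=2))
print(mean_absolute_error([4, 2, 5], [3, 2, 3]))
```

Restricting the neighbor search to one cluster is what keeps the Mahalanobis computation cheap: the covariance matrix is estimated once per cluster rather than over all users.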
Zheng [44] developed a utility-based multi-criteria
recommender system in which items are
recommended to a user based on the utility of
each item for that user. The utility function is built
from the multi-criteria ratings as the similarity
between the vector of user evaluations and the vector
of user expectations (i.e. the higher the degree of
over-expectation, the higher the similarity between the
expectation and evaluation vectors).
Three similarity measures are used to calculate the
utility score (i.e. Pearson correlation, cosine similarity,
and Euclidean distance). The user expectations are
learned by three learning-to-rank optimization methods
(i.e. pointwise ranking, pairwise ranking, and listwise
ranking). The proposed method was evaluated
on two datasets, TripAdvisor and Yahoo!
Movies [45]. The TripAdvisor dataset contains 14,300 hotels,
1,502 users (only users with at least 10 ratings are chosen),
and 22,130 ratings including seven criteria ratings (i.e.
price, location, quality of rooms, cleanliness,
convenience of the hotel, service experience of
check-in, and particular business services). The Yahoo!
Movies dataset contains 2,162 users who have issued 62,739
ratings on 3,078 movies with four criteria (i.e.
story, direction, visual effects, and acting). The
developed method is compared with four baselines:
matrix factorization, the linear aggregation model [40],
the hybrid context model [46] and the criteria chain
model [47]. The proposed method outperforms the baselines in
terms of precision and NDCG; the Pearson
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2019.2954861, IEEE Access
correlation measure gives the best result among the
similarity measures, while listwise ranking performs
best among the ranking methods.
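The utility score described above, i.e. the similarity between a user's expectation vector and the evaluation vector of an item, can be sketched as follows. This is an illustration under stated assumptions rather than the method of [44]: the expectation vector is given instead of learned, the Euclidean distance is turned into a similarity with 1 / (1 + d), and all names and values are hypothetical.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def pearson_correlation(a, b):
    # Pearson correlation is the cosine similarity of the mean-centered vectors.
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])

def euclidean_similarity(a, b):
    # Assumption: a distance d is converted to a similarity with 1 / (1 + d).
    return 1.0 / (1.0 + math.dist(a, b))

def rank_items(expectation, evaluations, similarity):
    """Order items by the utility (similarity) between expectation and evaluation."""
    utility = {item: similarity(expectation, vec) for item, vec in evaluations.items()}
    return sorted(utility, key=utility.get, reverse=True)

# Hypothetical expectation and evaluation vectors over four criteria.
expectation = [4.0, 5.0, 3.0, 4.0]
evaluations = {
    "hotel_a": [4.0, 5.0, 3.0, 4.0],
    "hotel_b": [2.0, 1.0, 5.0, 3.0],
    "hotel_c": [5.0, 4.0, 3.0, 4.0],
}

ranking = rank_items(expectation, evaluations, pearson_correlation)
```

Swapping `pearson_correlation` for `cosine_similarity` or `euclidean_similarity` changes only the utility function, which mirrors how the three measures are compared in the text.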
Tallapally et al. [48] used a deep neural network
technique called stacked autoencoders to address the
shortcomings of single-rating RS by using
multi-criteria ratings. The conventional stacked
autoencoders are extended to fit multi-criteria
ratings by adding an extra layer that acts as the
input layer to the autoencoders. The input, which is
the multiple criteria ratings, is connected to an
intermediate layer that represents the items. The
intermediate layer is connected to N consecutive
encoding layers, where the latent representation of
each item is encoded. The last encoding layer is
connected to N consecutive decoding layers. The last
layer is the output layer, in which the items’ overall
rating is predicted. An experiment evaluates the
effectiveness of the proposed network on two
datasets: TripAdvisor and Yahoo! Movies (YM). For
TripAdvisor, users who rated at least five hotels and
hotels that have been rated by at least five users
are chosen (5-5), leading to 3,550 hotels rated by
3,160 users with 19,374 ratings. Similarly, the YM
dataset is divided into three subsets: YM 5-5,
YM 10-10 and YM 20-20. The proposed network is
compared with several baselines such as [49], [1] and
[47]. It outperforms all the compared baselines in
terms of MAE, F1, and both Good Items MAE and Good
Predicted Items MAE, which were introduced by
Cacheda et al. [50].
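The "5-5" style of dataset filtering mentioned above can be sketched as follows. The text does not say whether the filter in [48] is applied once or repeated until the dataset is stable; this sketch assumes the repeated form, and the function name and toy ratings are hypothetical.

```python
from collections import Counter

def k_core_filter(ratings, min_per_user, min_per_item):
    """Drop (user, item, rating) triples until every remaining user has at least
    min_per_user ratings and every remaining item has at least min_per_item."""
    current = list(ratings)
    while True:
        user_counts = Counter(u for u, _, _ in current)
        item_counts = Counter(i for _, i, _ in current)
        kept = [
            (u, i, r) for u, i, r in current
            if user_counts[u] >= min_per_user and item_counts[i] >= min_per_item
        ]
        if len(kept) == len(current):
            return kept
        current = kept

# Toy data filtered with a 2-2 threshold instead of 5-5 to keep it small.
ratings = [
    ("u1", "h1", 4), ("u1", "h2", 5),
    ("u2", "h1", 3), ("u2", "h2", 2),
    ("u3", "h1", 5), ("u3", "h3", 1),
]
filtered = k_core_filter(ratings, min_per_user=2, min_per_item=2)
```

Here the item h3 is removed first because it has a single rating, which in turn leaves user u3 with one rating, so u3 is removed in the next pass.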
B. MULTI-CRITERIA RECOMMENDER SYSTEM USING IMPLICIT USER PREFERENCES
In the first type of MCRS, users must rate every
feature of the item regardless of whether they are
interested in that feature. In the second type, by
contrast, users give opinions only on the item features
that interest them, by writing comments (i.e. reviews)
that express their feelings or opinions about their
experiences with the items. This type of approach is
claimed to be more accurate in determining the users'
preferences because users write exclusively about what
concerns them regarding the items. This, in turn,
enhances the accuracy of the RS, because the more
accurately the users’ preferences are determined, the
more accurate the recommendations provided to the user.
In this type, the criteria of the RS process are implicitly
represented and need to be extracted from the valuable
information in the user-generated reviews. Figure 3 shows
an example of a multi-criteria RS in which users’ reviews
are collected from TripAdvisor and the hotel’s criteria
(i.e. aspects) such as price, food, location, and bed are
extracted using sentiment analysis methods. These aspects
are used to build the rating matrix, and a hotel is then
recommended to a specific user based on the criteria
mentioned in his or her reviews. The extracted valuable
information can be summarized as the review elements. In
the following section, we explain the reviews, their
benefits in RS and the review elements. Then, we explore
various studies that have utilized this type of MCRS and
explain how the review elements enhanced the
recommendation process.
[Figure 3 panels: User Reviews from TripAdvisor; Hotel Aspects; Rating Matrix (rows U1..UM, columns R1..RN); Recommended Hotel. Legend: U = Users, R = Restaurant, N = Number of Restaurant, M = Number of Users.]
FIGURE 3. Multi-Criteria RS with Users Reviews
IV. USER-GENERATED REVIEWS
Recently, vast growth in e-commerce and social Websites
has been observed, and these Websites encourage users to
share their experiences with each other. Therefore,
there is a significant number of online comments (i.e.
reviews) about various topics such as hotels, products,
movies, restaurants, travel, and services, and they
continue to increase on a daily basis [15],[51]. These
reviews are valuable resources for users because they help
them make decisions before consuming or buying a
particular item. Such reviews may provide an overview of
the items or specific comments on certain features of the
items [7]. The reviews may also indicate users’
preferences. Many users are influenced by other customers'
reviews because these are considered more trustworthy
than the vendor's information [52]. This, in turn,
influences buying behavior and also helps vendors and
companies to manage and improve their products and
develop new ones based on the users' preferences, which
can be extracted from the written reviews [53],[54].
A user review exhibits distinct characteristics: it is
brief, prone to noise (i.e. misspellings, many hyperlinks
and possibly advertisements), written as plain text
without a standard structure or fixed rules, and may
contain emoticons. Users write reviews simply to describe
their experience with the item [15],[51]. Because of
these characteristics, most RS do not use reviews in
generating recommendations, since machines find written
natural language harder to comprehend than other,
structured data sources [25].
A. ANALYSING USER-GENERATED REVIEWS
Several fields are involved in processing textual
reviews and extracting valuable information from them,
such as natural language processing, text mining and
opinion mining (or sentiment analysis). In this survey,
we are mainly interested in combining sentiment analysis
with RS because sentiment analysis will
help determine the user’s preferences by analyzing
the sentiment behind his or her reviews. Sentiment
analysis is a discipline derived from artificial
intelligence, information retrieval, and natural language
processing. It focuses on predicting the positive or
negative polarity of a given entity. Sentiment analysis
usually works at three levels: document level, sentence
level, and aspect level.
Leung et al. [27] were the first to point out the
potential advantages of integrating sentiment analysis
with CF approaches to improve the accuracy of the RS by
calculating an inferred rating from users’ reviews when
the explicit rating is not available. They developed a
rating inference framework that consists of two parts.
The first part is rating inference, which calculates the
inferred rating from a user's reviews by extracting the
opinion words (OWs) from the reviews and aggregating the
sentiment polarity of these OWs. The second part is the
recommendation process using the CF approach, which
recommends items to users based on the calculated
inferred rating. An experiment is done to infer users'
ratings using the MovieLens-100k dataset, which contains
1,477 movies, 1,065 users (i.e. users with more than 10
reviews) and 30,000 reviews (i.e. reviews with
user-specified ratings). The work of Leung et al. [27] is
considered a hypothesis because the RS performance is not
evaluated after the inferred rating is calculated.
After Leung et al. [27], Aciar et al. [16] made the first
attempt to use user reviews in building an RS by
developing an ontology that converts the review content
into a structured form used to provide recommendations.
The ontology model is built manually with two main
components: opinion quality, which reflects the user’s
expertise regarding the product, and product quality,
which indicates the rating that the user gave to the
product features. Each review is considered an ontology
instance and is automatically mapped onto the ontology
through the mapping process. After all the reviews are
mapped onto the ontology, the product's overall
assessment (OA) score (i.e. the final score for the
product based on the estimation of each product feature)
is determined through a set of computations. Using OA,
the application recommends to the user the product that
has the highest OA based on the features mentioned in the
user’s request. The authors have not yet conducted an
empirical test to measure the performance of the proposed
RS. They claim that their application overcomes the cold
start problem in CF techniques. It is beyond the scope of
this paper to discuss the various studies that exploit
user reviews in providing recommendations. Interested
readers may refer to the review by Chen et al. [10].
B. ADVANTAGES OF USING USER-GENERATED REVIEWS IN RECOMMENDER SYSTEMS
Although users' reviews are difficult to process, RS can
benefit from them in major ways to enhance performance,
especially because reviews are broadly accessible over
the internet. The following are some of the advantages of
reviews [37], [55]-[56]:
a. Alleviate the data sparsity problem in the case of
missing ratings. Reviews provide valuable and
natural information about the user’s interests,
which can be extracted and inferred.
b. Relieve the cold start problem for either a new user
or a new item, which can be considered a special
instance of the sparsity problem. For users, there
are three cases: a user who enters the system for
the first time (totally new), a user who has not
made many ratings (limited experience), and a user
with incomplete (i.e. partial) preferences.
Similarly, new items are either totally new items
added to the system or items that have no ratings.
Reviews can mitigate this problem by providing
information that is used to improve recommendation,
as in the work of Wang et al. [57].
c. In the case of dense data, reviews still provide
valuable and detailed information that can be used
to enhance recommendation accuracy, for example to
check the rating quality (by comparing the user’s
star rating with the rating inferred from the review
text or from the review’s helpfulness) or to derive
users’ aspect preferences, whether context-dependent
or context-independent.
d. Reviews provide rich and useful information in
some domains like tourism and travel, where it is
difficult to express the user’s preferences as scalar
ratings or to collect numerical ratings for items.
e. Reviews help to construct both the user model and
the item model precisely because they contain much
finer-grained sentiment for the various features
of a single item.
C. REVIEW ELEMENTS
Having established the usefulness of reviews for improving
RS performance, we summarize the rich and valuable
information (called elements) that can be extracted from
users’ reviews as follows:
a. Total Review Polarity Score
A user’s overall opinion about an item, i.e. whether
he or she likes it or not (positive or negative
sentiment), can be inferred from the written review,
and this overall sentiment can be converted into an
implicit rating. The implicit rating (also called
virtual rating) is generated by extracting the opinion
words of the review (i.e. mostly adjectives or
adverbs), calculating the sentiment polarity of each
opinion word, and aggregating these polarities. For
example, the opinion words in the review in Figure 4
are newly, strategic, friendly,
helpful, spacious, nice, clean, tidy and thumbs up.
The total review polarity score is the sum of the
polarities of the extracted opinion words, which are
computed using either machine learning methods or
text mining methods. There is an implicit relationship
between the user’s rating and his or her written
comment [15]. As a result, the implicit rating takes
the role of the explicit rating (also called the
actual rating) when the explicit rating is not
available, as in [58]. When the explicit rating is
available, the implicit rating can be used either to
enhance the actual rating [59] or alongside it, so
that both ratings improve the performance of RSs
further, as illustrated in [21], [60].
b. Review Terms
Review terms are the words that occur frequently in
the reviews, and extracting them is the easiest way
to analyze users' reviews. For example, the terms in
the review in Figure 4 are hotel, location, staff,
room, and budget. The Term Frequency-Inverse Document
Frequency (TF-IDF) weighting scheme is the most widely
used statistical method for measuring the importance
of terms. In this case, items are recommended to a
user based on his or her term-based profile. Studies
that use this type of element, such as [20], [61],
report that it improves the RS performance.
c. Review Feature/Aspect/Topic
A review aspect can be defined as a concept that
depicts a topic of the item's domain and that must
exist for every item; each aspect consists of a set
of words (terms) (e.g., the terms "attitude, service,
waitress, waiter" correspond to the "Service" aspect).
Aspects consist of nouns or noun phrases that are
common in the domain being analyzed and must apply to
every item. In contrast, terms are the nouns that
occur most frequently in the reviews, and a term need
not appear for every item in the item set [7]. Some
researchers use the terminology of feature for
aspect, such as [62], [63], while others use topic,
such as [25]; all three terminologies (aspect,
feature, and topic) have the same meaning. The
identification of aspects is usually based on two
approaches: heuristic-based and model-based [64]. The
former approach starts from a set of manually selected
keywords (fixed aspects) and then searches for other
related terms by applying a clustering method [65]
and by calculating the relationship between the
aspect and the candidate terms [66, 67]. In the
latter approach, the aspects are extracted
automatically (denoted as learned aspects), and the
most popular model is Latent Dirichlet Allocation
(LDA) [68]. A comparison between fixed and learned
aspects is discussed in detail later. The review as a
whole gives a coarse-grained view of the user’s
preferences, while the review aspect gives a
fine-grained view. An aspect-based recommendation is
claimed to enhance the performance of RSs because it
can determine the specific preferences of the user
[9], [38, 69]. Although aspect extraction enhances
the accuracy of the RS, the number of extracted
aspects must be taken into consideration, because a
high number of candidate aspects negatively affects
the RS’s performance and leads to sparser data. As a
result, aspect selection is important and may
influence the performance of the RSs [7]. In the
example illustrated in Figure 4, three aspects occur:
location, staff, and room.
d. Review Context
Review context is the circumstance in which a user
expresses an opinion about the item or some feature
of the item. For example, in Figure 4 the context is
traveling for business. Like aspects, review contexts
are either pre-defined (fixed) contexts or learned
contexts that are extracted automatically. Contexts
can be discovered through rule-based reasoning,
keyword matching or a classifier such as an LDA-based
classifier [9]. The review context benefits the
recommendation performance either by being combined
with the explicit rating to predict the user’s rating
for an item in a specific context [64] or by being
used in user modeling, as proposed by Chen [70],
through context-dependent or context-independent
aspect preferences.
e. Review Comparative Words
A user sometimes expresses an opinion about an item
by comparing it with other items in terms of some
specific features. This type of element is called a
comparative opinion; it identifies whether item A is
superior or inferior to item B in some shared aspect.
The comparative words can be extracted using either
graph relations or a set of linguistic rules and then
used in RSs to enhance the items’ ranking quality, as
in [55], [71], [72]. In the review illustrated in
Figure 4, best is a comparative word used to
emphasize that the price of the hotel is the best
compared to others.
f. Review Emoticons
When writing a review, a user can reflect his or her
mood using symbolic representations of icons (faces)
(e.g., smile, joy, sadness, distress faces). Many
reviews contain such icons (41% of the
reviews contain emoticons [73]), which makes them
available for use in the recommendation process even
though they are harder to detect than other review
elements. Using these icons, we can infer whether the
user likes the item or not (overall rating) and then
use this information to recommend better items to
users, as in [74]. Additionally, these icons can be
aggregated with other review elements to enhance the
RS performance, as in [73].
g. Review Helpfulness
For every user review, readers can vote by clicking
the helpful button if they find the review useful.
These votes can be used in the RS to make better
predictions, especially for determining the rating’s
quality score, as in [37], [75]. In other words, the
more votes a review receives, the higher the quality
score assigned to its rating. For example, in Figure 4
the review helpfulness is equal to two, which means
that two users found this review helpful.
1. Total Review Polarity Score: yellow underline marks the sentiment
words for calculating the total score.
2. Term / Features: green lines show either a term (most frequent words)
or a feature (concept)
3. Context: blue lines
4. Comparative words: red lines
5. Helpful: brown lines.
FIGURE 4. Example of Review Elements
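To make the review terms element more concrete, the TF-IDF weighting mentioned for review terms can be sketched as follows. This is a minimal illustration, assuming one common variant (relative term frequency multiplied by the natural logarithm of N / document frequency); the surveyed works may use a different variant, and the tokenized toy reviews are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """documents: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(documents)
    # Document frequency: the number of documents in which each term appears.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical tokenized reviews (already lower-cased, stop words removed).
reviews = [
    ["hotel", "location", "staff", "room", "budget"],
    ["hotel", "room", "room", "breakfast"],
    ["hotel", "staff", "pool"],
]
weights = tf_idf(reviews)
```

A term such as "hotel" that occurs in every review receives a weight of zero, while a term that is specific to one review, such as "breakfast", receives a higher weight, which is why TF-IDF is used to measure term importance.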
D. ASPECTS TYPE
As mentioned in the previous section, there are two types
of aspects: fixed aspects and learned aspects. In the
fixed aspects type, experts manually define a fixed set of
aspects, such as food, price, atmosphere, and service for
the restaurant domain. In the learned aspects type,
methods are used to extract the aspects automatically
from the users’ reviews. Additionally, some researchers
identify fixed aspects at the beginning of their methods
and then search the users’ reviews for learned aspects
that are related to the fixed aspects, as in [63, 76].
Most researchers claim that learned aspects give better
recommendations than fixed aspects [60, 69, 77].
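The fixed aspects type can be sketched as a keyword lookup. This is a minimal illustration and not a method from the cited works: the "service" keywords reuse the example terms from Section IV.C, while the remaining keywords, the tokenization, and the function name are hypothetical.

```python
from collections import Counter

# Fixed aspects with manually selected seed keywords. The "service" entry reuses
# the example terms from the text; the other entries are hypothetical.
ASPECT_KEYWORDS = {
    "service": {"attitude", "service", "waitress", "waiter"},
    "food": {"food", "dish", "menu", "taste"},
    "price": {"price", "cheap", "expensive", "bill"},
}

def aspect_mentions(review_text):
    """Count how often each fixed aspect is mentioned in a review."""
    tokens = [t.strip(".,!?;:").lower() for t in review_text.split()]
    counts = Counter()
    for token in tokens:
        for aspect, keywords in ASPECT_KEYWORDS.items():
            if token in keywords:
                counts[aspect] += 1
    return counts

review = "The waiter was friendly and the service was quick, but the food was bland."
mentions = aspect_mentions(review)
```

A learned-aspect method would instead derive the keyword groups from the reviews themselves, for example with a topic model, rather than from a hand-written dictionary.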
Learned aspects are preferred over fixed aspects
for the following reasons:
a. The number of fixed features (catalog features) is
small. This, in turn, restricts the range of
estimating inter-item similarity at recommendation
time.
b. The static features in some domains are technical
in nature; as a result, it is hard to know the
significance of the feature similarities in practical
terms. For example, a camera item in the product
domain has the static features resolution, sensor
type, and price, while picture quality and beautiful
design are learned aspects that provide more details
about the camera and make item similarity easier to
find.
c. Aspects learned from the user's reviews show the
user's preferences more accurately than static ones,
because users write only about the aspects that
interest them, which makes their preferences easier
to identify.
d. Approaches that use static aspects sometimes fail
to provide recommendations compatible with the
user's preferences. For example, suppose service and
food are both fixed aspects for a specific
restaurant, and the user gives the service a 5/5
rating and the food a 2/5 rating. When the RS makes
a recommendation to this user, it will recommend
restaurants that have good service, but in fact the
user does not care about service and cares only
about food. Thus, the system is unable to propose a
suitable recommendation for the user.
e. A high number of item features produces better
results in recommendation scenarios because the
item is much better described than with fixed
features, which are typically few in number.
V. MULTI-CRITERIA REVIEW-BASED RECOMMENDATION APPROACH
A multi-criteria review-based RS uses user reviews to extract
the criteria that are used in the recommendation process.
These criteria are derived from the review elements
explained in the previous section, and they can be used
on their own or in combination with the actual users’ ratings.
The full cycle (stages) of the multi-criteria review-based RSs
is summarized in Figure 5.
Multi-criteria review-based RS has been applied in many
studies; each study combines the elements of the
user-generated review with the RS in its own way, with some
using just one review element and others combining more than
one. Recent studies in multi-criteria review-based RS are
presented below, grouped by the review elements discussed in
Section IV.
A. TOTAL REVIEW POLARITY SCORE
A user’s general opinion can be inferred from his written
review of an item. This overall sentiment can be converted
into an implicit rating. Most of the works that use total
FIGURE 5. The full cycle of the multi-criteria review-based RSs
review polarity scores rely mainly on the sentiment analysis
approach, whereby the total review polarity score is generated
by aggregating the scores of all the opinion words in the
review. However, the approaches vary slightly, mainly in which
opinion words they select from the reviews.
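The aggregation of opinion-word scores into a total review polarity score, and its conversion into an implicit rating, can be sketched as follows. This is a minimal illustration: the polarity lexicon is hypothetical, and mapping the mean polarity linearly onto a 1 to 5 scale is an assumption, since each surveyed work defines its own conversion.

```python
# Hypothetical polarity lexicon; real systems use a sentiment lexicon or a
# trained classifier to score opinion words.
POLARITY = {
    "friendly": 1.0, "helpful": 1.0, "spacious": 0.5, "nice": 0.5,
    "clean": 1.0, "tidy": 0.5, "dirty": -1.0, "noisy": -0.5, "rude": -1.0,
}

def opinion_scores(review_text):
    """Polarity of every opinion word found in the review."""
    tokens = (t.strip(".,!?;:").lower() for t in review_text.split())
    return [POLARITY[t] for t in tokens if t in POLARITY]

def total_polarity(review_text):
    """Total review polarity score: the sum of the opinion-word polarities."""
    return sum(opinion_scores(review_text))

def implicit_rating(review_text, low=1.0, high=5.0):
    """Assumed conversion: map the mean polarity from [-1, 1] onto [low, high]."""
    scores = opinion_scores(review_text)
    if not scores:
        return (low + high) / 2.0  # no opinion words: fall back to the midpoint
    mean = sum(scores) / len(scores)
    return low + (mean + 1.0) / 2.0 * (high - low)

positive = "Friendly and helpful staff, clean room."
negative = "Rude staff and a dirty, noisy room."
```

The implicit rating produced this way can stand in for a missing explicit rating or be combined with an existing one, as discussed for this review element in Section IV.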
The work of Pappas & Popescu [78] addresses the problem of
one-class CF, which refers to CF that deals with nothing but
positive explicit feedback.
The issue with the one-class CF problem is the identification
of negative instances. Therefore, this approach extracts
sentiment information from textual reviews and integrates it
with the nearest neighbor model into a sentiment-aware
nearest-neighbor model (SANN) by mapping the sentiment scores
according to the user’s ratings (likes or favorites). The
proposed approach consists of two steps. First, the polarity
of the user’s reviews is calculated using a rule-based
classifier [79], and the sum of the polarities of each
sentence is normalized. Second, the standard neighborhood
model is extended into a sentiment-aware nearest neighbor
approach using a mapping function (MF) that combines the
user’s ratings with the user polarities produced by the
rule-based classifier. Three MFs are defined: random mapping,
fixed mapping and learned mapping. To evaluate the approach,
three real datasets that contain both user ratings and
comments are used, namely Vimeo, TED, and Flickr, which are
popular sources for videos, lectures, and images respectively.
The proposed approach is compared with five baseline models:
Top popular, Nearest Neighbors (NN), Singular Value
Decomposition (SVD), Non-negative Matrix Factorization
(NMF), and Sparse Non-negative Matrix Factorization
(SNMF). Three performance measures are calculated using
5-fold cross-validation: the mean average precision (MAP),
the mean average recall (MAR) and the mean average F-measure
(MAF). The results show that the proposed approach
outperforms all the baseline models, which indicates that
there is an inherent relationship between user unary feedback
(likes or favorites) and the sentiment expressed in user
comments.
The approach of García-Cumbreras et al. [15] exploits the
pessimistic and optimistic behaviors of RS users. The idea is
to classify users into two classes (pessimist and optimist)
according to the average polarity of their reviews and then
add the user’s class as a new attribute to the CF algorithm.
Five experiments are performed using RapidMiner to test the
authors’ idea, as follows. The first experiment studies the
relation between the user’s rating and his or her reviews by
predicting the user’s rating from the reviews using the SVM
algorithm. The result shows an implicit relation between the
user’s rating and reviews, which indicates that user reviews
provide valuable information that can enhance RS performance.
In the second experiment, the rating prediction is calculated
by feeding the ratings derived from the users’ reviews into
the CF using the k Nearest Neighbor (kNN) algorithm for both
the user-based and item-based approaches. The user-based
approach outperforms the item-based approach. The authors
then try to enhance the rating prediction in the subsequent
experiments by adding user characteristics derived from
either rating behavior or sentiment analysis of the users’
reviews. In the third experiment, the rating prediction is
calculated using a new attribute called user classification.
The classification is based on the average of all the movie
ratings given by each user, whereby the pessimist class is for
users with average ratings < 4 and the optimist class is for
users with average ratings > 6. The results show that using
the user category in a rating prediction based on ratings
only (not reviews) slightly reduces the prediction error of
the ratings. Finally, the last experiment is similar to the
previous one except that the users are classified based on
the average polarity of their reviews and not their ratings.
The accuracy of the classification is 80%, which indicates
that users can be classified based on reviews only.
Additionally, the rating prediction is calculated in this
experiment by feeding the user class into the CF using kNN,
and the results outperform the conventional CF algorithms,
which indicates that user reviews can enhance RS performance.
To perform the experiments, a new corpus is created from the
Internet Movie Database (IMDb) using an automatic
extraction program that retrieves the user ratings and
reviews for each movie.
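The rating-based user classification used in the third experiment can be sketched as follows. The thresholds (average below 4 for pessimists, above 6 for optimists) come from the text; how users with an average between 4 and 6 are handled is not stated, so the "unclassified" label, the function name, and the toy ratings are assumptions.

```python
def classify_user(ratings, pessimist_below=4.0, optimist_above=6.0):
    """Label a user from the average of the ratings he or she has given."""
    if not ratings:
        return "unclassified"
    average = sum(ratings) / len(ratings)
    if average < pessimist_below:
        return "pessimist"
    if average > optimist_above:
        return "optimist"
    # Assumption: the text does not say how averages between the two
    # thresholds are treated, so they are left unclassified here.
    return "unclassified"

# Hypothetical movie ratings per user.
user_ratings = {
    "u1": [2, 3, 4],
    "u2": [8, 9, 7],
    "u3": [5, 5, 6],
}

# The class becomes an extra user attribute that a CF algorithm can consume.
user_class = {user: classify_user(r) for user, r in user_ratings.items()}
```

The last experiment follows the same pattern but derives the label from the average polarity of the user's reviews instead of the average rating.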
Zhang et al. [73] propose an algorithm to infer the
overall rating (or virtual rating) from users’ reviews by
aggregating the sentiments of the opinion words with the
emoticons that are also included in the reviews, in order to
mitigate the sparsity problem in RSs. The proposed algorithm
consists of two main steps: the first step is review
sentiment classification using the SElf-supervised, Lexicon
and Corpus-based (SELC) model to derive the virtual rating,
while the second step is item recommendation using user-based
and item-based CF algorithms. The SELC model combines the
unsupervised model with the semi-supervised model. Through
it, the overall sentiment score of each review is calculated
from two sets, the sentiment word set and the emoticon set,
by aggregating the scores of the words and emoticons that
occur in the target review. Experiments that compare
user-based, item-based and non-personalized popularity-based
approaches use two datasets: Youku (a Chinese Website), which
does not contain real ratings, and Amazon.com (book section),
which has real ratings. The results show that the user-based
CF outperforms both the item-based CF and the
non-personalized popularity-based approach in terms of
precision. An experiment on top-N recommendation shows that
the user-based CF that uses both real and virtual ratings
performs best in terms of precision. A unique feature of this
approach is the combination of user textual reviews and
emoticons, which exist in 41% of the users’ reviews.
Table 3 summarizes the main contributions of the
recommendation approaches that exploit total review polarity
score elements.
TABLE 3: SUMMARY OF RESEARCH THAT USES THE TOTAL REVIEW POLARITY SCORE ELEMENT
Research | Main Contribution
Pappas & Popescu [78] | Deals with one-class CF problems.
García-Cumbreras et al. [15] | Takes into account the optimistic and pessimistic users’ behaviors during ratings.
Zhang et al. [73] | Aggregates the opinion words with the emoticons to generate overall ratings.
B. REVIEW TERMS
Review terms are the words that frequently occur in a review.
The use of review terms is mainly found in the works of
D’Addio et al. [20], [60], and D’Addio & Manzato [7, 61],
which primarily use users’ reviews to produce an item
representation based on the overall sentiment regarding the
items’ features. The approach follows a four-step procedure:
text pre-processing, feature extraction, item representation
using sentiment analysis, and recommendation.
The text pre-processing step converts the unstructured user
reviews into a structured form from which features can be
extracted and then used to build a vector-based
representation for each item. The value at each position of
the vector represents the overall sentiment towards a
specific feature across all the reviews.
The feature extraction step is the main step and it is quite
different from the four types of research conducted by
D’Addio: At the beginning of their approach, the
Transductive Learning for Automatic Term Extraction
(TLATE) method proposed by Conrado et al. [80] is used for
extract the features in the works of both D’Addio et al. [60]
and D’Addio & Manzato [61]. Next, they develop two
techniques for feature extraction term-based and aspect-based
in the work of D’Addio & Manzato [7]. For the term-based
technique, the candidate features are extracted if they are
tagged as a singular or plural noun and their frequencies
exceed the threshold value. While in the aspect-based
technique the features are extracted after the process of
stemming using porter algorithm [81], stop words removal
and clustering. In the last work of D’Addio et al. [61], the
feature extraction is made more precise through extracting
terms and aspects using heuristic and machine learning.
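The term-based technique described above (keep nouns whose frequency exceeds a threshold) can be illustrated with a minimal sketch. It assumes the reviews have already been tokenized and tagged with Penn Treebank tags (NN for singular and NNS for plural nouns); the function name and the default threshold are hypothetical.

```python
from collections import Counter


def extract_term_features(tagged_reviews, min_count=2):
    """Return nouns whose corpus frequency exceeds min_count.

    tagged_reviews: list of reviews, each a list of (word, pos_tag) pairs.
    """
    counts = Counter(
        word.lower()
        for review in tagged_reviews
        for word, tag in review
        if tag in {"NN", "NNS"}  # singular or plural common nouns
    )
    return sorted(term for term, n in counts.items() if n > min_count)
```

With three short reviews in which "plot" appears three times and "actors" twice, a threshold of 2 keeps only "plot".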
In the item representation step, sentiment analysis is used to
generate the item vector from the features extracted in the
previous step; each position of the item vector holds the
score of one feature. The score is calculated using the
Stanford CoreNLP sentiment analysis tool, which is based on
the model of Socher et al. [82]. In the work of D’Addio &
Manzato [61], the score is calculated based on the feature’s
popularity among all the users.
The last step is the recommendation step, where item
neighborhood-based CF is used. The produced item vectors are
used instead of the items’ rating vectors to compute item
similarities; they are then fed into the item
neighborhood-based CF model, and the items with the highest
predicted ratings are recommended to the user.
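A minimal sketch of the recommendation step is given below. It assumes cosine similarity between item sentiment vectors and a similarity-weighted average over the k most similar items the user has rated; both choices, and all names, are illustrative assumptions rather than the exact configuration of the surveyed works.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_rating(user_ratings, item_vectors, target, k=2):
    """Predict a rating for `target` from the user's k most similar rated items.

    user_ratings: {item: rating}; item_vectors: {item: sentiment vector}.
    """
    neighbours = sorted(
        (
            (cosine(item_vectors[target], item_vectors[item]), rating)
            for item, rating in user_ratings.items()
            if item != target
        ),
        reverse=True,
    )[:k]
    # Keep only positively correlated neighbours (a simplifying assumption).
    neighbours = [(s, r) for s, r in neighbours if s > 0]
    weight = sum(s for s, _ in neighbours)
    if weight == 0:
        return None  # no usable neighbour
    return sum(s * r for s, r in neighbours) / weight
```

The only difference from rating-based item CF is the input to the similarity function: sentiment vectors built from reviews replace the columns of the rating matrix.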
An experiment is conducted to evaluate the proposed approach
in each work. For the works of both D’Addio et al. [20] and
D’Addio & Manzato [61], two databases are combined: the
MovieLens dataset and the Internet Movie Database (IMDb). The
results show that the proposed approach achieves better
Precision@N and MAP values than recommendations based only on
structured metadata. For the work of D’Addio & Manzato [7],
the proposed approach is tested on the MovieLens-100K
database (ML-100k). The results show that the term-based
technique gives better accuracy than the aspect-based
technique and that the proposed approach with either
technique outperforms the baselines (the approaches that use
structured metadata) in terms of Root Mean Square Error
(RMSE). Finally, the proposed approach of D’Addio et al. [60]
is tested on two databases, MovieLens-100K (ML-100k) and
MovieLens-2k (HetRec ML). The results outperform all the
results obtained from the compared traditional structured
metadata constructions used as
baselines in terms of RMSE. The feature extraction technique
based on terms using machine learning provides the best
results, since it yields a large set of features and thus
provides more detail about the items.
C. REVIEW FEATURE/ASPECT/TOPIC
The review aspect can be defined as a concept that describes
a topic within an item's domain and that is required to apply
to every item. Each aspect consists of a set of terms. Most
approaches under this category employ algorithms for aspect
extraction and subsequently identify terms and opinion words
associated with each aspect. Sentiment analysis is then
applied to identify the polarity of each aspect and scores or
ratings that have been allocated to the aspects. Some of the
works under this category are as follows.
An approach by Musto et al. [69] follows a two-step process.
The first step builds a framework that uses a non-symmetric
measure, the Kullback-Leibler divergence, to extract the
aspects; for each aspect, the sub-aspects are extracted using
the phraseness and informativeness measures proposed by
Tomokiyo & Hurst [83]. Subsequently, a sentiment score is
assigned to each main aspect and its sub-aspects using two
strategies: a model-based algorithm that utilizes the deep
learning method proposed by Socher et al. [82] and a
lexicon-based algorithm proposed by Musto et al. [84], which
is based on the AFINN wordlist created by Nielsen [85]. The
second step uses the extracted aspects to feed the
multi-criteria user-based and item-based CF algorithms. The
sentiment score resulting from the first step is treated as a
rating, and the similarity between two users (or items) is
calculated using the multi-dimensional Euclidean distance
[40]. Their experimental evaluation includes three datasets:
Yelp, TripAdvisor, and Amazon. The best performance is
achieved by the user-based CF with 10 aspects, except on the
Amazon dataset, where the best result is obtained with 50
aspects. The results of the proposed algorithm also outperform
all the single-criterion recommendation algorithms and the
algorithms based on matrix factorization in terms of mean
absolute error (MAE).
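The similarity based on the multi-dimensional Euclidean distance can be sketched as below. The sketch assumes that each user rates an item with a vector of aspect scores, that distances are averaged over co-rated items, and that the average distance d is turned into a similarity as 1 / (1 + d); the last conversion is a common convention adopted here as an assumption.

```python
import math


def multi_criteria_similarity(ratings_u, ratings_v):
    """Similarity between two users from their per-item aspect rating vectors.

    ratings_u, ratings_v: {item: [aspect scores]} with equal-length vectors.
    """
    common = ratings_u.keys() & ratings_v.keys()
    if not common:
        return 0.0  # no co-rated item, no evidence of similarity
    total = sum(math.dist(ratings_u[i], ratings_v[i]) for i in common)
    return 1.0 / (1.0 + total / len(common))
```

Two users with identical aspect scores on every co-rated item get a similarity of 1, and the value decays toward 0 as their aspect scores diverge.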
Akhtar et al. [86] present a technique for analyzing hotel
reviews and extracting valuable information from them to
help service providers and customers. The technique targets
users of the TripAdvisor website. Two types of information
are crawled and extracted from TripAdvisor: the review text
and the metadata. Each review is then classified into one of
the predefined categories, which are aspects that frequently
recur in the review data set. After that, the topic modeling
technique Latent Dirichlet Allocation (LDA) is applied to
reveal the hidden topics in the reviews. Finally, sentiment
analysis is performed using the SentiWordNet corpus to
calculate each review’s polarity by aggregating the positive
and negative words in the review. The experiment is carried
out on 78 reviews crawled for the Orchid Residency Hotel.
After all the previous processes are applied to the reviews,
a summary is produced showing the most positive, negative,
and neutral reviews. However, no evaluation result is
reported.
Bauman et al. [87] develop a recommendation method that
recommends to a user the items with the most valuable
aspects in order to enhance the user’s experience with those
items. The valuable aspects are identified using the
Sentiment Utility Logistic Model, which consists of two
parts. The first part extracts aspect-sentiment pairs: the
double propagation opinion parser proposed by Qiu et al. [88]
extracts aspects from user reviews, and a sentiment lexicon
created by Liu [89] classifies the aspect sentiment (i.e.,
positive, negative, or neutral). The second part predicts the
overall rating of a review by combining the sentiment values
of all the aspects extracted from the user’s review, and it
identifies the influence of each aspect on the overall
rating. After the aspects are identified and the overall
ratings are estimated, users’ and items’ profiles are created
and the recommendation process is treated as a classification
problem (i.e., the rating is classified as ‘like’ if the
estimated overall rating is 4 or 5, and ‘dislike’ if it is 1,
2, or 3). An experiment evaluates the developed method on the
Yelp dataset for three domains: restaurant, hotel, and beauty
& spa. The numbers of extracted aspects for the three domains
are 69, 42, and 45, respectively. The proposed method is
compared with three baseline approaches: the popular aspect
approach, the most positive aspect approach, and the most
negative aspect approach. The results show that the proposed
method outperforms the baseline approaches in terms of
Precision@3 and Area Under Curve (AUC) [12].
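The like/dislike thresholding and a precision-at-k evaluation can be sketched as follows. The threshold of 4 comes from the description above; the function names, the cutoff k = 3, and the toy data in the usage note are assumptions for illustration.

```python
def is_like(rating):
    """Ratings of 4 or 5 count as 'like'; 1, 2 or 3 count as 'dislike'."""
    return rating >= 4


def precision_at_k(ranked_items, liked_items, k=3):
    """Fraction of the top-k recommended items that the user actually likes."""
    if k <= 0:
        return 0.0
    top = ranked_items[:k]
    return sum(1 for item in top if item in liked_items) / k
```

For example, if a user's true ratings are a=5, b=3, c=4, d=1 and the system ranks the items as a, b, c, d, then two of the top three recommendations are liked and the precision at 3 is 2/3.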
Yang et al. [21] propose a similar approach that consists of
three main components: opinion mining, aspect weight
computing, and overall rating inference. The opinion mining
component extracts the aspects and opinion words from users’
reviews and then computes a rating for each extracted aspect.
Aspect extraction is done using the double propagation
method [88], which selects relationships of the Direct
Dependency type between aspect terms and opinion words, as
described by the dependency grammar of Tesnière [90]. The
aspect weight computing component uses a tensor factorization
approach to compute the aspect weight, which expresses the
user’s satisfaction with the aspect. The third component,
overall rating inference, uses the aspect ratings (user
opinions) from component one and the aspect weights (user
preferences) from component two to predict the overall rating
of an item that the user has not rated. Two datasets are used
for the experimental evaluation: a movie dataset collected
from the IMDb website and a hotel dataset provided by
Wang [66], in which the user review is associated with a
rating on seven fixed aspects. Two accuracy metrics, MAE and
RMSE, are computed, and the results are compared with two
baseline models (MF, which does not consider any text
reviews, and TF, which extracts users’ opinions and not only
aspect weights). The proposed framework achieves higher
accuracy than the baseline models on both datasets.
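To make the overall rating inference and the two accuracy metrics concrete, the sketch below combines aspect ratings and aspect weights as a normalized weighted sum and then computes MAE and RMSE. The weighted-sum form is an assumption chosen for illustration; the source only states that aspect ratings and aspect weights are combined.

```python
import math


def infer_overall(aspect_ratings, aspect_weights):
    """Weighted average of aspect ratings (assumed combination rule)."""
    total = sum(aspect_weights[a] for a in aspect_ratings)
    return sum(aspect_ratings[a] * aspect_weights[a] for a in aspect_ratings) / total


def mae(predicted, actual):
    """Mean absolute error between two equal-length rating lists."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean square error between two equal-length rating lists."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))
```

With aspect ratings room = 4 and service = 2 and weights 0.75 and 0.25, the inferred overall rating is 3.5. RMSE penalizes large individual errors more strongly than MAE, which is why the two metrics are usually reported together.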
Dong et al. [77] develop a CB approach that combines feature
similarity and feature sentiment to give high priority to
items that are similar to, and better than, the items in the
user’s query. The approach consists of three steps. The first
step extracts the product’s features from user-generated
reviews using the shallow NLP and statistical methods
proposed by Hu and Liu [91] and Justeson and Katz [92],
respectively. The second step identifies an opinion for each
extracted feature using the opinion pattern method proposed
by Moghaddam and Ester [93]. The third step generates
recommendations for the user based on the query Q. The
approach recommends items that are not only similar to Q but
also show a higher relative sentiment improvement, which is
captured in a score calculated for each product. The top-N
products with the highest scores are recommended to the user.
To evaluate the approach, data are extracted from Amazon.com
for six product domains, such as Phones, Tablets, and GPS, in
which each product has at least ten reviews. Two quality
measures are used: the rating benefit metric and query
product similarity. The former compares two sets of
recommendations based on their ratings, while the latter
computes the average similarity between the query product and
the given recommendations based on the extracted features.
The experimental results demonstrate significant benefits in
recommendation quality compared to Amazon’s recommendations.
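One way to combine similarity with relative sentiment improvement is sketched below. The scoring form, a convex combination of similarity and a rescaled "better" term controlled by a weight w, is an assumption made for illustration, as are the function names and the requirement that feature sentiments lie in [-1, 1].

```python
def better(sent_q, sent_c):
    """Mean sentiment improvement of candidate over query on shared features."""
    shared = sent_q.keys() & sent_c.keys()
    if not shared:
        return 0.0
    # Each difference is halved so the result stays in [-1, 1].
    return sum((sent_c[f] - sent_q[f]) / 2 for f in shared) / len(shared)


def score(similarity, sent_q, sent_c, w=0.5):
    """Blend similarity with the 'better' term rescaled from [-1, 1] to [0, 1]."""
    return (1 - w) * similarity + w * (better(sent_q, sent_c) + 1) / 2


def top_n(sent_q, candidates, n=2, w=0.5):
    """candidates: {name: (similarity to query, {feature: sentiment})}."""
    ranked = sorted(
        candidates,
        key=lambda c: score(candidates[c][0], sent_q, candidates[c][1], w),
        reverse=True,
    )
    return ranked[:n]
```

With w = 0 the ranking reduces to pure similarity; increasing w lets a slightly less similar product with better feature sentiment overtake a near-duplicate of the query.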
Wang et al. [57] focus on the problem of new users with
partial preferences. New users are usually associated with
the cold start problem in RSs. Thus, most RSs ask users to
indicate their preferences for some aspects or attributes of
the items. However, such preferences are usually incomplete
because of gaps in the user’s knowledge of the items. Wang et
al. [57] therefore use users’ reviews at the aspect opinion
level to predict the missing preferences. The approach
extracts the feature opinions from users’ reviews and maps
them to the item’s static attributes to predict the user’s
incomplete preferences [63, 65]. The sentiment polarity of
each opinionated feature is calculated using SentiWordNet
[94] and is subsequently mapped to the items’ static
attributes. The incomplete preferences of new users are then
inferred by calculating the similarities between the new
user’s preferences and those of like-minded reviewers. The
top-N items are recommended based on the new user’s
preferences. The proposed approach is evaluated on a dataset
collected from Amazon that contains 57 users (whose full
preferences are determined), 64 products (digital cameras)
with eight static attributes each, and 4904 reviews. To
simulate the missing preferences of a new user, partial
preferences are selected at random (i.e., 2, 4, or 6 of the
user’s attribute preferences). The proposed approach achieves
better recommendation accuracy than the four baselines used
in the evaluation: random, PopRank, PartialRank, and
HybridRank.
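The inference of missing preferences from like-minded reviewers can be sketched as a similarity-weighted average. The similarity function (inverse Euclidean distance over the attributes the new user has stated) and all names are assumptions for illustration, not the model of Wang et al. [57].

```python
def infer_missing(partial, reviewers, attributes):
    """Fill in a new user's missing attribute preferences.

    partial: {attribute: weight} stated by the new user.
    reviewers: list of complete {attribute: weight} profiles mined from reviews.
    attributes: all static attributes of the product domain.
    """
    if not reviewers:
        return dict(partial)

    def similarity(reviewer):
        # Compare only on the attributes the new user has stated.
        dist = sum((partial[a] - reviewer[a]) ** 2 for a in partial) ** 0.5
        return 1.0 / (1.0 + dist)

    weights = [similarity(r) for r in reviewers]
    total = sum(weights)
    completed = dict(partial)
    for attr in attributes:
        if attr not in completed:
            completed[attr] = sum(w * r[attr] for w, r in zip(weights, reviewers)) / total
    return completed
```

Stated preferences are never overwritten; only the gaps are filled, so the completed profile can be passed directly to a preference-based ranking of the products.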
Musat et al. [25] develop a method called topic profile
collaborative filtering (TPCF) to address the problems of data
sparsity and non-personalized ranking methods. TPCF works
as follows: a frequency-based technique is used to extract
the topics, which are then grouped by their synonyms using
WordNet synsets. Then, for each extracted topic, the relevant
opinion word and its polarity are identified by constructing
a set of relations, as in the work of de Marneffe et al.
[95], and by using the OpinionFinder proposed by Wilson [96].
Finally, the user’s topic profile is created from the
extracted topics and the scores generated from the polarities
of the opinion words. To recommend a product to the user, a
score is calculated for each product using the generated user
profile, and the product with the highest score is
recommended. The method is evaluated on a dataset collected
from the TripAdvisor website, and it outperforms the
baseline, a non-personalized product ranking method, in terms
of MAE and Kendall's tau rank correlation coefficient [97].
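Kendall's tau compares two rankings by counting concordant and discordant pairs. The sketch below implements the simple tau-a variant, in which tied pairs count as neither; whether the surveyed evaluation used this variant is not stated in the source, so it is an assumption.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a between two equal-length score or rank lists."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

A value of 1 means the predicted ranking orders every pair of products as the user does, -1 means every pair is reversed, and swapping two adjacent items in a list of three yields 1/3.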
The work of Chen & Chen [70] addresses the issue of context in
order to enhance personalized recommendation. They suggest
that people may have distinct aspect-level preferences in
different contexts. They propose an algorithm for contextual
recommendation that extracts the relationship between the
weight of each user’s preferences and the related context.
Both the user’s preferences and the contextual information
are extracted from the user’s reviews. Two types of
preferences are detected from the reviews: context-independent
and context-dependent. Both are then combined to generate
accurate recommendations. The former are not affected by
context and reflect the individual user’s requirements for
items that do not change over time; they are learned from the
user’s overall ratings and aspect opinions. The latter refer
to a user’s aspect-level requirements under a certain context.
Contextual opinions are extracted using a rule-based approach
and keyword matching. The authors experiment with mutual
information, chi-square statistics, and information gain
measures when assigning weights to the various aspects in
different contexts. Finally, the context-independent and
context-dependent preferences are combined to compute an
item’s matching score, and the top-N items with the highest
scores are recommended to the user. The proposed algorithm
has been tested on two datasets, TripAdvisor and Yelp, which
are restaurant datasets. They compare their algorithm with some
baseline methods such as Context Freer and Context Pre-filter;
the result of the proposed algorithm outperforms all the
compared methods in terms of the Hit Ratio and Mean
Reciprocal Rank. Additionally, the chi-square statistic
method generates the best results compared to the other two
contextual weighting methods.
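The two ranking metrics mentioned above can be computed as in the following sketch. It assumes one held-out target item per test case and a ranked recommendation list per case; the function names and the default cutoff are illustrative.

```python
def hit_ratio(recommendations, targets, n=10):
    """Share of test cases whose target item appears in the top-n list."""
    hits = sum(1 for recs, t in zip(recommendations, targets) if t in recs[:n])
    return hits / len(targets)


def mean_reciprocal_rank(recommendations, targets):
    """Average of 1/rank of the target item; a missing target contributes 0."""
    total = 0.0
    for recs, t in zip(recommendations, targets):
        if t in recs:
            total += 1.0 / (recs.index(t) + 1)
    return total / len(targets)
```

Hit Ratio only checks whether the target is retrieved, whereas Mean Reciprocal Rank also rewards placing it near the top of the list.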
Jamroonsilp & Prompoon [55] present an approach for item
ranking based on the analysis of users’ reviews. Five
pre-defined aspects are specified for software items, and the
software ranking is calculated by analyzing the comparative
sentences in users’ reviews for each software aspect. The
approach consists of three phases: gathering user reviews,
analyzing the gathered reviews, and calculating the software
ranking. In the first phase, users’ reviews are collected
through the Google Custom Search API for three software
topics: database management systems, PHP web application
frameworks, and content management systems. In the second
phase, each quality term (aspect) mentioned in a user’s
review is classified as one of the five pre-defined aspects
based on the classes given by Coallier [98] and Mairiza et
al. [99]; in addition, a score is assigned to the quality
term using the lexicon created by Hu & Liu [91]. This is
followed by the extraction of the comparative relation, and a
polarity score is assigned to the relation between the two
software products and the compared quality term mentioned in
the user’s review. Finally, in the third phase, the overall
software score is calculated from all the quality aspect
scores and the relation scores calculated in the previous
phase. The approach is evaluated on the dataset collected in
the first phase using Pearson’s correlation coefficient, and
it is compared with a human expert and with the work of Zhang
et al. [71]. It achieves a high Pearson's correlation
coefficient of 0.935, which indicates that the software
ranking is statistically consistent with the human experts’
rankings and better than Zhang’s approach.
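Pearson's correlation coefficient between system scores and expert scores can be computed as in this short sketch. The function name and the toy inputs in the usage note are illustrative and are not taken from the evaluation data.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (std_x * std_y)
```

For instance, system scores [1, 2, 3, 4] against expert scores [1, 3, 2, 4] give a coefficient of 0.8: the two middle items are swapped, but the overall ordering still agrees.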
Zhang et al. [37] propose an approach that exploits the
aspect-level sentiment of users’ reviews with the support of
review helpfulness. The approach consists of four phases. The
first phase extracts aspects using a latent Dirichlet
allocation model, and the words with the highest conditional
probability are chosen as aspects. The second phase
determines the sentiment orientation of each extracted aspect
using the sentiment lexicon SentiWordNet. In the third phase,
the item model and the user model are created from the
extracted aspects and their sentiment orientation. The item
model is represented as a vector of the aspects that appear
in the product’s reviews, with review helpfulness used to
weight the related aspects. The user model is represented as
a vector of the aspects that frequently occur in the user’s
reviews. In the last phase, the recommendation phase, a score
for each pair of user and candidate item is calculated by
multiplying the user vector by the item vector, and the items
with the top-k scores are recommended to the user. An
experiment conducted on the Yelp dataset (i.e., restaurant
domain) evaluates the proposed approach. The approach is
compared with two baseline methods (CF based on matrix
factorization and a popularity-based approach), and it
outperforms both in terms of mean reciprocal rank
(MRP).
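The recommendation phase reduces to a dot product between a user vector and each item vector, followed by a top-k selection. The sketch below represents both vectors as sparse dictionaries keyed by aspect; the names and the toy values in the usage note are assumptions for illustration.

```python
def recommend_top_k(user_vector, item_vectors, k=2):
    """Rank items by the dot product of user and item aspect vectors.

    user_vector: {aspect: importance to the user}.
    item_vectors: {item: {aspect: helpfulness-weighted sentiment}}.
    """
    def score(item_vector):
        # Aspects missing from an item's reviews contribute nothing.
        return sum(w * item_vector.get(a, 0.0) for a, w in user_vector.items())

    ranked = sorted(item_vectors, key=lambda name: score(item_vectors[name]), reverse=True)
    return ranked[:k]
```

A user who weights food at 0.7 and service at 0.3 would prefer a restaurant scoring 0.5 on both aspects (score 0.5) over one with excellent food but poor service (1.0 and -1.0, score 0.4).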
Table 4 provides a summary of all the approaches discussed
above that use aspect elements in review-based
recommendation.
TABLE 4: SUMMARY OF RESEARCHES THAT USE FEATURE/ASPECT/TOPIC ELEMENT

| Research | Main Contribution |
| --- | --- |
| Musto et al. [69] | Feeds the CF with the aspect and the sub-aspects. |
| Akhtar et al. [86] | Uses the aspects as categories and classifies users based on the categories. |
| Bauman et al. [87] | Identifies the most valuable aspects and recommends items to user based on them. |
| Yang et al. [21] | Integrates users preferences and opinions on different aspects into the CF algorithm. |
| Dong et al. [77] | Combines feature similarity with feature sentiment to recommend the high prioritized items that are similar and better than the items in the user’s query. |
| Wang et al. [57] | Solves the new user problem with partial preferences through using users reviews at the aspect opinion levels of the items to predict the missing preferences. |
| Musat et al. [25] | Creates users profile based on the extracted topics and using them to calculate the product’s score to be used in the recommendation process. |
| Chen & Chen [70] | Combines between two review elements (aspect and context) to enhance recommendations. |
| Jamroonsilp & Prompoon [55] | Analyzes the users’ comparative sentences from user reviews for each software aspects to calculate the software ranking. |
| Zhang et al. [37] | Extracts aspects from user reviews and assigns weights for them by using the helpfulness element to develop user and item model. |
Table 5 provides a summary of all the 28 surveyed approaches
categorized as multi-criteria review-based recommender
systems.
TABLE 5: MULTI-CRITERIA REVIEW-BASED RECOMMENDER SYSTEMS
Columns:
Research
Approach: Content Based | Collaborative Filtering | Preference Based | Hybrid
Review Analysis Method: Statistical | Machine Learning
Sentiment Analysis Technique: Lexicon / Dictionary for Sentiment
Review Elements
Rating Type: Explicit | Implicit
Profile Type: Rating matrix | User | Item
Recommendation using: Overall rating | Preferences rating
Evaluation of Data Set: Data Set | Domain | Size (Item, Users, Rating, Reviews)
Performance measure
(Rafailidis
&
Crestani
2019)
[100]
Not used
Total
Review
polarity
scores
Amazon
Beauty
For all the dataset Domains
MSE=
1.311
Health
1.203
Cell
Phone
1.382
Clothing
N/A
N/A
1075
4316
82353
4
1.163
(Cheng et
al. 2019)
[101]
Not used
Aspect
Yelp
Local
business
1416
7109
N/A
13501
5
RMSE =
0.9562
Amazon
Beauty
1210
1
22363
N/A
1985
02
1.0613
CDs
6442
1
75258
N/A
1097
597
0.9250
Phone
1042
9
27879
N/A
1943
9
1.1456
Clothing
2303
3
39387
N/A
2786
77
1.0118
Movies
5005
2
12396
0
N/A
1697
533
1.0091
(Pappas
and
Popescu-
Belis 2016)
[78]
MPQA
polarity
lexicon 2
Total
Review
polarity
scores
TED
Talks
1203
4961
11324
1
35229
MAP = 6.10
MAR= 22.73
MAF = 9.63
Vimeo
Videos
2000
7071
15520
7
32639
MAP = 5.48
MAR= 16.92
MAF = 8.28
Flickr
Images
1994
9963
161398
30456
4
MAP= 15.26
MAR= 56.75
MAF = 24.05
(Musto et
al. 2017)
[69]
Stanford
CoreNLP3 &
a lexicon
based on
AFINN
wordlist 4
Learned
Aspect
Yelp
restaurant
45981
11537
N/A
22990
6
#asp=10
MAE= 0.8362
TripAdv
isor
hotel
536952
3945
N/A
79695
8
#asp=50
MAE= 0.7111
Amazon
product
826773
50210
N/A
132475
9
#asp=10
MAE= 0.6276
(Yang et
al. 2016)
[21]
Seed opinion
lexicon 5
Learned
Aspect
IMDb
Movies
1507
879
N/A
41128
MAE= 1.204
RMSE=1.647
[66]
dataset
Hotel
1850
64215
N/A
81085
#asp =8
MAE= 1.134
RMSE=1.468
(Cheng et
al. 2018)
[38]
Not Used
Learned
Aspects
19
DataSe
t from
Amazo
n and
Yelp
Product
&
restaura
nt
For Yelp only
K=5
RMSE= 1.155
63300
169257
165967
8
N/A
(Wang et
al. 2013)
[57]
SentiWordNet
6
Learned
aspects
Amazon
Products
[digital
camera
]
64
57
N/A
4909
Hit ratio =
0.2807
±
0.037
2
With given 2
attributes
(Musat et
al. 2013)
[25]
OpinionFinder
[96]
Learned
Topics
TripAd
visor
Hotel
216
59067
N/A
68049
reduce the
MAE error
by 8% when
no of
reviews = 15
(Zhang
et al.
2013)
[73]
HowNet
Sentiment
Dictionary
(Chinese
dictionary)
-Total
Review
polarity
Scores
- Emoticons
Youku
movies
1085
6450
N/A
10813
7
K=10
precision
U-U =5.1%
I-I
=4.7%
Amazon
book
1805
5502
N/A
31873
0
K=10
precision
U-U =5.6%
K=20
precision
I-I =5.2%
(Bauman
et al.
2017)
[87]
sentiment
lexicon [89]
Learned
aspects
Yelp
restaurant
N/A
23209
N/A
60211
2
=0.818
AUC=0.707
Yelp
Hotel
N/A
352
N/A
5669
=0.849
AUC=0.745
Yelp
Beauty
&Spa
N/A
349
N/A
5065
p3 =0.862
AUC=0.663
(Jamroons
ilp &
Prompoon
2013)
[55]
SentiWordNet
6
-Fixed
aspects
-comparative
words
Google
Software
105
N/A
N/A
3542
Pearson's
correlation
coefficient =
0.935
(D’Addio
et al.
2017)
[60]
Stanford
CoreNLP
[102]
-Learned
Aspects
Or
-Terms
Movie
Lens-
100K
&
IMDb
Movies
1682
943
100000
15863
Terms with k=20
RMSE=0.930
2
=0.1043
MAP =0.0658
RMSE=0.931
0
=0.1041
MAP =0.0656
Stanford
CoreNLP
[102]
-Learned
Aspects
Or
-Terms
Movie
Lens-
2K
(HetRec
ML)
&
IMDb
Movies
10197
2113
855598
656031
RMSE=0.796
4
=0.1047
MAP =0.0258
RMSE=
0.8025
=0.1057
MAP = 0.0256
(Zhang
et al.
2015)
[37]
SentiWordNet
6
-Learned
aspects
-helpfulness
Yelp
restaurant
N/A
30000
N/A
60000
MRP=0.627
(D'Addio
&
Manzato
2015)
[7]
Stanford
CoreNLP
sentiment
analysis
tool
-Learned
Aspects or
-Terms
Movie
Lens-
100K
&
IMDb
Movies
1682
943
100000
15863
Using term
CF
RMSE=
0.93106
(Ebadi &
Krzyzak
2016)
[3]
Not Used
Polarity score
is predicted
learned
Aspects
The detected
keywords list
is manually
refined
TripAd
visor
Hotel
4333
148429
N/A
87856
1
MSE=1.22
MAE=0.78
RMSE=1.1
(Wang &
Chen
2012)
[63]
SentiWordNet
Fix Feature
heuristic
www.
buzzilli
ons.co
m
digital
camera
186
3754
N/A
7485
9
5
9
Percentile=0.69
7
(Chen
&Chen
2015)
[64]
opinion
lexicon[79]
-Fix Context
-Fix Aspect
heuristic
Yelp
Restaurant
11485
23152
N/A
23707
7
TripAd
visor
Hotel
11405
30039
N/A
35711
3
(D'Addio
et al.
2014)
[20]
Stanford
CoreNLP
sentiment
analysis
tool
Terms
Movie
Lens-
100K
&
IMDb
Movies
1682
943
100000
15851
K=57
0,05991
MAP=
0,04764
(Dong et
al. 2013)
[62]
opinion
pattern
method [93]
Learned
feature
Amazon
Electro
nic
Product
1000
599
N/A
42482
Mean
relative rank
achieve a
rank
improvement
of only about
4% (Jaccard)
and 9%
for Cosine
(D'Addio
&
Manzato
2014)
[61]
Stanford
CoreNLP
sentiment
analysis
tool
Terms
Movie
Lens-
100K
&
IMDb
Movies
1682
943
100000
15851
Threshold 30
0.04623
MAP=
0.03684
(García-
Cumbreras
et al. 2013)
[15]
WeFeelFine1
[103]
Total
Review
polarity
scores
IMDb
Movies
2713
4112
N/A
80848
RMSE= 2.14
MAE = 1.63
(Wang et
al. 2018)
[6]
lexicon is built
for movies
Terms