A customized non-exclusive clustering algorithm for news recommendation systems

Clustering is one of the main tasks in machine learning and data mining and is utilized in many applications, including news recommendation systems. In this paper, we propose a new non-exclusive clustering algorithm named Ordered Clustering (OC) with the aim of increasing the accuracy of news recommendation for online users. The basis of OC is a new initialization technique that groups news items into clusters based on the highest similarities between news items, accommodating the nature of news, in which a news item can belong to different categories. Hence, in OC, multiple memberships in clusters are allowed. An experiment is carried out using a real dataset collected from news websites. The experimental results demonstrate that OC outperforms the k-means algorithm with respect to Precision, Recall, and F1-score.


Introduction
Clustering is one of the main tasks in machine learning and data mining and has been widely applied in the fields of time series prediction, recommendation, and parameter estimation [1,2]. The items with the highest similarities are grouped together in the same cluster, and those with considerable dissimilarity are grouped into different clusters [3,4]. In news recommendation systems, scalability is one of the issues that requires delicate algorithms to effectively deal with huge amounts of news articles [5]. To address the scalability issue, several strategies can be used, such as MinHash [6] and clustering algorithms. The most commonly used clustering algorithms in recommendation systems are hierarchical clustering [5,7] and k-means [8,9]. Nevertheless, these clustering algorithms do not take the nature of news into consideration when clustering news items. Consequently, a news item will belong to only a single cluster while, in reality, a news item can be categorized under more than one news category. Moreover, it is obvious that users' interests are not limited to one news category.
Hence, the clustering algorithm to be employed for news clustering should be able to cluster news items without limiting their membership to a single cluster. It is also important to ensure that the clustering algorithm does not add any additional complexity. It is impossible for exclusive clustering approaches to cluster news items into multiple clusters at once. On the other hand, fuzzy clustering approaches are very complicated without offering any considerable improvement [10].
In this paper, an efficient clustering algorithm named Ordered Clustering (OC) is proposed for news clustering. It is based on the nature of news, in which a news item can belong to more than one category, with the aim of achieving accurate and diverse recommendations. Our algorithm considers the highest peer-to-peer item similarities and groups the items into multiple clusters.
The rest of this paper is organized as follows. The related works are presented in the following section, followed by a description of the proposed clustering algorithm. The experimental evaluations are then presented, followed by a summary of this research work.

Related Works
Scalability is one of the issues in news recommendation that requires effective algorithms to deal with a large news corpus. One of the common strategies used for solving scalability is clustering. In news recommendation systems, news retrieval is performed based on the user's access pattern in news reading, and the news content is compared to the content of the news the user has already read. Selecting a suitable clustering algorithm is essential for achieving reasonable results. Some extensively utilized clustering algorithms in news recommendation systems are reviewed as follows.

Locality Sensitive Hashing (LSH)
The Locality Sensitive Hashing (LSH) technique [11] was introduced to answer the near-neighbor search problem. Since then, many applications utilizing LSH have been found in numerous fields [12]. The main idea of the LSH technique is to use several hash functions to hash the data points such that, for each hash function, the probability of collision is much higher for items that are near to each other than for those that are far apart. Near neighbors can then be determined by hashing the query point and retrieving the elements stored in the buckets containing that point. LSH schemes are known to exist for the following similarity or dissimilarity (distance) measures: Jaccard's coefficient [13,14], Hamming norm [15], Earth Mover's Distance (EMD), and cosine distance [16].
Min-Hash (Min-wise Independent Permutations) is an LSH scheme first introduced by Cohen [14]. It is a probabilistic clustering method that places a pair of users in the same cluster with a probability proportional to the overlap between the sets of news items these users have accessed. A given user u_i is represented by the set of news items that u_i has read, derived from his/her click history C_{u_i}. The similarity ratio S(u_i, u_j) between two users u_i and u_j is defined as the Jaccard coefficient of the intersection of their news sets, a value between 0 and 1, and the distance function is defined as D(u_i, u_j) = 1 − S(u_i, u_j) [16]. Min-Hash uses a simple pruning technique based on a hash table; however, simply retrieving the users who have read at least one common news item does not reduce the number of candidates to a manageable size, owing to the presence of popular news stories.
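To make the Min-Hash idea concrete, the following is a minimal, illustrative Python sketch (function names and the universal-hash construction are our own, not from the paper): each user's set of read item ids is reduced to a short signature, and the fraction of agreeing signature positions estimates the Jaccard similarity.

```python
import random

def minhash_signature(item_set, num_hashes=200, seed=0):
    """MinHash sketch of a set of integer item ids.

    For each of num_hashes random hash functions h(x) = (a*x + b) % p,
    keep the minimum value over the set. Two sets with Jaccard
    similarity s agree on any signature position with probability s.
    """
    rng = random.Random(seed)
    p = 2_147_483_647  # large prime for universal hashing
    params = [(rng.randrange(1, p), rng.randrange(p)) for _ in range(num_hashes)]
    return [min((a * x + b) % p for x in item_set) for a, b in params]

def jaccard(s1, s2):
    """Exact Jaccard coefficient |s1 ∩ s2| / |s1 ∪ s2|."""
    return len(s1 & s2) / len(s1 | s2)

def estimated_jaccard(sig1, sig2):
    """Fraction of agreeing signature positions approximates Jaccard."""
    return sum(a == b for a, b in zip(sig1, sig2)) / len(sig1)
```

Two users sharing a third of their read items will agree on roughly a third of their signature positions, so users can be bucketed by signature bands instead of being compared pairwise against everyone.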

Probabilistic Latent Semantic Indexing (PLSI)
To conduct collaborative filtering, Probabilistic Latent Semantic Indexing (PLSI) models were developed by Hofmann [17]. Accordingly, news items (n ∈ N) and users (u ∈ U) are modeled as random variables taking their values from the spaces of all possible news and users. The joint distribution of news and users is modeled to learn the relationship between them. To capture this relationship, a hidden variable Z with values z ∈ Z is introduced, where ‖Z‖ = k represents the number of news categories and user communities. This model can be formally written as the mixture model in Equation (1):

p(n|u) = Σ_{z∈Z} p(n|z) p(z|u).    (1)

The Conditional Probability Distributions (CPDs) p(n|z) and p(z|u) are captured by the parameter θ, through which the model is completely specified. The model introduces the latent variable Z, leading to the conditional independence of users and items. In this generative model, a state z of the latent variable Z is selected for a random user u according to CPD p(z|u); the item n is then sampled based on the selected z from CPD p(n|z).
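The mixture in Equation (1) is a sum over topics, which amounts to a matrix product of the two CPD tables. A small numpy sketch with made-up dimensions and random CPDs (fitting θ via EM is not shown):

```python
import numpy as np

rng = np.random.default_rng(0)
num_news, num_users, num_topics = 6, 4, 2  # illustrative sizes

# p(n|z): each column is a distribution over news items for one topic z.
p_n_given_z = rng.random((num_news, num_topics))
p_n_given_z /= p_n_given_z.sum(axis=0)

# p(z|u): each column is a distribution over topics for one user u.
p_z_given_u = rng.random((num_topics, num_users))
p_z_given_u /= p_z_given_u.sum(axis=0)

# Equation (1): p(n|u) = sum over z of p(n|z) * p(z|u),
# i.e. a matrix product of the two CPD tables.
p_n_given_u = p_n_given_z @ p_z_given_u
```

Because each column of both CPD tables is a proper distribution, each column of p(n|u) is also a distribution over news items for that user.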

Hierarchical Clustering
A hierarchical clustering algorithm [18] partitions data items into a tree of clusters. Hierarchical clustering methods are categorized as either divisive or agglomerative, depending on whether the hierarchical decomposition is formed in a splitting (top-down) or merging (bottom-up) fashion. Hierarchical clustering suffers from its inability to make adjustments once a split or merge decision has been performed: if a particular split or merge later turns out to have been a poor choice, the method cannot backtrack and correct it. Recent research studies have therefore emphasized the integration of hierarchical agglomeration with iterative relocation methods.
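A minimal pure-Python sketch of the agglomerative (bottom-up) variant with single linkage illustrates both the merging process and the no-backtracking limitation noted above (function names are our own):

```python
import math

def agglomerative(points, num_clusters, dist=math.dist):
    """Bottom-up clustering: start from singleton clusters and keep
    merging the closest pair of clusters (single linkage) until only
    num_clusters clusters remain. Note there is no backtracking: a
    merge, once made, is never undone."""
    clusters = [[p] for p in points]
    while len(clusters) > num_clusters:
        best = None  # (distance, i, j) of the closest cluster pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge cluster j into cluster i
        del clusters[j]
    return clusters
```

Running it on two well-separated groups of points recovers the two groups; the divisive variant would instead start from one all-inclusive cluster and split top-down.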

K-means
The k-means algorithm [18] takes k as an input parameter and partitions a set of n data items into k clusters so that the resulting intra-cluster similarity is high while the inter-cluster similarity is low. Cluster similarity is computed with respect to the mean value of the items in a cluster, which is viewed as the cluster's center or centroid. The k-means algorithm performs as follows. First, it arbitrarily selects k of the data items, each of which initially represents a cluster center or mean. Each remaining data item is assigned to the cluster to which it has the highest similarity, based on the distance between the cluster center and the data item. The algorithm then computes the new centroid of each cluster. This process repeats until the criterion function converges. Commonly, the square-error criterion is utilized, defined as

E = Σ_{i=1}^{k} Σ_{p ∈ C_i} |p − m_i|²,    (2)

where E is the sum of the square error over all data items in the dataset, m_i is the mean of cluster C_i, and p is the point in space representing a given data item (both m_i and p are multidimensional). That is, for each data item in each cluster, the distance from the data item to its cluster mean is squared, and these distances are summed. This criterion attempts to make the resulting k clusters as separate and as compact as possible.
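As a sketch, the assign/update loop and the square-error criterion can be written in a few lines of numpy (the random initialization and function names are our own; libraries such as scikit-learn provide production implementations):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Plain k-means: pick k items as initial centers, assign every
    item to its nearest center, recompute centroids, and repeat
    until the centers stop moving."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its old center).
        new_centers = np.array([X[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Square-error criterion: sum of squared distances to cluster means.
    E = sum(((X[labels == j] - centers[j]) ** 2).sum() for j in range(k))
    return labels, centers, E
```

On two compact, well-separated groups of points, the criterion E converges to a small value; note that each item receives exactly one label, which is the exclusive-membership property the proposed OC algorithm avoids.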

Application of Clustering Algorithm in News Recommendation System
PENERATE [7] used a group-based hierarchical clustering method. This approach first separates user items into different sets based on their historical behaviors, and each user item might be allocated to a number of sets. In SCENE [5], LSH [15] and hierarchical clustering are integrated to address the scalability issue of news recommendation. Initially, the recently published news items are partitioned into small sets based on the news content using LSH, and a 2-layer hierarchical clustering is employed in the next step. The leaf nodes indicate the sets accompanied by their topic distributions, and the inner nodes hold pairs of news sets representing more common news topics. Google News is a Collaborative Filtering (CF) based personalized news recommendation system [19]. News recommendations are generated using three approaches, namely: Min-Hash clustering, PLSI, and co-visitation counts of the news items. CCNS is a vertical news recommendation system that focuses on helping users find their preferred news in a specific field and utilizes an adjusted k-means for clustering users [8]. Table 1 presents a brief comparison among the aforementioned clustering approaches.

The Proposed Algorithm
In this paper, a new non-exclusive clustering algorithm called Ordered Clustering (OC) is designed. To examine OC, a three-phase approach is designed as shown in Figure 1. These phases are: conversion of historical data into a User Click Behavior (UCB) matrix, Similarity Matrix (SM) construction, and user clustering. Each record of the historical data is a triple <u_i, n_j, read-time>, where u_i denotes the ith user, n_j represents the jth news item, and read-time stands for the time at which the user accessed the news item. The entry of UCB is 1 if user u_i has accessed news item n_j and 0 otherwise. Table 2 presents an instance of a UCB binary matrix.
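The conversion of raw <u_i, n_j, read-time> records into the binary UCB matrix can be sketched as follows (pure Python; names are illustrative, not from the paper):

```python
def build_ucb(records, users, news_items):
    """Build the binary User Click Behavior matrix from triples
    (user, news item, read time): ucb[i][j] = 1 iff user i read item j.
    The read time is ignored here; only the click itself matters."""
    u_idx = {u: i for i, u in enumerate(users)}
    n_idx = {n: j for j, n in enumerate(news_items)}
    ucb = [[0] * len(news_items) for _ in users]
    for user, news, _read_time in records:
        ucb[u_idx[user]][n_idx[news]] = 1
    return ucb
```

Repeated reads of the same item simply leave the entry at 1, matching the binary definition above.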

Construct Similarity Matrix
The main goal of this phase is to calculate the similarities between users based on their historical reading behaviors. By calculating peer-to-peer similarities between the users, a Similarity Matrix (SM) is constructed. The binary Jaccard similarity measure is used to calculate the similarities between the users [19]. Each user's reading behavior can be represented as a bit string. For example, based on Table 2, the bit string of the reading behavior of user u_1 (the first row of the UCB matrix) is (10010110101001110011), where "1" and "0" denote read and unread, respectively.
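A minimal Python sketch of this phase (names are illustrative): the binary Jaccard ratio counts the items read by both users over the items read by either, and filling it in for every pair yields the symmetric SM.

```python
def jaccard_binary(row_a, row_b):
    """Binary Jaccard similarity of two 0/1 reading-behavior rows:
    items read by both users, divided by items read by either."""
    both = sum(1 for a, b in zip(row_a, row_b) if a and b)
    either = sum(1 for a, b in zip(row_a, row_b) if a or b)
    return both / either if either else 0.0

def similarity_matrix(ucb):
    """Peer-to-peer Similarity Matrix over all user pairs."""
    n = len(ucb)
    sm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sm[i][j] = sm[j][i] = jaccard_binary(ucb[i], ucb[j])
    return sm
```

The diagonal is left at 0 since a user's similarity to themselves is never consulted by the clustering step.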

Ordered Clustering Algorithm
The purpose of this phase is to cluster users with the Ordered Clustering (OC) algorithm. The OC algorithm includes new features and cannot be classified as either exclusive or fuzzy clustering. It is designed around the nature of news and users' behavior in news reading: each user may be interested in a variety of news categories, and a news article can be accessed by various users with different behaviors and preferences. In the OC algorithm, multiple memberships are allowed with no membership weights or values; hence it is called a non-exclusive clustering. For instance, a user may be interested in reading both sports news and economic news, and a sports news item may be read by several users. In this way, a news item may be recommended to several users, and a user may be categorized into several groups of news.
The objective of clustering is to group the dataset X, consisting of d items, into c clusters. In OC, the number of clusters is determined during the execution of the clustering algorithm. A multiple binary clustering of X can be defined as a family of subsets {A_i | 1 ≤ i ≤ c} ⊆ P(X) (where P(X) is the power set of X) with the following properties:

∪_{i=1}^{c} A_i = X,    (3)
A_i ∩ A_j ≠ ∅ is permitted for i ≠ j,    (4)
∅ ⊂ A_i ⊂ X, 1 ≤ i ≤ c.    (5)

Equation (3) means that the union of all the subsets A_i contains all the data in X. The subsets may overlap, as expressed by Equation (4), and none of the subsets is empty or contains all the data in X, as presented in Equation (5). In terms of membership functions, a clustering can be conveniently represented by the cluster matrix M = [m_ik]_{c×d}. The ith row of M contains the values of the membership function μ_i of the ith subset A_i of X. It follows from Equations (3), (4), and (5) that the elements of M must satisfy the following conditions: each m_ik ∈ {0,1}, every column contains at least one 1 (each item belongs to at least one cluster), and every row contains at least one 1 but fewer than d ones.

The Ordered Clustering algorithm selects the pair of users with the highest similarity ratio in the Similarity Matrix and groups these users into the same cluster. This process is repeated with the next highest similarity ratio, which means that the users in a cluster are ordered in descending order of the similarity ratios that brought them in. Consequently, given a user u_j of cluster c_i, the left-hand side neighbors of u_j are more similar to u_j than the right-hand side neighbors. For example, Figure 2 shows a cluster c_i with m members: for the given user u_j, the left-hand side neighbors u_1, u_2, ..., u_{j−1} are more similar than u_j's right-hand side neighbors u_{j+1}, ..., u_m, as the similarity ratio values on the left-hand side of u_j are greater than those on the right-hand side.

Step 4 repeats and ensures that eventually all users in U are considered and belong to at least one cluster. In Step 6, the highest similarity ratio in SM, between users u_a and u_b, is identified; this value, denoted S_ab, is assigned to a variable called Max (Step 7). In Step 8, the entry S_ab of SM is set to 0 to prevent it from being chosen again in the next iteration. In Step 9, the existing clusters are checked: if user u_a is a member of an existing cluster c_k, then user u_b is inserted into the same cluster c_k (Step 13). However, if u_a is not found in any cluster but u_b is a member of a cluster c_k, then u_a is inserted into c_k (Step 20). If neither u_a nor u_b belongs to any cluster, a new cluster is created and both u_a and u_b are inserted into it (Step 28). The algorithm terminates when all users are members of at least one cluster, i.e., U = ∅.
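Under our reading of the steps above, the procedure can be sketched in Python as follows (step numbers in the comments refer to the description above; the paper's Figure 3 pseudocode may differ in details such as tie-breaking):

```python
def ordered_clustering(sm, users):
    """Sketch of Ordered Clustering: repeatedly take the pair (a, b)
    with the highest remaining similarity ratio; if one of them
    already belongs to a cluster, append the other to that cluster,
    otherwise open a new cluster holding both. Users are thus ordered
    within clusters by the similarity that brought them in, and a user
    may appear in more than one cluster."""
    s = [row[:] for row in sm]   # work on a copy of SM
    n = len(users)
    clusters = []                # each cluster is an ordered list of user indices
    unclustered = set(range(n))  # the set U of not-yet-clustered users
    while unclustered:
        # Steps 6-7: find the pair with the highest remaining ratio (Max).
        a, b = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                   key=lambda ij: s[ij[0]][ij[1]])
        if s[a][b] <= 0:
            break                # no similarity left to exploit
        s[a][b] = s[b][a] = 0    # Step 8: never pick this pair again
        placed = False
        for c in clusters:       # Step 9: check the existing clusters
            if a in c:
                if b not in c:
                    c.append(b)  # Step 13: b joins a's cluster
                placed = True
                break
            if b in c:
                c.append(a)      # Step 20: a joins b's cluster
                placed = True
                break
        if not placed:
            clusters.append([a, b])  # Step 28: open a new cluster
        unclustered -= {a, b}
    return [[users[i] for i in c] for c in clusters]
```

Because the loop stops as soon as every user belongs to at least one cluster, lower similarity ratios between already-clustered users are never processed.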

OC Algorithm
By running the algorithm on the matrix given in Figure 4, the clusters created are as shown in Table 4.

Experiment Environment
In this section, an experimental evaluation is provided to show how the proposed clustering algorithm differs from k-means for news recommendation systems. First, the real dataset used in the experiments is introduced. Then, the results of the k-means method and the proposed OC algorithm are presented.

News Dataset
The dataset was gathered from Twitter information streams crawled over a period of more than 60 days, from October 2010 to January 2011. In this dataset, the streamed news articles were accessed by more than 20,000 users, more than 10 million times in total. To relate the tweets to the news articles, more than 60 well-known news agencies, such as the New York Times, CNN, and BBC, were monitored. The tweets yielded a total of 77,544 news articles [20].
We are interested in analyzing the clustering algorithms and their prediction accuracy. Thus, we generated a sample of 1,009 users who read at least 4 news items per day. This sample dataset contained 1,161,798 news-reading records covering 38,737 news items. Table 5 presents the characteristics and descriptive statistics of the dataset.

Evaluation of Clustering Effectiveness
The effectiveness of the proposed clustering algorithm, OC, is compared to that of the k-means algorithm. The detailed experiments are described as follows.
Both clustering algorithms were implemented in Java on a Pentium V PC with MS-Windows 8.0 and 4 GB of RAM. Each experiment was run 10 times. Recall, precision, and F1-score were measured for the OC and k-means algorithms. Each user is treated as an entry in the clustering.
Figure 5 depicts the results of the evaluation. It can be concluded from Figure 5 that the proposed OC algorithm significantly outperforms k-means in terms of accuracy, with a 20.6% improvement in Recall, a 51.5% improvement in Precision, and a 46% improvement in F1-score. To corroborate the effectiveness of our proposed clustering algorithm, a detailed comparison was also made between our method and the general recommendation method that utilizes k-means based on pairwise similarities. For each approach, we arbitrarily selected 100 users to receive recommendations and plotted the precision and recall of the news items recommended to each user. Figure 6 presents the recall and precision results of this experiment: the horizontal axis denotes recall, "♦" shows the precision values for k-means, and "▲" shows the Ordered Clustering precision. It can be concluded from Figure 6 that, besides obtaining higher recall and precision, the performance distribution of the OC algorithm is denser than that of the k-means algorithm. This confirms the suitability of the OC algorithm for news recommendation systems. In the experiments of this study, all users were treated equally as experimental subjects. In reality, users with different news reading behaviors, such as different daily reading frequencies, might have different patterns of news topic preferences, and their dynamic interests in news items could vary considerably.

Concluding Remarks and Future Research Directions
In this paper, we proposed a new non-exclusive clustering algorithm named Ordered Clustering (OC) that is dedicated to news recommendation. In this algorithm, the highest peer-to-peer item similarities are considered and the items are grouped into multiple clusters. OC is a clustering algorithm tailored to news recommendation based on the nature of news. The results indicate that multiple memberships in clusters contribute to the accuracy enhancement.
The experimental results and the higher F1-score demonstrate that OC is more effective than k-means for clustering news items and generating accurate recommendations. Our evaluation was done in an offline manner with real data; nevertheless, a better evaluation could be performed in an online recommendation system. For future work, to achieve more precise results, the assessment should be carried out in a real online environment.

Figure 1. The Phases of the Proposed Approach.

Figure 5. Accuracy Metrics in Different Clustering Algorithms.

Figure 6. Recall-precision plot for different user clustering algorithms; remarks: "▲" represents news recommendation results using ordered clustering and "♦" denotes news recommendation results obtained from k-means clustering.

Figure 3. The Ordered Clustering Algorithm.