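User-based collaborative filtering rests on the idea that users with similar interests are likely to enjoy the same items. Computing the similarity between users is therefore the key step of the algorithm. The implementation in this article uses the following similarity formula: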
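$$\omega_{uv}=\frac{|N(u)\cap N(v)|}{\sqrt{|N(u)|\,|N(v)|}}$$

Here N(u) is the set of movies user u has watched, |N(u)| is the number of movies in that set, and |N(u) ∩ N(v)| is the number of movies that both u and v have watched.

Each record in the dataset has four fields:
userId : user ID
movieId : ID of a movie the user has watched
rating : the user's rating of that movie
timestamp : time at which the rating was recorded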
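As a quick sanity check of the similarity formula, the minimal sketch below evaluates it directly on two sets. The helper name `user_similarity` and the toy viewing histories are made up for illustration and are not part of the original program; returning 0 for an empty history is also an assumption.

```python
import math


def user_similarity(items_u, items_v):
    """Shared-item count divided by sqrt(|N(u)| * |N(v)|) (hypothetical helper)."""
    items_u, items_v = set(items_u), set(items_v)
    if not items_u or not items_v:
        return 0.0  # assumption: a user with no history is similar to nobody
    common = len(items_u & items_v)  # number of items both users interacted with
    return common / math.sqrt(len(items_u) * len(items_v))


# Toy data: user "1" watched 4 movies, user "2" watched 3, and they share 2.
watched_1 = {"m1", "m2", "m3", "m4"}
watched_2 = {"m2", "m3", "m5"}
sim = user_similarity(watched_1, watched_2)
print(round(sim, 4))  # 2 / sqrt(4 * 3)
```

With two shared movies out of histories of size 4 and 3, the value is 2 / sqrt(12), roughly 0.577. The measure is symmetric, so swapping the two users gives the same result.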
The implementation uses the following variables:
N : the number of movies each user has watched; for example, N["1"] = 10 means the user with ID "1" has watched 10 movies.
W : the similarity matrix, which stores the similarity between two users; for example, W["1"]["2"] = 0.66 means the similarity between the user with ID "1" and the user with ID "2" is 0.66.
train : the user records loaded from the dataset, in the form train = { user : [[item1, rating1], [item2, rating2], …], …… }
item_users : the dataset converted into an item-to-users inverted table. With this table, similarity only needs to be computed between users who have watched at least one movie in common (users with no movie in common have similarity 0 by default). The inverted table has the form item_users = { item : [user1, user2, …], …… }
k : the number of most similar users used for recommendation
n : the number of movies recommended to a user

"""
@description: a demo for user-based collaborative filtering
@author: Chengcheng Zhao
@date: 2020-11-15 15:21:13
"""
import random
import operator


class UserBasedCF:
    def __init__(self):
        self.N = {}  # N[u] = number of items user u has interacted with
        self.W = {}  # W[u][v] = similarity between user u and user v
        self.train = {}  # train = { user : [[item1, rating1], [item2, rating2], …], …… }
        self.item_users = {}  # item_users = { item : [user1, user2, …], …… }
        # recommend n items from the k most similar users
        self.k = 30
        self.n = 10

    def get_data(self, file_path):
        """
        @description: load data from dataset
        @file_path: path of dataset
        """
        with open(file_path, 'r') as f:
            for i, line in enumerate(f, 0):
                if i != 0:  # skip the header line
                    line = line.strip('\n')
                    user, item, rating, timestamp = line.split(',')
                    self.train.setdefault(user, [])
                    self.train[user].append([item, rating])
                    self.item_users.setdefault(item, [])
                    self.item_users[item].append(user)

    def similarity(self):
        """
        @description: calculate similarity between user u and user v
        """
        for item, users in self.item_users.items():
            for u in users:
                self.N.setdefault(u, 0)
                self.N[u] += 1
                for v in users:
                    if u != v:
                        self.W.setdefault(u, {})
                        self.W[u].setdefault(v, 0)
                        self.W[u][v] += 1  # number of items both u and v have interacted with
        for u, user_cnts in self.W.items():
            for v, cnt in user_cnts.items():
                # similarity between user u and user v
                self.W[u][v] = cnt / (self.N[u] * self.N[v]) ** 0.5

    def recommendation(self, user):
        """
        @description: recommend items for user
        @param user : the user to recommend for, called user u below
        @return : items recommended for user u
        """
        watched = {i[0] for i in self.train[user]}  # items user u has interacted with
        rank = {}
        # the k users most similar to user u, ordered by similarity;
        # .get() avoids a KeyError when user u shares no item with anyone
        neighbours = sorted(self.W.get(user, {}).items(),
                            key=operator.itemgetter(1), reverse=True)[0:self.k]
        for v, similar in neighbours:
            for item_rating in self.train[v]:  # items user v has interacted with
                if item_rating[0] not in watched:  # item user u has not interacted with
                    rank.setdefault(item_rating[0], 0.)
                    rank[item_rating[0]] += similar * float(item_rating[1])
        return sorted(rank.items(), key=operator.itemgetter(1), reverse=True)[0:self.n]


if __name__ == "__main__":
    file_path = "C:\\Users\\DELL\\Desktop\\code\\python\\dataset\\ml-latest-small\\ratings.csv"
    userBasedCF = UserBasedCF()
    userBasedCF.get_data(file_path)
    userBasedCF.similarity()
    user = random.sample(list(userBasedCF.train), 1)
    rec = userBasedCF.recommendation(user[0])
    print(rec)
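To see why the inverted table saves work, the sketch below rebuilds it from a tiny hand-made `train` dictionary and counts co-watched items. The three users and their ratings are invented for illustration, and `co_counts` is a hypothetical name for the un-normalised numerator of the similarity.

```python
from collections import defaultdict

# Toy training data in the same shape as `train`: user -> [[item, rating], ...]
train = {
    "1": [["a", "4.0"], ["b", "3.0"]],
    "2": [["a", "5.0"], ["c", "2.0"]],
    "3": [["d", "1.0"]],
}

# Build the item -> users inverted table.
item_users = defaultdict(list)
for user, item_ratings in train.items():
    for item, _rating in item_ratings:
        item_users[item].append(user)

# Count co-watched items, visiting only pairs listed under the same item.
co_counts = defaultdict(lambda: defaultdict(int))
for item, users in item_users.items():
    for u in users:
        for v in users:
            if u != v:
                co_counts[u][v] += 1

print(dict(item_users))
print({u: dict(vs) for u, vs in co_counts.items()})
```

User "3" shares no movie with anyone, so no pair involving "3" is ever visited and that user never appears in `co_counts`. This is the saving the inverted table provides compared with looping over every pair of users.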
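The program picks one user at random and, based on the 30 users most similar to that user, recommends 10 movies. It prints the recommended movie IDs with their weighted scores as a list of pairs: [(movie ID, weighted score), (movie ID, weighted score), ……].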
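The weighted score is the sum, over the selected neighbours, of similarity times rating. The sketch below works through that sum on made-up numbers; the similarities, ratings, and the cut-off of 2 results are all assumptions chosen to keep the arithmetic easy to follow.

```python
import operator

# Hypothetical similarities between the target user and two neighbours.
similar_users = {"2": 0.8, "3": 0.5}
# Hypothetical neighbour histories: user -> [[item, rating], ...]
train = {
    "2": [["a", "4.0"], ["b", "5.0"]],
    "3": [["b", "2.0"], ["c", "4.0"]],
}
watched = {"a"}  # items the target user has already seen

rank = {}
for v, sim in similar_users.items():
    for item, rating in train[v]:
        if item not in watched:  # never recommend something already watched
            rank[item] = rank.get(item, 0.0) + sim * float(rating)

# Keep the 2 highest-scoring items (the article's program keeps n = 10).
top = sorted(rank.items(), key=operator.itemgetter(1), reverse=True)[:2]
print(top)
```

Item "b" collects 0.8 × 5 from user "2" plus 0.5 × 2 from user "3", for a score of 5, while item "c" only gets 0.5 × 4 = 2, so "b" ranks first. Item "a" is skipped because the target user has already watched it.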
Reference: 《推荐系统实践》 (Recommender System Practice), by 项亮 (Xiang Liang)
Reposted from: http://sezmz.baihongyu.com/