# Personalized Recommendation ②: User-Based Collaborative Filtering, Principles and Optimizations

## I. Algorithm Steps

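① Find the set of users who are similar to the target user.
② From the items those users like, pick the ones the target user has not yet listened to or otherwise interacted with, and recommend them to the target user.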

## II. Computing User Similarity

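``````
import math

def UserSimilarity(train):
    # train maps each user to the items that user interacted with
    W = dict()
    for u in train.keys():
        W[u] = dict()
        for v in train.keys():
            if u == v:
                continue
            # cosine similarity: shared items over the geometric mean of both item counts
            W[u][v] = len(set(train[u]) & set(train[v]))
            W[u][v] /= math.sqrt(len(train[u]) * len(train[v]) * 1.0)
    return W
``````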

The double loop above costs O(|U|²) and spends most of its time on user pairs that share no items at all. Building an inverted item-to-users table first lets the computation touch only the pairs that actually co-occur on some item:

``````
import math
from collections import defaultdict

def UserSimilarity(train):
    # build inverse table for item_users
    item_users = defaultdict(set)
    for u, items in train.items():
        for i in items.keys():
            item_users[i].add(u)

    # calculate co-rated items between users
    C = defaultdict(lambda: defaultdict(int))  # C[u][v]: items shared by u and v
    N = defaultdict(int)                       # N[u]: items u interacted with
    for i, users in item_users.items():
        for u in users:
            N[u] += 1
            for v in users:
                if u == v:
                    continue
                C[u][v] += 1
    # calculate final similarity matrix W
    W = defaultdict(dict)
    for u, related_users in C.items():
        for v, cuv in related_users.items():
            W[u][v] = cuv / math.sqrt(N[u] * N[v])
    return W
``````
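
To make the inverted-table idea concrete, here is a minimal sketch run on a tiny data set. The function name `user_similarity` and the four-user `train` dictionary are made up for illustration; the logic follows the cosine formula w_uv = |N(u) ∩ N(v)| / sqrt(|N(u)| · |N(v)|), where N(u) is the set of items user u interacted with.

```python
import math
from collections import defaultdict

def user_similarity(train):
    """Cosine similarity between users via an inverted item -> users table."""
    item_users = defaultdict(set)
    for user, items in train.items():
        for item in items:
            item_users[item].add(user)

    co_counts = defaultdict(lambda: defaultdict(int))  # shared items per user pair
    item_counts = defaultdict(int)                     # items per user
    for users in item_users.values():
        for u in users:
            item_counts[u] += 1
            for v in users:
                if u != v:
                    co_counts[u][v] += 1

    return {
        u: {v: c / math.sqrt(item_counts[u] * item_counts[v])
            for v, c in related.items()}
        for u, related in co_counts.items()
    }

# Hypothetical toy data: user -> {item: interest}
train = {
    "A": {"a": 1, "b": 1, "d": 1},
    "B": {"a": 1, "c": 1},
    "C": {"b": 1, "e": 1},
    "D": {"c": 1, "d": 1, "e": 1},
}
W = user_similarity(train)
# A and B share only item "a", so W["A"]["B"] is 1 / sqrt(3 * 2)
print(round(W["A"]["B"], 3))
```

Users B and C have no item in common, so the pair never shows up in `W` at all. That is exactly the work the naive double loop would have wasted.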

## III. Computing a User's Interest in an Item

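User u's interest in item i is computed as

$$p(u, i) = \sum_{v \in S(u, K) \cap N(i)} w_{uv} \, r_{vi}$$

where S(u, K) contains the K users whose interests are closest to those of user u, N(i) is the set of users who have interacted with item i, w_uv is the similarity between users u and v, and r_vi is user v's interest in item i.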

``````
from collections import defaultdict
from operator import itemgetter

def Recommend(user, train, W, K):
    # K is the number of most similar users to consult
    rank = defaultdict(float)
    interacted_items = train[user]
    for v, wuv in sorted(W[user].items(), key=itemgetter(1),
                         reverse=True)[0:K]:
        for i, rvi in train[v].items():
            if i in interacted_items:
                # we should filter items user interacted before
                continue
            rank[i] += wuv * rvi
    return rank
``````
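
A small usage sketch may help show what the ranking step produces. The `recommend` function below restates the logic above so the example runs on its own; the `train` data and the similarity values in `W` are hypothetical (rounded cosine values for user A only).

```python
from collections import defaultdict
from operator import itemgetter

def recommend(user, train, W, K):
    """Score unseen items by summing w_uv * r_vi over the K nearest users."""
    rank = defaultdict(float)
    interacted_items = train[user]
    neighbors = sorted(W[user].items(), key=itemgetter(1), reverse=True)[:K]
    for v, wuv in neighbors:
        for i, rvi in train[v].items():
            if i in interacted_items:
                continue  # skip items the target user already interacted with
            rank[i] += wuv * rvi
    return dict(rank)

# Hypothetical toy data: user -> {item: interest}
train = {
    "A": {"a": 1, "b": 1, "d": 1},
    "B": {"a": 1, "c": 1},
    "C": {"b": 1, "e": 1},
    "D": {"c": 1, "d": 1, "e": 1},
}
# Hypothetical similarities for user A only
W = {"A": {"B": 0.408, "C": 0.408, "D": 0.333}}

rank = recommend("A", train, W, K=3)
for item, score in sorted(rank.items(), key=itemgetter(1), reverse=True):
    print(item, round(score, 3))
```

With K=3, items "c" and "e" each collect a score from two neighbors (0.408 + 0.333), while "a", "b" and "d" are filtered out because user A has already interacted with them.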

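UserCF has only one important parameter, K: for each user, select the K users whose interests are most similar, then recommend the items those K users are interested in. The observations below are based on result screenshots taken from 《推荐系统实战》.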

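① Precision and recall

② Popularity
The larger K is, the more popular the recommended items become: with more neighbors consulted, the recommendations drift toward popular items.

③ Coverage
The larger K is, the lower the coverage: a larger K makes popular items more likely to be recommended, so the long tail is not explored enough.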
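
For readers who want to reproduce these curves, here is a rough sketch of how the four metrics are commonly computed for top-N recommendation. The function name `evaluate` and the toy data are hypothetical. The definitions used are: precision and recall as hits over recommended and held-out items respectively, coverage as the share of training items that appear in at least one recommendation list, and popularity as the mean of log(1 + number of users who interacted with the item) over all recommended items. The sketch assumes at least one recommendation and one test item exist.

```python
import math

def evaluate(train, test, recs):
    """train/test: user -> set of items; recs: user -> set of recommended items."""
    popularity = {}
    for items in train.values():
        for item in items:
            popularity[item] = popularity.get(item, 0) + 1

    hits = sum(len(recs.get(u, set()) & relevant) for u, relevant in test.items())
    n_recs = sum(len(r) for r in recs.values())
    n_test = sum(len(t) for t in test.values())
    shown = set().union(*recs.values())
    log_pop = sum(math.log(1 + popularity.get(i, 0))
                  for r in recs.values() for i in r)
    return {
        "precision": hits / n_recs,
        "recall": hits / n_test,
        "coverage": len(shown) / len(popularity),
        "popularity": log_pop / n_recs,
    }

# Hypothetical toy data
train = {"A": {"a", "b", "d"}, "B": {"a", "c"}, "C": {"a", "b", "e"}, "D": {"c", "d", "e"}}
test = {"A": {"c"}, "B": {"d", "e"}}
recs = {"A": {"c", "e"}, "B": {"b", "d"}}

metrics = evaluate(train, test, recs)
for name, value in metrics.items():
    print(name, round(value, 3))
```

Running the same evaluation for several values of K is one way to observe the trade-off described above: popularity rising and coverage falling as K grows.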

## IV. Improvements

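1. Reduce the similarity contribution of popular items: each co-occurrence on item i is weighted by 1 / log(1 + |N(i)|), so items that many users have interacted with count for less.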

``````
import math
from collections import defaultdict

def UserSimilarity(train):
    # build inverse table for item_users
    item_users = defaultdict(set)
    for u, items in train.items():
        for i in items.keys():
            item_users[i].add(u)

    # calculate co-rated items between users
    C = defaultdict(lambda: defaultdict(float))
    N = defaultdict(int)
    for i, users in item_users.items():
        for u in users:
            N[u] += 1
            for v in users:
                if u == v:
                    continue
                # penalize popular items: the more users an item has, the less it adds
                C[u][v] += 1 / math.log(1 + len(users))
    # calculate final similarity matrix W
    W = defaultdict(dict)
    for u, related_users in C.items():
        for v, cuv in related_users.items():
            W[u][v] = cuv / math.sqrt(N[u] * N[v])
    return W
``````
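
To get a feel for how strongly the penalty bites, the sketch below evaluates the weight 1 / log(1 + |N(i)|) for a few item audience sizes. The helper name `co_occurrence_weight` and the sizes are chosen purely for illustration.

```python
import math

def co_occurrence_weight(num_users_of_item):
    """Contribution of one shared item to C[u][v], given how many users touched it."""
    return 1 / math.log(1 + num_users_of_item)

# Hypothetical audience sizes: a niche item, a moderately popular one, a blockbuster
for n in (2, 10, 1000):
    print(n, round(co_occurrence_weight(n), 3))
```

An item shared by only two users contributes several times more to their similarity than an item that a thousand users have interacted with, which matches the intuition that liking the same obscure item says more about shared taste than liking the same hit.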

2. Add temporal context

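a) Add some randomness when generating the recommendation list, for example by randomly picking 20 of the top 30 candidates.
b) Record each day's recommendations for every user and lower the weight of items that have appeared too often.
c) Use a different recommendation algorithm each day, such as collaborative filtering or content-based filtering.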
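
Options a) and b) can be sketched in a few lines. Everything here is an assumption made for illustration: the function names, the `shown_counts` bookkeeping, and in particular the penalty form `score / (1 + penalty * times_shown)`, which the text above does not specify.

```python
import random

def sample_from_top(ranked_items, pool_size=30, list_size=20, rng=random):
    """Option a): pick list_size items at random from the top pool_size candidates."""
    pool = ranked_items[:pool_size]
    return rng.sample(pool, min(list_size, len(pool)))

def penalize_repeats(rank, shown_counts, penalty=0.1):
    """Option b): lower the score of items already shown many times.

    The division by (1 + penalty * times_shown) is a hypothetical choice.
    """
    return {item: score / (1 + penalty * shown_counts.get(item, 0))
            for item, score in rank.items()}

# Hypothetical ranked candidates, best first
ranked = [f"item{i}" for i in range(50)]
picked = sample_from_top(ranked, rng=random.Random(0))
print(len(picked))

# Item "x" has already been shown 10 times, item "y" never
adjusted = penalize_repeats({"x": 1.0, "y": 1.0}, {"x": 10})
print(adjusted)
```

Passing a seeded `random.Random` instance keeps the sampling reproducible in tests, while the default uses the module-level generator so that the list changes from day to day.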

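① User similarity: the closer in time two users acted on the same item, the higher their similarity.

The further apart in time users u and v acted on item i, the smaller their interest similarity: each shared item contributes 1 / (1 + α|t_ui − t_vi|) instead of 1, where t_ui is the time at which user u acted on item i.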

``````
import math
from collections import defaultdict

def UserSimilarity(train, alpha):
    # train maps user -> {item: timestamp}; alpha controls the time decay
    # build inverse table for item_users
    item_users = defaultdict(dict)
    for u, items in train.items():
        for i, tui in items.items():
            item_users[i][u] = tui

    # calculate co-rated items between users
    C = defaultdict(lambda: defaultdict(float))
    N = defaultdict(int)
    for i, users in item_users.items():
        for u, tui in users.items():
            N[u] += 1
            for v, tvi in users.items():
                if u == v:
                    continue
                C[u][v] += 1 / (1 + alpha * abs(tui - tvi))
    # calculate final similarity matrix W
    W = defaultdict(dict)
    for u, related_users in C.items():
        for v, cuv in related_users.items():
            W[u][v] = cuv / math.sqrt(N[u] * N[v])
    return W
``````
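
The time-decay term is easy to check by hand. The sketch below isolates it in a hypothetical helper, `time_decay`, and evaluates it for timestamps measured in days with an assumed `alpha` of 0.5. Both the unit and the value of `alpha` are illustrative only and would need tuning on real data.

```python
def time_decay(t_ui, t_vi, alpha):
    """Weight of one shared item, given when users u and v each acted on it."""
    return 1 / (1 + alpha * abs(t_ui - t_vi))

ALPHA = 0.5  # hypothetical decay rate per day

same_day = time_decay(100, 100, ALPHA)    # both users acted on the same day
two_days = time_decay(100, 102, ALPHA)    # two days apart
eighteen_days = time_decay(100, 118, ALPHA)

print(same_day, two_days, eighteen_days)
```

With these numbers the weight falls from 1 on the same day to one half after two days and to one tenth after eighteen days. Setting `alpha` to 0 recovers the plain co-occurrence count.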

② Interests of similar users: items that similar users liked recently should score higher than items they liked earlier.

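``````
from collections import defaultdict
from operator import itemgetter

def Recommend(user, T, train, W, K, alpha):
    # T is the current time; train maps user -> {item: timestamp}
    rank = defaultdict(float)
    interacted_items = train[user]
    for v, wuv in sorted(W[user].items(), key=itemgetter(1),
                         reverse=True)[0:K]:
        for i, tvi in train[v].items():
            if i in interacted_items:
                # we should filter items user interacted before
                continue
            # older interactions (larger T - tvi) contribute less
            rank[i] += wuv / (1 + alpha * (T - tvi))
    return rank
``````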
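
A short usage sketch shows the recency effect. The function `recommend_recent` restates the logic above so the example is self-contained; the timestamps, the similarity value, and `alpha=0.5` are hypothetical.

```python
from collections import defaultdict
from operator import itemgetter

def recommend_recent(user, T, train, W, K, alpha):
    """Rank unseen items, discounting neighbors' older interactions."""
    rank = defaultdict(float)
    interacted = train[user]
    neighbors = sorted(W[user].items(), key=itemgetter(1), reverse=True)[:K]
    for v, wuv in neighbors:
        for i, tvi in train[v].items():
            if i in interacted:
                continue  # already known to the target user
            rank[i] += wuv / (1 + alpha * (T - tvi))
    return dict(rank)

# Hypothetical data: user -> {item: time of interaction}
train = {
    "A": {"a": 1},
    "B": {"a": 2, "old": 0, "new": 9},
}
W = {"A": {"B": 1.0}}  # hypothetical similarity

rank = recommend_recent("A", T=10, train=train, W=W, K=1, alpha=0.5)
print(rank)
```

Neighbor B touched "new" one time unit ago and "old" ten units ago, so "new" scores 1 / 1.5 while "old" scores only 1 / 6, even though both come from the same neighbor with the same similarity.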