data/0054.yaml

- en: Build Trail Recommender for TrailForks
  id: totrans-0
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
  zh: 为 TrailForks 构建推荐系统
- en: 原文：[https://towardsdatascience.com/build-trail-recommender-for-trailforks-8ea64b1a2fe4?source=collection_archive---------13-----------------------#2023-01-04](https://towardsdatascience.com/build-trail-recommender-for-trailforks-8ea64b1a2fe4?source=collection_archive---------13-----------------------#2023-01-04)
  id: totrans-1
  prefs:
  - PREF_BQ
  type: TYPE_NORMAL
  zh: 原文：[https://towardsdatascience.com/build-trail-recommender-for-trailforks-8ea64b1a2fe4?source=collection_archive---------13-----------------------#2023-01-04](https://towardsdatascience.com/build-trail-recommender-for-trailforks-8ea64b1a2fe4?source=collection_archive---------13-----------------------#2023-01-04)
- en: How I won the Outside 2022 Innovation Days Award
  id: totrans-2
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
  zh: 我是如何赢得 Outside 2022 创新日奖的
- en: '[](https://medium.com/@wen_yang?source=post_page-----8ea64b1a2fe4--------------------------------)[![Wen
    Yang](../Images/5eac438762d015a0ab128757cc951967.png)](https://medium.com/@wen_yang?source=post_page-----8ea64b1a2fe4--------------------------------)[](https://towardsdatascience.com/?source=post_page-----8ea64b1a2fe4--------------------------------)[![Towards
    Data Science](../Images/a6ff2676ffcc0c7aad8aaf1d79379785.png)](https://towardsdatascience.com/?source=post_page-----8ea64b1a2fe4--------------------------------)
    [Wen Yang](https://medium.com/@wen_yang?source=post_page-----8ea64b1a2fe4--------------------------------)'
  id: totrans-3
  prefs: []
  type: TYPE_NORMAL
  zh: '[](https://medium.com/@wen_yang?source=post_page-----8ea64b1a2fe4--------------------------------)[![Wen
    Yang](../Images/5eac438762d015a0ab128757cc951967.png)](https://medium.com/@wen_yang?source=post_page-----8ea64b1a2fe4--------------------------------)[](https://towardsdatascience.com/?source=post_page-----8ea64b1a2fe4--------------------------------)[![Towards
    Data Science](../Images/a6ff2676ffcc0c7aad8aaf1d79379785.png)](https://towardsdatascience.com/?source=post_page-----8ea64b1a2fe4--------------------------------)
    [Wen Yang](https://medium.com/@wen_yang?source=post_page-----8ea64b1a2fe4--------------------------------)'
- en: ·
  id: totrans-4
  prefs: []
  type: TYPE_NORMAL
  zh: ·
- en: '[Follow](https://medium.com/m/signin?actionUrl=https%3A%2F%2Fmedium.com%2F_%2Fsubscribe%2Fuser%2Fcbb5383bd438&operation=register&redirect=https%3A%2F%2Ftowardsdatascience.com%2Fbuild-trail-recommender-for-trailforks-8ea64b1a2fe4&user=Wen+Yang&userId=cbb5383bd438&source=post_page-cbb5383bd438----8ea64b1a2fe4---------------------post_header-----------)
    Published in [Towards Data Science](https://towardsdatascience.com/?source=post_page-----8ea64b1a2fe4--------------------------------)
    ·11 min read·Jan 4, 2023[](https://medium.com/m/signin?actionUrl=https%3A%2F%2Fmedium.com%2F_%2Fvote%2Ftowards-data-science%2F8ea64b1a2fe4&operation=register&redirect=https%3A%2F%2Ftowardsdatascience.com%2Fbuild-trail-recommender-for-trailforks-8ea64b1a2fe4&user=Wen+Yang&userId=cbb5383bd438&source=-----8ea64b1a2fe4---------------------clap_footer-----------)'
  id: totrans-5
  prefs: []
  type: TYPE_NORMAL
  zh: '[关注](https://medium.com/m/signin?actionUrl=https%3A%2F%2Fmedium.com%2F_%2Fsubscribe%2Fuser%2Fcbb5383bd438&operation=register&redirect=https%3A%2F%2Ftowardsdatascience.com%2Fbuild-trail-recommender-for-trailforks-8ea64b1a2fe4&user=Wen+Yang&userId=cbb5383bd438&source=post_page-cbb5383bd438----8ea64b1a2fe4---------------------post_header-----------)
    发表在 [Towards Data Science](https://towardsdatascience.com/?source=post_page-----8ea64b1a2fe4--------------------------------)
    · 11 分钟阅读 · 2023 年 1 月 4 日 [](https://medium.com/m/signin?actionUrl=https%3A%2F%2Fmedium.com%2F_%2Fvote%2Ftowards-data-science%2F8ea64b1a2fe4&operation=register&redirect=https%3A%2F%2Ftowardsdatascience.com%2Fbuild-trail-recommender-for-trailforks-8ea64b1a2fe4&user=Wen+Yang&userId=cbb5383bd438&source=-----8ea64b1a2fe4---------------------clap_footer-----------)'
- en: --
  id: totrans-6
  prefs: []
  type: TYPE_NORMAL
  zh: --
- en: '[](https://medium.com/m/signin?actionUrl=https%3A%2F%2Fmedium.com%2F_%2Fbookmark%2Fp%2F8ea64b1a2fe4&operation=register&redirect=https%3A%2F%2Ftowardsdatascience.com%2Fbuild-trail-recommender-for-trailforks-8ea64b1a2fe4&source=-----8ea64b1a2fe4---------------------bookmark_footer-----------)![](../Images/8c1b04655799b55dc2ea6e187f82f7a4.png)'
  id: totrans-7
  prefs: []
  type: TYPE_NORMAL
  zh: '[](https://medium.com/m/signin?actionUrl=https%3A%2F%2Fmedium.com%2F_%2Fbookmark%2Fp%2F8ea64b1a2fe4&operation=register&redirect=https%3A%2F%2Ftowardsdatascience.com%2Fbuild-trail-recommender-for-trailforks-8ea64b1a2fe4&source=-----8ea64b1a2fe4---------------------bookmark_footer-----------)![](../Images/8c1b04655799b55dc2ea6e187f82f7a4.png)'
- en: Photo by [Kristin Snippe](https://unsplash.com/@frausnippe?utm_source=medium&utm_medium=referral)
    on [Unsplash](https://unsplash.com/?utm_source=medium&utm_medium=referral)
  id: totrans-8
  prefs: []
  type: TYPE_NORMAL
  zh: 摄影：[Kristin Snippe](https://unsplash.com/@frausnippe?utm_source=medium&utm_medium=referral)
    摄于 [Unsplash](https://unsplash.com/?utm_source=medium&utm_medium=referral)
- en: Long Background Story
  id: totrans-9
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
  zh: 长背景故事
- en: When I first joined [Outside Inc](https://www.outsideinc.com/) in May 2021,
    my job was about building a personalized recommender system to power Outside Feed,
    which comprises a wide range of outdoor and active lifestyle content in a mixed
    medium format such as articles, videos, Outside films, and podcasts. Our goal
    is to inspire people to go outside.
  id: totrans-10
  prefs: []
  type: TYPE_NORMAL
  zh: 当我在 2021 年 5 月首次加入 [Outside Inc](https://www.outsideinc.com/) 时，我的工作是构建一个个性化推荐系统，以驱动
    Outside Feed。Outside Feed 包含各种户外和活跃生活方式的内容，以文章、视频、Outside 电影和播客等混合媒介格式呈现。我们的目标是激励人们走出户外。
- en: 'Fast-forward to 17 months later, we have made two major transformations for
    Outside Feed’s recommender system:'
  id: totrans-11
  prefs: []
  type: TYPE_NORMAL
  zh: 快进到 17 个月后，我们对 Outside Feed 的推荐系统进行了两次重大改造：
- en: 'From reverse chronological feed to a hybrid-recommender powered feed, which
    includes three classic recsys functionalities: Collaborative Filtering, Content-based
    Filtering, and Hot & Trending. Model training and inference are batch-based. To
    use the 80–20 rule, this kind of system can get you from 0 to 80% in terms of
    providing personalized recommendations with relatively low effort. “Low” effort
    here is mainly meant in the sense of algorithm complexity.'
  id: totrans-12
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
  zh: 从逆时间顺序的动态信息流到由混合推荐器驱动的动态信息流，其中包括三种经典的推荐系统功能：协同过滤、基于内容的过滤和热门与趋势。模型训练和推理是基于批次的。使用
    80–20 法则，这种系统可以让你在提供个性化推荐时，借助相对较少的努力达到 80%。这里的“低”努力主要是指算法复杂性方面。
- en: From a batch-based hybrid recommendation system to a real-time recommender system.
    We started to integrate [Miso.ai](https://miso.ai/) — a third-party tool to make
    the best use of clickstream events data collected by MetaRouter, and generate
    real-time recommendations. Since we use GraphQL, there are many lessons learned
    about wrapping rest API in Apollo as well as monitoring in DataDog, which deserves
    its own post for another day.
  id: totrans-13
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
  zh: 从基于批次的混合推荐系统到实时推荐系统。我们开始集成 [Miso.ai](https://miso.ai/) —— 一个第三方工具，用来充分利用 MetaRouter
    收集的点击流事件数据，并生成实时推荐。由于我们使用 GraphQL，关于在 Apollo 中封装 REST API 以及在 DataDog 中进行监控，我们学到了很多经验，这些经验值得另写一篇文章。
- en: 'For this post, I documented my experience participating in Outside Innovation
    Day, in which I built a prototype trail recommender using trailforks data within
    one day. (PS: yes trailforks is part of the Outside family now!) Perhaps it is
    because a lot of my coworkers are avid hikers, mountain bikers, and outdoor enthusiasts,
    and they know firsthand the joy and relaxation of having a trail recommender to
    help discover trails to explore. They all kindly voted for my project and I won
    the Outside Innovation Day Award for Best embodying our mission to get people
    outside! ( Very fortunate and happy 😁 ~~~).'
  id: totrans-14
  prefs: []
  type: TYPE_NORMAL
  zh: 在这篇文章中，我记录了我参加 Outside Innovation Day 的经验，在一天内使用 Trailforks 数据构建了一个原型小径推荐系统。（PS：是的，Trailforks
    现在是 Outside 家族的一部分！）也许因为我的很多同事都是热衷的徒步旅行者、山地自行车爱好者和户外运动爱好者，他们亲身体验了有一个小径推荐系统来帮助发现探索小径的乐趣和放松。他们都友好地为我的项目投了票，我赢得了
    Outside Innovation Day 最能体现我们使命的奖项！（非常幸运和开心 😁 ~~~）。
- en: 'Below I will share three things:'
  id: totrans-15
  prefs: []
  type: TYPE_NORMAL
  zh: 以下是我将分享的三件事：
- en: 🏔️ 1\. How to build an 80% personalized recommender system with 20% effort (with
    Trailforks data)?
  id: totrans-16
  prefs: []
  type: TYPE_NORMAL
  zh: 🏔️ 1\. 如何以 20% 的努力构建 80% 个性化的推荐系统（使用 Trailforks 数据）？
- en: 🏔️ 2\. Things to consider for preparing hackathon type of projects
  id: totrans-17
  prefs: []
  type: TYPE_NORMAL
  zh: 🏔️ 2\. 为准备黑客马拉松类型的项目考虑的事项
- en: 🏔️ 3\. Further considerations for productionize recommender system
  id: totrans-18
  prefs: []
  type: TYPE_NORMAL
  zh: 🏔️ 3\. 生产化推荐系统的进一步考虑
- en: 🏔️ 1\. The 80–20 Rule of Trail Recommender
  id: totrans-19
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
  zh: 🏔️ 1\. Trail Recommender 的 80–20 法则
- en: There are so many advanced recommender systems, but a hybrid of Collaborative
    filtering + Content-based recommenders is really the sweet spot of personalized
    recsys. It can cover 80% of the ground with about 20% of engineering effort.
  id: totrans-20
  prefs: []
  type: TYPE_NORMAL
  zh: 有很多高级推荐系统，但协同过滤 + 基于内容的推荐系统的混合真的是个性化推荐系统的最佳选择。它可以以约 20% 的工程努力覆盖 80% 的内容。
- en: Collaborative filtering involves analyzing the preferences of other users who
    have similar tastes to the current user and recommending trails based on those
    users’ preferences.
  id: totrans-21
  prefs: []
  type: TYPE_NORMAL
  zh: 协同过滤涉及分析其他具有相似口味的用户的偏好，并基于这些用户的偏好推荐小径。
- en: '![](../Images/886a3fc481dc5d4448af5d0f4e0240f7.png)'
  id: totrans-22
  prefs: []
  type: TYPE_IMG
  zh: '![](../Images/886a3fc481dc5d4448af5d0f4e0240f7.png)'
- en: image created by Wen Yang
  id: totrans-23
  prefs: []
  type: TYPE_NORMAL
  zh: 由 Wen Yang 创建的图像
- en: Content-based filtering, on the other hand, involves analyzing the characteristics
    of the trails themselves and recommending trails with similar characteristics
    to the user’s preferences.
  id: totrans-24
  prefs: []
  type: TYPE_NORMAL
  zh: 另一方面，基于内容的过滤涉及分析小径本身的特征，并推荐与用户偏好相似的小径。
- en: '![](../Images/04c0fdaa440136b442203dbff7fe2393.png)'
  id: totrans-25
  prefs: []
  type: TYPE_IMG
  zh: '![](../Images/04c0fdaa440136b442203dbff7fe2393.png)'
- en: image by Wen Yang
  id: totrans-26
  prefs: []
  type: TYPE_NORMAL
  zh: Wen Yang 制作的图像
- en: Building an 80–20 trail recommender involves three steps.
  id: totrans-27
  prefs: []
  type: TYPE_NORMAL
  zh: 构建一个 80–20 小径推荐系统涉及三个步骤。
- en: '**Step 1: First, you need to gather data.**'
  id: totrans-28
  prefs: []
  type: TYPE_NORMAL
  zh: '**步骤 1：首先，你需要收集数据。**'
- en: There are two types of data most useful for building recommender systems.
  id: totrans-29
  prefs: []
  type: TYPE_NORMAL
  zh: 有两种数据对于构建推荐系统特别有用。
- en: '**Interaction dataset**: includes `userid` , `trailid` , and `activitytype`
    . The first two are most essential, and the third feature `activitytype` is not
    used in my project.'
  id: totrans-30
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: '**交互数据集**：包括 `userid`、`trailid` 和 `activitytype`。前两个最为重要，而第三个特征 `activitytype`
    在我的项目中未使用。'
- en: '![](../Images/53860b726aa3889cc1812f4f65fa82dc.png)'
  id: totrans-31
  prefs: []
  type: TYPE_IMG
  zh: '![](../Images/53860b726aa3889cc1812f4f65fa82dc.png)'
- en: Preview Trailforks Interaction dataset
  id: totrans-32
  prefs: []
  type: TYPE_NORMAL
  zh: 预览 Trailforks 互动数据集
- en: '**Catalog dataset**: includes selected trail information such as the trail’s
    location, length, difficulty level, global rank score and etc.'
  id: totrans-33
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: '**目录数据集**: 包括选定小径的信息，如小径的位置、长度、难度等级、全球排名分数等。'
- en: '![](../Images/8fbb63a0c303d99b4264c46df888e589.png)'
  id: totrans-34
  prefs: []
  type: TYPE_IMG
  zh: '![](../Images/8fbb63a0c303d99b4264c46df888e589.png)'
- en: '**Step 2: Data Visualization with Altair**'
  id: totrans-35
  prefs: []
  type: TYPE_NORMAL
  zh: '**步骤 2: 使用 Altair 进行数据可视化**'
- en: From the interaction dataset, there are only 10 users, and one user alone (userid
    = 454369) explored 2524 unique trails. (PS. His name is Trevor and he is our founding
    engineer at TrailForks — totally makes sense!)
  id: totrans-36
  prefs: []
  type: TYPE_NORMAL
  zh: 从互动数据集中，只有 10 个用户，其中一个用户（userid = 454369）探索了 2524 条独特的小径。（PS：他的名字是 Trevor，他是
    TrailForks 的创始工程师——这完全合理！）
- en: '[PRE0]'
  id: totrans-37
  prefs: []
  type: TYPE_PRE
  zh: '[PRE0]'
- en: '![](../Images/cf1c1f11f32a3e3927c4e614e18975a9.png)'
  id: totrans-38
  prefs: []
  type: TYPE_IMG
  zh: '![](../Images/cf1c1f11f32a3e3927c4e614e18975a9.png)'
- en: 'The below code can make a bar chart per variable name:'
  id: totrans-39
  prefs: []
  type: TYPE_NORMAL
  zh: 以下代码可以为每个变量名生成一个柱状图：
- en: '[PRE1]'
  id: totrans-40
  prefs: []
  type: TYPE_PRE
  zh: '[PRE1]'
- en: 'For example, let’s look at trail direction, physical rating difficulty title,
    and country name:'
  id: totrans-41
  prefs: []
  type: TYPE_NORMAL
  zh: 比如，我们来看看小径方向、物理评分难度标题和国家名称：
- en: '![](../Images/ce682760565f644e135762906c6d43b6.png)'
  id: totrans-42
  prefs: []
  type: TYPE_IMG
  zh: '![](../Images/ce682760565f644e135762906c6d43b6.png)'
- en: Initially, I was trying to use `global_rank_score` as the “popularity” feature,
    but there are too many records missing this feature. Fortunately, `rating` is
    positively correlated to `global_rank_score` . Below is one of my favorite types
    of visualizations, which can be done with `altair` fairly easily.
  id: totrans-43
  prefs: []
  type: TYPE_NORMAL
  zh: 起初，我尝试使用 `global_rank_score` 作为“流行度”特征，但有太多记录缺少这个特征。幸运的是，`rating` 与 `global_rank_score`
    存在正相关。下面是我最喜欢的可视化类型之一，使用 `altair` 可以很容易地完成。
- en: '[PRE2]'
  id: totrans-44
  prefs: []
  type: TYPE_PRE
  zh: '[PRE2]'
- en: '![](../Images/ba5b73d62b18134ec20350beb2406f28.png)'
  id: totrans-45
  prefs: []
  type: TYPE_IMG
  zh: '![](../Images/ba5b73d62b18134ec20350beb2406f28.png)'
- en: Similarly, we can use `difficulty_title` instead of `physical_rating` to represent
    the difficulty level, since the former has fewer missing values than the latter.
  id: totrans-46
  prefs: []
  type: TYPE_NORMAL
  zh: 同样，我们可以使用 `difficulty_title` 替代 `physical_rating` 来表示难度等级，因为前者的缺失值比后者少。
- en: Moderate → Blue, Black Diamond
  id: totrans-47
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: 中等 → 蓝色，黑钻
- en: Hard → Black Diamond, Blue
  id: totrans-48
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: 困难 → 黑钻，蓝色
- en: Extreme → Double Black Diamond
  id: totrans-49
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: 极端 → 双黑钻
- en: Easy → Green
  id: totrans-50
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: 简单 → 绿色
- en: '[PRE3]'
  id: totrans-51
  prefs: []
  type: TYPE_PRE
  zh: '[PRE3]'
- en: '![](../Images/0ead52bffcd14b16db778c925d14fdfa.png)'
  id: totrans-52
  prefs: []
  type: TYPE_IMG
  zh: '![](../Images/0ead52bffcd14b16db778c925d14fdfa.png)'
- en: '**Step 3a: Build a collaborative filtering model using** `**implicit**`'
  id: totrans-53
  prefs: []
  type: TYPE_NORMAL
  zh: '**步骤 3a: 使用** `**implicit**` **构建协同过滤模型**'
- en: 'For collaborative filtering, the only dataset you need is the interaction dataset.
    Here I calculated three other features:'
  id: totrans-54
  prefs: []
  type: TYPE_NORMAL
  zh: 对于协同过滤，你所需要的唯一数据集是互动数据集。这里我计算了三个其他特征：
- en: '`n_times_interacted` : how many times has this user interacted with this trail'
  id: totrans-55
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: '`n_times_interacted` : 该用户与此小径互动了多少次'
- en: '`n_trails_interacted` : how many trails were explored by this user'
  id: totrans-56
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: '`n_trails_interacted` : 该用户探索了多少条小径'
- en: '`n_users_interacted` : how many unique users explored this trail'
  id: totrans-57
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: '`n_users_interacted` : 多少独特用户探索了这条小径'
- en: 'The idea is the use `n_times_interacted` as an implicit feedback feature: the
    more times a user explored a certain trail, the more confidence that the user
    likes this trail.'
  id: totrans-58
  prefs: []
  type: TYPE_NORMAL
  zh: 思路是使用 `n_times_interacted` 作为隐式反馈特征：用户探索某条小径的次数越多，就越能确定用户喜欢这条小径。
- en: '![](../Images/c773975dfc96646a4944a445c1ff8d27.png)'
  id: totrans-59
  prefs: []
  type: TYPE_IMG
  zh: '![](../Images/c773975dfc96646a4944a445c1ff8d27.png)'
- en: 'Model Training:'
  id: totrans-60
  prefs: []
  type: TYPE_NORMAL
  zh: 模型训练：
- en: '[PRE4]'
  id: totrans-61
  prefs: []
  type: TYPE_PRE
  zh: '[PRE4]'
- en: 'Useful functions for inference:'
  id: totrans-62
  prefs: []
  type: TYPE_NORMAL
  zh: 推理的有用函数：
- en: '[PRE5]'
  id: totrans-63
  prefs: []
  type: TYPE_PRE
  zh: '[PRE5]'
- en: '🌄 DEMO: recommend trails for Trevor using Collaborative Filtering'
  id: totrans-64
  prefs: []
  type: TYPE_NORMAL
  zh: '🌄 DEMO: 使用协同过滤为 Trevor 推荐小径'
- en: 'I used `pandas-profiling` to find out Trevor’s preference on trails, and here’s
    the observation:'
  id: totrans-65
  prefs: []
  type: TYPE_NORMAL
  zh: 我使用了 `pandas-profiling` 来了解 Trevor 对小径的偏好，以下是观察结果：
- en: 'physical_rating: moderate, hard'
  id: totrans-66
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: 'physical_rating: 中等，困难'
- en: 'difficulty_title: Blue, Black Diamond'
  id: totrans-67
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: 'difficulty_title: 蓝色，黑钻'
- en: 'trailtype: Singletrack'
  id: totrans-68
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: 'trailtype: 单轨'
- en: 'direction: Downhill Only'
  id: totrans-69
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: 方向：仅限下坡
- en: 'most explored trail: Fitzsimmons Connector (319 times!)'
  id: totrans-70
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: 最多被探索的小径：Fitzsimmons Connector（319 次！）
- en: '![](../Images/076484d011994423377d05dba4643b31.png)![](../Images/f47ad8752446f8cdccdf56e5a08092f7.png)'
  id: totrans-71
  prefs: []
  type: TYPE_IMG
  zh: '![](../Images/076484d011994423377d05dba4643b31.png)![](../Images/f47ad8752446f8cdccdf56e5a08092f7.png)'
- en: Let’s get recommendations from the Collaborative Filtering Model →
  id: totrans-72
  prefs: []
  type: TYPE_NORMAL
  zh: 让我们从协同过滤模型中获取推荐 →
- en: '![](../Images/e67b2d492610b6b63dcbb6ea3f458ee1.png)'
  id: totrans-73
  prefs: []
  type: TYPE_IMG
  zh: '![](../Images/e67b2d492610b6b63dcbb6ea3f458ee1.png)'
- en: 'And the photo and map for the first recommended trail:'
  id: totrans-74
  prefs: []
  type: TYPE_NORMAL
  zh: 以及第一个推荐小径的照片和地图：
- en: '![](../Images/b06915379122cf1299c6aece3e2436b9.png)'
  id: totrans-75
  prefs: []
  type: TYPE_IMG
  zh: '![](../Images/b06915379122cf1299c6aece3e2436b9.png)'
- en: '**Step 3b: Content-based Filtering**'
  id: totrans-76
  prefs: []
  type: TYPE_NORMAL
  zh: '**步骤 3b: 基于内容的过滤**'
- en: Since Collaborative filtering is based on the interaction dataset, this means
    that it won’t work for the cold-user scenario. If a new user hasn’t explored any
    trails, we won’t be able to find similar users to this new user. That’s why we
    need Content-based filtering to fill the gap.
  id: totrans-77
  prefs: []
  type: TYPE_NORMAL
  zh: 由于协同过滤基于交互数据集，这意味着它在冷启动用户场景下不起作用。如果新用户没有探索任何路线，我们将无法找到与该新用户相似的用户。这就是为什么我们需要基于内容的过滤来填补这个空白。
- en: 'Content-based filtering is all about finding “similar trails”, therefore we
    need to decide two things:'
  id: totrans-78
  prefs: []
  type: TYPE_NORMAL
  zh: 基于内容的过滤完全是关于寻找“相似的路线”，因此我们需要决定两件事：
- en: similar in what characteristics
  id: totrans-79
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: 在什么特征上相似
- en: how to measure similarity
  id: totrans-80
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: 如何测量相似性
- en: 'The first question is a feature engineering question. The below features are
    selected as trail characteristics:'
  id: totrans-81
  prefs: []
  type: TYPE_NORMAL
  zh: 第一个问题是特征工程问题。以下特征被选为路线特征：
- en: 'Numerical features: `stat_climb` , `stat_descent` , `stat_distance` , `rating`'
  id: totrans-82
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: 数值特征：`stat_climb` 、`stat_descent` 、`stat_distance` 、`rating`
- en: 'Text features: `title`, `difficulty_title` , `trailtype` , `direction` and
    `country_title`'
  id: totrans-83
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: 文本特征：`title`、`difficulty_title`、`trailtype`、`direction` 和 `country_title`
- en: As for the second question, we use cosine similarity which is a commonly used
    measure in terms of RecSys practice.
  id: totrans-84
  prefs: []
  type: TYPE_NORMAL
  zh: 对于第二个问题，我们使用余弦相似度，这是在推荐系统实践中常用的度量方法。
- en: 💻 Code example for feature engineering and similarity calculation
  id: totrans-85
  prefs: []
  type: TYPE_NORMAL
  zh: 💻 特征工程和相似度计算的代码示例
- en: '[PRE6]'
  id: totrans-86
  prefs: []
  type: TYPE_PRE
  zh: '[PRE6]'
- en: '🌄 DEMO: recommend trails for Trevor using Content-based Filtering'
  id: totrans-87
  prefs: []
  type: TYPE_NORMAL
  zh: 🌄 DEMO：使用基于内容的过滤为 Trevor 推荐路线
- en: Let’s find trails similar to Trevor’s second most explored trail (‘A-Line —
    Lower’)
  id: totrans-88
  prefs: []
  type: TYPE_NORMAL
  zh: 让我们找到与 Trevor 的第二条最常探索的路线（‘A-Line — Lower’）相似的路线
- en: '![](../Images/6828e474a35e00a6a97a82eff7f6e82d.png)![](../Images/0986a632ebcd8e64f77ca19a62f4c360.png)'
  id: totrans-89
  prefs: []
  type: TYPE_IMG
  zh: '![](../Images/6828e474a35e00a6a97a82eff7f6e82d.png)![](../Images/0986a632ebcd8e64f77ca19a62f4c360.png)'
- en: Wow! The content-based recommender is pretty impressive, which recommended most
    medium to hard difficulty levels, and downhill direction trails. My coworker Trevor
    is a serious mountain biker, and he’s quite pleased about such recommendations!
  id: totrans-90
  prefs: []
  type: TYPE_NORMAL
  zh: 哇！基于内容的推荐器非常令人印象深刻，它推荐了大多数中等到困难难度的山地车路线和下坡方向的路线。我的同事 Trevor 是一个资深的山地车骑行者，他对这样的推荐非常满意！
- en: OK, now the hard work is done. I’d like to share a few reflections.
  id: totrans-91
  prefs: []
  type: TYPE_NORMAL
  zh: 好吧，现在艰苦的工作已经完成。我想分享一些反思。
- en: 🏔️ 2\. Things to consider for preparing hackathon type of projects
  id: totrans-92
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
  zh: 🏔️ 2. 准备黑客马拉松类型项目时需要考虑的事项
- en: 'The biggest lesson I learned from participating in hackathon type of projects
    is that “less is more”. It’s very tempting to build a recommender system including
    both the data science component, the backend database component, and the frontend
    or just streamlit component for a nicer demo. But hackathons are typical with
    time limitations, you need to be aware of which parts you are willing to sacrifice
    a bit to give time for other things. For me, such parts are:'
  id: totrans-93
  prefs: []
  type: TYPE_NORMAL
  zh: 从参与黑客马拉松类型的项目中我学到的最大教训是“少即是多”。尽管很诱人去构建一个包括数据科学组件、后端数据库组件以及前端或只是用于更好演示的 streamlit
    组件的推荐系统，但黑客马拉松通常时间有限，你需要意识到哪些部分你愿意稍微牺牲一点，以便留出时间做其他事情。对我来说，这些部分是：
- en: 'EDA and data visualization: exploratory data analysis is like a black hole,
    it can suck you in and it feels no obvious ending point. The same goes for data
    visualization, you could spend endless time to make one chart pretty and prettier,
    but that would only benefit your project from 80 to 82, which is the opposite
    of the 80–20 rule. My solutions are `pandas-profiling` and `altair` . Before performing
    any data analysis, I would use `pandas-profiling` to check the missing values,
    the general distribution, and correlations, this will help me narrow down the
    data I can keep and prepare for feature engineering. `altair` is really simple
    to use and with a few compacted lines, you can easily make nice-looking bar charts
    and scatter+distribution charts. One of my tips is that I would just copy a few
    code snippets on the notebook beforehand, and only use those already in my notebook
    to remove any further temptations.'
  id: totrans-94
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: EDA 和数据可视化：探索性数据分析就像一个黑洞，它可以把你吸进去，并且没有明显的结束点。数据可视化也是如此，你可能会花费无尽的时间来美化一个图表，但这只会将你的项目从
    80 提升到 82，这与 80-20 规则相反。我的解决方案是`pandas-profiling` 和 `altair`。在进行任何数据分析之前，我会使用
    `pandas-profiling` 来检查缺失值、总体分布和相关性，这将帮助我缩小可以保留的数据范围并为特征工程做准备。`altair` 使用起来非常简单，通过几行紧凑的代码，你可以轻松制作出漂亮的条形图和散点+分布图。我有一个小技巧，就是提前在笔记本中复制几个代码片段，只使用已经在笔记本中的那些代码，以避免任何进一步的诱惑。
- en: The demo is about storytelling. You can spend all-nighters to have the fanciest
    and most complex hackathon project, but you need to reserve enough time to prepare
    how you want to tell the story. A story can start with a Why (the motivation and
    background of why you want to do this project), a What (what happened, what’s
    surprising, what works), a little bit of How (keep it at a high level, equations
    and code are impressive in itself but it creates unnecessary intimidation and
    fatigue for people who only have 5 mins to understand your project). For example,
    this post is definitely too long and has too many details to demo standard. Lastly,
    adding a bit of drama and humor is always an advantage.
  id: totrans-95
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: 演示是关于讲故事的。你可以熬夜制作最华丽和最复杂的黑客松项目，但你需要留出足够的时间来准备如何讲述这个故事。一个故事可以从 Why（你为什么要做这个项目的动机和背景）、What（发生了什么，什么令人惊讶，什么有效）、一点点
    How（保持在高层次，方程式和代码本身很令人印象深刻，但它会给那些只有5分钟时间理解你的项目的人带来不必要的威慑和疲惫）开始。例如，这篇文章绝对过长，细节过多，不符合演示标准。最后，加入一些戏剧性和幽默感总是一个优势。
- en: 🏔️ 3\. Further considerations for productionize recommender system
  id: totrans-96
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
  zh: 🏔️ 3\. 将推荐系统生产化的进一步考虑
- en: 'Lastly, if you made it this far, and you really want to productionize the prototype
    recommender system, there are three pieces of ghost knowledge* I’d like to share:'
  id: totrans-97
  prefs: []
  type: TYPE_NORMAL
  zh: 最后，如果你已经看到这里，并且真的想将原型推荐系统投入生产，我有三点幽灵知识*想与大家分享：
- en: '*ghost knowledge (*[*source*](https://vickiboykis.com/2021/03/26/the-ghosts-in-the-data/)*)
    :*'
  id: totrans-98
  prefs: []
  type: TYPE_NORMAL
  zh: '*幽灵知识 (*[*来源*](https://vickiboykis.com/2021/03/26/the-ghosts-in-the-data/)*)
    :*'
- en: Knowledge that is present somewhere in the epistemic community, and is perhaps
    readily accessible to some central member of that community, but it is not really
    written down anywhere and it’s not clear how to access it.
  id: totrans-99
  prefs:
  - PREF_BQ
  type: TYPE_NORMAL
  zh: 知识存在于某个认知社区中，也许某个社区的核心成员很容易获得，但实际上并没有被写下来，也不清楚如何获取它。
- en: 'Batch vs real-time recommender system: batch system sounds simple and straightforward,
    but it has its own complications. If you use `implicit` for collaborative filtering
    and using `cosine similarity` to retrieve similar content, it is very likely the
    latency will not satisfy your frontend API requirement. You can try to improve
    it using indexing like FAISS, but it might still be higher than 500ms. That’s
    why you might consider doing batch computations for each user and each item. Depending
    on the size of users and items, the batch recommendations might take more than
    4hours and it will not refresh until you retrain the model. A quick tip is that
    only using users have any interactions in the past 90 days, or only do batch updates
    for items published in the past 90 days.'
  id: totrans-100
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
  zh: 批量 vs 实时推荐系统：批量系统听起来简单直接，但它有其自身的复杂性。如果你使用`implicit`进行协同过滤并使用`cosine similarity`检索类似内容，很可能延迟无法满足你的前端
    API 要求。你可以尝试使用像 FAISS 这样的索引来提高速度，但它可能仍然高于500ms。这就是为什么你可能考虑对每个用户和每个项目进行批量计算。根据用户和项目的规模，批量推荐可能需要超过4小时，并且直到你重新训练模型才会刷新。一个快速提示是，仅使用过去90天内有任何互动的用户，或仅对过去90天内发布的项目进行批量更新。
- en: '“Cold-Start” might not be as severe as you think: I remember when I first learned
    the recommender system, books and talks made sure to emphasize the “cold-start”
    problem as if it is the most difficult scenario to solve. But for Outside Feed,
    this problem is not even top-3\. The reason is that a lot of people are used to
    reading the freshest and most recent content on their Outside Feed, and it is
    fairly easy to rank content by publish date to surface the new items (cold items),
    thus it is easy to guarantee new items get pageview (interactions) data. For us,
    the much harder problem is actually “how to recommend ever-green content”? If
    we recommend an all-time popular article first written in 2016, even though this
    piece of content might be relevant to users'' tastes, users might get the impression
    that we didn’t have fresh content to recommend.'
  id: totrans-101
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
  zh: “Cold-Start” 可能并不像你想象的那么严重：我记得当我第一次学习推荐系统时，书籍和讲座总是强调“cold-start”问题，好像这是最难解决的场景。然而，对于
    Outside Feed 来说，这个问题甚至不在前3名。原因是很多人习惯于在 Outside Feed 上阅读最新的内容，因此通过发布日期对内容进行排序，以便展示新项（冷项）是相当容易的，从而能够保证新项获得页面浏览（互动）数据。对我们来说，更难的问题实际上是“如何推荐永恒的内容”？如果我们推荐一篇最初于2016年撰写的全时热门文章，即使这篇内容可能符合用户的口味，用户可能会觉得我们没有新鲜的内容可以推荐。
- en: 'The third piece of knowledge is an anecdote I heard from another recommender
    system practitioner (Andy - the CTO from [miso.ai](https://miso.ai/)): he once
    built a homepage recommender for a clothing website. Even though the learn-to-rank
    ranking system learned that the “Black” color is likely to be the most popular
    one, thus it recommended the first page with all clothing in different styles
    but all in black color. Well, the model isn’t wrong, but it looks terrible at
    first glance and it performed poorly in terms of enticing people to buy clothes.
    He joked that he’s been studying recommender systems for years and years, but
    nobody mentioned that a good color complementary design might have a huge effect.'
  id: totrans-102
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
  zh: 第三个知识点是我从另一位推荐系统从业者（Andy - 来自[miso.ai](https://miso.ai/)的CTO）那里听到的趣闻：他曾经为一个服装网站构建了一个主页推荐系统。尽管学习排名的系统了解到“黑色”可能是最受欢迎的颜色，因此推荐了首页上全是黑色的不同风格的服装。好吧，模型并没有错，但乍一看效果很糟糕，并且在吸引人们购买衣物方面表现不佳。他开玩笑说，他研究推荐系统已经很多年了，但没有人提到好的颜色搭配设计可能会有很大的影响。
- en: That’s it for my last medium post in the year 2022\. Thank you tremendously
    for the support. If you have any thoughts or reflections or better yet — more
    recsys anecdotes, please send them my way! Love to compile them into the fascinating
    book I dream to write < The Pleasure and Sorrow of Building RecSys> 🐳~~~
  id: totrans-103
  prefs: []
  type: TYPE_NORMAL
  zh: 这就是我在2022年的最后一篇中等帖子。非常感谢你的支持。如果你有任何想法或反思，或者更好——更多的推荐系统趣闻，请发送给我！我很乐意将它们编入我梦想写的迷人书籍《构建推荐系统的乐趣与悲哀》中
    < The Pleasure and Sorrow of Building RecSys> 🐳~~~