colaer 2021-10-21 11:49:53 +08:00
parent d61aaaab70
commit cc847839d7
370 changed files with 29429 additions and 7 deletions

LICENSE

@@ -1,9 +1,201 @@
MIT License
Copyright (c) <year> <copyright holders>
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

README.md

@@ -1,3 +1,803 @@
# jstarcraft-rns-master
**A search engine built on JStarCraft**
## 1. Project Introduction
**JStarCraft RNS is a lightweight engine for the information retrieval domain, licensed under Apache 2.0.**
It focuses on the fundamental problems of information retrieval: recommendation and search.
It provides a recommendation engine designed and implemented for industrial-grade scenarios.
It provides a search engine designed and implemented for industrial-grade scenarios.
****
## 2. Features
* 1. Cross-platform
* [2. Serial and parallel computation](https://github.com/HongZhaoHua/jstarcraft-ai)
* [3. CPU and GPU hardware acceleration](https://github.com/HongZhaoHua/jstarcraft-ai)
* [4. Model saving and loading](https://github.com/HongZhaoHua/jstarcraft-ai)
* 5. A rich set of recommendation and search algorithms
* 6. Broad scripting support
    * Groovy
    * JS
    * Lua
    * MVEL
    * Python
    * Ruby
* [7. A rich set of evaluation metrics](#评估指标)
    * [Ranking metrics](#排序指标)
    * [Rating metrics](#评分指标)
****
## 3. Installation
JStarCraft RNS requires the following environment:
* JDK 8 or above
* Maven 3
#### 3.1 Install the JStarCraft-Core framework
```shell
git clone https://github.com/HongZhaoHua/jstarcraft-core.git
mvn install -Dmaven.test.skip=true
```
#### 3.2 Install the JStarCraft-AI framework
```shell
git clone https://github.com/HongZhaoHua/jstarcraft-ai.git
mvn install -Dmaven.test.skip=true
```
#### 3.3 Install the JStarCraft-RNS engine
```shell
git clone https://github.com/HongZhaoHua/jstarcraft-rns.git
mvn install -Dmaven.test.skip=true
```
****
## 4. Usage
#### 4.1 Configure the dependency
* Maven dependency
```xml
<dependency>
<groupId>com.jstarcraft</groupId>
<artifactId>rns</artifactId>
<version>1.0</version>
</dependency>
```
* Gradle dependency
```gradle
compile group: 'com.jstarcraft', name: 'rns', version: '1.0'
```
#### 4.2 Build a configurator
```java
Properties keyValues = new Properties();
keyValues.load(this.getClass().getResourceAsStream("/data.properties"));
keyValues.load(this.getClass().getResourceAsStream("/recommend/benchmark/randomguess-test.properties"));
Configurator configurator = new Configurator(keyValues);
```
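The two properties files above ship with the project's test resources. As a minimal sketch of what such a configuration can look like — the option keys below are the ones read by the engine sources included later in this commit (AbstractModel, EpocheModel, FactorizationMachineModel), while the values are illustrative assumptions:
```properties
# column bindings read by AbstractModel.prepare(...)
data.model.fields.user=user
data.model.fields.item=item
# iteration options read by EpocheModel.prepare(...) (illustrative values)
recommender.iterator.maximum=100
recommender.recommender.earlystop=false
# learning options read by FactorizationMachineModel.prepare(...) (illustrative values)
recommender.iterator.learnrate=0.01
recommender.factor.number=10
```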
#### 4.3 Train and evaluate a model
* Build a ranking task
```java
RankingTask task = new RankingTask(RandomGuessModel.class, configurator);
// train and evaluate the ranking model
task.execute();
```
* Build a rating task
```java
RatingTask task = new RatingTask(RandomGuessModel.class, configurator);
// train and evaluate the rating model
task.execute();
```
#### 4.4 Retrieve the model
```java
// retrieve the trained model
Model model = task.getModel();
```
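With the model in hand, individual instances can be scored. A minimal sketch, assuming (based on the engine sources later in this commit) that `predict(DataInstance)` stores its score in the instance's quantity mark; `module` is a hypothetical `DataModule` holding the instances to score:
```java
// score every instance of a data module with the trained model
for (DataInstance instance : module) {
    model.predict(instance);
    // the quantity mark is assumed to hold the predicted score after predict(...)
    float score = instance.getQuantityMark();
}
```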
****
## 5. Concepts
#### 5.1 Why information retrieval is needed
```
With the development of information technology and the Internet, people have moved from an age of information underload into an age of information overload.
Both information consumers and information producers face challenges:
* for information consumers, finding valuable information in a massive volume of data is very difficult;
* for information producers, getting information noticed in a massive volume of data is also very difficult;
The task of information retrieval is to connect users and information: it helps users find information valuable to them, and it exposes information to the users interested in it, a win-win for information consumers and information producers alike.
```
#### 5.2 Similarities and differences between search and recommendation
```
From the perspective of information retrieval:
* search and recommendation are the two main means of obtaining information;
* search and recommendation are two different ways of obtaining information;
* search (Search) is active and explicit;
* recommendation (Recommend) is passive and fuzzy;
Search and recommendation are complementary tools.
```
#### 5.3 What the JStarCraft-RNS engine solves
```
The JStarCraft-RNS engine targets the two core tasks of recommendation and search: ranking prediction (Ranking) and rating prediction (Rating).
```
#### 5.4 Differences between Ranking tasks and Rating tasks
```
Algorithms and evaluation metrics are divided into ranking (Ranking) and rating (Rating) according to the underlying problem they solve.
The fundamental difference between the two lies in their objective functions.
In plain terms:
Ranking algorithms are based on implicit feedback data and tend to fit users' orderings (attention).
Rating algorithms are based on explicit feedback data and tend to fit users' ratings (satisfaction).
```
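As a concrete illustration of the differing objective functions (a generic sketch, not the exact loss of any specific model listed below): a typical Rating model minimizes the squared error of observed ratings, while a pairwise Ranking model such as BPR maximizes the log-likelihood that observed items are ranked above unobserved ones:
```latex
\text{Rating:}\quad \min_{\Theta} \sum_{(u,i)\in\mathcal{O}} \bigl(r_{ui} - \hat{r}_{ui}(\Theta)\bigr)^2
\qquad
\text{Ranking (BPR):}\quad \max_{\Theta} \sum_{(u,i,j)} \ln \sigma\bigl(\hat{x}_{ui}(\Theta) - \hat{x}_{uj}(\Theta)\bigr)
```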
#### 5.5 Can Rating algorithms be used for Ranking problems?
```
The key is whether attention and satisfaction stay consistent in the specific scenario.
In plain terms:
what people pay attention to is not necessarily what they are satisfied with (for example: personal income tax).
```
****
## 6. Examples
#### 6.1 Interacting with the JStarCraft-RNS engine from BeanShell scripts
* [Complete example](https://github.com/HongZhaoHua/jstarcraft-rns/tree/master/src/test/java/com/jstarcraft/rns/script)
* Write a BeanShell script that trains and evaluates a model, saved as Model.bsh
```java
// build the configuration
keyValues = new Properties();
keyValues.load(loader.getResourceAsStream("data.properties"));
keyValues.load(loader.getResourceAsStream("model/benchmark/randomguess-test.properties"));
configurator = new Configurator(keyValues);
// this object is returned to the Java program
_data = new HashMap();
// build the ranking task
task = new RankingTask(RandomGuessModel.class, configurator);
// train and evaluate the model and collect the ranking metrics
measures = task.execute();
_data.put("precision", measures.get(PrecisionEvaluator.class));
_data.put("recall", measures.get(RecallEvaluator.class));
// build the rating task
task = new RatingTask(RandomGuessModel.class, configurator);
// train and evaluate the model and collect the rating metrics
measures = task.execute();
_data.put("mae", measures.get(MAEEvaluator.class));
_data.put("mse", measures.get(MSEEvaluator.class));
_data;
```
* 使用JStarCraft框架从Model.bsh文件加载并执行BeanShell脚本
```java
// load the BeanShell script
File file = new File(ScriptTestCase.class.getResource("Model.bsh").toURI());
String script = FileUtils.readFileToString(file, StringUtility.CHARSET);
// register the Java classes used by the BeanShell script
ScriptContext context = new ScriptContext();
context.useClasses(Properties.class, Assert.class);
context.useClass("Configurator", MapConfigurator.class);
context.useClasses("com.jstarcraft.ai.evaluate");
context.useClasses("com.jstarcraft.rns.task");
context.useClasses("com.jstarcraft.rns.model.benchmark");
// register the Java variables used by the BeanShell script
ScriptScope scope = new ScriptScope();
scope.createAttribute("loader", loader);
// execute the BeanShell script (BeanShellExpression is assumed here as the BeanShell counterpart of the other *Expression classes)
ScriptExpression expression = new BeanShellExpression(context, scope, script);
Map<String, Float> data = expression.doWith(Map.class);
Assert.assertEquals(0.005825241F, data.get("precision"), 0F);
Assert.assertEquals(0.011579763F, data.get("recall"), 0F);
Assert.assertEquals(1.2708743F, data.get("mae"), 0F);
Assert.assertEquals(2.425075F, data.get("mse"), 0F);
```
#### 6.2 Interacting with the JStarCraft-RNS engine from Groovy scripts
* [Complete example](https://github.com/HongZhaoHua/jstarcraft-rns/tree/master/src/test/java/com/jstarcraft/rns/script)
* Write a Groovy script that trains and evaluates a model, saved as Model.groovy
```groovy
// build the configuration
def keyValues = new Properties();
keyValues.load(loader.getResourceAsStream("data.properties"));
keyValues.load(loader.getResourceAsStream("recommend/benchmark/randomguess-test.properties"));
def configurator = new Configurator(keyValues);
// this object is returned to the Java program
def _data = [:];
// build the ranking task
task = new RankingTask(RandomGuessModel.class, configurator);
// train and evaluate the model and collect the ranking metrics
measures = task.execute();
_data.precision = measures.get(PrecisionEvaluator.class);
_data.recall = measures.get(RecallEvaluator.class);
// build the rating task
task = new RatingTask(RandomGuessModel.class, configurator);
// train and evaluate the model and collect the rating metrics
measures = task.execute();
_data.mae = measures.get(MAEEvaluator.class);
_data.mse = measures.get(MSEEvaluator.class);
_data;
```
* Load and execute the Groovy script from the Model.groovy file with the JStarCraft framework
```java
// load the Groovy script
File file = new File(ScriptTestCase.class.getResource("Model.groovy").toURI());
String script = FileUtils.readFileToString(file, StringUtility.CHARSET);
// register the Java classes used by the Groovy script
ScriptContext context = new ScriptContext();
context.useClasses(Properties.class, Assert.class);
context.useClass("Configurator", MapConfigurator.class);
context.useClasses("com.jstarcraft.ai.evaluate");
context.useClasses("com.jstarcraft.rns.task");
context.useClasses("com.jstarcraft.rns.model.benchmark");
// register the Java variables used by the Groovy script
ScriptScope scope = new ScriptScope();
scope.createAttribute("loader", loader);
// execute the Groovy script
ScriptExpression expression = new GroovyExpression(context, scope, script);
Map<String, Float> data = expression.doWith(Map.class);
```
#### 6.3 Interacting with the JStarCraft-RNS engine from JS scripts
* [Complete example](https://github.com/HongZhaoHua/jstarcraft-rns/tree/master/src/test/java/com/jstarcraft/rns/script)
* Write a JS script that trains and evaluates a model, saved as Model.js
```js
// build the configuration
var keyValues = new Properties();
keyValues.load(loader.getResourceAsStream("data.properties"));
keyValues.load(loader.getResourceAsStream("recommend/benchmark/randomguess-test.properties"));
var configurator = new Configurator([keyValues]);
// this object is returned to the Java program
var _data = {};
// build the ranking task
task = new RankingTask(RandomGuessModel.class, configurator);
// train and evaluate the model and collect the ranking metrics
measures = task.execute();
_data['precision'] = measures.get(PrecisionEvaluator.class);
_data['recall'] = measures.get(RecallEvaluator.class);
// build the rating task
task = new RatingTask(RandomGuessModel.class, configurator);
// train and evaluate the model and collect the rating metrics
measures = task.execute();
_data['mae'] = measures.get(MAEEvaluator.class);
_data['mse'] = measures.get(MSEEvaluator.class);
_data;
```
* Load and execute the JS script from the Model.js file with the JStarCraft framework
```java
// load the JS script
File file = new File(ScriptTestCase.class.getResource("Model.js").toURI());
String script = FileUtils.readFileToString(file, StringUtility.CHARSET);
// register the Java classes used by the JS script
ScriptContext context = new ScriptContext();
context.useClasses(Properties.class, Assert.class);
context.useClass("Configurator", MapConfigurator.class);
context.useClasses("com.jstarcraft.ai.evaluate");
context.useClasses("com.jstarcraft.rns.task");
context.useClasses("com.jstarcraft.rns.model.benchmark");
// register the Java variables used by the JS script
ScriptScope scope = new ScriptScope();
scope.createAttribute("loader", loader);
// execute the JS script
ScriptExpression expression = new JsExpression(context, scope, script);
Map<String, Float> data = expression.doWith(Map.class);
```
#### 6.4 Interacting with the JStarCraft-RNS engine from Kotlin scripts
* [Complete example](https://github.com/HongZhaoHua/jstarcraft-rns/tree/master/src/test/java/com/jstarcraft/rns/script)
* Write a Kotlin script that trains and evaluates a model, saved as Model.kt
```kotlin
// build the configuration
var keyValues = Properties();
var loader = bindings["loader"] as ClassLoader;
keyValues.load(loader.getResourceAsStream("data.properties"));
keyValues.load(loader.getResourceAsStream("model/benchmark/randomguess-test.properties"));
var option = Option(keyValues);
// this object is returned to the Java program
var _data = mutableMapOf<String, Float>();
// build the ranking task
var rankingTask = RankingTask(RandomGuessModel::class.java, option);
// train and evaluate the model and collect the ranking metrics
val rankingMeasures = rankingTask.execute();
_data["precision"] = rankingMeasures.getFloat(PrecisionEvaluator::class.java);
_data["recall"] = rankingMeasures.getFloat(RecallEvaluator::class.java);
// build the rating task
var ratingTask = RatingTask(RandomGuessModel::class.java, option);
// train and evaluate the model and collect the rating metrics
var ratingMeasures = ratingTask.execute();
_data["mae"] = ratingMeasures.getFloat(MAEEvaluator::class.java);
_data["mse"] = ratingMeasures.getFloat(MSEEvaluator::class.java);
_data;
```
* Load and execute the Kotlin script from the Model.kt file with the JStarCraft framework
```java
// load the Kotlin script
File file = new File(ScriptTestCase.class.getResource("Model.kt").toURI());
String script = FileUtils.readFileToString(file, StringUtility.CHARSET);
// register the Java classes used by the Kotlin script
ScriptContext context = new ScriptContext();
context.useClasses(Properties.class, Assert.class);
context.useClass("Option", MapOption.class);
context.useClasses("com.jstarcraft.ai.evaluate");
context.useClasses("com.jstarcraft.rns.task");
context.useClasses("com.jstarcraft.rns.model.benchmark");
// register the Java variables used by the Kotlin script
ScriptScope scope = new ScriptScope();
scope.createAttribute("loader", loader);
// execute the Kotlin script
ScriptExpression expression = new KotlinExpression(context, scope, script);
Map<String, Float> data = expression.doWith(Map.class);
```
#### 6.5 Interacting with the JStarCraft-RNS engine from Lua scripts
* [Complete example](https://github.com/HongZhaoHua/jstarcraft-rns/tree/master/src/test/java/com/jstarcraft/rns/script)
* Write a Lua script that trains and evaluates a model, saved as Model.lua
```lua
-- build the configuration
local keyValues = Properties.new();
keyValues:load(loader:getResourceAsStream("data.properties"));
keyValues:load(loader:getResourceAsStream("recommend/benchmark/randomguess-test.properties"));
local configurator = Configurator.new({ keyValues });
-- this object is returned to the Java program
local _data = {};
-- build the ranking task
task = RankingTask.new(RandomGuessModel, configurator);
-- train and evaluate the model and collect the ranking metrics
measures = task:execute();
_data["precision"] = measures:get(PrecisionEvaluator);
_data["recall"] = measures:get(RecallEvaluator);
-- build the rating task
task = RatingTask.new(RandomGuessModel, configurator);
-- train and evaluate the model and collect the rating metrics
measures = task:execute();
_data["mae"] = measures:get(MAEEvaluator);
_data["mse"] = measures:get(MSEEvaluator);
return _data;
```
* Load and execute the Lua script from the Model.lua file with the JStarCraft framework
```java
// load the Lua script
File file = new File(ScriptTestCase.class.getResource("Model.lua").toURI());
String script = FileUtils.readFileToString(file, StringUtility.CHARSET);
// register the Java classes used by the Lua script
ScriptContext context = new ScriptContext();
context.useClasses(Properties.class, Assert.class);
context.useClass("Configurator", MapConfigurator.class);
context.useClasses("com.jstarcraft.ai.evaluate");
context.useClasses("com.jstarcraft.rns.task");
context.useClasses("com.jstarcraft.rns.model.benchmark");
// register the Java variables used by the Lua script
ScriptScope scope = new ScriptScope();
scope.createAttribute("loader", loader);
// execute the Lua script
ScriptExpression expression = new LuaExpression(context, scope, script);
LuaTable data = expression.doWith(LuaTable.class);
```
#### 6.6 Interacting with the JStarCraft-RNS engine from Python scripts
* [Complete example](https://github.com/HongZhaoHua/jstarcraft-rns/tree/master/src/test/java/com/jstarcraft/rns/script)
* Write a Python script that trains and evaluates a model, saved as Model.py
```python
# build the configuration
keyValues = Properties()
keyValues.load(loader.getResourceAsStream("data.properties"))
keyValues.load(loader.getResourceAsStream("recommend/benchmark/randomguess-test.properties"))
configurator = Configurator([keyValues])
# this object is returned to the Java program
_data = {}
# build the ranking task
task = RankingTask(RandomGuessModel, configurator)
# train and evaluate the model and collect the ranking metrics
measures = task.execute()
_data['precision'] = measures.get(PrecisionEvaluator)
_data['recall'] = measures.get(RecallEvaluator)
# build the rating task
task = RatingTask(RandomGuessModel, configurator)
# train and evaluate the model and collect the rating metrics
measures = task.execute()
_data['mae'] = measures.get(MAEEvaluator)
_data['mse'] = measures.get(MSEEvaluator)
```
* Load and execute the Python script from the Model.py file with the JStarCraft framework
```java
// set the Python environment variable
System.setProperty("python.console.encoding", StringUtility.CHARSET.name());
// load the Python script
File file = new File(PythonTestCase.class.getResource("Model.py").toURI());
String script = FileUtils.readFileToString(file, StringUtility.CHARSET);
// register the Java classes used by the Python script
ScriptContext context = new ScriptContext();
context.useClasses(Properties.class, Assert.class);
context.useClass("Configurator", MapConfigurator.class);
context.useClasses("com.jstarcraft.ai.evaluate");
context.useClasses("com.jstarcraft.rns.task");
context.useClasses("com.jstarcraft.rns.model.benchmark");
// register the Java variables used by the Python script
ScriptScope scope = new ScriptScope();
scope.createAttribute("loader", loader);
// execute the Python script
ScriptExpression expression = new PythonExpression(context, scope, script);
Map<String, Double> data = expression.doWith(Map.class);
```
#### 6.7 Interacting with the JStarCraft-RNS engine from Ruby scripts
* [Complete example](https://github.com/HongZhaoHua/jstarcraft-rns/tree/master/src/test/java/com/jstarcraft/rns/script)
* Write a Ruby script that trains and evaluates a model, saved as Model.rb
```ruby
# build the configuration
keyValues = Properties.new()
keyValues.load($loader.getResourceAsStream("data.properties"))
keyValues.load($loader.getResourceAsStream("model/benchmark/randomguess-test.properties"))
configurator = Configurator.new(keyValues)
# this object is returned to the Java program
_data = Hash.new()
# build the ranking task
task = RankingTask.new(RandomGuessModel.java_class, configurator)
# train and evaluate the model and collect the ranking metrics
measures = task.execute()
_data['precision'] = measures.get(PrecisionEvaluator.java_class)
_data['recall'] = measures.get(RecallEvaluator.java_class)
# build the rating task
task = RatingTask.new(RandomGuessModel.java_class, configurator)
# train and evaluate the model and collect the rating metrics
measures = task.execute()
_data['mae'] = measures.get(MAEEvaluator.java_class)
_data['mse'] = measures.get(MSEEvaluator.java_class)
_data;
```
* Load and execute the Ruby script from the Model.rb file with the JStarCraft framework
```java
// load the Ruby script
File file = new File(ScriptTestCase.class.getResource("Model.rb").toURI());
String script = FileUtils.readFileToString(file, StringUtility.CHARSET);
// register the Java classes used by the Ruby script
ScriptContext context = new ScriptContext();
context.useClasses(Properties.class, Assert.class);
context.useClass("Configurator", MapConfigurator.class);
context.useClasses("com.jstarcraft.ai.evaluate");
context.useClasses("com.jstarcraft.rns.task");
context.useClasses("com.jstarcraft.rns.model.benchmark");
// register the Java variables used by the Ruby script
ScriptScope scope = new ScriptScope();
scope.createAttribute("loader", loader);
// execute the Ruby script
ScriptExpression expression = new RubyExpression(context, scope, script);
Map<String, Double> data = expression.doWith(Map.class);
Assert.assertEquals(0.005825241096317768D, data.get("precision"), 0D);
Assert.assertEquals(0.011579763144254684D, data.get("recall"), 0D);
Assert.assertEquals(1.270874261856079D, data.get("mae"), 0D);
Assert.assertEquals(2.425075054168701D, data.get("mse"), 0D);
```
****
## 7. Comparison
#### 7.1 Ranking model comparison
* Benchmark models
| Name | Dataset | Training (ms) | Prediction (ms) | AUC | MAP | MRR | NDCG | Novelty | Precision | Recall |
| :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
| MostPopular | filmtrust | 43 | 273 | 0.92080 | 0.41246 | 0.57196 | 0.51583 | 11.79295 | 0.33230 | 0.62385 |
| RandomGuess | filmtrust | 38 | 391 | 0.51922 | 0.00627 | 0.02170 | 0.01121 | 91.94900 | 0.00550 | 0.01262 |
* Collaborative models
| Name | Dataset | Training (ms) | Prediction (ms) | AUC | MAP | MRR | NDCG | Novelty | Precision | Recall |
| :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
| AoBPR | filmtrust | 12448 | 253 | 0.89324 | 0.38967 | 0.53990 | 0.48338 | 21.13004 | 0.32295 | 0.56864 |
| AspectRanking | filmtrust | 177 | 58 | 0.85130 | 0.15498 | 0.42480 | 0.26012 | 37.36273 | 0.13302 | 0.31292 |
| BHFreeRanking | filmtrust | 5720 | 4257 | 0.92080 | 0.41316 | 0.57231 | 0.51662 | 11.79567 | 0.33276 | 0.62500 |
| BPR | filmtrust | 4228 | 137 | 0.89390 | 0.39886 | 0.54790 | 0.49180 | 21.46738 | 0.32268 | 0.57623 |
| BUCMRanking | filmtrust | 2111 | 1343 | 0.90782 | 0.39794 | 0.55776 | 0.49651 | 13.08073 | 0.32407 | 0.59141 |
| CDAE | filmtrust | 89280 | 376 | 0.91880 | 0.40759 | 0.56855 | 0.51089 | 11.82466 | 0.33051 | 0.61967 |
| CLiMF | filmtrust | 48429 | 140 | 0.88293 | 0.37395 | 0.52407 | 0.46572 | 19.38964 | 0.32049 | 0.54605 |
| DeepFM | filmtrust | 69264 | 99 | 0.91679 | 0.40580 | 0.56995 | 0.50985 | 11.90242 | 0.32719 | 0.61426 |
| EALS | filmtrust | 850 | 185 | 0.86132 | 0.31263 | 0.45680 | 0.39475 | 20.08964 | 0.27381 | 0.46271 |
| FISMAUC | filmtrust | 2338 | 663 | 0.91216 | 0.40032 | 0.55730 | 0.50114 | 12.07469 | 0.32845 | 0.60294 |
| FISMRMSE | filmtrust | 4030 | 729 | 0.91482 | 0.40795 | 0.56470 | 0.50920 | 11.91234 | 0.33044 | 0.61107 |
| GBPR | filmtrust | 14827 | 150 | 0.92113 | 0.41003 | 0.57144 | 0.51464 | 11.87609 | 0.33090 | 0.62512 |
| HMM | game | 38697 | 11223 | 0.80559 | 0.18156 | 0.37516 | 0.25803 | 16.01041 | 0.14572 | 0.22810 |
| ItemBigram | filmtrust | 12492 | 61 | 0.88807 | 0.33520 | 0.46870 | 0.42854 | 17.11172 | 0.29191 | 0.53308 |
| ItemKNNRanking | filmtrust | 2683 | 250 | 0.87438 | 0.33375 | 0.46951 | 0.41767 | 20.23449 | 0.28581 | 0.49248 |
| LDA | filmtrust | 696 | 161 | 0.91980 | 0.41758 | 0.58130 | 0.52003 | 12.31348 | 0.33336 | 0.62274 |
| LambdaFMStatic | game | 25052 | 27078 | 0.87064 | 0.27294 | 0.43640 | 0.34794 | 16.47330 | 0.13941 | 0.35696 |
| LambdaFMWeight | game | 25232 | 28156 | 0.87339 | 0.27333 | 0.43720 | 0.34728 | 14.71413 | 0.13742 | 0.35252 |
| LambdaFMDynamic | game | 74218 | 27921 | 0.87380 | 0.27288 | 0.43648 | 0.34706 | 13.50578 | 0.13822 | 0.35132 |
| ListwiseMF | filmtrust | 714 | 161 | 0.90820 | 0.40511 | 0.56619 | 0.50521 | 15.53665 | 0.32944 | 0.60092 |
| PLSA | filmtrust | 1027 | 116 | 0.89950 | 0.41217 | 0.57187 | 0.50597 | 16.01080 | 0.32401 | 0.58557 |
| RankALS | filmtrust | 3285 | 182 | 0.85901 | 0.29255 | 0.51014 | 0.38871 | 25.27197 | 0.22931 | 0.42509 |
| RankCD | product | 1442 | 8905 | 0.56271 | 0.01253 | 0.04618 | 0.02682 | 55.42019 | 0.01548 | 0.03520 |
| RankSGD | filmtrust | 309 | 113 | 0.80388 | 0.23587 | 0.42290 | 0.32081 | 42.83305 | 0.19363 | 0.35374 |
| RankVFCD | product | 54273 | 6524 | 0.58022 | 0.01784 | 0.06181 | 0.03664 | 62.95810 | 0.01980 | 0.04852 |
| SLIM | filmtrust | 62434 | 91 | 0.91849 | 0.44851 | 0.61083 | 0.54557 | 16.67990 | 0.34019 | 0.63021 |
| UserKNNRanking | filmtrust | 1154 | 229 | 0.90752 | 0.41616 | 0.57525 | 0.51393 | 12.90921 | 0.32891 | 0.60152 |
| VBPR | product | 184473 | 15304 | 0.54336 | 0.00920 | 0.03522 | 0.01883 | 45.05101 | 0.01037 | 0.02266 |
| WBPR | filmtrust | 20705 | 183 | 0.78072 | 0.24647 | 0.33373 | 0.30442 | 17.18609 | 0.25000 | 0.35516 |
| WRMF | filmtrust | 482 | 158 | 0.90616 | 0.43278 | 0.58284 | 0.52480 | 15.17956 | 0.32918 | 0.60780 |
| RankGeoFM | FourSquare | 368436 | 1093 | 0.72708 | 0.05485 | 0.24012 | 0.11057 | 37.50040 | 0.07866 | 0.08640 |
| SBPR | filmtrust | 41481 | 247 | 0.91010 | 0.41189 | 0.56480 | 0.50726 | 15.67905 | 0.32440 | 0.59699 |
| AssociationRule | filmtrust | 2628 | 195 | 0.90853 | 0.41801 | 0.57777 | 0.51621 | 12.65794 | 0.33263 | 0.60700 |
| PRankD | filmtrust | 3321 | 170 | 0.74472 | 0.22894 | 0.32406 | 0.28390 | 45.81069 | 0.19436 | 0.32904 |
* Content models
| Name | Dataset | Training (ms) | Prediction (ms) | AUC | MAP | MRR | NDCG | Novelty | Precision | Recall |
| :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
| EFMRanking | dc_dense | 2066 | 2276 | 0.61271 | 0.01611 | 0.04631 | 0.04045 | 53.26140 | 0.02387 | 0.07357 |
| TFIDF | musical_instruments | 942 | 1085 | 0.52756 | 0.01067 | 0.01917 | 0.01773 | 72.71228 | 0.00588 | 0.03103 |
#### 7.2 Rating model comparison
* Benchmark models
| Name | Dataset | Training (ms) | Prediction (ms) | MAE | MPE | MSE |
| :----: | :----: | :----: | :----: | :----: | :----: | :----: |
| ConstantGuess | filmtrust | 137 | 45 | 1.05608 | 1.00000 | 1.42309 |
| GlobalAverage | filmtrust | 60 | 13 | 0.71977 | 0.77908 | 0.85199 |
| ItemAverage | filmtrust | 59 | 12 | 0.72968 | 0.97242 | 0.86413 |
| ItemCluster | filmtrust | 471 | 41 | 0.71976 | 0.77908 | 0.85198 |
| RandomGuess | filmtrust | 38 | 8 | 1.28622 | 0.99597 | 2.47927 |
| UserAverage | filmtrust | 35 | 9 | 0.64618 | 0.97242 | 0.70172 |
| UserCluster | filmtrust | 326 | 45 | 0.71977 | 0.77908 | 0.85199 |
* Collaborative models
| Name | Dataset | Training (ms) | Prediction (ms) | MAE | MPE | MSE |
| :----: | :----: | :----: | :----: | :----: | :----: | :----: |
| AspectRating | filmtrust | 220 | 5 | 0.65754 | 0.97918 | 0.71809 |
| ASVDPlusPlus | filmtrust | 5631 | 8 | 0.71975 | 0.77921 | 0.85196 |
| BiasedMF | filmtrust | 92 | 6 | 0.63157 | 0.98387 | 0.66220 |
| BHFreeRating | filmtrust | 6667 | 76 | 0.71974 | 0.77908 | 0.85198 |
| BPMF | filmtrust | 25942 | 52 | 0.66504 | 0.98465 | 0.70210 |
| BUCMRating | filmtrust | 1843 | 30 | 0.64834 | 0.99102 | 0.67992 |
| CCD | product | 15715 | 9 | 0.96670 | 0.93947 | 1.62145 |
| FFM | filmtrust | 5422 | 6 | 0.63446 | 0.98413 | 0.66682 |
| FMALS | filmtrust | 1854 | 5 | 0.64788 | 0.96032 | 0.73636 |
| FMSGD | filmtrust | 3496 | 10 | 0.63452 | 0.98426 | 0.66710 |
| GPLSA | filmtrust | 2567 | 7 | 0.67311 | 0.98972 | 0.79883 |
| IRRG | filmtrust | 40284 | 6 | 0.64766 | 0.98777 | 0.73700 |
| ItemKNNRating | filmtrust | 2052 | 27 | 0.62341 | 0.95394 | 0.67312 |
| LDCC | filmtrust | 8650 | 84 | 0.66383 | 0.99284 | 0.70666 |
| LLORMA | filmtrust | 16618 | 82 | 0.64930 | 0.96591 | 0.76067 |
| MFALS | filmtrust | 2944 | 5 | 0.82939 | 0.94549 | 1.30547 |
| NMF | filmtrust | 1198 | 8 | 0.67661 | 0.96604 | 0.83493 |
| PMF | filmtrust | 215 | 7 | 0.72959 | 0.98165 | 0.99948 |
| RBM | filmtrust | 19551 | 270 | 0.74484 | 0.98504 | 0.88968 |
| RFRec | filmtrust | 16330 | 54 | 0.64008 | 0.97112 | 0.69390 |
| SVDPlusPlus | filmtrust | 452 | 26 | 0.65248 | 0.99141 | 0.68289 |
| URP | filmtrust | 1514 | 25 | 0.64207 | 0.99128 | 0.67122 |
| UserKNNRating | filmtrust | 1121 | 135 | 0.63933 | 0.94640 | 0.69280 |
| RSTE | filmtrust | 4052 | 10 | 0.64303 | 0.99206 | 0.67777 |
| SocialMF | filmtrust | 918 | 13 | 0.64668 | 0.98881 | 0.68228 |
| SoRec | filmtrust | 1048 | 10 | 0.64305 | 0.99232 | 0.67776 |
| SoReg | filmtrust | 635 | 8 | 0.65943 | 0.96734 | 0.72760 |
| TimeSVD | filmtrust | 11545 | 36 | 0.68954 | 0.93326 | 0.87783 |
| TrustMF | filmtrust | 2038 | 7 | 0.63787 | 0.98985 | 0.69017 |
| TrustSVD | filmtrust | 12465 | 22 | 0.61984 | 0.98933 | 0.63875 |
| PersonalityDiagnosis | filmtrust | 45 | 642 | 0.72964 | 0.76620 | 1.03071 |
| SlopeOne | filmtrust | 135 | 28 | 0.63788 | 0.96175 | 0.71057 |
* Content models
| Name | Dataset | Training (ms) | Prediction (ms) | MAE | MPE | MSE |
| :----: | :----: | :----: | :----: | :----: | :----: | :----: |
| EFMRating | dc_dense | 659 | 8 | 0.61546 | 0.85364 | 0.78279 |
| HFT | musical_instruments | 162753 | 13 | 0.64272 | 0.94886 | 0.81393 |
| TopicMFAT | musical_instruments | 6907 | 7 | 0.61896 | 0.98734 | 0.72545 |
| TopicMFMT | musical_instruments | 6323 | 7 | 0.61896 | 0.98734 | 0.72545 |
## 8. References
#### 8.1 Model descriptions
* Benchmark models
| Name | Problem | Description/Paper |
| :----: | :----: | :----: |
| RandomGuess | Ranking Rating | Random guess |
| MostPopular | Ranking | Most popular |
| ConstantGuess | Rating | Constant guess |
| GlobalAverage | Rating | Global average |
| ItemAverage | Rating | Item average |
| ItemCluster | Rating | Item clustering |
| UserAverage | Rating | User average |
| UserCluster | Rating | User clustering |
* Collaborative models
| Name | Problem | Description/Paper |
| :----: | :----: | :----: |
| AspectModel | Ranking Rating | Latent class models for collaborative filtering |
| BHFree | Ranking Rating | Balancing Prediction and Recommendation Accuracy: Hierarchical Latent Factors for Preference Data |
| BUCM | Ranking Rating | Modeling Item Selection and Relevance for Accurate Recommendations |
| ItemKNN | Ranking Rating | Item-based collaborative filtering |
| UserKNN | Ranking Rating | User-based collaborative filtering |
| AoBPR | Ranking | Improving pairwise learning for item recommendation from implicit feedback |
| BPR | Ranking | BPR: Bayesian Personalized Ranking from Implicit Feedback |
| CLiMF | Ranking | CLiMF: learning to maximize reciprocal rank with collaborative less-is-more filtering |
| EALS | Ranking | Collaborative filtering for implicit feedback dataset |
| FISM | Ranking | FISM: Factored Item Similarity Models for Top-N Recommender Systems |
| GBPR | Ranking | GBPR: Group Preference Based Bayesian Personalized Ranking for One-Class Collaborative Filtering |
| HMMForCF | Ranking | A Hidden Markov Model Purpose: A class for the model, including parameters |
| ItemBigram | Ranking | Topic Modeling: Beyond Bag-of-Words |
| LambdaFM | Ranking | LambdaFM: Learning Optimal Ranking with Factorization Machines Using Lambda Surrogates |
| LDA | Ranking | Latent Dirichlet Allocation for implicit feedback |
| ListwiseMF | Ranking | List-wise learning to rank with matrix factorization for collaborative filtering |
| PLSA | Ranking | Latent semantic models for collaborative filtering |
| RankALS | Ranking | Alternating Least Squares for Personalized Ranking |
| RankSGD | Ranking | Collaborative Filtering Ensemble for Ranking |
| SLIM | Ranking | SLIM: Sparse Linear Methods for Top-N Recommender Systems |
| WBPR | Ranking | Bayesian Personalized Ranking for Non-Uniformly Sampled Items |
| WRMF | Ranking | Collaborative filtering for implicit feedback datasets |
| Rank-GeoFM | Ranking | Rank-GeoFM: A ranking based geographical factorization method for point of interest recommendation |
| SBPR | Ranking | Leveraging Social Connections to Improve Personalized Ranking for Collaborative Filtering |
| AssociationRule | Ranking | A Recommendation Algorithm Using Multi-Level Association Rules |
| PRankD | Ranking | Personalised ranking with diversity |
| AsymmetricSVD++ | Rating | Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model |
| AutoRec | Rating | AutoRec: Autoencoders Meet Collaborative Filtering |
| BPMF | Rating | Bayesian Probabilistic Matrix Factorization using Markov Chain Monte Carlo |
| CCD | Rating | Large-Scale Parallel Collaborative Filtering for the Netflix Prize |
| FFM | Rating | Field Aware Factorization Machines for CTR Prediction |
| GPLSA | Rating | Collaborative Filtering via Gaussian Probabilistic Latent Semantic Analysis |
| IRRG | Rating | Exploiting Implicit Item Relationships for Recommender Systems |
| MFALS | Rating | Large-Scale Parallel Collaborative Filtering for the Netflix Prize |
| NMF | Rating | Algorithms for Non-negative Matrix Factorization |
| PMF | Rating | PMF: Probabilistic Matrix Factorization |
| RBM | Rating | Restricted Boltzman Machines for Collaborative Filtering |
| RF-Rec | Rating | RF-Rec: Fast and Accurate Computation of Recommendations based on Rating Frequencies |
| SVD++ | Rating | Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model |
| URP | Rating | User Rating Profile: a LDA model for rating prediction |
| RSTE | Rating | Learning to Recommend with Social Trust Ensemble |
| SocialMF | Rating | A matrix factorization technique with trust propagation for recommendation in social networks |
| SoRec | Rating | SoRec: Social recommendation using probabilistic matrix factorization |
| SoReg | Rating | Recommender systems with social regularization |
| TimeSVD++ | Rating | Collaborative Filtering with Temporal Dynamics |
| TrustMF | Rating | Social Collaborative Filtering by Trust |
| TrustSVD | Rating | TrustSVD: Collaborative Filtering with Both the Explicit and Implicit Influence of User Trust and of Item Ratings |
| PersonalityDiagnosis | Rating | A brief introduction to Personality Diagnosis |
| SlopeOne | Rating | Slope One Predictors for Online Rating-Based Collaborative Filtering |
* Content models
| Name | Problem | Description/Paper |
| :----: | :----: | :----: |
| EFM | Ranking Rating | Explicit factor models for explainable recommendation based on phrase-level sentiment analysis |
| TF-IDF | Ranking | Term frequency-inverse document frequency |
| HFT | Rating | Hidden factors and hidden topics: understanding rating dimensions with review text |
| TopicMF | Rating | TopicMF: Simultaneously Exploiting Ratings and Reviews for Recommendation |
#### 8.2 Dataset descriptions
* [Amazon Dataset](http://jmcauley.ucsd.edu/data/amazon/)
* [Bibsonomy Dataset](https://www.kde.cs.uni-kassel.de/wp-content/uploads/bibsonomy/)
* [BookCrossing Dataset](https://grouplens.org/datasets/book-crossing/)
* [Ciao Dataset](https://www.cse.msu.edu/~tangjili/datasetcode/truststudy.htm)
* [Douban Dataset](http://smiles.xjtu.edu.cn/Download/Download_Douban.html)
* [Eachmovie Dataset](https://grouplens.org/datasets/eachmovie/)
* [Epinions Dataset](http://www.trustlet.org/epinions.html)
* [Foursquare Dataset](https://sites.google.com/site/yangdingqi/home/foursquare-dataset)
* [Goodbooks Dataset](http://fastml.com/goodbooks-10k-a-new-dataset-for-book-recommendations/)
* [Gowalla Dataset](http://snap.stanford.edu/data/loc-gowalla.html)
* [HetRec2011 Dataset](https://grouplens.org/datasets/hetrec-2011/)
* [Jester Dataset](https://grouplens.org/datasets/jester/)
* [Large Movie Review Dataset](http://ai.stanford.edu/~amaas/data/sentiment/)
* [MovieLens Dataset](https://grouplens.org/datasets/movielens/)
* [Newsgroups Dataset](http://qwone.com/~jason/20Newsgroups/)
* [Stanford Large Network Dataset](http://snap.stanford.edu/data/)
* [Serendipity 2018 Dataset](https://grouplens.org/datasets/serendipity-2018/)
* [Wikilens Dataset](https://grouplens.org/datasets/wikilens/)
* [Yelp Dataset](https://www.yelp.com/dataset)
* [Yongfeng Zhang Dataset](http://yongfeng.me/dataset/)

data.7z (new binary file, not shown)

pom.xml

@@ -0,0 +1,156 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.jstarcraft</groupId>
<artifactId>jstarcraft-rns</artifactId>
<version>1.0</version>
<packaging>jar</packaging>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.8.0</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
<encoding>UTF-8</encoding>
</configuration>
</plugin>
</plugins>
</build>
<dependencies>
<!-- Compatibility with Java 11 -->
<dependency>
<groupId>jdk.tools</groupId>
<artifactId>jdk.tools</artifactId>
<version>1.8</version>
<scope>system</scope>
<systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
</dependency>
<!-- JStarCraft framework dependencies -->
<dependency>
<groupId>com.jstarcraft</groupId>
<artifactId>jstarcraft-core-script</artifactId>
<version>1.0</version>
</dependency>
<dependency>
<groupId>com.jstarcraft</groupId>
<artifactId>jstarcraft-ai-model</artifactId>
<version>1.0</version>
</dependency>
<!-- Search and recommendation framework dependencies -->
<dependency>
<groupId>ch.obermuhlner</groupId>
<artifactId>big-math</artifactId>
<version>2.1.0</version>
</dependency>
<!-- Test framework dependencies -->
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.13.1</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-test</artifactId>
<version>5.1.6.RELEASE</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>net.sourceforge.jdistlib</groupId>
<artifactId>jdistlib</artifactId>
<version>0.4.5</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.nd4j</groupId>
<artifactId>nd4j-native-platform</artifactId>
<version>1.0.0-beta3</version>
<scope>test</scope>
</dependency>
<!-- <dependency> <groupId>org.nd4j</groupId> <artifactId>nd4j-cuda-9.2-platform</artifactId> <version>1.0.0-beta3</version> <scope>test</scope> </dependency> -->
<!-- Bridge: route SLF4J to Log4j2 -->
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-slf4j-impl</artifactId>
<version>2.11.2</version>
<scope>test</scope>
</dependency>
<!-- Bridge: route Commons Logging to Log4j2 -->
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-jcl</artifactId>
<version>2.11.2</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache-extras.beanshell</groupId>
<artifactId>bsh</artifactId>
<version>2.0b6</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.codehaus.groovy</groupId>
<artifactId>groovy-all</artifactId>
<version>2.4.16</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.jetbrains.kotlin</groupId>
<artifactId>kotlin-scripting-jsr223</artifactId>
<version>1.4.21</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.luaj</groupId>
<artifactId>luaj-jse</artifactId>
<version>3.0.1</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.mvel</groupId>
<artifactId>mvel2</artifactId>
<version>2.4.4.Final</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.python</groupId>
<artifactId>jython-standalone</artifactId>
<version>2.7.1</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.jruby</groupId>
<artifactId>jruby-complete</artifactId>
<version>9.2.11.1</version>
<scope>test</scope>
</dependency>
</dependencies>
</project>

src/main/java/com/jstarcraft/rns/data/processor/AllFeatureDataSorter.java

@@ -0,0 +1,27 @@
package com.jstarcraft.rns.data.processor;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.processor.DataSorter;
public class AllFeatureDataSorter implements DataSorter {
@Override
public int sort(DataInstance left, DataInstance right) {
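// compare the discrete (quality) features dimension by dimension first, then fall back to the continuous (quantity) features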
for (int dimension = 0, order = left.getQualityOrder(); dimension < order; dimension++) {
int leftValue = left.getQualityFeature(dimension);
int rightValue = right.getQualityFeature(dimension);
if (leftValue != rightValue) {
return leftValue < rightValue ? -1 : 1;
}
}
for (int dimension = 0, order = right.getQuantityOrder(); dimension < order; dimension++) {
float leftValue = left.getQuantityFeature(dimension);
float rightValue = right.getQuantityFeature(dimension);
if (leftValue != rightValue) {
return leftValue < rightValue ? -1 : 1;
}
}
return 0;
}
}

src/main/java/com/jstarcraft/rns/data/processor/QualityFeatureDataSorter.java

@@ -0,0 +1,24 @@
package com.jstarcraft.rns.data.processor;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.processor.DataSorter;
public class QualityFeatureDataSorter implements DataSorter {
private int dimension;
public QualityFeatureDataSorter(int dimension) {
this.dimension = dimension;
}
@Override
public int sort(DataInstance left, DataInstance right) {
int leftValue = left.getQualityFeature(dimension);
int rightValue = right.getQualityFeature(dimension);
if (leftValue != rightValue) {
return leftValue < rightValue ? -1 : 1;
}
return 0;
}
}

src/main/java/com/jstarcraft/rns/data/processor/QualityFeatureDataSplitter.java

@@ -0,0 +1,19 @@
package com.jstarcraft.rns.data.processor;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.processor.DataSplitter;
public class QualityFeatureDataSplitter implements DataSplitter {
private int dimension;
public QualityFeatureDataSplitter(int dimension) {
this.dimension = dimension;
}
@Override
public int split(DataInstance instance) {
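// route each instance to the partition indexed by the value of its configured discrete feature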
return instance.getQualityFeature(dimension);
}
}

src/main/java/com/jstarcraft/rns/data/processor/QuantityFeatureDataSorter.java

@@ -0,0 +1,24 @@
package com.jstarcraft.rns.data.processor;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.processor.DataSorter;
public class QuantityFeatureDataSorter implements DataSorter {
private int dimension;
public QuantityFeatureDataSorter(int dimension) {
this.dimension = dimension;
}
@Override
public int sort(DataInstance left, DataInstance right) {
float leftValue = left.getQuantityFeature(dimension);
float rightValue = right.getQuantityFeature(dimension);
if (leftValue != rightValue) {
return leftValue < rightValue ? -1 : 1;
}
return 0;
}
}

src/main/java/com/jstarcraft/rns/data/processor/RandomDataSorter.java

@@ -0,0 +1,35 @@
package com.jstarcraft.rns.data.processor;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.IntegerArray;
import com.jstarcraft.ai.data.module.ReferenceModule;
import com.jstarcraft.ai.data.processor.DataSorter;
import com.jstarcraft.core.utility.RandomUtility;
public class RandomDataSorter implements DataSorter {
@Override
public int sort(DataInstance left, DataInstance right) {
throw new UnsupportedOperationException();
}
@Override
public ReferenceModule sort(DataModule module) {
int size = module.getSize();
IntegerArray reference = new IntegerArray(size, size);
for (int index = 0; index < size; index++) {
reference.associateData(index);
}
int from = 0;
int to = size;
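// shuffle the reference indices by swapping each position with a randomly chosen one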
for (int index = from; index < to; index++) {
int random = RandomUtility.randomInteger(from, to);
int data = reference.getData(index);
reference.setData(index, reference.getData(random));
reference.setData(random, data);
}
return new ReferenceModule(reference, module);
}
}

src/main/java/com/jstarcraft/rns/model/AbstractModel.java

@@ -0,0 +1,100 @@
package com.jstarcraft.rns.model;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.data.processor.DataSorter;
import com.jstarcraft.ai.data.processor.DataSplitter;
import com.jstarcraft.ai.environment.EnvironmentContext;
import com.jstarcraft.ai.math.structure.matrix.HashMatrix;
import com.jstarcraft.ai.math.structure.matrix.SparseMatrix;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.KeyValue;
import com.jstarcraft.rns.data.processor.AllFeatureDataSorter;
import com.jstarcraft.rns.data.processor.QualityFeatureDataSplitter;
import it.unimi.dsi.fastutil.longs.Long2FloatRBTreeMap;
/**
 * Abstract recommender
 *
 * @author Birdy
 *
 */
public abstract class AbstractModel implements Model {
protected final Logger logger = LoggerFactory.getLogger(this.getClass());
protected String id;
// parameter section
/** user field, item field, score field */
protected String userField, itemField;
/** user dimension, item dimension */
protected int userDimension, itemDimension;
/** number of users, number of items */
protected int userSize, itemSize;
/** minimum score, maximum score, mean score */
protected float minimumScore, maximumScore, meanScore;
/** number of actions (TODO this field may move to another class; to avoid duplicate actions, the element count of a matrix or tensor is generally used) */
protected int actionSize;
/** training matrix (TODO to be renamed actionMatrix or scoreMatrix) */
protected SparseMatrix scoreMatrix;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
userField = configuration.getString("data.model.fields.user", "user");
itemField = configuration.getString("data.model.fields.item", "item");
userDimension = model.getQualityInner(userField);
itemDimension = model.getQualityInner(itemField);
userSize = space.getQualityAttribute(userField).getSize();
itemSize = space.getQualityAttribute(itemField).getSize();
DataSplitter splitter = new QualityFeatureDataSplitter(userDimension);
DataModule[] models = splitter.split(model, userSize);
DataSorter sorter = new AllFeatureDataSorter();
for (int index = 0; index < userSize; index++) {
models[index] = sorter.sort(models[index]);
}
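// gather the user-item-score actions into a hash-based table, then convert it into the sparse score matrix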
HashMatrix dataTable = new HashMatrix(true, userSize, itemSize, new Long2FloatRBTreeMap());
for (DataInstance instance : model) {
int rowIndex = instance.getQualityFeature(userDimension);
int columnIndex = instance.getQualityFeature(itemDimension);
dataTable.setValue(rowIndex, columnIndex, instance.getQuantityMark());
}
scoreMatrix = SparseMatrix.valueOf(userSize, itemSize, dataTable);
actionSize = scoreMatrix.getElementSize();
KeyValue<Float, Float> attribute = scoreMatrix.getBoundary(false);
minimumScore = attribute.getKey();
maximumScore = attribute.getValue();
meanScore = scoreMatrix.getSum(false);
meanScore /= actionSize;
}
protected abstract void doPractice();
protected void constructEnvironment() {
}
protected void destructEnvironment() {
}
@Override
public final void practice() {
EnvironmentContext context = EnvironmentContext.getContext();
context.doAlgorithmByEvery(this::constructEnvironment);
doPractice();
context.doAlgorithmByEvery(this::destructEnvironment);
}
}

src/main/java/com/jstarcraft/rns/model/EpocheModel.java

@@ -0,0 +1,104 @@
package com.jstarcraft.rns.model;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.MathUtility;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.StringUtility;
import com.jstarcraft.rns.model.exception.ModelException;
import com.jstarcraft.rns.utility.LogisticUtility;
/**
 * Model-based recommender
 *
 * <pre>
 * Related to machine learning
 * </pre>
 *
 * @author Birdy
 *
 */
public abstract class EpocheModel extends AbstractModel {
/** number of epochs */
protected int epocheSize;
/** whether to check convergence (early-stop criterion) */
protected boolean isConverged;
/** used to track the loss */
protected float totalError, currentError;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
// parameter section
epocheSize = configuration.getInteger("recommender.iterator.maximum", 100);
isConverged = configuration.getBoolean("recommender.recommender.earlystop", false);
}
/**
 * Whether training has converged
 *
 * @param iteration
 * @return
 */
protected boolean isConverged(int iteration) {
float deltaError = currentError - totalError;
// print out debug info
if (logger.isInfoEnabled()) {
String message = StringUtility.format("{} : epoch is {}, total is {}, delta is {}", getClass().getSimpleName(), iteration, totalError, deltaError);
logger.info(message);
}
if (Float.isNaN(totalError) || Float.isInfinite(totalError)) {
throw new ModelException("Loss = NaN or Infinity: the current settings do not fit the recommender! Change the settings and try again!");
}
// check if converged
boolean converged = Math.abs(deltaError) < MathUtility.EPSILON;
return converged;
}
/**
 * Calculate the gradient magnitude (cmg) based on the pairwise loss function type (fajie)
 *
 * @param lossType
 * @param error
 * @return
 */
protected final float calaculateGradientValue(int lossType, float error) {
final float constant = 1F;
float value = 0F;
switch (lossType) {
case 0:// Hinge loss
if (constant * error <= 1F)
value = constant;
break;
case 1:// Rennie loss
if (constant * error <= 0F)
value = -constant;
else if (constant * error <= 1F)
value = (1F - constant * error) * (-constant);
else
value = 0F;
value = -value;
break;
case 2:// logistic loss, BPR
value = LogisticUtility.getValue(-error);
break;
case 3:// Frank loss
value = (float) (Math.sqrt(LogisticUtility.getValue(error)) / (1F + Math.exp(error)));
break;
case 4:// Exponential loss
value = (float) Math.exp(-error);
break;
case 5:// quadratically smoothed
if (error <= 1F)
value = 0.5F * (1F - error);
break;
default:
break;
}
return value;
}
}

src/main/java/com/jstarcraft/rns/model/FactorizationMachineModel.java

@@ -0,0 +1,211 @@
package com.jstarcraft.rns.model;
import java.util.Map.Entry;
import org.apache.commons.math3.distribution.NormalDistribution;
import org.apache.commons.math3.random.JDKRandomGenerator;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.algorithm.probability.QuantityProbability;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.vector.ArrayVector;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.ai.math.structure.vector.MathVector;
import com.jstarcraft.ai.math.structure.vector.VectorScalar;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.KeyValue;
import com.jstarcraft.rns.model.exception.ModelException;
/**
* Factorization Machine Recommender
*
* Rendle, Steffen, et al., <strong>Fast Context-aware Recommendations with
* Factorization Machines</strong>, SIGIR, 2011.
*
* @author Tang Jiaxi and Ma Chen
*/
// TODO The paper calls for combined-feature support (for example: previously rated movies); the current code does not implement it.
public abstract class FactorizationMachineModel extends EpocheModel {
/** whether to adjust the learning rate automatically */
protected boolean isLearned;
/** decay rate */
protected float learnDecay;
/**
* learn rate, maximum learning rate
*/
protected float learnRatio, learnLimit;
protected DataModule marker;
/**
* global bias
*/
protected float globalBias;
/**
* appender vector size: number of users + number of items + number of
* contextual conditions
*/
protected int featureSize;
/**
* number of factors
*/
protected int factorSize;
/**
* weight vector
*/
protected DenseVector weightVector; // p
/**
* parameter matrix(featureFactors)
*/
protected DenseMatrix featureFactors; // p x k
/**
* parameter matrix(rateFactors)
*/
protected DenseMatrix actionFactors; // n x k
/**
* regularization term for weight and factors
*/
protected float biasRegularization, weightRegularization, factorRegularization;
/**
* init mean
*/
protected float initMean;
/**
* init standard deviation
*/
protected float initStd;
protected QuantityProbability distribution;
protected int[] dimensionSizes;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
isLearned = configuration.getBoolean("recommender.learnrate.bolddriver", false);
learnDecay = configuration.getFloat("recommender.learnrate.decay", 1.0f);
learnRatio = configuration.getFloat("recommender.iterator.learnrate", 0.01f);
learnLimit = configuration.getFloat("recommender.iterator.learnrate.maximum", 1000.0f);
maximumScore = configuration.getFloat("recommender.recommender.maxrate", 12F);
minimumScore = configuration.getFloat("recommender.recommender.minrate", 0F);
factorSize = configuration.getInteger("recommender.factor.number");
// init all weight with zero
globalBias = 0;
// init factors with small value
// TODO needs refactoring here
initMean = configuration.getFloat("recommender.init.mean", 0F);
initStd = configuration.getFloat("recommender.init.std", 0.1F);
biasRegularization = configuration.getFloat("recommender.fm.regw0", 0.01F);
weightRegularization = configuration.getFloat("recommender.fm.regW", 0.01F);
factorRegularization = configuration.getFloat("recommender.fm.regF", 10F);
// TODO Continuous features are not supported yet; consider discretizing them.
this.marker = model;
dimensionSizes = new int[marker.getQualityOrder()];
// TODO consider refactoring; initialize this in AbstractRecommender
actionSize = marker.getSize();
// initialize the parameters of FM
// TODO needs refactoring: mapping between outer indexes and inner indexes
for (int orderIndex = 0, orderSize = marker.getQualityOrder() + marker.getQuantityOrder(); orderIndex < orderSize; orderIndex++) {
Entry<Integer, KeyValue<String, Boolean>> term = marker.getOuterKeyValue(orderIndex);
if (term.getValue().getValue()) {
// handle discrete dimensions
dimensionSizes[marker.getQualityInner(term.getValue().getKey())] = space.getQualityAttribute(term.getValue().getKey()).getSize();
featureSize += dimensionSizes[marker.getQualityInner(term.getValue().getKey())];
} else {
// handle continuous dimensions
}
}
weightVector = DenseVector.valueOf(featureSize);
distribution = new QuantityProbability(JDKRandomGenerator.class, 0, NormalDistribution.class, initMean, initStd);
featureFactors = DenseMatrix.valueOf(featureSize, factorSize);
featureFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
}
/**
* Get the feature vector.
*
* <pre>
* Effectively One-Hot Encoding.
* For the principle and usage see: http://blog.csdn.net/pipisorry/article/details/61193868
* </pre>
*
* @param instance the data instance to encode
* @return the one-hot feature vector
*/
protected MathVector getFeatureVector(DataInstance instance) {
int orderSize = instance.getQualityOrder();
int[] keys = new int[orderSize];
int cursor = 0;
for (int orderIndex = 0; orderIndex < orderSize; orderIndex++) {
keys[orderIndex] = cursor + instance.getQualityFeature(orderIndex);
cursor += dimensionSizes[orderIndex];
}
ArrayVector vector = new ArrayVector(featureSize, keys);
vector.setValues(1F);
return vector;
}
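// Worked example with illustrative sizes (not taken from any data set): for
// dimensionSizes = {3, 5}, say 3 users and 5 items, an instance with quality
// features user 2 and item 1 yields keys = {2, 3 + 1} = {2, 4}, the one-hot
// positions inside a feature vector of length featureSize = 8.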
/**
* Predict the rating given a sparse appender vector.
*
* @param scalar the scalar used to accumulate the dot product
* @param featureVector the given vector to predict
*
* @return predicted rating
* @throws ModelException if an error occurs
*/
protected float predict(DefaultScalar scalar, MathVector featureVector) {
float value = 0;
// global bias
value += globalBias;
// 1-way interaction
value += scalar.dotProduct(weightVector, featureVector).getValue();
// 2-way interaction
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float scoreSum = 0F;
float predictSum = 0F;
for (VectorScalar vectorTerm : featureVector) {
float featureValue = vectorTerm.getValue();
int featureIndex = vectorTerm.getIndex();
float predictValue = featureFactors.getValue(featureIndex, factorIndex);
scoreSum += predictValue * featureValue;
predictSum += predictValue * predictValue * featureValue * featureValue;
}
value += (scoreSum * scoreSum - predictSum) / 2F;
}
return value;
}
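// The 2-way loop above applies the standard factorization machine identity
// sum_{i<j} <v_i, v_j> x_i x_j = 1/2 * sum_f ((sum_i v_{i,f} x_i)^2 - sum_i v_{i,f}^2 x_i^2),
// which lowers the pairwise interaction cost from O(k * n^2) to O(k * n) over
// the non-zero entries of the one-hot feature vector (Rendle, 2010).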
@Override
public void predict(DataInstance instance) {
DefaultScalar scalar = DefaultScalar.getInstance();
// TODO Continuous features are not supported yet; consider discretizing them.
MathVector featureVector = getFeatureVector(instance);
instance.setQuantityMark(predict(scalar, featureVector));
}
}

View File

@ -0,0 +1,167 @@
package com.jstarcraft.rns.model;
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.math3.distribution.NormalDistribution;
import org.apache.commons.math3.random.JDKRandomGenerator;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.algorithm.probability.QuantityProbability;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.matrix.SparseMatrix;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.core.common.option.Option;
import it.unimi.dsi.fastutil.ints.IntOpenHashSet;
import it.unimi.dsi.fastutil.ints.IntSet;
/**
* Matrix Factorization Recommender
*
* @author Birdy
*
*/
public abstract class MatrixFactorizationModel extends EpocheModel {
/** whether to automatically adjust the learning rate */
protected boolean isLearned;
/** decay rate of the learning rate */
protected float learnDecay;
/**
* learn rate, maximum learning rate
*/
protected float learnRatio, learnLimit;
/**
* user latent factors
*/
protected DenseMatrix userFactors;
/**
* item latent factors
*/
protected DenseMatrix itemFactors;
/**
* the number of latent factors
*/
protected int factorSize;
/**
* user regularization
*/
protected float userRegularization;
/**
* item regularization
*/
protected float itemRegularization;
/**
* init mean
*/
protected float initMean;
/**
* init standard deviation
*/
protected float initStd;
protected QuantityProbability distribution;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
userRegularization = configuration.getFloat("recommender.user.regularization", 0.01f);
itemRegularization = configuration.getFloat("recommender.item.regularization", 0.01f);
factorSize = configuration.getInteger("recommender.factor.number", 10);
isLearned = configuration.getBoolean("recommender.learnrate.bolddriver", false);
learnDecay = configuration.getFloat("recommender.learnrate.decay", 1.0f);
learnRatio = configuration.getFloat("recommender.iterator.learnrate", 0.01f);
learnLimit = configuration.getFloat("recommender.iterator.learnrate.maximum", 1000.0f);
// TODO needs refactoring here
initMean = configuration.getFloat("recommender.init.mean", 0F);
initStd = configuration.getFloat("recommender.init.std", 0.1F);
distribution = new QuantityProbability(JDKRandomGenerator.class, 0, NormalDistribution.class, initMean, initStd);
userFactors = DenseMatrix.valueOf(userSize, factorSize);
userFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
itemFactors = DenseMatrix.valueOf(itemSize, factorSize);
itemFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
}
protected float predict(int userIndex, int itemIndex) {
DenseVector userVector = userFactors.getRowVector(userIndex);
DenseVector itemVector = itemFactors.getRowVector(itemIndex);
DefaultScalar scalar = DefaultScalar.getInstance();
return scalar.dotProduct(userVector, itemVector).getValue();
}
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
instance.setQuantityMark(predict(userIndex, itemIndex));
}
/**
* Update current learning rate after each epoch <br>
* <ol>
* <li>bold driver: Gemulla et al., Large-scale matrix factorization with
* distributed stochastic gradient descent, KDD 2011.</li>
* <li>constant decay: Niu et al., Hogwild!: A lock-free approach to
* parallelizing stochastic gradient descent, NIPS 2011.</li>
* <li>Leon Bottou, Stochastic Gradient Descent Tricks</li>
* <li>more ways to adapt the learning rate are described at:
* http://www.willamette.edu/~gorr/classes/cs449/momrate.html</li>
* </ol>
*
* @param iteration the current iteration
*/
protected void isLearned(int iteration) {
if (learnRatio < 0F) {
return;
}
if (isLearned && iteration > 1) {
learnRatio = Math.abs(currentError) > Math.abs(totalError) ? learnRatio * 1.05F : learnRatio * 0.5F;
} else if (learnDecay > 0 && learnDecay < 1) {
learnRatio *= learnDecay;
}
// limit to max-learn-rate after update
if (learnLimit > 0 && learnRatio > learnLimit) {
learnRatio = learnLimit;
}
}
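// Worked example with illustrative numbers: under bold driver (isLearned =
// true) and learnRatio = 0.01, an epoch whose new |totalError| falls below the
// previous |currentError| grows the rate by 5% to 0.0105, while a worse epoch
// halves it to 0.005; learnLimit then caps any accumulated growth.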
@Deprecated
// TODO this method is scheduled for removal; use the ordering of the vectors instead
protected List<IntSet> getUserItemSet(SparseMatrix sparseMatrix) {
List<IntSet> userItemSet = new ArrayList<>(userSize);
for (int userIndex = 0; userIndex < userSize; userIndex++) {
SparseVector userVector = sparseMatrix.getRowVector(userIndex);
IntSet indexes = new IntOpenHashSet();
for (int position = 0, size = userVector.getElementSize(); position < size; position++) {
indexes.add(userVector.getIndex(position));
}
userItemSet.add(indexes);
}
return userItemSet;
}
}

View File

@ -0,0 +1,52 @@
package com.jstarcraft.rns.model;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.core.common.option.Option;
/**
* Recommender
*
* <pre>
* Note the responsibility of each phase:
* the preparation phase focuses on the data, converting it for the algorithm;
* the training phase focuses on the parameters, deriving the model from them;
* the prediction phase focuses on the model, predicting scores with it;
* </pre>
*
* @author Birdy
*
*/
public interface Model {
/**
* Prepare.
*
* @param configurator the configuration options
* @param module the data module
* @param space the data space
*/
void prepare(Option configurator, DataModule module, DataSpace space);
// void prepare(Configuration configuration, SparseTensor trainTensor,
// SparseTensor testTensor, DataSpace storage);
/**
* Train the model.
*/
void practice();
/**
* Predict the score for an instance, writing the result back into it.
*
* @param instance the instance to score
*/
void predict(DataInstance instance);
// double predict(int userIndex, int itemIndex, int... featureIndexes);
}
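// A hedged usage sketch of the three-phase contract above; SomeModel, option,
// module and space are placeholder names for a concrete implementation and its
// prepared data, not identifiers from this repository:
//
// Model model = new SomeModel();
// model.prepare(option, module, space); // preparation: convert the data
// model.practice(); // training: fit the parameters
// for (DataInstance instance : module) {
//     model.predict(instance); // prediction: score written into the instance
// }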

View File

@ -0,0 +1,100 @@
package com.jstarcraft.rns.model;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.api.ndarray.INDArray;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.core.common.option.Option;
/**
* Neural Network Recommender
*
* @author Birdy
*
*/
public abstract class NeuralNetworkModel extends EpocheModel {
/**
* the dimension of input units
*/
protected int inputDimension;
/**
* the dimension of hidden units
*/
protected int hiddenDimension;
/**
* the activation function of the hidden layer in the neural network
*/
protected String hiddenActivation;
/**
* the activation function of the output layer in the neural network
*/
protected String outputActivation;
/**
* the learning rate of the optimization algorithm
*/
protected float learnRatio;
/**
* the momentum of the optimization algorithm
*/
protected float momentum;
/**
* the regularization coefficient of the weights in the neural network
*/
protected float weightRegularization;
/**
* the data structure that stores the training data
*/
protected INDArray inputData;
/**
* the data structure that stores the predicted data
*/
protected INDArray outputData;
protected MultiLayerNetwork network;
protected abstract int getInputDimension();
protected abstract MultiLayerConfiguration getNetworkConfiguration();
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
inputDimension = getInputDimension();
hiddenDimension = configuration.getInteger("recommender.hidden.dimension");
hiddenActivation = configuration.getString("recommender.hidden.activation");
outputActivation = configuration.getString("recommender.output.activation");
learnRatio = configuration.getFloat("recommender.iterator.learnrate");
momentum = configuration.getFloat("recommender.iterator.momentum");
weightRegularization = configuration.getFloat("recommender.weight.regularization");
}
@Override
protected void doPractice() {
MultiLayerConfiguration configuration = getNetworkConfiguration();
network = new MultiLayerNetwork(configuration);
network.init();
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
totalError = 0F;
network.fit(inputData, inputData);
totalError = (float) network.score();
if (isConverged(epocheIndex) && isConverged) {
break;
}
currentError = totalError;
}
outputData = network.output(inputData);
}
}
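// A rough sketch of what a subclass might return from getNetworkConfiguration(),
// written against the DL4J 1.0.0-beta builder API (imports assumed:
// org.deeplearning4j.nn.conf.NeuralNetConfiguration, the DenseLayer/OutputLayer
// layer builders, org.nd4j.linalg.activations.Activation,
// org.nd4j.linalg.lossfunctions.LossFunctions and
// org.nd4j.linalg.learning.config.Nesterovs). Builder and updater names vary
// between DL4J versions, so treat this as an assumption rather than the
// repository's actual configuration. It wires an autoencoder, matching the
// network.fit(inputData, inputData) call above.
protected MultiLayerConfiguration getNetworkConfiguration() {
    return new NeuralNetConfiguration.Builder()
            .updater(new Nesterovs(learnRatio, momentum))
            .l2(weightRegularization)
            .list()
            .layer(new DenseLayer.Builder().nIn(inputDimension).nOut(hiddenDimension).activation(Activation.SIGMOID).build())
            .layer(new OutputLayer.Builder(LossFunctions.LossFunction.MSE).nIn(hiddenDimension).nOut(inputDimension).activation(Activation.IDENTITY).build())
            .build();
}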

View File

@ -0,0 +1,149 @@
package com.jstarcraft.rns.model;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.matrix.MatrixScalar;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.StringUtility;
import com.jstarcraft.rns.model.exception.ModelException;
import it.unimi.dsi.fastutil.floats.Float2IntLinkedOpenHashMap;
import it.unimi.dsi.fastutil.floats.FloatRBTreeSet;
import it.unimi.dsi.fastutil.floats.FloatSet;
/**
* Probabilistic Graphical Recommender
*
* @author Birdy
*
*/
public abstract class ProbabilisticGraphicalModel extends EpocheModel {
/**
* burn-in period
*/
protected int burnIn;
/**
* size of statistics
*/
protected int numberOfStatistics = 0;
/**
* number of topics
*/
protected int factorSize;
/** score indexes (TODO consider removing or migrating; essentially discretization of continuous features) */
protected Float2IntLinkedOpenHashMap scoreIndexes;
protected int scoreSize;
/**
* sample lag (if -1 only one sample taken)
*/
protected int sampleSize;
/**
* setup init member method
*
* @throws ModelException if error occurs during setting up
*/
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
factorSize = configuration.getInteger("recommender.topic.number", 10);
burnIn = configuration.getInteger("recommender.pgm.burnin", 100);
sampleSize = configuration.getInteger("recommender.pgm.samplelag", 100);
// TODO will be refactored together with scoreIndexes; essentially discretization of continuous features.
FloatSet scores = new FloatRBTreeSet();
for (MatrixScalar term : scoreMatrix) {
scores.add(term.getValue());
}
scores.remove(0F);
scoreIndexes = new Float2IntLinkedOpenHashMap();
int index = 0;
for (float score : scores) {
scoreIndexes.put(score, index++);
}
scoreSize = scoreIndexes.size();
}
@Override
protected void doPractice() {
long now = System.currentTimeMillis();
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
// E-step: infer parameters
eStep();
if (logger.isInfoEnabled()) {
String message = StringUtility.format("eStep time is {}", System.currentTimeMillis() - now);
now = System.currentTimeMillis();
logger.info(message);
}
// M-step: update hyper-parameters
mStep();
if (logger.isInfoEnabled()) {
String message = StringUtility.format("mStep time is {}", System.currentTimeMillis() - now);
now = System.currentTimeMillis();
logger.info(message);
}
// get statistics after burn-in
if ((epocheIndex > burnIn) && (epocheIndex % sampleSize == 0)) {
readoutParameters();
if (logger.isInfoEnabled()) {
String message = StringUtility.format("readoutParams time is {}", System.currentTimeMillis() - now);
now = System.currentTimeMillis();
logger.info(message);
}
estimateParameters();
if (logger.isInfoEnabled()) {
String message = StringUtility.format("estimateParams time is {}", System.currentTimeMillis() - now);
now = System.currentTimeMillis();
logger.info(message);
}
}
if (isConverged(epocheIndex) && isConverged) {
break;
}
currentError = totalError;
}
// retrieve posterior probability distributions
estimateParameters();
if (logger.isInfoEnabled()) {
String message = StringUtility.format("estimateParams time is {}", System.currentTimeMillis() - now);
now = System.currentTimeMillis();
logger.info(message);
}
}
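// Sampling cadence of the loop above, as a worked example: with the defaults
// burnIn = 100 and sampleSize = 100, the (epocheIndex > burnIn && epocheIndex % sampleSize == 0)
// guard first fires at epoche 200 and then every 100 epoches after that, so
// numberOfStatistics counts exactly those readouts.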
protected boolean isConverged(int iter) {
return false;
}
/**
* parameter estimation: used in the training phase
*/
protected abstract void eStep();
/**
* update the hyper-parameters
*/
protected abstract void mStep();
/**
* read out parameters for each iteration
*/
protected void readoutParameters() {
}
/**
* estimate the model parameters
*/
protected void estimateParameters() {
}
}

View File

@ -0,0 +1,87 @@
package com.jstarcraft.rns.model;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.matrix.HashMatrix;
import com.jstarcraft.ai.math.structure.matrix.SparseMatrix;
import com.jstarcraft.core.common.option.Option;
import it.unimi.dsi.fastutil.longs.Long2FloatRBTreeMap;
/**
* Social Recommender
*
* <pre>
* Note: ties are the basic factors that form interpersonal relationships, including kinship, locality, occupation and shared interests.
* In practice, take care to distinguish community relationships (shared interests) from social relationships (kinship, locality, occupation).
* </pre>
*
* @author Birdy
*
*/
public abstract class SocialModel extends MatrixFactorizationModel {
protected String trusterField, trusteeField, coefficientField;
protected int trusterDimension, trusteeDimension, coefficientDimension;
/**
* socialMatrix: social rate matrix, indicating a user is connecting to a number
* of other users
*/
protected SparseMatrix socialMatrix;
/**
* social regularization
*/
protected float socialRegularization;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
socialRegularization = configuration.getFloat("recommender.social.regularization", 0.01f);
// social path for the socialMatrix
// TODO should context.getSimilarity().getSimilarityMatrix() be used here instead?
DataModule socialModel = space.getModule("social");
// TODO needs refactoring: trusterDimension and trusteeDimension should be configurable
coefficientField = configuration.getString("data.model.fields.coefficient");
trusterDimension = socialModel.getQualityInner(userField) + 0;
trusteeDimension = socialModel.getQualityInner(userField) + 1;
coefficientDimension = socialModel.getQuantityInner(coefficientField);
HashMatrix matrix = new HashMatrix(true, userSize, userSize, new Long2FloatRBTreeMap());
for (DataInstance instance : socialModel) {
matrix.setValue(instance.getQualityFeature(trusterDimension), instance.getQualityFeature(trusteeDimension), instance.getQuantityFeature(coefficientDimension));
}
socialMatrix = SparseMatrix.valueOf(userSize, userSize, matrix);
}
/**
* Denormalize
*
* <pre>
* Convert a value from (0,1) to (minimumScore,maximumScore)
* </pre>
*
* @param value the normalized value
* @return the denormalized value
*/
protected float denormalize(float value) {
return minimumScore + value * (maximumScore - minimumScore);
}
/**
* Normalize
*
* <pre>
* Convert a value from (minimumScore,maximumScore) to (0,1)
* </pre>
*
* @param value the raw value
* @return the normalized value
*/
protected float normalize(float value) {
return (value - minimumScore) / (maximumScore - minimumScore);
}
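// Worked example: with minimumScore = 1 and maximumScore = 5, normalize(4)
// returns (4 - 1) / (5 - 1) = 0.75, and denormalize(0.75) maps it back to
// 1 + 0.75 * (5 - 1) = 4.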
}

View File

@ -0,0 +1,34 @@
package com.jstarcraft.rns.model.benchmark;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.modem.ModemDefinition;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.rns.model.AbstractModel;
/**
*
* Random Guess Recommender
*
* <pre>
* Based on the LibRec implementation
* </pre>
*
* @author Birdy
*
*/
@ModemDefinition(value = { "userDimension", "itemDimension", "numberOfItems", "minimumOfScore", "maximumOfScore" })
public class RandomGuessModel extends AbstractModel {
@Override
protected void doPractice() {
}
@Override
public synchronized void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
RandomUtility.setSeed(userIndex * itemSize + itemIndex);
instance.setQuantityMark(RandomUtility.randomFloat(minimumScore, maximumScore));
}
}

View File

@ -0,0 +1,45 @@
package com.jstarcraft.rns.model.benchmark.ranking;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.modem.ModemDefinition;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.rns.model.AbstractModel;
/**
*
* Most Popular Recommender
*
* <pre>
* Based on the LibRec implementation
* </pre>
*
* @author Birdy
*
*/
@ModemDefinition(value = { "itemDimension", "populars" })
public class MostPopularModel extends AbstractModel {
private int[] populars;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
populars = new int[itemSize];
}
@Override
protected void doPractice() {
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
populars[itemIndex] = scoreMatrix.getColumnScope(itemIndex);
}
}
@Override
public void predict(DataInstance instance) {
int itemIndex = instance.getQualityFeature(itemDimension);
instance.setQuantityMark(populars[itemIndex]);
}
}

View File

@ -0,0 +1,44 @@
package com.jstarcraft.rns.model.benchmark.rating;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.modem.ModemDefinition;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.rns.model.AbstractModel;
/**
*
* Constant Guess Recommender
*
* <pre>
* Based on the LibRec implementation
* </pre>
*
* @author Birdy
*
*/
@ModemDefinition(value = { "constant" })
public class ConstantGuessModel extends AbstractModel {
private float constant;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
// default to the average of the minimum and maximum scores
constant = (minimumScore + maximumScore) / 2F;
// TODO support a configurable score
constant = configuration.getFloat("recommend.constant-guess.score", constant);
}
@Override
protected void doPractice() {
}
@Override
public void predict(DataInstance instance) {
instance.setQuantityMark(constant);
}
}

View File

@ -0,0 +1,30 @@
package com.jstarcraft.rns.model.benchmark.rating;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.modem.ModemDefinition;
import com.jstarcraft.rns.model.AbstractModel;
/**
*
* Global Average Recommender
*
* <pre>
* Based on the LibRec implementation
* </pre>
*
* @author Birdy
*
*/
@ModemDefinition(value = { "meanOfScore" })
public class GlobalAverageModel extends AbstractModel {
@Override
protected void doPractice() {
}
@Override
public void predict(DataInstance instance) {
instance.setQuantityMark(meanScore);
}
}

View File

@ -0,0 +1,48 @@
package com.jstarcraft.rns.model.benchmark.rating;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.ai.modem.ModemDefinition;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.rns.model.AbstractModel;
/**
*
* Item Average Recommender
*
* <pre>
* Based on the LibRec implementation
* </pre>
*
* @author Birdy
*
*/
@ModemDefinition(value = { "itemDimension", "itemMeans" })
public class ItemAverageModel extends AbstractModel {
/** item mean scores */
private float[] itemMeans;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
itemMeans = new float[itemSize];
}
@Override
protected void doPractice() {
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
SparseVector itemVector = scoreMatrix.getColumnVector(itemIndex);
itemMeans[itemIndex] = itemVector.getElementSize() == 0 ? meanScore : itemVector.getSum(false) / itemVector.getElementSize();
}
}
@Override
public void predict(DataInstance instance) {
int itemIndex = instance.getQualityFeature(itemDimension);
instance.setQuantityMark(itemMeans[itemIndex]);
}
}

View File

@ -0,0 +1,183 @@
package com.jstarcraft.rns.model.benchmark.rating;
import java.util.Map.Entry;
import org.apache.commons.math3.util.FastMath;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.ai.math.structure.vector.VectorScalar;
import com.jstarcraft.ai.modem.ModemDefinition;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.rns.model.ProbabilisticGraphicalModel;
/**
*
* Item Cluster Recommender
*
* <pre>
* Based on the LibRec implementation
* </pre>
*
* @author Birdy
*
*/
@ModemDefinition(value = { "userDimension", "itemDimension", "itemTopicProbabilities", "numberOfFactors", "scoreIndexes", "topicScoreMatrix" })
public class ItemClusterModel extends ProbabilisticGraphicalModel {
/** per-score counts for each item */
private DenseMatrix itemScoreMatrix; // Nur
/** total score count for each item */
private DenseVector itemScoreVector; // Nu
/** per-score probabilities for each topic */
private DenseMatrix topicScoreMatrix; // Pkr
/** total score probabilities for each topic */
private DenseVector topicScoreVector; // Pi
/** item-topic probability mapping */
private DenseMatrix itemTopicProbabilities; // Gamma_(u,k)
@Override
protected boolean isConverged(int iter) {
// TODO needs refactoring
float loss = 0F;
for (int i = 0; i < itemSize; i++) {
for (int k = 0; k < factorSize; k++) {
float rik = itemTopicProbabilities.getValue(i, k);
float pi_k = topicScoreVector.getValue(k);
float sum_nl = 0F;
for (int scoreIndex = 0; scoreIndex < scoreSize; scoreIndex++) {
float nir = itemScoreMatrix.getValue(i, scoreIndex);
float pkr = topicScoreMatrix.getValue(k, scoreIndex);
sum_nl += nir * Math.log(pkr);
}
loss += rik * (Math.log(pi_k) + sum_nl);
}
}
float deltaLoss = (float) (loss - currentError);
if (iter > 1 && (deltaLoss > 0 || Float.isNaN(deltaLoss))) {
return true;
}
currentError = loss;
return false;
}
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
topicScoreMatrix = DenseMatrix.valueOf(factorSize, scoreSize);
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
DenseVector probabilityVector = topicScoreMatrix.getRowVector(topicIndex);
probabilityVector.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(RandomUtility.randomInteger(scoreSize) + 1);
});
probabilityVector.scaleValues(1F / probabilityVector.getSum(false));
}
topicScoreVector = DenseVector.valueOf(factorSize);
topicScoreVector.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(RandomUtility.randomInteger(factorSize) + 1);
});
topicScoreVector.scaleValues(1F / topicScoreVector.getSum(false));
// TODO
topicScoreMatrix.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue((float) Math.log(scalar.getValue()));
});
topicScoreVector.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue((float) Math.log(scalar.getValue()));
});
itemScoreMatrix = DenseMatrix.valueOf(itemSize, scoreSize);
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
SparseVector scoreVector = scoreMatrix.getColumnVector(itemIndex);
for (VectorScalar term : scoreVector) {
float score = term.getValue();
int scoreIndex = scoreIndexes.get(score);
itemScoreMatrix.shiftValue(itemIndex, scoreIndex, 1);
}
}
itemScoreVector = DenseVector.valueOf(itemSize);
itemScoreVector.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(scoreMatrix.getColumnVector(scalar.getIndex()).getElementSize());
});
currentError = Float.MIN_VALUE;
itemTopicProbabilities = DenseMatrix.valueOf(itemSize, factorSize);
}
@Override
protected void eStep() {
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
DenseVector probabilityVector = itemTopicProbabilities.getRowVector(itemIndex);
SparseVector scoreVector = scoreMatrix.getColumnVector(itemIndex);
if (scoreVector.getElementSize() == 0) {
probabilityVector.copyVector(topicScoreVector);
} else {
probabilityVector.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int index = scalar.getIndex();
float topicProbability = topicScoreVector.getValue(index);
for (VectorScalar term : scoreVector) {
int scoreIndex = scoreIndexes.get(term.getValue());
float scoreProbability = topicScoreMatrix.getValue(index, scoreIndex);
topicProbability = topicProbability + scoreProbability;
}
scalar.setValue(topicProbability);
});
probabilityVector.scaleValues(1F / probabilityVector.getSum(false));
}
}
}
@Override
protected void mStep() {
topicScoreVector.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int index = scalar.getIndex();
for (int scoreIndex = 0; scoreIndex < scoreSize; scoreIndex++) {
float numerator = 0F, denominator = 0F;
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
float probability = (float) FastMath.exp(itemTopicProbabilities.getValue(itemIndex, index));
numerator += probability * itemScoreMatrix.getValue(itemIndex, scoreIndex);
denominator += probability * itemScoreVector.getValue(itemIndex);
}
float probability = (numerator / denominator);
topicScoreMatrix.setValue(index, scoreIndex, probability);
}
float sumProbability = 0F;
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
float probability = (float) FastMath.exp(itemTopicProbabilities.getValue(itemIndex, index));
sumProbability += probability;
}
scalar.setValue(sumProbability);
});
topicScoreVector.scaleValues(1F / topicScoreVector.getSum(false));
}
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
float value = 0F;
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
float topicProbability = itemTopicProbabilities.getValue(itemIndex, topicIndex); // probability
float topicValue = 0F;
for (Entry<Float, Integer> entry : scoreIndexes.entrySet()) {
float score = entry.getKey();
float probability = topicScoreMatrix.getValue(topicIndex, entry.getValue());
topicValue += score * probability;
}
value += topicProbability * topicValue;
}
instance.setQuantityMark(value);
}
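// The prediction above is the expectation E[score | item] =
// sum_k Gamma_(i,k) * sum_r r * P(r | k): every topic contributes its mean
// score weighted by the item's topic membership probability.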
}

View File

@ -0,0 +1,48 @@
package com.jstarcraft.rns.model.benchmark.rating;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.ai.modem.ModemDefinition;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.rns.model.AbstractModel;
/**
*
* User Average Recommender
*
* <pre>
* Based on the LibRec implementation
* </pre>
*
* @author Birdy
*
*/
@ModemDefinition(value = { "userDimension", "userMeans" })
public class UserAverageModel extends AbstractModel {
/** user mean scores */
private float[] userMeans;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
userMeans = new float[userSize];
}
@Override
protected void doPractice() {
for (int userIndex = 0; userIndex < userSize; userIndex++) {
SparseVector userVector = scoreMatrix.getRowVector(userIndex);
userMeans[userIndex] = userVector.getElementSize() == 0 ? meanScore : userVector.getSum(false) / userVector.getElementSize();
}
}
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
instance.setQuantityMark(userMeans[userIndex]);
}
}

View File

@ -0,0 +1,186 @@
package com.jstarcraft.rns.model.benchmark.rating;
import java.util.Map.Entry;
import org.apache.commons.math3.util.FastMath;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.ai.math.structure.vector.VectorScalar;
import com.jstarcraft.ai.modem.ModemDefinition;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.rns.model.ProbabilisticGraphicalModel;
/**
*
* User Cluster Recommender
*
* <pre>
* Based on the LibRec implementation
* </pre>
*
* @author Birdy
*
*/
@ModemDefinition(value = { "userDimension", "itemDimension", "userTopicProbabilities", "numberOfFactors", "scoreIndexes", "topicScoreMatrix" })
public class UserClusterModel extends ProbabilisticGraphicalModel {
/** per-score counts for each user */
private DenseMatrix userScoreMatrix; // Nur
/** total score count for each user */
private DenseVector userScoreVector; // Nu
/** per-score probabilities for each topic */
private DenseMatrix topicScoreMatrix; // Pkr
/** total score probabilities for each topic */
private DenseVector topicScoreVector; // Pi
/** user-topic probability mapping */
private DenseMatrix userTopicProbabilities; // Gamma_(u,k)
@Override
protected boolean isConverged(int iter) {
// TODO needs refactoring
float loss = 0F;
for (int u = 0; u < userSize; u++) {
for (int k = 0; k < factorSize; k++) {
float ruk = userTopicProbabilities.getValue(u, k);
float pi_k = topicScoreVector.getValue(k);
float sum_nl = 0F;
for (int r = 0; r < scoreIndexes.size(); r++) {
float nur = userScoreMatrix.getValue(u, r);
float pkr = topicScoreMatrix.getValue(k, r);
sum_nl += nur * Math.log(pkr);
}
loss += ruk * (Math.log(pi_k) + sum_nl);
}
}
float deltaLoss = (float) (loss - currentError);
if (iter > 1 && (deltaLoss > 0 || Float.isNaN(deltaLoss))) {
return true;
}
currentError = loss;
return false;
}
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
topicScoreMatrix = DenseMatrix.valueOf(factorSize, scoreSize);
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
DenseVector probabilityVector = topicScoreMatrix.getRowVector(topicIndex);
probabilityVector.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(RandomUtility.randomInteger(scoreSize) + 1);
});
probabilityVector.scaleValues(1F / probabilityVector.getSum(false));
}
topicScoreVector = DenseVector.valueOf(factorSize);
topicScoreVector.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(RandomUtility.randomInteger(factorSize) + 1);
});
topicScoreVector.scaleValues(1F / topicScoreVector.getSum(false));
// TODO
topicScoreMatrix.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue((float) Math.log(scalar.getValue()));
});
topicScoreVector.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue((float) Math.log(scalar.getValue()));
});
userScoreMatrix = DenseMatrix.valueOf(userSize, scoreSize);
for (int userIndex = 0; userIndex < userSize; userIndex++) {
SparseVector scoreVector = scoreMatrix.getRowVector(userIndex);
for (VectorScalar term : scoreVector) {
float score = term.getValue();
int scoreIndex = scoreIndexes.get(score);
userScoreMatrix.shiftValue(userIndex, scoreIndex, 1);
}
}
userScoreVector = DenseVector.valueOf(userSize);
userScoreVector.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(scoreMatrix.getRowVector(scalar.getIndex()).getElementSize());
});
currentError = Float.MIN_VALUE;
userTopicProbabilities = DenseMatrix.valueOf(userSize, factorSize);
}
@Override
protected void eStep() {
for (int userIndex = 0; userIndex < userSize; userIndex++) {
DenseVector probabilityVector = userTopicProbabilities.getRowVector(userIndex);
SparseVector scoreVector = scoreMatrix.getRowVector(userIndex);
if (scoreVector.getElementSize() == 0) {
probabilityVector.copyVector(topicScoreVector);
} else {
probabilityVector.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int index = scalar.getIndex();
float topicProbability = topicScoreVector.getValue(index);
for (VectorScalar term : scoreVector) {
int scoreIndex = scoreIndexes.get(term.getValue());
float scoreProbability = topicScoreMatrix.getValue(index, scoreIndex);
topicProbability = topicProbability + scoreProbability;
}
scalar.setValue(topicProbability);
});
probabilityVector.scaleValues(1F / probabilityVector.getSum(false));
}
}
}
@Override
protected void mStep() {
topicScoreVector.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int index = scalar.getIndex();
for (int scoreIndex = 0; scoreIndex < scoreSize; scoreIndex++) {
float numerator = 0F, denominator = 0F;
for (int userIndex = 0; userIndex < userSize; userIndex++) {
float probability = (float) FastMath.exp(userTopicProbabilities.getValue(userIndex, index));
numerator += probability * userScoreMatrix.getValue(userIndex, scoreIndex);
denominator += probability * userScoreVector.getValue(userIndex);
}
float probability = (numerator / denominator);
topicScoreMatrix.setValue(index, scoreIndex, probability);
}
float sumProbability = 0F;
for (int userIndex = 0; userIndex < userSize; userIndex++) {
float probability = (float) FastMath.exp(userTopicProbabilities.getValue(userIndex, index));
sumProbability += probability;
}
scalar.setValue(sumProbability);
});
topicScoreVector.scaleValues(1F / topicScoreVector.getSum(false));
}
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
float value = 0F;
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
float topicProbability = userTopicProbabilities.getValue(userIndex, topicIndex);
float topicValue = 0F;
for (Entry<Float, Integer> entry : scoreIndexes.entrySet()) {
float score = entry.getKey();
float probability = topicScoreMatrix.getValue(topicIndex, entry.getValue());
topicValue += score * probability;
}
value += topicProbability * topicValue;
}
instance.setQuantityMark(value);
}
}

View File

@ -0,0 +1,277 @@
package com.jstarcraft.rns.model.collaborative;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.MathCell;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.matrix.MatrixScalar;
import com.jstarcraft.ai.math.structure.table.SparseTable;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.rns.model.ProbabilisticGraphicalModel;
import com.jstarcraft.rns.utility.SampleUtility;
import it.unimi.dsi.fastutil.ints.Int2ObjectRBTreeMap;
/**
*
* BH Free Recommender
*
* <pre>
* Balancing Prediction and Recommendation Accuracy: Hierarchical Latent Factors for Preference Data
* Based on the LibRec implementation
* </pre>
*
* @author Birdy
*
*/
public abstract class BHFreeModel extends ProbabilisticGraphicalModel {
private static class TopicTerm {
private int userTopic;
private int itemTopic;
private int scoreIndex;
private TopicTerm(int userTopic, int itemTopic, int scoreIndex) {
this.userTopic = userTopic;
this.itemTopic = itemTopic;
this.scoreIndex = scoreIndex;
}
void update(int userTopic, int itemTopic) {
this.userTopic = userTopic;
this.itemTopic = itemTopic;
}
public int getUserTopic() {
return userTopic;
}
public int getItemTopic() {
return itemTopic;
}
public int getScoreIndex() {
return scoreIndex;
}
}
private SparseTable<TopicTerm> topicMatrix;
private float initGamma, initSigma, initAlpha, initBeta;
/**
* number of user communities
*/
protected int userTopicSize; // K
/**
* number of item categories
*/
protected int itemTopicSize; // L
/**
* number of times user u has been assigned to user topic k
*/
private DenseMatrix user2TopicNumbers;
/**
* observations for the user
*/
private DenseVector userNumbers;
/**
* observations associated with community k
*/
private DenseVector userTopicNumbers;
/**
* number of user communities * number of item categories
*/
private DenseMatrix userTopic2ItemTopicNumbers; // Nkl
/**
* number of user communities * number of item categories * number of ratings
*/
private int[][][] userTopic2ItemTopicScoreNumbers, userTopic2ItemTopicItemNumbers; // Nklr, Nkli
// parameters
protected DenseMatrix user2TopicProbabilities, userTopic2ItemTopicProbabilities;
protected DenseMatrix user2TopicSums, userTopic2ItemTopicSums;
protected double[][][] userTopic2ItemTopicScoreProbabilities, userTopic2ItemTopicItemProbabilities;
protected double[][][] userTopic2ItemTopicScoreSums, userTopic2ItemTopicItemSums;
private DenseMatrix topicProbabilities;
private DenseVector userProbabilities;
private DenseVector itemProbabilities;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
userTopicSize = configuration.getInteger("recommender.bhfree.user.topic.number", 10);
itemTopicSize = configuration.getInteger("recommender.bhfree.item.topic.number", 10);
initAlpha = configuration.getFloat("recommender.bhfree.alpha", 1.0f / userTopicSize);
initBeta = configuration.getFloat("recommender.bhfree.beta", 1.0f / itemTopicSize);
initGamma = configuration.getFloat("recommender.bhfree.gamma", 1.0f / scoreSize);
initSigma = configuration.getFloat("recommender.sigma", 1.0f / itemSize);
scoreSize = scoreIndexes.size();
// TODO consider refactoring (merge into a UserTopic object)
user2TopicNumbers = DenseMatrix.valueOf(userSize, userTopicSize);
userNumbers = DenseVector.valueOf(userSize);
userTopic2ItemTopicNumbers = DenseMatrix.valueOf(userTopicSize, itemTopicSize);
userTopicNumbers = DenseVector.valueOf(userTopicSize);
userTopic2ItemTopicScoreNumbers = new int[userTopicSize][itemTopicSize][scoreSize];
userTopic2ItemTopicItemNumbers = new int[userTopicSize][itemTopicSize][itemSize];
topicMatrix = new SparseTable<>(true, userSize, itemSize, new Int2ObjectRBTreeMap<>());
for (MatrixScalar term : scoreMatrix) {
int userIndex = term.getRow();
int itemIndex = term.getColumn();
float score = term.getValue();
int scoreIndex = scoreIndexes.get(score);
int userTopic = RandomUtility.randomInteger(userTopicSize); // user's topic k
int itemTopic = RandomUtility.randomInteger(itemTopicSize); // item's topic l
user2TopicNumbers.shiftValue(userIndex, userTopic, 1F);
userNumbers.shiftValue(userIndex, 1F);
userTopic2ItemTopicNumbers.shiftValue(userTopic, itemTopic, 1F);
userTopicNumbers.shiftValue(userTopic, 1F);
userTopic2ItemTopicScoreNumbers[userTopic][itemTopic][scoreIndex]++;
userTopic2ItemTopicItemNumbers[userTopic][itemTopic][itemIndex]++;
TopicTerm topic = new TopicTerm(userTopic, itemTopic, scoreIndex);
topicMatrix.setValue(userIndex, itemIndex, topic);
}
// parameters
// TODO consider refactoring into a single object
user2TopicSums = DenseMatrix.valueOf(userSize, userTopicSize);
userTopic2ItemTopicSums = DenseMatrix.valueOf(userTopicSize, itemTopicSize);
userTopic2ItemTopicScoreSums = new double[userTopicSize][itemTopicSize][scoreSize];
userTopic2ItemTopicScoreProbabilities = new double[userTopicSize][itemTopicSize][scoreSize];
userTopic2ItemTopicItemSums = new double[userTopicSize][itemTopicSize][itemSize];
userTopic2ItemTopicItemProbabilities = new double[userTopicSize][itemTopicSize][itemSize];
topicProbabilities = DenseMatrix.valueOf(userTopicSize, itemTopicSize);
userProbabilities = DenseVector.valueOf(userTopicSize);
itemProbabilities = DenseVector.valueOf(itemTopicSize);
}
@Override
protected void eStep() {
for (MathCell<TopicTerm> term : topicMatrix) {
int userIndex = term.getRow();
int itemIndex = term.getColumn();
TopicTerm topicTerm = term.getValue();
int scoreIndex = topicTerm.getScoreIndex();
int userTopic = topicTerm.getUserTopic();
int itemTopic = topicTerm.getItemTopic();
user2TopicNumbers.shiftValue(userIndex, userTopic, -1F);
userNumbers.shiftValue(userIndex, -1F);
userTopic2ItemTopicNumbers.shiftValue(userTopic, itemTopic, -1F);
userTopicNumbers.shiftValue(userTopic, -1F);
userTopic2ItemTopicScoreNumbers[userTopic][itemTopic][scoreIndex]--;
userTopic2ItemTopicItemNumbers[userTopic][itemTopic][itemIndex]--;
// normalization
int userTopicIndex = userTopic;
int itemTopicIndex = itemTopic;
topicProbabilities.iterateElement(MathCalculator.SERIAL, (scalar) -> {
float value = (user2TopicNumbers.getValue(userIndex, userTopicIndex) + initAlpha) / (userNumbers.getValue(userIndex) + userTopicSize * initAlpha);
value *= (userTopic2ItemTopicNumbers.getValue(userTopicIndex, itemTopicIndex) + initBeta) / (userTopicNumbers.getValue(userTopicIndex) + itemTopicSize * initBeta);
value *= (userTopic2ItemTopicScoreNumbers[userTopicIndex][itemTopicIndex][scoreIndex] + initGamma) / (userTopic2ItemTopicNumbers.getValue(userTopicIndex, itemTopicIndex) + scoreSize * initGamma);
value *= (userTopic2ItemTopicItemNumbers[userTopicIndex][itemTopicIndex][itemIndex] + initSigma) / (userTopic2ItemTopicNumbers.getValue(userTopicIndex, itemTopicIndex) + itemSize * initSigma);
scalar.setValue(value);
});
// compute the probabilities
DefaultScalar sum = DefaultScalar.getInstance();
sum.setValue(0F);
userProbabilities.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int index = scalar.getIndex();
float value = topicProbabilities.getRowVector(index).getSum(false);
sum.shiftValue(value);
scalar.setValue(sum.getValue());
});
userTopic = SampleUtility.binarySearch(userProbabilities, 0, userProbabilities.getElementSize() - 1, RandomUtility.randomFloat(sum.getValue()));
sum.setValue(0F);
itemProbabilities.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int index = scalar.getIndex();
float value = topicProbabilities.getColumnVector(index).getSum(false);
sum.shiftValue(value);
scalar.setValue(sum.getValue());
});
itemTopic = SampleUtility.binarySearch(itemProbabilities, 0, itemProbabilities.getElementSize() - 1, RandomUtility.randomFloat(sum.getValue()));
topicTerm.update(userTopic, itemTopic);
// add statistic
user2TopicNumbers.shiftValue(userIndex, userTopic, 1F);
userNumbers.shiftValue(userIndex, 1F);
userTopic2ItemTopicNumbers.shiftValue(userTopic, itemTopic, 1F);
userTopicNumbers.shiftValue(userTopic, 1F);
userTopic2ItemTopicScoreNumbers[userTopic][itemTopic][scoreIndex]++;
userTopic2ItemTopicItemNumbers[userTopic][itemTopic][itemIndex]++;
}
}
@Override
protected void mStep() {
}
@Override
protected void readoutParameters() {
for (int userTopic = 0; userTopic < userTopicSize; userTopic++) {
for (int userIndex = 0; userIndex < userSize; userIndex++) {
user2TopicSums.shiftValue(userIndex, userTopic, (user2TopicNumbers.getValue(userIndex, userTopic) + initAlpha) / (userNumbers.getValue(userIndex) + userTopicSize * initAlpha));
}
for (int itemTopic = 0; itemTopic < itemTopicSize; itemTopic++) {
userTopic2ItemTopicSums.shiftValue(userTopic, itemTopic, (userTopic2ItemTopicNumbers.getValue(userTopic, itemTopic) + initBeta) / (userTopicNumbers.getValue(userTopic) + itemTopicSize * initBeta));
for (int scoreIndex = 0; scoreIndex < scoreSize; scoreIndex++) {
userTopic2ItemTopicScoreSums[userTopic][itemTopic][scoreIndex] += (userTopic2ItemTopicScoreNumbers[userTopic][itemTopic][scoreIndex] + initGamma) / (userTopic2ItemTopicNumbers.getValue(userTopic, itemTopic) + scoreSize * initGamma);
}
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
userTopic2ItemTopicItemSums[userTopic][itemTopic][itemIndex] += (userTopic2ItemTopicItemNumbers[userTopic][itemTopic][itemIndex] + initSigma) / (userTopic2ItemTopicNumbers.getValue(userTopic, itemTopic) + itemSize * initSigma);
}
}
}
numberOfStatistics++;
}
@Override
protected void estimateParameters() {
float scale = 1F / numberOfStatistics;
user2TopicProbabilities = DenseMatrix.copyOf(user2TopicSums);
user2TopicProbabilities.scaleValues(scale);
userTopic2ItemTopicProbabilities = DenseMatrix.copyOf(userTopic2ItemTopicSums);
userTopic2ItemTopicProbabilities.scaleValues(scale);
for (int userTopic = 0; userTopic < userTopicSize; userTopic++) {
for (int itemTopic = 0; itemTopic < itemTopicSize; itemTopic++) {
for (int scoreIndex = 0; scoreIndex < scoreSize; scoreIndex++) {
userTopic2ItemTopicScoreProbabilities[userTopic][itemTopic][scoreIndex] = userTopic2ItemTopicScoreSums[userTopic][itemTopic][scoreIndex] * scale;
}
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
userTopic2ItemTopicItemProbabilities[userTopic][itemTopic][itemIndex] = userTopic2ItemTopicItemSums[userTopic][itemTopic][itemIndex] * scale;
}
}
}
}
}

View File

@ -0,0 +1,370 @@
package com.jstarcraft.rns.model.collaborative;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.matrix.MatrixScalar;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.rns.model.ProbabilisticGraphicalModel;
import com.jstarcraft.rns.utility.GammaUtility;
import com.jstarcraft.rns.utility.SampleUtility;
import it.unimi.dsi.fastutil.ints.Int2IntRBTreeMap;
/**
*
* BUCM Recommender
*
* <pre>
* Bayesian User Community Model
* Modeling Item Selection and Relevance for Accurate Recommendations
* Based on the LibRec implementation
* </pre>
*
* @author Birdy
*
*/
public abstract class BUCMModel extends ProbabilisticGraphicalModel {
/**
* number of occurrences of entry (t, i, r)
*/
private int[][][] topicItemScoreNumbers;
/**
* number of occurrences of entry (user, topic)
*/
private DenseMatrix userTopicNumbers;
/**
* number of occurrences of users
*/
private DenseVector userNumbers;
/**
* number of occurrences of entry (topic, item)
*/
private DenseMatrix topicItemNumbers;
/**
* number of occurrences of topics
*/
private DenseVector topicNumbers;
/**
* cumulative statistics of probabilities of (t, i, r)
*/
private float[][][] topicItemScoreSums;
/**
* posterior probabilities of parameters epsilon_{k, i, r}
*/
protected float[][][] topicItemScoreProbabilities;
/**
* P(k | u)
*/
protected DenseMatrix userTopicProbabilities, userTopicSums;
/**
* P(i | k)
*/
protected DenseMatrix topicItemProbabilities, topicItemSums;
/**
*
*/
private DenseVector alpha;
/**
*
*/
private DenseVector beta;
/**
*
*/
private DenseVector gamma;
/**
*
*/
protected Int2IntRBTreeMap topicAssignments;
private DenseVector probabilities;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
// cumulative parameters
// TODO consider refactoring
userTopicSums = DenseMatrix.valueOf(userSize, factorSize);
topicItemSums = DenseMatrix.valueOf(factorSize, itemSize);
topicItemScoreSums = new float[factorSize][itemSize][scoreSize];
// initialize count variables
userTopicNumbers = DenseMatrix.valueOf(userSize, factorSize);
userNumbers = DenseVector.valueOf(userSize);
topicItemNumbers = DenseMatrix.valueOf(factorSize, itemSize);
topicNumbers = DenseVector.valueOf(factorSize);
topicItemScoreNumbers = new int[factorSize][itemSize][scoreSize];
float initAlpha = configuration.getFloat("recommender.bucm.alpha", 1F / factorSize);
alpha = DenseVector.valueOf(factorSize);
alpha.setValues(initAlpha);
float initBeta = configuration.getFloat("re.bucm.beta", 1F / itemSize);
beta = DenseVector.valueOf(itemSize);
beta.setValues(initBeta);
float initGamma = configuration.getFloat("recommender.bucm.gamma", 1F / factorSize);
gamma = DenseVector.valueOf(scoreSize);
gamma.setValues(initGamma);
// initialize topics
topicAssignments = new Int2IntRBTreeMap();
for (MatrixScalar term : scoreMatrix) {
int userIndex = term.getRow();
int itemIndex = term.getColumn();
float score = term.getValue();
int scoreIndex = scoreIndexes.get(score); // rating level 0 ~ numLevels
int topicIndex = RandomUtility.randomInteger(factorSize); // 0 ~ k-1
// Assign a topic t to pair (u, i)
topicAssignments.put(userIndex * itemSize + itemIndex, topicIndex);
// for users
userTopicNumbers.shiftValue(userIndex, topicIndex, 1F);
userNumbers.shiftValue(userIndex, 1F);
// for items
topicItemNumbers.shiftValue(topicIndex, itemIndex, 1F);
topicNumbers.shiftValue(topicIndex, 1F);
// for ratings
topicItemScoreNumbers[topicIndex][itemIndex][scoreIndex]++;
}
probabilities = DenseVector.valueOf(factorSize);
}
@Override
protected void eStep() {
float alphaSum = alpha.getSum(false);
float betaSum = beta.getSum(false);
float gammaSum = gamma.getSum(false);
// collapse Gibbs sampling
for (MatrixScalar term : scoreMatrix) {
int userIndex = term.getRow();
int itemIndex = term.getColumn();
float score = term.getValue();
int scoreIndex = scoreIndexes.get(score); // rating level 0 ~ numLevels
int topicIndex = topicAssignments.get(userIndex * itemSize + itemIndex);
// for user
userTopicNumbers.shiftValue(userIndex, topicIndex, -1F);
userNumbers.shiftValue(userIndex, -1F);
// for item
topicItemNumbers.shiftValue(topicIndex, itemIndex, -1F);
topicNumbers.shiftValue(topicIndex, -1F);
// for rating
topicItemScoreNumbers[topicIndex][itemIndex][scoreIndex]--;
// do multinomial sampling via cumulative method:
// compute the probabilities
DefaultScalar sum = DefaultScalar.getInstance();
sum.setValue(0F);
probabilities.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int index = scalar.getIndex();
float value = (userTopicNumbers.getValue(userIndex, index) + alpha.getValue(index)) / (userNumbers.getValue(userIndex) + alphaSum);
value *= (topicItemNumbers.getValue(index, itemIndex) + beta.getValue(itemIndex)) / (topicNumbers.getValue(index) + betaSum);
value *= (topicItemScoreNumbers[index][itemIndex][scoreIndex] + gamma.getValue(scoreIndex)) / (topicItemNumbers.getValue(index, itemIndex) + gammaSum);
sum.shiftValue(value);
scalar.setValue(sum.getValue());
});
topicIndex = SampleUtility.binarySearch(probabilities, 0, probabilities.getElementSize() - 1, RandomUtility.randomFloat(sum.getValue()));
// new topic t
topicAssignments.put(userIndex * itemSize + itemIndex, topicIndex);
// add newly estimated z_i to count variables
userTopicNumbers.shiftValue(userIndex, topicIndex, 1F);
userNumbers.shiftValue(userIndex, 1F);
topicItemNumbers.shiftValue(topicIndex, itemIndex, 1F);
topicNumbers.shiftValue(topicIndex, 1F);
topicItemScoreNumbers[topicIndex][itemIndex][scoreIndex]++;
}
}
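// The sampler above builds a running unnormalized CDF in probabilities, draws
// a uniform value in [0, sum) and locates the topic by binary search; e.g.
// cumulative weights {0.2, 0.5, 1.0} and a draw of 0.4 select topic 1.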
/**
* Thomas P. Minka, Estimating a Dirichlet distribution, see Eq.(55)
*/
@Override
protected void mStep() {
float denominator;
float value = 0F;
// update alpha
float alphaValue;
float alphaSum = alpha.getSum(false);
float alphaDigamma = GammaUtility.digamma(alphaSum);
denominator = 0F;
for (int userIndex = 0; userIndex < userSize; userIndex++) {
value = userNumbers.getValue(userIndex);
if (value != 0F) {
denominator += GammaUtility.digamma(value + alphaSum) - alphaDigamma;
}
}
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
alphaValue = alpha.getValue(topicIndex);
alphaDigamma = GammaUtility.digamma(alphaValue);
float numerator = 0F;
for (int userIndex = 0; userIndex < userSize; userIndex++) {
value = userTopicNumbers.getValue(userIndex, topicIndex);
if (value != 0F) {
numerator += GammaUtility.digamma(value + alphaValue) - alphaDigamma;
}
}
if (numerator != 0F) {
alpha.setValue(topicIndex, alphaValue * (numerator / denominator));
}
}
// update beta
float betaValue;
float betaSum = beta.getSum(false);
float betaDigamma = GammaUtility.digamma(betaSum);
denominator = 0F;
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
value = topicNumbers.getValue(topicIndex);
if (value != 0F) {
denominator += GammaUtility.digamma(value + betaSum) - betaDigamma;
}
}
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
betaValue = beta.getValue(itemIndex);
betaDigamma = GammaUtility.digamma(betaValue);
float numerator = 0F;
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
value = topicItemNumbers.getValue(topicIndex, itemIndex);
if (value != 0F) {
numerator += GammaUtility.digamma(value + betaValue) - betaDigamma;
}
}
if (numerator != 0F) {
beta.setValue(itemIndex, betaValue * (numerator / denominator));
}
}
// update gamma
float gammaValue;
float gammaSum = gamma.getSum(false);
float gammaDigamma = GammaUtility.digamma(gammaSum);
denominator = 0F;
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
value = topicItemNumbers.getValue(topicIndex, itemIndex);
if (value != 0F) {
denominator += GammaUtility.digamma(value + gammaSum) - gammaDigamma;
}
}
}
for (int scoreIndex = 0; scoreIndex < scoreSize; scoreIndex++) {
gammaValue = gamma.getValue(scoreIndex);
gammaDigamma = GammaUtility.digamma(gammaValue);
float numerator = 0F;
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
value = topicItemScoreNumbers[topicIndex][itemIndex][scoreIndex];
if (value != 0F) {
numerator += GammaUtility.digamma(value + gammaValue) - gammaDigamma;
}
}
}
if (numerator != 0F) {
gamma.setValue(scoreIndex, gammaValue * (numerator / denominator));
}
}
}
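// The three updates above are instances of Minka's fixed-point iteration for
// Dirichlet parameters (Eq. 55 of the Minka reference cited above):
// alpha_k <- alpha_k * (sum_u (digamma(n_uk + alpha_k) - digamma(alpha_k)))
//                    / (sum_u (digamma(n_u + sum_j alpha_j) - digamma(sum_j alpha_j)))
// applied in turn to alpha (topics), beta (items) and gamma (score levels).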
@Override
protected boolean isConverged(int iter) {
float loss = 0F;
// get params
estimateParameters();
// compute likelihood
int sum = 0;
for (MatrixScalar term : scoreMatrix) {
int userIndex = term.getRow();
int itemIndex = term.getColumn();
float score = term.getValue();
int scoreIndex = scoreIndexes.get(score);
float probability = 0F;
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
probability += userTopicProbabilities.getValue(userIndex, topicIndex) * topicItemProbabilities.getValue(topicIndex, itemIndex) * topicItemScoreProbabilities[topicIndex][itemIndex][scoreIndex];
}
loss += (float) -Math.log(probability);
sum++;
}
loss /= sum;
float delta = loss - currentError; // loss gets smaller, delta <= 0
if (numberOfStatistics > 1 && delta > 0) {
return true;
}
currentError = loss;
return false;
}
@Override
protected void readoutParameters() {
float value;
float sumAlpha = alpha.getSum(false);
float sumBeta = beta.getSum(false);
float sumGamma = gamma.getSum(false);
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
for (int userIndex = 0; userIndex < userSize; userIndex++) {
value = (userTopicNumbers.getValue(userIndex, topicIndex) + alpha.getValue(topicIndex)) / (userNumbers.getValue(userIndex) + sumAlpha);
userTopicSums.shiftValue(userIndex, topicIndex, value);
}
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
value = (topicItemNumbers.getValue(topicIndex, itemIndex) + beta.getValue(itemIndex)) / (topicNumbers.getValue(topicIndex) + sumBeta);
topicItemSums.shiftValue(topicIndex, itemIndex, value);
for (int scoreIndex = 0; scoreIndex < scoreSize; scoreIndex++) {
value = (topicItemScoreNumbers[topicIndex][itemIndex][scoreIndex] + gamma.getValue(scoreIndex)) / (topicItemNumbers.getValue(topicIndex, itemIndex) + sumGamma);
topicItemScoreSums[topicIndex][itemIndex][scoreIndex] += value;
}
}
}
numberOfStatistics++;
}
@Override
protected void estimateParameters() {
userTopicProbabilities = DenseMatrix.copyOf(userTopicSums);
userTopicProbabilities.scaleValues(1F / numberOfStatistics);
topicItemProbabilities = DenseMatrix.copyOf(topicItemSums);
topicItemProbabilities.scaleValues(1F / numberOfStatistics);
topicItemScoreProbabilities = new float[factorSize][itemSize][scoreSize];
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
for (int scoreIndex = 0; scoreIndex < scoreSize; scoreIndex++) {
topicItemScoreProbabilities[topicIndex][itemIndex][scoreIndex] = topicItemScoreSums[topicIndex][itemIndex][scoreIndex] / numberOfStatistics;
}
}
}
}
}
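A reading note on the loops above: once the zero-count guards are peeled away, each pass is one multiplicative step of Minka's fixed-point iteration for Dirichlet hyperparameters, applied to alpha, beta and gamma in turn. In LaTeX, for the alpha case (beta and gamma are analogous):

\alpha_k \leftarrow \alpha_k \cdot \frac{\sum_{u} \left[ \Psi(n_{u,k} + \alpha_k) - \Psi(\alpha_k) \right]}{\sum_{u} \left[ \Psi\big(n_u + \textstyle\sum_{k'} \alpha_{k'}\big) - \Psi\big(\textstyle\sum_{k'} \alpha_{k'}\big) \right]}

where \Psi is the digamma function, n_{u,k} is userTopicNumbers and n_u is userNumbers; the multiplicative form keeps every \alpha_k positive as long as it starts positive.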

View File

@ -0,0 +1,134 @@
package com.jstarcraft.rns.model.collaborative;
import java.util.Collection;
import java.util.Comparator;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.algorithm.correlation.MathCorrelation;
import com.jstarcraft.ai.math.structure.vector.ArrayVector;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.ai.math.structure.vector.MathVector;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.common.reflection.ReflectionUtility;
import com.jstarcraft.core.utility.Integer2FloatKeyValue;
import com.jstarcraft.core.utility.Neighborhood;
import com.jstarcraft.rns.model.AbstractModel;
import it.unimi.dsi.fastutil.ints.Int2FloatMap;
import it.unimi.dsi.fastutil.ints.Int2FloatRBTreeMap;
import it.unimi.dsi.fastutil.ints.Int2FloatSortedMap;
/**
*
* Item KNN recommender
*
* <pre>
* Refer to the LibRec team
* </pre>
*
* @author Birdy
*
*/
public abstract class ItemKNNModel extends AbstractModel {
/** neighbor count */
private int neighborSize;
protected DenseVector itemMeans;
/**
* item's nearest neighbors, for kNN with k > 0
*/
protected MathVector[] itemNeighbors;
protected SparseVector[] userVectors;
protected SparseVector[] itemVectors;
private Comparator<Integer2FloatKeyValue> comparator = new Comparator<Integer2FloatKeyValue>() {
@Override
public int compare(Integer2FloatKeyValue left, Integer2FloatKeyValue right) {
int compare = -(Float.compare(left.getValue(), right.getValue()));
if (compare == 0) {
compare = Integer.compare(left.getKey(), right.getKey());
}
return compare;
}
};
protected MathVector getNeighborVector(Collection<Integer2FloatKeyValue> neighbors) {
int size = neighbors.size();
int[] indexes = new int[size];
float[] values = new float[size];
Int2FloatSortedMap keyValues = new Int2FloatRBTreeMap();
for (Integer2FloatKeyValue term : neighbors) {
keyValues.put(term.getKey(), term.getValue());
}
int cursor = 0;
for (Int2FloatMap.Entry term : keyValues.int2FloatEntrySet()) {
indexes[cursor] = term.getIntKey();
values[cursor] = term.getFloatValue();
cursor++;
}
return new ArrayVector(size, indexes, values);
}
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
neighborSize = configuration.getInteger("recommender.neighbors.knn.number", 50);
// TODO set the capacity
itemNeighbors = new MathVector[itemSize];
Neighborhood<Integer2FloatKeyValue>[] knns = new Neighborhood[itemSize];
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
knns[itemIndex] = new Neighborhood<>(neighborSize, comparator);
}
// TODO change to a configuration enum
try {
Class<MathCorrelation> correlationClass = (Class<MathCorrelation>) Class.forName(configuration.getString("recommender.correlation.class"));
MathCorrelation correlation = ReflectionUtility.getInstance(correlationClass);
correlation.calculateCoefficients(scoreMatrix, true, (leftIndex, rightIndex, coefficient) -> {
if (leftIndex == rightIndex) {
return;
}
// ignore items whose similarity is 0
if (coefficient == 0F) {
return;
}
knns[leftIndex].updateNeighbor(new Integer2FloatKeyValue(rightIndex, coefficient));
knns[rightIndex].updateNeighbor(new Integer2FloatKeyValue(leftIndex, coefficient));
});
} catch (Exception exception) {
throw new RuntimeException(exception);
}
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
itemNeighbors[itemIndex] = getNeighborVector(knns[itemIndex].getNeighbors());
}
itemMeans = DenseVector.valueOf(itemSize);
userVectors = new SparseVector[userSize];
for (int userIndex = 0; userIndex < userSize; userIndex++) {
userVectors[userIndex] = scoreMatrix.getRowVector(userIndex);
}
itemVectors = new SparseVector[itemSize];
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
itemVectors[itemIndex] = scoreMatrix.getColumnVector(itemIndex);
}
}
@Override
protected void doPractice() {
meanScore = scoreMatrix.getSum(false) / scoreMatrix.getElementSize();
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
SparseVector itemVector = scoreMatrix.getColumnVector(itemIndex);
itemMeans.setValue(itemIndex, itemVector.getElementSize() > 0 ? itemVector.getSum(false) / itemVector.getElementSize() : meanScore);
}
}
}
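To make the neighborhood bookkeeping above concrete, here is a minimal, self-contained sketch using plain JDK types only (Neighborhood and ArrayVector are the repo's classes; everything below is a stand-in and the data is made up): keep at most k candidates ordered by descending coefficient with ties broken toward the smaller index, then emit the survivors sorted by index, which is the layout getNeighborVector produces.

import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.TreeMap;

public class TopKNeighborsSketch {

    public static void main(String[] args) {
        int k = 3;
        // Min-heap ordered by ascending coefficient (ties: larger index first),
        // so the candidate removed by poll() is always the least preferred one.
        Comparator<float[]> weakestFirst = (left, right) -> {
            int compare = Float.compare(left[1], right[1]);
            return compare != 0 ? compare : Float.compare(right[0], left[0]);
        };
        PriorityQueue<float[]> heap = new PriorityQueue<>(weakestFirst);
        float[][] candidates = { { 7, 0.9F }, { 2, 0.4F }, { 5, 0.9F }, { 1, 0.1F }, { 3, 0.6F } };
        for (float[] candidate : candidates) {
            heap.offer(candidate);
            if (heap.size() > k) {
                heap.poll(); // evict the weakest neighbor
            }
        }
        // Re-sort the survivors by index, the layout a sparse vector expects.
        TreeMap<Integer, Float> byIndex = new TreeMap<>();
        for (float[] pair : heap) {
            byIndex.put((int) pair[0], pair[1]);
        }
        System.out.println(byIndex); // {3=0.6, 5=0.9, 7=0.9}
    }
}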

View File

@ -0,0 +1,134 @@
package com.jstarcraft.rns.model.collaborative;
import java.util.Collection;
import java.util.Comparator;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.algorithm.correlation.MathCorrelation;
import com.jstarcraft.ai.math.structure.vector.ArrayVector;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.ai.math.structure.vector.MathVector;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.common.reflection.ReflectionUtility;
import com.jstarcraft.core.utility.Integer2FloatKeyValue;
import com.jstarcraft.core.utility.Neighborhood;
import com.jstarcraft.rns.model.AbstractModel;
import it.unimi.dsi.fastutil.ints.Int2FloatMap;
import it.unimi.dsi.fastutil.ints.Int2FloatRBTreeMap;
import it.unimi.dsi.fastutil.ints.Int2FloatSortedMap;
/**
*
* User KNN recommender
*
* <pre>
* Refer to the LibRec team
* </pre>
*
* @author Birdy
*
*/
public abstract class UserKNNModel extends AbstractModel {
/** neighbor count */
private int neighborSize;
protected DenseVector userMeans;
/**
* user's nearest neighbors, for kNN with k > 0
*/
protected MathVector[] userNeighbors;
protected SparseVector[] userVectors;
protected SparseVector[] itemVectors;
private Comparator<Integer2FloatKeyValue> comparator = new Comparator<Integer2FloatKeyValue>() {
@Override
public int compare(Integer2FloatKeyValue left, Integer2FloatKeyValue right) {
int compare = -(Float.compare(left.getValue(), right.getValue()));
if (compare == 0) {
compare = Integer.compare(left.getKey(), right.getKey());
}
return compare;
}
};
protected MathVector getNeighborVector(Collection<Integer2FloatKeyValue> neighbors) {
int size = neighbors.size();
int[] indexes = new int[size];
float[] values = new float[size];
Int2FloatSortedMap keyValues = new Int2FloatRBTreeMap();
for (Integer2FloatKeyValue term : neighbors) {
keyValues.put(term.getKey(), term.getValue());
}
int cursor = 0;
for (Int2FloatMap.Entry term : keyValues.int2FloatEntrySet()) {
indexes[cursor] = term.getIntKey();
values[cursor] = term.getFloatValue();
cursor++;
}
return new ArrayVector(size, indexes, values);
}
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
neighborSize = configuration.getInteger("recommender.neighbors.knn.number");
// TODO set the capacity
userNeighbors = new MathVector[userSize];
Neighborhood<Integer2FloatKeyValue>[] knns = new Neighborhood[userSize];
for (int userIndex = 0; userIndex < userSize; userIndex++) {
knns[userIndex] = new Neighborhood<>(neighborSize, comparator);
}
// TODO change to a configuration enum
try {
Class<MathCorrelation> correlationClass = (Class<MathCorrelation>) Class.forName(configuration.getString("recommender.correlation.class"));
MathCorrelation correlation = ReflectionUtility.getInstance(correlationClass);
correlation.calculateCoefficients(scoreMatrix, false, (leftIndex, rightIndex, coefficient) -> {
if (leftIndex == rightIndex) {
return;
}
// ignore users whose similarity is 0
if (coefficient == 0F) {
return;
}
knns[leftIndex].updateNeighbor(new Integer2FloatKeyValue(rightIndex, coefficient));
knns[rightIndex].updateNeighbor(new Integer2FloatKeyValue(leftIndex, coefficient));
});
} catch (Exception exception) {
throw new RuntimeException(exception);
}
for (int userIndex = 0; userIndex < userSize; userIndex++) {
userNeighbors[userIndex] = getNeighborVector(knns[userIndex].getNeighbors());
}
userMeans = DenseVector.valueOf(userSize);
userVectors = new SparseVector[userSize];
for (int userIndex = 0; userIndex < userSize; userIndex++) {
userVectors[userIndex] = scoreMatrix.getRowVector(userIndex);
}
itemVectors = new SparseVector[itemSize];
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
itemVectors[itemIndex] = scoreMatrix.getColumnVector(itemIndex);
}
}
@Override
protected void doPractice() {
meanScore = scoreMatrix.getSum(false) / scoreMatrix.getElementSize();
for (int userIndex = 0; userIndex < userSize; userIndex++) {
SparseVector userVector = scoreMatrix.getRowVector(userIndex);
userMeans.setValue(userIndex, userVector.getElementSize() > 0 ? userVector.getSum(false) / userVector.getElementSize() : meanScore);
}
}
}

View File

@ -0,0 +1,2 @@
A summary of collaborative filtering recommendation algorithms:
http://www.cnblogs.com/pinard/p/6349233.html

View File

@ -0,0 +1,185 @@
package com.jstarcraft.rns.model.collaborative.ranking;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.MatrixScalar;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.KeyValue;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.rns.model.MatrixFactorizationModel;
import com.jstarcraft.rns.utility.LogisticUtility;
import com.jstarcraft.rns.utility.SampleUtility;
import it.unimi.dsi.fastutil.ints.IntSet;
/**
*
* AoBPR recommender
*
* <pre>
* AoBPR: BPR with Adaptive Oversampling
* Improving pairwise learning for item recommendation from implicit feedback
* Refer to the LibRec team
* </pre>
*
* @author Birdy
*
*/
public class AoBPRModel extends MatrixFactorizationModel {
private int loopNumber;
/**
* item geometric distribution parameter
*/
private int lambdaItem;
// TODO consider switching to a matrix and vector
private float[] factorVariances;
private int[][] factorRanks;
private DenseVector rankProbabilities;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
// set for this alg
lambdaItem = (int) (configuration.getFloat("recommender.item.distribution.parameter") * itemSize);
// lamda_Item=500;
loopNumber = (int) (itemSize * Math.log(itemSize));
factorVariances = new float[factorSize];
factorRanks = new int[factorSize][itemSize];
}
@Override
protected void doPractice() {
// the list used for sorting
List<KeyValue<Integer, Float>> sortList = new ArrayList<>(itemSize);
DefaultScalar sum = DefaultScalar.getInstance();
sum.setValue(0F);
rankProbabilities = DenseVector.valueOf(itemSize);
rankProbabilities.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int index = scalar.getIndex();
sortList.add(new KeyValue<>(index, 0F));
float value = (float) Math.exp(-(index + 1) / (double) lambdaItem); // cast so the exponent is not truncated by integer division
sum.shiftValue(value);
scalar.setValue(sum.getValue());
});
List<IntSet> userItemSet = getUserItemSet(scoreMatrix);
// TODO needs refactoring
List<Integer> userIndexes = new ArrayList<>(actionSize), itemIndexes = new ArrayList<>(actionSize);
for (MatrixScalar term : scoreMatrix) {
int userIndex = term.getRow();
int itemIndex = term.getColumn();
userIndexes.add(userIndex);
itemIndexes.add(itemIndex);
}
// randomly draw a factor f according to p(f|c)
DenseVector factorProbabilities = DenseVector.valueOf(factorSize);
int sampleCount = 0;
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
totalError = 0F;
for (int sampleIndex = 0, sampleTimes = userSize * 100; sampleIndex < sampleTimes; sampleIndex++) {
// update Ranking every |I|log|I|
if (sampleCount % loopNumber == 0) {
updateSortListByFactor(sortList);
sampleCount = 0;
}
sampleCount++;
// randomly draw (u, i, j)
int userIndex, positiveItemIndex, negativeItemIndex;
while (true) {
int random = RandomUtility.randomInteger(actionSize);
userIndex = userIndexes.get(random);
IntSet itemSet = userItemSet.get(userIndex);
if (itemSet.size() == 0 || itemSet.size() == itemSize) {
continue;
}
positiveItemIndex = itemIndexes.get(random);
// compute the cumulative probabilities
DenseVector factorVector = userFactors.getRowVector(userIndex);
sum.setValue(0F);
factorProbabilities.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int index = scalar.getIndex();
float value = Math.abs(factorVector.getValue(index)) * factorVariances[index];
sum.shiftValue(value);
scalar.setValue(sum.getValue());
});
do {
// randomly draw a rank r according to exp(-r/lambda)
int rankIndex = SampleUtility.binarySearch(rankProbabilities, 0, rankProbabilities.getElementSize() - 1, RandomUtility.randomFloat(rankProbabilities.getValue(rankProbabilities.getElementSize() - 1)));
int factorIndex = SampleUtility.binarySearch(factorProbabilities, 0, factorProbabilities.getElementSize() - 1, RandomUtility.randomFloat(factorProbabilities.getValue(factorProbabilities.getElementSize() - 1)));
// pick the item at rank r under factor f (mirrored when the user factor is negative)
if (userFactors.getValue(userIndex, factorIndex) > 0) {
negativeItemIndex = factorRanks[factorIndex][rankIndex];
} else {
negativeItemIndex = factorRanks[factorIndex][itemSize - rankIndex - 1];
}
} while (itemSet.contains(negativeItemIndex));
break;
}
// update parameters
float positiveScore = predict(userIndex, positiveItemIndex);
float negativeScore = predict(userIndex, negativeItemIndex);
float error = positiveScore - negativeScore;
float value = (float) -Math.log(LogisticUtility.getValue(error));
totalError += value;
value = LogisticUtility.getValue(-error);
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float userFactor = userFactors.getValue(userIndex, factorIndex);
float positiveFactor = itemFactors.getValue(positiveItemIndex, factorIndex);
float negativeFactor = itemFactors.getValue(negativeItemIndex, factorIndex);
userFactors.shiftValue(userIndex, factorIndex, learnRatio * (value * (positiveFactor - negativeFactor) - userRegularization * userFactor));
itemFactors.shiftValue(positiveItemIndex, factorIndex, learnRatio * (value * userFactor - itemRegularization * positiveFactor));
itemFactors.shiftValue(negativeItemIndex, factorIndex, learnRatio * (value * (-userFactor) - itemRegularization * negativeFactor));
totalError += userRegularization * userFactor * userFactor + itemRegularization * positiveFactor * positiveFactor + itemRegularization * negativeFactor * negativeFactor;
}
}
if (isConverged(epocheIndex) && isConverged) {
break;
}
isLearned(epocheIndex);
currentError = totalError;
}
}
// TODO consider refactoring
private void updateSortListByFactor(List<KeyValue<Integer, Float>> sortList) {
// for each factor
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float sum = 0F;
DenseVector factorVector = itemFactors.getColumnVector(factorIndex);
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
float value = factorVector.getValue(itemIndex);
sortList.get(itemIndex).setValue(value);
sum += value;
}
Collections.sort(sortList, (left, right) -> {
// descending order
return right.getValue().compareTo(left.getValue());
});
float mean = sum / factorVector.getElementSize();
sum = 0F;
for (int sortIndex = 0; sortIndex < itemSize; sortIndex++) {
float value = factorVector.getValue(sortIndex);
sum += (value - mean) * (value - mean);
factorRanks[factorIndex][sortIndex] = sortList.get(sortIndex).getKey();
}
factorVariances[factorIndex] = sum / factorVector.getElementSize();
}
}
}
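The rank sampling in doPractice relies on an inverse-CDF trick: rankProbabilities stores the running sums of exp(-(r + 1) / lambda), and SampleUtility.binarySearch locates the first cumulative value at or above a uniform draw. A stand-alone rendering with plain JDK calls (the sizes, lambda and seed are made up for illustration):

import java.util.Arrays;
import java.util.Random;

public class RankSamplingSketch {

    public static void main(String[] args) {
        int itemSize = 1000;
        float lambda = 500F;
        // cumulative weights of the truncated geometric-like distribution exp(-(r + 1) / lambda)
        float[] cumulative = new float[itemSize];
        float sum = 0F;
        for (int rank = 0; rank < itemSize; rank++) {
            sum += (float) Math.exp(-(rank + 1) / lambda);
            cumulative[rank] = sum;
        }
        Random random = new Random(42);
        float draw = random.nextFloat() * sum;
        int position = Arrays.binarySearch(cumulative, draw);
        // when the draw is absent, binarySearch returns -(insertionPoint) - 1, and the
        // insertion point is exactly the first rank whose cumulative weight exceeds the draw
        int rank = position >= 0 ? position : -(position + 1);
        System.out.println("sampled rank " + rank);
    }
}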

View File

@ -0,0 +1,135 @@
package com.jstarcraft.rns.model.collaborative.ranking;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.matrix.MatrixScalar;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.rns.model.ProbabilisticGraphicalModel;
/**
*
* Aspect Model recommender
*
* <pre>
* Latent class models for collaborative filtering
* Refer to the LibRec team
* </pre>
*
* @author Birdy
*
*/
public class AspectModelRankingModel extends ProbabilisticGraphicalModel {
/**
* Conditional distribution: P(u|z)
*/
private DenseMatrix userProbabilities, userSums;
/**
* Conditional distribution: P(i|z)
*/
private DenseMatrix itemProbabilities, itemSums;
/**
* topic distribution: P(z)
*/
private DenseVector topicProbabilities, topicSums;
private DenseVector probabilities;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
// Initialize topic distribution
// TODO consider refactoring
topicProbabilities = DenseVector.valueOf(factorSize);
topicSums = DenseVector.valueOf(factorSize);
topicProbabilities.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(RandomUtility.randomInteger(factorSize) + 1);
});
topicProbabilities.scaleValues(1F / topicProbabilities.getSum(false));
userProbabilities = DenseMatrix.valueOf(factorSize, userSize);
userSums = DenseMatrix.valueOf(factorSize, userSize);
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
DenseVector probabilityVector = userProbabilities.getRowVector(topicIndex);
probabilityVector.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(RandomUtility.randomInteger(userSize) + 1);
});
probabilityVector.scaleValues(1F / probabilityVector.getSum(false));
}
itemProbabilities = DenseMatrix.valueOf(factorSize, itemSize);
itemSums = DenseMatrix.valueOf(factorSize, itemSize);
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
DenseVector probabilityVector = itemProbabilities.getRowVector(topicIndex);
probabilityVector.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(RandomUtility.randomInteger(itemSize) + 1);
});
probabilityVector.scaleValues(1F / probabilityVector.getSum(false));
}
probabilities = DenseVector.valueOf(factorSize);
}
@Override
protected void eStep() {
topicSums.setValues(0F);
userSums.setValues(0F);
itemSums.setValues(0F);
for (MatrixScalar term : scoreMatrix) {
int userIndex = term.getRow();
int itemIndex = term.getColumn();
probabilities.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int index = scalar.getIndex();
float value = userProbabilities.getValue(index, userIndex) * itemProbabilities.getValue(index, itemIndex) * topicProbabilities.getValue(index);
scalar.setValue(value);
});
probabilities.scaleValues(1F / probabilities.getSum(false));
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
float value = probabilities.getValue(topicIndex) * term.getValue();
topicSums.shiftValue(topicIndex, value);
userSums.shiftValue(topicIndex, userIndex, value);
itemSums.shiftValue(topicIndex, itemIndex, value);
}
}
}
@Override
protected void mStep() {
float scale = 1F / topicSums.getSum(false);
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
topicProbabilities.setValue(topicIndex, topicSums.getValue(topicIndex) * scale);
float userSum = userSums.getRowVector(topicIndex).getSum(false); // normalize with the freshly accumulated sums; the stale probability matrix cannot normalize P(u|z)
for (int userIndex = 0; userIndex < userSize; userIndex++) {
userProbabilities.setValue(topicIndex, userIndex, userSums.getValue(topicIndex, userIndex) / userSum);
}
float itemSum = itemSums.getRowVector(topicIndex).getSum(false); // likewise, normalize with the accumulated sums
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
itemProbabilities.setValue(topicIndex, itemIndex, itemSums.getValue(topicIndex, itemIndex) / itemSum);
}
}
}
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
float value = 0F;
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
value += userProbabilities.getValue(topicIndex, userIndex) * itemProbabilities.getValue(topicIndex, itemIndex) * topicProbabilities.getValue(topicIndex);
}
instance.setQuantityMark(value);
}
}
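For reference, the eStep/mStep pair above is the standard EM recipe for the aspect model, with each observed score r_{ui} acting as a weight (term.getValue() in the code). In LaTeX:

E-step: \quad q(z \mid u, i) = \frac{P(z) \, P(u \mid z) \, P(i \mid z)}{\sum_{z'} P(z') \, P(u \mid z') \, P(i \mid z')}

M-step: \quad P(z) \propto \sum_{(u,i)} r_{ui} \, q(z \mid u, i), \qquad P(u \mid z) \propto \sum_{i} r_{ui} \, q(z \mid u, i), \qquad P(i \mid z) \propto \sum_{u} r_{ui} \, q(z \mid u, i)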

View File

@ -0,0 +1,39 @@
package com.jstarcraft.rns.model.collaborative.ranking;
import java.util.Map.Entry;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.rns.model.collaborative.BHFreeModel;
/**
*
* BH Free recommender
*
* <pre>
* Balancing Prediction and Recommendation Accuracy: Hierarchical Latent Factors for Preference Data
* Refer to the LibRec team
* </pre>
*
* @author Birdy
*
*/
public class BHFreeRankingModel extends BHFreeModel {
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
float value = 0F;
for (Entry<Float, Integer> entry : scoreIndexes.entrySet()) {
float score = entry.getKey();
float probability = 0F;
for (int userTopic = 0; userTopic < userTopicSize; userTopic++) {
for (int itemTopic = 0; itemTopic < itemTopicSize; itemTopic++) {
probability += user2TopicProbabilities.getValue(userIndex, userTopic) * userTopic2ItemTopicProbabilities.getValue(userTopic, itemTopic) * userTopic2ItemTopicItemSums[userTopic][itemTopic][itemIndex] * userTopic2ItemTopicScoreProbabilities[userTopic][itemTopic][entry.getValue()];
}
}
value += score * probability;
}
instance.setQuantityMark(value);
}
}
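Written out, the nested loops compute the expected score under the BH-free factorization. Each factor below maps onto one field used in predict; reading userTopic2ItemTopicItemSums as the readout of P(i \mid z_u, z_i) is an assumption based on its position in the product:

\hat{r}_{ui} = \sum_{r} r \sum_{z_u} \sum_{z_i} P(z_u \mid u) \, P(z_i \mid z_u) \, P(i \mid z_u, z_i) \, P(r \mid z_u, z_i)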

View File

@ -0,0 +1,74 @@
package com.jstarcraft.rns.model.collaborative.ranking;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.ai.math.structure.vector.VectorScalar;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.rns.model.MatrixFactorizationModel;
import com.jstarcraft.rns.utility.LogisticUtility;
/**
*
* BPR recommender
*
* <pre>
* BPR: Bayesian Personalized Ranking from Implicit Feedback
* Refer to the LibRec team
* </pre>
*
* @author Birdy
*
*/
public class BPRModel extends MatrixFactorizationModel {
@Override
protected void doPractice() {
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
totalError = 0F;
for (int sampleIndex = 0, sampleTimes = userSize * 100; sampleIndex < sampleTimes; sampleIndex++) {
// randomly draw (userIdx, posItemIdx, negItemIdx)
int userIndex, positiveItemIndex, negativeItemIndex;
while (true) {
userIndex = RandomUtility.randomInteger(userSize);
SparseVector userVector = scoreMatrix.getRowVector(userIndex);
if (userVector.getElementSize() == 0) {
continue;
}
positiveItemIndex = userVector.getIndex(RandomUtility.randomInteger(userVector.getElementSize()));
negativeItemIndex = RandomUtility.randomInteger(itemSize - userVector.getElementSize());
for (VectorScalar term : userVector) {
if (negativeItemIndex >= term.getIndex()) {
negativeItemIndex++;
} else {
break;
}
}
break;
}
// update parameters
float positiveScore = predict(userIndex, positiveItemIndex);
float negativeScore = predict(userIndex, negativeItemIndex);
float error = positiveScore - negativeScore;
float value = (float) -Math.log(LogisticUtility.getValue(error));
totalError += value;
value = LogisticUtility.getValue(-error);
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float userFactor = userFactors.getValue(userIndex, factorIndex);
float positiveFactor = itemFactors.getValue(positiveItemIndex, factorIndex);
float negativeFactor = itemFactors.getValue(negativeItemIndex, factorIndex);
userFactors.shiftValue(userIndex, factorIndex, learnRatio * (value * (positiveFactor - negativeFactor) - userRegularization * userFactor));
itemFactors.shiftValue(positiveItemIndex, factorIndex, learnRatio * (value * userFactor - itemRegularization * positiveFactor));
itemFactors.shiftValue(negativeItemIndex, factorIndex, learnRatio * (value * (-userFactor) - itemRegularization * negativeFactor));
totalError += userRegularization * userFactor * userFactor + itemRegularization * positiveFactor * positiveFactor + itemRegularization * negativeFactor * negativeFactor;
}
}
if (isConverged(epocheIndex) && isConverged) {
break;
}
isLearned(epocheIndex);
currentError = totalError;
}
}
}
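The negative-item draw above uses a shifted-index trick worth spelling out: draw k uniformly from [0, itemSize - |ratedItems|), then walk the user's rated indexes in ascending order and push k past each one it reaches, which yields a uniform sample over the unrated items without materializing them. A stand-alone sketch with made-up data:

import java.util.Random;

public class NegativeSamplingSketch {

    static int sampleNegative(int itemSize, int[] ratedItems, Random random) {
        int negative = random.nextInt(itemSize - ratedItems.length);
        for (int rated : ratedItems) {
            if (negative >= rated) {
                negative++; // skip over a rated item sitting at or before the cursor
            } else {
                break;
            }
        }
        return negative;
    }

    public static void main(String[] args) {
        int[] rated = { 1, 3, 4 }; // must be sorted ascending
        Random random = new Random(42);
        for (int i = 0; i < 5; i++) {
            System.out.println(sampleNegative(7, rated, random)); // only ever prints 0, 2, 5 or 6
        }
    }
}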

View File

@ -0,0 +1,41 @@
package com.jstarcraft.rns.model.collaborative.ranking;
import java.util.Map.Entry;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.rns.model.collaborative.BUCMModel;
/**
*
* BUCM recommender
*
* <pre>
* Bayesian User Community Model
* Modeling Item Selection and Relevance for Accurate Recommendations
* Refer to the LibRec team
* </pre>
*
* @author Birdy
*
*/
public class BUCMRankingModel extends BUCMModel {
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
float value = 0F;
for (int topicIndex = 0; topicIndex < factorSize; ++topicIndex) {
float sum = 0F;
for (Entry<Float, Integer> term : scoreIndexes.entrySet()) {
double score = term.getKey();
if (score > meanScore) {
sum += topicItemScoreProbabilities[topicIndex][itemIndex][term.getValue()];
}
}
value += userTopicProbabilities.getValue(userIndex, topicIndex) * topicItemProbabilities.getValue(topicIndex, itemIndex) * sum;
}
instance.setQuantityMark(value);
}
}
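In effect the prediction ranks item i by the probability that user u assigns it an above-average score; in LaTeX, with \bar{r} the global meanScore:

\hat{s}_{ui} = \sum_{z} P(z \mid u) \, P(i \mid z) \sum_{r > \bar{r}} P(r \mid z, i)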

View File

@ -0,0 +1,139 @@
package com.jstarcraft.rns.model.collaborative.ranking;
import java.util.HashMap;
import java.util.List;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.rns.model.MatrixFactorizationModel;
import com.jstarcraft.rns.utility.LogisticUtility;
import it.unimi.dsi.fastutil.ints.IntSet;
/**
*
* CLiMF recommender
*
* <pre>
* CLiMF: learning to maximize reciprocal rank with collaborative less-is-more filtering
* Refer to the LibRec team
* </pre>
*
* @author Birdy
*
*/
public class CLiMFModel extends MatrixFactorizationModel {
@Override
protected void doPractice() {
List<IntSet> userItemSet = getUserItemSet(scoreMatrix);
float[] factorValues = new float[factorSize];
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
totalError = 0F;
for (int userIndex = 0; userIndex < userSize; userIndex++) {
// TODO consider refactoring to stop using itemSet
IntSet itemSet = userItemSet.get(userIndex);
// cache the predictions
DenseVector predictVector = DenseVector.valueOf(itemSet.size());
DenseVector logisticVector = DenseVector.valueOf(itemSet.size());
int index = 0;
for (int itemIndex : itemSet) {
float value = predict(userIndex, itemIndex);
predictVector.setValue(index, value);
logisticVector.setValue(index, LogisticUtility.getValue(-value));
index++;
}
DenseMatrix logisticMatrix = DenseMatrix.valueOf(itemSet.size(), itemSet.size());
DenseMatrix gradientMatrix = DenseMatrix.valueOf(itemSet.size(), itemSet.size());
gradientMatrix.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int row = scalar.getRow();
int column = scalar.getColumn();
float value = predictVector.getValue(row) - predictVector.getValue(column);
float logistic = LogisticUtility.getValue(value);
logisticMatrix.setValue(row, column, logistic);
float gradient = LogisticUtility.getGradient(value);
scalar.setValue(gradient);
});
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float factorValue = -userRegularization * userFactors.getValue(userIndex, factorIndex);
int leftIndex = 0;
for (int itemIndex : itemSet) {
float itemFactorValue = itemFactors.getValue(itemIndex, factorIndex);
factorValue += logisticVector.getValue(leftIndex) * itemFactorValue;
// TODO exploit symmetry to reduce the number of iterations
int rightIndex = 0;
for (int compareIndex : itemSet) {
if (compareIndex != itemIndex) {
float compareValue = itemFactors.getValue(compareIndex, factorIndex);
factorValue += gradientMatrix.getValue(rightIndex, leftIndex) / (1 - logisticMatrix.getValue(rightIndex, leftIndex)) * (itemFactorValue - compareValue);
}
rightIndex++;
}
leftIndex++;
}
factorValues[factorIndex] = factorValue;
}
int leftIndex = 0;
for (int itemIndex : itemSet) {
float logisticValue = logisticVector.getValue(leftIndex);
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float userFactorValue = userFactors.getValue(userIndex, factorIndex);
float itemFactorValue = itemFactors.getValue(itemIndex, factorIndex);
float judgeValue = 1F;
float factorValue = judgeValue * logisticValue * userFactorValue - itemRegularization * itemFactorValue;
// TODO exploit symmetry to reduce the number of iterations
int rightIndex = 0;
for (int compareIndex : itemSet) {
if (compareIndex != itemIndex) {
factorValue += gradientMatrix.getValue(rightIndex, leftIndex) * (judgeValue / (judgeValue - logisticMatrix.getValue(rightIndex, leftIndex)) - judgeValue / (judgeValue - logisticMatrix.getValue(leftIndex, rightIndex))) * userFactorValue;
}
rightIndex++;
}
itemFactors.shiftValue(itemIndex, factorIndex, learnRatio * factorValue);
}
leftIndex++;
}
for (int factorIdx = 0; factorIdx < factorSize; factorIdx++) {
userFactors.shiftValue(userIndex, factorIdx, learnRatio * factorValues[factorIdx]);
}
// TODO fetch the predictions
HashMap<Integer, Float> predictMap = new HashMap<>(itemSet.size());
for (int itemIndex : itemSet) {
float predictValue = predict(userIndex, itemIndex);
predictMap.put(itemIndex, predictValue);
}
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
if (itemSet.contains(itemIndex)) {
float predictValue = predictMap.get(itemIndex);
totalError += (float) Math.log(LogisticUtility.getValue(predictValue));
// TODO exploit symmetry to reduce the number of iterations
for (int compareIndex : itemSet) {
float compareValue = predictMap.get(compareIndex);
totalError += (float) Math.log(1 - LogisticUtility.getValue(compareValue - predictValue));
}
}
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float userFactorValue = userFactors.getValue(userIndex, factorIndex);
float itemFactorValue = itemFactors.getValue(itemIndex, factorIndex);
totalError += -0.5 * (userRegularization * userFactorValue * userFactorValue + itemRegularization * itemFactorValue * itemFactorValue);
}
}
}
if (isConverged(epocheIndex) && isConverged) {
break;
}
isLearned(epocheIndex);
currentError = totalError;
}
}
}
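For orientation, totalError above accumulates CLiMF's smoothed lower bound on mean reciprocal rank (a quantity to be maximized). With g the logistic function, f_{ui} the predicted score and I_u^{+} the items user u interacted with, the objective is:

F(U, V) = \sum_{u} \sum_{i \in I_u^{+}} \Big[ \ln g(f_{ui}) + \sum_{j \in I_u^{+}} \ln \big( 1 - g(f_{uj} - f_{ui}) \big) \Big] - \frac{\lambda_U}{2} \lVert U \rVert_F^2 - \frac{\lambda_V}{2} \lVert V \rVert_F^2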

View File

@ -0,0 +1,237 @@
package com.jstarcraft.rns.model.collaborative.ranking;
import java.util.concurrent.CountDownLatch;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.environment.EnvironmentContext;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.matrix.SparseMatrix;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.ai.math.structure.vector.VectorScalar;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.rns.model.MatrixFactorizationModel;
import com.jstarcraft.rns.model.exception.ModelException;
/**
*
* EALS recommender
*
* <pre>
* EALS: efficient Alternating Least Square for Weighted Regularized Matrix Factorization
* Collaborative filtering for implicit feedback dataset
* Refer to the LibRec team
* </pre>
*
* @author Birdy
*
*/
public class EALSModel extends MatrixFactorizationModel {
/**
* confidence weight coefficient for WRMF
*/
protected float weightCoefficient;
/**
* the significance level of popular items over unpopular ones
*/
private float ratio;
/**
* the overall weight of missing data c0
*/
private float overallWeight;
/**
* 0: eALS MF; 1: WRMF; 2: both
*/
private int type;
/**
* confidence that item i is missed by users
*/
private float[] confidences;
/**
* weights of all user-item pairs (u, i)
*/
private SparseMatrix weights;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
weightCoefficient = configuration.getFloat("recommender.wrmf.weight.coefficient", 4.0f);
ratio = configuration.getFloat("recommender.eals.ratio", 0.4f);
overallWeight = configuration.getFloat("recommender.eals.overall", 128.0f);
type = configuration.getInteger("recommender.eals.wrmf.judge", 1);
confidences = new float[itemSize];
// get ci
if (type == 0 || type == 2) {
float sumPopularity = 0F;
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
float alphaPopularity = (float) Math.pow(scoreMatrix.getColumnScope(itemIndex) * 1.0 / actionSize, ratio);
confidences[itemIndex] = overallWeight * alphaPopularity;
sumPopularity += alphaPopularity;
}
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
confidences[itemIndex] = confidences[itemIndex] / sumPopularity;
}
} else {
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
confidences[itemIndex] = 1;
}
}
weights = SparseMatrix.copyOf(scoreMatrix, false);
weights.iterateElement(MathCalculator.SERIAL, (scalar) -> {
if (type == 1 || type == 2) {
scalar.setValue(1F + weightCoefficient * scalar.getValue());
} else {
scalar.setValue(1F);
}
});
}
private ThreadLocal<float[]> itemScoreStorage = new ThreadLocal<>();
private ThreadLocal<float[]> itemWeightStorage = new ThreadLocal<>();
private ThreadLocal<float[]> userScoreStorage = new ThreadLocal<>();
private ThreadLocal<float[]> userWeightStorage = new ThreadLocal<>();
@Override
protected void constructEnvironment() {
// TODO the array allocations could be shrunk further (size the caches by the sparse matrix's largest vector).
itemScoreStorage.set(new float[itemSize]);
itemWeightStorage.set(new float[itemSize]);
userScoreStorage.set(new float[userSize]);
userWeightStorage.set(new float[userSize]);
}
@Override
protected void destructEnvironment() {
itemScoreStorage.remove();
itemWeightStorage.remove();
userScoreStorage.remove();
userWeightStorage.remove();
}
@Override
protected void doPractice() {
EnvironmentContext context = EnvironmentContext.getContext();
DenseMatrix itemDeltas = DenseMatrix.valueOf(factorSize, factorSize);
DenseMatrix userDeltas = DenseMatrix.valueOf(factorSize, factorSize);
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
// Update the Sq cache
for (int leftFactorIndex = 0; leftFactorIndex < factorSize; leftFactorIndex++) {
for (int rightFactorIndex = leftFactorIndex; rightFactorIndex < factorSize; rightFactorIndex++) {
float value = 0F;
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
value += confidences[itemIndex] * itemFactors.getValue(itemIndex, leftFactorIndex) * itemFactors.getValue(itemIndex, rightFactorIndex);
}
itemDeltas.setValue(leftFactorIndex, rightFactorIndex, value);
itemDeltas.setValue(rightFactorIndex, leftFactorIndex, value);
}
}
// Step 1: update user factors;
// split the work by user for concurrent computation.
CountDownLatch userLatch = new CountDownLatch(userSize);
for (int index = 0; index < userSize; index++) {
int userIndex = index;
context.doAlgorithmByAny(index, () -> {
DefaultScalar scalar = DefaultScalar.getInstance();
SparseVector userVector = weights.getRowVector(userIndex);
DenseVector factorVector = userFactors.getRowVector(userIndex);
float[] itemScores = itemScoreStorage.get();
float[] itemWeights = itemWeightStorage.get();
for (VectorScalar term : userVector) {
int itemIndex = term.getIndex();
DenseVector itemVector = itemFactors.getRowVector(itemIndex);
itemScores[itemIndex] = scalar.dotProduct(itemVector, factorVector).getValue();
itemWeights[itemIndex] = term.getValue();
}
for (int leftFactorIndex = 0; leftFactorIndex < factorSize; leftFactorIndex++) {
float numerator = 0, denominator = userRegularization + itemDeltas.getValue(leftFactorIndex, leftFactorIndex);
for (int rightFactorIndex = 0; rightFactorIndex < factorSize; rightFactorIndex++) {
if (leftFactorIndex != rightFactorIndex) {
numerator -= userFactors.getValue(userIndex, rightFactorIndex) * itemDeltas.getValue(leftFactorIndex, rightFactorIndex);
}
}
for (VectorScalar term : userVector) {
int itemIndex = term.getIndex();
itemScores[itemIndex] -= userFactors.getValue(userIndex, leftFactorIndex) * itemFactors.getValue(itemIndex, leftFactorIndex);
numerator += (itemWeights[itemIndex] - (itemWeights[itemIndex] - confidences[itemIndex]) * itemScores[itemIndex]) * itemFactors.getValue(itemIndex, leftFactorIndex);
denominator += (itemWeights[itemIndex] - confidences[itemIndex]) * itemFactors.getValue(itemIndex, leftFactorIndex) * itemFactors.getValue(itemIndex, leftFactorIndex);
}
// update puf
userFactors.setValue(userIndex, leftFactorIndex, numerator / denominator);
for (VectorScalar term : userVector) {
int itemIndex = term.getIndex();
itemScores[itemIndex] += userFactors.getValue(userIndex, leftFactorIndex) * itemFactors.getValue(itemIndex, leftFactorIndex);
}
}
userLatch.countDown();
});
}
try {
userLatch.await();
} catch (Exception exception) {
throw new ModelException(exception);
}
// Update the Sp cache
userDeltas.dotProduct(userFactors, true, userFactors, false, MathCalculator.SERIAL);
// Step 2: update item factors;
// split the work by item for concurrent computation.
CountDownLatch itemLatch = new CountDownLatch(itemSize);
for (int index = 0; index < itemSize; index++) {
int itemIndex = index;
context.doAlgorithmByAny(index, () -> {
DefaultScalar scalar = DefaultScalar.getInstance();
SparseVector itemVector = weights.getColumnVector(itemIndex);
DenseVector factorVector = itemFactors.getRowVector(itemIndex);
float[] userScores = userScoreStorage.get();
float[] userWeights = userWeightStorage.get();
for (VectorScalar term : itemVector) {
int userIndex = term.getIndex();
DenseVector userVector = userFactors.getRowVector(userIndex);
userScores[userIndex] = scalar.dotProduct(userVector, factorVector).getValue();
userWeights[userIndex] = term.getValue();
}
for (int leftFactorIndex = 0; leftFactorIndex < factorSize; leftFactorIndex++) {
float numerator = 0, denominator = confidences[itemIndex] * userDeltas.getValue(leftFactorIndex, leftFactorIndex) + itemRegularization;
for (int rightFactorIndex = 0; rightFactorIndex < factorSize; rightFactorIndex++) {
if (leftFactorIndex != rightFactorIndex) {
numerator -= itemFactors.getValue(itemIndex, rightFactorIndex) * userDeltas.getValue(rightFactorIndex, leftFactorIndex);
}
}
numerator *= confidences[itemIndex];
for (VectorScalar term : itemVector) {
int userIndex = term.getIndex();
userScores[userIndex] -= userFactors.getValue(userIndex, leftFactorIndex) * itemFactors.getValue(itemIndex, leftFactorIndex);
numerator += (userWeights[userIndex] - (userWeights[userIndex] - confidences[itemIndex]) * userScores[userIndex]) * userFactors.getValue(userIndex, leftFactorIndex);
denominator += (userWeights[userIndex] - confidences[itemIndex]) * userFactors.getValue(userIndex, leftFactorIndex) * userFactors.getValue(userIndex, leftFactorIndex);
}
// update qif
itemFactors.setValue(itemIndex, leftFactorIndex, numerator / denominator);
for (VectorScalar term : itemVector) {
int userIndex = term.getIndex();
userScores[userIndex] += userFactors.getValue(userIndex, leftFactorIndex) * itemFactors.getValue(itemIndex, leftFactorIndex);
}
}
itemLatch.countDown();
});
}
try {
itemLatch.await();
} catch (Exception exception) {
throw new ModelException(exception);
}
}
}
}
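The numerator/denominator accumulation inside both latches follows the element-wise coordinate update from the eALS paper. For one user factor (the item side is symmetric), with s^{q} = \sum_{i} c_i \, \mathbf{q}_i \mathbf{q}_i^{\top} the itemDeltas cache, \hat{r}^{f}_{ui} the prediction with factor f's contribution removed, and r_{ui} = 1 for the observed implicit feedback here:

p_{uf} \leftarrow \frac{\sum_{i \in R_u} \big[ w_{ui} \, r_{ui} - (w_{ui} - c_i) \, \hat{r}^{f}_{ui} \big] q_{if} - \sum_{f' \neq f} p_{uf'} \, s^{q}_{f'f}}{\sum_{i \in R_u} (w_{ui} - c_i) \, q_{if}^{2} + s^{q}_{ff} + \lambda}

Caching s^{q} (and s^{p} for the item step) is what makes each sweep's cost independent of the number of missing entries.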

View File

@ -0,0 +1,245 @@
package com.jstarcraft.rns.model.collaborative.ranking;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.ai.math.structure.vector.VectorScalar;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.rns.model.MatrixFactorizationModel;
/**
*
* FISM-AUC recommender
*
* <pre>
* FISM: Factored Item Similarity Models for Top-N Recommender Systems
* Refer to the LibRec team
* </pre>
*
* @author Birdy
*
*/
// Note: FISM composes userFactors from itemFactors
public class FISMAUCModel extends MatrixFactorizationModel {
private float rho, alpha, beta, gamma;
/**
* bias regularization
*/
private float biasRegularization;
/**
* items and users biases vector
*/
private DenseVector itemBiases;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
// Note: FISM composes userFactors from itemFactors
userFactors = DenseMatrix.valueOf(itemSize, factorSize);
userFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
itemFactors = DenseMatrix.valueOf(itemSize, factorSize);
itemFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
// TODO
itemBiases = DenseVector.valueOf(itemSize);
itemBiases.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
rho = configuration.getFloat("recommender.fismauc.rho");// 3-15
alpha = configuration.getFloat("recommender.fismauc.alpha", 0.5F);
beta = configuration.getFloat("recommender.fismauc.beta", 0.6F);
gamma = configuration.getFloat("recommender.fismauc.gamma", 0.1F);
biasRegularization = configuration.getFloat("recommender.iteration.learnrate", 0.0001F); // note: despite its name, this field acts as the learning rate below
// cacheSpec = conf.get("guava.cache.spec",
// "maximumSize=200,expireAfterAccess=2m");
}
@Override
protected void doPractice() {
DefaultScalar scalar = DefaultScalar.getInstance();
// x <- 0
DenseVector userVector = DenseVector.valueOf(factorSize);
// t <- (n - 1)^(-alpha) Σ pj (j!=i)
DenseVector itemVector = DenseVector.valueOf(factorSize);
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
totalError = 0F;
// for all u in C
for (int userIndex = 0; userIndex < userSize; userIndex++) {
SparseVector rateVector = scoreMatrix.getRowVector(userIndex);
int size = rateVector.getElementSize();
if (size == 0 || size == 1) {
size = 2;
}
// for all i in Ru+
for (VectorScalar positiveTerm : rateVector) {
int positiveIndex = positiveTerm.getIndex();
userVector.setValues(0F);
itemVector.setValues(0F);
for (VectorScalar negativeTerm : rateVector) {
int negativeIndex = negativeTerm.getIndex();
if (positiveIndex != negativeIndex) {
itemVector.addVector(userFactors.getRowVector(negativeIndex));
}
}
itemVector.scaleValues((float) Math.pow(size - 1, -alpha));
// Z <- SampleZeros(rho)
int sampleSize = (int) (rho * size);
// make a random sample of negative feedback for Ru-
List<Integer> negativeIndexes = new LinkedList<>();
for (int sampleIndex = 0; sampleIndex < sampleSize; sampleIndex++) {
int negativeItemIndex = RandomUtility.randomInteger(itemSize - negativeIndexes.size());
int index = 0;
for (int negativeIndex : negativeIndexes) {
if (negativeItemIndex >= negativeIndex) {
negativeItemIndex++;
index++;
} else {
break;
}
}
negativeIndexes.add(index, negativeItemIndex);
}
int leftCursor = 0, rightCursor = 0, leftSize = rateVector.getElementSize(), rightSize = sampleSize;
if (leftSize != 0 && rightSize != 0) {
Iterator<VectorScalar> leftIterator = rateVector.iterator();
Iterator<Integer> rightIterator = negativeIndexes.iterator();
VectorScalar leftTerm = leftIterator.next();
int negativeItemIndex = rightIterator.next();
// walk both sorted sequences and drop any sampled negative that collides with a rated item
while (leftCursor < leftSize && rightCursor < rightSize) {
if (leftTerm.getIndex() == negativeItemIndex) {
if (leftIterator.hasNext()) {
leftTerm = leftIterator.next();
}
rightIterator.remove();
if (rightIterator.hasNext()) {
negativeItemIndex = rightIterator.next();
}
leftCursor++;
rightCursor++;
} else if (leftTerm.getIndex() > negativeItemIndex) {
if (rightIterator.hasNext()) {
negativeItemIndex = rightIterator.next();
}
rightCursor++;
} else if (leftTerm.getIndex() < negativeItemIndex) {
if (leftIterator.hasNext()) {
leftTerm = leftIterator.next();
}
leftCursor++;
}
}
}
// for all j in Z
for (int negativeIndex : negativeIndexes) {
// update pui puj rui ruj
float positiveScore = positiveTerm.getValue();
float negativeScore = 0F;
float positiveBias = itemBiases.getValue(positiveIndex);
float negativeBias = itemBiases.getValue(negativeIndex);
float positiveFactor = positiveBias + scalar.dotProduct(itemFactors.getRowVector(positiveIndex), itemVector).getValue();
float negativeFactor = negativeBias + scalar.dotProduct(itemFactors.getRowVector(negativeIndex), itemVector).getValue();
float error = (positiveScore - negativeScore) - (positiveFactor - negativeFactor);
totalError += error * error;
// update bi bj
itemBiases.shiftValue(positiveIndex, biasRegularization * (error - gamma * positiveBias));
itemBiases.shiftValue(negativeIndex, biasRegularization * (error - gamma * negativeBias));
// update qi qj
DenseVector positiveVector = itemFactors.getRowVector(positiveIndex);
positiveVector.iterateElement(MathCalculator.SERIAL, (element) -> {
int index = element.getIndex();
float value = element.getValue();
element.setValue(value + (itemVector.getValue(index) * error - value * beta) * biasRegularization);
});
DenseVector negativeVector = itemFactors.getRowVector(negativeIndex);
negativeVector.iterateElement(MathCalculator.SERIAL, (element) -> {
int index = element.getIndex();
float value = element.getValue();
element.setValue(value - (itemVector.getValue(index) * error - value * beta) * biasRegularization);
});
// update x
userVector.iterateElement(MathCalculator.SERIAL, (element) -> {
int index = element.getIndex();
float value = element.getValue();
element.setValue(value + (positiveVector.getValue(index) - negativeVector.getValue(index)) * error);
});
}
float scale = (float) (Math.pow(rho, -1) * Math.pow(size - 1, -alpha));
// for all j in Ru+\{i}
for (VectorScalar term : rateVector) {
int negativeIndex = term.getIndex();
if (negativeIndex != positiveIndex) {
// update pj
DenseVector negativeVector = userFactors.getRowVector(negativeIndex);
negativeVector.iterateElement(MathCalculator.SERIAL, (element) -> {
int index = element.getIndex();
float value = element.getValue();
element.setValue((userVector.getValue(index) * scale - value * beta) * biasRegularization + value);
});
}
}
}
}
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
double itemBias = itemBiases.getValue(itemIndex);
totalError += gamma * itemBias * itemBias;
totalError += beta * scalar.dotProduct(itemFactors.getRowVector(itemIndex), itemFactors.getRowVector(itemIndex)).getValue();
totalError += beta * scalar.dotProduct(userFactors.getRowVector(itemIndex), userFactors.getRowVector(itemIndex)).getValue();
}
totalError *= 0.5F;
if (isConverged(epocheIndex) && isConverged) {
break;
}
currentError = totalError;
}
}
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
DefaultScalar scalar = DefaultScalar.getInstance();
float bias = itemBiases.getValue(itemIndex);
float sum = 0F;
int count = 0;
for (VectorScalar term : scoreMatrix.getRowVector(userIndex)) {
int compareIndex = term.getIndex();
// at prediction time, i and j are always unequal because j is unrated
if (compareIndex != itemIndex) {
DenseVector compareVector = userFactors.getRowVector(compareIndex);
DenseVector itemVector = itemFactors.getRowVector(itemIndex);
sum += scalar.dotProduct(compareVector, itemVector).getValue();
count++;
}
}
sum *= (float) (count > 0 ? Math.pow(count, -alpha) : 0F);
instance.setQuantityMark(bias + sum);
}
}
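The pairwise updates above come from FISM's factored item-similarity prediction; in LaTeX, with p and q the two item-factor tables (userFactors and itemFactors in this port):

\hat{r}_{ui} = b_i + \big( \lvert R_u \rvert - 1 \big)^{-\alpha} \sum_{j \in R_u \setminus \{i\}} \mathbf{p}_j^{\top} \mathbf{q}_i

and the AUC variant minimizes the sampled pairwise squared loss \frac{1}{2} \sum_{u} \sum_{i \in R_u^{+}} \sum_{j \in Z} \big( (r_{ui} - r_{uj}) - (\hat{r}_{ui} - \hat{r}_{uj}) \big)^2 plus the \beta and \gamma regularizers accumulated at the end of each epoch.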

View File

@ -0,0 +1,202 @@
package com.jstarcraft.rns.model.collaborative.ranking;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.matrix.HashMatrix;
import com.jstarcraft.ai.math.structure.matrix.MatrixScalar;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.ai.math.structure.vector.VectorScalar;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.rns.model.MatrixFactorizationModel;
import it.unimi.dsi.fastutil.longs.Long2FloatRBTreeMap;
/**
*
* FISM-RMSE recommender
*
* <pre>
* FISM: Factored Item Similarity Models for Top-N Recommender Systems
* Refer to the LibRec team
* </pre>
*
* @author Birdy
*
*/
// Note: FISM composes userFactors from itemFactors
public class FISMRMSEModel extends MatrixFactorizationModel {
private int numNeighbors;
private float rho, alpha, beta, itemRegularization, userRegularization;
/**
* learning rate
*/
private float learnRatio;
/**
* items and users biases vector
*/
private DenseVector itemBiases, userBiases;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
// Note: FISM composes userFactors from itemFactors
userFactors = DenseMatrix.valueOf(itemSize, factorSize);
userFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
itemFactors = DenseMatrix.valueOf(itemSize, factorSize);
itemFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
userBiases = DenseVector.valueOf(userSize);
userBiases.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
itemBiases = DenseVector.valueOf(itemSize);
itemBiases.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
numNeighbors = scoreMatrix.getElementSize();
rho = configuration.getFloat("recommender.fismrmse.rho");// 3-15
alpha = configuration.getFloat("recommender.fismrmse.alpha", 0.5F);
beta = configuration.getFloat("recommender.fismrmse.beta", 0.6F);
itemRegularization = configuration.getFloat("recommender.fismrmse.gamma", 0.1F);
userRegularization = configuration.getFloat("recommender.fismrmse.gamma", 0.1F);
learnRatio = configuration.getFloat("recommender.fismrmse.lrate", 0.0001F);
}
@Override
protected void doPractice() {
DefaultScalar scalar = DefaultScalar.getInstance();
int sampleSize = (int) (rho * numNeighbors);
int totalSize = userSize * itemSize;
HashMatrix rateMatrix = new HashMatrix(true, userSize, itemSize, new Long2FloatRBTreeMap());
for (MatrixScalar cell : scoreMatrix) {
rateMatrix.setValue(cell.getRow(), cell.getColumn(), cell.getValue());
}
int[] sampleIndexes = new int[sampleSize];
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
DenseVector userVector = DenseVector.valueOf(factorSize);
totalError = 0F;
// new training data by sampling negative values
// R is a matrix built on top of trainMatrix with negative samples added.
// make a random sample of negative feedback (total - nnz)
for (int sampleIndex = 0; sampleIndex < sampleSize; sampleIndex++) {
while (true) {
int randomIndex = RandomUtility.randomInteger(totalSize - numNeighbors);
int rowIndex = randomIndex / itemSize;
int columnIndex = randomIndex % itemSize;
if (Float.isNaN(rateMatrix.getValue(rowIndex, columnIndex))) {
sampleIndexes[sampleIndex] = randomIndex;
rateMatrix.setValue(rowIndex, columnIndex, 0F);
break;
}
}
}
// update over each user-item-score (u, i, rui) cell
for (MatrixScalar cell : rateMatrix) {
int userIndex = cell.getRow();
int itemIndex = cell.getColumn();
float score = cell.getValue();
SparseVector rateVector = scoreMatrix.getRowVector(userIndex);
int size = rateVector.getElementSize() - 1;
if (size == 0 || size == -1) {
size = 1;
}
for (VectorScalar term : rateVector) {
int compareIndex = term.getIndex();
if (itemIndex != compareIndex) {
userVector.addVector(userFactors.getRowVector(compareIndex));
}
}
userVector.scaleValues((float) Math.pow(size, -alpha));
// for efficiency, use the below code to predict rui instead of
// simply using "predict(u,j)"
float itemBias = itemBiases.getValue(itemIndex);
float userBias = userBiases.getValue(userIndex);
float predict = itemBias + userBias + scalar.dotProduct(itemFactors.getRowVector(itemIndex), userVector).getValue();
float error = score - predict;
totalError += error * error;
// update bi
itemBiases.shiftValue(itemIndex, learnRatio * (error - itemRegularization * itemBias));
totalError += itemRegularization * itemBias * itemBias;
// update bu
userBiases.shiftValue(userIndex, learnRatio * (error - userRegularization * userBias));
totalError += userRegularization * userBias * userBias;
DenseVector factorVector = itemFactors.getRowVector(itemIndex);
factorVector.iterateElement(MathCalculator.SERIAL, (element) -> {
int index = element.getIndex();
float value = element.getValue();
element.setValue((userVector.getValue(index) * error - value * beta) * learnRatio + value);
});
totalError += beta * scalar.dotProduct(factorVector, factorVector).getValue();
for (VectorScalar term : rateVector) {
int compareIndex = term.getIndex();
if (itemIndex != compareIndex) {
float scale = (float) (error * Math.pow(size, -alpha));
factorVector = userFactors.getRowVector(compareIndex);
factorVector.iterateElement(MathCalculator.SERIAL, (element) -> {
int index = element.getIndex();
float value = element.getValue();
element.setValue((value * scale - value * beta) * learnRatio + value);
});
totalError += beta * scalar.dotProduct(factorVector, factorVector).getValue();
}
}
}
for (int sampleIndex : sampleIndexes) {
int rowIndex = sampleIndex / itemSize;
int columnIndex = sampleIndex % itemSize;
rateMatrix.setValue(rowIndex, columnIndex, Float.NaN);
}
totalError *= 0.5F;
if (isConverged(epocheIndex) && isConverged) {
break;
}
currentError = totalError;
}
}
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
DefaultScalar scalar = DefaultScalar.getInstance();
float bias = userBiases.getValue(userIndex) + itemBiases.getValue(itemIndex);
float sum = 0F;
int count = 0;
for (VectorScalar term : scoreMatrix.getRowVector(userIndex)) {
int index = term.getIndex();
// at prediction time, i and j are always unequal because j is unrated
if (index != itemIndex) {
DenseVector userVector = userFactors.getRowVector(index);
DenseVector itemVector = itemFactors.getRowVector(itemIndex);
sum += scalar.dotProduct(userVector, itemVector).getValue();
count++;
}
}
sum *= (float) (count > 0 ? Math.pow(count, -alpha) : 0F);
instance.setQuantityMark(bias + sum);
}
}
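FISM-RMSE optimizes the pointwise counterpart over the observed cells plus the sampled zeros; in LaTeX, with b_u the extra user bias this variant carries and n_u = \max(\lvert R_u \rvert - 1, 1) matching the size guard in the code:

L = \frac{1}{2} \sum_{(u,i) \in R \cup Z} \big( r_{ui} - \hat{r}_{ui} \big)^2 + \frac{\beta}{2} \big( \lVert P \rVert_F^2 + \lVert Q \rVert_F^2 \big) + \frac{\gamma}{2} \lVert \mathbf{b} \rVert^2, \qquad \hat{r}_{ui} = b_u + b_i + n_u^{-\alpha} \sum_{j \in R_u \setminus \{i\}} \mathbf{p}_j^{\top} \mathbf{q}_i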

View File

@ -0,0 +1,177 @@
package com.jstarcraft.rns.model.collaborative.ranking;
import java.util.HashSet;
import java.util.Set;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.ai.math.structure.vector.VectorScalar;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.rns.model.MatrixFactorizationModel;
import com.jstarcraft.rns.utility.LogisticUtility;
/**
*
* GBPR recommender
*
* <pre>
* GBPR: Group Preference Based Bayesian Personalized Ranking for One-Class Collaborative Filtering
* Refer to the LibRec team
* </pre>
*
* @author Birdy
*
*/
public class GBPRModel extends MatrixFactorizationModel {
private float rho;
private int gLen;
/**
* bias regularization
*/
private float regBias;
/**
* items biases vector
*/
private DenseVector itemBiases;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
itemBiases = DenseVector.valueOf(itemSize);
itemBiases.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(RandomUtility.randomFloat(1F));
});
rho = configuration.getFloat("recommender.gpbr.rho", 1.5f);
gLen = configuration.getInteger("recommender.gpbr.gsize", 2);
}
@Override
protected void doPractice() {
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
totalError = 0F;
// TODO consider refactoring
DenseMatrix userDeltas = DenseMatrix.valueOf(userSize, factorSize);
DenseMatrix itemDeltas = DenseMatrix.valueOf(itemSize, factorSize);
for (int sampleIndex = 0, sampleTimes = userSize * 100; sampleIndex < sampleTimes; sampleIndex++) {
int userIndex, positiveItemIndex, negativeItemIndex;
SparseVector userVector;
do {
userIndex = RandomUtility.randomInteger(userSize);
userVector = scoreMatrix.getRowVector(userIndex);
} while (userVector.getElementSize() == 0);
positiveItemIndex = userVector.getIndex(RandomUtility.randomInteger(userVector.getElementSize()));
// user group set G
Set<Integer> memberSet = new HashSet<>();
SparseVector positiveItemVector = scoreMatrix.getColumnVector(positiveItemIndex);
if (positiveItemVector.getElementSize() <= gLen) {
for (VectorScalar entry : positiveItemVector) {
memberSet.add(entry.getIndex());
}
} else {
memberSet.add(userIndex); // u in G
while (memberSet.size() < gLen) {
memberSet.add(positiveItemVector.getIndex(RandomUtility.randomInteger(positiveItemVector.getElementSize())));
}
}
float positiveScore = predict(userIndex, positiveItemIndex, memberSet);
negativeItemIndex = RandomUtility.randomInteger(itemSize - userVector.getElementSize());
for (VectorScalar term : userVector) {
if (negativeItemIndex >= term.getIndex()) {
negativeItemIndex++;
} else {
break;
}
}
float negativeScore = predict(userIndex, negativeItemIndex);
float error = positiveScore - negativeScore;
float value = (float) -Math.log(LogisticUtility.getValue(error));
totalError += value;
value = LogisticUtility.getValue(-error);
// update bi, bj
float positiveBias = itemBiases.getValue(positiveItemIndex);
itemBiases.shiftValue(positiveItemIndex, learnRatio * (value - regBias * positiveBias));
float negativeBias = itemBiases.getValue(negativeItemIndex);
itemBiases.shiftValue(negativeItemIndex, learnRatio * (-value - regBias * negativeBias));
// update Pw
float averageWeight = 1F / memberSet.size();
float[] memberSums = new float[factorSize];
for (int memberIndex : memberSet) {
float delta = memberIndex == userIndex ? 1F : 0F;
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float memberFactor = userFactors.getValue(memberIndex, factorIndex);
float positiveFactor = itemFactors.getValue(positiveItemIndex, factorIndex);
float negativeFactor = itemFactors.getValue(negativeItemIndex, factorIndex);
float deltaGroup = rho * averageWeight * positiveFactor + (1 - rho) * delta * positiveFactor - delta * negativeFactor;
userDeltas.shiftValue(memberIndex, factorIndex, learnRatio * (value * deltaGroup - userRegularization * memberFactor));
memberSums[factorIndex] += memberFactor;
}
}
// update itemFactors
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float userFactor = userFactors.getValue(userIndex, factorIndex);
float positiveFactor = itemFactors.getValue(positiveItemIndex, factorIndex);
float negativeFactor = itemFactors.getValue(negativeItemIndex, factorIndex);
float positiveDelta = rho * averageWeight * memberSums[factorIndex] + (1 - rho) * userFactor;
itemDeltas.shiftValue(positiveItemIndex, factorIndex, learnRatio * (value * positiveDelta - itemRegularization * positiveFactor));
float negativeDelta = -userFactor;
itemDeltas.shiftValue(negativeItemIndex, factorIndex, learnRatio * (value * negativeDelta - itemRegularization * negativeFactor));
}
}
userFactors.addMatrix(userDeltas, false);
itemFactors.addMatrix(itemDeltas, false);
if (isConverged(epocheIndex) && isConverged) {
break;
}
isLearned(epocheIndex);
currentError = totalError;
}
}
private float predict(int userIndex, int itemIndex, Set<Integer> memberIndexes) {
DefaultScalar scalar = DefaultScalar.getInstance();
DenseVector userVector = userFactors.getRowVector(userIndex);
DenseVector itemVector = itemFactors.getRowVector(itemIndex);
float value = itemBiases.getValue(itemIndex) + scalar.dotProduct(userVector, itemVector).getValue();
float sum = 0F;
for (int memberIndex : memberIndexes) {
userVector = userFactors.getRowVector(memberIndex);
sum += scalar.dotProduct(userVector, itemVector).getValue();
}
float groupScore = sum / memberIndexes.size() + itemBiases.getValue(itemIndex);
return rho * groupScore + (1 - rho) * value;
}
@Override
protected float predict(int userIndex, int itemIndex) {
DefaultScalar scalar = DefaultScalar.getInstance();
DenseVector userVector = userFactors.getRowVector(userIndex);
DenseVector itemVector = itemFactors.getRowVector(itemIndex);
return itemBiases.getValue(itemIndex) + scalar.dotProduct(userVector, itemVector).getValue();
}
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
instance.setQuantityMark(predict(userIndex, itemIndex));
}
}
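For reference, the group-blended score computed by predict(userIndex, itemIndex, memberIndexes) can be sketched standalone; the plain arrays below are hypothetical stand-ins for the DenseVector types (illustrative only):

// Hedged sketch of the GBPR scoring rule: rho blends the group preference
// (average member-item dot product plus item bias) with the individual one.
static float gbprScore(float rho, float itemBias, float[] userFactor, float[][] memberFactors, float[] itemFactor) {
float individual = itemBias + dot(userFactor, itemFactor);
float group = 0F;
for (float[] memberFactor : memberFactors) {
group += dot(memberFactor, itemFactor);
}
group = group / memberFactors.length + itemBias;
return rho * group + (1F - rho) * individual;
}

static float dot(float[] left, float[] right) {
float value = 0F;
for (int index = 0; index < left.length; index++) {
value += left[index] * right[index];
}
return value;
}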

File diff suppressed because it is too large

View File

@@ -0,0 +1,361 @@
package com.jstarcraft.rns.model.collaborative.ranking;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.ai.math.structure.vector.VectorScalar;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.KeyValue;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.core.utility.StringUtility;
import com.jstarcraft.rns.model.ProbabilisticGraphicalModel;
import com.jstarcraft.rns.utility.GammaUtility;
import com.jstarcraft.rns.utility.SampleUtility;
import it.unimi.dsi.fastutil.ints.Int2IntRBTreeMap;
/**
*
* Item Bigram Recommender
*
* <pre>
* Topic modeling: beyond bag-of-words
* Refer to the LibRec team's implementation
* </pre>
*
* @author Birdy
*
*/
public class ItemBigramModel extends ProbabilisticGraphicalModel {
/** context field (instant) */
private String instantField;
/** context dimension (instant) */
private int instantDimension;
private Map<Integer, List<Integer>> userItemMap;
/**
* k: current topic; j: previously rated item; i: current item
*/
private int[][][] topicItemBigramTimes;
private DenseMatrix topicItemProbabilities;
private float[][][] topicItemBigramProbabilities, topicItemBigramSums;
private DenseMatrix beta;
/**
* vector of hyperparameters for alpha
*/
private DenseVector alpha;
/**
* Dirichlet hyper-parameters of user-topic distribution: typical value is 50/K
*/
private float initAlpha;
/**
* Dirichlet hyper-parameters of topic-item distribution, typical value is 0.01
*/
private float initBeta;
/**
* cumulative statistics of theta, phi
*/
private DenseMatrix userTopicSums;
/**
* entry[u, k]: number of tokens assigned to topic k, given user u.
*/
private DenseMatrix userTopicTimes;
/**
* entry[u]: number of tokens rated by user u.
*/
private DenseVector userTokenNumbers;
/**
* posterior probabilities of parameters
*/
private DenseMatrix userTopicProbabilities;
/**
* entry[u, i, k]: topic assignment as sparse structure
*/
// TODO consider supporting Integer values in DenseMatrix
private Int2IntRBTreeMap topicAssignments;
private DenseVector randomProbabilities;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
initAlpha = configuration.getFloat("recommender.user.dirichlet.prior", 0.01F);
initBeta = configuration.getFloat("recommender.topic.dirichlet.prior", 0.01F);
instantField = configuration.getString("data.model.fields.instant");
instantDimension = model.getQualityInner(instantField);
Int2IntRBTreeMap instantTable = new Int2IntRBTreeMap();
instantTable.defaultReturnValue(-1);
for (DataInstance sample : model) {
int instant = instantTable.get(sample.getQualityFeature(userDimension) * itemSize + sample.getQualityFeature(itemDimension));
if (instant == -1) {
instant = sample.getQualityFeature(instantDimension);
} else {
instant = sample.getQualityFeature(instantDimension) > instant ? sample.getQualityFeature(instantDimension) : instant;
}
instantTable.put(sample.getQualityFeature(userDimension) * itemSize + sample.getQualityFeature(itemDimension), instant);
}
// build the training data, sorting by date
userItemMap = new HashMap<>();
for (int userIndex = 0; userIndex < userSize; userIndex++) {
// TODO consider optimizing
SparseVector userVector = scoreMatrix.getRowVector(userIndex);
if (userVector.getElementSize() == 0) {
continue;
}
// sort by time
List<KeyValue<Integer, Integer>> instants = new ArrayList<>(userVector.getElementSize());
for (VectorScalar term : userVector) {
int itemIndex = term.getIndex();
instants.add(new KeyValue<>(itemIndex, instantTable.get(userIndex * itemSize + itemIndex)));
}
Collections.sort(instants, (left, right) -> {
// ascending order
return left.getValue().compareTo(right.getValue());
});
List<Integer> items = new ArrayList<>(userVector.getElementSize());
for (KeyValue<Integer, Integer> term : instants) {
items.add(term.getKey());
}
userItemMap.put(userIndex, items);
}
// count variables
// initialize count variables.
userTopicTimes = DenseMatrix.valueOf(userSize, factorSize);
userTokenNumbers = DenseVector.valueOf(userSize);
// Note: the last of the numItems + 1 elements represents the probability of having no previous record
topicItemBigramTimes = new int[factorSize][itemSize + 1][itemSize];
topicItemProbabilities = DenseMatrix.valueOf(factorSize, itemSize + 1);
// Logs.debug("topicPreItemCurItemNum consumes {} bytes",
// Strings.toString(Memory.bytes(topicPreItemCurItemNum)));
// parameters
userTopicSums = DenseMatrix.valueOf(userSize, factorSize);
topicItemBigramSums = new float[factorSize][itemSize + 1][itemSize];
topicItemBigramProbabilities = new float[factorSize][itemSize + 1][itemSize];
// hyper-parameters
alpha = DenseVector.valueOf(factorSize);
alpha.setValues(initAlpha);
beta = DenseMatrix.valueOf(factorSize, itemSize + 1);
beta.setValues(initBeta);
// initialization
topicAssignments = new Int2IntRBTreeMap();
for (Entry<Integer, List<Integer>> term : userItemMap.entrySet()) {
int userIndex = term.getKey();
List<Integer> items = term.getValue();
for (int index = 0; index < items.size(); index++) {
int nextItemIndex = items.get(index);
// TODO needs refactoring
int topicIndex = RandomUtility.randomInteger(factorSize);
topicAssignments.put(userIndex * itemSize + nextItemIndex, topicIndex);
userTopicTimes.shiftValue(userIndex, topicIndex, 1F);
userTokenNumbers.shiftValue(userIndex, 1F);
int previousItemIndex = index > 0 ? items.get(index - 1) : itemSize;
topicItemBigramTimes[topicIndex][previousItemIndex][nextItemIndex]++;
topicItemProbabilities.shiftValue(topicIndex, previousItemIndex, 1F);
}
}
randomProbabilities = DenseVector.valueOf(factorSize);
}
@Override
protected void eStep() {
float sumAlpha = alpha.getSum(false);
DenseVector topicVector = DenseVector.valueOf(factorSize);
topicVector.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(beta.getRowVector(scalar.getIndex()).getSum(false));
});
for (Entry<Integer, List<Integer>> term : userItemMap.entrySet()) {
int userIndex = term.getKey();
List<Integer> items = term.getValue();
for (int index = 0; index < items.size(); index++) {
int nextItemIndex = items.get(index);
int assignmentIndex = topicAssignments.get(userIndex * itemSize + nextItemIndex);
userTopicTimes.shiftValue(userIndex, assignmentIndex, -1F);
userTokenNumbers.shiftValue(userIndex, -1F);
int previousItemIndex = index > 0 ? items.get(index - 1) : itemSize;
topicItemBigramTimes[assignmentIndex][previousItemIndex][nextItemIndex]--;
topicItemProbabilities.shiftValue(assignmentIndex, previousItemIndex, -1F);
// compute the sampling probabilities
DefaultScalar sum = DefaultScalar.getInstance();
sum.setValue(0F);
randomProbabilities.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int topicIndex = scalar.getIndex();
float userProbability = (userTopicTimes.getValue(userIndex, topicIndex) + alpha.getValue(topicIndex)) / (userTokenNumbers.getValue(userIndex) + sumAlpha);
float topicProbability = (topicItemBigramTimes[topicIndex][previousItemIndex][nextItemIndex] + beta.getValue(topicIndex, previousItemIndex)) / (topicItemProbabilities.getValue(topicIndex, previousItemIndex) + topicVector.getValue(topicIndex));
float value = userProbability * topicProbability;
sum.shiftValue(value);
scalar.setValue(sum.getValue());
});
int randomIndex = SampleUtility.binarySearch(randomProbabilities, 0, randomProbabilities.getElementSize() - 1, RandomUtility.randomFloat(sum.getValue()));
topicAssignments.put(userIndex * itemSize + nextItemIndex, randomIndex);
userTopicTimes.shiftValue(userIndex, randomIndex, 1F);
userTokenNumbers.shiftValue(userIndex, 1F);
topicItemBigramTimes[randomIndex][previousItemIndex][nextItemIndex]++;
topicItemProbabilities.shiftValue(randomIndex, previousItemIndex, 1F);
}
}
}
@Override
protected void mStep() {
float denominator = 0F;
float value = 0F;
float alphaSum = alpha.getSum(false);
float alphaDigamma = GammaUtility.digamma(alphaSum);
float alphaValue;
for (int userIndex = 0; userIndex < userSize; userIndex++) {
// TODO should use a sparse vector
value = userTokenNumbers.getValue(userIndex);
if (value != 0F) {
denominator += GammaUtility.digamma(value + alphaSum) - alphaDigamma;
}
}
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
alphaValue = alpha.getValue(topicIndex);
alphaDigamma = GammaUtility.digamma(alphaValue);
float numerator = 0F;
for (int userIndex = 0; userIndex < userSize; userIndex++) {
// TODO should use a sparse matrix
value = userTopicTimes.getValue(userIndex, topicIndex);
if (value != 0F) {
numerator += GammaUtility.digamma(value + alphaValue) - alphaDigamma;
}
}
if (numerator != 0F) {
alpha.setValue(topicIndex, alphaValue * (numerator / denominator));
}
}
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
float betaSum = beta.getRowVector(topicIndex).getSum(false);
float betaDigamma = GammaUtility.digamma(betaSum);
float betaValue;
float[] denominators = new float[itemSize + 1];
for (int itemIndex = 0; itemIndex < itemSize + 1; itemIndex++) {
// TODO should use a sparse matrix
value = topicItemProbabilities.getValue(topicIndex, itemIndex);
if (value != 0F) {
denominators[itemIndex] = GammaUtility.digamma(value + betaSum) - betaDigamma;
}
}
for (int previousItemIndex = 0; previousItemIndex < itemSize + 1; previousItemIndex++) {
betaValue = beta.getValue(topicIndex, previousItemIndex);
betaDigamma = GammaUtility.digamma(betaValue);
float numerator = 0F;
denominator = 0F;
for (int nextItemIndex = 0; nextItemIndex < itemSize; nextItemIndex++) {
// TODO should use a sparse tensor
value = topicItemBigramTimes[topicIndex][previousItemIndex][nextItemIndex];
if (value != 0F) {
numerator += GammaUtility.digamma(value + betaValue) - betaDigamma;
}
denominator += denominators[previousItemIndex];
}
if (numerator != 0F) {
beta.setValue(topicIndex, previousItemIndex, betaValue * (numerator / denominator));
}
}
}
}
@Override
protected void readoutParameters() {
float value;
float sumAlpha = alpha.getSum(false);
for (int userIndex = 0; userIndex < userSize; userIndex++) {
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
value = (userTopicTimes.getValue(userIndex, topicIndex) + alpha.getValue(topicIndex)) / (userTokenNumbers.getValue(userIndex) + sumAlpha);
userTopicSums.shiftValue(userIndex, topicIndex, value);
}
}
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
float betaTopicValue = beta.getRowVector(topicIndex).getSum(false);
for (int previousItemIndex = 0; previousItemIndex < itemSize + 1; previousItemIndex++) {
for (int nextItemIndex = 0; nextItemIndex < itemSize; nextItemIndex++) {
value = (topicItemBigramTimes[topicIndex][previousItemIndex][nextItemIndex] + beta.getValue(topicIndex, previousItemIndex)) / (topicItemProbabilities.getValue(topicIndex, previousItemIndex) + betaTopicValue);
topicItemBigramSums[topicIndex][previousItemIndex][nextItemIndex] += value;
}
}
}
if (logger.isInfoEnabled()) {
String message = StringUtility.format("sumAlpha is {}", sumAlpha);
logger.info(message);
}
numberOfStatistics++;
}
@Override
protected void estimateParameters() {
userTopicProbabilities = DenseMatrix.copyOf(userTopicSums);
userTopicProbabilities.scaleValues(1F / numberOfStatistics);
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
for (int previousItemIndex = 0; previousItemIndex < itemSize + 1; previousItemIndex++) {
for (int nextItemIndex = 0; nextItemIndex < itemSize; nextItemIndex++) {
topicItemBigramProbabilities[topicIndex][previousItemIndex][nextItemIndex] = topicItemBigramSums[topicIndex][previousItemIndex][nextItemIndex] / numberOfStatistics;
}
}
}
}
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
List<Integer> items = userItemMap.get(userIndex);
int scoreIndex = items == null ? itemSize : items.get(items.size() - 1); // last rated item
float value = 0F;
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
value += userTopicProbabilities.getValue(userIndex, topicIndex) * topicItemBigramProbabilities[topicIndex][scoreIndex][itemIndex];
}
instance.setQuantityMark(value);
}
}
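The eStep above accumulates unnormalized probabilities into a cumulative array and then samples by binary search. A hedged standalone sketch of that roulette-wheel pattern (linear scan instead of SampleUtility.binarySearch; names are illustrative):

// Illustrative only: pick an index with probability proportional to its weight,
// given weights already accumulated into a cumulative sum.
static int sampleFromCumulative(float[] cumulative, java.util.Random random) {
float draw = random.nextFloat() * cumulative[cumulative.length - 1];
for (int index = 0; index < cumulative.length; index++) {
if (draw < cumulative[index]) {
return index;
}
}
return cumulative.length - 1;
}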

View File

@@ -0,0 +1,76 @@
package com.jstarcraft.rns.model.collaborative.ranking;
import java.util.Iterator;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.math.structure.vector.MathVector;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.ai.math.structure.vector.VectorScalar;
import com.jstarcraft.rns.model.collaborative.ItemKNNModel;
/**
*
* Item KNN Recommender
*
* <pre>
* Refer to the LibRec team's implementation
* </pre>
*
* @author Birdy
*
*/
public class ItemKNNRankingModel extends ItemKNNModel {
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
SparseVector userVector = userVectors[userIndex];
MathVector neighbors = itemNeighbors[itemIndex];
if (userVector.getElementSize() == 0 || neighbors.getElementSize() == 0) {
instance.setQuantityMark(0F);
return;
}
float sum = 0F, absolute = 0F;
int count = 0;
int leftCursor = 0, rightCursor = 0, leftSize = userVector.getElementSize(), rightSize = neighbors.getElementSize();
Iterator<VectorScalar> leftIterator = userVector.iterator();
VectorScalar leftTerm = leftIterator.next();
Iterator<VectorScalar> rightIterator = neighbors.iterator();
VectorScalar rightTerm = rightIterator.next();
// check whether the two sorted arrays share any common indexes
while (leftCursor < leftSize && rightCursor < rightSize) {
if (leftTerm.getIndex() == rightTerm.getIndex()) {
count++;
sum += rightTerm.getValue();
if (leftIterator.hasNext()) {
leftTerm = leftIterator.next();
}
if (rightIterator.hasNext()) {
rightTerm = rightIterator.next();
}
leftCursor++;
rightCursor++;
} else if (leftTerm.getIndex() > rightTerm.getIndex()) {
if (rightIterator.hasNext()) {
rightTerm = rightIterator.next();
}
rightCursor++;
} else if (leftTerm.getIndex() < rightTerm.getIndex()) {
if (leftIterator.hasNext()) {
leftTerm = leftIterator.next();
}
leftCursor++;
}
}
if (count == 0) {
instance.setQuantityMark(0F);
return;
}
instance.setQuantityMark(sum);
}
}
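The merge loop in predict is a standard two-pointer intersection over sorted index sequences; a hedged standalone form over plain arrays (illustrative names, not part of the commit):

// Illustrative: sum the neighbor similarities for items the user has rated,
// assuming both index arrays are sorted ascending.
static float sumSharedValues(int[] ratedIndexes, int[] neighborIndexes, float[] neighborValues) {
float sum = 0F;
int left = 0, right = 0;
while (left < ratedIndexes.length && right < neighborIndexes.length) {
if (ratedIndexes[left] == neighborIndexes[right]) {
sum += neighborValues[right];
left++;
right++;
} else if (ratedIndexes[left] > neighborIndexes[right]) {
right++;
} else {
left++;
}
}
return sum;
}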

View File

@@ -0,0 +1,294 @@
package com.jstarcraft.rns.model.collaborative.ranking;
import java.util.ArrayList;
import java.util.List;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.matrix.MatrixScalar;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.rns.model.ProbabilisticGraphicalModel;
import com.jstarcraft.rns.model.exception.ModelException;
import com.jstarcraft.rns.utility.GammaUtility;
import com.jstarcraft.rns.utility.SampleUtility;
/**
*
* LDA Recommender
*
* <pre>
* Latent Dirichlet Allocation for implicit feedback
* Refer to the LibRec team's implementation
* </pre>
*
* @author Birdy
*
*/
public class LDAModel extends ProbabilisticGraphicalModel {
/**
* entry[k, i]: number of tokens assigned to topic k, given item i.
*/
private DenseMatrix topicItemNumbers;
/**
* entry[u, k]: number of tokens assigned to topic k, given user u.
*/
private DenseMatrix userTopicNumbers;
/**
* topic assignment as list from the iterator of trainMatrix
*/
private List<Integer> topicAssignments;
/**
* entry[u]: number of tokens rated by user u.
*/
private DenseVector userTokenNumbers;
/**
* entry[k]: number of tokens assigned to topic t.
*/
private DenseVector topicTokenNumbers;
/**
* vector of hyperparameters for alpha and beta
*/
private DenseVector alpha, beta;
/**
* cumulative statistics of theta, phi
*/
private DenseMatrix userTopicSums, topicItemSums;
/**
* posterior probabilities of parameters
*/
private DenseMatrix userTopicProbabilities, topicItemProbabilities;
private DenseVector sampleProbabilities;
/**
* setup init member method
*
* @throws ModelException if error occurs
*/
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
// TODO this code could be removed (use a constant marker or binarize.threshold instead)
for (MatrixScalar term : scoreMatrix) {
term.setValue(1F);
}
userTopicSums = DenseMatrix.valueOf(userSize, factorSize);
topicItemSums = DenseMatrix.valueOf(factorSize, itemSize);
// initialize count variables.
userTopicNumbers = DenseMatrix.valueOf(userSize, factorSize);
userTokenNumbers = DenseVector.valueOf(userSize);
topicItemNumbers = DenseMatrix.valueOf(factorSize, itemSize);
topicTokenNumbers = DenseVector.valueOf(factorSize);
// default value:
// Thomas L. Griffiths and Mark Steyvers. Finding scientific topics.
// Proceedings of the National Academy of Sciences, 101(suppl 1):5228-5235,
// 2004.
/**
* Dirichlet hyper-parameters of user-topic distribution: typical value is 50/K
*/
float initAlpha = configuration.getFloat("recommender.user.dirichlet.prior", 50F / factorSize);
/**
* Dirichlet hyper-parameters of topic-item distribution, typical value is 0.01
*/
float initBeta = configuration.getFloat("recommender.topic.dirichlet.prior", 0.01F);
alpha = DenseVector.valueOf(factorSize);
alpha.setValues(initAlpha);
beta = DenseVector.valueOf(itemSize);
beta.setValues(initBeta);
// The z_u,i are initialized to values in [0, K-1] to determine the
// initial state of the Markov chain.
topicAssignments = new ArrayList<>(scoreMatrix.getElementSize());
for (MatrixScalar term : scoreMatrix) {
int userIndex = term.getRow();
int itemIndex = term.getColumn();
int times = (int) (term.getValue());
for (int time = 0; time < times; time++) {
int topicIndex = RandomUtility.randomInteger(factorSize); // 0 ~ k-1
// assign a topic t to pair (u, i)
topicAssignments.add(topicIndex);
// number of items of user u assigned to topic t.
userTopicNumbers.shiftValue(userIndex, topicIndex, 1F);
// total number of items of user u
userTokenNumbers.shiftValue(userIndex, 1F);
// number of instances of item i assigned to topic t
topicItemNumbers.shiftValue(topicIndex, itemIndex, 1F);
// total number of words assigned to topic t.
topicTokenNumbers.shiftValue(topicIndex, 1F);
}
}
sampleProbabilities = DenseVector.valueOf(factorSize);
}
@Override
protected void eStep() {
float sumAlpha = alpha.getSum(false);
float sumBeta = beta.getSum(false);
// Gibbs sampling from full conditional distribution
int assignmentsIndex = 0;
for (MatrixScalar term : scoreMatrix) {
int userIndex = term.getRow();
int itemIndex = term.getColumn();
int times = (int) (term.getValue());
for (int time = 0; time < times; time++) {
int topicIndex = topicAssignments.get(assignmentsIndex); // topic
userTopicNumbers.shiftValue(userIndex, topicIndex, -1F);
userTokenNumbers.shiftValue(userIndex, -1F);
topicItemNumbers.shiftValue(topicIndex, itemIndex, -1F);
topicTokenNumbers.shiftValue(topicIndex, -1F);
// compute the sampling probabilities
DefaultScalar sum = DefaultScalar.getInstance();
sum.setValue(0F);
sampleProbabilities.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int index = scalar.getIndex();
float value = (userTopicNumbers.getValue(userIndex, index) + alpha.getValue(index)) / (userTokenNumbers.getValue(userIndex) + sumAlpha) * (topicItemNumbers.getValue(index, itemIndex) + beta.getValue(itemIndex)) / (topicTokenNumbers.getValue(index) + sumBeta);
sum.shiftValue(value);
scalar.setValue(sum.getValue());
});
// scaled sample because of unnormalized p[], randomly sampled a
// new topic t
topicIndex = SampleUtility.binarySearch(sampleProbabilities, 0, sampleProbabilities.getElementSize() - 1, RandomUtility.randomFloat(sum.getValue()));
// add newly estimated z_i to count variables
userTopicNumbers.shiftValue(userIndex, topicIndex, 1F);
userTokenNumbers.shiftValue(userIndex, 1F);
topicItemNumbers.shiftValue(topicIndex, itemIndex, 1F);
topicTokenNumbers.shiftValue(topicIndex, 1F);
topicAssignments.set(assignmentsIndex, topicIndex);
assignmentsIndex++;
}
}
}
@Override
protected void mStep() {
float denominator;
float value;
// update alpha vector
float alphaSum = alpha.getSum(false);
float alphaDigamma = GammaUtility.digamma(alphaSum);
float alphaValue;
denominator = 0F;
for (int userIndex = 0; userIndex < userSize; userIndex++) {
value = userTokenNumbers.getValue(userIndex);
if (value != 0F) {
denominator += GammaUtility.digamma(value + alphaSum) - alphaDigamma;
}
}
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
alphaValue = alpha.getValue(topicIndex);
alphaDigamma = GammaUtility.digamma(alphaValue);
float numerator = 0F;
for (int userIndex = 0; userIndex < userSize; userIndex++) {
value = userTopicNumbers.getValue(userIndex, topicIndex);
if (value != 0F) {
numerator += GammaUtility.digamma(value + alphaValue) - alphaDigamma;
}
}
if (numerator != 0F) {
alpha.setValue(topicIndex, alphaValue * (numerator / denominator));
}
}
// update beta vector
float betaSum = beta.getSum(false);
float betaDigamma = GammaUtility.digamma(betaSum);
float betaValue;
denominator = 0F;
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
value = topicTokenNumbers.getValue(topicIndex);
if (value != 0F) {
denominator += GammaUtility.digamma(value + betaSum) - betaDigamma;
}
}
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
betaValue = beta.getValue(itemIndex);
betaDigamma = GammaUtility.digamma(betaValue);
float numerator = 0F;
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
value = topicItemNumbers.getValue(topicIndex, itemIndex);
if (value != 0F) {
numerator += GammaUtility.digamma(value + betaValue) - betaDigamma;
}
}
if (numerator != 0F) {
beta.setValue(itemIndex, betaValue * (numerator / denominator));
}
}
}
/**
* Add to the statistics the values of theta and phi for the current state.
*/
@Override
protected void readoutParameters() {
float sumAlpha = alpha.getSum(false);
float sumBeta = beta.getSum(false);
float value;
for (int userIndex = 0; userIndex < userSize; userIndex++) {
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
value = (userTopicNumbers.getValue(userIndex, topicIndex) + alpha.getValue(topicIndex)) / (userTokenNumbers.getValue(userIndex) + sumAlpha);
userTopicSums.shiftValue(userIndex, topicIndex, value);
}
}
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
value = (topicItemNumbers.getValue(topicIndex, itemIndex) + beta.getValue(itemIndex)) / (topicTokenNumbers.getValue(topicIndex) + sumBeta);
topicItemSums.shiftValue(topicIndex, itemIndex, value);
}
}
numberOfStatistics++;
}
@Override
protected void estimateParameters() {
float scale = 1F / numberOfStatistics;
userTopicProbabilities = DenseMatrix.copyOf(userTopicSums);
userTopicProbabilities.scaleValues(scale);
topicItemProbabilities = DenseMatrix.copyOf(topicItemSums);
topicItemProbabilities.scaleValues(scale);
}
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
DefaultScalar scalar = DefaultScalar.getInstance();
DenseVector userVector = userTopicProbabilities.getRowVector(userIndex);
DenseVector itemVector = topicItemProbabilities.getColumnVector(itemIndex);
instance.setQuantityMark(scalar.dotProduct(userVector, itemVector).getValue());
}
}
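The value computed inside eStep's lambda is the collapsed Gibbs full conditional. A hedged scalar form with illustrative parameter names (plain floats standing in for the count structures):

// p(z = k | rest) is proportional to
// (n_uk + alpha_k) / (n_u + sumAlpha) * (n_ki + beta_i) / (n_k + sumBeta)
static float topicProbability(float userTopicCount, float userTokenCount, float alphaTopic, float sumAlpha,
float topicItemCount, float topicTokenCount, float betaItem, float sumBeta) {
return (userTopicCount + alphaTopic) / (userTokenCount + sumAlpha)
* (topicItemCount + betaItem) / (topicTokenCount + sumBeta);
}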

View File

@@ -0,0 +1,128 @@
package com.jstarcraft.rns.model.collaborative.ranking;
import java.util.Arrays;
import java.util.Comparator;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.data.module.ArrayInstance;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.ai.math.structure.vector.MathVector;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.rns.utility.LogisticUtility;
import com.jstarcraft.rns.utility.SampleUtility;
/**
*
* Lambda FM Recommender (dynamic negative sampler)
*
* <pre>
* LambdaFM: Learning Optimal Ranking with Factorization Machines Using Lambda Surrogates
* </pre>
*
* @author Birdy
*
*/
public class LambdaFMDynamicModel extends LambdaFMModel {
// Dynamic
private float dynamicRho;
private int numberOfOrders;
private DenseVector orderProbabilities;
private ArrayInstance[] negatives;
private Integer[] orderIndexes;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
dynamicRho = configuration.getFloat("recommender.item.distribution.parameter");
numberOfOrders = configuration.getInteger("recommender.number.orders", 10);
DefaultScalar sum = DefaultScalar.getInstance();
sum.setValue(0F);
orderProbabilities = DenseVector.valueOf(numberOfOrders);
orderProbabilities.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int index = scalar.getIndex();
float value = (float) (Math.exp(-(index + 1) / (numberOfOrders * dynamicRho)));
sum.shiftValue(value);
scalar.setValue(sum.getValue());
});
negatives = new ArrayInstance[numberOfOrders];
orderIndexes = new Integer[numberOfOrders];
for (int index = 0; index < numberOfOrders; index++) {
negatives[index] = new ArrayInstance(model.getQualityOrder(), model.getQuantityOrder());
orderIndexes[index] = index;
}
}
@Override
protected float getGradientValue(DataModule[] modules, ArrayInstance positive, ArrayInstance negative, DefaultScalar scalar) {
int userIndex;
while (true) {
userIndex = RandomUtility.randomInteger(userSize);
SparseVector userVector = scoreMatrix.getRowVector(userIndex);
if (userVector.getElementSize() == 0 || userVector.getElementSize() == itemSize) {
continue;
}
DataModule module = modules[userIndex];
DataInstance instance = module.getInstance(0);
int positivePosition = RandomUtility.randomInteger(module.getSize());
instance.setCursor(positivePosition);
positive.copyInstance(instance);
// TODO negativeGroup.size() may never reach numberOfNegatives; this needs handling
for (int orderIndex = 0; orderIndex < numberOfOrders; orderIndex++) {
int negativeItemIndex = RandomUtility.randomInteger(itemSize - userVector.getElementSize());
for (int position = 0, size = userVector.getElementSize(); position < size; position++) {
if (negativeItemIndex >= userVector.getIndex(position)) {
negativeItemIndex++;
continue;
}
break;
}
// TODO note: a negative feature is deliberately constructed here.
int negativePosition = RandomUtility.randomInteger(module.getSize());
instance.setCursor(negativePosition);
negatives[orderIndex].copyInstance(instance);
negatives[orderIndex].setQualityFeature(itemDimension, negativeItemIndex);
MathVector vector = getFeatureVector(negatives[orderIndex]);
negatives[orderIndex].setQuantityMark(predict(scalar, vector));
}
int orderIndex = SampleUtility.binarySearch(orderProbabilities, 0, orderProbabilities.getElementSize() - 1, RandomUtility.randomFloat(orderProbabilities.getValue(orderProbabilities.getElementSize() - 1)));
Arrays.sort(orderIndexes, new Comparator<Integer>() {
@Override
public int compare(Integer leftIndex, Integer rightIndex) {
return (negatives[leftIndex].getQuantityMark() > negatives[rightIndex].getQuantityMark() ? -1 : (negatives[leftIndex].getQuantityMark() < negatives[rightIndex].getQuantityMark() ? 1 : 0));
}
});
negative = negatives[orderIndexes[orderIndex]];
break;
}
positiveVector = getFeatureVector(positive);
negativeVector = getFeatureVector(negative);
float positiveScore = predict(scalar, positiveVector);
float negativeScore = predict(scalar, negativeVector);
float error = positiveScore - negativeScore;
// since pij_real defaults to 1, the loss computation simplifies to the line below.
// loss += -pij_real * Math.log(pij) - (1 - pij_real) * Math.log(1 - pij);
totalError += (float) -Math.log(LogisticUtility.getValue(error));
float gradient = calaculateGradientValue(lossType, error);
return gradient;
}
}
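The dynamic sampler above draws an order index from an exponential rank distribution before picking the negative ranked at that position. A hedged sketch of the cumulative weights built in prepare (illustrative standalone form):

// Illustrative: the weight for order r is exp(-(r + 1) / (orders * rho));
// smaller rho concentrates sampling on the highest-scored (hardest) negatives.
static float[] cumulativeOrderWeights(int orders, float rho) {
float[] cumulative = new float[orders];
float sum = 0F;
for (int index = 0; index < orders; index++) {
sum += (float) Math.exp(-(index + 1) / (orders * rho));
cumulative[index] = sum;
}
return cumulative;
}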

View File

@@ -0,0 +1,149 @@
package com.jstarcraft.rns.model.collaborative.ranking;
import java.util.Iterator;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.data.module.ArrayInstance;
import com.jstarcraft.ai.data.processor.DataSplitter;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.matrix.MatrixScalar;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.ai.math.structure.vector.MathVector;
import com.jstarcraft.ai.math.structure.vector.VectorScalar;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.rns.data.processor.QualityFeatureDataSplitter;
import com.jstarcraft.rns.model.FactorizationMachineModel;
/**
*
* Lambda FM Recommender (abstract base)
*
* <pre>
* LambdaFM: Learning Optimal Ranking with Factorization Machines Using Lambda Surrogates
* </pre>
*
* @author Birdy
*
*/
public abstract class LambdaFMModel extends FactorizationMachineModel {
protected int lossType;
protected MathVector positiveVector, negativeVector;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
// TODO this code could be removed (use a constant marker or binarize.threshold instead)
for (MatrixScalar term : scoreMatrix) {
term.setValue(1F);
}
lossType = configuration.getInteger("losstype", 3);
biasRegularization = configuration.getFloat("recommender.fm.regw0", 0.1F);
weightRegularization = configuration.getFloat("recommender.fm.regW", 0.1F);
factorRegularization = configuration.getFloat("recommender.fm.regF", 0.001F);
}
protected abstract float getGradientValue(DataModule[] modules, ArrayInstance positive, ArrayInstance negative, DefaultScalar scalar);
@Override
protected void doPractice() {
ArrayInstance positive = new ArrayInstance(marker.getQualityOrder(), marker.getQuantityOrder());
ArrayInstance negative = new ArrayInstance(marker.getQualityOrder(), marker.getQuantityOrder());
DefaultScalar scalar = DefaultScalar.getInstance();
DataSplitter splitter = new QualityFeatureDataSplitter(userDimension);
DataModule[] modules = splitter.split(marker, userSize);
DenseVector positiveSum = DenseVector.valueOf(factorSize);
DenseVector negativeSum = DenseVector.valueOf(factorSize);
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
long totalTime = 0;
totalError = 0F;
for (int sampleIndex = 0, sampleTimes = userSize * 50; sampleIndex < sampleTimes; sampleIndex++) {
long current = System.currentTimeMillis();
float gradient = getGradientValue(modules, positive, negative, scalar);
totalTime += (System.currentTimeMillis() - current);
sum(positiveVector, positiveSum);
sum(negativeVector, negativeSum);
int leftIndex = 0, rightIndex = 0;
Iterator<VectorScalar> leftIterator = positiveVector.iterator();
Iterator<VectorScalar> rightIterator = negativeVector.iterator();
for (int index = 0; index < marker.getQualityOrder(); index++) {
VectorScalar leftTerm = leftIterator.next();
VectorScalar rightTerm = rightIterator.next();
leftIndex = leftTerm.getIndex();
rightIndex = rightTerm.getIndex();
if (leftIndex == rightIndex) {
weightVector.shiftValue(leftIndex, learnRatio * (gradient * 0F - weightRegularization * weightVector.getValue(leftIndex)));
totalError += weightRegularization * weightVector.getValue(leftIndex) * weightVector.getValue(leftIndex);
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float positiveFactor = positiveSum.getValue(factorIndex) * leftTerm.getValue() - featureFactors.getValue(leftIndex, factorIndex) * leftTerm.getValue() * leftTerm.getValue();
float negativeFactor = negativeSum.getValue(factorIndex) * rightTerm.getValue() - featureFactors.getValue(rightIndex, factorIndex) * rightTerm.getValue() * rightTerm.getValue();
featureFactors.shiftValue(leftIndex, factorIndex, learnRatio * (gradient * (positiveFactor - negativeFactor) - factorRegularization * featureFactors.getValue(leftIndex, factorIndex)));
totalError += factorRegularization * featureFactors.getValue(leftIndex, factorIndex) * featureFactors.getValue(leftIndex, factorIndex);
}
} else {
weightVector.shiftValue(leftIndex, learnRatio * (gradient * leftTerm.getValue() - weightRegularization * weightVector.getValue(leftIndex)));
totalError += weightRegularization * weightVector.getValue(leftIndex) * weightVector.getValue(leftIndex);
weightVector.shiftValue(rightIndex, learnRatio * (gradient * -rightTerm.getValue() - weightRegularization * weightVector.getValue(rightIndex)));
totalError += weightRegularization * weightVector.getValue(rightIndex) * weightVector.getValue(rightIndex);
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float positiveFactor = positiveSum.getValue(factorIndex) * leftTerm.getValue() - featureFactors.getValue(leftIndex, factorIndex) * leftTerm.getValue() * leftTerm.getValue();
featureFactors.shiftValue(leftIndex, factorIndex, learnRatio * (gradient * positiveFactor - factorRegularization * featureFactors.getValue(leftIndex, factorIndex)));
totalError += factorRegularization * featureFactors.getValue(leftIndex, factorIndex) * featureFactors.getValue(leftIndex, factorIndex);
float negativeFactor = negativeSum.getValue(factorIndex) * rightTerm.getValue() - featureFactors.getValue(rightIndex, factorIndex) * rightTerm.getValue() * rightTerm.getValue();
featureFactors.shiftValue(rightIndex, factorIndex, learnRatio * (gradient * -negativeFactor - factorRegularization * featureFactors.getValue(rightIndex, factorIndex)));
totalError += factorRegularization * featureFactors.getValue(rightIndex, factorIndex) * featureFactors.getValue(rightIndex, factorIndex);
}
}
}
}
System.out.println(totalTime);
totalError *= 0.5;
if (isConverged(epocheIndex) && isConverged) {
break;
}
isLearned(epocheIndex);
currentError = totalError;
}
}
protected void isLearned(int iteration) {
if (learnRatio < 0F) {
return;
}
if (isLearned && iteration > 1) {
learnRatio = Math.abs(currentError) > Math.abs(totalError) ? learnRatio * 1.05F : learnRatio * 0.5F;
} else if (learnDecay > 0 && learnDecay < 1) {
learnRatio *= learnDecay;
}
// limit to max-learn-rate after update
if (learnLimit > 0 && learnRatio > learnLimit) {
learnRatio = learnLimit;
}
}
private void sum(MathVector vector, DenseVector sum) {
// TODO consider switching to vector operations.
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float value = 0F;
for (VectorScalar term : vector) {
value += featureFactors.getValue(term.getIndex(), factorIndex) * term.getValue();
}
sum.setValue(factorIndex, value);
}
}
}
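The per-factor updates in doPractice rely on the standard factorization machine gradient identity; a hedged scalar form (illustrative names, mirroring the positiveFactor/negativeFactor lines above):

// Illustrative: for feature i with value x_i, the partial derivative of the
// FM second-order term with respect to v_{i,f} is x_i * (sum_f - v_{i,f} * x_i),
// where sum_f = sum_j v_{j,f} * x_j is precomputed once per instance (see sum()).
static float factorGradient(float featureValue, float factorSum, float featureFactor) {
return factorSum * featureValue - featureFactor * featureValue * featureValue;
}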

View File

@@ -0,0 +1,130 @@
package com.jstarcraft.rns.model.collaborative.ranking;
import java.util.Arrays;
import java.util.Comparator;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.data.module.ArrayInstance;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.MatrixScalar;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.rns.utility.LogisticUtility;
import com.jstarcraft.rns.utility.SampleUtility;
/**
*
* Lambda FM Recommender (static popularity sampler)
*
* <pre>
* LambdaFM: Learning Optimal Ranking with Factorization Machines Using Lambda Surrogates
* </pre>
*
* @author Birdy
*
*/
public class LambdaFMStaticModel extends LambdaFMModel {
// Static
private float staticRho;
protected DenseVector itemProbabilities;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
staticRho = configuration.getFloat("recommender.item.distribution.parameter");
// calculate popularity
Integer[] orderItems = new Integer[itemSize];
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
orderItems[itemIndex] = itemIndex;
}
Arrays.sort(orderItems, new Comparator<Integer>() {
@Override
public int compare(Integer leftItemIndex, Integer rightItemIndex) {
return (scoreMatrix.getColumnScope(leftItemIndex) > scoreMatrix.getColumnScope(rightItemIndex) ? -1 : (scoreMatrix.getColumnScope(leftItemIndex) < scoreMatrix.getColumnScope(rightItemIndex) ? 1 : 0));
}
});
Integer[] itemOrders = new Integer[itemSize];
for (int index = 0; index < itemSize; index++) {
int itemIndex = orderItems[index];
itemOrders[itemIndex] = index;
}
DefaultScalar sum = DefaultScalar.getInstance();
sum.setValue(0F);
itemProbabilities = DenseVector.valueOf(itemSize);
itemProbabilities.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int index = scalar.getIndex();
float value = (float) Math.exp(-(itemOrders[index] + 1) / (itemSize * staticRho));
sum.shiftValue(value);
scalar.setValue(sum.getValue());
});
for (MatrixScalar term : scoreMatrix) {
term.setValue(itemProbabilities.getValue(term.getColumn()));
}
}
@Override
protected float getGradientValue(DataModule[] modules, ArrayInstance positive, ArrayInstance negative, DefaultScalar scalar) {
int userIndex;
while (true) {
userIndex = RandomUtility.randomInteger(userSize);
SparseVector userVector = scoreMatrix.getRowVector(userIndex);
if (userVector.getElementSize() == 0 || userVector.getElementSize() == itemSize) {
continue;
}
DataModule module = modules[userIndex];
DataInstance instance = module.getInstance(0);
int positivePosition = RandomUtility.randomInteger(module.getSize());
instance.setCursor(positivePosition);
positive.copyInstance(instance);
// TODO note: a negative feature is deliberately constructed here.
int negativeItemIndex = -1;
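// The score matrix entries were overwritten in prepare with each item's
// cumulative popularity, so the first binary search locates a gap between
// the user's rated items, and the second samples a concrete unrated item
// inside that gap with probability proportional to popularity; the outer
// loop retries whenever the draw misses the gap.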
while (negativeItemIndex == -1) {
int position = SampleUtility.binarySearch(userVector, 0, userVector.getElementSize() - 1, RandomUtility.randomFloat(itemProbabilities.getValue(itemProbabilities.getElementSize() - 1)));
int low;
int high;
if (position == -1) {
low = userVector.getIndex(userVector.getElementSize() - 1);
high = itemProbabilities.getElementSize() - 1;
} else if (position == 0) {
low = 0;
high = userVector.getIndex(position);
} else {
low = userVector.getIndex(position - 1);
high = userVector.getIndex(position);
}
negativeItemIndex = SampleUtility.binarySearch(itemProbabilities, low, high, RandomUtility.randomFloat(itemProbabilities.getValue(high)));
}
int negativePosition = RandomUtility.randomInteger(module.getSize());
instance.setCursor(negativePosition);
negative.copyInstance(instance);
negative.setQualityFeature(itemDimension, negativeItemIndex);
break;
}
positiveVector = getFeatureVector(positive);
negativeVector = getFeatureVector(negative);
float positiveScore = predict(scalar, positiveVector);
float negativeScore = predict(scalar, negativeVector);
float error = positiveScore - negativeScore;
// since pij_real defaults to 1, the loss computation simplifies to the line below.
// loss += -pij_real * Math.log(pij) - (1 - pij_real) * Math.log(1 - pij);
totalError += (float) -Math.log(LogisticUtility.getValue(error));
float gradient = calaculateGradientValue(lossType, error);
return gradient;
}
}

View File

@@ -0,0 +1,102 @@
package com.jstarcraft.rns.model.collaborative.ranking;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.data.module.ArrayInstance;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.rns.utility.LogisticUtility;
/**
*
* Lambda FM Recommender (rank-weighted sampler)
*
* <pre>
* LambdaFM: Learning Optimal Ranking with Factorization Machines Using Lambda Surrogates
* </pre>
*
* @author Birdy
*
*/
public class LambdaFMWeightModel extends LambdaFMModel {
// Weight
private float[] orderLosses;
private float epsilon;
private int Y, N;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
epsilon = configuration.getFloat("epsilon");
orderLosses = new float[itemSize - 1];
float orderLoss = 0F;
for (int orderIndex = 1; orderIndex < itemSize; orderIndex++) {
orderLoss += 1F / orderIndex;
orderLosses[orderIndex - 1] = orderLoss;
}
for (int rankIndex = 1; rankIndex < itemSize; rankIndex++) {
orderLosses[rankIndex - 1] /= orderLoss;
}
}
@Override
protected float getGradientValue(DataModule[] modules, ArrayInstance positive, ArrayInstance negative, DefaultScalar scalar) {
int userIndex;
float positiveScore;
float negativeScore;
while (true) {
userIndex = RandomUtility.randomInteger(userSize);
SparseVector userVector = scoreMatrix.getRowVector(userIndex);
if (userVector.getElementSize() == 0 || userVector.getElementSize() == itemSize) {
continue;
}
N = 0;
Y = itemSize - scoreMatrix.getRowScope(userIndex);
DataModule module = modules[userIndex];
DataInstance instance = module.getInstance(0);
int positivePosition = RandomUtility.randomInteger(module.getSize());
instance.setCursor(positivePosition);
positive.copyInstance(instance);
positiveVector = getFeatureVector(positive);
positiveScore = predict(scalar, positiveVector);
do {
N++;
int negativeItemIndex = RandomUtility.randomInteger(itemSize - userVector.getElementSize());
for (int position = 0, size = userVector.getElementSize(); position < size; position++) {
if (negativeItemIndex >= userVector.getIndex(position)) {
negativeItemIndex++;
continue;
}
break;
}
// TODO note: a negative feature is deliberately constructed here.
int negativePosition = RandomUtility.randomInteger(module.getSize());
instance.setCursor(negativePosition);
negative.copyInstance(instance);
negative.setQualityFeature(itemDimension, negativeItemIndex);
negativeVector = getFeatureVector(negative);
negativeScore = predict(scalar, negativeVector);
} while ((positiveScore - negativeScore > epsilon) && N < Y - 1);
break;
}
float error = positiveScore - negativeScore;
// since pij_real defaults to 1, the loss computation simplifies to the line below.
// loss += -pij_real * Math.log(pij) - (1 - pij_real) * Math.log(1 - pij);
totalError += (float) -Math.log(LogisticUtility.getValue(error));
float gradient = calaculateGradientValue(lossType, error);
int orderIndex = (Y - 1) / N;
float orderLoss = orderLosses[orderIndex];
gradient = gradient * orderLoss;
return gradient;
}
}
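The weight applied to the gradient above estimates the positive item's rank from the number of sampling trials; a hedged standalone form (illustrative names):

// Illustrative: with Y candidate negatives and N draws needed to find a
// violating one, the estimated order is (Y - 1) / N, and the gradient is
// scaled by the normalized harmonic number for that order (see prepare()).
static float lambdaWeight(float[] normalizedHarmonicLosses, int candidates, int draws) {
int orderIndex = (candidates - 1) / draws;
return normalizedHarmonicLosses[orderIndex];
}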

View File

@@ -0,0 +1,80 @@
package com.jstarcraft.rns.model.collaborative.ranking;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.matrix.MatrixScalar;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.ai.math.structure.vector.VectorScalar;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.rns.model.MatrixFactorizationModel;
import com.jstarcraft.rns.utility.LogisticUtility;
/**
*
* ListwiseMF Recommender
*
* <pre>
* List-wise learning to rank with matrix factorization for collaborative filtering
* Refer to the LibRec team's implementation
* </pre>
*
* @author Birdy
*
*/
public class ListwiseMFModel extends MatrixFactorizationModel {
private DenseVector userExponentials;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
userExponentials = DenseVector.valueOf(userSize);
for (MatrixScalar matrixEntry : scoreMatrix) {
int userIndex = matrixEntry.getRow();
float score = matrixEntry.getValue();
userExponentials.shiftValue(userIndex, (float) Math.exp(score));
}
}
@Override
protected void doPractice() {
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
totalError = 0F;
for (int userIndex = 0; userIndex < userSize; userIndex++) {
SparseVector userVector = scoreMatrix.getRowVector(userIndex);
if (userVector.getElementSize() == 0) {
continue;
}
float exponential = 0F;
for (VectorScalar term : userVector) {
exponential += Math.exp(predict(userIndex, term.getIndex()));
}
for (VectorScalar term : userVector) {
int itemIndex = term.getIndex();
float score = term.getValue();
float predict = predict(userIndex, itemIndex);
float error = (float) (Math.exp(score) / userExponentials.getValue(userIndex) - Math.log(Math.exp(predict) / exponential)) * LogisticUtility.getGradient(predict);
totalError -= error;
// update factors
for (int factorIdx = 0; factorIdx < factorSize; factorIdx++) {
float userFactor = userFactors.getValue(userIndex, factorIdx);
float itemFactor = itemFactors.getValue(itemIndex, factorIdx);
float userDelta = error * itemFactor - userRegularization * userFactor;
float itemDelta = error * userFactor - itemRegularization * itemFactor;
userFactors.shiftValue(userIndex, factorIdx, learnRatio * userDelta);
itemFactors.shiftValue(itemIndex, factorIdx, learnRatio * itemDelta);
totalError += 0.5D * userRegularization * userFactor * userFactor + 0.5D * itemRegularization * itemFactor * itemFactor;
}
}
}
if (isConverged(epocheIndex) && isConverged) {
break;
}
isLearned(epocheIndex);
currentError = totalError;
}
}
}
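The error term in doPractice compares the empirical top-one probability of a score against the log of the model's softmax over predictions; a hedged scalar form mirroring that line (illustrative names):

// Illustrative: exp(r_ui) / sum_j exp(r_uj) is the empirical top-one
// probability; the model side enters through log(exp(p_ui) / sum_j exp(p_uj)).
static float listwiseError(float score, float scoreExponentialSum, float predict, float predictExponentialSum, float logisticGradient) {
return (float) (Math.exp(score) / scoreExponentialSum - Math.log(Math.exp(predict) / predictExponentialSum)) * logisticGradient;
}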

View File

@@ -0,0 +1,161 @@
package com.jstarcraft.rns.model.collaborative.ranking;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.matrix.MatrixScalar;
import com.jstarcraft.ai.math.structure.table.SparseTable;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.rns.model.ProbabilisticGraphicalModel;
import it.unimi.dsi.fastutil.ints.Int2ObjectRBTreeMap;
/**
*
* PLSA Recommender
*
* <pre>
* Latent semantic models for collaborative filtering
* Refer to the LibRec team's implementation
* </pre>
*
* @author Birdy
*
*/
public class PLSAModel extends ProbabilisticGraphicalModel {
/**
* {user, item, {topic z, probability}}
*/
private SparseTable<DenseVector> probabilityTensor;
/**
* Conditional Probability: P(z|u)
*/
private DenseMatrix userTopicProbabilities, userTopicSums;
/**
* Conditional Probability: P(i|z)
*/
private DenseMatrix topicItemProbabilities, topicItemSums;
/**
* topic probability sum value
*/
private DenseVector topicProbabilities;
/**
* entry[u]: number of tokens rated by user u.
*/
private DenseVector userScoreTimes;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
// TODO this code could be removed (use a constant marker or binarize.threshold instead)
for (MatrixScalar term : scoreMatrix) {
term.setValue(1F);
}
userTopicSums = DenseMatrix.valueOf(userSize, factorSize);
topicItemSums = DenseMatrix.valueOf(factorSize, itemSize);
topicProbabilities = DenseVector.valueOf(factorSize);
userTopicProbabilities = DenseMatrix.valueOf(userSize, factorSize);
for (int userIndex = 0; userIndex < userSize; userIndex++) {
DenseVector probabilityVector = userTopicProbabilities.getRowVector(userIndex);
probabilityVector.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(RandomUtility.randomInteger(userSize) + 1);
});
probabilityVector.scaleValues(1F / probabilityVector.getSum(false));
}
topicItemProbabilities = DenseMatrix.valueOf(factorSize, itemSize);
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
DenseVector probabilityVector = topicItemProbabilities.getRowVector(topicIndex);
probabilityVector.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(RandomUtility.randomInteger(itemSize) + 1);
});
probabilityVector.scaleValues(1F / probabilityVector.getSum(false));
}
// initialize Q
probabilityTensor = new SparseTable<>(true, userSize, itemSize, new Int2ObjectRBTreeMap<>());
userScoreTimes = DenseVector.valueOf(userSize);
for (MatrixScalar term : scoreMatrix) {
int userIndex = term.getRow();
int itemIndex = term.getColumn();
probabilityTensor.setValue(userIndex, itemIndex, DenseVector.valueOf(factorSize));
userScoreTimes.shiftValue(userIndex, term.getValue());
}
}
@Override
protected void eStep() {
for (MatrixScalar term : scoreMatrix) {
int userIndex = term.getRow();
int itemIndex = term.getColumn();
DenseVector probabilities = probabilityTensor.getValue(userIndex, itemIndex);
probabilities.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int index = scalar.getIndex();
float value = userTopicProbabilities.getValue(userIndex, index) * topicItemProbabilities.getValue(index, itemIndex);
scalar.setValue(value);
});
probabilities.scaleValues(1F / probabilities.getSum(false));
}
}
@Override
protected void mStep() {
userTopicSums.setValues(0F);
topicItemSums.setValues(0F);
topicProbabilities.setValues(0F);
for (MatrixScalar term : scoreMatrix) {
int userIndex = term.getRow();
int itemIndex = term.getColumn();
float numerator = term.getValue();
DenseVector probabilities = probabilityTensor.getValue(userIndex, itemIndex);
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
float value = probabilities.getValue(topicIndex) * numerator;
userTopicSums.shiftValue(userIndex, topicIndex, value);
topicItemSums.shiftValue(topicIndex, itemIndex, value);
topicProbabilities.shiftValue(topicIndex, value);
}
}
for (int userIndex = 0; userIndex < userSize; userIndex++) {
float denominator = userScoreTimes.getValue(userIndex);
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
float value = denominator > 0F ? userTopicSums.getValue(userIndex, topicIndex) / denominator : 0F;
userTopicProbabilities.setValue(userIndex, topicIndex, value);
}
}
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
float probability = topicProbabilities.getValue(topicIndex);
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
float value = probability > 0F ? topicItemSums.getValue(topicIndex, itemIndex) / probability : 0F;
topicItemProbabilities.setValue(topicIndex, itemIndex, value);
}
}
}
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
DefaultScalar scalar = DefaultScalar.getInstance();
DenseVector userVector = userTopicProbabilities.getRowVector(userIndex);
DenseVector itemVector = topicItemProbabilities.getColumnVector(itemIndex);
instance.setQuantityMark(scalar.dotProduct(userVector, itemVector).getValue());
}
}
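The eStep above computes EM responsibilities; a hedged standalone form over plain arrays (illustrative, not part of the commit):

// Illustrative: q(z | u, i) = P(z | u) * P(i | z) / sum_z' P(z' | u) * P(i | z')
static float[] responsibilities(float[] userTopicProbabilities, float[] topicItemProbabilities) {
float[] values = new float[userTopicProbabilities.length];
float sum = 0F;
for (int topic = 0; topic < values.length; topic++) {
values[topic] = userTopicProbabilities[topic] * topicItemProbabilities[topic];
sum += values[topic];
}
for (int topic = 0; topic < values.length; topic++) {
values[topic] /= sum;
}
return values;
}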

View File

@@ -0,0 +1,241 @@
package com.jstarcraft.rns.model.collaborative.ranking;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.MatrixUtility;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.ai.math.structure.vector.VectorScalar;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.rns.model.MatrixFactorizationModel;
/**
*
* Rank ALS Recommender
*
* <pre>
* Alternating Least Squares for Personalized Ranking
* Refer to the LibRec team's implementation
* </pre>
*
* @author Birdy
*
*/
public class RankALSModel extends MatrixFactorizationModel {
// whether support based weighting is used ($s_i=|U_i|$) or not ($s_i=1$)
private boolean weight;
private DenseVector weightVector;
private float sumSupport;
// TODO consider moving into the parent class
private List<Integer> userList;
// TODO consider moving into the parent class
private List<Integer> itemList;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
weight = configuration.getBoolean("recommender.rankals.support.weight", true);
weightVector = DenseVector.valueOf(itemSize);
sumSupport = 0;
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
float supportValue = weight ? scoreMatrix.getColumnScope(itemIndex) : 1F;
weightVector.setValue(itemIndex, supportValue);
sumSupport += supportValue;
}
userList = new LinkedList<>();
for (int userIndex = 0; userIndex < userSize; userIndex++) {
if (scoreMatrix.getRowVector(userIndex).getElementSize() > 0) {
userList.add(userIndex);
}
}
userList = new ArrayList<>(userList);
itemList = new LinkedList<>();
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
if (scoreMatrix.getColumnVector(itemIndex).getElementSize() > 0) {
itemList.add(itemIndex);
}
}
itemList = new ArrayList<>(itemList);
}
@Override
protected void doPractice() {
// Cache the matrices used by the factor computation to avoid repeated allocation
DenseMatrix matrixCache = DenseMatrix.valueOf(factorSize, factorSize);
DenseMatrix copyCache = DenseMatrix.valueOf(factorSize, factorSize);
DenseVector vectorCache = DenseVector.valueOf(factorSize);
DenseMatrix inverseCache = DenseMatrix.valueOf(factorSize, factorSize);
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
// P step: update user vectors
// factor weight matrix and factor weight vector
DenseMatrix factorWeightMatrix = DenseMatrix.valueOf(factorSize, factorSize);
DenseVector factorWeightVector = DenseVector.valueOf(factorSize);
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
float weight = weightVector.getValue(itemIndex);
DenseVector itemVector = itemFactors.getRowVector(itemIndex);
factorWeightMatrix.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int row = scalar.getRow();
int column = scalar.getColumn();
float value = scalar.getValue();
scalar.setValue(value + (itemVector.getValue(row) * itemVector.getValue(column) * weight));
});
factorWeightVector.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int index = scalar.getIndex();
float value = scalar.getValue();
scalar.setValue(value + itemVector.getValue(index) * weight);
});
}
// user delta matrix, user weight vector, user score vector, user count vector
DenseMatrix userDeltas = DenseMatrix.valueOf(userSize, factorSize);
DenseVector userWeights = DenseVector.valueOf(userSize);
DenseVector userScores = DenseVector.valueOf(userSize);
DenseVector userTimes = DenseVector.valueOf(userSize);
// build the user factors from the item factors
for (int userIndex : userList) {
// for each user
SparseVector userVector = scoreMatrix.getRowVector(userIndex);
// TODO Consider refactoring here to minimize array construction
DenseMatrix factorValues = DenseMatrix.valueOf(factorSize, factorSize);
DenseMatrix copyValues = DenseMatrix.valueOf(factorSize, factorSize);
DenseVector rateValues = DenseVector.valueOf(factorSize);
DenseVector weightValues = DenseVector.valueOf(factorSize);
float weightSum = 0F, rateSum = 0F, timeSum = userVector.getElementSize();
for (VectorScalar term : userVector) {
int itemIndex = term.getIndex();
float score = term.getValue();
// double cui = 1;
DenseVector itemVector = itemFactors.getRowVector(itemIndex);
factorValues.iterateElement(MathCalculator.PARALLEL, (scalar) -> {
int row = scalar.getRow();
int column = scalar.getColumn();
float value = scalar.getValue();
scalar.setValue(value + itemVector.getValue(row) * itemVector.getValue(column));
});
// ratings of unrated items will be 0
float weight = weightVector.getValue(itemIndex) * score;
float value;
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
value = itemVector.getValue(factorIndex);
userDeltas.shiftValue(userIndex, factorIndex, value);
rateValues.shiftValue(factorIndex, value * score);
weightValues.shiftValue(factorIndex, value * weight);
}
rateSum += score;
weightSum += weight;
}
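// assemble the regularized linear system of the Rank-ALS user update from the
// cached item statistics; it is solved below by inverting the matrix.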
factorValues.iterateElement(MathCalculator.PARALLEL, (scalar) -> {
int row = scalar.getRow();
int column = scalar.getColumn();
float value = scalar.getValue();
scalar.setValue((row == column ? userRegularization : 0F) + value * sumSupport - (userDeltas.getValue(userIndex, row) * factorWeightVector.getValue(column)) - (factorWeightVector.getValue(row) * userDeltas.getValue(userIndex, column)) + (factorWeightMatrix.getValue(row, column) * timeSum));
});
float rateScale = rateSum;
float weightScale = weightSum;
rateValues.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int index = scalar.getIndex();
float value = scalar.getValue();
scalar.setValue((value * sumSupport - userDeltas.getValue(userIndex, index) * weightScale) - (factorWeightVector.getValue(index) * rateScale) + (weightValues.getValue(index) * timeSum));
});
userFactors.getRowVector(userIndex).dotProduct(MatrixUtility.inverse(factorValues, copyValues, inverseCache), false, rateValues, MathCalculator.SERIAL);
userWeights.setValue(userIndex, weightSum);
userScores.setValue(userIndex, rateSum);
userTimes.setValue(userIndex, timeSum);
}
// Q step: update item vectors
DenseMatrix itemFactorMatrix = DenseMatrix.valueOf(factorSize, factorSize);
DenseMatrix itemTimeMatrix = DenseMatrix.valueOf(factorSize, factorSize);
DenseVector itemFactorVector = DenseVector.valueOf(factorSize);
DenseVector factorValues = DenseVector.valueOf(factorSize);
for (int userIndex : userList) {
DenseVector userVector = userFactors.getRowVector(userIndex);
matrixCache.dotProduct(userVector, userVector, MathCalculator.SERIAL);
itemFactorMatrix.addMatrix(matrixCache, false);
itemTimeMatrix.iterateElement(MathCalculator.PARALLEL, (scalar) -> {
int row = scalar.getRow();
int column = scalar.getColumn();
float value = scalar.getValue();
scalar.setValue(value + (matrixCache.getValue(row, column) * userTimes.getValue(userIndex)));
});
itemFactorVector.addVector(vectorCache.dotProduct(matrixCache, false, userDeltas.getRowVector(userIndex), MathCalculator.SERIAL));
float rateSum = userScores.getValue(userIndex);
factorValues.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int index = scalar.getIndex();
float value = scalar.getValue();
scalar.setValue(value + userVector.getValue(index) * rateSum);
});
}
// build the item factors from the user factors
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
// for each item
SparseVector itemVector = scoreMatrix.getColumnVector(itemIndex);
// TODO Consider refactoring here to minimize array construction
DenseVector rateValues = DenseVector.valueOf(factorSize);
DenseVector weightValues = DenseVector.valueOf(factorSize);
DenseVector timeValues = DenseVector.valueOf(factorSize);
for (VectorScalar term : itemVector) {
int userIndex = term.getIndex();
float score = term.getValue();
float weight = userWeights.getValue(userIndex);
float time = score * userTimes.getValue(userIndex);
float value;
DenseVector userVector = userFactors.getRowVector(userIndex);
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
value = userVector.getValue(factorIndex);
rateValues.shiftValue(factorIndex, value * score);
weightValues.shiftValue(factorIndex, value * weight);
timeValues.shiftValue(factorIndex, value * time);
}
}
float weight = weightVector.getValue(itemIndex);
vectorCache.dotProduct(itemFactorMatrix, false, factorWeightVector, MathCalculator.SERIAL);
matrixCache.iterateElement(MathCalculator.PARALLEL, (scalar) -> {
int row = scalar.getRow();
int column = scalar.getColumn();
scalar.setValue(itemFactorMatrix.getValue(row, column) * (weight + 1));
});
DenseVector itemValues = itemFactors.getRowVector(itemIndex);
vectorCache.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int index = scalar.getIndex();
float value = scalar.getValue();
value = value + (rateValues.getValue(index) * sumSupport) - weightValues.getValue(index) + (itemFactorVector.getValue(index) * weight) - (factorValues.getValue(index) * weight) + (timeValues.getValue(index) * weight);
value = value - scalar.dotProduct(matrixCache.getRowVector(index), itemValues).getValue();
scalar.setValue(value);
});
matrixCache.iterateElement(MathCalculator.PARALLEL, (scalar) -> {
int row = scalar.getRow();
int column = scalar.getColumn();
float value = scalar.getValue();
scalar.setValue((row == column ? itemRegularization : 0F) + (value / (weight + 1)) * sumSupport + itemTimeMatrix.getValue(row, column) * weight - value);
});
itemValues.dotProduct(MatrixUtility.inverse(matrixCache, copyCache, inverseCache), false, vectorCache, MathCalculator.SERIAL);
}
if (isConverged(epocheIndex) && isConverged) {
break;
}
currentError = totalError;
}
}
}

@@ -0,0 +1,145 @@
package com.jstarcraft.rns.model.collaborative.ranking;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.matrix.MatrixScalar;
import com.jstarcraft.ai.math.structure.matrix.SparseMatrix;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.ai.math.structure.vector.VectorScalar;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.rns.model.MatrixFactorizationModel;
/**
*
* Rank CD Recommender
*
* <pre>
* </pre>
*
* @author Birdy
*
*/
public class RankCDModel extends MatrixFactorizationModel {
// private float alpha;
// item confidence
private float confidence;
private SparseMatrix weightMatrix;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
// TODO This code can be removed (replace with a constant marker, or use binarize.threshold)
for (MatrixScalar term : scoreMatrix) {
term.setValue(1F);
}
confidence = configuration.getFloat("recommender.rankcd.alpha");
weightMatrix = SparseMatrix.copyOf(scoreMatrix, false);
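// implicit-feedback confidence weights: w_ui = 1 + alpha * r_ui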
weightMatrix.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(1F + confidence * scalar.getValue());
});
}
@Override
protected void doPractice() {
// Init caches
double[] userScores = new double[userSize];
double[] itemScores = new double[itemSize];
double[] userConfidences = new double[userSize];
double[] itemConfidences = new double[itemSize];
// Init Sq
DenseMatrix itemDeltas = DenseMatrix.valueOf(factorSize, factorSize);
// Init Sp
DenseMatrix userDeltas = DenseMatrix.valueOf(factorSize, factorSize);
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
itemDeltas.dotProduct(itemFactors, true, itemFactors, false, MathCalculator.SERIAL);
// Step 1: update user factors;
for (int userIndex = 0; userIndex < userSize; userIndex++) {
SparseVector userVector = weightMatrix.getRowVector(userIndex);
for (VectorScalar term : userVector) {
int itemIndex = term.getIndex();
itemScores[itemIndex] = predict(userIndex, itemIndex);
itemConfidences[itemIndex] = term.getValue();
}
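// coordinate descent: update one factor of the user vector at a time while the
// others stay fixed; the cached Sq = Q^T * Q covers all items, and the items the
// user actually rated are corrected with their confidence weights.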
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float numerator = 0F, denominator = userRegularization + itemDeltas.getValue(factorIndex, factorIndex);
// TODO This could be changed to subtraction
for (int k = 0; k < factorSize; k++) {
if (factorIndex != k) {
numerator -= userFactors.getValue(userIndex, k) * itemDeltas.getValue(factorIndex, k);
}
}
for (VectorScalar term : userVector) {
int itemIndex = term.getIndex();
itemScores[itemIndex] -= userFactors.getValue(userIndex, factorIndex) * itemFactors.getValue(itemIndex, factorIndex);
numerator += (itemConfidences[itemIndex] - (itemConfidences[itemIndex] - 1) * itemScores[itemIndex]) * itemFactors.getValue(itemIndex, factorIndex);
denominator += (itemConfidences[itemIndex] - 1) * itemFactors.getValue(itemIndex, factorIndex) * itemFactors.getValue(itemIndex, factorIndex);
}
// update puf
userFactors.setValue(userIndex, factorIndex, numerator / denominator);
for (VectorScalar term : userVector) {
int itemIndex = term.getIndex();
itemScores[itemIndex] += userFactors.getValue(userIndex, factorIndex) * itemFactors.getValue(itemIndex, factorIndex);
}
}
}
// Update the Sp cache
userDeltas.dotProduct(userFactors, true, userFactors, false, MathCalculator.SERIAL);
// Step 2: update item factors;
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
SparseVector itemVector = weightMatrix.getColumnVector(itemIndex);
for (VectorScalar term : itemVector) {
int userIndex = term.getIndex();
userScores[userIndex] = predict(userIndex, itemIndex);
userConfidences[userIndex] = term.getValue();
}
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float numerator = 0F, denominator = itemRegularization + userDeltas.getValue(factorIndex, factorIndex);
// TODO This could be changed to subtraction
for (int k = 0; k < factorSize; k++) {
if (factorIndex != k) {
numerator -= itemFactors.getValue(itemIndex, k) * userDeltas.getValue(k, factorIndex);
}
}
for (VectorScalar term : itemVector) {
int userIndex = term.getIndex();
userScores[userIndex] -= userFactors.getValue(userIndex, factorIndex) * itemFactors.getValue(itemIndex, factorIndex);
numerator += (userConfidences[userIndex] - (userConfidences[userIndex] - 1) * userScores[userIndex]) * userFactors.getValue(userIndex, factorIndex);
denominator += (userConfidences[userIndex] - 1) * userFactors.getValue(userIndex, factorIndex) * userFactors.getValue(userIndex, factorIndex);
}
// update qif
itemFactors.setValue(itemIndex, factorIndex, numerator / denominator);
for (VectorScalar term : itemVector) {
int userIndex = term.getIndex();
userScores[userIndex] += userFactors.getValue(userIndex, factorIndex) * itemFactors.getValue(itemIndex, factorIndex);
}
}
}
if (isConverged(epocheIndex) && isConverged) {
break;
}
currentError = totalError;
// TODO There is currently no totalLoss.
}
}
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
DefaultScalar scalar = DefaultScalar.getInstance();
instance.setQuantityMark(scalar.dotProduct(userFactors.getRowVector(userIndex), itemFactors.getRowVector(itemIndex)).getValue());
}
}

@@ -0,0 +1,99 @@
package com.jstarcraft.rns.model.collaborative.ranking;
import java.util.List;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.MatrixScalar;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.rns.model.MatrixFactorizationModel;
import com.jstarcraft.rns.utility.SampleUtility;
import it.unimi.dsi.fastutil.ints.IntSet;
/**
*
* Rank SGD Recommender
*
* <pre>
* Collaborative Filtering Ensemble for Ranking
* Reference: LibRec team
* </pre>
*
* @author Birdy
*
*/
public class RankSGDModel extends MatrixFactorizationModel {
// item sampling probabilities, stored as ascending cumulative sums
protected DenseVector itemProbabilities;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
// compute item sampling probability
DefaultScalar sum = DefaultScalar.getInstance();
sum.setValue(0F);
itemProbabilities = DenseVector.valueOf(itemSize);
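// store the cumulative (prefix-sum) popularity of the items, so that an item can
// later be drawn with probability proportional to its popularity by binary search.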
itemProbabilities.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int index = scalar.getIndex();
float count = scoreMatrix.getColumnScope(index);
// sample items based on popularity
float value = count / actionSize;
sum.shiftValue(value);
scalar.setValue(sum.getValue());
});
}
@Override
protected void doPractice() {
List<IntSet> userItemSet = getUserItemSet(scoreMatrix);
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
totalError = 0F;
// for each rated user-item (u,i) pair
for (MatrixScalar term : scoreMatrix) {
int userIndex = term.getRow();
IntSet itemSet = userItemSet.get(userIndex);
int positiveItemIndex = term.getColumn();
float positiveScore = term.getValue();
int negativeItemIndex = -1;
do {
// draw an item j with probability proportional to
// popularity
negativeItemIndex = SampleUtility.binarySearch(itemProbabilities, 0, itemProbabilities.getElementSize() - 1, RandomUtility.randomFloat(itemProbabilities.getValue(itemProbabilities.getElementSize() - 1)));
// ensure that it is unrated by user u
} while (itemSet.contains(negativeItemIndex));
float negativeScore = 0F;
// compute predictions
float error = (predict(userIndex, positiveItemIndex) - predict(userIndex, negativeItemIndex)) - (positiveScore - negativeScore);
totalError += error * error;
// update vectors
float value = learnRatio * error;
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float userFactor = userFactors.getValue(userIndex, factorIndex);
float positiveItemFactor = itemFactors.getValue(positiveItemIndex, factorIndex);
float negativeItemFactor = itemFactors.getValue(negativeItemIndex, factorIndex);
userFactors.shiftValue(userIndex, factorIndex, -value * (positiveItemFactor - negativeItemFactor));
itemFactors.shiftValue(positiveItemIndex, factorIndex, -value * userFactor);
itemFactors.shiftValue(negativeItemIndex, factorIndex, value * userFactor);
}
}
totalError *= 0.5D;
if (isConverged(epocheIndex) && isConverged) {
break;
}
isLearned(epocheIndex);
currentError = totalError;
}
}
}

View File

@@ -0,0 +1,325 @@
package com.jstarcraft.rns.model.collaborative.ranking;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.matrix.HashMatrix;
import com.jstarcraft.ai.math.structure.matrix.MatrixScalar;
import com.jstarcraft.ai.math.structure.matrix.SparseMatrix;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.ai.math.structure.vector.VectorScalar;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.rns.model.MatrixFactorizationModel;
import it.unimi.dsi.fastutil.longs.Long2FloatRBTreeMap;
/**
*
* Rank VFCD Recommender
*
* <pre>
* </pre>
*
* @author Birdy
*
*/
public class RankVFCDModel extends MatrixFactorizationModel {
/**
* two low-rank item matrices; the item-item similarity is learned as the
* product of these two matrices
*/
private DenseMatrix userFactors, explicitItemFactors;
private float alpha, beta, gamma, lamutaE;
private SparseMatrix featureMatrix;
private DenseVector featureVector;
private int numberOfFeatures;
private DenseMatrix featureFactors, implicitItemFactors, factorMatrix;
private SparseMatrix relationMatrix;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
// TODO This code can be removed (replace with a constant marker, or use binarize.threshold)
for (MatrixScalar term : scoreMatrix) {
term.setValue(1F);
}
alpha = configuration.getFloat("recommender.rankvfcd.alpha", 5F);
beta = configuration.getFloat("recommender.rankvfcd.beta", 10F);
gamma = configuration.getFloat("recommender.rankvfcd.gamma", 50F);
lamutaE = configuration.getFloat("recommender.rankvfcd.lamutaE", 50F);
numberOfFeatures = 4096;
userFactors = DenseMatrix.valueOf(userSize, factorSize);
userFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
explicitItemFactors = DenseMatrix.valueOf(itemSize, factorSize);
explicitItemFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
implicitItemFactors = DenseMatrix.valueOf(itemSize, factorSize);
implicitItemFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
featureFactors = DenseMatrix.valueOf(numberOfFeatures, factorSize);
featureFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
// relation matrix
DataModule relationModel = space.getModule("relation");
// TODO Needs refactoring: leftDimension and rightDimension should be configurable
String leftField = configuration.getString("data.model.fields.left");
String rightField = configuration.getString("data.model.fields.right");
String coefficientField = configuration.getString("data.model.fields.coefficient");
int leftDimension = 0;
int rightDimension = 1;
int coefficientDimension = relationModel.getQuantityInner(coefficientField);
HashMatrix relationTable = new HashMatrix(true, itemSize, itemSize, new Long2FloatRBTreeMap());
for (DataInstance instance : relationModel) {
int itemIndex = instance.getQualityFeature(leftDimension);
int neighborIndex = instance.getQualityFeature(rightDimension);
relationTable.setValue(itemIndex, neighborIndex, instance.getQuantityFeature(coefficientDimension));
}
relationMatrix = SparseMatrix.valueOf(itemSize, itemSize, relationTable);
relationTable = null;
// feature matrix
float minimumValue = Float.MAX_VALUE;
float maximumValue = -Float.MAX_VALUE; // Float.MIN_VALUE is the smallest positive float, not the most negative
HashMatrix visualTable = new HashMatrix(true, numberOfFeatures, itemSize, new Long2FloatRBTreeMap());
DataModule featureModel = space.getModule("article");
String articleField = configuration.getString("data.model.fields.article");
String featureField = configuration.getString("data.model.fields.feature");
String degreeField = configuration.getString("data.model.fields.degree");
int articleDimension = featureModel.getQualityInner(articleField);
int featureDimension = featureModel.getQualityInner(featureField);
int degreeDimension = featureModel.getQuantityInner(degreeField);
for (DataInstance instance : featureModel) {
int itemIndex = instance.getQualityFeature(articleDimension);
int featureIndex = instance.getQualityFeature(featureDimension);
float featureValue = instance.getQuantityFeature(degreeDimension);
if (featureValue < minimumValue) {
minimumValue = featureValue;
}
if (featureValue > maximumValue) {
maximumValue = featureValue;
}
visualTable.setValue(featureIndex, itemIndex, featureValue);
}
featureMatrix = SparseMatrix.valueOf(numberOfFeatures, itemSize, visualTable);
visualTable = null;
float maximum = maximumValue;
float minimum = minimumValue;
featureMatrix.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue((scalar.getValue() - minimum) / (maximum - minimum));
});
factorMatrix = DenseMatrix.valueOf(factorSize, itemSize);
featureVector = DenseVector.valueOf(numberOfFeatures);
for (MatrixScalar term : featureMatrix) {
int featureIndex = term.getRow();
float value = featureVector.getValue(featureIndex) + term.getValue() * term.getValue();
featureVector.setValue(featureIndex, value);
}
}
@Override
protected void doPractice() {
DefaultScalar scalar = DefaultScalar.getInstance();
// Init caches
float[] prediction_users = new float[userSize];
float[] prediction_items = new float[itemSize];
float[] prediction_itemrelated = new float[itemSize];
float[] prediction_relateditem = new float[itemSize];
float[] w_users = new float[userSize];
float[] w_items = new float[itemSize];
float[] q_itemrelated = new float[itemSize];
float[] q_relateditem = new float[itemSize];
DenseMatrix explicitItemDeltas = DenseMatrix.valueOf(factorSize, factorSize);
DenseMatrix implicitItemDeltas = DenseMatrix.valueOf(factorSize, factorSize);
DenseMatrix userDeltas = DenseMatrix.valueOf(factorSize, factorSize);
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
// Update the Sq cache
explicitItemDeltas.dotProduct(explicitItemFactors, true, explicitItemFactors, false, MathCalculator.SERIAL);
// Step 1: update user factors;
for (int userIndex = 0; userIndex < userSize; userIndex++) {
SparseVector userVector = scoreMatrix.getRowVector(userIndex);
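// cache the current predictions and confidence weights (w = 1 + alpha * r)
// for the items rated by this user.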
for (VectorScalar term : userVector) {
int itemIndex = term.getIndex();
prediction_items[itemIndex] = scalar.dotProduct(userFactors.getRowVector(userIndex), explicitItemFactors.getRowVector(itemIndex)).getValue();
w_items[itemIndex] = 1F + alpha * term.getValue();
}
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float numerator = 0F, denominator = userRegularization + explicitItemDeltas.getValue(factorIndex, factorIndex);
// TODO This could be changed to subtraction
for (int k = 0; k < factorSize; k++) {
if (factorIndex != k) {
numerator -= userFactors.getValue(userIndex, k) * explicitItemDeltas.getValue(factorIndex, k);
}
}
float userFactor = userFactors.getValue(userIndex, factorIndex);
for (VectorScalar entry : userVector) {
int i = entry.getIndex();
float qif = explicitItemFactors.getValue(i, factorIndex);
prediction_items[i] -= userFactor * qif;
numerator += (w_items[i] - (w_items[i] - 1) * prediction_items[i]) * qif;
denominator += (w_items[i] - 1) * qif * qif;
}
// update puf
userFactor = numerator / denominator;
userFactors.setValue(userIndex, factorIndex, userFactor);
for (VectorScalar term : userVector) {
int itemIndex = term.getIndex();
prediction_items[itemIndex] += userFactor * explicitItemFactors.getValue(itemIndex, factorIndex);
}
}
}
// Update the Sp cache
userDeltas.dotProduct(userFactors, true, userFactors, false, MathCalculator.SERIAL);
implicitItemDeltas.dotProduct(implicitItemFactors, true, implicitItemFactors, false, MathCalculator.SERIAL);
DenseMatrix ETF = factorMatrix;
ETF.dotProduct(featureFactors, true, featureMatrix, false, MathCalculator.PARALLEL);
// Step 2: update item factors;
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
SparseVector itemVector = scoreMatrix.getColumnVector(itemIndex);
SparseVector relationVector = relationMatrix.getRowVector(itemIndex);
for (VectorScalar term : itemVector) {
int userIndex = term.getIndex();
prediction_users[userIndex] = scalar.dotProduct(userFactors.getRowVector(userIndex), explicitItemFactors.getRowVector(itemIndex)).getValue();
w_users[userIndex] = 1F + alpha * term.getValue();
}
for (VectorScalar term : relationVector) {
int neighborIndex = term.getIndex();
prediction_itemrelated[neighborIndex] = scalar.dotProduct(explicitItemFactors.getRowVector(itemIndex), implicitItemFactors.getRowVector(neighborIndex)).getValue();
q_itemrelated[neighborIndex] = 1F + alpha * term.getValue();
}
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float explicitNumerator = 0F, explicitDenominator = userDeltas.getValue(factorIndex, factorIndex) + itemRegularization;
float implicitNumerator = 0F, implicitDenominator = implicitItemDeltas.getValue(factorIndex, factorIndex);
// TODO This could be changed to subtraction
for (int k = 0; k < factorSize; k++) {
if (factorIndex != k) {
explicitNumerator -= explicitItemFactors.getValue(itemIndex, k) * userDeltas.getValue(k, factorIndex);
implicitNumerator -= explicitItemFactors.getValue(itemIndex, k) * implicitItemDeltas.getValue(k, factorIndex);
}
}
float explicitItemFactor = explicitItemFactors.getValue(itemIndex, factorIndex);
for (VectorScalar term : itemVector) {
int userIndex = term.getIndex();
float userFactor = userFactors.getValue(userIndex, factorIndex);
prediction_users[userIndex] -= userFactor * explicitItemFactor;
explicitNumerator += (w_users[userIndex] - (w_users[userIndex] - 1) * prediction_users[userIndex]) * userFactor;
explicitDenominator += (w_users[userIndex] - 1) * userFactor * userFactor;
}
for (VectorScalar term : relationVector) {
int neighborIndex = term.getIndex();
float implicitItemFactor = implicitItemFactors.getValue(neighborIndex, factorIndex);
prediction_itemrelated[neighborIndex] -= implicitItemFactor * explicitItemFactor;
implicitNumerator += (q_itemrelated[neighborIndex] - (q_itemrelated[neighborIndex] - 1) * prediction_itemrelated[neighborIndex]) * implicitItemFactor;
implicitDenominator += (q_itemrelated[neighborIndex] - 1) * implicitItemFactor * implicitItemFactor;
}
// update qif
explicitItemFactor = (explicitNumerator + implicitNumerator * beta + gamma * ETF.getValue(factorIndex, itemIndex)) / (explicitDenominator + implicitDenominator * beta + gamma);
explicitItemFactors.setValue(itemIndex, factorIndex, explicitItemFactor);
for (VectorScalar term : itemVector) {
int userIndex = term.getIndex();
prediction_users[userIndex] += userFactors.getValue(userIndex, factorIndex) * explicitItemFactor;
}
for (VectorScalar term : relationVector) {
int neighborIndex = term.getIndex();
prediction_itemrelated[neighborIndex] += implicitItemFactors.getValue(neighborIndex, factorIndex) * explicitItemFactor;
}
}
}
explicitItemDeltas.dotProduct(explicitItemFactors, true, explicitItemFactors, false, MathCalculator.SERIAL);
// Step 1: update Z factors;
for (int neighborIndex = 0; neighborIndex < itemSize; neighborIndex++) {
SparseVector relationVector = relationMatrix.getColumnVector(neighborIndex);
for (VectorScalar term : relationVector) {
int itemIndex = term.getIndex();
prediction_relateditem[itemIndex] = scalar.dotProduct(explicitItemFactors.getRowVector(itemIndex), implicitItemFactors.getRowVector(neighborIndex)).getValue();
q_relateditem[itemIndex] = 1F + alpha * term.getValue();
}
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float numerator = 0F, denominator = explicitItemDeltas.getValue(factorIndex, factorIndex);
// TODO This could be changed to subtraction
for (int k = 0; k < factorSize; k++) {
if (factorIndex != k) {
numerator -= implicitItemFactors.getValue(neighborIndex, k) * explicitItemDeltas.getValue(factorIndex, k);
}
}
float implicitItemFactor = implicitItemFactors.getValue(neighborIndex, factorIndex);
for (VectorScalar term : relationVector) {
int itemIndex = term.getIndex();
float explicitItemFactor = explicitItemFactors.getValue(itemIndex, factorIndex);
prediction_relateditem[itemIndex] -= implicitItemFactor * explicitItemFactor;
numerator += (q_relateditem[itemIndex] - (q_relateditem[itemIndex] - 1) * prediction_relateditem[itemIndex]) * explicitItemFactor;
denominator += (q_relateditem[itemIndex] - 1) * explicitItemFactor * explicitItemFactor;
}
// update puf
implicitItemFactor = beta * numerator / (beta * denominator + itemRegularization);
implicitItemFactors.setValue(neighborIndex, factorIndex, implicitItemFactor);
for (VectorScalar term : relationVector) {
int itemIndex = term.getIndex();
prediction_relateditem[itemIndex] += implicitItemFactor * explicitItemFactors.getValue(itemIndex, factorIndex);
}
}
}
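// update the feature factors: remove each entry's contribution from the ETF
// cache, solve its one-dimensional least-squares update, then restore the
// cache with the new value.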
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
for (int featureIndex = 0; featureIndex < numberOfFeatures; featureIndex++) {
SparseVector featureVector = featureMatrix.getRowVector(featureIndex);
float numerator = 0F, denominator = featureFactors.getValue(featureIndex, factorIndex);
for (VectorScalar term : featureVector) {
float featureValue = term.getValue();
int itemIndex = term.getIndex();
ETF.setValue(factorIndex, itemIndex, ETF.getValue(factorIndex, itemIndex) - denominator * featureValue);
numerator += (explicitItemFactors.getValue(itemIndex, factorIndex) - ETF.getValue(factorIndex, itemIndex)) * featureValue;
}
denominator = numerator * gamma / (gamma * this.featureVector.getValue(featureIndex) + lamutaE);
featureFactors.setValue(featureIndex, factorIndex, denominator);
for (VectorScalar term : featureVector) {
float featureValue = term.getValue();
int itemIndex = term.getIndex();
ETF.setValue(factorIndex, itemIndex, ETF.getValue(factorIndex, itemIndex) + denominator * featureValue);
}
}
}
if (isConverged(epocheIndex) && isConverged) {
break;
}
currentError = totalError;
// TODO There is currently no totalLoss.
}
factorMatrix.dotProduct(featureFactors, true, featureMatrix, false, MathCalculator.PARALLEL);
}
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
DefaultScalar scalar = DefaultScalar.getInstance();
float score = 0F;
if (scoreMatrix.getColumnVector(itemIndex).getElementSize() == 0) {
score = scalar.dotProduct(userFactors.getRowVector(userIndex), factorMatrix.getColumnVector(itemIndex)).getValue();
} else {
score = scalar.dotProduct(userFactors.getRowVector(userIndex), explicitItemFactors.getRowVector(itemIndex)).getValue();
}
instance.setQuantityMark(score);
}
}

@@ -0,0 +1,301 @@
package com.jstarcraft.rns.model.collaborative.ranking;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.TreeSet;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.algorithm.correlation.MathCorrelation;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.matrix.MatrixScalar;
import com.jstarcraft.ai.math.structure.matrix.SymmetryMatrix;
import com.jstarcraft.ai.math.structure.vector.ArrayVector;
import com.jstarcraft.ai.math.structure.vector.VectorScalar;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.common.reflection.ReflectionUtility;
import com.jstarcraft.core.utility.Integer2FloatKeyValue;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.rns.model.EpocheModel;
import com.jstarcraft.rns.model.exception.ModelException;
import it.unimi.dsi.fastutil.ints.Int2ObjectMap;
import it.unimi.dsi.fastutil.ints.Int2ObjectOpenHashMap;
/**
*
* SLIM Recommender
*
* <pre>
* SLIM: Sparse Linear Methods for Top-N Recommender Systems
* Reference: LibRec team
* </pre>
*
* @author Birdy
*
*/
public class SLIMModel extends EpocheModel {
/**
* W in original paper, a sparse matrix of aggregation coefficients
*/
// TODO Consider changing to a symmetric matrix?
private DenseMatrix coefficientMatrix;
/**
* item's nearest neighbors for kNN > 0
*/
private int[][] itemNeighbors;
/**
* regularization parameters for the L1 or L2 term
*/
private float regL1Norm, regL2Norm;
/**
* number of nearest neighbors
*/
private int neighborSize;
/**
* item similarity matrix
*/
private SymmetryMatrix symmetryMatrix;
private ArrayVector[] userVectors;
private ArrayVector[] itemVectors;
private Comparator<Integer2FloatKeyValue> comparator = new Comparator<Integer2FloatKeyValue>() {
@Override
public int compare(Integer2FloatKeyValue left, Integer2FloatKeyValue right) {
int compare = -(Float.compare(left.getValue(), right.getValue()));
if (compare == 0) {
compare = Integer.compare(left.getKey(), right.getKey());
}
return compare;
}
};
/**
* initialization
*
* @throws ModelException if error occurs
*/
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
neighborSize = configuration.getInteger("recommender.neighbors.knn.number", 50);
regL1Norm = configuration.getFloat("recommender.slim.regularization.l1", 1.0F);
regL2Norm = configuration.getFloat("recommender.slim.regularization.l2", 1.0F);
// TODO Consider refactoring
coefficientMatrix = DenseMatrix.valueOf(itemSize, itemSize);
coefficientMatrix.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(RandomUtility.randomFloat(1F));
});
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
coefficientMatrix.setValue(itemIndex, itemIndex, 0F);
}
// initial guesses: make smaller guesses (e.g., W.init(0.01)) to speed
// up training
// TODO Change to a configurable enum
try {
Class<MathCorrelation> correlationClass = (Class<MathCorrelation>) Class.forName(configuration.getString("recommender.correlation.class"));
MathCorrelation correlation = ReflectionUtility.getInstance(correlationClass);
symmetryMatrix = new SymmetryMatrix(scoreMatrix.getColumnSize());
correlation.calculateCoefficients(scoreMatrix, true, symmetryMatrix::setValue);
} catch (Exception exception) {
throw new RuntimeException(exception);
}
// TODO Set the capacity
itemNeighbors = new int[itemSize][];
Int2ObjectMap<TreeSet<Integer2FloatKeyValue>> itemNNs = new Int2ObjectOpenHashMap<>();
for (MatrixScalar term : symmetryMatrix) {
int row = term.getRow();
int column = term.getColumn();
if (row == column) {
continue;
}
float value = term.getValue();
// ignore items whose similarity is 0
if (value == 0F) {
continue;
}
TreeSet<Integer2FloatKeyValue> neighbors = itemNNs.get(row);
if (neighbors == null) {
neighbors = new TreeSet<>(comparator);
itemNNs.put(row, neighbors);
}
neighbors.add(new Integer2FloatKeyValue(column, value));
neighbors = itemNNs.get(column);
if (neighbors == null) {
neighbors = new TreeSet<>(comparator);
itemNNs.put(column, neighbors);
}
neighbors.add(new Integer2FloatKeyValue(row, value));
}
// build the item-to-neighbors mapping
for (Int2ObjectMap.Entry<TreeSet<Integer2FloatKeyValue>> term : itemNNs.int2ObjectEntrySet()) {
TreeSet<Integer2FloatKeyValue> neighbors = term.getValue();
int[] value = new int[neighbors.size() < neighborSize ? neighbors.size() : neighborSize];
int index = 0;
for (Integer2FloatKeyValue neighbor : neighbors) {
value[index++] = neighbor.getKey();
if (index >= neighborSize) {
break;
}
}
Arrays.sort(value);
itemNeighbors[term.getIntKey()] = value;
}
userVectors = new ArrayVector[userSize];
for (int userIndex = 0; userIndex < userSize; userIndex++) {
userVectors[userIndex] = new ArrayVector(scoreMatrix.getRowVector(userIndex));
}
itemVectors = new ArrayVector[itemSize];
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
itemVectors[itemIndex] = new ArrayVector(scoreMatrix.getColumnVector(itemIndex));
}
}
/**
* train model
*
* @throws ModelException if error occurs
*/
@Override
protected void doPractice() {
float[] scores = new float[userSize];
// number of iteration cycles
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
totalError = 0F;
// each cycle iterates through one coordinate direction
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
int[] neighborIndexes = itemNeighbors[itemIndex];
if (neighborIndexes == null) {
continue;
}
ArrayVector itemVector = itemVectors[itemIndex];
for (VectorScalar term : itemVector) {
scores[term.getIndex()] = term.getValue();
}
// for each nearest neighbor, update coefficientMatrix by the
// coordinate descent update rule
for (int neighborIndex : neighborIndexes) {
itemVector = itemVectors[neighborIndex];
float valueSum = 0F, rateSum = 0F, errorSum = 0F;
int count = itemVector.getElementSize();
for (VectorScalar term : itemVector) {
int userIndex = term.getIndex();
float neighborScore = term.getValue();
float userScore = scores[userIndex];
float error = userScore - predict(userIndex, itemIndex, neighborIndexes, neighborIndex);
valueSum += neighborScore * error;
rateSum += neighborScore * neighborScore;
errorSum += error * error;
}
valueSum /= count;
rateSum /= count;
errorSum /= count;
// TODO 此处考虑重构
float coefficient = coefficientMatrix.getValue(neighborIndex, itemIndex);
totalError += errorSum + 0.5F * regL2Norm * coefficient * coefficient + regL1Norm * coefficient;
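// soft-thresholding (elastic net): coefficients whose accumulated value does
// not exceed the L1 penalty are set exactly to zero, the rest are shrunk toward zero.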
if (regL1Norm < Math.abs(valueSum)) {
if (valueSum > 0) {
coefficient = (valueSum - regL1Norm) / (regL2Norm + rateSum);
} else {
// One doubt: in this case, wij < 0; however, the
// paper says wij >= 0. How to guarantee that?
coefficient = (valueSum + regL1Norm) / (regL2Norm + rateSum);
}
} else {
coefficient = 0F;
}
coefficientMatrix.setValue(neighborIndex, itemIndex, coefficient);
}
itemVector = itemVectors[itemIndex];
for (VectorScalar term : itemVector) {
scores[term.getIndex()] = 0F;
}
}
if (isConverged(epocheIndex) && isConverged) {
break;
}
currentError = totalError;
}
}
/**
* predict a specific ranking score for user userIdx on item itemIdx.
*
* @param userIndex    user index
* @param itemIndex    item index
* @param neighbors    the item's nearest-neighbor indexes
* @param currentIndex neighbor index to exclude from the prediction
* @return a prediction without the contribution of the excluded neighbor
*/
private float predict(int userIndex, int itemIndex, int[] neighbors, int currentIndex) {
float value = 0F;
ArrayVector userVector = userVectors[userIndex];
if (userVector.getElementSize() == 0) {
return value;
}
int leftCursor = 0, rightCursor = 0, leftSize = userVector.getElementSize(), rightSize = neighbors.length;
Iterator<VectorScalar> iterator = userVector.iterator();
VectorScalar term = iterator.next();
// merge-join the two sorted index arrays to find common indexes
while (leftCursor < leftSize && rightCursor < rightSize) {
if (term.getIndex() == neighbors[rightCursor]) {
if (neighbors[rightCursor] != currentIndex) {
value += term.getValue() * coefficientMatrix.getValue(neighbors[rightCursor], itemIndex);
}
if (iterator.hasNext()) {
term = iterator.next();
}
leftCursor++;
rightCursor++;
} else if (term.getIndex() > neighbors[rightCursor]) {
rightCursor++;
} else if (term.getIndex() < neighbors[rightCursor]) {
if (iterator.hasNext()) {
term = iterator.next();
}
leftCursor++;
}
}
return value;
}
/**
* predict a specific ranking score for user userIdx on item itemIdx.
*
* @param userIndex user index
* @param itemIndex item index
* @return predictive ranking score for user userIdx on item itemIdx
* @throws ModelException if error occurs
*/
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
int[] neighbors = itemNeighbors[itemIndex];
if (neighbors == null) {
instance.setQuantityMark(0F);
return;
}
instance.setQuantityMark(predict(userIndex, itemIndex, neighbors, -1));
}
}

@@ -0,0 +1,76 @@
package com.jstarcraft.rns.model.collaborative.ranking;
import java.util.Iterator;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.math.structure.vector.MathVector;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.ai.math.structure.vector.VectorScalar;
import com.jstarcraft.rns.model.collaborative.UserKNNModel;
/**
*
* User KNN Recommender
*
* <pre>
* Reference: LibRec team
* </pre>
*
* @author Birdy
*
*/
public class UserKNNRankingModel extends UserKNNModel {
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
SparseVector itemVector = itemVectors[itemIndex];
MathVector neighbors = userNeighbors[userIndex];
if (itemVector.getElementSize() == 0 || neighbors.getElementSize() == 0) {
instance.setQuantityMark(0F);
return;
}
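// the ranking score is the sum of the similarities of those neighbors of the
// user who have rated this item.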
float sum = 0F, absolute = 0F;
int count = 0;
int leftCursor = 0, rightCursor = 0, leftSize = itemVector.getElementSize(), rightSize = neighbors.getElementSize();
Iterator<VectorScalar> leftIterator = itemVector.iterator();
VectorScalar leftTerm = leftIterator.next();
Iterator<VectorScalar> rightIterator = neighbors.iterator();
VectorScalar rightTerm = rightIterator.next();
// merge-join the two sorted vectors to find the neighbors who rated this item
while (leftCursor < leftSize && rightCursor < rightSize) {
if (leftTerm.getIndex() == rightTerm.getIndex()) {
count++;
sum += rightTerm.getValue();
if (leftIterator.hasNext()) {
leftTerm = leftIterator.next();
}
if (rightIterator.hasNext()) {
rightTerm = rightIterator.next();
}
leftCursor++;
rightCursor++;
} else if (leftTerm.getIndex() > rightTerm.getIndex()) {
if (rightIterator.hasNext()) {
rightTerm = rightIterator.next();
}
rightCursor++;
} else if (leftTerm.getIndex() < rightTerm.getIndex()) {
if (leftIterator.hasNext()) {
leftTerm = leftIterator.next();
}
leftCursor++;
}
}
if (count == 0) {
instance.setQuantityMark(0F);
return;
}
instance.setQuantityMark(sum);
}
}

@@ -0,0 +1,270 @@
package com.jstarcraft.rns.model.collaborative.ranking;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.matrix.HashMatrix;
import com.jstarcraft.ai.math.structure.matrix.MatrixScalar;
import com.jstarcraft.ai.math.structure.vector.ArrayVector;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.ai.math.structure.vector.MathVector;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.ai.math.structure.vector.VectorScalar;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.rns.model.MatrixFactorizationModel;
import com.jstarcraft.rns.utility.LogisticUtility;
import it.unimi.dsi.fastutil.longs.Long2FloatRBTreeMap;
/**
*
* VBPR Recommender
*
* <pre>
* VBPR: Visual Bayesian Personalized Ranking from Implicit Feedback
* Reference: LibRec team
* </pre>
*
* @author Birdy
*
*/
public class VBPRModel extends MatrixFactorizationModel {
/**
* items biases
*/
private DenseVector itemBiases;
private float biasRegularization;
private double featureRegularization;
private int numberOfFeatures;
private DenseMatrix userFeatures;
private DenseVector itemFeatures;
private DenseMatrix featureFactors;
private HashMatrix featureTable;
private DenseMatrix factorMatrix;
private DenseVector featureVector;
/** sampling ratio */
private int sampleRatio;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
// TODO This code can be removed (replace with a constant marker, or use binarize.threshold)
for (MatrixScalar term : scoreMatrix) {
term.setValue(1F);
}
biasRegularization = configuration.getFloat("recommender.bias.regularization", 0.1F);
// TODO This should come from configuration or be computed dynamically.
numberOfFeatures = 4096;
featureRegularization = 1000;
sampleRatio = configuration.getInteger("recommender.vbpr.alpha", 5);
itemBiases = DenseVector.valueOf(itemSize);
itemBiases.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
itemFeatures = DenseVector.valueOf(numberOfFeatures);
itemFeatures.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
userFeatures = DenseMatrix.valueOf(userSize, factorSize);
userFeatures.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
featureFactors = DenseMatrix.valueOf(factorSize, numberOfFeatures);
featureFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
float minimumValue = Float.MAX_VALUE;
float maximumValue = -Float.MAX_VALUE; // Float.MIN_VALUE is the smallest positive float, not the most negative
featureTable = new HashMatrix(true, itemSize, numberOfFeatures, new Long2FloatRBTreeMap());
DataModule featureModel = space.getModule("article");
String articleField = configuration.getString("data.model.fields.article");
String featureField = configuration.getString("data.model.fields.feature");
String degreeField = configuration.getString("data.model.fields.degree");
int articleDimension = featureModel.getQualityInner(articleField);
int featureDimension = featureModel.getQualityInner(featureField);
int degreeDimension = featureModel.getQuantityInner(degreeField);
for (DataInstance instance : featureModel) {
int itemIndex = instance.getQualityFeature(articleDimension);
int featureIndex = instance.getQualityFeature(featureDimension);
float featureValue = instance.getQuantityFeature(degreeDimension);
if (featureValue < minimumValue) {
minimumValue = featureValue;
}
if (featureValue > maximumValue) {
maximumValue = featureValue;
}
featureTable.setValue(itemIndex, featureIndex, featureValue);
}
for (MatrixScalar cell : featureTable) {
float value = (cell.getValue() - minimumValue) / (maximumValue - minimumValue);
featureTable.setValue(cell.getRow(), cell.getColumn(), value);
}
factorMatrix = DenseMatrix.valueOf(factorSize, itemSize);
factorMatrix.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
}
@Override
protected void doPractice() {
DefaultScalar scalar = DefaultScalar.getInstance();
DenseVector factorVector = DenseVector.valueOf(featureFactors.getRowSize());
ArrayVector[] featureVectors = new ArrayVector[itemSize];
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
MathVector keyValues = featureTable.getRowVector(itemIndex);
int[] featureIndexes = new int[keyValues.getElementSize()];
float[] featureValues = new float[keyValues.getElementSize()];
int position = 0;
for (VectorScalar keyValue : keyValues) {
featureIndexes[position] = keyValue.getIndex();
featureValues[position] = keyValue.getValue();
position++;
}
featureVectors[itemIndex] = new ArrayVector(numberOfFeatures, featureIndexes, featureValues);
}
float[] featureValues = new float[numberOfFeatures];
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
totalError = 0F;
for (int sampleIndex = 0, numberOfSamples = userSize * sampleRatio; sampleIndex < numberOfSamples; sampleIndex++) {
// randomly draw (u, i, j)
int userKey, positiveItemKey, negativeItemKey;
while (true) {
userKey = RandomUtility.randomInteger(userSize);
SparseVector userVector = scoreMatrix.getRowVector(userKey);
if (userVector.getElementSize() == 0) {
continue;
}
positiveItemKey = userVector.getIndex(RandomUtility.randomInteger(userVector.getElementSize()));
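// draw the negative item uniformly from the user's unrated items: sample an
// offset in [0, itemSize - |I_u|) and shift it past every rated index.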
negativeItemKey = RandomUtility.randomInteger(itemSize - userVector.getElementSize());
for (VectorScalar term : userVector) {
if (negativeItemKey >= term.getIndex()) {
negativeItemKey++;
} else {
break;
}
}
break;
}
int userIndex = userKey, positiveItemIndex = positiveItemKey, negativeItemIndex = negativeItemKey;
ArrayVector positiveItemVector = featureVectors[positiveItemIndex];
ArrayVector negativeItemVector = featureVectors[negativeItemIndex];
// update parameters
float positiveScore = predict(userIndex, positiveItemIndex, scalar.dotProduct(itemFeatures, positiveItemVector).getValue(), factorVector.dotProduct(featureFactors, false, positiveItemVector, MathCalculator.SERIAL));
float negativeScore = predict(userIndex, negativeItemIndex, scalar.dotProduct(itemFeatures, negativeItemVector).getValue(), factorVector.dotProduct(featureFactors, false, negativeItemVector, MathCalculator.SERIAL));
float error = LogisticUtility.getValue(positiveScore - negativeScore);
totalError += (float) -Math.log(error);
// update bias
float positiveBias = itemBiases.getValue(positiveItemIndex), negativeBias = itemBiases.getValue(negativeItemIndex);
itemBiases.shiftValue(positiveItemIndex, learnRatio * (error - biasRegularization * positiveBias));
itemBiases.shiftValue(negativeItemIndex, learnRatio * (-error - biasRegularization * negativeBias));
totalError += biasRegularization * positiveBias * positiveBias + biasRegularization * negativeBias * negativeBias;
for (VectorScalar term : positiveItemVector) {
featureValues[term.getIndex()] = term.getValue();
}
for (VectorScalar term : negativeItemVector) {
featureValues[term.getIndex()] -= term.getValue();
}
// update user/item vectors
// Split the task by factor to enable concurrent computation.
// CountDownLatch factorLatch = new
// CountDownLatch(numberOfFactors);
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float userFactor = userFactors.getValue(userIndex, factorIndex);
float positiveItemFactor = itemFactors.getValue(positiveItemIndex, factorIndex);
float negativeItemFactor = itemFactors.getValue(negativeItemIndex, factorIndex);
userFactors.shiftValue(userIndex, factorIndex, learnRatio * (error * (positiveItemFactor - negativeItemFactor) - userRegularization * userFactor));
itemFactors.shiftValue(positiveItemIndex, factorIndex, learnRatio * (error * (userFactor) - itemRegularization * positiveItemFactor));
itemFactors.shiftValue(negativeItemIndex, factorIndex, learnRatio * (error * (-userFactor) - itemRegularization * negativeItemFactor));
totalError += userRegularization * userFactor * userFactor + itemRegularization * positiveItemFactor * positiveItemFactor + itemRegularization * negativeItemFactor * negativeItemFactor;
float userFeature = userFeatures.getValue(userIndex, factorIndex);
DenseVector featureVector = featureFactors.getRowVector(factorIndex);
userFeatures.shiftValue(userIndex, factorIndex, learnRatio * (error * (scalar.dotProduct(featureVector, positiveItemVector).getValue() - scalar.dotProduct(featureVector, negativeItemVector).getValue()) - userRegularization * userFeature));
totalError += userRegularization * userFeature * userFeature;
featureVector.iterateElement(MathCalculator.SERIAL, (element) -> {
int index = element.getIndex();
float value = element.getValue();
totalError += featureRegularization * value * value;
value += learnRatio * (error * userFeature * featureValues[index] - featureRegularization * value);
element.setValue(value);
});
}
// Split the task by feature to enable concurrent computation.
itemFeatures.iterateElement(MathCalculator.SERIAL, (element) -> {
int index = element.getIndex();
float value = element.getValue();
totalError += featureRegularization * value * value;
value += learnRatio * (error * featureValues[index] - featureRegularization * value); // include the loss gradient (error), consistent with the other updates
element.setValue(value);
});
// try {
// factorLatch.await();
// } catch (Exception exception) {
// throw new LibrecException(exception);
// }
// reset the scratch array so that the next sample starts from zeros again
for (VectorScalar term : positiveItemVector) {
featureValues[term.getIndex()] = 0F;
}
for (VectorScalar term : negativeItemVector) {
featureValues[term.getIndex()] = 0F;
}
}
if (isConverged(epocheIndex) && isConverged) {
break;
}
isLearned(epocheIndex);
currentError = totalError;
}
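// pre-compute the per-item visual terms used at prediction time: factorMatrix
// caches featureFactors * x_i for every item, and featureVector caches each
// item's visual bias itemFeatures . x_i.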
factorMatrix.iterateElement(MathCalculator.PARALLEL, (element) -> {
int row = element.getRow();
int column = element.getColumn();
ArrayVector vector = featureVectors[column];
float value = 0;
for (VectorScalar entry : vector) {
value += featureFactors.getValue(row, entry.getIndex()) * entry.getValue();
}
element.setValue(value);
});
featureVector = DenseVector.valueOf(itemSize);
featureVector.iterateElement(MathCalculator.SERIAL, (element) -> {
element.dotProduct(itemFeatures, featureVectors[element.getIndex()]).getValue();
});
}
private float predict(int userIndex, int itemIndex, float itemFeature, MathVector factorVector) {
DefaultScalar scalar = DefaultScalar.getInstance();
scalar.setValue(0F);
scalar.shiftValue(itemBiases.getValue(itemIndex) + itemFeature);
scalar.accumulateProduct(userFactors.getRowVector(userIndex), itemFactors.getRowVector(itemIndex));
scalar.accumulateProduct(userFeatures.getRowVector(userIndex), factorVector);
return scalar.getValue();
}
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
instance.setQuantityMark(predict(userIndex, itemIndex, featureVector.getValue(itemIndex), factorMatrix.getColumnVector(itemIndex)));
}
}

@@ -0,0 +1,112 @@
package com.jstarcraft.rns.model.collaborative.ranking;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.rns.model.MatrixFactorizationModel;
import com.jstarcraft.rns.utility.LogisticUtility;
/**
*
* WARP Recommender
*
* <pre>
* Reference: LibRec team
* </pre>
*
* @author Birdy
*
*/
public class WARPMFModel extends MatrixFactorizationModel {
private int lossType;
private float epsilon;
private float[] orderLosses;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
lossType = configuration.getInteger("losstype", 3);
epsilon = configuration.getFloat("epsilon");
orderLosses = new float[itemSize - 1];
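// pre-compute the normalized rank weights: orderLosses[k] is the harmonic
// partial sum 1 + 1/2 + ... + 1/(k + 1), normalized by the full sum; WARP
// weights an error by the estimated rank of the positive item.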
float orderLoss = 0F;
for (int orderIndex = 1; orderIndex < itemSize; orderIndex++) {
orderLoss += 1D / orderIndex;
orderLosses[orderIndex - 1] = orderLoss;
}
for (int rankIndex = 1; rankIndex < itemSize; rankIndex++) {
orderLosses[rankIndex - 1] /= orderLoss;
}
}
@Override
protected void doPractice() {
int Y, N;
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
totalError = 0F;
for (int sampleIndex = 0, sampleTimes = userSize * 100; sampleIndex < sampleTimes; sampleIndex++) {
int userIndex, positiveItemIndex, negativeItemIndex;
float positiveScore;
float negativeScore;
while (true) {
userIndex = RandomUtility.randomInteger(userSize);
SparseVector userVector = scoreMatrix.getRowVector(userIndex);
if (userVector.getElementSize() == 0 || userVector.getElementSize() == itemSize) {
continue;
}
N = 0;
Y = itemSize - scoreMatrix.getRowScope(userIndex);
positiveItemIndex = userVector.getIndex(RandomUtility.randomInteger(userVector.getElementSize()));
positiveScore = predict(userIndex, positiveItemIndex);
do {
N++;
negativeItemIndex = RandomUtility.randomInteger(itemSize - userVector.getElementSize());
for (int index = 0, size = userVector.getElementSize(); index < size; index++) {
if (negativeItemIndex >= userVector.getIndex(index)) {
negativeItemIndex++;
continue;
}
break;
}
negativeScore = predict(userIndex, negativeItemIndex);
} while ((positiveScore - negativeScore > epsilon) && N < Y - 1);
break;
}
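// N draws were made while searching for a violating negative; the positive
// item's rank is then estimated as (Y - 1) / N, which selects the rank
// weight applied to the gradient.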
// update parameters
float error = positiveScore - negativeScore;
float gradient = calaculateGradientValue(lossType, error);
int orderIndex = (int) ((Y - 1) / N);
float orderLoss = orderLosses[orderIndex];
gradient = gradient * orderLoss;
totalError += -Math.log(LogisticUtility.getValue(error));
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float userFactor = userFactors.getValue(userIndex, factorIndex);
float positiveFactor = itemFactors.getValue(positiveItemIndex, factorIndex);
float negativeFactor = itemFactors.getValue(negativeItemIndex, factorIndex);
userFactors.shiftValue(userIndex, factorIndex, learnRatio * (gradient * (positiveFactor - negativeFactor) - userRegularization * userFactor));
itemFactors.shiftValue(positiveItemIndex, factorIndex, learnRatio * (gradient * userFactor - itemRegularization * positiveFactor));
itemFactors.shiftValue(negativeItemIndex, factorIndex, learnRatio * (gradient * (-userFactor) - itemRegularization * negativeFactor));
totalError += userRegularization * userFactor * userFactor + itemRegularization * positiveFactor * positiveFactor + itemRegularization * negativeFactor * negativeFactor;
}
}
if (isConverged(epocheIndex) && isConverged) {
break;
}
isLearned(epocheIndex);
currentError = totalError;
}
}
}
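
For intuition: the orderLosses table built in prepare() above is the WARP rank weight L(k) = (1 + 1/2 + ... + 1/k) / L(itemSize - 1), and doPractice() estimates the rank of the positive item as (Y - 1) / N from the number of negative samples N drawn before a margin violation was found. A minimal standalone sketch of that weighting (plain Java, no jstarcraft dependencies; the class and variable names are illustrative only):

public class WarpWeightSketch {
    public static void main(String[] arguments) {
        int itemSize = 10;
        // harmonic prefix sums: L(k) = 1 + 1/2 + ... + 1/k
        float[] orderLosses = new float[itemSize - 1];
        float orderLoss = 0F;
        for (int orderIndex = 1; orderIndex < itemSize; orderIndex++) {
            orderLoss += 1F / orderIndex;
            orderLosses[orderIndex - 1] = orderLoss;
        }
        // normalize so the deepest rank has weight 1
        for (int rankIndex = 1; rankIndex < itemSize; rankIndex++) {
            orderLosses[rankIndex - 1] /= orderLoss;
        }
        // finding a violating negative after N of Y candidates implies an
        // estimated rank of (Y - 1) / N for the positive item
        int Y = itemSize, N = 3;
        System.out.println("rank weight = " + orderLosses[(Y - 1) / N]);
    }
}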

View File

@ -0,0 +1,182 @@
package com.jstarcraft.rns.model.collaborative.ranking;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.KeyValue;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.rns.model.MatrixFactorizationModel;
import com.jstarcraft.rns.utility.LogisticUtility;
import it.unimi.dsi.fastutil.ints.IntSet;
/**
*
* WBPR Recommender
*
* <pre>
* Bayesian Personalized Ranking for Non-Uniformly Sampled Items
* Based on the LibRec team's implementation
* </pre>
*
* @author Birdy
*
*/
public class WBPRModel extends MatrixFactorizationModel {
/**
* user items Set
*/
// private LoadingCache<Integer, IntSet> userItemsSet;
/**
* pre-compute and sort by item's popularity
*/
private List<KeyValue<Integer, Double>> itemPopularities;
private List<KeyValue<Integer, Double>>[] itemProbabilities;
/**
* items biases
*/
private DenseVector itemBiases;
/**
* bias regularization
*/
private float biasRegularization;
/**
* Guava cache configuration
*/
// protected static String cacheSpec;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
biasRegularization = configuration.getFloat("recommender.bias.regularization", 0.01F);
itemBiases = DenseVector.valueOf(itemSize);
itemBiases.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(RandomUtility.randomFloat(0.01F));
});
// pre-compute and sort by item's popularity
itemPopularities = new ArrayList<>(itemSize);
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
itemPopularities.add(new KeyValue<>(itemIndex, Double.valueOf(scoreMatrix.getColumnScope(itemIndex))));
}
Collections.sort(itemPopularities, (left, right) -> {
// descending order
return right.getValue().compareTo(left.getValue());
});
itemProbabilities = new List[userSize];
List<IntSet> userItemSet = getUserItemSet(scoreMatrix);
for (int userIndex = 0; userIndex < userSize; userIndex++) {
IntSet scoreSet = userItemSet.get(userIndex);
List<KeyValue<Integer, Double>> probabilities = new LinkedList<>();
itemProbabilities[userIndex] = probabilities;
// filter candidate items
double sum = 0;
for (KeyValue<Integer, Double> term : itemPopularities) {
int itemIndex = term.getKey();
double popularity = term.getValue();
if (!scoreSet.contains(itemIndex) && popularity > 0D) {
// make a clone so the normalization below cannot mutate itemPopularities
probabilities.add(new KeyValue<>(itemIndex, popularity));
sum += popularity;
}
}
// normalization
for (KeyValue<Integer, Double> term : probabilities) {
term.setValue(term.getValue() / sum);
}
}
}
@Override
protected void doPractice() {
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
totalError = 0F;
for (int sampleIndex = 0, sampleTimes = userSize * 100; sampleIndex < sampleTimes; sampleIndex++) {
// randomly draw (userIdx, posItemIdx, negItemIdx)
int userIndex, positiveItemIndex, negativeItemIndex = 0;
List<KeyValue<Integer, Double>> probabilities;
while (true) {
userIndex = RandomUtility.randomInteger(userSize);
SparseVector userVector = scoreMatrix.getRowVector(userIndex);
if (userVector.getElementSize() == 0) {
continue;
}
positiveItemIndex = userVector.getIndex(RandomUtility.randomInteger(userVector.getElementSize()));
// sample j by popularity (probability)
probabilities = itemProbabilities[userIndex];
double random = RandomUtility.randomDouble(1D);
for (KeyValue<Integer, Double> term : probabilities) {
if ((random -= term.getValue()) <= 0D) {
negativeItemIndex = term.getKey();
break;
}
}
break;
}
// update parameters
float positiveScore = predict(userIndex, positiveItemIndex);
float negativeScore = predict(userIndex, negativeItemIndex);
float error = positiveScore - negativeScore;
float value = (float) -Math.log(LogisticUtility.getValue(error));
totalError += value;
value = LogisticUtility.getValue(-error);
// update bias
float positiveBias = itemBiases.getValue(positiveItemIndex), negativeBias = itemBiases.getValue(negativeItemIndex);
itemBiases.shiftValue(positiveItemIndex, learnRatio * (value - biasRegularization * positiveBias));
itemBiases.shiftValue(negativeItemIndex, learnRatio * (-value - biasRegularization * negativeBias));
totalError += biasRegularization * (positiveBias * positiveBias + negativeBias * negativeBias);
// update user/item vectors
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float userFactor = userFactors.getValue(userIndex, factorIndex);
float positiveItemFactor = itemFactors.getValue(positiveItemIndex, factorIndex);
float negativeItemFactor = itemFactors.getValue(negativeItemIndex, factorIndex);
userFactors.shiftValue(userIndex, factorIndex, learnRatio * (value * (positiveItemFactor - negativeItemFactor) - userRegularization * userFactor));
itemFactors.shiftValue(positiveItemIndex, factorIndex, learnRatio * (value * userFactor - itemRegularization * positiveItemFactor));
itemFactors.shiftValue(negativeItemIndex, factorIndex, learnRatio * (value * (-userFactor) - itemRegularization * negativeItemFactor));
totalError += userRegularization * userFactor * userFactor + itemRegularization * positiveItemFactor * positiveItemFactor + itemRegularization * negativeItemFactor * negativeItemFactor;
}
}
if (isConverged(epocheIndex) && isConverged) {
break;
}
isLearned(epocheIndex);
currentError = totalError;
}
}
@Override
protected float predict(int userIndex, int itemIndex) {
DefaultScalar scalar = DefaultScalar.getInstance();
DenseVector userVector = userFactors.getRowVector(userIndex);
DenseVector itemVector = itemFactors.getRowVector(itemIndex);
return itemBiases.getValue(itemIndex) + scalar.dotProduct(userVector, itemVector).getValue();
}
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
instance.setQuantityMark(predict(userIndex, itemIndex));
}
}
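
The negative sampler above draws item j with probability proportional to its popularity among the user's unrated items, by subtracting normalized weights from a uniform random draw until it goes non-positive. A self-contained sketch of that roulette-wheel scan (plain Java; names are illustrative only):

import java.util.Random;

public class RouletteWheelSketch {
    // draws an index with probability proportional to weights[index]
    static int sample(double[] weights, Random random) {
        double sum = 0D;
        for (double weight : weights) {
            sum += weight;
        }
        double value = random.nextDouble() * sum;
        for (int index = 0; index < weights.length; index++) {
            value -= weights[index];
            if (value <= 0D) {
                return index;
            }
        }
        // guard against floating-point underrun
        return weights.length - 1;
    }

    public static void main(String[] arguments) {
        double[] popularities = { 5D, 3D, 1D, 1D };
        System.out.println(sample(popularities, new Random(0L)));
    }
}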

View File

@ -0,0 +1,202 @@
package com.jstarcraft.rns.model.collaborative.ranking;
import java.util.Date;
import java.util.concurrent.CountDownLatch;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.environment.EnvironmentContext;
import com.jstarcraft.ai.math.MatrixUtility;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.matrix.SparseMatrix;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.ai.math.structure.vector.VectorScalar;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.rns.model.MatrixFactorizationModel;
import com.jstarcraft.rns.model.exception.ModelException;
/**
*
* WRMF Recommender
*
* <pre>
* WRMF: Weighted Regularized Matrix Factorization
* Collaborative filtering for implicit feedback datasets
* Based on the LibRec team's implementation
* </pre>
*
* @author Birdy
*
*/
public class WRMFModel extends MatrixFactorizationModel {
/**
* confidence weight coefficient
*/
private float weightCoefficient;
/**
* confidence minus identity matrix: confidenceMinusIdentity_{ui} = confidenceMatrix_{ui} - 1 = alpha * r_{ui}
* or log(1 + 10^alpha * r_{ui})
*/
// TODO should be refactored to SparseMatrix
private SparseMatrix confindenceMatrix;
/**
* preferenceMatrix_{ui} = 1 if r_{ui} > 0, otherwise preferenceMatrix_{ui} = 0
*/
// TODO should be refactored to SparseMatrix
private SparseMatrix preferenceMatrix;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
weightCoefficient = configuration.getFloat("recommender.wrmf.weight.coefficient", 4.0f);
confindenceMatrix = SparseMatrix.copyOf(scoreMatrix, false);
confindenceMatrix.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue((float) Math.log(1F + Math.pow(10, weightCoefficient) * scalar.getValue()));
});
preferenceMatrix = SparseMatrix.copyOf(scoreMatrix, false);
preferenceMatrix.setValues(1F);
}
private ThreadLocal<DenseMatrix> factorMatrixStorage = new ThreadLocal<>();
private ThreadLocal<DenseMatrix> copyMatrixStorage = new ThreadLocal<>();
private ThreadLocal<DenseMatrix> inverseMatrixStorage = new ThreadLocal<>();
@Override
protected void constructEnvironment() {
// cache the factor computations to avoid excessive memory allocation
factorMatrixStorage.set(DenseMatrix.valueOf(factorSize, factorSize));
copyMatrixStorage.set(DenseMatrix.valueOf(factorSize, factorSize));
inverseMatrixStorage.set(DenseMatrix.valueOf(factorSize, factorSize));
}
@Override
protected void destructEnvironment() {
factorMatrixStorage.remove();
copyMatrixStorage.remove();
inverseMatrixStorage.remove();
}
@Override
protected void doPractice() {
EnvironmentContext context = EnvironmentContext.getContext();
// cache the factor computations to avoid excessive memory allocation
DenseMatrix transposeMatrix = DenseMatrix.valueOf(factorSize, factorSize);
// To be consistent with the symbols in the paper
// Updating by using alternative least square (ALS)
// due to large amount of entries to be processed (SGD will be too slow)
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
// Step 1: update user factors;
// partition the work by user for concurrent computation.
DenseMatrix itemSymmetryMatrix = transposeMatrix;
itemSymmetryMatrix.dotProduct(itemFactors, true, itemFactors, false, MathCalculator.SERIAL);
CountDownLatch userLatch = new CountDownLatch(userSize);
for (int index = 0; index < userSize; index++) {
int userIndex = index;
context.doAlgorithmByAny(index, () -> {
DenseMatrix factorMatrix = factorMatrixStorage.get();
DenseMatrix copyMatrix = copyMatrixStorage.get();
DenseMatrix inverseMatrix = inverseMatrixStorage.get();
SparseVector confindenceVector = confindenceMatrix.getRowVector(userIndex);
// YtY + Yt * (Cu - I) * Y
factorMatrix.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int row = scalar.getRow();
int column = scalar.getColumn();
float value = 0F;
for (VectorScalar term : confindenceVector) {
int itemIndex = term.getIndex();
value += itemFactors.getValue(itemIndex, row) * term.getValue() * itemFactors.getValue(itemIndex, column);
}
value += itemSymmetryMatrix.getValue(row, column);
if (row == column) {
// lambda * I contributes to the diagonal only
value += userRegularization;
}
scalar.setValue(value);
});
// (YtCuY + lambda * I)^-1
// lambda * I can be pre-defined because it is the same every time.
// Yt * (Cu - I) * Pu + Yt * Pu
DenseVector userFactorVector = DenseVector.valueOf(factorSize);
SparseVector preferenceVector = preferenceMatrix.getRowVector(userIndex);
for (int position = 0, size = preferenceVector.getElementSize(); position < size; position++) {
int itemIndex = preferenceVector.getIndex(position);
float confindence = confindenceVector.getValue(position);
float preference = preferenceVector.getValue(position);
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
userFactorVector.shiftValue(factorIndex, preference * (itemFactors.getValue(itemIndex, factorIndex) * confindence + itemFactors.getValue(itemIndex, factorIndex)));
}
}
// update user factors
userFactors.getRowVector(userIndex).dotProduct(MatrixUtility.inverse(factorMatrix, copyMatrix, inverseMatrix), false, userFactorVector, MathCalculator.SERIAL);
userLatch.countDown();
});
}
try {
userLatch.await();
} catch (Exception exception) {
throw new ModelException(exception);
}
// Step 2: update item factors;
// partition the work by item for concurrent computation.
DenseMatrix userSymmetryMatrix = transposeMatrix;
userSymmetryMatrix.dotProduct(userFactors, true, userFactors, false, MathCalculator.SERIAL);
CountDownLatch itemLatch = new CountDownLatch(itemSize);
for (int index = 0; index < itemSize; index++) {
int itemIndex = index;
context.doAlgorithmByAny(index, () -> {
DenseMatrix factorMatrix = factorMatrixStorage.get();
DenseMatrix copyMatrix = copyMatrixStorage.get();
DenseMatrix inverseMatrix = inverseMatrixStorage.get();
SparseVector confindenceVector = confindenceMatrix.getColumnVector(itemIndex);
// XtX + Xt * (Ci - I) * X
factorMatrix.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int row = scalar.getRow();
int column = scalar.getColumn();
float value = 0F;
for (VectorScalar term : confindenceVector) {
int userIndex = term.getIndex();
value += userFactors.getValue(userIndex, row) * term.getValue() * userFactors.getValue(userIndex, column);
}
value += userSymmetryMatrix.getValue(row, column);
if (row == column) {
// lambda * I contributes to the diagonal only
value += itemRegularization;
}
scalar.setValue(value);
});
// (XtCiX + lambda * I)^-1
// lambda * I can be pre-defined because it is the same every time.
// Xt * (Ci - I) * Pi + Xt * Pi
DenseVector itemFactorVector = DenseVector.valueOf(factorSize);
SparseVector preferenceVector = preferenceMatrix.getColumnVector(itemIndex);
for (int position = 0, size = preferenceVector.getElementSize(); position < size; position++) {
int userIndex = preferenceVector.getIndex(position);
float confindence = confindenceVector.getValue(position);
float preference = preferenceVector.getValue(position);
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
itemFactorVector.shiftValue(factorIndex, preference * (userFactors.getValue(userIndex, factorIndex) * confindence + userFactors.getValue(userIndex, factorIndex)));
}
}
// update item factors
itemFactors.getRowVector(itemIndex).dotProduct(MatrixUtility.inverse(factorMatrix, copyMatrix, inverseMatrix), false, itemFactorVector, MathCalculator.SERIAL);
itemLatch.countDown();
});
}
try {
itemLatch.await();
} catch (Exception exception) {
throw new ModelException(exception);
}
if (logger.isInfoEnabled()) {
logger.info(getClass() + " runs at iteration = " + epocheIndex + " " + new Date());
}
}
}
}
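
In matrix form, each user task above assembles and solves the closed-form ridge system of ALS for implicit feedback (the model cited in the javadoc); with C^u the confidence matrix (confindenceMatrix stores C^u - I) and p(u) the binary preference vector, the update is

x_u = (Y^T Y + Y^T (C^u - I) Y + \lambda I)^{-1} Y^T C^u p(u)

and symmetrically y_i = (X^T X + X^T (C^i - I) X + \lambda I)^{-1} X^T C^i p(i) for items. Y^T Y does not depend on the user, which is why itemSymmetryMatrix is computed once per epoch and only the sparse C^u - I correction is accumulated per user.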

View File

@ -0,0 +1,128 @@
package com.jstarcraft.rns.model.collaborative.rating;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.matrix.MatrixScalar;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.ai.math.structure.vector.VectorScalar;
import com.jstarcraft.core.common.option.Option;
/**
*
* Asymmetric SVD++ Recommender
*
* <pre>
* Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model
* Based on the LibRec team's implementation
* </pre>
*
* @author Birdy
*
*/
public class ASVDPlusPlusModel extends BiasedMFModel {
private DenseMatrix positiveFactors, negativeFactors;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
positiveFactors = DenseMatrix.valueOf(itemSize, factorSize);
positiveFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
negativeFactors = DenseMatrix.valueOf(itemSize, factorSize);
negativeFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
}
@Override
protected void doPractice() {
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
// TODO the total loss is currently not accumulated.
totalError = 0f;
for (MatrixScalar matrixTerm : scoreMatrix) {
int userIndex = matrixTerm.getRow();
int itemIndex = matrixTerm.getColumn();
float score = matrixTerm.getValue();
float predict = predict(userIndex, itemIndex);
float error = score - predict;
SparseVector userVector = scoreMatrix.getRowVector(userIndex);
// update factors
float userBiasValue = userBiases.getValue(userIndex);
userBiases.shiftValue(userIndex, learnRatio * (error - regBias * userBiasValue));
float itemBiasValue = itemBiases.getValue(itemIndex);
itemBiases.shiftValue(itemIndex, learnRatio * (error - regBias * itemBiasValue));
float squareRoot = (float) Math.sqrt(userVector.getElementSize());
float[] positiveSums = new float[factorSize];
float[] negativeSums = new float[factorSize];
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float positiveSum = 0F;
float negativeSum = 0F;
for (VectorScalar term : userVector) {
int itemIdx = term.getIndex();
positiveSum += positiveFactors.getValue(itemIdx, factorIndex);
negativeSum += negativeFactors.getValue(itemIdx, factorIndex) * (score - meanScore - userBiases.getValue(userIndex) - itemBiases.getValue(itemIdx));
}
positiveSums[factorIndex] = squareRoot > 0 ? positiveSum / squareRoot : positiveSum;
negativeSums[factorIndex] = squareRoot > 0 ? negativeSum / squareRoot : negativeSum;
}
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float userFactor = userFactors.getValue(userIndex, factorIndex);
float itemFactor = itemFactors.getValue(itemIndex, factorIndex);
float userValue = error * itemFactor - userRegularization * userFactor;
float itemValue = error * (userFactor + positiveSums[factorIndex] + negativeSums[factorIndex]) - itemRegularization * itemFactor;
userFactors.shiftValue(userIndex, factorIndex, learnRatio * userValue);
itemFactors.shiftValue(itemIndex, factorIndex, learnRatio * itemValue);
for (VectorScalar term : userVector) {
int index = term.getIndex();
float positiveFactor = positiveFactors.getValue(index, factorIndex);
float negativeFactor = negativeFactors.getValue(index, factorIndex);
float positiveDelta = error * itemFactor / squareRoot - userRegularization * positiveFactor;
float negativeDelta = error * itemFactor * (score - meanScore - userBiases.getValue(userIndex) - itemBiases.getValue(index)) / squareRoot - userRegularization * negativeFactor;
positiveFactors.shiftValue(index, factorIndex, learnRatio * positiveDelta);
negativeFactors.shiftValue(index, factorIndex, learnRatio * negativeDelta);
}
}
}
}
}
@Override
protected float predict(int userIndex, int itemIndex) {
DefaultScalar scalar = DefaultScalar.getInstance();
DenseVector userVector = userFactors.getRowVector(userIndex);
DenseVector itemVector = itemFactors.getRowVector(itemIndex);
float value = meanScore + userBiases.getValue(userIndex) + itemBiases.getValue(itemIndex) + scalar.dotProduct(userVector, itemVector).getValue();
SparseVector rateVector = scoreMatrix.getRowVector(userIndex);
float squareRoot = (float) Math.sqrt(rateVector.getElementSize());
for (VectorScalar term : rateVector) {
itemIndex = term.getIndex();
DenseVector positiveVector = positiveFactors.getRowVector(itemIndex);
DenseVector negativeVector = negativeFactors.getRowVector(itemIndex);
value += scalar.dotProduct(positiveVector, itemVector).getValue() / squareRoot;
float scale = term.getValue() - meanScore - userBiases.getValue(userIndex) - itemBiases.getValue(itemIndex);
value += scalar.dotProduct(negativeVector, itemVector).getValue() * scale / squareRoot;
}
if (Float.isNaN(value)) {
value = meanScore;
}
return value;
}
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
instance.setQuantityMark(predict(userIndex, itemIndex));
}
}
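
Written out, predict() above computes the asymmetric predictor of the cited paper, with y_j the implicit item factors (positiveFactors here) and x_j the rating-offset factors (negativeFactors here):

\hat{r}_{ui} = \mu + b_u + b_i + q_i^T p_u + |R(u)|^{-1/2} \sum_{j \in R(u)} q_i^T (y_j + (r_{uj} - \mu - b_u - b_j) x_j)

where R(u) is the set of items rated by user u; note this variant keeps the explicit user factors p_u alongside the item-based terms.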

View File

@ -0,0 +1,189 @@
package com.jstarcraft.rns.model.collaborative.rating;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.MathUtility;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.matrix.MatrixScalar;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.rns.model.ProbabilisticGraphicalModel;
import com.jstarcraft.rns.utility.GaussianUtility;
/**
*
* Aspect Model Recommender
*
* <pre>
* Latent class models for collaborative filtering
* Based on the LibRec team's implementation
* </pre>
*
* @author Birdy
*
*/
public class AspectModelRatingModel extends ProbabilisticGraphicalModel {
/*
* Conditional distribution: P(u|z)
*/
private DenseMatrix userProbabilities, userSums;
/*
* Conditional distribution: P(i|z)
*/
private DenseMatrix itemProbabilities, itemSums;
/*
* topic distribution: P(z)
*/
private DenseVector topicProbabilities, topicSums;
/*
* mean of the Gaussian rating distribution for each topic: mu_z
*/
private DenseVector meanProbabilities, meanSums;
/*
* variance of the Gaussian rating distribution for each topic: sigma_z^2
*/
private DenseVector varianceProbabilities, varianceSums;
/*
* small value
*/
private static float smallValue = MathUtility.EPSILON;
/*
* {user, item, {topic z, probability}}
*/
private float[][] probabilityTensor;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
// Initialize topic distribution
topicProbabilities = DenseVector.valueOf(factorSize);
topicProbabilities.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(RandomUtility.randomInteger(factorSize) + 1);
});
topicProbabilities.scaleValues(1F / topicProbabilities.getSum(false));
topicSums = DenseVector.valueOf(factorSize);
// intialize conditional distribution P(u|z)
userProbabilities = DenseMatrix.valueOf(factorSize, userSize);
userSums = DenseMatrix.valueOf(factorSize, userSize);
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
DenseVector probabilityVector = userProbabilities.getRowVector(topicIndex);
probabilityVector.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(RandomUtility.randomInteger(userSize) + 1);
});
probabilityVector.scaleValues(1F / probabilityVector.getSum(false));
}
// initialize conditional distribution P(i|z)
itemProbabilities = DenseMatrix.valueOf(factorSize, itemSize);
itemSums = DenseMatrix.valueOf(factorSize, itemSize);
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
DenseVector probabilityVector = itemProbabilities.getRowVector(topicIndex);
probabilityVector.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(RandomUtility.randomInteger(itemSize) + 1);
});
probabilityVector.scaleValues(1F / probabilityVector.getSum(false));
}
// initialize Q
probabilityTensor = new float[actionSize][factorSize];
float globalMean = scoreMatrix.getSum(false) / scoreMatrix.getElementSize();
meanProbabilities = DenseVector.valueOf(factorSize);
varianceProbabilities = DenseVector.valueOf(factorSize);
meanSums = DenseVector.valueOf(factorSize);
varianceSums = DenseVector.valueOf(factorSize);
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
meanProbabilities.setValue(topicIndex, globalMean);
varianceProbabilities.setValue(topicIndex, 2);
}
}
@Override
protected void eStep() {
topicSums.setValues(smallValue);
userSums.setValues(0F);
itemSums.setValues(0F);
meanSums.setValues(0F);
varianceSums.setValues(smallValue);
// variational inference to compute Q
int actionIndex = 0;
for (MatrixScalar term : scoreMatrix) {
int userIndex = term.getRow();
int itemIndex = term.getColumn();
float denominator = 0F;
float[] numerator = probabilityTensor[actionIndex++];
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
float value = topicProbabilities.getValue(topicIndex) * userProbabilities.getValue(topicIndex, userIndex) * itemProbabilities.getValue(topicIndex, itemIndex) * GaussianUtility.probabilityDensity(term.getValue(), meanProbabilities.getValue(topicIndex), varianceProbabilities.getValue(topicIndex));
numerator[topicIndex] = value;
denominator += value;
}
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
float probability = denominator > 0 ? numerator[topicIndex] / denominator : 0F;
numerator[topicIndex] = probability;
}
float score = term.getValue();
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
float probability = numerator[topicIndex];
topicSums.shiftValue(topicIndex, probability);
userSums.shiftValue(topicIndex, userIndex, probability);
itemSums.shiftValue(topicIndex, itemIndex, probability);
meanSums.shiftValue(topicIndex, score * probability);
}
}
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
float mean = meanSums.getValue(topicIndex) / topicSums.getValue(topicIndex);
meanProbabilities.setValue(topicIndex, mean);
}
actionIndex = 0;
for (MatrixScalar term : scoreMatrix) {
float[] probabilities = probabilityTensor[actionIndex++];
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
float mean = meanProbabilities.getValue(topicIndex);
float error = term.getValue() - mean;
float probability = probabilities[topicIndex];
varianceSums.shiftValue(topicIndex, error * error * probability);
}
}
}
@Override
protected void mStep() {
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
varianceProbabilities.setValue(topicIndex, varianceSums.getValue(topicIndex) / topicSums.getValue(topicIndex));
topicProbabilities.setValue(topicIndex, topicSums.getValue(topicIndex) / actionSize);
for (int userIndex = 0; userIndex < userSize; userIndex++) {
userProbabilities.setValue(topicIndex, userIndex, userSums.getValue(topicIndex, userIndex) / topicSums.getValue(topicIndex));
}
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
itemProbabilities.setValue(topicIndex, itemIndex, itemSums.getValue(topicIndex, itemIndex) / topicSums.getValue(topicIndex));
}
}
}
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
float value = 0F;
float denominator = 0F;
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
float weight = topicProbabilities.getValue(topicIndex) * userProbabilities.getValue(topicIndex, userIndex) * itemProbabilities.getValue(topicIndex, itemIndex);
denominator += weight;
value += weight * meanProbabilities.getValue(topicIndex);
}
value = value / denominator;
if (Float.isNaN(value)) {
value = meanScore;
}
instance.setQuantityMark(value);
}
}
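
eStep() above is the E-step of EM for this Gaussian aspect model: for every observed rating it computes the posterior responsibility of each latent class z (stored in probabilityTensor), and mStep() re-estimates P(z), P(u|z), P(i|z) and the Gaussian parameters from those responsibilities:

Q(z | u, i, r_{ui}) = P(z) P(u|z) P(i|z) N(r_{ui}; \mu_z, \sigma_z^2) / \sum_{z'} P(z') P(u|z') P(i|z') N(r_{ui}; \mu_{z'}, \sigma_{z'}^2)

predict() then returns the responsibility-weighted mean \sum_z P(z) P(u|z) P(i|z) \mu_z normalized by \sum_z P(z) P(u|z) P(i|z).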

View File

@ -0,0 +1,91 @@
package com.jstarcraft.rns.model.collaborative.rating;
import org.nd4j.linalg.activations.IActivation;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.lossfunctions.ILossFunction;
import org.nd4j.linalg.ops.transforms.Transforms;
import org.nd4j.linalg.primitives.Pair;
/**
*
* AutoRec Learner
*
* <pre>
* AutoRec: Autoencoders Meet Collaborative Filtering
* Based on the LibRec team's implementation
* </pre>
*
* @author Birdy
*
*/
public class AutoRecLearner implements ILossFunction {
private INDArray maskData;
public AutoRecLearner(INDArray maskData) {
this.maskData = maskData;
}
private INDArray scoreArray(INDArray labels, INDArray preOutput, IActivation activationFn, INDArray mask) {
INDArray scoreArr;
INDArray output = activationFn.getActivation(preOutput.dup(), true);
INDArray yMinusyHat = Transforms.abs(labels.sub(output));
scoreArr = yMinusyHat.mul(yMinusyHat);
scoreArr = scoreArr.mul(maskData);
if (mask != null) {
scoreArr.muliColumnVector(mask);
}
return scoreArr;
}
@Override
public double computeScore(INDArray labels, INDArray preOutput, IActivation activationFn, INDArray mask, boolean average) {
INDArray scoreArr = scoreArray(labels, preOutput, activationFn, mask);
double score = scoreArr.sumNumber().doubleValue();
if (average) {
score /= scoreArr.size(0);
}
return score;
}
@Override
public INDArray computeScoreArray(INDArray labels, INDArray preOutput, IActivation activationFn, INDArray mask) {
INDArray scoreArr = scoreArray(labels, preOutput, activationFn, mask);
return scoreArr.sum(1);
}
@Override
public INDArray computeGradient(INDArray labels, INDArray preOutput, IActivation activationFn, INDArray mask) {
INDArray output = activationFn.getActivation(preOutput.dup(), true);
INDArray yMinusyHat = labels.sub(output);
INDArray dldyhat = yMinusyHat.mul(-2);
INDArray gradients = activationFn.backprop(preOutput.dup(), dldyhat).getFirst();
gradients = gradients.mul(maskData);
// multiply with masks, always
if (mask != null) {
gradients.muliColumnVector(mask);
}
return gradients;
}
@Override
public Pair<Double, INDArray> computeGradientAndScore(INDArray labels, INDArray preOutput, IActivation activationFn, INDArray mask, boolean average) {
return new Pair<>(computeScore(labels, preOutput, activationFn, mask, average), computeGradient(labels, preOutput, activationFn, mask));
}
@Override
public String toString() {
return super.toString() + "AutoRecLossFunction";
}
@Override
public String name() {
// TODO Auto-generated method stub
return toString();
}
}
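
In effect this loss is a masked squared error: with m the observed-entry indicator (maskData) and \hat{r} the activated network output, scoreArray() and computeGradient() implement

L(r, \hat{r}) = || m \odot (r - \hat{r}) ||_2^2,   dL/d\hat{r} = -2 m \odot (r - \hat{r})

so entries that were never rated contribute neither to the score nor to backpropagation.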

View File

@ -0,0 +1,75 @@
package com.jstarcraft.rns.model.collaborative.rating;
import org.deeplearning4j.nn.api.OptimizationAlgorithm;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.weights.WeightInit;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.learning.config.Nesterovs;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.matrix.MatrixScalar;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.rns.model.NeuralNetworkModel;
/**
*
* AutoRec Recommender
*
* <pre>
* AutoRec: Autoencoders Meet Collaborative Filtering
* Based on the LibRec team's implementation
* </pre>
*
* @author Birdy
*
*/
public class AutoRecModel extends NeuralNetworkModel {
/**
* the data structure that indicates which elements in the user-item matrix are non-zero
*/
private INDArray maskData;
@Override
protected int getInputDimension() {
return userSize;
}
@Override
protected MultiLayerConfiguration getNetworkConfiguration() {
NeuralNetConfiguration.ListBuilder factory = new NeuralNetConfiguration.Builder().seed(6).updater(new Nesterovs(learnRatio, momentum)).weightInit(WeightInit.XAVIER_UNIFORM).optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT).l2(weightRegularization).list();
factory.layer(0, new DenseLayer.Builder().nIn(inputDimension).nOut(hiddenDimension).activation(Activation.fromString(hiddenActivation)).build());
factory.layer(1, new OutputLayer.Builder(new AutoRecLearner(maskData)).nIn(hiddenDimension).nOut(inputDimension).activation(Activation.fromString(outputActivation)).build());
MultiLayerConfiguration configuration = factory.pretrain(false).backprop(true).build();
return configuration;
}
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
// transform the sparse matrix to INDArray
int[] matrixShape = new int[] { itemSize, userSize };
inputData = Nd4j.zeros(matrixShape);
maskData = Nd4j.zeros(matrixShape);
for (MatrixScalar term : scoreMatrix) {
if (term.getValue() > 0D) {
inputData.putScalar(term.getColumn(), term.getRow(), term.getValue());
maskData.putScalar(term.getColumn(), term.getRow(), 1D);
}
}
}
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
instance.setQuantityMark(outputData.getFloat(itemIndex, userIndex));
}
}
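
Since getInputDimension() returns userSize and the input matrix is itemSize x userSize, this is the item-based variant of AutoRec: each item's rating vector over all users is one training example, and the network minimizes the masked reconstruction objective of the cited paper,

\min_\theta \sum_{i=1}^{n} || r^{(i)} - h(r^{(i)}; \theta) ||_O^2 + \frac{\lambda}{2} (|| W ||_F^2 + || V ||_F^2)

with h(r) = f(W g(V r + \mu) + b), where || . ||_O counts only observed entries (enforced here by AutoRecLearner's mask).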

View File

@ -0,0 +1,41 @@
package com.jstarcraft.rns.model.collaborative.rating;
import java.util.Map.Entry;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.rns.model.collaborative.BHFreeModel;
/**
*
* BH Free Recommender
*
* <pre>
* Balancing Prediction and Recommendation Accuracy: Hierarchical Latent Factors for Preference Data
* Based on the LibRec team's implementation
* </pre>
*
* @author Birdy
*
*/
public class BHFreeRatingModel extends BHFreeModel {
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
float value = 0F, probabilities = 0F;
for (Entry<Float, Integer> entry : scoreIndexes.entrySet()) {
float score = entry.getKey();
float probability = 0F;
for (int userTopic = 0; userTopic < userTopicSize; userTopic++) {
for (int itemTopic = 0; itemTopic < itemTopicSize; itemTopic++) {
probability += user2TopicProbabilities.getValue(userIndex, userTopic) * userTopic2ItemTopicProbabilities.getValue(userTopic, itemTopic) * userTopic2ItemTopicScoreProbabilities[userTopic][itemTopic][entry.getValue()];
}
}
value += score * probability;
probabilities += probability;
}
instance.setQuantityMark(value / probabilities);
}
}
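
predict() above returns the expected rating under the hierarchical model; reading the fields as P(z|u) (user to community), P(w|z) (community to item topic) and P(r|z,w) (rating given both), the computed value is

\hat{r}_{ui} = \sum_r r \sum_{z,w} P(z|u) P(w|z) P(r|z,w) / \sum_r \sum_{z,w} P(z|u) P(w|z) P(r|z,w)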

View File

@ -0,0 +1,300 @@
package com.jstarcraft.rns.model.collaborative.rating;
import org.apache.commons.math3.distribution.GammaDistribution;
import org.apache.commons.math3.distribution.NormalDistribution;
import org.apache.commons.math3.random.JDKRandomGenerator;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.MatrixUtility;
import com.jstarcraft.ai.math.algorithm.probability.QuantityProbability;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.ai.math.structure.vector.MathVector;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.ai.math.structure.vector.VectorScalar;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.rns.model.MatrixFactorizationModel;
import com.jstarcraft.rns.model.exception.ModelException;
/**
*
* BPMF Recommender
*
* <pre>
* Bayesian Probabilistic Matrix Factorization using Markov Chain Monte Carlo
* Based on the LibRec team's implementation
* </pre>
*
* @author Birdy
*
*/
public class BPMFModel extends MatrixFactorizationModel {
private float userMean, userWishart;
private float itemMean, itemWishart;
private float userBeta, itemBeta;
private float rateSigma;
private int gibbsIterations;
private DenseMatrix[] userMatrixes;
private DenseMatrix[] itemMatrixes;
private QuantityProbability normalDistribution;
private QuantityProbability[] userGammaDistributions;
private QuantityProbability[] itemGammaDistributions;
private class HyperParameter {
// caches
private float[] thisVectorCache;
private float[] thatVectorCache;
private float[] thisMatrixCache;
private float[] thatMatrixCache;
private DenseVector factorMeans;
private DenseMatrix factorVariances;
private DenseVector randoms;
private DenseVector outerMeans, innerMeans;
private DenseMatrix covariance, cholesky, inverse, transpose, gaussian, gamma, wishart, copy;
HyperParameter(int cache, DenseMatrix factors) {
if (cache < factorSize) {
cache = factorSize;
}
thisVectorCache = new float[cache];
thisMatrixCache = new float[cache * factorSize];
thatVectorCache = new float[cache];
thatMatrixCache = new float[cache * factorSize];
factorMeans = DenseVector.valueOf(factorSize);
float scale = factors.getRowSize();
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
factorMeans.setValue(factorIndex, factors.getColumnVector(factorIndex).getSum(false) / scale);
}
outerMeans = DenseVector.valueOf(factors.getRowSize());
innerMeans = DenseVector.valueOf(factors.getRowSize());
covariance = DenseMatrix.valueOf(factorSize, factorSize);
cholesky = DenseMatrix.valueOf(factorSize, factorSize);
inverse = DenseMatrix.valueOf(factorSize, factorSize);
transpose = DenseMatrix.valueOf(factorSize, factorSize);
randoms = DenseVector.valueOf(factorSize);
gaussian = DenseMatrix.valueOf(factorSize, factorSize);
gamma = DenseMatrix.valueOf(factorSize, factorSize);
wishart = DenseMatrix.valueOf(factorSize, factorSize);
copy = DenseMatrix.valueOf(factorSize, factorSize);
factorVariances = MatrixUtility.inverse(MatrixUtility.covariance(factors, outerMeans, innerMeans, covariance), copy, inverse);
}
/**
* Sampling
*
* @param gammaDistributions
* @param factors
* @param normalMu
* @param normalBeta
* @param wishartScale
* @throws ModelException
*/
private void sampleParameter(QuantityProbability[] gammaDistributions, DenseMatrix factors, float normalMu, float normalBeta, float wishartScale) throws ModelException {
int rowSize = factors.getRowSize();
int columnSize = factors.getColumnSize();
// reuse memory.
DenseVector meanCache = DenseVector.valueOf(factorSize, thisVectorCache);
float scale = factors.getRowSize();
meanCache.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int index = scalar.getIndex();
float value = scalar.getValue();
scalar.setValue(factors.getColumnVector(index).getSum(false) / scale);
});
float beta = normalBeta + rowSize;
DenseMatrix populationVariance = MatrixUtility.covariance(factors, outerMeans, innerMeans, covariance);
wishart.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int row = scalar.getRow();
int column = scalar.getColumn();
float value = 0F;
if (row == column) {
value = wishartScale;
}
value += populationVariance.getValue(row, column) * rowSize;
value += (normalMu - meanCache.getValue(row)) * (normalMu - meanCache.getValue(column)) * (normalBeta * rowSize / beta);
scalar.setValue(value);
});
DenseMatrix wishartMatrix = wishart;
wishartMatrix = MatrixUtility.inverse(wishartMatrix, copy, inverse);
wishartMatrix.addMatrix(transpose.copyMatrix(wishartMatrix, true), false).scaleValues(0.5F);
wishartMatrix = MatrixUtility.wishart(wishartMatrix, normalDistribution, gammaDistributions, randoms, cholesky, gaussian, gamma, transpose, wishart);
if (wishartMatrix != null) {
factorVariances = wishartMatrix;
}
DenseMatrix normalVariance = MatrixUtility.cholesky(MatrixUtility.inverse(factorVariances, copy, inverse).scaleValues(normalBeta), cholesky);
if (normalVariance != null) {
randoms.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(normalDistribution.sample().floatValue());
});
factorMeans.dotProduct(normalVariance, false, randoms, MathCalculator.SERIAL).iterateElement(MathCalculator.SERIAL, (scalar) -> {
int index = scalar.getIndex();
float value = scalar.getValue();
scalar.setValue(value + (normalMu * normalBeta + meanCache.getValue(index) * rowSize) * (1F / beta));
});
}
}
/**
* Update
*
* @param factorMatrix
* @param scoreVector
* @param factorVector
* @throws ModelException
*/
private void updateParameter(DenseMatrix factorMatrix, SparseVector scoreVector, DenseVector factorVector) throws ModelException {
int size = scoreVector.getElementSize();
// reuse memory.
DenseMatrix factorCache = DenseMatrix.valueOf(size, factorSize, thisMatrixCache);
MathVector meanCache = DenseVector.valueOf(size, thisVectorCache);
int index = 0;
for (VectorScalar term : scoreVector) {
meanCache.setValue(index, term.getValue() - meanScore);
MathVector vector = factorMatrix.getRowVector(term.getIndex());
factorCache.getRowVector(index).iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(vector.getValue(scalar.getIndex()));
});
index++;
}
transpose.dotProduct(factorCache, true, factorCache, false, MathCalculator.SERIAL);
transpose.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int row = scalar.getRow();
int column = scalar.getColumn();
float value = scalar.getValue();
scalar.setValue(value * rateSigma + factorVariances.getValue(row, column));
});
DenseMatrix covariance = transpose;
covariance = MatrixUtility.inverse(covariance, copy, inverse);
// reuse memory.
meanCache = DenseVector.valueOf(factorCache.getColumnSize(), thatVectorCache).dotProduct(factorCache, true, meanCache, MathCalculator.SERIAL);
meanCache.scaleValues(rateSigma);
// reuse memory.
meanCache.addVector(DenseVector.valueOf(factorVariances.getRowSize(), thisVectorCache).dotProduct(factorVariances, false, factorMeans, MathCalculator.SERIAL));
// reuse memory.
meanCache = DenseVector.valueOf(covariance.getRowSize(), thisVectorCache).dotProduct(covariance, false, meanCache, MathCalculator.SERIAL);
covariance = MatrixUtility.cholesky(covariance, cholesky);
if (covariance != null) {
randoms.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(normalDistribution.sample().floatValue());
});
factorVector.dotProduct(covariance, false, randoms, MathCalculator.SERIAL).addVector(meanCache);
} else {
factorVector.setValues(0F);
}
}
}
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
userMean = configuration.getFloat("recommender.recommender.user.mu", 0F);
userBeta = configuration.getFloat("recommender.recommender.user.beta", 1F);
userWishart = configuration.getFloat("recommender.recommender.user.wishart.scale", 1F);
itemMean = configuration.getFloat("recommender.recommender.item.mu", 0F);
itemBeta = configuration.getFloat("recommender.recommender.item.beta", 1F);
itemWishart = configuration.getFloat("recommender.recommender.item.wishart.scale", 1F);
rateSigma = configuration.getFloat("recommender.recommender.rating.sigma", 2F);
gibbsIterations = configuration.getInteger("recommender.recommender.iterations.gibbs", 1);
userMatrixes = new DenseMatrix[epocheSize - 1];
itemMatrixes = new DenseMatrix[epocheSize - 1];
normalDistribution = new QuantityProbability(JDKRandomGenerator.class, factorSize, NormalDistribution.class, 0D, 1D);
userGammaDistributions = new QuantityProbability[factorSize];
itemGammaDistributions = new QuantityProbability[factorSize];
for (int index = 0; index < factorSize; index++) {
userGammaDistributions[index] = new QuantityProbability(JDKRandomGenerator.class, index, GammaDistribution.class, (userSize + factorSize - (index + 1D)) / 2D, 2D);
itemGammaDistributions[index] = new QuantityProbability(JDKRandomGenerator.class, index, GammaDistribution.class, (itemSize + factorSize - (index + 1D)) / 2D, 2D);
}
}
@Override
protected void doPractice() {
int cacheSize = 0;
SparseVector[] userVectors = new SparseVector[userSize];
for (int userIndex = 0; userIndex < userSize; userIndex++) {
SparseVector userVector = scoreMatrix.getRowVector(userIndex);
cacheSize = cacheSize < userVector.getElementSize() ? userVector.getElementSize() : cacheSize;
userVectors[userIndex] = userVector;
}
SparseVector[] itemVectors = new SparseVector[itemSize];
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
SparseVector itemVector = scoreMatrix.getColumnVector(itemIndex);
cacheSize = cacheSize < itemVector.getElementSize() ? itemVector.getElementSize() : cacheSize;
itemVectors[itemIndex] = itemVector;
}
// TODO consider refactoring here
HyperParameter userParameter = new HyperParameter(cacheSize, userFactors);
HyperParameter itemParameter = new HyperParameter(cacheSize, itemFactors);
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
userParameter.sampleParameter(userGammaDistributions, userFactors, userMean, userBeta, userWishart);
itemParameter.sampleParameter(itemGammaDistributions, itemFactors, itemMean, itemBeta, itemWishart);
for (int gibbsIteration = 0; gibbsIteration < gibbsIterations; gibbsIteration++) {
for (int userIndex = 0; userIndex < userSize; userIndex++) {
SparseVector scoreVector = userVectors[userIndex];
if (scoreVector.getElementSize() == 0) {
continue;
}
userParameter.updateParameter(itemFactors, scoreVector, userFactors.getRowVector(userIndex));
}
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
SparseVector scoreVector = itemVectors[itemIndex];
if (scoreVector.getElementSize() == 0) {
continue;
}
itemParameter.updateParameter(userFactors, scoreVector, itemFactors.getRowVector(itemIndex));
}
}
if (epocheIndex > 0) {
userMatrixes[epocheIndex - 1] = DenseMatrix.copyOf(userFactors);
itemMatrixes[epocheIndex - 1] = DenseMatrix.copyOf(itemFactors);
}
}
}
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
DefaultScalar scalar = DefaultScalar.getInstance();
float value = 0F;
for (int iterationStep = 0; iterationStep < epocheSize - 1; iterationStep++) {
DenseVector userVector = userMatrixes[iterationStep].getRowVector(userIndex);
DenseVector itemVector = itemMatrixes[iterationStep].getRowVector(itemIndex);
value = (value * (iterationStep) + meanScore + scalar.dotProduct(userVector, itemVector).getValue()) / (iterationStep + 1);
}
instance.setQuantityMark(value);
}
}
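
predict() above folds the Gibbs samples into a running average: with U^(t), V^(t) the factor snapshots stored after each epoch, the returned score is the Monte Carlo estimate of the posterior predictive,

\hat{r}_{ui} = \frac{1}{T} \sum_{t=1}^{T} (\mu + u_u^{(t) T} v_i^{(t)})

where T = epocheSize - 1 and \mu is the global mean score.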

View File

@ -0,0 +1,40 @@
package com.jstarcraft.rns.model.collaborative.rating;
import java.util.Map.Entry;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.rns.model.collaborative.BUCMModel;
/**
*
* BUCM Recommender
*
* <pre>
* Bayesian User Community Model
* Modeling Item Selection and Relevance for Accurate Recommendations
* Based on the LibRec team's implementation
* </pre>
*
* @author Birdy
*
*/
public class BUCMRatingModel extends BUCMModel {
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
float value = 0F, probabilities = 0F;
for (Entry<Float, Integer> term : scoreIndexes.entrySet()) {
float score = term.getKey();
float probability = 0F;
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
probability += userTopicProbabilities.getValue(userIndex, topicIndex) * topicItemProbabilities.getValue(topicIndex, itemIndex) * topicItemScoreProbabilities[topicIndex][itemIndex][term.getValue()];
}
value += probability * score;
probabilities += probability;
}
instance.setQuantityMark(value / probabilities);
}
}
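
As in the BH-free variant, predict() returns an expected rating, here marginalizing over a single topic layer z:

\hat{r}_{ui} = \sum_r r \sum_z P(z|u) P(i|z) P(r|z,i) / \sum_r \sum_z P(z|u) P(i|z) P(r|z,i)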

View File

@ -0,0 +1,116 @@
package com.jstarcraft.rns.model.collaborative.rating;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.MatrixScalar;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.rns.model.MatrixFactorizationModel;
/**
*
* BiasedMF Recommender
*
* <pre>
* Biased Matrix Factorization
* Based on the LibRec team's implementation
* </pre>
*
* @author Birdy
*
*/
public class BiasedMFModel extends MatrixFactorizationModel {
/**
* bias regularization
*/
protected float regBias;
/**
* user biases
*/
protected DenseVector userBiases;
/**
* item biases
*/
protected DenseVector itemBiases;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
regBias = configuration.getFloat("recommender.bias.regularization", 0.01F);
// initialize the userBiased and itemBiased
userBiases = DenseVector.valueOf(userSize);
userBiases.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
itemBiases = DenseVector.valueOf(itemSize);
itemBiases.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
}
@Override
protected void doPractice() {
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
totalError = 0F;
for (MatrixScalar term : scoreMatrix) {
int userIndex = term.getRow();
int itemIndex = term.getColumn();
// real rating on item itemIndex by user userIndex
float score = term.getValue();
float predict = predict(userIndex, itemIndex);
float error = score - predict;
totalError += error * error;
// update user and item bias
float userBias = userBiases.getValue(userIndex);
userBiases.shiftValue(userIndex, learnRatio * (error - regBias * userBias));
totalError += regBias * userBias * userBias;
float itemBias = itemBiases.getValue(itemIndex);
itemBiases.shiftValue(itemIndex, learnRatio * (error - regBias * itemBias));
totalError += regBias * itemBias * itemBias;
// update user and item factors
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float userFactor = userFactors.getValue(userIndex, factorIndex);
float itemFactor = itemFactors.getValue(itemIndex, factorIndex);
userFactors.shiftValue(userIndex, factorIndex, learnRatio * (error * itemFactor - userRegularization * userFactor));
itemFactors.shiftValue(itemIndex, factorIndex, learnRatio * (error * userFactor - itemRegularization * itemFactor));
totalError += userRegularization * userFactor * userFactor + itemRegularization * itemFactor * itemFactor;
}
}
totalError *= 0.5D;
if (isConverged(epocheIndex) && isConverged) {
break;
}
isLearned(epocheIndex);
currentError = totalError;
}
}
@Override
protected float predict(int userIndex, int itemIndex) {
DefaultScalar scalar = DefaultScalar.getInstance();
DenseVector userVector = userFactors.getRowVector(userIndex);
DenseVector itemVector = itemFactors.getRowVector(itemIndex);
float value = scalar.dotProduct(userVector, itemVector).getValue();
value += meanScore + userBiases.getValue(userIndex) + itemBiases.getValue(itemIndex);
return value;
}
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
instance.setQuantityMark(predict(userIndex, itemIndex));
}
}
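
Each pass of doPractice() above is plain SGD on the regularized squared error; for an observed rating with residual e_{ui} = r_{ui} - \hat{r}_{ui}, the updates are

b_u \leftarrow b_u + \eta (e_{ui} - \lambda_b b_u),  b_i \leftarrow b_i + \eta (e_{ui} - \lambda_b b_i)
p_{uf} \leftarrow p_{uf} + \eta (e_{ui} q_{if} - \lambda_u p_{uf}),  q_{if} \leftarrow q_{if} + \eta (e_{ui} p_{uf} - \lambda_i q_{if})

with \eta = learnRatio, \lambda_b = regBias, \lambda_u = userRegularization and \lambda_i = itemRegularization.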

View File

@ -0,0 +1,102 @@
package com.jstarcraft.rns.model.collaborative.rating;
import java.util.Date;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.ai.math.structure.vector.VectorScalar;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.StringUtility;
import com.jstarcraft.rns.model.MatrixFactorizationModel;
/**
*
* CCD Recommender
*
* <pre>
* Large-Scale Parallel Collaborative Filtering for the Netflix Prize
* http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/
* Based on the LibRec team's implementation
* </pre>
*
* @author Birdy
*
*/
public class CCDModel extends MatrixFactorizationModel {
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
userFactors = DenseMatrix.valueOf(userSize, factorSize);
userFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
itemFactors = DenseMatrix.valueOf(itemSize, factorSize);
itemFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
}
@Override
protected void doPractice() {
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
for (int userIndex = 0; userIndex < userSize; userIndex++) {
SparseVector userVector = scoreMatrix.getRowVector(userIndex);
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float userFactor = 0F;
float numerator = 0F;
float denominator = 0F;
for (VectorScalar term : userVector) {
int itemIndex = term.getIndex();
numerator += (term.getValue() + userFactors.getValue(userIndex, factorIndex) * itemFactors.getValue(itemIndex, factorIndex)) * itemFactors.getValue(itemIndex, factorIndex);
denominator += itemFactors.getValue(itemIndex, factorIndex) * itemFactors.getValue(itemIndex, factorIndex);
}
userFactor = numerator / (denominator + userRegularization);
for (VectorScalar term : userVector) {
int itemIndex = term.getIndex();
term.setValue(term.getValue() - (userFactor - userFactors.getValue(userIndex, factorIndex)) * itemFactors.getValue(itemIndex, factorIndex));
}
userFactors.setValue(userIndex, factorIndex, userFactor);
}
}
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
SparseVector itemVector = scoreMatrix.getColumnVector(itemIndex);
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float itemFactor = 0F;
float numerator = 0F;
float denominator = 0F;
for (VectorScalar term : itemVector) {
int userIndex = term.getIndex();
numerator += (term.getValue() + userFactors.getValue(userIndex, factorIndex) * itemFactors.getValue(itemIndex, factorIndex)) * userFactors.getValue(userIndex, factorIndex);
denominator += userFactors.getValue(userIndex, factorIndex) * userFactors.getValue(userIndex, factorIndex);
}
itemFactor = numerator / (denominator + itemRegularization);
for (VectorScalar term : itemVector) {
int userIndex = term.getIndex();
term.setValue(term.getValue() - (itemFactor - itemFactors.getValue(itemIndex, factorIndex)) * userFactors.getValue(userIndex, factorIndex));
}
itemFactors.setValue(itemIndex, factorIndex, itemFactor);
}
}
logger.info(StringUtility.format("{} runs at iter {}/{} {}", this.getClass().getSimpleName(), epocheIndex, epocheSize, new Date()));
}
}
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
DefaultScalar scalar = DefaultScalar.getInstance();
float score = scalar.dotProduct(userFactors.getRowVector(userIndex), itemFactors.getRowVector(itemIndex)).getValue();
if (score == 0F) {
score = meanScore;
}
instance.setQuantityMark(score);
}
}
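
The inner loops above are the one-variable closed-form updates of cyclic coordinate descent; writing R for scoreMatrix, which here holds residuals and is updated in place, a single user coordinate u_{uf} is refreshed as

u_{uf}^{new} = \sum_{i \in \Omega_u} (R_{ui} + u_{uf} v_{if}) v_{if} / (\sum_{i \in \Omega_u} v_{if}^2 + \lambda)

followed by the residual refresh R_{ui} \leftarrow R_{ui} - (u_{uf}^{new} - u_{uf}) v_{if} for every rated item i; the item-side update is symmetric.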

View File

@ -0,0 +1,150 @@
package com.jstarcraft.rns.model.collaborative.rating;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.vector.MathVector;
import com.jstarcraft.ai.math.structure.vector.VectorScalar;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.rns.model.FactorizationMachineModel;
/**
*
* FFM Recommender
*
* <pre>
* Field Aware Factorization Machines for CTR Prediction
* Based on the LibRec team's implementation
* </pre>
*
* @author Birdy
*
*/
public class FFMModel extends FactorizationMachineModel {
/**
* record the <feature: field> mapping
*/
private int[] featureOrders;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
// Matrix for p * (factor * field)
// TODO this should still be sparse
featureFactors = DenseMatrix.valueOf(featureSize, factorSize * marker.getQualityOrder());
featureFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
// init the mapping from feature to field
featureOrders = new int[featureSize];
int count = 0;
for (int orderIndex = 0, orderSize = dimensionSizes.length; orderIndex < orderSize; orderIndex++) {
int size = dimensionSizes[orderIndex];
for (int index = 0; index < size; index++) {
featureOrders[count + index] = orderIndex;
}
count += size;
}
}
@Override
protected void doPractice() {
DefaultScalar scalar = DefaultScalar.getInstance();
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
totalError = 0F;
int outerIndex = 0;
int innerIndex = 0;
float outerValue = 0F;
float innerValue = 0F;
float oldWeight = 0F;
float newWeight = 0F;
float oldFactor = 0F;
float newFactor = 0F;
for (DataInstance sample : marker) {
// TODO since the feature values are always 1, consider avoiding rebuilding featureVector each time.
MathVector featureVector = getFeatureVector(sample);
float score = sample.getQuantityMark();
float predict = predict(scalar, featureVector);
float error = predict - score;
totalError += error * error;
// global bias
totalError += biasRegularization * globalBias * globalBias;
// update w0
float hW0 = 1;
float gradW0 = error * hW0 + biasRegularization * globalBias;
globalBias += -learnRatio * gradW0;
// 1-way interactions
for (VectorScalar outerTerm : featureVector) {
outerIndex = outerTerm.getIndex();
innerIndex = 0;
oldWeight = weightVector.getValue(outerIndex);
newWeight = outerTerm.getValue();
newWeight = error * newWeight + weightRegularization * oldWeight;
weightVector.shiftValue(outerIndex, -learnRatio * newWeight);
totalError += weightRegularization * oldWeight * oldWeight;
outerValue = outerTerm.getValue();
innerValue = 0F;
// 2-way interactions
// column layout assumed: field * factorSize + factor
for (VectorScalar innerTerm : featureVector) {
innerIndex = innerTerm.getIndex();
if (innerIndex == outerIndex) {
continue;
}
innerValue = innerTerm.getValue();
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
// update v[outer][field(inner)] along the direction of v[inner][field(outer)]
oldFactor = featureFactors.getValue(outerIndex, featureOrders[innerIndex] * factorSize + factorIndex);
newFactor = error * featureFactors.getValue(innerIndex, featureOrders[outerIndex] * factorSize + factorIndex) * outerValue * innerValue + factorRegularization * oldFactor;
featureFactors.shiftValue(outerIndex, featureOrders[innerIndex] * factorSize + factorIndex, -learnRatio * newFactor);
totalError += factorRegularization * oldFactor * oldFactor;
}
}
}
}
totalError *= 0.5;
if (isConverged(epocheIndex) && isConverged) {
break;
}
currentError = totalError;
}
}
@Override
protected float predict(DefaultScalar scalar, MathVector featureVector) {
float value = 0F;
// global bias
value += globalBias;
// 1-way interaction
value += scalar.dotProduct(weightVector, featureVector).getValue();
int outerIndex = 0;
int innerIndex = 0;
float outerValue = 0F;
float innerValue = 0F;
// 2-way interaction
for (int featureIndex = 0; featureIndex < factorSize; featureIndex++) {
for (VectorScalar outerVector : featureVector) {
outerIndex = outerVector.getIndex();
outerValue = outerVector.getValue();
for (VectorScalar innerVector : featureVector) {
innerIndex = innerVector.getIndex();
innerValue = innerVector.getValue();
if (outerIndex < innerIndex) {
// count each unordered pair once; column layout: field * factorSize + factor
value += featureFactors.getValue(outerIndex, featureOrders[innerIndex] * factorSize + featureIndex) * featureFactors.getValue(innerIndex, featureOrders[outerIndex] * factorSize + featureIndex) * outerValue * innerValue;
}
}
}
}
return value;
}
}
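Note: the field-aware term FFMModel learns is y(x) = w0 + sum_i w_i x_i + sum_{i<j} <v_{i,f(j)}, v_{j,f(i)}> x_i x_j. A minimal standalone sketch of that pairwise sum, using plain arrays instead of the JStarCraft matrix types; the class name and values are illustrative, not part of the commit:
public final class FfmScoreSketch {
/**
 * Field-aware pairwise score: sum over i < j of <v[i][field(j)], v[j][field(i)]> * x[i] * x[j].
 * Sketch only; factors[feature] is laid out as field * factorSize + factor.
 */
static float pairwiseScore(float[] x, float[][] factors, int[] fields, int factorSize) {
float value = 0F;
for (int i = 0; i < x.length; i++) {
for (int j = i + 1; j < x.length; j++) {
float dot = 0F;
for (int k = 0; k < factorSize; k++) {
dot += factors[i][fields[j] * factorSize + k] * factors[j][fields[i] * factorSize + k];
}
value += dot * x[i] * x[j];
}
}
return value;
}
public static void main(String[] arguments) {
// two features, two fields, two factors each (hypothetical numbers)
float[] x = { 1F, 1F };
float[][] factors = { { 0.1F, 0.2F, 0.3F, 0.4F }, { 0.5F, 0.6F, 0.7F, 0.8F } };
int[] fields = { 0, 1 };
// <v[0][field 1], v[1][field 0]> = 0.3 * 0.5 + 0.4 * 0.6 = 0.39
System.out.println(pairwiseScore(x, factors, fields, 2));
}
}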

View File

@ -0,0 +1,188 @@
package com.jstarcraft.rns.model.collaborative.rating;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.matrix.HashMatrix;
import com.jstarcraft.ai.math.structure.matrix.SparseMatrix;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.ai.math.structure.vector.MathVector;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.ai.math.structure.vector.VectorScalar;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.rns.model.FactorizationMachineModel;
import it.unimi.dsi.fastutil.longs.Long2FloatRBTreeMap;
/**
*
 * FM ALS recommender
*
* <pre>
* Factorization Machines via Alternating Least Square
 * Refer to the LibRec team
* </pre>
*
* @author Birdy
*
*/
public class FMALSModel extends FactorizationMachineModel {
/**
* train appender matrix
*/
private SparseMatrix featureMatrix;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
// init Q
// TODO These are effectively the rateFactors
actionFactors = DenseMatrix.valueOf(actionSize, factorSize);
// construct training appender matrix
HashMatrix table = new HashMatrix(true, actionSize, featureSize, new Long2FloatRBTreeMap());
int index = 0;
int order = marker.getQualityOrder();
for (DataInstance sample : model) {
int count = 0;
for (int orderIndex = 0; orderIndex < order; orderIndex++) {
table.setValue(index, count + sample.getQualityFeature(orderIndex), 1F);
count += dimensionSizes[orderIndex];
}
index++;
}
// TODO Consider refactoring (this appears to duplicate FactorizationMachineRecommender.getFeatureVector).
featureMatrix = SparseMatrix.valueOf(actionSize, featureSize, table);
}
@Override
protected void doPractice() {
DefaultScalar scalar = DefaultScalar.getInstance();
// precomputing Q and errors, for efficiency
DenseVector errorVector = DenseVector.valueOf(actionSize);
int index = 0;
for (DataInstance sample : marker) {
// TODO Since every datum value is 1, consider avoiding rebuilding featureVector each time.
MathVector featureVector = getFeatureVector(sample);
float score = sample.getQuantityMark();
float predict = predict(scalar, featureVector);
float error = score - predict;
errorVector.setValue(index, error);
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float sum = 0F;
for (VectorScalar vectorTerm : featureVector) {
sum += featureFactors.getValue(vectorTerm.getIndex(), factorIndex) * vectorTerm.getValue();
}
actionFactors.setValue(index, factorIndex, sum);
}
index++;
}
/**
 * Parameters are optimized using the formulas in [1]; errors are updated via
 * error_new = error_old + theta_old * h_old - theta_new * h_new.
 * Reference: [1] Rendle, Steffen, "Factorization Machines with libFM." ACM
 * Transactions on Intelligent Systems and Technology, 2012.
 */
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
totalError = 0F;
// global bias
float numerator = 0F;
float denominator = 0F;
for (int scoreIndex = 0; scoreIndex < actionSize; scoreIndex++) {
// TODO Since this effectively iterates the featureVector of trainTensor, h_theta is simply 1F.
float h_theta = 1F;
numerator += globalBias * h_theta * h_theta + h_theta * errorVector.getValue(scoreIndex);
denominator += h_theta * h_theta;
}
denominator += biasRegularization;
float bias = numerator / denominator;
// update errors
for (int scoreIndex = 0; scoreIndex < actionSize; scoreIndex++) {
float oldError = errorVector.getValue(scoreIndex);
float newError = oldError + (globalBias - bias);
errorVector.setValue(scoreIndex, newError);
totalError += oldError * oldError;
}
// update w0
globalBias = bias;
totalError += biasRegularization * globalBias * globalBias;
// 1-way interactions
for (int featureIndex = 0; featureIndex < featureSize; featureIndex++) {
float oldWeight = weightVector.getValue(featureIndex);
numerator = 0F;
denominator = 0F;
// TODO 考虑重构
SparseVector featureVector = featureMatrix.getColumnVector(featureIndex);
for (VectorScalar vectorTerm : featureVector) {
int scoreIndex = vectorTerm.getIndex();
float h_theta = vectorTerm.getValue();
numerator += oldWeight * h_theta * h_theta + h_theta * errorVector.getValue(scoreIndex);
denominator += h_theta * h_theta;
}
denominator += weightRegularization;
float newWeight = numerator / denominator;
// update errors
for (VectorScalar vectorTerm : featureVector) {
int scoreIndex = vectorTerm.getIndex();
float oldError = errorVector.getValue(scoreIndex);
float newError = oldError + (oldWeight - newWeight) * vectorTerm.getValue();
errorVector.setValue(scoreIndex, newError);
}
// update W
weightVector.setValue(featureIndex, newWeight);
totalError += weightRegularization * oldWeight * oldWeight;
}
// 2-way interactions
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
for (int featureIndex = 0; featureIndex < featureSize; featureIndex++) {
float oldValue = featureFactors.getValue(featureIndex, factorIndex);
numerator = 0F;
denominator = 0F;
SparseVector featureVector = featureMatrix.getColumnVector(featureIndex);
for (VectorScalar vectorTerm : featureVector) {
int scoreIndex = vectorTerm.getIndex();
float x_val = vectorTerm.getValue();
float h_theta = x_val * (actionFactors.getValue(scoreIndex, factorIndex) - oldValue * x_val);
numerator += oldValue * h_theta * h_theta + h_theta * errorVector.getValue(scoreIndex);
denominator += h_theta * h_theta;
}
denominator += factorRegularization;
float newValue = numerator / denominator;
// update errors and Q
for (VectorScalar vectorTerm : featureVector) {
int scoreIndex = vectorTerm.getIndex();
float x_val = vectorTerm.getValue();
float oldScore = actionFactors.getValue(scoreIndex, factorIndex);
float newScore = oldScore + (newValue - oldValue) * x_val;
float h_theta_old = x_val * (oldScore - oldValue * x_val);
float h_theta_new = x_val * (newScore - newValue * x_val);
float oldError = errorVector.getValue(scoreIndex);
float newError = oldError + oldValue * h_theta_old - newValue * h_theta_new;
errorVector.setValue(scoreIndex, newError);
actionFactors.setValue(scoreIndex, factorIndex, newScore);
}
// update V
featureFactors.setValue(featureIndex, factorIndex, newValue);
totalError += factorRegularization * oldValue * oldValue;
}
}
if (isConverged(epocheIndex) && isConverged) {
break;
}
currentError = totalError;
}
}
}
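For reference, each ALS coordinate step above solves a one-dimensional least squares problem in closed form, theta* = (sum h*h*theta + sum h*e) / (sum h*h + lambda), and then refreshes the residuals incrementally. A self-contained sketch of one such step on plain arrays; names and values are illustrative, not part of the commit:
public final class AlsStepSketch {
/**
 * Closed-form ALS step for one parameter theta with per-sample multiplier h,
 * followed by the incremental error update e += (theta_old - theta_new) * h.
 */
static float update(float theta, float[] h, float[] errors, float lambda) {
float numerator = 0F, denominator = 0F;
for (int index = 0; index < h.length; index++) {
numerator += theta * h[index] * h[index] + h[index] * errors[index];
denominator += h[index] * h[index];
}
float next = numerator / (denominator + lambda);
for (int index = 0; index < h.length; index++) {
errors[index] += (theta - next) * h[index];
}
return next;
}
public static void main(String[] arguments) {
float[] h = { 1F, 1F }; // h_theta = 1 for the global bias
float[] errors = { 0.5F, -0.3F }; // residuals score - predict (hypothetical)
System.out.println(update(0F, h, errors, 0.1F)); // shifts the bias toward the mean residual
}
}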

View File

@ -0,0 +1,88 @@
package com.jstarcraft.rns.model.collaborative.rating;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.vector.MathVector;
import com.jstarcraft.ai.math.structure.vector.VectorScalar;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.rns.model.FactorizationMachineModel;
/**
*
 * FM SGD recommender
*
* <pre>
* Factorization Machines via Stochastic Gradient Descent with Square Loss
 * Refer to the LibRec team
* </pre>
*
* @author Birdy
*
*/
public class FMSGDModel extends FactorizationMachineModel {
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
}
@Override
protected void doPractice() {
DefaultScalar scalar = DefaultScalar.getInstance();
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
totalError = 0F;
for (DataInstance sample : marker) {
// TODO Since every datum value is 1, consider avoiding rebuilding featureVector each time.
MathVector featureVector = getFeatureVector(sample);
float score = sample.getQuantityMark();
float predict = predict(scalar, featureVector);
float error = predict - score;
totalError += error * error;
// global bias
totalError += biasRegularization * globalBias * globalBias;
// TODO Since this effectively iterates the featureVector of trainTensor, hW0 is simply 1F.
float hW0 = 1F;
float bias = error * hW0 + biasRegularization * globalBias;
// update w0
globalBias += -learnRatio * bias;
// 1-way interactions
for (VectorScalar outerTerm : featureVector) {
int outerIndex = outerTerm.getIndex();
float oldWeight = weightVector.getValue(outerIndex);
float featureWeight = outerTerm.getValue();
float newWeight = error * featureWeight + weightRegularization * oldWeight;
weightVector.shiftValue(outerIndex, -learnRatio * newWeight);
totalError += weightRegularization * oldWeight * oldWeight;
// 2-way interactions
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float oldValue = featureFactors.getValue(outerIndex, factorIndex);
float newValue = 0F;
for (VectorScalar innerTerm : featureVector) {
int innerIndex = innerTerm.getIndex();
if (innerIndex != outerIndex) {
newValue += featureWeight * featureFactors.getValue(innerIndex, factorIndex) * innerTerm.getValue();
}
}
newValue = error * newValue + factorRegularization * oldValue;
featureFactors.shiftValue(outerIndex, factorIndex, -learnRatio * newValue);
totalError += factorRegularization * oldValue * oldValue;
}
}
}
totalError *= 0.5F;
if (isConverged(epocheIndex) && isConverged) {
break;
}
currentError = totalError;
}
}
}
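The nested loop above recomputes sum_{j != i} v[j][f] * x[j] for every feature. Rendle's O(kn) identity lets one precompute s[f] = sum_j v[j][f] * x[j] once per sample, so the gradient for v[i][f] becomes x[i] * s[f] - v[i][f] * x[i]^2. A hedged sketch of that cheaper gradient on plain arrays; names are illustrative, not part of the commit:
public final class FmGradientSketch {
/** Gradient of the 2-way FM term w.r.t. v[i][f], given the precomputed sum s[f] = sum_j v[j][f] * x[j]. */
static float factorGradient(float[] x, float[][] v, float[] s, int i, int f) {
return x[i] * s[f] - v[i][f] * x[i] * x[i];
}
public static void main(String[] arguments) {
float[] x = { 1F, 2F };
float[][] v = { { 0.1F }, { 0.2F } };
float[] s = { v[0][0] * x[0] + v[1][0] * x[1] }; // 0.5
// equals x[0] * (v[1][0] * x[1]) = 0.4, matching the naive sum over j != i
System.out.println(factorGradient(x, v, s, 0, 0));
}
}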

View File

@ -0,0 +1,224 @@
package com.jstarcraft.rns.model.collaborative.rating;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.MathUtility;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.matrix.MatrixScalar;
import com.jstarcraft.ai.math.structure.table.SparseTable;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.ai.math.structure.vector.VectorScalar;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.Float2FloatKeyValue;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.rns.model.ProbabilisticGraphicalModel;
import com.jstarcraft.rns.utility.GaussianUtility;
import it.unimi.dsi.fastutil.ints.Int2ObjectRBTreeMap;
/**
*
 * GPLSA recommender
*
* <pre>
* Collaborative Filtering via Gaussian Probabilistic Latent Semantic Analysis
 * Refer to the LibRec team
* </pre>
*
* @author Birdy
*
*/
public class GPLSAModel extends ProbabilisticGraphicalModel {
/*
* {user, item, {topic z, probability}}
*/
protected SparseTable<float[]> probabilityTensor;
/*
* Conditional Probability: P(z|u)
*/
protected DenseMatrix userTopicProbabilities;
/*
* Conditional Probability: P(v|y,z)
*/
protected DenseMatrix itemMus, itemSigmas;
/*
* regularize ratings
*/
protected DenseVector userMus, userSigmas;
/*
* smoothing weight
*/
protected float smoothWeight;
/*
* tempered EM parameter beta, suggested by Wu Bin
*/
protected float beta;
/*
* small value for initialization
*/
protected static float smallValue = MathUtility.EPSILON;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
// Initialize users' conditional probabilities
userTopicProbabilities = DenseMatrix.valueOf(userSize, factorSize);
for (int userIndex = 0; userIndex < userSize; userIndex++) {
DenseVector probabilityVector = userTopicProbabilities.getRowVector(userIndex);
probabilityVector.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(RandomUtility.randomInteger(factorSize) + 1);
});
probabilityVector.scaleValues(1F / probabilityVector.getSum(false));
}
Float2FloatKeyValue keyValue = scoreMatrix.getVariance();
float mean = keyValue.getKey();
float variance = keyValue.getValue() / scoreMatrix.getElementSize();
userMus = DenseVector.valueOf(userSize);
userSigmas = DenseVector.valueOf(userSize);
smoothWeight = configuration.getInteger("recommender.recommender.smoothWeight");
for (int userIndex = 0; userIndex < userSize; userIndex++) {
SparseVector userVector = scoreMatrix.getRowVector(userIndex);
int size = userVector.getElementSize();
if (size < 1) {
continue;
}
float mu = (userVector.getSum(false) + smoothWeight * mean) / (size + smoothWeight);
userMus.setValue(userIndex, mu);
float sigma = userVector.getVariance(mu);
sigma += smoothWeight * variance;
sigma = (float) Math.sqrt(sigma / (size + smoothWeight));
userSigmas.setValue(userIndex, sigma);
}
// Initialize Q
// TODO Refactor
probabilityTensor = new SparseTable<>(true, userSize, itemSize, new Int2ObjectRBTreeMap<>());
for (MatrixScalar term : scoreMatrix) {
int userIndex = term.getRow();
int itemIndex = term.getColumn();
float score = term.getValue();
score = (score - userMus.getValue(userIndex)) / userSigmas.getValue(userIndex);
term.setValue(score);
probabilityTensor.setValue(userIndex, itemIndex, new float[factorSize]);
}
itemMus = DenseMatrix.valueOf(itemSize, factorSize);
itemSigmas = DenseMatrix.valueOf(itemSize, factorSize);
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
SparseVector itemVector = scoreMatrix.getColumnVector(itemIndex);
int size = itemVector.getElementSize();
if (size < 1) {
continue;
}
float mu = itemVector.getSum(false) / size;
float sigma = itemVector.getVariance(mu);
sigma = (float) Math.sqrt(sigma / size);
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
itemMus.setValue(itemIndex, topicIndex, mu + smallValue * RandomUtility.randomFloat(1F));
itemSigmas.setValue(itemIndex, topicIndex, sigma + smallValue * RandomUtility.randomFloat(1F));
}
}
}
@Override
protected void eStep() {
// variational inference to compute Q
float[] numerators = new float[factorSize];
for (MatrixScalar term : scoreMatrix) {
int userIndex = term.getRow();
int itemIndex = term.getColumn();
float score = term.getValue();
float denominator = 0F;
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
float pdf = GaussianUtility.probabilityDensity(score, itemMus.getValue(itemIndex, topicIndex), itemSigmas.getValue(itemIndex, topicIndex));
// Tempered EM
float value = (float) Math.pow(userTopicProbabilities.getValue(userIndex, topicIndex) * pdf, beta);
numerators[topicIndex] = value;
denominator += value;
}
float[] probabilities = probabilityTensor.getValue(userIndex, itemIndex);
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
float probability = (denominator > 0 ? numerators[topicIndex] / denominator : 0);
probabilities[topicIndex] = probability;
}
}
}
@Override
protected void mStep() {
float[] numerators = new float[factorSize];
// theta_u,z
for (int userIndex = 0; userIndex < userSize; userIndex++) {
SparseVector userVector = scoreMatrix.getRowVector(userIndex);
if (userVector.getElementSize() < 1) {
continue;
}
// accumulate responsibilities per topic (reset for every user)
java.util.Arrays.fill(numerators, 0F);
float denominator = 0F;
for (VectorScalar term : userVector) {
int itemIndex = term.getIndex();
float[] probabilities = probabilityTensor.getValue(userIndex, itemIndex);
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
numerators[topicIndex] += probabilities[topicIndex];
denominator += probabilities[topicIndex];
}
}
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
userTopicProbabilities.setValue(userIndex, topicIndex, numerators[topicIndex] / denominator);
}
}
// topicItemMu, topicItemSigma
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
SparseVector itemVector = scoreMatrix.getColumnVector(itemIndex);
if (itemVector.getElementSize() < 1) {
continue;
}
// estimate mu and sigma separately for each topic, weighted by the responsibilities
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
float numerator = 0F, denominator = 0F;
for (VectorScalar term : itemVector) {
int userIndex = term.getIndex();
float score = term.getValue();
float probability = probabilityTensor.getValue(userIndex, itemIndex)[topicIndex];
numerator += score * probability;
denominator += probability;
}
float mu = denominator > 0F ? numerator / denominator : 0F;
numerator = 0F;
for (VectorScalar term : itemVector) {
int userIndex = term.getIndex();
float score = term.getValue();
float probability = probabilityTensor.getValue(userIndex, itemIndex)[topicIndex];
numerator += (score - mu) * (score - mu) * probability;
}
float sigma = (float) (denominator > 0F ? Math.sqrt(numerator / denominator) : 0F);
itemMus.setValue(itemIndex, topicIndex, mu);
itemSigmas.setValue(itemIndex, topicIndex, sigma);
}
}
}
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
float sum = 0F;
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
sum += userTopicProbabilities.getValue(userIndex, topicIndex) * itemMus.getValue(itemIndex, topicIndex);
}
instance.setQuantityMark(userMus.getValue(userIndex) + userSigmas.getValue(userIndex) * sum);
}
}
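In the E step, the responsibility of topic z for an observed (user, item, score) is proportional to (P(z|u) * N(score; mu, sigma))^beta, the tempered-EM form used above. A minimal sketch of that tempered posterior with a plain Gaussian density; this stands in for GaussianUtility and is illustrative, outside the commit:
public final class TemperedEStepSketch {
static float density(float score, float mu, float sigma) {
double z = (score - mu) / sigma;
return (float) (Math.exp(-0.5 * z * z) / (sigma * Math.sqrt(2 * Math.PI)));
}
/** Tempered responsibilities: q_z is proportional to (prior_z * N(score; mu_z, sigma_z))^beta. */
static float[] responsibilities(float score, float[] priors, float[] mus, float[] sigmas, float beta) {
float[] q = new float[priors.length];
float denominator = 0F;
for (int z = 0; z < priors.length; z++) {
q[z] = (float) Math.pow(priors[z] * density(score, mus[z], sigmas[z]), beta);
denominator += q[z];
}
for (int z = 0; z < priors.length; z++) {
q[z] = denominator > 0F ? q[z] / denominator : 0F;
}
return q;
}
public static void main(String[] arguments) {
float[] q = responsibilities(0.5F, new float[] { 0.5F, 0.5F }, new float[] { 0F, 1F }, new float[] { 1F, 1F }, 1F);
System.out.println(q[0] + " " + q[1]); // symmetric means give equal weights
}
}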

View File

@ -0,0 +1,429 @@
package com.jstarcraft.rns.model.collaborative.rating;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import com.google.common.collect.HashBasedTable;
import com.google.common.collect.Table;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.matrix.HashMatrix;
import com.jstarcraft.ai.math.structure.matrix.MatrixScalar;
import com.jstarcraft.ai.math.structure.matrix.SparseMatrix;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.ai.math.structure.vector.VectorScalar;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.KeyValue;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.rns.model.MatrixFactorizationModel;
import com.jstarcraft.rns.utility.LogisticUtility;
import it.unimi.dsi.fastutil.longs.Long2FloatRBTreeMap;
/**
*
 * IRRG recommender
*
* <pre>
* Exploiting Implicit Item Relationships for Recommender Systems
 * Refer to the LibRec team
* </pre>
*
* @author Birdy
*
*/
public class IRRGModel extends MatrixFactorizationModel {
/** item relationship regularization coefficient */
private float correlationRegularization;
/** adjust the reliability */
// TODO Make this configurable.
private float reliability = 50F;
/** k nearest neighborhoods */
// TODO Make this configurable.
private int neighborSize = 50;
/** store co-occurrence counts between two items. */
@Deprecated
private Table<Integer, Integer, Integer> itemCount = HashBasedTable.create();
/** store item-to-item AR */
@Deprecated
private Table<Integer, Integer, Float> itemCorrsAR = HashBasedTable.create();
/** store sorted item-to-item AR */
@Deprecated
private Table<Integer, Integer, Float> itemCorrsAR_Sorted = HashBasedTable.create();
/** store the complementary item-to-item AR */
@Deprecated
private Table<Integer, Integer, Float> itemCorrsAR_added = HashBasedTable.create();
/** store group-to-item AR */
@Deprecated
private Map<Integer, List<KeyValue<KeyValue<Integer, Integer>, Float>>> itemCorrsGAR = new HashMap<>();
private SparseMatrix complementMatrix;
/** store sorted group-to-item AR */
private Map<Integer, SparseMatrix> itemCorrsGAR_Sorted = new HashMap<>();
// TODO Temporary table, substituting for trainMatrix.getTermValue.
@Deprecated
Table<Integer, Integer, Float> dataTable = HashBasedTable.create();
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
for (MatrixScalar term : scoreMatrix) {
int userIndex = term.getRow();
int itemIndex = term.getColumn();
dataTable.put(userIndex, itemIndex, term.getValue());
}
correlationRegularization = configuration.getFloat("recommender.alpha");
userFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(RandomUtility.randomFloat(0.8F));
});
itemFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(RandomUtility.randomFloat(0.8F));
});
computeAssociationRuleByItem();
sortAssociationRuleByItem();
computeAssociationRuleByGroup();
sortAssociationRuleByGroup();
complementAssociationRule();
complementMatrix = SparseMatrix.valueOf(itemSize, itemSize, itemCorrsAR_added);
}
@Override
protected void doPractice() {
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
totalError = 0F;
DenseMatrix userDeltas = DenseMatrix.valueOf(userSize, factorSize);
DenseMatrix itemDeltas = DenseMatrix.valueOf(itemSize, factorSize);
for (MatrixScalar term : scoreMatrix) {
int userIndex = term.getRow();
int itemIndex = term.getColumn();
float score = term.getValue();
if (score <= 0F) {
continue;
}
float predict = super.predict(userIndex, itemIndex);
float error = LogisticUtility.getValue(predict) - (score - minimumScore) / (maximumScore - minimumScore);
float csgd = LogisticUtility.getGradient(predict) * error;
totalError += error * error;
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float userFactor = userFactors.getValue(userIndex, factorIndex);
float itemFactor = itemFactors.getValue(itemIndex, factorIndex);
userDeltas.shiftValue(userIndex, factorIndex, csgd * itemFactor + userRegularization * userFactor);
itemDeltas.shiftValue(itemIndex, factorIndex, csgd * userFactor + itemRegularization * itemFactor);
totalError += userRegularization * userFactor * userFactor + itemRegularization * itemFactor * itemFactor;
}
}
// complementary item-to-item AR
for (int leftItemIndex = 0; leftItemIndex < itemSize; leftItemIndex++) {
SparseVector itemVector = complementMatrix.getColumnVector(leftItemIndex);
for (VectorScalar term : itemVector) {
int rightItemIndex = term.getIndex();
float skj = term.getValue();
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float ekj = itemFactors.getValue(leftItemIndex, factorIndex) - itemFactors.getValue(rightItemIndex, factorIndex);
itemDeltas.shiftValue(leftItemIndex, factorIndex, correlationRegularization * skj * ekj);
totalError += correlationRegularization * skj * ekj * ekj;
}
}
itemVector = complementMatrix.getRowVector(leftItemIndex);
for (VectorScalar term : itemVector) {
int rightItemIndex = term.getIndex();
float sjg = term.getValue();
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float ejg = itemFactors.getValue(leftItemIndex, factorIndex) - itemFactors.getValue(rightItemIndex, factorIndex);
itemDeltas.shiftValue(leftItemIndex, factorIndex, correlationRegularization * sjg * ejg);
}
}
}
// group-to-item AR
for (Entry<Integer, SparseMatrix> leftKeyValue : itemCorrsGAR_Sorted.entrySet()) {
int leftItemIndex = leftKeyValue.getKey();
SparseMatrix leftTable = leftKeyValue.getValue();
for (MatrixScalar term : leftTable) {
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float egkj = (float) (itemFactors.getValue(leftItemIndex, factorIndex) - (itemFactors.getValue(term.getRow(), factorIndex) + itemFactors.getValue(term.getColumn(), factorIndex)) / Math.sqrt(2F));
float egkj_1 = correlationRegularization * term.getValue() * egkj;
itemDeltas.shiftValue(leftItemIndex, factorIndex, egkj_1);
totalError += egkj_1 * egkj;
}
}
for (Entry<Integer, SparseMatrix> rightKeyValue : itemCorrsGAR_Sorted.entrySet()) {
int rightItemIndex = rightKeyValue.getKey();
if (rightItemIndex != leftItemIndex) {
SparseMatrix rightTable = rightKeyValue.getValue();
SparseVector itemVector = rightTable.getRowVector(leftItemIndex);
for (VectorScalar term : itemVector) {
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float ejgk = (float) (itemFactors.getValue(rightItemIndex, factorIndex) - (itemFactors.getValue(leftItemIndex, factorIndex) + itemFactors.getValue(term.getIndex(), factorIndex)) / Math.sqrt(2F));
float ejgk_1 = (float) (-correlationRegularization * term.getValue() * ejgk / Math.sqrt(2F));
itemDeltas.shiftValue(leftItemIndex, factorIndex, ejgk_1);
}
}
}
}
}
userFactors.addMatrix(userDeltas.scaleValues(-learnRatio), false);
itemFactors.addMatrix(itemDeltas.scaleValues(-learnRatio), false);
totalError *= 0.5F;
if (isConverged(epocheIndex) && isConverged) {
break;
}
isLearned(epocheIndex);
currentError = totalError;
}
}
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
float score = super.predict(userIndex, itemIndex);
score = LogisticUtility.getValue(score);
score = minimumScore + score * (maximumScore - minimumScore);
instance.setQuantityMark(score);
}
/**
 * Compute association rules between items.
*/
private void computeAssociationRuleByItem() {
// TODO See the similarity computation in Abstract.getScoreList.
for (int leftItemIndex = 0; leftItemIndex < itemSize; leftItemIndex++) {
if (scoreMatrix.getColumnScope(leftItemIndex) == 0) {
continue;
}
SparseVector itemVector = scoreMatrix.getColumnVector(leftItemIndex);
int total = itemVector.getElementSize();
for (int rightItemIndex = 0; rightItemIndex < itemSize; rightItemIndex++) {
if (leftItemIndex == rightItemIndex) {
continue;
}
float coefficient = 0F;
int count = 0;
for (VectorScalar term : itemVector) {
int userIndex = term.getIndex();
if (dataTable.contains(userIndex, rightItemIndex)) {
count++;
}
}
float shrink = count / (count + reliability);
coefficient = shrink * count / total;
if (coefficient > 0F) {
itemCorrsAR.put(leftItemIndex, rightItemIndex, coefficient);
itemCount.put(leftItemIndex, rightItemIndex, count);
}
}
}
}
/**
 * Sort the association rules.
*/
private void sortAssociationRuleByItem() {
for (int leftItemIndex : itemCorrsAR.columnKeySet()) {
int size = itemCorrsAR.column(leftItemIndex).size();
float temp[][] = new float[size][3];
int flag = 0;
for (int rightItemIndex : itemCorrsAR.column(leftItemIndex).keySet()) {
temp[flag][0] = rightItemIndex;
temp[flag][1] = leftItemIndex;
temp[flag][2] = itemCorrsAR.get(rightItemIndex, leftItemIndex);
flag++;
}
if (size > neighborSize) {
// partial selection sort of the k nearest neighbors
for (int i = 0; i < neighborSize; i++) {
for (int j = i + 1; j < size; j++) {
if (temp[i][2] < temp[j][2]) {
for (int k = 0; k < 3; k++) {
float trans = temp[i][k];
temp[i][k] = temp[j][k];
temp[j][k] = trans;
}
}
}
}
storeAssociationRule(neighborSize, temp);
} else {
storeAssociationRule(size, temp);
}
}
}
/**
 * Store the association rules.
*
* @param size
* @param temp
*/
private void storeAssociationRule(int size, float temp[][]) {
for (int i = 0; i < size; i++) {
int leftItemIndex = (int) (temp[i][0]);
int rightItemIndex = (int) (temp[i][1]);
itemCorrsAR_Sorted.put(leftItemIndex, rightItemIndex, temp[i][2]);
}
}
/**
* Find out itemsets which contain three items and store them into mylist.
*/
private void computeAssociationRuleByGroup() {
for (int groupIndex : itemCorrsAR.columnKeySet()) {
Integer[] itemIndexes = itemCorrsAR_Sorted.column(groupIndex).keySet().toArray(new Integer[] {});
LinkedList<KeyValue<Integer, Integer>> groupItemList = new LinkedList<>();
for (int leftIndex = 0; leftIndex < itemIndexes.length - 1; leftIndex++) {
for (int rightIndex = leftIndex + 1; rightIndex < itemIndexes.length; rightIndex++) {
if (itemCount.contains(itemIndexes[leftIndex], itemIndexes[rightIndex])) {
groupItemList.add(new KeyValue<>(itemIndexes[leftIndex], itemIndexes[rightIndex]));
}
}
}
computeAssociationRuleByGroup(groupIndex, groupItemList);
}
}
/**
* Compute group-to-item AR and store them into map itemCorrsGAR
*/
private void computeAssociationRuleByGroup(int groupIndex, LinkedList<KeyValue<Integer, Integer>> itemList) {
List<KeyValue<KeyValue<Integer, Integer>, Float>> coefficientList = new LinkedList<>();
for (KeyValue<Integer, Integer> keyValue : itemList) {
int leftIndex = keyValue.getKey();
int rightIndex = keyValue.getValue();
SparseVector groupVector = scoreMatrix.getColumnVector(groupIndex);
int count = 0;
for (VectorScalar term : groupVector) {
int userIndex = term.getIndex();
if (dataTable.contains(userIndex, leftIndex) && dataTable.contains(userIndex, rightIndex)) {
count++;
}
}
if (count > 0) {
float shrink = count / (count + reliability);
int co_bc = itemCount.get(leftIndex, rightIndex);
float coefficient = shrink * (count + 0F) / co_bc;
coefficientList.add(new KeyValue<>(keyValue, coefficient));
}
}
itemCorrsGAR.put(groupIndex, new ArrayList<>(coefficientList));
}
/**
* Order group-to-item AR and store them into map itemCorrsGAR_Sorted
*/
private void sortAssociationRuleByGroup() {
for (int groupIndex : itemCorrsGAR.keySet()) {
List<KeyValue<KeyValue<Integer, Integer>, Float>> list = itemCorrsGAR.get(groupIndex);
if (list.size() > neighborSize) {
Collections.sort(list, (left, right) -> {
return right.getValue().compareTo(left.getValue());
});
list = list.subList(0, neighborSize);
}
HashMatrix groupTable = new HashMatrix(true, itemSize, itemSize, new Long2FloatRBTreeMap());
for (KeyValue<KeyValue<Integer, Integer>, Float> keyValue : list) {
int leftItemIndex = keyValue.getKey().getKey();
int rightItemIndex = keyValue.getKey().getValue();
float correlation = keyValue.getValue();
groupTable.setValue(leftItemIndex, rightItemIndex, correlation);
}
itemCorrsGAR_Sorted.put(groupIndex, SparseMatrix.valueOf(itemSize, itemSize, groupTable));
}
}
/**
 * Select item-to-item AR to complement group-to-item AR.
 */
private void complementAssociationRule() {
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
if (scoreMatrix.getColumnScope(itemIndex) == 0) {
continue;
}
SparseMatrix groupTable = itemCorrsGAR_Sorted.get(itemIndex);
if (groupTable != null) {
int groupSize = groupTable.getElementSize();
if (groupSize < neighborSize) {
int complementSize = neighborSize - groupSize;
int itemSize = itemCorrsAR_Sorted.column(itemIndex).size();
// TODO Replace with KeyValue.
float[][] trans = new float[itemSize][2];
if (itemSize > complementSize) {
int count = 0;
for (int id : itemCorrsAR_Sorted.column(itemIndex).keySet()) {
float value = itemCorrsAR_Sorted.get(id, itemIndex);
trans[count][0] = id;
trans[count][1] = value;
count++;
}
for (int x = 0; x < complementSize; x++) {
for (int y = x + 1; y < trans.length; y++) {
float x_value = trans[x][1];
float y_value = trans[y][1];
if (x_value < y_value) {
for (int z = 0; z < 2; z++) {
float tran = trans[x][z];
trans[x][z] = trans[y][z];
trans[y][z] = tran;
}
}
}
}
for (int x = 0; x < complementSize; x++) {
int id = (int) (trans[x][0]);
float value = trans[x][1];
itemCorrsAR_added.put(id, itemIndex, value);
}
} else {
storeCAR(itemIndex);
}
}
} else {
storeCAR(itemIndex);
}
}
}
/**
* Function to store complementary item-to-item AR into table itemCorrsAR_added.
*
* @param leftItemIndex
*/
private void storeCAR(int leftItemIndex) {
for (int rightItemIndex : itemCorrsAR_Sorted.column(leftItemIndex).keySet()) {
float value = itemCorrsAR_Sorted.get(rightItemIndex, leftItemIndex);
itemCorrsAR_added.put(rightItemIndex, leftItemIndex, value);
}
}
}
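computeAssociationRuleByItem scores the rule left -> right by a shrunk confidence: (n_lr / (n_lr + reliability)) * (n_lr / n_left), where n_left counts users who rated the left item and n_lr counts users who rated both. The same computation on plain sets, for reference; this sketch assumes Java 9+ for Set.of and is illustrative, not part of the commit:
import java.util.Set;
public final class AssociationRuleSketch {
/** Shrunk confidence of the rule left -> right: (n_lr / (n_lr + reliability)) * (n_lr / n_left). */
static float confidence(Set<Integer> leftUsers, Set<Integer> rightUsers, float reliability) {
int both = 0;
for (int user : leftUsers) {
if (rightUsers.contains(user)) {
both++;
}
}
float shrink = both / (both + reliability);
return shrink * both / leftUsers.size();
}
public static void main(String[] arguments) {
Set<Integer> left = Set.of(1, 2, 3, 4);
Set<Integer> right = Set.of(2, 3, 4, 5);
// both = 3: (3 / 53) * (3 / 4) with the model's default reliability of 50
System.out.println(confidence(left, right, 50F));
}
}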

View File

@ -0,0 +1,79 @@
package com.jstarcraft.rns.model.collaborative.rating;
import java.util.Iterator;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.math.structure.vector.MathVector;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.ai.math.structure.vector.VectorScalar;
import com.jstarcraft.rns.model.collaborative.ItemKNNModel;
/**
*
 * Item KNN recommender
*
* <pre>
 * Refer to the LibRec team
* </pre>
*
* @author Birdy
*
*/
public class ItemKNNRatingModel extends ItemKNNModel {
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
SparseVector userVector = userVectors[userIndex];
MathVector neighbors = itemNeighbors[itemIndex];
if (userVector.getElementSize() == 0 || neighbors.getElementSize() == 0) {
instance.setQuantityMark(meanScore);
return;
}
float sum = 0F, absolute = 0F;
int count = 0;
int leftCursor = 0, rightCursor = 0, leftSize = userVector.getElementSize(), rightSize = neighbors.getElementSize();
Iterator<VectorScalar> leftIterator = userVector.iterator();
VectorScalar leftTerm = leftIterator.next();
Iterator<VectorScalar> rightIterator = neighbors.iterator();
VectorScalar rightTerm = rightIterator.next();
// two-pointer walk: find the indices shared by both sorted vectors
while (leftCursor < leftSize && rightCursor < rightSize) {
if (leftTerm.getIndex() == rightTerm.getIndex()) {
count++;
float correlation = rightTerm.getValue();
float score = leftTerm.getValue();
sum += correlation * (score - itemMeans.getValue(rightTerm.getIndex()));
absolute += Math.abs(correlation);
if (leftIterator.hasNext()) {
leftTerm = leftIterator.next();
}
if (rightIterator.hasNext()) {
rightTerm = rightIterator.next();
}
leftCursor++;
rightCursor++;
} else if (leftTerm.getIndex() > rightTerm.getIndex()) {
if (rightIterator.hasNext()) {
rightTerm = rightIterator.next();
}
rightCursor++;
} else if (leftTerm.getIndex() < rightTerm.getIndex()) {
if (leftIterator.hasNext()) {
leftTerm = leftIterator.next();
}
leftCursor++;
}
}
if (count == 0) {
instance.setQuantityMark(meanScore);
return;
}
instance.setQuantityMark(absolute > 0 ? itemMeans.getValue(itemIndex) + sum / absolute : meanScore);
}
}
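The predict loop above is a two-pointer merge over two index-sorted sparse vectors: the user's rated items and the target item's neighbors. The same pattern on plain arrays, for reference; illustrative, not part of the commit:
public final class SortedIntersectionSketch {
/** Counts the indices present in both sorted arrays by advancing two cursors in lockstep. */
static int countCommon(int[] left, int[] right) {
int count = 0, i = 0, j = 0;
while (i < left.length && j < right.length) {
if (left[i] == right[j]) {
count++;
i++;
j++;
} else if (left[i] > right[j]) {
j++;
} else {
i++;
}
}
return count;
}
public static void main(String[] arguments) {
System.out.println(countCommon(new int[] { 1, 3, 5, 8 }, new int[] { 2, 3, 8, 9 })); // 2
}
}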

View File

@ -0,0 +1,33 @@
package com.jstarcraft.rns.model.collaborative.rating;
/**
 * Kernel smoother
*
* <pre>
* {@link LLORMAModel}
* </pre>
*
* @author Birdy
*
*/
enum KernelSmoother {
TRIANGULAR_KERNEL, UNIFORM_KERNEL, EPANECHNIKOV_KERNEL, GAUSSIAN_KERNEL;
public float kernelize(float similarity, float width) {
float distance = 1F - similarity;
switch (this) {
case TRIANGULAR_KERNEL:
return Math.max(1F - distance / width, 0F);
case UNIFORM_KERNEL:
return distance < width ? 1F : 0F;
case EPANECHNIKOV_KERNEL:
return (float) Math.max(3F / 4F * (1F - Math.pow(distance / width, 2F)), 0F);
case GAUSSIAN_KERNEL:
return (float) (1F / Math.sqrt(2F * Math.PI) * Math.exp(-0.5F * Math.pow(distance / width, 2F)));
default:
return Math.max(1F - distance / width, 0F);
}
}
}
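Usage note: each constant maps a similarity in [-1, 1] to a locality weight via the distance 1 - similarity. A small demo with hypothetical values, assuming it lives in the same package since the enum is package-private:
public final class KernelSmootherDemo {
public static void main(String[] arguments) {
float similarity = 0.9F; // distance 0.1 (hypothetical)
float width = 0.8F;
// triangular: max(1 - 0.1 / 0.8, 0) = 0.875
System.out.println(KernelSmoother.TRIANGULAR_KERNEL.kernelize(similarity, width));
// epanechnikov: 0.75 * (1 - (0.1 / 0.8)^2) = 0.738...
System.out.println(KernelSmoother.EPANECHNIKOV_KERNEL.kernelize(similarity, width));
}
}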

View File

@ -0,0 +1,291 @@
package com.jstarcraft.rns.model.collaborative.rating;
import java.util.Map.Entry;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.matrix.MatrixScalar;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.rns.model.ProbabilisticGraphicalModel;
import com.jstarcraft.rns.utility.SampleUtility;
import it.unimi.dsi.fastutil.ints.Int2IntRBTreeMap;
/**
*
 * LDCC recommender
*
* <pre>
 * Refer to the LibRec team
* </pre>
*
* @author Birdy
*
*/
public class LDCCModel extends ProbabilisticGraphicalModel {
// TODO Refactor into sparse matrices?
private Int2IntRBTreeMap userTopics, itemTopics; // Zu, Zv
private DenseMatrix userTopicTimes, itemTopicTimes; // Nui, Nvj
private DenseVector userScoreTimes, itemScoreTimes; // Nv
private DenseMatrix topicTimes;
private DenseMatrix topicProbabilities;
private DenseVector userProbabilities;
private DenseVector itemProbabilities;
private int[][][] rateTopicTimes;
private int numberOfUserTopics, numberOfItemTopics;
private float userAlpha, itemAlpha, ratingBeta;
private DenseMatrix userTopicProbabilities, itemTopicProbabilities;
private DenseMatrix userTopicSums, itemTopicSums;
private float[][][] rateTopicProbabilities, rateTopicSums;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
numberOfStatistics = 0;
numberOfUserTopics = configuration.getInteger("recommender.pgm.number.users", 10);
numberOfItemTopics = configuration.getInteger("recommender.pgm.number.items", 10);
userAlpha = configuration.getFloat("recommender.pgm.user.alpha", 1F / numberOfUserTopics);
itemAlpha = configuration.getFloat("recommender.pgm.item.alpha", 1F / numberOfItemTopics);
ratingBeta = configuration.getFloat("recommender.pgm.rating.beta", 1F / actionSize);
userTopicTimes = DenseMatrix.valueOf(userSize, numberOfUserTopics);
itemTopicTimes = DenseMatrix.valueOf(itemSize, numberOfItemTopics);
userScoreTimes = DenseVector.valueOf(userSize);
itemScoreTimes = DenseVector.valueOf(itemSize);
rateTopicTimes = new int[numberOfUserTopics][numberOfItemTopics][actionSize];
topicTimes = DenseMatrix.valueOf(numberOfUserTopics, numberOfItemTopics);
topicProbabilities = DenseMatrix.valueOf(numberOfUserTopics, numberOfItemTopics);
userProbabilities = DenseVector.valueOf(numberOfUserTopics);
itemProbabilities = DenseVector.valueOf(numberOfItemTopics);
userTopics = new Int2IntRBTreeMap();
itemTopics = new Int2IntRBTreeMap();
for (MatrixScalar term : scoreMatrix) {
int userIndex = term.getRow();
int itemIndex = term.getColumn();
float score = term.getValue();
int scoreIndex = scoreIndexes.get(score);
int userTopic = RandomUtility.randomInteger(numberOfUserTopics);
int itemTopic = RandomUtility.randomInteger(numberOfItemTopics);
userTopicTimes.shiftValue(userIndex, userTopic, 1);
userScoreTimes.shiftValue(userIndex, 1);
itemTopicTimes.shiftValue(itemIndex, itemTopic, 1);
itemScoreTimes.shiftValue(itemIndex, 1);
rateTopicTimes[userTopic][itemTopic][scoreIndex]++;
topicTimes.shiftValue(userTopic, itemTopic, 1);
userTopics.put(userIndex * itemSize + itemIndex, userTopic);
itemTopics.put(userIndex * itemSize + itemIndex, itemTopic);
}
// parameters
userTopicSums = DenseMatrix.valueOf(userSize, numberOfUserTopics);
itemTopicSums = DenseMatrix.valueOf(itemSize, numberOfItemTopics);
rateTopicProbabilities = new float[numberOfUserTopics][numberOfItemTopics][actionSize];
rateTopicSums = new float[numberOfUserTopics][numberOfItemTopics][actionSize];
}
@Override
protected void eStep() {
// cache the sampling probabilities
float random = 0F;
for (MatrixScalar term : scoreMatrix) {
int userIndex = term.getRow();
int itemIndex = term.getColumn();
float score = term.getValue();
// TODO Could be refactored
int scoreIndex = scoreIndexes.get(score);
// TODO Could be refactored
// user and item's factors
int userTopic = userTopics.get(userIndex * itemSize + itemIndex);
int itemTopic = itemTopics.get(userIndex * itemSize + itemIndex);
// remove this observation
userTopicTimes.shiftValue(userIndex, userTopic, -1);
userScoreTimes.shiftValue(userIndex, -1);
itemTopicTimes.shiftValue(itemIndex, itemTopic, -1);
itemScoreTimes.shiftValue(itemIndex, -1);
rateTopicTimes[userTopic][itemTopic][scoreIndex]--;
topicTimes.shiftValue(userTopic, itemTopic, -1);
// TODO topicProbabilities could probably be merged with userProbabilities and itemProbabilities.
// Compute P(i, j) and normalize
topicProbabilities.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int row = scalar.getRow();
int column = scalar.getColumn();
// Compute Pmn
float v1 = (userTopicTimes.getValue(userIndex, row) + userAlpha) / (userScoreTimes.getValue(userIndex) + numberOfUserTopics * userAlpha);
float v2 = (itemTopicTimes.getValue(itemIndex, column) + itemAlpha) / (itemScoreTimes.getValue(itemIndex) + numberOfItemTopics * itemAlpha);
float v3 = (rateTopicTimes[row][column][scoreIndex] + ratingBeta) / (topicTimes.getValue(row, column) + actionSize * ratingBeta);
float value = v1 * v2 * v3;
scalar.setValue(value);
});
// Re-sample the user topic from the cumulative probabilities
DefaultScalar sum = DefaultScalar.getInstance();
sum.setValue(0F);
userProbabilities.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int index = scalar.getIndex();
float value = topicProbabilities.getRowVector(index).getSum(false);
sum.shiftValue(value);
scalar.setValue(sum.getValue());
});
userTopic = SampleUtility.binarySearch(userProbabilities, 0, userProbabilities.getElementSize() - 1, RandomUtility.randomFloat(sum.getValue()));
sum.setValue(0F);
itemProbabilities.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int index = scalar.getIndex();
float value = topicProbabilities.getColumnVector(index).getSum(false);
sum.shiftValue(value);
scalar.setValue(sum.getValue());
});
itemTopic = SampleUtility.binarySearch(itemProbabilities, 0, itemProbabilities.getElementSize() - 1, RandomUtility.randomFloat(sum.getValue()));
// Add statistics
userTopicTimes.shiftValue(userIndex, userTopic, 1);
userScoreTimes.shiftValue(userIndex, 1);
itemTopicTimes.shiftValue(itemIndex, itemTopic, 1);
itemScoreTimes.shiftValue(itemIndex, 1);
rateTopicTimes[userTopic][itemTopic][scoreIndex]++;
topicTimes.shiftValue(userTopic, itemTopic, 1);
userTopics.put(userIndex * itemSize + itemIndex, userTopic);
itemTopics.put(userIndex * itemSize + itemIndex, itemTopic);
}
}
@Override
protected void mStep() {
// TODO Auto-generated method stub
}
@Override
protected void readoutParameters() {
for (int userIndex = 0; userIndex < userSize; userIndex++) {
for (int topicIndex = 0; topicIndex < numberOfUserTopics; topicIndex++) {
userTopicSums.shiftValue(userIndex, topicIndex, (userTopicTimes.getValue(userIndex, topicIndex) + userAlpha) / (userScoreTimes.getValue(userIndex) + numberOfUserTopics * userAlpha));
}
}
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
for (int topicIndex = 0; topicIndex < numberOfItemTopics; topicIndex++) {
itemTopicSums.shiftValue(itemIndex, topicIndex, (itemTopicTimes.getValue(itemIndex, topicIndex) + itemAlpha) / (itemScoreTimes.getValue(itemIndex) + numberOfItemTopics * itemAlpha));
}
}
for (int userTopic = 0; userTopic < numberOfUserTopics; userTopic++) {
for (int itemTopic = 0; itemTopic < numberOfItemTopics; itemTopic++) {
for (int scoreIndex = 0; scoreIndex < actionSize; scoreIndex++) {
rateTopicSums[userTopic][itemTopic][scoreIndex] += (rateTopicTimes[userTopic][itemTopic][scoreIndex] + ratingBeta) / (topicTimes.getValue(userTopic, itemTopic) + actionSize * ratingBeta);
}
}
}
numberOfStatistics++;
}
/**
* estimate the model parameters
*/
@Override
protected void estimateParameters() {
float scale = 1F / numberOfStatistics;
// TODO Could be refactored (merge userTopicProbabilities/userTopicSums with itemTopicProbabilities/itemTopicSums)
userTopicProbabilities = DenseMatrix.copyOf(userTopicSums);
userTopicProbabilities.scaleValues(scale);
itemTopicProbabilities = DenseMatrix.copyOf(itemTopicSums);
itemTopicProbabilities.scaleValues(scale);
// TODO Could be refactored (merge rateTopicProbabilities/rateTopicSums)
for (int userTopic = 0; userTopic < numberOfUserTopics; userTopic++) {
for (int itemTopic = 0; itemTopic < numberOfItemTopics; itemTopic++) {
for (int scoreIndex = 0; scoreIndex < actionSize; scoreIndex++) {
rateTopicProbabilities[userTopic][itemTopic][scoreIndex] = rateTopicSums[userTopic][itemTopic][scoreIndex] / numberOfStatistics;
}
}
}
}
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
float value = 0F;
for (Entry<Float, Integer> term : scoreIndexes.entrySet()) {
float score = term.getKey();
int scoreIndex = term.getValue();
float probability = 0F; // P(r|u,v)=\sum_{i,j} P(r|i,j)P(i|u)P(j|v)
for (int userTopic = 0; userTopic < numberOfUserTopics; userTopic++) {
for (int itemTopic = 0; itemTopic < numberOfItemTopics; itemTopic++) {
probability += rateTopicProbabilities[userTopic][itemTopic][scoreIndex] * userTopicProbabilities.getValue(userIndex, userTopic) * itemTopicProbabilities.getValue(itemIndex, itemTopic);
}
}
value += score * probability;
}
instance.setQuantityMark(value);
}
@Override
protected boolean isConverged(int iter) {
// Get the parameters
estimateParameters();
// Compute the perplexity
float sum = 0F;
for (MatrixScalar term : scoreMatrix) {
int userIndex = term.getRow();
int itemIndex = term.getColumn();
float score = term.getValue();
sum += perplexity(userIndex, itemIndex, score);
}
float perplexity = (float) Math.exp(sum / actionSize);
float delta = perplexity - currentError;
if (numberOfStatistics > 1 && delta > 0) {
return true;
}
currentError = perplexity;
return false;
}
private double perplexity(int user, int item, double score) {
int scoreIndex = (int) (score / minimumScore - 1);
// Compute P(r | u, v)
double probability = 0;
for (int userTopic = 0; userTopic < numberOfUserTopics; userTopic++) {
for (int itemTopic = 0; itemTopic < numberOfItemTopics; itemTopic++) {
probability += rateTopicProbabilities[userTopic][itemTopic][scoreIndex] * userTopicProbabilities.getValue(user, userTopic) * itemTopicProbabilities.getValue(item, itemTopic);
}
}
return -Math.log(probability);
}
}
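Re-sampling in eStep draws a topic by accumulating the per-topic weights into a cumulative vector and binary-searching a uniform draw, which is the pattern behind SampleUtility.binarySearch. The idea on plain arrays; a sketch, not the library call:
import java.util.Random;
public final class CumulativeSampleSketch {
/** Draws an index with probability proportional to weights[index], via the cumulative sums. */
static int sample(float[] weights, Random random) {
float[] cumulative = new float[weights.length];
float sum = 0F;
for (int index = 0; index < weights.length; index++) {
sum += weights[index];
cumulative[index] = sum;
}
float draw = random.nextFloat() * sum;
int low = 0, high = cumulative.length - 1;
// find the first index whose cumulative sum exceeds the draw
while (low < high) {
int middle = (low + high) / 2;
if (cumulative[middle] <= draw) {
low = middle + 1;
} else {
high = middle;
}
}
return low;
}
public static void main(String[] arguments) {
float[] weights = { 0.1F, 0.7F, 0.2F };
System.out.println(sample(weights, new Random(42))); // usually 1
}
}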

View File

@ -0,0 +1,161 @@
package com.jstarcraft.rns.model.collaborative.rating;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.matrix.MatrixScalar;
import com.jstarcraft.ai.math.structure.matrix.SparseMatrix;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
/**
*
 * LLORMA learner
*
* <pre>
* Local Low-Rank Matrix Approximation
 * Refer to the LibRec team
* </pre>
*
* @author Birdy
*
*/
public class LLORMALearner extends Thread {
/**
* The unique identifier of the thread.
*/
private int threadId;
/**
* The number of features.
*/
private int numberOfFactors;
/**
* Learning rate parameter.
*/
private float learnRatio;
/**
* The maximum number of iteration.
*/
private int localIteration;
/**
* Regularization factor parameter.
*/
private float userRegularization, itemRegularization;
/**
* User profile in low-rank matrix form.
*/
private DenseMatrix userFactors;
/**
* Item profile in low-rank matrix form.
*/
private DenseMatrix itemFactors;
/**
* The vector containing each user's weight.
*/
private DenseVector userWeights;
/**
* The vector containing each item's weight.
*/
private DenseVector itemWeights;
/**
* The rating matrix used for learning.
*/
private SparseMatrix trainMatrix;
/**
 * Construct a local model for singleton LLORMA.
 *
 * @param threadId           A unique thread ID.
 * @param numberOfFactors    The rank which will be used in this local model.
 * @param learnRatio         Learning rate parameter.
 * @param userRegularization Regularization factor for the user profiles.
 * @param itemRegularization Regularization factor for the item profiles.
 * @param localIteration     The maximum number of iterations.
 * @param userFactors        Initial user profile in low-rank matrix form.
 * @param itemFactors        Initial item profile in low-rank matrix form.
 * @param userWeights        Initial vector containing each user's weight.
 * @param itemWeights        Initial vector containing each item's weight.
 * @param trainMatrix        The rating matrix used for learning.
 */
public LLORMALearner(int threadId, int numberOfFactors, float learnRatio, float userRegularization, float itemRegularization, int localIteration, DenseMatrix userFactors, DenseMatrix itemFactors, DenseVector userWeights, DenseVector itemWeights, SparseMatrix trainMatrix) {
this.threadId = threadId;
this.numberOfFactors = numberOfFactors;
this.learnRatio = learnRatio;
this.userRegularization = userRegularization;
this.itemRegularization = itemRegularization;
this.localIteration = localIteration;
this.userWeights = userWeights;
this.itemWeights = itemWeights;
this.userFactors = userFactors;
this.itemFactors = itemFactors;
this.trainMatrix = trainMatrix;
}
public int getIndex() {
return threadId;
}
/**
* Getter method for user profile of this local model.
*
* @return The user profile of this local model.
*/
public DenseMatrix getUserFactors() {
return userFactors;
}
/**
* Getter method for item profile of this local model.
*
* @return The item profile of this local model.
*/
public DenseMatrix getItemFactors() {
return itemFactors;
}
/**
* Learn this local model based on similar users to the anchor user and similar
* items to the anchor item. Implemented with gradient descent.
*/
@Override
public void run() {
// Learn by Weighted RegSVD
for (int iterationStep = 0; iterationStep < localIteration; iterationStep++) {
for (MatrixScalar term : trainMatrix) {
int userIndex = term.getRow(); // user
int itemIndex = term.getColumn(); // item
float score = term.getValue();
float predict = predict(userIndex, itemIndex);
float error = score - predict;
float weight = userWeights.getValue(userIndex) * itemWeights.getValue(itemIndex);
// update factors
for (int factorIndex = 0; factorIndex < numberOfFactors; factorIndex++) {
float userFactorValue = userFactors.getValue(userIndex, factorIndex);
float itemFactorValue = itemFactors.getValue(itemIndex, factorIndex);
userFactors.shiftValue(userIndex, factorIndex, learnRatio * (error * itemFactorValue * weight - userRegularization * userFactorValue));
itemFactors.shiftValue(itemIndex, factorIndex, learnRatio * (error * userFactorValue * weight - itemRegularization * itemFactorValue));
}
}
}
}
private float predict(int userIndex, int itemIndex) {
DefaultScalar scalar = DefaultScalar.getInstance();
DenseVector userVector = userFactors.getRowVector(userIndex);
DenseVector itemVector = itemFactors.getRowVector(itemIndex);
float value = scalar.dotProduct(userVector, itemVector).getValue();
return value;
}
}
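run() above is plain SGD on a kernel-weighted squared loss: each residual is scaled by the anchor weights w_u * w_v before the usual regularized factor update. One such step in isolation, on plain arrays; names and values are illustrative, not part of the commit:
public final class WeightedSgdStepSketch {
/** One weighted RegSVD step on a single observation: p += lr * (e * w * q - regU * p), and q likewise. */
static void step(float[] p, float[] q, float error, float weight, float learnRatio, float regUser, float regItem) {
for (int k = 0; k < p.length; k++) {
// read both old values before either update
float pk = p[k], qk = q[k];
p[k] += learnRatio * (error * weight * qk - regUser * pk);
q[k] += learnRatio * (error * weight * pk - regItem * qk);
}
}
public static void main(String[] arguments) {
float[] p = { 0.1F, 0.2F };
float[] q = { 0.3F, 0.4F };
step(p, q, 0.5F, 0.9F, 0.01F, 0.05F, 0.05F);
System.out.println(p[0] + " " + q[0]); // both nudged toward reducing the weighted error
}
}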

View File

@ -0,0 +1,270 @@
package com.jstarcraft.rns.model.collaborative.rating;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.matrix.MatrixScalar;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.rns.model.MatrixFactorizationModel;
/**
*
 * LLORMA recommender
*
* <pre>
* Local Low-Rank Matrix Approximation
 * Refer to the LibRec team
* </pre>
*
* @author Birdy
*
*/
public class LLORMAModel extends MatrixFactorizationModel {
private int numberOfGlobalFactors, numberOfLocalFactors;
private int globalEpocheSize, localEpocheSize;
private int numberOfThreads;
private float globalUserRegularization, globalItemRegularization, localUserRegularization, localItemRegularization;
private float globalLearnRatio, localLearnRatio;
private int numberOfModels;
private DenseMatrix globalUserFactors, globalItemFactors;
private DenseMatrix[] userMatrixes;
private DenseMatrix[] itemMatrixes;
private int[] anchorUsers;
private int[] anchorItems;
/*
* (non-Javadoc)
*
* @see net.librecommender.recommender.AbstractRecommender#setup()
*/
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
numberOfGlobalFactors = configuration.getInteger("recommender.global.factors.num", 20);
numberOfLocalFactors = factorSize;
globalEpocheSize = configuration.getInteger("recommender.global.iteration.maximum", 100);
localEpocheSize = epocheSize;
globalUserRegularization = configuration.getFloat("recommender.global.user.regularization", 0.01F);
globalItemRegularization = configuration.getFloat("recommender.global.item.regularization", 0.01F);
localUserRegularization = userRegularization;
localItemRegularization = itemRegularization;
globalLearnRatio = configuration.getFloat("recommender.global.iteration.learnrate", 0.01F);
localLearnRatio = configuration.getFloat("recommender.iteration.learnrate", 0.01F);
numberOfThreads = configuration.getInteger("recommender.thread.count", 4);
numberOfModels = configuration.getInteger("recommender.model.num", 50);
numberOfThreads = numberOfThreads > numberOfModels ? numberOfModels : numberOfThreads;
// global svd P Q to calculate the kernel value between users (or items)
globalUserFactors = DenseMatrix.valueOf(userSize, numberOfGlobalFactors);
globalUserFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
globalItemFactors = DenseMatrix.valueOf(itemSize, numberOfGlobalFactors);
globalItemFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
}
// global svd P Q
private void practiceGlobalModel(DefaultScalar scalar) {
for (int epocheIndex = 0; epocheIndex < globalEpocheSize; epocheIndex++) {
for (MatrixScalar term : scoreMatrix) {
int userIndex = term.getRow(); // user
int itemIndex = term.getColumn(); // item
float score = term.getValue();
// TODO Consider refactoring to avoid rebuilding userVector and itemVector repeatedly
DenseVector userVector = globalUserFactors.getRowVector(userIndex);
DenseVector itemVector = globalItemFactors.getRowVector(itemIndex);
float predict = scalar.dotProduct(userVector, itemVector).getValue();
float error = score - predict;
// update factors
for (int factorIndex = 0; factorIndex < numberOfGlobalFactors; factorIndex++) {
float userFactor = globalUserFactors.getValue(userIndex, factorIndex);
float itemFactor = globalItemFactors.getValue(itemIndex, factorIndex);
globalUserFactors.shiftValue(userIndex, factorIndex, globalLearnRatio * (error * itemFactor - globalUserRegularization * userFactor));
globalItemFactors.shiftValue(itemIndex, factorIndex, globalLearnRatio * (error * userFactor - globalItemRegularization * itemFactor));
}
}
}
userMatrixes = new DenseMatrix[numberOfModels];
itemMatrixes = new DenseMatrix[numberOfModels];
anchorUsers = new int[numberOfModels];
anchorItems = new int[numberOfModels];
// end of training
}
/**
* Calculate similarity between two users, based on the global base SVD.
*
* @param leftUserIndex The first user's ID.
* @param rightUserIndex The second user's ID.
* @return The similarity value between two users idx1 and idx2.
*/
private float getUserSimilarity(DefaultScalar scalar, int leftUserIndex, int rightUserIndex) {
float similarity;
// TODO Avoid rebuilding these vectors repeatedly
DenseVector leftUserVector = globalUserFactors.getRowVector(leftUserIndex);
DenseVector rightUserVector = globalUserFactors.getRowVector(rightUserIndex);
similarity = (float) (1 - 2F / Math.PI * Math.acos(scalar.dotProduct(leftUserVector, rightUserVector).getValue() / (Math.sqrt(scalar.dotProduct(leftUserVector, leftUserVector).getValue()) * Math.sqrt(scalar.dotProduct(rightUserVector, rightUserVector).getValue()))));
if (Float.isNaN(similarity)) {
similarity = 0F;
}
return similarity;
}
/**
* Calculate similarity between two items, based on the global base SVD.
*
* @param leftItemIndex The first item's ID.
* @param rightItemIndex The second item's ID.
* @return The similarity value between two items idx1 and idx2.
*/
private float getItemSimilarity(DefaultScalar scalar, int leftItemIndex, int rightItemIndex) {
float similarity;
// TODO Avoid rebuilding these vectors repeatedly
DenseVector leftItemVector = globalItemFactors.getRowVector(leftItemIndex);
DenseVector rightItemVector = globalItemFactors.getRowVector(rightItemIndex);
similarity = (float) (1 - 2D / Math.PI * Math.acos(scalar.dotProduct(leftItemVector, rightItemVector).getValue() / (Math.sqrt(scalar.dotProduct(leftItemVector, leftItemVector).getValue()) * Math.sqrt(scalar.dotProduct(rightItemVector, rightItemVector).getValue()))));
if (Float.isNaN(similarity)) {
similarity = 0F;
}
return similarity;
}
/**
* Given the similarity, it applies the given kernel. This is done either for
* all users or for all items.
*
* @param size The length of user or item vector.
* @param anchorIdx The identifier of anchor point.
* @param type The type of kernel.
* @param width Kernel width.
* @param isItemFeature return item kernel if yes, return user kernel otherwise.
* @return The kernel-smoothed values for all users or all items.
*/
private DenseVector kernelSmoothing(DefaultScalar scalar, int size, int anchorIdx, KernelSmoother type, float width, boolean isItemFeature) {
DenseVector featureVector = DenseVector.valueOf(size);
// TODO Possible bug: the loop below overwrites this anchor weight anyway.
featureVector.setValue(anchorIdx, 1F);
for (int index = 0; index < size; index++) {
float similarity;
if (isItemFeature) {
similarity = getItemSimilarity(scalar, index, anchorIdx);
} else { // userFeature
similarity = getUserSimilarity(scalar, index, anchorIdx);
}
featureVector.setValue(index, type.kernelize(similarity, width));
}
return featureVector;
}
private void practiceLocalModels(DefaultScalar scalar) {
// Pre-calculating similarity:
int completeModelCount = 0;
// TODO these variables and matrices could be folded into LLORMALearner, turning LLORMALearner into a task.
LLORMALearner[] learners = new LLORMALearner[numberOfThreads];
int modelCount = 0;
int[] runningThreadList = new int[numberOfThreads];
int runningThreadCount = 0;
int waitingThreadPointer = 0;
int nextRunningSlot = 0;
// Parallel training:
while (completeModelCount < numberOfModels) {
int randomUserIndex = RandomUtility.randomInteger(userSize);
// TODO consider refactoring
SparseVector userVector = scoreMatrix.getRowVector(randomUserIndex);
if (userVector.getElementSize() == 0) {
continue;
}
// TODO the concurrency model here is flawed and needs refactoring: once runningThreadCount
// first reaches numberOfThreads, training effectively degrades to single-threaded execution.
if (runningThreadCount < numberOfThreads && modelCount < numberOfModels) {
// Selecting a new anchor point:
int randomItemIndex = userVector.getIndex(RandomUtility.randomInteger(userVector.getElementSize()));
anchorUsers[modelCount] = randomUserIndex;
anchorItems[modelCount] = randomItemIndex;
// Preparing weight vectors:
DenseVector userWeights = kernelSmoothing(scalar, userSize, randomUserIndex, KernelSmoother.EPANECHNIKOV_KERNEL, 0.8F, false);
DenseVector itemWeights = kernelSmoothing(scalar, itemSize, randomItemIndex, KernelSmoother.EPANECHNIKOV_KERNEL, 0.8F, true);
DenseMatrix localUserFactors = DenseMatrix.valueOf(userSize, numberOfLocalFactors);
localUserFactors.iterateElement(MathCalculator.SERIAL, (element) -> {
element.setValue(distribution.sample().floatValue());
});
DenseMatrix localItemFactors = DenseMatrix.valueOf(itemSize, numberOfLocalFactors);
localItemFactors.iterateElement(MathCalculator.SERIAL, (element) -> {
element.setValue(distribution.sample().floatValue());
});
// Starting a new local model learning:
learners[nextRunningSlot] = new LLORMALearner(modelCount, numberOfLocalFactors, localLearnRatio, localUserRegularization, localItemRegularization, localEpocheSize, localUserFactors, localItemFactors, userWeights, itemWeights, scoreMatrix);
learners[nextRunningSlot].start();
runningThreadList[runningThreadCount] = modelCount;
runningThreadCount++;
modelCount++;
nextRunningSlot++;
} else if (runningThreadCount > 0) {
// Join a local model that has finished learning:
try {
learners[waitingThreadPointer].join();
} catch (InterruptedException ie) {
logger.error("Join failed: " + ie);
}
LLORMALearner learner = learners[waitingThreadPointer];
userMatrixes[learner.getIndex()] = learner.getUserFactors();
itemMatrixes[learner.getIndex()] = learner.getItemFactors();
nextRunningSlot = waitingThreadPointer;
waitingThreadPointer = (waitingThreadPointer + 1) % numberOfThreads;
runningThreadCount--;
completeModelCount++;
}
}
}
@Override
protected void doPractice() {
DefaultScalar scalar = DefaultScalar.getInstance();
practiceGlobalModel(scalar);
practiceLocalModels(scalar);
}
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
DefaultScalar scalar = DefaultScalar.getInstance();
float weightSum = 0F;
float valueSum = 0F;
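// LLORMA prediction: a kernel-weighted average over the local models, where each model
// contributes the dot product of its local factors, weighted by the smoothed similarity
// of (userIndex, itemIndex) to that model's anchor user and anchor item.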
for (int iterationStep = 0; iterationStep < numberOfModels; iterationStep++) {
float weight = KernelSmoother.EPANECHNIKOV_KERNEL.kernelize(getUserSimilarity(scalar, anchorUsers[iterationStep], userIndex), 0.8F) * KernelSmoother.EPANECHNIKOV_KERNEL.kernelize(getItemSimilarity(scalar, anchorItems[iterationStep], itemIndex), 0.8F);
float value = (scalar.dotProduct(userMatrixes[iterationStep].getRowVector(userIndex), itemMatrixes[iterationStep].getRowVector(itemIndex)).getValue()) * weight;
weightSum += weight;
valueSum += value;
}
float score = valueSum / weightSum;
if (Float.isNaN(score) || score == 0F) {
score = meanScore;
}
instance.setQuantityMark(score);
}
}

@@ -0,0 +1,103 @@ MFALSModel.java
package com.jstarcraft.rns.model.collaborative.rating;
import com.jstarcraft.ai.math.MatrixUtility;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.ai.math.structure.vector.VectorScalar;
import com.jstarcraft.rns.model.MatrixFactorizationModel;
/**
*
* MF ALS recommender
*
* <pre>
* Large-Scale Parallel Collaborative Filtering for the Netflix Prize
* http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/
* Based on the implementation by the LibRec team
* </pre>
*
* @author Birdy
*
*/
public class MFALSModel extends MatrixFactorizationModel {
@Override
protected void doPractice() {
DenseVector scoreVector = DenseVector.valueOf(factorSize);
DenseMatrix inverseMatrix = DenseMatrix.valueOf(factorSize, factorSize);
DenseMatrix transposeMatrix = DenseMatrix.valueOf(factorSize, factorSize);
DenseMatrix copyMatrix = DenseMatrix.valueOf(factorSize, factorSize);
// TODO consider iterating only over users that have ratings?
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
// fix item matrix M, solve user matrix U
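// ALS closed form per user u: U_u = (M_u^T * M_u + lambda * |I_u| * E)^-1 * M_u^T * r_u,
// where M_u stacks the factor vectors of the items rated by u, r_u holds u's scores,
// |I_u| is u's rating count and E is the identity matrix; steps 1-3 below build exactly this.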
for (int userIndex = 0; userIndex < userSize; userIndex++) {
// number of items rated by user userIdx
SparseVector userVector = scoreMatrix.getRowVector(userIndex);
int size = userVector.getElementSize();
if (size == 0) {
continue;
}
// TODO should avoid valueOf here (allocates on every pass)
DenseMatrix rateMatrix = DenseMatrix.valueOf(size, factorSize);
DenseVector rateVector = DenseVector.valueOf(size);
int index = 0;
for (VectorScalar term : userVector) {
// step 1:
int itemIndex = term.getIndex();
rateMatrix.getRowVector(index).copyVector(itemFactors.getRowVector(itemIndex));
// step 2:
// ratings of this userIdx
rateVector.setValue(index++, term.getValue());
}
// step 3: the updated user matrix wrt user j
DenseMatrix matrix = transposeMatrix;
matrix.dotProduct(rateMatrix, true, rateMatrix, false, MathCalculator.SERIAL);
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
matrix.shiftValue(factorIndex, factorIndex, userRegularization * size);
}
scoreVector.dotProduct(rateMatrix, true, rateVector, MathCalculator.SERIAL);
userFactors.getRowVector(userIndex).dotProduct(MatrixUtility.inverse(matrix, copyMatrix, inverseMatrix), false, scoreVector, MathCalculator.SERIAL);
}
// TODO consider iterating only over items that have ratings?
// fix user matrix U, solve item matrix M
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
// latent factor of users that have rated item itemIdx
// number of users rate item j
SparseVector itemVector = scoreMatrix.getColumnVector(itemIndex);
int size = itemVector.getElementSize();
if (size == 0) {
continue;
}
// TODO should avoid valueOf here (allocates on every pass)
DenseMatrix rateMatrix = DenseMatrix.valueOf(size, factorSize);
DenseVector rateVector = DenseVector.valueOf(size);
int index = 0;
for (VectorScalar term : itemVector) {
// step 1:
int userIndex = term.getIndex();
rateMatrix.getRowVector(index).copyVector(userFactors.getRowVector(userIndex));
// step 2:
// ratings of this item
rateVector.setValue(index++, term.getValue());
}
// step 3: the updated item matrix wrt item j
DenseMatrix matrix = transposeMatrix;
matrix.dotProduct(rateMatrix, true, rateMatrix, false, MathCalculator.SERIAL);
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
matrix.shiftValue(factorIndex, factorIndex, itemRegularization * size);
}
scoreVector.dotProduct(rateMatrix, true, rateVector, MathCalculator.SERIAL);
itemFactors.getRowVector(itemIndex).dotProduct(MatrixUtility.inverse(matrix, copyMatrix, inverseMatrix), false, scoreVector, MathCalculator.SERIAL);
}
}
}
}

@@ -0,0 +1,105 @@ NMFModel.java
package com.jstarcraft.rns.model.collaborative.rating;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.MathUtility;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.matrix.MatrixScalar;
import com.jstarcraft.ai.math.structure.vector.ArrayVector;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.rns.model.MatrixFactorizationModel;
/**
*
* NMF recommender
*
* <pre>
* Algorithms for Non-negative Matrix Factorization
* Based on the implementation by the LibRec team
* </pre>
*
* @author Birdy
*
*/
public class NMFModel extends MatrixFactorizationModel {
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
userFactors = DenseMatrix.valueOf(userSize, factorSize);
userFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(RandomUtility.randomFloat(0.01F));
});
itemFactors = DenseMatrix.valueOf(itemSize, factorSize);
itemFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(RandomUtility.randomFloat(0.01F));
});
}
@Override
protected void doPractice() {
DefaultScalar scalar = DefaultScalar.getInstance();
for (int epocheIndex = 0; epocheIndex < epocheSize; ++epocheIndex) {
// update userFactors by fixing itemFactors
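// Lee & Seung multiplicative update: U_uk <- U_uk * (R * V)_uk / (R_hat * V)_uk,
// where R is the observed score matrix, R_hat the current prediction and V the item
// factors; MathUtility.EPSILON in the denominator guards against division by zero.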
for (int userIndex = 0; userIndex < userSize; userIndex++) {
SparseVector userVector = scoreMatrix.getRowVector(userIndex);
if (userVector.getElementSize() == 0) {
continue;
}
int user = userIndex;
ArrayVector predictVector = new ArrayVector(userVector);
predictVector.iterateElement(MathCalculator.SERIAL, (element) -> {
element.setValue(predict(user, element.getIndex()));
});
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
DenseVector factorVector = itemFactors.getColumnVector(factorIndex);
float score = scalar.dotProduct(factorVector, userVector).getValue();
float predict = scalar.dotProduct(factorVector, predictVector).getValue() + MathUtility.EPSILON;
userFactors.setValue(userIndex, factorIndex, userFactors.getValue(userIndex, factorIndex) * (score / predict));
}
}
// update itemFactors by fixing userFactors
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
SparseVector itemVector = scoreMatrix.getColumnVector(itemIndex);
if (itemVector.getElementSize() == 0) {
continue;
}
int item = itemIndex;
ArrayVector predictVector = new ArrayVector(itemVector);
predictVector.iterateElement(MathCalculator.SERIAL, (element) -> {
element.setValue(predict(element.getIndex(), item));
});
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
DenseVector factorVector = userFactors.getColumnVector(factorIndex);
float score = scalar.dotProduct(factorVector, itemVector).getValue();
float predict = scalar.dotProduct(factorVector, predictVector).getValue() + MathUtility.EPSILON;
itemFactors.setValue(itemIndex, factorIndex, itemFactors.getValue(itemIndex, factorIndex) * (score / predict));
}
}
// compute errors
totalError = 0F;
for (MatrixScalar term : scoreMatrix) {
int userIndex = term.getRow();
int itemIndex = term.getColumn();
float score = term.getValue();
if (score > 0) {
float error = predict(userIndex, itemIndex) - score;
totalError += error * error;
}
}
totalError *= 0.5F;
if (isConverged(epocheIndex) && isConverged) {
break;
}
currentError = totalError;
}
}
}

@@ -0,0 +1,50 @@ PMFModel.java
package com.jstarcraft.rns.model.collaborative.rating;
import com.jstarcraft.ai.math.structure.matrix.MatrixScalar;
import com.jstarcraft.rns.model.MatrixFactorizationModel;
/**
*
* PMF recommender
*
* <pre>
* PMF: Probabilistic Matrix Factorization
* Based on the implementation by the LibRec team
* </pre>
*
* @author Birdy
*
*/
public class PMFModel extends MatrixFactorizationModel {
@Override
protected void doPractice() {
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
totalError = 0F;
for (MatrixScalar term : scoreMatrix) {
int userIndex = term.getRow(); // user
int itemIndex = term.getColumn(); // item
float score = term.getValue();
float predict = predict(userIndex, itemIndex);
float error = score - predict;
totalError += error * error;
// update factors
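// Standard regularized MF gradient step; the regularization terms are also accumulated
// into totalError, so convergence is checked against the full objective
// 0.5 * sum(error^2 + lambda_u * |p_u|^2 + lambda_i * |q_i|^2).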
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float userFactor = userFactors.getValue(userIndex, factorIndex), itemFactor = itemFactors.getValue(itemIndex, factorIndex);
userFactors.shiftValue(userIndex, factorIndex, learnRatio * (error * itemFactor - userRegularization * userFactor));
itemFactors.shiftValue(itemIndex, factorIndex, learnRatio * (error * userFactor - itemRegularization * itemFactor));
totalError += userRegularization * userFactor * userFactor + itemRegularization * itemFactor * itemFactor;
}
}
totalError *= 0.5F;
if (isConverged(epocheIndex) && isConverged) {
break;
}
isLearned(epocheIndex);
currentError = totalError;
}
}
}

@@ -0,0 +1,413 @@ RBMModel.java
package com.jstarcraft.rns.model.collaborative.rating;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Map.Entry;
import org.apache.commons.math3.distribution.NormalDistribution;
import org.apache.commons.math3.random.JDKRandomGenerator;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.algorithm.probability.QuantityProbability;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.ai.math.structure.vector.VectorScalar;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.rns.model.ProbabilisticGraphicalModel;
/**
*
* RBM recommender
*
* <pre>
* Restricted Boltzmann Machines for Collaborative Filtering
* Based on the implementation by the LibRec team
* </pre>
*
* @author Birdy
*
*/
public class RBMModel extends ProbabilisticGraphicalModel {
private int steps;
private float epsilonWeight;
private float epsilonExplicitBias;
private float epsilonImplicitBias;
private float momentum;
private float lambdaWeight;
private float lambdaBias;
private float[][][] weightSums;
private float[][][] weightProbabilities;
private float[][] explicitBiasSums;
private float[][] explicitBiasProbabilities;
private float[] implicitBiasSums;
private float[] implicitBiasProbabilities;
private float[][][] positiveWeights;
private float[][][] negativeWeights;
private float[][] positiveExplicitActs;
private float[][] negativeExplicitActs;
private float[] positiveImplicitActs;
private float[] negativeImplicitActs;
private int[] itemCount;
private PredictionType predictionType;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
// TODO this could be refactored
epocheSize = configuration.getInteger("recommender.iterator.maximum", 10);
sampleSize = configuration.getInteger("recommender.sample.mumber", 100);
scoreSize = scoreIndexes.size() + 1;
factorSize = configuration.getInteger("recommender.factor.number", 500);
epsilonWeight = configuration.getFloat("recommender.epsilonw", 0.001F);
epsilonExplicitBias = configuration.getFloat("recommender.epsilonvb", 0.001F);
epsilonImplicitBias = configuration.getFloat("recommender.epsilonhb", 0.001F);
steps = configuration.getInteger("recommender.tstep", 1);
momentum = configuration.getFloat("recommender.momentum", 0F);
lambdaWeight = configuration.getFloat("recommender.lamtaw", 0.001F);
lambdaBias = configuration.getFloat("recommender.lamtab", 0F);
predictionType = PredictionType.valueOf(configuration.getString("recommender.predictiontype", "mean").toUpperCase());
weightProbabilities = new float[itemSize][scoreSize][factorSize];
explicitBiasProbabilities = new float[itemSize][scoreSize];
implicitBiasProbabilities = new float[factorSize];
weightSums = new float[itemSize][scoreSize][factorSize];
implicitBiasSums = new float[factorSize];
explicitBiasSums = new float[itemSize][scoreSize];
positiveWeights = new float[itemSize][scoreSize][factorSize];
negativeWeights = new float[itemSize][scoreSize][factorSize];
positiveImplicitActs = new float[factorSize];
negativeImplicitActs = new float[factorSize];
positiveExplicitActs = new float[itemSize][scoreSize];
negativeExplicitActs = new float[itemSize][scoreSize];
itemCount = new int[itemSize];
// TODO this needs refactoring
int[][] itemScoreCount = new int[itemSize][scoreSize];
for (int userIndex = 0; userIndex < userSize; userIndex++) {
SparseVector userVector = scoreMatrix.getRowVector(userIndex);
if (userVector.getElementSize() == 0) {
continue;
}
for (VectorScalar term : userVector) {
int scoreIndex = scoreIndexes.get(term.getValue());
itemScoreCount[term.getIndex()][scoreIndex]++;
}
}
QuantityProbability distribution = new QuantityProbability(JDKRandomGenerator.class, 0, NormalDistribution.class, 0D, 0.01D);
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
for (int scoreIndex = 0; scoreIndex < scoreSize; scoreIndex++) {
weightProbabilities[itemIndex][scoreIndex][factorIndex] = distribution.sample().floatValue();
}
}
}
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
double totalScore = 0D;
for (int scoreIndex = 0; scoreIndex < scoreSize; scoreIndex++) {
totalScore += itemScoreCount[itemIndex][scoreIndex];
}
for (int scoreIndex = 0; scoreIndex < scoreSize; scoreIndex++) {
if (totalScore == 0D) {
explicitBiasProbabilities[itemIndex][scoreIndex] = RandomUtility.randomFloat(0.001F);
} else {
explicitBiasProbabilities[itemIndex][scoreIndex] = (float) Math.log(itemScoreCount[itemIndex][scoreIndex] / totalScore);
// visbiases[i][k] = Math.log(((moviecount[i][k]) + 1) /
// (trainMatrix.columnSize(i)+ softmax));
}
}
}
}
@Override
protected void doPractice() {
Collection<Integer> currentImplicitStates;
Collection<Integer> positiveImplicitStates = new ArrayList<>(factorSize);
Collection<Integer> negativeImplicitStates = new ArrayList<>(factorSize);
DenseVector negativeExplicitProbabilities = DenseVector.valueOf(scoreSize);
int[] negativeExplicitScores = new int[itemSize];
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
reset();
// Randomize the traversal order
Integer[] userIndexes = new Integer[userSize];
for (int userIndex = 0; userIndex < userSize; userIndex++) {
userIndexes[userIndex] = userIndex;
}
RandomUtility.shuffle(userIndexes);
for (int userIndex : userIndexes) {
SparseVector userVector = scoreMatrix.getRowVector(userIndex);
if (userVector.getElementSize() == 0) {
continue;
}
DenseVector factorSum = DenseVector.valueOf(factorSize);
for (VectorScalar term : userVector) {
int itemIndex = term.getIndex();
int scoreIndex = scoreIndexes.get(term.getValue());
itemCount[itemIndex]++;
positiveExplicitActs[itemIndex][scoreIndex] += 1F;
factorSum.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int index = scalar.getIndex();
float value = scalar.getValue();
scalar.setValue(value + weightProbabilities[itemIndex][scoreIndex][index]);
});
}
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float probability = (float) (1F / (1F + Math.exp(-factorSum.getValue(factorIndex) - implicitBiasProbabilities[factorIndex])));
if (probability > RandomUtility.randomFloat(1F)) {
positiveImplicitStates.add(factorIndex);
positiveImplicitActs[factorIndex] += 1F;
}
}
currentImplicitStates = positiveImplicitStates;
int step = 0;
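// Contrastive divergence (CD-k): starting from the data-driven hidden states,
// alternately resample the visible softmax units and the hidden units for
// `steps` Gibbs passes; only the final pass contributes to the negative statistics.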
do {
boolean isLast = (step + 1 >= steps);
for (VectorScalar term : userVector) {
negativeExplicitProbabilities.setValues(0F);
int itemIndex = term.getIndex();
for (int factorIndex : currentImplicitStates) {
negativeExplicitProbabilities.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int index = scalar.getIndex();
float value = scalar.getValue();
scalar.setValue(value + weightProbabilities[itemIndex][index][factorIndex]);
});
}
// Normalize
negativeExplicitProbabilities.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int index = scalar.getIndex();
float value = scalar.getValue();
value = (float) (1F / (1F + Math.exp(-value - explicitBiasProbabilities[itemIndex][index])));
scalar.setValue(value);
});
negativeExplicitProbabilities.scaleValues(1F / negativeExplicitProbabilities.getSum(false));
// TODO the random draw falls into one of the score buckets (needs refactoring;
// otherwise it can only ever land in at most 5 buckets, which is probably a bug).
float random = RandomUtility.randomFloat(1F);
negativeExplicitScores[itemIndex] = scoreSize - 1;
for (int scoreIndex = 0; scoreIndex < scoreSize; scoreIndex++) {
if ((random -= negativeExplicitProbabilities.getValue(scoreIndex)) <= 0F) {
negativeExplicitScores[itemIndex] = scoreIndex;
break;
}
}
if (isLast) {
negativeExplicitActs[itemIndex][negativeExplicitScores[itemIndex]] += 1F;
}
}
factorSum.setValues(0F);
for (VectorScalar term : userVector) {
int itemIndex = term.getIndex();
factorSum.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int index = scalar.getIndex();
float value = scalar.getValue();
scalar.setValue(value + weightProbabilities[itemIndex][negativeExplicitScores[itemIndex]][index]);
});
}
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float probability = (float) (1F / (1F + Math.exp(-factorSum.getValue(factorIndex) - implicitBiasProbabilities[factorIndex])));
if (probability > RandomUtility.randomFloat(1F)) {
negativeImplicitStates.add(factorIndex);
if (isLast) {
negativeImplicitActs[factorIndex] += 1F;
}
}
}
if (!isLast) {
currentImplicitStates = negativeImplicitStates;
}
} while (++step < steps);
for (VectorScalar term : userVector) {
int itemIndex = term.getIndex();
int scoreIndex = scoreIndexes.get(term.getValue());
for (int factorIndex : positiveImplicitStates) {
positiveWeights[itemIndex][scoreIndex][factorIndex] += 1F;
}
for (int factorIndex : negativeImplicitStates) {
negativeWeights[itemIndex][negativeExplicitScores[itemIndex]][factorIndex] += 1F;
}
}
positiveImplicitStates.clear();
negativeImplicitStates.clear();
update(userIndex);
}
}
}
private void update(int userIndex) {
// TODO should the batch size be specified as a parameter?
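// Mini-batch CD parameter update with momentum and weight decay: for each parameter w,
// delta = momentum * delta + epsilon * (<positive> - <negative> - lambda * w), then
// w += delta, where <.> denotes the statistics averaged over the batch.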
if (((userIndex + 1) % sampleSize) == 0 || (userIndex + 1) == userSize) {
int numCases = userIndex % sampleSize;
numCases++;
float positiveExplicitAct;
float negativeExplicitAct;
float positiveImplicitAct;
float negativeImplicitAct;
float positiveWeight;
float negativeWeight;
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
if (itemCount[itemIndex] == 0) {
continue;
}
for (int scoreIndex = 0; scoreIndex < scoreSize; scoreIndex++) {
positiveExplicitAct = positiveExplicitActs[itemIndex][scoreIndex];
negativeExplicitAct = negativeExplicitActs[itemIndex][scoreIndex];
if (positiveExplicitAct != 0F || negativeExplicitAct != 0F) {
positiveExplicitAct /= itemCount[itemIndex];
negativeExplicitAct /= itemCount[itemIndex];
explicitBiasSums[itemIndex][scoreIndex] = momentum * explicitBiasSums[itemIndex][scoreIndex] + epsilonExplicitBias * (positiveExplicitAct - negativeExplicitAct - lambdaBias * explicitBiasProbabilities[itemIndex][scoreIndex]);
explicitBiasProbabilities[itemIndex][scoreIndex] += explicitBiasSums[itemIndex][scoreIndex];
positiveExplicitActs[itemIndex][scoreIndex] = 0F;
negativeExplicitActs[itemIndex][scoreIndex] = 0F;
}
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
positiveWeight = positiveWeights[itemIndex][scoreIndex][factorIndex];
negativeWeight = negativeWeights[itemIndex][scoreIndex][factorIndex];
if (positiveWeight != 0F || negativeWeight != 0F) {
positiveWeight /= itemCount[itemIndex];
negativeWeight /= itemCount[itemIndex];
weightSums[itemIndex][scoreIndex][factorIndex] = momentum * weightSums[itemIndex][scoreIndex][factorIndex] + epsilonWeight * ((positiveWeight - negativeWeight) - lambdaWeight * weightProbabilities[itemIndex][scoreIndex][factorIndex]);
weightProbabilities[itemIndex][scoreIndex][factorIndex] += weightSums[itemIndex][scoreIndex][factorIndex];
positiveWeights[itemIndex][scoreIndex][factorIndex] = 0F;
negativeWeights[itemIndex][scoreIndex][factorIndex] = 0F;
}
}
}
itemCount[itemIndex] = 0;
}
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
positiveImplicitAct = positiveImplicitActs[factorIndex];
negativeImplicitAct = negativeImplicitActs[factorIndex];
if (positiveImplicitAct != 0F || negativeImplicitAct != 0F) {
positiveImplicitAct /= numCases;
negativeImplicitAct /= numCases;
implicitBiasSums[factorIndex] = momentum * implicitBiasSums[factorIndex] + epsilonImplicitBias * (positiveImplicitAct - negativeImplicitAct - lambdaBias * implicitBiasProbabilities[factorIndex]);
implicitBiasProbabilities[factorIndex] += implicitBiasSums[factorIndex];
positiveImplicitActs[factorIndex] = 0F;
negativeImplicitActs[factorIndex] = 0F;
}
}
}
}
private void reset() {
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
itemCount[itemIndex] = 0;
for (int scoreIndex = 0; scoreIndex < scoreSize; scoreIndex++) {
positiveExplicitActs[itemIndex][scoreIndex] = 0F;
negativeExplicitActs[itemIndex][scoreIndex] = 0F;
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
positiveWeights[itemIndex][scoreIndex][factorIndex] = 0F;
negativeWeights[itemIndex][scoreIndex][factorIndex] = 0F;
}
}
}
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
positiveImplicitActs[factorIndex] = 0F;
negativeImplicitActs[factorIndex] = 0F;
}
}
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
float[] scoreProbabilities = new float[scoreSize];
float[] factorProbabilities = new float[factorSize];
float[] factorSums = new float[factorSize];
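// Prediction: infer the hidden unit probabilities from the user's rating history,
// reconstruct the softmax distribution over score levels for the target item, and
// return either the most probable score (MAX) or its expectation (MEAN).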
// The user's rating history?
SparseVector userVector = scoreMatrix.getRowVector(userIndex);
for (VectorScalar term : userVector) {
int termIndex = term.getIndex();
int scoreIndex = scoreIndexes.get(term.getValue());
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
factorSums[factorIndex] += weightProbabilities[termIndex][scoreIndex][factorIndex];
}
}
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
factorProbabilities[factorIndex] = (float) (1F / (1F + Math.exp(0F - factorSums[factorIndex] - implicitBiasProbabilities[factorIndex])));
}
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
for (int scoreIndex = 0; scoreIndex < scoreSize; scoreIndex++) {
scoreProbabilities[scoreIndex] += factorProbabilities[factorIndex] * weightProbabilities[itemIndex][scoreIndex][factorIndex];
}
}
float probabilitySum = 0F;
for (int scoreIndex = 0; scoreIndex < scoreSize; scoreIndex++) {
scoreProbabilities[scoreIndex] = (float) (1F / (1F + Math.exp(0F - scoreProbabilities[scoreIndex] - explicitBiasProbabilities[itemIndex][scoreIndex])));
probabilitySum += scoreProbabilities[scoreIndex];
}
for (int scoreIndex = 0; scoreIndex < scoreSize; scoreIndex++) {
scoreProbabilities[scoreIndex] /= probabilitySum;
}
float predict = 0F;
switch (predictionType) {
case MAX:
float score = 0F;
float probability = 0F;
for (Entry<Float, Integer> term : scoreIndexes.entrySet()) {
if (scoreProbabilities[term.getValue()] > probability) {
probability = scoreProbabilities[term.getValue()];
score = term.getKey();
}
}
predict = score;
break;
case MEAN:
float mean = 0F;
for (Entry<Float, Integer> term : scoreIndexes.entrySet()) {
mean += scoreProbabilities[term.getValue()] * term.getKey();
}
predict = mean;
break;
}
instance.setQuantityMark(predict);
}
@Override
protected void eStep() {
// TODO Auto-generated method stub
}
@Override
protected void mStep() {
// TODO Auto-generated method stub
}
}
enum PredictionType {
MAX, MEAN
}

@@ -0,0 +1,199 @@ RFRecModel.java
package com.jstarcraft.rns.model.collaborative.rating;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.matrix.MatrixScalar;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.rns.model.MatrixFactorizationModel;
import it.unimi.dsi.fastutil.floats.Float2IntLinkedOpenHashMap;
import it.unimi.dsi.fastutil.floats.FloatRBTreeSet;
import it.unimi.dsi.fastutil.floats.FloatSet;
/**
*
* RF-Rec recommender
*
* <pre>
* RF-Rec: Fast and Accurate Computation of Recommendations based on Rating Frequencies
* Based on the implementation by the LibRec team
* </pre>
*
* @author Birdy
*
*/
public class RFRecModel extends MatrixFactorizationModel {
/**
* The average ratings of users
*/
private DenseVector userMeans;
/**
* The average ratings of items
*/
private DenseVector itemMeans;
/** Score indexes (TODO consider removing or migrating; essentially discretization of a continuous feature) */
protected Float2IntLinkedOpenHashMap scoreIndexes;
/**
* The number of ratings per rating value per user
*/
private DenseMatrix userScoreFrequencies;
/**
* The number of ratings per rating value per item
*/
private DenseMatrix itemScoreFrequencies;
/**
* User weights learned by the gradient solver
*/
private DenseVector userWeights;
/**
* Item weights learned by the gradient solver.
*/
private DenseVector itemWeights;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
// Calculate the average ratings
userMeans = DenseVector.valueOf(userSize);
itemMeans = DenseVector.valueOf(itemSize);
userWeights = DenseVector.valueOf(userSize);
itemWeights = DenseVector.valueOf(itemSize);
for (int userIndex = 0; userIndex < userSize; userIndex++) {
SparseVector userVector = scoreMatrix.getRowVector(userIndex);
if (userVector.getElementSize() == 0) {
userMeans.setValue(userIndex, meanScore);
} else {
userMeans.setValue(userIndex, userVector.getSum(false) / userVector.getElementSize());
}
userWeights.setValue(userIndex, 0.6F + RandomUtility.randomFloat(0.01F));
}
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
SparseVector itemVector = scoreMatrix.getColumnVector(itemIndex);
if (itemVector.getElementSize() == 0) {
itemMeans.setValue(itemIndex, meanScore);
} else {
itemMeans.setValue(itemIndex, itemVector.getSum(false) / itemVector.getElementSize());
}
itemWeights.setValue(itemIndex, 0.4F + RandomUtility.randomFloat(0.01F));
}
// TODO refactor together with scoreIndexes; essentially discretization of a continuous feature.
FloatSet scores = new FloatRBTreeSet();
for (MatrixScalar term : scoreMatrix) {
scores.add(term.getValue());
}
scores.remove(0F);
scoreIndexes = new Float2IntLinkedOpenHashMap();
int index = 0;
for (float score : scores) {
scoreIndexes.put(score, index++);
}
// Calculate the frequencies per rating value, for users and for items.
userScoreFrequencies = DenseMatrix.valueOf(userSize, actionSize);
itemScoreFrequencies = DenseMatrix.valueOf(itemSize, actionSize);
for (MatrixScalar term : scoreMatrix) {
int userIndex = term.getRow();
int itemIndex = term.getColumn();
int scoreIndex = scoreIndexes.get(term.getValue());
userScoreFrequencies.shiftValue(userIndex, scoreIndex, 1F);
itemScoreFrequencies.shiftValue(itemIndex, scoreIndex, 1F);
}
}
@Override
protected void doPractice() {
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
for (MatrixScalar term : scoreMatrix) {
int userIndex = term.getRow();
int itemIndex = term.getColumn();
float error = term.getValue() - predict(userIndex, itemIndex);
// Gradient-Step on user weights.
float userWeight = userWeights.getValue(userIndex) + learnRatio * (error - userRegularization * userWeights.getValue(userIndex));
userWeights.setValue(userIndex, userWeight);
// Gradient-Step on item weights.
float itemWeight = itemWeights.getValue(itemIndex) + learnRatio * (error - itemRegularization * itemWeights.getValue(itemIndex));
itemWeights.setValue(itemIndex, itemWeight);
}
}
}
/**
* Returns 1 if the rating is similar to the rounded average value
*
* @param mean the average
* @param score the rating
* @return 1 when the values are equal
*/
private float isMean(float mean, int score) {
return Math.round(mean) == score ? 1F : 0F;
}
@Override
protected float predict(int userIndex, int itemIndex) {
float value = meanScore;
float userSum = userScoreFrequencies.getRowVector(userIndex).getSum(false);
float itemSum = itemScoreFrequencies.getRowVector(itemIndex).getSum(false);
float userMean = userMeans.getValue(userIndex);
float itemMean = itemMeans.getValue(itemIndex);
if (userSum > 0F && itemSum > 0F && userMean > 0F && itemMean > 0F) {
float numeratorUser = 0F;
float denominatorUser = 0F;
float numeratorItem = 0F;
float denominatorItem = 0F;
float frequency = 0F;
// Go through all the possible rating values
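// Each frequency is smoothed with a Laplace-style +1 plus a bonus of 1 when the score
// matches the user's (or item's) rounded mean; the prediction then mixes the two
// frequency-weighted score estimates with the learned user and item weights.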
for (int scoreIndex = 0, scoreSize = scoreIndexes.size(); scoreIndex < scoreSize; scoreIndex++) {
// user component
frequency = userScoreFrequencies.getValue(userIndex, scoreIndex);
frequency = frequency + 1F + isMean(userMean, scoreIndex);
numeratorUser += frequency * scoreIndex;
denominatorUser += frequency;
// item component
frequency = itemScoreFrequencies.getValue(itemIndex, scoreIndex);
frequency = frequency + 1F + isMean(itemMean, scoreIndex);
numeratorItem += frequency * scoreIndex;
denominatorItem += frequency;
}
float userWeight = userWeights.getValue(userIndex);
float itemWeight = itemWeights.getValue(itemIndex);
value = userWeight * numeratorUser / denominatorUser + itemWeight * numeratorItem / denominatorItem;
} else {
// if the user or item weren't known in the training phase...
if (userSum == 0F || userMean == 0F) {
if (itemMean != 0F) {
return itemMean;
} else {
return meanScore;
}
}
if (itemSum == 0F || itemMean == 0F) {
if (userMean != 0F) {
return userMean;
} else {
// Fallback: return the global mean rating
return meanScore;
}
}
}
return value;
}
}

@@ -0,0 +1,136 @@ SVDPlusPlusModel.java
package com.jstarcraft.rns.model.collaborative.rating;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.ai.math.structure.vector.VectorScalar;
import com.jstarcraft.core.common.option.Option;
/**
*
* SVD++ recommender
*
* <pre>
* Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model
* Based on the implementation by the LibRec team
* </pre>
*
* @author Birdy
*
*/
public class SVDPlusPlusModel extends BiasedMFModel {
/**
* item implicit feedback factors ("imp" stands for implicit)
*/
private DenseMatrix factorMatrix;
/**
* implicit item regularization
*/
private float regImpItem;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
regImpItem = configuration.getFloat("recommender.impItem.regularization", 0.015F);
factorMatrix = DenseMatrix.valueOf(itemSize, factorSize);
factorMatrix.iterateElement(MathCalculator.SERIAL, (element) -> {
element.setValue(distribution.sample().floatValue());
});
}
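// SVD++ prediction: r_ui = mu + b_u + b_i + q_i^T * (p_u + |N(u)|^(-1/2) * sum_{j in N(u)} y_j),
// where the y_j are the implicit feedback factors stored in factorMatrix and N(u) is the
// set of items rated by user u; factorVector caches the normalized sum of the y_j.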
@Override
protected void doPractice() {
DenseVector factorVector = DenseVector.valueOf(factorSize);
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
totalError = 0F;
for (int userIndex = 0; userIndex < userSize; userIndex++) {
SparseVector userVector = scoreMatrix.getRowVector(userIndex);
if (userVector.getElementSize() == 0) {
continue;
}
for (VectorScalar outerTerm : userVector) {
int itemIndex = outerTerm.getIndex();
// TODO could be changed to reset according to userVector
factorVector.setValues(0F);
for (VectorScalar innerTerm : userVector) {
factorVector.addVector(factorMatrix.getRowVector(innerTerm.getIndex()));
}
float scale = (float) Math.sqrt(userVector.getElementSize());
if (scale > 0F) {
factorVector.scaleValues(1F / scale);
}
float error = outerTerm.getValue() - predict(userIndex, itemIndex, factorVector);
totalError += error * error;
// update user and item bias
float userBias = userBiases.getValue(userIndex);
userBiases.shiftValue(userIndex, learnRatio * (error - regBias * userBias));
totalError += regBias * userBias * userBias;
float itemBias = itemBiases.getValue(itemIndex);
itemBiases.shiftValue(itemIndex, learnRatio * (error - regBias * itemBias));
totalError += regBias * itemBias * itemBias;
// update user and item factors
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float userFactor = userFactors.getValue(userIndex, factorIndex);
float itemFactor = itemFactors.getValue(itemIndex, factorIndex);
userFactors.shiftValue(userIndex, factorIndex, learnRatio * (error * itemFactor - userRegularization * userFactor));
itemFactors.shiftValue(itemIndex, factorIndex, learnRatio * (error * (userFactor + factorVector.getValue(factorIndex)) - itemRegularization * itemFactor));
totalError += userRegularization * userFactor * userFactor + itemRegularization * itemFactor * itemFactor;
for (VectorScalar innerTerm : userVector) {
int index = innerTerm.getIndex();
float factor = factorMatrix.getValue(index, factorIndex);
factorMatrix.shiftValue(index, factorIndex, learnRatio * (error * itemFactor / scale - regImpItem * factor));
totalError += regImpItem * factor * factor;
}
}
}
}
totalError *= 0.5F;
if (isConverged(epocheIndex) && isConverged) {
break;
}
isLearned(epocheIndex);
currentError = totalError;
}
}
private float predict(int userIndex, int itemIndex, DenseVector factorVector) {
float value = userBiases.getValue(userIndex) + itemBiases.getValue(itemIndex) + meanScore;
// sum with user factors
for (int index = 0; index < factorSize; index++) {
value = value + (factorVector.getValue(index) + userFactors.getValue(userIndex, index)) * itemFactors.getValue(itemIndex, index);
}
return value;
}
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
SparseVector userVector = scoreMatrix.getRowVector(userIndex);
// TODO needs refactoring to get rid of the DenseVector.
DenseVector factorVector = DenseVector.valueOf(factorSize);
// sum of the implicit feedback factors of the user, weighted by
// 1 / Math.sqrt(number of items rated by the user)
for (VectorScalar term : userVector) {
factorVector.addVector(factorMatrix.getRowVector(term.getIndex()));
}
float scale = (float) Math.sqrt(userVector.getElementSize());
if (scale > 0F) {
factorVector.scaleValues(1F / scale);
}
instance.setQuantityMark(predict(userIndex, itemIndex, factorVector));
}
}

@@ -0,0 +1,362 @@ URPModel.java
package com.jstarcraft.rns.model.collaborative.rating;
import java.util.Map.Entry;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.matrix.HashMatrix;
import com.jstarcraft.ai.math.structure.matrix.MatrixScalar;
import com.jstarcraft.ai.math.structure.matrix.SparseMatrix;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.rns.model.ProbabilisticGraphicalModel;
import com.jstarcraft.rns.utility.GammaUtility;
import com.jstarcraft.rns.utility.SampleUtility;
import it.unimi.dsi.fastutil.ints.Int2IntRBTreeMap;
import it.unimi.dsi.fastutil.longs.Long2FloatRBTreeMap;
/**
*
* URP recommender
*
* <pre>
* User Rating Profile: an LDA model for rating prediction
* Based on the implementation by the LibRec team
* </pre>
*
* @author Birdy
*
*/
public class URPModel extends ProbabilisticGraphicalModel {
private float preRMSE;
/**
* number of occurrences of the entry (user, topic)
*/
private DenseMatrix userTopicTimes;
/**
* number of occurences of users
*/
private DenseVector userTopicNumbers;
/**
* number of occurrences of entry (topic, item)
*/
private DenseMatrix topicItemNumbers;
/**
* P(k | u)
*/
private DenseMatrix userTopicProbabilities, userTopicSums;
/**
* user parameters
*/
private DenseVector alpha;
/**
* item parameters
*/
private DenseVector beta;
/**
* topic assignment of each (user, item) pair
*/
private Int2IntRBTreeMap topicAssignments;
/**
* number of occurrences of entry (t, i, r)
*/
private int[][][] topicItemTimes; // Nkir
/**
* cumulative statistics of probabilities of (t, i, r)
*/
private float[][][] topicItemScoreSums; // PkirSum;
/**
* posterior probabilities of parameters phi_{k, i, r}
*/
private float[][][] topicItemScoreProbabilities; // Pkir;
private DenseVector randomProbabilities;
/** Learning matrix and validation matrix (TODO split scoreMatrix) */
private SparseMatrix learnMatrix, checkMatrix;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
float checkRatio = configuration.getFloat("recommender.urp.chech.ratio", 0F);
if (checkRatio == 0F) {
learnMatrix = scoreMatrix;
checkMatrix = null;
} else {
HashMatrix learnTable = new HashMatrix(true, userSize, itemSize, new Long2FloatRBTreeMap());
HashMatrix checkTable = new HashMatrix(true, userSize, itemSize, new Long2FloatRBTreeMap());
for (MatrixScalar term : scoreMatrix) {
int userIndex = term.getRow();
int itemIndex = term.getColumn();
float score = term.getValue();
if (RandomUtility.randomFloat(1F) <= checkRatio) {
checkTable.setValue(userIndex, itemIndex, score);
} else {
learnTable.setValue(userIndex, itemIndex, score);
}
}
learnMatrix = SparseMatrix.valueOf(userSize, itemSize, learnTable);
checkMatrix = SparseMatrix.valueOf(userSize, itemSize, checkTable);
}
// cumulative parameters
userTopicSums = DenseMatrix.valueOf(userSize, factorSize);
topicItemScoreSums = new float[factorSize][itemSize][scoreSize];
// initialize count variables
userTopicTimes = DenseMatrix.valueOf(userSize, factorSize);
userTopicNumbers = DenseVector.valueOf(userSize);
topicItemTimes = new int[factorSize][itemSize][scoreSize];
topicItemNumbers = DenseMatrix.valueOf(factorSize, itemSize);
float initAlpha = configuration.getFloat("recommender.pgm.bucm.alpha", 1F / factorSize);
alpha = DenseVector.valueOf(factorSize);
alpha.setValues(initAlpha);
float initBeta = configuration.getFloat("recommender.pgm.bucm.beta", 1F / factorSize);
beta = DenseVector.valueOf(scoreSize);
beta.setValues(initBeta);
// initialize topics
topicAssignments = new Int2IntRBTreeMap();
for (MatrixScalar term : learnMatrix) {
int userIndex = term.getRow();
int itemIndex = term.getColumn();
float score = term.getValue();
int scoreIndex = scoreIndexes.get(score); // rating level: 0 ~ numLevels
int topicIndex = RandomUtility.randomInteger(factorSize); // 0 ~ k-1
// Assign a topic t to pair (u, i)
topicAssignments.put(userIndex * itemSize + itemIndex, topicIndex);
// number of pairs (u, t) in (u, i, t)
userTopicTimes.shiftValue(userIndex, topicIndex, 1);
// total number of items of user u
userTopicNumbers.shiftValue(userIndex, 1);
// number of pairs (t, i, r)
topicItemTimes[topicIndex][itemIndex][scoreIndex]++;
// total number of words assigned to topic t
topicItemNumbers.shiftValue(topicIndex, itemIndex, 1);
}
randomProbabilities = DenseVector.valueOf(factorSize);
}
@Override
protected void eStep() {
float sumAlpha = alpha.getSum(false);
float sumBeta = beta.getSum(false);
// collapse Gibbs sampling
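// Conditional for resampling the topic of pair (u, i) with score r:
// P(z = k | rest) is proportional to
// (n_uk + alpha_k) / (n_u + sum(alpha)) * (n_kir + beta_r) / (n_ki + sum(beta));
// the loop below builds the cumulative distribution and samples it by binary search.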
for (MatrixScalar term : learnMatrix) {
int userIndex = term.getRow();
int itemIndex = term.getColumn();
float score = term.getValue();
int scoreIndex = scoreIndexes.get(score); // rating level: 0 ~ numLevels
int assignmentIndex = topicAssignments.get(userIndex * itemSize + itemIndex);
userTopicTimes.shiftValue(userIndex, assignmentIndex, -1);
userTopicNumbers.shiftValue(userIndex, -1);
topicItemTimes[assignmentIndex][itemIndex][scoreIndex]--;
topicItemNumbers.shiftValue(assignmentIndex, itemIndex, -1);
// Compute the sampling probabilities
DefaultScalar sum = DefaultScalar.getInstance();
sum.setValue(0F);
randomProbabilities.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int index = scalar.getIndex();
float value = (userTopicTimes.getValue(userIndex, index) + alpha.getValue(index)) / (userTopicNumbers.getValue(userIndex) + sumAlpha) * (topicItemTimes[index][itemIndex][scoreIndex] + beta.getValue(scoreIndex)) / (topicItemNumbers.getValue(index, itemIndex) + sumBeta);
sum.shiftValue(value);
scalar.setValue(sum.getValue());
});
assignmentIndex = SampleUtility.binarySearch(randomProbabilities, 0, randomProbabilities.getElementSize() - 1, RandomUtility.randomFloat(sum.getValue()));
// new topic t
topicAssignments.put(userIndex * itemSize + itemIndex, assignmentIndex);
// add newly estimated z_i to count variables
userTopicTimes.shiftValue(userIndex, assignmentIndex, 1);
userTopicNumbers.shiftValue(userIndex, 1);
topicItemTimes[assignmentIndex][itemIndex][scoreIndex]++;
topicItemNumbers.shiftValue(assignmentIndex, itemIndex, 1);
}
}
/**
* Thomas P. Minka, Estimating a Dirichlet distribution, see Eq.(55)
*/
@Override
protected void mStep() {
float denominator;
float value;
// update alpha vector
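// Minka's fixed-point update: alpha_k <- alpha_k *
// sum_u [digamma(n_uk + alpha_k) - digamma(alpha_k)] /
// sum_u [digamma(n_u + sum(alpha)) - digamma(sum(alpha))];
// the beta update below is analogous over the (topic, item, score) counts.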
float alphaSum = alpha.getSum(false);
float alphaDigamma = GammaUtility.digamma(alphaSum);
float alphaValue;
denominator = 0F;
for (int userIndex = 0; userIndex < userSize; userIndex++) {
value = userTopicNumbers.getValue(userIndex);
if (value != 0F) {
denominator += GammaUtility.digamma(value + alphaSum) - alphaDigamma;
}
}
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
alphaValue = alpha.getValue(topicIndex);
alphaDigamma = GammaUtility.digamma(alphaValue);
float numerator = 0F;
for (int userIndex = 0; userIndex < userSize; userIndex++) {
value = userTopicTimes.getValue(userIndex, topicIndex);
if (value != 0F) {
numerator += GammaUtility.digamma(value + alphaValue) - alphaDigamma;
}
}
if (numerator != 0F) {
alpha.setValue(topicIndex, alphaValue * (numerator / denominator));
}
}
// update beta_k
float betaSum = beta.getSum(false);
float betaDigamma = GammaUtility.digamma(betaSum);
float betaValue;
denominator = 0F;
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
value = topicItemNumbers.getValue(topicIndex, itemIndex);
if (value != 0F) {
denominator += GammaUtility.digamma(value + betaSum) - betaDigamma;
}
}
}
for (int scoreIndex = 0; scoreIndex < scoreSize; scoreIndex++) {
betaValue = beta.getValue(scoreIndex);
betaDigamma = GammaUtility.digamma(betaValue);
float numerator = 0F;
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
value = topicItemTimes[topicIndex][itemIndex][scoreIndex];
if (value != 0F) {
numerator += GammaUtility.digamma(value + betaValue) - betaDigamma;
}
}
}
if (numerator != 0F) {
beta.setValue(scoreIndex, betaValue * (numerator / denominator));
}
}
}
protected void readoutParameters() {
float value = 0F;
float sumAlpha = alpha.getSum(false);
for (int userIndex = 0; userIndex < userSize; userIndex++) {
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
value = (userTopicTimes.getValue(userIndex, topicIndex) + alpha.getValue(topicIndex)) / (userTopicNumbers.getValue(userIndex) + sumAlpha);
userTopicSums.shiftValue(userIndex, topicIndex, value);
}
}
float sumBeta = beta.getSum(false);
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
for (int scoreIndex = 0; scoreIndex < scoreSize; scoreIndex++) {
value = (topicItemTimes[topicIndex][itemIndex][scoreIndex] + beta.getValue(scoreIndex)) / (topicItemNumbers.getValue(topicIndex, itemIndex) + sumBeta);
topicItemScoreSums[topicIndex][itemIndex][scoreIndex] += value;
}
}
}
numberOfStatistics++;
}
@Override
protected void estimateParameters() {
userTopicProbabilities = DenseMatrix.copyOf(userTopicSums);
userTopicProbabilities.scaleValues(1F / numberOfStatistics);
topicItemScoreProbabilities = new float[factorSize][itemSize][scoreSize];
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
for (int scoreIndex = 0; scoreIndex < scoreSize; scoreIndex++) {
topicItemScoreProbabilities[topicIndex][itemIndex][scoreIndex] = topicItemScoreSums[topicIndex][itemIndex][scoreIndex] / numberOfStatistics;
}
}
}
}
@Override
protected boolean isConverged(int iter) {
// TODO using a dedicated validation matrix seems more appropriate here.
if (checkMatrix == null) {
return false;
}
// get posterior probability distribution first
estimateParameters();
// compute current RMSE
int count = 0;
float sum = 0F;
// TODO using a dedicated validation matrix seems more appropriate here.
for (MatrixScalar term : checkMatrix) {
int userIndex = term.getRow();
int itemIndex = term.getColumn();
float score = term.getValue();
float predict = predict(userIndex, itemIndex);
if (Float.isNaN(predict)) {
continue;
}
float error = score - predict;
sum += error * error;
count++;
}
float rmse = (float) Math.sqrt(sum / count);
float delta = rmse - preRMSE;
if (numberOfStatistics > 1 && delta > 0F) {
return true;
}
preRMSE = rmse;
return false;
}
private float predict(int userIndex, int itemIndex) {
float value = 0F;
for (Entry<Float, Integer> term : scoreIndexes.entrySet()) {
float score = term.getKey();
float probability = 0F;
for (int topicIndex = 0; topicIndex < factorSize; topicIndex++) {
probability += userTopicProbabilities.getValue(userIndex, topicIndex) * topicItemScoreProbabilities[topicIndex][itemIndex][term.getValue()];
}
value += probability * score;
}
return value;
}
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
instance.setQuantityMark(predict(userIndex, itemIndex));
}
}

@@ -0,0 +1,79 @@ UserKNNRatingModel.java
package com.jstarcraft.rns.model.collaborative.rating;
import java.util.Iterator;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.math.structure.vector.MathVector;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.ai.math.structure.vector.VectorScalar;
import com.jstarcraft.rns.model.collaborative.UserKNNModel;
/**
*
* User KNN recommender
*
* <pre>
* Based on the implementation by the LibRec team
* </pre>
*
* @author Birdy
*
*/
public class UserKNNRatingModel extends UserKNNModel {
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
SparseVector itemVector = itemVectors[itemIndex];
MathVector neighbors = userNeighbors[userIndex];
if (itemVector.getElementSize() == 0 || neighbors.getElementSize() == 0) {
instance.setQuantityMark(meanScore);
return;
}
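// Mean-centered weighted prediction:
// r_ui = mean_u + sum_v sim(u, v) * (r_vi - mean_v) / sum_v |sim(u, v)|,
// accumulated below over the neighbors v that actually rated item i.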
float sum = 0F, absolute = 0F;
int count = 0;
int leftCursor = 0, rightCursor = 0, leftSize = itemVector.getElementSize(), rightSize = neighbors.getElementSize();
Iterator<VectorScalar> leftIterator = itemVector.iterator();
VectorScalar leftTerm = leftIterator.next();
Iterator<VectorScalar> rightIterator = neighbors.iterator();
VectorScalar rightTerm = rightIterator.next();
// Merge-join over the two sorted vectors to find shared indexes
while (leftCursor < leftSize && rightCursor < rightSize) {
if (leftTerm.getIndex() == rightTerm.getIndex()) {
count++;
float correlation = rightTerm.getValue();
float score = leftTerm.getValue();
sum += correlation * (score - userMeans.getValue(rightTerm.getIndex()));
absolute += Math.abs(correlation);
if (leftIterator.hasNext()) {
leftTerm = leftIterator.next();
}
if (rightIterator.hasNext()) {
rightTerm = rightIterator.next();
}
leftCursor++;
rightCursor++;
} else if (leftTerm.getIndex() > rightTerm.getIndex()) {
if (rightIterator.hasNext()) {
rightTerm = rightIterator.next();
}
rightCursor++;
} else if (leftTerm.getIndex() < rightTerm.getIndex()) {
if (leftIterator.hasNext()) {
leftTerm = leftIterator.next();
}
leftCursor++;
}
}
if (count == 0) {
instance.setQuantityMark(meanScore);
return;
}
instance.setQuantityMark(absolute > 0 ? userMeans.getValue(userIndex) + sum / absolute : meanScore);
}
}

@@ -0,0 +1,361 @@ EFMModel.java
package com.jstarcraft.rns.model.content;
import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.data.attribute.MemoryQualityAttribute;
import com.jstarcraft.ai.math.MathUtility;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.matrix.HashMatrix;
import com.jstarcraft.ai.math.structure.matrix.MatrixScalar;
import com.jstarcraft.ai.math.structure.matrix.SparseMatrix;
import com.jstarcraft.ai.math.structure.vector.ArrayVector;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.core.utility.StringUtility;
import com.jstarcraft.rns.model.MatrixFactorizationModel;
import it.unimi.dsi.fastutil.longs.Long2FloatRBTreeMap;
/**
*
* EFM recommender
*
* <pre>
* Explicit factor models for explainable recommendation based on phrase-level sentiment analysis
* Based on the implementation by the LibRec team
* </pre>
*
* @author Birdy
*
*/
public abstract class EFMModel extends MatrixFactorizationModel {
protected String commentField;
protected int commentDimension;
protected int numberOfFeatures;
protected int numberOfExplicitFeatures;
protected int numberOfImplicitFeatures;
protected float scoreScale;
protected DenseMatrix featureFactors;
protected DenseMatrix userExplicitFactors;
protected DenseMatrix userImplicitFactors;
protected DenseMatrix itemExplicitFactors;
protected DenseMatrix itemImplicitFactors;
protected SparseMatrix userFeatures;
protected SparseMatrix itemFeatures;
protected float attentionRegularization;
protected float qualityRegularization;
protected float explicitRegularization;
protected float implicitRegularization;
protected float featureRegularization;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
commentField = configuration.getString("data.model.fields.comment");
commentDimension = model.getQualityInner(commentField);
MemoryQualityAttribute attribute = (MemoryQualityAttribute) space.getQualityAttribute(commentField);
Object[] wordValues = attribute.getDatas();
scoreScale = maximumScore - minimumScore;
numberOfExplicitFeatures = configuration.getInteger("recommender.factor.explicit", 5);
numberOfImplicitFeatures = factorSize - numberOfExplicitFeatures;
attentionRegularization = configuration.getFloat("recommender.regularization.lambdax", 0.001F);
qualityRegularization = configuration.getFloat("recommender.regularization.lambday", 0.001F);
explicitRegularization = configuration.getFloat("recommender.regularization.lambdau", 0.001F);
implicitRegularization = configuration.getFloat("recommender.regularization.lambdah", 0.001F);
featureRegularization = configuration.getFloat("recommender.regularization.lambdav", 0.001F);
Map<String, Integer> featureDictionaries = new HashMap<>();
Map<Integer, StringBuilder> userDictionaries = new HashMap<>();
Map<Integer, StringBuilder> itemDictionaries = new HashMap<>();
numberOfFeatures = 0;
// // TODO this ensured that every feature would be recognized
// for (Object value : wordValues) {
// String wordValue = (String) value;
// String[] words = wordValue.split(" ");
// for (String word : words) {
// // TODO this looks like a bug: word should not be split into finer granularity.
// String feature = word.split(":")[0];
// if (!featureDictionaries.containsKey(feature) &&
// StringUtils.isNotEmpty(feature)) {
// featureDictionaries.put(feature, numberOfWords);
// numberOfWords++;
// }
// }
// }
for (DataInstance sample : model) {
int userIndex = sample.getQualityFeature(userDimension);
int itemIndex = sample.getQualityFeature(itemDimension);
int wordIndex = sample.getQualityFeature(commentDimension);
String wordValue = (String) wordValues[wordIndex];
String[] words = wordValue.split(" ");
StringBuilder buffer;
for (String word : words) {
// TODO this looks like a bug: word should not be split into finer granularity.
String feature = word.split(":")[0];
if (!featureDictionaries.containsKey(feature) && !StringUtility.isEmpty(feature)) {
featureDictionaries.put(feature, numberOfFeatures++);
}
buffer = userDictionaries.get(userIndex);
if (buffer != null) {
buffer.append(" ").append(word);
} else {
userDictionaries.put(userIndex, new StringBuilder(word));
}
buffer = itemDictionaries.get(itemIndex);
if (buffer != null) {
buffer.append(" ").append(word);
} else {
itemDictionaries.put(itemIndex, new StringBuilder(word));
}
}
}
// Create V,U1,H1,U2,H2
featureFactors = DenseMatrix.valueOf(numberOfFeatures, numberOfExplicitFeatures);
featureFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(RandomUtility.randomFloat(0.01F));
});
userExplicitFactors = DenseMatrix.valueOf(userSize, numberOfExplicitFeatures);
userExplicitFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(RandomUtility.randomFloat(1F));
});
userImplicitFactors = DenseMatrix.valueOf(userSize, numberOfImplicitFeatures);
userImplicitFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(RandomUtility.randomFloat(1F));
});
itemExplicitFactors = DenseMatrix.valueOf(itemSize, numberOfExplicitFeatures);
itemExplicitFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(RandomUtility.randomFloat(1F));
});
itemImplicitFactors = DenseMatrix.valueOf(itemSize, numberOfImplicitFeatures);
itemImplicitFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(RandomUtility.randomFloat(1F));
});
float[] featureValues = new float[numberOfFeatures];
// compute UserFeatureAttention
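// User-feature attention, a rescaled sigmoid of the mention count t (cf. the EFM paper):
//   x(u,f) = 1 + (scoreScale - 1) * (2 / (1 + exp(-t)) - 1)
// so rarely mentioned features stay near 1 and frequently mentioned ones approach scoreScale.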
HashMatrix userTable = new HashMatrix(true, userSize, numberOfFeatures, new Long2FloatRBTreeMap());
for (Entry<Integer, StringBuilder> term : userDictionaries.entrySet()) {
int userIndex = term.getKey();
String[] words = term.getValue().toString().split(" ");
for (String word : words) {
if (!StringUtility.isEmpty(word)) {
int featureIndex = featureDictionaries.get(word.split(":")[0]);
featureValues[featureIndex] += 1F;
}
}
for (int featureIndex = 0; featureIndex < numberOfFeatures; featureIndex++) {
if (featureValues[featureIndex] != 0F) {
float value = (float) (1F + (scoreScale - 1F) * (2F / (1F + Math.exp(-featureValues[featureIndex])) - 1F));
userTable.setValue(userIndex, featureIndex, value);
featureValues[featureIndex] = 0F;
}
}
}
userFeatures = SparseMatrix.valueOf(userSize, numberOfFeatures, userTable);
// compute ItemFeatureQuality
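// Item-feature quality, a sigmoid of the accumulated sentiment s (cf. the EFM paper):
//   y(i,f) = 1 + (scoreScale - 1) / (1 + exp(-s))
// negative overall sentiment pushes y toward 1, positive toward scoreScale.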
HashMatrix itemTable = new HashMatrix(true, itemSize, numberOfFeatures, new Long2FloatRBTreeMap());
for (Entry<Integer, StringBuilder> term : itemDictionaries.entrySet()) {
int itemIndex = term.getKey();
String[] words = term.getValue().toString().split(" ");
for (String word : words) {
if (!StringUtility.isEmpty(word)) {
int featureIndex = featureDictionaries.get(word.split(":")[0]);
featureValues[featureIndex] += Float.parseFloat(word.split(":")[1]);
}
}
for (int featureIndex = 0; featureIndex < numberOfFeatures; featureIndex++) {
if (featureValues[featureIndex] != 0F) {
float value = (float) (1F + (scoreScale - 1F) / (1F + Math.exp(-featureValues[featureIndex])));
itemTable.setValue(itemIndex, featureIndex, value);
featureValues[featureIndex] = 0F;
}
}
}
itemFeatures = SparseMatrix.valueOf(itemSize, numberOfFeatures, itemTable);
logger.info("numUsers:" + userSize);
logger.info("numItems:" + itemSize);
logger.info("numFeatures:" + numberOfFeatures);
}
@Override
protected void doPractice() {
DefaultScalar scalar = DefaultScalar.getInstance();
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
totalError = 0F;
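// Every parameter block below uses the same NMF-style multiplicative rule:
//   value <- value * sqrt(numerator / denominator)
// where the numerator collects the data-fitting terms and the denominator the
// current reconstruction plus regularization (with EPSILON guarding against
// division by zero); this keeps all factors non-negative.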
for (int featureIndex = 0; featureIndex < numberOfFeatures; featureIndex++) {
if (userFeatures.getColumnScope(featureIndex) > 0 && itemFeatures.getColumnScope(featureIndex) > 0) {
SparseVector userVector = userFeatures.getColumnVector(featureIndex);
SparseVector itemVector = itemFeatures.getColumnVector(featureIndex);
// TODO Refactor: avoid repeatedly constructing SparseVector instances here.
int feature = featureIndex;
ArrayVector userFactors = new ArrayVector(userVector);
userFactors.iterateElement(MathCalculator.SERIAL, (element) -> {
element.setValue(predictUserFactor(scalar, element.getIndex(), feature));
});
ArrayVector itemFactors = new ArrayVector(itemVector);
itemFactors.iterateElement(MathCalculator.SERIAL, (element) -> {
element.setValue(predictItemFactor(scalar, element.getIndex(), feature));
});
for (int factorIndex = 0; factorIndex < numberOfExplicitFeatures; factorIndex++) {
DenseVector factorUsersVector = userExplicitFactors.getColumnVector(factorIndex);
DenseVector factorItemsVector = itemExplicitFactors.getColumnVector(factorIndex);
float numerator = attentionRegularization * scalar.dotProduct(factorUsersVector, userVector).getValue() + qualityRegularization * scalar.dotProduct(factorItemsVector, itemVector).getValue();
float denominator = attentionRegularization * scalar.dotProduct(factorUsersVector, userFactors).getValue() + qualityRegularization * scalar.dotProduct(factorItemsVector, itemFactors).getValue() + featureRegularization * featureFactors.getValue(featureIndex, factorIndex) + MathUtility.EPSILON;
featureFactors.setValue(featureIndex, factorIndex, (float) (featureFactors.getValue(featureIndex, factorIndex) * Math.sqrt(numerator / denominator)));
}
}
}
// Update UserFeatureMatrix by fixing the others
for (int userIndex = 0; userIndex < userSize; userIndex++) {
if (scoreMatrix.getRowScope(userIndex) > 0 && userFeatures.getRowScope(userIndex) > 0) {
SparseVector userVector = scoreMatrix.getRowVector(userIndex);
SparseVector attentionVector = userFeatures.getRowVector(userIndex);
// TODO Refactor: avoid repeatedly constructing SparseVector instances here.
int user = userIndex;
ArrayVector itemPredictsVector = new ArrayVector(userVector);
itemPredictsVector.iterateElement(MathCalculator.SERIAL, (element) -> {
element.setValue(predict(user, element.getIndex()));
});
ArrayVector attentionPredVector = new ArrayVector(attentionVector);
attentionPredVector.iterateElement(MathCalculator.SERIAL, (element) -> {
element.setValue(predictUserFactor(scalar, user, element.getIndex()));
});
for (int factorIndex = 0; factorIndex < numberOfExplicitFeatures; factorIndex++) {
DenseVector factorItemsVector = itemExplicitFactors.getColumnVector(factorIndex);
DenseVector featureVector = featureFactors.getColumnVector(factorIndex);
float numerator = scalar.dotProduct(factorItemsVector, userVector).getValue() + attentionRegularization * scalar.dotProduct(featureVector, attentionVector).getValue();
float denominator = scalar.dotProduct(factorItemsVector, itemPredictsVector).getValue() + attentionRegularization * scalar.dotProduct(featureVector, attentionPredVector).getValue() + explicitRegularization * userExplicitFactors.getValue(userIndex, factorIndex) + MathUtility.EPSILON;
userExplicitFactors.setValue(userIndex, factorIndex, (float) (userExplicitFactors.getValue(userIndex, factorIndex) * Math.sqrt(numerator / denominator)));
}
}
}
// Update ItemFeatureMatrix by fixing the others
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
if (scoreMatrix.getColumnScope(itemIndex) > 0 && itemFeatures.getRowScope(itemIndex) > 0) {
SparseVector itemVector = scoreMatrix.getColumnVector(itemIndex);
SparseVector qualityVector = itemFeatures.getRowVector(itemIndex);
// TODO Refactor: avoid repeatedly constructing SparseVector instances here.
int item = itemIndex;
ArrayVector userPredictsVector = new ArrayVector(itemVector);
userPredictsVector.iterateElement(MathCalculator.SERIAL, (element) -> {
element.setValue(predict(element.getIndex(), item));
});
ArrayVector qualityPredVector = new ArrayVector(qualityVector);
qualityPredVector.iterateElement(MathCalculator.SERIAL, (element) -> {
element.setValue(predictItemFactor(scalar, item, element.getIndex()));
});
for (int factorIndex = 0; factorIndex < numberOfExplicitFeatures; factorIndex++) {
DenseVector factorUsersVector = userExplicitFactors.getColumnVector(factorIndex);
DenseVector featureVector = featureFactors.getColumnVector(factorIndex);
float numerator = scalar.dotProduct(factorUsersVector, itemVector).getValue() + qualityRegularization * scalar.dotProduct(featureVector, qualityVector).getValue();
float denominator = scalar.dotProduct(factorUsersVector, userPredictsVector).getValue() + qualityRegularization * scalar.dotProduct(featureVector, qualityPredVector).getValue() + explicitRegularization * itemExplicitFactors.getValue(itemIndex, factorIndex) + MathUtility.EPSILON;
itemExplicitFactors.setValue(itemIndex, factorIndex, (float) (itemExplicitFactors.getValue(itemIndex, factorIndex) * Math.sqrt(numerator / denominator)));
}
}
}
// Update UserHiddenMatrix by fixing the others
for (int userIndex = 0; userIndex < userSize; userIndex++) {
if (scoreMatrix.getRowScope(userIndex) > 0) {
SparseVector userVector = scoreMatrix.getRowVector(userIndex);
// TODO Refactor: avoid repeatedly constructing SparseVector instances here.
int user = userIndex;
ArrayVector itemPredictsVector = new ArrayVector(userVector);
itemPredictsVector.iterateElement(MathCalculator.SERIAL, (element) -> {
element.setValue(predict(user, element.getIndex()));
});
for (int factorIndex = 0; factorIndex < numberOfImplicitFeatures; factorIndex++) {
DenseVector hiddenItemsVector = itemImplicitFactors.getColumnVector(factorIndex);
float numerator = scalar.dotProduct(hiddenItemsVector, userVector).getValue();
float denominator = scalar.dotProduct(hiddenItemsVector, itemPredictsVector).getValue() + implicitRegularization * userImplicitFactors.getValue(userIndex, factorIndex) + MathUtility.EPSILON;
userImplicitFactors.setValue(userIndex, factorIndex, (float) (userImplicitFactors.getValue(userIndex, factorIndex) * Math.sqrt(numerator / denominator)));
}
}
}
// Update ItemHiddenMatrix by fixing the others
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
if (scoreMatrix.getColumnScope(itemIndex) > 0) {
SparseVector itemVector = scoreMatrix.getColumnVector(itemIndex);
// TODO Refactor: avoid repeatedly constructing SparseVector instances here.
int item = itemIndex;
ArrayVector userPredictsVector = new ArrayVector(itemVector);
userPredictsVector.iterateElement(MathCalculator.SERIAL, (element) -> {
element.setValue(predict(element.getIndex(), item));
});
for (int factorIndex = 0; factorIndex < numberOfImplicitFeatures; factorIndex++) {
DenseVector hiddenUsersVector = userImplicitFactors.getColumnVector(factorIndex);
float numerator = scalar.dotProduct(hiddenUsersVector, itemVector).getValue();
float denominator = scalar.dotProduct(hiddenUsersVector, userPredictsVector).getValue() + implicitRegularization * itemImplicitFactors.getValue(itemIndex, factorIndex) + MathUtility.EPSILON;
itemImplicitFactors.setValue(itemIndex, factorIndex, (float) (itemImplicitFactors.getValue(itemIndex, factorIndex) * Math.sqrt(numerator / denominator)));
}
}
}
// Compute loss value
for (MatrixScalar term : scoreMatrix) {
int userIndex = term.getRow();
int itemIndex = term.getColumn();
double rating = term.getValue();
double predRating = scalar.dotProduct(userExplicitFactors.getRowVector(userIndex), itemExplicitFactors.getRowVector(itemIndex)).getValue() + scalar.dotProduct(userImplicitFactors.getRowVector(userIndex), itemImplicitFactors.getRowVector(itemIndex)).getValue();
totalError += (rating - predRating) * (rating - predRating);
}
for (MatrixScalar term : userFeatures) {
int userIndex = term.getRow();
int featureIndex = term.getColumn();
double real = term.getValue();
double pred = predictUserFactor(scalar, userIndex, featureIndex);
totalError += (real - pred) * (real - pred);
}
for (MatrixScalar term : itemFeatures) {
int itemIndex = term.getRow();
int featureIndex = term.getColumn();
double real = term.getValue();
double pred = predictItemFactor(scalar, itemIndex, featureIndex);
totalError += (real - pred) * (real - pred);
}
totalError += explicitRegularization * (userExplicitFactors.getNorm(2F, false) + itemExplicitFactors.getNorm(2F, false));
totalError += implicitRegularization * (userImplicitFactors.getNorm(2F, false) + itemImplicitFactors.getNorm(2F, false));
totalError += featureRegularization * featureFactors.getNorm(2F, false);
logger.info("iter:" + epocheIndex + ", loss:" + totalError);
}
}
protected float predictUserFactor(DefaultScalar scalar, int userIndex, int featureIndex) {
return scalar.dotProduct(userExplicitFactors.getRowVector(userIndex), featureFactors.getRowVector(featureIndex)).getValue();
}
protected float predictItemFactor(DefaultScalar scalar, int itemIndex, int featureIndex) {
return scalar.dotProduct(itemExplicitFactors.getRowVector(itemIndex), featureFactors.getRowVector(featureIndex)).getValue();
}
@Override
protected float predict(int userIndex, int itemIndex) {
DefaultScalar scalar = DefaultScalar.getInstance();
return scalar.dotProduct(userExplicitFactors.getRowVector(userIndex), itemExplicitFactors.getRowVector(itemIndex)).getValue() + scalar.dotProduct(userImplicitFactors.getRowVector(userIndex), itemImplicitFactors.getRowVector(itemIndex)).getValue();
}
}

View File

@ -0,0 +1,8 @@
Content-based Recommendations:
http://breezedeus.github.io/2012/04/10/breezedeus-content-based-rec.html
Science Concierge: A Fast Content-Based Recommendation System for Scientific Publications:
https://journals.plos.org/plosone/article?id=10.1371%2Fjournal.pone.0158423
A Python repository for content-based recommendation using Latent Semantic Analysis (LSA) topic distance and the Rocchio algorithm; an interactive demo is available at http://www.scholarfy.net:
https://github.com/titipata/science_concierge

View File

@ -0,0 +1,71 @@
package com.jstarcraft.rns.model.content.ranking;
import java.util.Arrays;
import java.util.Comparator;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.ai.math.structure.vector.MathVector;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.rns.model.content.EFMModel;
/**
*
* EFM ranking recommender
*
* <pre>
* Explicit factor models for explainable recommendation based on phrase-level sentiment analysis
* Based on the LibRec implementation
* </pre>
*
* @author Birdy
*
*/
public class EFMRankingModel extends EFMModel {
private float threshold;
private int featureLimit;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
threshold = configuration.getFloat("efmranking.threshold", 1F);
featureLimit = configuration.getInteger("efmranking.featureLimit", 250);
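// NOTE: the top-k loop in predict assumes featureLimit <= numberOfFeatures;
// larger values would overrun orderIndexes.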
}
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
DefaultScalar scalar = DefaultScalar.getInstance();
// TODO Performance could be optimized here.
Integer[] orderIndexes = new Integer[numberOfFeatures];
for (int featureIndex = 0; featureIndex < numberOfFeatures; featureIndex++) {
orderIndexes[featureIndex] = featureIndex;
}
MathVector vector = DenseVector.valueOf(numberOfFeatures);
vector.dotProduct(userExplicitFactors.getRowVector(userIndex), featureFactors, true, MathCalculator.SERIAL);
Arrays.sort(orderIndexes, new Comparator<Integer>() {
@Override
public int compare(Integer leftIndex, Integer rightIndex) {
return (vector.getValue(leftIndex) > vector.getValue(rightIndex) ? -1 : (vector.getValue(leftIndex) < vector.getValue(rightIndex) ? 1 : 0));
}
});
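// Ranking score: mix the explainable part (predicted attention * quality over
// the top featureLimit features, normalized by featureLimit * maximumScore)
// with the latent rating prediction, weighted by threshold.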
float value = 0F;
for (int index = 0; index < featureLimit; index++) {
int featureIndex = orderIndexes[index];
value += predictUserFactor(scalar, userIndex, featureIndex) * predictItemFactor(scalar, itemIndex, featureIndex);
}
value = threshold * (value / (featureLimit * maximumScore));
value = value + (1F - threshold) * predict(userIndex, itemIndex);
instance.setQuantityMark(value);
}
}

View File

@ -0,0 +1,187 @@
package com.jstarcraft.rns.model.content.ranking;
import java.util.Collection;
import java.util.Comparator;
import java.util.Iterator;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.algorithm.text.AbstractTermFrequency;
import com.jstarcraft.ai.math.algorithm.text.InverseDocumentFrequency;
import com.jstarcraft.ai.math.algorithm.text.NaturalInverseDocumentFrequency;
import com.jstarcraft.ai.math.algorithm.text.TermFrequency;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.matrix.HashMatrix;
import com.jstarcraft.ai.math.structure.matrix.SparseMatrix;
import com.jstarcraft.ai.math.structure.vector.ArrayVector;
import com.jstarcraft.ai.math.structure.vector.HashVector;
import com.jstarcraft.ai.math.structure.vector.MathVector;
import com.jstarcraft.ai.math.structure.vector.VectorScalar;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.Integer2FloatKeyValue;
import com.jstarcraft.core.utility.Neighborhood;
import com.jstarcraft.rns.model.MatrixFactorizationModel;
import it.unimi.dsi.fastutil.ints.Int2FloatAVLTreeMap;
import it.unimi.dsi.fastutil.ints.Int2FloatSortedMap;
import it.unimi.dsi.fastutil.longs.Long2FloatAVLTreeMap;
import it.unimi.dsi.fastutil.longs.Long2FloatRBTreeMap;
/**
*
* TF-IDF recommender
*
* @author Birdy
*
*/
public class TFIDFModel extends MatrixFactorizationModel {
private Comparator<Integer2FloatKeyValue> comparator = new Comparator<Integer2FloatKeyValue>() {
@Override
public int compare(Integer2FloatKeyValue left, Integer2FloatKeyValue right) {
int compare = -(Float.compare(left.getValue(), right.getValue()));
if (compare == 0) {
compare = Integer.compare(left.getKey(), right.getKey());
}
return compare;
}
};
protected String commentField;
protected int commentDimension;
protected ArrayVector[] userVectors;
protected SparseMatrix itemVectors;
// protected MathCorrelation correlation;
private class VectorTermFrequency extends AbstractTermFrequency {
public VectorTermFrequency(MathVector vector) {
super(new Int2FloatAVLTreeMap(), vector.getElementSize());
for (VectorScalar scalar : vector) {
keyValues.put(scalar.getIndex(), scalar.getValue());
}
}
}
private class DocumentIterator implements Iterator<TermFrequency> {
private int index = 0;
@Override
public boolean hasNext() {
return index < itemVectors.getRowSize();
}
@Override
public TermFrequency next() {
MathVector vector = itemVectors.getRowVector(index++);
VectorTermFrequency termFrequency = new VectorTermFrequency(vector);
return termFrequency;
}
}
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
int numberOfFeatures = 4096;
// feature matrix
HashMatrix featureMatrix = new HashMatrix(true, itemSize, numberOfFeatures, new Long2FloatRBTreeMap());
DataModule featureModel = space.getModule("article");
String articleField = configuration.getString("data.model.fields.article");
String featureField = configuration.getString("data.model.fields.feature");
String degreeField = configuration.getString("data.model.fields.degree");
int articleDimension = featureModel.getQualityInner(articleField);
int featureDimension = featureModel.getQualityInner(featureField);
int degreeDimension = featureModel.getQuantityInner(degreeField);
for (DataInstance instance : featureModel) {
int itemIndex = instance.getQualityFeature(articleDimension);
int featureIndex = instance.getQualityFeature(featureDimension);
float featureValue = instance.getQuantityFeature(degreeDimension);
featureMatrix.setValue(itemIndex, featureIndex, featureValue);
}
// item matrix
itemVectors = SparseMatrix.valueOf(itemSize, numberOfFeatures, featureMatrix);
DocumentIterator iterator = new DocumentIterator();
Int2FloatSortedMap keyValues = new Int2FloatAVLTreeMap();
InverseDocumentFrequency inverseDocumentFrequency = new NaturalInverseDocumentFrequency(keyValues, iterator);
/** k controls term-frequency saturation: the smaller the value, the faster it saturates; the larger, the slower */
float k = 1.2F;
/** b controls document-length normalization: 0.0 disables it completely, 1.0 applies it fully */
float b = 0.75F;
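// BM25 weight for a term with frequency tf in a document of length dl:
//   score = idf * tf * (k + 1) / (k * (1 - b + b * dl / avgdl) + tf)
// the code below computes avgdl, then applies plain TF-IDF (BM25 is left commented out).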
float avgdl = 0F;
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
MathVector itemVector = itemVectors.getRowVector(itemIndex);
avgdl += itemVector.getElementSize();
}
avgdl /= itemSize;
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
MathVector itemVector = itemVectors.getRowVector(itemIndex);
float l = itemVector.getElementSize() / avgdl;
for (VectorScalar scalar : itemVector) {
float tf = scalar.getValue();
float idf = inverseDocumentFrequency.getValue(scalar.getIndex());
// use BM25
// scalar.setValue((idf * (k + 1F) * tf) / (k * (1F - b + b * l) + tf));
// use TF-IDF
scalar.setValue((idf * tf));
}
// normalize to unit length
itemVector.scaleValues(1F / itemVector.getNorm(2F, true));
}
// user matrix
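// A user profile is the average of the TF-IDF vectors of the items the user
// scored, then truncated to its 50 strongest features via a bounded neighborhood.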
userVectors = new ArrayVector[userSize];
for (int userIndex = 0; userIndex < userSize; userIndex++) {
MathVector rowVector = scoreMatrix.getRowVector(userIndex);
HashVector userVector = new HashVector(0L, numberOfFeatures, new Long2FloatAVLTreeMap());
for (VectorScalar scalar : rowVector) {
int itemIndex = scalar.getIndex();
MathVector itemVector = itemVectors.getRowVector(itemIndex);
for (int position = 0; position < itemVector.getElementSize(); position++) {
float value = userVector.getValue(itemVector.getIndex(position));
userVector.setValue(itemVector.getIndex(position), Float.isNaN(value) ? itemVector.getValue(position) : value + itemVector.getValue(position));
}
}
userVector.scaleValues(1F / rowVector.getElementSize());
Neighborhood<Integer2FloatKeyValue> knn = new Neighborhood<Integer2FloatKeyValue>(50, comparator);
for (int position = 0; position < userVector.getElementSize(); position++) {
knn.updateNeighbor(new Integer2FloatKeyValue(userVector.getIndex(position), userVector.getValue(position)));
}
userVector = new HashVector(0L, numberOfFeatures, new Long2FloatAVLTreeMap());
Collection<Integer2FloatKeyValue> neighbors = knn.getNeighbors();
for (Integer2FloatKeyValue neighbor : neighbors) {
userVector.setValue(neighbor.getKey(), neighbor.getValue());
}
userVectors[userIndex] = new ArrayVector(userVector);
}
}
@Override
protected void doPractice() {
}
@Override
protected float predict(int userIndex, int itemIndex) {
MathVector userVector = userVectors[userIndex];
MathVector itemVector = itemVectors.getRowVector(itemIndex);
return DefaultScalar.getInstance().dotProduct(userVector, itemVector).getValue();
}
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
instance.setQuantityMark(predict(userIndex, itemIndex));
}
}

View File

@ -0,0 +1,28 @@
package com.jstarcraft.rns.model.content.rating;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.rns.model.content.EFMModel;
/**
*
* EFM rating recommender
*
* <pre>
* Explicit factor models for explainable recommendation based on phrase-level sentiment analysis
* Based on the LibRec implementation
* </pre>
*
* @author Birdy
*
*/
public class EFMRatingModel extends EFMModel {
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
float value = predict(userIndex, itemIndex);
instance.setQuantityMark(value);
}
}

View File

@ -0,0 +1,380 @@
package com.jstarcraft.rns.model.content.rating;
import java.util.HashMap;
import java.util.Map;
import org.apache.commons.lang3.StringUtils;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.data.attribute.MemoryQualityAttribute;
import com.jstarcraft.ai.math.MathUtility;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.matrix.MatrixScalar;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.ai.model.neuralnetwork.activation.ActivationFunction;
import com.jstarcraft.ai.model.neuralnetwork.activation.SoftMaxActivationFunction;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.rns.model.MatrixFactorizationModel;
import com.jstarcraft.rns.utility.SampleUtility;
import it.unimi.dsi.fastutil.ints.Int2ObjectRBTreeMap;
/**
*
* HFT recommender
*
* <pre>
* Hidden factors and hidden topics: understanding rating dimensions with review text
* Based on the LibRec implementation
* </pre>
*
* @author Birdy
*
*/
public class HFTModel extends MatrixFactorizationModel {
private static class Content {
private int[] wordIndexes;
private int[] topicIndexes;
private Content(int[] wordIndexes) {
this.wordIndexes = wordIndexes;
}
int[] getWordIndexes() {
return wordIndexes;
}
int[] getTopicIndexes() {
return topicIndexes;
}
void setTopicIndexes(int[] topicIndexes) {
this.topicIndexes = topicIndexes;
}
}
// TODO Consider refactoring.
private Int2ObjectRBTreeMap<Content> contentMatrix;
private DenseMatrix wordFactors;
protected String commentField;
protected int commentDimension;
/** number of words (TODO consider renaming to numWords) */
private int numberOfWords;
/**
* user biases
*/
private DenseVector userBiases;
/**
* item biases
*/
private DenseVector itemBiases;
/**
* user latent factors
*/
// TODO Remove: already implemented in the superclass.
private DenseMatrix userFactors;
/**
* item latent factors
*/
// TODO Remove: already implemented in the superclass.
private DenseMatrix itemFactors;
/**
* init mean
*/
// TODO Remove: already implemented in the superclass.
private float initMean;
/**
* init standard deviation
*/
// TODO Remove: already implemented in the superclass.
private float initStd;
/**
* bias regularization
*/
private float biasRegularization;
/**
* user regularization
*/
// TODO Remove: already implemented in the superclass.
private float userRegularization;
/**
* item regularization
*/
// TODO Remove: already implemented in the superclass.
private float itemRegularization;
private DenseVector probability;
private DenseMatrix userProbabilities;
private DenseMatrix wordProbabilities;
protected ActivationFunction function;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
commentField = configuration.getString("data.model.fields.comment");
commentDimension = model.getQualityInner(commentField);
MemoryQualityAttribute attribute = (MemoryQualityAttribute) space.getQualityAttribute(commentField);
Object[] wordValues = attribute.getDatas();
biasRegularization = configuration.getFloat("recommender.bias.regularization", 0.01F);
userRegularization = configuration.getFloat("recommender.user.regularization", 0.01F);
itemRegularization = configuration.getFloat("recommender.item.regularization", 0.01F);
userFactors = DenseMatrix.valueOf(userSize, factorSize);
itemFactors = DenseMatrix.valueOf(itemSize, factorSize);
// TODO Refactor initMean and initStd here.
initMean = 0.0F;
initStd = 0.1F;
userBiases = DenseVector.valueOf(userSize);
userBiases.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
itemBiases = DenseVector.valueOf(itemSize);
itemBiases.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
numberOfWords = 0;
// build review matrix and counting the number of words
contentMatrix = new Int2ObjectRBTreeMap<>();
Map<String, Integer> wordDictionaries = new HashMap<>();
for (DataInstance sample : model) {
int userIndex = sample.getQualityFeature(userDimension);
int itemIndex = sample.getQualityFeature(itemDimension);
int contentIndex = sample.getQualityFeature(commentDimension);
String data = (String) wordValues[contentIndex];
String[] words = data.isEmpty() ? new String[0] : data.split(":");
for (String word : words) {
if (!wordDictionaries.containsKey(word) && StringUtils.isNotEmpty(word)) {
wordDictionaries.put(word, numberOfWords);
numberOfWords++;
}
}
// NOTE: the old code used wordIndexes[index] = Integer.valueOf(words[index]),
// which looks like a bug (it treats the raw token as an index); as the original
// TODO suggested, the dictionary index is used instead:
int[] wordIndexes = new int[words.length];
for (int index = 0; index < words.length; index++) {
wordIndexes[index] = wordDictionaries.get(words[index]);
}
Content content = new Content(wordIndexes);
contentMatrix.put(userIndex * itemSize + itemIndex, content);
}
// TODO Ensure that every word gets recognized here.
for (Object value : wordValues) {
String content = (String) value;
String[] words = content.split(":");
for (String word : words) {
if (!wordDictionaries.containsKey(word) && StringUtils.isNotEmpty(word)) {
wordDictionaries.put(word, numberOfWords);
numberOfWords++;
}
}
}
logger.info("number of users : " + userSize);
logger.info("number of items : " + itemSize);
logger.info("number of words : " + numberOfWords);
wordFactors = DenseMatrix.valueOf(factorSize, numberOfWords);
wordFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(RandomUtility.randomFloat(0.1F));
});
userProbabilities = DenseMatrix.valueOf(userSize, factorSize);
wordProbabilities = DenseMatrix.valueOf(factorSize, numberOfWords);
probability = DenseVector.valueOf(factorSize);
probability.setValues(1F);
function = new SoftMaxActivationFunction();
for (MatrixScalar term : scoreMatrix) {
int userIndex = term.getRow(); // user
int itemIndex = term.getColumn(); // item
Content content = contentMatrix.get(userIndex * itemSize + itemIndex);
int[] wordIndexes = content.getWordIndexes();
int[] topicIndexes = new int[wordIndexes.length];
for (int wordIndex = 0; wordIndex < wordIndexes.length; wordIndex++) {
topicIndexes[wordIndex] = RandomUtility.randomInteger(factorSize);
}
content.setTopicIndexes(topicIndexes);
}
calculateThetas();
calculatePhis();
}
private void sample() {
calculateThetas();
calculatePhis();
for (MatrixScalar term : scoreMatrix) {
int userIndex = term.getRow(); // user
int itemIndex = term.getColumn(); // item
Content content = contentMatrix.get(userIndex * itemSize + itemIndex);
int[] wordIndexes = content.getWordIndexes();
int[] topicIndexes = content.getTopicIndexes();
sampleTopicsToWords(userIndex, wordIndexes, topicIndexes);
// LOG.info("user:" + u + ", item:" + j + ", topics:" + s);
}
}
/**
* Guard for the theta and phi updates: checks whether the softmax produced NaN
* and rejects the update if so.
*
* @param oldValues old values of the parameter
* @param newValues new values to update the parameter
* @return the old values if the new values contain NaN, otherwise the new values
*/
private float[] updateArray(float[] oldValues, float[] newValues) {
for (float value : newValues) {
if (Float.isNaN(value)) {
return oldValues;
}
}
return newValues;
}
private void calculateThetas() {
for (int userIndex = 0; userIndex < userSize; userIndex++) {
DenseVector factorVector = userFactors.getRowVector(userIndex);
function.forward(factorVector, userProbabilities.getRowVector(userIndex));
}
}
private void calculatePhis() {
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
DenseVector factorVector = wordFactors.getRowVector(factorIndex);
function.forward(factorVector, wordProbabilities.getRowVector(factorIndex));
}
}
// TODO Consider moving this into Content.
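// Gibbs-style resampling: for each word, `probability` is filled with the
// cumulative (unnormalized) distribution P(topic k) proportional to
// theta(u, k) * phi(k, word), which is then inverted by binary search on a
// uniform draw.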
private int[] sampleTopicsToWords(int userIndex, int[] wordsIndexes, int[] topicIndexes) {
for (int wordIndex = 0; wordIndex < wordsIndexes.length; wordIndex++) {
// despite the old name (topicIndex), this is the word's dictionary index
int dictionaryIndex = wordsIndexes[wordIndex];
DefaultScalar sum = DefaultScalar.getInstance();
sum.setValue(0F);
probability.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int index = scalar.getIndex();
float value = userProbabilities.getValue(userIndex, index) * wordProbabilities.getValue(index, dictionaryIndex);
sum.shiftValue(value);
scalar.setValue(sum.getValue());
});
topicIndexes[wordIndex] = SampleUtility.binarySearch(probability, 0, probability.getElementSize() - 1, RandomUtility.randomFloat(sum.getValue()));
}
return topicIndexes;
}
/**
* The training approach is SGD instead of L-BFGS, so it can be slow if the
* dataset is big.
*/
@Override
protected void doPractice() {
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
// SGD training
// TODO This should come from configuration.
for (int iterationSGD = 0; iterationSGD < 5; iterationSGD++) {
totalError = 0F;
for (MatrixScalar term : scoreMatrix) {
int userIndex = term.getRow(); // user
int itemIndex = term.getColumn(); // item
float score = term.getValue();
float predict = predict(userIndex, itemIndex);
float error = score - predict;
totalError += error * error;
// update factors
float userBias = userBiases.getValue(userIndex);
float userSgd = error - biasRegularization * userBias;
userBiases.shiftValue(userIndex, learnRatio * userSgd);
// loss += regB * bu * bu;
float itemBias = itemBiases.getValue(itemIndex);
float itemSgd = error - biasRegularization * itemBias;
itemBiases.shiftValue(itemIndex, learnRatio * itemSgd);
// loss += regB * bj * bj;
// TODO Refactor here.
Content content = contentMatrix.get(userIndex * itemSize + itemIndex);
int[] wordIndexes = content.getWordIndexes();
if (wordIndexes.length == 0) {
continue;
}
int[] topicIndexes = content.getTopicIndexes();
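// Softmax gradient of the topic-assignment likelihood w.r.t. the user factor:
// (1 - theta) for the assigned topic, -theta for every other factor.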
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float userFactor = userFactors.getValue(userIndex, factorIndex);
float itemFactor = itemFactors.getValue(itemIndex, factorIndex);
float userSGD = error * itemFactor - userRegularization * userFactor;
float itemSGD = error * userFactor - itemRegularization * itemFactor;
userFactors.shiftValue(userIndex, factorIndex, learnRatio * userSGD);
itemFactors.shiftValue(itemIndex, factorIndex, learnRatio * itemSGD);
for (int wordIndex = 0; wordIndex < wordIndexes.length; wordIndex++) {
int topicIndex = topicIndexes[wordIndex];
if (factorIndex == topicIndex) {
userFactors.shiftValue(userIndex, factorIndex, learnRatio * (1 - userProbabilities.getValue(userIndex, topicIndex)));
} else {
userFactors.shiftValue(userIndex, factorIndex, learnRatio * (-userProbabilities.getValue(userIndex, topicIndex)));
}
totalError -= MathUtility.logarithm(userProbabilities.getValue(userIndex, topicIndex) * wordProbabilities.getValue(topicIndex, wordIndexes[wordIndex]), 2);
}
}
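// NOTE: this word-factor update shifts the same cell (topicIndex,
// wordIndexes[wordIndex]) once per dictionary entry; a softmax gradient would
// update column dictionaryIndex instead, which may be a bug inherited from the
// reference implementation.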
for (int wordIndex = 0; wordIndex < wordIndexes.length; wordIndex++) {
int topicIndex = topicIndexes[wordIndex];
for (int dictionaryIndex = 0; dictionaryIndex < numberOfWords; dictionaryIndex++) {
if (dictionaryIndex == wordIndexes[wordIndex]) {
wordFactors.shiftValue(topicIndex, wordIndexes[wordIndex], learnRatio * (-1 + wordProbabilities.getValue(topicIndex, wordIndexes[wordIndex])));
} else {
wordFactors.shiftValue(topicIndex, wordIndexes[wordIndex], learnRatio * (wordProbabilities.getValue(topicIndex, wordIndexes[wordIndex])));
}
}
}
}
totalError *= 0.5F;
} // end of SGD training
logger.info(" iter:" + epocheIndex + ", loss:" + totalError);
logger.info(" iter:" + epocheIndex + ", sampling");
sample();
logger.info(" iter:" + epocheIndex + ", sample finished");
}
}
@Override
protected float predict(int userIndex, int itemIndex) {
DefaultScalar scalar = DefaultScalar.getInstance();
DenseVector userVector = userFactors.getRowVector(userIndex);
DenseVector itemVector = itemFactors.getRowVector(itemIndex);
float value = scalar.dotProduct(userVector, itemVector).getValue();
value += meanScore + userBiases.getValue(userIndex) + itemBiases.getValue(itemIndex);
if (value > maximumScore) {
value = maximumScore;
} else if (value < minimumScore) {
value = minimumScore;
}
return value;
}
}

View File

@ -0,0 +1,281 @@
package com.jstarcraft.rns.model.content.rating;
import java.util.HashMap;
import java.util.Map;
import com.google.common.collect.HashBasedTable;
import com.google.common.collect.Table;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.data.attribute.MemoryQualityAttribute;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.matrix.MatrixScalar;
import com.jstarcraft.ai.math.structure.matrix.SparseMatrix;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.ai.math.structure.vector.VectorScalar;
import com.jstarcraft.ai.model.neuralnetwork.activation.ActivationFunction;
import com.jstarcraft.ai.model.neuralnetwork.activation.SoftMaxActivationFunction;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.rns.model.MatrixFactorizationModel;
/**
*
* TopicMF AT recommender
*
* <pre>
* TopicMF: Simultaneously Exploiting Ratings and Reviews for Recommendation
* Based on the LibRec implementation
* </pre>
*
* @author Birdy
*
*/
public class TopicMFATModel extends MatrixFactorizationModel {
protected String commentField;
protected int commentDimension;
protected SparseMatrix W;
protected DenseMatrix documentFactors;
protected DenseMatrix wordFactors;
protected float K1, K2;
protected DenseVector userBiases;
protected DenseVector itemBiases;
// TODO To be removed: already implemented in the superclass.
protected DenseMatrix userFactors;
protected DenseMatrix itemFactors;
// TODO Is topic effectively the same thing as factor?
protected int numberOfTopics;
protected int numberOfWords;
protected int numberOfDocuments;
protected float lambda, lambdaU, lambdaV, lambdaB;
protected Table<Integer, Integer, Integer> userItemToDocument;
// TODO To be removed: already implemented in the superclass.
protected float initMean;
protected float initStd;
protected DenseVector topicVector;
protected ActivationFunction function;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
commentField = configuration.getString("data.model.fields.comment");
commentDimension = model.getQualityInner(commentField);
MemoryQualityAttribute attribute = (MemoryQualityAttribute) space.getQualityAttribute(commentField);
Object[] documentValues = attribute.getDatas();
// init hyper-parameters
lambda = configuration.getFloat("recommender.regularization.lambda", 0.001F);
lambdaU = configuration.getFloat("recommender.regularization.lambdaU", 0.001F);
lambdaV = configuration.getFloat("recommender.regularization.lambdaV", 0.001F);
lambdaB = configuration.getFloat("recommender.regularization.lambdaB", 0.001F);
numberOfTopics = configuration.getInteger("recommender.topic.number", 10);
learnRatio = configuration.getFloat("recommender.iterator.learnrate", 0.01F);
epocheSize = configuration.getInteger("recommender.iterator.maximum", 10);
numberOfDocuments = scoreMatrix.getElementSize();
// count the number of words, build the word dictionary and
// userItemToDoc dictionary
Map<String, Integer> wordDictionaries = new HashMap<>();
Table<Integer, Integer, Float> documentTable = HashBasedTable.create();
// TODO Rename rowCount to documentIndex?
int rowCount = 0;
userItemToDocument = HashBasedTable.create();
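// W stores normalized term frequencies: each occurrence contributes
// 1 / words.length, so every document row of W sums to 1.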
for (DataInstance sample : model) {
int userIndex = sample.getQualityFeature(userDimension);
int itemIndex = sample.getQualityFeature(itemDimension);
int documentIndex = sample.getQualityFeature(commentDimension);
userItemToDocument.put(userIndex, itemIndex, rowCount);
// convert wordIds to wordIndices
String data = (String) documentValues[documentIndex];
String[] words = data.isEmpty() ? new String[0] : data.split(":");
for (String word : words) {
Integer wordIndex = wordDictionaries.get(word);
if (wordIndex == null) {
wordIndex = numberOfWords++;
wordDictionaries.put(word, wordIndex);
}
Float oldValue = documentTable.get(rowCount, wordIndex);
if (oldValue == null) {
oldValue = 0F;
}
float newValue = oldValue + 1F / words.length;
documentTable.put(rowCount, wordIndex, newValue);
}
rowCount++;
}
// build W
W = SparseMatrix.valueOf(numberOfDocuments, numberOfWords, documentTable);
userBiases = DenseVector.valueOf(userSize);
userBiases.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
itemBiases = DenseVector.valueOf(itemSize);
itemBiases.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
userFactors = DenseMatrix.valueOf(userSize, numberOfTopics);
userFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
itemFactors = DenseMatrix.valueOf(itemSize, numberOfTopics);
itemFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
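// NOTE: initStd is declared in this class but appears never to be assigned
// here (see the TODO about the duplicated superclass fields), so K1 and K2
// start at 0 and only grow through the updates in doPractice.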
K1 = initStd;
K2 = initStd;
topicVector = DenseVector.valueOf(numberOfTopics);
function = new SoftMaxActivationFunction();
// init theta and phi
// TODO theta is actually documentFactors
documentFactors = DenseMatrix.valueOf(numberOfDocuments, numberOfTopics);
calculateTheta();
// TODO phi is actually wordFactors
wordFactors = DenseMatrix.valueOf(numberOfTopics, numberOfWords);
wordFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(RandomUtility.randomFloat(0.01F));
});
logger.info("number of users : " + userSize);
logger.info("number of Items : " + itemSize);
logger.info("number of words : " + wordDictionaries.size());
}
@Override
protected void doPractice() {
DefaultScalar scalar = DefaultScalar.getInstance();
DenseMatrix transposeThis = DenseMatrix.valueOf(numberOfTopics, numberOfTopics);
DenseMatrix thetaW = DenseMatrix.valueOf(numberOfTopics, numberOfWords);
DenseMatrix thetaPhi = DenseMatrix.valueOf(numberOfTopics, numberOfWords);
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
totalError = 0F;
float wordLoss = 0F;
for (MatrixScalar term : scoreMatrix) {
int userIndex = term.getRow(); // userIdx
int itemIndex = term.getColumn(); // itemIdx
int documentIndex = userItemToDocument.get(userIndex, itemIndex);
float y_true = term.getValue();
float y_pred = predict(userIndex, itemIndex);
float error = y_true - y_pred;
totalError += error * error;
// update user item biases
float userBiasValue = userBiases.getValue(userIndex);
userBiases.shiftValue(userIndex, learnRatio * (error - lambdaB * userBiasValue));
totalError += lambdaB * userBiasValue * userBiasValue;
float itemBiasValue = itemBiases.getValue(itemIndex);
itemBiases.shiftValue(itemIndex, learnRatio * (error - lambdaB * itemBiasValue));
totalError += lambdaB * itemBiasValue * itemBiasValue;
// update user item factors
for (int factorIndex = 0; factorIndex < numberOfTopics; factorIndex++) {
float userFactor = userFactors.getValue(userIndex, factorIndex);
float itemFactor = itemFactors.getValue(itemIndex, factorIndex);
userFactors.shiftValue(userIndex, factorIndex, learnRatio * (error * itemFactor - lambdaU * userFactor));
itemFactors.shiftValue(itemIndex, factorIndex, learnRatio * (error * userFactor - lambdaV * itemFactor));
totalError += lambdaU * userFactor * userFactor + lambdaV * itemFactor * itemFactor;
SparseVector documentVector = W.getRowVector(documentIndex);
for (VectorScalar documentTerm : documentVector) {
int wordIndex = documentTerm.getIndex();
float w_pred = scalar.dotProduct(documentFactors.getRowVector(documentIndex), wordFactors.getColumnVector(wordIndex)).getValue();
float w_true = documentTerm.getValue();
float w_error = w_true - w_pred;
wordLoss += w_error;
float derivative = 0F;
for (int topicIndex = 0; topicIndex < numberOfTopics; topicIndex++) {
if (factorIndex == topicIndex) {
derivative += w_error * wordFactors.getValue(topicIndex, wordIndex) * documentFactors.getValue(documentIndex, topicIndex) * (1 - documentFactors.getValue(documentIndex, topicIndex));
} else {
derivative += w_error * wordFactors.getValue(topicIndex, wordIndex) * documentFactors.getValue(documentIndex, topicIndex) * (-documentFactors.getValue(documentIndex, factorIndex));
}
// update K1 K2
K1 += learnRatio * lambda * w_error * wordFactors.getValue(topicIndex, wordIndex) * documentFactors.getValue(documentIndex, topicIndex) * (1 - documentFactors.getValue(documentIndex, topicIndex)) * Math.abs(userFactors.getValue(userIndex, topicIndex));
K2 += learnRatio * lambda * w_error * wordFactors.getValue(topicIndex, wordIndex) * documentFactors.getValue(documentIndex, topicIndex) * (1 - documentFactors.getValue(documentIndex, topicIndex)) * Math.abs(itemFactors.getValue(itemIndex, topicIndex));
}
userFactors.shiftValue(userIndex, factorIndex, learnRatio * K1 * derivative);
itemFactors.shiftValue(itemIndex, factorIndex, learnRatio * K2 * derivative);
}
}
}
// calculate theta
logger.info(" iter:" + epocheIndex + ", finish factors update");
// calculate wordLoss and loss
wordLoss = wordLoss / numberOfTopics;
totalError += wordLoss;
totalError *= 0.5F;
logger.info(" iter:" + epocheIndex + ", loss:" + totalError + ", wordLoss:" + wordLoss / 2F);
calculateTheta();
logger.info(" iter:" + epocheIndex + ", finish theta update");
// update phi by NMF
// TODO These operations could be consolidated.
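// Multiplicative NMF update: phi <- phi * (theta^T W) / (theta^T theta phi),
// elementwise, preserving non-negativity; note there is no epsilon guard on
// the denominator here.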
thetaW.dotProduct(documentFactors, true, W, false, MathCalculator.SERIAL);
transposeThis.dotProduct(documentFactors, true, documentFactors, false, MathCalculator.SERIAL);
thetaPhi.dotProduct(transposeThis, false, wordFactors, false, MathCalculator.SERIAL);
for (int topicIndex = 0; topicIndex < numberOfTopics; topicIndex++) {
for (int wordIndex = 0; wordIndex < numberOfWords; wordIndex++) {
float numerator = wordFactors.getValue(topicIndex, wordIndex) * thetaW.getValue(topicIndex, wordIndex);
float denominator = thetaPhi.getValue(topicIndex, wordIndex);
wordFactors.setValue(topicIndex, wordIndex, numerator / denominator);
}
}
logger.info(" iter:" + epocheIndex + ", finish phi update");
}
}
@Override
protected float predict(int userIndex, int itemIndex) {
DefaultScalar scalar = DefaultScalar.getInstance();
float value = meanScore + userBiases.getValue(userIndex) + itemBiases.getValue(itemIndex);
value += scalar.dotProduct(userFactors.getRowVector(userIndex), itemFactors.getRowVector(itemIndex)).getValue();
return value;
}
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
instance.setQuantityMark(predict(userIndex, itemIndex));
}
/**
* Calculate theta vectors via userFactors and itemFactors:
* thetaVector = softmax(K1 * |u| + K2 * |v|) (the softmax itself supplies the exponential)
*/
private void calculateTheta() {
for (MatrixScalar term : scoreMatrix) {
int userIndex = term.getRow();
int itemIndex = term.getColumn();
int documentIdx = userItemToDocument.get(userIndex, itemIndex);
DenseVector documentVector = documentFactors.getRowVector(documentIdx);
topicVector.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int index = scalar.getIndex();
float value = scalar.getValue();
value = Math.abs(userFactors.getValue(userIndex, index)) * K1 + Math.abs(itemFactors.getValue(itemIndex, index)) * K2;
scalar.setValue(value);
});
function.forward(topicVector, documentVector);
}
}
}

View File

@ -0,0 +1,277 @@
package com.jstarcraft.rns.model.content.rating;
import java.util.HashMap;
import java.util.Map;
import com.google.common.collect.HashBasedTable;
import com.google.common.collect.Table;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.data.attribute.MemoryQualityAttribute;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.matrix.MatrixScalar;
import com.jstarcraft.ai.math.structure.matrix.SparseMatrix;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.ai.math.structure.vector.VectorScalar;
import com.jstarcraft.ai.model.neuralnetwork.activation.ActivationFunction;
import com.jstarcraft.ai.model.neuralnetwork.activation.SoftMaxActivationFunction;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.rns.model.MatrixFactorizationModel;
/**
*
* TopicMF MT recommender
*
* <pre>
* TopicMF: Simultaneously Exploiting Ratings and Reviews for Recommendation
* Based on the LibRec implementation
* </pre>
*
* @author Birdy
*
*/
public class TopicMFMTModel extends MatrixFactorizationModel {
protected String commentField;
protected int commentDimension;
protected SparseMatrix W;
protected DenseMatrix documentFactors;
protected DenseMatrix wordFactors;
protected float K;
protected DenseVector userBiases;
protected DenseVector itemBiases;
// TODO To be removed: already implemented in the superclass.
protected DenseMatrix userFactors;
protected DenseMatrix itemFactors;
// TODO Is topic effectively the same thing as factor?
protected int numberOfTopics;
protected int numberOfWords;
protected int numberOfDocuments;
protected float lambda, lambdaU, lambdaV, lambdaB;
protected Table<Integer, Integer, Integer> userItemToDocument;
// TODO To be removed: already implemented in the superclass.
protected float initMean;
protected float initStd;
protected DenseVector topicVector;
protected ActivationFunction function;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
commentField = configuration.getString("data.model.fields.comment");
commentDimension = model.getQualityInner(commentField);
MemoryQualityAttribute attribute = (MemoryQualityAttribute) space.getQualityAttribute(commentField);
Object[] documentValues = attribute.getDatas();
// init hyper-parameters
lambda = configuration.getFloat("recommender.regularization.lambda", 0.001F);
lambdaU = configuration.getFloat("recommender.regularization.lambdaU", 0.001F);
lambdaV = configuration.getFloat("recommender.regularization.lambdaV", 0.001F);
lambdaB = configuration.getFloat("recommender.regularization.lambdaB", 0.001F);
numberOfTopics = configuration.getInteger("recommender.topic.number", 10);
learnRatio = configuration.getFloat("recommender.iterator.learnrate", 0.01F);
epocheSize = configuration.getInteger("recommender.iterator.maximum", 10);
numberOfDocuments = scoreMatrix.getElementSize();
// count the number of words, build the word dictionary and
// userItemToDoc dictionary
Map<String, Integer> wordDictionaries = new HashMap<>();
Table<Integer, Integer, Float> documentTable = HashBasedTable.create();
int rowCount = 0;
userItemToDocument = HashBasedTable.create();
for (DataInstance sample : model) {
int userIndex = sample.getQualityFeature(userDimension);
int itemIndex = sample.getQualityFeature(itemDimension);
int documentIndex = sample.getQualityFeature(commentDimension);
userItemToDocument.put(userIndex, itemIndex, rowCount);
// convert wordIds to wordIndices
String data = (String) documentValues[documentIndex];
String[] words = data.isEmpty() ? new String[0] : data.split(":");
for (String word : words) {
Integer wordIndex = wordDictionaries.get(word);
if (wordIndex == null) {
wordIndex = numberOfWords++;
wordDictionaries.put(word, wordIndex);
}
Float oldValue = documentTable.get(rowCount, wordIndex);
if (oldValue == null) {
oldValue = 0F;
}
float newValue = oldValue + 1F / words.length;
documentTable.put(rowCount, wordIndex, newValue);
}
rowCount++;
}
// build W
W = SparseMatrix.valueOf(numberOfDocuments, numberOfWords, documentTable);
userBiases = DenseVector.valueOf(userSize);
userBiases.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
itemBiases = DenseVector.valueOf(itemSize);
itemBiases.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
userFactors = DenseMatrix.valueOf(userSize, numberOfTopics);
userFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
itemFactors = DenseMatrix.valueOf(itemSize, numberOfTopics);
itemFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
K = initStd;
topicVector = DenseVector.valueOf(numberOfTopics);
function = new SoftMaxActivationFunction();
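// MT variant: a single scale K couples |u| and |v| multiplicatively in theta
// (theta = softmax(K * |u| * |v|), elementwise), whereas the AT variant uses
// two separate additive scales K1 and K2.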
// init theta and phi
// TODO theta is actually documentFactors
documentFactors = DenseMatrix.valueOf(numberOfDocuments, numberOfTopics);
calculateTheta();
// TODO phi is actually wordFactors
wordFactors = DenseMatrix.valueOf(numberOfTopics, numberOfWords);
wordFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(RandomUtility.randomFloat(0.01F));
});
logger.info("number of users : " + userSize);
logger.info("number of Items : " + itemSize);
logger.info("number of words : " + wordDictionaries.size());
}
@Override
protected void doPractice() {
DefaultScalar scalar = DefaultScalar.getInstance();
DenseMatrix transposeThis = DenseMatrix.valueOf(numberOfTopics, numberOfTopics);
DenseMatrix thetaW = DenseMatrix.valueOf(numberOfTopics, numberOfWords);
DenseMatrix thetaPhi = DenseMatrix.valueOf(numberOfTopics, numberOfWords);
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
totalError = 0F;
float wordLoss = 0F;
for (MatrixScalar term : scoreMatrix) {
int userIndex = term.getRow(); // userIdx
int itemIndex = term.getColumn(); // itemIdx
int documentIndex = userItemToDocument.get(userIndex, itemIndex);
float y_true = term.getValue();
float y_pred = predict(userIndex, itemIndex);
float error = y_true - y_pred;
totalError += error * error;
// update user item biases
float userBiasValue = userBiases.getValue(userIndex);
userBiases.shiftValue(userIndex, learnRatio * (error - lambdaB * userBiasValue));
totalError += lambdaB * userBiasValue * userBiasValue;
float itemBiasValue = itemBiases.getValue(itemIndex);
itemBiases.shiftValue(itemIndex, learnRatio * (error - lambdaB * itemBiasValue));
totalError += lambdaB * itemBiasValue * itemBiasValue;
// update user item factors
for (int factorIndex = 0; factorIndex < numberOfTopics; factorIndex++) {
float userFactorValue = userFactors.getValue(userIndex, factorIndex);
float itemFactorValue = itemFactors.getValue(itemIndex, factorIndex);
userFactors.shiftValue(userIndex, factorIndex, learnRatio * (error * itemFactorValue - lambdaU * userFactorValue));
itemFactors.shiftValue(itemIndex, factorIndex, learnRatio * (error * userFactorValue - lambdaV * itemFactorValue));
totalError += lambdaU * userFactorValue * userFactorValue + lambdaV * itemFactorValue * itemFactorValue;
SparseVector documentVector = W.getRowVector(documentIndex);
for (VectorScalar documentTerm : documentVector) {
int wordIndex = documentTerm.getIndex();
float w_pred = scalar.dotProduct(documentFactors.getRowVector(documentIndex), wordFactors.getColumnVector(wordIndex)).getValue();
float w_true = documentTerm.getValue();
float w_error = w_true - w_pred;
wordLoss += w_error;
float derivative = 0F;
for (int topicIndex = 0; topicIndex < numberOfTopics; topicIndex++) {
if (factorIndex == topicIndex) {
derivative += w_error * wordFactors.getValue(topicIndex, wordIndex) * documentFactors.getValue(documentIndex, topicIndex) * (1 - documentFactors.getValue(documentIndex, topicIndex));
} else {
derivative += w_error * wordFactors.getValue(topicIndex, wordIndex) * documentFactors.getValue(documentIndex, topicIndex) * (-documentFactors.getValue(documentIndex, factorIndex));
}
// update K
K += learnRatio * lambda * w_error * wordFactors.getValue(topicIndex, wordIndex) * documentFactors.getValue(documentIndex, topicIndex) * (1 - documentFactors.getValue(documentIndex, topicIndex)) * Math.abs(userFactors.getValue(userIndex, topicIndex));
}
userFactors.shiftValue(userIndex, factorIndex, learnRatio * K * derivative * itemFactors.getValue(itemIndex, factorIndex));
itemFactors.shiftValue(itemIndex, factorIndex, learnRatio * K * derivative * userFactors.getValue(userIndex, factorIndex));
}
}
}
// calculate theta
logger.info(" iter:" + epocheIndex + ", finish factors update");
// calculate wordLoss and loss
wordLoss = wordLoss / numberOfTopics;
totalError += wordLoss;
totalError *= 0.5F;
logger.info(" iter:" + epocheIndex + ", loss:" + totalError + ", wordLoss:" + wordLoss / 2F);
calculateTheta();
logger.info(" iter:" + epocheIndex + ", finish theta update");
// update phi by NMF
// TODO These operations could be consolidated.
thetaW.dotProduct(documentFactors, true, W, false, MathCalculator.SERIAL);
transposeThis.dotProduct(documentFactors, true, documentFactors, false, MathCalculator.SERIAL);
thetaPhi.dotProduct(transposeThis, false, wordFactors, false, MathCalculator.SERIAL);
for (int topicIndex = 0; topicIndex < numberOfTopics; topicIndex++) {
for (int wordIndex = 0; wordIndex < numberOfWords; wordIndex++) {
float numerator = wordFactors.getValue(topicIndex, wordIndex) * thetaW.getValue(topicIndex, wordIndex);
float denominator = thetaPhi.getValue(topicIndex, wordIndex);
wordFactors.setValue(topicIndex, wordIndex, numerator / denominator);
}
}
logger.info(" iter:" + epocheIndex + ", finish phi update");
}
}
@Override
protected float predict(int userIndex, int itemIndex) {
DefaultScalar scalar = DefaultScalar.getInstance();
float value = meanScore + userBiases.getValue(userIndex) + itemBiases.getValue(itemIndex);
value += scalar.dotProduct(userFactors.getRowVector(userIndex), itemFactors.getRowVector(itemIndex)).getValue();
return value;
}
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
instance.setQuantityMark(predict(userIndex, itemIndex));
}
/**
* Calculate theta vectors via userFactors and itemFactors:
* thetaVector = softmax(K * |u| * |v|) (elementwise product; the softmax supplies the exponential)
*/
private void calculateTheta() {
for (MatrixScalar term : scoreMatrix) {
int userIndex = term.getRow();
int itemIndex = term.getColumn();
int documentIdx = userItemToDocument.get(userIndex, itemIndex);
DenseVector documentVector = documentFactors.getRowVector(documentIdx);
topicVector.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int index = scalar.getIndex();
float value = scalar.getValue();
value = K * Math.abs(userFactors.getValue(userIndex, index)) * Math.abs(itemFactors.getValue(itemIndex, index));
scalar.setValue(value);
});
function.forward(topicVector, documentVector);
}
}
}

View File

@ -0,0 +1,6 @@
http://www.ntu.edu.sg/home/gaocong/
http://www.vldb.org/pvldb/vol10/p1010-liu.pdf
Twelve POI (point-of-interest) algorithms.
http://spatialkeyword.sce.ntu.edu.sg/eval-vldb17/

View File

@ -0,0 +1,327 @@
package com.jstarcraft.rns.model.context.ranking;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.matrix.HashMatrix;
import com.jstarcraft.ai.math.structure.matrix.SparseMatrix;
import com.jstarcraft.ai.math.structure.vector.ArrayVector;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.ai.math.structure.vector.VectorScalar;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.Float2FloatKeyValue;
import com.jstarcraft.core.utility.KeyValue;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.rns.model.MatrixFactorizationModel;
import com.jstarcraft.rns.model.exception.ModelException;
import com.jstarcraft.rns.utility.LogisticUtility;
import it.unimi.dsi.fastutil.longs.Long2FloatRBTreeMap;
/**
*
* Rank-GeoFM recommender
*
* <pre>
* Rank-GeoFM: A ranking based geographical factorization method for point of interest recommendation
* Adapted from the LibRec team's implementation
* </pre>
*
* @author Birdy
*
*/
public class RankGeoFMModel extends MatrixFactorizationModel {
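// Rank-GeoFM scores a user-POI pair with a preference term plus a geographical term:
// score(u, l) = U1_u . L_l + U2_u . F_l, where U1/U2 are the explicit/implicit user factors
// and F_l is the neighbor-weighted geographical influence of POI l (see calculateGeoInfluenceMatrix)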
protected DenseMatrix explicitUserFactors, implicitUserFactors, itemFactors;
protected ArrayVector[] neighborWeights;
protected float margin, radius, balance;
protected DenseVector E;
protected DenseMatrix geoInfluences;
protected int knn;
protected Float2FloatKeyValue[] itemLocations;
private String longitudeField, latitudeField;
private int longitudeDimension, latitudeDimension;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
margin = configuration.getFloat("recommender.ranking.margin", 0.3F);
radius = configuration.getFloat("recommender.regularization.radius", 1F);
balance = configuration.getFloat("recommender.regularization.balance", 0.2F);
knn = configuration.getInteger("recommender.item.nearest.neighbour.number", 300);
longitudeField = configuration.getString("data.model.fields.longitude");
latitudeField = configuration.getString("data.model.fields.latitude");
DataModule locationModel = space.getModule("location");
longitudeDimension = locationModel.getQuantityInner(longitudeField);
latitudeDimension = locationModel.getQuantityInner(latitudeField);
geoInfluences = DenseMatrix.valueOf(itemSize, factorSize);
explicitUserFactors = DenseMatrix.valueOf(userSize, factorSize);
explicitUserFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
implicitUserFactors = DenseMatrix.valueOf(userSize, factorSize);
implicitUserFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
itemFactors = DenseMatrix.valueOf(itemSize, factorSize);
itemFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(distribution.sample().floatValue());
});
itemLocations = new Float2FloatKeyValue[itemSize];
int itemDimension = locationModel.getQualityInner(itemField);
for (DataInstance instance : locationModel) {
int itemIndex = instance.getQualityFeature(itemDimension);
Float2FloatKeyValue itemLocation = new Float2FloatKeyValue(instance.getQuantityFeature(longitudeDimension), instance.getQuantityFeature(latitudeDimension));
itemLocations[itemIndex] = itemLocation;
}
calculateNeighborWeightMatrix(knn);
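// E(n) = 1 + 1/2 + ... + 1/n: harmonic weights used to scale the ranking loss by the estimated rank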
E = DenseVector.valueOf(itemSize + 1);
E.setValue(1, 1F);
for (int itemIndex = 2; itemIndex <= itemSize; itemIndex++) {
E.setValue(itemIndex, E.getValue(itemIndex - 1) + 1F / itemIndex);
}
}
@Override
protected void doPractice() {
DefaultScalar scalar = DefaultScalar.getInstance();
DenseMatrix explicitUserDeltas = DenseMatrix.valueOf(explicitUserFactors.getRowSize(), explicitUserFactors.getColumnSize());
DenseMatrix implicitUserDeltas = DenseMatrix.valueOf(implicitUserFactors.getRowSize(), implicitUserFactors.getColumnSize());
DenseMatrix itemDeltas = DenseMatrix.valueOf(itemFactors.getRowSize(), itemFactors.getColumnSize());
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
calculateGeoInfluenceMatrix();
totalError = 0F;
explicitUserDeltas.iterateElement(MathCalculator.PARALLEL, (element) -> {
element.setValue(explicitUserFactors.getValue(element.getRow(), element.getColumn()));
});
implicitUserDeltas.iterateElement(MathCalculator.PARALLEL, (element) -> {
element.setValue(implicitUserFactors.getValue(element.getRow(), element.getColumn()));
});
itemDeltas.iterateElement(MathCalculator.PARALLEL, (element) -> {
element.setValue(itemFactors.getValue(element.getRow(), element.getColumn()));
});
for (int userIndex = 0; userIndex < userSize; userIndex++) {
SparseVector userVector = scoreMatrix.getRowVector(userIndex);
for (VectorScalar term : userVector) {
int positiveItemIndex = term.getIndex();
int sampleCount = 0;
float positiveScore = scalar.dotProduct(explicitUserDeltas.getRowVector(userIndex), itemDeltas.getRowVector(positiveItemIndex)).getValue() + scalar.dotProduct(implicitUserDeltas.getRowVector(userIndex), geoInfluences.getRowVector(positiveItemIndex)).getValue();
float positiveValue = term.getValue();
int negativeItemIndex;
float negativeScore;
float negativeValue;
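// keep sampling negatives until one violates the ranking with margin, giving up after itemSize attempts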
while (true) {
negativeItemIndex = RandomUtility.randomInteger(itemSize);
negativeScore = scalar.dotProduct(explicitUserDeltas.getRowVector(userIndex), itemDeltas.getRowVector(negativeItemIndex)).getValue() + scalar.dotProduct(implicitUserDeltas.getRowVector(userIndex), geoInfluences.getRowVector(negativeItemIndex)).getValue();
negativeValue = 0F;
for (VectorScalar rateTerm : userVector) {
if (rateTerm.getIndex() == negativeItemIndex) {
negativeValue = rateTerm.getValue();
}
}
sampleCount++;
if ((indicator(positiveValue, negativeValue) && indicator(negativeScore + margin, positiveScore)) || sampleCount > itemSize) {
break;
}
}
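// incompatibility found: estimate the positive item's rank as floor(itemSize / sampleCount) and weight the loss by E(rank)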
if (indicator(positiveValue, negativeValue) && indicator(negativeScore + margin, positiveScore)) {
int sampleIndex = itemSize / sampleCount;
float s = LogisticUtility.getValue(negativeScore + margin - positiveScore);
totalError += E.getValue(sampleIndex) * s;
float uij = s * (1 - s);
float error = E.getValue(sampleIndex) * uij * learnRatio;
DenseVector positiveItemVector = itemFactors.getRowVector(positiveItemIndex);
DenseVector negativeItemVector = itemFactors.getRowVector(negativeItemIndex);
DenseVector explicitUserVector = explicitUserFactors.getRowVector(userIndex);
DenseVector positiveGeoVector = geoInfluences.getRowVector(positiveItemIndex);
DenseVector negativeGeoVector = geoInfluences.getRowVector(negativeItemIndex);
DenseVector implicitUserVector = implicitUserFactors.getRowVector(userIndex);
// TODO could be computed in parallel
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
explicitUserVector.setValue(factorIndex, explicitUserVector.getValue(factorIndex) - (negativeItemVector.getValue(factorIndex) - positiveItemVector.getValue(factorIndex)) * error);
implicitUserVector.setValue(factorIndex, implicitUserVector.getValue(factorIndex) - (negativeGeoVector.getValue(factorIndex) - positiveGeoVector.getValue(factorIndex)) * error);
}
// TODO could be computed in parallel
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float itemDelta = explicitUserVector.getValue(factorIndex) * error;
positiveItemVector.setValue(factorIndex, positiveItemVector.getValue(factorIndex) + itemDelta);
negativeItemVector.setValue(factorIndex, negativeItemVector.getValue(factorIndex) - itemDelta);
}
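// project the updated factors back onto their L2 balls: radius for user/item factors, balance * radius for the implicit user factors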
float explicitUserDelta = explicitUserVector.getNorm(2, true);
if (explicitUserDelta > radius) {
explicitUserDelta = radius / explicitUserDelta;
} else {
explicitUserDelta = 1F;
}
float implicitUserDelta = implicitUserVector.getNorm(2F, true);
if (implicitUserDelta > balance * radius) {
implicitUserDelta = balance * radius / implicitUserDelta;
} else {
implicitUserDelta = 1F;
}
float positiveItemDelta = positiveItemVector.getNorm(2, true);
if (positiveItemDelta > radius) {
positiveItemDelta = radius / positiveItemDelta;
} else {
positiveItemDelta = 1F;
}
float negativeItemDelta = negativeItemVector.getNorm(2, true);
if (negativeItemDelta > radius) {
negativeItemDelta = radius / negativeItemDelta;
} else {
negativeItemDelta = 1F;
}
// TODO could be computed in parallel
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
if (explicitUserDelta != 1F) {
explicitUserVector.setValue(factorIndex, explicitUserVector.getValue(factorIndex) * explicitUserDelta);
}
if (implicitUserDelta != 1F) {
implicitUserVector.setValue(factorIndex, implicitUserVector.getValue(factorIndex) * implicitUserDelta);
}
if (positiveItemDelta != 1F) {
positiveItemVector.setValue(factorIndex, positiveItemVector.getValue(factorIndex) * positiveItemDelta);
}
if (negativeItemDelta != 1F) {
negativeItemVector.setValue(factorIndex, negativeItemVector.getValue(factorIndex) * negativeItemDelta);
}
}
}
}
}
if (isConverged(epocheIndex) && isConverged) {
break;
}
isLearned(epocheIndex);
currentError = totalError;
}
}
/**
 * Build the row-normalized neighbor weight matrix: for each item, keep its
 * k_nearest geographically closest items, weighted by inverse distance.
 *
 * @param k_nearest number of nearest neighbors to keep per item
 */
private void calculateNeighborWeightMatrix(Integer k_nearest) {
HashMatrix dataTable = new HashMatrix(true, itemSize, itemSize, new Long2FloatRBTreeMap());
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
List<KeyValue<Integer, Float>> locationNeighbors = new ArrayList<>(itemSize);
Float2FloatKeyValue location = itemLocations[itemIndex];
for (int neighborIndex = 0; neighborIndex < itemSize; neighborIndex++) {
if (itemIndex != neighborIndex) {
Float2FloatKeyValue neighborLocation = itemLocations[neighborIndex];
// Float2FloatKeyValue holds (key = longitude, value = latitude), so pass latitude first
float distance = getDistance(location.getValue(), location.getKey(), neighborLocation.getValue(), neighborLocation.getKey());
locationNeighbors.add(new KeyValue<>(neighborIndex, distance));
}
}
Collections.sort(locationNeighbors, (left, right) -> {
// ascending order
return left.getValue().compareTo(right.getValue());
});
locationNeighbors = locationNeighbors.subList(0, k_nearest);
for (int index = 0; index < locationNeighbors.size(); index++) {
int neighborItemIdx = locationNeighbors.get(index).getKey();
float weight;
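// inverse-distance weighting with distances clamped below at 0.5km: weight = 1 / max(distance, 0.5)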
if (locationNeighbors.get(index).getValue() < 0.5F) {
weight = 1F / 0.5F;
} else {
weight = 1F / (locationNeighbors.get(index).getValue());
}
dataTable.setValue(itemIndex, neighborItemIdx, weight);
}
}
SparseMatrix matrix = SparseMatrix.valueOf(itemSize, itemSize, dataTable);
neighborWeights = new ArrayVector[itemSize];
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
ArrayVector neighborVector = new ArrayVector(matrix.getRowVector(itemIndex));
neighborVector.scaleValues(1F / neighborVector.getSum(false));
neighborWeights[itemIndex] = neighborVector;
}
}
private void calculateGeoInfluenceMatrix() throws ModelException {
for (int itemIndex = 0; itemIndex < itemSize; itemIndex++) {
ArrayVector neighborVector = neighborWeights[itemIndex];
if (neighborVector.getElementSize() == 0) {
continue;
}
DenseVector geoVector = geoInfluences.getRowVector(itemIndex);
geoVector.setValues(0F);
for (VectorScalar term : neighborVector) {
DenseVector itemVector = itemFactors.getRowVector(term.getIndex());
geoVector.iterateElement(MathCalculator.SERIAL, (scalar) -> {
int index = scalar.getIndex();
float value = scalar.getValue();
scalar.setValue(value + itemVector.getValue(index) * term.getValue());
});
}
}
}
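// haversine great-circle distance between two (latitude, longitude) points, in kilometers (Earth radius 6378137m)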
private float getDistance(float leftLatitude, float leftLongitude, float rightLatitude, float rightLongitude) {
float radius = 6378137F;
leftLatitude = (float) (leftLatitude * Math.PI / 180F);
rightLatitude = (float) (rightLatitude * Math.PI / 180F);
float latitude = leftLatitude - rightLatitude;
float longitude = (float) ((leftLongitude - rightLongitude) * Math.PI / 180F);
latitude = (float) Math.sin(latitude / 2F);
longitude = (float) Math.sin(longitude / 2F);
float distance = (float) (2F * radius * Math.asin(Math.sqrt(latitude * latitude + Math.cos(leftLatitude) * Math.cos(rightLatitude) * longitude * longitude)));
return distance / 1000F;
}
private boolean indicator(double left, double right) {
return left > right;
}
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
DefaultScalar scalar = DefaultScalar.getInstance();
float value = scalar.dotProduct(explicitUserFactors.getRowVector(userIndex), itemFactors.getRowVector(itemIndex)).getValue();
value += scalar.dotProduct(implicitUserFactors.getRowVector(userIndex), geoInfluences.getRowVector(itemIndex)).getValue();
instance.setQuantityMark(value);
}
}


@@ -0,0 +1,216 @@
package com.jstarcraft.rns.model.context.ranking;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.ai.math.structure.vector.VectorScalar;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.rns.model.SocialModel;
import com.jstarcraft.rns.utility.LogisticUtility;
import it.unimi.dsi.fastutil.ints.IntSet;
/**
*
* SBPR recommender
*
* <pre>
* Social Bayesian Personalized Ranking (SBPR)
* Leveraging Social Connections to Improve Personalized Ranking for Collaborative Filtering
* Adapted from the LibRec team's implementation
* </pre>
*
* @author Birdy
*
*/
// TODO still needs refactoring
public class SBPRModel extends SocialModel {
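// SBPR assumes a per-user ranking x_ui >= x_uk >= x_uj: items the user consumed (i) rank above
// items consumed only by trusted friends (k), which rank above all remaining items (j)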
/**
* items biases vector
*/
private DenseVector itemBiases;
/**
* bias regularization
*/
protected float regBias;
/**
* find items rated by trusted neighbors only
*/
// TODO consider refactoring to List&lt;IntSet&gt;
private List<List<Integer>> socialItemList;
private List<IntSet> userItemSet;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
regBias = configuration.getFloat("recommender.bias.regularization", 0.01F);
itemBiases = DenseVector.valueOf(itemSize);
itemBiases.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(RandomUtility.randomFloat(1F));
});
userItemSet = getUserItemSet(scoreMatrix);
// TODO consider refactoring
// find items rated by trusted neighbors only
socialItemList = new ArrayList<>(userSize);
for (int userIndex = 0; userIndex < userSize; userIndex++) {
SparseVector userVector = scoreMatrix.getRowVector(userIndex);
IntSet itemSet = userItemSet.get(userIndex);
// find items rated by trusted neighbors only
SparseVector socialVector = socialMatrix.getRowVector(userIndex);
List<Integer> socialList = new LinkedList<>();
for (VectorScalar term : socialVector) {
int socialIndex = term.getIndex();
SparseVector trustedVector = scoreMatrix.getRowVector(socialIndex);
for (VectorScalar entry : trustedVector) {
int itemIndex = entry.getIndex();
// v's rated items
if (!itemSet.contains(itemIndex) && !socialList.contains(itemIndex)) {
socialList.add(itemIndex);
}
}
}
socialItemList.add(new ArrayList<>(socialList));
}
}
@Override
protected void doPractice() {
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
totalError = 0F;
for (int sampleIndex = 0, sampleTimes = userSize * 100; sampleIndex < sampleTimes; sampleIndex++) {
// uniformly draw (userIdx, posItemIdx, k, negItemIdx)
int userIndex, positiveItemIndex, negativeItemIndex;
// userIdx
SparseVector userVector;
do {
userIndex = RandomUtility.randomInteger(userSize);
userVector = scoreMatrix.getRowVector(userIndex);
} while (userVector.getElementSize() == 0);
// positive item index
positiveItemIndex = userVector.getIndex(RandomUtility.randomInteger(userVector.getElementSize()));
float positiveScore = predict(userIndex, positiveItemIndex);
// social Items List
// TODO an IntSet would be a better fit here
List<Integer> socialList = socialItemList.get(userIndex);
IntSet itemSet = userItemSet.get(userIndex);
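// sample a negative item rated neither by the user nor by any trusted friend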
do {
negativeItemIndex = RandomUtility.randomInteger(itemSize);
} while (itemSet.contains(negativeItemIndex) || socialList.contains(negativeItemIndex));
float negativeScore = predict(userIndex, negativeItemIndex);
if (socialList.size() > 0) {
// if having social neighbors
int itemIndex = socialList.get(RandomUtility.randomInteger(socialList.size()));
float socialScore = predict(userIndex, itemIndex);
SparseVector socialVector = socialMatrix.getRowVector(userIndex);
float socialWeight = 0F;
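// socialWeight counts the trusted friends who rated this social item;
// the positive-social difference is damped by 1 / (1 + socialWeight) below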
for (VectorScalar term : socialVector) {
int socialIndex = term.getIndex();
itemSet = userItemSet.get(socialIndex);
if (itemSet.contains(itemIndex)) {
socialWeight += 1;
}
}
float positiveError = (positiveScore - socialScore) / (1 + socialWeight);
float negativeError = socialScore - negativeScore;
float positiveGradient = LogisticUtility.getValue(-positiveError), negativeGradient = LogisticUtility.getValue(-negativeError);
float error = (float) (-Math.log(1 - positiveGradient) - Math.log(1 - negativeGradient));
totalError += error;
// update bi, bk, bj
float positiveBias = itemBiases.getValue(positiveItemIndex);
itemBiases.shiftValue(positiveItemIndex, learnRatio * (positiveGradient / (1F + socialWeight) - regBias * positiveBias));
totalError += regBias * positiveBias * positiveBias;
float socialBias = itemBiases.getValue(itemIndex);
itemBiases.shiftValue(itemIndex, learnRatio * (-positiveGradient / (1F + socialWeight) + negativeGradient - regBias * socialBias));
totalError += regBias * socialBias * socialBias;
float negativeBias = itemBiases.getValue(negativeItemIndex);
itemBiases.shiftValue(negativeItemIndex, learnRatio * (-negativeGradient - regBias * negativeBias));
totalError += regBias * negativeBias * negativeBias;
// update P, Q
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float userFactor = userFactors.getValue(userIndex, factorIndex);
float positiveFactor = itemFactors.getValue(positiveItemIndex, factorIndex);
float itemFactor = itemFactors.getValue(itemIndex, factorIndex);
float negativeFactor = itemFactors.getValue(negativeItemIndex, factorIndex);
float delta = positiveGradient * (positiveFactor - itemFactor) / (1F + socialWeight) + negativeGradient * (itemFactor - negativeFactor);
userFactors.shiftValue(userIndex, factorIndex, learnRatio * (delta - userRegularization * userFactor));
itemFactors.shiftValue(positiveItemIndex, factorIndex, learnRatio * (positiveGradient * userFactor / (1F + socialWeight) - itemRegularization * positiveFactor));
itemFactors.shiftValue(negativeItemIndex, factorIndex, learnRatio * (negativeGradient * (-userFactor) - itemRegularization * negativeFactor));
delta = positiveGradient * (-userFactor / (1F + socialWeight)) + negativeGradient * userFactor;
itemFactors.shiftValue(itemIndex, factorIndex, learnRatio * (delta - itemRegularization * itemFactor));
totalError += userRegularization * userFactor * userFactor + itemRegularization * positiveFactor * positiveFactor + itemRegularization * negativeFactor * negativeFactor + itemRegularization * itemFactor * itemFactor;
}
} else {
// if no social neighbors, the same as BPR
float error = positiveScore - negativeScore;
totalError += error;
float gradient = LogisticUtility.getValue(-error);
// update bi, bj
float positiveBias = itemBiases.getValue(positiveItemIndex);
itemBiases.shiftValue(positiveItemIndex, learnRatio * (gradient - regBias * positiveBias));
totalError += regBias * positiveBias * positiveBias;
float negativeBias = itemBiases.getValue(negativeItemIndex);
itemBiases.shiftValue(negativeItemIndex, learnRatio * (-gradient - regBias * negativeBias));
totalError += regBias * negativeBias * negativeBias;
// update user factors, item factors
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float userFactor = userFactors.getValue(userIndex, factorIndex);
float positiveFactor = itemFactors.getValue(positiveItemIndex, factorIndex);
float negItemFactorValue = itemFactors.getValue(negativeItemIndex, factorIndex);
userFactors.shiftValue(userIndex, factorIndex, learnRatio * (gradient * (positiveFactor - negItemFactorValue) - userRegularization * userFactor));
itemFactors.shiftValue(positiveItemIndex, factorIndex, learnRatio * (gradient * userFactor - itemRegularization * positiveFactor));
itemFactors.shiftValue(negativeItemIndex, factorIndex, learnRatio * (gradient * (-userFactor) - itemRegularization * negItemFactorValue));
totalError += userRegularization * userFactor * userFactor + itemRegularization * positiveFactor * positiveFactor + itemRegularization * negItemFactorValue * negItemFactorValue;
}
}
}
if (isConverged(epocheIndex) && isConverged) {
break;
}
isLearned(epocheIndex);
currentError = totalError;
}
}
@Override
protected float predict(int userIndex, int itemIndex) {
DefaultScalar scalar = DefaultScalar.getInstance();
DenseVector userVector = userFactors.getRowVector(userIndex);
DenseVector itemVector = itemFactors.getRowVector(itemIndex);
return itemBiases.getValue(itemIndex) + scalar.dotProduct(userVector, itemVector).getValue();
}
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
instance.setQuantityMark(predict(userIndex, itemIndex));
}
}


@@ -0,0 +1,169 @@
package com.jstarcraft.rns.model.context.rating;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.vector.DenseVector;
import com.jstarcraft.ai.math.structure.vector.SparseVector;
import com.jstarcraft.ai.math.structure.vector.VectorScalar;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.rns.model.SocialModel;
import com.jstarcraft.rns.utility.LogisticUtility;
/**
*
* RSTE recommender
*
* <pre>
* Learning to Recommend with Social Trust Ensemble
* Adapted from the LibRec team's implementation
* </pre>
*
* @author Birdy
*
*/
public class RSTEModel extends SocialModel {
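// RSTE blends the user's own preference with those of trusted users:
// predict(u, i) = sigmoid(ratio * U_u . V_i + (1 - ratio) * sum_k(t_uk * U_k . V_i) / sum_k(t_uk)),
// where ratio is recommender.user.social.ratio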
private float userSocialRatio;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
userFactors = DenseMatrix.valueOf(userSize, factorSize);
userFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(RandomUtility.randomFloat(1F));
});
itemFactors = DenseMatrix.valueOf(itemSize, factorSize);
itemFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(RandomUtility.randomFloat(1F));
});
userSocialRatio = configuration.getFloat("recommender.user.social.ratio", 0.8F);
}
@Override
protected void doPractice() {
DefaultScalar scalar = DefaultScalar.getInstance();
DenseVector socialFactors = DenseVector.valueOf(factorSize);
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
totalError = 0F;
DenseMatrix userDeltas = DenseMatrix.valueOf(userSize, factorSize);
DenseMatrix itemDeltas = DenseMatrix.valueOf(itemSize, factorSize);
// ratings
for (int userIndex = 0; userIndex < userSize; userIndex++) {
SparseVector socialVector = socialMatrix.getRowVector(userIndex);
float socialWeight = 0F;
socialFactors.setValues(0F);
for (VectorScalar socialTerm : socialVector) {
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
socialFactors.setValue(factorIndex, socialFactors.getValue(factorIndex) + socialTerm.getValue() * userFactors.getValue(socialTerm.getIndex(), factorIndex));
}
socialWeight += socialTerm.getValue();
}
DenseVector userVector = userFactors.getRowVector(userIndex);
for (VectorScalar rateTerm : scoreMatrix.getRowVector(userIndex)) {
int itemIndex = rateTerm.getIndex();
float score = rateTerm.getValue();
score = (score - minimumScore) / (maximumScore - minimumScore);
// compute directly to speed up calculation
DenseVector itemVector = itemFactors.getRowVector(itemIndex);
float predict = scalar.dotProduct(userVector, itemVector).getValue();
float sum = 0F;
for (VectorScalar socialTerm : socialVector) {
sum += socialTerm.getValue() * scalar.dotProduct(userFactors.getRowVector(socialTerm.getIndex()), itemVector).getValue();
}
predict = userSocialRatio * predict + (1F - userSocialRatio) * (socialWeight > 0F ? sum / socialWeight : 0F);
float error = LogisticUtility.getValue(predict) - score;
totalError += error * error;
error = LogisticUtility.getGradient(predict) * error;
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float userFactor = userFactors.getValue(userIndex, factorIndex);
float itemFactor = itemFactors.getValue(itemIndex, factorIndex);
float userDelta = userSocialRatio * error * itemFactor + userRegularization * userFactor;
float socialFactor = socialWeight > 0 ? socialFactors.getValue(factorIndex) / socialWeight : 0;
float itemDelta = error * (userSocialRatio * userFactor + (1 - userSocialRatio) * socialFactor) + itemRegularization * itemFactor;
userDeltas.shiftValue(userIndex, factorIndex, userDelta);
itemDeltas.shiftValue(itemIndex, factorIndex, itemDelta);
totalError += userRegularization * userFactor * userFactor + itemRegularization * itemFactor * itemFactor;
}
}
}
// social
for (int trusterIndex = 0; trusterIndex < userSize; trusterIndex++) {
SparseVector trusterVector = socialMatrix.getColumnVector(trusterIndex);
for (VectorScalar term : trusterVector) {
int trusteeIndex = term.getIndex();
SparseVector trusteeVector = socialMatrix.getRowVector(trusteeIndex);
DenseVector userVector = userFactors.getRowVector(trusteeIndex);
float socialWeight = 0F;
for (VectorScalar socialTerm : trusteeVector) {
socialWeight += socialTerm.getValue();
}
for (VectorScalar rateTerm : scoreMatrix.getRowVector(trusteeIndex)) {
int itemIndex = rateTerm.getIndex();
float score = rateTerm.getValue();
score = (score - minimumScore) / (maximumScore - minimumScore);
// compute prediction for user-item (p, j)
DenseVector itemVector = itemFactors.getRowVector(itemIndex);
float predict = scalar.dotProduct(userVector, itemVector).getValue();
float sum = 0F;
for (VectorScalar socialTerm : trusteeVector) {
sum += socialTerm.getValue() * scalar.dotProduct(itemFactors.getRowVector(socialTerm.getIndex()), itemVector).getValue();
}
predict = userSocialRatio * predict + (1F - userSocialRatio) * (socialWeight > 0F ? sum / socialWeight : 0F);
float error = LogisticUtility.getValue(predict) - score;
error = LogisticUtility.getGradient(predict) * error * term.getValue();
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
userDeltas.shiftValue(trusterIndex, factorIndex, (1 - userSocialRatio) * error * itemFactors.getValue(itemIndex, factorIndex));
}
}
}
}
userFactors.iterateElement(MathCalculator.PARALLEL, (element) -> {
int row = element.getRow();
int column = element.getColumn();
float value = element.getValue();
element.setValue(value + userDeltas.getValue(row, column) * -learnRatio);
});
itemFactors.iterateElement(MathCalculator.PARALLEL, (element) -> {
int row = element.getRow();
int column = element.getColumn();
float value = element.getValue();
element.setValue(value + itemDeltas.getValue(row, column) * -learnRatio);
});
totalError *= 0.5F;
if (isConverged(epocheIndex) && isConverged) {
break;
}
isLearned(epocheIndex);
currentError = totalError;
}
}
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
DefaultScalar scalar = DefaultScalar.getInstance();
DenseVector userVector = userFactors.getRowVector(userIndex);
DenseVector itemVector = itemFactors.getRowVector(itemIndex);
float predict = scalar.dotProduct(userVector, itemVector).getValue();
float sum = 0F, socialWeight = 0F;
SparseVector socialVector = socialMatrix.getRowVector(userIndex);
for (VectorScalar socialTerm : socialVector) {
float score = socialTerm.getValue();
DenseVector socialFactor = userFactors.getRowVector(socialTerm.getIndex());
sum += score * scalar.dotProduct(socialFactor, itemVector).getValue();
socialWeight += score;
}
predict = userSocialRatio * predict + (1 - userSocialRatio) * (socialWeight > 0 ? sum / socialWeight : 0);
instance.setQuantityMark(denormalize(LogisticUtility.getValue(predict)));
}
}


@@ -0,0 +1,160 @@
package com.jstarcraft.rns.model.context.rating;
import java.util.ArrayList;
import java.util.List;
import com.jstarcraft.ai.data.DataInstance;
import com.jstarcraft.ai.data.DataModule;
import com.jstarcraft.ai.data.DataSpace;
import com.jstarcraft.ai.math.structure.DefaultScalar;
import com.jstarcraft.ai.math.structure.MathCalculator;
import com.jstarcraft.ai.math.structure.matrix.DenseMatrix;
import com.jstarcraft.ai.math.structure.matrix.MatrixScalar;
import com.jstarcraft.core.common.option.Option;
import com.jstarcraft.core.utility.RandomUtility;
import com.jstarcraft.rns.model.SocialModel;
import com.jstarcraft.rns.utility.LogisticUtility;
/**
*
* SoRec recommender
*
* <pre>
* SoRec: Social recommendation using probabilistic matrix factorization
* Adapted from the LibRec team's implementation
* </pre>
*
* @author Birdy
*
*/
public class SoRecModel extends SocialModel {
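// SoRec co-factorizes the score matrix (as U V^T) and the social matrix (as U Z^T) with a shared
// user factor matrix U; socialFactors plays the role of Z, and each social entry is weighted by
// sqrt(inDegree(k) / (outDegree(i) + inDegree(k)))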
/**
 * social factor matrix, factorizing the social relation matrix together with userFactors
 */
private DenseMatrix socialFactors;
private float regScore, regSocial;
private List<Integer> inDegrees, outDegrees;
@Override
public void prepare(Option configuration, DataModule model, DataSpace space) {
super.prepare(configuration, model, space);
userFactors = DenseMatrix.valueOf(userSize, factorSize);
userFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(RandomUtility.randomFloat(1F));
});
itemFactors = DenseMatrix.valueOf(itemSize, factorSize);
itemFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(RandomUtility.randomFloat(1F));
});
socialFactors = DenseMatrix.valueOf(userSize, factorSize);
socialFactors.iterateElement(MathCalculator.SERIAL, (scalar) -> {
scalar.setValue(RandomUtility.randomFloat(1F));
});
regScore = configuration.getFloat("recommender.rate.social.regularization", 0.01F);
regSocial = configuration.getFloat("recommender.user.social.regularization", 0.01F);
inDegrees = new ArrayList<>(userSize);
outDegrees = new ArrayList<>(userSize);
for (int userIndex = 0; userIndex < userSize; userIndex++) {
int in = socialMatrix.getColumnScope(userIndex);
int out = socialMatrix.getRowScope(userIndex);
inDegrees.add(in);
outDegrees.add(out);
}
}
@Override
protected void doPractice() {
DefaultScalar scalar = DefaultScalar.getInstance();
for (int epocheIndex = 0; epocheIndex < epocheSize; epocheIndex++) {
totalError = 0F;
DenseMatrix userDeltas = DenseMatrix.valueOf(userSize, factorSize);
DenseMatrix itemDeltas = DenseMatrix.valueOf(itemSize, factorSize);
DenseMatrix socialDeltas = DenseMatrix.valueOf(userSize, factorSize);
// ratings
for (MatrixScalar term : scoreMatrix) {
int userIdx = term.getRow();
int itemIdx = term.getColumn();
float score = term.getValue();
float predict = super.predict(userIdx, itemIdx);
float error = LogisticUtility.getValue(predict) - (score - minimumScore) / (maximumScore - minimumScore);
totalError += error * error;
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float userFactor = userFactors.getValue(userIdx, factorIndex);
float itemFactor = itemFactors.getValue(itemIdx, factorIndex);
userDeltas.shiftValue(userIdx, factorIndex, LogisticUtility.getGradient(predict) * error * itemFactor + userRegularization * userFactor);
itemDeltas.shiftValue(itemIdx, factorIndex, LogisticUtility.getGradient(predict) * error * userFactor + itemRegularization * itemFactor);
totalError += userRegularization * userFactor * userFactor + itemRegularization * itemFactor * itemFactor;
}
}
// friends
// TODO this matrix is symmetric; is there a way to reduce the computation?
for (MatrixScalar term : socialMatrix) {
int userIndex = term.getRow();
int socialIndex = term.getColumn();
float socialScore = term.getValue();
// tuv ~ cik in the original paper
if (socialScore == 0F) {
continue;
}
float socialPredict = scalar.dotProduct(userFactors.getRowVector(userIndex), socialFactors.getRowVector(socialIndex)).getValue();
float socialInDegree = inDegrees.get(socialIndex); // ~ d-(k)
float userOutDegree = outDegrees.get(userIndex); // ~ d+(i)
float weight = (float) Math.sqrt(socialInDegree / (userOutDegree + socialInDegree));
float socialError = LogisticUtility.getValue(socialPredict) - weight * socialScore;
totalError += regScore * socialError * socialError;
socialPredict = LogisticUtility.getGradient(socialPredict);
for (int factorIndex = 0; factorIndex < factorSize; factorIndex++) {
float userFactor = userFactors.getValue(userIndex, factorIndex);
float socialFactor = socialFactors.getValue(socialIndex, factorIndex);
userDeltas.shiftValue(userIndex, factorIndex, regScore * socialPredict * socialError * socialFactor);
socialDeltas.shiftValue(socialIndex, factorIndex, regScore * socialPredict * socialError * userFactor + regSocial * socialFactor);
totalError += regSocial * socialFactor * socialFactor;
}
}
userFactors.iterateElement(MathCalculator.PARALLEL, (element) -> {
int row = element.getRow();
int column = element.getColumn();
float value = element.getValue();
element.setValue(value + userDeltas.getValue(row, column) * -learnRatio);
});
itemFactors.iterateElement(MathCalculator.PARALLEL, (element) -> {
int row = element.getRow();
int column = element.getColumn();
float value = element.getValue();
element.setValue(value + itemDeltas.getValue(row, column) * -learnRatio);
});
socialFactors.iterateElement(MathCalculator.PARALLEL, (element) -> {
int row = element.getRow();
int column = element.getColumn();
float value = element.getValue();
element.setValue(value + socialDeltas.getValue(row, column) * -learnRatio);
});
totalError *= 0.5F;
if (isConverged(epocheIndex) && isConverged) {
break;
}
isLearned(epocheIndex);
currentError = totalError;
}
}
@Override
public void predict(DataInstance instance) {
int userIndex = instance.getQualityFeature(userDimension);
int itemIndex = instance.getQualityFeature(itemDimension);
float predict = super.predict(userIndex, itemIndex);
predict = denormalize(LogisticUtility.getValue(predict));
instance.setQuantityMark(predict);
}
}

Some files were not shown because too many files have changed in this diff.