GNNRecom/gnnrec/kgrec
jeny b6dbdad80f Initial commit 2021-11-16 15:04:52 +08:00
data
utils
__init__.py
random_walk.py
rank.py
readme.md
recall.py
scibert.py
train.py

readme.md

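Recommendation Algorithm Based on Graph Neural Networks

Dataset

oag-cs: an academic network for the computer science domain, built from the OAG Microsoft Academic data (see readme)

Pre-training vertex embeddings

Vertex embeddings are pre-trained with metapath2vec (random walk + word2vec) and used as the input vertex features of the GNN model.

  1. Random walk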
python -m gnnrec.kgrec.random_walk model/word2vec/oag_cs_corpus.txt
  2. Train word vectors
python -m gnnrec.hge.metapath2vec.train_word2vec --size=128 --workers=8 model/word2vec/oag_cs_corpus.txt model/word2vec/oag_cs.model
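
The two steps above can be illustrated with a minimal sketch of a metapath-guided random walk. This is not the code in random_walk.py: the toy graph, the author-paper-author metapath, and the function name `metapath_random_walk` are all assumptions made for illustration. Each walk is written out as one space-separated line, which is the kind of corpus a word2vec trainer consumes as a "sentence".

```python
import random

def metapath_random_walk(adj, start, metapath, walk_length, rng):
    """Walk a heterogeneous graph following a cyclic metapath.

    adj maps (src_type, dst_type) -> {src_id: [dst_id, ...]}.
    metapath is a list of vertex types whose first and last types match,
    e.g. ["author", "paper", "author"] (hypothetical example).
    """
    walk = [f"{metapath[0]}_{start}"]
    current, pos = start, 0
    while len(walk) < walk_length:
        src_type, dst_type = metapath[pos], metapath[pos + 1]
        candidates = adj[(src_type, dst_type)].get(current, [])
        if not candidates:
            break  # dead end: stop the walk early
        current = rng.choice(candidates)
        walk.append(f"{dst_type}_{current}")
        pos += 1
        if pos == len(metapath) - 1:
            pos = 0  # the metapath is cyclic, so start it over
    return walk

# Toy graph (assumed): three authors and two papers.
adj = {
    ("author", "paper"): {0: [0], 1: [0, 1], 2: [1]},
    ("paper", "author"): {0: [0, 1], 1: [1, 2]},
}
rng = random.Random(0)
corpus_lines = [
    " ".join(metapath_random_walk(adj, a, ["author", "paper", "author"], 7, rng))
    for a in (0, 1, 2)
]
for line in corpus_lines:
    print(line)
```

Prefixing each vertex id with its type keeps authors and papers distinct in the vocabulary, so every vertex gets its own embedding after word2vec training.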

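Recall

The fine-tuned SciBERT model (see step 2 of the readme) encodes the query into a vector; the cosine similarity between this vector and the precomputed paper-title vectors is then computed, and the top k titles are returned.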

python -m gnnrec.kgrec.recall

Example recall results:

graph neural network

0.9629	Aggregation Graph Neural Networks
0.9579	Neural Graph Learning: Training Neural Networks Using Graphs
0.9556	Heterogeneous Graph Neural Network
0.9552	Neural Graph Machines: Learning Neural Networks Using Graphs
0.9490	On the choice of graph neural network architectures
0.9474	Measuring and Improving the Use of Graph Information in Graph Neural Networks
0.9362	Challenging the generalization capabilities of Graph Neural Networks for network modeling
0.9295	Strategies for Pre-training Graph Neural Networks
0.9142	Supervised Neural Network Models for Processing Graphs
0.9112	Geometrically Principled Connections in Graph Neural Networks

recommendation algorithm based on knowledge graph

0.9172	Research on Video Recommendation Algorithm Based on Knowledge Reasoning of Knowledge Graph
0.8972	An Improved Recommendation Algorithm in Knowledge Network
0.8558	A personalized recommendation algorithm based on interest graph
0.8431	An Improved Recommendation Algorithm Based on Graph Model
0.8334	The Research of Recommendation Algorithm based on Complete Tripartite Graph Model
0.8220	Recommendation Algorithm based on Link Prediction and Domain Knowledge in Retail Transactions
0.8167	Recommendation Algorithm Based on Graph-Model Considering User Background Information
0.8034	A Tripartite Graph Recommendation Algorithm Based on Item Information and User Preference
0.7774	Improvement of TF-IDF Algorithm Based on Knowledge Graph
0.7770	Graph Searching Algorithms for Semantic-Social Recommendation

scholar disambiguation

0.9690	Scholar search-oriented author disambiguation
0.9040	Author name disambiguation in scientific collaboration and mobility cases
0.8901	Exploring author name disambiguation on PubMed-scale
0.8852	Author Name Disambiguation in Heterogeneous Academic Networks
0.8797	KDD Cup 2013: author disambiguation
0.8796	A survey of author name disambiguation techniques: 2010-2016
0.8721	Who is Who: Name Disambiguation in Large-Scale Scientific Literature
0.8660	Use of ResearchGate and Google CSE for author name disambiguation
0.8643	Automatic Methods for Disambiguating Author Names in Bibliographic Data Repositories
0.8641	A brief survey of automatic methods for author name disambiguation
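
The scoring behind results like these can be sketched as follows. This is a minimal illustration, not the code in recall.py: the two-dimensional vectors stand in for SciBERT embeddings, and the names `cosine_similarity` and `recall_top_k` are assumptions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def recall_top_k(query_vec, title_vecs, titles, k):
    """Return the k (score, title) pairs with the highest cosine similarity."""
    scored = [
        (cosine_similarity(query_vec, vec), title)
        for vec, title in zip(title_vecs, titles)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Toy data (assumed): real title vectors would come from the encoder.
titles = ["title A", "title B", "title C"]
title_vecs = [[1.0, 0.0], [3.0, 4.0], [0.0, 1.0]]
for score, title in recall_top_k([1.0, 0.0], title_vecs, titles, k=2):
    print(f"{score:.4f}\t{title}")
```

With many titles, the vectors are usually stacked into a matrix and normalized once in advance, so that scoring a query reduces to a single matrix-vector product.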

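Fine ranking

Constructing the ground truth

(1) Validation set

Crawl the top 100 scholars in each of the 20 AI subfields from the AI 2000 Most Influential Scholars list published by AMiner.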

pip install 'scrapy>=2.3.0'
cd gnnrec/kgrec/data/preprocess
scrapy runspider ai2000_crawler.py -a save_path=/home/zzy/GNN-Recommendation/data/rank/ai2000.json

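Match these scholars against the scholars in the oag-cs dataset, and manually confirm some highly ranked scholars that were not matched automatically. The result serves as the validation set for the scholar-ranking ground truth.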

export DJANGO_SETTINGS_MODULE=academic_graph.settings.common
export SECRET_KEY=xxx
python -m gnnrec.kgrec.data.preprocess.build_author_rank build-val

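(2) Training set

Following the AI 2000 formula, build a scholar ranking for each field from a weighted sum of paper citation counts, and use it as the ground-truth training set.

Formula: suppose a paper has n authors. The k-th author has weight 1/k, and the last author is treated as the corresponding author with weight 1/2. After the weights are normalized, each scholar's score is the weighted sum of the citation counts of their papers.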

python -m gnnrec.kgrec.data.preprocess.build_author_rank build-train
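
A minimal sketch of the weighting rule described above, under stated assumptions: a single-author paper gives its only author weight 1, "normalized" means the weights of one paper sum to 1, and the function names and toy papers are hypothetical rather than taken from build_author_rank.

```python
from collections import defaultdict

def author_weights(n):
    """Normalized weights for the n authors of one paper.

    The k-th author gets 1/k; the last author is treated as the
    corresponding author and gets 1/2 (assumed to apply only when n > 1).
    """
    weights = [1.0 / k for k in range(1, n + 1)]
    if n > 1:
        weights[-1] = 0.5
    total = sum(weights)
    return [w / total for w in weights]

def author_scores(papers):
    """Sum weight * citations over papers given as (author_list, citations)."""
    scores = defaultdict(float)
    for authors, citations in papers:
        for author, weight in zip(authors, author_weights(len(authors))):
            scores[author] += weight * citations
    return dict(scores)

# Toy papers of one field (assumed data).
papers = [(["a", "b", "c"], 100), (["b"], 10)]
scores = author_scores(papers)
ranking = sorted(scores, key=scores.get, reverse=True)
print(author_weights(3))  # [0.5, 0.25, 0.25]
print(ranking)            # ['a', 'b', 'c']
```

For a three-author paper the raw weights are 1, 1/2, 1/2, so normalization gives the first author half of the paper's citations and the other two a quarter each.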

(3) Evaluating the quality of the ground-truth training set

python -m gnnrec.kgrec.data.preprocess.build_author_rank eval
nDGC@100=0.2420 Precision@100=0.1859    Recall@100=0.2016
nDGC@50=0.2308  Precision@50=0.2494     Recall@50=0.1351
nDGC@20=0.2492  Precision@20=0.3118     Recall@20=0.0678
nDGC@10=0.2743  Precision@10=0.3471     Recall@10=0.0376
nDGC@5=0.3165   Precision@5=0.3765      Recall@5=0.0203
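
For reference, here is a minimal sketch of how metrics of this kind are commonly computed for one field. It assumes binary relevance (a scholar is relevant if they appear in the validation set) and that the figure printed as nDGC above is the usual nDCG; the actual eval command may use graded relevance or average differently.

```python
import math

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are relevant."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the top k."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """nDCG with binary gains and a log2 position discount."""
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Toy example (assumed data): predicted ranking vs. validation scholars.
ranked = ["a", "x", "b", "y"]
relevant = {"a", "b", "c"}
print(precision_at_k(ranked, relevant, 2))
print(recall_at_k(ranked, relevant, 2))
print(ndcg_at_k(ranked, relevant, 2))
```

Precision@k and Recall@k pull in opposite directions as k grows, which is consistent with the pattern in the output above: precision falls and recall rises from k=5 to k=100.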

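(4) Sampling triples

Sample triples (t, a_p, a_n) from the scholar-ranking training set. Each triple states that, in field t, scholar a_p ranks ahead of scholar a_n.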

python -m gnnrec.kgrec.data.preprocess.build_author_rank sample
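
A minimal sketch of what such sampling could look like, assuming pairs are drawn uniformly at random within each field's ranking. The actual sample command may use a different strategy (for example, restricting how far apart the two scholars are), and the names below are hypothetical.

```python
import random

def sample_triples(rankings, num_per_field, rng):
    """Sample (field, a_p, a_n) triples where a_p ranks ahead of a_n.

    rankings maps a field name to its scholars, ordered best first.
    """
    triples = []
    for field, ranked_authors in rankings.items():
        if len(ranked_authors) < 2:
            continue  # need at least two scholars to form a pair
        for _ in range(num_per_field):
            # Two distinct positions; the smaller index is the better rank.
            i, j = sorted(rng.sample(range(len(ranked_authors)), 2))
            triples.append((field, ranked_authors[i], ranked_authors[j]))
    return triples

# Toy ranking (assumed data).
rankings = {"machine learning": ["a", "b", "c", "d"], "tiny field": ["z"]}
for triple in sample_triples(rankings, 3, random.Random(0)):
    print(triple)
```

Triples of this form fit a pairwise ranking objective, in which a model is trained to score a_p above a_n for field t.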

Training the GNN model

python -m gnnrec.kgrec.train model/word2vec/oag-cs.model model/garec_gnn.pt data/rank/author_embed.pt