Hybrid transformer with multi-level fusion for multimodal knowledge graph completion

SIGIR 2022

用统一的框架实现三个KG任务

Multimodal Link Prediction
Multimodal Relation extraction (MRE)
MNER is the task of extracting named entities from text sequences and corresponding images (MNER)

框架
#

RE：对CLS token进行预测

NER：对每个词向量进行预测

链路预测：对mask token进行预测

视觉：直接混合attention

文本：先self-attention再cross-attention