- Signals
- Visual Knowledge Signal
- ResNet101 -> split into patches -> RRSA
- RRSA (Region Relationship Self-Attention): considers not only the similarity between patch features but also the relative position of each pair of patches
- Clinical Knowledge Signal
- Symptom node embedding (from MBERT) × symptom probability (from the classifier)
- Graph attention over the symptom relationship graph
- Contextual Knowledge Signal
- Previous output -> MBERT -> Masked MHA
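- Visual Knowledge Signal
- U connection
- Autoregressive decoder that conditions on both the previous output and the encoder's visual signals
- In cross-attention, Q comes from the previous output, but K and V are no longer taken from the last encoder layer; instead, decoder layer i uses the output of encoder layer N-i+1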
- Injected Knowledge Distiller
- The last decoder's output integrates the clinical and contextual knowledge signals through attention; after concatenation along the sequence dimension, the result is used to predict the next word
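The RRSA bullet says attention should combine patch-feature similarity with the relative position of each patch pair. The notes do not give RRSA's exact formula, so this sketch assumes one common way to do that: an additive relative-position bias table indexed by the (row, col) offset between two patches. The ResNet101 feature map is replaced by a random tensor, and all function names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def feature_map_to_patches(fmap):
    """Flatten an (H, W, C) CNN feature map into H*W patch tokens of size C."""
    h, w, c = fmap.shape
    return fmap.reshape(h * w, c)


def relative_offsets(h, w):
    """(P, P, 2) array of (row, col) offsets between every pair of the P = h*w patches."""
    rows, cols = np.divmod(np.arange(h * w), w)
    coords = np.stack([rows, cols], axis=1)         # (P, 2) grid position of each patch
    return coords[:, None, :] - coords[None, :, :]  # (P, P, 2)


def rrsa(patches, h, w, wq, wk, wv, bias_table):
    """Self-attention whose logits = feature similarity + relative-position bias.

    Assumption: an additive bias table of shape (2h-1, 2w-1), one scalar per
    possible (row, col) offset; the real RRSA formulation may differ.
    """
    q, k, v = patches @ wq, patches @ wk, patches @ wv
    content = q @ k.T / np.sqrt(q.shape[-1])        # (P, P) feature-similarity term
    off = relative_offsets(h, w)
    position = bias_table[off[..., 0] + h - 1, off[..., 1] + w - 1]  # (P, P) position term
    attn = softmax(content + position, axis=-1)
    return attn @ v, attn


# Usage with random stand-ins for the ResNet101 feature map and the weights.
rng = np.random.default_rng(0)
h, w, c, d = 3, 4, 8, 8
patches = feature_map_to_patches(rng.normal(size=(h, w, c)))
wq, wk, wv = (rng.normal(size=(c, d)) for _ in range(3))
bias_table = rng.normal(size=(2 * h - 1, 2 * w - 1))
out, attn = rrsa(patches, h, w, wq, wk, wv, bias_table)
print(out.shape)  # (12, 8)
```

Two patches with identical features still get different attention weights if their offsets differ, which is the property the bullet describes.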
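The Clinical Knowledge Signal bullets scale each symptom's MBERT node embedding by its predicted probability, then run graph attention over the symptom relationship graph. A minimal sketch, assuming a single-head GAT-style layer (LeakyReLU scores, softmax restricted to graph neighbours) and a toy chain-shaped graph; the embeddings, probabilities, graph and function names are all hypothetical stand-ins.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def symptom_node_features(node_emb, symptom_prob):
    """Scale each symptom's text embedding by its predicted probability."""
    return symptom_prob[:, None] * node_emb          # (S, d)


def graph_attention(x, adj, w, a_src, a_dst, slope=0.2):
    """Single-head GAT-style layer (assumption) restricted to edges of the symptom graph."""
    hid = x @ w                                       # (S, d')
    e = (hid @ a_src)[:, None] + (hid @ a_dst)[None, :]  # (S, S) pairwise scores
    e = np.where(e > 0, e, slope * e)                 # LeakyReLU
    e = np.where(adj > 0, e, -np.inf)                 # only attend to graph neighbours
    alpha = softmax(e, axis=-1)
    return alpha @ hid, alpha


rng = np.random.default_rng(0)
n_sym, d = 5, 8
node_emb = rng.normal(size=(n_sym, d))               # stand-in for MBERT embeddings
symptom_prob = rng.uniform(size=n_sym)               # stand-in for classifier output
adj = np.eye(n_sym)                                   # self-loops keep every row non-empty
for i in range(n_sym - 1):                            # hypothetical chain-shaped graph
    adj[i, i + 1] = adj[i + 1, i] = 1
x = symptom_node_features(node_emb, symptom_prob)
clinical_signal, alpha = graph_attention(
    x, adj, rng.normal(size=(d, d)), rng.normal(size=d), rng.normal(size=d))
```

Multiplying by the probability lets symptoms the classifier considers unlikely contribute weaker features, while the adjacency mask keeps message passing on clinically related symptoms only.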
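For the Contextual Knowledge Signal, the previous output passes through MBERT and then masked multi-head attention. The sketch below shows only the masking step: a causal mask so each position attends to itself and earlier words. Random vectors stand in for the MBERT embeddings, and the function name is hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def masked_mha(x, wq, wk, wv, wo, n_heads):
    """Causal multi-head self-attention over a (T, d_model) sequence."""
    t, d_model = x.shape
    d_head = d_model // n_heads

    def split(m):  # (T, d_model) -> (heads, T, d_head)
        return m.reshape(t, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    logits = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, T, T)
    future = np.triu(np.ones((t, t), dtype=bool), k=1)     # True where key is in the future
    logits = np.where(future, -np.inf, logits)
    attn = softmax(logits, axis=-1)
    out = (attn @ v).transpose(1, 0, 2).reshape(t, d_model)
    return out @ wo, attn


rng = np.random.default_rng(0)
t, d_model, n_heads = 6, 8, 2
x = rng.normal(size=(t, d_model))      # stand-in for MBERT embeddings of previous words
wq, wk, wv, wo = (rng.normal(size=(d_model, d_model)) for _ in range(4))
contextual_signal, attn = masked_mha(x, wq, wk, wv, wo, n_heads)
```

Because of the mask, changing a later word never changes the representation of an earlier one, which is what makes the signal usable during autoregressive generation.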
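The U connection bullets pair decoder layer i with encoder layer N-i+1 for cross-attention. A minimal sketch of that wiring, assuming the decoder has the same number of layers N as the encoder and omitting the masked self-attention and FFN sub-layers; single-head attention, the residual update and all names are simplifying assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_in, memory, wq, wk, wv):
    """Single-head cross-attention: Q from the decoder, K/V from one encoder layer."""
    q, k, v = query_in @ wq, memory @ wk, memory @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return attn @ v


def u_connected_decoder(dec_in, enc_outs, params):
    """enc_outs[j] is the output of encoder layer j + 1 (j = 0..N-1).

    Decoder layer i (1-indexed) takes K/V from encoder layer N - i + 1, so the
    first decoder layer sees the deepest encoder output and the last sees the
    shallowest. Self-attention and FFN sub-layers are omitted for brevity.
    """
    n = len(enc_outs)
    h, pairs = dec_in, []
    for i in range(1, n + 1):
        enc_layer = n - i + 1
        h = h + cross_attention(h, enc_outs[enc_layer - 1], *params[i - 1])  # residual
        pairs.append((i, enc_layer))
    return h, pairs


rng = np.random.default_rng(0)
n_layers, t, p, d = 3, 4, 12, 8
enc_outs = [rng.normal(size=(p, d)) for _ in range(n_layers)]  # stand-in encoder outputs
params = [tuple(rng.normal(size=(d, d)) for _ in range(3)) for _ in range(n_layers)]
out, pairs = u_connected_decoder(rng.normal(size=(t, d)), enc_outs, params)
print(pairs)  # [(1, 3), (2, 2), (3, 1)]
```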
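The notes on the Injected Knowledge Distiller leave open exactly what is concatenated. This sketch takes one reading: the clinical and contextual signals are concatenated along the sequence axis, the last decoder position attends over that combined sequence, and the result is projected to next-word probabilities. The residual add, the single-head attention, the vocabulary projection and every name here are assumptions for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def distill_and_predict(dec_out, clinical, contextual, wq, wk, wv, w_vocab):
    """Hypothetical distiller: attend over [clinical; contextual], then predict next word."""
    memory = np.concatenate([clinical, contextual], axis=0)   # (S + T, d) along sequence axis
    query = dec_out[-1:]                                      # (1, d) last decoder position
    q, k, v = query @ wq, memory @ wk, memory @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)   # (1, S + T)
    fused = query + attn @ v                                  # residual add (assumption)
    probs = softmax(fused @ w_vocab, axis=-1)[0]              # (vocab,)
    return probs, attn


rng = np.random.default_rng(0)
t, n_sym, d, vocab = 6, 5, 8, 20
dec_out = rng.normal(size=(t, d))       # stand-in for the last decoder layer's output
clinical = rng.normal(size=(n_sym, d))  # stand-in for the clinical knowledge signal
contextual = rng.normal(size=(t, d))    # stand-in for the contextual knowledge signal
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
probs, attn = distill_and_predict(
    dec_out, clinical, contextual, wq, wk, wv, rng.normal(size=(d, vocab)))
next_word_id = int(np.argmax(probs))
```

Concatenating first lets a single softmax trade off clinical against contextual evidence; attending to each signal separately and concatenating the two results afterwards is an equally plausible reading of the notes.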