Hu and Singh 2021 - Transformer is All You Need: Multimodal Multitask Learning with a Unified Transformer
Paper for: hu21_trans_is_all_you_need
Train 7 tasks – some vision-only (object detection), some vision-and-language, some language-only – at the same time using one model. Language goes through a language encoder and images go through an image encoder; a shared decoder attends over the encoded inputs, and a separate head for each task sits on top of the decoder output (sketch below).
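Not the paper's code – a minimal PyTorch sketch of that layout, assuming `nn.Transformer` building blocks. Layer counts, hidden size, query count, the task names, and the dummy 10-way heads are all placeholder assumptions, not UniT's actual configuration.

```python
import torch
import torch.nn as nn

class UnifiedTransformer(nn.Module):
    """Sketch: per-modality encoders, one shared decoder with
    per-task query embeddings, and a task-specific output head."""

    def __init__(self, d_model=256, num_queries=100, tasks=("vqa", "detection")):
        super().__init__()
        # Modality-specific encoders (stand-ins for a conv backbone +
        # transformer image encoder and a BERT-style text encoder).
        self.image_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=6)
        self.text_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=6)
        # One decoder shared across all tasks.
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=6)
        # Learned per-task queries and per-task heads (dummy output size).
        self.queries = nn.ParameterDict(
            {t: nn.Parameter(torch.randn(num_queries, d_model)) for t in tasks})
        self.heads = nn.ModuleDict({t: nn.Linear(d_model, 10) for t in tasks})

    def forward(self, task, image_feats=None, text_feats=None):
        # Encode whichever modalities this task uses, then concatenate.
        encoded = []
        if image_feats is not None:
            encoded.append(self.image_encoder(image_feats))
        if text_feats is not None:
            encoded.append(self.text_encoder(text_feats))
        memory = torch.cat(encoded, dim=1)
        # Shared decoder attends over the encoded inputs via task queries.
        q = self.queries[task].unsqueeze(0).expand(memory.size(0), -1, -1)
        hidden = self.decoder(q, memory)
        return self.heads[task](hidden)

# Usage (random features standing in for real encoder inputs):
# model = UnifiedTransformer()
# out = model("vqa", image_feats=torch.randn(2, 49, 256),
#             text_feats=torch.randn(2, 16, 256))
```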
See Transformer
Cites carion20_end_to_end_objec_detec_with_trans (see Carion et al 2020 - End-to-End Object Detection with Transformers)
1. things to look up:
- linear warm-up followed by cosine learning-rate decay for the Adam optimizer (sketch below)
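A minimal sketch of that schedule, assuming PyTorch's `LambdaLR`: the learning rate ramps linearly from 0 to the base LR over the warm-up steps, then follows a cosine decay to 0. The step counts and base LR here are made-up placeholders; the exact variant the paper uses may differ.

```python
import math
import torch

def warmup_cosine(step, warmup_steps, total_steps):
    """LR multiplier: linear warm-up to 1.0, then cosine decay to 0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

model = torch.nn.Linear(8, 2)  # dummy model
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: warmup_cosine(step, 2000, 100000))

# Per training step: loss.backward(); optimizer.step(); scheduler.step()
```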