Hu and Singh 2021 - Transformer is All You Need: Multimodal Multitask Learning with a Unified Transformer
Paper for: hu21_trans_is_all_you_need
Train 7 tasks – some vision-only (object detection), some vision-and-language, some language-only – at the same time using one model. Language goes through a language encoder and images go through an image encoder; a shared decoder attends over the encoded inputs, and a separate head for each task sits on top of the decoder output (sketch below).
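Not the paper's code – a minimal PyTorch sketch of that layout, assuming `nn.Transformer` building blocks. Layer counts, hidden size, query count, the task names, and the dummy 10-way heads are all placeholder assumptions, not UniT's actual configuration.

```python
import torch
import torch.nn as nn

class UnifiedTransformer(nn.Module):
    """Sketch: per-modality encoders, one shared decoder with
    per-task query embeddings, and a task-specific output head."""

    def __init__(self, d_model=256, num_queries=100, tasks=("vqa", "detection")):
        super().__init__()
        # Modality-specific encoders (stand-ins for a conv backbone +
        # transformer image encoder and a BERT-style text encoder).
        self.image_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=6)
        self.text_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=6)
        # One decoder shared across all tasks.
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=6)
        # Learned per-task queries and per-task heads (dummy output size).
        self.queries = nn.ParameterDict(
            {t: nn.Parameter(torch.randn(num_queries, d_model)) for t in tasks})
        self.heads = nn.ModuleDict({t: nn.Linear(d_model, 10) for t in tasks})

    def forward(self, task, image_feats=None, text_feats=None):
        # Encode whichever modalities this task uses, then concatenate.
        encoded = []
        if image_feats is not None:
            encoded.append(self.image_encoder(image_feats))
        if text_feats is not None:
            encoded.append(self.text_encoder(text_feats))
        memory = torch.cat(encoded, dim=1)
        # Shared decoder attends over the encoded inputs via task queries.
        q = self.queries[task].unsqueeze(0).expand(memory.size(0), -1, -1)
        hidden = self.decoder(q, memory)
        return self.heads[task](hidden)

# Usage (random features standing in for real encoder inputs):
# model = UnifiedTransformer()
# out = model("vqa", image_feats=torch.randn(2, 49, 256),
#             text_feats=torch.randn(2, 16, 256))
```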
See Transformer
Cites carion20_end_to_end_objec_detec_with_trans (see Carion et al 2020 - End-to-End Object Detection with Transformers)
1. things to look up:
- linear warm-up followed by cosine learning-rate decay for the Adam optimizer (sketch below)
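A minimal sketch of that schedule, assuming PyTorch's `LambdaLR`: the learning rate ramps linearly from 0 to the base LR over the warm-up steps, then follows a cosine decay to 0. The step counts and base LR here are made-up placeholders; the exact variant the paper uses may differ.

```python
import math
import torch

def warmup_cosine(step, warmup_steps, total_steps):
    """LR multiplier: linear warm-up to 1.0, then cosine decay to 0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

model = torch.nn.Linear(8, 2)  # dummy model
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: warmup_cosine(step, 2000, 100000))

# Per training step: loss.backward(); optimizer.step(); scheduler.step()
```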