self-supervision
1. for audio
- two main approaches
- contrastive loss: see van den Oord et al. 2019 – Representation Learning with Contrastive Predictive Coding (CPC)
- given the context, discriminate the representation of the true (future) interval from the representations of distractor intervals
- reconstructive loss: see TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech
- reconstruct the masked or altered segment, usually its spectrogram or other spectral features
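A minimal sketch of the contrastive objective, assuming an InfoNCE-style loss as in CPC with plain dot-product scores and precomputed vectors. The function and argument names are hypothetical. CPC also has an encoder, an autoregressive context network and a separate projection per prediction step; all of that is omitted here, and `prediction` stands in for the projected context vector.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def logsumexp(xs: List[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def info_nce_loss(prediction: Vector, positive: Vector,
                  distractors: List[Vector]) -> float:
    """InfoNCE-style contrastive loss for a single prediction (sketch).

    prediction  -- hypothetical stand-in for the model's prediction of an
                   interval (in CPC, a projection of the context vector)
    positive    -- representation of the true interval
    distractors -- representations of other (negative) intervals
    """
    # Score every candidate; the true interval sits at index 0.
    scores = [dot(prediction, positive)] + [dot(prediction, d) for d in distractors]
    # Cross-entropy of picking index 0: -log softmax(scores)[0]
    return logsumexp(scores) - scores[0]
```

The loss is low when the prediction scores the true interval well above every distractor, and equals log(N) for N candidates when all scores tie, i.e. when the model cannot tell them apart.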
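A minimal sketch of the reconstructive objective. It assumes a spectrogram stored as a list of frames, zero-masking of one contiguous time span, and an L1 loss computed only on the masked frames; these are simplifications chosen for the sketch. TERA also alters the input along the frequency and magnitude axes and uses a Transformer encoder to produce the reconstruction. Here, the `reconstruction` argument stands in for that model's output, and all function names are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Frame = List[float]  # one spectrogram frame, e.g. log-Mel bins


def mask_time_span(spec: List[Frame], start: int,
                   width: int) -> Tuple[List[Frame], List[int]]:
    """Zero out `width` consecutive frames starting at `start`.

    Returns a masked copy and the indices of the masked frames.
    """
    end = min(start + width, len(spec))
    masked = [list(frame) for frame in spec]  # copy so the target stays intact
    idx = list(range(start, end))
    for t in idx:
        masked[t] = [0.0] * len(spec[t])
    return masked, idx


def random_time_mask(spec: List[Frame], width: int,
                     rng: Optional[random.Random] = None) -> Tuple[List[Frame], List[int]]:
    """Mask one contiguous span at a random position (hypothetical helper)."""
    rng = rng or random.Random()
    start = rng.randrange(0, max(1, len(spec) - width + 1))
    return mask_time_span(spec, start, width)


def masked_l1_loss(reconstruction: List[Frame], target: List[Frame],
                   masked_idx: List[int]) -> float:
    """Mean absolute error over the masked frames only."""
    total, count = 0.0, 0
    for t in masked_idx:
        for r, y in zip(reconstruction[t], target[t]):
            total += abs(r - y)
            count += 1
    return total / count if count else 0.0
```

Restricting the loss to the masked frames means the model gets no credit for copying the visible input; it has to infer the hidden span from its context.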