Belinkov 2021 – Probing Classifiers: Promises, Shortcomings, and Advances

1. why probing?

"A main motivation in this body of work is the opacity of the representations. Compared to performance on downstream tasks, probing classifiers aim to provide more nuanced evaluations w.r.t simple properties"
"Good probing performance is often taken to indicate several potential situations: good quality of the representations w.r.t the probing property, readability of information found in the representations, or its extractability."

"A first concern with the framework is how to interpret the results of a probing classifier experiment."
compare with baselines – e.g. static word embeddings
compare with "skylines" (upper bounds) – e.g. human performance
Hewitt and Liang 2019 (TODO) – compare with control tasks, where labels are assigned to word types at random
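
To make the control-task idea concrete, here is a minimal sketch of how such labels could be built and compared against the real task. The function names `make_control_labels` and `selectivity` are hypothetical, and drawing the random labels from the empirical distribution of the gold labels is an assumption of this sketch, not something stated in these notes.

```python
import random


def make_control_labels(tokens, gold_labels, seed=0):
    """Assign each word TYPE one random label, reused for every occurrence.

    Assumption of this sketch: labels are drawn from the empirical
    distribution of gold_labels (uniform choice over the token-level list).
    """
    rng = random.Random(seed)
    label_for_type = {}
    for tok in tokens:
        if tok not in label_for_type:
            label_for_type[tok] = rng.choice(gold_labels)
    return [label_for_type[tok] for tok in tokens]


def selectivity(task_accuracy, control_accuracy):
    """Gap between probe accuracy on the real task and on the control task."""
    return task_accuracy - control_accuracy


tokens = ["the", "cat", "sat", "on", "the", "mat"]
gold = ["DET", "NOUN", "VERB", "ADP", "DET", "NOUN"]
control = make_control_labels(tokens, gold)
# Both occurrences of "the" share one control label, whatever it happens to be.
```

A probe that scores well on the real labels but poorly on `control` has a large gap between the two accuracies, which is the kind of comparison the control-task setup is meant to enable.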

choice of probe: a balance between simplicity and performance
an argument for simplicity: you don't want the probe to overfit
an argument for complexity: the original model contains non-linearities, so to detect whether semantic information is present in the embeddings, the probe may need non-linearities too
- question: what do control tasks control for? I think: for the probe's ability to simply memorize whatever label (e.g., a POS tag) each word has been assigned, in which case it never needs to look for POS information in the embedding.
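
The memorization worry can be illustrated with a toy sketch. Everything here is hypothetical: the "probes" are a lookup table and a majority-class predictor rather than trained classifiers, and the input is the word identity itself, standing in for a representation that uniquely identifies the word type.

```python
import random
from collections import Counter, defaultdict


def train_lookup_probe(inputs, labels):
    """High-capacity toy probe: memorize the most frequent label per input."""
    counts = defaultdict(Counter)
    for x, y in zip(inputs, labels):
        counts[x][y] += 1
    table = {x: c.most_common(1)[0][0] for x, c in counts.items()}
    fallback = Counter(labels).most_common(1)[0][0]
    return lambda x: table.get(x, fallback)


def train_majority_probe(labels):
    """Lowest-capacity toy probe: ignore the input, predict the majority label."""
    majority = Counter(labels).most_common(1)[0][0]
    return lambda x: majority


def accuracy(probe, inputs, labels):
    correct = sum(probe(x) == y for x, y in zip(inputs, labels))
    return correct / len(labels)


rng = random.Random(0)
vocab = [f"w{i}" for i in range(50)]
# Control task: one random label per word type, unrelated to any linguistic property.
control_label = {w: rng.choice(["A", "B", "C"]) for w in vocab}

# Every type appears in training, so all test types have been seen.
train_x = vocab * 2
test_x = [rng.choice(vocab) for _ in range(200)]
train_y = [control_label[w] for w in train_x]
test_y = [control_label[w] for w in test_x]

lookup_acc = accuracy(train_lookup_probe(train_x, train_y), test_x, test_y)
majority_acc = accuracy(train_majority_probe(train_y), test_x, test_y)
```

By construction the lookup probe is perfect on the control task even though the labels carry no linguistic information, so high accuracy from a sufficiently expressive probe says little on its own about what the representation encodes.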
"parameter-less" methods:

"Some prior work acknowledged that conclusions can only be made about the existing trained models, not about general architectures"