first derivative

1. why is the first derivative the best linear approximation of a function?

first, what does it mean to be the best linear approximation of a function?
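consider a function in 1D \(f(x)\) and a point \(a\)
We say that a linear approximation, given by coefficients \(A,B\), is best near \(a\) if \(f(x) = A + Bx + o(|x-a|)\) as \(x\rightarrow a\)
Here, \(o(|x-a|)\) is an error term \(r(x)\) that shrinks faster than any nonzero linear function of \(x-a\), i.e. \(r(x)/|x-a| \rightarrow 0\) as \(x\rightarrow a\).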
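Quick reminder for myself: Why does this describe the best linear approximation? What if we could correct the linear term and get a better approximation? Any correction has to vanish at \(a\) (both lines must agree with \(f\) in the limit \(x\rightarrow a\)), so that would mean we would also have \(f(x) = A + Bx + C(x-a) + o(|x-a|)\) with \(C \neq 0\), where the error term is from the same class. Subtracting the two expressions, this would mean \(o(|x-a|) = C(x-a) + o(|x-a|)\). But this can't be. Look at the RHS: we're adding a nonzero linear function to something that is dominated by every nonzero linear function and still getting something in the same class (see LHS). Dividing both sides by \(|x-a|\) makes the contradiction explicit: the LHS goes to \(0\), while the absolute value of the RHS goes to \(|C| \neq 0\).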
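A quick numerical sanity check of the argument, as a sketch only. The choices here are my own assumptions for illustration: \(f = \exp\), \(a = 1\), a hypothetical slope perturbation `C = 0.5`, and the helper name `error_ratio`. It compares error divided by \(|x-a|\) for the tangent line and for a line through the same point with the wrong slope.

```python
import math


def error_ratio(f, a, A, B, h):
    """Return |f(a+h) - (A + B*(a+h))| / |h|: the error relative to the step size."""
    x = a + h
    return abs(f(x) - (A + B * x)) / abs(h)


# Illustrative choices (assumptions, not from the notes): f = exp, a = 1.
f = math.exp
a = 1.0
slope = math.exp(a)        # f'(a), since exp is its own derivative
A = f(a) - slope * a       # intercept so the line passes through (a, f(a))

# Hypothetical perturbed line: same point (a, f(a)), slope off by C.
C = 0.5
B_bad = slope + C
A_bad = f(a) - B_bad * a

steps = [10.0 ** -k for k in range(1, 6)]
tangent_ratios = [error_ratio(f, a, A, slope, h) for h in steps]
perturbed_ratios = [error_ratio(f, a, A_bad, B_bad, h) for h in steps]

for h, t, p in zip(steps, tangent_ratios, perturbed_ratios):
    print(f"h={h:.0e}  tangent error/h={t:.3e}  perturbed error/h={p:.3e}")
```

The tangent column should shrink toward \(0\) as the step shrinks, while the perturbed column should settle near \(|C|\), which is the numerical version of "you can't add a linear term and stay in \(o(|x-a|)\)".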
It turns out that only the first derivative can give these coefficients: \(B = f'(a)\) and \(A = f(a) - f'(a)a\).
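A short sketch of why the slope has to be the derivative. Writing the error term as \(r(x)\) (just a label for the \(o(|x-a|)\) term) and assuming \(r(a) = 0\), so that \(f(a) = A + Ba\), we get for \(x \neq a\):
\[
\frac{f(x) - f(a)}{x - a} = \frac{B(x-a) + r(x)}{x-a} = B + \frac{r(x)}{x-a} \longrightarrow B \quad \text{as } x \rightarrow a
\]
The left side is the difference quotient, so its limit is \(f'(a)\) by definition. That forces \(B = f'(a)\), and then \(A = f(a) - f'(a)a\).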
Quick note: in computer science, little \(o\) notation deals with things as \(n\rightarrow \infty\). Here we are concerned with what happens as \(x\rightarrow a\), i.e. as \(|x-a|\rightarrow 0\).
see more https://math.stackexchange.com/questions/1784262/how-is-the-derivative-truly-literally-the-best-linear-approximation-near-a-po