e3nn

From Tess Smidt and Mario Geiger

1. Inputs

The inputs are irreps (see irreducible representation) of O(3). Note that there is some linguistic ambiguity here: "irrep" is used to refer to both the:

homomorphism between the group and some set of matrices (see group action)
the vector space that these matrices act on

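So what you do is specify the types of vector spaces that your inputs come from. For example, I could say my inputs are 47x0e + 5x1e. This means I have 47 scalars and 5 pseudovectors: 1e is the even-parity \(l=1\) irrep, which does not change sign under inversion, whereas ordinary vectors are odd and would be written 5x1o. (See the e3nn documentation for the interpretation of parity, e and o, and also for the labeling of irreps.)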
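To make the notation concrete, here is a minimal sketch that parses a string in this format and counts the total number of components, using the fact that an irrep of degree \(l\) has dimension \(2l+1\). The helpers `parse_irreps` and `total_dim` are hypothetical names written for this note and are not part of e3nn (e3nn provides its own `o3.Irreps` class for this purpose).

```python
import re

def parse_irreps(spec):
    """Parse a string such as '47x0e + 5x1e' into (multiplicity, l, parity) triples."""
    irreps = []
    for term in spec.split("+"):
        # Each term looks like '<mul>x<l><parity>'; the multiplicity is optional.
        match = re.fullmatch(r"(?:(\d+)x)?(\d+)([eo])", term.strip())
        if match is None:
            raise ValueError(f"cannot parse term {term!r}")
        mul = int(match.group(1)) if match.group(1) else 1
        irreps.append((mul, int(match.group(2)), match.group(3)))
    return irreps

def total_dim(irreps):
    """Total number of components: each copy of degree l contributes 2l + 1."""
    return sum(mul * (2 * l + 1) for mul, l, _ in irreps)

irreps = parse_irreps("47x0e + 5x1e")
print(irreps)             # [(47, 0, 'e'), (5, 1, 'e')]
print(total_dim(irreps))  # 62 = 47 * 1 + 5 * 3
```

The parity letter does not affect the dimension; it only determines how the components behave under inversion.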

What does it mean for our inputs to be irreps? It means that we have specified how the inputs transform under the group (here O(3)): each group element acts on the inputs through the matrix that represents it.

Why does the above matter? Because ultimately we want equivariance (see group invariant and group equivariant functions). Since we know how our inputs transform, we can require that \(f(D_x x) = D_y f(x)\) for every group element, where \(f:X\rightarrow Y\) is our function, \(D_x\) is the way that O(3) acts on \(X\), and \(D_y\) is the way O(3) acts on \(Y\). What is our function \(f\)? See below.
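The equivariance condition can be checked numerically. The sketch below is a toy illustration, not e3nn code: it assumes \(X = Y = \mathbb{R}^3\) with \(D_x = D_y = R\) for a rotation \(R\), and the maps `f` and `g` are made-up examples chosen so that one passes and one fails.

```python
import math

def rot_z(theta):
    """Rotation by theta about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def f(x):
    """Toy equivariant map: scale x by its squared length."""
    r2 = sum(c * c for c in x)
    return [r2 * c for c in x]

def g(x):
    """Toy non-equivariant map: it singles out the first coordinate axis."""
    return [x[0] + 1.0, x[1], x[2]]

def is_equivariant(func, rotation, x, tol=1e-9):
    lhs = func(matvec(rotation, x))   # f(D_x x)
    rhs = matvec(rotation, func(x))   # D_y f(x)
    return all(abs(a - b) < tol for a, b in zip(lhs, rhs))

R = rot_z(0.7)
x = [1.0, -2.0, 0.5]
print(is_equivariant(f, R, x))  # True
print(is_equivariant(g, R, x))  # False
```

A single rotation and a single point are only a spot check; a real test would sample many group elements and inputs.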

2. Filters

Our filters are the "tensor product" of our input with the spherical harmonics. Quick review: the spherical harmonics of degree \(l\) form a \((2l+1)\)-dimensional irrep of O(3). That is, consider the 3d point \(x\) and the vector \(y = [Y^l_{-l}(x), \ldots, Y^l_{l}(x)]\). Then what happens when we rotate \(x\)? The vector \(y\) transforms according to the \((2l+1)\times(2l+1)\) matrix that represents the rotation in this irrep.
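Written as an equation, the statement above reads as follows. The symbol \(D^l(R)\) for the matrix representing the rotation \(R\) (the Wigner D-matrix) is notation introduced here, and \(Y^l(x) \in \mathbb{R}^{2l+1}\) denotes the vector of degree-\(l\) spherical harmonics evaluated at \(x\):

\[
Y^l(Rx) = D^l(R)\, Y^l(x), \qquad Y^l(-x) = (-1)^l\, Y^l(x).
\]

Two small cases help fix ideas. For \(l=0\), \(Y^0\) is a constant and \(D^0(R) = 1\). For \(l=1\), the three real spherical harmonics are proportional to the components of \(x/|x|\) (in some ordering), so \(D^1(R)\) is the rotation matrix \(R\) itself up to a fixed change of basis.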

It turns out that the tensor product of two representations is also a representation. So we take the tensor product of our input \(x\) with the values of the spherical harmonics \([Y^l_{-l}(x), \ldots, Y^l_{l}(x)]\). So what happens when we rotate the input \(x\)? The irreps transform according to their matrix representations. The spherical harmonics transform according to their matrix representations. And the whole tensor product transforms according to the tensor product of both these representations.
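The claim that a tensor product of representations is again a representation can be spot-checked with Kronecker products. The sketch below is an illustration under simplifying assumptions: both factors are taken to be the ordinary \(3\times 3\) rotation matrices, and all helper names are made up for this note.

```python
import math

def rot_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_x(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def kron(a, b):
    """Kronecker product: the matrix of the tensor-product representation."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def kron_vec(u, v):
    """Tensor product of two vectors, flattened to match kron's index order."""
    return [a * b for a in u for b in v]

def close(a, b, tol=1e-9):
    return all(abs(p - q) < tol for p, q in zip(a, b))

R1, R2 = rot_z(0.4), rot_x(1.1)

# Homomorphism property: D(R1 R2) = D(R1) D(R2) for D(R) = R (x) R.
lhs = kron(matmul(R1, R2), matmul(R1, R2))
rhs = matmul(kron(R1, R1), kron(R2, R2))
is_representation = all(close(r, s) for r, s in zip(lhs, rhs))

# Rotating both factors equals acting with D(R) on the product.
u, v = [1.0, 2.0, 3.0], [0.5, -1.0, 2.0]
acts_correctly = close(kron_vec(matvec(R1, u), matvec(R1, v)),
                       matvec(kron(R1, R1), kron_vec(u, v)))

print(len(lhs), is_representation, acts_correctly)  # 9 True True
```

The resulting 9-dimensional representation is not irreducible, which is why a further change of basis is needed before the pieces can be labeled as irreps.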

3. Learning

One thing that was missing from above: what is the learned part? It turns out that in e3nn, the "tensor product" is actually a tensor product followed by a change of basis that decomposes the result into irreps. (In general, the tensor product of irreps is not irreducible.) When we do this decomposition, we end up with many "paths" to the same output irrep. We can take any linear combination of those paths and the result still transforms as that irrep, so equivariance is preserved. The coefficients of these linear combinations are the learned weights.
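As a toy illustration of paths, consider two ways of producing a vector (1o) output: a scalar (0e) times a vector (1o), and the cross product of a pseudovector (1e) with a vector (1o). The sketch below checks that any weighted sum of the two stays equivariant under both rotations and inversion. The function `combine_paths` and the weights `w1`, `w2` are hypothetical; e3nn's own tensor products use Clebsch-Gordan coefficients with their own normalization rather than these hand-written formulas.

```python
import math
import random

def rot_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_x(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def combine_paths(s, v, p, w1, w2):
    """Weighted sum of two paths that both yield a vector (1o)."""
    path1 = [s * c for c in v]   # 0e (x) 1o -> 1o
    path2 = cross(p, v)          # 1e (x) 1o -> 1o
    return [w1 * a + w2 * b for a, b in zip(path1, path2)]

def is_equivariant(w1, w2, R, sign, tol=1e-9):
    """sign = +1 applies the rotation R; sign = -1 applies R followed by inversion."""
    s, v, p = 0.3, [1.0, -2.0, 0.5], [0.2, 0.7, -1.0]
    v_t = [sign * c for c in matvec(R, v)]   # vectors flip under inversion
    p_t = matvec(R, p)                       # pseudovectors do not
    lhs = combine_paths(s, v_t, p_t, w1, w2)  # scalar s is unchanged
    rhs = [sign * c for c in matvec(R, combine_paths(s, v, p, w1, w2))]
    return all(abs(a - b) < tol for a, b in zip(lhs, rhs))

random.seed(0)
R = matmul(rot_z(0.4), rot_x(1.1))
results = [
    is_equivariant(random.uniform(-2, 2), random.uniform(-2, 2), R, sign)
    for sign in (+1, -1)
    for _ in range(5)
]
print(all(results))  # True
```

Because the check passes for arbitrary weights, the weights are free parameters that training can adjust without breaking equivariance.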

4. Sources

e3nn arxiv document