date: April 23, 2021
As shown in the module on GNN, invariant and equivariant functions are crucial for GNNs. For example, the message passing GNN (MGNN) layer is defined by:

$$ h_i^{\ell+1} = f\left(h_i^\ell, \{\{ h_j^\ell \,:\, j \sim i \}\}\right), \tag{1} $$

where $j \sim i$ means that nodes $j$ and $i$ are neighbors, and the function $f$ should not depend on the order of the elements in the multiset $\{\{ h_j^\ell \,:\, j \sim i \}\}$. This layer is applied in parallel to all nodes (with the same function $f$), producing a mapping $F: \mathbb{R}^n \to \mathbb{R}^n$, where $n$ is the number of nodes in the graph (only real-valued hidden states $h_i^\ell \in \mathbb{R}$ are considered for simplicity). It is easy to see that $F$ is an equivariant function, i.e. permuting its input will permute its output.
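To make this concrete, here is a minimal NumPy sketch of an MGNN layer and of the equivariance check (the aggregator $f(a, S) = \tanh(a + \sum S)$ and the random graph are illustrative choices, not part of the definition above):

```python
import numpy as np

rng = np.random.default_rng(0)

def mgnn_layer(adj, h):
    """One layer (1): h_i' = f(h_i, {{h_j : j ~ i}}) with the
    illustrative choice f(a, S) = tanh(a + sum(S)), which does not
    depend on the order of the multiset S."""
    agg = adj @ h                      # sum over neighbors
    return np.tanh(h + agg)

# random undirected graph on n nodes
n = 5
adj = rng.integers(0, 2, size=(n, n))
adj = np.triu(adj, 1)
adj = adj + adj.T

h = rng.random(n)
sigma = rng.permutation(n)             # permutation as an index array

# permuting the graph and the features permutes the output
lhs = mgnn_layer(adj[np.ix_(sigma, sigma)], h[sigma])
rhs = mgnn_layer(adj, h)[sigma]
assert np.allclose(lhs, rhs)
```

Note that the check permutes the adjacency matrix together with the hidden states: $F$ is equivariant for a fixed graph seen up to relabeling of its nodes.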
Another example of invariant and equivariant functions is given by the attention layer, defined for a tensor of row queries $Q \in \mathbb{R}^{t \times k}$, keys $K \in \mathbb{R}^{t' \times k}$ and values $V \in \mathbb{R}^{t' \times k'}$ by

$$ \mathrm{Att}(Q, K, V) = \mathrm{softmax}\left(Q K^\top\right) V, \tag{2} $$

where the softmax is applied row-wise. The queries are obtained from a tensor $X \in \mathbb{R}^{t \times d}$ by $Q = X W_Q$, and the keys and values are obtained from a tensor $X' \in \mathbb{R}^{t' \times d'}$ by $K = X' W_K$ and $V = X' W_V$. We see that when the queries are fixed, the attention layer is invariant in the pair (keys, values):

$$ \mathrm{Att}(Q, \sigma \star K, \sigma \star V) = \mathrm{Att}(Q, K, V), \tag{3} $$

where $\sigma \star K$ permutes the rows of $K$; hence $X' \mapsto \mathrm{Att}(Q, X' W_K, X' W_V)$ is invariant in $X'$. Similarly, when the pair (keys, values) is fixed, the attention layer is equivariant in the queries:

$$ \mathrm{Att}(\sigma \star Q, K, V) = \sigma \star \mathrm{Att}(Q, K, V), \tag{4} $$

hence $X \mapsto \mathrm{Att}(X W_Q, K, V)$ is equivariant in $X$. If $X = X'$, we get the self-attention layer

$$ X \mapsto \mathrm{Att}(X W_Q, X W_K, X W_V), \tag{5} $$

which is equivariant in $X$.
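The identities (3) and (4) are easy to check numerically; here is a small sketch with the row-wise softmax of (2) (shapes and random inputs are arbitrary test values):

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    """Row-wise softmax."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def att(Q, K, V):
    """Attention layer (2): softmax(Q K^T) V."""
    return softmax(Q @ K.T) @ V

t, tp, k, kp = 4, 6, 3, 5
Q, K, V = rng.random((t, k)), rng.random((tp, k)), rng.random((tp, kp))

sig = rng.permutation(tp)   # acts on the rows of K and V
assert np.allclose(att(Q, K[sig], V[sig]), att(Q, K, V))      # invariance (3)

tau = rng.permutation(t)    # acts on the rows of Q
assert np.allclose(att(Q[tau], K, V), att(Q, K, V)[tau])      # equivariance (4)
```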
In this post, we will characterize invariant and equivariant functions following the ideas given in the paper Deep Sets.
We start with some definitions.
For a vector $x = (x_1, \dots, x_n) \in \mathbb{R}^n$ and a permutation $\sigma \in \mathcal{S}_n$, we define

$$ \sigma \star x = \left(x_{\sigma^{-1}(1)}, \dots, x_{\sigma^{-1}(n)}\right). $$

A function $f: \mathbb{R}^n \to \mathbb{R}$ is invariant if for all $x$ and all $\sigma \in \mathcal{S}_n$, we have $f(\sigma \star x) = f(x)$.

A function $f: \mathbb{R}^n \to \mathbb{R}^n$ is equivariant if for all $x$ and all $\sigma \in \mathcal{S}_n$, we have $f(\sigma \star x) = \sigma \star f(x)$.
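In code, the action $\sigma \star x$ with the $\sigma^{-1}$ convention above can be written as follows (`star` is just a helper for illustration):

```python
import numpy as np

def star(sigma, x):
    """Compute sigma * x, i.e. (sigma * x)_i = x_{sigma^{-1}(i)},
    with sigma given as an index array mapping i -> sigma(i)."""
    out = np.empty_like(x)
    out[sigma] = x                  # position sigma(i) receives x_i
    return out

sigma = np.array([2, 0, 1])         # sigma(0) = 2, sigma(1) = 0, sigma(2) = 1
x = np.array([10.0, 20.0, 30.0])
print(star(sigma, x))               # [20. 30. 10.]
```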
We can now state our main result:
invariant case: let $f: [0,1]^n \to \mathbb{R}$ be a continuous function. $f$ is invariant if and only if there are continuous functions $\phi: [0,1] \to \mathbb{R}^n$ and $\rho: \mathbb{R}^n \to \mathbb{R}$ such that

$$ f(x_1, \dots, x_n) = \rho\left(\sum_{i=1}^n \phi(x_i)\right). \tag{6} $$

equivariant case: let $f: [0,1]^n \to \mathbb{R}^n$ be a continuous function. $f$ is equivariant if and only if there are continuous functions $\phi: [0,1] \to \mathbb{R}^n$ and $\rho: [0,1] \times \mathbb{R}^n \to \mathbb{R}$ such that, for all $j$,

$$ f(x_1, \dots, x_n)_j = \rho\left(x_j, \sum_{i=1}^n \phi(x_i)\right). \tag{7} $$
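These characterizations translate directly into the Deep Sets architecture: an invariant network $\rho\left(\sum_i \phi(x_i)\right)$ and its equivariant variant $\rho\left(x_j, \sum_i \phi(x_i)\right)$. Here is a minimal sketch with random (untrained) MLPs standing in for $\phi$ and $\rho$; the layer sizes are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(2)

def mlp(sizes):
    """Random MLP weights, illustrative stand-ins for learned phi and rho."""
    return [(rng.standard_normal((m, p)) / np.sqrt(m), np.zeros(p))
            for m, p in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

n, d = 5, 16                      # d is the embedding dimension (d >= n)
phi = mlp([1, d, d])              # phi: R -> R^d
rho_inv = mlp([d, d, 1])          # rho for the invariant case (6)
rho_eq = mlp([1 + d, d, 1])       # rho for the equivariant case (7)

x = rng.random((n, 1))
s = forward(phi, x).sum(axis=0)   # sum_i phi(x_i)

f_inv = forward(rho_inv, s)                                        # (6)
f_eq = forward(rho_eq, np.concatenate(
    [x, np.broadcast_to(s, (n, d))], axis=1))                      # (7)

sigma = rng.permutation(n)
xs = x[sigma]
ss = forward(phi, xs).sum(axis=0)
assert np.allclose(forward(rho_inv, ss), f_inv)                    # invariance
f_eq_perm = forward(rho_eq, np.concatenate(
    [xs, np.broadcast_to(ss, (n, d))], axis=1))
assert np.allclose(f_eq_perm, f_eq[sigma])                         # equivariance
```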
We give some remarks before providing the proof below. For the sake of simplicity, we consider here a fixed number $n$ of points in the unit interval $[0,1]$. For results with a varying number of points, see On the Limitations of Representing Functions on Sets, and for points in higher dimension $\mathbb{R}^d$ with $d > 1$, see On Universal Equivariant Set Networks and Expressive Power of Invariant and Equivariant Graph Neural Networks.
Our proof will make the mapping $\phi$ explicit, and it will not depend on the function $f$. The mapping $\phi$ can be seen as an embedding of the points of $[0,1]$ into a space of high dimension. Indeed, this embedding space has to be of dimension at least the number $n$ of points in order to ensure universality. This is an important remark: in a learning scenario, the size of the embedding is typically fixed, and this will limit the expressiveness of the algorithm.
Coming back to the GNN layer (1), our result on the invariant case tells us that we can always rewrite it as:

$$ h_i^{\ell+1} = \rho\left(h_i^\ell, \sum_{j \sim i} \phi(h_j^\ell)\right), \tag{8} $$
and the dimension of the embedding $\phi$ needs to be of the same order as the maximum degree in the graph. Note that (8) is not of the form of (7), as the sum inside the function $\rho$ is taken only over neighbors. Indeed, we know that message passing GNNs are not universal (see Expressive Power of Invariant and Equivariant Graph Neural Networks).
As a last remark, note that the original PointNet architecture is of the form $f(x)_j = \rho(x_j)$, which is not universal equivariant. Indeed, it is impossible to approximate the equivariant function $g(x) = \left(\sum_{i=1}^n x_i\right) \mathbf{1}$, as shown below (we denote $\mathbf{1} = (1, \dots, 1) \in \mathbb{R}^n$, $\mathbf{0} = (0, \dots, 0)$ and $\mathbf{e}_1 = (1, 0, \dots, 0)$):

$$ \|f(\mathbf{0}) - g(\mathbf{0})\| = \sqrt{n}\, |\rho(0)|, \qquad \|f(\mathbf{e}_1) - g(\mathbf{e}_1)\| \geq \sqrt{n-1}\, |\rho(0) - 1|, $$
and these two quantities cannot both be small. Hence PointNet is not universal equivariant, but as shown in On Universal Equivariant Set Networks, modifying PointNet by adding the term $\sum_i \phi(x_i)$ inside the function $\rho$ as in (7) makes it universal equivariant. We refer to Are Transformers universal approximators of sequence-to-sequence functions? for similar results about transformers based on self-attention.
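To illustrate the argument numerically: whatever pointwise $\rho$ one picks, the two errors at $\mathbf{0}$ and $\mathbf{e}_1$ cannot both vanish (here $\rho(t) = t/2$ is an arbitrary example):

```python
import numpy as np

n = 4
g = lambda x: np.full(n, x.sum())    # target: g(x) = (sum_i x_i) * (1, ..., 1)
rho = lambda t: t / 2                # arbitrary pointwise rho
f = lambda x: rho(x)                 # PointNet form: f(x)_j = rho(x_j)

zero, e1 = np.zeros(n), np.eye(n)[0]
err0 = np.linalg.norm(f(zero) - g(zero))  # = sqrt(n) * |rho(0)|
err1 = np.linalg.norm(f(e1) - g(e1))      # >= sqrt(n-1) * |rho(0) - 1|
# since |rho(0)| + |rho(0) - 1| >= 1, the total error is at least sqrt(n-1)
print(err0, err1, err0 + err1 >= np.sqrt(n - 1))   # ... True
```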
We first show that the equivariant case is not more difficult than the invariant case. Assume that we proved the invariant case. Consider a permutation $\sigma$ such that $\sigma(1) = 1$, so that $f(\sigma \star x) = \sigma \star f(x)$ gives for the first component:

$$ f\left(x_1, x_{\sigma^{-1}(2)}, \dots, x_{\sigma^{-1}(n)}\right)_1 = f(x)_1. $$

For any $x_1 \in [0,1]$, the mapping $(x_2, \dots, x_n) \mapsto f(x_1, x_2, \dots, x_n)_1$ is invariant. Hence by (6), we have

$$ f(x)_1 = \rho\left(x_1, \sum_{i=2}^n \phi(x_i)\right). $$

Now consider a permutation $\sigma$ such that $\sigma(1) = j$, $\sigma(j) = 1$ and $\sigma(i) = i$ for $i \neq 1, j$; then equivariance gives $f(x)_j = f(\sigma \star x)_1$, so that

$$ f(x)_j = \rho\left(x_j, \sum_{i \neq j} \phi(x_i)\right), $$

hence $f(x)_j = \tilde{\rho}\left(x_j, \sum_{i=1}^n \phi(x_i)\right)$ with $\tilde{\rho}(t, u) = \rho(t, u - \phi(t))$, and (7) follows.
Hence, we only need to prove (6), and we follow the proof given in Deep Sets. We start with a crucial result stating that a set of real points is characterized by the first moments of its empirical measure. Let us see what it means for $n = 2$: we can recover the values of $x_1$ and $x_2$ from the quantities $p_1 = x_1 + x_2$ and $p_2 = x_1^2 + x_2^2$. To see that this is correct, note that

$$ x_1 x_2 = \frac{(x_1 + x_2)^2 - (x_1^2 + x_2^2)}{2} = \frac{p_1^2 - p_2}{2}, $$

so that $x_1 x_2$ is a function of $(p_1, p_2)$. As a result, we have

$$ (X - x_1)(X - x_2) = X^2 - p_1 X + \frac{p_1^2 - p_2}{2}, $$

and clearly $x_1$ and $x_2$ can be recovered as the roots of this polynomial, whose coefficients are functions of $p_1$ and $p_2$. The result below extends this argument to a general $n$:
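For instance, the $n = 2$ computation can be transcribed directly with NumPy:

```python
import numpy as np

x1, x2 = 0.3, 0.8
p1, p2 = x1 + x2, x1**2 + x2**2              # the first two power sums

# roots of X^2 - p1 X + (p1^2 - p2)/2 recover {x1, x2}
roots = np.roots([1.0, -p1, (p1**2 - p2) / 2])
print(np.sort(roots))                        # [0.3 0.8]
```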
Let $\Phi: [0,1]^n_{\leq} \to \mathbb{R}^n$, where $[0,1]^n_{\leq} = \{x \in [0,1]^n : x_1 \leq x_2 \leq \dots \leq x_n\}$, be defined by

$$ \Phi(x) = \left(\sum_{i=1}^n x_i, \sum_{i=1}^n x_i^2, \dots, \sum_{i=1}^n x_i^n\right). $$

$\Phi$ is injective and has a continuous inverse mapping.
From Newton's identities, we have for $k = 1, \dots, n$,

$$ k\, e_k = \sum_{i=1}^k (-1)^{i-1} e_{k-i}\, p_i, $$

where $p_i = \sum_{j=1}^n x_j^i$ are the power sums (so that $\Phi(x) = (p_1, \dots, p_n)$) and the $e_k$'s are the elementary symmetric polynomials, so that we can express the elementary symmetric polynomials from the power sums:

$$ e_1 = p_1, \quad 2 e_2 = e_1 p_1 - p_2, \quad 3 e_3 = e_2 p_1 - e_1 p_2 + p_3, \quad \dots $$

Note that

$$ \prod_{i=1}^n (X - x_i) = X^n - e_1 X^{n-1} + e_2 X^{n-2} - \dots + (-1)^n e_n, $$

and since the $e_k$'s are functions of the $p_i$'s, if $\Phi(x) = \Phi(y)$ then $\prod_{i=1}^n (X - x_i) = \prod_{i=1}^n (X - y_i)$, so that $\{x_1, \dots, x_n\} = \{y_1, \dots, y_n\}$ and $x = y$ (both being sorted in non-decreasing order), showing that $\Phi$ is injective.
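The proof of injectivity is constructive, and a sketch of $\Phi^{-1}$ follows from it: convert the power sums into elementary symmetric polynomials via Newton's identities, then read off the sorted roots (real points assumed; `np.roots` does the factorization):

```python
import numpy as np

def Phi(x):
    """Power sums (p_1, ..., p_n) of the points x."""
    n = len(x)
    return np.array([np.sum(x**k) for k in range(1, n + 1)])

def Phi_inv(p):
    """Recover the sorted points from their power sums, using
    Newton's identities: k e_k = sum_{i=1}^k (-1)^(i-1) e_{k-i} p_i."""
    n = len(p)
    e = np.zeros(n + 1)
    e[0] = 1.0
    for k in range(1, n + 1):
        e[k] = sum((-1) ** (i - 1) * e[k - i] * p[i - 1]
                   for i in range(1, k + 1)) / k
    # prod_i (X - x_i) = X^n - e_1 X^{n-1} + e_2 X^{n-2} - ...
    coeffs = [(-1) ** k * e[k] for k in range(n + 1)]
    return np.sort(np.roots(coeffs).real)

x = np.sort(np.random.default_rng(3).random(5))
assert np.allclose(Phi_inv(Phi(x)), x)
```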
Hence we proved that $\Phi: [0,1]^n_{\leq} \to \Phi\left([0,1]^n_{\leq}\right)$, where $\Phi\left([0,1]^n_{\leq}\right)$ is the image of $\Phi$, is a bijection. We now need to prove that $\Phi^{-1}$ is continuous, and we'll prove it directly. Let $y^k \to y$ in $\Phi\left([0,1]^n_{\leq}\right)$; we need to show that $\Phi^{-1}(y^k) \to \Phi^{-1}(y)$. Now if $\Phi^{-1}(y^k) \not\to \Phi^{-1}(y)$, since $[0,1]^n_{\leq}$ is compact, this means that there exists a convergent subsequence $\Phi^{-1}(y^{k_j}) \to x^*$ with $x^* \neq \Phi^{-1}(y)$. But by continuity of $\Phi$, we have $y^{k_j} = \Phi\left(\Phi^{-1}(y^{k_j})\right) \to \Phi(x^*)$, so that $\Phi(x^*) = y$ and $x^* = \Phi^{-1}(y)$: we get a contradiction and hence proved the continuity of $\Phi^{-1}$, finishing the proof of the proposition.
We are now ready to prove (6). Let $\phi: [0,1] \to \mathbb{R}^n$ be defined by $\phi(t) = (t, t^2, \dots, t^n)$ and $\rho = f \circ \Phi^{-1}$. Note that $\sum_{i=1}^n \phi(x_i) = \Phi(x_{\leq})$ and $\Phi^{-1}\left(\sum_{i=1}^n \phi(x_i)\right) = x_{\leq}$, where $x_{\leq}$ is the vector $x$ with its components sorted in non-decreasing order. Hence as soon as $f$ is invariant, we have $\rho\left(\sum_{i=1}^n \phi(x_i)\right) = f(x_{\leq}) = f(x)$, so that (6) is valid. We only need to extend the function $\rho$ from the domain $\Phi\left([0,1]^n_{\leq}\right)$ to $\mathbb{R}^n$ in a continuous way. This can be done by considering the projection $\pi$ on the compact set $\Phi\left([0,1]^n_{\leq}\right)$ and defining $\rho = f \circ \Phi^{-1} \circ \pi$.
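Putting the pieces together, here is a numerical check of (6), reusing `Phi` and `Phi_inv` from the sketch above (the invariant $f$ is an arbitrary example):

```python
import numpy as np
# assumes Phi and Phi_inv from the previous sketch are in scope

f = lambda x: x.max() + x.prod()         # an example of invariant function
phi = lambda t, n: np.array([t**k for k in range(1, n + 1)])
rho = lambda u: f(Phi_inv(u))            # rho = f o Phi^{-1} on the image of Phi

x = np.random.default_rng(4).random(5)
s = sum(phi(t, len(x)) for t in x)       # sum_i phi(x_i) = Phi(x sorted)
assert np.isclose(rho(s), f(x))          # (6): f(x) = rho(sum_i phi(x_i))
```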