Seems like the easiest way to satisfy that definition would be to:
My troll example is a fully connected network with all zero weights and biases, no skip connections.
This isn't something that you'd reach in regular training, since networks are initialized away from zero to avoid this. But it does exhibit a basic ingredient in controlling the gradient flow.
To look for a true hacker I'd try to reconfigure the way the downstream computation works (by modifying attention weights, saturating relus, or similar) based on some model of the inputs, in a way that pushes around where the gradients go.
This is a relatively clean subproblem that we came upon a few months ago while thinking about gradient hacking. We're throwing it out to the world to see if anyone can make progress.
Problem: Construct a gradient hacker (definition below), or prove that one cannot exist under the given conditions.
Setup: Let x∈Rd be the network inputs, θ∈Rw, be the network trainable parameters, ^y∈R be the network outputs, and f(x,θ)=^y be the network architecture. f(x,θ) must be a feedforward neural network that reads in its own weights as input. Weight sharing is allowed (vertical or horizontal), and some weight sharing is necessary in order to take all w weights as input.
Suppose that we train this network on a dataset D={x(i),y(i)}, with MSE loss L(θ)=1n∑i(f(x(i),θ)−y(i))2, assuming gradient flows: θ′(t)=−∇θL(θ(t)). Given an initial point θ0, the training dynamics are therefore:
θ(t)=θ0−∫t0∇θL(θ(x))dxDefinition: A tuple (f,θg,D) is a gradient hacker if the following conditions hold:
This captures my intuition that a gradient hacker knows where it wants to go (in this case "get to θg"), and then it should decide what it outputs in order to make the gradient true.
Some more ambitious problems (if gradient hackers exist):
This is a bit strict, but we didn't want to list different ways something could be isomorphic to the gradient.