It seems like there must be some decent ways to see how different two classifiers are, but I can only think of unprincipled things.
Two ideas:
Sample a lot of items and use both models to generate two rankings of the items (or log odds, or some other score). Models that give similar scores to lots of examples are probably pretty similar. One problem with this is that, when the problem is too easy, optimizing for dissimilarity under this measure will just train your model to solve the problem in some arbitrary way and then invert the ordering within the classes. (A similar approach with a similar problem is judging model similarity by how similarly they respond to deleting parts of the image.)
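A minimal sketch of this first idea, where `score_model_a` and `score_model_b` are stand-ins for whatever functions return a real-valued score (say, a log odds) per item:

```python
import numpy as np
from scipy.stats import spearmanr

def score_similarity(items, score_model_a, score_model_b):
    """Rank-correlation similarity between two scoring functions on a sample."""
    scores_a = np.array([score_model_a(x) for x in items])
    scores_b = np.array([score_model_b(x) for x in items])
    # Spearman's rho: 1.0 if the models order the items identically,
    # around 0.0 if the orderings are unrelated, -1.0 if reversed.
    rho, _ = spearmanr(scores_a, scores_b)
    return rho
```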
Maybe you could split the models into two parts, which we might hope would be a "feature extractor" part and a "simple classifier" part. (Potentially a reconstruction loss could be added at the split to try to encourage the features to stay feature-y, but maybe it's not too important.) Then you measure how different two models are by training a third classifier that's given access to the features from both models, and seeing by how much it outperforms the originals.
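And a sketch of this second idea, assuming the models have already been split and `features_a`, `features_b` are the (n_items, d) arrays their feature-extractor halves produce on some evaluation set (the names, and the choice of logistic regression as the third classifier, are placeholders):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def joint_feature_gain(features_a, features_b, labels):
    """How much a classifier on both feature sets beats the best single one."""
    make_clf = lambda: LogisticRegression(max_iter=1000)
    acc_a = cross_val_score(make_clf(), features_a, labels).mean()
    acc_b = cross_val_score(make_clf(), features_b, labels).mean()
    joint = np.concatenate([features_a, features_b], axis=1)
    acc_joint = cross_val_score(make_clf(), joint, labels).mean()
    # Small gain: the two models' features are largely redundant (similar models).
    # Large gain: the models rely on genuinely different information.
    return acc_joint - max(acc_a, acc_b)
```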
With thanks to Lee Sharkey and Michael Cohen for the conversations that led to these ideas.
In a previous post, I talked about how we could train classifiers on the same classification problem - a set of lions vs a set of huskies - but have them use different approaches to classify.
What we want is something we can informally call a 'basis' - a collection of classifiers that are as independent of each other as possible, but that you can combine to generate any way of dividing those two image sets. For example, we might have a colour classifier (white vs yellow-brown), a terrain classifier (snow vs dirt), a background plant classifier, various classifiers on the animals themselves, and so on. Then, if we've done our job well, when we find any not-too-complex classifier $C_n$, we can say that it's something like '50% colour, 60% nose shape and −10% plant[1]'.
We shouldn't put too much weight on that analogy, but we do want our classifiers to be independent, each classifier distinct from anything you can construct with all the others.
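For concreteness, here's one crude way to read off those percentages: regress the new classifier's scores onto the basis classifiers' scores over a sample of images. The function names and the use of a plain least-squares fit are just illustrative, not a settled method:

```python
import numpy as np

def decompose_over_basis(basis_scores, target_scores):
    """Express a new classifier as a weighted mix of the basis classifiers.

    basis_scores: (n_images, n_basis) array of the basis classifiers' scores.
    target_scores: (n_images,) array of the new classifier C_n's scores.
    """
    coeffs, *_ = np.linalg.lstsq(basis_scores, target_scores, rcond=None)
    return coeffs  # one (possibly negative) weight per basis classifier
```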
Here are four ways we might achieve this.
Randomised initial seeds
An easy way of getting an ensemble of classifiers is to have a bunch of neural nets (or other classification methods), initialise them with different initial weights, and train them on the same sets. And/or we could train them on different subsets of the lion and husky sets.
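A minimal sketch of this, using scikit-learn's small MLP purely as a stand-in classifier, with `X` as the flattened images and `y` as the husky/lion labels (both assumed to be NumPy arrays):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def seed_and_subset_ensemble(X, y, n_models=5, subset_frac=0.8):
    rng = np.random.default_rng(0)
    models = []
    for seed in range(n_models):
        # Different initial weights (via random_state) and a different
        # random subset of the training data for each ensemble member.
        idx = rng.choice(len(X), size=int(subset_frac * len(X)), replace=False)
        clf = MLPClassifier(hidden_layer_sizes=(64,), random_state=seed, max_iter=500)
        clf.fit(X[idx], y[idx])
        models.append(clf)
    return models
```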
The advantage of this method is that it's simple and easy to do - as long as we can train one classifier, we can train them all. The disadvantage is that we're relying on luck and local minima to do the job for us. In practice, I expect these methods to all converge to "white vs yellow-brown" or similar. Even if there are local minima in the classification, there's no guarantee that we'll find them all, or even any. And there's no guarantee that the local minima are very independent - 99.9% colour and 0.01% nose shape might be a local minimum, but it's barely different from a colour classifier.
So this isn't theoretically sound; but in practice it's easy to implement and play around with, so it might lead to interesting insights.
Distinct internal structure
Another approach would be to insist that the classifiers' internal structures are distinct. For example, we could train two neural net classifiers, $C_1$ with weights $\vec{w}_1$ and $C_2$ with weights $\vec{w}_2$. They could be trained to minimise their individual classification losses and regularisations, while ensuring that $\vec{w}_1$ and $\vec{w}_2$ are distinct; so a term like $-\|\vec{w}_1 - \vec{w}_2\|$ would be added to the loss function.
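A hedged PyTorch sketch of that combined loss (the models, the batch, and the coefficient are all placeholders, and the two models are assumed to share an architecture so their flattened weight vectors have the same length):

```python
import torch
import torch.nn.functional as F

def paired_loss(model_1, model_2, x, y, distance_weight=0.1):
    loss_1 = F.cross_entropy(model_1(x), y)
    loss_2 = F.cross_entropy(model_2(x), y)
    # Flatten each model's parameters into a single weight vector.
    w1 = torch.cat([p.flatten() for p in model_1.parameters()])
    w2 = torch.cat([p.flatten() for p in model_2.parameters()])
    # Subtracting the distance rewards the two weight vectors for moving apart.
    return loss_1 + loss_2 - distance_weight * torch.norm(w1 - w2)
```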
This approach has the advantage of forcing the classifiers to explore a larger space, and is not restricted to finding local minima. But it's still theoretically unsatisfactory, and there's no guarantee that the classifiers will really be distinct: $C_1$ and $C_2$ may still end up as colour classifiers, classifying the same colour in two very different ways.
Distinct relative to another set
In the previous methods, we have defined independence relative to the classifiers themselves, not to their results. But imagine now that we had another unlabelled set of images $U$, consisting of, say, lots of varied animal images.
We can now get a theoretical definition of independence: $C_1$ and $C_2$ are independent if they give similar results on the lion-vs-husky problem, but are distinct on $U$.
We might imagine measuring this difference directly on $U$: then knowing the classification that $C_1$ gives on any element of $U$ tells us nothing about what $C_2$ would give. Or we could use $U$ in a more semi-supervised way: from these images, we might extract features and concepts like background, fur, animal, tree, sky, etc. Then we could require that $C_1$ and $C_2$ classify huskies and lions using only those features; independence being enforced by the requirement that they use different features, as uncorrelated as possible.
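One crude way to operationalise the first of these (the `predict_1`/`predict_2` functions and the particular score are placeholders for whatever measure of statistical independence you prefer):

```python
import numpy as np

def agreement_rate(images, predict_1, predict_2):
    p1 = np.array([predict_1(x) for x in images])
    p2 = np.array([predict_2(x) for x in images])
    return (p1 == p2).mean()

def independence_score(task_images, unlabelled_images, predict_1, predict_2):
    # Reward agreement on the original husky/lion images, and penalise any
    # predictability between the two models on U: for balanced binary outputs,
    # agreement near 50% on U means one model's answer tells you little about
    # the other's.
    task_agreement = agreement_rate(task_images, predict_1, predict_2)
    u_agreement = agreement_rate(unlabelled_images, predict_1, predict_2)
    return task_agreement - abs(u_agreement - 0.5)
```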
This seems a promising area of research.
Distinct in some idealised sense
What if $U$ were the set of all conceivable images? Then, if we applied the previous method, we'd get a "maximal" collection of classifiers, spanning all the possible ways that husky-vs-lion classifiers could differ.
I won't add anything to this section currently, as the idea is clearly intractable as stated, and there's no certainty that there is a tractable version. Still, worth keeping in mind as we develop the other methods.
The −10% meaning that it actually internally classifies the plants the wrong way round, but still separates the sets correctly, because of the strength of its colour and nose shape classifications. ↩︎