Quantifying the Complexity and Learnability of Strategic Classification Problems

Counting Achievable Labelings: Canonical Shattering Coefficients

Verbally defining shattering coefficients seems straightforward at first glance:

Given a hypothesis class H, its nᵗʰ shattering coefficient, denoted Sₙ(H), represents the largest number of labelings achievable by classifiers in H on a sample of n feature vectors.

But what’s a “labeling”? And what makes it “achievable”? Answering those questions will help us lay some groundwork in pursuit of a more formal definition.

Within the context of binary classification, a labeling of a sample of feature vectors is simply any one of the ways we can assign values from the set { -1, 1 } to those vectors. As a very simple example, consider two one-dimensional feature vectors (i.e., points on a number line), x₁ = 1 and x₂ = 2.

A visualization of the 4 possible labelings of the sample x₁ = 1, x₂ = 2. Red points are negatively classified, blue ones are positively classified. Image by the author.

The possible labelings are all the combinations of classification values we can assign to the individual feature vectors independently of one another. We can represent each labeling as a vector, where the first and second coordinates represent the values assigned to x₁ and x₂, respectively. The set of possible labelings is thus { (-1, -1), (-1, 1), (1, -1), (1, 1) }. Note that a sample of size 2 yields 2² = 4 possible labelings; we'll see shortly how this generalizes to samples of arbitrary size.
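Enumerating the labelings by brute force makes the 2ⁿ count concrete. A minimal sketch (the function name `all_labelings` is my own, not from the paper):

```python
from itertools import product

def all_labelings(n):
    """Enumerate every possible labeling of a sample of n feature vectors,
    i.e., every vector in {-1, 1}^n."""
    return list(product([-1, 1], repeat=n))

labelings = all_labelings(2)
print(labelings)            # the 4 labelings listed above
print(len(all_labelings(3)))  # 2^3 = 8
```

Each additional feature vector doubles the count, since its label is chosen independently of the others.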

We say that a labeling is achievable by a hypothesis class H if there exists a classifier h ∈ H that produces that labeling. Continuing with our simple example, suppose we're limited to classifiers of the form x ≥ k, k ∈ ℝ, that is, one-dimensional thresholds such that anything to the right of the threshold is positively classified. The labeling (1, -1) is not achievable by this hypothesis class: x₂ being greater than x₁ implies that any threshold that classifies x₁ positively must do the same for x₂. The set of achievable labelings is therefore { (-1, -1), (-1, 1), (1, 1) }.
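We can verify this computationally by sweeping a few representative thresholds over the sample. A small sketch (the helper name and the choice of candidate thresholds are mine; one threshold per "gap" around the sample points suffices):

```python
def achievable_labelings(sample, thresholds):
    """Collect the distinct labelings that threshold classifiers of the
    form x >= k can produce on the given sample."""
    labelings = set()
    for k in thresholds:
        labelings.add(tuple(1 if x >= k else -1 for x in sample))
    return labelings

sample = [1, 2]
# One candidate threshold below, between, and above the sample points.
ach = achievable_labelings(sample, thresholds=[0, 1.5, 3])
print(sorted(ach))  # (-1, -1), (-1, 1), and (1, 1); (1, -1) never appears
```

Only 3 of the 4 possible labelings show up, matching the argument above.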

Examples of one-dimensional threshold classifiers that can be used to achieve all but one of the possible labelings of the sample x₁ = 1, x₂ = 2. Image by the author.

Having understood the basic terminology, we can begin to develop some notation to formally express elements of the verbal definition with which we began.

We stick with representing labelings as vectors as we did in our simple example, with each coordinate representing the classification value assigned to the corresponding feature vector. There are 2ⁿ possible labelings in total: there are two choices for each feature vector, and we can think of a labeling as a sequence of n such choices, each made independently of the rest. If a hypothesis class H can achieve all possible labelings of a sample 𝒞ₙ, i.e., if the number of achievable labelings of 𝒞ₙ equals 2ⁿ, we say that H shatters 𝒞ₙ.

Finally, using the notation from above, we converge on a more rigorous definition of Sₙ(H):
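One standard way to write this formally, consistent with the terms above (𝒞ₙ = {x₁, …, xₙ} a sample of size n, with the maximum taken over all such samples), is:

```latex
S_n(H) \;=\; \max_{\mathcal{C}_n = \{x_1, \dots, x_n\}} \,
\bigl|\, \{\, (h(x_1), \dots, h(x_n)) : h \in H \,\} \,\bigr|
```

That is, Sₙ(H) counts the distinct labeling vectors achievable on the most favorable sample of size n.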

Consistent with our explanation of shattering, Sₙ(H) = 2ⁿ means that there exists a sample of size n that is shattered by H.

Estimating Hypothesis Class Expressiveness: Canonical VC Dimension

The Vapnik–Chervonenkis (VC) dimension is a way to gauge the expressive power of a hypothesis class. It's based on the concept of shattering we just defined, and it plays a crucial role in helping us determine which hypothesis classes are PAC learnable and which aren't.

Let’s begin by attempting to intuitively define the canonical VC dimension:

Given a hypothesis class H, its VC dimension, denoted VCdim(H), is defined to be the greatest natural number n for which there exists a sample of size n that is shattered by H.

Using Sₙ(H) enables us to express this much more cleanly and succinctly:

VCdim(H) = max{ n ∈ ℕ : Sₙ(H) = 2ⁿ }

However, this definition isn't quite precise. Note that the set of numbers for which the shattering coefficient equals 2ⁿ may be infinite. (Consequently, it is possible that VCdim(H) = ∞.) If that is the case, the set has no well-defined maximum. We address this by taking the supremum instead:

VCdim(H) = sup{ n ∈ ℕ : Sₙ(H) = 2ⁿ }
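Returning to our threshold example, we can check the VC dimension by brute force over a finite grid of thresholds and points. A sketch, assuming my own helper `shatters` and a hand-picked grid (a finite grid can only underestimate what H achieves, but here it suffices to realize every achievable labeling):

```python
from itertools import combinations

def shatters(hypotheses, sample):
    """A finite hypothesis set shatters the sample iff it achieves
    all 2^n labelings of the sample's n points."""
    achieved = {tuple(h(x) for x in sample) for h in hypotheses}
    return len(achieved) == 2 ** len(sample)

# One-dimensional thresholds x >= k for a grid of k's straddling the points.
thresholds = [lambda x, k=k: 1 if x >= k else -1 for k in [0, 0.5, 1.5, 2.5, 4]]
points = [1, 2, 3]

# Every singleton sample is shattered ...
print(all(shatters(thresholds, [p]) for p in points))  # True
# ... but no pair is, since (1, -1) is never achievable on an ordered pair.
print(any(shatters(thresholds, list(pair))
          for pair in combinations(points, 2)))        # False
```

This is consistent with the well-known fact that one-dimensional thresholds have VC dimension 1.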

This rigorous and concise definition is the one we’ll use moving forward.

Adding Preferences to the Mix: Strategic Shattering Coefficients

Generalizing the canonical notions we just went over so that they work in a strategic setup is fairly straightforward. Redefining shattering coefficients in terms of the data point best response we defined in the previous article is practically all we'll need to do.

Given a hypothesis class H, a preference set R, and a cost function c, the nᵗʰ shattering coefficient of Sᴛʀᴀᴄ⟨H, R, c⟩, denoted σₙ(H, R, c), represents the largest number of labelings achievable by classifiers in H on a set of n potentially-manipulated feature vectors, i.e., n data point best responses.

As a reminder, this is how we defined the data point best response:
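In symbols, reconstructed from the utility terms discussed below (realized reward 𝕀(h(z) = 1) ⋅ r minus manipulation cost c; the exact notation in the original figure may differ slightly), the best response picks the feature vector z that maximizes utility:

```latex
\Delta(x, r; h) \;=\; \operatorname*{argmax}_{z} \;
\bigl[\, \mathbb{I}(h(z) = 1) \cdot r \;-\; c(z, x) \,\bigr]
```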

We can tweak the notation we used in our discussion of canonical shattering coefficients to further formalize this:

The main difference is that each x in the sample needs to have a corresponding r. Other than that, substituting the data point best response where we had x in the canonical case works seamlessly.

As a quick sanity check, let's consider what happens if R = { 0 }. The realized reward term 𝕀(h(z) = 1) ⋅ r will be 0 across all the data points. Maximizing utility thus becomes synonymous with minimizing cost. The best way to minimize the cost incurred by a data point is trivial: never manipulating its feature vector.

Δ(x, r; h) ends up always just being x, placing us firmly within the territory of canonical classification. It follows that σₙ(H, { 0 }, c) = Sₙ(H) for all H, c. This is consistent with our observation that the impartial preference class represented by R = { 0 } is equivalent to canonical binary classification.

Expressiveness with Preferences: Strategic VC Dimension (SVC)

Having defined the nᵗʰ strategic shattering coefficient, we can simply swap out the Sₙ(H) in the canonical definition of the VC dimension for σₙ(H, R, c).

SVC(H, R, c) = sup{ n ∈ ℕ : σₙ(H, R, c) = 2ⁿ }

Based on the example we considered above, we find that SVC(H, { 0 }, c) = VCdim(H) for any H, c. Indeed, SVC is to VCdim as the strategic shattering coefficient is to its canonical counterpart: both are elegant generalizations of non-strategic concepts.

From SVC to Strategic PAC Learnability: The Fundamental Theorem of Strategic Learning

We can now use SVC to state the Fundamental Theorem of Strategic Learning, which relates the complexity of a strategic classification problem to its (agnostic) PAC learnability.

A strategic classification instance Sᴛʀᴀᴄ⟨H, R, c⟩ is agnostic PAC learnable if and only if SVC(H, R, c) is finite. The sample complexity for strategic agnostic PAC learning is m(δ, ε) ≤ Cε⁻² ⋅ (SVC(H, R, c) + log(1/δ)), with C being a constant.
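To get a feel for how the bound behaves, we can plug in numbers. A sketch (the function name is mine, and C = 1 is an arbitrary placeholder for the unspecified universal constant, so the absolute values are illustrative only):

```python
import math

def sample_complexity_bound(svc, eps, delta, C=1.0):
    """Upper bound on the agnostic strategic PAC sample complexity:
    m(delta, eps) <= C * eps^-2 * (SVC + log(1/delta)).
    C = 1 is a placeholder; the theorem only guarantees some constant."""
    return math.ceil(C * eps ** -2 * (svc + math.log(1.0 / delta)))

# Halving eps roughly quadruples the bound; the dependence on the
# confidence parameter delta is only logarithmic.
print(sample_complexity_bound(svc=3, eps=0.1, delta=0.05))
print(sample_complexity_bound(svc=3, eps=0.05, delta=0.05))
```

The quadratic dependence on 1/ε dominates, while shrinking δ is comparatively cheap.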

We won't elaborate too much on how this can be proven. Suffice it to say that it boils down to a clever reduction to the (well-documented) Fundamental Theorem of Statistical Learning, which is essentially the non-strategic version of the theorem. If you're mathematically inclined and interested in the nuts and bolts of the proof, you can find it in Appendix B of the paper.

This theorem essentially completes our generalization of classic PAC learning to a strategic classification setting. It shows that the way we defined SVC doesn't just make sense in our heads; it actually works as a generalization of VCdim where it matters most. Armed with the Fundamental Theorem, we're well-equipped to analyze strategic classification problems much as we would any old binary classification problem. In my opinion, being able to determine whether a strategic problem is theoretically learnable or not is pretty incredible!
