Vapnik-Chervonenkis Dimension

Vivek Gupta
Oct 31, 2022 · 4 min read

Let's start with some background:

Bias–variance tradeoff: High-variance methods may represent their training set well but are at risk of overfitting. In contrast, high-bias methods typically produce simpler models that may underfit the data.

Similarly, the big tradeoff is that the larger your hypothesis class, the better the best hypothesis in it can model the underlying true function, but the harder it is to find that best hypothesis.

Hypothesis class: the set of all possible hypotheses that you are searching over, regardless of their form.

For a finite hypothesis class, the number of examples needed to ensure probably approximately correct (PAC) learning grows with the logarithm of the size of the hypothesis space, as sketched below.
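In its usual form (for a consistent learner in the standard realizable PAC setting), that finite-class bound reads:

```latex
m \;\ge\; \frac{1}{\epsilon}\left(\ln|H| \;+\; \ln\frac{1}{\delta}\right)
```

where m is the number of training examples, epsilon the error parameter, and delta the confidence parameter.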

Vladimir Vapnik and Alexey Chervonenkis wrote a paper introducing what is now called the VC dimension, which allows us to estimate the sample complexity even for infinite hypothesis classes.

What is an infinite hypothesis class?

Take two attributes x1 and x2 plotted on the x-y axes, and let the hypothesis be a straight line.

If x1 and x2 are real-valued, the number of possible linear functions is infinite. The same holds for other hypothesis shapes, such as circles or triangles.

The idea of the VC dimension (VCD) is that the size of the hypothesis class is a poor measure of how "complex" or how "expressive" the hypothesis class really is. Even though the hypothesis space is infinite, as long as its VC dimension is finite, we can still work out how many samples we need.

What is shattering of sets?

Example: a set of n points can be labelled (with two classes) in 2^n ways.
If for every labelling there is a function in the hypothesis space that is consistent with it, then we say this set of points is shattered by the hypothesis space. A small check of this idea is sketched below.
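As a minimal illustration (my own sketch, using one-dimensional threshold classifiers, where a finite grid of thresholds stands in for the infinite real-valued class), the code below enumerates all 2^n labellings and asks whether some hypothesis is consistent with each one:

```python
import itertools

def shatters(points, hypotheses):
    """True iff every possible +/- labelling of `points` is realised
    by at least one hypothesis (a boolean function) in `hypotheses`."""
    for labelling in itertools.product([False, True], repeat=len(points)):
        consistent = any(
            all(h(x) == y for x, y in zip(points, labelling))
            for h in hypotheses
        )
        if not consistent:
            return False
    return True

# Hypothesis class: 1-D thresholds h_t(x) = (x >= t), sampled on a grid.
H = [lambda x, t=t / 10: x >= t for t in range(-20, 21)]

print(shatters([0.5], H))        # True: any single point can be shattered
print(shatters([0.3, 0.7], H))   # False: no threshold labels 0.3 as + and 0.7 as -
```

A single point is shattered, but no pair of points is, which is another way of saying that the VC dimension of threshold classifiers is 1.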

The VC dimension of a concept class C is defined as the cardinality of the largest set that can be shattered by C. If arbitrarily large sets can be shattered by C, then the VC dimension is said to be infinite.

Given a collection F of subsets of a set S, we say that a finite subset A of S is shattered by F provided that every subset B of A can be written as the intersection of A with an element of F.

Thus, a set A of three points in general position in the plane S is shattered by the collection F of half-planes, but no set A of four points in the plane is shattered by F (see the sketch below).
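To make the half-plane example concrete, here is a small sketch (again my own illustration) that tests shattering by checking strict linear separability for every labelling, via a linear-programming feasibility problem solved with scipy:

```python
import itertools
from scipy.optimize import linprog

def linearly_separable(points, labels):
    """Feasibility LP: strict separation exists iff some (w, b) satisfies
    y_i * (w . x_i + b) >= 1 for every point, with y_i in {+1, -1}."""
    A_ub, b_ub = [], []
    for (x1, x2), positive in zip(points, labels):
        y = 1 if positive else -1
        A_ub.append([-y * x1, -y * x2, -y])   # encodes -(y)(w.x + b) <= -1
        b_ub.append(-1)
    res = linprog(c=[0, 0, 0], A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * 3, method="highs")
    return res.success

def shattered_by_half_planes(points):
    return all(
        linearly_separable(points, labelling)
        for labelling in itertools.product([False, True], repeat=len(points))
    )

three = [(0, 0), (1, 0), (0, 1)]          # three points in general position
four = [(0, 0), (1, 1), (1, 0), (0, 1)]   # XOR-like configuration
print(shattered_by_half_planes(three))    # True
print(shattered_by_half_planes(four))     # False
```

The three points in general position are shattered, while the XOR-like set of four points is not; in fact no four points in the plane can be shattered by half-planes, so the VC dimension of half-planes in 2-D is 3.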

  • If there exists at least one subset of X of size d that can be shattered, then VCD ≥ d.
  • If no subset of size d can be shattered, then VCD < d.

The larger the subset of X that can be shattered, the more expressive (and less biased) the hypothesis space is.

Now that we know the VCD, the number of examples sufficient for PAC learning is given by the following bound.
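A commonly quoted form of this sufficient-sample-size bound (due to Blumer et al., and presumably the formula the original figure showed) is:

```latex
m \;\ge\; \frac{1}{\epsilon}\left(4\log_2\frac{2}{\delta} \;+\; 8\,\mathrm{VC}(H)\,\log_2\frac{13}{\epsilon}\right)
```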

VCD is a much better indicator of a model's ability to fit arbitrary data than the number of parameters in the model.

  • Compared to the finite-hypothesis bound, which scales with ln|H|: since VC(H) ≤ log₂|H| (a one-line justification is sketched below), the VC-based bound can provide a tighter upper bound on the number of examples needed for PAC learning.
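The inequality has a quick justification: shattering d points requires at least 2^d distinct hypotheses, one for each labelling, so

```latex
2^{\mathrm{VC}(H)} \;\le\; |H|
\quad\Longrightarrow\quad
\mathrm{VC}(H) \;\le\; \log_2 |H|.
```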

Lower bound on sample complexity in terms of the VCD:

Consider:

  • any concept class C with VC(C) ≥ 2,
  • any learner L,
  • any 0 < epsilon < 1/8, and
  • any 0 < delta < 1/100.

Then there exists a distribution D and a target concept in C such that, if L observes fewer examples than the bound below, then with probability at least delta, L outputs a hypothesis with error greater than epsilon, i.e., we do not obtain an approximately correct hypothesis.
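The missing expression is presumably the standard lower bound of Ehrenfeucht et al. (1989):

```latex
\max\!\left[\;\frac{1}{\epsilon}\log\frac{1}{\delta},\;\; \frac{\mathrm{VC}(C)-1}{32\,\epsilon}\;\right]
```

So the VC dimension also lower-bounds the number of examples any PAC learner must see.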

LinkedIn: https://www.linkedin.com/in/vivekg-/

Check out my legal space here: https://easylaw.quora.com
