## clustering.evaluation.GapEvaluation class

**Package:** `clustering.evaluation`

**Superclasses:** `clustering.evaluation.ClusterCriterion`

Gap criterion clustering evaluation object

`clustering.evaluation.GapEvaluation` is an
object consisting of sample data, clustering data, and gap criterion
values used to evaluate the optimal number of clusters. Create a gap
criterion clustering evaluation object using `evalclusters`.

`eva = evalclusters(x,clust,'Gap')` creates a gap criterion clustering evaluation object for the data in `x`, using the clustering algorithm or clustering solutions specified by `clust`.

`eva = evalclusters(x,clust,'Gap',Name,Value)` creates a gap criterion clustering evaluation object using additional options specified by one or more name-value pair arguments.

| Method | Description |
| --- | --- |
| `increaseB` | Increase reference data sets |
| `addK` | Evaluate additional numbers of clusters |
| `compact` | Compact clustering evaluation object |
| `plot` | Plot clustering evaluation object criterion values |

A common graphical approach to cluster evaluation involves plotting an error measurement versus several proposed numbers of clusters, and locating the "elbow" of this plot. The "elbow" occurs at the most dramatic decrease in error measurement. The gap criterion formalizes this approach by estimating the "elbow" location as the number of clusters with the largest gap value. Therefore, under the gap criterion, the optimal number of clusters occurs at the solution with the largest local or global gap value within a tolerance range.
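To make the "largest local or global gap value within a tolerance range" rule concrete, here is a minimal Python sketch. It is not the MATLAB implementation: it assumes the tolerance is one standard error of the gap, following the rule in Tibshirani et al. [1], and the function names `first_max_se` and `global_max_se` are hypothetical. The first picks the smallest *k* whose gap is within one standard error of the next gap (a local maximum); the second picks the smallest *k* whose gap is within one standard error of the largest gap overall.

```python
from typing import Sequence


def first_max_se(ks: Sequence[int], gaps: Sequence[float], ses: Sequence[float]) -> int:
    """Smallest k with Gap(k) >= Gap(k+1) - SE(k+1), i.e. a local maximum.

    Assumption: the tolerance is one standard error, as in Tibshirani et al.
    Falls back to the last k if no gap satisfies the rule.
    """
    for i in range(len(ks) - 1):
        if gaps[i] >= gaps[i + 1] - ses[i + 1]:
            return ks[i]
    return ks[-1]


def global_max_se(ks: Sequence[int], gaps: Sequence[float], ses: Sequence[float]) -> int:
    """Smallest k with Gap(k) >= Gap(k*) - SE(k*), where k* has the largest gap."""
    best = max(range(len(ks)), key=lambda i: gaps[i])
    threshold = gaps[best] - ses[best]
    # The loop always returns, because gaps[best] itself meets the threshold.
    for k, g in zip(ks, gaps):
        if g >= threshold:
            return k
    return ks[best]


ks = [1, 2, 3, 4, 5]
gaps = [0.30, 0.50, 0.45, 0.80, 0.82]  # illustrative values, not real output
ses = [0.04] * 5
print(first_max_se(ks, gaps, ses))   # 2: Gap(2) >= Gap(3) - SE(3)
print(global_max_se(ks, gaps, ses))  # 4: first gap within one SE of Gap(5)
```

With these illustrative values the two rules disagree: the local rule stops at the early bump at *k* = 2, while the global rule looks past it to the region around the overall maximum.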

The gap value, as defined by Tibshirani, Walther, and Hastie [1], is

$$\mathrm{Gap}_n(k) = E_n^{*}\{\log(W_k)\} - \log(W_k),$$

where $n$ is the sample size, $k$ is the number of clusters being evaluated, and $W_k$ is the pooled within-cluster dispersion measurement

$$W_k = \sum_{r=1}^{k} \frac{1}{2n_r} D_r,$$

where $n_r$ is the number of data points in cluster $r$, and $D_r$ is the sum of the pairwise distances for all points in cluster $r$.
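A minimal Python sketch of $W_k$ follows. It assumes $D_r$ is summed over ordered pairs of points, as in Tibshirani et al. [1], so each pair is counted twice, and it uses squared Euclidean distance as an assumed default metric (any distance function can be passed instead). With squared Euclidean distance, $W_k$ equals the pooled within-cluster sum of squares around the cluster means. The helper names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Point = Sequence[float]


def sq_euclidean(a: Point, b: Point) -> float:
    """Squared Euclidean distance (an assumed default; any metric can be passed)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pooled_dispersion(
    points: Sequence[Point],
    labels: Sequence[int],
    dist: Callable[[Point, Point], float] = sq_euclidean,
) -> float:
    """W_k = sum over clusters r of D_r / (2 * n_r).

    D_r sums dist over all ordered pairs (i, j) in cluster r, so each
    unordered pair contributes twice.
    """
    clusters: Dict[int, List[Point]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    w = 0.0
    for members in clusters.values():
        n_r = len(members)
        d_r = sum(dist(a, b) for a in members for b in members)
        w += d_r / (2 * n_r)
    return w


pts = [(0, 0), (0, 2), (10, 0), (10, 4)]
print(pooled_dispersion(pts, [0, 0, 1, 1]))  # 10.0  (k = 2)
print(pooled_dispersion(pts, [0, 0, 0, 0]))  # 111.0 (k = 1)
```

As a check on the $k = 2$ labeling: the cluster means are (0, 1) and (10, 2), and the within-cluster sums of squares 2 + 8 give the same 10.0. The $k = 1$ call shows that $W_k$ is also defined for a single cluster.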

The expected value $E_n^{*}\{\log(W_k)\}$ is determined by Monte Carlo sampling from a reference distribution, and $\log(W_k)$ is computed from the sample data.
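The Monte Carlo step can be sketched in Python as below. Several assumptions may not match what `evalclusters` does: the reference distribution is uniform over the bounding box of the data (one of the reference distributions described by Tibshirani et al. [1]), distances are squared Euclidean, the clustering algorithm is a plain deterministic k-means, and the standard error is $s_k = \mathrm{sd}_k\sqrt{1 + 1/B}$ for $B$ reference sets, following the same paper. The helpers `gap`, `kmeans`, and `pooled_dispersion` are hypothetical.

```python
import math
import random


def sq_euclidean(a, b):
    """Squared Euclidean distance (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pooled_dispersion(points, labels):
    """W_k: sum over clusters of D_r / (2 n_r), with D_r over ordered pairs."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return sum(
        sum(sq_euclidean(a, b) for a in g for b in g) / (2 * len(g))
        for g in groups.values()
    )


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means seeded with k evenly spaced input points.

    Assumes k <= len(points). Deterministic, so results are reproducible.
    """
    step = max(1, len(points) // k)
    centers = [points[i * step] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_euclidean(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster empties
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def gap(points, k, cluster=kmeans, B=20, seed=0):
    """Return (Gap_n(k), s_k) using B uniform reference sets over the bounding box."""
    rng = random.Random(seed)
    log_w = math.log(pooled_dispersion(points, cluster(points, k)))
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    ref_logs = []
    for _ in range(B):
        # One reference data set with the same size and dimension as the sample.
        ref = [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs)) for _p in points]
        ref_logs.append(math.log(pooled_dispersion(ref, cluster(ref, k))))
    mean_ref = sum(ref_logs) / B  # Monte Carlo estimate of E*_n{log(W_k)}
    sd = math.sqrt(sum((v - mean_ref) ** 2 for v in ref_logs) / B)
    return mean_ref - log_w, sd * math.sqrt(1 + 1 / B)


# Two tight 3 x 3 grids far apart: Gap(2) should clearly exceed Gap(1).
a = [(0.3 * x, 0.3 * y) for x in range(3) for y in range(3)]
b = [(10 + 0.3 * x, 10 + 0.3 * y) for x in range(3) for y in range(3)]
for k in (1, 2, 3):
    g, s = gap(a + b, k)
    print(k, round(g, 3), round(s, 3))
```

Each call to `gap` clusters the sample once and each of the `B` reference sets once, which is where the extra cost of this criterion comes from. The sketch fails with a math domain error if some $W_k$ is exactly zero (for example, all points identical), which a fuller implementation would need to handle.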

The gap value is defined even for clustering solutions that contain only one cluster, and can be used with any distance metric. However, the gap criterion is more computationally expensive than other cluster evaluation criteria, because the clustering algorithm must be applied to the reference data for each proposed clustering solution.
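As a rough count of that extra cost, assume each of the $B$ reference data sets is clustered once for each proposed number of clusters $k$ in a set $K$, in addition to clustering the sample data itself, and ignore any replicates the clustering algorithm performs internally. The values of $K$ and $B$ in the example are illustrative only.

$$
N_{\text{runs}} = |K|\,(B + 1),
\qquad \text{e.g. } K = \{1,\dots,6\},\ B = 100 \;\Rightarrow\; N_{\text{runs}} = 6 \times 101 = 606,
$$

compared with $|K| = 6$ runs for a criterion that clusters only the sample data.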

[1] Tibshirani, R., G. Walther, and T. Hastie. "Estimating
the number of clusters in a data set via the gap statistic." *Journal
of the Royal Statistical Society: Series B*. Vol. 63, Part
2, 2001, pp. 411–423.

**See also:** `clustering.evaluation.CalinskiHarabaszEvaluation` | `clustering.evaluation.DaviesBouldinEvaluation` | `clustering.evaluation.SilhouetteEvaluation` | `evalclusters`
