// Licensed to the Apache Software Foundation (ASF) under one or more
// contributor license agreements.  See the NOTICE file distributed with
// this work for additional information regarding copyright ownership.
// The ASF licenses this file to You under the Apache License, Version 2.0
// (the "License"); you may not use this file except in compliance with
// the License.  You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
= Gaussian mixture (GMM)

A Gaussian mixture model is a probabilistic model that assumes all the data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters.

NOTE: You can think of mixture models as a generalization of k-means clustering that incorporates information about the covariance structure of the data as well as the centers of the latent Gaussians.

== Model

This algorithm builds a soft clustering model in which each cluster is a Gaussian distribution with its own mean and covariance matrix. Such a model predicts the cluster of an input vector using the maximum likelihood principle.

The model assigns cluster labels as follows:

[source, java]
----
// Train the model on the cached data
GmmModel mdl = trainer.fit(
    ignite,
    dataCache,
    vectorizer
);

// Predict the cluster label for an input vector
double clusterLabel = mdl.predict(inputVector);
----

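To make the maximum likelihood rule concrete, here is a minimal, self-contained sketch in plain Java. It is not Ignite's implementation: it assumes one-dimensional Gaussians instead of multivariate ones, and it assumes that each component's density is weighted by its mixture weight before the largest value is picked. The class `GmmPredictSketch` and its methods are hypothetical names used only for illustration.

[source, java]
----
// Hypothetical illustration, not part of the Ignite API.
class GmmPredictSketch {
    /** Density of a 1-D Gaussian with the given mean and variance at point x. */
    static double density(double x, double mean, double variance) {
        double diff = x - mean;
        return Math.exp(-diff * diff / (2 * variance)) / Math.sqrt(2 * Math.PI * variance);
    }

    /** Returns the index of the component with the highest weighted likelihood for x. */
    static int predict(double x, double[] weights, double[] means, double[] variances) {
        int best = 0;
        double bestLikelihood = Double.NEGATIVE_INFINITY;

        for (int k = 0; k < weights.length; k++) {
            // Assumption: the likelihood is scaled by the component weight.
            double likelihood = weights[k] * density(x, means[k], variances[k]);

            if (likelihood > bestLikelihood) {
                bestLikelihood = likelihood;
                best = k;
            }
        }

        return best;
    }

    public static void main(String[] args) {
        double[] weights = {0.5, 0.5};
        double[] means = {0.0, 10.0};
        double[] variances = {1.0, 4.0};

        System.out.println(predict(1.0, weights, means, variances)); // prints 0
        System.out.println(predict(8.0, weights, means, variances)); // prints 1
    }
}
----

The point `1.0` lies close to the first component's mean and `8.0` lies close to the second's, so the sketch returns component indices 0 and 1 respectively.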

== Trainer

GMM is an unsupervised learning algorithm. The `GmmTrainer` class implements the expectation-maximization (EM) algorithm for fitting mixture-of-Gaussian models. It can compute the Bayesian Information Criterion to assess the number of clusters in the data.

Currently, Ignite ML supports the following parameters for the GMM clustering algorithm:

* `maxCountOfClusters` - the maximum possible number of clusters
* `maxCountOfIterations` - the iteration limit, one of the two stopping criteria (the other is `epsilon`)
* `epsilon` - the convergence delta (the difference between the old and new centroid values)
* `countOfComponents` - the number of components
* `maxLikelihoodDivergence` - the maximum divergence between the highest likelihood of a vector in the dataset and the likelihood of the other vectors, used to identify anomalies
* `minElementsForNewCluster` - the minimum number of anomalies (in terms of `maxLikelihoodDivergence`) required to create a new cluster
* `minClusterProbability` - the minimum cluster probability

[source, java]
----
// Set up the trainer
GmmTrainer trainer = new GmmTrainer(COUNT_OF_COMPONENTS);

// Build the model
GmmModel mdl = trainer
    .withMaxCountIterations(MAX_COUNT_ITERATIONS)
    .withMaxCountOfClusters(MAX_AMOUNT_OF_CLUSTERS)
    .fit(ignite, dataCache, vectorizer);
----
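The two stopping criteria listed above, the iteration limit and `epsilon`, can be illustrated with a minimal EM sketch in plain Java. This is a simplified stand-in, not Ignite's trainer: it assumes one-dimensional data, a fixed number of components, and a small variance floor, and it does not create or remove clusters. The class `EmSketch` and the method `fitMeans` are hypothetical names.

[source, java]
----
import java.util.Arrays;

// Hypothetical illustration, not part of the Ignite API.
class EmSketch {
    /** Density of a 1-D Gaussian with the given mean and variance at point x. */
    static double density(double x, double mean, double variance) {
        double diff = x - mean;
        return Math.exp(-diff * diff / (2 * variance)) / Math.sqrt(2 * Math.PI * variance);
    }

    /** Runs EM on 1-D data and returns the fitted component means. */
    static double[] fitMeans(double[] data, double[] initialMeans, int maxIterations, double epsilon) {
        int k = initialMeans.length;
        double[] means = initialMeans.clone();
        double[] variances = new double[k];
        double[] weights = new double[k];
        Arrays.fill(variances, 1.0);
        Arrays.fill(weights, 1.0 / k);

        // Stopping criterion 1: the iteration limit.
        for (int iter = 0; iter < maxIterations; iter++) {
            // E-step: responsibility of each component for each point.
            double[][] resp = new double[data.length][k];
            for (int i = 0; i < data.length; i++) {
                double total = 0.0;
                for (int j = 0; j < k; j++) {
                    resp[i][j] = weights[j] * density(data[i], means[j], variances[j]);
                    total += resp[i][j];
                }
                for (int j = 0; j < k; j++)
                    resp[i][j] /= total;
            }

            // M-step: re-estimate weight, mean, and variance of each component.
            double maxShift = 0.0;
            for (int j = 0; j < k; j++) {
                double respSum = 0.0;
                double weightedSum = 0.0;
                for (int i = 0; i < data.length; i++) {
                    respSum += resp[i][j];
                    weightedSum += resp[i][j] * data[i];
                }
                double newMean = weightedSum / respSum;

                double weightedSq = 0.0;
                for (int i = 0; i < data.length; i++) {
                    double d = data[i] - newMean;
                    weightedSq += resp[i][j] * d * d;
                }

                // Assumption: floor the variance so a tight cluster cannot collapse to zero.
                variances[j] = Math.max(weightedSq / respSum, 1e-6);
                weights[j] = respSum / data.length;
                maxShift = Math.max(maxShift, Math.abs(newMean - means[j]));
                means[j] = newMean;
            }

            // Stopping criterion 2: the means moved by less than epsilon.
            if (maxShift < epsilon)
                break;
        }

        return means;
    }

    public static void main(String[] args) {
        double[] data = {0.9, 1.0, 1.1, 4.9, 5.0, 5.1};
        double[] means = fitMeans(data, new double[] {0.0, 6.0}, 100, 1e-9);

        System.out.println(Arrays.toString(means));
    }
}
----

With two well-separated groups of points around 1.0 and 5.0, the fitted means should land close to those values after a few iterations, well before the iteration limit is reached. The sketch is not numerically robust: a production implementation would also guard against components that receive no responsibility.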

== Example

To see how GMM clustering can be used in practice, try this https://github.com/apache/ignite/blob/master/examples/src/main/java/org/apache/ignite/examples/ml/clustering/GmmClusterizationExample.java[example], which is available on GitHub and delivered with every Apache Ignite distribution.