// Licensed to the Apache Software Foundation (ASF) under one or more
// contributor license agreements. See the NOTICE file distributed with
// this work for additional information regarding copyright ownership.
// The ASF licenses this file to You under the Apache License, Version 2.0
// (the "License"); you may not use this file except in compliance with
// the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
= Random Forest

== Random Forest in Apache Ignite

Random forest is an ensemble learning method for solving classification and regression problems. Random forest training builds a composition (ensemble) of models of one type and aggregates the answers of the individual models into a single prediction. Each model is trained on a part of the training dataset, defined according to the bagging and feature subspace methods. More information about these concepts can be found here: https://en.wikipedia.org/wiki/Random_forest, https://en.wikipedia.org/wiki/Bootstrap_aggregating and https://en.wikipedia.org/wiki/Random_subspace_method.
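
The two sampling ideas mentioned above can be sketched in plain Java. This is an illustrative sketch only, not the Ignite implementation; the class and method names are invented for illustration:

```java
import java.util.Random;

// Illustrative sketch (not Ignite API): how bagging and the random subspace
// method select the rows and features used to train one tree of a forest.
public class BaggingSketch {
    // Bagging: draw a bootstrap sample of 'sampleSize' row indices,
    // chosen uniformly with replacement from a dataset of 'datasetSize' rows.
    static int[] bootstrapSample(int datasetSize, int sampleSize, long seed) {
        Random rnd = new Random(seed);
        int[] rows = new int[sampleSize];
        for (int i = 0; i < sampleSize; i++)
            rows[i] = rnd.nextInt(datasetSize);
        return rows;
    }

    // Random subspace method: choose 'featuresPerTree' distinct feature
    // indices out of 'totalFeatures' via a partial Fisher-Yates shuffle.
    static int[] featureSubspace(int totalFeatures, int featuresPerTree, long seed) {
        Random rnd = new Random(seed);
        int[] all = new int[totalFeatures];
        for (int i = 0; i < totalFeatures; i++)
            all[i] = i;
        for (int i = 0; i < featuresPerTree; i++) {
            int j = i + rnd.nextInt(totalFeatures - i);
            int tmp = all[i]; all[i] = all[j]; all[j] = tmp;
        }
        int[] chosen = new int[featuresPerTree];
        System.arraycopy(all, 0, chosen, 0, featuresPerTree);
        return chosen;
    }
}
```

Because rows are drawn with replacement, a bootstrap sample may contain duplicates, while the feature subspace for a tree always consists of distinct features.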

There are several implementations of aggregation algorithms in Apache Ignite ML:

* `MeanValuePredictionsAggregator` - computes the answer of a random forest as the mean value of the predictions from all models in the given composition. It is often used for regression tasks.
* `OnMajorityPredictionsAggregator` - takes the mode of the predictions from all models in the given composition. This is useful for classification tasks. NOTE: This aggregator supports multi-class classification tasks.
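
The two aggregation rules can be sketched in plain Java. This is a minimal sketch, not the Ignite classes themselves; the class and method names are invented for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch (not the Ignite aggregator classes): the two
// aggregation rules applied to the answers of the individual models.
public class AggregationSketch {
    // Mean of all model predictions (regression,
    // cf. MeanValuePredictionsAggregator).
    static double mean(double[] predictions) {
        double sum = 0;
        for (double p : predictions)
            sum += p;
        return sum / predictions.length;
    }

    // Mode (most frequent value) of all model predictions (classification,
    // cf. OnMajorityPredictionsAggregator).
    static double majority(double[] predictions) {
        Map<Double, Integer> votes = new HashMap<>();
        for (double p : predictions)
            votes.merge(p, 1, Integer::sum);
        double best = predictions[0];
        for (Map.Entry<Double, Integer> e : votes.entrySet())
            if (e.getValue() > votes.get(best))
                best = e.getKey();
        return best;
    }
}
```

For example, `mean` of the predictions `{1.0, 2.0, 3.0}` is `2.0`, while `majority` of the class labels `{0.0, 1.0, 1.0, 2.0, 1.0}` is `1.0`.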


== Model

The random forest algorithm is implemented in Ignite ML as a special case of model composition with specific aggregators for different problems: `MeanValuePredictionsAggregator` for regression and `OnMajorityPredictionsAggregator` for classification.

Here is an example of model usage:


[source, java]
----
ModelsComposition randomForest = ….

double prediction = randomForest.apply(featuresVector);
----


== Trainer

The random forest training algorithm is implemented by the `RandomForestRegressionTrainer` and `RandomForestClassifierTrainer` trainers, which accept the following parameters:

`meta` - the feature metadata, a list of feature type descriptions, each consisting of:

* `featureId` - the index in the features vector.
* `isCategoricalFeature` - a flag that is true if the feature is categorical.
* `featureName` - the feature name.

This meta-information is important for the random forest training algorithm because the algorithm builds feature histograms, and a categorical feature must be represented in the histograms by all of its values. The other trainer parameters are:

* `featuresCountSelectionStrgy` - sets the strategy that defines the number of random features used to learn one tree. There are several strategies: SQRT, LOG2, ALL and ONE_THIRD, implemented in the `FeaturesCountSelectionStrategies` class.
* `maxDepth` - sets the maximum tree depth.
* `minImpurityDelta` - a node in a decision tree is split into two nodes if the impurity of the resulting nodes is lower than the impurity of the unsplit node by at least this value.
* `subSampleSize` - a value in the [0; MAX_DOUBLE] interval that defines the size of the sample drawn by uniform sampling with replacement, relative to the dataset size.
* `seed` - the seed used by the random generators.
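
The feature count selection strategies can be sketched as simple functions of the total feature count. This is an assumption-laden sketch: the exact rounding used by Ignite's `FeaturesCountSelectionStrategies` may differ, and the method names below are invented for illustration:

```java
// Illustrative sketch (not the Ignite implementation): approximately what
// each feature count selection strategy computes from the total number of
// features. Exact rounding in Ignite may differ.
public class FeatureCountSketch {
    // SQRT: square root of the feature count.
    static int sqrtStrategy(int totalFeatures) {
        return (int) Math.sqrt(totalFeatures);
    }

    // LOG2: base-2 logarithm of the feature count.
    static int log2Strategy(int totalFeatures) {
        return (int) (Math.log(totalFeatures) / Math.log(2));
    }

    // ONE_THIRD: a third of the features, at least one.
    static int oneThirdStrategy(int totalFeatures) {
        return Math.max(1, totalFeatures / 3);
    }

    // ALL: every feature is considered for each tree.
    static int allStrategy(int totalFeatures) {
        return totalFeatures;
    }
}
```

Using fewer features per tree (SQRT, LOG2, ONE_THIRD) decorrelates the trees at the cost of each individual tree seeing less information; ALL disables the random subspace method entirely.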

Random forest training may be used as follows:


[source, java]
----
RandomForestClassifierTrainer trainer = new RandomForestClassifierTrainer(featuresMeta)
    .withCountOfTrees(101)
    .withFeaturesCountSelectionStrgy(FeaturesCountSelectionStrategies.ONE_THIRD)
    .withMaxDepth(4)
    .withMinImpurityDelta(0.)
    .withSubSampleSize(0.3)
    .withSeed(0);

ModelsComposition rfModel = trainer.fit(
    ignite,
    dataCache,
    vectorizer
);
----



== Example

To see how the Random Forest Classifier can be used in practice, try this https://github.com/apache/ignite/blob/master/examples/src/main/java/org/apache/ignite/examples/ml/tree/randomforest/RandomForestClassificationExample.java[example], which is available on GitHub and delivered with every Apache Ignite distribution. The example uses the Wine recognition dataset; a description of this dataset and the data itself are available from the https://archive.ics.uci.edu/ml/datasets/wine[UCI Machine Learning Repository].