// Licensed to the Apache Software Foundation (ASF) under one or more
// contributor license agreements.  See the NOTICE file distributed with
// this work for additional information regarding copyright ownership.
// The ASF licenses this file to You under the Apache License, Version 2.0
// (the "License"); you may not use this file except in compliance with
// the License.  You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
= Machine Learning

== Overview

Apache Ignite Machine Learning (ML) is a set of simple, scalable, and efficient tools for building predictive machine learning models without costly data transfers.

The rationale for adding machine learning and deep learning (DL) to Apache Ignite is simple. Today's data scientists have to deal with two major factors that keep ML from mainstream adoption:

* First, models are trained and deployed (after training is over) in different systems. Data scientists have to wait for ETL or some other data transfer process to move the data into a system such as Apache Mahout or Apache Spark for training. Then they have to wait for that process to complete and redeploy the models in a production environment. Moving terabytes of data from one system to another can take hours. Moreover, training usually happens on an outdated data set.

* The second factor is scalability. The data sets that ML and DL algorithms have to process keep growing and often no longer fit within a single server. This pushes data scientists to come up with sophisticated solutions or turn to distributed computing platforms such as Apache Spark and TensorFlow. However, those platforms mostly solve only one part of the puzzle, model training, leaving developers to work out how to deploy the models in production later.

image::images/machine_learning.png[]

=== Zero ETL and Massive Scalability

Ignite Machine Learning relies on Ignite's memory-centric storage, which brings massive scalability to ML and DL tasks and eliminates the wait imposed by ETL between different systems. For instance, users can run ML/DL training and inference directly on data stored across memory and disk in an Ignite cluster. In addition, Ignite provides a host of ML and DL algorithms that are optimized for Ignite's collocated distributed processing. These implementations deliver in-memory speed and horizontal scalability when running in place against massive data sets or incrementally against incoming data streams, without requiring the data to be moved into another store. By eliminating data movement and long processing wait times, Ignite Machine Learning enables continuous learning that can improve decisions based on the latest data as it arrives in real time.

=== Fault Tolerance and Continuous Learning

Apache Ignite Machine Learning is tolerant of node failures. If nodes fail during the learning process, all recovery procedures are transparent to the user, the learning process is not interrupted, and results arrive in about the same time as when all nodes are running normally. For more information, see link:machine-learning/partition-based-dataset[Partition Based Dataset].

== Algorithms and Applicability

=== Classification

Identifying which category a new observation belongs to, based on a training set.

*Applicability:* spam detection, image recognition, credit scoring, disease identification.

*Algorithms:* link:machine-learning/binary-classification/logistic-regression[Logistic Regression], link:machine-learning/binary-classification/linear-svm[Linear SVM (Support Vector Machine)], link:machine-learning/binary-classification/knn-classification[k-NN Classification], link:machine-learning/binary-classification/naive-bayes[Naive Bayes], link:machine-learning/binary-classification/decision-trees[Decision Trees], link:machine-learning/binary-classification/random-forest[Random Forest], link:machine-learning/binary-classification/multilayer-perceptron[Multilayer Perceptron], link:machine-learning/ensemble-methods/gradient-boosting[Gradient Boosting], link:machine-learning/binary-classification/ann[ANN (Approximate Nearest Neighbor)].

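To make the idea concrete, the following is a minimal plain-Java sketch of how a trained binary classifier such as logistic regression assigns a category to a new observation. It does not use the Ignite ML API; the class name `LogisticPredictionSketch`, the weights, and the bias are hypothetical values chosen for illustration, whereas a real model learns them from a training set.

```java
final class LogisticPredictionSketch {
    /** Returns the estimated probability that the observation belongs to class 1. */
    static double probability(double[] weights, double bias, double[] features) {
        if (weights.length != features.length)
            throw new IllegalArgumentException("weights and features must have the same length");

        double z = bias;
        for (int i = 0; i < weights.length; i++)
            z += weights[i] * features[i];

        // The sigmoid squashes the linear score into the (0, 1) range.
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Assigns class 1 when the probability reaches the 0.5 threshold, otherwise class 0. */
    static int classify(double[] weights, double bias, double[] features) {
        return probability(weights, bias, features) >= 0.5 ? 1 : 0;
    }

    public static void main(String[] args) {
        // Hypothetical, hand-picked parameters (assumption, not learned from data).
        double[] weights = {2.0, -1.0};
        double bias = -0.5;

        System.out.println(classify(weights, bias, new double[] {1.0, 0.0}));
        System.out.println(classify(weights, bias, new double[] {0.0, 1.0}));
    }
}
```

The 0.5 threshold is a common default rather than a requirement; it can be moved when false positives and false negatives have different costs, as is often the case in spam detection.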

=== Regression

Modeling the relationship between a scalar dependent variable (y) and one or more explanatory variables or independent variables (x).

*Applicability:* drug response, stock prices, supermarket revenue.

*Algorithms:* Linear Regression, Decision Trees Regression, k-NN Regression.

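As an illustration only, the sketch below fits a line `y = a + b * x` to a handful of points with ordinary least squares, in plain Java. It is not the Ignite ML implementation; the class name `SimpleLinearRegressionSketch` and the sample data are assumptions made for this example.

```java
final class SimpleLinearRegressionSketch {
    /** Returns {intercept, slope} of the least-squares line through the points (x[i], y[i]). */
    static double[] fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2)
            throw new IllegalArgumentException("need at least two (x, y) pairs of equal length");

        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double covXY = 0.0; // sum of (x - meanX) * (y - meanY)
        double varX = 0.0;  // sum of (x - meanX)^2
        for (int i = 0; i < n; i++) {
            covXY += (x[i] - meanX) * (y[i] - meanY);
            varX += (x[i] - meanX) * (x[i] - meanX);
        }
        if (varX == 0.0)
            throw new IllegalArgumentException("all x values are identical");

        double slope = covXY / varX;
        double intercept = meanY - slope * meanX;
        return new double[] {intercept, slope};
    }

    public static void main(String[] args) {
        // Made-up data (assumption) that lies exactly on y = 1 + 2x.
        double[] x = {1.0, 2.0, 3.0, 4.0};
        double[] y = {3.0, 5.0, 7.0, 9.0};

        double[] line = fit(x, y);
        System.out.println("intercept=" + line[0] + ", slope=" + line[1]);
    }
}
```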
=== Clustering

Grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters).

*Applicability:* customer segmentation, grouping experiment outcomes, grouping of shopping items.

*Algorithms:* K-Means Clustering, Gaussian Mixture (GMM).

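The core step of K-Means, assigning each object to its nearest cluster center, can be sketched in plain Java as follows. This is an illustration under simplifying assumptions (squared Euclidean distance, centroids already known), not the Ignite ML implementation; the class name `NearestCentroidSketch` and the sample values are hypothetical.

```java
final class NearestCentroidSketch {
    /** Returns the index of the centroid closest to the point (squared Euclidean distance). */
    static int nearestCentroid(double[] point, double[][] centroids) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;

        for (int c = 0; c < centroids.length; c++) {
            double distance = 0.0;
            for (int j = 0; j < point.length; j++) {
                double diff = point[j] - centroids[c][j];
                distance += diff * diff;
            }
            if (distance < bestDistance) {
                bestDistance = distance;
                best = c;
            }
        }
        return best; // -1 only when no centroids were supplied
    }

    public static void main(String[] args) {
        // Two hypothetical cluster centers in a 2-D feature space.
        double[][] centroids = {{0.0, 0.0}, {10.0, 10.0}};

        System.out.println(nearestCentroid(new double[] {1.0, 2.0}, centroids));
        System.out.println(nearestCentroid(new double[] {9.0, 8.0}, centroids));
    }
}
```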
=== Recommendation

Building a recommendation system, which is a subclass of information filtering systems that seeks to predict the "rating" or "preference" a user would give to an item.

*Applicability:* playlist generators for video and music services, product recommenders for services.

*Algorithms:* link:machine-learning/recommendation-systems[Matrix Factorization].

=== Preprocessing

Feature extraction and normalization.

*Applicability:* transforming input data, such as text, for use with machine learning algorithms; extracting the features to fit on; normalizing input data.

*Algorithms:* Apache Ignite ML supports custom preprocessing using partition-based dataset capabilities and provides default link:machine-learning/preprocessing[preprocessors] such as the normalization preprocessor, one-hot encoder, and min-max scaler.

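As a rough illustration of what a min-max scaler does, the plain-Java sketch below rescales one feature column to the `[0, 1]` range. It is not the Ignite preprocessor; the class name `MinMaxScalingSketch`, the handling of constant columns, and the sample data are assumptions made for this example.

```java
import java.util.Arrays;

final class MinMaxScalingSketch {
    /** Rescales each value to [0, 1] using the minimum and maximum of the column. */
    static double[] scale(double[] column) {
        double min = Arrays.stream(column).min().orElse(0.0);
        double max = Arrays.stream(column).max().orElse(0.0);
        double range = max - min;

        double[] scaled = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            // A constant column has zero range; map it to 0 to avoid dividing by zero.
            scaled[i] = range == 0.0 ? 0.0 : (column[i] - min) / range;
        }
        return scaled;
    }

    public static void main(String[] args) {
        // Hypothetical feature column, for example customer ages.
        double[] ages = {20.0, 30.0, 40.0};
        System.out.println(Arrays.toString(scale(ages)));
    }
}
```

Scaling features to a common range keeps a feature with large raw values from dominating distance-based algorithms such as k-NN or K-Means.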

== Getting Started

The fastest way to get started with Machine Learning is to build and run the existing examples, study their output, and keep coding. The ML examples are located in the https://github.com/apache/ignite/tree/master/examples/src/main/java/org/apache/ignite/examples/ml[examples] folder of every Apache Ignite distribution.

Follow the steps below to try out the examples:

. Download Apache Ignite version 2.8 or later.
. Open the `examples` project in an IDE, such as IntelliJ IDEA or Eclipse.
. Go to the `src/main/java/org/apache/ignite/examples/ml` folder in the IDE and run an ML example.

The examples do not require any special configuration. All ML examples launch, run, and stop without user intervention and print meaningful output to the console. Additionally, the Tracer API example launches a web browser and generates HTML output.

=== Get it With Maven

Add the Maven dependency below to your project to include the ML functionality provided by Ignite:

[source, xml]
----
<dependency>
    <groupId>org.apache.ignite</groupId>
    <artifactId>ignite-ml</artifactId>
    <version>${ignite.version}</version>
</dependency>
----

Replace `${ignite.version}` with an actual Ignite version.

=== Build From Sources

The latest Apache Ignite Machine Learning jar is always uploaded to the Maven repository. If you need to deploy the jar in a custom environment, you can either download it from Maven or build it from scratch. To build the Machine Learning component from sources:

1. Download the latest Apache Ignite source release.
2. Clean the local Maven repository (this ensures that older Maven builds don't affect the build).
3. Build and install Apache Ignite from the project's root directory:
+
[source, shell]
----
./mvnw clean install -DskipTests -Dmaven.javadoc.skip=true
----

4. Locate the Machine Learning jar in your local Maven repository under the path `{user_dir}/.m2/repository/org/apache/ignite/ignite-ml/{ignite-version}/ignite-ml-{ignite-version}.jar`.

5. If you want to build ML or DL examples from sources, execute the following commands:
+
[source, shell]
----
cd examples
mvn clean package -DskipTests
----

If needed, refer to `DEVNOTES.txt` in the project's root folder and the `README` files in the `ignite-ml` component for more details.