// Licensed to the Apache Software Foundation (ASF) under one or more
// contributor license agreements. See the NOTICE file distributed with
// this work for additional information regarding copyright ownership.
// The ASF licenses this file to You under the Apache License, Version 2.0
// (the "License"); you may not use this file except in compliance with
// the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
= Data Partitioning

Data partitioning is a method of subdividing large sets of data into smaller chunks and distributing them between all server nodes in a balanced manner.

Partitioning is controlled by the _affinity function_.
The affinity function determines the mapping between keys and partitions.
Each partition is identified by a number from a limited set (0 to 1023 by default).
The set of partitions is distributed between the server nodes available at the moment.
Thus, each key is mapped to a specific node and is stored on that node.
When the number of nodes in the cluster changes, the partitions are redistributed between the new set of nodes through a process called <<rebalancing,rebalancing>>.

image:images/partitioning.png[Data Partitioning]
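The key-to-partition step can be illustrated with a minimal, self-contained sketch. This is not Ignite's actual implementation; the class and method names here are hypothetical, and a real affinity function uses a more careful hash:

```java
// Hypothetical sketch: map a key's hash code to one of 1024 partitions,
// the default partition count. Not Ignite's actual implementation.
public class PartitionMapper {
    static final int PARTITIONS = 1024; // default number of partitions

    // Mask the hash to keep the result non-negative, then take the modulo.
    public static int partitionFor(Object key) {
        return (key.hashCode() & Integer.MAX_VALUE) % PARTITIONS;
    }

    public static void main(String[] args) {
        System.out.println("key 'A' -> partition " + partitionFor("A"));
        System.out.println("key 'B' -> partition " + partitionFor("B"));
    }
}
```

Because the mapping is a pure function of the key, every node computes the same partition for the same key without any coordination.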

The affinity function takes the _affinity key_ as an argument.
The affinity key can be any field of the objects stored in the cache (any column in the SQL table).
If the affinity key is not specified, the default key is used (in the case of SQL tables, it is the PRIMARY KEY column).

NOTE: For more information on data partitioning, see the advanced link:https://www.gridgain.com/resources/blog/data-distribution-in-apache-ignite[deep-dive on data partitioning,window=_blank] in Ignite.

Partitioning boosts performance by distributing both read and write operations.
Moreover, you can design your data model in such a way that the data entries that are used together are stored together (i.e., in one partition).
When you request that data, only a small number of partitions are scanned.
This technique is called link:data-modeling/affinity-collocation[Affinity Colocation].

Partitioning helps achieve linear scalability at virtually any scale.
You can add more nodes to the cluster as your data set grows, and Ignite makes sure that the data is distributed "equally" among all the nodes.

== Affinity Function

The affinity function controls how data entries are mapped onto partitions and partitions onto nodes.
The default affinity function implements the _rendezvous hashing_ algorithm.
It allows a bit of discrepancy in the partition-to-node mapping (i.e., some nodes may be responsible for a slightly larger number of partitions than others).
However, the affinity function guarantees that when the topology changes, partitions are migrated only to the new node that joined or from the node that left.
No data exchange happens between the remaining nodes.


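To illustrate the minimal-movement property, here is a toy rendezvous (highest-random-weight) assignment. The class name and mixing function are hypothetical stand-ins; Ignite's real `RendezvousAffinityFunction` differs in its details:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

// Toy rendezvous (highest-random-weight) hashing: every node gets a
// pseudo-random weight for each partition, and the partition is owned by
// the node with the highest weight. Removing a node reassigns only the
// partitions that node owned; all other assignments stay the same.
public class Rendezvous {

    // Stand-in mixing function; a real implementation uses a proper hash.
    static long weight(int partition, String nodeId) {
        long h = Objects.hash(partition, nodeId);
        h ^= (h << 21);
        h ^= (h >>> 35);
        h ^= (h << 4);
        return h;
    }

    /** Returns the node that owns the given partition. */
    public static String ownerOf(int partition, List<String> nodes) {
        return nodes.stream()
                .max(Comparator.comparingLong((String n) -> weight(partition, n)))
                .orElseThrow(IllegalArgumentException::new);
    }
}
```

Because each partition's owner is the per-partition maximum over the node set, removing one node leaves every other partition's maximum, and therefore its owner, unchanged.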
////////////////////////////////////////////////////////////////////////////////

TODO:
You can implement a custom affinity function if you want to control the way data is distributed in the cluster.
See the link:advanced-topics/affinity-function[Affinity Function] section in Advanced Topics.

////////////////////////////////////////////////////////////////////////////////

== Partitioned/Replicated Mode

When creating a cache or SQL table, you can choose between the partitioned and replicated modes of cache operation. The two modes are designed for different use-case scenarios and provide different performance and availability benefits.
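For example, the cache mode is set on the cache configuration. The fragment below assumes the standard `ignite-core` API on the classpath and hypothetical cache names; it is a configuration sketch, not a complete program:

```java
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.configuration.CacheConfiguration;

// Configuration fragment: choose the cache mode (and, for PARTITIONED
// caches, the number of backup copies) when defining the cache.
CacheConfiguration<Integer, String> partitionedCfg =
        new CacheConfiguration<Integer, String>("myPartitionedCache")
                .setCacheMode(CacheMode.PARTITIONED)
                .setBackups(1); // one backup copy per partition

CacheConfiguration<Integer, String> replicatedCfg =
        new CacheConfiguration<Integer, String>("myReplicatedCache")
                .setCacheMode(CacheMode.REPLICATED);
```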


=== PARTITIONED

In this mode, all partitions are split equally between all server nodes.
This mode is the most scalable distributed cache mode and allows you to store as much data as fits in the total memory (RAM and disk) available across all nodes.
Essentially, the more nodes you have, the more data you can store.

Unlike the `REPLICATED` mode, where updates are expensive because every node in the cluster needs to be updated, updates in the `PARTITIONED` mode are cheap because only the primary node (and optionally one or more backup nodes) needs to be updated for every key. However, reads are somewhat more expensive because only certain nodes have the data cached.

NOTE: Partitioned caches are ideal when data sets are large and updates are frequent.

The picture below illustrates the distribution of a partitioned cache. Essentially, key A is assigned to the node running in JVM1, key B to the node running in JVM3, and so on.

image:images/partitioned_cache.png[]


=== REPLICATED

In the `REPLICATED` mode, all the data (every partition) is replicated to every node in the cluster. This cache mode provides the utmost availability of data, as it is available on every node. However, every data update must be propagated to all other nodes, which can impact performance and scalability.

NOTE: Replicated caches are ideal when data sets are small and updates are infrequent.

In the diagram below, the node running in JVM1 is the primary node for key A, but it also stores backup copies of all the other keys (B, C, D).

image:images/replicated_cache.png[]

Because the same data is stored on all cluster nodes, the size of a replicated cache is limited by the amount of memory (RAM and disk) available on a single node. This mode is ideal for scenarios where cache reads are a lot more frequent than cache writes and data sets are small. If your system performs cache lookups over 80% of the time, you should consider using the `REPLICATED` cache mode.

== Backup Partitions [[backup-partitions]]

//tag::partition-backups[]

By default, Ignite keeps a single copy of each partition (a single copy of the entire data set). In this case, if one or multiple nodes become unavailable, you lose access to the partitions stored on those nodes. To avoid this, you can configure Ignite to maintain backup copies of each partition.

IMPORTANT: By default, backups are disabled.

Backup copies are configured per cache (table).
If you configure 2 backup copies, the cluster maintains 3 copies of each partition.
One of the partitions is called the _primary_ partition, and the other two are called _backup_ partitions.
By extension, the node that hosts the primary partition is called the _primary node for the keys stored in the partition_, and the nodes with backup partitions are called _backup nodes_.
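The primary/backup split can be illustrated with a self-contained sketch using hypothetical names (this is not Ignite's implementation): rank all nodes per partition and take the top `1 + backups` of them.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Illustrative sketch: with N backups configured, each partition is held
// by 1 + N distinct nodes. Ranking all nodes per partition and taking the
// top entries yields one primary owner followed by N backup owners.
public class BackupAssignment {

    // Stand-in per-(partition, node) weight; real systems use a proper hash.
    static long weight(int partition, String nodeId) {
        long h = Objects.hash(partition, nodeId);
        h ^= (h << 21);
        h ^= (h >>> 35);
        return h;
    }

    /** The first element is the primary node; the rest are backup nodes. */
    public static List<String> ownersOf(int partition, List<String> nodes, int backups) {
        return nodes.stream()
                .sorted(Comparator.comparingLong(
                        (String n) -> weight(partition, n)).reversed())
                .limit(1L + backups)
                .collect(Collectors.toList());
    }
}
```

Since the ranking is deterministic, every node computes the same primary and backup owners for a partition, so a surviving backup can take over without a central registry of assignments.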

When the node that holds the primary partition for some key leaves the cluster, Ignite triggers the partition map exchange (PME) process.
If backups are configured, PME promotes one of the backup partitions for the key to primary.

Backup partitions increase the availability of your data and, in some cases, the speed of read operations, since Ignite reads data from backup partitions when they are available on the local node. This is the default behavior; it can be disabled (see link:configuring-caches/configuration-overview#readfrombackup[Cache Configuration] for details). However, backup partitions also increase memory consumption or the size of the persistent storage (if it is enabled).

//end::partition-backups[]

////////////////////////////////////////////////////////////////////////////////
*TODO: draw a diagram that illustrates backup partition distribution*
////////////////////////////////////////////////////////////////////////////////

NOTE: Backup partitions can be configured in PARTITIONED mode only. Refer to the link:configuring-caches/configuring-backups[Configuring Partition Backups] section.

== Partition Map Exchange

Partition map exchange (PME) is the process of sharing information about partition distribution (the partition map) across the cluster so that every node knows where to look for specific keys. PME is required whenever the partition distribution for any cache changes, for example, when new nodes are added to the topology or old nodes leave the topology (whether on user request or due to a failure).

Examples of events that trigger PME include (but are not limited to):

* A node joins or leaves the topology.
* A cache starts or stops.
* An index is created.

When one of the PME-triggering events occurs, the cluster waits for all ongoing transactions to complete and then starts PME. During PME, new transactions are postponed until the process finishes.

The PME process works in the following way: the coordinator node requests information about the partitions each node owns, and every node sends this information to the coordinator. Once the coordinator has received the messages from all nodes, it merges them into a full partition map and sends it to all nodes. PME is considered complete when the coordinator has received confirmation messages from all nodes.
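The coordinator's merge step can be sketched as follows. This is a deliberately simplified model with hypothetical names; the real protocol also carries topology versions and per-partition states:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Simplified model of the coordinator's merge step during PME: combine
// each node's report of the partitions it owns into one full partition
// map (partition id -> owning node), which is then broadcast to all nodes.
public class PartitionMapExchange {

    public static Map<Integer, String> mergeReports(Map<String, Set<Integer>> reports) {
        Map<Integer, String> fullMap = new HashMap<>();
        reports.forEach((nodeId, partitions) ->
                partitions.forEach(p -> fullMap.put(p, nodeId)));
        return fullMap;
    }
}
```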

== Rebalancing
////
*TODO: the information from the https://apacheignite.readme.io/docs/rebalancing[data rebalancing] page can be useful*
////

Refer to the link:data-rebalancing[Data Rebalancing] page for details.

== Partition Loss Policy

Throughout the cluster's lifecycle, some data partitions may be lost because both the primary node and all backup nodes that held copies of those partitions have failed. Such a situation leads to partial data loss and needs to be addressed according to your use case. For detailed information about partition loss policies, see link:configuring-caches/partition-loss-policy[Partition Loss Policy].