// Licensed to the Apache Software Foundation (ASF) under one or more
// contributor license agreements. See the NOTICE file distributed with
// this work for additional information regarding copyright ownership.
// The ASF licenses this file to You under the Apache License, Version 2.0
// (the "License"); you may not use this file except in compliance with
// the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
= Control Script


Ignite provides a command line script — `control.sh|bat` — that you can use to monitor and control your clusters.
The script is located under the `/bin/` folder of the installation directory.

The control script syntax is as follows:

[tabs]
--
tab:Unix[]
[source, shell]
----
control.sh <connection parameters> <command> <arguments>
----
tab:Windows[]
[source, shell]
----
control.bat <connection parameters> <command> <arguments>
----
--
== Connecting to Cluster

When executed without connection parameters, the control script tries to connect to a node running on localhost (`localhost:11211`).
If you want to connect to a node that is running on a remote machine, specify the connection parameters.

[cols="2,3,1",opts="header"]
|===
|Parameter | Description | Default Value

| --host HOST_OR_IP | The host name or IP address of the node. | `localhost`

| --port PORT | The port to connect to. | `11211`

| --user USER | The user name. |
| --password PASSWORD | The user password. |
| --ping-interval PING_INTERVAL | The ping interval. | 5000
| --ping-timeout PING_TIMEOUT | Ping response timeout. | 30000
| --ssl-protocol PROTOCOL1, PROTOCOL2... | A list of SSL protocols to try when connecting to the cluster. link:https://docs.oracle.com/javase/8/docs/technotes/guides/security/SunProviders.html#SunJSSE_Protocols[Supported protocols,window=_blank]. | `TLS`
| --ssl-cipher-suites CIPHER1,CIPHER2... | A list of SSL ciphers. link:https://docs.oracle.com/javase/8/docs/technotes/guides/security/SunProviders.html#SupportedCipherSuites[Supported ciphers,window=_blank]. |
| --ssl-key-algorithm ALG | The SSL key algorithm. | `SunX509`
| --keystore-type KEYSTORE_TYPE | The keystore type. | `JKS`
| --keystore KEYSTORE_PATH | The path to the keystore. Specify a keystore to enable SSL for the control script. |
| --keystore-password KEYSTORE_PWD | The password to the keystore. |
| --truststore-type TRUSTSTORE_TYPE | The type of the truststore. | `JKS`
| --truststore TRUSTSTORE_PATH | The path to the truststore. |
| --truststore-password TRUSTSTORE_PWD | The password to the truststore. |
| --ssl-factory SSL_FACTORY_PATH | The path to a Spring XML file with a custom SSL factory. |
|===
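
For example, connection parameters can be combined with any command. The host, user, and password below are illustrative placeholders, not values from your cluster:

[source, shell]
----
# Connect to a remote node with credentials and a longer ping timeout,
# then print the cluster state. Replace the host and credentials with your own.
control.sh --host 192.168.1.10 --port 11211 --user ignite --password mypassword --ping-timeout 60000 --state
----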


== Activation, Deactivation and Topology Management

You can use the control script to activate or deactivate your cluster, and manage the link:clustering/baseline-topology[Baseline Topology].


=== Getting Cluster State

The cluster can be in one of three states: active, read-only, or inactive. Refer to link:monitoring-metrics/cluster-states[Cluster States] for details.

To get the state of the cluster, run the following command:

[tabs]
--
tab:Unix[]
[source,shell,subs="verbatim,quotes"]
----
control.sh --state
----
tab:Windows[]
[source,shell,subs="verbatim,quotes"]
----
control.bat --state
----
--

=== Activating Cluster

Activation sets the baseline topology of the cluster to the set of nodes available at the moment of activation.
Activation is required only if you use link:persistence/native-persistence[native persistence].

To activate the cluster, run the following command:

[tabs]
--
tab:Unix[]
[source,shell,subs="verbatim,quotes"]
----
control.sh --set-state ACTIVE
----
tab:Windows[]
[source,shell,subs="verbatim,quotes"]
----
control.bat --set-state ACTIVE
----
--

=== Deactivating Cluster

include::includes/note-on-deactivation.adoc[]

To deactivate the cluster, run the following command:

[tabs]
--
tab:Unix[]
[source,shell,subs="verbatim,quotes"]
----
control.sh --set-state INACTIVE [--yes]
----
tab:Windows[]
[source,shell,subs="verbatim,quotes"]
----
control.bat --set-state INACTIVE [--yes]
----
--


=== Getting Nodes Registered in Baseline Topology

To get the list of nodes registered in the baseline topology, run the following command:

[tabs]
--
tab:Unix[]
[source,shell,subs="verbatim,quotes"]
----
control.sh --baseline
----

tab:Windows[]
[source,shell,subs="verbatim,quotes"]
----
control.bat --baseline
----
--

The output contains the current topology version, the list of consistent IDs of the nodes included in the baseline topology, and the list of nodes that joined the cluster but were not added to the baseline topology.

[source, shell]
----
Command [BASELINE] started
Arguments: --baseline
--------------------------------------------------------------------------------
Cluster state: active
Current topology version: 3

Current topology version: 3 (Coordinator: ConsistentId=dd3d3959-4fd6-4dc2-8199-bee213b34ff1, Order=1)

Baseline nodes:
ConsistentId=7d79a1b5-cbbd-4ab5-9665-e8af0454f178, State=ONLINE, Order=2
ConsistentId=dd3d3959-4fd6-4dc2-8199-bee213b34ff1, State=ONLINE, Order=1
--------------------------------------------------------------------------------
Number of baseline nodes: 2

Other nodes:
ConsistentId=30e16660-49f8-4225-9122-c1b684723e97, Order=3
Number of other nodes: 1
Command [BASELINE] finished with code: 0
Control utility has completed execution at: 2019-12-24T16:53:08.392865
Execution time: 333 ms
----
=== Adding Nodes to Baseline Topology

To add a node to the baseline topology, run the command given below.
After the node is added, the link:data-rebalancing[rebalancing process] starts.

[tabs]
--
tab:Unix[]
[source,shell,subs="verbatim,quotes"]
----
control.sh --baseline add _consistentId1,consistentId2,..._ [--yes]
----
tab:Windows[]
[source,shell,subs="verbatim,quotes"]
----
control.bat --baseline add _consistentId1,consistentId2,..._ [--yes]
----
--

=== Removing Nodes from Baseline Topology

To remove a node from the baseline topology, use the `remove` command.
Only offline nodes can be removed from the baseline topology: shut down the node first and then use the `remove` command.
This operation starts the rebalancing process, which re-distributes the data across the nodes that remain in the baseline topology.

[tabs]
--
tab:Unix[]
[source,shell,subs="verbatim,quotes"]
----
control.sh --baseline remove _consistentId1,consistentId2,..._ [--yes]
----
tab:Windows[]
[source,shell,subs="verbatim,quotes"]
----
control.bat --baseline remove _consistentId1,consistentId2,..._ [--yes]
----
--

=== Setting Baseline Topology

You can set the baseline topology either by providing a list of nodes (consistent IDs) or by specifying the desired version of the baseline topology.

To set a list of nodes as the baseline topology, use the following command:

[tabs]
--
tab:Unix[]

[source,shell,subs="verbatim,quotes"]
----
control.sh --baseline set _consistentId1,consistentId2,..._ [--yes]
----
tab:Windows[]
[source,shell,subs="verbatim,quotes"]
----
control.bat --baseline set _consistentId1,consistentId2,..._ [--yes]
----
--


To restore a specific version of the baseline topology, use the following command:

[tabs]
--
tab:Unix[]
[source,shell,subs="verbatim,quotes"]
----
control.sh --baseline version _topologyVersion_ [--yes]
----
tab:Windows[]
[source,shell,subs="verbatim,quotes"]
----
control.bat --baseline version _topologyVersion_ [--yes]
----
--

=== Enabling Baseline Topology Autoadjustment

link:clustering/baseline-topology#baseline-topology-autoadjustment[Baseline topology autoadjustment] refers to automatic update of the baseline topology after the topology has been stable for a specific amount of time.

For in-memory clusters, autoadjustment is enabled by default with the timeout set to 0. This means that the baseline topology changes immediately after server nodes join or leave the cluster.
For clusters with persistence, automatic baseline adjustment is disabled by default.
To enable it, use the following command:

[tabs]
--
tab:Unix[]

[source, shell]
----
control.sh --baseline auto_adjust enable timeout 30000
----
tab:Windows[]
[source, shell]
----
control.bat --baseline auto_adjust enable timeout 30000
----
--

The timeout is set in milliseconds. The baseline is set to the current topology when the given number of milliseconds has passed after the last JOIN/LEFT/FAIL event.
Every new JOIN/LEFT/FAIL event restarts the timeout countdown.

To disable baseline autoadjustment, use the following command:

[tabs]
--
tab:Unix[]

[source, shell]
----
control.sh --baseline auto_adjust disable
----
tab:Windows[]
[source, shell]
----
control.bat --baseline auto_adjust disable
----
--


== Transaction Management

The control script allows you to get information about the transactions being executed in the cluster.
You can also cancel specific transactions.

The following command returns a list of transactions that satisfy a given filter (or all transactions if no filter is provided):

[tabs]
--
tab:Unix[]
[source,shell,subs="verbatim,quotes"]
----
control.sh --tx _<transaction filter>_ --info
----
tab:Windows[]
[source,shell,subs="verbatim,quotes"]
----
control.bat --tx _<transaction filter>_ --info
----
--

The transaction filter parameters are listed in the following table.

[cols="2,5",opts="header"]
|===
|Parameter | Description
| --xid _XID_ | Transaction ID.
| --min-duration _SECONDS_ | Minimum number of seconds a transaction has been executing.
| --min-size _SIZE_ | Minimum size of a transaction.
| --label _LABEL_ | User label for transactions. You can use a regular expression.
| --servers\|--clients | Limit the scope of the operation to either server or client nodes.
| --nodes _nodeId1,nodeId2..._ | The list of consistent IDs of the nodes you want to get transactions from.
| --limit _NUMBER_ | Limit the number of transactions to the given value.
| --order DURATION\|SIZE\|START_TIME | The parameter that is used to sort the output.
|===
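
The filter parameters can be combined. For example, the following sketch (thresholds are illustrative) lists the ten longest-running transactions that have been executing for at least 60 seconds on server nodes:

[source, shell]
----
# List up to 10 server-side transactions running for 60+ seconds, longest first.
control.sh --tx --min-duration 60 --servers --limit 10 --order DURATION --info
----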


To cancel transactions, use the following command:

[tabs]
--
tab:Unix[]
[source,shell,subs="verbatim,quotes"]
----
control.sh --tx _<transaction filter>_ --kill
----
tab:Windows[]
[source,shell,subs="verbatim,quotes"]
----
control.bat --tx _<transaction filter>_ --kill
----
--

For example, to cancel the transactions that have been running for more than 100 seconds, execute the following command:

[source, shell]
----
control.sh --tx --min-duration 100 --kill
----

== Contention Detection in Transactions

The `contention` command detects when multiple transactions contend for a lock on the same key. The command is useful if you have long-running or hanging transactions.

Example:

[tabs]
--
tab:Shell[]
[source,shell]
----
# Reports all keys that are a point of contention for at least 5 transactions on all cluster nodes.
control.sh|bat --cache contention 5

# Reports all keys that are a point of contention for at least 5 transactions on a specific server node.
control.sh|bat --cache contention 5 f2ea-5f56-11e8-9c2d-fa7a
----
--

If there are any highly contended keys, the utility dumps extensive information including the keys, transactions, and nodes where the contention took place.

Example:

[source,text]
----
[node=TcpDiscoveryNode [id=d9620450-eefa-4ab6-a821-644098f00001, addrs=[127.0.0.1], sockAddrs=[/127.0.0.1:47501], discPort=47501, order=2, intOrder=2, lastExchangeTime=1527169443913, loc=false, ver=2.5.0#20180518-sha1:02c9b2de, isClient=false]]

// No contention on node d9620450-eefa-4ab6-a821-644098f00001.

[node=TcpDiscoveryNode [id=03379796-df31-4dbd-80e5-09cef5000000, addrs=[127.0.0.1], sockAddrs=[/127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1527169443913, loc=false, ver=2.5.0#20180518-sha1:02c9b2de, isClient=false]]
TxEntry [cacheId=1544803905, key=KeyCacheObjectImpl [part=0, val=0, hasValBytes=false], queue=10, op=CREATE, val=UserCacheObjectImpl [val=0, hasValBytes=false], tx=GridNearTxLocal[xid=e9754629361-00000000-0843-9f61-0000-000000000001, xidVersion=GridCacheVersion [topVer=138649441, order=1527169439646, nodeOrder=1], concurrency=PESSIMISTIC, isolation=REPEATABLE_READ, state=ACTIVE, invalidate=false, rollbackOnly=false, nodeId=03379796-df31-4dbd-80e5-09cef5000000, timeout=0, duration=1247], other=[]]
TxEntry [cacheId=1544803905, key=KeyCacheObjectImpl [part=0, val=0, hasValBytes=false], queue=10, op=READ, val=null, tx=GridNearTxLocal[xid=8a754629361-00000000-0843-9f61-0000-000000000001, xidVersion=GridCacheVersion [topVer=138649441, order=1527169439656, nodeOrder=1], concurrency=PESSIMISTIC, isolation=REPEATABLE_READ, state=ACTIVE, invalidate=false, rollbackOnly=false, nodeId=03379796-df31-4dbd-80e5-09cef5000000, timeout=0, duration=1175], other=[]]
TxEntry [cacheId=1544803905, key=KeyCacheObjectImpl [part=0, val=0, hasValBytes=false], queue=10, op=READ, val=null, tx=GridNearTxLocal[xid=6a754629361-00000000-0843-9f61-0000-000000000001, xidVersion=GridCacheVersion [topVer=138649441, order=1527169439654, nodeOrder=1], concurrency=PESSIMISTIC, isolation=REPEATABLE_READ, state=ACTIVE, invalidate=false, rollbackOnly=false, nodeId=03379796-df31-4dbd-80e5-09cef5000000, timeout=0, duration=1175], other=[]]
TxEntry [cacheId=1544803905, key=KeyCacheObjectImpl [part=0, val=0, hasValBytes=false], queue=10, op=READ, val=null, tx=GridNearTxLocal[xid=7a754629361-00000000-0843-9f61-0000-000000000001, xidVersion=GridCacheVersion [topVer=138649441, order=1527169439655, nodeOrder=1], concurrency=PESSIMISTIC, isolation=REPEATABLE_READ, state=ACTIVE, invalidate=false, rollbackOnly=false, nodeId=03379796-df31-4dbd-80e5-09cef5000000, timeout=0, duration=1175], other=[]]
TxEntry [cacheId=1544803905, key=KeyCacheObjectImpl [part=0, val=0, hasValBytes=false], queue=10, op=READ, val=null, tx=GridNearTxLocal[xid=4a754629361-00000000-0843-9f61-0000-000000000001, xidVersion=GridCacheVersion [topVer=138649441, order=1527169439652, nodeOrder=1], concurrency=PESSIMISTIC, isolation=REPEATABLE_READ, state=ACTIVE, invalidate=false, rollbackOnly=false, nodeId=03379796-df31-4dbd-80e5-09cef5000000, timeout=0, duration=1175], other=[]]

// Node 03379796-df31-4dbd-80e5-09cef5000000 is the point of contention for key KeyCacheObjectImpl [part=0, val=0, hasValBytes=false].
----


== Monitoring Cache State

One of the most important commands that `control.sh|bat` provides is `--cache list`, which is used for cache monitoring. The command returns a list of deployed caches with their affinity and distribution parameters, and their distribution within cache groups. There is also a command for viewing existing atomic sequences.

[source,shell]
----
# Displays a list of all caches.
control.sh|bat --cache list .

# Displays a list of caches whose names start with "account-".
control.sh|bat --cache list account-.*

# Displays info about cache group distribution for all caches.
control.sh|bat --cache list . --groups

# Displays info about cache group distribution for the caches whose names start with "account-".
control.sh|bat --cache list account-.* --groups

# Displays info about all atomic sequences.
control.sh|bat --cache list . --seq

# Displays info about the atomic sequences whose names start with "counter-".
control.sh|bat --cache list counter-.* --seq
----

== Creating Caches

You can use the control script to create specific caches.

NOTE: The `ignite-spring` module must be enabled.

[source, shell]
----
control.sh|bat --cache create --springXmlConfig springXmlFilePath
----

Parameters:

[cols="1,3",opts="header"]
|===
| Parameter | Description
| `--springXmlConfig springXmlConfigPath` | Path to the Spring XML configuration that contains
`org.apache.ignite.configuration.CacheConfiguration` beans to create caches from.
|===

Examples:
[source, shell]
----
# Create caches from the /ignite/config/userCaches.xml configuration.
control.sh|bat --cache create --springXmlConfig /ignite/config/userCaches.xml
----

== Destroying Caches

You can use the control script to destroy specific caches.

[source, shell]
----
control.sh|bat --cache destroy --caches cache1,...,cacheN|--destroy-all-caches
----

Parameters:

[cols="1,3",opts="header"]
|===
| Parameter | Description
| `--caches cache1,...,cacheN`| Specifies a comma-separated list of cache names to be destroyed.
| `--destroy-all-caches` | Permanently destroys all user-created caches.
|===

Examples:
[source, shell]
----
# Destroy cache1 and cache2.
control.sh|bat --cache destroy --caches cache1,cache2

# Destroy all user-created caches.
control.sh|bat --cache destroy --destroy-all-caches
----

== Clearing Caches

You can use the control script to clear specific caches.

[source, shell]
----
control.sh|bat --cache clear --caches cache1,...,cacheN
----

Parameters:

[cols="1,3",opts="header"]
|===
| Parameter | Description
| `--caches cache1,...,cacheN`| Specifies a comma-separated list of cache names to be cleared.
|===

Examples:
[source, shell]
----
# Clear cache1 and cache2.
control.sh|bat --cache clear --caches cache1,cache2
----

== Scanning Caches

You can use the control script to scan cache entries.

[source, shell]
----
control.sh|bat --cache scan cacheName [--limit N]
----

For each entry, four columns are displayed: the key class, the string representation of the key, the value class, and the string representation of the value.

Parameters:

[cols="1,3",opts="header"]
|===
| Parameter | Description
| `--limit N`| Limits the number of entries to scan (default: 1000).
|===

Examples:
[source, shell]
----
# Query no more than 10 entries from cache "cache1".
control.sh|bat --cache scan cache1 --limit 10
----

== Resetting Lost Partitions

You can use the control script to reset lost partitions for specific caches.
Refer to link:configuring-caches/partition-loss-policy[Partition Loss Policy] for details.

[source, shell]
----
control.sh --cache reset_lost_partitions cacheName1,cacheName2,...
----


== Consistency Check and Repair Commands

`control.sh|bat` includes a set of consistency check commands that enable you to verify and repair internal data consistency.

First, the commands can be used for debugging and troubleshooting purposes, especially if you're in active development.

Second, if there is a suspicion that a query (such as a SQL query) returns an incomplete or wrong result set, the commands can verify whether the data is inconsistent.

Third, the consistency check commands can be used as part of regular cluster health monitoring.

Finally, consistency can be repaired if necessary.

Let's review these usage scenarios in more detail.

=== Verifying Partition Checksums

Even if update counters and sizes are equal on the primary and backup nodes, the primary and backup might diverge due to some critical failure.

The `idle_verify` command compares the hashes of primary partitions with those of the backup partitions and reports any differences.
The differences might be the result of node failure or incorrect shutdown during an update operation.

If any inconsistency is detected, we recommend removing the incorrect partitions or repairing the consistency using the `--consistency repair` command.

[source,shell]
----
# Verifies that the primary and backup partitions of all caches contain the same data.
control.sh|bat --cache idle_verify

# Verifies that the primary and backup partitions of the specified caches contain the same data.
control.sh|bat --cache idle_verify cache1,cache2,cache3
----

If any partitions diverge, a list of conflict partitions is printed out, as follows:

[source,text]
----
idle_verify check has finished, found 2 conflict partitions.

Conflict partition: PartitionKey [grpId=1544803905, grpName=default, partId=5]
Partition instances: [PartitionHashRecord [isPrimary=true, partHash=97506054, updateCntr=3, size=3, consistentId=bltTest1], PartitionHashRecord [isPrimary=false, partHash=65957380, updateCntr=3, size=2, consistentId=bltTest0]]
Conflict partition: PartitionKey [grpId=1544803905, grpName=default, partId=6]
Partition instances: [PartitionHashRecord [isPrimary=true, partHash=97595430, updateCntr=3, size=3, consistentId=bltTest1], PartitionHashRecord [isPrimary=false, partHash=66016964, updateCntr=3, size=2, consistentId=bltTest0]]
----

[WARNING]
====
[discrete]
=== Cluster Should Be Idle During `idle_verify` Check
All updates should be stopped while `idle_verify` calculates hashes; otherwise, it may report false positives. It is impossible to compare large datasets in a distributed system while they are constantly being updated.
====

=== Repairing Cache Consistency
[WARNING]
====
[discrete]
=== Experimental Feature
The command may not work on some special/unique configurations or may even cause a cluster/node failure.

Before running the command in production, you MUST verify it in a test environment using data and configuration similar to production.
====

[WARNING]
====
[discrete]
=== Additional Configuration Required
The command uses the special link:https://ignite.apache.org/releases/{version}/javadoc/org/apache/ignite/events/EventType.html#EVT_CONSISTENCY_VIOLATION[Consistency Violation Event] to detect consistency violations. This event must be enabled before you execute the command.

Refer to the link:events/listening-to-events#enabling-events[Enabling Events] section for details.
====

The `idle_verify` command outputs the names of the inconsistent cache groups and the list of their inconsistent partitions.
The `repair` command checks cache consistency and, when possible, repairs it using the link:key-value-api/read-repair[Read Repair] approach for every inconsistent partition found by `idle_verify`.

The command uses special strategies to perform the repair. It is recommended to use the `CHECK_ONLY` strategy first to list the inconsistent values and then choose the proper link:key-value-api/read-repair#strategies[Repair Strategy].

By default, the inconsistent entries that are found are listed in the application log. You can change the location by configuring a dedicated logging path for the `org.apache.ignite.internal.visor.consistency` package.

By default, the inconsistent entries are listed as is, but they can be masked via the link:logging#suppressing-sensitive-information[IGNITE_TO_STRING_INCLUDE_SENSITIVE] system property.

[tabs]
--
tab:Unix[]
[source,shell]
----
control.sh --enable-experimental --consistency repair --cache cache-name --partitions partitions --strategy strategy
----
tab:Windows[]
[source,shell]
----
control.bat --enable-experimental --consistency repair --cache cache-name --partitions partitions --strategy strategy
----
--
Parameters:

[cols="1,3",opts="header"]
|===
| Parameter | Description
| `cache-name`| The name of the cache (or cache group) to be checked/repaired.
| `partitions`| A comma-separated list of the cache's partitions to be checked/repaired.
| `strategy`| See link:key-value-api/read-repair#strategies[Repair Strategies].
|===

Optional parameters:

[cols="1,3",opts="header"]
|===
| Parameter | Description
| `--parallel`| Performs the check/repair in the fastest way, by parallel execution on all partition owners.
|===


=== Consistency Check/Repair Operations Status

The following command shows the status of `--consistency repair` operations:

[tabs]
--
tab:Unix[]
[source,shell]
----
control.sh --enable-experimental --consistency status
----
tab:Windows[]
[source,shell]
----
control.bat --enable-experimental --consistency status
----
--

=== Partition Update Counters Finalization

The following command allows you to finalize partition update counters after a manual repair.
Finalization closes the gaps in the update counters of transactional cache partitions.

[tabs]
--
tab:Unix[]
[source,shell]
----
control.sh --enable-experimental --consistency finalize
----
tab:Windows[]
[source,shell]
----
control.bat --enable-experimental --consistency finalize
----
--

=== Validating SQL Index Consistency
The `validate_indexes` command validates the indexes of the given caches on all cluster nodes.

The validation process checks the following:

. All the key-value entries that are referenced from a primary index have to be reachable from secondary SQL indexes.
. All the key-value entries that are referenced from a primary index have to be reachable. A reference from the primary index shouldn't point to nowhere.
. All the key-value entries that are referenced from secondary SQL indexes have to be reachable from the primary index.

[tabs]
--
tab:Shell[]
[source,shell]
----
# Checks indexes of all caches on all cluster nodes.
control.sh|bat --cache validate_indexes

# Checks indexes of specific caches on all cluster nodes.
control.sh|bat --cache validate_indexes cache1,cache2

# Checks indexes of specific caches on the node with the given node ID.
control.sh|bat --cache validate_indexes cache1,cache2 f2ea-5f56-11e8-9c2d-fa7a
----
--

If indexes refer to non-existing entries (or some entries are not indexed), errors are dumped to the output, as follows:

[source,text]
----
PartitionKey [grpId=-528791027, grpName=persons-cache-vi, partId=0] ValidateIndexesPartitionResult [updateCntr=313, size=313, isPrimary=true, consistentId=bltTest0]
IndexValidationIssue [key=0, cacheName=persons-cache-vi, idxName=_key_PK], class org.apache.ignite.IgniteCheckedException: Key is present in CacheDataTree, but can't be found in SQL index.
IndexValidationIssue [key=0, cacheName=persons-cache-vi, idxName=PERSON_ORGID_ASC_IDX], class org.apache.ignite.IgniteCheckedException: Key is present in CacheDataTree, but can't be found in SQL index.
validate_indexes has finished with errors (listed above).
----

[WARNING]
====
[discrete]
=== Cluster Should Be Idle During `validate_indexes` Check
Like `idle_verify`, the index validation tool works correctly only if updates are stopped. Otherwise, there may be a race between the checker thread and the thread that updates the entry/index, which can result in a false positive error report.
====

=== Checking Snapshot Consistency

The snapshot consistency check command works the same way as the `idle_verify` command: it compares hashes between
a primary partition and the corresponding backup partitions and prints a report if any differences are found.
Differences may be the result of inconsistencies in some data on the cluster from which the snapshot was taken. If this
is the case, it is recommended to run the `idle_verify` procedure on the cluster.

The incremental snapshot check command verifies data in WAL segments only. It checks that every transaction included in the
snapshot is fully committed on every participating node. It also calculates hashes of these transactions and of the committed
data changes and compares them between nodes.

[WARNING]
====
[discrete]
=== The Incremental Snapshot Check Verifies Transactional Caches Only
Note that incremental snapshots don't guarantee the consistency of atomic caches. It is highly recommended to verify these
caches after restoring with the `idle_verify` command. If needed, you can repair inconsistent partitions with
the `--consistency` command.
====

This procedure does not require the cluster to be in the `idle` state.

[tabs]
--
tab:Shell[]
[source,shell]
----
# Checks that partitions of all snapshot caches have the correct checksums and that primary/backup copies actually contain the same data.
control.(sh|bat) --snapshot check snapshot_name

# Checks the transactional data included in incremental snapshots. Incremental snapshots with indices from 1 to 3 are checked.
control.(sh|bat) --snapshot check snapshot_name --increment 3

----
--

=== Check SQL Index Inline Size

The nodes of a running Ignite cluster can have different SQL index inline sizes.
For example, this can happen when the `IGNITE_MAX_INDEX_PAYLOAD_SIZE` property value differs between the cluster nodes. A difference
in index inline sizes may lead to a performance drop.

The `check_index_inline_sizes` command validates the index inline sizes of the given caches on all cluster nodes. The inline
size of secondary indexes is always checked when a node joins, and a WARN message is printed to the log if the sizes differ.

Use the command below to check whether the secondary index inline sizes are the same on all cluster nodes.

[tabs]
--
tab:Shell[]
[source,shell]
----
control.sh|bat --cache check_index_inline_sizes
----
--

If the index inline sizes are different, the console output is similar to the data below:

[source,text]
----
Control utility [ver. 2.10.0]
2022 Copyright(C) Apache Software Foundation
User: test
Time: 2021-04-27T16:13:21.213
Command [CACHE] started
Arguments: --cache check_index_inline_sizes --yes

Found 4 secondary indexes.
3 index(es) have different effective inline size on nodes. It can lead to
performance degradation in SQL queries.
Index(es):
Full index name: PUBLIC#TEST_TABLE#L_IDX nodes:
[ca1d23ae-89d4-4e8d-ae12-6c68f3900000] inline size: 1, nodes:
[8327bbd1-df08-4b97-8721-de95e363e745] inline size: 2
Full index name: PUBLIC#TEST_TABLE#S1_IDX nodes:
[ca1d23ae-89d4-4e8d-ae12-6c68f3900000] inline size: 1, nodes:
[8327bbd1-df08-4b97-8721-de95e363e745] inline size: 2
Full index name: PUBLIC#TEST_TABLE#I_IDX nodes:
[ca1d23ae-89d4-4e8d-ae12-6c68f3900000] inline size: 1, nodes:
[8327bbd1-df08-4b97-8721-de95e363e745] inline size: 2
----

812== Tracing Configuration
813
814You can enable or disable sampling of traces for a specific API by using the `--tracing-configuration` command.
815Refer to the link:monitoring-metrics/tracing[Tracing] section for details.
816
817Before using the command, enable experimental features of the control script:
818
819[source, shell]
820----
821export IGNITE_ENABLE_EXPERIMENTAL_COMMAND=true
822----
823
824To view the current tracing configuration, execute the following command:
825
826[source, shell]
827----
828control.sh --tracing-configuration
829----
830
831To enable trace sampling for a specific API:
832
833
834[source, shell]
835----
836control.sh --tracing-configuration set --scope <scope> --sampling-rate <rate> --label <label>
837----
838
839Parameters:
840
841[cols="1,3",opts="header"]
842|===
843| Parameter | Description
844| `--scope` a| The API you want to trace:
845
846* `DISCOVERY`: discovery events
847* `EXCHANGE`: exchange events
848* `COMMUNICATION`: communication events
849* `TX`: transactions
850
| `--sampling-rate` a| The probabilistic sampling rate, a number between `0.0` and `1.0` inclusive.
`0` means no sampling (default), `1` means always sampling. For example, `0.5` means each trace is sampled with a probability of 50%.
853
854| `--label` | Only applicable to the `TX` scope. The parameter defines the sampling rate for the transactions with the given label.
855When the `--label` parameter is specified, Ignite will trace transactions with the given label. You can configure different sampling rates for different labels.
856
857Transaction traces with no label will be sampled at the default sampling rate.
858The default rate for the `TX` scope can be set by using this command without the `--label` parameter.
859|===
860
861
862Examples:
863
864* Trace all discovery events:
865+
866[source, shell]
867----
control.sh --tracing-configuration set --scope DISCOVERY --sampling-rate 1
869----
870* Trace all transactions:
871+
872[source, shell]
873----
874control.sh --tracing-configuration set --scope TX --sampling-rate 1
875----
876* Trace transactions with label "report" at a 50% rate:
877+
878[source, shell]
879----
control.sh --tracing-configuration set --scope TX --sampling-rate 0.5 --label report
881----
882
883
884
885== Cluster ID and Tag
886
887A cluster ID is a unique identifier of the cluster that is generated automatically when the cluster starts for the first time. Read link:monitoring-metrics/cluster-id[Cluster ID and Tag] for more information.
888
889To view the cluster ID, run the `--state` command:
890
891[tabs]
892--
893tab:Unix[]
894[source,shell,subs="verbatim,quotes"]
895----
896control.sh --state
897----
898tab:Windows[]
899[source,shell,subs="verbatim,quotes"]
900----
901control.bat --state
902----
903--
904
905And check the output:
906
907[source, text]
908----
909Command [STATE] started
910Arguments: --state
911--------------------------------------------------------------------------------
912Cluster ID: bf9764ea-995e-4ea9-b35d-8c6d078b0234
913Cluster tag: competent_black
914--------------------------------------------------------------------------------
915Cluster is active
916Command [STATE] finished with code: 0
917----
918
A cluster tag is a user-friendly name that you can assign to your cluster.
To change the tag, use the following command (the tag must be no longer than 280 characters):
921
922[tabs]
923--
924tab:Unix[]
925[source,shell,subs="verbatim,quotes"]
926----
927control.sh --change-tag _<new-tag>_
928----
929tab:Windows[]
930[source,shell,subs="verbatim,quotes"]
931----
932control.bat --change-tag _<new-tag>_
933----
934--
935
936== Metric Command
937
The metric command prints out the value of a metric or a metric registry provided in the parameters list. Use the `--node-id` parameter if you need to get a metric from a specific node. If `--node-id` is not set, Ignite selects a random node.
939
940[tabs]
941--
942tab:Unix[]
943[source,shell,subs="verbatim,quotes"]
944----
945control.sh --metric sys
946----
947tab:Windows[]
948[source,shell,subs="verbatim,quotes"]
949----
950control.bat --metric sys
951----
952--
953
Example of the metric output:

[source, text]
----
control.sh --metric sys.CurrentThreadCpuTime
Command [METRIC] started
Arguments: --metric sys.CurrentThreadCpuTime
--------------------------------------------------------------------------------
metric value
sys.CurrentThreadCpuTime 17270000
Command [METRIC] finished with code: 0
----
963
964
Example of the metric registry output:

[source, text]
----
control.sh --metric io.dataregion.default
Command [METRIC] started
Arguments: --metric io.dataregion.default
--------------------------------------------------------------------------------
metric value
io.dataregion.default.TotalAllocatedSize 0
io.dataregion.default.LargeEntriesPagesCount 0
io.dataregion.default.PagesReplaced 0
io.dataregion.default.PhysicalMemorySize 0
io.dataregion.default.CheckpointBufferSize 0
io.dataregion.default.PagesReplaceRate 0
io.dataregion.default.InitialSize 268435456
io.dataregion.default.PagesRead 0
io.dataregion.default.AllocationRate 0
io.dataregion.default.OffHeapSize 0
io.dataregion.default.UsedCheckpointBufferSize 0
io.dataregion.default.MaxSize 6871947673
io.dataregion.default.OffheapUsedSize 0
io.dataregion.default.EmptyDataPages 0
io.dataregion.default.PagesFillFactor 0.0
io.dataregion.default.DirtyPages 0
io.dataregion.default.TotalThrottlingTime 0
io.dataregion.default.EvictionRate 0
io.dataregion.default.PagesWritten 0
io.dataregion.default.TotalAllocatedPages 0
io.dataregion.default.PagesReplaceAge 0
io.dataregion.default.PhysicalMemoryPages 0
Command [METRIC] finished with code: 0
----
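To read a metric from a particular node instead of a random one, pass the node ID explicitly (the UUID below is a placeholder):

[source, shell]
----
control.sh --metric sys.CurrentThreadCpuTime --node-id a8a28869-cac6-4b17-946a-6f7f547b9f62
----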
995
== Metric Configure Command

The metric configure command sets the bounds of a histogram metric or the rate time interval of a hitrate metric.
999
1000[tabs]
1001--
1002tab:Unix[]
1003[source,shell,subs="verbatim,quotes"]
1004----
1005control.sh --metric --configure-histogram histogram-metric-name 1,2,3
1006control.sh --metric --configure-hitrate hitrate-metric-name 1000
1007----
1008tab:Windows[]
1009[source,shell,subs="verbatim,quotes"]
1010----
1011control.bat --metric --configure-histogram histogram-metric-name 1,2,3
1012control.bat --metric --configure-hitrate hitrate-metric-name 1000
1013----
1014--
1015
NOTE: For the metric commands, specify the metric name in the format `<registry-name>.<metric-name>`. For example, use `io.datastorage.WalLoggingRate` to refer to the `WalLoggingRate` metric of the `io.datastorage` registry.
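For example, to set the rate time interval of the `io.datastorage.WalLoggingRate` hitrate metric to 60 seconds (the interval is given in milliseconds; the value is illustrative):

[source, shell]
----
control.sh --metric --configure-hitrate io.datastorage.WalLoggingRate 60000
----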
1017
1018== Indexes Management
1019
The commands below allow you to get specific information on indexes and to trigger the index rebuild process.

To get the list of all indexes that match the specified filters, use the command:
1023
1024[tabs]
1025--
1026tab:Unix[]
1027[source,shell]
1028----
1029control.sh --cache indexes_list [--node-id nodeId] [--group-name grpRegExp] [--cache-name cacheRegExp] [--index-name idxNameRegExp]
1030----
tab:Windows[]
1032[source,shell]
1033----
1034control.bat --cache indexes_list [--node-id nodeId] [--group-name grpRegExp] [--cache-name cacheRegExp] [--index-name idxNameRegExp]
1035----
1036--
1037
1038Parameters:
1039
1040[cols="1,3",opts="header"]
1041|===
1042| Parameter | Description
1043| `--node-id nodeId`| Node ID for the job execution. If the ID is not specified, a node is chosen by the grid.
1044| `--group-name regExp`| Regular expression enabling filtering by cache group name.
1045| `--cache-name regExp`| Regular expression enabling filtering by cache name.
1046| `--index-name regExp`| Regular expression enabling filtering by index name.
1047|===
1048
To get the list of all caches that have an index rebuild in progress, use the command below:
1050
1051
1052[tabs]
1053--
1054tab:Unix[]
1055[source,shell]
1056----
1057control.sh --cache indexes_rebuild_status [--node-id nodeId]
1058----
tab:Windows[]
1060[source,shell]
1061----
1062control.bat --cache indexes_rebuild_status [--node-id nodeId]
1063----
1064--
1065
1066
To trigger the rebuild process of all indexes for the specified caches or cache groups, use the command:
1068
1069[tabs]
1070--
1071tab:Unix[]
1072[source,shell]
1073----
1074control.sh --cache indexes_force_rebuild --node-ids nodeId1,...nodeIdN|--all-nodes --cache-names cacheName1,...cacheNameN|--group-names groupName1,...groupNameN
1075----
tab:Windows[]
1077[source,shell]
1078----
1079control.bat --cache indexes_force_rebuild --node-ids nodeId1,...nodeIdN|--all-nodes --cache-names cacheName1,...cacheNameN|--group-names groupName1,...groupNameN
1080----
1081--
1082
1083Parameters:
1084
1085[cols="1,3",opts="header"]
1086|===
1087| Parameter | Description
| `--node-ids`| Comma-separated list of node IDs on which the indexes should be rebuilt. Alternatively, use `--all-nodes` to rebuild the indexes on all cluster nodes.
1089| `--cache-names`| Comma-separated list of cache names for which indexes should be rebuilt.
1090| `--group-names`| Comma-separated list of cache group names for which indexes should be rebuilt.
1091|===
1092
1093
1094== System View Command
1095
The system view command prints out the content of a system view provided in the parameters list. Use the `--node-id` parameter if you need to get the system view content from a specific node. If `--node-id` is not set, Ignite selects a random node.
1097
1098[tabs]
1099--
1100tab:Unix[]
1101[source,shell,subs="verbatim,quotes"]
1102----
1103control.sh --system-view views
1104----
1105tab:Windows[]
1106[source,shell,subs="verbatim,quotes"]
1107----
1108control.bat --system-view views
1109----
1110--
1111
1112
1113Examples of the output:
1114
[source, text]
----
control.sh --system-view nodes
Command [SYSTEM-VIEW] started
Arguments: --system-view nodes
--------------------------------------------------------------------------------
nodeId consistentId version isClient nodeOrder addresses hostnames isLocal
a8a28869-cac6-4b17-946a-6f7f547b9f62 0:0:0:0:0:0:0:1%lo0,127.0.0.1,192.168.31.45:47500 2.10.0#20201230-sha1:00000000 false 1 [0:0:0:0:0:0:0:1%lo0, 127.0.0.1, 192.168.31.45] [192.168.31.45] true
d580433d-c621-45ff-a558-b4df82d09613 0:0:0:0:0:0:0:1%lo0,127.0.0.1,192.168.31.45:47501 2.10.0#20201230-sha1:00000000 false 2 [0:0:0:0:0:0:0:1%lo0, 127.0.0.1, 192.168.31.45] [192.168.31.45] false
Command [SYSTEM-VIEW] finished with code: 0
----
1124
1125
[source, text]
----
control.sh --system-view views
Command [SYSTEM-VIEW] started
Arguments: --system-view views
--------------------------------------------------------------------------------
name schema description
NODES SYS Cluster nodes
SQL_QUERIES_HISTORY SYS SQL queries history.
INDEXES SYS SQL indexes
BASELINE_NODES SYS Baseline topology nodes
STRIPED_THREADPOOL_QUEUE SYS Striped thread pool task queue
LOCAL_CACHE_GROUPS_IO SYS Local node IO statistics for cache groups
SCAN_QUERIES SYS Scan queries
CLIENT_CONNECTIONS SYS Client connections
PARTITION_STATES SYS Distribution of cache group partitions across cluster nodes
VIEW_COLUMNS SYS SQL view columns
SQL_QUERIES SYS Running SQL queries.
CACHE_GROUP_PAGE_LISTS SYS Cache group page lists
METRICS SYS Ignite metrics
CONTINUOUS_QUERIES SYS Continuous queries
TABLE_COLUMNS SYS SQL table columns
TABLES SYS SQL tables
DISTRIBUTED_METASTORAGE SYS Distributed metastorage data
SERVICES SYS Services
DATASTREAM_THREADPOOL_QUEUE SYS Datastream thread pool task queue
NODE_METRICS SYS Node metrics
BINARY_METADATA SYS Binary metadata
JOBS SYS Running compute jobs, part of compute task started on remote host.
SCHEMAS SYS SQL schemas
CACHE_GROUPS SYS Cache groups
VIEWS SYS SQL views
DATA_REGION_PAGE_LISTS SYS Data region page lists
NODE_ATTRIBUTES SYS Node attributes
TRANSACTIONS SYS Running transactions
CACHES SYS Caches
TASKS SYS Running compute tasks
Command [SYSTEM-VIEW] finished with code: 0
----
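As with the metric command, you can read a system view from a specific node by passing its ID (the UUID below is a placeholder):

[source, shell]
----
control.sh --system-view transactions --node-id d580433d-c621-45ff-a558-b4df82d09613
----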
1163
1164
1165== Performance Statistics
1166
1167Ignite provides a built-in tool for cluster profiling. Read link:monitoring-metrics/performance-statistics[Performance Statistics] for more information.
1168
1169
1170[tabs]
1171--
1172tab:Unix[]
1173[source,shell]
1174----
1175control.sh --performance-statistics [start|stop|rotate|status]
1176----
tab:Windows[]
1178[source,shell]
1179----
1180control.bat --performance-statistics [start|stop|rotate|status]
1181----
1182--
1183
1184Parameters:
1185
1186[cols="1,3",opts="header"]
1187|===
1188| Parameter | Description
1189| `start`| Start collecting performance statistics in the cluster.
1190| `stop`| Stop collecting performance statistics in the cluster.
| `rotate`| Rotate the performance statistics: start writing the statistics to a new file without stopping the collection.
1192| `status`| Get status of collecting performance statistics in the cluster.
1193|===
1194
1195== Working with Cluster Properties
1196
The `control.sh|bat` script provides the ability to work with the link:SQL/sql-statistics[SQL statistics,window=_blank] functionality.

To get the full list of available properties, use the `--property list` command:
1200
1201[tabs]
1202--
1203tab:Unix[]
1204[source,shell]
1205----
1206control.sh --property list
1207----
1208tab:Windows[]
1209[source,shell]
1210----
1211control.bat --property list
1212----
1213--
1214
You can set a property value with the `--property set` command. For example, to enable or disable SQL statistics in the cluster, specify `ON`, `OFF`, or `NO_UPDATE`:
1216
1217[tabs]
1218--
1219tab:Unix[]
1220[source,shell]
1221----
1222control.sh --property set --name 'statistics.usage.state' --val 'ON'
1223----
1224tab:Windows[]
1225[source,shell]
1226----
1227control.bat --property set --name 'statistics.usage.state' --val 'ON'
1228----
1229--
1230
You can also get a property value with the `--property get` command. For example:
1232
1233[tabs]
1234--
1235tab:Unix[]
1236[source,shell]
1237----
1238control.sh --property get --name 'statistics.usage.state'
1239----
1240tab:Windows[]
1241[source,shell]
1242----
1243control.bat --property get --name 'statistics.usage.state'
1244----
1245--
1246
== Manage Cache Metrics Collection

The command provides the ability to enable, disable, or show the status of cache metrics collection.
1250
1251[source, shell]
1252----
1253control.sh|bat --cache metrics enable|disable|status --caches cache1[,...,cacheN]|--all-caches
1254----
1255
1256Parameters:
1257
1258[cols="1,3",opts="header"]
1259|===
1260| Parameter | Description
1261| `--caches cache1[,...,cacheN]`| Specifies a comma-separated list of cache names to which operation should be applied.
1262| `--all-caches` | Applies operation to all user caches.
1263|===
1264
1265Examples:
1266[source, shell]
1267----
1268# Show metrics statuses for all caches:
1269control.sh|bat --cache metrics status --all-caches
1270
1271# Enable metrics collection for cache-1 and cache-2:
1272control.sh|bat --cache metrics enable --caches cache-2,cache-1
1273----
1274
== Rebuild Index

The `schedule_indexes_rebuild` command instructs Apache Ignite to rebuild indexes for the specified caches or cache groups. The target caches or cache groups must be in Maintenance Mode.
1278
1279[source, shell]
1280----
1281control.sh|bat --cache schedule_indexes_rebuild --node-ids nodeId1,...nodeIdN|--all-nodes --cache-names cacheName[index1,...indexN],cacheName2,cacheName3[index1] --group-names groupName1,groupName2,...groupNameN
1282----
1283
1284Parameters:
1285
1286[cols="1,3",opts="header"]
1287|===
1288| Parameter | Description
| `--node-ids` | Comma-separated list of node IDs to rebuild indexes on. If not specified, the rebuild is scheduled on all nodes.
| `--cache-names` | Comma-separated list of cache names, optionally with index names. If indexes are not specified, all indexes of the cache are scheduled for the rebuild operation. Can be used simultaneously with cache group names.
| `--group-names` | Comma-separated list of cache group names. Can be used simultaneously with cache names.
|===