// Licensed to the Apache Software Foundation (ASF) under one or more
// contributor license agreements. See the NOTICE file distributed with
// this work for additional information regarding copyright ownership.
// The ASF licenses this file to You under the Apache License, Version 2.0
// (the "License"); you may not use this file except in compliance with
// the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
= Tracing

:javaFile: {javaCodeDir}/Tracing.java

WARNING: This feature is experimental.

A number of APIs in Ignite are instrumented for tracing with OpenCensus.
You can collect distributed traces of various tasks executed in your cluster and use this information to diagnose latency problems.

We suggest you get familiar with the OpenCensus tracing documentation before reading this chapter: https://opencensus.io/tracing/[^].

The following Ignite APIs are instrumented for tracing:

* Discovery
* Communication
* Exchange
* Transactions
* SQL queries


To view traces, you must export them to an external system.
You can use one of the OpenCensus exporters or write your own; in either case, you have to write code that registers the exporter in Ignite.
Refer to <<Exporting Traces>> for details.


== Configuring Tracing

Enable OpenCensus tracing in the node configuration. All nodes in the cluster must use the same tracing configuration.

[tabs]
--
tab:XML[]
[source, xml]
----
include::code-snippets/xml/tracing.xml[tags=ignite-config;!discovery, indent=0]
----

tab:Java[]
[source, java]
----
include::{javaFile}[tags=config, indent=0]
----
tab:C#/.NET[]

tab:C++[unsupported]
--


== Enabling Trace Sampling

When you start your cluster with the above configuration, Ignite does not collect traces.
You have to enable trace sampling for a specific API at runtime.
You can turn trace sampling on and off at will, for example, only while you are troubleshooting a problem.

You can do this in two ways:

* via the control script from the command line
* programmatically

Traces are collected at a given probabilistic sampling rate.
The rate is a value between `0.0` and `1.0` inclusive: `0` means no sampling, `1` means always sampling.
A rate of `0.1`, for example, samples roughly 10% of operations.

Ignite collects traces whenever the sampling rate is greater than 0.
To disable trace collection, set the sampling rate to 0.
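
To make the meaning of the sampling rate concrete, here is a minimal, self-contained sketch of probabilistic sampling. It is an illustration only: the class `ProbabilitySamplerSketch` and its methods are hypothetical names, and the code does not reproduce how Ignite or OpenCensus make the sampling decision internally.

```java
import java.util.Random;

/** Hypothetical illustration of probabilistic sampling; not Ignite or OpenCensus code. */
final class ProbabilitySamplerSketch {
    private final double rate;
    private final Random random;

    ProbabilitySamplerSketch(double rate, long seed) {
        // Reject values outside [0.0, 1.0], including NaN.
        if (!(rate >= 0.0 && rate <= 1.0))
            throw new IllegalArgumentException("Sampling rate must be within [0.0, 1.0]: " + rate);

        this.rate = rate;
        this.random = new Random(seed);
    }

    /** Decides whether the next operation is traced. */
    boolean shouldSample() {
        // nextDouble() returns a value in [0.0, 1.0), so rate 0 never samples and rate 1 always does.
        return random.nextDouble() < rate;
    }

    /** Counts how many of the given number of operations would be traced. */
    int countSampled(int operations) {
        int sampled = 0;

        for (int i = 0; i < operations; i++) {
            if (shouldSample())
                sampled++;
        }

        return sampled;
    }

    public static void main(String[] args) {
        int operations = 10_000;

        for (double rate : new double[] {0.0, 0.1, 1.0}) {
            int sampled = new ProbabilitySamplerSketch(rate, 42L).countSampled(operations);

            System.out.println("rate=" + rate + ": " + sampled + " of " + operations + " operations traced");
        }
    }
}
```

With a rate of `0` the sketch traces nothing, with `1` it traces every operation, and with an intermediate rate it traces approximately that fraction of operations.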

The following sections describe the two ways of enabling trace sampling.

=== Using Control Script

Go to the `{IGNITE_HOME}/bin` directory of your Ignite installation.
Enable experimental commands in the control script:

[source, shell]
----
export IGNITE_ENABLE_EXPERIMENTAL_COMMAND=true
----

Enable tracing for a specific API:

[source, shell]
----
./control.sh --tracing-configuration set --scope TX --sampling-rate 1
----

Refer to the link:control-script#tracing-configuration[Control Script] section for the list of all parameters.

=== Programmatically

Once you start the node, you can enable trace sampling as follows:

[source, java]
----
include::{javaFile}[tags=enable-sampling, indent=0]
----


With either method, you specify the scope to trace and the sampling rate.

The scope (the `--scope` parameter of the control script) specifies the API you want to trace.
The following scopes are available:

* `DISCOVERY`: discovery events
* `EXCHANGE`: exchange events
* `COMMUNICATION`: communication events
* `TX`: transactions
* `SQL`: SQL queries

The sampling rate (the `--sampling-rate` parameter of the control script) is the probabilistic sampling rate, a number between `0` and `1`:

* `0` means no sampling,
* `1` means always sampling.
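
The sketch below shows one way to assemble the control script command from a scope and a sampling rate while rejecting out-of-range rates. It is a hedged illustration: `TracingCommandSketch`, its local `Scope` enum (which simply mirrors the scope names listed above), and `setSamplingCommand` are hypothetical helpers, not part of Ignite. Only the command-line parameters shown earlier in this chapter are used.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that builds the control script arguments shown in this chapter. */
final class TracingCommandSketch {
    /** Local enum mirroring the scope names listed above (an assumption, not the Ignite type). */
    enum Scope { DISCOVERY, EXCHANGE, COMMUNICATION, TX, SQL }

    /** Returns the command-line tokens that set the sampling rate for the given scope. */
    static List<String> setSamplingCommand(Scope scope, double samplingRate) {
        // The rate must be between 0 and 1 inclusive; NaN is rejected as well.
        if (!(samplingRate >= 0.0 && samplingRate <= 1.0))
            throw new IllegalArgumentException("Sampling rate must be within [0, 1]: " + samplingRate);

        return Arrays.asList(
            "./control.sh", "--tracing-configuration", "set",
            "--scope", scope.name(),
            "--sampling-rate", String.valueOf(samplingRate));
    }

    public static void main(String[] args) {
        // Always sample transactions.
        System.out.println(String.join(" ", setSamplingCommand(Scope.TX, 1.0)));

        // Switch transaction tracing off again.
        System.out.println(String.join(" ", setSamplingCommand(Scope.TX, 0.0)));
    }
}
```

Setting the rate back to `0` for the same scope is how the sketch turns sampling off once troubleshooting is finished.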

== Exporting Traces

To view traces, you need to export them to an external backend using one of the available exporters.
OpenCensus supports a number of exporters out of the box, and you can also write a custom one.
Refer to the link:https://opencensus.io/exporters/[OpenCensus Exporters^] page for details.

This section shows how to export traces to link:https://zipkin.io[Zipkin^].

. Follow link:https://zipkin.io/pages/quickstart.html[this guide^] to launch Zipkin on your machine.
. Register `ZipkinTraceExporter` in the application where you start Ignite:
+
--
[source, java]
----
include::{javaFile}[tags=export-to-zipkin, indent=0]
----
--


. Open http://localhost:9411/zipkin[^] in your browser and click the search icon.
+
--
This is what a trace of the transaction looks like:

image::images/trace_in_zipkin.png[]
--

== Analyzing Trace Data

A trace is the recorded information about the execution of a specific event.
Each trace consists of a tree of _spans_.
A span is an individual unit of work performed by the system to process the event.

Because Ignite is distributed, an operation usually involves multiple nodes.
Therefore, a trace can include spans from multiple nodes.
Each span contains information about the node where the corresponding operation was executed.

In the transaction trace shown above, the trace contains spans for the following operations:

* acquire locks (`transactions.colocated.lock.map`),
* get (`transactions.near.enlist.read`),
* put (`transactions.near.enlist.write`),
* commit (`transactions.commit`), and
* close (`transactions.close`).

The commit operation, in turn, consists of two operations: prepare and finish.
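
The tree structure of the transaction trace described above can be modeled with a small, self-contained sketch. Everything here is illustrative: the `SpanTreeSketch` class, the root span name `transaction`, the child names `prepare` and `finish`, and the node labels `node-A` and `node-B` are assumptions made for the example and are not taken from Ignite.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a trace as a tree of spans, each tagged with the node that executed it. */
final class SpanTreeSketch {
    static final class Span {
        final String name;
        final String nodeId;
        final List<Span> children = new ArrayList<>();

        Span(String name, String nodeId) {
            this.name = name;
            this.nodeId = nodeId;
        }

        /** Adds a child span and returns this span to allow chaining. */
        Span child(Span child) {
            children.add(child);

            return this;
        }
    }

    /** Builds a tree shaped like the transaction trace described in this section. */
    static Span sampleTransactionTrace() {
        // Placeholder names for the two parts of the commit operation.
        Span commit = new Span("transactions.commit", "node-A")
            .child(new Span("prepare", "node-B"))
            .child(new Span("finish", "node-B"));

        return new Span("transaction", "node-A")
            .child(new Span("transactions.colocated.lock.map", "node-A"))
            .child(new Span("transactions.near.enlist.read", "node-A"))
            .child(new Span("transactions.near.enlist.write", "node-A"))
            .child(commit)
            .child(new Span("transactions.close", "node-A"));
    }

    /** Counts the span and all of its descendants. */
    static int count(Span span) {
        int total = 1;

        for (Span child : span.children)
            total += count(child);

        return total;
    }

    /** Appends an indented, one-line-per-span rendering of the tree. */
    static void render(Span span, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++)
            out.append("  ");

        out.append(span.name).append(" [node=").append(span.nodeId).append("]\n");

        for (Span child : span.children)
            render(child, depth + 1, out);
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();

        render(sampleTransactionTrace(), 0, out);

        System.out.print(out);
    }
}
```

Running `main` prints the spans as an indented outline, which is roughly the hierarchy a trace viewer such as Zipkin displays graphically.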

You can click on each span to view the annotations and tags attached to it.


image::images/span.png[Span]


== Tracing SQL Queries

To enable tracing of SQL queries, use `SQL` as the value of the `scope` parameter when you link:https://ignite.apache.org/docs/latest/monitoring-metrics/tracing#enabling-trace-sampling[configure trace sampling, window=_blank].
When SQL query tracing is enabled, each SQL query executed on any cluster node produces a separate trace.

[IMPORTANT]
====
[discrete]
Enabling tracing for SQL queries severely degrades SQL engine performance.
====

The table below describes each span that can be part of the SQL query trace tree and lists its tags and annotations.

[NOTE]
====
[discrete]
Depending on the SQL query type and its execution plan, some spans may not be present in the SQL query span tree.
====

[cols="2,5,5",opts="header"]
|===
|Span Name | Description | Tags and Annotations
| sql.query | Execution of an SQL query from the moment of registration until the used resources on the query initiator node are released a|
* sql.query.text - SQL query text
* sql.schema - SQL schema
| sql.cursor.open | SQL query cursor opening |
| sql.cursor.close | SQL query cursor closure |
| sql.cursor.cancel | SQL query cursor cancellation |
| sql.query.parse | Parsing of an SQL query a|
* sql.parser.cache.hit - Whether parsing of the SQL query was skipped because a cached result was used
| sql.query.execute.request | Processing of an SQL query execution request a|
* sql.query.text - SQL query text
| sql.next.page.request | Processing of a request for the next page of a local SQL query execution result |
| sql.page.response | Processing of a message that contains a page of a node-local SQL query execution result |
| sql.query.execute | Execution of a query by the H2 SQL engine a|
* sql.query.text - SQL query text
| sql.page.prepare | Reading rows from the cursor and preparing a result page a|
* sql.page.rows - Number of rows that a result page contains
| sql.fail.response | Processing of a message that indicates failure of SQL query execution |
| sql.dml.query.execute.request | Processing of an SQL DML query execution request a|
* sql.query.text - SQL query text
| sql.dml.query.response | Processing of an SQL DML query execution result by the query initiator node |
| sql.query.cancel.request | Processing of an SQL query cancel request |
| sql.iterator.open | SQL query iterator opening |
| sql.iterator.close | SQL query iterator closure |
| sql.page.fetch | Fetching of an SQL query result page a|
* sql.page.rows - Number of rows that a result page contains
| sql.page.wait | Waiting for an SQL query result page to be received from a remote node |
| sql.index.range.request | Processing of an SQL index range request a|
* sql.index - SQL index name
* sql.table - SQL table name
* sql.index.range.rows - Number of rows that an index range request result contains
| sql.index.range.response | Processing of an SQL index range response |
| sql.dml.query.execute | Execution of an SQL DML query |
| sql.command.query.execute | Execution of an SQL command query, which is either a DDL query or an Ignite native command |
| sql.partitions.reserve | Reservation of data partitions used to execute a query a|
* Annotation message that indicates reservation of data partitions for a particular cache - `Cache partitions were reserved [cache=<name of the cache>, partitions=[<partitions numbers>]`
| sql.cache.update | Cache update as a result of SQL DML query execution a|
* sql.cache.updates - Number of cache entries to be updated as a result of a DML query
| sql.batch.process | Processing of an SQL batch update |
|===
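
When reading an exported SQL trace, it can help to total the time spent per span name to see which stage dominates. The following is a minimal sketch under stated assumptions: it presumes you have already extracted span names and durations from your tracing backend into plain records, and the `SqlSpanSummarySketch` class, the `SpanRecord` type, and the durations in `main` are made up for illustration. Span names that are absent from a given trace simply do not appear in the result.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical post-processing of exported spans; not part of Ignite or OpenCensus. */
final class SqlSpanSummarySketch {
    /** A span reduced to the two fields this sketch needs. */
    static final class SpanRecord {
        final String name;
        final long durationMicros;

        SpanRecord(String name, long durationMicros) {
            this.name = name;
            this.durationMicros = durationMicros;
        }
    }

    /** Sums span durations per span name; the result is sorted by name. */
    static Map<String, Long> totalDurationByName(List<SpanRecord> spans) {
        Map<String, Long> totals = new TreeMap<>();

        for (SpanRecord span : spans)
            totals.merge(span.name, span.durationMicros, Long::sum);

        return totals;
    }

    public static void main(String[] args) {
        // Made-up durations, for illustration only.
        List<SpanRecord> spans = Arrays.asList(
            new SpanRecord("sql.query.parse", 120),
            new SpanRecord("sql.query.execute", 900),
            new SpanRecord("sql.page.fetch", 300),
            new SpanRecord("sql.page.fetch", 250));

        totalDurationByName(spans).forEach(
            (name, micros) -> System.out.println(name + ": " + micros + " us"));
    }
}
```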

////
TODO: describe annotations and tags
=== Annotations

=== Tags

The `node.id` and `node.consistentId` are the ID and consistent ID of the node where the root operation started.
////
249