// Licensed to the Apache Software Foundation (ASF) under one or more
// contributor license agreements.  See the NOTICE file distributed with
// this work for additional information regarding copyright ownership.
// The ASF licenses this file to You under the Apache License, Version 2.0
// (the "License"); you may not use this file except in compliance with
// the License.  You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
= Data Format

Standard data types are represented as a combination of a type code and a value.

:table_opts: cols="1,1,4",opts="header"

[{table_opts}]
|===
|Field |  Size in bytes |  Description
|`type_code` |  1 |   Signed one-byte integer code that indicates the type of the value.
|`value` |  Variable|    The value itself. Its format and size depend on the `type_code`.
|===
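
As a rough illustration of this layout, the following Java sketch serializes an `int` as a full value: one type-code byte followed by the little-endian payload. The class and method names are hypothetical and not part of any Ignite API; the type code 3 is the one listed for Int in the Primitives section.

[source, java]
----
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not part of any Ignite API.
final class FullValueSketch {
    // Type code of Int, as listed in the Primitives section.
    static final byte TYPE_INT = 3;

    // Writes type_code (1 byte) followed by the 4-byte little-endian payload.
    static byte[] encodeInt(int value) {
        ByteBuffer buf = ByteBuffer.allocate(1 + Integer.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        buf.put(TYPE_INT);
        buf.putInt(value);
        return buf.array();
    }

    // Reads a full Int value back, checking the type code first.
    static int decodeInt(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        byte typeCode = buf.get();
        if (typeCode != TYPE_INT)
            throw new IllegalArgumentException("Unexpected type code: " + typeCode);
        return buf.getInt();
    }
}
----

A reader for other types would typically read the type code first and then dispatch to a type-specific payload reader in the same way.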

The supported types and their formats are described below.


== Primitives

Primitives are the most basic types, such as numbers.


=== Byte

Type code: 1;

Single byte value.

Structure:

[{table_opts}]
|===
| Field  | Size in bytes  | Description
| `value`  | 1  | The value.
|===

=== Short

Type code: 2;

2-byte signed integer number. Little-endian.

Structure:


[{table_opts}]
|===
| Field |   Size in bytes | Description
| `value`  |  2|   The value.
|===


=== Int

Type code: 3;

4-byte signed integer number. Little-endian.

Structure:

[{table_opts}]
|===
|Field|   Size in bytes|   Description
|`value`|   4|   The value.
|===

=== Long

Type code: 4;

8-byte signed integer number. Little-endian.

Structure:


[{table_opts}]
|===
|Field|   Size in bytes |  Description
|`value` |   8  | The value.
|===


=== Float

Type code: 5;

4-byte IEEE 754 floating-point number. Little-endian.

Structure:

[{table_opts}]
|===
|Field |   Size in bytes|   Description
|`value`|   4|   The value.
|===

=== Double

Type code: 6;

8-byte IEEE 754 floating-point number. Little-endian.

Structure:

[{table_opts}]
|===
|Field|   Size in bytes|   Description
|`value`  | 8|   The value.
|===

=== Char

Type code: 7;

Single UTF-16 code unit. Little-endian.

Structure:

[{table_opts}]
|===
|Field|   Size in bytes|   Description
|`value` |   2 |   The UTF-16 code unit in little-endian.
|===


=== Bool

Type code: 8;

Boolean value. Zero for false and non-zero for true.

Structure:

[{table_opts}]
|===
|Field |   Size in bytes |   Description
|`value` |  1 |  The value. Zero for false and non-zero for true.
|===

=== NULL

Type code: 101;

This is not exactly a type. It is a null value that can be assigned to an object of any type.
It has no payload and consists of the type code only.

== Standard objects

=== String

Type code: 9;

String in UTF-8 encoding. It should always be a valid UTF-8 string.

Structure:

[{table_opts}]
|===
|Field |   Size in bytes |   Description
|`length`|  4|   Signed integer number in little-endian. Length of the string in UTF-8 code units, i.e. in bytes.
|`data` |    `length` |  String data in UTF-8 encoding, without a BOM.
|===
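
The following Java sketch shows one way to produce a full String value from the table above: the type code, then the byte length, then the UTF-8 data. The class and method names are hypothetical and used for illustration only.

[source, java]
----
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of any Ignite API.
final class StringValueSketch {
    // Type code of String, as listed above.
    static final byte TYPE_STRING = 9;

    // Layout: type_code (1) + length (4, little-endian) + UTF-8 data (length bytes).
    static byte[] encodeString(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(1 + Integer.BYTES + utf8.length)
            .order(ByteOrder.LITTLE_ENDIAN);
        buf.put(TYPE_STRING);
        buf.putInt(utf8.length); // length in bytes, not in characters
        buf.put(utf8);
        return buf.array();
    }
}
----

Note that the length counts bytes: a string with one two-byte UTF-8 character has `length` equal to 2.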

=== UUID (Guid)


Type code: 10;

A universally unique identifier (UUID) is a 128-bit number used to identify information in computer systems.

Structure:

[{table_opts}]
|===
|Field|   Size in bytes|   Description
|`most_significant_bits`|   8|   64-bit number in little-endian, representing the 64 most significant bits of the UUID.
|`least_significant_bits`|  8|   64-bit number in little-endian, representing the 64 least significant bits of the UUID.
|===
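
A minimal Java sketch of this layout, assuming `java.util.UUID` as the in-memory representation. The class and method names are hypothetical; the byte order of each half follows the table above.

[source, java]
----
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.UUID;

// Hypothetical helper for illustration only; not part of any Ignite API.
final class UuidValueSketch {
    // Type code of UUID, as listed above.
    static final byte TYPE_UUID = 10;

    // Layout: type_code (1) + most significant 64 bits + least significant 64 bits.
    static byte[] encodeUuid(UUID id) {
        ByteBuffer buf = ByteBuffer.allocate(1 + 2 * Long.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        buf.put(TYPE_UUID);
        buf.putLong(id.getMostSignificantBits());
        buf.putLong(id.getLeastSignificantBits());
        return buf.array();
    }

    // Reads the 16-byte payload that follows the type code.
    static UUID decodeUuid(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.get() != TYPE_UUID)
            throw new IllegalArgumentException("Not a UUID value");
        long msb = buf.getLong();
        long lsb = buf.getLong();
        return new UUID(msb, lsb);
    }
}
----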

=== Timestamp

Type code: 33;

More precise than the Date data type. In addition to the milliseconds since the epoch, it contains the nanosecond fraction of the last millisecond, whose value ranges from 0 to 999999. This means that the full timestamp in nanoseconds can be obtained with the following expression: `msecs_since_epoch \* 1000000 + msec_fraction_in_nsecs`.

NOTE: The nanosecond timestamp expression is provided for clarification purposes only. Do not use it as is in production code, because in some languages it may result in integer overflow.

Structure:

[{table_opts}]
|===
|Field|   Size in bytes  | Description
|`msecs_since_epoch`|   8|   Signed integer number in little-endian. Number of milliseconds elapsed since 00:00:00 1 Jan 1970 UTC. This format is widely known as Unix or POSIX time.
|`msec_fraction_in_nsecs`|  4|   Signed integer number in little-endian. Nanosecond fraction of a millisecond.
|===
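
The Java sketch below splits a `java.time.Instant` into the two fields from the table and shows an overflow-safe way to evaluate the nanosecond expression by using `BigInteger`. The class and method names are hypothetical; using `Instant` as the source type is an assumption made for the example.

[source, java]
----
import java.math.BigInteger;
import java.time.Instant;

// Hypothetical helper for illustration only; not part of any Ignite API.
final class TimestampSketch {
    static final int NANOS_PER_MILLI = 1_000_000;

    // msecs_since_epoch: whole milliseconds since 00:00:00 1 Jan 1970 UTC.
    static long msecsSinceEpoch(Instant t) {
        return t.toEpochMilli();
    }

    // msec_fraction_in_nsecs: nanoseconds within the last millisecond, 0..999999.
    static int msecFractionInNsecs(Instant t) {
        return t.getNano() % NANOS_PER_MILLI;
    }

    // Full timestamp in nanoseconds; BigInteger avoids 64-bit overflow.
    static BigInteger totalNanos(long msecsSinceEpoch, int msecFractionInNsecs) {
        return BigInteger.valueOf(msecsSinceEpoch)
            .multiply(BigInteger.valueOf(NANOS_PER_MILLI))
            .add(BigInteger.valueOf(msecFractionInNsecs));
    }
}
----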

=== Date

Type code: 11;

Date, represented as the number of milliseconds elapsed since 00:00:00 1 Jan 1970 UTC. This format is widely known as Unix or POSIX time.

Structure:

[{table_opts}]
|===
|Field|   Size in bytes|   Description
|`msecs_since_epoch`|   8|   The value. Signed integer number in little-endian.
|===

=== Time

Type code: 36;

Time, represented as the number of milliseconds elapsed since midnight, i.e. 00:00:00 UTC.

Structure:

[{table_opts}]
|===
|Field|   Size in bytes|   Description
|`value`|   8|   Signed integer number in little-endian. Number of milliseconds elapsed since 00:00:00 UTC.
|===

=== Decimal

Type code: 30;

Numeric value of any desired precision and scale.

Structure:

[{table_opts}]
|===
|Field |   Size in bytes|   Description
|`scale`|   4|   Signed integer number in little-endian. Effectively, the power of ten by which the unscaled value should be divided. For example, 42 with scale 3 is 0.042, 42 with scale -3 is 42000, and 42 with scale 1 is 4.2.
|`length`|  4|   Signed integer number in little-endian. Length of the number in bytes.
|`data`|    `length`|  The first bit is the negativity flag. If it is set to 1, the value is negative. The remaining bits form the absolute value as an integer of variable length in big-endian format.
|===
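
The Java sketch below encodes and decodes the Decimal payload (without the type code) using `java.math.BigDecimal`. It reads the `data` field as a sign flag in the top bit followed by the big-endian magnitude, which is an interpretation of the table above rather than a confirmed implementation. The class and method names are hypothetical.

[source, java]
----
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not part of any Ignite API.
final class DecimalSketch {
    // Payload layout: scale (4) + length (4) + sign-and-magnitude bytes (length).
    static byte[] encodePayload(BigDecimal value) {
        BigInteger unscaled = value.unscaledValue();
        // For a non-negative number, toByteArray() is big-endian with the top bit clear.
        byte[] mag = unscaled.abs().toByteArray();
        if (unscaled.signum() < 0)
            mag[0] |= 0x80; // set the negativity flag
        ByteBuffer buf = ByteBuffer.allocate(2 * Integer.BYTES + mag.length)
            .order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(value.scale());
        buf.putInt(mag.length);
        buf.put(mag);
        return buf.array();
    }

    static BigDecimal decodePayload(byte[] payload) {
        ByteBuffer buf = ByteBuffer.wrap(payload).order(ByteOrder.LITTLE_ENDIAN);
        int scale = buf.getInt();
        byte[] mag = new byte[buf.getInt()];
        buf.get(mag);
        boolean negative = mag.length > 0 && (mag[0] & 0x80) != 0;
        if (negative)
            mag[0] &= 0x7F; // clear the flag to recover the magnitude
        BigInteger unscaled = new BigInteger(1, mag);
        return new BigDecimal(negative ? unscaled.negate() : unscaled, scale);
    }
}
----

For example, `new BigDecimal("0.042")` has the unscaled value 42 and scale 3, so its payload is the scale 3, the length 1, and the single data byte `0x2A`.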

=== Enum

Type code: 28;

Value of an enumerable type. Only a finite number of named values is defined for such types.

Structure:

[{table_opts}]
|===
|Field|   Size in bytes|   Description
|`type_id`| 4|   Signed integer number in little-endian. See <<Type ID>> for details.
|`ordinal`| 4|   Signed integer number stored in little-endian. Ordinal of the enumeration value: its position in the enum declaration, where the initial constant is assigned an ordinal of zero.
|===

== Arrays of primitives

Arrays of this kind contain only the payloads of values as elements. They all share a similar format, described in the table below. Note that the array contains only payloads, not type codes.


[{table_opts}]
|===
|Field|   Size in bytes|   Description
|`length`|  4|   Signed integer number. Number of elements in the array.
|`element_0_payload`|   Depends on the type.|    Payload of the value 0.
|`element_1_payload`|   Depends on the type.|    Payload of the value 1.
|... |... |...
|`element_N_payload`|   Depends on the type. |   Payload of the value N.
|===
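
As an illustration, the Java sketch below writes an Int array as a full value: the array type code, the element count, and then the bare 4-byte payloads with no per-element type codes. The class and method names are hypothetical, type code 14 is the one listed for Int array below, and the little-endian byte order of `length` is assumed to match the other integer fields in this document.

[source, java]
----
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not part of any Ignite API.
final class IntArraySketch {
    // Type code of Int array, as listed in the section below.
    static final byte TYPE_INT_ARRAY = 14;

    // Layout: type_code (1) + length (4) + length * 4 bytes of int payloads.
    static byte[] encodeIntArray(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(1 + Integer.BYTES + values.length * Integer.BYTES)
            .order(ByteOrder.LITTLE_ENDIAN); // assumption: length is little-endian too
        buf.put(TYPE_INT_ARRAY);
        buf.putInt(values.length);
        for (int v : values)
            buf.putInt(v); // payload only, no type code per element
        return buf.array();
    }
}
----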

=== Byte array

Type code: 12;

Array of bytes. May be either a piece of raw data or an array of small signed integer numbers.

Structure:

[{table_opts}]
|===
|Field|   Size in bytes|   Description
|`length`|  4|   Signed integer number. Number of elements in the array.
|`elements`|    `length`|  Elements sequence. Every element is a payload of type "byte".
|===

=== Short array

Type code: 13;

Array of short signed integer numbers.

Structure:

[{table_opts}]
|===
|Field |   Size in bytes|   Description
|`length`|  4|   Signed integer number. Number of elements in the array.
|`elements`|    `length * 2`|  Elements sequence. Every element is a payload of type "short".
|===

=== Int array

Type code: 14;

Array of signed integer numbers.

Structure:

[{table_opts}]
|===
|Field|   Size in bytes|   Description
|`length`|  4|   Signed integer number. Number of elements in the array.
|`elements`|    `length * 4`|  Elements sequence. Every element is a payload of type "int".
|===

=== Long array

Type code: 15;

Array of long signed integer numbers.

Structure:

[{table_opts}]
|===
|Field|   Size in bytes|   Description
|`length`|  4|   Signed integer number. Number of elements in the array.
|`elements`|    `length * 8`|  Elements sequence. Every element is a payload of type "long".
|===

=== Float array

Type code: 16;

Array of floating point numbers.

Structure:

[{table_opts}]
|===
|Field|   Size in bytes|   Description
|`length`|  4|   Signed integer number. Number of elements in the array.
|`elements`|    `length * 4` | Elements sequence. Every element is a payload of type "float".
|===

=== Double array

Type code: 17;

Array of floating point numbers with double precision.

Structure:

[{table_opts}]
|===
|Field|   Size in bytes |  Description
|`length`|  4|   Signed integer number. Number of elements in the array.
|`elements`|    `length * 8`|  Elements sequence. Every element is a payload of type "double".
|===

=== Char array

Type code: 18;

Array of UTF-16 code units. Unlike a string, this type does not necessarily contain valid UTF-16 text.

Structure:

[{table_opts}]
|===
|Field |   Size in bytes|   Description
|`length` | 4|   Signed integer number. Number of elements in the array.
|`elements`|    `length * 2`|  Elements sequence. Every element is a payload of type "char".
|===

=== Bool array

Type code: 19;

Array of boolean values.

Structure:

[{table_opts}]
|===
|Field|   Size in bytes |  Description
|`length`|  4|   Signed integer number. Number of elements in the array.
|`elements`|    `length`|  Elements sequence. Every element is a payload of type "bool".
|===

== Arrays of standard objects

Arrays of this kind contain full values as elements, meaning that each element contains a type code as well as a payload. This format allows elements of such collections to be NULL values, which is why they are called "objects". They all share a similar format, described in the table below.


[{table_opts}]
|===
|Field|   Size in bytes|   Description
|`length` | 4|   Signed integer number.  Number of elements in the array.
|`element_0_full_value`|    Depends on value type.|  Full value of the element 0. Consists of a type code and a payload. Can also be NULL.
|`element_1_full_value`|    Depends on value type.|  Full value of the element 1 or NULL.
|... |...| ...
|`element_N_full_value`|    Depends on value type.|  Full value of the element N or NULL.
|===
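
The Java sketch below encodes a String array to show how full values and NULL elements are interleaved: every non-null element carries its own type code, and a `null` element is written as the single NULL type code byte. The class and method names are hypothetical, and the little-endian byte order of the array `length` is assumed to match the other integer fields.

[source, java]
----
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of any Ignite API.
final class StringArraySketch {
    static final byte TYPE_STRING = 9;        // String
    static final byte TYPE_STRING_ARRAY = 20; // String array
    static final byte TYPE_NULL = 101;        // NULL

    static byte[] encodeStringArray(String[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(TYPE_STRING_ARRAY);
        writeIntLE(out, values.length);
        for (String s : values) {
            if (s == null) {
                out.write(TYPE_NULL); // 1 byte: type code only
                continue;
            }
            byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
            out.write(TYPE_STRING);   // 5 + value_length bytes in total
            writeIntLE(out, utf8.length);
            out.write(utf8, 0, utf8.length);
        }
        return out.toByteArray();
    }

    // Writes a 4-byte little-endian integer.
    private static void writeIntLE(ByteArrayOutputStream out, int v) {
        out.write(v);
        out.write(v >>> 8);
        out.write(v >>> 16);
        out.write(v >>> 24);
    }
}
----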

=== String array

Type code: 20;

Array of UTF-8 string values.

Structure:


[{table_opts}]
|===
|Field |   Size in bytes|   Description
|`length`|  4|   Signed integer number. Number of elements in the array.
|`elements`|    Variable. Depends on the length of every string. Every element size is either `5 + value_length` for a string, or 1 for `NULL`.|  Elements sequence. Every element is a full value of type "string", including type code, or `NULL`.
|===

=== UUID (Guid) array

Type code: 21;

Array of UUIDs (Guids).

Structure:


[{table_opts}]
|===
|Field|   Size in bytes|   Description
|`length`|  4|   Signed integer number. Number of elements in the array.
|`elements`|    Variable. Every element size is either 17 for UUID, or 1 for NULL.|  Elements sequence. Every element is a full value of type "UUID", including type code, or NULL.
|===

=== Timestamp array

Type code: 34;

Array of timestamp values.

Structure:

[{table_opts}]
|===
|Field|   Size in bytes |  Description
|`length`|  4|   Signed integer number. Number of elements in the array.
|`elements`|    Variable. Every element size is either 13 for Timestamp, or 1 for NULL.| Elements sequence. Every element is a full value of type "timestamp", including type code, or NULL.
|===

=== Date array

Type code: 22;

Array of dates.

Structure:

[{table_opts}]
|===
|Field|   Size in bytes|   Description
|`length`|  4|   Signed integer number. Number of elements in the array.
|`elements`|    Variable. Every element size is either 9 for Date, or 1 for NULL.|   Elements sequence. Every element is a full value of type "date", including type code, or NULL.
|===

=== Time array

Type code: 37;

Array of time values.

Structure:

[{table_opts}]
|===
|Field |   Size in bytes|   Description
|`length`|  4|   Signed integer number. Number of elements in the array.
|`elements`   | Variable. Every element size is either 9 for Time, or 1 for NULL.|   Elements sequence. Every element is a full value of type "time", including type code, or NULL.
|===

=== Decimal array

Type code: 31;

Array of decimal values.

Structure:

[{table_opts}]
|===
|Field|   Size in bytes|   Description
|`length`|  4|   Signed integer number. Number of elements in the array.
|`elements`|    Variable. Every element size is either `9 + value_length` for Decimal, or 1 for NULL.| Elements sequence. Every element is a full value of type "decimal", including type code, or NULL.
|===

== Object collections

=== Object array

Type code: 23;

Array of objects of any type. This includes standard objects of any type, complex objects of various types, NULL values, and any combination of them. This also means that collections may contain other collections.

Structure:

[{table_opts}]
|===
|Field|   Size in bytes|   Description
|`type_id` |4|   Type identifier of the contained objects. For example, in Java this type is used to deserialize to a `Type[]`. All values in the array should have `Type` as a parent. It is the parent type of any object type; in Java, for example, this can always be `java.lang.Object`. The Type ID for such a "root" object type is -1. See <<Type ID>> for details.
|`length`|  4|   Signed integer number. Number of elements in the array.
|`elements`|    Variable. Depends on sizes of the objects.|  Elements sequence. Every element is a full value of any type or NULL.
|===

=== Collection

Type code: 24;

General collection type. Like an object array, it contains objects, but unlike an array, it carries a hint for deserialization to a platform-specific collection of a certain type, not just an array. The following collection types are supported:


*    `USER_SET` = -1. A general set type that cannot be mapped to a more specific set type. Still, it is known to be a set. It makes sense to deserialize such a collection to the basic and most widely used set-like type on your platform, e.g. a hash set.
*    `USER_COL` = 0. A general collection type that cannot be mapped to any more specific collection type. It makes sense to deserialize such a collection to the basic and most widely used collection type on your platform, e.g. a resizable array.
*    `ARR_LIST` = 1. A resizable array type.
*    `LINKED_LIST` = 2. A linked list type.
*    `HASH_SET` = 3. A basic hash set type.
*    `LINKED_HASH_SET` = 4. A hash set type that maintains element order.
*    `SINGLETON_LIST` = 5. A collection that contains only a single element but behaves as a collection. Platforms can use it for optimization purposes. If it is not applicable, any collection type can be used.

[NOTE]
====
The collection type byte is used as a hint that lets a platform deserialize a collection to the most suitable type. For example, in Java `HASH_SET` is deserialized to `java.util.HashSet`, while `LINKED_HASH_SET` is deserialized to `java.util.LinkedHashSet`. It is recommended that a thin client implementation use the most suitable collection type on serialization and deserialization. Still, it is only a hint, which the user can ignore if it is not relevant or not applicable to the platform.
====

Structure:


[{table_opts}]
|===
|Field|   Size in bytes|   Description
|`length`|  4|   Signed integer number. Number of elements in the collection.
|`type`|    1|   Type of the collection. See the description above for details.
|`elements`  |  Variable. Depends on sizes of the objects. |  Elements sequence. Every element is a full value of any type or NULL.
|===
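
The Java sketch below shows one possible way to act on the collection type hint when deserializing. The mapping for `HASH_SET` and `LINKED_HASH_SET` follows the note above; the remaining choices, and the fallback to a resizable array for unknown values, are assumptions made for this example. The class and method names are hypothetical.

[source, java]
----
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.LinkedList;

// Hypothetical helper for illustration only; not part of any Ignite API.
final class CollectionHintSketch {
    // Picks a container for the given collection type byte.
    static Collection<Object> newCollection(byte type) {
        switch (type) {
            case -1: // USER_SET
            case 3:  // HASH_SET
                return new HashSet<>();
            case 4:  // LINKED_HASH_SET
                return new LinkedHashSet<>();
            case 2:  // LINKED_LIST
                return new LinkedList<>();
            default: // USER_COL, ARR_LIST, SINGLETON_LIST, and unknown hints
                return new ArrayList<>();
        }
    }
}
----

Because the type byte is only a hint, falling back to a general-purpose container keeps the reader tolerant of values it does not recognize.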

=== Map

Type code: 25;

Map-like collection type. Contains pairs of key and value objects. Both keys and values can be objects of various types, including standard objects, complex objects, and any combination of them. It carries a hint for deserialization to a map of a certain type. The following map types are supported:

*   `HASH_MAP` = 1. A basic hash map.
*   `LINKED_HASH_MAP` = 2. A hash map that maintains element order.

[NOTE]
====
The map type byte is used as a hint that lets a platform deserialize a collection to the most suitable type. It is recommended that a thin client implementation use the most suitable map type on serialization and deserialization. Still, it is only a hint, which the user can ignore if it is not relevant or not applicable to the platform.
====

Structure:

[{table_opts}]
|===
|Field|   Size in bytes|   Description
|`length`|  4|   Signed integer number. Number of elements in the collection.
|`type`|    1|   Type of the collection. See the description above for details.
|`elements`|    Variable. Depends on sizes of the objects.|  Elements sequence. Elements here are keys and values, following one another in pairs. Every element is a full value of any type or NULL.
|===

=== Enum array

Type code: 29;

Array of enumerable type values. Each element is either an enumerable value or NULL, so every element occupies either 9 bytes or 1 byte.

Structure:


[{table_opts}]
|===
|Field|   Size in bytes|   Description
|`type_id`| 4|   Type identifier of the contained objects. For example, in Java this type is used to deserialize to an `EnumType[]`. All values in the array should have `EnumType` as a parent. It is the parent type of any enumerable object type. See <<Type ID>> for details.
|`length`|  4|   Signed integer number. Number of elements in the collection.
|`elements`|    Variable. Depends on sizes of the objects. | Elements sequence. Every element is a full value of enum type or NULL.
|===

== Complex object

Type code: 103;

A complex object consists of a 24-byte header, a set of fields (data objects), and a schema (field IDs and positions). Depending on the operation and your data model, a data object can be of a primitive type or a complex type (a set of fields).

Structure:

[{table_opts}]
|===
|Field |   Size in bytes|   Optionality
|`version`| 1|   Mandatory
|`flags`|   2|   Mandatory
|`type_id`| 4|   Mandatory
|`hash_code`|   4|   Mandatory
|`length`|  4|   Mandatory
|`schema_id`|   4|   Mandatory
|`object_fields`|   Variable length|    Optional
|`schema`|  Variable length|    Optional
|`raw_data_offset`| 4|   Optional
|===


== Version

This field indicates the complex object layout version. It is needed for backward compatibility. Clients should check this field and report an error to the user if the object layout version is unknown to them, to prevent data corruption and unpredictable deserialization results.

== Flags

This field is a 16-bit little-endian bitmask. It contains object flags, which indicate how the object instance should be handled by a reader. The following flags are defined:

*    `USER_TYPE = 0x0001` - Indicates that the type is a user type. Should always be set for any client type. Can be ignored on deserialization.
*    `HAS_SCHEMA = 0x0002` - Indicates that the object layout contains a schema in the footer. See <<Schema>> for details.
*    `HAS_RAW_DATA = 0x0004` - Indicates that the object has raw data. See <<Raw data offset>> for details.
*    `OFFSET_ONE_BYTE = 0x0008` - Indicates that schema field offsets are one byte long. See <<Schema>> for details.
*    `OFFSET_TWO_BYTES = 0x0010` - Indicates that schema field offsets are two bytes long. See <<Schema>> for details.
*    `COMPACT_FOOTER = 0x0020` - Indicates that the footer does not contain field IDs, only offsets. See <<Schema>> for details.
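
A minimal Java sketch of testing these flags on a decoded `flags` value. The offset size rule follows the order of checks described in the <<Schema>> section; the class and method names are hypothetical.

[source, java]
----
// Hypothetical helper for illustration only; not part of any Ignite API.
final class ObjectFlagsSketch {
    static final int USER_TYPE = 0x0001;
    static final int HAS_SCHEMA = 0x0002;
    static final int HAS_RAW_DATA = 0x0004;
    static final int OFFSET_ONE_BYTE = 0x0008;
    static final int OFFSET_TWO_BYTES = 0x0010;
    static final int COMPACT_FOOTER = 0x0020;

    // Returns true if the given flag bit is set in the bitmask.
    static boolean isSet(short flags, int flag) {
        return (flags & flag) != 0;
    }

    // Size in bytes of each schema field offset: 1, 2 or 4.
    static int offsetSize(short flags) {
        if (isSet(flags, OFFSET_ONE_BYTE))
            return 1;
        if (isSet(flags, OFFSET_TWO_BYTES))
            return 2;
        return 4;
    }
}
----

For example, a `flags` value of `0x000B` means a user type that has a schema with 1-byte field offsets.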

== Type ID

This field contains a unique type identifier. It is 4 bytes long and stored in little-endian. By default, the Type ID is obtained as a Java-style hash code of the type name. The Type ID evaluation algorithm must be the same across all platforms in the cluster so that all of them can operate on objects of this type. The default Type ID calculation algorithm, which is recommended for all thin clients, can be found below.

[tabs]
--

tab:Java[]
[source, java]
----
static int hashCode(String str) {
  int len = str.length();

  int h = 0;

  for (int i = 0; i < len; i++) {
    int c = str.charAt(i);

    // The hash is case-insensitive: every character is lower-cased first.
    c = Character.toLowerCase(c);

    h = 31 * h + c;
  }

  return h;
}
----

tab:C[]

[source, c]
----
int32_t HashCode(const char* val, size_t size)
{
  if (!val || size == 0)
    return 0;

  /* Unsigned arithmetic wraps on overflow, matching Java int semantics. */
  uint32_t hash = 0;

  for (size_t i = 0; i < size; ++i)
  {
    char c = val[i];

    /* Lower-case ASCII letters only. */
    if ('A' <= c && c <= 'Z')
      c |= 0x20;

    hash = 31 * hash + (uint32_t)c;
  }

  return (int32_t)hash;
}
----

--



== Hash code

Hash code of the value. It is stored as a 4-byte little-endian value and calculated as a Java-style hash of the contents without the header. It is used by the Ignite engine for comparisons, for example to compare keys. The hash calculation algorithm can be found below.

[tabs]
--
tab:Java[]
[source, java]
----
static int dataHashCode(byte[] data) {
  int len = data.length;

  // Starts from 1, as in java.util.Arrays.hashCode(byte[]) and the C version.
  int h = 1;

  for (int i = 0; i < len; i++)
    h = 31 * h + data[i];

  return h;
}
----
tab:C[]

[source, c]
----
int32_t GetDataHashCode(const void* data, size_t size)
{
  if (!data)
    return 0;

  /* Unsigned arithmetic wraps on overflow, matching Java int semantics. */
  uint32_t hash = 1;
  const int8_t* bytes = (const int8_t*)data;

  for (size_t i = 0; i < size; ++i)
    hash = 31 * hash + (uint32_t)bytes[i];

  return (int32_t)hash;
}
----

--



== Length

This field contains the full length of the object, including the header. It is stored as a 4-byte little-endian integer. Using this field, you can skip the whole object by advancing the current data stream position by the value of this field.
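
The Java sketch below skips a complex object in a `ByteBuffer`. It assumes that the buffer is positioned at the object's type code byte, that the length is measured from that byte, and that `length` sits 12 bytes after it (type code, `version`, `flags`, `type_id` and `hash_code` come first, per the structure table above). These are assumptions made for the example, and the class and method names are hypothetical.

[source, java]
----
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not part of any Ignite API.
final class SkipObjectSketch {
    // Assumed offset of `length` from the type code byte:
    // type_code (1) + version (1) + flags (2) + type_id (4) + hash_code (4).
    static final int LENGTH_OFFSET = 12;

    // Moves the buffer position past the complex object that starts at the current position.
    static void skipComplexObject(ByteBuffer buf) {
        int start = buf.position();
        // Read through a duplicate so the caller's byte order is left untouched.
        int length = buf.duplicate().order(ByteOrder.LITTLE_ENDIAN).getInt(start + LENGTH_OFFSET);
        buf.position(start + length);
    }
}
----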

== Schema ID

Object schema identifier. It is stored as a 4-byte little-endian value and calculated as a hash of all object field IDs. It is used for complex object size optimization. Ignite uses the schema ID to avoid writing the whole schema at the end of every complex object value. Instead, it stores all schemas in the binary metadata store and writes only field offsets to the object. This optimization significantly reduces the size of complex objects that contain many short fields (such as ints).

If the schema is missing (e.g. the whole object is written in raw mode or has no fields at all), the schema ID field is 0.

See <<Schema>> for details on the schema structure.

[NOTE]
====
The schema ID cannot be determined from the Type ID, because objects of the same type (and thus with the same Type ID) can have multiple schemas, i.e. field sequences.
====

The schema ID calculation algorithm can be found below:

[tabs]
--

tab:Java[]

[source, java]
----
/** FNV1 hash offset basis. */
private static final int FNV1_OFFSET_BASIS = 0x811C9DC5;

/** FNV1 hash prime. */
private static final int FNV1_PRIME = 0x01000193;

static int calculateSchemaId(int[] fieldIds)
{
  if (fieldIds == null || fieldIds.length == 0)
    return 0;

  int len = fieldIds.length;

  int schemaId = FNV1_OFFSET_BASIS;

  for (int i = 0; i < len; ++i)
  {
    int fieldId = fieldIds[i];

    // Mix in each byte of the field ID, least significant byte first.
    schemaId = schemaId ^ (fieldId & 0xFF);
    schemaId = schemaId * FNV1_PRIME;
    schemaId = schemaId ^ ((fieldId >> 8) & 0xFF);
    schemaId = schemaId * FNV1_PRIME;
    schemaId = schemaId ^ ((fieldId >> 16) & 0xFF);
    schemaId = schemaId * FNV1_PRIME;
    schemaId = schemaId ^ ((fieldId >> 24) & 0xFF);
    schemaId = schemaId * FNV1_PRIME;
  }

  return schemaId;
}
----


tab:C[]

[source, c]
----
/** FNV1 hash offset basis. */
#define FNV1_OFFSET_BASIS 0x811C9DC5u

/** FNV1 hash prime. */
#define FNV1_PRIME 0x01000193u

int32_t CalculateSchemaId(const int32_t* fieldIds, size_t num)
{
  if (!fieldIds || num == 0)
    return 0;

  /* Unsigned arithmetic wraps on overflow, matching Java int semantics. */
  uint32_t schemaId = FNV1_OFFSET_BASIS;

  for (size_t i = 0; i < num; ++i)
  {
    uint32_t fieldId = (uint32_t)fieldIds[i];

    /* Mix in each byte of the field ID, least significant byte first. */
    schemaId ^= fieldId & 0xFF;
    schemaId *= FNV1_PRIME;
    schemaId ^= (fieldId >> 8) & 0xFF;
    schemaId *= FNV1_PRIME;
    schemaId ^= (fieldId >> 16) & 0xFF;
    schemaId *= FNV1_PRIME;
    schemaId ^= (fieldId >> 24) & 0xFF;
    schemaId *= FNV1_PRIME;
  }

  return (int32_t)schemaId;
}
----


--



== Object Fields

Object fields. Every field is a binary object and can be of either a complex or a standard type. Note that a complex object with no fields at all is valid and may be encountered. A field may or may not have a name. For named fields, an offset is written in the object schema, by which they can be located in the object without deserializing the whole object. Fields without a name are always stored after the named fields and are written in the so-called "raw mode".

Thus, fields written in raw mode can only be accessed by a sequential read in the same order as they were written, while named fields can be read in any order.

== Schema

Object schema. A complex object may or may not have a schema, so this field is optional. The schema is not present if the object has no named fields, which includes the case when the object has no fields at all. Check the `HAS_SCHEMA` object flag to determine whether the object has a schema.

The main purpose of a schema is to allow fast lookup of object fields. For this purpose, the schema contains a sequence of offsets of object fields in the object payload. Field offsets can be of different sizes. The size is determined on write by the maximum offset value: if it is in the range [24..255] bytes, 1-byte offsets are used; if it is in the range [256..65535] bytes, 2-byte offsets are used; in all other cases, 4-byte offsets are used. To determine the offset size on read, clients should check the `OFFSET_ONE_BYTE` and `OFFSET_TWO_BYTES` flags. If the `OFFSET_ONE_BYTE` flag is set, offsets are 1 byte long; otherwise, if the `OFFSET_TWO_BYTES` flag is set, offsets are 2 bytes long; otherwise, offsets are 4 bytes long.

Two schema formats are supported:

* Full schema approach - simpler to implement but uses more resources.
* Compact footer approach - harder to implement, but provides better performance and reduces memory consumption; it is therefore recommended that new clients implement this approach.

Both formats are described in more detail below.

Note that clients should check the `COMPACT_FOOTER` flag to determine which approach is used in each specific object.

=== Full schema approach

When this approach is used, the `COMPACT_FOOTER` flag is not set and the whole object schema is written to the footer of the object. In this case, only the complex object itself is needed for deserialization: the `schema_id` field is ignored and no additional data is required. The structure of the schema field of the complex object in this case is shown below:

[cols="1,1,2",opts="header"]
|===
|Field |  Size in bytes |  Description
|`field_id_0`|  4|   ID of the field with the index 0. 4-byte hash stored in little-endian. The field ID is calculated from the field name in the same way as a <<Type ID>>.
|`field_offset_0`|  Variable, depending on the size of the object: 1, 2 or 4. |  Unsigned integer number stored in little-endian. Offset of the field in the object, starting from the very first byte of the full object value (i.e. the `type_code` position).
|`field_id_1`|  4|   4-byte hash stored in little-endian. ID of the field with the index 1.
|`field_offset_1` | Variable, depending on the size of the object: 1, 2 or 4.|   Unsigned integer number stored in little-endian. Offset of the field in the object.
|...| ...| ...
|`field_id_N`|  4|   4-byte hash stored in little-endian. ID of the field with the index N.
|`field_offset_N`|  Variable, depending on the size of the object: 1, 2 or 4. |   Unsigned integer number stored in little-endian. Offset of the field in the object.
|===
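
The Java sketch below parses a full schema that has already been cut out of the object into its own byte array, producing a map from field ID to field offset. Locating the schema inside the object is outside the scope of this example. The offset size (1, 2 or 4) is expected to come from the `OFFSET_ONE_BYTE` and `OFFSET_TWO_BYTES` flags; the class and method names are hypothetical.

[source, java]
----
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of any Ignite API.
final class FullSchemaSketch {
    // Parses repeated (field_id, field_offset) entries; keeps the written field order.
    static Map<Integer, Integer> parse(byte[] schema, int offsetSize) {
        if (offsetSize != 1 && offsetSize != 2 && offsetSize != 4)
            throw new IllegalArgumentException("Offset size must be 1, 2 or 4");
        ByteBuffer buf = ByteBuffer.wrap(schema).order(ByteOrder.LITTLE_ENDIAN);
        Map<Integer, Integer> offsets = new LinkedHashMap<>();
        while (buf.remaining() >= Integer.BYTES + offsetSize) {
            int fieldId = buf.getInt();
            int offset;
            switch (offsetSize) {
                case 1:  offset = buf.get() & 0xFF; break;        // unsigned byte
                case 2:  offset = buf.getShort() & 0xFFFF; break; // unsigned short
                default: offset = buf.getInt(); break;            // fits in int for objects under 2 GB
            }
            offsets.put(fieldId, offset);
        }
        return offsets;
    }
}
----

With the compact footer approach, the same loop would read offsets only, and the field IDs would come from the stored schema instead.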
898

=== Compact footer approach

In this approach, the COMPACT_FOOTER flag is set and only the sequence of field offsets is written to the object footer. In this case, the client uses the schema_id field to look up the object schema in a previously stored meta store, in order to determine the field order and associate each field with its offset.

If this approach is used, the client needs to keep schemas in a special meta store and send them to or retrieve them from Ignite servers. See link:check[Binary Types] for details.

The structure of the schema in this case is shown below:

[cols="1,1,2",opts="header"]
|===
|Field |  Size in bytes |  Description
|`field_offset_0` | Variable, depending on the size of the object: 1, 2 or 4. |  Unsigned integer number stored in little-endian. Offset of the field with the index 0 in the object, starting from the very first byte of the full object value (i.e. the type_code position).
|`field_offset_1`|  Variable, depending on the size of the object: 1, 2 or 4. |  Unsigned integer number stored in little-endian. Offset of the field with the index 1 in the object.
|...| ...| ...
|`field_offset_N`|  Variable, depending on the size of the object: 1, 2 or 4.  | Unsigned integer number stored in little-endian. Offset of the field with the index N in the object.

|===
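
With a compact footer, the offsets only become meaningful once they are paired with the field order of a known schema. The sketch below illustrates that lookup under stated assumptions: `CompactFooterSketch`, `registerSchema` and `resolve` are hypothetical names, the in-memory map stands in for the client-side meta store, and `offsetSize` (1, 2 or 4) is assumed to be known from the object header.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

final class CompactFooterSketch {
  // Hypothetical meta store: schema_id -> field IDs in schema order.
  private final Map<Integer, int[]> schemas = new HashMap<>();

  void registerSchema(int schemaId, int... fieldIds) {
    schemas.put(schemaId, fieldIds.clone());
  }

  /** Pairs every offset of a compact footer with the field ID at the same index. */
  Map<Integer, Integer> resolve(int schemaId, byte[] footer, int offsetSize) {
    int[] fieldIds = schemas.get(schemaId);
    if (fieldIds == null)
      throw new IllegalStateException("Unknown schema " + schemaId + ", it must be retrieved first");
    if (offsetSize != 1 && offsetSize != 2 && offsetSize != 4)
      throw new IllegalArgumentException("offsetSize must be 1, 2 or 4");
    if (footer.length != fieldIds.length * offsetSize)
      throw new IllegalArgumentException("Footer length does not match the schema");

    ByteBuffer buf = ByteBuffer.wrap(footer).order(ByteOrder.LITTLE_ENDIAN);
    Map<Integer, Integer> res = new LinkedHashMap<>();
    for (int fieldId : fieldIds) {
      int offset;
      if (offsetSize == 1)
        offset = buf.get() & 0xFF;        // unsigned byte
      else if (offsetSize == 2)
        offset = buf.getShort() & 0xFFFF; // unsigned short
      else
        offset = buf.getInt();            // assumed to fit into a Java int
      res.put(fieldId, offset);
    }
    return res;
  }
}
```

In a real client, a missing schema would trigger a request to the server instead of an exception; that exchange is outside the scope of this sketch.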

== Raw data offset

Optional field. It is present only if the object has fields that were written in raw mode. In this case, the HAS_RAW_DATA flag is set and the raw data offset field is stored as a 4-byte little-endian value. It holds the offset of the raw data in the complex object, counted from the very first byte of the header (i.e. the value is always greater than the header length).

This field is used to position the stream at the point where the user starts reading in raw mode.
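
As an illustration of how a reader might turn this field into a stream position, here is a minimal sketch. `RawOffsetSketch`, `rawDataPosition` and both position parameters are hypothetical: the caller is assumed to know where the object starts in the buffer and where the raw data offset field is stored.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class RawOffsetSketch {
  /**
   * Returns the absolute position of the raw data inside buf.
   * objStart is the position of the object's first header byte,
   * rawOffsetFieldPos is the position of the 4-byte raw data offset field
   * (both are assumptions supplied by the caller).
   */
  static int rawDataPosition(byte[] buf, int objStart, int rawOffsetFieldPos) {
    int rawOffset = ByteBuffer.wrap(buf)
        .order(ByteOrder.LITTLE_ENDIAN)
        .getInt(rawOffsetFieldPos); // absolute read, does not move the buffer position

    int pos = objStart + rawOffset; // the offset is relative to the header start
    if (rawOffset < 0 || pos < 0 || pos > buf.length)
      throw new IllegalStateException("Raw data offset out of range: " + rawOffset);
    return pos;
  }
}
```

The returned position is where raw-mode reading would begin; everything between the header and that position belongs to the regular fields.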

== Special types

=== Wrapped Data

Type code: 27;

One or more binary objects can be wrapped in an array. This allows reading, storing, passing and writing objects efficiently without understanding their contents, by performing a simple byte copy.
All cache operations return complex objects inside a wrapper (but not primitives).

Structure:

[{table_opts}]
|===
|Field |   Size in bytes |    Description
|`length`|  4|   Signed integer number stored in little-endian. Size of the wrapped data in bytes.
|`payload`| `length`|  Payload.
|`offset`|  4|   Signed integer number stored in little-endian. Offset of the object within the array. The array can contain an object graph; this offset points to the root object.

|===
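
The layout above lends itself to a plain byte copy. The following sketch wraps and unwraps a payload without interpreting it; `WrappedDataSketch`, `wrap`, `unwrap` and the `Wrapped` holder are hypothetical names, and the payload is treated as opaque bytes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class WrappedDataSketch {
  static final byte TYPE_WRAPPED_DATA = 27;

  /** Copied payload plus the offset of the root object inside it. */
  static final class Wrapped {
    final byte[] payload;
    final int rootOffset;

    Wrapped(byte[] payload, int rootOffset) {
      this.payload = payload;
      this.rootOffset = rootOffset;
    }
  }

  /** Writes type code, length, payload and offset, all integers in little-endian. */
  static byte[] wrap(byte[] payload, int rootOffset) {
    return ByteBuffer.allocate(1 + 4 + payload.length + 4)
        .order(ByteOrder.LITTLE_ENDIAN)
        .put(TYPE_WRAPPED_DATA)
        .putInt(payload.length)
        .put(payload)
        .putInt(rootOffset)
        .array();
  }

  /** Reads the wrapper back without looking into the payload. */
  static Wrapped unwrap(byte[] data) {
    ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
    if (buf.get() != TYPE_WRAPPED_DATA)
      throw new IllegalArgumentException("Not a wrapped data object");

    int len = buf.getInt();
    if (len < 0 || len > buf.remaining() - 4)
      throw new IllegalArgumentException("Invalid payload length: " + len);

    byte[] payload = new byte[len];
    buf.get(payload);            // simple byte copy
    int rootOffset = buf.getInt();
    return new Wrapped(payload, rootOffset);
  }
}
```

A reader that does need the contents would start parsing the payload at `rootOffset` rather than at its first byte.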

=== Binary enum

Type code: 38;

Wrapped enumerable type. The engine can return this type in place of an ordinary enum type. Enums should be written in this form when the Binary API is used.

Structure:

[{table_opts}]
|===
|Field |  Size in bytes  |  Description
|`type_id`| 4|   Signed integer number stored in little-endian. See <<Type ID>> for details.
|`ordinal`| 4|   Signed integer number stored in little-endian. Ordinal of the enumeration value: its position in the enum declaration, where the initial constant is assigned an ordinal of zero.

|===
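
As a rough illustration of this layout, the sketch below writes and reads a binary enum with a Java `Enum`. `BinaryEnumSketch`, `write` and `read` are hypothetical names, the type ID is passed in by the caller instead of being computed, and the reader does not validate it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class BinaryEnumSketch {
  static final byte TYPE_BINARY_ENUM = 38;

  /** Writes type code, type_id and ordinal; typeId is supplied by the caller. */
  static byte[] write(int typeId, Enum<?> value) {
    return ByteBuffer.allocate(1 + 4 + 4)
        .order(ByteOrder.LITTLE_ENDIAN)
        .put(TYPE_BINARY_ENUM)
        .putInt(typeId)
        .putInt(value.ordinal())
        .array();
  }

  /** Reads a binary enum and maps the ordinal to a constant of enumClass. */
  static <E extends Enum<E>> E read(byte[] data, Class<E> enumClass) {
    ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
    if (buf.get() != TYPE_BINARY_ENUM)
      throw new IllegalArgumentException("Not a binary enum");

    buf.getInt(); // type_id, not validated in this sketch
    int ordinal = buf.getInt();

    E[] constants = enumClass.getEnumConstants();
    if (ordinal < 0 || ordinal >= constants.length)
      throw new IllegalArgumentException("Unknown ordinal: " + ordinal);
    return constants[ordinal];
  }
}
```

Because only the ordinal travels over the wire, both sides have to agree on the order of the enum constants.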

== Serialization and Deserialization examples

=== Reading objects

The code template below shows how to read data of various types from an input byte stream:

[source, java]
----
private static Object readDataObject(DataInputStream in) throws IOException {
  byte code = in.readByte();

  switch (code) {
    case 1: // byte
      return in.readByte();
    case 2: // short
      return readShortLittleEndian(in);
    case 3: // int
      return readIntLittleEndian(in);
    case 4: // long
      return readLongLittleEndian(in);
    case 27: { // wrapped data
      int len = readIntLittleEndian(in);
      // Assume 0 offset for simplicity
      Object res = readDataObject(in);
      int offset = readIntLittleEndian(in);
      return res;
    }
    case 103: { // complex object
      byte ver = in.readByte();
      assert ver == 1; // version
      short flags = readShortLittleEndian(in);
      int typeId = readIntLittleEndian(in);
      int hash = readIntLittleEndian(in);
      int len = readIntLittleEndian(in);
      int schemaId = readIntLittleEndian(in);
      int schemaOffset = readIntLittleEndian(in);
      // The 24 header bytes (including the type code) have already been consumed
      byte[] data = new byte[len - 24];
      in.readFully(data); // read() may return fewer bytes than requested
      return "Binary Object: " + typeId;
    }
    default:
      throw new Error("Unsupported type: " + code);
  }
}
----

=== Int

The following code snippet shows how to write and read a data object of type int, using socket-based output and input streams.

[source, java]
----
// Write int data object
DataOutputStream out = new DataOutputStream(socket.getOutputStream());

int val = 11;
writeByteLittleEndian(3, out);  // Integer type code
writeIntLittleEndian(val, out);

// Read int data object
DataInputStream in = new DataInputStream(socket.getInputStream());
int typeCode = readByteLittleEndian(in);
int readVal = readIntLittleEndian(in); // separate name, `val` is already declared above
----

Refer to the link:example[example section] for the implementation of the `write...()` and `read...()` methods shown above.
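
For orientation only, here is one way such helpers might look. This is an assumption, not the implementation from the example section: the class name `LittleEndianSketch` is hypothetical, and the sketch relies on the fact that `DataOutputStream` and `DataInputStream` work in big-endian, so the bytes are reversed on the way in and out.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

final class LittleEndianSketch {
  // A single byte has no byte order; the name only mirrors the other helpers.
  static void writeByteLittleEndian(int v, DataOutputStream out) throws IOException {
    out.writeByte(v);
  }

  static byte readByteLittleEndian(DataInputStream in) throws IOException {
    return in.readByte();
  }

  // DataOutputStream writes big-endian, so reverse the bytes first.
  static void writeIntLittleEndian(int v, DataOutputStream out) throws IOException {
    out.writeInt(Integer.reverseBytes(v));
  }

  // DataInputStream reads big-endian, so reverse the bytes afterwards.
  static int readIntLittleEndian(DataInputStream in) throws IOException {
    return Integer.reverseBytes(in.readInt());
  }

  static short readShortLittleEndian(DataInputStream in) throws IOException {
    return Short.reverseBytes(in.readShort());
  }

  static long readLongLittleEndian(DataInputStream in) throws IOException {
    return Long.reverseBytes(in.readLong());
  }
}
```

With these helpers, the int value 11 from the snippet above would be sent as the type code byte 3 followed by the bytes 11, 0, 0, 0.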

As another example, for the String type, the structure would be:

[cols="1,2",opts="header"]
|===
|Type |    Description
|byte |    String type code, 9.
|int | String length in UTF-8 bytes.
|bytes |   Actual string, encoded in UTF-8.
|===

=== String

The code snippet below shows how to write and read a String value following this format:

[source, java]
----
private static void writeString (String str, DataOutputStream out) throws IOException {
  writeByteLittleEndian(9, out); // type code for String

  byte[] bytes = str.getBytes("UTF-8"); // encode once, reuse for length and payload
  writeIntLittleEndian(bytes.length, out); // length of the string in UTF-8 bytes

  out.write(bytes); // writeBytes(str) would drop the high byte of every char
}

private static String readString(DataInputStream in) throws IOException {
  int type = readByteLittleEndian(in); // type code

  int strLen = readIntLittleEndian(in); // length of the string

  byte[] buf = new byte[strLen];

  readFully(in, buf, 0, strLen);

  return new String(buf, "UTF-8"); // decode with the same charset used for writing
}
----