// Licensed to the Apache Software Foundation (ASF) under one or more
// contributor license agreements. See the NOTICE file distributed with
// this work for additional information regarding copyright ownership.
// The ASF licenses this file to You under the Apache License, Version 2.0
// (the "License"); you may not use this file except in compliance with
// the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
= Data Format

Standard data types are represented as a combination of a type code and a value.

:table_opts: cols="1,1,4",opts="header"

[{table_opts}]
|===
|Field | Size in bytes | Description
|`type_code` | 1 | Signed one-byte integer code that indicates the type of the value.
|`value` | Variable| The value itself. Its format and size depend on the `type_code`.
|===

The supported types and their formats are described below.

== Primitives

Primitives are the most basic types, such as numbers.

=== Byte

Type code: 1;

Single byte value.

Structure:

[{table_opts}]
|===
|Field | Size in bytes | Description
|`value` | 1 | The value.
|===

=== Short

Type code: 2;

2-byte signed integer. Little-endian.

Structure:

[{table_opts}]
|===
|Field | Size in bytes | Description
|`value` | 2| The value.
|===

=== Int

Type code: 3;

4-byte signed integer. Little-endian.

Structure:

[{table_opts}]
|===
|Field| Size in bytes| Description
|`value`| 4| The value.
|===
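
The following minimal Java sketch writes the integer primitives above as full values: a type code followed by a little-endian payload. It assumes messages are built with `java.nio.ByteBuffer`. The `PrimitiveWriter` class and its methods are hypothetical helpers, not part of any Ignite API.

[source, java]
----
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper: writes primitive full values (type code + little-endian payload). */
final class PrimitiveWriter {
    /** Short: type code 2, then 2 little-endian bytes. */
    static byte[] writeShort(short value) {
        ByteBuffer buf = ByteBuffer.allocate(1 + 2).order(ByteOrder.LITTLE_ENDIAN);
        buf.put((byte) 2);
        buf.putShort(value);
        return buf.array();
    }

    /** Int: type code 3, then 4 little-endian bytes. */
    static byte[] writeInt(int value) {
        ByteBuffer buf = ByteBuffer.allocate(1 + 4).order(ByteOrder.LITTLE_ENDIAN);
        buf.put((byte) 3);
        buf.putInt(value);
        return buf.array();
    }

    /** Long: type code 4, then 8 little-endian bytes. */
    static byte[] writeLong(long value) {
        ByteBuffer buf = ByteBuffer.allocate(1 + 8).order(ByteOrder.LITTLE_ENDIAN);
        buf.put((byte) 4);
        buf.putLong(value);
        return buf.array();
    }
}
----

For example, `PrimitiveWriter.writeInt(0x01020304)` produces the bytes `03 04 03 02 01`: the type code first, then the least significant byte of the value.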

=== Long

Type code: 4;

8-byte signed integer. Little-endian.

Structure:

[{table_opts}]
|===
|Field| Size in bytes | Description
|`value` | 8 | The value.
|===

=== Float

Type code: 5;

4-byte IEEE 754 floating-point number (single precision). Little-endian.

Structure:

[{table_opts}]
|===
|Field | Size in bytes| Description
|value| 4| The value.
|===

=== Double

Type code: 6;

8-byte IEEE 754 floating-point number (double precision). Little-endian.

Structure:

[{table_opts}]
|===
|Field| Size in bytes| Description
|value | 8| The value.
|===

=== Char

Type code: 7;

Single UTF-16 code unit. Little-endian.

Structure:

[{table_opts}]
|===
|Field| Size in bytes| Description
|value | 2 | The UTF-16 code unit in little-endian.
|===

=== Bool

Type code: 8;

Boolean value. Zero for false and non-zero for true.

Structure:

[{table_opts}]
|===
|Field | Size in bytes | Description
|value | 1 | The value. Zero for false and non-zero for true.
|===

=== NULL

Type code: 101;

This is not exactly a type. It is a null value, which can be assigned to an object of any type.
It has no payload and consists only of the type code.

== Standard objects

=== String

Type code: 9;

String in UTF-8 encoding. Should always be a valid UTF-8 string.

Structure:

[{table_opts}]
|===
|Field | Size in bytes | Description
|length| 4| Signed integer number in little-endian. Length of the string in UTF-8 code units, i.e. in bytes.
|data | length | String data in UTF-8 encoding, without BOM.
|===
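
The following minimal sketch writes and reads a String full value, assuming the data is held in a `java.nio.ByteBuffer`. `StringCodec` is a hypothetical name. The length prefix counts UTF-8 bytes, not characters, so a non-ASCII character such as `é` contributes two bytes.

[source, java]
----
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for the String full value (type code 9). */
final class StringCodec {
    /** Writes type code 9, the UTF-8 byte length (little-endian), then the bytes without BOM. */
    static byte[] write(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(1 + 4 + utf8.length).order(ByteOrder.LITTLE_ENDIAN);
        buf.put((byte) 9);
        buf.putInt(utf8.length); // length in bytes, not in characters
        buf.put(utf8);
        return buf.array();
    }

    /** Reads a String full value; buf must be positioned at the type code. Sets buf to little-endian. */
    static String read(ByteBuffer buf) {
        buf.order(ByteOrder.LITTLE_ENDIAN);
        byte code = buf.get();
        if (code == 101)
            return null; // NULL: type code only, no payload
        if (code != 9)
            throw new IllegalArgumentException("Not a string: type code " + code);
        int len = buf.getInt();
        byte[] utf8 = new byte[len];
        buf.get(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }
}
----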

=== UUID (Guid)

Type code: 10;

A universally unique identifier (UUID) is a 128-bit number used to identify information in computer systems.

Structure:

[{table_opts}]
|===
|Field| Size in bytes| Description
|most_significant_bits| 8| 64-bit number in little-endian, representing the 64 most significant bits of the UUID.
|least_significant_bits| 8| 64-bit number in little-endian, representing the 64 least significant bits of the UUID.
|===
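
The following sketch encodes a `java.util.UUID` as described above. `UuidCodec` is a hypothetical name. Each 64-bit half is little-endian on its own, so the resulting byte sequence differs from the big-endian order commonly used when UUIDs are printed or stored elsewhere.

[source, java]
----
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.UUID;

/** Hypothetical helper for the UUID full value (type code 10). */
final class UuidCodec {
    static byte[] write(UUID id) {
        ByteBuffer buf = ByteBuffer.allocate(1 + 8 + 8).order(ByteOrder.LITTLE_ENDIAN);
        buf.put((byte) 10);
        buf.putLong(id.getMostSignificantBits());  // most significant 64 bits, little-endian
        buf.putLong(id.getLeastSignificantBits()); // least significant 64 bits, little-endian
        return buf.array();
    }

    /** Reads a UUID full value; buf must be positioned at the type code. */
    static UUID read(ByteBuffer buf) {
        buf.order(ByteOrder.LITTLE_ENDIAN);
        if (buf.get() != 10)
            throw new IllegalArgumentException("Not a UUID");
        long msb = buf.getLong();
        long lsb = buf.getLong();
        return new UUID(msb, lsb);
    }
}
----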

=== Timestamp

Type code: 33;

More precise than the Date type. In addition to the milliseconds since the epoch, it contains the nanosecond fraction of the last millisecond, which is in the range from 0 to 999999. The full timestamp in nanoseconds can therefore be obtained with the following expression: `msecs_since_epoch \* 1000000 + msec_fraction_in_nsecs`.

NOTE: The nanosecond timestamp expression is provided for clarification purposes only. Do not use it in production code, as in some languages it may cause integer overflow.

Structure:

[{table_opts}]
|===
|Field| Size in bytes | Description
|`msecs_since_epoch`| 8| Signed integer number in little-endian. Number of milliseconds elapsed since 00:00:00 1 Jan 1970 UTC. This format is widely known as Unix or POSIX time.
|`msec_fraction_in_nsecs`| 4| Signed integer number in little-endian. Nanosecond fraction of a millisecond.
|===
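
The following sketch converts between the two Timestamp fields and a platform time type without computing the overflow-prone nanosecond product. It assumes `java.time.Instant` as the platform type; `TimestampFields` is a hypothetical name.

[source, java]
----
import java.time.Instant;

/** Hypothetical holder for the two Timestamp payload fields. */
final class TimestampFields {
    final long msecsSinceEpoch;
    final int msecFractionInNsecs; // always in [0, 999999]

    TimestampFields(long msecsSinceEpoch, int msecFractionInNsecs) {
        this.msecsSinceEpoch = msecsSinceEpoch;
        this.msecFractionInNsecs = msecFractionInNsecs;
    }

    static TimestampFields fromInstant(Instant t) {
        // toEpochMilli() floors to whole milliseconds (also for instants before 1970);
        // the nanoseconds left over within that millisecond are getNano() % 1000000.
        return new TimestampFields(t.toEpochMilli(), t.getNano() % 1_000_000);
    }

    Instant toInstant() {
        // Adding the fraction to an Instant avoids msecs * 1000000, which can overflow a long.
        return Instant.ofEpochMilli(msecsSinceEpoch).plusNanos(msecFractionInNsecs);
    }
}
----

For example, an instant 1.123456789 seconds after the epoch is stored as `msecs_since_epoch = 1123` and `msec_fraction_in_nsecs = 456789`.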

=== Date

Type code: 11;

Date, represented as the number of milliseconds elapsed since 00:00:00 1 Jan 1970 UTC. This format is widely known as Unix or POSIX time.

Structure:

[{table_opts}]
|===
|Field| Size in bytes| Description
|`msecs_since_epoch`| 8| The value. Signed integer number in little-endian.
|===

=== Time

Type code: 36;

Time, represented as the number of milliseconds elapsed since midnight, i.e. 00:00:00 UTC.

Structure:

[{table_opts}]
|===
|Field| Size in bytes| Description
|value| 8| Signed integer number in little-endian. Number of milliseconds elapsed since 00:00:00 UTC.
|===

=== Decimal

Type code: 30;

Numeric value of arbitrary precision and scale.

Structure:

[{table_opts}]
|===
|Field | Size in bytes| Description
|scale| 4| Signed integer number in little-endian. The power of ten by which the unscaled value is divided. For example, 42 with scale 3 is 0.042, 42 with scale -3 is 42000, and 42 with scale 1 is 4.2.
|length| 4| Signed integer number in little-endian. Length of the `data` field in bytes.
|data| length| Unscaled value in big-endian format. The most significant bit of the first byte is the sign flag: if it is set to 1, the value is negative. The remaining bits form the absolute value (magnitude) of the unscaled value.
|===
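
The following sketch writes and reads the Decimal payload (without the type code), assuming `java.math.BigDecimal` as the platform type. It relies on `BigInteger.toByteArray()`, which returns a big-endian two's-complement array. For a non-negative magnitude, the top bit of the first byte is always clear, so that bit can carry the sign flag. `DecimalCodec` is a hypothetical name; this illustrates the layout in the table and is not Ignite's own implementation.

[source, java]
----
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper for the Decimal payload: scale, length, sign-and-magnitude data. */
final class DecimalCodec {
    static byte[] writePayload(BigDecimal value) {
        BigInteger unscaled = value.unscaledValue();
        boolean negative = unscaled.signum() < 0;
        // Big-endian magnitude whose first byte always has its top bit clear.
        byte[] magnitude = unscaled.abs().toByteArray();
        if (negative)
            magnitude[0] |= (byte) 0x80; // sign flag

        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + magnitude.length).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(value.scale());
        buf.putInt(magnitude.length);
        buf.put(magnitude);
        return buf.array();
    }

    static BigDecimal readPayload(ByteBuffer buf) {
        buf.order(ByteOrder.LITTLE_ENDIAN);
        int scale = buf.getInt();
        int length = buf.getInt();
        byte[] data = new byte[length];
        buf.get(data);
        boolean negative = (data[0] & 0x80) != 0;
        data[0] &= 0x7F; // clear the sign flag, leaving the big-endian magnitude
        BigInteger unscaled = new BigInteger(1, data);
        return new BigDecimal(negative ? unscaled.negate() : unscaled, scale);
    }
}
----

For example, `-0.042` has unscaled value -42 and scale 3, so its payload is scale `3`, length `1`, and the single data byte `0x80 | 42`.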

=== Enum

Type code: 28;

Value of an enumerable type. Such types define only a finite number of named values.

Structure:

[{table_opts}]
|===
|Field| Size in bytes| Description
|type_id| 4| Signed integer number in little-endian. See <<Type ID>> for details.
|ordinal| 4| Signed integer number in little-endian. Enumeration value ordinal: its position in the enum declaration, where the initial constant is assigned an ordinal of zero.
|===

== Arrays of primitives

Arrays of this kind contain only value payloads as elements. Note that the elements carry no type codes. All of them share a similar format, described in the table below.

[{table_opts}]
|===
|Field| Size in bytes| Description
|`length`| 4| Signed integer number. Number of elements in the array.
|`element_0_payload`| Depends on the type.| Payload of the value 0.
|`element_1_payload`| Depends on the type.| Payload of the value 1.
|... |... |...
|`element_N_payload`| Depends on the type. | Payload of the value N.
|===
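
The following sketch writes an Int array (type code 14, described below) to show the common layout. There is one type code for the whole array, a count of elements, and then bare 4-byte payloads. It assumes the length is little-endian like the other integers in this format. `IntArrayWriter` is a hypothetical name.

[source, java]
----
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper: writes an Int array, payloads only, no per-element type codes. */
final class IntArrayWriter {
    static byte[] write(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(1 + 4 + values.length * 4).order(ByteOrder.LITTLE_ENDIAN);
        buf.put((byte) 14);        // type code of the array itself
        buf.putInt(values.length); // number of elements, not bytes
        for (int v : values)
            buf.putInt(v);         // each element is a bare 4-byte payload
        return buf.array();
    }
}
----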

=== Byte array

Type code: 12;

Array of bytes. May be either a piece of raw data or an array of small signed integer numbers.

Structure:

[{table_opts}]
|===
|Field| Size in bytes| Description
|length| 4| Signed integer number. Number of elements in the array.
|elements| length| Elements sequence. Every element is a payload of type "byte".
|===

=== Short array

Type code: 13;

Array of short signed integer numbers.

Structure:

[{table_opts}]
|===
|Field | Size in bytes| Description
|length| 4| Signed integer number. Number of elements in the array.
|elements| `length * 2`| Elements sequence. Every element is a payload of type "short".
|===

=== Int array

Type code: 14;

Array of signed integer numbers.

Structure:

[{table_opts}]
|===
|Field| Size in bytes| Description
|length| 4| Signed integer number. Number of elements in the array.
|elements| `length * 4`| Elements sequence. Every element is a payload of type "int".
|===

=== Long array

Type code: 15;

Array of long signed integer numbers.

Structure:

[{table_opts}]
|===
|Field| Size in bytes| Description
|length| 4| Signed integer number. Number of elements in the array.
|elements| `length * 8`| Elements sequence. Every element is a payload of type "long".
|===

=== Float array

Type code: 16;

Array of floating-point numbers.

Structure:

[{table_opts}]
|===
|Field| Size in bytes| Description
|length| 4| Signed integer number. Number of elements in the array.
|elements| `length * 4` | Elements sequence. Every element is a payload of type "float".
|===

=== Double array

Type code: 17;

Array of double-precision floating-point numbers.

Structure:

[{table_opts}]
|===
|Field| Size in bytes | Description
|length| 4| Signed integer number. Number of elements in the array.
|elements| `length * 8`| Elements sequence. Every element is a payload of type "double".
|===

=== Char array

Type code: 18;

Array of UTF-16 code units. Unlike a string, this type does not necessarily contain valid UTF-16 text.

Structure:

[{table_opts}]
|===
|Field | Size in bytes| Description
|length | 4| Signed integer number. Number of elements in the array.
|elements| `length * 2`| Elements sequence. Every element is a payload of type "char".
|===

=== Bool array

Type code: 19;

Array of boolean values.

Structure:

[{table_opts}]
|===
|Field| Size in bytes | Description
|length| 4| Signed integer number. Number of elements in the array.
|elements| length| Elements sequence. Every element is a payload of type "bool".
|===

== Arrays of standard objects

Arrays of this kind contain full values as elements: every element consists of a type code and a payload. This format allows elements of such collections to be NULL values, which is why they are called "objects". All of them share a similar format, described in the table below.

[{table_opts}]
|===
|Field| Size in bytes| Description
|`length` | 4| Signed integer number. Number of elements in the array.
|`element_0_full_value`| Depends on value type.| Full value of the element 0. Consists of a type code and a payload. Can also be NULL.
|`element_1_full_value`| Depends on value type.| Full value of the element 1 or NULL.
|... |...| ...
|`element_N_full_value`| Depends on value type.| Full value of the element N or NULL.
|===

=== String array

Type code: 20;

Array of UTF-8 string values.

Structure:

[{table_opts}]
|===
|Field | Size in bytes| Description
|length| 4| Signed integer number. Number of elements in the array.
|elements| Variable. Depends on every string length. Every element size is either `5 + value_length` for string, or 1 for `NULL`.| Elements sequence. Every element is a full value of type "string", including type code, or `NULL`.
|===
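
The following sketch shows how arrays of standard objects differ from primitive arrays. Every element of a String array is written as a full value, so a null element takes a single byte (type code 101). It assumes little-endian lengths; `StringArrayWriter` is a hypothetical name.

[source, java]
----
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: writes a String array (type code 20) whose elements are full values. */
final class StringArrayWriter {
    static byte[] write(String[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(20);                         // type code of the array
        writeBytes(out, intLe(values.length)); // number of elements
        for (String s : values) {
            if (s == null) {
                out.write(101);                // NULL element: type code only, 1 byte
            } else {
                byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
                out.write(9);                  // String type code
                writeBytes(out, intLe(utf8.length));
                writeBytes(out, utf8);         // 5 + utf8.length bytes in total
            }
        }
        return out.toByteArray();
    }

    private static byte[] intLe(int v) {
        return ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(v).array();
    }

    private static void writeBytes(ByteArrayOutputStream out, byte[] b) {
        out.write(b, 0, b.length); // this overload does not throw IOException
    }
}
----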

=== UUID (Guid) array

Type code: 21;

Array of UUIDs (Guids).

Structure:

[{table_opts}]
|===
|Field| Size in bytes| Description
|length| 4| Signed integer number. Number of elements in the array.
|elements| Variable. Every element size is either 17 for UUID, or 1 for NULL.| Elements sequence. Every element is a full value of type "UUID", including type code, or NULL.
|===

=== Timestamp array

Type code: 34;

Array of timestamp values.

Structure:

[{table_opts}]
|===
|Field| Size in bytes | Description
|length| 4| Signed integer number. Number of elements in the array.
|elements| Variable. Every element size is either 13 for Timestamp, or 1 for NULL.| Elements sequence. Every element is a full value of type "timestamp", including type code, or NULL.
|===

=== Date array

Type code: 22;

Array of dates.

Structure:

[{table_opts}]
|===
|Field| Size in bytes| Description
|length| 4| Signed integer number. Number of elements in the array.
|elements| Variable. Every element size is either 9 for Date, or 1 for NULL.| Elements sequence. Every element is a full value of type "date", including type code, or NULL.
|===

=== Time array

Type code: 37;

Array of time values.

Structure:

[{table_opts}]
|===
|Field | Size in bytes| Description
|length| 4| Signed integer number. Number of elements in the array.
|elements | Variable. Every element size is either 9 for Time, or 1 for NULL.| Elements sequence. Every element is a full value of type "time", including type code, or NULL.
|===

=== Decimal array

Type code: 31;

Array of decimal values.

Structure:

[{table_opts}]
|===
|Field| Size in bytes| Description
|length| 4| Signed integer number. Number of elements in the array.
|elements| Variable. Every element size is either `9 + value_length` for Decimal, or 1 for NULL.| Elements sequence. Every element is a full value of type "decimal", including type code, or NULL.
|===

== Object collections

=== Object array

Type code: 23;

Array of objects of any type. This includes standard objects of any type, complex objects of various types, NULL values, and any combination of them. This also means that collections may contain other collections.

Structure:

[{table_opts}]
|===
|Field| Size in bytes| Description
|type_id |4| Type identifier of the contained objects. For example, in Java it is used to deserialize to a `Type[]`, so all values in the array must have `Type` as a parent. The "root" object type, which is the parent of any object type (for example, `java.lang.Object` in Java), has type ID -1. See <<Type ID>> for details.
|length| 4| Signed integer number. Number of elements in the array.
|elements| Variable. Depends on sizes of the objects.| Elements sequence. Every element is a full value of any type or NULL.
|===

=== Collection

Type code: 24;

General collection type. Like an object array, it contains objects. Unlike an array, it carries a hint for deserialization to a platform-specific collection of a certain type. The following collection types are defined:

* `USER_SET` = -1. A general set type that cannot be mapped to a more specific set type. It is still known to be a set, so it makes sense to deserialize such a collection to the basic and most widely used set-like type on your platform, e.g. a hash set.
* `USER_COL` = 0. A general collection type that cannot be mapped to a more specific collection type. It makes sense to deserialize such a collection to the basic and most widely used collection type on your platform, e.g. a resizable array.
* `ARR_LIST` = 1. A resizable array type.
* `LINKED_LIST` = 2. A linked list type.
* `HASH_SET` = 3. A basic hash set type.
* `LINKED_HASH_SET` = 4. A hash set type that maintains element order.
* `SINGLETON_LIST` = 5. A collection that contains only a single element but behaves as a collection. Platforms can use it for optimization purposes. If not applicable, any collection type can be used.

[NOTE]
====
A platform uses the collection type byte as a hint to deserialize a collection to the most suitable type. For example, in Java `HASH_SET` is deserialized to `java.util.HashSet`, while `LINKED_HASH_SET` is deserialized to `java.util.LinkedHashSet`. Thin client implementations are recommended to use the most suitable collection type on serialization and deserialization. Still, it is only a hint, which can be ignored if it is not relevant or not applicable for the platform.
====

Structure:

[{table_opts}]
|===
|Field| Size in bytes| Description
|length| 4| Signed integer number. Number of elements in the collection.
|type| 1| Type of the collection. See description for details.
|elements| Variable. Depends on sizes of the objects.| Elements sequence. Every element is a full value of any type or NULL.
|===

=== Map

Type code: 25;

Map-like collection type. Contains pairs of key and value objects. Both keys and values can be objects of various types, including standard objects, complex objects, and any combination of them. Like a collection, it carries a hint for deserialization to a map of a certain type. The following map types are defined:

* `HASH_MAP` = 1. A basic hash map.
* `LINKED_HASH_MAP` = 2. A hash map that maintains element order.

[NOTE]
====
A platform uses the map type byte as a hint to deserialize a map to the most suitable type. Thin client implementations are recommended to use the most suitable map type on serialization and deserialization. Still, it is only a hint, which can be ignored if it is not relevant or not applicable for the platform.
====

Structure:

[{table_opts}]
|===
|Field| Size in bytes| Description
|length| 4| Signed integer number. Number of key-value pairs in the map.
|type| 1| Type of the map. See description for details.
|elements| Variable. Depends on sizes of the objects.| Elements sequence. Keys and values follow one another in pairs. Every element is a full value of any type or NULL.
|===
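
The following sketch writes a map of `Integer` keys and values with the `LINKED_HASH_MAP` hint. The count is the number of pairs, and keys and values alternate as full values, so a null value is a single `101` byte. It assumes little-endian integers; `IntMapWriter` is a hypothetical name.

[source, java]
----
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: writes a Map (type code 25) of Integer keys and values. */
final class IntMapWriter {
    static final byte LINKED_HASH_MAP = 2;

    static byte[] write(LinkedHashMap<Integer, Integer> map) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(25);               // Map type code
        writeIntLe(out, map.size()); // number of key-value pairs
        out.write(LINKED_HASH_MAP);  // deserialization hint
        for (Map.Entry<Integer, Integer> e : map.entrySet()) {
            writeIntFullValue(out, e.getKey());   // key as a full value
            writeIntFullValue(out, e.getValue()); // value as a full value
        }
        return out.toByteArray();
    }

    private static void writeIntFullValue(ByteArrayOutputStream out, Integer v) {
        if (v == null) {
            out.write(101);          // NULL: type code only
            return;
        }
        out.write(3);                // Int type code
        writeIntLe(out, v);
    }

    private static void writeIntLe(ByteArrayOutputStream out, int v) {
        byte[] b = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(v).array();
        out.write(b, 0, b.length);
    }
}
----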

=== Enum array

Type code: 29;

Array of enumerable type values. Each element is either an enumerable value or NULL, so it occupies either 9 bytes or 1 byte.

Structure:

[{table_opts}]
|===
|Field| Size in bytes| Description
|type_id| 4| Type identifier of the contained objects. For example, in Java it is used to deserialize to an `EnumType[]`, so all values in the array must have `EnumType` as a parent. `EnumType` is the parent type of any enumerable object type. See <<Type ID>> for details.
|length| 4| Signed integer number. Number of elements in the array.
|elements| Variable. Depends on sizes of the objects. | Elements sequence. Every element is a full value of enum type or NULL.
|===

== Complex object

Type code: 103;

A complex object consists of a 24-byte header, a set of fields (data objects), and a schema (field IDs and positions). Depending on the operation and your data model, a data object can be of a primitive type or a complex type (set of fields).

Structure:

[{table_opts}]
|===
|Field | Size in bytes| Optionality
|`version`| 1| Mandatory
|`flags`| 2| Mandatory
|`type_id`| 4| Mandatory
|`hash_code`| 4| Mandatory
|`length`| 4| Mandatory
|`schema_id`| 4| Mandatory
|`schema_offset`| 4| Mandatory
|`object_fields`| Variable length| Optional
|`schema`| Variable length| Optional
|`raw_data_offset`| 4| Optional
|===

== Version

This field indicates the complex object layout version. It is needed for backward compatibility. Clients should check this field and report an error to the user if the object layout version is unknown to them. This prevents data corruption and unpredictable deserialization results.

== Flags

This field is a 16-bit little-endian bitmask. It contains object flags, which tell the reader how to handle the object instance. The following flags are defined:

* `USER_TYPE = 0x0001` - Indicates that the type is a user type. Should always be set for any client type. Can be ignored on deserialization.
* `HAS_SCHEMA = 0x0002` - Indicates that the object layout contains a schema in the footer. See <<Schema>> for details.
* `HAS_RAW_DATA = 0x0004` - Indicates that the object has raw data. See <<Raw data offset>> for details.
* `OFFSET_ONE_BYTE = 0x0008` - Indicates that schema field offsets are one byte long. See <<Schema>> for details.
* `OFFSET_TWO_BYTES = 0x0010` - Indicates that schema field offsets are two bytes long. See <<Schema>> for details.
* `COMPACT_FOOTER = 0x0020` - Indicates that the footer does not contain field IDs, only offsets. See <<Schema>> for details.
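
The following sketch decodes the flags above. It also picks the schema field offset size using the precedence given in <<Schema>>: `OFFSET_ONE_BYTE` first, then `OFFSET_TWO_BYTES`, otherwise 4 bytes. `ObjectFlags` is a hypothetical name; the constants are the values listed above.

[source, java]
----
/** Hypothetical helper for the 16-bit complex object flags field. */
final class ObjectFlags {
    static final int USER_TYPE = 0x0001;
    static final int HAS_SCHEMA = 0x0002;
    static final int HAS_RAW_DATA = 0x0004;
    static final int OFFSET_ONE_BYTE = 0x0008;
    static final int OFFSET_TWO_BYTES = 0x0010;
    static final int COMPACT_FOOTER = 0x0020;

    static boolean isSet(short flags, int flag) {
        return (flags & flag) != 0;
    }

    /** Size in bytes of each schema field offset. */
    static int fieldOffsetSize(short flags) {
        if (isSet(flags, OFFSET_ONE_BYTE))
            return 1;
        if (isSet(flags, OFFSET_TWO_BYTES))
            return 2;
        return 4;
    }
}
----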

== Type ID

This field contains a unique type identifier. It is 4 bytes long and stored in little-endian. By default, the type ID is obtained as a Java-style hash code of the lowercase type name. The type ID calculation algorithm must be the same on all platforms in the cluster so that all of them can operate with objects of this type. The default algorithm, which is recommended for all thin clients, can be found below.

[tabs]
--

tab:Java[]
[source, java]
----
static int hashCode(String str) {
    int len = str.length();

    int h = 0;

    for (int i = 0; i < len; i++) {
        int c = str.charAt(i);

        // Type names are hashed case-insensitively.
        c = Character.toLowerCase(c);

        // int arithmetic wraps around on overflow, as intended.
        h = 31 * h + c;
    }

    return h;
}
----

tab:C[]

[source, c]
----
#include <stddef.h>
#include <stdint.h>

int32_t HashCode(const char* val, size_t size)
{
    if (!val || size == 0)
        return 0;

    /* Unsigned arithmetic reproduces Java's wrap-around without signed-overflow UB. */
    uint32_t hash = 0;

    for (size_t i = 0; i < size; ++i)
    {
        char c = val[i];

        /* ASCII-only lowercasing; matches the Java version for ASCII type names. */
        if ('A' <= c && c <= 'Z')
            c |= 0x20;

        hash = 31 * hash + (uint32_t)c;
    }

    return (int32_t)hash;
}
----

--

== Hash code

Hash code of the value. It is stored as a 4-byte little-endian value and calculated as a Java-style hash of the object contents without the header. The Ignite engine uses it for comparisons, for example to compare keys. The hash calculation algorithm can be found below.

[tabs]
--
tab:Java[]
[source, java]
----
static int dataHashCode(byte[] data) {
    int len = data.length;

    // Starts from 1, like java.util.Arrays.hashCode(byte[]) and the C version.
    int h = 1;

    for (int i = 0; i < len; i++)
        h = 31 * h + data[i];

    return h;
}
----
tab:C[]

[source, c]
----
#include <stddef.h>
#include <stdint.h>

int32_t GetDataHashCode(const void* data, size_t size)
{
    if (!data)
        return 0;

    /* Unsigned arithmetic avoids signed-overflow UB; bytes are signed, as in Java. */
    uint32_t hash = 1;
    const int8_t* bytes = (const int8_t*)data;

    for (size_t i = 0; i < size; ++i)
        hash = 31 * hash + (uint32_t)bytes[i];

    return (int32_t)hash;
}
----

--

== Length

This field contains the full length of the object, including the header. It is stored as a 4-byte little-endian integer. Using this field, you can skip the whole object by increasing the current data stream position by the value of this field.
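
The following sketch skips a complex object without parsing it. It assumes the buffer is positioned at the object's type code, and that `length` is counted from that byte. The field sits 12 bytes after the type code: type_code (1) + version (1) + flags (2) + type_id (4) + hash_code (4). `ComplexObjectSkipper` is a hypothetical name.

[source, java]
----
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper that skips a complex object (type code 103) without parsing its fields. */
final class ComplexObjectSkipper {
    // type_code(1) + version(1) + flags(2) + type_id(4) + hash_code(4)
    static final int LENGTH_POS = 12;

    /** Advances buf past the object that starts at its current position. */
    static void skip(ByteBuffer buf) {
        int start = buf.position();
        if (buf.get(start) != 103)
            throw new IllegalArgumentException("Not a complex object");
        int length = buf.order(ByteOrder.LITTLE_ENDIAN).getInt(start + LENGTH_POS);
        buf.position(start + length);
    }
}
----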

== Schema ID

Object schema identifier. It is stored as a 4-byte little-endian value and calculated as a hash of all object field IDs. It is used to reduce the size of complex objects. Ignite uses the schema ID to avoid writing the whole schema at the end of every complex object value. Instead, it stores all schemas in the binary metadata store and writes only field offsets to the object. This optimization significantly reduces the size of complex objects that contain many short fields (such as ints).

If the schema is missing (e.g. the whole object is written in raw mode or has no fields at all), the schema ID field is 0.

See <<Schema>> for details on the schema structure.

[NOTE]
====
The schema ID cannot be determined from the type ID. Objects of the same type (and thus with the same type ID) can have multiple schemas, i.e. field sequences.
====

The schema ID calculation algorithm can be found below:

[tabs]
--

tab:Java[]

[source, java]
----
/** FNV1 hash offset basis. */
private static final int FNV1_OFFSET_BASIS = 0x811C9DC5;

/** FNV1 hash prime. */
private static final int FNV1_PRIME = 0x01000193;

static int calculateSchemaId(int[] fieldIds)
{
    if (fieldIds == null || fieldIds.length == 0)
        return 0;

    int len = fieldIds.length;

    int schemaId = FNV1_OFFSET_BASIS;

    for (int i = 0; i < len; ++i)
    {
        int fieldId = fieldIds[i];

        // Mix in the field ID one byte at a time, lowest byte first.
        schemaId = schemaId ^ (fieldId & 0xFF);
        schemaId = schemaId * FNV1_PRIME;
        schemaId = schemaId ^ ((fieldId >> 8) & 0xFF);
        schemaId = schemaId * FNV1_PRIME;
        schemaId = schemaId ^ ((fieldId >> 16) & 0xFF);
        schemaId = schemaId * FNV1_PRIME;
        schemaId = schemaId ^ ((fieldId >> 24) & 0xFF);
        schemaId = schemaId * FNV1_PRIME;
    }

    return schemaId;
}
----


tab:C[]

[source, c]
----
#include <stddef.h>
#include <stdint.h>

/** FNV1 hash offset basis. */
#define FNV1_OFFSET_BASIS 0x811C9DC5u

/** FNV1 hash prime. */
#define FNV1_PRIME 0x01000193u

int32_t CalculateSchemaId(const int32_t* fieldIds, size_t num)
{
    if (!fieldIds || num == 0)
        return 0;

    /* Unsigned arithmetic avoids signed-overflow UB; the result matches the Java version. */
    uint32_t schemaId = FNV1_OFFSET_BASIS;

    for (size_t i = 0; i < num; ++i)
    {
        uint32_t fieldId = (uint32_t)fieldIds[i];

        schemaId ^= fieldId & 0xFF;
        schemaId *= FNV1_PRIME;
        schemaId ^= (fieldId >> 8) & 0xFF;
        schemaId *= FNV1_PRIME;
        schemaId ^= (fieldId >> 16) & 0xFF;
        schemaId *= FNV1_PRIME;
        schemaId ^= (fieldId >> 24) & 0xFF;
        schemaId *= FNV1_PRIME;
    }

    return (int32_t)schemaId;
}
----

--

== Object Fields

Object fields. Every field is a binary object and can be of either a complex or a standard type. Note that a complex object with no fields at all is a valid object and may be encountered. A field may or may not have a name. For named fields, an offset is written to the object schema, so they can be located without deserializing the whole object. Fields without a name are always stored after the named fields and are written in so-called "raw mode".

Thus, fields written in raw mode can only be read sequentially, in the same order in which they were written, while named fields can be read in any order.

== Schema

Object schema. A complex object may or may not have a schema, so this field is optional. The schema is absent if the object has no named fields, including the case when the object has no fields at all. Check the `HAS_SCHEMA` object flag to determine whether the object has a schema.

The main purpose of a schema is to allow fast lookup of object fields. For this purpose, the schema contains a sequence of offsets of the object fields in the object payload. Field offsets can be of different sizes. The offset size is chosen on write from the maximum offset value:

* If it is in the range [24..255] bytes, 1-byte offsets are used.
* If it is in the range [256..65535] bytes, 2-byte offsets are used.
* In all other cases, 4-byte offsets are used.

To determine the offset size on read, clients should check the `OFFSET_ONE_BYTE` and `OFFSET_TWO_BYTES` flags. If `OFFSET_ONE_BYTE` is set, offsets are 1 byte long. Otherwise, if `OFFSET_TWO_BYTES` is set, offsets are 2 bytes long. If neither is set, offsets are 4 bytes long.

Two schema formats are supported:

* Full schema approach - simpler to implement but uses more resources.
* Compact footer approach - harder to implement, but provides better performance and reduces memory consumption; thus it is recommended for new clients to implement this approach.

You can find more details on both formats below.

Note that clients should check the `COMPACT_FOOTER` flag to determine which approach is used in every specific object.

=== Full schema approach

When this approach is used, the `COMPACT_FOOTER` flag is not set and the whole object schema is written to the footer of the object. In this case, only the complex object itself is needed for deserialization. The `schema_id` field is ignored and no additional data is required. The structure of the schema field of the complex object in this case is shown below:

[cols="1,1,2",opts="header"]
|===
|Field | Size in bytes | Description
|`field_id_0`| 4| ID of the field with the index 0. 4-byte hash stored in little-endian. The field ID is calculated from the field name the same way as the <<Type ID>>.
|`field_offset_0`| Variable, depending on the size of the object: 1, 2 or 4. | Unsigned integer number stored in little-endian. Offset of the field in the object, starting from the very first byte of the full object value (i.e. the type_code position).
|`field_id_1`| 4| 4-byte hash stored in little-endian. ID of the field with the index 1.
|`field_offset_1` | Variable, depending on the size of the object: 1, 2 or 4.| Unsigned integer number stored in little-endian. Offset of the field in the object.
|...| ...| ...
|`field_id_N`| 4| 4-byte hash stored in little-endian. ID of the field with the index N.
|`field_offset_N`| Variable, depending on the size of the object: 1, 2 or 4. | Unsigned integer number stored in little-endian. Offset of the field in the object.
|===
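
The following sketch parses a full schema footer into a map from field ID to field offset. It assumes the caller already knows where the schema starts and ends, and passes in the offset size (1, 2 or 4) derived from the flags. Offsets of 1 and 2 bytes are read as unsigned values. `FullSchemaReader` is a hypothetical name.

[source, java]
----
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for a full (non-compact) schema footer. */
final class FullSchemaReader {
    /** Reads (field_id, field_offset) pairs from schemaStart (inclusive) to schemaEnd (exclusive). */
    static Map<Integer, Integer> read(ByteBuffer buf, int schemaStart, int schemaEnd, int offsetSize) {
        ByteBuffer b = buf.duplicate().order(ByteOrder.LITTLE_ENDIAN); // leaves the caller's position intact
        b.position(schemaStart);
        Map<Integer, Integer> offsets = new LinkedHashMap<>();
        while (b.position() < schemaEnd) {
            int fieldId = b.getInt();
            int offset;
            switch (offsetSize) {
                case 1: offset = b.get() & 0xFF; break;        // unsigned 1-byte offset
                case 2: offset = b.getShort() & 0xFFFF; break; // unsigned 2-byte offset
                default: offset = b.getInt(); break;           // 4-byte offset
            }
            offsets.put(fieldId, offset);
        }
        return offsets;
    }
}
----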

=== Compact footer approach

In this approach, the `COMPACT_FOOTER` flag is set and only the sequence of field offsets is written to the object footer. The client uses the `schema_id` field to look up the object schema in a previously populated metadata store. From that schema it learns the field order and can associate each field with its offset.

If this approach is used, the client needs to keep schemas in a special metadata store and send them to or retrieve them from Ignite servers. See link:check[Binary Types] for details.

The structure of the schema in this case is shown below:

[cols="1,1,2",opts="header"]
|===
|Field | Size in bytes | Description
|`field_offset_0` | Variable, depending on the size of the object: 1, 2 or 4. | Unsigned integer number stored in little-endian. Offset of the field 0 in the object, starting from the very first byte of the full object value (i.e. the type_code position).
|`field_offset_1`| Variable, depending on the size of the object: 1, 2 or 4. | Unsigned integer number stored in little-endian. Offset of the field 1 in the object.
|...| ...| ...
|`field_offset_N`| Variable, depending on the size of the object: 1, 2 or 4. | Unsigned integer number stored in little-endian. Offset of the field N in the object.
|===
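
The following sketch pairs compact-footer offsets with field IDs. It assumes `orderedFieldIds` is the schema the client previously stored for this object's `schema_id`, listed in the same order as the offsets in the footer. How that metadata store is organized is left open here, and `CompactSchemaReader` is a hypothetical name.

[source, java]
----
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for a compact footer: offsets only, field IDs come from stored metadata. */
final class CompactSchemaReader {
    static Map<Integer, Integer> read(ByteBuffer buf, int schemaStart, int offsetSize, int[] orderedFieldIds) {
        ByteBuffer b = buf.duplicate().order(ByteOrder.LITTLE_ENDIAN); // leaves the caller's position intact
        b.position(schemaStart);
        Map<Integer, Integer> offsets = new LinkedHashMap<>();
        for (int fieldId : orderedFieldIds) {
            int offset;
            if (offsetSize == 1)
                offset = b.get() & 0xFF;        // unsigned 1-byte offset
            else if (offsetSize == 2)
                offset = b.getShort() & 0xFFFF; // unsigned 2-byte offset
            else
                offset = b.getInt();            // 4-byte offset
            offsets.put(fieldId, offset);
        }
        return offsets;
    }
}
----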

== Raw data offset

Optional field. It is present only if the object has fields written in raw mode. In this case, the `HAS_RAW_DATA` flag is set and this field is stored as a 4-byte little-endian value. It holds the offset of the raw data in the complex object, counted from the very first byte of the header, so the value is always greater than the header length.

This field is used to position the stream so that the user can start reading in raw mode.

== Special types

=== Wrapped Data

Type code: 27;

One or more binary objects can be wrapped in an array. This allows reading, storing, passing, and writing objects efficiently, by a simple byte copy, without understanding their contents.
All cache operations return complex objects inside a wrapper (but not primitives).

Structure:

[{table_opts}]
|===
|Field | Size in bytes | Description
|length| 4| Signed integer number stored in little-endian. Size of the wrapped data in bytes.
|payload| length| Payload.
|offset| 4| Signed integer number stored in little-endian. Offset of the root object within the payload. The payload can contain an object graph, and this offset points to its root object.
|===
943
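The reading template later in this document assumes a root offset of 0. The sketch below honors the `offset` field instead: it reads the wrapper from a `java.nio.ByteBuffer` positioned right after the type code and returns the payload bytes starting at the root object. `WrappedData` is a hypothetical name. A real client may prefer to keep the whole payload, since the root object can refer to other objects in the same array.

[source, java]
----
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/** Hypothetical helper that unwraps a Wrapped Data value (type code 27). */
final class WrappedData {
    /** Expects buf to be positioned right after the type code. */
    static byte[] rootObjectBytes(ByteBuffer buf) {
        buf.order(ByteOrder.LITTLE_ENDIAN);
        int length = buf.getInt();   // size of the payload in bytes
        byte[] payload = new byte[length];
        buf.get(payload);            // the wrapped object graph
        int offset = buf.getInt();   // position of the root object in the payload
        if (offset < 0 || offset >= length)
            throw new IllegalArgumentException("Bad root object offset: " + offset);
        return Arrays.copyOfRange(payload, offset, length);
    }
}
----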
=== Binary enum

Type code: 38;

Wrapped enumeration type. The engine can return this type in place of an ordinary enum type. Enums should be written in this form when the Binary API is used.

Structure:

[{table_opts}]
|===
|Field | Size in bytes | Description
|type_id| 4| Signed integer number stored in little-endian. See <<Type ID>> for details.
|ordinal| 4| Signed integer number stored in little-endian. Ordinal of the enumeration value, i.e. its position in the enum declaration, where the initial constant has an ordinal of zero.
|===

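A minimal sketch of encoding and decoding this layout with `java.nio.ByteBuffer` is shown below. It relies on Java's `Enum.ordinal()`, which uses the same zero-based numbering as the `ordinal` field. The type ID is passed in rather than computed, because its calculation is described in <<Type ID>>. `BinaryEnumCodec` is a hypothetical name.

[source, java]
----
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helpers for the Binary enum layout (type code 38). */
final class BinaryEnumCodec {
    static final byte TYPE_CODE = 38;

    /** Encodes the type code, type_id and ordinal: 9 bytes in total. */
    static byte[] write(int typeId, Enum<?> value) {
        return ByteBuffer.allocate(1 + 4 + 4)
            .order(ByteOrder.LITTLE_ENDIAN)
            .put(TYPE_CODE)
            .putInt(typeId)
            .putInt(value.ordinal()) // zero-based position in the enum declaration
            .array();
    }

    /** Decodes the ordinal and maps it back to a constant of the given enum class. */
    static <E extends Enum<E>> E read(byte[] bytes, int expectedTypeId, Class<E> cls) {
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.get() != TYPE_CODE)
            throw new IllegalArgumentException("Not a binary enum");
        if (buf.getInt() != expectedTypeId)
            throw new IllegalArgumentException("Unexpected type_id");
        return cls.getEnumConstants()[buf.getInt()];
    }
}
----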
== Serialization and Deserialization examples

=== Reading objects

The code template below shows how to read data of various types from an input byte stream:

[source, java]
----
import java.io.DataInputStream;
import java.io.IOException;

// readShortLittleEndian(), readIntLittleEndian() and readLongLittleEndian()
// are helper methods; see the example section for their implementation.
private static Object readDataObject(DataInputStream in) throws IOException {
    byte code = in.readByte();

    switch (code) {
        case 1:
            return in.readByte();
        case 2:
            return readShortLittleEndian(in);
        case 3:
            return readIntLittleEndian(in);
        case 4:
            return readLongLittleEndian(in);
        case 27: { // Wrapped data
            int len = readIntLittleEndian(in);
            // Assume 0 offset for simplicity: the root object comes first in the payload.
            Object res = readDataObject(in);
            int offset = readIntLittleEndian(in); // consume the offset field
            return res;
        }
        case 103: { // Complex object
            byte ver = in.readByte();
            assert ver == 1; // version
            short flags = readShortLittleEndian(in);
            int typeId = readIntLittleEndian(in);
            int hash = readIntLittleEndian(in);
            int len = readIntLittleEndian(in);
            int schemaId = readIntLittleEndian(in);
            int schemaOffset = readIntLittleEndian(in);
            // len covers the whole object, including the 24-byte header read above.
            byte[] data = new byte[len - 24];
            in.readFully(data); // read() may return fewer bytes than requested
            return "Binary Object: " + typeId;
        }
        default:
            throw new IOException("Unsupported type code: " + code);
    }
}
----

=== Int

The following code snippet shows how to write and read a data object of type int, using socket-based output and input streams.

[source, java]
----
// Write int data object
DataOutputStream out = new DataOutputStream(socket.getOutputStream());

int val = 11;
writeByteLittleEndian(3, out); // Int type code
writeIntLittleEndian(val, out);
out.flush();

// Read int data object
DataInputStream in = new DataInputStream(socket.getInputStream());
int typeCode = readByteLittleEndian(in);
int readVal = readIntLittleEndian(in);
----

Refer to the link:example[example section] for the implementation of the `write...()` and `read...()` methods shown above.

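For readers who want to try the snippets before opening the example section, here is one possible implementation of these helpers; it is a sketch and may differ from the one in the example section. `DataInputStream` and `DataOutputStream` are big-endian, so the sketch reverses the byte order with the JDK `reverseBytes` methods. The `LittleEndianIO` class name is hypothetical.

[source, java]
----
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical holder for little-endian read/write helpers. */
final class LittleEndianIO {
    static void writeByteLittleEndian(int val, DataOutputStream out) throws IOException {
        out.writeByte(val); // a single byte has no byte order
    }

    static void writeIntLittleEndian(int val, DataOutputStream out) throws IOException {
        out.writeInt(Integer.reverseBytes(val)); // writeInt() is big-endian
    }

    static int readByteLittleEndian(DataInputStream in) throws IOException {
        return in.readByte();
    }

    static short readShortLittleEndian(DataInputStream in) throws IOException {
        return Short.reverseBytes(in.readShort());
    }

    static int readIntLittleEndian(DataInputStream in) throws IOException {
        return Integer.reverseBytes(in.readInt());
    }

    static long readLongLittleEndian(DataInputStream in) throws IOException {
        return Long.reverseBytes(in.readLong());
    }
}
----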
As another example, a value of the String type has the following structure:

[cols="1,2",opts="header"]
|===
|Type | Description
|byte | String type code, 9.
|int | String length in UTF-8 bytes, stored in little-endian.
|bytes | The string encoded in UTF-8.
|===

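To make the layout concrete, the minimal sketch below builds these bytes in memory with `java.nio.ByteBuffer` instead of writing them to a stream. The `StringLayout` class is hypothetical and only illustrates the table above.

[source, java]
----
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that lays out a String value as a byte array. */
final class StringLayout {
    static byte[] encode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(1 + 4 + utf8.length)
            .order(ByteOrder.LITTLE_ENDIAN)
            .put((byte) 9)       // String type code
            .putInt(utf8.length) // length in UTF-8 bytes, not in characters
            .put(utf8)           // the UTF-8 encoded string
            .array();
    }
}
----

For example, `encode("hé")` yields `09 03 00 00 00 68 C3 A9`: the length is 3 because `é` takes two bytes in UTF-8, even though the string has only two characters.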
=== String

The code snippet below shows how to write and read a String value in this format:

[source, java]
----
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

private static void writeString(String str, DataOutputStream out) throws IOException {
    writeByteLittleEndian(9, out); // type code for String

    byte[] bytes = str.getBytes(StandardCharsets.UTF_8);
    writeIntLittleEndian(bytes.length, out); // length of the string in UTF-8 bytes

    out.write(bytes); // writeBytes(str) would drop the high byte of each char
}

private static String readString(DataInputStream in) throws IOException {
    int type = readByteLittleEndian(in); // type code
    if (type != 9)
        throw new IOException("Expected String type code 9, got: " + type);

    int strLen = readIntLittleEndian(in); // length of the string in UTF-8 bytes

    byte[] buf = new byte[strLen];

    in.readFully(buf, 0, strLen);

    return new String(buf, StandardCharsets.UTF_8);
}
----