<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "https://forrest.apache.org/dtd/document-v20.dtd" [
<!ENTITY % avro-entities PUBLIC "-//Apache//ENTITIES Avro//EN"
"../../../../build/avro.ent">
%avro-entities;
]>
<document>
<header>
<title>Apache Avro™ &AvroVersion; Specification</title>
</header>
<body>
<section id="preamble">
<title>Introduction</title>
<p>This document defines Apache Avro. It is intended to be the
authoritative specification. Implementations of Avro must
adhere to this document.
</p>
</section>
<section id="schemas">
<title>Schema Declaration</title>
<p>A Schema is represented in <a href="ext:json">JSON</a> by one of:</p>
<ul>
<li>A JSON string, naming a defined type.</li>
<li>A JSON object, of the form:
<source>{"type": "<em>typeName</em>" ...<em>attributes</em>...}</source>
where <em>typeName</em> is either a primitive or derived
type name, as defined below. Attributes not defined in this
document are permitted as metadata, but must not affect
the format of serialized data.
</li>
<li>A JSON array, representing a union of embedded types.</li>
</ul>
<section id="schema_primitive">
<title>Primitive Types</title>
<p>The set of primitive type names is:</p>
<ul>
<li><code>null</code>: no value</li>
<li><code>boolean</code>: a binary value</li>
<li><code>int</code>: 32-bit signed integer</li>
<li><code>long</code>: 64-bit signed integer</li>
<li><code>float</code>: single precision (32-bit) IEEE 754 floating-point number</li>
<li><code>double</code>: double precision (64-bit) IEEE 754 floating-point number</li>
<li><code>bytes</code>: sequence of 8-bit unsigned bytes</li>
<li><code>string</code>: unicode character sequence</li>
</ul>
<p>Primitive types have no specified attributes.</p>
<p>Primitive type names are also defined type names. Thus, for
example, the schema "string" is equivalent to:</p>
<source>{"type": "string"}</source>
</section>
<section id="schema_complex">
<title>Complex Types</title>
<p>Avro supports six kinds of complex types: records, enums,
arrays, maps, unions and fixed.</p>
<section id="schema_record">
<title>Records</title>
<p>Records use the type name "record" and support the following attributes:</p>
<ul>
<li><code>name</code>: a JSON string providing the name
of the record (required).</li>
<li><code>namespace</code>: a JSON string that qualifies the name (optional).</li>
<li><code>doc</code>: a JSON string providing documentation to the
user of this schema (optional).</li>
<li><code>aliases</code>: a JSON array of strings, providing
alternate names for this record (optional).</li>
<li><code>fields</code>: a JSON array, listing fields (required).
Each field is a JSON object with the following attributes:
<ul>
<li><code>name</code>: a JSON string providing the name
of the field (required).</li>
<li><code>doc</code>: a JSON string describing this field
for users (optional).</li>
<li><code>type</code>: a <a href="#schemas">schema</a>, as defined above (required).</li>
<li><code>default</code>: a default value for this
field, used when reading instances that lack this
field (optional). Permitted values depend on the
field's schema type, according to the table below.
Default values for union fields correspond to the
first schema in the union. Default values for bytes
and fixed fields are JSON strings, where Unicode
code points 0-255 are mapped to unsigned 8-bit byte
values 0-255.
<table class="right">
<caption>field default values</caption>
<tr><th>avro type</th><th>json type</th><th>example</th></tr>
<tr><td>null</td><td>null</td><td>null</td></tr>
<tr><td>boolean</td><td>boolean</td><td>true</td></tr>
<tr><td>int,long</td><td>integer</td><td>1</td></tr>
<tr><td>float,double</td><td>number</td><td>1.1</td></tr>
<tr><td>bytes</td><td>string</td><td>"\u00FF"</td></tr>
<tr><td>string</td><td>string</td><td>"foo"</td></tr>
<tr><td>record</td><td>object</td><td>{"a": 1}</td></tr>
<tr><td>enum</td><td>string</td><td>"FOO"</td></tr>
<tr><td>array</td><td>array</td><td>[1]</td></tr>
<tr><td>map</td><td>object</td><td>{"a": 1}</td></tr>
<tr><td>fixed</td><td>string</td><td>"\u00ff"</td></tr>
</table>
</li>
<li><code>order</code>: specifies how this field
impacts sort ordering of this record (optional).
Valid values are "ascending" (the default),
"descending", or "ignore". For more details on how
this is used, see the <a href="#order">sort
order</a> section below.</li>
<li><code>aliases</code>: a JSON array of strings, providing
alternate names for this field (optional).</li>
</ul>
</li>
</ul>
<p>For example, a linked-list of 64-bit values may be defined with:</p>
<source>
{
"type": "record",
"name": "LongList",
"aliases": ["LinkedLongs"], // old name for this
"fields" : [
{"name": "value", "type": "long"}, // each element has a long
{"name": "next", "type": ["null", "LongList"]} // optional next element
]
}
</source>
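<p>The default-value rule for <code>bytes</code> and <code>fixed</code>
fields described above can be illustrated with a short Python sketch. It
assumes Python's <code>latin-1</code> codec, which maps code points 0-255
one-to-one onto byte values 0-255; the helper names are hypothetical and
not part of any Avro library.</p>

```python
def default_to_bytes(default: str) -> bytes:
    """Convert the JSON default string of a bytes/fixed field to raw bytes."""
    # latin-1 maps code points 0-255 directly onto byte values 0-255 and
    # raises UnicodeEncodeError for any code point above 255.
    return default.encode("latin-1")


def bytes_to_default(raw: bytes) -> str:
    """Inverse mapping: raw bytes back to the string used in the schema."""
    return raw.decode("latin-1")
```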
</section>
<section>
<title>Enums</title>
<p>Enums use the type name "enum" and support the following
attributes:</p>
<ul>
<li><code>name</code>: a JSON string providing the name
of the enum (required).</li>
<li><code>namespace</code>: a JSON string that qualifies the name (optional).</li>
<li><code>aliases</code>: a JSON array of strings, providing
alternate names for this enum (optional).</li>
<li><code>doc</code>: a JSON string providing documentation to the
user of this schema (optional).</li>
<li><code>symbols</code>: a JSON array, listing symbols,
as JSON strings (required). All symbols in an enum must
be unique; duplicates are prohibited. Every symbol must
match the regular expression <code>[A-Za-z_][A-Za-z0-9_]*</code>
(the same requirement as for <a href="#names">names</a>).</li>
<li><code>default</code>: A default value for this
enumeration, used during resolution when the reader
encounters a symbol from the writer that isn't defined
in the reader's schema (optional). The value provided
here must be a JSON string that's a member of
the <code>symbols</code> array.
See documentation on schema resolution for how this gets
used.</li>
</ul>
<p>For example, playing card suits might be defined with:</p>
<source>
{
"type": "enum",
"name": "Suit",
"symbols" : ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"]
}
</source>
</section>
<section>
<title>Arrays</title>
<p>Arrays use the type name <code>"array"</code> and support
a single attribute:</p>
<ul>
<li><code>items</code>: the schema of the array's items.</li>
</ul>
<p>For example, an array of strings is declared
with:</p>
<source>
{
"type": "array",
"items" : "string",
"default": []
}
</source>
</section>
<section>
<title>Maps</title>
<p>Maps use the type name <code>"map"</code> and support
one attribute:</p>
<ul>
<li><code>values</code>: the schema of the map's values.</li>
</ul>
<p>Map keys are assumed to be strings.</p>
<p>For example, a map from string to long is declared
with:</p>
<source>
{
"type": "map",
"values" : "long",
"default": {}
}
</source>
</section>
<section>
<title>Unions</title>
<p>Unions, as mentioned above, are represented using JSON
arrays. For example, <code>["null", "string"]</code>
declares a schema which may be either a null or string.</p>
<p>(Note that when a <a href="#schema_record">default
value</a> is specified for a record field whose type is a
union, the type of the default value must match the
<em>first</em> element of the union. Thus, for unions
containing "null", the "null" is usually listed first, since
the default value of such unions is typically null.)</p>
<p>Unions may not contain more than one schema with the same
type, except for the named types record, fixed and enum. For
example, unions containing two array types or two map types
are not permitted, but two types with different names are
permitted. (Names permit efficient resolution when reading
and writing unions.)</p>
<p>Unions may not immediately contain other unions.</p>
</section>
<section>
<title>Fixed</title>
<p>Fixed uses the type name <code>"fixed"</code> and supports
the following attributes:</p>
<ul>
<li><code>name</code>: a string naming this fixed (required).</li>
<li><code>namespace</code>: a string that qualifies the name (optional).</li>
<li><code>aliases</code>: a JSON array of strings, providing
alternate names for this fixed (optional).</li>
<li><code>size</code>: an integer, specifying the number
of bytes per value (required).</li>
</ul>
<p>For example, a 16-byte quantity may be declared with:</p>
<source>{"type": "fixed", "size": 16, "name": "md5"}</source>
</section>
</section> <!-- end complex types -->
<section id="names">
<title>Names</title>
<p>Records, enums and fixed are named types. Each has
a <em>fullname</em> that is composed of two parts:
a <em>name</em> and a <em>namespace</em>. Equality of names
is defined on the fullname.</p>
<p>The name portion of a fullname, record field names, and
enum symbols must:</p>
<ul>
<li>start with <code>[A-Za-z_]</code></li>
<li>subsequently contain only <code>[A-Za-z0-9_]</code></li>
</ul>
<p>A namespace is a dot-separated sequence of such names.
The empty string may also be used as a namespace to indicate the
null namespace.
Equality of names (including field names and enum symbols)
as well as fullnames is case-sensitive.</p>
<p>In record, enum and fixed definitions, the fullname is
determined in one of the following ways:</p>
<ul>
<li>A name and namespace are both specified. For example,
one might use <code>"name": "X", "namespace":
"org.foo"</code> to indicate the
fullname <code>org.foo.X</code>.</li>
<li>A fullname is specified. If the name specified contains
a dot, then it is assumed to be a fullname, and any
namespace also specified is ignored. For example,
use <code>"name": "org.foo.X"</code> to indicate the
fullname <code>org.foo.X</code>.</li>
<li>A name only is specified, i.e., a name that contains no
dots. In this case the namespace is taken from the most
tightly enclosing schema or protocol. For example,
if <code>"name": "X"</code> is specified, and this occurs
within a field of the record definition
of <code>org.foo.Y</code>, then the fullname
is <code>org.foo.X</code>. If there is no enclosing
namespace then the null namespace is used.</li>
</ul>
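<p>The three cases above can be sketched in Python as follows. This is an
illustration only: the function names and the
<code>enclosing_namespace</code> parameter are assumptions made for the
example, not part of any Avro API.</p>

```python
import re
from typing import Optional

NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_name(name: str) -> bool:
    """Check a simple name (no dots) against the naming rules."""
    return NAME_PATTERN.fullmatch(name) is not None


def resolve_fullname(name: str, namespace: Optional[str] = None,
                     enclosing_namespace: Optional[str] = None) -> str:
    """Return the fullname for a record, enum or fixed definition."""
    if "." in name:
        return name                      # already a fullname; namespace is ignored
    if namespace is None:
        namespace = enclosing_namespace  # inherit from the enclosing definition
    # An empty or missing namespace denotes the null namespace.
    return namespace + "." + name if namespace else name
```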
<p>References to previously defined names are as in the latter
two cases above: if they contain a dot they are a fullname, if
they do not contain a dot, the namespace is the namespace of
the enclosing definition.</p>
<p>Primitive type names have no namespace and their names may
not be defined in any namespace.</p>
<p>A schema or protocol may not contain multiple definitions
of a fullname. Further, a name must be defined before it is
used ("before" in the depth-first, left-to-right traversal of
the JSON parse tree, where the <code>types</code> attribute of
a protocol is always deemed to come "before" the
<code>messages</code> attribute.)
</p>
</section>
<section>
<title>Aliases</title>
<p>Named types and fields may have aliases. An implementation
may optionally use aliases to map a writer's schema to the
reader's. This facilitates both schema evolution and the
processing of disparate datasets.</p>
<p>Aliases function by re-writing the writer's schema using
aliases from the reader's schema. For example, if the
writer's schema was named "Foo" and the reader's schema is
named "Bar" and has an alias of "Foo", then the implementation
would act as though "Foo" were named "Bar" when reading.
Similarly, if data was written as a record with a field named
"x" and is read as a record with a field named "y" with alias
"x", then the implementation would act as though "x" were
named "y" when reading.</p>
<p>A type alias may be specified either as a fully
namespace-qualified name, or relative to the namespace of the name
it is an alias for. For example, if a type named "a.b" has
aliases of "c" and "x.y", then the fully qualified names of
its aliases are "a.c" and "x.y".</p>
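<p>A minimal Python sketch of this qualification rule follows. The
function name is hypothetical; it only illustrates how relative aliases
pick up the namespace of the type they belong to.</p>

```python
from typing import List


def qualify_aliases(fullname: str, aliases: List[str]) -> List[str]:
    """Resolve type aliases relative to the namespace of the aliased type."""
    namespace = fullname.rpartition(".")[0]  # "" when there is no namespace
    qualified = []
    for alias in aliases:
        if "." in alias or not namespace:
            qualified.append(alias)          # already fully qualified
        else:
            qualified.append(namespace + "." + alias)
    return qualified


print(qualify_aliases("a.b", ["c", "x.y"]))  # ['a.c', 'x.y']
```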
</section>
</section> <!-- end schemas -->
<section>
<title>Data Serialization and Deserialization</title>
<p>Binary encoded Avro data does not include type information or
field names. The benefit is that the serialized data is small, but
as a result a schema must always be used in order to read Avro data
correctly. The best way to ensure that the schema is structurally
identical to the one used to write the data is to use the exact same
schema.</p>
<p>Therefore, files or systems that store Avro data should always
include the writer's schema for that data. Avro-based remote procedure
call (RPC) systems must also guarantee that remote recipients of data
have a copy of the schema used to write that data. In general, it is
advisable that any reader of Avro data should use a schema that is
the same (as defined more fully in
<a href="#Parsing+Canonical+Form+for+Schemas">Parsing Canonical Form for
Schemas</a>) as the schema that was used to write the data in order to
deserialize it correctly. Deserializing data into a newer schema is
accomplished by specifying an additional schema, the results of which are
described in <a href="#Schema+Resolution">Schema Resolution</a>.</p>
<p>In general, both serialization and deserialization proceed as a
depth-first, left-to-right traversal of the schema, serializing or
deserializing primitive types as they are encountered. Therefore, it is
possible, though not advisable, to read Avro data with a schema that
does not have the same Parsing Canonical Form as the schema with which
the data was written. In order for this to work, the serialized primitive
values must be compatible, in order value by value, with the items in the
deserialization schema. For example, int and long are always serialized
the same way, so an int could be deserialized as a long. Since the
compatibility of two schemas depends on both the data and the
serialization format (e.g., binary is more permissive than JSON because JSON
includes field names; a long that is too large will overflow an int),
it is simpler and more reliable to use schemas with identical Parsing
Canonical Form.</p>
<section>
<title>Encodings</title>
<p>Avro specifies two serialization encodings: binary and
JSON. Most applications will use the binary encoding, as it
is smaller and faster. But, for debugging and web-based
applications, the JSON encoding may sometimes be
appropriate.</p>
</section>
<section id="binary_encoding">
<title>Binary Encoding</title>
<p>Binary encoding does not include field names, self-contained
information about the types of individual bytes, nor field or
record separators. Therefore readers are wholly reliant on
the schema used when the data was encoded.</p>
<section id="binary_encode_primitive">
<title>Primitive Types</title>
<p>Primitive types are encoded in binary as follows:</p>
<ul>
<li><code>null</code> is written as zero bytes.</li>
<li>a <code>boolean</code> is written as a single byte whose
value is either <code>0</code> (false) or <code>1</code>
(true).</li>
<li><code>int</code> and <code>long</code> values are written
using <a href="ext:vint">variable-length</a>
<a href="ext:zigzag">zig-zag</a> coding. Some examples:
<table class="right">
<tr><th>value</th><th>hex</th></tr>
<tr><td><code> 0</code></td><td><code>00</code></td></tr>
<tr><td><code>-1</code></td><td><code>01</code></td></tr>
<tr><td><code> 1</code></td><td><code>02</code></td></tr>
<tr><td><code>-2</code></td><td><code>03</code></td></tr>
<tr><td><code> 2</code></td><td><code>04</code></td></tr>
<tr><td colspan="2"><code>...</code></td></tr>
<tr><td><code>-64</code></td><td><code>7f</code></td></tr>
<tr><td><code> 64</code></td><td><code> 80 01</code></td></tr>
<tr><td colspan="2"><code>...</code></td></tr>
</table>
</li>
<li>a <code>float</code> is written as 4 bytes. The float is
converted into a 32-bit integer using a method equivalent
to <a href="https://java.sun.com/javase/6/docs/api/java/lang/Float.html#floatToIntBits%28float%29">Java's floatToIntBits</a> and then encoded
in little-endian format.</li>
<li>a <code>double</code> is written as 8 bytes. The double
is converted into a 64-bit integer using a method equivalent
to <a href="https://java.sun.com/javase/6/docs/api/java/lang/Double.html#doubleToLongBits%28double%29">Java's
doubleToLongBits</a> and then encoded in little-endian
format.</li>
<li><code>bytes</code> are encoded as
a <code>long</code> followed by that many bytes of data.
</li>
<li>a <code>string</code> is encoded as
a <code>long</code> followed by that many bytes of UTF-8
encoded character data.
<p>For example, the three-character string "foo" would
be encoded as the long value 3 (encoded as
hex <code>06</code>) followed by the UTF-8 encoding of
'f', 'o', and 'o' (the hex bytes <code>66 6f
6f</code>):
</p>
<source>06 66 6f 6f</source>
</li>
</ul>
</section>
<section id="binary_encode_complex">
<title>Complex Types</title>
<p>Complex types are encoded in binary as follows:</p>
<section id="record_encoding">
<title>Records</title>
<p>A record is encoded by encoding the values of its
fields in the order that they are declared. In other
words, a record is encoded as just the concatenation of
the encodings of its fields. Field values are encoded per
their schema.</p>
<p>For example, consider the record schema:</p>
<source>
{
"type": "record",
"name": "test",
"fields" : [
{"name": "a", "type": "long"},
{"name": "b", "type": "string"}
]
}
</source>
<p>An instance of this record whose <code>a</code> field has
value 27 (encoded as hex <code>36</code>) and
whose <code>b</code> field has value "foo" (encoded as hex
bytes <code>06 66 6f 6f</code>), would be encoded simply
as the concatenation of these, namely the hex byte
sequence:</p>
<source>36 06 66 6f 6f</source>
</section>
<section id="enum_encoding">
<title>Enums</title>
<p>An enum is encoded by an <code>int</code>, representing
the zero-based position of the symbol in the schema.</p>
<p>For example, consider the enum:</p>
<source>
{"type": "enum", "name": "Foo", "symbols": ["A", "B", "C", "D"] }
</source>
<p>This would be encoded by an <code>int</code> between
zero and three, with zero indicating "A", and 3 indicating
"D".</p>
</section>
<section id="array_encoding">
<title>Arrays</title>
<p>Arrays are encoded as a series of <em>blocks</em>.
Each block consists of a <code>long</code> <em>count</em>
value, followed by that many array items. A block with
count zero indicates the end of the array. Each item is
encoded per the array's item schema.</p>
<p>If a block's count is negative, its absolute value is used,
and the count is followed immediately by a <code>long</code>
block <em>size</em> indicating the number of bytes in the
block. This block size permits fast skipping through data,
e.g., when projecting a record to a subset of its fields.</p>
<p>For example, given the array schema</p>
<source>{"type": "array", "items": "long"}</source>
<p>an array containing the items 3 and 27 could be encoded
as the long value 2 (encoded as hex 04) followed by long
values 3 and 27 (encoded as hex <code>06 36</code>)
terminated by zero:</p>
<source>04 06 36 00</source>
<p>The blocked representation permits one to read and write
arrays larger than can be buffered in memory, since one can
start writing items without knowing the full length of the
array.</p>
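<p>The reading side of the blocked representation can be sketched in
Python as follows, for an array of <code>long</code> items. The function
names are hypothetical. <code>skip_array</code> shows how the optional
block size lets a reader jump over a block without decoding its
items.</p>

```python
import io
from typing import BinaryIO, List


def read_long(stream: BinaryIO) -> int:
    """Decode one variable-length zig-zag long."""
    z, scale = 0, 1
    while True:
        byte = stream.read(1)[0]
        z += (byte % 0x80) * scale   # the low 7 bits carry data
        scale *= 0x80
        if 0x80 > byte:              # high bit clear: this was the last byte
            break
    return z // 2 if z % 2 == 0 else -((z + 1) // 2)


def read_long_array(stream: BinaryIO) -> List[int]:
    """Read every block of an array of longs until the zero count."""
    items: List[int] = []
    while True:
        count = read_long(stream)
        if count == 0:
            return items
        if 0 > count:
            count = -count
            read_long(stream)        # block size in bytes; unused when decoding
        items.extend(read_long(stream) for _ in range(count))


def skip_array(stream: BinaryIO) -> None:
    """Skip an array whose blocks all carry a byte size (negative counts)."""
    while True:
        count = read_long(stream)
        if count == 0:
            return
        if count > 0:
            raise ValueError("block has no size; its items must be decoded")
        stream.seek(read_long(stream), io.SEEK_CUR)


print(read_long_array(io.BytesIO(bytes.fromhex("04063600"))))  # [3, 27]
```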
</section>
<section id="map_encoding">
<title>Maps</title>
<p>Maps are encoded as a series of <em>blocks</em>. Each
block consists of a <code>long</code> <em>count</em>
value, followed by that many key/value pairs. A block
with count zero indicates the end of the map. Each item
is encoded per the map's value schema.</p>
<p>If a block's count is negative, its absolute value is used,
and the count is followed immediately by a <code>long</code>
block <em>size</em> indicating the number of bytes in the
block. This block size permits fast skipping through data,
e.g., when projecting a record to a subset of its fields.</p>
<p>The blocked representation permits one to read and write
maps larger than can be buffered in memory, since one can
start writing items without knowing the full length of the
map.</p>
</section>
<section id="union_encoding">
<title>Unions</title>
<p>A union is encoded by first writing a <code>long</code>
value indicating the zero-based position within the
union of the schema of its value. The value is then
encoded per the indicated schema within the union.</p>
<p>For example, the union
schema <code>["null","string"]</code> would encode:</p>
<ul>
<li><code>null</code> as zero (the index of "null" in the union):
<source>00</source></li>
<li>the string <code>"a"</code> as one (the index of
"string" in the union, encoded as hex <code>02</code>),
followed by the serialized string:
<source>02 02 61</source></li>
</ul>
</section>
<section id="fixed_encoding">
<title>Fixed</title>
<p>Fixed instances are encoded using the number of bytes
declared in the schema.</p>
</section>
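<p>The complex-type rules above can be combined into a small
schema-driven encoder, sketched here in Python. It rests on several
assumptions: schemas are already parsed from JSON, references to named
types are not resolved, arrays and maps are written as a single
positive-count block, and union branches are selected by a simplified
type test (<code>matches</code>). All function names are
hypothetical.</p>

```python
from typing import Any, List


def encode_long(n: int) -> bytes:
    """Variable-length zig-zag coding, as used for int and long."""
    z = 2 * n if n >= 0 else -2 * n - 1
    out = bytearray()
    while z >= 0x80:
        out.append(0x80 + z % 0x80)
        z //= 0x80
    out.append(z)
    return bytes(out)


def encode_blocks(items: List[bytes]) -> bytes:
    """Write all items as one positive-count block, then the zero count."""
    if not items:
        return b"\x00"
    return encode_long(len(items)) + b"".join(items) + b"\x00"


def kind_of(schema: Any) -> str:
    return schema if isinstance(schema, str) else schema["type"]


def matches(branch: Any, datum: Any) -> bool:
    """Simplified union branch test; real implementations are more thorough."""
    kind = kind_of(branch)
    if kind == "null":
        return datum is None
    if kind == "string":
        return isinstance(datum, str)
    if kind in ("int", "long"):
        return isinstance(datum, int)
    return False


def encode(schema: Any, datum: Any) -> bytes:
    if isinstance(schema, list):   # union: branch position, then the value
        index = next(i for i, b in enumerate(schema) if matches(b, datum))
        return encode_long(index) + encode(schema[index], datum)
    kind = kind_of(schema)
    if kind == "null":
        return b""
    if kind in ("int", "long"):
        return encode_long(datum)
    if kind == "string":
        raw = datum.encode("utf-8")
        return encode_long(len(raw)) + raw
    if kind == "record":           # fields in declaration order, no separators
        return b"".join(encode(f["type"], datum[f["name"]])
                        for f in schema["fields"])
    if kind == "enum":             # zero-based position of the symbol
        return encode_long(schema["symbols"].index(datum))
    if kind == "array":
        return encode_blocks([encode(schema["items"], x) for x in datum])
    if kind == "map":
        return encode_blocks([encode("string", k) + encode(schema["values"], v)
                              for k, v in datum.items()])
    if kind == "fixed":
        if len(datum) != schema["size"]:
            raise ValueError("fixed value has the wrong size")
        return bytes(datum)
    raise NotImplementedError(kind)
```

<p>With the <code>test</code> record schema shown earlier,
<code>encode(schema, {"a": 27, "b": "foo"})</code> yields the bytes
<code>36 06 66 6f 6f</code>.</p>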
</section> <!-- end complex types -->
</section>
<section id="json_encoding">
<title>JSON Encoding</title>
<p>Except for unions, the JSON encoding is the same as is used
to encode <a href="#schema_record">field default
values</a>.</p>
<p>The value of a union is encoded in JSON as follows:</p>
<ul>
<li>if its type is <code>null</code>, then it is encoded as
a JSON null;</li>
<li>otherwise it is encoded as a JSON object with one
name/value pair whose name is the type's name and whose
value is the recursively encoded value. For Avro's named
types (record, fixed or enum) the user-specified name is
used, for other types the type name is used.</li>
</ul>
<p>For example, the union
schema <code>["null","string","Foo"]</code>, where Foo is a
record name, would encode:</p>
<ul>
<li><code>null</code> as <code>null</code>;</li>
<li>the string <code>"a"</code> as
<code>{"string": "a"}</code>; and</li>
<li>a Foo instance as <code>{"Foo": {...}}</code>,
where <code>{...}</code> indicates the JSON encoding of a
Foo instance.</li>
</ul>
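<p>The union wrapping rule can be sketched in Python as follows. The
helper is hypothetical and assumes the value has already been converted
to its JSON form and that the caller knows which branch applies.</p>

```python
import json
from typing import Any, Optional


def json_encode_union(branch_name: Optional[str], encoded_value: Any) -> Any:
    """Wrap a JSON-encoded value according to its union branch.

    branch_name is the type name ("string", "long", ...) or, for named
    types, the user-specified name; None or "null" selects the null branch.
    """
    if branch_name is None or branch_name == "null":
        return None
    return {branch_name: encoded_value}


print(json.dumps(json_encode_union("null", None)))    # null
print(json.dumps(json_encode_union("string", "a")))   # {"string": "a"}
```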
<p>Note that the original schema is still required to correctly
process JSON-encoded data. For example, the JSON encoding does not
distinguish between <code>int</code>
and <code>long</code>, <code>float</code>
and <code>double</code>, records and maps, enums and strings,
etc.</p>
</section>
<section id="single_object_encoding">
<title>Single-object encoding</title>
<p>In some situations a single Avro serialized object is to be stored for a
longer period of time. One very common example is storing Avro records
for several weeks in an <a href="https://kafka.apache.org/">Apache Kafka</a> topic.</p>
<p>In the period after a schema change this persistence system will contain records
that have been written with different schemas. So the need arises to know which schema
was used to write a record to support schema evolution correctly.
In most cases the schema itself is too large to include in the message,
so this binary wrapper format supports the use case more effectively.</p>
<section id="single_object_encoding_spec">
<title>Single object encoding specification</title>
<p>Single Avro objects are encoded as follows:</p>
<ol>
<li>A two-byte marker, <code>C3 01</code>, to show that the message is Avro and uses this single-record format (version 1).</li>
<li>The 8-byte little-endian CRC-64-AVRO <a href="#schema_fingerprints">fingerprint</a> of the object's schema</li>
<li>The Avro object encoded using <a href="#binary_encoding">Avro's binary encoding</a></li>
</ol>
</section>
<p>Implementations use the 2-byte marker to determine whether a payload is Avro.
This check helps avoid expensive lookups that resolve the schema from a
fingerprint, when the message is not an encoded Avro payload.</p>
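<p>The framing can be sketched in Python as follows. The function names
are hypothetical, and the sketch assumes the 64-bit schema fingerprint has
already been computed elsewhere and is passed in as an unsigned
integer.</p>

```python
from typing import Optional, Tuple

MARKER = b"\xc3\x01"  # single-object format, version 1


def frame_single_object(fingerprint: int, payload: bytes) -> bytes:
    """Prefix a binary-encoded object with the marker and schema fingerprint."""
    return MARKER + fingerprint.to_bytes(8, "little") + payload


def parse_single_object(message: bytes) -> Optional[Tuple[int, bytes]]:
    """Return (fingerprint, payload), or None if the message is not framed."""
    if message[:2] != MARKER or 10 > len(message):
        return None  # not a single-object message; skip the schema lookup
    fingerprint = int.from_bytes(message[2:10], "little")
    return fingerprint, message[10:]
```

<p>A reader would use the returned fingerprint to look up the writer's
schema before decoding the payload.</p>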
</section>
</section>
<section id="order">
<title>Sort Order</title>
<p>Avro defines a standard sort order for data. This permits
data written by one system to be efficiently sorted by another
system. This can be an important optimization, as sort order
comparisons are sometimes the most frequent per-object
operation. Note also that Avro binary-encoded data can be
efficiently ordered without deserializing it to objects.</p>
<p>Data items may only be compared if they have identical
schemas. Pairwise comparisons are implemented recursively
with a depth-first, left-to-right traversal of the schema.
The first mismatch encountered determines the order of the
items.</p>
<p>Two items with the same schema are compared according to the
following rules.</p>
<ul>
<li><code>null</code> data is always equal.</li>
<li><code>boolean</code> data is ordered with false before true.</li>
<li><code>int</code>, <code>long</code>, <code>float</code>
and <code>double</code> data is ordered by ascending numeric
value.</li>
<li><code>bytes</code> and <code>fixed</code> data are
compared lexicographically by unsigned 8-bit values.</li>
<li><code>string</code> data is compared lexicographically by
Unicode code point. Note that since UTF-8 is used as the
binary encoding for strings, sorting of bytes and string
binary data is identical.</li>
<li><code>array</code> data is compared lexicographically by
element.</li>
<li><code>enum</code> data is ordered by the symbol's position
in the enum schema. For example, an enum whose symbols are
<code>["z", "a"]</code> would sort <code>"z"</code> values
before <code>"a"</code> values.</li>
<li><code>union</code> data is first ordered by the branch
within the union, and, within that, by the type of the
branch. For example, an <code>["int", "string"]</code>
union would order all int values before all string values,
with the ints and strings themselves ordered as defined
above.</li>
<li><code>record</code> data is ordered lexicographically by
field. If a field specifies that its order is:
<ul>
<li><code>"ascending"</code>, then the order of its values
is unaltered.</li>
<li><code>"descending"</code>, then the order of its values
is reversed.</li>
<li><code>"ignore"</code>, then its values are ignored
when sorting.</li>
</ul>
</li>
<li><code>map</code> data may not be compared. It is an error
to attempt to compare data containing maps unless those maps
are in an <code>"order":"ignore"</code> record field.
</li>
</ul>
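<p>A simplified comparison function following these rules is sketched
below in Python. It operates on decoded values rather than on binary
data, omits unions, and assumes that a shorter array that is a prefix of
a longer one sorts first. The function names are hypothetical.</p>

```python
from typing import Any

ORDERED_KINDS = ("boolean", "int", "long", "float", "double",
                 "bytes", "fixed", "string")


def sign(a: Any, b: Any) -> int:
    """Return -1, 0 or 1 according to the natural ordering of a and b."""
    return (a > b) - (b > a)


def compare(schema: Any, a: Any, b: Any) -> int:
    kind = schema if isinstance(schema, str) else schema["type"]
    if kind == "null":
        return 0
    if kind in ORDERED_KINDS:
        # Python orders False before True, bytes by unsigned byte value
        # and str by Unicode code point, matching the rules above.
        return sign(a, b)
    if kind == "enum":
        symbols = schema["symbols"]
        return sign(symbols.index(a), symbols.index(b))
    if kind == "array":
        for x, y in zip(a, b):
            result = compare(schema["items"], x, y)
            if result != 0:
                return result
        return sign(len(a), len(b))
    if kind == "record":
        for field in schema["fields"]:
            order = field.get("order", "ascending")
            if order == "ignore":
                continue
            result = compare(field["type"], a[field["name"]], b[field["name"]])
            if result != 0:
                return -result if order == "descending" else result
        return 0
    if kind == "map":
        raise TypeError("map data may not be compared")
    raise NotImplementedError(kind)
```

<p>Such a function can be passed to <code>functools.cmp_to_key</code> to
sort a list of decoded records.</p>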
</section>
<section>
<title>Object Container Files</title>
<p>Avro includes a simple object container file format. A file
has a schema, and all objects stored in the file must be written
according to that schema, using binary encoding. Objects are
stored in blocks that may be compressed. Synchronization markers
are used between blocks to permit efficient splitting of files
for MapReduce processing.</p>
<p>Files may include arbitrary user-specified metadata.</p>
<p>A file consists of:</p>
<ul>
<li>A <em>file header</em>, followed by</li>
<li>one or more <em>file data blocks</em>.</li>
</ul>
<p>A file header consists of:</p>
<ul>
<li>Four bytes, ASCII 'O', 'b', 'j', followed by 1.</li>
<li><em>file metadata</em>, including the schema.</li>
<li>The 16-byte, randomly-generated sync marker for this file.</li>
</ul>
<p>File metadata is written as if defined by the following <a
href="#map_encoding">map</a> schema:</p>
<source>{"type": "map", "values": "bytes"}</source>
<p>All metadata properties that start with "avro." are reserved.
The following file metadata properties are currently used:</p>
<ul>
<li><strong>avro.schema</strong> contains the schema of objects
stored in the file, as JSON data (required).</li>
<li><strong>avro.codec</strong> contains the name of the compression codec
used to compress blocks, as a string. Implementations
are required to support the following codecs: "null" and "deflate".
If codec is absent, it is assumed to be "null". The codecs
are described in more detail below.</li>
</ul>
<p>A file header is thus described by the following schema:</p>
<source>
{"type": "record", "name": "org.apache.avro.file.Header",
"fields" : [
{"name": "magic", "type": {"type": "fixed", "name": "Magic", "size": 4}},
{"name": "meta", "type": {"type": "map", "values": "bytes"}},
{"name": "sync", "type": {"type": "fixed", "name": "Sync", "size": 16}}
]
}
</source>
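<p>A minimal sketch of how such a header might be serialized, assuming
the long, bytes and map binary encodings defined earlier in this
document. The function names (<code>encode_long</code>,
<code>encode_bytes</code>, <code>encode_header</code>) are hypothetical
and do not come from any Avro library.</p>
<source>
```python
import json
import os

MAGIC = b"Obj\x01"  # ASCII 'O', 'b', 'j', then the byte value 1

def encode_long(n):
    """Avro long: zig-zag, then base-128 groups, least significant first."""
    z = n * 2 if n >= 0 else -n * 2 - 1
    out = bytearray()
    while z >= 128:
        out.append(z % 128 + 128)  # high bit set: more bytes follow
        z //= 128
    out.append(z)
    return bytes(out)

def encode_bytes(b):
    """Avro bytes (and UTF-8 strings): a long length, then the data."""
    return encode_long(len(b)) + b

def encode_header(schema, codec="null", sync=None):
    """Return (header_bytes, sync_marker) for the given JSON-able schema."""
    if sync is None:
        sync = os.urandom(16)  # 16-byte, randomly generated marker
    meta = {
        "avro.schema": json.dumps(schema).encode("utf-8"),
        "avro.codec": codec.encode("utf-8"),
    }
    out = bytearray(MAGIC)
    out += encode_long(len(meta))  # a single map block holding every entry
    for key, value in meta.items():
        out += encode_bytes(key.encode("utf-8"))  # map keys are strings
        out += encode_bytes(value)                # map values are bytes
    out += encode_long(0)  # a zero count terminates the map
    out += sync
    return bytes(out), sync

# Fixed all-zero marker used here only to make the example repeatable.
header, sync = encode_header("string", sync=bytes(16))
```
</source>
<p>A real writer would keep the returned sync marker and append it after
every data block.</p>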
<p>A file data block consists of:</p>
<ul>
<li>A long indicating the count of objects in this block.</li>
<li>A long indicating the size in bytes of the serialized objects
in the current block, after any codec is applied.</li>
<li>The serialized objects. If a codec is specified, this is
compressed by that codec.</li>
<li>The file's 16-byte sync marker.</li>
</ul>
<p>Thus, each block's binary data can be efficiently extracted or skipped without
deserializing the contents. The combination of block size, object count, and
sync markers enables detection of corrupt blocks and helps ensure data integrity.</p>
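<p>The following sketch shows one way a reader might walk the blocks of
a file without deserializing any objects. It assumes the file header has
already been consumed and the sync marker is known; the names
<code>read_long</code> and <code>iter_blocks</code> are illustrative
only.</p>
<source>
```python
def read_long(data, pos):
    """Decode one zig-zag, variable-length long; return (value, next_pos)."""
    z, scale = 0, 1
    while True:
        byte = data[pos]
        pos += 1
        z += (byte % 128) * scale  # low seven bits carry the payload
        scale *= 128
        if byte // 128 == 0:       # high bit clear: last byte of the long
            break
    value = z // 2 if z % 2 == 0 else -(z + 1) // 2
    return value, pos

def iter_blocks(data, pos, sync):
    """Yield (object_count, raw_payload) per block, checking each sync marker."""
    while pos != len(data):
        count, pos = read_long(data, pos)
        size, pos = read_long(data, pos)
        payload = data[pos:pos + size]  # still compressed if a codec is in use
        pos += size
        if len(payload) != size or data[pos:pos + 16] != sync:
            raise ValueError("corrupt block: bad size or sync marker")
        pos += 16
        yield count, payload

# Two hand-built blocks: 2 objects in 3 bytes, then 1 object in 1 byte.
example_sync = b"S" * 16
example_data = (bytes([4, 6]) + b"abc" + example_sync
                + bytes([2, 2]) + b"x" + example_sync)
blocks = list(iter_blocks(example_data, 0, example_sync))
```
</source>
<p>Because the size is read before the payload, a reader that only needs
object counts can advance past each payload without decoding it.</p>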
<section>
<title>Required Codecs</title>
<section>
<title>null</title>
<p>The "null" codec simply passes through data uncompressed.</p>
</section>
<section>
<title>deflate</title>
<p>The "deflate" codec writes the data block using the
deflate algorithm as specified in
<a href="https://www.isi.edu/in-notes/rfc1951.txt">RFC 1951</a>,
and typically implemented using the zlib library. Note that this
format (unlike the "zlib format" in RFC 1950) does not have a
checksum.
</p>
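<p>For instance, with Python's standard <code>zlib</code> module a raw
RFC 1951 stream (no zlib header, no checksum) can be requested by passing
a negative window size. This is a sketch; the helper names are chosen for
this example.</p>
<source>
```python
import zlib

def deflate_block(data, level=6):
    """Compress data as a raw deflate stream (RFC 1951)."""
    # A negative wbits value selects raw deflate without header or checksum.
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()

def inflate_block(data):
    """Decompress a raw deflate stream produced by deflate_block."""
    return zlib.decompress(data, -15)

raw = b"avro " * 100
packed = deflate_block(raw)
restored = inflate_block(packed)
```
</source>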
</section>
</section>
<section>
<title>Optional Codecs</title>
<section>
<title>snappy</title>
<p>The "snappy" codec uses
Google's <a href="https://code.google.com/p/snappy/">Snappy</a>
compression library. Each compressed block is followed
by the 4-byte, big-endian CRC32 checksum of the
uncompressed data in the block.</p>
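<p>The checksum trailer can be sketched independently of the compression
library. In the example below the <code>compress</code> and
<code>decompress</code> arguments are placeholders that a caller would
supply from a Snappy binding; the identity function is used here only as
a stand-in so that the example runs with the standard library alone. The
helper names are hypothetical.</p>
<source>
```python
import struct
import zlib

def add_crc_trailer(uncompressed, compress):
    """Return compress(uncompressed) plus the 4-byte big-endian CRC32 of the input."""
    checksum = zlib.crc32(uncompressed)  # unsigned 32-bit value in Python 3
    return compress(uncompressed) + struct.pack(">I", checksum)

def check_crc_trailer(block, decompress):
    """Split off the trailer, decompress the body, and verify the CRC32."""
    body, trailer = block[:-4], block[-4:]
    data = decompress(body)
    if struct.pack(">I", zlib.crc32(data)) != trailer:
        raise ValueError("CRC32 mismatch")
    return data

identity = lambda b: b  # stand-in for a real Snappy compress/decompress pair
block = add_crc_trailer(b"123456789", identity)
roundtrip = check_crc_trailer(block, identity)
```
</source>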
</section>
</section>
</section>
<section>
<title>Protocol Declaration</title>
<p>Avro protocols describe RPC interfaces. Like schemas, they are
defined with JSON text.</p>
<p>A protocol is a JSON object with the following attributes:</p>
<ul>
<li><em>protocol</em>, a string, the name of the protocol
(required);</li>
<li><em>namespace</em>, an optional string that qualifies the name;</li>
<li><em>doc</em>, an optional string describing this protocol;</li>
<li><em>types</em>, an optional list of definitions of named types
(records, enums, fixed and errors). An error definition is
just like a record definition except it uses "error" instead
of "record". Note that forward references to named types
are not permitted.</li>
<li><em>messages</em>, an optional JSON object whose keys are
message names and whose values are objects whose attributes
are described below. No two messages may have the same
name.</li>
</ul>
<p>The name and namespace qualification rules defined for schema objects
apply to protocols as well.</p>
<section>
<title>Messages</title>
<p>A message has attributes:</p>
<ul>
<li>a <em>doc</em>, an optional description of the message;</li>
<li>a <em>request</em>, a list of named,
typed <em>parameter</em> schemas (this has the same form
as the fields of a record declaration);</li>
<li>a <em>response</em> schema; </li>
<li>an optional union of declared <em>error</em> schemas.
The <em>effective</em> union has <code>"string"</code>
prepended to the declared union, to permit transmission of
undeclared "system" errors. For example, if the declared
error union is <code>["AccessError"]</code>, then the
effective union is <code>["string", "AccessError"]</code>.
When no errors are declared, the effective error union
is <code>["string"]</code>. Errors are serialized using
the effective union; however, a protocol's JSON
declaration contains only the declared union.
</li>
<li>an optional <em>one-way</em> boolean parameter.</li>
</ul>
<p>A request parameter list is processed equivalently to an
anonymous record. Since record field lists may vary between
reader and writer, request parameters may also differ
between the caller and responder, and such differences are
resolved in the same manner as record field differences.</p>
<p>The one-way parameter may only be true when the response type
is <code>"null"</code> and no errors are listed.</p>
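<p>As a sketch of the two rules above, the effective error union and the
one-way constraint could be checked on a parsed message declaration as
follows. The function names are assumptions made for this example, and a
message is assumed to be a plain dictionary obtained from the protocol's
JSON.</p>
<source>
```python
def effective_errors(message):
    """Return the effective error union: "string" plus the declared errors."""
    return ["string"] + list(message.get("errors", []))

def validate_one_way(message):
    """Raise ValueError if one-way is true but the message can reply or fail."""
    if not message.get("one-way", False):
        return
    if message.get("response") != "null" or message.get("errors"):
        raise ValueError("one-way requires a null response and no declared errors")

hello = {
    "request": [{"name": "greeting", "type": "Greeting"}],
    "response": "Greeting",
    "errors": ["Curse"],
}
ping = {"request": [], "response": "null", "one-way": True}

hello_errors = effective_errors(hello)
ping_errors = effective_errors(ping)
validate_one_way(hello)
validate_one_way(ping)
```
</source>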
</section>
<section>
<title>Sample Protocol</title>
<p>For example, one may define a simple HelloWorld protocol with:</p>
<source>
{
"namespace": "com.acme",
"protocol": "HelloWorld",
"doc": "Protocol Greetings",
"types": [
{"name": "Greeting", "type": "record", "fields": [
{"name": "message", "type": "string"}]},
{"name": "Curse", "type": "error", "fields": [
{"name": "message", "type": "string"}]}
],
"messages": {
"hello": {
"doc": "Say hello.",
"request": [{"name": "greeting", "type": "Greeting" }],
"response": "Greeting",
"errors": ["Curse"]
}
}
}
</source>
</section>
</section>
<section>
<title>Protocol Wire Format</title>
<section>
<title>Message Transport</title>
<p>Messages may be transmitted via
different <em>transport</em> mechanisms.</p>
<p>To the transport, a <em>message</em> is an opaque byte sequence.</p>
<p>A transport is a system that supports:</p>
<ul>
<li><strong>transmission of request messages</strong>
</li>
<li><strong>receipt of corresponding response messages</strong>
<p>Servers may send a response message back to the client
corresponding to a request message. The mechanism of
correspondence is transport-specific. For example, in
HTTP it is implicit, since HTTP directly supports requests
and responses. But a transport that multiplexes many
client threads over a single socket would need to tag
messages with unique identifiers.</p>
</li>
</ul>
<p>Transports may be either <em>stateless</em>
or <em>stateful</em>. In a stateless transport, messaging
assumes no established connection state, while stateful
transports establish connections that may be used for multiple
messages. This distinction is discussed further in
the <a href="#handshake">handshake</a> section below.</p>
<section>
<title>HTTP as Transport</title>
<p>When
<a href="https://www.w3.org/Protocols/rfc2616/rfc2616.html">HTTP</a>
is used as a transport, each Avro message exchange is an
HTTP request/response pair. All messages of an Avro
protocol should share a single URL at an HTTP server.
Other protocols may also use that URL. Both normal and
error Avro response messages should use the 200 (OK)
response code. The chunked encoding may be used for
requests and responses, but, regardless, the Avro request
and response are the entire content of an HTTP request and
response. The HTTP Content-Type of requests and responses
should be specified as "avro/binary". Requests should be
made using the POST method.</p>
<p>HTTP is used by Avro as a stateless transport.</p>
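<p>A sketch of how a client might prepare such a request with Python's
standard <code>urllib.request</code> module. The URL is a made-up
placeholder, the body is assumed to be an already framed Avro message,
and the request is only constructed here, not sent.</p>
<source>
```python
import urllib.request

def build_avro_request(url, framed_message):
    """Build a POST request whose entire body is one framed Avro message."""
    return urllib.request.Request(
        url,
        data=framed_message,
        headers={"Content-Type": "avro/binary"},
        method="POST",
    )

# Hypothetical endpoint; the body is a single zero-length terminating buffer.
request = build_avro_request("http://localhost:9090/", b"\x00\x00\x00\x00")
```
</source>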
</section>
</section>
<section>
<title>Message Framing</title>
<p>Avro messages are <em>framed</em> as a list of buffers.</p>
<p>Framing is a layer between messages and the transport.
It exists to optimize certain operations.</p>
<p>The format of framed message data is:</p>
<ul>
<li>a series of <em>buffers</em>, where each buffer consists of:
<ul>
<li>a four-byte, big-endian <em>buffer length</em>, followed by</li>
<li>that many bytes of <em>buffer data</em>.</li>
</ul>
</li>
<li>A message is always terminated by a zero-length buffer.</li>
</ul>
<p>Framing is transparent to request and response message
formats (described below). Any message may be presented as a
single or multiple buffers.</p>
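<p>The framing format can be sketched in a few lines of Python. The
names <code>frame</code> and <code>unframe</code> are illustrative, and
this version joins the buffers back into one byte string, which a
copy-avoiding implementation would not do.</p>
<source>
```python
import struct

def frame(buffers):
    """Frame a message given as a list of byte buffers."""
    out = bytearray()
    for buf in buffers:
        if buf:  # skip empty buffers: a zero length would end the message
            out += struct.pack(">I", len(buf)) + buf
    out += struct.pack(">I", 0)  # zero-length buffer terminates the message
    return bytes(out)

def unframe(data, pos=0):
    """Read one framed message; return (message_bytes, next_pos)."""
    buffers = []
    while True:
        (length,) = struct.unpack_from(">I", data, pos)
        pos += 4
        if length == 0:
            return b"".join(buffers), pos
        chunk = data[pos:pos + length]
        if len(chunk) != length:
            raise ValueError("truncated buffer")
        buffers.append(chunk)
        pos += length

framed = frame([b"ab", b"cde"])
message, end = unframe(framed)
```
</source>
<p>The same message framed as a single buffer, <code>frame([b"abcde"])</code>,
decodes to identical bytes, which is what makes framing transparent to the
request and response formats.</p>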
<p>Framing can permit readers to more efficiently get
different buffers from different sources and writers to
more efficiently store different buffers to different
destinations. In particular, it can reduce the number of
times large binary objects are copied. For example, if an RPC
parameter consists of a megabyte of file data, that data can
be copied directly to a socket from a file descriptor, and, on
the other end, it could be written directly to a file
descriptor, never entering user space.</p>
<p>A simple, recommended framing policy is for writers to
create a new buffer whenever a single binary object is
written that is larger than a normal output buffer. Small
objects are then appended in buffers, while larger objects are
written as their own buffers. When a reader then tries to
read a large object the runtime can hand it an entire buffer
directly, without having to copy it.</p>
</section>
<section id="handshake">
<title>Handshake</title>
<p>The purpose of the handshake is to ensure that the client
and the server have each other's protocol definition, so that
the client can correctly deserialize responses, and the server
can correctly deserialize requests. Both clients and servers
should maintain a cache of recently seen protocols, so that,
in most cases, a handshake will be completed without extra
round-trip network exchanges or the transmission of full
protocol text.</p>
<p>RPC requests and responses may not be processed until a
handshake has been completed. With a stateless transport, all
requests and responses are prefixed by handshakes. With a
stateful transport, handshakes are only attached to requests
and responses until a successful handshake response has been
returned over a connection. After this, request and response
payloads are sent without handshakes for the lifetime of that
connection.</p>
<p>The handshake process uses the following record schemas:</p>
<source>
{
"type": "record",
"name": "HandshakeRequest", "namespace":"org.apache.avro.ipc",
"fields": [