compact binary serialization format with built-in compression
# Hateno Specification

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
interpreted as described in [RFC 2119][rfc2119].

## 0. Table of Contents

<!-- TOC start (generated with https://github.com/derlin/bitdowntoc) -->

- [1. Overview](#1-overview)
- [2. Data Representation](#2-data-representation)
  - [2.1 Numeric Encoding](#21-numeric-encoding)
  - [2.2 String Encoding](#22-string-encoding)
- [3. Type System](#3-type-system)
  - [3.1 Type Identifiers](#31-type-identifiers)
  - [3.2 Array-Compatible Types](#32-array-compatible-types)
  - [3.3 Map Key Compatible Types](#33-map-key-compatible-types)
- [4. Binary Encoding](#4-binary-encoding)
  - [4.1 Primitive Types](#41-primitive-types)
    - [4.1.1 Integer Types](#411-integer-types)
    - [4.1.2 Floating-Point Types](#412-floating-point-types)
    - [4.1.3 Boolean](#413-boolean)
  - [4.2 String Type](#42-string-type)
  - [4.3 Option Type](#43-option-type)
  - [4.4 List Type (Heterogeneous)](#44-list-type-heterogeneous)
  - [4.5 Map Type](#45-map-type)
  - [4.6 Array Type (Homogeneous)](#46-array-type-homogeneous)
  - [4.7 Timestamp Type](#47-timestamp-type)
  - [4.8 UUID Type](#48-uuid-type)
- [5. File Format](#5-file-format)
  - [5.1 File Structure](#51-file-structure)
  - [5.2 Header Format](#52-header-format)
    - [5.2.1 Magic Bytes](#521-magic-bytes)
      - [5.2.1.1 File Extension and MIME type](#5211-file-extension-and-mime-type)
    - [5.2.2 Version](#522-version)
    - [5.2.3 Flags](#523-flags)
    - [5.2.4 Compression Method](#524-compression-method)
    - [5.2.5 Payload Length](#525-payload-length)
  - [5.3 Payload](#53-payload)
- [6. Example](#6-example)
- [7. Conformance Requirements](#7-conformance-requirements)

<!-- TOC end -->

<!-- TOC --><a name="1-overview"></a>

## 1. Overview

Hateno is a binary serialization format designed to be simple to read and write.

<!-- TOC --><a name="2-data-representation"></a>

## 2. Data Representation

<!-- TOC --><a name="21-numeric-encoding"></a>

### 2.1 Numeric Encoding

- Unsigned integers: Standard binary representation
- Signed integers: Two's complement representation
- Floating-point: IEEE 754 standard (binary32 for f32, binary64 for f64)
- Endianness: Determined by file header flags (see Section 5.2.3)

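As a rough illustration of the rules above, the sketch below uses Python's `struct` module to pack a few values in either byte order. The helper names (`order_prefix`, `pack_u32`, `pack_i16`, `pack_f32`) are hypothetical and not part of this specification.

```python
import struct


def order_prefix(big_endian: bool) -> str:
    """Map the header's endianness bit to a struct byte-order prefix."""
    return ">" if big_endian else "<"


def pack_u32(value: int, big_endian: bool = False) -> bytes:
    # "I" is an unsigned 32-bit integer in plain binary form.
    return struct.pack(order_prefix(big_endian) + "I", value)


def pack_i16(value: int, big_endian: bool = False) -> bytes:
    # "h" is a signed 16-bit integer; struct emits two's complement.
    return struct.pack(order_prefix(big_endian) + "h", value)


def pack_f32(value: float, big_endian: bool = False) -> bytes:
    # "f" is IEEE 754 binary32.
    return struct.pack(order_prefix(big_endian) + "f", value)
```

With an explicit `<` or `>` prefix, `struct` uses fixed standard sizes and no padding, which is what a byte-exact format needs.
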
<!-- TOC --><a name="22-string-encoding"></a>

### 2.2 String Encoding

All strings MUST be encoded as UTF-8. Alternate encoding schemes, such as Java's
[Modified UTF-8][mutf8], are explicitly NOT supported.

<!-- TOC --><a name="3-type-system"></a>

## 3. Type System

<!-- TOC --><a name="31-type-identifiers"></a>

### 3.1 Type Identifiers

| ID   | Type      | Description                   | Size (bytes) |
| ---- | --------- | ----------------------------- | ------------ |
| 0x00 | u8        | Unsigned 8-bit integer        | 1            |
| 0x01 | i8        | Signed 8-bit integer          | 1            |
| 0x02 | u16       | Unsigned 16-bit integer       | 2            |
| 0x03 | i16       | Signed 16-bit integer         | 2            |
| 0x04 | u32       | Unsigned 32-bit integer       | 4            |
| 0x05 | i32       | Signed 32-bit integer         | 4            |
| 0x06 | u64       | Unsigned 64-bit integer       | 8            |
| 0x07 | i64       | Signed 64-bit integer         | 8            |
| 0x08 | f32       | 32-bit IEEE 754 float         | 4            |
| 0x09 | f64       | 64-bit IEEE 754 float         | 8            |
| 0x0A | bool      | Boolean value                 | 1            |
| 0x0B | String    | UTF-8 encoded string          | Variable     |
| 0x0C | Option    | Optional value container      | Variable     |
| 0x0D | List      | Heterogeneous array           | Variable     |
| 0x0E | Map       | Key-value dictionary          | Variable     |
| 0x0F | Array     | Homogeneous typed array       | Variable     |
| 0x10 | Timestamp | Unix timestamp (milliseconds) | 8            |
| 0x11 | UUID      | 128-bit UUID                  | 16           |

**Note**: Type IDs 0x12-0xFF are reserved for future expansion.

<!-- TOC --><a name="32-array-compatible-types"></a>

### 3.2 Array-Compatible Types

The following types are valid as `Array` element types:

- All integer types: `u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`
- All floating-point types: `f32`, `f64`
- Boolean type: `bool`

Collections of any other type (`String`, `Option`, `List`, `Map`, `Array`,
`Timestamp`, `UUID`) MUST be encoded with the heterogeneous `List` type instead.

<!-- TOC --><a name="33-map-key-compatible-types"></a>

### 3.3 Map Key Compatible Types

All types are valid as `Map` key types except:

- `Option`
- `List`
- `Map`
- `Array`

<!-- TOC --><a name="4-binary-encoding"></a>

## 4. Binary Encoding

<!-- TOC --><a name="41-primitive-types"></a>

### 4.1 Primitive Types

<!-- TOC --><a name="411-integer-types"></a>

#### 4.1.1 Integer Types

```
[type_id: u8] [value: T]
```

Where `T` is the appropriately-sized integer in the file's endianness.

<!-- TOC --><a name="412-floating-point-types"></a>

#### 4.1.2 Floating-Point Types

```
[type_id: u8] [value: T]
```

Where `T` follows IEEE 754 encoding in the file's endianness.

<!-- TOC --><a name="413-boolean"></a>

#### 4.1.3 Boolean

```
[type_id: u8] [value: u8]
```

- `0x00`: false
- `0x01`: true
- All other values are invalid

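The primitive layouts above could be implemented along these lines. This is a minimal sketch: `PRIMITIVE_FORMATS`, `encode_primitive`, `encode_bool`, and `decode_bool` are illustrative names, and the type IDs are taken from the table in Section 3.1.

```python
import struct

# Type ID -> struct format character (integers and floats only).
PRIMITIVE_FORMATS = {
    0x00: "B", 0x01: "b", 0x02: "H", 0x03: "h",
    0x04: "I", 0x05: "i", 0x06: "Q", 0x07: "q",
    0x08: "f", 0x09: "d",
}
BOOL_ID = 0x0A


def encode_primitive(type_id: int, value, big_endian: bool = False) -> bytes:
    """Emit [type_id] followed by the value in the file's byte order."""
    prefix = ">" if big_endian else "<"
    return bytes([type_id]) + struct.pack(prefix + PRIMITIVE_FORMATS[type_id], value)


def encode_bool(value: bool) -> bytes:
    return bytes([BOOL_ID, 0x01 if value else 0x00])


def decode_bool(data: bytes) -> bool:
    """Decode a 2-byte bool value, rejecting anything but 0x00 / 0x01."""
    if len(data) != 2 or data[0] != BOOL_ID:
        raise ValueError("not a bool value")
    if data[1] not in (0x00, 0x01):
        raise ValueError(f"invalid bool byte: {data[1]:#04x}")
    return data[1] == 0x01
```
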
<!-- TOC --><a name="42-string-type"></a>

### 4.2 String Type

```
[type_id: u8] [length: u32] [utf8_data: [u8; length]]
```

- `length`: Number of bytes in UTF-8 encoding
- `utf8_data`: Valid UTF-8 byte sequence
- Empty strings have length 0 and no data bytes

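A possible encoder and strict decoder for this layout is sketched below. `encode_string` and `decode_string` are hypothetical names. The decoder relies on Python's default strict UTF-8 codec, which also rejects the overlong `0xC0 0x80` form that Modified UTF-8 uses for NUL.

```python
import struct

STRING_ID = 0x0B


def encode_string(text: str, big_endian: bool = False) -> bytes:
    data = text.encode("utf-8")
    # length counts UTF-8 bytes, not characters.
    length = struct.pack(">I" if big_endian else "<I", len(data))
    return bytes([STRING_ID]) + length + data


def decode_string(buf: bytes, big_endian: bool = False) -> str:
    if len(buf) < 5 or buf[0] != STRING_ID:
        raise ValueError("not a String value")
    (length,) = struct.unpack_from(">I" if big_endian else "<I", buf, 1)
    data = buf[5:5 + length]
    if len(data) != length:
        raise ValueError("truncated string data")
    # Strict decoding raises UnicodeDecodeError (a ValueError subclass)
    # on any invalid UTF-8 sequence.
    return data.decode("utf-8")
```
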
<!-- TOC --><a name="43-option-type"></a>

### 4.3 Option Type

```
[type_id: u8] [inner_type_id: u8] [discriminant: u8] [payload?]
```

Note that the inner type ID is preserved so that type information is
available even in case of a `None` value.

- `inner_type_id`: Type ID of the contained value
- `discriminant`:
  - `0x00`: None (no payload follows)
  - `0x01`: Some (payload follows)
- `payload`: Present only when discriminant is `0x01`

**Examples:**

- `Option<u32>::None`: `[0x0C] [0x04] [0x00]`
- `Option<u32>::Some(42)`: `[0x0C] [0x04] [0x01] [42, 0, 0, 0]` (little-endian)

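The two examples above can be reproduced with a short sketch. `encode_option_u32` is an illustrative helper that handles only `Option<u32>`. It writes the payload as the raw value without a second type ID, which is how the examples above lay it out.

```python
import struct
from typing import Optional

OPTION_ID = 0x0C
U32_ID = 0x04


def encode_option_u32(value: Optional[int], big_endian: bool = False) -> bytes:
    # The inner type ID is always written, even for None.
    head = bytes([OPTION_ID, U32_ID])
    if value is None:
        return head + b"\x00"  # discriminant None, no payload
    payload = struct.pack(">I" if big_endian else "<I", value)
    return head + b"\x01" + payload  # discriminant Some, then the raw u32
```
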
<!-- TOC --><a name="44-list-type-heterogeneous"></a>

### 4.4 List Type (Heterogeneous)

```
[type_id: u8] [length: u32] [element_1] [element_2] ... [element_n]
```

- `length`: Number of elements
- Each element is a complete typed value (type_id + data)
- Elements may have different types

**Example:** List `[42u8, "hello", true]`

```
[0x0D]                               // List type
[3, 0, 0, 0]                         // 3 elements
[0x00] [42]                          // u8: 42
[0x0B] [5, 0, 0, 0] [h, e, l, l, o]  // String: "hello"
[0x0A] [1]                           // bool: true
```

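One way to assemble such a list is to encode each element on its own and then wrap the results. The sketch below assumes little-endian output by default; `encode_list` is a hypothetical helper.

```python
import struct
from typing import Sequence

LIST_ID = 0x0D


def encode_list(encoded_elements: Sequence[bytes], big_endian: bool = False) -> bytes:
    """Wrap already-encoded typed values (type_id + data) in a List."""
    count = struct.pack(">I" if big_endian else "<I", len(encoded_elements))
    return bytes([LIST_ID]) + count + b"".join(encoded_elements)


elements = [
    bytes([0x00, 42]),                     # u8: 42
    bytes([0x0B, 5, 0, 0, 0]) + b"hello",  # String: "hello"
    bytes([0x0A, 1]),                      # bool: true
]
encoded = encode_list(elements)
```

Because every element carries its own type ID, the list header needs only the element count.
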
<!-- TOC --><a name="45-map-type"></a>

### 4.5 Map Type

```
[type_id: u8] [length: u32] [pair_1] [pair_2] ... [pair_n]
```

Each key-value pair:

```
[key] [value]
```

- `length`: Number of key-value pairs
- Both `key` and `value` are complete typed values
- `key` MUST be a map key compatible type (see Section 3.3)
- Keys SHOULD be unique (behavior for duplicate keys is undefined)

**Example:** Map `{42u8: "answer", "pi": 3.14f32}`

```
[0x0E] [2, 0, 0, 0]                                         // Map with 2 pairs
[0x00] [42] [0x0B] [6, 0, 0, 0] [a, n, s, w, e, r]          // 42u8 -> "answer"
[0x0B] [2, 0, 0, 0] [p, i] [0x08] [0xC3, 0xF5, 0x48, 0x40]  // "pi" -> 3.14f32
```

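The map example can be rebuilt in a few lines. This sketch takes pre-encoded keys and values and checks the key's type ID against the exclusions in Section 3.3. `encode_map` and `NON_KEY_TYPE_IDS` are illustrative names.

```python
import struct
from typing import Sequence, Tuple

MAP_ID = 0x0E
NON_KEY_TYPE_IDS = {0x0C, 0x0D, 0x0E, 0x0F}  # Option, List, Map, Array


def encode_map(pairs: Sequence[Tuple[bytes, bytes]], big_endian: bool = False) -> bytes:
    """Wrap (encoded_key, encoded_value) pairs in a Map."""
    out = bytearray([MAP_ID])
    out += struct.pack(">I" if big_endian else "<I", len(pairs))
    for key, value in pairs:
        if key[0] in NON_KEY_TYPE_IDS:
            raise ValueError(f"type {key[0]:#04x} is not valid as a map key")
        out += key + value
    return bytes(out)


pairs = [
    (bytes([0x00, 42]), bytes([0x0B, 6, 0, 0, 0]) + b"answer"),
    (bytes([0x0B, 2, 0, 0, 0]) + b"pi", bytes([0x08]) + struct.pack("<f", 3.14)),
]
encoded = encode_map(pairs)
```
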
<!-- TOC --><a name="46-array-type-homogeneous"></a>

### 4.6 Array Type (Homogeneous)

```
[type_id: u8] [length: u32] [element_type_id: u8] [element_1] ... [element_n]
```

- `length`: Number of elements
- `element_type_id`: MUST be an array-compatible type (see Section 3.2)
- Elements are stored as raw values (no type_id prefix per element)

**Example:** `i32` array `[1, 2, 3]`

```
[0x0F]        // Array type
[3, 0, 0, 0]  // 3 elements
[0x05]        // element type: i32
[1, 0, 0, 0]  // 1
[2, 0, 0, 0]  // 2
[3, 0, 0, 0]  // 3
```

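A compact way to produce this layout is to pack all elements with a single `struct` call. The sketch below is an assumption-laden illustration: `ELEMENT_FORMATS` and `encode_array` are hypothetical names, and only the array-compatible types from Section 3.2 are accepted.

```python
import struct

ARRAY_ID = 0x0F
# Array-compatible type IDs mapped to struct format characters.
ELEMENT_FORMATS = {
    0x00: "B", 0x01: "b", 0x02: "H", 0x03: "h",
    0x04: "I", 0x05: "i", 0x06: "Q", 0x07: "q",
    0x08: "f", 0x09: "d", 0x0A: "?",
}


def encode_array(element_type_id: int, values, big_endian: bool = False) -> bytes:
    if element_type_id not in ELEMENT_FORMATS:
        raise ValueError(f"type {element_type_id:#04x} is not array-compatible")
    prefix = ">" if big_endian else "<"
    head = (
        bytes([ARRAY_ID])
        + struct.pack(prefix + "I", len(values))
        + bytes([element_type_id])
    )
    # A repeat count packs every element back to back with no per-element type ID.
    fmt = f"{prefix}{len(values)}{ELEMENT_FORMATS[element_type_id]}"
    return head + struct.pack(fmt, *values)
```

Compared with a `List`, this saves one byte per element, since the type ID is written once for the whole array.
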
<!-- TOC --><a name="47-timestamp-type"></a>

### 4.7 Timestamp Type

```
[type_id: u8] [value: i64]
```

- `value`: Milliseconds since Unix epoch
- Negative values represent times before the epoch

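Converting a timezone-aware `datetime` into this encoding might look like the following. `encode_timestamp` is a hypothetical helper. Sub-millisecond precision is floored, which is a choice made for this sketch and is not mandated by the text above.

```python
import struct
from datetime import datetime, timedelta, timezone

TIMESTAMP_ID = 0x10
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode_timestamp(moment: datetime, big_endian: bool = False) -> bytes:
    """Encode a timezone-aware datetime as milliseconds since the Unix epoch."""
    # Dividing two timedeltas with // yields an int and avoids float rounding.
    millis = (moment - EPOCH) // timedelta(milliseconds=1)
    return bytes([TIMESTAMP_ID]) + struct.pack(">q" if big_endian else "<q", millis)
```
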
<!-- TOC --><a name="48-uuid-type"></a>

### 4.8 UUID Type

```
[type_id: u8] [bytes: [u8; 16]]
```

- `bytes`: 16 bytes representing the UUID in **big-endian** byte order
- Follows [RFC 4122][rfc4122] standard binary representation
- Byte order is independent of file endianness flag

**Example:** UUID `550e8400-e29b-41d4-a716-446655440000`

```
[0x11]                                           // UUID type
[0x55, 0x0e, 0x84, 0x00, 0xe2, 0x9b, 0x41, 0xd4, // UUID bytes
 0xa7, 0x16, 0x44, 0x66, 0x55, 0x44, 0x00, 0x00] // (big-endian)
```

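In Python, the standard `uuid` module already exposes the big-endian form as `UUID.bytes`, so a sketch of the encoder is short. `encode_uuid` is an illustrative name.

```python
import uuid

UUID_ID = 0x11


def encode_uuid(value: uuid.UUID) -> bytes:
    # UUID.bytes is the 16-byte big-endian representation; it is written
    # unchanged regardless of the file's endianness flag.
    return bytes([UUID_ID]) + value.bytes


encoded = encode_uuid(uuid.UUID("550e8400-e29b-41d4-a716-446655440000"))
```

Note that `UUID.bytes_le` uses a different, mixed-endian field order and would not match the layout shown above.
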
<!-- TOC --><a name="5-file-format"></a>

## 5. File Format

<!-- TOC --><a name="51-file-structure"></a>

### 5.1 File Structure

```
[header] [payload]
```

<!-- TOC --><a name="52-header-format"></a>

### 5.2 Header Format

```
[magic: [u8; 4]] [version: u8] [flags: u8] [compression: u8] [payload_length: u32]
```

<!-- TOC --><a name="521-magic-bytes"></a>

#### 5.2.1 Magic Bytes

Fixed 4-byte signature: `HTNO` (0x48, 0x54, 0x4e, 0x4f).

<!-- TOC --><a name="5211-file-extension-and-mime-type"></a>

##### 5.2.1.1 File Extension and MIME type

The recommended file extension is `.ht`. The recommended MIME type is
`application/x-hateno`.

<!-- TOC --><a name="522-version"></a>

#### 5.2.2 Version

- `0x01`: 1.0 (current version)
- Future versions increment this value

<!-- TOC --><a name="523-flags"></a>

#### 5.2.3 Flags

8-bit flag field:

- Bit 0: Endianness (0 = little-endian, 1 = big-endian)
- Bits 1-7: Reserved (MUST be zero)

<!-- TOC --><a name="524-compression-method"></a>

#### 5.2.4 Compression Method

- `0x00`: No compression
- `0x01`: Gzip compression ([RFC 1952][rfc1952])
- `0x02`: Zlib compression ([RFC 1950][rfc1950])
- `0x03`: LZ4 compression
- `0x04-0xFF`: Reserved for future compression methods

<!-- TOC --><a name="525-payload-length"></a>

#### 5.2.5 Payload Length

- Length of the payload in bytes (u32, in the file's endianness)
- For compressed files: length of the compressed data
- For uncompressed files: length of the raw data

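Putting the header fields together, a writer could look roughly like this. `build_file` and `COMPRESSORS` are hypothetical names. LZ4 (`0x03`) is left out because it is not in the Python standard library, so this sketch covers only the first three methods.

```python
import gzip
import struct
import zlib

MAGIC = b"HTNO"
VERSION = 0x01
# Compression method ID -> function applied to the raw payload.
COMPRESSORS = {
    0x00: lambda raw: raw,
    0x01: gzip.compress,
    0x02: zlib.compress,
}


def build_file(raw_payload: bytes, compression: int = 0x00,
               big_endian: bool = False) -> bytes:
    if compression not in COMPRESSORS:
        raise ValueError(f"unsupported compression method: {compression:#04x}")
    stored = COMPRESSORS[compression](raw_payload)
    flags = 0x01 if big_endian else 0x00  # bit 0 only; bits 1-7 stay zero
    header = MAGIC + bytes([VERSION, flags, compression])
    # payload_length is the size of the bytes actually stored, i.e. the
    # compressed size when compression is enabled.
    header += struct.pack(">I" if big_endian else "<I", len(stored))
    return header + stored
```

The header is always 11 bytes: 4 (magic) + 1 (version) + 1 (flags) + 1 (compression) + 4 (payload length).
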
<!-- TOC --><a name="53-payload"></a>

### 5.3 Payload

The payload contains a single root typed value, typically a `Map`.

<!-- TOC --><a name="6-example"></a>

## 6. Example

Complete uncompressed little-endian file containing `{"test": 42i32}`:

```
[0x48, 0x54, 0x4e, 0x4f]          // Magic: "HTNO"
[0x01]                            // Version: 1.0
[0x00]                            // Flags: little-endian
[0x00]                            // Compression: none
[19, 0, 0, 0]                     // Payload length: 19 bytes

// Payload: Map with one entry
[0x0E]                            // Map type
[1, 0, 0, 0]                      // 1 key-value pair
[0x0B] [4, 0, 0, 0] [t, e, s, t]  // String key: "test"
[0x05] [42, 0, 0, 0]              // i32 value: 42
```

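The same file can be assembled byte by byte, computing the payload length from the encoded bytes instead of hard-coding it. This is an illustrative sketch; the variable names are arbitrary.

```python
import struct

# Payload: Map with one String key and one i32 value (little-endian).
key = bytes([0x0B]) + struct.pack("<I", 4) + b"test"     # 1 + 4 + 4 = 9 bytes
value = bytes([0x05]) + struct.pack("<i", 42)            # 1 + 4     = 5 bytes
payload = bytes([0x0E]) + struct.pack("<I", 1) + key + value  # 1 + 4 + 9 + 5 = 19 bytes

# Header: magic, version 1.0, flags (little-endian), no compression, length.
header = b"HTNO" + bytes([0x01, 0x00, 0x00]) + struct.pack("<I", len(payload))
document = header + payload
```

The resulting `document` is 30 bytes long: an 11-byte header followed by the 19-byte payload.
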
<!-- TOC --><a name="7-conformance-requirements"></a>

## 7. Conformance Requirements

- Implementations MUST reject files with invalid magic bytes
- Implementations MUST support at least version `0x01`
- Unknown compression methods SHOULD be rejected
- Invalid UTF-8 in strings MUST be rejected
- Boolean values other than 0x00/0x01 MUST be rejected
- Reserved flag bits MUST be zero in generated files

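A reader-side header check covering the file-level requirements above might be sketched as follows. `parse_header` is a hypothetical function. Rejecting versions other than `0x01` and rejecting non-zero reserved flag bits on read are stricter choices made for this sketch, since the list above only requires support for version `0x01` and zero reserved bits in generated files.

```python
import struct

HEADER_SIZE = 11
KNOWN_COMPRESSION = {0x00, 0x01, 0x02, 0x03}


def parse_header(data: bytes):
    """Return (big_endian, compression, payload_length) or raise ValueError."""
    if len(data) < HEADER_SIZE:
        raise ValueError("truncated header")
    if data[:4] != b"HTNO":
        raise ValueError("invalid magic bytes")
    version, flags, compression = data[4], data[5], data[6]
    if version != 0x01:
        raise ValueError(f"unsupported version: {version:#04x}")
    if flags & 0xFE:  # assumption: a strict reader also rejects reserved bits
        raise ValueError("reserved flag bits are set")
    if compression not in KNOWN_COMPRESSION:
        raise ValueError(f"unknown compression method: {compression:#04x}")
    big_endian = bool(flags & 0x01)
    # The endianness flag must be read before payload_length can be decoded.
    (payload_length,) = struct.unpack_from(">I" if big_endian else "<I", data, 7)
    return big_endian, compression, payload_length
```
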
[rfc2119]: https://www.rfc-editor.org/rfc/rfc2119
[mutf8]: https://docs.oracle.com/en/java/javase/23/docs/api/java.base/java/io/DataInput.html#modified-utf-8
[rfc1950]: https://www.rfc-editor.org/rfc/rfc1950
[rfc1952]: https://www.rfc-editor.org/rfc/rfc1952
[rfc4122]: https://www.rfc-editor.org/rfc/rfc4122