compact binary serialization format with built-in compression
# Hateno Specification

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
interpreted as described in [RFC 2119][rfc2119].

## 0. Table of Contents

<!-- TOC start (generated with https://github.com/derlin/bitdowntoc) -->

- [1. Overview](#1-overview)
- [2. Data Representation](#2-data-representation)
  - [2.1 Numeric Encoding](#21-numeric-encoding)
  - [2.2 String Encoding](#22-string-encoding)
- [3. Type System](#3-type-system)
  - [3.1 Type Identifiers](#31-type-identifiers)
  - [3.2 Array-Compatible Types](#32-array-compatible-types)
  - [3.3 Map Key Compatible Types](#33-map-key-compatible-types)
- [4. Binary Encoding](#4-binary-encoding)
  - [4.1 Primitive Types](#41-primitive-types)
    - [4.1.1 Integer Types](#411-integer-types)
    - [4.1.2 Floating-Point Types](#412-floating-point-types)
    - [4.1.3 Boolean](#413-boolean)
  - [4.2 String Type](#42-string-type)
  - [4.3 Option Type](#43-option-type)
  - [4.4 List Type (Heterogeneous)](#44-list-type-heterogeneous)
  - [4.5 Map Type](#45-map-type)
  - [4.6 Array Type (Homogeneous)](#46-array-type-homogeneous)
  - [4.7 Timestamp Type](#47-timestamp-type)
  - [4.8 UUID Type](#48-uuid-type)
- [5. File Format](#5-file-format)
  - [5.1 File Structure](#51-file-structure)
  - [5.2 Header Format](#52-header-format)
    - [5.2.1 Magic Bytes](#521-magic-bytes)
      - [5.2.1.1 File Extension and MIME type](#5211-file-extension-and-mime-type)
    - [5.2.2 Version](#522-version)
    - [5.2.3 Flags](#523-flags)
    - [5.2.4 Compression Method](#524-compression-method)
    - [5.2.5 Payload Length](#525-payload-length)
  - [5.3 Payload](#53-payload)
- [6. Example](#6-example)
- [7. Conformance Requirements](#7-conformance-requirements)

<!-- TOC end -->

<!-- TOC --><a name="1-overview"></a>

## 1. Overview

Hateno is a binary serialization format designed to be simple to read and write.

<!-- TOC --><a name="2-data-representation"></a>

## 2. Data Representation

<!-- TOC --><a name="21-numeric-encoding"></a>

### 2.1 Numeric Encoding

- Unsigned integers: Standard binary representation
- Signed integers: Two's complement representation
- Floating-point: IEEE 754 standard (binary32 for f32, binary64 for f64)
- Endianness: Determined by file header flags (see Section 5.2.3)

<!-- TOC --><a name="22-string-encoding"></a>

### 2.2 String Encoding

All strings MUST be encoded as UTF-8. Alternate encoding schemes, such as Java's
[Modified UTF-8][mutf8], are explicitly NOT supported.
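
As an illustration of the UTF-8 requirement, a Python reader can rely on the standard library's strict `utf-8` codec, which already rejects the overlong `0xC0 0x80` sequence that Modified UTF-8 uses for U+0000, as well as encoded surrogate halves. This is a sketch, not a reference implementation; the name `decode_hateno_string` is hypothetical.

```python
def decode_hateno_string(data: bytes) -> str:
    """Decode String payload bytes, accepting only standard UTF-8."""
    try:
        # Python's "utf-8" codec is strict by default: it rejects overlong
        # forms such as 0xC0 0x80 and encoded UTF-16 surrogate halves.
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(f"invalid UTF-8 in string: {exc}") from exc


print(decode_hateno_string("héllo".encode("utf-8")))

try:
    # Modified UTF-8 writes U+0000 as 0xC0 0x80; a conforming reader rejects it.
    decode_hateno_string(b"a\xc0\x80b")
except ValueError as err:
    print("rejected:", err)
```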

<!-- TOC --><a name="3-type-system"></a>

## 3. Type System

<!-- TOC --><a name="31-type-identifiers"></a>

### 3.1 Type Identifiers

| ID   | Type      | Description                   | Size (bytes) |
| ---- | --------- | ----------------------------- | ------------ |
| 0x00 | u8        | Unsigned 8-bit integer        | 1            |
| 0x01 | i8        | Signed 8-bit integer          | 1            |
| 0x02 | u16       | Unsigned 16-bit integer       | 2            |
| 0x03 | i16       | Signed 16-bit integer         | 2            |
| 0x04 | u32       | Unsigned 32-bit integer       | 4            |
| 0x05 | i32       | Signed 32-bit integer         | 4            |
| 0x06 | u64       | Unsigned 64-bit integer       | 8            |
| 0x07 | i64       | Signed 64-bit integer         | 8            |
| 0x08 | f32       | 32-bit IEEE 754 float         | 4            |
| 0x09 | f64       | 64-bit IEEE 754 float         | 8            |
| 0x0A | bool      | Boolean value                 | 1            |
| 0x0B | String    | UTF-8 encoded string          | Variable     |
| 0x0C | Option    | Optional value container      | Variable     |
| 0x0D | List      | Heterogeneous array           | Variable     |
| 0x0E | Map       | Key-value dictionary          | Variable     |
| 0x0F | Array     | Homogeneous typed array       | Variable     |
| 0x10 | Timestamp | Unix timestamp (milliseconds) | 8            |
| 0x11 | UUID      | 128-bit UUID                  | 16           |

**Note**: Type IDs 0x12-0xFF are reserved for future expansion. Sizes are those
of the encoded value alone and do not include the 1-byte type ID that precedes
it (see Section 4).
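
A reader typically needs the table above as data. The Python sketch below uses hypothetical names (`TypeId`, `FIXED_SIZES`, `parse_type_id`) and chooses to reject reserved IDs outright; the specification only says they are reserved, so treating them as errors is an implementation decision.

```python
from enum import IntEnum


class TypeId(IntEnum):
    """Type identifiers from Section 3.1 (member names are illustrative)."""
    U8 = 0x00
    I8 = 0x01
    U16 = 0x02
    I16 = 0x03
    U32 = 0x04
    I32 = 0x05
    U64 = 0x06
    I64 = 0x07
    F32 = 0x08
    F64 = 0x09
    BOOL = 0x0A
    STRING = 0x0B
    OPTION = 0x0C
    LIST = 0x0D
    MAP = 0x0E
    ARRAY = 0x0F
    TIMESTAMP = 0x10
    UUID = 0x11


# Encoded value sizes for fixed-size types, excluding the leading type ID byte.
FIXED_SIZES = {
    TypeId.U8: 1, TypeId.I8: 1, TypeId.U16: 2, TypeId.I16: 2,
    TypeId.U32: 4, TypeId.I32: 4, TypeId.U64: 8, TypeId.I64: 8,
    TypeId.F32: 4, TypeId.F64: 8, TypeId.BOOL: 1,
    TypeId.TIMESTAMP: 8, TypeId.UUID: 16,
}


def parse_type_id(byte: int) -> TypeId:
    """Map a raw byte to a TypeId; this sketch treats 0x12-0xFF as errors."""
    try:
        return TypeId(byte)
    except ValueError:
        raise ValueError(f"reserved or unknown type ID 0x{byte:02X}") from None
```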

<!-- TOC --><a name="32-array-compatible-types"></a>

### 3.2 Array-Compatible Types

The following types are valid as `Array` element types:

- All integer types: `u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`
- All floating-point types: `f32`, `f64`
- Boolean type: `bool`

All other types (`String`, `Option`, `List`, `Map`, `Array`, `Timestamp`,
`UUID`) are not valid `Array` element types; sequences of them MUST be encoded
using the heterogeneous `List` type.

<!-- TOC --><a name="33-map-key-compatible-types"></a>

### 3.3 Map Key Compatible Types

All types are valid as `Map` key types except:

- `Option`
- `List`
- `Map`
- `Array`
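
Because the integer, floating-point, and boolean IDs occupy the contiguous range 0x00-0x0A, the rules in Sections 3.2 and 3.3 reduce to set membership. A minimal Python sketch; the helper names are hypothetical.

```python
# Type IDs from Section 3.1 that the checks below need.
U8, BOOL, OPTION, LIST, MAP, ARRAY, UUID = 0x00, 0x0A, 0x0C, 0x0D, 0x0E, 0x0F, 0x11

# Section 3.2: integers, floats, and bool (IDs 0x00 through 0x0A).
ARRAY_ELEMENT_TYPES = frozenset(range(U8, BOOL + 1))

# Section 3.3: every defined type except the container types.
MAP_KEY_TYPES = frozenset(range(U8, UUID + 1)) - {OPTION, LIST, MAP, ARRAY}


def is_array_element_type(type_id: int) -> bool:
    return type_id in ARRAY_ELEMENT_TYPES


def is_map_key_type(type_id: int) -> bool:
    return type_id in MAP_KEY_TYPES
```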

<!-- TOC --><a name="4-binary-encoding"></a>

## 4. Binary Encoding

<!-- TOC --><a name="41-primitive-types"></a>

### 4.1 Primitive Types

<!-- TOC --><a name="411-integer-types"></a>

#### 4.1.1 Integer Types

```
[type_id: u8] [value: T]
```

Where `T` is the integer type identified by `type_id` (see Section 3.1),
encoded in the file's endianness.

<!-- TOC --><a name="412-floating-point-types"></a>

#### 4.1.2 Floating-Point Types

```
[type_id: u8] [value: T]
```

Where `T` follows IEEE 754 encoding in the file's endianness.
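
Python's `struct` module maps directly onto these encodings: the `<` and `>` prefixes select little- and big-endian byte order with standard sizes and no padding, and the `f`/`d` codes produce IEEE 754 binary32/binary64. In the sketch below (function names are illustrative), the `order` argument stands in for the endianness flag read from the header.

```python
import struct

# struct format codes for the numeric type IDs of Section 3.1.
NUMERIC_FORMATS = {
    0x00: "B", 0x01: "b", 0x02: "H", 0x03: "h",  # u8, i8, u16, i16
    0x04: "I", 0x05: "i", 0x06: "Q", 0x07: "q",  # u32, i32, u64, i64
    0x08: "f", 0x09: "d",                        # f32, f64
}


def encode_numeric(type_id: int, value, order: str = "<") -> bytes:
    """Encode [type_id: u8] [value: T]; order is "<" (little) or ">" (big endian)."""
    # struct.error is raised if an integer does not fit the chosen width.
    return bytes([type_id]) + struct.pack(order + NUMERIC_FORMATS[type_id], value)


def decode_numeric(data: bytes, offset: int, order: str = "<"):
    """Decode one typed numeric value; returns (value, next_offset)."""
    fmt = order + NUMERIC_FORMATS[data[offset]]  # KeyError for non-numeric IDs
    (value,) = struct.unpack_from(fmt, data, offset + 1)
    return value, offset + 1 + struct.calcsize(fmt)
```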

<!-- TOC --><a name="413-boolean"></a>

#### 4.1.3 Boolean

```
[type_id: u8] [value: u8]
```

- `0x00`: false
- `0x01`: true
- All other values are invalid
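
A decoder sketch in Python that enforces the two valid byte values. The names are hypothetical, and the `(value, next_offset)` return convention is one possible design, not something the specification prescribes.

```python
BOOL = 0x0A


def encode_bool(value: bool) -> bytes:
    return bytes([BOOL, 0x01 if value else 0x00])


def decode_bool(data: bytes, offset: int):
    """Decode [0x0A] [value: u8] at offset; returns (value, next_offset)."""
    if data[offset] != BOOL:
        raise ValueError("not a bool")
    raw = data[offset + 1]
    if raw not in (0x00, 0x01):
        # Any other byte is invalid and must be rejected (Section 7).
        raise ValueError(f"invalid bool byte 0x{raw:02X}")
    return raw == 0x01, offset + 2
```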

<!-- TOC --><a name="42-string-type"></a>

### 4.2 String Type

```
[type_id: u8] [length: u32] [utf8_data: [u8; length]]
```

- `length`: Number of bytes in the UTF-8 encoding (not the number of characters)
- `utf8_data`: Valid UTF-8 byte sequence
- Empty strings have length 0 and no data bytes
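
A Python sketch of string encoding and decoding, assuming the value is read from a buffer at a known offset; `encode_string` and `decode_string` are illustrative names. Because `length` is taken from the encoded bytes, a one-character string such as `"é"` has length 2.

```python
import struct

STRING = 0x0B


def encode_string(text: str, order: str = "<") -> bytes:
    """Encode [0x0B] [length: u32] [utf8_data]; length counts bytes, not characters."""
    data = text.encode("utf-8")
    return bytes([STRING]) + struct.pack(order + "I", len(data)) + data


def decode_string(buf: bytes, offset: int, order: str = "<"):
    """Decode a typed string at offset; returns (text, next_offset)."""
    if buf[offset] != STRING:
        raise ValueError("not a String")
    (length,) = struct.unpack_from(order + "I", buf, offset + 1)
    start = offset + 5  # skip the type ID and the 4-byte length
    end = start + length
    if end > len(buf):
        raise ValueError("string data truncated")
    # Strict decoding raises UnicodeDecodeError (a ValueError) on invalid UTF-8.
    return buf[start:end].decode("utf-8"), end
```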

<!-- TOC --><a name="43-option-type"></a>

### 4.3 Option Type

```
[type_id: u8] [inner_type_id: u8] [discriminant: u8] [payload?]
```

- `inner_type_id`: Type ID of the contained value
- `discriminant`:
  - `0x00`: None (no payload follows)
  - `0x01`: Some (payload follows)
- `payload`: Present only when `discriminant` is `0x01`; the inner value is
  written without its own type ID, since `inner_type_id` already identifies it

**Examples:**

- `Option<u32>::None`: `[0x0C] [0x04] [0x00]`
- `Option<u32>::Some(42)`: `[0x0C] [0x04] [0x01] [42, 0, 0, 0]` (little-endian)
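
The Python sketch below reproduces the two examples above. It assumes, as those examples show, that the `Some` payload is the bare inner value without a type ID, and it handles only a small hypothetical subset of numeric inner types to stay short.

```python
import struct

OPTION = 0x0C
# Hypothetical subset: untagged struct codes for a few fixed-size inner types.
RAW_FORMATS = {0x00: "B", 0x04: "I", 0x05: "i", 0x09: "d"}


def encode_option(inner_type_id: int, value=None, order: str = "<") -> bytes:
    """Encode an Option: value=None gives None, anything else gives Some(value).

    The Some payload is written without its own type ID, matching the
    Section 4.3 examples.
    """
    head = bytes([OPTION, inner_type_id])
    if value is None:
        return head + b"\x00"
    return head + b"\x01" + struct.pack(order + RAW_FORMATS[inner_type_id], value)
```

Using Python's `None` as the "absent" marker means this sketch cannot express `Some(None)` for a nested option; a fuller encoder would need a dedicated sentinel.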

<!-- TOC --><a name="44-list-type-heterogeneous"></a>

### 4.4 List Type (Heterogeneous)

```
[type_id: u8] [length: u32] [element_1] [element_2] ... [element_n]
```

- `length`: Number of elements
- Each element is a complete typed value (type_id + data)
- Elements may have different types

**Example:** List `[42u8, "hello", true]` (little-endian)

```
[0x0D]                              // List type
[3, 0, 0, 0]                        // 3 elements
[0x00] [42]                         // u8: 42
[0x0B] [5, 0, 0, 0] [h, e, l, l, o] // String: "hello"
[0x0A] [1]                          // bool: true
```
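
A recursive Python sketch that produces the example bytes above. The mapping from Python values to Hateno types is an assumption made for illustration (every Python `int` becomes a `u8` here); a real encoder would need an explicit way to choose the integer width.

```python
import struct


def encode_value(value, order: str = "<") -> bytes:
    """Encode a complete typed value (type ID + data) for a few Python types."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return bytes([0x0A, 0x01 if value else 0x00])
    if isinstance(value, int):  # assumption for this sketch: every int is a u8
        return bytes([0x00]) + struct.pack(order + "B", value)
    if isinstance(value, str):
        data = value.encode("utf-8")
        return bytes([0x0B]) + struct.pack(order + "I", len(data)) + data
    if isinstance(value, list):
        return encode_list(value, order)  # lists may nest
    raise TypeError(f"unsupported type: {type(value).__name__}")


def encode_list(items: list, order: str = "<") -> bytes:
    """Encode [0x0D] [length: u32] followed by each element as a typed value."""
    out = bytearray([0x0D])
    out += struct.pack(order + "I", len(items))
    for item in items:
        out += encode_value(item, order)
    return bytes(out)
```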

<!-- TOC --><a name="45-map-type"></a>

### 4.5 Map Type

```
[type_id: u8] [length: u32] [pair_1] [pair_2] ... [pair_n]
```

Each key-value pair:

```
[key] [value]
```

- `length`: Number of key-value pairs
- Both `key` and `value` are complete typed values
- `key` MUST be a map key compatible type (see Section 3.3)
- Keys SHOULD be unique (behavior for duplicate keys is undefined)

**Example:** Map `{42u8: "answer", "pi": 3.14f32}` (little-endian)

```
[0x0E] [2, 0, 0, 0]                                      // Map with 2 pairs
[0x00] [42] [0x0B] [6, 0, 0, 0] [a, n, s, w, e, r]       // 42u8 -> "answer"
[0x0B] [2, 0, 0, 0] [p, i] [0x08] [0xC3, 0xF5, 0x48, 0x40] // "pi" -> 3.14f32
```
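
Since a Python `dict` cannot record which Hateno type each key should have, this sketch takes a list of already-encoded `(key, value)` byte pairs and only adds the map framing and the Section 3.3 key check. `encode_map` and `typed_str` are hypothetical helpers, and duplicate keys are not detected here.

```python
import struct

# Option, List, Map, and Array may not be used as keys (Section 3.3).
INVALID_KEY_TYPES = {0x0C, 0x0D, 0x0E, 0x0F}


def typed_str(text: str, order: str = "<") -> bytes:
    """Encode a complete String value (hypothetical helper)."""
    data = text.encode("utf-8")
    return bytes([0x0B]) + struct.pack(order + "I", len(data)) + data


def encode_map(pairs, order: str = "<") -> bytes:
    """Encode [0x0E] [length: u32] then each [key] [value].

    `pairs` is a list of (key_bytes, value_bytes), each a complete typed value.
    A list, not a dict, keeps the on-wire order explicit.
    """
    out = bytearray([0x0E])
    out += struct.pack(order + "I", len(pairs))
    for key, value in pairs:
        if key[0] in INVALID_KEY_TYPES:
            raise ValueError(f"type 0x{key[0]:02X} cannot be a map key")
        out += key + value
    return bytes(out)


# The example above: {42u8: "answer", "pi": 3.14f32}
example = encode_map([
    (bytes([0x00, 42]), typed_str("answer")),
    (typed_str("pi"), bytes([0x08]) + struct.pack("<f", 3.14)),
])
```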

<!-- TOC --><a name="46-array-type-homogeneous"></a>

### 4.6 Array Type (Homogeneous)

```
[type_id: u8] [length: u32] [element_type_id: u8] [element_1] ... [element_n]
```

- `length`: Number of elements
- `element_type_id`: MUST be an array-compatible type (see Section 3.2)
- Elements are stored as raw values (no type_id prefix per element)

**Example:** `i32` array `[1, 2, 3]` (little-endian)

```
[0x0F]       // Array type
[3, 0, 0, 0] // 3 elements
[0x05]       // element type: i32
[1, 0, 0, 0] // 1
[2, 0, 0, 0] // 2
[3, 0, 0, 0] // 3
```
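
Because elements carry no per-element type ID, an array of fixed-size values can be packed with a single `struct` call. A Python sketch under that assumption, with a hypothetical `encode_array` helper; `bool` elements are written as one byte each (`0x00`/`0x01`).

```python
import struct

# Section 3.2 element types and their untagged struct codes.
ELEMENT_FORMATS = {
    0x00: "B", 0x01: "b", 0x02: "H", 0x03: "h", 0x04: "I", 0x05: "i",
    0x06: "Q", 0x07: "q", 0x08: "f", 0x09: "d", 0x0A: "B",
}


def encode_array(element_type_id: int, values, order: str = "<") -> bytes:
    """Encode [0x0F] [length: u32] [element_type_id: u8] then untagged elements."""
    if element_type_id not in ELEMENT_FORMATS:
        raise ValueError(f"type 0x{element_type_id:02X} is not array-compatible")
    fmt = ELEMENT_FORMATS[element_type_id]
    header = bytes([0x0F]) + struct.pack(order + "I", len(values)) + bytes([element_type_id])
    # One pack call for all elements, e.g. "<3i" for three little-endian i32 values.
    return header + struct.pack(f"{order}{len(values)}{fmt}", *values)
```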

<!-- TOC --><a name="47-timestamp-type"></a>

### 4.7 Timestamp Type

```
[type_id: u8] [value: i64]
```

- `value`: Milliseconds since the Unix epoch (1970-01-01T00:00:00Z), in the
  file's endianness
- Negative values represent times before the epoch
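
A Python sketch converting between timezone-aware `datetime` values and the millisecond count. The helper names are hypothetical, and the code assumes inputs carry a timezone (subtracting a naive `datetime` from the aware epoch raises `TypeError`). Truncating sub-millisecond precision by flooring is a choice made here; the specification does not say how to round.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode_timestamp(dt: datetime, order: str = "<") -> bytes:
    """Encode [0x10] [value: i64] as whole milliseconds since the Unix epoch."""
    # timedelta // timedelta is exact integer floor division (no float rounding).
    millis = (dt - EPOCH) // timedelta(milliseconds=1)
    return bytes([0x10]) + struct.pack(order + "q", millis)


def decode_timestamp(data: bytes, offset: int, order: str = "<"):
    """Decode a typed timestamp at offset; returns (datetime, next_offset)."""
    if data[offset] != 0x10:
        raise ValueError("not a Timestamp")
    (millis,) = struct.unpack_from(order + "q", data, offset + 1)
    return EPOCH + timedelta(milliseconds=millis), offset + 9
```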

<!-- TOC --><a name="48-uuid-type"></a>

### 4.8 UUID Type

```
[type_id: u8] [bytes: [u8; 16]]
```

- `bytes`: 16 bytes representing the UUID in **big-endian** byte order
- Follows the standard binary representation of [RFC 4122][rfc4122]
- Byte order is independent of the file's endianness flag

**Example:** UUID `550e8400-e29b-41d4-a716-446655440000`

```
[0x11]                                           // UUID type
[0x55, 0x0e, 0x84, 0x00, 0xe2, 0x9b, 0x41, 0xd4, // UUID bytes
 0xa7, 0x16, 0x44, 0x66, 0x55, 0x44, 0x00, 0x00] // (big-endian)
```
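
In Python, `uuid.UUID.bytes` already yields the 16 bytes in RFC 4122 (big-endian) order, so no byte swapping is needed regardless of the file's endianness flag. `UUID.bytes_le` should be avoided here, since it byte-swaps the first three fields into the mixed-endian layout used by Microsoft GUIDs. The helper names below are hypothetical.

```python
import uuid


def encode_uuid(value: uuid.UUID) -> bytes:
    """Encode [0x11] [bytes: [u8; 16]]; UUID.bytes is already big-endian."""
    return bytes([0x11]) + value.bytes  # never swapped, whatever the file flag says


def decode_uuid(data: bytes, offset: int):
    """Decode a typed UUID at offset; returns (UUID, next_offset)."""
    if data[offset] != 0x11:
        raise ValueError("not a UUID")
    # uuid.UUID raises ValueError if fewer than 16 bytes are available.
    return uuid.UUID(bytes=data[offset + 1 : offset + 17]), offset + 17
```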

<!-- TOC --><a name="5-file-format"></a>

## 5. File Format

<!-- TOC --><a name="51-file-structure"></a>

### 5.1 File Structure

```
[header] [payload]
```

<!-- TOC --><a name="52-header-format"></a>

### 5.2 Header Format

```
[magic: [u8; 4]] [version: u8] [flags: u8] [compression: u8] [payload_length: u32]
```

<!-- TOC --><a name="521-magic-bytes"></a>

#### 5.2.1 Magic Bytes

Fixed 4-byte signature: `HTNO` (0x48, 0x54, 0x4e, 0x4f).

<!-- TOC --><a name="5211-file-extension-and-mime-type"></a>

##### 5.2.1.1 File Extension and MIME type

The recommended file extension is `.ht`. The recommended MIME type is
`application/x-hateno`.

<!-- TOC --><a name="522-version"></a>

#### 5.2.2 Version

- `0x01`: 1.0 (current version)
- Future versions increment this value

<!-- TOC --><a name="523-flags"></a>

#### 5.2.3 Flags

8-bit flag field:

- Bit 0: Endianness (0 = little-endian, 1 = big-endian)
- Bits 1-7: Reserved (MUST be zero)

<!-- TOC --><a name="524-compression-method"></a>

#### 5.2.4 Compression Method

- `0x00`: No compression
- `0x01`: Gzip compression ([RFC 1952][rfc1952])
- `0x02`: Zlib compression ([RFC 1950][rfc1950])
- `0x03`: LZ4 compression
- `0x04-0xFF`: Reserved for future compression methods
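
A dispatch sketch using Python's standard `gzip` and `zlib` modules. The specification does not say whether method `0x03` means the LZ4 frame format or the raw block format, and the standard library ships no LZ4 codec, so this sketch leaves it unimplemented rather than guessing. The function name is hypothetical.

```python
import gzip
import zlib


def decompress_payload(method: int, payload: bytes) -> bytes:
    """Return the raw payload for a compression byte taken from the header."""
    if method == 0x00:
        return payload
    if method == 0x01:
        return gzip.decompress(payload)  # RFC 1952 gzip data
    if method == 0x02:
        return zlib.decompress(payload)  # RFC 1950 zlib stream
    if method == 0x03:
        # Frame vs. block format is unspecified; needs a third-party LZ4 library.
        raise NotImplementedError("LZ4 decompression is not implemented here")
    raise ValueError(f"unknown compression method 0x{method:02X}")
```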

<!-- TOC --><a name="525-payload-length"></a>

#### 5.2.5 Payload Length

- Length of payload in bytes (u32, in the file's endianness)
- For compressed files: length of compressed data
- For uncompressed files: length of raw data
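
Putting Sections 5.2.1 through 5.2.5 together, the header is a fixed 11 bytes and can be parsed as below. This is a Python sketch with hypothetical names; rejecting non-zero reserved bits and versions other than `0x01` on read is a strictness choice, since Section 7 only requires reserved bits to be zero in generated files and support for at least version `0x01`.

```python
import struct
from typing import NamedTuple

MAGIC = b"HTNO"
HEADER_SIZE = 11  # magic (4) + version (1) + flags (1) + compression (1) + length (4)


class Header(NamedTuple):
    version: int
    big_endian: bool
    compression: int
    payload_length: int


def parse_header(data: bytes) -> Header:
    """Validate and decode the fixed-size header at the start of a file."""
    if len(data) < HEADER_SIZE:
        raise ValueError("file too short for a header")
    if data[0:4] != MAGIC:
        raise ValueError("invalid magic bytes")
    version, flags, compression = data[4], data[5], data[6]
    if version != 0x01:
        raise ValueError(f"unsupported version 0x{version:02X}")
    if flags & 0xFE:  # bits 1-7 are reserved
        raise ValueError("reserved flag bits are set")
    big_endian = bool(flags & 0x01)
    # payload_length is read in the byte order that the flags byte just selected.
    (length,) = struct.unpack_from(">I" if big_endian else "<I", data, 7)
    if len(data) - HEADER_SIZE < length:
        raise ValueError("payload is truncated")
    return Header(version, big_endian, compression, length)
```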

<!-- TOC --><a name="53-payload"></a>

### 5.3 Payload

After decompression (if any), the payload contains a single root typed value,
typically a `Map`.

<!-- TOC --><a name="6-example"></a>

## 6. Example

Complete uncompressed little-endian file containing `{"test": 42i32}`:

```
[0x48, 0x54, 0x4e, 0x4f]         // Magic: "HTNO"
[0x01]                           // Version: 1.0
[0x00]                           // Flags: little-endian
[0x00]                           // Compression: none
[19, 0, 0, 0]                    // Payload length: 19 bytes

// Payload: Map with one entry
[0x0E]                           // Map type
[1, 0, 0, 0]                     // 1 key-value pair
[0x0B] [4, 0, 0, 0] [t, e, s, t] // String key: "test"
[0x05] [42, 0, 0, 0]             // i32 value: 42
```
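
The payload length is easiest to get right by encoding the payload first and measuring it: 1 (map type) + 4 (pair count) + 9 (typed key `"test"`) + 5 (typed `i32`) = 19 bytes. A Python sketch that builds the complete file this way; `build_example_file` is a hypothetical name.

```python
import struct


def build_example_file() -> bytes:
    """Build an uncompressed, little-endian file containing {"test": 42i32}."""
    key = b"test"
    payload = (
        bytes([0x0E]) + struct.pack("<I", 1)                 # Map with 1 pair
        + bytes([0x0B]) + struct.pack("<I", len(key)) + key  # String key "test"
        + bytes([0x05]) + struct.pack("<i", 42)              # i32 value 42
    )
    header = (
        b"HTNO"                               # magic
        + bytes([0x01, 0x00, 0x00])           # version 1.0, little-endian, no compression
        + struct.pack("<I", len(payload))     # payload length, measured not hard-coded
    )
    return header + payload
```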

<!-- TOC --><a name="7-conformance-requirements"></a>

## 7. Conformance Requirements

- Implementations MUST reject files with invalid magic bytes
- Implementations MUST support at least version `0x01`
- Unknown compression methods SHOULD be rejected
- Invalid UTF-8 in strings MUST be rejected
- Boolean values other than `0x00`/`0x01` MUST be rejected
- Reserved flag bits MUST be zero in generated files

[rfc2119]: https://www.rfc-editor.org/rfc/rfc2119
[mutf8]: https://docs.oracle.com/en/java/javase/23/docs/api/java.base/java/io/DataInput.html#modified-utf-8
[rfc1950]: https://www.rfc-editor.org/rfc/rfc1950
[rfc1952]: https://www.rfc-editor.org/rfc/rfc1952
[rfc4122]: https://www.rfc-editor.org/rfc/rfc4122