compact binary serialization format with built-in compression

Hateno Specification

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

1. Overview

Hateno is a binary serialization format designed to be simple to read and write.

2. Data Representation

2.1 Numeric Encoding

  • Unsigned integers: Standard binary representation
  • Signed integers: Two's complement representation
  • Floating-point: IEEE 754 standard (binary32 for f32, binary64 for f64)
  • Endianness: Determined by file header flags (see Section 5.2.3)
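
As a rough illustration of these rules, the sketch below packs a few numbers with Python's standard struct module. The helper name pack_number and its big_endian parameter are illustrative assumptions, not part of Hateno.

```python
import struct

def pack_number(fmt: str, value, big_endian: bool = False) -> bytes:
    """Pack one number; fmt is a struct format character such as 'H', 'b', or 'f'."""
    order = ">" if big_endian else "<"   # chosen from the header's endianness flag
    return struct.pack(order + fmt, value)

print(pack_number("H", 258).hex())        # u16, little-endian: 0201
print(pack_number("H", 258, True).hex())  # u16, big-endian:    0102
print(pack_number("b", -1).hex())         # i8, two's complement: ff
print(pack_number("f", 1.0).hex())        # f32 (binary32), little-endian: 0000803f
```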

2.2 String Encoding

All strings MUST be encoded as UTF-8. Alternate encoding schemes, such as Java's Modified UTF-8, are explicitly NOT supported.

3. Type System

3.1 Type Identifiers

ID    Type       Description                   Size (bytes)
0x00  u8         Unsigned 8-bit integer        1
0x01  i8         Signed 8-bit integer          1
0x02  u16        Unsigned 16-bit integer       2
0x03  i16        Signed 16-bit integer         2
0x04  u32        Unsigned 32-bit integer       4
0x05  i32        Signed 32-bit integer         4
0x06  u64        Unsigned 64-bit integer       8
0x07  i64        Signed 64-bit integer         8
0x08  f32        32-bit IEEE 754 float         4
0x09  f64        64-bit IEEE 754 float         8
0x0A  bool       Boolean value                 1
0x0B  String     UTF-8 encoded string          Variable
0x0C  Option     Optional value container      Variable
0x0D  List       Heterogeneous array           Variable
0x0E  Map        Key-value dictionary          Variable
0x0F  Array      Homogeneous typed array       Variable
0x10  Timestamp  Unix timestamp (milliseconds) 8
0x11  UUID       128-bit UUID                  16

Note: Type IDs 0x12-0xFF are reserved for future expansion.
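
One possible way to carry this table in code is an integer enumeration, sketched below. The names TypeId, FIXED_SIZE, and parse_type_id are illustrative assumptions; only the numeric IDs and sizes come from the table above.

```python
from enum import IntEnum

class TypeId(IntEnum):
    U8 = 0x00
    I8 = 0x01
    U16 = 0x02
    I16 = 0x03
    U32 = 0x04
    I32 = 0x05
    U64 = 0x06
    I64 = 0x07
    F32 = 0x08
    F64 = 0x09
    BOOL = 0x0A
    STRING = 0x0B
    OPTION = 0x0C
    LIST = 0x0D
    MAP = 0x0E
    ARRAY = 0x0F
    TIMESTAMP = 0x10
    UUID = 0x11

# Value sizes in bytes for fixed-size types; variable-size types are omitted.
FIXED_SIZE = {
    TypeId.U8: 1, TypeId.I8: 1, TypeId.U16: 2, TypeId.I16: 2,
    TypeId.U32: 4, TypeId.I32: 4, TypeId.U64: 8, TypeId.I64: 8,
    TypeId.F32: 4, TypeId.F64: 8, TypeId.BOOL: 1,
    TypeId.TIMESTAMP: 8, TypeId.UUID: 16,
}

def parse_type_id(byte: int) -> TypeId:
    """Map a raw byte to a TypeId, rejecting the reserved range 0x12-0xFF."""
    try:
        return TypeId(byte)
    except ValueError:
        raise ValueError(f"reserved or unknown type ID 0x{byte:02X}") from None
```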

3.2 Array-Compatible Types

The following types are valid as Array element types:

  • All integer types: u8, i8, u16, i16, u32, i32, u64, i64
  • All floating-point types: f32, f64
  • Boolean type: bool

All other types (String, Option, List, Map, Array, Timestamp, UUID) are not valid Array element types. Collections of such values MUST use the heterogeneous List type instead.

3.3 Map Key Compatible Types

All types are valid as Map key types except:

  • Option
  • List
  • Map
  • Array
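
The compatibility rules of Sections 3.2 and 3.3 reduce to two set-membership tests. The following is a minimal sketch; the constant and function names are hypothetical.

```python
# Type IDs from Section 3.1.
DEFINED_TYPES = frozenset(range(0x00, 0x12))          # 0x00 (u8) through 0x11 (UUID)
ARRAY_ELEMENT_TYPES = frozenset(range(0x00, 0x0B))    # integers, floats, bool
NON_KEY_TYPES = frozenset({0x0C, 0x0D, 0x0E, 0x0F})   # Option, List, Map, Array

def is_array_element_type(type_id: int) -> bool:
    """True if values of this type may be stored in an Array."""
    return type_id in ARRAY_ELEMENT_TYPES

def is_map_key_type(type_id: int) -> bool:
    """True if values of this type may be used as Map keys."""
    return type_id in DEFINED_TYPES and type_id not in NON_KEY_TYPES

print(is_array_element_type(0x05))  # i32: True
print(is_array_element_type(0x0B))  # String: False
print(is_map_key_type(0x0B))        # String: True
print(is_map_key_type(0x0E))        # Map: False
```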

4. Binary Encoding

4.1 Primitive Types

4.1.1 Integer Types

[type_id: u8] [value: T]

Where T is the appropriately-sized integer in the file's endianness.

4.1.2 Floating-Point Types

[type_id: u8] [value: T]

Where T follows IEEE 754 encoding in the file's endianness.

4.1.3 Boolean

[type_id: u8] [value: u8]
  • 0x00: false
  • 0x01: true
  • All other values are invalid
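
The three primitive layouts above share one shape: a type ID byte followed by a fixed-size value. A minimal sketch, assuming Python's struct module and the hypothetical helper names encode_primitive and decode_bool:

```python
import struct

# struct format characters keyed by the type IDs from Section 3.1.
FORMATS = {0x00: "B", 0x01: "b", 0x02: "H", 0x03: "h", 0x04: "I",
           0x05: "i", 0x06: "Q", 0x07: "q", 0x08: "f", 0x09: "d"}
BOOL = 0x0A

def encode_primitive(type_id: int, value, big_endian: bool = False) -> bytes:
    """Encode an integer, float, or bool as [type_id] [value]."""
    if type_id == BOOL:
        return bytes([BOOL, 0x01 if value else 0x00])
    order = ">" if big_endian else "<"
    return bytes([type_id]) + struct.pack(order + FORMATS[type_id], value)

def decode_bool(buf: bytes) -> bool:
    """Decode [0x0A] [value], rejecting anything other than 0x00 or 0x01."""
    if len(buf) < 2 or buf[0] != BOOL:
        raise ValueError("not a bool value")
    if buf[1] not in (0x00, 0x01):
        raise ValueError(f"invalid boolean byte 0x{buf[1]:02X}")
    return buf[1] == 0x01

print(encode_primitive(0x05, 42).hex(" "))    # 05 2a 00 00 00
print(encode_primitive(0x0A, True).hex(" "))  # 0a 01
```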

4.2 String Type

[type_id: u8] [length: u32] [utf8_data: [u8; length]]
  • length: Number of bytes in UTF-8 encoding
  • utf8_data: Valid UTF-8 byte sequence
  • Empty strings have length 0 and no data bytes
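
A sketch of a string encoder and decoder follows, with the function names chosen here for illustration. It relies on Python's strict UTF-8 codec, which rejects byte sequences such as 0xC0 0x80 (the Modified UTF-8 form of NUL).

```python
import struct

STRING = 0x0B

def encode_string(text: str, big_endian: bool = False) -> bytes:
    """Encode as [0x0B] [length: u32] [utf8_data]; length counts bytes, not characters."""
    data = text.encode("utf-8")
    order = ">" if big_endian else "<"
    return bytes([STRING]) + struct.pack(order + "I", len(data)) + data

def decode_string(buf: bytes, big_endian: bool = False) -> str:
    """Decode a String value; raises ValueError on truncation or invalid UTF-8."""
    if len(buf) < 5 or buf[0] != STRING:
        raise ValueError("not a String value")
    order = ">" if big_endian else "<"
    (length,) = struct.unpack_from(order + "I", buf, 1)
    data = buf[5:5 + length]
    if len(data) != length:
        raise ValueError("truncated string data")
    return data.decode("utf-8")  # UnicodeDecodeError is a subclass of ValueError

print(encode_string("hello").hex(" "))  # 0b 05 00 00 00 68 65 6c 6c 6f
print(encode_string("").hex(" "))       # 0b 00 00 00 00
```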

4.3 Option Type

[type_id: u8] [inner_type_id: u8] [discriminant: u8] [payload?]

The inner type ID is always written, so type information remains available even when the value is None.

  • inner_type_id: Type ID of the contained value
  • discriminant:
    • 0x00: None (no payload follows)
    • 0x01: Some (payload follows)
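  • payload: Present only when discriminant is 0x01. In the examples below it is written as a raw value of the inner type, without a repeated type_id prefix.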

Examples:

  • Option<u32>::None: [0x0C] [0x04] [0x00]
  • Option<u32>::Some(42): [0x0C] [0x04] [0x01] [42, 0, 0, 0] (little-endian)
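
The two examples above can be reproduced with the sketch below, which handles only Option<u32>. The function names are hypothetical, and rejecting discriminants other than 0x00 and 0x01 is an assumption, since the text does not say how readers should treat them.

```python
import struct

OPTION = 0x0C
U32 = 0x04

def encode_option_u32(value, big_endian: bool = False) -> bytes:
    """Encode None or an int as [0x0C] [0x04] [discriminant] [payload?]."""
    header = bytes([OPTION, U32])
    if value is None:
        return header + b"\x00"              # None: no payload follows
    order = ">" if big_endian else "<"
    return header + b"\x01" + struct.pack(order + "I", value)

def decode_option_u32(buf: bytes, big_endian: bool = False):
    """Decode an Option<u32>, returning None or the contained int."""
    if len(buf) < 3 or buf[0] != OPTION or buf[1] != U32:
        raise ValueError("not an Option<u32> value")
    if buf[2] == 0x00:
        return None
    if buf[2] != 0x01:
        raise ValueError("invalid discriminant")  # assumption: treated as an error
    order = ">" if big_endian else "<"
    (value,) = struct.unpack_from(order + "I", buf, 3)
    return value

print(encode_option_u32(None).hex(" "))  # 0c 04 00
print(encode_option_u32(42).hex(" "))    # 0c 04 01 2a 00 00 00
```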

4.4 List Type (Heterogeneous)

[type_id: u8] [length: u32] [element_1] [element_2] ... [element_n]
  • length: Number of elements
  • Each element is a complete typed value (type_id + data)
  • Elements may have different types

Example: List [42u8, "hello", true]

[0x0D]                              // List type
[3, 0, 0, 0]                        // 3 elements
[0x00] [42]                         // u8: 42
[0x0B] [5, 0, 0, 0] [h, e, l, l, o] // String: "hello"
[0x0A] [1]                          // bool: true
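
The example above can be rebuilt programmatically. In this sketch the hypothetical helper encode_list takes elements that are already encoded as complete typed values and only adds the List framing.

```python
import struct

LIST = 0x0D

def encode_list(encoded_elements, big_endian: bool = False) -> bytes:
    """Frame already-encoded typed values as [0x0D] [length: u32] [elements...]."""
    order = ">" if big_endian else "<"
    header = bytes([LIST]) + struct.pack(order + "I", len(encoded_elements))
    return header + b"".join(encoded_elements)

elements = [
    bytes([0x00, 42]),                                # u8: 42
    bytes([0x0B]) + struct.pack("<I", 5) + b"hello",  # String: "hello"
    bytes([0x0A, 0x01]),                              # bool: true
]
encoded = encode_list(elements)
print(encoded.hex(" "))
# 0d 03 00 00 00 00 2a 0b 05 00 00 00 68 65 6c 6c 6f 0a 01
```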

4.5 Map Type

[type_id: u8] [length: u32] [pair_1] [pair_2] ... [pair_n]

Each key-value pair:

[key] [value]
  • length: Number of key-value pairs
  • Both key and value are complete typed values
  • key MUST be a map key compatible type (see Section 3.3)
  • Keys SHOULD be unique (behavior for duplicate keys is undefined)

Example: Map {42u8: "answer", "pi": 3.14f32}

[0x0E] [2, 0, 0, 0]                                        // Map with 2 pairs
[0x00] [42] [0x0B] [6, 0, 0, 0] [a, n, s, w, e, r]         // 42u8 -> "answer"
[0x0B] [2, 0, 0, 0] [p, i] [0x08] [0xC3, 0xF5, 0x48, 0x40] // "pi" -> 3.14f32
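
The same example can be produced in code. The sketch below uses hypothetical helpers (encode_string, encode_map) and checks the key type against the exclusions in Section 3.3; it does not check key uniqueness.

```python
import struct

MAP = 0x0E
NON_KEY_TYPES = {0x0C, 0x0D, 0x0E, 0x0F}  # Option, List, Map, Array

def encode_string(text: str) -> bytes:
    """Little-endian String value: [0x0B] [length: u32] [utf8_data]."""
    data = text.encode("utf-8")
    return bytes([0x0B]) + struct.pack("<I", len(data)) + data

def encode_map(pairs, big_endian: bool = False) -> bytes:
    """pairs: list of (encoded_key, encoded_value), each a complete typed value."""
    order = ">" if big_endian else "<"
    out = bytearray([MAP]) + struct.pack(order + "I", len(pairs))
    for key, value in pairs:
        if key[0] in NON_KEY_TYPES:
            raise ValueError(f"type 0x{key[0]:02X} is not valid as a map key")
        out += key + value
    return bytes(out)

encoded = encode_map([
    (bytes([0x00, 42]), encode_string("answer")),                 # 42u8 -> "answer"
    (encode_string("pi"), bytes([0x08]) + struct.pack("<f", 3.14)),  # "pi" -> 3.14f32
])
print(encoded.hex(" "))
```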

4.6 Array Type (Homogeneous)

[type_id: u8] [length: u32] [element_type_id: u8] [element_1] ... [element_n]
  • length: Number of elements
  • element_type_id: MUST be an array-compatible type (see Section 3.2)
  • Elements are stored as raw values (no type_id prefix per element)

Example: i32 array [1, 2, 3]

[0x0F]           // Array type
[3, 0, 0, 0]     // 3 elements
[0x05]           // element type: i32
[1, 0, 0, 0]     // 1
[2, 0, 0, 0]     // 2
[3, 0, 0, 0]     // 3
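
Because Array elements carry no per-element type ID, the whole body can be packed in one call. A minimal sketch, with encode_array and ELEMENT_FORMATS as illustrative names:

```python
import struct

ARRAY = 0x0F
# struct format characters for the array-compatible types of Section 3.2.
ELEMENT_FORMATS = {0x00: "B", 0x01: "b", 0x02: "H", 0x03: "h", 0x04: "I",
                   0x05: "i", 0x06: "Q", 0x07: "q", 0x08: "f", 0x09: "d",
                   0x0A: "?"}

def encode_array(element_type_id: int, values, big_endian: bool = False) -> bytes:
    """Encode as [0x0F] [length: u32] [element_type_id] [raw elements...]."""
    if element_type_id not in ELEMENT_FORMATS:
        raise ValueError(f"type 0x{element_type_id:02X} is not array-compatible")
    order = ">" if big_endian else "<"
    fmt = f"{order}{len(values)}{ELEMENT_FORMATS[element_type_id]}"
    body = struct.pack(fmt, *values)      # raw values, no type_id prefix
    header = bytes([ARRAY]) + struct.pack(order + "I", len(values))
    return header + bytes([element_type_id]) + body

print(encode_array(0x05, [1, 2, 3]).hex(" "))
# 0f 03 00 00 00 05 01 00 00 00 02 00 00 00 03 00 00 00
```

For three i32 values this takes 18 bytes, whereas a List of the same values would take 20 (5 bytes of framing plus 3 × 5 bytes per typed element); the saving grows by one byte per element.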

4.7 Timestamp Type

[type_id: u8] [value: i64]
  • value: Milliseconds since Unix epoch
  • Negative values represent times before the epoch
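
Converting between a datetime and the stored millisecond count might look like the sketch below. The function names are hypothetical, and the helpers assume timezone-aware datetimes; sub-millisecond precision is dropped by floor division.

```python
import struct
from datetime import datetime, timedelta, timezone

TIMESTAMP = 0x10
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def encode_timestamp(moment: datetime, big_endian: bool = False) -> bytes:
    """Encode a timezone-aware datetime as [0x10] [milliseconds: i64]."""
    millis = (moment - EPOCH) // timedelta(milliseconds=1)  # floors toward -infinity
    order = ">" if big_endian else "<"
    return bytes([TIMESTAMP]) + struct.pack(order + "q", millis)

def decode_timestamp(buf: bytes, big_endian: bool = False) -> datetime:
    """Decode a Timestamp value into a UTC datetime."""
    if len(buf) < 9 or buf[0] != TIMESTAMP:
        raise ValueError("not a Timestamp value")
    order = ">" if big_endian else "<"
    (millis,) = struct.unpack_from(order + "q", buf, 1)
    return EPOCH + timedelta(milliseconds=millis)

before_epoch = datetime(1969, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
print(encode_timestamp(before_epoch).hex(" "))  # -1000 ms: 10 18 fc ff ff ff ff ff ff
```

Values near the limits of i64 fall outside the range Python's datetime can represent, so a production decoder would need to handle OverflowError.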

4.8 UUID Type

[type_id: u8] [bytes: [u8; 16]]
  • bytes: 16 bytes representing the UUID in big-endian byte order
  • Follows RFC 4122 standard binary representation
  • Byte order is independent of file endianness flag

Example: UUID 550e8400-e29b-41d4-a716-446655440000

[0x11]                                           // UUID type
[0x55, 0x0e, 0x84, 0x00, 0xe2, 0x9b, 0x41, 0xd4, // UUID bytes
 0xa7, 0x16, 0x44, 0x66, 0x55, 0x44, 0x00, 0x00] //  (big-endian)
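
Python's standard uuid module exposes this big-endian layout as UUID.bytes, so a sketch of the encoding is short. The function names are illustrative; note that no endianness parameter is needed.

```python
import uuid

UUID_TYPE = 0x11

def encode_uuid(value: uuid.UUID) -> bytes:
    """Encode as [0x11] followed by the 16 big-endian UUID bytes."""
    return bytes([UUID_TYPE]) + value.bytes   # .bytes is big-endian; .bytes_le is not

def decode_uuid(buf: bytes) -> uuid.UUID:
    """Decode a UUID value; byte order does not depend on the file's flags."""
    if len(buf) < 17 or buf[0] != UUID_TYPE:
        raise ValueError("not a UUID value")
    return uuid.UUID(bytes=buf[1:17])

example = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")
print(encode_uuid(example).hex(" "))
# 11 55 0e 84 00 e2 9b 41 d4 a7 16 44 66 55 44 00 00
```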

5. File Format

5.1 File Structure

[header] [payload]

5.2 Header Format

[magic: [u8; 4]] [version: u8] [flags: u8] [compression: u8] [payload_length: u32]

5.2.1 Magic Bytes

Fixed 4-byte signature: HTNO (0x48, 0x54, 0x4e, 0x4f).

5.2.1.1 File Extension and MIME Type

The recommended file extension is .ht. The recommended MIME type is application/x-hateno.

5.2.2 Version

  • 0x01: 1.0 (current version)
  • Future versions increment this value

5.2.3 Flags

8-bit flag field:

  • Bit 0: Endianness (0 = little-endian, 1 = big-endian)
  • Bits 1-7: Reserved (MUST be zero)

5.2.4 Compression Method

  • 0x00: No compression
  • 0x01: Gzip compression (RFC 1952)
  • 0x02: Zlib compression (RFC 1950)
  • 0x03: LZ4 compression
  • 0x04-0xFF: Reserved for future compression methods
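
A reader might dispatch on the compression byte as sketched below. Gzip and Zlib are covered by Python's standard library; LZ4 is not, and this sketch leaves it unimplemented rather than assume a particular library or LZ4 container format. The function name is hypothetical.

```python
import gzip
import zlib

def decompress_payload(method: int, data: bytes) -> bytes:
    """Return the raw payload bytes for the given compression method byte."""
    if method == 0x00:
        return data                       # no compression
    if method == 0x01:
        return gzip.decompress(data)      # RFC 1952
    if method == 0x02:
        return zlib.decompress(data)      # RFC 1950
    if method == 0x03:
        raise NotImplementedError("LZ4 requires a third-party library")
    raise ValueError(f"unknown compression method 0x{method:02X}")

raw = b"\x0e\x00\x00\x00\x00"             # an empty Map value
print(decompress_payload(0x02, zlib.compress(raw)) == raw)  # True
```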

5.2.5 Payload Length

  • Length of payload in bytes (u32, in file's endianness)
  • For compressed files: length of compressed data
  • For uncompressed files: length of raw data
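
Putting Section 5.2 together, the header is 11 bytes long (4 + 1 + 1 + 1 + 4). Because the flags byte precedes payload_length, a reader can learn the endianness before decoding the length. The sketch below uses hypothetical names (pack_header, parse_header) and returns a plain dictionary.

```python
import struct

MAGIC = b"HTNO"
HEADER_SIZE = 11

def pack_header(payload_length: int, big_endian: bool = False,
                compression: int = 0x00, version: int = 0x01) -> bytes:
    """Build [magic] [version] [flags] [compression] [payload_length: u32]."""
    flags = 0x01 if big_endian else 0x00   # bits 1-7 stay zero
    order = ">" if big_endian else "<"
    fixed = MAGIC + bytes([version, flags, compression])
    return fixed + struct.pack(order + "I", payload_length)

def parse_header(buf: bytes) -> dict:
    """Parse the header fields; raises ValueError on a short buffer or bad magic."""
    if len(buf) < HEADER_SIZE:
        raise ValueError("buffer shorter than header")
    if buf[:4] != MAGIC:
        raise ValueError("invalid magic bytes")
    big_endian = bool(buf[5] & 0x01)       # read flags before the length field
    order = ">" if big_endian else "<"
    (payload_length,) = struct.unpack_from(order + "I", buf, 7)
    return {"version": buf[4], "big_endian": big_endian,
            "compression": buf[6], "payload_length": payload_length}

print(pack_header(5).hex(" "))        # 48 54 4e 4f 01 00 00 05 00 00 00
print(pack_header(5, True).hex(" "))  # 48 54 4e 4f 01 01 00 00 00 00 05
```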

5.3 Payload

The payload contains a single root typed value, typically a Map.

6. Example

Complete uncompressed little-endian file containing {"test": 42i32}:

[0x48, 0x54, 0x4e, 0x4f]          // Magic: "HTNO"
[0x01]                            // Version: 1.0
[0x00]                            // Flags: little-endian
[0x00]                            // Compression: none
[19, 0, 0, 0]                     // Payload length: 19 bytes

// Payload: Map with one entry
[0x0E]                            // Map type
[1, 0, 0, 0]                      // 1 key-value pair
[0x0B] [4, 0, 0, 0] [t, e, s, t]  // String key: "test"
[0x05] [42, 0, 0, 0]              // i32 value: 42
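
A file like this one can be assembled in a few lines. In the sketch below (the function name is hypothetical) the length field is computed from the encoded payload rather than hard-coded; counting the payload fields gives 1 + 4 + (1 + 4 + 4) + (1 + 4) = 19 bytes.

```python
import struct

def build_example_file() -> bytes:
    """Assemble an uncompressed little-endian file containing {"test": 42i32}."""
    key = bytes([0x0B]) + struct.pack("<I", 4) + b"test"    # String key: "test"
    value = bytes([0x05]) + struct.pack("<i", 42)           # i32 value: 42
    payload = bytes([0x0E]) + struct.pack("<I", 1) + key + value  # Map, 1 pair
    header = (b"HTNO"                        # magic
              + bytes([0x01, 0x00, 0x00])    # version 1.0, little-endian, no compression
              + struct.pack("<I", len(payload)))
    return header + payload

data = build_example_file()
print(len(data))          # 30: 11 header bytes + 19 payload bytes
print(data.hex(" "))
```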

7. Conformance Requirements

  • Implementations MUST reject files with invalid magic bytes
  • Implementations MUST support at least version 0x01
  • Unknown compression methods SHOULD be rejected
  • Invalid UTF-8 in strings MUST be rejected
  • Boolean values other than 0x00/0x01 MUST be rejected
  • Reserved flag bits MUST be zero in generated files
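
Several of these requirements can be checked from the 11-byte header alone. The sketch below collects problems instead of raising, which suits a writer's self-check. The function name is hypothetical, and two choices are assumptions: unknown compression methods are reported as problems (the text only says SHOULD), and nonzero reserved bits are reported even though the requirement is stated for generated files.

```python
SUPPORTED_VERSIONS = {0x01}
KNOWN_COMPRESSION = {0x00, 0x01, 0x02, 0x03}

def header_problems(buf: bytes) -> list:
    """Return a list of human-readable conformance problems found in a header."""
    if len(buf) < 11:
        return ["header shorter than 11 bytes"]
    problems = []
    if buf[:4] != b"HTNO":
        problems.append("invalid magic bytes")
    if buf[4] not in SUPPORTED_VERSIONS:
        problems.append(f"unsupported version 0x{buf[4]:02X}")
    if buf[5] & 0xFE:                       # bits 1-7 are reserved
        problems.append("reserved flag bits are set")
    if buf[6] not in KNOWN_COMPRESSION:
        problems.append(f"unknown compression method 0x{buf[6]:02X}")
    return problems

good = b"HTNO" + bytes([0x01, 0x00, 0x00, 19, 0, 0, 0])
bad = b"HTNO" + bytes([0x02, 0x04, 0x09, 0, 0, 0, 0])
print(header_problems(good))  # []
print(header_problems(bad))
```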