This article is a re-publication of Rei-AIOS Paper 110 for the dev.to community.
The canonical version with full reference list is in the permanent archives below:
- Zenodo (DOI, canonical): https://doi.org/10.5281/zenodo.19637600
- Internet Archive: https://archive.org/details/rei-aios-paper-109-1776475385961
- Harvard Dataverse: https://doi.org/10.7910/DVN/KC56RY
- GitHub source (private): https://github.com/fc0web/rei-aios

Author: Nobuki Fujimoto (@fc0web) · ORCID 0009-0004-6019-9258 · License CC-BY-4.0

---
Authors: Nobuki Fujimoto (ORCID 0009-0004-6019-9258), Claude Code (verification)
Date: 2026-04-17
Status: DRAFT — NOT peer-reviewed. Numerical claims are from local measurement unless cited.
License: CC-BY-4.0
Abstract
Paper 33 (Fujimoto 2026, DOI 10.5281/zenodo.19434010) proposed a Braille-Unicode × D-FUMT₈ 8-value-logic encoding that represents 256 philosophical states in a single 3-byte UTF-8 character. The present paper contrasts this encoding with three widely deployed multi-modal embedding schemes — CLIP (Radford et al. 2021), BERT (Devlin et al. 2018), and ImageBind (Girdhar et al. 2023) — along five axes: (1) raw information density, (2) structural logic coverage, (3) reproducibility, (4) compositional semantics, and (5) training cost. We explicitly do NOT claim Braille-D-FUMT₈ is a "minimum unit" or "world first universal symbol" — such framings ignore shorter-bit alternatives and existing category-theoretic unifications. Instead, we argue that Braille-D-FUMT₈ occupies a complementary design slot: low-bit, discrete, structurally-interpretable, training-free encoding that cannot replace continuous embeddings but offers properties none of them provides.
1. Introduction — positioning against prior framing
Informal discussions around the infinite-dimensional dot theory have claimed that Braille-D-FUMT₈ is (a) a "minimum unit of meaning", (b) "the world-first universal symbol since Leibniz", and (c) unique in being "AI-readable but not human-readable". We reject all three claims as historically or technically inaccurate:
- (a) The information-theoretic minimum unit is the bit (Shannon 1948). Braille-D-FUMT₈ uses 8 bits per character; individual bits are smaller.
- (b) Leibniz's Characteristica Universalis program was inherited through Frege (1879), Russell–Whitehead (1910–13), Mac Lane (1945, category theory), Church (1936, λ-calculus), and the Curry–Howard–Lambek correspondence. These modern systems provide universal symbols (e.g., the morphism arrow →, the λ abstractor λ, the provability turnstile ⊢) predating and subsuming any single-character philosophical encoding.
- (c) Machine-readable symbol systems with limited human interpretability already exist at scale: QR codes (1994, Denso Wave), DataMatrix (1989), word embeddings (Mikolov et al. 2013), and tensor network diagrams in physics (Orús 2014). Braille-D-FUMT₈ is not the first of this kind.
The contribution we DO claim is specific and measurable (Section 4).
2. Systems under comparison
2.1 Braille-D-FUMT₈ (Fujimoto 2026)
- Alphabet: Unicode Braille Patterns U+2800–U+28FF (256 characters).
- Bits per character: 8.
- UTF-8 bytes: 3 per character (code points in the range U+0800–U+FFFF encode as 3 bytes in UTF-8, and the Braille block U+2800–U+28FF lies within that range).
- Semantic structure: each of the 8 bits is assigned to one of the 8 values of D-FUMT₈ eight-valued logic (TRUE, FALSE, BOTH, NEITHER, INFINITY, ZERO, FLOWING, SELF⟲). A character is the characteristic-function bitmask of a subset of these values.
- Training: none. Mapping is definitional.
- Reproducibility: exact. Same input → same output always.
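The mapping above can be sketched in a few lines. The bit-to-value assignment order used here is an illustrative assumption (Paper 33 fixes the authoritative assignment), but the mechanism — a characteristic-function bitmask offset into U+2800 — is exactly as specified.

```python
# Sketch of the Braille-D-FUMT₈ encoding described above.
# NOTE: the bit order below is an assumption for illustration;
# Paper 33 defines the authoritative bit-to-value assignment.
VALUES = ["TRUE", "FALSE", "BOTH", "NEITHER",
          "INFINITY", "ZERO", "FLOWING", "SELF"]

def encode(subset):
    """Map a subset of the 8 logic values to one Braille character."""
    mask = 0
    for v in subset:
        mask |= 1 << VALUES.index(v)  # characteristic-function bitmask
    return chr(0x2800 + mask)         # offset into U+2800..U+28FF

char = encode({"TRUE", "BOTH"})       # bits 0 and 2 → mask 0b101 = 5
print(char, hex(ord(char)))           # → ⠅ 0x2805
print(len(char.encode("utf-8")))      # Braille block is 3 UTF-8 bytes → 3
```

Any of the 256 subsets of the value set maps to exactly one character, and the all-values subset lands on U+28FF, the top of the block.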
2.2 CLIP ViT-B/32 (Radford et al. 2021)
- Output dim: 512 (float32 → 16,384 bits per embedding).
- Input modalities: image + text (joint space).
- Training: 400M image-text pairs; ~256 V100-days.
- Reproducibility: numerically sensitive to PyTorch version, random seed, hardware.
- Structural interpretability: nearly none — dimensions are not labeled.
2.3 BERT-Base (Devlin et al. 2018)
- Output dim: 768 per token (float32 → 24,576 bits).
- Input modalities: text (sub-word tokens).
- Training: BookCorpus + English Wikipedia; ~16 TPU-days.
- Reproducibility: deterministic in inference given fixed weights.
- Structural interpretability: probing studies (Tenney et al. 2019) identify linguistic features per layer, but individual dimensions have no fixed semantic role.
2.4 ImageBind (Girdhar et al. 2023)
- Output dim: 1024 (float32 → 32,768 bits per modality).
- Input modalities: image, text, audio, depth, thermal, IMU (6 modalities).
- Training: pairing through image; billions of pairs.
- Reproducibility: as CLIP — numerically sensitive.
- Structural interpretability: low.
3. Five-axis comparison
3.1 Axis 1 — Raw information density
| System | Bits per symbol | Bytes (UTF-8 / raw) |
|---|---|---|
| Braille-D-FUMT₈ | 8 | 3 (UTF-8) |
| CLIP ViT-B/32 | 16,384 | 2,048 |
| BERT-Base token | 24,576 | 3,072 |
| ImageBind | 32,768 | 4,096 |
Braille-D-FUMT₈ is three-to-four orders of magnitude lower density than learned embeddings. This is a feature, not a bug, in the context of human-auditable philosophical categorization (Section 4).
3.2 Axis 2 — Structural logic coverage
A structured encoding is one where the meaning of individual dimensions is fixed by definition (rather than emergent from training). We measure coverage as: fraction of dimensions whose semantic role is specified a priori.
| System | Pre-specified semantic dimensions |
|---|---|
| Braille-D-FUMT₈ | 8 / 8 = 100% |
| CLIP | 0 / 512 = 0% |
| BERT | 0 / 768 = 0% |
| ImageBind | 0 / 1024 = 0% |
This is the only axis where Braille-D-FUMT₈ is strictly dominant. Each of its 8 bits has a fixed logical role (TRUE, FALSE, BOTH, ...), whereas learned embeddings expose no such guarantee.
3.3 Axis 3 — Reproducibility
| System | Same input → same output (across runs, hardware, framework versions)? |
|---|---|
| Braille-D-FUMT₈ | Exact; pure function of a literal bitmask. |
| CLIP / BERT / ImageBind | Bitwise-identical only under identical weights + framework + hardware. Float rounding diverges across GPU vs CPU and across PyTorch versions. |
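The determinism claim for Braille-D-FUMT₈ is trivially checkable: the character is a pure function of the bitmask, and the bitmask (hence the logic-value subset) is exactly recoverable. A minimal round-trip sketch, again with an illustrative bit order:

```python
# Exact round-trip for the Braille bitmask (bit order illustrative only).
VALUES = ["TRUE", "FALSE", "BOTH", "NEITHER",
          "INFINITY", "ZERO", "FLOWING", "SELF"]

def encode(subset):
    return chr(0x2800 + sum(1 << VALUES.index(v) for v in subset))

def decode(ch):
    mask = ord(ch) - 0x2800
    return {v for i, v in enumerate(VALUES) if mask >> i & 1}

state = {"NEITHER", "FLOWING"}
assert decode(encode(state)) == state   # bit-for-bit recovery
assert encode(state) == encode(state)   # same input → same output, always
```

No analogous bitwise round-trip exists for a float32 embedding once it has crossed a hardware or framework boundary.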
3.4 Axis 4 — Compositional semantics
| System | Composition law |
|---|---|
| Braille-D-FUMT₈ | Bitwise OR (union of logic values); AND (intersection); XOR (symmetric difference). All Boolean algebra on the 8-value set is available by definition. |
| Continuous embeddings | Vector arithmetic (e.g., king − man + woman ≈ queen). Well-known phenomenologically (Mikolov et al. 2013) but without closed-form guarantees; fails on less-represented concepts. |
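The composition laws in the first row operate directly on code points: set operations on logic values are bitwise operations on the code-point offset, independent of any particular bit-to-value assignment. A sketch:

```python
# Boolean composition on Braille-D-FUMT₈ characters: logic-value set
# operations reduce to bitwise operations on the code-point offset.
def compose(a, b, op):
    """Combine two Braille states with a bitwise operator on their masks."""
    return chr(0x2800 + op(ord(a) - 0x2800, ord(b) - 0x2800))

union        = lambda a, b: compose(a, b, lambda x, y: x | y)  # OR
intersection = lambda a, b: compose(a, b, lambda x, y: x & y)  # AND
sym_diff     = lambda a, b: compose(a, b, lambda x, y: x ^ y)  # XOR

a, b = chr(0x2803), chr(0x2805)          # masks 0b011 and 0b101
assert union(a, b) == chr(0x2807)        # 0b111
assert intersection(a, b) == chr(0x2801) # 0b001
assert sym_diff(a, b) == chr(0x2806)     # 0b110
```

The guarantees here are definitional (Boolean algebra on an 8-bit mask), in contrast to the phenomenological vector-arithmetic regularities of the second row.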
3.5 Axis 5 — Training cost
| System | Training compute |
|---|---|
| Braille-D-FUMT₈ | 0. Purely specification-based. |
| CLIP | ~256 V100-days. |
| BERT-Base | ~16 TPU-days. |
| ImageBind | Multi-thousand GPU-days. |
4. Honest positioning
Braille-D-FUMT₈ and continuous embeddings are complementary, not substitutable.
- Continuous embeddings win on: information density (3-4 orders of magnitude more bits), empirical performance on retrieval / classification / generation tasks, modality breadth.
- Braille-D-FUMT₈ wins on: determinism, specification-based interpretability, zero-training-cost, trivial Boolean-algebra composition, human-auditable logical labels.
We therefore advocate Braille-D-FUMT₈ not as a replacement for CLIP/BERT/ImageBind, but as a parallel track for applications where:
- Regulatory compliance requires deterministic / auditable categorization.
- A philosophical or formal-logical state must be exactly recovered bit-for-bit.
- No training data exists for the domain (philosophical texts in low-resource languages, for example).
- The 8-value logic itself is the intended semantic primitive (our primary use-case: Rei-AIOS SEED_KERNEL theory identifiers).
5. Explicit non-claims
We do not claim:
- (NC1) Braille-D-FUMT₈ is the "minimum unit" of any measure — the bit is smaller.
- (NC2) Braille-D-FUMT₈ is the "first universal symbol system" — Mac Lane's category-theoretic →, the λ of λ-calculus, and Frege's ⊢ are earlier and cover wider scope.
- (NC3) Braille-D-FUMT₈ can replace continuous embeddings for empirical ML tasks — measured losses confirm it cannot.
- (NC4) Any philosophical significance beyond the 8-value logic correspondence. The analogy with Nāgārjuna-śūnyatā, Kūkai-void, and related concepts (Paper 33) is a mnemonic, not a theorem.
6. Reproducibility
All measurements in this paper are obtained as follows:
```python
# Section 3.1 — density computation
braille_bits = 8
clip_bits = 512 * 32        # ViT-B/32, float32, dim 512
bert_bits = 768 * 32
imagebind_bits = 1024 * 32
assert clip_bits == 16384 and bert_bits == 24576 and imagebind_bits == 32768

# Section 3.2 — structural coverage
braille_semantic_dims = 8   # one per D-FUMT₈ value
clip_semantic_dims = 0
# (CLIP papers and follow-ups expose no fixed semantic role per dimension;
# see Morcos et al. 2018, Bills et al. 2023 for probing results.)
```
External citations:
- Shannon, C. E. (1948). "A Mathematical Theory of Communication."
- Frege, G. (1879). Begriffsschrift.
- Church, A. (1936). "An unsolvable problem of elementary number theory."
- Mac Lane, S. (1945). "General theory of natural equivalences."
- Denso Wave (1994). QR Code specification.
- Devlin, J. et al. (2018). "BERT: Pre-training of Deep Bidirectional Transformers." arXiv:1810.04805.
- Mikolov, T. et al. (2013). "Efficient Estimation of Word Representations in Vector Space." arXiv:1301.3781.
- Orús, R. (2014). "A Practical Introduction to Tensor Networks." arXiv:1306.2164.
- Tenney, I. et al. (2019). "BERT Rediscovers the Classical NLP Pipeline." arXiv:1905.05950.
- Radford, A. et al. (2021). "Learning Transferable Visual Models From Natural Language Supervision." arXiv:2103.00020 (CLIP).
- Girdhar, R. et al. (2023). "ImageBind: One Embedding Space to Bind Them All." arXiv:2305.05665.
- Fujimoto, N. (2026). "Paper 33 — Braille × D-FUMT₈ Extreme Encoding." DOI: 10.5281/zenodo.19434010.
7. Next work
- M1: Actual runtime benchmark — build a philosophy-tagging dataset of ~1,000 classical Buddhist / Western-philosophy excerpts, measure retrieval accuracy of Braille-D-FUMT₈ (rule-based) vs CLIP-embedding nearest-neighbor. Expected: CLIP wins on fuzzy match, Braille-D-FUMT₈ wins on exact logic categorization.
- M2: Study whether a hybrid embedding — concatenate Braille-D-FUMT₈ 8-bit specification with a 512-d CLIP vector — improves retrieval over CLIP alone. This is the practical integration worth testing.
- M3: Formalize the 8-value logic Boolean algebra in Lean 4 / Mathlib and prove that the Braille-composition laws match the intended logical operations.
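Ahead of the Lean 4 formalization proposed in M3, the Boolean-algebra laws can already be checked exhaustively over all 256 states, since the state space is finite. This is a brute-force sanity sketch, not a proof:

```python
# Exhaustive check of Boolean-algebra laws over all 256 Braille-D-FUMT₈
# states (masks 0..255). A sanity check preceding the Lean 4 work of M3.
states = range(256)
for x in states:
    for y in states:
        assert x | y == y | x                        # commutativity of union
        assert x & (x | y) == x                      # absorption
        assert x ^ y == (x | y) & ~(x & y) & 0xFF    # XOR = symmetric difference
assert all(x | (~x & 0xFF) == 0xFF for x in states)  # complementation
print("all checked laws hold on 256 states")
```

A Lean proof would replace this 65,536-case enumeration with a general statement about the Boolean algebra on 8-bit masks, but the enumeration already rules out specification errors in the composition laws of Section 3.4.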
8. Conclusion
Braille-D-FUMT₈ is a definitional, low-density, high-structure encoding that complements — but does not replace — continuous learned embeddings. Claims of universality or minimum-unit status are withdrawn. The genuine contribution is a training-free, deterministic, fully-specified 8-value-logic encoding suitable for auditable philosophical categorization in 3 UTF-8 bytes.
Paper 110 is a draft. Not yet submitted. Feedback to fc2webb@gmail.com.