Open Source Binary Asset Management

Content-Addressable Binary Asset Management

BinaryBeast uses FastCDC chunking, Blake3 hashing, and pack-based deduplication to efficiently store and transfer large binary files at scale.

# Upload an asset to a library
$ bb asset push --library game-assets --wait hero-model.fbx
Submitting upload task for hero-model.fbx...
✅ Task 1 completed · 2.4 GB · 11.4s
# Download it anywhere
$ bb asset pull --wait abc123...f9e8 ./hero-model.fbx
✅ Task 2 completed · Blake3 verified · 5.8s
# Or use workspaces for team workflows
$ bb workspace init --library game-assets ./project
✅ Initialized empty workspace at ./project/.bb-workspace
$ bb workspace add ./project/hero-model.fbx ./project/textures/
Task 3 submitted for hero-model.fbx
Task 4 submitted for textures/
$ bb workspace version
Waiting for pending workspace add tasks... done.
✅ Committed staged changes · 🎉 Initial workspace created!
$ bb workspace status
Workspace: ./project/.bb-workspace
Version: 1 · Libraries: 1 · Assets: 2
$ bb workspace sync
Syncing workspace: ./project/.bb-workspace
Uploaded 2 files · 55% bandwidth saved · Everything is in sync.
Blake3 · Cryptographic Hashing
FastCDC · Content-Defined Chunking
~100MB · Optimized Pack Size
4 · Specialized Worker Pools

Three-Tier System Design

A clean separation of concerns across CLI, daemon, and server — each optimized for its workload with dedicated data stores and communication protocols.

CLI Client

bb

Cobra-powered command interface for asset upload, download, library management, and workspace sync. Communicates with the daemon over Unix socket IPC (named pipes on Windows).

Cobra Unix Socket Async Tasks

Background Daemon

bbd

Task orchestration engine with 4 specialized worker pools, pack accumulators, batch state tracking, and async SQLite persistence in WAL mode.

Worker Pools SQLite WAL Accumulators

Server API

bbs

Gin-powered HTTP/2 REST API coordinating metadata in MongoDB and binary storage in S3/MinIO. Presigned URLs enable direct client-to-storage transfers.

Gin MongoDB S3/MinIO Keycloak
bb → bbd via Unix Socket
bbd → bbs via HTTP/2 REST
bbd → S3 via Presigned URLs

Upload & Download Flows

Optimized multi-stage pipelines with single-pass hashing, cross-asset pack accumulation, and parallel presigned-URL transfers.

Upload Pipeline

bb asset upload
01

Chunk & Hash

FastCDC splits the file into 256KB–8MB variable-size chunks. Blake3 hashes both the chunks and the full asset in a single streaming pass.

02

Check Chunks

Batch POST chunk hashes to server. Existing chunks skip upload — instant cross-asset deduplication at the chunk level.

03

Pack Accumulate

Single-goroutine accumulator consolidates new chunks across multiple assets into ~100MB packs. 10x fewer S3 PUT requests.

04

Build & Hash

CPU-bound workers compute Blake3 digest of assembled packs. Platform-optimized: ARM64 NEON on macOS, SIMD on Linux.

05

Upload to S3

Direct presigned PUT to S3/MinIO — server never proxies data. Retry with exponential backoff on transient failures.

06

Finalize

Server creates the asset record, generates a MsgPack download manifest, and stores it in S3 for instant future downloads.

Download Pipeline

bb asset download
01

Fetch Manifest

Pre-computed MsgPack manifest retrieved via presigned URL. Contains pack hashes and byte-range write targets.

02

Preallocate File

fallocate() on Linux reserves contiguous disk space instantly. Enables concurrent random-access writes without fragmentation.

03

Batch Presign

Single batch request gets presigned GET URLs for all unique packs. Deduplicates pack references across manifest entries.

04

Download & Write

Parallel pack downloads from S3 with contiguous write merging. pwrite64 enables lock-free concurrent file writes.

05

Verify Integrity

Full-file Blake3 hash verification ensures bit-perfect reconstruction. Automatic cleanup on hash mismatch.

Engineered for Performance

Every component is purpose-built for handling large binary assets efficiently, from hashing algorithms to network protocols.

FastCDC Chunking

Content-defined chunking with rolling hash boundary detection. Variable-size chunks (256KB–8MB) ensure identical content produces identical chunks regardless of file context.

avg 1MB · window 64B · seed 0x280AE5C0

Blake3 Hashing

Platform-optimized cryptographic hashing. ARM64 NEON on Apple Silicon, SIMD assembly on Linux x86_64. Single-pass hashing computes chunk and asset digests simultaneously.

256-bit · platform-optimized · streaming

Pack-Based Storage

Chunks consolidated into ~100MB packs via single-goroutine accumulator. Cross-asset packing reduces S3 PUT requests by 10x while maintaining per-chunk addressability.

~100MB packs · cross-asset · 10x fewer PUTs

Presigned URL Transfers

Direct client-to-S3 data transfers via presigned URLs. Server coordinates metadata only — never proxies binary data. Eliminates server bandwidth bottleneck.

direct S3 · 1hr expiry · zero proxy

4 Worker Pools

Dedicated pools for I/O-bound asset ops, network chunk checks, CPU-bound pack building, and network-bound transfers. Each tuned for its workload characteristics.

asset · check · build · transfer

SQLite WAL Mode

Write-Ahead Logging enables concurrent reads with serialized writes for the daemon task queue. 30s busy timeout handles heavy concurrent workloads gracefully.

WAL · NORMAL sync · 10 connections
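The WAL setup above amounts to a handful of standard SQLite pragmas; how the daemon wires them into its driver or DSN is not shown here, but the settings themselves are:

```
PRAGMA journal_mode=WAL;    -- concurrent readers alongside one serialized writer
PRAGMA synchronous=NORMAL;  -- relaxed fsync, safe in combination with WAL
PRAGMA busy_timeout=30000;  -- wait up to 30s instead of failing under contention
```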

Worker Buffer Pool

Managed 100MB reusable buffers with idle-timeout deallocation. Channel-based semaphore limits in-flight packs for bounded memory growth under load.

100MB buffers · semaphore · lazy alloc

File Preallocation

fallocate() on Linux reserves contiguous disk space before download. Enables efficient pwrite64 random-access writes without filesystem fragmentation.

fallocate · pwrite64 · zero fragmentation

Cross-Platform

Native support for macOS (ARM64), Linux (amd64), and Windows. Platform-specific IPC (Unix sockets vs named pipes), hashing, and file I/O optimizations.

darwin · linux · windows

OAuth2 / Keycloak

JWT-based authentication with automatic token refresh via background goroutine. JWKS caching with TTL avoids constant key fetches.

JWT · auto-refresh · JWKS cache

MsgPack Manifests

Pre-computed binary download manifests stored in S3 at upload time. Downloads skip chunk resolution entirely — just fetch manifest and go.

binary · pre-computed · instant pulls

Rolling Metrics

30-second rolling window with per-task-type I/O breakdown. Tracks upload/download speed, disk I/O, and cumulative throughput in real-time.

30s window · per-task · real-time
Version 1 — Initial Upload (2.4 GB): 2.4 GB transferred
Version 2 — 200 MB modified: 2.2 GB deduplicated · 200 MB transferred
Version 3 — 50 MB modified: 2.35 GB deduplicated · 50 MB transferred
Same file, different user: 2.4 GB deduplicated · 0 bytes transferred

Multi-Level Content Deduplication

BinaryBeast deduplicates at every level: chunk-level across assets, within-pack across concurrent uploads, and across versions via predecessor chains.

Modified 200 MB of a 2.4 GB file? Only the changed chunks upload. Same file uploaded by another user? Zero bytes transferred.

1 MB · Average chunk size
256KB–8MB · Variable chunk range
3-level · Dedup granularity
Batch · Existence check

Specialized Worker Pools

Four dedicated worker pools, each tuned to its workload profile. Tasks flow through a dispatcher to the right pool — no lock contention, no shared mutable state.

Asset Workers

I/O Bound
upload_asset
download_asset
finish_*

Check Workers

Network + I/O
check_chunks
batch verification
dedup lookups

Pack Builders

CPU Bound
build_pack
Blake3 digest
pack assembly

Transfer Workers

Network Bound
upload_pack
download_and_write
presigned S3 I/O

Pack Accumulator

Single-goroutine pattern — no locks, no contention. Consolidates chunks from multiple concurrent uploads into optimal packs with 5-second timeout flush and drain-signal eager flush.

Download Accumulator

Batches download requests by pack hash for optimal network utilization. Flushes at 5,000 chunks or 2-second timeout. Tracks presigned URLs and maps write targets across files.

Built With

Production-grade components chosen for performance, reliability, and operational simplicity.

🔷
Go
Core language — all three binaries
Blake3
Cryptographic hashing with SIMD
📦
FastCDC
Content-defined chunking
🗄️
SQLite
Daemon task queue (WAL mode)
🍃
MongoDB
Server metadata & chunk mapping
☁️
S3 / MinIO
Object storage with presigned URLs
🔐
Keycloak
OAuth2 + JWT authentication
🌐
Gin
HTTP/2 REST API framework
💚
Vue 3
Web UI (Composition API)
🎨
Tailwind CSS
Utility-first styling
📡
MsgPack
Binary manifest serialization
🐍
Cobra
CLI command framework
pkg/chunking/fastcdc.go
// BoundaryIterator streams chunk boundaries via channels
// Single-pass: computes chunk hashes + asset hash simultaneously
func BoundaryIterator(r io.Reader) (
    <-chan ChunkBoundary, // hash, offset, size per chunk
    <-chan string,        // final asset Blake3 hash
    <-chan error,
) {
    opts := fastcdc.Options{
        MinSize:     256 * 1024,      // 256 KB
        AverageSize: 1 * 1024 * 1024, // 1 MB
        MaxSize:     8 * 1024 * 1024, // 8 MB
    }

    // TeeReader: hash entire asset while chunking
    assetHasher := hash.New()
    tee := io.TeeReader(r, assetHasher)
    // ...
}

Native Everywhere

Platform-specific optimizations via Go build tags. Each platform gets the best available primitives for hashing, I/O, and IPC.

🍎

macOS (ARM64)

Blake3 with ARM NEON acceleration. Unix domain socket IPC. mmap for memory-mapped file I/O.

_darwin_arm64.go _unix.go
🐧

Linux (amd64)

SIMD-optimized Blake3. fallocate() for contiguous preallocation. Unix sockets. Full mmap support.

_linux.go _unix.go
🪟

Windows (amd64)

Named pipes for IPC. Pure Go Blake3. Truncate-based file preallocation with standard I/O fallbacks.

_windows.go

Ready to Tame Your Binary Assets?

BinaryBeast handles the complexity of large binary file management so you can focus on what matters — building great software.