A lightweight, clustered TCP key-value store written in C++23. Binary protocol, epoll-driven, with optional disk persistence and distributed clustering.
  • C++ 97.4%
  • Python 1.9%
  • CMake 0.6%
  • Dockerfile 0.1%
Find a file
Insidious Fiddler d6d256527e
feat(config): add YAML config support with extension auto-detect
Add yaml-cpp via FetchContent and link yaml-cpp::yaml-cpp to targets.
Implement parse_yaml_config_file and load_config_file (dispatch
.yaml/.yml
to YAML, others to INI), add is_yaml_extension helper, and update main/
server to use load_config_file. Add extensive unit tests for YAML
parsing.
2026-02-21 00:37:19 -05:00
.forgejo/workflows ci(sync-wiki): update wiki repository clone URL 2026-02-06 16:05:19 -05:00
include/mayfly feat(config): add YAML config support with extension auto-detect 2026-02-21 00:37:19 -05:00
src feat(config): add YAML config support with extension auto-detect 2026-02-21 00:37:19 -05:00
tests feat(config): add YAML config support with extension auto-detect 2026-02-21 00:37:19 -05:00
tools feat(admin): add stats command and server statistics 2026-02-06 13:19:57 -05:00
wiki feat(docker): add healthcheck, stop signal, and expose cluster port 2026-02-10 12:09:36 -05:00
.clang-format feat: initial commit 2026-02-06 00:06:48 -05:00
.clang-tidy feat: initial commit 2026-02-06 00:06:48 -05:00
.dockerignore feat(docker): add healthcheck, stop signal, and expose cluster port 2026-02-10 12:09:36 -05:00
.gitignore feat(security): add TLS/mTLS and HMAC cluster authentication 2026-02-21 00:07:02 -05:00
CMakeLists.txt feat(config): add YAML config support with extension auto-detect 2026-02-21 00:37:19 -05:00
Dockerfile feat(docker): add healthcheck, stop signal, and expose cluster port 2026-02-10 12:09:36 -05:00
README.md feat(config): update configuration options for batch processing and worker threads 2026-02-08 19:48:27 -05:00
SPEC.md feat(config): update configuration options for batch processing and worker threads 2026-02-08 19:48:27 -05:00

Mayfly

A lightweight, clustered TCP key-value store written in C++23. Binary protocol, epoll-driven, with optional disk persistence and distributed clustering.

Key Features

  • Binary Protocol — Compact 12-byte header with big-endian wire format. Low overhead, no parsing ambiguity.
  • Multiple Data Types — Strings, Lists (O(1) push/pop), 64-bit Counters, and Hashes.
  • Disk Persistence — Append-only write-ahead log (.egg files) with CRC-32 integrity, snapshots, and automatic compaction.
  • Clustering — Consistent hashing, SWIM gossip protocol, tunable quorum replication, and Hybrid Logical Clock conflict resolution.
  • Lock-Free Reads — Epoch-based reclamation with shared-lock + atomic-load read path. Readers never block each other.
  • LRU Eviction — Striped LRU lists with configurable memory limits and automatic eviction under pressure.
  • TTL Support — Per-key expiration with a background reaper thread.
  • Pipelining — Configurable worker thread pool with ordered or unordered response delivery.

Architecture

                          ┌─────────────────────────────────────────┐
                          │              Mayfly Server              │
  Clients ──TCP:7275──►   │                                         │
                          │   epoll event loop                      │
                          │     │                                   │
                          │     ├─► Protocol Parser (0xCAFE)        │
                          │     │     │                             │
                          │     │     ├─► Store (striped R/W locks) │
                          │     │     │     ├── Strings             │
                          │     │     │     ├── Lists               │
                          │     │     │     ├── Counters            │
                          │     │     │     └── Hashes              │
                          │     │     │                             │
                          │     │     ├─► EggLog (WAL)              │
                          │     │     └─► Snapshotter               │
                          │     │                                   │
  Peers ───TCP:8275──►    │     └─► Cluster Protocol (0xBEE5)       │
                          │           ├── Consistent Hash Ring      │
                          │           ├── SWIM Gossip               │
                          │           ├── Quorum Replication         │
                          │           └── Anti-Entropy Sync         │
                          └─────────────────────────────────────────┘

Data flow: Client connects via TCP → epoll dispatches to connection state machine → binary protocol parsed → request routed to the in-memory store (protected by striped reader-writer locks). If persistence is enabled, mutations hit the write-ahead log before acknowledgement. In cluster mode, writes replicate to N-1 peers and reads optionally gather a quorum.

Quick Start

Prerequisites

  • C++23 compiler (GCC 13+ or Clang 17+)
  • CMake 3.20+

Build

git clone https://github.com/your-org/mayfly.git
cd mayfly
cmake -B build
cmake --build build -j$(nproc)

Binaries are output to bin/.

Run

Start the server:

./bin/mayfly

Connect with the CLI:

./bin/mayfly-cli
> connect 127.0.0.1 7275
> SET hello world
OK
> GET hello
world
> SET counter 0
OK
> INCR counter
1
> LPUSH tasks "buy milk" "walk dog"
OK
> LRANGE tasks 0 -1
walk dog
buy milk
> DEL hello
OK

Run with Persistence

./bin/mayfly --backend disk --data-dir ./data

Run a Cluster

# Node 1
./bin/mayfly --port 7275 --cluster-port 8275

# Node 2 (joins node 1)
./bin/mayfly --port 7276 --cluster-port 8276 --join 127.0.0.1:8275

# Node 3
./bin/mayfly --port 7277 --cluster-port 8277 --join 127.0.0.1:8275

Commands

Category Commands
Basic SET GET DEL EXISTS KEYS TTL PING QUIT FLUSH
Lists LPUSH RPUSH LPOP RPOP LRANGE LLEN
Counters INCR DECR INCRBY DECRBY
Hashes HSET HGET HDEL HGETALL HKEYS HLEN
Admin AUTH COMPACT SNAPSHOT MEMORY
Cluster CLUSTER_INFO CLUSTER_NODES LEAVE

See SPEC.md for the full protocol specification with wire format diagrams.

Configuration

Server

Flag Default Description
--port 7275 Client listen port
--max-clients 128 Maximum concurrent connections
--token (none) Require authentication token
--connection-timeout 0 Idle connection timeout in seconds (0 = disabled)
--max-memory 0 Memory limit in MB (0 = unlimited)
--log-level info trace debug info warn error off

Persistence

Flag Default Description
--backend memory memory or disk
--data-dir ./mayfly-data WAL and snapshot directory
--batch-interval 10 WAL group-commit interval in ms
--compact-threshold 64 Auto-compaction trigger size in MB
--snapshot-retention 5 Number of snapshots to keep

Pipelining

Flag Default Description
(fixed) auto Worker threads (hardware concurrency)
(fixed) 64 Max in-flight requests per connection
--unordered-responses false Allow out-of-order response delivery

Clustering

Flag Default Description
--cluster-port 0 Cluster gossip port (0 = standalone)
--join (none) Seed node address (host:port)
--replication-factor 3 Replicas per key (1-5)
--vnodes 128 Virtual nodes per physical node
--gossip-interval 500 SWIM gossip period in ms
--suspicion-timeout 5000 Suspect-to-dead timeout in ms
--read-quorum 0 Read quorum (0 = auto: floor(N/2)+1)
--write-quorum 0 Write quorum (0 = auto: floor(N/2)+1)
--hint-ttl 3600 Hinted handoff hint TTL in seconds

Configuration can also be loaded from a file via --config <path> or set through MAYFLY_* environment variables.

Protocol

Mayfly uses a custom binary protocol. All integers are big-endian.

Client Protocol (magic 0xCAFE)

 0      1      2      3      4      5      6      7      8      9     10     11
┌──────┬──────┬──────┬──────┬──────┬──────┬──────┬──────┬──────┬──────┬──────┬──────┐
│    Magic    │ Ver  │ Cmd  │         Request ID        │       Payload Length      │
│   0xCAFE   │ 0x01 │      │                           │                           │
└──────┴──────┴──────┴──────┴──────┴──────┴──────┴──────┴──────┴──────┴──────┴──────┘

In cluster mode, an optional 13th byte specifies consistency level: ONE (0x01), QUORUM (0x02), or ALL (0x03).

Clustering

Mayfly's clustering layer is designed for partition tolerance and eventual consistency.

  • Consistent Hashing — xxHash-based token ring with configurable virtual nodes. Keys route to the first N nodes clockwise from their token position.
  • SWIM Gossip — Failure detection via direct probes, indirect probes (PING_REQ), and adaptive suspicion timeouts that scale with log2(cluster_size).
  • Quorum Replication — Tunable consistency: writes block until W replicas acknowledge, reads gather R responses and resolve via HLC. Read repair corrects stale replicas automatically.
  • Conflict Resolution — Hybrid Logical Clocks (wall time + logical counter + node ID) with last-write-wins semantics.
  • Anti-Entropy — Periodic Merkle tree comparison by token range. Only differing keys are transferred.
  • Hinted Handoff — Writes destined for temporarily unavailable nodes are stored locally and replayed on recovery.
  • Coordinated Leave — Two-phase handoff: keys are transferred to successors and ACKed before the node marks itself as leaving.

Testing

# Run all tests
cd build && ctest --output-on-failure

# Run by category
./bin/unit_tests
./bin/integration_tests
./bin/functional_tests

# Run by tag
./bin/unit_tests "[store]"
./bin/unit_tests "[cluster]"
./bin/unit_tests "[gossip]"
./bin/unit_tests "[protocol]"
./bin/unit_tests "[egglog]"

Benchmarking

A cluster-aware benchmark tool is included:

python3 tools/benchmark.py --host 127.0.0.1 --port 7275 \
    --threads 4 --ops 100000 --read-ratio 0.8

# Cluster mode (smart routing, zero MOVED errors)
python3 tools/benchmark.py --host 127.0.0.1 --port 7275 \
    --cluster --threads 8 --ops 500000

Limits

Resource Limit
Key size 256 bytes
Value size 64 KB
Replication factor 15
Virtual nodes Configurable (default 128)

Contributing

Contributions are welcome. Please open an issue to discuss significant changes before submitting a pull request.

License

See LICENSE for details.