Mayfly
A lightweight, clustered TCP key-value store written in C++23. Binary protocol, epoll-driven, with optional disk persistence and distributed clustering.
Key Features
- Binary Protocol — Compact 12-byte header with big-endian wire format. Low overhead, no parsing ambiguity.
- Multiple Data Types — Strings, Lists (O(1) push/pop), 64-bit Counters, and Hashes.
- Disk Persistence — Append-only write-ahead log (.egg files) with CRC-32 integrity, snapshots, and automatic compaction.
- Clustering — Consistent hashing, SWIM gossip protocol, tunable quorum replication, and Hybrid Logical Clock conflict resolution.
- Lock-Free Reads — Epoch-based reclamation with shared-lock + atomic-load read path. Readers never block each other.
- LRU Eviction — Striped LRU lists with configurable memory limits and automatic eviction under pressure.
- TTL Support — Per-key expiration with a background reaper thread.
- Pipelining — Configurable worker thread pool with ordered or unordered response delivery.
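To make the Hybrid Logical Clock item above more concrete, here is a minimal sketch of how timestamps built from wall time, a logical counter, and a node ID could be ordered for last-write-wins. The type names, field widths, and the `hlc_tick` helper are assumptions made for illustration and are not taken from the Mayfly sources.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>
#include <tuple>

// Hypothetical timestamp layout; the field widths are assumptions.
struct HlcTimestamp {
    std::uint64_t wall_ms;  // physical wall-clock time in milliseconds
    std::uint32_t logical;  // counter that orders events sharing a wall time
    std::uint32_t node_id;  // final tie-breaker between nodes
};

// Orders timestamps by wall time, then logical counter, then node ID.
inline bool hlc_less(const HlcTimestamp& a, const HlcTimestamp& b) {
    return std::tie(a.wall_ms, a.logical, a.node_id) <
           std::tie(b.wall_ms, b.logical, b.node_id);
}

// A value tagged with the timestamp of the write that produced it.
struct Versioned {
    HlcTimestamp ts;
    std::string value;
};

// Last-write-wins: keep whichever version carries the larger timestamp.
inline const Versioned& resolve_lww(const Versioned& a, const Versioned& b) {
    return hlc_less(a.ts, b.ts) ? b : a;
}

// Advances the clock for a local event. If the physical clock has not moved
// forward, the logical counter is bumped so timestamps stay monotonic.
inline HlcTimestamp hlc_tick(const HlcTimestamp& last, std::uint64_t now_ms) {
    if (now_ms > last.wall_ms) {
        return {now_ms, 0, last.node_id};
    }
    assert(last.logical != std::numeric_limits<std::uint32_t>::max());
    return {last.wall_ms, last.logical + 1, last.node_id};
}
```

Because the node ID takes part in the comparison, two replicas that see the same pair of conflicting writes pick the same winner without coordinating.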
Architecture
                      ┌─────────────────────────────────────────┐
                      │              Mayfly Server              │
Clients ──TCP:7275──► │                                         │
                      │  epoll event loop                       │
                      │    │                                    │
                      │    ├─► Protocol Parser (0xCAFE)         │
                      │    │     │                              │
                      │    │     ├─► Store (striped R/W locks)  │
                      │    │     │     ├── Strings              │
                      │    │     │     ├── Lists                │
                      │    │     │     ├── Counters             │
                      │    │     │     └── Hashes               │
                      │    │     │                              │
                      │    │     ├─► EggLog (WAL)               │
                      │    │     └─► Snapshotter                │
                      │    │                                    │
Peers  ───TCP:8275──► │    └─► Cluster Protocol (0xBEE5)        │
                      │          ├── Consistent Hash Ring       │
                      │          ├── SWIM Gossip                │
                      │          ├── Quorum Replication         │
                      │          └── Anti-Entropy Sync          │
                      └─────────────────────────────────────────┘
Data flow: Client connects via TCP → epoll dispatches to connection state machine → binary protocol parsed → request routed to the in-memory store (protected by striped reader-writer locks). If persistence is enabled, mutations hit the write-ahead log before acknowledgement. In cluster mode, writes replicate to N-1 peers and reads optionally gather a quorum.
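The striped reader-writer locking mentioned in the data flow can be sketched roughly as follows. This is an illustration under assumptions, not Mayfly's actual store: the class name `StripedStore`, the stripe count of 16, and the string-only values are all hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical striped map: each key hashes to one stripe, and each stripe
// has its own reader-writer lock, so unrelated keys rarely contend.
class StripedStore {
public:
    void set(const std::string& key, std::string value) {
        Stripe& s = stripe_for(key);
        std::unique_lock<std::shared_mutex> lock(s.mutex);  // exclusive for writes
        s.map[key] = std::move(value);
    }

    std::optional<std::string> get(const std::string& key) const {
        const Stripe& s = stripe_for(key);
        std::shared_lock<std::shared_mutex> lock(s.mutex);  // shared for reads
        auto it = s.map.find(key);
        if (it == s.map.end()) return std::nullopt;
        return it->second;
    }

    bool del(const std::string& key) {
        Stripe& s = stripe_for(key);
        std::unique_lock<std::shared_mutex> lock(s.mutex);
        return s.map.erase(key) > 0;
    }

private:
    static constexpr std::size_t kStripes = 16;  // assumed stripe count

    struct Stripe {
        mutable std::shared_mutex mutex;
        std::unordered_map<std::string, std::string> map;
    };

    static std::size_t index_of(const std::string& key) {
        return std::hash<std::string>{}(key) % kStripes;
    }
    Stripe& stripe_for(const std::string& key) { return stripes_[index_of(key)]; }
    const Stripe& stripe_for(const std::string& key) const { return stripes_[index_of(key)]; }

    std::array<Stripe, kStripes> stripes_;
};
```

Readers of the same stripe share the lock, while a writer only excludes other operations on keys that hash to its stripe.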
Quick Start
Prerequisites
- C++23 compiler (GCC 13+ or Clang 17+)
- CMake 3.20+
Build
git clone https://github.com/your-org/mayfly.git
cd mayfly
cmake -B build
cmake --build build -j$(nproc)
Binaries are output to bin/.
Run
Start the server:
./bin/mayfly
Connect with the CLI:
./bin/mayfly-cli
> connect 127.0.0.1 7275
> SET hello world
OK
> GET hello
world
> SET counter 0
OK
> INCR counter
1
> LPUSH tasks "buy milk" "walk dog"
OK
> LRANGE tasks 0 -1
walk dog
buy milk
> DEL hello
OK
Run with Persistence
./bin/mayfly --backend disk --data-dir ./data
Run a Cluster
# Node 1
./bin/mayfly --port 7275 --cluster-port 8275
# Node 2 (joins node 1)
./bin/mayfly --port 7276 --cluster-port 8276 --join 127.0.0.1:8275
# Node 3
./bin/mayfly --port 7277 --cluster-port 8277 --join 127.0.0.1:8275
Commands
| Category | Commands |
|---|---|
| Basic | SET GET DEL EXISTS KEYS TTL PING QUIT FLUSH |
| Lists | LPUSH RPUSH LPOP RPOP LRANGE LLEN |
| Counters | INCR DECR INCRBY DECRBY |
| Hashes | HSET HGET HDEL HGETALL HKEYS HLEN |
| Admin | AUTH COMPACT SNAPSHOT MEMORY |
| Cluster | CLUSTER_INFO CLUSTER_NODES LEAVE |
See SPEC.md for the full protocol specification with wire format diagrams.
Configuration
Server
| Flag | Default | Description |
|---|---|---|
| --port | 7275 | Client listen port |
| --max-clients | 128 | Maximum concurrent connections |
| --token | (none) | Require authentication token |
| --connection-timeout | 0 | Idle connection timeout in seconds (0 = disabled) |
| --max-memory | 0 | Memory limit in MB (0 = unlimited) |
| --log-level | info | One of: trace, debug, info, warn, error, off |
Persistence
| Flag | Default | Description |
|---|---|---|
| --backend | memory | memory or disk |
| --data-dir | ./mayfly-data | WAL and snapshot directory |
| --batch-interval | 10 | WAL group-commit interval in ms |
| --compact-threshold | 64 | Auto-compaction trigger size in MB |
| --snapshot-retention | 5 | Number of snapshots to keep |
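The write-ahead log is described as carrying CRC-32 integrity checks. As a rough illustration of what such a check involves, the sketch below computes the common reflected CRC-32 (polynomial 0xEDB88320, the variant used by zlib and Ethernet) over a record payload. Which CRC-32 variant Mayfly uses and how its .egg records are framed are not stated here, so the `LogRecord` struct and helper names are hypothetical.

```cpp
#include <cstdint>
#include <string>
#include <string_view>
#include <utility>

// Bitwise CRC-32 with the reflected polynomial 0xEDB88320.
// Assumption: the WAL may use a different variant or a table-driven version.
inline std::uint32_t crc32(std::string_view data) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (unsigned char byte : data) {
        crc ^= byte;
        for (int bit = 0; bit < 8; ++bit) {
            // Shift right; XOR in the polynomial when the low bit was set.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

// Hypothetical in-memory view of one log record.
struct LogRecord {
    std::uint32_t checksum;
    std::string payload;
};

// Stamps a payload with its checksum before it is appended to the log.
inline LogRecord make_record(std::string payload) {
    const std::uint32_t sum = crc32(payload);
    return LogRecord{sum, std::move(payload)};
}

// On replay, a record whose checksum does not match is treated as corrupt.
inline bool record_is_intact(const LogRecord& record) {
    return crc32(record.payload) == record.checksum;
}
```

A replay loop would typically stop at the first record that fails this check, since a torn write at the tail of an append-only log is the usual cause.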
Pipelining
| Flag | Default | Description |
|---|---|---|
| (fixed) | auto | Worker threads (hardware concurrency) |
| (fixed) | 64 | Max in-flight requests per connection |
| --unordered-responses | false | Allow out-of-order response delivery |
Clustering
| Flag | Default | Description |
|---|---|---|
| --cluster-port | 0 | Cluster gossip port (0 = standalone) |
| --join | (none) | Seed node address (host:port) |
| --replication-factor | 3 | Replicas per key (1-5) |
| --vnodes | 128 | Virtual nodes per physical node |
| --gossip-interval | 500 | SWIM gossip period in ms |
| --suspicion-timeout | 5000 | Suspect-to-dead timeout in ms |
| --read-quorum | 0 | Read quorum (0 = auto: floor(N/2)+1) |
| --write-quorum | 0 | Write quorum (0 = auto: floor(N/2)+1) |
| --hint-ttl | 3600 | Hinted handoff hint TTL in seconds |
Configuration can also be loaded from a file via --config <path> or set through MAYFLY_* environment variables.
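The auto setting for --read-quorum and --write-quorum resolves to floor(N/2)+1. The sketch below works through that arithmetic, assuming N is the replication factor; the function names are hypothetical and only illustrate the formula.

```cpp
#include <cstdint>

// Majority quorum for n replicas: floor(n / 2) + 1.
constexpr std::uint32_t auto_quorum(std::uint32_t n) {
    return n / 2 + 1;  // unsigned integer division already floors
}

// Resolves a configured quorum value, treating 0 as "auto".
// Assumption: N in the flag description is the replication factor.
constexpr std::uint32_t effective_quorum(std::uint32_t configured,
                                         std::uint32_t replication_factor) {
    return configured == 0 ? auto_quorum(replication_factor) : configured;
}

// When R + W > N, every read set shares at least one replica with every
// write set, so a read always reaches a replica that saw the latest write.
constexpr bool quorums_overlap(std::uint32_t r, std::uint32_t w, std::uint32_t n) {
    return r + w > n;
}

// With the default replication factor of 3, both quorums resolve to 2.
static_assert(effective_quorum(0, 3) == 2);
static_assert(quorums_overlap(2, 2, 3));
```

Under this formula the majority quorum is 1, 2, 2, 3, 3 for replication factors 1 through 5, and a pair of majority quorums always overlaps.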
Protocol
Mayfly uses a custom binary protocol. All integers are big-endian.
Client Protocol (magic 0xCAFE)
   0      1      2      3      4      5      6      7      8      9      10     11
┌──────┬──────┬──────┬──────┬──────┬──────┬──────┬──────┬──────┬──────┬──────┬──────┐
│    Magic    │ Ver  │ Cmd  │        Request ID         │      Payload Length       │
│   0xCAFE    │ 0x01 │      │                           │                           │
└──────┴──────┴──────┴──────┴──────┴──────┴──────┴──────┴──────┴──────┴──────┴──────┘
In cluster mode, an optional 13th byte specifies consistency level: ONE (0x01), QUORUM (0x02), or ALL (0x03).
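As a sketch of how the 12-byte client header could be packed and unpacked in big-endian order, the code below assumes a 2-byte magic, 1-byte version, 1-byte command, 4-byte request ID, and 4-byte payload length. The 4/4 split of the last two fields is inferred from the total size rather than confirmed, and the struct and function names are hypothetical; SPEC.md remains the authority.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>

constexpr std::uint16_t kMagic = 0xCAFE;
constexpr std::uint8_t kVersion = 0x01;
constexpr std::size_t kHeaderSize = 12;

// Hypothetical decoded header; field widths are assumptions (see lead-in).
struct Header {
    std::uint8_t command;
    std::uint32_t request_id;
    std::uint32_t payload_length;
};

// Writes a 32-bit value most significant byte first.
inline void put_be32(std::uint8_t* out, std::uint32_t v) {
    out[0] = static_cast<std::uint8_t>(v >> 24);
    out[1] = static_cast<std::uint8_t>(v >> 16);
    out[2] = static_cast<std::uint8_t>(v >> 8);
    out[3] = static_cast<std::uint8_t>(v);
}

// Reads a 32-bit value stored most significant byte first.
inline std::uint32_t get_be32(const std::uint8_t* in) {
    return (std::uint32_t{in[0]} << 24) | (std::uint32_t{in[1]} << 16) |
           (std::uint32_t{in[2]} << 8) | std::uint32_t{in[3]};
}

inline std::array<std::uint8_t, kHeaderSize> encode_header(const Header& h) {
    std::array<std::uint8_t, kHeaderSize> buf{};
    buf[0] = static_cast<std::uint8_t>(kMagic >> 8);    // 0xCA
    buf[1] = static_cast<std::uint8_t>(kMagic & 0xFF);  // 0xFE
    buf[2] = kVersion;
    buf[3] = h.command;
    put_be32(&buf[4], h.request_id);
    put_be32(&buf[8], h.payload_length);
    return buf;
}

// Returns std::nullopt when the magic or version does not match.
inline std::optional<Header> decode_header(
        const std::array<std::uint8_t, kHeaderSize>& buf) {
    const auto magic = static_cast<std::uint16_t>((buf[0] << 8) | buf[1]);
    if (magic != kMagic || buf[2] != kVersion) return std::nullopt;
    return Header{buf[3], get_be32(&buf[4]), get_be32(&buf[8])};
}
```

Shifting bytes explicitly, rather than copying a native integer into the buffer, keeps the wire format identical on little-endian and big-endian hosts.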
Clustering
Mayfly's clustering layer is designed for partition tolerance and eventual consistency.
- Consistent Hashing — xxHash-based token ring with configurable virtual nodes. Keys route to the first N nodes clockwise from their token position.
- SWIM Gossip — Failure detection via direct probes, indirect probes (PING_REQ), and adaptive suspicion timeouts that scale with log2(cluster_size).
- Quorum Replication — Tunable consistency: writes block until W replicas acknowledge, reads gather R responses and resolve via HLC. Read repair corrects stale replicas automatically.
- Conflict Resolution — Hybrid Logical Clocks (wall time + logical counter + node ID) with last-write-wins semantics.
- Anti-Entropy — Periodic Merkle tree comparison by token range. Only differing keys are transferred.
- Hinted Handoff — Writes destined for temporarily unavailable nodes are stored locally and replayed on recovery.
- Coordinated Leave — Two-phase handoff: keys are transferred to successors and ACKed before the node marks itself as leaving.
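The consistent hashing item above can be illustrated with a small token ring. This is a sketch under assumptions: it uses 64-bit FNV-1a as a dependency-free stand-in for xxHash, derives virtual node tokens from a made-up "name#index" scheme, and skips tokens that belong to an already selected physical node. The `HashRing` class and its methods are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <string_view>
#include <vector>

// 64-bit FNV-1a, used here only as a stand-in for xxHash.
inline std::uint64_t fnv1a64(std::string_view s) {
    std::uint64_t h = 14695981039346656037ull;
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ull;
    }
    return h;
}

class HashRing {
public:
    explicit HashRing(std::size_t vnodes = 128) : vnodes_(vnodes) {}

    // Places `vnodes_` tokens on the ring for one physical node.
    void add_node(const std::string& node) {
        for (std::size_t i = 0; i < vnodes_; ++i) {
            ring_[fnv1a64(node + "#" + std::to_string(i))] = node;
        }
    }

    // Walks clockwise from the key's token and collects up to `n` distinct
    // physical nodes (assumption: repeated owners are skipped).
    std::vector<std::string> replicas_for(std::string_view key, std::size_t n) const {
        std::vector<std::string> owners;
        if (ring_.empty()) return owners;
        auto it = ring_.lower_bound(fnv1a64(key));
        for (std::size_t steps = 0; steps < ring_.size() && owners.size() < n;
             ++steps, ++it) {
            if (it == ring_.end()) it = ring_.begin();  // wrap around the ring
            if (std::find(owners.begin(), owners.end(), it->second) == owners.end()) {
                owners.push_back(it->second);
            }
        }
        return owners;
    }

private:
    std::size_t vnodes_;
    std::map<std::uint64_t, std::string> ring_;  // token -> physical node
};
```

With many virtual nodes per physical node, adding or removing a node moves only the keys adjacent to its tokens instead of reshuffling the whole keyspace.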
Testing
# Run all tests
cd build && ctest --output-on-failure
# Run by category
./bin/unit_tests
./bin/integration_tests
./bin/functional_tests
# Run by tag
./bin/unit_tests "[store]"
./bin/unit_tests "[cluster]"
./bin/unit_tests "[gossip]"
./bin/unit_tests "[protocol]"
./bin/unit_tests "[egglog]"
Benchmarking
A cluster-aware benchmark tool is included:
python3 tools/benchmark.py --host 127.0.0.1 --port 7275 \
--threads 4 --ops 100000 --read-ratio 0.8
# Cluster mode (smart routing, zero MOVED errors)
python3 tools/benchmark.py --host 127.0.0.1 --port 7275 \
--cluster --threads 8 --ops 500000
Limits
| Resource | Limit |
|---|---|
| Key size | 256 bytes |
| Value size | 64 KB |
| Replication factor | 1–5 |
| Virtual nodes | Configurable (default 128) |
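A request validator for the key and value limits above might look like the sketch below. It assumes 64 KB means 65,536 bytes and that both limits are inclusive; the constant and function names are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

constexpr std::size_t kMaxKeyBytes = 256;
constexpr std::size_t kMaxValueBytes = 64 * 1024;  // assumes 64 KB = 65,536 bytes

enum class LimitCheck { Ok, KeyTooLarge, ValueTooLarge };

// Checks a key/value pair against the documented size limits.
// Assumption: a key of exactly 256 bytes and a value of exactly 64 KB pass.
constexpr LimitCheck check_limits(std::string_view key, std::string_view value) {
    if (key.size() > kMaxKeyBytes) return LimitCheck::KeyTooLarge;
    if (value.size() > kMaxValueBytes) return LimitCheck::ValueTooLarge;
    return LimitCheck::Ok;
}
```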
Contributing
Contributions are welcome. Please open an issue to discuss significant changes before submitting a pull request.
License
See LICENSE for details.