JanusGraph Storage Backend Architecture & Configuration

JanusGraph decouples storage, indexing, and compute into three independently scaled tiers, which buys architectural flexibility but converts every write into a distributed-consistency problem you must engineer explicitly. This subsystem is the primary failure surface in production JanusGraph: a single-node graph that “works on my laptop” collapses under sustained ingestion because the storage commit path, the asynchronous index dispatch, and the traversal layer each fail in different ways and at different times. This reference is the entry point for the storage tier; it sits alongside the External Index Synchronization & Consistency Tuning and Graph Schema Validation & Modeling Strategies references, and links down into every storage-specific guide. The focus throughout is deterministic behavior, measurable latency, and failure recovery under load — not conceptual tourism.

The diagram below shows how a single mutation flows through JanusGraph’s three layers — committed synchronously to storage, then dispatched asynchronously to the index.

Core Architecture & Consistency Boundaries

JanusGraph operates as a distributed graph engine that translates Gremlin traversals into discrete storage and index operations. It owns no persistence of its own — it is a coordination and query-planning layer bolted onto pluggable backends. The architecture consists of three primary tiers, each with a distinct responsibility and a distinct failure mode:

Storage Backend — Persists vertices, edges, and properties as wide rows keyed by vertex ID. Typically Apache Cassandra or ScyllaDB via the CQL driver. Owns partitioning, compaction, replication, and durability. Failure here is the loudest and the safest: writes reject, reads throw NoHostAvailableException, and nothing silently corrupts.
Index Backend — Maintains secondary indexes for property lookups and full-text search. Typically Elasticsearch or OpenSearch. Owns has() predicate resolution, range queries, and mixed-index scoring. Failure here is the quietest and the most dangerous: traversals still return, but they return stale or incomplete results because the index has drifted from storage.
Compute / Traversal Layer — The JanusGraph instance (embedded or behind Gremlin Server) that executes Gremlin, manages transaction boundaries, holds the database-level cache, and coordinates the two-phase write across storage and index. Failure here surfaces as transaction aborts, lock contention, and cache incoherence.

The transaction pipeline and where drift originates

Data mutations flow through a strict, two-phase pipeline. When a vertex or edge is created, JanusGraph writes to the storage backend first and commits synchronously — the CQL write must be acknowledged before tx.commit() returns. Only after the storage commit succeeds does JanusGraph enqueue the corresponding index mutation for asynchronous application by a background worker thread. These two phases are not atomic. The storage write can succeed while the index write is still queued, in flight, retrying, or dropped.

That non-atomic seam is the origin of almost all production “the data is there but the query can’t find it” incidents. The index is eventually consistent by construction, and the length of “eventually” is governed by index-backend refresh intervals, bulk-flush thresholds, worker-thread backpressure, and network health — none of which JanusGraph guarantees for you. The distinction between the storage tier’s tunable quorum guarantees and the index tier’s inherent lag is the single most important mental model on this site; it is developed in depth under Eventual vs Strong Consistency. Explicit control over commit boundaries, backpressure thresholds, and recovery routines is mandatory — defaults do not survive contact with production throughput.

Production Storage Backend Configuration

Storage backend tuning requires explicit control over partitioning, consistency levels, compaction, and connection lifecycle. Default janusgraph.properties values target a demo, not a production cluster. The following block is a hardened baseline for CQL deployments carrying high-throughput ingestion alongside low-latency traversals; every non-default value is justified below it.

properties

# Storage Backend
storage.backend=cql
storage.hostname=10.0.1.10,10.0.1.11,10.0.1.12
storage.port=9042
storage.cql.keyspace=janusgraph_prod
storage.cql.local-datacenter=dc1

# Consistency & Performance
storage.cql.read-consistency-level=LOCAL_QUORUM
storage.cql.write-consistency-level=LOCAL_QUORUM
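storage.cql.compression=true
storage.cql.compression-type=LZ4Compressor
storage.cql.compaction-strategy-class=SizeTieredCompactionStrategy
# STCS options go in storage.cql.compaction-strategy-options as a key,value list;
# sstable_size_in_mb belongs to LeveledCompactionStrategy and is omitted here.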
storage.cql.batch-statement-size=50

# Transaction & Cache
cache.db-cache=true
cache.db-cache-clean-wait=20
cache.db-cache-time=180000
cache.db-cache-size=0.25

Rationale for every non-default value:

storage.cql.local-datacenter — Pins the driver's load-balancing policy to one datacenter. Current CQL drivers expect an explicit local datacenter, and a value that is missing or does not match the real DC name either fails to find usable hosts or lets requests cross datacenters, adding cross-DC latency to your commit path. Set it to the DC your JanusGraph instances physically live in.
read-consistency-level / write-consistency-level = LOCAL_QUORUM — LOCAL_QUORUM acknowledges once a majority of replicas in the local DC confirm, which, at a replication factor of 3 in that DC, survives a single-node failure without paying cross-DC round-trips. ALL stalls whenever any single replica is slow or down; ONE risks losing acknowledged writes during node failure. The precise availability math is worked through under Replication Strategies.
compression = LZ4Compressor — LZ4 is the correct default for graph SSTables: near-free CPU cost, meaningful I/O reduction on the adjacency-list rows JanusGraph writes. NONE only makes sense when your storage nodes are CPU-bound rather than I/O-bound, which is rare for graph workloads.
compaction-strategy = SizeTieredCompactionStrategy — STCS suits the write-heavy, append-mostly mutation pattern of graph ingestion. Switch to TimeWindowCompactionStrategy only when your data has strict TTL-based expiration (e.g. time-bucketed event graphs); switching for a non-TTL workload inflates read amplification.
batch-statement-size = 50 — Caps the number of statements JanusGraph packs into a single CQL batch (unlogged unless storage.cql.atomic-batch-mutate is enabled). Oversized batches overload the coordinator node; 50 is a safe ceiling that still amortizes round-trip cost.
cache.db-cache-size = 0.25 — Reserves 25% of heap for the database-level cache. Push this higher only after you have confirmed the JVM has headroom; the cache competes directly with transaction working sets, and an oversized cache triggers long GC pauses that read exactly like storage latency.
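The quorum reasoning behind the consistency-level choice reduces to a few lines of arithmetic. The sketch below is illustrative only; the function names are hypothetical helpers, and it models a single datacenter with a uniform replication factor.

```python
def quorum(replication_factor):
    """Replicas that must acknowledge a (LOCAL_)QUORUM read or write."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor):
    """Replica failures a quorum operation can survive in one datacenter."""
    return replication_factor - quorum(replication_factor)


def read_sees_latest_write(read_acks, write_acks, replication_factor):
    """True when the read and write replica sets are guaranteed to overlap."""
    return read_acks + write_acks > replication_factor


for rf in (1, 2, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, "
          f"tolerated failures={tolerated_failures(rf)}")
```

Note that a replication factor of 2 tolerates no failures at quorum, which is why 3 is the usual minimum when both reads and writes run at LOCAL_QUORUM.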

Two adjacent surfaces deserve their own attention before you scale ingestion. The DataStax Java driver underneath JanusGraph manages socket allocation, and an under-sized pool causes traversal timeouts during peak load; getting Connection Pooling right prevents thread starvation and reduces GC pressure on the JanusGraph JVM. Keyspace replication factors must also align with your physical topology before any bulk load — a mismatch between replication-strategy-options and the real rack layout triggers read-repair storms that inflate p99 latency; settle your Replication Strategies first. Initial cluster provisioning, keyspace DDL, and token-range alignment are covered end to end in Cassandra Backend Setup.

When the storage engine itself is the bottleneck

If Cassandra’s coordinator-thread model and RPC limits become the ceiling on your ingestion rate, the CQL-compatible path forward is ScyllaDB, which reimplements the same protocol on a shard-per-core, thread-per-shard architecture. The property overrides, driver settings, and latency benchmarks for that transition — plus the compatibility caveats that bite during cutover — live in the ScyllaDB Migration guide. Do not treat it as a drop-in swap: schema handling, compaction behavior, and consistency defaults differ enough to require validation against your own traffic.

Index Backend Wiring & Synchronization Mechanics

Mixed indexes in JanusGraph, the ones served by the search backend, are eventually consistent by construction. The index.search.backend=elasticsearch setting routes property mutations to a separate search cluster; OpenSearch clusters use the same elasticsearch backend value because they speak a compatible wire protocol. A hardened index binding looks like this:

properties

# Index Backend (value stays "elasticsearch" even for OpenSearch clusters)
index.search.backend=elasticsearch
index.search.hostname=10.0.2.10,10.0.2.11,10.0.2.12
index.search.port=9200
index.search.elasticsearch.client-only=true
index.search.elasticsearch.create.ext.cluster.name=janusgraph-index-prod
index.search.elasticsearch.create.ext.number_of_shards=3
index.search.elasticsearch.create.ext.number_of_replicas=1
index.search.elasticsearch.create.ext.refresh_interval=5s
index.search.elasticsearch.bulk-refresh=wait_for

The asynchronous dispatch model

Index synchronization runs on a background worker that drains a mutation queue populated at commit time. The dispatch is fire-and-forget from the traversal’s perspective: tx.commit() returns as soon as storage acknowledges, and the index catches up afterward. Three settings govern how far behind it can fall:

refresh_interval = 5s — Elasticsearch only makes newly indexed documents searchable at each refresh. The default 1s refresh generates excessive segment I/O under heavy ingestion; 5s (or 10s for pure batch loads) trades a slightly longer visibility window for dramatically lower merge pressure. This is a direct latency-vs-throughput dial.
bulk-refresh = wait_for — Makes bulk index requests block until the next refresh completes, which tightens the write-to-visible window at the cost of bulk throughput. Use it for pipelines that must read their own writes; leave it at the default for fire-and-forget bulk loads.
number_of_replicas = 1 — At least one replica so an index-node failure does not take down has()-predicate resolution. Do not run production property lookups against a zero-replica index.

Under heavy write load, the mutation queue can outrun the index cluster’s ingestion capacity, producing backpressure that shows up as growing queue depth and rising index lag. Tune index.search.elasticsearch.bulk-size and index.search.elasticsearch.max-retry-time to match your index cluster’s flush and merge capacity, and monitor the queue depth as a first-class metric — it is your earliest warning of drift. The backend-specific tuning of these windows, and the way OpenSearch and Elasticsearch diverge in practice, are detailed under Elasticsearch Integration and OpenSearch Sync Patterns.

Mixed indexes, composite indexes, and routing

JanusGraph exposes two index shapes. Composite indexes live in the storage backend itself and are strongly consistent for exact-match equality lookups — they never drift because they share the storage commit. Mixed indexes live in the search backend and support range, full-text, and geo predicates but inherit the eventual-consistency seam described above. Choosing the right shape per property, and routing a query to the index that can actually serve it, is the difference between a millisecond lookup and a full-graph scan; the decision procedure is developed under Mixed-Index Routing. Which properties even deserve an index in the first place is a schema-modeling decision, governed by the rules in Property Indexing Rules.
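The composite-versus-mixed choice described above can be written down as a small decision function. This is a simplified sketch under the assumption that predicate type is the only input; `Predicate` and `index_shape` are hypothetical names, and a real decision would also weigh cardinality and write volume.

```python
from enum import Enum


class Predicate(Enum):
    EQUALITY = "equality"
    RANGE = "range"
    FULL_TEXT = "full_text"
    GEO = "geo"


def index_shape(predicates):
    """Return which index shape can serve a property's query predicates.

    Composite serves exact-match-only access; anything involving range,
    full-text, or geo predicates needs a mixed index.
    """
    needed = set(predicates)
    if not needed:
        return "none"  # never queried by value, so no index is warranted
    if needed == {Predicate.EQUALITY}:
        return "composite"
    return "mixed"


print(index_shape([Predicate.EQUALITY]))
print(index_shape([Predicate.EQUALITY, Predicate.RANGE]))
```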

Python Pipeline Orchestration & Transaction Management

Graph ingestion pipelines built with gremlin-python must enforce explicit transaction boundaries and idempotent mutation patterns. The Gremlin Server session model does not auto-commit; a failure mid-pipeline leaves partial state that later batches will silently duplicate on retry. The pattern below batches mutations, commits explicitly, retries transient faults with bounded exponential backoff, and stays idempotent by keying every vertex on a business ID so a replayed batch updates rather than duplicates.

python

import logging
import random
import time

from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.driver.protocol import GremlinServerError
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.graph_traversal import __

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# GremlinServerError also wraps permanent faults (schema or data-validity errors);
# narrow this tuple or inspect exc.status_code before retrying in production.
TRANSIENT = (ConnectionError, TimeoutError, GremlinServerError)


def with_backoff(fn, attempts=5, base=0.5, cap=8.0):
    """Retry a callable on transient faults with exponential backoff + jitter."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except TRANSIENT as exc:
            if attempt == attempts:
                raise
            sleep = min(cap, base * (2 ** (attempt - 1))) + random.uniform(0, base)
            logger.warning("transient fault (%s), retry %d in %.2fs", exc, attempt, sleep)
            time.sleep(sleep)


def write_batch(g, batch):
    """Apply one batch in its own transaction; replay-safe because every write is an upsert."""
    tx = g.tx()
    gtx = tx.begin()  # begin() spawns the transaction-bound traversal source
    try:
        for v in batch:
            # Upsert: match on business id, create only if absent -> replay-safe.
            (gtx.V().has(v["label"], "id", v["id"]).fold()
                .coalesce(
                    __.unfold(),
                    __.addV(v["label"]).property("id", v["id"]))
                .property("data", v["data"])
                .iterate())
        tx.commit()  # explicit commit flushes to the CQL backend
    except Exception:
        try:
            tx.rollback()  # discard partial state so a retry starts clean
        except Exception:
            logger.exception("rollback failed after batch error")
        raise


def batch_ingest(g, vertices, batch_size=500):
    """Idempotent batched ingestion with explicit per-batch transaction boundaries."""
    for i in range(0, len(vertices), batch_size):
        batch = vertices[i:i + batch_size]
        try:
            # Retry the whole batch, not just commit(): a failed commit can leave
            # the transaction unusable, so the upserts are replayed in a fresh one.
            with_backoff(lambda: write_batch(g, batch))
            logger.info("committed batch %d..%d", i, i + len(batch))
        except Exception as exc:
            raise RuntimeError(f"ingestion failed at offset {i}: {exc}") from exc


if __name__ == "__main__":
    connection = DriverRemoteConnection("ws://janusgraph-server:8182/gremlin", "g")
    g = traversal().withRemote(connection)
    try:
        batch_ingest(g, load_vertices())  # load_vertices() supplied by your pipeline
    finally:
        connection.close()

Pipeline rules that hold across every backend:

Use .iterate() for mutations, never .toList(). .iterate() streams the traversal without materializing a result set, which keeps the Gremlin Server heap flat during million-vertex loads.
Key every write on a business ID and upsert with coalesce. The has(...).fold().coalesce(unfold(), addV(...)) idiom makes a replayed batch an update instead of a duplicate, which is the property that makes retrying a failed batch safe.
Commit 200–500 mutations per transaction. Smaller batches waste round-trips on transaction-log overhead; larger batches amplify CQL writes and lengthen the rollback you pay on any failure inside the batch.
Retry only transient exceptions. Dropped connections (ConnectionError), timeouts (TimeoutError), and coordinator-overload GremlinServerErrors are retryable; a schema or data-validity error is not, and blindly retrying it burns your backoff budget while the real fault persists.
Close the connection in finally. A leaked DriverRemoteConnection holds a WebSocket and its share of the server-side thread pool open until the server times it out.
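The "retry only transient exceptions" rule can be made explicit with a classifier that runs before any backoff. The sketch below is a hedged illustration: `is_retryable` and `RETRYABLE_STATUS_CODES` are hypothetical names, and treating Gremlin Server status codes 596 (temporary server error) and 598 (server timeout) as the retryable set is an assumption to validate against your own deployment, since storage faults often surface as a generic 500.

```python
# Assumed-retryable Gremlin Server status codes; adjust for your deployment.
RETRYABLE_STATUS_CODES = {596, 598}  # temporary server error, server timeout


def is_retryable(exc):
    """Classify an exception as transient (retry) or permanent (fail fast)."""
    if isinstance(exc, (ConnectionError, TimeoutError)):
        return True
    # gremlin-python's GremlinServerError exposes status_code; duck-typing it
    # keeps this sketch runnable without the driver installed.
    status_code = getattr(exc, "status_code", None)
    return status_code in RETRYABLE_STATUS_CODES


class FakeServerError(Exception):
    """Stand-in for GremlinServerError, used only to exercise the classifier."""

    def __init__(self, status_code):
        super().__init__(f"status {status_code}")
        self.status_code = status_code


for candidate in (TimeoutError(), FakeServerError(598), FakeServerError(597), ValueError()):
    print(type(candidate).__name__, is_retryable(candidate))
```

A retry wrapper would call this inside its except clause and re-raise immediately when it returns False.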

For traversal-optimization and session-management patterns beyond ingestion, the authoritative source is the Apache TinkerPop Gremlin documentation. Schema-level guarantees that keep these pipelines from writing invalid graphs — label and cardinality enforcement — are covered under Vertex and Edge Validation.

Diagnostics, Index Repair & Failure Recovery

Production graph systems require continuous observability into storage latency, index queue depth, and cache hit ratios. Expose JMX over Prometheus and track, at minimum, these three MBeans:

org.janusgraph.diskstorage.cql.CQLStoreManager — CQL read/write latency percentiles and connection-pool utilization. Rising pool utilization with flat throughput is the signature of a starved connection pool.
org.janusgraph.diskstorage.indexing.IndexProvider — Index queue size and bulk-flush duration. This is your index-drift early-warning system; alert when queue depth trends upward across successive scrapes.
org.janusgraph.graphdb.database.StandardJanusGraph — Cache hit/miss ratios and transaction abort rate. A collapsing cache hit ratio usually precedes a latency incident by minutes.
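Alerting on queue depth that "trends upward across successive scrapes" needs a concrete definition of a trend. The sketch below is one minimal interpretation, assuming you already collect a chronological list of queue-depth readings; `is_trending_up` and its parameters are hypothetical and not tied to any particular metrics system.

```python
def is_trending_up(samples, min_points=4, tolerance=0):
    """True when the last `min_points` samples rise at every step.

    `samples` is a chronological list of queue-depth readings, one per scrape.
    A step only counts as a rise when it exceeds `tolerance`.
    """
    if len(samples) < min_points:
        return False  # not enough history to call it a trend
    window = samples[-min_points:]
    return all(later - earlier > tolerance
               for earlier, later in zip(window, window[1:]))


print(is_trending_up([10, 12, 15, 21]))   # steady growth
print(is_trending_up([10, 12, 11, 21]))   # one dip breaks the trend
```

Requiring several consecutive rises keeps a single bulk-flush spike from paging anyone, at the cost of reacting a few scrapes later.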

Reindexing after drift

When the index drifts — because of a network partition, an index-node failure, or a dropped bulk request — reconcile it with a targeted reindex through the Management API rather than rebuilding from scratch:

java

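import java.time.temporal.ChronoUnit;

import org.janusgraph.core.schema.JanusGraphIndex;
import org.janusgraph.core.schema.JanusGraphManagement;
import org.janusgraph.core.schema.SchemaAction;
import org.janusgraph.core.schema.SchemaStatus;
import org.janusgraph.graphdb.database.management.ManagementSystem;

// "search" here must be the graph index name passed to buildIndex(...); it is
// a separate identifier from the backend name used in the index.search.* keys.
JanusGraphManagement mgmt = graph.openManagement();
JanusGraphIndex index = mgmt.getGraphIndex("search");
mgmt.updateIndex(index, SchemaAction.REINDEX).get();
mgmt.commit();

// Block until the index is demonstrably consistent before routing reads to it.
ManagementSystem.awaitGraphIndexStatus(graph, "search")
    .status(SchemaStatus.ENABLED)
    .timeout(10, ChronoUnit.MINUTES)
    .call();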

Always gate production traffic on awaitGraphIndexStatus() after any reindex or schema change; routing reads to an index still in INSTALLED or REGISTERED state returns partial results with no error. Reindexing is also the mandatory final step of any storage migration — validate index parity before cutover, using the procedure in the ScyllaDB Migration guide. When drift crosses a threshold you care about, the alerting wiring that pages a human belongs in Alert Routing for Violations.

Failure-mode reference

The six failure modes below account for the overwhelming majority of production storage-tier incidents. Each maps a symptom to the command that confirms the diagnosis and the resolution that clears it.

Symptom	Diagnosis command	Resolution
`NoHostAvailableException` on every write	`nodetool status` — check for `DN` (down) nodes and whether live replicas meet the write consistency level	Restore downed nodes or temporarily drop `write-consistency-level` from `LOCAL_QUORUM` to `LOCAL_ONE` if degraded writes are acceptable; verify pool sizing per Connection Pooling
Vertex committed but `g.V().has(...)` returns nothing	JMX `IndexProvider` queue size climbing; compare storage vs index counts	Wait one `refresh_interval`, then reconcile via `SchemaAction.REINDEX`; if chronic, raise `bulk-size` and lower `refresh_interval` per OpenSearch Sync Patterns
p99 traversal latency spikes under steady load	`nodetool tpstats` — look for pending/blocked flush and read-repair tasks	Align keyspace RF with topology to stop read-repair storms; revisit Replication Strategies and compaction backlog
Traversal timeouts during ingestion peaks, storage nodes healthy	JMX `CQLStoreManager` pool utilization pinned at max; JanusGraph JVM GC pauses lengthening	Increase `max-connections-per-host` and `core-connections-per-host`; cap `cache.db-cache-size`; full procedure in the connection pool tuning guide
Partial graph after a crashed pipeline	Query business-ID range and diff against source-of-truth counts	Re-run the idempotent upsert batch (safe to replay); never re-run a non-idempotent `addV`-only loop
Index stuck in `INSTALLED` / `REGISTERED`, reads incomplete	`mgmt.printIndexes()` shows non-`ENABLED` status	Run `SchemaAction.REGISTER_INDEX` then `REINDEX`, and block on `awaitGraphIndexStatus(...).status(ENABLED)` before routing reads
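
For the partial-graph row in the table above, the diff against the source of truth is a set difference followed by re-batching. The sketch below assumes both sides can be reduced to lists of business IDs; `missing_ids` and `replay_batches` are hypothetical helpers, and the ID values are made up for illustration.

```python
def missing_ids(source_ids, graph_ids):
    """Business IDs present in the source of truth but absent from the graph."""
    return sorted(set(source_ids) - set(graph_ids))


def replay_batches(ids, batch_size=500):
    """Split the missing IDs into batches sized for the idempotent upsert."""
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]


source = ["a1", "a2", "a3", "a4", "a5"]
graph = ["a1", "a3"]
todo = missing_ids(source, graph)
print(todo, replay_batches(todo, batch_size=2))
```

Replaying more than the strict difference is harmless when the writes are upserts, so an approximate diff is acceptable as long as it never under-reports.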

Cassandra Backend Setup — keyspace provisioning, token-range alignment, and consistency defaults.
Connection Pooling — CQL driver pool sizing to prevent thread starvation and traversal timeouts.
Replication Strategies — NetworkTopologyStrategy, per-DC replication factors, and quorum availability math.
ScyllaDB Migration — CQL-compatible cutover for higher throughput and lower tail latency.
Eventual vs Strong Consistency — the storage-versus-index consistency model in depth.
Mixed-Index Routing — composite vs mixed index selection and query routing.