Enforcing Strict Vertex and Edge Validation

This guide gives the procedure for switching a running JanusGraph cluster from permissive schema auto-creation to strict, pre-commit enforcement. It prevents one of the most expensive failures in graph operations: silent schema drift that corrupts a dataset one malformed write at a time. It is a task-level guide under the parent Vertex and Edge Validation guide, and it assumes the modeling decisions made there and in the Graph Schema Validation & Modeling Strategies pillar are already settled. Lenient defaults let ingestion pipelines bypass type constraints, cardinality rules, and label boundaries; downstream analytics and index synchronization then degrade unpredictably, with no clean error to page on. Enforcement happens in three coordinated places: the storage backend configuration, an explicit validation gate in the ingestion pipeline, and deterministic index reconciliation. This guide walks through all three in the order you must apply them.

Prerequisites

Confirm every item below before you change a single property. Enabling strict validation on a running cluster with unregistered schema elements will reject all subsequent writes that reference them, so the order of operations is not optional.

JanusGraph 0.6.x or 1.0.x with the CQL storage adapter, running against Apache Cassandra 3.11+/4.x or ScyllaDB. Version-specific keyspace notes live in Cassandra Backend Setup; ScyllaDB cutovers should follow ScyllaDB Migration before you tighten schema mode.
A reachable mixed-index backend (Elasticsearch 7.x/8.x or an API-compatible OpenSearch cluster) with client-only=true connectivity from every JanusGraph node.
gremlin-python 3.6+ matching the server’s Apache TinkerPop line, plus network access to the Gremlin Server WebSocket endpoint on port 8182.
Management-API permissions — the account running the Gremlin console must be able to call graph.openManagement(), updateIndex, and commit schema mutations.
A complete schema inventory. Export the currently registered labels and keys with mgmt.printSchema() and diff the output against what your pipeline actually writes. Every label and property key your data uses must be registered before you set schema.default to none.
A staging cluster that mirrors production topology. Never rehearse this change for the first time on a live keyspace.
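
The schema inventory diff from the prerequisites can be sketched in a few lines of Python. This is a minimal illustration, not part of JanusGraph: the function name schema_gap, the record shape (a "label" plus a "properties" dict, matching the gate in Step 2), and the sample values are all assumptions. It expects you to have copied the registered names out of the mgmt.printSchema() output yourself.

```python
def schema_gap(registered_labels, registered_keys, records):
    """Return labels and property keys that records use but the schema lacks."""
    used_labels = {r["label"] for r in records if "label" in r}
    used_keys = {k for r in records for k in r.get("properties", {})}
    return {
        "labels": sorted(used_labels - set(registered_labels)),
        "keys": sorted(used_keys - set(registered_keys)),
    }


# Names copied by hand from mgmt.printSchema() output (illustrative values).
registered_labels = {"user", "transaction"}
registered_keys = {"user_id", "status"}

# A sample of what the pipeline actually emits (illustrative values).
sample = [
    {"label": "user", "properties": {"user_id": "u-1", "status": "active"}},
    {"label": "device", "properties": {"user_id": "u-1", "firmware": "2.1"}},
]

gap = schema_gap(registered_labels, registered_keys, sample)
print(gap)
```

Anything that appears in the result must be registered in Step 1 before the strict switch; an empty result on a representative sample is the signal that the inventory is closed.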

Step 1 — Register missing schema, then harden the backend config

First close the inventory gap. For any label or key that appears in the write path but not in printSchema(), register it explicitly through the ManagementSystem so the strict switch does not immediately reject live traffic:

gremlin

mgmt = graph.openManagement()
if (mgmt.getPropertyKey("user_id") == null) {
    mgmt.makePropertyKey("user_id").dataType(String.class).cardinality(Cardinality.SINGLE).make()
}
if (mgmt.getVertexLabel("device") == null) {
    mgmt.makeVertexLabel("device").make()
}
mgmt.commit()

Then apply the hardened block to janusgraph.properties (or inject it via your orchestration layer — Kubernetes ConfigMap, Consul, etc.). Every non-default value here exists to make drift impossible or observable:

properties

# Reject any write that references an unregistered key or label.
schema.default=none

# Enforce vertex-label -> property and connection constraints.
schema.constraints=true

# Disable bulk-load optimizations so every mutation passes the validation gate.
storage.batch-loading=false

# Allow the pipeline to supply deterministic vertex IDs (idempotent retries).
graph.set-vertex-id=true

# Mixed-index backend (the elasticsearch key also serves OpenSearch).
index.search.backend=elasticsearch
index.search.elasticsearch.client-only=true
index.search.elasticsearch.bulk-refresh=wait_for
index.search.elasticsearch.create.ext.refresh_interval=1s

schema.default=none is the line that does the work: it converts silent schema auto-creation into an immediate, catchable SchemaViolationException. storage.batch-loading=false matters just as much — bulk-load mode bypasses the validation gate entirely, so leaving it on quietly defeats everything else. Align the consistency levels these writes run at with your Replication Strategies before bulk ingestion, or a validating writer can read stale schema during a node failure and wrongly believe a key is unregistered. Restart the JanusGraph cluster after applying the block.
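
Because an orchestration layer can silently override the file, it can help to lint the rendered janusgraph.properties before the restart. The sketch below is an assumption-laden illustration: the helper names are hypothetical, the parser handles only plain key=value lines (no line continuations or ":" separators), and it checks just the three enforcement keys from the block above.

```python
# Enforcement keys and the values this guide expects for them.
REQUIRED = {
    "schema.default": "none",
    "schema.constraints": "true",
    "storage.batch-loading": "false",
}


def parse_properties(text):
    """Parse simple key=value lines; in this sketch a later duplicate overwrites."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blanks and comments
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def hardening_violations(text):
    """List every required key that is missing or has the wrong value."""
    props = parse_properties(text)
    return [
        f"{key}: expected {want!r}, found {props.get(key)!r}"
        for key, want in REQUIRED.items()
        if props.get(key) != want
    ]


rendered = """
# hardened block
schema.default=none
schema.constraints=true
storage.batch-loading=true
"""
violations = hardening_violations(rendered)
print(violations)
```

Running a check like this in the deployment pipeline turns a misconfigured restart into a failed build rather than a window of unvalidated writes.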

Step 2 — Add a pre-commit validation gate in the pipeline

Backend enforcement alone catches a violation only after the payload is already inside a transaction, producing SchemaViolationException noise and rollback churn. A pre-commit layer in the Python ingestion pipeline rejects malformed records before they ever open a transaction. The gate below checks each record's label, property keys, and value types against a single source-of-truth SCHEMA dict, then commits inside an explicit transaction boundary:

python

from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.traversal import Cardinality

# Source of truth — must mirror the registered JanusGraph schema exactly.
SCHEMA = {
    "vertex_labels": {"user", "device", "transaction"},
    "edge_labels": {"owns", "initiated_by", "connected_to"},
    "properties": {
        "user_id": {"type": str,   "cardinality": Cardinality.single},
        "status":  {"type": str,   "cardinality": Cardinality.single},
        "weight":  {"type": float, "cardinality": Cardinality.single},
        "tags":    {"type": list,  "cardinality": Cardinality.set_},
    },
}


def validate_and_commit(conn, record):
    """Validate a payload against SCHEMA before executing any traversal."""
    label = record.get("label")
    props = record.get("properties", {})

    # 1. Label validation — reject before opening a transaction.
    if label not in SCHEMA["vertex_labels"] and label not in SCHEMA["edge_labels"]:
        raise ValueError(f"Invalid label: {label}")

    # 2. Property-key and type validation.
    for key, value in props.items():
        if key not in SCHEMA["properties"]:
            raise KeyError(f"Undefined property key: {key}")
        expected = SCHEMA["properties"][key]["type"]
        if not isinstance(value, expected):
            raise TypeError(
                f"Property '{key}' expects {expected.__name__}, got {type(value).__name__}"
            )

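    # 3. Commit inside an explicit transaction boundary.
    g = traversal().withRemote(conn)
    tx = g.tx()
    gtx = tx.begin()
    try:
        is_vertex = label in SCHEMA["vertex_labels"]
        if is_vertex:
            t = gtx.addV(label)
        else:
            # An edge needs both endpoints. "out_v" and "in_v" are assumed
            # record fields holding the source and target vertex IDs.
            t = gtx.V(record["out_v"]).as_("src").V(record["in_v"]).addE(label).from_("src")
        for key, value in props.items():
            cardinality = SCHEMA["properties"][key]["cardinality"]
            if is_vertex and cardinality != Cardinality.single:
                # Multi-valued vertex property: write one value per item.
                for item in value:
                    t = t.property(cardinality, key, item)
            else:
                t = t.property(key, value)
        t.iterate()
        tx.commit()
        return True
    except Exception as exc:
        tx.rollback()
        raise RuntimeError(f"Transaction failed: {exc}") from exc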

Size the driver connection pool to your ingestion concurrency; an undersized pool starves worker threads and surfaces as spurious timeouts rather than validation errors — see Connection Pooling for sizing rules.
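
The checks inside validate_and_commit can also be factored into a pure function, which makes the gate unit-testable without a Gremlin Server and lets a batch job divert bad records to a dead-letter list instead of failing on the first one. The sketch below is one possible shape, not the pipeline's actual code: the simplified SCHEMA, validation_errors, and partition are hypothetical names introduced for illustration.

```python
# Simplified stand-in for the SCHEMA dict: label names and key -> Python type.
SCHEMA = {
    "labels": {"user", "device", "owns"},
    "properties": {"user_id": str, "weight": float, "tags": list},
}


def validation_errors(record, schema=SCHEMA):
    """Return a list of problems; an empty list means the record is valid."""
    errors = []
    label = record.get("label")
    if label not in schema["labels"]:
        errors.append(f"invalid label: {label!r}")
    for key, value in record.get("properties", {}).items():
        expected = schema["properties"].get(key)
        if expected is None:
            errors.append(f"undefined property key: {key!r}")
        elif not isinstance(value, expected):
            errors.append(
                f"property {key!r} expects {expected.__name__}, got {type(value).__name__}"
            )
    return errors


def partition(records):
    """Split a batch into records safe to commit and (record, errors) rejects."""
    accepted, rejected = [], []
    for record in records:
        errors = validation_errors(record)
        if errors:
            rejected.append((record, errors))
        else:
            accepted.append(record)
    return accepted, rejected


batch = [
    {"label": "user", "properties": {"user_id": "u-1"}},
    {"label": "ghost", "properties": {}},
    {"label": "device", "properties": {"weight": "heavy"}},
]
accepted, rejected = partition(batch)
print(len(accepted), len(rejected))
```

Collecting every error per record, rather than raising on the first, gives the dead-letter consumer enough context to fix a payload in one pass.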

Step 3 — Reconcile the mixed index

Strict validation keeps the graph correct, but mixed indexes replicate to the search backend asynchronously, so an out-of-sync index still causes stale reads and query timeouts. This is the async seam where drift originates; the model behind it is covered under Eventual vs Strong Consistency, and the dispatch tuning under Mixed-Index Routing. If the index is not ENABLED, reindex it to bring it back to a serving state. REINDEX does not apply to an index that is still INSTALLED; register that index with SchemaAction.REGISTER_INDEX first:

gremlin

mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("searchIndex"), SchemaAction.REINDEX).get()
mgmt.commit()
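
A reindex can take a while, so automation usually needs to wait for the index to report ENABLED before moving on. JanusGraph offers ManagementSystem.awaitGraphIndexStatus for this on the server side; the Python sketch below shows the same idea from the pipeline's side under stated assumptions. wait_for_status is a hypothetical helper, and fetch_status stands in for whatever call you use to read the status (for example, submitting the status script from the Verification section through your driver).

```python
import time


def wait_for_status(fetch_status, target="ENABLED", timeout_s=300.0,
                    interval_s=5.0, sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it returns target or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        if fetch_status() == target:
            return True
        if clock() >= deadline:
            return False  # give up; the caller decides how to escalate
        sleep(interval_s)


# Simulated status sequence standing in for real polling.
statuses = iter(["REGISTERED", "REGISTERED", "ENABLED"])
ok = wait_for_status(lambda: next(statuses), interval_s=0.0)
print(ok)
```

Injecting sleep and clock keeps the helper testable; production callers can leave both at their defaults.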

Verification

Confirm each step landed before you route production traffic.

Schema mode — verify the strict switch took effect and was not overridden by a mounted file or env var:

gremlin

mgmt = graph.openManagement()
print(mgmt.get("schema.default"))
// Expected output: none
mgmt.rollback()

Validation gate — assert the gate rejects a bad payload without touching the graph:

python

try:
    validate_and_commit(conn, {"label": "ghost", "properties": {}})
    assert False, "gate failed to reject an invalid label"
except ValueError:
    pass  # expected

Index status: confirm the index serves queries. INSTALLED, REGISTERED, or DISABLED all mean it does not:

gremlin

mgmt = graph.openManagement()
index = mgmt.getGraphIndex("searchIndex")
print(index.getIndexStatus(mgmt.getPropertyKey("user_id")))
// Expected output: ENABLED
mgmt.rollback()

Replication lag — compare the search backend document count against a storage-backed traversal. For Elasticsearch, run GET /_cat/indices?v and diff against g.V().hasLabel('user').count(). A delta above 5% indicates index lag, not schema failure.
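
The 5% comparison can be written down as a small function so the check is repeatable. This is a sketch under assumptions: index_lag is a hypothetical name, the delta is taken relative to the storage-backed count, and both counts are assumed to cover the same set of elements (an index that stores only some labels or keys will not match a whole-graph count).

```python
def index_lag(storage_count, index_count, threshold=0.05):
    """Return (relative_delta, lagging) for a storage count and an index doc count."""
    if storage_count == 0:
        # Nothing in storage: any indexed document is a mismatch.
        return (0.0, False) if index_count == 0 else (1.0, True)
    delta = abs(storage_count - index_count) / storage_count
    return delta, delta > threshold


delta, lagging = index_lag(storage_count=10_000, index_count=9_400)
print(f"{delta:.1%} lagging={lagging}")
```

In this example the index is 600 documents short of 10,000, a 6% delta, so it is flagged as lagging.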

Fallback procedures

Each step has a defined recovery path. Do not improvise around strict mode by disabling it — that reopens the corruption window you just closed.

Schema mode still reports default. The properties file is being overridden. Run grep -r "schema.default" /etc/janusgraph/ /opt/conf/ to locate the precedence conflict, fix the source of truth, and restart. Do not toggle none on a live cluster that still has unregistered elements in its write path — register them first (Step 1).
tx.commit() throws SchemaViolationException on a payload that passed the Python gate. The SCHEMA dict is stale relative to the backend, or a concurrent schema update landed. Halt the ingestion worker, open a management transaction with graph.openManagement(), pull the live schema with getVertexLabels(), getRelationTypes(EdgeLabel.class), and getRelationTypes(PropertyKey.class), update the SCHEMA constant, and resume from the last acknowledged offset.
REINDEX hangs or throws a backend exception (for example, TemporaryBackendException or PermanentBackendException). The search cluster is probably unreachable, so check network policies and TLS certificates. As an immediate operational fallback, route reads to storage-backed traversals that avoid mixed-index predicates until the index reaches ENABLED. Never disable schema.default=none to work around an index failure; that corrupts the write path while leaving the read path broken.
A pipeline committed partial data before failing. Run a compensating traversal inside a single transaction to remove orphaned elements, substituting your pipeline’s staging label:
gremlin
```
g.V().hasLabel("temp_ingest").drop().iterate()
g.tx().commit()
```
Rollback of the whole change. If validation must be reverted, set schema.default=default and storage.batch-loading back to your prior value, restart, and treat the incident as a schema-registration gap to close before re-attempting — the switch itself is safe once the inventory is complete.

Operational guardrails. Rehearse this procedure on staging with synthetic payloads and confirm zero SchemaViolationException events before promoting. Monitor the transaction-log rollback rate; a sustained rate above 2% signals pipeline drift that needs schema reconciliation, and routing that signal is the job of Alert Routing for Violations.
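
The 2% rollback guardrail needs a definition of "sustained" before it can drive an alert. One possible reading, sketched below, is a sliding window over the most recent transaction outcomes. RollbackRateMonitor, its window size, and the rule of judging only full windows are assumptions for illustration, not a feature of JanusGraph or any monitoring product.

```python
from collections import deque


class RollbackRateMonitor:
    """Track the rollback rate over the last `window` transactions."""

    def __init__(self, window=1000, threshold=0.02):
        self.outcomes = deque(maxlen=window)  # True means rolled back
        self.threshold = threshold

    def record(self, rolled_back):
        self.outcomes.append(bool(rolled_back))

    @property
    def rate(self):
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 0.0

    def breached(self):
        # Judge only full windows so one early failure does not page anyone.
        full = len(self.outcomes) == self.outcomes.maxlen
        return full and self.rate > self.threshold


monitor = RollbackRateMonitor(window=100)
for i in range(100):
    monitor.record(rolled_back=(i < 3))  # 3 rollbacks in 100 transactions
print(monitor.rate, monitor.breached())
```

A worker would call record() after every commit or rollback and forward breached() to the alerting path.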

FAQ

Does schema.default=none block reads as well as writes? No. It only rejects writes that reference unregistered labels or keys. Reads against existing data are unaffected, which is why the safe rollout is register-first, then switch.

Why keep storage.batch-loading=false if it slows ingestion? Bulk-load mode bypasses the validation gate to gain throughput. With it on, malformed payloads skip enforcement entirely — the exact drift you are trying to stop. Re-enable it only for a trusted, pre-validated backfill, then turn it off.

The Python gate already validates types — why also set schema.constraints=true? The gate is advisory and lives outside the database; a second pipeline, a Gremlin console session, or a bug can write around it. schema.constraints=true enforces label-to-property and connection constraints at the storage boundary, so enforcement holds regardless of who is writing.

Related guides

Vertex and Edge Validation — parent guide: partition-aware IDs, supernode limits, and edge-signature design
Automating Property-Index Collision Resolution — sibling procedure for reconciling conflicting index mappings
Schema Evolution and CI Gating — block unregistered keys and type mismatches before they merge
Alert Routing for Violations — classify and escalate SchemaViolationException and index-lag events
Connection Pooling — size the driver pool so the validation gate does not starve under load