
Capability: Knowledge Graph + Retrieval

One retrieval brain. Many surfaces. No model in the query path.

Concord by IaxaI indexes every entity, event, and relationship the engine produces into a dense-sparse knowledge graph. Analysts query it directly. Semantic Alert Dedup correlates through it. The Detection Portability Layer retrieves analogous detections from it. Compliance Auto-Packets assembles evidence out of it. Three latency tiers, all deterministic, all CPU-cheap.

Want to see retrieval on your stack?

The Problem

Every surface that needs to ask "what else relates to this?" ends up rebuilding retrieval from scratch.

Dedup needs a correlation engine. Detection portability needs an analog finder. Auto-Packets needs an evidence assembler. The analyst chat needs all three. Most platforms ship four disconnected indexes, each with its own freshness lag, its own relevance bugs, and its own latency budget. Then someone wires a language model in front of the query path because the indexes disagree, and now every analyst question hits a black-box model that nobody can replay.

The Impact

Slow surfaces, non-deterministic answers, and a query path no examiner can trust.

A correlation lookup takes seconds because it traverses three stores. The same question asked twice returns different answers because the model in front is sampling. An auditor asks how an evidence packet was assembled and the honest answer is "the model decided." In a regulated SOC that's the wrong answer.

What Concord Does Differently

One graph. One retrieval engine. Three surfaces query it through the same API. The query path is embedding search plus Personalized PageRank traversal. No language model, fully deterministic, CPU-cheap. Language models run only in nightly batch enrichment to surface relationships rule-based extraction missed, never on the live path.

The Outcome

Same answer twice. Sub-second on the common case. Replayable end to end.

Dedup decisions, detection translations, and audit packets all rest on the same retrieval substrate. Every query writes a ledger entry with the seed set, the result set, the tier, and the latency. Regulated end-clients get an answer they can verify after the fact. Operators get a query path they can actually load-test.
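The per-query ledger entry described above might be shaped like the following sketch. The field names and JSON serialization are illustrative assumptions, not Concord's actual ledger schema.

```python
import json
import time
from dataclasses import dataclass, field, asdict

@dataclass
class RetrievalLedgerEntry:
    """One replayable record per retrieval: enough to re-run the query
    and verify the answer after the fact. Field names are hypothetical."""
    tier: int                # 1 = lookup, 2 = semantic, 3 = deep analysis
    seed_nodes: list         # graph nodes the traversal started from
    result_nodes: list       # ranked node IDs returned to the caller
    latency_ms: float
    ts: float = field(default_factory=time.time)

    def to_json(self) -> str:
        # Sorted keys so identical entries serialize identically.
        return json.dumps(asdict(self), sort_keys=True)

entry = RetrievalLedgerEntry(
    tier=2,
    seed_nodes=["phrase:ip:10.0.0.5"],
    result_nodes=["passage:alert:9917", "passage:event:8812"],
    latency_ms=212.4,
)
record = entry.to_json()  # append to the audit ledger
```

An examiner replaying the query needs only the seed set and the corpus snapshot; the tier and latency round out the load-test story.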

What It Is

A dense-sparse graph fed by everything the engine resolves.

Two node types: phrase nodes for entities, controls, regulations, techniques, and indicators; passage nodes for events, documents, alerts, and incidents. Three edge classes: relations between entities, synonym links by embedding similarity, and context links between entities and the passages they appear in. Translation writes the passage. Entity Resolution writes the phrase. The retrieval engine reads both.

Phrase nodes

Canonical entities: a user, a host, an IP, a MITRE technique, a control objective, an indicator of compromise. Each carries a 768-dimensional embedding so semantic neighbors are discoverable, even when the literal strings disagree.

Passage nodes

Events, alerts, incidents, policy chunks, evidence documents. Each links back to the phrase nodes it touches. The dense-sparse bridge: passage matches expand into their phrase neighbors, phrase matches surface the passages that mention them.
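The bridge expansion can be sketched with plain adjacency maps. The node IDs and in-memory structures here are illustrative, not Concord's real index layout.

```python
# Context links between passages and the phrase nodes they mention.
passage_to_phrases = {
    "passage:alert:41": {"phrase:user:jdoe", "phrase:technique:T1078"},
    "passage:event:77": {"phrase:user:jdoe", "phrase:ip:10.0.0.5"},
}

# Invert to get the phrase -> passage direction.
phrase_to_passages = {}
for passage, phrases in passage_to_phrases.items():
    for phrase in phrases:
        phrase_to_passages.setdefault(phrase, set()).add(passage)

def expand(hits: set) -> set:
    """Dense-sparse bridge: passage hits expand into their phrase
    neighbors; phrase hits surface the passages that mention them."""
    out = set(hits)
    for node in hits:
        if node.startswith("passage:"):
            out |= passage_to_phrases.get(node, set())
        else:
            out |= phrase_to_passages.get(node, set())
    return out

expanded = expand({"phrase:user:jdoe"})
# surfaces both passages that mention the user
```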

How Retrieval Works

Three latency tiers. The right cost for the right question.

A query classifier routes each request to the cheapest tier that can answer it honestly. The technical buyer cares about latency budgets. Here they are.

Tier 1: Lookup

<100ms p95

Direct ID or entity-value resolution. No embedding, no graph walk. Used by the analyst omnibox when an analyst pastes a hash, an IP, or a username and wants the canonical record back without a roundtrip.

Tier 2: Semantic

<500ms p95

The common case. Embed the query, run dual-path search across phrase and passage indexes with similarity thresholds, seed Personalized PageRank on the graph, blend the scores, return ranked results with reasoning paths. This is what Dedup, DPL, and most analyst questions hit.
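The score-blend step might look like this sketch. The alpha weight, the normalization assumption, and the example scores are illustrative, not Concord's tuned values.

```python
def blend(dense: dict, ppr: dict, alpha: float = 0.6) -> list:
    """Blend embedding similarity with Personalized PageRank mass.
    alpha weights the dense (embedding) side; both score maps are
    assumed pre-normalized to [0, 1]. Returns (node, score) ranked."""
    nodes = set(dense) | set(ppr)
    scored = {n: alpha * dense.get(n, 0.0) + (1 - alpha) * ppr.get(n, 0.0)
              for n in nodes}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

dense_hits = {"passage:alert:41": 0.91, "passage:event:77": 0.40}
ppr_mass = {"passage:event:77": 0.75, "passage:doc:12": 0.30}
ranked = blend(dense_hits, ppr_mass)
top = ranked[0][0]  # best blended result across both paths
```

A node that scores on only one path still ranks; a node that scores on both paths compounds, which is what makes the dual-path search more than two searches glued together.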

Tier 3: Deep Analysis

<5s p95

Extended graph traversal: wider hop budget, lower decay. Used by Compliance Auto-Packets when an audit window needs every event touching a control across months of telemetry. Optional language-model synthesis is opt-in only and ledgered when used.
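The routing decision above can be sketched as follows. The classification heuristics, the literal patterns, and the opt-in flag are assumptions for illustration; Concord's actual classifier is not described in this document.

```python
import re

# Illustrative entity-literal patterns: hash, IPv4, MITRE technique ID.
ENTITY_LITERAL = re.compile(
    r"^(?:[0-9a-f]{32,64}"              # MD5 / SHA-1 / SHA-256 hash
    r"|(?:\d{1,3}\.){3}\d{1,3}"         # IPv4 address
    r"|T\d{4}(?:\.\d{3})?)$",           # MITRE ATT&CK technique ID
    re.IGNORECASE)

def route(query: str, deep_analysis: bool = False) -> int:
    """Return the cheapest tier that can answer the query honestly."""
    if deep_analysis:
        return 3                        # Tier 3 is explicit opt-in only
    if ENTITY_LITERAL.match(query.strip()):
        return 1                        # direct lookup: no embedding, no walk
    return 2                            # semantic search, the common case

route("10.0.0.5")                           # pasted IP -> Tier 1
route("lateral movement from this host?")   # free text -> Tier 2
```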

Hot Path Discipline

No language model in the query path. On purpose.

Concord's retrieval path is embedding search plus graph traversal. Both are CPU-cheap and deterministic. Same input, same output, every time. The language model only runs in nightly batch enrichment, where it picks up passages with thin extracted triples and proposes new relationships for the next day's queries. Audit-grade systems can't have non-deterministic inference sitting between an analyst question and an answer that ends up in an evidence packet. Ours doesn't.

Query path

Sentence-transformer embedding plus Personalized PageRank on the in-memory graph. Around 80ms for the embedding, under 100ms for the traversal, deterministic on the same corpus.
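The determinism claim is easy to demonstrate with a pure power-iteration sketch of Personalized PageRank. The graph, seed set, and damping factor here are toy assumptions; the point is that the same corpus and the same seeds produce bit-identical scores on every run.

```python
def personalized_pagerank(edges, seeds, alpha=0.85, iters=50):
    """Power iteration for Personalized PageRank on an undirected graph.
    Pure Python: no sampling, no model, so every run is replayable."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    total = sum(seeds.values())
    restart = {n: seeds.get(n, 0.0) / total for n in adj}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: (1 - alpha) * restart[n] for n in adj}
        for n in adj:
            share = alpha * rank[n] / len(adj[n])
            for m in adj[n]:
                nxt[m] += share
        rank = nxt
    return rank

edges = [
    ("phrase:user:jdoe", "passage:alert:41"),
    ("phrase:user:jdoe", "passage:event:77"),
    ("phrase:ip:10.0.0.5", "passage:event:77"),
]
seeds = {"phrase:user:jdoe": 1.0}  # seed set from the embedding stage
run1 = personalized_pagerank(edges, seeds)
run2 = personalized_pagerank(edges, seeds)
assert run1 == run2  # deterministic: replayable from the ledgered seed set
```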

Indexing path

Each ingested event becomes a passage node with its entities linked as phrase nodes. About 30 regex extractors plus per-OCSF-category templates pull triples on the way in. Sub-50ms budget per event.
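One of those extractors might look like the sketch below. The log format, the pattern, and the relation names are assumptions for illustration, not Concord's real per-OCSF-category templates.

```python
import re

# Hypothetical extractor: pull (subject, relation, object) triples
# from a raw authentication-log line.
AUTH_LINE = re.compile(
    r"user=(?P<user>\S+)\s+"
    r"src=(?P<src>(?:\d{1,3}\.){3}\d{1,3})\s+"
    r"action=(?P<action>\w+)")

def extract_triples(line: str) -> list:
    """Return triples linking the event's phrase nodes, or [] on no match."""
    m = AUTH_LINE.search(line)
    if not m:
        return []
    user, src, action = m.group("user"), m.group("src"), m.group("action")
    return [
        (f"phrase:user:{user}", "authenticated_from", f"phrase:ip:{src}"),
        (f"phrase:user:{user}", "performed", f"event:{action}"),
    ]

triples = extract_triples("user=jdoe src=10.0.0.5 action=login")
```

Regex extraction is what keeps the indexing path inside a sub-50ms budget: no model call stands between an event arriving and its passage node being queryable.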

Enrichment path

Nightly batch only. A small open-weights model runs over under-extracted passages from the previous 24 hours and proposes new triples and synonym edges. Rate-limited by default, throttleable by operators, never on live traffic.

One Brain, Three Surfaces

Why this is core engine, not a surface.

Without this layer, every surface would re-implement retrieval and the answers would diverge. With it, all three surfaces query the same graph through the same public API.

Semantic Alert Dedup

On every incoming event, Dedup asks the graph "anything in the last four hours involving this identity or this source IP that semantically matches this alert?" Tier 2 query, blended score, reasoning path. If the top hit is similar enough and close enough in time, the new event collapses into the existing Security Narrative card instead of producing a duplicate alert.
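The collapse decision reduces to two thresholds. The similarity cutoff below is an illustrative assumption, not Concord's tuned value; the four-hour window comes from the query described above.

```python
from datetime import datetime, timedelta

SIM_THRESHOLD = 0.85          # illustrative; not Concord's tuned value
WINDOW = timedelta(hours=4)   # the "last four hours" scope of the query

def should_collapse(top_hit_similarity: float,
                    top_hit_time: datetime,
                    new_event_time: datetime) -> bool:
    """Collapse the new event into the existing Security Narrative card
    only if the Tier 2 top hit is similar enough AND close enough in time."""
    close_enough = (new_event_time - top_hit_time) <= WINDOW
    return top_hit_similarity >= SIM_THRESHOLD and close_enough

now = datetime(2025, 6, 1, 12, 0)
should_collapse(0.93, now - timedelta(hours=1), now)   # collapse: same card
should_collapse(0.93, now - timedelta(hours=6), now)   # too old: new alert
```

Because both inputs come from a deterministic Tier 2 query, the same event stream always produces the same collapse decisions, which is what makes a dedup verdict defensible after the fact.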

Detection Portability Layer

When a customer asks for a CrowdStrike rule ported to SentinelOne, the graph gets queried for analogous detections from MITRE, Sigma, and the canonical detection corpus already indexed as phrase and passage nodes. The reasoning path becomes the "why this translation" justification the customer sees alongside the ported rule.

Compliance Evidence Auto-Packets

For each control in FFIEC, GLBA, PCI, and HIPAA, Auto-Packets queries the graph scoped to the audit window. Returns every passage touching that control phrase node, every triple where the subject or object is a tenant asset under that control, and the audit-trail reasoning path the renderer turns into the exam packet.
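The audit-window scoping might reduce to a filter like this sketch. The in-memory passage records and control-node naming are illustrative assumptions; the real query runs against the indexed graph.

```python
from datetime import datetime

# Hypothetical passage records: each carries the phrase nodes it touches
# and an event timestamp.
passages = [
    {"id": "passage:event:1", "phrases": {"phrase:control:PCI-10.2"},
     "ts": datetime(2025, 3, 2)},
    {"id": "passage:event:2", "phrases": {"phrase:control:PCI-10.2"},
     "ts": datetime(2025, 8, 9)},
    {"id": "passage:event:3", "phrases": {"phrase:user:jdoe"},
     "ts": datetime(2025, 3, 5)},
]

def evidence_for(control: str, start: datetime, end: datetime) -> list:
    """Every passage touching the control's phrase node inside the window."""
    return [p["id"] for p in passages
            if control in p["phrases"] and start <= p["ts"] <= end]

window_hits = evidence_for("phrase:control:PCI-10.2",
                           datetime(2025, 1, 1), datetime(2025, 6, 30))
# only the in-window passage qualifies for the packet
```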

Analyst chat and omnibox

Tier 1 for ID lookups, Tier 2 for everything else, Tier 3 for explicit deep-analysis requests. The chat surface already renders the reasoning path so analysts can see which seed nodes the query started from, which passages it walked, and why the top result ranked where it did.

Air-gap deployable. Whole engine. Whole graph.

The embedding model and the optional batch enrichment model both run on CPU. No external API calls in the retrieval path. The full engine plus the retrieval layer can run on-prem at the Edge Gateway, which matters for regulated banks that can't let telemetry leave their network. The cloud SaaS deployment runs the same code with the same query semantics. Same answers, different tenancy.

Status

Shipped today versus on the V1 build list.

The retrieval shape is built and tested. The V1 work is rebuilding the index under the new dense-sparse schema and wiring it to live ingestion. Existing analyst surfaces (chat, omnibox, NLP search) keep their public APIs through the migration.

Shipped

  • Personalized PageRank traversal over an in-memory graph, plus dual-path embedding search across entity and passage indexes.
  • Reasoning-path assembly: seed nodes, top traversal nodes, and the relationships between them returned to every consumer.
  • Embedding cache and query-result cache for repeat workloads. Deterministic fallback embedding path when the model fails to load.
  • Stable retrieval API consumed today by the analyst chat, omnibox, and natural-language search surfaces.

V1 build list

  • New dense-sparse schema (phrase nodes, passage nodes, triples, and typed edges) replacing the single-node prototype index.
  • Live-ingestion indexing: every translated event lands a passage node with linked phrase nodes and at least one triple inside the 50ms budget.
  • Tier 1 / Tier 2 / Tier 3 routing with per-tier latency SLOs and a deep-analysis affordance for explicit Tier 3 opt-in.
  • Multi-tenant isolation, audit-ledger entries on every retrieval, and stable read APIs published for Dedup, DPL, and Auto-Packets to consume.

What the retrieval layer measures.

The latency targets below are the design budgets; the retrieval machinery behind them ships in the existing engine today.

<100ms

Tier 1 lookup latency, p95 target

<500ms

Tier 2 semantic latency, p95 target

<5s

Tier 3 deep-analysis latency, p95 target

Why It Matters

One retrieval brain is the difference between surfaces that agree and surfaces that drift apart.

Every credible SOC platform ships a search box. What separates Concord is what sits behind it. The same graph powers correlation on the dedup surface, analog discovery on the detection-portability surface, and evidence assembly on the auto-packet surface. So three surfaces never disagree about what the corpus says. The retrieval is deterministic, so the same question returns the same answer twice. The query path is CPU-only, so it runs in an air-gapped bank or a regulated MSSP's on-prem rack without any cloud dependency.

The design borrows from current academic work on hybrid embedding-and-graph retrieval, with adaptations made specifically for CPU-only deployment and the audit requirements of regulated buyers. It's engineering excellence, not an IP claim. The patent moat lives upstream in Translation and Entity Resolution. This layer is how those patented engines actually pay off in front of an analyst.

Stop reconciling. Start trusting one timeline.

30-minute walkthrough. Your tools. Your tenants. Your audit cycle. We will show you exactly where Concord earns its keep.