When GraphRAG is worth it

RAG

GraphRAG

Only move to GraphRAG when the answer depends on relationships that plain RAG retrieval alone cannot preserve.

Author

Ethan Tenison

Published

April 14, 2026

People often reach for GraphRAG too early, but a graph only adds value when the answer actually depends on relationships that plain retrieval cannot preserve.

The practical question is not whether a graph is elegant, but whether the problem is relational enough to justify the cost. If the task is simple lookup, summarization, or FAQ-style retrieval, a well-built RAG system is usually enough. If the answer depends on entities, links, dependencies, or multi-hop reasoning, GraphRAG can add real value.

In other words, a graph does not make a system smarter by itself. It helps only when the structure of the data matters to the answer. The decision flow in Figure 1 is the simplest version of the rule I use in practice. It also sets up the legal example that follows.

```mermaid
flowchart TD
 A["Answer in a few chunks?"]
 C["Missing text, not structure?"]
 B["Do relationships matter?"]

 R["Use RAG"]
 G["Use GraphRAG"]

 A -->|Yes| R
 A -->|No| C

 C -->|Yes| R
 C -->|No| B

 B -->|No| R
 B -->|Yes| G

 style A fill:#FAF7F2,color:#2A1F1A,stroke:#2A1F1A,stroke-width:2.25px
 style B fill:#FAF7F2,color:#2A1F1A,stroke:#2A1F1A,stroke-width:2.25px
 style C fill:#FAF7F2,color:#2A1F1A,stroke:#2A1F1A,stroke-width:2.25px

 style R fill:#EDE0D4,color:#2A1F1A,stroke:#2A1F1A,stroke-width:1.75px
 style G fill:#6B7A3A,color:#FAF7F2,stroke:#2A1F1A,stroke-width:2.5px

 linkStyle 0 stroke:#3D3530,stroke-width:2.25px,color:#2A1F1A
 linkStyle 1 stroke:#3D3530,stroke-width:2.25px,color:#2A1F1A
 linkStyle 2 stroke:#3D3530,stroke-width:2.25px,color:#2A1F1A
 linkStyle 3 stroke:#3D3530,stroke-width:2.25px,color:#2A1F1A
 linkStyle 4 stroke:#3D3530,stroke-width:2.25px,color:#2A1F1A
 linkStyle 5 stroke:#3D3530,stroke-width:2.25px,color:#2A1F1A
```

Figure 1: Use RAG when the answer is local. Use GraphRAG when correctness depends on relationships across entities or documents.
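The same flow can be written as a small function. This is only a sketch of Figure 1: the function name and its three boolean parameters are hypothetical, and in a real project each answer would come from reviewing benchmark failures, not from a flag.

```python
def choose_architecture(
    answer_in_few_chunks: bool,
    missing_text_not_structure: bool,
    relationships_matter: bool,
) -> str:
    """Walk the three questions from Figure 1 in order (hypothetical helper)."""
    # Question A: if the answer lives in a few chunks, plain RAG is enough.
    if answer_in_few_chunks:
        return "RAG"
    # Question C: if the failure is missing text, fix retrieval, not structure.
    if missing_text_not_structure:
        return "RAG"
    # Question B: only relationship-dependent answers justify a graph.
    return "GraphRAG" if relationships_matter else "RAG"


# A multi-document question whose answer depends on links between entities.
print(choose_architecture(False, False, True))
```

Note that two of the three questions exit to plain RAG before relationships are even considered, which mirrors the "start simple" default.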

Case study: legal research where relationships determine the answer

To make that concrete, consider legal research. A legal research question often looks simple until you try to answer it with retrieval alone.

How does the definition of “personal data” under the General Data Protection Regulation compare to “personal information” under the California Consumer Privacy Act, and in what scenarios would a U.S.-based company need to comply with both?

A question like this does not live in one passage. It spans multiple legal regimes, overlapping definitions, territorial scope, and conditional obligations. To answer it well, the system has to connect terms, jurisdictions, exceptions, and compliance triggers across documents. The issue is not that the evidence is far apart. The issue is that the answer depends on how those pieces relate.

In legal data, those relationships are the substance of the answer:

- Statutes define terms and establish obligations
- Cases interpret, narrow, or extend those terms
- Regulations refine scope and implementation
- Clauses apply only under specific conditions and exceptions

A plain RAG system can retrieve relevant passages, but it can still miss how those passages fit together. That is the failure mode. The problem is not missing text. It is missing structure.

This is where GraphRAG can earn its cost. A graph gives the system a way to preserve entities, citations, dependencies, and exceptions across documents. When correctness depends on those links, a graph is not extra architecture. It is part of how retrieval stays grounded.

Figure 2: A relationship-heavy legal query often depends on traversing links between definitions, obligations, clauses, and parties, not just retrieving nearby text.

The point of this example is not that legal data can be drawn as a network. The point is that the answer depends on paths through that network. To answer whether a company must comply with both regimes, the system has to connect definitions, scope, obligations, exceptions, and the entities those obligations apply to. That is what justifies the graph. It is a relationship problem, not just a retrieval problem.
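To illustrate what "paths through that network" means, here is a minimal sketch using only the standard library. The node names and typed edges are placeholders chosen for illustration, not real legal data, and the breadth-first search stands in for whatever traversal a production graph store would provide.

```python
from collections import deque

# Toy typed edges with placeholder names (hypothetical, not real legal data).
EDGES = [
    ("statute", "defines", "concept"),
    ("regulation", "refines", "concept"),
    ("contract", "contains", "clause"),
    ("contract", "governed_by", "statute"),
    ("clause", "implements", "concept"),
    ("clause", "creates", "obligation"),
    ("statute", "creates", "obligation"),
    ("obligation", "applies_to", "party"),
]


def build_adjacency(edges):
    """Group outgoing (relation, target) pairs by source node."""
    adjacency = {}
    for source, relation, target in edges:
        adjacency.setdefault(source, []).append((relation, target))
    return adjacency


def find_path(adjacency, start, goal):
    """Breadth-first search; returns the shortest list of typed hops, or None."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, target in adjacency.get(node, []):
            if target not in visited:
                visited.add(target)
                queue.append((target, path + [(node, relation, target)]))
    return None


adjacency = build_adjacency(EDGES)
path = find_path(adjacency, "contract", "party")
for source, relation, target in path:
    print(f"{source} -[{relation}]-> {target}")
```

The useful output is the chain of typed hops, not the endpoint. A retriever that returns only nearby text can surface the contract and the party, but not the clause and obligation that connect them.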

Once you see that distinction, the decision becomes simpler.

How should you decide?

Start with plain RAG. It is cheaper, faster, and easier to debug. Move to GraphRAG only when answer quality breaks because the task depends on relationships across entities, documents, or events. If you cannot name that failure mode, you probably do not need GraphRAG yet. The rubric in Table 1 is the decision rule I would use.

| Question | Signals Plain RAG | Signals GraphRAG |
|---|---|---|
| Where does the answer live? | One document or a few chunks | Across linked entities or documents |
| What is failing? | Retrieval quality | Relationship structure |
| What kinds of questions break? | Lookup, summarization, FAQ | Citations, dependencies, exceptions, multi-hop reasoning |
| What must the system preserve? | Relevant text | Relationships and structure |
| Is added complexity justified? | Probably not | Possibly yes |

Table 1: Decision rubric for when plain RAG is enough and when GraphRAG earns its cost.

What I’d do in practice

In practice, I would start with plain RAG and define a small benchmark around the real failure mode. Then I would measure where it breaks.

If the failures are mostly retrieval quality, I would improve chunking, embeddings, ranking, and citations before adding graph complexity. If the failures are relational, I would add GraphRAG narrowly, around the specific query classes that need it.
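One way to make that triage concrete is to label each failed benchmark question by cause and look at the split. The sketch below assumes hand-assigned labels and a 50% threshold. The sample records, the `triage` function, and the threshold are all illustrative assumptions, not measured results or a recommended cutoff.

```python
from collections import Counter

# Hypothetical review labels for failed benchmark questions.
failures = [
    {"question_id": "q1", "cause": "retrieval"},
    {"question_id": "q2", "cause": "relational"},
    {"question_id": "q3", "cause": "relational"},
    {"question_id": "q4", "cause": "retrieval"},
    {"question_id": "q5", "cause": "relational"},
]


def triage(failures, relational_threshold=0.5):
    """Suggest a next step from the share of relational failures (assumed rule)."""
    counts = Counter(item["cause"] for item in failures)
    total = sum(counts.values())
    if total == 0:
        return "keep plain RAG"
    relational_share = counts["relational"] / total
    if relational_share >= relational_threshold:
        return "add GraphRAG for the relational query classes"
    return "improve chunking, embeddings, ranking, and citations"


print(triage(failures))
```

The threshold matters less than the habit: if you cannot fill in the `cause` column, you have not yet named the failure mode.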

That approach keeps the system honest. A graph should earn its place.

You do not have to choose one forever

That is also why this is not always a clean either-or decision. For example, while building my own GraphRAG agent, I realized it did not make sense to treat the graph as a standalone answer engine, because evidence could be relevant to the questions I cared about in many different ways. Instead, I used a hybrid retrieval design with dense, sparse, graph, and reranked modes, plus a plan-retrieve-reason-verify workflow. In production, I imagine this is a more realistic path for building systems that need to answer complex questions from diverse evidence sources.

Use graph structure where relationships matter, then use ranking or reranking to decide which evidence is most relevant to the final answer.
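As an illustration of combining modes, the sketch below merges ranked lists from dense, sparse, and graph retrieval with reciprocal rank fusion (RRF). RRF is a stand-in chosen for this example, not necessarily the fusion or reranking method used in my project, and the chunk IDs, the mode names in the sample input, and the constant `k = 60` are assumptions.

```python
def reciprocal_rank_fusion(
    rankings: dict[str, list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse ranked ID lists: each list contributes 1 / (k + rank) per item."""
    scores: dict[str, float] = {}
    for ranked_ids in rankings.values():
        for rank, chunk_id in enumerate(ranked_ids, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results from three retrieval modes for one query.
rankings = {
    "dense": ["c1", "c2", "c3"],
    "sparse": ["c2", "c4", "c1"],
    "graph": ["c5", "c2"],
}

fused = reciprocal_rank_fusion(rankings)
for chunk_id, score in fused:
    print(chunk_id, round(score, 4))
```

Here the chunk that appears in all three lists rises to the top even though only one mode ranked it first. In this sketch the graph acts as one source of candidates among several, not as the final judge of relevance.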

The benchmark results for my hybrid GraphRAG project supported this choice in that context. A later hybrid configuration improved context recall from 0.5275 to 0.6525, context precision from 0.1880 to 0.2013, and answer faithfulness from 0.7467 to 0.8267. Pure graph mode, by contrast, still lagged badly on retrieval quality, with context recall at 0.2473 and context precision at 0.0675. That is the real lesson: graph structure helped most when it was combined with stronger retrieval and reranking, not when it replaced them.
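To put those numbers on a common scale, the snippet below computes the relative change for each metric from the before and after values quoted above. The dictionary names and the `relative_gain` helper are illustrative. Only the six input numbers come from the benchmark.

```python
# Before and after values as quoted in the text.
BEFORE = {"context_recall": 0.5275, "context_precision": 0.1880, "faithfulness": 0.7467}
AFTER = {"context_recall": 0.6525, "context_precision": 0.2013, "faithfulness": 0.8267}


def relative_gain(before: float, after: float) -> float:
    """Relative improvement as a fraction of the starting value."""
    return (after - before) / before


for metric in BEFORE:
    gain_pct = relative_gain(BEFORE[metric], AFTER[metric]) * 100
    print(f"{metric}: {gain_pct:.1f}% relative improvement")
```

Under this arithmetic, recall improves by roughly 24%, faithfulness by roughly 11%, and precision by roughly 7%, so most of the gain came from finding more of the relevant context.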

If you want to see how I approached this trade-off in practice, see my RiskFolio GraphRAG agent experiment on GitHub: https://github.com/ethantenison/riskfolio-graphrag-agent

The important decision is not “graph or no graph” in the abstract. It is where structure improves retrieval enough to justify the added complexity.

The practical takeaway

GraphRAG is not the default answer. It is the right answer only when the problem depends on structure that plain retrieval cannot reliably preserve.

Start with the simplest system that can succeed. If plain RAG breaks because the answer depends on relationships, add graph structure where it actually helps. And if the task benefits from both, you do not have to choose a pure architecture. In practice, hybrid systems are often the honest solution.

Discussion question

Question: What failure mode are you actually trying to solve: missing context or missing relationships?
