feat: Entity Identity Synthesis — first-class Entity nodes with semantic memory #124
jescalan wants to merge 6 commits into verygoodplugins:main from
Conversation
Note: Reviews paused — It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough

Adds entity deduplication and merge tooling, LLM-driven entity identity synthesis with scheduler/runtime wiring, a new Entity REST API and recall identity injection, a migration script to create Entity nodes from Memory tags, and extensive tests for these features.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Scheduler as Consolidation<br/>Scheduler
    participant Consolidator as MemoryConsolidator
    participant Graph as FalkorDB<br/>Graph
    participant LLM as OpenAI<br/>Client
    participant EntityNode as Entity<br/>Node
    Scheduler->>Consolidator: trigger identity task
    Consolidator->>Consolidator: _get_openai_client()
    Consolidator->>Graph: run_identity_consolidation()
    Graph->>Graph: find_merge_candidates()
    Graph->>Graph: merge_entities()
    Graph->>Graph: fetch entities needing synthesis
    Graph->>Graph: collect memories per entity
    loop per entity
        Graph->>LLM: synthesize_identity(memories)
        LLM-->>Graph: return identity string
        Graph->>EntityNode: update identity + version
    end
    Graph-->>Consolidator: return consolidation results
    Consolidator-->>Scheduler: record metrics
```
```mermaid
sequenceDiagram
    participant Client
    participant API as Entity API<br/>Endpoint
    participant Graph as FalkorDB<br/>Graph
    participant Dedup as entity_dedup<br/>module
    Client->>API: GET /entities/merge-candidates
    API->>Graph: get_memory_graph()
    API->>Dedup: find_merge_candidates()
    Dedup->>Graph: load entities & memories
    Dedup->>Dedup: compute slug similarity & overlap
    Dedup-->>API: return candidates
    API-->>Client: JSON response
    Client->>API: POST /entity/{slug}/merge
    API->>Graph: resolve entities
    API->>Dedup: merge_entities()
    Dedup->>Graph: re-link memories, update aliases, mark merged
    Dedup-->>API: return MergeResult
    API-->>Client: merge details
```
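The `compute slug similarity & overlap` step in the second diagram can be approximated like this; the actual scoring and thresholds in `automem/consolidation/entity_dedup.py` are not shown in this PR excerpt, so the function names and cutoffs here are illustrative:

```python
from difflib import SequenceMatcher


def slug_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized entity slugs."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def memory_overlap(mems_a: set[str], mems_b: set[str]) -> float:
    """Jaccard overlap of the memory IDs referencing each entity."""
    if not mems_a or not mems_b:
        return 0.0
    return len(mems_a & mems_b) / len(mems_a | mems_b)


def classify_candidate(
    slug_a: str, slug_b: str, mems_a: set[str], mems_b: set[str]
) -> str:
    """Bucket a pair as auto-merge, review, or distinct (thresholds illustrative)."""
    sim = slug_similarity(slug_a, slug_b)
    overlap = memory_overlap(mems_a, mems_b)
    if sim >= 0.9 and overlap >= 0.5:
        return "auto_merge"
    if sim >= 0.75:
        return "review"
    return "distinct"
```

High-confidence pairs go straight to `auto_merge`; near-misses land in a `review` bucket for the `/entities/merge-candidates` endpoint to surface.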
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
…tic memory

- Add Entity nodes to FalkorDB with identity, aliases, and REFERENCED_IN edges
- New identity consolidation task: dedup entities + synthesize 2-5 sentence identities via LLM
- Identity-aware recall: inject entity identities into recall responses
- Entity API: GET /entities, GET /entity/<slug>, GET /entities/merge-candidates, POST /entity/<slug>/merge
- Migration script: scripts/migrate_entity_nodes.py (idempotent, --dry-run support)
- Configurable: IDENTITY_SYNTHESIS_MODEL, CONSOLIDATION_IDENTITY_INTERVAL_SECONDS
- Batch queries throughout (UNWIND edges, single-query recall injection)
- Conditional synthesis: skip unchanged entities to save LLM calls
- 18 tests covering migration, dedup, merge, synthesis, and recall
Force-pushed 725bc01 to fcfb458
Actionable comments posted: 9
🧹 Nitpick comments (5)
scripts/migrate_entity_nodes.py (1)
50-52: Use `logging.exception` for import errors.

Per static analysis hint, `logging.exception` is preferred when handling exceptions to include the traceback, though here it's an ImportError so minimal traceback is expected. The current approach is acceptable but could be improved:

♻️ Optional improvement

```diff
 except ImportError:
-    logger.error("falkordb package not installed")
+    logger.exception("falkordb package not installed")
     sys.exit(1)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/migrate_entity_nodes.py` around lines 50 - 52, Replace the logger.error call inside the except ImportError block with logger.exception so the ImportError traceback is recorded; keep the existing message ("falkordb package not installed") and retain the sys.exit(1) behavior in the same except block to ensure the process still exits after logging the exception.

automem/api/recall.py (1)
1510-1561: Entity identity injection follows graceful degradation pattern.

The implementation correctly logs errors without blocking the main recall flow, which aligns with the coding guidelines for vector store errors. The bare `except Exception` catches (Lines 1547, 1558, 1560) are appropriate here since entity identity is supplementary data.

One minor robustness consideration: Line 1555 uses `int(row[6] or 0)`, which could raise `ValueError` if `row[6]` is a non-numeric string. Consider wrapping in a try/except or using a safer conversion:

🛡️ Safer int conversion

```diff
-    "identity_source_count": int(row[6] or 0),
+    "identity_source_count": int(row[6]) if isinstance(row[6], (int, float)) else 0,
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@automem/api/recall.py` around lines 1510 - 1561, The loop building entity_identities uses int(row[6] or 0) which can raise ValueError if row[6] is a non-numeric string; update the block that iterates over ent_result (the for row in getattr(ent_result, "result_set", []) loop) to convert identity_source_count safely—e.g. attempt int(row[6]) inside a small try/except (catch ValueError/TypeError) and fall back to 0 (optionally logger.debug the bad value) before appending the dict to entity_identities; this keeps the graceful-degradation behavior while preventing a conversion exception from bubbling up.

consolidation.py (1)
152-165: Consider reusing the existing OpenAI client pattern.

This method creates a new OpenAI client instance on each call. The main app already has `init_openai` and `get_openai_client` patterns (via `_service_runtime`). Consider injecting a client getter into `MemoryConsolidator` instead of reading environment variables directly within the class.

This would:
- Avoid duplicate client creation logic
- Make the class more testable (easier to mock)
- Respect any base URL or other configuration already set up
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@consolidation.py` around lines 152 - 165, Replace the inline client-creation in MemoryConsolidator._get_openai_client with a dependency-injected getter: accept or set a callable/property that returns the app-wide OpenAI client (the same one produced by init_openai/get_openai_client on _service_runtime) and use that instead of reading OPENAI_API_KEY/OPENAI_BASE_URL and instantiating openai.OpenAI directly; update MemoryConsolidator to accept this client getter (or to fetch it from _service_runtime) so tests can inject mocks and the class reuses existing configuration.

automem/consolidation/entity_dedup.py (2)
236-237: Use unpacking for cleaner list construction.

Per static analysis suggestion, use list unpacking instead of concatenation.
♻️ Proposed fix
```diff
-    incoming_aliases = [alias_slug] + alias_aliases if alias_slug else alias_aliases
+    incoming_aliases = [alias_slug, *alias_aliases] if alias_slug else alias_aliases
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@automem/consolidation/entity_dedup.py` around lines 236 - 237, Replace the manual conditional and concatenation with list unpacking to build incoming_aliases and merged_aliases more cleanly: where the code currently constructs incoming_aliases = [alias_slug] + alias_aliases if alias_slug else alias_aliases and merged_aliases = list(dict.fromkeys(current_aliases + incoming_aliases)), change to use unpacking (e.g., incoming_aliases = [*([alias_slug] if alias_slug else []), *alias_aliases]) and merged_aliases = list(dict.fromkeys([*current_aliases, *incoming_aliases])) so order is preserved and duplicates removed as before.
198-218: Old REFERENCED_IN edges on alias entity are not deleted.

After merging edges from the alias to the canonical entity, the original edges on the alias entity remain. While the alias is marked with `merged_into` and filtered from queries, this leaves orphaned edges in the graph.

Consider deleting the old edges after the merge to maintain graph hygiene:
🧹 Proposed fix — delete old edges
```diff
 if mem_ids:
     graph.query(
         """
         MATCH (e:Entity {id: $canonical_id})
         UNWIND $mem_ids AS mid
         MATCH (m:Memory {id: mid})
         MERGE (e)-[:REFERENCED_IN]->(m)
         """,
         {"canonical_id": canonical_id, "mem_ids": mem_ids},
     )
+    # Delete old edges from alias entity
+    graph.query(
+        "MATCH (e:Entity {id: $id})-[r:REFERENCED_IN]->() DELETE r",
+        {"id": alias_id},
+    )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@automem/consolidation/entity_dedup.py` around lines 198 - 218, After batch-creating REFERENCED_IN edges on the canonical entity, remove the old relationships from the alias to avoid orphaned edges: after the existing graph.query that MERGE(s) edges for {"canonical_id": canonical_id, "mem_ids": mem_ids}, run a follow-up graph.query that MATCHES (e:Entity {id: $id})-[r:REFERENCED_IN]->(m:Memory) WHERE m.id IN $mem_ids DELETE r (use {"id": alias_id, "mem_ids": mem_ids}); update edges_moved if you want it to reflect deletions and ensure the code only runs the DELETE when mem_ids is non-empty.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app.py`:
- Line 362: Run the code formatter and import sorter to resolve CI formatting
errors: execute `black .` and `isort .` (or your project's configured pre-commit
hooks) in the repo root, then re-run the tests; this will fix the formatting
issue flagged in app.py around the line containing
identity_interval_seconds=CONSOLIDATION_IDENTITY_INTERVAL_SECONDS and ensure
consistent style across the codebase.
In `@automem/api/entity.py`:
- Line 72: The current conversion of request.args.get("limit", 100) to int can
raise ValueError for invalid input; update the handler that assigns limit to
validate the query value before casting (check that request.args.get("limit") is
None or matches an integer pattern or use a safe try/except around int()) and on
invalid input return a 400 response with a clear error message (e.g., "invalid
limit parameter") instead of allowing a 500; keep the existing cap of min(...,
500) after successful validation and reference the same variable name limit and
the call request.args.get("limit", 100) in your change.
- Around line 57-63: The entity blueprint created by create_entity_blueprint
lacks authentication; update its signature to accept require_api_token_fn and
require_admin_token_fn (or at least require_api_token_fn) and apply
require_api_token_fn as a decorator/validator on all read routes and
require_admin_token_fn on the merge/modify route handlers; then update the
blueprint registration call in runtime_bootstrap to pass the appropriate auth
functions so endpoints validate tokens before returning or mutating entity data.
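The safe `limit` parsing requested in the `@automem/api/entity.py` comment above can be sketched framework-free; `parse_limit` is a hypothetical helper, and the real handler would pass it `request.args.get("limit")`:

```python
from typing import Optional, Tuple


def parse_limit(
    raw: Optional[str], default: int = 100, cap: int = 500
) -> Tuple[Optional[int], Optional[str]]:
    """Validate a ?limit= query value.

    Returns (limit, None) on success, or (None, message) when the caller
    should respond 400 instead of letting int() raise and surface a 500.
    """
    if raw is None:
        return default, None
    try:
        limit = int(raw)
    except ValueError:
        return None, "invalid limit parameter"
    if limit < 0:
        return None, "invalid limit parameter"
    # Keep the existing cap after successful validation
    return min(limit, cap), None
```

The handler keeps the existing `min(..., 500)` cap via `cap` and turns a bad value into a 400 response body rather than an unhandled exception.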
In `@automem/api/runtime_bootstrap.py`:
- Around line 176-179: The entity blueprint is created without authentication,
so the POST /entity/<slug>/merge write endpoint can be called unauthenticated;
update the create_entity_blueprint call to pass the API auth function (e.g.,
require_api_token_fn) alongside get_memory_graph_fn and logger, and then ensure
the merge handler inside the blueprint checks/uses that auth function to
validate requests before performing the merge operation (refer to
create_entity_blueprint, get_memory_graph_fn and the merge endpoint handler for
where to add the parameter and the authentication check).
In `@automem/consolidation/identity_synthesis.py`:
- Around line 170-173: The code assumes response.choices[0].message.content is
always a string when assigning identity_text, which can be None and cause
AttributeError on .strip(); update the try block around identity_text assignment
(look for identity_text, response, and logger.exception usage) to check for None
before stripping—e.g., retrieve content = response.choices[0].message.content,
if content is None log a warning with entity_id and return None (or set
identity_text to an empty string if appropriate), otherwise call content.strip()
and proceed; keep the existing exception handling via logger.exception for other
errors.
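The None-content guard described above, as a standalone sketch (the helper name `extract_identity_text` is hypothetical; the attribute path `response.choices[0].message.content` matches the review):

```python
from typing import Any, Optional


def extract_identity_text(response: Any) -> Optional[str]:
    """Return the stripped completion text, or None when the model
    returned no content, instead of raising AttributeError on None.strip()."""
    content = response.choices[0].message.content
    if content is None:
        return None
    text = content.strip()
    # Treat a whitespace-only completion the same as no content
    return text or None
```

The caller can then log a warning with the entity ID and skip the identity update when `None` comes back, keeping the existing `logger.exception` path for other errors.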
In `@consolidation.py`:
- Around line 858-882: The logger.exception call in the identity consolidation
block is passing the caught exception `exc` as a formatting parameter, which is
redundant because logger.exception already records exception info; update the
call in consolidation.py (inside the try/except around
run_identity_consolidation and _get_openai_client) to call logger.exception with
a simple message (e.g., "Identity consolidation failed") and keep the existing
results["steps"]["identity"] = {"error": str(exc)} behavior unchanged.
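To illustrate the point above: `logger.exception` captures the active traceback by itself, so the message needs no `exc` argument, while the step result still records `str(exc)`. A minimal, self-contained sketch:

```python
import logging

logging.basicConfig(level=logging.ERROR)
logger = logging.getLogger("consolidation")

results: dict = {"steps": {}}
try:
    raise RuntimeError("boom")
except Exception as exc:
    # logger.exception appends the traceback automatically;
    # interpolating exc into the message would duplicate it.
    logger.exception("Identity consolidation failed")
    results["steps"]["identity"] = {"error": str(exc)}
```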
In `@scripts/migrate_entity_nodes.py`:
- Around line 1-10: Run Black and isort on the file to fix line length and
import ordering, and then add missing type hints: annotate connect_falkordb(),
collect_entity_tags(graph), and run_migration(graph) with appropriate parameter
and return types consistent with the project (e.g., the graph type used across
the repo and concrete return types or None). Also break the overlong statement
(the one exceeding 100 chars) into shorter concatenated strings or use implicit
continuation so it fits the 100-character limit; ensure the changed signatures
use the correct imports/types after formatting.
In `@tests/test_entity_identities.py`:
- Around line 375-379: The test unpacks a two-value return from
find_merge_candidates into (auto_merge, review) but never uses review; change
the unused variable name to _review to signal it's intentionally ignored (e.g.,
replace the local variable review with _review in the test where
find_merge_candidates is called) so linters won't flag an unused variable while
keeping behavior intact.
---
Nitpick comments:
In `@automem/api/recall.py`:
- Around line 1510-1561: The loop building entity_identities uses int(row[6] or
0) which can raise ValueError if row[6] is a non-numeric string; update the
block that iterates over ent_result (the for row in getattr(ent_result,
"result_set", []) loop) to convert identity_source_count safely—e.g. attempt
int(row[6]) inside a small try/except (catch ValueError/TypeError) and fall back
to 0 (optionally logger.debug the bad value) before appending the dict to
entity_identities; this keeps the graceful-degradation behavior while preventing
a conversion exception from bubbling up.
In `@automem/consolidation/entity_dedup.py`:
- Around line 236-237: Replace the manual conditional and concatenation with
list unpacking to build incoming_aliases and merged_aliases more cleanly: where
the code currently constructs incoming_aliases = [alias_slug] + alias_aliases if
alias_slug else alias_aliases and merged_aliases =
list(dict.fromkeys(current_aliases + incoming_aliases)), change to use unpacking
(e.g., incoming_aliases = [*([alias_slug] if alias_slug else []),
*alias_aliases]) and merged_aliases = list(dict.fromkeys([*current_aliases,
*incoming_aliases])) so order is preserved and duplicates removed as before.
- Around line 198-218: After batch-creating REFERENCED_IN edges on the canonical
entity, remove the old relationships from the alias to avoid orphaned edges:
after the existing graph.query that MERGE(s) edges for {"canonical_id":
canonical_id, "mem_ids": mem_ids}, run a follow-up graph.query that MATCHES
(e:Entity {id: $id})-[r:REFERENCED_IN]->(m:Memory) WHERE m.id IN $mem_ids DELETE
r (use {"id": alias_id, "mem_ids": mem_ids}); update edges_moved if you want it
to reflect deletions and ensure the code only runs the DELETE when mem_ids is
non-empty.
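The alias-merge idiom recommended for `entity_dedup.py` above, shown standalone: unpacking builds the incoming list, and `dict.fromkeys` deduplicates while preserving first-seen order (sample values are illustrative):

```python
alias_slug = "alice"
alias_aliases = ["alice-smith", "a-smith"]
current_aliases = ["alice-smith", "al"]

# Unpacking instead of list concatenation; the inner list is empty
# when alias_slug is falsy, so no conditional expression is needed.
incoming_aliases = [*([alias_slug] if alias_slug else []), *alias_aliases]

# dict.fromkeys keeps insertion order and drops duplicates
merged_aliases = list(dict.fromkeys([*current_aliases, *incoming_aliases]))
```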
In `@consolidation.py`:
- Around line 152-165: Replace the inline client-creation in
MemoryConsolidator._get_openai_client with a dependency-injected getter: accept
or set a callable/property that returns the app-wide OpenAI client (the same one
produced by init_openai/get_openai_client on _service_runtime) and use that
instead of reading OPENAI_API_KEY/OPENAI_BASE_URL and instantiating
openai.OpenAI directly; update MemoryConsolidator to accept this client getter
(or to fetch it from _service_runtime) so tests can inject mocks and the class
reuses existing configuration.
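The injection shape suggested for `consolidation.py` could look like the following sketch; the real `MemoryConsolidator` carries more state, and the getter here stands in for the app-wide `get_openai_client`:

```python
from typing import Any, Callable, Optional


class MemoryConsolidator:
    """Sketch: receive a client getter instead of building a client inline."""

    def __init__(self, openai_client_getter: Optional[Callable[[], Any]] = None) -> None:
        self._openai_client_getter = openai_client_getter

    def _get_openai_client(self) -> Optional[Any]:
        # Reuse the app-wide client (base URL, key already configured);
        # tests can inject a mock getter.
        if self._openai_client_getter is None:
            return None
        return self._openai_client_getter()


# A test can now inject a stub without touching environment variables:
stub = object()
consolidator = MemoryConsolidator(openai_client_getter=lambda: stub)
```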
In `@scripts/migrate_entity_nodes.py`:
- Around line 50-52: Replace the logger.error call inside the except ImportError
block with logger.exception so the ImportError traceback is recorded; keep the
existing message ("falkordb package not installed") and retain the sys.exit(1)
behavior in the same except block to ensure the process still exits after
logging the exception.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: dad03088-75de-48c0-a8ba-f1054d681b13
📒 Files selected for processing (14)
.gitignore, app.py, automem/api/entity.py, automem/api/recall.py, automem/api/runtime_bootstrap.py, automem/config.py, automem/consolidation/entity_dedup.py, automem/consolidation/identity_synthesis.py, automem/consolidation/runtime_bindings.py, automem/consolidation/runtime_helpers.py, automem/consolidation/runtime_scheduler.py, consolidation.py, scripts/migrate_entity_nodes.py, tests/test_entity_identities.py
- Add admin token auth on entity merge endpoint
- Validate limit parameter with 400 on invalid input
- Handle None LLM response content gracefully
- Remove redundant exc in logger.exception call
- Prefix unused variable with underscore in tests
- Add return type hint to connect_falkordb()
- Run black + isort formatting on all changed files
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@automem/api/entity.py`:
- Around line 205-234: Reject self-merge attempts before calling merge_entities:
after resolving alias_id and canonical_id (from alias_rows and canonical_rows)
add a check if alias_id == canonical_id and abort with a 400 (bad request) and a
clear message referencing slug/merge_into (e.g., "cannot merge entity into
itself"); this prevents passing identical IDs into merge_entities and
tombstoning the canonical entity.
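The self-merge guard asked for above is a single comparison before calling `merge_entities`; a framework-free sketch (the error body shape is an assumption):

```python
from typing import Optional, Tuple


def validate_merge(alias_id: str, canonical_id: str) -> Optional[Tuple[dict, int]]:
    """Return a (body, status) 400 response when an entity would be merged
    into itself, else None to let the merge proceed."""
    if alias_id == canonical_id:
        return {"error": "cannot merge entity into itself"}, 400
    return None
```

The handler would call this right after resolving both IDs and return the tuple (via `jsonify`) when it is not `None`, preventing identical IDs from reaching `merge_entities` and tombstoning the canonical entity.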
In `@scripts/migrate_entity_nodes.py`:
- Line 44: The function signature connect_falkordb() -> Any references the Any
type but Any is not imported; add "from typing import Any" to the top of the
module (or include Any in an existing typing import) so the return annotation
resolves and flake8 F821 is satisfied, then run linters to confirm.
In `@tests/test_entity_identities.py`:
- Around line 481-483: The test assigns (auto_merge, _review) =
find_merge_candidates(graph) but asserts against an undefined name review;
update the second assertion to use the actual variable _review (i.e., assert
len(_review) == 0) so both assertions reference the variables returned by
find_merge_candidates (auto_merge and _review).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d32b9e39-f2c1-4021-ab14-3e36728462cb
📒 Files selected for processing (7)
app.py, automem/api/entity.py, automem/api/runtime_bootstrap.py, automem/consolidation/identity_synthesis.py, consolidation.py, scripts/migrate_entity_nodes.py, tests/test_entity_identities.py
- Reject self-merges with 400 before calling merge_entities()
- Add missing 'from typing import Any' in migrate script
- Fix _review reference in test assertion (line 483)
♻️ Duplicate comments (2)
automem/api/entity.py (1)
57-70: ⚠️ Potential issue | 🟠 Major

Missing authentication on read endpoints.

Per coding guidelines, Flask API routes must validate requests and handle authentication. The blueprint accepts `require_admin_token_fn` for write operations but read endpoints (/entities, /entity/<slug>, /entities/merge-candidates) have no authentication. This exposes entity data to unauthenticated access.

Consider adding a `require_api_token_fn` parameter and applying it to read endpoints.

🔐 Proposed fix — add authentication for read endpoints
```diff
 def create_entity_blueprint(
     get_memory_graph: Callable[[], Any],
     logger: Any,
+    require_api_token_fn: Optional[Callable[[], None]] = None,
     require_admin_token_fn: Optional[Callable[[], None]] = None,
 ) -> Blueprint:
```

Then wrap each read endpoint:

```diff
 @bp.route("/entities", methods=["GET"])
 def list_entities() -> Any:
+    if require_api_token_fn is not None:
+        require_api_token_fn()
     graph = get_memory_graph()
```

Apply similarly to `get_entity` and `merge_candidates`.

Also applies to: 72-73, 125-126, 154-155
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@automem/api/entity.py` around lines 57 - 70, The read endpoints in create_entity_blueprint (routes handling /entities, get_entity (/entity/<slug>), and merge_candidates (/entities/merge-candidates)) lack authentication; add a new optional parameter require_api_token_fn to create_entity_blueprint and call it at the start of each read handler (similar to how require_admin_token_fn is used for merge) to enforce request validation; update the blueprint signature and invoke require_api_token_fn() in the handlers for the functions/route handlers named get_entities (or the handler for /entities), get_entity, and merge_candidates so reads require a valid API token.

scripts/migrate_entity_nodes.py (1)
45-63: ⚠️ Potential issue | 🟡 Minor

Add Railway domain fallbacks for host resolution.

The connection logic doesn't include the Railway domain fallbacks present in `automem/stores/runtime_clients.py`, which could cause the script to fail in Railway deployments where `FALKORDB_HOST` isn't set but Railway-specific env vars are.

🔧 Proposed fix
```diff
-    host = os.getenv("FALKORDB_HOST", "localhost")
+    host = (
+        os.getenv("FALKORDB_HOST")
+        or os.getenv("RAILWAY_PRIVATE_DOMAIN")
+        or os.getenv("RAILWAY_PUBLIC_DOMAIN")
+        or "localhost"
+    )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/migrate_entity_nodes.py` around lines 45 - 63, connect_falkordb currently only reads FALKORDB_HOST and misses the Railway fallbacks; update the host resolution in connect_falkordb to mirror automem/stores/runtime_clients.py by: read FALKORDB_HOST first, then fallback to RAILWAY_PRIVATE_SUBDOMAIN and then RAILWAY_STATIC_URL (via os.getenv), and if a fallback URL is used extract the hostname portion before assigning to host; keep the rest of the logic that builds params (including password/username) and instantiates FalkorDB (**params) and returns db.select_graph(GRAPH_NAME).
🧹 Nitpick comments (5)
scripts/migrate_entity_nodes.py (3)
49-53: Use `logger.exception` instead of `logger.error` for import failures.

Per static analysis hint (TRY400), `logger.exception` automatically includes traceback information which aids debugging.

📝 Proposed fix
```diff
 try:
     from falkordb import FalkorDB
 except ImportError:
-    logger.error("falkordb package not installed")
+    logger.exception("falkordb package not installed")
     sys.exit(1)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/migrate_entity_nodes.py` around lines 49 - 53, In the except ImportError block that catches the failed import of FalkorDB in scripts/migrate_entity_nodes.py, replace the call to logger.error("falkordb package not installed") with logger.exception("falkordb package not installed") so the exception traceback is recorded; keep the existing sys.exit(1) behavior. Ensure the logger variable referenced in that block is used unchanged.
87-89: Add type hint for `graph` parameter.

Same issue as `collect_entity_tags` — the `graph` parameter should have a type annotation for consistency with the codebase style.

♻️ Proposed fix
```diff
 def run_migration(
-    graph, entity_to_memories: dict[str, list[str]], *, dry_run: bool
+    graph: Any, entity_to_memories: dict[str, list[str]], *, dry_run: bool
 ) -> None:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/migrate_entity_nodes.py` around lines 87 - 89, The run_migration function's graph parameter is missing a type annotation; update def run_migration(...) to add the same graph type annotation used by collect_entity_tags (i.e., match the exact type name used there) so the signature becomes consistent with the codebase style and type-checking (refer to the run_migration function and collect_entity_tags for the exact identifier to reuse).
66-84: Add type hint for `graph` parameter.

Per coding guidelines, all Python code should use type hints. The `graph` parameter lacks a type annotation.

♻️ Proposed fix
```diff
-def collect_entity_tags(graph) -> dict[str, list[str]]:
+def collect_entity_tags(graph: Any) -> dict[str, list[str]]:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/migrate_entity_nodes.py` around lines 66 - 84, The function collect_entity_tags is missing a type annotation for the graph parameter; add a type hint to the signature (e.g., graph: neo4j.Graph or graph: py2neo.Graph depending on the graph client used in this repo) and import the appropriate type, or use graph: Any with from typing import Any if the concrete type is uncertain; update the function signature in collect_entity_tags and add the corresponding import so the file passes the project's typing guidelines.

automem/api/entity.py (1)
160-192: Consider logging successful merge candidate discovery.

The exception path logs failures, but there's no logging for successful operations. Adding debug-level logging for candidate counts would aid troubleshooting.
📝 Optional: Add debug logging
```diff
 auto_merge, review = find_merge_candidates(graph)
+logger.debug(
+    "Found %d auto-merge and %d review candidates",
+    len(auto_merge),
+    len(review),
+)
 auto_merge_ids = {(c.entity_a_id, c.entity_b_id) for c in auto_merge}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@automem/api/entity.py` around lines 160 - 192, After successfully building the merge candidate list, add a debug log that records the outcome (e.g., number of candidates and number of auto_merge candidates) to aid troubleshooting; place the log call just before the return jsonify(...) in the block that calls find_merge_candidates, referencing the variables built there (find_merge_candidates, auto_merge, review, candidates) and using the existing logger (logger.debug) so successful discoveries are logged at debug level.

tests/test_entity_identities.py (1)
749-751: Test assertion could be clearer.

The assertion logic using `or` with a set comprehension is harder to read. Consider a more explicit check.

📝 Optional: Clearer assertion
```diff
 entities = _extract_entities_from_results(results)
-assert "alice smith" in entities or "alice-smith" in {
-    e.replace(" ", "-") for e in entities
-}
+# Normalize to slug format for comparison
+normalized = {e.lower().replace(" ", "-") for e in entities}
+assert "alice-smith" in normalized, f"Expected 'alice-smith' in {entities}"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_entity_identities.py` around lines 749 - 751, The assertion is hard to read; normalize entity names and check membership explicitly: create a normalized set from entities (e.g., normalize by replacing hyphens with spaces and lowercasing) and then assert "alice smith" is in that normalized set (referring to the local variable entities in this test) so the intent is clear and the logic is simpler to understand.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@automem/api/entity.py`:
- Around line 57-70: The read endpoints in create_entity_blueprint (routes
handling /entities, get_entity (/entity/<slug>), and merge_candidates
(/entities/merge-candidates)) lack authentication; add a new optional parameter
require_api_token_fn to create_entity_blueprint and call it at the start of each
read handler (similar to how require_admin_token_fn is used for merge) to
enforce request validation; update the blueprint signature and invoke
require_api_token_fn() in the handlers for the functions/route handlers named
get_entities (or the handler for /entities), get_entity, and merge_candidates so
reads require a valid API token.
In `@scripts/migrate_entity_nodes.py`:
- Around line 45-63: connect_falkordb currently only reads FALKORDB_HOST and
misses the Railway fallbacks; update the host resolution in connect_falkordb to
mirror automem/stores/runtime_clients.py by: read FALKORDB_HOST first, then
fallback to RAILWAY_PRIVATE_SUBDOMAIN and then RAILWAY_STATIC_URL (via
os.getenv), and if a fallback URL is used extract the hostname portion before
assigning to host; keep the rest of the logic that builds params (including
password/username) and instantiates FalkorDB (**params) and returns
db.select_graph(GRAPH_NAME).
---
Nitpick comments:
In `@automem/api/entity.py`:
- Around line 160-192: After successfully building the merge candidate list, add
a debug log that records the outcome (e.g., number of candidates and number of
auto_merge candidates) to aid troubleshooting; place the log call just before
the return jsonify(...) in the block that calls find_merge_candidates,
referencing the variables built there (find_merge_candidates, auto_merge,
review, candidates) and using the existing logger (logger.debug) so successful
discoveries are logged at debug level.
In `@scripts/migrate_entity_nodes.py`:
- Around line 49-53: In the except ImportError block that catches the failed
import of FalkorDB in scripts/migrate_entity_nodes.py, replace the call to
logger.error("falkordb package not installed") with logger.exception("falkordb
package not installed") so the exception traceback is recorded; keep the
existing sys.exit(1) behavior. Ensure the logger variable referenced in that
block is used unchanged.
- Around line 87-89: The run_migration function's graph parameter is missing a
type annotation; update def run_migration(...) to add the same graph type
annotation used by collect_entity_tags (i.e., match the exact type name used
there) so the signature becomes consistent with the codebase style and
type-checking (refer to the run_migration function and collect_entity_tags for
the exact identifier to reuse).
- Around line 66-84: The function collect_entity_tags is missing a type
annotation for the graph parameter; add a type hint to the signature (e.g.,
graph: neo4j.Graph or graph: py2neo.Graph depending on the graph client used in
this repo) and import the appropriate type, or use graph: Any with from typing
import Any if the concrete type is uncertain; update the function signature in
collect_entity_tags and add the corresponding import so the file passes the
project's typing guidelines.
In `@tests/test_entity_identities.py`:
- Around line 749-751: The assertion is hard to read; normalize entity names and
check membership explicitly: create a normalized set from entities (e.g.,
normalize by replacing hyphens with spaces and lowercasing) and then assert
"alice smith" is in that normalized set (referring to the local variable
entities in this test) so the intent is clear and the logic is simpler to
understand.
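The suggested normalization can be sketched as below; the `entities` list here is illustrative stand-in data, not the test's actual fixture:

```python
# Illustrative stand-in for the test's local `entities` variable.
entities = ["Alice-Smith", "Bob Jones", "acme-corp"]

# Normalize once (hyphens -> spaces, lowercase), then assert membership explicitly.
normalized = {name.replace("-", " ").lower() for name in entities}
assert "alice smith" in normalized
print(sorted(normalized))
```

The intent ("some entity normalizes to 'alice smith'") is now stated directly instead of being buried in an inline expression.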
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 52ce054f-1032-4de6-ad37-80be37e88639
📒 Files selected for processing (3)
- automem/api/entity.py
- scripts/migrate_entity_nodes.py
- tests/test_entity_identities.py
Line length 100, isort profile=black per .pre-commit-config.yaml
Actionable comments posted: 5
♻️ Duplicate comments (1)
automem/api/entity.py (1)
57-61: ⚠️ Potential issue | 🟠 Major: Authenticate the read-side entity endpoints too.
This blueprint only accepts `require_admin_token_fn`, so `GET /entities`, `GET /entity/<slug>`, and `GET /entities/merge-candidates` are exposed without any token check. Please thread a normal API-token validator into `create_entity_blueprint()` and call it in the read handlers, while keeping admin auth on `POST /entity/<slug>/merge`. As per coding guidelines, `automem/api/**/*.py`: Flask API routes must validate requests and handle authentication via Bearer token, X-API-Key header, or query parameter.
Also applies to: 72-188
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@automem/api/entity.py` around lines 57 - 61, The blueprint currently only accepts require_admin_token_fn so the read endpoints are unprotected; add a new parameter (e.g., require_api_token_fn: Optional[Callable[[], None]]) to create_entity_blueprint and invoke that function at the start of the read handlers for GET /entities, GET /entity/<slug>, and GET /entities/merge-candidates to enforce API-token/X-API-Key/Bearer validation, while leaving the existing require_admin_token_fn intact and only using it for POST /entity/<slug>/merge to preserve admin-only behavior; update any docstring/comments and route handler signatures in create_entity_blueprint to reference require_api_token_fn so callers can pass the standard token validator.
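The two-validator pattern the prompt describes can be sketched without Flask; `create_entity_handlers`, `AuthError`, and the validator names here are hypothetical stand-ins for the blueprint factory and token checks:

```python
from typing import Callable, Optional

class AuthError(Exception):
    """Raised by a validator when the request lacks a valid token."""

def create_entity_handlers(
    require_admin_token_fn: Callable[[], None],
    require_api_token_fn: Optional[Callable[[], None]] = None,
):
    """Build read/write handlers, each gated by the appropriate validator."""

    def list_entities() -> dict:
        if require_api_token_fn is not None:
            require_api_token_fn()  # read path: any valid API token
        return {"entities": []}

    def merge_entity(slug: str) -> dict:
        require_admin_token_fn()  # write path: admin token only
        return {"merged": slug}

    return list_entities, merge_entity

def api_token_ok() -> None:  # pretend the caller sent a valid API token
    pass

def admin_required() -> None:  # pretend the caller lacks an admin token
    raise AuthError("admin token required")

list_entities, merge_entity = create_entity_handlers(admin_required, api_token_ok)
print(list_entities())  # reads succeed with a plain API token
```

In the real blueprint the validators would be the existing token-check helpers, invoked at the top of each Flask route handler.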
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@automem/api/recall.py`:
- Around line 1512-1526: The current branch guards entity population with "if
query_text and graph is not None" which prevents populating response["entities"]
when query_text is empty but results contain entity:* tags; change the logic to
check only "if graph is not None" (or otherwise remove query_text from the outer
condition), always gather entity slugs from results into result_entity_slugs,
and only call _extract_query_entities(query_text) when query_text is truthy,
then merge extracted slugs and result_entity_slugs into all_entity_slugs so
response["entities"] is populated even for tag-only or time-window recalls.
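The suggested control flow can be sketched as follows; `collect_entity_slugs` and the `extract_query_entities` stub are hypothetical simplifications of the recall module's helpers:

```python
def extract_query_entities(query_text: str) -> set:
    """Stand-in for the real _extract_query_entities helper."""
    return {word.lower() for word in query_text.split() if word.istitle()}

def collect_entity_slugs(query_text: str, results: list, graph_available: bool) -> set:
    """Gather entity slugs from result tags even when query_text is empty."""
    if not graph_available:
        return set()
    # Always harvest entity:* tags from the results.
    slugs = {
        tag.split(":", 1)[1]
        for result in results
        for tag in result.get("tags", [])
        if tag.startswith("entity:")
    }
    # Only parse the query when there is one.
    if query_text:
        slugs |= extract_query_entities(query_text)
    return slugs

# A tag-only recall (no query_text) still yields entity slugs.
results = [{"tags": ["entity:acme-corp", "project:automem"]}]
print(collect_entity_slugs("", results, graph_available=True))
```

The key change is that `graph_available` alone gates the block, so tag-only and time-window recalls still populate `response["entities"]`.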
In `@automem/consolidation/entity_dedup.py`:
- Around line 170-173: The auto-merge branch currently uses a non-strict check
and will auto-merge when overlap == min_overlap_for_auto; change the condition
in the decision logic that adds candidates to auto_merge from "if is_substring
and overlap >= min_overlap_for_auto" to use a strict comparison "if is_substring
and overlap > min_overlap_for_auto" so exact-boundary (==) cases fall through to
the review path (the existing elif confidence >= 0.5 branch can remain
unchanged); update any relevant tests for Entity deduplication that assume
boundary behavior if present.
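The boundary behavior above can be sketched as a small decision helper; the function name and the 0.6 default are illustrative, not the module's actual API:

```python
def classify_candidate(is_substring: bool, overlap: float,
                       confidence: float, min_overlap_for_auto: float = 0.6) -> str:
    """Route a merge candidate to auto-merge, human review, or neither."""
    # Strict > so an exact-boundary overlap falls through to review.
    if is_substring and overlap > min_overlap_for_auto:
        return "auto_merge"
    if confidence >= 0.5:
        return "review"
    return "ignore"

assert classify_candidate(True, 0.7, 0.9) == "auto_merge"
assert classify_candidate(True, 0.6, 0.9) == "review"  # boundary case
print("boundary cases route to review")
```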
- Around line 202-256: The code only MERGEs REFERENCED_IN relationships onto the
canonical entity but leaves the original relationships on the alias, so update
the block that handles moving edges (using alias_id and canonical_id) to delete
the alias->Memory REFERENCED_IN relationships after successfully
creating/merging them onto the canonical node; specifically, after the UNWIND
MERGE query, run a graph.query that MATCHes (e:Entity {id:
$alias_id})-[r:REFERENCED_IN]->(m:Memory) DELETE r (or REMOVE that specific
relationship variable) and pass alias_id to the params so the alias's
relationships are removed before marking the alias merged_into.
In `@automem/consolidation/identity_synthesis.py`:
- Around line 134-145: The code persists the sampled reference count
(len(memories) after _gather_entity_memories with memory_limit) so entities with
>memory_limit refs always appear changed; update run_identity_consolidation and
the identity-writer logic to persist the full reference count instead of the
truncated sample size by obtaining the total ref count via a separate count
query (e.g., add or call a function like _count_entity_references(entity_id) or
modify _gather_entity_memories to return (sampled_memories, total_count)), use
that total_count when setting e.identity_source_count (and in the second writer
block around lines 187-203), and keep using sampled_memories for building
prompts; ensure comparisons against e.identity_source_count use the full
persisted total_count.
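The `(sampled, total)` return shape suggested above can be sketched as below; the function name mirrors `_gather_entity_memories` but the body is an illustrative simplification:

```python
def gather_entity_memories(all_memory_rows: list, memory_limit: int):
    """Return (sampled_memories, total_count).

    The sample feeds the synthesis prompt; the full count is what gets
    persisted to e.identity_source_count, so entities with more references
    than memory_limit no longer look permanently 'changed'.
    """
    total_count = len(all_memory_rows)
    return all_memory_rows[:memory_limit], total_count

rows = [f"memory-{i}" for i in range(120)]
sampled, total = gather_entity_memories(rows, memory_limit=50)
print(len(sampled), total)
```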
In `@consolidation.py`:
- Around line 888-901: The current branch in run_scheduled_tasks marks the
identity step as "skipped" when self._get_openai_client() returns None, which
causes the scheduler to treat the identity cycle as completed and suppress
retries; instead, change the behavior so a missing OpenAI client does not mark
the identity task as done — either omit setting results["steps"]["identity"] or
set a distinct flag like {"not_run": True, "reason": "..."} so the scheduler can
treat it as runnable; modify the logic around self._get_openai_client(),
run_identity_consolidation, and results["steps"]["identity"] to reflect "not
run" rather than "skipped".
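The "not run" flag the prompt proposes might look like this; the structure of `results["steps"]` is assumed from the surrounding description:

```python
def record_identity_step(results: dict, openai_client) -> None:
    """Mark identity as 'not run' (retryable) rather than 'skipped' (done)."""
    if openai_client is None:
        # Distinct from a completed/skipped step: the scheduler should retry.
        results["steps"]["identity"] = {
            "not_run": True,
            "reason": "OpenAI client unavailable",
        }
        return
    results["steps"]["identity"] = {"updated": 0}  # placeholder for the real run

results = {"steps": {}}
record_identity_step(results, openai_client=None)
print(results["steps"]["identity"])
```

The scheduler can then check for `not_run` before advancing its bookkeeping, instead of treating the cycle as completed.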
---
Duplicate comments:
In `@automem/api/entity.py`:
- Around line 57-61: The blueprint currently only accepts require_admin_token_fn
so the read endpoints are unprotected; add a new parameter (e.g.,
require_api_token_fn: Optional[Callable[[], None]]) to create_entity_blueprint
and invoke that function at the start of the read handlers for GET /entities,
GET /entity/<slug>, and GET /entities/merge-candidates to enforce
API-token/X-API-Key/Bearer validation, while leaving the existing
require_admin_token_fn intact and only using it for POST /entity/<slug>/merge to
preserve admin-only behavior; update any docstring/comments and route handler
signatures in create_entity_blueprint to reference require_api_token_fn so
callers can pass the standard token validator.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6d00ef3a-cab8-4348-8752-b0de70854bdb
📒 Files selected for processing (9)
- app.py
- automem/api/entity.py
- automem/api/recall.py
- automem/api/runtime_bootstrap.py
- automem/consolidation/entity_dedup.py
- automem/consolidation/identity_synthesis.py
- consolidation.py
- scripts/migrate_entity_nodes.py
- tests/test_entity_identities.py
🚧 Files skipped from review as they are similar to previous changes (1)
- app.py
- Route contract test: add entity endpoints to expected routes
- Recall: populate entities for tag/time-window queries without query_text
- Entity dedup: use strict > (not >=) for auto-merge overlap threshold
- Entity dedup: delete alias REFERENCED_IN edges after copying to canonical
- Identity synthesis: persist true ref count, not truncated sample size
- Scheduler: don't advance identity last_run when task was skipped
Actionable comments posted: 3
♻️ Duplicate comments (1)
automem/consolidation/entity_dedup.py (1)
202-228: ⚠️ Potential issue | 🟠 Major: Delete only the alias edges you actually copied.
`mem_ids` is snapshotted before the `MERGE`, but the follow-up delete removes all `REFERENCED_IN` edges from the alias. If a new reference is created between those two queries, it gets deleted without ever being moved to the canonical entity.
🔒 Safer delete scope

```diff
 if mem_ids:
     # Copy edges to canonical
     graph.query(
         """
         MATCH (e:Entity {id: $canonical_id})
         UNWIND $mem_ids AS mid
         MATCH (m:Memory {id: mid})
         MERGE (e)-[:REFERENCED_IN]->(m)
         """,
         {"canonical_id": canonical_id, "mem_ids": mem_ids},
     )
     # Remove old edges from alias
     graph.query(
-        "MATCH (e:Entity {id: $alias_id})-[r:REFERENCED_IN]->() DELETE r",
-        {"alias_id": alias_id},
+        """
+        MATCH (e:Entity {id: $alias_id})-[r:REFERENCED_IN]->(m:Memory)
+        WHERE m.id IN $mem_ids
+        DELETE r
+        """,
+        {"alias_id": alias_id, "mem_ids": mem_ids},
     )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@automem/consolidation/entity_dedup.py` around lines 202 - 228, The current logic snapshots mem_ids then deletes all REFERENCED_IN edges from the alias, which can remove newly created references; update the remove step so it only deletes edges linking the alias to the mem_ids that were copied. After the MERGE block (same scope using canonical_id, alias_id, mem_ids) replace the blanket delete with a parameterized delete that matches (e:Entity {id: $alias_id})-[r:REFERENCED_IN]->(m:Memory {id: mid}) by UNWINDing the same mem_ids (or otherwise restrict by mem_ids) so only the edges actually copied from mem_ids are removed; keep the existing guards around mem_ids being non-empty and reuse the same mem_ids variable and alias_id/canonical_id symbols.
🧹 Nitpick comments (1)
automem/api/recall.py (1)
1549-1558: Return the canonical `slug` in the recall `entities` payload.
This query already resolves aliases back to the canonical entity and fetches `e.slug`, but the response drops it. That makes the new `GET /entity/<slug>` endpoint hard to use from `response["entities"]`.
♻️ Minimal payload fix

```diff
 entity_identities.append(
     {
         "id": row[0],
+        "slug": row[1],
         "name": row[3],
         "category": row[2],
         "aliases": aliases,
         "identity": row[5],
         "identity_source_count": int(row[6] or 0),
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@automem/api/recall.py` around lines 1549 - 1558, The entities payload is missing the canonical slug even though the query returns e.slug; update the entity dict construction in recall.py (the block that appends to entity_identities) to include the canonical slug (use the appropriate row index for e.slug, likely row[4]) as "slug": row[4] so GET /entity/<slug> can be referenced from response["entities"].
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@automem/consolidation/entity_dedup.py`:
- Around line 3-6: Update the module docstring to reflect the actual overlap
threshold used by the logic: replace the "80%+ of memories" statement with the
current threshold (>0.6) used by find_merge_candidates() (or describe as
"strictly greater than 60%") so the documentation matches the implementation in
find_merge_candidates().
In `@automem/consolidation/identity_synthesis.py`:
- Around line 187-209: The ref count extraction assumes the first cell of
graph.query (ref_count_result) is a numeric string and directly casts it to int,
which can raise ValueError for unexpected row shapes; update the extraction
around ref_count_result -> total_ref_count to defensively handle different
result shapes by inspecting ref_count_result.result_set (or equivalent rows),
checking for None/empty, attempting a safe int conversion inside a try/except,
and falling back to sensible defaults (e.g., 0) or using the number of returned
rows if appropriate; ensure the code that passes "source_count" uses the
sanitized total_ref_count value so identity synthesis won't crash (references:
ref_count_result, total_ref_count, graph.query).
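The defensive extraction described above can be sketched as a small helper; the name `safe_ref_count` and the row shapes are illustrative assumptions about what `graph.query` may return:

```python
def safe_ref_count(result_set) -> int:
    """Extract an integer count from a graph query result, defaulting to 0."""
    try:
        rows = list(result_set or [])
        first = rows[0][0] if rows and rows[0] else None
        return int(first) if first is not None else 0
    except (TypeError, ValueError, IndexError):
        return 0

assert safe_ref_count([[7]]) == 7
assert safe_ref_count([["12"]]) == 12       # numeric string is fine
assert safe_ref_count(None) == 0            # missing result
assert safe_ref_count([["oops"]]) == 0      # unexpected shape falls back
print("defensive parsing ok")
```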
In `@consolidation.py`:
- Around line 1084-1091: The code currently only avoids advancing
self.schedules[task_type]["last_run"] when identity_skipped is true; also skip
advancing when the identity step failed or the task returned an exception.
Modify the check around self.schedules[...] so that for task_type == "identity"
you do not advance last_run if result is not a dict, if result indicates an
identity error (e.g. result.get("steps", {}).get("identity", {}).get("error")),
or if the identity step is marked skipped; only set last_run when the identity
step explicitly succeeded. Use the existing variables (task_type, result,
identity_skipped) and update the conditional that sets
self.schedules[task_type]["last_run"] accordingly.
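The conditional the prompt describes can be sketched as a predicate; the result-dict shape is assumed from the surrounding description, not read from `consolidation.py`:

```python
def should_advance_last_run(task_type: str, result) -> bool:
    """Advance last_run for identity only when the step clearly succeeded."""
    if task_type != "identity":
        return True
    if not isinstance(result, dict):
        return False  # task raised or returned something unexpected
    step = result.get("steps", {}).get("identity")
    if not isinstance(step, dict):
        return False  # step never ran
    if step.get("error") or step.get("skipped"):
        return False  # failed or skipped: leave last_run so the task retries
    return True

assert should_advance_last_run("decay", None)
assert not should_advance_last_run("identity", None)
assert not should_advance_last_run("identity", {"steps": {"identity": {"error": "boom"}}})
assert should_advance_last_run("identity", {"steps": {"identity": {"updated": 3}}})
print("scheduler guard ok")
```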
---
Duplicate comments:
In `@automem/consolidation/entity_dedup.py`:
- Around line 202-228: The current logic snapshots mem_ids then deletes all
REFERENCED_IN edges from the alias, which can remove newly created references;
update the remove step so it only deletes edges linking the alias to the mem_ids
that were copied. After the MERGE block (same scope using canonical_id,
alias_id, mem_ids) replace the blanket delete with a parameterized delete that
matches (e:Entity {id: $alias_id})-[r:REFERENCED_IN]->(m:Memory {id: mid}) by
UNWINDing the same mem_ids (or otherwise restrict by mem_ids) so only the edges
actually copied from mem_ids are removed; keep the existing guards around
mem_ids being non-empty and reuse the same mem_ids variable and
alias_id/canonical_id symbols.
---
Nitpick comments:
In `@automem/api/recall.py`:
- Around line 1549-1558: The entities payload is missing the canonical slug even
though the query returns e.slug; update the entity dict construction in
recall.py (the block that appends to entity_identities) to include the canonical
slug (use the appropriate row index for e.slug, likely row[4]) as "slug": row[4]
so GET /entity/<slug> can be referenced from response["entities"].
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f26360c5-72b5-4e47-b647-193e1a6cfbf8
📒 Files selected for processing (5)
- automem/api/recall.py
- automem/consolidation/entity_dedup.py
- automem/consolidation/identity_synthesis.py
- consolidation.py
- tests/contracts/test_routes_contract.py
…ntity runs
- Update entity_dedup docstring to match actual >60% threshold
- Defensive parsing of ref count query result (handles unexpected row shapes)
- Don't advance identity scheduler on error (not just skip)
Summary
Adds first-class Entity nodes to FalkorDB with identity synthesis, deduplication, and identity-aware recall. This fills the "semantic memory" gap — entities currently exist only as tags on Memory nodes, with no distilled representation of what an entity is.
The problem
When recalling "Acme Corp", AutoMem returns a bag of episodic memories ranked by relevance/recency. But there is no authoritative "Acme Corp is X" anchor — no semantic memory layer that crystallizes understanding over time.
What this PR adds
Entity nodes in FalkorDB — First-class graph citizens with identity, aliases, and relationships to memories.
Identity consolidation task — A new scheduled task (alongside decay/creative/cluster/forget) that:
Identity-aware recall — Recall responses now include an `entities` field with identity blocks for entities referenced in the query or results, giving callers an anchor before the episodes.
Entity API — `GET /entities`, `GET /entity/<slug>` (with alias resolution), `GET /entities/merge-candidates`, `POST /entity/<slug>/merge`
Migration script — `scripts/migrate_entity_nodes.py` scans existing Memory nodes for `entity:*` tags and creates Entity nodes + REFERENCED_IN edges. Idempotent, supports `--dry-run`.
Configuration
Key design decisions
- `entities` field in response, existing API unchanged
- `max_completion_tokens` with `max_tokens` fallback for broad client compatibility
Files changed
- automem/consolidation/entity_dedup.py
- automem/consolidation/identity_synthesis.py
- automem/api/entity.py
- automem/api/recall.py
- scripts/migrate_entity_nodes.py
- consolidation.py
- automem/config.py
- tests/test_entity_identities.py
Testing
18 new tests covering: