Skip to content

feat: Entity Identity Synthesis — first-class Entity nodes with semantic memory#124

Open
jescalan wants to merge 6 commits intoverygoodplugins:mainfrom
jescalan:feat/entity-identities
Open

feat: Entity Identity Synthesis — first-class Entity nodes with semantic memory#124
jescalan wants to merge 6 commits intoverygoodplugins:mainfrom
jescalan:feat/entity-identities

Conversation

@jescalan
Copy link
Contributor

Summary

Adds first-class Entity nodes to FalkorDB with identity synthesis, deduplication, and identity-aware recall. This fills the "semantic memory" gap — entities currently exist only as tags on Memory nodes, with no distilled representation of what an entity is.

The problem

When recalling "Acme Corp", AutoMem returns a bag of episodic memories ranked by relevance/recency. But there is no authoritative "Acme Corp is X" anchor — no semantic memory layer that crystallizes understanding over time.

What this PR adds

Entity nodes in FalkorDB — First-class graph citizens with identity, aliases, and relationships to memories.

Identity consolidation task — A new scheduled task (alongside decay/creative/cluster/forget) that:

  1. Deduplicates entities (merges "alice" into "alice-smith" when they share >60% memory overlap + substring match)
  2. Synthesizes 2-5 sentence identity definitions from episodic memories via LLM
  3. Refines identities over time, with full re-synthesis every 5th version to prevent drift

Identity-aware recall — Recall responses now include an entities field with identity blocks for entities referenced in the query or results, giving callers an anchor before the episodes.

Entity APIGET /entities, GET /entity/<slug> (with alias resolution), GET /entities/merge-candidates, POST /entity/<slug>/merge

Migration scriptscripts/migrate_entity_nodes.py scans existing Memory nodes for entity:* tags and creates Entity nodes + REFERENCED_IN edges. Idempotent, supports --dry-run.

Configuration

IDENTITY_SYNTHESIS_MODEL=gpt-5.4          # Model for identity synthesis (configurable)
CONSOLIDATION_IDENTITY_INTERVAL_SECONDS=604800  # 7 days between runs

Key design decisions

  • Batch queries throughout — UNWIND for edge creation, single-query entity lookup in recall, batch memory set loading in dedup
  • Conditional synthesis — Only re-synthesizes when reference count changes (no wasted LLM calls)
  • Alias resolution — Entity nodes store aliases; queries check both slug and aliases
  • Additive recall change — New entities field in response, existing API unchanged
  • max_completion_tokens with max_tokens fallback for broad client compatibility

Files changed

File Purpose
automem/consolidation/entity_dedup.py Entity deduplication (slug similarity + memory overlap)
automem/consolidation/identity_synthesis.py LLM-based identity synthesis + consolidation runner
automem/api/entity.py Entity CRUD endpoints
automem/api/recall.py Entity identity injection in recall
scripts/migrate_entity_nodes.py One-time migration script
consolidation.py Identity step in MemoryConsolidator + scheduler
automem/config.py New config vars
tests/test_entity_identities.py 18 tests covering migration, dedup, merge, synthesis, recall

Testing

18 new tests covering:

  • Entity node creation from tags (migration)
  • Dedup candidate detection (string similarity, co-occurrence, cross-category exclusion)
  • Merge operations (edge moves, alias dedup)
  • Identity synthesis (mocked LLM)
  • Conditional synthesis (skip unchanged entities)
  • Recall entity injection

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 11, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds entity deduplication and merge tooling, LLM-driven entity identity synthesis with scheduler/runtime wiring, a new Entity REST API and recall identity injection, a migration script to create Entity nodes from Memory tags, and extensive tests for these features.

Changes

Cohort / File(s) Summary
Repository config & app
/.gitignore, app.py, automem/config.py
Ignore data/; add CONSOLIDATION_IDENTITY_INTERVAL_SECONDS and IDENTITY_SYNTHESIS_MODEL; wire identity_interval_seconds into consolidation runtime.
Entity API & bootstrap
automem/api/entity.py, automem/api/runtime_bootstrap.py
New Entity Blueprint exposing listing, get-by-slug, merge-candidates, and merge endpoints; admin-token gating for write; blueprint created and registered in runtime bootstrap.
Recall augmentation
automem/api/recall.py
After JIT enrichment, batch-lookup Entity nodes by slug and inject an "entities" identity field into recall responses; lookup failures logged and non-fatal.
Deduplication module
automem/consolidation/entity_dedup.py
New dedup module with MergeCandidate/MergeResult types, slug-similarity, memory-overlap heuristics, candidate discovery (find_merge_candidates) and merge_entities that re-links memories, updates aliases, and marks aliases merged.
Identity synthesis
automem/consolidation/identity_synthesis.py, consolidation.py
New identity synthesis (synthesize_identity, run_identity_consolidation) using an OpenAI-compatible client; MemoryConsolidator gains _get_openai_client and consolidator flow includes an identity step; scheduler gains identity schedule.
Runtime & scheduler wiring
automem/consolidation/runtime_bindings.py, automem/consolidation/runtime_helpers.py, automem/consolidation/runtime_scheduler.py
Add identity_interval_seconds parameter, include identity in scheduler overrides and task lists, and account for identity metrics in consolidation tick reporting.
Migration script
scripts/migrate_entity_nodes.py
New idempotent migration: scan Memory tags (entity:category:slug), MERGE Entity nodes, and create REFERENCED_IN edges; supports dry-run and logging.
Tests & contracts
tests/test_entity_identities.py, tests/contracts/test_routes_contract.py
Large test suite with in-test EntityFakeGraph covering migration, dedup/merge, identity synthesis, and recall injection; contract tests add expectations for new Entity routes.

Sequence Diagram(s)

sequenceDiagram
    participant Scheduler as Consolidation<br/>Scheduler
    participant Consolidator as MemoryConsolidator
    participant Graph as FalkorDB<br/>Graph
    participant LLM as OpenAI<br/>Client
    participant EntityNode as Entity<br/>Node

    Scheduler->>Consolidator: trigger identity task
    Consolidator->>Consolidator: _get_openai_client()
    Consolidator->>Graph: run_identity_consolidation()
    Graph->>Graph: find_merge_candidates()
    Graph->>Graph: merge_entities()
    Graph->>Graph: fetch entities needing synthesis
    Graph->>Graph: collect memories per entity

    loop per entity
        Graph->>LLM: synthesize_identity(memories)
        LLM-->>Graph: return identity string
        Graph->>EntityNode: update identity + version
    end

    Graph-->>Consolidator: return consolidation results
    Consolidator-->>Scheduler: record metrics
Loading
sequenceDiagram
    participant Client
    participant API as Entity API<br/>Endpoint
    participant Graph as FalkorDB<br/>Graph
    participant Dedup as entity_dedup<br/>module

    Client->>API: GET /entities/merge-candidates
    API->>Graph: get_memory_graph()
    API->>Dedup: find_merge_candidates()
    Dedup->>Graph: load entities & memories
    Dedup->>Dedup: compute slug similarity & overlap
    Dedup-->>API: return candidates
    API-->>Client: JSON response

    Client->>API: POST /entity/{slug}/merge
    API->>Graph: resolve entities
    API->>Dedup: merge_entities()
    Dedup->>Graph: re-link memories, update aliases, mark merged
    Dedup-->>API: return MergeResult
    API-->>Client: merge details
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 67.80% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main feature: Entity Identity Synthesis with first-class Entity nodes and semantic memory integration.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, covering the problem, solution, API additions, configuration, design decisions, and testing.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

…tic memory

- Add Entity nodes to FalkorDB with identity, aliases, and REFERENCED_IN edges
- New identity consolidation task: dedup entities + synthesize 2-5 sentence identities via LLM
- Identity-aware recall: inject entity identities into recall responses
- Entity API: GET /entities, GET /entity/<slug>, GET /entities/merge-candidates, POST /entity/<slug>/merge
- Migration script: scripts/migrate_entity_nodes.py (idempotent, --dry-run support)
- Configurable: IDENTITY_SYNTHESIS_MODEL, CONSOLIDATION_IDENTITY_INTERVAL_SECONDS
- Batch queries throughout (UNWIND edges, single-query recall injection)
- Conditional synthesis: skip unchanged entities to save LLM calls
- 18 tests covering migration, dedup, merge, synthesis, and recall
@jescalan jescalan force-pushed the feat/entity-identities branch from 725bc01 to fcfb458 Compare March 11, 2026 22:14
@coderabbitai coderabbitai bot added the enhancement New feature or request label Mar 11, 2026
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🧹 Nitpick comments (5)
scripts/migrate_entity_nodes.py (1)

50-52: Use logging.exception for import errors.

Per static analysis hint, logging.exception is preferred when handling exceptions to include the traceback, though here it's an ImportError so minimal traceback is expected. The current approach is acceptable but could be improved:

♻️ Optional improvement
     except ImportError:
-        logger.error("falkordb package not installed")
+        logger.exception("falkordb package not installed")
         sys.exit(1)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/migrate_entity_nodes.py` around lines 50 - 52, Replace the
logger.error call inside the except ImportError block with logger.exception so
the ImportError traceback is recorded; keep the existing message ("falkordb
package not installed") and retain the sys.exit(1) behavior in the same except
block to ensure the process still exits after logging the exception.
automem/api/recall.py (1)

1510-1561: Entity identity injection follows graceful degradation pattern.

The implementation correctly logs errors without blocking the main recall flow, which aligns with the coding guidelines for vector store errors. The bare except Exception catches (Lines 1547, 1558, 1560) are appropriate here since entity identity is supplementary data.

One minor robustness consideration: Line 1555 uses int(row[6] or 0) which could raise ValueError if row[6] is a non-numeric string. Consider wrapping in a try/except or using a safer conversion:

🛡️ Safer int conversion
-                        "identity_source_count": int(row[6] or 0),
+                        "identity_source_count": int(row[6]) if isinstance(row[6], (int, float)) else 0,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@automem/api/recall.py` around lines 1510 - 1561, The loop building
entity_identities uses int(row[6] or 0) which can raise ValueError if row[6] is
a non-numeric string; update the block that iterates over ent_result (the for
row in getattr(ent_result, "result_set", []) loop) to convert
identity_source_count safely—e.g. attempt int(row[6]) inside a small try/except
(catch ValueError/TypeError) and fall back to 0 (optionally logger.debug the bad
value) before appending the dict to entity_identities; this keeps the
graceful-degradation behavior while preventing a conversion exception from
bubbling up.
consolidation.py (1)

152-165: Consider reusing the existing OpenAI client pattern.

This method creates a new OpenAI client instance on each call. The main app already has init_openai and get_openai_client patterns (via _service_runtime). Consider injecting a client getter into MemoryConsolidator instead of reading environment variables directly within the class.

This would:

  1. Avoid duplicate client creation logic
  2. Make the class more testable (easier to mock)
  3. Respect any base URL or other configuration already set up
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@consolidation.py` around lines 152 - 165, Replace the inline client-creation
in MemoryConsolidator._get_openai_client with a dependency-injected getter:
accept or set a callable/property that returns the app-wide OpenAI client (the
same one produced by init_openai/get_openai_client on _service_runtime) and use
that instead of reading OPENAI_API_KEY/OPENAI_BASE_URL and instantiating
openai.OpenAI directly; update MemoryConsolidator to accept this client getter
(or to fetch it from _service_runtime) so tests can inject mocks and the class
reuses existing configuration.
automem/consolidation/entity_dedup.py (2)

236-237: Use unpacking for cleaner list construction.

Per static analysis suggestion, use list unpacking instead of concatenation.

♻️ Proposed fix
-    incoming_aliases = [alias_slug] + alias_aliases if alias_slug else alias_aliases
+    incoming_aliases = [alias_slug, *alias_aliases] if alias_slug else alias_aliases
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@automem/consolidation/entity_dedup.py` around lines 236 - 237, Replace the
manual conditional and concatenation with list unpacking to build
incoming_aliases and merged_aliases more cleanly: where the code currently
constructs incoming_aliases = [alias_slug] + alias_aliases if alias_slug else
alias_aliases and merged_aliases = list(dict.fromkeys(current_aliases +
incoming_aliases)), change to use unpacking (e.g., incoming_aliases =
[*([alias_slug] if alias_slug else []), *alias_aliases]) and merged_aliases =
list(dict.fromkeys([*current_aliases, *incoming_aliases])) so order is preserved
and duplicates removed as before.

198-218: Old REFERENCED_IN edges on alias entity are not deleted.

After merging edges from the alias to the canonical entity, the original edges on the alias entity remain. While the alias is marked with merged_into and filtered from queries, this leaves orphaned edges in the graph.

Consider deleting the old edges after the merge to maintain graph hygiene:

🧹 Proposed fix — delete old edges
     if mem_ids:
         graph.query(
             """
             MATCH (e:Entity {id: $canonical_id})
             UNWIND $mem_ids AS mid
             MATCH (m:Memory {id: mid})
             MERGE (e)-[:REFERENCED_IN]->(m)
             """,
             {"canonical_id": canonical_id, "mem_ids": mem_ids},
         )
+        # Delete old edges from alias entity
+        graph.query(
+            "MATCH (e:Entity {id: $id})-[r:REFERENCED_IN]->() DELETE r",
+            {"id": alias_id},
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@automem/consolidation/entity_dedup.py` around lines 198 - 218, After
batch-creating REFERENCED_IN edges on the canonical entity, remove the old
relationships from the alias to avoid orphaned edges: after the existing
graph.query that MERGE(s) edges for {"canonical_id": canonical_id, "mem_ids":
mem_ids}, run a follow-up graph.query that MATCHES (e:Entity {id:
$id})-[r:REFERENCED_IN]->(m:Memory) WHERE m.id IN $mem_ids DELETE r (use {"id":
alias_id, "mem_ids": mem_ids}); update edges_moved if you want it to reflect
deletions and ensure the code only runs the DELETE when mem_ids is non-empty.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app.py`:
- Line 362: Run the code formatter and import sorter to resolve CI formatting
errors: execute `black .` and `isort .` (or your project's configured pre-commit
hooks) in the repo root, then re-run the tests; this will fix the formatting
issue flagged in app.py around the line containing
identity_interval_seconds=CONSOLIDATION_IDENTITY_INTERVAL_SECONDS and ensure
consistent style across the codebase.

In `@automem/api/entity.py`:
- Line 72: The current conversion of request.args.get("limit", 100) to int can
raise ValueError for invalid input; update the handler that assigns limit to
validate the query value before casting (check that request.args.get("limit") is
None or matches an integer pattern or use a safe try/except around int()) and on
invalid input return a 400 response with a clear error message (e.g., "invalid
limit parameter") instead of allowing a 500; keep the existing cap of min(...,
500) after successful validation and reference the same variable name limit and
the call request.args.get("limit", 100) in your change.
- Around line 57-63: The entity blueprint created by create_entity_blueprint
lacks authentication; update its signature to accept require_api_token_fn and
require_admin_token_fn (or at least require_api_token_fn) and apply
require_api_token_fn as a decorator/validator on all read routes and
require_admin_token_fn on the merge/modify route handlers; then update the
blueprint registration call in runtime_bootstrap to pass the appropriate auth
functions so endpoints validate tokens before returning or mutating entity data.

In `@automem/api/runtime_bootstrap.py`:
- Around line 176-179: The entity blueprint is created without authentication,
so the POST /entity/<slug>/merge write endpoint can be called unauthenticated;
update the create_entity_blueprint call to pass the API auth function (e.g.,
require_api_token_fn) alongside get_memory_graph_fn and logger, and then ensure
the merge handler inside the blueprint checks/uses that auth function to
validate requests before performing the merge operation (refer to
create_entity_blueprint, get_memory_graph_fn and the merge endpoint handler for
where to add the parameter and the authentication check).

In `@automem/consolidation/identity_synthesis.py`:
- Around line 170-173: The code assumes response.choices[0].message.content is
always a string when assigning identity_text, which can be None and cause
AttributeError on .strip(); update the try block around identity_text assignment
(look for identity_text, response, and logger.exception usage) to check for None
before stripping—e.g., retrieve content = response.choices[0].message.content,
if content is None log a warning with entity_id and return None (or set
identity_text to an empty string if appropriate), otherwise call content.strip()
and proceed; keep the existing exception handling via logger.exception for other
errors.

In `@consolidation.py`:
- Around line 858-882: The logger.exception call in the identity consolidation
block is passing the caught exception `exc` as a formatting parameter, which is
redundant because logger.exception already records exception info; update the
call in consolidation.py (inside the try/except around
run_identity_consolidation and _get_openai_client) to call logger.exception with
a simple message (e.g., "Identity consolidation failed") and keep the existing
results["steps"]["identity"] = {"error": str(exc)} behavior unchanged.

In `@scripts/migrate_entity_nodes.py`:
- Around line 1-10: Run Black and isort on the file to fix line length and
import ordering, and then add missing type hints: annotate connect_falkordb(),
collect_entity_tags(graph), and run_migration(graph) with appropriate parameter
and return types consistent with the project (e.g., the graph type used across
the repo and concrete return types or None). Also break the overlong statement
(the one exceeding 100 chars) into shorter concatenated strings or use implicit
continuation so it fits the 100-character limit; ensure the changed signatures
use the correct imports/types after formatting.

In `@tests/test_entity_identities.py`:
- Around line 375-379: The test unpacks a two-value return from
find_merge_candidates into (auto_merge, review) but never uses review; change
the unused variable name to _review to signal it's intentionally ignored (e.g.,
replace the local variable review with _review in the test where
find_merge_candidates is called) so linters won't flag an unused variable while
keeping behavior intact.

---

Nitpick comments:
In `@automem/api/recall.py`:
- Around line 1510-1561: The loop building entity_identities uses int(row[6] or
0) which can raise ValueError if row[6] is a non-numeric string; update the
block that iterates over ent_result (the for row in getattr(ent_result,
"result_set", []) loop) to convert identity_source_count safely—e.g. attempt
int(row[6]) inside a small try/except (catch ValueError/TypeError) and fall back
to 0 (optionally logger.debug the bad value) before appending the dict to
entity_identities; this keeps the graceful-degradation behavior while preventing
a conversion exception from bubbling up.

In `@automem/consolidation/entity_dedup.py`:
- Around line 236-237: Replace the manual conditional and concatenation with
list unpacking to build incoming_aliases and merged_aliases more cleanly: where
the code currently constructs incoming_aliases = [alias_slug] + alias_aliases if
alias_slug else alias_aliases and merged_aliases =
list(dict.fromkeys(current_aliases + incoming_aliases)), change to use unpacking
(e.g., incoming_aliases = [*([alias_slug] if alias_slug else []),
*alias_aliases]) and merged_aliases = list(dict.fromkeys([*current_aliases,
*incoming_aliases])) so order is preserved and duplicates removed as before.
- Around line 198-218: After batch-creating REFERENCED_IN edges on the canonical
entity, remove the old relationships from the alias to avoid orphaned edges:
after the existing graph.query that MERGE(s) edges for {"canonical_id":
canonical_id, "mem_ids": mem_ids}, run a follow-up graph.query that MATCHES
(e:Entity {id: $id})-[r:REFERENCED_IN]->(m:Memory) WHERE m.id IN $mem_ids DELETE
r (use {"id": alias_id, "mem_ids": mem_ids}); update edges_moved if you want it
to reflect deletions and ensure the code only runs the DELETE when mem_ids is
non-empty.

In `@consolidation.py`:
- Around line 152-165: Replace the inline client-creation in
MemoryConsolidator._get_openai_client with a dependency-injected getter: accept
or set a callable/property that returns the app-wide OpenAI client (the same one
produced by init_openai/get_openai_client on _service_runtime) and use that
instead of reading OPENAI_API_KEY/OPENAI_BASE_URL and instantiating
openai.OpenAI directly; update MemoryConsolidator to accept this client getter
(or to fetch it from _service_runtime) so tests can inject mocks and the class
reuses existing configuration.

In `@scripts/migrate_entity_nodes.py`:
- Around line 50-52: Replace the logger.error call inside the except ImportError
block with logger.exception so the ImportError traceback is recorded; keep the
existing message ("falkordb package not installed") and retain the sys.exit(1)
behavior in the same except block to ensure the process still exits after
logging the exception.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: dad03088-75de-48c0-a8ba-f1054d681b13

📥 Commits

Reviewing files that changed from the base of the PR and between 605633e and fcfb458.

📒 Files selected for processing (14)
  • .gitignore
  • app.py
  • automem/api/entity.py
  • automem/api/recall.py
  • automem/api/runtime_bootstrap.py
  • automem/config.py
  • automem/consolidation/entity_dedup.py
  • automem/consolidation/identity_synthesis.py
  • automem/consolidation/runtime_bindings.py
  • automem/consolidation/runtime_helpers.py
  • automem/consolidation/runtime_scheduler.py
  • consolidation.py
  • scripts/migrate_entity_nodes.py
  • tests/test_entity_identities.py

- Add admin token auth on entity merge endpoint
- Validate limit parameter with 400 on invalid input
- Handle None LLM response content gracefully
- Remove redundant exc in logger.exception call
- Prefix unused variable with underscore in tests
- Add return type hint to connect_falkordb()
- Run black + isort formatting on all changed files
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@automem/api/entity.py`:
- Around line 205-234: Reject self-merge attempts before calling merge_entities:
after resolving alias_id and canonical_id (from alias_rows and canonical_rows)
add a check if alias_id == canonical_id and abort with a 400 (bad request) and a
clear message referencing slug/merge_into (e.g., "cannot merge entity into
itself"); this prevents passing identical IDs into merge_entities and
tombstoning the canonical entity.

In `@scripts/migrate_entity_nodes.py`:
- Line 44: The function signature connect_falkordb() -> Any references the Any
type but Any is not imported; add "from typing import Any" to the top of the
module (or include Any in an existing typing import) so the return annotation
resolves and flake8 F821 is satisfied, then run linters to confirm.

In `@tests/test_entity_identities.py`:
- Around line 481-483: The test assigns (auto_merge, _review) =
find_merge_candidates(graph) but asserts against an undefined name review;
update the second assertion to use the actual variable _review (i.e., assert
len(_review) == 0) so both assertions reference the variables returned by
find_merge_candidates (auto_merge and _review).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d32b9e39-f2c1-4021-ab14-3e36728462cb

📥 Commits

Reviewing files that changed from the base of the PR and between fcfb458 and 94f08a0.

📒 Files selected for processing (7)
  • app.py
  • automem/api/entity.py
  • automem/api/runtime_bootstrap.py
  • automem/consolidation/identity_synthesis.py
  • consolidation.py
  • scripts/migrate_entity_nodes.py
  • tests/test_entity_identities.py

- Reject self-merges with 400 before calling merge_entities()
- Add missing 'from typing import Any' in migrate script
- Fix _review reference in test assertion (line 483)
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
automem/api/entity.py (1)

57-70: ⚠️ Potential issue | 🟠 Major

Missing authentication on read endpoints.

Per coding guidelines, Flask API routes must validate requests and handle authentication. The blueprint accepts require_admin_token_fn for write operations but read endpoints (/entities, /entity/<slug>, /entities/merge-candidates) have no authentication. This exposes entity data to unauthenticated access.

Consider adding a require_api_token_fn parameter and applying it to read endpoints.

🔐 Proposed fix — add authentication for read endpoints
 def create_entity_blueprint(
     get_memory_graph: Callable[[], Any],
     logger: Any,
+    require_api_token_fn: Optional[Callable[[], None]] = None,
     require_admin_token_fn: Optional[Callable[[], None]] = None,
 ) -> Blueprint:

Then wrap each read endpoint:

     `@bp.route`("/entities", methods=["GET"])
     def list_entities() -> Any:
+        if require_api_token_fn is not None:
+            require_api_token_fn()
         graph = get_memory_graph()

Apply similarly to get_entity and merge_candidates.

Also applies to: 72-73, 125-126, 154-155

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@automem/api/entity.py` around lines 57 - 70, The read endpoints in
create_entity_blueprint (routes handling /entities, get_entity (/entity/<slug>),
and merge_candidates (/entities/merge-candidates)) lack authentication; add a
new optional parameter require_api_token_fn to create_entity_blueprint and call
it at the start of each read handler (similar to how require_admin_token_fn is
used for merge) to enforce request validation; update the blueprint signature
and invoke require_api_token_fn() in the handlers for the functions/route
handlers named get_entities (or the handler for /entities), get_entity, and
merge_candidates so reads require a valid API token.
scripts/migrate_entity_nodes.py (1)

45-63: ⚠️ Potential issue | 🟡 Minor

Add Railway domain fallbacks for host resolution.

The connection logic doesn't include the Railway domain fallbacks present in automem/stores/runtime_clients.py, which could cause the script to fail in Railway deployments where FALKORDB_HOST isn't set but Railway-specific env vars are.

🔧 Proposed fix
-    host = os.getenv("FALKORDB_HOST", "localhost")
+    host = (
+        os.getenv("FALKORDB_HOST")
+        or os.getenv("RAILWAY_PRIVATE_DOMAIN")
+        or os.getenv("RAILWAY_PUBLIC_DOMAIN")
+        or "localhost"
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/migrate_entity_nodes.py` around lines 45 - 63, connect_falkordb
currently only reads FALKORDB_HOST and misses the Railway fallbacks; update the
host resolution in connect_falkordb to mirror automem/stores/runtime_clients.py
by: read FALKORDB_HOST first, then fallback to RAILWAY_PRIVATE_SUBDOMAIN and
then RAILWAY_STATIC_URL (via os.getenv), and if a fallback URL is used extract
the hostname portion before assigning to host; keep the rest of the logic that
builds params (including password/username) and instantiates FalkorDB (**params)
and returns db.select_graph(GRAPH_NAME).
🧹 Nitpick comments (5)
scripts/migrate_entity_nodes.py (3)

49-53: Use logger.exception instead of logger.error for import failures.

Per static analysis hint (TRY400), logger.exception automatically includes traceback information which aids debugging.

📝 Proposed fix
     try:
         from falkordb import FalkorDB
     except ImportError:
-        logger.error("falkordb package not installed")
+        logger.exception("falkordb package not installed")
         sys.exit(1)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/migrate_entity_nodes.py` around lines 49 - 53, In the except
ImportError block that catches the failed import of FalkorDB in
scripts/migrate_entity_nodes.py, replace the call to logger.error("falkordb
package not installed") with logger.exception("falkordb package not installed")
so the exception traceback is recorded; keep the existing sys.exit(1) behavior.
Ensure the logger variable referenced in that block is used unchanged.

87-89: Add type hint for graph parameter.

Same issue as collect_entity_tags — the graph parameter should have a type annotation for consistency with the codebase style.

♻️ Proposed fix
 def run_migration(
-    graph, entity_to_memories: dict[str, list[str]], *, dry_run: bool
+    graph: Any, entity_to_memories: dict[str, list[str]], *, dry_run: bool
 ) -> None:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/migrate_entity_nodes.py` around lines 87 - 89, The run_migration
function's graph parameter is missing a type annotation; update def
run_migration(...) to add the same graph type annotation used by
collect_entity_tags (i.e., match the exact type name used there) so the
signature becomes consistent with the codebase style and type-checking (refer to
the run_migration function and collect_entity_tags for the exact identifier to
reuse).

66-84: Add type hint for graph parameter.

Per coding guidelines, all Python code should use type hints. The graph parameter lacks a type annotation.

♻️ Proposed fix
-def collect_entity_tags(graph) -> dict[str, list[str]]:
+def collect_entity_tags(graph: Any) -> dict[str, list[str]]:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/migrate_entity_nodes.py` around lines 66 - 84, The function
collect_entity_tags is missing a type annotation for the graph parameter; add a
type hint to the signature (e.g., graph: neo4j.Graph or graph: py2neo.Graph
depending on the graph client used in this repo) and import the appropriate
type, or use graph: Any with from typing import Any if the concrete type is
uncertain; update the function signature in collect_entity_tags and add the
corresponding import so the file passes the project's typing guidelines.
automem/api/entity.py (1)

160-192: Consider logging successful merge candidate discovery.

The exception path logs failures, but there's no logging for successful operations. Adding debug-level logging for candidate counts would aid troubleshooting.

📝 Optional: Add debug logging
             auto_merge, review = find_merge_candidates(graph)
+            logger.debug(
+                "Found %d auto-merge and %d review candidates",
+                len(auto_merge),
+                len(review),
+            )
             auto_merge_ids = {(c.entity_a_id, c.entity_b_id) for c in auto_merge}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@automem/api/entity.py` around lines 160 - 192, After successfully building
the merge candidate list, add a debug log that records the outcome (e.g., number
of candidates and number of auto_merge candidates) to aid troubleshooting; place
the log call just before the return jsonify(...) in the block that calls
find_merge_candidates, referencing the variables built there
(find_merge_candidates, auto_merge, review, candidates) and using the existing
logger (logger.debug) so successful discoveries are logged at debug level.
tests/test_entity_identities.py (1)

749-751: Test assertion could be clearer.

The assertion logic using or with set comprehension is harder to read. Consider a more explicit check.

📝 Optional: Clearer assertion
         entities = _extract_entities_from_results(results)
-        assert "alice smith" in entities or "alice-smith" in {
-            e.replace(" ", "-") for e in entities
-        }
+        # Normalize to slug format for comparison
+        normalized = {e.lower().replace(" ", "-") for e in entities}
+        assert "alice-smith" in normalized, f"Expected 'alice-smith' in {entities}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_entity_identities.py` around lines 749 - 751, The assertion is
hard to read; normalize entity names and check membership explicitly: create a
normalized set from entities (e.g., normalize by replacing hyphens with spaces
and lowercasing) and then assert "alice smith" is in that normalized set
(referring to the local variable entities in this test) so the intent is clear
and the logic is simpler to understand.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@automem/api/entity.py`:
- Around line 57-70: The read endpoints in create_entity_blueprint (routes
handling /entities, get_entity (/entity/<slug>), and merge_candidates
(/entities/merge-candidates)) lack authentication; add a new optional parameter
require_api_token_fn to create_entity_blueprint and call it at the start of each
read handler (similar to how require_admin_token_fn is used for merge) to
enforce request validation; update the blueprint signature and invoke
require_api_token_fn() in the handlers for the functions/route handlers named
get_entities (or the handler for /entities), get_entity, and merge_candidates so
reads require a valid API token.

In `@scripts/migrate_entity_nodes.py`:
- Around line 45-63: connect_falkordb currently only reads FALKORDB_HOST and
misses the Railway fallbacks; update the host resolution in connect_falkordb to
mirror automem/stores/runtime_clients.py by: read FALKORDB_HOST first, then
fallback to RAILWAY_PRIVATE_SUBDOMAIN and then RAILWAY_STATIC_URL (via
os.getenv), and if a fallback URL is used extract the hostname portion before
assigning to host; keep the rest of the logic that builds params (including
password/username) and instantiates FalkorDB (**params) and returns
db.select_graph(GRAPH_NAME).

---

Nitpick comments:
In `@automem/api/entity.py`:
- Around line 160-192: After successfully building the merge candidate list, add
a debug log that records the outcome (e.g., number of candidates and number of
auto_merge candidates) to aid troubleshooting; place the log call just before
the return jsonify(...) in the block that calls find_merge_candidates,
referencing the variables built there (find_merge_candidates, auto_merge,
review, candidates) and using the existing logger (logger.debug) so successful
discoveries are logged at debug level.

In `@scripts/migrate_entity_nodes.py`:
- Around line 49-53: In the except ImportError block that catches the failed
import of FalkorDB in scripts/migrate_entity_nodes.py, replace the call to
logger.error("falkordb package not installed") with logger.exception("falkordb
package not installed") so the exception traceback is recorded; keep the
existing sys.exit(1) behavior. Ensure the logger variable referenced in that
block is used unchanged.
- Around line 87-89: The run_migration function's graph parameter is missing a
type annotation; update def run_migration(...) to add the same graph type
annotation used by collect_entity_tags (i.e., match the exact type name used
there) so the signature becomes consistent with the codebase style and
type-checking (refer to the run_migration function and collect_entity_tags for
the exact identifier to reuse).
- Around line 66-84: The function collect_entity_tags is missing a type
annotation for the graph parameter; add a type hint to the signature (e.g.,
graph: neo4j.Graph or graph: py2neo.Graph depending on the graph client used in
this repo) and import the appropriate type, or use graph: Any with from typing
import Any if the concrete type is uncertain; update the function signature in
collect_entity_tags and add the corresponding import so the file passes the
project's typing guidelines.

In `@tests/test_entity_identities.py`:
- Around line 749-751: The assertion is hard to read; normalize entity names and
check membership explicitly: create a normalized set from entities (e.g.,
normalize by replacing hyphens with spaces and lowercasing) and then assert
"alice smith" is in that normalized set (referring to the local variable
entities in this test) so the intent is clear and the logic is simpler to
understand.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 52ce054f-1032-4de6-ad37-80be37e88639

📥 Commits

Reviewing files that changed from the base of the PR and between 94f08a0 and f16d959.

📒 Files selected for processing (3)
  • automem/api/entity.py
  • scripts/migrate_entity_nodes.py
  • tests/test_entity_identities.py

Line length 100, isort profile=black per .pre-commit-config.yaml
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (1)
automem/api/entity.py (1)

57-61: ⚠️ Potential issue | 🟠 Major

Authenticate the read-side entity endpoints too.

This blueprint only accepts require_admin_token_fn, so GET /entities, GET /entity/<slug>, and GET /entities/merge-candidates are exposed without any token check. Please thread a normal API-token validator into create_entity_blueprint() and call it in the read handlers, while keeping admin auth on POST /entity/<slug>/merge.

As per coding guidelines, automem/api/**/*.py: Flask API routes must validate requests and handle authentication via Bearer token, X-API-Key header, or query parameter.

Also applies to: 72-188

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@automem/api/entity.py` around lines 57 - 61, The blueprint currently only
accepts require_admin_token_fn so the read endpoints are unprotected; add a new
parameter (e.g., require_api_token_fn: Optional[Callable[[], None]]) to
create_entity_blueprint and invoke that function at the start of the read
handlers for GET /entities, GET /entity/<slug>, and GET
/entities/merge-candidates to enforce API-token/X-API-Key/Bearer validation,
while leaving the existing require_admin_token_fn intact and only using it for
POST /entity/<slug>/merge to preserve admin-only behavior; update any
docstring/comments and route handler signatures in create_entity_blueprint to
reference require_api_token_fn so callers can pass the standard token validator.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@automem/api/recall.py`:
- Around line 1512-1526: The current branch guards entity population with "if
query_text and graph is not None" which prevents populating response["entities"]
when query_text is empty but results contain entity:* tags; change the logic to
check only "if graph is not None" (or otherwise remove query_text from the outer
condition), always gather entity slugs from results into result_entity_slugs,
and only call _extract_query_entities(query_text) when query_text is truthy,
then merge extracted slugs and result_entity_slugs into all_entity_slugs so
response["entities"] is populated even for tag-only or time-window recalls.

In `@automem/consolidation/entity_dedup.py`:
- Around line 170-173: The auto-merge branch currently uses a non-strict check
and will auto-merge when overlap == min_overlap_for_auto; change the condition
in the decision logic that adds candidates to auto_merge from "if is_substring
and overlap >= min_overlap_for_auto" to use a strict comparison "if is_substring
and overlap > min_overlap_for_auto" so exact-boundary (==) cases fall through to
the review path (the existing elif confidence >= 0.5 branch can remain
unchanged); update any relevant tests for Entity deduplication that assume
boundary behavior if present.
- Around line 202-256: The code only MERGEs REFERENCED_IN relationships onto the
canonical entity but leaves the original relationships on the alias, so update
the block that handles moving edges (using alias_id and canonical_id) to delete
the alias->Memory REFERENCED_IN relationships after successfully
creating/merging them onto the canonical node; specifically, after the UNWIND
MERGE query, run a graph.query that MATCHes (e:Entity {id:
$alias_id})-[r:REFERENCED_IN]->(m:Memory) DELETE r (or REMOVE that specific
relationship variable) and pass alias_id to the params so the alias's
relationships are removed before marking the alias merged_into.

In `@automem/consolidation/identity_synthesis.py`:
- Around line 134-145: The code persists the sampled reference count
(len(memories) after _gather_entity_memories with memory_limit) so entities with
>memory_limit refs always appear changed; update run_identity_consolidation and
the identity-writer logic to persist the full reference count instead of the
truncated sample size by obtaining the total ref count via a separate count
query (e.g., add or call a function like _count_entity_references(entity_id) or
modify _gather_entity_memories to return (sampled_memories, total_count)), use
that total_count when setting e.identity_source_count (and in the second writer
block around lines 187-203), and keep using sampled_memories for building
prompts; ensure comparisons against e.identity_source_count use the full
persisted total_count.

In `@consolidation.py`:
- Around line 888-901: The current branch in run_scheduled_tasks marks the
identity step as "skipped" when self._get_openai_client() returns None, which
causes the scheduler to treat the identity cycle as completed and suppress
retries; instead, change the behavior so a missing OpenAI client does not mark
the identity task as done — either omit setting results["steps"]["identity"] or
set a distinct flag like {"not_run": True, "reason": "..."} so the scheduler can
treat it as runnable; modify the logic around self._get_openai_client(),
run_identity_consolidation, and results["steps"]["identity"] to reflect "not
run" rather than "skipped".

---

Duplicate comments:
In `@automem/api/entity.py`:
- Around line 57-61: The blueprint currently only accepts require_admin_token_fn
so the read endpoints are unprotected; add a new parameter (e.g.,
require_api_token_fn: Optional[Callable[[], None]]) to create_entity_blueprint
and invoke that function at the start of the read handlers for GET /entities,
GET /entity/<slug>, and GET /entities/merge-candidates to enforce
API-token/X-API-Key/Bearer validation, while leaving the existing
require_admin_token_fn intact and only using it for POST /entity/<slug>/merge to
preserve admin-only behavior; update any docstring/comments and route handler
signatures in create_entity_blueprint to reference require_api_token_fn so
callers can pass the standard token validator.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6d00ef3a-cab8-4348-8752-b0de70854bdb

📥 Commits

Reviewing files that changed from the base of the PR and between f16d959 and 2a7e643.

📒 Files selected for processing (9)
  • app.py
  • automem/api/entity.py
  • automem/api/recall.py
  • automem/api/runtime_bootstrap.py
  • automem/consolidation/entity_dedup.py
  • automem/consolidation/identity_synthesis.py
  • consolidation.py
  • scripts/migrate_entity_nodes.py
  • tests/test_entity_identities.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • app.py

- Route contract test: add entity endpoints to expected routes
- Recall: populate entities for tag/time-window queries without query_text
- Entity dedup: use strict > (not >=) for auto-merge overlap threshold
- Entity dedup: delete alias REFERENCED_IN edges after copying to canonical
- Identity synthesis: persist true ref count, not truncated sample size
- Scheduler: don't advance identity last_run when task was skipped
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
automem/consolidation/entity_dedup.py (1)

202-228: ⚠️ Potential issue | 🟠 Major

Delete only the alias edges you actually copied.

mem_ids is snapshotted before the MERGE, but the follow-up delete removes all REFERENCED_IN edges from the alias. If a new reference is created between those two queries, it gets deleted without ever being moved to the canonical entity.

🔒 Safer delete scope
     if mem_ids:
         # Copy edges to canonical
         graph.query(
             """
             MATCH (e:Entity {id: $canonical_id})
             UNWIND $mem_ids AS mid
             MATCH (m:Memory {id: mid})
             MERGE (e)-[:REFERENCED_IN]->(m)
             """,
             {"canonical_id": canonical_id, "mem_ids": mem_ids},
         )
         # Remove old edges from alias
         graph.query(
-            "MATCH (e:Entity {id: $alias_id})-[r:REFERENCED_IN]->() DELETE r",
-            {"alias_id": alias_id},
+            """
+            MATCH (e:Entity {id: $alias_id})-[r:REFERENCED_IN]->(m:Memory)
+            WHERE m.id IN $mem_ids
+            DELETE r
+            """,
+            {"alias_id": alias_id, "mem_ids": mem_ids},
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@automem/consolidation/entity_dedup.py` around lines 202 - 228, The current
logic snapshots mem_ids then deletes all REFERENCED_IN edges from the alias,
which can remove newly created references; update the remove step so it only
deletes edges linking the alias to the mem_ids that were copied. After the MERGE
block (same scope using canonical_id, alias_id, mem_ids) replace the blanket
delete with a parameterized delete that matches (e:Entity {id:
$alias_id})-[r:REFERENCED_IN]->(m:Memory {id: mid}) by UNWINDing the same
mem_ids (or otherwise restrict by mem_ids) so only the edges actually copied
from mem_ids are removed; keep the existing guards around mem_ids being
non-empty and reuse the same mem_ids variable and alias_id/canonical_id symbols.
🧹 Nitpick comments (1)
automem/api/recall.py (1)

1549-1558: Return the canonical slug in the recall entities payload.

This query already resolves aliases back to the canonical entity and fetches e.slug, but the response drops it. That makes the new GET /entity/<slug> endpoint hard to use from response["entities"].

♻️ Minimal payload fix
                     entity_identities.append(
                         {
                             "id": row[0],
+                            "slug": row[1],
                             "name": row[3],
                             "category": row[2],
                             "aliases": aliases,
                             "identity": row[5],
                             "identity_source_count": int(row[6] or 0),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@automem/api/recall.py` around lines 1549 - 1558, The entities payload is
missing the canonical slug even though the query returns e.slug; update the
entity dict construction in recall.py (the block that appends to
entity_identities) to include the canonical slug (use the appropriate row index
for e.slug, likely row[4]) as "slug": row[4] so GET /entity/<slug> can be
referenced from response["entities"].
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@automem/consolidation/entity_dedup.py`:
- Around line 3-6: Update the module docstring to reflect the actual overlap
threshold used by the logic: replace the "80%+ of memories" statement with the
current threshold (>0.6) used by find_merge_candidates() (or describe as
"strictly greater than 60%") so the documentation matches the implementation in
find_merge_candidates().

In `@automem/consolidation/identity_synthesis.py`:
- Around line 187-209: The ref count extraction assumes the first cell of
graph.query (ref_count_result) is a numeric string and directly casts it to int,
which can raise ValueError for unexpected row shapes; update the extraction
around ref_count_result -> total_ref_count to defensively handle different
result shapes by inspecting ref_count_result.result_set (or equivalent rows),
checking for None/empty, attempting a safe int conversion inside a try/except,
and falling back to sensible defaults (e.g., 0) or using the number of returned
rows if appropriate; ensure the code that passes "source_count" uses the
sanitized total_ref_count value so identity synthesis won't crash (references:
ref_count_result, total_ref_count, graph.query).

In `@consolidation.py`:
- Around line 1084-1091: The code currently only avoids advancing
self.schedules[task_type]["last_run"] when identity_skipped is true; also skip
advancing when the identity step failed or the task returned an exception.
Modify the check around self.schedules[...] so that for task_type == "identity"
you do not advance last_run if result is not a dict, if result indicates an
identity error (e.g. result.get("steps", {}).get("identity", {}).get("error")),
or if the identity step is marked skipped; only set last_run when the identity
step explicitly succeeded. Use the existing variables (task_type, result,
identity_skipped) and update the conditional that sets
self.schedules[task_type]["last_run"] accordingly.

---

Duplicate comments:
In `@automem/consolidation/entity_dedup.py`:
- Around line 202-228: The current logic snapshots mem_ids then deletes all
REFERENCED_IN edges from the alias, which can remove newly created references;
update the remove step so it only deletes edges linking the alias to the mem_ids
that were copied. After the MERGE block (same scope using canonical_id,
alias_id, mem_ids) replace the blanket delete with a parameterized delete that
matches (e:Entity {id: $alias_id})-[r:REFERENCED_IN]->(m:Memory {id: mid}) by
UNWINDing the same mem_ids (or otherwise restrict by mem_ids) so only the edges
actually copied from mem_ids are removed; keep the existing guards around
mem_ids being non-empty and reuse the same mem_ids variable and
alias_id/canonical_id symbols.

---

Nitpick comments:
In `@automem/api/recall.py`:
- Around line 1549-1558: The entities payload is missing the canonical slug even
though the query returns e.slug; update the entity dict construction in
recall.py (the block that appends to entity_identities) to include the canonical
slug (use the appropriate row index for e.slug, likely row[4]) as "slug": row[4]
so GET /entity/<slug> can be referenced from response["entities"].
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f26360c5-72b5-4e47-b647-193e1a6cfbf8

📥 Commits

Reviewing files that changed from the base of the PR and between 2a7e643 and bf2fea3.

📒 Files selected for processing (5)
  • automem/api/recall.py
  • automem/consolidation/entity_dedup.py
  • automem/consolidation/identity_synthesis.py
  • consolidation.py
  • tests/contracts/test_routes_contract.py

…ntity runs

- Update entity_dedup docstring to match actual >60% threshold
- Defensive parsing of ref count query result (handles unexpected row shapes)
- Don't advance identity scheduler on error (not just skip)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant