Nanobot Memory Adapter Design
0. 目标
这个 adapter 的目标不是把 Honcho、mem0、Graphiti、MemOS 改造成同一种 memory system,而是给 Nanobot 一个可替换的 memory backend 插槽:
Nanobot agent loop
-> MemoryAdapter contract
-> Honcho | mem0 | Graphiti | MemOS native client/API设计约束:
- Nanobot core 只依赖 adapter contract,不依赖任何 backend SDK 的对象模型。
- Adapter 只做 ID 映射、请求构造、结果格式化、生命周期控制,不重写 backend 的抽取、检索、推理、调度或图更新算法。
- 公共接口以 Nanobot 运行时事件为中心,而不是以统一
MemoryItem为中心。 - 四个 backend 支持什么,adapter 才暴露什么;不为了接口整齐而虚构 backend 没有的能力。
1. 共同抽象:Memory Event,不是 Memory Item
四个架构的原生单位不同:
| Backend | 原生写入单位 | 原生检索/上下文单位 |
|---|---|---|
| Honcho | peer-labeled session message | session context, peer context, representation, peer chat |
| mem0 | messages -> extracted memory items | hybrid ranked memory results |
| Graphiti | episode with reference_time | temporal facts/entities/edges with provenance |
| MemOS | API request into one or more MemCubes | grouped text/pref/tool/skill memories |
因此公共层只定义 Nanobot 事件:
MemoryEvent:
event_id: str
run_id: str
user_id: str
agent_id: str
session_id: str
turn_id: str
timestamp: datetime
messages: list[{role, content, name?, tool_call_id?}]
scenario: str | None
context: "work" | "personal" | str | None
attribute: str | None
metadata: dictAdapter 必须保留这些字段作为可追踪 metadata,但不要求每个 backend 都以同样方式存储它们。
2. Adapter Contract
最低公共接口建议采用 async-first。原因是 Graphiti 的核心写入是 async def add_episode,Honcho SDK 有 async client,MemOS 支持 async scheduler;把 contract 做成 async 可以避免 Nanobot 调用方在不同 backend 间切换时还要知道何时 await。同步 SDK 后端由 adapter 内部用 sync client 或 asyncio.to_thread 包装。
class MemoryAdapter(Protocol):
backend_name: str
async def setup_scope(self, scope: MemoryScope) -> None: ...
async def set_mode(self, mode: AdapterMode) -> ModeReceipt: ...
async def record_event(self, event: MemoryEvent) -> WriteReceipt: ...
async def retrieve(self, request: RetrieveRequest) -> RetrieveResult: ...
def format_context(self, result: RetrieveResult) -> str: ...
async def reset_scope(self, scope: MemoryScope) -> ResetReceipt: ...
async def health(self) -> BackendHealth: ...setup_scope 的含义是“创建或检查 backend 原生隔离资源”,不是创建新的 memory 实体。例如 Honcho 中是 workspace/peer/session 的存在性检查,mem0 中是确认 filters/collection namespace 可用,Graphiti 中是确认 group_id 和 driver 可用,MemOS 中是确认 cube registry/readable/writable cube 配置可用。
建议扩展接口按 capability 暴露。这里的 capability 服务 benchmark harness 和 backend-native ablation,不进入 Nanobot 主循环的最低合同:
class SupportsFeedback(Protocol):
async def apply_feedback(self, feedback: MemoryFeedback) -> FeedbackReceipt: ...
class SupportsBulkIngest(Protocol):
async def record_events(self, events: list[MemoryEvent]) -> list[WriteReceipt]: ...
class SupportsReadiness(Protocol):
async def readiness(self, scope: MemoryScope) -> BackendReadiness: ...
async def wait_until_ready(self, scope: MemoryScope, timeout_s: float) -> BackendReadiness: ...
class SupportsNativeMutation(Protocol):
async def native_mutation(self, request: NativeMutationRequest) -> NativeMutationReceipt: ...
class SupportsProvenance(Protocol):
async def provenance(self, request: ProvenanceRequest) -> ProvenanceResult: ...
class SupportsNativeQuery(Protocol):
async def native_query(self, request: NativeQueryRequest) -> NativeQueryResult: ...Backend-native extension metadata belongs in capability discovery, not in separate property-only protocols:
BackendHealth:
status: "ok" | "degraded" | "unavailable"
backend_name: str
latency_ms: float
consistency_model: "immediate" | "eventual" | "async_derived"
native_memory_types: list[str] | None
native_ingest_modes: list[str] | None
warnings: list[str]native_memory_types and native_ingest_modes are declarations for the harness and run manifest. They are not shared behavior contracts. If Graphiti uses fact_triple or custom entity/edge types, that behavior is selected through Graphiti adapter config, not through a universal Nanobot method.
关键点:
SupportsBulkIngestis for benchmark preload speed. It must still call native write APIs such as Honchoadd_messages, Graphitiadd_episode_bulk, mem0 multi-messageadd, or MemOS add paths; it must not do adapter-side fact extraction.SupportsReadinessis for fair measurement after async ingestion/derivation. Honcho may expose deriver/dreamer pending state, Graphiti may expose queue/awaited ingestion state, MemOS may expose scheduler state, and mem0 may report immediate committed state.SupportsNativeMutationis for reset/debug/explicit correction experiments. It is not used in the Nanobot main loop unless the scenario explicitly includes correction feedback.SupportsProvenanceis for historical provenance queries independent of a specific retrieve call, such as Graphiti facts invalidated in a time range or the source episode for a preference. Per-retrieve provenance belongs inRetrieveResult.rawandTrace.BackendHealth.native_memory_typesrecords MemOS native memory grouping. The adapter may summarize this as text/preference/tool/skill, but traces should keep exact native labels such astext_mem,PreferenceMemory,ToolSchemaMemory,ToolTrajectoryMemory, orSkillMemorywhen exposed. Other backends must not fake these types.BackendHealth.native_ingest_modesrecords Graphiti custom entity/edge support andEpisodeType.fact_tripleavailability. The default Nanobot path remains message episodes; fact triples are selected only by explicit Graphiti adapter config for pre-extracted or ablation experiments.snapshot/restoreis intentionally not a protocol for this benchmark. Use fresh scopes plus run manifests; exact checkpointing is backend-specific storage work.native_query只用于实验调试和 backend 专有评估,不进入 Nanobot 主循环。RetrieveRequestmust includeContextBudget. Some backends, especially Honcho, need the token budget during native retrieval (session.context(tokens=...)), not only during final string formatting.format_context属于 adapter,因为不同 backend 的结果结构不同;Nanobot 只接收最终 prompt context。它只能做最后的 prompt shaping,不能替代 backend 的原生 top-k/token-budget retrieval。
RetrieveRequest:
query: str
scope: MemoryScope
session_id: str
budget: ContextBudget
scenario: str | None
context: str | None
attribute: str | None
filters: dict2.1 Runtime Contract vs Benchmark Harness
需要分清两层:
| 层 | 负责内容 | 不应负责 |
|---|---|---|
| Nanobot runtime adapter | 写入 accumulation 事件、在 accumulation/test 中检索上下文、格式化 prompt、scope reset、health | simulator、judge、指标计算、跨 backend 对齐分析 |
| MemPABench harness | simulator/test-session orchestration、run manifest、latency/token/retrieval trace、isolation smoke test、read-only test-session 控制、outcome judging | backend 内部抽取/检索算法 |
这样做的原因是:Nanobot 需要一个稳定 memory 插槽;MemPABench 还需要评测可复现性和可观测性。两者共享 adapter,但不把评测逻辑塞进 agent loop。
建议每次 record_event 和 retrieve 都返回结构化 trace:
Trace:
backend_name: str
scope: MemoryScope
native_operation: str
native_ids: list[str]
latency_ms: float
consistency: "committed" | "queued" | "pending"
token_count: int | None
retrieved_count: int | None
top_score: float | None
oldest_retrieved_at: datetime | None
newest_retrieved_at: datetime | None
representation_version: str | None
last_derived_at: datetime | None
warnings: list[str]Trace 供 benchmark 记录,不进入 Nanobot prompt,除非显式开启 debug。retrieved_count、top_score、oldest_retrieved_at、newest_retrieved_at 从 RetrieveResult.raw 聚合而来;没有原生 score/timestamp 的 backend 填 null,不能编造。Honcho 如果暴露 representation/context 的 version 或 updated time,应填 representation_version / last_derived_at;否则 PET observability 标记为 weak。Trace 是行为结果的解释材料,不是主评分对象;CS/PET/IQI 主评分看 simulator 中 Nanobot interaction strategy / response / action 是否符合当前 user setting。
2.2 Experimental Conditions vs Memory Architectures
GAME.md §4.6 includes controlled conditions that are not memory architectures:
| Condition | Adapter treatment | Architecture comparison? |
|---|---|---|
| No memory, anonymized/named | retrieve returns empty context; record_event only writes non-retrievable run traces if needed | No |
| Rolling summary | controlled summary provider; may reuse Nanobot’s file-memory/consolidation implementation, but it is labeled as a baseline | No |
| Full-history long-context | packs prior accumulation transcripts into the prompt budget in chronological order | No |
| Simple RAG | mem0 adapter with native infer=False config | Ablation of mem0 extraction, not a new architecture |
| Static profile prompt | fixed persona/profile text provider, no accumulation writes | Ceiling/control |
| Oracle IPaS matrix | fixed structured GT preference provider, no accumulation writes | Ceiling/control |
| Honcho/mem0/Graphiti/MemOS | backend adapters preserving native write/read semantics | Yes |
Nanobot’s current native memory should not be presented as a fifth memory architecture. Source check in nanobot-ref shows it is a file-based runtime mechanism:
ContextBuilder.build_system_prompt()injectsMemoryStore.get_memory_context()into the system prompt.MemoryStorestoresmemory/MEMORY.mdas full long-term markdown andmemory/HISTORY.mdas a grep-searchable log.Session.get_history()keeps unconsolidated JSONL messages in context untillast_consolidated.MemoryConsolidator.maybe_consolidate_by_tokens()only summarizes when the prompt approaches the token budget, then asks an LLM to rewrite the whole markdown memory.
This is closest to the rolling-summary baseline, not to Honcho/mem0/Graphiti/MemOS. If implemented, call it rolling_summary or nanobot_file_summary as a controlled baseline condition. Do not include it in architecture-diversity claims, and do not let its native session-history behavior accidentally give it extra context that the real backends do not receive.
Fairness rule: every condition still enters Nanobot through the same prompt-context slot and the same simulator/test harness. Controlled providers can satisfy the same minimal methods (record_event, retrieve, format_context, set_mode) so the harness has no per-condition special cases, but traces must label them as condition_kind=control rather than condition_kind=architecture.
3. Scope 和 ID 映射
公共 scope:
MemoryScope:
run_id: str
persona_id: str
backend_namespace: str | None
agent_id: str | None映射原则:
| Nanobot/MemPA | Honcho | mem0 | Graphiti | MemOS |
|---|---|---|---|---|
| run | workspace or workspace namespace | run_id + collection/namespace | group_id prefix | cube namespace |
| persona/user | user peer | user_id | group_id suffix / metadata | user_id, cube id |
| assistant | assistant peer | optional agent_id or default Nanobot actor | source metadata | optional agent_id / tags |
| session | session | run_id or metadata | episode name/source | session_id |
| turn timestamp | message created_at | metadata timestamp | reference_time | request metadata |
| scenario/context/attribute | metadata / message metadata | filters metadata | source_description / metadata | custom_tags/info |
MemOS graph DB isolation needs one extra implementation choice: when use_multi_db=True, each user gets physical DB isolation; when use_multi_db=False, users share a DB and are separated by user_name/user tag logic. The adapter should surface this as native MemOS config, not as a universal isolation feature.
Isolation invariant:
No retrieve call for scope A may return data from scope B unless the backend config explicitly enables sharing.4. Runtime Flow
4.0 Allowed Adapter State
Adapter 可以保存的状态只限于运行 glue:
| Allowed | Not allowed |
|---|---|
| backend config hash | normalized universal preference objects |
| scope-to-native-id mapping | cross-backend memory schema |
| consistency/readiness markers | adapter-owned embeddings or reranker scores |
| benchmark trace ids | adapter-generated summaries replacing backend summaries |
换句话说,adapter 可以记住“Nanobot 的 user_a/run_001 对应 Honcho peer X、Graphiti group Y、MemOS cube Z”,但不维护一份独立的偏好数据库。
4.1 Accumulation Turn
Nanobot receives/sends turn
-> build MemoryEvent
-> adapter.record_event(event)
-> backend native write path
-> WriteReceipt(status=committed|queued|pending, native_ids, latency)Adapters may return queued or pending; Nanobot must not assume all memory systems are immediately consistent.
4.2 Test Turn
Nanobot must use the latest memory state available at that point in the timeline to choose an interaction strategy, act and interact with the simulator and be judged by the resulting interaction outcome.
Benchmark reaches a test insertion point, e.g. before accumulation session 15
-> wait for adapter readiness for all prior accumulation writes
-> adapter.set_mode(read_only=True, reason="test_session")
-> simulator starts a normal test scenario with current user setting
-> Nanobot builds RetrieveRequest(query, scope, budget, scenario/context/attribute)
-> adapter.retrieve(...)
-> adapter.format_context(result)
-> Nanobot chooses interaction strategy / response / action
-> simulator produces user-side response and outcome
-> judge scores Nanobot behavior against the current user setting
-> adapter.set_mode(read_only=False, reason="accumulation")
-> continue the accumulation timelineRead-only test sessions are controlled by the benchmark harness, not by ad hoc per-call flags. Before entering a test session, the harness calls:
await adapter.set_mode(AdapterMode(read_only=True, reason="test_session"))While read_only=True, record_event must either no-op with a skipped_read_only receipt or store only non-retrievable audit traces outside the backend memory namespace. This applies to the test user messages, Nanobot responses, simulator events, and judge artifacts. They are evaluation records, not new long-term memory evidence. After the test session, the harness switches back:
await adapter.set_mode(AdapterMode(read_only=False, reason="accumulation"))This keeps behavioral probes from contaminating later memory and avoids each Nanobot call site inventing its own test flag. RetrieveResult.raw, Trace, and native diagnostics are only used for debugging, attribution, and failure analysis; they are not the scoring target.
After the 36 test sessions complete, the adapter scope is not reset. Test results are independent, but memory state should remain at the final accumulated state for post-run analysis. reset_scope is called only at the start of a new benchmark run or during explicit smoke/reset tests.
4.3 Backend Swap Flow
To switch from one backend to another:
- Freeze the run identity:
run_id,persona_id, and optionalagent_id. Keep seed/replicate metadata in the harness run manifest, not in the adapter scope unless isolation requires it. - Change only
memory.backendin Nanobot/MemPA config. - Provide only that backend’s native config block.
- Instantiate through
MemoryAdapterRegistry. - Run
adapter.health(). - Run
setup_scope(scope). - Run capability discovery:
supports_feedback,supports_bulk_ingest,supports_readiness,supports_native_mutation,supports_provenance, backend-native memory/ingest modes,consistency_model,native_query_modes. - Run an isolation smoke test: write one synthetic event under scope A, assert scope B cannot retrieve it.
- Run a write/read smoke test using backend-native semantics.
- Honcho-specific smoke gate: after writing one message, poll
queue_statusor equivalent pending-state endpoint until deriver work is empty or timeout; then testretrieve. If it times out, mark retrieval consistency aspendingrather than passing a false green smoke test. - Start accumulation/test loop.
- Store run manifest: backend name, native config hash, capability flags, scope mapping, harness seed/replicate metadata.
No Nanobot agent-loop code should change during this flow.
Example config shape:
memory:
backend: mem0
scope:
run_id: user_a_c1
persona_id: user_a
agent_id: null
read_only_test_sessions: true
backends:
mem0:
top_k: 12
threshold: 0.1
rerank: true
graphiti:
group_id_template: "{run_id}:{persona_id}"
search_config: combined_hybrid
honcho:
workspace_template: "{run_id}"
context_tokens: 8000
memos:
writable_cube_template: "{run_id}:{persona_id}"
mode: mixture
async_mode: async5. Backend Mappings
5.1 Honcho Adapter
Native model:
Workspace
-> Peer(user), Peer(assistant)
-> Session
-> peer-labeled Messages
-> background deriver/dreamer
-> context, representation, peer card, peer chatWrite path:
record_event
-> resolve workspace_id
-> resolve user peer and assistant peer
-> resolve session_id
-> session.add_messages([peer.message(...)])Read path options:
- Runtime default:
session.context(summary=True, tokens=request.budget.max_tokens, peer_target=user_peer, search_query=request.query, search_top_k=...). - Supplemental user model context:
session.representation(user_peer)orpeer.context(...), only if configured and included withinrequest.budget. - Diagnostic/native query only:
peer.chat(query)belongs behindSupportsNativeQuerybecause it triggers another LLM call and changes benchmark comparability. - Search fallback:
peer.search/session.search, only ifsession.contextis unavailable or configured off.
CS mapping: scenario/context/attribute should be passed as message/session metadata where Honcho supports it and included in search_query or context request only as native parameters allow. If Honcho does not expose field-level metadata filters for context, the adapter must record that CS filtering is semantic/contextual rather than hard-filtered.
PET observability: Honcho PET is weak unless representation/context exposes updated_at, last_derived_at, or an equivalent version. The adapter must trace any exposed derivation timestamp/version and must not invent fact-level timestamps inside prose representation.
For benchmark reproducibility, the adapter config must pin exactly one runtime read strategy. The recommended default is session.context; peer.chat must not be used in the agent loop unless the experiment explicitly evaluates Honcho’s native chat endpoint.
Performance preservation:
- Keep deriver/dreamer asynchronous; do not force synchronous representation refresh per turn.
- Keep peers separate. User and assistant must not be collapsed into one actor.
- Use Honcho SDK/REST, not direct DB writes.
- Use token budget controls rather than dumping all messages.
- Accept eventual consistency and expose pending queue status where available.
- Pass
request.budget.max_tokensinto Honcho retrieval, so Honcho performs native context selection instead of adapter-side over-fetching.
Limits:
- Exact
snapshot/restoreis not a portable Honcho capability. - PET historical query is weak unless Honcho exposes time-scoped reads; adapter should not fake temporal state.
- Fresh writes may not immediately appear in representation/context until background derivation catches up; smoke tests and run traces must record this consistency state.
5.2 mem0 Adapter
Native model:
messages
-> Memory.add(infer=True)
-> LLM extraction
-> memory item + entity linking + vector/BM25 indexes
-> Memory.search(...)Write path:
record_event
-> Memory.add(
messages=event.messages,
user_id=scope.persona_id,
agent_id=scope.agent_id, # optional; omit if null
run_id=scope.run_id,
metadata={scenario, context, attribute, timestamp, ...},
infer=True
)Read path:
retrieve
-> Memory.search(
query=request.query,
filters={user_id, agent_id?, run_id, scenario, context, attribute},
top_k=request.budget.max_items,
threshold=...,
rerank=...
)CS mapping: user_id and run_id are mandatory isolation filters; scenario, context, and attribute are mandatory filters for CS probes when present in the request. If a CS probe intentionally tests cross-context recall, the harness must mark that in request.filters rather than silently omitting filters.
PET observability: search results must preserve native memory id, score, and created_at/updated_at from result metadata when available. format_context may omit some details for prompt compactness, but RetrieveResult.raw and trace must keep them for PET analysis.
Performance preservation:
- Never upsert directly into vector store.
- Never replace
Memory.searchwithget_allplus Python filtering. - Preserve hybrid search, entity boosting, rerank, threshold, and metadata filters as backend config.
- Keep
user_id/agent_id/run_idisolation.
Limits:
- mem0’s natural unit is extracted fact memory; it is less native for full session representation than Honcho.
- Exact restore depends on whether we can replay memories with
infer=Falseor clone the backing store.
5.3 Graphiti Adapter
Native model:
episode(reference_time, group_id)
-> entities + facts
-> temporal edges with valid_at/invalid_at
-> hybrid semantic/keyword/graph searchWrite path:
record_event
-> Graphiti.add_episode(
name=f"{session_id}:{turn_id}",
episode_body=serialized messages,
source_description=scenario/context/attribute/session metadata,
reference_time=event.timestamp,
source=EpisodeType.message,
group_id=f"{run_id}:{persona_id}"
)EpisodeType.fact_triple is available for pre-extracted fact triples, but the default Nanobot path should use EpisodeType.message unless an experiment explicitly wants to bypass Graphiti’s extraction from conversational evidence. Custom entity/edge types and fact-triple ingestion should be declared in BackendHealth.native_ingest_modes and selected through Graphiti adapter config; they should be available for explicit graph-schema ablations, not modeled as a shared adapter schema.
Read path:
retrieve
-> Graphiti.search(query, group_ids=[...], num_results=request.budget.max_items)
or Graphiti.search_(query, config=SearchConfig(...))CS mapping: group_id is the hard isolation boundary. Scenario/context/attribute should be included in episode source_description and in the semantic query text for CS probes unless Graphiti search config exposes a native metadata filter in the selected implementation. The adapter must record whether CS filtering was hard-filtered or semantic-only.
PET observability: preserve fact/edge valid time, invalid time, provenance episode id, and relevance score in RetrieveResult.raw. This is Graphiti’s main advantage for PET and should not be flattened away before scoring.
Performance preservation:
- Always pass timezone-stable
reference_time. - Do not express preference evolution as destructive delete; let Graphiti invalidate old facts.
- Do not directly write graph nodes/edges.
- Use
group_idpartitioning. - Use
search/search_and Graphiti’s search config/reranker; do not dump the graph and rerank in adapter code. - For bulk benchmark preload, use
add_episode_bulkor queueing.
Limits:
- Graphiti has no built-in user/session management; Nanobot must own scope mapping.
- Formatter must keep fact, time, and provenance visible enough for PET analysis.
reset_scopefor Graphiti is backend-specific/best-effort. There is no single high-leveldelete_all(group_id)API; implementation may need driver-level Cypher/graph operations to delete a group partition, or use a freshgroup_idper run as the safer reset strategy.
5.4 MemOS Adapter
Native model:
User / project
-> one or more MemCubes
-> text / preference / tool / skill memories
-> scheduler, feedback, graph/vector/fulltext storesWrite path:
record_event
-> APIADDRequest(
user_id=...,
session_id=event.session_id,
messages=...,
writable_cube_ids=...,
custom_tags=[scenario, context, attribute],
info=metadata,
async_mode=config.async_mode,
mode=config.mode
)
-> add_memories(...)Read path:
retrieve
-> APISearchRequest(
user_id=...,
query=...,
session_id=request.session_id,
readable_cube_ids=readable cubes,
filter={tags/scenario/context/attribute according to MemOS filter syntax},
top_k=request.budget.max_items,
mode=fast|fine|mixture,
include_preference=True,
search_tool_memory=...
)
-> search_memories(...)CS mapping: scenario/context/attribute are written through APIADDRequest.custom_tags. Current APISearchRequest does not expose a custom_tags field; retrieve must express those tags through the native filter field or the selected MemOS search filter syntax. If the selected MemOS path cannot hard-filter on tags, the adapter must record CS filtering as soft/semantic rather than pretending it was hard-filtered. Cube routing remains native MemOS config and should not be replaced by adapter-side filtering.
PET default: benchmark PET should use accumulation-only memory evolution through add_memories. apply_feedback is reserved for a separately labeled feedback-enabled experiment group, because it changes the intervention being evaluated.
Typed memory exposure: MemOS memory groups should be preserved using native labels, including cube-level memory groups such as text_mem and result metadata types such as LongTermMemory, UserMemory, PreferenceMemory, ToolSchemaMemory, ToolTrajectoryMemory, and SkillMemory when exposed. text/preference/tool/skill is a conceptual grouping for the adapter doc, not a new enum. Other backends should not be asked to emulate these memory types.
Source check: current APIADDRequest, APISearchRequest, and APIFeedbackRequest use session_id for add/search/feedback. conversation_id appears as a backward-compatible alias on delete-related request models, so the adapter should map Nanobot session_id to MemOS session_id for the main runtime path and only use conversation_id for APIs that explicitly require it.
Feedback path:
apply_feedback
-> APIFeedbackRequest(...)
-> feedback_memories(...)Performance preservation:
- Preserve
async_mode; do not force all writes to synchronous mode. - Preserve
mode=fast/fine/mixture,dedup,relativity, andtop_kas config. - Preserve memory type separation in formatted context.
- Preserve cube routing; do not collapse all runs into one flat user namespace.
- Tool calls should remain tool-memory-compatible records, not plain text.
Limits:
- MemOS exposes more native entities than the common adapter. The common interface should not force other backends to mimic cube/type/feedback semantics.
- Snapshot/restore depends on cube registry and physical stores.
6. Context Formatting
format_context must output a stable section for the agent prompt, but backend-specific contents can differ:
<memory-context backend="graphiti" scope="...">
...
</memory-context>Formatting rules:
- Keep source labels: backend, score if available, timestamp if available, native id if available.
- Keep PET-relevant temporal fields for Graphiti and timestamped sources for others.
- Keep MemOS memory type groups separate.
- Keep Honcho representation/context as prose, not forced into fake scored bullets.
- Enforce token budget in the formatter, not by disabling backend retrieval algorithms.
- For cross-backend CS/PET/IQI evaluation, the harness should pass a shared target token range through
ContextBudget(min_tokens/max_tokensor equivalent). The formatter should fit that range when enough native evidence is available; if a backend returns less evidence, trace the shortfall rather than padding with invented content.
7. Capability Matrix
| Capability | Honcho | mem0 | Graphiti | MemOS |
|---|---|---|---|---|
| Write interaction | session messages | Memory.add | add_episode | add_memories |
| Search/retrieve | context/search/chat | Memory.search | search/search_ | search_memories |
| User isolation | workspace/peer/session | user/agent/run filters | group_id | user/cube |
| Native evolution | deriver/dreamer derived state | additive extracted facts/entity links | temporal invalidation | feedback/cube memory evolution |
| Async path | deriver/dreamer | mostly sync API, backend dependent | async graph ingestion API | scheduler async mode |
| Bulk ingest | add_messages | multi-message add | add_episode_bulk / queue | batched add path if exposed |
| Readiness | deriver/dreamer pending | usually immediate/backend dependent | async ingestion/queue | scheduler async status |
| Native mutation | object lifecycle if exposed | update/delete/history/delete_all | delete episode/edge/admin clear where available | feedback/delete/cube lifecycle |
| Provenance | message/session/representation metadata | memory id/score/timestamps | fact/edge/time/provenance | cube/type/source/relativity |
| Native memory types | none | none | none | cube memory groups plus memory_type labels: text/preference/tool/skill and exact native labels when exposed |
| Native ingest modes | message/session | messages/infer config | message/json/text/fact_triple + custom entity/edge schema | mode/cube/type-aware add |
| Exact snapshot | not common | not common | not common | not common |
| Feedback API | not common | update/delete, not feedback-first | new episode/correction evidence | native feedback |
| Temporal query | weak | weak | strong | partial/backend dependent |
8. Implementation Layout
Suggested repo shape:
memory/
base.py
events.py
scope.py
registry.py
formatting.py
backends/
honcho_adapter.py
mem0_adapter.py
graphiti_adapter.py
memos_adapter.py
conditions/
no_memory.py
rolling_summary.py
full_history.py
static_profile.py
oracle_ipas.py
tests/
contract/
test_isolation.py
test_record_retrieve.py
test_read_only_test_sessions.pyregistry.py loads a backend by config:
memory:
backend: graphiti
scope:
run_id: user_a_c1
backends:
graphiti:
driver: neo4j
group_id_template: "{run_id}:{persona_id}"
search_mode: hybridOnly the selected backend config is required at runtime.
Controlled condition providers live beside backend adapters because they share the Nanobot-facing prompt-context slot, not because they are memory architectures. Their config should be selected by experiment condition:
memory:
condition: rolling_summary
scope:
run_id: user_a_c1
persona_id: user_a
conditions:
rolling_summary:
max_tokens: 8000
update_policy: after_accumulation_sessionFor architecture runs, memory.backend selects Honcho/mem0/Graphiti/MemOS. For control/ceiling runs, memory.condition selects a controlled provider. The manifest should record both fields and enforce that exactly one runtime context provider is active.
9. Contract Tests
Required tests for every adapter:
record_eventaccepts a two-message user/assistant turn and returns a receipt.retrieveunder the same scope can find relevant context after backend-specific consistency/readiness wait.- Scope A data does not appear under scope B.
- Read-only simulator test sessions do not write test user messages, Nanobot responses, simulator events, or judge artifacts into backend memory.
retrievepasses budget into native backend retrieval where supported, andformat_contextcontains backend label without over-fetching-and-trimming as the primary retrieval strategy.- Reset only clears the configured benchmark scope.
- Backend-native errors are normalized into adapter errors without hiding raw diagnostics.
RetrieveResult.rawpreserves native provenance needed by PET/IQI: ids, scores when available, timestamps when available, and backend-specific type/provenance labels.- CS probes pass scenario/context/attribute through the backend-specific filter or semantic mapping documented for that adapter.
- Test sessions run through simulator and judge behavior outcomes; they do not call
reset_scope, and reset is limited to new benchmark runs and explicit reset tests. - Controlled condition providers use the same read-only test-session gate as backend adapters and cannot write test-session turns into prompt-retrievable state.
Backend-specific tests:
- Honcho: user and assistant peers remain distinct; deriver pending state is visible.
- mem0:
infer=Truecreates extracted memory; filters includeuser_id/run_id. - Graphiti:
reference_timeis non-null; changed facts keep temporal metadata; custom entity/edge schema andEpisodeType.fact_tripleare available only through explicit native-ingest config. - MemOS: async mode eventual retrieval works; memory type groups are preserved; CS tags written through
custom_tagsare retrieved through nativefilterwhen supported.
9.1 Source Anchors Checked
本设计至少对齐以下源码入口:
| Backend | Checked source anchors |
|---|---|
| Honcho | sdks/python/src/honcho/session.py: Session.add_messages, Session.context, Session.search; sdks/python/src/honcho/peer.py: Peer.chat, Peer.search, Peer.representation, Peer.context |
| mem0 | mem0/memory/main.py: Memory.add, Memory.search, Memory.get_all, Memory.update, Memory.delete; add/search both enforce user_id/agent_id/run_id scoping |
| Graphiti | graphiti_core/graphiti.py: Graphiti.add_episode, add_episode_bulk, search, search_; add_episode includes reference_time, group_id, custom entity/edge types, saga support |
| MemOS | src/memos/api/product_models.py: APIADDRequest, APISearchRequest, APIFeedbackRequest; src/memos/multi_mem_cube/single_cube.py and composite_cube.py: add_memories, search_memories, feedback_memories; graph DB isolation config uses use_multi_db plus logical user_name separation |
这些 anchors 不是实现 API 的完整列表,只是防止设计偏离当前源码的最低依据。
9.2 Completion Checklist
| User requirement | Design evidence |
|---|---|
| 四个记忆架构尽可能解耦地通过 adapter 接进 Nanobot | Sections 1-5 define event-centered contract, scope mapping, registry/config flow, and per-backend adapters without shared backend internals |
| 保证四个架构本身性能不因合成失去原设计 | Each backend section has a Performance preservation block preserving native write/search/async/index/rerank/cube/temporal semantics |
| 如无必要,勿增实体,不添加四个 architecture 不支持的功能 | Section 4.0 restricts adapter state; Section 10 rejects universal preference entity and common exact snapshot |
| 更换某一个 memory 架构时能清晰 follow 整个 flow | Section 4.3 gives the 12-step backend swap flow plus config shape |
| GAME.md §4.6/§4.7 的 controlled conditions 不误并入架构比较 | Section 2.2 separates control/ceiling providers from architecture adapters and assigns Nanobot native file memory to the rolling-summary baseline |
| 用 agent-collab 与 Claude 协作并做 2-3 轮辩证/批评 | TASK-002 contains three Codex rounds plus Claude Review; section 9.4 records the review resolution. Prior Claude corrections from TASK-001 about Graphiti fact_triple and MemOS use_multi_db are also incorporated. |
9.3 Collaboration Status
agent-collab task:
TASK-002: Memory Adapter design for Nanobot backendsCodex posted three review rounds:
- Round 1: event-oriented adapter contract, native write paths, backend swap flow, snapshot concerns.
- Round 2: source-confirmed API details and self-critique about runtime contract vs benchmark harness.
- Round 3: allowed adapter state, no fifth memory architecture, expanded backend swap flow.
Claude review status:
- Review requests were sent to both
Claudeandclaudeinboxes. - Claude Review was later found in
.agent-collab/tasks/TASK-002.mdand incorporated. - The final design includes three Codex rounds, one Claude review round with concrete blockers, and a Codex response to that review. See section 9.4.
9.4 Claude Review Resolution
Claude reviewed TASK-002 and raised six concrete issues. Resolution:
| Claude critique | Resolution in this document |
|---|---|
Token budget must flow into retrieve, not only format_context | Section 2 now makes RetrieveRequest.budget explicit; Honcho section passes request.budget.max_tokens to session.context(tokens=...); mem0/Graphiti/MemOS map budget.max_items into native top-k/num_results |
| Honcho retrieve listed four paths without default | Section 5.1 now pins session.context as runtime default, allows representation/context as configured supplement, and moves peer.chat behind SupportsNativeQuery |
| Protocol sync/async unclear | Section 2 now defines async-first adapter methods and says sync backends are wrapped inside adapter |
| MemOS field mismatch | Rechecked source: add/search/feedback models use session_id; delete model has conversation_id alias. Section 5.4 now states this explicitly and keeps runtime mapping to session_id |
| Read-only test mode source unclear | Section 4.2 now assigns control to the benchmark harness through adapter.set_mode(read_only=True/False) |
Graphiti reset_scope lacks native high-level clear API | Section 5.3 now marks Graphiti reset as backend-specific/best-effort and recommends fresh group_id per run when safer |
| Honcho smoke test should wait for deriver queue | Section 4.3 now adds a Honcho-specific queue/pending-state smoke gate |
Claude also agreed with the event-centered contract, capability protocols, no universal preference entity, no common exact snapshot, allowed adapter state, and trace consistency model.
9.5 TASK-003 Evaluation-Fitness Update
TASK-003 reviewed the design specifically against MemPABench CS, PET, and IQI scoring. The resulting changes are:
- Added harness-facing optional capabilities:
SupportsBulkIngest,SupportsReadiness,SupportsNativeMutation, andSupportsProvenance. - Removed
SupportsSnapshotfrom the protocol list; fresh scope plus run manifest is the benchmark default. - Added backend-native extension discovery fields in
BackendHealthfor MemOS memory types and Graphiti ingest/schema modes without turning them into shared schema; corrected MemOS search mapping to useAPISearchRequest.filter, not a nonexistent search-sidecustom_tagsfield. - Added PET trace fields for retrieved counts, score, timestamp range, and Honcho derivation/version metadata when exposed.
- Added per-backend CS mapping for scenario/context/attribute filters or semantic-only retrieval.
- Fixed MemOS PET default to accumulation-only; feedback is a separate experiment group.
- Clarified that test sessions are read-only simulator-facing behavioral probes, not memory inspection probes; they do not reset memory state after completion.
- Added GAME.md §4.6/§4.7 condition boundary: Nanobot native file memory is a rolling-summary/control implementation option, not a fifth architecture under test.
10. Finalized Evaluation Conventions
These points are no longer open questions; they are part of the adapter design:
- Nanobot receives one formatted prompt context string for runtime behavior, while MemPABench stores structured retrieval evidence separately. Every retrieve call should return both
RetrieveResult.rawandformatted_contextso the agent loop stays simple and the harness can still analyze PET/IQI failures. - Read-only simulator test sessions must not write test user messages, Nanobot responses, simulator events, judge artifacts, or final judged responses into backend memory. If these records are needed, store them as non-retrievable evaluation artifacts outside the backend memory namespace.
- Native query is allowed only for diagnostics, not in the Nanobot agent loop. Allowed diagnostic examples include Graphiti group edge/fact counts, Honcho raw peer/session representation inspection, mem0 memory count/history checks, and MemOS cube/type inventory. Native query results must be logged as diagnostics and never injected into the Nanobot prompt.