Nanobot Memory Adapter Design

0. 目标

这个 adapter 的目标不是把 Honcho、mem0、Graphiti、MemOS 改造成同一种 memory system，而是给 Nanobot 一个可替换的 memory backend 插槽：

Nanobot agent loop
  -> MemoryAdapter contract
  -> Honcho | mem0 | Graphiti | MemOS native client/API

设计约束：

Nanobot core 只依赖 adapter contract，不依赖任何 backend SDK 的对象模型。
Adapter 只做 ID 映射、请求构造、结果格式化、生命周期控制，不重写 backend 的抽取、检索、推理、调度或图更新算法。
公共接口以 Nanobot 运行时事件为中心，而不是以统一 MemoryItem 为中心。
四个 backend 支持什么，adapter 才暴露什么；不为了接口整齐而虚构 backend 没有的能力。

1. 共同抽象：Memory Event，不是 Memory Item

四个架构的原生单位不同：

Backend	原生写入单位	原生检索/上下文单位
Honcho	peer-labeled session message	session context, peer context, representation, peer chat
mem0	messages -> extracted memory items	hybrid ranked memory results
Graphiti	episode with `reference_time`	temporal facts/entities/edges with provenance
MemOS	API request into one or more MemCubes	grouped text/pref/tool/skill memories

因此公共层只定义 Nanobot 事件：

MemoryEvent:
    event_id: str
    run_id: str
    user_id: str
    agent_id: str
    session_id: str
    turn_id: str
    timestamp: datetime
    messages: list[{role, content, name?, tool_call_id?}]
    scenario: str | None
    context: "work" | "personal" | str | None
    attribute: str | None
    metadata: dict

Adapter 必须保留这些字段作为可追踪 metadata，但不要求每个 backend 都以同样方式存储它们。

2. Adapter Contract

最低公共接口建议采用 async-first。原因是 Graphiti 的核心写入是 async def add_episode，Honcho SDK 有 async client，MemOS 支持 async scheduler；把 contract 做成 async 可以避免 Nanobot 调用方在不同 backend 间切换时还要知道何时 await。同步 SDK 后端由 adapter 内部用 sync client 或 asyncio.to_thread 包装。

class MemoryAdapter(Protocol):
    backend_name: str
 
    async def setup_scope(self, scope: MemoryScope) -> None: ...
    async def set_mode(self, mode: AdapterMode) -> ModeReceipt: ...
    async def record_event(self, event: MemoryEvent) -> WriteReceipt: ...
    async def retrieve(self, request: RetrieveRequest) -> RetrieveResult: ...
    def format_context(self, result: RetrieveResult) -> str: ...
    async def reset_scope(self, scope: MemoryScope) -> ResetReceipt: ...
    async def health(self) -> BackendHealth: ...

setup_scope 的含义是“创建或检查 backend 原生隔离资源”，不是创建新的 memory 实体。例如 Honcho 中是 workspace/peer/session 的存在性检查，mem0 中是确认 filters/collection namespace 可用，Graphiti 中是确认 group_id 和 driver 可用，MemOS 中是确认 cube registry/readable/writable cube 配置可用。

建议扩展接口按 capability 暴露。这里的 capability 服务 benchmark harness 和 backend-native ablation，不进入 Nanobot 主循环的最低合同：

class SupportsFeedback(Protocol):
    async def apply_feedback(self, feedback: MemoryFeedback) -> FeedbackReceipt: ...
 
class SupportsBulkIngest(Protocol):
    async def record_events(self, events: list[MemoryEvent]) -> list[WriteReceipt]: ...
 
class SupportsReadiness(Protocol):
    async def readiness(self, scope: MemoryScope) -> BackendReadiness: ...
    async def wait_until_ready(self, scope: MemoryScope, timeout_s: float) -> BackendReadiness: ...
 
class SupportsNativeMutation(Protocol):
    async def native_mutation(self, request: NativeMutationRequest) -> NativeMutationReceipt: ...
 
class SupportsProvenance(Protocol):
    async def provenance(self, request: ProvenanceRequest) -> ProvenanceResult: ...
 
class SupportsNativeQuery(Protocol):
    async def native_query(self, request: NativeQueryRequest) -> NativeQueryResult: ...

Backend-native extension metadata belongs in capability discovery, not in separate property-only protocols:

BackendHealth:
    status: "ok" | "degraded" | "unavailable"
    backend_name: str
    latency_ms: float
    consistency_model: "immediate" | "eventual" | "async_derived"
    native_memory_types: list[str] | None
    native_ingest_modes: list[str] | None
    warnings: list[str]

native_memory_types and native_ingest_modes are declarations for the harness and run manifest. They are not shared behavior contracts. If Graphiti uses fact_triple or custom entity/edge types, that behavior is selected through Graphiti adapter config, not through a universal Nanobot method.

关键点：

SupportsBulkIngest is for benchmark preload speed. It must still call native write APIs such as Honcho add_messages, Graphiti add_episode_bulk, mem0 multi-message add, or MemOS add paths; it must not do adapter-side fact extraction.
SupportsReadiness is for fair measurement after async ingestion/derivation. Honcho may expose deriver/dreamer pending state, Graphiti may expose queue/awaited ingestion state, MemOS may expose scheduler state, and mem0 may report immediate committed state.
SupportsNativeMutation is for reset/debug/explicit correction experiments. It is not used in the Nanobot main loop unless the scenario explicitly includes correction feedback.
SupportsProvenance is for historical provenance queries independent of a specific retrieve call, such as Graphiti facts invalidated in a time range or the source episode for a preference. Per-retrieve provenance belongs in RetrieveResult.raw and Trace.
BackendHealth.native_memory_types records MemOS native memory grouping. The adapter may summarize this as text/preference/tool/skill, but traces should keep exact native labels such as text_mem, PreferenceMemory, ToolSchemaMemory, ToolTrajectoryMemory, or SkillMemory when exposed. Other backends must not fake these types.
BackendHealth.native_ingest_modes records Graphiti custom entity/edge support and EpisodeType.fact_triple availability. The default Nanobot path remains message episodes; fact triples are selected only by explicit Graphiti adapter config for pre-extracted or ablation experiments.
snapshot/restore is intentionally not a protocol for this benchmark. Use fresh scopes plus run manifests; exact checkpointing is backend-specific storage work.
native_query 只用于实验调试和 backend 专有评估，不进入 Nanobot 主循环。
RetrieveRequest must include ContextBudget. Some backends, especially Honcho, need the token budget during native retrieval (session.context(tokens=...)), not only during final string formatting.
format_context 属于 adapter，因为不同 backend 的结果结构不同；Nanobot 只接收最终 prompt context。它只能做最后的 prompt shaping，不能替代 backend 的原生 top-k/token-budget retrieval。

RetrieveRequest:
    query: str
    scope: MemoryScope
    session_id: str
    budget: ContextBudget
    scenario: str | None
    context: str | None
    attribute: str | None
    filters: dict

2.1 Runtime Contract vs Benchmark Harness

需要分清两层：

层	负责内容	不应负责
Nanobot runtime adapter	写入 accumulation 事件、在 accumulation/test 中检索上下文、格式化 prompt、scope reset、health	simulator、judge、指标计算、跨 backend 对齐分析
MemPABench harness	simulator/test-session orchestration、run manifest、latency/token/retrieval trace、isolation smoke test、read-only test-session 控制、outcome judging	backend 内部抽取/检索算法

这样做的原因是：Nanobot 需要一个稳定 memory 插槽；MemPABench 还需要评测可复现性和可观测性。两者共享 adapter，但不把评测逻辑塞进 agent loop。

建议每次 record_event 和 retrieve 都返回结构化 trace：

Trace:
    backend_name: str
    scope: MemoryScope
    native_operation: str
    native_ids: list[str]
    latency_ms: float
    consistency: "committed" | "queued" | "pending"
    token_count: int | None
    retrieved_count: int | None
    top_score: float | None
    oldest_retrieved_at: datetime | None
    newest_retrieved_at: datetime | None
    representation_version: str | None
    last_derived_at: datetime | None
    warnings: list[str]

Trace 供 benchmark 记录，不进入 Nanobot prompt，除非显式开启 debug。retrieved_count、top_score、oldest_retrieved_at、newest_retrieved_at 从 RetrieveResult.raw 聚合而来；没有原生 score/timestamp 的 backend 填 null，不能编造。Honcho 如果暴露 representation/context 的 version 或 updated time，应填 representation_version / last_derived_at；否则 PET observability 标记为 weak。Trace 是行为结果的解释材料，不是主评分对象；CS/PET/IQI 主评分看 simulator 中 Nanobot interaction strategy / response / action 是否符合当前 user setting。

2.2 Experimental Conditions vs Memory Architectures

GAME.md §4.6 includes controlled conditions that are not memory architectures:

Condition	Adapter treatment	Architecture comparison?
No memory, anonymized/named	`retrieve` returns empty context; `record_event` only writes non-retrievable run traces if needed	No
Rolling summary	controlled summary provider; may reuse Nanobot’s file-memory/consolidation implementation, but it is labeled as a baseline	No
Full-history long-context	packs prior accumulation transcripts into the prompt budget in chronological order	No
Simple RAG	mem0 adapter with native `infer=False` config	Ablation of mem0 extraction, not a new architecture
Static profile prompt	fixed persona/profile text provider, no accumulation writes	Ceiling/control
Oracle IPaS matrix	fixed structured GT preference provider, no accumulation writes	Ceiling/control
Honcho/mem0/Graphiti/MemOS	backend adapters preserving native write/read semantics	Yes

Nanobot’s current native memory should not be presented as a fifth memory architecture. Source check in nanobot-ref shows it is a file-based runtime mechanism:

ContextBuilder.build_system_prompt() injects MemoryStore.get_memory_context() into the system prompt.
MemoryStore stores memory/MEMORY.md as full long-term markdown and memory/HISTORY.md as a grep-searchable log.
Session.get_history() keeps unconsolidated JSONL messages in context until last_consolidated.
MemoryConsolidator.maybe_consolidate_by_tokens() only summarizes when the prompt approaches the token budget, then asks an LLM to rewrite the whole markdown memory.

This is closest to the rolling-summary baseline, not to Honcho/mem0/Graphiti/MemOS. If implemented, call it rolling_summary or nanobot_file_summary as a controlled baseline condition. Do not include it in architecture-diversity claims, and do not let its native session-history behavior accidentally give it extra context that the real backends do not receive.

Fairness rule: every condition still enters Nanobot through the same prompt-context slot and the same simulator/test harness. Controlled providers can satisfy the same minimal methods (record_event, retrieve, format_context, set_mode) so the harness has no per-condition special cases, but traces must label them as condition_kind=control rather than condition_kind=architecture.

3. Scope 和 ID 映射

公共 scope：

MemoryScope:
    run_id: str
    persona_id: str
    backend_namespace: str | None
    agent_id: str | None

映射原则：

Nanobot/MemPA	Honcho	mem0	Graphiti	MemOS
run	workspace or workspace namespace	`run_id` + collection/namespace	`group_id` prefix	cube namespace
persona/user	user peer	`user_id`	`group_id` suffix / metadata	`user_id`, cube id
assistant	assistant peer	optional `agent_id` or default Nanobot actor	source metadata	optional `agent_id` / tags
session	session	`run_id` or metadata	episode name/source	`session_id`
turn timestamp	message created_at	metadata timestamp	`reference_time`	request metadata
scenario/context/attribute	metadata / message metadata	filters metadata	source_description / metadata	custom_tags/info

MemOS graph DB isolation needs one extra implementation choice: when use_multi_db=True, each user gets physical DB isolation; when use_multi_db=False, users share a DB and are separated by user_name/user tag logic. The adapter should surface this as native MemOS config, not as a universal isolation feature.

Isolation invariant:

No retrieve call for scope A may return data from scope B unless the backend config explicitly enables sharing.

4. Runtime Flow

4.0 Allowed Adapter State

Adapter 可以保存的状态只限于运行 glue：

Allowed	Not allowed
backend config hash	normalized universal preference objects
scope-to-native-id mapping	cross-backend memory schema
consistency/readiness markers	adapter-owned embeddings or reranker scores
benchmark trace ids	adapter-generated summaries replacing backend summaries

换句话说，adapter 可以记住“Nanobot 的 user_a/run_001 对应 Honcho peer X、Graphiti group Y、MemOS cube Z”，但不维护一份独立的偏好数据库。

4.1 Accumulation Turn

Nanobot receives/sends turn
  -> build MemoryEvent
  -> adapter.record_event(event)
  -> backend native write path
  -> WriteReceipt(status=committed|queued|pending, native_ids, latency)

Adapters may return queued or pending; Nanobot must not assume all memory systems are immediately consistent.

4.2 Test Turn

Nanobot must use the latest memory state available at that point in the timeline to choose an interaction strategy, act and interact with the simulator and be judged by the resulting interaction outcome.

Benchmark reaches a test insertion point, e.g. before accumulation session 15
  -> wait for adapter readiness for all prior accumulation writes
  -> adapter.set_mode(read_only=True, reason="test_session")
  -> simulator starts a normal test scenario with current user setting
  -> Nanobot builds RetrieveRequest(query, scope, budget, scenario/context/attribute)
  -> adapter.retrieve(...)
  -> adapter.format_context(result)
  -> Nanobot chooses interaction strategy / response / action
  -> simulator produces user-side response and outcome
  -> judge scores Nanobot behavior against the current user setting
  -> adapter.set_mode(read_only=False, reason="accumulation")
  -> continue the accumulation timeline

Read-only test sessions are controlled by the benchmark harness, not by ad hoc per-call flags. Before entering a test session, the harness calls:

await adapter.set_mode(AdapterMode(read_only=True, reason="test_session"))

While read_only=True, record_event must either no-op with a skipped_read_only receipt or store only non-retrievable audit traces outside the backend memory namespace. This applies to the test user messages, Nanobot responses, simulator events, and judge artifacts. They are evaluation records, not new long-term memory evidence. After the test session, the harness switches back:

await adapter.set_mode(AdapterMode(read_only=False, reason="accumulation"))

This keeps behavioral probes from contaminating later memory and avoids each Nanobot call site inventing its own test flag. RetrieveResult.raw, Trace, and native diagnostics are only used for debugging, attribution, and failure analysis; they are not the scoring target.

After the 36 test sessions complete, the adapter scope is not reset. Test results are independent, but memory state should remain at the final accumulated state for post-run analysis. reset_scope is called only at the start of a new benchmark run or during explicit smoke/reset tests.

4.3 Backend Swap Flow

To switch from one backend to another:

Freeze the run identity: run_id, persona_id, and optional agent_id. Keep seed/replicate metadata in the harness run manifest, not in the adapter scope unless isolation requires it.
Change only memory.backend in Nanobot/MemPA config.
Provide only that backend’s native config block.
Instantiate through MemoryAdapterRegistry.
Run adapter.health().
Run setup_scope(scope).
Run capability discovery: supports_feedback, supports_bulk_ingest, supports_readiness, supports_native_mutation, supports_provenance, backend-native memory/ingest modes, consistency_model, native_query_modes.
Run an isolation smoke test: write one synthetic event under scope A, assert scope B cannot retrieve it.
Run a write/read smoke test using backend-native semantics.
Honcho-specific smoke gate: after writing one message, poll queue_status or equivalent pending-state endpoint until deriver work is empty or timeout; then test retrieve. If it times out, mark retrieval consistency as pending rather than passing a false green smoke test.
Start accumulation/test loop.
Store run manifest: backend name, native config hash, capability flags, scope mapping, harness seed/replicate metadata.

No Nanobot agent-loop code should change during this flow.

Example config shape:

memory:
  backend: mem0
  scope:
    run_id: user_a_c1
    persona_id: user_a
    agent_id: null
  read_only_test_sessions: true
  backends:
    mem0:
      top_k: 12
      threshold: 0.1
      rerank: true
    graphiti:
      group_id_template: "{run_id}:{persona_id}"
      search_config: combined_hybrid
    honcho:
      workspace_template: "{run_id}"
      context_tokens: 8000
    memos:
      writable_cube_template: "{run_id}:{persona_id}"
      mode: mixture
      async_mode: async

5. Backend Mappings

5.1 Honcho Adapter

Native model:

Workspace
  -> Peer(user), Peer(assistant)
  -> Session
  -> peer-labeled Messages
  -> background deriver/dreamer
  -> context, representation, peer card, peer chat

Write path:

record_event
  -> resolve workspace_id
  -> resolve user peer and assistant peer
  -> resolve session_id
  -> session.add_messages([peer.message(...)])

Read path options:

Runtime default: session.context(summary=True, tokens=request.budget.max_tokens, peer_target=user_peer, search_query=request.query, search_top_k=...).
Supplemental user model context: session.representation(user_peer) or peer.context(...), only if configured and included within request.budget.
Diagnostic/native query only: peer.chat(query) belongs behind SupportsNativeQuery because it triggers another LLM call and changes benchmark comparability.
Search fallback: peer.search / session.search, only if session.context is unavailable or configured off.

CS mapping: scenario/context/attribute should be passed as message/session metadata where Honcho supports it and included in search_query or context request only as native parameters allow. If Honcho does not expose field-level metadata filters for context, the adapter must record that CS filtering is semantic/contextual rather than hard-filtered.

PET observability: Honcho PET is weak unless representation/context exposes updated_at, last_derived_at, or an equivalent version. The adapter must trace any exposed derivation timestamp/version and must not invent fact-level timestamps inside prose representation.

For benchmark reproducibility, the adapter config must pin exactly one runtime read strategy. The recommended default is session.context; peer.chat must not be used in the agent loop unless the experiment explicitly evaluates Honcho’s native chat endpoint.

Performance preservation:

Keep deriver/dreamer asynchronous; do not force synchronous representation refresh per turn.
Keep peers separate. User and assistant must not be collapsed into one actor.
Use Honcho SDK/REST, not direct DB writes.
Use token budget controls rather than dumping all messages.
Accept eventual consistency and expose pending queue status where available.
Pass request.budget.max_tokens into Honcho retrieval, so Honcho performs native context selection instead of adapter-side over-fetching.

Limits:

Exact snapshot/restore is not a portable Honcho capability.
PET historical query is weak unless Honcho exposes time-scoped reads; adapter should not fake temporal state.
Fresh writes may not immediately appear in representation/context until background derivation catches up; smoke tests and run traces must record this consistency state.

5.2 mem0 Adapter

Native model:

messages
  -> Memory.add(infer=True)
  -> LLM extraction
  -> memory item + entity linking + vector/BM25 indexes
  -> Memory.search(...)

Write path:

record_event
  -> Memory.add(
       messages=event.messages,
       user_id=scope.persona_id,
       agent_id=scope.agent_id,  # optional; omit if null
       run_id=scope.run_id,
       metadata={scenario, context, attribute, timestamp, ...},
       infer=True
     )

Read path:

retrieve
  -> Memory.search(
       query=request.query,
       filters={user_id, agent_id?, run_id, scenario, context, attribute},
       top_k=request.budget.max_items,
       threshold=...,
       rerank=...
     )

CS mapping: user_id and run_id are mandatory isolation filters; scenario, context, and attribute are mandatory filters for CS probes when present in the request. If a CS probe intentionally tests cross-context recall, the harness must mark that in request.filters rather than silently omitting filters.

PET observability: search results must preserve native memory id, score, and created_at/updated_at from result metadata when available. format_context may omit some details for prompt compactness, but RetrieveResult.raw and trace must keep them for PET analysis.

Performance preservation:

Never upsert directly into vector store.
Never replace Memory.search with get_all plus Python filtering.
Preserve hybrid search, entity boosting, rerank, threshold, and metadata filters as backend config.
Keep user_id/agent_id/run_id isolation.

Limits:

mem0’s natural unit is extracted fact memory; it is less native for full session representation than Honcho.
Exact restore depends on whether we can replay memories with infer=False or clone the backing store.

5.3 Graphiti Adapter

Native model:

episode(reference_time, group_id)
  -> entities + facts
  -> temporal edges with valid_at/invalid_at
  -> hybrid semantic/keyword/graph search

Write path:

record_event
  -> Graphiti.add_episode(
       name=f"{session_id}:{turn_id}",
       episode_body=serialized messages,
       source_description=scenario/context/attribute/session metadata,
       reference_time=event.timestamp,
       source=EpisodeType.message,
       group_id=f"{run_id}:{persona_id}"
     )

EpisodeType.fact_triple is available for pre-extracted fact triples, but the default Nanobot path should use EpisodeType.message unless an experiment explicitly wants to bypass Graphiti’s extraction from conversational evidence. Custom entity/edge types and fact-triple ingestion should be declared in BackendHealth.native_ingest_modes and selected through Graphiti adapter config; they should be available for explicit graph-schema ablations, not modeled as a shared adapter schema.

Read path:

retrieve
  -> Graphiti.search(query, group_ids=[...], num_results=request.budget.max_items)
  or Graphiti.search_(query, config=SearchConfig(...))

CS mapping: group_id is the hard isolation boundary. Scenario/context/attribute should be included in episode source_description and in the semantic query text for CS probes unless Graphiti search config exposes a native metadata filter in the selected implementation. The adapter must record whether CS filtering was hard-filtered or semantic-only.

PET observability: preserve fact/edge valid time, invalid time, provenance episode id, and relevance score in RetrieveResult.raw. This is Graphiti’s main advantage for PET and should not be flattened away before scoring.

Performance preservation:

Always pass timezone-stable reference_time.
Do not express preference evolution as destructive delete; let Graphiti invalidate old facts.
Do not directly write graph nodes/edges.
Use group_id partitioning.
Use search/search_ and Graphiti’s search config/reranker; do not dump the graph and rerank in adapter code.
For bulk benchmark preload, use add_episode_bulk or queueing.

Limits:

Graphiti has no built-in user/session management; Nanobot must own scope mapping.
Formatter must keep fact, time, and provenance visible enough for PET analysis.
reset_scope for Graphiti is backend-specific/best-effort. There is no single high-level delete_all(group_id) API; implementation may need driver-level Cypher/graph operations to delete a group partition, or use a fresh group_id per run as the safer reset strategy.

5.4 MemOS Adapter

Native model:

User / project
  -> one or more MemCubes
  -> text / preference / tool / skill memories
  -> scheduler, feedback, graph/vector/fulltext stores

Write path:

record_event
  -> APIADDRequest(
       user_id=...,
       session_id=event.session_id,
       messages=...,
       writable_cube_ids=...,
       custom_tags=[scenario, context, attribute],
       info=metadata,
       async_mode=config.async_mode,
       mode=config.mode
     )
  -> add_memories(...)

Read path:

retrieve
  -> APISearchRequest(
       user_id=...,
       query=...,
       session_id=request.session_id,
       readable_cube_ids=readable cubes,
       filter={tags/scenario/context/attribute according to MemOS filter syntax},
       top_k=request.budget.max_items,
       mode=fast|fine|mixture,
       include_preference=True,
       search_tool_memory=...
     )
  -> search_memories(...)

CS mapping: scenario/context/attribute are written through APIADDRequest.custom_tags. Current APISearchRequest does not expose a custom_tags field; retrieve must express those tags through the native filter field or the selected MemOS search filter syntax. If the selected MemOS path cannot hard-filter on tags, the adapter must record CS filtering as soft/semantic rather than pretending it was hard-filtered. Cube routing remains native MemOS config and should not be replaced by adapter-side filtering.

PET default: benchmark PET should use accumulation-only memory evolution through add_memories. apply_feedback is reserved for a separately labeled feedback-enabled experiment group, because it changes the intervention being evaluated.

Typed memory exposure: MemOS memory groups should be preserved using native labels, including cube-level memory groups such as text_mem and result metadata types such as LongTermMemory, UserMemory, PreferenceMemory, ToolSchemaMemory, ToolTrajectoryMemory, and SkillMemory when exposed. text/preference/tool/skill is a conceptual grouping for the adapter doc, not a new enum. Other backends should not be asked to emulate these memory types.

Source check: current APIADDRequest, APISearchRequest, and APIFeedbackRequest use session_id for add/search/feedback. conversation_id appears as a backward-compatible alias on delete-related request models, so the adapter should map Nanobot session_id to MemOS session_id for the main runtime path and only use conversation_id for APIs that explicitly require it.

Feedback path:

apply_feedback
  -> APIFeedbackRequest(...)
  -> feedback_memories(...)

Performance preservation:

Preserve async_mode; do not force all writes to synchronous mode.
Preserve mode=fast/fine/mixture, dedup, relativity, and top_k as config.
Preserve memory type separation in formatted context.
Preserve cube routing; do not collapse all runs into one flat user namespace.
Tool calls should remain tool-memory-compatible records, not plain text.

Limits:

MemOS exposes more native entities than the common adapter. The common interface should not force other backends to mimic cube/type/feedback semantics.
Snapshot/restore depends on cube registry and physical stores.

6. Context Formatting

format_context must output a stable section for the agent prompt, but backend-specific contents can differ:

<memory-context backend="graphiti" scope="...">
...
</memory-context>

Formatting rules:

Keep source labels: backend, score if available, timestamp if available, native id if available.
Keep PET-relevant temporal fields for Graphiti and timestamped sources for others.
Keep MemOS memory type groups separate.
Keep Honcho representation/context as prose, not forced into fake scored bullets.
Enforce token budget in the formatter, not by disabling backend retrieval algorithms.
For cross-backend CS/PET/IQI evaluation, the harness should pass a shared target token range through ContextBudget (min_tokens/max_tokens or equivalent). The formatter should fit that range when enough native evidence is available; if a backend returns less evidence, trace the shortfall rather than padding with invented content.

7. Capability Matrix

Capability	Honcho	mem0	Graphiti	MemOS
Write interaction	session messages	`Memory.add`	`add_episode`	`add_memories`
Search/retrieve	context/search/chat	`Memory.search`	`search/search_`	`search_memories`
User isolation	workspace/peer/session	user/agent/run filters	`group_id`	user/cube
Native evolution	deriver/dreamer derived state	additive extracted facts/entity links	temporal invalidation	feedback/cube memory evolution
Async path	deriver/dreamer	mostly sync API, backend dependent	async graph ingestion API	scheduler async mode
Bulk ingest	`add_messages`	multi-message `add`	`add_episode_bulk` / queue	batched add path if exposed
Readiness	deriver/dreamer pending	usually immediate/backend dependent	async ingestion/queue	scheduler async status
Native mutation	object lifecycle if exposed	update/delete/history/delete_all	delete episode/edge/admin clear where available	feedback/delete/cube lifecycle
Provenance	message/session/representation metadata	memory id/score/timestamps	fact/edge/time/provenance	cube/type/source/relativity
Native memory types	none	none	none	cube memory groups plus `memory_type` labels: text/preference/tool/skill and exact native labels when exposed
Native ingest modes	message/session	messages/infer config	message/json/text/fact_triple + custom entity/edge schema	mode/cube/type-aware add
Exact snapshot	not common	not common	not common	not common
Feedback API	not common	update/delete, not feedback-first	new episode/correction evidence	native feedback
Temporal query	weak	weak	strong	partial/backend dependent

8. Implementation Layout

Suggested repo shape:

memory/
  base.py
  events.py
  scope.py
  registry.py
  formatting.py
  backends/
    honcho_adapter.py
    mem0_adapter.py
    graphiti_adapter.py
    memos_adapter.py
  conditions/
    no_memory.py
    rolling_summary.py
    full_history.py
    static_profile.py
    oracle_ipas.py
  tests/
    contract/
      test_isolation.py
      test_record_retrieve.py
      test_read_only_test_sessions.py

registry.py loads a backend by config:

memory:
  backend: graphiti
  scope:
    run_id: user_a_c1
  backends:
    graphiti:
      driver: neo4j
      group_id_template: "{run_id}:{persona_id}"
      search_mode: hybrid

Only the selected backend config is required at runtime.

Controlled condition providers live beside backend adapters because they share the Nanobot-facing prompt-context slot, not because they are memory architectures. Their config should be selected by experiment condition:

memory:
  condition: rolling_summary
  scope:
    run_id: user_a_c1
    persona_id: user_a
  conditions:
    rolling_summary:
      max_tokens: 8000
      update_policy: after_accumulation_session

For architecture runs, memory.backend selects Honcho/mem0/Graphiti/MemOS. For control/ceiling runs, memory.condition selects a controlled provider. The manifest should record both fields and enforce that exactly one runtime context provider is active.

9. Contract Tests

Required tests for every adapter:

record_event accepts a two-message user/assistant turn and returns a receipt.
retrieve under the same scope can find relevant context after backend-specific consistency/readiness wait.
Scope A data does not appear under scope B.
Read-only simulator test sessions do not write test user messages, Nanobot responses, simulator events, or judge artifacts into backend memory.
retrieve passes budget into native backend retrieval where supported, and format_context contains backend label without over-fetching-and-trimming as the primary retrieval strategy.
Reset only clears the configured benchmark scope.
Backend-native errors are normalized into adapter errors without hiding raw diagnostics.
RetrieveResult.raw preserves native provenance needed by PET/IQI: ids, scores when available, timestamps when available, and backend-specific type/provenance labels.
CS probes pass scenario/context/attribute through the backend-specific filter or semantic mapping documented for that adapter.
Test sessions run through simulator and judge behavior outcomes; they do not call reset_scope, and reset is limited to new benchmark runs and explicit reset tests.
Controlled condition providers use the same read-only test-session gate as backend adapters and cannot write test-session turns into prompt-retrievable state.

Backend-specific tests:

Honcho: user and assistant peers remain distinct; deriver pending state is visible.
mem0: infer=True creates extracted memory; filters include user_id/run_id.
Graphiti: reference_time is non-null; changed facts keep temporal metadata; custom entity/edge schema and EpisodeType.fact_triple are available only through explicit native-ingest config.
MemOS: async mode eventual retrieval works; memory type groups are preserved; CS tags written through custom_tags are retrieved through native filter when supported.

9.1 Source Anchors Checked

本设计至少对齐以下源码入口：

Backend	Checked source anchors
Honcho	`sdks/python/src/honcho/session.py`: `Session.add_messages`, `Session.context`, `Session.search`; `sdks/python/src/honcho/peer.py`: `Peer.chat`, `Peer.search`, `Peer.representation`, `Peer.context`
mem0	`mem0/memory/main.py`: `Memory.add`, `Memory.search`, `Memory.get_all`, `Memory.update`, `Memory.delete`; add/search both enforce `user_id/agent_id/run_id` scoping
Graphiti	`graphiti_core/graphiti.py`: `Graphiti.add_episode`, `add_episode_bulk`, `search`, `search_`; `add_episode` includes `reference_time`, `group_id`, custom entity/edge types, saga support
MemOS	`src/memos/api/product_models.py`: `APIADDRequest`, `APISearchRequest`, `APIFeedbackRequest`; `src/memos/multi_mem_cube/single_cube.py` and `composite_cube.py`: `add_memories`, `search_memories`, `feedback_memories`; graph DB isolation config uses `use_multi_db` plus logical `user_name` separation

这些 anchors 不是实现 API 的完整列表，只是防止设计偏离当前源码的最低依据。

9.2 Completion Checklist

User requirement	Design evidence
四个记忆架构尽可能解耦地通过 adapter 接进 Nanobot	Sections 1-5 define event-centered contract, scope mapping, registry/config flow, and per-backend adapters without shared backend internals
保证四个架构本身性能不因合成失去原设计	Each backend section has a `Performance preservation` block preserving native write/search/async/index/rerank/cube/temporal semantics
如无必要，勿增实体，不添加四个 architecture 不支持的功能	Section 4.0 restricts adapter state; Section 10 rejects universal preference entity and common exact snapshot
更换某一个 memory 架构时能清晰 follow 整个 flow	Section 4.3 gives the 12-step backend swap flow plus config shape
GAME.md §4.6/§4.7 的 controlled conditions 不误并入架构比较	Section 2.2 separates control/ceiling providers from architecture adapters and assigns Nanobot native file memory to the rolling-summary baseline
用 agent-collab 与 Claude 协作并做 2-3 轮辩证/批评	TASK-002 contains three Codex rounds plus Claude Review; section 9.4 records the review resolution. Prior Claude corrections from TASK-001 about Graphiti `fact_triple` and MemOS `use_multi_db` are also incorporated.

9.3 Collaboration Status

agent-collab task:

TASK-002: Memory Adapter design for Nanobot backends

Codex posted three review rounds:

Round 1: event-oriented adapter contract, native write paths, backend swap flow, snapshot concerns.
Round 2: source-confirmed API details and self-critique about runtime contract vs benchmark harness.
Round 3: allowed adapter state, no fifth memory architecture, expanded backend swap flow.

Claude review status:

Review requests were sent to both Claude and claude inboxes.
Claude Review was later found in .agent-collab/tasks/TASK-002.md and incorporated.
The final design includes three Codex rounds, one Claude review round with concrete blockers, and a Codex response to that review. See section 9.4.

9.4 Claude Review Resolution

Claude reviewed TASK-002 and raised six concrete issues. Resolution:

Claude critique	Resolution in this document
Token budget must flow into `retrieve`, not only `format_context`	Section 2 now makes `RetrieveRequest.budget` explicit; Honcho section passes `request.budget.max_tokens` to `session.context(tokens=...)`; mem0/Graphiti/MemOS map `budget.max_items` into native top-k/num_results
Honcho retrieve listed four paths without default	Section 5.1 now pins `session.context` as runtime default, allows representation/context as configured supplement, and moves `peer.chat` behind `SupportsNativeQuery`
Protocol sync/async unclear	Section 2 now defines async-first adapter methods and says sync backends are wrapped inside adapter
MemOS field mismatch	Rechecked source: add/search/feedback models use `session_id`; delete model has `conversation_id` alias. Section 5.4 now states this explicitly and keeps runtime mapping to `session_id`
Read-only test mode source unclear	Section 4.2 now assigns control to the benchmark harness through `adapter.set_mode(read_only=True/False)`
Graphiti `reset_scope` lacks native high-level clear API	Section 5.3 now marks Graphiti reset as backend-specific/best-effort and recommends fresh `group_id` per run when safer
Honcho smoke test should wait for deriver queue	Section 4.3 now adds a Honcho-specific queue/pending-state smoke gate

Claude also agreed with the event-centered contract, capability protocols, no universal preference entity, no common exact snapshot, allowed adapter state, and trace consistency model.

9.5 TASK-003 Evaluation-Fitness Update

TASK-003 reviewed the design specifically against MemPABench CS, PET, and IQI scoring. The resulting changes are:

Added harness-facing optional capabilities: SupportsBulkIngest, SupportsReadiness, SupportsNativeMutation, and SupportsProvenance.
Removed SupportsSnapshot from the protocol list; fresh scope plus run manifest is the benchmark default.
Added backend-native extension discovery fields in BackendHealth for MemOS memory types and Graphiti ingest/schema modes without turning them into shared schema; corrected MemOS search mapping to use APISearchRequest.filter, not a nonexistent search-side custom_tags field.
Added PET trace fields for retrieved counts, score, timestamp range, and Honcho derivation/version metadata when exposed.
Added per-backend CS mapping for scenario/context/attribute filters or semantic-only retrieval.
Fixed MemOS PET default to accumulation-only; feedback is a separate experiment group.
Clarified that test sessions are read-only simulator-facing behavioral probes, not memory inspection probes; they do not reset memory state after completion.
Added GAME.md §4.6/§4.7 condition boundary: Nanobot native file memory is a rolling-summary/control implementation option, not a fifth architecture under test.

10. Finalized Evaluation Conventions

These points are no longer open questions; they are part of the adapter design:

Nanobot receives one formatted prompt context string for runtime behavior, while MemPABench stores structured retrieval evidence separately. Every retrieve call should return both RetrieveResult.raw and formatted_context so the agent loop stays simple and the harness can still analyze PET/IQI failures.
Read-only simulator test sessions must not write test user messages, Nanobot responses, simulator events, judge artifacts, or final judged responses into backend memory. If these records are needed, store them as non-retrievable evaluation artifacts outside the backend memory namespace.
Native query is allowed only for diagnostics, not in the Nanobot agent loop. Allowed diagnostic examples include Graphiti group edge/fact counts, Honcho raw peer/session representation inspection, mem0 memory count/history checks, and MemOS cube/type inventory. Native query results must be logged as diagnostics and never injected into the Nanobot prompt.

MemPA Wiki

Explorer

Nanobot_memory_adapter_design

Nanobot Memory Adapter Design

0. 目标

1. 共同抽象：Memory Event，不是 Memory Item

2. Adapter Contract

2.1 Runtime Contract vs Benchmark Harness

2.2 Experimental Conditions vs Memory Architectures

3. Scope 和 ID 映射

4. Runtime Flow

4.0 Allowed Adapter State

4.1 Accumulation Turn

4.2 Test Turn

4.3 Backend Swap Flow

5. Backend Mappings

5.1 Honcho Adapter

5.2 mem0 Adapter

5.3 Graphiti Adapter

5.4 MemOS Adapter

6. Context Formatting

7. Capability Matrix

8. Implementation Layout

9. Contract Tests

9.1 Source Anchors Checked

9.2 Completion Checklist

9.3 Collaboration Status

9.4 Claude Review Resolution

9.5 TASK-003 Evaluation-Fitness Update

10. Finalized Evaluation Conventions

Graph View

Table of Contents