Nanobot Memory Adapter Design

0. 目标

这个 adapter 的目标不是把 Honcho、mem0、Graphiti、MemOS 改造成同一种 memory system,而是给 Nanobot 一个可替换的 memory backend 插槽:

Nanobot agent loop
  -> MemoryAdapter contract
  -> Honcho | mem0 | Graphiti | MemOS native client/API

设计约束:

  1. Nanobot core 只依赖 adapter contract,不依赖任何 backend SDK 的对象模型。
  2. Adapter 只做 ID 映射、请求构造、结果格式化、生命周期控制,不重写 backend 的抽取、检索、推理、调度或图更新算法。
  3. 公共接口以 Nanobot 运行时事件为中心,而不是以统一 MemoryItem 为中心。
  4. 四个 backend 支持什么,adapter 才暴露什么;不为了接口整齐而虚构 backend 没有的能力。

1. 共同抽象:Memory Event,不是 Memory Item

四个架构的原生单位不同:

Backend原生写入单位原生检索/上下文单位
Honchopeer-labeled session messagesession context, peer context, representation, peer chat
mem0messages -> extracted memory itemshybrid ranked memory results
Graphitiepisode with reference_timetemporal facts/entities/edges with provenance
MemOSAPI request into one or more MemCubesgrouped text/pref/tool/skill memories

因此公共层只定义 Nanobot 事件:

MemoryEvent:
    event_id: str
    run_id: str
    user_id: str
    agent_id: str
    session_id: str
    turn_id: str
    timestamp: datetime
    messages: list[{role, content, name?, tool_call_id?}]
    scenario: str | None
    context: "work" | "personal" | str | None
    attribute: str | None
    metadata: dict

Adapter 必须保留这些字段作为可追踪 metadata,但不要求每个 backend 都以同样方式存储它们。

2. Adapter Contract

最低公共接口建议采用 async-first。原因是 Graphiti 的核心写入是 async def add_episode,Honcho SDK 有 async client,MemOS 支持 async scheduler;把 contract 做成 async 可以避免 Nanobot 调用方在不同 backend 间切换时还要知道何时 await。同步 SDK 后端由 adapter 内部用 sync client 或 asyncio.to_thread 包装。

class MemoryAdapter(Protocol):
    backend_name: str
 
    async def setup_scope(self, scope: MemoryScope) -> None: ...
    async def set_mode(self, mode: AdapterMode) -> ModeReceipt: ...
    async def record_event(self, event: MemoryEvent) -> WriteReceipt: ...
    async def retrieve(self, request: RetrieveRequest) -> RetrieveResult: ...
    def format_context(self, result: RetrieveResult) -> str: ...
    async def reset_scope(self, scope: MemoryScope) -> ResetReceipt: ...
    async def health(self) -> BackendHealth: ...

setup_scope 的含义是“创建或检查 backend 原生隔离资源”,不是创建新的 memory 实体。例如 Honcho 中是 workspace/peer/session 的存在性检查,mem0 中是确认 filters/collection namespace 可用,Graphiti 中是确认 group_id 和 driver 可用,MemOS 中是确认 cube registry/readable/writable cube 配置可用。

建议扩展接口按 capability 暴露。这里的 capability 服务 benchmark harness 和 backend-native ablation,不进入 Nanobot 主循环的最低合同:

class SupportsFeedback(Protocol):
    async def apply_feedback(self, feedback: MemoryFeedback) -> FeedbackReceipt: ...
 
class SupportsBulkIngest(Protocol):
    async def record_events(self, events: list[MemoryEvent]) -> list[WriteReceipt]: ...
 
class SupportsReadiness(Protocol):
    async def readiness(self, scope: MemoryScope) -> BackendReadiness: ...
    async def wait_until_ready(self, scope: MemoryScope, timeout_s: float) -> BackendReadiness: ...
 
class SupportsNativeMutation(Protocol):
    async def native_mutation(self, request: NativeMutationRequest) -> NativeMutationReceipt: ...
 
class SupportsProvenance(Protocol):
    async def provenance(self, request: ProvenanceRequest) -> ProvenanceResult: ...
 
class SupportsNativeQuery(Protocol):
    async def native_query(self, request: NativeQueryRequest) -> NativeQueryResult: ...

Backend-native extension metadata belongs in capability discovery, not in separate property-only protocols:

BackendHealth:
    status: "ok" | "degraded" | "unavailable"
    backend_name: str
    latency_ms: float
    consistency_model: "immediate" | "eventual" | "async_derived"
    native_memory_types: list[str] | None
    native_ingest_modes: list[str] | None
    warnings: list[str]

native_memory_types and native_ingest_modes are declarations for the harness and run manifest. They are not shared behavior contracts. If Graphiti uses fact_triple or custom entity/edge types, that behavior is selected through Graphiti adapter config, not through a universal Nanobot method.

关键点:

  • SupportsBulkIngest is for benchmark preload speed. It must still call native write APIs such as Honcho add_messages, Graphiti add_episode_bulk, mem0 multi-message add, or MemOS add paths; it must not do adapter-side fact extraction.
  • SupportsReadiness is for fair measurement after async ingestion/derivation. Honcho may expose deriver/dreamer pending state, Graphiti may expose queue/awaited ingestion state, MemOS may expose scheduler state, and mem0 may report immediate committed state.
  • SupportsNativeMutation is for reset/debug/explicit correction experiments. It is not used in the Nanobot main loop unless the scenario explicitly includes correction feedback.
  • SupportsProvenance is for historical provenance queries independent of a specific retrieve call, such as Graphiti facts invalidated in a time range or the source episode for a preference. Per-retrieve provenance belongs in RetrieveResult.raw and Trace.
  • BackendHealth.native_memory_types records MemOS native memory grouping. The adapter may summarize this as text/preference/tool/skill, but traces should keep exact native labels such as text_mem, PreferenceMemory, ToolSchemaMemory, ToolTrajectoryMemory, or SkillMemory when exposed. Other backends must not fake these types.
  • BackendHealth.native_ingest_modes records Graphiti custom entity/edge support and EpisodeType.fact_triple availability. The default Nanobot path remains message episodes; fact triples are selected only by explicit Graphiti adapter config for pre-extracted or ablation experiments.
  • snapshot/restore is intentionally not a protocol for this benchmark. Use fresh scopes plus run manifests; exact checkpointing is backend-specific storage work.
  • native_query 只用于实验调试和 backend 专有评估,不进入 Nanobot 主循环。
  • RetrieveRequest must include ContextBudget. Some backends, especially Honcho, need the token budget during native retrieval (session.context(tokens=...)), not only during final string formatting.
  • format_context 属于 adapter,因为不同 backend 的结果结构不同;Nanobot 只接收最终 prompt context。它只能做最后的 prompt shaping,不能替代 backend 的原生 top-k/token-budget retrieval。
RetrieveRequest:
    query: str
    scope: MemoryScope
    session_id: str
    budget: ContextBudget
    scenario: str | None
    context: str | None
    attribute: str | None
    filters: dict

2.1 Runtime Contract vs Benchmark Harness

需要分清两层:

负责内容不应负责
Nanobot runtime adapter写入 accumulation 事件、在 accumulation/test 中检索上下文、格式化 prompt、scope reset、healthsimulator、judge、指标计算、跨 backend 对齐分析
MemPABench harnesssimulator/test-session orchestration、run manifest、latency/token/retrieval trace、isolation smoke test、read-only test-session 控制、outcome judgingbackend 内部抽取/检索算法

这样做的原因是:Nanobot 需要一个稳定 memory 插槽;MemPABench 还需要评测可复现性和可观测性。两者共享 adapter,但不把评测逻辑塞进 agent loop。

建议每次 record_eventretrieve 都返回结构化 trace:

Trace:
    backend_name: str
    scope: MemoryScope
    native_operation: str
    native_ids: list[str]
    latency_ms: float
    consistency: "committed" | "queued" | "pending"
    token_count: int | None
    retrieved_count: int | None
    top_score: float | None
    oldest_retrieved_at: datetime | None
    newest_retrieved_at: datetime | None
    representation_version: str | None
    last_derived_at: datetime | None
    warnings: list[str]

Trace 供 benchmark 记录,不进入 Nanobot prompt,除非显式开启 debug。retrieved_counttop_scoreoldest_retrieved_atnewest_retrieved_atRetrieveResult.raw 聚合而来;没有原生 score/timestamp 的 backend 填 null,不能编造。Honcho 如果暴露 representation/context 的 version 或 updated time,应填 representation_version / last_derived_at;否则 PET observability 标记为 weak。Trace 是行为结果的解释材料,不是主评分对象;CS/PET/IQI 主评分看 simulator 中 Nanobot interaction strategy / response / action 是否符合当前 user setting。

2.2 Experimental Conditions vs Memory Architectures

GAME.md §4.6 includes controlled conditions that are not memory architectures:

ConditionAdapter treatmentArchitecture comparison?
No memory, anonymized/namedretrieve returns empty context; record_event only writes non-retrievable run traces if neededNo
Rolling summarycontrolled summary provider; may reuse Nanobot’s file-memory/consolidation implementation, but it is labeled as a baselineNo
Full-history long-contextpacks prior accumulation transcripts into the prompt budget in chronological orderNo
Simple RAGmem0 adapter with native infer=False configAblation of mem0 extraction, not a new architecture
Static profile promptfixed persona/profile text provider, no accumulation writesCeiling/control
Oracle IPaS matrixfixed structured GT preference provider, no accumulation writesCeiling/control
Honcho/mem0/Graphiti/MemOSbackend adapters preserving native write/read semanticsYes

Nanobot’s current native memory should not be presented as a fifth memory architecture. Source check in nanobot-ref shows it is a file-based runtime mechanism:

  1. ContextBuilder.build_system_prompt() injects MemoryStore.get_memory_context() into the system prompt.
  2. MemoryStore stores memory/MEMORY.md as full long-term markdown and memory/HISTORY.md as a grep-searchable log.
  3. Session.get_history() keeps unconsolidated JSONL messages in context until last_consolidated.
  4. MemoryConsolidator.maybe_consolidate_by_tokens() only summarizes when the prompt approaches the token budget, then asks an LLM to rewrite the whole markdown memory.

This is closest to the rolling-summary baseline, not to Honcho/mem0/Graphiti/MemOS. If implemented, call it rolling_summary or nanobot_file_summary as a controlled baseline condition. Do not include it in architecture-diversity claims, and do not let its native session-history behavior accidentally give it extra context that the real backends do not receive.

Fairness rule: every condition still enters Nanobot through the same prompt-context slot and the same simulator/test harness. Controlled providers can satisfy the same minimal methods (record_event, retrieve, format_context, set_mode) so the harness has no per-condition special cases, but traces must label them as condition_kind=control rather than condition_kind=architecture.

3. Scope 和 ID 映射

公共 scope:

MemoryScope:
    run_id: str
    persona_id: str
    backend_namespace: str | None
    agent_id: str | None

映射原则:

Nanobot/MemPAHonchomem0GraphitiMemOS
runworkspace or workspace namespacerun_id + collection/namespacegroup_id prefixcube namespace
persona/useruser peeruser_idgroup_id suffix / metadatauser_id, cube id
assistantassistant peeroptional agent_id or default Nanobot actorsource metadataoptional agent_id / tags
sessionsessionrun_id or metadataepisode name/sourcesession_id
turn timestampmessage created_atmetadata timestampreference_timerequest metadata
scenario/context/attributemetadata / message metadatafilters metadatasource_description / metadatacustom_tags/info

MemOS graph DB isolation needs one extra implementation choice: when use_multi_db=True, each user gets physical DB isolation; when use_multi_db=False, users share a DB and are separated by user_name/user tag logic. The adapter should surface this as native MemOS config, not as a universal isolation feature.

Isolation invariant:

No retrieve call for scope A may return data from scope B unless the backend config explicitly enables sharing.

4. Runtime Flow

4.0 Allowed Adapter State

Adapter 可以保存的状态只限于运行 glue:

AllowedNot allowed
backend config hashnormalized universal preference objects
scope-to-native-id mappingcross-backend memory schema
consistency/readiness markersadapter-owned embeddings or reranker scores
benchmark trace idsadapter-generated summaries replacing backend summaries

换句话说,adapter 可以记住“Nanobot 的 user_a/run_001 对应 Honcho peer X、Graphiti group Y、MemOS cube Z”,但不维护一份独立的偏好数据库。

4.1 Accumulation Turn

Nanobot receives/sends turn
  -> build MemoryEvent
  -> adapter.record_event(event)
  -> backend native write path
  -> WriteReceipt(status=committed|queued|pending, native_ids, latency)

Adapters may return queued or pending; Nanobot must not assume all memory systems are immediately consistent.

4.2 Test Turn

Nanobot must use the latest memory state available at that point in the timeline to choose an interaction strategy, act and interact with the simulator and be judged by the resulting interaction outcome.

Benchmark reaches a test insertion point, e.g. before accumulation session 15
  -> wait for adapter readiness for all prior accumulation writes
  -> adapter.set_mode(read_only=True, reason="test_session")
  -> simulator starts a normal test scenario with current user setting
  -> Nanobot builds RetrieveRequest(query, scope, budget, scenario/context/attribute)
  -> adapter.retrieve(...)
  -> adapter.format_context(result)
  -> Nanobot chooses interaction strategy / response / action
  -> simulator produces user-side response and outcome
  -> judge scores Nanobot behavior against the current user setting
  -> adapter.set_mode(read_only=False, reason="accumulation")
  -> continue the accumulation timeline

Read-only test sessions are controlled by the benchmark harness, not by ad hoc per-call flags. Before entering a test session, the harness calls:

await adapter.set_mode(AdapterMode(read_only=True, reason="test_session"))

While read_only=True, record_event must either no-op with a skipped_read_only receipt or store only non-retrievable audit traces outside the backend memory namespace. This applies to the test user messages, Nanobot responses, simulator events, and judge artifacts. They are evaluation records, not new long-term memory evidence. After the test session, the harness switches back:

await adapter.set_mode(AdapterMode(read_only=False, reason="accumulation"))

This keeps behavioral probes from contaminating later memory and avoids each Nanobot call site inventing its own test flag. RetrieveResult.raw, Trace, and native diagnostics are only used for debugging, attribution, and failure analysis; they are not the scoring target.

After the 36 test sessions complete, the adapter scope is not reset. Test results are independent, but memory state should remain at the final accumulated state for post-run analysis. reset_scope is called only at the start of a new benchmark run or during explicit smoke/reset tests.

4.3 Backend Swap Flow

To switch from one backend to another:

  1. Freeze the run identity: run_id, persona_id, and optional agent_id. Keep seed/replicate metadata in the harness run manifest, not in the adapter scope unless isolation requires it.
  2. Change only memory.backend in Nanobot/MemPA config.
  3. Provide only that backend’s native config block.
  4. Instantiate through MemoryAdapterRegistry.
  5. Run adapter.health().
  6. Run setup_scope(scope).
  7. Run capability discovery: supports_feedback, supports_bulk_ingest, supports_readiness, supports_native_mutation, supports_provenance, backend-native memory/ingest modes, consistency_model, native_query_modes.
  8. Run an isolation smoke test: write one synthetic event under scope A, assert scope B cannot retrieve it.
  9. Run a write/read smoke test using backend-native semantics.
  10. Honcho-specific smoke gate: after writing one message, poll queue_status or equivalent pending-state endpoint until deriver work is empty or timeout; then test retrieve. If it times out, mark retrieval consistency as pending rather than passing a false green smoke test.
  11. Start accumulation/test loop.
  12. Store run manifest: backend name, native config hash, capability flags, scope mapping, harness seed/replicate metadata.

No Nanobot agent-loop code should change during this flow.

Example config shape:

memory:
  backend: mem0
  scope:
    run_id: user_a_c1
    persona_id: user_a
    agent_id: null
  read_only_test_sessions: true
  backends:
    mem0:
      top_k: 12
      threshold: 0.1
      rerank: true
    graphiti:
      group_id_template: "{run_id}:{persona_id}"
      search_config: combined_hybrid
    honcho:
      workspace_template: "{run_id}"
      context_tokens: 8000
    memos:
      writable_cube_template: "{run_id}:{persona_id}"
      mode: mixture
      async_mode: async

5. Backend Mappings

5.1 Honcho Adapter

Native model:

Workspace
  -> Peer(user), Peer(assistant)
  -> Session
  -> peer-labeled Messages
  -> background deriver/dreamer
  -> context, representation, peer card, peer chat

Write path:

record_event
  -> resolve workspace_id
  -> resolve user peer and assistant peer
  -> resolve session_id
  -> session.add_messages([peer.message(...)])

Read path options:

  1. Runtime default: session.context(summary=True, tokens=request.budget.max_tokens, peer_target=user_peer, search_query=request.query, search_top_k=...).
  2. Supplemental user model context: session.representation(user_peer) or peer.context(...), only if configured and included within request.budget.
  3. Diagnostic/native query only: peer.chat(query) belongs behind SupportsNativeQuery because it triggers another LLM call and changes benchmark comparability.
  4. Search fallback: peer.search / session.search, only if session.context is unavailable or configured off.

CS mapping: scenario/context/attribute should be passed as message/session metadata where Honcho supports it and included in search_query or context request only as native parameters allow. If Honcho does not expose field-level metadata filters for context, the adapter must record that CS filtering is semantic/contextual rather than hard-filtered.

PET observability: Honcho PET is weak unless representation/context exposes updated_at, last_derived_at, or an equivalent version. The adapter must trace any exposed derivation timestamp/version and must not invent fact-level timestamps inside prose representation.

For benchmark reproducibility, the adapter config must pin exactly one runtime read strategy. The recommended default is session.context; peer.chat must not be used in the agent loop unless the experiment explicitly evaluates Honcho’s native chat endpoint.

Performance preservation:

  • Keep deriver/dreamer asynchronous; do not force synchronous representation refresh per turn.
  • Keep peers separate. User and assistant must not be collapsed into one actor.
  • Use Honcho SDK/REST, not direct DB writes.
  • Use token budget controls rather than dumping all messages.
  • Accept eventual consistency and expose pending queue status where available.
  • Pass request.budget.max_tokens into Honcho retrieval, so Honcho performs native context selection instead of adapter-side over-fetching.

Limits:

  • Exact snapshot/restore is not a portable Honcho capability.
  • PET historical query is weak unless Honcho exposes time-scoped reads; adapter should not fake temporal state.
  • Fresh writes may not immediately appear in representation/context until background derivation catches up; smoke tests and run traces must record this consistency state.

5.2 mem0 Adapter

Native model:

messages
  -> Memory.add(infer=True)
  -> LLM extraction
  -> memory item + entity linking + vector/BM25 indexes
  -> Memory.search(...)

Write path:

record_event
  -> Memory.add(
       messages=event.messages,
       user_id=scope.persona_id,
       agent_id=scope.agent_id,  # optional; omit if null
       run_id=scope.run_id,
       metadata={scenario, context, attribute, timestamp, ...},
       infer=True
     )

Read path:

retrieve
  -> Memory.search(
       query=request.query,
       filters={user_id, agent_id?, run_id, scenario, context, attribute},
       top_k=request.budget.max_items,
       threshold=...,
       rerank=...
     )

CS mapping: user_id and run_id are mandatory isolation filters; scenario, context, and attribute are mandatory filters for CS probes when present in the request. If a CS probe intentionally tests cross-context recall, the harness must mark that in request.filters rather than silently omitting filters.

PET observability: search results must preserve native memory id, score, and created_at/updated_at from result metadata when available. format_context may omit some details for prompt compactness, but RetrieveResult.raw and trace must keep them for PET analysis.

Performance preservation:

  • Never upsert directly into vector store.
  • Never replace Memory.search with get_all plus Python filtering.
  • Preserve hybrid search, entity boosting, rerank, threshold, and metadata filters as backend config.
  • Keep user_id/agent_id/run_id isolation.

Limits:

  • mem0’s natural unit is extracted fact memory; it is less native for full session representation than Honcho.
  • Exact restore depends on whether we can replay memories with infer=False or clone the backing store.

5.3 Graphiti Adapter

Native model:

episode(reference_time, group_id)
  -> entities + facts
  -> temporal edges with valid_at/invalid_at
  -> hybrid semantic/keyword/graph search

Write path:

record_event
  -> Graphiti.add_episode(
       name=f"{session_id}:{turn_id}",
       episode_body=serialized messages,
       source_description=scenario/context/attribute/session metadata,
       reference_time=event.timestamp,
       source=EpisodeType.message,
       group_id=f"{run_id}:{persona_id}"
     )

EpisodeType.fact_triple is available for pre-extracted fact triples, but the default Nanobot path should use EpisodeType.message unless an experiment explicitly wants to bypass Graphiti’s extraction from conversational evidence. Custom entity/edge types and fact-triple ingestion should be declared in BackendHealth.native_ingest_modes and selected through Graphiti adapter config; they should be available for explicit graph-schema ablations, not modeled as a shared adapter schema.

Read path:

retrieve
  -> Graphiti.search(query, group_ids=[...], num_results=request.budget.max_items)
  or Graphiti.search_(query, config=SearchConfig(...))

CS mapping: group_id is the hard isolation boundary. Scenario/context/attribute should be included in episode source_description and in the semantic query text for CS probes unless Graphiti search config exposes a native metadata filter in the selected implementation. The adapter must record whether CS filtering was hard-filtered or semantic-only.

PET observability: preserve fact/edge valid time, invalid time, provenance episode id, and relevance score in RetrieveResult.raw. This is Graphiti’s main advantage for PET and should not be flattened away before scoring.

Performance preservation:

  • Always pass timezone-stable reference_time.
  • Do not express preference evolution as destructive delete; let Graphiti invalidate old facts.
  • Do not directly write graph nodes/edges.
  • Use group_id partitioning.
  • Use search/search_ and Graphiti’s search config/reranker; do not dump the graph and rerank in adapter code.
  • For bulk benchmark preload, use add_episode_bulk or queueing.

Limits:

  • Graphiti has no built-in user/session management; Nanobot must own scope mapping.
  • Formatter must keep fact, time, and provenance visible enough for PET analysis.
  • reset_scope for Graphiti is backend-specific/best-effort. There is no single high-level delete_all(group_id) API; implementation may need driver-level Cypher/graph operations to delete a group partition, or use a fresh group_id per run as the safer reset strategy.

5.4 MemOS Adapter

Native model:

User / project
  -> one or more MemCubes
  -> text / preference / tool / skill memories
  -> scheduler, feedback, graph/vector/fulltext stores

Write path:

record_event
  -> APIADDRequest(
       user_id=...,
       session_id=event.session_id,
       messages=...,
       writable_cube_ids=...,
       custom_tags=[scenario, context, attribute],
       info=metadata,
       async_mode=config.async_mode,
       mode=config.mode
     )
  -> add_memories(...)

Read path:

retrieve
  -> APISearchRequest(
       user_id=...,
       query=...,
       session_id=request.session_id,
       readable_cube_ids=readable cubes,
       filter={tags/scenario/context/attribute according to MemOS filter syntax},
       top_k=request.budget.max_items,
       mode=fast|fine|mixture,
       include_preference=True,
       search_tool_memory=...
     )
  -> search_memories(...)

CS mapping: scenario/context/attribute are written through APIADDRequest.custom_tags. Current APISearchRequest does not expose a custom_tags field; retrieve must express those tags through the native filter field or the selected MemOS search filter syntax. If the selected MemOS path cannot hard-filter on tags, the adapter must record CS filtering as soft/semantic rather than pretending it was hard-filtered. Cube routing remains native MemOS config and should not be replaced by adapter-side filtering.

PET default: benchmark PET should use accumulation-only memory evolution through add_memories. apply_feedback is reserved for a separately labeled feedback-enabled experiment group, because it changes the intervention being evaluated.

Typed memory exposure: MemOS memory groups should be preserved using native labels, including cube-level memory groups such as text_mem and result metadata types such as LongTermMemory, UserMemory, PreferenceMemory, ToolSchemaMemory, ToolTrajectoryMemory, and SkillMemory when exposed. text/preference/tool/skill is a conceptual grouping for the adapter doc, not a new enum. Other backends should not be asked to emulate these memory types.

Source check: current APIADDRequest, APISearchRequest, and APIFeedbackRequest use session_id for add/search/feedback. conversation_id appears as a backward-compatible alias on delete-related request models, so the adapter should map Nanobot session_id to MemOS session_id for the main runtime path and only use conversation_id for APIs that explicitly require it.

Feedback path:

apply_feedback
  -> APIFeedbackRequest(...)
  -> feedback_memories(...)

Performance preservation:

  • Preserve async_mode; do not force all writes to synchronous mode.
  • Preserve mode=fast/fine/mixture, dedup, relativity, and top_k as config.
  • Preserve memory type separation in formatted context.
  • Preserve cube routing; do not collapse all runs into one flat user namespace.
  • Tool calls should remain tool-memory-compatible records, not plain text.

Limits:

  • MemOS exposes more native entities than the common adapter. The common interface should not force other backends to mimic cube/type/feedback semantics.
  • Snapshot/restore depends on cube registry and physical stores.

6. Context Formatting

format_context must output a stable section for the agent prompt, but backend-specific contents can differ:

<memory-context backend="graphiti" scope="...">
...
</memory-context>

Formatting rules:

  1. Keep source labels: backend, score if available, timestamp if available, native id if available.
  2. Keep PET-relevant temporal fields for Graphiti and timestamped sources for others.
  3. Keep MemOS memory type groups separate.
  4. Keep Honcho representation/context as prose, not forced into fake scored bullets.
  5. Enforce token budget in the formatter, not by disabling backend retrieval algorithms.
  6. For cross-backend CS/PET/IQI evaluation, the harness should pass a shared target token range through ContextBudget (min_tokens/max_tokens or equivalent). The formatter should fit that range when enough native evidence is available; if a backend returns less evidence, trace the shortfall rather than padding with invented content.

7. Capability Matrix

CapabilityHonchomem0GraphitiMemOS
Write interactionsession messagesMemory.addadd_episodeadd_memories
Search/retrievecontext/search/chatMemory.searchsearch/search_search_memories
User isolationworkspace/peer/sessionuser/agent/run filtersgroup_iduser/cube
Native evolutionderiver/dreamer derived stateadditive extracted facts/entity linkstemporal invalidationfeedback/cube memory evolution
Async pathderiver/dreamermostly sync API, backend dependentasync graph ingestion APIscheduler async mode
Bulk ingestadd_messagesmulti-message addadd_episode_bulk / queuebatched add path if exposed
Readinessderiver/dreamer pendingusually immediate/backend dependentasync ingestion/queuescheduler async status
Native mutationobject lifecycle if exposedupdate/delete/history/delete_alldelete episode/edge/admin clear where availablefeedback/delete/cube lifecycle
Provenancemessage/session/representation metadatamemory id/score/timestampsfact/edge/time/provenancecube/type/source/relativity
Native memory typesnonenonenonecube memory groups plus memory_type labels: text/preference/tool/skill and exact native labels when exposed
Native ingest modesmessage/sessionmessages/infer configmessage/json/text/fact_triple + custom entity/edge schemamode/cube/type-aware add
Exact snapshotnot commonnot commonnot commonnot common
Feedback APInot commonupdate/delete, not feedback-firstnew episode/correction evidencenative feedback
Temporal queryweakweakstrongpartial/backend dependent

8. Implementation Layout

Suggested repo shape:

memory/
  base.py
  events.py
  scope.py
  registry.py
  formatting.py
  backends/
    honcho_adapter.py
    mem0_adapter.py
    graphiti_adapter.py
    memos_adapter.py
  conditions/
    no_memory.py
    rolling_summary.py
    full_history.py
    static_profile.py
    oracle_ipas.py
  tests/
    contract/
      test_isolation.py
      test_record_retrieve.py
      test_read_only_test_sessions.py

registry.py loads a backend by config:

memory:
  backend: graphiti
  scope:
    run_id: user_a_c1
  backends:
    graphiti:
      driver: neo4j
      group_id_template: "{run_id}:{persona_id}"
      search_mode: hybrid

Only the selected backend config is required at runtime.

Controlled condition providers live beside backend adapters because they share the Nanobot-facing prompt-context slot, not because they are memory architectures. Their config should be selected by experiment condition:

memory:
  condition: rolling_summary
  scope:
    run_id: user_a_c1
    persona_id: user_a
  conditions:
    rolling_summary:
      max_tokens: 8000
      update_policy: after_accumulation_session

For architecture runs, memory.backend selects Honcho/mem0/Graphiti/MemOS. For control/ceiling runs, memory.condition selects a controlled provider. The manifest should record both fields and enforce that exactly one runtime context provider is active.

9. Contract Tests

Required tests for every adapter:

  1. record_event accepts a two-message user/assistant turn and returns a receipt.
  2. retrieve under the same scope can find relevant context after backend-specific consistency/readiness wait.
  3. Scope A data does not appear under scope B.
  4. Read-only simulator test sessions do not write test user messages, Nanobot responses, simulator events, or judge artifacts into backend memory.
  5. retrieve passes budget into native backend retrieval where supported, and format_context contains backend label without over-fetching-and-trimming as the primary retrieval strategy.
  6. Reset only clears the configured benchmark scope.
  7. Backend-native errors are normalized into adapter errors without hiding raw diagnostics.
  8. RetrieveResult.raw preserves native provenance needed by PET/IQI: ids, scores when available, timestamps when available, and backend-specific type/provenance labels.
  9. CS probes pass scenario/context/attribute through the backend-specific filter or semantic mapping documented for that adapter.
  10. Test sessions run through simulator and judge behavior outcomes; they do not call reset_scope, and reset is limited to new benchmark runs and explicit reset tests.
  11. Controlled condition providers use the same read-only test-session gate as backend adapters and cannot write test-session turns into prompt-retrievable state.

Backend-specific tests:

  • Honcho: user and assistant peers remain distinct; deriver pending state is visible.
  • mem0: infer=True creates extracted memory; filters include user_id/run_id.
  • Graphiti: reference_time is non-null; changed facts keep temporal metadata; custom entity/edge schema and EpisodeType.fact_triple are available only through explicit native-ingest config.
  • MemOS: async mode eventual retrieval works; memory type groups are preserved; CS tags written through custom_tags are retrieved through native filter when supported.

9.1 Source Anchors Checked

本设计至少对齐以下源码入口:

BackendChecked source anchors
Honchosdks/python/src/honcho/session.py: Session.add_messages, Session.context, Session.search; sdks/python/src/honcho/peer.py: Peer.chat, Peer.search, Peer.representation, Peer.context
mem0mem0/memory/main.py: Memory.add, Memory.search, Memory.get_all, Memory.update, Memory.delete; add/search both enforce user_id/agent_id/run_id scoping
Graphitigraphiti_core/graphiti.py: Graphiti.add_episode, add_episode_bulk, search, search_; add_episode includes reference_time, group_id, custom entity/edge types, saga support
MemOSsrc/memos/api/product_models.py: APIADDRequest, APISearchRequest, APIFeedbackRequest; src/memos/multi_mem_cube/single_cube.py and composite_cube.py: add_memories, search_memories, feedback_memories; graph DB isolation config uses use_multi_db plus logical user_name separation

这些 anchors 不是实现 API 的完整列表,只是防止设计偏离当前源码的最低依据。

9.2 Completion Checklist

User requirementDesign evidence
四个记忆架构尽可能解耦地通过 adapter 接进 NanobotSections 1-5 define event-centered contract, scope mapping, registry/config flow, and per-backend adapters without shared backend internals
保证四个架构本身性能不因合成失去原设计Each backend section has a Performance preservation block preserving native write/search/async/index/rerank/cube/temporal semantics
如无必要,勿增实体,不添加四个 architecture 不支持的功能Section 4.0 restricts adapter state; Section 10 rejects universal preference entity and common exact snapshot
更换某一个 memory 架构时能清晰 follow 整个 flowSection 4.3 gives the 12-step backend swap flow plus config shape
GAME.md §4.6/§4.7 的 controlled conditions 不误并入架构比较Section 2.2 separates control/ceiling providers from architecture adapters and assigns Nanobot native file memory to the rolling-summary baseline
用 agent-collab 与 Claude 协作并做 2-3 轮辩证/批评TASK-002 contains three Codex rounds plus Claude Review; section 9.4 records the review resolution. Prior Claude corrections from TASK-001 about Graphiti fact_triple and MemOS use_multi_db are also incorporated.

9.3 Collaboration Status

agent-collab task:

TASK-002: Memory Adapter design for Nanobot backends

Codex posted three review rounds:

  1. Round 1: event-oriented adapter contract, native write paths, backend swap flow, snapshot concerns.
  2. Round 2: source-confirmed API details and self-critique about runtime contract vs benchmark harness.
  3. Round 3: allowed adapter state, no fifth memory architecture, expanded backend swap flow.

Claude review status:

  • Review requests were sent to both Claude and claude inboxes.
  • Claude Review was later found in .agent-collab/tasks/TASK-002.md and incorporated.
  • The final design includes three Codex rounds, one Claude review round with concrete blockers, and a Codex response to that review. See section 9.4.

9.4 Claude Review Resolution

Claude reviewed TASK-002 and raised six concrete issues. Resolution:

Claude critiqueResolution in this document
Token budget must flow into retrieve, not only format_contextSection 2 now makes RetrieveRequest.budget explicit; Honcho section passes request.budget.max_tokens to session.context(tokens=...); mem0/Graphiti/MemOS map budget.max_items into native top-k/num_results
Honcho retrieve listed four paths without defaultSection 5.1 now pins session.context as runtime default, allows representation/context as configured supplement, and moves peer.chat behind SupportsNativeQuery
Protocol sync/async unclearSection 2 now defines async-first adapter methods and says sync backends are wrapped inside adapter
MemOS field mismatchRechecked source: add/search/feedback models use session_id; delete model has conversation_id alias. Section 5.4 now states this explicitly and keeps runtime mapping to session_id
Read-only test mode source unclearSection 4.2 now assigns control to the benchmark harness through adapter.set_mode(read_only=True/False)
Graphiti reset_scope lacks native high-level clear APISection 5.3 now marks Graphiti reset as backend-specific/best-effort and recommends fresh group_id per run when safer
Honcho smoke test should wait for deriver queueSection 4.3 now adds a Honcho-specific queue/pending-state smoke gate

Claude also agreed with the event-centered contract, capability protocols, no universal preference entity, no common exact snapshot, allowed adapter state, and trace consistency model.

9.5 TASK-003 Evaluation-Fitness Update

TASK-003 reviewed the design specifically against MemPABench CS, PET, and IQI scoring. The resulting changes are:

  • Added harness-facing optional capabilities: SupportsBulkIngest, SupportsReadiness, SupportsNativeMutation, and SupportsProvenance.
  • Removed SupportsSnapshot from the protocol list; fresh scope plus run manifest is the benchmark default.
  • Added backend-native extension discovery fields in BackendHealth for MemOS memory types and Graphiti ingest/schema modes without turning them into shared schema; corrected MemOS search mapping to use APISearchRequest.filter, not a nonexistent search-side custom_tags field.
  • Added PET trace fields for retrieved counts, score, timestamp range, and Honcho derivation/version metadata when exposed.
  • Added per-backend CS mapping for scenario/context/attribute filters or semantic-only retrieval.
  • Fixed MemOS PET default to accumulation-only; feedback is a separate experiment group.
  • Clarified that test sessions are read-only simulator-facing behavioral probes, not memory inspection probes; they do not reset memory state after completion.
  • Added GAME.md §4.6/§4.7 condition boundary: Nanobot native file memory is a rolling-summary/control implementation option, not a fifth architecture under test.

10. Finalized Evaluation Conventions

These points are no longer open questions; they are part of the adapter design:

  1. Nanobot receives one formatted prompt context string for runtime behavior, while MemPABench stores structured retrieval evidence separately. Every retrieve call should return both RetrieveResult.raw and formatted_context so the agent loop stays simple and the harness can still analyze PET/IQI failures.
  2. Read-only simulator test sessions must not write test user messages, Nanobot responses, simulator events, judge artifacts, or final judged responses into backend memory. If these records are needed, store them as non-retrievable evaluation artifacts outside the backend memory namespace.
  3. Native query is allowed only for diagnostics, not in the Nanobot agent loop. Allowed diagnostic examples include Graphiti group edge/fact counts, Honcho raw peer/session representation inspection, mem0 memory count/history checks, and MemOS cube/type inventory. Native query results must be logged as diagnostics and never injected into the Nanobot prompt.