User A — World Design
Per-persona document. Each persona in MemPABench has its own World Design. This file covers User A only. When a new persona is added, a separate World Design must be created from scratch — do not reuse this file’s story content, characters, or fixture structure.
Fixture status. Fixture entries in this document describe what should exist to support the storylines. They are not a record of what currently exists in the state server. Fixture existence is validated during the outline review round, not during generation.
2026-05-17 revision: memory_privacy deprecated.
memory_privacy is no longer a measurable interaction preference or evolving arc. User A’s former P_F arc (personal_memory_privacy, Domain-scoped -> Minimal+transparent) is inactive, and its generated scripts were deleted: acc_028, acc_052, acc_072, evolv_06, acc_098, acc_106. User A’s stable work_memory_privacy scripts were also deleted: acc_037, acc_069, acc_107. PA memory/history can still exist as ordinary benchmark state, but privacy/memory scope is no longer an active target cell.
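One way to enforce this revision during review is a small lint check. This is a minimal sketch, assuming scripts are identified by their filename stem (e.g. acc_028) and declare the cell they measure under a hypothetical target_cell key; the function name is illustrative, not existing benchmark tooling. The deleted IDs are copied from the note above.

```python
# Script IDs deleted in the 2026-05-17 revision (copied from the note above).
DELETED_SCRIPT_IDS = frozenset({
    # Former P_F arc (personal_memory_privacy)
    "acc_028", "acc_052", "acc_072", "evolv_06", "acc_098", "acc_106",
    # Stable work_memory_privacy scripts
    "acc_037", "acc_069", "acc_107",
})


def lint_script(script_id: str, script: dict) -> list[str]:
    """Return problems for one script; an empty list means it passes.

    `target_cell` is a hypothetical key standing in for wherever a
    script declares the preference cell it measures.
    """
    problems = []
    if script_id in DELETED_SCRIPT_IDS:
        problems.append(f"{script_id}: deleted in the 2026-05-17 revision")
    cell = script.get("target_cell", "")
    # Both personal_memory_privacy and work_memory_privacy are inactive.
    if "memory_privacy" in cell:
        problems.append(f"{script_id}: targets deprecated cell {cell!r}")
    return problems
```

For example, lint_script("acc_028", {"target_cell": "personal_memory_privacy"}) reports two problems (deleted ID and deprecated cell), while a script targeting solution_breadth passes cleanly.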
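Design Rationale
The original MemPABench timeline still keeps the time skeleton of acc 1–108. After memory_privacy was removed, User A has 28 active target cells, including 5 active evolving cells; 1 historical evolving slot is inactive. The current writing target is 94 accumulation beat scripts plus 5 forced evolving events, not a forced return to 108 active scripts. If every session were an isolated, random situation, the sessions would lack internal consistency and the simulator would struggle to maintain a stable character state. If every session instead revolved around the same situation (for example, the same exhibition), the scenes would become repetitive and the expression of preferences would lose its naturalness.
The solution is a fictional world with a temporal arc: User A gets a realistic life background spanning several months (an ongoing research project, a personal life with ups and downs), and every session grows out of this world instead of being invented from nothing. The three appearances of a single preference cell can then come from different moments and different situations in the same world, which keeps the character consistent while avoiding repeated scenes.
Design logic in detail:
- Time arc: a 4-month semester arc, with the research project moving from drafting to submission and personal life moving from autumn activities to the end of the year, giving sessions a natural sense of time and progression
- Scenario-type catalog: Work and Personal are each split into 6 scenario types, and each type has flavor variations, so repeated appearances of the same type do not duplicate one another
- Relationship network: a small set of fixed characters (Marcus, Prof. Reyes, Dr. Okafor, etc.) gives the world shared anchors across sessions and keeps what the PA learns about User A's interpersonal context coherent from session to session
- State fixture constraints: specify which world states need corresponding files in the state server, so that facts the PA cannot discover through its tools are not treated as ground truth
This document is the common basis for all session scripts. When writing an individual session, first choose a scenario type and a time position from this document, then expand the specific beats.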
Session-writing bible for User A accumulation scripts and test sessions. After memory_privacy deprecation, User A has 94 acc_*.yaml scripts, 5 evolv_*.yaml forced-event scripts, 5 pre-event tests, and 28 final probes.
All session scripts must draw from this document for consistency.
1. Time Span
Sessions span approximately 4 months: early October → late January.
| Phase | Sessions | Work arc | Personal arc |
|---|---|---|---|
| Early Oct | acc 1–30 | Research active, drafting begins | Autumn activities, train exhibition opens |
| Mid Oct – Nov | acc 31–70 | Writing phase, collaborator coordination | Holidays, schedule disruptions |
| Dec – early Jan | acc 71–108 | Revision, referee response, submission run-up | Winter routines, end-of-year admin |
2. Core Characters
User A
- Theoretical physicist at Westlake Institute of Technology (WIT), Glenmont, CA
- Lives in apartment with Marcus Chen
- Extremely routine-driven; any disruption is treated as a logistical emergency
- Communicates in precise, formal language; applies scientific frameworks to mundane problems
- Uses the PA primarily as a task executor, not a conversational partner
Marcus Chen
- Experimental physicist at WIT, same department; User A’s roommate and closest social contact
- Can be a direct participant in scenarios: User A may ask the PA to coordinate with Marcus, draft messages to him, plan joint activities, or handle apartment logistics involving him
- Marcus has his own schedule, preferences, and occasional scheduling conflicts with User A
- Appears in roughly 15–20% of sessions as a direct participant; elsewhere as background context
Recurring supporting characters
- Dr. Okafor — department administrator at WIT; sends announcements, manages room bookings
- Prof. Reyes — User A’s primary collaborator on the research project (at another institution; contact via email only)
- Stuart — owner of Stuart’s Comic Center; User A has a standing order arrangement
3. Core Research Project
Project: A theoretical paper on quantum measurement and decoherence pathways in non-Abelian anyonic systems.
Status at session 1: Mid-research; theoretical framework is established, User A is in drafting phase.
Timeline within sessions:
| Period | Project status |
|---|---|
| Oct (acc 1–30) | Active drafting; literature review tasks; checking citations |
| Nov (acc 31–70) | Drafting complete; collaborator review with Prof. Reyes; revising based on feedback |
| Dec (acc 71–90) | Pre-submission polish; abstract finalization; formatting for target journal |
| Jan (acc 91–108) | Submission; referee response prep (first referee reports may arrive) |
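When choosing a time position for a script, both the Section 1 phase table and the project timeline above can be consulted at once. The sketch below encodes the two tables as ordered range lists; the TimePosition type, the function names, and the abbreviated labels are illustrative assumptions, not existing tooling.

```python
from typing import NamedTuple


class TimePosition(NamedTuple):
    phase: str           # Phase label from the Section 1 table
    project_status: str  # Abbreviated project status from the Section 3 table


# Each entry is (last acc number in the range, label), in ascending order.
PHASES = [(30, "Early Oct"), (70, "Mid Oct-Nov"), (108, "Dec-early Jan")]
PROJECT_STATUS = [
    (30, "Active drafting"),
    (70, "Drafting complete; collaborator review"),
    (90, "Pre-submission polish"),
    (108, "Submission; referee response prep"),
]


def _label_for(table: list[tuple[int, str]], acc: int) -> str:
    # First range whose upper bound covers acc; ranges are contiguous from 1.
    return next(label for upper, label in table if acc <= upper)


def time_position(acc: int) -> TimePosition:
    """Map an accumulation slot (acc 1-108) to its phase and project status."""
    if not 1 <= acc <= 108:
        raise ValueError(f"acc {acc} is outside the 1-108 timeline skeleton")
    return TimePosition(_label_for(PHASES, acc), _label_for(PROJECT_STATUS, acc))
```

For acc 72 this yields phase "Dec-early Jan" with status "Pre-submission polish". Note that slots 91–108 also report the "Dec-early Jan" phase while their project status is the January submission stage; the lookup reports both tables exactly as written.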
Target journal: Physical Review Letters (fictional reference is fine).
State fixture implication: documents/research/ contains evolving draft sections; email/inbox contains messages from Prof. Reyes and journal notifications.
4. Recurring Places
| Place | Context | Notes |
|---|---|---|
| WIT campus / User A’s office | Work | Default work setting |
| Apartment (Glenmont) | Personal | Shared with Marcus |
| Stuart’s Comic Center | Personal-leisure | Standing order for series; User A visits ~monthly |
| Glenmont Cinema | Personal-leisure | Movie outings, primarily with Marcus |
| Glenmont Civic Exhibition Hall | Personal-leisure | Autumn train exhibition (runs Oct–Nov); other exhibitions later |
| Local restaurants (Glenmont) | Personal-social | Thai place on Tuesdays; a few other regulars |
| Glenmont Heights Building Management | Personal-home | Apartment maintenance and facility requests |
5. Work Scenario Types
54 Work sessions total. Draw from these types; the same type may recur in different flavors across reps.
W1 — Research writing
Tasks directly on the paper draft: writing a section, revising a paragraph, drafting the abstract, adjusting a proof sketch.
Flavor variation: early drafting vs. late revision vs. referee-response rewriting.
W2 — Literature and citations
Summarizing a paper, checking whether a citation is accurate, finding a reference for a specific claim, organizing a reading list.
Flavor variation: broad survey vs. targeted deep read vs. citation verification.
W3 — Collaborator coordination
Email to Prof. Reyes (sharing draft sections, asking for feedback, negotiating deadlines), occasionally Marcus for department overlap.
Flavor variation: routine update vs. disagreement resolution vs. time-sensitive decision.
W4 — Departmental administration
Responding to Dr. Okafor’s announcements, booking a room, completing committee paperwork, handling a grant-related form, conference registration.
Flavor variation: simple routine vs. multi-step logistics vs. schedule conflict.
W5 — Conference and travel
Preparing a conference abstract, planning travel to a workshop, handling travel expense logistics, checking submission deadlines.
Flavor variation: abstract drafting vs. logistics planning vs. deadline management.
W6 — Teaching-adjacent
Office hours scheduling, handling a student email, preparing a course-related document (User A teaches one seminar per semester).
Flavor variation: routine admin vs. student issue that requires careful handling.
6. Personal Scenario Types
54 Personal sessions total. Draw from these types.
P1 — Leisure outings
Planning a visit to an exhibition, movie outing, or similar. Involves checking availability, route planning, timing relative to User A’s schedule.
Flavor variation: solo vs. with Marcus; spontaneous vs. pre-planned; within routine vs. schedule conflict.
P2 — Comic books and hobby
Checking availability at Stuart’s, placing or modifying a standing order, picking up an item, tracking a series release.
Flavor variation: routine pickup vs. a sought-after issue that requires effort vs. dispute about an order.
P3 — Social coordination with Marcus
Planning a shared dinner, resolving a scheduling conflict over apartment logistics, coordinating a joint outing, handling an apartment decision that affects both.
Flavor variation: low-stakes joint decision vs. User A’s routine threatened by Marcus’s plans vs. external event affecting both.
P4 — Home and apartment
Maintenance requests to building management, purchasing household items, handling a service appointment, dealing with a facility problem.
Flavor variation: simple request vs. multi-step logistics vs. time-sensitive problem.
P5 — Health and appointments
Scheduling or rescheduling a medical appointment, handling a pharmacy task, managing a health-routine disruption.
Flavor variation: routine scheduling vs. conflict with research schedule vs. handling an unexpected issue.
P6 — Routine disruptions
User A’s rigid schedule is disturbed by an external event (a cancellation, a surprise, a conflict). The PA must help re-establish order.
Flavor variation: minor inconvenience vs. moderate disruption vs. multi-day cascade.
7. Marcus as Direct Participant — Usage Guide
Marcus appears as a direct participant when the scenario naturally involves him:
- Joint outing planning (both going somewhere together)
- Apartment logistics (a decision or problem that affects both)
- Work-adjacent overlap (same department, shared context)
- Drafting a message or proposal meant for Marcus
Do not force Marcus into sessions where he is not needed. His presence should feel motivated by the scenario, not by a quota.
Rough distribution: direct participant in ~15–20 sessions; background mention in ~20 sessions; absent in the rest.
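Because these figures are rough targets and explicitly not a quota, a review pass might only surface soft warnings. A minimal sketch, assuming each session is tagged with one of three hypothetical role labels ("direct", "background", "absent"); the ±5 tolerance around ~20 background mentions is an assumption, not part of this guide.

```python
from collections import Counter

ROLES = ("direct", "background", "absent")


def marcus_counts(roles_by_session: dict[str, str]) -> dict[str, int]:
    """Count sessions per Marcus role; input maps session ID -> role."""
    unknown = set(roles_by_session.values()) - set(ROLES)
    if unknown:
        raise ValueError(f"unknown role(s): {sorted(unknown)}")
    counts = Counter(roles_by_session.values())
    return {role: counts[role] for role in ROLES}


def distribution_notes(counts: dict[str, int], background_tolerance: int = 5) -> list[str]:
    """Soft warnings against the rough targets; guidance, not a quota."""
    notes = []
    if not 15 <= counts["direct"] <= 20:
        notes.append(f"direct participant in {counts['direct']} sessions (rough target 15-20)")
    # ~20 background mentions; the tolerance around 20 is an assumption.
    if abs(counts["background"] - 20) > background_tolerance:
        notes.append(f"background mention in {counts['background']} sessions (rough target ~20)")
    return notes
```

Returning notes rather than raising keeps the check advisory, which matches the rule that Marcus's presence should be motivated by the scenario.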
8. Scenario Variety Principles
To prevent repetition across the active session set, which targets each of the 28 active preference cells multiple times:
- Vary the work type: across 3 reps of the same cell, draw from different W-types (e.g., rep 1 = W1, rep 2 = W3, rep 3 = W4)
- Vary the personal domain: across 3 reps of the same cell, draw from different P-types
- Use the time arc: early sessions feel like active work; later sessions feel like deadline pressure. Personal sessions shift from autumn leisure to end-of-year logistics.
- Vary Marcus’s role: across sessions, he alternates between direct participant, background mention, and absent.
- Avoid reusing the same fictional task artifact (e.g., don’t reference glenmont_train_exhibition_logistics.md in more than 2–3 sessions).
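The first two principles can be checked mechanically once each session records its cell and scenario type. The sketch below flags any cell whose reps reuse a W- or P-type; the session dict keys are hypothetical, while the W1–W6 / P1–P6 codes come from Sections 5 and 6.

```python
from collections import defaultdict


def repeated_types(sessions: list[dict]) -> dict[str, list[str]]:
    """Map each preference cell to any scenario type it reuses across reps.

    Each session is assumed to look like
    {"id": "acc_012", "cell": "solution_breadth", "scenario_type": "W1"};
    the keys are hypothetical, the type codes come from Sections 5-6.
    """
    types_by_cell: dict[str, list[str]] = defaultdict(list)
    for session in sessions:
        types_by_cell[session["cell"]].append(session["scenario_type"])
    repeats = {}
    for cell, types in types_by_cell.items():
        dupes = sorted({t for t in types if types.count(t) > 1})
        if dupes:
            repeats[cell] = dupes
    return repeats
```

An empty result means every cell's reps draw from distinct scenario types; anything returned points to the cell whose reps should be redistributed.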
9. Fixture Design Plan
This section lists the fixtures that should exist in the state server to support the scenario types above. Entries marked [exists] are already present; entries marked [needed] must be created before the session can run. Existence is validated during the outline review round.
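During the outline review round, the status markers can be used to decide whether a planned session is ready to run. A minimal sketch, assuming a hypothetical registry that maps fixture paths to their table status; fixtures missing from the registry are treated as [needed], and [partially exists] entries are returned separately because the specific entry a session relies on may still be missing.

```python
from enum import Enum


class Status(Enum):
    EXISTS = "exists"
    PARTIAL = "partially exists"
    NEEDED = "needed"


def readiness(required: list[str], registry: dict[str, Status]) -> tuple[list[str], list[str]]:
    """Split a session's required fixtures into (needed, review).

    needed: marked [needed] or absent from the registry; create before running.
    review: marked [partially exists]; confirm the specific entry the session
            uses during the outline review round.
    """
    needed, review = [], []
    for path in required:
        status = registry.get(path, Status.NEEDED)
        if status is Status.NEEDED:
            needed.append(path)
        elif status is Status.PARTIAL:
            review.append(path)
    return needed, review


# Example registry using statuses from the tables in this section.
REGISTRY = {
    "my_desktop/glenmont_train_exhibition_logistics.md": Status.EXISTS,
    "my_desktop/apartment_issue_log.md": Status.NEEDED,
    "email/inbox.jsonl": Status.PARTIAL,
}
```

A session is ready to run only when the first list is empty and every path in the second list has been confirmed by hand.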
Documents
| Path | Content | Used by | Status |
|---|---|---|---|
| documents/research/draft_intro.md | Introduction section of the anyonic systems paper | W1 writing tasks | [needed] |
| documents/research/draft_section2.md | Theory/methods section | W1 writing tasks | [needed] |
| documents/research/draft_section3.md | Results/analysis section | W1 late-revision tasks | [needed] |
| documents/research/citation_notes.md | Working list of references and annotation notes | W2 citation tasks | [needed] |
| documents/research/reyes_feedback.md | Prof. Reyes’s written comments on the draft | W3 collaborator tasks | [needed] |
| documents/admin/grant_form_draft.md | Partially filled grant application | W4 admin tasks | [needed] |
| documents/admin/conference_abstract.md | Draft conference abstract | W5 conference tasks | [needed] |
| my_desktop/glenmont_train_exhibition_logistics.md | Outing logistics note | P1 leisure, process_visibility | [exists] |
| my_desktop/apartment_issue_log.md | Running log of apartment maintenance requests | P4 home tasks | [needed] |
Email
| Fixture | Content | Used by | Status |
|---|---|---|---|
| email/inbox.jsonl | Messages from Prof. Reyes (draft feedback, deadline updates), Dr. Okafor (department announcements), journal notifications | W2, W3, W4 | [partially exists] |
| email/inbox.jsonl | Marcus messages (outing coordination, apartment logistics) | P1, P3 | [partially exists] |
New inbox messages should be added as individual fixture entries per scenario. Do not reuse the same email thread in more than two sessions.
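The same counting check covers both reuse caps in this document: at most two sessions per email thread here, and at most 2–3 sessions per task artifact in Section 8. A minimal sketch, assuming each script's references are exported as (session_id, resource_id) pairs; the thread IDs below are hypothetical.

```python
from collections import Counter


def overused(references: list[tuple[str, str]], limit: int) -> dict[str, int]:
    """Return resources referenced by more than `limit` distinct sessions.

    `references` holds (session_id, resource_id) pairs. Duplicate pairs are
    collapsed, so a session that mentions a thread twice counts once.
    """
    counts = Counter(resource for _session, resource in set(references))
    return {resource: n for resource, n in counts.items() if n > limit}


# Hypothetical thread IDs; real inbox fixtures may key threads differently.
refs = [
    ("acc_010", "reyes_deadline_thread"),
    ("acc_010", "reyes_deadline_thread"),
    ("acc_041", "reyes_deadline_thread"),
    ("acc_077", "reyes_deadline_thread"),
    ("acc_012", "okafor_room_booking_thread"),
]
print(overused(refs, limit=2))  # → {'reyes_deadline_thread': 3}
```

For task artifacts such as glenmont_train_exhibition_logistics.md, the same function can be called with limit=3, the upper end of the 2–3 range in Section 8.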
Maps
| Fixture | Content | Used by | Status |
|---|---|---|---|
| maps/ | Route options: apartment → WIT campus, apartment → Stuart’s Comic Center, apartment → Glenmont Cinema, apartment → Glenmont Civic Exhibition Hall | P1, solution_breadth | [partially exists] |
| maps/ | Multi-leg routes (apartment → store → exhibition, campus → airport) | process_visibility, W5 travel | [needed] |
Each maps_route_options call should return ≥2 viable options to support solution_breadth sessions.
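A fixture-side check for this rule can run before any solution_breadth session is generated. This sketch assumes the route fixtures can be loaded as a dict keyed by a hypothetical "origin->destination" string, with each option optionally carrying a hypothetical "viable" flag (for example, to mark a closed line).

```python
def routes_below_minimum(routes: dict[str, list[dict]], minimum: int = 2) -> list[str]:
    """Return route keys whose fixture offers fewer than `minimum` viable options.

    Options count as viable unless explicitly marked {"viable": False};
    that flag and the "origin->destination" keys are hypothetical.
    """
    short = []
    for key, options in routes.items():
        viable = [opt for opt in options if opt.get("viable", True)]
        if len(viable) < minimum:
            short.append(key)
    return sorted(short)
```

Any key returned needs another plausible option added to the maps fixture before it can back a solution_breadth session.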
Store / Cinema / Places
| Fixture | Content | Used by | Status |
|---|---|---|---|
| store/ | Stuart’s Comic Center inventory: multiple series and issue availability | P2 comic tasks | [exists] |
| cinema/ | Glenmont Cinema: showtimes across Oct–Jan (multiple movies) | P1 movie tasks | [exists] |
| places/ | Local restaurants, cafes, and services near Glenmont | P3 dining, P4 services | [exists] |
Contacts
| Entry | Role | Used by | Status |
|---|---|---|---|
| prof_reyes | Prof. Reyes, collaborator at external institution | W3 emails | [needed] |
| dr_okafor | WIT department administrator | W4 admin | [needed] |
| journal_editor | Editor contact for target journal | W5 submission tasks | [needed] |
| glenmont_civic_accessibility | Exhibition Hall Accessibility Desk | capability_boundary | [exists] |
| glenmont_civic_event_services | Exhibition Hall Event Services | W4, P1 | [exists] |
| building_management | Glenmont Heights Building Management | P4 home | [exists] |
| stuart_comic_center | Stuart’s Comic Center | P2 | [needed] |
Memory Fixture
| Path | Content | Used by | Status |
|---|---|---|---|
| memory_fixtures/user_a/memory_base.md | Cross-session continuity facts: routine preferences, recurring patterns, relationship context | ordinary continuity, non-target memory context | [partially exists as memory_privacy_mixed.md] |
Memory fixtures may still be used to support continuity and realistic context, but scripts must not target memory/privacy scope as a measurable preference.
Fixture Design Notes
- capability_boundary sessions: The fixture must explicitly omit the information the scenario depends on. This is a deliberate design choice, not a gap to fill. Document the omission explicitly in the session’s state_server_prep field.
- information_elicitation sessions: Each session’s tool must have required_slots that are genuinely missing from the user’s opening message. The required slots are defined in the session YAML’s beat.elicitation field, not in the fixture itself.
- process_visibility sessions: The scenario must reference at least two tool calls in sequence (e.g., email_read then maps_route_options, or documents_read then email_save_draft). Single-tool scenarios cannot be used.
- autonomy_level sessions: At least one fixture must support an external-action tool call (email_send) with a plausible recipient from contacts. The external action must not be trivially reversible (saving a draft does not count).
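The four notes above can be sketched as a single per-session check. This assumes a parsed session dict with several hypothetical keys (cell, tool_sequence, opening_slots, recipient, and required_slots nested under beat.elicitation); only state_server_prep, beat.elicitation, the tool names, and the contact IDs come from this document.

```python
# Contact IDs from the Contacts table above.
CONTACTS = frozenset({
    "prof_reyes", "dr_okafor", "journal_editor", "glenmont_civic_accessibility",
    "glenmont_civic_event_services", "building_management", "stuart_comic_center",
})


def check_session(session: dict) -> list[str]:
    """Check one session against the fixture design notes; return problems found."""
    cell = session["cell"]
    tools = session.get("tool_sequence", [])  # hypothetical: planned tool calls in order
    problems = []
    if cell == "capability_boundary" and not session.get("state_server_prep"):
        problems.append("omission not documented in state_server_prep")
    elif cell == "information_elicitation":
        elicitation = session.get("beat", {}).get("elicitation", {})
        required = set(elicitation.get("required_slots", []))
        given = set(session.get("opening_slots", []))  # hypothetical: slots stated up front
        if not required:
            problems.append("beat.elicitation defines no required_slots")
        elif required & given:
            problems.append(f"slots already in opening message: {sorted(required & given)}")
    elif cell == "process_visibility" and len(tools) < 2:
        problems.append("needs at least two tool calls in sequence")
    elif cell == "autonomy_level":
        if "email_send" not in tools:
            problems.append("no external-action tool call (email_send)")
        elif session.get("recipient") not in CONTACTS:
            problems.append("email_send recipient is not in contacts")
    return problems
```

For example, an autonomy_level session whose only action is email_save_draft is flagged, matching the rule that a draft is not an external action.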