User A — World Design

Per-persona document. Each persona in MemPABench has its own World Design. This file covers User A only. When a new persona is added, a separate World Design must be created from scratch — do not reuse this file’s story content, characters, or fixture structure.

Fixture status. Fixture entries in this document describe what should exist to support the storylines. They are not a record of what currently exists in the state server. Fixture existence is validated during the outline review round, not during generation.

2026-05-17 revision — memory_privacy deprecated. memory_privacy is no longer a measurable interaction preference or evolving arc. User A’s former P_F arc (personal_memory_privacy, Domain-scoped -> Minimal+transparent) is inactive and its generated scripts were deleted: acc_028, acc_052, acc_072, evolv_06, acc_098, acc_106. User A’s stable work_memory_privacy scripts were also deleted: acc_037, acc_069, acc_107. PA memory/history can still exist as ordinary benchmark state, but privacy/memory scope is no longer an active target cell.
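
A small guard can keep the deleted scripts from creeping back into the script set. The sketch below is illustrative only: the script IDs come from the revision note above, while the function name and the idea of passing in a list of filenames are assumptions, not part of the MemPABench tooling.

```python
from pathlib import PurePath

# Script IDs deleted in the 2026-05-17 revision (copied from the note above).
DELETED_SCRIPT_IDS = frozenset({
    # former P_F arc (personal_memory_privacy)
    "acc_028", "acc_052", "acc_072", "evolv_06", "acc_098", "acc_106",
    # former stable work_memory_privacy scripts
    "acc_037", "acc_069", "acc_107",
})


def find_deprecated_scripts(filenames):
    """Return the filenames whose stem matches a deleted script ID, sorted."""
    return sorted(
        name for name in filenames
        if PurePath(name).stem in DELETED_SCRIPT_IDS
    )
```

An empty result means no deprecated memory_privacy script is present in the list that was checked.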

Design Rationale

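The original MemPABench Timeline still keeps the acc 1–108 time skeleton. With memory_privacy removed, User A has 28 active target cells, including 5 active evolving cells; 1 historical evolving slot is inactive. The current writing target is 94 accumulation beat scripts plus 5 forced evolving events, not a forced return to 108 active scripts.

If every session were an isolated, random situation, the sessions would lack internal consistency and the simulator would struggle to maintain a stable persona state. But if every session revolved around the same situation (for example, the same exhibition), the scenes would become too repetitive and the expression of preferences would stop feeling natural.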

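The solution is to build a fictional world with a time arc: give User A a realistic life background that spans several months (an ongoing research project, a personal life with ups and downs), so that every session grows out of this world rather than being invented from nothing. The three occurrences of the same preference cell can then come from different moments and different situations in the same world, which keeps the persona consistent while avoiding repeated scenes.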

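Design logic in detail:

  • Time arc: a 4-month semester arc in which the research project moves from drafting to submission and personal life moves from autumn activities to year-end, giving sessions a natural sense of time and progress
  • Scenario type catalog: Work and Personal are each split into 6 scenario types, each with flavor variations, so that repeated use of a type does not repeat the scene
  • Character network: a small set of fixed characters (Marcus, Prof. Reyes, Dr. Okafor, and others) gives the world shared anchors across sessions and keeps what the PA learns about User A’s interpersonal context coherent across sessions
  • State fixture constraints: state explicitly which world states need a corresponding file in the state server, so that facts the PA cannot discover through tools are never treated as ground truth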

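This document is the shared basis for all session scripts. When writing a single session, first pick a scenario type and a position in the timeline from here, then develop the specific beats.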


Session-writing bible for User A accumulation scripts and test sessions. After memory_privacy deprecation, User A has 94 acc_*.yaml scripts, 5 evolv_*.yaml forced-event scripts, 5 pre-event tests, and 28 final probes.
All session scripts must draw from this document for consistency.


1. Time Span

Sessions span approximately 4 months: early October → late January.

Phase | Sessions | Work arc | Personal arc
Early Oct | acc 1–30 | Research active, drafting begins | Autumn activities, train exhibition opens
Mid Oct – Nov | acc 31–70 | Writing phase, collaborator coordination | Holidays, schedule disruptions
Dec – early Jan | acc 71–108 | Revision, referee response, submission run-up | Winter routines, end-of-year admin
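
When placing a script in the timeline, the slot-to-phase mapping above can be looked up mechanically. This is a minimal sketch: the slot boundaries are copied from the table, while the phase keys and the function name are hypothetical labels chosen for illustration.

```python
# Slot boundaries copied from the phase table; the keys are hypothetical labels.
PHASES = [
    ("early_oct", range(1, 31)),     # acc 1-30
    ("mid_oct_nov", range(31, 71)),  # acc 31-70
    ("dec_jan", range(71, 109)),     # acc 71-108
]


def phase_for(acc_index):
    """Return the phase label for a slot in the acc 1-108 skeleton."""
    for label, slots in PHASES:
        if acc_index in slots:
            return label
    raise ValueError(f"acc index {acc_index} is outside 1-108")
```

For example, phase_for(45) falls in the mid-October to November phase, so a script in that slot should lean on the writing phase and collaborator coordination.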

2. Core Characters

User A

  • Theoretical physicist at Westlake Institute of Technology (WIT), Glenmont, CA
  • Lives in apartment with Marcus Chen
  • Extremely routine-driven; any disruption is treated as a logistical emergency
  • Communicates in precise, formal language; applies scientific frameworks to mundane problems
  • Uses the PA primarily as a task executor, not a conversational partner

Marcus Chen

  • Experimental physicist at WIT, same department; User A’s roommate and closest social contact
  • Can be a direct participant in scenarios: User A may ask the PA to coordinate with Marcus, draft messages to him, plan joint activities, or handle apartment logistics involving him
  • Marcus has his own schedule, preferences, and occasional scheduling conflicts with User A
  • Appears in roughly 15–20% of sessions as a direct participant; elsewhere as background context

Recurring supporting characters

  • Dr. Okafor — department administrator at WIT; sends announcements, manages room bookings
  • Prof. Reyes — User A’s primary collaborator on the research project (at another institution; contact via email only)
  • Stuart — owner of Stuart’s Comic Center; User A has a standing order arrangement

3. Core Research Project

Project: A theoretical paper on quantum measurement and decoherence pathways in non-Abelian anyonic systems.

Status at session 1: Mid-research; theoretical framework is established, User A is in drafting phase.

Timeline within sessions:

Period | Project status
Oct (acc 1–30) | Active drafting; literature review tasks; checking citations
Nov (acc 31–70) | Drafting complete; collaborator review with Prof. Reyes; revising based on feedback
Dec (acc 71–90) | Pre-submission polish; abstract finalization; formatting for target journal
Jan (acc 91–108) | Submission; referee response prep (first referee reports may arrive)

Target journal: Physical Review Letters (fictional reference is fine).

State fixture implication: documents/research/ contains evolving draft sections; email/inbox contains messages from Prof. Reyes and journal notifications.


4. Recurring Places

Place | Context | Notes
WIT campus / User A’s office | Work | Default work setting
Apartment (Glenmont) | Personal | Shared with Marcus
Stuart’s Comic Center | Personal-leisure | Standing order for series; User A visits ~monthly
Glenmont Cinema | Personal-leisure | Movie outings, primarily with Marcus
Glenmont Civic Exhibition Hall | Personal-leisure | Autumn train exhibition (runs Oct–Nov); other exhibitions later
Local restaurants (Glenmont) | Personal-social | Thai place on Tuesdays; a few other regulars
Glenmont Heights Building Management | Personal-home | Apartment maintenance and facility requests

5. Work Scenario Types

54 Work sessions total in the original acc 1–108 skeleton. Draw from these types; the same type may recur in different flavors across reps.

W1 — Research writing

Tasks directly on the paper draft: writing a section, revising a paragraph, drafting the abstract, adjusting a proof sketch.

Flavor variation: early drafting vs. late revision vs. referee-response rewriting.

W2 — Literature and citations

Summarizing a paper, checking whether a citation is accurate, finding a reference for a specific claim, organizing a reading list.

Flavor variation: broad survey vs. targeted deep read vs. citation verification.

W3 — Collaborator coordination

Email to Prof. Reyes (sharing draft sections, asking for feedback, negotiating deadlines), occasionally Marcus for department overlap.

Flavor variation: routine update vs. disagreement resolution vs. time-sensitive decision.

W4 — Departmental administration

Responding to Dr. Okafor’s announcements, booking a room, completing committee paperwork, handling a grant-related form, conference registration.

Flavor variation: simple routine vs. multi-step logistics vs. schedule conflict.

W5 — Conference and travel

Preparing a conference abstract, planning travel to a workshop, handling travel expense logistics, checking submission deadlines.

Flavor variation: abstract drafting vs. logistics planning vs. deadline management.

W6 — Teaching-adjacent

Office hours scheduling, handling a student email, preparing a course-related document (User A teaches one seminar per semester).

Flavor variation: routine admin vs. student issue that requires careful handling.


6. Personal Scenario Types

54 Personal sessions total in the original acc 1–108 skeleton. Draw from these types.

P1 — Leisure outings

Planning a visit to an exhibition, movie outing, or similar. Involves checking availability, route planning, timing relative to User A’s schedule.

Flavor variation: solo vs. with Marcus; spontaneous vs. pre-planned; within routine vs. schedule conflict.

P2 — Comic books and hobby

Checking availability at Stuart’s, placing or modifying a standing order, picking up an item, tracking a series release.

Flavor variation: routine pickup vs. a sought-after issue that requires effort vs. dispute about an order.

P3 — Social coordination with Marcus

Planning a shared dinner, resolving a scheduling conflict over apartment logistics, coordinating a joint outing, handling an apartment decision that affects both.

Flavor variation: low-stakes joint decision vs. User A’s routine threatened by Marcus’s plans vs. external event affecting both.

P4 — Home and apartment

Maintenance requests to building management, purchasing household items, handling a service appointment, dealing with a facility problem.

Flavor variation: simple request vs. multi-step logistics vs. time-sensitive problem.

P5 — Health and appointments

Scheduling or rescheduling a medical appointment, handling a pharmacy task, managing a health-routine disruption.

Flavor variation: routine scheduling vs. conflict with research schedule vs. handling an unexpected issue.

P6 — Routine disruptions

User A’s rigid schedule is disturbed by an external event (a cancellation, a surprise, a conflict). PA must help re-establish order.

Flavor variation: minor inconvenience vs. moderate disruption vs. multi-day cascade.


7. Marcus as Direct Participant — Usage Guide

Marcus appears as a direct participant when the scenario naturally involves him:

  • Joint outing planning (both going somewhere together)
  • Apartment logistics (a decision or problem that affects both)
  • Work-adjacent overlap (same department, shared context)
  • Drafting a message or proposal meant for Marcus

Do not force Marcus into sessions where he is not needed. His presence should feel motivated by the scenario, not by a quota.

Rough distribution: direct participant in ~15–20 sessions; background mention in ~20 sessions; absent in the rest.


8. Scenario Variety Principles

To prevent repetition across the active session set that targets the same 28 active preference cells multiple times:

  1. Vary the work type: across 3 reps of the same cell, draw from different W-types (e.g., rep 1 = W1, rep 2 = W3, rep 3 = W4)
  2. Vary the personal domain: across 3 reps of the same cell, draw from different P-types
  3. Use the time arc: early sessions feel like active work; later sessions feel like deadline pressure. Personal sessions shift from autumn leisure to end-of-year logistics.
  4. Vary Marcus’s role: across sessions, he alternates between direct participant, background mention, and absent.
  5. Avoid reusing the same fictional task artifact (e.g., don’t reference glenmont_train_exhibition_logistics.md in more than 2–3 sessions).
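
Principles 1, 2, and 5 lend themselves to an automated check during outline review. The sketch below takes a strict reading of them (no scenario type repeated across reps of one cell, no artifact in more than 3 sessions). The session record layout with "id", "cell", "scenario_type", and "artifacts" keys is an assumption for illustration, not the actual session YAML schema.

```python
from collections import Counter, defaultdict

MAX_ARTIFACT_SESSIONS = 3  # upper end of the "2-3 sessions" guideline


def variety_violations(sessions):
    """Flag repeated scenario types within a cell and overused artifacts.

    Each session is a dict with hypothetical keys:
    "id", "cell", "scenario_type" (e.g. "W1"), and "artifacts" (list of paths).
    """
    problems = []
    types_by_cell = defaultdict(list)
    artifact_counts = Counter()
    for session in sessions:
        types_by_cell[session["cell"]].append(session["scenario_type"])
        # Count each artifact once per session, even if listed twice.
        artifact_counts.update(set(session.get("artifacts", [])))
    for cell, types in sorted(types_by_cell.items()):
        if len(set(types)) < len(types):
            problems.append(f"cell {cell}: scenario types repeat across reps {types}")
    for path, count in sorted(artifact_counts.items()):
        if count > MAX_ARTIFACT_SESSIONS:
            problems.append(f"artifact {path}: referenced in {count} sessions")
    return problems
```

Principles 3 and 4 (time arc and Marcus’s role) are judgment calls and are left to manual review in this sketch.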

9. Fixture Design Plan

This section lists the fixtures that should exist in the state server to support the scenario types above. Entries marked [exists] are expected to be present already; entries marked [partially exists] are present but incomplete; entries marked [needed] must be created before the session can run. These markers are expectations, not a verified record: existence is validated during the outline review round.

Documents

Path | Content | Used by | Status
documents/research/draft_intro.md | Introduction section of the anyonic systems paper | W1 writing tasks | [needed]
documents/research/draft_section2.md | Theory/methods section | W1 writing tasks | [needed]
documents/research/draft_section3.md | Results/analysis section | W1 late-revision tasks | [needed]
documents/research/citation_notes.md | Working list of references and annotation notes | W2 citation tasks | [needed]
documents/research/reyes_feedback.md | Prof. Reyes’s written comments on the draft | W3 collaborator tasks | [needed]
documents/admin/grant_form_draft.md | Partially filled grant application | W4 admin tasks | [needed]
documents/admin/conference_abstract.md | Draft conference abstract | W5 conference tasks | [needed]
my_desktop/glenmont_train_exhibition_logistics.md | Outing logistics note | P1 leisure, process_visibility | [exists]
my_desktop/apartment_issue_log.md | Running log of apartment maintenance requests | P4 home tasks | [needed]

Email

Fixture | Content | Used by | Status
email/inbox.jsonl | Messages from Prof. Reyes (draft feedback, deadline updates), Dr. Okafor (department announcements), journal notifications | W2, W3, W4 | [partially exists]
email/inbox.jsonl | Marcus messages (outing coordination, apartment logistics) | P1, P3 | [partially exists]

New inbox messages should be added as individual fixture entries per scenario. Do not reuse the same email thread in more than two sessions.
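
The two-session limit on email threads can be checked with a simple inverted index. This is a sketch under stated assumptions: the mapping from session ID to thread IDs is a hypothetical structure, and the real inbox fixture may identify threads differently.

```python
from collections import defaultdict

MAX_SESSIONS_PER_THREAD = 2  # limit stated in the rule above


def overused_threads(session_threads):
    """Return {thread_id: sorted session IDs} for threads used in too many sessions.

    session_threads maps a session ID to the thread IDs it relies on
    (a hypothetical structure; the real fixture schema may differ).
    """
    sessions_by_thread = defaultdict(set)
    for session_id, thread_ids in session_threads.items():
        for thread_id in thread_ids:
            sessions_by_thread[thread_id].add(session_id)
    return {
        thread_id: sorted(session_ids)
        for thread_id, session_ids in sessions_by_thread.items()
        if len(session_ids) > MAX_SESSIONS_PER_THREAD
    }
```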

Maps

Fixture | Content | Used by | Status
maps/ | Route options: apartment → WIT campus, apartment → Stuart’s Comic Center, apartment → Glenmont Cinema, apartment → Glenmont Civic Exhibition Hall | P1, solution_breadth | [partially exists]
maps/ | Multi-leg routes (apartment → store → exhibition, campus → airport) | process_visibility, W5 travel | [needed]

Each maps_route_options call should return ≥2 viable options to support solution_breadth sessions.
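
A fixture check for this requirement might look like the following sketch. The shape of the route data (a mapping from origin and destination to a list of option dicts) and the "viable" flag are assumptions for illustration; the actual maps/ fixture format is not specified here.

```python
MIN_ROUTE_OPTIONS = 2  # each maps_route_options call should return at least 2


def routes_missing_alternatives(route_fixtures):
    """Return sorted (origin, destination) pairs with fewer than two viable options.

    route_fixtures maps (origin, destination) to a list of option dicts; the
    "viable" flag is a hypothetical field and defaults to True when absent.
    """
    short = []
    for (origin, destination), options in route_fixtures.items():
        viable = [option for option in options if option.get("viable", True)]
        if len(viable) < MIN_ROUTE_OPTIONS:
            short.append((origin, destination))
    return sorted(short)
```

Any pair returned by this check cannot support a solution_breadth session until another viable route is added.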

Store / Cinema / Places

Fixture | Content | Used by | Status
store/ | Stuart’s Comic Center inventory: multiple series and issue availability | P2 comic tasks | [exists]
cinema/ | Glenmont Cinema: showtimes across Oct–Jan (multiple movies) | P1 movie tasks | [exists]
places/ | Local restaurants, cafes, and services near Glenmont | P3 dining, P4 services | [exists]

Contacts

Entry | Role | Used by | Status
prof_reyes | Prof. Reyes, collaborator at external institution | W3 emails | [needed]
dr_okafor | WIT department administrator | W4 admin | [needed]
journal_editor | Editor contact for target journal | W5 submission tasks | [needed]
glenmont_civic_accessibility | Exhibition Hall Accessibility Desk | capability_boundary | [exists]
glenmont_civic_event_services | Exhibition Hall Event Services | W4, P1 | [exists]
building_management | Glenmont Heights Building Management | P4 home | [exists]
stuart_comic_center | Stuart’s Comic Center | P2 | [needed]

Memory Fixture

Path | Content | Used by | Status
memory_fixtures/user_a/memory_base.md | Cross-session continuity facts: routine preferences, recurring patterns, relationship context | ordinary continuity, non-target memory context | [partially exists as memory_privacy_mixed.md]

Memory fixtures may still be used to support continuity and realistic context, but scripts must not target memory/privacy scope as a measurable preference.

Fixture Design Notes

  • capability_boundary sessions: The fixture must explicitly omit the information the scenario depends on. This is a deliberate design choice, not a gap to fill. Document the omission explicitly in the session’s state_server_prep field.
  • information_elicitation sessions: Each session’s tool must have required_slots that are genuinely missing from the user’s opening message. The required slots are defined in the session YAML’s beat.elicitation field, not in the fixture itself.
  • process_visibility sessions: The scenario must reference at least two tool calls in sequence (e.g., email_read then maps_route_options, or documents_read then email_save_draft). Single-tool scenarios cannot be used.
  • autonomy_level sessions: At least one fixture must support an external-action tool call (email_send) with a plausible recipient from contacts. The external action must not be trivially reversible: it must be an actual send, not just a saved draft.
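
The process_visibility and autonomy_level notes above can be sketched as a per-session check. The session record layout ("cell", "tool_calls", "recipient") is hypothetical and only the tool name email_send is taken from this document; capability_boundary and information_elicitation depend on fields such as state_server_prep and beat.elicitation and are not covered here.

```python
# Tools treated as external actions; email_send is the one named in the notes.
EXTERNAL_ACTION_TOOLS = frozenset({"email_send"})


def fixture_note_violations(session, contacts):
    """Check one session against the process_visibility and autonomy_level notes.

    session: dict with hypothetical keys "cell" and "tool_calls", where each
    tool call is a dict with "tool" and, for email_send, "recipient".
    contacts: set of contact entry names available in the fixture.
    """
    problems = []
    calls = session.get("tool_calls", [])
    cell = session["cell"]
    if cell == "process_visibility":
        # Single-tool scenarios are not allowed, so count distinct tools.
        distinct_tools = {call["tool"] for call in calls}
        if len(distinct_tools) < 2:
            problems.append("process_visibility: needs at least two different tool calls")
    if cell == "autonomy_level":
        has_external = any(
            call["tool"] in EXTERNAL_ACTION_TOOLS and call.get("recipient") in contacts
            for call in calls
        )
        if not has_external:
            problems.append("autonomy_level: needs an email_send to a recipient in contacts")
    return problems
```

A draft-only session (for example, one that only calls email_save_draft) fails the autonomy_level check in this sketch, which matches the requirement that the external action be an actual send.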