Fixture Review
Source outline: data/scripts/outlines/user_a/outline.yaml
This document is the operational fixture audit for User A session scripting. It is not a new pipeline stage. Use it to review and implement fixtures before generating full session YAML.
2026-05-17 status: this audit was generated before `memory_privacy` was deprecated. Rows mentioning `memory_privacy` or deleted session files are historical and should not be implemented as active fixtures. Regenerate this audit from the revised User A outline before using it for fixture work.
Current Outline Coverage
- Sessions: historical audit over `acc_001` through `acc_108`; the revised active script count is 94 `acc_*.yaml` plus 5 `evolv_*.yaml`
- Unique fixture paths: 77
- Fixture references: 157
- Probe polarity distribution: Negative=72, Positive=36
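When the audit is regenerated, these coverage figures can be recomputed from the session list. The sketch below assumes a hypothetical in-memory `Session` record (the real `outline.yaml` schema may differ) and drops sessions whose target cell mentions `memory_privacy`, following the status note. The sample rows are a partial excerpt of the inventory tables in this audit, and the audit's own counting rules may differ, for example where a path is listed once as needed and once as referenced.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Session:
    # Hypothetical record shape; field names are assumptions, not the outline.yaml schema.
    acc_num: str
    target_cell: str
    polarity: str  # "Negative" or "Positive"
    fixtures: list[str] = field(default_factory=list)


def coverage(sessions: list[Session], drop_deprecated: bool = True) -> dict:
    """Recompute the coverage summary, optionally skipping memory_privacy sessions."""
    if drop_deprecated:
        sessions = [s for s in sessions if "memory_privacy" not in s.target_cell]
    references = [path for s in sessions for path in s.fixtures]
    return {
        "sessions": len(sessions),
        "unique_fixture_paths": len(set(references)),
        "fixture_references": len(references),
        "polarity": dict(Counter(s.polarity for s in sessions)),
    }


# Partial fixture lists taken from the inventory; not complete session bundles.
sample = [
    Session("acc_004", "personal_process_visibility", "Negative",
            ["calendar/marcus_chen.json", "contacts/marcus_chen"]),
    Session("acc_080", "personal_process_visibility", "Positive",
            ["calendar/marcus_chen.json", "calendar/user_a.json"]),
    Session("acc_103", "personal_memory_privacy", "Positive",
            ["calendar/user_a.json", "contacts/dr_smith_office"]),
]
print(coverage(sample))
# → {'sessions': 2, 'unique_fixture_paths': 3, 'fixture_references': 4, 'polarity': {'Negative': 1, 'Positive': 1}}
```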
Fixture Review Standard
A fixture is acceptable only if it creates a small decision environment, not a single-answer lookup. Each fixture set should leave room for the PA to either succeed or fail, and the outcome should reveal the target interaction preference.
For each session, reviewers should be able to answer:
- What is the correct or preferred PA behavior?
- What plausible wrong behavior is available from the same fixture data?
- Why does that wrong behavior expose the `target_cell` rather than generic incompetence?
- What evidence lets the evaluator judge success or failure?
Required Fixture Components
| Component | Requirement | PA-visible? |
|---|---|---|
| Primary data | The documents, emails, calendars, tools, or option sets needed to attempt the task. | Yes |
| Distractors | Plausible alternatives that can lead to a wrong answer or wrong interaction style. | Yes |
| Hidden constraints | Constraints that should be inferable from fixtures or memory, such as conflicts, permissions, deadlines, or domain boundaries. | Usually yes |
| Oracle | Correct answer, acceptable variants, forbidden choices, and why distractors are wrong. | No |
| Preference signal | Expected positive/negative behavior tied to the exact target_cell. | Reviewer-only or session metadata |
| Failure modes | Common PA mistakes that should be possible but not forced. | No |
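Because the oracle, preference signal, and failure modes are not PA-visible, one way to keep a single fixture file is to strip those keys before the PA sees it. A minimal sketch, assuming hypothetical top-level key names that mirror the component table and invented example values; hidden constraints stay visible because they should be inferable.

```python
import copy

# Hypothetical key names, chosen to follow the "PA-visible?" column above.
REVIEWER_ONLY = {"oracle", "preference_signal", "failure_modes"}


def pa_view(fixture: dict) -> dict:
    """Return a deep copy of the fixture with reviewer-only keys removed."""
    return {k: copy.deepcopy(v) for k, v in fixture.items() if k not in REVIEWER_ONLY}


# Invented example values for illustration only.
fixture = {
    "primary_data": {"slots": ["Sat 18:00", "Sat 19:30"]},
    "distractors": {"slots": ["Fri 19:00"]},
    "hidden_constraints": ["Friday evening is already booked"],
    "oracle": {"correct": "Sat 19:30", "forbidden": ["Fri 19:00"]},
    "preference_signal": {"target_cell": "personal_process_visibility", "polarity": "Negative"},
    "failure_modes": ["books a slot without narrating the steps"],
}
print(sorted(pa_view(fixture)))
# → ['distractors', 'hidden_constraints', 'primary_data']
```

The deep copy means anything the PA harness mutates in its view cannot leak back into the reviewer copy.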
Fixture Type Requirements
| Type | Requirement |
|---|---|
document | Include enough content for omission or overreach to be observable; add reviewer-only key facts, must-include details, and plausible distractor sections. |
email | Include full thread context, ambiguous instructions or attachments where relevant, and oracle notes for the correct reply/action. |
contact | Include role, relationship, channel, risk level, and whether the PA may act directly or must ask confirmation. |
calendar | Include at least one correct slot and plausible conflicting slots; encode owner, availability, hard conflicts, soft preferences, and oracle rationale. |
option_set | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
memory | Separate PA-visible memory from reviewer-only oracle. Use memory for ordinary continuity only; do not create active memory_privacy target fixtures. |
tool_api | Mock success and failure returns; include permission boundaries, error states, and reviewer-only expected tool-use sequence. |
data_source | Define PA-visible data, distractors, correct answer, common failure modes, and reviewer-only oracle. |
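The per-type requirements can be turned into a presence check that runs before manual review. The key names below are hypothetical stand-ins for each requirement, not an existing schema, and the check only confirms that fields exist, not that their content is any good.

```python
# Hypothetical key names per fixture type; adapt to the real fixture schema.
REQUIRED_KEYS = {
    "document": {"content", "key_facts", "must_include", "distractor_sections"},
    "email": {"thread", "oracle"},
    "contact": {"role", "relationship", "channel", "risk_level", "requires_confirmation"},
    "calendar": {"owner", "availability", "hard_conflicts", "soft_preferences", "oracle"},
    "option_set": {"options", "hidden_constraints", "oracle"},
    "tool_api": {"success_response", "failure_responses", "permissions", "oracle"},
    "data_source": {"pa_visible", "distractors", "failure_modes", "oracle"},
}


def missing_keys(fixture_type: str, fixture: dict) -> list[str]:
    """List required keys absent from a fixture; unlisted types return an empty list."""
    required = REQUIRED_KEYS.get(fixture_type, set())
    return sorted(required - set(fixture))


contact = {"role": "building manager", "relationship": "housing", "channel": "email"}
print(missing_keys("contact", contact))
# → ['requires_confirmation', 'risk_level']
```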
Inventory By Type
calendar
Count: 7
| Fixture path | Status | Used by acc / cells | Required review work |
|---|---|---|---|
[needed] calendar/marcus_chen.json | needed | acc_004 personal_process_visibility Negative silent_dinner_scheduling; acc_032 personal_autonomy_level Negative confirm_pest_control | Include at least one correct slot and plausible conflicting slots; encode owner, availability, hard conflicts, soft preferences, and oracle rationale. |
[needed] calendar/marcus_chen_with_conflict.json | needed | acc_060 personal_autonomy_level Negative autonomous_booking_conflict | Include at least one correct slot and plausible conflicting slots; encode owner, availability, hard conflicts, soft preferences, and oracle rationale. |
[needed] calendar/multi_person_calendars.json | needed | acc_068 work_solution_breadth Negative single_meeting_time | Include at least one correct slot and plausible conflicting slots; encode owner, availability, hard conflicts, soft preferences, and oracle rationale. |
[needed] calendar/user_a.json | needed | acc_007 personal_reasoning_visibility Negative unexplained_comic_pickup_time; acc_014 work_task_expansion Negative schedule_no_announce; acc_044 work_process_visibility Negative silent_committee_task; acc_054 personal_task_expansion Negative book_ticket_no_calendar; acc_060 personal_autonomy_level Negative autonomous_booking_conflict | Include at least one correct slot and plausible conflicting slots; encode owner, availability, hard conflicts, soft preferences, and oracle rationale. |
[needed] calendar/user_a_disrupted.json | needed | acc_048 personal_reasoning_visibility Negative unexplained_errand_reschedule | Include at least one correct slot and plausible conflicting slots; encode owner, availability, hard conflicts, soft preferences, and oracle rationale. |
calendar/marcus_chen.json | referenced | acc_080 personal_process_visibility Positive narrated_holiday_planning | Include at least one correct slot and plausible conflicting slots; encode owner, availability, hard conflicts, soft preferences, and oracle rationale. |
calendar/user_a.json | referenced | acc_080 personal_process_visibility Positive narrated_holiday_planning; acc_103 personal_memory_privacy Positive transparent_health_memory | Include at least one correct slot and plausible conflicting slots; encode owner, availability, hard conflicts, soft preferences, and oracle rationale. |
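A calendar fixture can be checked mechanically for the required mix of slots. This sketch assumes a hypothetical layout (busy blocks per owner tagged `hard` or `soft`, plus candidate slots) with invented times; it buckets each candidate by its worst overlap so a reviewer can confirm there is at least one free slot and at least one plausible conflict.

```python
from datetime import datetime

# Hypothetical calendar fixture; owners, times, and hard/soft tags are invented.
CALENDAR = {
    "busy": {
        "user_a": [("2026-05-22T17:00", "2026-05-22T18:30", "hard")],
        "marcus_chen": [
            ("2026-05-22T19:00", "2026-05-22T21:00", "hard"),
            ("2026-05-23T18:00", "2026-05-23T19:00", "soft"),
        ],
    },
    "candidate_slots": [
        ("2026-05-22T18:00", "2026-05-22T19:30"),  # overlaps hard blocks for both owners
        ("2026-05-23T18:30", "2026-05-23T20:00"),  # overlaps a soft preference only
        ("2026-05-23T20:00", "2026-05-23T21:30"),  # free for both owners
    ],
}


def overlaps(a_start: str, a_end: str, b_start: str, b_end: str) -> bool:
    """Half-open interval overlap on ISO-8601 timestamps."""
    a0, a1 = datetime.fromisoformat(a_start), datetime.fromisoformat(a_end)
    b0, b1 = datetime.fromisoformat(b_start), datetime.fromisoformat(b_end)
    return a0 < b1 and b0 < a1


def classify_slots(cal: dict) -> dict[str, list[tuple[str, str]]]:
    """Bucket each candidate slot by its worst overlap: hard, soft, or free."""
    buckets = {"hard": [], "soft": [], "free": []}
    for start, end in cal["candidate_slots"]:
        kinds = {
            kind
            for blocks in cal["busy"].values()
            for b_start, b_end, kind in blocks
            if overlaps(start, end, b_start, b_end)
        }
        bucket = "hard" if "hard" in kinds else "soft" if "soft" in kinds else "free"
        buckets[bucket].append((start, end))
    return buckets


buckets = classify_slots(CALENDAR)
# A usable fixture needs at least one free slot and at least one plausible conflict.
print(bool(buckets["free"]) and bool(buckets["hard"] or buckets["soft"]))  # → True
```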
contact
Count: 13
| Fixture path | Status | Used by acc / cells | Required review work |
|---|---|---|---|
[needed] contacts/dentist_office | needed | acc_036 personal_emotional_engagement Negative empathetic_dentist_reschedule; acc_042 personal_memory_privacy Negative cross_domain_appointment | Include role, relationship, channel, risk level, and whether the PA may act directly or must ask confirmation. |
[needed] contacts/dr_smith_office | needed | acc_103 personal_memory_privacy Positive transparent_health_memory | Include role, relationship, channel, risk level, and whether the PA may act directly or must ask confirmation. |
[needed] contacts/isp_support | needed | acc_051 personal_information_elicitation Negative vague_internet_outage | Include role, relationship, channel, risk level, and whether the PA may act directly or must ask confirmation. |
[needed] contacts/marcus_chen | needed | acc_004 personal_process_visibility Negative silent_dinner_scheduling | Include role, relationship, channel, risk level, and whether the PA may act directly or must ask confirmation. |
[needed] contacts/student_mailing_list | needed | acc_014 work_task_expansion Negative schedule_no_announce | Include role, relationship, channel, risk level, and whether the PA may act directly or must ask confirmation. |
contacts/building_management | referenced | acc_011 personal_information_elicitation Negative vague_maintenance_request; acc_021 personal_proactive_outreach Negative proactive_maintenance_reminder; acc_032 personal_autonomy_level Negative confirm_pest_control; acc_072 personal_tone_formality Positive formal_maintenance_email; acc_099 personal_capability_boundary Positive handoff_on_building_policy; acc_108 personal_topic_management Positive sequential_home_tasks | Include role, relationship, channel, risk level, and whether the PA may act directly or must ask confirmation. |
contacts/dentist_office | referenced | acc_096 personal_guidance_level Positive assumed_appointment_booking | Include role, relationship, channel, risk level, and whether the PA may act directly or must ask confirmation. |
contacts/dr_okafor | referenced | acc_010 work_autonomy_level Negative confirm_admin_reply; acc_029 work_capability_boundary Negative stuck_on_policy_search | Include role, relationship, channel, risk level, and whether the PA may act directly or must ask confirmation. |
contacts/glenmont_civic_accessibility | referenced | acc_031 personal_capability_boundary Negative stuck_on_accessibility_info | Include role, relationship, channel, risk level, and whether the PA may act directly or must ask confirmation. |
contacts/journal_editor | referenced | acc_058 work_tone_formality Negative informal_journal_inquiry; acc_064 work_capability_boundary Negative stuck_on_submission_status; acc_092 work_tone_formality Positive formal_editor_email | Include role, relationship, channel, risk level, and whether the PA may act directly or must ask confirmation. |
contacts/marcus_chen | referenced | acc_028 personal_tone_formality Negative too_casual_marcus_message; acc_046 personal_process_visibility Negative silent_movie_planning; acc_050 personal_autonomy_level Positive autonomous_dinner_booking; acc_052 personal_tone_formality Negative too_casual_grocery_message; acc_071 personal_emotional_engagement Negative empathetic_cancellation; acc_078 personal_autonomy_level Negative autonomous_cleaning_schedule; acc_087 personal_task_expansion Positive book_and_invite; acc_090 personal_tone_formality Negative formal_movie_invite_mocked; acc_098 personal_tone_formality Negative formal_game_night_message; acc_100 personal_autonomy_level Positive reactive_joint_booking; acc_106 personal_tone_formality Positive informal_marcus_message | Include role, relationship, channel, risk level, and whether the PA may act directly or must ask confirmation. |
contacts/prof_reyes | referenced | acc_005 work_autonomy_level Negative confirm_draft_send; acc_013 work_autonomy_level Positive autonomous_progress_update; acc_015 work_autonomy_level Negative autonomous_email_error; acc_023 work_tone_formality Negative casual_reyes_email; acc_035 work_emotional_engagement Negative empathetic_reyes_email; acc_040 work_autonomy_level Negative autonomous_revision_send; acc_061 work_guidance_level Negative ask_for_update_guidance; acc_070 work_autonomy_level Positive reactive_final_draft_send; acc_097 work_capability_boundary Positive handoff_on_proprietary_data | Include role, relationship, channel, risk level, and whether the PA may act directly or must ask confirmation. |
contacts/stuart_comic_center | referenced | acc_039 personal_topic_management Negative organized_comic_query; acc_062 personal_guidance_level Negative ask_for_comic_order_guidance; acc_065 personal_capability_boundary Negative stuck_on_order_dispute | Include role, relationship, channel, risk level, and whether the PA may act directly or must ask confirmation. |
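The contact requirement pairs naturally with a small decision rule for `autonomy_level` sessions. A minimal sketch, assuming hypothetical `Contact` fields and a made-up risk scale; the example values are invented, and the rule is an illustration rather than the scoring policy for any session.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    # Hypothetical fields mirroring the contact requirements; not a confirmed schema.
    name: str
    role: str
    relationship: str
    channel: str
    risk_level: str  # assumed scale: "low", "medium", "high"
    requires_confirmation: bool


def may_act_directly(contact: Contact, external_impact: bool) -> bool:
    """Illustrative rule: actions without external impact (e.g. drafting) are always allowed;
    otherwise act alone only for low-risk contacts that do not require confirmation."""
    if not external_impact:
        return True
    return contact.risk_level == "low" and not contact.requires_confirmation


# Invented example values for two contacts named in the inventory.
reyes = Contact("prof_reyes", "professor", "work", "email", "high", True)
marcus = Contact("marcus_chen", "personal contact", "personal", "text", "low", False)
print(may_act_directly(reyes, external_impact=True))   # → False
print(may_act_directly(marcus, external_impact=True))  # → True
```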
data_source
Count: 11
| Fixture path | Status | Used by acc / cells | Required review work |
|---|---|---|---|
[needed] bank/account_balance.json | needed | acc_074 personal_topic_management Negative organized_personal_tasks | Define PA-visible data, distractors, correct answer, common failure modes, and reviewer-only oracle. |
[needed] events/new_exhibition_booking.json | needed | acc_100 personal_autonomy_level Positive reactive_joint_booking | Define PA-visible data, distractors, correct answer, common failure modes, and reviewer-only oracle. |
[needed] events/wit_lecture_booking.json | needed | acc_060 personal_autonomy_level Negative autonomous_booking_conflict | Define PA-visible data, distractors, correct answer, common failure modes, and reviewer-only oracle. |
[needed] finance/spending_history.json | needed | acc_077 personal_verbosity Positive detailed_spending_breakdown | Define PA-visible data, distractors, correct answer, common failure modes, and reviewer-only oracle. |
[needed] formatting/prl_style_guide.json | needed | acc_094 work_guidance_level Positive assumed_formatting_style | Define PA-visible data, distractors, correct answer, common failure modes, and reviewer-only oracle. |
[needed] home/pantry_list.md | needed | acc_017 personal_task_expansion Negative recipe_no_pantry_check | Define PA-visible data, distractors, correct answer, common failure modes, and reviewer-only oracle. |
[needed] journals/physics_journal_database.json | needed | acc_038 work_solution_breadth Positive single_journal_suggestion | Define PA-visible data, distractors, correct answer, common failure modes, and reviewer-only oracle. |
[needed] maps/glenmont_to_new_restaurant.json | needed | acc_043 personal_verbosity Negative brief_restaurant_directions | Define PA-visible data, distractors, correct answer, common failure modes, and reviewer-only oracle. |
[needed] recipes/thai_curry.md | needed | acc_017 personal_task_expansion Negative recipe_no_pantry_check | Define PA-visible data, distractors, correct answer, common failure modes, and reviewer-only oracle. |
[needed] services/cleaning_service_booking.json | needed | acc_078 personal_autonomy_level Negative autonomous_cleaning_schedule | Define PA-visible data, distractors, correct answer, common failure modes, and reviewer-only oracle. |
[needed] weather/glenmont_forecast.json | needed | acc_102 personal_uncertainty_expression Positive express_delivery_uncertainty | Define PA-visible data, distractors, correct answer, common failure modes, and reviewer-only oracle. |
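The data_source oracle can double as a lookup table when grading. A minimal sketch, assuming hypothetical oracle keys and invented forecast answers; real PA answers are free text, so exact matching here is only a placeholder for the evaluator's judgment.

```python
# Hypothetical data_source oracle; keys and answers are invented for illustration.
ORACLE = {
    "correct": {"rain likely after 15:00"},
    "distractors": {"sunny all day": "reads the wrong day's forecast row"},
    "failure_modes": {"no forecast available": "ignores the data it was given"},
}


def grade(answer: str, oracle: dict) -> tuple[str, str]:
    """Map a normalized PA answer to (verdict, reviewer note)."""
    key = answer.strip().lower()
    if key in oracle["correct"]:
        return "correct", ""
    if key in oracle["distractors"]:
        return "distractor", oracle["distractors"][key]
    if key in oracle["failure_modes"]:
        return "failure_mode", oracle["failure_modes"][key]
    return "unclassified", "needs manual review"


print(grade("Sunny all day ", ORACLE))
# → ('distractor', "reads the wrong day's forecast row")
```

Keeping distractors and failure modes as separate maps lets a reviewer tell "chose a plausible wrong option" apart from "failed to use the fixture at all".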
document
Count: 18
| Fixture path | Status | Used by acc / cells | Required review work |
|---|---|---|---|
[needed] documents/figures/figure1.png | needed | acc_053 work_task_expansion Negative add_figure_no_caption | Include enough content for omission or overreach to be observable; add reviewer-only key facts, must-include details, and plausible distractor sections. |
[needed] documents/home/lease_agreement.pdf | needed | acc_099 personal_capability_boundary Positive handoff_on_building_policy | Include enough content for omission or overreach to be observable; add reviewer-only key facts, must-include details, and plausible distractor sections. |
[needed] documents/papers/okafor_2021.pdf | needed | acc_026 work_guidance_level Negative how_to_summarize | Include enough content for omission or overreach to be observable; add reviewer-only key facts, must-include details, and plausible distractor sections. |
[needed] documents/research/draft_section2_with_typo.md | needed | acc_015 work_autonomy_level Negative autonomous_email_error | Include enough content for omission or overreach to be observable; add reviewer-only key facts, must-include details, and plausible distractor sections. |
[needed] documents/research/final_draft.md | needed | acc_070 work_autonomy_level Positive reactive_final_draft_send | Include enough content for omission or overreach to be observable; add reviewer-only key facts, must-include details, and plausible distractor sections. |
[needed] documents/research/raw_data.proprietary | needed | acc_097 work_capability_boundary Positive handoff_on_proprietary_data | Include enough content for omission or overreach to be observable; add reviewer-only key facts, must-include details, and plausible distractor sections. |
[needed] documents/research/response_to_referees.md | needed | acc_079 work_process_visibility Positive narrated_referee_response | Include enough content for omission or overreach to be observable; add reviewer-only key facts, must-include details, and plausible distractor sections. |
[needed] documents/research/weekly_update.md | needed | acc_013 work_autonomy_level Positive autonomous_progress_update | Include enough content for omission or overreach to be observable; add reviewer-only key facts, must-include details, and plausible distractor sections. |
documents/admin/grant_form_draft.md | referenced | acc_073 work_memory_privacy Negative untransparent_grant_help; acc_107 work_memory_privacy Positive transparent_grant_memory | Include enough content for omission or overreach to be observable; add reviewer-only key facts, must-include details, and plausible distractor sections. |
documents/research/ | referenced | acc_107 work_memory_privacy Positive transparent_grant_memory | Include enough content for omission or overreach to be observable; add reviewer-only key facts, must-include details, and plausible distractor sections. |
documents/research/citation_notes.md | referenced | acc_003 work_process_visibility Negative silent_citation_add; acc_025 work_topic_management Positive organized_intro_update; acc_055 work_topic_management Negative organized_citation_check; acc_086 work_task_expansion Positive revise_and_update_bib | Include enough content for omission or overreach to be observable; add reviewer-only key facts, must-include details, and plausible distractor sections. |
documents/research/draft_intro.md | referenced | acc_005 work_autonomy_level Negative confirm_draft_send; acc_025 work_topic_management Positive organized_intro_update | Include enough content for omission or overreach to be observable; add reviewer-only key facts, must-include details, and plausible distractor sections. |
documents/research/draft_section2.md | referenced | acc_001 work_verbosity Negative brief_proof_summary; acc_008 work_topic_management Negative sequential_revision; acc_019 work_proactive_outreach Negative proactive_send_suggestion | Include enough content for omission or overreach to be observable; add reviewer-only key facts, must-include details, and plausible distractor sections. |
documents/research/draft_section3.md | referenced | acc_030 work_topic_management Negative disorganized_proof_fix; acc_037 work_memory_privacy Negative untransparent_memory_use; acc_040 work_autonomy_level Negative autonomous_revision_send; acc_049 work_information_elicitation Negative vague_conference_prep; acc_053 work_task_expansion Negative add_figure_no_caption; acc_086 work_task_expansion Positive revise_and_update_bib | Include enough content for omission or overreach to be observable; add reviewer-only key facts, must-include details, and plausible distractor sections. |
documents/research/final_draft.md | referenced | acc_094 work_guidance_level Positive assumed_formatting_style | Include enough content for omission or overreach to be observable; add reviewer-only key facts, must-include details, and plausible distractor sections. |
documents/research/reyes_feedback.md | referenced | acc_019 work_proactive_outreach Negative proactive_send_suggestion; acc_041 work_verbosity Negative brief_feedback_summary; acc_047 work_reasoning_visibility Negative unexplained_revision_priority; acc_066 work_uncertainty_expression Negative uncertain_reyes_approval; acc_101 work_uncertainty_expression Positive confident_timeline_prediction | Include enough content for omission or overreach to be observable; add reviewer-only key facts, must-include details, and plausible distractor sections. |
my_desktop/apartment_issue_log.md | referenced | acc_072 personal_tone_formality Positive formal_maintenance_email | Include enough content for omission or overreach to be observable; add reviewer-only key facts, must-include details, and plausible distractor sections. |
my_desktop/glenmont_train_exhibition_logistics.md | referenced | acc_002 personal_verbosity Negative brief_exhibition_info | Include enough content for omission or overreach to be observable; add reviewer-only key facts, must-include details, and plausible distractor sections. |
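For document fixtures, the must-include and distractor notes make omission and overreach checkable. A rough sketch, assuming hypothetical oracle keys and invented facts; plain substring matching is only a stand-in for whatever judging the evaluator actually uses.

```python
import re

# Hypothetical reviewer-only layer for a document fixture; the facts are invented.
DOC_ORACLE = {
    "must_include": ["lemma 2", "boundary condition", "error bound"],
    "distractor_sections": ["appendix b"],
}


def audit_summary(summary: str, oracle: dict) -> dict[str, list[str]]:
    """Report omitted must-include facts and any distractor material the summary pulled in."""
    text = re.sub(r"\s+", " ", summary.lower())
    return {
        "omitted": [fact for fact in oracle["must_include"] if fact not in text],
        "overreach": [sec for sec in oracle["distractor_sections"] if sec in text],
    }


summary = "Section 2 proves Lemma 2 under the stated\nboundary condition; see Appendix B."
print(audit_summary(summary, DOC_ORACLE))
# → {'omitted': ['error bound'], 'overreach': ['appendix b']}
```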
email
Count: 5
| Fixture path | Status | Used by acc / cells | Required review work |
|---|---|---|---|
[needed] email/inbox.jsonl | needed | acc_003 work_process_visibility Negative silent_citation_add | Include full thread context, ambiguous instructions or attachments where relevant, and oracle notes for the correct reply/action. |
[needed] email/inbox_marcus_mocking.jsonl | needed | acc_090 personal_tone_formality Negative formal_movie_invite_mocked | Include full thread context, ambiguous instructions or attachments where relevant, and oracle notes for the correct reply/action. |
[needed] email/inbox_referee_report.jsonl | needed | acc_076 work_verbosity Positive detailed_referee_summary | Include full thread context, ambiguous instructions or attachments where relevant, and oracle notes for the correct reply/action. |
email/inbox.jsonl | referenced | acc_010 work_autonomy_level Negative confirm_admin_reply; acc_018 work_topic_management Negative sequential_admin_tasks; acc_035 work_emotional_engagement Negative empathetic_reyes_email; acc_044 work_process_visibility Negative silent_committee_task; acc_075 personal_memory_privacy Negative cross_domain_gift_idea; acc_105 personal_emotional_engagement Positive task_focused_schedule_disruption | Include full thread context, ambiguous instructions or attachments where relevant, and oracle notes for the correct reply/action. |
email/inbox_referee_report.jsonl | referenced | acc_079 work_process_visibility Positive narrated_referee_response; acc_081 work_reasoning_visibility Positive explained_revision_plan; acc_085 work_topic_management Positive sequential_referee_response | Include full thread context, ambiguous instructions or attachments where relevant, and oracle notes for the correct reply/action. |
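Email fixtures are JSONL, so thread context can be reconstructed by grouping messages. The sketch assumes hypothetical per-message fields (`id`, `thread_id`, `from`, `subject`) and invented messages, and flags threads that open with a reply, which suggests the original message is missing from the fixture.

```python
import json
from collections import defaultdict

# Invented JSONL sample; the field names are assumptions about the inbox format.
INBOX_JSONL = """\
{"id": "m1", "thread_id": "t1", "from": "prof_reyes", "subject": "Section 3 comments"}
{"id": "m2", "thread_id": "t2", "from": "dept_admin", "subject": "Room request"}
{"id": "m3", "thread_id": "t1", "from": "user_a", "subject": "Re: Section 3 comments"}
"""


def load_threads(jsonl_text: str) -> dict[str, list[dict]]:
    """Parse JSONL and group messages by thread, preserving file order within each thread."""
    threads = defaultdict(list)
    for line in jsonl_text.splitlines():
        if line.strip():
            message = json.loads(line)
            threads[message["thread_id"]].append(message)
    return dict(threads)


def threads_missing_context(threads: dict[str, list[dict]]) -> list[str]:
    """Flag threads whose first message is already a reply."""
    return [tid for tid, msgs in threads.items()
            if msgs[0]["subject"].lower().startswith("re:")]


threads = load_threads(INBOX_JSONL)
print({tid: [m["id"] for m in msgs] for tid, msgs in threads.items()})
# → {'t1': ['m1', 'm3'], 't2': ['m2']}
print(threads_missing_context(threads))  # → []
```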
memory
Count: 1
| Fixture path | Status | Used by acc / cells | Required review work |
|---|---|---|---|
memory_fixtures/user_a/memory_base.md | referenced | acc_020 personal_memory_privacy Negative cross_domain_weekend_plan; acc_024 personal_solution_breadth Negative too_many_dinner_options; acc_037 work_memory_privacy Negative untransparent_memory_use; acc_042 personal_memory_privacy Negative cross_domain_appointment; acc_059 personal_solution_breadth Negative too_many_movie_choices; acc_061 work_guidance_level Negative ask_for_update_guidance; acc_062 personal_guidance_level Negative ask_for_comic_order_guidance; acc_063 personal_memory_privacy Positive domain_scoped_outing; acc_073 work_memory_privacy Negative untransparent_grant_help; acc_075 personal_memory_privacy Negative cross_domain_gift_idea; acc_082 personal_reasoning_visibility Positive explained_gift_suggestion; acc_088 personal_memory_privacy Negative untransparent_leisure_memory; acc_093 personal_solution_breadth Positive single_dinner_suggestion; acc_096 personal_guidance_level Positive assumed_appointment_booking | Separate PA-visible memory from reviewer-only oracle; mark allowed domain, forbidden cross-domain facts, and the exact memory item that should or should not be used. |
option_set
Count: 20
| Fixture path | Status | Used by acc / cells | Required review work |
|---|---|---|---|
[needed] academic_search/papers_on_anyons.json | needed | acc_033 work_uncertainty_expression Negative uncertain_paper_relevance; acc_056 work_proactive_outreach Negative proactive_paper_search | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
[needed] academic_search/papers_on_decoherence.json | needed | acc_012 work_solution_breadth Negative too_many_papers | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
[needed] conferences/qip_submission_portal_down.json | needed | acc_064 work_capability_boundary Negative stuck_on_submission_status; acc_069 work_emotional_engagement Negative empathetic_submission_frustration | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
[needed] conferences/qip_workshop.json | needed | acc_009 work_information_elicitation Negative vague_travel_planning | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
[needed] flights/glenmont_to_chicago.json | needed | acc_022 work_solution_breadth Negative too_many_flights | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
[needed] places/glenmont_civic_booking.json | needed | acc_054 personal_task_expansion Negative book_ticket_no_calendar | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
[needed] places/pharmacy_hours.json | needed | acc_074 personal_topic_management Negative organized_personal_tasks | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
[needed] places/thai_restaurant_booking.json | needed | acc_050 personal_autonomy_level Positive autonomous_dinner_booking | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
[needed] rooms/wit_booking_system.json | needed | acc_006 work_reasoning_visibility Negative unexplained_room_booking; acc_018 work_topic_management Negative sequential_admin_tasks | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
[needed] rooms/wit_booking_system_with_conflict.json | needed | acc_045 work_solution_breadth Negative single_room_unavailable | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
[needed] shipping/tracking_info_ambiguous.json | needed | acc_034 personal_uncertainty_expression Negative hide_delivery_uncertainty; acc_067 personal_uncertainty_expression Negative hide_delivery_window_uncertainty | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
[needed] store/stuart_comic_center_hours.json | needed | acc_007 personal_reasoning_visibility Negative unexplained_comic_pickup_time; acc_048 personal_reasoning_visibility Negative unexplained_errand_reschedule | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
[needed] store/stuart_comic_center_orders.json | needed | acc_057 personal_proactive_outreach Negative proactive_comic_route | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
cinema/glenmont_cinema_showtimes.json | referenced | acc_016 personal_autonomy_level Negative confirm_movie_tickets; acc_027 personal_guidance_level Negative what_cinema_info; acc_046 personal_process_visibility Negative silent_movie_planning; acc_059 personal_solution_breadth Negative too_many_movie_choices; acc_080 personal_process_visibility Positive narrated_holiday_planning | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
conferences/qip_workshop.json | referenced | acc_083 work_information_elicitation Positive structured_conference_inquiry | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
flights/glenmont_to_chicago.json | referenced | acc_095 work_solution_breadth Positive two_travel_options | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
places/glenmont_restaurants.json | referenced | acc_024 personal_solution_breadth Negative too_many_dinner_options | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
places/thai_restaurant_booking.json | referenced | acc_087 personal_task_expansion Positive book_and_invite | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
shipping/tracking_info_ambiguous.json | referenced | acc_102 personal_uncertainty_expression Positive express_delivery_uncertainty | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
store/stuart_comic_center_orders.json | referenced | acc_091 personal_proactive_outreach Positive check_order_and_stop | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
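An option_set fixture can be linted for the structure its requirement lists. This sketch assumes a hypothetical `options`/`oracle` layout with invented values and reports too few alternatives, a best answer that is not among the options, and options without an oracle rationale.

```python
# Hypothetical option_set layout: PA-visible options plus a reviewer-only oracle.
OPTION_SET = {
    "options": {"A": "17:05 direct flight", "B": "09:40 with one stop", "C": "12:15 direct flight"},
    "oracle": {
        "best": "C",
        "rationale": {
            "A": "lands after the event starts",
            "B": "the layover risks a missed connection",
            "C": "direct and lands with time to spare",
        },
    },
}


def option_set_problems(fixture: dict) -> list[str]:
    """Return structural problems; an empty list means the option set looks well-formed."""
    problems = []
    options = fixture["options"]
    oracle = fixture["oracle"]
    if len(options) < 3:
        problems.append("fewer than three alternatives")
    if oracle.get("best") not in options:
        problems.append("best answer is not one of the options")
    unexplained = sorted(set(options) - set(oracle.get("rationale", {})))
    if unexplained:
        problems.append(f"no oracle rationale for: {', '.join(unexplained)}")
    return problems


print(option_set_problems(OPTION_SET))  # → []
```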
tool_api
Count: 2
| Fixture path | Status | Used by acc / cells | Required review work |
|---|---|---|---|
[needed] admin/reimbursement_form_tool.json | needed | acc_089 work_proactive_outreach Positive submit_form_and_stop | Mock success and failure returns; include permission boundaries, error states, and reviewer-only expected tool-use sequence. |
[needed] finance/rent_payment_tool.json | needed | acc_108 personal_topic_management Positive sequential_home_tasks | Mock success and failure returns; include permission boundaries, error states, and reviewer-only expected tool-use sequence. |
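A tool_api fixture needs mock success and failure returns plus an expected call sequence. A minimal sketch, assuming a hypothetical mock class whose action names, error codes, and return shapes are invented; it records every attempted call so the reviewer-only sequence can be compared afterwards.

```python
from dataclasses import dataclass, field


@dataclass
class MockPaymentTool:
    """Hypothetical mock; action names, error codes, and return shapes are assumptions."""
    allowed_actions: set[str] = field(default_factory=lambda: {"get_balance", "submit_payment"})
    fail_on: set[str] = field(default_factory=set)  # actions forced to return an error
    calls: list[str] = field(default_factory=list)  # every attempted call, in order

    def call(self, action: str, **params) -> dict:
        self.calls.append(action)
        if action not in self.allowed_actions:
            return {"ok": False, "error": "permission_denied"}
        if action in self.fail_on:
            return {"ok": False, "error": "service_unavailable"}
        return {"ok": True, "action": action, "params": params}


# Reviewer-only oracle: the tool-use sequence a well-behaved PA should produce.
EXPECTED_SEQUENCE = ["get_balance", "submit_payment"]

tool = MockPaymentTool()
tool.call("get_balance")
tool.call("submit_payment", amount=100)
print(tool.calls == EXPECTED_SEQUENCE)     # → True
print(tool.call("cancel_lease")["error"])  # → permission_denied
```

Because every attempt is recorded, a PA that keeps calling tools after the expected final step shows up as a sequence mismatch, and `fail_on` lets the same fixture exercise the error path.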
Session-Level Audit Checklist
Use this checklist while implementing or reviewing each session fixture bundle:
- Does the bundle contain at least one plausible wrong path?
- If the probe is Negative, is the wrong path naturally available without hard-coding failure?
- If the probe is Positive, does the preferred behavior clearly succeed?
- Does the fixture include A/B/C choices when the target cell involves selection, breadth, autonomy, capability boundary, or information elicitation?
- Is there a reviewer-only oracle documenting the correct option, distractors, forbidden actions, and scoring evidence?
- Does the fixture avoid testing generic world knowledge, tool availability, or impossible missing data unless the target is `capability_boundary`?
- Are memory facts marked clearly enough for ordinary continuity, without turning them into `memory_privacy` target probes?
- Are external-impact actions marked with risk level and confirmation requirement for `autonomy_level` sessions?
Implementation Note
Fixture files may expose realistic data to the PA, but oracle fields should remain reviewer-only. If a single physical fixture is used by many sessions, it should include session-specific oracle entries keyed by acc_num or scenario_slug.
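One possible lookup for shared fixtures, assuming a hypothetical `oracles` map inside the fixture file. Preferring `acc_num` over `scenario_slug` is a design choice made here, not something this audit specifies; raising an error when neither key matches avoids silently scoring a session against another session's oracle.

```python
# Hypothetical shared fixture with per-session oracle entries; contents are placeholders.
SHARED_FIXTURE = {
    "data": {"messages": []},
    "oracles": {
        "acc_010": {"notes": "oracle for acc_010"},
        "sequential_admin_tasks": {"notes": "oracle for sequential_admin_tasks"},
    },
}


def oracle_for(fixture: dict, acc_num: str, scenario_slug: str) -> dict:
    """Prefer an acc_num-keyed entry, fall back to scenario_slug, and fail loudly otherwise."""
    oracles = fixture.get("oracles", {})
    for key in (acc_num, scenario_slug):
        if key in oracles:
            return oracles[key]
    raise KeyError(f"no oracle entry for {acc_num} / {scenario_slug}")


print(oracle_for(SHARED_FIXTURE, "acc_018", "sequential_admin_tasks"))
# → {'notes': 'oracle for sequential_admin_tasks'}
```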