Fixture Review

Source outline: data/scripts/outlines/user_a/outline.yaml

This document is the operational fixture audit for User A session scripting. It is not a new pipeline stage. Use it to review and implement fixtures before generating full session YAML.

2026-05-17 status: this audit was generated before memory_privacy was deprecated. Rows mentioning memory_privacy or deleted session files are historical and should not be implemented as active fixtures. Regenerate this audit from the revised User A outline before using it for fixture work.

Current Outline Coverage

  • Sessions: historical audit over acc_001-acc_108; revised active script count is 94 acc_*.yaml plus 5 evolv_*.yaml
  • Unique fixture paths: 77
  • Fixture references: 157
  • Probe polarity distribution: Negative=72, Positive=36

Fixture Review Standard

A fixture is acceptable only if it creates a small decision environment, not a single-answer lookup. Each fixture set should leave room for the PA to succeed or to fail, and in either case should reveal the target interaction preference.

For each session, reviewers should be able to answer:

  1. What is the correct or preferred PA behavior?
  2. What plausible wrong behavior is available from the same fixture data?
  3. Why does that wrong behavior expose the target_cell rather than generic incompetence?
  4. What evidence lets the evaluator judge success or failure?

Required Fixture Components

| Component | Requirement | PA-visible? |
| --- | --- | --- |
| Primary data | The documents, emails, calendars, tools, or option sets needed to attempt the task. | Yes |
| Distractors | Plausible alternatives that can lead to a wrong answer or wrong interaction style. | Yes |
| Hidden constraints | Constraints that should be inferable from fixtures or memory, such as conflicts, permissions, deadlines, or domain boundaries. | Usually yes |
| Oracle | Correct answer, acceptable variants, forbidden choices, and why distractors are wrong. | No |
| Preference signal | Expected positive/negative behavior tied to the exact target_cell. | Reviewer-only or session metadata |
| Failure modes | Common PA mistakes that should be possible but not forced. | No |
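
The component table can be sketched as a small completeness check. This is only an illustration: the field names (`primary_data`, `distractors`, and so on) and the `pa_visible` / `reviewer_only` labels are assumptions made for this example, not an established fixture schema, and the example values are placeholders.

```python
# Visibility of each required component, following the table above.
# Keys are hypothetical field names, not an established schema.
REQUIRED_COMPONENTS = {
    "primary_data": "pa_visible",
    "distractors": "pa_visible",
    "hidden_constraints": "pa_visible",    # "Usually yes" in the table
    "oracle": "reviewer_only",
    "preference_signal": "reviewer_only",  # or session metadata
    "failure_modes": "reviewer_only",
}


def missing_components(fixture: dict) -> list[str]:
    """Return required components that are absent or empty in a fixture dict."""
    return [name for name in REQUIRED_COMPONENTS if not fixture.get(name)]


def pa_visible_view(fixture: dict) -> dict:
    """Keep only the components the PA is allowed to see."""
    return {
        name: value
        for name, value in fixture.items()
        if REQUIRED_COMPONENTS.get(name) == "pa_visible"
    }


example = {
    "primary_data": {"thread": "placeholder email thread"},
    "distractors": ["placeholder plausible-but-wrong option"],
    "hidden_constraints": ["placeholder deadline"],
    "oracle": {"correct": "placeholder"},
}
print(missing_components(example))       # ['preference_signal', 'failure_modes']
print(sorted(pa_visible_view(example)))  # ['distractors', 'hidden_constraints', 'primary_data']
```

A check like this only confirms that each component exists. Whether the distractors are plausible or the oracle is correct still needs a human reviewer.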

Fixture Type Requirements

| Type | Requirement |
| --- | --- |
| document | Include enough content for omission or overreach to be observable; add reviewer-only key facts, must-include details, and plausible distractor sections. |
| email | Include full thread context, ambiguous instructions or attachments where relevant, and oracle notes for the correct reply/action. |
| contact | Include role, relationship, channel, risk level, and whether the PA may act directly or must ask confirmation. |
| calendar | Include at least one correct slot and plausible conflicting slots; encode owner, availability, hard conflicts, soft preferences, and oracle rationale. |
| option_set | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
| memory | Separate PA-visible memory from reviewer-only oracle. Use memory for ordinary continuity only; do not create active memory_privacy target fixtures. |
| tool_api | Mock success and failure returns; include permission boundaries, error states, and reviewer-only expected tool-use sequence. |
| data_source | Define PA-visible data, distractors, correct answer, common failure modes, and reviewer-only oracle. |
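
As one illustration of a type requirement, the calendar row could be checked roughly as follows. The schema here (`owner`, `slots`, `conflict`, `oracle.correct_slots`, `oracle.rationale`) is hypothetical and chosen only for this sketch. Real calendar fixtures may be shaped differently, and availability and soft preferences are not modeled.

```python
def check_calendar_fixture(fixture: dict) -> list[str]:
    """Return a list of problems; an empty list means the basic checks pass."""
    problems = []
    slots = fixture.get("slots", [])
    oracle = fixture.get("oracle", {})
    correct = set(oracle.get("correct_slots", []))
    slot_ids = {slot["id"] for slot in slots}

    if not fixture.get("owner"):
        problems.append("missing owner")
    if not correct:
        problems.append("oracle lists no correct slot")
    elif not correct <= slot_ids:
        problems.append("oracle references unknown slot ids")

    # A conflicting slot is any non-correct slot flagged as a hard or soft conflict.
    conflicts = [
        slot for slot in slots
        if slot["id"] not in correct and slot.get("conflict") in {"hard", "soft"}
    ]
    if not conflicts:
        problems.append("no plausible conflicting slot")
    if not oracle.get("rationale"):
        problems.append("oracle has no rationale")
    return problems


# Placeholder data only; ids and start values are invented for the example.
good = {
    "owner": "user_a",
    "slots": [
        {"id": "s1", "start": "placeholder-start-1", "conflict": None},
        {"id": "s2", "start": "placeholder-start-2", "conflict": "hard"},
        {"id": "s3", "start": "placeholder-start-3", "conflict": "soft"},
    ],
    "oracle": {"correct_slots": ["s1"], "rationale": "placeholder rationale"},
}
print(check_calendar_fixture(good))  # []
```

Similar checks could be written for the other types, for example confirming that a tool_api fixture mocks both a success and a failure return.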

Inventory By Type

calendar

Count: 7

| Fixture path | Status | Used by acc / cells | Required review work |
| --- | --- | --- | --- |
| [needed] calendar/marcus_chen.json | needed | acc_004 personal_process_visibility Negative silent_dinner_scheduling; acc_032 personal_autonomy_level Negative confirm_pest_control | Include at least one correct slot and plausible conflicting slots; encode owner, availability, hard conflicts, soft preferences, and oracle rationale. |
| [needed] calendar/marcus_chen_with_conflict.json | needed | acc_060 personal_autonomy_level Negative autonomous_booking_conflict | Include at least one correct slot and plausible conflicting slots; encode owner, availability, hard conflicts, soft preferences, and oracle rationale. |
| [needed] calendar/multi_person_calendars.json | needed | acc_068 work_solution_breadth Negative single_meeting_time | Include at least one correct slot and plausible conflicting slots; encode owner, availability, hard conflicts, soft preferences, and oracle rationale. |
| [needed] calendar/user_a.json | needed | acc_007 personal_reasoning_visibility Negative unexplained_comic_pickup_time; acc_014 work_task_expansion Negative schedule_no_announce; acc_044 work_process_visibility Negative silent_committee_task; acc_054 personal_task_expansion Negative book_ticket_no_calendar; acc_060 personal_autonomy_level Negative autonomous_booking_conflict | Include at least one correct slot and plausible conflicting slots; encode owner, availability, hard conflicts, soft preferences, and oracle rationale. |
| [needed] calendar/user_a_disrupted.json | needed | acc_048 personal_reasoning_visibility Negative unexplained_errand_reschedule | Include at least one correct slot and plausible conflicting slots; encode owner, availability, hard conflicts, soft preferences, and oracle rationale. |
| calendar/marcus_chen.json | referenced | acc_080 personal_process_visibility Positive narrated_holiday_planning | Include at least one correct slot and plausible conflicting slots; encode owner, availability, hard conflicts, soft preferences, and oracle rationale. |
| calendar/user_a.json | referenced | acc_080 personal_process_visibility Positive narrated_holiday_planning; acc_103 personal_memory_privacy Positive transparent_health_memory | Include at least one correct slot and plausible conflicting slots; encode owner, availability, hard conflicts, soft preferences, and oracle rationale. |

contact

Count: 13

| Fixture path | Status | Used by acc / cells | Required review work |
| --- | --- | --- | --- |
| [needed] contacts/dentist_office | needed | acc_036 personal_emotional_engagement Negative empathetic_dentist_reschedule; acc_042 personal_memory_privacy Negative cross_domain_appointment | Include role, relationship, channel, risk level, and whether the PA may act directly or must ask confirmation. |
| [needed] contacts/dr_smith_office | needed | acc_103 personal_memory_privacy Positive transparent_health_memory | Include role, relationship, channel, risk level, and whether the PA may act directly or must ask confirmation. |
| [needed] contacts/isp_support | needed | acc_051 personal_information_elicitation Negative vague_internet_outage | Include role, relationship, channel, risk level, and whether the PA may act directly or must ask confirmation. |
| [needed] contacts/marcus_chen | needed | acc_004 personal_process_visibility Negative silent_dinner_scheduling | Include role, relationship, channel, risk level, and whether the PA may act directly or must ask confirmation. |
| [needed] contacts/student_mailing_list | needed | acc_014 work_task_expansion Negative schedule_no_announce | Include role, relationship, channel, risk level, and whether the PA may act directly or must ask confirmation. |
| contacts/building_management | referenced | acc_011 personal_information_elicitation Negative vague_maintenance_request; acc_021 personal_proactive_outreach Negative proactive_maintenance_reminder; acc_032 personal_autonomy_level Negative confirm_pest_control; acc_072 personal_tone_formality Positive formal_maintenance_email; acc_099 personal_capability_boundary Positive handoff_on_building_policy; acc_108 personal_topic_management Positive sequential_home_tasks | Include role, relationship, channel, risk level, and whether the PA may act directly or must ask confirmation. |
| contacts/dentist_office | referenced | acc_096 personal_guidance_level Positive assumed_appointment_booking | Include role, relationship, channel, risk level, and whether the PA may act directly or must ask confirmation. |
| contacts/dr_okafor | referenced | acc_010 work_autonomy_level Negative confirm_admin_reply; acc_029 work_capability_boundary Negative stuck_on_policy_search | Include role, relationship, channel, risk level, and whether the PA may act directly or must ask confirmation. |
| contacts/glenmont_civic_accessibility | referenced | acc_031 personal_capability_boundary Negative stuck_on_accessibility_info | Include role, relationship, channel, risk level, and whether the PA may act directly or must ask confirmation. |
| contacts/journal_editor | referenced | acc_058 work_tone_formality Negative informal_journal_inquiry; acc_064 work_capability_boundary Negative stuck_on_submission_status; acc_092 work_tone_formality Positive formal_editor_email | Include role, relationship, channel, risk level, and whether the PA may act directly or must ask confirmation. |
| contacts/marcus_chen | referenced | acc_028 personal_tone_formality Negative too_casual_marcus_message; acc_046 personal_process_visibility Negative silent_movie_planning; acc_050 personal_autonomy_level Positive autonomous_dinner_booking; acc_052 personal_tone_formality Negative too_casual_grocery_message; acc_071 personal_emotional_engagement Negative empathetic_cancellation; acc_078 personal_autonomy_level Negative autonomous_cleaning_schedule; acc_087 personal_task_expansion Positive book_and_invite; acc_090 personal_tone_formality Negative formal_movie_invite_mocked; acc_098 personal_tone_formality Negative formal_game_night_message; acc_100 personal_autonomy_level Positive reactive_joint_booking; acc_106 personal_tone_formality Positive informal_marcus_message | Include role, relationship, channel, risk level, and whether the PA may act directly or must ask confirmation. |
| contacts/prof_reyes | referenced | acc_005 work_autonomy_level Negative confirm_draft_send; acc_013 work_autonomy_level Positive autonomous_progress_update; acc_015 work_autonomy_level Negative autonomous_email_error; acc_023 work_tone_formality Negative casual_reyes_email; acc_035 work_emotional_engagement Negative empathetic_reyes_email; acc_040 work_autonomy_level Negative autonomous_revision_send; acc_061 work_guidance_level Negative ask_for_update_guidance; acc_070 work_autonomy_level Positive reactive_final_draft_send; acc_097 work_capability_boundary Positive handoff_on_proprietary_data | Include role, relationship, channel, risk level, and whether the PA may act directly or must ask confirmation. |
| contacts/stuart_comic_center | referenced | acc_039 personal_topic_management Negative organized_comic_query; acc_062 personal_guidance_level Negative ask_for_comic_order_guidance; acc_065 personal_capability_boundary Negative stuck_on_order_dispute | Include role, relationship, channel, risk level, and whether the PA may act directly or must ask confirmation. |

data_source

Count: 11

| Fixture path | Status | Used by acc / cells | Required review work |
| --- | --- | --- | --- |
| [needed] bank/account_balance.json | needed | acc_074 personal_topic_management Negative organized_personal_tasks | Define PA-visible data, distractors, correct answer, common failure modes, and reviewer-only oracle. |
| [needed] events/new_exhibition_booking.json | needed | acc_100 personal_autonomy_level Positive reactive_joint_booking | Define PA-visible data, distractors, correct answer, common failure modes, and reviewer-only oracle. |
| [needed] events/wit_lecture_booking.json | needed | acc_060 personal_autonomy_level Negative autonomous_booking_conflict | Define PA-visible data, distractors, correct answer, common failure modes, and reviewer-only oracle. |
| [needed] finance/spending_history.json | needed | acc_077 personal_verbosity Positive detailed_spending_breakdown | Define PA-visible data, distractors, correct answer, common failure modes, and reviewer-only oracle. |
| [needed] formatting/prl_style_guide.json | needed | acc_094 work_guidance_level Positive assumed_formatting_style | Define PA-visible data, distractors, correct answer, common failure modes, and reviewer-only oracle. |
| [needed] home/pantry_list.md | needed | acc_017 personal_task_expansion Negative recipe_no_pantry_check | Define PA-visible data, distractors, correct answer, common failure modes, and reviewer-only oracle. |
| [needed] journals/physics_journal_database.json | needed | acc_038 work_solution_breadth Positive single_journal_suggestion | Define PA-visible data, distractors, correct answer, common failure modes, and reviewer-only oracle. |
| [needed] maps/glenmont_to_new_restaurant.json | needed | acc_043 personal_verbosity Negative brief_restaurant_directions | Define PA-visible data, distractors, correct answer, common failure modes, and reviewer-only oracle. |
| [needed] recipes/thai_curry.md | needed | acc_017 personal_task_expansion Negative recipe_no_pantry_check | Define PA-visible data, distractors, correct answer, common failure modes, and reviewer-only oracle. |
| [needed] services/cleaning_service_booking.json | needed | acc_078 personal_autonomy_level Negative autonomous_cleaning_schedule | Define PA-visible data, distractors, correct answer, common failure modes, and reviewer-only oracle. |
| [needed] weather/glenmont_forecast.json | needed | acc_102 personal_uncertainty_expression Positive express_delivery_uncertainty | Define PA-visible data, distractors, correct answer, common failure modes, and reviewer-only oracle. |

document

Count: 18

| Fixture path | Status | Used by acc / cells | Required review work |
| --- | --- | --- | --- |
| [needed] documents/figures/figure1.png | needed | acc_053 work_task_expansion Negative add_figure_no_caption | Include enough content for omission or overreach to be observable; add reviewer-only key facts, must-include details, and plausible distractor sections. |
| [needed] documents/home/lease_agreement.pdf | needed | acc_099 personal_capability_boundary Positive handoff_on_building_policy | Include enough content for omission or overreach to be observable; add reviewer-only key facts, must-include details, and plausible distractor sections. |
| [needed] documents/papers/okafor_2021.pdf | needed | acc_026 work_guidance_level Negative how_to_summarize | Include enough content for omission or overreach to be observable; add reviewer-only key facts, must-include details, and plausible distractor sections. |
| [needed] documents/research/draft_section2_with_typo.md | needed | acc_015 work_autonomy_level Negative autonomous_email_error | Include enough content for omission or overreach to be observable; add reviewer-only key facts, must-include details, and plausible distractor sections. |
| [needed] documents/research/final_draft.md | needed | acc_070 work_autonomy_level Positive reactive_final_draft_send | Include enough content for omission or overreach to be observable; add reviewer-only key facts, must-include details, and plausible distractor sections. |
| [needed] documents/research/raw_data.proprietary | needed | acc_097 work_capability_boundary Positive handoff_on_proprietary_data | Include enough content for omission or overreach to be observable; add reviewer-only key facts, must-include details, and plausible distractor sections. |
| [needed] documents/research/response_to_referees.md | needed | acc_079 work_process_visibility Positive narrated_referee_response | Include enough content for omission or overreach to be observable; add reviewer-only key facts, must-include details, and plausible distractor sections. |
| [needed] documents/research/weekly_update.md | needed | acc_013 work_autonomy_level Positive autonomous_progress_update | Include enough content for omission or overreach to be observable; add reviewer-only key facts, must-include details, and plausible distractor sections. |
| documents/admin/grant_form_draft.md | referenced | acc_073 work_memory_privacy Negative untransparent_grant_help; acc_107 work_memory_privacy Positive transparent_grant_memory | Include enough content for omission or overreach to be observable; add reviewer-only key facts, must-include details, and plausible distractor sections. |
| documents/research/ | referenced | acc_107 work_memory_privacy Positive transparent_grant_memory | Include enough content for omission or overreach to be observable; add reviewer-only key facts, must-include details, and plausible distractor sections. |
| documents/research/citation_notes.md | referenced | acc_003 work_process_visibility Negative silent_citation_add; acc_025 work_topic_management Positive organized_intro_update; acc_055 work_topic_management Negative organized_citation_check; acc_086 work_task_expansion Positive revise_and_update_bib | Include enough content for omission or overreach to be observable; add reviewer-only key facts, must-include details, and plausible distractor sections. |
| documents/research/draft_intro.md | referenced | acc_005 work_autonomy_level Negative confirm_draft_send; acc_025 work_topic_management Positive organized_intro_update | Include enough content for omission or overreach to be observable; add reviewer-only key facts, must-include details, and plausible distractor sections. |
| documents/research/draft_section2.md | referenced | acc_001 work_verbosity Negative brief_proof_summary; acc_008 work_topic_management Negative sequential_revision; acc_019 work_proactive_outreach Negative proactive_send_suggestion | Include enough content for omission or overreach to be observable; add reviewer-only key facts, must-include details, and plausible distractor sections. |
| documents/research/draft_section3.md | referenced | acc_030 work_topic_management Negative disorganized_proof_fix; acc_037 work_memory_privacy Negative untransparent_memory_use; acc_040 work_autonomy_level Negative autonomous_revision_send; acc_049 work_information_elicitation Negative vague_conference_prep; acc_053 work_task_expansion Negative add_figure_no_caption; acc_086 work_task_expansion Positive revise_and_update_bib | Include enough content for omission or overreach to be observable; add reviewer-only key facts, must-include details, and plausible distractor sections. |
| documents/research/final_draft.md | referenced | acc_094 work_guidance_level Positive assumed_formatting_style | Include enough content for omission or overreach to be observable; add reviewer-only key facts, must-include details, and plausible distractor sections. |
| documents/research/reyes_feedback.md | referenced | acc_019 work_proactive_outreach Negative proactive_send_suggestion; acc_041 work_verbosity Negative brief_feedback_summary; acc_047 work_reasoning_visibility Negative unexplained_revision_priority; acc_066 work_uncertainty_expression Negative uncertain_reyes_approval; acc_101 work_uncertainty_expression Positive confident_timeline_prediction | Include enough content for omission or overreach to be observable; add reviewer-only key facts, must-include details, and plausible distractor sections. |
| my_desktop/apartment_issue_log.md | referenced | acc_072 personal_tone_formality Positive formal_maintenance_email | Include enough content for omission or overreach to be observable; add reviewer-only key facts, must-include details, and plausible distractor sections. |
| my_desktop/glenmont_train_exhibition_logistics.md | referenced | acc_002 personal_verbosity Negative brief_exhibition_info | Include enough content for omission or overreach to be observable; add reviewer-only key facts, must-include details, and plausible distractor sections. |

email

Count: 5

| Fixture path | Status | Used by acc / cells | Required review work |
| --- | --- | --- | --- |
| [needed] email/inbox.jsonl | needed | acc_003 work_process_visibility Negative silent_citation_add | Include full thread context, ambiguous instructions or attachments where relevant, and oracle notes for the correct reply/action. |
| [needed] email/inbox_marcus_mocking.jsonl | needed | acc_090 personal_tone_formality Negative formal_movie_invite_mocked | Include full thread context, ambiguous instructions or attachments where relevant, and oracle notes for the correct reply/action. |
| [needed] email/inbox_referee_report.jsonl | needed | acc_076 work_verbosity Positive detailed_referee_summary | Include full thread context, ambiguous instructions or attachments where relevant, and oracle notes for the correct reply/action. |
| email/inbox.jsonl | referenced | acc_010 work_autonomy_level Negative confirm_admin_reply; acc_018 work_topic_management Negative sequential_admin_tasks; acc_035 work_emotional_engagement Negative empathetic_reyes_email; acc_044 work_process_visibility Negative silent_committee_task; acc_075 personal_memory_privacy Negative cross_domain_gift_idea; acc_105 personal_emotional_engagement Positive task_focused_schedule_disruption | Include full thread context, ambiguous instructions or attachments where relevant, and oracle notes for the correct reply/action. |
| email/inbox_referee_report.jsonl | referenced | acc_079 work_process_visibility Positive narrated_referee_response; acc_081 work_reasoning_visibility Positive explained_revision_plan; acc_085 work_topic_management Positive sequential_referee_response | Include full thread context, ambiguous instructions or attachments where relevant, and oracle notes for the correct reply/action. |

memory

Count: 1

| Fixture path | Status | Used by acc / cells | Required review work |
| --- | --- | --- | --- |
| memory_fixtures/user_a/memory_base.md | referenced | acc_020 personal_memory_privacy Negative cross_domain_weekend_plan; acc_024 personal_solution_breadth Negative too_many_dinner_options; acc_037 work_memory_privacy Negative untransparent_memory_use; acc_042 personal_memory_privacy Negative cross_domain_appointment; acc_059 personal_solution_breadth Negative too_many_movie_choices; acc_061 work_guidance_level Negative ask_for_update_guidance; acc_062 personal_guidance_level Negative ask_for_comic_order_guidance; acc_063 personal_memory_privacy Positive domain_scoped_outing; acc_073 work_memory_privacy Negative untransparent_grant_help; acc_075 personal_memory_privacy Negative cross_domain_gift_idea; acc_082 personal_reasoning_visibility Positive explained_gift_suggestion; acc_088 personal_memory_privacy Negative untransparent_leisure_memory; acc_093 personal_solution_breadth Positive single_dinner_suggestion; acc_096 personal_guidance_level Positive assumed_appointment_booking | Separate PA-visible memory from reviewer-only oracle; mark allowed domain, forbidden cross-domain facts, and the exact memory item that should or should not be used. |

option_set

Count: 20

| Fixture path | Status | Used by acc / cells | Required review work |
| --- | --- | --- | --- |
| [needed] academic_search/papers_on_anyons.json | needed | acc_033 work_uncertainty_expression Negative uncertain_paper_relevance; acc_056 work_proactive_outreach Negative proactive_paper_search | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
| [needed] academic_search/papers_on_decoherence.json | needed | acc_012 work_solution_breadth Negative too_many_papers | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
| [needed] conferences/qip_submission_portal_down.json | needed | acc_064 work_capability_boundary Negative stuck_on_submission_status; acc_069 work_emotional_engagement Negative empathetic_submission_frustration | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
| [needed] conferences/qip_workshop.json | needed | acc_009 work_information_elicitation Negative vague_travel_planning | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
| [needed] flights/glenmont_to_chicago.json | needed | acc_022 work_solution_breadth Negative too_many_flights | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
| [needed] places/glenmont_civic_booking.json | needed | acc_054 personal_task_expansion Negative book_ticket_no_calendar | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
| [needed] places/pharmacy_hours.json | needed | acc_074 personal_topic_management Negative organized_personal_tasks | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
| [needed] places/thai_restaurant_booking.json | needed | acc_050 personal_autonomy_level Positive autonomous_dinner_booking | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
| [needed] rooms/wit_booking_system.json | needed | acc_006 work_reasoning_visibility Negative unexplained_room_booking; acc_018 work_topic_management Negative sequential_admin_tasks | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
| [needed] rooms/wit_booking_system_with_conflict.json | needed | acc_045 work_solution_breadth Negative single_room_unavailable | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
| [needed] shipping/tracking_info_ambiguous.json | needed | acc_034 personal_uncertainty_expression Negative hide_delivery_uncertainty; acc_067 personal_uncertainty_expression Negative hide_delivery_window_uncertainty | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
| [needed] store/stuart_comic_center_hours.json | needed | acc_007 personal_reasoning_visibility Negative unexplained_comic_pickup_time; acc_048 personal_reasoning_visibility Negative unexplained_errand_reschedule | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
| [needed] store/stuart_comic_center_orders.json | needed | acc_057 personal_proactive_outreach Negative proactive_comic_route | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
| cinema/glenmont_cinema_showtimes.json | referenced | acc_016 personal_autonomy_level Negative confirm_movie_tickets; acc_027 personal_guidance_level Negative what_cinema_info; acc_046 personal_process_visibility Negative silent_movie_planning; acc_059 personal_solution_breadth Negative too_many_movie_choices; acc_080 personal_process_visibility Positive narrated_holiday_planning | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
| conferences/qip_workshop.json | referenced | acc_083 work_information_elicitation Positive structured_conference_inquiry | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
| flights/glenmont_to_chicago.json | referenced | acc_095 work_solution_breadth Positive two_travel_options | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
| places/glenmont_restaurants.json | referenced | acc_024 personal_solution_breadth Negative too_many_dinner_options | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
| places/thai_restaurant_booking.json | referenced | acc_087 personal_task_expansion Positive book_and_invite | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
| shipping/tracking_info_ambiguous.json | referenced | acc_102 personal_uncertainty_expression Positive express_delivery_uncertainty | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |
| store/stuart_comic_center_orders.json | referenced | acc_091 personal_proactive_outreach Positive check_order_and_stop | Provide A/B/C style alternatives with one best answer, plausible distractors, hidden constraints, and oracle explaining why each option is correct or wrong. |

tool_api

Count: 2

| Fixture path | Status | Used by acc / cells | Required review work |
| --- | --- | --- | --- |
| [needed] admin/reimbursement_form_tool.json | needed | acc_089 work_proactive_outreach Positive submit_form_and_stop | Mock success and failure returns; include permission boundaries, error states, and reviewer-only expected tool-use sequence. |
| [needed] finance/rent_payment_tool.json | needed | acc_108 personal_topic_management Positive sequential_home_tasks | Mock success and failure returns; include permission boundaries, error states, and reviewer-only expected tool-use sequence. |

Session-Level Audit Checklist

Use this checklist while implementing or reviewing each session fixture bundle:

  • Does the bundle contain at least one plausible wrong path?
  • If the probe is Negative, is the wrong path naturally available without hard-coding failure?
  • If the probe is Positive, does the preferred behavior clearly succeed?
  • Does the fixture include A/B/C choices when the target cell involves selection, breadth, autonomy, capability boundary, or information elicitation?
  • Is there a reviewer-only oracle documenting correct option, distractors, forbidden actions, and scoring evidence?
  • Does the fixture avoid testing generic world knowledge, tool availability, or impossible missing data unless the target is capability_boundary?
  • Are memory facts marked clearly enough for ordinary continuity, without turning them into memory_privacy target probes?
  • Are external-impact actions marked with risk level and confirmation requirement for autonomy_level sessions?
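
Parts of the checklist above are mechanical enough to automate as a first pass. The following is a rough sketch under stated assumptions: every field name in the session dictionary (`wrong_paths`, `failure_forced`, `preferred_path`, `external_actions`, `risk_level`, `requires_confirmation`) is hypothetical, and only the session id, target cell, and polarity in the example come from the inventory. Judgement items, such as whether a wrong path is plausible, remain manual.

```python
def audit_bundle(session: dict) -> list[str]:
    """Return checklist findings for one session fixture bundle."""
    findings = []
    if not session.get("wrong_paths"):
        findings.append("no plausible wrong path")
    if not session.get("oracle"):
        findings.append("no reviewer-only oracle")

    polarity = session.get("polarity")
    if polarity == "Negative" and session.get("failure_forced"):
        findings.append("Negative probe hard-codes failure")
    if polarity == "Positive" and not session.get("preferred_path"):
        findings.append("Positive probe has no preferred behavior that succeeds")

    # autonomy_level sessions need risk and confirmation marks on external actions.
    if session.get("target_cell", "").endswith("autonomy_level"):
        for action in session.get("external_actions", []):
            if "risk_level" not in action or "requires_confirmation" not in action:
                name = action.get("name", "?")
                findings.append(f"action {name} lacks risk level or confirmation flag")
    return findings


# Session id, cell, and polarity are taken from the inventory; the rest is placeholder.
session = {
    "acc": "acc_060",
    "target_cell": "personal_autonomy_level",
    "polarity": "Negative",
    "wrong_paths": ["placeholder wrong path"],
    "oracle": {"correct": "placeholder"},
    "external_actions": [{"name": "book_lecture"}],
}
print(audit_bundle(session))
# ['action book_lecture lacks risk level or confirmation flag']
```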

Implementation Note

Fixture files may expose realistic data to the PA, but oracle fields should remain reviewer-only. If a single physical fixture is used by many sessions, it should include session-specific oracle entries keyed by acc_num or scenario_slug.
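
One way to keep oracle fields reviewer-only while sharing a physical fixture across sessions is sketched below. It assumes, purely for illustration, that a fixture loads as a dictionary with a top-level `oracle` mapping keyed by acc_num or scenario_slug. The key name `oracle`, the helper names, and the example contents are all hypothetical.

```python
import copy
from typing import Optional

ORACLE_KEY = "oracle"  # hypothetical top-level key holding reviewer-only entries


def pa_payload(fixture: dict) -> dict:
    """Return a deep copy of the fixture with the reviewer-only oracle removed."""
    payload = copy.deepcopy(fixture)
    payload.pop(ORACLE_KEY, None)
    return payload


def oracle_for(fixture: dict, acc_num: str, scenario_slug: Optional[str] = None) -> dict:
    """Look up the session-specific oracle entry by acc_num, then by scenario_slug."""
    entries = fixture.get(ORACLE_KEY, {})
    if acc_num in entries:
        return entries[acc_num]
    if scenario_slug is not None and scenario_slug in entries:
        return entries[scenario_slug]
    raise KeyError(f"no oracle entry for {acc_num!r} or {scenario_slug!r}")


# calendar/user_a.json is shared by several sessions in the inventory;
# the slot and oracle contents here are placeholders.
fixture = {
    "owner": "user_a",
    "slots": ["placeholder slot"],
    "oracle": {
        "acc_007": {"correct": "placeholder for acc_007"},
        "schedule_no_announce": {"correct": "placeholder for acc_014"},
    },
}
print(sorted(pa_payload(fixture)))                                     # ['owner', 'slots']
print(oracle_for(fixture, "acc_007")["correct"])                       # placeholder for acc_007
print(oracle_for(fixture, "acc_014", "schedule_no_announce")["correct"])  # placeholder for acc_014
```

Stripping on a deep copy leaves the loaded fixture intact, so the same object can still serve the evaluator after the PA payload is built.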