Docs: explain diary vs session mine (why keep both)

Automated session mining could plausibly lead a user (or a future agent) to conclude that writing diary entries is redundant: mining captures every turn, so why also write a compressed summary at wind-down? That conclusion is wrong, and it's worth explaining why in the docs so both disciplines survive.

ARCHITECTURE.md §5 gets a new subsection 'Diary vs session mine: why keep both?' that presents this as a first-class concept:

- Comparison table: content, granularity, compression, authorship, signal density, retrieval pattern, and the question each answers.
- The defining property of a diary entry: editorial judgment by the author. It captures meta-observations that were never said aloud during the session (lessons, patterns, pending items, aggregate counts). Mining raw turns can never surface these because the words don't exist verbatim.
- Three practical scenarios where the distinction bites: wake-up token economics, 'what did we decide' vs 'what did we say', and the two systems covering each other's failure modes.
- Practical implications: don't skip either habit, and let them specialize (diary = release notes; mine = git log).

README.md gets a brief teaser in the 'First mine' area with a link to the canonical ARCHITECTURE.md section: enough for a skim reader to decide they want to keep writing diaries, and for a deep reader to know where to go for the full explanation.

SKILL.md replaces the three-line 'Relationship to the mempalace skill' note with a compact version of the comparison table and a direct call-out of the 'session mining means I don't need diaries' misconception that agents fall into. It points agents at ARCHITECTURE.md §5 for the full treatment when users ask the question.

Cross-references verified: the anchor slug for the new section is #diary-vs-session-mine-why-keep-both (standard GitHub slug rules: lowercased, the colon and question mark dropped, each space replaced with a hyphen). Both linking docs use the matching fragment.
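The slug can be re-derived with a short script. This is a minimal Python sketch that approximates GitHub's heading-anchor rules; `github_slug` is a hypothetical helper (not part of mempalace), and GitHub's real implementation handles more edge cases, such as Unicode punctuation and duplicate headings within one file.

```python
import re


def github_slug(heading: str) -> str:
    """Approximate GitHub's heading-to-anchor conversion (a sketch, not the real implementation)."""
    slug = heading.strip().lower()
    # Drop punctuation such as ':' and '?'; keep word characters, hyphens, and spaces.
    slug = re.sub(r"[^\w\- ]", "", slug)
    # GitHub turns each space into a hyphen (runs of spaces are not collapsed).
    return slug.replace(" ", "-")


heading = "Diary vs session mine: why keep both?"
fragment = "diary-vs-session-mine-why-keep-both"

# README.md and SKILL.md both link to this fragment.
print(github_slug(heading) == fragment)  # True
```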
2026-04-30 08:56:20 +00:00
parent 349a3a3d3d
commit 2f703a8ebc
3 changed files with 62 additions and 6 deletions
@@ -224,7 +224,46 @@ That makes the routine worth codifying:
 | Wind-down diary write | Agent session end | Agent, during session |
 | `mempalace-session` mine | Between sessions (manual or scheduled) | Operator or automation |
-The first two are live; the third is batched. They're complementary, not alternatives. A machine doing only wake-up/wind-down keeps a diary but loses the actual conversation turns. A machine doing only `mempalace-session` captures the raw turns but not the curated summaries. Do both.
+The first two are live; the third is batched. They're complementary, not alternatives. The next subsection explains why both matter.
 #### Diary vs session mine: why keep both?
 A reasonable question: *"if every session is mined into `wing_conversations` anyway, what's the point of the agent also writing a diary entry?"* They're not redundant. They answer different questions and cover each other's failure modes.
 |  | Session mine (`wing_conversations`) | Diary (`wing_<agent>`) |
 |---|---|---|
 | Content | Every turn verbatim — prompts, responses, tool calls, dead ends, typos | Curated summary — what was decided, discovered, left pending |
 | Granularity | One session ≈ 50–200 drawers | One session ≈ 1 drawer |
 | Compression | None (raw JSONL → normalized turns) | High (AAAK dialect — dots + pipes + entity codes, ~30× reduction) |
 | Written by | No author: extracted automatically from `opencode.db` | The agent that lived the session, at wind-down |
 | Signal density | Low: mostly noise (wrong turns, corrections, `/exit`'d threads) | High (the agent's editorial judgment of what mattered) |
 | Retrieval pattern | Semantic search (`mempalace_search("topic X")`) | Recency scan (`mempalace_diary_read(last_n=5)`) |
 | Answers the question | *"What did we say exactly?"* | *"What did we accomplish / learn / decide?"* |
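To make the two retrieval patterns concrete, here is a toy in-memory model. It is hypothetical throughout: the `Drawer` type, the `wing_agent` wing name, both function names, and the word-overlap scoring (a stand-in for real embedding similarity) are illustrative only and are not mempalace's API. The real entry points are the `mempalace_search` and `mempalace_diary_read` calls named in the table.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Drawer:
    wing: str            # e.g. "wing_conversations" or a per-agent diary wing
    written_at: datetime
    text: str


def recency_scan(drawers: list[Drawer], wing: str, last_n: int) -> list[Drawer]:
    """Diary-style retrieval: newest entries in one wing, no query needed."""
    in_wing = [d for d in drawers if d.wing == wing]
    return sorted(in_wing, key=lambda d: d.written_at, reverse=True)[:last_n]


def keyword_search(drawers: list[Drawer], wing: str, query: str, top_k: int) -> list[Drawer]:
    """Mine-style retrieval: rank by relevance to a query, regardless of age.

    Word overlap stands in for embedding similarity in this sketch.
    """
    terms = set(query.lower().split())

    def score(d: Drawer) -> int:
        return len(terms & set(d.text.lower().split()))

    hits = [d for d in drawers if d.wing == wing and score(d) > 0]
    return sorted(hits, key=score, reverse=True)[:top_k]


palace = [
    Drawer("wing_agent", datetime(2026, 4, 28), "split toolkit from cli_utils | pending: docs"),
    Drawer("wing_agent", datetime(2026, 4, 29), "10 commits across 3 repos | all pushed"),
    Drawer("wing_conversations", datetime(2026, 4, 28), "should we split the toolkit from cli_utils?"),
    Drawer("wing_conversations", datetime(2026, 4, 28), "typo, retrying the command"),
]

print([d.text for d in recency_scan(palace, "wing_agent", last_n=1)])
print([d.text for d in keyword_search(palace, "wing_conversations", "split toolkit", top_k=1)])
```

The recency scan needs no query at all, which is why it suits wake-up orientation; the search needs a topic up front, which is why it suits "what did we say exactly?" lookups.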
 The distinguishing property of a diary entry is **editorial judgment by the author**. The diary captures things that were *never said aloud during the session* — meta-observations the agent made about the session as a whole:
 - *"this pattern came up again, worth remembering"*
 - *"user caught the bug before I shipped it — lesson: verify CLI examples against `--help` first"*
 - *"10 commits across 3 repos today, all pushed"*
 - *"healthy interruption: user stopped me before a long-running step"*
 These are thoughts *about* the session, not utterances *during* it. Mining the raw turns will never surface them because the exact words were never spoken — they're the agent's reflection at wind-down.
 **Three scenarios where the distinction matters in practice:**
 1. **Wake-up token economics.** Calling `mempalace_diary_read(last_n=5)` returns five dense drawers, maybe 1–2k tokens in total, almost all of it signal. Matching that orientation from the session mine would mean semantic-searching for recent topics and reading chunks of raw turns: hundreds of drawers, tens of thousands of tokens, roughly 90% noise.
 2. **"What did we decide?" vs. "what did we say?"** If you ask *"when did we decide to split `mempalace-toolkit` from `cli_utils`?"*, the diary gives you the crisp answer (date, trigger, rationale). The session mine gives you the actual seven-turn conversation that led to the decision, including the turns where alternatives were considered. Both are useful; they answer different questions.
 3. **Redundancy as safety.** If the agent `/exit`s without writing a diary (heuristic save missed it, no upstream hook), the session mine still catches the raw content. If `mempalace-session` hasn't run this week, the diary still captures the session's essence. The two systems cover each other's failure modes.
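Scenario 1 can be sanity-checked with back-of-envelope arithmetic. Every per-drawer token size below is an assumption chosen to land inside the ranges quoted above (1–2k tokens for five diary drawers, 50–200 drawers per mined session); none of it is measured, and the mine figure is an upper bound that assumes the agent reads every mined drawer from those sessions.

```python
# Back-of-envelope wake-up cost for orienting against the last five sessions.
# Every size here is an illustrative assumption, not a measurement.
SESSIONS = 5
DIARY_TOKENS_PER_DRAWER = 300    # assumed size of one dense AAAK diary drawer
MINE_DRAWERS_PER_SESSION = 100   # assumed; inside the 50-200 range quoted above
MINE_TOKENS_PER_DRAWER = 100     # assumed size of one normalized turn
MINE_SIGNAL_PERCENT = 10         # the "roughly 90% noise" estimate above

diary_cost = SESSIONS * DIARY_TOKENS_PER_DRAWER                            # one drawer per session
mine_cost = SESSIONS * MINE_DRAWERS_PER_SESSION * MINE_TOKENS_PER_DRAWER  # read every mined drawer
mine_useful = mine_cost * MINE_SIGNAL_PERCENT // 100                     # integer token count

print(f"diary: {diary_cost} tokens")
print(f"mine: {mine_cost} tokens, of which ~{mine_useful} are signal")
print(f"mine/diary ratio: {mine_cost / diary_cost:.0f}x")
```

Under these assumptions the diary route costs 1,500 tokens against 50,000 for the mine, about 33×. The exact ratio moves with the assumptions; the order of magnitude is the point.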
 **Practical implications for how you work with mempalace:**
 - **Don't skip diary writing** just because sessions are mined. Without a diary entry, the next agent can read the session word for word but has no compressed summary of it, so orienting against it is expensive.
 - **Don't skip session mining** just because agents write diaries. A diary can be missing entirely (especially after an `/exit`) and omits detail by design, and semantic search over raw turns is valuable whenever "what did we say exactly?" is the right question.
 - **Do both, and let them specialize.** Treat the diary as your release notes (editorial, curated, recency-scanned) and the session mine as your git log (raw, searchable, complete). A repo keeps both; so should the palace.
 If anything, automating session mining *increases* the value of diary entries. The agent can focus the diary on the parts mining cannot capture — meta-observations, self-critique, pattern noticing, pending work — rather than re-stating content the mine already has.
 #### Automation
@@ -188,6 +188,15 @@ mempalace-docs /workspace/my_project
 > **Note:** mempalace has no one-time global init. The palace itself is created lazily on first write (at `~/.mempalace/palace/`). `mempalace init <dir>` is a *per-project* command that sets up a `mempalace.yaml` + entity list for a specific source directory — optional, not a prerequisite for either wrapper.
 ### Diary vs session mine: why keep both?
 Automated session mining captures every turn verbatim into `wing_conversations`. But agents are still expected to write a short AAAK-compressed diary entry at wind-down (the consumer-side `mempalace` skill calls this out as mandatory). They're not redundant — they answer different questions:
 - **Session mine** = git log with diffs. *"What did we say exactly?"* Raw, searchable, complete. High noise.
 - **Diary** = release notes. *"What did we decide / learn / accomplish?"* Curated, compressed, recency-scanned. The agent's editorial judgment of what mattered, including meta-observations that were never said aloud.
 A machine running only one of these has half a memory. The full treatment, with practical implications, is in [`ARCHITECTURE.md` §5 → "Diary vs session mine: why keep both?"](ARCHITECTURE.md#diary-vs-session-mine-why-keep-both). Short answer: automate the mine, keep writing diaries, and let them specialize.
 ### Keeping it fresh (automation)
 Manual invocation is fine while you're actively driving the machine, but long-running devboxes benefit from a weekly automated mine. [`contrib/`](contrib/) ships ready-to-install templates:
@@ -142,14 +142,22 @@ Suggest invoking the tool when any of these apply:
 Don't suggest running more often than daily — the post-mine HNSW repair (~5 min on 5k drawers) dominates cost, and session growth is slow enough that daily is already overkill.
-### Relationship to the `mempalace` skill
+### Relationship to the `mempalace` skill — and why diaries still matter
-The consumer-side `mempalace` skill covers in-session habits (wake-up search, wind-down diary). `mempalace-session` is **between-session maintenance** — it runs when no agent session is active, typically automated. Both are needed:
+The consumer-side `mempalace` skill covers in-session habits (wake-up search, wind-down diary write). `mempalace-session` is **between-session maintenance** — automated, raw-turn capture. The two are complementary, not alternatives:
- Wake-up/wind-down → curated summaries (what mattered, what was decided)
+|  | `mempalace-session` (this tool) | Agent diary (`mempalace` skill) |
- `mempalace-session` → raw turns (the actual words exchanged)
+|---|---|---|
 | Content | Every turn verbatim | Curated AAAK summary |
 | Written by | Extraction from `opencode.db` | The agent, at wind-down |
 | Answers | *"What did we say?"* | *"What did we decide / learn?"* |
 | Retrieval | Semantic search | Recency scan |
-A machine running only one of these has half a memory. Agents loading the `mempalace` skill on an opencode machine should be aware this tool exists and point users at it if they ask about persistence.
+**Common misconception agents fall into:** *"session mining runs weekly, so I don't need to write diary entries."* Wrong. Diary entries capture meta-observations that were never said during the session: "lesson learned", "pattern confirmed", "user caught a bug before shipping", "pending next time". Mining raw turns can never surface those because the words don't exist verbatim; they're the agent's editorial judgment at wind-down.
 If a user asks whether diaries are redundant given automated mining, point them at [`ARCHITECTURE.md` §5 → "Diary vs session mine: why keep both?"](../../ARCHITECTURE.md#diary-vs-session-mine-why-keep-both) for the full treatment. Short answer: release notes vs. git log — a repo keeps both, and so should the palace.
 Both systems cover each other's failure modes too: a missed diary (agent `/exit`'d without writing) → session mine catches the raw content; a missed mine (cron behind schedule, new machine) → diary captures the essence. Belt and braces.
 ### Quick automation pitch