From 6352373a1f2b397d225e13dbb0113d11aeee8b17 Mon Sep 17 00:00:00 2001 From: Joakim Persson Date: Tue, 5 May 2026 12:35:04 +0200 Subject: [PATCH] fix(feeders): make post-mine repair opt-in, not default The three feeder wrappers (mempalace-docs, mempalace-pi-session, mempalace-session) unconditionally ran 'mempalace repair --yes' after mining, controllable only via --no-repair opt-out. The contrib launchd and systemd templates did not pass --no-repair, so every scheduled tick invoked the destructive in-place HNSW rebuild. This has bitten us twice: - 2026-05-04 09:08: a kickstart triggered repair while an MCP subprocess held the DB open; the live collection was wiped (0 drawers) and had to be restored from the palace.backup snapshot. - 2026-05-05 10:00: post-mine repair crashed mid-rebuild with 'NotFoundError: Collection [] does not exist' - chromadb's rebuild recreated the collection under a new UUID while the code still held the old handle. Live DB survived only by luck (crash hit before the swap). Fix: flip the default. - New flag: --repair (opt-in). Prints a warning and sleeps 3s before invoking 'mempalace repair --yes'. - --no-repair is retained as a deprecated no-op alias for backward compatibility with any scripts/units still passing it. - Default behavior: no repair. Routine ChromaDB add() keeps HNSW consistent; repair is a recovery op, not a maintenance tick. Docs updated to match: README, SKILL, ARCHITECTURE, AGENTS, contrib/README. Scheduling guidance now explicitly warns against enabling --repair on cron/launchd/systemd-timer runs. --- AGENTS.md | 4 ++-- ARCHITECTURE.md | 4 ++-- README.md | 4 ++-- SKILL.md | 2 +- bin/mempalace-docs | 18 ++++++++++++++---- bin/mempalace-pi-session | 18 ++++++++++++++---- bin/mempalace-session | 18 ++++++++++++++---- contrib/README.md | 2 +- 8 files changed, 50 insertions(+), 20 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 7fa2704..f527dc5 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -33,10 +33,10 @@ bin/ A third wrapper would justify factoring a shared helper library. Until then, copy the pattern from `mempalace-session` (richest example): 1. Create `bin/` with `#!/usr/bin/env bash` + `chmod +x`. -2. Implement `--help`, `--dry-run`, `--no-repair` flags. +2. Implement `--help`, `--dry-run`, `--repair` flags (repair is opt-in; `--no-repair` kept as deprecated alias). 3. Stage to `~/.cache///` with deterministic filenames. 4. Invoke `mempalace mine ...` (choose `--mode convos` if input is chat-like). -5. End with `mempalace repair` unless `--no-repair`. +5. Do NOT end with `mempalace repair` unless `--repair` was explicitly passed. Repair is a destructive in-place HNSW rebuild and must never run on an unattended schedule. 6. Update `README.md` with usage + rationale. 7. Update `install.sh`? No — `bin/*` is auto-linked. 8. Update `ARCHITECTURE.md` if the wrapper fills a new architectural gap. diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md index 326f0ee..b5c415e 100644 --- a/ARCHITECTURE.md +++ b/ARCHITECTURE.md @@ -128,7 +128,7 @@ What it drops: source code (`.py`, `.ts`, `.go`, `.rs`, …), lockfiles, `.git`, - `--since YYYY-MM-DD` — incremental catch-up. - `--session ` — one-shot mode. -**Then:** invokes `mempalace mine --mode convos` against the cache dir, followed by `mempalace repair` (unless `--no-repair`). +**Then:** invokes `mempalace mine --mode convos` against the cache dir. A post-mine `mempalace repair` is **opt-in** via `--repair` — it is intentionally *not* the default because the in-place HNSW rebuild has corrupted live palaces on past runs. Never pass `--repair` from an unattended schedule. --- @@ -210,7 +210,7 @@ That makes the routine worth codifying: **Default: weekly.** Dedup is free on unchanged sessions, and `wing_conversations` growth is roughly linear in user activity. Weekly is frequent enough that searches almost always include recent context, and infrequent enough that the cost is negligible. -**Daily** is fine but wasteful — you'll pay the post-mine `repair` cost seven times more often than you need. If you want daily runs, add `--no-repair` and schedule a separate weekly repair. +**Daily** is fine. Repair is now opt-in (`--repair`) and should never be set on an unattended schedule — run it manually from a quiet session if you suspect stale HNSW state. **Monthly** is too infrequent. You'll search for "that thing we discussed last Tuesday" and miss it. diff --git a/README.md b/README.md index 2072c1a..7116e2b 100644 --- a/README.md +++ b/README.md @@ -320,7 +320,7 @@ mempalace-docs # mine with wing = dirname mempalace-docs --wing my_project # override wing name mempalace-docs --agent alice # record agent on drawers mempalace-docs --dry-run # list files, don't file -mempalace-docs --no-repair # skip post-mine repair +mempalace-docs --repair # opt-in post-mine repair (risky, interactive only) mempalace-docs --help ``` @@ -344,7 +344,7 @@ mempalace-session --since 2026-04-01 # only sessions updated on/aft mempalace-session --min-messages 6 # stricter short-session filter mempalace-session --db /custom/path/opencode.db # non-default DB location mempalace-session --dry-run # export + list, skip mine -mempalace-session --no-repair # skip post-mine index repair +mempalace-session --repair # opt-in post-mine repair (risky, interactive only) mempalace-session --help ``` diff --git a/SKILL.md b/SKILL.md index 2647f32..4bf93d8 100644 --- a/SKILL.md +++ b/SKILL.md @@ -138,7 +138,7 @@ Suggest invoking the tool when any of these apply: | Occasional opencode user | Monthly manual or weekly automated | | Fresh machine / first setup | One-shot full backfill, then schedule | | "I'm about to rebuild the container" | Run now, as a checkpoint | -| Automated daily mines | Pass `--no-repair` + schedule weekly repair separately | +| Automated daily mines | Repair is now opt-in (`--repair`); **never** set it on an unattended schedule. Run `mempalace repair` by hand from a quiet session if HNSW genuinely needs rebuilding. | Don't suggest running more often than daily — the post-mine HNSW repair (~5 min on 5k drawers) dominates cost, and session growth is slow enough that daily is already overkill. diff --git a/bin/mempalace-docs b/bin/mempalace-docs index 469ee57..bac0dc4 100755 --- a/bin/mempalace-docs +++ b/bin/mempalace-docs @@ -36,7 +36,7 @@ AGENT="${USER:-mempalace}" WING="" SRC="" DRY_RUN=0 -NO_REPAIR=0 +DO_REPAIR=0 # File patterns to include. Docs + config + intent-bearing scripts. # Everything else (code) is excluded by omission. @@ -77,7 +77,13 @@ Options: --wing Override wing name (default: source directory name) --agent Agent name recorded on drawers (default: $USER) --dry-run List files that would be mined; do not file - --no-repair Skip `mempalace repair` after mining + --repair Run `mempalace repair` after mining (opt-in). + WARNING: repair does a destructive in-place HNSW + rebuild. If it races a live MCP connection or crashes + mid-rebuild, it can wipe the collection. Only pass + this from a quiet, interactive context. Not safe for + unattended cron/launchd schedules. + --no-repair (Deprecated; no-repair is now the default.) -h, --help Show this help What gets mined: @@ -109,7 +115,8 @@ while [[ $# -gt 0 ]]; do --wing) WING="${2:-}"; shift 2 ;; --agent) AGENT="${2:-}"; shift 2 ;; --dry-run) DRY_RUN=1; shift ;; - --no-repair) NO_REPAIR=1; shift ;; + --repair) DO_REPAIR=1; shift ;; + --no-repair) shift ;; # deprecated alias; no-repair is the default --) shift; break ;; -*) echo "error: unknown option: $1" >&2; usage >&2; exit 1 ;; *) if [[ -z "$SRC" ]]; then SRC="$1"; shift; else echo "error: unexpected arg: $1" >&2; exit 1; fi ;; @@ -258,8 +265,11 @@ if ! mempalace mine "$STAGE" --agent "$AGENT" --wing "$WING"; then fi # ── Repair index ───────────────────────────────────────────────────── -if [[ $NO_REPAIR -eq 0 ]]; then +if [[ $DO_REPAIR -eq 1 ]]; then echo "" + echo "WARNING: --repair runs an in-place HNSW rebuild that has wiped" + echo " live palaces on past runs. Proceeding in 3 seconds..." + sleep 3 echo "Rebuilding HNSW index..." mempalace repair --yes fi diff --git a/bin/mempalace-pi-session b/bin/mempalace-pi-session index 54b6128..9d1fa8b 100755 --- a/bin/mempalace-pi-session +++ b/bin/mempalace-pi-session @@ -51,7 +51,7 @@ SESSION_ID="" SINCE="" MIN_MESSAGES=3 DRY_RUN=0 -NO_REPAIR=0 +DO_REPAIR=0 PI_SESSIONS_DIR="${PI_SESSIONS_DIR:-$HOME/.pi/agent/sessions}" # ── Usage ──────────────────────────────────────────────────────────── @@ -74,7 +74,13 @@ Options: --dry-run Export + list; do not mine into palace. Each session is tagged [NEW] or [SKIP] based on whether its source_file is already in the palace. - --no-repair Skip `mempalace repair` after mining + --repair Run `mempalace repair` after mining (opt-in). + WARNING: repair does a destructive in-place HNSW + rebuild. If it races a live MCP connection or + crashes mid-rebuild, it can wipe the collection. + Only pass this from a quiet, interactive context. + Not safe for unattended cron/launchd schedules. + --no-repair (Deprecated; no-repair is now the default.) -h, --help Show this help Idempotency: @@ -118,7 +124,8 @@ while [[ $# -gt 0 ]]; do --agent) AGENT="${2:-}"; shift 2 ;; --sessions-dir) PI_SESSIONS_DIR="${2:-}"; shift 2 ;; --dry-run) DRY_RUN=1; shift ;; - --no-repair) NO_REPAIR=1; shift ;; + --repair) DO_REPAIR=1; shift ;; + --no-repair) shift ;; # deprecated alias; no-repair is the default --) shift; break ;; -*) echo "error: unknown option: $1" >&2; usage >&2; exit 1 ;; *) echo "error: unexpected arg: $1" >&2; exit 1 ;; @@ -463,8 +470,11 @@ if ! mempalace mine "$STAGE" --mode convos --wing "$WING" --agent "$AGENT"; then fi # ── Repair index ───────────────────────────────────────────────────── -if [[ $NO_REPAIR -eq 0 ]]; then +if [[ $DO_REPAIR -eq 1 ]]; then echo "" + echo "WARNING: --repair runs an in-place HNSW rebuild that has wiped" + echo " live palaces on past runs. Proceeding in 3 seconds..." + sleep 3 echo "Rebuilding HNSW index..." mempalace repair --yes fi diff --git a/bin/mempalace-session b/bin/mempalace-session index 4781bdf..846ad92 100755 --- a/bin/mempalace-session +++ b/bin/mempalace-session @@ -47,7 +47,7 @@ SESSION_ID="" SINCE="" MIN_MESSAGES=3 DRY_RUN=0 -NO_REPAIR=0 +DO_REPAIR=0 OPENCODE_DB="${OPENCODE_DB:-$HOME/.local/share/opencode/opencode.db}" # ── Usage ──────────────────────────────────────────────────────────── @@ -69,7 +69,13 @@ Options: --dry-run Export + list; do not mine into palace. Each session is tagged [NEW] or [SKIP] based on whether its source_file is already present in the palace. - --no-repair Skip `mempalace repair` after mining + --repair Run `mempalace repair` after mining (opt-in). + WARNING: repair does a destructive in-place HNSW + rebuild. If it races a live MCP connection or + crashes mid-rebuild, it can wipe the collection. + Only pass this from a quiet, interactive context. + Not safe for unattended cron/launchd schedules. + --no-repair (Deprecated; no-repair is now the default.) -h, --help Show this help Idempotency: @@ -117,7 +123,8 @@ while [[ $# -gt 0 ]]; do --agent) AGENT="${2:-}"; shift 2 ;; --db) OPENCODE_DB="${2:-}"; shift 2 ;; --dry-run) DRY_RUN=1; shift ;; - --no-repair) NO_REPAIR=1; shift ;; + --repair) DO_REPAIR=1; shift ;; + --no-repair) shift ;; # deprecated alias; no-repair is the default --) shift; break ;; -*) echo "error: unknown option: $1" >&2; usage >&2; exit 1 ;; *) echo "error: unexpected arg: $1" >&2; exit 1 ;; @@ -392,8 +399,11 @@ if ! mempalace mine "$STAGE" --mode convos --wing "$WING" --agent "$AGENT"; then fi # ── Repair index ───────────────────────────────────────────────────── -if [[ $NO_REPAIR -eq 0 ]]; then +if [[ $DO_REPAIR -eq 1 ]]; then echo "" + echo "WARNING: --repair runs an in-place HNSW rebuild that has wiped" + echo " live palaces on past runs. Proceeding in 3 seconds..." + sleep 3 echo "Rebuilding HNSW index..." mempalace repair --yes fi diff --git a/contrib/README.md b/contrib/README.md index 283c029..0ab3c50 100644 --- a/contrib/README.md +++ b/contrib/README.md @@ -318,7 +318,7 @@ The in-container mempalace sees only the container's opencode.db and palace (via - Dedup is free on unchanged sessions, so there's no cost to running daily other than the ~5 min post-mine repair. - Weekly keeps the palace fresh enough that searches almost always return current context. -**Daily or more:** edit `OnCalendar=` or the cron DOW field. On a daily schedule, add `--no-repair` to the wrapper invocation and let a separate weekly unit handle repair — otherwise you repair 7× more often than you need. +**Scheduling cadence:** edit `OnCalendar=` or the cron DOW field. Post-mine repair is now **opt-in** (`--repair`) and should NOT be added to unattended schedules — the in-place HNSW rebuild has wiped live palaces on past runs. Run `mempalace repair` manually from a quiet interactive session if you ever need it. **Monthly:** probably too infrequent. You'll search for "that thing we discussed last Tuesday" and miss it.