Add per-request AI logging, DB batch queue, WS entity updates, and UI polish
- log_thread.py: thread-safe ContextVar bridge so executor threads can log
individual LLM calls and archive searches back to the event loop
- ai_log.py: init_thread_logging(), notify_entity_update(); WS now pushes
entity_update messages when book data changes after any plugin or batch run
- batch.py: replace batch_pending.json with batch_queue SQLite table;
run_batch_consumer() reads queue dynamically so new books can be added
while batch is running; add_to_queue() deduplicates
- migrate.py: fix _migrate_v1 (clear-on-startup bug); add _migrate_v2 for
batch_queue table
- _client.py / archive.py / identification.py: wrap each LLM API call and
archive search with log_thread start/finish entries
- api.py: POST /api/batch returns {already_running, added}; notify_entity_update
after identify pipeline
- models.default.yaml: strengthen ai_identify confidence-scoring instructions;
warn against placeholder data
- detail-render.js: book log entries show clickable ID + spine thumbnail;
book spine/title images open full-screen popup
- events.js: batch-start handles already_running+added; open-img-popup action
- init.js: entity_update WS handler; image popup close listeners
- overlays.css / index.html: full-screen image popup overlay
- eslint.config.js: add new globals; fix no-redeclare/no-unused-vars for
multi-file global architecture; all lint errors resolved
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
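The ContextVar bridge described in the `log_thread.py` bullet can be sketched roughly as follows. This is an illustrative sketch only — `LogSink` and `blocking_llm_call` are hypothetical names, not the actual module's API; the pattern is a ContextVar carrying a loop-owned sink into worker threads, which report back via `call_soon_threadsafe`:

```python
import asyncio
import contextvars

class LogSink:
    """Collects log entries on the event-loop thread."""
    def __init__(self, loop: asyncio.AbstractEventLoop) -> None:
        self.loop = loop
        self.entries: list[str] = []

    def log(self, msg: str) -> None:
        # Safe to call from any thread: hop back onto the owning event loop.
        self.loop.call_soon_threadsafe(self.entries.append, msg)

# ContextVar carries the sink into worker threads.
_sink: contextvars.ContextVar[LogSink] = contextvars.ContextVar("log_sink")

def blocking_llm_call(prompt: str) -> str:
    # Runs in a worker thread; asyncio.to_thread copied the current context,
    # so the ContextVar set on the loop thread is visible here.
    sink = _sink.get()
    sink.log(f"start: {prompt}")
    result = prompt.upper()  # stand-in for the real LLM API call
    sink.log(f"finish: {prompt}")
    return result

async def main() -> tuple[str, list[str]]:
    sink = LogSink(asyncio.get_running_loop())
    _sink.set(sink)
    out = await asyncio.to_thread(blocking_llm_call, "hello")
    await asyncio.sleep(0)  # drain callbacks queued by call_soon_threadsafe
    return out, sink.entries
```

`asyncio.to_thread` (unlike bare `loop.run_in_executor`) copies the current `contextvars` context into the worker thread, which is what makes the per-request sink visible there.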
@@ -20,15 +20,18 @@ src/
   config.py           # Config loading and typed AppConfig
   models.py           # Typed dataclasses / mashumaro decoders
   errors.py           # Domain exceptions (NotFoundError, BadRequestError subtypes)
+  log_thread.py       # Thread-safe logging context (ContextVar + event-loop bridge for executor threads)
   logic/
     __init__.py       # dispatch_plugin() orchestrator + re-exports
     boundaries.py     # Boundary math, shelf/spine crop sources, boundary detector runner
     identification.py # Status computation, text recognizer, book identifier runners
     archive.py        # Archive searcher runner (sync + background)
-    batch.py          # Batch pipeline, process_book_sync
+    batch.py          # Batch queue consumer (run_batch_consumer); queue persisted in batch_queue DB table
+  ai_log.py           # AI request ring buffer + WebSocket pub-sub (log_start/log_finish/notify_entity_update); persisted to ai_log table
   images.py           # crop_save, prep_img_b64, serve_crop
+  migrate.py          # DB migration; run_migration() called at startup
   plugins/
-    __init__.py       # Registry: load_plugins(), get_plugin(), get_manifest()
+    __init__.py       # Registry: load_plugins(), get_plugin(), get_manifest(), get_all_text_recognizers(), get_all_book_identifiers(), get_all_archive_searchers()
     rate_limiter.py   # Thread-safe per-domain rate limiter
     ai_compat/        # AI plugin implementations
     archives/         # Archive plugin implementations
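The hunk above describes `rate_limiter.py` as a thread-safe per-domain rate limiter. One minimal way to realize that idea — an illustrative sketch, not the actual module — is a lock-protected map of per-domain "next allowed time" stamps:

```python
import threading
import time
from urllib.parse import urlparse

class DomainRateLimiter:
    """Enforces a minimum interval between requests to the same domain."""
    def __init__(self, min_interval: float = 1.0) -> None:
        self.min_interval = min_interval
        self._last: dict[str, float] = {}
        self._lock = threading.Lock()

    def wait(self, url: str) -> None:
        domain = urlparse(url).netloc
        with self._lock:
            now = time.monotonic()
            ready_at = self._last.get(domain, 0.0) + self.min_interval
            delay = max(0.0, ready_at - now)
            # Reserve our slot before releasing the lock, so concurrent
            # callers hitting the same domain queue up behind us.
            self._last[domain] = now + delay
        if delay > 0:
            time.sleep(delay)

# A shared instance, analogous to the RATE_LIMITER the archive plugins use.
RATE_LIMITER = DomainRateLimiter(min_interval=0.5)
```

Reserving the slot inside the lock but sleeping outside it keeps the limiter correct under concurrency without serializing requests to unrelated domains.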
@@ -71,7 +74,7 @@ Categories:
 | `credentials` | `base_url` + `api_key` per endpoint; no model or prompt |
 | `models` | `credentials` ref + `model` string + optional `extra_body` + `prompt` |
 | `functions` | Plugin definitions; dict key = plugin_id (unique across all categories) |
-| `ui` | Frontend display settings |
+| `ui` | Frontend display settings (`boundary_grab_px`, `spine_padding_pct`, `ai_log_max_entries`) |
 
 Minimal setup — create `config/credentials.user.yaml`:
 ```yaml
@@ -88,9 +91,19 @@ credentials:
 | `boundary_detectors` (`target=shelves`) | cabinet image | `{boundaries:[…], confidence:N}` | `cabinets.ai_shelf_boundaries` |
 | `boundary_detectors` (`target=books`) | shelf image | `{boundaries:[…]}` | `shelves.ai_book_boundaries` |
 | `text_recognizers` | spine image | `{raw_text, title, author, …}` | `books.raw_text` + `candidates` |
-| `book_identifiers` | raw_text | `{title, author, …, confidence}` | `books.ai_*` + `candidates` |
+| `book_identifiers` | raw_text + archive results + optional images | `[{title, author, …, score, sources}, …]` | `books.ai_blocks` + `books.ai_*` |
 | `archive_searchers` | query string | `[{source, title, author, …}, …]` | `books.candidates` |
+
+### Identification pipeline (`POST /api/books/{id}/identify`)
+Single endpoint runs the full pipeline in sequence:
+1. **VLM text recognizer** reads the spine image → `raw_text` and structured fields.
+2. **All archive searchers** run in parallel with title+author and title-only queries.
+3. Archive results are **deduplicated** by normalized full-field match (case-insensitive, punctuation removed, spaces collapsed).
+4. **Main identifier model** receives `raw_text`, deduplicated archive results, and (if `is_vlm: true`) spine + title-page images. Returns ranked `IdentifyBlock` list.
+5. `ai_blocks` stored persistently in the DB (never cleared; overwritten each pipeline run). Top block updates `ai_*` fields if score ≥ `confidence_threshold`.
+
+`functions.*.yaml` key for `book_identifiers`: add `is_vlm: true` for models that accept images.
 
 ### Universal plugin endpoint
 ```
 POST /api/{entity_type}/{entity_id}/plugin/{plugin_id}
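The pipeline's step 3 deduplicates archive results by a normalized full-field match (case-insensitive, punctuation removed, spaces collapsed). A sketch of that normalization — hypothetical helper names, not the pipeline's actual code — might look like:

```python
import re
import unicodedata

def _norm(value: str) -> str:
    # Case-insensitive, punctuation removed, whitespace collapsed.
    value = unicodedata.normalize("NFKC", value).lower()
    value = re.sub(r"[^\w\s]", "", value)
    return re.sub(r"\s+", " ", value).strip()

def dedup_candidates(records: list[dict]) -> list[dict]:
    """Keep the first record for each normalized full-field key."""
    seen: set[tuple[str, ...]] = set()
    out: list[dict] = []
    for rec in records:
        key = tuple(
            _norm(str(rec.get(f, "")))
            for f in ("title", "author", "year", "isbn")
        )
        if key not in seen:
            seen.add(key)
            out.append(rec)
    return out
```

Keeping the first occurrence preserves searcher ordering, so higher-priority archive sources win ties.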
@@ -108,14 +121,22 @@ All implement `search(query: str) -> list[CandidateRecord]`. Use shared `RATE_LI
 
 ### Auto-queue
 - After `text_recognizer` completes → fires all `archive_searchers` with `auto_queue: true` in background thread pool.
-- `POST /api/batch` → runs `text_recognizers` then `archive_searchers` for all unidentified books.
+- `POST /api/batch` → adds all unidentified books to the `batch_queue` DB table; starts `run_batch_consumer()` if not already running. Calling again while running adds newly-unidentified books to the live queue.
 
 ## Database Schema (key fields)
 | Table | Notable columns |
 |-------|-----------------|
 | `cabinets` | `shelf_boundaries` (JSON `[…]`), `ai_shelf_boundaries` (JSON `{pluginId:[…]}`) |
 | `shelves` | `book_boundaries`, `ai_book_boundaries` (same format), `photo_filename` (optional override) |
-| `books` | `raw_text`, `ai_title/author/year/isbn/publisher`, `candidates` (JSON `[{source,…}]`), `identification_status` |
+| `books` | `raw_text`, `ai_title/author/year/isbn/publisher`, `candidates` (JSON `[{source,…}]`), `ai_blocks` (JSON `[{title,author,year,isbn,publisher,score,sources}]`), `identification_status` |
+| `batch_queue` | `book_id` (PK), `added_at` — persistent batch processing queue; consumed in FIFO order by `run_batch_consumer()` |
+
+`ai_blocks` are persistent: set by the identification pipeline, shown in the book detail panel as clickable cards. Hidden by default for `user_approved` books.
+
+### DB Migration (`src/migrate.py`)
+`run_migration()` is called at startup (after `init_db()`). Migrations:
+- `_migrate_v1`: adds the `ai_blocks` column if absent; clears stale AI fields (runs once only, not on every startup).
+- `_migrate_v2`: creates the `batch_queue` table if absent.
 
 `identification_status`: `unidentified` → `ai_identified` → `user_approved`.
 
@@ -127,7 +148,12 @@ N interior boundaries → N+1 segments. `full = [0] + boundaries + [1]`. Segment
 - Book K spine = shelf image cropped to `(x_start, *, x_end, *)` with composed crop if cabinet-based
 
 ## Frontend JS
-No ES modules, no bundler. All files use global scope; load order in `index.html` is the dependency order. State lives in `state.js` (`S`, `_plugins`, `_bnd`, `_photoQueue`, etc.). Events delegated via `#app` in `events.js`.
+No ES modules, no bundler. All files use global scope; load order in `index.html` is the dependency order. State lives in `state.js` (`S`, `_plugins`, `_bnd`, `_photoQueue`, `_aiLog`, `_aiLogWs`, etc.). Events delegated via `#app` in `events.js`.
+
+`connectAiLogWs()` subscribes to `/ws/ai-log` on startup. Message types:
+- `snapshot` — full log on connect (`_aiLog` initialized)
+- `update` — single log entry added or updated (spinner count in header updated)
+- `entity_update` — entity data changed (tree node updated via `walkTree`; detail panel or full render depending on selection)
 
 ## Tooling
 ```
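The boundary math stated in the hunk header above (N interior boundaries → N+1 segments, `full = [0] + boundaries + [1]`) works out to a few lines; `spine_crop_px` is a hypothetical helper added here to show how a segment maps to a pixel crop:

```python
def segments(boundaries: list[float]) -> list[tuple[float, float]]:
    """N interior boundaries (fractions in (0,1)) -> N+1 (start, end) segments."""
    full = [0.0] + sorted(boundaries) + [1.0]
    return list(zip(full, full[1:]))

def spine_crop_px(boundaries: list[float], k: int, width: int) -> tuple[int, int]:
    # Pixel x-range of book K's spine in a shelf image of the given width.
    x_start, x_end = segments(boundaries)[k]
    return round(x_start * width), round(x_end * width)
```

With no interior boundaries the shelf is a single segment spanning the whole image, which matches the N+1 rule for N = 0.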
@@ -150,8 +176,11 @@ PATCH /api/cabinets/{id}/boundaries # update shelf boundary
 PATCH /api/shelves/{id}/boundaries # update book boundary list
 GET /api/shelves/{id}/image # shelf image (override or cabinet crop)
 GET /api/books/{id}/spine # book spine crop
+POST /api/books/{id}/identify # full identification pipeline (VLM → archives → main model)
 POST /api/books/{id}/process # full auto-queue pipeline (single book)
 POST /api/batch / GET /api/batch/status # batch processing
 WS /ws/batch # batch progress push (replaces polling)
+WS /ws/ai-log # AI request log: snapshot + update per request + entity_update on book changes
 POST /api/books/{id}/dismiss-field # dismiss a candidate suggestion
 PATCH /api/{kind}/reorder # drag-to-reorder
 POST /api/cabinets/{id}/crop / POST /api/shelves/{id}/crop # permanent crop
 