Update docs; add contributing standards

- docs/overview.md: rewrite for current architecture (src/ layout, split JS/CSS modules, credentials/models/functions/ui config categories, correct test fixture targets)
- docs/contributing.md: new file with the documentation philosophy and style guide
- AGENTS.md: add rule to follow docs/contributing.md
2026-03-09 14:22:30 +03:00
parent 084d1aebd5
commit 2ab41ead9f
3 changed files with 156 additions and 65 deletions
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -8,3 +8,10 @@
 ## Implementation rules
 - No backward-compatibility shims or legacy endpoint aliases.
 - Run `poetry run presubmit` before finishing any task. Fix all failures before marking work done.
 ## Documentation rules
 Follow `docs/contributing.md`. Key points:
 - Prefer in-code comments and self-documenting code over external docs.
 - Add docstrings only to public functions you create or modify; Google style.
 - Update `docs/overview.md` only for structural/architectural changes, not implementation details.
 - No emojis in any documentation.
--- a/docs/contributing.md
+++ b/docs/contributing.md
@@ -0,0 +1,57 @@
 # Documentation Standards
 ## Philosophy
 Prefer self-documenting code and in-code comments over external documentation.
 Write external docs only when the information cannot live closer to the code.
 ## What belongs where
 | Content | Where |
 |---------|-------|
 | Why a function does something non-obvious | Inline comment at the relevant line |
 | Contract, parameters, exceptions | Docstring on the function |
 | Module purpose, dependencies, exports | File header comment |
 | Architecture, data flow, system-level decisions | `docs/overview.md` |
 | Setup, configuration, usage for humans | `README.md` |
 | Agent/AI session rules | `AGENTS.md` |
 ## Style
 - Brief and technical. No preambles, no summaries, no filler.
 - No emojis.
 - High-level overview only in external docs — implementation details belong in code.
 - Present tense, imperative mood for instructions.
 ## Python docstrings
 All public functions use Google style: one-line summary, then `Args:`, `Returns:`, `Raises:` sections if non-trivial. List every domain exception the function can raise directly or via propagation.
 ```python
 from pathlib import Path


 def crop_save(src: Path, dst: Path, box: tuple[int, int, int, int]) -> None:
     """Crop src image to box and write to dst, overwriting if present.

     Args:
         src: Source image path.
         dst: Destination path; parent directory must exist.
         box: (x, y, w, h) in pixels.

     Raises:
         FileNotFoundError: If src does not exist.
     """
 ```
 ## JS file headers
 Every JS file starts with a block comment stating: purpose, dependencies (Depends on:), exports (Provides:).
 ```js
 /*
  * helpers.js
  * Pure utility functions with no dependencies on other application modules.
  *
  * Depends on: none
  * Provides: esc(), toast(), isDesktop()
  */
 ```
 ## When NOT to document
 - Do not add docstrings or comments to code you did not change.
 - Do not document implementation details that are already clear from the code.
 - Do not keep comments that describe what the code does when the code already says it clearly.
 - Do not update `docs/overview.md` for implementation details — only for structural/architectural changes.
--- a/docs/overview.md
+++ b/docs/overview.md
@@ -6,53 +6,76 @@ AI plugins identify spine text; archive plugins supply bibliographic metadata.
 ## Stack
 - **Server**: FastAPI + SQLite (no ORM), Python 3.11+, Poetry (`poetry run serve`)
- **Frontend**: Single-file vanilla JS SPA (`static/index.html`)
+- **Frontend**: Vanilla JS SPA — `static/index.html` + `static/css/` + `static/js/`; no build step
 - **AI**: OpenAI-compatible API (OpenRouter, OpenAI, etc.) via `openai` library
 - **Images**: Stored uncompressed in `data/images/`; Pillow used server-side for crops and AI prep
 ## Directory Layout
 ```
-app.py                          # FastAPI routes only
-storage.py                      # DB schema/helpers, settings loading, photo file I/O
-logic.py                        # Image processing, boundary helpers, plugin runners, batch pipeline
-scripts.py                      # Poetry console entry points: fmt, presubmit
-plugins/
-  __init__.py                   # Registry: load_plugins(), get_manifest(), get_plugin()
-  rate_limiter.py               # Thread-safe per-domain rate limiter (one global instance)
-  ai_compat/
-    __init__.py                 # Exports the four AI plugin classes
-    _client.py                  # Internal: AIClient (openai wrapper, JSON extractor)
-    boundary_detector_shelves.py  # BoundaryDetectorShelvesPlugin
-    boundary_detector_books.py    # BoundaryDetectorBooksPlugin
-    text_recognizer.py            # TextRecognizerPlugin
-    book_identifier.py            # BookIdentifierPlugin
-  archives/
-    openlibrary.py              # OpenLibrary JSON API
-    rsl.py                      # RSL AJAX JSON API
-    html_scraper.py             # Config-driven HTML scraper (rusneb, alib, shpl)
-    sru_catalog.py              # SRU XML catalog (nlr)
-    telegram_bot.py             # STUB (pending Telegram credentials)
-static/index.html               # Full SPA (no build step)
+src/
+  app.py                        # FastAPI app, exception handlers
+  api.py                        # All routes (APIRouter)
+  db.py                         # All SQL; connection() / transaction() context managers
+  files.py                      # Image file I/O; DATA_DIR, IMAGES_DIR
+  config.py                     # Config loading and typed AppConfig
+  models.py                     # Typed dataclasses / mashumaro decoders
+  errors.py                     # Domain exceptions (NotFoundError, BadRequestError subtypes)
+  logic/
+    __init__.py                 # dispatch_plugin() orchestrator + re-exports
+    boundaries.py               # Boundary math, shelf/spine crop sources, boundary detector runner
+    identification.py           # Status computation, text recognizer, book identifier runners
+    archive.py                  # Archive searcher runner (sync + background)
+    batch.py                    # Batch pipeline, process_book_sync
+    images.py                   # crop_save, prep_img_b64, serve_crop
+  plugins/
+    __init__.py                 # Registry: load_plugins(), get_plugin(), get_manifest()
+    rate_limiter.py             # Thread-safe per-domain rate limiter
+    ai_compat/                  # AI plugin implementations
+    archives/                   # Archive plugin implementations
+scripts/
+  presubmit.py                  # Poetry console entry points: fmt, presubmit
+static/
+  index.html                    # HTML shell + CSS/JS imports (load order matters)
+  css/                          # base, layout, tree, forms, overlays
+  js/                           # state → helpers → api → canvas-boundary → tree-render →
+                                #   detail-render → canvas-crop → editing → photo → events → init
 config/
-  providers.default.yaml        # Provider credentials (placeholder api_key)
-  prompts.default.yaml          # Default prompt templates
-  plugins.default.yaml          # Default plugin configurations
-  ui.default.yaml               # Default UI settings
-  providers.user.yaml           # ← create this with your real api_key (gitignored)
-  *.user.yaml                   # Optional overrides for other categories (gitignored)
-data/                           # Runtime: books.db + images/
-docs/overview.md                # This file
+  credentials.default.yaml      # API endpoints and keys (override in credentials.user.yaml)
+  models.default.yaml           # Model selection and prompts per AI function
+  functions.default.yaml        # Plugin definitions and per-plugin settings
+  ui.default.yaml               # UI display settings
+  *.user.yaml                   # Gitignored overrides; create these with real values
+data/                           # Runtime: books.db + images/ (gitignored)
+tests/
+  *.py                          # Python tests (pytest)
+  js/pure-functions.test.js     # JS tests (node:test)
+docs/
+  overview.md                   # This file
+  contributing.md               # Documentation and contribution standards
 ```
 ## Layer Architecture
 Unidirectional: `api` → `logic` → `db` / `files`. No layer may import from a layer above it.
 - **api**: HTTP parsing, entity existence checks via `db.connection()`, calls logic, returns HTTP responses. Owns HTTPException and status codes.
 - **logic**: Business operations, no HTTP/FastAPI imports. Raises domain exceptions from `errors.py` for expected failures.
 - **db / files**: SQL and file I/O only. Returns typed dataclasses or None. Never raises domain exceptions.
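The import direction can be checked mechanically. A minimal sketch, assuming layer modules are imported as `db`, `src.db`, or relative names such as `from .. import api`, and treating `errors`, `models`, and `config` as shared modules any layer may import. `LAYER_RANK` and `upward_imports` are hypothetical names, not part of the codebase.

```python
import ast

# Assumed ranks: a higher number is a higher layer. Modules missing from this
# table (errors, models, config) are treated as shared and never flagged.
LAYER_RANK = {"api": 2, "logic": 1, "db": 0, "files": 0}


def _top_level(module: str) -> str:
    """Return the project module name, ignoring an optional 'src.' package prefix."""
    parts = module.split(".")
    return parts[1] if parts[0] == "src" and len(parts) > 1 else parts[0]


def upward_imports(layer: str, source: str) -> list[str]:
    """Return imported names in `source` that belong to a layer above `layer`."""
    own_rank = LAYER_RANK[layer]
    found: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # `from .. import api` and `from src import api` name the modules in the alias list.
            if node.module and node.module != "src":
                names = [node.module]
            else:
                names = [alias.name for alias in node.names]
        else:
            continue
        found.extend(n for n in names if LAYER_RANK.get(_top_level(n), -1) > own_rank)
    return found
```

Feeding each file under `src/` through `upward_imports` with its layer name would flag violations. Wiring it into `poetry run presubmit` is a possible extension, not existing tooling.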
 ## Configuration System
-Config is loaded from `config/*.default.yaml` merged with `config/*.user.yaml` overrides.
+Config loaded from `config/*.default.yaml` merged with `config/*.user.yaml`. Deep merge: dicts recursive, lists replaced. Typed via `mashumaro BasicDecoder[AppConfig]`.
-Categories: `providers`, `prompts`, `plugins`, `ui` — each loaded from its own pair of files.
+Categories:
-Minimal setup — create `config/providers.user.yaml`:
+| Category | Purpose |
 |------|---------|
 | `credentials` | `base_url` + `api_key` per endpoint; no model or prompt |
 | `models` | `credentials` ref + `model` string + optional `extra_body` + `prompt` |
 | `functions` | Plugin definitions; dict key = plugin_id (unique across all categories) |
 | `ui` | Frontend display settings |
 Minimal setup — create `config/credentials.user.yaml`:
 ```yaml
-providers:
+credentials:
  openrouter:
    api_key: "sk-or-your-actual-key"
 ```
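The merge rule (dicts recursive, lists replaced) as a minimal sketch, assuming each YAML file has already been parsed into plain dicts (for example with PyYAML). `deep_merge` is a hypothetical name, and the default values in the usage lines are illustrative, not the contents of `credentials.default.yaml`.

```python
from typing import Any


def deep_merge(default: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Merge override into default without mutating either input.

    Nested dicts merge key by key; any other override value, lists included,
    replaces the default value outright.
    """
    merged = dict(default)
    for key, value in override.items():
        base = merged.get(key)
        if isinstance(base, dict) and isinstance(value, dict):
            merged[key] = deep_merge(base, value)
        else:
            merged[key] = value
    return merged


# Illustrative values: a default file with a placeholder key, plus the minimal user override above.
defaults = {"credentials": {"openrouter": {"base_url": "https://openrouter.ai/api/v1", "api_key": "placeholder"}}}
user = {"credentials": {"openrouter": {"api_key": "sk-or-your-actual-key"}}}
config = deep_merge(defaults, user)
# config["credentials"]["openrouter"] keeps base_url and takes the user's api_key.
```

Because lists are replaced rather than concatenated, a user file can shrink a default list, which appending could never do.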
@@ -62,28 +85,26 @@ providers:
 ### Categories
 | Category | Input | Output | DB field |
 |----------|-------|--------|----------|
-| `boundary_detector` (`target=shelves`) | cabinet image | `{boundaries:[…], confidence:N}` | `cabinets.ai_shelf_boundaries` |
+| `boundary_detectors` (`target=shelves`) | cabinet image | `{boundaries:[…], confidence:N}` | `cabinets.ai_shelf_boundaries` |
-| `boundary_detector` (`target=books`) | shelf image | `{boundaries:[…]}` | `shelves.ai_book_boundaries` |
+| `boundary_detectors` (`target=books`) | shelf image | `{boundaries:[…]}` | `shelves.ai_book_boundaries` |
-| `text_recognizer` | spine image | `{raw_text, title, author, …}` | `books.raw_text` + `candidates` |
+| `text_recognizers` | spine image | `{raw_text, title, author, …}` | `books.raw_text` + `candidates` |
-| `book_identifier` | raw_text | `{title, author, …, confidence}` | `books.ai_*` + `candidates` |
+| `book_identifiers` | raw_text | `{title, author, …, confidence}` | `books.ai_*` + `candidates` |
-| `archive_searcher` | query string | `[{source, title, author, …}, …]` | `books.candidates` |
+| `archive_searchers` | query string | `[{source, title, author, …}, …]` | `books.candidates` |
 ### Universal plugin endpoint
 ```
 POST /api/{entity_type}/{entity_id}/plugin/{plugin_id}
 ```
-Routes to the correct runner function in `logic.py` based on plugin category.
+Routes to the correct runner via `dispatch_plugin()` in `logic/__init__.py`.
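A minimal sketch of category-based dispatch, assuming a flat `plugin_id` to category lookup (possible because plugin ids are unique across categories). The registry contents, runner signature, and return values are hypothetical; the real `dispatch_plugin()` presumably raises a domain exception from `errors.py` rather than `LookupError`.

```python
from collections.abc import Callable

# Hypothetical runner signature: (entity_type, entity_id, plugin_id) -> result.
Runner = Callable[[str, int, str], str]

# Hypothetical registry contents; real plugin ids come from functions.*.yaml.
PLUGIN_CATEGORIES: dict[str, str] = {
    "shelf_detector": "boundary_detectors",
    "spine_reader": "text_recognizers",
    "identifier": "book_identifiers",
    "openlibrary": "archive_searchers",
}


def _placeholder_runner(category: str) -> Runner:
    def run(entity_type: str, entity_id: int, plugin_id: str) -> str:
        # A real runner would load the entity, call the plugin, and persist results.
        return f"{category}:{plugin_id}:{entity_type}/{entity_id}"

    return run


RUNNERS: dict[str, Runner] = {
    category: _placeholder_runner(category)
    for category in ("boundary_detectors", "text_recognizers", "book_identifiers", "archive_searchers")
}


def dispatch_plugin(entity_type: str, entity_id: int, plugin_id: str) -> str:
    """Route one plugin call to the runner registered for its category."""
    category = PLUGIN_CATEGORIES.get(plugin_id)
    if category is None:
        raise LookupError(f"unknown plugin_id: {plugin_id}")
    return RUNNERS[category](entity_type, entity_id, plugin_id)
```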
 ### AI Plugin Configuration
- **Providers** (`config/providers.*.yaml`): connection credentials only — `base_url`, `api_key`.
+- `credentials` file: connection only — `base_url`, `api_key`.
- **Per-plugin** (`config/plugins.*.yaml`): `provider`, `model`, optionally `max_image_px` (default 1600), `confidence_threshold` (default 0.8).
+- `models` file: `credentials` ref, `model` string, `prompt` text, optional `extra_body`.
- `OUTPUT_FORMAT` is a **hardcoded class constant** in each plugin class — not user-configurable.
+- `functions` file: per-plugin settings — `model`, `max_image_px` (default 1600), `confidence_threshold` (default 0.8), `auto_queue`, `rate_limit_seconds`, `timeout`.
-  It is substituted into the prompt template as `${OUTPUT_FORMAT}` by `AIClient.call()`.
+- `OUTPUT_FORMAT` is a hardcoded class constant in each plugin — not user-configurable; injected into the prompt as `${OUTPUT_FORMAT}` by `AIClient`.
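`${OUTPUT_FORMAT}` matches the placeholder syntax of Python's `string.Template`. Whether `AIClient` actually uses `string.Template` is an assumption; the sketch shows only the substitution step. `TextRecognizerSketch`, `render_prompt`, and the prompt text are hypothetical.

```python
from string import Template


class TextRecognizerSketch:
    """Hypothetical stand-in for a plugin class; the real OUTPUT_FORMAT text differs."""

    OUTPUT_FORMAT = '{"raw_text": "...", "title": "...", "author": "..."}'


def render_prompt(prompt: str, output_format: str) -> str:
    """Fill ${OUTPUT_FORMAT}; safe_substitute leaves any other $-placeholder as written."""
    return Template(prompt).safe_substitute(OUTPUT_FORMAT=output_format)


prompt_template = "Read the book spine. Reply with JSON only: ${OUTPUT_FORMAT}"
prompt = render_prompt(prompt_template, TextRecognizerSketch.OUTPUT_FORMAT)
```

`safe_substitute` is used instead of `substitute` so a stray `$` in a user-edited prompt does not raise `KeyError` or `ValueError`.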
-### Archive Plugin Interface
+### Archive plugins
-All archive plugins implement `search(query: str) -> list[CandidateRecord]`.
+All implement `search(query: str) -> list[CandidateRecord]`. Use shared `RATE_LIMITER` singleton for per-domain throttling.
 `CandidateRecord`: TypedDict with `{source, title, author, year, isbn, publisher}`.
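A minimal sketch of the archive contract plus a thread-safe per-domain limiter. `CandidateRecord` uses the documented field names, but the `str` field types are an assumption. `RateLimiter.wait(url, min_interval)` is a hypothetical API, not the signature of the project's `RATE_LIMITER`; `ExampleArchive` and its URL are placeholders.

```python
import threading
import time
from collections.abc import Callable
from typing import TypedDict
from urllib.parse import urlparse


class CandidateRecord(TypedDict):
    # Field names from the doc; str types are an assumption.
    source: str
    title: str
    author: str
    year: str
    isbn: str
    publisher: str


class RateLimiter:
    """Per-domain minimum spacing between requests; safe to share across threads."""

    def __init__(
        self,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._lock = threading.Lock()
        self._next_free: dict[str, float] = {}
        self._clock = clock
        self._sleep = sleep

    def wait(self, url: str, min_interval: float) -> None:
        """Reserve the next free slot for the URL's domain, then sleep until it arrives."""
        domain = urlparse(url).netloc
        with self._lock:
            now = self._clock()
            slot = max(now, self._next_free.get(domain, now))
            self._next_free[domain] = slot + min_interval
        # Sleep outside the lock so callers for other domains are not blocked.
        if slot > now:
            self._sleep(slot - now)


RATE_LIMITER = RateLimiter()  # one shared instance, mirroring the singleton described above


class ExampleArchive:
    """Hypothetical archive plugin showing the search() contract only."""

    SEARCH_URL = "https://archive.example/search"

    def search(self, query: str) -> list[CandidateRecord]:
        RATE_LIMITER.wait(self.SEARCH_URL, min_interval=1.0)
        # A real plugin would send the HTTP request here and parse the response.
        return [CandidateRecord(source="example", title=query, author="", year="", isbn="", publisher="")]
```

Reserving the slot under the lock and sleeping after releasing it means a throttled domain never delays requests to other domains. Injecting `clock` and `sleep` keeps the limiter testable without real delays.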
 ### Auto-queue
 - After a `text_recognizers` plugin completes → fires all `archive_searchers` with `auto_queue: true` in a background thread pool.
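A minimal sketch of the auto-queue step using `concurrent.futures.ThreadPoolExecutor`. `PluginSettings`, `queue_archive_searches`, the pool size, and the `run(book_id, plugin_id)` callback are hypothetical; only the filter (category `archive_searchers`, `auto_queue: true`) comes from the text above.

```python
from collections.abc import Callable
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class PluginSettings:
    """Hypothetical subset of one functions.*.yaml entry."""

    plugin_id: str
    category: str
    auto_queue: bool = False


# Shared background pool; the worker count is arbitrary for this sketch.
BACKGROUND = ThreadPoolExecutor(max_workers=4)


def queue_archive_searches(
    book_id: int,
    plugins: list[PluginSettings],
    run: Callable[[int, str], object],
) -> list[Future[object]]:
    """Submit every auto-queued archive searcher for one book; return without waiting."""
    return [
        BACKGROUND.submit(run, book_id, p.plugin_id)
        for p in plugins
        if p.category == "archive_searchers" and p.auto_queue
    ]
```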
@@ -93,7 +114,7 @@ Uses shared `RATE_LIMITER` singleton for per-domain throttling.
 | Table | Notable columns |
 |-------|-----------------|
 | `cabinets` | `shelf_boundaries` (JSON `[…]`), `ai_shelf_boundaries` (JSON `{pluginId:[…]}`) |
-| `shelves` | `book_boundaries`, `ai_book_boundaries` (same format) |
+| `shelves` | `book_boundaries`, `ai_book_boundaries` (same format), `photo_filename` (optional override) |
 | `books` | `raw_text`, `ai_title/author/year/isbn/publisher`, `candidates` (JSON `[{source,…}]`), `identification_status` |
 `identification_status`: `unidentified` → `ai_identified` → `user_approved`.
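The JSON columns are stored as text in SQLite. A minimal sketch of updating one plugin's entry in `ai_shelf_boundaries` with the standard `sqlite3` and `json` modules, assuming a stripped-down `cabinets` table; the schema and `save_ai_boundaries` are hypothetical.

```python
import json
import sqlite3

# Hypothetical minimal table: only the two boundary columns from the schema above.
SCHEMA = """
CREATE TABLE cabinets (
    id INTEGER PRIMARY KEY,
    shelf_boundaries TEXT NOT NULL DEFAULT '[]',
    ai_shelf_boundaries TEXT NOT NULL DEFAULT '{}'
)
"""


def save_ai_boundaries(conn: sqlite3.Connection, cabinet_id: int, plugin_id: str, fractions: list[float]) -> bool:
    """Store one plugin's suggestion under its id; return False if the cabinet does not exist."""
    row = conn.execute("SELECT ai_shelf_boundaries FROM cabinets WHERE id = ?", (cabinet_id,)).fetchone()
    if row is None:
        return False
    suggestions: dict[str, list[float]] = json.loads(row[0])
    suggestions[plugin_id] = fractions  # other plugins' entries are left as they were
    conn.execute(
        "UPDATE cabinets SET ai_shelf_boundaries = ? WHERE id = ?",
        (json.dumps(suggestions), cabinet_id),
    )
    return True


conn = sqlite3.connect(":memory:")
with conn:  # commits on success, rolls back on exception
    conn.execute(SCHEMA)
    conn.execute("INSERT INTO cabinets (id) VALUES (1)")
    save_ai_boundaries(conn, 1, "detector_a", [0.33, 0.66])
    save_ai_boundaries(conn, 1, "detector_b", [0.5])
```

Returning `False` for a missing row, rather than raising, follows the layer rule that `db` returns values and leaves domain exceptions to `logic`.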
@@ -102,30 +123,36 @@ Uses shared `RATE_LIMITER` singleton for per-domain throttling.
 N interior boundaries → N+1 segments. `full = [0] + boundaries + [1]`. Segment K spans `full[K]..full[K+1]`.
 - User boundaries: `shelf_boundaries` / `book_boundaries` (editable via canvas drag)
 - AI suggestions: `ai_shelf_boundaries` / `ai_book_boundaries` (JSON object `{pluginId: [fractions]}`)
- Shelf K image = cabinet photo cropped to `(0, y_start, 1, y_end)` unless override photo exists
+- Shelf K image = cabinet photo cropped to `(0, y_start, 1, y_end)` unless shelf has override photo
 - Book K spine = shelf image cropped to `(x_start, *, x_end, *)`; when the shelf image is itself a cabinet crop, the two crops are composed into a single box on the cabinet photo
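The segment rule and the composed crop as a minimal sketch. Boxes are `(x_start, y_start, x_end, y_end)` fractions, matching the notation above, and `*` is read as the full shelf height. `segments`, `compose`, `to_pixels`, and the 800x600 photo size are illustrative assumptions; `to_pixels` returns the `(left, upper, right, lower)` order that Pillow's `Image.crop` expects.

```python
Box = tuple[float, float, float, float]  # (x_start, y_start, x_end, y_end) as fractions of the parent image


def segments(boundaries: list[float]) -> list[tuple[float, float]]:
    """N interior boundaries -> N+1 (start, end) spans; boundaries must already be sorted."""
    full = [0.0, *boundaries, 1.0]
    return [(full[k], full[k + 1]) for k in range(len(full) - 1)]


def compose(outer: Box, inner: Box) -> Box:
    """Express `inner` (fractions of the outer crop) as fractions of the outer crop's parent."""
    ox0, oy0, ox1, oy1 = outer
    ix0, iy0, ix1, iy1 = inner
    w, h = ox1 - ox0, oy1 - oy0
    return (ox0 + ix0 * w, oy0 + iy0 * h, ox0 + ix1 * w, oy0 + iy1 * h)


def to_pixels(box: Box, width: int, height: int) -> tuple[int, int, int, int]:
    """Convert a fractional box to a (left, upper, right, lower) pixel box."""
    x0, y0, x1, y1 = box
    return (round(x0 * width), round(y0 * height), round(x1 * width), round(y1 * height))


# Cabinet with one shelf boundary at 0.5; shelf 1 has one book boundary at 0.25.
y_start, y_end = segments([0.5])[1]  # shelf 1 spans (0.5, 1.0)
x_start, x_end = segments([0.25])[0]  # book 0 spans (0.0, 0.25)
shelf_box: Box = (0.0, y_start, 1.0, y_end)
spine_in_cabinet = compose(shelf_box, (x_start, 0.0, x_end, 1.0))
spine_px = to_pixels(spine_in_cabinet, 800, 600)  # (0, 300, 200, 600)
```

Composing in fraction space first and converting to pixels once avoids accumulating rounding error across the two crops.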
 ## Frontend JS
 No ES modules, no bundler. All files use global scope; load order in `index.html` is the dependency order. State lives in `state.js` (`S`, `_plugins`, `_bnd`, `_photoQueue`, etc.). Events delegated via `#app` in `events.js`.
 ## Tooling
 ```
 poetry run serve       # start uvicorn on :8000
 poetry run fmt         # black (in-place)
-poetry run presubmit   # black --check + flake8 + pyright + pytest  ← run before finishing any task
+poetry run presubmit   # black --check + flake8 + pyright + pytest + JS tests
 npm install            # install ESLint + Prettier (requires network; enables JS lint/fmt in presubmit)
 npm run lint           # ESLint on static/js/
 npm run fmt            # Prettier on static/js/
 ```
-Line length: 120. Type checking: pyright strict mode. Pytest fixtures with `yield` use `Iterator[T]` return type.
+Line length: 120. Pyright strict mode. Pytest fixtures with `yield` return `Iterator[T]`.
-Tests in `tests/`; use `monkeypatch` on `storage.DB_PATH` / `storage.DATA_DIR` for temp-DB fixtures.
+Test fixtures: monkeypatch `db.DB_PATH` / `files.DATA_DIR` / `files.IMAGES_DIR`.
 ## Key API Endpoints
 ```
-GET    /api/config                                      # UI config + plugin manifest
+GET    /api/config                                       # UI config + plugin manifest
-GET    /api/tree                                        # full nested tree
+GET    /api/tree                                         # full nested tree
 POST   /api/{entity_type}/{entity_id}/plugin/{plugin_id} # universal plugin runner
-PATCH  /api/cabinets/{id}/boundaries                    # update shelf boundary list
+PATCH  /api/cabinets/{id}/boundaries                     # update shelf boundary list
-PATCH  /api/shelves/{id}/boundaries                     # update book boundary list
+PATCH  /api/shelves/{id}/boundaries                      # update book boundary list
-GET    /api/shelves/{id}/image                          # shelf image (override or cabinet crop)
+GET    /api/shelves/{id}/image                           # shelf image (override or cabinet crop)
-GET    /api/books/{id}/spine                            # book spine crop
+GET    /api/books/{id}/spine                             # book spine crop
-POST   /api/books/{id}/process                          # run full auto-queue pipeline (single book)
+POST   /api/books/{id}/process                           # full auto-queue pipeline (single book)
-POST   /api/batch                                       # start batch processing
+POST   /api/batch / GET /api/batch/status                # batch processing
-GET    /api/batch/status
+POST   /api/books/{id}/dismiss-field                     # dismiss a candidate suggestion
-POST   /api/books/{id}/dismiss-field                    # dismiss a candidate suggestion
+PATCH  /api/{kind}/reorder                               # drag-to-reorder
-PATCH  /api/{kind}/reorder                              # SortableJS drag reorder
+POST   /api/cabinets/{id}/crop / POST /api/shelves/{id}/crop  # permanent crop
 ```