Commit ecd1cd8
Add page sectioning, web rendering steps and replace Listr2 with cli-progress
Implements the remaining pipeline steps (page-sectioning, web-rendering with
HTML validation) and replaces Listr2 with cli-progress for reliable concurrent
page processing. Listr2 had a fundamental bug where nested task.newListr()
subtasks were not properly awaited under concurrency, causing pipeline steps
to run out of order.
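
The concurrency fix described above can be illustrated with a small promise pool. This is a minimal sketch, not the code from this commit: `runPool` and its signature are hypothetical names chosen for illustration. Each runner awaits one page's worker to completion before claiming the next page, so the steps inside a worker stay in order while at most `limit` pages are in flight.

```typescript
// Hypothetical helper: the name and signature are illustrative, not taken from the commit.
async function runPool<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item

  // Each runner claims items until none remain. The claim (next++) happens
  // synchronously between awaits, so two runners never take the same index.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  // Start at most `limit` runners (at least one) and wait for all of them.
  const count = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: count }, () => runner()));
  return results; // same order as `items`, regardless of completion order
}
```

A worker passed to such a pool would run a page's steps sequentially with `await` (for example sectioning, then rendering), and a per-step progress bar could be advanced when each worker resolves.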
Key changes:
- Add page-sectioning and web-rendering pipeline steps with Liquid prompts
- Add HTML validation (data-id uniqueness, text containment, image refs)
- Replace Listr2 with cli-progress MultiBar + custom async concurrency pool
- Generate unique per-text IDs (pg001_gp001_tx001) for web rendering
- Add configurable max_retries to StepConfig (default 8 for web rendering)
- Add step/item_id columns to llm_log table with v2→v3 migration
- Progress bars per pipeline step instead of per page for better scaling
- Spinner for metadata extraction, progress bar for PDF extraction

1 parent a17627a, commit ecd1cd8
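
The per-text ID format and the data-id uniqueness check listed in the key changes could look roughly like the sketch below. Everything here is an assumption made for illustration: the function names `textId` and `duplicateDataIds` are hypothetical, the segments are assumed to be 1-based, three-digit, zero-padded indices for page, group and text, and the regex scan stands in for whatever HTML parsing the actual validator uses.

```typescript
// Assumption: each segment is a two-letter prefix plus a zero-padded 3-digit index.
function textId(page: number, group: number, text: number): string {
  const pad = (n: number): string => String(n).padStart(3, "0");
  return `pg${pad(page)}_gp${pad(group)}_tx${pad(text)}`;
}

// Returns every data-id value that appears more than once in the HTML.
// A regex scan keeps the sketch dependency-free; it only matches
// double-quoted attributes and is not a full HTML parser.
function duplicateDataIds(html: string): string[] {
  const seen = new Set<string>();
  const dupes = new Set<string>();
  const re = /data-id="([^"]*)"/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    const id = m[1];
    if (seen.has(id)) dupes.add(id);
    seen.add(id);
  }
  return Array.from(dupes);
}
```

An empty result from `duplicateDataIds` would mean the rendered HTML passes the uniqueness check; a non-empty result is the kind of failure that a retry (bounded by `max_retries`) could respond to.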
24 files changed
Lines changed: 2021 additions & 333 deletions
File tree
- packages
- pipeline
- src
- __tests__
- storage/src
- __tests__
- types/src
- prompts