🔎 Search before asking
🐛 Bug (Problem Description)
I just noticed that the table extracted into the exported Markdown file differs from the one in the exported JSON file, which causes data integrity and consistency issues.
The Markdown output is accurate, but the JSON output is not.
virology_pg2_0_res.json
virology_pg2_0.md
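To locate exactly which cells diverge between the two files, a comparison script along the following lines could help. It is a minimal sketch that rests on two assumptions not confirmed here: that tables appear as HTML `<table>` markup in both files, and that the JSON stores table blocks under `parsing_res_list`, with `block_label == "table"` and the HTML in `block_content`. The helper names (`extract_tables`, `tables_from_json`, `diff_tables`, `compare_files`) are hypothetical and not part of PaddleOCR or PaddleX.

```python
import json
from html.parser import HTMLParser
from itertools import zip_longest


class _TableParser(HTMLParser):
    """Collect every <table> in an HTML/Markdown string as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []  # each table is a list of rows; each row is a list of cell strings
        self._row = None  # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse internal whitespace so formatting differences are not reported.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_tables(text):
    """Parse all HTML tables in `text` into nested lists of cell text."""
    parser = _TableParser()
    parser.feed(text)
    parser.close()
    return parser.tables


def tables_from_json(res):
    """ASSUMPTION: table HTML lives in res["parsing_res_list"][i]["block_content"]."""
    html = "".join(
        block.get("block_content", "")
        for block in res.get("parsing_res_list", [])
        if block.get("block_label") == "table"
    )
    return extract_tables(html)


def diff_tables(md_tables, json_tables):
    """Yield (table, row, column, markdown_cell, json_cell) for every mismatching cell."""
    for t, (mt, jt) in enumerate(zip_longest(md_tables, json_tables, fillvalue=[])):
        for r in range(max(len(mt), len(jt))):
            m_row = mt[r] if r < len(mt) else []
            j_row = jt[r] if r < len(jt) else []
            for c in range(max(len(m_row), len(j_row))):
                m = m_row[c] if c < len(m_row) else None  # None = cell missing
                j = j_row[c] if c < len(j_row) else None
                if m != j:
                    yield t, r, c, m, j


def compare_files(md_path, json_path):
    """Return all cell mismatches between a saved Markdown file and a saved JSON file."""
    with open(md_path, encoding="utf-8") as f:
        md_tables = extract_tables(f.read())
    with open(json_path, encoding="utf-8") as f:
        json_tables = tables_from_json(json.load(f))
    return list(diff_tables(md_tables, json_tables))
```

For example, `compare_files("output/virology_pg2_0.md", "output/virology_pg2_0_res.json")` would list each mismatching cell; adjust the paths to wherever `save_to_markdown` and `save_to_json` actually wrote the files. If the Markdown renders tables as pipe tables rather than HTML, the Markdown side would need a different parser.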
🏃‍♂️ Environment (Runtime Environment)
OS macOS-26.2
Environment Jupyter
Python 3.13.2
PaddleOCR 3.4.0
Install uv
RAM 16.00 GB
CPU Apple M1
CUDA None
🌰 Minimal Reproducible Example (Minimal Demo to Reproduce the Issue)
from paddlex import create_pipeline
pipeline = create_pipeline(pipeline="PaddleOCR-VL-1.5")
output = pipeline.predict(input="./data/virology_pg2.pdf")
pages_res = list(output)
output = pipeline.restructure_pages(pages_res)
# output = pipeline.restructure_pages(pages_res, merge_table=True) # Merge tables across pages
# output = pipeline.restructure_pages(pages_res, merge_table=True, relevel_titles=True) # Merge tables across pages and reconstruct multi-level titles
# output = pipeline.restructure_pages(pages_res, merge_table=True, relevel_titles=True, concatenate_pages=True) # Merge tables across pages, reconstruct multi-level titles, and merge multiple pages
for res in output:
    res.print()  # Print the structured prediction output
    res.save_to_json(save_path="output")  # Save the current image's structured result in JSON format
    res.save_to_markdown(save_path="output")  # Save the current image's result in Markdown format
virology_pg2.pdf