Commit bd77b1c

feat: add PDF to Image conversion with poppler-utils integration
🚀 New Features:
- PDF to PNG image conversion at 300 DPI
- Cross-platform poppler-utils detection and installation guidance
- Interactive WebView installation guide with copyable commands
- Batch conversion of multiple PDF files
- Smart file naming with a `filename_Images` folder structure

🔧 Technical Implementation:
- External-tool approach: the `pdftoppm` command performs the conversion
- Error handling and user feedback throughout
- Progress tracking for long-running operations
- English and Chinese support for all user interfaces
- Platform-specific installation instructions (macOS/Windows/Linux)

📚 Documentation Updates:
- Updated README.md and README.zh-cn.md with the PDF to Image feature
- Updated the GitHub Pages website (English and Chinese versions)
- Updated the version to 0.1.7 across all documentation
- Added feature descriptions and usage examples

🧪 Testing & Quality:
- Integration tests for PDF conversion
- Multi-platform compatibility validation
- Real-world testing with sample PDF files
- Code cleanup and removal of debug files

🌐 Multi-language Support:
- Complete i18n for all new features
- Chinese and English interface translations
- Platform-specific installation guidance in both languages

This release extends the extension's document conversion with PDF-to-image support while keeping its focus on simplicity and reliability.
1 parent 0ea4239 commit bd77b1c

23 files changed

Lines changed: 1456 additions & 135 deletions
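The commit message describes detecting poppler-utils before converting, and the issue file below proposes a `ToolAvailability` interface for the result. The TypeScript sketch below shows one way to fill that interface in. It assumes two things. First, `pdftoppm -v` prints a banner such as `pdftoppm version 23.08.0`, usually on stderr. Second, a missing binary shows up as a spawn error with a string code such as `ENOENT`. The names `parsePdftoppmVersion` and `detectPdftoppm` are hypothetical and do not come from the extension.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Same shape as the ToolAvailability interface proposed in the issue.
interface ToolAvailability {
  isInstalled: boolean;
  version?: string;
  installationGuide: string;
}

// Extract the version from a banner like "pdftoppm version 23.08.0".
// The banner wording is an assumption based on recent poppler releases.
function parsePdftoppmVersion(output: string): string | undefined {
  const match = /pdftoppm version (\S+)/.exec(output);
  return match ? match[1] : undefined;
}

// Hypothetical helper: probe pdftoppm on PATH without going through a shell.
async function detectPdftoppm(installationGuide: string): Promise<ToolAvailability> {
  try {
    const { stdout, stderr } = await execFileAsync("pdftoppm", ["-v"]);
    return { isInstalled: true, version: parsePdftoppmVersion(stderr + stdout), installationGuide };
  } catch (err) {
    const e = err as { code?: string | number; stdout?: string; stderr?: string };
    // Spawn failures (e.g. ENOENT when the binary is not on PATH) carry a string code.
    if (typeof e.code === "string") {
      return { isInstalled: false, installationGuide };
    }
    // A numeric code means the binary ran but exited non-zero, so treat it as installed.
    const output = `${e.stderr ?? ""}${e.stdout ?? ""}`;
    return { isInstalled: true, version: parsePdftoppmVersion(output), installationGuide };
  }
}
```

Under this sketch, an `isInstalled: false` result is the point where the extension would open the installation guide instead of starting a conversion.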

.github/ISSUES/pdf-to-image.md

Lines changed: 71 additions & 55 deletions
@@ -3,104 +3,120 @@
 ## 📋 Development Task
 
 ### Task Description
-Implement functionality to convert PDF pages to individual image files with support for different formats, quality settings, and page range selection.
+Implement functionality to convert PDF pages to PNG images using poppler-utils command-line tool. This feature provides a simple, one-click conversion with minimal user configuration required.
 
 ### Acceptance Criteria
-- [ ] Convert PDF pages to PNG/JPG images
-- [ ] Support custom resolution and quality settings
-- [ ] Allow page range selection (e.g., pages 1-5, or specific pages)
+- [ ] Convert PDF pages to PNG images (standardized format)
+- [ ] Use poppler-utils (pdftoppm) as conversion engine
+- [ ] Detect and guide users to install poppler-utils if not available
 - [ ] Batch process multiple PDF files
-- [ ] Maintain original page proportions and quality
+- [ ] Use standard settings (300 DPI, PNG format) for optimal quality
 - [ ] Add progress tracking for multi-page conversions
-- [ ] Support different output formats (PNG, JPG, WebP)
 - [ ] Create organized folder structure for output images
-- [ ] Add configuration options for image settings
+- [ ] Cross-platform installation detection (Windows, macOS, Linux)
 
 ### Technical Requirements
-- Integrate PDF rendering library (`pdf2pic`, `pdf-poppler`, `pdf.js`)
-- Implement page range parsing and validation
-- Add image quality and format options
+- Use poppler-utils (pdftoppm command) for PDF to image conversion
+- Implement tool availability detection across platforms
+- Provide clear installation guidance for missing tools
+- Handle command execution with proper error handling
 - Support batch processing with progress tracking
-- Handle large PDF files efficiently (memory management)
-- Cross-platform compatibility (Windows, macOS, Linux)
+- Create organized output folder structure
+- Maintain cross-platform compatibility (Windows, macOS, Linux)
 
-### Implementation Notes
-1. **Library Evaluation**:
-   - `pdf2pic`: Node.js wrapper for GraphicsMagick/ImageMagick
-   - `pdf-poppler`: Node.js wrapper for Poppler PDF utilities
-   - `pdf.js`: Mozilla's PDF rendering library
-   - Consider bundle size and dependencies
-
-2. **Configuration Options**:
+### Implementation Strategy
+1. **Tool Detection System**:
 ```typescript
-interface PDFToImageOptions {
-  format: 'png' | 'jpg' | 'webp';
-  quality: number; // 1-100 for JPG
-  density: number; // DPI (72, 150, 300)
-  pageRange?: string; // "1-5", "1,3,5", "all"
-  outputDir?: string;
-  prefix?: string; // filename prefix
+interface ToolAvailability {
+  isInstalled: boolean;
+  version?: string;
+  installationGuide: string;
 }
 ```
 
-3. **Page Range Parsing**:
-   - "all" - convert all pages
-   - "1-5" - convert pages 1 through 5
-   - "1,3,5" - convert specific pages
-   - "1-3,7,10-12" - mixed ranges
+2. **Standard Conversion Settings**:
+   - Format: PNG (best quality, transparency support)
+   - Resolution: 300 DPI (high quality for text and images)
+   - Color space: RGB
+   - Compression: Standard PNG compression
+
+3. **Command Template**:
+```bash
+pdftoppm -png -r 300 input.pdf output_prefix
+```
+
+4. **Installation Guidance**:
+   - **macOS**: `brew install poppler`
+   - **Windows**: Download portable version or use package manager
+   - **Linux**: `sudo apt-get install poppler-utils` (Ubuntu/Debian)
 
-4. **File Naming Convention**:
-   - Single page: `document_page_001.png`
-   - Multiple PDFs: `document1_page_001.png`, `document2_page_001.png`
-   - Custom prefix: `{prefix}_page_{number}.{ext}`
+5. **File Naming Convention**:
+   - Single PDF: `document-01.png`, `document-02.png`
+   - Multiple PDFs: `document1-01.png`, `document2-01.png`
 
-5. **Output Organization**:
+6. **Output Organization**:
 ```
 pdf_images/
 ├── document1/
-│   ├── page_001.png
-│   ├── page_002.png
+│   ├── document1-01.png
+│   ├── document1-02.png
 │   └── ...
 └── document2/
-    ├── page_001.png
+    ├── document2-01.png
     └── ...
 ```
 
 ### Testing Requirements
-- [ ] Image quality validation tests
-- [ ] Page range parsing tests
+- [ ] Tool detection on all platforms (Windows, macOS, Linux)
+- [ ] Command execution and error handling tests
+- [ ] Installation guidance verification
 - [ ] Performance tests with large PDFs (100+ pages)
-- [ ] Memory usage optimization tests
+- [ ] Batch processing functionality tests
 - [ ] Cross-platform compatibility tests
-- [ ] Error handling for corrupted PDFs
+- [ ] Error handling for corrupted PDFs and missing tools
 
 ### Dependencies
-- PDF rendering library (research and selection needed)
-- Image processing utilities
-- Page range parsing utilities
+- poppler-utils (external command-line tool)
+- Child process execution utilities
 - File system operations
 - Progress tracking integration
+- Cross-platform path handling
+
+### User Experience Flow
+1. User selects PDF file(s) for conversion
+2. Extension checks if poppler-utils is installed
+3. If not installed, show installation guide with platform-specific instructions
+4. If installed, proceed with conversion using standard settings
+5. Show progress bar for multi-page documents
+6. Display completion message with output location
+
+### Benefits of Simplified Approach
+- **Zero Configuration**: No options to confuse users
+- **Consistent Output**: All images use optimal settings
+- **Faster Development**: No complex UI for options
+- **Better Reliability**: Single tested configuration
+- **Easier Maintenance**: Fewer edge cases to handle
 
 ### Related Issues
 - Part of Advanced Document Processing v0.2.0
 - Should integrate with existing PDF text conversion
-- May share PDF parsing infrastructure
+- Share tool detection infrastructure with other converters
 
 ### Estimated Effort
 - [x] 1-2 weeks
 - [ ] 2+ weeks
 
 **Breakdown**:
-- Library research and evaluation: 2-3 days
-- Core conversion implementation: 3-4 days
-- Page range and options handling: 2-3 days
-- UI integration and testing: 2-3 days
+- Tool detection system implementation: 2-3 days
+- Core conversion command execution: 2-3 days
+- Installation guidance and UI: 1-2 days
+- Testing and cross-platform validation: 2-3 days
 
 ### Priority Level
 - [ ] Critical
-- [ ] High
-- [x] Medium
+- [x] High
+- [ ] Medium
 - [ ] Low
 
 ### Labels
-`enhancement`, `feature-request`, `development`, `v0.2.0`, `pdf`, `image-conversion`
+`enhancement`, `feature-request`, `development`, `v0.2.0`, `pdf`, `image-conversion`, `poppler-utils`
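The issue's User Experience Flow has four stages: check for the tool, show the guide if it is missing, otherwise convert with progress, and report the outputs. It can be sketched as a small orchestration function. This is an illustration only. `FlowDeps`, `runPdfToImageFlow` and the callbacks are hypothetical names. In a real VS Code extension they would wrap tool detection, the WebView guide, `pdftoppm` execution and the progress UI. Injecting them keeps the flow testable without poppler installed.

```typescript
// Hypothetical dependencies the flow needs; each maps to one step of the issue's flow.
interface FlowDeps {
  isToolInstalled: () => Promise<boolean>;          // step 2
  showInstallGuide: () => Promise<void>;            // step 3
  convertOne: (pdfPath: string) => Promise<string>; // step 4, returns the output folder
  reportProgress: (done: number, total: number) => void; // step 5
}

type FlowResult =
  | { status: "tool-missing" }
  | { status: "done"; outputs: string[]; failed: string[] }; // step 6 data

async function runPdfToImageFlow(pdfPaths: string[], deps: FlowDeps): Promise<FlowResult> {
  if (!(await deps.isToolInstalled())) {
    await deps.showInstallGuide();
    return { status: "tool-missing" };
  }
  const outputs: string[] = [];
  const failed: string[] = [];
  let done = 0;
  for (const pdfPath of pdfPaths) {
    try {
      outputs.push(await deps.convertOne(pdfPath));
    } catch (err) {
      // One bad file (e.g. a corrupted PDF) should not abort the whole batch.
      failed.push(pdfPath);
    }
    done += 1;
    deps.reportProgress(done, pdfPaths.length);
  }
  return { status: "done", outputs, failed };
}
```

Progress here is reported once per file. Per-page progress, as the flow's step 5 suggests, would need page counts from another source, since the sketch waits for each `pdftoppm` run to finish.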

.gitignore

Lines changed: 6 additions & 0 deletions
@@ -40,3 +40,9 @@ yarn-error.log*
 
 # Build artifacts
 *.map
+
+# Compiled JavaScript files (keep only config files)
+src/**/*.js
+src/**/*.js.map
+!esbuild.js
+!eslint.config.mjs

CHANGELOG.md

Lines changed: 8 additions & 0 deletions
@@ -6,6 +6,14 @@ Check [Keep a Changelog](http://keepachangelog.com/) for recommendations on how
 
 ## [Unreleased]
 
+### Planned Features
+- **PDF to Images Conversion**: Convert PDF pages to PNG images using poppler-utils
+  - One-click conversion with 300 DPI standard quality
+  - Cross-platform tool detection and installation guidance
+  - Simplified workflow with zero user configuration required
+  - Batch processing support for multiple PDF files
+  - Technical approach changed from in-extension libraries to external tool integration
+
 ## [0.1.6] - 2025-07-09
 
 ### Added

README.md

Lines changed: 18 additions & 0 deletions
@@ -11,6 +11,7 @@ A powerful VS Code extension for converting various document formats to Markdown
 - **Excel Spreadsheets** (.xlsx, .xls, .csv) → Markdown Tables
 - **Excel Spreadsheets** (.xlsx, .xls) → CSV Files
 - **PDF Documents** (.pdf) → Text Files
+- **PDF Documents** (.pdf) → PNG Images *(requires poppler-utils)*
 - **PowerPoint Presentations** (.pptx, .ppt) → Markdown
 
 ### Core Features
@@ -65,6 +66,7 @@ npm run compile
 - `Convert Excel to Markdown` - Convert Excel files to Markdown tables
 - `Convert Excel to CSV` - Convert Excel files to CSV format
 - `Convert PDF to Text` - Convert PDF to text files
+- `Convert PDF to Images` - Convert PDF pages to PNG images *(requires poppler-utils)*
 - `Convert PowerPoint to Markdown` - Convert PowerPoint presentations to Markdown
 - `Extract Word Tables to CSV` - Extract tables from Word documents to CSV format
 - `Extract PDF Tables to CSV` - Extract tables from PDF documents to CSV format
@@ -93,6 +95,8 @@ npm run compile
 - Automatic data formatting
 
 ### PDF Document Conversion
+
+#### Text Extraction
 - **Advanced Text Processing Algorithms**:
   - Smart space correction
   - Word boundary detection
@@ -108,6 +112,20 @@ npm run compile
   - Organize content by paragraphs
   - Markdown format output
 
+#### Image Conversion *(New Feature)*
+- **PDF to Images**: Convert PDF pages to high-quality PNG images
+- **Tool Requirement**: Requires poppler-utils installation
+- **Standard Settings**: 300 DPI resolution for optimal quality
+- **Batch Processing**: Convert multiple PDFs with progress tracking
+- **Cross-Platform**: Automatic tool detection with installation guidance
+- **Organized Output**: Creates structured folder hierarchy for images
+- **One-Click Setup**: Simple installation guidance for missing tools
+
+**Installation Guide for poppler-utils**:
+- **macOS**: `brew install poppler`
+- **Windows**: Download portable version or use package manager
+- **Linux**: `sudo apt-get install poppler-utils`
+
 ### PowerPoint Presentation Conversion
 - **Slide Content Extraction**:
   - Extract text from all slides
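The README's "Standard Settings" correspond to the issue's command template `pdftoppm -png -r 300 input.pdf output_prefix`. Below is a minimal sketch of running that command from the extension host, assuming `pdftoppm` is on `PATH`. The names `buildPdftoppmArgs` and `convertPdfToPng` are hypothetical. The arguments are passed to `execFile` as an array rather than as a shell string, so file names with spaces need no quoting.

```typescript
import { execFile } from "node:child_process";
import { mkdir } from "node:fs/promises";
import * as path from "node:path";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Arguments for: pdftoppm -png -r <dpi> <input.pdf> <output_prefix>
function buildPdftoppmArgs(pdfPath: string, outputPrefix: string, dpi: number = 300): string[] {
  return ["-png", "-r", String(dpi), pdfPath, outputPrefix];
}

// Hypothetical helper: render every page of one PDF into outDir at 300 DPI.
// pdftoppm itself appends "-<page>.png" to the prefix for each page it writes.
async function convertPdfToPng(pdfPath: string, outDir: string): Promise<string> {
  await mkdir(outDir, { recursive: true });
  const prefix = path.join(outDir, path.parse(pdfPath).name);
  try {
    await execFileAsync("pdftoppm", buildPdftoppmArgs(pdfPath, prefix));
  } catch (err) {
    const stderr = (err as { stderr?: string }).stderr ?? "";
    // Surface pdftoppm's own message (e.g. for a corrupted PDF) to the caller.
    throw new Error(`pdftoppm failed for ${pdfPath}: ${stderr.trim() || String(err)}`);
  }
  return outDir;
}
```

Keeping the argument builder separate from process execution means the standard settings can be unit-tested without poppler installed.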

README.zh-cn.md

Lines changed: 22 additions & 0 deletions
@@ -11,6 +11,7 @@
 - **Excel Spreadsheets** (.xlsx, .xls, .csv) → Markdown Tables
 - **Excel Spreadsheets** (.xlsx, .xls) → CSV Files
 - **PDF Documents** (.pdf) → Text Files
+- **PDF Documents** (.pdf) → PNG Images *(requires poppler-utils)*
 - **PowerPoint Presentations** (.pptx, .ppt) → Markdown
 
 ### Core Features
@@ -61,6 +62,7 @@ npm run compile
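 - `Convert Excel to Markdown` - Convert Excel files to Markdown tables
 - `Convert Excel to CSV` - Convert Excel files to CSV format
 - `Convert PDF to Text` - Convert PDF to text files
+- `Convert PDF to Images` - Convert PDF pages to PNG images *(requires poppler-utils)*
 - `Convert PowerPoint to Markdown` - Convert PowerPoint presentations to Markdown
 - `Extract Word Tables to CSV` - Extract tables from Word documents to CSV format
 - `Extract PDF Tables to CSV` - Extract tables from PDF documents to CSV format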
@@ -89,6 +91,8 @@ npm run compile
 - Automatic data formatting
 
 ### PDF Document Conversion
+
+#### Text Extraction
 - **Advanced Text Processing Algorithms**:
   - Smart space correction
   - Word boundary detection
@@ -104,6 +108,24 @@ npm run compile
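   - Organize content by paragraphs
   - Markdown format output
 
+#### Image Conversion *(New Feature)*
+- **PDF to Images**: Convert PDF pages to high-quality PNG images
+- **Tool Requirement**: Requires the poppler-utils tool to be installed
+- **Standard Settings**: 300 DPI resolution for optimal quality
+- **Batch Processing**: Convert multiple PDF files with progress tracking
+- **Cross-Platform**: Automatic tool detection with installation guidance
+- **Organized Output**: Creates a structured folder hierarchy
+- **One-Click Setup**: Simple installation guidance when the tool is missing
+
+**poppler-utils Installation Guide**:
+- **macOS**: `brew install poppler`
+- **Windows**: Download the portable version or use a package manager
+- **Linux**: `sudo apt-get install poppler-utils`
+- **Output Enhancement**:
+  - Add document metadata
+  - Organize content by paragraphs
+  - Markdown format output
+
 ### PowerPoint Presentation Conversion
 - **Slide Content Extraction**:
   - Extract text from all slides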
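Both READMEs give one installation hint per platform. Below is a sketch of selecting that hint from Node's `process.platform` value, in a form a WebView could render with a copy button. The commands come from the READMEs. `InstallGuide`, `installGuideFor` and `formatGuide` are hypothetical names. The Windows entry stays a note because the READMEs give no single command for it. The remark that `pdftoppm.exe` must end up on `PATH` is an assumption tied to PATH-based detection.

```typescript
// Keys are Node.js process.platform values.
type SupportedPlatform = "darwin" | "linux" | "win32";

interface InstallGuide {
  label: string;
  command?: string; // copyable command, when the docs give one
  note: string;
}

const INSTALL_GUIDES: Record<SupportedPlatform, InstallGuide> = {
  darwin: { label: "macOS", command: "brew install poppler", note: "Installs pdftoppm via Homebrew." },
  linux: {
    label: "Linux",
    command: "sudo apt-get install poppler-utils",
    note: "Ubuntu/Debian; other distributions use their own package manager.",
  },
  win32: {
    label: "Windows",
    // Assumption: detection looks up pdftoppm on PATH, so the install must be reachable there.
    note: "Download a portable poppler build or use a package manager, then make sure pdftoppm.exe is on PATH.",
  },
};

// Returns undefined for platforms the guide does not cover (e.g. "freebsd").
// hasOwnProperty guards against inherited keys such as "constructor".
function installGuideFor(platform: string): InstallGuide | undefined {
  return Object.prototype.hasOwnProperty.call(INSTALL_GUIDES, platform)
    ? INSTALL_GUIDES[platform as SupportedPlatform]
    : undefined;
}

// Plain-text rendering, e.g. for a notification or a WebView <pre> block.
function formatGuide(guide: InstallGuide): string {
  return guide.command ? `${guide.label}: ${guide.command}\n${guide.note}` : `${guide.label}: ${guide.note}`;
}
```

At runtime the lookup would be `installGuideFor(process.platform)`. An `undefined` result could fall back to a generic pointer to the poppler project.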

ROADMAP.md

Lines changed: 15 additions & 12 deletions
@@ -74,20 +74,23 @@
 - [ ] Support batch export of multiple presentations
 
 ##### 2.3 PDF to Image Conversion
-- Convert PDF pages to individual images
-- Support different image formats and quality settings
-- Batch processing for multi-page PDFs
+- Convert PDF pages to PNG images using poppler-utils
+- One-click conversion with optimal default settings
+- Cross-platform tool detection and installation guidance
+- Batch processing for multiple PDFs
 
 **Technical Requirements:**
-- Integrate PDF rendering library (e.g., `pdf2pic`, `pdf-poppler`)
-- Add image quality and format options
-- Implement page range selection
+- Use poppler-utils (pdftoppm) as conversion engine
+- Implement tool availability detection system
+- Provide platform-specific installation guidance
+- Standard settings: 300 DPI, PNG format
 
 **Acceptance Criteria:**
-- [ ] Convert PDF pages to PNG/JPG images
-- [ ] Support custom resolution and quality settings
-- [ ] Allow page range selection
-- [ ] Batch process multiple PDF files
+- [ ] Convert PDF pages to high-quality PNG images
+- [ ] Detect poppler-utils installation across platforms
+- [ ] Guide users through tool installation if needed
+- [ ] Batch process multiple PDF files with progress tracking
+- [ ] Create organized output folder structure
 
 ---
 
@@ -151,9 +154,9 @@
 
 #### New Libraries to Evaluate:
 - **PowerPoint Processing**: `pptx-parser`, `officegen`, `node-pptx`
-- **Image Generation**: `pdf2pic`, `canvas`, `sharp`
 - **Table Processing**: Enhanced `mammoth.js` usage, custom table parsers
-- **PDF Rendering**: `pdf-poppler`, `pdf2pic`
+- **External Tools**: poppler-utils (pdftoppm for PDF to image conversion)
+- **Image Processing**: `sharp` for image optimization (if needed)
 
 #### Compatibility Requirements:
 - Node.js compatibility for VS Code extension environment
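The roadmap asks for an "organized output folder structure". The issue's tree puts pages at `pdf_images/<document>/<document>-NN.png`, and file names in that layout can be predicted ahead of time, for example for the completion message or for tests. This sketch follows the issue's tree. It assumes pdftoppm's `<prefix>-<page>.png` naming, with the page number zero-padded to the digit count of the document's total page count. Under that assumption a 5-page file yields `document-1.png` and a 12-page file yields `document-01.png`. `outputLayout` and `expectedPageFile` are hypothetical helpers.

```typescript
import * as path from "node:path";

// Hypothetical helper: pdf_images/<name>/ under outputRoot, plus the prefix to pass to pdftoppm.
function outputLayout(pdfPath: string, outputRoot: string): { dir: string; prefix: string } {
  const name = path.parse(pdfPath).name; // "document1" for ".../document1.pdf"
  const dir = path.join(outputRoot, "pdf_images", name);
  return { dir, prefix: path.join(dir, name) };
}

// Predicts the file pdftoppm writes for one page, assuming "<prefix>-<page>.png"
// with the page number padded to the digit count of the total page count.
function expectedPageFile(prefix: string, page: number, totalPages: number): string {
  const width = String(totalPages).length;
  let digits = String(page);
  while (digits.length < width) {
    digits = "0" + digits;
  }
  return `${prefix}-${digits}.png`;
}
```

For example, `expectedPageFile(outputLayout(pdfPath, root).prefix, 1, pageCount)` gives the path of the first rendered page. The total page count has to come from elsewhere, such as the existing PDF text pipeline.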

docs/index.html

Lines changed: 14 additions & 3 deletions
@@ -91,7 +91,7 @@
 <div class="hero-content">
 <div class="hero-badge">
 <i class="fas fa-rocket"></i>
-Latest v0.1.6 - Excel to CSV Conversion!
+Latest v0.1.7 - PDF to Image Conversion!
 </div>
 <h1 class="hero-title">
 <span class="highlight">OneClick</span> Document to Markdown Conversion
@@ -199,8 +199,8 @@ <h3 class="feature-title">Excel Spreadsheets</h3>
 </div>
 <h3 class="feature-title">PDF Documents</h3>
 <p class="feature-description">
-Extract and optimize text from PDF files with advanced processing
-algorithms and smart formatting.
+Extract and optimize text from PDF files or convert PDF pages to high-quality PNG images.
+Advanced processing algorithms ensure smart formatting and reliable image conversion.
 </p>
 </div>
 
@@ -215,6 +215,17 @@ <h3 class="feature-title">Table Extraction</h3>
 </p>
 </div>
 
+<div class="feature-card">
+<div class="feature-icon">
+<i class="fas fa-images"></i>
+</div>
+<h3 class="feature-title">PDF to Images</h3>
+<p class="feature-description">
+Convert PDF pages to high-quality PNG images with automatic tool detection
+and cross-platform installation guidance for poppler-utils.
+</p>
+</div>
+
 <div class="feature-card">
 <div class="feature-icon">
 <i class="fas fa-file-powerpoint"></i>
