diff --git a/README_zh-CN.md b/README_zh-CN.md
new file mode 100644
index 000000000..ad22d7e30
--- /dev/null
+++ b/README_zh-CN.md
@@ -0,0 +1,391 @@
+# MarkItDown
+
+[![PyPI](https://img.shields.io/pypi/v/markitdown.svg)](https://pypi.org/project/markitdown/)
+![PyPI - Downloads](https://img.shields.io/pypi/dd/markitdown)
+[![Built by AutoGen Team](https://img.shields.io/badge/Built%20by-AutoGen%20Team-blue)](https://github.com/microsoft/autogen)
+
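+> ⚠️ **Important**
+>
+> MarkItDown performs I/O with the privileges of the current process. Like `open()` or `requests.get()`, it can access any resource the process itself can access. In untrusted environments, sanitize your inputs and call the narrowest `convert_*` function that fits your use case (for example, `convert_stream()` or `convert_local()`). See the [Security Considerations](#安全考虑) section for more information.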
+
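+## Overview
+
+MarkItDown is a lightweight Python utility for converting various file formats to Markdown for use with large language models (LLMs) and related text-analysis pipelines.
+
+Compared with tools such as [textract](https://github.com/deanmalmgren/textract), MarkItDown focuses on preserving important document structure and content as Markdown (including headings, lists, tables, and links). Although the output is usually readable and reasonably human-friendly, it is designed primarily for consumption by text-analysis tools; it may not be the best choice for high-fidelity document conversion intended for human readers.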
+
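+## Why Markdown?
+
+Markdown is very close to plain text, with minimal markup or formatting, yet it still provides a way to represent important document structure. Mainstream LLMs, such as OpenAI's GPT-4o, natively "speak" Markdown and often use it in their responses unprompted. This suggests that they were trained on large amounts of Markdown-formatted text and understand it well. As a side benefit, Markdown conventions are also highly token-efficient.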
+
+## Supported Formats
+
+MarkItDown currently supports conversion from the following formats:
+
+| Category | Formats |
+|----------|---------|
+| Documents | PDF, Word (DOCX), Excel (XLSX/XLS), PowerPoint (PPTX), EPUB |
+| Web | HTML, Wikipedia, YouTube, Bing search results, RSS |
+| Media | Images (EXIF metadata and OCR), audio (EXIF metadata and speech transcription) |
+| Other | ZIP, CSV, JSON, XML, Jupyter Notebook, Outlook messages |
+| Cloud services | Azure Document Intelligence |
+
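+## Prerequisites
+
+MarkItDown requires Python 3.10 or higher. Using a virtual environment is recommended to avoid dependency conflicts.
+
+### Creating a virtual environment with standard Python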
+
+```bash
+python -m venv .venv
+# Windows
+.venv\Scripts\activate
+# Linux/macOS
+source .venv/bin/activate
+```
+
+### Creating a virtual environment with uv
+
+```bash
+uv venv --python=3.12 .venv
+source .venv/bin/activate
+# Note: use 'uv pip install' rather than plain 'pip install'
+```
+
+### Creating a virtual environment with Anaconda
+
+```bash
+conda create -n markitdown python=3.12
+conda activate markitdown
+```
+
+## Installation
+
+### Install from PyPI (recommended)
+
+Install with all optional dependencies (recommended):
+
+```bash
+pip install 'markitdown[all]'
+```
+
+Alternatively, install from source:
+
+```bash
+git clone git@github.com:microsoft/markitdown.git
+cd markitdown
+pip install -e 'packages/markitdown[all]'
+```
+
+### Optional dependency groups
+
+You can install only the dependencies you need:
+
+```bash
+pip install 'markitdown[pdf, docx, pptx]'
+```
+
+Available optional dependency groups:
+
+| Group | Supported formats |
+|-------|-------------------|
+| `[all]` | All optional dependencies (recommended) |
+| `[pptx]` | PowerPoint files |
+| `[docx]` | Word files |
+| `[xlsx]` | Excel files |
+| `[xls]` | Legacy Excel files |
+| `[pdf]` | PDF files |
+| `[outlook]` | Outlook messages |
+| `[az-doc-intel]` | Azure Document Intelligence |
+| `[audio-transcription]` | Audio transcription of WAV and MP3 files |
+| `[youtube-transcription]` | Fetching YouTube video transcripts |
+
+## Usage
+
+### Command line
+
+#### Basic conversion
+
+Convert a file to Markdown and write it to standard output:
+
+```bash
+markitdown path-to-file.pdf
+```
+
+Save to a file (option 1, shell redirection):
+
+```bash
+markitdown path-to-file.pdf > document.md
+```
+
+Save to a file (option 2, the `-o` flag):
+
+```bash
+markitdown path-to-file.pdf -o document.md
+```
+
+#### Reading from standard input
+
+```bash
+cat path-to-file.pdf | markitdown
+```
+
+Or:
+
+```bash
+markitdown < path-to-file.pdf
+```
+
+#### Providing file type hints
+
+When reading from standard input, or when the file extension is ambiguous, you can provide type hints:
+
+```bash
+# Hint the file extension
+markitdown -x .pdf
+
+# Hint the MIME type
+markitdown -m application/pdf
+
+# Hint the character encoding
+markitdown -c utf-8
+```
+
+#### Using plugins
+
+MarkItDown supports third-party plugins. Plugins are disabled by default.
+
+List installed plugins:
+
+```bash
+markitdown --list-plugins
+```
+
+Convert with plugins enabled:
+
+```bash
+markitdown --use-plugins path-to-file.pdf
+```
+
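+To find available plugins, search GitHub for the hashtag `#markitdown-plugin`.
+
+#### MarkItDown OCR plugin
+
+The `markitdown-ocr` plugin adds OCR support to the PDF, DOCX, PPTX, and XLSX converters. It uses LLM vision to extract text from embedded images, following the same `llm_client` / `llm_model` pattern that MarkItDown already uses for image descriptions, so no new ML libraries or binary dependencies are required.
+
+**Installation:**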
+
+```bash
+pip install markitdown-ocr
+pip install openai  # or any OpenAI-compatible client
+```
+
+**Command-line usage:**
+
+```bash
+markitdown document.pdf --use-plugins
+```
+
+**Python usage:**
+
+```python
+from markitdown import MarkItDown
+from openai import OpenAI
+
+md = MarkItDown(
+    enable_plugins=True,
+    llm_client=OpenAI(),
+    llm_model="gpt-4o",
+)
+result = md.convert("document_with_images.pdf")
+print(result.text_content)
+```
+
+If no `llm_client` is provided, the plugin still loads, but OCR is silently skipped and the standard built-in converters are used instead.
+
+See [`packages/markitdown-ocr/README.md`](packages/markitdown-ocr/README.md) for details.
+
+#### Using Azure Document Intelligence
+
+Convert with Microsoft Document Intelligence:
+
+```bash
+markitdown path-to-file.pdf -o document.md -d -e "<document_intelligence_endpoint>"
+```
+
+For more information on setting up an Azure Document Intelligence resource, see the [official documentation](https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/how-to-guides/create-document-intelligence-resource?view=doc-intel-4.0.0).
+
+### Python API
+
+#### Basic usage
+
+```python
+from markitdown import MarkItDown
+
+md = MarkItDown(enable_plugins=False)  # set to True to enable plugins
+result = md.convert("test.xlsx")
+print(result.text_content)
+```
+
+#### Using Document Intelligence
+
+```python
+from markitdown import MarkItDown
+
+md = MarkItDown(docintel_endpoint="<document_intelligence_endpoint>")
+result = md.convert("test.pdf")
+print(result.text_content)
+```
+
+#### Using an LLM for image descriptions
+
+To use an LLM for image descriptions (currently only for pptx and image files), provide `llm_client` and `llm_model`:
+
+```python
+from markitdown import MarkItDown
+from openai import OpenAI
+
+client = OpenAI()
+md = MarkItDown(
+    llm_client=client,
+    llm_model="gpt-4o",
+    llm_prompt="optional custom prompt",
+)
+result = md.convert("example.jpg")
+print(result.text_content)
+```
+
+#### Multiple input sources
+
+MarkItDown accepts several kinds of input:
+
+```python
+from markitdown import MarkItDown
+from pathlib import Path
+import requests
+
+md = MarkItDown()
+
+# Local file path (string)
+result = md.convert("/path/to/file.pdf")
+
+# Local file path (Path object)
+result = md.convert(Path("/path/to/file.pdf"))
+
+# URL
+result = md.convert("https://example.com/document.pdf")
+
+# requests.Response object
+response = requests.get("https://example.com/document.pdf")
+result = md.convert(response)
+
+# Binary stream
+with open("/path/to/file.pdf", "rb") as f:
+    result = md.convert(f)
+```
+
+#### Using the narrower APIs for tighter security control
+
+```python
+from markitdown import MarkItDown
+
+md = MarkItDown()
+
+# Local files only (no URLs)
+result = md.convert_local("/path/to/file.pdf")
+
+# Streams only
+with open("/path/to/file.pdf", "rb") as f:
+    result = md.convert_stream(f)
+
+# URIs only
+result = md.convert_uri("https://example.com/document.pdf")
+```
+
+### Docker
+
+```sh
+docker build -t markitdown:latest .
+docker run --rm -i markitdown:latest < ~/your-file.pdf > output.md
+```
+
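+## Security Considerations <a name="安全考虑"></a>
+
+MarkItDown performs I/O with the privileges of the current process. Like `open()` or `requests.get()`, it can access any resource the process itself can access.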
+
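+### Input security
+
+**Sanitize your inputs:** Do not pass untrusted input directly to MarkItDown. If any part of the input may be controlled by an untrusted user or system (for example, in a hosted or server-side application), it must be validated and restricted before MarkItDown is called. Depending on your environment, this may include:
+- Restricting file paths
+- Restricting URI schemes and network destinations
+- Blocking access to private, loopback, link-local, or metadata-service addresses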
+
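+### Choosing an API
+
+**Call only the conversion method you need:** Prefer the narrowest conversion API that fits your use case.
+
+| API | Access | Recommended for |
+|-----|--------|-----------------|
+| `convert()` | Local files, URLs, and streams | General use (most permissive) |
+| `convert_local()` | Local files only | Reading local files only |
+| `convert_stream()` | Already-open streams only | Fully controlled input |
+| `convert_response()` | `requests.Response` objects only | Managing HTTP fetching yourself |
+| `convert_uri()` | URI resolution | When URI handling is required |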
+
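+## Contributing
+
+This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
+
+When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repositories using our CLA.
+
+This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). For more information, see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.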
+
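+### How to contribute
+
+You can help by looking at issues or by helping review PRs. Any issue or PR is welcome, but we have also marked some as "open for contribution" and "open for reviewing" to help facilitate community contributions. These are of course just suggestions, and you are welcome to contribute in any way you like.
+
+| | All | Especially needs help from the community |
+|--|-----|------------------------------------------|
+| **Issues** | [All issues](https://github.com/microsoft/markitdown/issues) | [Issues open for contribution](https://github.com/microsoft/markitdown/issues?q=is%3Aissue+is%3Aopen+label%3A%22open+for+contribution%22) |
+| **PRs** | [All PRs](https://github.com/microsoft/markitdown/pulls) | [PRs open for reviewing](https://github.com/microsoft/markitdown/pulls?q=is%3Apr+is%3Aopen+label%3A%22open+for+reviewing%22) |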
+
+### Running tests and checks
+
+1. Navigate to the MarkItDown package directory:
+
+```sh
+cd packages/markitdown
+```
+
+2. Install `hatch` in your environment and run the tests:
+
+```sh
+pip install hatch  # Other ways of installing hatch: https://hatch.pypa.io/dev/install/
+hatch shell
+hatch test
+```
+
+Alternatively, use the Devcontainer, which has all dependencies installed:
+
+```sh
+# Reopen the project in the Devcontainer and run:
+hatch test
+```
+
+3. Run the pre-commit checks before submitting a PR:
+
+```sh
+pre-commit run --all-files
+```
+
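+### Contributing third-party plugins
+
+You can also contribute by creating and sharing third-party plugins. See `packages/markitdown-sample-plugin` for more details.
+
+## Trademarks
+
+This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow [Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general). Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.
+
+## License
+
+This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.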
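Review note on the README's input-security checklist (restricting URI schemes, blocking private, loopback, link-local, and metadata-service addresses): the following is a minimal Python sketch of such a pre-check, not something MarkItDown ships. `ALLOWED_SCHEMES`, `is_ip_blocked`, and `is_uri_allowed` are hypothetical names chosen for illustration, and the HTTPS-only policy is an assumption. Only IP-literal hosts are inspected, so a hostname that resolves to an internal address would still need a DNS-level check.

```python
import ipaddress
from urllib.parse import urlsplit

# Assumption: the application only ever needs to fetch over HTTPS.
ALLOWED_SCHEMES = {"https"}


def is_ip_blocked(host: str) -> bool:
    """Return True if host is an IP literal in a range that should not be fetched."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal; hostnames are not resolved in this sketch.
        return False
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local  # includes 169.254.169.254, a common metadata address
        or ip.is_reserved
        or ip.is_unspecified
    )


def is_uri_allowed(uri: str) -> bool:
    """Hypothetical pre-check to run before handing a URI to a converter."""
    parts = urlsplit(uri)
    if parts.scheme.lower() not in ALLOWED_SCHEMES:
        return False
    host = parts.hostname
    if not host or host.lower() == "localhost":
        return False
    return not is_ip_blocked(host)
```

A caller might run `is_uri_allowed(uri)` and only then pass the URI to `convert_uri()`, rejecting the request otherwise.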
diff --git a/web/app.py b/web/app.py
new file mode 100644
index 000000000..6e03a2b05
--- /dev/null
+++ b/web/app.py
@@ -0,0 +1,232 @@
+import os
+import io
+import tempfile
+import zipfile
+from pathlib import Path
+from flask import Flask, render_template, request, jsonify, session, send_file
+from flask_cors import CORS
+from markitdown import MarkItDown
+from werkzeug.utils import secure_filename
+
+app = Flask(__name__)
+app.secret_key = os.environ.get('SECRET_KEY') or os.urandom(32)  # never commit a fixed key; SECRET_KEY is an assumed variable name
+CORS(app, supports_credentials=True)
+app.config['MAX_CONTENT_LENGTH'] = 100 * 1024 * 1024  # 100MB max file size
+
+ALLOWED_EXTENSIONS = {'pdf', 'docx', 'doc', 'pptx', 'ppt', 'xlsx', 'xls', 
+                      'jpg', 'jpeg', 'png', 'html', 'htm', 'csv', 'json', 
+                      'xml', 'epub', 'txt', 'md', 'ipynb'}
+
+def get_markitdown():
+    llm_config = session.get('llm_config', {})
+    kwargs = {'enable_plugins': False}
+    
+    if llm_config.get('api_key'):
+        try:
+            from openai import OpenAI
+            
+            client_kwargs = {'api_key': llm_config['api_key']}
+            if llm_config.get('base_url'):
+                client_kwargs['base_url'] = llm_config['base_url']
+            
+            client = OpenAI(**client_kwargs)
+            kwargs['llm_client'] = client
+            
+            if llm_config.get('model'):
+                kwargs['llm_model'] = llm_config['model']
+        except ImportError:
+            pass
+    
+    return MarkItDown(**kwargs)
+
+
+def get_file_extension(filename):
+    return Path(filename).suffix.lower().lstrip('.')
+
+
+def allowed_file(filename):
+    ext = get_file_extension(filename)
+    return ext in ALLOWED_EXTENSIONS
+
+
+@app.route('/')
+def index():
+    return render_template('index.html')
+
+
+def convert_single_file(file, md):
+    original_filename = file.filename
+    
+    if not original_filename:
+        return None, {'error': 'No filename', 'filename': original_filename}
+    
+    if not allowed_file(original_filename):
+        return None, {'error': 'File type not supported', 'filename': original_filename}
+    
+    try:
+        ext = get_file_extension(original_filename)
+        
+        safe_filename = secure_filename(original_filename)
+        if not safe_filename or safe_filename == '.':
+            safe_filename = f"upload.{ext}"
+        elif not get_file_extension(safe_filename):
+            safe_filename = f"{safe_filename}.{ext}"
+        
+        with tempfile.TemporaryDirectory() as temp_dir:
+            temp_path = os.path.join(temp_dir, safe_filename)
+            file.save(temp_path)
+            
+            result = md.convert_local(temp_path)  # narrowest API: uploads are always local files
+            
+            return {
+                'success': True,
+                'filename': original_filename,
+                'markdown': result.text_content,
+                'file_type': ext
+            }, None
+            
+    except Exception as e:
+        import traceback
+        traceback.print_exc()
+        return None, {'error': str(e), 'filename': original_filename}
+
+
+@app.route('/api/convert', methods=['POST'])
+def convert_file():
+    if 'file' not in request.files:
+        return jsonify({'error': 'No file part'}), 400
+    
+    file = request.files['file']
+    md = get_markitdown()
+    
+    result, error = convert_single_file(file, md)
+    
+    if error:
+        return jsonify(error), 400
+    
+    return jsonify(result)
+
+
+@app.route('/api/convert-batch', methods=['POST'])
+def convert_batch():
+    if 'files' not in request.files:
+        return jsonify({'error': 'No files part'}), 400
+    
+    files = request.files.getlist('files')
+    
+    if not files:
+        return jsonify({'error': 'No selected files'}), 400
+    
+    md = get_markitdown()
+    results = []
+    errors = []
+    
+    for file in files:
+        if file.filename:
+            result, error = convert_single_file(file, md)
+            if result:
+                results.append(result)
+            if error:
+                errors.append(error)
+    
+    session['batch_results'] = results
+    
+    return jsonify({
+        'success': True,
+        'total': len(files),
+        'success_count': len(results),
+        'error_count': len(errors),
+        'results': results,
+        'errors': errors
+    })
+
+
+@app.route('/api/supported-formats', methods=['GET'])
+def supported_formats():
+    return jsonify({
+        'formats': [
+            {'ext': 'pdf', 'name': 'PDF Documents', 'icon': '📄'},
+            {'ext': 'docx', 'name': 'Word Documents', 'icon': '📝'},
+            {'ext': 'doc', 'name': 'Word Documents', 'icon': '📝'},
+            {'ext': 'pptx', 'name': 'PowerPoint Presentations', 'icon': '📊'},
+            {'ext': 'ppt', 'name': 'PowerPoint Presentations', 'icon': '📊'},
+            {'ext': 'xlsx', 'name': 'Excel Spreadsheets', 'icon': '📈'},
+            {'ext': 'xls', 'name': 'Excel Spreadsheets', 'icon': '📈'},
+            {'ext': 'jpg', 'name': 'JPEG Images', 'icon': '🖼️'},
+            {'ext': 'jpeg', 'name': 'JPEG Images', 'icon': '🖼️'},
+            {'ext': 'png', 'name': 'PNG Images', 'icon': '🖼️'},
+            {'ext': 'html', 'name': 'HTML Files', 'icon': '🌐'},
+            {'ext': 'htm', 'name': 'HTML Files', 'icon': '🌐'},
+            {'ext': 'csv', 'name': 'CSV Files', 'icon': '📋'},
+            {'ext': 'json', 'name': 'JSON Files', 'icon': '📋'},
+            {'ext': 'xml', 'name': 'XML Files', 'icon': '📋'},
+            {'ext': 'epub', 'name': 'EPUB eBooks', 'icon': '📚'},
+            {'ext': 'txt', 'name': 'Text Files', 'icon': '📄'},
+            {'ext': 'md', 'name': 'Markdown Files', 'icon': '📝'},
+            {'ext': 'ipynb', 'name': 'Jupyter Notebooks', 'icon': '📓'},
+        ]
+    })
+
+
+@app.route('/api/llm-config', methods=['GET', 'POST', 'DELETE'])
+def llm_config():
+    if request.method == 'GET':
+        config = session.get('llm_config', {})
+        return jsonify({
+            'has_config': bool(config.get('api_key')),
+            'model': config.get('model', ''),
+            'base_url': config.get('base_url', ''),
+        })
+    
+    elif request.method == 'POST':
+        data = request.get_json(silent=True)  # None on a missing or invalid JSON body, so the 400 below is returned
+        
+        if not data or not data.get('api_key'):
+            return jsonify({'error': 'API key is required'}), 400
+        
+        session['llm_config'] = {
+            'api_key': data.get('api_key'),
+            'model': data.get('model', 'gpt-4o'),
+            'base_url': data.get('base_url', ''),
+        }
+        
+        return jsonify({
+            'success': True,
+            'message': 'LLM config saved successfully'
+        })
+    
+    elif request.method == 'DELETE':
+        session.pop('llm_config', None)
+        return jsonify({
+            'success': True,
+            'message': 'LLM config cleared successfully'
+        })
+
+
+@app.route('/api/download-batch', methods=['GET'])
+def download_batch():
+    results = session.get('batch_results', [])
+    
+    if not results:
+        return jsonify({'error': 'No batch results available'}), 400
+    
+    # Build the archive in memory so that no temp file has to outlive the response.
+    buffer = io.BytesIO()
+    with zipfile.ZipFile(buffer, 'w', zipfile.ZIP_DEFLATED) as zipf:
+        for result in results:
+            filename = result['filename']
+            markdown = result['markdown']
+            md_filename = os.path.splitext(filename)[0] + '.md'
+            zipf.writestr(md_filename, markdown)
+
+    buffer.seek(0)  # rewind so send_file streams from the start
+    return send_file(
+        buffer,
+        mimetype='application/zip',
+        as_attachment=True,
+        download_name='converted_files.zip'
+    )
+
+
+if __name__ == '__main__':
+    app.run(debug=os.environ.get('FLASK_DEBUG') == '1', host='0.0.0.0', port=5000)  # debugger off unless explicitly enabled
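Review note on `download_batch`: the endpoint packs each converted result into a ZIP under `<stem>.md`, so two uploads such as `report.pdf` and `report.docx` would produce the same member name, and a client-supplied filename containing directory parts would be written into the archive unchanged. The sketch below shows one possible way to handle both, under stated assumptions: `build_markdown_zip` is a hypothetical helper that is not part of this PR, and the `_1`, `_2` suffix scheme is just one choice.

```python
import io
import os
import zipfile


def build_markdown_zip(results):
    """Pack [{'filename': ..., 'markdown': ...}, ...] into ZIP bytes (hypothetical helper)."""
    buffer = io.BytesIO()
    used = set()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zipf:
        for result in results:
            # Keep only the final path component so members stay inside the archive root.
            base = os.path.basename(result["filename"].replace("\\", "/"))
            stem = os.path.splitext(base)[0] or "document"
            name = f"{stem}.md"
            counter = 1
            # Append a numeric suffix when an earlier upload already claimed the name.
            while name in used:
                name = f"{stem}_{counter}.md"
                counter += 1
            used.add(name)
            zipf.writestr(name, result["markdown"])
    # The ZipFile is closed here, so the central directory has been written.
    return buffer.getvalue()
```

The returned bytes could be wrapped in `io.BytesIO` and passed to `send_file` with `download_name='converted_files.zip'`.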
diff --git a/web/requirements.txt b/web/requirements.txt
new file mode 100644
index 000000000..a0339de44
--- /dev/null
+++ b/web/requirements.txt
@@ -0,0 +1,4 @@
+flask>=3.0.0
+flask-cors>=4.0.0
+markitdown[all]
+openai>=1.0.0
diff --git a/web/static/css/style.css b/web/static/css/style.css
new file mode 100644
index 000000000..ceab664c7
--- /dev/null
+++ b/web/static/css/style.css
@@ -0,0 +1,932 @@
+:root {
+    --primary: #6366f1;
+    --primary-light: #818cf8;
+    --primary-dark: #4f46e5;
+    --secondary: #64748b;
+    --accent: #f472b6;
+    --bg-primary: #f8fafc;
+    --bg-secondary: #ffffff;
+    --bg-tertiary: #f1f5f9;
+    --bg-hover: #eef2ff;
+    --text-primary: #1e293b;
+    --text-secondary: #64748b;
+    --text-muted: #94a3b8;
+    --border: #e2e8f0;
+    --border-light: #f1f5f9;
+    --success: #22c55e;
+    --warning: #f59e0b;
+    --error: #ef4444;
+    --shadow-sm: 0 1px 2px 0 rgb(0 0 0 / 0.05);
+    --shadow: 0 1px 3px 0 rgb(0 0 0 / 0.1), 0 1px 2px -1px rgb(0 0 0 / 0.1);
+    --shadow-md: 0 4px 6px -1px rgb(0 0 0 / 0.1), 0 2px 4px -2px rgb(0 0 0 / 0.1);
+    --shadow-lg: 0 10px 15px -3px rgb(0 0 0 / 0.1), 0 4px 6px -4px rgb(0 0 0 / 0.1);
+    --shadow-xl: 0 20px 25px -5px rgb(0 0 0 / 0.1), 0 8px 10px -6px rgb(0 0 0 / 0.1);
+    --radius-sm: 6px;
+    --radius: 8px;
+    --radius-md: 12px;
+    --radius-lg: 16px;
+    --font-sans: 'Inter', system-ui, -apple-system, sans-serif;
+    --font-mono: 'Fira Code', 'SF Mono', monospace;
+    --transition: 200ms cubic-bezier(0.4, 0, 0.2, 1);
+}
+
+* {
+    margin: 0;
+    padding: 0;
+    box-sizing: border-box;
+}
+
+html {
+    font-size: 16px;
+    -webkit-font-smoothing: antialiased;
+    -moz-osx-font-smoothing: grayscale;
+    height: 100%;
+}
+
+body {
+    font-family: var(--font-sans);
+    background: var(--bg-primary);
+    color: var(--text-primary);
+    line-height: 1.6;
+    min-height: 100vh;
+    height: 100%;
+    overflow: hidden;
+}
+
+.app-container {
+    min-height: 100vh;
+    height: 100vh;
+    display: flex;
+    flex-direction: column;
+}
+
+.app-header {
+    background: var(--bg-secondary);
+    border-bottom: 1px solid var(--border);
+    padding: 0.75rem 1.5rem;
+    flex-shrink: 0;
+    display: flex;
+    align-items: center;
+    justify-content: space-between;
+    gap: 1rem;
+}
+
+.header-left {
+    display: flex;
+    align-items: center;
+    gap: 1.5rem;
+}
+
+.header-right {
+    display: flex;
+    align-items: center;
+    gap: 0.75rem;
+}
+
+.logo {
+    display: flex;
+    align-items: center;
+    gap: 0.625rem;
+}
+
+.logo-icon {
+    width: 28px;
+    height: 28px;
+}
+
+.logo h1 {
+    font-size: 1rem;
+    font-weight: 600;
+    letter-spacing: -0.02em;
+}
+
+.btn-icon {
+    display: inline-flex;
+    align-items: center;
+    justify-content: center;
+    padding: 0.5rem;
+    background: var(--bg-tertiary);
+    border: 1px solid var(--border);
+    border-radius: var(--radius);
+    color: var(--text-secondary);
+    cursor: pointer;
+    transition: all var(--transition);
+}
+
+.btn-icon:hover {
+    background: var(--bg-hover);
+    color: var(--primary);
+    border-color: var(--primary-light);
+}
+
+.btn-primary {
+    display: inline-flex;
+    align-items: center;
+    justify-content: center;
+    gap: 0.375rem;
+    padding: 0.5rem 1rem;
+    background: var(--primary);
+    color: white;
+    border: none;
+    border-radius: var(--radius);
+    font-size: 0.875rem;
+    font-weight: 500;
+    cursor: pointer;
+    transition: all var(--transition);
+    font-family: var(--font-sans);
+}
+
+.btn-primary:hover {
+    background: var(--primary-dark);
+    transform: translateY(-1px);
+    box-shadow: var(--shadow-md);
+}
+
+.btn-primary:active {
+    transform: translateY(0);
+}
+
+.btn-ghost {
+    display: inline-flex;
+    align-items: center;
+    justify-content: center;
+    gap: 0.375rem;
+    padding: 0.4375rem 0.75rem;
+    background: transparent;
+    color: var(--text-secondary);
+    border: 1px solid var(--border);
+    border-radius: var(--radius);
+    font-size: 0.75rem;
+    font-weight: 500;
+    cursor: pointer;
+    transition: all var(--transition);
+    font-family: var(--font-sans);
+}
+
+.btn-ghost:hover {
+    background: var(--bg-tertiary);
+    color: var(--text-primary);
+    border-color: var(--border-light);
+}
+
+.btn-small {
+    padding: 0.3125rem 0.625rem;
+    font-size: 0.75rem;
+}
+
+.main-layout {
+    flex: 1;
+    display: flex;
+    overflow: hidden;
+    min-height: 0;
+}
+
+.preview-area {
+    flex: 1;
+    width: 80%;
+    display: flex;
+    flex-direction: column;
+    padding: 1rem;
+    gap: 0.75rem;
+    min-height: 0;
+    overflow: hidden;
+}
+
+.preview-header {
+    display: flex;
+    align-items: center;
+    justify-content: space-between;
+    background: var(--bg-secondary);
+    padding: 0.75rem 1rem;
+    border-radius: var(--radius-md);
+    border: 1px solid var(--border);
+    flex-shrink: 0;
+}
+
+.preview-file-info {
+    display: flex;
+    align-items: center;
+    gap: 0.75rem;
+}
+
+.preview-file-icon {
+    font-size: 1.5rem;
+}
+
+.preview-file-details {
+    display: flex;
+    flex-direction: column;
+    gap: 0.125rem;
+}
+
+.preview-file-details h3 {
+    font-size: 0.875rem;
+    font-weight: 600;
+    color: var(--text-primary);
+}
+
+.preview-file-type {
+    font-size: 0.6875rem;
+    color: var(--text-muted);
+    background: var(--bg-tertiary);
+    padding: 0.125rem 0.5rem;
+    border-radius: 999px;
+    width: fit-content;
+}
+
+.preview-actions {
+    display: flex;
+    align-items: center;
+    gap: 0.5rem;
+}
+
+.view-toggle {
+    display: flex;
+    align-items: center;
+    gap: 0.25rem;
+    background: var(--bg-secondary);
+    padding: 0.25rem;
+    border-radius: var(--radius);
+    border: 1px solid var(--border);
+    width: fit-content;
+    flex-shrink: 0;
+}
+
+.toggle-btn {
+    display: inline-flex;
+    align-items: center;
+    justify-content: center;
+    gap: 0.375rem;
+    padding: 0.375rem 0.75rem;
+    background: transparent;
+    color: var(--text-secondary);
+    border: none;
+    border-radius: var(--radius-sm);
+    font-size: 0.75rem;
+    font-weight: 500;
+    cursor: pointer;
+    transition: all var(--transition);
+    font-family: var(--font-sans);
+}
+
+.toggle-btn:hover {
+    color: var(--text-primary);
+}
+
+.toggle-btn.active {
+    background: var(--primary);
+    color: white;
+}
+
+.preview-content-area {
+    flex: 1;
+    display: flex;
+    flex-direction: column;
+    background: var(--bg-secondary);
+    border-radius: var(--radius-md);
+    border: 1px solid var(--border);
+    overflow: hidden;
+    min-height: 0;
+}
+
+.empty-state {
+    flex: 1;
+    display: flex;
+    flex-direction: column;
+    align-items: center;
+    justify-content: center;
+    gap: 1rem;
+    padding: 2rem;
+    text-align: center;
+}
+
+.empty-icon {
+    width: 80px;
+    height: 80px;
+    opacity: 0.6;
+}
+
+.empty-icon svg {
+    width: 100%;
+    height: 100%;
+}
+
+.empty-state h3 {
+    font-size: 1rem;
+    font-weight: 600;
+    color: var(--text-primary);
+}
+
+.empty-state p {
+    font-size: 0.875rem;
+    color: var(--text-secondary);
+    max-width: 300px;
+}
+
+.result-content {
+    flex: 1;
+    display: flex;
+    flex-direction: column;
+    min-height: 0;
+}
+
+.markdown-view,
+.preview-view {
+    display: none;
+    flex: 1;
+    overflow: hidden;
+}
+
+.markdown-view.active,
+.preview-view.active {
+    display: flex;
+    flex-direction: column;
+}
+
+.markdown-view pre {
+    margin: 0;
+    padding: 1rem;
+    flex: 1;
+    overflow: auto;
+    min-height: 0;
+}
+
+.markdown-view code {
+    font-family: var(--font-mono);
+    font-size: 0.8125rem;
+    line-height: 1.7;
+    color: var(--text-primary);
+    white-space: pre-wrap;
+    word-break: break-word;
+}
+
+.preview-content {
+    padding: 1.25rem;
+    flex: 1;
+    overflow: auto;
+    min-height: 0;
+    font-family: var(--font-sans);
+    font-size: 0.875rem;
+    line-height: 1.7;
+    color: var(--text-primary);
+}
+
+.preview-content h1,
+.preview-content h2,
+.preview-content h3,
+.preview-content h4,
+.preview-content h5,
+.preview-content h6 {
+    margin-top: 1rem;
+    margin-bottom: 0.5rem;
+    font-weight: 600;
+    line-height: 1.3;
+    color: var(--text-primary);
+}
+
+.preview-content h1:first-child,
+.preview-content h2:first-child,
+.preview-content h3:first-child {
+    margin-top: 0;
+}
+
+.preview-content h1 {
+    font-size: 1.5rem;
+}
+
+.preview-content h2 {
+    font-size: 1.25rem;
+}
+
+.preview-content h3 {
+    font-size: 1.125rem;
+}
+
+.preview-content h4 {
+    font-size: 1rem;
+}
+
+.preview-content p {
+    margin-bottom: 0.75rem;
+}
+
+.preview-content p:last-child {
+    margin-bottom: 0;
+}
+
+.preview-content ul,
+.preview-content ol {
+    margin-bottom: 0.75rem;
+    padding-left: 1.25rem;
+}
+
+.preview-content li {
+    margin-bottom: 0.125rem;
+}
+
+.preview-content code {
+    background: var(--bg-tertiary);
+    padding: 0.125rem 0.375rem;
+    border-radius: 4px;
+    font-family: var(--font-mono);
+    font-size: 0.75rem;
+}
+
+.preview-content pre {
+    background: var(--bg-tertiary);
+    padding: 0.75rem;
+    border-radius: var(--radius);
+    overflow-x: auto;
+    margin-bottom: 0.75rem;
+}
+
+.preview-content pre code {
+    background: transparent;
+    padding: 0;
+}
+
+.preview-content blockquote {
+    border-left: 3px solid var(--primary);
+    padding-left: 0.875rem;
+    margin: 0.75rem 0;
+    color: var(--text-secondary);
+}
+
+.preview-content table {
+    width: 100%;
+    border-collapse: collapse;
+    margin-bottom: 0.75rem;
+}
+
+.preview-content th,
+.preview-content td {
+    border: 1px solid var(--border);
+    padding: 0.5rem 0.75rem;
+    text-align: left;
+    font-size: 0.8125rem;
+}
+
+.preview-content th {
+    background: var(--bg-tertiary);
+    font-weight: 600;
+}
+
+.preview-content hr {
+    border: none;
+    border-top: 1px solid var(--border);
+    margin: 1rem 0;
+}
+
+.preview-content a {
+    color: var(--primary);
+    text-decoration: none;
+}
+
+.preview-content a:hover {
+    text-decoration: underline;
+}
+
+.preview-content img {
+    max-width: 100%;
+    height: auto;
+    border-radius: var(--radius);
+}
+
+.sidebar {
+    width: 20%;
+    min-width: 240px;
+    max-width: 320px;
+    background: var(--bg-secondary);
+    border-left: 1px solid var(--border);
+    display: flex;
+    flex-direction: column;
+    min-height: 0;
+    overflow: hidden;
+}
+
+.sidebar-header {
+    display: flex;
+    align-items: center;
+    justify-content: space-between;
+    padding: 0.875rem 1rem;
+    border-bottom: 1px solid var(--border);
+    flex-shrink: 0;
+}
+
+.sidebar-header h3 {
+    font-size: 0.875rem;
+    font-weight: 600;
+    color: var(--text-primary);
+}
+
+.sidebar-actions {
+    display: flex;
+    align-items: center;
+    gap: 0.25rem;
+}
+
+.document-list {
+    flex: 1;
+    overflow-y: auto;
+    min-height: 0;
+}
+
+.empty-list {
+    padding: 2rem 1rem;
+    text-align: center;
+    color: var(--text-muted);
+    font-size: 0.8125rem;
+}
+
+.document-item {
+    display: flex;
+    align-items: center;
+    gap: 0.75rem;
+    padding: 0.75rem 1rem;
+    border-bottom: 1px solid var(--border-light);
+    cursor: pointer;
+    transition: all var(--transition);
+}
+
+.document-item:hover {
+    background: var(--bg-tertiary);
+}
+
+.document-item.active {
+    background: var(--bg-hover);
+    border-left: 3px solid var(--primary);
+}
+
+.document-item.error {
+    opacity: 0.7;
+}
+
+.document-item-icon {
+    font-size: 1.25rem;
+    flex-shrink: 0;
+}
+
+.document-item-details {
+    flex: 1;
+    overflow: hidden;
+    min-width: 0;
+}
+
+.document-item-name {
+    font-size: 0.8125rem;
+    font-weight: 500;
+    color: var(--text-primary);
+    overflow: hidden;
+    text-overflow: ellipsis;
+    white-space: nowrap;
+}
+
+.document-item-status {
+    font-size: 0.6875rem;
+    font-weight: 500;
+}
+
+.document-item-status.success {
+    color: var(--success);
+}
+
+.document-item-status.error {
+    color: var(--error);
+}
+
+.modal-overlay {
+    position: fixed;
+    top: 0;
+    left: 0;
+    right: 0;
+    bottom: 0;
+    background: rgba(0, 0, 0, 0.5);
+    display: flex;
+    align-items: center;
+    justify-content: center;
+    z-index: 1000;
+    backdrop-filter: blur(4px);
+}
+
+.modal-overlay[hidden] {
+    display: none;
+}
+
+.modal {
+    background: var(--bg-secondary);
+    border-radius: var(--radius-lg);
+    padding: 2.5rem 3rem;
+    text-align: center;
+    box-shadow: var(--shadow-xl);
+    animation: modalIn 0.3s ease-out;
+    min-width: 280px;
+}
+
+@keyframes modalIn {
+    from {
+        opacity: 0;
+        transform: scale(0.95) translateY(10px);
+    }
+    to {
+        opacity: 1;
+        transform: scale(1) translateY(0);
+    }
+}
+
+.modal-content {
+    display: flex;
+    flex-direction: column;
+    align-items: center;
+    gap: 1rem;
+}
+
+.progress-info {
+    font-size: 0.875rem;
+    color: var(--text-secondary);
+    font-weight: 500;
+}
+
+.spinner {
+    position: relative;
+    width: 56px;
+    height: 56px;
+}
+
+.spinner-ring {
+    position: absolute;
+    width: 100%;
+    height: 100%;
+    border: 3px solid var(--bg-tertiary);
+    border-top-color: var(--primary);
+    border-radius: 50%;
+    animation: spin 1s linear infinite;
+}
+
+.spinner-path {
+    position: absolute;
+    width: 100%;
+    height: 100%;
+    display: flex;
+    align-items: center;
+    justify-content: center;
+    color: var(--primary);
+    font-size: 1.25rem;
+}
+
+.spinner-path svg {
+    width: 24px;
+    height: 24px;
+    animation: pulse 2s ease-in-out infinite;
+}
+
+@keyframes spin {
+    to {
+        transform: rotate(360deg);
+    }
+}
+
+@keyframes pulse {
+    0%, 100% {
+        opacity: 0.5;
+        transform: scale(0.9);
+    }
+    50% {
+        opacity: 1;
+        transform: scale(1);
+    }
+}
+
+.modal-content h3 {
+    font-size: 1rem;
+    font-weight: 600;
+    color: var(--text-primary);
+}
+
+.modal-content p {
+    color: var(--text-secondary);
+    font-size: 0.875rem;
+}
+
+.modal-settings {
+    max-width: 420px;
+    width: 100%;
+    padding: 0;
+    overflow: hidden;
+    text-align: left;
+}
+
+.modal-header {
+    display: flex;
+    align-items: center;
+    justify-content: space-between;
+    padding: 1rem 1.25rem;
+    border-bottom: 1px solid var(--border);
+}
+
+.modal-header h3 {
+    font-size: 1rem;
+    font-weight: 600;
+    color: var(--text-primary);
+}
+
+.modal-body {
+    padding: 1.25rem;
+}
+
+.form-group {
+    margin-bottom: 1.25rem;
+}
+
+.form-group:last-child {
+    margin-bottom: 0;
+}
+
+.form-group label {
+    display: block;
+    font-size: 0.8125rem;
+    font-weight: 500;
+    color: var(--text-primary);
+    margin-bottom: 0.375rem;
+}
+
+.form-group label .required {
+    color: var(--error);
+}
+
+.form-group input[type="text"],
+.form-group input[type="password"] {
+    width: 100%;
+    padding: 0.625rem 0.875rem;
+    font-size: 0.8125rem;
+    font-family: var(--font-sans);
+    color: var(--text-primary);
+    background: var(--bg-secondary);
+    border: 1px solid var(--border);
+    border-radius: var(--radius);
+    transition: all var(--transition);
+}
+
+.form-group input[type="text"]:focus,
+.form-group input[type="password"]:focus {
+    outline: none;
+    border-color: var(--primary);
+    box-shadow: 0 0 0 3px rgba(99, 102, 241, 0.1);
+}
+
+.form-group input[type="text"]::placeholder,
+.form-group input[type="password"]::placeholder {
+    color: var(--text-muted);
+}
+
+.form-group .form-hint {
+    font-size: 0.6875rem;
+    color: var(--text-muted);
+    margin-top: 0.375rem;
+}
+
+.input-wrapper {
+    position: relative;
+    display: flex;
+    align-items: center;
+}
+
+.input-wrapper input {
+    padding-right: 2.5rem;
+}
+
+.toggle-password {
+    position: absolute;
+    right: 0.5rem;
+    background: transparent;
+    border: none;
+    color: var(--text-secondary);
+    cursor: pointer;
+    padding: 0.25rem;
+    display: flex;
+    align-items: center;
+    justify-content: center;
+}
+
+.toggle-password:hover {
+    color: var(--text-primary);
+}
+
+.form-actions {
+    display: flex;
+    align-items: center;
+    justify-content: flex-end;
+    gap: 0.5rem;
+    margin-top: 1.5rem;
+    padding-top: 1.25rem;
+    border-top: 1px solid var(--border);
+}
+
+.toast {
+    position: fixed;
+    bottom: 1.5rem;
+    left: 50%;
+    transform: translateX(-50%) translateY(100px);
+    background: var(--text-primary);
+    color: white;
+    padding: 0.75rem 1.25rem;
+    border-radius: var(--radius);
+    font-size: 0.8125rem;
+    opacity: 0;
+    transition: all var(--transition);
+    z-index: 2000;
+    box-shadow: var(--shadow-lg);
+}
+
+.toast.show {
+    transform: translateX(-50%) translateY(0);
+    opacity: 1;
+}
+
+.toast.success {
+    background: var(--success);
+}
+
+.toast.error {
+    background: var(--error);
+}
+
+::-webkit-scrollbar {
+    width: 6px;
+    height: 6px;
+}
+
+::-webkit-scrollbar-track {
+    background: var(--bg-tertiary);
+    border-radius: 3px;
+}
+
+::-webkit-scrollbar-thumb {
+    background: var(--text-muted);
+    border-radius: 3px;
+}
+
+::-webkit-scrollbar-thumb:hover {
+    background: var(--text-secondary);
+}
+
+@media (max-width: 900px) {
+    .main-layout {
+        flex-direction: column;
+    }
+
+    .preview-area {
+        width: 100%;
+        max-height: 60vh;
+    }
+
+    .sidebar {
+        width: 100%;
+        max-width: none;
+        max-height: 40vh;
+        border-left: none;
+        border-top: 1px solid var(--border);
+    }
+}
+
+@media (max-width: 768px) {
+    .app-header {
+        padding: 0.75rem 1rem;
+    }
+
+    .header-content {
+        flex: 1 1 100%;
+        flex-direction: row;
+        align-items: center;
+        justify-content: space-between;
+    }
+
+    .preview-header {
+        flex-direction: column;
+        align-items: flex-start;
+        gap: 0.75rem;
+    }
+
+    .preview-actions {
+        width: 100%;
+        justify-content: flex-start;
+        flex-wrap: wrap;
+    }
+
+    .preview-content {
+        padding: 1rem;
+    }
+
+    .modal-settings {
+        margin: 1rem;
+        max-width: none;
+    }
+
+    .form-actions {
+        flex-wrap: wrap;
+    }
+
+    .form-actions .btn-ghost,
+    .form-actions .btn-primary {
+        flex: 1;
+    }
+}
diff --git a/web/static/js/app.js b/web/static/js/app.js
new file mode 100644
index 000000000..8b84385c8
--- /dev/null
+++ b/web/static/js/app.js
@@ -0,0 +1,515 @@
+document.addEventListener('DOMContentLoaded', function() {
+    let convertedDocuments = [];
+    let currentDocIndex = -1;
+    let activeView = 'markdown';
+
+    const elements = {
+        uploadBtn: document.getElementById('uploadBtn'),
+        emptyUploadBtn: document.getElementById('emptyUploadBtn'),
+        fileInput: document.getElementById('fileInput'),
+        settingsBtn: document.getElementById('settingsBtn'),
+        settingsModal: document.getElementById('settingsModal'),
+        closeSettingsBtn: document.getElementById('closeSettingsBtn'),
+        modalOverlay: document.getElementById('modalOverlay'),
+        processingTitle: document.getElementById('processingTitle'),
+        processingFilename: document.getElementById('processingFilename'),
+        progressInfo: document.getElementById('progressInfo'),
+        progressCurrent: document.getElementById('progressCurrent'),
+        progressTotal: document.getElementById('progressTotal'),
+        previewHeader: document.getElementById('previewHeader'),
+        viewToggle: document.getElementById('viewToggle'),
+        emptyState: document.getElementById('emptyState'),
+        resultContent: document.getElementById('resultContent'),
+        previewFileIcon: document.getElementById('previewFileIcon'),
+        previewFilename: document.getElementById('previewFilename'),
+        previewFileType: document.getElementById('previewFileType'),
+        markdownContent: document.getElementById('markdownContent'),
+        previewContent: document.getElementById('previewContent'),
+        markdownView: document.getElementById('markdownView'),
+        previewView: document.getElementById('previewView'),
+        toggleBtns: document.querySelectorAll('.toggle-btn'),
+        copyBtn: document.getElementById('copyBtn'),
+        downloadBtn: document.getElementById('downloadBtn'),
+        sidebarActions: document.getElementById('sidebarActions'),
+        batchDownloadBtn: document.getElementById('batchDownloadBtn'),
+        clearAllBtn: document.getElementById('clearAllBtn'),
+        documentList: document.getElementById('documentList'),
+        emptyList: document.getElementById('emptyList'),
+        llmConfigForm: document.getElementById('llmConfigForm'),
+        apiKeyInput: document.getElementById('apiKeyInput'),
+        baseUrlInput: document.getElementById('baseUrlInput'),
+        modelInput: document.getElementById('modelInput'),
+        toggleApiKey: document.getElementById('toggleApiKey'),
+        clearConfigBtn: document.getElementById('clearConfigBtn'),
+        toast: document.getElementById('toast'),
+        toastMessage: document.getElementById('toastMessage'),
+    };
+
+    const fileIcons = {
+        pdf: '📄',
+        docx: '📝',
+        doc: '📝',
+        pptx: '📊',
+        ppt: '📊',
+        xlsx: '📈',
+        xls: '📈',
+        jpg: '🖼️',
+        jpeg: '🖼️',
+        png: '🖼️',
+        html: '🌐',
+        htm: '🌐',
+        csv: '📋',
+        json: '📋',
+        xml: '📋',
+        epub: '📚',
+        txt: '📄',
+        md: '📝',
+        ipynb: '📓',
+    };
+
+    const fileTypeNames = {
+        pdf: 'PDF Document',
+        docx: 'Word Document',
+        doc: 'Word Document',
+        pptx: 'PowerPoint Presentation',
+        ppt: 'PowerPoint Presentation',
+        xlsx: 'Excel Spreadsheet',
+        xls: 'Excel Spreadsheet',
+        jpg: 'JPEG Image',
+        jpeg: 'JPEG Image',
+        png: 'PNG Image',
+        html: 'HTML File',
+        htm: 'HTML File',
+        csv: 'CSV File',
+        json: 'JSON File',
+        xml: 'XML File',
+        epub: 'EPUB eBook',
+        txt: 'Text File',
+        md: 'Markdown File',
+        ipynb: 'Jupyter Notebook',
+    };
+
+    function showToast(message, type = 'normal') {
+        elements.toastMessage.textContent = message;
+        elements.toast.className = 'toast';
+        if (type === 'success') {
+            elements.toast.classList.add('success');
+        } else if (type === 'error') {
+            elements.toast.classList.add('error');
+        }
+        elements.toast.classList.add('show');
+        
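+        // Cancel any pending hide so an earlier toast's timer cannot dismiss this one early.
+        clearTimeout(showToast.hideTimer);
+        showToast.hideTimer = setTimeout(() => elements.toast.classList.remove('show'), 3000);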
+    }
+
+    function showProcessingModal() {
+        elements.modalOverlay.hidden = false;
+    }
+
+    function hideProcessingModal() {
+        elements.modalOverlay.hidden = true;
+        elements.progressInfo.hidden = true;
+    }
+
+    function getFileIcon(extension) {
+        return fileIcons[extension.toLowerCase()] || '📄';
+    }
+
+    function getFileTypeName(extension) {
+        return fileTypeNames[extension.toLowerCase()] || 'Document';
+    }
+
+    function formatFileSize(bytes) {
+        if (bytes === 0) return '0 Bytes';
+        const k = 1024;
+        const sizes = ['Bytes', 'KB', 'MB', 'GB'];
+        const i = Math.min(Math.floor(Math.log(bytes) / Math.log(k)), sizes.length - 1);
+        return parseFloat((bytes / Math.pow(k, i)).toFixed(2)) + ' ' + sizes[i];
+    }
+
+    function switchView(view) {
+        activeView = view;
+        elements.toggleBtns.forEach(btn => {
+            btn.classList.remove('active');
+            if (btn.dataset.view === view) {
+                btn.classList.add('active');
+            }
+        });
+        
+        if (view === 'markdown') {
+            elements.markdownView.classList.add('active');
+            elements.previewView.classList.remove('active');
+        } else {
+            elements.markdownView.classList.remove('active');
+            elements.previewView.classList.add('active');
+        }
+    }
+
+    function displayDocument(doc) {
+        if (!doc) {
+            elements.previewHeader.hidden = true;
+            elements.viewToggle.hidden = true;
+            elements.emptyState.hidden = false;
+            elements.resultContent.hidden = true;
+            return;
+        }
+
+        elements.previewHeader.hidden = false;
+        elements.viewToggle.hidden = false;
+        elements.emptyState.hidden = true;
+        elements.resultContent.hidden = false;
+
+        const ext = doc.filename.split('.').pop().toLowerCase();
+        
+        elements.previewFileIcon.textContent = getFileIcon(ext);
+        elements.previewFilename.textContent = doc.filename;
+        elements.previewFileType.textContent = getFileTypeName(ext);
+
+        elements.markdownContent.textContent = doc.markdown;
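+        // NOTE: marked does not sanitize its output; sanitize before assigning to innerHTML if inputs are untrusted.
+        const previewHtml = marked.parse(doc.markdown);
+        elements.previewContent.innerHTML = previewHtml;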
+
+        switchView(activeView);
+    }
+
+    function updateDocumentList() {
+        if (convertedDocuments.length === 0) {
+            elements.emptyList.hidden = false;
+            elements.sidebarActions.hidden = true;
+            elements.documentList.innerHTML = '';
+            elements.documentList.appendChild(elements.emptyList);
+            displayDocument(null);
+            return;
+        }
+
+        elements.emptyList.hidden = true;
+        elements.sidebarActions.hidden = false;
+        
+        elements.documentList.innerHTML = '';
+        
+        convertedDocuments.forEach((doc, index) => {
+            const ext = doc.filename.split('.').pop().toLowerCase();
+            const item = document.createElement('div');
+            item.className = `document-item ${doc.status || 'success'} ${index === currentDocIndex ? 'active' : ''}`;
+            item.dataset.index = index;
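+            // Markup below is static; file-derived text is set via textContent so it cannot inject HTML.
+            item.innerHTML = `
+                <span class="document-item-icon">${getFileIcon(ext)}</span>
+                <div class="document-item-details">
+                    <span class="document-item-name"></span>
+                    <span class="document-item-status ${doc.status || 'success'}"></span>
+                </div>
+            `;
+            item.querySelector('.document-item-name').textContent = doc.filename;
+            item.querySelector('.document-item-status').textContent = doc.status === 'error' ? doc.error : '已转换';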
+            
+            item.addEventListener('click', () => {
+                if (doc.status !== 'error') {
+                    selectDocument(index);
+                }
+            });
+            
+            elements.documentList.appendChild(item);
+        });
+    }
+
+    function selectDocument(index) {
+        if (index < 0 || index >= convertedDocuments.length) return;
+        
+        currentDocIndex = index;
+        
+        document.querySelectorAll('.document-item').forEach((item, i) => {
+            if (i === index) {
+                item.classList.add('active');
+            } else {
+                item.classList.remove('active');
+            }
+        });
+        
+        const doc = convertedDocuments[index];
+        if (doc && doc.status !== 'error') {
+            displayDocument(doc);
+        }
+    }
+
+    function openFileDialog() {
+        elements.fileInput.click();
+    }
+
+    async function handleFiles(files) {
+        if (!files || files.length === 0) return;
+        
+        if (files.length === 1) {
+            elements.processingTitle.textContent = '正在处理...';
+            elements.processingFilename.textContent = files[0].name;
+            elements.progressInfo.hidden = true;
+        } else {
+            elements.processingTitle.textContent = '正在批量处理...';
+            elements.processingFilename.textContent = `共 ${files.length} 个文件`;
+            elements.progressInfo.hidden = false;
+            elements.progressCurrent.textContent = '0';
+            elements.progressTotal.textContent = files.length;
+        }
+        
+        showProcessingModal();
+        
+        const formData = new FormData();
+        for (let file of files) {
+            formData.append('files', file);
+        }
+        
+        try {
+            const response = await fetch('/api/convert-batch', {
+                method: 'POST',
+                body: formData,
+                credentials: 'include',
+            });
+            
+            const data = await response.json();
+            hideProcessingModal();
+            
+            if (data.success) {
+                const newResults = data.results.map(r => ({ ...r, status: 'success' }));
+                const newErrors = data.errors.map(e => ({ ...e, status: 'error' }));
+                
+                convertedDocuments = [...newResults, ...newErrors, ...convertedDocuments];
+                updateDocumentList();
+                
+                const firstSuccess = newResults.find(r => r.status === 'success');
+                if (firstSuccess) {
+                    const firstIndex = convertedDocuments.findIndex(d => d.filename === firstSuccess.filename);
+                    selectDocument(firstIndex);
+                }
+                
+                const parts = [];
+                if (data.success_count > 0) parts.push(`成功转换 ${data.success_count} 个文件`);
+                if (data.error_count > 0) parts.push(`${data.error_count} 个文件转换失败`);
+                if (parts.length > 0) {
+                    showToast(parts.join('，'), data.error_count > 0 ? 'error' : 'success');
+                }
+            } else {
+                showToast(data.error || '转换失败', 'error');
+            }
+        } catch (error) {
+            hideProcessingModal();
+            showToast(error.message || '网络错误', 'error');
+        }
+    }
+
+    async function copyToClipboard() {
+        if (currentDocIndex < 0 || !convertedDocuments[currentDocIndex]) {
+            showToast('没有可复制的内容', 'error');
+            return;
+        }
+        
+        const doc = convertedDocuments[currentDocIndex];
+        try {
+            await navigator.clipboard.writeText(doc.markdown);
+            showToast('已复制到剪贴板', 'success');
+        } catch (error) {
+            showToast('复制失败', 'error');
+        }
+    }
+
+    function downloadCurrentDocument() {
+        if (currentDocIndex < 0 || !convertedDocuments[currentDocIndex]) {
+            showToast('没有可下载的内容', 'error');
+            return;
+        }
+        
+        const doc = convertedDocuments[currentDocIndex];
+        const blob = new Blob([doc.markdown], { type: 'text/markdown' });
+        const url = URL.createObjectURL(blob);
+        const a = document.createElement('a');
+        a.href = url;
+        a.download = doc.filename.replace(/\.[^/.]+$/, '') + '.md';
+        document.body.appendChild(a);
+        a.click();
+        document.body.removeChild(a);
+        URL.revokeObjectURL(url);
+        showToast('下载已开始', 'success');
+    }
+
+    async function downloadAllDocuments() {
+        const successDocs = convertedDocuments.filter(d => d.status === 'success');
+        if (successDocs.length === 0) {
+            showToast('没有可下载的文档', 'error');
+            return;
+        }
+        
+        try {
+            const response = await fetch('/api/download-batch', {
+                method: 'GET',
+                credentials: 'include',
+            });
+            
+            if (response.ok) {
+                const blob = await response.blob();
+                const url = URL.createObjectURL(blob);
+                const a = document.createElement('a');
+                a.href = url;
+                a.download = 'converted_files.zip';
+                document.body.appendChild(a);
+                a.click();
+                document.body.removeChild(a);
+                URL.revokeObjectURL(url);
+                showToast('下载已开始', 'success');
+            } else {
+                const data = await response.json();
+                showToast(data.error || '下载失败', 'error');
+            }
+        } catch (error) {
+            showToast(error.message || '网络错误', 'error');
+        }
+    }
+
+    function clearAllDocuments() {
+        convertedDocuments = [];
+        currentDocIndex = -1;
+        updateDocumentList();
+        displayDocument(null);
+        showToast('已清空列表', 'success');
+    }
+
+    function openSettingsModal() {
+        loadLLMConfig();
+        elements.settingsModal.hidden = false;
+    }
+
+    function closeSettingsModal() {
+        elements.settingsModal.hidden = true;
+    }
+
+    async function loadLLMConfig() {
+        try {
+            const response = await fetch('/api/llm-config', {
+                method: 'GET',
+                credentials: 'include',
+            });
+            
+            const data = await response.json();
+            
+            if (data.has_config) {
+                elements.baseUrlInput.value = data.base_url || '';
+                elements.modelInput.value = data.model || 'gpt-4o';
+            }
+        } catch (error) {
+            console.error('Failed to load LLM config:', error);
+        }
+    }
+
+    async function saveLLMConfig(e) {
+        e.preventDefault();
+        
+        const apiKey = elements.apiKeyInput.value.trim();
+        if (!apiKey) {
+            showToast('请输入 API Key', 'error');
+            return;
+        }
+        
+        const config = {
+            api_key: apiKey,
+            base_url: elements.baseUrlInput.value.trim(),
+            model: elements.modelInput.value.trim() || 'gpt-4o',
+        };
+        
+        try {
+            const response = await fetch('/api/llm-config', {
+                method: 'POST',
+                headers: {
+                    'Content-Type': 'application/json',
+                },
+                body: JSON.stringify(config),
+                credentials: 'include',
+            });
+            
+            const data = await response.json();
+            
+            if (data.success) {
+                showToast('配置已保存', 'success');
+                closeSettingsModal();
+            } else {
+                showToast(data.error || '保存失败', 'error');
+            }
+        } catch (error) {
+            showToast(error.message || '网络错误', 'error');
+        }
+    }
+
+    async function clearLLMConfig() {
+        try {
+            const response = await fetch('/api/llm-config', {
+                method: 'DELETE',
+                credentials: 'include',
+            });
+            
+            const data = await response.json();
+            
+            if (data.success) {
+                elements.apiKeyInput.value = '';
+                elements.baseUrlInput.value = '';
+                elements.modelInput.value = 'gpt-4o';
+                showToast('配置已清除', 'success');
+            }
+        } catch (error) {
+            showToast(error.message || '网络错误', 'error');
+        }
+    }
+
+    function toggleApiKeyVisibility() {
+        const isPassword = elements.apiKeyInput.type === 'password';
+        elements.apiKeyInput.type = isPassword ? 'text' : 'password';
+        
+        const showIcon = elements.toggleApiKey.querySelector('.icon-show');
+        const hideIcon = elements.toggleApiKey.querySelector('.icon-hide');
+        
+        if (isPassword) {
+            showIcon.style.display = 'none';
+            hideIcon.style.display = 'block';
+        } else {
+            showIcon.style.display = 'block';
+            hideIcon.style.display = 'none';
+        }
+    }
+
+    elements.uploadBtn.addEventListener('click', openFileDialog);
+    elements.emptyUploadBtn.addEventListener('click', openFileDialog);
+    elements.fileInput.addEventListener('change', (e) => {
+        handleFiles(e.target.files);
+        elements.fileInput.value = '';
+    });
+
+    elements.settingsBtn.addEventListener('click', openSettingsModal);
+    elements.closeSettingsBtn.addEventListener('click', closeSettingsModal);
+
+    elements.toggleBtns.forEach(btn => {
+        btn.addEventListener('click', () => {
+            switchView(btn.dataset.view);
+        });
+    });
+
+    elements.copyBtn.addEventListener('click', copyToClipboard);
+    elements.downloadBtn.addEventListener('click', downloadCurrentDocument);
+    elements.batchDownloadBtn.addEventListener('click', downloadAllDocuments);
+    elements.clearAllBtn.addEventListener('click', clearAllDocuments);
+
+    elements.llmConfigForm.addEventListener('submit', saveLLMConfig);
+    elements.toggleApiKey.addEventListener('click', toggleApiKeyVisibility);
+    elements.clearConfigBtn.addEventListener('click', clearLLMConfig);
+
+    elements.settingsModal.addEventListener('click', (e) => {
+        if (e.target === elements.settingsModal) {
+            closeSettingsModal();
+        }
+    });
+
+    marked.setOptions({
+        breaks: true,
+        gfm: true,
+    });
+
+    displayDocument(null);
+});
diff --git a/web/templates/index.html b/web/templates/index.html
new file mode 100644
index 000000000..f2cfd5c13
--- /dev/null
+++ b/web/templates/index.html
@@ -0,0 +1,221 @@
+<!DOCTYPE html>
+<html lang="zh-CN">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>MarkItDown - 文件转 Markdown</title>
+    <link rel="preconnect" href="https://fonts.googleapis.com">
+    <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
+    <link href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&family=Fira+Code:wght@400;500&display=swap" rel="stylesheet">
+    <link rel="stylesheet" href="{{ url_for('static', filename='css/style.css') }}">
+</head>
+<body>
+    <div class="app-container">
+        <header class="app-header">
+            <div class="header-left">
+                <div class="logo">
+                    <svg class="logo-icon" viewBox="0 0 32 32" fill="none" xmlns="http://www.w3.org/2000/svg">
+                        <rect width="32" height="32" rx="8" fill="#6366F1"/>
+                        <path d="M8 8H24V24H8V8Z" fill="#818CF8"/>
+                        <path d="M10 11L14 17L18 13L22 20" stroke="#EEF2FF" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"/>
+                        <path d="M10 21H22" stroke="#EEF2FF" stroke-width="2" stroke-linecap="round"/>
+                    </svg>
+                    <h1>MarkItDown</h1>
+                </div>
+            </div>
+            <div class="header-right">
+                <button class="btn-primary" id="uploadBtn">
+                    <svg width="18" height="18" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">
+                        <path d="M12 5V19M5 12H19" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"/>
+                    </svg>
+                    上传文件
+                </button>
+                <button class="btn-icon" id="settingsBtn" title="设置">
+                    <svg width="18" height="18" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">
+                        <path d="M12 15C13.6569 15 15 13.6569 15 12C15 10.3431 13.6569 9 12 9C10.3431 9 9 10.3431 9 12C9 13.6569 10.3431 15 12 15Z" stroke="currentColor" stroke-width="2" stroke-linejoin="round"/>
+                        <path d="M19.4 15C19.1 14.2 18.8 13.2 18.9 12.3C18.9 11.9 19 11.5 19 11.1C19.5 9.9 19.4 8.5 18.8 7.4C18.7 7.1 18.5 6.8 18.3 6.6L16.3 4.6C16 4.3 15.7 4.1 15.3 4.1C14.3 4.2 13.5 4.3 12.7 4.5C11.9 4.7 11.1 4.9 10.4 5.3C10.1 5.5 9.8 5.6 9.4 5.6C9 5.6 8.7 5.5 8.4 5.3L6.4 3.3C6.2 3.1 6 2.9 5.7 2.9C5.3 2.9 4.9 3.1 4.6 3.4L3.5 4.5C3.2 4.8 3 5.2 2.9 5.6C2.7 6.4 2.5 7.2 2.4 8.1C2.2 8.9 2.1 9.7 2.1 10.6C2.1 11 2.1 11.4 2.2 11.8C2.4 12.6 2.7 13.5 3.1 14.3C3.3 14.7 3.5 15 3.8 15.2L5.8 17.2C6.1 17.5 6.4 17.7 6.8 17.7C7.2 17.7 7.5 17.6 7.8 17.4C8.6 17.6 9.4 17.8 10.2 17.9C11 18.1 11.8 18.1 12.7 18.1C13.5 18.1 14.3 18 15.1 17.8C15.9 17.6 16.7 17.4 17.4 17C17.8 16.8 18.1 16.5 18.3 16.2L20.3 14.2C20.6 13.9 20.8 13.5 20.8 13.1C20.8 12.7 20.7 12.4 20.5 12.1L19.4 15Z" stroke="currentColor" stroke-width="2" stroke-linejoin="round"/>
+                    </svg>
+                </button>
+            </div>
+        </header>
+
+        <main class="main-layout">
+            <section class="preview-area">
+                <div class="preview-header" id="previewHeader" hidden>
+                    <div class="preview-file-info">
+                        <span class="preview-file-icon" id="previewFileIcon">📄</span>
+                        <div class="preview-file-details">
+                            <h3 id="previewFilename">document.pdf</h3>
+                            <span class="preview-file-type" id="previewFileType">PDF Document</span>
+                        </div>
+                    </div>
+                    <div class="preview-actions">
+                        <button class="btn-ghost" id="copyBtn" title="复制到剪贴板">
+                            <svg width="16" height="16" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">
+                                <path d="M8 8V16C8 17.1046 8.89543 18 10 18H18C19.1046 18 20 17.1046 20 16V8C20 6.89543 19.1046 6 18 6H10C8.89543 6 8 6.89543 8 8Z" stroke="currentColor" stroke-width="2"/>
+                                <path d="M6 10H5C4.44772 10 4 10.4477 4 11V18C4 19.1046 4.89543 20 6 20H13C13.5523 20 14 19.5523 14 19V18" stroke="currentColor" stroke-width="2"/>
+                            </svg>
+                            复制
+                        </button>
+                        <button class="btn-primary" id="downloadBtn" title="下载 Markdown">
+                            <svg width="16" height="16" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">
+                                <path d="M12 3V15M12 15L8 11M12 15L16 11" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"/>
+                                <path d="M5 21H19" stroke="currentColor" stroke-width="2" stroke-linecap="round"/>
+                            </svg>
+                            下载
+                        </button>
+                    </div>
+                </div>
+                
+                <div class="view-toggle" id="viewToggle" hidden>
+                    <button class="toggle-btn active" data-view="markdown">
+                        <svg width="14" height="14" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">
+                            <path d="M3 5C3 3.89543 3.89543 3 5 3H19C20.1046 3 21 3.89543 21 5V19C21 20.1046 20.1046 21 19 21H5C3.89543 21 3 20.1046 3 19V5Z" stroke="currentColor" stroke-width="2"/>
+                            <path d="M7 17V13L10 16L13 13V17" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"/>
+                            <path d="M17 13L14 17H20L17 13Z" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"/>
+                        </svg>
+                        Markdown
+                    </button>
+                    <button class="toggle-btn" data-view="preview">
+                        <svg width="14" height="14" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">
+                            <path d="M1 12C1 12 5 4 12 4C19 4 23 12 23 12C23 12 19 20 12 20C5 20 1 12 1 12Z" stroke="currentColor" stroke-width="2"/>
+                            <circle cx="12" cy="12" r="3" stroke="currentColor" stroke-width="2"/>
+                        </svg>
+                        预览
+                    </button>
+                </div>
+
+                <div class="preview-content-area">
+                    <div class="empty-state" id="emptyState">
+                        <div class="empty-icon">
+                            <svg viewBox="0 0 64 64" fill="none" xmlns="http://www.w3.org/2000/svg">
+                                <path d="M32 8L40 24H56L44 34L48 50L32 40L16 50L20 34L8 24H24L32 8Z" fill="#E0E7FF"/>
+                                <path d="M32 8L40 24H56L44 34L48 50L32 40L16 50L20 34L8 24H24L32 8Z" stroke="#6366F1" stroke-width="2" stroke-linejoin="round"/>
+                            </svg>
+                        </div>
+                        <h3>暂无转换的文档</h3>
+                        <p>点击顶部的「上传文件」按钮开始转换</p>
+                        <button class="btn-primary" id="emptyUploadBtn">
+                            <svg width="18" height="18" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">
+                                <path d="M12 5V19M5 12H19" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"/>
+                            </svg>
+                            上传文件
+                        </button>
+                    </div>
+
+                    <div class="result-content" id="resultContent" hidden>
+                        <div class="markdown-view active" id="markdownView">
+                            <pre><code id="markdownContent"></code></pre>
+                        </div>
+                        <div class="preview-view" id="previewView">
+                            <div class="preview-content" id="previewContent"></div>
+                        </div>
+                    </div>
+                </div>
+            </section>
+
+            <section class="sidebar">
+                <div class="sidebar-header">
+                    <h3>转换结果</h3>
+                    <div class="sidebar-actions" id="sidebarActions" hidden>
+                        <button class="btn-icon" id="batchDownloadBtn" title="下载全部">
+                            <svg width="16" height="16" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">
+                                <path d="M12 3V15M12 15L8 11M12 15L16 11" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"/>
+                                <path d="M5 21H19" stroke="currentColor" stroke-width="2" stroke-linecap="round"/>
+                                <path d="M20 17V13C20 11.8954 19.1046 11 18 11H16V7C16 5.89543 15.1046 5 14 5H10C8.89543 5 8 5.89543 8 7V11H6C4.89543 11 4 11.8954 4 13V17" stroke="currentColor" stroke-width="2" stroke-linecap="round"/>
+                            </svg>
+                        </button>
+                        <button class="btn-icon" id="clearAllBtn" title="清空列表">
+                            <svg width="16" height="16" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">
+                                <path d="M19 7L18.13 19.142C18.0615 20.0686 17.2874 20.7991 16.36 20.7991H7.64C6.71258 20.7991 5.93845 20.0686 5.87 19.142L5 7M10 11V17M14 11V17M21 7H3M17 7L15.708 4.416C15.5187 4.03543 15.1335 3.79907 14.72 3.79907H9.28C8.86647 3.79907 8.48133 4.03543 8.292 4.416L7 7" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"/>
+                            </svg>
+                        </button>
+                    </div>
+                </div>
+                <div class="document-list" id="documentList">
+                    <div class="empty-list" id="emptyList">
+                        <p>暂无转换记录</p>
+                    </div>
+                </div>
+            </section>
+        </main>
+    </div>
+
+    <input type="file" id="fileInput" accept=".pdf,.docx,.doc,.pptx,.ppt,.xlsx,.xls,.jpg,.jpeg,.png,.html,.htm,.csv,.json,.xml,.epub,.txt,.md,.ipynb" multiple hidden>
+
+    <div class="modal-overlay" id="modalOverlay" hidden>
+        <div class="modal" id="processingModal">
+            <div class="modal-content">
+                <div class="spinner">
+                    <div class="spinner-ring"></div>
+                    <div class="spinner-path">
+                        <svg viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">
+                            <path d="M12 3L15.09 9.26L22 9.27L16.54 13.37L18.18 20.02L12 16.77L5.82 20.02L7.46 13.37L2 9.27L8.91 9.26L12 3Z" fill="currentColor"/>
+                        </svg>
+                    </div>
+                </div>
+                <h3 id="processingTitle">正在处理...</h3>
+                <p id="processingFilename">准备中</p>
+                <div class="progress-info" id="progressInfo" hidden>
+                    <span id="progressCurrent">0</span> / <span id="progressTotal">0</span>
+                </div>
+            </div>
+        </div>
+    </div>
+
+    <div class="modal-overlay" id="settingsModal" hidden>
+        <div class="modal modal-settings">
+            <div class="modal-header">
+                <h3>大模型 API 配置</h3>
+                <button class="btn-icon" id="closeSettingsBtn">
+                    <svg width="18" height="18" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">
+                        <path d="M18 6L6 18M6 6L18 18" stroke="currentColor" stroke-width="2" stroke-linecap="round"/>
+                    </svg>
+                </button>
+            </div>
+            <div class="modal-body">
+                <form id="llmConfigForm">
+                    <div class="form-group">
+                        <label for="apiKeyInput">API Key <span class="required">*</span></label>
+                        <div class="input-wrapper">
+                            <input type="password" id="apiKeyInput" placeholder="sk-..." autocomplete="off">
+                            <button type="button" class="toggle-password" id="toggleApiKey">
+                                <svg class="icon-show" width="16" height="16" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">
+                                    <path d="M1 12C1 12 5 4 12 4C19 4 23 12 23 12C23 12 19 20 12 20C5 20 1 12 1 12Z" stroke="currentColor" stroke-width="2"/>
+                                    <circle cx="12" cy="12" r="3" stroke="currentColor" stroke-width="2"/>
+                                </svg>
+                                <svg class="icon-hide" width="16" height="16" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg" style="display:none">
+                                    <path d="M17.94 17.94C16.23 19.24 14.2 20 12 20C5 20 1 12 1 12C2.24 9.68 4 7.76 6.06 6.06M1 1L23 23M15.5 11.25C15.55 11.5 15.58 11.75 15.58 12C15.58 14 13.97 15.58 12 15.58C11.75 15.58 11.5 15.55 11.25 15.5M8.42 8.42C7.77 8.95 7.29 9.66 7.03 10.47C6.78 11.28 6.76 12.13 6.99 12.95C7.22 13.77 7.7 14.53 8.38 15.13C9.06 15.73 9.92 16.14 10.82 16.27C11.72 16.4 12.65 16.25 13.5 15.82" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"/>
+                                    <path d="M12 7C14.5 7 16.73 8.24 18.29 10.23C19.85 12.22 20.54 14.59 20.23 16.9" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"/>
+                                </svg>
+                            </button>
+                        </div>
+                        <p class="form-hint">用于图片描述和 OCR 功能的大模型 API Key</p>
+                    </div>
+                    <div class="form-group">
+                        <label for="baseUrlInput">API Base URL</label>
+                        <input type="text" id="baseUrlInput" placeholder="https://api.openai.com/v1">
+                        <p class="form-hint">可选，用于兼容其他 OpenAI 兼容的 API 服务</p>
+                    </div>
+                    <div class="form-group">
+                        <label for="modelInput">Model</label>
+                        <input type="text" id="modelInput" placeholder="gpt-4o" value="gpt-4o">
+                        <p class="form-hint">要使用的模型名称</p>
+                    </div>
+                    <div class="form-actions">
+                        <button type="button" class="btn-ghost" id="clearConfigBtn">清除配置</button>
+                        <button type="submit" class="btn-primary">保存配置</button>
+                    </div>
+                </form>
+            </div>
+        </div>
+    </div>
+
+    <div class="toast" id="toast">
+        <span id="toastMessage"></span>
+    </div>
+
+    <script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>
+    <script src="{{ url_for('static', filename='js/app.js') }}"></script>
+</body>
+</html>
diff --git "a/\345\210\206\346\236\220\346\212\245\345\221\212.md" "b/\345\210\206\346\236\220\346\212\245\345\221\212.md"
new file mode 100644
index 000000000..96f55ba28
--- /dev/null
+++ "b/\345\210\206\346\236\220\346\212\245\345\221\212.md"
@@ -0,0 +1,820 @@
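+# MarkItDown Project Analysis Report
+
+## 1. Project Overview
+
+### 1.1 Positioning
+MarkItDown is a lightweight, open-source Python tool from Microsoft that converts a variety of file formats to Markdown, primarily for use with large language models (LLMs) and text-analysis pipelines.
+
+### 1.2 Value
+- **Structure preservation**: Compared with tools such as `textract`, MarkItDown focuses on preserving document structure (headings, lists, tables, links, and so on).
+- **LLM friendly**: Markdown is close to plain text, so markup overhead is minimal while the important document structure survives.
+- **Token efficient**: Markdown conventions add few tokens, and LLMs such as GPT-4o "understand" Markdown natively, which suggests their training data contains a large amount of Markdown-formatted text.
+
+### 1.3 Supported File Formats
+- **Documents**: PDF, Word (DOCX), Excel (XLSX/XLS), PowerPoint (PPTX), EPUB
+- **Web**: HTML, Wikipedia, YouTube, Bing search results, RSS
+- **Media**: images (EXIF metadata + OCR), audio (EXIF metadata + speech transcription)
+- **Other**: ZIP archives, CSV, JSON, XML, Jupyter Notebook, Outlook messages
+- **Cloud services**: Azure Document Intelligence integration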
+
+---
+
+## 2. System Architecture and Implementation
+
+### 2.1 Core Architecture
+
+#### 2.1.1 Core Class Hierarchy
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│                    MarkItDown (主入口类)                      │
+├─────────────────────────────────────────────────────────────┤
+│  - _converters: List[ConverterRegistration]  # 转换器注册列表  │
+│  - _requests_session: Session                 # HTTP会话       │
+│  - _magika: Magika                            # 文件类型检测    │
+│  - _llm_client / _llm_model                   # LLM配置        │
+│  - _plugins_enabled: bool                      # 插件启用状态    │
+├─────────────────────────────────────────────────────────────┤
+│  核心方法：                                                    │
+│  - convert()                 # 统一转换入口                    │
+│  - convert_local()           # 本地文件转换                    │
+│  - convert_stream()          # 流转换                          │
+│  - convert_uri()             # URI转换                         │
+│  - convert_response()        # HTTP响应转换                    │
+│  - register_converter()      # 注册转换器                      │
+│  - enable_plugins()          # 启用插件                        │
+└─────────────────────────────────────────────────────────────┘
+                              │
+                              ▼
+┌─────────────────────────────────────────────────────────────┐
+│              DocumentConverter (转换器基类)                    │
+├─────────────────────────────────────────────────────────────┤
+│  核心方法：                                                    │
+│  - accepts()                 # 判断是否接受该文件               │
+│  - convert()                 # 执行转换                        │
+└─────────────────────────────────────────────────────────────┘
+                              │
+          ┌───────────────────┼───────────────────┐
+          ▼                   ▼                   ▼
+┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
+│ 内置转换器       │  │ 第三方插件转换器  │  │ DocumentIntel.. │
+│ (20+种格式)     │  │  (动态加载)      │  │ 云端转换器      │
+└─────────────────┘  └─────────────────┘  └─────────────────┘
+```
+
+#### 2.1.2 Data-Flow Architecture
+
+```
+输入源
+  │
+  ▼
+┌──────────────────────────────────────────────────────────┐
+│                    输入类型分发层                          │
+│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐│
+│  │本地路径  │  │  URL/URI │  │ 流对象   │  │ Response ││
+│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘│
+│       │             │             │             │        │
+│       ▼             ▼             ▼             ▼        │
+│  ┌─────────────────────────────────────────────────────┐ │
+│  │              统一转换为 StreamInfo 元数据            │ │
+│  │  - mimetype: 媒体类型                                │ │
+│  │  - extension: 文件扩展名                             │ │
+│  │  - charset: 字符编码                                 │ │
+│  │  - filename: 文件名                                  │ │
+│  │  - url/ local_path: 来源信息                         │ │
+│  └─────────────────────────────────────────────────────┘ │
+└──────────────────────────────────────────────────────────┘
+                              │
+                              ▼
+┌──────────────────────────────────────────────────────────┐
+│                    文件类型识别层                          │
+│  ┌─────────────────────────────────────────────────────┐ │
+│  │  1. 扩展名识别 → mimetypes.guess_type()             │ │
+│  │  2. MIME类型识别 → 扩展名反向映射                     │ │
+│  │  3. 内容识别 → Magika 库 (基于ML的文件类型检测)      │ │
+│  │  4. 编码识别 → charset_normalizer                    │ │
+│  └─────────────────────────────────────────────────────┘ │
+│                           │                               │
+│                           ▼                               │
+│  ┌─────────────────────────────────────────────────────┐ │
+│  │              生成 StreamInfo 猜测列表                 │ │
+│  │  - 兼容模式：合并多种识别结果                          │ │
+│  │  - 冲突模式：分别尝试各种可能的类型                    │ │
+│  └─────────────────────────────────────────────────────┘ │
+└──────────────────────────────────────────────────────────┘
+                              │
+                              ▼
+┌──────────────────────────────────────────────────────────┐
+│                    转换器选择与执行层                      │
+│  ┌─────────────────────────────────────────────────────┐ │
+│  │              转换器按优先级排序 (升序)                 │ │
+│  │  - PRIORITY_SPECIFIC_FILE_FORMAT = 0.0  (高优先级) │ │
+│  │  - PRIORITY_GENERIC_FILE_FORMAT = 10.0 (低优先级)  │ │
+│  │  - 插件转换器可自定义优先级 (-1.0 可覆盖内置)         │ │
+│  └─────────────────────────────────────────────────────┘ │
+│                           │                               │
+│                           ▼                               │
+│  ┌─────────────────────────────────────────────────────┐ │
+│  │              转换器执行流程                           │ │
+│  │  for stream_info in 猜测列表:                        │ │
+│  │    for converter in 已排序转换器:                     │ │
+│  │      1. converter.accepts(stream, info) → bool      │ │
+│  │      2. if True: converter.convert(stream, info)    │ │
+│  │      3. 成功则返回结果，失败则继续尝试                │ │
+│  └─────────────────────────────────────────────────────┘ │
+└──────────────────────────────────────────────────────────┘
+                              │
+                              ▼
+┌──────────────────────────────────────────────────────────┐
+│                    结果处理与规范化层                      │
+│  ┌─────────────────────────────────────────────────────┐ │
+│  │  DocumentConverterResult:                             │ │
+│  │  - markdown: 转换后的Markdown内容                     │ │
+│  │  - title: 可选的文档标题                              │ │
+│  │  - text_content: markdown的软别名 (已废弃)           │ │
+│  └─────────────────────────────────────────────────────┘ │
+│                           │                               │
+│                           ▼                               │
+│  ┌─────────────────────────────────────────────────────┐ │
+│  │  内容规范化：                                          │ │
+│  │  1. 行尾空格去除                                      │ │
+│  │  2. 换行符统一为 \n                                   │ │
+│  │  3. 3个以上连续空行压缩为2个                          │ │
+│  └─────────────────────────────────────────────────────┘ │
+└──────────────────────────────────────────────────────────┘
+```
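The three normalization rules in the last box of the diagram can be sketched with the standard `re` module. This is a minimal sketch, not the project's code: `normalize_markdown` is a hypothetical helper name, and rule 3 is read here as "runs of three or more newline characters collapse to two", which is an assumption about the intended behavior.

```python
import re


def normalize_markdown(text: str) -> str:
    """Apply the three normalization rules (hypothetical helper)."""
    # Rules 1 and 2: split on \r\n or \n, strip trailing whitespace
    # from every line, then rejoin with a plain \n.
    lines = [line.rstrip() for line in re.split(r"\r?\n", text)]
    text = "\n".join(lines)
    # Rule 3 (assumed reading): collapse three or more consecutive
    # newline characters into exactly two.
    return re.sub(r"\n{3,}", "\n\n", text)


if __name__ == "__main__":
    print(repr(normalize_markdown("# Title  \r\n\r\n\r\n\r\nBody\t\n")))
```

Splitting first and joining afterwards handles the line-ending and trailing-whitespace rules in a single pass, so the final substitution only has to deal with `\n`.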
+
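+### 2.2 How the Modules Cooperate
+
+#### 2.2.1 Converter Registration and Priority System
+
+**Core design ideas**:
+- Combines the **chain of responsibility** and **strategy** patterns
+- Converters are sorted by priority value; a smaller value means a higher priority
+- A newly registered converter is inserted at the head of the list, so among converters with equal priority the most recently registered one is tried first
+
+**Key code locations**: `_markitdown.py:49-60, 641-671`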
+
+```python
+# Priority definitions
+PRIORITY_SPECIFIC_FILE_FORMAT = 0.0    # converters for one specific format
+PRIORITY_GENERIC_FILE_FORMAT = 10.0    # catch-all converters for generic formats
+
+# Registration (inserts at the head of the list)
+def register_converter(self, converter, priority=PRIORITY_SPECIFIC_FILE_FORMAT):
+    self._converters.insert(0, ConverterRegistration(
+        converter=converter,
+        priority=priority
+    ))
+```
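The tie-breaking rule (equal priority, most recent registration first) follows from combining head insertion with a stable sort. The sketch below demonstrates it with stand-in classes: `Registration` and `Registry` are hypothetical names used only for illustration, and plain strings stand in for converter objects.

```python
from dataclasses import dataclass
from typing import Any, List

PRIORITY_SPECIFIC_FILE_FORMAT = 0.0
PRIORITY_GENERIC_FILE_FORMAT = 10.0


@dataclass
class Registration:  # hypothetical stand-in for ConverterRegistration
    converter: Any
    priority: float


class Registry:  # hypothetical stand-in for the registry part of MarkItDown
    def __init__(self) -> None:
        self._converters: List[Registration] = []

    def register(self, converter: Any,
                 priority: float = PRIORITY_SPECIFIC_FILE_FORMAT) -> None:
        # Head insertion: newer registrations precede older ones in the list.
        self._converters.insert(0, Registration(converter, priority))

    def ordered(self) -> List[Any]:
        # sorted() is stable, so entries with equal priority keep their
        # list order, which puts the most recent registration first.
        ranked = sorted(self._converters, key=lambda r: r.priority)
        return [r.converter for r in ranked]


registry = Registry()
registry.register("plain_text", PRIORITY_GENERIC_FILE_FORMAT)
registry.register("docx")
registry.register("pdf")
registry.register("ocr_plugin", priority=-1.0)
print(registry.ordered())
```

Here `pdf` is tried before `docx` because it was registered later at the same priority, and `ocr_plugin` precedes both because its priority value is smaller.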
+
+**Built-in converter registration order and priorities** (`_markitdown.py:178-206`):
+
+| Priority | Converter | Notes |
+|----------|-----------|-------|
+| 10.0 | PlainTextConverter | Plain-text fallback converter |
+| 10.0 | ZipConverter | ZIP archive handling |
+| 10.0 | HtmlConverter | HTML conversion |
+| 0.0 | RssConverter | RSS feeds |
+| 0.0 | WikipediaConverter | Wikipedia-specific handling |
+| 0.0 | YouTubeConverter | YouTube transcript extraction |
+| 0.0 | BingSerpConverter | Bing search results |
+| 0.0 | DocxConverter | Word documents |
+| 0.0 | XlsxConverter | Excel workbooks |
+| 0.0 | XlsConverter | Legacy Excel workbooks |
+| 0.0 | PptxConverter | PowerPoint |
+| 0.0 | AudioConverter | Audio processing |
+| 0.0 | ImageConverter | Image processing |
+| 0.0 | IpynbConverter | Jupyter Notebook |
+| 0.0 | PdfConverter | PDF processing |
+| 0.0 | OutlookMsgConverter | Outlook messages |
+| 0.0 | EpubConverter | EPUB e-books |
+| 0.0 | CsvConverter | CSV files |
+| Dynamic | DocumentIntelligenceConverter | Cloud OCR (registered only when an endpoint is configured) |
+
+#### 2.2.2 File-Type Detection Flow
+
+**Layered detection** (`_markitdown.py:673-773`):
+
+```
+输入流
+  │
+  ▼
+┌─────────────────┐
+│ 1. 扩展名检测   │
+│ mimetypes模块   │
+└────────┬────────┘
+         │ 成功则使用
+         ▼
+┌─────────────────┐     失败
+│ 2. MIME反向映射 │ ──────────┐
+│ mimetypes模块   │           │
+└────────┬────────┘           │
+         │ 成功则使用          │
+         ▼                    │
+┌─────────────────┐           │
+│ 3. 内容检测     │           │
+│ Magika (ML)    │◄──────────┘
+└────────┬────────┘
+         │
+         ▼
+┌─────────────────┐
+│ 4. 编码检测     │
+│ charset_normalizer│
+└────────┬────────┘
+         │
+         ▼
+   生成猜测列表
+```
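Steps 1 and 2 of the flow rely on the standard `mimetypes` module in both directions. The sketch below shows one way to derive whichever hint is missing from the other; `fill_missing_hint` is a hypothetical helper, not a function from the project, and it deliberately leaves out the Magika and charset steps.

```python
import mimetypes
from typing import Optional, Tuple


def fill_missing_hint(
    extension: Optional[str], mimetype: Optional[str]
) -> Tuple[Optional[str], Optional[str]]:
    """Derive whichever of (extension, mimetype) is missing from the other."""
    if mimetype is None and extension is not None:
        # guess_type() expects a file name or URL, so add a placeholder stem.
        mimetype, _ = mimetypes.guess_type("placeholder" + extension, strict=False)
    if extension is None and mimetype is not None:
        # guess_extension() returns one known extension for the type, or None.
        extension = mimetypes.guess_extension(mimetype, strict=False)
    return extension, mimetype


print(fill_missing_hint(".pdf", None))
print(fill_missing_hint(None, "application/pdf"))
print(fill_missing_hint(None, None))
```

When both hints are absent the function returns `(None, None)`, which is the case where content-based detection has to take over.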
+
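+**What Magika brings**:
+- File-type detection based on a machine-learning model that inspects the content itself
+- It still works when the extension and MIME type are missing or misleading, which extension-based lookup cannot handle
+- Recognizes more than 100 file types
+- Ships as an ordinary Python dependency of the core package (`magika~=0.6.1`); no external service is required
+
+#### 2.2.3 Plugin System Architecture
+
+**Plugin loading** (`_markitdown.py:65-83, 232-250`):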
+
+```python
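+from importlib.metadata import entry_points
+from warnings import warn
+
+# Plugin entry-point declaration (setup.py / pyproject.toml).
+# The entry point names the plugin module, which exposes register_converters():
+# entry_points = {
+#     'markitdown.plugin': ['plugin_name = module']
+# }
+
+_plugins = None  # populated on first use
+
+def _load_plugins():
+    """Lazily load plugins discovered through entry_points."""
+    global _plugins
+    if _plugins is not None:
+        return _plugins
+
+    _plugins = []
+    for entry_point in entry_points(group="markitdown.plugin"):
+        try:
+            _plugins.append(entry_point.load())
+        except Exception:
+            warn(f"Plugin '{entry_point.name}' failed to load ... skipping")
+    return _plugins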
+```
+
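+**Plugin interface** (see `markitdown-sample-plugin`):
+
+```python
+__plugin_interface_version__ = 1  # interface version
+
+def register_converters(markitdown: MarkItDown, **kwargs):
+    """
+    Every plugin must implement this function.
+    It calls markitdown.register_converter() to register custom converters.
+    """
+    markitdown.register_converter(MyCustomConverter())
+```
+
+**Plugin priority strategy** (`markitdown-ocr` example):
+- The plugin registers its converters with `priority=-1.0`
+- That value is lower than the built-in converters' `0.0`, so the plugin's converters are tried first
+- If they fail, conversion falls back to the built-in converters automatically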
+
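+#### 2.2.4 LLM Integration
+
+**Where the LLM is used**:
+1. Generating image descriptions
+2. OCR text extraction (the markitdown-ocr plugin)
+3. Possibly more advanced document understanding in the future
+
+**Configuration** (`_markitdown.py:97-138`):
+
+```python
+from markitdown import MarkItDown
+from openai import OpenAI
+
+md = MarkItDown(
+    llm_client=OpenAI(),           # OpenAI-compatible client
+    llm_model="gpt-4o",            # model name
+    llm_prompt="Custom prompt"     # optional custom prompt
+)
+```
+
+**How the LLM settings reach the converters** (`_markitdown.py:565-579`):
+
+```python
+# Passed automatically to every converter at conversion time
+_kwargs = dict(kwargs)  # start from the options the caller supplied
+if "llm_client" not in _kwargs and self._llm_client is not None:
+    _kwargs["llm_client"] = self._llm_client
+if "llm_model" not in _kwargs and self._llm_model is not None:
+    _kwargs["llm_model"] = self._llm_model
+# ... the remaining instance-level options are passed the same way
+```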
+
+---
+
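+## 3. Core Modules in Detail
+
+### 3.1 The MarkItDown Main Class
+
+**File**: `packages/markitdown/src/markitdown/_markitdown.py`
+
+**Core responsibilities**:
+1. Managing the converter registry
+2. Dispatching on input type
+3. Detecting and guessing the file type
+4. Running conversions and handling errors
+5. Managing the plugin lifecycle
+
+**Key internal methods**:
+
+| Method | Purpose | Lines |
+|--------|---------|-------|
+| `_convert()` | Core conversion loop | 538-631 |
+| `_get_stream_info_guesses()` | File-type guessing | 673-773 |
+| `_load_plugins()` | Lazy plugin loading | 65-83 |
+| `_normalize_charset()` | Charset normalization | 774-783 |
+
+### 3.2 The DocumentConverter Base Class
+
+**File**: `packages/markitdown/src/markitdown/_base_converter.py`
+
+**Interface contract**: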
+
+```python
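+from typing import BinaryIO
+
+class DocumentConverter:
+    def accepts(
+        self,
+        file_stream: BinaryIO,
+        stream_info: StreamInfo,
+        **kwargs
+    ) -> bool:
+        """Quickly decide whether this converter can handle the file.
+
+        Important: this method must leave the file_stream position unchanged.
+        If it reads from the stream, it must seek() back to the original
+        position before returning.
+        """
+        raise NotImplementedError()
+
+    def convert(
+        self,
+        file_stream: BinaryIO,
+        stream_info: StreamInfo,
+        **kwargs
+    ) -> DocumentConverterResult:
+        """Perform the actual conversion."""
+        raise NotImplementedError()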
+```
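The position-preserving requirement on `accepts()` is easiest to meet with a `tell()` / `seek()` pair around the read. The following is a minimal sketch under stated assumptions: `PngSniffingConverter` is a hypothetical class that does not subclass the real `DocumentConverter`, and its signature is simplified to keep the example self-contained.

```python
import io
from typing import BinaryIO

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


class PngSniffingConverter:  # hypothetical; not the real base class
    def accepts(self, file_stream: BinaryIO, **kwargs) -> bool:
        cur_pos = file_stream.tell()  # remember where the caller left the stream
        try:
            return file_stream.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE
        finally:
            file_stream.seek(cur_pos)  # always restore, even if read() raises


stream = io.BytesIO(PNG_SIGNATURE + b"rest of file")
converter = PngSniffingConverter()
accepted = converter.accepts(stream)
print(accepted, stream.tell())
```

Putting the `seek()` in a `finally` clause keeps the contract intact on the error path as well, so the next converter in the chain sees the stream exactly as this one found it.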
+
+### 3.3 The StreamInfo Metadata Class
+
+**File**: `packages/markitdown/src/markitdown/_stream_info.py`
+
+```python
+from dataclasses import dataclass
+from typing import Optional
+
+@dataclass(kw_only=True, frozen=True)
+class StreamInfo:
+    mimetype: Optional[str] = None      # e.g., "application/pdf"
+    extension: Optional[str] = None     # e.g., ".pdf"
+    charset: Optional[str] = None       # e.g., "utf-8"
+    filename: Optional[str] = None      # file name
+    local_path: Optional[str] = None    # local path
+    url: Optional[str] = None           # source URL
+
+    def copy_and_update(self, *args, **kwargs):
+        """Copy-and-update pattern for an immutable object."""
+```
+
+### 3.4 Analysis of Representative Converters
+
+#### 3.4.1 PdfConverter in Depth
+
+**File**: `packages/markitdown/src/markitdown/converters/_pdf_converter.py`
+
+**Core dependencies**:
+- `pdfplumber`: table extraction and form analysis
+- `pdfminer.six`: text extraction (fallback)
+
+**Processing strategy**:
+
+```
+PDF文件
+  │
+  ▼
+┌──────────────────────────────────┐
+│ 逐页分析 (pdfplumber)             │
+│ ┌──────────────────────────────┐ │
+│ │ 表单风格检测 (_extract_form_) │ │
+│ │ 通过单词位置聚类识别列边界    │ │
+│ └──────────────┬───────────────┘ │
+│                │                  │
+│     ┌──────────┴──────────┐       │
+│     ▼                     ▼       │
+│  检测到表单            纯文本页面  │
+│  (表格/无框表单)        │          │
+│     │                    │         │
+│     ▼                    ▼         │
+│  表格格式化提取      pdfminer提取  │
+│  Markdown表格             │        │
+│     │                    │         │
+│     └──────────┬──────────┘         │
+│                ▼                     │
+│      合并所有页面内容                 │
+└────────────────┬─────────────────────┘
+                 │
+                 ▼
+┌──────────────────────────────────┐
+│ 后处理                            │
+│ - MasterFormat编号行合并          │
+│ - 异常页面回退到 pdfminer         │
+└────────────────┬──────────────────┘
+                 ▼
+         DocumentConverterResult
+```
+
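+**Table-detection heuristics** (`_pdf_converter.py:398-492`):
+
+Key features:
+1. **Column-boundary clustering**: words are clustered by x position to find candidate columns
+2. **Content density**: a cell longer than 30 characters is treated as prose rather than a table cell
+3. **Column-count threshold**: 3 to 10 columns counts as a table; anything else is treated as multi-column text
+4. **Row coverage**: table rows must make up at least 20% of all rows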
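Features 2 to 4 are simple thresholds and can be sketched without any PDF library. The code below is illustrative only: the function names, the way rows are represented (already split into cells), and the way the three checks are combined are all assumptions, and the x-position clustering of feature 1 is left out entirely.

```python
from typing import List

MAX_CELL_CHARS = 30                # feature 2: longer cells suggest prose
MIN_COLUMNS, MAX_COLUMNS = 3, 10   # feature 3: plausible table widths
MIN_ROW_COVERAGE = 0.20            # feature 4: share of rows that must be tabular


def is_table_row(cells: List[str]) -> bool:
    """A row counts as tabular if it has 3 to 10 cells and all are short."""
    if not MIN_COLUMNS <= len(cells) <= MAX_COLUMNS:
        return False
    return all(len(cell) <= MAX_CELL_CHARS for cell in cells)


def looks_like_table(rows: List[List[str]]) -> bool:
    """Treat the page as tabular if at least 20% of its rows are tabular."""
    if not rows:
        return False
    table_rows = sum(1 for cells in rows if is_table_row(cells))
    return table_rows / len(rows) >= MIN_ROW_COVERAGE


invoice = [
    ["Item", "Qty", "Price"],
    ["Widget", "2", "9.99"],
    ["Thank you for your order, please keep this receipt for your records."],
]
prose = [["A single long line of running text that is clearly not a table row."]] * 5

print(looks_like_table(invoice), looks_like_table(prose))
```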
+
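+#### 3.4.2 DocxConverter
+
+**Core dependency**: the `mammoth` library
+
+**Processing flow**:
+1. DOCX → HTML (mammoth)
+2. HTML → Markdown (markdownify)
+3. An optional custom `style_map` controls the conversion
+
+#### 3.4.3 ZipConverter Special Handling
+
+**Distinctive design**: it needs access to the parent's converter list
+
+```python
+# _markitdown.py:581
+_kwargs["_parent_converters"] = self._converters
+```
+
+**Purpose**: when it recurses into the files inside a ZIP archive, it uses the same set of converters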
+
+---
+
+## 4. Command-Line Interface (CLI)
+
+### 4.1 Entry Module
+
+**File**: `packages/markitdown/src/markitdown/__main__.py`
+
+### 4.2 Command-Line Options
+
+| Option | Short | Description |
+|--------|-------|-------------|
+| `--output` | `-o` | Output file path (defaults to stdout) |
+| `--extension` | `-x` | File-extension hint |
+| `--mime-type` | `-m` | MIME-type hint |
+| `--charset` | `-c` | Charset hint |
+| `--use-docintel` | `-d` | Use Azure Document Intelligence |
+| `--endpoint` | `-e` | Document Intelligence endpoint |
+| `--use-plugins` | `-p` | Enable third-party plugins |
+| `--list-plugins` | | List installed plugins |
+| `--keep-data-uris` | | Keep base64-encoded images |
+| `--version` | `-v` | Show the version number |
+
+### 4.3 Output Handling
+
+```python
+import sys
+
+def _handle_output(args, result):
+    if args.output:
+        with open(args.output, "w", encoding="utf-8") as f:
+            f.write(result.markdown)
+    else:
+        # Characters that stdout cannot encode are replaced instead of raising
+        print(result.markdown.encode(
+            sys.stdout.encoding, errors="replace"
+        ).decode(sys.stdout.encoding))
+```
+
+---
+
+## 5. Dependencies and Package Layout
+
+### 5.1 Package Layout
+
+```
+packages/
+├── markitdown/              # core package
+│   ├── src/markitdown/
+│   │   ├── converters/      # 20+ converters
+│   │   ├── converter_utils/ # helpers (DOCX math formulas)
+│   │   ├── __init__.py
+│   │   ├── __main__.py      # CLI entry point
+│   │   ├── _markitdown.py   # core class
+│   │   ├── _base_converter.py
+│   │   ├── _stream_info.py
+│   │   ├── _uri_utils.py
+│   │   └── _exceptions.py
+│   ├── tests/               # test suite
+│   └── pyproject.toml
+├── markitdown-ocr/          # OCR plugin package
+├── markitdown-mcp/          # MCP server
+└── markitdown-sample-plugin/# sample plugin
+```
+
+### 5.2 Core Dependencies
+
+**Required dependencies** (`pyproject.toml:26-33`):
+
+| Package | Purpose |
+|---------|---------|
+| beautifulsoup4 | HTML parsing |
+| requests | HTTP requests |
+| markdownify | HTML → Markdown |
+| magika~=0.6.1 | File-type detection |
+| charset-normalizer | Charset detection |
+| defusedxml | Safe XML parsing |
+
+**Optional dependency groups**:
+
+| Group | Dependencies | Formats supported |
+|-------|--------------|-------------------|
+| `[all]` | All optional dependencies | All formats |
+| `[pptx]` | python-pptx | PowerPoint |
+| `[docx]` | mammoth, lxml | Word |
+| `[xlsx]` | pandas, openpyxl | Excel |
+| `[xls]` | pandas, xlrd | Legacy Excel |
+| `[pdf]` | pdfminer.six, pdfplumber | PDF |
+| `[outlook]` | olefile | Outlook MSG |
+| `[audio-transcription]` | pydub, SpeechRecognition | Audio transcription |
+| `[youtube-transcription]` | youtube-transcript-api | YouTube transcripts |
+| `[az-doc-intel]` | azure-ai-documentintelligence, azure-identity | Cloud OCR |
+
+---
+
+## 6. Exception Handling
+
+### 6.1 Custom Exception Hierarchy
+
+**File**: `packages/markitdown/src/markitdown/_exceptions.py`
+
+```
+MarkItDownException (base class)
+├── UnsupportedFormatException    # no converter supports this format
+├── FileConversionException       # an error occurred during conversion
+│   └── carries a list of FailedConversionAttempt
+└── MissingDependencyException    # an optional dependency is missing
+```
+
+### 6.2 Conversion Failure Flow
+
+**`_markitdown.py:544-631`**:
+
+```python
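+import sys
+
+def _convert(self, file_stream, stream_info_guesses, **kwargs):
+    res = None
+    failed_attempts = []  # failed attempts, kept for error reporting
+
+    # Sort converters by priority; sorted() is stable, so list order breaks ties
+    sorted_registrations = sorted(self._converters, key=lambda x: x.priority)
+
+    # Remember the starting position so every attempt reads from the same place
+    cur_pos = file_stream.tell()
+
+    # Try each guessed file type, then an empty StreamInfo as a last resort
+    for stream_info in stream_info_guesses + [StreamInfo()]:
+        for converter_registration in sorted_registrations:
+            converter = converter_registration.converter
+            _kwargs = dict(kwargs)  # per-attempt copy of the caller's options
+
+            # 1. Ask the converter whether it accepts the stream
+            if converter.accepts(file_stream, stream_info, **_kwargs):
+                try:
+                    # 2. Attempt the conversion
+                    res = converter.convert(file_stream, stream_info, **_kwargs)
+                except Exception:
+                    # 3. Record the failure and keep trying
+                    failed_attempts.append(
+                        FailedConversionAttempt(
+                            converter=converter,
+                            exc_info=sys.exc_info()
+                        )
+                    )
+                finally:
+                    file_stream.seek(cur_pos)  # reset the stream position
+
+            # Return on the first success
+            if res is not None:
+                # Normalize the content (elided here)
+                res.text_content = ...
+                return res
+
+    # Every attempt failed
+    if failed_attempts:
+        raise FileConversionException(attempts=failed_attempts)
+    raise UnsupportedFormatException(...)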
+```
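The fallback behavior can be exercised in isolation with two toy converters. This is a self-contained sketch of the same record-and-continue idea, not the project's code: `BrokenConverter`, `TextConverter`, and `convert_with_fallback` are hypothetical names, the interface is reduced to a single stream argument, and a plain `RuntimeError` stands in for the library's exception types.

```python
import io
import sys
from typing import BinaryIO, List, Optional


class BrokenConverter:  # hypothetical: accepts everything, then fails
    def accepts(self, stream: BinaryIO) -> bool:
        return True

    def convert(self, stream: BinaryIO) -> str:
        stream.read()  # consume the stream before failing
        raise ValueError("cannot parse")


class TextConverter:  # hypothetical fallback
    def accepts(self, stream: BinaryIO) -> bool:
        return True

    def convert(self, stream: BinaryIO) -> str:
        return stream.read().decode("utf-8")


def convert_with_fallback(stream: BinaryIO, converters: list) -> str:
    failed_attempts: List[tuple] = []
    cur_pos = stream.tell()
    for converter in converters:
        if not converter.accepts(stream):
            continue
        result: Optional[str] = None
        try:
            result = converter.convert(stream)
        except Exception:
            failed_attempts.append((converter, sys.exc_info()))
        finally:
            stream.seek(cur_pos)  # the next converter must start from the same place
        if result is not None:
            return result
    raise RuntimeError(f"{len(failed_attempts)} converter(s) failed")


text = convert_with_fallback(io.BytesIO(b"hello"), [BrokenConverter(), TextConverter()])
print(text)
```

Without the `seek()` in `finally`, the second converter would read an already exhausted stream and return an empty string, which is why the reset happens on both the success and the failure path.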
+
+---
+
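+## 7. Security Considerations
+
+### 7.1 Security Design Points
+
+**Input validation**:
+- Inputs are not validated automatically; that is the caller's responsibility
+- The documentation states explicitly that inputs must be sanitized in untrusted environments
+
+**API granularity**:
+
+| API | What it can access | Recommended when |
+|-----|--------------------|------------------|
+| `convert()` | Local files, URLs, and streams | General use; the most permissive entry point |
+| `convert_local()` | Local files only | Only local files need to be read |
+| `convert_stream()` | An already opened stream only | The caller wants full control over I/O |
+| `convert_response()` | A `requests.Response` only | The caller performs the HTTP fetch itself |
+| `convert_uri()` | `file:`, `data:`, and `http(s):` URIs | The input is a URI |
+
+### 7.2 Potential Risks
+
+1. **File-system access**: performed with the privileges of the current process
+2. **Network access**: `convert()` and `convert_uri()` issue network requests when given an HTTP(S) URL
+3. **XML parsing**: `defusedxml` is used to guard against attacks such as Billion Laughs
+4. **ZIP extraction**: path traversal may be possible (the implementation needs to be checked)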
+
+---
+
+## 8. Testing
+
+### 8.1 Test Layout
+
+```
+packages/markitdown/tests/
+├── test_files/              # test inputs
+│   ├── test.docx
+│   ├── test.pdf
+│   ├── test.xlsx
+│   ├── test.pptx
+│   ├── test.jpg
+│   ├── test.mp3
+│   └── ... (20+ formats)
+├── expected_outputs/        # expected outputs
+├── _test_vectors.py         # test data definitions
+├── test_module_vectors.py   # Python API tests
+├── test_cli_vectors.py      # CLI tests
+├── test_module_misc.py      # miscellaneous API tests
+├── test_cli_misc.py         # miscellaneous CLI tests
+├── test_pdf_tables.py       # PDF table tests
+├── test_pdf_memory.py       # PDF memory tests
+├── test_pdf_masterformat.py # MasterFormat numbering tests
+└── test_docintel_html.py    # Document Intelligence tests
+```
+
+### 8.2 Test Vector Definition
+
+Structure of a test case in **`_test_vectors.py`**:
+
+```python
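+from dataclasses import dataclass
+from typing import List, Optional
+
+@dataclass
+class TestVector:
+    filename: str               # test file name
+    mimetype: str               # expected MIME type
+    charset: Optional[str]      # expected charset
+    url: Optional[str]          # mock URL
+    must_include: List[str]     # strings the output must contain
+    must_not_include: List[str] # strings the output must not contain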
+```
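The `must_include` / `must_not_include` pair amounts to substring checks on the converted Markdown. The sketch below shows how such a check might be written; `Expectation` and `check_output` are hypothetical names, and the sample strings are made up for illustration rather than taken from the real test vectors.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Expectation:  # hypothetical subset of the test-vector fields
    must_include: List[str] = field(default_factory=list)
    must_not_include: List[str] = field(default_factory=list)


def check_output(markdown: str, expected: Expectation) -> List[str]:
    """Return human-readable violations; an empty list means the output passes."""
    problems = [f"missing: {s!r}" for s in expected.must_include if s not in markdown]
    problems += [
        f"unexpected: {s!r}" for s in expected.must_not_include if s in markdown
    ]
    return problems


expected = Expectation(
    must_include=["# Title", "| a | b |"],
    must_not_include=["data:image"],
)
good = check_output("# Title\n\n| a | b |\n", expected)
bad = check_output("# Title\n\n![x](data:image/png;base64,AAAA)", expected)
print(good)
print(bad)
```

Returning the full list of violations, rather than stopping at the first one, makes a failing vector easier to diagnose in a single test run.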
+
+### 8.3 Scenarios Covered
+
+1. **File-type detection**: `test_guess_stream_info()`
+2. **Local file conversion**: `test_convert_local()`
+3. **Stream conversion with hints**: `test_convert_stream_with_hints()`
+4. **Stream conversion without hints**: `test_convert_stream_without_hints()`
+5. **HTTP URI conversion**: `test_convert_http_uri()` (skipped in CI)
+6. **File URI conversion**: `test_convert_file_uri()`
+7. **Data URI conversion**: `test_convert_data_uri()`
+8. **Data URI retention**: `test_convert_keep_data_uris()`
+
+### 8.4 Running the Tests
+
+```bash
+# With hatch
+cd packages/markitdown
+pip install hatch
+hatch shell
+hatch test
+
+# Or run pytest directly
+pytest tests/ -v
+```
+
+---
+
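+## 9. Extension and Plugin Development
+
+### 9.1 Plugin Development Workflow
+
+1. **Create the package layout**: use `markitdown-sample-plugin` as a reference
+2. **Implement the converter**: subclass `DocumentConverter`
+3. **Implement the registration function**: `register_converters(markitdown, **kwargs)`
+4. **Configure entry_points**: declare them in `pyproject.toml`
+
+### 9.2 Plugin Priority Strategy
+
+| Priority value | Effect |
+|----------------|--------|
+| < 0.0 | Tried before the built-in converters |
+| 0.0 | Same priority as the format-specific converters |
+| 0.0 ~ 10.0 | Between the format-specific and generic converters |
+| 10.0 | Same priority as the generic converters |
+| > 10.0 | Tried after all built-in converters |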
+
+---
+
+## 10. Performance and Resource Management
+
+### 10.1 Stream Handling
+
+**Non-seekable streams** (`_markitdown.py:369-378`):
+
+```python
+import io
+
+# A non-seekable stream is first read fully into memory
+if not stream.seekable():
+    buffer = io.BytesIO()
+    while True:
+        chunk = stream.read(4096)
+        if not chunk:
+            break
+        buffer.write(chunk)
+    buffer.seek(0)
+    stream = buffer
+```
+
+### 10.2 PDF Memory Optimization
+
+**`_pdf_converter.py:548-566`**:
+
+```python
+# Process page by page and release each page as soon as it is done
+with pdfplumber.open(pdf_bytes) as pdf:
+    for page_idx, page in enumerate(pdf.pages):
+        page_content = _extract_form_content_from_words(page)
+        # ... process the content
+        page.close()  # release the cached page data immediately
+```
+
+### 10.3 Reusing the Requests Session
+
+**`_markitdown.py:107-118`**:
+
+```python
+# A single requests.Session reduces connection overhead
+self._requests_session = requests.Session()
+self._requests_session.headers.update({
+    "Accept": "text/markdown, text/html;q=0.9, text/plain;q=0.8, */*;q=0.1"
+})
+```
+
+---
+
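+## 11. Versions and Compatibility
+
+### 11.1 Python Version Support
+
+**`pyproject.toml:18-24`**:
+- Python 3.10+
+- CPython and PyPy implementations
+
+### 11.2 Version Management
+
+- The version number is defined in `__about__.py`
+- Versioning is managed with hatch
+- The project follows semantic versioning
+
+---
+
+## 12. Summary
+
+### 12.1 Architectural Strengths
+
+1. **Highly extensible**: plugin system plus priority mechanism
+2. **Fault tolerant**: layered detection plus fallback on failure
+3. **Simple interface**: a single `convert()` API handles several kinds of input
+4. **LLM friendly**: designed to produce input for large language models
+
+### 12.2 Core Design Patterns
+
+| Pattern | Where it is used |
+|---------|------------------|
+| Chain of responsibility | Converters are tried in priority order |
+| Strategy | Each format uses its own conversion strategy |
+| Template method | `DocumentConverter` defines the interface contract |
+| Lazy loading | Plugins are loaded only when enabled |
+| Immutable object | `StreamInfo` is a frozen dataclass |
+
+### 12.3 Recommendations for Extending the Project
+
+1. **Adding support for a new format**:
+   - Subclass `DocumentConverter`
+   - Implement `accepts()` and `convert()`
+   - Register it through a plugin or directly
+
+2. **Enhancing existing functionality**:
+   - Use the plugin mechanism with a priority below 0.0
+   - On failure, conversion falls back to the built-in converter automatically
+
+3. **Customizing conversion logic**:
+   - Subclass `MarkItDown`
+   - Override `enable_builtins()` to customize converter registration
+
+---
+
+*Report generated: 2026-04-28*
+*Based on project version: current development branch*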
diff --git "a/\345\274\200\345\217\221\350\200\205\346\211\213\345\206\214.md" "b/\345\274\200\345\217\221\350\200\205\346\211\213\345\206\214.md"
new file mode 100644
index 000000000..bfc2716a3
--- /dev/null
+++ "b/\345\274\200\345\217\221\350\200\205\346\211\213\345\206\214.md"
@@ -0,0 +1,769 @@
+# MarkItDown 开发者手册
+
+---
+
+## 目录
+
+1. [项目架构深度解析](#1-项目架构深度解析)
+2. [开发环境搭建](#2-开发环境搭建)
+3. [核心 API 详解](#3-核心-api-详解)
+4. [转换器开发指南](#4-转换器开发指南)
+5. [插件开发指南](#5-插件开发指南)
+6. [测试指南](#6-测试指南)
+7. [代码规范与质量保证](#7-代码规范与质量保证)
+8. [高级扩展技巧](#8-高级扩展技巧)
+9. [贡献指南](#9-贡献指南)
+10. [常见开发问题](#10-常见开发问题)
+
+---
+
+## 1. 项目架构深度解析
+
+### 1.1 整体架构
+
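+MarkItDown's architecture combines the **chain of responsibility** and **strategy** patterns. The core ideas are:
+
+- **Extensibility**: a converter registration mechanism lets new formats be added dynamically
+- **Fault tolerance**: layered detection plus fallback on failure
+- **Simplicity**: one unified API handles several kinds of input source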
+
+#### 架构图
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                         输入层                                    │
+│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────────┐  │
+│  │ 本地文件  │ │   URL    │ │   流     │ │ requests.Response │  │
+│  └────┬─────┘ └────┬─────┘ └────┬─────┘ └────────┬─────────┘  │
+│       │            │            │                 │              │
+│       └────────────┴──────┬─────┴─────────────────┘              │
+│                            ▼                                       │
+│                  ┌─────────────────┐                              │
+│                  │  输入分发器      │                              │
+│                  │  (MarkItDown)  │                              │
+│                  └────────┬────────┘                              │
+└───────────────────────────┼───────────────────────────────────────┘
+                            │
+                            ▼
+┌─────────────────────────────────────────────────────────────────┐
+│                      文件类型识别层                                │
+│  ┌─────────────────────────────────────────────────────────┐   │
+│  │  1. 扩展名识别 → mimetypes.guess_type()                  │   │
+│  │  2. MIME类型识别 → 扩展名反向映射                          │   │
+│  │  3. 内容识别 → Magika (基于ML的文件类型检测)               │   │
+│  │  4. 编码识别 → charset_normalizer                         │   │
+│  └─────────────────────────────────────────────────────────┘   │
+│                            │                                      │
+│                            ▼                                      │
+│              ┌─────────────────────────────┐                     │
+│              │  StreamInfo 猜测列表生成    │                     │
+│              │  - 兼容模式：合并多种结果   │                     │
+│              │  - 冲突模式：分别尝试       │                     │
+│              └──────────────┬──────────────┘                     │
+└─────────────────────────────┼───────────────────────────────────────┘
+                              │
+                              ▼
+┌─────────────────────────────────────────────────────────────────┐
+│                      转换器执行层                                 │
+│  ┌─────────────────────────────────────────────────────────┐   │
+│  │              转换器按优先级排序 (升序)                    │   │
+│  │                                                          │   │
+│  │  优先级 0.0: 特定格式转换器 (DOCX, PDF, etc.)            │   │
+│  │  优先级 10.0: 通用格式转换器 (PlainText, HTML, ZIP)      │   │
+│  │  插件可自定义优先级 (-1.0 可覆盖内置)                      │   │
+│  └─────────────────────────────────────────────────────────┘   │
+│                            │                                      │
+│                            ▼                                      │
+│              ┌─────────────────────────────┐                     │
+│              │      转换执行流程           │                     │
+│              │  for stream_info in 猜测:   │                     │
+│              │    for converter in 转换器:  │                     │
+│              │      1. converter.accepts()  │                     │
+│              │      2. converter.convert()  │                     │
+│              │      3. 成功返回，失败继续   │                     │
+│              └──────────────┬──────────────┘                     │
+└─────────────────────────────┼───────────────────────────────────────┘
+                              │
+                              ▼
+┌─────────────────────────────────────────────────────────────────┐
+│                      结果处理层                                   │
+│  ┌─────────────────────────────────────────────────────────┐   │
+│  │  DocumentConverterResult:                                │   │
+│  │  - markdown: 转换后的内容                                │   │
+│  │  - title: 文档标题 (可选)                                │   │
+│  │                                                          │   │
+│  │  内容规范化:                                              │   │
+│  │  1. 行尾空格去除                                          │   │
+│  │  2. 换行符统一为 \n                                       │   │
+│  │  3. 3个以上连续换行符压缩为2个                            │   │
+│  └─────────────────────────────────────────────────────────┘   │
+└─────────────────────────────────────────────────────────────────┘
+```
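
结果处理层的"内容规范化"三步可以用几行标准库代码来示意。下面是一个最小草图，函数名 `normalize_markdown` 为演示用的假设名称，并不保证与 MarkItDown 内部实现逐行一致：

```python
import re


def normalize_markdown(text: str) -> str:
    """规范化转换结果：去除行尾空白、统一换行符、压缩多余换行。"""
    # 按 \r\n 或 \n 切分，去掉每行行尾空白，再统一用 \n 拼接
    lines = [line.rstrip() for line in re.split(r"\r?\n", text)]
    text = "\n".join(lines)
    # 3 个及以上连续换行符压缩为 2 个（即段落之间最多保留一个空行）
    return re.sub(r"\n{3,}", "\n\n", text)


raw = "# 标题  \r\n\r\n\r\n\r\n正文 \n"
print(repr(normalize_markdown(raw)))  # '# 标题\n\n正文\n'
```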
+
+### 1.2 目录结构
+
+```
+markitdown/
+├── packages/
+│   ├── markitdown/                    # 核心包
+│   │   ├── src/markitdown/
+│   │   │   ├── __init__.py            # 包入口，导出核心类
+│   │   │   ├── __main__.py            # CLI 入口
+│   │   │   ├── __about__.py           # 版本信息
+│   │   │   ├── _markitdown.py         # 核心类 MarkItDown
+│   │   │   ├── _base_converter.py     # 转换器基类
+│   │   │   ├── _stream_info.py        # 流信息数据类
+│   │   │   ├── _uri_utils.py          # URI 处理工具
+│   │   │   ├── _exceptions.py         # 自定义异常
+│   │   │   ├── converters/            # 转换器模块
+│   │   │   │   ├── __init__.py        # 导出所有转换器
+│   │   │   │   ├── _pdf_converter.py
+│   │   │   │   ├── _docx_converter.py
+│   │   │   │   ├── _xlsx_converter.py
+│   │   │   │   ├── _pptx_converter.py
+│   │   │   │   ├── _html_converter.py
+│   │   │   │   ├── _plain_text_converter.py
+│   │   │   │   ├── _zip_converter.py
+│   │   │   │   ├── _image_converter.py
+│   │   │   │   ├── _audio_converter.py
+│   │   │   │   ├── _ipynb_converter.py
+│   │   │   │   ├── _epub_converter.py
+│   │   │   │   ├── _outlook_msg_converter.py
+│   │   │   │   ├── _csv_converter.py
+│   │   │   │   ├── _rss_converter.py
+│   │   │   │   ├── _wikipedia_converter.py
+│   │   │   │   ├── _youtube_converter.py
+│   │   │   │   ├── _bing_serp_converter.py
+│   │   │   │   ├── _doc_intel_converter.py
+│   │   │   │   ├── _markdownify.py     # HTML → Markdown 工具
+│   │   │   │   ├── _llm_caption.py    # LLM 图像描述
+│   │   │   │   ├── _transcribe_audio.py # 音频转录
+│   │   │   │   └── _exiftool.py       # EXIF 工具
+│   │   │   └── converter_utils/        # 转换器工具函数
+│   │   │       └── docx/               # DOCX 特定工具
+│   │   │           ├── math/           # 数学公式处理
+│   │   │           └── pre_process.py  # 预处理
+│   │   ├── tests/                      # 测试套件
+│   │   │   ├── test_files/             # 测试文件
+│   │   │   ├── expected_outputs/       # 预期输出
+│   │   │   ├── _test_vectors.py        # 测试数据定义
+│   │   │   ├── test_module_vectors.py  # Python API 测试
+│   │   │   ├── test_cli_vectors.py     # CLI 测试
+│   │   │   └── ...
+│   │   ├── pyproject.toml              # 包配置
+│   │   └── README.md
+│   │
+│   ├── markitdown-ocr/                 # OCR 插件
+│   │   ├── src/markitdown_ocr/
+│   │   │   ├── __init__.py
+│   │   │   ├── __about__.py
+│   │   │   ├── _plugin.py              # 插件注册
+│   │   │   ├── _ocr_service.py         # OCR 服务
+│   │   │   ├── _pdf_converter_with_ocr.py
+│   │   │   ├── _docx_converter_with_ocr.py
+│   │   │   ├── _pptx_converter_with_ocr.py
+│   │   │   └── _xlsx_converter_with_ocr.py
+│   │   └── pyproject.toml
+│   │
+│   ├── markitdown-mcp/                 # MCP 协议服务
+│   │   └── ...
+│   │
+│   └── markitdown-sample-plugin/       # 示例插件
+│       ├── src/markitdown_sample_plugin/
+│       │   ├── __init__.py
+│       │   ├── __about__.py
+│       │   └── _plugin.py              # RTF 转换器示例
+│       └── pyproject.toml
+│
+├── .github/
+│   └── workflows/                       # CI/CD 配置
+│       ├── tests.yml                    # 测试工作流
+│       └── pre-commit.yml               # 预提交检查
+│
+├── Dockerfile                           # Docker 构建文件
+├── pyproject.toml                       # 根项目配置
+├── README.md                            # 项目说明
+└── LICENSE                              # 许可证
+```
+
+---
+
+## 2. 开发环境搭建
+
+### 2.1 系统要求
+
+- **Python**: 3.10 或更高版本
+- **操作系统**: Windows 10+, macOS 10.15+, Linux (Ubuntu 20.04+, etc.)
+- **工具**: git, pip, hatch (推荐)
+
+### 2.2 环境配置步骤
+
+#### 步骤 1: 克隆仓库
+
+```bash
+git clone https://github.com/microsoft/markitdown.git
+cd markitdown
+```
+
+#### 步骤 2: 创建虚拟环境
+
+```bash
+# 使用 venv
+python -m venv .venv
+
+# Windows
+.venv\Scripts\activate
+
+# Linux/macOS
+source .venv/bin/activate
+```
+
+#### 步骤 3: 安装开发依赖
+
+```bash
+# 安装 hatch (项目使用 hatch 管理环境)
+pip install hatch
+
+# 进入开发环境 (自动安装所有依赖)
+cd packages/markitdown
+hatch shell
+```
+
+或手动安装：
+
+```bash
+# 安装核心包（可编辑模式；在仓库根目录执行）
+pip install -e 'packages/markitdown[all]'
+
+# 安装开发工具
+pip install pytest pytest-cov mypy pre-commit
+```
+
+#### 步骤 4: 安装 pre-commit hooks
+
+```bash
+pre-commit install
+```
+
+#### 步骤 5: 验证安装
+
+```bash
+# 检查版本
+markitdown --version
+
+# 运行基本测试（在仓库根目录执行）
+pytest packages/markitdown/tests/test_module_vectors.py -v
+```
+
+---
+
+## 3. 核心 API 详解
+
+### 3.1 MarkItDown 类
+
+#### 3.1.1 构造函数参数
+
+```python
+class MarkItDown:
+    def __init__(
+        self,
+        *,
+        enable_builtins: Optional[bool] = None,  # 是否启用内置转换器（None 视同 True）
+        enable_plugins: Optional[bool] = None,    # 是否启用插件（None 视同 False）
+        **kwargs,
+    ):
+```
+
+**kwargs 可选参数：**
+
+| 参数 | 类型 | 说明 | 默认值 |
+|------|------|------|--------|
+| `requests_session` | `requests.Session` | 自定义 HTTP 会话 | 自动创建 |
+| `llm_client` | `Any` | OpenAI 兼容客户端 | `None` |
+| `llm_model` | `str` | LLM 模型名称 | `None` |
+| `llm_prompt` | `str` | 图像描述提示词 | `None` |
+| `exiftool_path` | `str` | exiftool 路径 | 自动检测 |
+| `style_map` | `str` | DOCX 样式映射 | `None` |
+| `docintel_endpoint` | `str` | Azure Document Intelligence 端点 | `None` |
+
+#### 3.1.2 核心转换方法
+
+##### `convert()` - 统一转换入口
+
+```python
+def convert(
+    self,
+    source: Union[str, Path, requests.Response, BinaryIO],
+    *,
+    stream_info: Optional[StreamInfo] = None,
+    **kwargs: Any,
+) -> DocumentConverterResult:
+```
+
+**支持的 source 类型：**
+
+| 类型 | 说明 | 示例 |
+|------|------|------|
+| `str` (路径) | 本地文件路径 | `"/path/to/file.pdf"` |
+| `str` (URL) | HTTP/HTTPS URL | `"https://example.com/doc.pdf"` |
+| `Path` | pathlib Path 对象 | `Path("/path/to/file.pdf")` |
+| `requests.Response` | HTTP 响应对象 | `requests.get(url)` |
+| `BinaryIO` | 二进制流 | `open("file.pdf", "rb")` |
+
+##### `convert_local()` - 仅本地文件
+
+```python
+def convert_local(
+    self,
+    path: Union[str, Path],
+    *,
+    stream_info: Optional[StreamInfo] = None,
+    **kwargs: Any,
+) -> DocumentConverterResult:
+```
+
+**特点：**
+- 只能处理本地文件路径
+- 更安全：输入只会被当作本地路径，不会因传入 URL 而发起网络请求
+- 推荐在服务器端应用中使用
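
在服务器端使用 `convert_local()` 时，通常还需要把用户提供的路径限制在某个允许的目录内。下面是一个只依赖标准库的草图，`resolve_inside` 以及示例中的上传目录都是演示用的假设名称，不属于 MarkItDown 的 API：

```python
from pathlib import Path


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """把用户提供的路径解析到 base_dir 之内；越界则抛出 ValueError。"""
    base = Path(base_dir).resolve()
    candidate = (base / user_path).resolve()
    # resolve() 会展开 ".." 和符号链接，之后再判断是否仍位于 base 之下
    if not candidate.is_relative_to(base):
        raise ValueError(f"路径越界: {user_path}")
    return candidate


# 用法示意（假设 md 是已创建的 MarkItDown 实例）：
# result = md.convert_local(resolve_inside("/srv/uploads", user_supplied_name))
```

先解析再比较可以同时拦住 `../` 穿越和绝对路径输入。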
+
+---
+
+## 4. 转换器开发指南
+
+### 4.1 最简转换器示例
+
+```python
+from typing import BinaryIO, Any
+from markitdown import (
+    DocumentConverter,
+    DocumentConverterResult,
+    StreamInfo,
+)
+
+# 接受的文件扩展名
+ACCEPTED_EXTENSIONS = [".myext"]
+
+# 接受的 MIME 类型前缀
+ACCEPTED_MIME_PREFIXES = ["application/x-myformat"]
+
+
+class MyFormatConverter(DocumentConverter):
+    """自定义格式转换器示例"""
+    
+    def accepts(
+        self,
+        file_stream: BinaryIO,
+        stream_info: StreamInfo,
+        **kwargs: Any,
+    ) -> bool:
+        """判断是否可以处理该文件"""
+        
+        # 获取扩展名和 MIME 类型
+        extension = (stream_info.extension or "").lower()
+        mimetype = (stream_info.mimetype or "").lower()
+        
+        # 检查扩展名
+        if extension in ACCEPTED_EXTENSIONS:
+            return True
+        
+        # 检查 MIME 类型
+        for prefix in ACCEPTED_MIME_PREFIXES:
+            if mimetype.startswith(prefix):
+                return True
+        
+        return False
+    
+    def convert(
+        self,
+        file_stream: BinaryIO,
+        stream_info: StreamInfo,
+        **kwargs: Any,
+    ) -> DocumentConverterResult:
+        """执行实际转换"""
+        
+        # 读取文件内容
+        content = file_stream.read()
+        
+        # 执行转换逻辑...
+        markdown = self._do_convert(content, stream_info, **kwargs)
+        
+        # 返回结果
+        return DocumentConverterResult(
+            markdown=markdown,
+            title=None  # 或提取的标题
+        )
+    
+    def _do_convert(
+        self,
+        content: bytes,
+        stream_info: StreamInfo,
+        **kwargs: Any,
+    ) -> str:
+        """实际的转换逻辑"""
+        # TODO: 实现转换逻辑
+        return "# 转换后的内容"
+```
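
如果仅凭扩展名和 MIME 类型不足以判断，`accepts()` 也可以读取文件头（魔数），但要在返回前把流位置恢复原状。下面的草图演示这一"先读取、后复位"的写法，`looks_like_format` 和 `MAGIC` 均为演示用的假设名称：

```python
import io
from typing import BinaryIO

MAGIC = b"%PDF-"  # 假设的魔数，仅作演示（PDF 文件头）


def looks_like_format(file_stream: BinaryIO, magic: bytes = MAGIC) -> bool:
    """读取文件头判断魔数，并在返回前恢复流位置。"""
    pos = file_stream.tell()
    try:
        head = file_stream.read(len(magic))
    finally:
        # 无论读取是否成功，都回到调用前的位置
        file_stream.seek(pos)
    return head == magic


stream = io.BytesIO(b"%PDF-1.7 ...")
print(looks_like_format(stream), stream.tell())  # True 0
```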
+
+### 4.2 内置转换器注册
+
+在 `packages/markitdown/src/markitdown/_markitdown.py` 的 `enable_builtins()` 方法中添加：
+
+```python
+def enable_builtins(self, **kwargs) -> None:
+    if not self._builtins_enabled:
+        # ... 现有转换器 ...
+        
+        # 注册新转换器
+        from .converters import MyNewConverter
+        self.register_converter(MyNewConverter())
+        
+        self._builtins_enabled = True
+```
+
+---
+
+## 5. 插件开发指南
+
+### 5.1 插件架构概述
+
+MarkItDown 的插件系统基于 Python 的 `entry_points` 机制，允许第三方包动态扩展 MarkItDown 的功能。
+
+### 5.2 创建插件的完整步骤
+
+#### 步骤 1: 创建包结构
+
+```
+markitdown-myplugin/
+├── src/
+│   └── markitdown_myplugin/
+│       ├── __init__.py
+│       ├── __about__.py
+│       └── _plugin.py           # 核心实现
+├── tests/
+├── README.md
+└── pyproject.toml               # 关键：配置 entry_points
+```
+
+#### 步骤 2: 配置 `pyproject.toml`
+
+这是最关键的部分，配置 `entry_points` 让 MarkItDown 能够发现插件：
+
+```toml
+[project]
+name = "markitdown-myplugin"
+version = "0.1.0"  # 示例版本号；也可改用 dynamic = ["version"]
+description = "MarkItDown 插件示例"
+requires-python = ">=3.10"
+dependencies = ["markitdown"]
+
+# 关键：入口点的值指向插件模块（包）本身，而不是其中的函数。
+# 该模块需在顶层提供 register_converters 和 __plugin_interface_version__
+# （例如在 __init__.py 中从 _plugin.py 导入）。
+[project.entry-points."markitdown.plugin"]
+myplugin = "markitdown_myplugin"
+```
+
+#### 步骤 3: 实现插件代码
+
+**`src/markitdown_myplugin/_plugin.py`：**
+
+```python
+from typing import BinaryIO, Any
+from markitdown import (
+    MarkItDown,
+    DocumentConverter,
+    DocumentConverterResult,
+    StreamInfo,
+)
+
+# 插件接口版本
+__plugin_interface_version__ = 1
+
+# 接受的文件类型
+ACCEPTED_EXTENSIONS = [".rtf"]
+
+
+class RtfConverter(DocumentConverter):
+    """RTF 格式转换器示例"""
+    
+    def accepts(
+        self,
+        file_stream: BinaryIO,
+        stream_info: StreamInfo,
+        **kwargs: Any,
+    ) -> bool:
+        extension = (stream_info.extension or "").lower()
+        return extension in ACCEPTED_EXTENSIONS
+    
+    def convert(
+        self,
+        file_stream: BinaryIO,
+        stream_info: StreamInfo,
+        **kwargs: Any,
+    ) -> DocumentConverterResult:
+        # 读取内容
+        import locale
+        encoding = stream_info.charset or locale.getpreferredencoding()
+        content = file_stream.read().decode(encoding)
+        
+        # 执行转换...
+        markdown = self._rtf_to_markdown(content)
+        
+        return DocumentConverterResult(
+            markdown=markdown,
+            title=None
+        )
+    
+    def _rtf_to_markdown(self, rtf_content: str) -> str:
+        """RTF 到 Markdown 转换"""
+        # 实现转换逻辑
+        return rtf_content
+
+
+def register_converters(markitdown: MarkItDown, **kwargs):
+    """
+    插件必须实现的函数
+    
+    由 MarkItDown 在启用插件时调用。
+    """
+    # 注册转换器
+    # 使用 priority=-1.0 让它在内置转换器之前尝试
+    markitdown.register_converter(
+        RtfConverter(),
+        priority=-1.0
+    )
+```
+
+#### 步骤 4: 安装并测试插件
+
+```bash
+# 以可编辑模式安装插件
+cd markitdown-myplugin
+pip install -e .
+
+# 验证插件是否被发现
+markitdown --list-plugins
+
+# 测试使用插件
+markitdown --use-plugins test.rtf
+```
+
+### 5.3 插件优先级策略
+
+| 优先级值 | 使用场景 |
+|----------|----------|
+| `< -1.0` | 实验性功能，想最先尝试 |
+| `-1.0` | 官方推荐的插件优先级（如 markitdown-ocr） |
+| `0.0` | 与内置特定格式转换器同优先级 |
+| `10.0` | 与通用格式转换器同优先级 |
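
优先级的效果可以用一个简化模型来理解：按优先级升序排序后依次询问，第一个接受输入的转换器胜出。下面的草图不依赖 MarkItDown，`Registration`、`register`、`pick` 都是演示用的假设名称；"新注册项插到列表最前"是对当前实现的一种理解，属于假设：

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Registration:
    name: str
    priority: float
    accepts: Callable[[str], bool]  # 入参为扩展名，是对 accepts() 的简化


registry: List[Registration] = []


def register(name: str, accepts: Callable[[str], bool], priority: float = 0.0) -> None:
    # 新注册项插到最前；配合稳定排序，同优先级时后注册者先被尝试
    registry.insert(0, Registration(name, priority, accepts))


def pick(extension: str) -> Optional[str]:
    # 升序排序：数值越小越先尝试
    for reg in sorted(registry, key=lambda r: r.priority):
        if reg.accepts(extension):
            return reg.name
    return None


register("PlainText", lambda ext: True, priority=10.0)
register("Pdf", lambda ext: ext == ".pdf", priority=0.0)
register("PdfWithOcr", lambda ext: ext == ".pdf", priority=-1.0)

print(pick(".pdf"))  # PdfWithOcr
print(pick(".txt"))  # PlainText
```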
+
+---
+
+## 6. 测试指南
+
+### 6.1 测试向量系统
+
+MarkItDown 使用数据驱动的测试模式：
+
+```python
+# _test_vectors.py 示例
+
+from dataclasses import dataclass
+from typing import List, Optional
+
+
+@dataclass
+class TestVector:
+    """单个测试用例的数据定义"""
+    
+    filename: str                    # 测试文件名
+    mimetype: str                    # 预期 MIME 类型
+    charset: Optional[str]           # 预期编码
+    url: Optional[str]               # Mock URL
+    must_include: List[str]          # 输出必须包含的字符串
+    must_not_include: List[str]      # 输出不能包含的字符串
+
+
+# 定义测试向量
+GENERAL_TEST_VECTORS = [
+    TestVector(
+        filename="test.docx",
+        mimetype="application/vnd.openxmlformats-officedocument.wordprocessingml.document",
+        charset=None,
+        url=None,
+        must_include=["Heading 1", "Paragraph text"],
+        must_not_include=["<?xml"],
+    ),
+    # ... 更多测试向量
+]
+```
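
测试向量中的 `must_include` 与 `must_not_include` 最终都归结为对输出字符串的包含判断。下面是一个独立的最小草图，`Vector` 和 `check_output` 为演示用的假设名称，并非仓库中测试代码的原样：

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Vector:
    """简化版测试向量，仅保留与内容断言相关的字段。"""

    filename: str
    must_include: List[str] = field(default_factory=list)
    must_not_include: List[str] = field(default_factory=list)


def check_output(vector: Vector, markdown: str) -> List[str]:
    """返回未满足的断言描述列表；空列表表示通过。"""
    problems = []
    for s in vector.must_include:
        if s not in markdown:
            problems.append(f"缺少: {s!r}")
    for s in vector.must_not_include:
        if s in markdown:
            problems.append(f"不应出现: {s!r}")
    return problems


v = Vector("test.docx", must_include=["Heading 1"], must_not_include=["<?xml"])
print(check_output(v, "# Heading 1\n\nParagraph text"))  # []
```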
+
+### 6.2 运行测试
+
+```bash
+# 使用 hatch（推荐）
+cd packages/markitdown
+hatch shell
+hatch test
+
+# 使用 pytest 直接运行
+pytest packages/markitdown/tests/ -v
+
+# 运行特定测试
+pytest packages/markitdown/tests/test_module_vectors.py -v
+
+# 运行带有覆盖率报告
+pytest packages/markitdown/tests/ --cov=markitdown --cov-report=term-missing
+```
+
+---
+
+## 7. 代码规范与质量保证
+
+### 7.1 代码风格
+
+项目使用以下工具保证代码质量：
+
+| 工具 | 用途 |
+|------|------|
+| black | 代码格式化 |
+| isort | 导入排序 |
+| mypy | 类型检查 |
+| ruff | 快速 linting |
+| pre-commit | 预提交检查 |
+
+### 7.2 运行质量检查
+
+```bash
+# 安装 pre-commit 钩子
+pre-commit install
+
+# 手动运行所有检查
+pre-commit run --all-files
+
+# 类型检查（使用 hatch）
+cd packages/markitdown
+hatch run types:check
+```
+
+---
+
+## 8. 高级扩展技巧
+
+### 8.1 自定义 MarkItDown 子类
+
+```python
+from markitdown import MarkItDown
+from markitdown.converters import PdfConverter
+
+
+class CustomMarkItDown(MarkItDown):
+    """自定义 MarkItDown 类"""
+    
+    def __init__(self, **kwargs):
+        # 先不启用内置转换器
+        super().__init__(enable_builtins=False, **kwargs)
+        
+        # 自定义转换器注册顺序
+        # MyCustomConverter 为示意名称：需自行定义的 DocumentConverter 子类
+        self.register_converter(MyCustomConverter(), priority=-1.0)
+        self.register_converter(PdfConverter())
+        # ... 其他转换器
+        
+        self._builtins_enabled = True
+```
+
+### 8.2 流式处理大文件
+
+```python
+from markitdown import MarkItDown
+from markitdown._exceptions import FileConversionException
+
+md = MarkItDown()
+
+def safe_convert_large_file(file_path: str):
+    """转换文件；失败时打印各转换器的失败信息并返回 None"""
+    try:
+        result = md.convert(file_path)
+        return result.markdown
+    except FileConversionException as e:
+        # 记录错误；attempts 可能为 None
+        print(f"转换失败: {e}")
+        for attempt in e.attempts or []:
+            print(f"  转换器 {type(attempt.converter).__name__} 失败")
+        return None
+```
+
+---
+
+## 9. 贡献指南
+
+### 9.1 贡献流程
+
+1. **Fork 仓库**：在 GitHub 上 Fork 项目
+2. **创建分支**：`git checkout -b feature/my-feature`
+3. **开发**：编写代码和测试
+4. **检查**：运行 `pre-commit run --all-files`
+5. **测试**：运行 `hatch test`
+6. **提交**：创建 Pull Request
+
+### 9.2 Pull Request 要求
+
+- 所有测试必须通过
+- 代码必须通过 pre-commit 检查
+- 包含新功能的测试
+- 更新相关文档
+
+---
+
+## 10. 常见开发问题
+
+### Q1: 如何调试转换器？
+
+**A:** 使用以下方法：
+
+```python
+from markitdown import MarkItDown
+from markitdown.converters import PdfConverter
+
+md = MarkItDown()
+
+# 查看已注册的转换器
+print("已注册的转换器：")
+for reg in sorted(md._converters, key=lambda x: x.priority):
+    print(f"  - {type(reg.converter).__name__} (priority: {reg.priority})")
+
+# 单独测试转换器
+converter = PdfConverter()
+with open("test.pdf", "rb") as f:
+    from markitdown import StreamInfo
+    info = StreamInfo(extension=".pdf")
+    print(f"接受: {converter.accepts(f, info)}")
+    # 注意：按约定，accepts() 不得改变流位置（如有读取，须在返回前复位）
+```
+
+### Q2: 如何处理新的文件类型？
+
+**A:** 遵循以下步骤：
+
+1. 确认文件的扩展名、MIME 类型、魔数
+2. 选择合适的依赖库进行解析
+3. 实现 `DocumentConverter` 子类
+4. 实现 `accepts()` 方法进行类型判断
+5. 实现 `convert()` 方法进行实际转换
+6. 添加测试文件和测试向量
+7. 运行测试确保功能正常
+
+### Q3: 插件和内置转换器有什么区别？
+
+**A:** 主要区别：
+
+| 特性 | 内置转换器 | 插件转换器 |
+|------|-----------|-----------|
+| 代码位置 | 核心包 | 独立包 |
+| 发布周期 | 随 MarkItDown 版本 | 独立发布 |
+| 默认启用 | 是 | 否（需 `enable_plugins=True`） |
+| 优先级 | 0.0 或 10.0 | 可自定义（通常 -1.0） |
+| 适用场景 | 通用格式 | 特定领域、实验性功能 |
+
+---
+
+*手册版本：1.0*
+*最后更新：2026-04-28*
diff --git "a/\347\224\250\346\210\267\346\211\213\345\206\214.md" "b/\347\224\250\346\210\267\346\211\213\345\206\214.md"
new file mode 100644
index 000000000..3b7d69f5e
--- /dev/null
+++ "b/\347\224\250\346\210\267\346\211\213\345\206\214.md"
@@ -0,0 +1,1399 @@
+# MarkItDown 用户手册
+
+---
+
+## 目录
+
+1. [简介](#1-简介)
+2. [安装指南](#2-安装指南)
+3. [命令行使用](#3-命令行使用)
+4. [Python API 使用](#4-python-api-使用)
+5. [格式转换详细说明](#5-格式转换详细说明)
+6. [插件使用](#6-插件使用)
+7. [高级功能](#7-高级功能)
+8. [常见问题](#8-常见问题)
+9. [故障排除](#9-故障排除)
+
+---
+
+## 1. 简介
+
+### 1.1 什么是 MarkItDown？
+
+MarkItDown 是微软开源的轻量级 Python 工具，用于将各种文件格式转换为 Markdown。它主要面向大语言模型（LLM）和文本分析管道，转换时尽量保留文档结构（标题、列表、表格、链接等）。
+
+### 1.2 为什么使用 MarkItDown？
+
+| 特性 | 说明 |
+|------|------|
+| 结构保留 | 与其他文本提取工具不同，MarkItDown 专注于保留文档的结构信息 |
+| LLM 友好 | Markdown 是大语言模型理解得最好的格式之一，token 效率高 |
+| 格式广泛 | 支持 20+ 种文件格式，包括 PDF、Word、Excel、PowerPoint 等 |
+| 易于扩展 | 支持插件机制，可以轻松添加新的格式支持 |
+| 多输入源 | 支持本地文件、URL、流、Data URI 等多种输入方式 |
+
+### 1.3 支持的格式列表
+
+#### 文档格式
+
+| 格式 | 扩展名 | 说明 |
+|------|--------|------|
+| PDF | `.pdf` | 支持文本提取和表格识别 |
+| Word | `.docx` | 保留标题、列表、表格等结构 |
+| Excel | `.xlsx`, `.xls` | 表格转换为 Markdown 表格 |
+| PowerPoint | `.pptx` | 支持图像描述 |
+| EPUB | `.epub` | 电子书格式 |
+| Outlook | `.msg` | Outlook 邮件消息 |
+
+#### 网页与数据格式
+
+| 格式 | 扩展名/类型 | 说明 |
+|------|------------|------|
+| HTML | `.html`, `.htm` | 网页转换 |
+| Wikipedia | URL 检测 | Wikipedia 页面优化 |
+| YouTube | URL 检测 | 字幕提取 |
+| RSS | `.xml` (RSS) | 订阅源转换 |
+| Bing 搜索 | URL 检测 | 搜索结果优化 |
+| Jupyter | `.ipynb` | Notebook 转换 |
+| CSV | `.csv` | 逗号分隔值 |
+| JSON/XML | `.json`, `.xml` | 结构化数据 |
+
+#### 媒体格式
+
+| 格式 | 扩展名 | 说明 |
+|------|--------|------|
+| 图片 | `.jpg`, `.png`, `.gif`, 等 | EXIF 元数据 + OCR |
+| 音频 | `.mp3`, `.wav`, `.m4a` | EXIF 元数据 + 语音转录 |
+
+#### 其他格式
+
+| 格式 | 扩展名 | 说明 |
+|------|--------|------|
+| ZIP | `.zip` | 递归处理压缩包内文件 |
+| 纯文本 | `.txt`, 无扩展名 | 通用文本处理 |
+
+---
+
+## 2. 安装指南
+
+### 2.1 系统要求
+
+- **Python 版本**：3.10 或更高版本
+- **操作系统**：Windows、Linux、macOS
+- **可选依赖**：根据需要转换的格式安装相应的依赖
+
+### 2.2 安装方式
+
+#### 方式一：从 PyPI 安装（推荐）
+
+**安装所有可选依赖（推荐大多数用户）：**
+
+```bash
+pip install 'markitdown[all]'
+```
+
+**按需安装：**
+
+```bash
+# 仅安装核心依赖
+pip install markitdown
+
+# 安装 PDF 支持
+pip install 'markitdown[pdf]'
+
+# 安装 Office 文档支持
+pip install 'markitdown[docx,pptx,xlsx]'
+
+# 安装音频转录支持
+pip install 'markitdown[audio-transcription]'
+```
+
+#### 方式二：从源码安装
+
+```bash
+# 克隆仓库
+git clone https://github.com/microsoft/markitdown.git
+cd markitdown
+
+# 以可编辑模式安装
+pip install -e 'packages/markitdown[all]'
+```
+
+#### 方式三：使用虚拟环境（推荐）
+
+**使用标准 venv：**
+
+```bash
+# 创建虚拟环境
+python -m venv markitdown-env
+
+# 激活虚拟环境
+# Windows:
+markitdown-env\Scripts\activate
+# Linux/macOS:
+source markitdown-env/bin/activate
+
+# 安装
+pip install 'markitdown[all]'
+```
+
+**使用 conda：**
+
+```bash
+# 创建 conda 环境
+conda create -n markitdown python=3.12
+conda activate markitdown
+
+# 安装
+pip install 'markitdown[all]'
+```
+
+#### 方式四：使用 Docker
+
+```bash
+# 构建镜像
+docker build -t markitdown:latest .
+
+# 使用
+docker run --rm -i markitdown:latest < your-file.pdf > output.md
+```
+
+### 2.3 验证安装
+
+```bash
+# 检查版本
+markitdown --version
+
+# 查看帮助
+markitdown --help
+```
+
+### 2.4 可选依赖分组详解
+
+| 分组 | 包含的依赖 | 支持的功能 |
+|------|-----------|-----------|
+| `[all]` | 全部可选依赖 | 所有格式支持 |
+| `[pptx]` | python-pptx | PowerPoint 演示文稿 |
+| `[docx]` | mammoth, lxml | Word 文档 |
+| `[xlsx]` | pandas, openpyxl | Excel 2007+ |
+| `[xls]` | pandas, xlrd | Excel 97-2003 |
+| `[pdf]` | pdfminer.six, pdfplumber | PDF 文档 |
+| `[outlook]` | olefile | Outlook .msg 文件 |
+| `[audio-transcription]` | pydub, SpeechRecognition | 音频转录 |
+| `[youtube-transcription]` | youtube-transcript-api | YouTube 字幕 |
+| `[az-doc-intel]` | azure-ai-documentintelligence, azure-identity | Azure 云端 OCR |
+
+---
+
+## 3. 命令行使用
+
+### 3.1 基本命令
+
+#### 语法
+
+```bash
+markitdown [选项] [文件名]
+```
+
+如果未提供文件名，markitdown 从标准输入读取。
+
+#### 基本示例
+
+```bash
+# 转换单个文件并输出到标准输出
+markitdown document.pdf
+
+# 转换并保存到文件
+markitdown document.pdf -o output.md
+
+# 使用重定向
+markitdown document.pdf > output.md
+
+# 从标准输入读取
+cat document.pdf | markitdown
+# 或
+markitdown < document.pdf
+```
+
+### 3.2 命令行选项详解
+
+#### 输出控制
+
+| 选项 | 简写 | 说明 | 示例 |
+|------|------|------|------|
+| `--output` | `-o` | 指定输出文件 | `markitdown input.pdf -o output.md` |
+| `--keep-data-uris` | | 保留 base64 编码的图片数据 | `markitdown input.html --keep-data-uris` |
+
+#### 类型提示
+
+| 选项 | 简写 | 说明 | 示例 |
+|------|------|------|------|
+| `--extension` | `-x` | 提供文件扩展名提示 | `markitdown -x .pdf < unknown_file` |
+| `--mime-type` | `-m` | 提供 MIME 类型提示 | `markitdown -m application/pdf < stream` |
+| `--charset` | `-c` | 提供字符编码提示 | `markitdown -c gbk < chinese.txt` |
+
+#### 高级功能
+
+| 选项 | 简写 | 说明 |
+|------|------|------|
+| `--use-docintel` | `-d` | 使用 Azure Document Intelligence |
+| `--endpoint` | `-e` | 指定 Document Intelligence 端点 |
+| `--use-plugins` | `-p` | 启用第三方插件 |
+| `--list-plugins` | | 列出已安装的插件 |
+
+#### 信息查询
+
+| 选项 | 简写 | 说明 |
+|------|------|------|
+| `--version` | `-v` | 显示版本号 |
+| `--help` | `-h` | 显示帮助信息 |
+
+### 3.3 完整示例
+
+#### 示例 1：基本转换
+
+```bash
+# 转换 PDF
+markitdown report.pdf -o report.md
+
+# 转换 Word 文档
+markitdown memo.docx -o memo.md
+
+# 转换 Excel 表格
+markitdown data.xlsx -o data.md
+```
+
+#### 示例 2：从标准输入读取
+
+```bash
+# 使用管道
+curl -s https://example.com/document.pdf | markitdown -x .pdf
+
+# 使用重定向
+markitdown -x .docx < document.docx
+```
+
+#### 示例 3：使用 Azure Document Intelligence
+
+```bash
+# 使用云端 OCR 服务（适合扫描版 PDF）
+markitdown scanned.pdf -o output.md -d -e "https://<your-resource>.cognitiveservices.azure.com/"
+```
+
+#### 示例 4：使用插件
+
+```bash
+# 列出已安装的插件
+markitdown --list-plugins
+
+# 使用 OCR 插件转换包含图片的文档
+markitdown --use-plugins document_with_images.pdf -o output.md
+```
+
+#### 示例 5：批量转换
+
+```bash
+# Windows (PowerShell)
+Get-ChildItem *.pdf | ForEach-Object { markitdown $_.FullName -o ($_.BaseName + ".md") }
+
+# Linux/macOS (Bash)
+for f in *.pdf; do markitdown "$f" -o "${f%.pdf}.md"; done
+```
+
+---
+
+## 4. Python API 使用
+
+### 4.1 快速开始
+
+#### 基础转换
+
+```python
+from markitdown import MarkItDown
+
+# 创建转换器实例
+md = MarkItDown()
+
+# 转换文件
+result = md.convert("document.pdf")
+
+# 获取转换后的 Markdown 内容
+print(result.markdown)
+# 或使用 text_content (软弃用别名)
+print(result.text_content)
+```
+
+#### 结果对象
+
+`convert()` 方法返回 `DocumentConverterResult` 对象：
+
+```python
+result = md.convert("document.pdf")
+
+# 主要属性
+markdown = result.markdown        # 转换后的 Markdown 文本
+title = result.title              # 文档标题（可选）
+text_content = result.text_content  # markdown 的别名（软弃用）
+
+# 字符串表示
+print(str(result))  # 等同于 print(result.markdown)
+```
+
+### 4.2 多种输入源
+
+MarkItDown 支持多种输入源类型：
+
+#### 本地文件路径
+
+```python
+from markitdown import MarkItDown
+from pathlib import Path
+
+md = MarkItDown()
+
+# 使用字符串路径
+result = md.convert("/path/to/document.pdf")
+
+# 使用 Path 对象
+result = md.convert(Path("/path/to/document.pdf"))
+```
+
+#### URL
+
+```python
+from markitdown import MarkItDown
+
+md = MarkItDown()
+
+# 直接使用 URL
+result = md.convert("https://example.com/document.pdf")
+
+# Wikipedia 页面会自动优化
+result = md.convert("https://en.wikipedia.org/wiki/Python_(programming_language)")
+
+# YouTube 视频会提取字幕
+result = md.convert("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
+```
+
+#### HTTP 响应对象
+
+```python
+from markitdown import MarkItDown
+import requests
+
+md = MarkItDown()
+
+# 自己管理 HTTP 请求
+response = requests.get("https://example.com/document.pdf")
+response.raise_for_status()
+
+result = md.convert(response)
+```
+
+#### 二进制流
+
+```python
+from markitdown import MarkItDown
+
+md = MarkItDown()
+
+# 使用文件对象
+with open("document.pdf", "rb") as f:
+    result = md.convert(f)
+
+# 使用 BytesIO（bytes 字面量只能包含 ASCII 字符，因此这里从文件读入数据）
+from io import BytesIO
+with open("document.pdf", "rb") as f:
+    buffer = BytesIO(f.read())  # 任意已读入内存的二进制数据
+result = md.convert(buffer)
+```
+
+#### URI 类型
+
+```python
+from markitdown import MarkItDown
+from pathlib import Path
+
+md = MarkItDown()
+
+# File URI
+result = md.convert(Path("/path/to/document.pdf").as_uri())
+
+# Data URI
+import base64
+with open("document.pdf", "rb") as f:
+    data = base64.b64encode(f.read()).decode()
+result = md.convert(f"data:application/pdf;base64,{data}")
+```
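
Data URI 的结构是 `data:<MIME 类型>;base64,<数据>`。下面的草图用标准库演示它的构造与解析，便于在传给 `convert()` 之前自行检查内容；`make_data_uri` 和 `parse_data_uri` 是演示用的假设名称，且只处理 base64 形式：

```python
import base64
from typing import Tuple


def make_data_uri(data: bytes, mimetype: str) -> str:
    """把二进制数据编码为 base64 形式的 Data URI。"""
    return f"data:{mimetype};base64,{base64.b64encode(data).decode('ascii')}"


def parse_data_uri(uri: str) -> Tuple[str, bytes]:
    """解析 base64 形式的 Data URI，返回 (mimetype, 原始字节)。"""
    if not uri.startswith("data:"):
        raise ValueError("不是 Data URI")
    header, _, payload = uri[len("data:"):].partition(",")
    if not header.endswith(";base64"):
        raise ValueError("此示例只支持 base64 编码")
    # 头部可能还带有 charset 等参数，这里只取第一段作为 MIME 类型
    mimetype = header[: -len(";base64")].split(";")[0] or "text/plain"
    return mimetype, base64.b64decode(payload)


uri = make_data_uri(b"hello", "text/plain")
print(uri)                  # data:text/plain;base64,aGVsbG8=
print(parse_data_uri(uri))  # ('text/plain', b'hello')
```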
+
+### 4.3 细粒度 API
+
+为了更好的安全性和控制，MarkItDown 提供了多个细粒度的转换方法：
+
+| 方法 | 功能 | 适用场景 |
+|------|------|----------|
+| `convert_local()` | 仅转换本地文件 | 只处理本地文件，防止 URL 注入 |
+| `convert_stream()` | 仅转换流 | 完全控制输入源 |
+| `convert_uri()` | 转换 URI | 需要处理 URI 时 |
+| `convert_response()` | 转换 HTTP 响应 | 自己管理 HTTP 请求 |
+| `convert_url()` | `convert_uri()` 的别名 | 向后兼容 |
+
+#### 使用示例
+
+```python
+from markitdown import MarkItDown
+
+md = MarkItDown()
+
+# 仅处理本地文件（更安全）
+result = md.convert_local("/path/to/document.pdf")
+
+# 仅处理流
+with open("document.pdf", "rb") as f:
+    result = md.convert_stream(f)
+
+# 带提示的流转换
+from markitdown import StreamInfo
+
+with open("unknown_file", "rb") as f:
+    stream_info = StreamInfo(
+        extension=".pdf",
+        mimetype="application/pdf"
+    )
+    result = md.convert_stream(f, stream_info=stream_info)
+```
+
+### 4.4 配置选项
+
+#### LLM 集成（图像描述）
+
+```python
+from markitdown import MarkItDown
+from openai import OpenAI
+
+# 使用 OpenAI 客户端进行图像描述
+md = MarkItDown(
+    llm_client=OpenAI(),
+    llm_model="gpt-4o",
+    llm_prompt="请描述这张图片的内容"  # 可选自定义提示
+)
+
+result = md.convert("presentation.pptx")
+```
+
+#### 使用 Azure OpenAI
+
+```python
+from markitdown import MarkItDown
+from openai import AzureOpenAI
+
+md = MarkItDown(
+    llm_client=AzureOpenAI(
+        api_key="your-api-key",
+        azure_endpoint="https://your-resource.openai.azure.com/",
+        api_version="2024-02-01",
+    ),
+    llm_model="gpt-4o",
+)
+```
+
+#### Azure Document Intelligence
+
+```python
+from markitdown import MarkItDown
+
+# 使用云端 OCR 服务
+md = MarkItDown(
+    docintel_endpoint="https://<your-resource>.cognitiveservices.azure.com/",
+    # 可选：自定义凭据
+    # docintel_credential=your_credential,
+    # 可选：指定处理的文件类型
+    # docintel_file_types=["pdf", "docx"],
+    # 可选：API 版本
+    # docintel_api_version="2024-02-29-preview",
+)
+
+result = md.convert("scanned_document.pdf")
+```
+
+#### 自定义 HTTP 会话
+
+```python
+from markitdown import MarkItDown
+import requests
+
+# 使用自定义的 requests 会话
+session = requests.Session()
+session.headers.update({"User-Agent": "MyCustomAgent/1.0"})
+session.auth = ("user", "password")
+
+md = MarkItDown(requests_session=session)
+```
+
+### 4.5 插件管理
+
+```python
+from markitdown import MarkItDown
+from openai import OpenAI
+
+# 启用插件
+md = MarkItDown(enable_plugins=True)
+
+# 或在初始化后启用
+md = MarkItDown(enable_plugins=False)
+md.enable_plugins(llm_client=OpenAI(), llm_model="gpt-4o")
+```
+
+### 4.6 完整示例
+
+#### 示例 1：批量转换文件夹
+
+```python
+from markitdown import MarkItDown
+from pathlib import Path
+
+md = MarkItDown()
+
+input_dir = Path("documents")
+output_dir = Path("output")
+output_dir.mkdir(exist_ok=True)
+
+# 支持的扩展名
+supported = {".pdf", ".docx", ".xlsx", ".pptx"}
+
+for file_path in input_dir.iterdir():
+    if file_path.suffix.lower() in supported:
+        print(f"正在转换: {file_path.name}")
+        result = md.convert(file_path)
+        
+        output_path = output_dir / (file_path.stem + ".md")
+        output_path.write_text(result.markdown, encoding="utf-8")
+        
+        print(f"已保存: {output_path}")
+```
+
+#### 示例 2：处理 URL 列表
+
+```python
+from markitdown import MarkItDown
+from pathlib import PurePosixPath
+from urllib.parse import urlparse
+import time
+
+md = MarkItDown()
+
+urls = [
+    "https://example.com/report.pdf",
+    "https://example.com/presentation.pptx",
+    # ...
+]
+
+for url in urls:
+    try:
+        result = md.convert(url)
+        # 保存结果：只取 URL 路径中的文件名，忽略查询参数和片段
+        stem = PurePosixPath(urlparse(url).path).stem or "output"
+        filename = stem + ".md"
+        with open(filename, "w", encoding="utf-8") as f:
+            f.write(result.markdown)
+        print(f"成功: {url}")
+    except Exception as e:
+        print(f"失败 {url}: {e}")
+    
+    time.sleep(1)  # 避免请求过快
+```
+
+#### 示例 3：错误处理
+
+```python
+from markitdown import MarkItDown
+from markitdown._exceptions import (
+    UnsupportedFormatException,
+    FileConversionException,
+    MissingDependencyException
+)
+
+md = MarkItDown()
+
+try:
+    result = md.convert("document.pdf")
+except UnsupportedFormatException:
+    print("不支持的文件格式")
+except MissingDependencyException as e:
+    print(f"缺少依赖: {e}")
+    print("请安装对应的可选依赖，例如: pip install 'markitdown[all]'")
+except FileConversionException as e:
+    print(f"转换失败: {e}")
+    # 查看详细错误（attempts 可能为 None）
+    for attempt in e.attempts or []:
+        print(f"转换器: {type(attempt.converter).__name__}")
+        print(f"错误: {attempt.exc_info}")
+```
+
+---
+
+## 5. 格式转换详细说明
+
+### 5.1 PDF 转换
+
+#### 特性
+
+- **文本提取**：使用 pdfminer.six 提取文本
+- **表格识别**：使用 pdfplumber 识别和转换表格
+- **表单检测**：智能检测无框表单
+- **MasterFormat 支持**：处理建筑行业文档的特殊编号
+
+#### 工作原理
+
+```
+PDF 文件
+    │
+    ├──► 逐页分析 (pdfplumber)
+    │         │
+    │         ├──► 检测表格/表单布局
+    │         │         │
+    │         │         ├──► 是 → 转换为 Markdown 表格
+    │         │         │
+    │         │         └──► 否 → 使用 pdfminer 提取文本
+    │         │
+    │         └──► 页处理后立即释放内存
+    │
+    └──► 合并所有页面
+              │
+              └──► 后处理：MasterFormat 编号合并
+```
+
+#### 示例
+
+```python
+from markitdown import MarkItDown
+
+md = MarkItDown()
+
+# 标准 PDF
+result = md.convert("report.pdf")
+
+# 包含表格的 PDF
+result = md.convert("invoice.pdf")
+# 表格会自动转换为 Markdown 表格格式
+
+# 扫描版 PDF（需要 Azure Document Intelligence）
+md_azure = MarkItDown(docintel_endpoint="https://...")
+result = md_azure.convert("scanned.pdf")
+```
+
+### 5.2 Word 文档转换
+
+#### 特性
+
+- 使用 mammoth 库转换为 HTML，再转为 Markdown
+- 支持自定义样式映射
+- 保留标题、列表、表格等结构
+
+#### 自定义样式映射
+
+```python
+from markitdown import MarkItDown
+
+# 使用自定义样式映射
+style_map = """
+    p[style-name='Heading 1'] => h1:fresh
+    p[style-name='Heading 2'] => h2:fresh
+    p[style-name='Quote'] => blockquote
+"""
+
+md = MarkItDown(style_map=style_map)
+result = md.convert("document.docx")
+```
+
+### 5.3 Excel 转换
+
+#### 特性
+
+- 使用 pandas 读取 Excel 文件
+- 每个工作表转换为单独的 Markdown 部分
+- 表格自动对齐
+
+#### 输出结构
+
+```markdown
+## Sheet1
+
+| 列1 | 列2 | 列3 |
+|-----|-----|-----|
+| A   | B   | C   |
+| D   | E   | F   |
+
+## Sheet2
+
+...
+```
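
上面的输出结构也可以手工拼出来，便于理解每个工作表是如何对应到一个二级标题加一张表格的。下面的草图只用标准库，不涉及 pandas；`rows_to_markdown` 和 `sheets_to_markdown` 为演示用的假设名称，与 MarkItDown 的实际转换路径无关：

```python
from typing import Dict, List, Sequence, Tuple


def rows_to_markdown(header: Sequence[str], rows: List[Sequence[object]]) -> str:
    """把表头和数据行拼成一张 Markdown 表格。"""

    def fmt(cells: Sequence[object]) -> str:
        # 单元格中的竖线需要转义，否则会被解析为列分隔符
        return "| " + " | ".join(str(c).replace("|", "\\|") for c in cells) + " |"

    lines = [fmt(header), "| " + " | ".join("---" for _ in header) + " |"]
    lines += [fmt(r) for r in rows]
    return "\n".join(lines)


def sheets_to_markdown(sheets: Dict[str, Tuple[Sequence[str], List[Sequence[object]]]]) -> str:
    """每个工作表输出为 '## 表名' 加一张表格，工作表之间空一行。"""
    parts = [f"## {name}\n\n{rows_to_markdown(h, rows)}" for name, (h, rows) in sheets.items()]
    return "\n\n".join(parts)


print(sheets_to_markdown({"Sheet1": (["列1", "列2"], [["A", "B"], ["D", "E"]])}))
```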
+
+### 5.4 PowerPoint 转换
+
+#### 特性
+
+- 提取幻灯片文本
+- 支持使用 LLM 进行图像描述
+- 表格转换为 Markdown 格式
+
+#### 带图像描述的转换
+
+```python
+from markitdown import MarkItDown
+from openai import OpenAI
+
+md = MarkItDown(
+    llm_client=OpenAI(),
+    llm_model="gpt-4o"
+)
+
+result = md.convert("presentation.pptx")
+# 图像会被描述并插入到 Markdown 中
+```
+
+### 5.5 HTML 转换
+
+#### 特性
+
+- 使用 markdownify 库转换
+- 保留标题、列表、表格、链接
+- 转换前移除 `<script>` 和 `<style>` 标签
+
+#### 特殊优化
+
+- **Wikipedia**：自动检测并优化 Wikipedia 页面
+- **Bing 搜索**：优化搜索结果页面
+- **YouTube**：提取视频字幕
+
+### 5.6 音频转换
+
+#### 特性
+
+- 提取 EXIF 元数据
+- 支持语音转录（需要 SpeechRecognition）
+
+#### 注意
+
+音频转录功能需要安装 `pydub` 和 `SpeechRecognition`：
+
+```bash
+pip install 'markitdown[audio-transcription]'
+```
+
+### 5.7 ZIP 文件处理
+
+#### 特性
+
+- 递归处理 ZIP 包内的文件
+- 使用相同的转换器集合处理内部文件
+- 输出包含每个文件的内容
+
+#### 示例
+
+```python
+from markitdown import MarkItDown
+
+md = MarkItDown()
+result = md.convert("documents.zip")
+# 输出包含 ZIP 内所有支持的文件的转换结果
+```
+
+---
+
+## 6. 插件使用
+
+### 6.1 插件概述
+
+MarkItDown 支持通过插件机制扩展功能。插件默认禁用，需要显式启用。
+
+### 6.2 插件管理
+
+#### 查看已安装的插件
+
+**命令行：**
+```bash
+markitdown --list-plugins
+```
+
+**Python：**
+```python
+from importlib.metadata import entry_points
+
+plugins = list(entry_points(group="markitdown.plugin"))
+for p in plugins:
+    print(f"{p.name}: {p.value}")
+```
+
+#### 启用插件
+
+**命令行：**
+```bash
+markitdown --use-plugins document.pdf
+```
+
+**Python：**
+```python
+from markitdown import MarkItDown
+
+# 初始化时启用
+md = MarkItDown(enable_plugins=True)
+
+# 或之后启用
+md = MarkItDown()
+md.enable_plugins()
+```
+
+### 6.3 MarkItDown OCR 插件
+
+这是一个官方提供的插件，用于从文档中的图像提取文本。
+
+#### 安装
+
+```bash
+pip install markitdown-ocr
+pip install openai  # 或其他 OpenAI 兼容客户端
+```
+
+#### 使用
+
+**命令行：**
+```bash
+markitdown --use-plugins document_with_images.pdf -o output.md
+```
+
+**Python：**
+```python
+from markitdown import MarkItDown
+from openai import OpenAI
+
+md = MarkItDown(
+    enable_plugins=True,
+    llm_client=OpenAI(),
+    llm_model="gpt-4o",
+)
+
+result = md.convert("document_with_images.pdf")
+print(result.markdown)
+```
+
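+#### Supported formats
+
+| Format | Features |
+|--------|----------|
+| PDF | OCR of embedded images; falls back to full-page OCR for scanned PDFs |
+| DOCX | Image OCR; document structure is preserved |
+| PPTX | Image OCR; falls back to image description |
+| XLSX | Image OCR, listed per worksheet |
+
+#### Output format
+
+Text extracted by OCR is wrapped in dedicated markers:
+
+```markdown
+*[Image OCR]
+Extracted text...
+[End OCR]*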
+```
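+
+If you need the OCR text on its own, the markers make it straightforward to pull out. The following is a rough sketch that assumes the marker layout is exactly as shown above; `extract_ocr_text` is a hypothetical helper, not part of the plugin.
+
+```python
+import re
+
+# Matches one OCR block, assuming the marker layout shown above
+OCR_BLOCK = re.compile(r"\*\[Image OCR\]\n(.*?)\n\[End OCR\]\*", re.DOTALL)
+
+
+def extract_ocr_text(markdown_text):
+    """Return the text of every OCR block, in document order."""
+    return [block.strip() for block in OCR_BLOCK.findall(markdown_text)]
+
+
+# Hand-written sample, not real plugin output
+sample = "Intro\n\n*[Image OCR]\nInvoice #42\nTotal: 10\n[End OCR]*\n\nOutro\n"
+blocks = extract_ocr_text(sample)
+print(blocks)
+```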
+
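+### 6.4 Finding more plugins
+
+Search GitHub for the hashtag `#markitdown-plugin` to find community-developed plugins.
+
+---
+
+## 7. Advanced features
+
+### 7.1 StreamInfo metadata
+
+`StreamInfo` carries metadata about a stream so that converters can handle the input correctly.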
+
+```python
+from markitdown import StreamInfo
+
+# Create a StreamInfo object
+info = StreamInfo(
+    mimetype="application/pdf",
+    extension=".pdf",
+    charset="utf-8",
+    filename="document.pdf",
+    local_path="/path/to/document.pdf",
+    url="https://example.com/document.pdf"
+)
+
+# Copy and update (StreamInfo follows an immutable-object pattern)
+new_info = info.copy_and_update(
+    extension=".docx",
+    mimetype="application/vnd.openxmlformats-officedocument.wordprocessingml.document"
+)
+```
+
+### 7.2 Data URI handling
+
+By default, data URIs (such as base64-encoded images) are truncated. You can opt to keep them:
+
+**Command line:**
+```bash
+markitdown --keep-data-uris input.html
+```
+
+**Python:**
+```python
+# `md` is an existing MarkItDown instance
+result = md.convert("input.html", keep_data_uris=True)
+```
+
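+### 7.3 Custom request headers
+
+Pass a preconfigured `requests.Session` to control the headers sent when fetching URLs: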
+
+```python
+from markitdown import MarkItDown
+import requests
+
+session = requests.Session()
+session.headers.update({
+    "User-Agent": "MyApp/1.0",
+    "Authorization": "Bearer token123"
+})
+
+md = MarkItDown(requests_session=session)
+result = md.convert("https://private.example.com/document.pdf")
+```
+
+---
+
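+## 8. FAQ
+
+### Q1: Which Python versions does MarkItDown support?
+
+**A:** MarkItDown requires Python 3.10 or later. Both the CPython and PyPy implementations are supported.
+
+### Q2: Why are tables not recognized correctly when converting a PDF?
+
+**A:** Standard PDF conversion uses pdfplumber for table detection, which works in most cases. If tables are not recognized correctly, try the following:
+
+1. **Check the PDF type**: scanned PDFs have no text layer and need OCR (see Q3)
+2. **Check the table structure**: borderless tables may need special handling
+3. **Update dependencies**: make sure pdfplumber and pdfminer.six are up to date
+
+### Q3: How do I handle scanned PDFs?
+
+**A:** Scanned PDFs (images only, no text layer) need an OCR service. The options are:
+
+1. **Azure Document Intelligence** (recommended):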
+   ```python
+   md = MarkItDown(docintel_endpoint="https://...")
+   ```
+
+2. **MarkItDown OCR plugin**:
+   ```python
+   md = MarkItDown(enable_plugins=True, llm_client=..., llm_model=...)
+   ```
+
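+### Q4: What if I run out of memory converting large files?
+
+**A:** MarkItDown is designed with memory efficiency in mind:
+
+- PDFs are processed page by page, and each page is released as soon as it has been processed
+- Non-seekable streams are read fully into memory, which is necessary
+
+If you still run into problems:
+
+1. **Add more available memory**
+2. **Process in batches**: split large files into smaller ones
+3. **Use 64-bit Python**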
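+
+One way to read the batch-processing suggestion is to convert the pieces one at a time and write each result to disk before moving on. The sketch below assumes the large file has already been split into parts; `convert_one_by_one` and its `convert` callable are hypothetical names, and the demo uses a stand-in converter so it runs without MarkItDown installed.
+
+```python
+import tempfile
+from pathlib import Path
+
+
+def convert_one_by_one(paths, convert, out_dir):
+    """Convert each path in turn and write the result to disk immediately."""
+    out_dir = Path(out_dir)
+    out_dir.mkdir(parents=True, exist_ok=True)
+    written = []
+    for path in map(Path, paths):
+        text = convert(path)  # only one converted document is held at a time
+        target = out_dir / (path.stem + ".md")
+        target.write_text(text, encoding="utf-8")
+        written.append(target)
+    return written
+
+
+# Demo with a stand-in converter; swap in a real one for actual use
+demo_dir = tempfile.mkdtemp()
+outputs = convert_one_by_one(
+    ["part1.pdf", "part2.pdf"], lambda p: f"# {p.name}\n", demo_dir
+)
+print([p.name for p in outputs])
+```
+
+With MarkItDown, the `convert` argument could be something like `lambda p: md.convert(str(p)).markdown`, where `md` is a `MarkItDown` instance.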
+
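+### Q5: Why do some conversions fail with a missing-dependency error?
+
+**A:** MarkItDown keeps its core lightweight by making format-specific dependencies optional. If you hit a `MissingDependencyException`, install the corresponding dependency: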
+
+```bash
+# Check the feature field in the exception message
+# For example, if the feature is "pdf"
+pip install 'markitdown[pdf]'
+
+# Or install every optional dependency
+pip install 'markitdown[all]'
+```
+
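+### Q6: How do I add support for a new format?
+
+**A:** There are two ways:
+
+1. **Write a plugin** (recommended for third-party extensions):
+   - Use the `markitdown-sample-plugin` example as a reference
+   - Subclass the `DocumentConverter` base class
+   - Implement the `register_converters` function
+
+2. **Modify the code directly** (for custom needs):
+   - Add the new converter under `packages/markitdown/src/markitdown/converters/`
+   - Export it in `__init__.py`
+   - Register it in `_markitdown.py`
+
+### Q7: How does MarkItDown differ from textract?
+
+| Feature | MarkItDown | textract |
+|---------|------------|----------|
+| Output format | Markdown (structured) | Plain text |
+| Structure preservation | Keeps headings, lists, tables, and so on | Text extraction only |
+| Design goal | LLM input, text analysis | General-purpose text extraction |
+| Extension mechanism | Plugin system | Limited |
+| Backed by Microsoft | Yes | No |
+
+### Q8: Can the conversion result be processed further?
+
+**A:** Yes. The result is standard Markdown text, so any Markdown tooling can process it further: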
+
+```python
+from markitdown import MarkItDown
+import markdown
+
+md = MarkItDown()
+result = md.convert("document.pdf")
+
+# Convert to HTML
+html = markdown.markdown(result.markdown)
+
+# Or use another Markdown library
+```
+
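+### Q9: How do I deal with encoding problems?
+
+**A:** MarkItDown uses `charset-normalizer` to detect the encoding automatically. If detection fails, specify the encoding manually:
+
+**Command line:**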
+```bash
+markitdown -c gbk chinese.txt
+```
+
+**Python:**
+```python
+from markitdown import MarkItDown, StreamInfo
+
+md = MarkItDown()
+with open("chinese.txt", "rb") as f:
+    result = md.convert_stream(
+        f,
+        stream_info=StreamInfo(charset="gbk")
+    )
+```
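+
+When you are unsure which encoding to specify, one simple heuristic is to try a short list of likely candidates in order. The sketch below uses only the standard library and is not how MarkItDown or `charset-normalizer` detect encodings; `pick_charset` and the candidate list are assumptions for illustration. Order matters, because permissive encodings such as GBK accept many byte sequences.
+
+```python
+def pick_charset(data, candidates=("utf-8", "gbk", "big5")):
+    """Return the first candidate that decodes `data` without error, else None."""
+    for name in candidates:
+        try:
+            data.decode(name)
+        except UnicodeDecodeError:
+            continue
+        return name
+    return None
+
+
+# GBK-encoded bytes are not valid UTF-8, so the second candidate is chosen
+sample = "中文文档".encode("gbk")
+charset = pick_charset(sample)
+print(charset)
+```
+
+The chosen name can then be passed as `StreamInfo(charset=...)` or to the `-c` option.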
+
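+### Q10: What should I watch out for in terms of security?
+
+**A:** See the [Security considerations](#安全考虑) section. Key points:
+
+1. **Input validation**: always validate input in untrusted environments
+2. **API choice**: use the narrowest API that fits (for example `convert_local` rather than `convert`)
+3. **Path restrictions**: restrict which file paths and network destinations can be reached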
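+
+As an illustration of the path-restriction point, the sketch below resolves a user-supplied path and rejects it if it falls outside an allowed directory. `resolve_inside` and the `/srv/uploads` directory are hypothetical; this is a minimal example, not a complete sandbox.
+
+```python
+from pathlib import Path
+
+
+def resolve_inside(base_dir, user_path):
+    """Resolve user_path under base_dir; raise ValueError if it escapes."""
+    base = Path(base_dir).resolve()
+    candidate = (base / user_path).resolve()
+    if not candidate.is_relative_to(base):
+        raise ValueError(f"path escapes {base}: {user_path}")
+    return candidate
+
+
+# A plain file name stays inside the allowed directory
+safe = resolve_inside("/srv/uploads", "report.pdf")
+
+# A traversal attempt is rejected
+try:
+    resolve_inside("/srv/uploads", "../../etc/passwd")
+    rejected = False
+except ValueError:
+    rejected = True
+print(safe.name, rejected)
+```
+
+The validated path can then be handed to the narrow API, for example `md.convert_local(str(safe))`.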
+
+---
+
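+## 9. Troubleshooting
+
+### 9.1 Installation problems
+
+#### Problem 1: pip install fails
+
+**Symptom:** Installation fails with errors about dependency conflicts or failed compilation.
+
+**Solutions:**
+
+1. **Use a virtual environment:**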
+   ```bash
+   python -m venv .venv
+   # Windows
+   .venv\Scripts\activate
+   # Linux/macOS
+   source .venv/bin/activate
+   pip install 'markitdown[all]'
+   ```
+
+2. **Upgrade pip and setuptools:**
+   ```bash
+   pip install --upgrade pip setuptools
+   ```
+
+3. **Check the Python version:**
+   ```bash
+   python --version
+   # Make sure it is 3.10 or later
+   ```
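+
+The same version check can be done from inside Python, which is handy in setup scripts. This is a small sketch; `python_is_supported` is a hypothetical helper and the 3.10 minimum comes from the requirement stated above.
+
+```python
+import sys
+
+MIN_VERSION = (3, 10)
+
+
+def python_is_supported(version_info=sys.version_info):
+    """Return True if the major.minor version meets the minimum."""
+    return tuple(version_info[:2]) >= MIN_VERSION
+
+
+if not python_is_supported():
+    print("MarkItDown needs Python 3.10+, found", sys.version.split()[0])
+```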
+
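+#### Problem 2: Some optional dependencies fail to install
+
+**Symptom:** The dependencies for a particular format cannot be installed.
+
+**Solutions:**
+
+1. **Install system dependencies (Linux):**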
+   ```bash
+   # Ubuntu/Debian
+   sudo apt-get install -y libxml2-dev libxslt1-dev
+   ```
+
+2. **Use prebuilt wheels:**
+   ```bash
+   # Install from prebuilt wheels only, never building from source
+   pip install --only-binary :all: markitdown
+   ```
+
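+### 9.2 Runtime problems
+
+#### Problem 1: "No module named '...'"
+
+**Symptom:** A missing-module error at runtime.
+
+**Solution:**
+
+This usually means an optional dependency is missing. Install the matching extra based on the error message: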
+
+```bash
+# If PDF-related dependencies are missing
+pip install 'markitdown[pdf]'
+
+# Or install every optional dependency
+pip install 'markitdown[all]'
+```
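+
+To check which modules are importable before running a conversion, something like the following can help. It is a sketch using only the standard library; `missing_modules` is a hypothetical helper, and the module names `pdfminer` and `mammoth` are illustrative assumptions about what the extras install. Pass top-level module names only.
+
+```python
+from importlib.util import find_spec
+
+
+def missing_modules(module_names):
+    """Return the top-level module names that cannot be imported."""
+    return [name for name in module_names if find_spec(name) is None]
+
+
+# "json" is always available; the other names are illustrative assumptions
+missing = missing_modules(["json", "pdfminer", "mammoth"])
+print("not importable:", missing or "none")
+```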
+
+#### Problem 2: File conversion fails
+
+**Symptom:** An exception is raised when converting a particular file.
+
+**Troubleshooting steps:**
+
+1. **Check whether the file is corrupted:** try opening it with another tool.
+
+2. **Inspect the detailed error information:**
+   ```python
+   from markitdown import FileConversionException, MarkItDown
+
+   md = MarkItDown()
+   try:
+       result = md.convert("problematic.pdf")
+   except FileConversionException as e:
+       print("Conversion failed; details:")
+       # e.attempts records each converter that was tried and how it failed
+       for attempt in e.attempts or []:
+           print(f"\nConverter: {type(attempt.converter).__name__}")
+           print(f"Exception type: {attempt.exc_info[0]}")
+           print(f"Exception message: {attempt.exc_info[1]}")
+   ```
+
+3. **Check whether OCR is needed:**
+   - Scanned PDFs need Azure Document Intelligence or the OCR plugin
+
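+#### Problem 3: PDF tables are recognized incorrectly
+
+**Symptom:** Tables in a PDF are converted to plain text.
+
+**Solutions:**
+
+1. **Confirm the PDF is not scanned:**
+   - Try copying text from the PDF; if you cannot, it is a scanned PDF
+
+2. **Update dependencies:**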
+   ```bash
+   pip install --upgrade pdfplumber pdfminer.six
+   ```
+
+3. **Use Azure Document Intelligence:**
+   ```python
+   md = MarkItDown(docintel_endpoint="https://...")
+   ```
+
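+#### Problem 4: Garbled Chinese text
+
+**Symptom:** Garbled characters (mojibake) appear when converting documents that contain Chinese.
+
+**Solutions:**
+
+1. **Specify the encoding manually:**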
+   ```bash
+   markitdown -c gbk chinese.txt
+   ```
+
+   Or in Python:
+   ```python
+   from markitdown import MarkItDown, StreamInfo
+
+   md = MarkItDown()
+   with open("chinese.txt", "rb") as f:
+       result = md.convert_stream(
+           f,
+           stream_info=StreamInfo(charset="gbk")
+       )
+   ```
+
+2. **Check the terminal encoding (Windows):**
+   - Make sure the terminal uses the correct encoding
+   - Write to a file instead of standard output:
+     ```bash
+     markitdown document.docx -o output.md
+     ```
+
+#### Problem 5: URL conversion fails
+
+**Symptom:** A network or permission error occurs when converting a URL.
+
+**Solutions:**
+
+1. **Check network connectivity:**
+   ```bash
+   ping example.com
+   ```
+
+2. **Check that the URL is reachable:**
+   ```bash
+   curl -I https://example.com/document.pdf
+   ```
+
+3. **Use a proxy:**
+   ```python
+   from markitdown import MarkItDown
+   import requests
+
+   session = requests.Session()
+   session.proxies = {
+       "http": "http://proxy:8080",
+       "https": "http://proxy:8080",
+   }
+
+   md = MarkItDown(requests_session=session)
+   ```
+
+4. **Add authentication:**
+   ```python
+   import requests
+   from markitdown import MarkItDown
+
+   session = requests.Session()
+   session.auth = ("username", "password")
+   # Or send a bearer token instead
+   session.headers["Authorization"] = "Bearer token"
+
+   md = MarkItDown(requests_session=session)
+   ```
+
+### 9.3 Plugin problems
+
+#### Problem 1: Plugin is not detected
+
+**Symptom:** `markitdown --list-plugins` does not show an installed plugin.
+
+**Solutions:**
+
+1. **Confirm the plugin is installed correctly:**
+   ```bash
+   pip list | grep markitdown
+   ```
+
+2. **Check the install location:**
+   ```bash
+   pip show markitdown-ocr
+   ```
+
+3. **Reinstall:**
+   ```bash
+   pip uninstall markitdown-ocr
+   pip install markitdown-ocr
+   ```
+
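+#### Problem 2: Plugin fails to load
+
+**Symptom:** Warnings or errors appear when plugins are enabled.
+
+**Solutions:**
+
+1. **Check the plugin's dependencies:**
+   ```bash
+   # markitdown-ocr needs the openai client
+   pip install openai
+   ```
+
+2. **Inspect the detailed error:**
+   Errors raised while a plugin loads are emitted as warnings; check the console output.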
+
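+### 9.4 Performance problems
+
+#### Problem 1: Large files convert slowly
+
+**Symptom:** Converting a large PDF or other large file takes too long.
+
+**Analysis:**
+
+- PDFs are converted page by page, so more pages take longer
+- Table detection requires extra computation
+- Network-backed conversion (URLs, Azure) depends on the network
+
+**Suggestions:**
+
+1. **Decide whether you really need to process the whole file**
+2. **Process in batches**: split large files into smaller ones
+3. **Use faster storage**: an SSD is faster than an HDD
+4. **Use Azure Document Intelligence**: cloud processing may be faster (depending on the network)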
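+
+Before choosing among these options, it can help to measure how long a conversion actually takes. The sketch below is a generic timing wrapper built on the standard library; `timed` is a hypothetical helper, and the demo times a built-in function so it runs without MarkItDown installed.
+
+```python
+import time
+
+
+def timed(func, *args, **kwargs):
+    """Call func and return (result, elapsed_seconds)."""
+    start = time.perf_counter()
+    result = func(*args, **kwargs)
+    return result, time.perf_counter() - start
+
+
+# Demo with a built-in function standing in for a conversion call
+result, seconds = timed(sum, range(1000))
+print(f"took {seconds:.6f}s")
+```
+
+For a real measurement, the call could look like `timed(md.convert, "large.pdf")`, where `md` is a `MarkItDown` instance.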
+
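+#### Problem 2: Memory usage is too high
+
+**Symptom:** Memory consumption is excessive when converting large files.
+
+**Suggestions:**
+
+1. **Use the latest version**: MarkItDown's memory usage is continually being optimized
+2. **Avoid converting several large files at the same time**
+3. **Use 64-bit Python**: it can address more memory
+4. **Add swap space**: as a temporary workaround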
+
+### 9.5 Debugging tips
+
+#### Enabling verbose output
+
+MarkItDown has no built-in verbose mode, but you can debug with the following approach:
+
+```python
+import logging
+logging.basicConfig(level=logging.DEBUG)
+
+from markitdown import MarkItDown
+
+md = MarkItDown()
+result = md.convert("test.pdf")
+```
+
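+#### Inspecting the converter list
+
+```python
+from markitdown import MarkItDown
+
+md = MarkItDown()
+
+# Note: `_converters` is a private attribute and may change between releases
+print("Registered converters (sorted by priority):")
+for reg in sorted(md._converters, key=lambda x: x.priority):
+    print(f"  - {type(reg.converter).__name__} (priority: {reg.priority})")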
+```
+
+#### Testing a specific converter
+
+```python
+from markitdown import StreamInfo
+from markitdown.converters import PdfConverter
+
+converter = PdfConverter()
+
+# Check whether the converter accepts the stream
+with open("test.pdf", "rb") as f:
+    info = StreamInfo(extension=".pdf")
+    accepts = converter.accepts(f, info)
+    print(f"PdfConverter accepts this file: {accepts}")
+```
+
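+### 9.6 Getting more help
+
+If none of the solutions above resolves the problem, you can:
+
+1. **Browse GitHub Issues**: https://github.com/microsoft/markitdown/issues
+2. **Search existing issues**: someone may already have hit and solved the same problem
+3. **Open a new issue** with the following information:
+   - MarkItDown version
+   - Python version
+   - Operating system
+   - The full error message
+   - A minimal code example that reproduces the problem
+   - The relevant files (if possible)
+
+---
+
+*Manual version: 1.0*
+*Last updated: 2026-04-28*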