Description
The PaperLearner extracts knowledge from research papers (PDFs) and adds it to the wiki. It currently has only a placeholder implementation.
Location: src/knowledge/learners/paper_learner.py
Goal
Implement the full paper learning pipeline that:
- Loads PDF (local or from URL like arXiv)
- Parses sections and extracts knowledge
- Creates wiki pages from extracted knowledge
What to Extract
- Abstract and introduction → Overview/concept pages
- Methodology sections → Workflow pages
- Key findings and conclusions → Principle pages
- Formulas and algorithms → Implementation pages
- References → Related pages links
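The mapping above could be captured as a small lookup table. This is only a sketch: `PAGE_TYPE_KEYWORDS`, `page_type_for`, and the page-type strings are hypothetical names chosen for illustration, not identifiers from the existing codebase.

```python
from typing import Optional

# Hypothetical keyword -> page-type table mirroring the "What to Extract" list.
# The page-type strings are illustrative; the real wiki may name them differently.
PAGE_TYPE_KEYWORDS = [
    (("abstract", "introduction"), "overview"),
    (("method",), "workflow"),
    (("finding", "conclusion"), "principle"),
    (("formula", "algorithm"), "implementation"),
    (("reference",), "related"),
]


def page_type_for(section_title: str) -> Optional[str]:
    """Return the page type for a section title, or None if nothing matches."""
    title = section_title.lower()
    for keywords, page_type in PAGE_TYPE_KEYWORDS:
        if any(keyword in title for keyword in keywords):
            return page_type
    return None
```

Returning `None` for unmatched sections leaves the caller free to skip them or fall back to a default page type.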
Input Format
{"path": "./paper.pdf"} # Local file
# or
{"url": "https://arxiv.org/pdf/..."} # Remote URL
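A minimal sketch of validating this input before loading, assuming the learner receives a plain dict with exactly one of the two keys. `resolve_source` is a hypothetical helper, not part of the existing code.

```python
from pathlib import Path
from typing import Tuple
from urllib.parse import urlparse


def resolve_source(spec: dict) -> Tuple[str, str]:
    """Classify an input spec as ("path", <local path>) or ("url", <url>).

    Hypothetical helper: it assumes exactly one of "path" or "url" is given.
    """
    has_path = "path" in spec
    has_url = "url" in spec
    if has_path == has_url:
        raise ValueError("spec must contain exactly one of 'path' or 'url'")
    if has_path:
        return ("path", str(Path(spec["path"]).expanduser()))
    parsed = urlparse(spec["url"])
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"unsupported URL: {spec['url']!r}")
    return ("url", spec["url"])
```

Rejecting ambiguous or malformed specs up front keeps the download and parsing steps free of input checks.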
Implementation Steps
- Load PDF (local or download from URL)
- Extract text using PyPDF2 or pdfplumber
- Identify sections (Abstract, Methods, Results, etc.)
- Extract formulas using OCR if needed
- Create structured chunks per section
- Convert chunks to wiki pages
- Index into knowledge graph
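The section-identification and chunking steps could be sketched as follows, assuming the PDF text has already been extracted to a single string (for example with PyPDF2 or pdfplumber). `HEADING_RE` and `split_sections` are hypothetical names, and the heading list is a guess at common section titles; real papers will need a more tolerant heuristic.

```python
import re
from typing import List, Tuple

# Matches a line consisting only of an optionally numbered, well-known heading,
# e.g. "Abstract", "1. Introduction", "2 Methods". The heading list is an assumption.
HEADING_RE = re.compile(
    r"^\s*(?:\d+(?:\.\d+)*\.?\s+)?"
    r"(abstract|introduction|related work|background|methods?|methodology|"
    r"experiments?|results|discussion|conclusions?|references)\s*$",
    re.IGNORECASE,
)


def _flush(sections: List[Tuple[str, str]], title: str, body: List[str]) -> None:
    """Append (title, text) if the accumulated body has any content."""
    content = "\n".join(body).strip()
    if content:
        sections.append((title, content))


def split_sections(text: str) -> List[Tuple[str, str]]:
    """Split extracted paper text into (section title, section text) chunks."""
    sections: List[Tuple[str, str]] = []
    title, body = "Preamble", []
    for line in text.splitlines():
        match = HEADING_RE.match(line)
        if match:
            # A new heading closes the previous section.
            _flush(sections, title, body)
            title, body = match.group(1).title(), []
        else:
            body.append(line)
    _flush(sections, title, body)
    return sections
```

Each `(title, text)` pair could then be wrapped in a `KnowledgeChunk` from `base.py` and converted to a wiki page; that conversion is omitted here because it depends on the actual `KnowledgeChunk` fields.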
Deliverable
- Full implementation of the PaperLearner.learn() method
- PDF parsing and section extraction
- Wiki page generation from paper content
Test
- Call expert.learn(Source.Paper("./sample_paper.pdf"))
- Verify wiki pages created with paper sections
- Verify formulas and algorithms extracted correctly
References
src/knowledge/learners/paper_learner.py - Current placeholder
src/knowledge/learners/base.py - Base class and KnowledgeChunk
src/knowledge/wiki_structure/extension_of_product.md - Wiki page structure
Dependencies