Implement Paper Learner #8

@alirezamshi

Description

The PaperLearner extracts knowledge from research papers (PDFs) and adds it to the wiki. It currently has only a placeholder implementation.

Location: src/knowledge/learners/paper_learner.py

Goal

Implement the full paper learning pipeline that:

  1. Loads a PDF (a local file or a remote URL, e.g. arXiv)
  2. Parses sections and extracts knowledge
  3. Creates wiki pages from extracted knowledge

What to Extract

  • Abstract and introduction → Overview/concept pages
  • Methodology sections → Workflow pages
  • Key findings and conclusions → Principle pages
  • Formulas and algorithms → Implementation pages
  • References → Related pages links
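
The mapping above could be encoded as a small lookup, sketched here under assumptions: the page-type strings ("overview", "workflow", and so on) and the function name `page_type_for` are hypothetical and not taken from the codebase, and treating "Results" as a findings section is a guess. The real page types should follow src/knowledge/wiki_structure/.

```python
import re
from typing import Optional

# Hypothetical keyword -> page-type rules derived from the list above.
# Page-type names are placeholders, not the wiki's actual identifiers.
_SECTION_RULES = [
    (("abstract", "introduction"), "overview"),
    (("method",), "workflow"),                      # matches Methods / Methodology
    (("finding", "conclusion", "result"), "principle"),  # "result" is an assumption
    (("formula", "algorithm"), "implementation"),
    (("reference",), "related"),
]


def page_type_for(section_title: str) -> Optional[str]:
    """Return the page type for a section title, or None if no rule matches."""
    # Drop leading numbering such as "3.2 " before matching keywords.
    title = re.sub(r"^[\d.\s]+", "", section_title.strip()).lower()
    for keywords, page_type in _SECTION_RULES:
        if any(keyword in title for keyword in keywords):
            return page_type
    return None
```

Returning None for unmatched sections (for example, acknowledgements) leaves the caller free to skip them or route them to a default page type.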

Input Format

{"path": "./paper.pdf"}  # Local file
# or
{"url": "https://arxiv.org/pdf/..."}  # Remote URL

Implementation Steps

  1. Load PDF (local or download from URL)
  2. Extract text using PyPDF2 (now maintained under the name pypdf) or pdfplumber
  3. Identify sections (Abstract, Methods, Results, etc.)
  4. Extract formulas using OCR if needed
  5. Create structured chunks per section
  6. Convert chunks to wiki pages
  7. Index into knowledge graph
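
Steps 3 and 5 could start from a simple heading-based splitter like the sketch below. It assumes the text from step 2 is already available as one string with headings on their own lines. The heading list is a guess at common paper layouts, and `SectionChunk` is a hypothetical stand-in for the `KnowledgeChunk` class in base.py, whose actual fields are not shown here.

```python
import re
from dataclasses import dataclass

# Assumed heading names; real papers vary, so this list would need tuning.
_HEADING = re.compile(
    r"^[ \t]*(?:\d+(?:\.\d+)*\.?\s+)?"  # optional numbering such as "2" or "3.1."
    r"(abstract|introduction|related work|methods?|methodology|"
    r"results|discussion|conclusions?|references)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


@dataclass
class SectionChunk:
    """Hypothetical stand-in for KnowledgeChunk: one chunk per section."""
    title: str
    text: str


def split_sections(text: str) -> list[SectionChunk]:
    """Split extracted paper text into one chunk per recognized heading."""
    matches = list(_HEADING.finditer(text))
    chunks = []
    for i, match in enumerate(matches):
        # A section body runs from the end of its heading to the next heading.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[match.end():end].strip()
        chunks.append(SectionChunk(title=match.group(1).title(), text=body))
    return chunks
```

Text before the first recognized heading (title, authors) is dropped by this sketch, and a body line consisting only of a heading word would be misread as a heading. A fuller implementation would likely also use font-size or layout cues from the PDF library.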

Deliverable

  1. Full implementation of PaperLearner.learn() method
  2. PDF parsing and section extraction
  3. Wiki page generation from paper content

Test

  1. Call expert.learn(Source.Paper("./sample_paper.pdf"))
  2. Verify wiki pages created with paper sections
  3. Verify formulas and algorithms extracted correctly

References

  • src/knowledge/learners/paper_learner.py - Current placeholder
  • src/knowledge/learners/base.py - Base class and KnowledgeChunk
  • src/knowledge/wiki_structure/extension_of_product.md - Wiki page structure

Dependencies

Metadata

    Labels

    P1, Medium Priority
