[0.1.6] - 02-04-2026

Changed

Added

  • API key creation guide for various LLM providers and publisher APIs added to the documentation at docs/getting-started/api-key-guide.md, with detailed instructions for each provider.

[0.1.5] - 08-02-2026

Added

  • Comparison data against other agentic data extraction frameworks, prepared for the ComProScanner paper, added to the examples/piezo_test/comparing_existing_frameworks folder.

  • New parameter apply_advanced_cleaning added to data cleaning methods in data_cleaner.py. When set to True, it triggers the advanced cleaning pipeline.

  • Advanced composition cleaning methods in data_cleaner.py:

    • _remove_miller_indices() - Removes crystal plane notations from chemical formulas
    • _remove_zero_coefficient_elements() - Removes elements with zero coefficients
    • _normalize_coefficients() - Removes trailing zeros from coefficients
    • _expand_leading_and_trailing_coefficients() - Expands leading/trailing coefficient patterns
    • _expand_parenthetical_coefficients() - Expands nested bracket coefficients
  • Enhanced documentation in docs/usage/data-cleaning.md:

    • Added apply_advanced_cleaning parameter documentation
    • Added Mermaid process flow diagram showing cleaning stages
    • Added advanced cleaning examples with tables for each transformation type
  • Templates for GitHub issues added to .github/ISSUE_TEMPLATE for the following topics:

    • bug reports
    • feature requests
    • documentation improvements
    • support questions
  • Changelog page added to the documentation, and CHANGELOG.md linked from README.md.

  • DeepWiki integration badge added to README.md for community Q&A support.

  • arXiv preprint badge added to README.md.

  • CITATION.cff added for standardized citation information based on the latest release and arXiv preprint.
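
To illustrate the kind of transformation the advanced composition cleaning steps above describe, here is a minimal sketch of three of them (Miller index removal, zero-coefficient removal, and trailing-zero normalization). The standalone function names, the regular expressions, and the `clean_composition()` wrapper are assumptions made for illustration only; they are not the actual code in data_cleaner.py, and the real methods may handle more cases.

```python
import re

# Assumed pattern: a bracketed group of 3 or 4 digits, e.g. (001), [110], <111>.
MILLER_RE = re.compile(r"[\(\[<]\s*(?:-?\d\s*){3,4}[\)\]>]")
# Assumed pattern: an element symbol whose coefficient is exactly zero, e.g. Ta0 or Ta0.00.
ZERO_COEFF_RE = re.compile(r"[A-Z][a-z]?0(?:\.0+)?(?![\d.])")
# Assumed pattern: a decimal coefficient with trailing zeros, e.g. 0.50 or 1.0.
TRAILING_ZERO_RE = re.compile(r"(\d+)\.(\d*?)0+(?!\d)")


def remove_miller_indices(formula: str) -> str:
    """Drop crystal plane notations such as (001) from a formula string."""
    return MILLER_RE.sub("", formula).strip(" -")


def remove_zero_coefficient_elements(formula: str) -> str:
    """Drop elements whose coefficient is zero."""
    return ZERO_COEFF_RE.sub("", formula)


def normalize_coefficients(formula: str) -> str:
    """Strip trailing zeros from decimal coefficients (0.50 -> 0.5, 1.0 -> 1)."""

    def _strip(match: re.Match) -> str:
        whole, frac = match.group(1), match.group(2)
        return f"{whole}.{frac}" if frac else whole

    return TRAILING_ZERO_RE.sub(_strip, formula)


def clean_composition(formula: str, apply_advanced_cleaning: bool = False) -> str:
    """Hypothetical wrapper: basic cleaning always runs, advanced steps are opt-in."""
    cleaned = formula.strip()
    if apply_advanced_cleaning:
        cleaned = remove_miller_indices(cleaned)
        cleaned = remove_zero_coefficient_elements(cleaned)
        cleaned = normalize_coefficients(cleaned)
    return cleaned
```

Under these assumptions, `clean_composition(" K0.50Na0.50NbTa0O3 (001) ", apply_advanced_cleaning=True)` returns `K0.5Na0.5NbO3`, while the default call only trims the surrounding whitespace.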

Fixed

  • OAWorks API replaced with the OpenAlex API, as OAWorks is no longer available.

  • Empty or corrupted PDFs handled in pdf_processor.py and wiley_processor.py to avoid GLYPH errors during text extraction.

  • Data extraction failures fixed for cases where the composition-property text data is empty.

  • CSV progress tracking in elsevier_processor.py:

    • DtypeWarning resolved by adding dtype=str, low_memory=False to pd.read_csv()
    • Data loss issue fixed with immediate CSV persistence for processed articles
    • Sleep delays optimized for batch writes
  • Type annotation warnings in documentation build (griffe/mkdocstrings):

    • Added return type annotations to function signatures in comproscanner.py
    • Added return type annotations to all visualization functions in data_visualizer.py and eval_visualizer.py
    • Fixed parameter type format in docstrings from colon to comma notation
    • Added TYPE_CHECKING conditional imports for matplotlib Figure type
    • Fixed **kwargs type annotations across multiple modules
  • Numbered list formatting in docs/about/contribution.md:

    • Fixed list continuation by using 4-space indentation for code blocks and nested lists
    • Disabled format on save for Markdown files in .vscode/settings.json
  • GitHub Actions CI disk space issue:

    • Added --no-cache-dir flag to pip install to reduce disk usage
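
The type annotation fixes listed above (return types, a `TYPE_CHECKING` conditional import for the matplotlib Figure type, and `**kwargs` annotations) follow a common Python pattern, sketched below. The function `plot_counts()` is hypothetical and not part of ComProScanner; the sketch only shows how the import can stay invisible at runtime while documentation tools and type checkers still see the return type.

```python
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    # Seen only by static type checkers and documentation tooling;
    # matplotlib is not imported here at runtime.
    from matplotlib.figure import Figure


def plot_counts(counts: dict[str, int], **kwargs: Any) -> "Figure":
    """Hypothetical plotting function used only for illustration.

    The return type is written as a string so it is never evaluated at runtime,
    and **kwargs is annotated with the type of each individual keyword value.
    """
    # Imported lazily so that importing this module does not require matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar(list(counts), list(counts.values()), **kwargs)
    return fig
```

Writing the annotation as a string keeps the module importable in environments where matplotlib is absent, at the cost of deferring the import to the first call.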

Changed

  • README badges section converted from HTML to markdown format for better compatibility across platforms.

[0.1.4] - 02-12-2025

Added

  • New function clean_data() added for data cleaning and preprocessing as a separate step, instead of integrating it into the data extraction function.

  • New documentation page for Data Cleaning added:

    • docs/usage/data-cleaning.md
    • Added to mkdocs.yml navigation.
  • New API overview documentation page added:

    • docs/api.md
    • Added to mkdocs.yml navigation.
    • New mkdocstrings configuration added to mkdocs.yml for automatic API documentation generation.
  • New tests added for remaining utils functions.

  • Added pytest coverage tracking (50%) using pytest-cov and coverage report generation using codecov.

Fixed

  • Tests updated to reflect changes in data cleaning process.

Removed

  • Arguments related to data cleaning removed from data extraction function.

Changed


[0.1.3] - 04-11-2025

Fixed

  • RecursiveCharacterTextSplitter import updated for the latest langchain version to avoid import errors:
    • Old import: from langchain.text_splitter import RecursiveCharacterTextSplitter
    • New import: from langchain.text_splitter.recursive_character import RecursiveCharacterTextSplitter
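
For code that must run against more than one langchain version, one possible approach is to try the two import paths listed above in order. The helper `import_first_available()` and the `SPLITTER_CANDIDATES` list below are hypothetical; this is a sketch of a generic fallback, not something this changelog says ComProScanner does.

```python
import importlib
from typing import Any, Iterable, Tuple


def import_first_available(candidates: Iterable[Tuple[str, str]]) -> Any:
    """Return the first attribute importable from the given (module, name) pairs."""
    errors = []
    for module_name, attr_name in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr_name)
        except (ImportError, AttributeError) as exc:
            # Remember why this candidate failed, then try the next one.
            errors.append(f"{module_name}.{attr_name}: {exc}")
    raise ImportError(
        "none of the candidates could be imported:\n" + "\n".join(errors)
    )


# The two paths named in this changelog entry, newest first.
SPLITTER_CANDIDATES = [
    ("langchain.text_splitter.recursive_character", "RecursiveCharacterTextSplitter"),
    ("langchain.text_splitter", "RecursiveCharacterTextSplitter"),
]
```

With langchain installed, `RecursiveCharacterTextSplitter = import_first_available(SPLITTER_CANDIDATES)` would bind whichever path the installed version provides; if neither exists, the raised ImportError lists both failures.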

[0.1.2] - 24-10-2025

Added

  • Link to the ComProScanner preprint on arXiv added to the documentation index page and README.md.

[0.1.1] - 22-10-2025

Fixed


[0.1.0] - 22-10-2025

Added

  • Initial release of ComProScanner.