Releases: aritraroy24/ComProScanner

v0.1.6

02 Apr 10:26

Added

  • Guide for API key creation for various LLM providers and publisher APIs added to the documentation at docs/getting-started/api-key-guide.md with detailed instructions for each provider.

Fixed

  • Model prefix handling in rag_tool.py standardized to reflect the docs.
  • HF_TOKEN documentation clarified as optional — only required for gated or private Hugging Face models.
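A minimal sketch of the "optional token" behaviour described above. The helper name `resolve_hf_token` and the `model_is_gated` flag are hypothetical and only illustrate the rule; they are not taken from the ComProScanner code base.

```python
import os
from typing import Mapping, Optional


def resolve_hf_token(
    model_is_gated: bool, env: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Return HF_TOKEN if it is set; insist on it only for gated/private models.

    Hypothetical helper for illustration, not part of ComProScanner.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN") or None  # treat an empty string as unset
    if model_is_gated and token is None:
        raise RuntimeError(
            "HF_TOKEN is required for gated or private Hugging Face models"
        )
    return token


# Public model: a missing token is fine and None is returned.
public_token = resolve_hf_token(model_is_gated=False, env={})
```

Passing `env` explicitly keeps the check easy to test; in normal use the function falls back to `os.environ`.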

Full Changelog: v0.1.5...v0.1.6

Archived version of ComProScanner referenced in the Digital Discovery paper

14 Mar 22:31

Archived version of ComProScanner which is referenced in the Digital Discovery paper.

This release includes:

  • a snapshot of the ComProScanner package referenced in the Digital Discovery paper
  • examples folder with:
    • minimal and test scripts to run ComProScanner (the test script was used for evaluation)
    • metadata for 5 years of journal articles related to piezoelectric materials
    • collected full-text articles that mention d33, as CSV and vector-database entries
    • data for 100 randomly chosen DOIs from the 3917 articles that mention d33, used to benchmark ComProScanner across 10 different cost-efficient LLMs
    • all model logs and outputs for the 100 test articles across the 10 LLMs
    • data related to the comparison with similar existing frameworks (Eunomia and the extraction agent by CMEG-IITR)
    • scripts to regenerate the graphs and other related information for the paper

v0.1.3

14 Mar 23:01

Fixed

  • RecursiveCharacterTextSplitter import updated for the latest langchain version to avoid import errors.
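One common way to stay compatible with both old and new langchain layouts is to try several import locations in order. The sketch below assumes that approach; the helper `import_first` is hypothetical and is not necessarily how ComProScanner resolves the import. The two module paths are the newer standalone `langchain_text_splitters` package and the older `langchain.text_splitter` module.

```python
import importlib
from typing import Any, Sequence


def import_first(module_names: Sequence[str], attr: str) -> Any:
    """Return `attr` from the first importable module that provides it."""
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # package missing or moved; try the next location
        if hasattr(module, attr):
            return getattr(module, attr)
    raise ImportError(f"{attr} not found in any of: {', '.join(module_names)}")


# Newer location first, older location as a fallback.
SPLITTER_MODULES = ("langchain_text_splitters", "langchain.text_splitter")

# Usage (requires one of the packages above to be installed):
# RecursiveCharacterTextSplitter = import_first(
#     SPLITTER_MODULES, "RecursiveCharacterTextSplitter"
# )
```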

Full Changelog: v0.1.2...v0.1.3

v0.1.2

14 Mar 22:57

  • Link to the ComProScanner preprint on arXiv added to the documentation index page and README.md:
    arXiv:2510.20362

Full Changelog: v0.1.1...v0.1.2

v0.1.1

14 Mar 22:53

README images updated with an external image link to fix the PyPI rendering issue.

v0.1.0

14 Mar 22:52

Initial release of ComProScanner

v0.1.5

14 Mar 23:06

Added

  • Data related to comparison with other agentic data extraction frameworks added for the ComProScanner paper in the examples/piezo_test/comparing_existing_frameworks folder.

  • New parameter apply_advanced_cleaning added to data cleaning methods in data_cleaner.py. When set to True, it triggers the advanced cleaning pipeline.

  • Advanced composition cleaning methods in data_cleaner.py:

    • _remove_miller_indices() - Removes crystal plane notations from chemical formulas
    • _remove_zero_coefficient_elements() - Removes elements with zero coefficients
    • _normalize_coefficients() - Removes trailing zeros from coefficients
    • _expand_leading_and_trailing_coefficients() - Expands leading/trailing coefficient patterns
    • _expand_parenthetical_coefficients() - Expands nested bracket coefficients
  • Enhanced documentation in docs/usage/data-cleaning.md:

    • Added apply_advanced_cleaning parameter documentation
    • Added Mermaid process flow diagram showing cleaning stages
    • Added advanced cleaning examples with tables for each transformation type
  • Template for GitHub issues added to .github/ISSUE_TEMPLATE for the following topics:

    • bug reports
    • feature requests
    • documentation improvements
    • support questions
  • Changelog page added to the documentation, and CHANGELOG.md linked from README.md.

  • DeepWiki integration badge added to README.md for community Q&A support.

  • arXiv preprint badge added to README.md.

  • CITATION.cff added for standardized citation information based on the latest release and arXiv preprint.
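To give a feel for two of the advanced composition cleaning steps listed above (removing zero-coefficient elements and stripping trailing zeros from coefficients), here is a rough regex-based sketch. These are illustrative re-implementations under simplifying assumptions (flat formulas, no brackets); they are not the private methods in data_cleaner.py, whose exact behaviour may differ.

```python
import re

# Element symbol followed by an optional numeric coefficient, e.g. "Ba0.50".
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def remove_zero_coefficient_elements(formula: str) -> str:
    """Drop elements whose coefficient is numerically zero, e.g. 'Sr0'."""

    def repl(match: re.Match) -> str:
        coeff = match.group(2)
        if coeff is not None and float(coeff) == 0.0:
            return ""
        return match.group(0)

    return _TOKEN.sub(repl, formula)


def normalize_coefficients(formula: str) -> str:
    """Strip trailing zeros from decimal coefficients ('0.50' -> '0.5')."""

    def repl(match: re.Match) -> str:
        return match.group(0).rstrip("0").rstrip(".")

    return re.sub(r"\d+\.\d+", repl, formula)


cleaned = normalize_coefficients(remove_zero_coefficient_elements("Ba0.50Sr0.0TiO3"))
print(cleaned)  # Ba0.5TiO3
```

In the package itself these steps run only when apply_advanced_cleaning is set to True.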

Fixed

  • OAWorks API replaced with the OpenAlex API because OAWorks is no longer available.

  • Empty or corrupted PDFs are now handled in pdf_processor.py and wiley_processor.py to avoid GLYPH errors during text extraction.

  • Data extraction failures fixed for cases where the composition-property text data is empty.

  • CSV progress tracking in elsevier_processor.py:

    • DtypeWarning resolved by adding dtype=str, low_memory=False to pd.read_csv()
    • Data loss issue fixed with immediate CSV persistence for processed articles
    • Sleep delays optimized for batch writes
  • Type annotation warnings in documentation build (griffe/mkdocstrings):

    • Added return type annotations to function signatures in comproscanner.py
    • Added return type annotations to all visualization functions in data_visualizer.py and eval_visualizer.py
    • Fixed parameter type format in docstrings from colon to comma notation
    • Added TYPE_CHECKING conditional imports for matplotlib Figure type
    • Fixed **kwargs type annotations across multiple modules
  • Numbered list formatting in docs/about/contribution.md:

    • Fixed list continuation by using 4-space indentation for code blocks and nested lists
    • Disabled format on save for Markdown files in .vscode/settings.json
  • GitHub Actions CI disk space issue:

    • Added --no-cache-dir flag to pip install to reduce disk usage
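The TYPE_CHECKING and **kwargs annotation fixes above follow a standard typing pattern, sketched below. The function `plot_counts` is hypothetical and only shows the pattern; it is not one of the functions in data_visualizer.py or eval_visualizer.py.

```python
from typing import TYPE_CHECKING, Any, Dict

if TYPE_CHECKING:
    # Seen only by static type checkers and documentation tools, so the
    # module can be imported without loading matplotlib for the annotation.
    from matplotlib.figure import Figure


def plot_counts(counts: Dict[str, int], **kwargs: Any) -> "Figure":
    """Draw a bar chart of `counts` (hypothetical example function).

    The return annotation is a string, so it is not evaluated at runtime.
    Extra keyword arguments are forwarded to `plt.subplots`.
    """
    import matplotlib.pyplot as plt  # imported lazily, only when plotting

    fig, ax = plt.subplots(**kwargs)
    ax.bar(list(counts), list(counts.values()))
    return fig
```

Annotating `**kwargs` with the type of each individual value (`Any` here) rather than `dict` is what type checkers and mkdocstrings expect.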

Changed

  • README badges section converted from HTML to markdown format for better compatibility across platforms.

Full Changelog: v2026.02.02...v0.1.5

v0.1.4

14 Mar 23:04

Added

  • New function clean_data() added so that data cleaning and preprocessing run as a separate step instead of being integrated into the data extraction function.

  • New documentation page for Data Cleaning added:

    • docs/usage/data-cleaning.md
    • Added to mkdocs.yml navigation.
  • New API overview documentation page added:

    • docs/api.md
    • Added to mkdocs.yml navigation.
    • New mkdocstrings configuration added to mkdocs.yml for automatic API documentation generation.
  • New tests added for remaining utils functions.

  • Added pytest coverage tracking (50%) using pytest-cov and coverage report generation using codecov.

Fixed

  • Tests updated to reflect changes in data cleaning process.

Removed

  • Arguments related to data cleaning removed from data extraction function.

Full Changelog: v0.1.3...v0.1.4