Releases · aritraroy24/ComProScanner

02 Apr 10:26

aritraroy24

v0.1.6

5516bd7

v0.1.6 Latest


Changed

Updated README.md, CITATION.cff and docs to reference the published version (advance article) of the ComProScanner paper, available fully open access in Digital Discovery:
- ComProScanner: a multi-agent based framework for composition-property structured data extraction from scientific literature

Added

Guide for API key creation for various LLM providers and publisher APIs added to the documentation at docs/getting-started/api-key-guide.md with detailed instructions for each provider.

Fixed

Model prefix handling in rag_tool.py standardized to reflect the docs.
HF_TOKEN documentation clarified as optional — only required for gated or private Hugging Face models.
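
As an illustration of the optional-token behaviour described above, here is a minimal sketch that forwards a token only when `HF_TOKEN` is set. The helper name `hf_auth_kwargs` and the `token` keyword are assumptions made for this example; they are not taken from ComProScanner's code.

```python
import os
from typing import Mapping, Optional


def hf_auth_kwargs(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return keyword arguments carrying a Hugging Face token only if one is set.

    Hypothetical helper for illustration; not part of ComProScanner.
    """
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    # Public models need no token, so an unset or blank HF_TOKEN yields no kwargs.
    return {"token": token} if token else {}


# Without a token: nothing extra is passed, which is enough for public models.
print(hf_auth_kwargs({}))
# With a token: it is forwarded, as gated or private models would require.
print(hf_auth_kwargs({"HF_TOKEN": "hf_example"}))
```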

Full Changelog: v0.1.5...v0.1.6


14 Mar 22:31

aritraroy24

v2026.02.02

be06452

Archived version of ComProScanner referenced in the Digital Discovery paper


This release includes:

a snapshot of the ComProScanner package referenced in the Digital Discovery paper
examples folder with:
- minimal and test scripts to run ComProScanner (the test script was used for evaluation)
- metadata for 5 years of journal articles related to piezoelectric materials
- collected full-text articles that mention d33, as CSV and vector-database entries
- data for 100 randomly chosen DOIs from the 3917 articles mentioning d33, used to benchmark ComProScanner across 10 different cost-efficient LLMs
- all model logs and outputs for the 100 test articles across the 10 LLMs
- data related to the comparison with similar existing frameworks (Eunomia and the extraction agent by CMEG-IITR)
- scripts to regenerate the graphs and other related information for the paper


14 Mar 23:01

aritraroy24

v0.1.3

aab9438

v0.1.3

Fixed

RecursiveCharacterTextSplitter import updated for the latest langchain version to avoid import errors.
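
The release note does not say which import path was adopted. One common way to stay compatible across langchain versions is a fallback import, sketched below. It assumes the class lives in `langchain_text_splitters` for newer releases and in `langchain.text_splitter` for older ones; ComProScanner's actual code may differ.

```python
# Sketch only: the import path ComProScanner actually uses is not stated in the notes.
try:
    # Newer langchain releases ship text splitters in a separate package.
    from langchain_text_splitters import RecursiveCharacterTextSplitter
except ImportError:
    try:
        # Older langchain releases exposed the class from this module.
        from langchain.text_splitter import RecursiveCharacterTextSplitter
    except ImportError:
        RecursiveCharacterTextSplitter = None  # langchain is not installed

if RecursiveCharacterTextSplitter is not None:
    splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=20)
    chunks = splitter.split_text("Piezoelectric ceramics convert stress to charge. " * 50)
    print(f"split into {len(chunks)} chunks")
else:
    print("langchain is not installed; nothing to split")
```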

Full Changelog: v0.1.2...v0.1.3


14 Mar 22:57

aritraroy24

v0.1.2

f563f20

v0.1.2

Added a link to the ComProScanner preprint on arXiv to the documentation index page and README.md:
arXiv:2510.20362

Full Changelog: v0.1.1...v0.1.2


14 Mar 22:53

aritraroy24

v0.1.1

b4ef1e4

v0.1.1

README images updated with an external image link to fix the PyPI rendering issue.


14 Mar 22:52

aritraroy24

v0.1.0

a4c7716

v0.1.0

Initial release of ComProScanner


14 Mar 23:06

aritraroy24

v0.1.5

188f07d

v0.1.5

Added

Data related to comparison with other agentic data extraction frameworks added for the ComProScanner paper in the examples/piezo_test/comparing_existing_frameworks folder.
New parameter apply_advanced_cleaning added to data cleaning methods in data_cleaner.py. When set to True, it triggers the advanced cleaning pipeline.
Advanced composition cleaning methods in data_cleaner.py:
- _remove_miller_indices() - Removes crystal plane notations from chemical formulas
- _remove_zero_coefficient_elements() - Removes elements with zero coefficients
- _normalize_coefficients() - Removes trailing zeros from coefficients
- _expand_leading_and_trailing_coefficients() - Expands leading/trailing coefficient patterns
- _expand_parenthetical_coefficients() - Expands nested bracket coefficients
Enhanced documentation in docs/usage/data-cleaning.md:
- Added apply_advanced_cleaning parameter documentation
- Added Mermaid process flow diagram showing cleaning stages
- Added advanced cleaning examples with tables for each transformation type
Template for GitHub issues added to .github/ISSUE_TEMPLATE for the following topics:
- bug reports
- feature requests
- documentation improvements
- support questions
Changelog page added in the documentation. Also, CHANGELOG.md linked in README.md.
DeepWiki integration badge added to README.md for community Q&A support:
- Ask DeepWiki
arXiv preprint badge added to README.md:
- arXiv:2510.20362
CITATION.cff added for standardized citation information based on the latest release and arXiv preprint.
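
To give a feel for two of the advanced composition cleaning steps listed above (removing zero-coefficient elements and stripping trailing zeros from coefficients), here is a rough sketch. The functions are simplified stand-ins written for this example and only handle flat `ElementCoefficient` formulas; they are not the private methods in data_cleaner.py, whose behaviour may differ.

```python
import re

# An element symbol followed by an optional numeric coefficient, e.g. "Ba0.850" or "O3".
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def remove_zero_coefficient_elements(formula: str) -> str:
    """Drop elements whose coefficient is numerically zero (illustrative only)."""
    def repl(match: re.Match) -> str:
        coeff = match.group(2)
        if coeff is not None and float(coeff) == 0.0:
            return ""
        return match.group(0)
    return _TOKEN.sub(repl, formula)


def normalize_coefficients(formula: str) -> str:
    """Strip trailing zeros from decimal coefficients (illustrative only)."""
    def repl(match: re.Match) -> str:
        element, coeff = match.group(1), match.group(2)
        if coeff is None or "." not in coeff:
            return match.group(0)
        return element + coeff.rstrip("0").rstrip(".")
    return _TOKEN.sub(repl, formula)


def clean(formula: str) -> str:
    # Zero-coefficient elements are removed first, then coefficients are tidied.
    return normalize_coefficients(remove_zero_coefficient_elements(formula))


print(clean("Ba0.850Ca0.150Zr0Ti1.0O3"))  # Ba0.85Ca0.15Ti1O3
```

Whether a coefficient of 1 should be dropped entirely is a design choice the release notes do not specify, so this sketch leaves it in place.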

Fixed

OAWorks API replaced with OpenAlex API, as OAWorks is no longer available.
Empty or corrupted PDFs are now handled in pdf_processor.py and wiley_processor.py to avoid GLYPH errors during text extraction.
Data extraction failures fixed for cases where the composition-property text data is empty.
CSV progress tracking in elsevier_processor.py:
- DtypeWarning resolved by adding dtype=str, low_memory=False to pd.read_csv()
- Data loss issue fixed with immediate CSV persistence for processed articles
- Sleep delays optimized for batch writes
Type annotation warnings in documentation build (griffe/mkdocstrings):
- Added return type annotations to function signatures in comproscanner.py
- Added return type annotations to all visualization functions in data_visualizer.py and eval_visualizer.py
- Fixed parameter type format in docstrings from colon to comma notation
- Added TYPE_CHECKING conditional imports for matplotlib Figure type
- Fixed **kwargs type annotations across multiple modules
Numbered list formatting in docs/about/contribution.md:
- Fixed list continuation by using 4-space indentation for code blocks and nested lists
- Disabled format on save for Markdown files in .vscode/settings.json
GitHub Actions CI disk space issue:
- Added --no-cache-dir flag to pip install to reduce disk usage
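
The CSV progress-tracking fix above mentions immediate persistence for processed articles. A minimal sketch of that idea, using only the standard library, appends and flushes one row per article so that an interruption loses at most the article in flight. The column names and helper functions are hypothetical, not those of elsevier_processor.py. The `csv` module reads every field as a string, which loosely mirrors the `dtype=str` choice made for `pd.read_csv()`.

```python
import csv
import os
import tempfile

FIELDS = ["doi", "status"]  # hypothetical progress columns


def load_processed(path: str) -> set:
    """Return the DOIs already recorded in the progress file."""
    if not os.path.exists(path):
        return set()
    with open(path, newline="", encoding="utf-8") as fh:
        return {row["doi"] for row in csv.DictReader(fh)}


def record_progress(path: str, doi: str, status: str) -> None:
    """Append one row and force it to disk right away."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({"doi": doi, "status": status})
        fh.flush()
        os.fsync(fh.fileno())


progress_file = os.path.join(tempfile.mkdtemp(), "progress.csv")
for doi in ["10.0000/example-1", "10.0000/example-2", "10.0000/example-1"]:
    # Skip articles that were already persisted in an earlier iteration or run.
    if doi not in load_processed(progress_file):
        record_progress(progress_file, doi, "done")

print(sorted(load_processed(progress_file)))
```

Re-reading the file on every iteration keeps the sketch short; a real pipeline would more likely keep the set of processed DOIs in memory and only append.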

Changed

README badges section converted from HTML to markdown format for better compatibility across platforms.

Full Changelog: v2026.02.02...v0.1.5


14 Mar 23:04

aritraroy24

v0.1.4

86a3c5e

v0.1.4

Added

New function clean_data() added for data cleaning and preprocessing, so that cleaning is no longer integrated into the data extraction function.
New documentation page for Data Cleaning added:
- docs/usage/data-cleaning.md
- Added to mkdocs.yml navigation.
New API overview documentation page added:
- docs/api.md
- Added to mkdocs.yml navigation.
- New mkdocstrings configuration added to mkdocs.yml for automatic API documentation generation.
New tests added for remaining utils functions.
Added pytest coverage tracking (50%) using pytest-cov and coverage report generation using codecov.

Fixed

Tests updated to reflect changes in data cleaning process.

Removed

Arguments related to data cleaning removed from the data extraction function.

Changed

README images updated with raw GitHub links for better reliability:
- ComProScanner Logo
- ComProScanner Workflow

Full Changelog: v0.1.3...v0.1.4

