Releases: aritraroy24/ComProScanner
v0.1.6
Changed
- Updated README.md, CITATION.cff and docs with the published version (advance article) of the ComProScanner paper in Digital Discovery as fully open access:
Added
- Guide for creating API keys for various LLM providers and publisher APIs added to the documentation at `docs/getting-started/api-key-guide.md`, with detailed instructions for each provider.
Fixed
- Model prefix handling in `rag_tool.py` standardized to reflect the docs.
- `HF_TOKEN` documentation clarified as optional; it is only required for gated or private Hugging Face models.
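
As a minimal sketch of what "optional" can mean in practice, the snippet below reads `HF_TOKEN` from the environment and only passes a token on when one is set. The helper names `get_hf_token` and `build_hf_kwargs`, and the `token` keyword, are illustrative assumptions and are not taken from `rag_tool.py`.

```python
import os
from typing import Mapping, Optional


def get_hf_token(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the Hugging Face token if one is set, otherwise None."""
    source = os.environ if env is None else env
    token = source.get("HF_TOKEN", "").strip()
    return token or None


def build_hf_kwargs(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build loader keyword arguments, adding a token only when one is present.

    The "token" key is an assumption made for this sketch; public models
    load without it, gated or private models need it.
    """
    token = get_hf_token(env)
    return {"token": token} if token else {}
```

With no `HF_TOKEN` set, `build_hf_kwargs()` returns an empty dict, so public models load exactly as before.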
Full Changelog: v0.1.5...v0.1.6
Archived version of ComProScanner referenced in the Digital Discovery paper
This release includes:
- the snapshot of the ComProScanner package referenced in the Digital Discovery paper
- `examples` folder with:
  - minimal and test scripts to run ComProScanner (the test script was used for evaluation)
  - 5 years of metadata for piezoelectric materials-related journal articles
  - collected full-text articles in which d33 was mentioned, as CSV and vector-database entries
  - data related to 100 randomly chosen DOIs from the 3917 d33-mentioned articles, used to benchmark ComProScanner across 10 different cost-efficient LLMs
  - all model logs and outputs for the 100 test articles across the 10 LLMs
  - data related to the comparison with similar existing frameworks (Eunomia and the extraction agent by CMEG-IITR)
  - scripts to regenerate graphs and other related information for the paper
v0.1.3
Fixed
- `RecursiveCharacterTextSplitter` import updated for the latest LangChain version to avoid import errors
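
One common way to keep such an import working across LangChain versions is to try the newer module path first and fall back to the older one. The sketch below illustrates that pattern only; it is not the code used in ComProScanner, and the helper name `load_text_splitter_class` is hypothetical.

```python
import importlib
from typing import Optional

# Candidate module paths, newest first. Which of these a given project
# should use depends on the installed LangChain version.
CANDIDATE_MODULES = (
    "langchain_text_splitters",  # standalone splitter package
    "langchain.text_splitter",   # older location inside the langchain package
)


def load_text_splitter_class() -> Optional[type]:
    """Return RecursiveCharacterTextSplitter from the first importable module."""
    for module_name in CANDIDATE_MODULES:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # module not installed or moved; try the next path
        splitter = getattr(module, "RecursiveCharacterTextSplitter", None)
        if splitter is not None:
            return splitter
    return None  # neither location is available in this environment
```

Returning `None` instead of raising lets the caller decide whether to report a missing dependency or skip text splitting.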
Full Changelog: v0.1.2...v0.1.3
v0.1.2
- Link to ComProScanner preprint on arXiv in the documentation index page and README.md:
arXiv:2510.20362
Full Changelog: v0.1.1...v0.1.2
v0.1.1
README images updated with an external image link to fix the PyPI rendering issue.
v0.1.0
Initial release of ComProScanner
v0.1.5
Added
- Data related to comparison with other agentic data extraction frameworks added for the ComProScanner paper in the `examples/piezo_test/comparing_existing_frameworks` folder.
- New parameter `apply_advanced_cleaning` added to data cleaning methods in `data_cleaner.py`. When set to `True`, it triggers the advanced cleaning pipeline.
- Advanced composition cleaning methods in `data_cleaner.py`:
  - `_remove_miller_indices()` - Removes crystal plane notations from chemical formulas
  - `_remove_zero_coefficient_elements()` - Removes elements with zero coefficients
  - `_normalize_coefficients()` - Removes trailing zeros from coefficients
  - `_expand_leading_and_trailing_coefficients()` - Expands leading/trailing coefficient patterns
  - `_expand_parenthetical_coefficients()` - Expands nested bracket coefficients
- Enhanced documentation in `docs/usage/data-cleaning.md`:
  - Added `apply_advanced_cleaning` parameter documentation
  - Added Mermaid process flow diagram showing cleaning stages
  - Added advanced cleaning examples with tables for each transformation type
- Template for GitHub issues added to .github/ISSUE_TEMPLATE for the following topics:
  - bug reports
  - feature requests
  - documentation improvements
  - support questions
- Changelog page added in the documentation. Also, CHANGELOG.md linked in README.md.
- DeepWiki integration badge added to README.md for community Q&A support:
- arXiv preprint badge added to README.md:
- CITATION.cff added for standardized citation information based on the latest release and arXiv preprint.
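
To make two of the advanced cleaning steps listed above more concrete (removing zero-coefficient elements and stripping trailing zeros from coefficients), here is a simplified sketch. It is an independent illustration and not the implementation in `data_cleaner.py`; the names `normalize_coefficient` and `clean_formula` are hypothetical, and it assumes flat formulas without brackets or Miller indices.

```python
import re

# An element symbol followed by an optional numeric coefficient,
# e.g. "Ti0.50" or "O3". Assumes a flat formula with no brackets.
ELEMENT_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def normalize_coefficient(value: str) -> str:
    """Strip trailing zeros from a decimal coefficient ("0.50" -> "0.5")."""
    if "." in value:
        value = value.rstrip("0").rstrip(".")
    return value


def clean_formula(formula: str) -> str:
    """Drop zero-coefficient elements and normalize the remaining coefficients."""
    parts = []
    for symbol, coefficient in ELEMENT_TOKEN.findall(formula):
        if coefficient and float(coefficient) == 0:
            continue  # element contributes nothing, e.g. "Nb0.00"
        parts.append(symbol + normalize_coefficient(coefficient))
    return "".join(parts)
```

For example, `clean_formula("Pb1.00Zr0.520Ti0.48O3")` returns `"Pb1Zr0.52Ti0.48O3"`, and `clean_formula("Ba1Ti1Nb0.00O3")` returns `"Ba1Ti1O3"`.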
Fixed
- OAWorks API replaced with the OpenAlex API, as OAWorks is no longer available.
- Empty or corrupted PDFs handled in `pdf_processor.py` and `wiley_processor.py` to avoid GLYPH errors during text extraction.
- Data extraction failures fixed for cases where the composition-property text data is empty.
- CSV progress tracking in `elsevier_processor.py`:
  - DtypeWarning resolved by adding `dtype=str, low_memory=False` to `pd.read_csv()`
  - Data loss issue fixed with immediate CSV persistence for processed articles
  - Sleep delays optimized for batch writes
- Type annotation warnings in documentation build (griffe/mkdocstrings):
  - Added return type annotations to function signatures in `comproscanner.py`
  - Added return type annotations to all visualization functions in `data_visualizer.py` and `eval_visualizer.py`
  - Fixed parameter type format in docstrings from colon to comma notation
  - Added `TYPE_CHECKING` conditional imports for matplotlib Figure type
  - Fixed `**kwargs` type annotations across multiple modules
- Numbered list formatting in `docs/about/contribution.md`:
  - Fixed list continuation by using 4-space indentation for code blocks and nested lists
  - Disabled format on save for Markdown files in `.vscode/settings.json`
- GitHub Actions CI disk space issue:
  - Added `--no-cache-dir` flag to pip install to reduce disk usage
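
The CSV progress-tracking fix above combines two ideas: reading the progress file with `dtype=str, low_memory=False`, and writing each processed article to disk immediately. The sketch below shows one way these could fit together. It is not the code in `elsevier_processor.py`; the function names and the `doi` and `status` columns are assumptions made for illustration.

```python
import csv
import os

import pandas as pd

# Column layout assumed for this sketch only.
FIELDNAMES = ["doi", "status"]


def load_processed_dois(csv_path: str) -> set:
    """Read the progress file and return the DOIs that were already processed."""
    if not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0:
        return set()
    # dtype=str keeps every column as text, so mixed-type columns do not
    # trigger DtypeWarning; low_memory=False parses the file in one pass.
    frame = pd.read_csv(csv_path, dtype=str, low_memory=False)
    return set(frame["doi"].dropna())


def append_processed_article(csv_path: str, row: dict) -> None:
    """Append one article to the progress file and push it to disk right away."""
    is_new_file = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        if is_new_file:
            writer.writeheader()
        writer.writerow(row)
        handle.flush()
        os.fsync(handle.fileno())  # survive a crash right after this article
```

Writing per article costs more I/O than batching, but an interrupted run then loses at most the article in progress.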
Changed
- README badges section converted from HTML to markdown format for better compatibility across platforms.
Full Changelog: v2026.02.02...v0.1.5
v0.1.4
Added
- New function `clean_data()` added for data cleaning and preprocessing, which is now a separate step instead of being integrated into the data extraction function.
- New documentation page for Data Cleaning added:
  - docs/usage/data-cleaning.md
  - Added to mkdocs.yml navigation.
- New API overview documentation page added:
  - docs/api.md
  - Added to mkdocs.yml navigation.
  - New mkdocstrings configuration added to mkdocs.yml for automatic API documentation generation.
- New tests added for remaining utils functions.
- Added pytest coverage tracking (50%) using `pytest-cov` and coverage report generation using Codecov.
Fixed
- Tests updated to reflect changes in data cleaning process.
Removed
- Arguments related to data cleaning removed from data extraction function.
Changed
- README images updated with raw GitHub links for better reliability:
Full Changelog: v0.1.3...v0.1.4