
Commit f563f20

fix: mixed data-type fixed for csv files, chore: citation data updated
1 parent: b4ef1e4

File tree: 6 files changed, +13 −13 lines changed


README.md (4 additions, 5 deletions)

````diff
@@ -8,7 +8,6 @@

 [![Python Version](https://img.shields.io/badge/python-3.12%20%7C%203.13-blue.svg)](https://www.python.org/downloads/)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
-[![PyPI](https://img.shields.io/pypi/v/comproscanner)](https://pypi.org/project/comproscanner/)
 [![Documentation](https://img.shields.io/badge/docs-latest-brightgreen.svg)](https://slimeslab.github.io/ComProScanner/)

 ## Overview
@@ -172,14 +171,14 @@ eval_visualizer.plot_multiple_radar_charts(
 If you use ComProScanner in your research, please cite:

 ```bibtex
-@misc{roy2025comproscanner,
+@misc{roy2025comproscannermultiagentbasedframework,
   title={ComProScanner: A multi-agent based framework for composition-property structured data extraction from scientific literature},
   author={Aritra Roy and Enrico Grisan and John Buckeridge and Chiara Gattinoni},
   year={2025},
-  eprint={example},
+  eprint={2510.20362},
   archivePrefix={arXiv},
-  primaryClass={cond-mat.mtrl-sci},
-  url={https://arxiv.org/abs/example},
+  primaryClass={physics.comp-ph},
+  url={https://arxiv.org/abs/2510.20362},
 }
```
````
docs/about/citation.md (4 additions, 4 deletions)

````diff
@@ -3,13 +3,13 @@
 If you use ComProScanner in your research, please cite our related paper:

 ```bibtex
-@misc{roy2025comproscanner,
+@misc{roy2025comproscannermultiagentbasedframework,
   title={ComProScanner: A multi-agent based framework for composition-property structured data extraction from scientific literature},
   author={Aritra Roy and Enrico Grisan and John Buckeridge and Chiara Gattinoni},
   year={2025},
-  eprint={example},
+  eprint={2510.20362},
   archivePrefix={arXiv},
-  primaryClass={cond-mat.mtrl-sci},
-  url={https://arxiv.org/abs/example},
+  primaryClass={physics.comp-ph},
+  url={https://arxiv.org/abs/2510.20362},
 }
```
````

docs/usage/evaluation/overview.md (1 addition, 1 deletion)

```diff
@@ -56,7 +56,7 @@ Both methods provide:
 - **Classification Metrics**: Standard Precision/Recall/F1 metrics for detailed performance analysis
 - **Normalized Classification Metrics**: Classification metrics normalized to ensure an equitable comparison between articles with significant disparities in the quantity of extractable information.

-To read more about the evaluation metrics, please refer the journal article [here](https://arxiv.org/abs/example).
+To read more about the evaluation metrics, please refer the journal article [here](https://arxiv.org/abs/2510.20362).

 ## Quick Start
```

pyproject.toml (1 addition, 1 deletion)

```diff
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

 [project]
 name = "comproscanner"
-version = "0.1.1"
+version = "0.1.2"
 description = "Multi-agent system for extracting and processing structured composition-property data from scientific literature"
 readme = "README.md"
 authors = [{ name = "Aritra Roy", email = "contact@aritraroy.live" }]
```

src/comproscanner/utils/data_preparator.py (1 addition, 1 deletion)

```diff
@@ -298,7 +298,7 @@ def get_unprocessed_data(self) -> list[Dict]:
     Process materials data extracted from the CSV database and run CrewAI Workflow.
     """
     all_files = glob.glob(self.extracted_folderpath + "/*.csv")
-    dfs = [pd.read_csv(f) for f in all_files]
+    dfs = [pd.read_csv(f, dtype=str) for f in all_files]
     if not dfs:
         logger.error(f"No files found in the folder: {self.extracted_folderpath}")
         raise FileNotFoundErrorHandler(
```
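The `dtype=str` change above addresses a pandas pitfall: when the same column is numeric-looking in one CSV and string-like in another, default type inference gives each file a different dtype, so values that should match no longer compare equal after pooling. A minimal sketch of the issue (the file contents and column names here are invented for illustration, not taken from the repository):

```python
# Sketch of the mixed-dtype problem that dtype=str avoids when
# pooling several CSV chunks with pd.read_csv.
import io
import pandas as pd

# One chunk stores the "doi" column as bare numbers, another as strings.
csv_a = "doi,value\n1001,1.5\n"
csv_b = "doi,value\n10.1021/acs.0c01234,2.0\n"

# Default parsing infers a different dtype per file (int64 vs object),
# so the same logical column behaves inconsistently across chunks.
a = pd.read_csv(io.StringIO(csv_a))
b = pd.read_csv(io.StringIO(csv_b))
assert a["doi"].dtype != b["doi"].dtype

# Forcing dtype=str makes every column a plain string in both frames,
# so concatenation and membership checks stay consistent.
a_s = pd.read_csv(io.StringIO(csv_a), dtype=str)
b_s = pd.read_csv(io.StringIO(csv_b), dtype=str)
combined = pd.concat([a_s, b_s], ignore_index=True)
assert (combined.dtypes == object).all()
```

The trade-off is that any genuinely numeric columns must be cast back explicitly (e.g. with `pd.to_numeric`) before arithmetic.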

src/comproscanner/utils/database_manager.py (2 additions, 1 deletion)

```diff
@@ -158,7 +158,8 @@ def write_to_csv(self, final_df, filepath, keyword, source, csv_batch_size):
     output_file = f"{filepath}/{source}_{keyword}_paragraphs.csv"

     if os.path.exists(output_file):
-        existing_df = pd.read_csv(output_file)
+        # Read all columns as strings to avoid mixed type issues
+        existing_df = pd.read_csv(output_file, dtype=str)
         final_df = final_df[~final_df["doi"].isin(existing_df["doi"])]
         if not final_df.empty:
             combined_df = pd.concat([existing_df, final_df], ignore_index=True)
```
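The surrounding `write_to_csv` logic follows a read–dedup–append pattern: load the existing file as strings, drop incoming rows whose `doi` is already present, and concatenate the remainder. A standalone sketch of that pattern (the sample DOIs and texts are invented; only the filtering idiom mirrors the diff above):

```python
# Sketch of the append-with-dedup pattern from write_to_csv:
# read the existing CSV as strings, drop rows whose "doi" already
# exists, then append only the genuinely new rows.
import io
import pandas as pd

existing_csv = "doi,text\n10.1/a,old paragraph\n"
existing_df = pd.read_csv(io.StringIO(existing_csv), dtype=str)

# Incoming batch: one duplicate DOI, one new DOI.
final_df = pd.DataFrame(
    {"doi": ["10.1/a", "10.1/b"], "text": ["dup", "new paragraph"]}
)

# Keep only rows whose DOI is not already in the file, then append.
final_df = final_df[~final_df["doi"].isin(existing_df["doi"])]
combined_df = pd.concat([existing_df, final_df], ignore_index=True)
assert combined_df["doi"].tolist() == ["10.1/a", "10.1/b"]
```

Because both sides are read with `dtype=str`, the `isin` membership test compares string to string; without it, a numeric-looking DOI column could silently fail to match and duplicates would be re-appended.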
