Problem description
I'm using smart_open to write a big file directly into S3, and for that it has been working great. Thank you for the great package!
The issue is that sometimes the Python process gets killed and I'm left with an incomplete multi-part upload. At the moment I have an S3 bucket lifecycle rule that cleans these up after a day, but ideally I'd like to resume the upload instead of starting from scratch, making the most of multi-part upload and saving some compute & storage.
After a quick glance at `MultipartWriter.__init__`, this looks impossible, as `self._client.create_multipart_upload` is always invoked. I wanted to gauge how big an effort it would be to support such a scenario if I were somehow able to supply an `UploadId` to be used.
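For context, resuming an in-progress multi-part upload is possible at the boto3 level: given a known `UploadId`, `list_parts` returns the parts already uploaded, and the remaining parts can be uploaded before calling `complete_multipart_upload`. Below is a minimal sketch of that flow; the `resume_state` helper is hypothetical (not part of smart_open or boto3), and the bucket/key/upload-id names are placeholders:

```python
def resume_state(parts_response):
    """Given an S3 list_parts response dict, return the already-uploaded
    parts (in the shape complete_multipart_upload expects) and the next
    part number to upload."""
    parts = sorted(parts_response.get("Parts", []), key=lambda p: p["PartNumber"])
    completed = [{"PartNumber": p["PartNumber"], "ETag": p["ETag"]} for p in parts]
    next_number = completed[-1]["PartNumber"] + 1 if completed else 1
    return completed, next_number


# Hypothetical resumption flow (requires boto3; bucket/key/upload_id are placeholders):
#
# import boto3
# client = boto3.client("s3")
# resp = client.list_parts(Bucket="my-bucket", Key="big-file", UploadId=upload_id)
# completed, next_number = resume_state(resp)
# ... upload the remaining data with client.upload_part(..., PartNumber=next_number, ...)
#     appending each {"PartNumber": ..., "ETag": ...} to completed ...
# client.complete_multipart_upload(
#     Bucket="my-bucket", Key="big-file", UploadId=upload_id,
#     MultipartUpload={"Parts": completed},
# )
```

So the underlying API supports it; the question is whether `MultipartWriter` could accept an existing `UploadId` instead of always creating a fresh upload.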
Versions
Please provide the output of:
macOS-14.5-arm64-arm-64bit
Python 3.10.12 (main, Sep 16 2023, 13:51:00) [Clang 15.0.0 (clang-1500.0.40.1)]
smart_open 7.0.4
Checklist
Before you create the issue, please make sure you have:
Provided a minimal reproducible example, including any required data