Better S3 download_fileobj docs, note potential need to call flush() with threaded transfers #1304

@nbargnesi

Threaded transfers using the S3 download_fileobj method can leave the file object's position in a nondeterministic state.

The example from the function's docstring is:

import boto3
s3 = boto3.client('s3')

with open('filename', 'wb') as data:
    s3.download_fileobj('mybucket', 'mykey', data)

Inside the with block, a data.tell() call can return a different value when threaded transfers are used. The size threshold that guards against threaded transfers makes this worse: for small files the file position always appears deterministic, so the problem only shows up with larger objects.
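To illustrate why tell() can vary, here is a minimal sketch that does not use boto3 at all. It assumes, for illustration only, that a threaded transfer writes fixed-size chunks with seek() followed by write() in whatever order the chunks complete; write_chunks, CHUNK_SIZE and the completion orders are hypothetical names and values, not part of boto3.

```python
import io

CHUNK_SIZE = 256  # hypothetical chunk size, chosen only for this illustration


def write_chunks(fileobj, payload, completion_order):
    """Write payload into fileobj one chunk at a time, in the given offset order."""
    for offset in completion_order:
        fileobj.seek(offset)
        fileobj.write(payload[offset:offset + CHUNK_SIZE])


payload = bytes(range(256)) * 4  # 1024 bytes, i.e. four chunks

# Chunks finish in file order: the position ends at the end of the data.
in_order = io.BytesIO()
write_chunks(in_order, payload, [0, 256, 512, 768])
in_order_position = in_order.tell()

# Chunks finish out of order: the position ends after the last chunk written.
out_of_order = io.BytesIO()
write_chunks(out_of_order, payload, [768, 0, 512, 256])
out_of_order_position = out_of_order.tell()

print(in_order_position, out_of_order_position)
```

Both buffers end up holding the full payload; only the final position differs, because it reflects the last chunk written rather than the total size.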

If the same approach is used with a file that is consumed while it is still open (e.g. a named temporary file), the download could appear to be incomplete:

import boto3
import tempfile
s3 = boto3.client('s3')

with tempfile.NamedTemporaryFile(mode='wb') as data:
    s3.download_fileobj('mybucket', 'mykey', data)
    # do something with data before it's closed and removed

Noting this behavior and recommending that the file be flushed before use would help users avoid downloads that appear to be incomplete.
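A minimal sketch of the flush-before-use pattern, under stated assumptions: download_to_tempfile and its consume callback are hypothetical helpers, and client is assumed to be any object exposing download_fileobj(Bucket, Key, Fileobj), such as the result of boto3.client('s3').

```python
import tempfile


def download_to_tempfile(client, bucket, key, consume):
    """Download an object into a named temporary file and hand its path to consume().

    The temporary file is removed when the with block exits, so consume()
    must finish its work with the path before returning.
    """
    with tempfile.NamedTemporaryFile(mode='wb') as data:
        client.download_fileobj(bucket, key, data)
        # Push any buffered bytes to disk before anything reads the file by name.
        data.flush()
        return consume(data.name)
```

With a real client this might be called as download_to_tempfile(boto3.client('s3'), 'mybucket', 'mykey', os.path.getsize). Without the flush() call, bytes still held in the Python-level buffer would not be visible to code that reads the file through its path.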

def download_fileobj(self, Bucket, Key, Fileobj, ExtraArgs=None,
                     Callback=None, Config=None):
    """Download an object from S3 to a file-like object.

    The file-like object must be in binary mode.

    This is a managed transfer which will perform a multipart download in
    multiple threads if necessary. This behavior may leave the file position
    in an unexpected state. A call to `flush` may be required.

    Usage::

        import boto3

Labels: documentation, enhancement, feature-request, p2, s3