Welcome to S3 Downloader’s documentation!

S3 Downloader

Main Module

class s3.dumper.Downloader(bucket_name: str, download_dir: str | None = None, region_name: str | None = None, profile_name: str | None = None, aws_access_key_id: str | None = None, aws_secret_access_key: str | None = None, logger: ~logging.Logger | None = None, log_type: ~s3.logger.LogType = LogType.stdout, sort: ~s3.squire.Sort = Sort.no_sort, prefix: str | ~typing.List[str] | None = None, retry_config: ~botocore.config.Config = <botocore.config.Config object>, transfer_config: ~boto3.s3.transfer.TransferConfig = <boto3.s3.transfer.TransferConfig object>)

Instantiates a Downloader object to download an entire S3 bucket.

>>> Downloader

Initializes all the necessary arguments and creates a boto3 session with retry logic.

Parameters:
  • bucket_name – Name of the bucket.

  • download_dir – Name of the download directory. Defaults to bucket name.

  • region_name – Name of the AWS region.

  • profile_name – AWS profile name.

  • aws_access_key_id – AWS access key ID.

  • aws_secret_access_key – AWS secret access key.

  • logger – Bring your own logger.

  • log_type – Type of logging output. Defaults to stdout.

  • sort – Sorting options for the files to be downloaded. Defaults to no_sort.

  • prefix – Specific path, or list of paths, from which the objects have to be downloaded.

  • retry_config – Custom retry configuration for boto3 client. Defaults to RETRY_CONFIG.

  • transfer_config – Custom transfer configuration for boto3 client. Defaults to TRANSFER_CONFIG.

Warning

  • The default sort option is no_sort which uses the default lexicographical order by object key.

  • Bucket objects are fetched using bucket.objects.all() which is paginated under the hood.

  • Sorting will pull everything into memory. This may be expensive for very large buckets.
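The warning above can be illustrated with a hedged sketch of what the sort options imply: every object is materialized in memory first, then sorted by the chosen attribute. The `Obj` type and `sort_objects` helper below are hypothetical illustrations, not the library's actual code.

```python
from datetime import datetime
from operator import attrgetter
from typing import List, NamedTuple


class Obj(NamedTuple):
    """Hypothetical stand-in for an S3 object record."""
    key: str
    size: int
    last_modified: datetime


def sort_objects(objects: List[Obj], sort: str) -> List[Obj]:
    """Sort a fully materialized object list per the Sort enum's options.

    Sketch only: note the whole list must already be in memory,
    which is why sorting very large buckets can be expensive.
    """
    if sort == "no_sort":
        # Lexicographical order by object key, as returned by S3.
        return objects
    field = sort.removesuffix("_desc")
    return sorted(objects, key=attrgetter(field), reverse=sort.endswith("_desc"))
```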

RETRY_CONFIG: Config = <botocore.config.Config object>
TRANSFER_CONFIG: TransferConfig = <boto3.s3.transfer.TransferConfig object>
init() None

Instantiates the bucket object.

Raises:
  • ValueError – If no bucket name was passed.

  • BucketNotFound – If bucket name was not found.

exit() None

Logs if there were any failures.

get_objects() List[S3Object]

Gets all the objects in the target S3 bucket.

Returns:

List of objects in the bucket.

Return type:

List[S3Object]

downloader(s3_object: S3Object, callback: ProgressPercentage) None

Downloads the file to the same relative path as in the bucket.

Parameters:
  • s3_object – Takes the S3Object as an argument.

  • callback – Takes the ProgressPercentage callback to track download progress.

See also

  • Checks if the file already exists and is of the same size to avoid redundant downloads.
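The redundancy check described above can be sketched as a simple existence-and-size comparison before downloading. The helper name `should_download` is a hypothetical illustration of the behaviour, not the library's internal function.

```python
import os


def should_download(local_path: str, remote_size: int) -> bool:
    """Return True unless a local file of the same size already exists.

    Sketch of the skip logic: a size match is treated as "already
    downloaded" (no checksum comparison is performed).
    """
    return not (os.path.isfile(local_path) and os.path.getsize(local_path) == remote_size)
```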

get_downloads() List[S3Object]

Filters out the objects that are not files and cannot be downloaded.

Returns:

List of objects that can be downloaded.

Return type:

List[S3Object]

run() None

Initiates bucket download in a traditional loop.

run_in_parallel(threads: int = 5) None

Initiates bucket download using multi-threading.

Parameters:

threads – Number of threads to use for downloading.
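A minimal sketch of what multi-threaded downloading looks like with the standard library's `concurrent.futures`; the `download_one` callable is stubbed and the library's internals may differ. The returned counts mirror the `DownloadResults` idea.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, Tuple


def download_all(keys: Iterable[str],
                 download_one: Callable[[str], None],
                 threads: int = 5) -> Tuple[int, int]:
    """Run download_one for every key across a pool of worker threads.

    Returns a (success, failed) pair; a worker that raises counts as
    a failure, everything else as a success.
    """
    success = failed = 0
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = {pool.submit(download_one, key): key for key in keys}
        for future in as_completed(futures):
            try:
                future.result()
                success += 1
            except Exception:
                failed += 1
    return success, failed
```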

get_bucket_structure(raw: bool = False) str | Dict[str, int]

Gets all the objects in an S3 bucket and forms them into a hierarchical, folder-like representation.

Returns:

Returns a hierarchical, folder-like representation of the chosen bucket, or the raw mapping of objects if raw is True.

Return type:

Union[str, Dict[str, int]]

save_bucket_structure(filename: str = 'bucket_structure.json', convert_size: bool = False) None

Saves the bucket structure in a JSON file.

Parameters:
  • filename – Name of the file to save the bucket structure in.

  • convert_size – Whether to convert the size into human-readable format or not.

print_bucket_structure() None

Prints all the objects in an S3 bucket with a folder-like representation.

Exceptions

Module to store all the custom exceptions and formatters.

>>> S3Error
exception s3.exceptions.S3Error

Custom error for base exception to the s3-downloader module.

exception s3.exceptions.BucketNotFound

Custom error for bucket not found.

exception s3.exceptions.NoObjectFound

Custom error for no objects found.

exception s3.exceptions.InvalidPrefix(prefix: str, bucket_name: str)

Custom exception for invalid prefix value.

Initializes an instance of the InvalidPrefix object, inherited from S3Error.

Parameters:
  • prefix – Prefix to limit the objects.

  • bucket_name – Name of the S3 bucket.

format_error_message()

Returns the formatted error message as a string.

Progress

class s3.progress.ProgressPercentage(filename: str, size: int, bar: alive_bar)

Tracks the file transfer progress in S3 and updates the alive_bar.

>>> ProgressPercentage

Initializes the progress tracker.

Parameters:
  • filename – Name of the file being transferred.

  • size – Total size of the file in bytes.

  • bar – alive_bar instance to update progress.
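The callback protocol boto3 expects here is simply a callable that receives the byte count of each transferred chunk. Below is a hedged, thread-safe sketch of such a tracker with a plain counter standing in for the `alive_bar` instance; the class name and `percentage` property are illustrative assumptions.

```python
import threading


class ProgressTracker:
    """Minimal stand-in for a ProgressPercentage-style callback.

    boto3 invokes the callback once per transferred chunk with the
    chunk's size in bytes; we accumulate the total under a lock since
    transfers may use multiple threads.
    """

    def __init__(self, filename: str, size: int) -> None:
        self.filename = filename
        self.size = size
        self.seen = 0
        self._lock = threading.Lock()

    def __call__(self, bytes_amount: int) -> None:
        with self._lock:
            self.seen += bytes_amount

    @property
    def percentage(self) -> float:
        """Fraction of the file transferred so far, as a percentage."""
        return (self.seen / self.size) * 100 if self.size else 100.0
```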

Squire

s3.squire.refine_prefix(prefix: str | List[str] | None = None) Generator[str]

Refines the prefix input to ensure it is a list of strings.

Parameters:

prefix – A string or a list of strings representing the prefix(es) to filter S3 objects.

Yields:

str – Yields strings representing the refined prefix(es).
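The normalization described above can be sketched as a small generator that accepts a string, a list of strings, or None; this is an assumed behaviour based on the signature, not the library's exact code.

```python
from typing import Generator, List, Optional, Union


def refine(prefix: Optional[Union[str, List[str]]] = None) -> Generator[str, None, None]:
    """Yield each prefix as a string, whatever shape the input takes.

    None yields nothing; a bare string yields itself; a list is
    yielded element by element.
    """
    if prefix is None:
        return
    if isinstance(prefix, str):
        yield prefix
    else:
        yield from prefix
```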

s3.squire.size_converter(byte_size: int | float) str

Converts the given byte size into a human-friendly format.

Parameters:

byte_size – Byte size to convert.

Returns:

Human-readable size string.

Return type:

str
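A conversion like this is typically a log-base-1024 lookup into a unit table. The sketch below shows one common way to implement it; the exact unit labels and precision the library uses are assumptions.

```python
import math


def human_size(byte_size: float) -> str:
    """Convert a byte count into a human-friendly string.

    Picks the largest unit where the value stays >= 1, using
    1024-based (binary) steps.
    """
    if byte_size <= 0:
        return "0 B"
    units = ("B", "KB", "MB", "GB", "TB", "PB")
    index = min(int(math.log(byte_size, 1024)), len(units) - 1)
    return f"{byte_size / 1024 ** index:.2f} {units[index]}"
```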

s3.squire.convert_to_folder_structure(sequence: Dict[str, int]) str

Convert objects in an S3 bucket into a folder-like representation including sizes.

Parameters:

sequence – A dictionary where keys are S3 object keys (paths) and values are their sizes in bytes.

Returns:

A string representing the folder structure of the S3 bucket, with each file and folder showing the size.

Return type:

str
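One way to picture this conversion, as a hedged sketch: nest the flat `"a/b/c"` keys into a dict-of-dicts on `"/"`, then render the nesting as an indented listing. The helper names and output layout below are illustrative, not the library's actual format.

```python
from typing import Any, Dict


def to_tree(sequence: Dict[str, int]) -> Dict[str, Any]:
    """Nest flat "a/b/c"-style keys into a dict of dicts, with sizes at the leaves."""
    tree: Dict[str, Any] = {}
    for key, size in sequence.items():
        node = tree
        *folders, leaf = key.split("/")
        for folder in folders:
            node = node.setdefault(folder, {})
        node[leaf] = size
    return tree


def render(tree: Dict[str, Any], indent: int = 0) -> str:
    """Render the nested dict as an indented, folder-like listing."""
    lines = []
    for name, value in tree.items():
        if isinstance(value, dict):
            lines.append(" " * indent + name + "/")
            lines.append(render(value, indent + 2))
        else:
            lines.append(" " * indent + f"{name} ({value} bytes)")
    return "\n".join(lines)
```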

s3.squire.format_bucket_structure(bucket_structure: Dict[str, int], convert_size: bool) Dict[str, Any]

Formats the bucket structure into a dictionary, optionally converting sizes to a human-readable format.

Parameters:
  • bucket_structure – A dictionary where keys are S3 object keys (paths) and values are their sizes in bytes.

  • convert_size – A boolean indicating whether to convert sizes to human-readable format.

Returns:

A dictionary representing the folder structure of the S3 bucket, with each file and folder showing the size.

Return type:

Dict[str, Any]

class s3.squire.S3Object(key: str, size: int)

Represents an S3 object with its key and size.

key: str
size: int
class s3.squire.DownloadResults

Object to store results of S3 download.

>>> DownloadResults
success: int = 0
failed: int = 0
skipped: int = 0
class s3.squire.Sort(value)

Enum to represent sorting options for S3 objects.

>>> Sort
size: str = 'size'
size_desc: str = 'size_desc'
key: str = 'key'
key_desc: str = 'key_desc'
last_modified: str = 'last_modified'
last_modified_desc: str = 'last_modified_desc'
no_sort: str = 'no_sort'

Logger

Loads a default logger with StreamHandler set to DEBUG mode.

>>> logging.Logger
class s3.logger.LogType(value)

Defines the type of logging output.

>>> LogType
file: str = 'file'
stdout: str = 'stdout'
s3.logger.default_handler(log_type: LogType) StreamHandler | FileHandler

Creates a handler and assigns a default format to it.

Parameters:

log_type – An instance of the LogType enum to specify the type of logging output.

Returns:

Returns an instance of either StreamHandler or FileHandler based on the specified log type.

Return type:

Union[logging.StreamHandler, logging.FileHandler]

s3.logger.default_format() Formatter

Creates a logging Formatter with a custom message and datetime format.

Returns:

Returns an instance of the Formatter object.

Return type:

logging.Formatter

s3.logger.default_logger(log_type: LogType) Logger

Creates a default logger with debug mode enabled.

Parameters:

log_type – An instance of the LogType enum to specify the type of logging output.

Returns:

Returns an instance of the Logger object.

Return type:

logging.Logger
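A logger factory along these lines can be assembled entirely from the standard `logging` module: pick a handler based on the log type, attach a formatter, and enable DEBUG. The sketch below captures that shape; the logger name, format string, and log filename are assumptions, not the library's actual defaults.

```python
import logging


def make_logger(log_type: str = "stdout", filename: str = "s3.log") -> logging.Logger:
    """Build a DEBUG-level logger with a stream or file handler.

    "file" attaches a FileHandler writing to ``filename``; anything
    else falls back to a StreamHandler.
    """
    logger = logging.getLogger(f"s3-sketch-{log_type}")
    logger.setLevel(logging.DEBUG)
    handler: logging.Handler
    if log_type == "file":
        handler = logging.FileHandler(filename)
    else:
        handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter(
        fmt="%(asctime)s - %(levelname)s - %(message)s",
        datefmt="%b-%d-%Y %H:%M:%S",
    ))
    logger.addHandler(handler)
    return logger
```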
