Welcome to S3 Downloader’s documentation!¶
S3 Downloader¶
Main Module¶
- class s3.dumper.Downloader(bucket_name: str, download_dir: str | None = None, region_name: str | None = None, profile_name: str | None = None, aws_access_key_id: str | None = None, aws_secret_access_key: str | None = None, logger: ~logging.Logger | None = None, log_type: ~s3.logger.LogType = LogType.stdout, sort: ~s3.squire.Sort = Sort.no_sort, prefix: str | ~typing.List[str] | None = None, retry_config: ~botocore.config.Config = <botocore.config.Config object>, transfer_config: ~boto3.s3.transfer.TransferConfig = <boto3.s3.transfer.TransferConfig object>)¶
Initiates Downloader object to download an entire S3 bucket.
>>> Downloader
Initiates all the necessary args and creates a boto3 session with retry logic.
- Parameters:
bucket_name – Name of the bucket.
download_dir – Name of the download directory. Defaults to bucket name.
region_name – Name of the AWS region.
profile_name – AWS profile name.
aws_access_key_id – AWS access key ID.
aws_secret_access_key – AWS secret access key.
logger – Bring your own logger.
log_type – Type of logging output. Defaults to stdout.
sort – Sorting options for the files to be downloaded. Defaults to no_sort.
prefix – Specific path [OR] list of paths from which the objects have to be downloaded.
retry_config – Custom retry configuration for boto3 client. Defaults to RETRY_CONFIG.
transfer_config – Custom transfer configuration for boto3 client. Defaults to TRANSFER_CONFIG.
Warning
The default sort option is no_sort, which uses the default lexicographical order by object key. Bucket objects are fetched using bucket.objects.all(), which is paginated under the hood. Sorting will pull everything into memory, which may be expensive for very large buckets.
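A minimal usage sketch based on the signature above. The bucket name, prefix, and profile below are placeholders, and it is an assumption that run_in_parallel() performs the bucket setup internally (the import is deferred so the sketch can be defined without the package installed):

```python
def download_bucket_sketch() -> None:
    # Deferred import: requires the s3-downloader package at call time.
    from s3.dumper import Downloader

    downloader = Downloader(
        bucket_name="example-bucket",  # placeholder bucket name
        prefix="logs/2024/",           # optionally limit to one path
        profile_name="default",        # pick up credentials from an AWS profile
    )
    # Multi-threaded download; downloader.run() would use a plain loop instead.
    downloader.run_in_parallel(threads=10)
```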
- RETRY_CONFIG: Config = <botocore.config.Config object>¶
- TRANSFER_CONFIG: TransferConfig = <boto3.s3.transfer.TransferConfig object>¶
- init() None¶
Instantiates the bucket instance.
- Raises:
ValueError – If no bucket name was passed.
BucketNotFound – If bucket name was not found.
- exit() None¶
Logs if there were any failures.
- get_objects() List[S3Object]¶
Gets all the objects in the target S3 bucket.
- Raises:
InvalidPrefix – If no objects with the given path exist.
NoObjectFound – If the bucket is empty.
- Returns:
List of objects in the bucket.
- Return type:
List[S3Object]
- downloader(s3_object: S3Object, callback: ProgressPercentage) None¶
Downloads the file to the exact path as in the bucket.
- Parameters:
s3_object – Takes the S3Object as an argument.
callback – Takes the ProgressPercentage callback to track download progress.
See also
Checks if the file already exists and is of the same size to avoid redundant downloads.
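The skip check described above can be sketched as follows (an illustrative re-implementation of the behavior, not the module's actual code):

```python
from pathlib import Path

def needs_download(local_path: str, remote_size: int) -> bool:
    # Skip the download when the file already exists locally
    # and its size matches the object size reported by S3.
    path = Path(local_path)
    return not (path.is_file() and path.stat().st_size == remote_size)
```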
- get_downloads() List[S3Object]¶
Filters out the objects that are not files and therefore cannot be downloaded.
- Returns:
List of objects that can be downloaded.
- Return type:
List[S3Object]
- run() None¶
Initiates bucket download in a traditional loop.
- run_in_parallel(threads: int = 5) None¶
Initiates bucket download using multi-threading.
- Parameters:
threads – Number of threads to use for downloading.
- get_bucket_structure(raw: bool = False) str | Dict[str, int]¶
Gets all the objects in an S3 bucket and forms them into a hierarchical, folder-like representation.
- Returns:
Returns a hierarchical, folder-like representation of the chosen bucket, or the raw set of objects if raw is True.
- Return type:
Union[str, Dict[str, int]]
- save_bucket_structure(filename: str = 'bucket_structure.json', convert_size: bool = False) None¶
Saves the bucket structure in a JSON file.
- Parameters:
filename – Name of the file to save the bucket structure in.
convert_size – Whether to convert the size into human-readable format or not.
- print_bucket_structure() None¶
Prints all the objects in an S3 bucket with a folder-like representation.
Exceptions¶
Module to store all the custom exceptions and formatters.
>>> S3Error
- exception s3.exceptions.S3Error¶
Custom base exception for the s3-downloader module.
- exception s3.exceptions.BucketNotFound¶
Custom error for bucket not found.
- exception s3.exceptions.NoObjectFound¶
Custom error for no objects found.
- exception s3.exceptions.InvalidPrefix(prefix: str, bucket_name: str)¶
Custom exception for invalid prefix value.
Initializes an instance of the InvalidPrefix object, inherited from S3Error.
- Parameters:
prefix – Prefix to limit the objects.
bucket_name – Name of the S3 bucket.
- format_error_message()¶
Returns the formatted error message as a string.
Progress¶
- class s3.progress.ProgressPercentage(filename: str, size: int, bar: alive_bar)¶
Tracks the file transfer progress in S3 and updates the alive_bar.
>>> ProgressPercentage
Initializes the progress tracker.
- Parameters:
filename – Name of the file being transferred.
size – Total size of the file in bytes.
bar – alive_bar instance to update progress.
Squire¶
- s3.squire.refine_prefix(prefix: str | List[str] | None = None) Generator[str]¶
Refines the prefix input to ensure it is a list of strings.
- Parameters:
prefix – A string or a list of strings representing the prefix(es) to filter S3 objects.
- Yields:
str – Yields strings representing the refined prefix(es).
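The normalization this helper performs can be sketched as follows (illustrative only; the real implementation may differ, e.g. in how it trims slashes or deduplicates):

```python
from typing import Generator, List, Union

def refine_prefix_sketch(
    prefix: Union[str, List[str], None] = None
) -> Generator[str, None, None]:
    # Accept a single string, a list of strings, or nothing at all,
    # and yield each non-empty prefix.
    if prefix is None:
        return
    if isinstance(prefix, str):
        prefix = [prefix]
    for item in prefix:
        if item:
            yield item
```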
- s3.squire.size_converter(byte_size: int | float) str¶
Receives a byte size and converts it into a human-friendly format.
- Parameters:
byte_size – Receives byte size as argument.
- Returns:
Converted human understandable size.
- Return type:
str
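A common way to implement such a converter (a sketch; the module's exact unit labels and rounding are not specified in this reference):

```python
import math

def size_converter_sketch(byte_size: float) -> str:
    # Walk up the binary size units until the value fits under 1024.
    units = ["bytes", "KB", "MB", "GB", "TB", "PB"]
    if byte_size <= 0:
        return "0 bytes"
    index = min(int(math.log(byte_size, 1024)), len(units) - 1)
    return f"{byte_size / 1024 ** index:.2f} {units[index]}"
```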
- s3.squire.convert_to_folder_structure(sequence: Dict[str, int]) str¶
Convert objects in an S3 bucket into a folder-like representation including sizes.
- Parameters:
sequence – A dictionary where keys are S3 object keys (paths) and values are their sizes in bytes.
- Returns:
A string representing the folder structure of the S3 bucket, with each file and folder showing the size.
- Return type:
str
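One way to render such a mapping as an indented tree (illustrative; the module's actual layout and size formatting may differ):

```python
from typing import Dict

def folder_tree_sketch(sequence: Dict[str, int]) -> str:
    # Split each object key on "/" and emit one indented line per path
    # segment, printing a size next to files and a "/" next to folders.
    lines = []
    seen = set()
    for key in sorted(sequence):
        parts = key.split("/")
        for depth, part in enumerate(parts):
            node = tuple(parts[: depth + 1])
            if node in seen:
                continue
            seen.add(node)
            is_file = depth == len(parts) - 1
            suffix = f" ({sequence[key]} bytes)" if is_file else "/"
            lines.append("    " * depth + part + suffix)
    return "\n".join(lines)
```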
- s3.squire.format_bucket_structure(bucket_structure: Dict[str, int], convert_size: bool) Dict[str, Any]¶
Formats the bucket structure into a human-readable dictionary.
- Parameters:
bucket_structure – A dictionary where keys are S3 object keys (paths) and values are their sizes in bytes.
convert_size – A boolean indicating whether to convert sizes to human-readable format.
- Returns:
A dictionary representing the folder structure of the S3 bucket, with each file and folder showing the size.
- Return type:
Dict[str, Any]
- class s3.squire.S3Object(key: str, size: int)¶
Represents an S3 object with its key and size.
- key: str¶
- size: int¶
Logger¶
Loads a default logger with StreamHandler set to DEBUG mode.
>>> logging.Logger
- class s3.logger.LogType(value)¶
Defines the type of logging output.
>>> LogType
- file: str = 'file'¶
- stdout: str = 'stdout'¶
- s3.logger.default_handler(log_type: LogType) StreamHandler | FileHandler¶
Creates a handler and assigns a default format to it.
- Parameters:
log_type – An instance of the LogType enum to specify the type of logging output.
- Returns:
Returns an instance of either StreamHandler or FileHandler based on the specified log type.
- Return type:
Union[logging.StreamHandler, logging.FileHandler]
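The handler selection can be sketched with the standard logging module (illustrative; the real default format string and log filename are assumptions, not shown in this reference):

```python
import logging
from enum import Enum

class LogTypeSketch(str, Enum):
    file = "file"
    stdout = "stdout"

def default_handler_sketch(log_type: LogTypeSketch) -> logging.Handler:
    # Pick a FileHandler or StreamHandler based on the requested log type,
    # and attach a simple formatter (format string is an assumption).
    if log_type is LogTypeSketch.file:
        handler: logging.Handler = logging.FileHandler("s3_downloader.log")
    else:
        handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")
    )
    return handler
```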
- s3.logger.default_format() Formatter¶
Creates a logging Formatter with a custom message and datetime format.
- Returns:
Returns an instance of the Formatter object.
- Return type:
logging.Formatter