Welcome to Git2S3’s documentation!

Git2S3 - Main

class git2s3.main.Git2S3(env_file: str | os.PathLike = '.env', logger: Logger = None)

Instantiates Git2S3 object to clone all repos/wiki/gists from GitHub and upload to S3.

>>> Git2S3
Keyword Arguments:
  • env_file – Environment configuration.

  • logger – Bring your own logger object.

  • max_per_page – Maximum number of repos to fetch per page.

profile_type() → str

Get the profile type.

Returns:

Returns the profile type.

Return type:

str

cli(cmd: str, fail: bool = True, retry: bool = False) → int

Runs CLI commands.

Parameters:
  • cmd – Command to run.

  • fail – Boolean flag to fail on errors.

  • retry – Boolean flag to indicate that it’s a retry attempt.

Returns:

Return code after running the command.

Return type:

int
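As a rough illustration of the return-code contract described above, the following sketch wraps subprocess.run. The helper name run_command is hypothetical, the retry flag is left out because its behavior is not documented here, and raising RuntimeError when fail is set is an assumption, not necessarily what Git2S3.cli does.

```python
import shlex
import subprocess
import sys


def run_command(cmd: str, fail: bool = True) -> int:
    """Run a command and return its exit code (hypothetical helper)."""
    # Split the string into argv so that no shell is involved.
    result = subprocess.run(shlex.split(cmd), capture_output=True, text=True)
    if result.returncode != 0 and fail:
        # Assumption: a failing command raises when ``fail`` is True.
        raise RuntimeError(
            f"{cmd!r} exited with code {result.returncode}: {result.stderr.strip()}"
        )
    return result.returncode


# Usage: ask the current interpreter for its version.
version_exit_code = run_command(f"{shlex.quote(sys.executable)} --version")
```

With fail=False the caller receives the non-zero code and decides what to do with it.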

get_all(source: SourceControl) → Generator[Dict[str, str]]

Iterate through a target owner/organization to get all available repositories/gists.

Parameters:

source – Source type to clone.

Yields:

Generator[Dict[str, str]] – Yields a dictionary of each repo’s information.
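The page-by-page iteration described above can be sketched as a generator. This is a minimal sketch, not the library's implementation: iter_all, fetch_page, and fake_fetch are hypothetical names, and stopping on the first short page is an assumption about how the end of the listing is detected.

```python
from typing import Callable, Dict, Iterator, List

Page = List[Dict[str, str]]


def iter_all(
    fetch_page: Callable[[int, int], Page], per_page: int = 100
) -> Iterator[Dict[str, str]]:
    """Yield every item from a paginated listing, one dict at a time."""
    page = 1
    while True:
        items = fetch_page(page, per_page)
        yield from items
        # Assumption: a page shorter than ``per_page`` is the last one.
        if len(items) < per_page:
            break
        page += 1


def fake_fetch(page: int, per_page: int) -> Page:
    """Stand-in for an HTTP call; serves five made-up repositories."""
    data = [{"name": f"repo-{i}"} for i in range(5)]
    start = (page - 1) * per_page
    return data[start:start + per_page]


names = [repo["name"] for repo in iter_all(fake_fetch, per_page=2)]
```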

set_pat(url: str | pydantic.networks.HttpUrl) → str | pydantic.networks.HttpUrl | None

Creates an authenticated URL by updating the netloc, and sets that as the origin URL.

Parameters:

url – Takes the repository/gist/wiki URL as input.

See also

  • This step is not required for:
    • Public repositories/gists/wiki
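Updating the netloc of a URL can be illustrated with the standard library. This is a minimal sketch: with_token is a hypothetical helper, and placing the token in the user-info part of the netloc (token@host) is an assumption about the URL form, not a statement of what set_pat produces.

```python
from urllib.parse import urlsplit, urlunsplit


def with_token(url: str, token: str) -> str:
    """Return ``url`` with ``token`` embedded in its netloc."""
    parts = urlsplit(url)
    # Drop any credentials already present before adding the token.
    host = parts.netloc.rsplit("@", 1)[-1]
    return urlunsplit(parts._replace(netloc=f"{token}@{host}"))


authenticated = with_token("https://github.com/owner/repo.git", "TOKEN")
```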

clone_wiki(datastore: DataStore) → None

Clone all the wikis from the repository.

Parameters:

datastore – DataStore model to store repository/gist information.

worker(source: Dict[str, str]) → None

Clones repository/gist/wiki from GitHub.

Parameters:

source – Repository/Gist information as JSON payload.

Raises:

Exception – If the thread fails to clone the repository.

cloner(source: SourceControl) → bool

Clones all the repos/gists concurrently.

Parameters:

source – Source type to clone.

See also

  • Clones all the repos/gists concurrently using ThreadPoolExecutor.

  • GitHub doesn’t have a rate limit for cloning, so multi-threading is safe.

  • Cloning therefore requires Git to be installed on the host machine.

References

https://github.com/orgs/community/discussions/44515

Returns:

Returns a boolean flag to indicate if any of the threads failed.

Return type:

bool
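The ThreadPoolExecutor pattern mentioned above might look roughly like this. The names run_concurrently and fake_worker are hypothetical, and returning True when at least one worker raised is one reading of the documented return value, not a confirmed detail.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def run_concurrently(
    worker: Callable[[Dict[str, str]], None],
    sources: Iterable[Dict[str, str]],
    max_workers: int = 4,
) -> bool:
    """Run ``worker`` for each source; return True if any call raised."""
    any_failed = False
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(worker, source): source for source in sources}
        for future in as_completed(futures):
            # ``exception()`` returns None when the call succeeded.
            if future.exception() is not None:
                any_failed = True
    return any_failed


def fake_worker(source: Dict[str, str]) -> None:
    """Stand-in for a clone; fails for one made-up repository name."""
    if source["name"] == "broken":
        raise RuntimeError("clone failed")


failed = run_concurrently(fake_worker, [{"name": "ok"}, {"name": "broken"}])
```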

start() → None

Start the cloning process and upload to S3 once cloning completes successfully.

S3

class git2s3.s3.Uploader(env: EnvConfig, logger: Logger)

Concurrent uploader object to upload files to S3.

>>> Uploader
Keyword Arguments:
  • env – Environment configuration.

  • logger – Logger object.

upload_file(local_file_path: str | os.PathLike, s3_file_path: str | os.PathLike) → None

Uploads an object to S3.

Parameters:
  • local_file_path – Local file path to upload from.

  • s3_file_path – S3 file path to upload to.

trigger() → int

Trigger to upload all file objects concurrently to S3.

Returns:

Returns the number of files that failed to upload.

Return type:

int

Squire

git2s3.squire.archer(destination: str) → None

Archives a given directory and deletes it while retaining the zipfile.

Parameters:

destination – Directory path to be archived.

Raises:

AssertionError – If zipfile is not present after archiving.
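The archive-then-delete behavior can be sketched with shutil. This is a minimal sketch under assumptions: archive_and_remove is a hypothetical name, the zip is written next to the directory, and the path is returned for convenience (archer itself returns None).

```python
import os
import shutil
import tempfile


def archive_and_remove(destination: str) -> str:
    """Zip ``destination``, verify the zip exists, then delete the directory."""
    destination = os.path.normpath(destination)
    # Writes ``<destination>.zip`` alongside the directory.
    zip_path = shutil.make_archive(destination, "zip", root_dir=destination)
    assert os.path.isfile(zip_path), f"{zip_path} is missing after archiving"
    shutil.rmtree(destination)
    return zip_path


# Usage with a throwaway directory.
workdir = tempfile.mkdtemp()
repo_dir = os.path.join(workdir, "example-repo")
os.makedirs(repo_dir)
with open(os.path.join(repo_dir, "README.md"), "w") as fh:
    fh.write("hello")
zip_path = archive_and_remove(repo_dir)
```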

git2s3.squire.env_loader(filename: str | os.PathLike) → EnvConfig

Loads environment variables based on filetypes.

Parameters:

filename – Filename from where env vars have to be loaded.

Returns:

Returns a reference to the EnvConfig object.

Return type:

config.EnvConfig

git2s3.squire.source_detector(source: Dict[str, Any], env: EnvConfig) → DataStore

Detects the type of source to clone and returns the DataStore model.

Parameters:
  • source – Repository/Gist information as a dict.

  • env – Environment configuration.

Returns:

DataStore model.

Return type:

config.DataStore

git2s3.squire.default_logger(env: EnvConfig) → Logger

Generates a default console logger.

Parameters:

env – Environment configuration.

Returns:

Logger object.

Return type:

logging.Logger

git2s3.squire.check_file_presence(source_dir: str | os.PathLike) → int

Get a list of all subdirectories and check for file presence.

Parameters:

source_dir – Root directory to check for file presence.

Returns:

Returns the total number of zip files cloned.

Return type:

int
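Counting zip files under a root directory could be done along these lines. The helper name count_zip_files and the sample layout are hypothetical; matching on the .zip suffix is an assumption about how archives are identified.

```python
import os
import tempfile


def count_zip_files(source_dir: str) -> int:
    """Walk ``source_dir`` recursively and count files ending in .zip."""
    total = 0
    for _root, _dirs, files in os.walk(source_dir):
        total += sum(1 for name in files if name.endswith(".zip"))
    return total


# Usage with a throwaway layout: two archives and one unrelated file.
root = tempfile.mkdtemp()
for relative in ("repo/a.zip", "gist/b.zip", "gist/notes.txt"):
    path = os.path.join(root, relative)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "w").close()
zip_count = count_zip_files(root)
```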

git2s3.squire.is_within_last_n_days(timestamp_str: str, n_days: int) → bool

Check if an ISO 8601 timestamp is within the last n days.

Parameters:
  • timestamp_str – The ISO 8601 formatted timestamp string (e.g., “2025-08-25T16:42:10Z”).

  • n_days – Number of days to look back from the current UTC time.

Returns:

True if the timestamp is within the last n_days, False otherwise.

Return type:

bool

git2s3.squire.is_older_than_n_days(timestamp_str: str, n_days: int) → bool

Check if an ISO 8601 timestamp is older than n days.

Parameters:
  • timestamp_str – The ISO 8601 formatted timestamp string (e.g., “2025-08-25T16:42:10Z”).

  • n_days – Number of days to compare against from the current UTC time.

Returns:

True if the timestamp is older than n_days, False otherwise.

Return type:

bool
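Both timestamp checks reduce to comparing an age against a timedelta. The sketch below is illustrative only: the function names, the optional now argument (added here so the result is reproducible), and the treatment of the exact boundary are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_iso8601(timestamp_str: str) -> datetime:
    """Parse an ISO 8601 string; a trailing 'Z' is read as UTC."""
    parsed = datetime.fromisoformat(timestamp_str.replace("Z", "+00:00"))
    # Assumption: timestamps without an offset are treated as UTC.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def within_last_n_days(
    timestamp_str: str, n_days: int, now: Optional[datetime] = None
) -> bool:
    now = now or datetime.now(timezone.utc)
    return now - parse_iso8601(timestamp_str) <= timedelta(days=n_days)


def older_than_n_days(
    timestamp_str: str, n_days: int, now: Optional[datetime] = None
) -> bool:
    now = now or datetime.now(timezone.utc)
    return now - parse_iso8601(timestamp_str) > timedelta(days=n_days)


# Usage with a fixed reference time, about 6.3 days after the timestamp.
reference = datetime(2025, 9, 1, tzinfo=timezone.utc)
recent = within_last_n_days("2025-08-25T16:42:10Z", 7, now=reference)
stale = older_than_n_days("2025-08-25T16:42:10Z", 5, now=reference)
```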

Configuration

class git2s3.config.DataStore(BaseModel)

DataStore model to store repository/gist information.

>>> DataStore
source: SourceControl
clone_url: HttpUrl
name: str
description: Optional[str]
private: bool

class git2s3.config.EnvConfig(BaseSettings)

Configure all env vars and validate using pydantic.

>>> EnvConfig
git_api_url: HttpUrl
git_owner: str
git_token: str
git_ignore: List[str]
max_per_page: int
backup_dir: Path
source: List[SourceControl]
log: LogOptions
debug: bool
dry_run: bool
local_store: bool
incomplete_upload: bool
aws_bucket_name: str
aws_profile_name: str | None
aws_access_key_id: str | None
aws_secret_access_key: str | None
aws_region_name: str | None
aws_s3_prefix: str
boto3_retry_attempts: int
boto3_retry_mode: Boto3RetryMode
cut_off_days: Optional[int]

classmethod from_env_file(filename: Path) → EnvConfig

Create an instance of EnvConfig from environment file.

Parameters:

filename – Name of the env file.

See also

  • Loading environment variables from files is an additional feature.

  • Both the system’s and session’s env vars are processed by default.

Returns:

Loads the EnvConfig model.

Return type:

EnvConfig

classmethod parse_source(value: List[SourceControl]) → Path

Validate and parse ‘source’ to remove ‘all’ from the source option.

classmethod parse_git_api_url(value: HttpUrl) → str

Parse git_api_url stripping the / at the end.

classmethod parse_git_ignore(value: List[str]) → List[str]

Convert all git_ignore values to lowercase.

class Config

Environment variables configuration.

env_prefix = ''
extra = 'allow'
hide_input_in_errors = True
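The normalization performed by parse_git_api_url and parse_git_ignore, as described above, can be illustrated with plain functions. These are hypothetical stand-ins without the pydantic wiring, and the sample values are made up.

```python
from typing import List


def strip_trailing_slash(url: str) -> str:
    """Remove any '/' characters from the end of the URL."""
    return url.rstrip("/")


def lowercase_all(values: List[str]) -> List[str]:
    """Lowercase every entry so later comparisons ignore case."""
    return [value.lower() for value in values]


api_url = strip_trailing_slash("https://api.github.com/")
ignored = lowercase_all(["Dotfiles", "LEGACY-repo"])
```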

class git2s3.config.LogOptions(StrEnum)

Available log options for default logger.

>>> LogOptions
stdout: str = 'stdout'
file: str = 'file'

class git2s3.config.SourceControl(StrEnum)

Available source control options to clone.

>>> SourceControl
gist: str = 'gist'
repo: str = 'repo'
wiki: str = 'wiki'

Exceptions

exception git2s3.exc.DirectoryExists

Warning: Raised when clone directory already exists.

exception git2s3.exc.UnsupportedSource

Warning: Raised when source is not supported.

exception git2s3.exc.Git2S3Error

Exception: Base class for all exceptions.

exception git2s3.exc.GitHubAPIError

Exception: Raised when fetching repositories from source control fails.

exception git2s3.exc.InvalidOwner

Exception: Raised when owner is invalid.

exception git2s3.exc.InvalidSource

Exception: Raised when source is invalid.

exception git2s3.exc.ArchiveError

Exception: Raised when archiving repositories fails.

exception git2s3.exc.UploadError

Exception: Raised when uploading file objects to S3 fails.
