Welcome to Git2S3’s documentation!

Git2S3 - Main

class git2s3.main.Git2S3(env_file: str | os.PathLike = '.env', logger: Logger = None, max_per_page: int = 100)

Instantiates Git2S3 object to clone all repos/wiki/gists from GitHub and upload to S3.

>>> Git2S3
Keyword Arguments:
  • env_file – Environment configuration.

  • logger – Bring your own logger object.

  • max_per_page – Maximum number of repos to fetch per page.

profile_type() → str

Get the profile type.

Returns:

Returns the profile type.

Return type:

str

cli(cmd: str, fail: bool = True, retry: bool = False) → int

Runs CLI commands.

Parameters:
  • cmd – Command to run.

  • fail – Boolean flag to fail on errors.

Returns:

Return code after running the command.

Return type:

int
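
A minimal sketch of how a helper like this could behave, assuming the command is run through `subprocess` and that `fail=True` means a non-zero return code raises an error. The name `run_cli` and the use of `RuntimeError` are illustrative and not taken from Git2S3. The `retry` flag is left out because its behavior is not described here.

```python
import shlex
import subprocess
import sys


def run_cli(cmd: str, fail: bool = True) -> int:
    """Run a command and return its exit code (hypothetical helper)."""
    # shlex.split turns the command string into an argument list, so no shell is involved.
    result = subprocess.run(shlex.split(cmd), capture_output=True, text=True)
    if result.returncode != 0 and fail:
        # Assumption: "fail" means raising; the real method may behave differently.
        raise RuntimeError(f"{cmd!r} exited with code {result.returncode}")
    return result.returncode


# A command that exits with status 3, built from the current interpreter.
failing = f"{shlex.quote(sys.executable)} -c {shlex.quote('import sys; sys.exit(3)')}"
```

With `fail=False` the caller receives the return code and decides what to do with it; with the default, the error surfaces immediately.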

get_all(source: SourceControl) → Generator[Dict[str, str]]

Iterate through a target owner/organization to get all available repositories/gists.

Parameters:

source – Source type to clone.

Yields:

Generator[Dict[str, str]] – Yields a dictionary of each repo’s information.
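
The page-by-page iteration can be sketched as a generator. This is an illustration only: `fetch_page` is a hypothetical callable standing in for the HTTP request, and treating a short page as the last page is an assumption about how paging ends.

```python
from typing import Callable, Dict, Generator, List


def iterate_pages(
    fetch_page: Callable[[int, int], List[Dict[str, str]]],
    max_per_page: int = 100,
) -> Generator[Dict[str, str], None, None]:
    """Yield items page by page until a page comes back short or empty."""
    page = 1
    while True:
        items = fetch_page(page, max_per_page)
        yield from items
        if len(items) < max_per_page:
            # A short page is treated as the last one (an assumption).
            break
        page += 1


# Stand-in for an HTTP call: five fake repositories served in pages.
FAKE = [{"name": f"repo-{i}"} for i in range(5)]


def fake_fetch(page: int, per_page: int) -> List[Dict[str, str]]:
    start = (page - 1) * per_page
    return FAKE[start:start + per_page]
```

Because the function is a generator, callers can start working on the first repositories before the last page has been requested.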

set_pat(url: Union[str, Url]) → Optional[Union[str, Url]]

Creates an authenticated URL by updating the netloc, and sets that as the origin URL.

Parameters:

url – Takes the repository/gist/wiki URL as input.

See also

  • This step is not required for:
    • Public repositories/gists/wiki
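
Updating the netloc can be sketched with the standard library. The helper name `with_token` and the `owner:token@host` layout are assumptions made for illustration; the sketch covers only the URL rewrite, not setting the origin.

```python
from urllib.parse import urlsplit, urlunsplit


def with_token(url: str, owner: str, token: str) -> str:
    """Return the URL with 'owner:token@' prepended to its network location."""
    parts = urlsplit(url)
    # SplitResult is a named tuple, so _replace returns a modified copy.
    authed = parts._replace(netloc=f"{owner}:{token}@{parts.netloc}")
    return urlunsplit(authed)
```

For example, `with_token("https://github.com/octocat/hello.git", "octocat", "TOKEN")` produces a URL whose host part reads `octocat:TOKEN@github.com`, while the scheme and path stay unchanged.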

clone_wiki(datastore: DataStore) → None

Clone all the wikis from the repository.

Parameters:

datastore – DataStore model to store repository/gist information.

worker(repo: Dict[str, str]) → None

Clones repository/gist/wiki from GitHub.

Parameters:

repo – Repository information as JSON payload.

Raises:

Exception – If the thread fails to clone the repository.

cloner(source: SourceControl) → bool

Clones all the repos/gists concurrently.

Parameters:

source – Source type to clone.

See also

  • Clones all the repos/gists concurrently using ThreadPoolExecutor.

  • GitHub doesn’t have a rate limit for cloning, so multi-threading is safe.

  • Cloning requires Git to be installed on the host machine.

References

https://github.com/orgs/community/discussions/44515

Returns:

Returns a boolean flag to indicate if any of the threads failed.

Return type:

bool
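
The concurrent pattern described above can be sketched with `ThreadPoolExecutor`. The names `run_concurrently` and `fake_worker` are hypothetical, and returning `True` when at least one worker raised is an assumption about the meaning of the flag.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def run_concurrently(
    worker: Callable[[Dict[str, str]], None],
    repos: Iterable[Dict[str, str]],
) -> bool:
    """Run worker on every repo in a thread pool; return True if any call raised."""
    any_failed = False
    with ThreadPoolExecutor() as executor:
        futures = {executor.submit(worker, repo): repo for repo in repos}
        for future in as_completed(futures):
            if future.exception() is not None:
                # The worker raised; remember it but let the other threads finish.
                any_failed = True
    return any_failed


def fake_worker(repo: Dict[str, str]) -> None:
    # Stand-in for a clone: fails for one specific name.
    if repo["name"] == "broken":
        raise Exception("clone failed")
```

Collecting the failure as a flag, instead of re-raising inside the loop, lets every remaining clone finish before the caller decides whether to continue.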

start() → None

Start the cloning process and upload to S3 once cloning completes successfully.

S3

class git2s3.s3.Uploader(env: EnvConfig, logger: Logger)

Concurrent uploader object to upload files to S3.

>>> Uploader
Keyword Arguments:
  • env – Environment configuration.

  • logger – Logger object.

upload_file(local_file_path: str | os.PathLike, s3_file_path: str | os.PathLike) → None

Uploads an object to S3.

Parameters:
  • local_file_path – Local file path to upload from.

  • s3_file_path – S3 file path to upload to.

trigger() → int

Trigger to upload all file objects concurrently to S3.

Returns:

Returns the number of files that failed to upload.

Return type:

int
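
A rough sketch of a concurrent upload that reports a failure count. The `upload` callable is a hypothetical stand-in for the actual S3 call, and the `prefix/relative/path` key layout is an assumption, not the documented behavior of `Uploader`.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def upload_all(root: str, prefix: str, upload: Callable[[str, str], None]) -> int:
    """Upload every file under root concurrently and return the number of failures."""
    pairs: List[Tuple[str, str]] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            local = os.path.join(dirpath, name)
            # Object keys use forward slashes regardless of the local separator.
            relative = os.path.relpath(local, root).replace(os.sep, "/")
            pairs.append((local, f"{prefix}/{relative}"))

    def attempt(pair: Tuple[str, str]) -> bool:
        try:
            upload(*pair)
            return True
        except Exception:
            return False

    with ThreadPoolExecutor() as executor:
        results = list(executor.map(attempt, pairs))
    return results.count(False)
```

A return value of `0` means every file was handed to `upload` without an exception.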

Squire

git2s3.squire.archer(destination: str) → None

Archives a given directory and deletes it while retaining the zipfile.

Parameters:

destination – Directory path to be archived.

Raises:

AssertionError – If zipfile is not present after archiving.
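
One way to archive a directory and then remove it, sketched with `shutil`. The function name `archive_and_remove` is hypothetical, and placing the zip file next to the directory is an assumption.

```python
import os
import shutil


def archive_and_remove(destination: str) -> str:
    """Zip a directory, delete the directory, and return the zip file path."""
    # make_archive appends '.zip' to base_name and returns the archive path.
    zip_path = shutil.make_archive(
        base_name=destination, format="zip", root_dir=destination
    )
    # Verify the archive exists before the original directory is deleted.
    assert os.path.isfile(zip_path), f"{zip_path} was not created"
    shutil.rmtree(destination)
    return zip_path
```

Checking for the archive before calling `rmtree` ensures the source directory is never removed unless its zip file is in place.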

git2s3.squire.env_loader(filename: str | os.PathLike) → EnvConfig

Loads environment variables based on filetypes.

Parameters:

filename – Name of the file from which env vars are loaded.

Returns:

Returns a reference to the EnvConfig object.

Return type:

config.EnvConfig
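
Loading by file type can be sketched as a dispatch on the file extension. The set of supported formats is not listed here, so the two handled below (`.json` and dotenv-style `KEY=VALUE` text) are assumptions, and `load_env` is a hypothetical name that returns a plain dict instead of an `EnvConfig`.

```python
import json
from pathlib import Path
from typing import Dict


def load_env(filename: str) -> Dict[str, str]:
    """Read key/value settings from a .json file or a dotenv-style text file."""
    path = Path(filename)
    text = path.read_text()
    if path.suffix.lower() == ".json":
        return {str(k): str(v) for k, v in json.loads(text).items()}
    # Fallback: treat the file as KEY=VALUE lines, skipping blanks and comments.
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Surrounding quotes around the value are dropped.
        settings[key.strip()] = value.strip().strip("'\"")
    return settings
```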

git2s3.squire.source_detector(repo: Dict[str, str], env: EnvConfig) → DataStore

Detects the type of source to clone and returns the DataStore model.

Parameters:
  • repo – Repository information as a dict.

  • env – Environment configuration.

Returns:

DataStore model.

Return type:

config.DataStore

git2s3.squire.default_logger(env: EnvConfig) → Logger

Generates a default console logger.

Parameters:

env – Environment configuration.

Returns:

Logger object.

Return type:

logging.Logger
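
A console logger of this kind can be sketched with the standard `logging` module. The logger name, the format string, and the `debug` parameter (standing in for the `debug` setting of the environment configuration) are assumptions for illustration.

```python
import logging
import sys


def console_logger(name: str = "git2s3-sketch", debug: bool = False) -> logging.Logger:
    """Return a logger that writes to stdout, creating its handler only once."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    if not logger.handlers:
        # Guard against duplicate handlers when the function is called repeatedly.
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")
        )
        logger.addHandler(handler)
    return logger
```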

git2s3.squire.check_file_presence(source_dir: str | os.PathLike) → int

Get a list of all subdirectories and check for file presence.

Parameters:

source_dir – Root directory to check for file presence.

Returns:

Returns the total number of zip files cloned.

Return type:

int
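
Counting zip files across all subdirectories can be sketched with `os.walk`. The name `count_zip_files` is hypothetical, and matching on the `.zip` extension is an assumption about how files are recognized.

```python
import os


def count_zip_files(source_dir: str) -> int:
    """Walk source_dir recursively and count files ending in .zip."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(source_dir):
        total += sum(1 for name in filenames if name.lower().endswith(".zip"))
    return total
```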

Configuration

class git2s3.config.DataStore(BaseModel)

DataStore model to store repository/gist information.

>>> DataStore
source: SourceControl
clone_url: Url
name: str
description: Optional[str]
private: bool

class git2s3.config.EnvConfig(BaseSettings)

Configure all env vars and validate using pydantic.

>>> EnvConfig
git_api_url: Url
git_owner: str
git_token: str
git_ignore: List[str]
incomplete_upload: bool
source: Union[SourceControl, List[SourceControl]]
log: LogOptions
debug: bool
local_store: bool
aws_profile_name: str | None
aws_access_key_id: str | None
aws_secret_access_key: str | None
aws_region_name: str | None
aws_bucket_name: str
aws_s3_prefix: str
boto3_retry_attempts: int
boto3_retry_mode: Boto3RetryMode
classmethod from_env_file(filename: Path) → EnvConfig

Create an instance of EnvConfig from environment file.

Parameters:

filename – Name of the env file.

See also

  • Loading environment variables from files is an additional feature.

  • Both the system’s and session’s env vars are processed by default.

Returns:

An instance of the EnvConfig model.

Return type:

EnvConfig

classmethod parse_source(value: Union[SourceControl, List[SourceControl]]) → Path

Validate and parse ‘source’ to remove ‘all’ from the source option.

classmethod parse_git_api_url(value: Url) → str

Parse git_api_url, stripping the trailing /.

classmethod parse_git_ignore(value: List[str]) → List[str]

Convert all git_ignore values to lowercase.
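
The three validators above can be sketched as plain functions. `Source` is a local stand-in mirroring the documented `SourceControl` values, and expanding `all` into every concrete source is an assumption about what removing `all` means. The real validators are pydantic classmethods, which this sketch does not reproduce.

```python
from enum import Enum
from typing import List, Union


class Source(str, Enum):
    """Local stand-in mirroring the documented SourceControl values."""

    all = "all"
    gist = "gist"
    repo = "repo"
    wiki = "wiki"


def parse_source(value: Union[Source, List[Source]]) -> List[Source]:
    """Expand 'all' into every concrete source; otherwise return the values as a list."""
    values = value if isinstance(value, list) else [value]
    if Source.all in values:
        return [s for s in Source if s is not Source.all]
    return values


def parse_git_api_url(value: str) -> str:
    """Drop trailing slashes so paths can be appended with a single '/'."""
    return value.rstrip("/")


def parse_git_ignore(value: List[str]) -> List[str]:
    """Lowercase every entry so later comparisons are case-insensitive."""
    return [item.lower() for item in value]
```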

class Config

Environment variables configuration.

env_prefix = ''
extra = 'allow'
hide_input_in_errors = True

class git2s3.config.LogOptions(StrEnum)

Available log options for default logger.

>>> LogOptions
stdout: str = 'stdout'
file: str = 'file'

class git2s3.config.SourceControl(StrEnum)

Available source control options to clone.

>>> SourceControl
all: str = 'all'
gist: str = 'gist'
repo: str = 'repo'
wiki: str = 'wiki'

Exceptions

exception git2s3.exc.DirectoryExists

Warning: Raised when clone directory already exists.

exception git2s3.exc.UnsupportedSource

Warning: Raised when source is not supported.

exception git2s3.exc.Git2S3Error

Exception: Base class for all exceptions.

exception git2s3.exc.GitHubAPIError

Exception: Raised when fetching repositories from source control fails.

exception git2s3.exc.InvalidOwner

Exception: Raised when owner is invalid.

exception git2s3.exc.InvalidSource

Exception: Raised when source is invalid.

exception git2s3.exc.ArchiveError

Exception: Raised when archiving repositories fails.

exception git2s3.exc.UploadError

Exception: Raised when uploading file objects to S3 fails.
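
A sketch of how a caller might separate the two families listed above. The classes here are local stand-ins defined only for this example. That `UploadError` derives from `Git2S3Error` and that `DirectoryExists` derives from `Warning` is inferred from the descriptions and should be treated as an assumption.

```python
from typing import Callable


class Git2S3Error(Exception):
    """Stand-in for the documented base class."""


class UploadError(Git2S3Error):
    """Stand-in for an error derived from the base class (assumed)."""


class DirectoryExists(Warning):
    """Stand-in for a warning-type exception (assumed)."""


def handle(action: Callable[[], None]) -> str:
    """Run action and classify the outcome by exception family."""
    try:
        action()
    except Git2S3Error as error:
        # One clause covers every error derived from the base class.
        return f"error: {type(error).__name__}"
    except Warning as warning:
        return f"warning: {type(warning).__name__}"
    return "ok"
```

Catching the base class keeps the calling code short while still allowing a narrower `except` clause when a specific failure needs separate treatment.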
