atmo.clusters

The code base to manage AWS EMR clusters.

atmo.clusters.forms

class atmo.clusters.forms.EMRReleaseChoiceField(*args, **kwargs)[source]

A ModelChoiceField subclass that uses EMRRelease objects for the choices and automatically uses a “radioset” rendering – a horizontal button group for easier selection.

label_from_instance(obj)[source]

Append the status of the EMR release if it’s experimental or deprecated.

class atmo.clusters.forms.ExtendClusterForm(*args, **kwargs)[source]
class atmo.clusters.forms.NewClusterForm(*args, **kwargs)[source]

A form used for creating new clusters.

Parameters:
  • identifier (RegexField) – A unique identifier for your cluster, visible in the AWS management console. (Lowercase, use hyphens instead of spaces.)
  • size (IntegerField) – Number of workers to use in the cluster, between 1 and 30. For testing or development 1 is recommended.
  • lifetime (IntegerField) – Lifetime in hours after which the cluster is automatically terminated, between 2 and 24.
  • ssh_key (ModelChoiceField) – Ssh key
  • emr_release (EMRReleaseChoiceField) – Different AWS EMR versions have different versions of software like Hadoop, Spark, etc. See what’s new in each.

atmo.clusters.models

class atmo.clusters.models.Cluster(id, created_at, modified_at, created_by, emr_release, identifier, size, lifetime, lifetime_extension_count, ssh_key, expires_at, started_at, ready_at, finished_at, jobflow_id, most_recent_status, master_address, expiration_mail_sent)[source]
Parameters:
  • id (AutoField) – Id
  • created_at (DateTimeField) – Created at
  • modified_at (DateTimeField) – Modified at
  • created_by_id (ForeignKey to User) – User that created the instance.
  • emr_release_id (ForeignKey to EMRRelease) – Different AWS EMR versions have different versions of software like Hadoop, Spark, etc. See what’s new in each.
  • identifier (CharField) – Cluster name, used to non-uniqely identify individual clusters.
  • size (IntegerField) – Number of computers used in the cluster.
  • lifetime (PositiveSmallIntegerField) – Lifetime of the cluster after which it’s automatically terminated, in hours.
  • lifetime_extension_count (PositiveSmallIntegerField) – Number of lifetime extensions.
  • ssh_key_id (ForeignKey to SSHKey) – SSH key to use when launching the cluster.
  • expires_at (DateTimeField) – Date/time that the cluster will expire and automatically be deleted.
  • started_at (DateTimeField) – Date/time when the cluster was started on AWS EMR.
  • ready_at (DateTimeField) – Date/time when the cluster was ready to run steps on AWS EMR.
  • finished_at (DateTimeField) – Date/time when the cluster was terminated or failed on AWS EMR.
  • jobflow_id (CharField) – AWS cluster/jobflow ID for the cluster, used for cluster management.
  • most_recent_status (CharField) – Most recently retrieved AWS status for the cluster.
  • master_address (CharField) – Public address of the master node.This is only available once the cluster has bootstrapped
  • expiration_mail_sent (BooleanField) – Whether the expiration mail were sent.
exception DoesNotExist
exception MultipleObjectsReturned
deactivate()[source]

Shutdown the cluster and update its status accordingly

extend(hours)[source]

Extend the cluster lifetime by the given number of hours.

info

Returns the provisioning information for the cluster.

is_active

Returns whether the cluster is active or not.

is_expiring_soon

Returns whether the cluster is expiring in the next hour.

is_failed

Returns whether the cluster has failed or not.

is_ready

Returns whether the cluster is ready or not.

is_terminated

Returns whether the cluster is terminated or not.

is_terminating

Returns whether the cluster is terminating or not.

save(*args, **kwargs)[source]

Insert the cluster into the database or update it if already present, spawning the cluster if it’s not already spawned.

sync(info=None)[source]

Should be called to update latest cluster status in self.most_recent_status.

class atmo.clusters.models.EMRRelease(created_at, modified_at, version, changelog_url, help_text, is_active, is_experimental, is_deprecated)[source]
Parameters:
  • created_at (DateTimeField) – Created at
  • modified_at (DateTimeField) – Modified at
  • version (CharField) – Version
  • changelog_url (TextField) – The URL of the changelog with details about the release.
  • help_text (TextField) – Optional help text to show for users when creating a cluster.
  • is_active (BooleanField) – Whether this version should be shown to the user at all.
  • is_experimental (BooleanField) – Whether this version should be shown to users as experimental.
  • is_deprecated (BooleanField) – Whether this version should be shown to users as deprecated.
exception DoesNotExist
exception MultipleObjectsReturned

atmo.clusters.provisioners

class atmo.clusters.provisioners.ClusterProvisioner[source]

The cluster specific provisioner.

format_info(cluster)[source]

Formats the data returned by the EMR API for internal ATMO use.

format_list(cluster)[source]

Formats the data returned by the EMR API for internal ATMO use.

info(jobflow_id)[source]

Returns the cluster info for the cluster with the given Jobflow ID with the fields start time, state and public IP address

job_flow_params(*args, **kwargs)[source]

Given the parameters returns the extended parameters for EMR job flows for on-demand cluster.

list(created_after, created_before=None)[source]

Returns a list of cluster infos in the given time frame with the fields: - Jobflow ID - state - start time

start(user_username, user_email, identifier, emr_release, size, public_key)[source]

Given the parameters spawns a cluster with the desired properties and returns the jobflow ID.

stop(jobflow_id)[source]

Stops the cluster with the given JobFlow ID.

atmo.clusters.queries

class atmo.clusters.queries.ClusterQuerySet(model=None, query=None, using=None, hints=None)[source]

A Django queryset that filters by cluster status.

Used by the Cluster model.

active()[source]

The clusters that have an active status.

failed()[source]

The clusters that have an failed status.

terminated()[source]

The clusters that have an terminated status.

class atmo.clusters.queries.EMRReleaseQuerySet(model=None, query=None, using=None, hints=None)[source]

A Django queryset for the EMRRelease model.

active()[source]
deprecated()[source]

The EMR releases that are deprecated.

experimental()[source]

The EMR releases that are considered experimental.

natural_sort_by_version()[source]

Sorts this queryset by the EMR version naturally (human-readable).

stable()[source]

The EMR releases that are considered stable.

atmo.clusters.tasks

Task: atmo.clusters.tasks.deactivate_clusters

Deactivate clusters that have been expired.

Task: atmo.clusters.tasks.send_expiration_mails

Send expiration emails an hour before the cluster expires.

Task: atmo.clusters.tasks.update_clusters

Update the cluster metadata from AWS for the pending clusters.

  • To be used periodically.
  • Won’t update state if not needed.
  • Will queue updating the Cluster’s public IP address if needed.
Task: atmo.clusters.tasks.update_master_address(cluster_id, force=False)

Update the public IP address for the cluster with the given cluster ID

atmo.clusters.views

atmo.clusters.views.detail_cluster(request, id)[source]

View to show details about an existing cluster.

atmo.clusters.views.extend_cluster(request, id)[source]

View to extend the lifetime an existing cluster.

atmo.clusters.views.new_cluster(request)[source]

View to create a new cluster.

atmo.clusters.views.terminate_cluster(request, id)[source]

View to terminate an existing cluster.