Troubleshooting Tips

Guidance on troubleshooting common issues that may arise when using Nucleator


Using the CloudFormation Console with Nucleator

Most Nucleator Stacksets execute one or more CloudFormation templates, which result in the creation of one or more CloudFormation stacks in AWS.  This page documents tips for using the CloudFormation Console with Nucleator.

Using Nucleator with Windows

Nucleator runs on Unix-based systems.  If you would like to use it on a Windows machine, this page documents tips we have learned for either running Virtual Box on Windows or accessing a Linux instance in AWS.

Account Setup Fails Due to S3 Bucket Creation Error

nucleator account setup creates S3 Buckets used by Nucleator and AWS services. These buckets typically have a long lifespan - deletion and recreation of account setup resources are very infrequent events.  Should you delete the setup-account stack, CloudFormation will delete the S3 buckets that were created as part of  nucleator account setup.  The names of buckets that are deleted in S3 do not become immediately available for re-use.  Because S3 bucket names must be unique across all AWS regions, time is required to synchronize available names across all regions, globally.  The delay for such a name to become available again can be substantial – in our observation it is typically measured in hours.  To ensure that the setup-account stack can be deleted and then immediately recreated, Nucleator adds a "uuid-like" string as a suffix to S3 buckets created by nucleator account setup.  This suffix, once created, is persisted in ~/.nucleator/nucleator-<account>-<customer_domain>.  If a suffix is already present, it will continue to be used.  If not, a new suffix is generated and persisted.

Deletion of the setup-account Cloudformation stack does not delete the uuid-file in ~/.nucleator/nucleator-<account>-<customer_domain>.  Unless this file is removed, subsequent runs of nucleator account setup will specify the same bucket name as was just deleted in the resulting CloudFormation template.  When CloudFormation attempts to create the bucket, bucket creation may fail because the bucket name has not yet become available after its recent deletion.  If the file is removed, Nucleator will regenerate it with a different uuid-like bucketname suffix.  This enables the bucket to be created immediately.
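The suffix behavior described above can be sketched as follows. This is an illustrative sketch only, not Nucleator's actual implementation; the function name and suffix length are assumptions:

```python
import os
import uuid

def get_bucket_suffix(path):
    """Return a persisted uuid-like bucket-name suffix, generating one on first use.

    If the file at `path` already exists, its contents are reused so bucket
    names stay stable across runs. Otherwise a new suffix is generated and
    persisted. (Hypothetical sketch of the behavior described above.)
    """
    if os.path.exists(path):
        with open(path) as f:
            return f.read().strip()
    suffix = uuid.uuid4().hex[:12]  # short "uuid-like" string
    with open(path, "w") as f:
        f.write(suffix)
    return suffix
```

Removing the file forces generation of a fresh suffix on the next run, which is why deleting ~/.nucleator/nucleator-&lt;account&gt;-&lt;customer_domain&gt; allows the buckets to be recreated immediately under new names.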

If nucleator account setup fails with an error indicating that it is unable to create the specified S3 buckets shortly after a delete of the setup-account CloudFormation stack, delete the partially created setup-account stack, remove the uuid-like file ~/.nucleator/nucleator-<account>-<customer_domain>, and then run nucleator account setup again.

Cage Provision Fails Due to Incorrect Availability Zone

Due to an AWS API call issue on accounts that pre-date VPCs, it is possible that the setup wizard may record incorrect Availability Zones (AZs) in the {customer}.yml siteconfig file.  Each AWS account is assigned its own set of valid AZs for locating resources, and if the AZ definitions in the siteconfig are incorrect, nucleator cage provision will fail when attempting to create your VPCs.

To resolve this, you will need to locate the correct AZs for your account in the AWS console and edit {customer}.yml in the siteconfig directory where the setup wizard was run (typically the ansible/roles/siteconfig/vars directory of your cloned Git siteconfig repository).

  1. Locate your account's valid AZs:  Go to the AWS console -> VPCs -> Subnets -> Create Subnet and note at least the first two items in the availability zone drop-down.   These are the AZs for the region your console is set to.  Note both the AZs and the region they belong to, then cancel the Create Subnet dialog.

    1. You will need to repeat this for each region in your {customer}-{cage}.yml file.  Change the region in the top right of the AWS console and repeat the subnet-creation steps above, noting the region and its AZs.

  2. Edit your {customer}.yml file (replacing customer with your customer name) and scroll to the section that matches the account name in question.
    1. Ensure that the account number matches the account used to collect AZ information in step #1.
      1. AWS console -> click on Support in the upper right -> select Support Center -> note the account number in the upper right.
      2. If the account number does not match, move to the correct account section within {customer}.yml, or rerun step #1 while logged into the account being used for Nucleator.
    2. Below the account number is a section starting with map_region_plus_redundant_zone_number_to_vpc_valid_az.
    3. For each region header in this section, compare AZ1 and AZ2 (and more if present) to the AZs noted in step #1.
    4. Change any AZ definitions that do not match the information captured from the console in step #1.
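For reference, the relevant section of {customer}.yml looks roughly like the fragment below. The account number, region names, and AZ values are placeholders, and the exact nesting shown is illustrative; match the actual structure in your file and substitute the AZs you collected from the console:

```yaml
"123456789012":   # placeholder account number
  map_region_plus_redundant_zone_number_to_vpc_valid_az:
    us-east-1:
      az1: us-east-1a   # replace with an AZ valid for your account
      az2: us-east-1c
    us-west-2:
      az1: us-west-2a
      az2: us-west-2b
```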

If you care to read more about this issue, additional information can be found at http://stackoverflow.com/questions/22744467/vpc-capable-availability-zones-in-amazon.  Don't forget to update your Git repository with the new versions of the files.

Unable to Reach Bastion via ssh

If you ever see an error stating that the Bastion cannot be reached, log into the AWS Management Console and navigate to EC2.  Identify the Bastion instance, Stop it, then Start it again and re-try your command.  Occurrences of this appear to be very rare.

A more likely reason the Bastion might be unreachable is that the key file no longer exists in your ~/.nucleator directory, or that it does not match the EC2 key pair used to create the instances.

To resolve key mismatch issues, delete the stacksets within the cage, delete the cage, delete the .pem file, and delete the corresponding key pair from the EC2 console.  Then begin again at cage provision.  For example:

nucleator builder delete --customer <customer> --cage <cage>
nucleator cage delete --customer <customer> --cage <cage>
rm ~/.nucleator/<customer>-<account>-<region>.pem
In the AWS console -> EC2 -> Key Pairs -> select the key pair named <customer>-<account>-<region> -> Delete Key Pair

AWS Service Limits

Be aware of AWS default service limits.  It is very easy to exceed the default limits of 5 Elastic IPs, 5 VPCs per region, and 20 CloudFormation stacks.  If you require higher limits, create an AWS support ticket using your Management Console.

Account CloudFormation Stackset Delete Fails Due to S3 Bucket Error

Manually deleting the setup-{account}-{customer} stackset in CloudFormation will fail if any of the three S3 buckets created by Nucleator are not empty.  All three of these Nucleator-created buckets must be empty before they can be deleted.  Manually deleting the contents of the buckets and re-running the delete of the stackset will work around this error.
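Emptying the buckets can also be scripted.  A minimal sketch using a boto3-style S3 client is shown below; the client object is passed in so the logic is plain to see.  This is illustrative and not part of Nucleator, and it does not handle versioned buckets or more than 1000 objects per bucket:

```python
def empty_buckets(s3, bucket_names):
    """Delete every object in each named bucket so the stack delete can succeed.

    `s3` is expected to provide list_objects(Bucket=...) returning a dict with
    an optional "Contents" list, and delete_object(Bucket=..., Key=...) -- the
    boto3 S3 client has this shape.  Note: list_objects returns at most 1000
    keys per call, and versioned buckets would also need their object versions
    removed; this sketch handles neither.
    """
    for name in bucket_names:
        resp = s3.list_objects(Bucket=name)
        for obj in resp.get("Contents", []):
            s3.delete_object(Bucket=name, Key=obj["Key"])
```

With a real client this would be called as, e.g., `empty_buckets(boto3.client("s3"), [...])` with the three bucket names from the failed stack's events.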

Local Virtual Machines, Time and AWS Authentication

It is common for local VMs (e.g. provided by VirtualBox) to have significant time drift.  If the VM's concept of time drifts far ahead of or behind Amazon's, AWS API requests will no longer properly authenticate and you will see a message like this:

BotoServerError: Failed to obtain temporary credentials for role NucleatorAgent in target account 972598532625, message: 'BotoServerError: 403 Forbidden
<ErrorResponse xmlns="https://sts.amazonaws.com/doc/2011-06-15/">
  <Error>
    <Type>Sender</Type>
    <Code>SignatureDoesNotMatch</Code>
    <Message>Signature expired: 20150219T155830Z is now earlier than 20150219T160112Z (20150219T161612Z - 15 min.)</Message>
  </Error>
  <RequestId>9f04f9d6-b852-11e4-875d-17a0cc0996dc</RequestId>
</ErrorResponse>

To resolve this, sync the time on the VM:

sudo ntpdate -s time.nist.gov
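You can confirm that the failure is clock drift by comparing the two timestamps in the "Signature expired" message.  A small helper such as the following could do this; it is an illustrative diagnostic, not part of Nucleator:

```python
import re
from datetime import datetime

def signature_drift_seconds(message):
    """Estimate clock drift from an STS "Signature expired" error message.

    Parses the first two timestamps in messages like
    "Signature expired: 20150219T155830Z is now earlier than 20150219T160112Z ..."
    and returns how many seconds the local clock is behind AWS's clock.
    """
    stamps = re.findall(r"\d{8}T\d{6}Z", message)
    t_local, t_aws = (datetime.strptime(s, "%Y%m%dT%H%M%SZ") for s in stamps[:2])
    return (t_aws - t_local).total_seconds()
```

Applied to the error message above, the local signature timestamp is 2 minutes 42 seconds behind AWS's time, comfortably outside the 15-minute window once the request's validity period is accounted for.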

Incorrect Keys

Nucleator attempts to use temporary keys specified either via an IAM Instance Profile or in ~/.nucleator/{{customer_name}}-credentials.yml.

However, some underlying tools used by Ansible (in particular, boto) also look for credentials in other places, and will use them if found.  This can cause Nucleator to fail because the credentials present in those locations likely do not possess the required access policies.  If you see in the console output a user ARN other than user/NucleatorUser, this problem is most likely occurring.

You may need to ensure that no credentials are present in these locations, or, if they are, that they mirror the credentials required by Nucleator (as may be present in ~/.nucleator/{{customer_name}}-credentials.yml):

~/.aws/credentials
~/.boto

Environment variables, such as AWS_ACCESS_KEY_ID and AWS_SECRET_KEY, are also a source of credentials.
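A quick way to check for stray credential sources is to scan the locations listed above.  The helper below is an illustrative diagnostic, not part of Nucleator:

```python
import os

def stray_credential_sources():
    """Report credential locations boto may pick up besides Nucleator's own.

    Checks the well-known files and environment variables mentioned above and
    returns a list of human-readable descriptions of whatever is present.
    """
    found = []
    for path in ("~/.aws/credentials", "~/.boto"):
        if os.path.exists(os.path.expanduser(path)):
            found.append("file: " + path)
    for var in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_KEY", "AWS_SECRET_ACCESS_KEY"):
        if var in os.environ:
            found.append("env: " + var)
    return found
```

Anything this reports that is not intentional should be removed, or made to mirror the credentials in ~/.nucleator/{{customer_name}}-credentials.yml.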
