Accessing a Private Redshift Cluster using an SSH Tunnel through your Cage's Bastion Server
You may wish to create Redshift clusters without public accessibility to ensure that your private data stays private. By default, Nucleator limits access to a Redshift cluster to requests that originate inside your Cage. Redshift cluster login names and passwords alone are often insufficient to protect sensitive data, and placing the cluster inside a VPC adds a desirable layer of security. Even though direct public access has been prevented, access from within the Cage and external access through secure tunnels can still be provided.
Private Redshift Cluster Connectivity from Within Cage
Other instances within your Cage can access such a private Redshift cluster. You can test this connectivity with the psql command-line tool.
First, log in to an EC2 instance within the Cage (the bastion in this example):
ssh -F ~/.nucleator/ssh-config/<customer>/<cage> bastion-<cage>
For demonstration purposes, install psql:
sudo yum install postgresql postgresql-contrib
Now you can test a connection to your Redshift cluster:
psql -h {redshift-dns-name} -v schema=public -p 5439 -U {user_name} -d {db_name} -c "SELECT * from information_schema.tables limit 5;"
This command should produce table descriptions for the first 5 tables listed in information_schema.tables. No user data needs to be added to Redshift to view this information. If the command returns an error or no rows, the connection to the Redshift cluster is not working.
You can use the method above to test connections to Redshift clusters from other instances within the same Cage. For some Redshift use models this level of connectivity may be all that is needed, but most use models will also need Redshift connectivity from outside of the Cage (see below). Either way, you should test the Redshift connection from within the Cage first to confirm that the cluster itself is working.
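The connectivity check described above can be wrapped in a small script so that both failure cases (an error, or no rows) are reported explicitly. This is a minimal sketch, not part of Nucleator: check_redshift is a hypothetical helper name, and it assumes psql can obtain the password on its own (for example by prompting or from ~/.pgpass).

```shell
#!/usr/bin/env bash
# Sketch: check Redshift connectivity from an instance inside the Cage.
# check_redshift is a hypothetical helper, not a Nucleator command.

check_redshift() {
    # $1 = Redshift DNS name, $2 = port, $3 = user name, $4 = database name
    local host="$1" port="$2" user="$3" db="$4"
    local out
    # -t prints rows only and -A disables alignment, so each row is one line.
    if ! out=$(psql -h "$host" -p "$port" -U "$user" -d "$db" -t -A \
            -c "SELECT table_name FROM information_schema.tables LIMIT 5;"); then
        echo "FAIL: psql could not query $host:$port" >&2
        return 1
    fi
    if [ -z "$out" ]; then
        echo "FAIL: connected to $host:$port but no rows were returned" >&2
        return 1
    fi
    echo "OK: $host:$port returned these table names:"
    echo "$out"
}
```

A call such as check_redshift {redshift-dns-name} 5439 {user_name} {db_name} returns a non-zero exit status on either failure, which makes it usable from other scripts.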
Private Redshift Cluster Connectivity from the Internet
Most BI (Business Intelligence) tools will not be running inside of a customer Cage, and in many cases the additional layer of security provided by a VPC will still be desired. It is possible to connect to a private Redshift cluster from outside in one of two ways: establishing a VPN (virtual private network) connection to the Cage, or connecting through a secure ssh tunnel. Nucleator currently provides direct support only for connectivity through a secure ssh tunnel. Each of these methods has its own pros and cons:
- VPN Connection (Not yet implemented):
  - Pro - More flexible since this will include the client computer virtually within the Cage's public subnet, enabling access to selected private services within the Cage
  - Pro - VPN User and key management functionality available from several VPN software providers
  - Con - Can be less secure due to greater Cage access, network bridging risk, and full access to the Cage for other users of the BI system
  - Con - More complex, requires installation and management of VPN Software or use of an AWS VPN Gateway to on-premise networks
- SSH Tunnel Connection:
  - Pro - More secure: Limited connectivity for only the Redshift connection port and only to the Redshift master node
  - Pro - Simple - requires no additional software
  - Con - Requires process for management and rotation of ssh keys to be used for SSH Tunneling
Establishing Internet connection to private Redshift cluster using VPN
This approach has not yet been implemented in Nucleator. In the future, it may be supported through a VPN Stackset that introduces one or more VPN servers into the public subnets of your Cage.
Establishing Internet connection to private Redshift cluster using ssh tunnel
The remote login tool ssh and its Windows counterparts, PuTTY and Git Bash, provide secure login access and also allow port tunneling on top of the login connection. To establish such a tunnel, an ssh session to the bastion server is required. The ssh tunnel through the bastion can be established with:
ssh -L 15439:<redshift_dns_name>:5439 -i ~/.nucleator/<customer>-<account>-<region>.pem ec2-user@bastion.<cage>.<customer_domain>
This connection assumes that the Redshift cluster is on the standard Redshift port of 5439. If you are using a different port, the port number after the <redshift_dns_name> will also need to change. The choice of the local port 15439 is completely arbitrary; it just needs to be a port that is not in use on the local computer.
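The port choices described above can be made explicit by building the -L argument from variables. The following is a sketch under stated assumptions: forward_spec and open_tunnel are hypothetical helper names, not Nucleator commands. The ExitOnForwardFailure option is a standard OpenSSH client option that makes ssh exit if the local port cannot be bound, rather than logging in without a working tunnel.

```shell
#!/usr/bin/env bash
# Sketch: assemble the ssh -L forward from variables.
# forward_spec and open_tunnel are hypothetical helper names.

forward_spec() {
    # $1 = local port, $2 = Redshift DNS name, $3 = Redshift port (default 5439)
    printf '%s:%s:%s\n' "$1" "$2" "${3:-5439}"
}

open_tunnel() {
    # $1 = key file, $2 = bastion host name, $3 = output of forward_spec
    # ExitOnForwardFailure=yes: fail fast if the local port is already in use.
    ssh -o ExitOnForwardFailure=yes -L "$3" -i "$1" "ec2-user@$2"
}
```

For example, open_tunnel ~/.nucleator/<customer>-<account>-<region>.pem bastion.<cage>.<customer_domain> "$(forward_spec 15439 <redshift_dns_name>)" opens the same tunnel as the command shown earlier, and changing the first argument of forward_spec selects a different local port.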
Note that this tunnel remains open only as long as the ssh session is active. If the ssh connection is closed, the tunnel also closes.
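Because the tunnel lives and dies with the ssh session, it can be convenient to run the session in the background without a remote shell and to close it on demand. The sketch below is one possible approach, not something Nucleator provides: the helper names and the TUNNEL_SOCKET path are assumptions. It relies on standard OpenSSH client options: -f (go to background), -N (no remote command), -M and -S (control socket), -O (control command), and ServerAliveInterval (keepalive messages so an idle tunnel is less likely to be dropped).

```shell
#!/usr/bin/env bash
# Sketch: background tunnel that can be checked and closed explicitly.
# start_tunnel, tunnel_status, stop_tunnel and TUNNEL_SOCKET are hypothetical.

TUNNEL_SOCKET="${TUNNEL_SOCKET:-$HOME/.ssh/redshift-tunnel.sock}"

start_tunnel() {
    # $1 = key file, $2 = bastion host name, $3 = Redshift DNS name
    ssh -f -N -M -S "$TUNNEL_SOCKET" \
        -o ExitOnForwardFailure=yes \
        -o ServerAliveInterval=60 \
        -L "15439:$3:5439" -i "$1" "ec2-user@$2"
}

tunnel_status() {
    # $1 = bastion host name; succeeds while the background session is running
    ssh -S "$TUNNEL_SOCKET" -O check "ec2-user@$1"
}

stop_tunnel() {
    # $1 = bastion host name; asks the background session to exit
    ssh -S "$TUNNEL_SOCKET" -O exit "ec2-user@$1"
}
```

The control socket is what allows stop_tunnel to find the background session later, which avoids searching for the ssh process by hand.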
To connect BI tools running outside of the Cage to the Redshift cluster within the Cage, point the BI tool at the local port (15439 in this example) on the computer that established the ssh connection. You can test the connection to the Redshift cluster through an established ssh tunnel by using psql to connect to the local end of the tunnel:
psql -h localhost -v schema=public -p 15439 -U {user_name} -d {db_name} -c "SELECT * from information_schema.tables limit 5;"
Again, this command should produce table descriptions for the first 5 tables listed in information_schema.tables.
In the same way, other BI tools can be pointed to port 15439 on localhost to establish a connection to the private Redshift cluster.
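Many tools ask for a connection URL rather than separate host and port fields. As an illustration, the sketch below prints two common forms for the local end of the tunnel: a PostgreSQL-style URI and a Redshift JDBC URL. tunnel_urls is a hypothetical helper name, and which form (if either) a given BI tool accepts depends on the driver it uses, so treat the output as a starting point.

```shell
#!/usr/bin/env bash
# Sketch: print connection URLs that point at the local end of the tunnel.
# tunnel_urls is a hypothetical helper name.

tunnel_urls() {
    # $1 = database name, $2 = local tunnel port (default 15439)
    local db="$1" port="${2:-15439}"
    echo "postgresql://localhost:${port}/${db}"
    echo "jdbc:redshift://localhost:${port}/${db}"
}
```

Note that the host is always localhost and the port is the local tunnel port, not the Redshift DNS name or port 5439; the tunnel handles the rest.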
Next: Using Nucleator for Continuous Integration and Delivery