February 24, 2020 · 14 min read

Certified Solutions Architect Professional - Study Notes


I recently passed my AWS Certified Solutions Architect Professional exam and along my journey of studying I scribbled down a bunch of notes I took. I thought it would be a good idea to share these random bits of information so that they can be used as flash cards / bite sized chunks of information to anyone else who is studying for this fairly vast examination.

AWS Certified Solutions Architect Professional Badge

Learning Options

The below information alone won't be enough to get you through. I highly recommend you study from some of the great online courses available from the providers below.


Contents


Active Directory

  • Simple AD
    • A Microsoft Active Directory compatible directory from AWS Directory Service that supports common Active Directory features.
    • Cannot connect to an existing on-prem AD
  • AWS Directory Service for Microsoft Active Directory
    • Managed Microsoft Active Directory that is hosted on AWS cloud.
  • AD Connector
    • Proxy service for connecting your on-premises Microsoft Active Directory to the AWS cloud.

WorkDocs

  • Can be used to share documents via AD directory services
  • Can define time duration or passcodes to access the document

API Gateway

  • Lambda non-proxy integration flow
    • Method Request -> Integration Request -> Integration Response -> Method Response
  • Maximum integration timeout for AWS API gateway is 29 seconds
  • If you want to change the default timeout for an integration request, uncheck Use Default Timeout and set a custom value (between 50 milliseconds and 29 seconds)
  • You can capture a response code and rewrite it to something custom via Gateway Responses

Athena

  • Serverless platform
  • Automatically executes queries in parallel
  • If asked whether to use Athena or QuickSight, look for a mention of whether the team has experience with SQL. If they do, pick Athena

Aurora

  • Can replicate from an external master instance or a MySQL DB instance on AWS RDS.
  • Aurora serverless is best suited to situations where you can’t predict what traffic will be like

Backup

  • The following services can be backed up and restored using AWS Backup
    • EFS, DynamoDB, EBS, RDS, Volume Gateway

Batch

  • Configures resources and schedules when to run data analytics workloads
  • Suitable for running a bash script using a job
  • Batch scheduler evaluates when / where / how to run jobs (no need for integration with CloudWatch Events to schedule)
  • Key components
    • Jobs: unit of work (script, exec, docker container)
    • Job Definitions: specifies how a job is run
    • Job Queues: Jobs submitted are added to queues
    • Compute Environment: compute resources that run jobs
  • If your Batch jobs are stuck in RUNNABLE state check:
    • Role assigned has adequate permissions
    • CPU and RAM given as per compute allocation
    • Check EC2 limits on the account
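
The "Job Definitions" component above can be sketched as RegisterJobDefinition input (image URI, names and sizes here are illustrative placeholders, not from the exam material):

```json
{
  "jobDefinitionName": "nightly-analytics",
  "type": "container",
  "containerProperties": {
    "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/analytics:latest",
    "vcpus": 2,
    "memory": 2048,
    "command": ["bash", "run.sh"]
  }
}
```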

Beanstalk

  • No concept of programmable infrastructure / Git source. Can’t do infrastructure as code end to end without other tooling.

Billing

  • Billing reports can be delivered to an S3 bucket
  • Consolidated billing is only available in master accounts (where there are children accounts under organisations). These reports include activity for all child accounts

CloudFormation

  • Retain data for S3: set DeletionPolicy on the S3 resource to Retain
  • Create an RDS snapshot on delete: set the RDS resource's DeletionPolicy to Snapshot
    • There are three options for DeletionPolicy: Retain, Snapshot and Delete
  • To coordinate stack creation with configuration that has to be executed on an EC2 instance, use the CreationPolicy attribute together with a wait condition / cfn-signal.
  • If you need to reference AZ info within CloudFormation templates you can make use of the Fn::GetAZs function
  • Launching EC2 instances with CloudFormation requires IAM permissions to be provided to the person creating the stack
  • Intrinsic functions can be used in Properties, Outputs, Metadata attributes and update policy attribute
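
The deletion policies above can be sketched in a template snippet (resource names are illustrative, and remaining required RDS properties are trimmed for brevity):

```yaml
Resources:
  NotesBucket:
    Type: AWS::S3::Bucket
    DeletionPolicy: Retain      # bucket (and its data) survives stack deletion
  NotesDb:
    Type: AWS::RDS::DBInstance
    DeletionPolicy: Snapshot    # a final snapshot is taken before deletion
    Properties:
      Engine: mysql
      DBInstanceClass: db.t3.micro
      AllocatedStorage: "20"
```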

CloudFront

  • Managed content delivery network (CDN)
  • S3 Transfer Acceleration can be used to distribute S3 content more efficiently globally
  • Origin Access Identity can be used to grant access to objects in s3 without having to give a bucket public access.
  • Different HTTP methods for CloudFront forwarding and their uses:
    • GET, HEAD: You can use CloudFront only to get objects from your origin or to get object headers.
    • GET, HEAD, OPTIONS: You can use CloudFront only to get objects from your origin, get object headers, or retrieve a list of the options that your origin server supports.
    • GET, HEAD, OPTIONS, PUT, POST, PATCH, DELETE: You can use CloudFront to get, add, update, and delete objects, and to get object headers. In addition, you can perform other POST operations such as submitting data from a web form.
  • Support for common content types such as:
    • Static content (S3 website or web assets)
    • Live events (streaming video)
    • Media content (HLS)
  • TTL can be lowered on CloudFront to deliver new content sooner when it changes

CloudWatch

  • Cron event to trigger Lambda
    • CloudWatch Events -> Create Rule
    • Provide a valid cron expression, e.g. cron(0 8 * * ? *) for 08:00 UTC daily
    • Provide a target(s)
  • Can trigger a number of different services like Lambda, SNS, SQS, CodeBuild etc.
  • Cross-region dashboards are supported, so metrics from different regions can be displayed on a single dashboard

Database Migration Service

  • Suitable for migrating databases like MySQL to Aurora or RDS
  • Data migrated is encrypted with KMS
    • By default it uses the AWS managed aws/dms key, or a customer managed key (CMK) can be provided
  • DMS input stream can be throttled to accommodate downstream systems that can’t ingest at full speed.
    • E.g. when ingesting data into Elasticsearch and the indexing queue fills up

DirectConnect

  • Link aggregation groups (LAG) can bond DirectConnects together
  • Direct Connect to a VPC provides access to all AZs in the region
  • Maximum number of Direct Connect connections in a LAG is 4
    • All must use the same bandwidth
  • Troubleshooting Direct Connect
    • Confirm no firewalls are blocking TCP 179 (or ephemeral ports)
    • Confirm the ASNs match on both sides

DynamoDB

  • Supports autoscaling
  • When defining primary keys, prefer a partition key with many distinct values over a few heavily used ones
  • Supported CloudWatch metrics
    • ProvisionedWriteCapacityUnits
    • ProvisionedReadCapacityUnits
    • ConsumedWriteCapacityUnits
    • ConsumedReadCapacityUnits
  • You cannot configure On-Demand for reads and Provisioned for writes separately; the capacity mode applies to the table as a whole
  • A numeric attribute holding an epoch timestamp (e.g. one named expire) can be designated as the TTL attribute to expire items in a DynamoDB table
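
TTL is just a numeric attribute holding a Unix epoch timestamp; a minimal sketch of building such an item (the attribute name expire and the item shape are assumptions; you designate the attribute name when enabling TTL on the table):

```python
import time

def item_with_ttl(pk, payload, ttl_seconds, now=None):
    """Build a DynamoDB item whose TTL attribute holds a Unix epoch timestamp.

    DynamoDB deletes the item (typically within ~48 hours) once the current
    time passes the value stored in the designated TTL attribute.
    """
    now = int(time.time()) if now is None else now
    return {
        "pk": pk,                     # partition key
        "payload": payload,
        "expire": now + ttl_seconds,  # epoch seconds, not milliseconds
    }

item = item_with_ttl("session#42", "cart-data", ttl_seconds=3600, now=1_700_000_000)
print(item["expire"])  # 1700003600
```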

Glue

  • Contains crawlers that connect to
    • S3, JDBC sources and DynamoDB
  • Glue has a central metadata repository (data catalog)
  • Fully serverless ETL

EC2

  • High network performance
    • Cluster placement groups are recommended for applications that need low network latency / high throughput between instances in the same group
    • You can detach secondary network interfaces when the instance is running or stopped.
      • You can’t detach the primary
  • If you need a static MAC address, you have to create an ENI (where a random one will be assigned). Then reattach that same ENI to different instances going forward.
    • There is no way to manually config a MAC address on AWS EC2
  • Reserved Instances can be pooled among accounts in AWS Organisations
    • If 3 t2.mediums are purchased, and 1 is used in 1 account, 2 more could be used in other accounts within the organisation
  • Placement groups can suffer from insufficient-capacity errors when you try to add new instances to the group. It's recommended to stop and relaunch the instances to try for capacity again. Ideally, launch with the full number of instances the placement group will need from the start.
  • To improve high network throughput make use of Single Root I/O Virtualization (SR-IOV)
  • In order to hibernate an EC2 instance you require the following
    • Instance root has to be EBS and not instance store
    • Instance cannot be in an Autoscaling group (or used by ECS)
    • Instance root volume must be large enough so RAM can be stored
  • Hibernation also requires an HVM AMI type.
    • It has to be enabled at launch of the EC2 instance and be supported by your AMI
  • For specifying Drive letters on Windows instances make use of EC2Config
  • Lost your SSH keys? Two options
    • Stop the instance, detach the root volume, attach it as another volume to another EC2, modify the authorized_keys file, move the volume back to the original instance, start it.
    • Systems Manager Automation with AWSSupport-ResetAccess

AMI

  • You cannot create an AMI from an instance store-backed EC2 instance via the console
  • If you launch an AMI, the PEM keys will be removed; however, the authorized_keys entries will still be on the instance.
    • You need to ensure that the AMI is launched with the same PEM key

Autoscaling

  • AZRebalance will attempt to balance the number of instances in different availability zones.
  • When associating an ELB with an ASG the ASG gets awareness about unhealthy instances (and can terminate)

EBS

  • When using an encrypted EBS volume the following data is encrypted:
    • Data at rest in the volume
    • Data moving between the volume and instance
    • Snapshots created from the volume
    • Volumes created from the snapshots
  • Snapshots can be created every 2, 3, 4, 6, 8, 12, 24 hours
    • Lifecycle policies help retain backups required for compliance / audits. Also deletes unnecessary ones to save cost.
  • When using snapshots (if you don’t want downtime) don’t use RAID
  • Copies of snapshots with retention policies do not have policies carried over during copy.
  • In order to mount an EBS volume, it must be in the same AZ as the instance you are mounting to
  • Root volume type can be changed without stopping the instance provided it's to gp2, io1 or standard
    • sc1 or st1 cannot UNLESS they are non-root volumes (must be at least 500 GiB)
  • When an EBS volume has two tags, multiple lifecycle policies can run at the same time
  • Encrypted snapshots cannot be copied to unencrypted ones
  • Unencrypted snapshots can be encrypted when copying them by specifying --kms-key-id (with your CMK)

EFS

  • Data is distributed across multiple availability zones which provides durability and availability.
  • Supports 2 throughput modes
    • Bursting throughput: uses burst credits to determine if the filesystem can burst
    • Provisioned throughput
  • Provides both in-transit and at-rest encryption using AWS KMS
  • Mount an EFS volume with encryption in transit by
    • Getting EFS id, create mount targets for EC2 instance, use the mount helper with the -o tls flag
  • Does not support Windows-based clients
    • Storage Gateway / File Gateway is the recommendation if you need file store (using SMB mount)

Load Balancers

  • When using Network Load Balancers Secure connections should be TCP 443 with targets also using TLS (port 443)
  • When sticky sessions are needed, it's usually recommended to use ElastiCache to store session state
    • You don’t want to bind a user to a particular instance under a load balancer
    • Requires code to retrieve session state from ElastiCache
  • If you need to get the client IP when using a Classic Load Balancer:
    • TCP: configure proxy protocol to pass the IP address in a new TCP header
    • HTTP: send the client IP in the X-Forwarded-For header
  • Cross-zone load balancing can be enabled to spread requests evenly across your AZs
  • If a static IP is needed with a load balancer, provision a NLB with an attached EIP
  • Application Load Balancers support SNI
    • Is able to deal with multiple SSL certificates per listener

ElastiCache

Redis

  • Can only be upgraded, cannot be downgraded

IAM

  • AssumeRole can be secured down with an ExternalId
  • Flow for using a custom identity system
    • Custom identity broker app, this authenticates the user
    • Uses GetFederationToken API and passes a permission policy to get temp credentials from STS
    • Alternatively can call AssumeRole API to get temp access using role-based access instead
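
The ExternalId check from the first bullet sits in the Condition block of the role's trust policy; a sketch (account ID and ExternalId value are placeholders):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:root" },
      "Action": "sts:AssumeRole",
      "Condition": { "StringEquals": { "sts:ExternalId": "unique-customer-id" } }
    }
  ]
}
```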

Kinesis

  • Ideal for real-time data ingestion

Kinesis Data Streams

  • Can store records in order and replay them in the same order later (up to 7 days)
    • Makes it ideal for financial transactions
  • Able to have multiple applications consume from the same stream concurrently.

Kinesis Video Streams

  • HLS can be used for live playback
    • Use GetHLSStreamingSessionURL and then use the resulting URL in the video player of your choice
  • Content delivery typically leverages AWS Elemental MediaLive / MediaPackage and CloudFront to distribute content globally
  • Can view either Live or archived video

KMS

  • Two types of keys
    • Master keys: used directly to encrypt and decrypt up to 4 kilobytes of data and can also protect data keys
    • Data keys: used to encrypt and decrypt customer data
  • If you are accessing a very large number of KMS encrypted files at a time there is a chance you will hit the account's KMS request rate limit. You might need to open a support case to have it raised
  • Grants in KMS
    • Dynamically / programmatically revoke access to a key after its use.
    • Better than changing roles / policies

Managed Blockchain

  • Supported frameworks include Hyperledger Fabric and Ethereum.
  • If you have members who would like to deploy their own blockchain networks they can use the CloudFormation templates to support ECS clusters or EC2 instances

Migration Hub

  • AWS Discovery Agent can transmit to Migration hub, then Data exploration can be done in Athena
  • Agentless migrations can only pull information like RAM or Disk I/O from VMware
  • If your OS isn't supported for import, you can provide the details yourself via the import template
  • Migration steps from VMware
    • Schedule migration job
    • Upload your VMDK and then convert it to an EBS snapshot
    • Create an AMI from the snapshot

OpsWorks

  • Can be managed by CloudFormation AWS::OpsWorks::Stack
    • This can be part of a nested stack with a parent containing all the VPC, NAT Gateway etc. resources
  • Lifecycle events:
    • Setup, Configure, Deploy, Undeploy, Shutdown
  • Handles autohealing of instances
  • Blue/green style deployments can be accomplished by creating a new stack with identical configuration
    • This can be used when making updates to AMIs
  • Process for deploying with AWS CodePipeline
    • Create stack, layer and instance in a OpsWorks Stack
    • Upload app code to bucket, then add your app to OpsWorks stack
    • Create a pipeline (run it), verify the app deployment in OpsWorks stack
  • Process for updating OpsWorks stacks to the latest security patches
    • Run the Update Dependencies stack command
    • Create new instances to replace the old ones
  • When you attach a load balancer to a layer
    • Deregisters currently registered instances
    • Re-registers layer instances when they come online (removes offline ones)
    • Starts routing requests to the registered instances

Organizations

  • You may only join one organization (even if you receive more than one invite)
  • Invitations expire after 15 days
  • To resend an invite, you must cancel the pending one, then create a new invitation
  • In order to move an account to a different OU you need the following permissions
    • organizations:MoveAccount
    • organizations:DescribeOrganization
  • Accounts can be dragged into different OUs, however OUs can't be dragged to new locations in the organization's structure.
    • Instead you must create new OUs and reassign any SCPs you had in place, then move the accounts into these new OUs.
  • If you want to block access to unused services, check the IAM Activity for services (never used, last used date) and base your blocks on this information
  • SCPs never grant permissions on their own; they only restrict what can be allowed (deny-style policies are the common pattern)
    • Explicit denies will always overrule explicit allows
  • To apply WAF rules across an organization make use of AWS Firewall Manager
  • You cannot restrict a member account from the ability to change its root password or manage MFA settings
  • Improve consolidated billing by also tagging resources
    • This will group expenses on the detailed billing report
  • To access a member account
    • Use sts:AssumeRole with OrganizationAccountAccessRole
  • The master account isn't impacted by SCPs
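
A deny-style SCP is a plain IAM-like policy document; a sketch blocking one unused service across an OU (the service chosen here is arbitrary):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyUnusedService",
      "Effect": "Deny",
      "Action": "redshift:*",
      "Resource": "*"
    }
  ]
}
```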

Redshift

  • Does not have read replicas
  • Queries cannot be paused in Redshift
  • Use Redshift workload management (WLM) queues
    • Priorities can be assigned to these workload queues
  • Can create single node cluster via CLI (and in Console as of recently)
  • Using the RedshiftCommands.sql file from the Billing section of your account you can analyse billing reports.
  • Redshift snapshots are normally a very expensive solution, so if cost is important, avoid answers involving snapshots.
    • Snapshots on Redshift can also be pointless if you can repopulate all the data in the cluster from S3 instead

RDS

  • When a primary DB instance fails in a multi-AZ deployment, the CNAME is changed from primary to standby so there’s no need to change a reference to the other DB in code.
  • Multi-AZ replication is done synchronously
  • For redundant architectures Multi-AZ support is used
  • Read-replicas aren’t used for redundancy, they are used to improve performance.
  • If encryption is enabled on the RDS instance:
    • Encrypts the underlying storage
    • Defaults to also encrypting the snapshots as they are created
  • RDS does not support Oracle RAC
  • RDS Oracle can read/write from S3 directly.
    • Option groups should have a role with permissions to access S3
    • Feature S3_INTEGRATION
  • If there is an RDS update available that you aren’t ready to apply, you can Defer the updates indefinitely until you are ready.
  • Read Replicas require access to backups for maintaining their read replica logs. This means if you want to disable automatic backups you must remove all Read Replicas first.

RDS for Oracle

  • Supported backup / restore options
    • Oracle Import/Export
    • Oracle Data Pump Import/Export
    • RDS Snapshot / point in time recover

RDS VMware

  • Manages:
    • Patching
    • Multi AZ configurations
    • Backups based on retention policies
    • Point-in-time restores (from on-prem or cloud backups)

Route53

  • Latency based routing
    • Redirect requests to nearest region
  • If you have issues with route53 not routing to ‘live’ hosts, check to make sure you have “Evaluate Target Health” set to “Yes” on the latency alias. Same goes with HTTP health checks on weighted resources.
  • Resolve two domains to one domain (test1.example.com, test2.example.com -> test3.example.com)
    • CNAME for the records test1.example.com, test2.example.com to test3.example.com
  • Resolve a DNS entry to an ALB
    • Alias record test3.example.com to ALB address
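
The alias record from the last bullet can be expressed as a Route53 change batch (the ALB DNS name and hosted zone ID shown are placeholders; use your ALB's values):

```json
{
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "test3.example.com",
        "Type": "A",
        "AliasTarget": {
          "HostedZoneId": "Z35SXDOTRQ7X7K",
          "DNSName": "my-alb-1234567890.us-east-1.elb.amazonaws.com",
          "EvaluateTargetHealth": true
        }
      }
    }
  ]
}
```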

S3

  • Using the x-amz-server-side-encryption request header when making an API call will ensure an object is server side encrypted (SSE)
  • If versioning is enabled on S3 after objects are already put in, those objects will have a version ID of null
  • An aws:Referer condition key in a bucket policy can ensure requests for objects come from a domain you operate
  • INTELLIGENT_TIERING storage class is used to optimize storage costs automatically for you
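
A common companion to the x-amz-server-side-encryption header above is a bucket policy that rejects unencrypted uploads (the bucket name is a placeholder):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyUnencryptedUploads",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::example-bucket/*",
      "Condition": { "Null": { "s3:x-amz-server-side-encryption": "true" } }
    }
  ]
}
```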

SQS

  • Message group ID can be used on FIFO delivery to ensure messages that belong to the same message group are always processed one by one.
    • E.g. binding platform with multiple products, FIFO and a message group based on the product being bid on
  • Dead-letter queues need to match the queue they are set up for. So a standard SQS queue needs to use a standard dead-letter queue (not FIFO)
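
The per-group ordering can be illustrated with a small local simulation (this mimics how a FIFO queue partitions by MessageGroupId; it is not the SQS API):

```python
from collections import OrderedDict, deque

def partition_by_group(messages):
    """Split messages into per-group FIFO queues, like an SQS FIFO queue.

    Messages sharing a MessageGroupId stay in strict order relative to each
    other; different groups can be consumed in parallel.
    """
    groups = OrderedDict()
    for msg in messages:
        groups.setdefault(msg["MessageGroupId"], deque()).append(msg["Body"])
    return groups

# Bidding-platform example: one group per product being bid on
msgs = [
    {"MessageGroupId": "product-1", "Body": "bid 10"},
    {"MessageGroupId": "product-2", "Body": "bid 5"},
    {"MessageGroupId": "product-1", "Body": "bid 12"},
]
print(partition_by_group(msgs)["product-1"])  # deque(['bid 10', 'bid 12'])
```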

Systems Manager

  • Troubleshooting why you can't use Run Command on an SSM host
    • Check the latest SSM Agent is installed on the instances
    • Verify the instance has an IAM role that lets it talk to the SSM API
  • Services that can have costs associated to them
    • On-Premises Instance Management: pay-as-you-go pricing
    • Parameter Store: calling API costs
    • Systems Manager Automation
  • Schedule log file copying from hosts
    • State Manager to run a script at a given time
    • Schedule in Maintenance Windows for the log file moves
  • Patch management can be applied to instances using the following methods
    • Tag key/value pairs that identify the resources
    • Patch groups, where a group requires a particular tag
    • Manual selection of the hosts to patch

VPC

  • You cannot create subnets with overlapping CIDR ranges, you’ll get an error on trying to create.
  • VPC subnets will have 5 reserved addresses
    • 10.0.0.0: Network address.
    • 10.0.0.1: Reserved by AWS for the VPC router.
    • 10.0.0.2: Reserved by AWS. The IP address of the DNS server is the base of the VPC network range plus two.
    • 10.0.0.3: Reserved by AWS for future use.
    • 10.0.0.255: Network broadcast address (but no broadcast supported).
  • When wanting to make changes to a DHCP option set, you must create a new one and then associate it with your VPC, replacing the old one.
  • Troubleshooting an EC2 instance in a VPC that is unable to talk to the data center over Direct Connect?
    • Make sure route propagation to the Virtual Private Gateway (VGW) is set up
    • Make sure the IPv4 destination prefixes that route traffic over the VGW are being advertised
  • Sharing a SaaS product out via your VPC to customers can be done with an AWS endpoint service (PrivateLink) into other customers' VPCs
    • Customers need to use an interface VPC endpoint on their end.
  • Options for sharing an application running in a shared VPC within an Organization
    • VPN between two VPCs
    • Use AWS Resource Access Manager to share subnets within the account
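
The five reserved addresses listed above follow directly from the subnet CIDR; a quick sketch with Python's stdlib ipaddress module:

```python
import ipaddress

def reserved_addresses(cidr):
    """Return the five addresses AWS reserves in every VPC subnet."""
    addrs = list(ipaddress.ip_network(cidr))  # every address, incl. network/broadcast
    return {
        "network": str(addrs[0]),     # network address
        "vpc_router": str(addrs[1]),  # reserved for the VPC router
        "dns": str(addrs[2]),         # base + 2: the VPC DNS server
        "future_use": str(addrs[3]),  # reserved by AWS for future use
        "broadcast": str(addrs[-1]),  # broadcast (unsupported, but still reserved)
    }

print(reserved_addresses("10.0.0.0/24")["dns"])  # 10.0.0.2
```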

VPN

  • Creating a VPN connection requires the static IP of the customer gateway device
    • With a dynamic routing type, an Autonomous System Number (ASN) is also required
  • An option if you need Multicast in a VPC is to build a virtual overlay network

Endpoints

  • Provide a secure link to access AWS resources from a VPC

NAT Gateway

  • Lets resources in a private subnet initiate outbound communication with the internet
    • Secure private resources like Databases and Application servers that shouldn’t have public connectivity

X-Ray

  • Segments allow for detailed tracing
  • Annotations can help find specific areas of the application in the tracing records (isolate the issues / impact area)

Miscellaneous


Below is a set of random pieces of information that didn't really need their own section.

  • IPS/IDS systems within VPC
    • Configure to listen / block suspected bad traffic in and out of VPC
    • The system could be Palo Alto networks
    • Monitors, alerts and filters on potential bad traffic sent in / out of VPC.
  • Reducing DDOS surface area
    • Remove non-critical internet entry IPs
    • Configure ELB to auto-scale
  • Rekognition CLI example for detecting faces
    • aws rekognition detect-faces
  • SAML identity provider in IAM
    • SAML metadata document from the identity provider
    • Create a SAML IAM identity provider in AWS
    • Configure the SAML Identity provider with relying party trust
    • In Identity provider configure SAML assertions for auth response
  • AWS has its own ways of protecting customers from DDoS
    • If you are trying to flood a connection, or running a pentest, you will likely find that you'll be blocked by AWS
    • You need to notify AWS and be granted permission before running pentesting jobs
  • Want to access Support Ticket API?
    • You need a Business support plan (or above)
  • Alexa for Business
    • You can have Alexa devices perform tasks for staff (getting info for them, booking meetings)

Summary


Did I miss something you think I should include? Please reach out to me on Twitter @nathangloverAUS and let me know!


DevOpStar by Nathan Glover | 2020