05 - AWS Database Services
Amazon RDS (Relational Database Service)
- RDS is a Database as a Service (DBaaS) product. It can be used to provision a fully functional database without the admin overhead traditionally associated with DB platforms
- RDS supports a number of database engines - MySQL, MariaDB, PostgreSQL, Oracle, Microsoft SQL Server, Aurora
- RDS can be deployed in single AZ or Multi-AZ mode (for resilience) and supports General purpose and Memory Optimized instances
- RDS instances are managed by AWS, you cannot SSH into the VM instances
- Primary use case for RDS are Relational, Transactional databases. Best for relational datastore requirements (OLTP)
- By default, Customer are allowed to have up to 40 RDS databases
- Pricing - RDS instances are charged based on
- Instance Size
- Provisioned storage (Not Used)
- IOPS (io1)
- Data Transfer Out
- Any backups/snapshots beyond the 100% total database storage for a region
- RDS Supports encryption
- Encryption can be configured when creating DB instances
- Encryption can be added by taking a snapshot, making an encrypted snapshot, and creating a new encrypted instance from that encrypted snapshot
- Once encrypted, encryption cannot be removed
- Automated backups to S3 occur daily and can be retained from 0 to 35 days. (0 means disabled)
- Manual snapshot are taken manually and exist until deleted, and point-in-time log-based backups are also stored on S3
RDS Multi-AZ
- RDS can be provisioned in Single or Multi-AZ mode. Multi-AZ makes an exact copy of your data in another AZ
- Multi-AZ provisions a primary instance and a standby instance in a different AZ of the same region
- With MultiAZ you do not have access to secondary AZ
- Multi-AZ has Automatic Failover protection if one AZ goes down, failover will occur and the standby will be promoted to primary
RDS Read Replica
- RDS Read Replicas are read-only copies of an RDS instance that can be created in the same region or a different region from the primary instance
- Read Replicas can be addressed independently (each having their own DNS name) and used for read workloads, allowing you to scale reads
- 5 Read Replicas can be created from a RDS instance, allowing a 5x increase in reads
- Unlike Multi-AZs, read replicas can be used across regions. They also can be within the same AZ or even across AZs
- The primary instance needs to have automatic backups enabled to use Read Replicas
- With read replicas, AWS takes care of the networking aspects needed for asynchronous syncing between the primary and secondary regions
- Read replica use cases
- When needing a faster recovery time than restoration from a snapshot
- When most of your DB usage is reading rather than writing, you can scale out your datbase instances for read-only purpose. (Horizontal Scaling)
- Have a Global Resilience
Amazon Aurora
- Amazon Aurora is a fully-managed, MySQL-compatible, relational database engine that combines the speed and availability of high-end commercial databases that needs to scale, with Automatic backups, high availability, and fault tolerance
- Aurora MySQL is 5x faster over regular MySQL and 3x faster over regular Postgres
- Aurora is 1/10th the cost over its competitors with similar performance and availability options
- Aurora uses a base configuration of a cluster, which consists of one more db instances and a cluster volume that spans multiple AZs, with each AZ having a copy of the db cluster data
- A cluster contains a single primary instance and zero or more replicas
- Aurora cluster volume automatically scale as the amount of data in your database increases, up to max of 64 TB
- AWS Aurora only bills for the consumed data, and it’s constantly backed up to S3
- Aurora replicates 6 copies of your database across 3 availability zones (2 Copies of data are kept in each AZ with minimum of 3 AZ)
- Aurora Serverless
Aurora Endpoints
- Cluster endpoint - Connects to the primary db instance for the db cluster. This endpoint is the one that can perform write operation
- Reader endpoint - Connects to the one of the available Aurora replicas for the database cluster
- Custom endpoint - Represents a set of database instances that you choose. When you connect to the endpoint, Aurora performs load balancing and chooses one of the instance in the group to handle the connection.
- Instance endpoint - Connects to a specific database instance within an Aurora cluster. The instance endpoint provides direct control over connections to the db cluster.
Aurora Serverless
- On-demand, auto-scaling configuration for Amazon Aurora (MySQL-compatible and PostgreSQL-compatible editions)
- The database will automatically start up, shut down, and scale capacity up or down based on your application’s needs
- It’s a simple, cost-effective option for infrequent, intermittent, or unpredictable workloads
- Enabled you to run database in cloud without managing any database instances
- Pricing is based on per-second for the database capacity you use when the database is active
DynamoDB
- DynamoDB is a fully managed NoSQL key/value and document database that provides fast and precedable performance with seamless scalability
- It’s a regional service, partitioned regionally and allows the creation of tables, hence tables names in DynamoDB have to be regionally unique
- DynamoDB can be set to have Eventual Consistent Reads (default) and Strongly Consistent Reads
- Eventual consistent reads data is returned immediately but data can be inconsistent. Strongly Consistent Reads will wait until data is consistent
- DynamoDb stores 3 copies of data on SSD drive across 3 regions
- Key Components
- A Table is a collection of items that share the same partition key (PK) or partition key and sort key(SK) together with other configuration and performance settings
- An Item is a collection of attributes (up to 400 KB in size) inside a table that shares the same key structure as every other items in the table
- Attribute is a fundamental data element which consists of key and value
- Primary Key is a unique identifier for the items in the table
- Partition Key and Sort Key is composed of two of more attributes
- Secondary Indexes allows you to query the data in the table using the alternate key
DynamoDB Indexes
- Indexes provide an alternative representation of data in a table, which is useful for application with varying query demands
- Indexes come in two forms: Local Secondary Indexes (LSI) and Global Secondary Indexes (GSI)
- Indexes are interacted with as though they are table, but they are just an alternate representation of data in an existing table
- There are two types of Secondary Indexes
- Global Secondary Index (GSI) - An index with a partition key and sort key that can be different from that on the table
- Local Secondary Index (LSI) - An index that has the same partition key as the table, but a different sort key
- LSIs must be created at the time of creation of table. GSI can be created at any point after the table is created.
- You can define up to 20 GSI and 5 LSI per table
DynamoDB Streams
- When enabled, streams provide an ordered list of changes that occur to items within a DynamoDB table
- A stream is a rolling 24-hour window of changes
- Streams are enabled per table and only contain data from the point of being enabled.
- Streams can be configured with one of four view types:
KEYS_ONLY
- Whenever an item is added, updated or deleted, the key(s) of that item are added to the stream.
NEW_IMAGE
- The entire item is added to the stream “post-change”
OLD_IMAGE
- The entire item is added to the stream “pre-change”
NEW_AND_OLD_IMAGES
- Both the new and old versions of the items are added to the stream.
- Streams can be integrated with AWS Lambda, invoking a function whenever items are changed in a DynamoDB table (a DB trigger)
- DynamoDB has two read/write capacity modes: Provisioned throughput (default) and On-Demand mode
- When using On-Demand mode, DynamoDB automatically scales to handle performance demands and bills a per-request charge
- When using Provisioned throughput mode, each table is configured with read capacity units (RCU) and write capacity units (WCU)
Database Migration Service (DMS)
- DMS is a service to migrate relational database, To and From from any location with network connectivity to AWS
- DMS is compatible with a broad range of DB Sources, including Oracle, MS SQL, MySQL, MariaDB, PostgreSQL, MongoDB, Aurora, and SAP
- Data can be synced to most of the above engines, as well as Redshift, S# and DynamoDB
- DMS is useful in a number of common scenarios:
- Scaling database resources up and down without downtime
- Migrating databases from on-premises to AWS, from AWS to on-premises or to/from other cloud platforms
- Moving data between different DB engines, including schema conversion
- Partial / subset data migration
- Migration with little to no admin overhead, as a service