AWS DynamoDB Summary
DynamoDB is a fully managed NoSQL database service that scales seamlessly while providing fast, predictable performance.
It lets you offload to AWS the administrative burdens of operating and scaling a distributed database, such as setup and configuration, provisioning, replication, throughput capacity planning, software patching, and cluster scaling.
With DynamoDB you get encryption at rest, and you can scale your throughput capacity up or down without performance impact or downtime. Utilization and performance metrics can be monitored through the AWS Management Console.
How Does it Work?
DynamoDB is pretty interesting because it automatically spreads the data and traffic for your tables over a number of servers to handle your throughput and storage requirements. It stores data in partitions (allocations of storage for a table) that are handled entirely by DynamoDB — you never manage the partitions yourself.
DynamoDB does all this while maintaining consistent, fast performance. All of your data is stored on SSDs and automatically replicated across multiple Availability Zones in a region, which gives DynamoDB high availability and durability.
You get encryption at rest, your tables can handle virtually any level of request traffic, and you can store and retrieve any amount of data.
DynamoDB uses tables, items, and attributes which make up the core components and building blocks you will use.
- Table — A collection of items
- Items — A collection of attributes
Think of it like an organ inside your body. An organ is a system of tissues. Tissues are systems of cells.
- Table — A collection of data (items). DynamoDB stores items in tables. Tables are schema-less (beyond the primary key, items don't need a fixed structure), and by default you can have 256 tables per region (a limit that can be raised).
- Items — An item is a group of attributes that is unique among the other items. In a table of Cars, each item represents a vehicle; in a table of People, each item represents a person. Each table has zero or more items.
- Attributes — These are fundamental data elements: building blocks that cannot be broken down further. In a table of people, attributes would be first name, last name, phone number, etc.
Whenever you create a table in DynamoDB, you must specify the table name and the table's primary key. The primary key uniquely identifies each item in the table, so no two items can have the same key.
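As a concrete sketch, here is the shape of the parameters you would pass to boto3's `create_table` call for a hypothetical "People" table with a simple (partition-key-only) primary key. The table and attribute names are illustrative, not from the original text, and the call itself is left commented out since it requires AWS credentials.

```python
# Sketch of a boto3 create_table request for a hypothetical "People" table.
# The table name and attribute names are illustrative assumptions.
create_table_params = {
    "TableName": "People",
    "KeySchema": [
        {"AttributeName": "PersonId", "KeyType": "HASH"},  # partition key
    ],
    "AttributeDefinitions": [
        {"AttributeName": "PersonId", "AttributeType": "S"},  # S = string
    ],
    "BillingMode": "PAY_PER_REQUEST",
}

# With AWS credentials configured, you would then run:
# import boto3
# boto3.client("dynamodb").create_table(**create_table_params)
```

Note that `AttributeDefinitions` only declares key attributes — non-key attributes need no declaration, which is what makes the table schema-less.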
There are two different types of primary keys:
- Partition Key — This is a simple version of a primary key that is composed of one attribute.
- Partition Key and Sort Key (AKA composite primary key) — This is composed of two attributes, the partition key, and the sort key.
DynamoDB uses the partition key value as input to an internal hash function. The output of the hash function determines the partition of storage in which the item will be stored. All items with the same partition key are stored together, sorted by sort key value.
- If a table has a partition key and a sort key, two items can share the same partition key value (their sort key values must differ). If there is no sort key, no two items can have the same partition key value.
- A composite key essentially gives you more flexibility when querying data. For example, suppose your music app (backed by DynamoDB) lets you search for an artist. If you provide only the artist name, DynamoDB returns all songs by that artist. To retrieve a particular set of songs by that artist, you can provide the artist name along with a range of song titles.
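The routing and ordering behaviour described above can be sketched with a toy model — this is not DynamoDB's real internal hash function, just an illustration of the idea that a hashed partition key picks a partition and items sharing that key are kept together, sorted by sort key:

```python
import hashlib

# Toy model of partition routing: hash the partition key to pick a partition.
# NUM_PARTITIONS and the item data are illustrative assumptions.
NUM_PARTITIONS = 4

def partition_for(partition_key: str) -> int:
    """Map a partition key to one of NUM_PARTITIONS storage partitions."""
    digest = hashlib.md5(partition_key.encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

partitions = {i: [] for i in range(NUM_PARTITIONS)}
songs = [
    {"Artist": "Acme Band", "SongTitle": "Song B"},
    {"Artist": "Acme Band", "SongTitle": "Song A"},
    {"Artist": "Other Band", "SongTitle": "Song C"},
]
for item in songs:
    partitions[partition_for(item["Artist"])].append(item)

# Both "Acme Band" items land in the same partition; within a partition
# key, items are ordered by the sort key (SongTitle).
p = partition_for("Acme Band")
acme = sorted((i for i in partitions[p] if i["Artist"] == "Acme Band"),
              key=lambda i: i["SongTitle"])
```

A query for the artist alone returns everything in `acme`; a query for the artist plus a sort-key condition narrows it further, which is exactly the flexibility the composite key buys you.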
A secondary index allows you to query the table using an alternate key (in addition to queries using the primary key). It’s not a requirement to use indexes with DynamoDB but it will offer more flexibility when querying your data in DynamoDB.
There are two types of indexes in DynamoDB:
- Global Secondary Index — An index with a partition and sort key that can be different from those on the table.
- Local Secondary Index — An index that has the same partition key as the table, but a different sort key.
Each DynamoDB table has a limit of:
- 20 global secondary indexes (default limit)
- 5 local secondary indexes per table
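A local secondary index can be illustrated with a small in-memory sketch: the index keeps the table's partition key ("Artist" here) but sorts by an alternate attribute ("AlbumTitle"). All names and data are hypothetical:

```python
# Toy illustration of a local secondary index: same partition key as the
# base table ("Artist"), but an alternate sort key ("AlbumTitle").
items = [
    {"Artist": "Acme Band", "SongTitle": "Song A", "AlbumTitle": "Zed"},
    {"Artist": "Acme Band", "SongTitle": "Song B", "AlbumTitle": "Alpha"},
]

def query_by_album(artist: str):
    """Return an artist's items ordered by the index sort key, AlbumTitle,
    instead of the base table's sort key, SongTitle."""
    return sorted((i for i in items if i["Artist"] == artist),
                  key=lambda i: i["AlbumTitle"])

albums = [i["AlbumTitle"] for i in query_by_album("Acme Band")]
```

A global secondary index would go further and allow a different partition key as well, effectively letting you query the same data along an entirely different axis.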
DynamoDB Streams
DynamoDB Streams is an optional feature that captures data modification events in DynamoDB tables. Stream data appears in near-real time, in the order in which the events occurred.
- Stream endpoints appear as streams.dynamodb.amazonaws.com
- Each event is represented by a stream record. When you enable a stream on a table, a stream record is written:
- When a new item is added to the table. An image of the entire item is captured.
- When an item is updated. "Before" and "after" images are captured for any modified attributes.
- When an item is deleted from the table. An image of the entire item is captured before deletion.
- The stream record also includes the name of the table, the event timestamp, and other metadata. Stream records have a 24-hour lifespan and are automatically removed from the stream thereafter.
- Streams are divided into groups called shards. A shard acts as a container for multiple stream records and holds the information needed to access and iterate through those records.
- You can also use DynamoDB streams with AWS Lambda to create a trigger — code that automatically executes when an event of interest appears in a stream.
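A Lambda trigger for a stream can be sketched as a handler that walks the records and pulls out the event type plus the item images described above. The event below follows the documented DynamoDB Streams record shape, but the specific keys and values are made up for illustration:

```python
# Minimal sketch of an AWS Lambda handler for a DynamoDB Streams trigger.
# The sample event data (PersonId, FirstName) is a hypothetical example.
def handler(event, context):
    changes = []
    for record in event.get("Records", []):
        ddb = record.get("dynamodb", {})
        changes.append({
            "event": record["eventName"],   # INSERT, MODIFY, or REMOVE
            "keys": ddb.get("Keys"),
            "new": ddb.get("NewImage"),     # present for INSERT / MODIFY
            "old": ddb.get("OldImage"),     # present for MODIFY / REMOVE
        })
    return changes

sample_event = {
    "Records": [{
        "eventName": "INSERT",
        "dynamodb": {
            "Keys": {"PersonId": {"S": "42"}},
            "NewImage": {"PersonId": {"S": "42"}, "FirstName": {"S": "Ada"}},
        },
    }]
}
result = handler(sample_event, None)
```

In a real deployment you would attach this function to the table's stream ARN as an event source mapping; Lambda then invokes it with batches of records as modifications occur.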
DynamoDB Streams enables powerful solutions such as data replication within and across AWS regions, materialized views of data in DynamoDB tables, data analysis using Kinesis, and more.
DAX: DynamoDB Accelerator
DAX is DynamoDB’s fully managed, highly available, in-memory cache. You would use DAX for applications requiring the fastest possible read times. Some examples include gaming, trading applications, and real-time bidding.
DAX Quick Points:
· Up to a 10x performance improvement for reads (from milliseconds to microseconds), even at millions of requests per second
· DAX is easy to operate: there is no cache invalidation, data population, or cluster management for you to handle
· Compatible with DynamoDB API calls
· Improves response times for eventually consistent reads only
· Data is written to the cache and the backend store at the same time (write-through)
· You point your API calls at DAX instead of your table
· Cache hit — if an item you request is in the cache, DAX returns it without touching the table
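The cache-hit behaviour can be illustrated with a toy read-through cache — this is a sketch of the idea, not the actual DAX client, and the backing "table" is just a dict standing in for DynamoDB:

```python
# Toy read-through cache illustrating DAX-style hits: the first read falls
# through to the backend table, subsequent reads are served from memory.
class CachedTable:
    def __init__(self, table: dict):
        self.table = table      # stands in for the DynamoDB table
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def get_item(self, key):
        if key in self.cache:   # cache hit: answered from memory
            self.hits += 1
            return self.cache[key]
        self.misses += 1        # cache miss: read the backend store
        item = self.table.get(key)
        if item is not None:
            self.cache[key] = item
        return item

store = CachedTable({"42": {"FirstName": "Ada"}})
store.get_item("42")   # miss: populates the cache
store.get_item("42")   # hit: served from the cache
```

A real DAX cluster adds TTL-based eviction and write-through on writes, which is why strongly consistent reads bypass it (next section).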
Anti-patterns (what is DAX not suitable for?):
· Not suitable for write-intensive applications or for any application that requires strongly consistent reads