Boto3 for DynamoDB

As I described in my last post, Boto3 is the higher-level SDK for AWS services. One of the services I've used quite a lot is DynamoDB, a NoSQL key-value store. Diving into the details of DynamoDB is a little outside the scope of this post, but it has some similarities to other NoSQL database systems like MongoDB and CouchDB.

Using Boto3, you can operate on DynamoDB tables in pretty much any way you would ever need to: create new tables, read and write data either individually or in bulk, delete tables, change table capacities, set up auto-scaling, and so on.
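As a small illustration of the single-item case, here is a minimal sketch that writes one item and reads it back through a Table resource. The helper names save_item and load_item, the table name and the key name are made up for this example; put_item and get_item are the Table methods Boto3 provides. The table object is passed in, so nothing here touches AWS until you hand it a real one.

```python
def save_item(table, item):
    """Write a single item (a plain dict) to a DynamoDB Table resource."""
    table.put_item(Item=item)


def load_item(table, key):
    """Read a single item by its primary key; return None if it is missing."""
    response = table.get_item(Key=key)
    # get_item leaves out the 'Item' entry entirely when nothing matches.
    return response.get('Item')


def example():
    # Hypothetical table and key names. boto3 is imported here so the two
    # helpers above can also be tried against a stand-in table object.
    import boto3
    table = boto3.resource('dynamodb', region_name='us-east-1').Table('mytable')
    save_item(table, {'partition_key': 'user-1', 'name': 'Ada'})
    return load_item(table, {'partition_key': 'user-1'})
```

Calling example() needs AWS credentials and an existing table, so treat it as a usage outline rather than something to run blindly.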

One thing I mentioned in the last post is that when you use Boto3 heavily, it is important to build your own set of Boto3 helper functions. They make your code more readable and keep you from repeating yourself.

Let me give you an example. Let's say you want to check if a table exists, create it if it doesn't, then fill it with some values (here, items is a list of dictionaries, one per row).

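import boto3
from botocore.exceptions import ClientError

session = boto3.session.Session(region_name='us-east-1')
dynamodb = session.resource('dynamodb')
table = dynamodb.Table('mytable')

# Table() is lazy: the first attribute access calls DescribeTable,
# which raises ClientError if the table does not exist.
try:
    table.creation_date_time
    table_exists = True
except ClientError as error:
    if error.response['Error']['Code'] != 'ResourceNotFoundException':
        raise
    table_exists = False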

if table_exists:
    with table.batch_writer() as batch:
        for item in items:
            batch.put_item(Item=item)
else:
    new_table = dynamodb.create_table(
        AttributeDefinitions=[
            {
                'AttributeName' : 'partition_key',
                'AttributeType' : 'S',
            },
        ],
        KeySchema=[
            {
                'AttributeName' : 'partition_key',
                'KeyType' : 'HASH',
            },
        ],
        ProvisionedThroughput={
            'ReadCapacityUnits': 5,   # capacity units must be integers
            'WriteCapacityUnits': 5,
        },
        TableName='mytable',
    )
    new_table.wait_until_exists()
    with table.batch_writer() as batch:
        for item in items:
            batch.put_item(Item=item)

But if you've created a set of modules to do this, the code could look like:

from aws_functions import check_table, create_table, bulk_insert

table_name = 'mytable'
region = 'us-east-1'
if not check_table(table_name, region):
    create_table(table_name=table_name, region_name=region, partition_key='partition_key')

bulk_insert(table_name, region, items)

Nice and short, right? If you use DynamoDB for a lot of your work, writing modules like these goes a long way toward keeping your code readable and maintainable.
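For reference, here is one way check_table and bulk_insert could be written. This is only a sketch under my own assumptions: the optional dynamodb argument and the _resource helper are not part of the snippet above, and exist so the functions can be tried with a stand-in resource instead of a live AWS account.

```python
def _resource(region, dynamodb=None):
    """Return the DynamoDB resource to use, creating one if none was given."""
    if dynamodb is None:
        # Imported lazily so the helpers can be exercised with a stub resource.
        import boto3
        dynamodb = boto3.session.Session(region_name=region).resource('dynamodb')
    return dynamodb


def check_table(table_name, region, dynamodb=None):
    """Return True if the table exists, False if DynamoDB reports it missing."""
    client = _resource(region, dynamodb).meta.client
    try:
        client.describe_table(TableName=table_name)
    except client.exceptions.ResourceNotFoundException:
        return False
    return True


def bulk_insert(table_name, region, items, dynamodb=None):
    """Write every item in `items` with the batch writer; return the count."""
    table = _resource(region, dynamodb).Table(table_name)
    count = 0
    # batch_writer buffers the puts and sends them in batches on our behalf.
    with table.batch_writer() as batch:
        for item in items:
            batch.put_item(Item=item)
            count += 1
    return count
```

Catching only ResourceNotFoundException means other failures, such as missing credentials, still surface as errors instead of being mistaken for "table does not exist".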

And in the modules, you can allow for different variables in the tables. For instance, when you create a new DynamoDB table, its primary key can consist of one attribute (called the 'partition' key) or two (the second is called the 'sort' key). You can also include indexes, which make search and retrieval easier. Like this:

import boto3

def create_table(**kwargs):
    """
    This creates a new table in Dynamo
    args: table_name, region_name, partition_key (required);
          profile, sort_key, throughput, indexes, attributedefs (optional)
    """
    table_name = kwargs['table_name']
    region_name = kwargs['region_name']
    partition_key = kwargs['partition_key']
    # Optional arguments fall back to defaults when they are not supplied.
    profile = kwargs.get('profile')  # None means the default credentials
    sort_key = kwargs.get('sort_key', '')
    throughput = kwargs.get('throughput', 5)
    indexes = kwargs.get('indexes', [])
    # Copy the list so the appends below do not modify the caller's list.
    attributedefs = list(kwargs.get('attributedefs', []))

    print ("creating necessary table...")
    session = boto3.session.Session(profile_name=profile)
    dynamodb = session.resource('dynamodb', region_name=region_name)

    throughput = int(throughput)
    if indexes:
        for index in indexes:
            index['ProvisionedThroughput']['ReadCapacityUnits'] = int(index['ProvisionedThroughput']['ReadCapacityUnits'])
            index['ProvisionedThroughput']['WriteCapacityUnits'] = int(index['ProvisionedThroughput']['WriteCapacityUnits'])

    if not sort_key and not indexes:
        new_table = dynamodb.create_table(
            AttributeDefinitions=[
                {
                    'AttributeName' : partition_key,
                    'AttributeType' : 'S',
                },
            ],
            KeySchema=[
                {
                    'AttributeName' : partition_key,
                    'KeyType' : 'HASH',
                },
            ],
            ProvisionedThroughput={
                'ReadCapacityUnits': throughput,
                'WriteCapacityUnits': throughput,
            },
            TableName=table_name,
        )
    elif not indexes and sort_key:
        new_table = dynamodb.create_table(
            AttributeDefinitions=[
                {
                    'AttributeName' : partition_key,
                    'AttributeType' : 'S',
                },
                {
                    'AttributeName' : sort_key,
                    'AttributeType' : 'S',
                }
            ],
            KeySchema=[
                {
                    'AttributeName' : partition_key,
                    'KeyType' : 'HASH',
                },
                {
                    'AttributeName' : sort_key,
                    'KeyType' : 'RANGE',
                }
            ],
            ProvisionedThroughput={
                'ReadCapacityUnits': throughput,
                'WriteCapacityUnits': throughput,
            },
            TableName=table_name,
        )
    elif not sort_key and indexes:
        if partition_key == 'Master_UUID': # master table
            attributedefs.append(
                {
                    "AttributeName" : 'Master_UUID',
                    "AttributeType" : "S"
                }
            )
        try:
            new_table = dynamodb.create_table(
                AttributeDefinitions=attributedefs,
                KeySchema=[
                    {
                        'AttributeName' : partition_key,
                        'KeyType' : 'HASH',
                    },
                ],
                GlobalSecondaryIndexes=indexes,
                ProvisionedThroughput={
                    'ReadCapacityUnits': throughput,
                    'WriteCapacityUnits': throughput,
                },
                TableName=table_name,
            )
        except Exception as error:
            print ('Dynamo Create Table Error:', error)
            raise  # without a table, the wait below cannot succeed
    else: #index and sort_key
        if partition_key == 'Master_UUID': # master table
            attributedefs.append(
                {
                    "AttributeName" : 'Master_UUID',
                    "AttributeType" : "S"
                }
            )
            attributedefs.append(
                {
                    'AttributeName' : sort_key,
                    'AttributeType' : 'S',
                }
            )
        else:
            attributedefs.append(
                {
                    'AttributeName' : sort_key,
                    'AttributeType' : 'S',
                }
            )
        try:
            new_table = dynamodb.create_table(
                AttributeDefinitions=attributedefs,
                KeySchema=[
                    {
                        'AttributeName' : partition_key,
                        'KeyType' : 'HASH',
                    },
                    {
                        'AttributeName' : sort_key,
                        'KeyType' : 'RANGE',
                    }
                ],
                ProvisionedThroughput={
                    'ReadCapacityUnits': throughput,
                    'WriteCapacityUnits': throughput,
                },
                TableName=table_name,
                GlobalSecondaryIndexes=indexes,
            )
        except Exception as error:
            print ('Dynamo Create Table Error:', error)
            raise  # without a table, the wait below cannot succeed

    print ("Waiting for table creation...")
    new_table.wait_until_exists()
    print ("Table Created!")  
    return
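The four branches above differ only in which keys and indexes end up in the request, so one possible refactoring is to build the create_table arguments in a single place. The sketch below is my own variation, not the function above: build_table_spec is a made-up name, it assumes every key attribute is a string ('S') unless attributedefs says otherwise, and the by_email index is purely illustrative.

```python
def build_table_spec(table_name, partition_key, sort_key=None,
                     throughput=5, indexes=None, attributedefs=None):
    """Build the keyword arguments for dynamodb.create_table() in one place."""
    # Copy so the caller's list is not modified.
    attribute_defs = list(attributedefs or [])
    defined = {d['AttributeName'] for d in attribute_defs}

    key_schema = [{'AttributeName': partition_key, 'KeyType': 'HASH'}]
    if sort_key:
        key_schema.append({'AttributeName': sort_key, 'KeyType': 'RANGE'})

    # Every key attribute needs a definition; this sketch assumes string keys.
    for key in key_schema:
        name = key['AttributeName']
        if name not in defined:
            attribute_defs.append({'AttributeName': name, 'AttributeType': 'S'})
            defined.add(name)

    spec = {
        'TableName': table_name,
        'AttributeDefinitions': attribute_defs,
        'KeySchema': key_schema,
        'ProvisionedThroughput': {
            'ReadCapacityUnits': int(throughput),
            'WriteCapacityUnits': int(throughput),
        },
    }
    # Only include GlobalSecondaryIndexes when there is at least one index.
    if indexes:
        spec['GlobalSecondaryIndexes'] = indexes
    return spec


# Illustrative call: a table with a sort key and one global secondary index.
example = build_table_spec(
    'mytable', 'partition_key', sort_key='created_at',
    indexes=[{
        'IndexName': 'by_email',
        'KeySchema': [{'AttributeName': 'email', 'KeyType': 'HASH'}],
        'Projection': {'ProjectionType': 'ALL'},
        'ProvisionedThroughput': {'ReadCapacityUnits': 5, 'WriteCapacityUnits': 5},
    }],
    attributedefs=[{'AttributeName': 'email', 'AttributeType': 'S'}],
)
```

With a resource in hand, the table would then be created with dynamodb.create_table(**example). Because the function only builds a dictionary, it can be checked without talking to AWS at all.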

DynamoDB is a powerful and flexible NoSQL system, and one of the advantages of using it over MongoDB, for instance, is that you don't have to manage it yourself. It's powerful enough that it basically is the database behind Amazon.com, so I think it can probably cover many different use cases. This is not to say it doesn't have its limitations. It certainly does, especially when you are trying to do fast search and retrieval; SQL databases are generally better suited to that.
