A couple weeks ago I wrote about making our reports take a couple seconds instead of 3 minutes. What I discovered later is that we didn’t actually have access to historical reports, because all the DynamoDB entries that pointed to the S3 data behind those reports had a TTL of one day. After asking around, I learned the reason was simple: some partition keys were exceeding 10GB, which is the DynamoDB item collection limit per partition key (aka “all items with the same partition key”). So the “solution” was to delete everything fast enough that we never hit the limit. Great.
So I created a task for myself to fix it properly. I started by asking AI for the best approaches, and it basically gave me “sharding” plus a bunch of nonsense that didn’t apply. Which, to be fair, is also what humans do when they don’t want to think about DynamoDB.
Sharding (what it actually means)
Sharding in DynamoDB is basically admitting that your partition key is doing too much work, then splitting it into multiple keys so DynamoDB can spread the data across more partitions.
Instead of:
PK = customerId
You do:
PK = customerId#shard_00
PK = customerId#shard_01
…
PK = customerId#shard_N
And you pick the shard deterministically, usually based on something like a hash, or a time bucket, or both. The goal is that you don’t end up with one key that holds infinite data and gets hammered, while the rest of the table sits there bored.
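For concreteness, here’s a minimal sketch of the hash-based version. The shard count, key shape, and function name are mine, not from our codebase; the point is just that the same item always maps to the same shard:

```typescript
// Minimal sketch (assumptions: 8 shards, keys shaped like customerId#shard_NN).
import { createHash } from "node:crypto";

const SHARD_COUNT = 8; // assumed; pick something that keeps each item collection small

function shardedPartitionKey(customerId: string, recordId: string): string {
  // Hash a stable attribute of the item so the same record always lands on the
  // same shard, and records spread roughly evenly across shards.
  const hash = createHash("md5").update(recordId).digest();
  const shard = hash.readUInt32BE(0) % SHARD_COUNT;
  return `${customerId}#shard_${String(shard).padStart(2, "0")}`;
}

// shardedPartitionKey("cust-123", "report-abc") always returns the same key,
// e.g. "cust-123#shard_03".
```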
I wasn’t a fan of it at first because reads become “scatter-gather.” If you need a month of data and you shard by day, that’s 30 queries. If you shard by hash, you might need N queries to cover all shards. Either way, you’re paying with complexity and more round trips.
The “recommended” time-series approach (and why I didn’t do it)
Then I found AWS’s recommended time-series design.
It does make sense: you bucket time so one partition key doesn’t grow forever. But once you implement it, you still end up in “multiple queries” land, and you also end up needing more glue logic around it (how do we pick buckets, how do we query ranges, how do we backfill, how do we avoid edge cases, etc).
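For comparison, the time-bucketed key looks something like this (monthly buckets are my own illustration; the guidance is just to pick a period that keeps each collection small):

```typescript
// Sketch of a time-bucketed partition key, one bucket per month (granularity assumed).
function timeBucketedPartitionKey(customerId: string, createdAt: Date): string {
  const year = createdAt.getUTCFullYear();
  const month = String(createdAt.getUTCMonth() + 1).padStart(2, "0");
  return `${customerId}#${year}-${month}`;
}

// A three-month report now means three Query calls, one per bucket,
// which is the "multiple queries" land mentioned above.
```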
Also, depending on which version of the pattern you choose, you may end up needing scheduled workflows or background jobs to manage things. And of course those can fail. I didn’t want to introduce more “scheduler + automation + permissions + deploy” complexity unless I had to.
So I decided against it.
What I really wanted to do (Glue + Athena)
What I really wanted was Glue + Athena for this. It just makes more sense for historical/reporting-ish queries, especially when the data already lives in S3 and you can use partitions to restrict the amount of data scanned instead of trying to force DynamoDB to be a data lake.
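For the curious, this is roughly what I had in mind: a Glue table over the S3 data, partitioned by customer and date, queried through Athena so only the matching prefixes get scanned. Table, column, database, and bucket names below are hypothetical, not our actual setup:

```typescript
// Sketch only: querying partitioned S3 data through Athena (AWS SDK v3).
// "reports", "reporting", the partition columns, and the results bucket are
// made-up names for illustration.
import { AthenaClient, StartQueryExecutionCommand } from "@aws-sdk/client-athena";

const athena = new AthenaClient({});

async function startMonthlyReportQuery() {
  return athena.send(
    new StartQueryExecutionCommand({
      QueryString: `
        SELECT *
        FROM reports
        WHERE customer_id = 'cust-123'                   -- partition column
          AND dt BETWEEN '2024-05-01' AND '2024-05-31'   -- partition column
      `,
      QueryExecutionContext: { Database: "reporting" },
      ResultConfiguration: { OutputLocation: "s3://my-athena-results/" },
    })
  );
}
```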
The problem is that our access patterns weren’t just “by primary key.” We also rely on GSIs, and once you go down the route of “let’s rebuild this access pattern on top of S3,” you’re basically signing up for a lot more infrastructure. And I’ve already suffered enough setting up permissions for Glue jobs in CDK that I avoid creating new Glue-based pipelines unless it’s absolutely necessary.
So I didn’t do that either.
So… I ended up with sharding
Yes, doing N queries for each time range is annoying. But we can parallelize it, and we can also put reasonable limits on what users can query (last month, last 3 months, etc). And the actual implementation didn’t take long.
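The read side is basically one Query per shard, fired in parallel and merged. A sketch with the v3 DocumentClient; the table name, key attribute names, and shard count are assumptions, and pagination is left out:

```typescript
// Sketch: scatter-gather read across all shards for one customer, in parallel.
// TableName, pk/sk attribute names, and SHARD_COUNT are assumptions; pagination omitted.
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, QueryCommand } from "@aws-sdk/lib-dynamodb";

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const SHARD_COUNT = 8;

async function queryAllShards(customerId: string, fromSk: string, toSk: string) {
  const queries = Array.from({ length: SHARD_COUNT }, (_, shard) =>
    ddb.send(
      new QueryCommand({
        TableName: "reports",
        KeyConditionExpression: "pk = :pk AND sk BETWEEN :from AND :to",
        ExpressionAttributeValues: {
          ":pk": `${customerId}#shard_${String(shard).padStart(2, "0")}`,
          ":from": fromSk,
          ":to": toSk,
        },
      })
    )
  );
  const results = await Promise.all(queries); // all shards in flight at once
  return results.flatMap((r) => r.Items ?? []);
}
```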
The trickiest part was the transition. I didn’t want to break anything, especially with TTL involved.
So I pushed a change with dual-reads first (read from both the old non-sharded key and the new sharded keys). After that was deployed everywhere, I pushed the change to start writing to the sharded structure. I haven’t been paged yet, so that’s a success.
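The dual-read itself was just “query both key shapes, merge, dedupe.” Something like this, where queryLegacyKey and queryAllShards stand in for the two read paths (hypothetical helpers, not our real function names):

```typescript
// Sketch of the dual-read transition step. queryLegacyKey reads the old
// unsharded key (PK = customerId); queryAllShards reads the new sharded keys.
// Both are hypothetical helpers that return items with at least a sort key `sk`.
type Item = { sk: string } & Record<string, unknown>;

declare function queryLegacyKey(customerId: string, fromSk: string, toSk: string): Promise<Item[]>;
declare function queryAllShards(customerId: string, fromSk: string, toSk: string): Promise<Item[]>;

async function dualRead(customerId: string, fromSk: string, toSk: string): Promise<Item[]> {
  const [legacy, sharded] = await Promise.all([
    queryLegacyKey(customerId, fromSk, toSk),
    queryAllShards(customerId, fromSk, toSk),
  ]);
  // During the overlap window the same logical item can exist under both key
  // shapes, so dedupe on the sort key and prefer the sharded copy.
  const seen = new Set<string>();
  return [...sharded, ...legacy].filter((item) => {
    if (seen.has(item.sk)) return false;
    seen.add(item.sk);
    return true;
  });
}
```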
Next step is deleting the non-sharded reads, but I’m waiting a few days to make sure TTL expires the old items and we’re not depending on them anywhere. The hard part is behind me.
Your turn
What DynamoDB limitations have you run into that forced you to redesign something?
Cheers!
Evgeny Urubkov (@codevev)