Outstanding Solutions in
Azure Cosmos DB
Azure Cosmos DB is Microsoft's globally distributed, multi-model database service. It lets you scale throughput and storage elastically and independently across any number of Azure regions worldwide, and take advantage of fast, single-digit-millisecond data access backed by an SLA.
The specialists at Oscore have been using Cosmos DB since it was called DocumentDB. We’ve stored petabytes of data for clients in various regions across the globe. If you’re already using Cosmos DB and looking to optimize or refactor, or if you’re just considering building a system around Cosmos DB, then please get in touch.
Deeper Dive and Top Tips
Cosmos DB (formerly known as DocumentDB) is Microsoft’s flagship cloud-based database service. It offers immense scalability, one-click global distribution, and guaranteed low latency backed by an SLA.
Uniquely amongst database offerings, it is “multi-model”, meaning that it can operate as a NoSQL database, a columnar datastore, a graph database, and several other styles. While this is clearly a towering achievement of technical architecture, it isn’t the feature that really drives adoption; after all, if you want a graph database, why not just use a “native” graph database?
What is a huge draw, and unbeatable across the entire market of database-as-a-platform solutions, is the ease with which the system can be configured to scale and replicate. This provides secure, geo-redundant mass storage that sits close to end users and therefore gives them very fast access.
Most of the Microsoft Customer Stories feature giants such as Mercedes-Benz, Avanade, Chipotle, Siemens and ExxonMobil, but in fact Cosmos DB is perfectly suited to start-ups with plans to go global, as well as small to medium-sized businesses looking to implement a next-generation capability for their customers.
The platform is available for free for the first 400 RU/s (request units per second) and up to 5 GB of storage, so it’s straightforward and economical to start experimenting with. However, it is certainly not straightforward to architect a solution that is in “mechanical sympathy” with the capabilities of the platform. Firstly, the sheer breadth of the system and its configurability can be overwhelming. Secondly, there are lots of design decisions that need to be made upfront – especially the infamous Partition Key decision. Thirdly, lots of terms and concepts share a name with their counterparts in other database systems but are fundamentally different – transactions, triggers, stored procedures, unique indexes and plenty of others.
If you’re just starting out on your Cosmos journey, or if you’re already using it but keep thinking “use case X would be simple on database platform Y, why the heck can’t I manage it with Cosmos?” then here are some key tips to get you on the right track.
Tip 01: Cosmos DB offers a SQL API, but you must avoid architecting your solution using relational database patterns.
If your business or development team hails from a SQL/relational database background, then it is critical that you absorb the design patterns of the NoSQL world and do not simply attempt to implement relational design patterns in Cosmos DB. In particular:
- Cosmos DB has features that initially look like analogues and namesakes of relational concepts but are completely different under the hood.
- Collection != Table. Creating one collection per entity is inefficient and unmanageable in Cosmos DB. It is commonplace, and best practice, to have multiple distinct entities stored in a single collection, as long as it makes sense for them to share the same “partition key” (see Tip 2 below). If you’re moving from SQL, this will feel almost incomprehensible – collections seem like tables, and you would never put different entities into the same table.
- Best practice generally dictates storing completely distinct entities within the same collection as long as they share the same “partition key”.
- Creating parent/child relationships using “foreign keys” à la relational database is often a recipe for performance nightmares.
- Embedding related objects into parent objects is commonplace in Cosmos DB “schema” design. Redundant copies of data are often the best way to solve a problem, in a way that is almost never the case in a relational database.
- Say goodbye to relationships, referential integrity, unique indexes, transactions, database constraints – these factors that sit at the heart of SQL or relational databases are non-existent in Cosmos DB (or exist in a very limited way). Your application code is going to have to work much harder to ensure that only valid data enters the system. Then again, your application code is likely written in a highly evolved language like Python or C#/.NET Core, which gives you much more scope for sophisticated integrity and validation checks.
Bottom line: To take full advantage of Cosmos DB requires a mindset change that can be awkward, but which will ultimately open up whole new horizons of application design.
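As a rough illustration of the points in this tip, here is a minimal sketch in plain Python (no SDK involved) of two different entity types living in one collection, with order lines embedded in the order and integrity checks done in application code. The field names (`customerId` as the partition key, `type` as a discriminator) and the helper functions are hypothetical choices for this example, not something Cosmos DB prescribes.

```python
def make_customer(customer_id, name):
    """Customer document; 'customerId' is the assumed partition key."""
    return {
        "id": f"customer-{customer_id}",
        "customerId": customer_id,
        "type": "customer",  # discriminator so entity types can share a collection
        "name": name,
    }


def make_order(customer_id, order_id, lines):
    """Order document with its line items embedded instead of stored as child rows."""
    return {
        "id": f"order-{order_id}",
        "customerId": customer_id,  # same partition key value as the customer
        "type": "order",
        "lines": [dict(line) for line in lines],
        "total": sum(line["qty"] * line["unitPrice"] for line in lines),
    }


def validate(doc):
    """Application-side checks standing in for database constraints."""
    errors = []
    for field in ("id", "customerId", "type"):
        if not doc.get(field):
            errors.append(f"missing {field}")
    if doc.get("type") == "order" and not doc.get("lines"):
        errors.append("order has no lines")
    return errors


# A plain list stands in for the collection in this sketch.
container = [
    make_customer("c1", "Ada"),
    make_order("c1", "1001", [{"sku": "A", "qty": 2, "unitPrice": 5}]),
]

for doc in container:
    print(doc["type"], validate(doc))
```

Because both documents carry the same `customerId`, a customer and all of their orders can be fetched together from a single partition.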
Tip 02: Recognize that partitioning is a different beast in Cosmos DB
Table partitioning exists in relational databases, but in Cosmos DB collection partitioning is a much more central concept. Every collection must be partitioned, and Cosmos DB delivers unlimited capacity if your partitioning scheme is correctly planned. Conversely, a few bad choices around partitioning can hamstring your system for years to come.
- Containers (the current name for collections) are the inherent drivers of scalability. Data and throughput are partitioned based on the key you specify for the container.
- A good partition key allows fast retrieval of related entities while ensuring that writes are evenly distributed.
- If you require transactionality in your Cosmos DB database, the entities you wish to participate in a transaction must be in the same collection and must share the same partition key. According to Microsoft’s partitioning guidance, a logical partition defines the scope of database transactions and can be updated using a transaction with snapshot isolation. If the entities you wish to participate in a transaction are in different collections or do not share the same partition key, then you cannot use Cosmos DB transactions.
Of course, it isn’t always possible to group your entities in such a way. Broadly speaking, if your use cases require strong guarantees of ACID transactions, then you will either need to write some quite sophisticated application code or think hard about using a different platform. At Oscore, we’ve found that Azure SQL Server offers a compelling combination of JSON storage within a transactional framework, but that’s a story for another article (or contact us if you’d like to know more).
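To make the "evenly distributed writes" point concrete, the sketch below simulates how two candidate partition keys would spread 10,000 writes across ten buckets. It is only an approximation under stated assumptions: Cosmos DB's real hash function and its mapping of logical to physical partitions are internal, so SHA-256 modulo a bucket count stands in for them, and the `country` and `deviceId` fields are invented for the example.

```python
import hashlib
from collections import Counter


def bucket_for(key, bucket_count):
    """Map a partition key value to a bucket (stand-in for the service's own hashing)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % bucket_count


def writes_per_bucket(keys, bucket_count):
    """Count how many writes land in each bucket for a stream of key values."""
    counts = Counter(bucket_for(k, bucket_count) for k in keys)
    return [counts.get(b, 0) for b in range(bucket_count)]


# Hypothetical workload: 90% of events come from one country,
# while every event has its own device id.
events = [
    {"country": "US" if i % 10 else "NZ", "deviceId": f"device-{i}"}
    for i in range(10_000)
]

by_country = writes_per_bucket((e["country"] for e in events), 10)
by_device = writes_per_bucket((e["deviceId"] for e in events), 10)

print("country key :", by_country)   # nearly everything piles into one bucket
print("deviceId key:", by_device)    # load is spread across all buckets
```

A low-cardinality key such as `country` concentrates writes on a hot partition, while a high-cardinality key spreads them out. The trade-off is that queries spanning many key values then have to touch more partitions.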
Tip 03: Cosmos DB has a distinctive pricing model based on provisioning and utilization, which can get pricey if you don't design for it specifically.
Cosmos DB bills for provisioned throughput and consumed storage. Provisioned throughput is measured in Request Units (RUs), which represent an amount of computing power. Each operation (insert, read, replace, upsert or query) consumes some number of RUs. If you exceed your provisioned throughput, the request fails with an HTTP 429 and you need to retry it, so you clearly need to provision enough throughput for your expected utilization. Provision more throughput than you need, however, and you'll find the bills mounting. Storage is billed per GB of SSD-backed data and index. Generally speaking, throughput is the cost you need to optimize.
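The retry-on-429 behaviour can be sketched as a small backoff loop. This is an illustrative pattern only: `ThrottledError`, `with_retries` and `make_flaky` are hypothetical names invented here, not part of any Cosmos DB SDK. The official SDKs include their own retry handling for throttled requests, so treat this as a picture of what happens rather than code you necessarily need to write.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for an HTTP 429 (request rate too large) response."""

    def __init__(self, retry_after_seconds=0.0):
        super().__init__("429: request rate too large")
        self.retry_after_seconds = retry_after_seconds


def with_retries(operation, max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Run operation(), retrying on ThrottledError with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ThrottledError as err:
            if attempt == max_attempts:
                raise  # give up and surface the throttling to the caller
            backoff = base_delay * (2 ** (attempt - 1))
            # Wait at least as long as the server's hint, plus a little jitter.
            delay = max(err.retry_after_seconds, backoff) + random.uniform(0, base_delay)
            sleep(delay)


def make_flaky(failures):
    """Build a fake write that is throttled `failures` times before succeeding."""
    state = {"calls": 0}

    def operation():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise ThrottledError(retry_after_seconds=0.01)
        return {"status": 201, "calls": state["calls"]}

    return operation


result = with_retries(make_flaky(2))
print(result)
```

If retries like these show up regularly in production, that is usually a sign that the provisioned throughput or the partitioning scheme needs another look.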
Provisioning can be done at the container level or at the database level (typically for smaller Cosmos DB installations). Microsoft has recently introduced “Autopilot mode”, where provisioning is automatically adjusted within user-defined boundaries at either the container or the database level. At Oscore we are currently experimenting with Autopilot mode and will be writing up our findings shortly; contact us if you’d like to discuss where we’re at.
What can be very surprising in the early stages of Cosmos DB adoption is the types of operations that consume a lot of RU.
- Queries that can be resolved within a single collection and a single partition key have very low RU utilization. Queries that cannot be resolved within a single partition key generally need to inspect every partition, which obviously requires much more computing power.
- Aggregate queries can read a lot of data and therefore consume significant RUs.
- Making minor updates to large quantities of large objects will require the whole object to be read in and the whole object to be written out. So, changing a single boolean value on a large object is a large operation, not a tiny one.
- Queries issued through the Cosmos SDK with predicates that cannot be sent to the Cosmos server can require huge amounts of data to be pulled down to the client.
Be prepared for a lot of analysis around the RUs being consumed by your queries!
If you are provisioning by container, then be sure to optimize the mapping of entities to containers. If you go for one entity per container, which is probably a bad idea anyway as per Tip 1, you’ll surely end up overpaying for provisioned throughput.
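A back-of-the-envelope calculation shows why one entity per container tends to overpay. The sketch assumes a minimum of 400 RU/s per container and provisioning in steps of 100 RU/s for manually provisioned throughput. Check both figures against current Azure pricing, and note that the peak loads below are invented for the example.

```python
MIN_RU_PER_CONTAINER = 400  # assumed per-container minimum for manual throughput
RU_STEP = 100               # assumed provisioning increment


def provisioned_ru(container_peaks, minimum=MIN_RU_PER_CONTAINER, step=RU_STEP):
    """Total RU/s to provision when each container is provisioned separately."""
    total = 0
    for peak in container_peaks.values():
        rounded = -(-peak // step) * step  # round up to the next step
        total += max(minimum, rounded)
    return total


# Hypothetical peak RU/s per entity type.
one_per_entity = {
    "customers": 50,
    "orders": 300,
    "invoices": 120,
    "addresses": 20,
    "audit": 90,
}

# Same workload, with the customer-related entities sharing one container.
# Summing the peaks is conservative: it assumes they all peak at the same time.
grouped = {
    "customerData": 50 + 300 + 120 + 20,
    "audit": 90,
}

print(provisioned_ru(one_per_entity))  # 2000
print(provisioned_ru(grouped))         # 900
```

In this made-up workload, four of the five per-entity containers never come close to their 400 RU/s floor, so most of the provisioned throughput is paid for but unused.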
It’s very hard to get an idea of real-world costs for your installation without understanding your RU utilization. Once you have a sense of that, you can use the Cosmos DB capacity planner to work out the costs, as it will help you estimate your storage requirements effectively. You can also use the Monitor Azure Cosmos DB tool to set up alerts if your storage usage is heading upwards.
A good choice of partition key and sensible groupings of entities into containers can dramatically reduce your RUs and therefore costs.
And finally ...
Even given the above insights and the wealth of information available on the web, it can still be tricky to start off in the right direction or change direction once development is in progress. Consider collaborating with a partner like Oscore with deep experience in the domain. Reach out to us today to chat about your project and requirements.