An overview of Azure’s NoSQL Cosmos DB

Vinay Kumar
March 6, 2024

Azure Cosmos DB is a fully managed platform-as-a-service (PaaS) database. It offers NoSQL and relational database services for building low-latency, highly available applications, with support for multiple data models such as relational, document, vector, key-value, graph, and table. Azure Cosmos DB offers single-digit millisecond response times, high scalability, SLA-backed availability, and enterprise-grade security.

Global distribution: Cosmos DB is a globally distributed database that lets users read from or write to multiple regions across the world, which helps in building low-latency, highly available applications. Cosmos DB replicates data across regions with guaranteed consistency levels. Azure Cosmos DB offers 99.999% read and write availability for multi-region databases.

 

Consistency levels: Azure Cosmos DB supports five consistency levels.

  • Strong: Linearizable reads.
  • Bounded staleness: Consistent prefix. Reads lag behind writes by at most K versions (prefixes) or a time interval T.
  • Session: Consistent prefix within a client session. Monotonic reads, monotonic writes, read-your-writes, write-follows-reads.
  • Consistent prefix: Updates returned are some prefix of all the updates, with no gaps.
  • Eventual: No ordering guarantee for reads; replicas eventually converge once writes stop.
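
To make these guarantees concrete, here is a toy model in Python. It is only a sketch and not how Cosmos DB is implemented: the write log, the ToyReplica class, and the helper functions are all hypothetical names made up for illustration. A replica is modeled as having applied some prefix of the write log, and each helper checks one guarantee against that prefix.

```python
from dataclasses import dataclass


@dataclass
class ToyReplica:
    """Hypothetical read replica that has applied a prefix of the write log."""
    applied: int  # how many writes from the log this replica has applied


def read(log, replica):
    """Consistent prefix: a read returns the first `applied` writes, with no gaps."""
    return log[:replica.applied]


def is_strong(log, replica):
    """Strong: the replica reflects every committed write."""
    return replica.applied == len(log)


def within_bounded_staleness(log, replica, k):
    """Bounded staleness: the replica lags by at most k writes."""
    return len(log) - replica.applied <= k


def satisfies_session(replica, last_write_seen):
    """Session (read-your-writes): the replica has caught up to the client's last write."""
    return replica.applied >= last_write_seen


log = ["w1", "w2", "w3", "w4", "w5"]
lagging = ToyReplica(applied=3)

print(read(log, lagging))                              # ['w1', 'w2', 'w3']
print(is_strong(log, lagging))                         # False
print(within_bounded_staleness(log, lagging, k=2))     # True
print(satisfies_session(lagging, last_write_seen=4))   # False
```

In this model, a client whose session has already seen write 4 could not be served by the lagging replica, while a bounded-staleness read with K = 2 could.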

 

 

Cosmos DB resource hierarchy:

A Cosmos DB account can hold multiple databases, and a database can hold multiple containers.
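
The hierarchy can be pictured as nested objects. The sketch below uses plain Python dataclasses rather than the Azure SDK; the class names, the account name, and the database and container names are all hypothetical and exist only to show the account, database, container nesting.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    name: str
    partition_key_path: str  # e.g. "/customerId" (illustrative value)
    items: list = field(default_factory=list)


@dataclass
class Database:
    name: str
    containers: dict = field(default_factory=dict)


@dataclass
class Account:
    name: str
    databases: dict = field(default_factory=dict)


# One account, one database, two containers (all names are made up).
account = Account("demo-account")
shop = account.databases.setdefault("shop", Database("shop"))
shop.containers["orders"] = Container("orders", "/customerId")
shop.containers["products"] = Container("products", "/category")

for db in account.databases.values():
    for container in db.containers.values():
        print(f"{account.name} / {db.name} / {container.name}")
```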

 

 

Data is stored in containers. Each container has a partition key, which distributes the data across partitions. Choose the partition key carefully, because a poor choice increases RU consumption. A good starting point is the field used most often in the WHERE clause of your queries. Items that share a partition key value form a logical partition, and Cosmos DB maps logical partitions onto the physical partitions that actually store the data. If a container holds 10 distinct partition key values, 10 logical partitions are created. Each physical partition is replicated at least 4 times to increase availability and durability.
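
The relationship between partition key values, logical partitions, and physical partitions can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the "city" partition key, the count of 3 physical partitions, and the SHA-256-modulo placement are stand-ins chosen for the example, not the hashing scheme Cosmos DB actually uses.

```python
import hashlib
from collections import defaultdict


def physical_partition_for(partition_key_value: str, physical_count: int) -> int:
    """Map a partition key value to a physical partition.

    Assumption: SHA-256 modulo the partition count stands in for the
    service's internal hash-range placement.
    """
    digest = hashlib.sha256(partition_key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % physical_count


# 100 items spread over 10 distinct partition key values ("city" is hypothetical).
items = [{"id": str(i), "city": f"city-{i % 10}"} for i in range(100)]

# One logical partition per distinct partition key value.
logical = defaultdict(list)
for item in items:
    logical[item["city"]].append(item["id"])

# Each logical partition lives entirely on one physical partition.
placement = {value: physical_partition_for(value, 3) for value in logical}

print(len(logical))  # 10
for value in sorted(placement):
    print(value, "-> physical partition", placement[value])
```

Ten distinct values give ten logical partitions regardless of how many physical partitions back the container, which is why a key with many evenly used values spreads load better than a key with only a handful.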

Containers are schema-agnostic, which means items in a container can have different schemas as long as they share the same partition key property. All items are indexed automatically, and a custom indexing policy can also be defined.
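
As an illustration, the sketch below shows two items with different shapes that could sit in the same container, along with a custom indexing policy written as a Python dictionary. The item fields, the "/customerId" partition key path, and the excluded "/notes/*" path are hypothetical; the policy keys follow the JSON shape used for Cosmos DB indexing policies, but treat the exact values as an example rather than a recommendation.

```python
import json

# Two items with different schemas; both carry the (assumed) partition key "customerId".
items = [
    {"id": "1", "customerId": "c-42", "type": "order", "total": 19.99},
    {"id": "2", "customerId": "c-42", "type": "address", "city": "Pune", "zip": "411001"},
]


def has_partition_key(item: dict, partition_key_path: str) -> bool:
    """Check that an item has the top-level property named by a path like '/customerId'."""
    return partition_key_path.lstrip("/") in item


# Example custom indexing policy: index everything except a hypothetical "notes" subtree.
indexing_policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": "/notes/*"}],
}

print(all(has_partition_key(item, "/customerId") for item in items))  # True
print(json.dumps(indexing_policy, indent=2))
```

Excluding paths that are never queried is one common reason to customize the policy, since indexing work contributes to the RU cost of writes.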

 

 

Pricing: Azure Cosmos DB measures the cost of all database operations in Request Units (RUs), irrespective of the API. One request unit is the cost of a point read of a 1 KB item using its partition key and ID value.
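
A quick back-of-the-envelope calculation shows how RUs translate into capacity. In the sketch below, only the 1 RU point read comes from the definition above; the 5 RU write cost and the workload numbers are placeholder assumptions, and real costs should be taken from the request charge reported for your own operations.

```python
def point_reads_per_second(provisioned_ru_per_s: float, ru_per_read: float = 1.0) -> float:
    """How many point reads per second a given RU/s budget can sustain."""
    return provisioned_ru_per_s / ru_per_read


def ru_per_second_needed(ops_per_second: dict, ru_cost: dict) -> float:
    """Total RU/s needed for a workload mix of operations."""
    return sum(ops_per_second[op] * ru_cost[op] for op in ops_per_second)


# Assumed per-operation costs: 1 RU per 1 KB point read (from the definition),
# 5 RU per write (placeholder, measure this for your own items).
ru_cost = {"point_read": 1.0, "write": 5.0}
workload = {"point_read": 500, "write": 100}  # operations per second (hypothetical)

print(point_reads_per_second(400))              # 400.0
print(ru_per_second_needed(workload, ru_cost))  # 1000.0
```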

There are three modes for setting up Cosmos DB.

  • Provisioned throughput: A fixed number of RUs per second is assigned to Cosmos DB based on the requirement.
  • Serverless: No assignment is needed; billing is based on consumption. Serverless mode comes with some limitations: single region only, a maximum of 1 TB of storage, and throughput ranging between 5,000 and 20,000 RUs per second.
  • Autoscale: Throughput scales automatically based on consumption. Suitable for building scalable, highly available applications with unpredictable traffic, and it reduces the need to handle rate-limited operations.
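
To see why rate limiting matters under a fixed budget, here is a small simulation of one second of traffic. It is a deliberately simplified sketch: the function name, the request costs, and the budgets are made up, and the real service's accounting is more involved. The general idea it illustrates is that requests exceeding the available RU/s are rejected (Cosmos DB signals this with HTTP status 429) and must be retried.

```python
def simulate_second(request_costs, budget_ru):
    """Split one second of requests into accepted and rate-limited ones.

    request_costs: RU cost of each request, in arrival order.
    budget_ru: RUs available for this second.
    Returns (accepted_indexes, throttled_indexes).
    """
    accepted, throttled = [], []
    remaining = budget_ru
    for index, cost in enumerate(request_costs):
        if cost <= remaining:
            remaining -= cost
            accepted.append(index)
        else:
            throttled.append(index)  # would be answered with HTTP 429
    return accepted, throttled


costs = [1, 5, 5, 1, 1]  # hypothetical RU cost per request

# Fixed provisioned budget of 10 RU/s: the third request does not fit.
print(simulate_second(costs, budget_ru=10))  # ([0, 1, 3, 4], [2])

# A larger budget, as autoscale might allow during a spike: everything fits.
print(simulate_second(costs, budget_ru=20))  # ([0, 1, 2, 3, 4], [])
```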

 

Cosmos DB emulator: Cosmos DB also offers an emulator that can be installed on your local system. The emulator comes with limited features and can be used to develop and test applications locally without creating an actual cloud account. Fixed RUs, fixed consistency levels, and support for only the NoSQL API are a few of its limitations.
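
Pointing an application at the emulator is mostly a matter of configuration. The sketch below assumes the emulator is listening on its default local endpoint, https://localhost:8081, and that the azure-cosmos Python package is installed; the environment variable names COSMOS_ENDPOINT and COSMOS_KEY and the helper functions are hypothetical conventions chosen for this example.

```python
import os


def emulator_settings(env=None):
    """Build connection settings, defaulting to the local emulator endpoint.

    COSMOS_ENDPOINT and COSMOS_KEY are made-up variable names for this sketch.
    """
    env = os.environ if env is None else env
    return {
        "url": env.get("COSMOS_ENDPOINT", "https://localhost:8081"),
        "key": env.get("COSMOS_KEY", ""),  # supply the emulator's key here
    }


def make_client(settings):
    """Create a client; imported lazily so the settings helper works without the SDK."""
    from azure.cosmos import CosmosClient  # requires: pip install azure-cosmos
    return CosmosClient(settings["url"], credential=settings["key"])


settings = emulator_settings({})
print(settings["url"])  # https://localhost:8081
```

Switching the same code to a cloud account then only requires changing the two environment variables. Note that the emulator serves HTTPS with a self-signed certificate, so it may need to be trusted on the local machine before a client can connect.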

 
