Snowflake Database: An Effective Cloud-Native Data Warehousing Platform

Anurag Sharma
December 11, 2023

What is a Snowflake?

Snowflake is a cloud-based data warehouse that was created by three data warehousing experts in 2012 who formerly worked at Oracle Corporation. The Snowflake data warehouse is a cloud-based Analytical data warehouse that is offered as Software-as-a-service. Snowflake architecture is different from traditional data warehousing technologies like SQL Server, Teradata, Oracle, and cloud data warehouses like AWS Redshift and Google Big Query.

How is Snowflake Special?

Data warehouses typically use either Shared Disk or Shared Nothing architecture. Multiple nodes are used in Shared disk architectures to access data shared on a central storage system, while a portion of the data is stored in each node/cluster in a Shared Nothing architecture. Snowflake combines both architectures and creates a hybrid architecture.

Snowflake employs a centralized storage layer for data persistence that is available to all computing nodes. Snowflake also uses Massively Parallel Processing (MPP) clusters to process queries, with each node storing a fraction of the whole data locally.

Snowflake’s data architecture consists of three layers:

  1. Storage
  2. Compute/Query Processing
  3. Cloud Services

Each layer can scale independently and includes built-in redundancy.

  Fig: Snowflake architecture showing the different layers.

 

How does it work?

Storage Layer: Snowflake stores data in databases. A database is a logical group of objects consisting primarily of tables and views organized into schemas. Snowflake supports structured relational data in the form of tables using standard SQL data types. Additionally, Snowflake’s variant data type stores semi-structured non-relational data such as JSON, parquet, etc. ANSI standard SQL is used to perform data-related tasks for all datatypes.

Snowflake uses secure cloud storage to maintain data. Snowflake converts the stored data into a compressed format and encrypts it using AES 256 encryption.

Compute Layer: This is the layer where queries are executed using resources provisioned from a cloud provider. Unlike conventional data warehouses, Snowflake creates independent compute clusters called virtual warehouses that can access the same data storage layer without compromising performance.

To create a virtual warehouse, we can simply give it a name and specify a size. Snowflake automatically handles the provisioning and configuration of the underlying computational resources. There is no downtime when scaling up or down a virtual warehouse. Anytime a virtual warehouse is resized, the extra resources are available for use by any subsequent queries. Snowflake’s architecture also enables read/write concurrency without any resource contention. For instance, separate virtual warehouses can be used for loading and querying simultaneously. As all virtual warehouses access the same data storage layer, inserts and updates are immediately available to other warehouses.

Cloud Services Layer: This layer manages the entire system. It authenticates users, secures data, manages sessions, and performs query compilation and optimization. This layer also coordinates the data storage updates and access, to ensure all virtual warehouses can see the latest data instantaneously once a transaction is completed. A vital part of this layer is the metadata store as it enables many features like time travel, zero-copy cloning and data sharing. Snowflake maintains the services layer using resources distributed across multiple zones to ensure high availability.

Connecting to Snowflake is pretty easy using clients such as the JDBC or ODBC drivers. Snowflake also provides a web interface and a command-line client.

What do you need to manage Snowflake?

Most of the criteria that traditional data warehouses use to adjust performance are eliminated by Snowflake. You only need to virtual warehouses, databases, and tables, load data, and run queries. Snowflake handles everything else.

How much does Snowflake cost?

Pricing is based on usage. Just pay for the computational and storage resources that are used. Storage costs are determined by the amount of compressed data stored in database tables, and the data retained to support Snowflake’s data recovery features. Compute prices are based on the warehouse size and how long the warehouse runs.

The purpose of this post is to provide a broad overview of data analytics stacks. Please get in touch with us if our analytics service and skills-as-a-service pique your interest.

Images credit: sarasanalytics.com

Ready to get started?

From global engineering and IT departments to solo data analysts, DataTheta has solutions for every team.