Updated: Jun 8, 2021
Introducing Arctica.ai -- the clever snowy owl that keeps tabs on your data lake costs, highlights hidden patterns and provides you with insights into your Snowflake usage.
Arctica.ai is a usage and cost monitoring solution, developed by experienced data engineers from Vision.bi (a Snowflake ELITE partner). It was designed with Snowflake customers in mind, with the sole purpose of helping Snowflake users track and control their costs.
Snowflake as a centralized data platform
Snowflake is by far one of the most interesting data platforms out there. After 20 years of data engineering, with more than 200 data solutions under our belt -- built on almost every possible stack -- our last ~40 projects, built on the Snowflake Data Platform, were the shortest and the easiest, and proved to be the most stable and robust of all. It just works.
Snowflake is a shining example of a cloud data platform: a single solution that covers all your data needs. When relying solely on the AWS stack, your solution will probably be a mix of Athena, Redshift and EMR. Using Azure, one would typically combine Synapse, Kusto and HDInsight, while Google BigQuery still prompts you to choose between standard or legacy SQL.
Using Snowflake, all this is simplified. During the past eight years, Snowflake has created a mature data platform that significantly reduces the implementation complexity as well as maintenance efforts. This allows you to focus on the things that matter most -- your data and your business.
Undoubtedly, in the upcoming years we are about to witness other vendors shift from fully managed (or partially managed) solutions to SaaS data platforms, thus avoiding many of the intricacies of managing and maintaining large scale data environments.
Why have a centralized solution?
A centralized data platform simplifies your data solution in many ways. Without going into too much technical detail, some of the advantages are:
Data - easily integrate data in all forms, shapes and sizes in the same place using standard SQL. Combining raw events with structured and modeled data leads to better integrations. Data is more accessible and data quality improves.
Security and Compliance - a centralized solution makes it easier to manage users and roles. Using a single data source also makes it easier to monitor and audit data access and comply with GDPR and other regulations. Data exists in one place and can easily be deleted or encrypted when needed. With the recently announced Dynamic Data Masking and row-level security (RLS), this is now even easier and more secure.
Maintenance - SaaS platforms significantly lower maintenance efforts. Your site reliability engineers shouldn’t be spending their time maintaining data warehouse servers or Hadoop clusters. These efforts should be invested in attending to your business solutions and products.
Cost - a significant benefit of using a single data platform is that you get a centralized view of your costs. This lets you understand the costs at the most granular levels -- users, tables and even specific queries. Tagging cost parameters and translating them into “business language” can also provide you with the most accurate ROI for each of your business use cases (and understanding business ROI is the foundation of measuring a data platform's cost efficiency).
This is where Arctica comes in. Arctica is all about cost management and cost visibility. Instead of tables and databases we show business use cases, instead of users and roles we talk about business units and instead of CPUs and servers we tally up dollars.
Generally speaking, when activity is done in a single platform, you gain full control over its usage. However, the lowest granularity displayed by Snowflake is warehouse/hour. This is definitely good enough for high-level cost analysis but every so often you need to look deeper into what was actually running during that hour.
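To make the gap between warehouse/hour billing and per-query cost concrete, here is a minimal Python sketch. It is not Arctica's actual method: it simply assumes that the cost of one warehouse-hour is split across the queries that ran in that hour in proportion to their execution time. The function name, the input shape and the dollar figures are all hypothetical.

```python
def apportion_hourly_cost(hourly_cost, queries):
    """Split one warehouse-hour's cost across the queries that ran in it.

    hourly_cost: dollars billed for the warehouse during that hour.
    queries: list of (query_id, execution_seconds) pairs for that hour.
    Assumption: cost is shared in proportion to execution time.
    """
    total_seconds = sum(seconds for _, seconds in queries)
    if total_seconds == 0:
        return {}  # nothing ran, so there is nothing to attribute
    return {
        query_id: hourly_cost * seconds / total_seconds
        for query_id, seconds in queries
    }


# Hypothetical hour: $8 billed, three queries of different lengths.
costs = apportion_hourly_cost(8.0, [("q1", 30), ("q2", 90), ("q3", 120)])
for query_id, cost in costs.items():
    print(query_id, round(cost, 2))
```

Note that under this assumption idle warehouse time is spread over whatever queries did run; a different allocation rule may suit your account better.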
There are many visualization solutions on top of the Snowflake repository -- from large and popular vendors like Tableau and Qlik to niche players such as Looker and Sigma. They all provide nicely displayed reports over Snowflake usage data, but they only show what you already know (and can query yourself).
At Arctica we took a slightly different approach. We analyze your usage patterns, parse your queries (metadata) and generate new data to provide new insights. We allow users to write custom rules that translate technical entities into business language. Furthermore, you can define thresholds and alerts to make sure everything is running as expected.
In a typical usage scenario, most of your data lake costs are derived from compute usage. Here is how you can analyze it using Arctica:
Cost by Use Case
Effective cost monitoring requires the right definition of business use-cases. A business use-case can be a line of business, a business application, internal BI, etc.
Keep in mind that use-cases may be added, changed or even dropped over time. You may have a hard time understanding whether your costs are up due to newly added use-cases or because of a natural growth of an existing business line.
Using Snowflake, a somewhat naive approach would pair use-cases with warehouses for easy cost and usage monitoring. However, some use-cases are not as straightforward and may be defined as a combination of database objects (warehouses, roles, schemas etc.). Arctica calculates cost at the most granular level, which allows you to group costs by any combination of attributes.
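The idea of defining a use-case as a combination of attributes can be sketched as a small rule engine. This is an illustration only, not Arctica's rule syntax: the `Rule` class, the attribute names (`warehouse`, `role`, `schema`) and the sample records are assumptions made for the example. The first matching rule wins, and anything unmatched lands in an "unassigned" bucket.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    use_case: str  # business name to report under
    match: dict    # attribute name -> value the record must have


def classify(record, rules, default="unassigned"):
    """Return the use-case of the first rule whose attributes all match."""
    for rule in rules:
        if all(record.get(key) == value for key, value in rule.match.items()):
            return rule.use_case
    return default


def cost_by_use_case(records, rules):
    """Sum the 'cost' field of each record under its use-case."""
    totals = {}
    for record in records:
        use_case = classify(record, rules)
        totals[use_case] = totals.get(use_case, 0.0) + record["cost"]
    return totals


# Hypothetical rules and per-query cost records.
rules = [
    Rule("ingestion", {"warehouse": "ETL_WH"}),
    Rule("sales reporting", {"role": "ANALYST", "schema": "SALES"}),
    Rule("finance reporting", {"role": "ANALYST", "schema": "FINANCE"}),
]
records = [
    {"warehouse": "ETL_WH", "role": "LOADER", "schema": "RAW", "cost": 12.5},
    {"warehouse": "BI_WH", "role": "ANALYST", "schema": "SALES", "cost": 4.0},
    {"warehouse": "BI_WH", "role": "ANALYST", "schema": "FINANCE", "cost": 6.0},
    {"warehouse": "ADHOC_WH", "role": "DEV", "schema": "SANDBOX", "cost": 1.5},
]
totals = cost_by_use_case(records, rules)
print(totals)
```

Here two use-cases share the same warehouse (BI_WH) and are told apart by role and schema, which is exactly the case a warehouse-per-use-case mapping cannot handle.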
Users, Roles and business units
The next important dimension is users, roles and business units. Analyzing the cost by roles and groups allows you to split the costs between business units and manage your budget effectively.
This also proves valuable when implementing analytics platforms, as it helps you make sure users are actually using the platform and getting value from it.
Cost by table and business process
Parsing SQL queries allows us to understand the cost by table/entity. Costs are displayed at the table level and may also be grouped by business process. Which are our most expensive tables and processes? How often are they scheduled to run? Are they worth the cost? Can we change the loading frequency to reduce some costs? With Arctica, these questions are answered and decisions are simple.
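As a rough illustration of attributing query cost to tables, the Python sketch below pulls table names out of SQL text with a deliberately naive regular expression and splits each query's cost evenly between the tables it touches. This is an assumption-laden toy, not how Arctica parses queries: a real parser must handle CTEs, subqueries, quoted identifiers, MERGE ... USING and expressions such as EXTRACT(... FROM ...), all of which this pattern gets wrong. The even split is also just one possible allocation rule.

```python
import re
from collections import defaultdict

# Naive pattern: an identifier following FROM, JOIN, INTO or UPDATE.
TABLE_PATTERN = re.compile(
    r"\b(?:from|join|into|update)\s+([A-Za-z_][\w.]*)", re.IGNORECASE
)


def tables_in(sql):
    """Return the set of (upper-cased) table names the pattern finds."""
    return {name.upper() for name in TABLE_PATTERN.findall(sql)}


def cost_by_table(queries):
    """queries: list of (sql_text, cost). Cost is split evenly per table."""
    totals = defaultdict(float)
    for sql, cost in queries:
        tables = tables_in(sql)
        for table in tables:
            totals[table] += cost / len(tables)
    return dict(totals)


# Hypothetical queries with hypothetical costs in dollars.
table_costs = cost_by_table([
    ("SELECT * FROM sales.orders o JOIN sales.customers c ON o.cid = c.id", 4.0),
    ("INSERT INTO sales.orders SELECT * FROM raw.events", 6.0),
])
print(table_costs)
```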
Tag your own attributes
Reporting tools such as Tableau or Looker (or any other) typically connect using the same admin username. This prevents you from understanding who actually triggered the query and who is responsible for the database costs. Adding your own attributes to the query (aka “tagging”) allows you to attach the cost to the end user.
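One way to attach such attributes is Snowflake's QUERY_TAG session parameter, which accepts a free-form string of up to 2000 characters. The Python sketch below builds a JSON tag, renders the ALTER SESSION statement that a BI tool or connector would run before its queries, and parses the tag back when analyzing query history. The tag keys (`end_user`, `env`, `source`) and the helper names are hypothetical, and whether your reporting tool lets you run a pre-query statement is something to check per tool.

```python
import json

MAX_TAG_LENGTH = 2000  # documented upper limit for QUERY_TAG


def build_query_tag(end_user, env, source):
    """Serialize the custom attributes as a compact JSON string."""
    tag = json.dumps(
        {"end_user": end_user, "env": env, "source": source}, sort_keys=True
    )
    if len(tag) > MAX_TAG_LENGTH:
        raise ValueError("query tag exceeds the QUERY_TAG length limit")
    return tag


def tag_statement(tag):
    """Render the SQL that sets the tag for the current session."""
    # Escape backslashes first, then single quotes, for a SQL string literal.
    escaped = tag.replace("\\", "\\\\").replace("'", "''")
    return f"ALTER SESSION SET QUERY_TAG = '{escaped}'"


def parse_query_tag(tag):
    """Read a tag back from query history; untagged rows yield {}."""
    try:
        value = json.loads(tag)
    except (TypeError, ValueError):
        return {}
    return value if isinstance(value, dict) else {}


tag = build_query_tag("dana", "prod", "tableau")
print(tag_statement(tag))
print(parse_query_tag(tag)["end_user"])
```

Keeping the tag as JSON means new dimensions can be added later without changing how existing tags are parsed.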
Examples of custom dimensions
Tag by ENV - Dev vs Prod (Stage - staging CHP)
Tag by - System vs Users
Tag by Tenant
For multi-tenant solutions, Arctica provides a billing solution that enables you to understand your operational costs and the cost of each of your customers.
Just as you tag end users in BI tools, you may bind a tenant to specific queries and get the costs linked to that tenant. Obviously, some of your costs are cross-tenant (e.g. data processing), but these costs are usually consistent, so you can decide how to split them between tenants based on your business model.
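As an example of one such business model, the Python sketch below splits a shared (cross-tenant) cost between tenants in proportion to each tenant's directly attributed cost. The proportional rule, the function name and the tenant figures are all assumptions for illustration; an even split or a split by data volume would be equally valid choices.

```python
def allocate_tenant_costs(direct_costs, shared_cost):
    """Add each tenant's share of the shared cost to its direct cost.

    direct_costs: dict of tenant -> cost attributed via tagged queries.
    shared_cost: cross-tenant cost (e.g. data processing) to distribute.
    """
    if not direct_costs:
        return {}
    total_direct = sum(direct_costs.values())
    if total_direct == 0:
        # No direct usage to weight by, so fall back to an even split.
        even_share = shared_cost / len(direct_costs)
        return {tenant: even_share for tenant in direct_costs}
    return {
        tenant: cost + shared_cost * cost / total_direct
        for tenant, cost in direct_costs.items()
    }


# Hypothetical month: $100 of tagged tenant queries, $50 of shared processing.
bill = allocate_tenant_costs(
    {"acme": 60.0, "globex": 30.0, "initech": 10.0}, shared_cost=50.0
)
print(bill)
```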
Compare to benchmarks
While investigating the usage patterns of one of our customers, we noticed that 33% of the account costs were associated with DELETE operations. In a different case, we discovered that 90% of the cost came from data ingestion and transformations but only 10% from actually querying the data (getting value). Although in some cases these ratios are inevitable, in most cases they are a symptom of implementation issues.
KPI analysis at the table/process level may provide significant value. What is the right balance between data processing and data analysis? Benchmarks can help!
Get Alerts on Time
Our Digests module triggers events when consumption changes over time. No more surprises at the end of the month.
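A simple way to picture this kind of alert is a comparison of each day's cost against a trailing average. The Python sketch below is not the Digests module itself; the 7-day window, the 30% threshold and the sample figures are assumptions chosen for illustration.

```python
from statistics import mean


def consumption_alerts(daily_costs, window=7, threshold=0.3):
    """Flag days whose cost exceeds the trailing average by > threshold.

    daily_costs: list of (day, cost) pairs in chronological order.
    Returns a list of (day, cost, baseline) tuples for the flagged days.
    """
    alerts = []
    for i in range(window, len(daily_costs)):
        baseline = mean(cost for _, cost in daily_costs[i - window:i])
        day, cost = daily_costs[i]
        if baseline > 0 and (cost - baseline) / baseline > threshold:
            alerts.append((day, cost, baseline))
    return alerts


# Hypothetical history: a steady week, a small bump, then a spike.
history = [(f"d{n}", 100) for n in range(1, 8)] + [("d8", 105), ("d9", 160)]
alerts = consumption_alerts(history)
for day, cost, baseline in alerts:
    print(f"{day}: cost {cost} vs trailing average {baseline:.1f}")
```

A relative threshold keeps the rule meaningful for both small and large accounts, at the price of being noisy when the baseline is close to zero.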
Coming Next - Storage Analysis
Although storage is usually the last of your concerns, there are many insights you can learn from it. Which are your fastest-growing tables? How much are you spending on DR, Fail-safe, Time Travel and the many other cool Snowflake features that you just need to keep an eye on?
To visit Arctica’s demo environment please register at https://arctica.ai
Feel free to drop us a message for a deeper demo and see how you can start tracking your cost using Arctica. email@example.com