Lightning-fast Data Transfer Between Tellius and Snowflake, Powered by Apache Arrow

Introduction

Many of our largest customers run Tellius’s AI-powered insights, natural language search, and data prep on top of Snowflake’s cloud data warehouse every day. Among our 30+ out-of-the-box data connectors, which include other cloud data warehouses such as Redshift and BigQuery, the Snowflake connector is extremely popular. Why? In a previous post, we highlighted four key reasons Tellius excels on Snowflake, such as the ability to push Tellius search and visualization queries down to the underlying Snowflake database without moving or copying data. Today, we’d like to highlight a fifth reason to love Tellius on Snowflake: lightning-fast data transfer via Apache Arrow, a rising open-source technology that is downloaded over 20 million times per month. In this post, we’ll discuss our integration with Apache Arrow and how it makes data loads from Snowflake into Tellius up to 3x faster than in earlier versions. At the end, we hope you’ll take Tellius for a spin yourself to see how quickly you can start adding AI-powered analytics value to your business. Ready to get going?

 

[Image: 5 reasons to love Tellius. TL;DR: Tellius and Snowflake pair nicely.]

Tellius Snowflake Connector

In our most recent release, we made major improvements to how data is brought from Snowflake into Tellius, improving both the connectivity and the scalability of the connector.

The Tellius Snowflake Connector uses the Spark Snowflake Connector to load data from Snowflake into the Tellius In-Memory Compute Engine (ICE). ICE allows users to automate complex analytical processes and find the key drivers behind business data at scale. Using the Spark Snowflake Connector lets Tellius load data into ICE in a scalable manner, so customers can bring terabytes of data from Snowflake into Tellius.
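For readers who have not used the Spark Snowflake Connector, here is a rough sketch of what a read through it looks like from PySpark. This is a generic illustration, not Tellius's internal code. The data source name `net.snowflake.spark.snowflake` and the `sf*` and `dbtable` option keys come from the connector's public documentation; the helper names `snowflake_options` and `load_table`, and any account values you pass in, are hypothetical placeholders.

```python
# Sketch only: builds connector options and reads one table through any
# SparkSession-like object. Helper names are illustrative, not a Tellius API.
SNOWFLAKE_SOURCE = "net.snowflake.spark.snowflake"  # connector data source name


def snowflake_options(url, user, password, database, schema, warehouse):
    """Return the option map the Spark Snowflake Connector expects."""
    return {
        "sfURL": url,
        "sfUser": user,
        "sfPassword": password,
        "sfDatabase": database,
        "sfSchema": schema,
        "sfWarehouse": warehouse,
    }


def load_table(spark, options, table):
    """Read one Snowflake table as a Spark DataFrame."""
    return (
        spark.read.format(SNOWFLAKE_SOURCE)
        .options(**options)
        .option("dbtable", table)
        .load()
    )
```

With a real `SparkSession` that has the connector on its classpath, this would be called as `load_table(spark, snowflake_options(...), "SALES")`, where `SALES` stands in for your own table name.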

Previous Limitations of the Spark Snowflake Connector

Earlier versions of Tellius used version 2.5.x of the Spark Snowflake Connector, which had the following limitations:

  • For large datasets, Snowflake first dumped the data to a staging area in CSV or JSON format before it was loaded into Tellius. Writing to the staging area required write access to the underlying database, which customers were not comfortable granting. Even though Tellius does not write to the source system by default, we needed write access because of this connector behavior.
  • CSV and JSON are not ideal formats for loading data because they inflate the data well beyond the size of the original dataset, which increased load times in Tellius.
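To make the size inflation concrete, here is a small standard-library sketch that compares the same two columns stored as typed binary buffers, as CSV, and as JSON. The sample data is made up for illustration, and the `array` module is only a stand-in for a binary columnar layout; the actual ratio depends heavily on your data.

```python
import csv
import io
import json
from array import array

# Hypothetical sample: 10,000 rows with an integer id and a float amount.
ids = array("q", range(1_000_000, 1_010_000))         # 8-byte signed integers
amounts = array("d", (i / 7 for i in range(10_000)))  # 8-byte floats

# Binary columnar size: each column is one contiguous typed buffer.
binary_size = len(ids.tobytes()) + len(amounts.tobytes())

# CSV size: every value is rendered as text, plus delimiters and line endings.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["id", "amount"])
writer.writerows(zip(ids, amounts))
csv_size = len(buf.getvalue().encode("utf-8"))

# JSON size: text values again, with the field names repeated on every row.
rows = [{"id": i, "amount": a} for i, a in zip(ids, amounts)]
json_size = len(json.dumps(rows).encode("utf-8"))

print(f"binary columnar: {binary_size} bytes")
print(f"csv:             {csv_size} bytes")
print(f"json:            {json_size} bytes")
```

On this sample, the text formats come out larger than the binary buffers, and JSON is larger than CSV because it repeats the field names in every record.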

Apache Arrow

Apache Arrow is a cross-language, cross-platform, columnar in-memory data format that allows zero-copy reads for lightning-fast data access without serialization overhead. The format enables efficient sharing of data between large-scale systems and is organized for efficient analytic operations on modern hardware such as CPUs and GPUs. More information about Apache Arrow is available here:

https://arrow.apache.org/
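The two ideas in that description, columnar layout and zero-copy reads, can be sketched with nothing but Python's standard library. This is not the Arrow library itself (Arrow adds a standardized, cross-language layout on top of these ideas); the column names and values are invented for illustration.

```python
from array import array

# Row-oriented layout: one tuple per record.
rows = [(1, 19.99), (2, 5.00), (3, 42.50), (4, 7.25)]

# Column-oriented layout: one contiguous typed buffer per column.
order_id = array("q", (r[0] for r in rows))
amount = array("d", (r[1] for r in rows))

# A memoryview exposes the column's buffer without copying it.
view = memoryview(amount)
tail = view[2:]  # slicing a memoryview does not copy either

# A write through the original column is visible through the view,
# which shows that both refer to the same underlying memory.
amount[3] = 8.0
total_tail = sum(tail)  # 42.5 + 8.0
```

Because `tail` is a window onto the same buffer rather than a copy, a consumer can scan part of a column without any serialization or duplication, which is the property Arrow provides across processes and languages.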

Adoption of Apache Arrow in Snowflake

Starting with version 2.6.0 of the Spark connector, Snowflake adopted Apache Arrow as its standard format for transferring query results. This change makes it possible to read data from Snowflake without a staging area, and far more efficiently than with CSV or JSON.

https://www.snowflake.com/blog/snowflake-connector-for-spark-version-2-6-turbocharges-reads-with-apache-arrow/
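As a hedged sketch of how a client might reason about this change: the helper below checks whether a connector version is 2.6.0 or later, and sets the `use_copy_unload` read option. To the best of our knowledge, Snowflake's connector documentation describes `use_copy_unload` as the switch that reverts reads to the older staged COPY UNLOAD path (default false, meaning Arrow is used); please verify that against the connector version you run. The function names are hypothetical.

```python
def arrow_read_supported(connector_version):
    """Return True for Spark connector versions 2.6 and later."""
    parts = connector_version.split(".")
    major, minor = int(parts[0]), int(parts[1])
    return (major, minor) >= (2, 6)


def read_options(base_options, force_copy_unload=False):
    """Copy the base options and set the `use_copy_unload` flag explicitly.

    Assumption: "false" keeps the Arrow read path, while "true" falls back
    to the staged unload path used by older connector versions.
    """
    opts = dict(base_options)
    opts["use_copy_unload"] = "true" if force_copy_unload else "false"
    return opts
```

For example, `arrow_read_supported("2.5.9")` is false and `arrow_read_supported("2.6.0")` is true, so a loader could fall back to the staged path only when it has to.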

Bringing Apache Arrow Improvements to Tellius

Starting with Tellius 3.0, Tellius uses the Arrow format as the standard way of caching data in the ICE engine. In addition, the underlying Snowflake connector has been upgraded to the latest version to take advantage of these improvements.

This upgrade improves the Tellius user experience, as customers no longer need to grant write access to their underlying system. Finally, with Apache Arrow support in both Snowflake and the Tellius ICE engine, loads are up to 3x faster than in earlier versions.

Ready to take Tellius for a spin yourself? Try a 14-day free trial to experience the power of AI-powered analytics in your business.
