Lightning-fast Data Transfer Between Tellius and Snowflake, Powered by Apache Arrow

Introduction

Many of our largest customers run Tellius’s AI-powered insights, natural language search, and data prep on top of Snowflake’s cloud data warehouse every day. Among our 30+ out-of-the-box data connectors, which also cover other cloud data warehouses such as Redshift and BigQuery, Snowflake is extremely popular. Why? In a previous post we highlighted four key reasons Tellius excels on Snowflake, such as the ability to push Tellius search and visualization queries down to the underlying Snowflake database without moving or copying data. Today, we’d like to highlight a fifth reason to love Tellius on Snowflake: lightning-fast data transfer via Apache Arrow, a rising open-source technology that is downloaded over 20 million times per month. In this post, we’ll discuss our integration with Apache Arrow and how it makes data loads from Snowflake into Tellius up to 3x faster than in earlier versions. At the end, we hope you’ll take Tellius for a spin yourself to see how quickly you can start adding AI-powered analytics value to your business. Ready to get going?

 

5 Reasons to Love Tellius

TLDR: Tellius and Snowflake pair nicely 

Tellius Snowflake Connector

In our most recent release, we made major improvements to how data is brought from Snowflake into Tellius, improving both the connectivity and the scalability of the connector.

The Tellius Snowflake Connector uses the Spark Snowflake Connector to load data from Snowflake into the Tellius In-Memory Compute Engine (ICE). ICE allows users to automate complex analytical processes and find the key drivers behind business data at scale. Using the Spark Snowflake Connector lets Tellius load data into ICE in a scalable manner, so customers can bring terabytes of data from Snowflake into Tellius.

Previous Limitations of Snowflake Spark Connector

Earlier versions of Tellius used version 2.5.x of the Snowflake Spark connector, which had the following limitations:

  • For large datasets, Snowflake first dumped the data to a staging area in CSV or JSON format before it was loaded into Tellius. Writing to the staging area required write access to the underlying database, which customers were not comfortable granting. Even though Tellius does not write to the source system by default, we needed write access because of this connector behavior.
  • CSV and JSON are not ideal formats for loading data, because they inflate the data well beyond the size of the original dataset. This increased load times in Tellius.
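To make the second limitation concrete, here is a minimal sketch that encodes the same rows as CSV, as JSON, and as packed binary columns, then compares the sizes. The sample data and column names are hypothetical, and the packed-column layout only stands in for a binary columnar format; it is not Arrow itself. The exact ratio depends heavily on the data (small integers can be shorter as text), so treat this as an illustration rather than a benchmark.

```python
import csv
import io
import json
import struct

# Hypothetical sample: 1,000 rows of (order_id, amount). Not data from Tellius.
rows = [(1_000_000_000 + i, 100000.25 + i * 0.5) for i in range(1000)]

# Row-oriented text encoding, similar in spirit to a CSV unload.
csv_buf = io.StringIO()
csv.writer(csv_buf).writerows(rows)
csv_size = len(csv_buf.getvalue().encode("utf-8"))

# JSON repeats every field name in every record.
json_text = json.dumps([{"order_id": o, "amount": a} for o, a in rows])
json_size = len(json_text.encode("utf-8"))

# Column-oriented binary encoding: one packed buffer per column,
# 8 bytes per 64-bit integer and 8 bytes per 64-bit float.
id_column = struct.pack(f"<{len(rows)}q", *(o for o, _ in rows))
amount_column = struct.pack(f"<{len(rows)}d", *(a for _, a in rows))
binary_size = len(id_column) + len(amount_column)

print(f"CSV: {csv_size} bytes, JSON: {json_size} bytes, binary: {binary_size} bytes")
```

On top of the larger size, text formats have to be parsed back into typed values on the receiving side, which adds to the load time.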

Apache Arrow

Apache Arrow is a cross-language, cross-platform, columnar in-memory data format that allows zero-copy reads for lightning-fast data access without serialization overhead. The format allows efficient sharing of data between large-scale systems and is organized for efficient analytic operations on modern hardware such as CPUs and GPUs. More information about Apache Arrow is available at:

https://arrow.apache.org/
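The two ideas in that description, columnar layout and zero-copy reads, can be illustrated with nothing but the Python standard library. The sketch below is not the Arrow API; it uses `array` and `memoryview` as stand-ins, and the column names and sizes are hypothetical. Each column lives in one contiguous typed buffer, and a reader gets a window onto that buffer instead of a copy.

```python
from array import array

# Hypothetical columns: each one is a single contiguous, typed buffer.
order_ids = array("q", range(100_000))                   # 64-bit signed ints
amounts = array("d", (i * 0.5 for i in range(100_000)))  # 64-bit floats

# A memoryview exposes the column's buffer without copying it.
view = memoryview(amounts)

# Slicing a memoryview yields another view onto the same memory.
window = view[1000:2000]

# A write to the source column is visible through the window,
# which shows that both refer to the same bytes.
amounts[1000] = -1.0
assert window[0] == -1.0

# Analytic work can run directly over the shared buffer.
total = sum(window)
print(len(window), window.nbytes, total)
```

Because the consumer reads the producer's buffer in place, there is no serialization step and no parsing step, which is the property that the Arrow format provides across languages and systems.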

Adoption of Apache Arrow in Snowflake

Starting with Spark connector version 2.6.0, Snowflake adopted Apache Arrow as its standard communication format. This change makes it possible to read data from Snowflake without a staging area, and far more efficiently than with CSV or JSON.

https://www.snowflake.com/blog/snowflake-connector-for-spark-version-2-6-turbocharges-reads-with-apache-arrow/
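For readers who use the Spark connector directly, a read might look roughly like the sketch below. This is not how Tellius wires up the connector internally; the helper names `snowflake_options` and `read_table` are hypothetical, and all connection values are placeholders. The option keys follow the Snowflake Spark connector documentation as we understand it, including `use_copy_unload`, which is assumed here to select between the Arrow read path and the older staged unload. Check the connector documentation for your version before relying on it.

```python
# Data source name used by the Snowflake Spark connector.
SNOWFLAKE_SOURCE = "net.snowflake.spark.snowflake"


def snowflake_options(url, user, password, database, schema, warehouse):
    """Build connector options (hypothetical helper; all values are placeholders)."""
    return {
        "sfURL": url,
        "sfUser": user,
        "sfPassword": password,
        "sfDatabase": database,
        "sfSchema": schema,
        "sfWarehouse": warehouse,
        # Assumption: "false" keeps the Arrow-based read path, while "true"
        # falls back to the older COPY UNLOAD path through a staging area.
        "use_copy_unload": "false",
    }


def read_table(spark, options, table):
    """Read a Snowflake table into a Spark DataFrame using an existing session."""
    return (
        spark.read.format(SNOWFLAKE_SOURCE)
        .options(**options)
        .option("dbtable", table)
        .load()
    )
```

With a running SparkSession and the connector on the classpath, a call such as `read_table(spark, snowflake_options(...), "ORDERS")` would return a DataFrame; the table name here is only an example.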

Bringing Apache Arrow Improvements to Tellius

Starting with Tellius 3.0, Tellius uses the Arrow format as the standard for caching in the ICE engine. In addition, the underlying Snowflake connector has been upgraded to the latest version to take advantage of all the new improvements.

This upgrade has improved the Tellius user experience, as customers no longer need to grant write access to their underlying system. Finally, with Apache Arrow support in both Snowflake and the Tellius ICE engine, loads are up to 3x faster than in earlier versions.

Ready to take Tellius for a spin yourself? Try a 14-day free trial to experience the power of AI-powered analytics in your business.
