“BI is changing” is yesterday’s news. “How is BI changing?” is the pressing question.
Let’s assume you agree with the following: the Business Intelligence (BI) experience can, and should, be better. The needs of data analysts are changing, as are the digital technologies businesses are adopting, and the velocity of that change, along with the pressure it puts on organizations striving to become “data driven,” continues to mount. Legacy operational data systems are not being retired; rather, new ones are being added to support digital business models. Technologies and vendor options in the BI and AI space are growing more varied and nuanced.
With more decisions to be made in less time, analysts question where they should spend their time:
- Learn new tools to fill current tool gaps?
- Upskill data literacy as the prevalence of AI rises?
- Focus on business problems?
Analytics and AI initiatives are a team sport in any organization. Team organization, technologies used, and data flows determine how well companies achieve their goals. The more fragmented the skills, data, and tech landscape, the more time, money, and resources the “initiatives” take. The roles with the most hard technical skills (around data specifically) sit in data engineering and data science teams. The largest populations of knowledge workers in companies, the data analysts and business users, have so far struggled with a highly fragmented analysis experience and have had a hard time collaborating within their own teams and with other stakeholders in the analytics and AI ecosystem.
In This Post
- Technology that Augments the Experience is Emerging
- The Paradigm Shift Occurring in the Data Analytics Stack
- Current Analytics Stack challenges:
- Goals of the Modern Data Analytics Stack:
- Tellius: Building the Insights Layer on the Modern Data Analytics Stack
- The Data Stack
- Data Sources
- Data Stitching: Pipelines and Data Mesh
- Data Lakes or Lakehouses (Combining Storage and Compute)
- The Compute Stack
- Query workloads consolidate
- Live connection to Data Lakes:
- The AI Compute Components
- The Applications
- Components for Application Layer
- Bringing it all together
- Your next technology choice should complement your current investments
Technology that Augments the Experience is Emerging
Augmented Analytics — the new BI that blends traditional BI with AI and automation — aims to use those technologies to close the technical skill gaps in analyst populations and help analysts spend the bulk of their time solving problems based on intelligence from the data. By all means, analysts need to up their game in terms of data literacy, which is a real organizational opportunity at most companies. Today, however, we will address what organizations can do from a technology perspective to augment their analysts and offer scalable, impactful decision intelligence to their users.
Technologies that use AI to lower analysts’ barriers to entry in applying AI to decisions are emerging as the new wave of innovation. Analysts are empowered by the following:
- Natural Language Processing to make data exploration easier through a Google-like Search interaction
- AI and Automation for producing Intelligence and Insights narratives that would otherwise take massive amounts of manual work
- Collaboration features for working efficiently with other stakeholders in the process, whether data producers (data engineers and data scientists) or intelligence consumers (business users and leaders)
In essence, analysts can control much more of the data value chain and increase their impact on their organizations. For a deeper dive into what the Augmented Analytics experience can be, check out a previous post on Making BI more Human with AI.
The Paradigm Shift Occurring in the Data Analytics Stack
So far I’ve outlined how the data consumption/BI experience is evolving, but some of the most important innovation is happening beneath the surface: the way the technology stack is coming together to make that experience possible. It is driven by efficiencies in data storage, movement, and computation, and by the needs of a larger and more heterogeneous set of analytic problems in the enterprise. The design of these systems is starting to consider multiple elements and layers. The remainder of this write-up will focus on the technical challenges and opportunities of the Data Analytics Stack.
Current Analytics Stack challenges:
The way the current analytics stack is set up makes it nearly impossible to pivot toward the experience described above. Its shortcomings stem from the fact that it was built reactively, over time, in a bottom-up fashion, without an end goal of providing more intelligence at the decision points of the organization. The following are the challenges this approach introduces, and also the opportunity areas that should be addressed.
- Data Process Fragmentation: While data fragmentation in itself is not a major challenge, and is the natural state for most companies, the process silos around data raise barriers to updating and syncing data, ultimately hindering intelligence.
- Computational paradigms of Data Querying vs. Machine Learning are used separately within different parts of the stack and create a need for large integration efforts between reactive reporting and proactive predictive intelligence.
- Tool Fragmentation: Tools built for bespoke individual stakeholder functions are not optimized for collaboration.
- Missing Insights Layer: As an augmentation to the BI experience, the Insights components provide a self-service way for analysts to derive their own AI-powered insights from data, beyond reports.
- Cost and Time-intensive processes for data movement, replication, compute, and integration.
Viewed as a logical layout, the challenges listed above become bottlenecks affecting the entire data-to-insights flow. The status quo worked for some time, but now that companies must scale, the drag created by these data flows is hampering their ability to do so quickly. Outdated data warehouses cannot react to new data modeling needs; Extract>Transform>Load (ETL) processes are still used in most areas; and blended, ungoverned mixes of low-code (UI) and heavy-code (SQL and Python) data handling processes are slowing the path to actionable insights.
In order to provide the Augmented Analytics experience, technology vendors and IT organizations are starting to rethink 1) the way they plan the data flows, 2) the computational engines needed for Augmented Analytics as well as 3) the applications and UX available to end users.
The end goal of investment in the Modern Data Analytics Stack needs to be rooted in business outcomes: use data for better decisions across the organization. The true payoff of the data analytics stack comes in the user experience, where insights and decisions happen: the business application layers.
Goals of the Modern Data Analytics Stack:
- Minimize data handoffs and replication between platforms
- Accommodate multiple scalable computational paradigms (Mixed-Workloads or Multimodal) to meet the needs of querying and machine learning as part of the same data-to-intelligence workflow
- Focus on Applications Integration: Applications that augment non-coders through automation of insights generation and increase collaboration with coders.
- Cost efficiencies: Simplify and organize the technology stack and data movement to optimize resource consumption of the system, without sacrificing user experience.
Tellius: Building the Insights Layer on the Modern Data Analytics Stack
We, the Product Team at Tellius, have learned several lessons while building the missing insights layer of the Modern Data Analytics Stack. Our goal has been to design the Tellius application to deliver the Augmented Analytics experience, so our internal architecture, and the way its components play in the larger ecosystem, is critical. We will touch on how we enable multiple compute workload profiles to support the application experience. We will present this in the context of the entire Modern Data Analytics Stack and how we fit in, detailing each component of the diagram below.
The Data Stack
There is a lot of innovation happening in the data layer of the Modern Data Analytics Stack. Updates are being made across the board to data capture, storage, movement, governance, traceability, and replicability. Digital natives are able to build their supporting data stack in full alignment with their business models. Legacy operational companies are adding digital operational data systems to support their digital transformations but struggle to retire the systems that support the traditional business.
In the modern digital environment, transactional systems are adding data at unprecedented speed and volume. For Augmented Analytics, digital natives want easy data onboarding and instant value, which can be achieved with cloud-native modern data stacks. Legacy operational companies need the flexibility to draw data and drive insights from both legacy on-premises sources and cloud data stores. As a vendor in this space, Tellius strives to support both journeys.
Data Stitching: Pipelines and Data Mesh
The complexity and variety of data pipelines stemming from digital and legacy sources are driving a paradigm shift from ETL (Extract, Transform, Load) to ELT (Extract, Load, Transform), whereby data is brought together first and then transformed for query and machine learning uses. The emergence of tools that automate and scale the management of the data pipelines themselves (Stitch, Fivetran, Matillion, etc.) is making life easier for teams trying to make data available for intelligence and decisioning across all systems via data lakes.
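The ELT pattern can be sketched in a few lines. This is a minimal illustration, assuming sqlite3 as a stand-in for the lake’s SQL engine so the sketch stays self-contained; the table and column names are hypothetical.

```python
import sqlite3

# Stand-in for a cloud data lake's SQL engine (sqlite3 keeps the sketch
# runnable); table and column names are illustrative only.
lake = sqlite3.connect(":memory:")

# Extract + Load: land the raw records first, untransformed.
lake.execute(
    "CREATE TABLE raw_orders (order_id INTEGER, amount_cents INTEGER, region TEXT)"
)
lake.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, 1250, "east"), (2, 900, "west"), (3, 2100, "east")],
)

# Transform: reshape the data where it already lives, rather than in a
# separate ETL tool before loading.
lake.execute("""
    CREATE TABLE orders_by_region AS
    SELECT region, SUM(amount_cents) / 100.0 AS revenue
    FROM raw_orders
    GROUP BY region
""")

print(dict(lake.execute("SELECT region, revenue FROM orders_by_region")))
```

The point of the ordering is that the raw, untransformed records remain available in the lake for later, different transformations (for example, feature engineering for machine learning) without re-extracting from the source.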
Data Mesh is another emerging concept; it focuses on decentralized, domain-driven curation of data as opposed to centralized IT ownership. Eventually these domain-based data models may still end up in data lakes, lakehouses, and warehouses. As Data Mesh is more of an organizational practice with looser guidelines, we will not focus on it too deeply, but it is worth mentioning because it is claiming an increasing footprint in the data strategy of companies.
Data Lakes or Lakehouses (Combining Storage and Compute)
Cloud data store technologies like Snowflake or Delta Lake from Databricks are highly efficient at storing and processing data at scale. While the terms data lake and data warehouse are nearing end-of-life in the buzzword cycle, data lakehouses are taking their place, augmented with all the enterprise bells and whistles: scalability, replication, security, auditing, and so on. The cloud-native architectures of data lakes and lakehouses decouple storage from compute, offering efficient ways to scale and operate each according to its usage patterns. Thanks to this approach, data lakes are becoming ideal environments for ELT, where most business and data science transformations can be housed under one roof.
Data lakes are great for housing the granular transactional data that is increasingly valuable to run predictive analytics against. They keep the data value chain intact and enable ELTI flows (Extract>Load>Transform>Insight) that unlock valuable insights in an automated fashion, identifying trends and drivers at scale within an enterprise. As such, these processes are best run where the data resides, with the output presented to the application layers through APIs.
As data transitions from the data engineering space into the business and data science space, the types of transformations and computation profiles change. We will next dive into the exciting area of the Compute Stack.
The Compute Stack
This layer is full of innovation because it is the connection between the data and application layers. It is also a golden opportunity for Augmented Analytics vendors to extend into the lower part of the stack and integrate a compute layer with the data stack.
Previously, the computation needed to support the variety of workload profiles in the data value chain was fragmented, so data querying and data science/ML jobs ran on different computational platforms. The new Augmented Analytics experience combines natural language search, SQL, visualization, and automation of insights and intelligence narratives. This requires a new back-end compute architecture in which workloads supported by multiple compute engines intertwine and work together.
Query workloads consolidate
The previous fragmented data process workflow looked like this:
- Data engineers used their preferred SQL tools for coding data transformations.
- Business analysts would pull the data into their preferred Visualization/BI tools to apply further business transformations and logic to the data for ad-hoc analysis.
- Finally, data scientists would use their preferred tools for feature engineering and machine learning modeling.
This arrangement, while organizationally simple at first, ultimately constrains the flow of data to intelligence, silos skill sets, increases integration time between steps, and prevents a cohesive and collaborative experience.
Tellius unifies this workflow by bringing data transformation functionality closer to the business, with capabilities for coders and business analysts alike to share the effort. Tellius supports and optimizes one unified Compute Stack: SQL-based pre-processing for data transformations, Python-based feature engineering, and the fast querying needed for interactive visualizations all happen under one roof. The compute needed for processing adjusts seamlessly to the operation at hand.
To augment business data modeling, Tellius provides a point-and-click experience and also partners with Looker, a facility geared specifically to business analysts who want to model their business data and KPI calculations using LookML. Once modeled, Looker automates the transformations needed to instantiate these KPI definitions through the Looker framework.
An Augmented Analytics platform like Tellius complements the Looker processing by adding the automated insight generation that targets the KPIs produced. The contributing factors to the KPI trends are presented in data narratives and best-fit visualizations to the users.
Live connection to Data Lakes:
Traditionally, BI tools have developed sophisticated, proprietary SQL engines optimized to quickly serve visualizations back to the application. This, however, leads to data duplicated from the centralized store, potential inconsistency, and problems scaling. As mentioned earlier in the Data Stack section, data lakes have become more performant, scalable, and cost efficient for SQL operations. The additional step of loading data into the BI tool is no longer appropriate for all use cases. In certain scenarios (especially when the datasets in data lakes are updated very frequently), the most cost- and resource-efficient path is for queries from the search layer of the BI application to hit the SQL layer of the data lake directly as a passthrough (no data movement, fewer headaches).
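A minimal sketch of the passthrough idea, assuming sqlite3 as a stand-in for the lake’s SQL engine (in production this would be a Snowflake, Databricks SQL, or Redshift connection); the `passthrough_query` helper and all identifiers are hypothetical:

```python
import sqlite3

# Stand-in "live connection" to a data lake; sqlite3 keeps the sketch runnable.
lake = sqlite3.connect(":memory:")
lake.execute("CREATE TABLE sales (region TEXT, amount REAL)")
lake.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 120.0), ("west", 80.0), ("east", 40.0)])

def passthrough_query(conn, measure, dimension, table):
    """Push the aggregation down to the lake's SQL engine so only the
    aggregate result crosses the wire; no rows are copied into the BI tool.
    Identifiers are illustrative and assumed pre-validated against the
    dataset's metadata."""
    sql = (f"SELECT {dimension}, SUM({measure}) AS total "
           f"FROM {table} GROUP BY {dimension} ORDER BY {dimension}")
    return conn.execute(sql).fetchall()

print(passthrough_query(lake, "amount", "region", "sales"))
# → [('east', 160.0), ('west', 80.0)]
```

Because only the aggregate leaves the lake, the freshest data is always queried and there is no extract to keep in sync.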
Live connections from the Tellius application to data lakes/lakehouses (such as Snowflake, Delta Lake from Databricks, and Amazon Redshift) can support exploratory work on terabytes of data and hundreds of concurrent users. The same paradigm applies to queries that support the dynamic Visualization Engine, making the live connection to data lakes highly versatile for search, interactive visualization, and insights such as trend analysis.
The AI Compute Components
AI is used pervasively throughout the Augmented Analytics experience. First, part of the experience is about lowering the barriers to entry in data exploration. As search becomes the preferred way to explore data at scale, Natural Language Processing (NLP) and AI on top of a search engine are used to index the dataset’s metadata and build the learnings needed to translate natural language questions into SQL queries. Once that learning is complete, queries from the application can be executed against a SQL engine.
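As a toy illustration of that translation step (not Tellius’s actual engine), the sketch below matches question tokens against hypothetical indexed metadata and assembles a SQL query; a real engine would use NLP models for intent and entity extraction, synonyms, and fuzzy matching rather than exact token lookup:

```python
# Hypothetical indexed metadata for a "sales" dataset; all names are
# illustrative, not a real Tellius schema.
METADATA = {
    "measures": {"revenue": "SUM(revenue)"},
    "dimensions": {"region", "product"},
    "table": "sales",
}

def question_to_sql(question: str) -> str:
    """Toy NL-to-SQL: find a known measure (intent) and any known
    dimensions (entities) among the question's tokens, then build SQL."""
    tokens = set(question.lower().split())
    measure = next(m for m in METADATA["measures"] if m in tokens)
    dims = sorted(d for d in METADATA["dimensions"] if d in tokens)
    select = METADATA["measures"][measure]
    if not dims:
        return f"SELECT {select} FROM {METADATA['table']}"
    cols = ", ".join(dims)
    return f"SELECT {cols}, {select} FROM {METADATA['table']} GROUP BY {cols}"

print(question_to_sql("show revenue by region"))
# → SELECT region, SUM(revenue) FROM sales GROUP BY region
```

The key idea is that the indexed metadata, not the raw data, is what the language layer learns over, so the resulting query can be pushed down to any SQL engine.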
If insights and intelligence need to be produced and embedded in the experience, even as a short batch window that notifies the user when analysis is ready, then the compute supporting algorithmic training and the generation of intelligence narratives needs to sit close to, and integrate with, the data lake’s compute layer (the SQL component). While many technology vendors use linear hand-offs (i.e., not scalable) between these compute paradigms (query and ML compute engines), the Augmented Analytics experience requires running multiple workload profiles in parallel. Users should be able to query data in real time while triggering AI-based jobs that run in parallel and notify them when results are ready for viewing.
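The parallel-workload idea can be sketched with Python’s standard concurrency tools. Both engines below are simulated stubs, not real Tellius components:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def run_query(sql):
    # Stand-in for the interactive SQL query engine: returns immediately.
    return f"rows for: {sql}"

def generate_insights(dataset):
    # Stand-in for the batch ML/insights engine: takes longer to finish.
    time.sleep(0.1)  # pretend training and narrative generation take a while
    return f"insights ready for {dataset}"

results = []
with ThreadPoolExecutor() as pool:
    # Kick off the insights job in the background...
    job = pool.submit(generate_insights, "sales")
    # ...while the user keeps querying interactively in real time.
    results.append(run_query("SELECT region, SUM(amount) FROM sales"))
    # "Notify" when the background job completes.
    job.add_done_callback(lambda fut: results.append(fut.result()))

print(results)
```

The interactive query returns immediately while the longer job completes on its own schedule, which is the essential difference from a linear hand-off between engines.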
Tellius succeeds in integrating multimodal workloads into a cohesive experience that combines fast SQL querying with batch ML training for Insights. The usage patterns in the application determine the workload type needed (whether a query for search and visualization or a model training and rendering), and the back end makes automated decisions around sampling, feature engineering, training, and output presentation to users. We will next spend time in the Applications, which make up the Augmented Analytics experience.
The Applications
The Application Layer benefits from the changes in the stacks below it. The new Augmented Analytics experience is built within the Applications layer as a fluid experience. This allows analysts to move fluidly and iteratively between search (for visualization and slicing/dicing), insights creation, and all the way into model training using automated tournaments. Collaboration happens in the Applications layer. While data analysts are at the center of the feature focus, data engineers and data scientists are now able to operate within the same platform as key inputs and support, so intelligence can make its way to decision makers. The Application Layer has the following components to support this collaboration further:
Components for Application Layer
The pillars of the analyst-to-data interaction are addressed through search as the entry point, visualization in support of the search function, automated generation of Insights narratives, and deeper dives into prediction when needed. Tellius supports these flows through the applications below:
Search Engine: As mentioned earlier, this engine allows the platform to abstract SQL queries via NLP search. The search engine interprets the intent and entities of the question, then translates and pushes them down as a query understandable by the data store below.
BI Visualization Engine: As a traditional component of the BI experience, the Visualization Engine serves two purposes. It augments the results from the Search engine using smart autocharting/ best-fit visualizations and further refinement in a point-and-click edit mode. The same charting objects in the Visualization engine can be used to build operational dashboards and further dynamic explorations used for presenting the information to business stakeholders.
AI-Powered Insights Engine: As part of the Augmented Analytics experience, users can prompt the Insights Engine to perform complex trend analysis and segmentation on multi-TB transactional datasets. The insights production is automated and presented back to the user as narratives and graphs for further exploration.
AutoML Engine: Analysts can use this component for either AutoML tournaments or bespoke ML training (guided point-and-click) based on methodologies such as classification, clustering, forecasting, or recommendations. Data scientists can also participate in all the functions of the platform while bringing their own feature engineering code and trained models (BYO code) to the platform. Model integration is increasingly easy to achieve, enabled through well-documented APIs.
Bringing it all together
At Tellius, we use the Compute Stack as the integration point between the Modern Data Analytics Stack and the Applications. By supporting business and data science compute needs under one roof, while leveraging a firm’s investments in the modern data stack, Tellius brings to bear the power of the new Augmented Analytics experience: easier access, shorter time to insights, and ultimately the ability to make better data-driven decisions that move the needle on business initiatives, for business users and advanced users alike. This is why our customers frequently bring us into conversations about the technology choices they face as they modernize their data stacks.
Your next technology choice should complement your current investments
With all the technology options available, vendors and IT shops in organizations have a tremendous opportunity to build highly impactful tools for their users to produce insights out of data.
The investment in the Modern Data Analytics Stack is a journey that does not stop midway through the data-to-insights value chain. If you have built a Modern Data Analytics Stack to support the new digital use cases for data capture, movement, and organization, the logical next investment is the insights and intelligence layer on top, which unlocks self-service analytics, democratization of insights, and collaboration between data value chain stakeholders. With that in mind, we at Tellius have designed our application for speed to insights. We have all the elements for fast onboarding, integration, and time-to-value as the preferred Insights & AI-powered Intelligence Layer of the Modern Data Analytics Stack.
For a quick overview of Tellius, check out this 4-minute demo or start a free 14-day trial (no credit card necessary) to take Tellius for a spin yourself.
As Donald Farmer stated in a recent webinar titled “Break Through the Boundaries of BI,” “Data without analysis is a wasted resource.” And what better way to help scale analysis than lowering the skill barriers to producing insights from data by using AI and automation? The BI experience can, and should, be better.