Have you ever used analytical tools to make a decision, only to find that their output doesn’t quite fit the decision you need to make? You want to target a very specific set of users, but you find yourself hamstrung by the only dashboard that is available to you, or you can’t even find the data you need in the first place.
How can you get the right set of information at the right time?
Courtesy of dilbert.com
Tellius is a decision intelligence platform empowering business users and analysts with a faster way to answer “what” happened and “why” business metrics are changing. Our Guided Insights engine uses applied machine learning and statistics to automatically present insights: the right piece of information at the right time.
These automated insights help business users to make decisions based upon data, without having to be an expert data scientist or statistician.
Many organizations want to know what exactly powers the insights that we produce. Below, we lay out the exact techniques that we use to create our insights. Why would we do this? We believe in transparency. Throughout Tellius, we expose as much information as we can about the models that we are running. This explainable AI is key to regulated industries, and helps technical teams within organizations understand and leverage Tellius along with business users.
Read on to understand the nitty-gritty details behind our four key insights:
1. Trend Drivers (Why did a metric change?)
2. Cohort Comparisons (What drives a metric for two different classifications?)
3. Segment Drivers (What combinations of our data drive a particular behavior?)
4. Discoveries (Find outliers and correlations)
Trend drivers are used to identify what is causing a change over time in a particular metric — in an automated fashion, with no manual work from the user. Let’s consider a CPG dataset with sales, state, region, demographic information, channel and more across two quarters. The table structure looks like this:
A small piece of our CPG dataset
Aggregating total revenue across quarters is simple. However, how would you quickly pinpoint what drove the change here? The answer is typically dashboards + filters — a repetitive process that takes a lot of manual effort. We don’t think that’s scalable. Now let’s take a look at what a Trend Driver looks like:
Trend Driver Summary
Trend Driver Detail
We create trend drivers by performing aggregated delta calculations at scale. For every measure in your data, we focus on each dimension, aggregate the measure by every field in that dimension, and calculate the change across the time periods in question.
We then adjust for statistical significance, rank the deltas and present them to the user.
Ranked Trend Drivers
Think about what this means — we are calculating every change in your data across all aggregations in every field. Not only does this remove a significant amount of manual effort, but we leverage Apache Spark to apply this technique to big data in just seconds.
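The delta calculation described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not Tellius's implementation: the rows, the field names (`quarter`, `channel`, `region`, `revenue`) and the function `trend_drivers` are hypothetical, the statistical significance adjustment is left out, and everything runs in memory rather than on Apache Spark.

```python
from collections import defaultdict

# Hypothetical rows: one record per sale, tagged with the quarter it falls in.
rows = [
    {"quarter": "Q1", "channel": "Direct", "region": "West", "revenue": 100.0},
    {"quarter": "Q1", "channel": "Social", "region": "East", "revenue": 80.0},
    {"quarter": "Q1", "channel": "Direct", "region": "East", "revenue": 50.0},
    {"quarter": "Q2", "channel": "Direct", "region": "West", "revenue": 160.0},
    {"quarter": "Q2", "channel": "Social", "region": "East", "revenue": 70.0},
    {"quarter": "Q2", "channel": "Direct", "region": "East", "revenue": 55.0},
]


def trend_drivers(rows, measure, dimensions, period_field, before, after):
    """Return (dimension, value, delta) tuples ranked by absolute change."""
    totals = defaultdict(float)  # (dimension, value, period) -> summed measure
    for row in rows:
        for dim in dimensions:
            totals[(dim, row[dim], row[period_field])] += row[measure]

    # Every (dimension, value) pair seen in either period.
    members = {(dim, value) for dim, value, _ in totals}
    deltas = [
        (dim, value, totals[(dim, value, after)] - totals[(dim, value, before)])
        for dim, value in members
    ]
    # Largest movers first; ties are broken by name so the order is stable.
    return sorted(deltas, key=lambda d: (-abs(d[2]), d[0], d[1]))


ranked = trend_drivers(rows, "revenue", ["channel", "region"], "quarter", "Q1", "Q2")
for dim, value, delta in ranked:
    print(f"{dim}={value}: {delta:+.1f}")
```

In this toy data the Direct channel is the largest mover, followed by the West region. A production version would also need the significance adjustment mentioned above before ranking.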
Not only do we calculate this for the metric in question, but we also calculate nested values for the top contributors, allowing users to dive into top drivers and proactively see what is causing that particular driver to change. Think of this as an automated drill-down.
Finally, the Trend Driver (via the Why and How sections) also exposes other metrics and interesting segments of your data that changed at the same time as the primary metric in question, adjusting for correlation.
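One way to picture the automated drill-down is to find the top driver, filter the data to it, and repeat the delta calculation on the remaining dimensions. The sketch below assumes exactly that; the data, field names and the helpers `deltas_by` and `drill_down` are hypothetical and are not taken from Tellius.

```python
from collections import defaultdict

# Hypothetical rows, one per sale.
rows = [
    {"quarter": "Q1", "channel": "Direct", "region": "West", "revenue": 100.0},
    {"quarter": "Q1", "channel": "Social", "region": "East", "revenue": 80.0},
    {"quarter": "Q1", "channel": "Direct", "region": "East", "revenue": 50.0},
    {"quarter": "Q2", "channel": "Direct", "region": "West", "revenue": 160.0},
    {"quarter": "Q2", "channel": "Social", "region": "East", "revenue": 70.0},
    {"quarter": "Q2", "channel": "Direct", "region": "East", "revenue": 55.0},
]


def deltas_by(rows, measure, dim, period_field, before, after):
    """Change in the summed measure for each value of one dimension."""
    totals = defaultdict(float)
    for row in rows:
        if row[period_field] == after:
            totals[row[dim]] += row[measure]
        elif row[period_field] == before:
            totals[row[dim]] -= row[measure]
    return dict(totals)


def drill_down(rows, measure, dimensions, period_field, before, after):
    """Find the single largest driver, then compute deltas inside it."""
    candidates = []
    for dim in dimensions:
        for value, delta in deltas_by(rows, measure, dim, period_field, before, after).items():
            candidates.append((abs(delta), dim, value))
    _, top_dim, top_value = max(candidates)

    # Keep only the rows that belong to the top driver, then repeat the
    # same delta calculation on every other dimension.
    subset = [row for row in rows if row[top_dim] == top_value]
    nested = {
        dim: deltas_by(subset, measure, dim, period_field, before, after)
        for dim in dimensions
        if dim != top_dim
    }
    return (top_dim, top_value), nested


top_driver, nested = drill_down(rows, "revenue", ["channel", "region"], "quarter", "Q1", "Q2")
print(top_driver, nested)
```

Here the drill-down shows that most of the change inside the top driver comes from a single region, which is the kind of follow-up question a user would otherwise answer with manual filters.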
The cohort comparison is similar to our trend driver, but instead of focusing on two time periods, it compares a primary metric for two different classes within your data. If you want to understand why two different campaigns performed differently, what drives two different types of customer, or how two different regions contrast, the cohort comparison is the insight to use.
Let’s consider the same dataset from above. We saw that the Direct channel was our #3 driver behind our Q1 vs Q2 trends. To understand how Direct compares to our #2 in the channel dimension (Social), we can use a Cohort Insight:
Cohort Comparison Insight
Similar to the trend driver, we are aggregating a particular metric across all individual items within every dimension of your data. This is slightly different from the trend driver in that instead of comparing across time, we are comparing two different classifications within a dimension.
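The same aggregation can be sketched with the two time periods swapped for two cohorts. The example below is a minimal illustration, assuming hypothetical rows, field names and a hypothetical `cohort_comparison` helper; it is not Tellius's implementation.

```python
from collections import defaultdict

# Hypothetical rows, one per sale.
rows = [
    {"channel": "Direct", "region": "West", "age_group": "18-34", "revenue": 120.0},
    {"channel": "Direct", "region": "East", "age_group": "35-54", "revenue": 60.0},
    {"channel": "Social", "region": "West", "age_group": "18-34", "revenue": 40.0},
    {"channel": "Social", "region": "East", "age_group": "18-34", "revenue": 90.0},
    {"channel": "Social", "region": "East", "age_group": "35-54", "revenue": 30.0},
]


def cohort_comparison(rows, measure, cohort_dim, cohort_a, cohort_b, dimensions):
    """Rank (dimension, value, total_a, total_b, gap) by the size of the gap."""
    totals = defaultdict(lambda: {cohort_a: 0.0, cohort_b: 0.0})
    for row in rows:
        cohort = row[cohort_dim]
        if cohort not in (cohort_a, cohort_b):
            continue  # rows outside the two cohorts are ignored
        for dim in dimensions:
            totals[(dim, row[dim])][cohort] += row[measure]

    gaps = [
        (dim, value, t[cohort_a], t[cohort_b], t[cohort_a] - t[cohort_b])
        for (dim, value), t in totals.items()
    ]
    # Largest differences first; ties are broken by name.
    return sorted(gaps, key=lambda g: (-abs(g[4]), g[0], g[1]))


comparison = cohort_comparison(
    rows, "revenue", "channel", "Direct", "Social", ["region", "age_group"]
)
for dim, value, direct, social, gap in comparison:
    print(f"{dim}={value}: Direct={direct:.0f} Social={social:.0f} gap={gap:+.0f}")
```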
Cohort Insights are a great tool for when you want to compare two different classes immediately, or as a second level of analysis when you discover something interesting from a trend or segment insight.
Segment Drivers are a powerful tool used to identify which features or metrics most impact a subset of data; this is called feature importance. Just as critically, the Segment Driver Insight also highlights multi-dimensional cohorts of data in which the target behavior is more likely to occur than on average.
Let’s consider our dataset one more time, and investigate which segments of our customers are at high risk of going to a competitor. Tellius automates this process and creates two pieces of output: feature ranking and segment identification.
Tellius highlights up to 5 segments of data that are more likely to occur with the target subset. Finding specific segments that occur more frequently can help you target specific behaviors for any number of uses: marketing, attrition, recruiting, fraud, etc.
To find these 5 segments, all measures (continuous variables) are binned into uniform bins. We then apply a variety of machine learning or statistical models that we have automated. One example of a technique we use is a decision tree model, applied to both binned and categorical variables. Why a decision tree? Think of a decision tree as a branching flow chart, segmenting your data based upon different combinations of values.
Example of a decision tree model
We apply this decision tree model to your data in order to find hidden segments that you are not able to identify using traditional tools. To learn more about our specific implementation of decision trees, read on here.
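To make the binning and tree steps concrete, here is a small, self-contained sketch. It assumes equal-width bins and a depth-limited tree that picks multiway splits by Gini impurity; these choices, the customer data and the helper names (`uniform_bin`, `best_split`, `find_segments`) are all assumptions for illustration and may differ from the models Tellius actually runs.

```python
from collections import Counter, defaultdict


def uniform_bin(values, n_bins):
    """Map each number to the label of an equal-width bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width if all values match
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [f"bin{min(int((v - lo) / width), n_bins - 1)}" for v in values]


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, features, target):
    """Pick the feature whose multiway split lowers weighted Gini the most."""
    base, best = gini([r[target] for r in rows]), None
    for feature in features:
        groups = defaultdict(list)
        for r in rows:
            groups[r[feature]].append(r)
        weighted = sum(
            len(g) / len(rows) * gini([r[target] for r in g]) for g in groups.values()
        )
        gain = base - weighted
        if gain > 1e-12 and (best is None or gain > best[0]):
            best = (gain, feature, groups)
    return best


def find_segments(rows, features, target, positive, max_depth=2, path=()):
    """Return (conditions, size, positive_rate) for each leaf of the tree."""
    split = best_split(rows, features, target) if max_depth > 0 else None
    if split is None:  # nothing useful left to split on: this is a leaf
        rate = sum(r[target] == positive for r in rows) / len(rows)
        return [(path, len(rows), rate)]
    _, feature, groups = split
    remaining = [f for f in features if f != feature]
    leaves = []
    for value, group in groups.items():
        leaves += find_segments(
            group, remaining, target, positive, max_depth - 1, path + ((feature, value),)
        )
    return leaves


# Hypothetical customers: one continuous measure, one dimension, one target.
customers = [
    {"spend": 10, "channel": "Direct", "churned": "no"},
    {"spend": 20, "channel": "Direct", "churned": "no"},
    {"spend": 15, "channel": "Social", "churned": "no"},
    {"spend": 90, "channel": "Social", "churned": "yes"},
    {"spend": 95, "channel": "Social", "churned": "yes"},
    {"spend": 80, "channel": "Direct", "churned": "no"},
    {"spend": 85, "channel": "Direct", "churned": "no"},
    {"spend": 12, "channel": "Direct", "churned": "no"},
]
spend_bins = uniform_bin([c["spend"] for c in customers], n_bins=2)
for customer, label in zip(customers, spend_bins):
    customer["spend_bin"] = label

segments = find_segments(customers, ["spend_bin", "channel"], "churned", "yes")
top = max(segments, key=lambda s: s[2])  # segment with the highest churn rate
print(top)
```

Each leaf of the tree is a candidate segment: a combination of conditions plus the rate at which the target behavior occurs inside it. Comparing that rate with the overall rate is what makes a segment "more likely than average".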
Here is an example of a segment that Tellius identified based upon our CPG dataset:
Segments of customers most at risk
Tellius also automates the process of identifying feature importance, exposing which of our dimensions or metrics impact the subset of our data that we want to investigate. Feature ranking tells you which of your variables have the greatest impact on the subset you are investigating. See below:
Feature importance ranking
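The article does not say which scoring method sits behind the feature ranking, so the sketch below swaps in one common and simple choice, information gain, purely as an illustration. The rows, field names and the functions `entropy`, `information_gain` and `rank_features` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, feature, target):
    """How much knowing the feature reduces uncertainty about the target."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[feature]].append(r[target])
    conditional = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[target] for r in rows]) - conditional


def rank_features(rows, features, target):
    """Return (feature, score) pairs, most informative first."""
    scores = [(information_gain(rows, f, target), f) for f in features]
    return [(f, score) for score, f in sorted(scores, reverse=True)]


# Hypothetical customers with a churn flag as the target subset.
rows = [
    {"channel": "Direct", "region": "West", "churned": "no"},
    {"channel": "Direct", "region": "East", "churned": "no"},
    {"channel": "Direct", "region": "West", "churned": "no"},
    {"channel": "Direct", "region": "East", "churned": "no"},
    {"channel": "Social", "region": "West", "churned": "yes"},
    {"channel": "Social", "region": "East", "churned": "yes"},
    {"channel": "Social", "region": "West", "churned": "no"},
    {"channel": "Social", "region": "East", "churned": "yes"},
]

ranking = rank_features(rows, ["region", "channel"], "churned")
for feature, score in ranking:
    print(f"{feature}: {score:.3f}")
```

In this toy data, channel separates churners from non-churners far better than region does, so it ranks first.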
The Discovery Insight identifies correlations and anomalies within your data. While the Trend, Cohort and Segment Insights are geared to answer specific questions, the Discovery Insight surfaces relationships or individual data points that are worth investigating.
Correlations are calculated by comparing all metrics to each other, across all categories. For example, if you have 5 continuous metrics and 5 dimensions with 5 categories in each dimension, there are 10 distinct pairs of metrics (5 choose 2) and 25 categories, so Tellius will calculate (10 * 5 * 5) = 250 correlations and expose those with very high or very low correlation. See a correlation example below:
Correlation Discovery Insight
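A minimal sketch of the idea: for each category of each dimension, compute a correlation for every pair of metrics and keep only the strong ones. The example assumes Pearson correlation and reads "very high or very low" as strongly positive or strongly negative; both are assumptions, as are the data, the `threshold` parameter and the function names.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient, or None if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


def correlations_by_category(rows, metrics, dimensions, threshold=0.9):
    """Return (dimension, category, metric_a, metric_b, r) where |r| >= threshold."""
    findings = []
    for dim in dimensions:
        for category in sorted({r[dim] for r in rows}):
            subset = [r for r in rows if r[dim] == category]
            for a, b in combinations(metrics, 2):  # each unordered pair once
                r = pearson([s[a] for s in subset], [s[b] for s in subset])
                if r is not None and abs(r) >= threshold:
                    findings.append((dim, category, a, b, round(r, 3)))
    return findings


# Hypothetical rows with three continuous metrics and one dimension.
rows = [
    {"region": "West", "revenue": 10, "units": 1, "discount": 5},
    {"region": "West", "revenue": 20, "units": 2, "discount": 1},
    {"region": "West", "revenue": 30, "units": 3, "discount": 4},
    {"region": "East", "revenue": 10, "units": 3, "discount": 3},
    {"region": "East", "revenue": 20, "units": 1, "discount": 2},
    {"region": "East", "revenue": 30, "units": 2, "discount": 1},
]

findings = correlations_by_category(rows, ["revenue", "units", "discount"], ["region"])
for finding in findings:
    print(finding)
```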
Anomaly detection finds outlier data points by calculating the standard deviation, choosing an appropriate cutoff, and then exposing data points that fall at either end of the distribution curve.
Distribution curve with outliers shaded
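The standard-deviation approach can be sketched as a simple z-score filter. The cutoff of 2 standard deviations, the sample values and the function `find_outliers` are assumptions for illustration; how the cutoff is actually chosen is not described here.

```python
import statistics


def find_outliers(values, cutoff=2.0):
    """Return (index, value, z_score) for points beyond `cutoff` standard deviations."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # every value is identical, so nothing can be an outlier
    outliers = []
    for index, value in enumerate(values):
        z = (value - mean) / stdev
        if abs(z) > cutoff:  # flags both tails of the distribution
            outliers.append((index, value, z))
    return outliers


# Hypothetical daily revenue with one obvious spike.
daily_revenue = [100, 102, 98, 101, 99, 103, 97, 100, 250, 100]
outliers = find_outliers(daily_revenue)
for index, value, z in outliers:
    print(f"day {index}: {value} (z = {z:.2f})")
```

Note that a large outlier inflates the standard deviation it is measured against, which is one reason the choice of cutoff matters.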
See an anomalies example below:
Outliers Discovery Insight