What is Clustering in Machine Learning?
We will use a clustering model, an unsupervised machine learning technique that analyzes a set of points and groups them so that the distance between points in the same cluster (intra-cluster distance) is small and the distance between points in different clusters (inter-cluster distance) is large. There are multiple types of clustering algorithms (e.g., hierarchical, probabilistic, and overlapping), and K-Means is one of the most popular approaches. Using Tellius, we are going to train a Bisecting K-Means model. This is a variant of traditional K-Means in which the number of clusters is defined a priori, and regular K-Means with k=2 is run repeatedly to bisect the data until the desired number of segments is reached.
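To make the bisecting idea concrete, here is a minimal pure-Python sketch. It is not the Tellius or Spark ML implementation. It assumes Euclidean distance and, at each step, splits the cluster with the largest within-cluster sum of squared errors (SSE), which is one common variant. Spark ML instead bisects all divisible leaf clusters level by level. The parameter names `max_iter`, `seed`, and `min_divisible_size` are hypothetical stand-ins for the settings discussed later.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sse(points):
    """Within-cluster sum of squared distances to the centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)


def two_means(points, max_iter, rng):
    """Plain (Lloyd's) K-Means with k=2, used to bisect one cluster."""
    # Start from a random point and the point farthest from it, so the two
    # starting centers differ whenever the cluster has any spread.
    first = rng.choice(points)
    second = max(points, key=lambda p: sq_dist(p, first))
    centers = [first, second]
    # Run at least one assignment pass so left/right are always defined.
    for _ in range(max(1, max_iter)):
        left, right = [], []
        for p in points:
            if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]):
                left.append(p)
            else:
                right.append(p)
        new_centers = [centroid(left), centroid(right)]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return left, right


def bisecting_kmeans(points, k, max_iter=20, seed=0, min_divisible_size=2):
    """Repeatedly bisect the highest-SSE cluster until k clusters exist."""
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        # A cluster can be split only if it is large enough and its points are not all identical.
        divisible = [i for i, c in enumerate(clusters)
                     if len(c) >= max(2, min_divisible_size) and sse(c) > 0]
        if not divisible:
            break  # fewer than k clusters can come back, as Spark ML also documents
        target = max(divisible, key=lambda i: sse(clusters[i]))
        left, right = two_means(clusters.pop(target), max_iter, rng)
        clusters.extend([left, right])
    return clusters


data = [(0.0,), (1.0,), (10.0,), (11.0,)]
print(bisecting_kmeans(data, k=2))
```

The order of the returned clusters depends on the random starting point. The grouping itself does not: {0, 1} and {10, 11}.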
Preparing the Data
We are going to use the same data we used to perform the RFM segmentation described in the RFM use case. This time, however, we can keep most of the continuous variables as inputs to the model. We also enrich the dataset by creating new features in the Tellius built-in SQL editor.
We created features such as the total number of furniture orders, the number of high-priority orders, and the number of orders shipped via first-class shipping. The resulting dataset contained one record per customer ID with 22 features, which served as the input to the clustering algorithm.
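As an illustration of this kind of feature engineering, the SQL below rolls order-level rows up to one row per customer using conditional counts. The table name `orders` and its columns (`category`, `order_priority`, `ship_mode`, `sales`) are hypothetical. The query runs here in SQLite so the example is self-contained, and the dialect in the Tellius SQL editor may differ slightly.

```python
import sqlite3

# Hypothetical order-level table; all column names and values are made up.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        order_id TEXT, customer_id TEXT, category TEXT,
        order_priority TEXT, ship_mode TEXT, sales REAL
    )
""")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("O1", "C1", "Furniture", "High", "First Class", 250.0),
        ("O2", "C1", "Technology", "Low", "Standard Class", 90.0),
        ("O3", "C2", "Furniture", "Medium", "First Class", 400.0),
        ("O4", "C2", "Furniture", "High", "Second Class", 120.0),
        ("O5", "C3", "Office Supplies", "Low", "Standard Class", 15.0),
    ],
)

# One row per customer. COUNT(DISTINCT CASE ... THEN order_id END) counts orders,
# not line items, because rows that fail the condition yield NULL and are ignored.
FEATURE_SQL = """
SELECT
    customer_id,
    COUNT(DISTINCT order_id) AS total_orders,
    COUNT(DISTINCT CASE WHEN category = 'Furniture' THEN order_id END) AS furniture_orders,
    COUNT(DISTINCT CASE WHEN order_priority = 'High' THEN order_id END) AS high_priority_orders,
    COUNT(DISTINCT CASE WHEN ship_mode = 'First Class' THEN order_id END) AS first_class_orders,
    SUM(sales) AS total_sales
FROM orders
GROUP BY customer_id
ORDER BY customer_id
"""
features = conn.execute(FEATURE_SQL).fetchall()
for row in features:
    print(row)
```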
Training the Machine Learning Model
Tellius offers a robust machine learning layer, built on Apache Spark with the open-source Spark ML library, where users can train, assess, and apply predictive models. The platform offers two approaches to training a model. The first, AutoML, lets the user select a target variable and rely on Tellius to select the appropriate algorithm, perform feature transformation, and fine-tune the parameters. The second, Point & Click, gives users more control over model selection and the hyperparameter tuning approach. We are going to use the Point & Click approach to build our model.
Step 1. Select the Clustering category of algorithms.
Step 2. Select the input features.
Step 3. Select the Bisecting K-Means algorithm and provide the model parameters, such as the number of iterations, the seed value, the number of clusters, and the minimum cluster size.
Clicking the Next button kicks off the model training job.
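The four Step 3 settings correspond closely to the parameters of Spark ML's `BisectingKMeans`: `k`, `maxIter`, `seed`, and `minDivisibleClusterSize`. The sketch below is a hypothetical container for those settings. It assumes Tellius' "minimum cluster size" behaves like Spark's `minDivisibleClusterSize`, which is the smallest cluster that may still be split. That value is read as a row count when it is at least 1.0 and as a fraction of all rows when it is below 1.0. The class and field names are illustrative and are not a Tellius API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BisectingKMeansParams:
    """Hypothetical container for the parameters entered in Step 3."""
    k: int = 4                       # desired number of leaf clusters (Spark's default is 4)
    max_iter: int = 20               # K-Means iterations per bisection (Spark's default is 20)
    seed: int = 42                   # arbitrary example seed, for reproducible runs
    min_divisible_size: float = 1.0  # row count if >= 1.0, else a fraction of all rows

    def __post_init__(self):
        # Constraints modeled on Spark ML's parameter checks for BisectingKMeans.
        if self.k <= 1:
            raise ValueError("k must be > 1")
        if self.max_iter < 0:
            raise ValueError("max_iter must be >= 0")
        if self.min_divisible_size <= 0:
            raise ValueError("min_divisible_size must be > 0")

    def min_rows_to_split(self, n_rows):
        """Smallest cluster size (in rows) that may still be bisected."""
        if self.min_divisible_size >= 1.0:
            return math.ceil(self.min_divisible_size)
        return math.ceil(self.min_divisible_size * n_rows)


params = BisectingKMeansParams(k=5, min_divisible_size=0.25)
print(params.min_rows_to_split(1000))
```

Expressing the minimum size as a fraction keeps the setting meaningful when the same model configuration is retrained on a larger customer base.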
Reviewing the Machine Learning Model
After the model finishes training, Tellius surfaces all the model information, such as the final list of input features, the algorithm documentation, and the model parameters.
Tellius also displays the top three features by variance in the evaluation section.
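The article does not say exactly how Tellius ranks features by variance. The following is a minimal sketch, assuming population variance computed on the raw feature values. The feature names and rows are made up for illustration.

```python
from statistics import pvariance


def top_features_by_variance(rows, feature_names, n=3):
    """Return the n (feature, variance) pairs with the largest population variance."""
    columns = list(zip(*rows))  # transpose: one tuple of values per feature
    variances = {name: pvariance(col) for name, col in zip(feature_names, columns)}
    # sorted() is stable, so ties keep their original feature order.
    ranked = sorted(variances.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Made-up per-customer rows: three count features and one currency feature.
names = ["furniture_orders", "high_priority_orders", "first_class_orders", "total_sales"]
rows = [
    (1, 1, 1, 340.0),
    (2, 1, 1, 520.0),
    (0, 0, 0, 15.0),
]
for name, var in top_features_by_variance(rows, names):
    print(f"{name}: {var:.2f}")
```

Raw variance depends on scale, so a currency feature such as `total_sales` will outrank count features. Standardizing features first (for example, to z-scores) makes the ranking comparable across units.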
Using the model and the dataset it was trained on, the marketing team can easily build content in the Tellius visualization layer to assess the quality of the customer segments created by the model, identify the segments with the highest growth potential, and share the content with peers or executives across the organization. An example of such content may look like this:
The marketing analytics team can interact with each chart in the Vizpad, drill into each customer segment down to individual customer records if necessary, export charts as native Microsoft PowerPoint charts, and create a slide deck for C-level leaders in a matter of minutes.
Scoring New Data
After the clustering model is trained and ready for production, we need to apply it to new data (i.e., scoring) and assign a segment label to each customer record the model has not seen. Tellius offers a few ways to apply the model to new data. One is the Tellius interface, using point-and-click functionality. More technical users may prefer Tellius' prediction API, which gives access to a trained model from a Python or cURL script. Let's take a closer look at how to access the Bisecting K-Means model described in the previous section via the API and score a dataset containing new customer data.
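Conceptually, scoring assigns each unseen record to one of the learned segments. Spark ML's bisecting K-Means model does this by walking down its cluster tree from the root and moving, at each split, to the child whose center is closer. The sketch below mimics that rule on a hand-built tree. It does not call the Tellius API. The tree, its centers, and the `ClusterNode` and `assign_segment` names are all hypothetical. In practice, new records must go through the same feature preparation as the training data before scoring.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ClusterNode:
    """One node of a (hypothetical) bisecting cluster tree."""
    center: Tuple[float, ...]
    segment: Optional[int] = None  # set on leaf nodes only
    children: List["ClusterNode"] = field(default_factory=list)


def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_segment(root, point):
    """Walk down the tree, always moving to the child with the nearer center."""
    node = root
    while node.children:
        node = min(node.children, key=lambda child: sq_dist(point, child.center))
    return node.segment


# Made-up centers for a two-feature model with three leaf segments.
tree = ClusterNode(
    center=(10.0, 5.0),
    children=[
        ClusterNode(center=(0.5, 0.5), segment=0),
        ClusterNode(
            center=(15.0, 5.0),
            children=[
                ClusterNode(center=(10.0, 10.0), segment=1),
                ClusterNode(center=(20.0, 0.0), segment=2),
            ],
        ),
    ],
)

new_customers = [(1.0, 1.0), (11.0, 9.0), (19.0, 1.0)]
print([assign_segment(tree, c) for c in new_customers])  # [0, 1, 2]
```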
The Tellius platform contains detailed documentation on the API with clear examples for each step.
Step 1. Run the Tellius module in a Python IDE of your choice. Note: The script used in these steps can be accessed from the Tellius platform.
Step 2. Authenticate to the environment by providing the configured clientID and clientSecret. Note: The ID and secret are typically configured by a user with admin privileges.
Step 3. Obtain the model ID and assign it to a variable.
Step 4. Load the model object into the IDE using the model ID obtained in the previous step.
Step 5. Pass the input parameters to the Tellius predict_file function, such as the file type, the header information, and the location of the scoring file. In this case, we are using the Customer_Segmentation_Scoring.csv file stored in Google Drive.
Step 6. Fetch the object containing the predicted data.
Step 7. Convert the scored data from JSON into a pandas DataFrame and write it to a CSV file for sharing or further analysis.
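Step 7 can be sketched with pandas as follows. The real structure of the Tellius prediction response is not shown here. This sketch assumes it is a JSON list of flat records, each holding the input columns plus a predicted segment in a field called `prediction`. That field name, the sample values, and the output file name are all hypothetical.

```python
import json

import pandas as pd

# Hypothetical scored payload; the actual Tellius response shape may differ.
scored_json = """
[
  {"customer_id": "C101", "furniture_orders": 3, "total_sales": 1250.5, "prediction": 2},
  {"customer_id": "C102", "furniture_orders": 0, "total_sales": 89.9, "prediction": 0}
]
"""

records = json.loads(scored_json)                 # list of dicts, one per customer
scored_df = pd.DataFrame(records)                 # one column per JSON key
scored_df = scored_df.rename(columns={"prediction": "segment"})
scored_df.to_csv("scored_customers.csv", index=False)  # shareable CSV without the index
print(scored_df)
```

If the payload contains nested objects, `pd.json_normalize(records)` flattens them into dotted column names before the CSV export.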
Because Tellius runs on Apache Spark's distributed architecture, users get strong performance when training any type of machine learning model. Point-and-click model training saves time on model training, feature selection, and feature transformation. The Tellius predict API adds flexibility by letting you use the platform as the scoring engine, returning prediction results in a form that can be easily integrated into your current infrastructure, third-party tools, or web applications.
In this article, we provided an overview of how a marketing team can use Tellius' augmented analytics platform to perform customer segmentation with two approaches: a traditional RFM approach and a Bisecting K-Means clustering model. We showcased:
- Tellius' ETL layer for transforming, cleaning, and enriching data through point and click, SQL, and Python scripts.
- Tellius Predict, where we developed the clustering model to group the customer base into segments and identify those with the highest growth potential and the highest customer lifetime value.
- The Tellius API, used to score data with trained ML models and integrate the results into third-party web applications.
Tellius is useful for a variety of other eCommerce and retail applications; to learn more, download our Guide to AI-Driven Analytics for eCommerce. Take Tellius for a free 14-day spin (no credit card necessary) today!