“Alas, I cannot claim this next feat as illusion. Watch carefully- you will see no trickery, for no trickery is being employed.” – The Prestige
In this post, we describe how Tellius is used to classify segments of customers who are at a higher risk of default using a publicly available data set. The screenshots I will share have not been doctored, and they’re the exact output from a real cluster of our platform. Read on to see how you can make better credit risk decisions today.
What is credit risk and why do I care?
Here at Tellius, we talk to banks and lenders all the time. 10 out of 10 want to hear how we can help them get more out of their credit risk processes. They are all swimming in oceans of demographic, transactional, rating and interaction data, but their current models require intensive manual input and significant collaboration across roles. Many can only look at 3-4 variables at a time to identify anomalies, and it then takes days to dive in and make sense of what is happening. These monoliths are slow, inefficient and lag behind today’s available technologies.
Why would a financial institution want to improve its credit risk modeling? According to McKinsey, analytically enhanced credit models can improve banks’ returns in a variety of ways. Risk classification can be integrated into approval processes, generating a 50% efficiency improvement and increasing revenues by up to 10%. Others are using risk analytics to create new business models and serve their customers in new, innovative ways.
While the promise of integrating analytics even deeper into risk modeling is alluring, the path to getting there can be dark and murky. Think the first two Harry Potter movies compared to the rest. Data is difficult to integrate, skills in the organization are hard to come by, and regulatory demands raise a hard question: even if organizations could utilize machine learning, how would they explain a model to a regulator?
Now for the good stuff
First, I downloaded data from Kaggle and uploaded it to Tellius. I cover data load in more detail here. We have 1,000 rows of credit profiles, which include demographic data (gender, age, region, job), financial status (savings/checking information, mortgage/rental info, credit score and trended credit score), and a classification of Good or Bad risk.
Using natural language search, I visualize risk. We can see that 30% of the entries in this credit risk data set are tagged as Bad, and 70% are tagged as Good. Which of the above attributes can help us predict Bad vs Good? When I click on Drivers, Tellius presents me with the groups that are most likely to have Bad risk.
Give me some insight into who is risky
This process took 3 minutes and 9 seconds to run, and it first presents us with the factors that contribute most heavily to Risk = Bad. Below, we can see that Credit_now (current credit score), Duration (term of a loan) and Age are our top 3 factors.
The second part of this insight is the segment of customers most likely to be tagged as Risk = Bad.
Above, the highlighted segment shows that a customer is 2.3x more likely to have a Risk = Bad tag, compared to the overall data set, when all of the following criteria are met:
loan duration is between 15 and 33 months AND
credit score is between 1364 and 3512 (this is a German credit score from the 90s) AND
the borrower rents AND
the borrower is 25 or younger
First thing I want to note: 3 of the 4 variables (duration, credit score and age) are continuous variables. Duration is not segmented into just 3 or 4 classes. Each duration is a number in its own right, and Tellius has automatically identified a range within that field that is particularly sensitive. The same goes for credit score and age.
This identified segment is a targeted, actionable group with unique criteria that Tellius has identified in 3 minutes. This is not specific only to these variables – any kind of data can be tiered in this fashion.
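To make the "2.3x more likely" figure concrete, here is a minimal sketch of how the lift of a segment can be computed by hand. This is not how Tellius computes it internally; it is just the underlying arithmetic. The rows are made up for illustration (they are not the Kaggle data), the column names `housing` and `risk` are my own placeholders, and I am assuming the range boundaries are inclusive.

```python
# Illustrative sketch only: hypothetical rows and column names, not the Kaggle data.

def in_segment(row):
    """True when a row meets all four criteria of the highlighted segment.
    Inclusive range boundaries are an assumption."""
    return (
        15 <= row["duration"] <= 33
        and 1364 <= row["credit_now"] <= 3512
        and row["housing"] == "rent"
        and row["age"] <= 25
    )

def bad_rate(rows):
    """Share of rows tagged Risk = Bad (0.0 for an empty list)."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r["risk"] == "Bad") / len(rows)

def segment_lift(rows):
    """Bad rate inside the segment divided by the bad rate of the whole data set."""
    segment = [r for r in rows if in_segment(r)]
    return bad_rate(segment) / bad_rate(rows)

rows = [
    {"duration": 24, "credit_now": 2000, "housing": "rent", "age": 23, "risk": "Bad"},
    {"duration": 18, "credit_now": 1500, "housing": "rent", "age": 25, "risk": "Bad"},
    {"duration": 30, "credit_now": 3000, "housing": "rent", "age": 22, "risk": "Good"},
    {"duration": 12, "credit_now": 1000, "housing": "own", "age": 40, "risk": "Good"},
    {"duration": 36, "credit_now": 5000, "housing": "own", "age": 35, "risk": "Bad"},
    {"duration": 6, "credit_now": 800, "housing": "own", "age": 50, "risk": "Good"},
    {"duration": 24, "credit_now": 2500, "housing": "own", "age": 30, "risk": "Good"},
    {"duration": 48, "credit_now": 7000, "housing": "free", "age": 45, "risk": "Good"},
    {"duration": 20, "credit_now": 2000, "housing": "rent", "age": 33, "risk": "Good"},
    {"duration": 10, "credit_now": 1200, "housing": "own", "age": 28, "risk": "Good"},
]

overall = bad_rate(rows)      # 3 of 10 toy rows are Bad
lift = segment_lift(rows)     # 2 of the 3 segment rows are Bad
print(f"overall bad rate: {overall:.2f}, segment lift: {lift:.2f}x")
```

Applied to the real numbers in this post, a 30% overall Bad rate and a 2.3x lift would imply that roughly 69% of the customers in the highlighted segment are tagged Bad (0.30 × 2.3 ≈ 0.69).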
Historical insights are great – can I make a prediction based on this?
I want to create a model that can be consistently trained on my current credit risk data and applied to predict whether new customers are going to result in credit losses. However, I’m not a data scientist. Maybe you aren’t technical either, but you want to leverage machine learning because you know it will save you time and be more accurate than what you do today.
On the other hand, maybe you’re a data scientist and understand machine learning, but you want to move faster in creating and deploying models than you do today. Either way, Tellius can help you get there.
In my case, I navigate to the Predict section of Tellius and kick off an AutoML process. This automates most steps of machine learning and is targeted towards the non-technical user. My next blog will be on our more flexible and powerful Point and Click ML.
The AutoML process asks me which column I want to predict, and I choose Risk:
From here, I name my model and Tellius automatically suggests the type of ML models to use (Classification models) and the Evaluation Metric (Area under ROC, or AUR).
Tellius supports a variety of Machine Learning models across the gamut: from classification to regression and beyond. You can read more about the types of models we support and how to build, train and deploy Machine Learning models in Tellius at this link.
I also have the option to view an Advanced Configuration screen, where I can change how null values are handled, what percent of the data is used, how that data is split between training and testing, etc. I’m not a data scientist, so I don’t want to make any custom changes yet. I just hit Predict. The AutoML process kicks off, and finishes running in 14 minutes.
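For readers curious what those Advanced Configuration choices mean in practice, here is a minimal sketch of the three decisions involved: handling nulls, sampling a percent of the data, and splitting it into training and test sets. The function name, the default values and the choice to drop rows with nulls are all my own assumptions for illustration. They are not Tellius defaults, and dropping rows is only one of several ways to handle missing values.

```python
import random

def prepare_split(rows, sample_fraction=1.0, test_fraction=0.2, seed=42):
    """Drop rows with missing values, keep a fraction of what remains,
    then split that sample into (train, test). All defaults are hypothetical."""
    # Null handling: here we simply drop any row containing a None value.
    complete = [r for r in rows if all(v is not None for v in r.values())]

    # Shuffle with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    shuffled = complete[:]
    rng.shuffle(shuffled)

    # Percent of data used.
    sampled = shuffled[: round(len(shuffled) * sample_fraction)]

    # Train vs test split of the sampled rows.
    n_test = round(len(sampled) * test_fraction)
    return sampled[n_test:], sampled[:n_test]

# Toy data: 100 complete rows plus one row with a missing age.
toy_rows = [{"age": 20 + i % 50, "risk": "Good"} for i in range(100)]
toy_rows.append({"age": None, "risk": "Bad"})

train, test = prepare_split(toy_rows, sample_fraction=0.5, test_fraction=0.2)
print(len(train), len(test))
```

Holding out a test set matters because a model is scored on rows it never saw during training, which gives a more honest picture of how it will behave on new customers.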
Why did it take 14 minutes? The AutoML process actually trained and scored 5 different machine learning models, all under the Classification type. That’s about 2 minutes and 53 seconds per model on average, during which time Tellius assessed every variable in our data set and developed functioning models that can predict Risk.
Our highest performing model is the logistic regression type, which can be used when we have a binary dependent variable such as Risk (Good vs Bad). We can see evaluation and performance statistics for each model ranked in a leaderboard format. Below is how our logistic regression model performed:
This logistic regression model comes with a variety of performance metrics. For classification, AutoML suggested that we use AUR to evaluate and score the models, and this particular model had the highest AUR score at 0.68. This score would likely improve if we trained the model with more data. Of our 1,000 rows, we only trained on a subset.
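If you have not met the area under the ROC curve before, one way to read it is as the probability that the model scores a randomly chosen Bad customer higher than a randomly chosen Good one. A score of 0.5 is no better than a coin flip, and 1.0 is a perfect ranking. The sketch below computes it directly from that definition. The labels and scores are made up, and this is a plain Python illustration of the metric, not Tellius code.

```python
def area_under_roc(labels, scores):
    """Fraction of (positive, negative) pairs where the positive is scored
    higher than the negative; tied scores count as half a win."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical example: 1 = Bad risk, 0 = Good risk, scores = predicted P(Bad).
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.2]

auc = area_under_roc(labels, scores)
print(f"area under ROC: {auc:.3f}")
```

In this toy example, 5 of the 6 Bad/Good pairs are ranked correctly, so the score is 5/6. The pairwise loop is easy to follow but slow on large data; production libraries use a rank-based calculation that gives the same answer.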
In general, Tellius gives this model a 73% composite accuracy metric. Credit risk prediction can be a sensitive subject – Tellius gives us an explanation for any prediction that it makes, breaking down each factor and how they affected the prediction.
In the case above, the model is favoring categorical variables. For this particular row of training data, Tellius predicted that the customer should be marked as Risk = Bad. The 3 factors that contributed the most to this classification decision were Age, Checking Amount, and Housing. Finally, Tellius also provides a confidence metric for each individual prediction that it makes. While the model is 73% accurate overall, this individual prediction is made with 96.79% confidence.
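I can't show exactly how Tellius builds its explanations, but for a logistic regression there is a common and simple way to think about them: each feature's value is multiplied by its coefficient, the results are summed into a log-odds score, and that score is squashed into a probability. The sketch below illustrates that idea. Every number in it (the intercept, the coefficients, the feature values) is hypothetical and chosen only to make the arithmetic easy to follow, and the feature names are placeholders rather than the real column names.

```python
import math

def explain_prediction(intercept, coefficients, features):
    """Return P(Risk = Bad) and the per-feature contributions to the
    log-odds, ranked by absolute size (largest first)."""
    contributions = {
        name: coefficients[name] * features[name] for name in coefficients
    }
    log_odds = intercept + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))  # logistic (sigmoid) function
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return probability, ranked

# Hypothetical model and customer; none of these values come from Tellius.
intercept = -2.0
coefficients = {
    "age_25_or_younger": 1.2,
    "checking_amount_low": 1.5,
    "housing_rent": 0.8,
    "duration_months": 0.02,
}
customer = {
    "age_25_or_younger": 1,
    "checking_amount_low": 1,
    "housing_rent": 1,
    "duration_months": 24,
}

probability, ranked = explain_prediction(intercept, coefficients, customer)
print(f"P(Risk = Bad) = {probability:.2%}")
for name, contribution in ranked:
    print(f"  {name}: {contribution:+.2f} log-odds")
```

The ranked list is what makes a prediction explainable: a positive contribution pushes the customer toward Risk = Bad, a negative one pushes toward Good, and the largest contributions are the "top factors" for that one customer.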
Okay, great – how can I operationalize this?
Tellius ML can be shared within the platform as Machine Learning projects, utilized within dashboards and presented as interactive visualizations, or even pushed out for use in business processes and external applications using a powerful API.
Once I hit Save, the models are stored in a Project that can be shared across Tellius. Here, you can train with new data or make predictions against new data.
Credit risk – from data to predictions in minutes
In this article we took 1,000 rows of data and built a credible machine learning model that predicts whether an individual will be a risky customer for a lender. Tellius is a powerful tool that serves people across roles and skill levels. It allows people who aren’t technical, like myself, to explore data with natural language and start making predictions in just minutes. Imagine the insight you could derive from your data with a platform like Tellius.
Thanks for reading. I hope this was a fun read. If you’re still here, you probably have tons of data and are trying to get better at getting insights into it. I encourage you to read our eBook for AI-Driven Analytics in Financial Services, to reach out to me via my Twitter (@csreuter) or here on LinkedIn, and to try out a free trial of Tellius today!