Predict Customers That Are Likely To Default Using C# and ML.NET Machine Learning

Ofoedu Frank Ebuka
4 min read · Jun 7, 2019

ML.NET is Microsoft's machine learning framework for .NET developers. It lets them add machine learning to applications in a wide variety of areas, such as speech recognition, data analysis and computer vision.

Machine learning can be quite an intimidating area of computer science for newcomers, which is why Microsoft keeps improving the ML.NET APIs to make them more developer friendly and easier to build on.

Because there are still very few examples of how to use ML.NET, I decided to write this post to show how it can be used to make predictions.

Scenario

Machine learning algorithms aim to teach a computer from data, so that its answers improve as it sees more examples.

Instead of telling the computer to solve a problem using a fixed set of rules, we let it solve the problem based on experience, i.e. using examples of previous solutions that we provide. For example, if you tell the computer that it always rains on the 1st, 3rd and 20th of June, it will likely predict rain on and around those dates every June, for instance on the 2nd, 4th, 19th and 21st. It uses the data you provided to learn which days have a high probability of rainfall.

For this post, we will look at a kind of problem called binary classification. In binary classification, we want the computer to choose between two possible outputs (also called classes or labels), e.g. predict whether a patient has diabetes or not. We provide the computer with a list of inputs called features that help it make the decision, e.g. the patient's age, glucose level, blood pressure and so on. In our case, the two classes are whether a customer is likely to default or not.

Process

The data set we will use contains 30,000 records of activity on credit cards. Download here

The CSV file has a lot of columns. Normally we would preprocess the data first (drop features that are not needed, check for null values and so on), but for this tutorial we will use the data as it is.

  • Start a new .NET Core console app in Visual Studio 2019. Visual Studio Code users can run the following commands instead:
$ dotnet new console -o CreditCardDefaultPrediction
$ cd CreditCardDefaultPrediction
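  • Install the ML.NET NuGet package (Microsoft.ML). The FastForest trainer used later in this post lives in a separate package, Microsoft.ML.FastTree, so install that as well.
  • Then create two classes. The first class represents a row of data in the CSV file: give each field the same name as its column in the file and mark it with a [LoadColumn] attribute so ML.NET knows which column to read it from. The second class holds the predictions.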
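As a rough sketch of what the two classes described above could look like: only AGE, BILL_AMT1 and the Label column appear in this post, and the column indices are placeholders that you would adjust to match the layout of your CSV file. The Score field is an assumption based on what non-calibrated trainers such as FastForest output.

```csharp
using Microsoft.ML.Data;

// One row of the CSV file.
// NOTE: the LoadColumn indices are placeholders, not the real positions
// in the data set. Adjust them (and add the remaining columns) to match your file.
public class CreditCardData
{
    [LoadColumn(0)]
    public float AGE;

    [LoadColumn(1)]
    public float BILL_AMT1;

    // The value we want to predict. Binary classification trainers
    // expect the label column to be a bool.
    [LoadColumn(2)]
    public bool Label;
}

// What the model returns for one row.
public class CreditCardPrediction
{
    // Binary classification trainers write the predicted class
    // to a column named "PredictedLabel".
    [ColumnName("PredictedLabel")]
    public bool Prediction;

    // Raw score produced by the trainer.
    public float Score;
}
```

Numeric inputs are declared as float because the Concatenate step used later expects all feature columns to share one type.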

In the main function of the Program.cs file, we start the coding process.

  • First we create the machine learning context (MLContext). The MLContext is the starting point for all ML.NET operations, such as loading data, training, prediction and model evaluation.
var ctx = new MLContext();
  • Next we load the data using the Data property of the MLContext object. The data is lazily loaded, i.e. it isn't read until it is actually needed. The Data property can also load from an in-memory enumerable (LoadFromEnumerable) or from a binary file (LoadFromBinary). We specify the type of the input using the class we created earlier.
IDataView dataReader = ctx.Data.LoadFromTextFile<CreditCardData>(dataPath, separatorChar: ',', hasHeader: true);
  • Instead of using all the data to train our model, we split it into a training set and a test set so that we can assess the model's accuracy on data it has not seen. ML.NET provides an easy way to do this. We can also specify the fraction of the data to hold back for testing; for this example we use 0.2, i.e. 20% test and 80% training.
var splitData = ctx.Data.TrainTestSplit(dataReader, 0.2);
  • Since we have multiple input columns, we combine them into a single input column named Features. The Concatenate method accepts the name of the output column followed by the names of the input columns.
var InputColumns = new string[] { "", "", "" }; // an array of the column names we need
var concatenatingEstimator = ctx.Transforms.Concatenate("Features", InputColumns);
  • We then pick an algorithm to work with. Since it's a binary classification problem, we could use FastForest, AveragedPerceptron, GAM or LinearSvm, among others. Do experiment with different trainers/algorithms and compare the results.
var estimator = concatenatingEstimator.Append(ctx.BinaryClassification.Trainers.FastForest("Label", "Features"));
  • The last step is to train the model by fitting the estimator to the training set.
var model = estimator.Fit(splitData.TrainSet);

We now have a model. We could use it to make predictions on examples:

var predictionEngine = ctx.Model.CreatePredictionEngine<CreditCardData, CreditCardPrediction>(model);
var singleData = new CreditCardData() { AGE = 22, BILL_AMT1 = 23, ... };
var singlePrediction = predictionEngine.Predict(singleData);
Console.WriteLine(singlePrediction.Prediction);

We can also check the metrics of our model on the test set to decide whether it is a good fit for the business case. Several metrics are used to assess a binary classification model, such as F1 score, recall, precision and accuracy.

var predictions = model.Transform(splitData.TestSet); // score the held-out test set
var metrics = ctx.BinaryClassification.EvaluateNonCalibrated(predictions, "Label");
Console.WriteLine($"Accuracy: {metrics.Accuracy:0.00}, F1 Score: {metrics.F1Score:0.00}");
Console.WriteLine($"Negative Precision: {metrics.NegativePrecision:0.00}, Negative Recall: {metrics.NegativeRecall:0.00}");
Console.WriteLine($"Positive Precision: {metrics.PositivePrecision:0.00}, Positive Recall: {metrics.PositiveRecall:0.00}");
Console.WriteLine($"AUPRC: {metrics.AreaUnderPrecisionRecallCurve:0.00}, AUROC: {metrics.AreaUnderRocCurve:0.00}");
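To make these numbers less abstract, here is a small sketch in plain C# (no ML.NET needed) of how accuracy, precision, recall and F1 score follow from the four counts of a confusion matrix. The class and method names are my own, and the counts in Run are made up purely for illustration; they are not results from the credit card data.

```csharp
using System;

public static class BinaryMetrics
{
    // tp = true positives, fp = false positives,
    // tn = true negatives, fn = false negatives.

    // Share of all predictions that were correct.
    public static double Accuracy(int tp, int fp, int tn, int fn) =>
        (double)(tp + tn) / (tp + fp + tn + fn);

    // Of everything predicted positive, how much really was positive.
    public static double Precision(int tp, int fp) => (double)tp / (tp + fp);

    // Of everything that really was positive, how much we found.
    public static double Recall(int tp, int fn) => (double)tp / (tp + fn);

    // Harmonic mean of precision and recall.
    public static double F1(double precision, double recall) =>
        2 * precision * recall / (precision + recall);
}

public static class MetricsDemo
{
    public static void Run()
    {
        // Hypothetical counts, for illustration only.
        int tp = 40, fp = 10, tn = 130, fn = 20;

        double precision = BinaryMetrics.Precision(tp, fp);
        double recall = BinaryMetrics.Recall(tp, fn);

        Console.WriteLine($"Accuracy: {BinaryMetrics.Accuracy(tp, fp, tn, fn):0.00}");
        Console.WriteLine($"Precision: {precision:0.00}, Recall: {recall:0.00}");
        Console.WriteLine($"F1 Score: {BinaryMetrics.F1(precision, recall):0.00}");
    }
}
```

The negative precision and negative recall reported by ML.NET are the same calculations with the roles of the two classes swapped. When one class is much rarer than the other, accuracy alone can look good even if the model misses most of the rare class, so it is worth reading precision and recall alongside it.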

Comparing these metrics for different algorithms/trainers is a good way to pick the best model for our problem.
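One possible way to run such a comparison is sketched below. It assumes the Microsoft.ML and Microsoft.ML.FastTree packages are installed and that the data has a bool Label column, as earlier in the post. The TrainerComparison class and its Compare method are hypothetical helpers of my own, not part of ML.NET.

```csharp
using System;
using Microsoft.ML;

public static class TrainerComparison
{
    // Trains each candidate trainer on trainSet and prints its metrics on testSet.
    // inputColumns are the names of the feature columns to concatenate.
    public static void Compare(MLContext ctx, IDataView trainSet, IDataView testSet, string[] inputColumns)
    {
        var candidates = new (string Name, IEstimator<ITransformer> Trainer)[]
        {
            ("FastForest", ctx.BinaryClassification.Trainers.FastForest("Label", "Features")),
            ("AveragedPerceptron", ctx.BinaryClassification.Trainers.AveragedPerceptron("Label", "Features")),
            ("LinearSvm", ctx.BinaryClassification.Trainers.LinearSvm("Label", "Features")),
        };

        foreach (var (name, trainer) in candidates)
        {
            // Same pipeline as before, with only the trainer swapped out.
            var pipeline = ctx.Transforms.Concatenate("Features", inputColumns).Append(trainer);
            var model = pipeline.Fit(trainSet);

            // Evaluate on data the model has not seen during training.
            var predictions = model.Transform(testSet);
            var metrics = ctx.BinaryClassification.EvaluateNonCalibrated(predictions, "Label");

            Console.WriteLine($"{name}: Accuracy {metrics.Accuracy:0.00}, F1 {metrics.F1Score:0.00}, AUROC {metrics.AreaUnderRocCurve:0.00}");
        }
    }
}
```

With the variables from this post, it could be called as TrainerComparison.Compare(ctx, splitData.TrainSet, splitData.TestSet, InputColumns). Linear trainers such as AveragedPerceptron and LinearSvm tend to be sensitive to the scale of the features, so a normalization step may be needed before their results are comparable with a tree-based trainer.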

View project on GitHub

Ofoedu Frank Ebuka

Senior C# Engineer. Nigerian. I enjoy writing about issues that took me a while to figure out. Contact me here → frankofoedu@gmail.com