Credit Card Fraud and Detection Techniques

Overview

    Fraud is one of the major ethical issues in the credit card industry. For years, fraudsters would simply take the numbers from credit or debit cards, print them onto blank plastic cards, and use those cards at brick-and-mortar stores. Today, experts predict that online credit card fraud will soar to a whopping $32 billion in 2025.

    As a result, companies have been investing heavily in new technologies for detecting fraudulent transactions.

    The sections below present some of these detection techniques.

Classification Problems

    In Machine Learning, problems like fraud detection are usually framed as classification problems: predicting a discrete class label for a given data observation. Other examples of classification problems include spam detection, recommender systems, and loan default prediction.

    In credit card payment fraud detection, the classification problem involves building models intelligent enough to correctly classify each transaction as either legitimate or fraudulent, based on transaction details such as amount, merchant, location, and time.
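    To make this framing concrete, here is a minimal sketch of a learned classifier: logistic regression trained by plain gradient descent in pure Python. The `Transaction` fields, the feature scaling, and the six toy transactions are all hypothetical, chosen only to illustrate the idea; this is not the model used in the case study.

```python
import math
from dataclasses import dataclass


@dataclass
class Transaction:
    # Hypothetical fields mirroring "amount, merchant, location, time".
    amount: float          # transaction amount in the card's currency
    hour: int              # hour of day, 0-23
    foreign: bool          # True if the location differs from the cardholder's home country
    online_merchant: bool  # True for card-not-present (online) merchants


def features(t: Transaction) -> list[float]:
    # Scale amount and hour to roughly [0, 1] so gradient descent converges smoothly.
    return [t.amount / 1000.0, t.hour / 23.0, float(t.foreign), float(t.online_merchant)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def score(w: list[float], b: float, x: list[float]) -> float:
    # Estimated probability that the transaction is fraudulent.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train(X: list[list[float]], y: list[int], lr: float = 0.5, epochs: int = 2000):
    # Batch gradient descent on the logistic (cross-entropy) loss.
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, label in zip(X, y):
            err = score(w, b, x) - label
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: list[float], b: float, t: Transaction, threshold: float = 0.5) -> int:
    # 1 = fraudulent, 0 = legitimate
    return int(score(w, b, features(t)) >= threshold)


# Hypothetical toy data: legitimate vs fraudulent transactions.
legit = [Transaction(25.0, 14, False, False), Transaction(60.0, 10, False, True),
         Transaction(12.5, 18, False, False), Transaction(80.0, 12, False, False)]
fraud = [Transaction(950.0, 2, True, True), Transaction(780.0, 2, True, True)]
transactions = legit + fraud
labels = [0] * len(legit) + [1] * len(fraud)

w, b = train([features(t) for t in transactions], labels)
print(predict(w, b, Transaction(870.0, 2, True, True)))    # new foreign, late-night, online purchase
print(predict(w, b, Transaction(40.0, 15, False, False)))  # new small, daytime, in-store purchase
```

    Each transaction becomes a numeric feature vector and the model outputs a probability of fraud; the 0.5 threshold that turns that probability into a label is an assumption that can be tuned.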

    Financial fraud still accounts for considerable sums of money. Hackers and crooks around the world are always looking for new ways to commit fraud, so relying exclusively on rule-based, conventionally programmed systems would not provide the appropriate time-to-market: every new scheme has to be spotted and coded as a new rule by hand. This is where Machine Learning shines as a solution for this type of problem.

    The main challenge in fraud detection comes from the fact that, in real-world data, the majority of transactions are not fraudulent. This brings us to a problem: imbalanced data.

Imbalanced Data

    Data imbalance refers to an unequal distribution of classes within a dataset. For example, in a credit card fraud detection dataset, most transactions are legitimate and very few are fraudulent. This leaves us with something like a 50:1 ratio between the non-fraud and fraud classes.
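    Measuring the imbalance is a natural first step. The sketch below assumes labels are encoded as 0 (legitimate) and 1 (fraudulent) and uses a hypothetical set of 5,000 legitimate and 100 fraudulent transactions to reproduce the 50:1 ratio.

```python
from collections import Counter


def class_distribution(labels: list[int]) -> dict[int, float]:
    # Fraction of the dataset that belongs to each class.
    counts = Counter(labels)
    return {cls: counts[cls] / len(labels) for cls in sorted(counts)}


def imbalance_ratio(labels: list[int], majority: int = 0, minority: int = 1) -> float:
    # Number of majority-class (non-fraud) rows per minority-class (fraud) row.
    counts = Counter(labels)
    return counts[majority] / counts[minority]


# Hypothetical labels: 0 = legitimate, 1 = fraudulent.
labels = [0] * 5000 + [1] * 100
print(imbalance_ratio(labels))  # 50.0
print(class_distribution(labels))
```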

    Suppose you had to predict which transactions are fraudulent based on data from previous years. What would you do?

    The most straightforward approach would be to predict that 100% of transactions are non-fraudulent. When simulated on past years, this would achieve 98% accuracy, because with a 50:1 ratio, 50 out of every 51 transactions (about 98%) are legitimate. Sounds great, right?
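    The arithmetic can be checked with a short sketch. It assumes the same hypothetical 50:1 split (5,000 legitimate, 100 fraudulent transactions) and a baseline that labels every transaction as legitimate.

```python
def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    # Share of predictions that match the true label.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [0] * 5000 + [1] * 100  # hypothetical 50:1 non-fraud to fraud split
y_pred = [0] * len(y_true)       # majority-class baseline: "never fraud"
frauds_caught = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))

print(f"accuracy: {accuracy(y_true, y_pred):.2%}")              # accuracy: 98.04%
print(f"frauds caught: {frauds_caught} of {y_true.count(1)}")  # frauds caught: 0 of 100
```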

    Would this model be correct?

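    Certainly not! This model never flags a single fraudulent transaction, which defeats the whole purpose of fraud detection. Let's look at a practical case study and learn how to overcome the issue of imbalanced data.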
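    Two common, generic ways to counter imbalance are random undersampling (discarding majority-class rows) and random oversampling (duplicating minority-class rows). The sketch below illustrates both in pure Python, assuming binary 0/1 labels; the function names and the toy data are hypothetical, and this is not necessarily the technique used in the case study. Libraries such as imbalanced-learn provide tested implementations of both ideas.

```python
import random
from collections import Counter


def random_undersample(X, y, majority=0, seed=42):
    # Keep every minority row plus an equal-sized random subset of majority rows.
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label != majority]
    majority_idx = [i for i, label in enumerate(y) if label == majority]
    kept = sorted(minority_idx + rng.sample(majority_idx, k=len(minority_idx)))
    return [X[i] for i in kept], [y[i] for i in kept]


def random_oversample(X, y, minority=1, seed=42):
    # Duplicate randomly chosen minority rows until both classes are the same size.
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    n_majority = len(y) - len(minority_idx)
    extra = rng.choices(minority_idx, k=n_majority - len(minority_idx))
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]


# Hypothetical training split: one feature per row, 50:1 class ratio.
X_train = [[float(i)] for i in range(5100)]
y_train = [0] * 5000 + [1] * 100

X_under, y_under = random_undersample(X_train, y_train)
X_over, y_over = random_oversample(X_train, y_train)
print(Counter(y_under))  # Counter({0: 100, 1: 100})
print(Counter(y_over))   # Counter({0: 5000, 1: 5000})
```

    Resampling should be applied only to the training split: the test set must keep the real-world class ratio, otherwise the evaluation no longer reflects how the model will behave in production. Undersampling throws information away, while oversampling can encourage overfitting to the duplicated fraud rows.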

Case Study

    This case study demonstrates how I dealt with imbalanced data in a credit card fraud dataset and obtained predictions with a 94% accuracy rate using Python.

    The code can be found here.
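    The linked code is not reproduced here. Because accuracy alone can be misleading on imbalanced data, as the 98% all-legitimate baseline shows, a fraud model is usually also judged on precision and recall for the fraud class. The following is a hypothetical, self-contained sketch of those metrics; the predictions are invented for illustration and are not the case study's results.

```python
def confusion_counts(y_true, y_pred, positive=1):
    # Returns (true positives, false positives, false negatives, true negatives).
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred, positive=1):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0  # flagged transactions that were really fraud
    recall = tp / (tp + fn) if tp + fn else 0.0     # real frauds that were flagged
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical predictions on 100 transactions, 10 of them fraudulent.
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 85 + [1] * 5 + [1] * 8 + [0] * 2
print(confusion_counts(y_true, y_pred))  # (8, 5, 2, 85)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")  # precision=0.62 recall=0.80 f1=0.70
```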