Spotting earnings manipulation: using machine learning for financial fraud detection

Rahul, Kumar; Seth, Nandini; Dinesh Kumar, U

Welcome to the Research @IIMB

The Digital Institutional Repository of IIM Bangalore

This repository provides metadata for IIMB publications and aims to create and preserve an archive of institutional scholarship. IIMB publications include articles, working papers (full text), book chapters published by faculty, doctoral dissertations by FPM scholars, and project reports by students enrolled in various IIMB courses.


Please use this identifier to cite or link to this item: https://repository.iimb.ac.in/handle/2074/11158

Title:	Spotting earnings manipulation: using machine learning for financial fraud detection
Authors:	Rahul, Kumar; Seth, Nandini; Dinesh Kumar, U
Keywords:	Accrual Manipulation;Bagging;Boosting;Data Analytics;Earnings Manipulation;Ensemble Methods;Gaussian Model;Sampling;Simulation;Supervised Learning;Unsupervised Learning
Issue Date:	2018
Publisher:	Springer Verlag
Abstract:	Earnings manipulation and accounting fraud lead to reduced firm valuation in the long run and to public distrust in the company and its management. Yet manipulation of accruals to hide liabilities and inflate earnings has been a long-standing fraudulent practice among many listed firms. As auditing is time-consuming and restricted to a sample of entries, fraud is either not detected or detected belatedly. We believe that supervised machine learning models can be used to identify high-risk firms early enough for auditing by the regulator. We also discuss unsupervised anomaly detection as a methodology. Since the proportion of manipulators is much lower than that of non-manipulators, the biggest challenge in predicting earnings manipulation is the imbalance in the data, which leads to biased results for conventional statistical models. In this paper, we build ensemble models to detect accrual manipulation by borrowing theory from the seminal work done by Beneish. We also showcase a novel simulation-based sampling technique to efficiently handle imbalanced datasets and illustrate our results on data from listed Indian firms. We compare existing ensemble models, establishing the superiority of fairly simple boosting models, whilst commenting on the shortfall of the area under the ROC curve as a performance metric for imbalanced datasets. The paper makes two major contributions: (i) a functional contribution of suggesting an easily deployable strategy to identify high-risk companies; (ii) a methodological contribution of suggesting a simulation-based sampling approach that can be applied in other cases of highly imbalanced data to utilize the entire dataset in modeling.
URI:	https://repository.iimb.ac.in/handle/2074/11158
ISBN:	9783030041908; 9783030041915
ISSN:	0302-9743
DOI:	10.1007/978-3-030-04191-5_29
Appears in Collections:	2010-2019
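
The abstract's remark on the shortfall of the area under the ROC curve for imbalanced datasets can be illustrated with a toy calculation. The sketch below is not taken from the paper: the labels, scores, threshold, and the helper names `roc_auc` and `precision_recall` are all hypothetical, chosen only to show how a high AUC can coexist with low precision when manipulators are rare.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_recall(labels, scores, threshold):
    """Precision and recall when every firm scoring >= threshold is flagged."""
    flagged = [y for y, s in zip(labels, scores) if s >= threshold]
    true_positives = sum(flagged)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / sum(labels)
    return precision, recall


# Hypothetical toy data (an assumption, not the paper's dataset):
# 10 manipulators (label 1) and 990 non-manipulators (label 0).
# 50 non-manipulators are wrongly scored above every manipulator.
labels = [1] * 10 + [0] * 50 + [0] * 940
scores = [0.9] * 10 + [0.95] * 50 + [0.1] * 940

auc = roc_auc(labels, scores)
precision, recall = precision_recall(labels, scores, threshold=0.9)
print(f"AUC={auc:.3f} precision={precision:.3f} recall={recall:.3f}")
# prints: AUC=0.949 precision=0.167 recall=1.000
```

In this toy setting the AUC is about 0.95, yet only one in six flagged firms is an actual manipulator, because the large non-manipulator class dominates the ranking statistic. Precision and recall at a chosen threshold expose that gap directly.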
