A Preliminary Investigation of Decision Tree Models for Classification Accuracy Rates and Extracting Interpretable Rules in the Credit Scoring Task: A Case of the German Data Set
The Clute Institute For Academic Research, Colorado, USA
Faculty of Computing, Health and Science
School of Computer and Security Science
For many years, lenders have used traditional statistical techniques such as logistic regression and discriminant analysis to distinguish more precisely between creditworthy customers, who are granted loans, and non-creditworthy customers, who are denied loans. More recently, machine learning techniques such as neural networks, decision trees, and support vector machines have been successfully employed to classify loan applicants into those who are likely to pay off a loan and those who are likely to default on it. Accurate classification benefits lenders through increased profits or reduced losses, and benefits loan applicants, who can avoid overcommitment. This paper examines a historical data set of consumer loans issued by a German bank to individuals whom the bank considered to be qualified customers. The data set consists of the financial attributes of each customer and includes both loans that customers paid off and loans on which they defaulted. The paper compares the classification accuracy rates of three decision tree techniques and analyzes their ability to generate easy-to-understand rules.