In this paper, we consider a general approach to classifying objects and illustrate it with a credit card scoring problem. Classification can be defined by a classification function that assigns to each object a categorical value called the class number. However, this classification function has a very inconvenient property: it is discontinuous, which makes it impossible to use directly for classifying new objects. We therefore reduce the classification problem to evaluating a continuous utility function from some general class of functions. This function is used to separate objects belonging to different sets: the values of the utility function for objects from one class should lie in the same range. “The best” utility function in a given class is found by minimizing the misclassification error.
Depending on the class of utility functions, this may be quite a difficult optimization problem. However, if one looks for a utility function that is a linear combination of some other functions (possibly nonlinear in the indicator variables), the problem can be formulated as a linear programming problem. Mangasarian, et al. (1995) used this approach for failure discriminant analysis with linear utility functions (with applications to breast cancer diagnosis); such a function is linear in both the control parameters and the indicator variables. Zopounidis, et al. (1997, 1998) and Pardalos, et al. (1997) used a linear utility function for trichotomous classification of credit card applications. Konno, et al. (2000a, 2000b) considered utility functions that are quadratic in indicator parameters and linear in control variables.
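The remark that a utility function may be nonlinear in the indicator variables yet linear in its coefficients can be illustrated with a minimal Python sketch. The names `quadratic_features` and `utility`, the ordering of the monomials, and the numerical values are illustrative assumptions and are not taken from the cited papers.

```python
from itertools import combinations_with_replacement


def quadratic_features(x):
    """Return the monomials [1, x_1..x_n, x_i*x_j for i <= j] of the indicator vector x."""
    n = len(x)
    feats = [1.0]
    feats.extend(float(v) for v in x)
    feats.extend(
        float(x[i] * x[j]) for i, j in combinations_with_replacement(range(n), 2)
    )
    return feats


def utility(coeffs, x):
    """Quadratic utility: a dot product of coefficients with fixed monomials of x."""
    feats = quadratic_features(x)
    if len(coeffs) != len(feats):
        raise ValueError("expected %d coefficients, got %d" % (len(feats), len(coeffs)))
    return sum(c * f for c, f in zip(coeffs, feats))


# Hypothetical applicant with two indicators and two hypothetical coefficient vectors.
x = [2.0, 3.0]
a = [1.0, 0.5, -1.0, 0.25, 0.0, 2.0]
b = [0.0, 1.0, 1.0, -0.25, 1.0, -1.0]
a_plus_b = [ai + bi for ai, bi in zip(a, b)]

# The utility is quadratic in x but additive in the coefficients.
print(utility(a, x), utility(b, x), utility(a_plus_b, x))
```

Because the monomials are fixed once the data are known, any constraint or error term that is linear in the utility value is also linear in the coefficients, which is the property a linear programming formulation relies on.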
The approach was tested on the classification of enterprises and on breast cancer diagnosis. Konno, et al. (2000a, 2000b) imposed convexity constraints on utility functions in order to avoid disconnectedness of the discriminant regions. Similar to Konno, et al. (2000a, 2000b), we also use a utility function that is quadratic in indicator parameters. However, instead of convexity constraints, we consider monotonicity constraints reflecting experts’ opinions. Extending some ideas of Zopounidis, et al. (1997, 1998) and Pardalos, et al. (1997), we consider multi-class classification with many level sets of the utility function. Although our approach can classify objects into an arbitrary number of classes, in this paper we consider trichotomous (i.e., three-class) classification.
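Classification by level sets of a utility function can be sketched as follows. This is only an illustration: the linear utility, its weights, and the two threshold values are hypothetical, and the convention that a value exactly on a threshold falls into the higher class is an assumption of the sketch.

```python
from bisect import bisect_right


def classify_by_level(u_value, thresholds):
    """Map a utility value to a class index 0..len(thresholds).

    `thresholds` must be sorted in increasing order. A value equal to a
    threshold is assigned to the higher class (an arbitrary convention).
    """
    return bisect_right(thresholds, u_value)


def linear_utility(x):
    """Hypothetical utility with made-up weights, for illustration only."""
    return 0.6 * x[0] + 0.4 * x[1]


# Two thresholds split the utility range into three classes (trichotomous case).
thresholds = [0.3, 0.7]
applicants = [(0.1, 0.2), (0.5, 0.5), (0.9, 0.8)]
classes = [classify_by_level(linear_utility(x), thresholds) for x in applicants]
print(classes)
```

With k sorted thresholds the same function yields k + 1 classes, so the three-class case is just the choice of two thresholds.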
This paper focuses on a numerical validation of the proposed algorithm. We classified a dataset of credit card applications submitted to a bank in Greece. This dataset was considered earlier in Damaskos (1997) and Zopounidis, et al. (1998). We investigated the impact of model flexibility on the classification characteristics of the algorithm, and compared the performance of several classes of quadratic and linear utility functions with various constraints. The experiments showed the importance of imposing constraints that adjust the “flexibility” of the model to the size of the dataset. We studied “in-sample” and “out-of-sample” characteristics of the suggested algorithms. With our classification approach, we minimize the empirical risk, that is, the misclassification error on the training set (in-sample error). Nevertheless, the objective of the procedure is to classify objects outside of the training set with minimal error (out-of-sample error). For a model fitted in this way, the in-sample error is typically no greater than the out-of-sample error. Similar issues are studied in statistical learning theory; see Vapnik (1998).
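The distinction between in-sample and out-of-sample error can be made concrete with a small sketch. The one-dimensional rule `predict` and both toy datasets are invented for illustration and have no connection to the Greek bank dataset.

```python
def misclassification_rate(predict, samples):
    """Fraction of (x, label) pairs for which predict(x) differs from label."""
    if not samples:
        raise ValueError("samples must be non-empty")
    errors = sum(1 for x, label in samples if predict(x) != label)
    return errors / len(samples)


def predict(x):
    """Hypothetical rule, assumed to have been fitted on `train` below."""
    return 1 if x >= 0.5 else 0


# Toy data: `train` plays the role of the training set, `holdout` of unseen objects.
train = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
holdout = [(0.2, 0), (0.45, 1), (0.55, 0), (0.8, 1)]

in_sample_error = misclassification_rate(predict, train)
out_of_sample_error = misclassification_rate(predict, holdout)
print(in_sample_error, out_of_sample_error)
```

In this toy case the rule separates the training set perfectly but misclassifies half of the held-out objects, which is the kind of gap that constraints on model flexibility are meant to reduce.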
Broadly speaking, the classification problem belongs to the field of Data Mining, or Knowledge Data Discovery. During the last 50 years, a wide range of methodologies has been proposed for data discovery. Following Bradley, et al. (1999), data mining techniques can be divided into five classes: predictive modeling (predicting a specific attribute based on the other attributes in the data), clustering (grouping similar data records into subsets), dependency modeling (modeling a joint probability function of the process), data summarization (finding summaries of parts of the data), and change/deviation detection (accounting for sequence information in data records).
The credit card scoring problem considered in this paper is a particular case of the consumer lending problem, which relies on financial risk forecasting techniques; see Thomas (2000). Scoring models are divided into two types: 1) models (or techniques) that help creditors decide whether or not to grant credit to consumers who have applied for it; 2) behavior scoring models that help decide how to deal with existing customers. We focus on the first type of scoring models.
In credit scoring, the decision on issuing credit to a client is based on the application for credit and a report obtained from a credit reporting agency. In addition, information on previous applications and their performance is available; we call this in-sample information. The creditor uses the in-sample information together with the applicant’s information to make a decision.
Credit scoring is essentially a way of separating (recognizing) specific subgroups in a population of objects (such as applications for credit) that have significantly different credit risk characteristics. Starting with the ideas on discriminating between groups introduced by Fisher (1936), many different approaches have been developed using statistical and operations research methods. The statistical tools include discriminant analysis (linear and logistic regressions) and recursive partitioning algorithms (classification and decision trees). The operations research techniques primarily include mathematical programming methods, such as linear programming. In addition, several new non-parametric and artificial intelligence approaches have recently been developed. They include the ubiquitous neural networks, expert systems, genetic algorithms, and nearest-neighbor methods; see, for instance, Thomas (2000).
A common weakness of many credit-scoring approaches is that they do not provide clear explanations of the “reasons” for preferring some objects to others. Capon (1982) considered this the main drawback of many credit scoring algorithms. There are also many implementation issues that need to be addressed before using any credit-scoring model, such as: 1) how to select a sample of previous applicants; 2) how long the sample period should be; 3) what proportion of “goods” and “bads” the sample should contain. Henley (1995) discussed some of these issues in his thesis.
A credit scoring classification problem can be defined as a decision process with the following input and output. The input consists of the answers to the application form questions and various information obtained from the credit reference bureau; the output is a separation of applications into “goods” and “bads” (Thomas, 2000). The objective of credit scoring is to find a rule that separates “goods” and “bads” with the smallest percentage of misclassifications. Note that perfect classification is impossible for several reasons. For instance, there could be errors in the sample data. Moreover, it is possible that some “good” and “bad” applications have exactly the same information in all data fields (i.e., not enough information is available to make a correct decision). Statistical learning theory (see Vapnik, 1998) states that, for a model, the optimal prediction (i.e., out-of-sample classification with minimal misclassification) is achieved when the in-sample error is close to the out-of-sample error.
Relatively simple statistical approaches using linear scoring functions (the Bayesian decision rule, discriminant analysis, and linear regression) have become the most popular for classification problems. The Bayesian decision rule works especially well when the distributions of “goods” and “bads” can be described by multivariate normal distributions with a common covariance matrix; in this case the problem reduces to a linear decision rule. If the covariance matrices of these populations are different, the rule becomes quadratic. However, Titterington (1992) pointed out that in many cases the quadratic scoring function appears to be less robust than the linear one. Fisher (1936) used discriminant analysis to find a linear combination of variables that separates the groups in the best way. His approach does not require an assumption of normality. Another way leading to linear discriminating functions is linear regression. Myers and Forgy (1963) compared regression and discriminant analysis in credit scoring applications. Although discriminant and regression analysis have attracted much criticism (see Eisenbeis (1978), Hand, et al. (1996), Capon (1982)), empirical experience shows that these linear scoring techniques are very competitive with other, more sophisticated approaches.
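The reduction of the Bayesian rule to a linear or quadratic rule can be verified directly. The notation below is an assumption introduced only for this illustration: $\mu_G, \mu_B$ are the class means, $\Sigma_G, \Sigma_B$ the covariance matrices, $p_G, p_B$ the prior probabilities, and $f_G, f_B$ the multivariate normal densities of “goods” and “bads”. An application $x$ is classified as “good” when the log-odds below are positive.

```latex
% Common covariance matrix \Sigma_G = \Sigma_B = \Sigma: the terms quadratic in x cancel.
\ln\frac{p_G f_G(x)}{p_B f_B(x)}
  = (\mu_G - \mu_B)^{\top} \Sigma^{-1} x
  - \tfrac{1}{2}\left(\mu_G^{\top}\Sigma^{-1}\mu_G - \mu_B^{\top}\Sigma^{-1}\mu_B\right)
  + \ln\frac{p_G}{p_B}

% Different covariance matrices: a quadratic term in x remains.
\ln\frac{p_G f_G(x)}{p_B f_B(x)}
  = -\tfrac{1}{2}\, x^{\top}\left(\Sigma_G^{-1} - \Sigma_B^{-1}\right) x
  + \left(\Sigma_G^{-1}\mu_G - \Sigma_B^{-1}\mu_B\right)^{\top} x
  - \tfrac{1}{2}\left(\mu_G^{\top}\Sigma_G^{-1}\mu_G - \mu_B^{\top}\Sigma_B^{-1}\mu_B\right)
  - \tfrac{1}{2}\ln\frac{\det\Sigma_G}{\det\Sigma_B}
  + \ln\frac{p_G}{p_B}
```

The first expression is affine in $x$, so the decision boundary is a hyperplane; in the second, the coefficient matrix $\Sigma_G^{-1} - \Sigma_B^{-1}$ vanishes only when the two covariance matrices coincide.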
Another important method of linear discrimination is logistic regression. Earlier, this approach was associated with the computational difficulties of maximum likelihood estimation, but nowadays this is not a problem thanks to readily available computing power. Wiginton (1980) was one of the first to apply logistic regression to credit scoring; currently this approach is widely accepted.
Classification trees and expert systems represent another class of approaches. Classification trees are typically used in statistical, artificial intelligence, and machine learning applications. Makowski (1985) was one of the first to suggest using classification trees in credit scoring. Coffman (1986) showed that classification trees perform better than discriminant analysis when there are interactions between variables.
The paper is organized as follows. Section 1 provides the background for the considered methodology. Section 2 formally describes our classification approach and shows how it can be implemented using linear programming. Section 3 applies the developed methodology to the credit card scoring problem: it describes a dataset of credit card applications and presents computational results. Section 4 analyses the results of the computational experiments. We conclude the paper with remarks in Section 5.
