BANK FAILURE: A MULTIDIMENSIONAL SCALING APPROACH
C. Mar Molinero, Management Science, University of Southampton, UK.
C. Serrano Cinca, Department of Accounting and Finance, University of Zaragoza, Spain.
Paper presented to the 17th Annual Congress of the European Accounting Association,
Venice, Italy, April 1994
A revised version was published as: Mar Molinero, C. and Serrano Cinca, C. (2001), "Bank failure: a multidimensional scaling approach", European Journal of Finance, Vol. 7, No. 2, June, pp. 165-183.
Mathematical models for the prediction of company failure are by now well established. Most multivariate work on distress prediction attempts to obtain a Z-score that gives the failure probability of a company. Two such models are prevalent in the literature: logit and discriminant analysis. Both require, before implementation, a selection of the variables that enter the model. Moreover, the information these models provide, a single number, is quite poor.
A data set of 66 Spanish banks, 29 of which failed, is used to show that Multidimensional Scaling (MDS) techniques can be of use to produce simple tools for the analysis of the financial health of a company. MDS has the advantage of producing pictorial representations that are easy to interpret and use. This is done without loss of statistical rigour given the very close links between MDS and other multivariate statistical techniques that are normally used in the analysis of failure.
Correspondence to either author:
Cecilio MAR MOLINERO: Department of Management Science, University of Southampton, Southampton SO9 5NH, United Kingdom. E-mail: email@example.com
Carlos SERRANO CINCA: Departamento de Contabilidad y Finanzas, Fac. Económicas, Gran Vía 2, Zaragoza E 50005, Spain. E-mail: firstname.lastname@example.org
Related papers: http://ciberconta.unizar.es/carlos.htm
Mathematical models for the prediction of company failure are by now well established. Two such models are prevalent in the literature: logit and discriminant analysis. There is little to choose between them, since they share the same mathematical basis; see Haggstrom (1983).
Discriminant analysis makes extensive demands on the structure of the data. It starts from the premise that two different populations coexist in the data set, one of failed and one of continuing firms. Both populations are described by multivariate normal distributions with the same variance-covariance matrix, although their means are presumed to be different. This assumption is not strictly necessary; it is made for computational convenience, since it results in linear classification rules that are easy to apply in practice. The assumption of common covariance structures can be relaxed, but there is a severe price to pay: models become much more difficult to estimate and implement. For a discussion of the issues involved see Eisenbeis (1977).
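To show why a common covariance matrix yields a linear rule, here is a minimal sketch of Fisher's linear discriminant, assuming NumPy, equal priors and equal misclassification costs. The two-ratio data are simulated purely for illustration and are not the Spanish bank data. The pooled covariance matrix S gives weights w = S⁻¹(m₁ − m₀), and a firm is classified as failed when its score exceeds the midpoint between the two projected group means.

```python
import numpy as np

def fit_lda(X0, X1):
    """Fisher's linear discriminant with a pooled (common) covariance matrix.

    X0: rows are continuing firms; X1: rows are failed firms.
    Returns weights w and cut-off c; classify as failed when x @ w > c.
    Equal priors and equal misclassification costs are assumed.
    """
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    n0, n1 = len(X0), len(X1)
    # Pooled within-group covariance: the single matrix the model assumes both groups share
    S = ((n0 - 1) * np.cov(X0, rowvar=False) +
         (n1 - 1) * np.cov(X1, rowvar=False)) / (n0 + n1 - 2)
    w = np.linalg.solve(S, m1 - m0)
    c = 0.5 * (m0 + m1) @ w  # midpoint between the projected group means
    return w, c

# Hypothetical data for two ratios (say, one liquidity and one profitability ratio)
rng = np.random.default_rng(0)
cov = [[1.0, 0.3], [0.3, 1.0]]
X0 = rng.multivariate_normal([1.0, 1.0], cov, size=200)    # continuing firms
X1 = rng.multivariate_normal([-1.0, -1.0], cov, size=200)  # failed firms
w, c = fit_lda(X0, X1)
pred_failed = np.vstack([X0, X1]) @ w > c
```

Because the rule is a single comparison of a weighted sum with a constant, it is linear in the ratios; with unequal covariance matrices the boundary would become quadratic.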
Logit is slightly less demanding in terms of assumptions, although the choice of the logistic function is itself simply a matter of convenience. The logit function can be transformed into a linear form, and in this form it has an attractive interpretation: it expresses the log-odds of a company belonging to one population rather than the other. There are alternatives to the logistic function that have similar mathematical properties but that cannot be linearised and interpreted in the way that logit models can.
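Written out, with x_1, ..., x_k the selected ratios and β_0, ..., β_k the coefficients (notation introduced here for illustration), the logistic model and its linearised form are:

```latex
p = \Pr(\text{failed} \mid x_1,\dots,x_k) = \frac{1}{1 + e^{-z}},
\qquad z = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
\\
\frac{p}{1-p} = e^{z}
\quad\Longrightarrow\quad
\ln\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
```

A unit increase in x_j therefore multiplies the odds of failure by e^{β_j}, with the other ratios held constant.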
Both logit and discriminant analysis require, before implementation, a selection of the variables that enter the model. The selection of the final set of variables is complex and delicate. It is difficult to dispel the fear that an important variable has been missed or that a spurious one has been included. These fears are justified, since the variables that enter the model are seldom normally distributed and the tests rely on asymptotic distribution theory.
An alternative model for the analysis of company failure, which bypasses many of the above shortcomings, was suggested by Mar-Molinero and Ezzamel (1991). They proposed the use of Multidimensional Scaling (MDS) techniques. Their study was, however, concerned with explaining the process that a company follows on its path to failure rather than with prediction. In this paper we extend that work and suggest a way in which MDS models can be used as an alternative to discriminant analysis or logit to classify companies as failed or continuing. MDS-type techniques have the further advantage that the results of the analysis can be presented as maps with an intuitive interpretation.
Implementation makes no sophisticated statistical demands on those who will have to live with the model. The methodology is illustrated with a Spanish data set.
The paper consists of five sections. The first section describes the MDS model and reviews its use in Accounting and Finance. The second section concentrates on the data. The results are presented in the third and fourth sections. The paper concludes with guidelines for model implementation.
I. MULTIDIMENSIONAL SCALING MODELS
Multidimensional Scaling models are well established in the Multivariate Analysis literature. MDS encompasses a set of techniques based on graphical representations. The end result of an MDS analysis is a statistical map.
Ordinary geographical maps are sophisticated mathematical instruments: they contain axes of reference (scales) that locate the position of any point by means of a set of co-ordinates. Several scales may be present; for example, if three scales are present they may be associated with latitude, longitude, and altitude above sea level. Maps also have orientation: it is general practice to place north towards the top of the map, although other conventions are possible, provided that some indication is given of how the position of the points is related to their latitude. Finally, a map makes it possible to calculate the distance between any two points, and to build a table of distances between pairs of points.
MDS proceeds in the opposite direction. From a table of distances, it reproduces the map and creates scales of reference. It is possible to orient an MDS map by indicating the direction in which a particular property of the data is related to position in the space. The technical apparatus on which MDS relies can be found in many textbooks and will not be discussed here; see, for example, Kruskal and Wish (1984).
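The "opposite direction" can be made concrete with classical (Torgerson) metric scaling, the simplest MDS variant. The SPSS analysis reported below is not necessarily this algorithm, so treat the sketch as illustrative only; it assumes NumPy. Squared distances are double-centred to recover a matrix of inner products, whose leading eigenvectors, scaled by the square roots of their eigenvalues, give the map co-ordinates.

```python
import numpy as np

def classical_mds(D, k=2):
    """Recover a k-dimensional configuration from an n x n distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centred inner products
    eigvals, eigvecs = np.linalg.eigh(B)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]      # keep the k largest
    L = np.clip(eigvals[order], 0, None)       # guard against tiny negative values
    return eigvecs[:, order] * np.sqrt(L)

# A "map" of four towns, its table of distances, and the map rebuilt from the table
towns = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
D = np.linalg.norm(towns[:, None, :] - towns[None, :, :], axis=-1)
X = classical_mds(D, k=2)
D_rebuilt = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
```

The rebuilt map reproduces every distance in the table, but it may be rotated or reflected relative to the original: orientation has to be supplied separately, by relating directions in the map to properties of the data.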
MDS is not the only statistical technique that allows the graphical representation of multivariate data. Principal Components Analysis (PCA), which does not rely on distribution theory either, comes to mind as an obvious alternative and, indeed, when distances are Euclidean both techniques produce identical maps; see Chatfield and Collins (1980) or Mar-Molinero (1991). The relationship between the two techniques can be exploited to decide on the dimensionality of the map in which the data are to be represented, as is done below. The advantage of MDS, besides its intuitive appeal, is that it still succeeds in creating maps from data even when distances are not Euclidean. All that is required is a measure of proximity (dissimilarity) between pairs of points. MDS places similar points next to each other and dissimilar points far away from each other. To do this, most MDS models rely on the ordinal properties of the data rather than on their absolute values.
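The equivalence with PCA for Euclidean distances can be checked numerically. The sketch below, assuming NumPy and simulated data, compares classical (metric) MDS applied to the Euclidean distance matrix with the principal component scores of the same data; the two maps coincide up to an arbitrary sign flip of each axis. It does not cover the non-metric, ordinal MDS variants mentioned above.

```python
import numpy as np

def classical_mds(D, k):
    """Classical MDS: double-centre squared distances, keep the top-k eigenvectors."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0, None))

def pca_scores(Z, k):
    """Principal component scores of the column-centred data matrix."""
    Zc = Z - Z.mean(axis=0)
    U, s, _ = np.linalg.svd(Zc, full_matrices=False)
    return U[:, :k] * s[:k]

rng = np.random.default_rng(1)
Z = rng.normal(size=(30, 5)) * [3.0, 2.0, 1.0, 0.5, 0.2]   # hypothetical data
D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)  # Euclidean distances
mds_map = classical_mds(D, k=2)
pca_map = pca_scores(Z, k=2)
# Each axis may come out with opposite signs, so compare absolute values
same_up_to_sign = np.allclose(np.abs(mds_map), np.abs(pca_map))
```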
There are many ways in which measures of dissimilarity can be created. No particular demands are placed on the data other than that there must be some message in them. Researchers in Accounting and Finance have taken advantage of this. MDS was first used to explore non-quantitative aspects of accounting information; examples are Libby (1979) and Moriarty and Barron (1976). There is, however, no reason why MDS should not be used to analyse quantitative data such as financial ratios. This is what Mar-Molinero and Ezzamel (1991) did, using a measure of dissimilarity based on the absolute value of correlation coefficients. Because they explored the way in which ratios evolve as a company approaches failure, they used ratios as variables and companies as cases in the calculation of their similarity measures. The situation is reversed in this paper: companies are taken as variables and ratios as cases. This makes it possible to explore to what extent any two companies are similar or different on the basis of published accounting information. The objective of the exercise is to find out whether failed companies tend to group in one area of the map and non-failed companies in a different area. If the two areas are sufficiently disjoint, the map can be used for prediction purposes. This is exactly what was found in the case of Spanish banks.
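The reversal of roles amounts to correlating the columns or the rows of the same companies-by-ratios table. In the sketch below, assuming NumPy and a simulated table, the exact dissimilarity formulas are assumptions for illustration: 1 − |r| for ratios (one common reading of a measure "based on the absolute value of correlation coefficients") and 1 − r for companies.

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(size=(8, 5))   # hypothetical: 8 companies (rows) x 5 ratios (columns)
z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)   # standardise each ratio

# Ratios as variables, companies as cases (the Mar-Molinero and Ezzamel set-up).
# Assumed dissimilarity 1 - |r|: a ratio and its near mirror image count as similar.
d_ratios = 1.0 - np.abs(np.corrcoef(z, rowvar=False))     # 5 x 5

# Companies as variables, ratios as cases (this paper's set-up).
# Assumed dissimilarity 1 - r: companies with opposite profiles end up far apart.
d_companies = 1.0 - np.corrcoef(z)                        # 8 x 8, rows as variables
```

Either matrix can then be passed to an MDS routine; only the choice of which dimension of the table plays the role of "variables" changes.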
II. THE DATA
In order to illustrate the use of the technique we have made use of data that was published in a previous study by Serrano Cinca and Martin del Brio (1993). The data consists of nine financial ratios calculated on 66 Spanish banks, 29 of which failed. Information for continuing banks refers to the 1982 financial year, while the ratios for failed banks were calculated for the last year before failure. Besides the Serrano and Martin study, the data has been analysed by Laffarga et al. (1986) and Pina (1989). Amongst the ratios, the first three measure liquidity, the fourth one is associated with the ability to self-finance, the next three are profitability ratios, ratio number eight measures the cost of sales and, finally, ratio number nine relates to cash-flow.
A problem to be faced in a study of this type is that the different ratios used to describe a bank are measured in different units. The easiest way to avoid this problem is to standardise the ratios to zero mean and unit variance. This procedure is equivalent to turning the original ratios that describe a bank into a set of multiple orderings, one for every ratio. The advantages and disadvantages of working with orderings have been extensively discussed in the literature; see, for example, French (1985). The main disadvantage is that the introduction of an extra bank into the data set may influence all the standardised data and change the final results. This also happens in alternative approaches and has no satisfactory solution; we therefore chose to standardise each financial ratio to zero mean and unit variance and to compare banks on the basis of their standardised ratios.
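A small sketch of both points, assuming NumPy and hypothetical ratio values: each ratio is standardised across banks (the sample standard deviation, ddof=1, is an assumption here), and adding one extra bank changes every existing bank's standardised values while leaving the ordering within each ratio unchanged.

```python
import numpy as np

def standardise(ratios):
    """Z-scores per ratio (column): zero mean and unit variance across banks."""
    return (ratios - ratios.mean(axis=0)) / ratios.std(axis=0, ddof=1)

# Hypothetical values of two ratios for four banks
ratios = np.array([[0.10, 0.02],
                   [0.20, 0.01],
                   [0.30, 0.03],
                   [0.40, 0.00]])
z_before = standardise(ratios)

# Add one extra (hypothetical) bank and look again at the original four
z_after = standardise(np.vstack([ratios, [[0.90, 0.05]]]))[:4]

values_changed = not np.allclose(z_before, z_after)
orderings_kept = (np.argsort(z_before, axis=0) == np.argsort(z_after, axis=0)).all()
```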
An analysis of discordant observations was carried out by identifying the banks for which one or more of the financial ratios exceeded a standardised value of two and one half. MDS should be robust enough to cope with discordant observations. To check that this was the case, the full analysis was repeated with and without the discordant banks. The maps were less cluttered, and more visually attractive, when the discordant banks were omitted, but the relative position of bankrupt versus continuing banks was not affected, and it was decided to include all the banks in the reported results.
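The screening rule can be sketched as follows, assuming NumPy and a hypothetical matrix of standardised ratios (banks in rows). The text does not say whether the cut-off applies to absolute values; the sketch assumes it does, so ratios below -2.5 are flagged as well.

```python
import numpy as np

def discordant_banks(z, cutoff=2.5):
    """Indices of banks with at least one standardised ratio beyond the cut-off.

    Assumes the cut-off is applied to absolute values (|z| > 2.5).
    """
    return np.flatnonzero((np.abs(z) > cutoff).any(axis=1))

z = np.array([[0.3, -1.2, 0.8],    # hypothetical standardised ratios
              [2.7,  0.1, -0.4],
              [-0.5, -3.1, 1.0],
              [1.9,  2.4, -2.2]])
flagged = discordant_banks(z)       # banks 1 and 2
z_without = np.delete(z, flagged, axis=0)   # data for the "without discordant banks" run
```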
Although it is possible to think of many ways of comparing individual banks, the easiest is to calculate a correlation coefficient between each pair of banks, taking the banks as variables and their standardised ratios as cases. This correlation coefficient is then treated as a measure of Euclidean distance. The advantage of proceeding in this way is that the parallelism with PCA is maintained. If there is little to choose between two particular banks on the basis of their financial structure, any measure of dissimilarity that may be calculated will take a small value, and if two banks have diverse financial structures, any measure of dissimilarity will take a large value. The exact specification of the dissimilarity measure will therefore have little impact on the relative position of the two banks in the final map (although it may affect their exact position in the space). Furthermore, the use of correlation coefficients as distances ensures that, if the assumptions that underlie discriminant analysis are satisfied, a hyperplane through the final configuration of the MDS map will produce the same classification as the linear discriminant function.
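The link between correlation and Euclidean distance can be stated exactly. If each bank's profile of m ratios is centred and scaled by its own (population) standard deviation, the squared Euclidean distance between two profiles equals 2m(1 − r), where r is their correlation, so distance rises monotonically as correlation falls. The sketch below, assuming NumPy and two hypothetical profiles of nine ratios, checks the identity.

```python
import numpy as np

def profile(x):
    """Centre and scale one bank's ratio profile (population standard deviation)."""
    return (x - x.mean()) / x.std()

m = 9                                     # nine ratios, as in this data set
rng = np.random.default_rng(3)
bank_a, bank_b = rng.normal(size=m), rng.normal(size=m)   # hypothetical standardised ratios

r = np.corrcoef(bank_a, bank_b)[0, 1]
d2 = np.sum((profile(bank_a) - profile(bank_b)) ** 2)
identity_holds = np.isclose(d2, 2 * m * (1 - r))
```

The identity follows from expanding the squared distance: each scaled profile has squared length m, and their inner product is m·r.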
III. RESULTS OF MDS ANALYSIS
The first decision to be taken in any MDS analysis is the choice of the number of dimensions in which the map is to be drawn. To this end, the similarity matrix was first subjected to PCA. The first three principal components accounted for 93.3% of the variance: 52.9% for the first component, 28.4% for the second, and 12.1% for the third. A map in three dimensions thus appears sufficient to describe the data. This is a remarkable result: Mar-Molinero and Ezzamel (1991), in common with previous studies reviewed there, found it necessary to use seven dimensions to describe the financial health of a company.
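The variance shares quoted above come from the eigenvalues of the matrix being analysed. For a p × p correlation matrix the eigenvalues sum to p (its trace), so each component's share is its eigenvalue divided by p. The sketch below, assuming NumPy, uses a hypothetical 3 × 3 correlation matrix, not the paper's 66 × 66 matrix.

```python
import numpy as np

def variance_shares(R):
    """Proportion of total variance carried by each principal component of R."""
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]   # largest first
    return eigvals / eigvals.sum()                     # the sum equals trace(R) = p

R = np.array([[1.0, 0.8, 0.6],      # hypothetical correlation matrix
              [0.8, 1.0, 0.5],
              [0.6, 0.5, 1.0]])
shares = variance_shares(R)
cumulative = np.cumsum(shares)      # pick the smallest k with a large enough cumulative share
```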
Another way of establishing the dimensionality of the map is to produce MDS maps in six, five, four, three, two and one dimensions and to observe how the number of dimensions influences the quality of the representation. Several measures of goodness of fit are available but, as in previous studies, Young's stress 1 formula was chosen. The results are given below:
DIMENSIONS STRESS 1
It is apparent from the results that a solution in six dimensions gives an almost perfect representation of the data, a conjecture that is confirmed by examining Shepard's diagram, which collapses into a straight line. A solution in three dimensions, as suggested by PCA, may be appropriate for most purposes, but it was decided to work in the first instance with six dimensions.
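The paper's stress values were computed inside SPSS using Young's formula. As a hedged illustration of how such a badness-of-fit measure works, the sketch below, assuming NumPy, computes Kruskal's stress formula 1, a closely related and widely used measure, which is not guaranteed to reproduce the SPSS figures. It compares distances in the map with disparities (the transformed dissimilarities the map tries to match); a Shepard diagram plots these same pairs against each other, and a straight-line diagram corresponds to stress near zero.

```python
import numpy as np

def kruskal_stress1(X, disparities):
    """Kruskal's stress formula 1 for a configuration X (n x k).

    disparities: n x n symmetric matrix of target distances (d-hats).
    Sums run over pairs i < j.
    """
    i, j = np.triu_indices(X.shape[0], k=1)
    d = np.linalg.norm(X[i] - X[j], axis=1)      # distances in the map
    dhat = disparities[i, j]
    return np.sqrt(np.sum((d - dhat) ** 2) / np.sum(d ** 2))

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])      # hypothetical 3-point map
exact = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
perfect = kruskal_stress1(X, exact)          # 0.0: every map distance matches
noisy = kruskal_stress1(X, exact * 1.1)      # disparities 10% larger: stress 0.1
```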
The solution gave, for every bank, a set of six co-ordinates that locate the bank in the space. It is, however, possible that some of the dimensions are not associated with the probability of failure. Visual inspection of the maps suggested that only two dimensions were relevant in this respect; to confirm this conjecture, a logit analysis was performed. The dependent variable was a dichotomy, zero for a continuing bank and one for a failed one, and the co-ordinates of the points in the space were used as explanatory variables. Only the first two dimensions returned coefficients that were significantly different from zero at the 95% level. The fourth dimension returned a coefficient that was just marginally insignificant at the 90% level. This suggests that a map in the first two dimensions may give an appropriate visual representation of the salient features in the data. The fourth dimension may have something to contribute to the prediction of company failure, but given its relatively small weight and the uncertainty associated with its coefficient, it was decided that the gains from working in two dimensions far outweighed any loss in predictive ability from excluding the fourth dimension. Non-linearities were explored by using the squares and some cross-products of the co-ordinates as regressors. The only non-linear term that could be said to be needed in the logit analysis was the square of the second co-ordinate. This suggests that the second dimension acts in a non-linear way, and that linear discriminant analysis is not totally appropriate for this data set.
Although the objective of the above exercise was not to discriminate between failed and non-failed banks but to explore the dimensions that are relevant to the geometrical representation of the data, it was pleasing to see that the best model found misclassified only four points, and the worst seven. The linear discriminant model of Laffarga et al. (1986) misclassified seven points and Pina's (1989) logit misclassified four.
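A minimal sketch of this kind of check, assuming NumPy: a logit model fitted by Newton-Raphson (iteratively reweighted least squares), with two map co-ordinates and the square of the second co-ordinate as regressors, z statistics from the asymptotic standard errors, and misclassifications counted at a 0.5 cut-off. The co-ordinates, coefficients and failure labels are simulated, not the Spanish bank data, and the 0.5 cut-off is an assumption.

```python
import numpy as np

def fit_logit(X, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood logit by Newton-Raphson. X must include a constant column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1.0 - p))[:, None])     # Fisher information matrix
        step = np.linalg.solve(info, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))                 # probabilities at the estimates
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))          # asymptotic standard errors
    return beta, se

rng = np.random.default_rng(4)
n = 2000
dim1, dim2 = rng.normal(size=n), rng.normal(size=n)     # hypothetical map co-ordinates
true_beta = np.array([-0.5, 1.5, 0.8, -0.7])            # constant, dim1, dim2, dim2**2
X = np.column_stack([np.ones(n), dim1, dim2, dim2 ** 2])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

beta, se = fit_logit(X, y)
z_stats = beta / se                  # |z| > 1.96: significant at the 5% level
p_hat = 1.0 / (1.0 + np.exp(-X @ beta))
misclassified = int(np.sum((p_hat > 0.5) != (y == 1)))
```

Dropping a column from X and refitting shows how much a given dimension, or the squared term, contributes to the fit.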
[Figure 1: bankrupt and solvent banks plotted on Dimension 1 and Dimension 2, with PROFIT lines for ratios R1 to R9.]
Figure 1. Multidimensional Scaling. Projection on the first two dimensions and PROFIT analysis.
R1: Current Assets/Total Assets
R2: (Current Assets - Cash)/Total Assets
R3: Current Assets/Loans
R4: self-financing ratio
R5: Net Income/Total Assets
R6: Net Income/Total Equity Capital
R7: Net Income/Loans
R8: Cost of Sales/Sales
R9: Cash Flow/Loans
The projection on the first two dimensions of the six-dimensional MDS configuration is shown in Figure 1. The map, like all other statistical results reported in this paper, was obtained with the SPSS package. Each point in Figure 1 is a bank. Individual banks cannot be identified on the map: points could have been replaced with numbers or names, but this would have cluttered the figure without adding to its intrinsic value. The only characteristic of a bank that has been displayed is whether it has failed or is continuing to trade. Failed banks clearly fall towards the right-hand side of the map and continuing banks towards the left-hand side, which suggests that the first dimension is a powerful failure indicator. It is also possible to draw a curved line that leaves most failed banks on the right-hand side and most continuing banks on the left-hand side; this would be the equivalent of non-linear discriminant analysis. A more sophisticated approach would be to accept that there is a region of the space clearly associated with failure, a region clearly associated with success, and a region where anything can, and does, happen. What is surprising is the narrowness of the region where failed and non-failed banks coexist.
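The three-region reading of the map can be expressed as a simple rule. The sketch below is hypothetical throughout: it judges failure on the first co-ordinate alone, and the two cut-offs are placeholders rather than values estimated in this study. A curved boundary would replace the comparisons with a non-linear function of both co-ordinates.

```python
from typing import List, Tuple

def classify(points: List[Tuple[float, float]], lower: float, upper: float) -> List[str]:
    """Assign each (dim1, dim2) point to one of three regions of the map.

    Hypothetical rule: failed banks lie towards the right (large dim1),
    continuing banks towards the left; the band between the cut-offs is uncertain.
    """
    labels = []
    for dim1, _dim2 in points:
        if dim1 > upper:
            labels.append("failed")
        elif dim1 < lower:
            labels.append("continuing")
        else:
            labels.append("uncertain")
    return labels

regions = classify([(1.2, 0.3), (-0.9, -0.1), (0.1, 0.5)], lower=-0.25, upper=0.25)
```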
IV. PROFIT ANALYSIS
The previous section has concentrated on the construction of a map in which it can be seen that failed banks, the year before they fail, are different from continuing banks. This section explores the possible interpretation of the above results.
A first attempt at interpreting the data was made by means of cluster analysis; combined with knowledge of the individual banks, this revealed some interesting features of clearly failed and clearly non-failed banks. These results are not reported here. Much more informative was the attempt to see how the position of a bank in the space was associated with its financial structure. This was done by means of PROFIT (property fitting) analysis.
PROFIT analysis is a regression-based technique. It attempts to explain to what extent the value that a particular ratio takes for a given bank is associated with the position in the space of the point that represents the bank. There are various ways in which PROFIT analysis can be conducted. We chose the metric, rather than the non-metric, approach to benefit from the relationship between PROFIT and regression analysis described by Mar-Molinero (1991). A set of multiple regressions was run, using each financial ratio in turn as the dependent variable and the co-ordinates of the points in the space as the independent variables. The results are represented graphically in Figure 1. They were surprisingly good: in every case, the R-squared statistic was well in excess of 0.9. For every ratio, a line is drawn through the space in such a way that the value of the ratio increases in the direction of the line. PROFIT analysis clearly shows the first dimension to be associated with profitability and the second with liquidity. The clear message is that profitability is the most important determinant of bank failure, although liquidity also plays a part.
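Metric PROFIT, as described here, amounts to one multiple regression per ratio. In the sketch below, assuming NumPy, simulated map co-ordinates and a simulated ratio, the ratio is regressed on the co-ordinates; the normalised slope vector gives the direction of the line along which the ratio increases, and R-squared measures how well position in the map explains the ratio.

```python
import numpy as np

def profit_direction(coords, ratio):
    """Regress one ratio on the map co-ordinates.

    Returns (unit direction of increase, intercept, slopes, R-squared).
    """
    X = np.column_stack([np.ones(len(coords)), coords])
    coef, *_ = np.linalg.lstsq(X, ratio, rcond=None)
    fitted = X @ coef
    r2 = 1.0 - np.sum((ratio - fitted) ** 2) / np.sum((ratio - ratio.mean()) ** 2)
    slopes = coef[1:]
    return slopes / np.linalg.norm(slopes), coef[0], slopes, r2

rng = np.random.default_rng(5)
coords = rng.normal(size=(66, 2))        # hypothetical 2-D map of 66 banks
# Hypothetical profitability-type ratio that rises towards the left of the map
ratio = -0.8 * coords[:, 0] + 0.1 * coords[:, 1] + rng.normal(scale=0.1, size=66)
direction, intercept, slopes, r2 = profit_direction(coords, ratio)
```

Repeating the regression for each ratio gives one direction per ratio, which is what the lines in a PROFIT plot display.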
A scale of measurement can be attached to each of the lines, and the value of the ratio for a given point can be estimated by orthogonally projecting the point on to the line and reading the appropriate value on the scale. These scales have not been reproduced here. The scales associated with PROFIT analysis can also be used in the converse way: given a set of ratios for a bank, they can be used to locate it approximately in the space. If the new bank then falls in the area associated with failed banks, the possibility that it may fail has to be contemplated, on the grounds that its financial structure is not different from that of other banks that have already failed.
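Both uses of the scales reduce to linear algebra on the PROFIT regressions. Reading a ratio for a point x is the fitted value a + b·x, which equals the intercept plus the length of b times the projection of x on to the unit direction of b; that is the orthogonal projection on to the ratio's line followed by reading its scale. For the converse use, the sketch treats locating a bank from all its ratios at once as a least-squares problem; that formulation is an assumption for illustration, since the text describes the location as approximate and graphical. NumPy is assumed, and the intercepts and slopes are hypothetical.

```python
import numpy as np

# Hypothetical PROFIT results for three ratios on a 2-D map:
# ratio_j is approximately intercepts[j] + slopes[j] @ x for a bank at position x
intercepts = np.array([0.05, 0.10, 0.60])
slopes = np.array([[-0.04, 0.01],     # a profitability-type ratio
                   [0.00, 0.08],      # a liquidity-type ratio
                   [0.03, -0.02]])    # a cost-type ratio

def read_ratios(x):
    """Forward use: estimated ratio values for a point at position x."""
    return intercepts + slopes @ x

def locate(ratios):
    """Converse use: least-squares position whose fitted ratios best match the data."""
    x, *_ = np.linalg.lstsq(slopes, ratios - intercepts, rcond=None)
    return x

new_bank = read_ratios(np.array([1.5, -0.5]))   # ratios of a bank at a known position
position = locate(new_bank)                       # recovers [1.5, -0.5]
```

With real data the ratios will not fit the lines exactly, so the recovered position is only an approximation, in line with the text.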
It has been shown how MDS techniques can be used to give a visual presentation of multivariate data. In the particular case of Spanish banks, a representation in two dimensions, which can be labelled profitability and liquidity, was sufficient for the prediction of failure. Experience with other data sets suggests that an appropriately chosen two-dimensional map will often display the relevant features of the data for the decision at hand; for other examples in different contexts see Mar Molinero (1988) and Mar Molinero and Portilla (1993).
The advantage of a graphical representation is that a clear and unambiguous explanation can be given for the classification of a bank, or company, into the failed or non-failed set. For the banks, or companies, that formed part of the analysis, a clear picture emerges. A new bank, or company, can be added to the configuration. If it clusters with the failed banks, it must be treated with care on the grounds that its financial structure is not different from that of other banks, or companies, that failed in the past. If it clusters with non-failed banks, the concern disappears. There is a region in the space where anything can happen; the advantage of the MDS representation is that this region is clearly evident.
The addition of a new bank to the data set can be performed in various ways. The most desirable would be to repeat the study with one more point in the space. Besides the technical skills required to conduct such a study, there is the added disadvantage that computer packages limit the number of points that can be included in the analysis, and this may limit the useful life of MDS as a tool. A second alternative would be to keep the results of the previous study unchanged and add the new point to the configuration. This option is available in some packages, such as MDS(X), but does not appear to be implemented in the version of SPSS used for this study. The third option is a pragmatic one: auxiliary scales produced by means of PROFIT analysis can be used to locate the new point approximately in the space. This last option removes the need for a microcomputer as an aid to decision making. All that is required is a chart with some scales and predefined regions: failed, non-failed, and uncertain. The banks used to generate the chart need not be included in it, and any new bank can be placed in the chart by means of the auxiliary scales. This would be a black-box type of approach, in which the results of a fairly sophisticated analysis support a simple decision tool but remain hidden from the user. This philosophy opens the door to other black-box approaches, such as the neural network approach of Serrano Cinca and Martin del Brio (1993).
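For the second option, keeping the configuration fixed and adding a point, classical metric MDS has a closed form known as Gower's add-a-point formula. It is used here as a stand-in, because what MDS(X) implements is not described in the text; NumPy and Euclidean distances are assumed, and the data are simulated. With centred co-ordinates X, retained eigenvalues Λ, squared lengths q of the existing points and squared distances d² from the new point, the new point is placed at y = ½ Λ⁻¹ Xᵀ(q − d²).

```python
import numpy as np

def classical_mds(D, k):
    """Classical MDS; returns co-ordinates and the k retained eigenvalues."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:k]
    lam = eigvals[order]
    return eigvecs[:, order] * np.sqrt(lam), lam

def add_point(X, lam, d_new):
    """Place a new point from its distances d_new to the n existing points."""
    q = np.sum(X ** 2, axis=1)                 # squared lengths of existing points
    return 0.5 * (X.T @ (q - d_new ** 2)) / lam

rng = np.random.default_rng(6)
banks = rng.normal(size=(20, 2))               # hypothetical existing map
D = np.linalg.norm(banks[:, None] - banks[None, :], axis=-1)
X, lam = classical_mds(D, k=2)                 # configuration kept fixed

newcomer = np.array([0.7, -1.2])               # hypothetical new bank
d_new = np.linalg.norm(banks - newcomer, axis=1)
y = add_point(X, lam, d_new)
# The placed point reproduces the newcomer's distances to every existing bank
fits = np.allclose(np.linalg.norm(X - y, axis=1), d_new)
```

The existing co-ordinates are untouched, which is the point of this option; with non-Euclidean dissimilarities the placement is only approximate.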