A subject transfer framework for EEG classification



    Neurocomputing 00 (2011) 111

A subject transfer framework for EEG classification

Wenting Tu, Shiliang Sun∗

Department of Computer Science and Technology, East China Normal University, 500 Dongchuan Road, Shanghai 200241, P. R. China


This paper proposes a subject transfer framework for EEG classification. It aims to improve the classification performance when the training set of the target subject (namely, the user) is small, owing to the need to reduce the calibration session. Our framework pursues improvement not only at the feature extraction stage, but also at the classification stage. At the feature extraction stage, we first obtain a candidate filter set for each subject through a previously proposed feature extraction method. Then, we design different criteria to learn two sparse subsets of the candidate filter set, called the robust filter bank and the adaptive filter bank, respectively. Given the robust and adaptive filter banks, at the classification stage, we learn classifiers corresponding to these filter banks and employ a two-level ensemble strategy to dynamically and locally combine their outcomes into a single decision output. The proposed framework, as validated by experimental results, can achieve positive knowledge transfer for improving the performance of EEG classification.


EEG Classification, Transfer Learning, Ensemble Learning, Sparse Representation

    1. Introduction

In recent years, the development of brain-computer interface (BCI) technology has had both theoretical and practical significance. BCIs enable their users to manipulate an external device by translating brain activities into commands for a computer or machine [1]. They are communication and control systems that do not require any peripheral muscular activity, and can therefore be a very helpful aid to people suffering from motor disabilities [2]. As Fig. 1 shows, a BCI can be seen as a complex pattern recognition system [3], where the user's ability to reliably produce changes in electroencephalogram (EEG) signals and the subsequent stages of feature extraction and classification are equally important and can complement one another.

Improving the classification performance of EEG-based BCI systems faces a great challenge today. One problem is that, for a new subject (user), a long calibration session (e.g., more than one hour) is needed to collect sufficient training samples to construct subject-specific feature extractors and classifiers, which are used later in the test session to classify the brain signals of this subject. In recent BCI research, reducing training sessions is an important task, since the calibration session is a tedious and time-consuming process. Therefore, it is more desirable to improve performance with a small labeled set rather than an abundant one. However, because a short calibration session means only a few training samples of the target user are available, which may lead to suboptimal or overfitting feature extractors or classifiers, we have to find appropriate methods to enhance the performance.

∗ Corresponding author. Tel.: +86 21 54345186; fax: +86 21 54345119. E-mail address: (S. Sun).

Figure 1: A general EEG-based BCI [4]

One promising way to reduce training sessions is to utilize the samples collected from other subjects (which we call "source subjects") to aid the subject whose brain signals are to be classified in the test session (the "target subject"). This strategy can be termed "subject transfer". However, owing to the large inter-subject variability, it is unwise to simply add the training samples of source subjects to the training set of the target subject; this is often unhelpful and can even degrade performance. Properly using the data of source subjects is the key to achieving positive subject transfer for EEG classification.

Several related works that also focus on the small-training-sample problem in EEG classification are described here. Semi-supervised learning with local temporal regularization [5] has been proposed to utilize test samples to solve this problem. However, for real BCI systems, collecting a large number of test samples is sometimes impractical, so semi-supervised techniques may be unsuitable. Another important work on this problem is the session transfer strategy [6]. However, this method assumes the target user has already performed some training sessions, so it cannot work for a new user who has not performed any training session before. Moreover, there are several works on adaptive learning that are also related to this problem [7, 8, 9]. This paper proposes a framework for achieving the subject transfer strategy. It can be applied to users with few training samples, and it handles test samples one by one, which is similar to the real situation of most BCI systems.

This paper proposes a framework for improving EEG classification performance. It can achieve positive subject transfer with improvements at both the feature extraction and classification stages. Traditional spatial filtering algorithms often have some important limitations: they are not flexible enough and often overfit or underfit, especially in the small-training-sample situation (for details, see [10, 11, 12, 13]). Thus, we design a method to obtain two filter banks that ensure the robustness and adaptiveness of spatial filtering, respectively. Specifically, it uses a previous method called extreme energy ratio (EER) to obtain candidate filter banks, and then extracts two subsets of each with 1-norm regularization and different performance criteria. Given multiple classifiers corresponding to these banks, we employ an ensemble strategy to combine them. The proposed fusion method assigns dynamic weights to these base models according to the local structure of a given test sample in the feature space. These weights represent the prediction consistency of each model. With this weighting approach, we can learn a robust ensemble learner and an adaptive ensemble learner, respectively. Finally, a parameterized weighted combination of the two makes the final prediction on the category of the given test sample. Fig. 2 provides a structural illustration of our framework. Experiments are performed on public datasets from nine subjects, and the results demonstrate the excellent performance of our method.

The rest of this paper is organized as follows. Subject transfer based BCI systems are introduced in Section 2. Section 3 presents the main notation used in this paper. Next, the feature extraction stage and classification stage of our framework are described in Section 4 and Section 5, respectively. In Section 6, experimental results and performance analysis are provided. Finally, conclusions and plans for future work are given in Section 7.


Figure 2: The proposed subject transfer framework for EEG classification.

    2. Subject Transfer based BCI Systems

In this section, the subject transfer based BCI system is introduced. It aims to use data from other subjects to reduce the training burden on the current user.

It is known that the procedure of using a BCI device includes two parts: training and test sessions. Before a BCI device can serve users, they must first perform training sessions to provide enough training samples to the system. This is achieved in two steps: first, the device issues a command by offering a visual or audible cue, and the user acts accordingly. In this way, the BCI system can collect enough training samples to learn a model for performing later tasks. The training session of a BCI device often takes half an hour to an hour.

However, long, time-consuming training sessions pose huge difficulties for the practical wide use of BCI devices. First, not all users can perform training sessions (e.g., people with visual or auditory handicaps), and users who wish to use a BCI device in a hurry may be unwilling to perform long, time-consuming sessions. All these situations can be addressed by transferring other training sets to help with the current test task. The subject transfer scheme aims to use training sets from other users to help the current user. Fig. 3 shows a recommended setup for practical applications of subject transfer based BCI systems. The datasets from source subjects can be stored as a dataset group. Then, when the BCI device is ready to perform the classification task for the user, it can first obtain transfer data from the source subject group. For example, it can select the training sets of users whose characteristics are similar to those of the target user (e.g., same age or sex). After training with these datasets, the device can perform test sessions for the target user. Owing to large inter-subject variance, how to extract and transfer knowledge from source datasets is the key challenge for the subject transfer scheme.

    3. Notations

     Denote an observed EEG sample as an N × T matrix x, where N is the number of recording electrodes and T is the number of total points during the recording period. For a subject, its training set is denoted as X.


    Figure 3: Subject transfer based BCI systems.

Here, suppose we have training sets from $K$ source subjects, denoted $X^{S_1}, X^{S_2}, \ldots, X^{S_K}$, where $X^{S_j} = \{x_1^{S_j}, x_2^{S_j}, \ldots, x_{n_j}^{S_j}\}$ ($j = 1, 2, \ldots, K$) and $n_j$ is the number of samples in the training set $X^{S_j}$. The label set of the training set $X^{S_j}$ is denoted $Y^{S_j} = \{y_1^{S_j}, y_2^{S_j}, \ldots, y_{n_j}^{S_j}\}$, where $y_i^{S_j} \in \{-1, 1\}$ is the label of $x_i^{S_j}$ ($i = 1, 2, \ldots, n_j$). Moreover, we have a small training set of the target subject, denoted $X^{S_{K+1}} = \{x_1^{S_{K+1}}, x_2^{S_{K+1}}, \ldots, x_{n_{K+1}}^{S_{K+1}}\}$, with the corresponding label set $Y^{S_{K+1}} = \{y_1^{S_{K+1}}, y_2^{S_{K+1}}, \ldots, y_{n_{K+1}}^{S_{K+1}}\}$.

    4. Feature Extraction Stage: Spatial Filter Bank Construction

    4.1. Candidate Filter Bank Construction

We construct the candidate filter bank through a previous feature extraction method called the EER algorithm [14]. The EER algorithm aims at learning spatial filters which maximize the variance of spatially filtered EEG signals from one class while minimizing the variance of signals from the other class, and it has been proven to be theoretically equivalent and computationally superior to the commonly used common spatial patterns (CSP) method for EEG signal processing [14, 15, 16].

Assume only one latent signal source from each class is to be recovered. For an EEG sample $x$, the spatially filtered signal with a spatial filter denoted by $\phi$ ($N \times 1$) will be $\phi^\top x$. The signal energy after filtering can be represented by the sample variance as $(\phi^\top x)(\phi^\top x)^\top = \phi^\top C \phi$, where $C$ is the normalized covariance of one EEG sample, given by

$$C = \frac{x x^\top}{\operatorname{tr}(x x^\top)}. \qquad (1)$$

Ignoring the multiplicative factor $1/(T-1)$ in the following calculation of covariance, the covariances for the specific classes can be computed as the average of all single covariances so as to obtain a more accurate and stable covariance estimate:

$$C_A = \frac{1}{T_A} \sum_{p=1}^{T_A} \frac{x_p x_p^\top}{\operatorname{tr}(x_p x_p^\top)}, \qquad (2)$$

$$C_B = \frac{1}{T_B} \sum_{q=1}^{T_B} \frac{x_q x_q^\top}{\operatorname{tr}(x_q x_q^\top)}, \qquad (3)$$

where $T_A$ ($T_B$) is the number of samples from class A (B) and $x_p$ ($x_q$) is the $p$-th ($q$-th) sample belonging to class A (B). Therefore, in order to maximize the difference between the energy features under the two conditions, EER finds a spatial filter which maximizes or minimizes the ratio of the two class covariances. Thus, the discriminative EER criterion is defined as

$$\max/\min_{\phi} \; \frac{\phi^\top C_A \phi}{\phi^\top C_B \phi}, \qquad (4)$$

which can readily be extended to extract $m$ latent source signals from each class. After eigen-decomposition, EER always selects $m$ eigenvectors from the top and the end of the eigenvalue spectrum, respectively, to construct the spatial filter bank. In other words, the traditional spatial filter bank includes $2m$ spatial filters, half of which are eigenvectors corresponding to the top of the eigenvalue spectrum and the other half eigenvectors corresponding to the end. The spatial filter bank constructed by this strategy has some obvious limitations. First, the number of spatial filters in the bank is always even, which makes the method inflexible. Second, it always includes the first and the last eigenvector of the eigenvalue spectrum; however, due to the non-stationary nature of brain signals and the existence of outliers, those two spatial filters may overfit the training set and are thus probably unsuitable to be included in the filter bank. Third, the number of spatial filters in the bank can be larger or smaller than the optimal but unknown number, which often causes overfitting or underfitting.
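As an illustration of the candidate-bank construction, the ratio criterion (4) can be solved as a generalized eigenproblem. The sketch below is a minimal, hypothetical implementation in Python (NumPy/SciPy), not the authors' code; the function names and toy data are our own assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def normalized_covariance(x):
    """Trial-normalized spatial covariance, as in Eq. (1): C = x x^T / tr(x x^T)."""
    c = x @ x.T
    return c / np.trace(c)

def class_covariance(trials):
    """Class covariance as the average of per-trial normalized covariances (Eqs. (2)-(3))."""
    return np.mean([normalized_covariance(x) for x in trials], axis=0)

def eer_candidate_filters(trials_a, trials_b):
    """Solve the ratio criterion (4) as the generalized eigenproblem
    C_A phi = w C_B phi; the eigenvectors, sorted by eigenvalue,
    form the columns of the candidate filter bank."""
    ca, cb = class_covariance(trials_a), class_covariance(trials_b)
    w, v = eigh(ca, cb)               # eigenvalues in ascending order
    return v[:, np.argsort(w)[::-1]]  # descending: top filters favor class-A energy

# toy usage: 10 trials per class, 4 channels, 50 time points
rng = np.random.default_rng(0)
a = [rng.standard_normal((4, 50)) for _ in range(10)]
b = [rng.standard_normal((4, 50)) for _ in range(10)]
filters = eer_candidate_filters(a, b)  # 4 candidate spatial filters (columns)
```

Filters from both ends of the eigenvalue spectrum are kept here as candidates; the selection among them is deferred to the sparse step of Section 4.2.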

Therefore, we use the whole group of eigenvectors only as the candidate filter bank. A sparse strategy is then employed to build the spatial filter banks, and the adaptiveness and robustness of our subject transfer framework are ensured with different objective functions.

    4.2. Robust and Adaptive Filter Bank Construction

For $X^{S_j}$ ($j = 1, 2, \ldots, K+1$), given its candidate filter set $\Phi_j = \{\phi_{j1}, \phi_{j2}, \ldots, \phi_{jN}\} \subset \mathbb{R}^{N \times N}$, we learn its two subsets, called the robust filter bank $\Phi_{rj}$ and the adaptive filter bank $\Phi_{aj}$, by means of sparse representation. Note that each filter in the candidate filter set recovers a source signal:

$$s = \phi^\top x. \qquad (5)$$

The source signals recovered by different filters have different importance. Some of them are insensitive to inter-subject variability and able to extract generic, common discriminative features; these filters are good candidates for constructing robust filter banks. In addition, there may be other filters that recover source signals adaptive to the target subject; these filters should also be selected to serve as the adaptive filter banks. The robust filter bank can alleviate the overfitting tendency of the adaptive one, while the adaptive filter bank helps to improve the performance of classifying the brain signals of the target subject.

Here we employ penalized filter selection to construct these two kinds of filter banks. It includes a classification objective function and a penalty term. We design different classification performance criteria to learn banks that extract robust and adaptive source signals, constructing the robust and adaptive filter banks, respectively. Our choice for the second term is the L1 (1-norm) penalty [17]. It is a popular technique for enforcing solution sparsity and has been the driving force behind many emerging fields in signal processing, such as sparse coding and compressive sensing.

4.2.1. Robust Filter Bank

Given source subjects with rich training samples, the generalization ability can be evaluated by the average classification ability on the training sets of all source subjects $S_1, S_2, \ldots, S_K$. Note that this definition loses effectiveness if there is considerable variability in the sample sizes of the source subjects. However, this negative situation should not arise in subject transfer based BCI systems, since we can control the size regularity of the source training sets. First, subjects with limited training samples should not be included among the source subject candidates. Second, since the training session of a BCI is a tedious and time-consuming process, a user is unlikely to be willing to provide a huge number of training samples. Even if a user provides a huge training set, we can extract a subset whose size is close to that of the other source training sets.

Assume we wish to select filters from $\Phi_j$ ($j = 1, 2, \ldots, K+1$) to construct a robust filter bank $\Phi_{rj}$. We first express each sample $x_i^{S_k}$ ($i = 1, 2, \ldots, n_k$, $k = 1, 2, \ldots, K+1$) by its energy feature representation $\tilde{x}_i^{S_k} = (\tilde{x}_{i1}^{S_k}, \tilde{x}_{i2}^{S_k}, \ldots, \tilde{x}_{iN}^{S_k})^\top$, where $\tilde{x}_{if}^{S_k}$ is the energy feature corresponding to the $f$-th filter in $\Phi_j$: $\tilde{x}_{if}^{S_k} = (\phi_{jf}^\top x_i^{S_k})(\phi_{jf}^\top x_i^{S_k})^\top$. Then, with this representation, the robust filter bank $\Phi_{rj}$ can be learned by the optimization problem of minimizing

$$\frac{1}{K} \sum_{k=1}^{K} \frac{1}{n_k} \sum_{i=1}^{n_k} \left( y_i^{S_k} - \beta_{rj}^\top \tilde{x}_i^{S_k} \right)^2 + \lambda_{rj} \|\beta_{rj}\|_1, \qquad (6)$$

where $n_k$ is the number of samples in the training set from subject $k$, $\beta_{rj} = [\beta_{rj1}, \ldots, \beta_{rjN}]^\top$, and $\lambda_{rj} \geq 0$. The first term in (6) is an error function calculated on the training sets of all source subjects; the factor $1/n_k$ balances the influences of the source training sets when constructing robust filter banks, so that the negative influence of imbalanced source training sets is eliminated. The second term is an L1 penalty, with $\|\cdot\|_1$ denoting the L1 norm (sum of absolute values). Owing to the abundant training samples of the source subjects, a filter bank that performs excellently on them is assumed to capture generic, common discriminative features among subjects; this means it is comparatively robust. An important property of the L1 penalty is that it can generate zero coefficients. In fact, $\beta$ reveals the complexity of the model, and minimizing its L1 norm penalizes model complexity. In our STF method, we select the spatial filters corresponding to the non-zero coefficients of $\beta_{rj}$ to construct the robust filter bank $\Phi_{rj}$.
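To make the selection step concrete, the minimization of (6) can be approximated with an off-the-shelf L1-penalized regressor. The sketch below uses scikit-learn's `Lasso` (whose objective rescales the squared error by $1/(2n)$, so its `alpha` is not numerically identical to $\lambda$ above); the per-subject factor $1/(K n_k)$ is folded in by scaling each row by the square root of its weight. All names and data are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Lasso

def energy_features(trials, filters):
    """Project each trial through the candidate filters (columns) and take the
    per-filter signal energy, giving one N-dimensional feature vector per trial."""
    feats = []
    for x in trials:
        s = filters.T @ x              # filtered source signals, Eq. (5)
        feats.append(np.sum(s * s, axis=1))
    return np.array(feats)

def select_robust_filters(source_sets, filters, lam=0.05):
    """Sketch of Eq. (6): L1-penalized least squares over all source subjects,
    each subject's loss weighted by 1/(K * n_k). Scaling rows by sqrt(weight)
    turns the weighted squared error into an ordinary Lasso problem.
    Returns indices of filters with non-zero coefficients."""
    K = len(source_sets)
    rows, targets = [], []
    for trials, labels in source_sets:
        feats = energy_features(trials, filters)
        w = 1.0 / (K * len(labels))
        rows.append(np.sqrt(w) * feats)
        targets.append(np.sqrt(w) * np.asarray(labels, float))
    X, y = np.vstack(rows), np.concatenate(targets)
    beta = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_
    return np.flatnonzero(beta)

# toy usage: 3 source subjects; class +1 carries extra energy on channel 0
rng = np.random.default_rng(1)
def make_subject(n=40):
    trials, labels = [], []
    for _ in range(n):
        label = int(rng.choice([-1, 1]))
        x = rng.standard_normal((4, 50))
        if label == 1:
            x[0] *= 3.0
        trials.append(x)
        labels.append(label)
    return trials, labels

sources = [make_subject() for _ in range(3)]
robust_idx = select_robust_filters(sources, np.eye(4))  # indices of kept filters
```

The adaptive bank of (7) follows the same recipe with a single source set: the target subject's own small training set.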

4.2.2. Adaptive Filter Bank

Moreover, we also need to construct spatial filter banks adaptive to the target subject. In our STF method, "adaptive to a subject" means "subject-specific". Therefore, the loss function for learning the adaptive filter bank $\Phi_{aj}$ should be related to the classification ability on the training set of the target subject, and thus the optimization problem is to minimize

$$\sum_{i=1}^{n_{K+1}} \left( y_i^{S_{K+1}} - \beta_{aj}^\top \tilde{x}_i^{S_{K+1}} \right)^2 + \lambda_{aj} \|\beta_{aj}\|_1. \qquad (7)$$


Analogously, the adaptive filter bank can be constructed from the filters corresponding to the non-zero coefficients of $\beta_{aj}$.

The relation between $\lambda$ and the number of selected spatial filters should be noted. Briefly speaking, the larger $\lambda$ is, the more zero elements $\beta$ has. When $\lambda \to 0$, more features are selected; however, since the corresponding classifier becomes overly complex, it may yield unsatisfactory predictions and be less interpretable. When $\lambda \to +\infty$, fewer features are selected; the case $\lambda = +\infty$ corresponds to the simplest classifier, in which no input variable is used for classification. As a result, if we select the interval of $\lambda$ values properly, we can obtain a filter bank containing any desired number of spatial filters. The optimal number of spatial filters in a bank can be determined by cross-validation.
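The $\lambda$-sparsity relation described above can be checked numerically. A small synthetic example (our own, not from the paper), again with scikit-learn's `Lasso`:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
# only features 0 and 3 actually carry signal
y = X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.standard_normal(100)

# Count selected (non-zero) coefficients as the penalty grows:
# larger penalties drive more coefficients exactly to zero.
n_selected = [
    np.count_nonzero(Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_)
    for lam in (0.01, 0.1, 1.0, 10.0)
]
```

Sweeping the penalty in this way is exactly how cross-validation can pick a bank of the desired size: each candidate $\lambda$ yields a filter subset, and the one with the best held-out accuracy is retained.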

    5. Classi?cation Stage: Ensemble Strategy of Multiple Models

Given the robust and adaptive filter banks of all subjects, we can learn an equal number of models (bank-classifier pairs). We denote the model obtained from the robust (adaptive) filter bank of subject $j$ ($j = 1, \ldots, K+1$) as $M_{rj}$ ($M_{aj}$). It is trained with some classification strategy on the training set of subject $j$ projected by the robust (adaptive) filter bank $\Phi_{rj}$ ($\Phi_{aj}$). Then, we employ an ensemble strategy to combine their outcomes into a single one. However, most existing model weighting approaches assign static weights to models, which are either uniform (e.g., in bagging [18]), proportional to the training accuracy (e.g., in boosting [19]), or fixed by favoring certain models (e.g., in single-model classification). Such a static weighting scheme may not perform well for the subject transfer framework, since different test examples may favor predictions from different base models, which may be

caused by the time-varying property of the target subject's brain patterns [20]. Here, we employ a dynamic ensemble strategy to assign different weights for distinct test samples. Specifically, the weights are defined using the source samples surrounding the given test sample, so that they estimate the similarities between the source subjects and the target subject. The ensemble strategy can be decomposed into two levels: the first level constructs the robust ensemble learner and the adaptive ensemble learner, whose weights are defined dynamically and locally; the second level combines the two learners into a single learner so that they complement each other.

Given the robust models corresponding to the robust filter banks of all subjects, we now show the construction of the robust ensemble learner. When a test sample $x_i$ is to be classified, the robust models of all subjects, $M_{rj}$ ($j = 1, \ldots, K+1$), make up a robust ensemble learner:


$$RE(x_i) = \sum_{j=1}^{K+1} W_{rj} \times M_{rj}(x_i), \qquad (8)$$


where $RE(x_i)$ denotes the robust ensemble result for test sample $x_i$, $M_{rj}(x_i)$ is the result of the robust model of subject $j$, and $W_{rj}$ is the weight of the model $M_{rj}$. Our weight assignment determines $W_{rj}$ by first mapping the test sample and subject $j$'s samples into the space projected by the robust bank of subject $j$, and then weighting each model locally according to its prediction consistency with the neighborhood structure of the test example among subject $j$'s samples. We define two further weights for determining $W_{rj}$:

$$W1_{rj} = \frac{N_j(x_i)}{n_j}, \qquad (9)$$

$$W2_{rj} = \frac{N_j^+(x_i)}{N_j(x_i)}, \qquad (10)$$

where $N_j(x_i)$ is the number of neighbors of $x_i$ and $N_j^+(x_i)$ is the maximum number of those neighbors belonging to the same class. Here the neighbors of $x_i$ are defined as

$$\{ x \mid \|x - x_i\|_F < d, \; x \in X^{S_j} \}, \qquad (11)$$

where $d$ is a parameter that controls the local range and $\|\cdot\|_F$ denotes the Frobenius norm. Given $W1_{rj}$ and $W2_{rj}$, we define $W_{rj}$ as follows:

$$W_{rj} = W1_{rj} \times W2_{rj}. \qquad (12)$$

This expression follows from the assumption that if a test sample lies closer to the samples of a source subject and, meanwhile, most of the samples around it belong to the same class, then the prediction of that subject's trained model is more dependable. Note that, in the training sessions of BCI systems, we can control the number of samples of each class, so that the classes are nearly balanced. However, in other application domains the class imbalance problem may appear, and the effectiveness of our method may be reduced. Suppose one of the source training sets contains a very biased distribution of class labels and the test sample lies where most of that subject's training samples fall inside the neighborhood; this may happen since $d$ is likely to be larger than half the distance between cluster means. Then the weighting will be skewed towards the subject with biased labels, even if the class label of the test sample differs from the dominant label, which may cause undesired decisions. Modifying our method to handle the class imbalance problem is a challenge worth researching. Considering that this negative situation would hardly happen here, we do not provide a solution to this problem, but put it in our future work plan.

Then, we use the normalized versions of the above weights to construct the robust ensemble learner. It should be noted that other methods for combining $W1_{rj}$ and $W2_{rj}$ are also possible, which is a topic for our future work.

     Analogously, we can get our adaptive ensemble learner:


$$AE(x_i) = \sum_{j=1}^{K+1} W_{aj} \times M_{aj}(x_i), \qquad (13)$$


where $AE(x_i)$ denotes the adaptive ensemble result for test sample $x_i$ and $M_{aj}(x_i)$ is the result of the adaptive model of subject $j$. Its model weights are also computed by first mapping and then measuring the models' prediction consistency for a given test sample, as for the robust ensemble learner.
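The two-level weighting just described can be sketched as follows. This is a simplified, hypothetical rendering of Eqs. (8)-(14) in which both weight computations share one feature space (the paper maps into each subject's own filter-bank space); all names are our own.

```python
import numpy as np

def dynamic_weight(test_feat, subj_feats, subj_labels, d=1.0):
    """Local weight for one subject's model (Eqs. (9)-(12)):
    W1 = fraction of the subject's samples within distance d of the test
    point; W2 = purity of the dominant class inside that neighborhood."""
    dists = np.linalg.norm(subj_feats - test_feat, axis=1)
    inside = dists < d
    n_in = int(np.count_nonzero(inside))
    if n_in == 0:
        return 0.0
    w1 = n_in / len(subj_feats)
    labels_in = np.asarray(subj_labels)[inside]
    n_major = max(np.count_nonzero(labels_in == 1),
                  np.count_nonzero(labels_in == -1))
    return w1 * (n_major / n_in)

def ensemble_decision(test_feat, subjects, models, alpha=0.3, d=1.0):
    """Two-level ensemble (Eqs. (8), (13), (14)): 'subjects' holds
    (features, labels) per subject; 'models' holds (robust, adaptive)
    pairs, each a callable returning a score in [-1, 1]."""
    re, ae = 0.0, 0.0
    for (feats, labels), (m_r, m_a) in zip(subjects, models):
        w = dynamic_weight(test_feat, feats, labels, d=d)
        re += w * m_r(test_feat)   # robust ensemble term, Eq. (8)
        ae += w * m_a(test_feat)   # adaptive ensemble term, Eq. (13)
    return (1 - alpha) * re + alpha * ae   # Eq. (14)

# toy usage: one subject, two near neighbors of the test point, both class +1
feats = np.array([[0.0, 0.0], [0.1, 0.0], [2.0, 2.0]])
subjects = [(feats, [1, 1, -1])]
models = [(lambda x: 1.0, lambda x: -1.0)]
score = ensemble_decision(np.zeros(2), subjects, models, alpha=0.25, d=0.5)
```

Because the weights are recomputed per test sample, a source subject contributes strongly only where its data actually surround that sample, which is the dynamic behavior the section argues for.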


Then, we use a parameter $\alpha \in [0, 1]$, which can be determined by cross-validation, to control the balance between the robustness and adaptiveness of the final ensemble learner:

$$E(x_i) = (1 - \alpha) RE(x_i) + \alpha AE(x_i). \qquad (14)$$

Finally, the ensemble learner $E$ can dynamically classify the brain signals of the target subject in the test session.

6. Experiments

    6.1. Data Description

The EEG data used in this study were made available by Dr. Allen Osman of the University of Pennsylvania during the NIPS 2001 BCI workshop [21]. There were a total of nine subjects, denoted $S_1, S_2, \ldots, S_9$. For each subject, the task was to imagine moving his or her left or right index finger in response to a highly predictable visual cue. EEG signals were recorded with 59 electrodes mounted according to the international 10-20 system, at a sampling rate of 100 Hz. A total of 180 trials were recorded for each subject: 90 trials, half labeled left and half right, were used for training, and the other 90 trials for testing. Each trial lasted six seconds with two important cues: the preparation cue appeared at 3.75 s, indicating which hand movement should be imagined, and the execution cue appeared at 5.0 s, indicating it was time to carry out the assigned response.

Signals from 15 electrodes over the sensorimotor area are used in this paper, so the raw dimension of the energy feature space is 15. Moreover, for each trial, the time window from 4.0 s to 6.0 s is retained for analysis. Other preprocessing operations include common average referencing [22], 8-30 Hz bandpass filtering, and signal normalization to eliminate the energy variation across different recording instants [16]. Finally, we reduce the training set of each target subject by randomly extracting 20 samples from the original set to simulate a short calibration.

6.2. Experimental Setup and Result Analysis

Here, we employ linear discriminant analysis (LDA) as the classification method in our experiments. This technique has been used with success in a number of BCI systems, such as motor imagery based BCIs [23], P300 spellers [24], and multiclass or asynchronous BCIs [25, 26]. Moreover, the LDA algorithm is simple and has very low computational requirements, and it generally provides good generalization ability. In our subject transfer framework, after the construction of the filter banks, LDA classifiers are trained on the feature representations of the filtered training samples. They then give predictions on test samples as the outputs of the models, and the final predictions are obtained by weighting the model outputs. The parameter $\alpha$ is determined by 10-fold cross-validation on the training set, with its value selected from the set $\{0.1, 0.2, \ldots, 1\}$, and $d$ is defined as the average distance between samples in the training set of the corresponding subject. The experimental procedure was repeated 10 times and the averaged results are reported.
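As a sketch of this classification step, an LDA model can be trained on energy features with scikit-learn. The data below are synthetic stand-ins, not the workshop dataset, and the 45/45 split simply mirrors a balanced two-class training set.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(2)
# hypothetical energy features for one subject's 4-filter bank:
# class -1 trials cluster around 1.0, class +1 trials around 1.5
X_train = np.vstack([rng.normal(1.0, 0.2, (45, 4)),
                     rng.normal(1.5, 0.2, (45, 4))])
y_train = np.array([-1] * 45 + [1] * 45)

lda = LinearDiscriminantAnalysis().fit(X_train, y_train)
pred = lda.predict(rng.normal(1.5, 0.2, (1, 4)))  # classify one test trial
```

In the full framework, one such classifier is fitted per filter bank (robust and adaptive, per subject), and its signed output feeds the ensemble of Section 5.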

We compare the performance of our subject transfer framework (STF) against two methods. First, we employ the traditional EEG classification method as a baseline: it uses only the target training samples to perform the test task for the target user (baseline). Second, the result of a naive subject transfer method (NSTM) is also reported: it merges the source subject training sets and the training set of the target subject into a single training set. Almost all figures demonstrate that our new framework gives the best performance among the three methods. Some other observations can be made. For example, the NSTM may result in negative transfer, with worse performance than the baseline method (as $S_2$, $S_4$ and $S_5$ indicate). This fact illustrates the necessity of studying transfer learning methods. Moreover, when the NSTM achieves positive results, our method can further enlarge the performance improvement (as $S_1$, $S_8$ and $S_9$ show). For statistical analysis of the experimental results, we compare the methods using a t-test and find that our method performs significantly better ($P < 0.05$).

For parameter analysis, we report the best $\alpha$ values selected by cross-validation over the experimental iterations. As Fig. 5 shows, more than 70% of the $\alpha$ values lie between 0 and 0.5. This result illustrates that when the training samples of the target subject are very few, the robust ensemble learner is more important than the adaptive one. It also demonstrates the benefit of using source data and the effectiveness of our subject transfer framework. Moreover, in the previous analysis, we assumed the filter numbers of the filter banks were the same. However, in real applications of STF, they can be different. Table 1 shows the results of STF where the filter numbers of the filter banks can differ, and each of them is

[Figure: classification accuracy (%) of the baseline, NSTM and STF methods versus the number of filters in the bank (2 to 14), shown per subject in panels (a) $S_1$ through (f) $S_6$; further panels continue beyond this excerpt.]