
    Journal of Machine Learning Research 3 (2002) 747-780 Submitted: 12/01; Published: 12/02

    Coupled Clustering: A Method for Detecting Structural Correspondence

    Zvika Marx MARXZV@CS.BIU.AC.IL

    The Interdisciplinary Center for Neural Computation

    The Hebrew University of Jerusalem

    Givat-Ram, Jerusalem 91904, Israel

    and Department of Computer Science

    Bar-Ilan University

    Ramat-Gan 52900, Israel

    Ido Dagan DAGAN@CS.BIU.AC.IL

    Department of Computer Science

    Bar-Ilan University

    Ramat-Gan 52900, Israel

    Joachim M. Buhmann JB@INFORMATIK.UNI-BONN.DE

    Institut für Informatik III

    University of Bonn

    Römerstr. 164, D-53117 Bonn, Germany

    Eli Shamir SHAMIR@CS.HUJI.AC.IL

    School of Computer Science and Engineering

    The Hebrew University of Jerusalem

    Givat-Ram, Jerusalem 91904, Israel

Editors: Carla E. Brodley and Andrea Danyluk

    Abstract

    This paper proposes a new paradigm and a computational framework for revealing equivalencies (analogies) between sub-structures of distinct composite systems that are initially represented by unstructured data sets. For this purpose, we introduce and investigate a variant of traditional data clustering, termed coupled clustering, which outputs a configuration of corresponding subsets of two such representative sets. We apply our method to synthetic as well as textual data. Its achievements in detecting topical correspondences between textual corpora are evaluated through comparison to the performance of human experts.

    Keywords: Unsupervised learning, Clustering, Structure mapping, Data mining in texts, Natural language processing

    1. Introduction

    Unsupervised learning methods aim at the analysis of data based on patterns within the data itself, while no supplementary directions are provided. Two extensively studied unsupervised tasks are: (a) assessing similarity between object pairs, typically quantified by a single value denoting an overall similarity level, and (b) detecting, through various techniques, the detailed structure of individual composite objects. The common approach to similarity assessment has been to examine feature-based vectorial representations of each object in order to calculate some distance or proximity measure between object pairs (see Subsection 2.2 below for examples). A natural extension to this task would be to analyze in detail what makes two composite objects similar.


    Figure 1: The coupled clustering framework. The diamonds represent elements of the two clustered data sets A and B. Closed contours represent coupled clusters, capturing corresponding sub-structures of the two sets.

    According to the common view, two given objects are considered similar if they share a relatively large subset of common features, which are identifiable independently of the role they perform within the internal organization of the compared objects. Nonetheless, one might wonder how faithfully a single value (or a list of common features) represents the whole richness and subtlety of what could be conceived as similar. Indeed, cognitive studies make a distinction between surface-level similar appearance and deep structure-based correspondence relationships, such as in analogies and metaphors. Motivated by this conception of structure-related similarity, we introduce in this paper a novel conceptual and algorithmic framework for unsupervised identification of structural correspondences.

    Data clustering techniques, themselves relying on vectorial representations of, or similarities between, the clustered elements, impose elementary structure on a given unstructured corpus of data by partitioning it into disjoint clusters (see Subsection 2.1 for more details). The method that we introduce here, coupled clustering, extends the standard clustering methods to a setting consisting of a pair of distinct data sets. We study the problem of partitioning these two given sets into corresponding subsets, so that every subset is matched with a counterpart in the other data set. Each pair of matched subsets jointly forms a coupled cluster. A resulting configuration of coupled clusters is sketched in Figure 1. A coupled cluster consists of elements that are similar to one another and distinct from elements in other clusters, subject to the context imposed by aligning the clustered data sets with respect to each other. Coupled clustering is intended to reflect comparison-dependent equivalencies rather than overall similarity. It produces interesting results in cases where the two clustered data sets are, overall, not very similar to each other. In such cases, coupled clustering yields an inherently different outcome than standard clustering applied to the union of the two data sets: standard clustering might be inclined to produce clusters that are exclusive to elements of either data set. Coupled clustering, on the other hand, is directed to include representatives from both data sets in every cluster. Coupled clustering is a newly defined computational task of general purpose. Although it is ill-posed, similarly to the standard data-clustering problem, it is potentially usable for a variety of applications.

    Coupled clustering can potentially be utilized to reveal equivalencies in any type of data that can be represented by unstructured data sets. Representative data sets might contain, for instance, pixels or contours sampled from two image collections, or patterns of physiological measurements collected from distinct populations and so on.


    Figure 2: Keyword samples from news articles regarding two conflicts. Examples of coupled clusters, i.e. matched pairs of corresponding keyword subsets, are marked by curved contours.

    The current work concentrates on textual data, while the variety of other conceivable applications should be investigated in subsequent projects. Specifically, we apply coupled clustering to pairs of textual sources (document collections or corpora) containing information regarding two distinct topics that are characterized by their own terminology and key concepts. The target is to identify prominent themes, categories or entities for which a correspondence can be identified simultaneously within both corpora. The keyword sets in Figure 2, for instance, have been extracted from news articles regarding two conflicts of distinct types: the Middle-East conflict and the dispute over copyright of music and other media types (the "Napster case").

    The question of whether, and with relation to which aspects, these two conflicts are similar does not seem amenable to an obvious straightforward analysis. Figure 2, however, demonstrates some non-trivial correspondences that have been identified by our method. For example, the role played within the Middle East conflict by individuals such as 'soldier', 'refugee' and 'diplomat' has been aligned by our procedure, in this specific comparison, with the role of other individuals ('lawyer', 'student' and 'artist') in the copyright dispute.

    Cognitive research, most notably the structure mapping theory (Gentner, 1983), has emphasized the importance of the various modes of similarity assessment for a variety of mental activities. In particular, the role of structural correspondence is crucial in analogical reasoning, abstraction and creative thinking, involving mental mappings between complex domains which might appear unrelated at first glance. The computational mechanisms introduced for implementing the structure mapping theory and related approaches typically require as input data items that are encoded a priori with structured knowledge (see Section 7 for further discussion). Our framework, in distinction, assumes knowledge of an abstracted type, namely similarity values pertaining to unstructured data elements, while the structure that underlies the correspondence between the compared domains emerges through the mapping process itself.

    Structural equivalencies consistently enlighten various fields of knowledge and scholarship. Historical situations and events, for instance, provide a rich field for the construction of structural analogies. A unique enterprise dates back to Plutarch's "Parallel Lives",1 in which Greek public figures are paired with Roman counterparts


    whose "feature vectors" of life events and actions exhibit structural similarity. In comparison to ancient times, the current era presents growing accessibility to large amounts of unstructured information. Indeed, intensive research takes place, within areas such as information retrieval and data mining, aiming at the needs that evolve in an information-intensive environment. This line of research typically addresses similarity in its surface, feature-based mode. A subsequent objective would be to relate and compare separate aggregations of information with one another through structure mapping. In the field of competitive intelligence, for example, one attempts to obtain knowledge of the players in a given industry and to understand the strengths and weaknesses of the competitors (Zanasi, 1998). In many cases, plenty of data (financial reports, white papers and so on) are publicly available for any type of analysis. The ability to automatically map knowledge regarding products, staff or financial policy of one company onto the equivalent information of another company could make a valuable tool for the inspection of competing firms. If applied appropriately to such readily available data, structure mapping might turn into a useful approach in future information technology.

    The current paper extends earlier versions of the coupled clustering framework (Marx, Dagan and Buhmann, 2001; Marx and Dagan, 2001). In Section 2, we review the computational methods used within our procedure, namely standard clustering methods and co-occurrence based similarity measures. The coupled clustering method is formally introduced in Section 3. Then, we demonstrate our method's capabilities on synthetic data (Section 4) as well as in detecting equivalencies in textual corpora, including elaboration of the conflict example of Figure 2 and identification of corresponding aspects within various religions (Section 5). Evaluation is conducted through comparison of our program's output with clusters that were constructed manually by experts in the comparative study of religions (Section 6). Thereafter, we compare the coupled clustering method with related research (Section 7). In Section 8, we illustrate how coupled clustering, which is essentially a feature-based method, could be used to detect equivalent relational patterns of the type that have motivated cognitive theories of structural similarity. The paper ends with conclusions and directions for further research (Section 9).

    2. Computational Background

    The following two subsections address the computational procedures that are utilized within the coupled-clustering framework. The first subsection, reviewing the data-clustering task, concentrates on the relevant details of the particular approach, by Puzicha, Hofmann and Buhmann (2000), that we have adapted for our algorithm (Subsection 3.3). The second subsection reviews methods for calculating similarity values. It exemplifies co-occurrence based techniques of similarity assessment, utilized later for generating input for coupled clustering applied to keyword data sets extracted from textual corpora (Sections 5, 6).

    2.1. Cost-based Pairwise Clustering

    Data clustering methods provide a basic and widely used tool for detecting structure within initially unstructured data. Here we concentrate on the basic clustering task, namely partitioning the given data set into relatively homogeneous and well-separated clusters. Hereinafter, we shall refer to this standard task by the term standard clustering, to distinguish it from coupled clustering.

    1. Plutarch's "Lives" can be browsed at http://classics.mit.edu/Browse/browse-Plutarch.html.


    The notion of data clustering can be extended to include several additional methods, which will not be discussed further here, for instance methods that output overlapping clusters, or hierarchical clustering that recursively partitions the data.

    Our motivation for utilizing a clustering mechanism for mapping structure stems from the view of clustering as the extraction of meaningful components in the data (Tishby, Pereira and Bialek, 1999). This view is particularly sensible when the clustered data is textual. In this case, clusters of words (Pereira, Tishby and Lee, 1993) or documents (Lee and Seung, 1999; Dhillon and Modha, 2001) can be seen as explicating prominent semantic categories, topics or themes that are substantial within the analyzed texts.

    Clustering often relies on associating data elements with feature vectors. In this case, each cluster can be represented by some averaged (centroid) vectorial extraction (e.g., Pereira, Tishby and Lee, 1993; Slonim and Tishby, 2000b). An alternative approach, pairwise clustering, is based on a pre-given measure of similarity (or distance) between the clustered elements, which are not necessarily embeddable within a vector space or even a metric space. Feature vectors, if not used directly for clustering, can still be utilized for defining and calculating a pairwise similarity measure.

    The clustering task is considered ill-posed: there is no pre-determined criterion measuring objectively the quality of any given result. However, there are clustering algorithms that integrate the effect of the input (feature vectors or similarity values) through an objective function, or cost function, which assigns to any given partitioning of the data (a clustering configuration) a value denoting its assumed quality.

    The clustering method that we use in our work follows a cost-based framework for pairwise clustering recently introduced by Puzicha, Hofmann and Buhmann (2000). They present, analyze and classify a family of clustering cost functions. We review here relevant details of their framework, to be adapted and modified for coupled clustering later (Section 3).

    A clustering procedure partitions the elements of a given data set, A, into disjoint subsets A_1, ..., A_k. Puzicha et al.'s framework assumes "hard" assignments: every data element is assigned to one and only one of the clusters. Ambiguous, or soft, assignments can be considered advantageous for recording subtleties and ambiguities within lingual data, for example. However, there are reasons to adhere first to hard assignment: it is technically and conceptually simpler, and it constructs definite and easily interpretable clustering configurations (Section 7 further addresses this topic). The number of clusters, k, is pre-determined and specified as an input parameter to the clustering algorithm.

    A cost criterion guides the search for a suitable clustering configuration. This criterion is realized through a cost function H(S, M) taking the following parameters:

    (i) S = \{s_{aa'}\}_{a,a' \in A}: a collection of pairwise similarity values, each of which pertains to a pair of data elements a and a' in A.2

    (ii) M = (A_1, ..., A_k): a candidate clustering configuration, specifying assignments of all elements into the disjoint clusters (that is, \bigcup_j A_j = A and A_j \cap A_{j'} = \emptyset for every j \neq j', 1 \le j, j' \le k).

    2. In their original formulation, Puzicha et al. use distance values (dissimilarities) rather than similarities. Hereinafter, we apply a straightforward adaptation to similarity values by adding a minus sign to H. Adhering to the cost minimization principle, this transformation replaces the cost paid for within-cluster dissimilarities with cost saved for within-cluster similarities (alternatively pronounced as "negative cost paid").


    The cost function outputs a numeric cost value for the input clustering configuration M, given the similarity collection S. Thus, various candidate configurations can be compared and the best one, i.e. the configuration of lowest cost, is chosen. The main idea underlying clustering criteria is the preference for configurations in which the similarity of elements within each cluster is generally high and the similarity of elements that are not in the same cluster is correspondingly low. This idea is formalized by Puzicha et al. through the monotonicity axiom: in a given clustering configuration, locally increasing similarity values pertaining to elements within the same cluster cannot decrease the cost assigned to that configuration. Similarly, increasing the similarity level of elements belonging to distinct clusters cannot increase the cost.

    Monotonicity is adequate for pairwise data clustering. By introducing further requirements, Puzicha et al. focus on a more confined family of cost functions. The following requirement restricts attention to functions of relatively simple structure. A cost function H fulfills the additivity axiom if it can be presented as the cumulative sum of repeated applications of "local" functions referring individually to each pair of data elements. That is:

        H(S, M) = \sum_{a,a' \in A} \phi_{aa'}(a, a', s_{aa'}, M),    (1)

    where \phi_{aa'} depends on the two data elements a and a', their similarity value s_{aa'}, and the whole clustering configuration M. An additional axiom, the permutation invariance axiom, states that cost should be independent of element and cluster reordering. Combined with the additivity axiom, it implies that a single local function \phi, such that \phi_{aa'} \equiv \phi for all a, a' \in A, can be assumed.

    Two additional invariance requirements aim at stabilizing the cost under simple transformations of the data. First, the relative ranking of all clustering configurations should persist under scalar multiplication of the whole similarity ensemble. Assume that all similarity values within a given collection S are multiplied by a positive constant c, and denote the modified collection by cS. Then H fulfills the scale invariance axiom if for every fixed clustering configuration M, the following holds:

        H(cS, M) = c \cdot H(S, M).    (2)

    Likewise, it is desirable to control the effect of the addition of a constant. Assume that a fixed constant \lambda is added to all similarity values in a given collection S, and denote the modified collection by S^{+\lambda}. Then H fulfills the shift invariance axiom if for every fixed clustering configuration M, the following holds:

        H(S^{+\lambda}, M) = H(S, M) + \Lambda,    (3)

    where \Lambda may depend on \lambda and on any aspect of the clustered data (typically the data size), but not on the particular configuration M.

    As the most consequential criterion, intended to assure that a given cost function is not thrown off by local slips in the data, Puzicha et al. suggest a criterion of robustness. This criterion ensures that, whenever the data is large enough, bounded changes in the similarity values regarding one specific element, a \in A, result in a limited effect on the cost. Consequently, the cost assigned to any clustering configuration is not susceptible to a small number of fluctuations in the similarity data. Formally, denote the size of the data set A by n and let S^{a+\delta} be the collection obtained by adding \delta to all similarity values in S pertaining to one particular element, a \in A. Then H is robust (in the strong sense) if it fulfills

        \frac{1}{n} \left| H(S^{a+\delta}, M) - H(S, M) \right| \xrightarrow[n \to \infty]{} 0.    (4)


    It turns out that, among the cost functions examined by Puzicha et al., there is only one function that retains the characterizations given by Equations 1, 2 and 3 above, as well as the strong robustness criterion of Equation 4. This function, denoted here as H^0, involves only within-cluster similarity values, i.e. similarity values pertaining to elements within the same cluster. Specifically, H^0 is a weighted sum of the average similarities within the clusters. Denote the sizes of the k clusters A_1, ..., A_k by n_1, ..., n_k respectively. The average within-cluster similarity for the cluster A_j is then

        Avg_j = \frac{\sum_{a,a' \in A_j} s_{aa'}}{n_j (n_j - 1)}.    (5)

    H^0 weights the contribution of each cluster to the cost proportionally to the cluster size:

        H^0 = -\sum_j n_j \cdot Avg_j.    (6)

    In Section 3, we modify H^0 to adapt it for the coupled clustering setting.
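    To make the cost function concrete, the following is a minimal Python sketch of H^0; it is our own illustration rather than code from the paper, assuming a symmetric NumPy similarity matrix sim and a hard assignment array labels mapping each element to one of k clusters:

    import numpy as np

    def h0_cost(sim, labels, k):
        # H^0 (Equation 6): minus the sum over clusters of n_j * Avg_j,
        # where Avg_j (Equation 5) is the average within-cluster similarity.
        cost = 0.0
        for j in range(k):
            members = np.flatnonzero(labels == j)
            n_j = members.size
            if n_j < 2:
                continue  # a singleton cluster has no within-cluster pairs
            block = sim[np.ix_(members, members)]
            # exclude self-similarities on the diagonal;
            # n_j * (n_j - 1) ordered pairs remain
            avg_j = (block.sum() - np.trace(block)) / (n_j * (n_j - 1))
            cost -= n_j * avg_j
        return cost

    # Example usage on toy data: 6 elements, k = 2
    rng = np.random.default_rng(0)
    sim = rng.random((6, 6)); sim = (sim + sim.T) / 2
    labels = np.array([0, 0, 0, 1, 1, 1])
    print(h0_cost(sim, labels, k=2))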

    2.2. Feature-based Similarity Measures

    Similarity measures are used within many applications: data mining (Das, Mannila and Ronkainen, 1998), image retrieval (Ortega et al., 1998), document clustering (Dhillon and Modha, 2001), and approximation of syntactic relations (Dagan, Marcus and Markovitch, 1995), to mention just a few. The current paper aims at a different approach to similarity of composite objects, more detailed than the conventional single-valued similarity measures. However, as a pre-processing step preceding the application of the coupled clustering procedure, we calculate similarity values pertaining to the data elements, which are, in our experiments, keywords extracted from textual corpora. The required similarity values can be induced, in principle, in several ways: they could be obtained, for example, through similarity assessments by experts or by naive human subjects exposed to the relevant data. An alternative way is to calculate similarities from feature vectors representing the data elements. There are many alternatives, as well, for obtaining appropriate feature-based vectorial representations. The method for this heavily depends, of course, on the specific data under study. In general terms, and within textual data in particular, the context in which data elements are observed is often used for feature extraction. This approach conforms to the observation that two objects (for example, keywords extracted from a given corpus) are similar if they consistently occur in similar contexts. Thus, a keyword can be represented as a list of values referring to other words co-occurring with it along the text, e.g., the corresponding co-occurrence counts or co-occurrence probabilities. In this representation, each dimension in the resulting feature space corresponds to one of the co-occurring words. The resulting (sparse) vectors, whose entries are co-occurrence counts or probabilities, can underlie distance or similarity calculations.

    Numerous studies concerning co-occurrence-based measures have been directed to calculating similarity of words. The scope of co-occurrence ranges from counting occurrences within their specific syntactic context (Lin, 1998) to a sliding window of 40-50 words (Gale, Church and Yarowsky, 1993) or an entire document (Dhillon and Modha, 2001).
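    As an illustration of the sliding-window variant, the following sketch (our own; the tokenization and window size are simplifying assumptions, not the exact procedure of any of the cited works) counts, for each keyword, co-occurrences with all words appearing within a fixed-size window around it:

    from collections import Counter, defaultdict

    def cooccurrence_vectors(tokens, keywords, window=20):
        # For each keyword occurrence, count every other word within
        # `window` positions; the counts form that keyword's feature vector.
        keyword_set = set(keywords)
        vectors = defaultdict(Counter)
        for i, tok in enumerate(tokens):
            if tok not in keyword_set:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[tok][tokens[j]] += 1
        return vectors

    # Example usage on a toy token stream:
    tokens = "the soldier crossed the border while the diplomat spoke".split()
    print(dict(cooccurrence_vectors(tokens, ["soldier", "diplomat"], window=3)))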

    A widely used measure of similarity between co-occurrence vectors of words is their cosine, i.e. the dot product of the normalized vectors (used, e.g., by Dhillon and Modha, 2001). This measure yields 1 for identical co-occurrence vectors (such as the case of self-similarity), and 0 if the vectors are orthogonal, i.e. the two corresponding


    keywords do not commonly co-occur with any word. The remaining cases yield values between 0 and 1, in correlation with the degree of overlap of co-occurrences. This measure, similarly to other straightforward measures, is affected by the data sparseness problem: the common use of non-identical words to refer to similar contexts. One strategy for coping with this issue is to project the co-occurrence data into a subspace of lower dimension (LSI: Latent Semantic Indexing, Deerwester et al., 1990; Schutze, 1992; NMF: Non-negative Matrix Factorization, Lee and Seung, 1999).
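    For concreteness, an LSI-style projection can be sketched as a truncated singular value decomposition of the keyword-by-context co-occurrence matrix. This is our own minimal illustration; the cited works differ in weighting and other details:

    import numpy as np

    def lsi_project(X, d):
        # Represent each row of the co-occurrence matrix X by its
        # coordinates along the top-d left singular directions.
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        return U[:, :d] * s[:d]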

    In our calculations, the same issue is tackled through a simpler approach that does not alter the feature space, but rather puts heavier weights on features that are more informative. The information regarding a data element, x, conveyed through a given feature, w, for which similarity is being measured, is assessed through the following term:3

        I(x, w) = \log_2^{+} \frac{p(x \mid w)}{p(x)},    (7)

    where p denotes conditional and unconditional occurrence probabilities, and the '+' sign indicates that 0 is returned whenever the \log_2 function produces a negative value.

    Dagan, Marcus and Markovitch (1995) base their similarity measure on this term:

        sim_{DMM}(x_1, x_2) = \frac{\sum_w \min\{I(x_1, w), I(x_2, w)\}}{\sum_w \max\{I(x_1, w), I(x_2, w)\}}.    (8)

    The similarity value obtained by this measure is higher as the number of highly informative features, providing a comparable amount of information for both elements x_1 and x_2, is larger.

    Lin (1998) incorporates the information term of Equation 7 as well, though differently:

        sim_{L}(x_1, x_2) = \frac{\sum_{w : I(x_1,w) > 0 \wedge I(x_2,w) > 0} \left( I(x_1, w) + I(x_2, w) \right)}{\sum_w \left( I(x_1, w) + I(x_2, w) \right)}.    (9)

    Here, the obtained similarity value is higher as the number of features that are somewhat informative for both elements, x_1 and x_2, is larger, and the relative contribution of those features is in proportion to the total information they convey. Similarly to the cosine measure, both the sim_DMM and sim_L measures satisfy: (i) the maximal similarity value, 1, is obtained for element pairs for which every feature is equally informative (including self-similarity); and (ii) the minimal similarity value, 0, is obtained whenever every attribute is uninformative for at least one of the two elements. Accordingly, our formulation and experiments below follow the convention that a zero value denotes no similarity (see Subsection 3.3 and Section 4). In the coupled clustering experiments on textual data that are described later, we use both of the above similarity measures: we utilize pre-calculated sim_L values for one experiment (Subsection 5.1), and we calculate sim_DMM values, based on word co-occurrence within our corpora, for another experiment (Subsection 5.2).
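    A minimal sketch of the two measures follows, assuming i1 and i2 are dictionaries mapping each feature w to precomputed I(x, w) values (Equation 7); the probability estimates and function names are our own simplifications, as the paper does not specify these implementation details:

    import math

    def information(count_xw, count_w, count_x, total):
        # I(x, w) = log2+( p(x|w) / p(x) ), Equation 7;
        # the '+' convention clips negative logs to zero.
        if count_xw == 0 or count_w == 0:
            return 0.0
        p_x_given_w = count_xw / count_w
        p_x = count_x / total
        return max(0.0, math.log2(p_x_given_w / p_x))

    def sim_dmm(i1, i2):
        # Equation 8: ratio of summed feature-wise minima to summed maxima.
        features = set(i1) | set(i2)
        numer = sum(min(i1.get(w, 0.0), i2.get(w, 0.0)) for w in features)
        denom = sum(max(i1.get(w, 0.0), i2.get(w, 0.0)) for w in features)
        return numer / denom if denom > 0 else 0.0

    def sim_lin(i1, i2):
        # Equation 9: information mass of features informative for both
        # elements, relative to the total information mass.
        features = set(i1) | set(i2)
        numer = sum(i1[w] + i2[w] for w in features
                    if i1.get(w, 0.0) > 0 and i2.get(w, 0.0) > 0)
        denom = sum(i1.get(w, 0.0) + i2.get(w, 0.0) for w in features)
        return numer / denom if denom > 0 else 0.0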

    3. The expectation of the term given by Equation 7 over co-occurrences of all x's and w's, provided the unaltered \log_2 function is in use, defines the mutual information of the parameters x and w (Cover and Thomas, 1991).


    Figure 3: The coupled clustering setting. The diamonds represent elements of the given data sets A and B. The long arrow represents one of the values in use: a similarity value pertaining to two elements, one from each data set. The short arrow stands for one of the disregarded similarity values within a data set.

    3. Algorithmic Framework for Coupled Clustering

    In this section, we define the coupled clustering task and introduce an appropriate setting for accomplishing it. We then present alternative cost criteria that can be applied within this setting and describe the search method that we use to identify coupled-clustering configurations of low cost.

    3.1. The Coupled Clustering Problem

    As we note in Section 1, coupled clustering is the problem of partitioning two data sets into corresponding subsets, so that every subset is matched with a counterpart in the other data set. Each pair of matched subsets forms jointly a coupled cluster. A coupled cluster consists of elements that are similar to one another and distinct from elements in other clusters, subject to the context imposed by aligning the clustered data sets with respect to each other.

    3.2. Pairwise Setting Based on Between-data-set Similarities

    Coupled clustering divides two given data sets, denoted by A and B, into disjoint subsets A_1, ..., A_k and B_1, ..., B_k. Each of these subsets is coupled with the corresponding subset of the other data set, that is, A_j is coupled with B_j for j = 1, ..., k. Every pair of coupled subsets jointly forms a unified coupled cluster, C_j = A_j \cup B_j, containing elements of both data sets (see Figure 3). We approach the coupled clustering problem through a pairwise-similarity-based setting, incorporating the elements of both A and B. Our treatment is independent of the method through which similarity values are compiled: feature-based calculations such as those described in Subsection 2.2, subjective assessments, or any other method.

    The notable feature distinguishing our method from standard pairwise clustering is the set of similarity values, S, that is considered. A standard pairwise clustering procedure potentially considers all available similarity values referring to any pair of elements within the single clustered data set, with the exception of the typically excluded self-similarities. In the coupled clustering setting, there are two different types of available similarity values. Values of one type denote similarities between elements within the same data set (within-data-set similarities; short arrow in Figure 3).


    Values of the second type denote similarities of element pairs consisting of one element from each data set (between-data-set similarities; long arrow in Figure 3). As an initial strategy, to be complied with throughout this paper, we choose to ignore similarities of the first type altogether and to concentrate solely on between-data-set similarities: S = \{s_{ab}\}, where a \in A and b \in B. Consequently, the assignment of a given data element into a coupled cluster is directly influenced by the most similar elements of the other data set, regardless of its similarity to members of its own data set.
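    For instance, the following minimal sketch (our own illustration, with hypothetical names) assembles the restricted collection S as an |A| x |B| matrix from any pairwise similarity function, such as sim_dmm above, never computing within-data-set similarities:

    import numpy as np

    def between_set_similarities(elements_a, elements_b, sim_fn):
        # S = {s_ab}: only pairs with a in A and b in B are retained;
        # within-data-set pairs are never computed or stored.
        S = np.empty((len(elements_a), len(elements_b)))
        for i, a in enumerate(elements_a):
            for j, b in enumerate(elements_b):
                S[i, j] = sim_fn(a, b)
        return S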

    The policy of excluding within-data-set similarities captures, according to our conception, the unique context posed by aligning two data sets, representing distinct domains, with respect to one another. Correspondences that are special to the current comparison, underlying the presumed parallel or analogous structure of the compared systems, are thus likely to be identified, abstracted away from the distinctive information characterizing each system individually. Whether and how to incorporate the available information regarding within-data-set similarities, while maintaining the contextual orientation of our method, is left to follow-up research.

    3.3. Three Alternative Cost Functions

    Given the setting described above, in order to identify configurations that accomplish the coupled clustering task, our next step is defining a cost function. In formulating it, we closely follow the standard pairwise-clustering framework presented by Puzicha, Hofmann and Buhmann (2000; see Subsection 2.1 above). Given a collection of similarity values S pertaining to the members of two data sets, A and B, we formulate an additive cost function, H(S, M), which assigns a cost value to any coupled-clustering configuration M. Equipped with such a cost function and a search strategy (see Subsection 3.4 below), our procedure would be able to output a coupled clustering configuration specifying assignments of the elements into a pre-determined number, k, of coupled clusters. We concentrate on Puzicha et al.'s H^0 cost function (Subsection 2.1, Equation 6), which is limited to similarity values within each cluster and weights each cluster's contribution proportionally to its size. Below we present and analyze three alternative cost functions derived from H^0.

    As in clustering in general, the coupled clustering cost function should assign similar elements into the same cluster and dissimilar elements into distinct clusters (as articulated by the monotonicity axiom in Subsection 2.1). A coupled-clustering cost function is thus expected to assign low cost to configurations in which the similarity values, s_{ab}, of elements a and b of coupled subsets A_j and B_j are high on average. (The dual requirement, to assign low cost whenever the similarity values of elements a and b of non-coupled subsets A_j and B_{j'}, j \neq j', are low, is implicitly fulfilled.) In addition, we seek to avoid the influence of transient or minute components (those that could have evolved from casual noise or during the optimization process) and to maintain the influence of stable, larger components. Consequently, the contribution of large coupled clusters to the cost is greater than the contribution of small ones with the same average similarity. This direction is realized in H^0 through weighting each cluster's contribution by its size.

    In the coupled-clustering case, one apparent option is to straightforwardly apply the original H^0 cost function to our restricted collection of similarity values. The average similarity of each cluster is then calculated as

        Avg'_j = \frac{\sum_{a \in A_j, b \in B_j} s_{ab}}{n_j (n_j - 1)},
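    To illustrate, a minimal sketch (ours, not the paper's code) of this straightforwardly adapted average, taking the between-data-set matrix S produced above and hard assignment arrays labels_a and labels_b, and assuming n_j denotes the size of the unified coupled cluster C_j, analogously to Equation 5:

    import numpy as np

    def avg_prime(S, labels_a, labels_b, j):
        # Avg'_j: sum of between-data-set similarities inside the coupled
        # cluster C_j = A_j union B_j, normalized by n_j * (n_j - 1).
        rows = np.flatnonzero(labels_a == j)
        cols = np.flatnonzero(labels_b == j)
        n_j = rows.size + cols.size
        if n_j < 2:
            return 0.0
        return S[np.ix_(rows, cols)].sum() / (n_j * (n_j - 1))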

