DOC

Supplentary Data - Bioinformatics

By Micheal Payne,2014-01-29 05:44
10 views 0
Supplentary Data - BioinformaticsData,data

Supplementary Information

    CAM-CM: A Signal Deconvolution Tool for In Vivo Dynamic Contrast-enhanced Imaging of Complex Tissues

    1,*1,2,*34Li Chen, Tsung-Han Chan, Peter L. Choyke, Elizabeth M. C. Hillman, Chong-72565Yung Chi, Zaver M. Bhujwalla, Ge Wang, Sean S. Wang, Zsolt Szabo, Yue 1,† Wang

16Bradley Department of Electrical and Computer Engineering and School of Biomedical

    Engineering and Sciences, Virginia Polytechnic Institute and state University, Arlington, VA 22203, USA, 2Department of Electrical Engineering, National Tsing Hua University, Taiwan, ROC, 3Molecular Imaging Program, National Cancer Institute, NIH, Bethesda, MD 20892, USA, 4Department of Biomedical Engineering and Radiology, Columbia University, New York, NY 10027, USA, 5Department of Radiology and Radiological Science, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA, 7Science, Mathematics, Computer Science House, Poolesville High School, Poolesville, MD 20837, USA.

     * Contributed equally. To whom correspondence should be addressed.

     1

Contents

    S1 Background ................................................................................................................ 3 S2 Theory........................................................................................................................ 3

    S2.1 Compartment Modeling .............................................................................................. 3

    S2.2 Identifiability of Tissue-specific Compartments .......................................................... 4 S3 Methods ..................................................................................................................... 7

    S3.1 Multivariate Clustering of Pixel Time Series ............................................................... 8

    S3.2 Convex Analysis of Mixtures .................................................................................... 11

    S3.3 Tissue-specific Compartment Analysis ...................................................................... 12 S4 Software Development ............................................................................................. 13

    S4.1 CAM-CM software ................................................................................................... 13

    S4.2 MATLAB library for convex optimization, clustering and dimension reduction......... 16

    S4.3 Simulation Study and Validation ............................................................................... 16 S5 Case studies.............................................................................................................. 18

    S5.1 A case study on DCE-MRI ........................................................................................ 18

    S5.2 A case study on dynamic fluorescence image data ..................................................... 20

    S5.3 A longitudinal case study on DCE-MRI data ............................................................. 25 References ..................................................................................................................... 29

     2

S1 BACKGROUND

    In vivo dynamic contrast-enhanced imaging tools provide noninvasive methods for analyzing various functional changes associated with disease initiation, progression, and responses to therapy (McDonald, et al., 2003). Typical modalities include dynamic

    contrast-enhanced magnetic resonance imaging (DCE-MRI) (Costouros, et al., 2002), dynamic optical imaging (Hillman, et al., 2007), positron emission tomography (Zhou, et al., 1997), and spectroscopic computed tomography (Anderson, et al., 2010). These tools exploit the dynamics of contrast accumulation and washout to produce functionally relevant images of vascular perfusion and permeability, metabolism, or gene expression, and can potentially test novel hypotheses and predict drug efficacy.

    However, due to spatially-mixed tissue heterogeneity, the precise imaging-based phenotyping by these tools has been hindered by its inability to accurately resolve and characterize targeted functional tissue compartments (Hillman, et al., 2007). This

    indistinction between contributions of different tissues to the mixed tracer signals, while often conveniently overlooked, could significantly confound pharmacokinetics compartmental modeling (CM) (Zhou, et al., 1997) and affect the accuracy of genotype-phenotype association studies (Costouros, et al., 2002; Segal, et al., 2007).

    We developed convex analysis of mixtures compartment modeling (CAM-CM)

    signal deconvolution tool that enables geometrically-principled, unsupervised, and accurate characterization and delineation of major functional tissue structures using dynamic contrast-enhanced imaging data, not only dissecting complex tissue into regions with differential tracer kinetics on a pixel-wise resolution but also substantially improving tissue-specific pharmacokinetics parameter estimation (Wang, et al., 2010). CAM-CM is supported by a well-grounded mathematical framework, and combines the advantages of multivariate clustering, convex geometry analysis, and compartmental modeling. The algorithm possesses a novel, powerful feature allowing pure-volume pixels to be readily identified from the measured pixel time series, without any knowledge of the associated compartment pharmacokinetics, leading to a completely unsupervised approach. We provide CAM-CM software as an open-source standalone MATLAB application.

S2 THEORY

    We first introduce the typical compartment modeling for analyzing dynamic contrast-enhanced imaging data, and then present the theory of how the tissue compartments can be identified, which gives prime motivations of CAM-CM method.

S2.1 Compartment Modeling

    Consider J-tissue compartment model for dynamic contrast-enhanced image time series, where the tracer concentration kinetics is governed by a set of linear first-order differential equations (Port, et al., 1999; Roberts, et al., 2006; Tofts, et al., 1999; Zhou, et al., 1997):

     3

    dCt()InOut1!,KCtkCt()(),111pdt

     (1) dCt()InOutJ1!,KCtkCt()(),,,,JpJJ111dt

    !???CtCtCtKCt()()()(),ms11Jpp

    jJ!,1,2,...,1where , , is the tracer concentration (TC) in the interstitial space Ct()j

    tweighted by the fractional interstitial volume in the tissue-type j, at time ; is the Ct()p

    tracer concentration in plasma (or the plasma input function) and corresponds to the Jth

    tissue type; is the measured tracer concentration in the region of interest (ROI); Ct()ms

    InOutand are the wash-in rate and wash-out rate constants in the tissue-type j, Kkjj

    respectively; and is the plasma volume in tissue. We acknowledge that there are Kp

    alternative compartment models that may be more suitable for some particular applications, and we are currently investigating extended methods that can adapt to the alternative compartment models. Equation (1) can be solved for in terms CtCt(),,()11J

    of the rate constants as

    InOut (2) CtKCtktjJ()()exp(), 1,...,1!),!,jjpj

    )where denotes the mathematical convolution operation.

    OutjJ!,1,2,...,1Let , , and . By equations FtCt()()FtCtkt()()exp()!),Jpjpj

    (1) and (2), the spatial-temporal patterns of tracer concentrations in dynamic contrast-enhanced imaging data can be expressed as the following latent tissue-specific compartment model (Wang, et al., 2006):

    InCitFtFtFt(,)()()()??Ki()????ms111111JJ1??????CitFtFtFt(,)()()()ms212122JJ?????? (3) , In??????Ki()1J??????CitFtFtFt(,)()()()Ki()??????ms11LLJLJL????P??

    iiN1,...,at pixel with (with N being the total number of pixels), where is the Cit(,)msl

    LlL1,2,...,tracer concentration measured at time , , is the number of sampling time tl

    InInpoints, are the local wash-in constants of the tissue-type 1 to tissue-type KiKi(),...,()11J

    iJ-1, at pixel i, respectively, and is the local plasma volume at pixel . We should Ki()p

    emphasize that our goal is to estimate these (unknown) kinetic parameters InIn and the tissue-specific concentration curves FtFt(),...,()KiKiKi(),...,(),()1lJl11Jp

    OutOutdetermined by the parameters from the measurements . Next some Cit(,)kk,...,msl11J

    realistic conditions on these kinetic parameters will be given, by which the tissue-specific

    compartments can be shown to be perfectly identifiable.

S2.2 Identifiability of Tissue-specific Compartments

     4

For ease of analysis in the sequel, we normalize and over their effective Cit(,)Ft;?msj

    Linterval of time samples via a sum-based normalization that projects the scatter plot data points onto the standard simplex as follows

    Ft()Cit(,)jlmslat()xit(,), , . lL1,...,ljlLLCit(,)Ft()??ms'l'jll'1'1l

    We then re-express (3) as

    , (4) xaaK(),...,(), 1,...,iiiN!!(;1J

    TTT??xixitxit(,),...,(,)aatat(),...,()where , , is K()[(),...,()]iKiKi;?(;11LjjjL1J??

    Jaccordingly normalized with . After normalization, the physical meanings Ki()1?j1j

    of the pharmacokinetics parameters shall be interpreted as the relative local wash-in constants. Since these local wash-in constants and local plasma volume,

    InInKiKiKi(),...,(),() or , are non-negative, the pixel time series KiKi(),...,()????11Jp1J

    model (4) immediately indicates that the observed pixel time series is a non-xi;?

    negative linear combination of the tissue-specific compartment TCs, , weighted aa,...,1J

    Jby their spatially-distributed local wash-in constants, with . Ki()1KiKi(),...,()?1Jj1j

    This immediately implies that the observed set of pixel time series is Xxx(1),...,()N??

    a subset of the convex hull of the set of compartment TCs (a convex set readily defined by compartment TCs), where (Boyd, et al., 2004); that is, HAAaa,...,????1J

    JJ. (5) XHAA,!??!,,,aa|, 0,1??????jjjjj!!11jj

    With (5) in mind, an important question is which conditions are needed to support that

    Athe tissue compartment TCs can be theoretically identified from the pixel time series X.

    Before answering this question, some definitions shall be first introduced as follows (Chen, et al., 2008) (Wang, et al., 2010):

    Definition 1. (Corner points) A compartment TC is a corner point of the convex set aj

     if and only if it can only be expressed as a trivial convex combination of . aa,...,HA??1J

    Definition 2. (Well-grounded points) Any pixel time course whose associated normalized

    espatially-distributed wash-in constants are in the form of (where is Ke()i??jWGP()jj

    the standard basis of J-dimensional real space) is called a well-grounded point (WGP) and corresponds to a pure-volume pixel, i.e., . xaaKa(),...,()ii!!(;WGPjJjj()1WGP()

    The following theorem provides the identifiability of the tissue compartments: Theorem 1 (Convexity of pixel time series). Suppose that the J compartment TCs

    J are linearly independent, and where non-negative aa,...,xa()()iKi?1Jjj1j

    normalized spatially-distributed wash-in constants have at least one well-K()i??

     5

    Xgrounded point on each of the J coordinate axes, then the convex set specified by , i.e.,

    NN, is identical to the convex set HXX{}|, 0, 1!??!,,,xxii;?;?????iii!!11ii

    , and its corner points are the J compartment TCs . aa,...,HA??1J

    xaiProof: By Definition 2 that , then for any we have z?HA;???WGP()jj

    Jza?jjj1

    J x i;??jjWGP()j1

    ??,,ii??jjWGP()N?!!x ', where 'i;??ii?,,i10,,ii???jWGP()? NN'''zxx?!??!HXX,,,ii|, 0, 1implying , i.e., . HAHX??;?;?????????iii!!11ii

    On the other hand, for any , we have z?HX??

    Nzxi;??ii1

    NJKia ;???ijjij!!11 JN??Kia ;???ijjji!!11??

    JNJa!!Ki = , where and 1,;????jjjijj::,:jij!!!111

    implying , i.e., . Therefore, combining and z?HAHXHAHXHA???????????

     gives . Next, we show that are corner points of aa,...HXHAHXHA????????1J

    J. Since are linearly independent, we have aa,...,,a0!!; iff 0 jHA???1Jjjj1j

    which implies that

    J'''T aae= iff [,...,] ,,,!;j?1jjjJj1j

    i.e., can only be a trivial convex combination of . By Definition 1, aa,...,aa,...,a1J1Jj

    are therefore the corner points of convex set , and together with , HXHAHA??????

    we readily complete the proof of Theorem 1. Q.E.D

    The inferences of Theorem 1 are twofold: Firstly, given a set of observed pixel time

    Aseries, the compartment TCs can be identified by searching the corner points of the

    HX{}pixel time series convex set when pure-volume pixels exist for each of the tissue compartments (Rijpkema, et al., 2001). Secondly, the pure-volume pixels constitute the corner points of the observed pixel time series convex set, while the partial-volume pixels constitute the interior points of the observed pixel time series convex set.

    One important consideration with the present method is the existence of functionally pure-volume pixels for each of the underlying compartments, and this reasonable assumption reflects only the ideal scenario and constitutes the necessary and sufficient condition for the mathematical identifiability of the signal deconvolution model

     6

    (equations (1)-(3)). Also, our CAM-CM solution, introduced in the next section, is proposed to identify the corners of , i.e. the time series of pure-volume pixels, by HA??

    . Nevertheless, it is possible that in some datasets, no identifying the corners of HX{}

    pixel is pure and it would be helpful to provide an accurate interpretation of the CAM-CM solution in such non-ideal scenarios. The following theorems show that, if source-dominance pixels exist for each of the underlying tissue compartments, CAM-CM will provide the optimal solution that captures maximum source information (i.e., with the identified corners of the pixel time series scatter simplex corresponding to maximum source-dominance).

Theorem 2 (Source dominance). Suppose that the non-negative normalized pixel-wise

    local wash-in rates K(c)=[K(c),…, K(c),…,K(c)] are the corners of the pixel time j1jmjJj

    series scatter simplex. Then the CAM-CM solution based on these corners achieves the

    maximum source dominance in the sense of K(c)=max K(i). mji=1,2,…,Nm

Proof of theorem 2. Consider the pixel K(i*)=α(i*)K(c)+α(i*)K(c)+…+α(i*)K(c) of 1122JJ

    the convex hull defined by these corners, whose mth entry is the largest among all pixels,

    i.e., K(i*)=max K(i). Since α(i*)+α(i*)+…+α(i*)=1, we may therefore write mi=1,2,…,Nm12J

K(i*)= (α(i*)+α(i*)+…+α(i*))K(i*)=α(i*)K(i*)+α(i*)K(i*)+…+α (i*)K(i*). m12Jm1m2mJm

Alternatively, the mth entry of K(i*) can be expressed as

    K(i*)=α(i*)K(c)+α(i*)K(c)+…+α (i*)K(c). m1m12m2JmJ

By the unique convex expression of K(i*), we have m

     α(i*)[K(i*)-K(c)]+α(i*)[K(i*)-K(c)]+…+α (i*)[K(i*)-K(c)]=0, 1mm12 mm2JmmJ

which, together with the fact α(i*)?0 and K(i*)-K(c)?0, implies i*={c}. Q.E.D 1mmjj

    The results provided by Theorem 1 and Theorem 2 would allow us to gain further insights beyond the dynamic contrast-enhanced data themselves into how the temporal patterns of the underlying compartment TCs geometrically located at the pixel time series scatter simplex, facilitating the design of separation principle in CAM-CM.

S3 METHODS

    As we introduced in the Background section, the CAM-CM method uses in vivo dynamic

    contrast-enhanced imaging data to analyze various functional changes associate with disease initiation, progression and responses to therapy. We herein demonstrate the procedure in the CAM-CM method in Fig. S1. Input pixel time courses within the tumor

    ROI are first normalized onto a simplex and then processed by the following three core components: (1) initialization-free multivariate clustering method that assumes observed pixel time series follow standard finite normal mixture (SFNM) Xxx(1),...,()N??

     7

    Xmodel, and aggregates into a few clusters by using expectation-maximization (EM) algorithm (Miller, et al., 2003) initialized with affinity propagation clustering (APC) (Frey, et al., 2007), in an attempt to reduce the impact of noise/outlier pixels on model learning; see Section S2.1; (2) convex analysis of mixtures (CAM) that identifies the tissue-specific compartment TCs by finding the pure-volume clusters located at aa,...,1J

    the corners of the clustered pixel time series scatter simplex; the details will be given in Section S2.2; and (3) compartment modeling that estimates tissue-specific kinetic

    InInparameters using only pure-volume pixel time series. See KiKiKi(),...,(),()11Jp

    discussion in Section S3.3.

    Figure S1. Pictorial flow chart of the CAM-CM method (illustrated on the special case of J=3, with three tissue types, “plasma input”, “fast flow”, and “slow flow”, associated with the compartment model (Wang, et al., 2006).

S3.1 Multivariate Clustering of Pixel Time Series

    There has been considerable success in using SFNMs to model clustered data sets, such as dynamic contrast-enhanced imaging data (Chen, et al., 2008), taking a sum of the following general form:

    JMpigigiKKe()()|,()|,!?~~ΣKμΣ, (6) ;?;?;???KKK,,,mmmmmm!!?11mmJ

     8

where the first term corresponds to the clusters of pure volume pixels , the pmJ1,...,;?

    Msecond term corresponds to the clusters of partial volume pixels , is pmJM!?1,...,;?

    the total number of pixel clusters, is the mixing factor, is the Gaussian p~pg;?m

    probability density function, and and are the mean vector and covariance pμΣK,mK,m

    matrix of cluster , respectively. By incorporating (4) into (6), the SFNM model for pm

    pixel time series becomes

    JMppigigixxa()()|,()|,!?~~ΣxμΣ, (7) ;?;?;???mmmmmmxxx,,,mmJ!!?11

    TμAμwhere and , with . Accordingly, the first ΣAΣAAaa,...,(;xK,,mmxK,,mm1J

    term of (7) represents the pure volume clusters and the second term of (7) represents the

    Xpartial volume clusters, and as shown in Fig. S1 the clustered pixel time series set is

    (approximately) confined within a convex set whose corner centers are the J

    compartment TCs . aa,...,1J

     It has been shown that significant computational savings can be achieved by using the EM algorithm to allow a mixture of the form (7) to be fitted to the data (Chen, et al., 2004; Titterington, et al., 1985). Determination of the parameters of the model (7) can be viewed as a “missing data” problem in which the missing information corresponds to

    pixel labels specifying which cluster generated each data point with limI,Iim,;?;?im

    denoting the indicator function. If we were given a set of already clustered data with specified pixel labels, then the log likelihood (known as the complete data log-

    likelihood) becomes

    NM

    ??LX(|,)log()|,ΘLxlgi~μΣ, (8) ????xx,,immmm??!!11im

    where . However, we and L!!!{,1,2,...,,1,2,...,}liNmNΘμΣ!;{,,,}~mimmmmxx,,

    only have indirect, probabilistic information in the form of the posterior responsibilities

    x()i for each model m having generated the pixel time series . Taking the expectation zim

    of (8), we then obtain the complete data log likelihood in the form

    NM

    ??LX(|,)log()|,ΘZxμΣzgi~, (9) ????,,,xximmmm??!!11im

    where the are constants and . Z!!!{,1,2,...,,1,2,...,}ziNmNzli!!Pr1|()x;?im,imim,

     Maximization of (9) can be performed using the two-stage form of the EM algorithm. At each complete cycle of the algorithm we commence with an “old” set of parameter values . We first use these parameters in the E-step to ΘμΣ!;{,,,}~mmmmxx,,

    evaluate the posterior probabilities using Bayes theorem zim,

    ??~gix()|,μΣmmmxx,,?? . (10) zlimM!!!?Pr1|(), 1,...,x;???imim,M??gix()|,μΣ~?mmm',','xx??m'1

    These posterior probabilities are then used in the M-step to obtain “new” values

     using the following re-estimation formulas ΘμΣ!;{,,,}~mmmmxx,,

     9

    N1, (11) ~z?,mimN1i

    Nzix()?,im1i, (12) μx,mNz?,im1iTNziix()(),,μxμ;?;??,,,xximmm1iΣ. (13) x,mNz?,imi1

     To reduce the likelihood of pixel time series clustering being trapped into local maxima, an effective and initialization-free affinity propagation clustering (APC) is attempted to initialize the parameter Θ for the EM algorithm and to estimate the number

    of obtainable clusters (Frey, et al., 2007). We simultaneously consider all data points as

    potential exemplars (cluster centers) and recursively exchange real-valued messages between data points until a high-quality set of exemplars and corresponding clusters

    2simim,!,,xxgradually emerges. Let the “similarity” indicate how well the ;?;?;?

    data point is suited to be the exemplar for data point ; the “responsibility” xmxi;?;?

     reflects the accumulated evidence for how well-suited data point is to rim,xm;?;?

    serve as the exemplar for data point , and the “availability” reflects the aim,xi;?;?

    accumulated evidence for how appropriate the data point chooses data point xmxi;?;?

    as its exemplar. Then, the responsibilities are computed based on rim,;?

    , (14) rimsimaimsim,,max,','?,?;?;?;?;???mm'(

    where the availabilities are initialized to zero and the competitive update rule (14) aim,;?

    is purely data-driven. Whereas the responsibility update (14) allows all candidate exemplars to compete for ownership of a data point, the availability update rule

    ????aimrmmrim,min0,,max0,', (15) ??;?;?;??????iim',??????

    collects evidence from data points to support a good exemplar, where the “self-

    availability” is updated differently

    . (16) ammrim,max0,',?;?;????im'

    At any iteration during affinity propagation, availabilities and responsibilities are combined to identify exemplars and to terminate the algorithm when these decisions do not change for 10 iterations (Frey, et al., 2007). One advantage of APC is that the number

    of clusters need not be specified a priori but emerges from the message-passing procedure and only depends on the density distribution of the data points and a parameter. The preference associated with similarities s(k,k) for all data points can be taken as input

    of APC, and can be varied to produce different number of clusters. The shared value could be the median of the input similarities (resulting in a moderate number of clusters) or their minimum (resulting in a small number of clusters). We use the median of the

     10

Report this document

For any questions or suggestions please email
cust-service@docsford.com