EVALUATION OF THE CL CELL ONTOLOGY
Questions and comments can be addressed to Lindsay Cowell at
An ontological representation of the entities relevant to biological research is urgently needed. The cell ontology developed by Bard and colleagues (CL) (Bard et al. 2005) makes a significant contribution towards fulfilling this need by providing an ontology of cell types. The CL has already proven useful for data annotation (e.g. Grumbling et al. 2006), although the ontology's potential utility goes well beyond that specific application. For example, using the number of distinct cell types in an organism as a measure of biological complexity, Vogel and Chothia (2006) compared the proteomes of 38 organisms of varying complexity and identified patterns in the evolution and expansion of protein domain superfamilies. This work has implications for some of the fundamental questions in biology, such as understanding the processes by which physiology becomes more intricate, new cell types arise, and biological complexity increases. While Vogel and Chothia did not yet utilize the CL for this work, they cite Bard et al. (2005) and describe the ontology's value for improving and extending their analysis. Thus, in addition to its great utility for database annotation, the CL has the potential to play a significant role in basic scientific inquiry.
The prospect of using the CL (and other ontologies) for this type of scientific research is extremely exciting but also imposes requirements on the level of formal rigor applied in ontology development, on the adequacy of the ontology as a representation of reality, and on its adherence to community standards of best practice. It is with these things in mind that I examined the CL to determine whether any revision would be required before my research group could use it for scientific research.
After carefully evaluating the CL, my overall impression is that it does not possess the rigor and exactness required of a reference ontology. Furthermore, the problems I see are significant enough that it would be difficult for my research group to use the CL as our application ontology. While some of the problems could be resolved by changing a relation or rewriting a definition, others would require careful rethinking of the ontology's foundation, because they involve the scope and organizing principle of the ontology as a whole.
I have attempted to classify the problems that I see with the CL and have provided a few examples in the section General Comments (2). A more thorough description of the ontology's structure and the problems observed with each analyzed term follows in Detailed Evaluation (3). Finally, in Concluding Remarks (4), I suggest potential changes to the CL, changes that I believe could solve many of the problems outlined in (2) and (3) and that will render the CL usable for research like that described by Vogel and Chothia (2006), research that requires computation over multiple ontologies, and research that relies on automated reasoning.
I recognize that it is relatively easy to identify problems with a particular ontology and extremely difficult to create a good one. I have tried to describe the issues that, in my opinion, must be resolved and which I believe can be resolved in a reasonable time. I have also tried to provide a set of suggested solutions that can serve as a starting point for discussion within the CL community. It is my hope that this evaluation will stimulate discussion that may lead to improvements in the CL and facilitate its broader use.
1.1 Version of the Cell Ontology Used for Evaluation
I evaluated a version of the CL ontology dated May 25, 2006. I received this version by email in response to an email request I sent to Oliver Hofmann (email@example.com).
This version does not include the changes to cells of hematopoietic lineage that were suggested by Alex Diehl. I have seen Alex's proposal, however, and have commented briefly on his suggested changes in section 3.4.
1.2 Selection of Terms for Evaluation
I started with the root term (cell) [1] and searched for all terms labeled with the relation is_a:
CL:0000000 ! cell. From the returned set of terms, I selected a term and searched for all of its immediate is_a subclasses. I continued down the tree iteratively this way, selecting at each level either the terms of interest to me as an immunologist or those that seemed important for the structure of the ontology. I ignored relations that are not is_a and terms
with the relation is_a: CL:0000610 ! plant cell. The terms I selected at each level are
indicated in the trees below in bold. The numbers in parentheses correspond to the non-zero digits in the CL unique identifiers (e.g. plant cell (610)).
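The traversal described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample text is an abbreviated, hypothetical OBO fragment (the identifiers follow the numbers given in the trees of this report, but the stanzas are not copied from the evaluated file), and the helper names parse_terms and direct_subclasses are invented for this sketch.

```python
# Hypothetical, abbreviated OBO fragment; not the evaluated CL file.
SAMPLE_OBO = """\
[Term]
id: CL:0000000
name: cell

[Term]
id: CL:0000003
name: cell in vivo
is_a: CL:0000000 ! cell

[Term]
id: CL:0000578
name: experimentally modified cell
is_a: CL:0000000 ! cell

[Term]
id: CL:0000012
name: cell by class
is_a: CL:0000003 ! cell in vivo
"""


def parse_terms(text):
    """Return {term id: {"name": str, "is_a": [parent ids]}} for [Term] stanzas."""
    terms = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Start a new record for [Term]; ignore any other stanza type.
            current = {"name": "", "is_a": []} if line == "[Term]" else None
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            if tag == "id":
                terms[value] = current
            elif tag == "name":
                current["name"] = value
            elif tag == "is_a":
                # Drop the trailing "! name" comment and keep only the id.
                current["is_a"].append(value.split("!")[0].strip())
    return terms


def direct_subclasses(terms, parent_id):
    """Return the sorted ids of terms with a direct is_a link to parent_id."""
    return sorted(tid for tid, term in terms.items() if parent_id in term["is_a"])


terms = parse_terms(SAMPLE_OBO)
children_of_root = direct_subclasses(terms, "CL:0000000")
```

Calling direct_subclasses again on each selected child reproduces the level-by-level walk down the is_a tree; other relations are ignored, as in the procedure above.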
For each term, I evaluated both the definition and is_a relations for biological correctness and consistency with the stated organizing principle for that branch of the ontology.
2 General Comments
2.1 Scope of CL
The CL cell ontology intends to cover cells from the “prokaryotic, fungal, animal and plant worlds” (Bard et al. 2005). Although I agree that we need an ontological representation of all these cell types, I don't think that they can be effectively covered in one ontology. There is significant information loss when species differences are compressed into a single ontology. This hampers use of the ontology in applications where species differences between cell types are important, for example when making predictions about the immune response in humans based on experimental results from the mouse. I would suggest one small master cell ontology, together with a separate ontology for each organism type, with links between ontologies to indicate comparable cell types. Perhaps the approach being taken for anatomy in the CARO project can be used as a model.
[1] Italics indicate a CL term or relation.
2.2 Multiple Inheritance and the Confusion of Organizing
Most of the classes in the CL inherit properties over multiple paths to the root. The problem is that many classes inherit properties that are not true for them, and some inherit contradictory properties. The problems arise from two sources: 1) In general, cell types were not defined and classified based on a stated organizing principle; rather, commonly used terms (whose referents are often unclear) were placed in a hierarchical structure to capture information about properties of cells such as lineage and processes in which a cell participates, and 2) The relation A is_a B is often included in the ontology even when
properties of B are not true for some of the subclasses of A.
For example, in an attempt to capture information about the histology and lineage of different cell types, two branches were constructed, the cell by histology and cell by
lineage branches. The highest distinction under cell by histology is histological (e.g.
epithelial cell and mesenchymal cell), and the highest distinction
under cell by lineage is based on lineage (e.g. endodermal cell and mesodermal cell).
The class mesenchymal cell is related to mesodermal cell is_a cell by lineage by the
relation develops_from, and all relations directly under mesenchymal cell are
develops_from (as are most of the relations anywhere under mesenchymal cell). The
class mesenchymal cell, along with all classes below it, has been copied from the cell by
lineage branch and inserted as an is_a subclass to cell by histology. This is true for many
subclasses of cell by histology so that, below the first level, the majority of classes are based on a lineage rather than a histological classification.
An example of inheriting contradictory properties comes from the cell by function branch
of the ontology which is intended to be a classification of cell types based on their “primary end goal or behavior.” The class phagocyte (sensu Vertebrata) is duplicated
under blood cell is_a circulating cell is_a cell by function and under phagocyte is_a
motile cell is_a cell by function. Thus all subclasses of phagocyte (sensu Vertebrata)
inherit both passive (from circulating cell) and active (from motile cell) movement as
their “primary end goal or behavior.” Even if one reconciled this contradiction by saying
that the duplication is intended to capture the fact that phagocytes are capable of both modes of movement, they can't both be true at the same time. Furthermore, there are
subclasses of phagocyte (sensu Vertebrata) (e.g. tissue resident macrophages) that do not
circulate and so should not inherit from circulating cell.
I strongly recommend single inheritance trees based on one single mode of classification that is maintained throughout the tree.
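The phagocyte example above can be checked mechanically. The following is a minimal sketch, assuming the is_a links are held in a plain dictionary; the edges are the ones named in this section, while the MOVEMENT labels ("passive", "active") and the function names are hypothetical stand-ins for whatever property annotations an ontology would actually carry.

```python
# is_a links taken from the phagocyte example in this section.
IS_A = {
    "phagocyte (sensu Vertebrata)": ["blood cell", "phagocyte"],
    "blood cell": ["circulating cell"],
    "phagocyte": ["motile cell"],
    "circulating cell": ["cell by function"],
    "motile cell": ["cell by function"],
    "cell by function": [],
}

# Hypothetical labels for the mode of movement asserted at each class.
MOVEMENT = {"circulating cell": "passive", "motile cell": "active"}


def paths_to_root(term, is_a):
    """Return every is_a path from term up to a class with no parents."""
    parents = is_a.get(term, [])
    if not parents:
        return [[term]]
    return [[term] + rest for p in parents for rest in paths_to_root(p, is_a)]


def inherited_values(term, is_a, prop):
    """Collect the distinct property values asserted along any path to the root."""
    values = set()
    for path in paths_to_root(term, is_a):
        for ancestor in path:
            if ancestor in prop:
                values.add(prop[ancestor])
    return values


paths = paths_to_root("phagocyte (sensu Vertebrata)", IS_A)
conflict = inherited_values("phagocyte (sensu Vertebrata)", IS_A, MOVEMENT)
```

A class with more than one path, or with more than one inherited value for a property that should be single-valued, is a candidate for review. In a single-inheritance tree, paths_to_root would always return exactly one path.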
2.3 Relations in CL
Cell types in the cell ontology are related by either the is_a or the develops_from relation
(Bard et al. 2005). CL defines the is_a relation as “a subsumption relationship, in which
the child term is a more restrictive concept than its parent (thus chondrocyte is_a
mesenchyme_cell)” (Bard et al. 2005). The develops_from relation indicates a
“developmental lineage relationship” (e.g. “hepatocyte develops_from
mesenchymal_cell”) (Bard et al. 2005). Properties are inherited over the is_a relation but
not over the develops_from relation.
I observed two types of problems in the use of these relations:
1. use of is_a when the correct relationship is develops_from, and
2. use of is_a between cell types where, for B is_a A, there are properties of A that are not true of B.
The example above, chondrocyte is_a mesenchyme_cell, is an example of the former and
was taken from the published manuscript Bard et al. 2005. As an example of the second type of problem, take lymphopoietic stem cell is_a hematopoietic stem cell;
hematopoietic stem cell is_a multi fate stem cell is_a somatic stem cell is_a stem cell and
thus inherits “the ability to divide and proliferate throughout life.” Lymphopoietic stem
cells, however, are not capable of lifelong proliferation. The discussion above about phagocytes (2.2) provides a second example.
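To make the difference between the two relations concrete, here is a small sketch in which properties propagate over is_a edges only. The is_a chain and the stem cell property are taken from the text above; the develops_from edge is the one mentioned in section 2.2; the property attached to mesodermal cell, the data layout, and the function name are assumptions made for illustration.

```python
# (child, relation, parent) triples.
EDGES = [
    ("lymphopoietic stem cell", "is_a", "hematopoietic stem cell"),
    ("hematopoietic stem cell", "is_a", "multi fate stem cell"),
    ("multi fate stem cell", "is_a", "somatic stem cell"),
    ("somatic stem cell", "is_a", "stem cell"),
    ("mesenchymal cell", "develops_from", "mesodermal cell"),
]

PROPERTIES = {
    "stem cell": {"divides and proliferates throughout life"},
    # Hypothetical property, present only to show it is NOT inherited.
    "mesodermal cell": {"hypothetical mesodermal property"},
}


def inherited_properties(term, edges, properties):
    """Return the term's own properties plus those of its is_a ancestors."""
    found = set(properties.get(term, ()))
    for child, relation, parent in edges:
        # develops_from edges are deliberately skipped: no inheritance.
        if child == term and relation == "is_a":
            found |= inherited_properties(parent, edges, properties)
    return found


lsc = inherited_properties("lymphopoietic stem cell", EDGES, PROPERTIES)
mes = inherited_properties("mesenchymal cell", EDGES, PROPERTIES)
```

Under these assumptions, lymphopoietic stem cell receives the lifelong-proliferation property through the is_a chain, which is exactly the inherited claim questioned above, while mesenchymal cell inherits nothing across develops_from.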
I strongly recommend that the relations from the OBO relation ontology be used.
2.4 Definitions in CL
Many of the ontology's terms are not defined (484 of 761 terms), and the definitions that are present have not been carefully written. Definitions have been imported from MeSH or other sources wherever possible and are often not true to the referent or the relationships between referents. I observed three types of problems with the definitions:
1. The definitions have not been written in the form “B is_a A which C.”
2. Some definitions are simply wrong. For example, the definition for plant cell is: “a cell found in seeded plants [TAIR:syr],” implying that cells found in plants using spores rather than seeds for reproduction are not plant cells.
3. Many of the definitions are not consistent with the stated organizing principle for the
relevant branch of the ontology. For example, the cell by function branch is defined
as “a classification of cells by their primary end goal or behavior [FB:ma].” In this
branch, however, there are structural definitions. For example, granulocyte is_a
defensive cell where defensive cell is functionally defined but granulocyte is defined
structurally: “leukocytes with abundant granules in the cytoplasm
[MESH:A.11.118.637.415].” Additionally, there are functional definitions that do
not describe the “primary end goal or behavior” of the cell, if interpreted in the
context of function. For example, a subclass of cell by function is circulating cell, but
I doubt the reason for existence for any cell type is 'to circulate'; rather circulating is
required in order to serve some other function such as 'to transport oxygen'.
I recommend that definitions of the form 'A is_a B which C' be written for all terms, and
that the definitions be consistent with the stated organizing principle for the tree containing the term.
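A count like the one quoted above (undefined terms out of all terms) can be reproduced with a short script. This is a sketch under assumptions: the sample is a hypothetical three-term OBO fragment rather than the evaluated file, stanzas are assumed to be separated by blank lines, and definition_coverage is a name made up here.

```python
# Hypothetical fragment: one defined term and two undefined ones.
SAMPLE_OBO = """\
[Term]
id: CL:0000610
name: plant cell
def: "a cell found in seeded plants" [TAIR:syr]

[Term]
id: CL:0000003
name: cell in vivo

[Term]
id: CL:0000548
name: animal cell
"""


def definition_coverage(obo_text):
    """Return (number of [Term] stanzas without a def: tag, total [Term] stanzas)."""
    stanzas = [s for s in obo_text.split("\n\n") if s.strip().startswith("[Term]")]
    undefined = [
        s for s in stanzas
        if not any(line.startswith("def:") for line in s.splitlines())
    ]
    return len(undefined), len(stanzas)


undefined_count, total_count = definition_coverage(SAMPLE_OBO)
```

Such a check only detects missing definitions. Whether a definition that is present has the recommended form, or is consistent with the organizing principle of its branch, still has to be judged by a reader.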
3 Detailed Evaluation
In the remainder of this document, I have tried to give a systematic description of the CL's structure with comments pointing out specific examples of the types of errors I described above. I did not analyze every term, but I think I have covered enough terms to have a sense for the nature and extent of the revisions required by this ontology.
experimentally modified cell (578)
cell in vivo (3) (no def)
  cell by organism (4)
    prokaryotic cell (520) (no def)
    eukaryotic cell (255) (no def)
      plant cell (610)
      animal cell (548) (no def)
  cell by class (12) (no def)
    stem cell (34)
    non-terminally differentiated (blast) cell (55)
    cell by histology (63)
    cell by function (144)
    cell by lineage (220) (no def)
    cell by nuclear number (224)
    cell by ploidy (414)
3.1 The Root: cell
To facilitate interoperability between ontologies, I think that for a given organism the definitions of shared terms should be consistent between ontologies. The definitions for cell from CL and the FMA are not the same; the differences result primarily from problems with the CL definition.
CL defines cell as: "minute protoplasmic masses that make up organized tissue, usually consisting of a nucleus which is surrounded by protoplasm which contains the various organelles and is enclosed in the cell or plasma membrane. Cells are the fundamental, structural, and functional units of living organisms." This definition has been imported from MeSH [MESH:A.11].
The definition of cell in the FMA is: “anatomical structure that consists of a cell
compartment surrounded by a plasma membrane; together with other cells and [extra]cellular matrix, it constitutes tissues. Examples: lymphocyte, fibroblast, erythrocyte, neuron.”
The CL claims to define terms appropriately for “prokaryotic, fungal, animal and plant
worlds,” thus “organized tissue” and “nucleus” should not be part of the definition for
cell. Single-celled organisms do not “make up organized tissue,” and prokaryotic cells
do not contain a nucleus. Mention of the presence of a nucleus is qualified with “usually,” but I think, in definitions, we should avoid use of properties that are not always true. Even if the scope of the CL were restricted to a single organism such as human, the use of “organized tissue” and “nucleus” is still inappropriate: even if one accepts the
classification of blood as tissue, it is not an organized tissue, and erythrocytes do not contain a nucleus. [2]
3.2 Subclasses of cell: experimentally modified cell versus cell
The distinction between cells in their natural state and those that have been modified in some way, such as having been transfected with a foreign gene, is an important one. I don't, however, think this distinction should be included in the ontology.
The CL makes the distinction by dividing cell, the root class, into the two subclasses cell
in vivo and experimentally modified cell, making this the highest distinction in the
ontology. The definition of experimentally modified cell is "a cell that has been changed
as a consequence of a deliberate and specific experimental procedure [FB:ma].” cell in
vivo is not defined.
There is ambiguity about what is meant by cell in vivo, but by at least one reasonable
interpretation (cell in the living organism), there are particulars that instantiate both cell
in vivo and experimentally modified cell. Cells are sometimes modified (e.g. by
transfection) and then introduced into an organism. There are also procedures by which a cell is modified in the organism, such as by introducing a substance into an organism that modifies some of its cells, or by the creation of transgenic organisms. In all three examples, cells are modified as a consequence of the procedure and are certainly experimentally modified cells by the CL definition, but they are also in vivo.
Another problem I see is that there are many “deliberate and specific experimental
procedure[s]” that change a cell in some way, but not in a way that changes its type. Or
the cell may change types but still be identical to a type of natural cell. For example, cells can be induced to proliferate or to differentiate into another cell type. In both cases, the cell is participating in a process that is natural for it. If a mouse is exposed to a murine pathogen in the lab, then surely that mouse and all of its cells have been subjected to a “deliberate and specific experimental procedure.” Any of the mouse's cells that
change as a consequence of this deliberate pathogen exposure are experimentally modified cells by the CL definition, but these cells are not different types from the responding cells in a wild mouse exposed to the same pathogen naturally (if we ignore genetic differences between wild and laboratory mice). For this reason, one could argue against classifying experimentally modified cells (as defined by CL) separately from natural cells of the same type.
There is one particular modification to cells that could more easily be argued to warrant separate classification, and that is the modification of a cell's genome; there may be other procedures that induce radical and unnatural changes in a cell.
[2] Mention of tissue in the FMA definition is less of a problem because “tissue” appears after the semicolon, thus providing additional information but not defining the essence of cells. It is worth mentioning, though, that blood in the FMA is defined as a body substance, not as tissue, so use of “tissue” here may need to be reworded to make it clear that this statement is true for some cells. Not all cells are the constituent part of some tissue, and even if blood were defined as tissue, blood does not contain extracellular matrix.
Cells whose genomes have been modified are more clearly different types than cells that have been subjected to other experimental procedures. The problem with having “genomically modified cells” as the highest level distinction made in the ontology is that it implies in some way that a pre-B cell transfected with a plasmid expressing the RAG-1 gene is more similar to a fibroblast transfected with green fluorescent protein than to natural pre-B cells. This doesn't seem right. On the other hand, including the distinction as the lowest in the ontology is also a problem. For example, if we included natural pre-B cell and pre-B cell transfected with RAG-1 as leaf nodes in the ontology under the type pre-B cell, a problem arises when the transfected gene causes the loss of properties that pre-B cell transfected with RAG-1 inherits from pre-B cell.
The best alternative I see is to have two separate but linked ontologies, one for natural cells and one for cells modified in some way that we agree warrants separate classification (e.g. modification of the genome). Development of a good ontology of modified cells will require a good ontology of the experimental procedures; perhaps these procedures will be included in the experiment ontology that is under development. My recommendation is that we first focus on an ontology of natural or canonical cells and postpone the task of representing modified cells.
3.3 Subclasses of cell in vivo: cell by organism versus cell by class
cell by class and cell by organism are classifications of cells based on different criteria; they are not types of cells, thus the relation is_a cell in vivo is incorrect.
3.3.1 Subclasses of cell by organism
cell by organism is defined as: "a classification of cells by the organisms within which they are contained [FB:ma].” The immediate subclasses of cell by organism include
eukaryotic cell and prokaryotic cell. The subclasses of eukaryotic cell include animal
cell and plant cell. So far this is ok, but at this point, the classification ceases to be based on organism. The subclasses of animal cell are primarily the same classes as the
subclasses of cell by function. It looks as if the cell by function branch of the ontology
was copied under the cell by organism -> eukaryotic cell -> animal cell branch. There
are some differences, but it is not clear what these differences have in common. For example, under cell by organism,
antigen presenting cell is_a animal cell is_a eukaryotic cell is_a cell by organism,
but under cell by function,
antigen presenting cell is_a defensive cell is_a cell by function is_a cell by class.
First, antigen presenting cell is not a classification based on organism. Second, defensive
cell, the superclass of antigen presenting cell in the cell by function branch, is not a
subclass of animal cell, but many of the sibling classes of defensive cell under cell by
function (e.g. motile cell and electrically active cell) are. It is not clear why the
functional classification defensive cell would be excluded under animal cell when other
functional types are permitted. Finally, there is no information under antigen presenting
cell about differences in antigen presenting cells between organisms. Thus, the purpose of the cell by organism branch of the ontology is not clear to me.
If we decided to have a small master ontology along with a separate cell ontology for each organism type, as is planned for anatomy, then this would eliminate the need for a classification by organism. The other possibility is to link the cell ontology to an ontology of organisms to indicate which cell types are in which organisms. Either option I think makes more sense than an ontology of cell types based on organism and would include much more information than the current cell by organism classification which
does not go any deeper than animal cell, fungal cell, Mycetozoan cell, and plant cell
before becoming a classification by function.
3.3.2 Subclasses of cell by class: stem cell versus non-terminally
differentiated (blast) cell versus cell by __
As with cell by organism and cell by class, the cell by __ (histology, nuclear number,
ploidy, lineage, function) classes are types of classifications not types of cells, so the relation is_a cell by class is_a cell in vivo is incorrect. cell by histology, cell by nuclear
number and cell by ploidy all seem to be aspects of a structural/morphological
classification rather than the basis for separate ontologies or branches. For the cell by
function classification, it may make more sense to link a classification of cells to a classification of function rather than classifying cells by function. Finally, the sibling relation of the cell by __ classes to stem cell and non-terminally differentiated (blast) cell
doesn't make sense, and neither do the relations stem cell is_a cell by class and non-terminally differentiated (blast) cell is_a cell by class.
stem cell is defined as: "a relatively undifferentiated cell that retains the ability to divide and proliferate throughout life to provide progenitor cells that can differentiate into specialized cells [MESH:A.11.872].” I don't like the use of “relatively undifferentiated” because many cell types (that aren't stem cells) are relatively undifferentiated, but relative to a different type of cell in each case. Furthermore, this is a functional definition, so I am not sure why stem cell isn't under cell by function. As I mentioned above, I recommend classifying cells based on structural properties and linking the cell ontology to an ontology of function, but because stem cell is currently defined functionally, I will comment on the difficulties I see.
stem cell is hard to define functionally; it is much easier to find problems with a proposed definition than to come up with a good definition. The main problem is coming up with a functional definition that distinguishes stem cells from progenitor or precursor cells. Many of the problems could probably be resolved by refining the way people use these terms (one of the things I hope will happen as a consequence of a good cell ontology). The following is based on current usage.
When trying to define stem cells, people generally think about multipotency and long-term self renewal. The problem is that there are cell types referred to as stem cells that do not possess both multipotency and long-term self renewal, and these properties are not unique to stem cells.
Multipotency: Some stem cells, unipotent stem cells, only give rise to one type of cell and so are not multipotent (e.g. muscle stem cell). I assume that these cells are classified
as stem cells because they possess the property of long-term self renewal, but I don't
actually know anything about them. There are also cell types that are not considered stem cells (e.g. common myeloid and common lymphoid progenitors) but that are multipotent. For example, common myeloid progenitors give rise to both megakaryocyte/erythrocyte and granulocyte/macrophage progenitors. One could argue that the common myeloid and common lymphoid progenitors should be considered stem cells, but they are not capable of long-term self renewal.
Self renewal: Not all multipotent stem cells possess the capacity for long-term self renewal. For example, most immunologists would agree that there are three classes of hematopoietic stem cells. All three classes are able to develop into all cell types of hematopoietic lineage (multipotent), but the three classes differ in their capacity for self renewal. One of these classes is not capable of long-term self renewal, and in fact, can only reconstitute irradiated mice for a few weeks. One could argue that this class of hematopoietic stem cells should not be considered stem cells, but in the current CL classification, there is then no place for these cells because they are also not consistent with the definition for non-terminally differentiated (blast) cell, which is defined as
having a “unique fate.” None of the three classes of hematopoietic stem cells have a “unique fate,” including those not capable of long-term self renewal. Furthermore, there
are cell types that are not stem cells but that do have the ability to proliferate for life, namely skin cells.
non-terminally differentiated (blast) cell is defined as: "a precursor cell with a unique fate
[FB:ma].” It would help to have a definition for “precursor cell” and to clarify “unique fate.” I would argue though, that a non-terminally differentiated cell does not have a unique fate by definition of the fact that it is not terminally differentiated. Example subclasses from the CL are myeloblast and lymphoblast. These cells may be considered
to have a unique fate in that lymphoblasts differentiate into lymphocytes, for example, but they differentiate into B lymphocytes, T lymphocytes, natural killer cells, T helper type 1 lymphocytes, etc. They are lineage restricted relative to hematopoietic stem cells but they do not have a unique fate.
3.3.3 Subclasses of stem cell
cell in vivo (3) (no def)
  cell by class (12) (no def)
    stem cell (34)
      germline stem cell (14) (no def)
      somatic stem cell (723) (no def)
        single fate stem cell (35) (no def)
        multi fate stem cell (48)
          hematopoietic stem cell (37)
            multipotential myeloid stem cell (49)
            lymphopoietic stem cell (51) (no def)
        totipotent stem cell (52) (no def)
3.3.3.1 germline stem cell versus somatic stem cell
These terms are not defined, but the distinction seems real biologically.
3.3.3.2 Subclasses of somatic stem cell: single fate stem cell versus multi fate stem cell versus totipotent stem cell
This is the typical classification of stem cells, and, as a functional classification, it isn't
too bad. Definitions for single fate stem cell and totipotent stem cell are needed, however,
as is a better definition for multi fate stem cell.
The difference between single fate stem cell and non-terminally differentiated (blast) cell
(sibling of stem cell) is not clear. A single fate stem cell certainly has a unique fate, so the distinction from non-terminally differentiated (blast) cell relies on the assumption that
precursor cells do not “proliferate throughout life.” This issue was discussed above in the
section on stem cells (3.3.2).
multi fate stem cell is defined as: "a specialized stem cell that is committed to give rise to cells that have a particular function [MESH:A.11.872.590].” A multi fate stem cell can
give rise to fewer cell types than a totipotent or pluripotent stem cell (consistent with being a “specialized stem cell”), but it is not clear from this definition that a multi fate stem cell can give rise to multiple cell types, each type of cell with its own particular function. Thus, the multi fate stem cell gives rise to cells that, as a population, have a range of functions, not “a particular function.” For example, hematopoietic stem cell is_a
multi fate stem cell that gives rise to lymphocytes, macrophages, and dendritic cells (among others).
3.3.3.3 hematopoietic stem cell
The main problem with the definition for hematopoietic stem cell ("a progenitor cell from
which all blood cells develop [MESH:A.11.148.378]”) is that blood cell is not defined. If
blood cell is taken to mean cells found in the blood, then the problem is that there are cells that develop from hematopoietic stem cells that are not found in the blood (e.g. tissue resident macrophages). The definition does not specify that hematopoietic stem cells do not develop into any cell type that is not a blood cell, but if one allows for that, then the definition, as a functional definition based on the types of the daughter cells, is incomplete.
Another way to reconcile this definition is to say that cells such as tissue resident macrophages are blood cells. They certainly spend part of their life in the blood, but their classification as blood cells would depend on how blood cell is defined. blood cell is not
defined, but blood cell is_a circulating cell, and circulating cell is defined as "a cell that
does not usually make strong adhesions to other cells and, [as] a consequence, moves