
Journal of Chinese Language and Computing 16 (2): 99-119

    Lexicon as a Generating System: Restating the Case of Complex Word Formation in Chinese

    Yuanjian He

    Department of Translation, Chinese University of Hong Kong, Shatin, N.T., Hong Kong

    Submitted on 12 March 2004; Revised and Accepted on 26 April 2006


    Part of the speaker’s grammatical competence is his/her ability to access and generate well-formed morpheme strings which we call words. But not much attention has been paid to treating the Chinese lexicon as a generating system. Part of the difficulty was that before Chomsky’s (1993/1995) framework of Generalized Transformation (GT), there had been no unified computational system for all components of grammar, i.e. lexicon, syntax, logical form and phonetic form. The current study thus tries to apply GT to Chinese complex word formation from the perspective of the Chinese lexicon operating as a generating system. The new findings are: a) GT does work for Chinese morphology; b) the canonical word order in Chinese morphology is a mirror image of that of its syntax, e.g. (O)VS vs. SV(O) and OV vs. VO; c) Chinese has lexico-syntactic compounds whose syntactic components are a VP that is first generated in syntax and is then looped back to the lexicon to occur as stems in compounds; and d) traditional Chinese “disyllabic compounds” of both syntactic and lexical origin have already evolved to become root words, due to the likelihood that the disyllabic phonological/prosodic pattern might have triggered the cognitive process in which rule-generated constructs are commuted to memory-stored root words.


    Keywords: lexicon, complex word formation, commutation between computation and memory

1. The Components of the Lexicon

Part of the speaker’s grammatical competence is his/her ability to access and generate well-formed morpheme strings which we call words (a string ≥ one morpheme). To access is to draw on root words stored in the lexicon, and to generate is to construct complex words from roots and affixes, resulting in compounds, words with derivational or inflectional affixes, words with reduplicated roots, and even abbreviated word forms. Descriptively, the lexicon represents the speaker’s grammatical competence in producing well-formed words, and it consists, according to Kiparsky (1982) and Pinker (1999: 205), of three components, with relations to syntax as shown in (1):




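(1) [Diagram: the three components of the lexicon (roots, complex word formation, regular inflection) and their channels to syntax]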



    Every inter-channel in the lexicon is eventually linked to syntax. First, roots may go directly to syntax. If not, they go to complex word formation or to regular inflection. Second, the output of the complex word formation, i.e. complex words, will either go to syntax or to the regular inflection. Third, the outcome of the regular inflection, i.e. words with inflectional morphemes, will also go to syntax.
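    The channels just described can be laid out as a small directed graph. The sketch below is only an illustration of that description: the dictionary layout and the names `CHANNELS` and `paths_to_syntax` are assumptions introduced here, not part of the model in Kiparsky (1982) or Pinker (1999).

```python
# Channels of the lexicon as described in the text. The component names and
# the dictionary encoding are illustrative assumptions.
CHANNELS = {
    "roots": ["syntax", "complex_word_formation", "regular_inflection"],
    "complex_word_formation": ["syntax", "regular_inflection"],
    "regular_inflection": ["syntax"],
    "syntax": [],
}

def paths_to_syntax(start, graph=CHANNELS):
    """Return every route from `start` to syntax (depth-first enumeration)."""
    if start == "syntax":
        return [["syntax"]]
    routes = []
    for nxt in graph[start]:
        for tail in paths_to_syntax(nxt, graph):
            routes.append([start] + tail)
    return routes

for route in paths_to_syntax("roots"):
    print(" -> ".join(route))
```

    Every route enumerated from the roots ends in syntax, which is the sense in which each channel is eventually linked to syntax.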

    Pinker (1999) stipulates that roots are word forms that have been memorized, and to access them is to access the relevant part of the memory system. In contrast, complex words, including words with regular inflectional affixes, will be generated by rules, a process which is the computing act of the mind (ibid.).

    As a rule, monosyllabic words in Chinese are roots (e.g. Chao 1968). For the present study, I also take disyllabic words as being root-like and no longer requiring computing for structure-building (see Section 5.0). Thus, I am concerned primarily with how multi-syllabic complex words are generated in Chinese.

    Section 2.0 introduces the system of Generalized Transformation (GT) (Chomsky 1993/1995), followed by discussion of major cases of Chinese complex word formation under GT in Section 3.0. Special cases of Chinese synthetic compounding are analyzed under the loop theory of Pinker (1999) in Section 4.0, and Section 5.0 presents arguments for what has been traditionally called “disyllabic compounds” being designated as root words instead. The conclusion is dealt with in Section 6.0.

    2. Generalized Transformation as a Unified Computational System for Grammar

The issue of how to capture the speaker’s grammatical competence in producing well-formed words is reducible to how grammar generates lexical structures with a set of rules on a context-free and non-redundant basis. System-wise, computational rules, i.e. rules for constructing structures, ought to be the same for every component of grammar, namely, the lexicon, syntax, the phonetic form (PF), and the logical form (LF).

    The quest for a unified computational system in the past has led to studies adapting the X-bar rule schema for application in morphology. This is seen with Chinese linguists, e.g. Tang (1982, 1988, 1993, 1995; in part 1991a-b), Dai (1992, 1997), Sproat & Shih (1996) and Packard (2000), as well as with authors investigating other languages, e.g. Selkirk (1982), Scalise (1984), Sadock (1991) and Sadler & Arnold (1993).

    However, certain properties of the X-bar rule schema make it difficult to apply in morphology. One is its false representational generality, as seen in Chomsky (1970, 1981, 1986a-b) and in Jackendoff (1977). For instance, “$X^{n} \to X^{n-1}$ (YP) (order irrelevant)” has to entail a series of rules like $X^{n} \to X^{n-1}$, $X^{n} \to X^{n-1}\ YP$, and $X^{n} \to YP\ X^{n-1}$, much the same as the early phrase structure rules, e.g. those of Chomsky (1965). Such redundancy is less a problem for syntax than for morphology, where a lexical structure has a limited capacity, more limited than an XP, and requires a more precise and yet the same context-free generating capacity as in syntax.
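    The redundancy can be made concrete by spelling out a schema with one optional, unordered constituent as explicit ordered rules. The following is a minimal sketch; the function name `expand` and the string encoding of the symbols are assumptions made for illustration.

```python
from itertools import permutations

def expand(lhs, head, optional):
    """Spell out 'lhs -> head (optional), order irrelevant' as ordered rules."""
    rules = [(lhs, (head,))]                      # optional constituent absent
    for order in permutations((head, optional)):  # both linear orders
        rules.append((lhs, order))
    return rules

for lhs, rhs in expand("X^n", "X^(n-1)", "YP"):
    print(lhs, "->", " ".join(rhs))
```

    A single schema line thus stands for three separate rewrite rules, which is the hidden multiplicity the text objects to.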

    Another property of the X-bar rule schema is its uniquely-defined hierarchy. Each level of an XP (= X”), for instance $[_{X'}\ X\ YP]$, $[_{X'}\ X'\ YP]$ or $[_{XP}\ X'\ YP]$ (order of constituents irrelevant), represents a unique structural relation between constituents of that level and the head-of-phrase, namely, the relationship of complement-head, adjunct-head, or specifier-head. Such a hierarchy, denoted by bar-notations, is not universally retainable in a lexical structure, again for its more limited capacity than an XP.

    The tension between the want of a unified rule system for structure-building and the failure of the existing systems, such as the X-bar rule schema, has sometimes resulted in a total collapse of the rule schema. The best example is Packard’s (2000: 168) convoluted lexical rules for Chinese:

(2) a. $X^{-0} \to X^{-0/-1/\{W\}}\ X^{-0/-1/\{W\}}$
    b. $X^{-0} \to X^{-0}\ G$

    Instead of representing a morpheme (free or bound) of a syntactic category (N, V, A, P, etc.), X in (2) is associated with a primitive class, e.g. root, bound root, affix, and so on. In addition, there is a mixture of arbitrary bar-notations and primitive symbols. E.g. $X^{-0}$ = root, $X^{-1}$ = bound root, $X^{W}$ = affix, G = grammatical affix, {} = selected for once only, and so on. Also, only $X^{-0}$ is allowed to recur, but not others. As a result, the rules are almost idiosyncratically context-sensitive, and no longer have a context-free generating capacity required of computational rules, which by definition apply across the board to any item that calls on the rules to process it (Chomsky 1993/1995, Pinker 1999). Other defects include misconceived bar-notations and lack of principles of describing the category of a lexical structure formed by these rules. But, as well-documented in a number of studies, if a lexical structure has a head, the head should determine the category of that structure (e.g. Williams 1981, Di Sciullo & Williams 1987, Katamba 1993).

    This tension did not go away until the system of Generalized Transformation (GT) was introduced in Chomsky (1993/1995). GT replaces the rule schema but retains the X-bar format. As a result, the aforementioned representational redundancies are dissolved and the relevant hierarchy is reduced to a level acceptable to morphology. Though it has mainly been tested on syntax and LF, and its application to the lexicon remains largely untested, GT was introduced in the spirit of serving as a unified computational tool for grammar. It operates in two phases: projection and merge (Chomsky 1995: 189-190). Projection is a concept as well as a technical operation. Once a lexical item X is drawn into computation (i.e. structure-building), it projects into a hierarchical constituency that immediately dominates itself. Namely:

(3) $X \to [_{X}\ X]$

    “→” denotes projection, and it also denotes merge, movement, or deletion, depending on the structural and operational context. It is a general symbol for transformation, so to speak. It is assumed in Chomsky (1993/1995) that projection and merge occur in morphology and syntax, movement in syntax and LF, and deletion probably in PF.


    In syntax, X = N/A/V/P/etc. Further projections will conform to the relevant bar-format, e.g. $[_{X'}\ X]$ (X = $[_{X}\ X]$), $[_{X'}\ X']$ and $[_{XP}\ X']$ (XP = X”) (Chomsky 1995: 189).

    In morphology, X = morpheme of a category of N/A/V/P/etc. A morpheme is either a root or an affix. Further projections will not conform to the bar-format, e.g. $[_{X}\ [_{X}\ X]]$.

    Now consider merge, which subsumes two operations: insert a primitive position in a targeted structure, and substitute the primitive position with another structure. Suppose that $[_{X}\ X]$ is targeted for merge. It will further project into a structure containing $[_{X}\ X]$, and then a primitive position “0” is inserted and immediately substituted by another structure Y:

(4) a. $[_{X}\ X] \to [_{X}\ [_{X}\ X]]$

    b. $[_{X}\ [_{X}\ X]] \to [_{X}\ [_{X}\ X]\ 0]$ or $[_{X}\ 0\ [_{X}\ X]]$

    c. $[_{X}\ [_{X}\ X]\ 0] \to [_{X}\ [_{X}\ X]\ Y]$

    d. $[_{X}\ 0\ [_{X}\ X]] \to [_{X}\ Y\ [_{X}\ X]]$

    The head-parameter decides to which side of X, i.e. left or right, Y is merged. In other words, language-specific conditions dictate whether the merged structure is head-initial, like (4c), or head-final, like (4d).

    Note that because it is $[_{X}\ X]$, not Y, that is targeted for merge, X, not Y, is the head of the merged structure. In effect, by appropriate targeting, the merge operation automatically determines the category of a branched structure, making it either left- or right-hand headed.
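    The two phases can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not an implementation from Chomsky (1993/1995): the names `Node`, `project`, `merge` and `EMPTY`, and the boolean `head_initial` standing in for the head-parameter, are all introduced here for the example.

```python
from dataclasses import dataclass

EMPTY = "0"  # the primitive position inserted by merge

@dataclass(frozen=True)
class Node:
    label: str        # category label of the constituent
    children: tuple   # daughters: Node objects, morpheme strings, or EMPTY

    def __str__(self):
        return "[" + self.label + " " + " ".join(str(c) for c in self.children) + "]"

def project(item, label):
    """Projection as in (3): X -> [X X]."""
    return Node(label, (item,))

def merge(target, other, head_initial):
    """Merge as in (4a-d): project the target, insert "0", substitute `other`."""
    projected = project(target, target.label)                      # (4a)
    slots = (target, EMPTY) if head_initial else (EMPTY, target)
    opened = Node(projected.label, slots)                          # (4b)
    filled = tuple(other if c == EMPTY else c for c in opened.children)
    return Node(opened.label, filled)                              # (4c) or (4d)

x = project("x", "X")
y = project("y", "Y")
print(merge(x, y, head_initial=True))    # head-initial, as in (4c)
print(merge(x, y, head_initial=False))   # head-final, as in (4d)
```

    Because the label of the result is always copied from the targeted structure, the category of the branched structure follows from targeting alone, whichever side the other structure lands on.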

    The process in (4) applies either in syntax or in the lexicon. In syntax, it is appropriately applicable to constructs like Verb-Aspect clusters in Chinese, where aspect markers are merged with a verb, such as $[_{V}\ V\ Asp]$, which then continues to project till it forms a VP.

    In the lexicon, the process simply generates lexical structures. Given that Y is a structure itself, i.e. $[_{Y}\ Y]$, $[_{Y}\ [_{Y}\ Y]\ Z]$ or $[_{Y}\ Z\ [_{Y}\ Y]]$, in which Z is also a structure, the merged structures in (4c-d) would therefore represent the following:

(5) c.     X               d.     X
          / \                    / \
         X   Y                  Y   X
         |  / \                / \  |
         X  Y  Z              Z  Y  X
            |  |              |  |
            Y  Z              Z  Y

    e.     X               f.     X
          / \                    / \
         X   Y                  Y   X
         |  / \                / \  |
         X  Z  Y              Y  Z  X
            |  |              |  |
            Z  Y              Y  Z


    In theory, GT may continue to operate on any of those structures in (5). In reality, the capacity of a lexical structure is relatively limited compared to its syntactic counterpart.

3. Complex Word Formation in Chinese

    Now I demonstrate how GT generates plural nouns, words with derivational affixes, and compounds in Chinese.

3.1 Regular Plural Nouns

    Only human nouns inflect for plurality in Chinese, and mainly for stylistic purposes, because, as is well known, Chinese nouns as a category do not differentiate singular from plural. In fact, plural nominal inflection is probably the only regular inflectional morphology in Chinese. The plural suffix involved is “-men”.

    Di Sciullo and Williams (1987: 25) and Katamba (1993: 312-313) observe that inflectional affixes determine the category of an inflected structure and hence are its head. Under GT, it means that inflectional affixes are targeted for merge, as shown with the Chinese “-men” in:

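(6) a. $N \to [_{N}\ N]$

    b. $men \to [_{\text{N-Plu.}}\ men]$

    c. $[_{\text{N-Plu.}}\ men] \to [_{\text{N-Plu.}}\ [_{\text{N-Plu.}}\ men]]$

    d. $[_{\text{N-Plu.}}\ [_{\text{N-Plu.}}\ men]] \to [_{\text{N-Plu.}}\ 0\ [_{\text{N-Plu.}}\ men]]$

    e. $[_{\text{N-Plu.}}\ 0\ [_{\text{N-Plu.}}\ men]] \to [_{\text{N-Plu.}}\ [_{N}\ N]\ [_{\text{N-Plu.}}\ men]]$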

N-Plu. represents nominal plurality as a category, and N any form of noun.

    Taking N as a root noun for example, we may have:

(7)      N-Plu.               (8)        N
         /    \                        /   \
        N     N-Plu.                  N    N-Plu.
        |       |                     |      |
       老师     们                    老师    们
      laoshi   men
      teacher  plu.
      - teachers

    Note that N-Plu. is not just symbolic. It means that the structure is a plural noun and will get a plurality interpretation when it enters Logical Form (LF). Compare (7) with (8): (7) is headed by the plural affix, but (8) by the noun, entailing that while $[_{N}\ N]$ is merged in $[_{\text{N-Plu.}}\ 0\ [_{\text{N-Plu.}}\ men]]$ in (7), the reverse happens in (8). As a result, (7) is a plural noun, whereas (8) is just a noun. (8), if ever formed, will crash for not representing the correct semantics under the principle of Full Interpretation (Chomsky 1991: 441-442, 1993: 26-27, Chomsky & Lasnik 1995: 27).
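    The contrast between (7) and (8) comes down to which constituent is targeted for merge. A minimal sketch follows, assuming trees encoded as `(label, children)` pairs; the helper `converges` is only a toy stand-in for the Full Interpretation check, and all names are hypothetical.

```python
def project(form, cat):
    """X -> [X X]: wrap a morpheme in a constituent of its own category."""
    return (cat, [form])

def merge(target, other, head_initial):
    """The targeted constituent supplies the label of the merged structure."""
    label = target[0]
    return (label, [target, other] if head_initial else [other, target])

def bracket(node):
    """Render a tree as a labelled bracketing."""
    if isinstance(node, str):
        return node
    label, kids = node
    return "[" + label + " " + " ".join(bracket(k) for k in kids) + "]"

def converges(node, intended_cat):
    """Toy stand-in for Full Interpretation: label must match the intended reading."""
    return node[0] == intended_cat

laoshi = project("laoshi", "N")
men = project("men", "N-Plu.")

as_in_7 = merge(men, laoshi, head_initial=False)   # suffix targeted
as_in_8 = merge(laoshi, men, head_initial=True)    # noun targeted

print(bracket(as_in_7), converges(as_in_7, "N-Plu."))
print(bracket(as_in_8), converges(as_in_8, "N-Plu."))
```

    Both structures have the same linear order, laoshi-men, but only the suffix-targeted one carries the N-Plu. label that a plural reading requires.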


3.2 Words with Derivational Affixes

    It is a thorny issue in Chinese as to how to distinguish between derivational affixes and bound roots (Lu 1964, Chao 1968, Zhu 1982, Pan et al 1993, Packard 1997, 2000), because purely categorial affixes are rare in Chinese. For example, English affixes like -cal (Adjectival) and -ly (Adverbial), as in histori-cal-ly, have no comparable counterparts in Chinese. It is therefore prudent to adopt a working criterion for the current study, whereby Chinese derivational affixes are taken to be monosyllabic and capable of determining the category of the derived word; namely, they can head a derivation.

    By the criterion, a typical Chinese derivational head is a suffix, e.g.

(9) 创造性                    机械化
    V-N                       N-V/N
    chuangzao xing            jixie hua
    create -ness              machine -ize / -ization
    - creativity              - to mechanize / mechanization

    Taking the first item in (9) as an example, the process of structure-building is as follows:

(10) a. $创造 \to [_{V}\ 创造]$

     b. $性 \to [_{N}\ 性]$

     c. $[_{N}\ 性] \to [_{N}\ 0\ [_{N}\ 性]]$

     d. $[_{N}\ 0\ [_{N}\ 性]] \to [_{N}\ [_{V}\ 创造]\ [_{N}\ 性]]$

Putting the end result of (10) in a tree diagram, we have:

(11)         N
            / \
           V   N
           |   |
          创造  性
     chuangzao  xing
        create  -ness
        - creativity

    The second item of (9), if categorized as an (ergative) verb, would have undergone a similar structure-building process to that in (10), except that the derived structure is headed by a verbal suffix.
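    The step-by-step character of the derivation in (10) can be traced mechanically. The sketch below prints each stage for a suffix-headed word; the function name `derivation_steps` and the use of pinyin strings in place of characters are assumptions made for the example.

```python
def derivation_steps(stem, stem_cat, suffix, suffix_cat):
    """List the stages of a suffix-headed derivation, following (10a-d)."""
    s = f"[{stem_cat} {stem}]"        # projected stem
    a = f"[{suffix_cat} {suffix}]"    # projected suffix, the merge target
    opened = f"[{suffix_cat} 0 {a}]"  # empty position inserted to the left
    return [
        f"{stem} -> {s}",
        f"{suffix} -> {a}",
        f"{a} -> {opened}",
        f"{opened} -> [{suffix_cat} {s} {a}]",
    ]

for step in derivation_steps("chuangzao", "V", "xing", "N"):
    print(step)

# The verbal reading of the second item in (9) differs only in its labels.
print(derivation_steps("jixie", "N", "hua", "V")[-1])
```

    Since the suffix is the target throughout, the final label is N for chuangzao-xing and V for the verbal reading of jixie-hua.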

    A less homogeneous situation is found with prefixes. Some can head a derivation, others cannot. Examples of the former type are:


(12) 全方位              多角度                    超自然
     quan-fangwei        duo-jiaodu                chao-ziran
     all-position        many-angle                super-nature
     - panoramic         - of multi-perspective    - supernatural

The prefixes are adjective-like, and the roots are nouns. Importantly, the derived words are also adjective-like. This is because, as Shao et al (2001: 181) observe, unlike nouns, the derived words in (12) are unable to be modified by a numeral-classifier cluster, e.g. *yi ge quan fangwei (*一个全方位; a panorama), *yi ge duo jiaodu (*一个多角度; a multi-perspective), and *yi ge chao ziran (*一个超自然; a super-nature).

    The structure for items in (12) would be (13), taking the first item as example:

(13)       A
          / \
         A   N
         |   |
         全  方位
     quan-  fangwei
      all-  position
      - panoramic

    But not every prefix can head a derivation, as in:

(14) 非金属              反作用              超导体
     fei-jinshu          fan-zuoyong         chao-daoti
     non-metal           anti-effect         super-conductor
     - non-metal         - contra-effect     - super conductor

Though the prefixes are adjective-like, the derived words are nouns instead of adjectives, and hence right-hand headed, as illustrated with the first item of (14):

(15)       N
          / \
         A   N
         |   |
         非  金属
      fei-  jinshu
      non-  metal
      - non-metal


    Thus, as is inferable from (13) and (15) respectively, either roots or affixes seem to be able to head a derivation in Chinese. Technically, this is achievable by targeting either affixes or roots for merge.
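    The choice of target is all that separates the prefix-headed words in (12) from the root-headed words in (14). A minimal sketch follows, assuming (as the examples do) that every prefix is adjective-like and every root a noun; the function `derive` and its `target` argument are hypothetical names for illustration.

```python
def derive(prefix, root, target):
    """Build a prefixed word; `target` is 'affix' or 'root' and fixes the head.

    Assumes prefixes are category A and roots category N, as in (12) and (14).
    """
    if target not in ("affix", "root"):
        raise ValueError("target must be 'affix' or 'root'")
    cat = "A" if target == "affix" else "N"   # the head's category percolates
    return f"[{cat} [A {prefix}-] [N {root}]]"

print(derive("quan", "fangwei", target="affix"))  # as in (13)
print(derive("fei", "jinshu", target="root"))     # as in (15)

# The same prefix chao- heads chao-ziran but not chao-daoti.
print(derive("chao", "ziran", target="affix"))
print(derive("chao", "daoti", target="root"))
```

    The linear order is identical in both cases; only the targeted constituent, and hence the label of the whole word, differs.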

    Chinese also appears to have a few instances of mid-affixes, as in:

(16) 对得起            吃得消              划不来              赶不及
     dui-de-qi         chi-de-xiao         hua-bu-lai          gan-bu-ji
     face-able-up      eat-able-digest     spend-not-come      hurry-not-reach
     - worthy of       - able to cope      - not worth it      - unable to reach on time

    The middle element is taken as an affix because its presence is obligatory for these items to be words, i.e. if it is taken out, the resultant items are no longer words (Hu et al 1992: 249). The relevant structure, for the first and the third item, is:

(17) a.        V                    b.        V
             /   \                          /   \
          Mod     V                      Neg     V
          /  \    |                      /  \    |
         V   Mod  |                     V   Neg  |
         |    |   |                     |    |   |
         对   得   起                    划   不   来
        dui  -de-  qi                  hua  -bu-  lai
        face -able- up                 spend -not- come
        - worthy of                    - not worth it

    In multiple derivations, direction of derivation often coincides with head directionality (Williams 1981). But, unlike other languages such as English (e.g. $[_{ADV}\ [_{A}\ [_{N}\ histori]\ [_{A}\ cal]]\ [_{ADV}\ ly]]$), Chinese has only a very limited number of multiple derivation cases. The nature of the Chinese case needs to be further probed in future studies.

    Finally, it should be mentioned that a compound, in addition to roots, can also take a derivational affix, as in:

(18)       A
          / \
         A   N
         |  /  \
         | A    N
         | |    |
         泛 太平   洋
       fan  taiping  yang
      pan-  pacific  ocean
      - pan Pacific Ocean

    “Fan” (pan-) is another generally-assumed prefix in Chinese (e.g. Hu et al 1992, Shao et al 2001). “Taiping Yang” (Pacific Ocean) is, however, not a root, but rather a compound. In (18), $[_{N}\ [_{A}\ 太平]\ [_{N}\ 洋]]$ (Taiping Yang, Pacific Ocean) is merged to the prefix $[_{A}\ 泛]$ (fan, pan-).

    Cases like this suggest that when derivation and compounding are both involved in complex word formation, compounding must precede derivation to allow compounds to take derivational affixes (also see Katamba 1993). Compounding itself is discussed next.
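    The ordering can be illustrated by building the word both ways. The sketch below assumes trees encoded as nested tuples and uses pinyin strings; `merge` and `bracket` are hypothetical helpers, and the second bracketing is shown only as a contrast, not as an analysis proposed in the text.

```python
def merge(head, nonhead, head_first):
    """The head supplies the label; `head_first` fixes linear order."""
    kids = (head, nonhead) if head_first else (nonhead, head)
    return (head[0],) + kids

def bracket(node):
    """Render a nested tuple (label, child, ...) as a labelled bracketing."""
    label, *kids = node
    parts = [k if isinstance(k, str) else bracket(k) for k in kids]
    return "[" + label + " " + " ".join(parts) + "]"

fan = ("A", "fan")
taiping = ("A", "taiping")
yang = ("N", "yang")

# Compounding first, then derivation: the prefix takes the whole compound.
compound = merge(yang, taiping, head_first=False)
word = merge(fan, compound, head_first=True)
print(bracket(word))

# Derivation first, for contrast: the prefix attaches to taiping alone.
derived = merge(fan, taiping, head_first=True)
other = merge(yang, derived, head_first=False)
print(bracket(other))
```

    Only the first order yields a structure in which the prefix scopes over the compound as a whole, as in (18).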

3.3 Extra-centric Compounding

    Compounding is either extra-centric or endocentric. Of the former type, examples are:

(19) [稀奇][古怪] [阴谋][诡计] [招摇][撞骗]

     xiqi-guguai yinmou-guiji zhaoyao-zhuangpian

     rare-odd plot-trick boast-cheat

     - very odd - tricks / intrigues - cheating

    The category of those words is jointly determined by the categories of the stems. Taking the first item of (19) as an example:

(20)        A
           / \
          A   A
          |   |
         稀奇  古怪
         xiqi -guguai
         rare -odd
         - very odd

    (20) is a conjoining structure, of which either stem could have been targeted for merge:

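(21) a. $A \to [_{A}\ A]$ = one stem

     b. $A \to [_{A}\ A]$ = the other stem

     c. $[_{A}\ A] \to [_{A}\ [_{A}\ A]]$ = either stem

     d. $[_{A}\ [_{A}\ A]] \to [_{A}\ [_{A}\ A]\ 0]$ or $[_{A}\ 0\ [_{A}\ A]]$

     e. $[_{A}\ [_{A}\ A]\ 0] \to [_{A}\ [_{A}\ A]\ [_{A}\ A]]$ or
        $[_{A}\ 0\ [_{A}\ A]] \to [_{A}\ [_{A}\ A]\ [_{A}\ A]]$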

Examples of other categories are: $[_{N}\ [_{N}\ 阴谋]\ [_{N}\ 诡计]]$ (tricks/intrigues), $[_{V}\ [_{V}\ 招摇]\ [_{V}\ 撞骗]]$ (cheating), as shown in (19).
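    Because both stems of a conjoining structure share one category, targeting either of them gives the same result. A minimal sketch, assuming stems encoded as `(category, form)` pairs and pinyin forms; the helper `merge` is a hypothetical name.

```python
def merge(target, other, target_on_left):
    """Merge two stems; the targeted stem supplies the label of the whole."""
    cat = target[0]
    first, second = (target, other) if target_on_left else (other, target)
    return f"[{cat} [{first[0]} {first[1]}] [{second[0]} {second[1]}]]"

xiqi = ("A", "xiqi")
guguai = ("A", "guguai")

via_first = merge(xiqi, guguai, target_on_left=True)     # first stem targeted
via_second = merge(guguai, xiqi, target_on_left=False)   # second stem targeted

print(via_first)
print(via_second)
print(via_first == via_second)
```

    With stems of different categories the two targeting choices would give different labels, which is why joint determination of category is only available when the stems match.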

3.4 Endocentric Compounding

    This includes, roughly, the ordinary type and the synthetic type. The former type contains no verbal stem but the latter type does, to be separately discussed below.


3.4.1 Ordinary Compounds

Of these, I will discuss the “N-N-(N)” form only, with four different compositions:

    (i) Monosyllabic stem + disyllabic stem:

(22) [布][衬衫]         [瓷][花瓶]           [铁][书架]           [皮][拖鞋]
     bu-chenyi          ci-huaping           tie-shujia           pi-tuoxie
     cloth-shirt        porcelain-vase       iron-book shelf      leather-slipper
     - cotton shirt     - porcelain vase     - iron book shelf    - leather slippers

(ii) Disyllabic stem + monosyllabic stem:

(23) [电视][剧]         [墨水][笔]           [汽车][站]           [牛仔][裤]
     dianshi-ju         moshui-bi            qiche-zhan           niuzai-ku
     TV-drama           ink-pen              bus-station          cowboy-trousers
     - TV drama         - fountain pen       - bus station        - jeans

     (iii) Disyllabic stem + disyllabic stem:

(24) [水泥][地板] [英语][字典] [政府][机构]

     shuini-diban yingyu-zidian zhengfu-jigou

     cement-floor English-dictionary government-organization

     - cement floor - English dictionary - government departments

    (iv) Any of (i), (ii) and (iii) + disyllabic stem:

(25) [[卫生][部]][官员]                  [[中文][系]][教授]
     weishengbu-guanyuan                 zhongwenxi-jiaoshou
     health ministry-official            Chinese Department-professor
     - Health Ministry official          - professor of the Chinese Department

     [[铁路][局]][职员]                  [[股票][市场]][动态]
     tieluju-zhiyuan                     gupiaoshichang-dongtai
     railway bureau-clerk                stock market-trends
     - railway bureau employee           - stock market trends

All are easily formable under GT. Taking the first item of (25) as an example, it is generated by the following process:

(26) a. $卫生 \to [_{N}\ 卫生]$

     b. $部 \to [_{N}\ 部]$

     c. $[_{N}\ 部] \to [_{N}\ [_{N}\ 部]]$

     d. $[_{N}\ [_{N}\ 部]] \to [_{N}\ 0\ [_{N}\ 部]]$
