Directly inspired by: Christian Lehmann
The lemma is given in standard orthographic representation. Sometimes, the lemma representation itself is used for additional purposes, e.g. to indicate stress, syllable boundaries or word-break points for hyphenation. However, the present comprehensive microstructure provides dedicated fields for such purposes.
If the lemma is a segmental sign, but not an independent word form, it is flanked by a hyphen (or two hyphens) at the side where it is bound, like English cran- and -ize.
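The hyphen convention above can be checked mechanically. The following is only a minimal sketch: the function name boundness and its return labels are hypothetical and not part of the microstructure.

```python
def boundness(lemma: str) -> str:
    """Report on which side a lemma is marked as bound by a hyphen.

    Hypothetical helper: returns "left", "right", "both" or "free".
    """
    left = lemma.startswith("-")   # e.g. -ize attaches to material on its left
    right = lemma.endswith("-")    # e.g. cran- attaches to material on its right
    if left and right:
        return "both"
    if left:
        return "left"
    if right:
        return "right"
    return "free"
```

For instance, boundness("cran-") yields "right" and boundness("-ize") yields "left", while an independent word form yields "free".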
This field is used only for nouns. It allows users to search the dictionary by nominative forms.
Homonyms are, of course, separate entries distinguished by numbers; see the separate section for details. The same goes for the readings of a polysemous entry; see next item.
The standard graphemic representation already serves as lemma. This field is needed for alternative spellings found in the corpus. For example, the lemma encyclopedia contains encyclopaedia in this field. The less the language is standardized, the more variation there is in available texts (e.g. of earlier periods); and the fewer dictionaries there are for the language, the more important it becomes to display the variants in this field.
What is meant here is the proper name of a grammatical morpheme. For instance, the proper name of the English suffix -ize is ‘verbalizer’, and the proper name of English 's is ‘(Saxon) genitive’. Consequently, the possible contents of this field are unique (i.e. there is no range set), and only a portion of the entries of the lexical database will be specified for this field, viz. the grammatical formatives.
From among the grammatical categories of a lexical item, this field is dedicated to its syntactic category qua distributional category (for morphological categories see #18). This is understood as a narrow subcategory of a part of speech, e.g. ‘proper noun’, ‘transitive verb with additional prepositional complement’. The taxonomy implied here will be explained in the grammar.
A lemma which belongs to diverse syntactic categories is considered polysemous. Each category then constitutes a record.
The nature of the inflectional categories to be specified here depends on the language. Examples are noun class, gender, possessive class, verbal voice, inflection class. An inflecting word of a language may fall into diverse morphological categories at once, e.g. voice x, conjugation class y. Some may be syntactically relevant lexical classes such as the gender of a noun, others may be purely morphological classes such as inflection classes. It is practical to set up a separate field for each of these categories.
This field contains the syntactic and semantic construction frame (for a verb: its valency frame), including selection restrictions. Cases in point are the different complement constructions of complement-taking verbs. This is a specification of the information contained in #syntaxic_category. It should be represented by a formal notation, e.g. [ ~ X ]Y, where ~ indicates the position of the lemma, X represents relevant syntactic constituents or properties of the context, and Y is the syntactic category of the construction.
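To illustrate how the [ ~ X ]Y notation could be processed, here is a minimal sketch of a parser. It assumes that constituents inside the brackets are separated by whitespace and that the category Y is a single token; the names Frame and parse_frame, and the labels NP, PP and VP used in the example, are assumptions for illustration only.

```python
import re
from typing import List, NamedTuple


class Frame(NamedTuple):
    before: List[str]   # context items preceding the lemma position
    after: List[str]    # context items following the lemma position
    category: str       # syntactic category Y of the whole construction


# "[", the bracket content, "]", then the category label.
_FRAME = re.compile(r"^\[\s*(?P<inner>[^\[\]]*?)\s*\]\s*(?P<cat>\S+)$")


def parse_frame(notation: str) -> Frame:
    """Split a frame such as '[ ~ NP PP ]VP' into its parts."""
    match = _FRAME.match(notation.strip())
    if match is None:
        raise ValueError(f"not a construction frame: {notation!r}")
    tokens = match.group("inner").split()
    if tokens.count("~") != 1:
        raise ValueError("frame must contain exactly one '~'")
    position = tokens.index("~")
    return Frame(tokens[:position], tokens[position + 1:], match.group("cat"))
```

With these assumptions, parse_frame("[ ~ NP PP ]VP") returns an empty list for before, ["NP", "PP"] for after, and "VP" as the category.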
Irregular inflection. If the stem has inflected forms not derivable by rules pertaining to its inflection class, those forms are listed here. They may be both stem allomorphs such as worse, appearing in this field of the lemma bad, and irregular forms of the inflection paradigm, e.g. oxen appearing in this field of the lemma ox.
This field contains the immediate constituents of the lemma stem; as long as binarism obtains, there are two of them. In the case of a compound, they are two stems; in the case of a derivative, they are a stem and some derivational operator which may or may not be segmental. The items listed here are identical to certain lemmas of the database.
This field contains the root constituents of the lemma stem.
This field contains the technical term for the word-formation process that formed the lemma stem, e.g. bahuvrihi, causative, denominal, deverbal, intensive etc. Possible entries in this field are taken from a range set defined in the grammar, where the word-formation processes of the language are dealt with systematically.
In this field, the last word formation process applied is indicated, i.e. the process which was applied to the components of field #15 to form the stem of the lemma. In the case of a derivationally complex lemma, other word formation processes may have created stems that are part of it, in particular those of field #15. Such processes are not indicated here, since they may be seen by following the links of the latter field.
In this field, the set of lemmas is referenced which have the current lemma in their field #morphological_structure. Thus, references between the current field and that field are mutual.
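Because the two fields mirror each other, the content of this field can be computed rather than entered by hand. The sketch below assumes the database is available as a plain mapping from each lemma to the list in its #morphological_structure field; the function name derived_lemmas and the English sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def derived_lemmas(morphological_structure: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert lemma -> constituents into constituent -> lemmas containing it."""
    reverse = defaultdict(set)
    for lemma, constituents in morphological_structure.items():
        for constituent in constituents:
            reverse[constituent].add(lemma)
    # Sorted lists give a stable, reproducible field content.
    return {constituent: sorted(lemmas) for constituent, lemmas in reverse.items()}


# Illustrative data only.
structure = {
    "blackbird": ["black", "bird"],
    "blackboard": ["black", "board"],
    "verbalize": ["verbal", "-ize"],
}
index = derived_lemmas(structure)
```

Here index["black"] lists both blackbird and blackboard, so each of those entries and the entry black point at one another.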
This field lists collocations in which the lemma is involved. These may be any kind of fixed expressions, including phrases, idioms and proverbs (cf. Bergenholtz & Tarp, ch. 7.2). If one has decided to bestow lemma status on such complex expressions, then this field contains links to such lemmas.
Semantic information on the lemma is provided in different languages and from different points of view. The sum of the information contained in this subset of fields is highly redundant. This is a complex, composed field. It embeds a meaning translated for each target language (using language codes), examples in Pali, references for the examples, and translations of the examples in target languages. Also, this is an array, because an entry can have multiple meanings. This field also contains a confidence rating (0 to 10) for each target language.
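One possible shape for this composed field is sketched below. All class and attribute names are assumptions, the confidence values are arbitrary illustrations, and the example strings are placeholders rather than real citations.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Example:
    pali: str                    # example text in Pali
    reference: str               # source of the example
    translations: Dict[str, str] = field(default_factory=dict)  # by language code


@dataclass
class Meaning:
    translations: Dict[str, str]                         # meaning, by language code
    confidence: Dict[str, int] = field(default_factory=dict)  # 0 to 10, by language code
    examples: List[Example] = field(default_factory=list)

    def __post_init__(self) -> None:
        for lang, rating in self.confidence.items():
            if lang not in self.translations:
                raise ValueError(f"confidence given for missing language {lang!r}")
            if not 0 <= rating <= 10:
                raise ValueError(f"confidence for {lang!r} must be between 0 and 10")


# An entry holds an array of meanings, one per reading of the lemma.
meanings: List[Meaning] = [
    Meaning(
        translations={"en": "tooth", "fr": "dent"},
        confidence={"en": 9, "fr": 7},  # arbitrary illustrative ratings
        examples=[Example(pali="<Pali example>", reference="<reference>")],
    )
]
```

Validating the rating when a meaning is constructed keeps out-of-range values from entering the database at all.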
This field contains a specification of the meaning of the lemma in plain Pali prose, as would be the case in a monolingual dictionary. Definitions given here are taken from various commentaries, subcommentaries, grammatical books, and Pali literature. Sources should be properly referenced.
Each lexical item – at least those with a lexical meaning – belongs to one or more semantic classes. For instance, leg is a body part, spider is an insect, laugh is an expression of emotion. This classificatory information is implicit or even explicit in a good definition. However, it is highly useful to specify it in a separate field. Example classes are: rivers, plants, body parts, trees, etc. We use classes defined in Aids to Pali Conversation & Translation by A. P. Buddhadatta Mahathera.
This field contains paradigmatic lexical relations to other lemmas which have the current lemma in the corresponding field. These relations are, therefore, mutual. Relation types can be:
This field contains information on the etymology of the lemma, like an etymological dictionary. Naturally, this is subservient to information on word formation. Here we refer to abstract Proto-Indo-European word roots.
Cognates. This field contains formally or semantically related words from genetically related languages. Apart from their intrinsic interest, they are often methodologically useful since they may help identify the basic meaning of a lexeme. There can be various target languages for cognates, including Sanskrit. Example for lemma 'danta':
The content of this field goes beyond linguistic semantics, giving information on real-world, especially culture-specific properties of concepts designated. This field may refer to the pertinent section of the 'Situation of the language', esp. the ethnographic situation, for background information. This information can be provided for each target language. A good source of encyclopedic information is The Princeton Dictionary of Buddhism by Robert E. Buswell Jr. and Donald S. Lopez Jr.
A link to a picture related to the lemma.
A link to a sound file, containing the lemma pronounced by a human.
References to various works related to the lemma. Example: ‘Sinclair (ed.) 1987, ch. 7’
Comment. This field contains any additional information, esp. of a methodological, stylistic or sociolinguistic nature, including the status of the lemma and ungrammatical examples. In contrast to the following field, its content could, in principle, be published (although it seldom will be). This field exists for each target language.
This field contains questions to be investigated and problems to be solved in future lexicographic work, especially fieldwork. This field is directly related to the previous one: a problem is formulated in the present field. Once it is solved, its solution is noted in the field ‘comment’ (so that it may not be forgotten), and the problem is deleted. Thus, the content of this field is destined exclusively for the researcher and never published. This field exists for each target language.
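The hand-over between the two fields can be sketched as a single operation. This is only an illustration under assumptions: the entry is modelled as a plain dictionary, the key names "problems" and "comment" are hypothetical, and both fields are keyed by target-language code.

```python
from typing import Dict, List

Entry = Dict[str, Dict[str, List[str]]]


def resolve_problem(entry: Entry, lang: str, problem: str, solution: str) -> None:
    """Record a solution in 'comment', then delete the solved problem."""
    open_problems = entry["problems"].get(lang, [])
    if problem not in open_problems:
        raise ValueError(f"no such open problem for {lang!r}: {problem!r}")
    # Note the solution first, so that it cannot be forgotten ...
    entry["comment"].setdefault(lang, []).append(solution)
    # ... and only then remove the problem from the unpublished field.
    open_problems.remove(problem)


# Illustrative data only.
entry: Entry = {"problems": {"en": ["Is the plural attested?"]}, "comment": {}}
resolve_problem(entry, "en", "Is the plural attested?", "Plural attested in the corpus.")
```

After the call, the problem list for "en" is empty and the solution appears under the comment for the same language.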