<?xml version="1.0"?><!DOCTYPE article SYSTEM "/project/take/software/searchbench_offline_processing/paperxml_generator/aclextractor/src/python/../resource/dtd/paperxml.dtd"><article><header><firstpageheader><page local="1"/><title>ON A CERTAIN DISTRIBUTION OF SEMANTIC UNITS</title></firstpageheader><abstract></abstract></header><body><section title=""><p>1965 International Conference on Computational Linguistics</p><p>ON A CERTAIN DISTRIBUTION OF SEMANTIC UNITS</p><p>Wojciech Skalmowski</p><p>Department of General Linguistics, Jagellonian University, 35 Krupnicza, Kraków, Poland<page local="2"/></p><p>Skalmowski - 1</p><p>SUMMARY. A remarkable regularity in the distribution of Arabic verbal roots across the vocabulary is shown to exist. The results presented suggest that similarly regular distributions of semantic units in other languages may be found with the help of word formation rules and vocabulary statistics. Possible applications to the problem of "true" multiple meaning in MT are discussed.</p><p>The notion of "semantic unit" may be formulated in several ways /1/, so that any use of the term requires an explicit definition. The difficulty in defining it seems to arise from the fact that, like most general terms, it should be related to some definite theory. At present we do not possess any sufficiently strong and general theory of the semantics of natural languages, though important preliminary steps in this direction have already been made /2/. For this reason most semantic investigations of natural languages still preserve the "artisanlike" character stressed by M. Coyaud, and all definitions of semantic notions remain rather tentative, as do all the more general conclusions drawn from such investigations. 
This, too, holds true for the present contribution, in which an empirical fact is described and some remarks are made on its possible applications to the problem of "true" multiple meaning.</p><page local="3"/><p>For this paper it seems advisable to keep apart two notions: that of the "concept" and that of the "semantic unit". Given a generative descriptive device G /grammar/ and a projective system S /semantics/ of the type proposed by Katz and Fodor, we can describe a semantic <u>concept</u> in a language L as a set of n-tuples of symbols from G and S, ordered or partially ordered by the relations which define the formal rules of these systems, and having a common derivation in S. This broad frame allows us to regard as a concept every dictionary entry - except for the "grammatical words", which do not possess any derivations in G - and leaves us a wide margin of freedom in constructing arbitrary "concept-systems" with a priori established features.</p><p>In a similar way we may describe a <u>semantic unit</u> as a set of n-tuples of G-symbols, G-rules of word formation and S-symbols, ordered or partially ordered by the relations which define the formal rules of these systems, and having a common derivation in G from some S-symbol uniquely related to some G-symbol. This allows us to relate the notion of a semantic unit to the linguistic notions of the morpheme /or, more strictly, the semanteme/ and of the "word family", defined in terms of grammatical derivations.</p><p>The thesauric approach to the problem of meaning in MT /see e.g. 3/ pays tribute to the idea of ordering the symbols within the concepts, but at the same time it brings to light the problem of multiple meaning. This problem has already been much discussed /see e.g. 4/, but it is still far from being solved in all its aspects. 
Generally speaking, the main difficulty<page local="4"/> arises from the fact that the "concept-systems" of languages are not isomorphic, and even if we manage to bring them closer together there remains some amount of "looseness" within the concepts themselves, giving rise to the problem of "true" multiple meaning. The "contextual" multiple meaning may be resolved - in principle, at least - by extending the notion of concept, both in the source and in the target language, to whole sentences or even larger utterances; this is allowed by our "broad" treatment of this notion, which does not specify the maximal size of the n-tuples of symbols. By this extension the inner structure of concepts makes the relations defining the isomorphism of the "concept-systems" more apparent; thus even such cases as the adequate translation of the Russian <i>изменение</i> as the English "<u>changing</u> /the order of integration/" and "<u>varying</u> /argument/" are theoretically resolvable. Yet there exist instances where the extension of the concept would have to go beyond all limits and involve the whole language: these are cases of "stylistic" difference, in which there are no apparent reasons for choosing one of the possible synonyms instead of the other, but where the difference is distinctly felt by competent bilingual speakers. The problem is important for the translation of literary pieces, especially poetry; at the present stage of MT it is still an "academic" problem, of course, but it exists after all. 
It may be best illustrated by the question whether there are "better" and "worse" translations of nonsensical expressions, such as the famous "furiously sleeping ideas". A negative answer would mean that every translation is equally good, which in turn would mean that only "meaningful" sentences are translatable; in that case<page local="5"/> the MT problems would be "enriched" with the whole load of philosophical questions - an embarrassing development, certainly.</p><p>Vaguely felt differences between the intrinsic "semantic values" of different elements of language have given rise to the notions of "size" or "content" of semantic elements /5/, and several attempts have been made both to define these notions and to furnish models of the underlying mechanism /5,6/. The main assumption - based on observations of Willis - was that there exists a "natural hierarchy" of concepts in natural languages, forming a tree or at least a lattice with some definite statistical properties.</p><p>The present paper gives some results of an investigation undertaken in order to test this hypothesis. Because of the marvelous clarity of its grammatical structure, Arabic has been chosen as a "laboratory example". About 90% of Arabic semantemes are verbal roots, with very few exceptions consisting of three consonants $C_1$-$C_2$-$C_3$; the usual dictionary form is the 3rd pers. sg. masc. perf., of the form $C_1aC_2aC_3a$, e.g. kasara "to break" /lit. "he has broken"/. There are more than ten different verbal stem-patterns, i.e. word formation rules, modifying the basic meaning of the root in a specific way; thus the stem-pattern II, $C_1aC_2C_2aC_3a$, adds to the basic meaning the shade of intensity, e.g. 
kasara "to break" - kassara "to smash"; the stem-pattern III is conative, the IV - causative, etc.</p><p>All the triliteral verbal roots in the Arabic vocabulary have been divided into separate classes according to their ability to form s = 1, 2, ..., n different stems. As only<page local="6"/> the number of stem-patterns was considered and further applicable word formation rules /substantivisations, adjectivisations etc./ were disregarded, this classification is a very rough approximation to the hypothetical underlying hierarchy. It has been assumed that the number of stem-patterns defining a given class may be approximately viewed as an exponent of the "content" or "semantic value" of the semantic units belonging to this class, and that - if the hypothetical hierarchy was really based on this principle - the number of roots with greater s should be smaller than that with smaller s. Baranov's Arabic-Russian Dictionary /7/ has been used for counting the roots, and it has been found that the relation between s /the number of stem-patterns characterizing the given class/ and r /the number of roots belonging to this class/ is not only inversely proportional but also nearly functional, and that the distribution of roots in the Arabic vocabulary may be described by a simple function $r(s) = N(As^{2} + Bs + C)$, where N is the sum-total of roots and A, B and C are specific constants. The goodness of fit has been tested by the chi-square distribution, and it has been found that the differences between the empirical data and the theoretical distribution - except for one value - do not exceed the 0.3 significance level.</p><p>In order to estimate the possible differences between particular dictionaries - which could arise from differences between the materials used for their compilation - two samples of ca. 
700 items each have been taken from two different dictionaries /7,8/ and the distributions of roots in them have been compared with each other and with the over-all distribution.<page local="7"/> All the distributions show a striking similarity, yielding nearly identical chi-square values.</p><doubt alpha="50.0" length="2" tooSmall="False" monospace="0.0">x/</doubt><p>This result is a strong argument for the general validity of the discussed distribution in Arabic - and this fact in its turn speaks in favour of the existence of "natural hierarchies" of semantic units in general.</p><doubt alpha="48.1" length="81" tooSmall="False" monospace="0.0">The constants for Baranov's Dictionary are: A = 0.004419 , B = 0.082 , C = 0.3812</doubt><p>It seems very probable that similar regular distributions might be found in other languages, too; perhaps the ensemble of "semantic parameters" would have to be much wider and the "trial and error" investigations would require more time, but the whole work can easily be mechanised. 
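</p><p>The fit described for Arabic lends itself to such mechanical checking. The following Python sketch is an illustration only, not the procedure used in the investigation. It evaluates the distribution function with the constants given for Baranov's Dictionary and computes the Pearson chi-square contributions against the observed class sizes. Two points are assumptions: the linear term is taken with a negative sign, i.e. $r(s) = N(As^{2} - Bs + C)$, because this reading reproduces the theoretical row of the table of figures; and the observed counts are taken from the row for Baranov's Dictionary in that table and should be checked against the printed table. The function names are hypothetical.</p><p>
```python
# Illustrative sketch only; names and the sign of the linear term are assumptions.

N = 3209  # sum-total of roots counted in Baranov's Dictionary
A, B, C = 0.004419, 0.082, 0.3812  # constants quoted for Baranov's Dictionary

# Observed class sizes r for s = 1..9 (assumed reading of the table of figures).
OBSERVED = [988, 714, 586, 411, 254, 154, 74, 18, 10]


def theoretical_roots(s, n=N, a=A, b=B, c=C):
    """Expected number of roots forming exactly s stems.

    Assumption: the linear term is subtracted, r(s) = N(A*s^2 - B*s + C).
    """
    return n * (a * s * s - b * s + c)


def chi_square_terms(observed, expected):
    """Per-class Pearson contributions (O - E)^2 / E."""
    return [(o - e) ** 2 / e for o, e in zip(observed, expected)]


expected = [theoretical_roots(s) for s in range(1, 10)]
terms = chi_square_terms(OBSERVED, expected)

for s, (o, e, t) in enumerate(zip(OBSERVED, expected, terms), start=1):
    print(f"s={s}  observed={o:4d}  theoretical={e:7.1f}  contribution={t:6.2f}")
print(f"chi-square = {sum(terms):.2f}")
```
</p><p>Under these assumptions the rounded theoretical values coincide with the tabulated ones, and the class s = 9 contributes by far the largest share of the statistic, in line with the remark that one value is exceptional.</p><p>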
The idea of interconnections between the syntactic and semantic structures of language is not new in structural linguistics /see 9 and 10/, and investigations along these lines have already been conducted in the domain of computational linguistics under the direction of P. Garvin /11/. My suggestions go towards discovering such regular<page local="8"/> distributions, which would facilitate the task of finding stricter correlations between the synonyms within particular concepts on a computational basis. The underlying assumption is that the "universes of discourse" in various languages are of about the same "size" /whatever that may mean - but such an assumption is tacitly made in every translation/, and that the semantic units underlying the components of concepts are ordered according to their "content", so that the problem of "true" multiple meaning may in certain cases be solved by matching the components of concepts of the source and target languages on the basis of their "semantic value".</p>
<table class="main" frame="box" rules="all" border="1" regular="False">
<tr class="row"><td class="cell"><p>x/ The figures are as follows:</p></td></tr>
<tr class="row"><td class="cell"><p>r for s =</p></td><td class="cell"><p>1</p></td><td class="cell"><p>2</p></td><td class="cell"><p>3</p></td><td class="cell"><p>4</p></td><td class="cell"><p>5</p></td><td class="cell"><p>6</p></td><td class="cell"><p>7</p></td><td class="cell"><p>8</p></td><td class="cell"><p>9</p></td><td class="cell"><p>N</p></td></tr>
<tr class="row"><td class="cell"><p>Baranov's Dictionary</p></td><td class="cell"><p>988</p></td><td class="cell"><p>714</p></td><td class="cell"><p>586</p></td><td class="cell"><p>411</p></td><td class="cell"><p>254</p></td><td class="cell"><p>154</p></td><td class="cell"><p>74</p></td><td class="cell"><p>18</p></td><td class="cell"><p>10</p></td><td class="cell"><p>3209</p></td></tr>
<tr class="row"><td class="cell"><p>theoretical distribution</p></td><td class="cell"><p>974</p></td><td class="cell"><p>754</p></td><td class="cell"><p>561</p></td><td class="cell"><p>398</p></td><td class="cell"><p>262</p></td><td class="cell"><p>155</p></td><td class="cell"><p>76</p></td><td class="cell"><p>26</p></td><td class="cell"><p>4</p></td><td class="cell"><p></p></td></tr>
<tr class="row"><td class="cell"><p>sample /Baranov/</p></td><td class="cell"><p>[illegible]</p></td><td class="cell"><p>163</p></td><td class="cell"><p>131</p></td><td class="cell"><p>99</p></td><td class="cell"><p>55</p></td><td class="cell"><p>[illegible]</p></td><td class="cell"><p>18</p></td><td class="cell"><p>3</p></td><td class="cell"><p>1</p></td><td class="cell"><p>708</p></td></tr>
<tr class="row"><td class="cell"><p>sample /Wehr/</p></td><td class="cell"><p>229</p></td><td class="cell"><p>163</p></td><td class="cell"><p>117</p></td><td class="cell"><p>95</p></td><td class="cell"><p>50</p></td><td class="cell"><p>26</p></td><td class="cell"><p>13</p></td><td class="cell"><p>3</p></td><td class="cell"><p>1</p></td><td class="cell"><p>697</p></td></tr>
</table>
<p>As an illustration let us consider a few equivalent English verbs in two different translations /A. - 12, N. - 13/ of the Koranic Sura 84, being translations of Arabic verbs derived from roots all belonging to the same class /5 stem-patterns/, i.e. according to our assumption having about the same "semantic value". 
The "value" of the corresponding English verbs has been tentatively estimated by the number of different sub-entries in Chambers's 20th Century Dictionary /numbers in brackets/:<page local="9"/></p>
<table class="main" frame="box" rules="all" border="1" regular="False">
<tr class="row"><td class="cell"><p><u>Arabic</u></p></td><td class="cell"><p><u>English</u> /A./</p></td><td class="cell"><p><u>English</u> /N./</p></td></tr>
<tr class="row"><td class="cell"><p>infatara</p></td><td class="cell"><p>to split /16/</p></td><td class="cell"><p>to sever /3/</p></td></tr>
<tr class="row"><td class="cell"><p>garra</p></td><td class="cell"><p>to deceive /5/</p></td><td class="cell"><p>to beguile /4/</p></td></tr>
<tr class="row"><td class="cell"><p>sawiya</p></td><td class="cell"><p>to shape /1[illegible]/</p></td><td class="cell"><p>to fashion /11/</p></td></tr>
<tr class="row"><td class="cell"><p>sala</p></td><td class="cell"><p>to roast /9/</p></td><td class="cell"><p>to burn /30/</p></td></tr>
</table>
<p>The applied "method" being unsystematic and ad hoc, the example allows no generalisations, but it may illustrate our argument that the problem of "true" multiple meaning arises in cases of "expressive language" from the fact that, even when the concepts of the source and target languages agree, there is no correlation between their respective components except for differences between their "value", based on differences on the paradigmatic level. Thus e.g. for the concept "applying heat to something" two different semantic units could have been arbitrarily chosen by the two interpreters, as they regarded the subsets of synonyms within the concepts as unordered. My suggestion is that these subsets might be at least partially ordered by means of the intrinsic value of the semantic units underlying them, and that correlations between them might be established in the more objective terms of numeric measures of their content.</p></section><references><p>/1/ Coyaud M. 
- Quelques problèmes de construction d'un "langage formalisé sémantique". La Traduction Automatique 1963, fasc. 2.</p>
<p>/2/ Katz J.J., Fodor J.A. - The structure of a semantic theory. Language 39/2/, 1963.</p>
<p>/3/ Sparck-Jones K. - Mechanised semantic classification. 1961 International Conference on Mech. Transl. and Applied Language Analysis. London 1962. Vol. II.<page local="10"/></p>
<p>/4/ Janiotis A., Josselson H.H. - Multiple Meaning in Machine Translation, ibid.</p>
<p>/5/ Herdan G. - Type-Token Mathematics. Mouton et Co., The Hague 1960.</p>
<p>/6/ Mandelbrot B. - On the Language of Taxonomy: an Outline of a "Thermostatistical" Theory of Systems of Categories with Willis /Natural/ Structure. Information Theory, ed. C. Cherry, London 1956.</p>
<p>/7/ Baranov X.K. - Arabsko-russkij slovar', /2nd ed./, Moskwa 1958.</p>
<p>/8/ Wehr H. - Arabisches Wörterbuch für die Schriftsprache der Gegenwart. O. Harrassowitz, Leipzig 1952.</p>
<p>/9/ Kurylowicz J. - Dérivation lexicale et dérivation syntaxique /Contribution à la théorie des parties du discours/. Bull. de la Soc. de Linguistique de Paris, Vol. XXXVII, 1936.</p>
<p>/10/ Kurylowicz J. - Zametki o značenii slova. Voprosy Jazykoznanija, 1955, No 3.</p>
<p>/11/ Swanson D.R. - The Nature of Multiple Meaning. Proceedings of the National Symposium on MT /Los Angeles 1960/, ed. H.P. Edmundson.</p>
<p>/12/ Arberry A.J. - The Koran Interpreted. Oxford Univ. Press 1964.</p>
<p>/13/ Nicholson R.A. - A Literary History of the Arabs. The Cambridge Univ. Press 1907.</p></references></body></article>