What are the acceptable sentences

Introduction to syntax theory

Lecturer: Martin Volk

Grammar writing is much more difficult than rule writing. The intricate interrelations of the individual rules of a grammar make grammar writing a complex and error-prone process, much like computer programming. (Friedman, "Computational Testing of Linguistic Models in Syntax and Semantics", 1989)

Overview

The topics of this lecture are mainly based on [Borsley 91]. However, the English examples have largely been replaced by German ones.


Goals of syntax theory

Syntax theory has two goals:

  • the development of grammars, i.e. precise descriptions of the syntax of natural languages.
  • the development of theories to determine the syntactic similarities of natural languages.

The second point assumes that natural languages do not differ arbitrarily in their structure (as was still believed in the 1950s). One can imagine phenomena that do not occur in any natural language. For example, there is no language in which a question is formed by systematically and completely reversing the order of the words.

The boy ate the hamburger. * Hamburger the ate boy the.

What is language

Definition of language according to Chomsky

A language is the set of all sentences that a speaker could use.

Definition of formal languages

A formal language is a set of strings made up of the symbols of an alphabet (= a finite set of symbols) (after [Hopcroft and Ullman 92]).
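
The definition can be made concrete with a small sketch. The alphabet {a, b} and the example language {a^n b^n} are assumptions chosen purely for illustration; they do not come from the lecture. The point is only that a formal language is a subset of all strings over the alphabet, picked out by a precise membership condition.

```python
from itertools import product

# Hypothetical alphabet, chosen only for illustration.
ALPHABET = {"a", "b"}

def strings_up_to(alphabet, max_len):
    """Enumerate all strings over `alphabet` with length 0..max_len."""
    symbols = sorted(alphabet)
    for n in range(max_len + 1):
        for combo in product(symbols, repeat=n):
            yield "".join(combo)

def in_language(s):
    """Membership test for the example language {a^n b^n | n >= 0}."""
    half = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * half + "b" * half

# The language, restricted to strings of length <= 4, is a subset
# of all strings over the alphabet.
language = [s for s in strings_up_to(ALPHABET, 4) if in_language(s)]
```

For a natural language no such crisp membership test is available, which is what the differences listed in the next section are about.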

Differences Between Formal and Natural Languages

  • Structural ambiguities (see special lecture). Example: He defends hard-earned money.
  • Implicit elements: anaphors and ellipses. Example: Hans sees Peter and Maria Susanne.
  • Delimitation: what should be considered grammatically well-formed? Example: Hans sleeps late because he is very tired.

Acceptability and grammaticality

When analyzing a natural language, it must be determined which sentences a speaker of this language uses or could use. This can be achieved by asking native speakers for their judgment, their intuition, about a given sentence. This yields a judgment about the acceptability of a sentence. However, acceptability is not the same as grammaticality.

A sentence can be unacceptable for a number of reasons, for example because it presents problems for cognitive processing.

The man the girl the boy knows likes is here. The man who watches the woman who calls the child stands at the window.

The question arises: are there sentences that are acceptable but not grammatical?

He sees a not unintelligent person.

Grammaticality is based on linguistic competence. This means that a sentence is grammatical if it is classified as "acceptable" by most speakers in isolation from a context. In contrast, a sentence is considered acceptable if it is classified as such in a specific context. Acceptability is based on linguistic performance.

Grammatical sentence
Peter is hoping for more money.
Acceptable sentence
Peter is hoping for bananas.

in the context of the question:

What does Susanne like to eat?

Syntax theory and traditional grammar

Syntax theory draws on the source of traditional grammar, but strives for greater precision. For this purpose, various grammar theories were formulated within syntax theory, which define the formal framework of the syntax.

Grammar theory (definition)

A grammar theory is a formal system that defines what the rules and principles of a grammar for a natural language look like. A grammar theory is a metagrammar because it prescribes the syntax and semantics of the grammar rules.

Examples of grammar theories are:

For example, the grammar theory GPSG states that the syntax of a natural language can be described by:

  • three rule types (metarules, ID rules, LP rules),
  • two feature instantiation principles (Feature Specification Defaults and Feature Cooccurrence Restrictions) and
  • three feature inheritance principles (Head Feature Convention, Control Agreement Principle and Foot Feature Principle).

The usefulness of syntax theory

The benefits of syntax theory lie in the following areas:

  • Language teaching and learning
  • Computer implementation of language comprehension

We will first consider systems for recognizing spoken language. With the help of syntactic analysis, a computer can distinguish between homophones such as German mehr ('more') and Meer ('sea'):

He eats more meat than you. He's lying by the sea.

When generating spoken language, the different pronunciations of identical words must be determined.

I don't like assembly. I don't like Mondays. (In German both readings are written Montage but pronounced differently.)

The resolution of ambiguities can concern not only individual words, but also the references within a complex sentence.

The employees of every department who use PCs ... The employees of every department who use PCs ...

Finally, there are applications in which syntactic analysis is central: grammar checkers.

Motivation

Without assuming an internal structure of sentences, it is impossible to make statements about which sentences are possible in a language and which are not.

The man was angry with the woman.

This sentence could be analyzed as a chain consisting of: article, noun, verb, adjective, preposition, article and noun. However, this analysis would miss many generalizations. For example, it would not express that the connection between "the" and "man" is stronger than that between "man" and "was".

Therefore, methods of distributionalism are used to determine larger units, so-called constituents. These methods include (see [Duden 95], p. 600 ff.):

  • Substitution test

    General variant: Replacing a string of words with another string of words. Assumption: If the sentence remains grammatical after the replacement, the two word strings are of the same type.

    Special variant: Replacing a string of words with a pronoun. Assumption: Everything that can be replaced by a pronoun is a syntactic unit.

    Peter is looking for edible mushrooms. Peter is looking for them. * Peter is looking for edible them.
  • Movement test

    Moving a string of words within a sentence while preserving grammaticality and truth value. The movement options in German include: inversion and passivization.

    Peter is looking for edible mushrooms. Edible mushrooms Peter is looking for. Edible mushrooms were looked for by Peter. * Edible Peter is looking for mushrooms.

    In English, the movement options include: clefting, passivization and topicalization.

    Stefan painted a picture of Maja.
    Cleft: It was a picture of Maja that Stefan painted.
    Passive: A picture of Maja was painted by Stefan.
    Topicalization: A picture of Maja Stefan painted.
  • Coordination test

    Coordination of two word strings. Assumption: Only word strings of the same type can be coordinated.

    Peter looks for edible mushrooms and small wild strawberries. Peter searches in the undergrowth and at the edge of the forest. * Peter is looking for edible mushrooms and at the edge of the forest.
  • Deletion test

    Gradual omission of individual words. Assumption: The components of a constituent must be eliminated together.

    Peter looks for edible mushrooms at the edge of the forest. * Peter looks for edible mushrooms at the. Peter looks for edible mushrooms.

The following constituents are postulated for English:

  • Noun phrase (NP) consisting of e.g. Det + N; proper noun; Det + AdjP + N; NP Conj NP
  • Verb phrase (VP) consisting of e.g. Verb; Verb + NP; Verb + NP + PP
  • Prepositional phrase (PP) consisting of e.g. Prep + NP; PP Conj PP
  • Adjective phrase (AdjP) consisting of e.g. Adj; Adv + Adj; Adj + PP

Forms of representation

Syntax structures are represented as nested lists or as trees. These representations are isomorphic.

[The man] [was angry [with [the woman]]].
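
The isomorphism between the bracketed notation and a tree can be illustrated with a minimal sketch. The function names `parse_brackets` and `to_brackets` are hypothetical, and Python lists stand in for tree nodes: each list is a node and its items are the daughters. Converting the bracketed string into nested lists and back again loses no information.

```python
def parse_brackets(text):
    """Turn a bracketed string into nested Python lists of words."""
    tokens = text.replace("[", " [ ").replace("]", " ] ").split()
    stack = [[]]                      # stack[0] collects the top-level items
    for tok in tokens:
        if tok == "[":
            stack.append([])          # open a new constituent
        elif tok == "]":
            if len(stack) == 1:
                raise ValueError("unbalanced brackets")
            done = stack.pop()        # close it and attach it to its mother
            stack[-1].append(done)
        else:
            stack[-1].append(tok)     # a word belongs to the open constituent
    if len(stack) != 1:
        raise ValueError("unbalanced brackets")
    return stack[0]

def to_brackets(tree):
    """Inverse direction: nested lists back to a bracketed string."""
    parts = [
        "[" + to_brackets(item) + "]" if isinstance(item, list) else item
        for item in tree
    ]
    return " ".join(parts)

sentence = "[The man] [was angry [with [the woman]]]"
tree = parse_brackets(sentence)
```

Here `tree` holds two top-level constituents, and `to_brackets(tree)` reproduces the original bracketing.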

In the tree representation, detailed information that is not of interest is often left out and symbolized with a triangle. Important terminology:

Root
The highest node of a tree is called the root of the tree.
Dominance
A node X dominates a node Y if X is on the path from Y to the root.
Constituents
X is a constituent of Y if Y dominates node X.
Mother node
X is the mother node of Y if X is the next node on the path from Y to the root.
Sibling node
X and Y are sibling nodes if they have the same mother.
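
These definitions translate almost directly into code. The following is a minimal sketch; the `Node` class and the helper names are assumptions made for illustration, and dominance is read as strict (a node does not dominate itself).

```python
class Node:
    """A tree node that knows its mother and its daughters."""
    def __init__(self, label, children=()):
        self.label = label
        self.mother = None
        self.children = list(children)
        for child in self.children:
            child.mother = self

def path_to_root(node):
    """All nodes strictly above `node`, nearest first."""
    path = []
    while node.mother is not None:
        node = node.mother
        path.append(node)
    return path

def root(node):
    """The highest node of the tree that contains `node`."""
    path = path_to_root(node)
    return path[-1] if path else node

def dominates(x, y):
    """X dominates Y if X is on the path from Y to the root."""
    return x in path_to_root(y)

def is_constituent_of(x, y):
    """X is a constituent of Y if Y dominates X."""
    return dominates(y, x)

def are_siblings(x, y):
    """X and Y are siblings if they are distinct and share a mother."""
    return x is not y and x.mother is not None and x.mother is y.mother

# Hypothetical example tree: S -> NP VP, NP -> Det N, VP -> V
det, n, v = Node("Det"), Node("N"), Node("V")
np = Node("NP", [det, n])
vp = Node("VP", [v])
s = Node("S", [np, vp])
```

In this tree `dominates(s, det)` holds, `det.mother` is `np`, and `det` and `n` are siblings while `det` and `v` are not.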

Two important restrictions apply to syntax trees:

  1. Only parts that are next to each other can form a constituent. That means there are no crossing branches in the tree.
  2. No constituent can belong to two parent nodes at the same time.

These restrictions facilitate the automatic processing and representation, but they make the analysis of sentences with discontinuous elements (e.g. separated verb prefixes) and double function (e.g. double subject function in infinitive constructions) more difficult:

He likes to portray himself. He tries to help Gabi.

Constituent structure examined in more detail

A closer examination of the constituent structure shows that not only lexical categories (N, V, Adj, Prep) and phrasal constituents (NP, VP, PP, AdjP) are required, but also intermediate levels.

the old wise women and old men

This phrase cannot be analyzed with NP and N alone, because "old men" cannot be an independent NP here. Therefore one postulates an intermediate level N' with:

[NP the [N' old wise women] and [N' old men]]

With analogous arguments one also postulates V', P' and A'. In an alternative notation, a bar is drawn over the respective category symbol. A category that combines with a constituent of the intermediate level to form a constituent of the phrasal level is called a specifier.

Phrase Structure Rules

The best known type of syntactic rules are the phrase structure rules (PS rules). They determine what is possible and what is not possible in a language. A PS rule has the form:

A -> B1 ... Bn

where A is a constituent symbol of the grammar and all Bi are either constituent symbols (e.g. NP, AdjP, PP), category symbols (e.g. N, Adj, V) or lexemes (e.g. he, gone, house). The rule can be interpreted as a branching in a local tree or as a condition on a local tree. A local tree is a tree of depth 1, or, in other words, a tree in which there is only one parent node, which is also the root.

Recursion is crucial when using PS rules. Recursive rules allow an infinite number of sentences to be described with finite means.
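
How a handful of PS rules can describe arbitrarily long sentences can be sketched with a small recognizer. The grammar and lexicon below are hypothetical toy data, not a grammar proposed in the lecture. The rule NP -> Det N PP together with PP -> Prep NP is recursive, so noun phrases can be nested to any depth. The recognizer is a naive top-down procedure and assumes the grammar has no left-recursive rules.

```python
# Hypothetical toy grammar, written as PS rules A -> B1 ... Bn.
RULES = {
    "S":  [["NP", "VP"]],
    "NP": [["Det", "N"], ["Det", "N", "PP"]],   # recursive via PP -> Prep NP
    "VP": [["V"], ["V", "NP"]],
    "PP": [["Prep", "NP"]],
}
# Hypothetical lexicon mapping word forms to category symbols.
LEXICON = {
    "the": "Det", "man": "N", "woman": "N", "garden": "N",
    "sees": "V", "sleeps": "V", "in": "Prep",
}

def ends(symbol, words, start):
    """Return every position where a `symbol` starting at `start` can end."""
    if symbol not in RULES:                      # category symbol: consume one word
        if start < len(words) and LEXICON.get(words[start]) == symbol:
            return {start + 1}
        return set()
    result = set()
    for right_side in RULES[symbol]:
        positions = {start}
        for child in right_side:                 # match B1 ... Bn from left to right
            positions = {e for p in positions for e in ends(child, words, p)}
        result |= positions
    return result

def is_sentence(text):
    """True if the whole word string can be analyzed as an S."""
    words = text.split()
    return len(words) in ends("S", words, 0)
```

With these four rule sets, `is_sentence` accepts "the man sleeps" as well as "the man in the garden sees the woman in the garden", and rejects "man the sleeps".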

Note: In more recent work on syntax theory it is argued that PS rules do not necessarily have to be part of a grammar (see e.g. HPSG in [Pollard and Sag 94]).

Rules and sentences

The first goal in applying PS rules is the description of exactly the sentence structures of a natural language. The restriction to exactly the sentence structures that occur is important, as otherwise rules of the following form could be drawn up:

Sentence -> Word Sentence
Sentence -> Word

Once all the words of the language have been recorded, these two rules can be used to form all sentences. However, they license every other word string as well, so the explanatory value of this type of rule is zero.

A second goal is a description of the sentence structures that is as simple as possible. We are looking for the smallest possible number of rules that are as compact and expressive as possible. If, for example, the direct object always comes before the indirect object in a language, one would want to state this fact only once in the grammar and not repeat it in many rules.

A third goal is to find rules that can be transferred to other languages as easily as possible.

Separate rules of dominance and precedence

In order to be able to express generalizations that are only implicit in PS rules, newer grammar theories distinguish between dominance rules and precedence rules. Dominance rules (ID rules, from English immediate dominance) only determine the dominance relationship between a parent node and its child nodes. The order of the sibling nodes is left open. Only precedence rules (LP rules, from English linear precedence) impose ordering restrictions on sibling nodes.

Dominance rules are noted in a similar way to PS rules with the difference that the elements on the right-hand side of the rule are separated by commas.

A -> B1, B2, ..., Bn

Rules of precedence are noted as:

A < B

with the meaning that A must stand before B when both appear as siblings.

The following then applies: A local tree is well-formed if and only if it satisfies the dominance conditions of an ID rule and the precedence conditions of all relevant LP rules.

Note: ID and LP rules can only be used if the sibling order is independent of the parent node.
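
The well-formedness condition for local trees can be sketched as follows. The ID and LP rules listed here are hypothetical examples, not rules from a specific GPSG grammar. An ID rule is stored as a mother plus an unordered collection of daughters, and an LP rule as a pair (A, B) meaning that A must precede B.

```python
from collections import Counter

# Hypothetical ID rules: mother -> unordered daughters.
ID_RULES = [
    ("VP", ["V", "NP"]),
    ("VP", ["V", "NP", "PP"]),
    ("NP", ["Det", "N"]),
]
# Hypothetical LP rules: (A, B) means A must precede B among siblings.
LP_RULES = [("V", "NP"), ("V", "PP"), ("NP", "PP"), ("Det", "N")]

def satisfies_id(mother, daughters):
    """Some ID rule licenses this mother with exactly these daughters, in any order."""
    return any(
        m == mother and Counter(ds) == Counter(daughters)
        for m, ds in ID_RULES
    )

def satisfies_lp(daughters):
    """No LP rule (A, B) is violated by a B that stands before an A."""
    for a, b in LP_RULES:
        for i, left in enumerate(daughters):
            if left == b and a in daughters[i + 1:]:
                return False
    return True

def well_formed(mother, daughters):
    """A local tree must satisfy one ID rule and all relevant LP rules."""
    return satisfies_id(mother, daughters) and satisfies_lp(daughters)
```

The local tree VP over V NP PP is accepted. VP over NP V passes the dominance check but fails the precedence check, which shows that the two kinds of conditions are tested independently.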

Non-local conditions over trees

Both PS rules and ID / LP rules are limited in their scope to local trees. Their use therefore becomes problematic when phenomena are more widely distributed. Examples are discontinuous elements:

Stefan scratched himself. Who did John believe Mary saw? Stefan introduces us to his childhood friend, whom he hadn't seen for 15 years.

One tries to get these problems under control in modern grammar theories by transporting syntactic features in the syntax tree.

In modern grammar theories (such as GPSG, HPSG), syntactic categories are not atomic units but complex structures composed of smaller elements, the syntactic features. For example, for the word form Mann ('man') one wants to record not only that it is a noun, but also that this form is nominative singular and that the noun has masculine gender.

'Mann' -> N [case = nom, number = sg, gender = masc]

In the same way, verb forms must be provided with information on number, person and tense, and adjectives with information on case, number, gender and declension.

Phrasal and Lexical Categories

Further evidence for the claim that categories are complex structures comes from the observation that phrasal categories are projections of lexical categories (X-bar theory). This means that a category XP normally contains a category X' as a child node, which in turn has a child X' or X, where X stands for A, N, P or V. So if NP, N' and N are all nominal in character, this should be encoded in the categories themselves. This is usually done by introducing a 'bar' feature to indicate the level. So:

[Nominal = +, Bar = 0]   N
[Nominal = +, Bar = 1]   N'
[Nominal = +, Bar = 2]   NP

Note: Phrasal categories are also known as maximal projections.

Generalizations about categories

The properties nominal and verbal are seen as elementary and serve to define the basic categories:

         | +nominal     -nominal
---------|--------------------------
+verbal  | Adjective    Verb
-verbal  | Noun         Preposition

This definition achieves a higher degree of abstraction, which allows more general statements. Thus, an accusative NP now corresponds to the following feature structure:

[Nominal = +, Verbal = -, Bar = 2, Case = Acc]
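
The gain in abstraction can be shown with a short sketch in which categories are plain dictionaries of feature-value pairs. The helper names `matching` and `basic_category` are hypothetical. The sketch shows that one partial description such as Nominal = + picks out a whole class of categories at once.

```python
# The four basic categories, defined by the features Nominal and Verbal.
BASIC_CATEGORIES = {
    "Adjective":   {"Nominal": "+", "Verbal": "+"},
    "Verb":        {"Nominal": "-", "Verbal": "+"},
    "Noun":        {"Nominal": "+", "Verbal": "-"},
    "Preposition": {"Nominal": "-", "Verbal": "-"},
}

def matching(**features):
    """Names of all basic categories that carry the given feature values."""
    return sorted(
        name for name, cat in BASIC_CATEGORIES.items()
        if all(cat.get(f) == v for f, v in features.items())
    )

def basic_category(structure):
    """Name of the basic category that a full feature structure belongs to."""
    names = matching(Nominal=structure["Nominal"], Verbal=structure["Verbal"])
    return names[0]

# The accusative NP from the text as a feature structure.
acc_np = {"Nominal": "+", "Verbal": "-", "Bar": 2, "Case": "Acc"}
```

A statement about all +nominal categories, for instance, then covers adjectives and nouns without naming either of them.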

Syntactic features

Syntactic categories are nothing more than sets of syntactic features, more precisely: feature-value pairs. These are called feature structures. However, some additional conditions apply. A feature in a feature structure can have the following values:

  1. a string (e.g.)
  2. a reference to another value (a so-called co-reference) or
  3. again a feature structure

Note: In the following, for ease of notation, we use NP, VP, AdjP and PP as abbreviations for the complex feature structures with the features nominal, verbal and bar.

Operations using feature structures

In order to be able to apply rules that contain complex feature structures, we need an operation that determines when two feature structures "match". This operation is called unification. Intuitively, it corresponds to a compatibility test between two feature structures and, if the test succeeds, combines them into a new feature structure.

The unification is often introduced via the subsumption relation: A feature structure X subsumes a feature structure Y if and only if Y contains all feature-value pairs of X (and perhaps others as well).

If two feature structures X and Y are in a subsumption relation to each other, they can always be unified. The result is then equal to the more information-rich feature structure. But they can also be unified if they are not in a subsumption relation, but their contents are compatible. More details can be found in the lecture on the fundamentals of feature logic.
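
Subsumption and unification can be sketched for the simple case in which feature structures are nested dictionaries. This is a simplified illustration under stated assumptions: co-references (shared values) are not modeled, and the function names are hypothetical.

```python
def subsumes(x, y):
    """True if y contains every feature-value pair of x (x is more general)."""
    for feature, value in x.items():
        if feature not in y:
            return False
        if isinstance(value, dict) and isinstance(y[feature], dict):
            if not subsumes(value, y[feature]):
                return False
        elif value != y[feature]:
            return False
    return True

def unify(x, y):
    """Return the unification of x and y, or None if they are incompatible."""
    result = dict(x)
    for feature, value in y.items():
        if feature not in result:
            result[feature] = value              # new information from y
        elif isinstance(result[feature], dict) and isinstance(value, dict):
            sub = unify(result[feature], value)  # unify embedded structures
            if sub is None:
                return None
            result[feature] = sub
        elif result[feature] != value:
            return None                          # conflicting values
    return result

nom_sg = {"case": "nom", "number": "sg"}
mann = {"case": "nom", "number": "sg", "gender": "masc"}
```

Since `nom_sg` subsumes `mann`, their unification equals `mann`, the more informative structure. Two structures such as `{"case": "nom"}` and `{"gender": "masc"}` stand in no subsumption relation but are compatible and unify to a structure containing both pairs, whereas `{"case": "nom"}` and `{"case": "acc"}` fail to unify.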


Martin Volk
URL of this page: http://www.ifi.unizh.ch