Linguistic Structure
Humans somehow turn linear sequences of words into complex meaning built from bigger, non-linear units. We need to make this structural complexity explicit. Sometimes the structure is even ambiguous.
We can use this to extract information from human languages.
Why is this hard?
- coding: global clarity, local ambiguity (the number of white spaces doesn’t matter, but the code always has one exact meaning)
- speaking: global ambiguity, local clarity (words are always clearly said, but what they refer to may be unclear)
Prepositional Ambiguity
Why? — A Prepositional Phrase does not have a single clear attachment point, and the number of possible attachment structures grows exponentially with the number of PPs (in the classic example "I saw the man with the telescope", did I use the telescope, or does the man have one?).
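For a chain of PPs at the end of a sentence, the number of distinct attachment structures follows the Catalan numbers, which blow up quickly. A tiny sketch (the Catalan-number framing is the standard analysis, not something spelled out in these notes):

```python
# Number of possible attachment structures for n trailing PPs follows the
# Catalan numbers, C(n) = (2n)! / ((n+1)! n!) -- an exponentially growing series.
# (Standard analysis; the printed counts are purely illustrative.)
from math import comb

def catalan(n: int) -> int:
    return comb(2 * n, n) // (n + 1)

for n in range(1, 8):
    print(f"{n} PPs -> {catalan(n)} possible attachments")
```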
Coordination Scope Ambiguity
Two Representations
Phrase-Structure Grammar
Phrase-Structure Grammar uses Context-Free Grammars to break language into nested phrases (nobody actually does this unless you want to go build a Constituency Parser.)
- starting with word units (perform POS Tagging)
- combine words into phrase units (create NPs, VPs, etc.)
- combine phrases into bigger phrases (build up sentences with phrase-level grammars)
With this parsing information, you can write down a Context-Free Grammar based on what combinations could be possible:
And you can use this CFG to parse your text (not very well… because of ambiguity).
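As a minimal sketch of that pipeline, here is a toy CFG in NLTK; the grammar rules and the example sentence are my own illustrative choices, not from these notes. The ambiguity shows up as multiple parse trees for the same string:

```python
import nltk

# Toy CFG covering a single PP-attachment ambiguity (illustrative only).
grammar = nltk.CFG.fromstring("""
    S  -> NP VP
    NP -> Det N | Det N PP | 'I'
    VP -> V NP | VP PP
    PP -> P NP
    Det -> 'the'
    N  -> 'man' | 'telescope'
    V  -> 'saw'
    P  -> 'with'
""")

parser = nltk.ChartParser(grammar)
sentence = "I saw the man with the telescope".split()

# Two trees come back: PP attached to the VP (I used the telescope)
# and PP attached to the NP (the man has the telescope).
for tree in parser.parse(sentence):
    tree.pretty_print()
```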
Dependency Grammar
Dependency structure uses a graph between words to represent relationships directly at the word level.
“Which word is the head word, and what does it modify?”
- find head word
- which things modify the head word
repeat
Dependency Structure
We begin with the head word of the sentence (usually the main verb); then we go through the text and connect each head to its dependents. This forms a tree with the head word on top, and each head word POINTS to its dependents.
Sometimes people point from dependent to head, but that’s lame. So we point from HEAD to dependent (ROOT => Verb, and so on).
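As a tiny illustration of this head-to-dependent convention (the sentence and the relation labels below are assumed for the example, not taken from these notes):

```python
# Dependency tree for "She ate the cake", stored as (head, dependent, relation)
# arcs. Index 0 is the artificial ROOT; the relation labels are illustrative.
words = ["<ROOT>", "She", "ate", "the", "cake"]

arcs = [
    (0, 2, "root"),   # ROOT -> ate   (the verb sits at the top)
    (2, 1, "nsubj"),  # ate  -> She
    (2, 4, "obj"),    # ate  -> cake
    (4, 3, "det"),    # cake -> the
]

for head, dep, rel in arcs:
    print(f"{words[head]:>7} --{rel}--> {words[dep]}")
```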
Sources of Dependency Information
- Bilexical affinities — is a particular pairwise dependency possible?
- Dependency distance — most, but not all, dependencies are between nearby words
- Intervening material — dependencies rarely span intervening verbs or punctuation
- Valency — how many dependents, and on which side, are usual for a given head?
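A rough sketch of how the first three of these could feed a score for a candidate arc (valency would additionally need to track how many dependents the head already has). Every number and the affinity table here are made up purely for illustration:

```python
# Toy arc scorer: higher score = more plausible head -> dependent arc.
# Affinity table, weights, and the example sentence are all made up.
AFFINITY = {("ate", "cake"): 2.0, ("ate", "she"): 1.5}

def arc_score(words, head_idx, dep_idx):
    head, dep = words[head_idx].lower(), words[dep_idx].lower()
    score = AFFINITY.get((head, dep), 0.0)                 # bilexical affinity
    score -= 0.3 * abs(head_idx - dep_idx)                 # dependency distance
    between = words[min(head_idx, dep_idx) + 1 : max(head_idx, dep_idx)]
    score -= 1.0 * sum(w in {",", ";", ":"} for w in between)  # intervening punctuation
    return score

words = ["She", "ate", "the", "cake"]
print(arc_score(words, 1, 3))  # ate -> cake: 2.0 - 0.6 = 1.4
print(arc_score(words, 1, 0))  # ate -> She:  1.5 - 0.3 = 1.2
```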
Non-Projective Dependencies
Dependency arcs can sometimes cross each other (for example, when a modifier is extraposed to the end of the sentence); such structures are called non-projective.
Parsing Mechanism
Greedy Transition-Based Parsing
- take a stack, \(\sigma\), written with top to the right
- take a buffer, \(\beta\), written with top to the left
- a set of dependency arcs \(A\), which we predict
Three operations: move the next word from the buffer onto the stack (“shift”); make the second-to-top word on the stack a dependent of the top word and pop it (“arc-left”); make the top word on the stack a dependent of the second-to-top word and pop it (“arc-right”). A greedy classifier picks one of these actions at each step.
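Here is a minimal sketch of those three operations on word indices. In a real greedy parser a trained classifier picks the next action from the current (stack, buffer) state; below, the action sequence is hand-written for an illustrative sentence:

```python
# Minimal arc-standard transition system: shift, arc-left, arc-right.
# A real greedy parser would choose each action with a classifier;
# here the sequence is hand-written for "She ate cake" (illustrative).
def parse(words, actions):
    stack = [0]                          # sigma: starts with ROOT (index 0)
    buffer = list(range(1, len(words)))  # beta: remaining word indices
    arcs = []                            # A: (head, dependent) pairs

    for action in actions:
        if action == "shift":
            stack.append(buffer.pop(0))
        elif action == "arc-left":       # second-to-top becomes dependent of top
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif action == "arc-right":      # top becomes dependent of second-to-top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

words = ["<ROOT>", "She", "ate", "cake"]
actions = ["shift", "shift", "arc-left", "shift", "arc-right", "arc-right"]
print(parse(words, actions))  # [(2, 1), (2, 3), (0, 2)]
```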
Graph-Based Parsing
Ask each word “which word am I a dependent of?” This turns parsing into a classification problem for each word over the other words in the sentence (plus ROOT). This is what Stanza uses.
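For reference, a small usage sketch with Stanza (assumes the package is installed and the English models downloaded; the sentence is illustrative):

```python
import stanza

stanza.download("en")  # one-time model download
nlp = stanza.Pipeline("en", processors="tokenize,pos,lemma,depparse")

doc = nlp("She ate the cake with a fork.")
for sent in doc.sentences:
    for word in sent.words:
        # word.head is 1-based; 0 means the word attaches to ROOT.
        head = sent.words[word.head - 1].text if word.head > 0 else "ROOT"
        print(f"{word.text:>6} <--{word.deprel}-- {head}")
```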
Evaluating Depparse
We assess whether we have a good parser by checking how many of the predicted arcs match the gold-standard arcs.
Unlabeled Dependency Score
We consider the precision/recall/accuracy/F1 (P/R/A/F) over arcs, where each arc is a (head word, dependent word) tuple; only exact matches against the gold arcs count as positive.
Labeled Dependency Score
We consider the P/R/A/F over arcs, where each arc is a (head word, dependent word, arc-label) tuple; only exact matches against the gold arcs count as positive.
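These correspond to the usual UAS/LAS attachment scores; when every word gets exactly one predicted head, precision, recall, and accuracy coincide. A small sketch over one sentence (the gold and predicted analyses below are invented for illustration):

```python
# Unlabeled / labeled attachment scores for one sentence.
# Each entry maps a dependent word index to its (head index, arc label).
# The gold and predicted analyses below are made up for illustration.
gold = {1: (2, "nsubj"), 2: (0, "root"), 3: (4, "det"), 4: (2, "obj")}
pred = {1: (2, "nsubj"), 2: (0, "root"), 3: (4, "det"), 4: (2, "iobj")}

uas = sum(pred[d][0] == h for d, (h, _) in gold.items()) / len(gold)
las = sum(pred[d] == gh for d, gh in gold.items()) / len(gold)

print(f"UAS = {uas:.2f}")  # 1.00 -> every head is right
print(f"LAS = {las:.2f}")  # 0.75 -> one arc has the wrong label
```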