Tuesday, July 15, 2003

on names, nominals, and whether this matters at all

Work drained the hell out of me today as I tried, mostly in vain, to distill Kamp and Reyle (hereafter K&R)'s From Discourse to Logic: Introduction to Model-Theoretic Semantics of Natural Language, Formal Logic and Discourse Representation Theory into a few pages.

After struggling to find a good starting place, I settled upon names. On the surface, names seem fairly straightforward and easy to recognize. We annotators joke that if you'd capitalize it, it's a name. That's hardly a fail-safe test, unfortunately. K&R's description of names goes something like this: names provide no description of their referents, instead pointing "directly" to them. For example, Danny Loss refers to me without providing any further information about me (except, of course, that my name is Danny Loss)¹. The guy who works at the LDC is not a name, since it provides a description of the entity that is me (since it's also not a pronoun, it's a nominal, at least in the parlance of our current project). It follows rather obviously that names are atomic units.
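For the curious, the annotators' joke can be written down as a few lines of Python. This is only a sketch of the capitalization test, not anything we actually use: the function name looks_like_name and the decision to ignore a leading "the" are my own assumptions for illustration.

```python
def looks_like_name(phrase: str) -> bool:
    """Naive test: a phrase is a 'name' if every word is capitalized.

    A leading "the" is ignored (an assumption made here so that
    phrases like "The U.S. Supreme Court" are judged on their content).
    """
    words = phrase.split()
    if words and words[0].lower() == "the":
        words = words[1:]
    # An empty phrase is never a name; otherwise check each first letter.
    return bool(words) and all(w[0].isupper() for w in words)


print(looks_like_name("Danny Loss"))                     # True
print(looks_like_name("The guy who works at the LDC"))   # False
```

The test gets these two right, but it looks only at spelling and knows nothing about whether a phrase points to its referent or describes it, which is exactly why it's hardly fail-safe.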

But this definition is not problem-free. What do we do with The U.S. Supreme Court? No one would deny that that's the name of that body. Yet that string certainly seems to provide information about the court, namely that it serves the United States and that it's the highest court of the land. K&R would, apparently, argue that this information is evident merely by coincidence. But it's clearly more complicated than that.

Another problem comes up with things like the c/Court when talking about the Supreme Court. At first glance, the lowercase version would not appear to be a name, while the uppercase version would be a name, serving as a nickname for the Supreme Court. But that distinction is based entirely on orthography and is not reflected in speech.

So. The conclusion to be reached, I think, is that noun phrases don't fall into the neat categories we'd like to assign them to. It's unfortunate, but when you're trying to impose an artificial structure on something as complex as natural language, it's almost expected. Sadly, for this project, we are forced to make that distinction. At the moment, I don't think there's a good way to systematically make that decision. Welcome to the world of natural language annotation...

In the scope of ACE, the project for which I'm doing this work, this distinction is crucial. Here's an example: say someone's interested in the actions of George W. Bush in a given range of dates. At first glance, a simple text search would suffice. However, given that this technology is (assuming all goes as planned) meant to process a huge amount of text from varied sources, the computational costs of a string search over the entire corpus increase rather quickly. If the person could limit their search to name mentions (George W. Bush is, after all, a name), the volume of text to be searched would be dramatically smaller. So the distinction matters.²
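Here's a rough sketch in Python of what that kind of restricted search might look like. Everything in it is hypothetical: the Mention record, its fields, the sample data, and find_name_mentions are invented for illustration and are not ACE's actual annotation format or tooling.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Mention:
    """One annotated noun phrase (hypothetical record layout)."""
    doc_id: str
    text: str
    kind: str        # "name", "nominal", or "pronoun"
    doc_date: date


def find_name_mentions(mentions, query, start, end):
    """Return name mentions matching `query` dated within [start, end].

    Nominals and pronouns are skipped without comparing their text,
    so only the (much smaller) set of name mentions is searched.
    """
    return [
        m for m in mentions
        if m.kind == "name"
        and m.text == query
        and start <= m.doc_date <= end
    ]


# Made-up sample annotations, purely for demonstration.
sample = [
    Mention("doc1", "George W. Bush", "name", date(2003, 7, 1)),
    Mention("doc1", "the president", "nominal", date(2003, 7, 1)),
    Mention("doc1", "he", "pronoun", date(2003, 7, 1)),
    Mention("doc2", "George W. Bush", "name", date(2002, 1, 15)),
]

hits = find_name_mentions(
    sample, "George W. Bush", date(2003, 1, 1), date(2003, 12, 31)
)
print([m.doc_id for m in hits])  # ['doc1']
```

The sketch only works if someone (or something) has already labeled each mention as a name, nominal, or pronoun, which is the part we annotators are struggling with.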

In the real world, of course, no one gives a damn about names, nominals, and pronouns. Native speakers of English, for example, communicate just fine without knowing whether a given noun phrase they're using is merely pointing to a referent or describing it. And this is why I have such strong reservations about pursuing theoretical linguistics as a career (I've all but decided against it... history is feeling right at the moment); I want to study real people in the real world. As fascinating as language is (and, believe me, it is), studying it in a vacuum is, for me, ultimately unproductive in the Grand Scheme of Things. I really have no clue if academic history is better, but right now it feels as if that'd be more satisfying.

-------------------------------

1) Even this isn't quite accurate. Danny Loss likely indicates that I have some familial relationship with other people whose surname is Loss. But that's splitting some pretty fine hairs.

2) I realize I haven't given a justification for why this project matters. The short version: automatic text processing would be a Good Thing.
