From one perspective, the renewal of interest in connectionism and neural modeling was a natural step in the endeavor to elaborate abstract notions of cognitive content and functioning to the point where they can make testable contact with brain theory and neuroscience. But it can also be seen as a paradigm shift, to the extent that the focus on subsymbolic processing began to be linked to a growing skepticism concerning higher-level symbolic processing as models of mind, of the sort associated with earlier semantic network-based and rule-based architectures. For example, Ramsay et al. (1991) argued that the demonstrated capacity of connectionist models to perform cognitively interesting tasks undermined the then-prevailing view of the mind as a physical symbol system. But others have continued to defend the essential role of symbolic processing. For example, Anderson (1983, 1993) contended that while theories of symbolic thought need to be grounded in neurally plausible processing, and while subsymbolic processes are well-suited for exploiting the statistical structure of the environment, nevertheless understanding the interaction of these subsymbolic processes required a theory of representation and behavior at the symbolic level.

Since the 1970s, there has been a gradual trend away from purely procedural approaches to ones aimed at encoding the bulk of linguistic and world knowledge in more understandable, modular, re-usable forms, with firmer theoretical foundations. This trend was enabled by the emergence of comprehensive syntactico-semantic frameworks such as Generalized Phrase Structure Grammar (GPSG), Head-driven Phrase Structure Grammar (HPSG), Lexical-Functional Grammar (LFG), Tree-Adjoining Grammar (TAG), and Combinatory Categorial Grammar (CCG), where in each case close theoretical attention was paid both to the computational tractability of parsing, and the mapping from syntax to semantics. Among the most important developments in the latter area were Richard Montague's profound insights into the logical (especially intensional) semantics of language, and Hans Kamp's and Irene Heim's development of Discourse Representation Theory (DRT), offering a systematic, semantically formal account of anaphora in language.


A major shift in nearly all aspects of natural language processing began in the late 1980s and was virtually complete by the end of 1995: this was the shift to corpus-based, statistical approaches (signalled for instance by the appearance of two special issues on the subject by the quarterly Computational Linguistics in 1993). The new paradigm was enabled by the increasing availability and burgeoning volume of machine-readable text and speech data, and was driven forward by the growing awareness of the importance of the distributional properties of language, the development of powerful new statistically based learning techniques, and the hope that these techniques would overcome the scalability problems that had beset computational linguistics (and more broadly AI) since its beginnings.

The archaeological picture changed dramatically around 40-50,000 years ago with the appearance of behaviorally modern humans. This was an abrupt and dramatic change in subsistence patterns, tools and symbolic expression. The stunning change in cultural adaptation was not merely a quantitative one, but one that represented a significant departure from all earlier human behavior, reflecting a major qualitative transformation. It was literally a “creative explosion” which exhibited the “technological ingenuity, social formations, and ideological complexity of historic hunter-gatherers.”7 This human revolution is precisely what made us who we are today.

The Computational Genomics Research Program of the Center for Genomic Sciences pioneered bioinformatics in Mexico; for years it has compiled, integrated and represented knowledge on genetic regulation. Nowadays, we have taken one of the main objectives of UNAM as our own: the commitment to communicate and spread knowledge.

The corpus-based approach has indeed been quite successful in producing comprehensive, moderately accurate speech recognizers, part-of-speech (POS) taggers, parsers for learned probabilistic phrase-structure grammars, and even MT and text-based QA systems and summarization systems. However, semantic processing has been restricted to rather shallow aspects, such as extraction of specific data concerning specific kinds of events from text (e.g., location, date, perpetrators, victims, etc., of terrorist bombings) or extraction of clusters of argument types, relational tuples, or paraphrase sets from text corpora. Currently, the corpus-based, statistical approaches are still dominant, but there appears to be a growing movement towards integration of formal logical approaches to language with corpus-based statistical approaches in order to achieve deeper understanding and more intelligent behavior in language comprehension and dialogue systems. There are also efforts to combine connectionist and neural-net approaches with symbolic and logical ones. The following sections will elaborate on many of the topics touched on above. General references for computational linguistics are Allen 1995, Jurafsky and Martin 2009, and Clark et al. 2010.

Language is structured at multiple levels, beginning in the case of spoken language with patterns in the acoustic signal that can be mapped to phones (the distinguishable successive sounds of which languages are built up). Groups of phones that are equivalent for a given language (not affecting the words recognized by a hearer, if interchanged) are the phonemes of the language. The phonemes in turn are the constituents of morphemes (minimal meaningful word segments), and these provide the constituents of words. (In written language one speaks instead of characters, graphemes, syllables, and words.) Words are grouped into phrases, such as noun phrases, verb phrases, adjective phrases and prepositional phrases, which are the structural components of sentences, expressing complete thoughts. At still higher levels we have various types of discourse structure, though this is generally looser than lower-level structure.
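The layering described above can be pictured as a nested data structure. The following is only a minimal sketch for written language, covering the morpheme, word, phrase, and sentence levels; the class names (`Word`, `Phrase`, `Sentence`) and the hand-made analysis of the example sentence are illustrative assumptions, not part of any standard toolkit, and phones, phonemes, and discourse structure are left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    # A word is built from one or more morphemes (minimal meaningful segments).
    morphemes: List[str]

    @property
    def text(self) -> str:
        return "".join(self.morphemes)


@dataclass
class Phrase:
    # A phrase groups words under a syntactic category such as "NP" or "VP".
    category: str
    words: List[Word]

    @property
    def text(self) -> str:
        return " ".join(word.text for word in self.words)


@dataclass
class Sentence:
    # A sentence is composed of phrases.
    phrases: List[Phrase]

    @property
    def text(self) -> str:
        return " ".join(phrase.text for phrase in self.phrases)


# Hand-built (assumed) analysis of "the dogs barked".
sentence = Sentence([
    Phrase("NP", [Word(["the"]), Word(["dog", "s"])]),
    Phrase("VP", [Word(["bark", "ed"])]),
])

print(sentence.text)
print([phrase.category for phrase in sentence.phrases])
```

Each level simply contains a list of units from the level below, which mirrors the constituency relation in the text. A realistic representation would need recursion (phrases inside phrases) and a separate treatment of spelling changes at morpheme boundaries.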

Information Extraction. Typically, information extraction (IE) gathers a set of structured data that describe an event from an unstructured data source (documents, videos or images). In the biomedical field IE has been used to extract protein-protein and gene-gene interactions from collections of scientific articles. In the lab, we are particularly interested in using IE to extract regulatory interactions between transcription factors and transcription units, along with the growth condition in which they happen.
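As a rough illustration of the idea, the sketch below pulls (regulator, effect, target, growth condition) tuples out of free text with a single hand-written pattern. Everything here is an assumption made for the example: the `Interaction` record, the `PATTERN` regular expression, the `extract_interactions` function, and the two sample sentences. It is not the method used in the lab, and practical IE systems typically rely on much richer linguistic analysis or learned models.

```python
import re
from typing import List, NamedTuple, Optional


class Interaction(NamedTuple):
    regulator: str            # e.g. a transcription factor name
    effect: str               # "activates" or "represses"
    target: str               # regulated gene or transcription unit
    condition: Optional[str]  # growth condition, if one is stated


# Toy pattern (assumed): "<regulator> activates|represses [the] <target>
# [operon|gene|promoter] [under|in|during <condition>]".
PATTERN = re.compile(
    r"(?P<regulator>\w+) (?P<effect>activates|represses) "
    r"(?:the )?(?P<target>[\w-]+)(?: (?:operon|gene|promoter))?"
    r"(?: (?:under|in|during) (?P<condition>[^.;]+))?"
)


def extract_interactions(text: str) -> List[Interaction]:
    """Return one structured record per pattern match in the text."""
    return [
        Interaction(m["regulator"], m["effect"], m["target"], m["condition"])
        for m in PATTERN.finditer(text)
    ]


# Illustrative input sentences, written for this example.
sample = (
    "ArcA represses sdhCDAB under anaerobic conditions. "
    "CRP activates the lac operon."
)

for interaction in extract_interactions(sample):
    print(interaction)
```

The point of the sketch is the shape of the output: unstructured sentences go in, and uniform records with named fields come out. The second sentence states no growth condition, so its `condition` field is `None`.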

