Sunday, September 2, 2012


Book Detail 
Paperback: 310 pages
Publisher: O'Reilly Media (November 1, 2012)
Language: English
ISBN-10: 1449306667
ISBN-13: 978-1449306663
File Size: 1.78 MB | File Format: PDF
Book Description

With this digital Early Release edition of Natural Language Annotation for Machine Learning, you get the entire book bundle in its earliest form – the authors' raw and unedited content – so you can take advantage of this content long before the book's official release. You'll also receive updates when significant changes are made, as well as the final ebook version.

Create your own natural language training corpus for machine learning. This example-driven book walks you through the annotation cycle, from selecting an annotation task and creating the annotation specification to designing the guidelines, creating a "gold standard" corpus, and then beginning the actual data creation with the annotation process.

Systems exist for analyzing existing corpora, but making a new corpus can be extremely complex. To help you build a foundation for your own machine learning goals, this easy-to-use guide includes case studies that demonstrate four different annotation tasks in detail. You’ll also learn how to use a lightweight software package for annotating texts and adjudicating the annotations.

This book is a perfect companion to O'Reilly’s Natural Language Processing with Python, which describes how to use existing corpora with the Natural Language Toolkit.

About The Authors

James Pustejovsky teaches and does research in Artificial Intelligence and Computational Linguistics in the Computer Science Department at Brandeis University. His main areas of interest include lexical meaning, computational semantics, temporal and spatial reasoning, and corpus linguistics. He is active in the development of standards for interoperability between language processing applications, and led the creation of the recently adopted ISO standard for time annotation, ISO-TimeML. He is currently heading the development of a standard for annotating spatial information in language. More information on publications and research activities can be found at his webpage: pusto.com.

Amber Stubbs is a Ph.D. candidate in Computer Science at Brandeis University in the Laboratory for Linguistics and Computation. Her dissertation is focused on creating an annotation methodology to aid in extracting high-level information from natural language files, particularly biomedical texts.

Table of Contents 

1.   The Basics
The Importance of Language Annotation
 The Layers of Linguistic Description
 What is Natural Language Processing?
A Brief History of Corpus Linguistics
 What is a Corpus?
 Early Use of Corpora
 Corpora Today
 Kinds of Annotation
Language Data and Machine Learning
 Classification
 Clustering
 Structured Pattern Induction
The Annotation Development Cycle
 Model the phenomenon
 Annotate with the Specification
 Train and Test the algorithms over the corpus
 Evaluate the results
 Revise the Model and Algorithms
Summary

2.   Defining Your Goal and Dataset
Defining a goal
 The Statement of Purpose
 Refining your Goal: Informativity versus Correctness
Background research
 Language Resources
 Organizations and Conferences
 NLP Challenges
Assembling your dataset
 Collecting data from the Internet
 Eliciting data from people
Preparing your data for annotation
 Metadata
 Pre-processed data
The size of your corpus
 Existing Corpora
 Distributions within corpora
Summary

3.   Building Your Model and Specification
Some Example Models and Specs
 Film genre classification
 Adding Named Entities
 Semantic Roles
Adopting (or not Adopting) Existing Models
 Creating your own Model and Specification: Generality versus Specificity
 Using Existing Models and Specifications
 Using Models without Specifications
Different Kinds of Standards
 ISO Standards
 Community-driven standards
 Other standards affecting annotation
Summary

4.   Applying and Adopting Annotation Standards to your Model
Annotated corpora
 Metadata annotation: Document classification
 Text Extent Annotation: Named Entities
 Linked Extent Annotation: Semantic Roles
 ISO Standards and you
Summary

Appendix: Bibliography

Download Ebook : Natural Language Annotation for Machine Learning

