Book Detail
Paperback: 310 pages
Publisher: O'Reilly Media (November 1, 2012)
Language: English
ISBN-10: 1449306667
ISBN-13: 978-1449306663
File Size: 1.78 MB | File Format: PDF
Book Description
With this digital Early Release edition of Natural Language Annotation for Machine Learning, you get the entire book bundle in its earliest form – the author's raw and unedited content – so you can take advantage of this content long before the book's official release. You'll also receive updates when significant changes are made, as well as the final ebook version.
Create your own natural language training corpus for machine learning. This example-driven book walks you through the annotation cycle, from selecting an annotation task and creating the annotation specification to designing the guidelines, creating a "gold standard" corpus, and then beginning the actual data creation with the annotation process.
Systems exist for analyzing existing corpora, but making a new corpus can be extremely complex. To help you build a foundation for your own machine learning goals, this easy-to-use guide includes case studies that demonstrate four different annotation tasks in detail. You’ll also learn how to use a lightweight software package for annotating texts and adjudicating the annotations.
This book is a perfect companion to O'Reilly’s Natural Language Processing with Python, which describes how to use existing corpora with the Natural Language Toolkit.
About The Authors
James Pustejovsky teaches and does research in Artificial Intelligence and Computational Linguistics in the Computer Science Department at Brandeis University. His main areas of interest include lexical meaning, computational semantics, temporal and spatial reasoning, and corpus linguistics. He is active in the development of standards for interoperability between language processing applications, and led the creation of the recently adopted ISO standard for time annotation, ISO-TimeML. He is currently heading the development of a standard for annotating spatial information in language. More information on publications and research activities can be found at his webpage: pusto.com.
Amber Stubbs is a Ph.D. candidate in Computer Science at Brandeis University in the Laboratory for Linguistics and Computation. Her dissertation is focused on creating an annotation methodology to aid in extracting high-level information from natural language files, particularly biomedical texts.
Table of Contents
1. The Basics
The Importance of Language Annotation
The Layers of Linguistic Description
What is Natural Language Processing?
A Brief History of Corpus Linguistics
What is a Corpus?
Early Use of Corpora
Corpora Today
Kinds of Annotation
Language Data and Machine Learning
Classification
Clustering
Structured Pattern Induction
The Annotation Development Cycle
Model the phenomenon
Annotate with the Specification
Train and Test the algorithms over the corpus
Evaluate the results
Revise the Model and Algorithms
Summary
2. Defining Your Goal and Dataset
Defining a goal
The Statement of Purpose
Refining your Goal: Informativity versus Correctness
Background research
Language Resources
Organizations and Conferences
NLP Challenges
Assembling your dataset
Collecting data from the Internet
Eliciting data from people
Preparing your data for annotation
Metadata
Pre-processed data
The size of your corpus
Existing Corpora
Distributions within corpora
Summary
3. Building Your Model and Specification
Some Example Models and Specs
Film genre classification
Adding Named Entities
Semantic Roles
Adopting (or not Adopting) Existing Models
Creating your own Model and Specification: Generality versus Specificity
Using Existing Models and Specifications
Using Models without Specifications
Different Kinds of Standards
ISO Standards
Community-driven standards
Other standards affecting annotation
Summary
4. Applying and Adopting Annotation Standards to your Model
Annotated corpora
Metadata annotation: Document classification
Text Extent Annotation: Named Entities
Linked Extent Annotation: Semantic Roles
ISO Standards and you
Summary
Appendix: Bibliography