Better Word Representations with Recursive Neural Networks for Morphology

Thang Luong
Joint work with Richard Socher and Christopher D. Manning

Word frequencies in Wikipedia documents (986 million tokens)

[Figure: bar chart of corpus counts for words built on "distinct", including distinct, distinctive, distinction, distinctly, distinctively, distinctiveness, indistinct and contradistinction. Counts shown: 35323, 23828, 22687, 3526, 3479, 1285, 569, 406, 175, 141, 108.]

And more: indistinctly, non-distinct, indistinctive, non-distinctive, indistinctness, distincted, semi-distinct, contra-distinction, indistinction, contradistinctive

Problem: these words are independent entities!

Vector-space word representations

Nearest neighbors of each word:

Word          (Collobert & Weston, 2010)                (Huang et al., 2012)                    This work
distinct      different distinctive broader narrower    unique broad distinctive separate       divergent diverse distinctive homogeneous
distinctness  morphologies pesawat clefts pathologies   companion roskam hitoshi enjoyed        distinctiveness smallness largeness exactness
affect        exacerbate impacts characterize outweigh  allow prevent involve enable            decrease arise complicate extend
unaffected    unnoticed dwarfed mitigated overwhelmed   monti sheaths krystal south-southeast   disaffected undisputed unopposed unrestricted

- Very successful in recent years for NLP tasks.
- Problem: rare and complex words are poorly estimated.
- Goal: capture both syntactic information (word structure) and semantics.

Our approach

[Diagram: a Neural Language Model (semantics) runs over the context "unfortunately the bank was closed"; a Morphology Model (syntactics) builds the vectors of complex words from their morphemes: unfortunately = (un + fortunate) + ly, closed = close + d.]

Compute vector representations for complex words on the fly.

Network structure:
- Neural Language Model: simple feed-forward network (Huang et al., 2012) with ranking-type cost (Collobert et al., 2011).
- Morphology Model: recursive neural network (Socher et al., 2011).

Unsupervised Morphological Structures

- Our morphoRNN assumes words of the form pre* stm suf*.
- Morphological segmentations of the form (pre* stm suf*)+ are provided by Morfessor (Creutz & Lagus, 07).
- Post-process words, including hyphenated ones.
- Learn meanings of unconventional morphemes:
  - al_pre in Arabic names: al-jazeera, al-salem
  - related_suf in compound adjectives: health-related, government-related

Experiments: Word Similarity Task

Word similarity dataset, for example:
  king, queen: 8.58
  king, cabbage: 0.23

Datasets:
- WS-353 (Finkelstein et al., 02)
- MC (Miller & Charles, 91)
- RG (Rubenstein & Goodenough, 65)
- SCWS* (Huang et al., 12)
- A new rare word (RW) dataset:

  word1           word2
  untracked       inaccessible
  unflagging      constant
  unprecedented   new
  apocalyptical   prophetic
  organismal      system
  diagonal        line
  obtainment      acquiring
  discernment     knowing
  confinement     restraint

Metric: correlation between similarity scores given by our models and those assigned by human raters.

Results

Start training from either:
- HSMN embedding (Huang et al., 2012).
- C&W embedding (Collobert et al., 2011).

[Figure: bar chart, y-axis 0 to 80, over WS-353, MC, RG, SCWS* and RW. Legend: HSMN, Column1, C&W, C&W+m.]
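The evaluation metric above can be sketched as follows. This is a minimal illustration under stated assumptions: the slides only say "correlation", so Spearman rank correlation is assumed here, and cosine similarity is assumed as the model's similarity score. The 2-d vectors are made up for the example; the two human ratings are the ones quoted on the word similarity slide. The helper names (`cosine`, `ranks`, `pearson`, `spearman`) are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Made-up 2-d word vectors, for illustration only.
vectors = {
    "king": [0.9, 0.1],
    "queen": [0.8, 0.3],
    "cabbage": [-0.2, 0.9],
}
pairs = [("king", "queen"), ("king", "cabbage")]
human_ratings = [8.58, 0.23]  # ratings quoted on the slide

model_scores = [cosine(vectors[a], vectors[b]) for a, b in pairs]
rho = spearman(model_scores, human_ratings)
```

With only two pairs the correlation can only be +1 or -1, so a real evaluation would run over every pair in a dataset.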

Conclusions

Nearest neighbors of each word:

Word            (Collobert et al., 2011)             This work
commenting      insisting insisted focusing hinted   commented interviewing comments
unaffected      unnoticed dwarfed mitigated          disaffected undisputed unopposed
heartlessness   -                                    corruptive inhumanity ineffectual overawed
saudi-owned     avatar mohajir kripalani             saudi-based syrian-controlled syrian-backed

- Learned syntactic-semantic word vectors by combining:
  - RNN: models morphological structures of words.
  - NLM: learns semantics from the surrounding contexts.
- Introduced a new dataset of rare words.
- Future: apply to morphologically complex languages or in other English domains such as bio-NLP.

Outline

- More details (context-sensitive model)
- Context-insensitive model (first thing we tried)
- Rare word dataset
- More results & Analysis

Context-sensitive model

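[Diagram: the context "unfortunately the bank was closed", with unfortunately = (un + fortunate) + ly and closed = close + d built from their morphemes.]
- Feed-forward network: applied over the words of the context.
- Recursive neural network (RNN): builds complex words from their morphemes, with parameter sharing.
- Objective: ranking-type cost, e.g. s(cat chills on a mat) > s(cat chills on a Sofia).
- Learning: back-propagation.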
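A minimal sketch of the two ingredients named above, with several assumptions the slides do not state: a tanh nonlinearity, the [stem; affix] ordering of the children, a margin of 1 in the ranking cost, a 4-dimensional embedding and randomly initialized parameters. The names `compose`, `word_vector`, `ranking_cost`, `W_m` and `b_m` are hypothetical. Every composition step reuses the same `W_m` and `b_m`, which is what parameter sharing means here.

```python
import math
import random

DIM = 4  # hypothetical embedding size for this sketch
rng = random.Random(0)  # fixed seed so the toy run is repeatable

def rand_vec(n):
    return [rng.uniform(-0.1, 0.1) for _ in range(n)]

# One matrix and one bias are reused at every composition step.
W_m = [rand_vec(2 * DIM) for _ in range(DIM)]
b_m = rand_vec(DIM)

# Hypothetical morpheme vectors; in a trained model these would be learned.
morphemes = {m: rand_vec(DIM) for m in ["un", "fortunate", "ly", "close", "d"]}

def compose(stem_vec, affix_vec):
    """Parent vector tanh(W_m [stem; affix] + b_m); tanh is an assumption."""
    children = stem_vec + affix_vec  # list concatenation: [stem; affix]
    return [math.tanh(sum(w * x for w, x in zip(row, children)) + b)
            for row, b in zip(W_m, b_m)]

def word_vector(stem, affixes):
    """Fold affixes onto the stem one at a time, innermost affix first."""
    vec = morphemes[stem]
    for affix in affixes:
        vec = compose(vec, morphemes[affix])
    return vec

def ranking_cost(score_true, score_corrupt, margin=1.0):
    """Hinge cost: zero once the true n-gram beats the corrupted one by the margin."""
    return max(0.0, margin - score_true + score_corrupt)

# unfortunately = (un + fortunate) + ly, closed = close + d
unfortunately = word_vector("fortunate", ["un", "ly"])
closed = word_vector("close", ["d"])

# Hypothetical scores: s(cat chills on a mat) and s(cat chills on a Sofia).
satisfied = ranking_cost(2.0, 0.5)  # true n-gram wins by more than the margin
violated = ranking_cost(0.2, 0.5)   # corrupted n-gram scores higher
```

The scoring function itself is left out: in the talk it is the feed-forward language model, and here the scores are simply passed in as numbers.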

Context-insensitive model

[Diagram: unfortunately = (un + fortunate) + ly, with the newly constructed vector compared against the reference vector for "unfortunately".]
- Objective: squared Euclidean distance between newly constructed and reference representations.
- Problem: strongly biased towards syntactic agreement.
- Similar to compositional distributional semantic models (Lazaridou et al., ACL13), which have no recursive composition: only an affix and a stem.
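The context-insensitive objective (squared Euclidean distance between a newly constructed vector and its reference vector) can be written out as a small sketch. The function names and the 3-d vectors are hypothetical, chosen only to make the arithmetic easy to follow; the gradient shows the error signal that back-propagation would start from.

```python
def squared_distance(constructed, reference):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((c - r) ** 2 for c, r in zip(constructed, reference))

def context_insensitive_cost(pairs):
    """Sum of squared distances over (constructed, reference) pairs."""
    return sum(squared_distance(c, r) for c, r in pairs)

def gradient_wrt_constructed(constructed, reference):
    """Derivative of the squared distance with respect to the constructed vector."""
    return [2.0 * (c - r) for c, r in zip(constructed, reference)]

# Hypothetical vectors for one word, e.g. "unfortunately".
constructed = [0.5, -0.25, 1.0]  # built from morphemes
reference = [0.0, 0.25, 1.0]     # existing embedding of the whole word

cost = context_insensitive_cost([(constructed, reference)])
grad = gradient_wrt_constructed(constructed, reference)
```

Because the cost only looks at the word and its parts, never at the surrounding sentence, it has no way to pull in contextual meaning, which fits the bias towards syntactic agreement noted above.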

Rare Word Datasets

Datasets:
- WS-353 (Finkelstein et al., 02)
- MC (Miller & Charles, 91)
- RG (Rubenstein & Goodenough, 65)
- SCWS* (Huang et al., 12)
- A new rare word (RW) dataset

Number of words per frequency bin in each dataset:

Dataset    Unknown  [1, 100]  [101, 1000]  [1001, 10000]  [10001, ∞)
WS-353           0         0            9             87         341
MC               0         0            1             17          21
RG               0         0            4             22          22
SCWS*           26         2          140            472        1063
RW (new)       801        41          676            719         714

Rare Word Dataset Construction

Select rare words, grouped by affixes and frequencies. Each word has a non-zero number of synsets in WordNet.

Affix   [6, 10]                                [11, 100]                              [101, 1000]
un-     untracked unrolls undissolved          unrehearsed unflagging unfavourable    unprecedented unmarried uncomfortable
-al     apocalyptical traversals bestowals     acoustical extensional organismal      directional diagonal spherical
-ment   obtainment acquirement retrenchments   discernment revetment rearrangements   confinement establishment management

Form pairs: for each rare word, first select a synset, then select hypernyms, hyponyms, meronyms, and attributes.

  word1           word2
  untracked       inaccessible
  unflagging      constant
  unprecedented   new
  apocalyptical   prophetic
  organismal      system
  diagonal        line
  obtainment      acquiring
  discernment     knowing
  confinement     restraint

Collect human judgments: Amazon Mechanical Turk. 3145 pairs rated by 10 people (US native speakers), 2034 pairs accepted.
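The word selection step above (group by affix and frequency bin, keep only words with at least one WordNet synset) might look like the sketch below. The bin boundaries come from the slide; the corpus counts and synset counts in the example are invented, and `frequency_bin` and `select_rare_words` are hypothetical names. Affixes are passed in explicitly rather than detected from the spelling.

```python
from collections import defaultdict

# Frequency ranges shown on the slide.
BINS = [(6, 10), (11, 100), (101, 1000)]

def frequency_bin(count):
    """Return the (low, high) bin containing count, or None if it falls outside all bins."""
    for low, high in BINS:
        if low <= count <= high:
            return (low, high)
    return None

def select_rare_words(candidates):
    """Group words by (affix, frequency bin).

    candidates: iterable of (word, affix, corpus_count, synset_count).
    Words outside every bin or without any synset are dropped.
    """
    groups = defaultdict(list)
    for word, affix, count, synsets in candidates:
        bin_range = frequency_bin(count)
        if bin_range is None or synsets == 0:
            continue
        groups[(affix, bin_range)].append(word)
    return dict(groups)

# Words from the slide; the counts and synset numbers are made up.
candidates = [
    ("untracked", "un-", 8, 1),
    ("unflagging", "un-", 57, 2),
    ("diagonal", "-al", 640, 3),
    ("obtainment", "-ment", 7, 1),
    ("management", "-ment", 250000, 5),  # too frequent: outside every bin
    ("unrolls", "un-", 9, 0),            # no synset in this toy setup: dropped
]

selected = select_rare_words(candidates)
```

Pair formation and the Mechanical Turk filtering are separate steps and are not covered by this sketch.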

Results

[Figure: two bar charts, y-axis 0 to 80, over WS-353, MC, RG, SCWS and RW. First chart legend: HSMN, stem, context-insensitive, context-sensitive. Second chart legend: C&W, stem, context-insensitive, context-sensitive.]

Analysis

Nearest neighbors of each word:

Word            C&W                                  C&W + context-insensitive                    C&W + context-sensitive
commenting      insisting insisted focusing hinted   republishing accounting expounding           commented interviewing comments
affected        caused plagued impacted damaged      disaffected unaffected mitigated disturbed   affecting extended extending constrained
unaffected      unnoticed dwarfed mitigated          disaffected unconstrained uninhibited        disaffected undisputed unopposed
heartlessness   -                                    fearlessness vindictiveness restlessness     corruptive inhumanity ineffectual overawed
saudi-owned     avatar mohajir kripalani             saudi-based somaliland al-jaber              saudi-based syrian-controlled syrian-backed

- Context-insensitive model: enforces structural agreement well, but ignores semantics.
- Context-sensitive model: blends syntactic (word structure) and semantic information well.

Possible extensions

Bio-NLP domain:
complicated but logical taxonomy, e.g. alpha-adrenergic, alpha-mpt, alpha-mpt-treated, dihydroxyphenylaline, dihydroxyphenylserine

Extend the model from pre* stm suf* to (pre* stm suf*)+.

Jointly learn morpheme vectors and segmentations:
- Bad segmentations: disc+over, under+stand
- Perhaps a fast version of the split-merge heuristic in grammatical induction?

Thank you!

Bilingual Word Representations

English: unfortunately the bank was closed
French:  malheureusement la banque a été fermée

Objective: for each alignment, sum
- LM_English: score(unfortunately the bank was closed)
- LM_French: score(malheureusement la banque a été fermée)
- Alignment: weigh alignment constraints by alignment probabilities. Assume alignments are perfect for now?

Related Work

Factored NLM (Alexandrescu & Kirchhoff, HLT06):
- Each word is a vector of features (factors) such as stems, morphological tags, and cases.
- Concatenate vectors of factors, no composition.

Compositional distributional semantic models (Lazaridou et al., ACL13):
- Similar to our context-insensitive model.
- No recursive composition: only an affix and a stem.

Unsupervised Morphological Structures

Utilize a morphological segmentation toolkit, Morfessor (Creutz & Lagus, 2007):
- Produces segmentations of the form (pre* stm suf*)+.
- Recursively splits words with an MDL-inspired objective.

Want input of the form pre* stm suf*:
(1) Restrict segmentations to pre* stm{1,2} suf*.
(2) Split hyphenated words A-B as A_stm B_stm.
(3) Decide the main stem in pre* A_stm B_stm suf*.
    Discover more interesting morphemes:
    - al_pre in Arabic names: al-jazeera, al-salem
    - related_suf in compound adjectives: health-related, government-related
(4) Reject a segmentation if it contains an affix or an unknown stem (not a word by itself) whose type count is below a predefined threshold.
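Rules (1), (2) and (4) above can be sketched in a few lines. This is an illustration under assumptions, not the authors' code: a segmentation is represented as a list of (morpheme, tag) pairs with tags "pre", "stm" and "suf", the type counts and vocabulary in the example are invented, and all function names are hypothetical. Rule (3), choosing the main stem, is not covered because the slides do not say how it is decided.

```python
import re

# One letter per tag so the shape pre* stm{1,2} suf* becomes a regular expression.
TAG_CODE = {"pre": "p", "stm": "s", "suf": "x"}
ALLOWED_SHAPE = re.compile(r"p*s{1,2}x*")

def has_allowed_shape(segmentation):
    """Rule (1): any number of prefixes, one or two stems, any number of suffixes."""
    codes = "".join(TAG_CODE[tag] for _, tag in segmentation)
    return ALLOWED_SHAPE.fullmatch(codes) is not None

def split_hyphenated(word):
    """Rule (2): treat a hyphenated word A-B as two stems."""
    return [(part, "stm") for part in word.split("-") if part]

def should_reject(segmentation, type_counts, vocabulary, threshold):
    """Rule (4): reject if an affix, or a stem that is not a word by itself,
    has a type count below the threshold."""
    for morpheme, tag in segmentation:
        is_affix = tag in ("pre", "suf")
        is_unknown_stem = tag == "stm" and morpheme not in vocabulary
        if (is_affix or is_unknown_stem) and type_counts.get(morpheme, 0) < threshold:
            return True
    return False

# Invented statistics, for illustration only.
type_counts = {"un": 500, "ly": 900, "over": 3}
vocabulary = {"fortunate", "disc", "health", "related"}

good = [("un", "pre"), ("fortunate", "stm"), ("ly", "suf")]
bad = [("disc", "stm"), ("over", "suf")]

keep_good = has_allowed_shape(good) and not should_reject(good, type_counts, vocabulary, 10)
keep_bad = has_allowed_shape(bad) and not should_reject(bad, type_counts, vocabulary, 10)
compound = split_hyphenated("health-related")
```

In this toy setup the segmentation disc+over is dropped only because "over" was given a low count as a suffix; with real counts the outcome depends on the threshold chosen.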
