Language to Language Translation: A Way to Homogeneous India...

Team effort of: Anasree Chatterjee & Diwa
Mentor: Prof. K.T. Talele

Why the system?
  • What is language?
  • Need for proper communication
  • Hazards of miscommunication
  • Hence the need for our system
  • Key users of our system
  • Our system

Overview

Enjoy the words! Speak in one language and listen in another language in just 3 steps.

Input: speech in English or Hindi.
Output: speech in any of 8 different languages.

  1. Speech to Text: English or Hindi speech to English or Hindi text (e.g. English).
  2. Text to Text: English text to text of the selected output language (e.g. Bengali).
  3. Text to Speech: Bengali text to Bengali speech.

Speech to Text Architecture

Voice Input -> Analog to Digital -> Feature Extraction -> Speech Engine/Decoder -> Store Word in a File

The Speech Engine/Decoder works with the Acoustic Model, the Phonetic Lexicon and the Language Model.

Front-end steps:
  1. Voice input
  2. Analog to digital conversion, with noise filtering
  3. Feature extraction
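As a rough illustration of the feature extraction step, the sketch below cuts digitized samples into overlapping frames and computes one log energy value per frame. It is a toy under stated assumptions: the slides do not say which features are extracted, real front ends such as the one in CMU Sphinx compute richer features (typically MFCCs), and the names `frame_signal` and `log_energy`, the 16 kHz rate, the 25 ms frame and the 10 ms hop are illustrative choices rather than values from the slides.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def log_energy(frame, floor=1e-10):
    """Log of the mean squared amplitude; `floor` avoids log(0) on silent frames."""
    energy = sum(x * x for x in frame) / len(frame)
    return math.log(max(energy, floor))

# Hypothetical input: one second of a 440 Hz tone sampled at 16 kHz.
rate = 16000
signal = [math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]

frames = frame_signal(signal, frame_len=400, hop=160)  # 25 ms frames, 10 ms hop
features = [log_energy(f) for f in frames]
```

Frames with very low energy are natural candidates for silence, which is one simple way a decoder can separate speech from pauses.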

Components of ASR: Acoustic Model

  • Statistical representations of the sounds that make up each word.
  • Uses the Hidden Markov Model (HMM).
  • Inputs: audio (from an audio recording tool) and text (from transcription software).
  • Tool: CMU SphinxTrain.
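To make the HMM idea concrete, here is a minimal sketch of the forward algorithm, which scores how likely an observation sequence is under a given model. Everything in it is an assumption for illustration: the two-state model, the discrete symbols "x" and "y" and all probabilities are invented, whereas a trained acoustic model works on continuous feature vectors with many more states.

```python
def forward_probability(observations, start, trans, emit):
    """Return P(observations | HMM) using the forward algorithm."""
    n_states = len(start)
    # Initialise with the first observation.
    alpha = [start[s] * emit[s][observations[0]] for s in range(n_states)]
    # Fold in the remaining observations one at a time.
    for obs in observations[1:]:
        alpha = [
            sum(alpha[prev] * trans[prev][s] for prev in range(n_states)) * emit[s][obs]
            for s in range(n_states)
        ]
    return sum(alpha)

# Hypothetical two-state, left-to-right model over two discrete symbols.
start = [1.0, 0.0]
trans = [[0.5, 0.5],
         [0.0, 1.0]]
emit = [{"x": 0.9, "y": 0.1},
        {"x": 0.2, "y": 0.8}]

likelihood = forward_probability(["x", "y"], start, trans, emit)
```

A recognizer compares such scores across the models of competing sounds and prefers the one that explains the audio best.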

Components of ASR: Phonetic Lexicon

  • Contains words plus their phonetic representation.
  • Phoneme: the basic unit of the phonetic representation of every word in the vocabulary.
  • Phonetizer: Hindi uses Itrans-3; English uses English phonetics.
  • Gives valid words from the output of the acoustic model.

Hindi: Hindi speech -> sound wave -> phonemes (Itrans-3) -> Hindi.dic -> Hindi word (Itrans-3) -> IT3 to UTF8 -> Hindi script (UTF8)
Example Itrans-3 entries: In:d:iyaa, Paanii

English: English speech -> sound wave -> phonemes (SphinxTrain, PocketSphinx) -> Cmu07.dic -> English word
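A pronunciation dictionary of this kind is usually a plain text file with one word per line followed by its phonemes. The sketch below parses that layout into a Python mapping. It assumes the whitespace-separated "word phoneme phoneme ..." form with alternate pronunciations marked as `word(2)`; the sample entries and the function name `parse_lexicon` are illustrative, not taken from Hindi.dic or Cmu07.dic.

```python
import re
from collections import defaultdict

def parse_lexicon(text):
    """Map each word to a list of pronunciations (each a list of phonemes)."""
    lexicon = defaultdict(list)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        # Alternate pronunciations are written word(2), word(3), ...
        word = re.sub(r"\(\d+\)$", "", parts[0]).lower()
        lexicon[word].append(parts[1:])
    return dict(lexicon)

# Hypothetical entries in the "word phoneme phoneme ..." layout.
sample = """\
india IH N D IY AH
water W AO T ER
water(2) W AA T ER
"""

lexicon = parse_lexicon(sample)
```

Looking up `lexicon["water"]` then returns both pronunciations, so the decoder can accept either one.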

Components of ASR: Language Model

  • Captures the underlying grammatical structure of the language.
  • A statistical language model assigns a probability to a sequence of m words by means of a probability distribution.
  • Most common language model: n-gram LM.
  • Use: restricts the word search.
  • Tool: CMUCLMTK (CMU-Cambridge Statistical Language Modeling Toolkit).

Steps to create the language model:
Corpus (CORPUS.TXT) -> word frequencies -> vocabulary file -> N-gram file -> language model in .ARPA format (CORPUS.ARPA)
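The idea of assigning a probability to a word sequence can be sketched with a tiny bigram model. This is an unsmoothed maximum-likelihood toy, not what CMUCLMTK produces (the toolkit adds discounting and back-off before writing the .ARPA file); the two-sentence corpus and the function names are assumptions made for illustration.

```python
from collections import Counter

def train_bigram(sentences):
    """Count context words and bigrams, with <s> and </s> as sentence markers."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        contexts.update(words[:-1])            # every word that has a successor
        bigrams.update(zip(words, words[1:]))
    return contexts, bigrams

def sequence_probability(sentence, contexts, bigrams):
    """Multiply P(word | previous word) over the sentence (no smoothing)."""
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    probability = 1.0
    for prev, cur in zip(words, words[1:]):
        if contexts[prev] == 0:
            return 0.0                         # unseen context
        probability *= bigrams[(prev, cur)] / contexts[prev]
    return probability

# Hypothetical training corpus.
corpus = ["india is great", "india is big"]
contexts, bigrams = train_bigram(corpus)
p_seen = sequence_probability("india is great", contexts, bigrams)
p_unseen = sequence_probability("great india", contexts, bigrams)
```

Word orders never seen in the corpus get probability zero here, which is how a language model restricts the decoder's word search; smoothing softens that hard cut-off in practice.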

Components of ASR: Speech Engine / Decoder

Aspects of speech decoding:
  • Compares input speech data with the acoustic models.
  • Determines which part of the signal is speech and filters out silence durations.

Uses: a modified version of the DTW algorithm.
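For readers unfamiliar with DTW (dynamic time warping), the sketch below shows the textbook form of the algorithm on one-dimensional sequences. It is not the modified version the slides mention, whose details are not given; the sequences and the name `dtw_distance` are illustrative assumptions, and a real system would compare frames of feature vectors rather than single numbers.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # step forward in a only
                                 cost[i][j - 1],      # step forward in b only
                                 cost[i - 1][j - 1])  # step forward in both
    return cost[n][m]

# Hypothetical feature tracks.
template = [1.0, 2.0, 3.0, 2.0, 1.0]
slow = [1.0, 1.0, 2.0, 3.0, 3.0, 2.0, 1.0]   # same shape, spoken more slowly
other = [3.0, 1.0, 3.0, 1.0, 3.0]

same_word = dtw_distance(template, slow)
different_word = dtw_distance(template, other)
```

Because the alignment may stretch either sequence, the slower utterance matches the template at zero cost, while the differently shaped one does not.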

Tool: CMU Sphinx PocketSphinx. Samples of PocketSphinx acting as a decoder....

Text to Text Architecture

Retrieve stored word from file (e.g. India) -> FIND in database -> RETRIEVE script of the word in the selected language
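The find-and-retrieve step amounts to a keyed lookup. The sketch below uses an in-memory dictionary as a stand-in for the database; the slides do not describe the real schema or storage engine, so `WORD_DB`, its entries and `translate_word` are all hypothetical.

```python
# Hypothetical stand-in for the word database: recognized word -> script per language.
WORD_DB = {
    "india": {"hindi": "भारत", "bengali": "ভারত"},
    "water": {"hindi": "पानी", "bengali": "জল"},
}

def translate_word(word, target_language, db=WORD_DB):
    """Return the word's script in the target language, or None if it is missing."""
    entry = db.get(word.strip().lower())
    if entry is None:
        return None
    return entry.get(target_language.lower())

bengali_india = translate_word("India", "Bengali")
```

Returning `None` for unknown words or languages leaves it to the caller to decide how to report a missing entry.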

Use & Creation of Database!

Text to Speech Architecture

Input text in UTF8 encoding -> Text parser -> Text to phonetic script conversion (Phonetic Synthesizer, using Grapheme to Phoneme rules) -> Speech Synthesizer (CV pair algorithm and sound concatenation, using the Sound Database) -> Speech

Grapheme to Phoneme Conversion

The phonetic description is syllable based. 8 kinds of sounds are allowed (C: consonant, V: vowel, H: half sound):
  • V: a plain vowel
  • CV: a consonant followed by a vowel
  • VC: a vowel followed by a consonant
  • CVC: a consonant followed by a vowel followed by a consonant
  • HCV: a half consonant, followed by a CV
  • HCVC: a half consonant, followed by a CVC
  • 0C: a consonant alone
  • G[0-9]*: a silence gap of the specified length (typical gaps ...

Consonants & Vowels
Vowels:
Consonants:

Text to Phonetic Script

Unicode text -> common script -> Speech Synthesizer

Examples of words in Hindi:

Word         Specific phoneme (G2P)   CV pair (pronunciation sound)
khana        kh2 n2                   CV CV
maun         m13n                     CVC
kahaan       k1 h2an                  CV CVC
pratibha     pHr1 t3 bh2              HCV CV CV
sankalp      s1n k1l 0p               CVC CVC 0C
chandramaa   ch1n dHr1 m2             CVC HCV CV
praan        pHr2n                    HCVC
aadesh       2 d8sh                   V CVC
andaaz       1n d2z                   VC CVC
ahimsa       1 h3n s2                 V CVC CV
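The mapping from a phonetic unit such as pHr1 to its sound kind (HCV) can be sketched with a small tokenizer. The notation rules used here are inferred from the examples, not stated in the slides: a run of lowercase letters is taken as a consonant, a run of digits as a vowel number, a consonant followed by an uppercase H as a half consonant, and a leading 0 as the marker of a lone consonant. The names `unit_pattern` and `word_pattern` are illustrative.

```python
import re

# Sound kinds that are built from H, C and V parts.
ALLOWED = {"V", "CV", "VC", "CVC", "HCV", "HCVC"}
TOKEN = re.compile(r"(?P<H>[a-z]+H)|(?P<C>[a-z]+)|(?P<V>[0-9]+)")

def unit_pattern(unit):
    """Classify one phonetic unit, e.g. 'pHr1' -> 'HCV'."""
    if re.fullmatch(r"G[0-9]*", unit):
        return "G"                      # silence gap
    if re.fullmatch(r"0[a-z]+", unit):
        return "0C"                     # consonant alone
    labels, pos = [], 0
    while pos < len(unit):
        match = TOKEN.match(unit, pos)
        if match is None:
            raise ValueError(f"unexpected character in unit {unit!r}")
        labels.append(match.lastgroup)  # 'H', 'C' or 'V'
        pos = match.end()
    pattern = "".join(labels)
    if pattern not in ALLOWED:
        raise ValueError(f"{unit!r} gives unsupported pattern {pattern!r}")
    return pattern

def word_pattern(phonetic):
    """Classify every space-separated unit of a word."""
    return " ".join(unit_pattern(unit) for unit in phonetic.split())

pratibha = word_pattern("pHr1 t3 bh2")
sankalp = word_pattern("s1n k1l 0p")
```

Rejecting units outside the allowed kinds catches transcription mistakes before the synthesizer looks for a sound file that does not exist.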

Sound Database

Sound files are GSM compressed, i.e. in .gsm format. Sound units stored in the database are:

Unit       Entries
CV pairs   1..33 * 2 4 6 8 9 10 12 13 14 15
VC pairs   2 4 6 8 9 10 12 13 14 15 * 1..34
V          1..14
C          1..34
Halfs      ky kr kl kll kv ksh khy khr khl khv gy gr gl gv gn ghy ghr ghv ghn chy chr chv jy jv ty tr tv thy thr dy dr dv dhy dhr dhv ny nr nv tty ttr ttv ddy ddr ddv py pr pl pll fr fl by br bl bhy bhr bhl my mr vy vr vl

Total size of db: 1 MB

Sound Concatenation

File naming:
  • CV files: x.y.gsm, named by consonant number and vowel number
  • VC files: x.y.gsm, named by consonant number and vowel number
  • V files: x.gsm, named by vowel number
  • Halfs files: x.y.gsm, named by the 2 consonants
  • 0C files: x.gsm, named by consonant number

4 more files: cvoffsets, vcoffsets, voffsets, hoffsets
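The naming scheme can be sketched as a function that turns a sound unit into the file name to fetch before concatenation. The unit numbers in the example are hypothetical because the slides do not give the numbering tables, the order of the two numbers inside x.y.gsm is assumed to follow the order in which they are passed, and the sketch builds names only: it does not say where the files live or how the four offsets files are used.

```python
# Number of name parts each kind of sound unit needs.
PARTS = {"CV": 2, "VC": 2, "H": 2, "V": 1, "0C": 1}

def sound_file(kind, *numbers):
    """Build the .gsm file name for one sound unit, e.g. ('CV', 2, 2) -> '2.2.gsm'."""
    if kind not in PARTS:
        raise ValueError(f"unknown sound kind {kind!r}")
    if len(numbers) != PARTS[kind]:
        raise ValueError(f"{kind} needs {PARTS[kind]} number(s), got {len(numbers)}")
    return ".".join(str(n) for n in numbers) + ".gsm"

def files_for(units):
    """Return the file names to concatenate, in playback order."""
    return [sound_file(kind, *numbers) for kind, *numbers in units]

# Hypothetical numbering for a two-syllable word made of two CV units.
units = [("CV", 2, 2), ("CV", 20, 2)]
playlist = files_for(units)
```

Keeping the name construction in one place makes it easy to change if the real numbering or separator turns out to differ.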

Extended modules:
  • Core modules: S2T, T2T, T2S
  • File Reader
  • S2T Reporter

Constraints:
  • Training is tedious: 2 input languages.
  • Phone generation for all Indian languages is difficult.

Future scope:
  • Can be trained for all Indian languages.
  • Increase accuracy.
  • Better quality of the text to speech synthesizer modules.
  • A larger dictionary, approx. 2000-3000 words.

BOL INDIA BOL PRIVATE LIMITED
Masters of Computer Application, Sardar Patel Institute of Technology, Andheri (West), Mumbai-58.
  • Anasree Chatterjee (Director)
  • Diwa Arunashree (Director)
  • Prof. K.T. Talele (Joint Director)
  • Shivani Nadkarni (Joint Director)
  • Aditya Naravane (Joint Director)

Language to Language Translator: A Way to Homogeneous India

Languator: especially designed for the 3Ts, that is, Travelers, Tourists and, on a par with them, the people who are victims of Transferable jobs. It will also serve, to a certain extent, the needs of S2T Reporters.
