SAPA SCALE Conference 2012 | 07.09.2012

The principal objective of the conference is to bring together researchers addressing perceptually motivated speech and audio processing tasks with the tools of statistical signal processing and machine learning. The themes of the conference are:

Statistical models for speech and audio processing motivated by human perception
Developing the commonalities between speech recognition and synthesis to provide richer and more sophisticated models for speech
Adaptive learning approaches to speech and audio signal processing and their incorporation into statistical models

Conference Program

20:02

Pitch Estimation Using Mutual Information
Yanbo Xu, University of Maryland College Park
Sept. 7, 2012 · 11:07 a.m.

30:16

Establishing some principles of human speech production through two-dimensional computational models
Mauro Nicolao, University of Sheffield
Sept. 7, 2012 · 11:31 a.m.

49:32

Human sound perception - what can we learn from it when developing audio analysis algorithms?
Tuomas Virtanen, Tampere University of Technology
Sept. 7, 2012 · 11:46 a.m.

20:14

A Spectral Envelope Estimation Method Based on F0-Adaptive Multi-Frame Integration Analysis
Tomoyasu Nakano, AIST
Sept. 7, 2012 · 1:02 p.m.

18:48

Cochlear Implant-like Processing of Speech Signal for Speaker Verification
Cong-Thanh Do, LIMSI-CNRS/Universite Paris-Sud
Sept. 7, 2012 · 1:14 p.m.

18:48

Speech intelligibility enhancement for HMM-based synthetic speech in noise
Cassia Valentini-Botinhao, University of Edinburgh
Sept. 7, 2012 · 1:37 p.m.

26:09

A Generalized Stein's Estimation Approach for Speech Enhancement Based on Perceptual Criteria
Sunder Ram Krishnan, Indian Institute of Science
Sept. 7, 2012 · 2:02 p.m.

13:54

Template-based ASR using Posterior features and Synthetic References: comparing different TTS systems
Serena Soldo, Idiap Research Institute
Sept. 7, 2012 · 2:04 p.m.

15:48

Explicit Duration Modelling in HMM-based Speech Synthesis using a Hybrid Hidden Markov Model-Multilayer Perceptron
Kalu U. Ogbureke, University College Dublin
Sept. 7, 2012 · 2:26 p.m.

22:31

Non-Stationary Signal Processing and its Application in Speech Recognition
Zoltán Tüske, RWTH Aachen University
Sept. 7, 2012 · 2:29 p.m.

15:35

Joint Uncertainty Decoding with Unscented Transform for Noise Robust Subspace Gaussian Mixture Models
Liang Lu, University of Edinburgh
Sept. 7, 2012 · 3:16 p.m.

24:55

Hierarchical Hybrid Language Models for Open Vocabulary Continuous Speech Recognition using WFST
M. Ali Basha Shaik, RWTH Aachen University
Sept. 7, 2012 · 3:36 p.m.

22:18

Language Identification using Spectro-Temporal Patch features
Kamal Sahn, CMU
Sept. 8, 2012 · 11:06 a.m.

19:22

Joint Detection and Localization of Multiple Speakers using a Probabilistic Steered Response Power
Youssef Oualil, Saarland University/Idiap
Sept. 8, 2012 · 11:31 a.m.

02:28

Joint Detection and Localization of Multiple Speakers using a Probabilistic Steered Response Power (part 2)
Youssef Oualil, Saarland University/Idiap
Sept. 8, 2012 · 11:50 a.m.

17:19

Structured Sparse Coding for Microphone Array Location Calibration
Afsaneh Asaei, Idiap/CMU
Sept. 8, 2012 · 11:55 a.m.

22:18

Inharmonic Speech: A Tool for the Study of Speech Perception and Separation
Dan Ellis, NYU/Columbia/Wakayama
Sept. 8, 2012 · 1:19 p.m.

15:50

Multi-Channel Speech Separation with Soft Time-Frequency Masking
Friedrich Faubel, Saarland University
Sept. 8, 2012 · 1:48 p.m.

18:50

Smoothing Speech Trajectories by Regularization
Louis ten Bosch, Radboud University Nijmegen
Sept. 8, 2012 · 2:06 p.m.

20:38

Data-driven Speech Representations for NMF-based Word Learning
Hugo Van hamme, KU Leuven
Sept. 8, 2012 · 2:51 p.m.

24:44

Spectro-Temporal Features with Distribution Equalization
Samuel K. Ngouoko M., Bielefeld University/Honda Research
Sept. 8, 2012 · 3:18 p.m.

17:22

Log-normal matrix factorization with application to speech-music separation
Takuya Yoshioka, Sakaue Daichi, NTT Communication Science Laboratories
Sept. 8, 2012 · 3:46 p.m.

SAPA SCALE Conference 2012
September 2012 · 22 Talks
