
Modified PSOLA-Genetic Algorithm Based Approach for Voice Re-Construction


Affiliations
1 Department of Computer Science & Engineering, Jaypee University of Engineering & Technology, Guna, Madhya Pradesh, India
2 Department of Information Technology, Jadavpur University, Kolkata, West Bengal, India
     



Abstract

Synthetic voice generation, also called artificial voice or voice conversion, is the process of reconstructing or regenerating a voice sample from a source sample, or of modifying a source voice into a desired target voice. Conventional approaches to this problem train and apply conversion functions, which generally require a suitable amount of pre-stored training data from both the source and the target speaker. This paper addresses the crucial problem of achieving the required prosody, timbre, and other unique voice templates while considerably reducing the dependence on the voice training dataset. We needed a way to obtain templates of the target ("to be achieved") voice that are nearly the same parametrically. This is achieved by assigning a marker to the target voice sample used for training. A proper estimate of the transformation function is possible only with this data. The process can be carried out with pre-existing methods. In short, we propose a system that can closely approximate the target voice even when training data is scarce. One disadvantage is that higher precision and closer resemblance require a clear understanding of the system of spelling that the language uses.
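For readers unfamiliar with PSOLA (pitch-synchronous overlap-add), the technique named in the title, the following is a minimal sketch of plain time-domain PSOLA pitch scaling. It is not the authors' modified method and it omits the genetic-algorithm component entirely. The function names, the assumption that pitch marks are already known, and the normalisation by the summed window weights are illustrative choices, not details taken from the paper.

```python
import math


def hann(n):
    """Symmetric Hann window of length n (its end points are zero)."""
    if n < 2:
        return [1.0] * n
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def psola_pitch_shift(signal, marks, factor):
    """Scale pitch by `factor` by re-spacing pitch-synchronous grains.

    signal : list of float samples
    marks  : sorted sample indices of pitch marks (assumed to be given)
    factor : values above 1 raise the pitch, values below 1 lower it
    """
    if len(marks) < 3 or factor <= 0:
        return list(signal)

    out = [0.0] * len(signal)
    norm = [0.0] * len(signal)  # summed window weights per output sample

    t = float(marks[1])  # position of the first synthesis mark
    while t < marks[-2]:
        # Nearest interior analysis mark, so both neighbours exist.
        k = min(range(1, len(marks) - 1), key=lambda i: abs(marks[i] - t))
        # Local pitch period estimated from the two neighbouring marks.
        period = max(1, (marks[k + 1] - marks[k - 1]) // 2)
        win = hann(2 * period + 1)  # grain spans two pitch periods
        centre = int(round(t))
        for j in range(-period, period + 1):
            src = marks[k] + j
            dst = centre + j
            if 0 <= src < len(signal) and 0 <= dst < len(out):
                w = win[j + period]
                out[dst] += w * signal[src]
                norm[dst] += w
        # Synthesis marks are spaced one scaled period apart.
        t += period / factor

    return [o / n if n > 1e-9 else 0.0 for o, n in zip(out, norm)]
```

With `factor` equal to 1 and evenly spaced marks, the synthesis marks coincide with the analysis marks and the covered part of the signal is reproduced unchanged. With a larger `factor`, grains are placed closer together, which shortens the perceived pitch period while the overall duration stays the same.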

Keywords

Artificial Voice, Prosody, Timbre, Source Voice, Target Voice, Formant Structure.








Authors

Partha Sarthy Banerjee
Department of Computer Science & Engineering, Jaypee University of Engineering & Technology, Guna, Madhya Pradesh, India
Uttam Kumar Roy
Department of Information Technology, Jadavpur University, Kolkata, West Bengal, India

