A language family is a group of languages that have descended from a common ancestral language. Because they share an ancestor, these languages are expected to be similar in some respects, and this similarity should be measurable experimentally. In language identification, language-specific features are extracted from speech and used to build a model that represents each language. This work extends the language identification framework to capture features common to a language family and to build models that efficiently represent language families. Mel frequency cepstral coefficients (MFCC) and speech signal-based frequency cepstral coefficients (SFCC) are used as the primary feature extraction techniques. Combining each of these with shifted delta coefficients (SDC) yields the final feature set. Gaussian mixture models (GMM) and support vector machines (SVM) are used as modelling tools. Combining these feature extraction and modelling techniques gives four systems: MFCC + SDC + GMM, SFCC + SDC + GMM, MFCC + SDC + SVM and SFCC + SDC + SVM. Experiments with these systems show that language families can be identified with reasonable accuracy. The work also examines the influence of one language family on another and finds that, in most cases, languages spoken in regions on the boundary between two families are more strongly influenced by the neighbouring family. Deviations from this pattern may point to geopolitical isolation between two neighbouring regions, and can therefore offer new insights to historians or corroborate their findings.
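The SDC step stacks delta features computed at several time offsets, so that each frame carries information about a longer stretch of speech. The sketch below assumes the widely used N-d-P-k parameterisation, where for block i = 0, ..., k-1 the delta is Δc(t + iP) = c(t + iP + d) - c(t + iP - d), defaulting to the common d = 1, P = 3, k = 7 setting, with edge frames replicated at utterance boundaries. The function name, defaults and padding choice are illustrative assumptions; this work does not state its SDC parameters here.

```python
import numpy as np


def shifted_delta_coefficients(cepstra, d=1, p=3, k=7):
    """Compute shifted delta coefficients (hypothetical helper, assumed N-d-P-k scheme).

    cepstra: array of shape (T, N), one row of N cepstral coefficients per frame
             (e.g. MFCC or SFCC).
    d:       half-width of the delta window.
    p:       frame shift between consecutive delta blocks.
    k:       number of delta blocks stacked per frame.
    Returns an array of shape (T, N * k).
    """
    cepstra = np.asarray(cepstra, dtype=float)
    T, _ = cepstra.shape

    # Pad so every shifted index stays in range; edge frames are replicated.
    pad_front = d
    pad_back = d + (k - 1) * p
    padded = np.pad(cepstra, ((pad_front, pad_back), (0, 0)), mode="edge")

    blocks = []
    for i in range(k):
        # Original frame t + i*p sits at padded index start + t.
        start = pad_front + i * p
        plus = padded[start + d : start + d + T]    # c(t + i*p + d)
        minus = padded[start - d : start - d + T]   # c(t + i*p - d)
        blocks.append(plus - minus)

    # Concatenate the k delta blocks side by side for each frame.
    return np.hstack(blocks)
```

With N = 7 static coefficients this gives 49-dimensional SDC vectors per frame. One way to read the combination of cepstra "along with" SDC is to append the statics, e.g. `np.hstack([mfcc, shifted_delta_coefficients(mfcc)])`, before training a GMM or SVM per language family; whether the original systems do exactly this is not stated in the abstract.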
Keywords
Feature Extraction, Language Family, Modeling Techniques, Mutual Influence.