
The paper considers methods of countering speech synthesis attacks on voice biometric systems in banking. Voice biometric security is a large-scale problem that has become significantly more prominent over the past few years. Automatic speaker verification (ASV) systems are vulnerable to various types of spoofing attacks: impersonation, replay attacks, voice conversion, and speech synthesis attacks. Speech synthesis attacks are the most dangerous, as speech synthesis technologies are developing rapidly (GANs, unit selection, RNNs, etc.). Anti-spoofing approaches can be based on searching for phase and tone frequency anomalies that appear during speech synthesis and on prior knowledge of the acoustic differences of specific speech synthesizers. ASV security remains an unsolved problem, because there is no universal solution that does not depend on the speech synthesis methods used by the attacker.

A review of recent literature suggested that cepstral- and spectral-based acoustic measures showed good potential as objective measures of dysphonia for clinical application. However, the small numbers of normal subjects and the wide age ranges in previous research prevent a good estimation of the performance of normal speakers of various ages on these measures. Therefore, the purpose of this study was to provide normative data for long-term average spectral- and cepstral-based measures for both men and women in two different age groups, to aid clinicians in assessing and treating voice disorders. Sixty participants contributed speech samples for analysis: fifteen males and fifteen females aged 20-30 years, and fifteen males and fifteen females aged 40-50 years. Speakers were asked to sustain the vowels /a/ and /i/ and to read aloud four CAPE-V stimulus sentences and the second and third sentences of the Rainbow Passage. Dependent variables were Cepstral Peak Prominence (CPP), Low-to-High Spectral Ratio (L/H spectral ratio), and Cepstral Peak Prominence Fundamental Frequency (CPP F0) for both vowels and connected speech. Male voice quality (CPP and L/H spectral ratio) was better for the vowels /a/ and /i/, but female voice quality (CPP values) was better for connected speech. Age did not affect voice quality for the vowels /a/ and /i/; however, it did affect voice quality for connected speech. Younger speakers had better voice quality (CPP) than older speakers. In general, for both vowels and connected speech, younger women had markedly higher CPP F0 values than older women, while older men had slightly higher CPP F0 values than younger men. It was concluded that separate normative data should be applied clinically for all four age/gender groups. The maximum limit of the ADSV extraction range for male participants should be changed from 300 Hz to 200 Hz for connected speech readings to obtain accurate CPP F0 measures. Furthermore, due to limited research, data should be analyzed both with and without vocalic detection until it becomes clear which one is more valid.
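
To make the cepstral measures named in this abstract more concrete, the following is a minimal sketch of how a cepstral peak prominence value and its associated F0 could be estimated for a single frame, and of how the upper limit of the F0 search range enters the calculation. It is a simplified, textbook-style illustration and does not reproduce the ADSV algorithm used in the study. The function name `cepstral_peak`, its parameters, the 60 Hz lower bound, and the synthetic test signal are all assumptions introduced here.

```python
import numpy as np


def cepstral_peak(frame, sample_rate, f0_min=60.0, f0_max=300.0):
    """Return (cpp_db, f0_hz) for a single frame of audio samples.

    Hypothetical helper for illustration; f0_min and f0_max bound the
    fundamental-frequency search range in Hz.
    """
    n = len(frame)
    windowed = frame * np.hanning(n)

    # Log-magnitude spectrum in dB (small offset avoids log of zero).
    log_spectrum_db = 20.0 * np.log10(np.abs(np.fft.fft(windowed)) + 1e-12)

    # Cepstrum: inverse transform of the log spectrum, also expressed in dB.
    cepstrum_db = 20.0 * np.log10(np.abs(np.fft.ifft(log_spectrum_db)) + 1e-12)

    # Quefrency (in samples) is a period, so a high F0 maps to a low index.
    lo = int(sample_rate / f0_max)
    hi = min(int(sample_rate / f0_min), n // 2)
    idx = np.arange(lo, hi + 1)

    # Highest cepstral value inside the search range.
    peak_idx = idx[np.argmax(cepstrum_db[idx])]

    # A straight-line fit over the search range serves as the baseline.
    slope, intercept = np.polyfit(idx, cepstrum_db[idx], 1)
    cpp_db = cepstrum_db[peak_idx] - (slope * peak_idx + intercept)

    return float(cpp_db), sample_rate / float(peak_idx)


# Synthetic test signal (an assumption, not study data): a 100 Hz impulse
# train with a little seeded noise, sampled at 10 kHz for 0.4 s.
sample_rate = 10_000
rng = np.random.default_rng(0)
signal = 0.01 * rng.standard_normal(4000)
signal[::100] += 1.0

# Default 300 Hz upper limit, then the narrower 200 Hz upper limit.
cpp_default, f0_default = cepstral_peak(signal, sample_rate)
cpp_narrow, f0_narrow = cepstral_peak(signal, sample_rate, f0_max=200.0)
```

Lowering `f0_max` raises the lowest quefrency that is searched, so cepstral peaks at very short periods can no longer be picked as the F0 candidate. In this sketch, that is the mechanism by which a 200 Hz upper limit could change CPP F0 estimates for lower-pitched voices.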
