Getting started with STRAIGHT in command mode
URL: http://indiana.edu/~acoustic/s522/getti … 0_006b.pdf
Size: 1.9 MB (1,908,838 bytes)
...
Using default parameters
This section introduces the simplest analysis and re-synthesis procedure.
Reading speech from a file
The first step is to read a file containing a speech signal. The file vaiueo2d.wav, located in the STRAIGHT directory, is used to illustrate the following steps. The following command reads the speech signal into the variable x and stores the sampling frequency (Hz) in fs.
[x,fs]=wavread('vaiueo2d.wav');
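In newer MATLAB releases wavread and wavwrite are no longer available, and audioread and audiowrite take their place. The sketch below is one way to keep the reading step working on both. The test tone and the file name demo_tone.wav are assumptions made only so the example runs on its own; they are not part of STRAIGHT.

```matlab
% Create a short test tone so the example does not depend on any speech file.
fs_demo = 16000;                              % assumed sampling frequency (Hz)
t = (0:fs_demo-1)' / fs_demo;                 % one second of samples
tone = 0.5 * sin(2*pi*220*t);                 % 220 Hz sine, well inside [-1, 1]
demo_file = fullfile(tempdir, 'demo_tone.wav');  % hypothetical file name

% Write the tone with whichever writer this installation provides.
if exist('audiowrite') > 0
    audiowrite(demo_file, tone, fs_demo);
else
    wavwrite(tone, fs_demo, demo_file);
end

% Read it back with whichever reader this installation provides.
if exist('audioread') > 0
    [x, fs] = audioread(demo_file);
else
    [x, fs] = wavread(demo_file);
end
```

After this, x and fs can be passed to exstraightsource exactly as in the command above.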
Extracting excitation source parameters
The next step is to extract the source parameters. The source parameters consist of the fundamental frequency f0raw and the periodicity indices ap for each frequency band. The following command extracts both.
[f0raw,ap]=exstraightsource(x,fs);
The fundamental frequency information guides the spectral information extraction that follows.
Extracting smoothed time-frequency representation (STRAIGHT spectrum)
A time-frequency representation n3sgram, which is an F0-adaptively smoothed spectrogram, is extracted using the following command.
n3sgram=exstraightspec(x,f0raw,fs);
The core of STRAIGHT is the algorithm that eliminates the interference caused by periodic excitation, based on an extended pitch-synchronous analysis and an optimum F0-adaptive smoothing derived from spline theory (it is a stateless algorithm). From here on, this time-frequency representation is called the STRAIGHT spectrogram, and a time slice of it is called a STRAIGHT spectrum.
Re-synthesizing speech
Re-synthesizing speech without modification takes just the following command.
sy = exstraightsynth(f0raw,n3sgram,ap,fs);
The synthesized speech signal is stored in the variable sy. The following Matlab command plays it through the audio output.
soundsc(sy,fs);
Script for Matlab 7.0: synthes.m
Path: D:\HTS-demo_CMU_2\data\synthes.m
% Any wav file can be used (this one has a 48000 Hz sampling rate)
path(path,'C:\usr\local\STRAIGHTtrial\Resources\STRAIGHTV40pcode\STRAIGHTV40pcode');
fprintf(1,'Synthesizing ./gen/cmu_us_arctic_slt_a0021.wav\n');
[x,fs]=wavread('./wav/cmu_us_arctic_slt_a0021.wav'); %Reading speech from a file
[f0raw,ap]=exstraightsource(x,fs); % Extracting excitation source parameters
n3sgram=exstraightspec(x,f0raw,fs); % Extracting smoothed time-frequency representation (STRAIGHT spectrum)
% Re-synthesizing speech
% Speech re-synthesis without modification is just calling the following command.
sy = exstraightsynth(f0raw,n3sgram,ap,fs);
soundsc(sy,fs); % sound
return; % stop the script here (break is only valid inside a loop)
quit; % exit Matlab (not reached while the line above is active)
Script for Matlab 7.0: synthes2.m
Path: D:\HTS-demo_CMU_2\data\synthes2.m
path(path,'C:\usr\local\STRAIGHTtrial\Resources\STRAIGHTV40pcode\STRAIGHTV40pcode');
prm.spectralUpdateInterval = 5.000000;
prm.F0searchUpperBound=280;
prm.F0searchLowerBound=110;
fprintf(1,'Synthesizing ./gen/cmu_us_arctic_slt_a0021.wav\n');
[x,fs]=wavread('./wav/cmu_us_arctic_slt_a0021.wav'); %Reading speech from a file
[f0raw,ap]=exstraightsource(x,fs,prm); % Extracting excitation source parameters
n3sgram=exstraightspec(x,f0raw,fs,prm); % Extracting smoothed time-frequency representation (STRAIGHT spectrum)
% Re-synthesizing speech
% Speech re-synthesis without modification is just calling the following command.
sy = exstraightsynth(f0raw,n3sgram,ap,fs,prm);
soundsc(sy,fs); % sound
% [sy] = exstraightsynth(f0raw,n3sgram,ap,48000,prm); % this works too
wavwrite( sy/max(abs(sy)), 48000, './gen/cmu_us_arctic_slt_a0021.wav');
----------------
The ap, sp, and f0 files in text format must be converted to little-endian float format:
./gen/_txtap_2_ap.bat
x2x +af < ../ap/cmu_us_arctic_slt_a0021.ap > cmu_us_arctic_slt_a0021.ap
./gen/_txtsp_2_sp.bat
x2x +af < ../sp/cmu_us_arctic_slt_a0021.sp > cmu_us_arctic_slt_a0021.sp
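The binary files can also be written from Matlab directly, without the text intermediate. The sketch below is only an illustration under assumptions: the 4-by-3 matrix stands in for n3sgram and the file name demo_spec.sp is hypothetical. It writes little-endian float32 with fwrite and reads the data back with fread. One point worth checking in the text route: save -ascii writes one matrix row per line, so a sequential conversion of that text produces row-by-row order, whereas fread fills its output column by column. Writing with fwrite keeps the order consistent with fread.

```matlab
% Small stand-in for a spectrogram matrix (hypothetical values).
spec = reshape(1:12, 4, 3);
binfile = fullfile(tempdir, 'demo_spec.sp');   % hypothetical file name

% Write as little-endian 32-bit floats; fwrite stores the matrix column by column.
fid = fopen(binfile, 'w', 'ieee-le');
count = fwrite(fid, spec, 'float32');
fclose(fid);

% Read back with the same shape; fread also fills column by column.
fid = fopen(binfile, 'r', 'ieee-le');
spec_back = fread(fid, [4, 3], 'float32');
fclose(fid);
```

If a file was produced row by row instead, reading it as [columns, rows] and transposing the result restores the original layout.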
Script for Matlab 7.0: synthes3.m
Path: D:\HTS-demo_CMU_2\data\synthes3.m
path(path,'C:\usr\local\STRAIGHTtrial\Resources\STRAIGHTV40pcode\STRAIGHTV40pcode');
prm.spectralUpdateInterval = 5.000000;
prm.F0searchUpperBound=280;
prm.F0searchLowerBound=110;
fprintf(1,'Synthesizing ./gen/cmu_us_arctic_slt_a0021.wav\n');
% [x,fs]=wavread('./wav/cmu_us_arctic_slt_a0021.wav'); %Reading speech from a file
% [f0raw,ap]=exstraightsource(x,fs,prm); % Extracting excitation source parameters
% n3sgram=exstraightspec(x,f0raw,fs,prm); % Extracting smoothed time-frequency representation (STRAIGHT spectrum)
% save './gen/cmu_us_arctic_slt_a0021_ascii.sp' n3sgram -ascii;
% save './gen/cmu_us_arctic_slt_a0021_ascii.f0' f0raw -ascii;
% save './gen/cmu_us_arctic_slt_a0021_ascii.ap' ap -ascii;
% % the files in text format must be converted to little-endian float format
fid1 = fopen('./gen/cmu_us_arctic_slt_a0021.sp','r','ieee-le');
n3sgram = fread(fid1,[1025, 500],'float');
fclose(fid1);
fid2 = fopen('./gen/cmu_us_arctic_slt_a0021.ap','r','ieee-le');
ap = fread(fid2,[1025, 500],'float');
fclose(fid2);
fid3 = fopen('./gen/cmu_us_arctic_slt_a0021.f0','r','ieee-le');
f0raw = fread(fid3,[1, 500],'float32');
fclose(fid3);
%break;
% Re-synthesizing speech
% Speech re-synthesis without modification is just calling the following command.
fs = 48000; % sampling frequency (Hz); set here because the wavread call above is commented out
sy = exstraightsynth(f0raw,n3sgram,ap,fs,prm);
soundsc(sy,fs); % sound
% [sy] = exstraightsynth(f0raw,n3sgram,ap,48000,prm);
wavwrite( sy/max(abs(sy)), 48000, './gen/cmu_us_arctic_slt_a0021.wav');
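The script above hard-codes 500 frames in every fread call, which only fits an utterance of exactly that length. A possible way around this, sketched below, is to pass Inf as the column count so that fread takes the number of frames from the file size. The demo data and the file name demo_frames.sp are assumptions that exist only to make the example self-contained.

```matlab
nbins = 1025;                                  % rows per frame, as in the script above
demo = reshape(1:nbins*7, nbins, 7);           % hypothetical data: 7 frames
binfile = fullfile(tempdir, 'demo_frames.sp'); % hypothetical file name

% Write the demo data as little-endian float32.
fid = fopen(binfile, 'w', 'ieee-le');
fwrite(fid, demo, 'float32');
fclose(fid);

% Read with an open-ended column count: fread fills nbins rows per column
% until the end of the file is reached.
fid = fopen(binfile, 'r', 'ieee-le');
sp = fread(fid, [nbins, Inf], 'float32');
fclose(fid);

nframes = size(sp, 2);                         % number of frames found in the file
```

The same pattern with [1, Inf] would read an f0 file of unknown length, and comparing its length with nframes is a cheap consistency check before calling exstraightsynth.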
Last edited by Inprj21 (2015-03-05 05:24:49)