Simple audio pitch shifting in Ruby

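It has been a long time since my last update; I keep getting lazier.

Recently I have been looking into the Phase Vocoder pitch-shifting algorithm.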

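Wikipedia covers the basic principle of the phase vocoder. In short: the audio is first stretched in time. It is cut into overlapping frames, each frame goes through a fast Fourier transform (FFT), and the phases are adjusted so that overlapping frames stay coherent instead of interfering with each other. An inverse FFT then turns each frame back into samples, and the frames are overlap-added to synthesize the sound.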

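How speed relates to pitch: a piece of audio can be encoded as a one-dimensional array of samples. If every other element of that array is deleted, the array is half as long, so the audio lasts half as long and plays at twice the speed. Its frequencies double as well, so it sounds at a higher pitch (one octave up).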
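To make that concrete, here is a minimal sketch in plain Ruby (no gems). The 8000 Hz sample rate, the 440 Hz test tone and the helper names are all assumptions made for this illustration; the frequency is estimated crudely by counting upward zero crossings.

```ruby
# Keep every other sample (indices 0, 2, 4, ...), i.e. drop half of them.
def drop_every_other(samples)
  samples.each_slice(2).map(&:first)
end

# Count places where the signal goes from negative to non-negative.
def rising_crossings(samples)
  samples.each_cons(2).count { |a, b| a < 0 && b >= 0 }
end

# Rough frequency estimate: crossings per second of playback time.
def estimated_frequency(samples, sample_rate)
  rising_crossings(samples) / (samples.size.to_f / sample_rate)
end

sample_rate = 8000 # assumed playback rate in Hz
tone = Array.new(sample_rate) do |i|
  Math.sin(2 * Math::PI * 440 * i / sample_rate) # one second of a 440 Hz sine
end

halved = drop_every_other(tone)

puts "original: #{tone.size} samples, about #{estimated_frequency(tone, sample_rate).round} Hz"
puts "halved:   #{halved.size} samples, about #{estimated_frequency(halved, sample_rate).round} Hz"
```

Played back at the same sample rate, the shortened array lasts half as long and its estimated frequency is roughly twice that of the original tone.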

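To raise the pitch without changing the duration, first stretch the audio while keeping its pitch unchanged, then speed it up. The result has the same length as the original audio but higher frequencies, and therefore a higher pitch.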

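There are plenty of C/C++, Java, Python and Matlab implementations of this algorithm online, but I could not find one in Ruby, so I worked from one Python implementation and one C implementation. The algorithm relies on the FFT and so needs a scientific computing library, and Ruby's support for scientific computing is... well. After some searching I found the Ruby/Numo::NArray gem, which covers most of what I need.

Ruby-Numo provides basic scientific computing features similar to Python's numpy. Numo::NArray provides multidimensional arrays and Numo::FFTE provides the FFT.

Read the samples of an audio file as a one-dimensional array, then delete (or repeat) elements at a regular interval to change the array's length. This speeds the audio up (or slows it down) and raises (or lowers) its pitch.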

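# Like numpy's arange: start, start + factor, start + 2 * factor, ... (all below start + size)
def arange(start, size, factor)
  Numo::DFloat.new((size.to_f / factor).ceil).seq(start, factor)
end

# Resample by picking every factor-th sample (nearest index), without interpolation
def speedx(snd_array, factor)
  index = arange(0, snd_array.size, factor).round
  # Drop indices that rounding pushed past the end of the array
  index = index[index < snd_array.size].cast_to(Numo::Int32)
  snd_array[index]
end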

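Taking the FFT of a digital signal gives its analysis spectrum. To reduce spectral leakage, the sampled signal is usually multiplied by a window function first. One of the most widely used window functions is the Hann window (often called the Hanning window). A Ruby implementation of the Hanning window: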

def hanning(size)
  # Symmetric Hann window: 0 at both ends, 1 in the middle
  signal = Array.new(size) do |i|
    0.5 - 0.5 * Math.cos((2 * Math::PI * i) / (size - 1))
  end
  Numo::DFloat.cast(signal)
end
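As a quick check of why this window suits overlap-add, here is a small sketch in plain Ruby arrays (so it runs without Numo). The `hann` helper and the sizes are assumptions for the illustration. It overlap-adds the squared window at a hop of a quarter of the window size. The window is squared because it is applied once before the FFT and once again after the inverse FFT.

```ruby
# Plain-Ruby version of the same symmetric Hann formula (hypothetical helper).
def hann(size)
  Array.new(size) { |i| 0.5 - 0.5 * Math.cos(2 * Math::PI * i / (size - 1)) }
end

size = 1024       # assumed window size for the demonstration
hop = size / 4    # frames start every quarter window
window = hann(size)

# Overlap-add the squared window at every hop position.
total = Array.new(size * 4, 0.0)
(0..(total.size - size)).step(hop) do |start|
  size.times { |n| total[start + n] += window[n]**2 }
end

# Ignore the first and last window lengths, where fewer than four frames overlap.
steady = total[size...(total.size - size)]

puts format("window ends: %.3f and %.3f, middle: %.3f", window.first, window.last, window[size / 2])
puts format("overlap-added gain stays between %.4f and %.4f", steady.min, steady.max)
```

With these settings the summed gain stays close to a constant (about 1.5), which is why frames spaced a quarter window apart can simply be added together without audible amplitude ripple.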

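Stretching the audio: take the FFT of overlapping frames, adjust the phase, then apply the inverse FFT and overlap-add the frames to synthesize the sound.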

def stretch(snd_array, factor, window_size, h)
  phase = Numo::DFloat.zeros(window_size)
  hanning_window = hanning(window_size)
  # The output length must be an Integer
  result = Numo::DFloat.zeros((snd_array.size.to_f / factor + window_size).to_i)

  arange(0, snd_array.size - (window_size + h), h * factor).each do |pos|
    i = pos.to_i # frame start as an integer sample index
    # Take two overlapping frames from the original audio, h samples apart
    a1 = snd_array[i...(i + window_size)]
    a2 = snd_array[(i + h)...(i + window_size + h)]
    # Window both frames and take their FFTs
    s1 = Numo::FFTE.zfft1d((hanning_window * a1).cast_to(Numo::DComplex), -1)
    s2 = Numo::FFTE.zfft1d((hanning_window * a2).cast_to(Numo::DComplex), -1)
    # Accumulate the phase advance between the two frames, wrapped into [0, 2π)
    phase = (phase + (s2 / s1).arg) % (2 * Math::PI)
    # Inverse FFT of the second frame's magnitudes with the accumulated phase
    rephase = Numo::FFTE.zfft1d(s2.abs * Numo::NMath.exp(phase * 1i), 1)
    # Apply the Hanning window again and overlap-add into the output
    x = (pos / factor).to_i
    result[x...(x + window_size)] += hanning_window * rephase.real
  end
  # Normalize the volume to a peak of 2**14, then convert to 16-bit samples
  result = result * (2 ** 14) / result.max
  result.cast_to(Numo::Int16)
end
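One detail of the phase accumulation deserves a note. The accumulated phase is meant to be wrapped into [0, 2π), and in Ruby `%` and `*` have the same precedence and associate left to right, so the divisor `2 * Math::PI` needs its own parentheses. A tiny plain-Ruby illustration (the value 7.0 is arbitrary):

```ruby
angle = 7.0 # an unwrapped phase, slightly more than 2π

without_parens = angle % 2 * Math::PI   # parsed as (angle % 2) * π
with_parens = angle % (2 * Math::PI)    # wraps into [0, 2π)

puts format("angle %% 2 * PI   = %.5f", without_parens)
puts format("angle %% (2 * PI) = %.5f", with_parens)

# Only the parenthesized form points in the same direction as the original angle.
puts format("sin(angle) = %.5f, sin(wrapped) = %.5f", Math.sin(angle), Math.sin(with_parens))
```

The same precedence rule applies when the left-hand side is a Numo array, since the expression is parsed before any method is called.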

Stretching first and then changing the speed gives the pitch shift.

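def pitch(snd_array, n, window_size=2**13, h=2**11)
  # 12 semitones make one octave, so n semitones scale the frequency by 2**(n/12)
  factor = 2 ** (n / 12.0)
  stretched = stretch(snd_array, 1.0 / factor, window_size, h)
  # Skip the first window, then resample back to roughly the original length
  speedx(stretched[window_size...stretched.size], factor)
end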
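For a feel of the numbers, this sketch tabulates the factor for a few semitone shifts and checks the length bookkeeping. `semitone_factor` and the 44,100-sample clip are assumptions made for the example, and the extra `window_size` padding that `stretch` adds is ignored here.

```ruby
# Frequency ratio for a shift of n semitones (hypothetical helper).
def semitone_factor(n)
  2**(n / 12.0)
end

[-12, -1, 0, 1, 7, 12].each do |n|
  puts format("%+3d semitones -> factor %.4f", n, semitone_factor(n))
end

# Length bookkeeping for an assumed 44_100-sample clip raised by one octave.
samples = 44_100
factor = semitone_factor(12)
stretched_length = (samples * factor).round        # after stretching by 1 / factor
restored_length = (stretched_length / factor).ceil # after speeding up by factor

puts "stretched to #{stretched_length} samples, back to #{restored_length} after speed-up"
```

A positive `n` gives a factor above 1 (higher pitch) and a negative `n` a factor below 1 (lower pitch). In both cases the stretch and the speed change cancel out in length.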

Complete code

require 'numo/narray'
require 'numo/ffte'
require 'wavefile'

include WaveFile

def arange(start, size, factor)
  Numo::DFloat.new((size.to_f / factor).ceil).seq(start, factor)
end

def speedx(snd_array, factor)
  index = arange(0, snd_array.size, factor).round
  index = index[index < snd_array.size].cast_to(Numo::Int32)
  snd_array[index]
end

def hanning(size)
  signal = Array.new(size) do |i|
    0.5 - 0.5 * Math.cos((2 * Math::PI * i) / (size - 1))
  end
  Numo::DFloat.cast(signal)
end


def stretch(snd_array, factor, window_size, h)
  phase = Numo::DFloat.zeros(window_size)
  hanning_window = hanning(window_size)
  result = Numo::DFloat.zeros((snd_array.size.to_f / factor + window_size).to_i)
  arange(0, snd_array.size - (window_size + h), h * factor).each do |pos|
    i = pos.to_i
    a1 = snd_array[i...(i + window_size)]
    a2 = snd_array[(i + h)...(i + window_size + h)]

    s1 = Numo::FFTE.zfft1d((hanning_window * a1).cast_to(Numo::DComplex), -1)
    s2 = Numo::FFTE.zfft1d((hanning_window * a2).cast_to(Numo::DComplex), -1)

    # Wrap the accumulated phase into [0, 2π); the parentheses are required
    phase = (phase + (s2 / s1).arg) % (2 * Math::PI)
    rephase = Numo::FFTE.zfft1d(s2.abs * Numo::NMath.exp(phase * 1i), 1)

    x = (pos / factor).to_i
    result[x...(x + window_size)] += hanning_window * rephase.real
  end
  result = result * (2 ** 14) / result.max
  result.cast_to(Numo::Int16)
end


def pitch(snd_array, n, window_size=2**13, h=2**11)
  factor = 2 ** (n / 12.0)
  stretched = stretch(snd_array, 1.0 / factor, window_size, h)
  speedx(stretched[window_size...stretched.size], factor)
end


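# Read the whole file; flatten assumes mono audio (one sample per frame)
wavr = Reader.new("ye.wav")
samples = wavr.read(wavr.total_sample_frames).samples.flatten

# Shift up by 12 semitones (one octave)
new_samples = pitch(Numo::DFloat.cast(samples), 12).cast_to(Numo::Int16).to_a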

buffer = Buffer.new(new_samples, wavr.format)

Writer.new("new.wav", wavr.format) do |writer|
  writer.write(buffer)
end

Original audio

Raised by one octave

_(:з」∠)_