
Simple Audio Time-Stretching in Ruby

_(:з」∠)_
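A while ago I used the SoundTouch library to write a simple sound-synthesis program. It took a short sound sample, shifted its pitch and tempo, and built new music out of the results. Over the past few days I looked into how audio time-stretching actually works, learned the simplest algorithm, SOLA, and implemented it in Ruby.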

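SOLA (Synchronized Overlap-Add) is one of the time-domain methods for changing the speed of speech without changing its pitch. The idea is very simple. Take a chunk of samples from the start of the original audio, say 7 samples, and put it into a new buffer. Then move further ahead in the original data, 9 samples past the start of the first chunk, take the next chunk there, and overlap-add it with the end of the previous 7 samples; assume the overlap is 2 samples. Each chunk then contributes only 7 - 2 = 5 new output samples for every 9 input samples consumed, so the output is (7 - 2) / 9 ≈ 0.556 times as long as the input, i.e. the sound is about 44% shorter than the original. Note that the sampling interval (the sample rate) does not change: every chunk is played back at its original rate, so the waveform periods inside each chunk, and therefore the frequency (pitch) of the sound, stay the same. The overlap is there to suppress the noise and unnatural transitions caused by samples being dropped where non-contiguous pieces of the signal are spliced together.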
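The length arithmetic above can be checked with a toy overlap-add on a plain integer array. This is a minimal sketch under stated assumptions: it uses fixed hops with no search for the best splice position (so it is plain OLA, not yet SOLA) and a simple linear cross-fade; `toy_ola` and its parameters are hypothetical names chosen for illustration.

```ruby
# Toy overlap-add: fixed hops, no best-position search (plain OLA, not SOLA).
# block: samples per chunk, overlap: samples cross-faded, hop: input advance per chunk.
def toy_ola(input, block, overlap, hop)
  out = input[0, block].map(&:to_f)             # the first chunk is copied as-is
  pos = hop
  while pos + block <= input.length
    chunk = input[pos, block]
    base = out.length - overlap                 # where the previous chunk's tail starts
    overlap.times do |i|
      w = (i + 1).to_f / (overlap + 1)          # linear fade-in weight for the new chunk
      out[base + i] = out[base + i] * (1 - w) + chunk[i] * w
    end
    out.concat(chunk.drop(overlap).map(&:to_f)) # block - overlap new samples per chunk
    pos += hop
  end
  out
end

out = toy_ola((0...97).to_a, 7, 2, 9)
puts out.length      # 7 + 10 * 5 = 57
puts((7 - 2) / 9.0)  # duration ratio approached for long inputs
```

With 97 input samples there are 11 chunks, giving 57 output samples; for longer inputs the ratio of output to input length approaches 5/9.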

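In practice, the first sequence is copied to the start of the output. For each following sequence, the best overlap position is found by computing the normalized cross-correlation coefficient at every candidate offset in a small search window and keeping the offset that matches the tail of the previous sequence best; the sequence is then cross-faded in at that position, and the process repeats. It sounds complicated, but it is quite simple to write.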
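To make the search step concrete, here is a standalone sketch of choosing an offset by normalized cross-correlation. `best_offset` and its parameters are illustrative names, not part of the program below, and unlike its `seek` method this sketch applies no window weighting to the reference. Dividing by the square root of the candidate's energy keeps louder candidates from winning just because of their amplitude; the reference's own norm is the same for every candidate, so it is omitted.

```ruby
# Return the offset in 0...window at which signal[start + offset, ref.length]
# best matches ref, scored by normalized cross-correlation.
def best_offset(ref, signal, start, window)
  best, best_score = 0, -Float::INFINITY
  window.times do |off|
    cand = signal[start + off, ref.length]
    dot = ref.zip(cand).sum { |a, b| a * b }   # raw cross-correlation
    energy = cand.sum { |x| x * x }            # candidate energy
    score = energy > 0 ? dot / Math.sqrt(energy) : 0.0
    if score > best_score
      best, best_score = off, score
    end
  end
  best
end

# A sine with a period of 20 samples: searching from position 7,
# the phase-aligned candidate starts at 20, i.e. offset 13.
signal = Array.new(100) { |n| Math.sin(2 * Math::PI * n / 20) }
puts best_offset(signal[0, 40], signal, 7, 20)
```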

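require 'wavefile'
include WaveFile

class Sola
  Sequence     = 4410                        # samples per processing sequence (100 ms at 44.1 kHz)
  OverlapSize  = 882                         # samples cross-faded between sequences (20 ms)
  SeekWindow   = 662                         # range searched for the best overlap position (about 15 ms)
  FlatDuration = Sequence - 2 * OverlapSize  # samples copied unchanged between two cross-fades
  attr_reader :samples, :input

  # Overlap: fade out input[prev...] while fading in input[pnew...], appending OverlapSize samples
  def overlap(prev, pnew)
    OverlapSize.times do |i|
      @samples << (@input[prev + i] * (OverlapSize - i) + @input[pnew + i] * i) / OverlapSize
    end
  end

  # Find the offset in 0...SeekWindow at which input[pnew + offset...] best matches the tail at input[prev...]
  def seek(prev, pnew)
    # Reference: the previous tail weighted by the parabolic window i * (OverlapSize - i)
    ref = Array.new(OverlapSize) { |i| @input[prev + i].to_f * i * (OverlapSize - i) }
    bestoffset = 0
    bestcorr = -Float::INFINITY
    SeekWindow.times do |i|
      crosscorr = 0.0
      norm = 0.0
      OverlapSize.times do |j|
        s = @input[pnew + i + j].to_f
        crosscorr += s * ref[j]
        norm += s * s
      end
      # Normalize by the candidate's energy so loud segments are not favored just for their amplitude
      crosscorr /= Math.sqrt(norm) if norm > 0
      if crosscorr > bestcorr
        bestcorr = crosscorr
        bestoffset = i
      end
    end
    bestoffset
  end

  # input: mono integer samples; scale > 1 speeds up (1.2 = 20% faster), scale < 1 slows down
  def initialize(input, scale)
    @samples = []
    @input = input
    skip = (Sequence - OverlapSize) * scale.to_f  # nominal input advance per sequence
    # SOLA: copy the first sequence up to the start of its tail
    @samples.concat(@input[0, Sequence - OverlapSize])
    prev_offset = Sequence - OverlapSize          # tail of the previous sequence, not yet output
    in_offset = skip                              # nominal start of the next sequence (fractional)
    # Each pass outputs Sequence - OverlapSize samples and consumes skip input samples
    while in_offset.to_i + SeekWindow + Sequence <= @input.length
      seq_offset = in_offset.to_i + seek(prev_offset, in_offset.to_i)
      overlap(prev_offset, seq_offset)                                 # OverlapSize samples
      @samples.concat(@input[seq_offset + OverlapSize, FlatDuration])  # FlatDuration samples
      prev_offset = seq_offset + Sequence - OverlapSize
      in_offset += skip
    end
  end
end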

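wavr = Reader.new("in.wav")
in_format = wavr.format
samples = wavr.read(wavr.total_sample_frames).samples
wavr.close
# Sola works on a single channel: downmix multichannel frames ([l, r, ...]) to mono
samples = samples.map { |frame| frame.sum / frame.size } if in_format.channels > 1
# Speed up by 20% (the output is about 1/1.2 as long)
newsamples = Sola.new(samples, 1.2).samples
# The output is mono; assumes 16-bit PCM input
out_format = Format.new(:mono, :pcm_16, in_format.sample_rate)
buffer = Buffer.new(newsamples, out_format)
Writer.new("out.wav", out_format) do |writer|
  writer.write(buffer)
end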

Here is the original audio before processing:

After processing, 20% faster:

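The most embarrassing part, though, is that processing a 4 MB WAV file took a whole 5 minutes, which is damn slow. Most of the time is spent in the seek method: for every sequence, its doubly nested loop performs SeekWindow × OverlapSize = 662 × 882 ≈ 584,000 multiply-adds, which is a killer!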
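A back-of-the-envelope count suggests why. This sketch assumes the 4 MB file holds 16-bit PCM (about 2.1 million samples, header ignored) and uses the constants and the 1.2 scale from the code above; `seek_cost` is a hypothetical helper written only for this estimate. It comes out at roughly 2.9 × 10^8 multiply-adds, all executed by the Ruby interpreter, which plausibly accounts for a run time measured in minutes.

```ruby
# Rough count of multiply-adds done by the exhaustive seek (estimate only).
def seek_cost(num_samples, scale, sequence: 4410, overlap: 882, seek_window: 662)
  skip = (sequence - overlap) * scale     # input samples consumed per sequence
  sequences = (num_samples / skip).floor  # approximate number of seek calls
  sequences * seek_window * overlap       # each call: seek_window candidates x overlap products
end

samples = 4 * 1024 * 1024 / 2             # 4 MB of 16-bit samples
puts seek_cost(samples, 1.2)
```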

Next, I'd like to look into the phase vocoder algorithm for pitch shifting.