Showing content from https://patents.google.com/patent/CN105976823B/en below:

CN105976823B - Adaptive audio water mark method and system based on phase code

Adaptive audio watermarking method and system based on phase coding

Technical Field

The invention relates to the field of digital audio watermarking, and in particular to an adaptive audio watermarking method and system based on phase coding.

Background

Digital audio watermarking is a signal processing operation that adds digital information to an audio signal for purposes such as authenticity verification, copyright protection, and information hiding. An adaptive audio watermarking system based on phase coding dynamically adjusts the embedding strength of the watermark in the phase spectrum according to a psychoacoustic model, so that the watermark is as robust as possible while remaining imperceptible. Traditional phase-coding watermark embedding algorithms add a watermark of fixed strength directly to the phase spectrum. If the embedding strength is too high, audible noise is easily introduced and sound quality suffers; if it is too low, the watermark is hard to detect, which harms robustness. Moreover, audio signals vary over time: a strength that suits some regions may be too high or too low for others. Fixed-strength embedding therefore cannot satisfy imperceptibility and robustness at the same time.

Summary of the Invention

The purpose of the present invention is to provide an adaptive audio watermarking scheme based on phase coding, in which the watermark embedding strength in the phase spectrum is adjusted adaptively according to the audio signal, so as to reach a compromise between the imperceptibility and the robustness of the audio watermark.

The technical solution of the present invention provides an adaptive audio watermarking method based on phase coding, comprising an embedding process and a detection process.

The embedding process comprises the following steps.

Step A1: read the audio file to obtain the time-domain audio signal x and the sampling rate fs1; divide x into frames, where N denotes the frame length and x_n denotes the n-th time-domain frame; then apply a time-frequency transform to obtain the amplitude spectrum X_n and the phase spectrum of the frequency-domain audio signal.

Step A2: from the sampling rate fs1, the frame length N, and the embedding start frequency FWMIN and end frequency FWMAX, which are preset according to the frequency band to which the human ear is sensitive, compute the range of each frequency-domain frame in which the watermark can be embedded, obtaining the maximum freqmax1 and the minimum freqmin1 of this range, and select the frequency-domain audio signal within this range:

freqmin1 = floor((FWMIN × 2.0 / fs1) × N)

freqmax1 = floor((FWMAX × 2.0 / fs1) × N)

where floor is the round-down function.

Step A3: using the key as the random number seed, generate a binary pseudo-random spreading sequence u of length freqmax1 - freqmin1 + 1.

Step A4: judge the tonal components as follows,

where k denotes a local maximum point in frequency, j denotes the distance from the local maximum point k, P_n[k] denotes the signal power (in dB) of the n-th frame at the local maximum point k, and P_n[k-j] denotes the signal power (in dB) at distance j from the maximum point.

Based on the judgment result, the global masking threshold Th_n is used to obtain the phase masking threshold θ_n of the phase spectrum.

Step A5: from the pseudo-random sequence u, the phase masking threshold θ_n and the watermark bit b, embed the watermark in the phase spectrum of the audio using the following formula, obtaining the watermarked phase spectrum as follows,

where α is a constant that controls the watermark embedding strength.

Using the amplitude spectrum X_n of the frequency-domain signal and the watermarked phase spectrum, the watermarked frequency-domain signal is then obtained through Euler's formula as follows,

where Y_n is the watermarked frequency-domain signal and e is the base of the natural logarithm.

Step A6: transform the watermarked frequency-domain signal Y_n into the time-domain signal y_n and generate the watermarked audio file.

The detection process comprises the following steps.

Step B1: read the watermarked time-domain audio file to obtain the amplitude data z of the watermarked time-domain audio signal and the sampling rate fs2; divide the time-domain signal into frames of length N, where z_n is the n-th frame of the signal under test; then apply a time-frequency transform to obtain the amplitude spectrum Z_n and the phase spectrum ξ_n of the frequency-domain audio signal.

Step B2: from the sampling rate fs2, the frame length N, and the embedding start frequency FWMIN and end frequency FWMAX, which are preset according to the frequency band to which the human ear is sensitive, compute the range of each frequency-domain frame in which the watermark can be embedded, obtaining the maximum freqmax2 and the minimum freqmin2 of this range, and select the amplitude spectrum of the audio within this range:

freqmin2 = floor((FWMIN × 2.0 / fs2) × N)

freqmax2 = floor((FWMAX × 2.0 / fs2) × N)

where floor is the round-down function.

Step B3: using the key as the random number seed, generate a binary pseudo-random spreading sequence u of length freqmax2 - freqmin2 + 1.

Step B4: using the following correlation test formula, correlate the pseudo-random sequence u with the phase spectrum ξ_n of the n-th frame of the signal under test to obtain the sufficient detection statistic r_n of that frame,

where <·> denotes the inner product of signals.

If the sufficient statistic r_n ≥ 0, the detected watermark bit is b = 1; otherwise b = 0.

Furthermore, in step A4, the phase masking threshold θ_n of the phase spectrum is obtained using a trigonometric relationship.

The present invention correspondingly provides an adaptive audio watermarking system based on phase coding, comprising an audio watermark embedding subsystem and an adaptive audio watermark detection subsystem.

The audio watermark embedding subsystem comprises the following modules.

First time-frequency conversion module: reads the audio file to obtain the time-domain audio signal x and the sampling rate fs1; divides x into frames, where N denotes the frame length and x_n denotes the n-th time-domain frame; then applies a time-frequency transform to obtain the amplitude spectrum X_n and the phase spectrum of the frequency-domain audio signal.

First embedding range selection module: from the sampling rate fs1, the frame length N, and the embedding start frequency FWMIN and end frequency FWMAX, which are preset according to the frequency band to which the human ear is sensitive, computes the range of each frequency-domain frame in which the watermark can be embedded, obtaining the maximum freqmax1 and the minimum freqmin1 of this range, and selects the frequency-domain audio signal within this range:

freqmin1 = floor((FWMIN × 2.0 / fs1) × N)

freqmax1 = floor((FWMAX × 2.0 / fs1) × N)

where floor is the round-down function.

First spreading sequence generation module: using the key as the random number seed, generates a binary pseudo-random spreading sequence u of length freqmax1 - freqmin1 + 1.

Improved psychoacoustic module: judges the tonal components as follows,

where k denotes a local maximum point in frequency, j denotes the distance from the local maximum point k, P_n[k] denotes the signal power (in dB) of the n-th frame at the local maximum point k, and P_n[k-j] denotes the signal power (in dB) at distance j from the maximum point.

Based on the judgment result, the global masking threshold Th_n is used to obtain the phase masking threshold θ_n of the phase spectrum.

Additive embedding module: from the pseudo-random sequence u, the phase masking threshold θ_n and the watermark bit b, embeds the watermark in the phase spectrum of the audio using the following formula, obtaining the watermarked phase spectrum as follows,

where α is a constant that controls the watermark embedding strength.

Using the amplitude spectrum X_n of the frequency-domain signal and the watermarked phase spectrum, the watermarked frequency-domain signal is then obtained through Euler's formula as follows,

where Y_n is the watermarked frequency-domain signal and e is the base of the natural logarithm.

Inverse time-frequency transform module: transforms the watermarked frequency-domain signal Y_n into the time-domain signal y_n and generates the watermarked audio file.

The adaptive audio watermark detection subsystem comprises the following modules.

Second time-frequency conversion module: reads the watermarked time-domain audio file to obtain the amplitude data z of the watermarked time-domain audio signal and the sampling rate fs2; divides the time-domain signal into frames of length N, where z_n is the n-th frame of the signal under test; then applies a time-frequency transform to obtain the amplitude spectrum Z_n and the phase spectrum ξ_n of the frequency-domain audio signal.

Second embedding range selection module: from the sampling rate fs2, the frame length N, and the embedding start frequency FWMIN and end frequency FWMAX, which are preset according to the frequency band to which the human ear is sensitive, computes the range of each frequency-domain frame in which the watermark can be embedded, obtaining the maximum freqmax2 and the minimum freqmin2 of this range, and selects the amplitude spectrum of the audio within this range:

freqmin2 = floor((FWMIN × 2.0 / fs2) × N)

freqmax2 = floor((FWMAX × 2.0 / fs2) × N)

where floor is the round-down function.

Second spreading sequence generation module: using the key as the random number seed, generates a binary pseudo-random spreading sequence u of length freqmax2 - freqmin2 + 1.

Correlation extraction module: using the following correlation test formula, correlates the pseudo-random sequence u with the phase spectrum ξ_n of the n-th frame of the signal under test to obtain the sufficient detection statistic r_n of that frame,

where <·> denotes the inner product of signals.

If the sufficient statistic r_n ≥ 0, the detected watermark bit is b = 1; otherwise b = 0.

Furthermore, in the improved psychoacoustic module, the phase masking threshold θ_n of the phase spectrum is obtained using a trigonometric relationship.

The invention embeds the watermark in the phase spectrum of the audio signal because the human ear is insensitive to phase modification. By relaxing the criterion that psychoacoustic model 1 uses to identify tonal regions of the spectrum, more tonal components are found, which makes the computed global masking threshold more accurate. The trigonometric relationship between the modifiable amplitude and the modifiable phase angle then yields a masking threshold for the phase angle, so that the embedding strength of the watermark can be adjusted adaptively in the phase spectrum according to that threshold. This keeps the audio watermark imperceptible while making the embedding strength as large as possible, which ensures the robustness of the watermark. The technical solution of the present invention has significant market value.

Description of the Drawings

FIG. 1 is a structural block diagram of the embedding subsystem according to an embodiment of the present invention.

FIG. 2 is a structural block diagram of the detection subsystem according to an embodiment of the present invention.

FIG. 3 is a flowchart of the embedding process according to an embodiment of the present invention.

FIG. 4 is a flowchart of the detection process according to an embodiment of the present invention.

Detailed Description

The technical solution of the present invention is further described below through specific embodiments in conjunction with the accompanying drawings.

The adaptive audio watermarking system based on phase coding provided by an embodiment of the present invention comprises an audio watermark embedding subsystem and an adaptive audio watermark detection subsystem.

Referring to FIG. 1, the adaptive audio watermark embedding subsystem based on phase coding provided by the embodiment comprises a first time-frequency conversion module 1, a first embedding range selection module 2, a first spreading sequence generation module 3, an improved psychoacoustic module 4, an additive embedding module 5 and an inverse time-frequency transform module 6. In a specific implementation, each module may be realized as firmware.

The first time-frequency conversion module 1 converts the time-domain audio signal that has been read into a frequency-domain signal, and outputs the relevant information of the time-domain audio signal together with the frequency-domain signal to the first embedding range selection module 2;

The first embedding range selection module 2 computes the range in which the watermark can be embedded in the frequency-domain signal from the information of the time-domain audio signal (the sampling rate), the frequency-domain signal and the frequency range to which the human ear is more sensitive, and outputs the maximum and minimum of this embedding range to the first spreading sequence generation module 3;

The first spreading sequence generation module 3 uses the random number seed and the maximum and minimum of the embedding range supplied by the embedding range selection module 2 to generate a random sequence of the same length as the embedding range, whose values are uniformly distributed over 1 and -1, and outputs this random sequence to the additive embedding module 5;

The improved psychoacoustic module 4 relaxes the decision condition for tonal regions in psychoacoustic model 1 so as to find more tonal regions and thus provide a better amplitude masking threshold; it then derives the adjustable phase angle threshold from the trigonometric relationship between the permissible change and the original amplitude, and outputs the phase angle threshold to the additive embedding module 5;

The additive embedding module 5 generates the frequency-domain audio signal carrying the watermark information and outputs it to the inverse time-frequency transform module 6;

The inverse time-frequency transform module 6 converts the frequency-domain audio signal carrying the watermark information, supplied by the additive embedding module 5, into a time-domain audio signal carrying the watermark information and generates an audio file, which is the watermarked audio file.

Referring to FIG. 2, the adaptive audio watermark detection subsystem based on phase coding provided by the embodiment comprises a second time-frequency conversion module 7, a second embedding range selection module 8, a second spreading sequence generation module 9 and a correlation extraction module 10. In a specific implementation, each module may be realized as firmware.

The second time-frequency conversion module 7 has essentially the same function as module 1 and outputs its result to the embedding range selection module 8;

The second embedding range selection module 8 has essentially the same function as module 2; it outputs the maximum and minimum of the embedding range to the spreading sequence generation module 9 and outputs the signal within the embedding range to the correlation extraction module 10;

The second spreading sequence generation module 9 has essentially the same function as module 3 and outputs its result to the correlation extraction module 10;

The correlation extraction module 10 computes the correlation value from the signal supplied by the embedding range selection module 8 and the spreading sequence supplied by the spreading sequence generation module 9, and determines the watermark from the sign of the correlation value.

The specific implementation of each module follows the corresponding method steps and is not repeated here. The adaptive audio watermarking method based on phase coding provided by an embodiment of the present invention comprises an embedding process and a detection process.

Referring to FIG. 3, the adaptive audio watermark embedding process based on phase coding provided by the embodiment can be run automatically by computer software, and comprises the following steps:

Step A1: read the audio file to obtain the time-domain audio signal x and the sampling rate fs1; divide the time-domain signal into frames (N denotes the frame length and x_n denotes the n-th time-domain frame), then apply a time-frequency transform (for example the FFT, fast Fourier transform), and take the amplitude spectrum X_n and the phase spectrum of the frequency-domain audio signal.

Step A2: from the sampling rate fs1, the frame length N and the frequency range to which the human ear is more sensitive (which those skilled in the art can set according to the perceptual characteristics of the human ear, for example 1000-7000 Hz), compute the range of the frequency-domain frame in which the watermark can be embedded, obtaining the maximum freqmax1 and the minimum freqmin1 of this range, and select the frequency-domain audio signal within this range:

freqmin1 = floor((FWMIN × 2.0 / fs1) × N)    (1)

freqmax1 = floor((FWMAX × 2.0 / fs1) × N)    (2)

FWMIN and FWMAX denote the lowest and highest frequencies to which the human ear is more sensitive, that is, the embedding start and end frequencies preset according to the frequency band to which the human ear is sensitive; floor is the round-down function.
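Formulas (1) and (2) can be checked with a short sketch. The function name, the 44.1 kHz sampling rate and the 1024-sample frame length are illustrative assumptions; only the 1000-7000 Hz band comes from the text above.

```python
import math


def embedding_range(fw_min_hz, fw_max_hz, fs_hz, frame_len):
    """Return (freqmin, freqmax) as given by formulas (1) and (2)."""
    freq_min = math.floor((fw_min_hz * 2.0 / fs_hz) * frame_len)
    freq_max = math.floor((fw_max_hz * 2.0 / fs_hz) * frame_len)
    return freq_min, freq_max


# Assumed example values: 44.1 kHz sampling rate, 1024-sample frames.
freq_min, freq_max = embedding_range(1000, 7000, 44100, 1024)
sequence_length = freq_max - freq_min + 1  # length of u required in step A3
print(freq_min, freq_max, sequence_length)
```

The sketch applies the formulas exactly as written, including the factor 2.0; how the resulting indices map onto transform bins depends on how N relates to the transform length, which the text does not spell out.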

Step A3: using the key as the random number seed, generate a binary pseudo-random spreading sequence u of length freqmax1 - freqmin1 + 1.

In MATLAB, the embodiment proceeds as follows:

First, the key is used to call RandStream (which sets the random seed) so as to initialize rand (the random number generator), and rand is then called to generate random numbers. Because rand returns numbers between 0 and 1, these are rounded to obtain a binary pseudo-random sequence of 0s and 1s, and this unipolar sequence is then converted into the bipolar pseudo-random sequence u containing only +1 and -1.
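The same three stages (seed with the key, threshold uniform draws into bits, map to bipolar values) can be sketched in Python. This is an analogue rather than a port: Python's `random.Random` will not reproduce the numbers that MATLAB's RandStream and rand produce for the same key, and the function name and key value are illustrative assumptions.

```python
import random


def spreading_sequence(key, freq_min, freq_max):
    """Generate a keyed bipolar (+1/-1) pseudo-random sequence u."""
    rng = random.Random(key)  # seed the generator with the key
    length = freq_max - freq_min + 1
    # Draw uniform numbers in [0, 1) and threshold them into bits 0/1.
    unipolar = [1 if rng.random() >= 0.5 else 0 for _ in range(length)]
    # Map the unipolar bits {0, 1} to bipolar values {-1, +1}.
    return [2 * bit - 1 for bit in unipolar]


# Assumed key and range; the range matches a 1000-7000 Hz band at 44.1 kHz, N = 1024.
u = spreading_sequence(key=12345, freq_min=46, freq_max=325)
```

Because the sequence depends only on the key and the range, the detector can regenerate the identical u as long as it uses the same generator and key.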

Step A4: modify how ISO-MPEG psychoacoustic model 1 judges tonal components, so that more tonal components are found and a more accurate masking threshold for the amplitude signal is obtained. For the final masking threshold, the minimum masking threshold within each subband is not used; instead, the global masking threshold Th_n is used directly, and the phase masking threshold θ_n of the phase spectrum is then obtained using a trigonometric relationship.

In the embodiment, the procedure is as follows:

In ISO-MPEG psychoacoustic model 1, a region of the spectrum is judged tonal when the local maximum point k of the power spectrum P_n exceeds all nearby frequency points by 7 dB. This condition is relaxed: the local maximum need only exceed all nearby frequency samples by 1 dB, provided that it exceeds at least one of them by more than 7 dB.

Here k denotes a local maximum point in frequency, j denotes the distance from the local maximum point k, P_n[k] denotes the signal power (in dB) of the n-th frame at the local maximum point k, and P_n[k-j] denotes the signal power (in dB) at distance j from the maximum point.
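The relaxed tonal condition can be sketched as below. The formula itself is not reproduced in this text, so the sketch follows the verbal description only; the neighbour offsets `(-2, 2)`, the strict inequalities and the function name are assumptions, not the patent's exact definition.

```python
def is_tonal(power_db, k, offsets=(-2, 2), relaxed_db=1.0, strict_db=7.0):
    """Relaxed tonal test at bin k of a power spectrum given in dB.

    Assumed reading of the text: k must be a local maximum, exceed every
    listed neighbour by more than relaxed_db, and exceed at least one of
    them by more than strict_db.
    """
    if not 0 < k < len(power_db) - 1:
        return False
    # k must be a local maximum of the power spectrum.
    if not (power_db[k] > power_db[k - 1] and power_db[k] >= power_db[k + 1]):
        return False
    diffs = [power_db[k] - power_db[k + j]
             for j in offsets if 0 <= k + j < len(power_db)]
    if not diffs:
        return False
    return all(d > relaxed_db for d in diffs) and any(d > strict_db for d in diffs)


# Hypothetical spectrum: the peak at k = 3 is 3 dB above one neighbour and 20 dB above the other.
power = [0.0, 57.0, 58.0, 60.0, 59.0, 40.0, 0.0]
relaxed_result = is_tonal(power, 3)                  # relaxed 1 dB / 7 dB rule
strict_result = is_tonal(power, 3, relaxed_db=7.0)   # original rule: all neighbours by 7 dB
```

With these values the peak passes the relaxed rule but fails the original 7 dB rule, which is the effect the text describes: more components are classified as tonal.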

With the modified condition above, the tonal components are judged and the global masking threshold Th_n is then computed. The global masking threshold is the largest amount by which the signal amplitude can be modified without audible distortion. In the two-dimensional plane formed by the real and imaginary axes, the region within which a frequency-domain point may be moved is the circle whose radius is the masking threshold. When the line joining the modified frequency-domain point to the origin is tangent to this circle, the change in phase is largest; this largest permissible change in phase angle is taken as the phase masking threshold, and the trigonometric relationship gives the phase masking threshold θ_n.
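The tangent construction can be written out as a sketch. For a circle of radius Th centred on a spectral point of magnitude |X|, the tangent from the origin makes an angle θ with sin θ = Th / |X|. The text's own formula is not reproduced here, so this expression is a geometric reading of the description; it also assumes Th and |X| are on the same linear amplitude scale, and the clamp to π/2 when Th ≥ |X| is a conservative choice made for this sketch.

```python
import math


def phase_masking_threshold(magnitude, masking_threshold):
    """Largest phase change (radians) that stays inside the masking circle.

    Assumed geometry: sin(theta) = masking_threshold / magnitude.
    """
    if magnitude <= 0.0:
        # A zero-magnitude bin carries no phase information; cap at pi/2.
        return math.pi / 2
    # If the circle contains the origin (threshold >= magnitude), cap the ratio at 1.
    ratio = min(max(masking_threshold / magnitude, 0.0), 1.0)
    return math.asin(ratio)


# A threshold of half the magnitude allows a phase change of 30 degrees.
theta = phase_masking_threshold(2.0, 1.0)
```

Strong spectral components with a low masking threshold therefore tolerate only small phase changes, while well-masked components tolerate larger ones, which is what makes the embedding strength adaptive.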

Step A5: from the pseudo-random sequence u, the phase masking threshold θ_n and the watermark bit b, embed the watermark in the phase spectrum of the audio using the following formula, obtaining the watermarked phase spectrum

where α is a constant that controls the watermark embedding strength; those skilled in the art can preset its value in a specific implementation.

Using the amplitude spectrum X_n of the frequency-domain signal and the watermarked phase spectrum, the watermarked frequency-domain signal is then obtained through Euler's formula

where Y_n is the watermarked frequency-domain signal and e is the base of the natural logarithm.
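The embedding formula is not reproduced in this text, so the sketch below uses one plausible additive rule that agrees with the sign-based detector of step B4: each phase is shifted by α·θ·u, with the sign flipped for b = 0. That rule, the symbol names and the sample values are assumptions. The second half rebuilds the complex spectrum as magnitude times e^(j·phase), which is the Euler's-formula step.

```python
import cmath


def embed_bit(magnitudes, phases, u, theta, bit, alpha=1.0):
    """Shift each phase by +/- alpha * theta * u and rebuild the spectrum.

    Assumed rule: new_phase = phase + alpha * s * u * theta,
    with s = +1 for bit 1 and s = -1 for bit 0.
    """
    sign = 1 if bit == 1 else -1
    new_phases = [p + alpha * sign * ui * ti
                  for p, ui, ti in zip(phases, u, theta)]
    # Euler's formula: Y = X * exp(j * phase), for each bin in the range.
    spectrum = [cmath.rect(m, p) for m, p in zip(magnitudes, new_phases)]
    return new_phases, spectrum


# Hypothetical two-bin example.
mags = [2.0, 1.5]
new_phases, Y = embed_bit(mags, [0.1, -0.2], u=[1, -1],
                          theta=[0.05, 0.05], bit=1)
```

Only the phase changes; the magnitude of every bin in Y equals the original magnitude, so the amplitude spectrum is left untouched.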

Step A6: transform the watermarked frequency-domain signal Y_n into the time-domain signal y_n and write it out as an audio file, which is the watermarked audio file.

The specific implementation of each module follows the corresponding method steps and is not repeated here. The adaptive audio watermark detection method based on phase coding provided by the embodiment of the present invention includes an embedding process and a detection process.

Referring to FIG. 4, the adaptive audio watermark detection process based on phase coding provided by the embodiment can be run automatically by computer software, and comprises the following steps:

Step B1: read the watermarked time-domain audio file to obtain the amplitude data z of the watermarked time-domain audio signal and the sampling rate fs2; divide the time-domain signal into frames (the frame length is again N, and z_n is the n-th frame of the signal under test), then apply a time-frequency transform (for example the FFT, fast Fourier transform) to obtain the amplitude spectrum Z_n and the phase spectrum ξ_n of the frequency-domain audio signal.

Step B2: from the sampling rate fs2, the frame length N and the frequency range to which the human ear is more sensitive, compute the range in which the watermark can be embedded in the frequency-domain signal, obtaining the maximum freqmax2 and the minimum freqmin2 of this range, and select the amplitude spectrum of the audio within this range:

freqmin2 = floor((FWMIN × 2.0 / fs2) × N)    (8)

freqmax2 = floor((FWMAX × 2.0 / fs2) × N)    (9)

FWMIN and FWMAX denote the lowest and highest frequencies to which the human ear is more sensitive, that is, the embedding start and end frequencies preset according to the frequency band to which the human ear is sensitive; floor is the round-down function in MATLAB.

Step B3: using the key, generate the bipolar binary pseudo-random sequence u, containing only +1 and -1, in the same way as during watermark embedding. That is, the key is used as the random number seed to generate a binary pseudo-random spreading sequence u of length freqmax2 - freqmin2 + 1.

Step B4: using the correlation test formula (10), correlate the pseudo-random sequence u with the phase spectrum ξ_n of the n-th frame of the signal under test to obtain the sufficient detection statistic r_n of that frame.

In the formula, <·> denotes the inner product of signals.

If the sufficient statistic r_n ≥ 0, the detected watermark bit is b = 1; otherwise b = 0.
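The decision rule can be sketched as a plain inner product followed by a sign test. Formula (10) is not reproduced in this text, so any normalisation it may contain is omitted here; the function name and the toy data are assumptions, and the demo frames use a zero host phase so that only the embedded component is present.

```python
def detect_bit(phases, u):
    """Correlate the phase spectrum with u and decide the bit from the sign."""
    r = sum(p * ui for p, ui in zip(phases, u))  # inner product <phases, u>
    bit = 1 if r >= 0 else 0
    return bit, r


# Toy frames: zero host phase plus an embedded shift of +/- 0.05 * u (assumed values).
u_demo = [1, -1, 1, 1, -1]
frame_with_one = [0.05 * ui for ui in u_demo]
frame_with_zero = [-0.05 * ui for ui in u_demo]
bit_one, r_one = detect_bit(frame_with_one, u_demo)
bit_zero, r_zero = detect_bit(frame_with_zero, u_demo)
```

In real audio the host phase acts as noise in this correlation, so reliable detection depends on the sequence being long enough and the embedded shift large enough for the watermark term to dominate.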

The specific embodiments described herein merely illustrate the spirit of the present invention. Those skilled in the art to which the invention pertains may modify or supplement the described embodiments in various ways, or replace them with similar approaches, without departing from the spirit of the invention or exceeding the scope defined by the appended claims.

