Technical Field
The invention relates to the field of digital audio watermarking, and in particular to an adaptive audio watermarking method and system based on phase coding.
Background
Digital audio watermarking is a signal processing operation that adds digital information to an audio signal for purposes such as verifying the authenticity of a file, protecting copyright, and hiding information. An adaptive audio watermarking system based on phase coding dynamically adjusts the embedding strength of the watermark on the phase spectrum according to a psychoacoustic model, so that the watermark is as robust as possible while remaining imperceptible. Conventional audio watermark embedding algorithms based on phase coding add a watermark of fixed strength directly to the phase spectrum. If the embedding strength is too high, noise is easily produced and the sound quality suffers; if it is too low, the watermark is hard to find during detection and robustness suffers. In addition, an audio signal changes dynamically, so a strength that suits some regions may be too high or too low for others. With this kind of embedding, the audio watermark cannot satisfy imperceptibility and robustness at the same time.
Summary of the Invention
The purpose of the present invention is to provide an adaptive audio watermarking technical solution based on phase coding, in which the watermark embedding strength in the phase spectrum is adjusted adaptively according to the audio signal, so as to reach a compromise between the imperceptibility and the robustness of the audio watermark.
The technical solution of the present invention provides an adaptive audio watermarking method based on phase coding, comprising an embedding process and a detection process.
The embedding process comprises the following steps.
Step A1: read the audio file to obtain the time-domain audio signal x and the sampling rate fs1; divide x into frames of length N, where x_n denotes the nth time-domain frame; then apply a time-frequency transform to obtain the amplitude spectrum X_n and the phase spectrum of the frequency-domain audio signal.
Step A2: from the sampling rate fs1, the frame length N, and the embedding start frequency FWMIN and end frequency FWMAX, which are preset according to the frequency region to which human hearing is sensitive, calculate the range of each frequency-domain frame in which the watermark can be embedded, obtain the minimum value freqmin1 and the maximum value freqmax1 of this range, and select the frequency-domain audio signal within this range:
freqmin1 = floor((FWMIN × 2.0 / fs1) × N)
freqmax1 = floor((FWMAX × 2.0 / fs1) × N)
where floor is the round-down function.
Step A3: using the key as the random number seed, generate a binary pseudo-random spread-spectrum sequence u of length freqmax1 − freqmin1 + 1.
Step A4: the tonal components are judged as follows,
where k denotes a local maximum point in frequency, j denotes the distance from the local maximum point k, P_n[k] dB denotes the signal power of the nth frame at the local maximum point k, and P_n[k−j] dB denotes the signal power at distance j from the maximum point.
According to the judgment result, the global masking threshold Th_n is used to obtain the phase masking threshold θ_n of the phase spectrum.
Step A5: according to the pseudo-random sequence u, the phase masking threshold θ_n and the watermark bit b, embed the watermark in the phase spectrum of the audio using the following formula, obtaining the watermarked phase spectrum as follows,
where α is a constant that controls the strength of watermark embedding.
Using the amplitude spectrum X_n of the frequency-domain signal and the watermarked phase spectrum, the watermarked frequency-domain signal is then obtained through Euler's formula as follows,
where Y_n is the watermarked frequency-domain signal and e is the base of the natural exponential.
Step A6: transform the watermarked frequency-domain signal Y_n into the time-domain signal y_n and generate the watermarked audio file.
The detection process comprises the following steps.
Step B1: read the watermarked time-domain audio file to obtain the amplitude data z of the watermarked time-domain audio signal and the sampling rate fs2; divide the time-domain signal into frames of length N, where z_n is the nth frame of the signal to be detected; then apply a time-frequency transform to obtain the amplitude spectrum Z_n and the phase spectrum ξ_n of the frequency-domain audio signal.
Step B2: from the sampling rate fs2, the frame length N, and the embedding start frequency FWMIN and end frequency FWMAX, which are preset according to the frequency region to which human hearing is sensitive, calculate the range of each frequency-domain frame in which the watermark can be embedded, obtain the minimum value freqmin2 and the maximum value freqmax2 of this range, and select the amplitude spectrum of the audio within this range:
freqmin2 = floor((FWMIN × 2.0 / fs2) × N)
freqmax2 = floor((FWMAX × 2.0 / fs2) × N)
where floor is the round-down function.
Step B3: using the key as the random number seed, generate a binary pseudo-random spread-spectrum sequence u of length freqmax2 − freqmin2 + 1.
Step B4: according to the following correlation test formula, correlate the pseudo-random sequence u with the phase spectrum ξ_n of the nth frame of the signal to be detected, obtaining the detection sufficient statistic r_n of that frame,
where <·> denotes the inner product of signals.
If the detection sufficient statistic r_n ≥ 0, the detected watermark bit is b = 1; otherwise b = 0.
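The decision in steps B3 and B4 can be sketched as below. The formula for r_n is not reproduced in this text, so the sketch assumes the plain inner product r_n = <ξ_n, u> taken over the embedding band; the function names are illustrative and do not come from the source.

```python
from typing import Sequence


def detection_statistic(phase_band: Sequence[float], u: Sequence[int]) -> float:
    """Inner product <xi_n, u> over the embedding band (assumed form of r_n)."""
    if len(phase_band) != len(u):
        raise ValueError("phase band and spreading sequence must have equal length")
    return sum(p * s for p, s in zip(phase_band, u))


def detect_bit(phase_band: Sequence[float], u: Sequence[int]) -> int:
    """Decision rule from the text: b = 1 if r_n >= 0, otherwise b = 0."""
    return 1 if detection_statistic(phase_band, u) >= 0 else 0
```

Here `phase_band` stands for the phase values ξ_n restricted to indices freqmin2 through freqmax2, and `u` must be the same key-seeded sequence that was used for embedding.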
Moreover, in step A4, the phase masking threshold θ_n of the phase spectrum is obtained using the triangle relationship.
The present invention correspondingly provides an adaptive audio watermarking system based on phase coding, comprising an audio watermark embedding subsystem and an adaptive audio watermark detection subsystem.
The audio watermark embedding subsystem comprises the following modules.
A first time-frequency conversion module, used to read the audio file to obtain the time-domain audio signal x and the sampling rate fs1, divide x into frames of length N, where x_n denotes the nth time-domain frame, and then apply a time-frequency transform to obtain the amplitude spectrum X_n and the phase spectrum of the frequency-domain audio signal.
A first embedding range selection module, used to calculate, from the sampling rate fs1, the frame length N, and the embedding start frequency FWMIN and end frequency FWMAX preset according to the frequency region to which human hearing is sensitive, the range of each frequency-domain frame in which the watermark can be embedded, obtain the minimum value freqmin1 and the maximum value freqmax1 of this range, and select the frequency-domain audio signal within this range:
freqmin1 = floor((FWMIN × 2.0 / fs1) × N)
freqmax1 = floor((FWMAX × 2.0 / fs1) × N)
where floor is the round-down function.
A first spread-spectrum sequence generation module, used to generate, with the key as the random number seed, a binary pseudo-random spread-spectrum sequence u of length freqmax1 − freqmin1 + 1.
An improved psychoacoustic module, used to judge the tonal components as follows,
where k denotes a local maximum point in frequency, j denotes the distance from the local maximum point k, P_n[k] dB denotes the signal power of the nth frame at the local maximum point k, and P_n[k−j] dB denotes the signal power at distance j from the maximum point.
According to the judgment result, the global masking threshold Th_n is used to obtain the phase masking threshold θ_n of the phase spectrum.
An additive embedding module, used to embed the watermark in the phase spectrum of the audio according to the pseudo-random sequence u, the phase masking threshold θ_n and the watermark bit b, using the following formula and obtaining the watermarked phase spectrum as follows,
where α is a constant that controls the strength of watermark embedding.
Using the amplitude spectrum X_n of the frequency-domain signal and the watermarked phase spectrum, the watermarked frequency-domain signal is then obtained through Euler's formula as follows,
where Y_n is the watermarked frequency-domain signal and e is the base of the natural exponential.
A time-frequency inverse transform module, used to transform the watermarked frequency-domain signal Y_n into the time-domain signal y_n and generate the watermarked audio file.
The adaptive audio watermark detection subsystem comprises the following modules.
A second time-frequency conversion module, used to read the watermarked time-domain audio file to obtain the amplitude data z of the watermarked time-domain audio signal and the sampling rate fs2, divide the time-domain signal into frames of length N, where z_n is the nth frame of the signal to be detected, and then apply a time-frequency transform to obtain the amplitude spectrum Z_n and the phase spectrum ξ_n of the frequency-domain audio signal.
A second embedding range selection module, used to calculate, from the sampling rate fs2, the frame length N, and the embedding start frequency FWMIN and end frequency FWMAX preset according to the frequency region to which human hearing is sensitive, the range of each frequency-domain frame in which the watermark can be embedded, obtain the minimum value freqmin2 and the maximum value freqmax2 of this range, and select the amplitude spectrum of the audio within this range:
freqmin2 = floor((FWMIN × 2.0 / fs2) × N)
freqmax2 = floor((FWMAX × 2.0 / fs2) × N)
where floor is the round-down function.
A second spread-spectrum sequence generation module, used to generate, with the key as the random number seed, a binary pseudo-random spread-spectrum sequence u of length freqmax2 − freqmin2 + 1.
A correlation extraction module, used to correlate, according to the following correlation test formula, the pseudo-random sequence u with the phase spectrum ξ_n of the nth frame of the signal to be detected, obtaining the detection sufficient statistic r_n of that frame,
where <·> denotes the inner product of signals.
If the detection sufficient statistic r_n ≥ 0, the detected watermark bit is b = 1; otherwise b = 0.
Moreover, in the improved psychoacoustic module, the phase masking threshold θ_n of the phase spectrum is obtained using the triangle relationship.
The present invention embeds the watermark in the phase spectrum of the audio signal because the human ear is insensitive to phase modification. By relaxing the judgment of tonal regions of the spectrum in psychoacoustic model 1, more tonal components are obtained, so that the calculated global masking threshold is more accurate. The masking threshold of the phase angle is obtained from the triangle relationship between the modifiable amplitude and the modifiable phase angle, so that the embedding strength of the watermark can be adjusted adaptively on the phase spectrum according to this threshold. The embedding strength is thus made as large as possible while the watermark remains imperceptible, which ensures the robustness of the audio watermark. The technical solution of the present invention has important market value.
Description of the Drawings
FIG. 1 is a structural block diagram of the embedding subsystem according to an embodiment of the present invention.
FIG. 2 is a structural block diagram of the detection subsystem according to an embodiment of the present invention.
FIG. 3 is a flowchart of the embedding process according to an embodiment of the present invention.
FIG. 4 is a flowchart of the detection process according to an embodiment of the present invention.
Detailed Description
The technical solution of the present invention is further described below with reference to specific embodiments and the accompanying drawings.
An embodiment of the present invention provides an adaptive audio watermarking system based on phase coding, comprising an audio watermark embedding subsystem and an adaptive audio watermark detection subsystem.
Referring to FIG. 1, the adaptive audio watermark embedding subsystem based on phase coding provided by the embodiment of the present invention comprises a first time-frequency conversion module 1, a first embedding range selection module 2, a first spread-spectrum sequence generation module 3, an improved psychoacoustic module 4, an additive embedding module 5 and a time-frequency inverse transform module 6. In a specific implementation, each module can be realized using software solidification (firmware) technology.
The first time-frequency conversion module 1 is used to convert the time-domain audio signal that has been read into a frequency-domain signal, and to output the relevant information of the time-domain audio signal together with the frequency-domain signal to the first embedding range selection module 2.
The first embedding range selection module 2 calculates the range of the frequency-domain signal in which the watermark can be embedded, according to the information (sampling rate) of the time-domain audio signal, the frequency-domain signal, and the frequency range to which the human ear is more sensitive, and outputs the maximum and minimum values of this embedding range to the first spread-spectrum sequence generation module 3.
The first spread-spectrum sequence generation module 3 is used to generate, according to the random number seed and the maximum and minimum values of the embedding range supplied by the first embedding range selection module 2, a uniformly distributed random sequence with values +1 or −1 whose length equals that of the embedding range, and to output this random sequence to the additive embedding module 5.
The improved psychoacoustic module 4 relaxes the decision condition for tonal regions in psychoacoustic model 1 so as to obtain more tonal regions and thereby a better amplitude masking threshold. It then obtains the adjustable phase angle threshold from the triangle relationship between the modifiable threshold and the original amplitude, and outputs the phase angle threshold to the additive embedding module 5.
The additive embedding module 5 is used to generate the frequency-domain audio signal carrying the watermark information and to output it to the time-frequency inverse transform module 6.
The time-frequency inverse transform module 6 is used to convert the frequency-domain audio signal carrying the watermark information, supplied by the additive embedding module 5, into a time-domain audio signal carrying the watermark information, and to generate the audio file, yielding the audio file that carries the watermark information.
Referring to FIG. 2, the adaptive audio watermark detection subsystem based on phase coding provided by the embodiment of the present invention comprises a second time-frequency conversion module 7, a second embedding range selection module 8, a second spread-spectrum sequence generation module 9 and a correlation extraction module 10. In a specific implementation, each module can be realized using software solidification (firmware) technology.
The second time-frequency conversion module 7 has essentially the same function as module 1 and outputs its result to the second embedding range selection module 8.
The second embedding range selection module 8 has essentially the same function as module 2. It outputs the maximum and minimum values of the embedding range to the second spread-spectrum sequence generation module 9, and outputs the signal within the embedding range to the correlation extraction module 10.
The second spread-spectrum sequence generation module 9 has essentially the same function as module 3 and outputs its result to the correlation extraction module 10.
The correlation extraction module 10 is used to calculate the correlation value from the signal supplied by the second embedding range selection module 8 and the spread-spectrum sequence supplied by the second spread-spectrum sequence generation module 9, and to decide the watermark bit according to the sign of the correlation value.
For the specific implementation of each module, refer to the corresponding steps of the method, which are not repeated here. The embodiment of the present invention provides an adaptive audio watermarking method based on phase coding, comprising an embedding process and a detection process.
Referring to FIG. 3, the adaptive audio watermark embedding process based on phase coding provided by the embodiment of the present invention can be run automatically by computer software, and specifically comprises the following steps:
Step A1: read the audio file to obtain the time-domain audio signal x and the sampling rate fs1; first divide the time-domain signal into frames (the frame length is denoted N, and x_n denotes the nth time-domain frame), then apply a time-frequency transform (for example the fast Fourier transform, FFT), and take the amplitude spectrum X_n and the phase spectrum of the frequency-domain audio signal.
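Step A1 can be sketched as below, under two simplifying assumptions that the text does not state: the frames are non-overlapping with no window applied, and a naive DFT stands in for a library FFT. The names `split_frames`, `dft` and `magnitude_phase` are illustrative.

```python
import cmath
import math
from typing import List, Tuple


def split_frames(x: List[float], n: int) -> List[List[float]]:
    """Split x into consecutive non-overlapping frames of length n (incomplete tail dropped)."""
    return [x[i:i + n] for i in range(0, len(x) - n + 1, n)]


def dft(frame: List[float]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform; an FFT computes the same values faster."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def magnitude_phase(frame: List[float]) -> Tuple[List[float], List[float]]:
    """Return the amplitude spectrum and the phase spectrum (radians) of one frame."""
    spectrum = dft(frame)
    return [abs(c) for c in spectrum], [cmath.phase(c) for c in spectrum]
```

The two returned lists correspond to the amplitude spectrum X_n and the phase spectrum of frame x_n; only the phase list is modified by the later embedding step.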
Step A2: from the sampling rate fs1, the frame length N and the frequency range to which the human ear is more sensitive (which those skilled in the art can set according to the perceptual characteristics of human hearing, for example 1000-7000 Hz), calculate the range of the frequency-domain frame in which the watermark can be embedded, obtaining freqmax1 as the maximum and freqmin1 as the minimum of this range, and select the frequency-domain audio signal within this range:
freqmin1 = floor((FWMIN × 2.0 / fs1) × N) (1)
freqmax1 = floor((FWMAX × 2.0 / fs1) × N) (2)
FWMIN and FWMAX denote the lowest and highest frequencies to which the human ear is more sensitive, that is, the embedding start frequency and end frequency preset according to the frequency region to which human hearing is sensitive; floor is the round-down function.
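Formulas (1) and (2) can be evaluated directly, as in the sketch below. Only the 1000-7000 Hz band comes from the text; the sampling rate of 44100 Hz and N = 1024 are assumed example values, and the function name is illustrative. Note that the formulas as written map the Nyquist frequency fs1/2 to index N, which reads as if N counted the coefficients of the one-sided spectrum; the sketch evaluates the formulas exactly as given and does not resolve that point.

```python
import math
from typing import Tuple


def embedding_bin_range(fwmin_hz: float, fwmax_hz: float, fs: float, n: int) -> Tuple[int, int]:
    """Evaluate formulas (1) and (2): band edges in Hz mapped to spectral indices."""
    freqmin = math.floor((fwmin_hz * 2.0 / fs) * n)
    freqmax = math.floor((fwmax_hz * 2.0 / fs) * n)
    return freqmin, freqmax


# Assumed example values: fs1 = 44100 Hz, N = 1024.
freqmin1, freqmax1 = embedding_bin_range(1000.0, 7000.0, fs=44100.0, n=1024)
band_length = freqmax1 - freqmin1 + 1  # length of the spreading sequence u in step A3
```

With these assumed values the band runs from index 46 to index 325, so the spreading sequence has 280 elements.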
Step A3: using the key as the random number seed, generate a binary pseudo-random spread-spectrum sequence u of length freqmax1 − freqmin1 + 1.
The specific procedure of the embodiment in MATLAB is as follows:
First, using the key, call the RandStream function (the random seed function) to initialize the rand function (the random number generation function), and then call rand to generate random numbers. Because rand produces numbers between 0 and 1, these numbers are rounded to form a binary pseudo-random sequence of 0s and 1s, and this unipolar pseudo-random sequence is then converted into a bipolar pseudo-random sequence u containing only +1 and −1.
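The same three operations (seed with the key, round uniform numbers to bits, map bits to ±1) can be sketched in Python. Here `random.Random` stands in for MATLAB's RandStream and rand, so the sequence produced for a given key will not match the MATLAB one; the function name is illustrative.

```python
import random
from typing import List


def spreading_sequence(key: int, length: int) -> List[int]:
    """Key-seeded bipolar pseudo-random sequence with values +1 and -1."""
    rng = random.Random(key)                             # private generator seeded with the key
    bits = [round(rng.random()) for _ in range(length)]  # uniform numbers in [0, 1) rounded to 0 or 1
    return [2 * b - 1 for b in bits]                     # unipolar {0, 1} -> bipolar {-1, +1}
```

Whatever generator is chosen, the embedder and the detector have to use the same one with the same key, since step B3 regenerates u instead of transmitting it.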
Step A4: modify the judgment of tonal components in ISO-MPEG psychoacoustic model 1, so that more tonal components are found and a more accurate masking threshold for the amplitude signal is obtained. For the final masking threshold, the minimum masking threshold within each subband is not used; instead, the global masking threshold Th_n is used directly, and the phase masking threshold θ_n of the phase spectrum is then obtained using the triangle relationship.
The specific procedure of the embodiment is as follows:
In ISO-MPEG psychoacoustic model 1, the condition for a region of the spectrum to be judged tonal is that the local maximum point k of the power spectrum P_n must exceed all nearby frequency points by 7 dB. This condition is modified so that the local maximum need only exceed all nearby sample frequencies by 1 dB, provided that it exceeds at least one of them by more than 7 dB.
Here k denotes a local maximum point in frequency, j denotes the distance from the local maximum point k, P_n[k] dB denotes the signal power of the nth frame at the local maximum point k, and P_n[k−j] dB denotes the signal power at distance j from the maximum point.
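The relaxed condition can be sketched as a predicate on the power spectrum in dB. The formula itself is not reproduced in this text, so several details are assumptions: the set of neighbour distances j is passed in by the caller, both sides k − j and k + j are examined, and the comparisons are taken as inclusive. The function name is illustrative.

```python
from typing import Sequence


def is_tonal(power_db: Sequence[float], k: int, offsets: Sequence[int]) -> bool:
    """Relaxed tonality test for a local maximum at index k.

    Tonal if P[k] exceeds every examined neighbour P[k - j], P[k + j] by at
    least 1 dB and exceeds at least one of them by at least 7 dB.
    """
    diffs = []
    for j in offsets:
        for idx in (k - j, k + j):
            if 0 <= idx < len(power_db):       # skip neighbours outside the spectrum
                diffs.append(power_db[k] - power_db[idx])
    return bool(diffs) and min(diffs) >= 1.0 and max(diffs) >= 7.0
```

Under the unmodified rule every difference would have to reach 7 dB, so the relaxed test accepts peaks that stand only slightly above part of their neighbourhood.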
Based on the modified judgment condition above, once the judgment result for the tonal components has been obtained, the global masking threshold Th_n is calculated. The global masking threshold is the largest amount by which the signal amplitude can be modified without distortion. In the two-dimensional plane formed by the real axis and the imaginary axis, the circle centered on a frequency-domain point with the masking threshold as its radius is the region within which that point may be modified. When the line joining the modified frequency-domain point to the origin is tangent to this circle, the change in phase is largest; this maximum change in phase angle is taken as the phase masking threshold, and the phase masking threshold θ_n can be obtained from the triangle relationship.
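The tangent construction described above gives a right triangle whose hypotenuse is the original amplitude and whose side opposite the phase change is the masking radius, so one consistent reading is θ_n[k] = arcsin(Th_n[k] / X_n[k]). The sketch below implements that reading; it assumes that Th_n and X_n are expressed on the same linear amplitude scale, and the clipping for Th_n ≥ X_n is an added safeguard that the text does not mention.

```python
import math


def phase_masking_threshold(amplitude: float, th: float) -> float:
    """Largest phase shift (radians) that keeps the point inside the masking circle.

    The tangent from the origin to a circle of radius th centred at distance
    `amplitude` makes the angle asin(th / amplitude) with the line to the centre.
    """
    if amplitude <= 0.0:
        raise ValueError("amplitude must be positive")
    ratio = min(th / amplitude, 1.0)  # assumption: clip when the circle reaches the origin
    return math.asin(ratio)
```

For example, a masking radius equal to half the amplitude allows a phase change of 30 degrees.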
Step A5: according to the pseudo-random sequence u, the phase masking threshold θ_n and the watermark bit b, embed the watermark in the phase spectrum of the audio using the following formula, obtaining the watermarked phase spectrum,
where α is a constant that controls the strength of watermark embedding; in a specific implementation its value can be preset by those skilled in the art.
Using the amplitude spectrum X_n of the frequency-domain signal and the watermarked phase spectrum, the watermarked frequency-domain signal is then obtained through Euler's formula,
where Y_n is the watermarked frequency-domain signal and e is the base of the natural exponential.
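Step A5 can be sketched as below. The embedding formula is not reproduced in this text, so the additive rule used here is an assumption: the bit b is mapped to s = +1 or −1 and the phase in the embedding band becomes φ'[k] = φ[k] + α · θ[k] · u[k] · s. The recombination follows Euler's formula as described, Y[k] = X[k] · e^(iφ'[k]). The symbols φ and φ' and the function names are illustrative.

```python
import cmath
from typing import List, Sequence


def embed_phase(phase: Sequence[float], theta: Sequence[float],
                u: Sequence[int], bit: int, alpha: float) -> List[float]:
    """Assumed additive rule: phi'[k] = phi[k] + alpha * theta[k] * u[k] * s, with s = +1 or -1."""
    s = 1 if bit == 1 else -1
    return [p + alpha * t * c * s for p, t, c in zip(phase, theta, u)]


def recombine(magnitude: Sequence[float], phase: Sequence[float]) -> List[complex]:
    """Euler's formula: Y[k] = X[k] * exp(i * phi'[k]); the amplitude is left unchanged."""
    return [cmath.rect(m, p) for m, p in zip(magnitude, phase)]
```

The lists passed in cover only indices freqmin1 through freqmax1. For the inverse transform in step A6 to return a real signal, the mirrored negative-frequency coefficients need the opposite phase change so that the spectrum stays conjugate-symmetric.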
Step A6: Transform the watermarked frequency-domain signal Y_n back into the time-domain signal y_n, and finally generate the audio file, which is the watermarked audio file.
For the specific implementation of each module, refer to the corresponding steps of the method; they are not repeated here. The phase-coding-based adaptive audio watermark detection method provided by the embodiment of the present invention includes an embedding process and a detection process.
Referring to FIG. 4, the phase-coding-based adaptive audio watermark detection provided by the embodiment of the present invention can be run automatically by means of computer software, and specifically includes the following steps:
Step B1: Read the watermarked time-domain audio file to obtain the amplitude data z of the watermarked audio signal and its sampling rate fs2. Divide the time-domain signal into frames (the frame length is again N, and z_n is the n-th frame of the signal to be detected) and apply a time-frequency transform (for example, the fast Fourier transform, FFT) to obtain the amplitude spectrum Z_n and the phase spectrum ξ_n of the audio signal in the frequency domain.
Step B2: From the sampling rate fs2, the frame length N, and the frequency range to which the human ear is most sensitive, compute the range of the frequency-domain signal in which the watermark can be embedded. The upper bound of this range is freqmax2 and the lower bound is freqmin2. Select the part of the audio amplitude spectrum that lies within this range:
freqmin2 = floor((FWMIN × 2.0 / fs2) × N) (8)
freqmax2 = floor((FWMAX × 2.0 / fs2) × N) (9)
FWMIN and FWMAX denote the lowest and highest frequencies to which the human ear is most sensitive, that is, the preset start and end frequencies for embedding, chosen according to the part of the spectrum where human hearing is most sensitive; floor is the MATLAB round-down function.
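Formulas (8) and (9) can be evaluated as in the short sketch below. The formulas are taken at face value from the text; the function name and the numeric values in the usage line are hypothetical and chosen only for illustration, since the source does not give concrete values for FWMIN, FWMAX, fs2, or N.

```python
import math


def embedding_bin_range(fw_min_hz, fw_max_hz, fs_hz, n):
    """Return (freqmin2, freqmax2) as defined by formulas (8) and (9)."""
    freq_min = math.floor((fw_min_hz * 2.0 / fs_hz) * n)
    freq_max = math.floor((fw_max_hz * 2.0 / fs_hz) * n)
    return freq_min, freq_max


# Hypothetical settings: 3 kHz to 6 kHz band, 48 kHz sampling, N = 1024.
freqmin2, freqmax2 = embedding_bin_range(3000.0, 6000.0, 48000.0, 1024)
```

The factor 2.0 / fs2 normalizes a frequency by the Nyquist frequency fs2 / 2, so a frequency equal to Nyquist maps to index N.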
Step B3: Using the secret key key, generate the bipolar binary pseudo-random sequence u, whose elements are only +1 and -1, in the same way as during watermark embedding. That is, with key as the random-number seed, generate a binary pseudo-random spreading sequence u of length freqmax2-freqmin2+1.
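A keyed bipolar sequence of this kind can be sketched as follows. This is only an illustration: the source does not name a particular generator, so Python's `random.Random` stands in for it here, and the function name is hypothetical. Whatever generator is used, the embedder and the detector must use the same one with the same seed.

```python
import random


def bipolar_sequence(key, length):
    """Generate a reproducible sequence of +1/-1 values seeded by key."""
    rng = random.Random(key)  # private generator, leaves global state alone
    return [1 if rng.random() < 0.5 else -1 for _ in range(length)]


# Hypothetical key and bin range; length follows freqmax2 - freqmin2 + 1.
u = bipolar_sequence(12345, 256 - 128 + 1)
```

Calling the function again with the same key and length reproduces the same sequence, which is what allows the detector to regenerate u without access to the original audio.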
Step B4: According to the correlation test formula (10), correlate the pseudo-random sequence u with the phase spectrum ξ_n of the n-th frame of the signal to be detected, obtaining the sufficient detection statistic r_n for that frame.
Here <·> denotes the inner product of the signals.
If the sufficient detection statistic r_n ≥ 0, the detected watermark bit is b=1; otherwise b=0.
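The correlation and decision rule can be sketched as below. Formula (10) is not reproduced in this text, so the example assumes the simplest reading of the description: r_n is the plain, unnormalized inner product of u and the phase values ξ_n restricted to the embedding bins. The function name and this choice of normalization are assumptions.

```python
def detect_bit(u, phase):
    """Return (b, r): the decided bit and the correlation statistic.

    u     -- bipolar sequence of +1/-1 values
    phase -- phase values of one frame, already restricted to the
             embedding bins, in the same order as u
    """
    if len(u) != len(phase):
        raise ValueError("u and phase must have equal length")
    r = sum(uk * pk for uk, pk in zip(u, phase))  # inner product <u, phase>
    b = 1 if r >= 0 else 0                        # decision rule from the text
    return b, r


bit, r = detect_bit([1, -1, 1], [0.2, -0.1, 0.3])
```

A positive scaling of r would not change the decision, since only the sign of the statistic is used.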
The specific embodiments described herein merely illustrate the spirit of the present invention. Those skilled in the art to which the present invention pertains may make various modifications or additions to the described embodiments, or substitute similar approaches, without departing from the spirit of the present invention or exceeding the scope defined by the appended claims.