The present invention provides a method and system for identifying image content attributes. The method includes: calculating the relative reprint numbers of multiple same-source image clusters with respect to a specific resource site; training a filter model according to the multiple same-source image clusters and the corresponding relative reprint numbers; and identifying the image content attributes in a target image cluster according to the trained filter model. The advantage of the present invention is that the content attribute of an image can be identified from data on how the image is reprinted or disseminated on the network, which is especially useful for judging whether the image is an advertisement image.
Description (translated from Chinese)
Image content attribute identification method and system
Technical field
The present invention relates to the field of image recognition, and in particular to a method and system for identifying image content attributes.
Background
Advertisement images appear on many types of resource sites on the Internet. These advertisement images are very diverse, including advertisements for various commodities (for example, milk powder or clothes), advertisements for physical stores, and other types of advertisements.
These advertisement images appear not only on the merchant's own site but also on the pages of other resource sites. For example, in communities that allow users to upload images (forums, image sites, and so on), some users upload advertisement images. A large number of advertisement images often disturbs users, and advertisement images unrelated to the user's needs can even appear when the user performs an image search.
In terms of image content, different advertisement images do not share many similarities. With current image recognition technology it is therefore difficult to identify the content attribute of an image, that is, to identify which images are advertisement images. As a result, advertisement images cannot be handled in a targeted way, and the user experience is inevitably affected by them.
Summary of the invention
In view of the above problems, the present invention is proposed to provide an image content attribute identification method and system that overcome the above problems or at least partially solve them.
According to one aspect of the present invention, a method for identifying image content attributes is provided, which includes: calculating the relative reprint numbers of multiple same-source image clusters with respect to a specific resource site; training a filter model according to the multiple same-source image clusters and the corresponding relative reprint numbers; and identifying the image content attributes in a target image cluster according to the trained filter model.
Optionally, the step of calculating the relative reprint numbers of multiple same-source image clusters with respect to a specific resource site includes: for one same-source image cluster among the multiple same-source image clusters, comparing the number of reprints of the images in that cluster on the specific resource site with their number of reprints on multiple resource sites, to obtain the relative reprint number of the cluster with respect to the specific resource site, where the multiple resource sites include the specific resource site.
Optionally, the step of comparing the number of reprints of the images in the same-source image cluster on the specific resource site with their number of reprints on multiple resource sites includes: calculating a first average reprint number of the images on the specific resource site; calculating a second average reprint number of the images on the multiple resource sites; taking a first difference between the number of reprints of the images in the same-source image cluster on the specific resource site and the first average reprint number, and a second difference between the number of reprints of the images in the same-source image cluster on the multiple resource sites and the second average reprint number; and comparing the first difference with the second difference to obtain the relative reprint number of the same-source image cluster with respect to the specific resource site.
Optionally, the step of calculating the first average reprint number of the images on the specific resource site includes: taking, from the images of the multiple same-source image clusters, the images located on the specific resource site, and comparing the number of those images with the number of same-source image clusters to which they belong, to obtain the first average reprint number.
Optionally, the step of calculating the second average reprint number of the images on the multiple resource sites includes: comparing the number of images in the multiple same-source image clusters with the number of same-source image clusters, to obtain the second average reprint number.
Optionally, before the step of comparing the number of reprints of the images in the same-source image cluster on the specific resource site with their number of reprints on multiple resource sites, the method further includes: crawling the image links that appear on the multiple resource sites; detecting whether an image link is the same as the link of an image in the same-source image cluster, and/or detecting whether the checksum information of the image behind the link is the same as the checksum information of an image in the same-source image cluster, and/or detecting whether the image behind the link and an image in the same-source image cluster share one or more identical image features; and, according to the detection result, determining whether the image link is a reprint of an image in the same-source image cluster and counting the number of reprints of the images in the same-source image cluster.
Optionally, the specific resource site is the resource site that reprints the most images of each same-source image cluster among the multiple same-source image clusters.
Optionally, the images of each same-source image cluster correspond to the same source image, and each image in a same-source image cluster shares one or more identical image features with its source image.
Optionally, the method further includes: extracting format features and/or link features of the images contained in the same-source image clusters; training the filter model according to the multiple same-source image clusters, the corresponding relative reprint numbers, and the format features of the images they contain; and, according to the trained filter model, identifying the image content attributes in the target image cluster based on the relative reprint number and on the format features and/or link features of the images contained in the target image cluster.
Optionally, the format features of an image include, but are not limited to, one or a combination of the following: the length/width of the image, the size of the image, and the clarity of the image.
Optionally, the link features of an image include, but are not limited to, one or a combination of the following: whether the image link is on the same site as the web page, and whether the image's jump link points off-site.
According to another aspect of the present invention, a system for identifying image content attributes is provided, which includes: a relative reprint number calculation module, used to calculate the relative reprint numbers of multiple same-source image clusters with respect to a specific resource site; a training module, used to input the multiple same-source image clusters and the corresponding relative reprint numbers into a filter in order to train the filter model; the filter, adapted to obtain the trained filter model from the training module and to screen a target image cluster according to that model; and an identification module, used to identify the image content attributes in the target image cluster according to the filter's screening of the target image cluster.
Optionally, for one same-source image cluster among the multiple same-source image clusters, the relative reprint number calculation module compares the number of reprints of the images in that cluster on the specific resource site with their number of reprints on multiple resource sites, to obtain the relative reprint number of the cluster with respect to the specific resource site, where the multiple resource sites include the specific resource site.
Optionally, the system further includes: a first average reprint number calculation module, used to calculate a first average reprint number of the images on the specific resource site; and a second average reprint number calculation module, used to calculate a second average reprint number of the images on the multiple resource sites. The relative reprint number calculation module takes a first difference between the number of reprints of the images in the same-source image cluster on the specific resource site and the first average reprint number, takes a second difference between the number of reprints of the images in the same-source image cluster on the multiple resource sites and the second average reprint number, and compares the first difference with the second difference to obtain the relative reprint number of the same-source image cluster with respect to the specific resource site.
Optionally, the first average reprint number calculation module takes, from the images of the multiple same-source image clusters, the images located on the specific resource site, and compares the number of those images with the number of same-source image clusters to which they belong, to obtain the first average reprint number.
Optionally, the second average reprint number calculation module compares the number of images in the multiple same-source image clusters with the number of same-source image clusters, to obtain the second average reprint number.
Optionally, the system further includes: an image link crawling module, used to crawl the image links that appear on the multiple resource sites; an image link detection module, used to detect whether an image link is the same as the link of an image in the same-source image cluster, and/or whether the checksum information of the image behind the link is the same as the checksum information of an image in the same-source image cluster, and/or whether the image behind the link and an image in the same-source image cluster share one or more identical image features; and an image reprint counting module, used to determine, according to the detection result, whether the image link is a reprint of an image in the same-source image cluster, and to count the number of reprints of the images in the same-source image cluster.
Optionally, the specific resource site is the resource site that reprints the most images of each same-source image cluster among the multiple same-source image clusters.
Optionally, the images of each same-source image cluster correspond to the same source image, and each image in a same-source image cluster shares one or more identical image features with its source image.
According to the image content attribute identification method and system of the present invention, the relative reprint numbers of same-source image clusters with respect to a specific resource site are used as training data for the filter model. The relative reprint number is data that reflects the proportion of an image's reprints inside versus outside the specific resource site. A main characteristic of an image used as an advertisement is that it is reprinted very many times on a certain resource site, while it is reprinted noticeably fewer times on other resource sites across the Internet. The size of the relative reprint number can therefore be used to distinguish whether an image is being spread as an advertisement. A filter model trained with relative reprint numbers can identify the content attribute of an image by itself and accurately judge whether the image is an advertisement image.
The above is only an overview of the technical solution of the present invention. So that the technical means of the present invention can be understood more clearly and implemented according to the contents of the description, and so that the above and other purposes, features and advantages of the present invention become more apparent, specific embodiments of the present invention are set out below.
Description of drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered as limiting the invention. Throughout the drawings, the same reference numerals designate the same components. In the drawings:
Fig. 1 shows a flowchart of an image content identification method according to an embodiment of the present invention;
Fig. 2 shows a partial flowchart of an image content identification method according to an embodiment of the present invention;
Fig. 3 shows a flowchart of an image content identification method according to an embodiment of the present invention;
Fig. 4 shows a block diagram of an image content identification system according to an embodiment of the present invention;
Fig. 5 shows a block diagram of an image content identification system according to an embodiment of the present invention;
Fig. 6 shows a block diagram of an image content identification system according to an embodiment of the present invention.
Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure can be understood more thoroughly and its scope conveyed fully to those skilled in the art.
As shown in Fig. 1, an embodiment of the present invention provides a method for identifying image content attributes, which includes the following steps.
Step 110: calculate the relative reprint numbers of multiple same-source image clusters with respect to a specific resource site. Each image cluster is an aggregation of a group of images, for example a group of images with high mutual similarity. The relative reprint number is data that reflects the proportion of reprints of a cluster's images inside versus outside the specific resource site. It can be calculated in many ways, and this embodiment does not limit how it is calculated.
Step 120: train a filter model according to the multiple same-source image clusters and the corresponding relative reprint numbers. Study of advertisement images shows that they have the following characteristics. Advertisement images are costly to produce; many are made by merchants at an expense of money and time. Because of this cost, a merchant spreads a single advertisement image many times. However, essentially only the merchant spreads these images, while other users basically do not. This difference in dissemination ultimately shows up in the reprint numbers on resource sites: the image is reprinted very many times on a specific resource site (where the merchant spreads it deliberately) and comparatively few times on other Internet sites (where other users do not spread it). In other words, the on-site to off-site reprint ratio of an advertisement image on the specific resource site is relatively high, so the relative reprint number can serve as data for distinguishing advertisement images from non-advertisement images. Tools for training the filter model include, but are not limited to, the open-source LIBSVM.
Step 130: identify the image content attributes in a target image cluster according to the trained filter model, that is, identify whether the images in the target image cluster are advertisement images. This makes it possible to filter or otherwise process advertisement images and to avoid their impact on the user experience. Assuming that the target image cluster is the group of images returned for an image search request, the technical solution of this embodiment can identify and filter out the advertisement images among them, so that the non-advertisement images are provided to the user as the search result, thereby protecting the user experience.
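Steps 120 and 130 can be illustrated with a minimal sketch. The embodiment names LIBSVM as one possible training tool; the sketch below deliberately swaps in a much simpler one-feature threshold classifier (a decision stump) so that it runs with only the Python standard library. The function names, the labels, and the sample values are hypothetical and are not taken from the source.

```python
from typing import List, Tuple


def train_threshold_filter(samples: List[Tuple[float, bool]]) -> float:
    """Pick the threshold on the relative reprint number that best separates
    labeled clusters (True = advertisement). This is a stand-in for a real
    trainer such as LIBSVM, not the model described in the source."""
    if not samples:
        raise ValueError("at least one labeled sample is required")
    values = sorted({value for value, _ in samples})
    # Candidate thresholds: midpoints between neighbouring observed values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
    best_threshold, best_errors = candidates[0], len(samples) + 1
    for threshold in candidates:
        # A sample is misclassified when "above threshold" disagrees with its label.
        errors = sum((value > threshold) != is_ad for value, is_ad in samples)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def is_advertisement(relative_reprints: float, threshold: float) -> bool:
    """Step 130: flag a target cluster whose relative reprint number
    exceeds the learned threshold."""
    return relative_reprints > threshold


# Hypothetical training data: (relative reprint number, labeled as ad?)
training = [(0.86, True), (0.90, True), (0.75, True),
            (0.10, False), (0.25, False), (0.30, False)]
threshold = train_threshold_filter(training)
```

With a real trainer, the single threshold would be replaced by whatever decision function the trained model provides; the flow of labeled clusters in, classifier out, target clusters screened stays the same.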
In practical applications, other features are considered in addition to the relative reprint number proposed by the present invention, such as the length/width of the image, the size of the image, the clarity of the image, whether the image link is on the same site as the web page, and whether the image's jump link points off-site. When the filter is trained, it first learns from the relative reprint number of each of the multiple same-source image clusters together with one or more of these features of the images in the cluster. When a target image cluster is identified, one or more of these other features are likewise consulted to screen the cluster and identify whether its images are advertisement images.
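As a hedged illustration of how the relative reprint number might be combined with the format and link features listed above, the sketch below assembles one feature vector per image. The `ImageRecord` fields, the host-name comparison used for "same site", the use of width/height as the length/width feature, and the `sharpness` score are all assumptions made for the example; the source does not specify how these features are encoded.

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import urlparse


@dataclass
class ImageRecord:
    image_url: str    # where the image file is hosted
    page_url: str     # page on which the image appears
    jump_url: str     # link followed when the image is clicked ("" if none)
    width: int        # pixels, assumed > 0
    height: int       # pixels, assumed > 0
    size_bytes: int   # file size
    sharpness: float  # hypothetical clarity score in [0, 1]


def same_site(url_a: str, url_b: str) -> bool:
    """Treat two URLs as same-site when their host names match (a simplification)."""
    return urlparse(url_a).netloc.lower() == urlparse(url_b).netloc.lower()


def feature_vector(record: ImageRecord, relative_reprints: float) -> List[float]:
    """Combine the relative reprint number with format and link features."""
    off_site_jump = bool(record.jump_url) and not same_site(record.jump_url, record.page_url)
    return [
        relative_reprints,
        record.width / record.height,  # length/width, encoded here as an aspect ratio
        float(record.size_bytes),
        record.sharpness,
        1.0 if same_site(record.image_url, record.page_url) else 0.0,
        1.0 if off_site_jump else 0.0,
    ]
```

Vectors of this kind could be fed to whichever trainer is used for the filter model; in practice the raw values would likely need scaling first.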
Another embodiment of the present invention proposes a method for identifying image content attributes. Compared with the above embodiment, step 110 of this embodiment may include: for one same-source image cluster among the multiple same-source image clusters, comparing the number of reprints of the images in that cluster on the specific resource site (for example, 30 reprints on image site A) with their number of reprints on multiple resource sites (for example, 35 reprints in total on 10 image sites, including image site A), to obtain the relative reprint number of the cluster with respect to the specific resource site, where the multiple resource sites include the specific resource site. This embodiment provides one feasible way of calculating the relative reprint number and does not limit the specific comparison; for example, either 30/35 or 30/(35-30) may be taken as the relative reprint number.
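The two example ratios, 30/35 and 30/(35-30), can be written out as a short sketch. The function names are illustrative only, and returning infinity when there are no off-site reprints is an assumption the source does not address.

```python
def share_of_total(site_reprints: int, total_reprints: int) -> float:
    """Reprints on the specific site divided by reprints on all sites, e.g. 30/35."""
    if total_reprints <= 0:
        raise ValueError("total_reprints must be positive")
    return site_reprints / total_reprints


def on_site_to_off_site(site_reprints: int, total_reprints: int) -> float:
    """Reprints on the specific site divided by reprints elsewhere, e.g. 30/(35-30)."""
    off_site = total_reprints - site_reprints
    if off_site == 0:
        # Assumption: a cluster seen only on the specific site gets an unbounded ratio.
        return float("inf")
    return site_reprints / off_site
```

Both forms grow as reprints concentrate on the specific site; the first stays within [0, 1], while the second is unbounded and separates heavily concentrated clusters more sharply.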
As shown in Fig. 2, another embodiment of the present invention proposes a method for identifying image content attributes. Compared with the above embodiment, step 110 of this embodiment includes the following steps.
Step 111: calculate the first average reprint number of the images on the specific resource site. For example, assume that the first average reprint number of image site A is 5.
Step 112: calculate the second average reprint number of the images on the multiple resource sites. For example, assume that the second average reprint number of the 10 image sites (including image site A) is 20.
Step 113: take the first difference between the number of reprints of the images in the same-source image cluster on the specific resource site and the first average reprint number. The first difference reflects how the reprinting of the cluster's images differs from that of other images on the specific resource site; the larger the difference, the more likely the same-source image cluster is an advertisement image. With the figures from the preceding embodiment, the first difference is 30-5=25. Also take the second difference between the number of reprints of the images in the same-source image cluster on the multiple resource sites and the second average reprint number. The second difference reflects how the reprinting of the cluster's images differs from that of other images on the multiple resource sites; the larger the difference, the less likely the same-source image cluster is an advertisement image. With the figures from the preceding embodiment, the second difference is 35-20=15. The first difference and the second difference are then compared to obtain the relative reprint number of the same-source image cluster with respect to the specific resource site.
This embodiment provides another way of calculating the relative reprint number, one that takes into account how the reprinting of the cluster's images differs from that of other images, so that the relative reprint number better reflects whether an image is an advertisement image. This embodiment does not limit how the first difference and the second difference are compared; for example, either 25/15 or (25±a)/(15±b) may be used, where a and b are constants.
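The difference-based calculation of step 113 can be sketched as follows. The function and parameter names are illustrative, and the constants a and b are modeled as signed offsets so that a single formula covers both the plus and minus variants of (25±a)/(15±b); that encoding is an assumption of this sketch.

```python
def relative_reprints_by_difference(
    site_reprints: float,
    site_average: float,
    total_reprints: float,
    overall_average: float,
    a: float = 0.0,
    b: float = 0.0,
) -> float:
    """Ratio of the first difference to the second difference.

    first difference  = reprints on the specific site - first average  (30 - 5  = 25)
    second difference = reprints on all sites        - second average (35 - 20 = 15)
    a and b are optional signed constants added to each difference.
    """
    first_difference = site_reprints - site_average
    second_difference = total_reprints - overall_average
    denominator = second_difference + b
    if denominator == 0:
        raise ZeroDivisionError("second difference plus b must be non-zero")
    return (first_difference + a) / denominator


# With the figures used in the text: (30 - 5) / (35 - 20) = 25 / 15
example = relative_reprints_by_difference(30, 5, 35, 20)
```

Choosing a non-zero b is also one way to keep the denominator away from zero for clusters whose overall reprint count happens to equal the overall average.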
Another embodiment of the present invention proposes a method for identifying image content attributes. Compared with the above embodiment, step 111 of this embodiment includes: taking, from the images of the multiple same-source image clusters, the images located on the specific resource site, and comparing the number of those images with the number of same-source image clusters to which they belong, to obtain the first average reprint number. For example, if image site A holds 100 images and those 100 images fall into 20 image clusters, the first average reprint number is 100/20=5. The technical solution of this embodiment provides a fast and efficient way of obtaining the average reprint number.
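A minimal sketch of the first average reprint number, assuming the crawl results are available as hypothetical `(site, cluster_id)` pairs with one pair per image occurrence (this record layout is not given in the source):

```python
from typing import List, Tuple


def first_average_reprints(records: List[Tuple[str, str]], site: str) -> float:
    """Images found on `site` divided by the number of distinct clusters they belong to."""
    on_site = [cluster_id for record_site, cluster_id in records if record_site == site]
    if not on_site:
        return 0.0  # assumption: a site with no clustered images averages zero
    return len(on_site) / len(set(on_site))


# 100 images on site "A" spread evenly over 20 clusters, plus one image elsewhere.
records = [("A", f"cluster-{i % 20}") for i in range(100)] + [("B", "cluster-0")]
average_on_a = first_average_reprints(records, "A")
```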
Another embodiment of the present invention proposes a method for identifying picture content attributes. Compared with the above-mentioned embodiment, in this embodiment step 112 includes: comparing the number of pictures in the multiple homologous picture clusters with the number of homologous picture clusters, to obtain the second average number of reprints. For example, if there are 1000 pictures on 10 picture sites (including picture site A) and those 1000 pictures can be clustered into 50 picture clusters, the second average number of reprints is 1000/50=20. The technical solution of this embodiment provides a fast and efficient way to obtain the average number of reprints.
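Both averages are a picture count divided by a cluster count, which a short sketch can make concrete. The record layout (one `(site, cluster_id)` pair per picture occurrence) and the function name are assumptions made for illustration; the toy data is constructed only to reproduce the 100/20 and 1000/50 figures of the examples.

```python
def average_reprints(records, site=None):
    """Average number of reprints: pictures divided by clusters.

    records: iterable of (site, cluster_id) pairs, one per picture found.
    site:    restrict to one resource site (first average); None means
             all sites together (second average).
    """
    selected = [(s, c) for s, c in records if site is None or s == site]
    clusters = {c for _, c in selected}
    if not clusters:
        raise ValueError("no pictures to average over")
    return len(selected) / len(clusters)


# Toy data: 100 pictures on site "A" in 20 clusters, plus 900 pictures on
# other sites, for 1000 pictures in 50 clusters overall.
records = [("A", i % 20) for i in range(100)]
records += [("other", i % 50) for i in range(900)]

first_average = average_reprints(records, site="A")   # 100 / 20
second_average = average_reprints(records)            # 1000 / 50
print(first_average, second_average)
```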
As shown in FIG. 3, another embodiment of the present invention proposes a method for identifying picture content attributes. Compared with the above-mentioned embodiment, the method of this embodiment further includes, before step 110: step 101, crawling the picture links (URLs) that appear on multiple resource sites; step 102, detecting whether a picture link is the same as the link of a picture of the homologous picture cluster, which reflects whether one picture has been reprinted under different URLs, and/or detecting whether the verification information of the picture corresponding to the picture link is the same as the verification information (including but not limited to an MD5 value) of a picture of the homologous picture cluster, which reflects whether multiple identical pictures exist, and/or detecting whether the picture corresponding to the picture link shares one or more image features with a picture of the homologous picture cluster, which reflects whether multiple pictures are the same or were obtained by modifying the same picture; step 103, determining from the detection results whether the picture link is a reprint of a picture of the homologous picture cluster, and counting the number of reprints of the pictures of the homologous picture cluster.
The image features in this embodiment include but are not limited to contour features, color features, histogram features and the like. This embodiment thus provides a technical solution that can comprehensively count the number of picture reprints.
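The three checks of step 102 and the count of step 103 can be sketched as below. The `Picture` type, the representation of image features as a set of opaque labels, and the rule that any single matching check marks a reprint are all assumptions for illustration; the embodiment does not prescribe how features are extracted or how the checks are combined beyond "and/or". Only `hashlib.md5` from the Python standard library is used for the verification information.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Picture:
    url: str
    content: bytes
    # Hypothetical feature labels (contour, color, histogram, ...).
    features: frozenset = frozenset()


def md5_of(content):
    """Verification information of a picture, here its MD5 hex digest."""
    return hashlib.md5(content).hexdigest()


def is_reprint(candidate, cluster_pictures):
    """Step 102: same link, same checksum, or a shared image feature."""
    for known in cluster_pictures:
        if candidate.url == known.url:
            return True
        if md5_of(candidate.content) == md5_of(known.content):
            return True
        if candidate.features & known.features:
            return True
    return False


def count_reprints(candidates, cluster_pictures):
    """Step 103: count the crawled pictures that are reprints."""
    return sum(1 for c in candidates if is_reprint(c, cluster_pictures))


cluster = [Picture("http://a.example/ad.jpg", b"abc", frozenset({"hist:1"}))]
crawled = [
    Picture("http://a.example/ad.jpg", b"zzz"),                      # same link
    Picture("http://b.example/copy.jpg", b"abc"),                    # same MD5
    Picture("http://c.example/edit.jpg", b"abd", frozenset({"hist:1"})),  # shared feature
    Picture("http://d.example/other.jpg", b"xyz", frozenset({"hist:9"})),  # unrelated
]
reprint_count = count_reprints(crawled, cluster)
print(reprint_count)
```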
Another embodiment of the present invention proposes a method for identifying picture content attributes. Compared with the above-mentioned embodiment, in this embodiment the specific resource site is, among the multiple resource sites, the one that reprints the pictures of each homologous picture cluster the most. The site that reprints a picture the most times is likely to be the site on which the merchant of the advertisement picture spreads it, so the number of reprints on that site most effectively reflects whether the picture is an advertisement picture.
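Choosing the specific resource site for a cluster then amounts to finding the most frequent site among its reprints. The sketch below assumes the reprints of one cluster are available as a list of site names, one entry per reprint; the function name is hypothetical, and ties are resolved in whatever order `Counter.most_common` returns them.

```python
from collections import Counter


def specific_resource_site(reprint_sites):
    """Return the site that reprinted the cluster's pictures most often."""
    counts = Counter(reprint_sites)
    if not counts:
        raise ValueError("the cluster has no recorded reprints")
    site, _ = counts.most_common(1)[0]
    return site


# 30 reprints on picture site A, 5 on two other sites.
sites = ["A"] * 30 + ["B"] * 3 + ["C"] * 2
chosen_site = specific_resource_site(sites)
print(chosen_site)
```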
Another embodiment of the present invention proposes a method for identifying picture content attributes. Compared with the above-mentioned embodiment, in this embodiment the pictures of each homologous picture cluster correspond to the same source picture, and each picture of a homologous picture cluster shares one or more image features with its source picture. In the technical solution of this embodiment, the pictures of each homologous picture cluster are therefore either identical or obtained by modifying the same picture. The image features in this embodiment include but are not limited to contour features, color features, histogram features and the like.
As shown in FIG. 4, an embodiment of the present invention provides a picture content attribute identification system, which includes: a relative reprint number calculation module 210, used to calculate the relative reprint number of multiple homologous picture clusters for a specific resource site, where each picture cluster is an aggregation of a group of pictures (for example, a group of pictures with high similarity) and the relative reprint number is data that reflects the ratio between reprints of the cluster's pictures inside and outside the specific resource site; there are many ways to calculate the relative reprint number, and this embodiment does not limit the calculation method; a training module 220, used to input the multiple homologous picture clusters and the corresponding relative reprint numbers into a filter to train a filter model; a filter 230, adapted to obtain the trained filter model from the training module and to screen the target picture cluster according to the model, where the filters used in this embodiment include but are not limited to the open-source LIBSVM; and an identification module 240, used to screen the target picture cluster according to the filter and identify the picture content attributes in the target picture cluster, that is, to identify whether the pictures in the target picture cluster are advertisement pictures.
Research on advertisement pictures shows that they have the following characteristics. Advertisement pictures are costly to produce: many of them are made by merchants at the expense of money and time. Because the production cost is high, a merchant will spread one advertisement picture many times. However, essentially only the merchant spreads these pictures, while other users essentially do not.
This difference in dissemination eventually shows up in the number of reprints on resource sites: the number of reprints on a specific resource site is very high (the merchant spreads the picture intentionally), while the number of reprints on other Internet sites is comparatively much smaller (other users do not spread it). In other words, for an advertisement picture the ratio of reprints inside the specific resource site to reprints outside it is relatively high, so the relative reprint number can serve as data for distinguishing advertisement pictures from non-advertisement pictures.
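To make the train-then-screen flow of modules 220 to 240 concrete, the following sketch learns a single cut-off on the relative reprint number from labelled clusters. It is a deliberately simple stand-in and not LIBSVM: the threshold learner, the function names and the toy labels are all assumptions made for illustration, and a real system would train whatever filter it actually uses.

```python
def train_threshold(samples):
    """Learn a cut-off on the relative reprint number.

    samples: list of (relative_reprint_number, is_advertisement) pairs.
    Returns the candidate threshold with the fewest training errors,
    where a cluster is predicted to be an advertisement if its value
    lies above the threshold.
    """
    if not samples:
        raise ValueError("need at least one labelled cluster")
    values = sorted({value for value, _ in samples})
    midpoints = [(lo + hi) / 2 for lo, hi in zip(values, values[1:])]
    candidates = [values[0] - 1.0] + midpoints + [values[-1] + 1.0]

    def errors(threshold):
        return sum((value > threshold) != is_ad for value, is_ad in samples)

    return min(candidates, key=errors)


def is_advertisement(relative_reprints, threshold):
    """Screen a target cluster with the trained threshold."""
    return relative_reprints > threshold


# Toy training data: relative reprint number taken as a share in [0, 1].
training = [(0.90, True), (0.85, True), (0.30, False),
            (0.20, False), (0.10, False)]
threshold = train_threshold(training)
print(is_advertisement(30 / 35, threshold))
```

A one-dimensional threshold is enough to show the data flow; it says nothing about the accuracy a trained support vector machine would reach.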
In addition, in practical applications the system further includes a picture format feature module 310 and/or a picture link feature module 320. The picture format feature module 310 is adapted to extract the format features of the pictures contained in the homologous picture clusters and in the target picture cluster. The picture link feature module 320 is adapted to extract the link features of the pictures contained in the homologous picture clusters and in the target picture cluster. The training module 220 is further adapted to input the multiple homologous picture clusters, the corresponding relative reprint numbers, and the corresponding picture format features and/or picture link features together into the filter to train the filter model. The filter 230 is further adapted to screen the target picture cluster according to the trained model, combined with the relative reprint number of the target picture cluster and the corresponding picture format features and/or picture link features. The identification module 240 is further used to identify the picture content attributes in the target picture cluster according to the filter's screening of the target picture cluster, which is based on the relative reprint number of the target picture cluster and the corresponding picture format features and/or picture link features.
This facilitates processing such as filtering of advertisement pictures and keeps them from degrading the user experience. Assuming that the target picture cluster is the group of pictures returned for a picture search request, the technical solution of this embodiment can identify the advertisement pictures among them and filter them out, so that only non-advertisement pictures are provided to the user as search results, thereby safeguarding the user experience.
In practical applications, other features are considered in addition to the relative reprint number proposed by the present invention, such as the length/width of the picture, the size of the picture, the clarity of the picture, whether the picture link is on the same site as the web page, and whether the picture's jump link points off-site. These features are likewise first learned and trained on by the classifier. When the target picture cluster is identified, one or more of these other features are also taken into account to screen the cluster and identify whether it is an advertisement picture.
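One way to picture how these extra features sit alongside the relative reprint number is a small feature record that is flattened into a numeric vector for the classifier. Everything here is an assumed layout for illustration: the field names, the choice of aspect ratio and byte size as the format features, and the hostname comparison used for "same site" are not specified by the text, and clarity is left out because no measure is given.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


def same_site(picture_url, page_url):
    """Simplified same-site test: compare the hostnames of the two URLs."""
    return urlparse(picture_url).hostname == urlparse(page_url).hostname


@dataclass
class PictureFeatures:
    relative_reprints: float
    width: int
    height: int
    size_bytes: int
    link_on_same_site: bool
    jump_link_off_site: bool

    def to_vector(self):
        """Flatten to numbers in a fixed order for a classifier."""
        return [
            self.relative_reprints,
            self.width / self.height,
            float(self.size_bytes),
            float(self.link_on_same_site),
            float(self.jump_link_off_site),
        ]


features = PictureFeatures(
    relative_reprints=30 / 35,
    width=800,
    height=400,
    size_bytes=120_000,
    link_on_same_site=same_site("http://img.example.com/ad.jpg",
                                "http://img.example.com/thread/1"),
    jump_link_off_site=True,
)
vector = features.to_vector()
print(vector)
```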
Another embodiment of the present invention proposes a picture content attribute identification system. Compared with the above-mentioned embodiment, in the system of this embodiment the relative reprint number calculation module 210, for one homologous picture cluster among the multiple homologous picture clusters, compares the number of reprints of the cluster's pictures on the specific resource site (for example, 30 reprints on picture site A) with the number of reprints on multiple resource sites (for example, 35 reprints in total on 10 picture sites, including picture site A), to obtain the relative reprint number of the homologous picture cluster for the specific resource site. The multiple resource sites include the specific resource site. This embodiment provides a feasible way to calculate the relative reprint number and does not limit the specific comparison; for example, either 30/35 or 30/(35-30) may be taken as the relative reprint number.
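The two example comparisons, 30/35 and 30/(35-30), can be written out as follows. The function names are hypothetical, and returning infinity when there are no off-site reprints is an assumption the text does not address.

```python
def reprint_share(site_reprints, all_reprints):
    """Share of all reprints that occur on the specific site (30/35)."""
    if all_reprints <= 0:
        raise ValueError("total reprints must be positive")
    return site_reprints / all_reprints


def inside_outside_ratio(site_reprints, all_reprints):
    """Reprints inside the site over reprints outside it (30/(35-30))."""
    outside = all_reprints - site_reprints
    if outside == 0:
        return float("inf")  # assumed convention: no off-site reprints
    return site_reprints / outside


share = reprint_share(30, 35)
ratio_in_out = inside_outside_ratio(30, 35)
print(share, ratio_in_out)
```

Both values grow as reprints concentrate on the specific site, which is the property the embodiment relies on.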
As shown in FIG. 5, another embodiment of the present invention proposes a picture content attribute identification system.
Compared with the above-mentioned embodiment, the system of this embodiment further includes: a first average reprint number calculation module 250, used to calculate the first average number of reprints of the pictures on the specific resource site, for example, assuming that the first average number of reprints of picture site A is 5; and a second average reprint number calculation module 260, used to calculate the second average number of reprints of the pictures on multiple resource sites, for example, assuming that the second average number of reprints of 10 picture sites (including picture site A) is 20. The relative reprint number calculation module 210 takes a first difference between the number of reprints of the pictures of the homologous picture cluster on the specific resource site and the first average number of reprints, takes a second difference between the number of reprints of those pictures on the multiple resource sites and the second average number of reprints, and compares the first difference with the second difference to obtain the relative reprint number of the homologous picture cluster for the specific resource site. The first difference reflects how the reprinting of the cluster's pictures differs from that of other pictures on the specific resource site: the larger it is, the more likely the homologous picture cluster is an advertisement picture. Continuing the example of the foregoing embodiment, the first difference is 30-5=25. The second difference reflects the same reprinting difference across the multiple resource sites: the larger it is, the less likely the homologous picture cluster is an advertisement picture. In the same example, the second difference is 35-20=15.
This embodiment provides another way to calculate the relative reprint number. Because it takes into account the reprinting difference between the pictures of the homologous picture cluster and other pictures, the relative reprint number better reflects whether a picture is an advertisement picture. This embodiment does not limit how the first difference and the second difference are compared; for example, 25/15 and (25±a)/(15±b) are both acceptable, where a and b are constants.
Another embodiment of the present invention proposes a picture content attribute identification system. Compared with the above-mentioned embodiment, in the system of this embodiment the first average reprint number calculation module 250 takes, from the pictures of the multiple homologous picture clusters, the pictures located on the specific resource site, and compares the number of those pictures with the number of homologous picture clusters they belong to, to obtain the first average number of reprints. For example, if there are 100 pictures on picture site A and those 100 pictures fall into 20 picture clusters, the first average number of reprints is 100/20=5. The technical solution of this embodiment provides a fast and efficient way to obtain the average number of reprints.
Another embodiment of the present invention proposes a picture content attribute identification system. Compared with the above-mentioned embodiment, in the system of this embodiment the second average reprint number calculation module 260 compares the number of pictures in the multiple homologous picture clusters with the number of homologous picture clusters, to obtain the second average number of reprints. For example, if there are 1000 pictures on 10 picture sites (including picture site A) and those 1000 pictures can be clustered into 50 picture clusters, the second average number of reprints is 1000/50=20. The technical solution of this embodiment provides a fast and efficient way to obtain the average number of reprints.
As shown in FIG. 6, another embodiment of the present invention proposes a picture content attribute identification system. Compared with the above-mentioned embodiment, the system of this embodiment further includes: a picture link crawling module 270, used to crawl the picture links (URLs) that appear on multiple resource sites; a picture link detection module 280, used to detect whether a picture link is the same as the link of a picture of the homologous picture cluster, which reflects whether one picture has been reprinted under different URLs, and/or to detect whether the verification information of the picture corresponding to the picture link is the same as the verification information (including but not limited to an MD5 value) of a picture of the homologous picture cluster, which reflects whether multiple identical pictures exist, and/or to detect whether the picture corresponding to the picture link shares one or more image features with a picture of the homologous picture cluster, which reflects whether multiple pictures are the same or were obtained by modifying the same picture; and a picture reprint number statistics module 290, used to determine from the detection results whether the picture link is a reprint of a picture of the homologous picture cluster, and to count the number of reprints of the pictures of the homologous picture cluster.
The image features in this embodiment include but are not limited to contour features, color features, histogram features and the like. This embodiment thus provides a technical solution that can comprehensively count the number of picture reprints.
Another embodiment of the present invention proposes a picture content attribute identification system. Compared with the above-mentioned embodiment, in the system of this embodiment the specific resource site is, among the multiple resource sites, the one that reprints the pictures of each homologous picture cluster the most. The site that reprints a picture the most times is likely to be the site on which the merchant of the advertisement picture spreads it, so the number of reprints on that site most effectively reflects whether the picture is an advertisement picture.
Another embodiment of the present invention proposes a picture content attribute identification system. Compared with the above-mentioned embodiment, in the system of this embodiment the pictures of each homologous picture cluster correspond to the same source picture, and each picture of a homologous picture cluster shares one or more image features with its source picture. In the technical solution of this embodiment, the pictures of each homologous picture cluster are therefore either identical or obtained by modifying the same picture. The image features in this embodiment include but are not limited to contour features, color features, histogram features and the like.
卿¤æä¾çç®æ³åæ¾ç¤ºä¸ä¸ä»»ä½ç¹å®è®¡ç®æºãèæç³»ç»æè å ¶å®è®¾å¤åºæç¸å ³ãåç§éç¨ç³»ç»ä¹å¯ä»¥ä¸åºäºå¨æ¤ç示æä¸èµ·ä½¿ç¨ãæ ¹æ®ä¸é¢çæè¿°ï¼æé è¿ç±»ç³»ç»æè¦æ±çç»ææ¯æ¾èæè§çãæ¤å¤ï¼æ¬åæä¹ä¸é对任ä½ç¹å®ç¼ç¨è¯è¨ãåºå½æç½ï¼å¯ä»¥å©ç¨åç§ç¼ç¨è¯è¨å®ç°å¨æ¤æè¿°çæ¬åæçå 容ï¼å¹¶ä¸ä¸é¢å¯¹ç¹å®è¯è¨æåçæè¿°æ¯ä¸ºäºæ«é²æ¬åæçæä½³å®æ½æ¹å¼ãThe algorithms and displays presented herein are not inherently related to any particular computer, virtual system, or other device. Various generic systems can also be used with the teachings based on this. The structure required to construct such a system is apparent from the above description. Furthermore, the present invention is not specific to any particular programming language. It should be understood that various programming languages can be used to implement the content of the present invention described herein, and the above description of specific languages is for disclosing the best mode of the present invention.
卿¤å¤ææä¾ç说æä¹¦ä¸ï¼è¯´æäºå¤§éå ·ä½ç»èãç¶èï¼è½å¤çè§£ï¼æ¬åæç宿½ä¾å¯ä»¥å¨æ²¡æè¿äºå ·ä½ç»èçæ åµä¸å®è·µãå¨ä¸äºå®ä¾ä¸ï¼å¹¶æªè¯¦ç»ç¤ºåºå ¬ç¥çæ¹æ³ãç»æåææ¯ï¼ä»¥ä¾¿ä¸æ¨¡ç³å¯¹æ¬è¯´æä¹¦ççè§£ãIn the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure the understanding of this description.
类似å°ï¼åºå½çè§£ï¼ä¸ºäºç²¾ç®æ¬å ¬å¼å¹¶å¸®å©çè§£åä¸ªåææ¹é¢ä¸çä¸ä¸ªæå¤ä¸ªï¼å¨ä¸é¢å¯¹æ¬åæçç¤ºä¾æ§å®æ½ä¾çæè¿°ä¸ï¼æ¬åæçå个ç¹å¾ææ¶è¢«ä¸èµ·åç»å°åä¸ªå®æ½ä¾ãå¾ãæè å¯¹å ¶çæè¿°ä¸ãç¶èï¼å¹¶ä¸åºå°è¯¥å ¬å¼çæ¹æ³è§£éæåæ å¦ä¸æå¾ï¼å³æè¦æ±ä¿æ¤çæ¬åæè¦æ±æ¯å¨æ¯ä¸ªæå©è¦æ±ä¸ææç¡®è®°è½½çç¹å¾æ´å¤çç¹å¾ãæ´ç¡®åå°è¯´ï¼å¦ä¸é¢çæå©è¦æ±ä¹¦æåæ ç飿 ·ï¼åææ¹é¢å¨äºå°äºåé¢å ¬å¼çåä¸ªå®æ½ä¾çææç¹å¾ãå æ¤ï¼éµå¾ªå ·ä½å®æ½æ¹å¼çæå©è¦æ±ä¹¦ç±æ¤æç¡®å°å¹¶å ¥è¯¥å ·ä½å®æ½æ¹å¼ï¼å ¶ä¸æ¯ä¸ªæå©è¦æ±æ¬èº«é½ä½ä¸ºæ¬åæçåç¬å®æ½ä¾ãSimilarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, in order to streamline this disclosure and to facilitate an understanding of one or more of the various inventive aspects, various features of the invention are sometimes grouped together in a single embodiment, figure, or its description. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the Detailed Description are hereby expressly incorporated into this Detailed Description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art can understand that the modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. Modules or units or components in the embodiments may be combined into one module or unit or component, and may furthermore be divided into a plurality of sub-modules or sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, equivalent or similar purpose.
Furthermore, those skilled in the art will understand that although some embodiments described herein include some features included in other embodiments but not others, combinations of features from different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all functions of some or all components of the picture content attribute identification system according to the embodiments of the present invention. The present invention can also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the methods described herein. Such a program implementing the present invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such a signal may be downloaded from an Internet site, provided on a carrier signal, or provided in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The use of the words first, second, third and so on does not indicate any order. These words can be interpreted as names.
Claims (10), translated from Chinese

1. A picture content attribute identification method, comprising:
calculating relative reprint counts of a plurality of homologous picture clusters for a specific resource site;
training a filter model according to the plurality of homologous picture clusters and the corresponding relative reprint counts; and
identifying picture content attributes in a target picture cluster according to the trained filter model.

2. The picture content attribute identification method according to claim 1, wherein the step of calculating relative reprint counts of a plurality of homologous picture clusters for a specific resource site comprises:
for a homologous picture cluster among the plurality of homologous picture clusters, comparing the number of reprints of the pictures in the homologous picture cluster on the specific resource site with the number of reprints on a plurality of resource sites, to obtain the relative reprint count of the homologous picture cluster for the specific resource site, wherein the plurality of resource sites includes the specific resource site.

3. The picture content attribute identification method according to claim 2, wherein the step of comparing the number of reprints of the pictures in the homologous picture cluster on the specific resource site with the number of reprints on the plurality of resource sites comprises:
calculating a first average reprint count of pictures on the specific resource site;
calculating a second average reprint count of pictures on the plurality of resource sites; and
taking a first difference between the number of reprints of the pictures in the homologous picture cluster on the specific resource site and the first average reprint count, taking a second difference between the number of reprints of the pictures in the homologous picture cluster on the plurality of resource sites and the second average reprint count, and comparing the first difference with the second difference to obtain the relative reprint count of the homologous picture cluster for the specific resource site.

4. The picture content attribute identification method according to claim 3, wherein the step of calculating the first average reprint count of pictures on the specific resource site comprises:
taking, from the pictures of the plurality of homologous picture clusters, the pictures located on the specific resource site, and comparing the number of those pictures with the number of homologous picture clusters to which they correspond, to obtain the first average reprint count.

5. The picture content attribute identification method according to claim 3, wherein the step of calculating the second average reprint count of pictures on the plurality of resource sites comprises:
comparing the number of pictures in the plurality of homologous picture clusters with the number of homologous picture clusters, to obtain the second average reprint count.

6. The picture content attribute identification method according to claim 2, further comprising, before the step of comparing the number of reprints of the pictures in the homologous picture cluster on the specific resource site with the number of reprints on the plurality of resource sites:
crawling picture links appearing on the plurality of resource sites;
detecting whether a picture link is the same as the link of a picture in the homologous picture cluster, and/or detecting whether the check information of the picture corresponding to the picture link is the same as the check information of a picture in the homologous picture cluster, and/or detecting whether the picture corresponding to the picture link and a picture in the homologous picture cluster share one or more identical image features; and
determining, according to the detection result, whether the picture link is a reprint of a picture in the homologous picture cluster, and counting the number of reprints of the pictures in the homologous picture cluster.

7. The picture content attribute identification method according to claim 2, wherein
the specific resource site is the resource site that reprints the most pictures of each homologous picture cluster among the plurality of homologous picture clusters.

8. The picture content attribute identification method according to any one of claims 1 to 7, wherein
the pictures of each homologous picture cluster correspond to the same source picture, and each picture of a homologous picture cluster shares one or more identical image features with its corresponding source picture.

9. A picture content attribute identification system, comprising:
a relative reprint count calculation module, configured to calculate relative reprint counts of a plurality of homologous picture clusters for a specific resource site;
a training module, configured to input the plurality of homologous picture clusters and the corresponding relative reprint counts into a filter to train a filter model;
the filter, adapted to obtain the trained filter model from the training module and to screen a target picture cluster according to the model; and
an identification module, configured to identify the picture content attributes in the target picture cluster according to the filter's screening of the target picture cluster.

10. The picture content attribute identification system according to claim 9, wherein
the relative reprint count calculation module, for a homologous picture cluster among the plurality of homologous picture clusters, compares the number of reprints of the pictures in the homologous picture cluster on the specific resource site with the number of reprints on a plurality of resource sites, to obtain the relative reprint count of the homologous picture cluster for the specific resource site, wherein the plurality of resource sites includes the specific resource site.
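The relative reprint count recited in claims 2 to 5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the claims only say that the two differences are "compared", so taking their ratio is an assumption, as are the function name `relative_reprint_counts`, the input shape (a mapping from a cluster id to the list of sites on which its pictures appear), and the choice to return `None` when the second difference is zero.

```python
from typing import Dict, List, Optional

# Hypothetical input shape: cluster id -> list of site names, with one entry
# per reprinted picture belonging to that homologous picture cluster.
Clusters = Dict[str, List[str]]


def relative_reprint_counts(clusters: Clusters, site: str) -> Dict[str, Optional[float]]:
    """Relative reprint count of each cluster for `site`.

    The ratio used at the end is an assumption; the claims only state that
    the first and second differences are compared.
    """
    # Reprints of each cluster on the specific site and on all sites.
    on_site = {cid: sites.count(site) for cid, sites in clusters.items()}
    overall = {cid: len(sites) for cid, sites in clusters.items()}

    # Claim 4: pictures located on the site, divided by the number of
    # clusters those pictures belong to.
    clusters_on_site = sum(1 for n in on_site.values() if n > 0)
    first_avg = sum(on_site.values()) / clusters_on_site if clusters_on_site else 0.0

    # Claim 5: all pictures in the clusters, divided by the number of clusters.
    second_avg = sum(overall.values()) / len(clusters) if clusters else 0.0

    result: Dict[str, Optional[float]] = {}
    for cid in clusters:
        first_diff = on_site[cid] - first_avg      # claim 3, first difference
        second_diff = overall[cid] - second_avg    # claim 3, second difference
        # Assumption: "comparing" means dividing; undefined when the
        # second difference is zero, so return None in that case.
        result[cid] = first_diff / second_diff if second_diff != 0 else None
    return result


if __name__ == "__main__":
    sample = {
        "a": ["x", "x", "x", "y", "z", "z"],
        "b": ["x", "y", "y"],
        "c": ["y", "z", "z"],
    }
    print(relative_reprint_counts(sample, "x"))
```

The resulting per-cluster values would then serve as the training signal paired with each cluster when fitting the filter model of claim 1; the model itself is not specified in the claims and is left out of this sketch.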
CN201310632676.8A, priority date 2013-12-02, filing date 2013-12-02: Picture content attribute identification method and system. Status: Active. Publication: CN103617262B (en).

Priority Applications (2) (application number, priority date, filing date, title):
CN201310632676.8A (CN103617262B (en)), 2013-12-02, 2013-12-02, Picture content attribute identification method and system
PCT/CN2014/087109 (WO2015081748A1 (en)), 2013-12-02, 2014-09-22, Method and system for identifying content attribute of picture

Applications Claiming Priority (1) (application number, priority date, filing date, title):
CN201310632676.8A (CN103617262B (en)), 2013-12-02, 2013-12-02, Picture content attribute identification method and system

Publications (2); Family ID: 50167965

Family Applications (1) (application number, status, publication, priority date, filing date, title):
CN201310632676.8A, Active, CN103617262B (en), 2013-12-02, 2013-12-02, Picture content attribute identification method and system

Cited By (5) (* cited by examiner, † cited by third party; publication number, priority date, publication date, assignee, title):
CN103995857A *, 2014-05-14, 2014-08-20, Beijing Qihoo Technology Co., Ltd., Method and device for achieving image search and sorting
WO2015081748A1 *, 2013-12-02, 2015-06-11, Beijing Qihoo Technology Co., Ltd., Method and system for identifying content attribute of picture
CN105022738A *, 2014-04-21, 2015-11-04, Shanghai Jingzhi Information Technology Co., Ltd., Extracting and mapping method of network picture format file on the basis of histograms
CN106599177A *, 2016-12-12, 2017-04-26, G-Cloud Technology Co., Ltd., A processing method for advertising page shielding
CN107451180A *, 2017-06-13, 2017-12-08, Baidu Online Network Technology (Beijing) Co., Ltd., Identify method, apparatus, equipment and the computer-readable storage medium of website affinity

Citations (4) (* cited by examiner, † cited by third party; publication number, priority date, publication date, assignee, title):
US5832119A *, 1993-11-18, 1998-11-03, Digimarc Corporation, Methods for controlling systems using control signals embedded in empirical data
CN101071433A *, 2007-05-10, 2007-11-14, Tencent Technology (Shenzhen) Co., Ltd., Picture download system and method
CN102419777A *, 2012-01-10, 2012-04-18, Phoenix Online (Beijing) Information Technology Co., Ltd., Internet picture advertisement filtering system and filtering method thereof
CN102591983A *, 2012-01-10, 2012-07-18, Phoenix Online (Beijing) Information Technology Co., Ltd., Advertisement filter system and advertisement filter method

Effective date of registration: 20220727
Address after: Room 801, 8th floor, No. 104, floors 1-19, building 2, yard 6, Jiuxianqiao Road, Chaoyang District, Beijing 100015
Patentee after: BEIJING QIHOO TECHNOLOGY Co.,Ltd.
Address before: 100088 room 112, block D, 28 new street, new street, Xicheng District, Beijing (Desheng Park)
Patentee before: BEIJING QIHOO TECHNOLOGY Co.,Ltd.
Patentee before: Qizhi software (Beijing) Co.,Ltd.