??ÎÄÓÉlinglijun911??Ï×

pdfÎÄµµ?ÉÄÜÔÚWAP?Ëä?ÀÀÌåÑé???Ñ????ÒéÄúÓÅÏÈÑ?ÔñTXT???òÏÂÔØÔ?ÎÄ?þµ????ú?é????

Feature Article

the same key used to mark the content. By breaking a single client, the adversary inherently breaks the entire system. In addition, watermark detection in this scenario must be performed blindly (that is, without the original recording) and in real time, even on small devices. To date, no technology has been able to robustly detect watermarks under such circumstances. An alternative to content screening is content ?ngerprinting. In this scenario, copyright owners distribute a uniquely watermarked content copy to each client. Clients can play or redistribute the content without any barriers. Copyright owners scan public distribution channels for illegally distributed clips. Each identi?ed pirated clip is analyzed for the existence of one or more ?ngerprints, which are used to trace piracy to its origin. Detecting ?ngerprints usually requires powerful machines that can devote signi?cant resources to the forensic process. A ?ngerprint detector can access the original unmarked object and use it to detect ?ngerprints, even from content modi?ed by malicious attacks. The main vulnerability of this forensic protection scenario is collusion. A clique of malicious users can collude their copies and create a new copy that is statistically clean of any traces that might point to any of the colluders. Typically, the collusion resistance of even the best ?ngerprint-encoding schemes is low1 (for example, less than one hundred for a two-hour high-de?nition video clip).

A Dual WatermarkFingerprint System

Darko Kirovski, Henrique Malvar, and Yacov Yacobi Microsoft Research

A dual-purpose watermarking and ?ngerprinting system for multimedia screening uses the same secret key to mark all content copies, but different detection keys within each media player. Under optimal attacks, the system??s collusion resistance is superlinear in object size. he Internet??s growth has made the unauthorized copying and distribution of digital media easier than ever before. As a consequence, the music industry claims an enormous annual revenue loss due to piracy (see http://www.riaa.org)?ªa loss allegedly exacerbated by file-sharing Web communities. As Internet bandwidth increases, piracy is likely to affect the movie industry in a similar manner. Legal attempts to alleviate the problem have had limited success so far. One source of hope for copyrighted content distribution on the Internet lies in technological advances that would allow copyright enforcement in both client-server and peer-to-peer scenarios. Traditional

data-protection methods such as scrambling or encryption are minor hurdles to attackers because they can always rerecord the content and freely distribute it. This problem is commonly referred to as the analog hole. One proposed solution is to hide within the media signal a secret, robust, and imperceptible watermark. A watermark designates a multimedia clip as protected. Before playing the clip, the client machine searches for watermark existence within the clip. Watermark detection signals to the client machine that playing the clip requires a license. Recent efforts have sought to de?ne standards for protecting music content using watermarks with little success (see

http://www.sdmi.org). The main vulnerability of watermark-based content-screening systems is that each client machine must store

T

Dual watermark-?ngerprint system

This article proposes a multimedia content protection system in which all copies of a protected object are identically watermarked, but each user has a distinct secret detection key that differs from the secret embedding key. An attacker with access to one detection key can fool the corresponding watermark detector but not other watermark detectors. Surprisingly, analogous to a criminal action, during this attack the attacker necessarily inserts his or her ?ngerprint into the modi?ed content. Even a collusion clique of relatively large size cannot entirely remove the secret marks from the protected content by colluding their detection keys. More importantly, if the clique is not large enough, traces of the detection keys of all colluders can be detected with relatively high accuracy in the attacked clip. Figure 1 (next page) illustrates the main entities of our dual watermark-?ngerprint system and their interactions. Our proposed watermark-fingerprint system achieves a minimum collusion size K that grows

1070-986X/04/$20.00 ? 2004 IEEE

Published by the IEEE Computer Society

59

Authorized licensed use limited to: Georgia State University. Downloaded on December 19, 2008 at 16:49 from IEEE Xplore. Restrictions apply.

Key generation entity Random key generator ci Random key generator i detector index

Watermark chips w Marked media x Media signal Marker Ci y

hi Watermark detector key Attacker v Optimal mark estimator {hi , i = 1, 2, ??,k} Set of detector keys

Modified media ? y = y?Cv

hi Mark present/ absent decision

mation attack,3 and the swap attack4 have successfully removed or obfuscated embedded watermarks enough to fool blind detection. When

robust watermarking systems become a reality, however, the techniques we describe here might contribute signi?cantly in building an ef?cient content protection system for multimedia content distribution for a large-scale client base. In addition to describing the system, we present performance analyses for the low-cost sensitivity attack5 and an improved attack on the ?ngerprint detector with additive Gaussian noise. The two efforts are new with respect to our previously published work.6 Dual watermarking Traditional spread-spectrum watermark systems, outlined in the ??Spread-Spectrum Watermarking?? sidebar, detect watermarks using a key w, which is in essence a secret watermarking key (SWK). In copyright enforcement schemes, watermark detection is done at the client (the media player), which must then have access to the SWK. Adversaries can recreate the original content if they succeed in obtaining the SWK from a single player. This can be achieved in several ways: by breaking into a detector?ªthat is, reverse engineering the detection software or hardware?ªor using the sensitivity attack.5 In our dual watermark-fingerprint system, depicted in Figure 1, the watermark detection key (WDK) differs from the SWK, so breaking into a single detector does not provide enough information to remove the watermark w. The media signal x is watermarked the same way as in traditional spread-spectrum watermarking. However, for each media player i, an individualized watermark-?ngerprint detection key WDK hi is created from an SWK w in the following way. Let C = {cij} denote an m ?Á N matrix, where cij ?Ê R, cij = N (0, B2), that is, each entry is a zero-mean Gaussian random variable with standard deviation ?Òc = B. For any practical purpose of the dual watermark?ngerprint system, we assume B ?Ö A. Each row i contains a watermark carrier, denoted by ci. The ith WDK is de?ned as hi = w + ci. The goal of the watermark carrier ci is to hide the SWK w in hi so that knowledge of hi does not deterministically imply knowledge of w, as long as B is large enough. In other words, no player contains the SWK w, but rather a modified version of it. Because the players use correlation-based watermark detection, they should still be capable of detecting the watermark in a marked content y

{i} Indices of collusion participants

Correlation detector

Correlation detector

Fingerprint detector

Watermark detector (in media player)

Figure 1. General system block diagram for the proposed watermark?ngerprint system. Note that each watermark detector i uses a different detection key hi. In the attack model, a set of detection keys is colluded to form an estimate v of the watermark w.

linearly with the size N of the marked object. In addition, we can

augment our watermark-?ngerprint system with a segmentation layer. The media content is partitioned into S segments, in which media players as well as forensic analyzers can reliably detect a watermark or fingerprint. Only detection keys that belong to the same segment can participate in the collusion clique. With segmentation, the minimum collusion size K grows as O(N log N). Therefore, with or without segmentation, our watermark-?ngerprint system significantly improves on the best-known asymptotic resistance to (fingerprint) collusion attacks of about O(N1/4).1 Because we use a new protection protocol, comparing our system to classic fingerprint systems might seem unfair. However, such a comparison is important because the two technologies share a common goal: multimedia copyright enforcement. Our aim in this article is to characterize the collusion attacks against this system under the assumption that watermark detection is robust against signal-processing attacks on the protected object. To the best of our knowledge, no modern watermark technology demonstrates such robustness. To date, attacks such as Stirmark,2 the esti-

IEEE MultiMedia 60

Authorized licensed use limited to: Georgia State University. Downloaded on December 19, 2008 at 16:49 from IEEE Xplore. Restrictions apply.

as long as the number of chips N is large enough to attenuate the noise introduced by the watermark carriers ci. The detection process involves correlating the received media file ? with hi, which generates a y detector output dW = y . hi. Similarly to traditional ? spread-spectrum watermarking, if y is marked, dW ? = 1 + gW; otherwise dW = 0 + gW. The difference is that now gW is a function of both the media signal x and the watermark carrier ci. If there are no attacks?ªthat is, y = y = x + w?ªthen ? dW = y + hi = (x + w) (w + ci) = 1 + gW, where gW = s (w + ci) + w ? ci is a zero-mean noise component of dW. For this case, we derive the detection noise variance as ?Ò 2 ?Ö (A2 + B2 + A2B2 )/N. gW In the remainder of the article, we assume equality in the expression for detection noise variance ?ÒgW. Because the detection noise variance is significantly increased because of the watermark carrier ci (if y is watermarked, then gW ? = gT + x . ci + w . ci or else gW = gT + x . ci, where Var[x . ci] >> Var[gT + w . ci]), our watermark-fingerprint system requires larger N than traditional spread spectrum for the same watermark detector performance. We can improve detector performance by generating watermark carriers such that (?ci ?Ê C)w . ci = 0. Although such carriers do not pose any tangible effect on system security, they reduce the detection noise while screening watermarked clips. Copyright enforcement Our dual watermark-?ngerprint system comprises a number of entities that combine to enforce copyright protection. Watermark detector (WMD). The WMD

correlates a potentially marked signal y with the ? client??s WDK hi?ªthat is, dW = y .hi. It decides that ? the content is marked if dW > ?ÄW. The probability of false positives (identifying an unmarked content as marked) is denoted as ?Å1, which must be relatively small?ªfor example, ?Å1 = 10?C9. Attacker. As part of an optimal attack to the system, the adversary breaks K clients and extracts their WDKs {hi, i = 1 ?? K}. Next, the adversary creates an attack vector v as an optimal estimate of the SWK w given the collusion key set {hi, i = 1, ?? K}, and generates an attacked signal as ? = y ? v. y The closer v estimates w, the more the attacker

Spread-Spectrum Watermarking

In spread-spectrum watermarking, the media signal to be watermarked x ?Ê RN can be modeled as a random variable, in which each element of x is Gaussain random variable with standard deviation A?ªthat is, xj = N (0, A2). For audio signals, for example, A is typically within A ?Ê {5, 15}, after necessary media preprocessing steps. A watermark key w is a spread-spectrum (SS) sequence vector w ?Ê {?À 1}N, where each element wj is usually called a chip. Vector addition?ªy = x + w?ªcreates the marked signal y. A small modification of this embedding rule, or improved spread spectrum (ISS),1 leads to lower detection-error probability. To simplify the following analysis, we assume the use of traditional spread spectrum. Let w ? v denote the normalized inner product of vectors w and v?ªthat is, w ? v ?Ô N?C1?Æwjvj, with w2 ?Ô w ? w. For example, for w as defined previously, we have w2 = 1. We assume the media player contains a watermark detector that receives a modi?ed version y of the watermarked signal y. The ? watermark detector performs a correlation (or matched ?lter) test dT = y .w, ? and, using a classical Neyman-Pearson hypothesis test, decides that the watermark is present if dT > ?ÄT, where ?ÄT is the detection threshold controlling the tradeoff between the false positive probabilities and false negative decisions. Modulation and detection theory have shown that such a detector is optimal.2 With no malicious attacks or other signal modi?cations (that is, y = y), if ? the signal y is marked, dT = 1 + gT where the detection noise gT is a Gaussian zero-mean random variable with variance ?Ò 2T = A2/N. Otherwise, the correg lation test yields dT = 0 + gT. For equal probabilities of false positives and false negatives, we should set ?ÄT = 1/2. For robustness against attacks, we must appropriately choose the signal domain x, and might have to make some small modi?cations on the watermark pattern. In this article, we assume that the designers of the watermark detector have considered these precautions, so we can disregard media attacks. Finally, modeling the host signal x as a Gaussian random variable is a realistic assumption because most watermarking technologies replicate watermark chips to provide robustness to desynchronization attacks. From the watermark

detector??s perspective, chip replication is equivalent to averaging samples of the host signal. Consequently, averaged samples of the host signal, due to the central limit theorem, can be relatively accurately approximated using Gaussian distribution, regardless of the distribution of the original host signal samples.

References

1. H.S. Malvar and D. Flor?ºncio, ??Improved Spread Spectrum: A New Modulation Technique for Robust Watermarking,?? IEEE Trans. Signal Processing, vol. 51, no. 4, Apr. 2003, pp. 898-905. 2. H.L. van Trees, Detection, Estimation, and Modulation Theory, Part I, John Wiley & Sons, 1968.

removes the watermark while generating ? . We y use ?Å2 to denote the probability that a watermark chip is incorrectly estimated by the attacker?ªthat is, ?Å2 = Pr[vj ?Ù wj]. The attacker aims at forcing ?Å2 to be as small as possible, whereas we design the system parameters such that ?Å2 is near 1/2.

61

Authorized licensed use limited to: Georgia State University. Downloaded on December 19, 2008 at 16:49 from IEEE Xplore. Restrictions apply.

Fingerprint detector (FPD). The FPD recovers the attack vector v from an attacked content y and the originally marked content y simply by ? v = y ? y. Unlike the WMDs, the FPD has access to ? the watermark carrier matrix C. Thus, the FPD correlates v with a suspect watermark carrier ci?ª that is, it computes dF = v ? ci and decides that the ith client is part of the collusion if dF > ?ÄF (in other words, ?ÄF is the FPD threshold). The FPD has less noise in the correlated vectors than the WMD, and thus the collusion resistance of the FPD is much higher. We use ?Å3 to denote the probability of false positives in the FPD?ªthat is, incriminating a player that was not in the collusion set. Therefore, ?Å3 must be very small, like ?Å1. We use ?Ç to denote the probability of false negatives at the FPD. We would like it to be small, but do not insist that it is as small as ?Å1 and ?Å3. Ultimately, the adversary??s goal is to create an optimal attack vector v ?Ö w based on a collection of K WDKs such that ? v reduces the expected E[dW] to a level at which the probability of detecting a true positive at the WMD is relatively low (?Ö ?Å1) and ? v reduces the expected E[dF] to a level at which the likelihood of a false negative at the FPD is relatively high (for example, ?Ç ?Ý 0.9).

Attacks without collusion

An adversary with knowledge of at most one WDK can perform several attacks on an object. Attacks on protected objects A basic assumption of our watermark-fingerprint mechanism is that there exists a spreadspectrum watermarking mechanism that can be broken only by

modifying the marked content beyond the threshold for low fidelity of the attacked copy with respect to the original recording.3 Typical attacks in this domain range from compression, ?ltering, resampling, equalization, and various other editing procedures,7 to

desynchronization (or data shifting) techniques that aim to misalign the embedded spread-spectrum sequence in the content (such as the Stirmark attack2). Robustness to desynchronization attacks can be achieved using a certain amount of chip redundancy.3 However, spread-spectrum chip replication also improves the ef?cacy of a watermark estimation attack.3 In addition, the fact that both audio and video are highly repetitive phe-

nomena has spurred a highly successful new generation of swap attacks, which replace relatively lengthy watermarked blocks of audio or video with perceptually similar blocks found elsewhere and hence, not marked with the corresponding watermark.4 It is difficult to achieve protection against such attacks. The basic assumption of this article is that if such improvements to watermaking techniques are ever developed, the attacker would be forced to focus on breaking the player to retrieve the watermark detection keys. Having a robust watermarking technology is not the only requirement for secure e-commerce transactions of multimedia. Traditional watermarking assumes that the SWK is hidden at the client side. By breaking a single client, the adversary can create the original content and thus allow all clients to play the content as unmarked. We refer to this as BORE: break once, run everywhere. In our system, we assume the attacker will eventually break at least one client and capture that machine??s WDK. For the economic viability of the dual watermark-fingerprint system, the effort of breaking a client should be difficult to automate: a trustworthy operating system would bar patching, software debugging, and reverse engineering as simple ways to obtain machines?? WDKs. (Trustworthy computing is usually de?ned as storage and computation that cannot be altered or intercepted by any system user without tampering with the computing hardware.) Similarly, watermark detectors should deter trivial implementations of the sensitivity attack as a way to retrieve the detection key, as we discuss later. Our scheme is generally BORE-resistant at the protocol level. By breaking a single client, the adversary can play content as unmarked on that broken client, but must collude the extracted client WDKs with other clients to finally create content that can play on all players. With our dual

watermark-fingerprint system, we significantly improve collusion resistance through a ?ngerprinting mechanism that can identify the members of the clique if the clique??s cardinality is smaller than a relatively large lower bound. Subtraction attack Suppose an adversary breaks client i and extracts its WDK hi = ci + w. The adversary can

then create an attack vector v = ?Áhi such that the modified media y = y ? v produces E[dw] = ? E[ y . hi] ?Äw, thus defeating that client??s water? mark detector. To determine ?Á, we note that

IEEE MultiMedia 62

dW = y ? hi = [ x + w ? ?Á (ci + w )] ? (ci + w ) = 1 ? ?Á (1 + ci2 ) + x ? ci + x ? w + (1 ? 2?Á )ci ? w

0.5

pdf (dW) E [dW] = 2?Å2

y=x+w E [dW] = 1

0.4

E [dW] = 0

Thus, by setting ?Á = (1 + B2)?C1, we get E[dW] = 0?ª that is, dW = 0 + gW. We also see that ?Ò2w g 2 (3+A 2+B 2+A 2B 2)/N and ?Ò v = ?Á2(1+B 2) = ?Á 1. Therefore, given knowledge of the client??s detection key, the subtraction attack can drive the detector correlation all the way to zero with just a slight increase in the detector noise ?ÒgW2 and a negligible increase in content distortion 2 (because ?Ò v w 2=1). If the attacker tries to use a key hl to break a detector i ?Ù l, to drive E[dW] = 0, the attacker would need to set ?Á = 1. However, 2 this would drive ?Ò v = (1+B 2) 1, causing too much content distortion. It would also make ?Ò2w g increase by an amount equal to 3B4/N, which would make the decisions in the ith watermark detector erratic. In other words, even by driving E[dW] = 0, the ith detector cannot be broken with probability much better than 1/2. Resemblance to public-key systems We have concluded that the attacker??s knowledge of a single detector??s WDK hi is not suf?cient to break any other detector via the key subtraction attack. Knowing hi is not enough to infer w either. In that respect, our dual watermark/ fingerprint system resembles a public-key cryptosystem, because knowledge of the veri?cation key (hi) does not imply knowledge of the signing key (w). However, unlike public-key cryptosystems, our system does not expose the WDK outside an individual player.

0.3

=x+w?v y

0.2

0.1

y=x

0 ?0.5

dw 0

?Ä w = 0.5

1.0

1.5

Proof. The optimal estimate for each element vj of the attack vector is given by vj = +1 if Pr[wj = +1|{hi}] ?Ý 1/2, and vj = ?C1 if Pr[wj = +1|{hi}] < 1/2. This estimate is optimal because it minimizes Pr[vj wj]. Because hij = wj + cij, where cij is independent and Gaussian, we can write Pr[wj = +1|{hi}] = 1/(1 + ?Íj), where vj=??K 1pc(hij?1) and pc(?Æ )=(2?Ð)? i= 1/2 exp[??Æ2/(2B2)]. We can write ?Íj = exp(?C2?Ñj/B2), where ?Ñj = ??K hij. Thus, Pr[wj = +1|{hi} ?Ý 1/2] when i=1 ?Ñj ?Ý 0, and Pr[wj = +1|{hi] < 1/2] when ?Ñj < 0. ?ö WMD performance Given the optimal attack, we can compute the average estimation error in the attack vector, ?Å2 = Pr[vj ?Ù wj], as follows. Because the wj chips are equally likely to be +1 or ?C1, due to the symmetry of w, ?Å2 = Pr[sj ?Ý 0|wj = ?C1]. Because for wj = ?C1, we have sj = ?K + cj, where?cj = ??K cij. Therefore, i=1 ? ?Å2=Pr[ cj ?ÝK], where?cj has a Gaussian distribution ? with zero mean and variance B?ÌK. ? Corollary 1. A collusion of size K produces ? K ? 1 erfc ? ? 2 ?B 2?

Figure 2. The probability density functions for correlation tests against three signals: a nonmarked ? y=x, marked ? y=x+w, and attacked using the v = sign(mean(?)) attack ? y=x+w ? v. All correlation test deviations approximate ?ÒgW?ÖAB/?ÌN. We set ? exemplary detection threshold to ?ÄW=1/2.

Collusion attacks

Consider a collusion clique of size K that has broken its players and extracted K different WDKs hi. We devise the optimal attack based on that set of keys {hi, i = 1, 2, ??, K}. Without loss of generality, we assume that the extracted WDKs (with indices 1 to K) are those in the collusion. Optimal attack The attacker??s job is to estimate the SWK w by an attack vector v so that the modified media y = y?v will not show significant correlation in ? any watermark detector j, not even for j > K. The best job the attacker can perform is given by the sign(mean(??)) attack. Lemma 1. The optimal attack is performed using the vector v = sign(??K= 1hi). i

?Å2 =

Given ?Å2, we can evaluate the ef?ciency of the subtraction attack y=y?v for the optimal attack ? vector v. Because E[v ? w] = Pr[vj = wj] ?C Pr[vj ?Ù wj] = 1 ?C 2?Å2, after attack the expected outcome of the watermark correlation detector drops to E[dW] = 2?Å2. Figure 2 depicts the resulting probability density functions (PDFs) for dW when computed against an original, marked, and attacked signal. The attacker might attempt a stronger subtrac-

July?CSeptember 2004 63

Figure 3. Corollary 2??s claim. The ?gure depicts the collusion size K required to reduce the correlation value to E[dW] = ?È.

100

90

80

70 Collusion cardinality K

60

50

40

30

20 ?Â = {1, 1.1, ??, 2} 10

0 0 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 Correlation after the attack E[dw = ?È]

tion attack, of the form y=y ? ?Âv with ?Â > 1, to ? bring the WMD output down further, to E[dW] = 2?Â?Å2 ?C (?Â ?C 1). As long as ?Â is not too large, the attacked content y might be acceptable to users. ? Defeating the WMD To reduce the expected correlation to E[dW] = ?È, the adversary must achieve an attack vector error rate of ?Å2 = (?È + ?Â ?C 1)/(2?Â) through collusion. From Corollary 1, we see that for fixed ?È and ?Â the minimum collusion size grows proportional to B2. Corollary 2. To reduce the correlation value to E[dW] = ?È, the adversary must collude K WDKs, with ? ? 1? ?È? ? K = 2 B ?erf -1 ? ? ? ?Â ?? ?? ? ?

2 2

or the probability that a WMD will still detect the watermark will not be low enough to justify the attacker??s effort. In other words, the attack is successful only if it makes ?Å1 ?Ö 1. It is not necessary to set ?È all the way to zero because doing so would require an excessively large K. By setting ?Â > 1, however, we can force ?È = 0. To make the attacker??s job more dif?cult, we increase parameter B, the standard deviation of the watermark carrier c, because K grows with B2. In doing so, however, we increase the detection noise variance ?Ò 2 = (A2 + B2 + A2B2)/N, where A is the gW standard deviation of the original content x and N is the object size. For a given ?ÒgW, we determine that the probability of false positives ?Å1 = Pr[dw > ?Äw|object is not marked] is given by Corollary 3. Corollary 3. An object of size N produces ? ? ?ÄW N 1 ? erfc ? 2 ? 2( A2 + B2 + A2 B2 ) ? ? ?

IEEE MultiMedia

Example 1. For B = 10, ?È = 0.25, and ?Â = 2, the attacker must collude at least K = 24 keys. For ?Â = 1, the attacker must collude at least K = 133 keys. Figure 3 illustrates the dependency of K with respect to ?È and ?Â for B = 10. The attacker must set ?È much smaller than ?ÄW,

?Å1 =

If ?ÄW = 1/2, e1 is also the probability of false negatives?ªthat