Technical measures (part V): content identification
This week Future of Copyright will devote specific attention to technical measures against piracy that can be employed by ISPs. Today we discuss the pros and cons of filtering by means of content identification.
One last measure to combat harmful and illegal content is automatic file recognition. If files can be identified automatically, transfers of these files can subsequently be blocked (filtering). It also becomes possible to search P2P networks or Usenet for illegally distributed content (scanning). Several techniques can be used to make files identifiable, such as hash matching, fingerprinting and watermarking.
With hash matching, a hash function is applied to a file to generate a fixed-length alphanumeric string (the hash value). For practical purposes, this hash value is unique to that particular file and all its exact copies. Possible matches can be found by calculating the hash values of certain files and comparing them to a database of known hash values. A match indicates that the file is bit-for-bit identical to a reference file that has already been flagged as illegal. It is important to note that hash values depend on the bits files are composed of, not on their contents as a viewer or listener perceives them. When a file is converted from .mpg to .avi, for example, its hash value will no longer correspond to that of the original.
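The matching step can be illustrated with a minimal sketch. It assumes SHA-256 as the hash function and an in-memory set as the "database"; both are choices made for illustration, and the helper names `file_hash` and `is_known` are hypothetical. A real filtering system would differ in many respects.

```python
import hashlib

def file_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()

def is_known(data: bytes, known_hashes: set) -> bool:
    """Check whether the file's hash appears in the reference set."""
    return file_hash(data) in known_hashes

# Hypothetical reference file and its entry in the hash database.
original = b"example video payload"
known_hashes = {file_hash(original)}

# An exact copy has the same bits, so it produces the same hash.
exact_copy = bytes(original)

# Stand-in for a re-encoded file: same "content", different bits.
re_encoded = original + b"\x00"

print(is_known(exact_copy, known_hashes))
print(is_known(re_encoded, known_hashes))
```

The exact copy is found in the reference set, while the modified file is not, even though only a single byte differs. This is the weakness the paragraph above describes: any change to the bits produces an unrelated hash value.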
Fingerprinting is a process similar to hash matching. Instead of identifying bits, fingerprinting identifies contents. A fingerprint is derived from characteristics of the content itself (what the audio sounds like or what the video looks like), so it generally still matches when a file is subsequently altered, for example by conversion to another format. As in hash matching, the fingerprints of certain files can be compared to a database of known fingerprints. Watermarking, by contrast, embeds unique and (preferably) imperceptible digital watermarks in files before they are distributed. When watermarked files are found, they can be identified by checking the extracted watermark against a database of known watermarks.
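The difference between matching bits and matching contents can be shown with a toy example. The sketch below is not how any actual fingerprinting product works; it assumes a made-up scheme in which a list of audio samples is split into blocks and the fingerprint records only whether the energy rises from one block to the next. The function name `fingerprint` and the sample values are hypothetical.

```python
import hashlib

def fingerprint(samples, block=4):
    """Toy content fingerprint: one bit per pair of adjacent blocks,
    set to 1 when the energy rises from one block to the next."""
    energies = [
        sum(abs(s) for s in samples[i:i + block])
        for i in range(0, len(samples) - block + 1, block)
    ]
    return [1 if later > earlier else 0
            for earlier, later in zip(energies, energies[1:])]

def byte_hash(samples):
    """Bit-level hash of the same samples, for comparison."""
    return hashlib.sha256(repr(samples).encode()).hexdigest()

# Hypothetical audio samples and a copy played back at half volume.
original = [1, 2, 3, 4, 9, 8, 7, 6, 2, 2, 1, 1, 5, 6, 7, 8]
quieter = [s * 0.5 for s in original]

# The bits differ, so the hashes differ ...
print(byte_hash(original) == byte_hash(quieter))

# ... but the rise-and-fall pattern of the content is unchanged.
print(fingerprint(original) == fingerprint(quieter))
```

In this sketch the altered copy no longer matches by hash but still matches by fingerprint, because halving the volume does not change which blocks are louder than their predecessors. Production systems use far more robust features and tolerate partial mismatches rather than requiring an exact match.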
The good thing about content recognition is that it enables the identification and blocking of harmful and illegal content without impeding the legitimate transfer of files. However, setting up an effective system for content recognition can be daunting. Even though several providers are already using content recognition systems to filter out child pornography, there is no clear-cut solution for filtering out copyrighted works, which is a rather more complicated matter. Copyrighted materials often come in multiple large files, which means they are often transferred in fragments. This not only makes file recognition much harder, but it also requires significantly more computing power.
ISPs have indicated that implementing content recognition systems is a definite possibility, but they have also stated that they would not be able to shoulder the costs of the additional computing power on their own. Another problem is that content recognition systems for copyrighted materials are not standardised at the moment, which means ISPs will have to implement several different systems. They will also need to consult with numerous copyright owners and hardware and software suppliers. A cooperative body consisting of copyright owners and suppliers would help solve this problem, providing ISPs with a central contact point and access to a larger database with reference files.
