Blockchain content licensing for Cognitive Computing

Posted on 23 December 2015 by Cristiano Solarino

At the time of this writing there is no satisfactory business model that addresses licensing of copyrighted content for use in Cognitive Computing Applications (CCAs) and APIs such as IBM Watson™. We propose a novel solution based on Blockchain technology that simultaneously guarantees proof of ownership of the content, incentivises the content owner/publisher to distribute via the new channel, and guarantees pricing fairness and cost control for CCA implementors through accurate measurement of actual, real-time content usage.

Introduction

One class of Cognitive Computing Application (CCA) is Question/Answer (Q/A). It is a revision of the expert system paradigm whereby the underlying expertise is accessed through question-based interactions. Users can formulate their question in the manner that is most natural to them and that befits the context of the interaction. This is possible thanks to improvements in technologies like speech recognition, natural language processing and efficient analysis of large amounts of data.

Once the question is processed, a Q/A cognitive system will generally sample candidate passages from the knowledge corpus and compute a degree of confidence in their relevance as answers.

The knowledge corpus is unquestionably the fundamental component of this class of Cognitive Computing Application and also its Achilles’ heel. As CCAs grow in scope and their applicability extends beyond very narrow domains, they will require a proportional, if not exponential, increase in the amount of content needed and in the variety of sources from which it is gathered. The quality of the source material will also have to meet the demands of CCA users, especially in those areas where quality is critical (healthcare, finance, law, etc.).

Most CCA implementors are unlikely to be content producers themselves. Those that cannot rely on scraping free content from the internet will therefore be in the difficult position of having to negotiate with third parties for the usage, acquisition or licensing of the required material. The problem that content publishers face is how to monetise the peculiar way in which CCAs distribute content; the problem for CCA implementors is that standard content licensing models are wholly inapplicable and costly relative to the actual use of the content.

CCAs and their corpus

Let’s assume a Q/A CCA system. Its core component is the content corpus C, a set of documents C = {d1, d2, ..., dn}. Typically a single (possibly copyrighted) content resource, say a book, an academic paper, or a journal, will have to be pre-processed or curated before it can be added to C. This usually means stripping tables, images and other noise (headings, adverts, etc.) in order to retain only the textual information. The cleaner version of the resource may additionally be reformatted in terms of headings, paragraphs and sections, and may be broken down into several documents. It is easy to see that each resource’s former value will have been diluted to varying degrees by this operation.

Q/A CCAs are probabilistic systems, meaning that, given a question q, each document in C has a probability p(D = di|q) of being sampled from for an answer. The training process consists in presenting the cognitive module with a realistic set of questions, sampled from the underlying probability distribution Q over the set of all possible questions that could be asked by users of the CCA, and matching each with an appropriate excerpt from some di. Before a CCA is released, a testing process will employ domain experts to produce new samples of questions and rate answers in order to update and tweak the posterior distribution p(D|Q).
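To make the role of Q and p(D = di|q) concrete, the following is a minimal sketch of how the overall chance of a document being selected could be derived from the two. The question types, the probabilities and the function name marginal_selection_probability are all hypothetical values chosen for illustration; a real CCA would estimate these quantities from training and usage data.

```python
from fractions import Fraction

# Hypothetical question distribution Q over three question types.
Q = {"q1": Fraction(1, 2), "q2": Fraction(3, 10), "q3": Fraction(1, 5)}

# Hypothetical conditionals p(D = d_i | q); each row sums to 1.
p_doc_given_q = {
    "q1": {"d1": Fraction(7, 10), "d2": Fraction(2, 10), "d3": Fraction(1, 10)},
    "q2": {"d1": Fraction(1, 10), "d2": Fraction(8, 10), "d3": Fraction(1, 10)},
    "q3": {"d1": Fraction(1, 10), "d2": Fraction(1, 10), "d3": Fraction(8, 10)},
}


def marginal_selection_probability(Q, p_doc_given_q):
    """Return p(D = d_i) = sum over q of Q(q) * p(D = d_i | q)."""
    marginal = {}
    for q, weight in Q.items():
        for doc, prob in p_doc_given_q[q].items():
            marginal[doc] = marginal.get(doc, Fraction(0)) + weight * prob
    return marginal


p_doc = marginal_selection_probability(Q, p_doc_given_q)
for doc, prob in sorted(p_doc.items()):
    print(doc, float(prob))
```

With these assumed numbers, d1 is selected for 40% of questions even though it dominates only one question type, which illustrates how strongly the outcome depends on Q.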

The crucial point for our present purpose is that, at a given time, a resource has only a certain probability of being picked to produce an answer and, if picked, only a fraction of the document (a small paragraph) is ever returned as the suggested answer. The probability of a resource being selected depends ultimately on:

  • the total number of distinct resources submitted to the corpus
  • the result of the pre-processing operation
  • the probability distribution Q
  • the result of the training process
  • the result of the testing process
  • the interaction frequency of users with the CCA

Additionally, the total number of resources will very likely increase and/or change over time, impacting all other CCA processes. Note also that, depending on the distribution Q, for any given resource it may only be the same small portion of the resource that is ever selected. This drastically reduces the value of the rest of the resource to the CCA and its users.

Content monetisation

In light of the above, is it reasonable to maintain that a fair monetisation scheme for the use of a single resource can be implemented a priori, based on its market value as a purchasable, whole product? Note that a fair monetisation scheme must be one that protects both parties. Very popular CCAs could potentially reach user bases of millions. Suppose the corpus comprises a single resource: it could indeed be detrimental for the resource provider / publisher to be naive about content usage.

In any case, one important point that must be understood is that CCAs create a novel channel of distribution for content providers and therefore an additional source of revenue that is not mutually exclusive with existing revenue streams simply because the patterns of usage and interaction with the resource are completely different. For this reason it is counterproductive for content providers to entertain overly greedy or defensive monetisation schemes as long as the ownership and copyrights of the material are duly protected.

What requirements can we draw for a monetisation scheme of licensable, copyrighted content that is applicable in the context of CCAs? We think that:

Requirement 1: It must be fair for all parties.

Requirement 2: It must adapt to the probabilistic nature of CCAs as distribution channels.

Requirement 3: It must be robust and secure, i.e. impervious to malicious manipulation (“gaming the system”), breach of ownership, breach of copyright, etc.

Requirement 4: It must be auditable – audit trails can be used as additional evidence in matters of dispute.

Requirement 5: It must incentivise content providers to use CCAs as additional distribution channels.

Requirement 6: It must incentivise CCA implementors to adopt the scheme.

Requirement 7: It must provide a means of cost control.

Proposal for a solution

We propose that a monetisation scheme conforming to the requirements identified above, and applicable to CCAs as content distribution channels, be implemented in the form of a Distributed Application (Dapp) built on top of a blockchain, specifically a blockchain technology that supports smart contracts.

Blockchains and smart contracts

From Monax explainers:

A blockchain network validates data-driven transactions while also preventing the incorporation of unauthorised transactions. (…) A blockchain network operates as a distributed data store, meaning there is no single master node within the network; rather, every node is an equal peer.

(…)

Modern blockchain designs are capable of storing arbitrary data and establishing permissions to modify that data through self-administering and self-executing scripts which are performed by a distributed virtual machine. These scripts are known as smart contracts, and they allow platform operators to define complex and fully customisable rules which govern the blockchain’s interaction with its users.

Fairness

We propose a blockchain because of its decentralised nature: the system is formed entirely of identical peers with no single point of entry, which removes the need for fee-charging middleman services or for systems built with the vested interest of one party or the other in mind. Furthermore, the use of independent, automated, chain-bound smart contracts for processing, validating and dispatching transactions greatly reduces the scope for bias, corruption or malpractice.

Adaptability

The only way for a monetisation scheme to be adaptable in these circumstances is to be as dynamic as the system it is intended to quantify. Therefore, we suggest that each Q/A transaction within a CCA be mapped to a corresponding blockchain transaction. From these transactions we would expect to be able to gather at least:

  • who to debit (e.g. CCA provider),
  • who to credit (e.g. document copyright owner), and
  • how much value to debit/credit (e.g. depending on document selected, extent of excerpt returned, etc.).
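The three items above can be sketched as a small record type. This is only an illustration: the class name UsageTransaction, the field names, and the pricing rule (a flat rate per started block of 100 characters) are assumptions made for this example and are not part of any existing blockchain API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageTransaction:
    debtor: str          # who to debit, e.g. the CCA provider
    creditor: str        # who to credit, e.g. the document copyright owner
    document_id: str     # which document the answer was drawn from
    excerpt_chars: int   # extent of the excerpt returned to the user
    amount: int          # value to debit/credit, in whole token units


def price_excerpt(excerpt_chars: int, rate_per_100_chars: int) -> int:
    """Hypothetical pricing rule: charge for every started block of 100 characters."""
    blocks = -(-excerpt_chars // 100)  # ceiling division on integers
    return blocks * rate_per_100_chars


def record_answer(cca_id: str, owner_id: str, document_id: str,
                  excerpt: str, rate_per_100_chars: int) -> UsageTransaction:
    """Map one Q/A answer to the transaction that would be submitted on-chain."""
    chars = len(excerpt)
    return UsageTransaction(
        debtor=cca_id,
        creditor=owner_id,
        document_id=document_id,
        excerpt_chars=chars,
        amount=price_excerpt(chars, rate_per_100_chars),
    )


# A 250-character excerpt at 2 tokens per started 100 characters.
tx = record_answer("cca-1", "publisher-1", "d1", "x" * 250, rate_per_100_chars=2)
print(tx)
```

Integer token units are used rather than floating-point amounts so that debits and credits always balance exactly.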

We would additionally like to imbue them with the capability to support:

  • expenditure monitoring and control (e.g. CCA-defined caps), and
  • possibly the means for content providers to be safeguarded from improper use of their content.

Robustness, security and auditing

Because of its distributed nature, a blockchain database is robust against denial of service and has no single point of failure. Blockchains are also designed to guarantee cryptographically that the chain with the most consensus reflects the true transaction history from genesis to the current block. Additionally, in proof-of-work designs the update rule for a chain, that is, the way in which a new block is created, employs mathematical problems that are expensive to solve in terms of computing resources. This makes it infeasible in practice to forge blocks or to create alternative versions of the chain containing forged transactions. Unlike a traditional database, which typically stores only the latest state of a datum when a record is updated, a blockchain records the entire history of values and transitions, so the complete transaction history can be retrieved, which is ideal for auditing purposes.
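The audit property rests on each block committing to the hash of its predecessor. The sketch below illustrates only that hash-linking idea with Python's standard hashlib module; it deliberately omits consensus, proof of work and networking, and the block layout and function names are assumptions made for the example.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, prev_hash, payload):
    """Hash the block contents together with the previous block's hash."""
    body = json.dumps(
        {"index": index, "prev": prev_hash, "payload": payload}, sort_keys=True
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain, payload):
    """Append a block whose hash commits to the whole preceding history."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    chain.append({
        "index": index,
        "prev": prev_hash,
        "payload": payload,
        "hash": block_hash(index, prev_hash, payload),
    })


def verify_chain(chain):
    """Recompute every hash from genesis; any edited block breaks the chain."""
    prev_hash = GENESIS_PREV
    for index, block in enumerate(chain):
        if block["index"] != index or block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(index, prev_hash, block["payload"]):
            return False
        prev_hash = block["hash"]
    return True


ledger = []
append_block(ledger, {"debtor": "cca-1", "creditor": "publisher-1", "amount": 6})
append_block(ledger, {"debtor": "cca-1", "creditor": "publisher-2", "amount": 4})
print(verify_chain(ledger))
```

Changing the payload of an earlier block without recomputing every later hash makes verify_chain return False, which is what allows an auditor to trust a retrieved history.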

Consider finally a strategy whereby smart contracts record ownership and copyright information from the content provider before a CCA issues any transactions. When a transaction is issued, other smart contracts communicate with the ownership record contract to establish the validity of the transaction, and either pass it or reject it and penalise the issuer.
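A minimal sketch of this strategy follows, written as an ordinary Python class standing in for the ownership record contract. The interface (register, validate), the penalty of one unit per rejected transaction, and the identifiers used are hypothetical; an actual contract would be written in the blockchain platform's own contract language.

```python
class OwnershipRegistry:
    """Stand-in for an ownership record contract (hypothetical interface)."""

    def __init__(self):
        self._owners = {}     # document_id -> registered owner
        self.penalties = {}   # issuer -> accumulated penalty units

    def register(self, document_id, owner):
        # Ownership is recorded once, before any CCA issues transactions.
        if document_id in self._owners:
            raise ValueError(f"{document_id} is already registered")
        self._owners[document_id] = owner

    def validate(self, issuer, document_id, claimed_owner, penalty=1):
        """Pass the transaction if it credits the registered owner.

        Otherwise reject it and record a penalty against the issuer.
        """
        registered = self._owners.get(document_id)
        if registered is not None and registered == claimed_owner:
            return True
        self.penalties[issuer] = self.penalties.get(issuer, 0) + penalty
        return False


registry = OwnershipRegistry()
registry.register("d1", "publisher-1")

accepted = registry.validate("cca-1", "d1", "publisher-1")
rejected = registry.validate("cca-1", "d1", "someone-else")
print(accepted, rejected, registry.penalties)
```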

Incentivisation

Blockchain technology makes it remarkably easy to create token systems, from tradeable sub-currencies to systems with no corresponding real-world value. Coupled with smart contracts that compute, collect and dispense fees, this would enable on-chain, automated payments for the fair use of resources. Alternatively, it could provide reward token schemes for both CCA transaction issuers and content providers that appropriately incentivise both parties to participate in blockchain-enabled CCA transactions.
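As a rough illustration of the two kinds of token mentioned here, the sketch below keeps one balance sheet for a fee token and a separate one for a reward token. The class TokenLedger, the function settle_answer and the policy of issuing one reward token to each party per answer are assumptions made for the example, not a description of any particular token standard.

```python
class TokenLedger:
    """Toy token balance sheet; a real token would live in a smart contract."""

    def __init__(self, balances=None):
        self.balances = dict(balances or {})

    def transfer(self, sender, recipient, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        if self.balances.get(sender, 0) < amount:
            raise ValueError("insufficient balance")
        self.balances[sender] -= amount
        self.balances[recipient] = self.balances.get(recipient, 0) + amount

    def issue(self, account, amount):
        # Newly created tokens, e.g. rewards with no real-world value.
        self.balances[account] = self.balances.get(account, 0) + amount


def settle_answer(fee_tokens, reward_tokens, cca, owner, fee, reward_each=1):
    """Pay the usage fee, then reward both parties (hypothetical policy)."""
    fee_tokens.transfer(cca, owner, fee)
    reward_tokens.issue(cca, reward_each)
    reward_tokens.issue(owner, reward_each)


fee_tokens = TokenLedger({"cca-1": 100, "publisher-1": 0})
reward_tokens = TokenLedger()
settle_answer(fee_tokens, reward_tokens, "cca-1", "publisher-1", fee=6)
print(fee_tokens.balances, reward_tokens.balances)
```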

Cost control

Smart contracts are essentially autonomous agents that can act on behalf of an entity, be it a Dapp (distributed application) or even another smart contract. Special contracts can be created whereby CCA transaction issuers can set cost bounds. Whenever a cost bound would be breached, the smart contract can infer from the context whether to update the bound or block the transaction altogether and notify the originating CCA, which in turn may decide to filter out the blocked content from subsequent answers / responses.
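The blocking branch of such a cost-bound contract might look like the sketch below. It covers only the simplest policy (reject any transaction that would exceed a fixed cap and remember the document involved); the decision of whether to raise the bound instead is left out, and the names SpendingCap and filter_candidates are hypothetical.

```python
class SpendingCap:
    """Stand-in for a cost-bound contract set by a CCA transaction issuer."""

    def __init__(self, cap):
        self.cap = cap                  # maximum total spend, in token units
        self.spent = 0                  # running total of authorised spend
        self.blocked_documents = set()  # documents whose use was refused

    def authorise(self, document_id, amount):
        """Approve the spend if it fits under the cap, otherwise block it."""
        if self.spent + amount > self.cap:
            self.blocked_documents.add(document_id)
            return False
        self.spent += amount
        return True


def filter_candidates(candidates, cap):
    """What the CCA might do on notification: drop blocked documents."""
    return [doc for doc in candidates if doc not in cap.blocked_documents]


cap = SpendingCap(cap=10)
first = cap.authorise("d1", 6)    # fits: 6 of 10 spent
second = cap.authorise("d2", 6)   # would reach 12, so it is blocked
third = cap.authorise("d3", 4)    # fits exactly: 10 of 10 spent
remaining = filter_candidates(["d1", "d2", "d3"], cap)
print(first, second, third, remaining)
```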

Conclusion

We propose a blockchain-based content monetisation scheme that is appropriate for the requirements of Cognitive Computing Applications. At the heart of the scheme is the real-time mapping between a CCA transaction (a user asking a question of the Q/A module) and a blockchain transaction that contains the source of the transaction, the source of the content and the extent of the resource that was distributed. Once the value of a unit of content is established and agreed (the blockchain could also be used to manage auctions whereby CCAs can discover and use content providers with the most appropriate fees for their quality requirements and budget), the CCA transaction issuer is guaranteed that the costs incurred will follow directly from the probability distribution p(D|Q).
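The claim that costs follow from the selection probabilities can be illustrated with a short expected-cost calculation. The sketch assumes that every question returns exactly one excerpt, that each document carries a flat agreed fee per selection, and that the probabilities and fees shown are purely hypothetical.

```python
from fractions import Fraction

# Hypothetical overall selection probabilities p(D = d_i), summing to 1.
p_doc = {"d1": Fraction(2, 5), "d2": Fraction(9, 25), "d3": Fraction(6, 25)}

# Hypothetical agreed fee per selection, in token units.
fee = {"d1": 5, "d2": 10, "d3": 20}


def expected_cost_per_question(p_doc, fee):
    """E[cost] = sum over documents of p(D = d_i) * fee(d_i)."""
    return sum(p_doc[doc] * fee[doc] for doc in p_doc)


def expected_cost(p_doc, fee, n_questions):
    """Expected total cost over n_questions independent questions."""
    return n_questions * expected_cost_per_question(p_doc, fee)


per_question = expected_cost_per_question(p_doc, fee)
budget_1000 = expected_cost(p_doc, fee, 1000)
print(float(per_question), float(budget_1000))
```

Under these assumptions a CCA implementor could budget about 10.4 token units per question, whatever the list price of the underlying resources.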

Smart contracts would autonomously regulate the recording and maintenance of ownership and copyright on behalf of content providers and would validate transactions against these records. Smart contracts would also enable CCA transaction issuers to specify and monitor costs incurred in real time and react accordingly when a threshold has been reached. Blockchains support token systems, so on-chain fee payment can be implemented, as well as any other suitable token-based incentivisation scheme aimed at attracting both CCA implementors and content providers to the system. Ultimately, blockchains are designed from the ground up to be secure, robust and auditable.