Introduction
That data helps to generate value is a well-established idea [1] [2]: we talk about data as “the new oil”, and the concept of “big data” is widespread. While this idea applies to many areas of modern life, it is especially prominent in the financial sector, where data-based insights are crucial for making the right decisions and navigating uncertainty. Traders analyze data to generate insights, gain knowledge, and ultimately make better investments. Researchers analyze financial and consumer data to generate knowledge about our economy and society. Companies analyze data to forecast the development of their industries, to predict demand for new products, and to anticipate shocks. These applications turn data into a production factor that, once analyzed, leads to the generation of knowledge and value.
However, data is still not fully understood as a production factor, nor is it fully exploited. The central reason why data is not yet fully understood is that it is very different from traditional production factors such as capital, labor, and oil. The reason why it is not fully exploited is that its attributes make trading it in a data market difficult if not outright impossible, confine it to closed silos despite its digital nature, and stop organizations from maximizing its potential value.
New technologies such as blockchain and artificial intelligence, as well as a new conception of how data works (and more importantly, how data can work), are changing this situation. We are embarking on a transition from big data to shared data, in which the knowledge that emerges from data is starting to move securely through society. This is happening thanks to the design and implementation of data exchanges. Data exchanges are platforms that gather data from many different sources and allow third parties to run algorithms on these data. As a result, these third parties can generate insights (knowledge) from new sources of data. Hence, data exchanges give rise to the concept of shared data, which is the natural next step after big data.
Data exchanges are going to profoundly transform the way in which knowledge is generated, and they are going to open the horizon for the next level of data-based value generation. Specifically, they are going to set incentives for citizens and organizations to record and share novel data – data which is going to generate value without violating the privacy of any citizen or organization. This change is going to enable organizations to generate new knowledge by analyzing these novel sources of data. And the innovative part of this transformation is that it will occur without sensitive data from any organization or individual escaping the secure boundaries of its current confinement. A new wave of value is about to come.
In this chapter we guide the reader in understanding which characteristics make data so special as a production factor, and we reflect on how data exchanges are going to change the way in which knowledge is generated. In addition, we suggest how a data exchange could greatly enhance the value of the Federal Reserve Board’s Survey of Consumer Finances for businesses, policy makers, and researchers.
Data as a production factor
Data is a non-fungible production factor. The concept of data is very broad: one unit of data (for example, one MB) can contain information about almost anything that can be recorded, so units of data are not interchangeable. Some of these data might be useful for an organization, whereas others might not. Let us illustrate this idea with an example. When an investment fund receives one unit of investment (e.g. one USD), it is irrelevant to the fund which specific USD out of the many in circulation it receives. At the end of the day, that USD is going to be invested and will hopefully produce a return after some time. This makes capital a fungible production factor, since one USD is replaceable by any other dollar and the replacement does not affect the fund’s performance. However, should the same fund receive one unit of data (e.g. one MB of data) to develop a new trading algorithm, not all units of data will serve the fund equally. There are health-related data, financial data, geo-location data, weather data, public data, private data, curated data, noisy data, etc. Some of these data will help the fund to train its algorithms, and some will not. This makes data a non-fungible production factor.
Data tends to create value when it comes in big volumes. While small amounts of data can be valuable in very particular contexts, they are of little use for conducting business analytics, training an algorithm, or identifying trends. Hence, in many contexts, only the aggregation of data into big volumes has value for organizations.
Moreover, and contrary to capital and labor, data is non-exclusive in its use, meaning that the same unit of data can be used, for example, by many funds at the same time [3]. This is different from what happens with capital and labor, since a dollar can only be invested in one stock at a time, just as a worker’s hour of work can only take place at one firm.
Data changes over time. Data can change on a daily or even hourly basis, implying that older data can become obsolete and therefore that the newer the data is, the more valuable it might be. As an example, think of the price of stocks: anticipating the value of a stock is extremely valuable, whereas knowing the price at which a trade has already happened is much less valuable.
Data about a subject belongs to the subject itself. Data about an organization belongs to the organization itself. This implies that it is illegal or inappropriate to sell, share, exchange, or trade private data without the informed consent of the data owner.
In many cases, data is not created by one isolated instance, but by the interaction of two or more instances. Think of a trade. The price at which a stock is traded requires a buyer and a seller. Only the interaction between the buyer and the seller generates the trade and its associated data.
The nature of data
Data – contrary to capital and labor – is currently not openly traded in a market. Hence, organizations (firms, funds, associations, investors, banks, traders, policy makers, researchers, etc.) tend to work only with public data or with the data that they have generated within their own organizations [4].
The reason that no transparent data market, in which individuals sell their data individually to a third party, has yet emerged lies precisely in the attributes of data. First, data is non-fungible. Hence, an individual, a company, or an organization has no practical means of selling its data directly to a third party, simply because the potential buyer would need to audit the (probably unstructured) data before purchasing it in order to assess whether the data is adequate. While this is technically possible, it is a tedious process associated with high costs. Second, to train algorithms, an organization is only interested in purchasing big volumes of data, since in the majority of contexts only a high volume of data can result in insights. Third, it is illegal to offer third parties access to personal data that allows the identification of specific individuals without first obtaining the informed consent of those individuals. Fourth, the data of some organizations might be very sensitive, and these organizations might not want to give third parties direct access to it.
The fact that data cannot be securely and transparently traded represents a market friction: while capital, labor, and oil move freely in the open market to the firms at which they earn the highest returns (in the form of interest in the case of capital and oil, and in the form of wages in the case of labor), data does not. Only opaque deals take place, which prevents many organizations from engaging in the data market. As a result, not all organizations have access to the market, there is no standard for data purchases, and hence no efficient allocation of organizations’ resources is promoted.
Data exchanges
As a solution to the above-mentioned problems, and in order to enable compliant, efficient, and secure sharing and selling of data, data exchanges are emerging. Data exchanges are platforms that have permission to gather, curate, and aggregate data from many different sources (companies, universities, funds, banks, individuals, etc.) in order to allow third parties to gain insights (knowledge) from these data. Data exchanges are a layer between the individuals or organizations that own data and the third parties. By accounting for the specific characteristics of data as a production factor, data exchanges make it possible to generate insights from aggregated data in a structured, secure, and legal way. Moreover, because data exchanges aggregate the data of many different agents and are informed about the value of data, they can sell the data-based insights at an adequate price and distribute the resulting earnings among the individuals whose data has been used to generate those insights.
Specifically, data exchanges allow third parties to run their (privacy-audited) computer code on the exchange’s platform in order to analyze the data that belongs to the individuals. By doing so, data exchanges mitigate the fact that data is non-fungible, since they can apply the code to the specific data that is relevant for each third party. Moreover, since they aggregate data from many different agents, they provide the data volume that the algorithms need to generate value. Additionally, since data exchanges aggregate the interests of many data owners, they are in a stronger position when it comes to price assessment and negotiation. Finally, data exchanges offer insights that cannot be traced to any individual company, subject, or organization, which solves the problems of privacy and consent. The ultimate goal of data exchanges is to enable a transparent, efficient, and sustained trade of aggregated-data-based insights to which all organizations have equal access.
While the proposals for how to build a data exchange might differ from one another, they all share certain common characteristics. First, they directly or indirectly assume that data is non-fungible, and that therefore a bespoke analysis of data is necessary. Second, they assume that platforms’ users and firms’ clients are entitled to own a digital copy of the data that they produce and that this copy can be hosted at a third party (the exchange).
A standard for this sort of exchange is OPAL [5] [6], an initiative developed around the MIT Media Lab, Imperial College London, Orange, the World Economic Forum, and the Data-Pop Alliance. The objective of OPAL is to make broad arrays of data available for analysis in a manner that does not violate personal data privacy. OPAL achieves this by making use of three concepts. First, the algorithm goes to the environment in which the data resides. By doing so, the data is always kept secure in its original repository. Access to this repository is controlled by the repository owner. Second, only aggregate answers or “safe answers” are returned. The algorithms that are run are made public, so that they can be studied and vetted by experts as “safe”. Third, the data is always in an encrypted state. This is of particular use when the data to be analyzed needs to be kept private due to its sensitivity. Private data can be kept private and, at the same time, be used to generate value and yield answers to the algorithms run on it.
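The first two concepts can be illustrated with a minimal sketch. This is not OPAL's actual interface: the class DataRepository, the function average_income, and the threshold MIN_GROUP_SIZE are hypothetical names chosen for illustration, and the minimum group size is only one assumed way of deciding whether an answer counts as "safe". Encryption, the third concept, is left out.

```python
from statistics import mean
from typing import Callable, Sequence

# Hypothetical threshold: refuse to answer when fewer records than this match.
MIN_GROUP_SIZE = 5


class DataRepository:
    """Toy stand-in for a data owner's repository; raw records never leave it."""

    def __init__(self, records: Sequence[dict]):
        self._records = list(records)  # kept private: no method returns raw rows

    def run(self, vetted_algorithm: Callable[[Sequence[dict]], float],
            selector: Callable[[dict], bool]) -> float:
        """Run a vetted algorithm inside the repository and return one aggregate."""
        selected = [r for r in self._records if selector(r)]
        if len(selected) < MIN_GROUP_SIZE:
            raise PermissionError("too few records for a safe aggregate answer")
        return vetted_algorithm(selected)


def average_income(rows: Sequence[dict]) -> float:
    """Example of a publicly vetted algorithm: it returns a single aggregate."""
    return mean(r["income"] for r in rows)


# Made-up records, for illustration only.
repo = DataRepository(
    [{"region": "north", "income": i} for i in (30, 40, 50, 60, 70)]
    + [{"region": "south", "income": 55}]
)

# The third party sends the algorithm to the data and receives only the aggregate.
north_average = repo.run(average_income, lambda r: r["region"] == "north")
```

In this sketch, a query over the single "south" record is rejected with a PermissionError, because its answer would reveal one individual's income.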
Consequences of the implementation of data exchanges
The economic consequence of the establishment of data exchanges would be that society would have a new, open source of data on which to run algorithms to generate knowledge. Organizations would be able to use data that they have not produced in order to generate insights relevant to them. This would be equivalent to data moving beyond the boundaries of the organization in which it was produced and becoming productive for hospitals, universities, funds, investors, traders, banks, and all other organizations (in fact, for all of them at the same time). This implies that data would escape the silos in which it is currently stored and be able to generate insights that could move freely in the economy. This would resolve the friction mentioned above, which in turn could result in higher economic growth.
Drawing an analogy with the other factors of production, implementing a data exchange would be as positive for the economy as suddenly having a better trained and educated labor force, or finding new oil or gold reserves (with the difference that the labor force, the oil, and the gold could only be used by one firm at a time, whereas data could be shared among many firms). Data, understood now as a production factor, could work for many organizations at the same time, multiplying the data-based knowledge generated by firms.
Another consequence of the establishment of data exchanges would be that existing organizations in the financial sector, such as credit unions, banks, and funds, could transform their business strategy and monetize their data by joining a data exchange. And while the revenue that data owners could generate by selling their data is still unclear, it could represent a significant complement to their current revenue. These organizations could then pass on the generated value further down the supply chain (i.e. to the individuals who have contributed to the generation of these data).
One more consequence of the establishment of data exchanges would be that current incumbents would have to confront new competitors. Today, a small company with a bright idea, a very well-trained team (labor), and a huge investment (capital) might have difficulty competing with established giants, since its lack of data prevents it from developing useful algorithms or gaining enough insights about its clients. With the implementation of data exchanges, smaller players could run their algorithms on the data stored in the exchanges, and therefore have access to the same production factor (the data) as the incumbents.
A specific example of the impact that data exchanges can have on the generation of knowledge can be presented in the context of the Federal Reserve Board’s triennial Survey of Consumer Finances (SCF). The SCF collects information about family incomes, net worth, balance sheet components, credit use, and other financial outcomes [7]. The SCF is an important source of data for determining the financial well-being of households in the US economy. Moreover, researchers use the data made available by the SCF to conduct analyses and develop economic policy recommendations. This makes the SCF data extremely valuable. However, due to the difficulty of collecting and structuring the data (a process that involves fewer than 7,000 families), the data is gathered only every three years. Data from a much larger number of households, collected at a much higher frequency, could potentially be very helpful for investors, private companies, policy makers, and researchers. Upgrading the SCF with data exchanges on household finance would be a way to generate broader data at a higher frequency. Since the organizations using this scarce and valuable data could be interested in paying for the upgraded data, citizens would have an incentive to contribute their data to the SCF frequently. This would result in a broader data infrastructure, which would be accessible to all organizations to generate new knowledge. And the citizens contributing to the SCF would benefit economically from this situation.
Summary
Once we understand the characteristics that data has as a production factor (beyond the superficial idea that “data is the new oil”), we will be able to understand how data actually works and how we can properly interact with data exchanges. Understanding that data is a non-fungible, non-exclusive production factor is crucial for this.
By acknowledging the characteristics of data, and how knowledge and value are generated by analyzing it, we, as individuals or as organizations, will be able to organize in data exchanges and interact with them in our best interest. For those who provide data to the exchange, this will become a new source of income. For those who run algorithms on the data, this will mean more knowledge and, ultimately, more value.
The implementation of data exchanges will open up a new stream of opportunities for individuals and organizations. A correct understanding of data as a production factor can only accelerate and improve the way in which we, as a society, take this chance.