
12. Interoperability of Distributed Systems

Published on Apr 30, 2020

1. Introduction

One of the key issues facing the nascent area of blockchain and distributed ledger technology (DLT) is the lack of interoperability across various blockchain networks [1]. The original blockchain idea of Haber and Stornetta [2][3] is now a fundamental construct within most blockchain systems, starting with the Bitcoin system, which first adopted the idea and deployed it in a digital currency context. Given the history of the development of the Internet and of computer networks in general (e.g. LANs, WANs), it is unlikely that the world will settle on one global blockchain system operating universally. The emerging picture will most likely consist of “islands” of blockchain systems, which – like the autonomous systems that make up the Internet – must be “stitched” together in some fashion to form a coherent whole.

One of the main features of a blockchain network that distinguishes it from an Internet routing domain is the perceived value of the signed information contained in the records of the blockchain’s shared ledger. In the case of a routing domain, the goal of the routing protocols and the network elements (e.g. routers) is to route data packets through the domain in as little time as possible. Data packets or PDUs (protocol data units) are therefore seen as ephemeral and carry no value in themselves. Indeed, some routing protocols may even duplicate data packets and pass them through different routes in order to speed up the delivery of the application-layer message in its totality.

In the world of blockchain technology, many of the leading developers of blockchain protocols and networks seek to be the sole “platform” where transactions occur. We believe this outlook is too short-term and even naive, given the history and purpose of the Internet’s development. Many leading voices in the blockchain world fail to understand the fundamental goals of the Internet architecture as promoted and led by the Defense Advanced Research Projects Agency (DARPA), and thus fail to fully appreciate how these goals have shaped the Internet and its success as we see it today. There was a pressing need in the Cold War period of the 1960s and 1970s to develop a new communications network architecture that did not previously exist, one that would allow communications to survive in the face of attacks. In Section 2 we review and discuss these goals.

The goal of the current work is to bring to the forefront the notions of interoperability, survivability and manageability for blockchain systems, using lessons learned from three decades of development of the Internet. Our overall goal is to develop a design philosophy for an interoperable blockchain architecture, and to identify some design principles that promote interoperability.

2. The Design Philosophy of the Internet

In considering the future direction for blockchain systems generally, it is useful to recall and understand the goals of the Internet architecture as defined in the early 1970s as a project funded by DARPA. The definition of the Internet as viewed in the late 1980s is the following: it is “a packet switched communications facility in which a number of distinguishable networks are connected together using packet switched communications processors called gateways which implement a store and forward packet-forwarding algorithm” [4][5].

2.1 Fundamental Goals

It is important to remember that the design of the ARPANET and the Internet favored military values (e.g. survivability, flexibility, and high performance) over commercial goals (e.g. low cost, simplicity, or consumer appeal) [6], which in turn has affected how the Internet has evolved and has been used. This emphasis was understandable given the Cold War backdrop to the packet-switching discourse throughout the 1960s and 1970s. The Advanced Research Projects Agency Network (ARPANET) was an early packet-switching network. It was the first network to implement the TCP/IP protocol suite.

The DARPA view at the time was that there are seven (7) goals of the Internet architecture, with the first three being fundamental to the design, and the remaining four being second level goals. The following are the fundamental goals of the Internet in the order of importance [4][5]:

  1. Survivability: Internet communications must continue despite loss of networks or gateways.

    This is the most important goal of the Internet, especially if it was to be the blueprint for military packet switched communications facilities. This meant that if two entities are communicating over the Internet, and some failure causes the Internet to be temporarily disrupted and reconfigured to reconstitute the service, then the entities communicating should be able to continue without having to reestablish or reset the high-level state of their conversation. Therefore, to achieve this goal, the state information which describes the on-going conversation must be protected. But more importantly, in practice this explicitly meant that it is acceptable to lose the state information associated with an entity if, at the same time, the entity itself is lost. This notion of state of conversation is related to the end-to-end principle discussed below.

  2. Variety of service types: The Internet must support multiple types of communications service.

    What was meant by “multiple types” is that at the transport level the Internet architecture should support different types of services, distinguished by differing requirements for speed, latency and reliability. Indeed, it was this goal that resulted in the separation of TCP and IP into two layers, and in the use of bytes (not packets) at the TCP layer for flow control and acknowledgement.

  3. Variety of networks: The Internet must accommodate a variety of networks.

    The Internet architecture must be able to incorporate and utilize a wide variety of network technologies, including military and commercial facilities.

The remaining four goals of the Internet architecture are: (4) distributed management of resources, (5) cost effectiveness, (6) ease of attaching hosts, and (7) accountability in resource usage. Over the ensuing three decades these second level goals have been addressed in different ways. For example, accountability in resource usage evolved from the use of rudimentary management information bases (MIB) into the current sophisticated traffic management protocols and tools. Cost effectiveness was always an important aspect of the business model for consumer ISPs and corporate networks.

2.2 The End-to-End Principle

One of the critical debates throughout the development of the Internet architecture in the 1980s concerned the placement of functions that dealt with the reliability of message delivery (e.g. duplicate message detection, message sequencing, guaranteed message delivery, encryption). In essence the argument revolved around how much effort should be put into reliability measures within the data communication system, and was seen as an engineering trade-off based on performance. That is, how much low-level function (for reliability) needed to be implemented by the networks versus by the applications at the endpoints.

The line of reasoning against low-level function implementation in the network became known as the end-to-end argument or principle. The basic argument is as follows: a lower-level subsystem that supports a distributed application may be wasting its effort in providing a function that must be implemented at the application level anyway [7]. Thus, for example, duplicate message suppression must be accomplished by the application itself, since the application is most knowledgeable as to how to detect its own duplicate messages.

Another case in point relates to data encryption. If encryption/decryption were to be performed by the network, then the network and its data transmission systems would have to be trusted to securely manage the required encryption keys. Also, when data enters the network (to be encrypted there) it will be in plaintext and therefore susceptible to theft and attack. Finally, the recipient application of the encrypted message will still need to verify the source-authenticity of the message, and the application will still need to perform key management. As such, the best place to perform data encryption/decryption is at the application endpoints; there is no need for the communication subsystem to provide automatic encryption of all traffic. That is, encryption is an end-to-end function.

The end-to-end principle was a fundamental design principle of the security architecture of the Internet. Among others, it influenced the direction of the subsequent security features of the Internet, including the development of the IP-security sublayer [8] and its attendant key management function [9]. The entire Virtual Private Network (VPN) subsegment of the networking industry is built on this end-to-end principle. (The global VPN market alone is forecast to reach 70 billion dollars in the next few years.) The current day-to-day use of Transport Layer Security (TLS) [10] to protect HTTP web traffic (i.e. browsers) is also built on the premise that client-server data encryption is an end-to-end function performed by the browser (client) and by the HTTP server.

2.3 The Autonomous Systems Paradigm

Another key concept in the development of the Internet is that of the autonomous system (AS) (or routing domain) as a connectivity unit that provides scale-up capabilities. More specifically, the classic definition of an AS is a connected group of one or more networks (distinguishable via IP prefixes) run by one or more network operators under a single and clearly defined routing policy [11]. The notion of autonomous systems provides a way to hierarchically aggregate routing information, such that the distribution of routing information itself becomes a manageable task. This division into domains gives each domain owner/operator the independence to employ the routing mechanisms of its choice. IP packet routing inside an autonomous system is therefore referred to as intra-domain routing, while routing between (across) autonomous systems is referred to as inter-domain routing. The common goal of many providers of routing services (consumer ISPs, backbone ISPs and participating corporations) is that of supporting different types of services (in the sense of speed, latency and reliability).

Figure 1: Autonomous Systems as a set of networks and gateways (after cit. 5)

In the case of intra-domain routing the aim is to share best-route information among routers using an intra-domain routing protocol (e.g. distance vector such as RIP [12], or link-state such as OSPF [13]). The routing protocol of choice must address numerous issues, including possible loops and imbalances in traffic distribution. Today routers are typically owned and operated by the legal owner of the autonomous system (e.g. ISP or corporation). These owners then enter into peering agreements with each other in order to achieve end-to-end reachability of destinations across multiple hops or domains. The primary revenue model in the ISP industry revolves around different tiers of services appropriate to different groups of customers.

There are several important points regarding the autonomous systems paradigm and the positive impact this paradigm has had on the development of the Internet over the past four decades:

  • The autonomous systems paradigm leads to scale:
    The autonomous systems paradigm, the connectionless routing model and the distributed network topology of the Internet allow each unit (the AS) to solve performance issues locally. This in turn promotes service scale in the sense of throughput (end-to-end) and reach (the large number of connected endpoints). As such, it is important to see the global Internet today as a connected set of “islands” of autonomous systems, stitched together through peering agreements.

  • Domain-level control with distributed topology:
    Each autonomous system typically possesses multiple routers operating the same intra-domain routing protocol. The availability of multiple routers implies availability of multiple routing paths through the domain. Despite this distributed network topology, these routers are centrally controlled (e.g. by the network administrator of the domain). The autonomous system as a control-unit provides manageability, visibility and peering capabilities centrally administered by the owner of the domain.

  • Each entity is uniquely identifiable in its domain:
    All routers (and other devices, such as bridges and switches) in an autonomous system are uniquely identifiable and visible to the network operator. This is a precondition of routing. The identifiability and visibility of devices in a domain is usually limited to that domain. Entities outside the domain may not even be aware of the existence of individual routers in the domain.

  • Autonomous system reachability: Autonomous systems interact with each other through special kinds of routers – called Gateways – that are designed and configured for cross-domain packet routing. These operate specific kinds of protocols (such as the exterior Border Gateway Protocol [14]) which provide for the transfer of packets across domains. For various reasons (including privacy and security) these exterior-facing gateway protocols typically advertise only reachability status information regarding routers and hosts in the domain, but do not publish internal routing conditions.

  • Autonomous systems are owned and operated by legal entities:
    All routing autonomous systems (routing domains) today are owned, operated and controlled by known entities. Internet Service Providers (ISPs) provide their Autonomous System Numbers (ASNs) and routing prefixes to Internet Routing Registries (IRRs). IRRs can be used by ISPs to develop routing plans. An example of an IRR is the American Registry for Internet Numbers (ARIN) [15], which is one of several IRRs around the world.

In the next section we re-map the fundamental goals of the Internet architecture in the context of blockchain systems, with the goal of identifying some fundamental requirements for blockchain interoperability.

3. Interoperable Blockchains: Towards a Design Philosophy

During the 1970s and 1980s several local area network (LAN) systems were in development and were marketed for enterprises (e.g. IBM SNA [16], DECnet [17]). However, these LANs were distinct enough in their technological approaches (e.g. PHY layer protocols) that they did not interoperate with each other [6]. Today we are seeing a very similar situation, in which multiple blockchain designs are being proposed (e.g. Bitcoin [18], Ethereum [19], Hyperledger [20], CORDA [21]), each having different technological designs and approaches. Most share some common terminology (e.g. “transaction”, “mining node”, etc.), but there is little or no interoperability among these systems.

Following from the first fundamental goal of the Internet architecture, the lesson learned there was that interoperability is key to survivability. Thus, interoperability is core to the entire value-proposition of blockchain technology. Interoperability across blockchain systems must be a requirement – both at the mechanical level and the value level – if blockchain systems and technologies are to become the fundamental infrastructure of the future global commerce [22], [23].

The current work focuses primarily on interoperability across blockchain systems at the mechanical level, as the basis for achieving a measurable degree of technical-trust across these systems. In turn, technical-trust is needed by the upper-level functions to achieve interoperability at the value level, so that legal frameworks can be created that are able to quantify risks based on the technological choices used to implement technical-trust. Poorly designed blockchain systems should present a higher risk for commerce, and vice versa. Finally, business-trust can be built upon these legal frameworks to allow business transactions to occur seamlessly across multiple blockchain systems globally.

In this section we identify and discuss some of the challenges to blockchain interoperability, using the Internet architecture as a guide and using the fundamental goals as the basis for developing a design philosophy for interoperable blockchains.

In order to clarify the meaning of “interoperability” in the context of blockchain systems, we offer the following definition of an “interoperable blockchain architecture”, using the NIST definition of “blockchain” (p.50 of [24]):

An interoperable blockchain architecture is a composition of distinguishable blockchain systems, each representing a unique distributed data ledger, where atomic transaction execution may span multiple heterogeneous blockchain systems, and where data recorded in one blockchain is reachable, verifiable and referenceable by another possibly foreign transaction in a semantically compatible manner.

In the following we re-cast the aspects of survivability, variety of service types and variety of systems in the context of blockchain systems.

3.1 Survivability

As mentioned previously, interoperability is key to survivability. In the Internet architecture, survivability as viewed by DARPA [4][5] meant that communications must continue despite loss of networks and gateways. In practical engineering terms, this meant the use of the packet-switching model as a realization of the connectionless routing paradigm.

In the context of blockchain systems generally, survivability should also mean continued operations in the face of various kinds of attacks. The possible types of attacks on a blockchain system have been discussed elsewhere and span a broad spectrum. These range from classic network-level attacks (e.g. network partitions, denial of service, DDoS, etc.), to more sophisticated attacks targeting particular constructs (e.g. the consensus implementation [25], [26], [27]), to attacks targeting specific implementations of mining nodes (e.g. code vulnerabilities, viruses). Similar to applications on the Internet, we can also view survivability more specifically from the perspective of the application (and its user) that is transacting on the blockchain. A user’s transaction should proceed as far as possible despite the blockchain being under attack.

Figure 2: Example of the reliability of a simple transaction

For blockchain systems we propose to re-interpret the term “survivability” to mean the completion (confirmation) of an application-level transaction independent of the blockchain systems involved in achieving the completion of that transaction. Furthermore, the transaction may be composed of sub-transactions, in the same sense that a message on the Internet may consist of multiple IP datagrams. Thus, in the blockchain case an application-level transaction may consist of multiple ledger-level transactions (sub-transactions), where each could be intended for (and be confirmed at) a different blockchain system (e.g. a sub-transaction for asset transfer in blockchain A, simultaneously with a sub-transaction for payments and a sub-transaction for taxes in blockchain B).

Here, the notion that packets routed through multiple domains are opaque to the user’s communications application (e.g. email applications, browsers) is re-cast as the notion that the spread of blockchain systems on which sub-transactions are confirmed is generally opaque to the user application. Thus, the challenge of reliability and “best effort delivery” becomes the challenge of ensuring that an application-level transaction is completed within reasonable time, possibly with the application itself being oblivious to the actual blockchains where the different ledger-level sub-transactions are finally confirmed.

To illustrate the challenges of survivability as interpreted in this manner, we start with the simplest case, in which an application sends a “data” transaction (signed hash value) to a blockchain for the purpose of recording it on the ledger of that blockchain (Figure 2). We ignore for the moment the dichotomy of permissionless and permissioned blockchains, and we ignore the specific logic syntax of the blockchain. Here the application does not care which blockchain records the data, as long as, once the transaction is confirmed, the application (and other entities) can later find the transaction/block and verify that the data has been recorded immutably. Figure 2 illustrates the scenario. The application transmits the data-bytes (hash) to blockchain system No. 1 and waits for a confirmation to become available on that blockchain. After waiting unsuccessfully for some predetermined time (i.e. a timeout), the application transmits the same data-bytes to a different blockchain system No. 2. The application continues this process until it obtains the desired confirmation.
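As a thought experiment, the retry-and-fallback behavior of Figure 2 can be sketched in a few lines of Python. The BlockchainClient interface and its submit/is_confirmed methods are hypothetical placeholders for whatever API a particular blockchain system actually exposes; the sketch illustrates only the control flow, not any real client library.

```python
import time
from typing import List, Optional, Protocol, Tuple

class BlockchainClient(Protocol):
    """Hypothetical client interface for a blockchain system (illustrative, not a real API)."""
    name: str
    def submit(self, data_hash: bytes) -> str: ...       # returns a transaction identifier
    def is_confirmed(self, tx_id: str) -> bool: ...      # polls the ledger for confirmation

def record_with_fallback(data_hash: bytes,
                         chains: List[BlockchainClient],
                         timeout_s: float = 60.0,
                         poll_s: float = 5.0) -> Optional[Tuple[str, str]]:
    """Submit the same data-bytes to each blockchain in turn, moving on to the
    next system when no confirmation is observed before the timeout (Figure 2)."""
    for chain in chains:
        tx_id = chain.submit(data_hash)
        deadline = time.monotonic() + timeout_s
        while time.monotonic() < deadline:
            if chain.is_confirmed(tx_id):
                return (chain.name, tx_id)      # desired confirmation obtained
            time.sleep(poll_s)
        # timeout: fall through and retry on the next blockchain system
    return None                                 # no system confirmed within the time budget
```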

Although the example in Figure 2 may appear overly simplistic and inefficient, and has the undesirable side-effect of confirmations on multiple blockchains, it highlights a number of questions similar to those posed in the early days of the Internet architecture’s development:

  • Application degree of awareness: To what degree must an application be aware of the internal constructs of a blockchain system in order to interact with it and make use of the blockchain? Most (if not all) wallet applications today must maintain configuration information regarding the blockchain system to which a given key applies.

    As a point of comparison, an email client application today is not aware of constructs such as packets, MPDUs, routing and so on. It interacts with a mail server according to a high-level protocol (e.g. POP3, IMAP, SMTP) and a well-defined API. The email client needs only to know the destination email address.

  • Distinguishability and addressability of blockchain systems: For an interoperable blockchain architecture, each blockchain autonomous system must be distinguishable from a naming perspective as well as from an addressing/routing perspective. This introduces some new challenges, such as the situation where a node is permitted to participate in several blockchain systems simultaneously. From a key management perspective, there is also the question regarding multiple uses of the same public key pair across several distinct blockchain systems.

  • Placement of reliability functions: What is the correct notion of “reliability” in the context of interoperable blockchain systems and where should the function of reliability be placed? That is, should the function of re-transmitting the same data-bytes (transaction) be part of the application, part of the blockchain system or part of a yet to be defined “middle layer”?


    As a comparison, within the TCP/IP stack the TCP protocol has a number of flow control features that “hides” reliability issues from the higher-level applications.


  • Semantic interoperability: If in the future there will emerge blockchain autonomous systems with differing applications (e.g. registry of assets, currency trading, etc.), what mechanisms are needed to convey to an external system the functional goal of a blockchain and its application-specific semantics?

    As a comparison, the HTTP protocol and RPC inter-process communications both run on the TCP/IP layer. However, these represent different resource access paradigms for different types of applications.

  • Objective benchmarks for speed and performance: How do external entities obtain information about the current performance/throughput of a blockchain system, and what measures can be used to compare across systems?

Figure 3: Service types based on different confirmation models

3.2 Variety of Service Types

The second goal of the Internet architecture was support for different types of services, distinguished by different speeds, latencies and reliability. The bi-directional reliable data delivery model was suitable for a variety of “applications” on the Internet, but each application required different speeds and bandwidth consumption (e.g. remote login, file transfer, etc.). This understanding led to the realization early in the design of the Internet that more than one transport service would be needed, and that the architecture must simultaneously support transports wishing to tailor reliability, delay or bandwidth usage. This resulted in the separation of TCP (which provided a reliable, sequenced data stream) from the IP protocol, which provided “best effort” delivery using the common building block of the datagram. The User Datagram Protocol (UDP) [28] was created to address the need of certain applications that wished to trade reliability for direct access to the datagram construct.

For blockchain systems we propose to re-interpret the notion of service types from the perspective of the different needs of various applications. We distinguish three (3) basic types of service (an illustrative sketch follows the list below):

  • Immediate direct confirmation: This refers to applications which require the fastest confirmation from a specific destination blockchain system. The confirmation of the transaction must occur at the destination blockchain. As such, speed and latency are the primary concerns for these types of applications. This is summarized in Figure 3(a). This situation is an analog of the classic TCP-based remote login service, in which the user logs in to a specific computer system and needs confirmation with as little delay as possible (e.g. milliseconds, seconds).

    Digital currency applications (e.g. currency trading system) are a typical example of cases needing direct and immediate confirmation with low latency.

  • Delayed mediated confirmation: This refers to applications which are satisfied with a “temporary” confirmation produced by a mediating blockchain system, which will then seek to “move” the transaction to its intended destination blockchain system. This is summarized in Figure 3(b).


    The application will obtain two confirmations: the first would be a temporary confirmation from the mediating blockchain system, while the final confirmation will occur at the destination blockchain system at a later time. As such, there are two latency values corresponding to the two confirmations. The understanding here is that the application deems the first latency to be acceptable from a practical perspective, and that the second latency can be of a longer period of time (e.g. minutes). This is akin to the store-and-forward method used by classic electronic mail systems.


    Examples of this type of application are non-critical notarization applications which seek to record static (unchanging) data (e.g. the birth date on a birth certificate) and which do not require low-latency confirmations.

  • Multi-party mediated confirmation: This scenario is a multi-party variation of the single-party mediated case mentioned above. Here, two (or more) applications are seeking to complete a common transaction at an agreed destination blockchain system, with the aid of settlement logic that executes at the destination blockchain system. Each of the applications is willing to accept a “temporary” confirmation produced by a mediating blockchain system, with the understanding that they will obtain a final confirmation from the destination blockchain system. This is summarized in Figure 3(c).


    This situation is akin to TCP-based messaging or chat servers (e.g. XMPP), in which two (or more) parties converge on a common server even though they each may have their own local servers.
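One way to make the distinction concrete is to imagine how an application might declare its desired service type to an interoperability layer. The enum values and field names below are purely illustrative assumptions of ours; no such standard interface exists today.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

class ConfirmationService(Enum):
    """The three service types of Figure 3; the names are our own, not a standard."""
    IMMEDIATE_DIRECT = auto()     # Figure 3(a): confirm at a specific destination, low latency
    DELAYED_MEDIATED = auto()     # Figure 3(b): temporary confirmation at a mediator, final later
    MULTIPARTY_MEDIATED = auto()  # Figure 3(c): parties converge on an agreed destination chain

@dataclass
class TransactionRequest:
    """What an application might declare when handing a transaction to a middle layer."""
    payload_hash: bytes
    service: ConfirmationService
    destination_chain: str                       # where the final confirmation must occur
    mediator_chain: Optional[str] = None         # used only by the two mediated service types
    max_first_latency_s: float = 5.0             # acceptable latency for the first confirmation
    max_final_latency_s: Optional[float] = None  # acceptable latency for the final confirmation
```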

3.3 Variety of blockchain systems

The third fundamental goal of the Internet architecture was to support a variety of networks, which included networks employing different transmission technologies at the physical layer (e.g. X.25, SNA, etc.), local networks and long-haul networks, and networks operated/owned by different legal entities. The minimum assumption of the Internet architecture – which is core to the success of the Internet as an interoperable system of networks – was that each network must be able to transport a datagram as the lowest common denominator unit. Furthermore, this was to be performed on a “best effort” basis, namely with reasonable reliability, but not perfect reliability.

For blockchain systems we propose a reinterpretation of the minimal assumption as consisting of:

  1. a common standardized transaction format and syntax that will be understood by all blockchain systems regardless of their respective technological implementation, and

  2. a common standardized minimal set of operations that will be implemented by all blockchain systems regardless of their technological choices.

The notion of a common transaction format is akin to the definition of the minimal IP datagram, which was first published in the 1974 milestone paper by Vint Cerf and Bob Kahn [5]. The operation involved in the datagram case is simple and is implicit in the datagram construct itself, namely that a set of bytes needs to be transmitted from one IP address to another. The situation is somewhat more complex in blockchain systems. Aside from the common fields found in transactions in current systems (e.g. sender/receiver public keys, timestamp, pointers), there is the question of the semantic meaning of the operations intended by the op-code symbols. Some mathematical operations are clear (e.g. op-codes for addition, multiplication, hash functions), but others may introduce some degree of ambiguity across systems.
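To give a flavor of what such a minimal common format might look like, the sketch below defines a transaction as a small set of fields plus a deterministic serialization for hashing. Every field name and the choice of JSON and SHA-256 are assumptions made for illustration only, not a proposed standard.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class CommonTransaction:
    """A hypothetical minimal cross-chain transaction format (illustrative only).
    The fields echo the common fields mentioned above."""
    sender_pubkey: str           # hex-encoded public key of the sender
    receiver_pubkey: str         # hex-encoded public key of the receiver
    timestamp: int               # seconds since the Unix epoch
    payload_hash: str            # hex digest of the application data being recorded
    prev_tx_ref: Optional[str]   # pointer to a previous transaction, if any
    op_code: str                 # symbolic operation; its semantics must be agreed cross-chain

    def canonical_bytes(self) -> bytes:
        """Deterministic serialization so that every system hashes exactly the same bytes."""
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":")).encode()

    def tx_id(self) -> str:
        return hashlib.sha256(self.canonical_bytes()).hexdigest()
```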

Similar to the variety of technologies implementing LANs and local routing in the 1980s and 1990s, today there are several technological aspects that differentiate one blockchain system from another:

  • Governance model: The term “governance” in the context of blockchain systems is typically used to refer to the combination of (i) the human-driven policies for the community of participants, (ii) the rules of operation that are encoded within the blockchain software and hardware fabric itself, and (iii) the intended application of the blockchain, which is often expressed as the application-specific “smart contracts” (stored-procedures available on nodes).

  • Speed of confirmation: The speed (or “throughput”) of a blockchain system refers to the confirmation speed, based on the population size of the participating nodes and other factors.

  • Strength of consensus: An important consideration is the size of the population of nodes (i.e. entities contributing to the consensus) at any given moment and whether this information is obtainable. Obtaining this information may be challenging in systems where nodes are anonymous, or where the information is simply not available to external entities, as in the case of permissioned systems.

  • Degrees of permissionability: Currently the permissionless/permissioned distinction refers to the degree to which users can participate in the system [24]. Interoperability across permissioned blockchains poses additional questions with regards to how data recorded on the ledger can be referenced (referred to or “pointed to”) by transactions in a foreign domain (i.e. another blockchain system).

  • Degrees of anonymity: There are at least two (2) degrees of anonymity that are relevant to blockchain systems. The first pertains to the anonymity of end-users (i.e. identity-anonymity [29], [30], [31]) and the second to the anonymity of the nodes participating in processing transactions (e.g. nodes participating in a given consensus instance). Combinations are possible, such as a permissioned system that requires all consensus nodes to be strongly authenticated and identified, but allows end-users to remain permissionless (and even unidentified/unauthenticated).

  • Cybersecurity and assurance levels of nodes: The robustness of a blockchain system consisting of a peer-to-peer network of nodes is largely affected by the security of the nodes that make-up the network. If nodes are easily compromised directly (e.g. hacks) or via indirect means (e.g. dormant viruses), the utility of the blockchain system degrades considerably [32].

4. Gateways for Interoperability and Manageability

As mentioned previously, similar to the Internet architecture consisting of a network of autonomous systems, blockchain technology may in the future evolve into a network of interconnected blockchain systems, each with differing internal consensus protocols, incentive mechanisms, permissions and security-related constraints. Key to this interconnectivity is the notion of blockchain gateways. In this section we discuss the potential use of blockchain gateways to provide interoperability and interconnectivity across different blockchain systems and service types.

Interoperability becomes a complex matter when transactions in permissionless blockchains (publicly readable ledgers) interact with permissioned (private) blockchains, where transaction entries on the ledger may reveal confidential information and are therefore considered to be private. The use-case examples typically involve interactions between (a) ledgers that record factual existential information about a given asset or object, and (b) ledgers that record legal ownership of that asset or object.

4.1 Intra-Domain and Inter-Domain Nodes

Similar to a routing autonomous system being composed of one or more (possibly nested) routing domains, we propose viewing a blockchain system as consisting of one or more ledger management domains. Thus, just as routers in a routing domain operate one or more routing protocols to achieve best routes through that domain, nodes in a blockchain domain contribute to maintaining a shared ledger by running one or more ledger management protocols (e.g. consensus algorithms, membership management) to achieve stability and fast convergence (i.e. confirmation throughput) of the ledger in that domain.

Nodes could therefore be classified from the perspective of ledger management as operating either intra-domain or inter-domain. Figure 4 illustrates the concept, showing one (1) blockchain autonomous system, with three (3) local domain blockchains, each managing a distinct ledger.

  • Intra-domain nodes: These are nodes and other entities whose main task is maintaining ledger information and conducting transactions within one domain.

    Examples include nodes which participate in consensus computations (e.g. full mining nodes in Bitcoin [18]), nodes that “orchestrate” consensus computations (e.g. Orderers and Endorsers in Hyperledger Fabric [20]), and nodes which perform validations only (e.g. Validators in Ripple [33]).

    Figure 4: Blockchain Autonomous System, Domains and Gateways

  • Inter-domain nodes: These are nodes and other entities whose main task is dealing with cross-domain transactions involving different blockchain autonomous systems. We refer to these nodes as inter-domain gateways.

Although Figure 4 shows only a small number of nodes G designated as inter-domain nodes, ideally all nodes N in a given blockchain autonomous system should have the capability (i.e. the correct software, hardware and trusted computing base) to become an inter-domain gateway. This allows dynamic groups (subsets) of the population of nodes to become gateway groups that act collectively on behalf of the blockchain system as a whole [34]. In the remainder of the current work we will refer to inter-domain gateways simply as “gateways”.

4.2 Defining the Perimeter for Blockchain Autonomous Systems

In the history of routing on the Internet, the emergence and evolution of the concept of autonomous systems was driven partly by the need to manage networks. Among others, the owners and operators of networks needed to define their network physical perimeter, deploy administrative controls over the networks, and legally understand the areas of business responsibilities and liabilities. This arrangement provided the freedom on the part of the operators to design the routing topologies according to their business needs, applying different protocols, tools and devices in each domain. The result is that the physical perimeter of each network is clearly demarcated, without any ambiguities with regard to the legal ownership of each autonomous system.

The situation is somewhat more complex in blockchain systems, which employ a peer-to-peer network of nodes in a geographically distributed topology, and where the participation of nodes is dynamic over time. One revolutionary aspect of the Bitcoin system [18] is its openness for any person or entity to participate in the act of mining by independently and anonymously deploying CPU cycles to compute the proof-of-work (consensus) algorithm. As such – and in contrast to routing domains – the perimeter of the Bitcoin network of nodes is not a physical or geographical one but rather a computation-participatory one. This independence and anonymity means that it is difficult or even impossible to know how many nodes (and which nodes) actually spent CPU cycles (successfully or not) in computing a given instance of the proof of work. Although unauthenticatable anonymous identities may be useful in some scenarios, in the context of the Bitcoin system they may allow certain entities (e.g. state-sponsored actors) to amass computing power in a large mining pool and to “weaponize” that hash-power at the opportune moment.

Blockchain autonomous systems may in the future evolve into an interconnected set of autonomous and independent blockchain systems, each with its own interior protocols, entities and systems, and where each system’s perimeter is defined along one or more of the following axes:

  • Degree of identifiable and authenticable participation: The “membership” of a node or entity in a blockchain autonomous system may be defined as the potential for that node or entity to positively (or negatively) impact the community in a considerable way.


    In tightly permissioned blockchain systems, the organization or community that oversees the operations of the system may demand that all member nodes (i.e. legal owners of nodes) not only register their identities but also report their participation in one measure or another (e.g. participated in a mining instance). For example, in an enterprise (single organization) privately owned blockchain system, all nodes are legally owned by the organization and thus the degree of participation is fully controlled by the organization.


    In a multi-organization consortium arrangement, the consortium may require all nodes belonging to consortium members to be identified beforehand (i.e. registered), but a node’s actual participation in each consensus computation may be at the discretion of the member. Thus, the consortium as a collective may wish to ensure that no unknown or rogue node affects the consortium as a whole but allows each member to control their own resources.


    To this end, anonymous-verifiable identity schemes [30], [35] may be used to offer some degree of anonymity to the nodes. Similarly, methods to prove participation in computation can be devised based on schemes that use a combination of the consensus algorithm, periodic reporting of hardware internal state (e.g. TPM registers [36], the Quote protocol [37]) and secure multi-party computation techniques [38].

  • Degree of trust and assurance: Another factor related to perimeters and membership is the degree of provable trust each node can attain and can convey to other nodes and entities. The idea here is that the ability of a node to perform its tasks with high assurance (e.g. perform proof of work, safeguard its private keys, etc.) becomes input into the decision as to whether to accept the computation results of that node.

    This factor is notably important in the multi-organization consortium arrangement. The aspect of provable assurance may determine the internal acceptability of the results of consensus computations performed by the member nodes. For example, in networks that deal with high-value transactions the consortium may require its members to deploy trusted computing technologies that convey technical trust.


    In this context it is useful to revisit some key architectural designs of the Trusted Platform Module (TPM) from the late 1990s which provided a basic understanding on trustworthiness [39][40]. Reusing some of the concepts in trusted computing, a node can be considered to exhibit technical trust if it (i) operates unhindered and shielded from external influences or interference, (ii) operates for a well-defined task, and (iii) has the ability to report results of its computations and its internal status truthfully.

  • Business model of the organization or community: The business purpose of a blockchain autonomous system may determine the degree of required identifiable and authenticable participation, as well as the minimal required trust and assurance.


    For example, a blockchain system for supply-chain management of components for a defense organization [41] has a different set of constraints compared to a blockchain used for supply-chain management of consumer goods [42].


    Similarly, a consortium of organizations whose goal is high-speed trading in digital assets using blockchain technology has different business purpose than a consortium of music publishers seeking greater accessibility to rights-data in a global music ecosystem [43],[44].

4.3 Use-Case: Inter-domain Transactions

To illustrate and aid discussion, we use a simple example shown in Figure 5 in which an asset recorded in blockchain system BC1 is to be transferred to blockchain system BC2. Both blockchain BC1 and BC2 are permissioned/private blockchain systems.

In Figure 5, a User A with Application X has his or her asset ownership (e.g. land title deed) recorded on the ledger inside blockchain BC1. The local transaction-identifier for this ledger entry is Tx1privateID. The User A wishes to transfer legal ownership of the asset (e.g. sell) to a different User B running Application Y. However, the User B requires that the asset be “moved” to the blockchain BC2 and be authoritatively recorded on the ledger of BC2. This would allow User B to later sell the asset locally in blockchain BC2 to other users in BC2. Note that being private blockchain systems, none of the gateways or nodes in BC1 can directly read/write to BC2, and vice versa.

Figure 5: Example of Inter-Domain Transaction Across Two Blockchain Systems

The following is a high-level summary of the transaction flows between BC1 and BC2. In step 1 the User A initiates the transfer to User B by submitting a (candidate) transaction to BC1 with the recipient address being the public key of User B in BC2 (e.g. pubkeyB/BC2). Because the destination blockchain BC2 is a foreign system, only the gateway nodes in BC1 are permitted to (or have the capability to) process this transaction. In step 2 the gateway G1 selects the candidate (unprocessed) transaction of User A in BC1 and proceeds to process the transaction. Seeing that the pending transaction is destined for pubkeyB/BC2, gateway G1 locates one or more gateways G2 in BC2 and proceeds to perform trust negotiations with G2 (see Section 4.5 below).

In step 3, because BC1 is a private system, the gateway G1 has to mask the private identifier value Tx1privateID with a new public value Tx1publicID. G1 has to persistently maintain this table of mappings (Tx1privateID, Tx1publicID). In the future, gateway G1 must provide a means to resolve the public value Tx1publicID back to the internal private value Tx1privateID should that be required (see Section 4.4 below). In steps 4 and 5 (multi-round), gateways G1 and G2 must establish trust by executing a key establishment protocol that includes the exchange of keying parameters, hardware root-of-trust certificates (e.g. AIK certificates in TPM [36]), hardware status reports (e.g. Quote protocol reports [37]), and other relevant trust establishment parameters.

In step 6, after pairwise technical-trust has been established between gateways G1 and G2, the gateway G2 proceeds to submit a new local transaction with identifier Tx2privateID into the ledger of BC2, addressed to pubkeyB. This local transaction references the asset (in BC1) identified by the public value Tx1publicID. In effect, gateway G2 is “registering” this asset Tx2privateID as belonging to User B with public key pubkeyB. In step 7, confirmation is achieved in the ledger of BC2. Gateway G2 needs to indirectly report this confirmed status of Tx2privateID to gateway G1. As such, G2 has to mask the local transaction-identifier Tx2privateID with a new public identifier Tx2publicID. Gateway G2 must henceforth maintain a persistent mapping between Tx2publicID and Tx2privateID.

In step 8 gateway G2 issues a signed assertion to G1 stating that the asset with transaction identifier Tx2publicID has been confirmed on the ledger of BC2, and includes a hash of the private identifier Tx2privateID in the assertion. In step 9, upon seeing the signed assertion from G2, gateway G1 proceeds to submit a new “invalidation” transaction in BC1, essentially marking that the asset previously known as Tx1privateID has been moved to BC2. The invalidation transaction also includes a reference to the new home of the asset, namely Tx2publicID/BC2. It should also include a hash of the signed assertion from G2. Finally, in step 10 the User B is able to see the confirmed transaction Tx2privateID/BC2, while User A sees Tx1privateID/BC1 as also being confirmed.

Note that the above use-case is an abstract example only. Many variations are possible for these flows, including the incorporation of commitment protocols (e.g. 2-phase commit) in Steps 2 to 9.
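The flow above can be traced with a toy sketch. Every class, method and identifier below is a placeholder of our own invention (steps 1, 2 and 10, which involve the users and the candidate transaction, are omitted); the sketch captures only the ordering of the masking, trust establishment, recording, assertion and invalidation steps.

```python
import hashlib
import secrets
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Gateway:
    """Toy gateway used only to trace the flow of Figure 5. Real gateways would write to
    their local ledger and talk to the remote gateway over an attested, authenticated channel."""
    chain: str
    id_map: Dict[str, str] = field(default_factory=dict)   # public id -> private id

    def mask(self, private_tx_id: str) -> str:
        """Steps 3 and 7: replace a private identifier with a new public one, remembering the mapping."""
        public_id = "Tx-" + secrets.token_hex(8)
        self.id_map[public_id] = private_tx_id
        return public_id

    def establish_trust(self, other: "Gateway") -> bool:
        """Steps 4-5 (placeholder): key establishment and hardware attestation (see Section 4.5)."""
        return True

    def record_asset(self, foreign_public_id: str, recipient_pubkey: str) -> str:
        """Step 6: submit a local transaction on this chain referencing the foreign asset."""
        return "Tx2privateID"   # placeholder for the locally confirmed transaction identifier

    def sign_assertion(self, public_id: str, private_id: str) -> dict:
        """Step 8 (unsigned here, for brevity): assert confirmation, exposing only a hash of the private id."""
        return {"gateway": self.chain, "confirmed": public_id,
                "private_id_hash": hashlib.sha256(private_id.encode()).hexdigest()}

    def invalidate(self, private_tx_id: str, new_home: str, evidence: dict) -> None:
        """Step 9: mark the asset as moved and record a reference to its new home."""
        print(f"{self.chain}: {private_tx_id} moved to {new_home}")

def transfer_asset(g1: Gateway, g2: Gateway, tx1_private_id: str, recipient_pubkey: str) -> None:
    tx1_public_id = g1.mask(tx1_private_id)                            # step 3
    assert g1.establish_trust(g2)                                      # steps 4-5
    tx2_private_id = g2.record_asset(tx1_public_id, recipient_pubkey)  # step 6
    tx2_public_id = g2.mask(tx2_private_id)                            # step 7
    assertion = g2.sign_assertion(tx2_public_id, tx2_private_id)       # step 8
    g1.invalidate(tx1_private_id, f"{tx2_public_id}/{g2.chain}", assertion)  # step 9

transfer_asset(Gateway("BC1"), Gateway("BC2"), "Tx1privateID", "pubkeyB")
```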

4.4 Visibility and Referenceability of Transaction Identifiers

One key potential use of gateways in the context of blockchain interoperability is to provide some degree of control over the visibility (i.e. read access) of transaction-identifiers residing on the ledger of the blockchain system.

  • Masking of private identifiers: In cases of private/permissioned blockchain systems where all transaction information on the confirmed blocks of the ledger is considered confidential (including the transaction-identifiers), a gateway may support the notion of identifier “masking”.


    As discussed in Section 4.3, a substitute transaction-identifier is used for external referenceability in a persistent manner. In a sense, this is akin to the network address translation found in NAT devices and dual-stack IPv4/IPv6 routers.

  • Resolution of private identifiers: If identifier masking or translation is used, a corresponding resolution function can be implemented at gateways. Thus, in the example of Section 4.3, after the asset has been moved from BC1 to BC2, whenever one or more nodes in BC1 obtain a query regarding Tx1publicID/BC1, the node can forward this query to one or more of the gateways in BC1, which collectively share the mapping table. In turn, one of these gateways in BC1 can re-map the query into Tx2publicID/BC2 and redirect the query to one or more of the gateways in BC2. This resolver role is similar to that of the DNS system, and also to the OCSP Responder model in PKI [45], in which a Responder can report on the status of a public key in an X.509 certificate issued by the Certificate Authority operating the Responder service (see the sketch below).
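The masking and resolution roles can be pictured with a minimal sketch, assuming an in-memory mapping table shared among the gateways of a domain; the class and its fields are our own illustration rather than any deployed mechanism, and the identifiers reuse those of the Section 4.3 example.

```python
from typing import Dict, Optional

class GatewayResolver:
    """Illustrative resolver for masked transaction identifiers; a rough analogue of a
    DNS resolver or an OCSP responder, using made-up structures and identifiers."""
    def __init__(self, domain: str):
        self.domain = domain
        self.public_to_private: Dict[str, str] = {}  # e.g. Tx1publicID -> Tx1privateID (never disclosed)
        self.moved_to: Dict[str, str] = {}           # e.g. Tx1publicID -> "Tx2publicID/BC2"

    def resolve(self, public_id: str) -> Optional[str]:
        """Answer a query about a public identifier without revealing private identifiers."""
        if public_id in self.moved_to:
            return self.moved_to[public_id]          # redirect the query to the foreign system's gateways
        if public_id in self.public_to_private:
            return f"{public_id}/{self.domain}"      # asset is still local; the private id stays hidden
        return None                                  # unknown identifier

# Usage with the identifiers of the Section 4.3 example:
resolver = GatewayResolver("BC1")
resolver.public_to_private["Tx1publicID"] = "Tx1privateID"
resolver.moved_to["Tx1publicID"] = "Tx2publicID/BC2"
print(resolver.resolve("Tx1publicID"))   # -> Tx2publicID/BC2
```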

4.5 Inter-Domain Trust Establishment

A second potential use of gateways in the context of blockchain interoperability is to support the establishment of trust (i.e. technical-trust) across blockchain autonomous systems. We believe there is a promising role for trusted hardware in implementing many of the functions of the gateways. As mentioned previously, ideally all nodes in a given blockchain autonomous system should possess the relevant trusted hardware and software to allow them to take on the role of gateways as required.

Examples of trusted hardware include the TPM [36], with its various roots of trust for measurement, storage and reporting. The first successful version was TPM v1.2, which supported a “one-size-fits-all” approach that primarily targeted the PC market. The second-generation TPM v2.0 expanded trusted computing features to better support vertical markets. TPM v2.0 introduced platform-specific profiles that define mandatory, optional and excluded functionality for the PC Client, Mobile and Automotive-Thin platform categories. Platform-specific profiles allow TPM vendors flexibility in implementing TPM features that accommodate a specific market. Additionally, TPM v2.0 supports three key hierarchies, for storage, platform and endorsement; each hierarchy can support multiple keys and cryptographic algorithms. We believe that TPM v2.0 profiles for trusted gateways could be developed for the blockchain infrastructure market.

Another example of trusted hardware is the Software Guard Extensions (SGX) from Intel Corporation [37]. SGX offers another perspective on the trusted computing base, where a trusted environment exists within a user process, called an Enclave. The SGX TCB consists of hardware-isolated memory pages; CPU instructions for creating, extending, initializing, entering, exiting and attesting the enclave; and privileged CPU modes for controlling access to enclave memory. A second generation of SGX (see [46]) added support for dynamic memory management, whereby enclave runtimes can dynamically increase or decrease the number of enclave pages.

There are multiple steps to establishing measurable technical-trust that can be input into legal frameworks in the context of peering. Some of these are as follows (a simplified sketch follows the list):

  • Mutual verification of gateway device-identities: Prior to interacting, two gateways belonging to separate blockchain autonomous systems must mutually verify their device identities (e.g. AIK-certificates in TPM).

  • Mutual attestation of gateway device status: As part of trust establishment each gateway may be required to attest to its hardware and software stack, as well as the current state of some of its hardware registers (e.g. Quote protocol [36], [37]).

  • Mutual session key establishment: For use-cases involving session keys, the gateways have the additional task of negotiating the keying parameters and establishing the relevant session keys.

  • Mutual reporting of transaction settlement: In use-cases involving one (or both) private blockchains, an additional requirement could be the signing of assertions using a gateway’s device-keys.
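A highly simplified sketch of the first three steps is given below. Every structure and helper is a placeholder assumption: real gateways would rely on standardized attestation and authenticated key-exchange protocols rather than the toy checks and key derivation shown here.

```python
import hashlib
import hmac
from dataclasses import dataclass

@dataclass
class GatewayEvidence:
    """Hypothetical bundle of evidence a gateway presents while peering. In a real deployment
    these would be X.509/AIK certificates, TPM or SGX quotes, and key-exchange messages."""
    device_cert: bytes         # device identity certificate (e.g. an AIK certificate)
    attestation_quote: bytes   # signed report of platform state (e.g. a TPM Quote)
    key_share: bytes           # public keying material for session-key establishment

def verify_cert(cert: bytes) -> bool:
    return len(cert) > 0       # placeholder: validate against root certificates named in the peering agreement

def verify_quote(quote: bytes) -> bool:
    return len(quote) > 0      # placeholder: compare against the expected platform measurements

def accept_peer(local: GatewayEvidence, remote: GatewayEvidence) -> bytes:
    """One side of the mutual handshake: verify the peer's identity and attestation (the peer
    performs the same checks in the other direction), then derive a session key. The HMAC-based
    derivation is a toy stand-in for a proper authenticated key-exchange protocol."""
    if not (verify_cert(remote.device_cert) and verify_quote(remote.attestation_quote)):
        raise RuntimeError("peer gateway failed identity or attestation verification")
    return hmac.new(local.key_share, remote.key_share, hashlib.sha256).digest()
```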



4.6 Peering-Points for Peering Business Agreements

The third potential use of gateways in the context of blockchain interoperability is to serve as the peering-points identified within peering agreements or contracts. In the case of the various ISPs that make up the Internet, peering agreements are contracts that define the various interconnection aspects (e.g. traffic bandwidth, protocols, etc.) as well as fees (“settlements”) and possible penalties. For the interoperability of autonomous blockchain systems, a notion similar to peering agreements must be developed that possesses features specific to blockchain technology and the governance models used by the systems. Peering agreements should include, among others, the following:

  • Identification of gateways chosen as peering points: A blockchain peering agreement should require the clear identification of gateways which are permitted to peer with other gateways. This agreement may specify the device certificates, hardware and software manifest (e.g. hash of manifest), root certificates, device status attestations, and so on.

  • Specify the minimal trust establishment mechanisms and parameters: A peering agreement should specify the trust negotiation and establishment protocols, the respective known parameters (e.g. size of key parameters), the key management protocols, standards compliance required, minimal assurance level required, and others.

  • Specify warranties and liabilities: Similar to peering agreements for ISPs and Certification Practice Statements for certificate authorities, blockchain peering agreements should clearly identify the liabilities of parties (e.g. in monetary terms) in negative or catastrophic scenarios (e.g. a gateway is compromised).

5. Conclusions

The fundamental goals underlying the Internet architecture have played a key role in determining the interoperability of the various networks and service types which together compose the Internet as we know it today. Interoperability is key to survivability. A number of design principles emerged from the evolution of Internet routing in the 1970s and 1980s, which ensured the scalable operation of the Internet over the last three decades.

We believe that a similar design philosophy is needed for interoperable blockchain systems. The recognition that a blockchain system is an autonomous system is an important starting point that allows notions such as reachability, referencing of transaction data in ledgers, scalability and other aspects to be understood more meaningfully – beyond the current notion of throughput (“scale”), which is often the sole measure of performance used with regards to many blockchain systems today.

Furthermore, interoperability forces a deeper re-thinking into how permissioned and permissionless blockchain systems can interoperate without a third party (such as an exchange). A key aspect is the semantic interoperability at the value level and at the mechanical level. Interoperability at the mechanical level is necessary for interoperability at the value level but does not guarantee it. The mechanical level plays a crucial role in providing technological solutions that can help humans in quantifying risk through the use of a more measurable notion of technical-trust. Human agreements (i.e. legal contracts) must be used at the value level to provide semantically compatible meanings to the constructs (e.g. coins, tokens) that circulate in the blockchain system.
