
In this blog post, we clarify the storage model of the Internet Computer (IC) and provide some insights related to the roadmap for more storage.

We will first outline what types of storage blockchains can generally provide, then detail the unique trade-offs realized by the Internet Computer architecture, and finally outline the next milestone on the roadmap toward more storage.

In the context of blockchain storage, roughly speaking, two types of storage can be distinguished: fully replicated storage and distributed storage. The Internet Computer relies on fully replicated storage: the protocol ensures that all participating nodes store a complete copy of the data, commonly referred to as the replicated state. This supports direct reads, writes, updates, and deletes of data as part of any operation the participating nodes agree on through the consensus protocol.

From the perspective of smart contract developers, this type of storage feels very much like the permanently available RAM in traditional computer programs.

In distributed storage, by contrast, the consensus protocol merely acts as a coordinator, deciding which subset of nodes stores which portion of the previously agreed-upon data. Not all participating nodes need to store all the data, which reduces the replication factor.
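The difference in replication factor between the two models can be made concrete with some back-of-the-envelope arithmetic. The node count and replication factor below are illustrative assumptions, not actual IC parameters:

```python
# Illustrative comparison of total disk footprint for the two storage models.
# The 13-node subnet size and replication factor of 3 are assumptions
# chosen for illustration only.

def replicated_footprint(state_tib: float, nodes: int) -> float:
    """Fully replicated: every node stores the complete state."""
    return state_tib * nodes

def distributed_footprint(state_tib: float, replication_factor: int) -> float:
    """Distributed: each piece of data is stored on only a subset of nodes."""
    return state_tib * replication_factor

# A hypothetical 13-node subnet holding 1 TiB of state:
full = replicated_footprint(1.0, 13)   # 13.0 TiB of disk across the subnet
dist = distributed_footprint(1.0, 3)   # 3.0 TiB with replication factor 3
print(full, dist)
```

The fully replicated model pays a higher total disk cost, but in return every node can read and write any part of the state directly during execution, which is exactly the trade-off the next paragraphs discuss.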

However, it is crucial to note that this also makes it infeasible to read data directly during replicated execution, which is why this type of storage is mainly used for storing static blobs.

Therefore, while the fully replicated model is clearly more powerful than the distributed storage model for building applications on top of it, it also faces scalability challenges.

The architecture of the Internet Computer encompasses three concepts that uniquely address these scalability challenges and provide immense fully replicated storage capacity: deterministic decentralization, high-performance storage layer implementation, and the ability to scale by adding subnets.

We will now briefly discuss how they facilitate highly scalable fully replicated storage:

  • Deterministic decentralization: the Network Nervous System (NNS) DAO makes informed decisions about which nodes join the network and which nodes become part of which subnets. As a result, each subnet can meet its diversity and decentralization goals with far fewer nodes than a setup in which any node may join the network or a subnet at will.

  • High-performance storage layer implementation: recently, as part of the Stellarator milestone, the entire storage layer of the IC was redesigned. Among other things, the new storage architecture is an important step toward greater replicated storage capacity per subnet. With the launch of the Stellarator milestone, the maximum storage capacity of each subnet was increased to 1 TiB. Importantly, the new architecture also enables follow-up projects to increase per-subnet replicated storage capacity further.

  • Scaling by adding subnets: The NNS can launch new subnets when needed, allowing for new storage capacity to be added based on demand.

Due to these architectural characteristics of the Internet Computer, the focus so far has been on optimizing the capacity of fully replicated storage. In the remainder of this article, we give a more concrete overview of the next steps for the Internet Computer's fully replicated storage capacity.

As of the time of writing, a single subnet can store 1 TiB (approximately 1.1 TB) of fully replicated storage. The Internet Computer currently has 34 subnets hosting dapps, which means the total replicated storage capacity is currently 34 TiB.
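The figures above can be checked with a quick unit conversion, since TiB (binary prefix) and TB (decimal prefix) differ by about 10%:

```python
# Unit check for the figures in the text:
# 1 TiB = 2**40 bytes, 1 TB = 10**12 bytes.
TIB = 2**40
TB = 10**12

tib_in_tb = TIB / TB     # ≈ 1.0995, i.e. "approximately 1.1 TB"
subnets = 34
total_tib = subnets * 1  # 34 subnets × 1 TiB each = 34 TiB network-wide
print(round(tib_in_tb, 4), total_tib)
```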

From 1 TiB subnet storage to 2 TiB

The design and implementation of the IC's new storage layer avoid time-consuming operations that grow linearly with the size of the replicated state; its operations depend only on the amount of data that changes relative to the previous state. From this perspective, the Internet Computer is well prepared to increase the maximum replicated state size of a subnet.
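The key property, cost proportional to the delta rather than the total state, can be sketched with a toy page-tracking model. This is a minimal illustration of the principle, not the IC's actual storage layer; the page size and class names are invented for the example:

```python
# Minimal sketch of delta-based checkpointing: pages are tracked as dirty,
# and only dirty pages are persisted, regardless of total state size.
# This is an illustration of the principle, not the IC implementation.

PAGE_SIZE = 4096  # bytes, assumed for the example

class PagedState:
    def __init__(self):
        self.pages = {}     # page index -> page contents
        self.dirty = set()  # indices modified since the last checkpoint

    def write(self, index: int, data: bytes):
        self.pages[index] = data
        self.dirty.add(index)

    def checkpoint(self, disk: dict) -> int:
        """Persist only pages changed since the previous checkpoint."""
        written = 0
        for index in self.dirty:
            disk[index] = self.pages[index]
            written += 1
        self.dirty.clear()
        return written  # cost is O(delta), not O(total state)

state, disk = PagedState(), {}
for i in range(1000):
    state.write(i, b"\0" * PAGE_SIZE)
state.checkpoint(disk)                 # first checkpoint writes 1000 pages
state.write(7, b"\1" * PAGE_SIZE)
print(state.checkpoint(disk))          # → 1: only the changed page is written
```

Because the per-checkpoint cost stays flat as the total state grows, raising the state-size ceiling does not by itself slow down routine operation.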

To smoothly increase capacity to 2 TiB, some known factors need to be investigated, and further unknowns may surface during implementation:

  • Nodes newly joining a subnet, or nodes lagging behind their peers, use a protocol called 'state sync' to catch up with the latest state of the subnet. Subnet recovery may also require certain nodes to synchronize the entire state. Benchmarking is needed to understand state sync performance at a 2 TiB state size, whether that performance is acceptable in all cases, and whether optimization is needed.

  • Sometimes, nodes participating in a subnet need to hash the replicated state. Although this is done incrementally (i.e., only the differences from the previous state need to be hashed), there are edge cases in which the entire state must be hashed. Testing is needed to determine whether these cases remain acceptable.
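The incremental-hashing point can be illustrated with a toy two-level scheme: per-chunk hashes are cached, so after a small change only the touched chunks are rehashed, while a fresh node (or the edge cases mentioned above) must hash everything. The IC's actual manifest and hashing structure is more elaborate; this sketch only shows why the common case is cheap:

```python
# Toy incremental state hashing: cache a hash per chunk, recombine for the
# root. An illustration of the idea, not the IC's actual hash structure.
import hashlib

def chunk_hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

class HashedState:
    def __init__(self, chunks):
        self.chunks = list(chunks)
        # Initial (full) hash touches every chunk -- the expensive edge case.
        self.cache = [chunk_hash(c) for c in self.chunks]
        self.rehashed = 0

    def update(self, i: int, data: bytes):
        self.chunks[i] = data
        self.cache[i] = chunk_hash(data)  # only this chunk is rehashed
        self.rehashed += 1

    def root(self) -> bytes:
        return hashlib.sha256(b"".join(self.cache)).digest()

state = HashedState([bytes([i]) * 64 for i in range(100)])
state.update(5, b"x" * 64)
print(state.rehashed)  # → 1: incremental cost is proportional to the delta
```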

Optimistically, the road to 2 TiB is passable through extensive testing and some targeted optimizations.

Subnet storage exceeding 2 TiB

A natural question arises: can we go further? Unfortunately, exceeding 2 TiB is somewhat more complicated than reaching it, primarily because in some worst-case scenarios scaling up could fill the physical disks of the nodes.

In particular, the way the new storage layer lays out files on disk, combined with the protocol's rather conservative retention of old states, means considerable overhead in terms of disk usage.

Therefore, exceeding 2 TiB, for example growing to 4 TiB or even larger state sizes, requires some protocol modifications. First, the storage layer parameters need to be tuned to reduce storage overhead, which in turn affects execution performance. Second, the protocol needs to be more proactive about deleting old states.
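A back-of-the-envelope model shows why retained old states become the bottleneck as the state grows. All numbers here (the disk budget, the retention count, and the assumption that retained checkpoints share nothing) are illustrative worst-case assumptions, not IC parameters:

```python
# Worst-case sketch: if retained checkpoints share no data, each one costs
# a full copy of the state. Numbers are illustrative assumptions only.

def worst_case_disk_tib(state_tib: float, retained_checkpoints: int) -> float:
    """Worst case: every retained checkpoint is a full, unshared copy."""
    return state_tib * retained_checkpoints

disk_budget_tib = 6.0  # hypothetical per-node disk budget for state storage
for state in (1, 2, 4):
    need = worst_case_disk_tib(state, retained_checkpoints=3)
    print(state, need, need <= disk_budget_tib)
```

In this toy model, 1 TiB and 2 TiB states fit the hypothetical budget, but a 4 TiB state needs 12 TiB in the worst case, which is why larger states call for lower storage overhead and prompter deletion of old states.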

Clearly, both measures need to be designed and implemented with extraordinary caution and extensive testing. Achieving this goal will also require revisiting all points mentioned in the 2 TiB step and possibly making further improvements.

Thus, we are still some way from this step, but we remain very confident that we will ultimately achieve this goal.

Distributed storage on the Internet Computer

Finally, it is worth noting that while the Internet Computer has so far focused on providing massive replicated storage, there is nothing fundamentally preventing the protocol from being extended to also support distributed storage (such as blob storage), a topic that will be discussed in a future article.

Conclusion

In recent years, the Internet Computer has continuously pushed the limits of blockchain replicated storage capacity, which is crucial for many use cases in Web3 that were considered impossible a few years ago. One example is artificial intelligence, where large language models run entirely on-chain.

The Internet Computer has some unique architectural features that allow it to go beyond the replicated storage capacities supported today, as outlined above. Specific follow-up work has already been planned to further increase the Internet Computer's storage capacity.

Additionally, efforts to provide a second type of storage that can store static blobs on the Internet Computer will soon begin.
