[discussion] A theoretical estimation of Oxen chain's storage size vs Session message storage requirement #480
Description
ORC-8 (The Session Network Token) suggests that removing the legacy Oxen chain could save storage, thereby freeing up more space for Session messages. This discussion offers a theoretical analysis to help the development team make a balanced decision. It is posted as a separate issue to avoid hijacking the original thread with off-topic discussion.
TL;DR: Saving storage by removing the legacy Oxen chain isn't as efficient a strategy as it first appears.
According to the official guidelines [1], the minimum storage requirement for a service node is 40 GB. Storage is used mainly by the Oxen chain and the Session message database. Removing the Oxen chain could free up about 20 GB, which accounts for about 50% of the total space. Although this seems significant at first glance, on closer inspection it may not be as critical as it appears.
There are approximately 700k Monthly Active Users (MAU) at present, or about 2k biweekly users per 'fat' swarm. The storage server stats log shows approximately 500 MB of user message storage on a service node with a 14-day TTL. Compared to the 40 GB of total disk space, this is only a tiny fraction (1.25%). This suggests that storage size won't become a bottleneck for Session message storage in the short term.
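The percentages quoted so far can be checked with a few lines of Python. This is only a sketch of the arithmetic: the constant and function names are illustrative, and it treats 500 MB as 0.5 GB, as the rest of this post does.

```python
# Figures quoted in this post (names are illustrative, not from any Oxen tool).
MIN_DISK_GB = 40.0       # minimum service node storage requirement [1]
OXEN_CHAIN_GB = 20.0     # approximate space freed by removing the Oxen chain
MESSAGE_STORE_GB = 0.5   # ~500 MB of messages with a 14-day TTL


def share_of_disk(size_gb, disk_gb=MIN_DISK_GB):
    """Return size_gb as a percentage of disk_gb."""
    return 100.0 * size_gb / disk_gb


print(f"Oxen chain:    {share_of_disk(OXEN_CHAIN_GB):.2f}% of disk")
print(f"Message store: {share_of_disk(MESSAGE_STORE_GB):.2f}% of disk")
```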
But what about the mid to long term? Over the past few years, Session's user base has grown rapidly. We might want to assume a potential exponential growth rate for the next few years, an assumption that is both simple and practical, based on the growth history of other successful messenger apps.
The tricky part is that the growth rate of message storage size is unknown, but we can make some assumptions.
One naive assumption is that message storage size is proportional to the number of users; this serves as a lower bound.
A more aggressive assumption is that message storage size grows much faster, in proportion to the square of the number of users. In real-world networks, users are often divided into many small 'villages', each with a high internal connection density, while connections between users from different villages are fewer. Within a 'village' of village_size users, there are at most village_size*(village_size-1)/2 pairwise connections, which gives an O(village_size^2) connection count per village. When we aggregate all the villages, we get a network that is locally dense and globally sparse, where the number of connections is approximately O(user_count^2) with an extremely small constant factor like 0.0025. Additionally, if we assume message_storage_size ~ O(message_count) ~ O(connection_count), then we conclude that message storage size grows in proportion to the square of the number of users.
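To make the village counting concrete, here is a minimal sketch. It assumes equal-sized villages and a fixed number of villages that grow along with the network; both are assumptions of this sketch, as are the function names and the example numbers (100,000 users, 200 villages). It is one way to arrive at a constant of that magnitude, not necessarily how the figure above was derived.

```python
def max_pairs(village_size):
    """Maximum number of pairwise connections inside one village."""
    return village_size * (village_size - 1) // 2


def within_village_pairs(user_count, village_count):
    """Total within-village pairs, assuming equal-sized villages (hypothetical)."""
    village_size = user_count // village_count
    return village_count * max_pairs(village_size)


# Illustrative numbers only; neither value comes from Session data.
user_count = 100_000
village_count = 200

pairs = within_village_pairs(user_count, village_count)
ratio = pairs / user_count ** 2  # the "constant factor" in front of user_count^2

print(f"within-village pairs: {pairs}")
print(f"pairs / user_count^2: {ratio:.6f}")
```

With these inputs the ratio comes out close to 1 / (2 * village_count). Note that the ratio only stays roughly constant if the number of villages stays fixed while each village grows; if the village size were fixed instead, the total would grow linearly with the user count.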
Although the square assumption is somewhat arbitrary, it isn't baseless. Many networks exhibit a "Power Law" pattern [2]; for social connections, the exponent might be a number between 1 and 2. In other words, message storage size might grow with O(user_count^r), where r is a number between 1 and 2. Thus, we use O(user_count^2) as an educated guess for the upper bound on the growth rate.
Lower Bound Estimation of Message Storage Size: Estimated Biweekly Message Storage Size Per ‘Fat’ Swarm Under Linear Assumption
Upper Bound Estimation of Message Storage Size: Estimated Biweekly Message Storage Size Per ‘Fat’ Swarm Under Square Assumption
| User Growth Rate | MAU | Lower Bound Estimation of Message Storage Size | Upper Bound Estimation of Message Storage Size |
|---|---|---|---|
| 1x | 700k | 0.5GB | 0.5GB (1x) |
| 2x | 1.4M | 1GB | 2GB (4x0.5GB) |
| 4x | 2.8M | 2GB | 8GB (16x0.5GB) |
| 8x | 5.6M | 4GB | 32GB (64x0.5GB) |
| 16x | 11.2M | 8GB | 128GB (256x0.5GB) |
| 32x | 22.4M | 16GB | 512GB (1024x0.5GB) |
| 64x | 44.8M | 32GB | 2048GB (4096x0.5GB) |
| 128x | 89.6M | 64GB | 8192GB (16384x0.5GB) |
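The table can be reproduced with a short script. This is a sketch under the assumptions stated above: a 0.5 GB baseline at 700k MAU, an exponent of 1 for the lower bound, and an exponent of 2 for the upper bound. The function names are illustrative; any exponent r between 1 and 2 can be passed to `estimate_storage_gb` to explore the in-between cases.

```python
BASE_MAU = 700_000        # current MAU quoted in this post
BASE_STORAGE_GB = 0.5     # current biweekly message storage per 'fat' swarm


def estimate_storage_gb(growth, exponent):
    """Message storage after the user base grows by `growth`, as base * growth^exponent."""
    return BASE_STORAGE_GB * growth ** exponent


def build_table(max_doublings=7):
    """Rows of (growth, MAU, lower bound GB, upper bound GB) for 1x, 2x, 4x, ..."""
    rows = []
    for k in range(max_doublings + 1):
        growth = 2 ** k
        rows.append((
            growth,
            BASE_MAU * growth,
            estimate_storage_gb(growth, 1),  # linear assumption
            estimate_storage_gb(growth, 2),  # square assumption
        ))
    return rows


for growth, mau, lower, upper in build_table():
    print(f"{growth:>4}x  {mau / 1e6:>5.1f}M MAU  "
          f"lower {lower:>5.1f} GB  upper {upper:>7.1f} GB")
```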
Looking at the table above, when the network is small, message storage is also small compared to the Oxen chain. When the network is large, message storage exceeds the Oxen chain size by orders of magnitude. For example, under the square assumption, if the Session user base grows 4x to 2.8M MAU, we will need 8 GB of message storage. At that point it might make sense to remove the legacy Oxen chain to free up ~20 GB. However, after another 2x growth to 5.6M MAU, we suddenly need 32 GB of storage for messages, at which point the space saved from the Oxen chain is no longer sufficient, and operators will have to upgrade hardware anyway. In other words, there is a very short critical time window in which freeing up storage by deleting the Oxen chain makes sense. Before that window, the message storage requirement is too small to worry about. After that window, the message storage requirement becomes so large that removing the Oxen chain contributes very little. This critical time window could be as short as one year if the network grows 2x in a year.
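The end of that window can be estimated by solving 0.5 GB * growth^r = 20 GB for the growth factor. The sketch below does this for a few exponents; the function names are illustrative, and the 0.5 GB and 20 GB inputs are the approximate figures used throughout this post.

```python
import math

BASE_STORAGE_GB = 0.5   # current message storage per node (approximate)
OXEN_CHAIN_GB = 20.0    # space freed by removing the Oxen chain (approximate)


def break_even_growth(exponent, freed_gb=OXEN_CHAIN_GB, base_gb=BASE_STORAGE_GB):
    """Growth factor at which message storage equals the freed space."""
    return (freed_gb / base_gb) ** (1.0 / exponent)


def doublings_needed(growth):
    """Number of 2x growth steps needed to reach the given growth factor."""
    return math.log2(growth)


for exponent in (1.0, 1.5, 2.0):
    growth = break_even_growth(exponent)
    print(f"r = {exponent}: break-even at {growth:.1f}x growth "
          f"({doublings_needed(growth):.1f} doublings)")
```

Under the square assumption, the break-even point falls between the 4x and 8x rows of the table; under the linear assumption, it falls between the 32x and 64x rows.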
Regardless of whether we assume linear growth of message storage requirement (lower bound), square growth of message storage requirement (upper bound), or any other growth rate in between, the conclusion doesn't change much. As a legacy chain, the Oxen chain storage size will likely stabilize, but as a rapidly growing social network, Session's message storage requirement will increase rapidly.
On the other hand, storage is relatively cheap compared to other computational resources. In the history of the IT industry, storage costs have consistently decreased, so it shouldn't be a bottleneck in our use case from a budget management perspective.
Note: I have spent time analyzing the historical status logs of the Oxen storage server, and it turns out that the relationship between user count and message storage size is quite complicated. The above numbers are theoretical and are used for ease of explanation rather than accurate prediction.
[1] https://docs.oxen.io/oxen-docs/using-the-oxen-blockchain/oxen-service-node-guides/full-service-node-setup-guide
[2] https://en.wikipedia.org/wiki/Power_law