NeoFS and Storj Comparison

Neo SPCC
Jan 27, 2021

Introduction

This is the next article in our series comparing the architectural features of NeoFS with those of other decentralized storage systems. The first overview of existing decentralized storage solutions, covering NeoFS, Sia, Swarm, and Filecoin, can be found in the NeoNewsToday article.

A comparison with Filecoin is also available in a dedicated article and in the Neo Live talk on distributed cloud storage platforms, which included an AMA session. Today we will look at Storj, which is the closest to NeoFS in its approach: building services on top of storage, integrating with other protocols such as AWS S3, and serving as the storage layer of real applications and a part of the real economy.

To begin with, let us recall the main characteristics of NeoFS and Storj.

NeoFS is a distributed, decentralized object storage platform developed by Neo SPCC. NeoFS nodes are organized in a peer-to-peer network that takes care of storing and distributing users’ data. Any Neo user may participate in the network and get paid for providing storage resources to other users, or store his data in NeoFS and pay a competitive price for it. Users can reliably store object data in the NeoFS network and get a transparent data placement process thanks to the decentralized architecture and flexible storage policies.

Each NeoFS node is responsible for executing the storage policies that users select for geographical location, reliability level, number of nodes, type of disks, capacity, etc. Thus, NeoFS gives users full control over their data. Deep integration with the Neo blockchain allows NeoFS to be used by decentralized applications (dApps) directly from NeoVM at the smart contract code level. This way, dApps are not limited to on-chain storage and can manipulate large amounts of data without paying a prohibitive price.

NeoFS has a native gRPC Application Programming Interface (API) and gateways for popular protocols such as AWS S3 and HTTP, allowing developers to easily integrate their existing applications without rewriting their code. Using this set of features, it is possible, for example, to use a dApp’s smart contract to manage monetary assets and data access permissions on NeoFS and let users access that data with a regular web browser or mobile application. Unlike all its competitors, NeoFS does not use its own token in the public network; instead, it uses the Neo network’s GAS token.

Storj is a project initiated and developed by the for-profit technology company Storj Labs. It is a distributed storage platform built on Ethereum. Storj is a platform, a cryptocurrency, and a suite of decentralized applications that allows you to store data in a secure and decentralized manner. Your files are encrypted, shredded into chunks called ‘pieces,’ and stored on a network of computers and servers around the globe. No one but you has access to your files, not even in encrypted form.

Storj uses public/private key encryption and cryptographic hash functions for security. To best protect your data, files are encrypted client-side, on your own device, before they are uploaded. Each file is split into chunks, which are first encrypted and then distributed across the Storj network. The network consists of Storj nodes run by users around the world who rent out their unused hard drive space in return for STORJ tokens (STORJ).

According to the Storj website, it “can’t be censored, monitored, or have downtime. It’s a decentralized, end-to-end encrypted cloud storage platform that uses cryptography to secure your files”. However, in answers to questions on Reddit, Storj has repeatedly mentioned that it operates under the full jurisdiction of the US government and restricts the operation of the service in some regions.

An even more critical question concerns how decentralized Storj is in its current form, Storj v3 with Tardigrade.

Major differences

Blockchain

At the blockchain level, the systems are quite similar: in both cases, the blockchain serves as the payment and reward layer. Storj uses it for STORJ tokens, and NeoFS uses it for Neo GAS tokens.

At the same time, NeoFS also uses the blockchain as a source of truth for the primary synchronization and bootstrap of the decentralized network. NeoFS relies on the Neo 3.0 blockchain and its features. This allows NeoFS nodes to concentrate on their primary task, storing and processing data, while leaving asset management and distributed system coordination to Neo and a set of smart contracts.

The main smart contracts, deployed on the Neo MainNet, handle deposits and withdrawals of GAS tokens to and from NeoFS accounts and maintain the list of Inner Ring nodes. NeoFS internal banking and data audit results are stored in the sidechain. Thus, most NeoFS internal transactions do not occur in the Neo main network, which makes the system cheaper to operate without losing the level of trust and guarantees of the main chain. This approach also keeps the Inner Ring nodes anonymous, without disclosing their network addresses to other network nodes.

The main NeoFS network contract is deployed in the Neo MainNet. Its roles are to maintain the list of Inner Ring nodes, maintain the list of candidate nodes for the Inner Ring, accept GAS deposits from users, and withdraw GAS to users.

The NeoFS network’s service contracts, such as Network Map contract, Container contract, Balance contract, Data Audit contract, and Reputation contract, are stored on the NeoFS Neo sidechain.

Support for the NeoFS protocol has been added to the Neo 3.0 native oracle contract, allowing NeoFS objects to be used inside smart contracts. For example, a contract can decide to transfer tokens or otherwise change its behavior based on the content of an object stored in NeoFS.

NeoFS is designed to work reliably in an unstable network and in a network with untrusted nodes. We have taken into account the need to scale the network globally while the number and quality of storage nodes constantly change.

The system does not require operational attention: it is capable of self-healing or of degrading in a controlled way until the very end, retaining its target function of storing data and the ability to evacuate data to other nodes in an emergency.

In Storj Tardigrade, all network and user management, data distribution over the network and data retrieval, metadata storage, billing, and reputation are controlled by a single entity, the Satellite (a server cluster), which is a centralized point of control for the network.

Network

At the network level, the main difference is that Storj Tardigrade has a centralized network model. All actions with data, accounting, and auditing must go through one central entity, the Satellite. As the network grows, the Satellite’s performance limits can become a problem. All user data also becomes inaccessible if the Satellite has problems (failure, attack, control lock, network unavailability, or any malicious activity). Payments in Storj Tardigrade are also handled by a single Satellite node, which potentially makes it a profitable, high-priority target for attacks. There are currently four satellites: three of them serve separate commercial networks (US, Europe, and Asia), and one is a test satellite.

All information about the data stored in the network is held by the Satellite, and the loss of this metadata can lead to the loss of access to the data. The Satellite also stores network users’ accounts. To save or retrieve an object, the user must first contact the Satellite node. All such nodes in Tardigrade are operated and maintained by Storj Labs itself. Thus, the entire network depends completely on one organization.

In NeoFS, the network has no central nodes. The Inner Ring nodes are distributed and are responsible for payments and for validating that data is stored correctly, which serves as an additional confirmation layer for payment. The Inner Ring nodes do not participate in data storage or retrieval. The NeoFS network as a whole has no single point of failure like the one in Storj. Any node can become a candidate for the Inner Ring role and get a reward for its administrative actions.

Storj’s approach, with a separate entity controlling data distribution, also poses a potential scalability issue: Storj’s scalability is strictly limited by the Satellite node’s resources. As a result, satellites form separate storage networks. For example, each region in Tardigrade has its own Satellite node, which forms an independent network.

This leads to another NeoFS advantage: in NeoFS, it is possible to build a CDN without any additional effort. A similar setup on Tardigrade would be inefficient because of centralization: one Satellite node per region means one network per region. Each Satellite is located in its region and controls that region’s separate network. Thus, an object uploaded to the network in Asia will not be available in the US network. To download it in the US, it is first necessary to contact the Asian satellite, which can introduce large network latency. So, Storj Tardigrade can be called decentralized only in the sense that data is stored on a large number of independent nodes, while control of all processes passes through a central point.

Data placement

As mentioned above, in Storj Tardigrade, the distribution of data over storage nodes and the handling of requests to retrieve data happen within the Satellite node. Thus, it is a black box for the user, who has no control over how and where his data will be stored.

In NeoFS, the user controls how and where his data will be stored due to a flexible system of storage policies. How does it work?

Each Storage node in the system has a set of key-value attributes describing the node’s properties, like geographical location or the presence of SSD drives. Inner Ring nodes generate the Network Map, a multi-graph structure that allows Storage nodes to be grouped and selected based on those key-value attributes.

In NeoFS, users put their data into Containers. Containers are like folders in a file system or buckets in Amazon’s S3, but with Storage Policy attached. Storage Policy is created by the user and defines how objects in this container should be stored.

The policy can use node attributes as follows: “Store data in three copies, in three different countries on two different continents, on nodes with SSD disks and a good reputation.” Storage nodes will do their best to keep data in accordance with this policy, since otherwise they will not get paid for their service.
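To make this concrete, here is a minimal Python sketch of how a policy like the one quoted above might be evaluated against a Network Map. It is not the NeoFS policy language or its placement code. The attribute names (`Country`, `Continent`, `SSD`), the reputation field and threshold, and the greedy selection strategy are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    key: str                                   # hypothetical node identifier
    attrs: dict = field(default_factory=dict)  # key-value attributes from the Network Map
    reputation: float = 0.0                    # hypothetical reputation score

def select_nodes(network_map, copies, min_continents, min_reputation):
    """Pick `copies` SSD nodes in distinct countries that span at least
    `min_continents` continents. Greedy and illustrative only."""
    # Filter step: keep only SSD nodes with a good enough reputation.
    candidates = [n for n in network_map
                  if n.attrs.get("SSD") == "true" and n.reputation >= min_reputation]
    # Sort so that every node evaluating the same map gets the same answer.
    candidates.sort(key=lambda n: n.key)
    chosen, countries, continents = [], set(), set()
    # First pass prefers nodes on a new continent; second pass fills remaining copies.
    for prefer_new_continent in (True, False):
        for n in candidates:
            if len(chosen) == copies:
                break
            if n in chosen or n.attrs["Country"] in countries:
                continue
            if prefer_new_continent and n.attrs["Continent"] in continents:
                continue
            chosen.append(n)
            countries.add(n.attrs["Country"])
            continents.add(n.attrs["Continent"])
    if len(chosen) < copies or len(continents) < min_continents:
        return None  # this Network Map cannot satisfy the policy
    return chosen
```

The deterministic ordering matters. In NeoFS, the storage nodes themselves must agree on which nodes serve a container, so any selection function has to produce the same result on every node that evaluates the same map.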

All storage nodes’ service fees are paid in GAS. We believe it is good for the Neo ecosystem and more convenient for users. They may own storage nodes and receive GAS, but at the same time, spend that GAS on paying for storage of their backups on other NeoFS nodes in the network.

This unique combination of Network Map, Storage Policy, and monetary incentives allows users to have full control over their data. They define where and how it should be stored and who can have access to it without trusting any third-party service or company.

Object properties

Besides the NeoFS network protocol, it is interesting to look at the storage entity itself, the object. In NeoFS, a storage object consists not only of a payload but also of a set of metadata, and the user can assign his own metadata for more flexible work with objects. This allows working with data in NeoFS as in classical object storage systems and using NeoFS to store unstructured data for IoT and Big Data analytics.

The NeoFS protocol also supports searching through the metadata of user objects and using that metadata in Access Control List (ACL) filtering rules to grant access to objects with certain metadata. This makes it possible to build applications that retrieve objects from a container by their metadata.
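Below is a minimal sketch of metadata-based search over a container, with the container modeled as a plain list of objects. The `StoredObject` type and the attribute names are hypothetical, and only exact-match filters are shown. The real NeoFS search request is a protocol operation executed by storage nodes, not this function.

```python
from dataclasses import dataclass, field

@dataclass
class StoredObject:
    object_id: str
    payload: bytes
    attributes: dict = field(default_factory=dict)  # user-defined metadata

def search(container, filters):
    """Return the IDs of objects whose attributes match every filter exactly."""
    return sorted(obj.object_id for obj in container
                  if all(obj.attributes.get(key) == value for key, value in filters.items()))
```

For example, an IoT application could store readings with `DeviceType` and `Site` attributes and later fetch all thermometer readings from one site without downloading any payloads.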

Storj can also store and process metadata to maintain Amazon S3 compatibility. However, this metadata is kept separately from the data itself, in the metadata storage of the Satellite node.

Access Control

For authentication, NeoFS uses Neo blockchain ECDSA keys, which are decentralized and owned by the user. This is a secure approach, and the same wallets work across all the blockchains involved. In addition, with the launch of NeoID, users will be able to link their keys to their identity. Thus, the user himself owns his account and keys.

In Storj, authentication is provided by API keys, and the Satellite node controls users’ accounts. Access control in Storj is also provided by the Satellite node. It is based on “macaroon” API keys: a restricted API key only has access to a subset of what its parent API key allowed.
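The “subset of the parent” property comes from how macaroons chain signatures. The following is a minimal sketch of the general macaroon idea using HMAC-SHA256. Storj’s real key format, serialization, and caveat types are different, and the `op=...` caveat syntax here is hypothetical. Note that only the holder of the root key, which in Storj’s case is the Satellite, can verify a key.

```python
import hashlib
import hmac

def _mac(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()

def mint(root_key: bytes, key_id: bytes) -> dict:
    """Issue a root API key: identifier, no caveats, signature keyed by the root secret."""
    return {"id": key_id, "caveats": [], "sig": _mac(root_key, key_id)}

def restrict(token: dict, caveat: str) -> dict:
    """Anyone holding a key can derive a narrower one without the root secret.
    The new signature is keyed by the old one, so caveats cannot be stripped."""
    return {"id": token["id"], "caveats": token["caveats"] + [caveat],
            "sig": _mac(token["sig"], caveat.encode())}

def verify(root_key: bytes, token: dict, request: dict) -> bool:
    """Recompute the signature chain and check every caveat against the request."""
    sig = _mac(root_key, token["id"])
    for caveat in token["caveats"]:
        sig = _mac(sig, caveat.encode())
        field_name, _, allowed = caveat.partition("=")  # hypothetical "field=v1,v2" syntax
        if request.get(field_name) not in allowed.split(","):
            return False
    return hmac.compare_digest(sig, token["sig"])
```

Because each caveat can only add a restriction, a key derived from a read-only key stays read-only even if it lists more operations.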

NeoFS supports access control inside the NeoFS network protocol itself, with a flexible multi-level ACL system that works without the additional requests to a central node that Storj requires. ACLs specify users’ IDs and their rights, namely to read (search through the container) or write (other object operations). On receiving a request, any storage node retrieves the container and checks the sender (the first element in the chain of signatures) against the container’s ACL. The container’s ACL covers all objects in it. Thus, the container owner obtains full control and can grant certain permissions to defined groups of users only.

The NeoFS ACL system combines basic ACL, extended ACL (if the basic ACL allows it), and bearer tokens that grant specific or temporary access. From these, a storage node obtains the authorization rules for a request.

In NeoFS, the user can set access rights both for the whole container (basic ACL) and for a specific object or a group of objects selected by any arbitrary attribute (extended ACL). Access rights can be defined for each specific operation. Together, basic and extended ACL take multiple parameters into account, which provides finer control. In this way, the owner of the data has complete control over who has access to it.
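The check described above can be sketched as a small function that any storage node could run locally. This is a simplified model under several assumptions. The role names, the operation names, the representation of basic ACL as a dictionary, and the rule that bearer-token rules are checked before the container’s extended ACL are all illustrative, and the real NeoFS encoding and precedence rules may differ.

```python
from dataclasses import dataclass, field

@dataclass
class Rule:
    action: str                                      # "allow" or "deny"
    operation: str                                   # e.g. "GET", "PUT", "SEARCH"
    role: str                                        # hypothetical roles: "owner", "others"
    attr_filter: dict = field(default_factory=dict)  # object attributes that must match

@dataclass
class Container:
    owner: str
    basic_acl: dict                                  # (role, operation) -> bool
    extended_acl: list                               # ordered list of Rule
    allow_extended: bool = True                      # whether basic ACL permits extended ACL

def check_access(container, signature_chain, operation, obj_attrs, bearer_rules=()):
    sender = signature_chain[0]                      # first signer is the request sender
    role = "owner" if sender == container.owner else "others"
    if not container.basic_acl.get((role, operation), False):
        return False                                 # basic ACL denies outright
    if not container.allow_extended:
        return True
    # First matching rule wins; bearer-token rules are consulted first in this sketch.
    for rule in list(bearer_rules) + container.extended_acl:
        if (rule.operation == operation and rule.role == role
                and all(obj_attrs.get(k) == v for k, v in rule.attr_filter.items())):
            return rule.action == "allow"
    return True                                      # no extended rule matched
```

No call to any central service appears in the function. Everything the node needs is the container, which is recorded in the sidechain, plus the request itself.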

Data Reliability

One of the most important criteria for a storage system is the reliability of the object storage.

NeoFS is designed around the object replication paradigm, in which the user controls the amount of redundancy through the storage policy. At the same time, the flexible object-processing pipeline makes it possible to add erasure codes as an additional option in future versions of NeoFS.

Storj stores data using erasure codes only. Mandatory use of erasure codes increases the time it takes to assemble an object on retrieval or to prepare it for storage. This can be critical for a very large stream of very small objects.

But more important is the system’s ability to recover data in case of storage node failures or data corruption.

In NeoFS, data is stored in containers. Each container is served by a subset of storage nodes that satisfy the storage policy the user defined for that particular container. These nodes are determined mathematically, by a function applied to the Network Map with the storage policy as an argument. The resulting set of nodes is responsible for ensuring that the storage policy is satisfied and that the data is not corrupted. If they succeed, they share the users’ payment for data storage. One storage node may serve many containers, so small shares from each container add up to a significant reward if the node behaves correctly. The same is true for losses if the container’s nodes misbehave. This motivates nodes to keep an eye on other container members and properly perform all required replication, migration, and data recovery processes. Thus, monitoring compliance with the storage policy and the number of replicas in the system, as well as creating new copies of objects after a node failure or data loss, happens asynchronously and in a distributed way. This process does not depend on any central nodes and has no bottlenecks. Storage nodes themselves watch over each other and copy data between nodes, motivated by the prospect of a reward and the risk of losing it.
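A common way to build a deterministic “Network Map in, container nodes out” function is rendezvous (highest-random-weight) hashing. The sketch below uses it purely for illustration and reduces the storage policy to a replica count. The SHA-256 weighting and the function names are assumptions, not NeoFS code.

```python
import hashlib

def _weight(node_id: str, container_id: str) -> int:
    """Pseudo-random but reproducible weight of a node for a given container."""
    digest = hashlib.sha256(f"{node_id}/{container_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def container_nodes(network_map, container_id, replicas):
    """Rank all nodes by weight for this container and keep the top `replicas`."""
    ranked = sorted(network_map, key=lambda n: _weight(n, container_id), reverse=True)
    return ranked[:replicas]

def missing_replicas(network_map, container_id, replicas, holders):
    """Any container node can run this locally to see where copies must be made."""
    return [n for n in container_nodes(network_map, container_id, replicas)
            if n not in holders]
```

With this approach, every node computes the same set of container nodes without asking anyone. When a node leaves the map, only the containers it served are affected, and the remaining members keep their ranks. This is what lets nodes detect and repair missing replicas on their own.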

In Storj, the recovery process is initiated centrally by the Satellite during the Audit procedure. The recovery procedure starts only when the central controlling node detects data corruption or the loss of part of the data. The Satellite node has to download the whole object before recovery, even if only one small piece of it has been lost. This is a very expensive process in terms of Satellite resources and can become a problem. Because of this cost, the Satellite rarely audits any specific piece of data (theoretically, up to once a month), which is dangerous in case of network degradation.

To sum up, Storj Tardigrade has a very resource-intensive data recovery algorithm because it uses erasure codes only. To recover one piece of a large object, most of the file must be downloaded to reconstruct the full object. Only after that does the algorithm regenerate the lost small piece and upload it to another node. The history of Storj satellite issues includes incidents in which a satellite crashed because the data recovery (repair) procedure opened too many connections.
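The trade-off can be put into numbers with a simplified traffic model. The sketch counts only the bytes moved over the network to restore one lost unit of redundancy. The parameters are hypothetical (a 1200 MB object, 3 replicas versus a k=4, n=10 erasure code) and are not Storj’s actual settings.

```python
def replication_profile(object_size: float, copies: int) -> dict:
    return {
        "stored": object_size * copies,
        "tolerated_losses": copies - 1,
        # A lost replica is restored by copying one surviving replica.
        "repair_traffic": object_size,
        "restored": object_size,
    }

def erasure_profile(object_size: float, k: int, n: int) -> dict:
    piece = object_size / k
    return {
        "stored": piece * n,
        "tolerated_losses": n - k,
        # A lost piece: download any k pieces (a whole object's worth of data),
        # re-encode, then upload the single regenerated piece.
        "repair_traffic": piece * k + piece,
        "restored": piece,
    }

def traffic_per_restored_byte(profile: dict) -> float:
    return profile["repair_traffic"] / profile["restored"]

rep = replication_profile(1200, 3)
ec = erasure_profile(1200, 4, 10)
print(traffic_per_restored_byte(rep), traffic_per_restored_byte(ec))  # 1.0 5.0
```

Erasure coding stores less data for more tolerated losses (3000 MB and 6 losses versus 3600 MB and 2 losses here). However, every repair costs about k+1 bytes of traffic per restored byte, and in Storj all of that traffic passes through the Satellite.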

In both NeoFS and Storj, payment for storage occurs only after storage is confirmed, that is, when the storage node passes a data audit. In NeoFS, payments are made only for data confirmed by verification. In Storj, a node must accumulate a certain number of successful checks on random objects. Since the checks in Storj Tardigrade are performed by a centralized node with limited resources, not all data on the network can be checked. Moreover, payment in NeoFS occurs once per epoch, which is a short period of time (in the test network, an epoch lasts 3 hours, or even 1 hour for now), while payment in Storj takes place once a month.

Audit

Each decentralized system tries to solve the Data Audit problem in its own way. NeoFS focuses on a probabilistic approach and homomorphic hashing to minimize network load and avoid single points of failure.

In NeoFS, Data Audit is implemented as a unique zero-knowledge multi-stage game based on homomorphic hash calculation, without disclosing the data. Data Audit is independent of object storage procedures (recovery, replication, and migration). NeoFS uses homomorphic hashing, a special type of hashing algorithm that allows computing the hash of a composite block from the hashes of its individual blocks.

For integrity checks, NeoFS calculates a composite homomorphic hash of all the objects in a controlled group and puts it into a structure named Storage Group. During integrity checks, NeoFS nodes can show that the hashes of stored objects are correct and form part of that initially created composite hash. This can be done without moving the objects’ data over the network, and the hash size stays the same no matter how many objects are in the Storage Group.
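Below is a teaching sketch of such a hash, following the Tillich-Zémor construction. As far as we understand, NeoFS’s homomorphic hash belongs to this family. The hash of a message is the ordered product of one 2x2 matrix per input bit over GF(2^127), so the hash of a concatenation equals the product of the parts’ hashes. This is slow pure Python and not the NeoFS implementation. The Storage Group is reduced to a list of object hashes, and `verify_stage_one` is a hypothetical helper for the first audit stage described next.

```python
from functools import reduce

DEG = 127
MOD = (1 << 127) | (1 << 63) | 1   # field modulus x^127 + x^63 + 1
X = 0b10                           # the polynomial "x"

def gf_mul(a: int, b: int) -> int:
    """Carry-less multiplication in GF(2^127)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> DEG) & 1:
            a ^= MOD               # reduce as soon as degree 127 appears
    return result

def mat_mul(m, n):
    (a, b), (c, d) = m
    (e, f), (g, h) = n
    return ((gf_mul(a, e) ^ gf_mul(b, g), gf_mul(a, f) ^ gf_mul(b, h)),
            (gf_mul(c, e) ^ gf_mul(d, g), gf_mul(c, f) ^ gf_mul(d, h)))

IDENTITY = ((1, 0), (0, 1))
BIT0 = ((X, 1), (1, 0))
BIT1 = ((X, X ^ 1), (1, 1))

def tz_hash(data: bytes):
    """Ordered product of one generator matrix per input bit."""
    h = IDENTITY
    for byte in data:
        for i in range(7, -1, -1):
            h = mat_mul(h, BIT1 if (byte >> i) & 1 else BIT0)
    return h

def storage_group_hash(object_hashes):
    """Composite hash of a group: product of member hashes, in order."""
    return reduce(mat_mul, object_hashes, IDENTITY)

def verify_stage_one(stored_composite, reported_hashes):
    """Reported per-object hashes must recompose the stored composite."""
    return storage_group_hash(reported_hashes) == stored_composite
```

The result is always four field elements, whatever the number or size of the objects. This is why the Storage Group hash has a constant size.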

Each epoch, Inner Ring nodes perform a data audit, which in game-theoretic terms is a two-stage game.

At the first stage, nodes of the selected container are asked to collectively reconstruct the list of homomorphic hashes that form the composite hash stored in the Storage Group. By doing so, nodes demonstrate that they have all the objects and can provide their hashes. The provided list of hashes can be validated, but at this stage it is still unknown whether some nodes are lying.

At the second stage, it is necessary to ensure that nodes are honest and do not fake the check results. The Inner Ring nodes calculate a set of node pairs that store the same object and ask each node to provide homomorphic hashes of ranges of that object. The ranges are chosen so that the hash of the range requested from one node is the composite of the hashes of the ranges requested from the other node in that pair. Nodes cannot predict which objects or ranges will be chosen for audit, or even which node they will be paired with.
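The pairing trick can be sketched as follows. To keep the block short, it swaps in a toy homomorphic hash: a polynomial rolling hash carried as `(value, BASE**len) mod P`, which composes over concatenation but is NOT collision resistant. It stands in for the real homomorphic hash. The challenge planner, the seeded RNG (a real audit needs unpredictable randomness), and all function names are hypothetical.

```python
import random

P = (1 << 61) - 1      # Mersenne prime modulus for the toy hash
BASE = 1_000_003

def toy_hash(data: bytes):
    """Toy homomorphic hash; a stand-in for illustration, NOT secure."""
    value, power = 0, 1
    for byte in data:
        value = (value * BASE + byte) % P
        power = power * BASE % P
    return value, power

def compose(left, right):
    """hash(a + b) computed from hash(a) and hash(b) alone."""
    return (left[0] * right[1] + right[0]) % P, left[1] * right[1] % P

def plan_challenge(obj_len, parts, rng):
    """Inner Ring side: a random range for node A, split into `parts`
    adjacent sub-ranges for node B. Requires obj_len > parts."""
    start = rng.randrange(0, obj_len - parts)
    end = rng.randrange(start + parts, obj_len + 1)
    cuts = sorted(rng.sample(range(start + 1, end), parts - 1))
    bounds = [start] + cuts + [end]
    return (start, end), list(zip(bounds, bounds[1:]))

def answer(stored: bytes, byte_range):
    """Storage node side: hash of the requested range of its own copy."""
    lo, hi = byte_range
    return toy_hash(stored[lo:hi])

def check_pair(hash_from_a, hashes_from_b):
    """Node B's sub-range hashes must compose into node A's range hash."""
    composed = hashes_from_b[0]
    for h in hashes_from_b[1:]:
        composed = compose(composed, h)
    return composed == hash_from_a
```

Neither node knows the other node’s ranges, so a node that no longer has the data cannot produce answers that happen to agree with its partner.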

This stage uncovers malicious nodes quickly because each node serves multiple containers and Storage Groups and participates in many data audit sessions. A node caught lying gets no rewards for that epoch. So the cost and risk of faking checks are too high, and it is easier and cheaper for a node to be honest and behave correctly.
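The “fast” part is simple probability. Assume, as a simplification, that each audit session independently catches a lying node with some probability p. The chance of escaping detection then shrinks geometrically with the number of sessions the node takes part in. The value of p used below is hypothetical.

```python
import math

def detection_probability(p_per_session: float, sessions: int) -> float:
    """P(caught at least once), assuming independent sessions."""
    return 1 - (1 - p_per_session) ** sessions

def sessions_needed(p_per_session: float, target: float) -> int:
    """Smallest number of sessions that reaches the target detection probability."""
    return math.ceil(math.log(1 - target) / math.log(1 - p_per_session))
```

Even with a modest p = 0.1, twenty sessions catch a cheater with about 88% probability, and 44 sessions push this above 99%. A node that serves many containers reaches such numbers quickly.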

Combining the fact that nodes can reconstruct the Storage Group’s composite hash with the fact that they behave honestly, the system can consider the data safely stored, uncorrupted, and available with high probability.

If the data audit succeeds, the Inner Ring nodes initiate microtransactions from the data owner’s account to the storage node owners’ accounts.
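A per-epoch settlement step could look like the sketch below. The ledger is a plain dictionary of integer balances (a hypothetical stand-in for the sidechain Balance contract), and prices are assumed to be quoted per GB per epoch in integer sub-units of GAS. Only nodes that passed this epoch’s audit are paid.

```python
def settle_epoch(balances, owner, container_nodes, prices, size_gb, audit_passed):
    """Charge the data owner for one epoch and pay the nodes that passed the audit.

    balances:     account -> integer balance (hypothetical ledger)
    prices:       node -> declared price per GB per epoch, same units
    audit_passed: node -> True if the node passed this epoch's audit
    """
    fees = {node: prices[node] * size_gb
            for node in container_nodes if audit_passed.get(node, False)}
    total = sum(fees.values())
    if balances.get(owner, 0) < total:
        raise ValueError("owner deposit does not cover this epoch")
    balances[owner] -= total
    for node, fee in fees.items():
        balances[node] = balances.get(node, 0) + fee
    return fees
```

Computing all fees before moving any funds keeps the ledger consistent if the owner’s deposit turns out to be insufficient.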

Data Audit in Storj is based on erasure codes. The Satellite node selects a random stripe from a random remote segment in its metadata database. The audit verifier downloads all erasure shares generated for that stripe; it needs at least the Reed-Solomon minimum required number of pieces to verify the stripe’s correctness. The verifier uses the Berlekamp-Welch algorithm to determine whether any shares have been damaged or altered. If so, it marks the culpable nodes as having failed the audit. For nodes whose shares have not been altered, it records audit successes. If a node is offline at the time of download, the verifier records it as offline.
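The verifier’s job can be illustrated with a toy Reed-Solomon code over GF(257). Instead of Berlekamp-Welch, this sketch uses brute-force unique decoding. It tries every k-subset of the downloaded shares and keeps the polynomial that agrees with the most shares, which gives the same answer while the number of corrupted shares is at most (downloaded - k)/2. The field, the sizes, and the function names are illustrative and are not Storj’s implementation.

```python
from itertools import combinations

P = 257  # toy prime field; real deployments use other fields and parameters

def interpolate_at(points, x):
    """Evaluate at x the polynomial of degree < len(points) through `points` (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def encode_stripe(data, n):
    """Systematic encoding: k data symbols become n shares at x = 1..n."""
    points = list(zip(range(1, len(data) + 1), data))
    return {x: interpolate_at(points, x) for x in range(1, n + 1)}

def audit_stripe(expected_nodes, downloaded, k):
    """downloaded: node -> (x, share) for nodes that answered.
    Returns (passed, failed, offline), or None if fewer than k shares arrived."""
    offline = set(expected_nodes) - set(downloaded)
    items = list(downloaded.items())
    if len(items) < k:
        return None
    passed = set()
    for subset in combinations(items, k):
        points = [xy for _, xy in subset]
        agree = {node for node, (x, y) in items if interpolate_at(points, x) == y}
        if len(agree) > len(passed):
            passed = agree
    return passed, set(downloaded) - passed, offline
```

Note that the verifier has to fetch at least k shares of a stripe to judge it. This is why, in Storj, auditing is a download-heavy task concentrated on the Satellite.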

Satellite nodes can perform only a limited number of object audits per day, and recovery can be initiated only after a failed audit. As the number of consumers and stored objects grows, this model can cause serious problems if the network degrades.

Nodes Reputation system

Ideally, data auditing, which aims to regulate the behavior of network nodes, should be backed by a reputation system. Both storage systems have a reputation system for monitoring host behavior and removing malicious nodes from the network.

NeoFS implements decentralized reputation systems for Storage nodes and Inner Ring nodes. For Storage nodes, the reputation system is based on the EigenTrust protocol. If a node does not meet the reputation threshold, it is excluded from the network.
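The core of EigenTrust is a power iteration: global trust is repeatedly recomputed as the trust-weighted average of every node’s normalized local opinions, blended with a pre-trusted distribution. The sketch below shows the basic, centralized form of that computation only. NeoFS runs it in a distributed way, and its parameters, data formats, and threshold are not shown here. `alpha`, the iteration count, and the exclusion threshold are assumptions.

```python
def eigentrust(local_trust, pretrusted, alpha=0.15, iterations=100):
    """Global trust via EigenTrust-style power iteration.

    local_trust: node -> {peer: satisfaction score}; negative scores clipped to 0
    pretrusted:  node -> weight; the weights must sum to 1
    """
    nodes = sorted(local_trust)
    p = {j: pretrusted.get(j, 0.0) for j in nodes}
    # Normalize each node's opinions into a probability distribution over peers.
    c = {}
    for i in nodes:
        row = {j: max(v, 0.0) for j, v in local_trust[i].items() if j != i and j in p}
        total = sum(row.values())
        c[i] = {j: v / total for j, v in row.items()} if total > 0 else dict(p)
    t = dict(p)
    for _ in range(iterations):
        nxt = {j: alpha * p[j] for j in nodes}
        for i in nodes:
            for j, cij in c[i].items():
                nxt[j] += (1 - alpha) * cij * t[i]
        t = nxt
    return t

REPUTATION_THRESHOLD = 0.05  # hypothetical exclusion threshold
```

A node that nobody reports positive interactions with ends up with near-zero global trust, whatever it says about itself. Nodes below the threshold would then be candidates for exclusion.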

The Storj reputation system consists of local storage node scores kept in the central node (Satellite), based on audit results and on network availability as observed by the Satellite.

Garbage collection

Besides auditing, another interesting task in decentralized and distributed systems is the deletion of objects by the user.

NeoFS implements a decentralized tombstone model. When an object is deleted, a service object (a tombstone) is PUT into the user’s container. All nodes storing the object receive information about the deletion, and their garbage collectors delete data that is no longer relevant. In this way, objects are removed in a decentralized manner.
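The tombstone flow can be modeled in a few lines. The `Obj` and `StorageNode` types and the `kind`/`deletes` fields are hypothetical simplifications of the real NeoFS object format. The point is that deletion is just another object written to the container, and each node’s garbage collector acts on it locally.

```python
from dataclasses import dataclass, field

@dataclass
class Obj:
    object_id: str
    kind: str = "regular"    # "regular" or "tombstone" (hypothetical field)
    deletes: tuple = ()      # IDs of objects covered by a tombstone

@dataclass
class StorageNode:
    objects: dict = field(default_factory=dict)

    def put(self, obj: Obj):
        self.objects[obj.object_id] = obj

    def is_removed(self, object_id: str) -> bool:
        return any(object_id in o.deletes
                   for o in self.objects.values() if o.kind == "tombstone")

    def collect_garbage(self):
        """Local GC pass: drop every regular object covered by a tombstone."""
        doomed = [oid for oid, o in self.objects.items()
                  if o.kind == "regular" and self.is_removed(oid)]
        for oid in doomed:
            del self.objects[oid]
        return doomed
```

No node has to ask a coordinator whether an object is garbage. The tombstone travels through the container like any other object, and each holder can decide on its own.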

In Storj, the Delete operation is implemented centrally, via requests from the Satellite node and the user. If a storage node misses these requests, its garbage collector periodically sends requests to the Satellite node to detect garbage (optimized with Bloom filters). Each node thus adds load to a single Satellite node, whose performance is limited. This also limits network scaling.
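The Bloom filter optimization can be sketched as follows. This sketch assumes that the Satellite summarizes the set of pieces a node should keep in a Bloom filter and that the node deletes everything the filter definitely does not contain. The filter size, the number of hash functions, and the `retain` helper are illustrative choices.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits: int, num_hashes: int):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

def retain(stored_pieces, keep_filter):
    """Node-side GC: delete pieces definitely absent from the keep-filter.
    False positives only leave some garbage behind; live pieces are never deleted."""
    kept = {p for p in stored_pieces if keep_filter.might_contain(p)}
    return kept, set(stored_pieces) - kept
```

The filter keeps the message small. However, in this design the central Satellite still has to build one such filter per storage node, so the work grows with the size of the network.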

Economic model

NeoFS’s incentive model and pricing follow free-market principles. Each node declares the reward it wants for its data storage services in the upcoming epoch. The placement function takes the declared price into account and prefers nodes with better prices. However, nodes in the Network Map also declare key-value attributes describing their parameters, such as geographical location, storage type, capacity, and other properties used in Storage Policy. This leaves room for nodes that ask a higher price to still get into the placement function’s result if they provide better or unique services.
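The interplay between price and attributes can be sketched as a two-step selection: first filter the Network Map by the attributes the policy requires, then prefer the cheapest eligible nodes. This is a deliberate simplification of the real placement function. The node dictionary shape, the prices, and the `SSD` attribute are hypothetical.

```python
def place(nodes, copies, required_attrs):
    """Among nodes whose attributes satisfy the policy, prefer the cheapest.
    nodes: list of dicts with 'id', 'price', 'attrs' (hypothetical shape)."""
    eligible = [n for n in nodes
                if all(n["attrs"].get(k) == v for k, v in required_attrs.items())]
    eligible.sort(key=lambda n: (n["price"], n["id"]))  # deterministic tie-break
    return [n["id"] for n in eligible[:copies]]

nodes = [
    {"id": "cheap-1", "price": 5, "attrs": {"SSD": "false"}},
    {"id": "cheap-2", "price": 6, "attrs": {"SSD": "false"}},
    {"id": "ssd-1", "price": 20, "attrs": {"SSD": "true"}},
    {"id": "greedy", "price": 50, "attrs": {"SSD": "false"}},
]
print(place(nodes, 2, {}))             # ['cheap-1', 'cheap-2']
print(place(nodes, 1, {"SSD": "true"}))  # ['ssd-1']
```

The expensive SSD node still wins whenever a policy requires SSDs. The `greedy` node, which offers nothing unique, never gets selected.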

In short, if the node is too greedy, it does not get users and rewards. To charge higher prices and make more profit, the node needs to satisfy market demand or provide something unique.

This creates a system with free-market principles and complete transparency of rewards and fees. It also ensures that, regardless of fluctuations in the token exchange rate, storage remains beneficial for all network participants.

In contrast, Storj Tardigrade uses a fixed storage price for users and a hidden distribution of rewards to storage nodes in STORJ tokens. Storj Tardigrade rejected decentralization in favor of a centralized service with distributed storage. The service fully controls all finances: it accepts payment from users in fiat (US dollars) and pays rewards in STORJ tokens. The entire billing and payment procedure is carried out in a closed, “manual” mode as a black box (an internal billing process in the Satellite controlled by Storj Labs).

The business logic of Storj, which purchases capacity from storage nodes and makes it available to users who need storage at USD 10 per TB, is reminiscent of the Uber model. Storj looks like a very traditional reseller that buys capacity wholesale and sells it at a higher price: its profit lies in the difference between the cost of buying storage capacity (for tokens) and the price of selling it (mainly for USD).

Payment covers not only storage but also egress and ingress traffic. Since all interaction with the network goes through the Satellite in a centralized manner, this node tracks traffic and handles billing. So, Storj Tardigrade has a centralized payment method, implemented as an internal billing process in the Satellite node.

NeoFS has a distributed and decentralized payment model, with deposit and withdrawal procedures between users and the NeoFS smart contract and decentralized billing based on the Audit procedure.

Conclusion

Of course, Storj Tardigrade gains certain advantages from centralizing all operations within a single cluster node owned by Storj Labs, but this also leads to potential risks and limitations. Despite the marketing claims of decentralization, Storj is decentralized only in the sense that data is stored on Storage nodes of the Storj network that do not belong to Storj Labs. Storj Tardigrade is rather a traditional cloud provider that sells space not from its own data center capacity but from storage nodes in a public network.

NeoFS, on the other hand, is built as a truly decentralized storage platform. The NeoFS team is solving the hardest problems of adapting decades of experience with distributed storage systems to a decentralized setting. Thanks to its truly decentralized architecture, NeoFS has no bottlenecks or single points of failure, such as the Satellite in Storj Tardigrade.

Both NeoFS and Storj are moving in the right direction by integrating with external protocols such as S3 and HTTP. At the same time, NeoFS strives to be a platform for distributed and decentralized applications, DeFi, and FinTech services that need to work in an untrusted environment.
