The Decentralized Data Cloud (DDC) is a decentralized data protocol built on the Cere Blockchain Network.


Latest Updates

🔥 Deploy your first browser-based game on DDC for a chance to earn $150!

🚀 DDC is available now on Testnet! - https://ddc.cere.network/

📰  The DDC Storage and CDN node code repositories are currently being audited by our security partner Halborn and are expected to be released publicly soon.


Project Resources

Releases

Decentralized Data Cloud (DDC)

DDC provides a way to permanently store & retrieve data in a performant, usable, and trustless way.

DDC version 1.0 launched to Mainnet in Q4 2021 along with the debut of Cere’s NFT marketplace solution https://ondavinci.com/.

DDC version 2.0 is currently under development and will introduce a new dual-layer structure of storage nodes and decentralized Content Delivery Nodes (dCDN) for improved content delivery.


Benefits of the DDC

The DDC enables you to build Web3 dApps where the user owns the data.

Flexible data storage & management.

Secure, reliable and GDPR compliant data storing and validation.

Fast data delivery.

Built-in file sharing and serverless app hosting allow developers to…

Core Features

Storage Node

Data Objects

Content uploaded to the DDC is stored as “objects”: sets of data pieces represented by a root file descriptor mapped to a unique Content Identifier (CID), which in turn references file chunks as defined by the DDC Data Schema. Data Objects cannot exist independently; they must belong to a Data Bucket.

Data Buckets

DDC Buckets provide the ability to save, load, read, and update any type of data on demand; the data is paid for and managed as a single logical collection called a “Data Bucket”. An account can have an arbitrary number of buckets, and each bucket can hold an arbitrary number of objects.

Put another way, the DDC provides an unstructured cloud data storage service for “data objects”, each of which includes metadata that points to or references files stored throughout the distributed network, as described in the File Storage section.

👀 Check the Quickstart Guide to learn how to deploy your first bucket on the DDC Testnet!
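Below is a minimal sketch of connecting a client, creating a bucket, and storing an object. The package name, the DdcClient class, and the connect/createBucket/store methods are illustrative assumptions, not the authoritative SDK API; the Quickstart Guide has the exact calls.

```typescript
// Illustrative only: the package, class, and method names below are
// assumptions, not the authoritative DDC SDK API.
import { DdcClient } from '@cere-ddc-sdk/ddc-client'; // assumed package name

async function main() {
  // Connect with an account that holds $CERE for storage deposits.
  const client = await DdcClient.connect({ seed: '//Alice' }); // hypothetical

  // A bucket is the unit of payment and management for your objects.
  const bucketId = await client.createBucket({ isPublic: true }); // hypothetical

  // Store a small object; the returned CID identifies the uploaded data.
  const cid = await client.store(bucketId, new TextEncoder().encode('hello DDC'));
  console.log(`Stored object ${cid} in bucket ${bucketId}`);
}

main();
```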

File Storage

The content of a file, say a 12 GB movie_4k.mp4, is stored in chunks, either in the descriptor piece, in separate data pieces, or both. The first chunk may optionally be stored inline in the descriptor piece. Any additional chunks are stored in data pieces, which must be uploaded before the descriptor piece.

The data pieces are referenced by their CIDs in the links field of the descriptor piece. They are found in the same context as the descriptor piece, i.e., in the same Bucket and Cluster, although usually not on the same node.
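The descriptor/data-piece relationship can be pictured with the following illustrative TypeScript shapes; the field names are assumptions based on the description above, and the normative definitions live in the DDC Data Schema.

```typescript
// Illustrative shapes only; the normative layout is defined by the DDC Data Schema.
interface Link {
  cid: string;  // Content Identifier of a data piece
  size: number; // size of that piece in bytes (assumed field)
}

interface DescriptorPiece {
  inlineChunk?: Uint8Array; // optional first chunk, kept empty or small
  links: Link[];            // CIDs of the remaining chunks, uploaded first
}
```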

The details of content chunking are the responsibility of the uploader. The inline chunk should be empty or small to allow fast access to the descriptor. Since a piece is the smallest unit that can be retrieved at once, the size and number of chunks should be chosen to balance parallelism, smoothness of streaming, memory usage, and per-chunk overhead. Here are some possible use-cases (a chunking sketch follows the list):

Small files: inline chunk only; this is the most efficient in space and speed.

Big files: all content is in linked data pieces. The file descriptor is the smallest possible.

Streamable files: access to the start of the content as early as possible. The inline chunk may carry the content type, metadata, a preview of the content (e.g., a video thumbnail), or other application-specific data. Other chunks should be small enough to allow smooth seeking in media players.
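As a rough illustration of these trade-offs, here is one possible uploader-side chunking policy. Only the 100 MB per-chunk maximum comes from this page; the thresholds and the function itself are assumptions.

```typescript
// One possible chunking policy; thresholds are illustrative assumptions.
const MAX_CHUNK_SIZE = 100 * 1024 * 1024; // protocol maximum per chunk

interface ChunkPlan {
  inlineChunk?: Uint8Array; // stored inside the descriptor piece
  dataChunks: Uint8Array[]; // stored as separate, linked data pieces
}

function planChunks(
  content: Uint8Array,
  inlineLimit = 16 * 1024,     // small files: keep everything inline
  chunkSize = 4 * 1024 * 1024, // big files: linked pieces of this size
): ChunkPlan {
  if (chunkSize > MAX_CHUNK_SIZE) throw new Error('chunk size exceeds protocol maximum');

  // Small file: inline chunk only, the most efficient in space and speed.
  if (content.length <= inlineLimit) {
    return { inlineChunk: content, dataChunks: [] };
  }

  // Big file: no inline chunk; split into linked pieces sized for smooth
  // streaming and parallel retrieval.
  const dataChunks: Uint8Array[] = [];
  for (let offset = 0; offset < content.length; offset += chunkSize) {
    dataChunks.push(content.subarray(offset, offset + chunkSize));
  }
  return { dataChunks };
}
```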

Payments

Users must pay for storage by making deposits of $CERE tokens at regular intervals to ensure data remains available as expected. Collateral received from Users is placed into a managed escrow contract. Payments are received by Clusters and distributed to service providers according to each Provider's relative contribution, weighted by quality-of-service measures as determined by DAC Validators.

Search

Uploaded data can be searched by CID, Tags, and Filenames, and search criteria can be combined using the logical 'or' operator (e.g. tagA = N or tagB = M). The search operation is distributed, and the result is collected from the nodes of the storage cluster where the Bucket is stored (because the Bucket is distributed across all cluster nodes).
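A hypothetical query shape, sketched only to make the OR semantics concrete; the real SDK's search API may look different.

```typescript
// Hypothetical query shape; not the SDK's actual search API.
type Criterion =
  | { cid: string }
  | { tag: string; value: string }
  | { filename: string };

interface SearchQuery {
  bucketId: bigint;
  any: Criterion[]; // criteria joined with a logical OR
}

// Matches pieces where tagA = N or tagB = M.
const query: SearchQuery = {
  bucketId: 1n,
  any: [
    { tag: 'tagA', value: 'N' },
    { tag: 'tagB', value: 'M' },
  ],
};
```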

Security

The most critical rules that govern the relations between Users, Providers, and Validators are executed on the Cere blockchain. All data is replicated across a number of independent providers; failing or malicious providers are detected and replaced. The state and history of all nodes are also represented on-chain, using cryptography to validate the amount and quality of service provided. All data is identified by cryptographic identifiers and signatures that let both apps and validators verify its integrity and provenance. A system of data encryption and data sharing is provided to protect private data.

Bug Bounties are coming soon to our bounty taskboard in Dework, part of the contributors’ program.

Scaling

Consistent Hashing

To efficiently store and retrieve arbitrarily sized and formatted data across a global network, Cere has implemented a variant of the consistent hashing algorithm in the DDC Storage Nodes.

In computer science, consistent hashing is a special kind of hashing technique such that when a hash table is resized, only n/m keys need to be remapped on average where n is the number of keys and m is the number of slots. In contrast, in most traditional hash tables, a change in the number of array slots causes nearly all keys to be remapped because the mapping between the keys and the slots is defined by a modular operation.

Source: Wikipedia

The consistent hashing algorithm, implemented in the DDC with a ring-patterned organizational structure, offers many benefits:

With consistent hashing, the ring is divided into smaller, predefined ranges, and each node is assigned a set of ranges (numbered 1 to N). The start of each range is called a token, so each node holds N tokens in its Token Set.
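For intuition, here is a minimal consistent-hashing ring with per-node tokens; the DDC's actual implementation differs in hashing details and token assignment.

```typescript
// Minimal consistent-hashing ring with virtual nodes (tokens).
// Illustrative only; not the DDC's actual implementation.
import { createHash } from 'node:crypto';

// Map a string onto a 32-bit ring position.
function hashToRing(value: string): number {
  return createHash('sha256').update(value).digest().readUInt32BE(0);
}

class Ring {
  private tokens: { position: number; node: string }[] = [];

  addNode(node: string, tokenCount: number): void {
    // Each node gets N tokens; each token marks the start of a range it owns.
    for (let i = 0; i < tokenCount; i++) {
      this.tokens.push({ position: hashToRing(`${node}:${i}`), node });
    }
    this.tokens.sort((a, b) => a.position - b.position);
  }

  nodeFor(key: string): string {
    const p = hashToRing(key);
    // Walk clockwise to the first token at or past the key, wrapping around.
    const token = this.tokens.find((t) => t.position >= p) ?? this.tokens[0];
    return token.node;
  }
}

// Adding or removing a node remaps only ~n/m keys on average,
// rather than nearly all of them.
const ring = new Ring();
ring.addNode('storage-node-1', 8);
ring.addNode('storage-node-2', 8);
console.log(ring.nodeFor('movie_4k.mp4'));
```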

The DDC doesn’t enforce a fixed data chunk size, only a maximum of 100 MB per chunk, so data can’t be perfectly balanced between nodes. To minimize data skew in any given cluster, we also ensure the following rules are followed:

Partitioning

Partitioning additionally gives us the ability to distribute a single dataset across the network, which provides the following benefits:

Merkle Tree

Merkle trees help avoid data entropy when we add or remove data from a cluster, or when we recover nodes that have missed data. The tree makes it possible to quickly find missing data between sources without transferring huge amounts of data. We can also use Merkle trees to check that all required data was transferred to storage nodes during DAC Validation.

In cryptography and computer science, a hash tree or Merkle tree is a tree in which every "leaf" (node) is labelled with the cryptographic hash of a data block, and every node that is not a leaf (called a branch, inner node, or inode) is labelled with the cryptographic hash of the labels of its child nodes. A hash tree allows efficient and secure verification of the contents of a large data structure. A hash tree is a generalization of a hash list and a hash chain.

Source: Wikipedia
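A minimal sketch of computing a Merkle root over piece hashes: two replicas whose roots match hold the same data, and a mismatch can be narrowed down by comparing subtree hashes instead of transferring the data itself. This illustrates the general technique, not Cere's implementation.

```typescript
// Minimal Merkle-root computation over piece hashes. Illustrative only.
import { createHash } from 'node:crypto';

const sha256 = (data: Buffer): Buffer => createHash('sha256').update(data).digest();

function merkleRoot(leaves: Buffer[]): Buffer {
  if (leaves.length === 0) return sha256(Buffer.alloc(0));
  let level = leaves.map(sha256); // hash each data block into a leaf
  while (level.length > 1) {
    const next: Buffer[] = [];
    for (let i = 0; i < level.length; i += 2) {
      // Pair each hash with its sibling (duplicate the last if unpaired).
      const right = level[i + 1] ?? level[i];
      next.push(sha256(Buffer.concat([level[i], right])));
    }
    level = next;
  }
  return level[0];
}

// Two replicas agree iff their roots match.
const pieces = ['piece-1', 'piece-2', 'piece-3'].map((p) => Buffer.from(p));
console.log(merkleRoot(pieces).toString('hex'));
```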

Cere’s Merkle tree implementation offers the following features:

Client SDKs

Client SDKs provide developers with the tools required to serve application end-users through a standardized interface. Cere officially supports the development of the following client SDKs for use with the DDC:

The primary methods provided by the DDC SDK are as follows:

Data Encryption

Using the DDC Client SDK, one can encrypt data on demand with industry-standard encryption.

The DDC SDKs use the tweetnacl library for asymmetric and symmetric encryption.

When a user configures their DDC client, they set a secret encryption phrase and the SDK derives two secrets:

  1. The SDK hashes the encryption phrase with blake2b-256. The result is the user's master Data Encryption Key (DEK), used to symmetrically encrypt data.
  2. tweetnacl uses the encryption phrase to generate the private and public keys for asymmetric encryption.
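A minimal sketch of this two-secret setup using the real tweetnacl and blakejs npm packages; reusing the blake2b digest as the box secret key is an assumption here, and the SDK's actual derivation may differ.

```typescript
import nacl from 'tweetnacl';
import { blake2b } from 'blakejs';

const encode = (s: string) => new TextEncoder().encode(s);

function deriveSecrets(encryptionPhrase: string) {
  // 1. Master DEK: blake2b-256 hash of the phrase, used for symmetric
  //    encryption via nacl.secretbox.
  const masterDek = blake2b(encode(encryptionPhrase), undefined, 32);

  // 2. Asymmetric key pair for sharing. Deriving it from the same digest
  //    is an assumption; the SDK's scheme may differ.
  const boxKeyPair = nacl.box.keyPair.fromSecretKey(masterDek);

  return { masterDek, boxKeyPair };
}

// Symmetric encryption with the master DEK:
const { masterDek } = deriveSecrets('my secret phrase');
const nonce = nacl.randomBytes(nacl.secretbox.nonceLength);
const ciphertext = nacl.secretbox(encode('hello DDC'), nonce, masterDek);
```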

Data Encryption Key (DEK)

A user generates a DEK, symmetrically encrypts data with it, and stores the data in the DDC. To avoid storing the DEK on the user side, the SDK also symmetrically encrypts the DEK with the user’s private encryption key and stores it in the DDC.
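Here is a sketch of that wrapping flow using tweetnacl's secretbox; whether the SDK wraps the DEK with a symmetric master key (as shown) or directly with the private encryption key is an implementation detail this sketch approximates.

```typescript
import nacl from 'tweetnacl';

const masterKey = nacl.randomBytes(32); // the user's master secret (see above)
const dek = nacl.randomBytes(32);       // per-data Data Encryption Key

// Encrypt the payload with the DEK.
const dataNonce = nacl.randomBytes(nacl.secretbox.nonceLength);
const encryptedData = nacl.secretbox(new TextEncoder().encode('secret data'), dataNonce, dek);

// Wrap the DEK with the master key so the plaintext DEK never has to be
// stored on the user side; both ciphertexts go to the DDC.
const dekNonce = nacl.randomBytes(nacl.secretbox.nonceLength);
const encryptedDek = nacl.secretbox(dek, dekNonce, masterKey);
```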

DEK Sharing

A user can share a DEK within the same bucket so that another user is able to decrypt the data. The sharer uses asymmetric encryption to encrypt the DEK for the recipient and stores the result in the DDC. For example, suppose Alice needs to share a DEK with Bob.
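A sketch of the Alice-to-Bob share using tweetnacl's asymmetric box primitives; the surrounding DDC storage calls are omitted.

```typescript
import nacl from 'tweetnacl';

const alice = nacl.box.keyPair();
const bob = nacl.box.keyPair();
const dek = nacl.randomBytes(32); // the symmetric key protecting the bucket data

// Alice encrypts the DEK for Bob and stores the result in the DDC.
const nonce = nacl.randomBytes(nacl.box.nonceLength);
const encryptedDek = nacl.box(dek, nonce, bob.publicKey, alice.secretKey);

// Bob fetches it and recovers the DEK with his secret key.
const recoveredDek = nacl.box.open(encryptedDek, nonce, alice.publicKey, bob.secretKey);
```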

Hierarchical DEK

Sometimes a user needs to share a group of DEKs, so the DDC SDK can build a hierarchy that mirrors folders. For example, a user may keep logically distinct data under /photos/albums/trip and /photos/albums/friends; sharing every file separately would be slow, and every new album would have to be shared right after it is added. Instead, the SDK can build a hierarchical DEK for each album, so the user only has to share the /photos/albums DEK.
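One plausible way to derive such a hierarchy, sketched with blake2b keyed hashing: each path segment's DEK is derived from its parent's DEK, so whoever holds the /photos/albums DEK can derive every album DEK below it. The derivation function is an assumption, not the SDK's actual scheme.

```typescript
import { blake2b } from 'blakejs';

const encode = (s: string) => new TextEncoder().encode(s);

// Child DEK = blake2b-256 of the segment name, keyed by the parent DEK.
// Illustrative assumption; the SDK's derivation may differ.
function childDek(parentDek: Uint8Array, segment: string): Uint8Array {
  return blake2b(encode(segment), parentDek, 32);
}

// Share only the /photos/albums DEK; the recipient derives the album DEKs.
const albumsDek = blake2b(encode('dek:/photos/albums'), undefined, 32);
const tripDek = childDek(albumsDek, 'trip');
const friendsDek = childDek(albumsDek, 'friends');
```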

Content Delivery Network (dCDN)

See dCDN for more information.

Cluster Management

Cluster Management covers controlling clusters, adding new providers, automatically certifying new nodes before adding them to a cluster, managing cluster stake and payments for nodes, and providing basic monitoring and alerting.

Cluster performance results in adjustments to cluster tiers, distinguishing less reliable, cheaper clusters from more reliable, more expensive ones. The Cluster Management system allows a cluster manager to switch between cheaper and more expensive clusters for different tasks. It also allows node providers to request payouts and see their balances.

There are two main users to be aware of:

The services provided by DDC Cluster Management can be summarized as follows:

The first version of the Cluster Management UI will be ReactJS-based.

We are currently validating scenarios that utilize smart contracts to manage reward distribution between node providers and various economic models for content consumption. Your thoughts on the topic in Discord are welcome!

Incremental Recovery