DDC | Notion

The Decentralized Data Cloud is a decentralized data protocol built on the Cere Blockchain Network.

Latest Updates

🔥 Deploy your first browser-based game on DDC for a chance to earn $150!

🚀 DDC is available now on Testnet! - https://ddc.cere.network/

📰 The DDC Storage and CDN nodes code repositories are being audited by our security partner Halborn now, with expectation to release publicly soon.

Project Resources

Tools
Documentation
DDC Network Details
Repositories

Releases

Storage Node
DDC SDK Go

Decentralized Data Cloud (DDC)

DDC provides a way to permanently store & retrieve data in a performant, usable, and trustless way.

DDC version 1.0 launched to Mainnet in Q4 2021 along with the debut of Cere’s NFT marketplace solution https://ondavinci.com/.

DDC version 2.0 is currently under development, and will introduce a new duel-layer structure of storage nodes and decentralized Content Delivery Nodes (dCDN) for improved content deliverability.

Benefits of the DDC

The DDC enables you to build Web3 dApps where the user owns the data.

Flexible data storage & management.

Store and retrieve data (objects/files) in an encrypted or unencrypted way in storage clusters
Support for storing large files
Search uploaded data by CID, Tags, and Filenames
Store mutable and immutable data
Familiarity for developers with bucket management similar to AWS S3
Write authentication to buckets

Secure, reliable and GDPR compliant data storing and validation.

Data replication to ensure a high level of availability and redundancy for critical data
Store data in a GDPR compliant way in geofenced EU clusters
Trustless validation to ensure data nodes behave correctly through incentive distribution and slashing

Fast data delivery.

Accelerate data delivery through dCDN clusters
Streaming files to ensure faster loading times
Horizontal Scaling

Built-in file sharing and serverless app hosting allows developers to…

Run serverless apps
Create shareable URLs without the need to run your own web app
Host static web apps on the dCDN cluster, including browser-based games.

Core Features

Storage Node

🖼️ Architecture Diagram

Data Objects

Content uploaded to the DDC is stored as “objects”, or sets of data pieces represented by a root file descriptor mapped to a unique Content Identifier (CID), which further references file chunks as defined by the DDC Data Schema. Data Objects cannot be independent. They must exist within a Data Bucket.

Data Buckets

DDC Buckets provide the ability to save, load, read, and update any type of data on demand and are paid for and managed as a single logical collection called a “Data Bucket”. There can be an arbitrary number of buckets associated with each account, and within each bucket, there can be an arbitrary number of objects.

Another way to put it is that the DDC provides unstructured cloud data storage service for “data objects,” which includes metadata that points or references files stored throughout the distributed network, as described in the File Storage section.

👀 Check the Quickstart Guide to learn how to deploy your first bucket on the DDC Testnet!

File Storage

The content of a file, say movie_4k.mp4 (12Gb) for example, is stored in chunks in the descriptor piece and/or in separate data pieces. The first chunk may be stored inline in the descriptor piece, or not. Additional chunks, if any, may be stored in data pieces that must be uploaded before the descriptor piece.

The data pieces are referenced by their CID in the links field of the descriptor piece. The data pieces are to be found in the same context as the descriptor piece, i.e., in the same Bucket and Cluster, although usually not on the same node.

The details of content chunking are the responsibility of the uploader. The inline chunk should be empty or small to allow fast access to the descriptor. Considering that a piece is the smallest unit that can be retrieved at once, the size and number of chunks should be chosen to optimize between parallelism, smoothness of streaming, memory usage, and overhead per chunk. Here are some possible use-cases:

Small files: inline chunk only; this is the most efficient in space and speed.

Big files: all content is in linked data pieces. The file descriptor is the smallest possible.

Streamable files: access to the start of the content as early as possible. The inline chunk may be used to detect the type, metadata, preview of the content (video thumbnail), or other application-specific filters. Other chunks should be small enough to allow smooth seeking in media players.

Payments

Users must pay for storage by making deposits at regular intervals in $CERE tokens to ensure data remains available as expected. Collateral received from Users is put into a managed escrow contract. Payments are received by Clusters and distributed to service providers according to the relative contribution from each Provider, along with quality of service measures as determined by DAC Validators. ****

Uploaded data can be searched for by CID, Tags, and Filenames and can be combined using logical 'or' operator (e.g. tagA = N or tagB = M). The search operation is distributed and the result is collected from the nodes of the storage cluster where the Bucket is stored (because the Bucket is distributed across all cluster nodes).

Security

The most critical rules that govern the relations between Users, Providers, and Validators are executed on the Cere blockchain. All data is replicated over a number of independent providers. Failing or malicious providers are detected and replaced. The state and history of all nodes is also represented on-chain using cryptography to validate the amount and quality of service provided. All data is identified by cryptographic identifiers and signatures that let both apps and validators verify the integrity and provenance of the data. A system of data encryption and data sharing is provided to protect private data.

Bug Bounties are coming soon to our bounty taskboard in dework, part of the contributor’s program.

Scaling

Consistent Hashing

To efficiently store and retrieve arbitrarily sized and formatted data across a global network, Cere has implemented a variant of Consistent Hashing algorithm into DDC Storage Nodes.

In computer science, consistent hashing is a special kind of hashing technique such that when a hash table is resized, only n/m keys need to be remapped on average where n is the number of keys and m is the number of slots. In contrast, in most traditional hash tables, a change in the number of array slots causes nearly all keys to be remapped because the mapping between the keys and the slots is defined by a modular operation.

Source: Wikipedia

The Consistent Hashing algorithm, with a ring patterned organizational structure as implemented in the DDC, offers many benefits:

Data is distributed by tokens and each node can have many tokens. Tokens represent the uint node position in ring. (Virtual nodes)
Partition’s range from the current token to its’ clockwise neighbours in the ring; thus each partition belongs to the nearest token and contains data.
The primary partition is selected by hashing the data/file and converting to a uint number to determine the position in the ring.

With consistent hashing, the ring is divided into smaller, predefined ranges. Each node has a range assigned (from 1 to N). The start of the range is called a token. This means that each node will have N tokens in its Token Set.

The DDC doesn’t have any fixed size of data chunks, only a maximum size of 100MB/chunk, so data can’t be fully balanced between nodes. To minimize data skew in any given cluster, we also ensure the following rules are followed:

Each node should present not only the total size of stored data, but the partition should also be logically split into sub-partitions and present each sub-partition’s stored data size.
Standard distributed systems have random generation of tokens. The Cluster Manager is responsible for setting each node token in DDC. The Cluster Manager can get necessary information from each node about their sub-partition and calculate the best tokens for the node so data will be balanced correctly.

Partitioning

Additionally, partitioning gives us the ability to distribute a single dataset across the network, and this gives us the following benefits:

Support datasets that are larger than you can fit on one node
Scale throughput (need to allocate partitions on different nodes)