Glossary

Decentralized Data Cloud (DDC)

DDC provides permanent, performant, usable, and trustless data storage and retrieval. DDC version 1.0 launched on Mainnet in Q4 2021 along with the debut of Cere’s NFT marketplace solution (https://ondavinci.com/). DDC version 2.0 is currently under development and will introduce a new dual-layer structure of storage nodes and decentralized Content Delivery Nodes (dCDN) for improved content deliverability.

Data Objects

Data in the DDC is stored as "objects," with each object represented by a unique Content Identifier (CID) that maps to a root file descriptor, referencing file chunks based on the DDC Data Schema. These Data Objects are not standalone; they must reside within a Data Bucket. Dive deeper into the intricacies of the DDC's storage mechanism here.

Data Buckets

A Data Bucket is a unified collection of Data Objects that supports on-demand data operations. Each account can have multiple buckets, each containing numerous data objects, with metadata referencing files across the distributed network as detailed in the File Storage section. For hands-on experience, refer to the Quickstart Guide to deploy your first bucket on the DDC Testnet.

Files

Files are stored in chunks within descriptor pieces and separate data pieces. The initial chunk might be inline in the descriptor, with subsequent chunks in data pieces, referenced by their CID in the links field. These data pieces exist in the same context as the descriptor, typically in the same Bucket and Cluster. Chunking details depend on the uploader, with the inline chunk optimized for quick descriptor access. Various use-cases dictate chunking: small files use inline chunks for efficiency, large files store content in linked data pieces, and streamable files prioritize early content access and smooth playback.
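The descriptor-plus-data-pieces layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `INLINE_LIMIT`, `CHUNK_SIZE`, `Descriptor`, `store_file`, `read_file`, and `toy_cid` are hypothetical names, and the SHA-256 hex digest stands in for a real CID. The actual DDC Data Schema and chunk sizes are not specified here.

```python
import hashlib
from dataclasses import dataclass, field

INLINE_LIMIT = 16  # hypothetical: bytes of the first chunk kept inline in the descriptor
CHUNK_SIZE = 32    # hypothetical: maximum bytes per linked data piece


def toy_cid(data: bytes) -> str:
    """Stand-in identifier: hex SHA-256 of the content (not the real DDC CID format)."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class Descriptor:
    inline: bytes                                    # first chunk, stored in the descriptor itself
    links: list[str] = field(default_factory=list)  # identifiers of the subsequent data pieces


def store_file(content: bytes, pieces: dict[str, bytes]) -> Descriptor:
    """Split content into an inline chunk plus linked data pieces stored in `pieces`."""
    descriptor = Descriptor(inline=content[:INLINE_LIMIT])
    rest = content[INLINE_LIMIT:]
    for start in range(0, len(rest), CHUNK_SIZE):
        chunk = rest[start:start + CHUNK_SIZE]
        cid = toy_cid(chunk)
        pieces[cid] = chunk          # the data piece lives alongside the descriptor
        descriptor.links.append(cid)
    return descriptor


def read_file(descriptor: Descriptor, pieces: dict[str, bytes]) -> bytes:
    """Reassemble the file: inline chunk first, then each linked piece in order."""
    return descriptor.inline + b"".join(pieces[cid] for cid in descriptor.links)
```

In this sketch a small file fits entirely in the inline chunk and has no links, while a larger file spills into linked pieces. A reader can start consuming the inline chunk before fetching any linked piece.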

Deposit Collateral

Users deposit $CERE tokens periodically to maintain data availability. Deposited collateral is managed in an escrow contract. Clusters receive payments, which are then distributed to service providers based on their contribution and service quality, as assessed by DAC Validators.

CID

A Content Identifier (CID) is a unique label used in the Cere DDC content-addressable system to identify a piece of content globally. Unlike a traditional URL, which points to the location of data, a CID points to the content itself, regardless of where it is stored. This ensures that the content can be retrieved as long as it exists somewhere in the network, even if its location changes. The CID is derived from the cryptographic hash of the content, ensuring data integrity and enabling decentralized and distributed data storage and retrieval.
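A minimal Python sketch of content addressing follows. It assumes a plain SHA-256 hex digest as the identifier, and the names `content_id` and `verify` are hypothetical. The real DDC CID encoding is not described in this glossary and may differ.

```python
import hashlib


def content_id(content: bytes) -> str:
    """Derive an identifier from the content alone (assumed here: hex SHA-256)."""
    return hashlib.sha256(content).hexdigest()


def verify(cid: str, content: bytes) -> bool:
    """Check that retrieved bytes really are the content the identifier names."""
    return content_id(content) == cid


# Two hypothetical nodes hold the same content under the same identifier,
# so a client can fetch from either one and still verify integrity.
data = b"hello ddc"
cid = content_id(data)
node_a = {cid: data}
node_b = {cid: data}
```

Because the identifier depends only on the bytes, the same content gets the same identifier on every node. A tampered copy fails `verify` regardless of which node served it.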

Consistent Hashing

Consistent hashing is a hashing technique used in distributed systems. Unlike traditional hashing, where a change in the number of slots or servers can cause a large disruption in key-to-slot mappings, consistent hashing minimizes the number of remappings when the system scales. This is achieved by mapping both keys and servers onto a circular "ring" and assigning each key to the next server found moving clockwise around the ring. As a result, when a server is added or removed, only a fraction of the keys are remapped to different servers, improving the efficiency and scalability of data distribution.
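The ring idea can be illustrated with a short Python sketch. It is a generic textbook version, not Cere's implementation. The class `HashRing`, the helper `ring_position`, and the use of several virtual points per server (`vnodes`, a common refinement not mentioned above) are all assumptions made for the example.

```python
import bisect
import hashlib


def ring_position(label: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(label.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, servers, vnodes: int = 64):
        self._vnodes = vnodes      # virtual points per server (assumption, smooths the load)
        self._servers = set()
        self._positions = []       # sorted ring positions
        self._owners = []          # server owning each position, same order
        for server in servers:
            self.add(server)

    def _rebuild(self) -> None:
        points = sorted(
            (ring_position(f"{server}#{i}"), server)
            for server in self._servers
            for i in range(self._vnodes)
        )
        self._positions = [position for position, _ in points]
        self._owners = [server for _, server in points]

    def add(self, server: str) -> None:
        self._servers.add(server)
        self._rebuild()

    def remove(self, server: str) -> None:
        self._servers.discard(server)
        self._rebuild()

    def lookup(self, key: str) -> str:
        """Return the server at the first ring point clockwise from the key."""
        if not self._positions:
            raise LookupError("ring has no servers")
        index = bisect.bisect_right(self._positions, ring_position(key))
        return self._owners[index % len(self._owners)]  # wrap around the ring
```

When a server is added, a key either keeps its previous owner or moves to the new server. No key is shuffled between the servers that were already present, which is the property that limits remapping.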

Partitioning

Partitioning distributes a single dataset across the network, which brings the following benefits:

Support for datasets larger than a single node can hold
Higher throughput, provided partitions are allocated to different nodes
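As a rough illustration of both benefits, the Python sketch below hashes each key to a partition and spreads the partitions across nodes. The function names, the modulo-based partition choice, and the round-robin placement are assumptions for the example. The DDC's actual partitioning scheme is not described here.

```python
import hashlib
from collections import defaultdict


def partition_of(key: str, num_partitions: int) -> int:
    """Pick a partition for a key from a stable hash of the key."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def assign_partitions(num_partitions: int, nodes: list[str]) -> dict[int, str]:
    """Place partitions on nodes round-robin so each node serves only a subset."""
    return {p: nodes[p % len(nodes)] for p in range(num_partitions)}


def place(records: dict[str, bytes], num_partitions: int, nodes: list[str]) -> dict[str, dict[str, bytes]]:
    """Return, per node, the slice of the dataset that node stores."""
    owners = assign_partitions(num_partitions, nodes)
    per_node: dict[str, dict[str, bytes]] = defaultdict(dict)
    for key, value in records.items():
        per_node[owners[partition_of(key, num_partitions)]][key] = value
    return dict(per_node)
```

Each node ends up holding only part of the dataset, so the total can exceed a single node's capacity. Requests for keys in different partitions can also be served by different nodes in parallel.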

Merkle Tree

Merkle trees ensure data integrity in distributed systems, facilitating quick identification of missing data and efficient data verification without transferring large datasets. In cryptography, a Merkle tree labels each leaf with a cryptographic hash of data, and non-leaf nodes with hashes of their child nodes' labels, enabling secure content verification. Cere's implementation of the Merkle tree allows nodes to check and recover data for specific token ranges, update specific tree sections without rebuilding the entire tree, and offers enhanced performance through increased synchronization points and configurable tree size and depth.
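The general mechanism can be sketched in Python as a plain binary Merkle tree. This is a generic illustration, not Cere's token-range implementation. The names `build_levels`, `merkle_root`, and `differing_leaves`, the one-byte leaf and node prefixes, and the rule of duplicating the last hash on odd-sized levels are assumptions made for the example.

```python
import hashlib


def _hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Return all tree levels: leaf hashes first, the single root hash last."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [_hash(b"\x00" + leaf) for leaf in leaves]  # prefix separates leaves from inner nodes
    levels = [level]
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), 2):
            left = level[i]
            right = level[i + 1] if i + 1 < len(level) else left  # odd level: reuse last hash
            parents.append(_hash(b"\x01" + left + right))
        level = parents
        levels.append(level)
    return levels


def merkle_root(leaves: list[bytes]) -> bytes:
    return build_levels(leaves)[-1][0]


def differing_leaves(a: list[list[bytes]], b: list[list[bytes]]) -> list[int]:
    """Walk both trees from the root, descending only into subtrees whose hashes differ."""
    if len(a[0]) != len(b[0]):
        raise ValueError("this sketch only compares trees with the same number of leaves")
    suspects = [0]  # node indexes to inspect at the current level, starting at the root
    for depth in range(len(a) - 1, 0, -1):
        children = []
        for i in suspects:
            if a[depth][i] != b[depth][i]:
                children.extend(c for c in (2 * i, 2 * i + 1) if c < len(a[depth - 1]))
        suspects = children
    return [i for i in suspects if a[0][i] != b[0][i]]
```

Two replicas first compare root hashes. If the roots match, no data needs to be exchanged. Otherwise the walk narrows the mismatch down to individual leaves while comparing only hashes along the differing paths.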