The explanation is as follows:
Apache CarbonData is an indexed columnar data format developed specifically for big data scenarios where fast analytics and real-time insights are critical.
Deep Integration with Spark
The explanation is as follows:
CarbonData is deeply integrated with Apache Spark: it takes advantage of Spark SQL's query optimization techniques and its code generation capabilities. This makes it possible to query CarbonData files directly with Spark SQL, which gives faster and more efficient queries.
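As a rough illustration of what querying CarbonData through Spark SQL can look like, here is a minimal PySpark sketch. It rests on several assumptions that are not in the text above: that the CarbonData jars are on the Spark classpath, that the extension class is `org.apache.spark.sql.CarbonExtensions`, and that tables are declared with `STORED AS carbondata` (both as described in the CarbonData 2.x documentation, so check them against the version you run). The `sales` table, its columns, and the `run_demo` function are made up for the example.

```python
# Assumed extension class name (CarbonData 2.x); verify for your version.
CARBON_EXTENSIONS = "org.apache.spark.sql.CarbonExtensions"

# Hypothetical table and rows, used only for illustration.
STATEMENTS = [
    "CREATE TABLE IF NOT EXISTS sales (id INT, city STRING, amount DOUBLE) "
    "STORED AS carbondata",
    "INSERT INTO sales VALUES (1, 'Ankara', 10.5), (2, 'Izmir', 20.0)",
]
QUERY = "SELECT city, SUM(amount) AS total FROM sales GROUP BY city"


def run_demo():
    """Create a CarbonData table, load two rows and query it with Spark SQL."""
    # Imported lazily so this module can be loaded without PySpark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("carbondata-sketch")
        .config("spark.sql.extensions", CARBON_EXTENSIONS)
        .getOrCreate()
    )
    for statement in STATEMENTS:
        spark.sql(statement)
    # The query is plain Spark SQL; nothing in it is CarbonData-specific.
    spark.sql(QUERY).show()
    spark.stop()
```

Calling `run_demo()` on a cluster or local session that has CarbonData available should create the table and print the aggregated rows. The point of the sketch is that, once the table exists, the query side is ordinary Spark SQL.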
Multi-Layered Structure
The explanation is as follows:
Apache CarbonData is structured in multiple layers, including the table, segment, block, and page levels. This hierarchical structure allows efficient data retrieval by skipping irrelevant data during query execution.
Table: A table is a collection of segments, and each segment represents a set of data files.
Segment: A segment contains multiple data blocks, where each block can store a significant amount of data.
Block: A block is divided into blocklets. Each blocklet holds a series of column pages, which are organized column-wise.
Page: The page level is where the actual data is stored. The data in these pages is encoded and compressed, making data retrieval efficient.
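To make the idea of skipping irrelevant data concrete, here is a toy Python model of the hierarchy. It is not CarbonData's implementation. It assumes that each level keeps the minimum and maximum of the values beneath it and that a query prunes any subtree whose range cannot contain the value being searched for. The `Page`, `Group`, and `scan` names are invented for this sketch.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Page:
    """Lowest level: holds the actual column values."""
    values: List[int]

    def bounds(self) -> Tuple[int, int]:
        return min(self.values), max(self.values)


@dataclass
class Group:
    """Any level above a page: table, segment, block or blocklet."""
    level: str
    children: List[Union["Group", Page]]

    def bounds(self) -> Tuple[int, int]:
        # A real format would store these statistics in an index;
        # this toy recomputes them on demand to stay short.
        lows, highs = zip(*(child.bounds() for child in self.children))
        return min(lows), max(highs)


def scan(node, target: int, stats: dict) -> List[int]:
    """Return values equal to target, skipping subtrees that cannot match."""
    low, high = node.bounds()
    if not (low <= target <= high):
        stats["skipped"] += 1  # the whole subtree is pruned without reading it
        return []
    if isinstance(node, Page):
        stats["pages_read"] += 1
        return [v for v in node.values if v == target]
    matches: List[int] = []
    for child in node.children:
        matches.extend(scan(child, target, stats))
    return matches


# Two segments, each with one block, one blocklet and two pages.
table = Group("table", [
    Group("segment", [Group("block", [Group("blocklet", [
        Page([1, 2, 3]), Page([4, 5, 6])])])]),
    Group("segment", [Group("block", [Group("blocklet", [
        Page([10, 11, 12]), Page([13, 14, 15])])])]),
])

stats = {"pages_read": 0, "skipped": 0}
result = scan(table, 11, stats)
print(result, stats)
```

In this run the first segment is pruned as a whole and only one of the four pages is read. Pruning at a higher level is cheaper because it removes every block, blocklet, and page underneath in a single check.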