The explanation is as follows
Apache Arrow is a cross-language development framework for in-memory data. It provides a standardized columnar memory format for efficient data sharing and fast analytics. Arrow employs a language-agnostic approach, designed to eliminate the need for data serialization and deserialization, improving the performance and interoperability between complex data processes and systems.
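To make "columnar memory format" concrete, here is a toy model in plain Java (not the Arrow library) of how Arrow lays out a column of strings, such as the VarCharVector used in the example below. All value bytes go into one contiguous data buffer, and an int32 offsets buffer with n + 1 entries marks where each value starts and ends. The class name ToyUtf8Column is hypothetical. The validity (null) bitmap and Arrow's alignment and padding rules are left out to keep the sketch short.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Toy model (plain Java, NOT the Arrow library) of Arrow's variable-size
 * UTF-8 column layout: an int32 offsets array with n + 1 entries plus one
 * contiguous data buffer holding all value bytes back to back.
 */
class ToyUtf8Column {
    final int[] offsets; // value i occupies data[offsets[i] .. offsets[i + 1])
    final byte[] data;   // all UTF-8 bytes, concatenated

    ToyUtf8Column(String... values) {
        offsets = new int[values.length + 1];
        byte[][] encoded = new byte[values.length][];
        for (int i = 0; i < values.length; i++) {
            encoded[i] = values[i].getBytes(StandardCharsets.UTF_8);
            offsets[i + 1] = offsets[i] + encoded[i].length;
        }
        data = new byte[offsets[values.length]];
        for (int i = 0; i < values.length; i++) {
            System.arraycopy(encoded[i], 0, data, offsets[i], encoded[i].length);
        }
    }

    int length() {
        return offsets.length - 1;
    }

    // Reading value i is just a slice of the shared data buffer; nothing is parsed.
    String get(int i) {
        return new String(data, offsets[i], offsets[i + 1] - offsets[i], StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ToyUtf8Column col = new ToyUtf8Column("Apache", "Arrow", "Java");
        System.out.println(Arrays.toString(col.offsets)); // [0, 6, 11, 15]
        System.out.println(col.get(1));                   // Arrow
    }
}
```

Because a reader only needs the offsets to locate value i, any process or language that understands the same layout can use the buffers as they are instead of deserializing them, which is roughly the idea behind the interoperability claim above.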
Apache Arrow vs Apache Parquet
The explanation is as follows
The Apache Arrow format project began in February 2016, focusing on columnar in-memory analytics workloads. Unlike file formats like Parquet or CSV, which specify how data is organized on disk, Arrow focuses on how data is organized in memory.
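Since Arrow describes buffers in RAM rather than bytes in a file, a second toy sketch shows what such an in-memory column looks like for fixed-width data. It is again plain Java with a hypothetical class name (ToyIntColumn), not Arrow itself: an int32 column is modeled as a contiguous values array plus a validity bitmap in which bit i (least-significant-bit order) is 1 when slot i is non-null. Arrow's alignment and padding rules are omitted.

```java
/**
 * Toy model (plain Java, NOT the Arrow library) of an Arrow-style
 * fixed-width int32 column: a contiguous values array plus a validity
 * bitmap where bit i (LSB order within each byte) is 1 if slot i is non-null.
 */
class ToyIntColumn {
    final int[] values;
    final byte[] validity;

    ToyIntColumn(Integer... input) {
        values = new int[input.length];
        validity = new byte[(input.length + 7) / 8]; // one bit per slot, rounded up to whole bytes
        for (int i = 0; i < input.length; i++) {
            if (input[i] != null) {
                values[i] = input[i];
                validity[i / 8] |= (byte) (1 << (i % 8)); // mark slot i as valid
            }
            // null slots keep value 0 and a 0 bit in the bitmap
        }
    }

    boolean isNull(int i) {
        return (validity[i / 8] & (1 << (i % 8))) == 0;
    }

    // An analytic kernel walks only this column's contiguous arrays.
    long sumNonNull() {
        long sum = 0;
        for (int i = 0; i < values.length; i++) {
            if (!isNull(i)) {
                sum += values[i];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        ToyIntColumn col = new ToyIntColumn(10, null, 32);
        System.out.println(col.isNull(1));    // true
        System.out.println(col.sumNonNull()); // 42
    }
}
```

A scan such as sumNonNull reads one column's arrays sequentially, the access pattern columnar analytics relies on; with a row-oriented layout the same scan would have to step over every other field of each record.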
Maven
We include the following lines
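<!-- Allocator memory implementation required at runtime by RootAllocator; arrow-memory-unsafe is an alternative -->
<dependency>
  <groupId>org.apache.arrow</groupId>
  <artifactId>arrow-memory-netty</artifactId>
  <version>6.0.1</version>
</dependency>
<dependency>
  <groupId>org.apache.arrow</groupId>
  <artifactId>arrow-vector</artifactId>
  <version>6.0.1</version>
</dependency>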
Example
For writing, we do the following
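import java.io.FileOutputStream;
import java.nio.charset.StandardCharsets;

import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.VarCharVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.ipc.ArrowFileWriter;

// Set up the allocator and a root holding one VarChar column named "vector";
// the schema is derived from the vector's Field
try (RootAllocator allocator = new RootAllocator(Integer.MAX_VALUE);
     VectorSchemaRoot root = VectorSchemaRoot.of(new VarCharVector("vector", allocator))) {
    VarCharVector vector = (VarCharVector) root.getVector("vector");
    vector.allocateNew();
    // Write data to the vector
    vector.setSafe(0, "Apache".getBytes(StandardCharsets.UTF_8));
    vector.setSafe(1, "Arrow".getBytes(StandardCharsets.UTF_8));
    vector.setSafe(2, "Java".getBytes(StandardCharsets.UTF_8));
    vector.setValueCount(3);
    root.setRowCount(3);
    // Write the root to a file as a single record batch in the Arrow IPC file format
    try (FileOutputStream out = new FileOutputStream("arrow-data.arrow");
         ArrowFileWriter writer = new ArrowFileWriter(root, null, out.getChannel())) {
        writer.start();
        writer.writeBatch();
        writer.end();
    }
}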
For reading, we do the following
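import java.io.FileInputStream;
import java.nio.charset.StandardCharsets;

import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.VarCharVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.ipc.ArrowFileReader;

// Now, let's read the data we just wrote
try (RootAllocator allocator = new RootAllocator(Integer.MAX_VALUE);
     FileInputStream in = new FileInputStream("arrow-data.arrow");
     ArrowFileReader reader = new ArrowFileReader(in.getChannel(), allocator)) {
    // The reader owns this root; each loadNextBatch() call refills it with the next record batch
    VectorSchemaRoot root = reader.getVectorSchemaRoot();
    while (reader.loadNextBatch()) {
        // Get the vector (owned by the root, so it is not closed here)
        VarCharVector vector = (VarCharVector) root.getVector("vector");
        // Iterate over the values in the vector
        for (int i = 0; i < vector.getValueCount(); i++) {
            System.out.println(new String(vector.get(i), StandardCharsets.UTF_8));
        }
    }
}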