Menu
Dremio adds new Apache Iceberg features to its data lakehouse

Dremio adds new Apache Iceberg features to its data lakehouse

The new features include the ability to copy data and roll back changes in Apache Iceberg tables.

Credit: Dreamstime

Dremio is adding new features to its data lakehouse including the ability to copy data into Apache Iceberg tables and roll back changes made to these tables.  

Apache Iceberg is an open-source table format used by Dremio to store analytic data sets.  

In order to copy data into Iceberg tables, enterprises and developers have to use the new “copy into SQL” command, the company said.

“With one command, customers can now copy data from CSV and JSON file formats stored in Amazon S3, Azure Data Lake Storage (ADLS), HDFS, and other supported data sources into Apache Iceberg tables using the columnar Parquet file format for performance,” Dremio said in an announcement Wednesday.

The copy operation is distributed across the entire, underlying lake house engine to load more data quickly, it added.

The company has also introduced a table rollback feature for enterprises, akin to a Windows system restore backup or a Mac Time Machine backup.

The tables can be backed up either to a specific time or a snapshot ID, the company said, adding that developers will have to make use of the “rollback” command to access the feature.

“The rollback feature makes it easy to revert a table back to a previous state with a single command. When rolling back a table, Dremio will create a new Apache Iceberg snapshot from the prior state and use it as the new current table state,” Dremio said.

Optimize command boosts Iceberg performance

In an effort to increase the performance of Iceberg tables, Dremio has introduced the “optimize” command to consolidate and optimize sizes of small files that are created when data manipulation commands such as insert, update, or delete are used.

“Often, customers will have many small files as a result of DML operations, which can impact read and write performance on that table and utilize excess storage,” the company said, adding that the “optimize” command can be used inside Dremio Sonar at regular intervals to maintain performance.

Dremio Sonar is a SQL engine that provides data warehousing capabilities to the company’s lakehouse.

The new features are expected to improve productivity of data engineers and system administrators while bringing utility to these class of users, said Doug Henschen, principal analyst at Constellation Research.

Dremio, which was an early proponent of Apache Iceberg tables in lakehouses, competes with the likes of Ahana and Starburst, both of which introduced support for Iceberg in 2021.

Other vendors such as Snowflake and Cloudera added support for Iceberg in 2022.

Dremio features new database, BI connectors

In addition to the new features, Dremio said that it was launching new connectors for Microsoft PowerBI, Snowflake and IBM Db2.

“Customers using Dremio and PowerBI can now use single sign-on (SSO) to access their Dremio Cloud and Dremio Software engines from PowerBI, simplifying access control and user management across their data architecture,” the company said.

The Snowflake and IBM DB2 connectors will allow enterprises to add Snowflake data warehouses and IBM DB2 databases as data sources for Dremio, it added.

This makes it easy to include data in these systems as part of the Dremio semantic layer, enabling customers to explore this data in their Dremio queries and views.

The launch of these connectors, according to Henschen, brings more plug-and-play options to analytics professionals from Dremio’s stable.


Follow Us

Join the newsletter!

Or

Sign up to gain exclusive access to email subscriptions, event invitations, competitions, giveaways, and much more.

Membership is free, and your security and privacy remain protected. View our privacy policy before signing up.

Error: Please check your email address.

Events

SustainTech

Join key decision-makers within Environmental, Social, and Governance (ESG) that have the power to affect real change and drive sustainable practices. SustainTech will bridge the gap between ambition and tangible action, promoting strategies that attendees can use in their day-to-day operations within their business.

EDGE 2023

EDGE is the leading technology conference for business leaders in Australia and New Zealand, built on the foundations of collaboration, education and advancement.

WIICTA 2023

ARN has celebrated gender diversity and recognised female excellence across the Australian tech channel since first launching WIICTA in 2012, acknowledging the achievements of a talented group of female front runners who have become influential figures across the local industry.

ARN Innovation Awards 2023

Innovation Awards is the market-leading awards program for celebrating ecosystem innovation and excellence across the technology sector in Australia.

Show Comments