Podknife - The View Below The Waterline Of Apache Iceberg And How It Fits In Your Data Lakehouse by Data Engineering Podcast

The View Below The Waterline Of Apache Iceberg And How It Fits In Your Data Lakehouse

Podcast: Data Engineering Podcast
Publisher: Tobias Macey
Media Type: audio
Podknife tags: Data Science, Interview, Technology
Categories Via RSS: Technology
Publication Date: Feb 19, 2023
Episode Duration: 00:55:06

Description

Summary

Cloud data warehouses have unlocked a massive amount of innovation and investment in data applications, but they are still inherently limiting. Because of their complete ownership of your data they constrain the possibilities of what data you can store and how it can be used. Projects like Apache Iceberg provide a viable alternative in the form of data lakehouses that provide the scalability and flexibility of data lakes, combined with the ease of use and performance of data warehouses. Ryan Blue helped create the Iceberg project, and in this episode he rejoins the show to discuss how it has evolved and what he is doing in his new business Tabular to make it even easier to implement and maintain.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Hey there podcast listener, are you tired of dealing with the headache that is the 'Modern Data Stack'? We feel your pain. It's supposed to make building smarter, faster, and more flexible data infrastructures a breeze. It ends up being anything but that. Setting it up, integrating it, maintaining it: it's all kind of a nightmare. And let's not even get started on all the extra tools you have to buy to get it to do its thing. But don't worry, there is a better way. TimeXtender takes a holistic approach to data integration that focuses on agility rather than fragmentation. By bringing all the layers of the data stack together, TimeXtender helps you build data solutions up to 10 times faster and saves you 70-80% on costs. If you're fed up with the 'Modern Data Stack', give TimeXtender a try. Head over to timextender.com/dataengineering where you can do two things: watch us build a data estate in 15 minutes and start for free today.
Your host is Tobias Macey and today I'm interviewing Ryan Blue about the evolution and applications of the Iceberg table format and how he is making it more accessible at Tabular.

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what Iceberg is and its position in the data lake/lakehouse ecosystem?
- Since it is fundamentally a specification, how do you manage compatibility and consistency across implementations?
What are the notable changes in the Iceberg project and its role in the ecosystem since our last conversation in October of 2018?
Around the time that Iceberg was first created at Netflix a number of alternative table formats were also being developed. What are the characteristics of Iceberg that lead teams to adopt it for their lakehouse projects?
- Given the constant evolution of the various table formats it can be difficult to determine an up-to-date comparison of their features, particularly earlier in their development. What are the aspects of this problem space that make it so challenging to establish unbiased and comprehensive comparisons?
For someone who wants to manage their data in Iceberg tables, what does the implementation look like?
- How does that change based on the type of query/processing engine being used?
Once a table has been created, what are the capabilities of Iceberg that help to support ongoing use and maintenance?
What are the most interesting, innovative, or unexpected ways that you have seen Iceberg used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Iceberg/Tabular?
When is Iceberg/Tabular the wrong choice?
What do you have planned for the future of Iceberg/Tabular?

Contact Info

LinkedIn (https://www.linkedin.com/in/rdblue/)
rdblue (https://github.com/rdblue) on GitHub

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast (https://www.themachinelearningpodcast.com) helps you go from idea to production with machine learning.
Visit the site (https://www.dataengineeringpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com (mailto:hosts@dataengineeringpodcast.com) with your story.
To help other people find the show please leave a review on Apple Podcasts (https://podcasts.apple.com/us/podcast/data-engineering-podcast/id1193040557) and tell your friends and co-workers.

Links

Iceberg (https://iceberg.apache.org/)
- Podcast Episode (https://www.dataengineeringpodcast.com/iceberg-with-ryan-blue-episode-52/)
Hadoop (https://hadoop.apache.org/)
Data Lakehouse (https://www.forbes.com/sites/bernardmarr/2022/01/18/what-is-a-data-lakehouse-a-super-simple-explanation-for-anyone/)
ACID == Atomic, Consistent, Isolated, Durable (https://en.wikipedia.org/wiki/ACID)
Apache Hive (https://hive.apache.org/)
Apache Impala (https://impala.apache.org/)
Bodo (https://www.bodo.ai/)
- Podcast Episode (https://www.dataengineeringpodcast.com/bodo-parallel-data-processing-python-episode-223/)
StarRocks (https://www.starrocks.io/)
Dremio (https://www.dremio.com/)
- Podcast Episode (https://www.dataengineeringpodcast.com/dremio-open-data-lakehouse-episode-333/)
DDL == Data Definition Language (https://en.wikipedia.org/wiki/Data_definition_language)
Trino (https://trino.io/)
PrestoDB (https://prestodb.io/)
Apache Hudi (https://hudi.apache.org/)
- Podcast Episode (https://www.dataengineeringpodcast.com/hudi-streaming-data-lake-episode-209/)
dbt (https://www.getdbt.com/)
Apache Flink (https://flink.apache.org/)
TileDB (https://tiledb.com/)
- Podcast Episode (https://www.dataengineeringpodcast.com/tiledb-universal-data-engine-episode-146/)
CDC == Change Data Capture (https://en.wikipedia.org/wiki/Change_data_capture)
Substrait (https://substrait.io/)

The intro and outro music is from The Hug (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/)