Today we’re thrilled to share that IBM has acquired Ahana, the venture-backed SaaS-for-Presto startup, and we want to say more about our belief in open source and why IBM and Ahana are joining forces for the benefit of Presto.
Last month the Computer History Museum in Mountain View, California, reverberated with “all things Presto” at our PrestoCon 2022 conference. Back for the third time, and the first time post-pandemic, PrestoCon was ground zero for training, knowledge sharing, and inspiration about open-source Presto for data analytics and lakehouses, as well as for the vibrant Presto community. This year was special, however, as it was the first-ever in-person PrestoCon event, and I couldn’t have been more thrilled to meet the community, hear how companies are using Presto in production, and learn what’s coming up on the engineering roadmap.
Last month we hosted PrestoCon, a return to in-person events that showcased the community development of Presto. In this blog we’ll detail Rippling’s presentation on their Presto use case, including their architecture, key optimizations, and hard-earned lessons. You can also check out their full presentation here.
We here at the Presto Outreach Committee are absolutely thrilled to be entering the new year of 2023. It's hard to believe that another year has passed, but as we reflect on the past year, we can't help but feel grateful for the amazing growth and progress we've seen in the Presto community in 2022.
Earlier this month we hosted PrestoCon, a fantastic in-person event that showcased the innovation around the Presto project. In this blog we’ll detail Twilio’s presentation on their Presto use case, including their architecture, key optimizations, and lessons learned. You can also check out their full presentation here.
We believe that data analytics should be democratized, which is why we innovate on Presto with state-of-the-art database technology. Trusted governance is important to us, which is why we model our project governance and bylaws after the Linux Foundation.
TO OUR FELLOW DATA ENGINEERS, SOFTWARE DEVELOPERS, AND DATA PLATFORM ENTHUSIASTS:
As the use of data analytics and SQL lakehouses grows, the open-forever Presto distributed SQL query engine has the enduring power to change the world with better data-driven decisions.
We take this moment to reflect on the open source Presto query engine and especially why open source Presto, hosted by the Linux Foundation’s Presto Foundation, is the best choice for those who care about data platforms and state-of-the-art database technology.
The Presto Foundation is thrilled to announce that today Presto has been awarded “2022 Editors Choice for Top 3 Data and AI Open Source Projects to Watch” from BigDATAwire. Past winners are a true who’s who in the data world including Apache Spark (2020), Apache Kafka (2018), MongoDB (2019), Apache Cassandra, ElasticSearch and Redis (2021). This award underscores what the Linux Foundation's Presto Foundation has known for a long time, that PrestoDB continues to be extremely popular, and we have recently dug into the data to find out more.
Co-author: Steven Mih, Board member, Presto Foundation Member: Ahana
The annual PrestoCon is coming back for its 3rd year and it’s going to be better than ever! If you want to learn how to use Presto with confidence and/or network with data engineers, this is the event for you. PrestoCon 2022 will be held in Mountain View, California on December 7th and 8th. The conference features two days of in-depth training sessions and talks led by some of the best minds in the industry. If you want to learn how to use Presto for data analytics and lakehouses, or simply to get the most out of your data infrastructure, register now and get ready for two exciting days of learning and networking!
Apache Parquet modular encryption provides encryption at rest and in transit at a finer granularity. In the big data world, analytic tables are usually very wide, with hundreds of columns, while only a small number of those columns need to be protected. Finer-grained access control is therefore a better fit than coarse-grained control such as table-level access control.
In addition, data access restrictions, retention, and encryption at rest are fundamental security controls. Column encryption with access control on the encryption keys can solve all three problems with one unified solution, as discussed in another blog, One Stone, Three Birds: Finer-Grained Encryption @ Apache Parquet.
Apache Parquet modular encryption was released in Parquet 1.12.0, and Presto's Parquet dependency has been updated to 1.12.1. This enables the Presto repository to incorporate Parquet column encryption.
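To make the idea of column-level protection concrete, here is a minimal sketch of access control attached to per-column keys. It is a toy model, not the Parquet or Presto API: the `TablePolicy` class, the column names, and the key IDs are all hypothetical, and no real encryption is performed. It only shows how binding sensitive columns to keys lets a wide table expose most columns freely while restricting a few.

```python
# Toy model (not the Parquet or Presto API): each sensitive column is bound to
# a key ID, and a reader can decode a column only if it may use that key.
from dataclasses import dataclass, field


@dataclass
class TablePolicy:
    # Hypothetical mapping of column name -> key ID.
    # Columns absent from the mapping are treated as unencrypted.
    column_keys: dict = field(default_factory=dict)

    def readable_columns(self, all_columns, granted_keys):
        """Return the columns a reader holding `granted_keys` can decode."""
        readable = []
        for col in all_columns:
            key_id = self.column_keys.get(col)
            # Unencrypted columns are always readable; encrypted ones need the key.
            if key_id is None or key_id in granted_keys:
                readable.append(col)
        return readable


# Hypothetical wide table in which only three columns are sensitive.
columns = ["trip_id", "city", "fare", "rider_email", "rider_phone"]
policy = TablePolicy(
    column_keys={
        "rider_email": "pii_key",
        "rider_phone": "pii_key",
        "fare": "finance_key",
    }
)

analyst_view = policy.readable_columns(columns, granted_keys={"finance_key"})
public_view = policy.readable_columns(columns, granted_keys=set())
```

In this sketch a reader granted only `finance_key` sees `trip_id`, `city`, and `fare`, while a reader with no keys sees just the two unprotected columns. Table-level control could not express that split without copying the table.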
Data volumes are growing very fast, which creates challenges for query engines like Presto. Presto is a popular interactive query engine because of its scalability, high performance, and smooth integration with Hadoop. As the volume of data grows, Presto needs to read larger chunks of data and load them into memory, which increases I/O, memory usage, and GC time.
Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. It provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk.
Earlier initiatives have sped up how Presto reads Parquet data, but there is still a lot of data to read. Since the release of parquet-mr 1.11.0 (the Java implementation of Parquet), a feature called Page Index has been available to speed up queries by filtering out unnecessary Parquet pages in column chunks.
This article discusses the feature, the status of porting it into Presto, and the benchmark results.
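The page-filtering idea can be illustrated with a small sketch. It assumes the index records a minimum and maximum value for each page of a column chunk, and it skips any page whose range cannot match the filter. This is a toy model rather than parquet-mr or Presto code: `PageStats`, `pages_to_read`, and the sample numbers are hypothetical.

```python
# Toy model of page skipping with per-page min/max statistics.
# Not parquet-mr or Presto code; all names and values here are hypothetical.
from dataclasses import dataclass


@dataclass
class PageStats:
    first_row: int   # index of the first row stored in the page
    row_count: int   # number of rows in the page
    min_value: int   # smallest column value in the page
    max_value: int   # largest column value in the page


def pages_to_read(page_index, lower, upper):
    """Return positions of pages that may hold values in [lower, upper]."""
    selected = []
    for pos, page in enumerate(page_index):
        # A page is skipped only when its value range cannot overlap the filter.
        if page.max_value < lower or page.min_value > upper:
            continue
        selected.append(pos)
    return selected


# One column chunk split into four pages of 1000 rows each.
column_chunk_index = [
    PageStats(first_row=0, row_count=1000, min_value=1, max_value=120),
    PageStats(first_row=1000, row_count=1000, min_value=115, max_value=480),
    PageStats(first_row=2000, row_count=1000, min_value=500, max_value=910),
    PageStats(first_row=3000, row_count=1000, min_value=905, max_value=1300),
]

# Filter equivalent to: WHERE value BETWEEN 450 AND 600
hits = pages_to_read(column_chunk_index, lower=450, upper=600)
rows_scanned = sum(column_chunk_index[pos].row_count for pos in hits)
```

With these sample statistics only the second and third pages overlap the filter range, so half of the rows in the chunk never need to be read or decoded. A selected page may still contain no matching rows; the statistics only rule pages out.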