Introducing the Presto blog

June 28, 2019

Presto is a key piece of data infrastructure at many companies. The community has many ongoing projects for taking it to new levels of performance and functionality plus unique experience and insight into challenges of scale.

We are opening this blog as an informal channel for discussing our work as well as technology trends and issues that affect the big data and data warehouse world at large. Our development continues to take place at github and can thus be followed by everybody. Here we seek to have a channel that is more concise and interesting to a broader readership than github issues and code comments would be.

We have current projects like Aria Presto for doubling CPU efficiency and Presto Unlimited for enabling fault tolerant execution of very large queries. We are running one of the world’s largest data warehouses and thus have a unique perspective on platform technologies, e.g. C++ vs. Java, data analytics usage patterns, integration of machine learning and database, data center infrastructure for supporting these and much more. Some of the big questions we are facing have to do with optimizing infrastructure at scale and designing the future of interoperable file formats and metadata. Today we are running ORC on Presto and Spark and system specific file formats for diverse online systems. We are constantly navigating the strait between universality and specialization and keep looking for ways to generalize while advancing functionality and performance.

The Presto user and developer community involves many of the world’s leading technology players. There is exciting work in progress around Presto at many of these companies. We look forward to tracking these too here. Articles from the Presto world are welcome. Stay tuned for everything Presto.