<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <generator uri="https://jekyllrb.com/" version="4.3.4">Jekyll</generator>
  <link href="https://trino.io/broadcast/feed.xml" rel="self" type="application/atom+xml" />
  <link href="https://trino.io/broadcast/" rel="alternate" type="text/html" />
  <updated>2026-04-08T03:00:47+00:00</updated>
  <id>https://trino.io/broadcast/feed.xml</id>

  <title>Trino Community Broadcast episodes</title>

  <subtitle>Trino is a high performance, distributed SQL query engine for big data.</subtitle>

  
    <entry>
      <title>78: A view with a view with a view</title>
      <link href="https://trino.io/episodes/78.html" rel="alternate" type="text/html" title="78: A view with a view with a view" />
      <published>2026-01-16T00:00:00+00:00</published>
      <updated>2026-01-16T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/78</id>
      <content type="html" xml:base="https://trino.io/episodes/78.html">&lt;h2 id=&quot;host&quot;&gt;Host&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Sr. Principal
DevRel Engineer at &lt;a href=&quot;https://chainguard.dev&quot;&gt;Chainguard&lt;/a&gt;, open source hacker at
&lt;a href=&quot;https://github.com/simpligility&quot;&gt;simpligility&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Senior Developer
Advocate at &lt;a href=&quot;https://www.influxdata.com/&quot;&gt;InfluxData&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest&quot;&gt;Guest&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/robfromboulder/&quot;&gt;Rob Dickinson&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-and-news&quot;&gt;Releases and news&lt;/h2&gt;

&lt;h3 id=&quot;trino-478&quot;&gt;&lt;a href=&quot;/docs/current/release/release-478.html&quot;&gt;Trino 478&lt;/a&gt;&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for multiple plugin directories.&lt;/li&gt;
  &lt;li&gt;Propagate queryId to the Open Policy Agent authorizer.&lt;/li&gt;
  &lt;li&gt;Add support for reading encrypted Parquet files with the Hive connector.&lt;/li&gt;
  &lt;li&gt;Add numerous performance improvements and bug fixes for the Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Update Docker container to use Java 25.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;trino-479&quot;&gt;&lt;a href=&quot;/docs/current/release/release-479.html&quot;&gt;Trino 479&lt;/a&gt;&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Require Java 25 to build and run Trino.&lt;/li&gt;
  &lt;li&gt;Publish processing time for a query in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FINISHING&lt;/code&gt; state to event
listeners.&lt;/li&gt;
  &lt;li&gt;Deprecate the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXPLAIN&lt;/code&gt; types &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LOGICAL&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DISTRIBUTED&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Add an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;extraHeaders&lt;/code&gt; option to the JDBC driver and the CLI to
support sending arbitrary HTTP headers.&lt;/li&gt;
  &lt;li&gt;Add &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;APPLICATION_DEFAULT&lt;/code&gt; authentication type for GCS.&lt;/li&gt;
  &lt;li&gt;Remove support for unauthenticated access when GCS authentication type is set
to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SERVICE_ACCOUNT&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Add support for setting and dropping column defaults via &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER TABLE ...
ALTER COLUMN&lt;/code&gt; to the memory connector.&lt;/li&gt;
&lt;/ul&gt;
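
&lt;p&gt;The memory connector change adds the standard column default syntax; as a
rough sketch, with made-up table and column names:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;-- set a default value used for new rows
ALTER TABLE memory.default.events ALTER COLUMN priority SET DEFAULT 3;

-- remove the default again
ALTER TABLE memory.default.events ALTER COLUMN priority DROP DEFAULT;
&lt;/code&gt;&lt;/pre&gt;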

&lt;p&gt;View &lt;a href=&quot;https://www.youtube.com/watch?v=7clvlAxGFOI&amp;amp;t=6s&amp;amp;pp=ygUSbWFuZnJlZCBtZW50b3JzIDEw&quot;&gt;Manfred mentors 10&lt;/a&gt; for a more detailed discussion.&lt;/p&gt;

&lt;p&gt;As always, numerous performance improvements, bug fixes, and other features were
added as well.&lt;/p&gt;

&lt;h3 id=&quot;other-releases-and-news&quot;&gt;Other releases and news&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Trino Contributor Call minutes are available:
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-22-oct-2025&quot;&gt;October 2025&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-26-nov-2025&quot;&gt;November 2025&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino-query-ui&quot;&gt;Trino query UI&lt;/a&gt;
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.npmjs.com/package/trino-query-ui&quot;&gt;v0.1.1 successfully released&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;Further releases are blocked by an npm process change and the work needed to adapt to it&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;OpenText and Vertica connector
    &lt;ul&gt;
      &lt;li&gt;OpenText is looking for expressions of interest from users - contact Manfred
or comment on the &lt;a href=&quot;https://github.com/trinodb/trino/pull/26904&quot;&gt;PR for potential
removal&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;Working on a collaboration with the Trino project to set up a test environment&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;PowerBI connector for Trino
    &lt;ul&gt;
      &lt;li&gt;Manfred working with Microsoft and others to figure out future plans&lt;/li&gt;
      &lt;li&gt;Microsoft is looking for &lt;a href=&quot;https://community.fabric.microsoft.com/t5/Fabric-Ideas/Trino-connector/idi-p/4849124&quot;&gt;your votes for a Trino Fabric
connector&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Trino 480 and Trino Gateway 17 are hopefully coming soon&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/playlist?list=PLHdo8mJLIMWALFrGgA6-wWcWgyZmjAex-&quot;&gt;Manfred
mentors&lt;/a&gt;
videos about various Trino topics are now up to episode 10&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;introducing-rob&quot;&gt;Introducing Rob&lt;/h2&gt;

&lt;p&gt;Rob tells us about his history with Trino, software engineering, and management.&lt;/p&gt;

&lt;h2 id=&quot;a-view-with-a-view-with-a-view&quot;&gt;A view with a view with a view&lt;/h2&gt;

&lt;p&gt;We recap Rob’s past presentation and concepts from Trino Summit 2024 about views
and hierarchies of views. Then we move on to discuss all his recent development
and work. These include the
&lt;a href=&quot;https://github.com/robfromboulder/virtual-view-manifesto&quot;&gt;virtual-view-manifesto&lt;/a&gt;
and the &lt;a href=&quot;https://github.com/robfromboulder/viewmapper&quot;&gt;viewmapper&lt;/a&gt; and
&lt;a href=&quot;https://github.com/robfromboulder/viewzoo&quot;&gt;viewzoo&lt;/a&gt; projects.&lt;/p&gt;

&lt;p&gt;We also chat about Rob’s journey with AI tooling.&lt;/p&gt;

&lt;p&gt;A comparison of how application code accesses database storage with three
different approaches: an ORM layer, a microservice and API layer, and a query
engine and view layer:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/episode/tcb78_virtual_view_comparison.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;A detailed topology of an application taking advantage of virtual view
hierarchies:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/episode/tcb78_virtual_view_topology.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;A concrete example of a view hierarchy for events – two swappable layers, one
for mapping to physical databases, and one for calculating event priority:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/episode/tcb78_virtual_view_example.png&quot; /&gt;&lt;/p&gt;
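
&lt;p&gt;As a rough SQL sketch of such a two-layer hierarchy (all catalog, schema,
and column names here are invented for illustration):&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;-- mapping layer: hides the physical storage location
CREATE VIEW example.views.events_mapped AS
SELECT event_id, event_type, created_at
FROM postgresql.public.events;

-- logic layer: a view on the mapping view that calculates event priority
CREATE VIEW example.views.events_prioritized AS
SELECT event_id, event_type, created_at,
  CASE WHEN event_type = 'error' THEN 1 ELSE 3 END AS priority
FROM example.views.events_mapped;
&lt;/code&gt;&lt;/pre&gt;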

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/robfromboulder/virtual-view-manifesto&quot;&gt;virtual-view-manifesto&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/robfromboulder/viewmapper&quot;&gt;viewmapper&lt;/a&gt; for view storage&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/robfromboulder/viewzoo&quot;&gt;viewzoo&lt;/a&gt; for view visualization&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;28 Jan 2026 - &lt;a href=&quot;/community.html#events&quot;&gt;Trino Contributor Call&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;7 Feb 2026 - &lt;a href=&quot;https://www.meetup.com/trino-apac/events/312457635/&quot;&gt;Trino meetup in Bangalore&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Looking for guests and topics for Trino Community Broadcast 79 and beyond&lt;/li&gt;
&lt;/ul&gt;</content>

      

      <summary>Host</summary>

      
      
    </entry>
  
    <entry>
      <title>77: One tool to proxy them all</title>
      <link href="https://trino.io/episodes/77.html" rel="alternate" type="text/html" title="77: One tool to proxy them all" />
      <published>2025-10-29T00:00:00+00:00</published>
      <updated>2025-10-29T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/77</id>
      <content type="html" xml:base="https://trino.io/episodes/77.html">&lt;h2 id=&quot;host&quot;&gt;Host&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Sr. Principal
DevRel Engineer at &lt;a href=&quot;https://chainguard.dev&quot;&gt;Chainguard&lt;/a&gt;, open source hacker at
&lt;a href=&quot;https://github.com/simpligility&quot;&gt;simpligility&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest&quot;&gt;Guest&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/jordanzimmerman/&quot;&gt;Jordan Zimmerman&lt;/a&gt;, Senior
Staff Engineer at &lt;a href=&quot;https://www.starburst.io/&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/pablo-arteaga-20b547101/&quot;&gt;Pablo Arteaga&lt;/a&gt;,
Software Engineer at
&lt;a href=&quot;https://www.bloomberg.com/company/values/tech-at-bloomberg/&quot;&gt;Bloomberg&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-and-news&quot;&gt;Releases and news&lt;/h2&gt;

&lt;p&gt;Trino 478 is in the final stages of the release process. We will talk about
the details in the next episode.&lt;/p&gt;

&lt;h3 id=&quot;other-releases-and-news&quot;&gt;Other releases and news&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-22-oct-2025&quot;&gt;October contributor call recap and
recording&lt;/a&gt;
is available.&lt;/li&gt;
  &lt;li&gt;The new video tutorial series
&lt;a href=&quot;https://www.youtube.com/playlist?list=PLHdo8mJLIMWALFrGgA6-wWcWgyZmjAex-&quot;&gt;Manfred
mentors&lt;/a&gt;, about working on Trino and other open source projects,
is live now and looking for &lt;a href=&quot;https://github.com/sponsors/mosabua&quot;&gt;sponsors&lt;/a&gt;.
Details about the tasks are available in the &lt;a href=&quot;https://github.com/simpligility/contributions&quot;&gt;contribution tracker
project&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;introducing-jordan-and-pablo&quot;&gt;Introducing Jordan and Pablo&lt;/h2&gt;

&lt;p&gt;Manfred chats with Pablo and Jordan about their involvement in the Trino
community. We end up chatting a bunch about the Airlift framework that is a
foundation for Trino since Jordan has been involved in that project for a long
time. Pablo has been involved in Trino itself and worked on the OPA plugin and
the Trino Gateway, among other things.&lt;/p&gt;

&lt;h2 id=&quot;aws-proxy&quot;&gt;aws-proxy&lt;/h2&gt;

&lt;p&gt;The AWS Proxy is an open-source Java toolkit and library, not a standalone
application, designed to act as a transparent proxy for AWS Simple Storage
Service (S3) compatible object storage protocols.&lt;/p&gt;

&lt;p&gt;It was created by developers from Starburst, Bloomberg and other organizations
in the Trino community to address the need for enhanced governance and security
with tools like Apache Spark that lack security controls. It also supports
direct data access to S3 or S3-compatible systems, like MinIO or Dell ECS.&lt;/p&gt;

&lt;h3 id=&quot;key-functionality-and-use-cases&quot;&gt;Key functionality and use cases&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Security and governance layer&lt;/strong&gt;: The primary goal is to prevent client
applications from bypassing governance systems by accessing S3 directly. It
ensures all data access is channeled through the proxy, where custom business
logic can be applied.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Signature handling:&lt;/strong&gt; It handles the complex AWS Signature Version 4 (SIGv4)
protocol used for authenticating requests, which was the most challenging part
of its development.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Emulated credentials&lt;/strong&gt;: Clients are configured to use fake, worthless
credentials that are only recognized by the proxy. The proxy then validates
the user’s identity and request against security policies (like OPA), signs
the request with the real, secure AWS keys (kept safe behind the firewall),
and forwards it to the real S3 store.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Extensibility&lt;/strong&gt;: It’s built on the Airlift framework and uses a simple
Service Provider Interface (SPI) plugin mechanism. This allows users to add
custom logic for authorization, object storage abstraction from buckets to tables,
redirection, and other use cases.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In essence, it takes standard S3 requests from data tools and mediates them,
applying security, control, and abstraction before forwarding them to the actual
data lake storage.&lt;/p&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/assets/episode/tcb77-aws-proxy.pdf&quot;&gt;Presentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/aws-proxy&quot;&gt;aws-proxy&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://github.com/Randgalt/record-builder&quot;&gt;Jordan’s record-builder open source project&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Looking for guests and topics for Trino Community Broadcast 78&lt;/li&gt;
  &lt;li&gt;26 November 2025 - &lt;a href=&quot;/community.html#events&quot;&gt;Trino Contributor Call&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content>

      

      <summary>Host</summary>

      
      
    </entry>
  
    <entry>
      <title>76: Triple platform treat</title>
      <link href="https://trino.io/episodes/76.html" rel="alternate" type="text/html" title="76: Triple platform treat" />
      <published>2025-09-26T00:00:00+00:00</published>
      <updated>2025-09-26T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/76</id>
      <content type="html" xml:base="https://trino.io/episodes/76.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Sr. Principal
DevRel Engineer at &lt;a href=&quot;https://chainguard.dev&quot;&gt;Chainguard&lt;/a&gt;, open source hacker at
&lt;a href=&quot;https://github.com/simpligility&quot;&gt;simpligility&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest&quot;&gt;Guest&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/jo-perez-data/&quot;&gt;Jo Perez&lt;/a&gt;, Founding Solutions
Engineer at &lt;a href=&quot;https://www.getcollate.io/&quot;&gt;Collate&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/shawn-gordon-37b9916/&quot;&gt;Shawn Gordon&lt;/a&gt;, Sr.
Developer Advocate at &lt;a href=&quot;https://www.getcollate.io/&quot;&gt;Collate&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-and-news&quot;&gt;Releases and news&lt;/h2&gt;

&lt;p&gt;We finally shipped a huge new release:&lt;/p&gt;

&lt;h3 id=&quot;trino-477&quot;&gt;&lt;a href=&quot;/docs/current/release/release-477.html&quot;&gt;Trino 477&lt;/a&gt;&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Add Lakehouse connector.&lt;/li&gt;
  &lt;li&gt;Add SQL language features including &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER MATERIALIZED VIEW ... SET
AUTHORIZATION&lt;/code&gt;, default column values, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER VIEW ... REFRESH&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Add new SQL functions like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cosine_distance()&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;to_geojson_geometry()&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Add lots of new features to the preview UI.&lt;/li&gt;
&lt;/ul&gt;
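
&lt;p&gt;As a rough sketch of the new statements, with invented view and role names:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;-- transfer ownership of a materialized view
ALTER MATERIALIZED VIEW example.reports.daily_sales SET AUTHORIZATION analyst;

-- refresh a view to pick up changes in the underlying tables
ALTER VIEW example.reports.current_orders REFRESH;
&lt;/code&gt;&lt;/pre&gt;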

&lt;p&gt;There are too many connector improvements to list them all. Check out the
release notes. Also inspect the changes on the SPI since there are quite a few.&lt;/p&gt;

&lt;p&gt;Importantly, this release also includes some breaking changes.&lt;/p&gt;

&lt;p&gt;As always, numerous performance improvements, bug fixes, and other features were
added as well.&lt;/p&gt;

&lt;p&gt;And before Trino 477 we also shipped Trino Gateway:&lt;/p&gt;

&lt;h3 id=&quot;trino-gateway-16&quot;&gt;&lt;a href=&quot;https://trinodb.github.io/trino-gateway/release-notes/#16&quot;&gt;Trino Gateway 16&lt;/a&gt;&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Add numerous UI improvements and fixes.&lt;/li&gt;
  &lt;li&gt;Require Java 24 and PostgreSQL 17 or higher.&lt;/li&gt;
  &lt;li&gt;Allow default routing group configuration.&lt;/li&gt;
  &lt;li&gt;Improve error propagation with external routing service.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;other-releases-and-news&quot;&gt;Other releases and news&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/charts/tags&quot;&gt;trino-1.41.0 and trino-gateway-1.16.0 Helm charts&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino-python-client/releases/tag/0.336.0&quot;&gt;trino-python-client 0.336.0&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-27-jun-2025&quot;&gt;July contributor call recap and recording&lt;/a&gt; is available.&lt;/li&gt;
  &lt;li&gt;The September contributor call recap and recording from Wednesday is in the works.&lt;/li&gt;
  &lt;li&gt;Java 25 shipped and adoption in Trino is on the way.&lt;/li&gt;
  &lt;li&gt;The new &lt;a href=&quot;https://github.com/trinodb/trino-odbc&quot;&gt;trino-odbc&lt;/a&gt; project was
contributed by &lt;a href=&quot;https://github.com/rileymcdowell&quot;&gt;Riley McDowell&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/dprophet&quot;&gt;Erik Anderson&lt;/a&gt; is stepping up as subproject
maintainer for the ODBC driver.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/vagaerg&quot;&gt;Pablo Arteaga&lt;/a&gt; will lead the new efforts for
better OPA tooling and support in the &lt;a href=&quot;https://github.com/trinodb/trino-opa-tools&quot;&gt;trino-opa-tools
repository&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;We send our thanks to &lt;a href=&quot;https://github.com/mosiac1&quot;&gt;Cristian Osiac&lt;/a&gt; for his
contributions as subproject maintainer for
&lt;a href=&quot;https://github.com/trinodb/aws-proxy&quot;&gt;aws-proxy&lt;/a&gt;. He is unfortunately
stepping down from this work.&lt;/li&gt;
  &lt;li&gt;Trino recently overtook the old Presto in the &lt;a href=&quot;https://db-engines.com/en/ranking&quot;&gt;DB-Engines
ranking&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;introducing-jo-and-shawn&quot;&gt;Introducing Jo and Shawn&lt;/h2&gt;

&lt;p&gt;We chat with Jo and Shawn about their background in the big data and data lake
community and beyond.&lt;/p&gt;

&lt;h2 id=&quot;collate&quot;&gt;Collate&lt;/h2&gt;

&lt;p&gt;We talk about the &lt;a href=&quot;https://open-metadata.org/&quot;&gt;OpenMetadata open source project&lt;/a&gt;
as a unified platform for data discovery, observability, and governance, with
80+ data connectors and a collaborative interface.&lt;/p&gt;

&lt;p&gt;Jo and Shawn teach us how OpenMetadata can help build and manage high-quality
data assets at scale, with case studies, documentation, and community resources,
and we dive into how Collate offers a platform around OpenMetadata and more.&lt;/p&gt;

&lt;h2 id=&quot;triple-platform-treat&quot;&gt;Triple platform treat&lt;/h2&gt;

&lt;p&gt;Building a modern data platform isn’t just about picking tools—it’s about
creating a unified ecosystem where performance, governance, and trust work
seamlessly together. See how the power trio of Trino, Collate, and Apache Ranger
transforms your data operations:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino: Lightning-fast analytics at scale. Query across any data source, any
format, anywhere—without the complexity of data movement or vendor lock-in.&lt;/li&gt;
  &lt;li&gt;Collate: Intelligent data trust and discovery. AI-powered profiling, automated
quality testing, and smart alerting that keeps your data reliable and
discoverable.&lt;/li&gt;
  &lt;li&gt;Apache Ranger: Enterprise-grade security and governance. Fine-grained access
controls, policy management, and audit trails that keep your data secure and
compliant.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The integration advantage: Watch these three platforms work together to deliver
what every data team needs—fast queries, trusted data, and bulletproof
security—all in one cohesive stack.&lt;/p&gt;

&lt;p&gt;Jo and Shawn tell us more about “Trino + Collate + Apache Ranger = Data Platform
Excellence”, talk about the components and value provided by each of them, and
dive in with a demo, while Manfred and Cole ask more questions to dive deeper.&lt;/p&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/assets/episode/tcb76-collate.pdf&quot;&gt;Presentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://open-metadata.org/&quot;&gt;OpenMetadata&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.getcollate.io/&quot;&gt;Collate&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://docs.getcollate.io/connectors/database/trino&quot;&gt;Collate Trino connector&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://youtu.be/x4BvgSMitL0&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Apache Ranger sink for
reverse metadata with Collate&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Trino Community Broadcast 77: One tool to proxy them all (aws-proxy) planned
for October&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let us know if you want to be a guest in a future broadcast.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>75: Your app sees clearly into Trino</title>
      <link href="https://trino.io/episodes/75.html" rel="alternate" type="text/html" title="75: Your app sees clearly into Trino" />
      <published>2025-07-05T00:00:00+00:00</published>
      <updated>2025-07-05T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/75</id>
      <content type="html" xml:base="https://trino.io/episodes/75.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Dev Rel Engineer
at &lt;a href=&quot;https://chainguard.dev&quot;&gt;Chainguard&lt;/a&gt;, open source hacker at 
&lt;a href=&quot;https://github.com/simpligility&quot;&gt;simpligility&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://www.firebolt.io/&quot;&gt;Firebolt&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest&quot;&gt;Guest&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/trevor-denning/&quot;&gt;Trevor Denning&lt;/a&gt;, Solutions
Engineer at &lt;a href=&quot;https://insightsoftware.com/&quot;&gt;insightsoftware&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases&quot;&gt;Releases&lt;/h2&gt;

&lt;p&gt;What’s going on with our releases?&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Summer slump&lt;/li&gt;
  &lt;li&gt;Reduced maintainer work&lt;/li&gt;
  &lt;li&gt;Necessary migration for Maven Central as release blocker&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Other announcements:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-27-jun-2025&quot;&gt;June contributor call recap and recording&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/foundation.html&quot;&gt;Trino Software Foundation&lt;/a&gt; and
&lt;a href=&quot;/sponsor.html&quot;&gt;documentation for supporting the project&lt;/a&gt; on
the website.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;introducing-trevor&quot;&gt;Introducing Trevor&lt;/h2&gt;

&lt;p&gt;Trevor has been developing software for over 20 years and has deep knowledge of
ODBC and JDBC drivers for databases. He tells us more about his experience and
how he came to learn about Trino.&lt;/p&gt;

&lt;h2 id=&quot;more-about-insightsoftware&quot;&gt;More about insightsoftware&lt;/h2&gt;

&lt;p&gt;We untangle the long history of Simba, Logi Symphony, and insightsoftware with
the Trino project up to the current status, before we dive into the technical
details.&lt;/p&gt;

&lt;h2 id=&quot;odbc-and-jdbc&quot;&gt;ODBC and JDBC&lt;/h2&gt;

&lt;p&gt;After talking a bit about Trino, Iceberg, data lakes and related topics, we get
into the details about Simba Trino data connectivity with the ODBC and JDBC
drivers.&lt;/p&gt;

&lt;h2 id=&quot;demo&quot;&gt;Demo&lt;/h2&gt;

&lt;p&gt;Trevor shows us how you can use the ODBC driver to query Trino catalogs from
Microsoft Excel, which is arguably the most widely used reporting and analytics
tool, despite really being a spreadsheet application. After that demo he moves
on to some business intelligence analytics with PowerBI.&lt;/p&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/assets/episode/tcb75-insightsoftware.pdf&quot;&gt;Presentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://insightsoftware.com/drivers/trino-odbc-jdbc/&quot;&gt;Simba Trino ODBC &amp;amp; JDBC Drivers&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://documentation.insightsoftware.com/simba-home-olh/content/homepage/trino.htm&quot;&gt;Simba Trino Driver Documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/ecosystem/client-application.html#logi-symphony&quot;&gt;Logi Symphony&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://insightsoftware.com/resources/scaling-bi-with-trino-and-apache-iceberg/&quot;&gt;Video: Scaling BI with Trino and Apache Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://insightsoftware.com/blog/unlocking-trinos-full-potential-with-simba-drivers-for-bi-etl/&quot;&gt;Blog post: Unlocking Trino’s Full Potential With Simba Drivers for BI &amp;amp; ETL&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://insightsoftware.com/blog/enhance-trino-performance-with-simbas-powerful-connectivity/&quot;&gt;Blog post: Enhance Trino Performance With Simba’s Powerful Connectivity&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;We give a quick update on where to see Cole or Manfred next, and talk about
upcoming Trino events:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Meet Manfred at the &lt;a href=&quot;https://www.chainguard.dev/&quot;&gt;Chainguard&lt;/a&gt; booth at the Black Hat conference in Las Vegas&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings&quot;&gt;Trino Contributor
Call&lt;/a&gt; planned for
the 23rd of July&lt;/li&gt;
  &lt;li&gt;Trino Community Broadcast: One tool to proxy them all (aws-proxy)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let us know if you want to be a guest in a future broadcast.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>74: Insights from a Norse god</title>
      <link href="https://trino.io/episodes/74.html" rel="alternate" type="text/html" title="74: Insights from a Norse god" />
      <published>2025-06-06T00:00:00+00:00</published>
      <updated>2025-06-06T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/74</id>
      <content type="html" xml:base="https://trino.io/episodes/74.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Dev Rel Engineer
at &lt;a href=&quot;https://chainguard.dev&quot;&gt;Chainguard&lt;/a&gt;
&lt;a href=&quot;https://x.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://www.firebolt.io/&quot;&gt;Firebolt&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest&quot;&gt;Guest&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/jeschkies/&quot;&gt;Karsten Jeschkies&lt;/a&gt; from &lt;a href=&quot;https://grafana.com/&quot;&gt;Grafana
Labs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases&quot;&gt;Releases&lt;/h2&gt;

&lt;p&gt;Following are some highlights of the recent releases:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-475.html&quot;&gt;Trino 475&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CORRESPONDING&lt;/code&gt; clause in set operations.&lt;/li&gt;
  &lt;li&gt;Add support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AUTO&lt;/code&gt; grouping set that includes all non-aggregated
columns in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; clause.&lt;/li&gt;
  &lt;li&gt;Allow cross-region data retrieval when using the S3 native filesystem.&lt;/li&gt;
  &lt;li&gt;Add support for all storage classes when using the S3 native filesystem for
writes.&lt;/li&gt;
  &lt;li&gt;Numerous improvements on Iceberg, Hive, and Delta Lake connectors.&lt;/li&gt;
  &lt;li&gt;SPI - Remove the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LazyBlock&lt;/code&gt; class.&lt;/li&gt;
&lt;/ul&gt;
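
&lt;p&gt;The first two items can be sketched as follows, with invented table and
column names:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;-- match set operation columns by name instead of position
SELECT name, price FROM catalog_a.shop.items
UNION CORRESPONDING
SELECT price, name FROM catalog_b.shop.items;

-- group by all non-aggregated columns in the SELECT clause
SELECT region, product, sum(sales) AS total
FROM example.shop.orders
GROUP BY AUTO;
&lt;/code&gt;&lt;/pre&gt;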

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-476.html&quot;&gt;Trino 476&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Another big release with lots of changes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Require JDK 24 as runtime.&lt;/li&gt;
  &lt;li&gt;Add support for comparing values of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;geometry&lt;/code&gt; type.&lt;/li&gt;
  &lt;li&gt;Remove Example HTTP connector from binaries.&lt;/li&gt;
  &lt;li&gt;New required JVM config for BigQuery and Snowflake connectors.&lt;/li&gt;
  &lt;li&gt;Fix regression with graceful shutdown from Trino 474.&lt;/li&gt;
  &lt;li&gt;Improve performance of selective joins for federated queries for nearly all
connectors.&lt;/li&gt;
  &lt;li&gt;Add columns to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$all_manifests&lt;/code&gt; metadata tables for Iceberg tables.&lt;/li&gt;
  &lt;li&gt;Add support for user-assigned managed identity authentication for AzureFS for
object storage connectors.&lt;/li&gt;
  &lt;li&gt;Add support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FOR TIMESTAMP AS OF&lt;/code&gt; clause in Delta Lake connector.&lt;/li&gt;
&lt;/ul&gt;
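
&lt;p&gt;A quick sketch of the new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FOR TIMESTAMP AS OF&lt;/code&gt; clause for Delta Lake
time travel - catalog, schema, and table names are hypothetical:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SELECT *
FROM delta.sales.orders
  FOR TIMESTAMP AS OF TIMESTAMP '2025-06-01 00:00:00 UTC';
&lt;/code&gt;&lt;/pre&gt;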

&lt;p&gt;As always, numerous performance improvements, bug fixes, and other features were
added as well.&lt;/p&gt;

&lt;p&gt;Other releases and announcements:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino Gateway 16 is still delayed, but Trino Gateway Helm chart 1.15.2 is available&lt;/li&gt;
  &lt;li&gt;Trino Helm chart with 475 -&amp;gt; 1.39.1&lt;/li&gt;
  &lt;li&gt;Trino Python client &lt;a href=&quot;https://github.com/trinodb/trino-python-client/releases/tag/0.334.0&quot;&gt;0.334.0&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;introducing-karsten-and-grafana-labs&quot;&gt;Introducing Karsten and Grafana Labs&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/jeschkies/&quot;&gt;Karsten Jeschkies&lt;/a&gt; is an experienced software
engineer:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;2013 - 2016 Engineer at the Core Machine Learning team at Amazon&lt;/li&gt;
  &lt;li&gt;2016 - 2020 Mesosphere and D2IQ, maintainer of Marathon, a container
orchestrator for Mesos&lt;/li&gt;
  &lt;li&gt;2020 - now Maintainer of Loki for two years and now Cloud Provider
observability engineer at Grafana Labs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://grafana.com/&quot;&gt;Grafana Labs&lt;/a&gt; is the home of the well-known Grafana for
visualizations and dashboards, as well as other powerful products such as Grafana Tempo,
Grafana Mimir, and Grafana Loki. Grafana Labs is also involved in well-known projects
such as Prometheus and OpenTelemetry.&lt;/p&gt;

&lt;h2 id=&quot;log-management-with-loki&quot;&gt;Log management with Loki&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://grafana.com/oss/loki/&quot;&gt;Loki&lt;/a&gt; is a horizontally-scalable,
highly-available, multi-tenant log aggregation system inspired by Prometheus. It
helps you to drill into petabytes of logging data.&lt;/p&gt;

&lt;h2 id=&quot;analytics-with-trino&quot;&gt;Analytics with Trino&lt;/h2&gt;

&lt;p&gt;Karsten tells us about his motivation to create a Trino connector, how the two
tools work together, what features are available, and what his plans are for the
future.&lt;/p&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/assets/episode/tcb74-loki-connector.pdf&quot;&gt;Presentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/jeschkies/loki-trino-demo&quot;&gt;Demo source code&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://grafana.com/oss/loki/&quot;&gt;Grafana Loki website&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/grafana/loki&quot;&gt;Loki source code repo&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/loki.html&quot;&gt;Loki connector documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Quick update on where to see Cole or Manfred next, and then join us for the
upcoming Trino events:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino Contributor Call - May skipped, June edition to be determined&lt;/li&gt;
  &lt;li&gt;Trino Community Broadcast: Visualizing with Logi Symphony and ODBC&lt;/li&gt;
  &lt;li&gt;Trino Community Broadcast: One tool to proxy them all (aws-proxy)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let us know if you want to be a guest in a future broadcast.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>73: Wrapping Trino packages with a bow</title>
      <link href="https://trino.io/episodes/73.html" rel="alternate" type="text/html" title="73: Wrapping Trino packages with a bow" />
      <published>2025-04-09T00:00:00+00:00</published>
      <updated>2025-04-09T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/73</id>
      <content type="html" xml:base="https://trino.io/episodes/73.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Dev Rel Engineer
at &lt;a href=&quot;https://chainguard.dev&quot;&gt;Chainguard&lt;/a&gt;
&lt;a href=&quot;https://x.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://www.firebolt.io/&quot;&gt;Firebolt&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases&quot;&gt;Releases&lt;/h2&gt;

&lt;p&gt;Following are some highlights of the recent releases:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-473.html&quot;&gt;Trino 473&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for array literals.&lt;/li&gt;
  &lt;li&gt;Add LDAP-based group provider.&lt;/li&gt;
  &lt;li&gt;Remove the deprecated glue-v1 metastore type.&lt;/li&gt;
  &lt;li&gt;Remove the deprecated Databricks Unity catalog integration.&lt;/li&gt;
  &lt;li&gt;Remove the Kudu connector.&lt;/li&gt;
  &lt;li&gt;Remove the Phoenix connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But don’t use 473 since there were &lt;a href=&quot;https://github.com/trinodb/trino/issues/25381&quot;&gt;some breaking changes&lt;/a&gt;, fixed in…&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-474.html&quot;&gt;Trino 474&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Fix a correctness bug in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DISTINCT&lt;/code&gt; queries with a large number
of unique groups.&lt;/li&gt;
  &lt;li&gt;Add &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;originalUser&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;authenticatedUser&lt;/code&gt; as resource group selectors.&lt;/li&gt;
  &lt;li&gt;Use JDK 24 as the runtime in the Docker container.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As always, numerous performance improvements, bug fixes, and other features were
added as well. Java 24 is coming as a requirement soon - test the container!&lt;/p&gt;

&lt;p&gt;Releases continue to be slower. Trino needs your help.&lt;/p&gt;

&lt;p&gt;Other releases and announcements:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino Gateway 16 is delayed, but Trino Gateway Helm chart 1.15.1 is available&lt;/li&gt;
  &lt;li&gt;Trino Helm chart with 474 -&amp;gt; 1.38.0&lt;/li&gt;
  &lt;li&gt;New book: &lt;a href=&quot;/blog/2025/03/27/olap-principles-book.html&quot;&gt;Core Principles and Design Practices of OLAP Engines from Yiteng Xu
and Gary Gao&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Massive new contribution looking for helpers - &lt;a href=&quot;https://github.com/trinodb/trino-query-ui&quot;&gt;trino-query-ui&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s explore the query ui repo a bit more…&lt;/p&gt;

&lt;h2 id=&quot;application-packaging-and-trino&quot;&gt;Application packaging and Trino&lt;/h2&gt;

&lt;p&gt;Manfred and Cole muse about the package artifacts from Trino, their history,
scope, and pain points:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;RPM&lt;/li&gt;
  &lt;li&gt;tarball&lt;/li&gt;
  &lt;li&gt;Docker container&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of them have had issues, and everyone knew about them. Manfred documented
much of their usage in &lt;a href=&quot;/trino-the-definitive-guide&quot;&gt;Trino: The Definitive
Guide&lt;/a&gt;. In 2024 Manfred finally wrote down some ideas, and in recent months he
implemented many of them.&lt;/p&gt;

&lt;p&gt;We discuss a few aspects such as the following:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Plugin architecture of Trino&lt;/li&gt;
  &lt;li&gt;What plugins are core or optional?&lt;/li&gt;
  &lt;li&gt;Are artifacts ready to use or not?&lt;/li&gt;
  &lt;li&gt;How painful is configuration?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;demo-time&quot;&gt;Demo time&lt;/h2&gt;

&lt;p&gt;In our demo session we look at some of the changes and the new trino-packages
repository:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;RPM removal from Trino, and replacement module&lt;/li&gt;
  &lt;li&gt;trino-server-core tarball in Trino and plugin selection&lt;/li&gt;
  &lt;li&gt;trino-server-custom module&lt;/li&gt;
  &lt;li&gt;trinodb/trino-core:latest Docker container in Trino&lt;/li&gt;
  &lt;li&gt;custom-docker module&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred runs a build, shows the results, and walks through the structure and
instructions of the packages repository. To finish off, we talk about next steps
such as removing plugins from the default binaries and therefore making them
optional.&lt;/p&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino-packages&quot;&gt;trino-packages repository&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/22597&quot;&gt;Packaging improvement issue&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/installation/plugins.html&quot;&gt;Trino plugin documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Quick update on where to see Cole or Manfred next, and then join us for the
upcoming Trino events:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino Contributor Call - 23rd of April&lt;/li&gt;
  &lt;li&gt;Trino Community Broadcast 74: One tool to proxy them all (aws-proxy)&lt;/li&gt;
  &lt;li&gt;Trino Community Broadcast 75: Insights from a Norse god (Loki connector)&lt;/li&gt;
  &lt;li&gt;Trino Community Broadcast 76: Visualizing with Logi Symphony and ODBC&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let us know if you want to be a guest in a future broadcast.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>72: Keeping the lake clean</title>
      <link href="https://trino.io/episodes/72.html" rel="alternate" type="text/html" title="72: Keeping the lake clean" />
      <published>2025-03-17T00:00:00+00:00</published>
      <updated>2025-03-17T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/72</id>
      <content type="html" xml:base="https://trino.io/episodes/72.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Dev Rel Engineer
at &lt;a href=&quot;https://chainguard.dev&quot;&gt;Chainguard&lt;/a&gt;
&lt;a href=&quot;https://x.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://www.firebolt.io/&quot;&gt;Firebolt&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/viktor-kessler&quot;&gt;Viktor Kessler&lt;/a&gt;, Co-founder at Vakamo&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/thielc&quot;&gt;Christian Thiel&lt;/a&gt;, Co-founder at Vakamo&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases&quot;&gt;Releases&lt;/h2&gt;

&lt;p&gt;Following are some highlights of the recent releases:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-472.html&quot;&gt;Trino 472&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Color the server console output for improved readability.&lt;/li&gt;
  &lt;li&gt;Fix initialization failure for the DuckDB connector on Docker container.&lt;/li&gt;
  &lt;li&gt;Add support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;row&lt;/code&gt; type and generate empty values for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;array&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;map&lt;/code&gt;,
and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;json&lt;/code&gt; types in the Faker connector.&lt;/li&gt;
  &lt;li&gt;Add the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$partition&lt;/code&gt; hidden column in the Iceberg connector.&lt;/li&gt;
&lt;/ul&gt;
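
&lt;p&gt;The new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$partition&lt;/code&gt; hidden column can be queried like a regular column,
for example to count rows per partition - schema and table names are made up:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SELECT &quot;$partition&quot;, count(*)
FROM iceberg.analytics.events
GROUP BY 1;
&lt;/code&gt;&lt;/pre&gt;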

&lt;p&gt;As always, numerous performance improvements, bug fixes, and other features were
added as well.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trinodb.github.io/trino-gateway/release-notes/#15&quot;&gt;Trino Gateway 15&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Pop up messages in UI&lt;/li&gt;
  &lt;li&gt;Consistent use of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;config.yaml&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Use of OpenMetrics data from Trino clusters&lt;/li&gt;
  &lt;li&gt;Fix query errors when the adhoc routing group has no healthy backends.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;introducing-viktor-and-christian&quot;&gt;Introducing Viktor and Christian&lt;/h2&gt;

&lt;p&gt;We talk with Viktor and Christian about their experience in software engineering
and the world of big data, and what led them to start Vakamo together.&lt;/p&gt;

&lt;h2 id=&quot;metastores-and-catalogs&quot;&gt;Metastores and catalogs&lt;/h2&gt;

&lt;p&gt;We talk about data lakes, data lakehouses, object storage and the role of
metadata. Details we cover include the Hive Metastore Service, the Thrift
protocol, AWS Glue, and the new wave of catalogs. Specifically we also talk
about Apache Iceberg and the Iceberg REST catalog standard as a basis for
Lakekeeper, and then learn all the details about Lakekeeper.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/logos/lakekeeper-small.png&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;demo-time&quot;&gt;Demo time&lt;/h2&gt;

&lt;p&gt;In their demo Viktor and Christian show a multi-user Trino cluster secured by
OAuth 2, Open Policy Agent, and Lakekeeper.&lt;/p&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://lakekeeper.io/&quot;&gt;Lakekeeper&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://docs.lakekeeper.io/&quot;&gt;Lakekeeper documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/lakekeeper/lakekeeper&quot;&gt;Lakekeeper source&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/lakekeeper/lakekeeper/tree/main/examples/trino-opa&quot;&gt;Example project with Trino and OPA&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/iceberg.html&quot;&gt;Iceberg connector documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/object-storage/metastores.html#rest-catalog&quot;&gt;Iceberg REST catalog documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Join us for upcoming events and let us know if you want to be a guest:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino Community Broadcast 73: Wrapping Trino packages with a bow&lt;/li&gt;
&lt;/ul&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>71: Fake it real good</title>
      <link href="https://trino.io/episodes/71.html" rel="alternate" type="text/html" title="71: Fake it real good" />
      <published>2025-02-27T00:00:00+00:00</published>
      <updated>2025-02-27T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/71</id>
      <content type="html" xml:base="https://trino.io/episodes/71.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director/Open
Source Engineering at &lt;a href=&quot;https://trino.io/users.html#starburst&quot;&gt;Starburst&lt;/a&gt; -
&lt;a href=&quot;https://x.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://www.firebolt.io/&quot;&gt;Firebolt&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest&quot;&gt;Guest&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/janwas/&quot;&gt;Jan Waś&lt;/a&gt;, 
Software Engineer at &lt;a href=&quot;https://trino.io/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases&quot;&gt;Releases&lt;/h2&gt;

&lt;p&gt;Following are some highlights of the recent releases:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-471.html&quot;&gt;Trino 471&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add &lt;a href=&quot;https://trino.io/docs/current/functions/ai.html&quot;&gt;AI functions&lt;/a&gt; for textual
tasks on data using OpenAI, Anthropic, or other LLMs using Ollama as backend.&lt;/li&gt;
  &lt;li&gt;Add support for logging output to the console in JSON format (useful in containers).&lt;/li&gt;
  &lt;li&gt;Support additional Python libraries for use with Python user-defined functions.&lt;/li&gt;
  &lt;li&gt;Remove the RPM package.&lt;/li&gt;
  &lt;li&gt;Add &lt;a href=&quot;https://trino.io/docs/current/object-storage/file-system-local.html&quot;&gt;local file system support&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Add support for S3 Tables in Iceberg connector.&lt;/li&gt;
&lt;/ul&gt;
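
&lt;p&gt;A hedged example of the new AI functions - this assumes a catalog named
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ai&lt;/code&gt; configured with an LLM provider, and the schema, table, and column
names are made up:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SELECT comment, ai.ai_analyze_sentiment(comment) AS sentiment
FROM shop.public.reviews
LIMIT 10;
&lt;/code&gt;&lt;/pre&gt;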

&lt;p&gt;As always, numerous performance improvements, bug fixes, and other features were
added as well.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trinodb.github.io/trino-gateway/release-notes/#14&quot;&gt;Trino Gateway 14&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Our first Trino Gateway release of 2025 shipped, and it is packed with great new
features and fixes. Some examples are the following:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Rules editor in the web interface&lt;/li&gt;
  &lt;li&gt;Automatic database schema update and support for Oracle&lt;/li&gt;
  &lt;li&gt;Trino cluster monitoring with JMX and OpenMetrics&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;introducing-jan-waś&quot;&gt;Introducing Jan Waś&lt;/h2&gt;

&lt;p&gt;Jan, also known as &lt;a href=&quot;https://github.com/nineinchnick/&quot;&gt;nineinchnick on GitHub&lt;/a&gt;,
is a very active Trino contributor with a wide range of his own plugins and
projects. He is subproject maintainer for the Helm charts and the Grafana
plugin, and is heavily involved in GitHub actions setup and numerous other
efforts. Jan resides in Poland. When he is not working on Trino, you can find
him at metal, electronics, and even opera concerts across Europe or at home
playing video games.&lt;/p&gt;

&lt;h2 id=&quot;datafaker-faker-connector-and-trino&quot;&gt;Datafaker, Faker connector, and Trino&lt;/h2&gt;

&lt;p&gt;We talk about using simulated data from the TPC-H and TPC-DS connectors to learn
SQL and use it for other scenarios such as benchmarking, testing for SQL
support, and validating other connectors and data sources. This leads us to the
limitations of these connectors and how the Faker connector is the next step.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/logos/datafaker-small.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Jan tells us about the Datafaker library and his motivation to create a
connector, and how it eventually landed in Trino itself.&lt;/p&gt;
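
&lt;p&gt;As a rough sketch, a Faker catalog using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;connector.name=faker&lt;/code&gt; lets you
declare tables with Datafaker generator expressions - the table definition below
is illustrative:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CREATE TABLE faker.default.people (
  id uuid NOT NULL,
  name varchar NOT NULL WITH (generator = '#{Name.first_name} #{Name.last_name}'),
  born_at timestamp NOT NULL
);

SELECT * FROM faker.default.people LIMIT 5;
&lt;/code&gt;&lt;/pre&gt;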

&lt;h2 id=&quot;demo-time&quot;&gt;Demo time&lt;/h2&gt;

&lt;p&gt;Jan shows us how to configure the connector and then demos a number of use
cases from learning SQL to populating and testing other data sources.&lt;/p&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/faker.html&quot;&gt;Faker connector documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/ecosystem/data-source.html#datafaker&quot;&gt;Datafaker project&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/reports&quot;&gt;Trino reports repository&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/nineinchnick/&quot;&gt;Other project repositories from Jan&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/06/28/trino-fest-2023-starburst-recap.html&quot;&gt;Zero-cost reporting, presented at Trino Fest 2023&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Watch the &lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings&quot;&gt;recording of the Trino contributor call or read the
minutes&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Join us for upcoming events and let us know if you want to be a guest:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino Community Broadcast 72: Keeping the lake clean, all about
&lt;a href=&quot;https://lakekeeper.io/&quot;&gt;Lakekeeper&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Trino Community Broadcast 73: Wrapping Trino packages with a bow&lt;/li&gt;
&lt;/ul&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>70: Previewing a new UI</title>
      <link href="https://trino.io/episodes/70.html" rel="alternate" type="text/html" title="70: Previewing a new UI" />
      <published>2025-02-13T00:00:00+00:00</published>
      <updated>2025-02-13T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/70</id>
      <content type="html" xml:base="https://trino.io/episodes/70.html">&lt;h2 id=&quot;host&quot;&gt;Host&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director/Open
Source Engineering at &lt;a href=&quot;https://trino.io/users.html#starburst&quot;&gt;Starburst&lt;/a&gt; -
&lt;a href=&quot;https://x.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/peter-kosztolanyi-5617938/&quot;&gt;Peter Kosztolanyi&lt;/a&gt;, 
Analytics Platform Lead at &lt;a href=&quot;https://wise.com/&quot;&gt;Wise&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases&quot;&gt;Releases&lt;/h2&gt;

&lt;p&gt;Following are some highlights of the Trino releases since episode 69:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-470.html&quot;&gt;Trino 470&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New DuckDB connector&lt;/li&gt;
  &lt;li&gt;New Grafana Loki connector&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WITH SESSION&lt;/code&gt; for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; queries&lt;/li&gt;
  &lt;li&gt;Raise minimum runtime requirement to Java 11 for JDBC driver and CLI&lt;/li&gt;
  &lt;li&gt;Remove Kinesis connector&lt;/li&gt;
  &lt;li&gt;Deprecate use of the legacy file system support for Azure Storage, Google
Cloud Storage, IBM Cloud Object Storage, S3 and S3-compatible object storage
systems - &lt;a href=&quot;/blog/2025/02/10/old-file-system.html&quot;&gt;check out the blog post&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
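
&lt;p&gt;A hedged sketch of the new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WITH SESSION&lt;/code&gt; support, which scopes session
properties to a single query - the property shown is just an example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;WITH SESSION query_max_run_time = '10m'
SELECT count(*) FROM tpch.tiny.orders;
&lt;/code&gt;&lt;/pre&gt;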

&lt;p&gt;As always, numerous performance improvements, bug fixes, and other features were
added as well.&lt;/p&gt;

&lt;h2 id=&quot;introducing-peter-kosztolanyi&quot;&gt;Introducing Peter Kosztolanyi&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/koszti&quot;&gt;Peter Kosztolanyi&lt;/a&gt; is the Analytics Platform Lead at
&lt;a href=&quot;https://wise.com/&quot;&gt;Wise&lt;/a&gt; and he &lt;a href=&quot;https://youtu.be/K5RmYtbeXAc&quot;&gt;presented about their data
lake&lt;/a&gt; with Abdullah Alkhawatrah at &lt;a href=&quot;/blog/2024/12/18/trino-summit-2024-quick-recap.html&quot;&gt;Trino Summit
2024&lt;/a&gt;. Peter has a lot
of experience in the data and business intelligence fields.&lt;/p&gt;

&lt;p&gt;He also contributes to the Trino Python client, and worked on his own phone and
messaging app for iOS and Android in the past.&lt;/p&gt;

&lt;h2 id=&quot;trino-legacy-web-ui&quot;&gt;Trino legacy web UI&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;/docs/current/admin/web-interface.html&quot;&gt;existing main web UI for
Trino&lt;/a&gt; has been around
for a long time, and sees very limited development and maintenance. It lacks
documentation, a modern look, and a clean codebase, and is inconsistent across
screens. It is also very technical and developer-focused, and lacks features
like a SQL console for running queries.&lt;/p&gt;

&lt;h2 id=&quot;efforts-for-a-new-ui&quot;&gt;Efforts for a new UI&lt;/h2&gt;

&lt;p&gt;While we all knew about the problems of the old UI, nobody with enough UI coding
knowledge or time and motivation ever took up the banner to change the
situation. We did however get a great new UI contributed in Trino Gateway, and
that motivated some people in the community, especially Peter.&lt;/p&gt;

&lt;p&gt;Peter started with the same stack, pulled in maintainers like Mateusz Gajewski
and Manfred Moser, and kept working on improvements. We talk more about the
following aspects:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Problems with the old UI and its technology stack&lt;/li&gt;
  &lt;li&gt;Trino Gateway UI&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/22697&quot;&gt;Roadmap issue&lt;/a&gt; and discussion around the new UI&lt;/li&gt;
  &lt;li&gt;What is the stack now?&lt;/li&gt;
  &lt;li&gt;Look at the
&lt;a href=&quot;https://github.com/trinodb/trino/tree/master/core/trino-web-ui&quot;&gt;codebase&lt;/a&gt;,
tools, development, and
&lt;a href=&quot;/docs/current/admin/preview-web-interface.html&quot;&gt;documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Current status and next steps&lt;/li&gt;
  &lt;li&gt;What do we need from others?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;demo-time&quot;&gt;Demo time&lt;/h2&gt;

&lt;p&gt;Peter shows us the new UI from his development setup - the latest and greatest
set of features.&lt;/p&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/admin/preview-web-interface.html&quot;&gt;Preview Web UI documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/tree/master/core/trino-web-ui&quot;&gt;Preview Web UI codebase&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/22697&quot;&gt;Roadmap issue&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/admin/web-interface.html&quot;&gt;Legacy Web UI documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Join us for upcoming events and let us know if you want to be the next guest.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino contributor call, 27th of February&lt;/li&gt;
  &lt;li&gt;Trino Community Broadcast 71 with Jan Waś about the new &lt;a href=&quot;/docs/current/connector/faker.html&quot;&gt;Faker
connector&lt;/a&gt;, 27th of
February&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Host</summary>

      
      
    </entry>
  
    <entry>
      <title>69: Client protocol improvements</title>
      <link href="https://trino.io/episodes/69.html" rel="alternate" type="text/html" title="69: Client protocol improvements" />
      <published>2025-01-30T00:00:00+00:00</published>
      <updated>2025-01-30T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/69</id>
      <content type="html" xml:base="https://trino.io/episodes/69.html">&lt;h2 id=&quot;host&quot;&gt;Host&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director/Open
Source Engineering at &lt;a href=&quot;https://trino.io/users.html#starburst&quot;&gt;Starburst&lt;/a&gt; -
&lt;a href=&quot;https://x.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://www.firebolt.io/&quot;&gt;Firebolt&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/wendigo&quot;&gt;Mateusz Gajewski&lt;/a&gt;, Sr. Staff Software Engineer at 
&lt;a href=&quot;https://trino.io/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases&quot;&gt;Releases&lt;/h2&gt;

&lt;p&gt;Following are some highlights of the first release of 2025. It took us a bit longer to work through release blockers this time:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-469.html&quot;&gt;Trino 469&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FIRST&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AFTER&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LAST&lt;/code&gt; clauses to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER TABLE ...
ADD COLUMN&lt;/code&gt; for Iceberg, MySQL, and MariaDB.&lt;/li&gt;
  &lt;li&gt;Add support for SSE-C in S3 security mapping for Delta Lake, Hive, Hudi, and Iceberg.&lt;/li&gt;
  &lt;li&gt;Allow configuration for Google Cloud Storage endpoint with object storage
connectors.&lt;/li&gt;
  &lt;li&gt;Allow connection validation and add more stats for JDBC driver.&lt;/li&gt;
  &lt;li&gt;Remove support for connector-level event listeners.&lt;/li&gt;
  &lt;li&gt;Misc improvements for the Faker connector.&lt;/li&gt;
&lt;/ul&gt;
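
&lt;p&gt;Hedged examples of the new column placement clauses for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER TABLE ...
ADD COLUMN&lt;/code&gt; - catalog, table, and column names are made up:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;ALTER TABLE iceberg.sales.orders ADD COLUMN order_uuid uuid FIRST;
ALTER TABLE iceberg.sales.orders ADD COLUMN discount double AFTER price;
&lt;/code&gt;&lt;/pre&gt;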

&lt;p&gt;As always, numerous performance improvements, bug fixes, and other features were
added as well.&lt;/p&gt;

&lt;h2 id=&quot;other-news&quot;&gt;Other news&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Trino Python client 0.332.0 with spooling support&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-23-jan-2025&quot;&gt;Trino contributor call&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;introducing-wendigo&quot;&gt;Introducing wendigo&lt;/h2&gt;

&lt;p&gt;What can we say? Top contributor and maintainer, and all around hacker on Trino,
numerous Trino subprojects, Airlift, and beyond.&lt;/p&gt;

&lt;h2 id=&quot;main-topic&quot;&gt;Main topic&lt;/h2&gt;

&lt;p&gt;Let’s talk about the Trino client protocol. Following are some topics we cover:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;What is the client protocol for?&lt;/li&gt;
  &lt;li&gt;History of the client protocol&lt;/li&gt;
  &lt;li&gt;Available client drivers and client applications&lt;/li&gt;
  &lt;li&gt;Architecture and flow&lt;/li&gt;
  &lt;li&gt;Motivation to improve the protocol&lt;/li&gt;
  &lt;li&gt;Direct and spooling modes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Mateusz walks through the presentation and Cole and Manfred ask a lot of
questions:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; target=&quot;_blank&quot; href=&quot;/assets/episode/tcb69-client-protocol.pdf&quot;&gt;
        Presentation
    &lt;/a&gt;
&lt;/div&gt;

&lt;h2 id=&quot;demo-time&quot;&gt;Demo time&lt;/h2&gt;

&lt;p&gt;Mateusz shows us his example and testing setup with Starburst Galaxy clusters
configured for spooling protocol use and shares some of the performance gains he
observes.&lt;/p&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/assets/episode/tcb69-client-protocol.pdf&quot;&gt;Presentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/client/client-protocol.html&quot;&gt;Client protocol documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/ecosystem/client-driver.html&quot;&gt;Available client drivers&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/ecosystem/client-application.html&quot;&gt;Available client applications&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Join us for upcoming events and let us know if you want to be the next guest.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Host</summary>

      
      
    </entry>
  
    <entry>
      <title>68: Year of the Snake - Python UDFs</title>
      <link href="https://trino.io/episodes/68.html" rel="alternate" type="text/html" title="68: Year of the Snake - Python UDFs" />
      <published>2025-01-16T00:00:00+00:00</published>
      <updated>2025-01-16T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/68</id>
      <content type="html" xml:base="https://trino.io/episodes/68.html">&lt;h2 id=&quot;host&quot;&gt;Host&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director/Open
Source Engineering and Trino maintainer at
&lt;a href=&quot;https://trino.io/users.html#starburst&quot;&gt;Starburst&lt;/a&gt; -
&lt;a href=&quot;https://x.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://www.firebolt.io/&quot;&gt;Firebolt&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/electrum&quot;&gt;David Phillips&lt;/a&gt;, Trino co-creator and maintainer&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases&quot;&gt;Releases&lt;/h2&gt;

&lt;p&gt;Following are some highlights of the Trino releases since episode 67:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-465.html&quot;&gt;Trino 465&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for a customer-provided SSE key in the S3 file system, relevant
for the Hive, Iceberg, Delta Lake, and Hudi connectors.&lt;/li&gt;
  &lt;li&gt;Deterministic data, locale support, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;random_string&lt;/code&gt; function for the Faker
connector.&lt;/li&gt;
  &lt;li&gt;Add support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;extra_properties&lt;/code&gt; in the Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Add support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;geometry&lt;/code&gt; type in the PostgreSQL connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-466.html&quot;&gt;Trino 466&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Remove Python requirement for Trino by replacing the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;launcher&lt;/code&gt; script.&lt;/li&gt;
  &lt;li&gt;Improve client protocol throughput by introducing the spooling protocol and
ship it with documentation, including implementation in the JDBC driver and
the CLI.&lt;/li&gt;
  &lt;li&gt;Add support for data access control with Apache Ranger, including support for
column masking, row filtering, and audit logging.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-467.html&quot;&gt;Trino 467&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Change default for internal communication to HTTP/1.1.&lt;/li&gt;
  &lt;li&gt;Add support for OpenTelemetry tracing to the HTTP, Kafka, and MySQL event
listeners.&lt;/li&gt;
  &lt;li&gt;Remove the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;microdnf&lt;/code&gt; package manager from the Docker image.&lt;/li&gt;
  &lt;li&gt;Add the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$all_manifests&lt;/code&gt; metadata tables in the Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Add the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$transactions&lt;/code&gt; metadata table in the Delta Lake connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-468.html&quot;&gt;Trino 468&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add &lt;a href=&quot;/docs/current/udf/python.html&quot;&gt;Python user-defined functions&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Rename SQL routines to SQL user-defined functions.&lt;/li&gt;
  &lt;li&gt;Add cluster overview to the Preview Web UI.&lt;/li&gt;
  &lt;li&gt;Improve bucket execution for Hive and Iceberg.&lt;/li&gt;
  &lt;li&gt;Add support for non-transactional &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; statements for PostgreSQL.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As always, numerous performance improvements, bug fixes, and other features were
added as well.&lt;/p&gt;

&lt;h2 id=&quot;other-news&quot;&gt;Other news&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trinodb.github.io/trino-gateway/release-notes/#13&quot;&gt;Trino Gateway 13&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/12/18/trino-summit-2024-quick-recap.html&quot;&gt;Trino Summit recap&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2025/01/07/2024-and-beyond.html&quot;&gt;Trino in 2024 and beyond&lt;/a&gt;, answer
our survey!&lt;/li&gt;
  &lt;li&gt;December 2024 Trino maintainer and contributor calls took place virtually.&lt;/li&gt;
  &lt;li&gt;Trino Python client 0.332.0 includes support for the spooling mode of the
client protocol.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;user-defined-functions-in-trino&quot;&gt;User-defined functions in Trino&lt;/h2&gt;

&lt;p&gt;First there were &lt;a href=&quot;/docs/current/develop/functions.html&quot;&gt;custom plugins with user-defined
functions&lt;/a&gt;, and for a long
time, that was all there was.&lt;/p&gt;

&lt;p&gt;In 2023, David contributed SQL user-defined functions, also known as SQL
routines, and we ran a &lt;a href=&quot;/blog/2023/11/09/routines.html&quot;&gt;competition for examples&lt;/a&gt;. Manfred wrote the docs and did a &lt;a href=&quot;/blog/2023/11/29/sql-training-4.html&quot;&gt;training session with
Dain and Martin&lt;/a&gt;. And even back then,
David had plans to add other languages, and started working on Python.&lt;/p&gt;

&lt;p&gt;At &lt;a href=&quot;/blog/2024/12/18/trino-summit-2024-quick-recap.html&quot;&gt;Trino Summit in 2024&lt;/a&gt; Martin Traverso announced the new upcoming feature in the keynote, and with
&lt;a href=&quot;/docs/current/release/release-468.html&quot;&gt;Trino 468&lt;/a&gt; we shipped
support for &lt;a href=&quot;/docs/current/udf/python.html&quot;&gt;Python user-defined functions&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;motivation&quot;&gt;Motivation&lt;/h2&gt;

&lt;p&gt;Why support Python for user-defined functions, as compared to just SQL? Simply
put, more is better, and Python is everywhere. We chat with David about the
details.&lt;/p&gt;

&lt;h2 id=&quot;development-history-and-collaboration&quot;&gt;Development history and collaboration&lt;/h2&gt;

&lt;p&gt;David tells us more about figuring out how to make it all work. He touches
on topics such as security, performance, deployment, monitoring, and
collaboration with other projects. We also talk about why other approaches,
such as using a local CPython, were discarded.&lt;/p&gt;

&lt;h2 id=&quot;architecture-and-consequences&quot;&gt;Architecture and consequences&lt;/h2&gt;

&lt;p&gt;In this discussion we try to cover the following topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;How does it all work?&lt;/li&gt;
  &lt;li&gt;What are some restrictions?&lt;/li&gt;
  &lt;li&gt;What performance can users expect?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s chat about this nesting:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/episode/tcb68-python-udf-architecture.png&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;examples-and-demo&quot;&gt;Examples and demo&lt;/h2&gt;

&lt;p&gt;A simple example from the documentation:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;FUNCTION&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;python_udf_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input_parameter&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;data_type&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;RETURNS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result_data_type&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;LANGUAGE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PYTHON&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;handler&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;python_function&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;$$&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;python_function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
  &lt;span class=&quot;err&quot;&gt;$$&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;David shows us more, and we talk about the details.&lt;/p&gt;
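
&lt;p&gt;As a concrete illustration of the template above (a hypothetical sketch
following the documented syntax, not a function from the episode), a UDF that
doubles an integer could look like this:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;FUNCTION doubleup(x integer)
  RETURNS integer
  LANGUAGE PYTHON
  WITH (handler = &apos;twice&apos;)
  AS $$
def twice(a):
    return a * 2
$$
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;handler&lt;/code&gt; property names the Python function inside the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$$&lt;/code&gt;-quoted body that Trino invokes for each row.&lt;/p&gt;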

&lt;h2 id=&quot;feedback-and-future-work&quot;&gt;Feedback and future work&lt;/h2&gt;

&lt;p&gt;We are looking for feedback:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;More examples for the documentation for our users&lt;/li&gt;
  &lt;li&gt;Use cases and experience testing the feature&lt;/li&gt;
  &lt;li&gt;Production deployment experiences&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Future work depends on the feedback but definitely includes the following:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Performance improvements&lt;/li&gt;
  &lt;li&gt;Fine-tuning of available Python packages&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.python.org/&quot;&gt;Python&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://webassembly.org/&quot;&gt;WebAssembly (Wasm)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://chicory.dev/&quot;&gt;Chicory&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/udf.html&quot;&gt;Trino user-defined functions overview&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/udf/python.html&quot;&gt;Python user-defined functions&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino-wasm-python&quot;&gt;trino-wasm-python&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;You are all invited to chat with us about development at the Trino contributor
call on the 23rd of January.&lt;/li&gt;
  &lt;li&gt;Join us on the 30th of January with Mateusz Gajewski to learn about client
protocol improvements.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Host</summary>

      
      
    </entry>
  
    <entry>
      <title>67: Extra speed with Exasol and Trino</title>
      <link href="https://trino.io/episodes/67.html" rel="alternate" type="text/html" title="67: Extra speed with Exasol and Trino" />
      <published>2024-10-30T00:00:00+00:00</published>
      <updated>2024-10-30T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/67</id>
      <content type="html" xml:base="https://trino.io/episodes/67.html">&lt;h2 id=&quot;host&quot;&gt;Host&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of Trino
Community Leadership at &lt;a href=&quot;https://trino.io/users.html#starburst&quot;&gt;Starburst&lt;/a&gt; - 
&lt;a href=&quot;https://x.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://www.firebolt.io/&quot;&gt;Firebolt&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/thomas-bestfleisch/&quot;&gt;Thomas Bestfleisch&lt;/a&gt;, 
Senior Product Manager at &lt;a href=&quot;https://www.exasol.com/&quot;&gt;Exasol&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-and-news&quot;&gt;Releases and news&lt;/h2&gt;

&lt;p&gt;Following are some highlights of the recent Trino releases:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-461.html&quot;&gt;Trino 461&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;add_files&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;add_files_from_table&lt;/code&gt; procedures in the
Iceberg connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-462.html&quot;&gt;Trino 462&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for read operations when using the Unity catalog as Iceberg REST
catalog in the Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Improve performance and memory usage when decoding data in the CLI.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-463.html&quot;&gt;Trino 463&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Enable HTTP/2 for internal communication by default.&lt;/li&gt;
  &lt;li&gt;Add &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;timezone()&lt;/code&gt; functions.&lt;/li&gt;
  &lt;li&gt;Include table functions with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW FUNCTIONS&lt;/code&gt; output.&lt;/li&gt;
  &lt;li&gt;Add support for writing change data feed when deletion vector is enabled to
the Delta Lake connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-464.html&quot;&gt;Trino 464&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Require JDK 23 to run Trino.&lt;/li&gt;
  &lt;li&gt;Add the Faker connector.&lt;/li&gt;
  &lt;li&gt;Add the Vertica connector.&lt;/li&gt;
  &lt;li&gt;Remove the Accumulo connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As always, numerous performance improvements, bug fixes, and other features were
added as well.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino maintainer call - great sync with some exciting news coming to the community soon.&lt;/li&gt;
  &lt;li&gt;Trino contributor call - &lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-24-oct-2024&quot;&gt;recording and minutes available now&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Trino Kubernetes operator meeting - minutes coming soon.&lt;/li&gt;
  &lt;li&gt;Trino Summit call for speakers closed - stay tuned for announcements and
&lt;a href=&quot;/blog/2024/10/17/trino-summit-2024-tease.html&quot;&gt;don’t forget to register&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;introducing-thomas-and-exasol&quot;&gt;Introducing Thomas and Exasol&lt;/h2&gt;

&lt;p&gt;Exasol is a lightning-fast, in-memory database for analytics. And this is not
just a marketing slogan: Exasol has been at the top of the TPC-H benchmarks for
a long time now. Thomas tells us more about the database and his role.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/logos/exasol-small.png&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;exasol-and-trino&quot;&gt;Exasol and Trino&lt;/h2&gt;

&lt;p&gt;Together, Trino and Exasol bridge the gap between the extreme performance of
Exasol&#8217;s in-memory engine and the massive scale of a Trino lakehouse.&lt;/p&gt;

&lt;p&gt;We learn more about Exasol as Thomas guides us through his &lt;a href=&quot;/assets/episode/tcb67-exasol.pdf&quot;&gt;presentation about
Exasol and Trino&lt;/a&gt;, and take
the opportunity to question him for more details.&lt;/p&gt;

&lt;p&gt;The pull request for the Exasol connector has been a long time in the works and
was finally merged for Trino 452. We talk about the motivation, the process,
the results, and the future for the connector.&lt;/p&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.exasol.com/&quot;&gt;Exasol&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/connector/exasol.html&quot;&gt;Trino’s Exasol connector&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.exasol.com/exasol-saas/&quot;&gt;Exasol SaaS&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/exasol/ai-lab&quot;&gt;Exasol AI lab&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://hub.docker.com/r/exasol/docker-db&quot;&gt;Exasol container&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/10/07/sql-basecamps.html&quot;&gt;SQL basecamps before Trino Summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/10/17/trino-summit-2024-tease.html&quot;&gt;Trino Summit 2024&lt;/a&gt;:
Information about first sessions and more available. Call for speakers closed.
Announcements coming soon.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Host</summary>

      
      
    </entry>
  
    <entry>
      <title>66: Chat with Trino and Wren AI</title>
      <link href="https://trino.io/episodes/66.html" rel="alternate" type="text/html" title="66: Chat with Trino and Wren AI" />
      <published>2024-09-12T00:00:00+00:00</published>
      <updated>2024-09-12T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/66</id>
      <content type="html" xml:base="https://trino.io/episodes/66.html">&lt;h2 id=&quot;host&quot;&gt;Host&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of Trino
Community Leadership at &lt;a href=&quot;https://trino.io/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;,
(&lt;a href=&quot;https://x.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/himanshu-mendapara-a732051aa/&quot;&gt;Himanshu Mendapara&lt;/a&gt;, 
Software Engineer at &lt;a href=&quot;https://begenuin.com/&quot;&gt;Genuin&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/wwwy3y3/&quot;&gt;William Chang&lt;/a&gt;, 
CTO and Co-Founder at &lt;a href=&quot;https://cannerdata.com/&quot;&gt;Canner&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/yadiacolindres/&quot;&gt;Yadia Colindres&lt;/a&gt;, 
Product Management Advisor at &lt;a href=&quot;https://cannerdata.com/&quot;&gt;Canner&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-and-news&quot;&gt;Releases and news&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-458.html&quot;&gt;Trino 458&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Deactivate legacy file system support for all catalogs. You must activate the
desired file system support with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fs.native-azure.enabled&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fs.native-gcs.enabled&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fs.native-s3.enabled&lt;/code&gt;, or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fs.hadoop.enabled&lt;/code&gt; in
each catalog using the Delta Lake, Hive, Hudi, or Iceberg connectors.&lt;/li&gt;
  &lt;li&gt;Add support for tracing with OpenTelemetry to the JDBC driver.&lt;/li&gt;
  &lt;li&gt;Reduce data transfer from remote systems for queries with large &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IN&lt;/code&gt; lists in
numerous connectors.&lt;/li&gt;
&lt;/ul&gt;
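
&lt;p&gt;For example, an Iceberg catalog using the native S3 file system might be
configured like this (a hypothetical sketch; the exact properties depend on
your deployment):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;connector.name=iceberg
fs.native-s3.enabled=true
s3.region=us-east-1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;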

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-459.html&quot;&gt;Trino 459&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The Docker container now uses Java 23. Please test this and let us know of
any problems, since Java 23 is going to be a requirement soon.&lt;/li&gt;
  &lt;li&gt;Add support for KiB and similar data size units for the Trino CLI output.&lt;/li&gt;
  &lt;li&gt;Allow configuring the maximum concurrent HTTP requests to Azure on every node.&lt;/li&gt;
  &lt;li&gt;Add support for WASB to Azure Storage file system support.&lt;/li&gt;
  &lt;li&gt;Improve cache hit ratio for the file system cache.&lt;/li&gt;
  &lt;li&gt;Remove the local file connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-460.html&quot;&gt;Trino 460&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for using an Alluxio cluster as file system cache.&lt;/li&gt;
  &lt;li&gt;Add support for WASBS to Azure Storage file system support.&lt;/li&gt;
  &lt;li&gt;Remove the atop connector.&lt;/li&gt;
  &lt;li&gt;Remove the Raptor connector.&lt;/li&gt;
  &lt;li&gt;Numerous performance improvements for the ClickHouse connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As usual, numerous performance improvements, bug fixes, and other features
have been added as well.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Updated and improved documentation for contributors for Trino, Trino Gateway,
and other Trino projects.&lt;/li&gt;
  &lt;li&gt;Jan Was steps up as subproject maintainer for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino-js-client&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Cristian Osiac, Jordan Zimmermann, and Pablo Arteaga are working on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aws-proxy&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;introducing-himanshu&quot;&gt;Introducing Himanshu&lt;/h2&gt;

&lt;p&gt;Himanshu works at Genuin as a software engineer, learning about new
technologies, and occasionally &lt;a href=&quot;https://github.com/himanshu634&quot;&gt;contributing to open source
projects&lt;/a&gt; like Wren AI.&lt;/p&gt;

&lt;h2 id=&quot;introducing-william-and-yadia&quot;&gt;Introducing William and Yadia&lt;/h2&gt;

&lt;p&gt;William is co-founder at Canner and drives everything about Canner Enterprise
and Wren AI as CTO. Yadia works with William at Canner and is product manager
for Wren AI.&lt;/p&gt;

&lt;p&gt;We talk about the history of Canner and their usage of Trino in Canner
Enterprise.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/logos/canner-small.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Pivoting to talk about Wren AI, we learn about its architecture, use cases and
features, and continue along with an extensive demo of Wren AI.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/logos/wren-ai-small.png&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://cannerdata.com/&quot;&gt;Canner&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.getwren.ai/&quot;&gt;Wren AI&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/Canner/WrenAI/pull/535&quot;&gt;Pull request for Trino integration&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://docs.getwren.ai/oss/guide/connect/trino&quot;&gt;Trino as Wren AI data source documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.producthunt.com/posts/wren-ai-cloud&quot;&gt;Wren AI launch at producthunt&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;A call out to help us &lt;a href=&quot;https://github.com/trinodb/trino/issues/23121&quot;&gt;clean up and close old
issues&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/07/11/trino-summit-2024-call-for-speakers.html&quot;&gt;Trino Summit 2024&lt;/a&gt;
is coming on the 11th and 12th of December, and registration, call for
speakers, and sponsorship opportunities are open.&lt;/li&gt;
  &lt;li&gt;Join us for the next &lt;a href=&quot;https://trino.io/broadcast/index.html&quot;&gt;Trino Community Broadcast
67&lt;/a&gt; about the Exasol database and Trino connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Host</summary>

      
      
    </entry>
  
    <entry>
      <title>65: Performance boosts</title>
      <link href="https://trino.io/episodes/65.html" rel="alternate" type="text/html" title="65: Performance boosts" />
      <published>2024-09-12T00:00:00+00:00</published>
      <updated>2024-09-12T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/65</id>
      <content type="html" xml:base="https://trino.io/episodes/65.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of Trino
Community Leadership at &lt;a href=&quot;https://trino.io/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;,
(&lt;a href=&quot;https://x.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://trino.io/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-and-news&quot;&gt;Releases and news&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-455.html&quot;&gt;Trino 455&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add query starting time in QueryStatistics in all event listeners, including
the new Kafka event listener.&lt;/li&gt;
  &lt;li&gt;Allow configuring endpoint for the native Azure filesystem.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-456.html&quot;&gt;Trino 456&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Invalid - release process errors resulted in invalid artifacts.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-457.html&quot;&gt;Trino 457&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improve performance of queries involving joins when fault-tolerant execution
is enabled.&lt;/li&gt;
  &lt;li&gt;Improve performance for LZ4, Snappy, and ZSTD compression and decompression.&lt;/li&gt;
  &lt;li&gt;Publish a JDBC driver JAR without bundled, third-party dependencies.&lt;/li&gt;
  &lt;li&gt;Improve performance for concurrent write operations on S3 by using lock-less
Delta Lake write reconciliation, made possible with the release of the AWS SDK
with S3 conditional write support.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As usual, numerous performance improvements, bug fixes, and other features
have been added as well.&lt;/p&gt;

&lt;h2 id=&quot;performance-boosters&quot;&gt;Performance boosters&lt;/h2&gt;

&lt;p&gt;We chat about some of the following aspects and projects and their impact on Trino:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Role and history of Aircompressor.&lt;/li&gt;
  &lt;li&gt;Foundation from Airlift.&lt;/li&gt;
  &lt;li&gt;Relation to Java 22, and soon 23.&lt;/li&gt;
  &lt;li&gt;Status and next steps for improved and modernized file system support.&lt;/li&gt;
  &lt;li&gt;A quick glance at client protocol improvements.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/airlift/aircompressor&quot;&gt;Aircompressor&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/airlift/airlift&quot;&gt;Airlift&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/object-storage.html&quot;&gt;Object storage and file system documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/14237&quot;&gt;Project Hummingbird&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/22271&quot;&gt;Project Swift&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;We chat about the &lt;a href=&quot;https://github.com/trinodb/trino/issues/23122&quot;&gt;recent cleanup of unused Slack
channels&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;A call out to help us &lt;a href=&quot;https://github.com/trinodb/trino/issues/23121&quot;&gt;clean up and close old
issues&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Check out our new &lt;a href=&quot;https://github.com/trinodb/presentations/tree/main/assets/backgrounds&quot;&gt;video call background
images&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/07/11/trino-summit-2024-call-for-speakers.html&quot;&gt;Trino Summit 2024&lt;/a&gt;
is coming on the 11th and 12th of December, and registration, call for
speakers, and sponsorship opportunities are open.&lt;/li&gt;
  &lt;li&gt;Join us for the next &lt;a href=&quot;https://trino.io/broadcast/index.html&quot;&gt;Trino Community Broadcast
66&lt;/a&gt; about Wren AI and Trino.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>64: Control with Open Policy Agent OPA</title>
      <link href="https://trino.io/episodes/64.html" rel="alternate" type="text/html" title="64: Control with Open Policy Agent OPA" />
      <published>2024-08-22T00:00:00+00:00</published>
      <updated>2024-08-22T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/64</id>
      <content type="html" xml:base="https://trino.io/episodes/64.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of Trino
Community Leadership at &lt;a href=&quot;https://trino.io/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;,
(&lt;a href=&quot;https://x.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://trino.io/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/sebastian-bernauer-622b95167&quot;&gt;Sebastian Bernauer&lt;/a&gt;, Software Developer at &lt;a href=&quot;https://trino.io/users.html#stackable&quot;&gt;Stackable&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/soenkeliebau/&quot;&gt;Sönke Liebau&lt;/a&gt;, Co-Founder and CPO
at &lt;a href=&quot;https://trino.io/users.html#stackable&quot;&gt;Stackable&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-and-news&quot;&gt;Releases and news&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-454.html&quot;&gt;Trino 454&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improve performance for queries that contain multiple aggregate functions,
including &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DISTINCT&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Add Kafka event listener plugin (yet to be documented).&lt;/li&gt;
  &lt;li&gt;Add configuration for fetch size with JDBC-based connectors (yet to be documented).&lt;/li&gt;
  &lt;li&gt;Add support for writing Deletion Vectors with the Delta Lake connector.&lt;/li&gt;
  &lt;li&gt;Add new &lt;strong&gt;Resources&lt;/strong&gt; tab in the web interface with data from the new
lightweight query endpoint &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/v1/query?pruned=true&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Add new Preview Web UI (help us test and develop!).&lt;/li&gt;
  &lt;li&gt;Add S3 security mapping for the native S3 filesystem.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As usual, numerous performance improvements, bug fixes, and other features
have been added as well.&lt;/p&gt;

&lt;h2 id=&quot;stackable-opa-and-more&quot;&gt;Stackable, OPA, and more&lt;/h2&gt;

&lt;p&gt;We chat with Sönke and Sebastian about the following agenda topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;What is Stackable?&lt;/li&gt;
  &lt;li&gt;Open Policy Agent (OPA) authorization plugin
    &lt;ul&gt;
      &lt;li&gt;History&lt;/li&gt;
      &lt;li&gt;Recent development&lt;/li&gt;
      &lt;li&gt;Compatibility layer to Trino’s file-based access control&lt;/li&gt;
      &lt;li&gt;Quick demo on row filtering and column masking&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Auto-scaling Trino clusters using trino-lb
    &lt;ul&gt;
      &lt;li&gt;Differences between &lt;a href=&quot;https://trino.io/ecosystem/add-on.html#trino-gateway&quot;&gt;Trino
Gateway&lt;/a&gt; and
&lt;a href=&quot;https://trino.io/ecosystem/add-on.html#trino-lb&quot;&gt;trino-lb&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Other aspects we discuss include the following:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Performance considerations&lt;/li&gt;
  &lt;li&gt;Aspects of Trino on Kubernetes such as graceful shutdown,
PodDisruptionBudgets, and anti-affinity&lt;/li&gt;
  &lt;li&gt;Plans for next steps&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;other-resources&quot;&gt;Other resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/assets/episode/tcb64-stackable-opa-trino-lb.pdf&quot;&gt;Presentation slide deck&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;i class=&quot;fab fa-youtube watch-listen-icon&quot; title=&quot;Youtube&quot;&gt;&lt;/i&gt; Video for
&lt;a href=&quot;https://www.youtube.com/watch?v=fbqqapQbAv0&quot;&gt;Trino OPA Authorizer - Stackable and Bloomberg at Trino Summit
2023&lt;/a&gt; presented by Sönke from
Stackable and Pablo Arteaga from Bloomberg&lt;/li&gt;
  &lt;li&gt;&lt;i class=&quot;fab fa-github&quot; title=&quot;GitHub&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://github.com/stackabletech/trino-operator/tree/main/tests/templates/kuttl/opa-authorization/trino_rules&quot;&gt;Source code repo for
compatibility layer between Trino classic file-based access control JSON and
OPA/Trino&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;i class=&quot;fab fa-youtube watch-listen-icon&quot; title=&quot;Youtube&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=ATlq_l3WNiA&quot;&gt;Longer demo
video for row filtering and column
masking&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/07/11/trino-summit-2024-call-for-speakers.html&quot;&gt;Trino Summit 2024&lt;/a&gt;
is coming on the 11th and 12th of December, and registration, call for
speakers, and sponsorship opportunities are open.&lt;/li&gt;
  &lt;li&gt;Next &lt;a href=&quot;https://trino.io/broadcast/index.html&quot;&gt;Trino Community Broadcast 65&lt;/a&gt; about
the new Exasol connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>63: Querying with JS</title>
      <link href="https://trino.io/episodes/63.html" rel="alternate" type="text/html" title="63: Querying with JS" />
      <published>2024-08-01T00:00:00+00:00</published>
      <updated>2024-08-01T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/63</id>
      <content type="html" xml:base="https://trino.io/episodes/63.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of Trino
Community Leadership at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;,
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest&quot;&gt;Guest&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://www.linkedin.com/in/emilyasunaryo&quot;&gt;Emily Sunaryo&lt;/a&gt;, DevRel Intern at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;releases-and-news&quot;&gt;Releases and news&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-452.html&quot;&gt;Trino 452&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add Exasol connector.&lt;/li&gt;
  &lt;li&gt;Add support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;euclidean_distance()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dot_product()&lt;/code&gt;, and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cosine_distance()&lt;/code&gt; functions.&lt;/li&gt;
  &lt;li&gt;Add support for using the BigQuery Storage Read API when using the query table
function with the BigQuery connector.&lt;/li&gt;
  &lt;li&gt;Add &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query&lt;/code&gt; table function for full query pass-through to the ClickHouse
connector.&lt;/li&gt;
  &lt;li&gt;Numerous improvements to the Delta Lake, Hive, Hudi, and Iceberg connectors
and the related file system support in Trino.&lt;/li&gt;
&lt;/ul&gt;
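&lt;p&gt;As a quick illustration of what the new vector functions compute, here is a
small Python sketch assuming the standard mathematical definitions (this is an
assumption for intuition, not taken from the Trino implementation):&lt;/p&gt;

```python
# Sketch of the standard definitions the new SQL vector functions presumably
# follow (assumption for illustration, not the Trino source).
import math

def dot_product(a, b):
    # Sum of element-wise products.
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    # Straight-line distance between two vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    # 1 minus the cosine similarity of the two vectors.
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return 1.0 - dot_product(a, b) / (norm_a * norm_b)
```

&lt;p&gt;Identical directions give a cosine distance of 0 and orthogonal vectors give 1,
which is useful intuition when ranking embeddings in SQL.&lt;/p&gt;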

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-453.html&quot;&gt;Trino 453&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved performance for non-equality joins.&lt;/li&gt;
  &lt;li&gt;Support for setting the SQL path for JDBC driver and CLI.&lt;/li&gt;
  &lt;li&gt;New &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;execute&lt;/code&gt; procedure to run arbitrary statements in the underlying data source.&lt;/li&gt;
  &lt;li&gt;Support for reading &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pgvector&lt;/code&gt; vector types in PostgreSQL connector.&lt;/li&gt;
  &lt;li&gt;Support for views when using the Iceberg JDBC catalog.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As usual, numerous performance improvements, bug fixes, and other features
have been added as well.&lt;/p&gt;

&lt;p&gt;Other noteworthy topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The &lt;a href=&quot;https://trinodb.github.io/trino-gateway/release-notes/&quot;&gt;Trino Gateway 10&lt;/a&gt;
release is out, and includes some major refactoring and new features.&lt;/li&gt;
  &lt;li&gt;The &lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-25-jul-2024&quot;&gt;Trino Contributor Call&lt;/a&gt;
recap is available. Note that the file system support will soon switch to the
new Trino-native implementations by default.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest-emily-sunaryo&quot;&gt;Guest Emily Sunaryo&lt;/h2&gt;

&lt;p&gt;Emily Sunaryo is a recent UC Berkeley graduate working in the Developer
Relations team at Starburst. She has a passion for both technical development
and the enablement of developer communities. With her degree in Data Science,
she is also interested in learning more about modern approaches to data
analytics and how emerging technologies can drive innovation in this space.&lt;/p&gt;

&lt;h2 id=&quot;trino-clients&quot;&gt;Trino clients&lt;/h2&gt;

&lt;p&gt;Trino clients come in many shapes and forms, but all of them allow users to run
SQL queries in Trino and access the results. They all use the Trino client REST
API. To make it easier for developers of these applications, as well as any
custom application, we provide a number of drivers as language-specific
wrappers. These include the JDBC driver, the Python client, the Go client, and
others.&lt;/p&gt;
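&lt;p&gt;All of these clients follow the same basic pattern against the client REST API:
submit the statement, then follow the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nextUri&lt;/code&gt;
links until the query finishes. A minimal sketch of that loop in Python, with
injected HTTP callables so the protocol shape stays visible (a real client such
as trino-js-client also sends headers and handles authentication and errors):&lt;/p&gt;

```python
# Minimal sketch of the Trino client REST protocol loop. `http_post` and
# `http_get` are illustrative callables that return parsed JSON dicts; a real
# client also sends headers, handles auth, and inspects the query state.
def run_query(http_post, http_get, sql):
    rows = []
    response = http_post("/v1/statement", sql)  # submit the statement
    while True:
        rows.extend(response.get("data", []))   # collect any result rows
        next_uri = response.get("nextUri")
        if next_uri is None:                    # no nextUri: the query is done
            return rows
        response = http_get(next_uri)           # fetch the next result page
```

&lt;p&gt;Each language-specific driver is essentially a typed, ergonomic wrapper around
this polling loop.&lt;/p&gt;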

&lt;h2 id=&quot;javascript&quot;&gt;JavaScript&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;https://trino.io/assets/images/logos/javascript.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/regadas&quot;&gt;Filipe Regadas&lt;/a&gt; agreed to transfer his
&lt;a href=&quot;https://github.com/trinodb/trino-js-client&quot;&gt;trino-js-client&lt;/a&gt; project to
trinodb and is now the subproject maintainer. We are in the process of getting a
first release ready to ship. We would love for you to help us!&lt;/p&gt;

&lt;h2 id=&quot;learning-about-trino&quot;&gt;Learning about Trino&lt;/h2&gt;

&lt;p&gt;Emily’s journey brings it all together: from university and a Starburst
internship to the Trino Community Broadcast and a working demo web application.&lt;/p&gt;

&lt;h2 id=&quot;demo-time&quot;&gt;Demo time&lt;/h2&gt;

&lt;p&gt;Emily talks about her demo web application using React, npm, and various other
libraries and tools to build a data application. The data resides in Trino,
specifically in &lt;a href=&quot;https://www.starburst.io/platform/starburst-galaxy/&quot;&gt;Starburst
Galaxy&lt;/a&gt; to make the
management easier, and she uses the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino-js-client&lt;/code&gt; in her application to run
some pretty complex SQL queries against the NYC rideshare data set.&lt;/p&gt;

&lt;p&gt;Find more details in the
&lt;a href=&quot;https://github.com/emilysunaryo/trino-js-demo&quot;&gt;source code repository&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/07/11/trino-summit-2024-call-for-speakers.html&quot;&gt;Trino Summit 2024&lt;/a&gt;
is coming on the 11th and 12th of December, and registration, call for
speakers, and sponsorship opportunities are open.&lt;/li&gt;
  &lt;li&gt;Next &lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-22-aug-2024&quot;&gt;Trino Contributor Call&lt;/a&gt;
on the 22nd of August.&lt;/li&gt;
  &lt;li&gt;Next &lt;a href=&quot;https://trino.io/broadcast/index.html&quot;&gt;Trino Community Broadcast 64&lt;/a&gt; with
the &lt;a href=&quot;https://trino.io/users.html#stackable&quot;&gt;Stackable&lt;/a&gt; team about OPA on the 22nd
of August.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>62: A lakehouse that simply works at Prezi</title>
      <link href="https://trino.io/episodes/62.html" rel="alternate" type="text/html" title="62: A lakehouse that simply works at Prezi" />
      <published>2024-07-11T00:00:00+00:00</published>
      <updated>2024-07-11T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/62</id>
      <content type="html" xml:base="https://trino.io/episodes/62.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of Trino
Community Leadership at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;,
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest&quot;&gt;Guest&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://www.linkedin.com/in/vincenzo-cassaro/&quot;&gt;Vincenzo Cassaro&lt;/a&gt; -
&lt;a href=&quot;https://twitter.com/viciocassaro&quot;&gt;@viciocassaro&lt;/a&gt;, Data Engineer at
&lt;a href=&quot;https://prezi.com/&quot;&gt;Prezi&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;releases-and-news&quot;&gt;Releases and news&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-451.html&quot;&gt;Trino 451&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for configuring a proxy for the S3 native file system.&lt;/li&gt;
  &lt;li&gt;Add &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t_pdf&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t_cdf&lt;/code&gt; functions.&lt;/li&gt;
  &lt;li&gt;Improve performance of certain queries involving window functions.&lt;/li&gt;
  &lt;li&gt;Lots of Iceberg connector improvements including support for incremental
refresh for basic materialized views.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Other noteworthy topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/oneonestar&quot;&gt;Star Poon (oneonestar)&lt;/a&gt; approved as a new
subproject maintainer for &lt;a href=&quot;https://trinodb.github.io/trino-gateway/&quot;&gt;Trino Gateway&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/06/24/trino-fest-recap.html&quot;&gt;Recap blog post&lt;/a&gt; from Trino Fest
with video recordings and slides is now available.&lt;/li&gt;
  &lt;li&gt;Trino Contributor Congregation &lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-congregation-14-june-2024&quot;&gt;recap notes&lt;/a&gt; are also available.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techplay.jp/event/944074&quot;&gt;Trino Japan meetup&lt;/a&gt; happened on the 10th of July.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest-vincenzo-cassaro&quot;&gt;Guest Vincenzo Cassaro&lt;/h2&gt;

&lt;p&gt;Vincenzo has been working with data in all its forms, from data modeling to
analytics and ML, since he completed his masters in computer engineering in
Italy. He is joining us from there, more specifically from Sicily, to chat with
us about how he got into computers, learned about Trino, and ended up at Prezi
now.&lt;/p&gt;

&lt;h2 id=&quot;about-prezi&quot;&gt;About Prezi&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://prezi.com/&quot;&gt;Prezi&lt;/a&gt; probably doesn’t need any introduction, but just in
case: Prezi is a popular and powerful platform to create and show engaging
presentations, videos, and infographics.&lt;/p&gt;

&lt;h2 id=&quot;a-lakehouse-that-simply-works&quot;&gt;A Lakehouse that simply works&lt;/h2&gt;

&lt;p&gt;With so many different technologies and vendors making proposals, it’s easy to
lose track of what truly matters. We chat with Vincenzo Cassaro from Prezi about
how a simple combination of established, maintained, open source technologies
can make a lakehouse that truly works at the scale of a company with 150 million
users.&lt;/p&gt;

&lt;p&gt;Check out the &lt;a href=&quot;https://prezi.com/view/P4HYav74ficPkkTAHjXJ/&quot;&gt;Prezi slide deck for Vincenzo’s talk&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/07/11/trino-summit-2024-call-for-speakers.html&quot;&gt;Trino Summit 2024&lt;/a&gt; is coming on the 11th and 12th of December, and registration, call for
speakers, and sponsorship opportunities are open.&lt;/li&gt;
  &lt;li&gt;Next &lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-25-jul-2024&quot;&gt;Trino Contributor Call&lt;/a&gt; on the 25th of July.&lt;/li&gt;
  &lt;li&gt;Next Trino Community Broadcast on the 1st of August.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>61: Trino powers business intelligence</title>
      <link href="https://trino.io/episodes/61.html" rel="alternate" type="text/html" title="61: Trino powers business intelligence" />
      <published>2024-06-20T00:00:00+00:00</published>
      <updated>2024-06-20T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/61</id>
      <content type="html" xml:base="https://trino.io/episodes/61.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of Trino
Community Leadership at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;,
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest&quot;&gt;Guest&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/patrick-pichler/&quot;&gt;Patrick Pichler&lt;/a&gt;, Owner and
co-founder at &lt;a href=&quot;https://www.creativedata.io/&quot;&gt;Creative Data&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-and-news&quot;&gt;Releases and news&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-449.html&quot;&gt;Trino 449&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add OpenLineage event listener.&lt;/li&gt;
  &lt;li&gt;Add support for views when using the Iceberg REST catalog.&lt;/li&gt;
  &lt;li&gt;Improve write performance for Parquet files in the Hive, Iceberg, and Delta
Lake connectors.&lt;/li&gt;
  &lt;li&gt;Improve equality delete performance in Iceberg connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-450.html&quot;&gt;Trino 450&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improve performance for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;first_value()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;last_value()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date_trunc()&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date_add()&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date_diff()&lt;/code&gt; functions.&lt;/li&gt;
  &lt;li&gt;Add support for concurrent &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt; queries in Delta
Lake connector.&lt;/li&gt;
  &lt;li&gt;Add support for reading UniForm tables in Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Add support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TRUNCATE&lt;/code&gt; in the Iceberg and Memory connectors.&lt;/li&gt;
  &lt;li&gt;Automatically configure BigQuery scan parallelism.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;first-recap-from-trino-fest-2024&quot;&gt;First recap from Trino Fest 2024&lt;/h2&gt;

&lt;p&gt;Cole and Manfred chat a bit about Trino Fest last week, mentioning that &lt;a href=&quot;https://www.youtube.com/playlist?list=PLFnr63che7waExsD4lWarA3ML4R2HH58A&quot;&gt;all
videos are now available&lt;/a&gt;,
and a blog post with slides and more material is coming as well.&lt;/p&gt;

&lt;h2 id=&quot;impression-from-trino-contributor-congregation&quot;&gt;Impression from Trino Contributor Congregation&lt;/h2&gt;

&lt;p&gt;Manfred and Dain led the discussions in the congregation. We are excited about
many of the follow-ups for the project, and about the increased collaboration and
innovation.&lt;/p&gt;

&lt;h2 id=&quot;guest-patrick-pichler&quot;&gt;Guest Patrick Pichler&lt;/h2&gt;

&lt;p&gt;Patrick specializes in providing guidance, designing, and implementing
sustainable data, analytics and AI solutions utilizing open architectures at
Creative Data. He has a long history of working in the data and data platform
space as user, developer, administrator, manager, consultant, and educator.&lt;/p&gt;

&lt;h2 id=&quot;powerbi-overview&quot;&gt;PowerBI overview&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://powerbi.microsoft.com/&quot;&gt;Power BI&lt;/a&gt; is an interactive data visualization
software product suite developed by Microsoft with a primary focus on business
intelligence. We talk about the different available products and features, and
their usage in the community.&lt;/p&gt;

&lt;h2 id=&quot;trino-client-support-options-for-power-bi&quot;&gt;Trino client support options for Power BI&lt;/h2&gt;

&lt;p&gt;Typically, Power BI relies on ODBC drivers for connecting to specific data
sources. Since there is no open source Trino ODBC driver however, Patrick and
other clever developers have created a &lt;a href=&quot;https://github.com/CreativeDataEU/PowerBITrinoConnector&quot;&gt;Power BI
client&lt;/a&gt; that connects
to Trino directly via the client REST API - the
&lt;a href=&quot;https://github.com/CreativeDataEU/PowerBITrinoConnector&quot;&gt;PowerBITrinoConnector&lt;/a&gt;.
We discuss the details and limitations of both approaches, look at the source
code, and learn about import and direct query modes.&lt;/p&gt;

&lt;h2 id=&quot;demo&quot;&gt;Demo&lt;/h2&gt;

&lt;p&gt;Patrick showcases how to install and use the connector in his demo of Trino and
Power BI.&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://www.starburst.io/info/trino-summit-2024/?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=NORAM-FY25-Q4-CM-Trino-Summit-2024-IMC-Upgrade&amp;amp;utm_content=Trino-Fest-Blog-Recap&quot;&gt;Trino Summit 2024&lt;/a&gt;
is coming on the 11th and 12th of December, and registration is open now.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>60: Trino calling AI</title>
      <link href="https://trino.io/episodes/60.html" rel="alternate" type="text/html" title="60: Trino calling AI" />
      <published>2024-05-22T00:00:00+00:00</published>
      <updated>2024-05-22T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/60</id>
      <content type="html" xml:base="https://trino.io/episodes/60.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of Trino
Community Leadership at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;,
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest&quot;&gt;Guest&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/isainalcik/&quot;&gt;Isa Inalcik&lt;/a&gt;, Principal Data
Engineer at &lt;a href=&quot;https://bestsecret.com/&quot;&gt;BestSecret Group&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-and-news&quot;&gt;Releases and news&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-446.html&quot;&gt;Trino 446&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for the Snowflake catalog in the Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Add support for reading S3 objects restored from Glacier storage in the Hive
connector.&lt;/li&gt;
  &lt;li&gt;Add support for unsupported type handling configuration in the Snowflake
connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-447.html&quot;&gt;Trino 447&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW CREATE FUNCTION&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Require Java 22.&lt;/li&gt;
  &lt;li&gt;Add support for concurrent &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TRUNCATE&lt;/code&gt; in the Delta Lake
connector.&lt;/li&gt;
  &lt;li&gt;Remove support for Phoenix 5.1.x and earlier.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-448.html&quot;&gt;Trino 448&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improve performance of reading from Parquet files.&lt;/li&gt;
  &lt;li&gt;Add support for caching Glue metadata with the update to use the V2 REST
interface.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trinodb.github.io/trino-gateway/release-notes/&quot;&gt;Trino Gateway 8 and 9&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for configurable router policies with two new policies available.&lt;/li&gt;
  &lt;li&gt;Add a Helm chart for deployment.&lt;/li&gt;
  &lt;li&gt;Add new website.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We also had a new Trino Helm chart release 0.20.0.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/nineinchnick&quot;&gt;Jan Waś&lt;/a&gt; is now also
&lt;a href=&quot;https://trino.io/development/roles#subproject-maintainers&quot;&gt;subproject maintainer&lt;/a&gt; of the
&lt;a href=&quot;https://github.com/trinodb/trino-go-client&quot;&gt;go client&lt;/a&gt; and the
&lt;a href=&quot;https://github.com/trinodb/charts&quot;&gt;Helm charts&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;impressions-from-the-iceberg-summit&quot;&gt;Impressions from the Iceberg Summit&lt;/h2&gt;

&lt;p&gt;Last week, Cole attended the &lt;a href=&quot;https://iceberg-summit.org/&quot;&gt;Iceberg Summit&lt;/a&gt; with
a special Trino perspective, and we chat about his impressions and major
takeaways.&lt;/p&gt;

&lt;h2 id=&quot;guest-isa-inalcik-from-bestsecret&quot;&gt;Guest Isa Inalcik from BestSecret&lt;/h2&gt;

&lt;p&gt;Isa is a highly skilled data expert with over a decade of hands-on experience in
the software development lifecycle. He is well versed in many data tools, including
Trino/Starburst Enterprise Platform, Snowflake, Airflow, Apache Spark, Hive,
Apache Iceberg, dbt, and others.&lt;/p&gt;

&lt;h2 id=&quot;trino-at-bestsecret&quot;&gt;Trino at BestSecret&lt;/h2&gt;

&lt;p&gt;At BestSecret, a leading online retailer for fashion and lifestyle in Europe,
Isa spearheads the development of efficient and resilient ELT/ETL pipelines and
the implementation of data and AI-driven solutions. We chat in more detail
about their setup and use cases, his solutions, and the challenges he is facing.&lt;/p&gt;

&lt;h2 id=&quot;generative-ai-interest-and-use-cases&quot;&gt;Generative AI interest and use cases&lt;/h2&gt;

&lt;p&gt;Isa has been following the waves of interest in AI and sees the following use
cases related to data and Trino:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Media (audio, video, images): Extract information out of media such as images.&lt;/li&gt;
  &lt;li&gt;Object categorization: Categorize objects in images and videos.&lt;/li&gt;
  &lt;li&gt;Data masking: For anonymizing sensitive data from unstructured text.&lt;/li&gt;
  &lt;li&gt;Data extraction: To pull structured information from unstructured text.&lt;/li&gt;
  &lt;li&gt;Sentiment analysis: For gauging the sentiment of textual data.&lt;/li&gt;
  &lt;li&gt;Language detection or translation: For detecting the language of a text or translating it.&lt;/li&gt;
  &lt;li&gt;Summarization: To generate concise summaries from lengthy texts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This inspired him to try an integration of the new emerging LLMs with Trino.&lt;/p&gt;

&lt;h2 id=&quot;trino-spi&quot;&gt;Trino SPI&lt;/h2&gt;

&lt;p&gt;Trino uses a service provider interface (SPI) to allow developers to create
plugins for features such as connectors, security integrations, and custom
functions. This is crucial for businesses to implement required functionality,
and it enabled Isa to work on a plugin that supports custom functions calling LLMs.&lt;/p&gt;

&lt;p&gt;The OpenAI API specification also allowed him to create one function that can be
used with different LLM backends.&lt;/p&gt;
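&lt;p&gt;Because many LLM servers expose an OpenAI-compatible chat completions endpoint,
one function can target different backends simply by swapping the base URL. A
hypothetical sketch of the request shape (the function name and prompt wrapping
are illustrative, not Isa’s actual implementation):&lt;/p&gt;

```python
# Illustrative sketch: building an OpenAI-style chat completion request so the
# same function can talk to any OpenAI-compatible backend (for example a local
# Ollama server or a hosted API) just by changing base_url. Names are hypothetical.
def build_prompt_request(base_url, model, task, text):
    return {
        "url": base_url.rstrip("/") + "/v1/chat/completions",
        "payload": {
            "model": model,
            "messages": [
                {"role": "system", "content": task},  # e.g. "Summarize the input."
                {"role": "user", "content": text},    # the column value from SQL
            ],
        },
    }
```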

&lt;h2 id=&quot;proof-of-concept-and-demo&quot;&gt;Proof of concept and demo&lt;/h2&gt;

&lt;p&gt;We look at the concept and implementation that Isa developed with the following
architecture:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/episode/60/trino-ai-architecture.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Isa’s &lt;a href=&quot;https://github.com/alaturqua/trino-ai&quot;&gt;trino-ai repository&lt;/a&gt; contains
source code and more details as mentioned in his post on
&lt;a href=&quot;https://www.linkedin.com/posts/isainalcik_trino-trino-llama3-activity-7187411736587587584-e2WW/&quot;&gt;LinkedIn&lt;/a&gt;
and used in the demo.&lt;/p&gt;

&lt;h2 id=&quot;other-resources&quot;&gt;Other resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Post from Isa: &lt;a href=&quot;https://www.linkedin.com/pulse/maximize-performance-secret-scaling-trino-clusters-isa-inalcik-ffo5e/&quot;&gt;Maximize Performance: The Secret to Scaling Trino Clusters with KEDA&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Post from Isa: &lt;a href=&quot;https://www.linkedin.com/pulse/enhancing-security-observability-trino-open-policy-agent-isa-inalcik-zhl9e&quot;&gt;Enhancing Security and Observability in Trino with Open Policy Agent and OpenTelemetry&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://ollama.com/&quot;&gt;Ollama&lt;/a&gt;, the system used to run LLMs in the demo&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/develop.html&quot;&gt;Trino SPI documentation&lt;/a&gt;, including
&lt;a href=&quot;https://trino.io/docs/current/develop/functions.html&quot;&gt;custom function creation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Trino Fest news:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/05/08/trino-fest-lineup-finalized.html&quot;&gt;Finalized speaker lineup announced&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.starburst.io/info/trino-fest-2024/?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=Global-FY25-Q2-EV-Trino-Fest-2024&amp;amp;utm_content=banner&quot;&gt;Register for event and hotel now&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Special thanks to our Trino Fest sponsors: Starburst as event host, and
Alluxio, Cloudinary, Onehouse, StarTree, and Upsolver as event sponsors.&lt;/li&gt;
  &lt;li&gt;Contact us to join the Trino Contributor Congregation the next day.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Other topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino Contributor Call on the 23rd of May.&lt;/li&gt;
  &lt;li&gt;Check out upcoming &lt;a href=&quot;https://trino.io/community.html#events&quot;&gt;Trino Community Broadcast episodes and other events&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>59: Querying Trino with Java and jOOQ</title>
      <link href="https://trino.io/episodes/59.html" rel="alternate" type="text/html" title="59: Querying Trino with Java and jOOQ" />
      <published>2024-04-24T00:00:00+00:00</published>
      <updated>2024-04-24T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/59</id>
      <content type="html" xml:base="https://trino.io/episodes/59.html">&lt;h2 id=&quot;host&quot;&gt;Host&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of Trino
Community Leadership at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;,
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest&quot;&gt;Guest&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Lukas Eder, Creator of &lt;a href=&quot;https://www.jooq.org/&quot;&gt;jOOQ&lt;/a&gt;,
(&lt;a href=&quot;https://twitter.com/lukaseder&quot;&gt;@lukaseder&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;trino-releases&quot;&gt;Trino releases&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-445.html&quot;&gt;Trino 445&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for time travel queries with the Delta Lake connector.&lt;/li&gt;
  &lt;li&gt;Add support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;REPLACE&lt;/code&gt; modifier as part of a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE TABLE&lt;/code&gt; statement
with the Delta Lake connector.&lt;/li&gt;
  &lt;li&gt;Add support for writing Bloom filters in Parquet files with the Hive connector.&lt;/li&gt;
  &lt;li&gt;Add support for dynamic filtering to the MongoDB connector.&lt;/li&gt;
  &lt;li&gt;Expand support for function pushdown in the Snowflake connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;lukas-eder-and-data-geekery&quot;&gt;Lukas Eder and Data Geekery&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://twitter.com/lukaseder&quot;&gt;Lukas&lt;/a&gt; is recognized as a Java Champion and
well-known as a very active member of the Java community. We chat about his
history and involvement in the Java community and related open source
projects, and how it led to &lt;a href=&quot;https://www.jooq.org/&quot;&gt;jOOQ and his company Data
Geekery&lt;/a&gt;. Lukas also briefly talks about other products.&lt;/p&gt;

&lt;h2 id=&quot;jooq&quot;&gt;jOOQ&lt;/h2&gt;

&lt;p&gt;jOOQ stands for jOOQ Object Oriented Querying. It generates Java code
from your database schema, and lets you build type-safe SQL queries through its
fluent API.&lt;/p&gt;
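
&lt;p&gt;As a minimal sketch of the fluent API, assuming jOOQ 3.19 or newer and a
Trino cluster reachable over JDBC, a query can look like the following. The
catalog, table, and column names are illustrative:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import java.sql.Connection;
import java.sql.DriverManager;

import org.jooq.DSLContext;
import org.jooq.SQLDialect;
import org.jooq.impl.DSL;

// Connect to Trino over JDBC and build a type-safe query with the jOOQ DSL.
try (Connection connection =
        DriverManager.getConnection(&quot;jdbc:trino://localhost:8080/tpch&quot;, &quot;admin&quot;, null)) {
    DSLContext create = DSL.using(connection, SQLDialect.TRINO);
    create.select(DSL.field(&quot;name&quot;), DSL.field(&quot;nationkey&quot;))
          .from(DSL.table(&quot;tiny.nation&quot;))
          .orderBy(DSL.field(&quot;name&quot;))
          .fetch()
          .forEach(System.out::println);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With code generation, the string-based &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DSL.field&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DSL.table&lt;/code&gt; calls are replaced by generated, compile-time checked table
and column references.&lt;/p&gt;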

&lt;p&gt;All editions of jOOQ since the 3.19 release include support for Trino. The
level of support depends on the catalog and connector in use, and further
Trino-specific enhancements are in progress.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/ecosystem/add-on.html#jooq&quot;&gt;
  &lt;img src=&quot;https://trino.io/assets/images/logos/jooq.png&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In our conversation and demo session with Lukas, we cover all the following
aspects and a few other topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;What is jOOQ?&lt;/li&gt;
  &lt;li&gt;What motivated the creation of jOOQ?&lt;/li&gt;
  &lt;li&gt;Discuss the great reasons for using jOOQ:
    &lt;ul&gt;
      &lt;li&gt;Database first&lt;/li&gt;
      &lt;li&gt;Typesafe SQL&lt;/li&gt;
      &lt;li&gt;Code generation&lt;/li&gt;
      &lt;li&gt;Active records&lt;/li&gt;
      &lt;li&gt;Multi-tenancy&lt;/li&gt;
      &lt;li&gt;Standardization&lt;/li&gt;
      &lt;li&gt;Query lifecycle&lt;/li&gt;
      &lt;li&gt;Procedures&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;How does it compare to ORM systems like &lt;a href=&quot;https://hibernate.org/&quot;&gt;Hibernate&lt;/a&gt; or
others like the old &lt;a href=&quot;https://blog.mybatis.org/&quot;&gt;MyBatis&lt;/a&gt;?&lt;/li&gt;
  &lt;li&gt;What databases are supported by jOOQ and commonly used?&lt;/li&gt;
  &lt;li&gt;Chat about some customer use cases.&lt;/li&gt;
  &lt;li&gt;Supported and required Java versions, fun with upgrades, and experience from customers.&lt;/li&gt;
  &lt;li&gt;How Lukas discovered Trino and decided to add support for it.&lt;/li&gt;
  &lt;li&gt;Challenges and interesting aspects of supporting different databases&lt;/li&gt;
  &lt;li&gt;What is next for jOOQ in general, and Trino support specifically?&lt;/li&gt;
  &lt;li&gt;Cool SQL features in Trino that might be suitable for standardization:
    &lt;ul&gt;
      &lt;li&gt;Higher order functions, partially &lt;a href=&quot;https://www.jooq.org/doc/dev/manual/sql-building/column-expressions/array-functions/&quot;&gt;already supported in jOOQ&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;Integration of object-relational database features, such as nested
collections with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ARRAY&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIST&lt;/code&gt;.&lt;/li&gt;
      &lt;li&gt;Potential introduction of new concepts to SQL, such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MAP&lt;/code&gt;.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Complexities from Trino having different catalogs and connectors, and the
catalog, schema, table hierarchy.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;jOOQ resources and further information:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.jooq.org/&quot;&gt;Website&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://groups.google.com/g/jooq-user&quot;&gt;User group mailing list&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.jooq.org/learn/&quot;&gt;Documentation and other learning resources&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/jOOQ/jOOQ&quot;&gt;Source code&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/jOOQ/jOOQ/tree/main/jOOQ-examples&quot;&gt;Example projects&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/JavaOOQ&quot;&gt;jOOQ on X&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Trino Fest news:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/04/15/trino-fest-2024-approaches.html&quot;&gt;Great speaker lineup&lt;/a&gt; announced&lt;/li&gt;
  &lt;li&gt;More to come&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.starburst.io/info/trino-fest-2024/?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=Global-FY25-Q2-EV-Trino-Fest-2024&amp;amp;utm_content=banner&quot;&gt;Register for event and hotel now&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Contact us to join the Trino Contributor Congregation the next day&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Other news and events:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Manfred’s recap of Open Source Summit NA and Data Engineer Things meeting in Seattle.&lt;/li&gt;
  &lt;li&gt;Trino Contributor Call right after the episode.&lt;/li&gt;
  &lt;li&gt;Contact us to be a guest in upcoming &lt;a href=&quot;https://trino.io/broadcast/index.html&quot;&gt;Trino Community
Broadcast&lt;/a&gt; episodes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Host</summary>

      
      
    </entry>
  
    <entry>
      <title>58: Understanding your users with Trino and Mitzu</title>
      <link href="https://trino.io/episodes/58.html" rel="alternate" type="text/html" title="58: Understanding your users with Trino and Mitzu" />
      <published>2024-04-04T00:00:00+00:00</published>
      <updated>2024-04-04T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/58</id>
      <content type="html" xml:base="https://trino.io/episodes/58.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of Trino
Community Leadership at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;,
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/imeszaros/&quot;&gt;István Mészáros&lt;/a&gt;, Founder and CEO of
&lt;a href=&quot;https://www.mitzu.io/&quot;&gt;Mitzu&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;trino-releases&quot;&gt;Trino releases&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-442.html&quot;&gt;Trino 442&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for configuring AWS deployment type in OpenSearch connector.&lt;/li&gt;
  &lt;li&gt;Fix a regression from 440 in Iceberg connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-443.html&quot;&gt;Trino 443&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Ensure all files are deleted when native S3 file system support is enabled,
and some other object storage connector improvements.&lt;/li&gt;
  &lt;li&gt;Add support for a custom authorization header name in Prometheus connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-444.html&quot;&gt;Trino 444&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Update Docker image to use Java 22 for runtime.&lt;/li&gt;
  &lt;li&gt;Numerous performance improvements for the Snowflake connector.&lt;/li&gt;
  &lt;li&gt;Add support for reading &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BYTE_STREAM_SPLIT&lt;/code&gt; encoding in Parquet files.&lt;/li&gt;
  &lt;li&gt;Add support for canned access control lists with the native S3 file system.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;other-trino-news&quot;&gt;Other Trino news&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino-gateway/blob/main/docs/release-notes.md#trino-gateway-7-21--mar-2024&quot;&gt;Trino Gateway
7&lt;/a&gt;
shipped with a new user interface thanks to a contribution from our new
&lt;a href=&quot;https://www.starburst.io/community/trino-champions/#peng-wei&quot;&gt;Starburst Trino champion Peng
Wei&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;The continuous integration and build setup with Apache Maven improved a lot
thanks to our collaboration with the new &lt;a href=&quot;https://www.starburst.io/community/trino-champions/#tamas-cservenak&quot;&gt;Starburst Trino
champion Tamas Cservenak&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Trino Contributor Call &lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-21-mar-2024&quot;&gt;recap is now
available&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;mitzu&quot;&gt;Mitzu&lt;/h2&gt;

&lt;p&gt;Mitzu is a warehouse-native product analytics platform that revolutionizes how
companies leverage their product usage data in the data lake.&lt;/p&gt;

&lt;p&gt;By directly connecting to Trino, Mitzu eliminates the need for traditional
reverse ETL processes to third-party applications such as Amplitude or Mixpanel.
Mitzu enables real-time, self-served product analytics on top of the existing
data infrastructure with generated SQL queries.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/ecosystem/client.html#mitzu&quot;&gt;
  &lt;img src=&quot;https://trino.io/assets/images/logos/mitzu.png&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In our conversation and demo session with István, we cover all the following
aspects and a few other topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;What is product analytics?&lt;/li&gt;
  &lt;li&gt;Discuss some key terms, such as segmentation, funnels, and retention, and
what insights and benefits become available.&lt;/li&gt;
  &lt;li&gt;What are some example use cases?&lt;/li&gt;
  &lt;li&gt;What kind of products can be analyzed?&lt;/li&gt;
  &lt;li&gt;Use of Mitzu for marketing.&lt;/li&gt;
  &lt;li&gt;What other product analytics tools exist, and what sets Mitzu apart?&lt;/li&gt;
  &lt;li&gt;How is Trino involved to make Mitzu warehouse-native?&lt;/li&gt;
  &lt;li&gt;What are the advantages of being warehouse-native? What does that mean?&lt;/li&gt;
  &lt;li&gt;Compare with Mitzu on other data platforms.&lt;/li&gt;
  &lt;li&gt;Implementation details of the Mitzu and Trino integration, such as connectors,
security, and client libraries&lt;/li&gt;
  &lt;li&gt;How to use Mitzu in terms of deployment and configuration.&lt;/li&gt;
  &lt;li&gt;Cool features of Mitzu.&lt;/li&gt;
  &lt;li&gt;Practical experience and customers.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Trino Fest news:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Speakers are selected, with contact and announcement coming soon&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/02/20/announcing-trino-fest-2024.html&quot;&gt;Register now&lt;/a&gt;, and book
travel and hotel.&lt;/li&gt;
  &lt;li&gt;Contact us to join the Trino Contributor Congregation the next day&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Other news and events:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Manfred will attend &lt;a href=&quot;https://events.linuxfoundation.org/open-source-summit-north-america/&quot;&gt;Open Source Summit
NA&lt;/a&gt;, and
present a Big Data Whirlwind Tour at the &lt;a href=&quot;https://www.meetup.com/data-engineer-things-seattle-meetup/events/300067664/&quot;&gt;inaugural Data Engineer Things
meeting&lt;/a&gt;
in Seattle.&lt;/li&gt;
  &lt;li&gt;Trino Contributor Call is now planned as monthly event with video recordings.&lt;/li&gt;
  &lt;li&gt;Check out the upcoming &lt;a href=&quot;https://trino.io/broadcast/index.html&quot;&gt;Trino Community
Broadcast&lt;/a&gt; episode about jOOQ.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from O’Reilly.
You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>57: Seeing clearly with OpenTelemetry</title>
      <link href="https://trino.io/episodes/57.html" rel="alternate" type="text/html" title="57: Seeing clearly with OpenTelemetry" />
      <published>2024-03-14T00:00:00+00:00</published>
      <updated>2024-03-14T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/57</id>
      <content type="html" xml:base="https://trino.io/episodes/57.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of Trino
Community Leadership at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;,
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/electrum/&quot;&gt;David Phillips&lt;/a&gt;, co-creator of Trino
and CTO at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/jmstephenson/&quot;&gt;Matt Stephenson&lt;/a&gt;, Senior Principal
Software Engineer at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;trino-releases&quot;&gt;Trino releases&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-440.html&quot;&gt;Trino 440&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New Snowflake connector&lt;/li&gt;
  &lt;li&gt;Support for sub-queries inside &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNNEST&lt;/code&gt; clauses&lt;/li&gt;
  &lt;li&gt;Support for row filtering and column masking with Open Policy Agent&lt;/li&gt;
  &lt;li&gt;Improved latency when filesystem caching is enabled in Delta and Iceberg connectors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-441.html&quot;&gt;Trino 441&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Remove the default &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;legacy&lt;/code&gt; mode for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive.security&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There is a regression for Iceberg in this release, so consider waiting for 442.
(Update: &lt;a href=&quot;https://trino.io/docs/current/release/release-442.html&quot;&gt;Trino 442&lt;/a&gt; is released.)&lt;/p&gt;

&lt;h2 id=&quot;other-trino-news&quot;&gt;Other Trino news&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/20980&quot;&gt;Java 22 is coming to Trino&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;David Phillips appointed as dedicated &lt;a href=&quot;https://trino.io/development/roles.html#file-system-lead&quot;&gt;file system lead&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-21-mar-2024&quot;&gt;Trino Contributor Call&lt;/a&gt; on the 21st of March&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/02/27/the-definitive-guide-2-jp.html&quot;&gt;Japanese edition of Trino: The Definitive Guide is out&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;opentelemetry&quot;&gt;OpenTelemetry&lt;/h2&gt;

&lt;p&gt;OpenTelemetry is a widely-used collection of APIs, SDKs, and tools that
instrument, generate, collect, and export telemetry data such as metrics, logs,
and traces to help you analyze application performance and behavior.&lt;/p&gt;
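
&lt;p&gt;In Trino itself, enabling trace export comes down to a couple of
configuration properties. This minimal sketch assumes an OTLP-compatible
collector, such as Jaeger, listening locally on the default port:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# etc/config.properties
tracing.enabled=true
# OTLP endpoint of the collector that receives the spans
tracing.exporter.endpoint=http://localhost:4317
&lt;/code&gt;&lt;/pre&gt;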

&lt;p&gt;&lt;a href=&quot;https://trino.io/ecosystem/add-on.html#opentelemetry&quot;&gt;
  &lt;img src=&quot;https://trino.io/assets/images/logos/opentelemetry.png&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In our conversation with Matt and David we cover all the following aspects, and
a few other topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;What is &lt;a href=&quot;https://trino.io/ecosystem/add-on#opentelemetry&quot;&gt;OpenTelemetry&lt;/a&gt;?&lt;/li&gt;
  &lt;li&gt;Some basic concepts like &lt;a href=&quot;https://opentelemetry.io/docs/concepts/observability-primer/&quot;&gt;logs, spans, traces&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;How is this related to JMX, system data, and other monitoring?&lt;/li&gt;
  &lt;li&gt;What is &lt;a href=&quot;https://openmetrics.io/&quot;&gt;OpenMetrics&lt;/a&gt;? How is it related to
&lt;a href=&quot;https://trino.io/ecosystem/data-source.html#prometheus&quot;&gt;Prometheus&lt;/a&gt;?&lt;/li&gt;
  &lt;li&gt;What tools can you use with OpenTelemetry? Jaeger, Datadog, …&lt;/li&gt;
  &lt;li&gt;Reasoning to add OpenTelemetry to Trino&lt;/li&gt;
  &lt;li&gt;Implementation details&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/admin/opentelemetry.html&quot;&gt;Trino documentation&lt;/a&gt; with
local example usage with Docker containers for Trino and Jaeger&lt;/li&gt;
  &lt;li&gt;Practical experience&lt;/li&gt;
  &lt;li&gt;Demo of real world usage with Starburst Galaxy and Datadog&lt;/li&gt;
  &lt;li&gt;Bonus topic - JSON-format logging via TCP socket&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;/blog/2024/02/20/announcing-trino-fest-2024.html&quot;&gt;Trino Fest 2024 and Trino Contributor Congregation&lt;/a&gt; are happening in June in Boston.
Submit your speaker proposals now, and register for the free event as soon as
you can, especially for live attendance.&lt;/p&gt;

&lt;p&gt;Check out the upcoming &lt;a href=&quot;https://trino.io/broadcast/index.html&quot;&gt;Trino Community
Broadcast&lt;/a&gt; episodes about Mitzu and jOOQ.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from O’Reilly.
You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt; online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>56: The vast possibilities of VAST and Trino</title>
      <link href="https://trino.io/episodes/56.html" rel="alternate" type="text/html" title="56: The vast possibilities of VAST and Trino" />
      <published>2024-02-22T00:00:00+00:00</published>
      <updated>2024-02-22T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/56</id>
      <content type="html" xml:base="https://trino.io/episodes/56.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Manfred Moser, Director of Trino Community Leadership at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://linkedin.com/in/colleen-tartow-phd&quot;&gt;Colleen Tartow&lt;/a&gt;, Field CTO and
Head of Strategy at &lt;a href=&quot;https://vastdata.com/&quot;&gt;VAST Data&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/roman-zeyde/&quot;&gt;Roman Zeyde&lt;/a&gt;, Senior Software
Engineer at &lt;a href=&quot;https://vastdata.com/&quot;&gt;VAST Data&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;release-439&quot;&gt;Release 439&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-439.html&quot;&gt;Trino 439&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New caching layer for Delta Lake, Hive, and Iceberg!&lt;/li&gt;
  &lt;li&gt;Documentation for new native file system support.&lt;/li&gt;
  &lt;li&gt;Fix for setting session properties on catalogs with a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.&lt;/code&gt; in the name.&lt;/li&gt;
  &lt;li&gt;Fix for reading Snappy data.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;trino-gateway-6&quot;&gt;Trino Gateway 6&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Docker container setup!&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-episode-the-vast-database-and-data-platform&quot;&gt;Concept of the episode: The VAST database and data platform&lt;/h2&gt;

&lt;p&gt;Part database, part data warehouse, part data lake, describing
&lt;a href=&quot;https://vastdata.com/&quot;&gt;VAST&lt;/a&gt; in one sentence is not the easiest undertaking.
You can talk about features like deep write buffers with underlying flash
columnar storage, the automatic contextual layer added on top of the data, or
the similarity-based global compression that more than makes up for the smaller
columnar chunks and makes it so much faster to find exactly the data you’re
looking for.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/ecosystem/data-source.html#vast&quot;&gt;
  &lt;img src=&quot;https://trino.io/assets/images/logos/vast.png&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So what is VAST? It’s a state-of-the-art data platform. Why are we talking about
it on the Trino Community Broadcast? A world-class data storage solution still
needs a world-class query engine, and its speed paired with Trino’s makes for a
brilliant combination. We’re diving into how it works, why it is designed the
way it is, and the really cool &lt;a href=&quot;https://vastdata.com/database#performance-comparison&quot;&gt;performance
comparison&lt;/a&gt; on
their website showcasing Trino as their favorite query engine.&lt;/p&gt;

&lt;p&gt;Check out our conversation about the VAST database, the VAST data platform, the
Trino connector, internal workings of the system, use cases, customers, and much
more in the interview.&lt;/p&gt;

&lt;p&gt;Also have a look at the &lt;a href=&quot;https://www.youtube.com/watch?v=RutbCY8i22Q&quot;&gt;presentation from Jason Russler about VAST from Trino
Summit 2023&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;/blog/2024/02/20/announcing-trino-fest-2024.html&quot;&gt;Trino Fest 2024 has been announced&lt;/a&gt; for this summer in Boston! Make sure
to check out the announcement blog post and register to attend, submit your
talks, or contact Starburst for information on sponsoring!&lt;/p&gt;

&lt;p&gt;Check out the upcoming Trino Community Broadcast episodes about OpenTelemetry
and Mitzu.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>55: Commander Bun Bun peeks at Peaka</title>
      <link href="https://trino.io/episodes/55.html" rel="alternate" type="text/html" title="55: Commander Bun Bun peeks at Peaka" />
      <published>2024-01-18T00:00:00+00:00</published>
      <updated>2024-01-18T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/55</id>
      <content type="html" xml:base="https://trino.io/episodes/55.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Manfred Moser, Director of Technical Content at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://linkedin.com/in/sakalsiz&quot;&gt;Mustafa Sakalsiz&lt;/a&gt;, CEO at
&lt;a href=&quot;https://www.peaka.com/&quot;&gt;Peaka&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/alitekin/&quot;&gt;Ali Tekin&lt;/a&gt;, Principal Software
Architect at &lt;a href=&quot;https://www.peaka.com/&quot;&gt;Peaka&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-437-438&quot;&gt;Releases 437-438&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-437.html&quot;&gt;Trino 437&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for configuring compression codecs&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;char&lt;/code&gt; values in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;to_utf8()&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lpad()&lt;/code&gt; functions&lt;/li&gt;
  &lt;li&gt;Improved performance for Delta Lake queries without table statistics&lt;/li&gt;
  &lt;li&gt;Improved performance for Iceberg queries with filters on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROW&lt;/code&gt; columns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-438.html&quot;&gt;Trino 438&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for access control with &lt;a href=&quot;https://trino.io/blog/2024/02/06/opa-arrived&quot;&gt;Open Policy Agent&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER COLUMN ... DROP NOT NULL&lt;/code&gt; in Iceberg and PostgreSQL&lt;/li&gt;
  &lt;li&gt;Support for configuring page sizes in Delta Lake, Hive, and Iceberg&lt;/li&gt;
  &lt;li&gt;Better type support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;reduce_agg()&lt;/code&gt; function&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And over in the land of the Trino Gateway…&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino-gateway/blob/main/docs/release-notes.md#trino-gateway-5-24-jan-2024&quot;&gt;Trino Gateway version 5&lt;/a&gt;
released!&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-episode-peaka&quot;&gt;Concept of the episode: Peaka&lt;/h2&gt;

&lt;p&gt;Another Trino Community Broadcast episode means another cool piece of technology
that uses Trino for us to show off to the community. This time it’s Peaka,
a no-code approach to data warehousing that makes it easier than ever to set up
your data stack without needing a ton of complex engineering.&lt;/p&gt;

&lt;p&gt;In &lt;a href=&quot;https://www.peaka.com/docs/getting-started/what-is-peaka/&quot;&gt;their own words&lt;/a&gt;,
Peaka is a platform that merges disparate data sources into a single data layer,
letting you join and blend them, query them using SQL or natural language, and 
expose your data to outside users through APIs. Sounds a bit like Trino, right?
That’s because underneath the hood, Trino is a key part of how they’re making it
happen. In this episode, we talk to the team at Peaka about where they got
started, how they’re making it easier than ever to leverage the federation that
Trino is capable of, and the work they’ve done on top to integrate their
platform with every SaaS data source under the sun.&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-episode-using-peaka&quot;&gt;Demo of the episode: Using Peaka!&lt;/h2&gt;

&lt;p&gt;If you want to see what the platform is like, then look no further. We’ll be
exploring:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Connecting to data sources&lt;/li&gt;
  &lt;li&gt;Filtering and combining data&lt;/li&gt;
  &lt;li&gt;Editing and running queries, including their visual query editor&lt;/li&gt;
  &lt;li&gt;Natural language queries&lt;/li&gt;
  &lt;li&gt;Visualizing data&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;pr-of-the-episode-18719-filesystem-caching-with-alluxio&quot;&gt;PR of the episode: #18719: Filesystem caching with Alluxio&lt;/h2&gt;

&lt;p&gt;Perhaps it’s a little easier to link to the issue tracking
&lt;a href=&quot;https://github.com/trinodb/trino/issues/20550&quot;&gt;the rollout&lt;/a&gt;, but however you
want to present it, caching is back in Trino! Caching is a huge performance win
for a wide variety of use cases, because the engine can skip repeated reads from
object storage and return query results much faster. This is going to improve
performance for Trino queries using the supported object storage connectors,
and you’ll hear more from us once it’s officially launched. The best part is
that there’s even more coming down the line as support for it is expanded.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>54: Trino 2023 wrapped</title>
      <link href="https://trino.io/episodes/54.html" rel="alternate" type="text/html" title="54: Trino 2023 wrapped" />
      <published>2024-01-18T00:00:00+00:00</published>
      <updated>2024-01-18T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/54</id>
      <content type="html" xml:base="https://trino.io/episodes/54.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of
Technical Content at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;,
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/mtraverso&quot;&gt;Martin Traverso&lt;/a&gt;, Trino co-creator and CTO at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-434-436&quot;&gt;Releases 434-436&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-434.html&quot;&gt;Trino 434&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FILTER&lt;/code&gt; clause in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LISTAGG&lt;/code&gt; function&lt;/li&gt;
  &lt;li&gt;Support for reading &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;json&lt;/code&gt; columns and for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt; statements in the BigQuery connector&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-435.html&quot;&gt;Trino 435&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JSON_TABLE&lt;/code&gt; function&lt;/li&gt;
  &lt;li&gt;Improve reliability when reading from GCS&lt;/li&gt;
  &lt;li&gt;Improve query planning performance on Delta Lake tables&lt;/li&gt;
  &lt;li&gt;Improve reliability and memory usage for inserts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-436.html&quot;&gt;Trino 436&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for Elasticsearch 8&lt;/li&gt;
  &lt;li&gt;New OpenSearch connector&lt;/li&gt;
  &lt;li&gt;Faster selective joins on partition columns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additional comments:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Disallow invalid configuration options with the Delta Lake and Iceberg connectors in 434&lt;/li&gt;
  &lt;li&gt;Separate metadata caching in numerous connectors&lt;/li&gt;
  &lt;li&gt;Various improvements for schema evolution in Hive connector&lt;/li&gt;
  &lt;li&gt;Require JDK 21.0.1 to run Trino with 436&lt;/li&gt;
  &lt;li&gt;Remove support for Elasticsearch 6 in 436&lt;/li&gt;
  &lt;li&gt;Fix minor issues for SQL routine and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JSON_TABLE&lt;/code&gt; function users&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;recap-of-trino-in-2023&quot;&gt;Recap of Trino in 2023&lt;/h2&gt;

&lt;p&gt;We chat about all the developments in the Trino project and the Trino community
from 2023, including the following topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Various statistics about the project&lt;/li&gt;
  &lt;li&gt;Features and releases&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/06/20/trino-fest-2023-recap.html&quot;&gt;Trino Fest&lt;/a&gt;, &lt;a href=&quot;/blog/2023/12/18/trino-summit-recap.html&quot;&gt;Trino
Summit&lt;/a&gt;, and other events&lt;/li&gt;
  &lt;li&gt;New Trino maintainers&lt;/li&gt;
  &lt;li&gt;Polish and Chinese editions of the definitive guide published&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Find more details and other topics in our &lt;a href=&quot;/blog/2024/01/19/trino-2023-wrapped.html&quot;&gt;blog post &lt;strong&gt;Trino 2023 wrapped&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Upcoming events in NYC and Vienna, details available in the &lt;a href=&quot;https://trino.io/community.html#events&quot;&gt;events
calendar&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Trino Contributor Congregation coming soon&lt;/li&gt;
  &lt;li&gt;Trino Gateway developer sync every two weeks, ping Manfred for an invite&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from O’Reilly.
You can download &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or &lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;buy the book
online&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>53: Understanding your data with Coginiti and Trino</title>
      <link href="https://trino.io/episodes/53.html" rel="alternate" type="text/html" title="53: Understanding your data with Coginiti and Trino" />
      <published>2023-11-16T00:00:00+00:00</published>
      <updated>2023-11-16T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/53</id>
      <content type="html" xml:base="https://trino.io/episodes/53.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of
Technical Content at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;,
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/msmullins/&quot;&gt;Matthew Mullins&lt;/a&gt;, CTO at
&lt;a href=&quot;https://www.coginiti.co&quot;&gt;Coginiti&lt;/a&gt;,
(&lt;a href=&quot;https://twitter.com/mullinsms&quot;&gt;@mullinsms&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/rnestertsov/&quot;&gt;Roman Nestertsov&lt;/a&gt;, Principal
Engineer at &lt;a href=&quot;https://www.coginiti.co&quot;&gt;Coginiti&lt;/a&gt;,
(&lt;a href=&quot;https://twitter.com/nestertsov&quot;&gt;@nestertsov&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-431-433&quot;&gt;Releases 431-433&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-431.html&quot;&gt;Trino 431&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for &lt;a href=&quot;https://trino.io/docs/current/routines.html&quot;&gt;SQL routines&lt;/a&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE/DROP FUNCTION&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;REPLACE&lt;/code&gt; modifier in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE TABLE&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Improved latency for prepared statements in JDBC driver&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-432.html&quot;&gt;Trino 432&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Faster filtering on columns containing long strings in Parquet data.&lt;/li&gt;
  &lt;li&gt;Predicate pushdown for real and double columns in MongoDB.&lt;/li&gt;
  &lt;li&gt;Support for Iceberg REST catalog in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;register_table&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unregister_table&lt;/code&gt; procedures.&lt;/li&gt;
  &lt;li&gt;Support for BEARER authentication for Nessie catalog.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-433.html&quot;&gt;Trino 433&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved support for Hive schema evolution.&lt;/li&gt;
  &lt;li&gt;Add support for altering table comments in the Glue catalog.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note that Trino 433 also includes documentation for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE/DROP CATALOG&lt;/code&gt;.
Check out the third SQL training session for a demo.&lt;/p&gt;

&lt;h2 id=&quot;sql-routine-competition&quot;&gt;SQL routine competition&lt;/h2&gt;

&lt;p&gt;Trino 431 finally delivered the long-awaited support for SQL routines. To
celebrate and see what you all come up with, we are running a competition.
&lt;a href=&quot;/blog/2023/11/09/routines.html&quot;&gt;Share your best SQL routine&lt;/a&gt;, and win a
reward sponsored by &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;.&lt;/p&gt;
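&lt;p&gt;To give a flavor of the feature, here is a minimal inline routine sketch based on the routine syntax in the Trino documentation; the function name and return value are just placeholders:&lt;/p&gt;

```sql
-- Declare an inline routine and call it in the same query
WITH
  FUNCTION meaning_of_life()
    RETURNS bigint
    BEGIN
      RETURN 42;
    END
SELECT meaning_of_life();
```

Routines can also be stored in a catalog with `CREATE FUNCTION`, so they are reusable across queries.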

&lt;h2 id=&quot;call-for-java-21-testing&quot;&gt;Call for Java 21 testing&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/logos/java-duke-21.png&quot; width=&quot;100px&quot; align=&quot;right&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Java 21, the latest LTS release of Java, arrived in September 2023, and we want
to take advantage of the performance improvements, language features, and new
libraries. But to do so, &lt;a href=&quot;/blog/2023/11/03/java-21.html&quot;&gt;we need your input and confirmation that everything
works as expected&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-episode-jdbc-driver&quot;&gt;Concept of the episode: JDBC driver&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/logos/jdbc-small.png&quot; width=&quot;100px&quot; align=&quot;right&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Java_Database_Connectivity&quot;&gt;Java Database Connectivity
(JDBC)&lt;/a&gt; is an
important standard for any JVM-based application that wants to access a
relational database. Trino ships a JDBC driver that abstracts all the low-level
details of our conversational REST API for client tools, and supports various
authentication mechanisms, TLS, and other features. This allows tools like
Coginiti to ignore those details and to work with the community on any
improvements for the benefit of all users.&lt;/p&gt;
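&lt;p&gt;As a rough sketch of what this looks like from the application side, the snippet below builds a Trino JDBC URL and connection properties. The host, port, catalog, and user are made-up placeholders; with the trino-jdbc artifact on the classpath, the commented-out DriverManager call would open the actual connection:&lt;/p&gt;

```java
import java.util.Properties;

public class TrinoJdbcExample {
    // Trino JDBC URLs take the form jdbc:trino://host:port/catalog/schema
    static String buildUrl(String host, int port, String catalog, String schema) {
        return "jdbc:trino://" + host + ":" + port + "/" + catalog + "/" + schema;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("user", "analyst"); // the driver requires a user name
        String url = buildUrl("trino.example.com", 8080, "tpch", "tiny");
        System.out.println(url);
        // try (Connection conn = DriverManager.getConnection(url, props)) { ... }
        // would open the connection, hiding the underlying REST protocol details.
    }
}
```

A client tool only deals with this URL and standard JDBC interfaces; authentication, TLS, and the HTTP conversation with the coordinator stay inside the driver.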

&lt;h2 id=&quot;client-tool-focus-on-coginiti&quot;&gt;Client tool focus on Coginiti&lt;/h2&gt;

&lt;p&gt;Matthew and Roman are joining us from &lt;a href=&quot;https://www.coginiti.co&quot;&gt;Coginiti&lt;/a&gt;.
Coginiti delivers higher-quality analytics faster. Coginiti provides an
AI-enabled enterprise data workspace that integrates modular development,
version control, and data quality testing throughout the analytic development
lifecycle.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.coginiti.co&quot;&gt;
  &lt;img src=&quot;/assets/images/logos/coginiti-small.png&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With support for Trino, Coginiti as a client tool provides access to all the
configured catalogs in Trino. It enables data engineers and analysts to work
together in a shared platform, reducing duplication in their work, and bringing
“Don’t repeat yourself (DRY)” to analysts.&lt;/p&gt;

&lt;p&gt;We talk about why Coginiti added &lt;a href=&quot;https://www.coginiti.co/databases/trino/&quot;&gt;support for
Trino&lt;/a&gt;. Coginiti is not a compute
platform itself, but access to many platforms enables a “data blender thinking”.
So as a user you start caring less about the location and source of the
database, and more about the data itself and how you can mix it together to gain
better insights. Every enterprise has more than one data platform, with
different data warehouses, RDBMSes, and data lakes. Matthew talks about reasons
for this situation, and how Trino as a partner platform enables users to
federate across all of these platforms when needed.&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-episode-coginiti-and-trino&quot;&gt;Demo of the episode: Coginiti and Trino&lt;/h2&gt;

&lt;p&gt;In the demo of Coginiti, Roman and Matthew show some of the features of the tool
that enable code reuse and managing transformations on Trino. A tour through
major aspects of the application gives a good impression of the benefits and
supported use cases.&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Our lineup of speakers and sessions for Trino Summit is nearly finalized. Join
us on the 13th and 14th of December for the free, virtual event. Stay tuned for
details about all the sessions soon, and in the meantime - &lt;a href=&quot;https://www.starburst.io/info/trinosummit2023/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=NORAM-FY24-Q4-EV-Trino-Summit-2023&amp;amp;utm_content=tcb&quot;&gt;don’t forget to
register&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Our &lt;a href=&quot;/blog/2023/09/27/training-series.html&quot;&gt;Trino SQL training series&lt;/a&gt; just
had a successful third session yesterday, and you can check out all the material
in our follow up blog posts:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/10/18/sql-training-1.html&quot;&gt;Getting started with Trino and SQL&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/11/01/sql-training-2.html&quot;&gt;Advanced analytics with SQL and Trino&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There is still a chance for you &lt;a href=&quot;https://www.starburst.io/info/trino-training-series/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=Global-FY24-Trino-Training-Series&amp;amp;utm_content=1&quot;&gt;to register and attend the fourth session
live&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>52: Commander Bun Bun takes a bite out of Yugabyte</title>
      <link href="https://trino.io/episodes/52.html" rel="alternate" type="text/html" title="52: Commander Bun Bun takes a bite out of Yugabyte" />
      <published>2023-10-26T00:00:00+00:00</published>
      <updated>2023-10-26T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/52</id>
      <content type="html" xml:base="https://trino.io/episodes/52.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Manfred Moser, Director of Technical Content at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/dmagda/&quot;&gt;Denis Magda&lt;/a&gt;, Director of Developer
Relations at &lt;a href=&quot;https://www.yugabyte.com/&quot;&gt;Yugabyte&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-428-430&quot;&gt;Releases 428-430&lt;/h2&gt;

&lt;p&gt;Unofficial highlights from Cole:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-428.html&quot;&gt;Trino 428&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Reduced memory usage for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Simplified configuration for managing writer counts&lt;/li&gt;
  &lt;li&gt;Faster reads for small Parquet files on data lakes&lt;/li&gt;
  &lt;li&gt;Support for &lt;a href=&quot;https://docs.pinot.apache.org/users/user-guide-query/query-options&quot;&gt;query options&lt;/a&gt;
on dynamic tables in Pinot&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-429.html&quot;&gt;Trino 429&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Faster reading of ORC files in Hive&lt;/li&gt;
  &lt;li&gt;More types supported for schema evolution in Hive&lt;/li&gt;
  &lt;li&gt;Security improvements, including logging out of a session with the Web UI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-430.html&quot;&gt;Trino 430&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved performance of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Support for setting a timezone on the session level&lt;/li&gt;
  &lt;li&gt;Table statistics in MariaDB&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-episode-jdbc-based-connectors&quot;&gt;Concept of the episode: JDBC-based connectors&lt;/h2&gt;

&lt;p&gt;In Trino, we have a lot of connectors that are based on top of JDBC. JDBC could
stand for “just da best connectors,” but it’s really Java database connectivity,
and it’s one of the core APIs by which many of the most prominent connectors in
the Trino ecosystem function. It’s so common, in fact, that we have
&lt;a href=&quot;/docs/current/develop/example-jdbc.html&quot;&gt;an example JDBC connector in Trino&lt;/a&gt; to
make it easier to implement your own JDBC-based connector if you need one.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-episode-yugabytedb&quot;&gt;Concept of the episode: YugabyteDB&lt;/h2&gt;

&lt;p&gt;But if the topic of today’s episode is YugabyteDB, why are we talking about
PostgreSQL? Well, if you’re unfamiliar with Yugabyte, lifting from
&lt;a href=&quot;https://docs.yugabyte.com/&quot;&gt;their docs&lt;/a&gt;: “YugabyteDB is distributed PostgreSQL
that delivers on-demand scale, built-in resilience, and a multi-API interface.”
Distributed architecture should be a familiar concept to a community involved
with a distributed query engine, and if you understand how Trino is able to
leverage it, you should also understand why it makes sense to pair with
Yugabyte. We’ll be discussing why Yugabyte got started, what it does differently
from other databases, what it does better than other databases, and how you
might want to use it with Trino.&lt;/p&gt;
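&lt;p&gt;Since YugabyteDB speaks the PostgreSQL wire protocol, one plausible setup is a regular PostgreSQL catalog in Trino pointed at the YSQL port (5433 by default). The host name and credentials below are made-up placeholders:&lt;/p&gt;

```properties
# etc/catalog/yugabyte.properties -- hypothetical catalog file
connector.name=postgresql
connection-url=jdbc:postgresql://yugabyte.example.com:5433/yugabyte
connection-user=trino
connection-password=secret
```

With that catalog in place, YugabyteDB tables show up in Trino like any other PostgreSQL source and can be joined against other catalogs.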

&lt;h2 id=&quot;demo-of-the-episode-trino-on-yugabytedb&quot;&gt;Demo of the episode: Trino on YugabyteDB&lt;/h2&gt;

&lt;p&gt;As part of the episode, we’ll also be showing off how you can use YugabyteDB
with Trino. We start with the PostgreSQL connector, and Denis then shows how to
use it to query YugabyteDB from Trino. It’s always hard to
explain demos in show notes, so tune into the YouTube video and take a look for
yourself if you’re curious!&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Trino Summit, the biggest Trino event of the year, is coming up on the 13th and
14th of December, and like Trino Fest, it’ll be fully virtual. If you’d like to
give a talk about anything related to Trino, we’re looking for speakers now.
&lt;a href=&quot;https://sessionize.com/trino-summit-2023/&quot;&gt;Submit your talk here!&lt;/a&gt; If you’d
rather attend, you can also
&lt;a href=&quot;https://www.starburst.io/info/trinosummit2023/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=NORAM-FY24-Q4-EV-Trino-Summit-2023&amp;amp;utm_content=tcb&quot;&gt;go register to attend now&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Prior to Trino Summit, if you’d like to learn about SQL from the absolute
experts, we’ve also gotten started with the
&lt;a href=&quot;/blog/2023/09/27/training-series&quot;&gt;Trino Training Series&lt;/a&gt;
that we’ll be running as a buildup to the summit. The
&lt;a href=&quot;/blog/2023/10/18/sql-training-1&quot;&gt;recap for the first session&lt;/a&gt;
is live, but there are three more to come! Register now and look forward
to those great sessions starting from the ground up and ending with some key
tricks and Trino specifics that even a seasoned SQL veteran may not know about.&lt;/p&gt;

&lt;p&gt;We also have a talk about Trino on Ice and data meshes coming up in Redwood City
with Slalom and Starburst. If you’re local, consider
&lt;a href=&quot;https://go.slalom.com/starburstnorcal&quot;&gt;signing up and checking it out!&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>51: Trino cools off with PopSQL</title>
      <link href="https://trino.io/episodes/51.html" rel="alternate" type="text/html" title="51: Trino cools off with PopSQL" />
      <published>2023-10-05T00:00:00+00:00</published>
      <updated>2023-10-05T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/51</id>
      <content type="html" xml:base="https://trino.io/episodes/51.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Manfred Moser, Director of Technical Content at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/jakeptrsn/&quot;&gt;Jake Peterson&lt;/a&gt;, Head of Customer
Success at &lt;a href=&quot;https://popsql.com/&quot;&gt;PopSQL&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Matthew Peveler, Software Engineer at &lt;a href=&quot;https://popsql.com/&quot;&gt;PopSQL&lt;/a&gt;,
&lt;a href=&quot;https://github.com/MasterOdin&quot;&gt;MasterOdin&lt;/a&gt; on GitHub&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-423-427&quot;&gt;Releases 423-427&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-423.html&quot;&gt;Trino 423&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Schema evolution for nested fields&lt;/li&gt;
  &lt;li&gt;Support for comments on materialized view columns&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CASCADE&lt;/code&gt; option in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DROP SCHEMA&lt;/code&gt; for Clickhouse, MariaDB, MySQL,
Oracle and SingleStore&lt;/li&gt;
  &lt;li&gt;Various performance improvements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-424.html&quot;&gt;Trino 424&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved performance for JSON, CSV, text and related formats in Hive&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CASCADE&lt;/code&gt; in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DROP SCHEMA&lt;/code&gt; for PostgreSQL and Iceberg&lt;/li&gt;
  &lt;li&gt;Improved coordinator CPU utilization for large clusters&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-425.html&quot;&gt;Trino 425&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved performance of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Support for check constraints in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; for Delta Lake connector.&lt;/li&gt;
  &lt;li&gt;Support for the Decimal128 in MongoDB connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-426.html&quot;&gt;Trino 426&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SET/RESET SESSION AUTHORIZATION&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Improved performance of aggregations over decimal values.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TRUNCATE TABLE&lt;/code&gt; in Delta Lake connector.&lt;/li&gt;
  &lt;li&gt;Support for Databricks 13.3 LTS.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-427.html&quot;&gt;Trino 427&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved performance for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DISTINCT&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Support for pushing down &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt; statements into connectors.&lt;/li&gt;
  &lt;li&gt;Support for reading Delta Lake tables with Deletion Vectors.&lt;/li&gt;
  &lt;li&gt;Faster writing to Parquet files in Delta Lake and Iceberg.&lt;/li&gt;
  &lt;li&gt;Support for querying tags in Iceberg.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-episode-popsql&quot;&gt;Concept of the episode: PopSQL&lt;/h2&gt;

&lt;p&gt;Some of our viewers may be familiar with an environment where
key queries and dashboards are buried in someone’s personal workspace, and you
have to go ask that person directly every time you want to check on your metrics.
When you’re running a world-class, highly-performant query engine like Trino and
investing time and resources into maintaining it, shouldn’t you treat your
queries like a first-class, collaborative, versioned system, too?&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://popsql.com/&quot;&gt;PopSQL&lt;/a&gt;, a playful spin on the word popsicle, solves the
sadness that is disorganized and siloed insights by centralizing queries into a
platform that has versioning, security, and a suite of collaborative tools
comparable to Google Drive. Want to work with your teammate on a query? You can
open up the same editor and see the same thing. Want to find that query someone
ran last week to check how the new feature is doing? It’s there. Have
a suggestion to improve something? Leave a comment. Realize your suggestion was
wrong and need to undo the change? You can view past versions of the query.&lt;/p&gt;

&lt;p&gt;PopSQL and Trino make sense together. PopSQL provides a best-in-class interface
for organizing, collaborating, and working together on all of your SQL queries
across the business, and Trino handles running those queries at unparalleled
speeds. They go hand-in-hand for treating your data and SQL analytics as first
class citizens. In today’s episode, we’ll be exploring what PopSQL is, how it
integrates with Trino, and how the engineers at PopSQL have done some cool
things with Trino to make the integration better than ever before. We’ll start
with that last one, actually.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-episode-a-new-nodejs-adapter-for-trino&quot;&gt;Concept of the episode: A new Node.js adapter for Trino&lt;/h2&gt;

&lt;p&gt;Trino in the frontend is… a tricky thing. We can go ahead and admit that the
&lt;a href=&quot;/docs/current/admin/web-interface.html&quot;&gt;Trino web UI&lt;/a&gt; isn’t going to win any
awards for design or functionality. A couple of Node-based libraries
exist out there, including &lt;a href=&quot;https://www.npmjs.com/package/presto-client&quot;&gt;presto-client-node&lt;/a&gt;
and &lt;a href=&quot;https://github.com/vweevers/lento&quot;&gt;lento&lt;/a&gt;. But presto-client-node lacked
support for streaming and had some issues handling 500 errors, and lento doesn’t
quite support Trino out of the box and only supports single streams, which
wasn’t ideal for PopSQL’s distributed architecture. So when PopSQL’s engineers
went to build their frontend and integrate with Trino, what did they do? Build
their own adapter.&lt;/p&gt;

&lt;p&gt;We’ll talk about how it was implemented, what key features it unlocks, and why
it makes using PopSQL with Trino an even better experience.&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-episode-using-popsql-with-trino&quot;&gt;Demo of the episode: Using PopSQL with Trino&lt;/h2&gt;

&lt;p&gt;It’s hard to write show notes for a demo, because you can’t really experience
the demo by reading about what’s happening. But as a surface-level overview,
we’ll be going over:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Setting up a connection&lt;/li&gt;
  &lt;li&gt;The schema explorer&lt;/li&gt;
  &lt;li&gt;The SQL editor&lt;/li&gt;
  &lt;li&gt;Query scheduling&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;pr-of-the-episode-57-on-trino-gateway-release-version-3&quot;&gt;PR of the episode: #57 on trino-gateway: Release version 3&lt;/h2&gt;

&lt;p&gt;Last week, the community officially released the
&lt;a href=&quot;https://github.com/trinodb/trino-gateway&quot;&gt;trino-gateway&lt;/a&gt;, a proxy and load
balancer that enables large operations to run multiple Trino clusters in
harmony with each other to serve big queries and small queries alike. If you or
your organization have a need for more than one Trino cluster and want the
seamless experience of being able to connect to any of them through a single
interface, then check it out! It’s the product of many months of effort and
should be a fantastic solution for running Trino at the absolute largest scales.&lt;/p&gt;

&lt;p&gt;To learn more about it, you should check out
&lt;a href=&quot;/blog/2023/09/28/trino-gateway&quot;&gt;the blog post announcing its first release.&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Trino Summit, the biggest Trino event of the year, is coming up on the 13th and
14th of December, and like Trino Fest, it’ll be fully virtual. If you’d like to
give a talk about anything related to Trino, we’re looking for speakers now.
&lt;a href=&quot;https://sessionize.com/trino-summit-2023/&quot;&gt;Submit your talk here!&lt;/a&gt; If you’d
rather attend, you can also
&lt;a href=&quot;https://www.starburst.io/info/trinosummit2023/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=NORAM-FY24-Q4-EV-Trino-Summit-2023&amp;amp;utm_content=tcb&quot;&gt;go register to attend now&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Prior to Trino Summit, if you’d like to learn about SQL from the absolute
experts, we’ve also announced the &lt;a href=&quot;/blog/2023/09/27/training-series&quot;&gt;Trino Training Series&lt;/a&gt;
that we’ll be running as a buildup to the summit. Register now and look forward
to four great sessions starting from the ground up and ending with some key
tricks and Trino specifics that even a seasoned SQL veteran may not know about.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>50: Celebrating 50 episodes of Trino Community Broadcast</title>
      <link href="https://trino.io/episodes/50.html" rel="alternate" type="text/html" title="50: Celebrating 50 episodes of Trino Community Broadcast" />
      <published>2023-07-27T00:00:00+00:00</published>
      <updated>2023-07-27T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/50</id>
      <content type="html" xml:base="https://trino.io/episodes/50.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Manfred Moser, Director of Technical Content at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Brian Olsen, Head of Developer Relations at &lt;a href=&quot;https://tabular.io/&quot;&gt;Tabular&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/bitsondatadev&quot;&gt;@bitsondatadev&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Dain Sundstrom, Trino co-creator and CTO at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/daindumb&quot;&gt;@daindumb&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-421-422&quot;&gt;Releases 421-422&lt;/h2&gt;

&lt;p&gt;Unofficial highlights from Cole:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-421.html&quot;&gt;Trino 421&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CHECK&lt;/code&gt; constraints in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt; statements.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; on Google Sheets.&lt;/li&gt;
  &lt;li&gt;Faster queries on MongoDB tables with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;row&lt;/code&gt; columns.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-422.html&quot;&gt;Trino 422&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Faster &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE TABLE AS ... SELECT&lt;/code&gt; queries.&lt;/li&gt;
  &lt;li&gt;Support for nested fields in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ADD COLUMN&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Faster Avro reader for Hive.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;register_table&lt;/code&gt; procedure to register Hadoop tables in Iceberg.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-episode-50&quot;&gt;Concept of the episode: 50!&lt;/h2&gt;

&lt;p&gt;No, that’s not a factorial, we’re just excited to have made it to 50 Trino
Community Broadcast episodes. We’ve brought back some familiar faces to talk
about what we’ve done, how we’ve got here, what it takes to keep an open source
project ticking for over a decade, and celebrate the steps we’ve taken along
the way. It’s unscripted, and the discussion goes wherever it feels like going.&lt;/p&gt;

&lt;p&gt;Tune in to hear about the history of the Trino Community Broadcast, the upcoming
Snowflake connector, and a few of the core philosophies that have kept Trino
running. Manfred also shows off updates to the Trino website, highlighting all
the tools, data sources, and add-ons that you can use with Trino.&lt;/p&gt;

&lt;h2 id=&quot;trino-events&quot;&gt;Trino events&lt;/h2&gt;

&lt;p&gt;Trino Fest was a little over a month ago, and we’re publishing the last recap of
all the talks to the Trino blog today! Check out our YouTube channel and the
Trino website to catch up on everything you missed.&lt;/p&gt;

&lt;p&gt;If you have an event that is related to Trino, let us know so we can add it to
the &lt;a href=&quot;https://trino.io/community.html#events&quot;&gt;Trino events calendar&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>49: Trino, Ibis, and wrangling Python in the SQL ecosystem</title>
      <link href="https://trino.io/episodes/49.html" rel="alternate" type="text/html" title="49: Trino, Ibis, and wrangling Python in the SQL ecosystem" />
      <published>2023-07-06T00:00:00+00:00</published>
      <updated>2023-07-06T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/49</id>
      <content type="html" xml:base="https://trino.io/episodes/49.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Manfred Moser, Director of Technical Content at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest&quot;&gt;Guest&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/cpcloud&quot;&gt;Phillip Cloud&lt;/a&gt;, Principal Engineer at Voltron
Data. &lt;a href=&quot;https://www.youtube.com/@cpcloud&quot;&gt;Check out his YouTube channel!&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-419-420&quot;&gt;Releases 419-420&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-419.html&quot;&gt;Trino 419&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;array_histogram&lt;/code&gt; function.&lt;/li&gt;
  &lt;li&gt;Faster reading and writing of Parquet data.&lt;/li&gt;
  &lt;li&gt;Support for Nessie catalog in Iceberg connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-420.html&quot;&gt;Trino 420&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Underscores in numeric literals (e.g. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1_000_000&lt;/code&gt;).&lt;/li&gt;
  &lt;li&gt;Hexadecimal, binary, and octal numeric literals (e.g. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0x1a&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0b1010&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0o12&lt;/code&gt;).&lt;/li&gt;
  &lt;li&gt;Support for comments on view columns in Delta Lake connector.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RENAME COLUMN&lt;/code&gt; in MongoDB connector.&lt;/li&gt;
  &lt;li&gt;Support for mixed case table names in Druid connector.&lt;/li&gt;
  &lt;li&gt;Faster queries when statistics are unavailable.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;question-of-the-episode-what-is-ibis&quot;&gt;Question of the episode: What is Ibis?&lt;/h2&gt;

&lt;p&gt;Taken straight from &lt;a href=&quot;https://ibis-project.org/concept/why_ibis/&quot;&gt;the Ibis website&lt;/a&gt;,
Ibis is a dataframe interface to execution engines with support for 15+
backends (including Trino!). Ibis doesn’t replace your existing execution
engine; it extends it with powerful abstractions and intuitive syntax.&lt;/p&gt;

&lt;p&gt;For those who love doing all their data-related work in Python, this allows you
to write Python code that leverages the speed and power of Trino without needing
to become a SQL master. For the die-hard SQL users out there,
&lt;a href=&quot;https://ibis-project.org/tutorial/ibis-for-sql-users/&quot;&gt;they have a guide on Ibis for SQL users&lt;/a&gt;
that explains how it fully replaces SQL with Python code that is:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Type-checked and validated as you go.&lt;/li&gt;
  &lt;li&gt;Easier to write. Pythonic function calls with tab completion in IPython.&lt;/li&gt;
  &lt;li&gt;More composable. Break complex queries down into easier-to-digest pieces.&lt;/li&gt;
  &lt;li&gt;Easier to reuse. Mix and match Ibis snippets to create expressions tailored
for your analysis.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even if you’ve been writing SQL queries since day 1 and swear by it, opening the
door to using Python for analytics creates many new possibilities, widens the
possible talent pool you can work with, and gives you an entire second ecosystem
to integrate with.&lt;/p&gt;

&lt;p&gt;And ultimately, at the end of the day, the idea is that you get the ease of
writing Python code with the power and performance of a blazing fast SQL engine.
&lt;a href=&quot;https://youtu.be/pAWseFS4eAk&quot;&gt;You get the best of both worlds&lt;/a&gt;, and using Ibis
doesn’t lock you out of rolling up your sleeves and writing some SQL when a
situation calls for it.&lt;/p&gt;

&lt;h3 id=&quot;and-you-dont-need-to-learn-different-sql-dialects&quot;&gt;And you don’t need to learn different SQL dialects&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;../assets/episode/49/standards_2x.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Trino more or less adheres to ANSI SQL, but it implements some ANSI features
that are rarely seen in other query engines, and other query engines choose to
deviate in a variety of ways. This can be a headache if you’re migrating to
Trino, as queries need to be rewritten, restructured, and tested to make sure
they return the same results. If you set up with Ibis first, it does that
thinking for you: the same Python expression can be compiled to whatever dialect
of SQL you need without any issue. That saves time, effort, and headaches, and
spares you the sense of being locked into a specific SQL dialect, freeing you up
to move between query engines without any pain points… because of course, you
want to move to Trino, which is the best query engine.&lt;/p&gt;

&lt;p&gt;It also needs pointing out that this allows you to federate your queries while
you federate your queries.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-episode-converting-python-to-sql&quot;&gt;Concept of the episode: Converting Python to SQL&lt;/h2&gt;

&lt;p&gt;Take some Python like so:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ibis&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;movies&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ibis&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;examples&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ml_latest_small_movies&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;fetch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rating_by_year&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;movies&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;group_by&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;year&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;avg_rating&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;mean&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rating_by_year&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;order_by&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rating_by_year&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;year&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;desc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And Ibis can automatically turn it into SQL that executes on Trino:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;con&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;compile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;year&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;avg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;avg_rating&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;movies&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;GROUP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;year&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;year&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Obviously, this example is lightweight, but as queries grow more complex and
sophisticated, the conversion becomes more and more worthwhile. We mentioned
that the Python code is easier to reuse, and it really is: if you want to run
a similar query in conjunction with the query above, those &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;movies&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rating_by_year&lt;/code&gt; variables still exist, and writing some code to leverage them
is a lot easier and more intuitive than setting up SQL subqueries and aliases.&lt;/p&gt;

&lt;h3 id=&quot;questions-for-phillip&quot;&gt;Questions for Phillip&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Why is it called Ibis?&lt;/li&gt;
  &lt;li&gt;How much of a normal SQL workload do you think could be handled and run by
Ibis?&lt;/li&gt;
  &lt;li&gt;How much can Ibis optimize SQL queries for performance?&lt;/li&gt;
  &lt;li&gt;Which SQL dialect has been the worst to deal with?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;pr-of-the-episode-15026-support-insert-in-google-sheets-connector&quot;&gt;PR of the episode: #15026: Support INSERT in Google Sheets connector&lt;/h2&gt;

&lt;p&gt;Google Sheets is one of our not-as-talked-about connectors in Trino, but it
still sees use and community updates, and we want to give that a shoutout in
today’s Trino Community Broadcast. &lt;a href=&quot;https://github.com/trinodb/trino/pull/15026&quot;&gt;#15026&lt;/a&gt;
from &lt;a href=&quot;https://github.com/sbernauer&quot;&gt;Sebastian Bernauer&lt;/a&gt; adds &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; support to
the connector, so now you can read &lt;em&gt;and&lt;/em&gt; write from Google Sheets in Trino,
empowering the world of SQL-on-spreadsheets.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-episode-477-on-trinoio-add-mateusz-gajewski-to-maintainer-list&quot;&gt;PR of the episode: #477 on trino.io: Add Mateusz Gajewski to maintainer list&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/trinodb/trino.io/pull/477&quot;&gt;We’ve added another maintainer to Trino!&lt;/a&gt;
We just spent an episode introducing Manfred and James Petty as maintainers, and
Mateusz is right behind them after years of effort helping Trino as a
contributor and reviewer.&lt;/p&gt;

&lt;h2 id=&quot;trino-events&quot;&gt;Trino events&lt;/h2&gt;

&lt;p&gt;Trino Fest wrapped up a few weeks ago, and we’re publishing recaps of all the
talks to the Trino blog! Keep an eye on our YouTube channel and the Trino
website to catch up on everything you missed.&lt;/p&gt;

&lt;p&gt;If you have an event that is related to Trino, let us know so we can add it to
the &lt;a href=&quot;https://trino.io/community.html#events&quot;&gt;Trino events calendar&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>48: What is Trino?</title>
      <link href="https://trino.io/episodes/48.html" rel="alternate" type="text/html" title="48: What is Trino?" />
      <published>2023-05-31T00:00:00+00:00</published>
      <updated>2023-05-31T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/48</id>
      <content type="html" xml:base="https://trino.io/episodes/48.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Manfred Moser, Director of Technical Content at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-417-418&quot;&gt;Releases 417-418&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-417.html&quot;&gt;Trino 417&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Faster &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNION ALL&lt;/code&gt; queries.&lt;/li&gt;
  &lt;li&gt;Faster processing of Parquet data in Hudi, Iceberg, Hive, and Delta Lake
connectors.&lt;/li&gt;
  &lt;li&gt;Faster reads of nested row fields in Delta Lake connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-418.html&quot;&gt;Trino 418&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXECUTE IMMEDIATE&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Add the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;table_changes&lt;/code&gt; function in Delta Lake connector.&lt;/li&gt;
  &lt;li&gt;Faster joins on partition columns in Delta Lake, Hive, Hudi, and Iceberg
connectors.&lt;/li&gt;
  &lt;li&gt;Support for fault-tolerant execution in the Oracle connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;question-of-the-episode-what-is-trino&quot;&gt;Question of the episode: What is Trino?&lt;/h2&gt;

&lt;p&gt;We’ve put out nearly 50 Trino Community Broadcast episodes, but we haven’t yet
done the simplest, most obvious topic of them all - an exploration of what Trino
is, how Trino works, and how you can run it. This week, we’re taking a step back
and doing a broader overview of those things, because the world needs to know…
what is Trino?&lt;/p&gt;

&lt;p&gt;If you check the Trino documentation, it starts with a definition of what Trino
isn’t. But we’ll start with what Trino is: a distributed SQL query engine
written in Java. If you have a SQL query, Trino can process and run it on an
extremely wide variety of data sources and return a result to you that you’d
expect from that SQL query. It can run queries on traditional relational
databases like Oracle, MySQL, and PostgreSQL; it works on data lakes like Hive,
Iceberg, Delta Lake, and Hudi; and it runs on NoSQL databases like Cassandra
and MongoDB. You give Trino a query, Trino gives you results. And the best part
is that it doesn’t just work, it works blazing fast.&lt;/p&gt;

&lt;p&gt;The key thing to point out is that Trino does not store data, and it is not a
database on its own. It is a query engine, designed to sit on top of databases
and provide an ANSI-standard SQL interface to query whatever you’re storing your
data in. In order to use Trino, you need to start by having data stored
somewhere else. Of course, Trino can write data to those underlying
sources with the same SQL syntax, so for the end user, it can be an all-in-one
interface to those underlying data sources, an abstraction that saves users from
needing to understand the differences between data being stored in Iceberg and
data being stored in Oracle.&lt;/p&gt;

&lt;h3 id=&quot;how-does-it-work&quot;&gt;How does it work?&lt;/h3&gt;

&lt;p&gt;Trino uses a distributed architecture, with a single coordinator node that
schedules and orchestrates the workload, as well as many worker nodes that
carry out tasks and process data.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-episode-how-do-you-run-trino&quot;&gt;Concept of the episode: How do you run Trino?&lt;/h2&gt;

&lt;p&gt;The better question might be “how can’t you run Trino?” As the project has
matured, it’s been added to various third-party tools and integrated into
different apps that help make it easier to run than ever before. We have some
exciting news to share on that front soon, but for now, the biggest ways to run
Trino include:&lt;/p&gt;

&lt;h3 id=&quot;tarball&quot;&gt;Tarball&lt;/h3&gt;

&lt;p&gt;You can directly download the Trino server, manually configure it, and start it
up like any other program. Clients can connect to the server from there,
utilizing the web interface or the CLI to run queries. This is the most manual
way to set up Trino, but it works, and it doesn’t depend on anything else.
&lt;a href=&quot;https://trino.io/docs/current/installation/deployment.html&quot;&gt;Our docs go into a ton of detail on this process.&lt;/a&gt;&lt;/p&gt;
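&lt;p&gt;As a rough sketch of that manual process (the version number and the minimal single-node config values here are illustrative, not authoritative; the deployment docs are the source of truth):&lt;/p&gt;

```shell
# Download and unpack the Trino server tarball (version 418 is illustrative)
wget https://repo1.maven.org/maven2/io/trino/trino-server/418/trino-server-418.tar.gz
tar -xzf trino-server-418.tar.gz
cd trino-server-418

# Minimal single-node configuration, per the deployment docs
mkdir -p etc
printf 'node.environment=demo\n' etc/node.properties > etc/node.properties
printf -- '-Xmx2G\n' > etc/jvm.config
printf '%s\n' \
  'coordinator=true' \
  'node-scheduler.include-coordinator=true' \
  'http-server.http.port=8080' \
  'discovery.uri=http://localhost:8080' > etc/config.properties

# Start the server in the foreground
bin/launcher run
```

&lt;p&gt;Once the log reports the server has started, clients can connect on port 8080.&lt;/p&gt;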

&lt;h3 id=&quot;docker&quot;&gt;Docker&lt;/h3&gt;

&lt;p&gt;Trino provides a Docker image that can be run through the Docker software. You
start by downloading and installing Docker, pull the Trino image, and then run
a container from it to immediately get Trino up
and running. No manual configuration needed, no messing around with creating
directories or files, it just works. It’s perhaps the simplest way to get Trino
off the ground, and recommended for anyone trying to run it independently just
to fiddle around with it.
&lt;a href=&quot;https://trino.io/docs/current/installation/containers.html&quot;&gt;As always, you can refer to the docs for more information.&lt;/a&gt;&lt;/p&gt;
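&lt;p&gt;A minimal sketch of that flow, assuming a local Docker install (the container name is our choice):&lt;/p&gt;

```shell
# Start Trino in the background, exposing the default HTTP port
docker run -d --name trino -p 8080:8080 trinodb/trino

# Once the log reports SERVER STARTED, open a CLI session inside the container
docker exec -it trino trino
```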

&lt;h3 id=&quot;kubernetes-and-helm&quot;&gt;Kubernetes and Helm&lt;/h3&gt;

&lt;p&gt;Trino provides a Helm chart for use with Kubernetes, so after setting up
Kubernetes, kubectl, and Helm, you can install Trino on your Kubernetes cluster
with Helm. It comes with the same pre-configured image as Docker, so there’s no
need to manually set that up, but in order to run queries, you’ll also need to
set up a tunnel between the coordinator pod within Kubernetes and whatever
machine you want to run those queries on. If this is the right setup for you,
you probably already know that, and you don’t need us to go into more detail.
&lt;a href=&quot;https://trino.io/docs/current/installation/kubernetes.html&quot;&gt;More info is in the Trino docs.&lt;/a&gt;&lt;/p&gt;
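&lt;p&gt;A sketch of that setup, assuming kubectl and Helm already point at a cluster (the release name my-trino is illustrative):&lt;/p&gt;

```shell
# Add the Trino chart repository and install the chart
helm repo add trino https://trinodb.github.io/charts
helm install my-trino trino/trino

# Forward the coordinator port so a local client can reach it
# (the service name may differ; check with: kubectl get svc)
kubectl port-forward svc/my-trino 8080:8080
```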

&lt;h3 id=&quot;trino-clients&quot;&gt;Trino clients&lt;/h3&gt;

&lt;p&gt;On the most basic side of things, Trino provides a command-line interface and a
web UI. If you want something more robust, a couple of open source clients have
come out of the community:
&lt;a href=&quot;https://github.com/trinodb/trino-python-client&quot;&gt;one written in Python&lt;/a&gt; and
&lt;a href=&quot;https://github.com/trinodb/trino-go-client&quot;&gt;one written in Go&lt;/a&gt;. There are a
couple of other Python clients that will be even easier to run coming soon, and
we’ll be hearing from them at Trino Fest in just two weeks.&lt;/p&gt;
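&lt;p&gt;For a taste of the CLI, a one-liner against a local server (the tpch catalog ships in the default Docker configuration; adjust for your own setup):&lt;/p&gt;

```shell
# Run a single query, print the results, and exit
trino --server http://localhost:8080 --catalog tpch --schema tiny \
  --execute 'SELECT name FROM nation LIMIT 3'
```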

&lt;h3 id=&quot;or&quot;&gt;Or…&lt;/h3&gt;

&lt;p&gt;On the not-so-free side of things, Starburst Galaxy and AWS Athena offer Trino
as a cloud service, which can make life even easier.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-episode-how-can-you-contribute-to-trino&quot;&gt;Concept of the episode: How can you contribute to Trino?&lt;/h2&gt;

&lt;p&gt;We’ve got a page on the website dedicated to
&lt;a href=&quot;https://trino.io/development/process.html&quot;&gt;the contribution process&lt;/a&gt;, though we’d
like to welcome anyone and everyone listening to take a crack at contributing to
Trino if it’s something you’re interested in. Open source projects can always
use more help, and we’d love to see community contributions whenever they come.
From that process page, the steps are:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Sign the CLA.&lt;/li&gt;
  &lt;li&gt;Make sure your contribution is something that Trino wants/needs.&lt;/li&gt;
  &lt;li&gt;Implement your change.&lt;/li&gt;
  &lt;li&gt;Open a pull request.&lt;/li&gt;
  &lt;li&gt;Request and wait for a review.&lt;/li&gt;
  &lt;li&gt;Address review comments.&lt;/li&gt;
  &lt;li&gt;Wait for it to be merged.&lt;/li&gt;
  &lt;li&gt;Wait for the next release, and then… your code change is in Trino!&lt;/li&gt;
&lt;/ol&gt;
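&lt;p&gt;For step 3, the local development loop is plain Git and Maven; a sketch, with your-user standing in for your GitHub fork and core/trino-main for whichever module you touch:&lt;/p&gt;

```shell
# Clone your fork of Trino
git clone https://github.com/your-user/trino.git
cd trino

# Build just the module you changed, plus its dependencies
./mvnw clean install -DskipTests -pl core/trino-main -am

# Run that module's tests before opening the pull request
./mvnw test -pl core/trino-main
```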

&lt;h2 id=&quot;pr-of-the-episode-11701-support-nessie-catalog-in-iceberg-connector&quot;&gt;PR of the episode: #11701: Support Nessie Catalog in Iceberg connector&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://projectnessie.org/&quot;&gt;Nessie&lt;/a&gt; is a transactional catalog designed for use
with data lakes like Iceberg and Delta Lake. Its key selling point is git-like
version control, making it easy to view history, roll back, and see who made
what adjustments when. &lt;a href=&quot;https://github.com/trinodb/trino/pull/11701&quot;&gt;PR #11701&lt;/a&gt;
allows Trino’s Iceberg connector to use Nessie as its catalog, adding yet
another tool to Trino’s belt and another opportunity for query federation.&lt;/p&gt;

&lt;p&gt;And though we hate to say it, Nessie might just be the only other project in the
world with a mascot that can compete with Commander Bun Bun.&lt;/p&gt;

&lt;h2 id=&quot;trino-events&quot;&gt;Trino events&lt;/h2&gt;

&lt;p&gt;Coming up in just two weeks, Trino Fest is a two-day event that will feature
talks from a wide range of speakers across the Trino ecosystem. As already
hinted at, we’ll be hearing from a couple new Python clients, from Trino users
sharing tips and tricks to maximize the utility of the software, and from
community contributors adding exciting new features and extensions to Trino.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.starburst.io/info/trinofest/&quot;&gt;Register to attend&lt;/a&gt; if you’re
interested and want to tune in to an awesome speaker lineup! It’s virtual and
completely free to attend, so all you’ve got to do is sign up.&lt;/p&gt;

&lt;p&gt;If you have an event that is related to Trino, let us know so we can add it to
the &lt;a href=&quot;https://trino.io/community.html#events&quot;&gt;Trino events calendar&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>47: Meet the new Trino maintainers</title>
      <link href="https://trino.io/episodes/47.html" rel="alternate" type="text/html" title="47: Meet the new Trino maintainers" />
      <published>2023-05-05T00:00:00+00:00</published>
      <updated>2023-05-05T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/47</id>
      <content type="html" xml:base="https://trino.io/episodes/47.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Manfred Moser, Director of Technical Content at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/pettyjamesm&quot;&gt;James Petty&lt;/a&gt;, Senior Software Engineer at AWS&lt;/li&gt;
  &lt;li&gt;Also Manfred. Kind of.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-411-416&quot;&gt;Releases 411-416&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-411.html&quot;&gt;Trino 411&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;migrate&lt;/code&gt; procedure to convert a Hive table to Iceberg.&lt;/li&gt;
  &lt;li&gt;Join and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIKE&lt;/code&gt; pushdown in Ignite.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt; in Ignite.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;procedure&lt;/code&gt; table function for executing stored procedures in SQL Server.&lt;/li&gt;
  &lt;li&gt;Faster join queries over Hive bucketed tables.&lt;/li&gt;
  &lt;li&gt;Faster planning for tables with many columns in Hive.&lt;/li&gt;
&lt;/ul&gt;
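&lt;p&gt;As a rough sketch of the new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;procedure&lt;/code&gt; table function, assuming a SQL Server catalog named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;example&lt;/code&gt; and a hypothetical stored procedure &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dbo.employee_sp&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- pass a stored procedure call through to SQL Server
SELECT *
FROM TABLE(
    example.system.procedure(
        query =&gt; 'EXECUTE dbo.employee_sp'));
&lt;/code&gt;&lt;/pre&gt;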

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-412.html&quot;&gt;Trino 412&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;exclude_columns&lt;/code&gt; table function.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ADD COLUMN&lt;/code&gt; in Ignite.&lt;/li&gt;
  &lt;li&gt;Support for table comments in PostgreSQL connector.&lt;/li&gt;
  &lt;li&gt;Faster &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sum(DISTINCT ...)&lt;/code&gt; queries for various connectors.&lt;/li&gt;
&lt;/ul&gt;
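&lt;p&gt;As a quick sketch, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;exclude_columns&lt;/code&gt; returns its input table minus the listed columns - the table and column names here are just illustrative:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- return all columns of orders except clerk and comment
SELECT *
FROM TABLE(exclude_columns(
    input =&gt; TABLE(orders),
    columns =&gt; DESCRIPTOR(clerk, comment)));
&lt;/code&gt;&lt;/pre&gt;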

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-413.html&quot;&gt;Trino 413&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; in the Phoenix connector.&lt;/li&gt;
  &lt;li&gt;Support for table comments in the Oracle connector.&lt;/li&gt;
  &lt;li&gt;Improved performance of queries involving window functions or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-414.html&quot;&gt;Trino 414&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Experimental support for tracing using OpenTelemetry.&lt;/li&gt;
  &lt;li&gt;Support for Databricks 12.2 LTS in Delta Lake connector.&lt;/li&gt;
  &lt;li&gt;Support for fault-tolerant execution in Redshift connector.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sequence&lt;/code&gt; table function.&lt;/li&gt;
&lt;/ul&gt;
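&lt;p&gt;For example, the new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sequence&lt;/code&gt; table function produces a single column of sequential numbers - a minimal sketch:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- generate 1, 4, 7, 10
SELECT *
FROM TABLE(sequence(
    start =&gt; 1,
    stop =&gt; 10,
    step =&gt; 3));
&lt;/code&gt;&lt;/pre&gt;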

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-415.html&quot;&gt;Trino 415&lt;/a&gt; and
&lt;a href=&quot;https://trino.io/docs/current/release/release-416.html&quot;&gt;Trino 416&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;A whole lot of minor performance improvements.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;introducing-the-two-new-trino-maintainers&quot;&gt;Introducing the two new Trino maintainers&lt;/h2&gt;

&lt;p&gt;Manfred should hardly need an introduction to Trino Community Broadcast viewers,
as he’s been around and hosting episodes from the beginning, and authored
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;.
In the background, he’s also been quietly working on docs, the website, and
a wide variety of other initiatives in the Trino community.&lt;/p&gt;

&lt;p&gt;James should also be familiar to anyone who has contributed to Trino.
Iconically rocking a GitHub avatar of the face of
&lt;a href=&quot;https://en.wikipedia.org/wiki/Bob_Ross&quot;&gt;Bob Ross&lt;/a&gt;, he’s hard to miss
when he shows up on a pull request. Working on Trino as part of
&lt;a href=&quot;https://aws.amazon.com/athena/&quot;&gt;AWS Athena&lt;/a&gt;, he’s been a major engineering
contributor for the last several years, with 262 commits under his belt and more
on the way.&lt;/p&gt;

&lt;h2 id=&quot;what-is-a-maintainer&quot;&gt;What is a maintainer?&lt;/h2&gt;

&lt;p&gt;If you don’t go clicking around on the Trino website fanatically trying to find
everything you can possibly read about the project, there’s a chance you’ve
never bumped into our &lt;a href=&quot;https://trino.io/development/roles.html&quot;&gt;roles&lt;/a&gt; page,
which highlights how Trino is governed. To quote that page:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;In Trino, maintainer is an active role. A maintainer is responsible for
merging code only after ensuring it has been reviewed thoroughly and aligns with
the Trino vision and guidelines. In addition to merging code, a maintainer
actively participates in discussions and reviews. Being a maintainer does not
grant additional rights in the project to make changes, set direction, or
anything else that does not align with the direction of the project. Instead, a
maintainer is expected to bring these to the project participants as needed to
gain consensus. The maintainer role is for an individual, so if a maintainer
changes employers, the role is retained. However, if a maintainer is no longer
actively involved in the project, their maintainer status will be reviewed.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Or, in normal speech, a maintainer is a trusted individual with merge rights.
But with great power comes great responsibility, higher standards, and an
expectation to be an active steward of the Trino project. It’s not easy to
become a maintainer - prior to Manfred and James, it had been over a year since
the most recent maintainer was appointed. The high bar of activity, quality, and
attitude is not trivial by any stretch, and so we’re excited to talk to them
about the role, how they got here, and what they’re looking forward to for the
future of Trino.&lt;/p&gt;

&lt;h2 id=&quot;the-path-to-becoming-a-maintainer&quot;&gt;The path to becoming a maintainer&lt;/h2&gt;

&lt;h3 id=&quot;manfred&quot;&gt;Manfred&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;When did you first start working on Trino?&lt;/li&gt;
  &lt;li&gt;What’s your proudest contribution to the project?&lt;/li&gt;
  &lt;li&gt;Have a funny story you’ve wanted to share with the world?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;james&quot;&gt;James&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;When did you first start working on Trino?&lt;/li&gt;
  &lt;li&gt;What’s your proudest contribution to the project?&lt;/li&gt;
  &lt;li&gt;Why the Bob Ross avatar?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;pr-of-the-episode-16753-improve-topn-row-number--rank-performance&quot;&gt;PR of the episode: &lt;a href=&quot;https://github.com/trinodb/trino/pull/16753&quot;&gt;16753: Improve TopN row number / rank performance&lt;/a&gt;&lt;/h2&gt;

&lt;p&gt;We normally focus on flashy and user-facing PRs for the PR of the episode, but
this week, courtesy of our guest James, we’re going to highlight something that
better represents the more routine work that’s going on in Trino all the time:
a performance improvement.&lt;/p&gt;

&lt;h2 id=&quot;trino-events&quot;&gt;Trino events&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://www.starburst.io/info/trinofest/&quot;&gt;Trino Fest&lt;/a&gt; is coming up in just a
couple months. Register to attend or
&lt;a href=&quot;https://sessionize.com/trino-fest-2023&quot;&gt;sign up to submit a talk&lt;/a&gt; if you have
something to share!&lt;/p&gt;

&lt;p&gt;If you have an event that is related to Trino, let us know so we can add it to
the &lt;a href=&quot;https://trino.io/community.html#events&quot;&gt;Trino events calendar&lt;/a&gt;. Kevin Haley’s
&lt;a href=&quot;https://www.meetup.com/boston-data-engineering/events/291662797/&quot;&gt;Getting to Know Trino&lt;/a&gt;
in Boston was a great success, and we’d love to hear from other Trino community 
members who’d be interested in hosting other events!&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>46: Trino heats up with Ignite</title>
      <link href="https://trino.io/episodes/46.html" rel="alternate" type="text/html" title="46: Trino heats up with Ignite" />
      <published>2023-03-15T00:00:00+00:00</published>
      <updated>2023-03-15T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/46</id>
      <content type="html" xml:base="https://trino.io/episodes/46.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Manfred Moser, Director of Information Engineering at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/jian-chen-7aa3a2225/&quot;&gt;Jason&lt;/a&gt;, Senior Data
Engineer at Shopee.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-408-410&quot;&gt;Releases 408-410&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-408.html&quot;&gt;Trino 408&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New Apache Ignite connector!&lt;/li&gt;
  &lt;li&gt;Add support for writing decimal types to BigQuery.&lt;/li&gt;
  &lt;li&gt;Improve performance when reading structural types from Parquet files in Delta Lake.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-409.html&quot;&gt;Trino 409&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for nested fields in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DROP COLUMN&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Support for sorted tables in Iceberg.&lt;/li&gt;
  &lt;li&gt;Support for time type in Cassandra.&lt;/li&gt;
  &lt;li&gt;Faster aggregations containing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DISTINCT&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Faster &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIKE&lt;/code&gt; with dynamic patterns.&lt;/li&gt;
&lt;/ul&gt;
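&lt;p&gt;For the sorted Iceberg tables, the sort order is declared with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sorted_by&lt;/code&gt; table property at creation time - a sketch, with catalog, schema, and column names as placeholders:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- write data files sorted by order_date
CREATE TABLE example.sales.orders (
    order_id BIGINT,
    order_date DATE
)
WITH (sorted_by = ARRAY['order_date']);
&lt;/code&gt;&lt;/pre&gt;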

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-410.html&quot;&gt;Trino 410&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sheet&lt;/code&gt; table function in Google Sheets.&lt;/li&gt;
  &lt;li&gt;Better file pruning in Iceberg.&lt;/li&gt;
&lt;/ul&gt;
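&lt;p&gt;A sketch of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sheet&lt;/code&gt; table function, assuming a Google Sheets catalog named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;example&lt;/code&gt; and a placeholder spreadsheet ID:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- read an arbitrary sheet by its spreadsheet ID
SELECT *
FROM TABLE(example.system.sheet(
    id =&gt; 'googleSheetIdHere'));
&lt;/code&gt;&lt;/pre&gt;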

&lt;h2 id=&quot;introducing-the-ignite-connector-to-trino&quot;&gt;Introducing the Ignite connector to Trino&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;https://trino.io/docs/current/connector/ignite.html&quot;&gt;Trino Ignite connector&lt;/a&gt;
was added a couple of releases ago in Trino 408. It’s not every day that we add
a new connector to Trino, so the topic of today’s episode is exploring the
connector, what it does, and what its use cases are. After that, we talk
about the process of coming in as an outside engineer and contributing an
entirely new connector to Trino.&lt;/p&gt;

&lt;h2 id=&quot;what-is-ignite&quot;&gt;What is Ignite?&lt;/h2&gt;

&lt;p&gt;Apache Ignite is an in-memory distributed database, comparable to others you may
be familiar with, like Redis and SingleStore. If you’re not familiar with them or
with in-memory computing, the gist is that by using RAM instead of
disk storage, you can create a database system that is &lt;em&gt;much&lt;/em&gt; faster - the
Ignite website advertises 10-1000x improvements. Of course, this is more
expensive, too, so it thrives in settings where performance is critical.&lt;/p&gt;

&lt;p&gt;With an initial release 7 years ago, Ignite is still a relative newcomer among
in-memory databases, and it comes with modern bells and whistles that position
it as a successor to the other, comparable databases mentioned above. It also
has some key functionality that sets it apart, including a fully-distributed
architecture that can also use disk storage, allowing it to scale
horizontally.&lt;/p&gt;

&lt;h2 id=&quot;contributing-the-ignite-connector&quot;&gt;Contributing the Ignite connector&lt;/h2&gt;

&lt;p&gt;The Trino community and developers try their best to be active reviewers,
collaborators, and participants on pull requests coming in from outside
contributors. Massive contributions like the Ignite connector can take a lot of
round trips, back-and-forth discussion, and work from both the contributor and
the project’s maintainers to get it into a state where it is ready to merge and
go live for users to try out.&lt;/p&gt;

&lt;p&gt;To give you an idea,
&lt;a href=&quot;https://github.com/trinodb/trino/pull/8323&quot;&gt;the pull request (PR) to contribute Ignite&lt;/a&gt;
was opened in mid-June 2021. It received immediate feedback from a couple of
maintainers and went through a few round trips with amendments, re-reviews, more
edits, and then further reviews. But in an open source environment, each round
trip tends to take longer and longer. Progress stalled in November 2021, and
neither Jason nor the maintainers poked the Ignite PR for nearly a year. In
October 2022, as part of Trino DevRel’s roundup of stale and out-of-date pull
requests, we bumped back into the work that Jason had done. The wheels began to
turn again, starting slowly but picking up the pace, until the PR returned to
full and active development, with several maintainers checking in frequently
until the connector was ready to go. But that’s the story from an observer, and
we’ve got Jason here to go into more detail.&lt;/p&gt;

&lt;h3 id=&quot;questions-for-jason&quot;&gt;Questions for Jason&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;How was the Trino review process?&lt;/li&gt;
  &lt;li&gt;Were there any major lessons you picked up along the way?&lt;/li&gt;
  &lt;li&gt;What tips would you give to someone else looking to add something into Trino?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;pr-of-the-episode-13493-add-support-for-migrate-procedure-in-iceberg&quot;&gt;PR of the episode: #13493: Add support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;migrate&lt;/code&gt; procedure in Iceberg&lt;/h2&gt;

&lt;p&gt;If you’ve been in the data space for a while, you may know that there’s a bit of
a prevailing current in migrating from Hive to Iceberg. Out with the old, in
with the new, and in with the performance gains. &lt;a href=&quot;https://github.com/ebyhr&quot;&gt;Yuya Ebihara&lt;/a&gt;,
one of the Trino maintainers,
&lt;a href=&quot;https://github.com/trinodb/trino/pull/13493&quot;&gt;has added a table procedure to Trino’s Iceberg connector&lt;/a&gt;
to make that process much, much simpler. Rather than a slow, manual, and arduous
process, if you have a Hive table stored in a file format supported by Iceberg,
it’s now as simple as calling the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;migrate&lt;/code&gt; table procedure and letting it run.
The procedure copies the schema, partitioning, properties, and location of the
source table, then registers the source table’s existing data files in the new
Iceberg metadata, converting it all to the Iceberg format without rewriting the
data. Neat, right?&lt;/p&gt;
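&lt;p&gt;A minimal sketch of calling the procedure, assuming an Iceberg catalog named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iceberg&lt;/code&gt; and placeholder schema and table names:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- convert a Hive table to Iceberg in place
CALL iceberg.system.migrate(
    schema_name =&gt; 'example_schema',
    table_name =&gt; 'orders');
&lt;/code&gt;&lt;/pre&gt;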

&lt;h2 id=&quot;more-about-ignite&quot;&gt;More about Ignite&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://ignite.apache.org/&quot;&gt;Check out the Ignite website!&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/ApacheIgnite&quot;&gt;Ignite on Twitter&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/showcase/apache-ignite/&quot;&gt;Ignite on LinkedIn&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;trino-events&quot;&gt;Trino events&lt;/h2&gt;

&lt;p&gt;If you have an event that is related to Trino, let us know so we can add it to
the &lt;a href=&quot;https://trino.io/community.html#events&quot;&gt;Trino events calendar&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Kevin Haley will be hosting an in-person event,
&lt;a href=&quot;https://www.meetup.com/boston-data-engineering/events/291662797/&quot;&gt;Getting to Know Trino&lt;/a&gt;,
in Boston, Massachusetts on Wednesday, April 5. You need to register in advance,
so if you’re in the Boston area and interested in attending, go sign up!&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Check out the in-person and virtual
&lt;a href=&quot;https://www.meetup.com/pro/trino-community/&quot;&gt;Trino Meetup groups&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>45: Trino swimming with the DolphinScheduler</title>
      <link href="https://trino.io/episodes/45.html" rel="alternate" type="text/html" title="45: Trino swimming with the DolphinScheduler" />
      <published>2023-02-23T00:00:00+00:00</published>
      <updated>2023-02-23T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/45</id>
      <content type="html" xml:base="https://trino.io/episodes/45.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Brian Olsen, Developer Advocate at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/bitsondatadev&quot;&gt;@bitsondatadev&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate at
  &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/davidzollo/&quot;&gt;David Zollo&lt;/a&gt;, Apache
DolphinScheduler PMC Chair&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/zhongjiajie/&quot;&gt;Jay Chung&lt;/a&gt;,  Apache
DolphinScheduler PMC Member&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/niko-zeng/&quot;&gt;Niko Zeng&lt;/a&gt;,  Apache
DolphinScheduler Community Manager&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/williamk2000/&quot;&gt;William Guo&lt;/a&gt;, Apache Software 
Foundation Member&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;recap-of-trino-in-2022&quot;&gt;Recap of Trino in 2022&lt;/h2&gt;

&lt;p&gt;Highlights from the blog post &lt;a href=&quot;/blog/2023/01/10/trino-2022-the-rabbit-reflects.html&quot;&gt;The rabbit reflects on Trino in 2022&lt;/a&gt; touch upon various aspects.&lt;/p&gt;

&lt;h2 id=&quot;release-407&quot;&gt;Release 407&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-407.html&quot;&gt;Trino 407&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved performance for highly selective queries.&lt;/li&gt;
  &lt;li&gt;Improved performance when reading numeric, string and timestamp
values from Parquet files.&lt;/li&gt;
  &lt;li&gt;New &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query&lt;/code&gt; table function for full query pass-through in Cassandra.&lt;/li&gt;
  &lt;li&gt;New &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unregister_table&lt;/code&gt; procedure in Delta Lake and Iceberg.&lt;/li&gt;
  &lt;li&gt;Support for writing to the change data feed in Delta Lake.&lt;/li&gt;
&lt;/ul&gt;
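&lt;p&gt;As a sketch of two of those features, assuming a Cassandra catalog named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;example&lt;/code&gt; and an Iceberg catalog named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iceberg&lt;/code&gt;, with placeholder keyspace, schema, and table names:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- pass a query through unchanged to Cassandra
SELECT *
FROM TABLE(example.system.query(
    query =&gt; 'SELECT * FROM my_keyspace.my_table'));

-- drop table metadata without deleting the underlying data
CALL iceberg.system.unregister_table(
    schema_name =&gt; 'example_schema',
    table_name =&gt; 'orders');
&lt;/code&gt;&lt;/pre&gt;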

&lt;p&gt;Cole’s comments:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;For our contributors, we added a new action to track and ping the developer
relations team on stale pull requests to further prompt maintainers to take a
look. This doesn’t have any immediate impact on end users, but it’ll improve
the development and contribution process.&lt;/li&gt;
  &lt;li&gt;A Kerberos fix for the Kudu connector should make using it much
less of a headache on long-running Trino instances.&lt;/li&gt;
  &lt;li&gt;There were some really sophisticated performance improvements
that came from shifting default config values and adding some new
ones, all of which took a whole lot of testing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the release notes for
&lt;a href=&quot;https://trino.io/docs/current/release/release-407.html&quot;&gt;Trino 407&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;what-is-workflow-orchestration&quot;&gt;What is workflow orchestration?&lt;/h2&gt;

&lt;p&gt;Workflow orchestration refers to the process of coordinating and automating
complex sequences of operations, known as workflows, that consist of multiple
interdependent tasks. This involves designing and defining the workflow,
scheduling and executing the tasks, monitoring the progress and outcomes, and
handling any errors or exceptions that may arise. In the context of Trino, the
tasks are typically the processing of SQL queries on one or more Trino clusters
and other related systems to create a data pipeline or similar automation.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/episode/45/data-pipelines.png&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;why-do-we-need-a-workflow-orchestration-tool-for-building-a-data-lake&quot;&gt;Why do we need a workflow orchestration tool for building a data lake?&lt;/h2&gt;

&lt;p&gt;Building a data lake can involve many complex and interdependent data processing
tasks, which can be challenging to manage and scale without a workflow
orchestration tool. Sometimes we consider tools like Trino the center of
the universe, and perhaps it would be easier to schedule SQL queries with a much
simpler tool. Most companies, however, require a larger variety of tasks to
build a data lake, involving more than just running SQL on Trino. Even
if you primarily run Trino SQL scripts for these jobs, it is better to have
an orchestration tool than to manage all processes manually.&lt;/p&gt;

&lt;h2 id=&quot;what-is-apache-dolphinscheduler&quot;&gt;What is Apache DolphinScheduler?&lt;/h2&gt;

&lt;p&gt;&lt;img width=&quot;75%&quot; src=&quot;/assets/episode/45/dolphin-scheduler.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Apache DolphinScheduler is an open-source, distributed workflow scheduling
platform designed to manage and execute batch jobs, data pipelines, and ETL
processes. DolphinScheduler enables users to easily create and manage runs of
consecutive jobs, with support for different types of tasks, such as SQL
statements, shell scripts, Spark jobs, Kubernetes deployments, and many others.
In short, it’s a powerful and user-friendly workflow orchestration platform that
enables users to automate and manage their complex data processing tasks.&lt;/p&gt;

&lt;p&gt;Read &lt;a href=&quot;https://blog.devgenius.io/dolphinscheduler-helps-trino-quickly-realize-the-integrated-data-construction-of-lake-and-warehouse-cde095b6573b&quot;&gt;this blog on Trino and Apache DolphinScheduler&lt;/a&gt;
to find out more.&lt;/p&gt;

&lt;h3 id=&quot;does-dolphinscheduler-have-any-computing-engine-or-storage-layer&quot;&gt;Does DolphinScheduler have any computing engine or storage layer?&lt;/h3&gt;

&lt;p&gt;DolphinScheduler is a powerful tool for managing and orchestrating data
processing workflows across a range of computing engines and storage systems,
but it does not provide its own computing or storage capabilities.&lt;/p&gt;

&lt;h2 id=&quot;what-are-the-differences-to-other-workflow-orchestration-systems&quot;&gt;What are the differences to other workflow orchestration systems?&lt;/h2&gt;

&lt;p&gt;Airflow is the incumbent de facto workflow orchestrator. Many data engineers
currently rely on Airflow to handle their workflow orchestration, so it
helps to understand DolphinScheduler’s benefits in relation to Airflow. Both
DolphinScheduler and Airflow are designed to be scalable and highly available
to support large-scale distributed environments.&lt;/p&gt;

&lt;p&gt;Airflow supports a wide range of third-party integrations, including popular
data processing frameworks such as Trino, Spark, and Flink, as well as
cloud services such as AWS and Google Cloud. DolphinScheduler supports a
similar range of data processing frameworks and tools. This makes both platforms
suitable for managing diverse data processing tasks.&lt;/p&gt;

&lt;p&gt;The DolphinScheduler project believes that future data governance belongs to
data engineers and consumers alike and should not be centralized in a single
team. Product-focused engineering teams should have access to data and be able
to orchestrate workflows without the need for extensive coding skills.
DolphinScheduler uses a drag-and-drop web UI to create and manage workflows,
while also providing programmatic access through tools like a Python SDK and an
open API.&lt;/p&gt;

&lt;p&gt;Because DolphinScheduler supports users outside the data team through its UI,
it also offers robust security features, including authentication,
authorization, and data encryption, to ensure that users’ data and workflows
are protected.&lt;/p&gt;

&lt;p&gt;DolphinScheduler has relatively limited documentation and community support
since it is a newer project, but the community is working hard to improve the
developer experience and documentation.&lt;/p&gt;

&lt;h2 id=&quot;how-does-dolphinscheduler-deal-with-failures&quot;&gt;How does DolphinScheduler deal with failures?&lt;/h2&gt;

&lt;p&gt;Failure is an inevitable aspect of data workflow orchestration. The merits of
many of these orchestration tools come from how well they aid users in
responding to failures by monitoring health and notifying users when things go
wrong.&lt;/p&gt;

&lt;h3 id=&quot;does-dolphinscheduler-have-an-alarm-mechanism-itself&quot;&gt;Does DolphinScheduler have an alarm mechanism itself?&lt;/h3&gt;

&lt;p&gt;Apache DolphinScheduler supports user notifications as part of a workflow. This
mechanism is designed to help users monitor and manage their workflows more
effectively and respond quickly to any issues.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/episode/45/alerts.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;These alerts can be configured to notify users via email, SMS, or other
communication channels, and can include details such as the name of the
workflow, the name of the failed task, and the error message or stack trace
associated with the failure.&lt;/p&gt;

&lt;p&gt;In addition to these configurable alerts, DolphinScheduler provides a dashboard
for monitoring the status and progress of workflows and tasks. It includes
real-time updates and visualizations of workflow performance and status. The
dashboard helps users quickly identify any issues or bottlenecks in their
workflows and take corrective action as needed.&lt;/p&gt;

&lt;p&gt;&lt;img width=&quot;80%&quot; src=&quot;/assets/episode/45/monitoring.png&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-episode-creating-a-simple-trino-workflow-in-dolphinscheduler&quot;&gt;Demo of the episode: Creating a simple Trino workflow in DolphinScheduler&lt;/h2&gt;

&lt;p&gt;For this episode’s demo, we look at creating a workflow consisting of a Trino
query execution managed by DolphinScheduler.&lt;/p&gt;

&lt;p&gt;Run the demo by following 
&lt;a href=&quot;https://github.com/bitsondatadev/trino-getting-started/tree/main/community_tutorials/dolphinscheduler&quot;&gt;the steps listed&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-episode-improve-performance-of-parquet-files&quot;&gt;PR of the episode: Improve performance of Parquet files&lt;/h2&gt;

&lt;p&gt;While we’re on the topic of data lakes, release 407 included several Parquet
performance improvements from contributor and maintainer
&lt;a href=&quot;https://github.com/raunaqmorarka&quot;&gt;@raunaqmorarka&lt;/a&gt;. These changes improve
the performance of reading Parquet files for
&lt;a href=&quot;https://github.com/trinodb/trino/issues/15713&quot;&gt;decimal types&lt;/a&gt;,
&lt;a href=&quot;https://github.com/trinodb/trino/issues/15850&quot;&gt;numeric types&lt;/a&gt;,
&lt;a href=&quot;https://github.com/trinodb/trino/issues/15923&quot;&gt;string types&lt;/a&gt;, and
&lt;a href=&quot;https://github.com/trinodb/trino/issues/15954&quot;&gt;timestamp and boolean types&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;While Trino has historically had better performance with the ORC format, the
Parquet file format has grown drastically in popularity, and this is one of
many examples of the improving support for Parquet files in data lakes.&lt;/p&gt;

&lt;h2 id=&quot;find-out-more-about-dolphinscheduler&quot;&gt;Find out more about DolphinScheduler&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://dolphinscheduler.apache.org/&quot;&gt;https://dolphinscheduler.apache.org/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/apache/dolphinscheduler&quot;&gt;https://github.com/apache/dolphinscheduler&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/dolphinschedule&quot;&gt;https://twitter.com/dolphinschedule&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;trino-events&quot;&gt;Trino events&lt;/h2&gt;

&lt;p&gt;If you have an event that is related to Trino, let us know so we can add it to
the &lt;a href=&quot;https://trino.io/community.html#events&quot;&gt;Trino events calendar&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Check out the in-person and virtual
&lt;a href=&quot;https://www.meetup.com/pro/trino-community/&quot;&gt;Trino Meetup groups&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>44: Seeing clearly with Metabase</title>
      <link href="https://trino.io/episodes/44.html" rel="alternate" type="text/html" title="44: Seeing clearly with Metabase" />
      <published>2023-01-26T00:00:00+00:00</published>
      <updated>2023-01-26T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/44</id>
      <content type="html" xml:base="https://trino.io/episodes/44.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Manfred Moser, Director of Information Engineering at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/luispaolini/&quot;&gt;Luis Paolini&lt;/a&gt;, Success Engineer at
&lt;a href=&quot;https://www.metabase.com/&quot;&gt;Metabase&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/andrewdibiasio/&quot;&gt;Andrew DiBiasio&lt;/a&gt;, Software
Engineer at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/piotrleniartek&quot;&gt;Piotr Leniartek&lt;/a&gt;, Product Manager
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;recap-of-trino-in-2022&quot;&gt;Recap of Trino in 2022&lt;/h2&gt;

&lt;p&gt;The blog post &lt;a href=&quot;/blog/2023/01/10/trino-2022-the-rabbit-reflects.html&quot;&gt;The rabbit reflects on Trino in 2022&lt;/a&gt; covers many highlights:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Lots of growth for the community, celebrating 10 years of Trino&lt;/li&gt;
  &lt;li&gt;Trino Summit, Cinco de Trino, Trino Community Broadcast, and more content&lt;/li&gt;
  &lt;li&gt;Trino: The Definitive Guide second edition&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Lots of Trino releases and new features:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; support&lt;/li&gt;
  &lt;li&gt;JSON functions&lt;/li&gt;
  &lt;li&gt;Table functions&lt;/li&gt;
  &lt;li&gt;Fault-tolerant execution&lt;/li&gt;
  &lt;li&gt;Upgrade to Java 17&lt;/li&gt;
  &lt;li&gt;New Delta Lake, Hudi, and MariaDB connectors&lt;/li&gt;
  &lt;li&gt;Tons and tons of performance improvements&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-404-to-406&quot;&gt;Releases 404 to 406&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-404.html&quot;&gt;Trino 404&lt;/a&gt; not found&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-405.html&quot;&gt;Trino 405&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER COLUMN ... SET DATA TYPE&lt;/code&gt; statement.&lt;/li&gt;
  &lt;li&gt;Support for Apache Arrow when reading from BigQuery.&lt;/li&gt;
  &lt;li&gt;Support for views in the Delta Lake connector.&lt;/li&gt;
  &lt;li&gt;Support for the Iceberg REST catalog.&lt;/li&gt;
  &lt;li&gt;Support for Protobuf encoding in the Kafka connector.&lt;/li&gt;
  &lt;li&gt;Support for fault-tolerant execution in the MongoDB connector.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt; and query pushdown in the Redshift connector.&lt;/li&gt;
  &lt;li&gt;Performance improvements when reading Parquet data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-406.html&quot;&gt;Trino 406&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for JDBC catalog in the Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Support for fault-tolerant execution in the BigQuery connector.&lt;/li&gt;
  &lt;li&gt;Support for exchange spooling on HDFS.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CHECK&lt;/code&gt; constraints with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; statements.&lt;/li&gt;
  &lt;li&gt;Improved performance for Parquet files with the Delta Lake, Hive, Hudi and
Iceberg connectors.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the release notes for
&lt;a href=&quot;https://trino.io/docs/current/release/release-405.html&quot;&gt;Trino 405&lt;/a&gt;,
and
&lt;a href=&quot;https://trino.io/docs/current/release/release-406.html&quot;&gt;Trino 406&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We also shipped trino-python-client 0.321.0 with the following improvements:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for SQLAlchemy 2.0.&lt;/li&gt;
  &lt;li&gt;Add support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;varbinary&lt;/code&gt; query parameters.&lt;/li&gt;
  &lt;li&gt;Add support for variable precision &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;datetime&lt;/code&gt; types.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;what-is-metabase&quot;&gt;What is Metabase&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;../assets/images/logos/metabase-small.png&quot; align=&quot;right&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.metabase.com/&quot;&gt;Metabase&lt;/a&gt; is the easy, open-source BI tool with the
friendly UX and integrated tooling to let your company explore data on their
own. Everyone in your company can ask questions and learn from your data.&lt;/p&gt;

&lt;p&gt;Running Metabase locally is easy. Try with a container runtime and the 300 MB
image:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;docker run -it -p 3000:3000 metabase/metabase
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Or use a JVM and the 260 MB single JAR file:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;wget https://downloads.metabase.com/latest/metabase.jar
java -jar metabase.jar
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You can go from zero to dashboard in under 6 minutes - &lt;a href=&quot;https://www.metabase.com/demo&quot;&gt;learn more from the
demo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;../assets/episode/44/metabase-screenshot.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Core features and advantages of Metabase include the following:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Visual query builder&lt;/li&gt;
  &lt;li&gt;Dashboards&lt;/li&gt;
  &lt;li&gt;Models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Metabase is a web-based application that you run on a server. You can make it
available to multiple users. It uses SQL to create queries, reports,
visualizations, dashboards, and more.&lt;/p&gt;

&lt;p&gt;You can host it yourself locally, run it in your own datacenter or use the
cloud:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;../assets/episode/44/metabase-self-hosted.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/episode/44/metabase-cloud-hosted.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Metabase is an open source project licensed under the GNU Affero General Public
License (AGPL). It is written in Clojure and therefore runs on the Java
virtual machine.&lt;/p&gt;

&lt;p&gt;Following is a high-level architecture diagram:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;../assets/episode/44/metabase-architecture.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Metabase is also the name of the company, founded in 2014. It provides an
expanded version under a commercial license, a SaaS version of the application,
support and other services, and manages the open source project.&lt;/p&gt;

&lt;p&gt;Metabase runs in more than 50,000 instances around the world, including over
2,000 using the SaaS version.&lt;/p&gt;

&lt;h2 id=&quot;history-of-metabase-and-trino&quot;&gt;History of Metabase and Trino&lt;/h2&gt;

&lt;p&gt;Metabase was first released in 2015 as version 0.9. Since the initial release it
has grown to be a well known and widely used BI application.&lt;/p&gt;

&lt;p&gt;A Presto driver was created in 2018. It directly integrated with the client REST
API. With the rename of Presto to Trino, Manfred &lt;a href=&quot;https://github.com/metabase/metabase/pull/15160&quot;&gt;created a
PR&lt;/a&gt; that replicates this for
Trino to ensure continued support for the community. In the discussion it was
decided that it would be better to use the Trino JDBC driver, similar to how
other drivers for Metabase work.&lt;/p&gt;

&lt;p&gt;After further demand from the user and customer community, Starburst and
Metabase established a collaboration and started implementation of the current
driver. Piotr led the charge, Andrew buckled down and learned Clojure, and
together they created and tested a first release. The driver is now provided as
an open source project managed by Starburst.&lt;/p&gt;

&lt;h2 id=&quot;core-advantages-of-using-metabase-with-trino&quot;&gt;Core advantages of using Metabase with Trino&lt;/h2&gt;

&lt;p&gt;With Metabase and the driver for Trino, Trino users have access to a well
established and proven open source BI tool. It is suitable for internal usage in
any organization, and users can upgrade to the commercial version for more
demanding deployments and use cases.&lt;/p&gt;

&lt;p&gt;The combination of Trino and Metabase also provides a number of unique benefits
for Metabase users that are not available with typical Metabase drivers, which
each connect to a single SQL database and are limited to that specific
database.&lt;/p&gt;

&lt;p&gt;With Trino and the driver, you have access to the following unique features:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Metabase users can connect to databases that do not yet have a Metabase driver,
but are supported by Trino.&lt;/li&gt;
  &lt;li&gt;Trino also enables using SQL for systems that don’t support SQL, such as MongoDB
or Elasticsearch, and therefore allows Metabase usage with these systems.&lt;/li&gt;
  &lt;li&gt;With Trino you can join data from different catalogs in the same SQL query.
This also applies to Metabase reports or visualizations.&lt;/li&gt;
&lt;/ul&gt;
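
&lt;p&gt;As a sketch of the last point, a single federated query can join tables from
two different catalogs. The catalog, schema, and table names here are
hypothetical examples, not a specific setup:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT o.order_id, c.name
FROM postgresql.public.orders o
JOIN mongodb.app.customers c ON o.customer_id = c.id;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Any report or visualization built on such a query in Metabase spans both
systems transparently.&lt;/p&gt;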

&lt;blockquote&gt;
  &lt;p&gt;Can I join multiple engines? Yes &lt;br /&gt;
Can I join SQL and no-SQL engines? YES!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Elasticsearch, Google Spreadsheets, Cassandra, Redis, and others are all
accessible with Trino. Specifically, this also opens up querying object storage
data lakes on S3 and other systems with the Hive, Delta Lake, Iceberg, and Hudi
connectors - all from Metabase.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;../assets/episode/44/metabase-trino-datasources.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Metabase also includes support for access control for any connected datasource,
all the way down to row-level security. This includes Trino, and can be used to
secure Trino access through Metabase for a large group of your Trino users, such
as all BI users. It can even be used to add row-level security for NoSQL
databases.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;../assets/episode/44/metabase-no-sql-security.png&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-episode-metabase-and-trino&quot;&gt;Demo of the episode: Metabase and Trino&lt;/h2&gt;

&lt;p&gt;Luis shows us the demo from his repository at
&lt;a href=&quot;https://github.com/paoliniluis/metabase-trino&quot;&gt;https://github.com/paoliniluis/metabase-trino&lt;/a&gt;.
Watch our video to see it in action, and check out the instructions in the
repository to try it yourself.&lt;/p&gt;

&lt;h2 id=&quot;real-world-use-cases-at-meesho&quot;&gt;Real world use cases at Meesho&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;../assets/images/logos/meesho-small.png&quot; align=&quot;right&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.meesho.com/&quot;&gt;Meesho&lt;/a&gt; is India’s fastest growing internet commerce
company. They provide a large retail website and support small business
entrepreneurs with their platform.&lt;/p&gt;

&lt;p&gt;Meesho relies on Trino, Metabase, and the Trino Metabase driver from
Starburst for their data platform.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;../assets/episode/44/meesho-architecture.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Piotr and Luis share more details:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Meesho needs the ability to query the lake with high speed, concurrency, and
scale. This was not possible before Trino, in the form of Starburst Enterprise,
and Metabase were introduced.&lt;/li&gt;
  &lt;li&gt;Meesho has observed more than 13 million queries from Metabase in 10 months.&lt;/li&gt;
  &lt;li&gt;Meesho uses Metabase to add security and governance for the data assets.&lt;/li&gt;
  &lt;li&gt;A next planned step is to integrate with &lt;a href=&quot;https://www.metabase.com/docs/latest/data-modeling/models#enable-model-caching-in-metabase&quot;&gt;Metabase Model
Caching&lt;/a&gt;
to improve user experience even more.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;pr-of-the-episode&quot;&gt;PR of the episode&lt;/h2&gt;

&lt;p&gt;Let’s explore the code a bit, instead of focusing on a specific PR. The whole
driver codebase is open source at
&lt;a href=&quot;https://github.com/starburstdata/metabase-driver&quot;&gt;https://github.com/starburstdata/metabase-driver&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;As mentioned earlier, the whole driver is written in Clojure, and Andrew tells us
more about his experience writing the driver and working with the two systems.&lt;/p&gt;

&lt;p&gt;We also talk about a recent community &lt;a href=&quot;https://github.com/starburstdata/metabase-driver/pull/59&quot;&gt;PR for datetime
functions&lt;/a&gt; and the
ongoing work to support model caching.&lt;/p&gt;

&lt;h2 id=&quot;datanova-and-other-trino-events&quot;&gt;Datanova and other Trino events&lt;/h2&gt;

&lt;p&gt;We invite you all to join us for the &lt;a href=&quot;http://bit.ly/3j2N9Q9&quot;&gt;free, virtual conference
Datanova&lt;/a&gt; from Starburst. Trino and related tools and
approaches are touched upon in many presentations and discussions.&lt;/p&gt;

&lt;p&gt;If you have an event that is related to Trino, let us know so we can add it to
the &lt;a href=&quot;../community.html#events&quot;&gt;Trino events calendar&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Metabase and Trino are a great combination of tools. Together they unlock use
cases that are difficult or impossible to implement with other tools. Give it a
try!&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Check out the in-person and virtual
&lt;a href=&quot;https://www.meetup.com/pro/trino-community/&quot;&gt;Trino Meetup groups&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>43: Trino saves trips with Alluxio</title>
      <link href="https://trino.io/episodes/43.html" rel="alternate" type="text/html" title="43: Trino saves trips with Alluxio" />
      <published>2022-12-15T00:00:00+00:00</published>
      <updated>2022-12-15T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/43</id>
      <content type="html" xml:base="https://trino.io/episodes/43.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Brian Olsen, Developer Advocate at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/bitsondatadev&quot;&gt;@bitsondatadev&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Manfred Moser, Director of Information Engineering at 
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Bin Fan, VP of Open Source at Alluxio and PMC maintainer of Alluxio open 
source and TSC member of Presto (&lt;a href=&quot;https://twitter.com/binfan&quot;&gt;@binfan&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/beinan/&quot;&gt;Beinan Wang&lt;/a&gt;, Software Engineer at 
Alluxio and Presto committer&lt;/li&gt;
&lt;/ul&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/43/alluxio-trino.jpeg&quot; /&gt;
&lt;br /&gt;
The Alluxio crew at Trino Summit 2022. &lt;br /&gt;
From left to right:
&lt;a href=&quot;https://www.linkedin.com/in/beinan/&quot;&gt;Beinan Wang&lt;/a&gt;,
&lt;a href=&quot;https://www.linkedin.com/in/bin-fan/&quot;&gt;Bin Fan&lt;/a&gt;,
&lt;a href=&quot;https://www.linkedin.com/in/bitsondatadev/&quot;&gt;Brian Olsen&lt;/a&gt;,
&lt;a href=&quot;https://www.linkedin.com/in/dennyglee/&quot;&gt;Denny Lee&lt;/a&gt;,
&lt;a href=&quot;https://www.linkedin.com/in/hopechong/&quot;&gt;Hope Wang&lt;/a&gt;,
&lt;a href=&quot;https://www.linkedin.com/in/jasminechenwang/&quot;&gt;Jasmine Wang&lt;/a&gt;.
&lt;br /&gt;
Somehow Denny Lee from &lt;a href=&quot;https://delta.io/&quot;&gt;Delta Lake&lt;/a&gt; snuck in there
😉. Love the data community vibes on this one.

&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-episode-data-caching-and-orchestration&quot;&gt;Concept of the episode: Data caching and orchestration&lt;/h2&gt;

&lt;p&gt;Out of all those petabytes of data you store, only a small fraction of it is
creating business value for you today. When you scan the same data multiple
times and transfer it over the wire, you’re wasting time, compute cycles, and
ultimately money. This gets worse when you’re pulling data across regions or
clouds from disaggregated Trino clusters. In situations like these, caching
solutions can make a tremendous impact on the latency and cost of your queries.&lt;/p&gt;

&lt;h3 id=&quot;trino-without-caching&quot;&gt;Trino without caching&lt;/h3&gt;

&lt;p&gt;There seems to be a sizeable portion of the community who aren’t using a
caching solution. Not all workloads will really benefit from caching. If you
are performing more writes than reads, the cache will need to constantly be
invalidated before performing each read. If you are scanning all your data to
run daily migrations, you would not benefit from caching. However, one of the
most common use cases where Trino shines is interactive ad hoc analytics. This 
type of querying is very fast in Trino, especially when using modern storage 
formats like Iceberg.&lt;/p&gt;

&lt;h3 id=&quot;two-types-of-caching&quot;&gt;Two types of caching&lt;/h3&gt;

&lt;p&gt;There are two types of caching used with Trino. The first type caches the
results of a common query or subquery, so that any query whose predicates
overlap can reuse the cached results.&lt;/p&gt;

&lt;p&gt;The other type is object or file caching. Rather than storing the results of
a query, you cache the files from a file or object store that are
scanned as part of the query.&lt;/p&gt;

&lt;p&gt;In this episode, we will focus on the latter type of caching. This will apply to
connectors like Hive, Iceberg, Delta Lake, and Hudi.&lt;/p&gt;

&lt;h3 id=&quot;hive-connector-caching&quot;&gt;Hive connector caching&lt;/h3&gt;

&lt;p&gt;Trino has an &lt;a href=&quot;https://trino.io/docs/current/connector/hive-caching.html&quot;&gt;embedded caching engine&lt;/a&gt;
in the Hive connector. This is convenient, as it ships with Trino; however, it
does not work outside the Hive connector. The caching engine is
&lt;a href=&quot;https://github.com/qubole/rubix&quot;&gt;Rubix&lt;/a&gt;. While this system works for simple
Hive use cases, it fails to address use cases outside of Hive and hasn’t been
maintained since 2020. Many features are missing, such as security features
and support for more compute engines.&lt;/p&gt;
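
&lt;p&gt;For reference, enabling the embedded cache in a Hive catalog takes roughly the
following catalog properties; the cache location here is an example path:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;hive.cache.enabled=true
hive.cache.location=/opt/hive-cache
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;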

&lt;h3 id=&quot;what-is-alluxio&quot;&gt;What is Alluxio?&lt;/h3&gt;

&lt;p&gt;Alluxio is the world’s first open source data orchestration technology for
analytics and AI in the cloud. It provides a common interface enabling
computation frameworks to connect to numerous storage systems.
Alluxio’s memory-first tiered architecture enables data access at speeds orders
of magnitude faster than existing solutions. Alluxio was originally developed at
the Berkeley AMPLab, &lt;a href=&quot;https://amplab.cs.berkeley.edu/wp-content/uploads/2014/11/2014_socc_tachyon.pdf&quot;&gt;and was originally called Tachyon&lt;/a&gt;.
Tachyon was less focused on caching and data orchestration and more focused on
fault tolerance via lineage and other techniques borrowed from Spark.&lt;/p&gt;

&lt;p&gt;Alluxio lies between data driven applications, such as Trino and Apache Spark,
and various persistent storage systems, such as Amazon S3, Google Cloud Storage,
HDFS, Ceph, and MinIO. Alluxio unifies the data stored in these different
storage systems, presenting unified client APIs and a global namespace to its
upper layer data driven applications.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/43/alluxio-architecture.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Alluxio is commonly used as a distributed shared caching service so compute
engines talking to Alluxio can transparently cache frequently accessed data,
especially from remote locations, to provide in-memory I/O throughput. Alluxio
also enables unifying all data storage under a single namespace. This can make
things simpler if your data is stored across different systems, different
regions, or different clouds.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/43/inside-alluxio.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Source: &lt;a href=&quot;https://docs.alluxio.io/os/user/stable/en/Overview.html&quot;&gt;https://docs.alluxio.io/os/user/stable/en/Overview.html&lt;/a&gt;&lt;/p&gt;

&lt;h3 id=&quot;what-is-data-orchestration&quot;&gt;What is data orchestration?&lt;/h3&gt;

&lt;p&gt;A data orchestration platform abstracts data access across storage systems,
virtualizes all the data, and presents the data via standardized APIs with
global namespace to data-driven applications. In the meantime, it should have
caching functionality to enable fast access to warm data. In summary, a data
orchestration platform provides data-driven applications data accessibility,
data locality, and data elasticity.&lt;/p&gt;

&lt;p&gt;Source: &lt;a href=&quot;https://www.alluxio.io/blog/data-orchestration-the-missing-piece-in-the-data-world/&quot;&gt;https://www.alluxio.io/blog/data-orchestration-the-missing-piece-in-the-data-world/&lt;/a&gt;&lt;/p&gt;

&lt;h3 id=&quot;trino-and-alluxio-expedia-use-case&quot;&gt;Trino and Alluxio: Expedia use case&lt;/h3&gt;

&lt;p&gt;Expedia needed the ability to query across clusters in different regions
while simplifying the interface to their local data sources.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;100%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/43/expedia-trino-alluxio.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Source: &lt;a href=&quot;https://www.alluxio.io/blog/unifying-cross-region-access-in-the-cloud-at-expedia-group-the-path-toward-data-mesh-in-the-brand-world/&quot;&gt;Unifying cross-region access in the cloud at Expedia Group — The path toward data mesh in the brand world&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-episode-alluxioalluxio-pr-13000-add-a-doc-for-trino&quot;&gt;PR of the episode: Alluxio/alluxio PR 13000 Add a doc for Trino&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/Alluxio/alluxio/pull/13000&quot;&gt;This episode’s PR&lt;/a&gt; is actually
not located in a Trino repository. This PR comes from the Alluxio repository. It
happened in the wake of the rebranding from Presto to Trino. PRs like this
helped the Trino community grow awareness of the new name, and fixed
potential issues that occurred with the hasty renaming we had to do.&lt;/p&gt;

&lt;p&gt;This was submitted by Alluxio engineer &lt;a href=&quot;https://github.com/yuzhu&quot;&gt;David Zhu&lt;/a&gt;.
A huge thanks to David for his contributions to Trino as well!&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-episode-running-trino-on-alluxio&quot;&gt;Demo of the episode: Running Trino on Alluxio&lt;/h2&gt;

&lt;p&gt;This demo of the episode covers how to configure Alluxio to use write-through
caching to MinIO. This is done using the Iceberg connector, with only one change
to the location property on the table from the Trino perspective.&lt;/p&gt;
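
&lt;p&gt;As a sketch, that single change points the storage location at Alluxio when
creating the schema or table. The host, port, and path here are hypothetical
examples:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE SCHEMA iceberg.logging
WITH (location = 'alluxio://alluxio-leader:19998/logging');
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;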

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/yaxPEWRpEzc&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;To follow this demo, copy the code located in the 
&lt;a href=&quot;https://github.com/bitsondatadev/trino-getting-started/tree/main/community_tutorials/alluxio/trino-alluxio-iceberg-minio&quot;&gt;trino-getting-started repo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Check out the in-person and virtual
&lt;a href=&quot;https://www.meetup.com/pro/trino-community/&quot;&gt;Trino Meetup groups&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>42: Trino Summit 2022 recap</title>
      <link href="https://trino.io/episodes/42.html" rel="alternate" type="text/html" title="42: Trino Summit 2022 recap" />
      <published>2022-11-17T00:00:00+00:00</published>
      <updated>2022-11-17T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/42</id>
      <content type="html" xml:base="https://trino.io/episodes/42.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Brian Olsen, Developer Advocate at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/bitsondatadev&quot;&gt;@bitsondatadev&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate at 
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Manfred Moser, Director of Information Engineering at 
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Brian Zhan, Product Manager at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/brianzhan1&quot;&gt;@brianzhan1&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/claudiusli&quot;&gt;Claudius Li&lt;/a&gt;, Product Manager at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dain Sundstrom, Trino creator and CTO at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/daindumb&quot;&gt;@daindumb&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Martin Traverso, Trino creator and CTO at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/mtraverso&quot;&gt;@mtraverso&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-402-to-403&quot;&gt;Releases 402 to 403&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-402.html&quot;&gt;Trino 402&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for column comments in Hive and Iceberg views.&lt;/li&gt;
  &lt;li&gt;Support for predicate pushdown on temporal types in the MongoDB connector.&lt;/li&gt;
  &lt;li&gt;Faster &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OR&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nullif&lt;/code&gt;, and arithmetic operations in SQL Server connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-403.html&quot;&gt;Trino 403&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt; in MongoDB.&lt;/li&gt;
  &lt;li&gt;Faster aggregations.&lt;/li&gt;
  &lt;li&gt;Faster data transfers with fault-tolerant execution.&lt;/li&gt;
  &lt;li&gt;Faster &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW SCHEMAS&lt;/code&gt; in BigQuery.&lt;/li&gt;
  &lt;li&gt;Faster &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;expire_snapshots&lt;/code&gt; in Apache Iceberg.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the release notes for
&lt;a href=&quot;https://trino.io/docs/current/release/release-402.html&quot;&gt;Trino 402&lt;/a&gt;,
and
&lt;a href=&quot;https://trino.io/docs/current/release/release-403.html&quot;&gt;Trino 403&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;trino-summit-2022-recap&quot;&gt;Trino Summit 2022 recap&lt;/h2&gt;

&lt;p&gt;This episode we’re doing a recap of both the Trino Summit and the first Trino
Contributor Congregation. We dive into what everyone’s favorite Trino Summit
sessions were. Then we cover key takeaways from the Trino Contributor
Congregation, which took place the day after.&lt;/p&gt;

&lt;p&gt;Check out the in-person and virtual
&lt;a href=&quot;https://www.meetup.com/pro/trino-community/&quot;&gt;Trino Meetup groups&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>41: Trino puts on its Hudi</title>
      <link href="https://trino.io/episodes/41.html" rel="alternate" type="text/html" title="41: Trino puts on its Hudi" />
      <published>2022-10-27T00:00:00+00:00</published>
      <updated>2022-10-27T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/41</id>
      <content type="html" xml:base="https://trino.io/episodes/41.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Brian Olsen, Developer Advocate at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/bitsondatadev&quot;&gt;@bitsondatadev&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate at 
 &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Sagar Sumit, Software Engineer at 
 &lt;a href=&quot;https://www.onehouse.ai&quot;&gt;Onehouse&lt;/a&gt; (&lt;a href=&quot;https://twitter.com/sagarsumit6&quot;&gt;@sagarsumit6&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/yueluhelloworld&quot;&gt;Grace (Yue) Lu&lt;/a&gt;, Software
Engineer at &lt;a href=&quot;https://robinhood.com&quot;&gt;Robinhood&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;register-for-trino-summit-2022&quot;&gt;Register for Trino Summit 2022!&lt;/h2&gt;

&lt;p&gt;Trino Summit 2022 is coming around the corner! This &lt;strong&gt;free&lt;/strong&gt; event on November
10th takes place in person at the Commonwealth Club in San Francisco, CA, and
can also be attended remotely!&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/o2MJvRKG14M&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;Read about the recently announced speaker sessions and details in these blog posts:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2022/09/22/trino-summit-2022-teaser.html&quot;&gt;Trino Summit 2022 first post&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2022/10/19/trino-summit-2022-teaser-2.html&quot;&gt;Trino Summit 2022 second post&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;You can register for the conference&lt;/a&gt;
at any time. We must limit in-person registrations to 250 
attendees, so register soon if you plan to attend in person!&lt;/p&gt;

&lt;h2 id=&quot;releases-396-to-401&quot;&gt;Releases 396 to 401&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-396.html&quot;&gt;Trino 396&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved performance when processing strings.&lt;/li&gt;
  &lt;li&gt;Faster writing of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;array&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;map&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;row&lt;/code&gt; types to Parquet.&lt;/li&gt;
  &lt;li&gt;Support for pushing down complex join criteria to connectors.&lt;/li&gt;
  &lt;li&gt;Support for column and table comments in BigQuery connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-397.html&quot;&gt;Trino 397&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;S3 Select pushdown for JSON data in Hive connector.&lt;/li&gt;
  &lt;li&gt;Faster &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date_trunc&lt;/code&gt; predicates over partition columns in Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Reduced query latency with Glue catalog in Iceberg connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-398.html&quot;&gt;Trino 398&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New Hudi connector.&lt;/li&gt;
  &lt;li&gt;Improved performance for Parquet data in Delta Lake, Hive and Iceberg connectors.&lt;/li&gt;
  &lt;li&gt;Support for column comments in Accumulo connector.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;timestamp&lt;/code&gt; type in Pinot connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-399.html&quot;&gt;Trino 399&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Faster joins.&lt;/li&gt;
  &lt;li&gt;Faster reads of decimal values in Parquet data.&lt;/li&gt;
  &lt;li&gt;Support for writing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;array&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;row&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;timestamp&lt;/code&gt; columns in BigQuery.&lt;/li&gt;
  &lt;li&gt;Support for predicate pushdown involving datetime types in MongoDB.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-400.html&quot;&gt;Trino 400&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for TRUNCATE in BigQuery connector.&lt;/li&gt;
  &lt;li&gt;Support for the Pinot proxy.&lt;/li&gt;
  &lt;li&gt;Improved latency when querying Iceberg tables with many files.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-401.html&quot;&gt;Trino 401&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved performance and reliability of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Support for writing to Google Cloud Storage in Delta Lake.&lt;/li&gt;
  &lt;li&gt;Support for IBM Cloud Object Storage in Hive.&lt;/li&gt;
  &lt;li&gt;Support for writes with fault-tolerant execution in MySQL, PostgreSQL, and SQL
Server.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additional highlights worth a mention according to Cole:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The new Hudi connector is worth mentioning twice. It was in the works for a
while, and we’re really excited it has arrived and continues to improve.&lt;/li&gt;
  &lt;li&gt;Trino 396 added support for version three of the Delta Lake writer, then Trino
401 added support for version four, so we’ve jumped from two to four since the
last time you saw us!&lt;/li&gt;
  &lt;li&gt;There have been a ton of fixes to table and column comments across a wide
variety of connectors.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the release notes for
&lt;a href=&quot;https://trino.io/docs/current/release/release-396.html&quot;&gt;Trino 396&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-397.html&quot;&gt;Trino 397&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-398.html&quot;&gt;Trino 398&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-399.html&quot;&gt;Trino 399&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-400.html&quot;&gt;Trino 400&lt;/a&gt;,
and
&lt;a href=&quot;https://trino.io/docs/current/release/release-401.html&quot;&gt;Trino 401&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-week-intro-to-hudi-and-the-hudi-connector&quot;&gt;Concept of the week: Intro to Hudi and the Hudi connector&lt;/h2&gt;

&lt;p&gt;This week we’re talking about the Hudi connector that was added in version 398.&lt;/p&gt;
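
&lt;p&gt;As a minimal sketch, a catalog properties file for the Hudi connector only
needs to point at a Hive metastore. The hostname below is a placeholder, and the
property names follow the current Trino documentation:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;connector.name=hudi
hive.metastore.uri=thrift://example.net:9083
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;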

&lt;h3 id=&quot;what-is-apache-hudi&quot;&gt;What is Apache Hudi?&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://hudi.apache.org/&quot;&gt;Apache Hudi&lt;/a&gt; (pronounced “hoodie”) is a streaming
data lakehouse platform that combines warehouse and database functionality. Hudi
is a table format that enables transactions, efficient upserts/deletes, advanced
indexing, streaming ingestion services, data clustering/compaction
optimizations, and concurrency.&lt;/p&gt;

&lt;p&gt;Hudi is not just a table format, but has many services aimed at creating
efficient incremental batch pipelines. Hudi was born out of Uber and is used at
companies like Amazon, ByteDance, and Robinhood.&lt;/p&gt;

&lt;h3 id=&quot;merge-on-read-mor-and-copy-on-write-cow-tables&quot;&gt;Merge on read (MOR) and copy on write (COW) tables&lt;/h3&gt;

&lt;p&gt;The Hudi table format and services aim to provide a suite of tools that make
Hudi adaptive to realtime and batch use cases on the data lake. Hudi lays
out data following either
&lt;a href=&quot;https://hudi.apache.org/docs/next/table_types#merge-on-read-table&quot;&gt;merge on read&lt;/a&gt;,
which optimizes writes over reads, or
&lt;a href=&quot;https://hudi.apache.org/docs/next/table_types#copy-on-write-table&quot;&gt;copy on write&lt;/a&gt;,
which optimizes reads over writes.&lt;/p&gt;

&lt;h3 id=&quot;hudi-metadata-table&quot;&gt;Hudi metadata table&lt;/h3&gt;

&lt;p&gt;The &lt;a href=&quot;https://hudi.apache.org/docs/next/metadata&quot;&gt;Hudi metadata table&lt;/a&gt; can
improve read/write performance of your queries. The main purpose of this table
is to eliminate the requirement for the “list files” operation, a requirement that
stems from how
&lt;a href=&quot;/blog/2020/10/20/intro-to-hive-connector.html&quot;&gt;Hive-modelled SQL tables&lt;/a&gt;
point to entire directories rather than to specific files with ranges.
Using files with ranges helps prune out files outside the query criteria.&lt;/p&gt;

&lt;h3 id=&quot;hudi-data-layout&quot;&gt;Hudi data layout&lt;/h3&gt;

&lt;p&gt;Hudi uses 
&lt;a href=&quot;https://hudi.apache.org/docs/next/file_layouts&quot;&gt;multiversion concurrency control (MVCC)&lt;/a&gt;,
where a compaction action merges logs and base files to produce new file slices,
while a cleaning action gets rid of unused/older file slices to reclaim space on
the file system.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;100%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/41/hudi-mvcc-files.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h3 id=&quot;robinhood-trino-and-hudi-use-cases&quot;&gt;Robinhood Trino and Hudi use cases&lt;/h3&gt;

&lt;p&gt;One of the well-known users of Trino and Hudi is Robinhood. Grace (Yue) Lu, who
&lt;a href=&quot;https://www.youtube.com/watch?v=gFTDQGRXOus&quot;&gt;joined us at Trino Summit 2021&lt;/a&gt;,
covers Robinhood’s architecture and use cases for Trino and Hudi.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/41/robinhood-hudi-trino-architecture.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;Robinhood ingests data via Debezium and streams it into Hudi. Then Trino is able
to read data as it becomes available in Hudi.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/41/robinhood-use-cases.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;Hudi and Trino support critical use cases like IPO company stock allocation,
liquidity risk monitoring, clearing settlement reports, and generally fresher
metrics reporting and analysis.&lt;/p&gt;

&lt;h3 id=&quot;the-current-state-of-the-trino-hudi-connector&quot;&gt;The current state of the Trino Hudi connector&lt;/h3&gt;

&lt;p&gt;Before we had 
&lt;a href=&quot;https://trino.io/docs/current/connector/hudi.html&quot;&gt;the official Hudi connector&lt;/a&gt;,
many, like Robinhood, had to use the Hive connector. They were therefore not
able to take advantage of the metadata table and many other optimizations Hudi
provides out of the box.&lt;/p&gt;

&lt;p&gt;The connector gets around that and now enables using some Hudi abstractions.
However, the connector is currently limited to read-only mode and doesn’t
support writes. Spark remains the primary system used to write data into Hudi
tables that Trino then reads. Check out the demo to see the connector in action.&lt;/p&gt;

&lt;h3 id=&quot;upcoming-features-in-hudi-connector&quot;&gt;Upcoming features in Hudi connector&lt;/h3&gt;

&lt;p&gt;First, we want to improve read support and cover all query types. As a
next step, we aim to add DDL support.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The connector only supports copy on write tables, and soon we will add merge
on read table support.&lt;/li&gt;
  &lt;li&gt;Hudi has multiple 
&lt;a href=&quot;https://hudi.apache.org/docs/next/table_types#query-types&quot;&gt;query types&lt;/a&gt;.
Support for snapshot queries is coming shortly.&lt;/li&gt;
  &lt;li&gt;Integration with metadata table.&lt;/li&gt;
  &lt;li&gt;Utilize the column statistics index.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;pr-14445-fault-tolerant-execution-for-postgresql-and-mysql-connectors&quot;&gt;PR 14445: Fault-tolerant execution for PostgreSQL and MySQL connectors&lt;/h2&gt;

&lt;p&gt;This &lt;a href=&quot;https://github.com/trinodb/trino/pull/14445&quot;&gt;PR of the episode&lt;/a&gt; was
contributed by Matthew Deady (&lt;a href=&quot;https://github.com/mwd410&quot;&gt;@mwd410&lt;/a&gt;). The
improvements enable writes to PostgreSQL and MySQL when fault-tolerant execution
is enabled (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;retry-policy&lt;/code&gt; is set to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TASK&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;QUERY&lt;/code&gt;). This update included a 
few changes to core classes used by connectors that rely on JDBC clients to
connect to the underlying database. For example, Matthew built on this PR by
adding a few additional changes to get this working in SQL Server in
&lt;a href=&quot;https://github.com/trinodb/trino/pull/14730&quot;&gt;PR 14730&lt;/a&gt;.&lt;/p&gt;
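
&lt;p&gt;As a hedged sketch, fault-tolerant execution is turned on cluster-wide in
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;config.properties&lt;/code&gt;. Note
that the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TASK&lt;/code&gt; retry
policy also requires an exchange manager for spooling intermediate data; the
bucket name below is a placeholder:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# config.properties
retry-policy=TASK

# exchange-manager.properties
exchange-manager.name=filesystem
exchange.base-directories=s3://example-bucket/trino-spooling
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;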

&lt;p&gt;Thank you so much to Matthew for extending our fault-tolerant execution to
connectors using JDBC clients! As usual, thanks to all the reviewers and
maintainers who got these across the line!&lt;/p&gt;

&lt;h2 id=&quot;demo-using-the-hudi-connector&quot;&gt;Demo: Using the Hudi Connector&lt;/h2&gt;

&lt;p&gt;Let’s start up a local Trino coordinator and Hive metastore. Clone the
repository and navigate to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hudi/trino-hudi-minio&lt;/code&gt; directory. Then
start up the containers using Docker Compose.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone git@github.com:bitsondatadev/trino-getting-started.git
cd community_tutorials/hudi/trino-hudi-minio
docker-compose up -d
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For now, you will need to import data using the Spark and Scala method we detail
in the video. In the near term we will provide a SparkSQL alternative, and we
will update this demo to show the Trino DDL support when it lands.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SHOW CATALOGS;

SHOW SCHEMAS IN hudi;

SHOW TABLES IN hudi.default;

SELECT COUNT(*) FROM hudi.default.hudi_coders_hive;

SELECT * FROM hudi.default.hudi_coders_hive;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://hudi.apache.org/&quot;&gt;Hudi&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://onehouse.io&quot;&gt;Onehouse&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Blog posts&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://robinhood.engineering/author-balaji-varadarajan-e3f496815ebf&quot;&gt;Fresher Data Lake on S3&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Check out the in-person and virtual
&lt;a href=&quot;https://www.meetup.com/pro/trino-community/&quot;&gt;Trino Meetup groups&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>40: Trino&apos;s cold as Iceberg!</title>
      <link href="https://trino.io/episodes/40.html" rel="alternate" type="text/html" title="40: Trino&apos;s cold as Iceberg!" />
      <published>2022-09-08T00:00:00+00:00</published>
      <updated>2022-09-08T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/40</id>
      <content type="html" xml:base="https://trino.io/episodes/40.html">&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/15/trino-iceberg.png&quot; /&gt;&lt;br /&gt;
Looks like Commander Bun Bun is safe on this Iceberg&lt;br /&gt;
&lt;a href=&quot;https://joshdata.me/iceberger.html&quot;&gt;https://joshdata.me/iceberger.html&lt;/a&gt;
&lt;/p&gt;

&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Brian Olsen, Developer Advocate at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/bitsondatadev&quot;&gt;@bitsondatadev&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate at
 &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Ryan Blue, creator of Iceberg and CEO at
 &lt;a href=&quot;https://tabular.io&quot;&gt;Tabular&lt;/a&gt; (&lt;a href=&quot;https://github.com/rdblue&quot;&gt;@rdblue&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Sam Redai, Developer Advocate at &lt;a href=&quot;https://tabular.io&quot;&gt;Tabular&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/samuelredai&quot;&gt;@samuelredai&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/tomnats&quot;&gt;Tom Nats&lt;/a&gt;, Director of Customer Solutions at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;register-for-trino-summit-2022&quot;&gt;Register for Trino Summit 2022!&lt;/h2&gt;

&lt;p&gt;Trino Summit 2022 is coming around the corner! This &lt;strong&gt;free&lt;/strong&gt; event on November
10th takes place in person at the Commonwealth Club in San Francisco, CA, and
can also be attended remotely! If you want to present, the
&lt;a href=&quot;https://sessionize.com/trino-summit-2022/&quot;&gt;call for speakers&lt;/a&gt; is open until
September 15th.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;You can register for the conference&lt;/a&gt;
at any time. We must limit in-person registrations to 250
attendees, so register soon if you plan on attending in person!&lt;/p&gt;

&lt;h2 id=&quot;releases-394-to-395&quot;&gt;Releases 394 to 395&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-394.html&quot;&gt;Trino 394&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;JSON output format for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXPLAIN&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Improved performance for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIKE&lt;/code&gt; expressions.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query&lt;/code&gt; table function in BigQuery connector.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; support in BigQuery connector.&lt;/li&gt;
  &lt;li&gt;TLS support in Pinot connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-395.html&quot;&gt;Trino 395&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Faster &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; queries.&lt;/li&gt;
  &lt;li&gt;Better performance for large clusters.&lt;/li&gt;
  &lt;li&gt;Improved memory efficiency for aggregations and fault tolerant execution.&lt;/li&gt;
  &lt;li&gt;Faster aggregations over decimal columns.&lt;/li&gt;
  &lt;li&gt;Support for dynamic function resolution.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additional highlights worth a mention according to Cole:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The improved performance of inserts on Delta Lake, Hive, and Iceberg is a huge
one. We’re not entirely sure how much it’ll matter in production use cases, but
some of the benchmarks suggested it could be massive - one test showed a 75%
reduction in query duration.&lt;/li&gt;
  &lt;li&gt;Dynamic function resolution in the SPI is going to unlock some very neat
possibilities down the line.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the release notes for
&lt;a href=&quot;https://trino.io/docs/current/release/release-394.html&quot;&gt;Trino 394&lt;/a&gt;,
and
&lt;a href=&quot;https://trino.io/docs/current/release/release-395.html&quot;&gt;Trino 395&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-week-latest-features-in-apache-iceberg-and-the-iceberg-connector&quot;&gt;Concept of the week: Latest features in Apache Iceberg and the Iceberg connector&lt;/h2&gt;

&lt;p&gt;It has been over a year since we had Ryan on the Trino Community Broadcast as a
guest to discuss what Apache Iceberg is and how it can be used in Trino. Since
then, the adoption of Iceberg in our community has skyrocketed. Iceberg is
proving to be a much better alternative to the Hive table format.&lt;/p&gt;

&lt;p&gt;The initial phase of the Iceberg connector in Trino aimed to provide fast and
interoperable read support. A typical usage was Trino alongside other query
engines like Apache Spark, which supported many of the data manipulation language
(DML) SQL features on Iceberg. One of the biggest requests we got as adoption
increased was the ability to do everything through Trino. This episode dives
into some of the latest features that were missing from the early iterations of
the Iceberg connector and what has changed in Iceberg as well!&lt;/p&gt;

&lt;h3 id=&quot;what-is-apache-iceberg&quot;&gt;What is Apache Iceberg?&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://iceberg.apache.org/&quot;&gt;Iceberg&lt;/a&gt; is a next-generation table format that
defines a standard around the metadata used to map data to a SQL query engine.
It addresses a lot of the maintainability and reliability issues many engineers
experienced with the way
&lt;a href=&quot;/blog/2020/10/20/intro-to-hive-connector.html&quot;&gt;Hive modeled SQL tables&lt;/a&gt;
over big data files.&lt;/p&gt;

&lt;p&gt;One common point of confusion is that a table format is not equivalent to file
formats like ORC or Parquet. The table format is the layer that maintains
metadata mapping these files to the concept of a table and other common database
abstractions.&lt;/p&gt;

&lt;p&gt;This episode assumes you have some basic knowledge of Trino and Iceberg already. If
you are new to Iceberg or need a refresher, we recommend the two older episodes
about Iceberg and Trino basics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/episodes/14.html&quot;&gt;14: Iceberg: March of the Trinos&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/episodes/15.html&quot;&gt;15: Iceberg right ahead!&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;why-iceberg-over-other-formats&quot;&gt;Why Iceberg over other formats?&lt;/h3&gt;

&lt;p&gt;There have been some great advancements in big data technologies that brought
back SQL and data warehouse capabilities. However, Hive and Hive-like table
formats are still missing some capabilities due to limitations that Hive tables
have, such as dropping and reintroducing stale data unintentionally. On top of
that, Hive tables require a lot of knowledge of Hive internals. Some recent
formats aim to remain backwards compatible with Hive, but inadvertently
reintroduce these limitations.&lt;/p&gt;

&lt;p&gt;This is not the case with Iceberg. Iceberg has the most support for query
engines and puts a heavy emphasis on being a format that is interoperable. This
improves the level of flexibility users have to address a wider array of use
cases that may involve querying over a system like Snowflake or a data lakehouse
running with Iceberg. All of this is made possible by the
&lt;a href=&quot;https://iceberg.apache.org/spec&quot;&gt;Iceberg specification&lt;/a&gt; that all these query
engines must follow.&lt;/p&gt;

&lt;p&gt;Finally, a great video presented by Ryan Blue that dives into Iceberg is,
“&lt;a href=&quot;https://www.youtube.com/watch?v=_GW3GYZK66U&quot;&gt;Why you shouldn’t care about Iceberg&lt;/a&gt;.”&lt;/p&gt;

&lt;h3 id=&quot;metadata-catalogs&quot;&gt;Metadata catalogs&lt;/h3&gt;

&lt;p&gt;Catalogs, in the context of Iceberg, refer to the central storage of metadata.
Catalogs are also used to provide the atomic compare-and-swap needed to support
&lt;a href=&quot;https://iceberg.apache.org/docs/latest/reliability&quot;&gt;serializable isolation in Iceberg&lt;/a&gt;.
We’ll refer to them as metadata catalogs to avoid confusion with Trino
&lt;a href=&quot;https://trino.io/docs/current/sql/show-catalogs.html&quot;&gt;catalogs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The two existing catalogs supported in Trino’s Iceberg connector are the
&lt;a href=&quot;https://trino.io/docs/current/connector/iceberg.html#hive-metastore-catalog&quot;&gt;Hive Metastore Service&lt;/a&gt;
and its AWS counterpart, Glue. While this
provides a nice migration path from the Hive model, many are looking to replace
these rather cumbersome catalogs with something more lightweight. It turns out
that the Iceberg connector only uses the Hive Metastore Service to point to the
top-level metadata files in Iceberg, while the majority of the metadata exists in
metadata files in storage. This makes it even more compelling to get rid of the
complex Hive service in favor of simpler services. Two popular catalogs outside of these
are the &lt;a href=&quot;https://iceberg.apache.org/docs/latest/jdbc&quot;&gt;JDBC catalog&lt;/a&gt; and the
&lt;a href=&quot;https://github.com/apache/iceberg/pull/4348&quot;&gt;REST catalog&lt;/a&gt;.&lt;/p&gt;
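
&lt;p&gt;For reference, a minimal sketch of an Iceberg catalog backed by the Hive
Metastore Service looks like the following. The metastore URI is a placeholder,
and property names follow the current Trino documentation:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;connector.name=iceberg
iceberg.catalog.type=hive_metastore
hive.metastore.uri=thrift://example.net:9083
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;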

&lt;p&gt;There are two PRs in progress to support these metadata catalogs in Trino:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/11772&quot;&gt;Trino PR 11772: Support JDBC catalog in Iceberg connector&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/13294&quot;&gt;Trino PR 13294: Add Iceberg RESTSessionCatalog Implementation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;branching-tagging-and-auditing-oh-my&quot;&gt;Branching, tagging, and auditing, oh my!&lt;/h3&gt;

&lt;p&gt;Another feature set that is coming to Iceberg is the ability to use
&lt;a href=&quot;https://github.com/apache/iceberg/pull/5364&quot;&gt;refs to alias your snapshots&lt;/a&gt;.
This enables branching and tagging behavior similar to git, treating each
snapshot as a commit. It is yet another way to simplify moving between
known states of the data in Iceberg.&lt;/p&gt;

&lt;p&gt;On a related note, branching and tagging will eventually be used in the
&lt;a href=&quot;https://tabular.io/blog/integrated-audits&quot;&gt;audit integration in Iceberg&lt;/a&gt;.
Auditing allows you to push a soft commit by making a snapshot available, but
it is not initially published to the primary table. This is achieved using Spark
and setting the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;spark.wap.id&lt;/code&gt; configuration property. This enables interesting
patterns like
&lt;a href=&quot;https://www.dremio.com/subsurface/write-audit-publish-pattern-via-apache-iceberg/&quot;&gt;Write-Audit-Publish (WAP) pattern&lt;/a&gt;,
where you first write the data, audit it using a data quality tool like
&lt;a href=&quot;https://greatexpectations.io&quot;&gt;Great Expectations&lt;/a&gt;, and lastly publish the data
to be visible from the main table. Currently, auditing has to use the
cherry-pick operation to publish. This becomes more streamlined with branching
and tagging.&lt;/p&gt;
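
&lt;p&gt;As a rough sketch of the pattern in Spark SQL, using a hypothetical table and
audit ID, the flow looks roughly like the following. The snapshot ID passed to
the cherry-pick procedure is a placeholder:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- enable write-audit-publish on the (hypothetical) table
ALTER TABLE db.events SET TBLPROPERTIES ('write.wap.enabled'='true');

-- stage subsequent writes under an audit ID instead of publishing them
SET spark.wap.id = 'audit-2022-10-27';
INSERT INTO db.events VALUES (1, 'signup');

-- after the audit passes, publish the staged snapshot (placeholder ID)
CALL system.cherrypick_snapshot('db.events', 123456789);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;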

&lt;h3 id=&quot;the-puffin-file-format&quot;&gt;The Puffin file format&lt;/h3&gt;

&lt;p&gt;The &lt;a href=&quot;https://iceberg.apache.org/puffin-spec&quot;&gt;Puffin file format&lt;/a&gt; is a
companion to data file formats like &lt;a href=&quot;https://parquet.apache.org/&quot;&gt;Parquet&lt;/a&gt; and
&lt;a href=&quot;https://orc.apache.org/&quot;&gt;ORC&lt;/a&gt;. This format stores information such as indexes
and statistics about data managed in an Iceberg table that cannot be stored
directly within the Iceberg manifest. A Puffin file contains arbitrary pieces of
information called “blobs”, along with metadata necessary to interpret them.&lt;/p&gt;

&lt;p&gt;This format &lt;a href=&quot;https://www.mail-archive.com/dev@iceberg.apache.org/msg03593.html&quot;&gt;was proposed&lt;/a&gt;
by long-time Trino maintainer, &lt;a href=&quot;https://github.com/findepi&quot;&gt;Piotr Findeisen @findepi&lt;/a&gt;,
to address a performance issue noted when using Trino on Iceberg. The Puffin
format is a great extension for those using Iceberg tables, as it enables better
query plans in Trino at the file level.&lt;/p&gt;

&lt;h3 id=&quot;pyiceberg&quot;&gt;pyIceberg&lt;/h3&gt;

&lt;p&gt;The &lt;a href=&quot;https://github.com/apache/iceberg/tree/master/python&quot;&gt;pyIceberg library&lt;/a&gt;
is an exciting development that enables users to read their data directly from
Iceberg into their own Python code easily.&lt;/p&gt;

&lt;h3 id=&quot;trino-iceberg-connector-updates&quot;&gt;Trino Iceberg connector updates&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/sql/merge&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt;&lt;/a&gt; (&lt;a href=&quot;https://github.com/trinodb/trino/pull/7933&quot;&gt;PR&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/sql/update&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt;&lt;/a&gt; (&lt;a href=&quot;https://github.com/trinodb/trino/pull/12026&quot;&gt;PR&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/sql/delete&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt;&lt;/a&gt; (&lt;a href=&quot;https://github.com/trinodb/trino/pull/11886&quot;&gt;PR&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Time travel (&lt;a href=&quot;https://github.com/trinodb/trino/pull/10258&quot;&gt;PR&lt;/a&gt;) was initially
released in
&lt;a href=&quot;https://trino.io/docs/current/release/release-385.html#iceberg-connector&quot;&gt;version 385&lt;/a&gt;,
the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@&lt;/code&gt; syntax for snapshots/time travel
&lt;a href=&quot;https://github.com/trinodb/trino/pull/10768&quot;&gt;was deprecated&lt;/a&gt; in
&lt;a href=&quot;https://trino.io/docs/current/release/release-387.html#iceberg-connector&quot;&gt;version 387&lt;/a&gt;,
and there were two bug fixes for this feature in versions
&lt;a href=&quot;https://trino.io/docs/current/release/release-386.html#iceberg-connector&quot;&gt;386&lt;/a&gt; and
&lt;a href=&quot;https://trino.io/docs/current/release/release-388.html#iceberg-connector&quot;&gt;388&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/connector/iceberg.html#alter-table-set-properties&quot;&gt;Partition migration&lt;/a&gt;
(&lt;a href=&quot;https://github.com/trinodb/trino/pull/12259&quot;&gt;PR&lt;/a&gt;).
While Trino was able to read tables with these migrations applied by other query
engines, this feature allows Trino to write these changes.&lt;/li&gt;
  &lt;li&gt;The following three features are table maintenance commands.
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/connector/iceberg.html#optimize&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;optimize&lt;/code&gt;&lt;/a&gt;
(&lt;a href=&quot;https://github.com/trinodb/trino/pull/10497&quot;&gt;PR&lt;/a&gt;), which is equivalent to
the Spark SQL
&lt;a href=&quot;https://iceberg.apache.org/docs/latest/spark-procedures/#rewrite_data_files&quot;&gt;rewrite_data_files&lt;/a&gt; procedure.&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/connector/iceberg.html#expire-snapshots&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;expire_snapshots&lt;/code&gt;&lt;/a&gt;
(&lt;a href=&quot;https://github.com/trinodb/trino/pull/10810&quot;&gt;PR&lt;/a&gt;), which uses the same name
as the equivalent Spark procedure.&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/connector/iceberg.html#remove-orphan-files&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;remove_orphan_files&lt;/code&gt;&lt;/a&gt;
(&lt;a href=&quot;https://github.com/trinodb/trino/pull/10810&quot;&gt;PR&lt;/a&gt;), which also uses the same
name as the equivalent Spark procedure.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Iceberg v2 support (&lt;a href=&quot;https://github.com/trinodb/trino/pull/11880&quot;&gt;PR1&lt;/a&gt;, &lt;a href=&quot;https://github.com/trinodb/trino/pull/12351&quot;&gt;PR2&lt;/a&gt;, &lt;a href=&quot;https://github.com/trinodb/trino/pull/12749&quot;&gt;PR3&lt;/a&gt;, &lt;a href=&quot;https://github.com/trinodb/trino/pull/11642&quot;&gt;PR4&lt;/a&gt;, &lt;a href=&quot;https://github.com/trinodb/trino/pull/9881&quot;&gt;PR5&lt;/a&gt;, and many more…)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Almost every release has some sort of Iceberg improvement around
&lt;a href=&quot;https://github.com/trinodb/trino/pull/13636&quot;&gt;planning&lt;/a&gt; or
&lt;a href=&quot;https://github.com/trinodb/trino/pull/13395&quot;&gt;pushdown&lt;/a&gt;. If you want all the
latest features and performance improvements described here, it’s important to
keep up with the latest Trino version.&lt;/p&gt;

&lt;h2 id=&quot;pr-13111-scale-table-writers-per-task-based-on-throughput&quot;&gt;PR 13111: Scale table writers per task based on throughput&lt;/h2&gt;

&lt;p&gt;This &lt;a href=&quot;https://github.com/trinodb/trino/pull/13111&quot;&gt;PR of the episode&lt;/a&gt; was
contributed by Gaurav Sehgal (&lt;a href=&quot;https://github.com/gaurav8297&quot;&gt;@gaurav8297&lt;/a&gt;) to
enable Trino to automatically scale writers. The PR adjusts the number of task
writers per worker based on throughput.&lt;/p&gt;

&lt;p&gt;You can enable this feature by setting &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;scale_task_writers&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;true&lt;/code&gt; in your
configuration. Initial test results show up to a sixfold speed increase.&lt;/p&gt;
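
&lt;p&gt;As a sketch, assuming the session property name mentioned above (property
names can change between releases, so check the release notes for your
version), enabling the feature for a single session looks like this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SET SESSION scale_task_writers = true;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;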

&lt;p&gt;Thank you so much to Gaurav and all the reviewers who got this PR through!&lt;/p&gt;

&lt;h2 id=&quot;demo-dml-operations-on-iceberg-using-trino&quot;&gt;Demo: DML operations on Iceberg using Trino&lt;/h2&gt;

&lt;p&gt;For this episode’s demo, we use the same schema as the demo we ran in
&lt;a href=&quot;https://trino.io/episodes/15.html&quot;&gt;episode 15&lt;/a&gt;, and revise the syntax to
include new features.&lt;/p&gt;

&lt;p&gt;Let’s start up a local Trino coordinator, Hive metastore, and MinIO instance.
Clone the repository and navigate to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iceberg/trino-iceberg-minio&lt;/code&gt; directory. Then
start up the containers using Docker Compose.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone git@github.com:bitsondatadev/trino-getting-started.git
cd trino-getting-started/iceberg/trino-iceberg-minio
docker-compose up -d
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now open up your favorite Trino client and connect it to
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;localhost:8080&lt;/code&gt; to run the following commands:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/**
 * Make sure to first create a bucket named &quot;logging&quot; in MinIO before running
 */
CREATE SCHEMA iceberg.logging
WITH (location = &apos;s3a://logging/&apos;);

/**
 * Create table
 */
CREATE TABLE iceberg.logging.logs (
   level varchar NOT NULL,
   event_time timestamp(6) with time zone NOT NULL,
   message varchar NOT NULL,
   call_stack array(varchar)
)
WITH (
   format_version = 2, -- New property to specify Iceberg spec format. Default 2
   format = &apos;ORC&apos;,
   partitioning = ARRAY[&apos;day(event_time)&apos;,&apos;level&apos;]
);

/**
 * Inserting two records. Notice event_time is on the same day but different hours.
 */

INSERT INTO iceberg.logging.logs VALUES
(
  &apos;ERROR&apos;,
  timestamp &apos;2021-04-01 12:23:53.383345&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;,
  &apos;1 message&apos;,
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;]
),
(
  &apos;ERROR&apos;,
  timestamp &apos;2021-04-01 13:36:23&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;,
  &apos;2 message&apos;,
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;]
);

SELECT * FROM iceberg.logging.logs;
SELECT * FROM iceberg.logging.&quot;logs$partitions&quot;;

/**
 * Notice one partition was created for both records at the day granularity.
 */

/**
 * Update the partitioning from daily to hourly 🎉
 */
ALTER TABLE iceberg.logging.logs
SET PROPERTIES partitioning = ARRAY[&apos;hour(event_time)&apos;];

/**
 * Inserting three records. Notice event_time is on the same day but different hours.
 */
INSERT INTO iceberg.logging.logs VALUES
(
  &apos;ERROR&apos;,
  timestamp &apos;2021-04-01 15:55:23&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;,
  &apos;3 message&apos;,
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;]
),
(
  &apos;WARN&apos;,
  timestamp &apos;2021-04-01 15:55:23&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;,
  &apos;4 message&apos;,
  ARRAY [&apos;bad things could be happening&apos;]
),
(
  &apos;WARN&apos;,
  timestamp &apos;2021-04-01 16:55:23&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;,
  &apos;5 message&apos;,
  ARRAY [&apos;bad things could be happening&apos;]
);

SELECT * FROM iceberg.logging.logs;
SELECT * FROM iceberg.logging.&quot;logs$partitions&quot;;

/**
 * Now there are three partitions:
 * 1) One partition at the day granularity containing our original records.
 * 2) One at the hour granularity for hour 15 containing two new records.
 * 3) One at the hour granularity for hour 16 containing the last new record.
 */

SELECT * FROM iceberg.logging.logs
WHERE event_time &amp;lt; timestamp &apos;2021-04-01 16:55:23&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;;

/**
 * This query correctly returns 4 records with only the first two partitions
 * being touched. Now let&apos;s check the snapshots.
 */


SELECT * FROM iceberg.logging.logs;

SELECT * FROM iceberg.logging.&quot;logs$snapshots&quot;;

/**
 * Update
 */
UPDATE
  iceberg.logging.logs
SET
  call_stack = call_stack || &apos;WHALE HELLO THERE!&apos;
WHERE
  lower(level) = &apos;warn&apos;;

SELECT * FROM iceberg.logging.logs;

SELECT * FROM iceberg.logging.&quot;logs$snapshots&quot;;

/**
 * Read data from an old snapshot (Time travel)
 *
 * Old way: SELECT * FROM iceberg.logging.&quot;logs@2806470637437034115&quot;;
 */

SELECT * FROM iceberg.logging.logs FOR VERSION AS OF 2806470637437034115;

/**
 * Merge
 */
CREATE TABLE iceberg.logging.src (
   level varchar NOT NULL,
   message varchar NOT NULL,
   call_stack array(varchar)
)
WITH (
   format = &apos;ORC&apos;
);

INSERT INTO iceberg.logging.src VALUES
 (
   &apos;ERROR&apos;,
   &apos;3 message&apos;,
   ARRAY [&apos;This one will not show up because it is an ERROR&apos;]
 ),
 (
   &apos;WARN&apos;,
   &apos;4 message&apos;,
   ARRAY [&apos;This should show up&apos;]
 ),
 (
   &apos;WARN&apos;,
   &apos;5 message&apos;,
   ARRAY [&apos;This should show up as well&apos;]
 );

MERGE INTO iceberg.logging.logs AS t
USING iceberg.logging.src AS s
ON s.message = t.message
WHEN MATCHED AND s.level = &apos;ERROR&apos;
        THEN DELETE
WHEN MATCHED
    THEN UPDATE
        SET message = s.message || &apos;-updated&apos;,
            call_stack = s.call_stack || t.call_stack;

DROP TABLE iceberg.logging.logs;

DROP SCHEMA iceberg.logging;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This is just the tip of the iceberg, showing the powerful &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; statement
and the other features that have been added to the Iceberg connector!&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://iceberg.apache.org/&quot;&gt;Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://tabular.io&quot;&gt;Tabular&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://iceberg.apache.org/community&quot;&gt;Iceberg Community&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://iceberg.apache.org/talks&quot;&gt;Iceberg Talks&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://iceberg.apache.org/blogs&quot;&gt;Iceberg Blogs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Blog posts&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/05/03/a-gentle-introduction-to-iceberg.html&quot;&gt;Trino on ice I: A gentle introduction to Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg.html&quot;&gt;Trino on ice II: In-place table evolution and cloud compatibility with Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/30/iceberg-concurrency-snapshots-spec.html&quot;&gt;Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/08/12/deep-dive-into-iceberg-internals.html&quot;&gt;Trino on ice IV: Deep dive into Iceberg internals&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Check out the in-person and virtual
&lt;a href=&quot;https://www.meetup.com/pro/trino-community/&quot;&gt;Trino Meetup groups&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Looks like Commander Bun Bun is safe on this Iceberg https://joshdata.me/iceberger.html</summary>

      
      
    </entry>
  
    <entry>
      <title>39: Raft floats on Trino to federate silos</title>
      <link href="https://trino.io/episodes/39.html" rel="alternate" type="text/html" title="39: Raft floats on Trino to federate silos" />
      <published>2022-08-18T00:00:00+00:00</published>
      <updated>2022-08-18T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/39</id>
      <content type="html" xml:base="https://trino.io/episodes/39.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;p&gt;In this episode, we talk to two engineers from
&lt;a href=&quot;https://goraft.tech/&quot;&gt;Raft&lt;/a&gt; and discuss how they use Trino to connect data
silos that exist across different departments in various government sectors:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/edwardwmorgan/&quot;&gt;Edward Morgan&lt;/a&gt;, 
Senior Platform Engineer/DevSecOps Manager at Raft&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/steve-morgan-b9bb6642/&quot;&gt;Steve Morgan&lt;/a&gt;, Chief
Data Engineer at Raft&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;register-for-trino-summit-2022&quot;&gt;Register for Trino Summit 2022!&lt;/h2&gt;

&lt;p&gt;Trino Summit 2022 is right around the corner! This hybrid event takes place on
November 10th, in person at the Commonwealth Club in San Francisco, CA, and can
also be attended remotely. If you want to present, the
&lt;a href=&quot;https://sessionize.com/trino-summit-2022/&quot;&gt;call for speakers&lt;/a&gt; is open until
September 15th.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;You can register for the conference&lt;/a&gt;
at any time. We must limit in-person registrations to 250 
attendees, so register soon if you plan on attending in person!&lt;/p&gt;

&lt;h2 id=&quot;releases-392-to-393&quot;&gt;Releases 392 to 393&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-392.html&quot;&gt;Trino 392&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for dynamic filtering with fault-tolerant query execution.&lt;/li&gt;
  &lt;li&gt;Support for correlated subqueries in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt; queries.&lt;/li&gt;
  &lt;li&gt;Support for Amazon S3 Select pushdown for JSON files.&lt;/li&gt;
  &lt;li&gt;Support for Avro format in Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Faster queries when filtering by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;__time&lt;/code&gt; column in Druid.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-393.html&quot;&gt;Trino 393&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Improved performance of highly selective &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt; queries.&lt;/li&gt;
  &lt;li&gt;Experimental docker image for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ppc64le&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Dynamic filtering support for various connectors.&lt;/li&gt;
  &lt;li&gt;Support for JSON and bytes type in Pinot.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additional highlights worth a mention according to Manfred:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Lots of other improvements on Delta Lake, Hive, and Iceberg connectors.&lt;/li&gt;
  &lt;li&gt;Merge support in a bunch of connectors.&lt;/li&gt;
  &lt;li&gt;OAuth 2.0 refresh token fixes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the release notes for
&lt;a href=&quot;https://trino.io/docs/current/release/release-392.html&quot;&gt;Trino 392&lt;/a&gt;,
and
&lt;a href=&quot;https://trino.io/docs/current/release/release-393.html&quot;&gt;Trino 393&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-episode-trino-at-raft&quot;&gt;Concept of the episode: Trino at Raft&lt;/h2&gt;

&lt;p&gt;Raft provides consulting services and is particularly skilled at DevSecOps. One
challenge they face is dealing with fragmented government infrastructure. In
this episode, we dive in to learn how Trino enables Raft to supply government
sector clients with a data fabric solution. Raft takes a special stance on using
and contributing to open source solutions that run well on the cloud.&lt;/p&gt;

&lt;h3 id=&quot;intro-to-software-factories&quot;&gt;Intro to software factories&lt;/h3&gt;

&lt;blockquote&gt;
  &lt;p&gt;A “software factory” is an organized approach to software development that
provides software design and development teams a repeatable, well-defined path
to create and update software. It results in a robust, compliant, and more
resilient process for delivering applications to production.
– &lt;a href=&quot;https://tanzu.vmware.com/software-factory&quot;&gt;VMWare&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is a push back against previous attempts by larger government contractors
to build one-size-fits-all solutions, which ultimately failed. The new wave of
government solutions relies on methodologies similar to those of the software
industry, with additional rules and standards around the technologies that can
be adopted in the stack.&lt;/p&gt;

&lt;p&gt;Software factories are now a common practice for government agencies, as they
can take standardized software stacks through rigorous validation to make sure
they meet the standards of the government. One important element of these stacks
is that they can be deployed in virtually any environment. A common way to
achieve this is using Kubernetes and containers.&lt;/p&gt;

&lt;h3 id=&quot;standards-and-anatomy-of-a-stack&quot;&gt;Standards and anatomy of a stack&lt;/h3&gt;

&lt;p&gt;With the movement towards standardization, government contractors will generally
build their stack using Kubernetes templates. Kubernetes underpins each of these
stacks while telemetry, monitoring, and policy agents are layered on after that.
For Raft, they wanted to provide a “single pane of glass” over the existing
fragmented systems that the Department of Defense (DoD) operates on. They began
to develop a stack that included Trino as their method to connect data over
various silos.&lt;/p&gt;

&lt;h3 id=&quot;data-fabric-at-raft&quot;&gt;Data Fabric at Raft&lt;/h3&gt;

&lt;p&gt;Data Fabric is an attempt to provide government agencies the ability to set up
a data mesh that is backed by Trino. Trino fits well in this narrative as it
provides SQL-over-everything. Data analysts and data scientists only need to
know SQL.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Data Fabric MVP is an end-to-end DataOps capability that can be deployed at the
edge, in the cloud, and in disconnected environments within minutes. It provides
a single control plane for normalizing and combining disparate data lakes, 
platforms, silos, and formats into SQL using Trino for batch data and Apache 
Pinot for user facing streaming analytics.&lt;/p&gt;

  &lt;p&gt;Data Fabric is driven by cloud native policy using Open Policy Agent (OPA) 
integrated with Trino and Kafka to provide row and column level obfuscation. It
provides enterprise data catalog to view data lineage, properties, and data
owners from multiple data platforms. – &lt;a href=&quot;https://datafabric.goraft.tech/&quot;&gt;Raft&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3 id=&quot;security-concerns-around-trino&quot;&gt;Security concerns around Trino&lt;/h3&gt;

&lt;p&gt;A common first question the Raft team gets asked is whether Trino is a high
security concern. The idea that Trino can connect to multiple data sources from
one location raises the fear that individuals may gain access to information at
a higher classification level than they have. The team has to educate the
different users on best practices and how to ensure this problem doesn’t
occur. You need a separate deployment of Data Fabric for each
classification level, and policies in OPA must be correctly defined to restrict
visibility to information above a user’s clearance.&lt;/p&gt;

&lt;h3 id=&quot;iron-bank-container-repository&quot;&gt;Iron Bank container repository&lt;/h3&gt;

&lt;p&gt;Iron Bank is a central repository of digitally-signed container images, 
including open-source and commercial off-the-shelf software, hardened to the 
DoD’s exacting specifications. Approved containers in Iron Bank have DoD-wide 
reciprocity across all classifications, accelerating the security approval 
process from months or even years down to weeks.&lt;/p&gt;

&lt;p&gt;To be considered for inclusion into Iron Bank, container images must meet
rigorous DoD software security standards. It is an extensive, continuous,
complicated effort for even the most sophisticated IT teams. Continuously
maintaining and managing hardening pipelines while incorporating evolving DoD
specifications and addressing new vulnerabilities (CVEs) can severely stretch
your resources, even if you have advanced tooling and experience in-house. 
(&lt;a href=&quot;https://oteemo.com/accelerate-to-iron-bank/&quot;&gt;Source&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;The Trino Docker image 
&lt;a href=&quot;https://repo1.dso.mil/dsop?filter=trino&quot;&gt;is available in Iron Bank&lt;/a&gt; and is
maintained by folks at &lt;a href=&quot;https://www.boozallen.com/&quot;&gt;Booz Allen Hamilton&lt;/a&gt;. Their
hard work makes it possible for Trino to be deployed in DoD environments.&lt;/p&gt;

&lt;h2 id=&quot;pull-requests-of-the-episode-pr-13354-add-s3-select-pushdown-for-json-files&quot;&gt;Pull requests of the episode: PR 13354: Add S3 Select pushdown for JSON files&lt;/h2&gt;

&lt;p&gt;This &lt;a href=&quot;https://github.com/trinodb/trino/pull/13354&quot;&gt;PR of the episode&lt;/a&gt; was
contributed by &lt;a href=&quot;https://github.com/preethiratnam&quot;&gt;preethiratnam&lt;/a&gt;. This pull
request enables S3 Select pushdown during a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; operation for JSON files. The
pushdown logic is restricted to root JSON fields, similar to CSV. S3 Select
does support nested column filtering on JSON files, but to limit the scope of
this change, that support is planned for a later PR.&lt;/p&gt;
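
&lt;p&gt;As a sketch, assuming the catalog property name from the Hive connector
documentation at the time, the pushdown is switched on per catalog in its
properties file:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;hive.s3select-pushdown.enabled=true
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;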

&lt;p&gt;It’s already expensive enough to query JSON files, as you pay a hefty penalty
for deserialization. This at least filters out a lot of rows. Thanks to 
&lt;a href=&quot;https://github.com/arhimondr&quot;&gt;Andrii Rosa &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;arhimondr&lt;/code&gt;&lt;/a&gt; for the review.&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-episode-running-great-expectations-on-a-trino-data-lakehouse-tutorial&quot;&gt;Demo of the episode: Running Great Expectations on a Trino Data Lakehouse Tutorial&lt;/h2&gt;

&lt;p&gt;For this episode’s demo, you’ll need a local Trino coordinator, MinIO instance,
Hive metastore, and an edge node where various data libraries like Great
Expectations can run. Clone the
&lt;a href=&quot;https://github.com/bitsondatadev/trino-datalake&quot;&gt;trino-datalake&lt;/a&gt;
repository and navigate to the root directory in your CLI. Then
start up the containers using Docker Compose.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone git@github.com:bitsondatadev/trino-datalake.git

cd trino-datalake

docker-compose up -d
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The rest of the demo is available in 
&lt;a href=&quot;https://github.com/bitsondatadev/trino-datalake/blob/main/tutorials/expecting-greatness-from-trino.md&quot;&gt;this markdown tutorial&lt;/a&gt;
and is covered in the video demo below.&lt;/p&gt;

&lt;div class=&quot;youtube-video-container&quot;&gt;
  &lt;iframe width=&quot;702&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/h6UYOilESfQ&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h2 id=&quot;question-of-the-episode-how-can-i-deploy-trino-on-kubernetes-without-using-helm-charts&quot;&gt;Question of the episode: How can I deploy Trino on Kubernetes without using Helm charts?&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trinodb.slack.com/archives/C0305TQ05KL/p1660685654979289&quot;&gt;Full question from Trino Slack&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This user was not able to use Helm due to a restriction at their company. They
needed the raw Kubernetes YAML files to deploy Trino.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Answer:&lt;/em&gt; While Helm offers convenient ways to deploy directly to a
service that understands Helm charts, you can also use Helm on your machine to
generate all of the Kubernetes YAML configuration files. This can be done using
the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;helm template&lt;/code&gt; command. See more on this in the
&lt;a href=&quot;https://trino.io/episodes/31.html&quot;&gt;Trinetes episode&lt;/a&gt; that details this command.&lt;/p&gt;
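
&lt;p&gt;As a sketch, assuming the community Helm charts published at
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trinodb.github.io/charts&lt;/code&gt;, rendering the raw YAML locally and applying it
could look like this (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;my-trino&lt;/code&gt; is an example release name):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;helm repo add trino https://trinodb.github.io/charts
helm template my-trino trino/trino &amp;gt; trino.yaml
kubectl apply -f trino.yaml
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;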

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://datafabric.goraft.tech/&quot;&gt;Raft Data Fabric&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/raft_tech&quot;&gt;Raft Twitter&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/company/raft-tech/&quot;&gt;Raft LinkedIn&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://boards.greenhouse.io/raft&quot;&gt;Raft Jobs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Blogs&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://goraft.tech/2022/08/15/trino-sql-everything.html&quot;&gt;Trino - SQL to rule them all&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.airforce-technology.com/news/raft-wins-usaf-sbir-phase-iii-contract/&quot;&gt;Raft wins USAF SBIR Phase III contract for data centralisation services&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://goraft.tech/blog/&quot;&gt;Raft Blog&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Check out the in-person and virtual
&lt;a href=&quot;https://www.meetup.com/pro/trino-community/&quot;&gt;Trino Meetup groups&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests</summary>

      
      
    </entry>
  
    <entry>
      <title>38: Trino tacks on polymorphic table functions</title>
      <link href="https://trino.io/episodes/38.html" rel="alternate" type="text/html" title="38: Trino tacks on polymorphic table functions" />
      <published>2022-07-21T00:00:00+00:00</published>
      <updated>2022-07-21T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/38</id>
      <content type="html" xml:base="https://trino.io/episodes/38.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;p&gt;In this episode we have the pleasure of chatting with a couple of familiar faces
who have been hard at work building and understanding the features we’re talking
about today:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/kasiafi&quot;&gt;Kasia Findeisen&lt;/a&gt;, Trino Maintainer&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/mtraverso&quot;&gt;Martin Traverso&lt;/a&gt;, Trino Co-creator and Maintainer&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-387-to-391&quot;&gt;Releases 387 to 391&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-387.html&quot;&gt;Trino 387&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for writing ORC Bloom filters for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;varchar&lt;/code&gt; columns.&lt;/li&gt;
  &lt;li&gt;Support for querying Pinot via the gRPC endpoint.&lt;/li&gt;
  &lt;li&gt;Support for predicate pushdown on string columns in Redis.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OPTIMIZE&lt;/code&gt; on Iceberg tables with non-identity partitioning.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-388.html&quot;&gt;Trino 388&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for JSON output in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXPLAIN&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Improved performance for row data types.&lt;/li&gt;
  &lt;li&gt;Support for OAuth 2.0 refresh tokens.&lt;/li&gt;
  &lt;li&gt;Support for table and column comments in Delta Lake.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-389.html&quot;&gt;Trino 389&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved performance for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;row&lt;/code&gt; type and aggregation.&lt;/li&gt;
  &lt;li&gt;Faster joins when spilling to disk is disabled.&lt;/li&gt;
  &lt;li&gt;Improved performance when writing non-structural types to Parquet.&lt;/li&gt;
  &lt;li&gt;New &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;raw_query&lt;/code&gt; table function for full query pass-through in Elasticsearch.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-390.html&quot;&gt;Trino 390&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for setting comments on views.&lt;/li&gt;
  &lt;li&gt;Improved &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNNEST&lt;/code&gt; performance.&lt;/li&gt;
  &lt;li&gt;Support for Databricks runtime 10.4 LTS in Delta Lake connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-391.html&quot;&gt;Trino 391&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for AWS Athena partition projection.&lt;/li&gt;
  &lt;li&gt;Faster writing of Parquet data in Iceberg and Delta Lake.&lt;/li&gt;
  &lt;li&gt;Support for reading BigQuery external tables.&lt;/li&gt;
  &lt;li&gt;Support for table and column comments in BigQuery.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additional highlights and notes according to Manfred:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2022/07/14/trino-updates-to-java-17.html&quot;&gt;Java 17 arrived as required runtime in 390&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Remove support for Elasticsearch versions below 6.6.0, add testing for OpenSearch 1.1.0.&lt;/li&gt;
  &lt;li&gt;The new raw query table function in Elasticsearch can replace the old full text search and query pass-through support.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the release notes for
&lt;a href=&quot;https://trino.io/docs/current/release/release-387.html&quot;&gt;Trino 387&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-388.html&quot;&gt;Trino 388&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-389.html&quot;&gt;Trino 389&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-390.html&quot;&gt;Trino 390&lt;/a&gt;,
and &lt;a href=&quot;https://trino.io/docs/current/release/release-391.html&quot;&gt;Trino 391&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-episode-polymorphic-table-functions&quot;&gt;Concept of the episode: Polymorphic table functions&lt;/h2&gt;

&lt;p&gt;We normally cover a broad variety of topics in the Trino Community Broadcast,
exploring different technical details, pull requests, and neat things that are
going on in Trino at large. In this episode, however, we’re going to be more
focused, only taking a look at a particular piece of functionality that we’re
all very excited about: polymorphic table functions, or PTFs for short. If
you’re unfamiliar with what this means, that can sound like technobabble word
soup, so we can start exploring this with a simple question…&lt;/p&gt;

&lt;h3 id=&quot;what-is-a-table-function&quot;&gt;What is a table function?&lt;/h3&gt;

&lt;p&gt;The easiest answer to this question is that it’s a function which returns a
table. Scalar, aggregate, and window functions all work a little differently,
but ultimately, they all return a single value each time they are invoked. Table
functions are unique in that they return an entire table. This gives them some
interesting properties that we’ll dive into, but it also means that you can only
invoke them in situations where you’d use a full table, such as a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FROM&lt;/code&gt; clause:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;my_table_function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;foo&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You can also use table functions in joins:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;my_table_function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;bar&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;another_table_function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And while that’s all neat, it raises the question…&lt;/p&gt;

&lt;h4 id=&quot;what-can-you-do-with-table-functions&quot;&gt;What can you do with table functions?&lt;/h4&gt;

&lt;p&gt;While standard table functions are cool, they have to return a pre-defined
schema, which limits their flexibility. They still have some interesting uses,
though, as a means of shortening queries or performing multiple operations at
once. If you frequently find yourself selecting from the same table with a
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WHERE&lt;/code&gt; clause that compares a specific column against a value that changes
each time, you could define a table function that takes the value as a
parameter and skip all the copying and pasting just for the sake of one changed
line. You could take an extremely lengthy sub-query with multiple joins,
abbreviate it to something as short as one of the examples above, and then
reuse it in other queries. Or, if you want to update one table and also insert
into another table as part of the same operation, you could combine those two
steps into one table function, ensuring that users won’t forget the second part
of the process.&lt;/p&gt;
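
&lt;p&gt;As a sketch of that first use case, a hypothetical table function (the
function name, table, and parameter here are invented for illustration) could
replace a query that otherwise gets copied around with only one literal
changing:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- hand-written each time, with only the literal changing
SELECT * FROM orders WHERE customer_name = &apos;alice&apos;;

-- the same query wrapped in a hypothetical table function
SELECT * FROM TABLE(orders_for_customer(&apos;alice&apos;));
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;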

&lt;p&gt;So table functions are functions that return tables. It really is that simple,
and we’re already two-thirds of the way to understanding what polymorphic table
functions are. And now it’s time to add in that fun ‘polymorphic’ word.&lt;/p&gt;

&lt;h3 id=&quot;what-makes-a-table-function-polymorphic&quot;&gt;What makes a table function polymorphic?&lt;/h3&gt;

&lt;p&gt;A polymorphic table function is a type of table function where the schema of
the returned table is determined dynamically. This means that the returned table
data, including its schema, can be determined by the arguments you pass to the
function. As you might imagine, that makes PTFs a lot more powerful than an
ordinary, run-of-the-mill table function.&lt;/p&gt;

&lt;h4 id=&quot;what-can-you-do-with-polymorphic-table-functions&quot;&gt;What can you do with polymorphic table functions?&lt;/h4&gt;

&lt;p&gt;When you don’t have to determine the schema of the returned table in
advance, you get the flexibility to do some pretty crazy things. It can be as simple as
adding or removing columns as part of the function, or it can be as complex as
building and returning an entirely new table based on some input data.&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-episode-the-many-ways-you-can-leverage-ptfs&quot;&gt;Demo of the episode: The many ways you can leverage PTFs&lt;/h2&gt;

&lt;p&gt;But we’ve talked enough at a high level about what PTFs are, so now it’s a good
time to look at what PTFs can actually do for you to make your life as a Trino
user easier, better, and more efficient.&lt;/p&gt;

&lt;h3 id=&quot;possible-polymorphic-table-functions&quot;&gt;Possible polymorphic table functions&lt;/h3&gt;

&lt;p&gt;One thing to note - all the examples we’re about to look at are &lt;em&gt;hypothetical&lt;/em&gt;.
We’re working to bring functions similar to these to Trino soon, but there are a
few things left to implement before we get there. For now, this section is meant
to highlight why we’re implementing PTFs, and we’ll take a look at what you can
currently do with them a little later. When it does come time to implement
these functions, they won’t look exactly as they do here.&lt;/p&gt;

&lt;h4 id=&quot;select-except&quot;&gt;Select except&lt;/h4&gt;

&lt;p&gt;Imagine a table with 10 columns, named col1, col2, col3, etc. If you want to
select all the columns except the first one from that table, you end up with a
query that looks like:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;col2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col7&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col9&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col10&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;my&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;table&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;But that’s long, and it’s a pain to type, and it gets messy, especially if your
column names aren’t extremely short due to being part of a contrived example.
With a simple PTF, you could get the same result with:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;excl_function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;data&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;my&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;table&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;columns_to_exclude&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;&quot;col1&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now, this isn’t a great PTF, because it’s going to take more time to implement
than it takes to just write out your column names, and at least when we’re using
only 10 columns and short column names, invoking the function takes more writing
than doing it the old-fashioned way. Also, this is going to perform worse than
writing the query the ordinary way. As a rule of thumb, if it can be written
with normal SQL, it will be more performant when done that way. There are plans
to work on optimizing PTFs, but that’s not going to happen soon, so for the time
being, we’re focusing on how they enable things which previously couldn’t
be done at all, rather than making queries look nicer or cleaner.&lt;/p&gt;

&lt;p&gt;All that said, we wanted to include this example because this does a good job at
demonstrating how polymorphic table functions can work and what they can do for
you. But it’s a simple example, and now we can look at some which are a little
more complex and a little more practical.&lt;/p&gt;

&lt;h4 id=&quot;csvreader&quot;&gt;CSVreader&lt;/h4&gt;

&lt;p&gt;If you’ve ever tried to create a table from a CSV file, you know it can be a
painful experience. You have to be very explicit and very diligent, and there’s
a lot of manual cross-checking involved in ensuring that each column aligns
perfectly and is correctly typed for the columns present in the CSV. Enter
polymorphic table functions, here to save the day.&lt;/p&gt;

&lt;p&gt;Remember, this is hypothetical, so by the time we get to implementing
something similar in Trino, it will certainly look different. A table function
like this would be defined by the connector, though, so all the end user needs
to worry about is its signature, which might look like:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;FUNCTION&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CSVreader&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Filename&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;FloatCols&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DEFAULT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;DateCols&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DEFAULT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;RETURNS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;One key thing to note here is the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DESCRIPTOR&lt;/code&gt; type. It is a type that describes
a list of column names, and there will be a function to convert a parameterized
list to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DESCRIPTOR&lt;/code&gt; type. Other than that, everything else here does what
you’d expect - you pass the function the name of the CSV file, the columns which
should be typed as floats, and the columns which should have a date typing. All
unspecified columns will still be handled as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;varchar&lt;/code&gt;. Calling the function
might look something like:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;CSVreader&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;Filename&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;my_file.csv&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;FloatCols&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;&quot;principle&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;&quot;interest&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;DateCols&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;&quot;due_date&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Given a CSV with this content:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-csv&quot;&gt;docno,name,due_date,principle,interest
123,Alice,01/01/2014,234.56,345.67
234,Bob,01/01/2014,654.32,543.21
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Such a function would return a table that looks like:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;docno&lt;/th&gt;
      &lt;th&gt;name&lt;/th&gt;
      &lt;th&gt;due_date&lt;/th&gt;
      &lt;th&gt;principle&lt;/th&gt;
      &lt;th&gt;interest&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;123&lt;/td&gt;
      &lt;td&gt;Alice&lt;/td&gt;
      &lt;td&gt;2014-01-01&lt;/td&gt;
      &lt;td&gt;234.56&lt;/td&gt;
      &lt;td&gt;345.67&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;234&lt;/td&gt;
      &lt;td&gt;Bob&lt;/td&gt;
      &lt;td&gt;2014-01-01&lt;/td&gt;
      &lt;td&gt;654.32&lt;/td&gt;
      &lt;td&gt;543.21&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;With a well-written PTF, the days of toiling over parsing a CSV into SQL are
over!&lt;/p&gt;

&lt;h4 id=&quot;pivot&quot;&gt;Pivot&lt;/h4&gt;

&lt;p&gt;Pivot is an oft-requested feature which hasn’t been built in Trino because it
isn’t a part of the standard SQL specification. A &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PIVOT&lt;/code&gt; keyword or built-in
function isn’t planned, but with PTFs, we can support &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PIVOT&lt;/code&gt;-like functionality
without needing to deviate from SQL.&lt;/p&gt;

&lt;p&gt;A &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PIVOT&lt;/code&gt; PTF might have the following definition:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;FUNCTION&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Pivot&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Input_table&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PASS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;THROUGH&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ROW&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SEMANTICS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Output_pivot_columns&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Input_pivot_columns1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Input_pivot_columns2&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DEFAULT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Input_pivot_columns3&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DEFAULT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Input_pivot_columns4&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DEFAULT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Input_pivot_columns5&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DEFAULT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;RETURNS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;But before we look at how you can invoke this, there are a few clauses here
that are worth explaining…&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PASS THROUGH&lt;/code&gt; means that the input data (and all of its rows) will be fully
available in the output. The alternative to this is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NO PASS THROUGH&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WITH ROW SEMANTICS&lt;/code&gt; means that the result will be determined on a row-by-row
basis. The alternative to this is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WITH SET SEMANTICS&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
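
&lt;p&gt;As an aside on that last point: for table arguments declared &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WITH SET SEMANTICS&lt;/code&gt;,
the SQL standard also lets the caller partition and order the input table, so
the function processes each partition as a unit. A sketch of what such an
invocation could look like, with a made-up function and table:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT
  *
FROM
  TABLE(
    summarize_by_region(
      input =&amp;gt; TABLE(orders) PARTITION BY region ORDER BY order_date
    )
  );
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;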

&lt;p&gt;And of course, the function takes some parameters, so a good function author
defines what those parameters do.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;‘Input_table’ is the input table. It can be any table.&lt;/li&gt;
  &lt;li&gt;‘Output_pivot_columns’ holds the names of the columns to be created in the
pivot table.&lt;/li&gt;
  &lt;li&gt;The ‘Input_pivot_columns’ parameters are the groups of columns to be
pivoted into the output columns. The first group is required, and you can
specify up to four more. The number of input columns in each group must match
the number of output columns.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So you’ve got a PIVOT function, and you understand how to invoke it, so all you
need to do is listen to &lt;a href=&quot;https://youtu.be/8w3wmQAMoxQ?t=82&quot;&gt;Ross from Friends&lt;/a&gt;
and make it happen:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;D&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;D&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;P&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;accttype&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;P&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;acctvalue&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;Pivot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;Input_table&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;My&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;Data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;D&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;Output_pivot_columns&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;accttype&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;acctvalue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;Input_pivot_columns1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;accttype1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;acctvalue1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;Input_pivot_columns2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;accttype2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;acctvalue2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;P&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If we presume we have this data in My.Data:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;ID&lt;/th&gt;
      &lt;th&gt;Name&lt;/th&gt;
      &lt;th&gt;accttype1&lt;/th&gt;
      &lt;th&gt;acctvalue1&lt;/th&gt;
      &lt;th&gt;accttype2&lt;/th&gt;
      &lt;th&gt;acctvalue2&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;123&lt;/td&gt;
      &lt;td&gt;Alice&lt;/td&gt;
      &lt;td&gt;external&lt;/td&gt;
      &lt;td&gt;20000&lt;/td&gt;
      &lt;td&gt;internal&lt;/td&gt;
      &lt;td&gt;350&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;234&lt;/td&gt;
      &lt;td&gt;Bob&lt;/td&gt;
      &lt;td&gt;external&lt;/td&gt;
      &lt;td&gt;25000&lt;/td&gt;
      &lt;td&gt;internal&lt;/td&gt;
      &lt;td&gt;120&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The output of that query will be:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;ID&lt;/th&gt;
      &lt;th&gt;Name&lt;/th&gt;
      &lt;th&gt;accttype&lt;/th&gt;
      &lt;th&gt;acctvalue&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;123&lt;/td&gt;
      &lt;td&gt;Alice&lt;/td&gt;
      &lt;td&gt;external&lt;/td&gt;
      &lt;td&gt;20000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;123&lt;/td&gt;
      &lt;td&gt;Alice&lt;/td&gt;
      &lt;td&gt;internal&lt;/td&gt;
      &lt;td&gt;350&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;234&lt;/td&gt;
      &lt;td&gt;Bob&lt;/td&gt;
      &lt;td&gt;external&lt;/td&gt;
      &lt;td&gt;25000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;234&lt;/td&gt;
      &lt;td&gt;Bob&lt;/td&gt;
      &lt;td&gt;internal&lt;/td&gt;
      &lt;td&gt;120&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;You can see the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PASS THROUGH&lt;/code&gt; clause in action when you select &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;D.id&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;D.name&lt;/code&gt;.&lt;/p&gt;

&lt;h4 id=&quot;execr&quot;&gt;ExecR&lt;/h4&gt;

&lt;p&gt;As a bonus cherry on top, and as an example of something very fun that you can
do with PTFs, how about executing an entire script written in R?&lt;/p&gt;

&lt;p&gt;A connector could provide a function with the signature:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;FUNCTION&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ExecR&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Script&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Input_table&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PASS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;THROUGH&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SET&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SEMANTICS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Rowtype&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;RETURNS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The inputs here are the script, which can simply be pasted into the query as
text; the input table, which contains the data for the script to run on; and a
descriptor for the row type, since there’s otherwise no way for the engine to
know the schema of the output after running the R script. Worth pointing out,
and contrary to the PIVOT example: this function has &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NO PASS THROUGH&lt;/code&gt; because the R script will not have
the ability to copy input rows into output rows.&lt;/p&gt;

&lt;p&gt;Invoking this function is relatively straightforward:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;ExecR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;Script&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;...&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;Input_table&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;My&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;Data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;Rowtype&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;col1&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col2&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;REAL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col3&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;DOUBLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;R&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And depending on your script and your data, you can make this as simple or as
extreme as you’d like!&lt;/p&gt;

&lt;h2 id=&quot;pull-request-of-the-episode-pr-12325-support-query-pass-through-for-jdbc-based-connectors&quot;&gt;Pull request of the episode: PR 12325: Support query pass-through for JDBC-based connectors&lt;/h2&gt;

&lt;p&gt;We’ve spent a lot of time talking about hypothetical value that we will be able
to derive from polymorphic table functions sometime down the line, but we should
also pump the brakes a little and take a look at what we &lt;em&gt;already&lt;/em&gt; have in Trino
in terms of polymorphic table functions. This PR, authored by Kasia Findeisen,
was the first code to land in Trino that allowed access to PTFs. It’s just one
particular PTF, but it’s pretty neat, so we can jump into it with a demo and an
explanation for how we’re already changing the game with PTFs.&lt;/p&gt;

&lt;h3 id=&quot;demo-of-the-episode-2-using-connector-specific-features-with-query-pass-through&quot;&gt;Demo of the episode #2: Using connector-specific features with query pass-through&lt;/h3&gt;

&lt;p&gt;Trino sticks to the SQL standard, which means that custom extensions and syntax
aren’t supported. If you’re using a Trino connector where the underlying
database has a neat feature that isn’t part of the SQL standard, you were
previously unable to take advantage of it, and you knew it wasn’t going to be
added to Trino. Now, with query pass-through, you can leverage the cool
non-standard extensions of the underlying databases! We’ll look at a couple of
different examples, but keep in mind that because this pushes an entire query
down to the data source, the possibilities depend on what the underlying
database is capable of.&lt;/p&gt;

&lt;h4 id=&quot;group_concat-in-mysql&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP_CONCAT()&lt;/code&gt; in MySQL&lt;/h4&gt;

&lt;p&gt;In a table where we have employees and their manager ID, but no direct way to
list managers with all their employees, we can push down a query to MySQL and
use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP_CONCAT()&lt;/code&gt; to combine them all into one column with this query:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT
  *
FROM
  TABLE(
    mysql.system.query(
      query =&amp;gt; &apos;SELECT
        manager_id, GROUP_CONCAT(employee_id)
      FROM
        company.employees
      GROUP BY
        manager_id&apos;
    )
  );
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 id=&quot;model-clause-in-oracle&quot;&gt;MODEL clause in Oracle&lt;/h4&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MODEL&lt;/code&gt; clause in Oracle is an incredibly powerful way to manipulate and
view data. As it isn’t ANSI-compliant, it’s specific to Oracle, but if you want
to use it, now you can! Through polymorphic table functions, you can generate
and perform sophisticated calculations on multidimensional arrays - try saying
that five times fast. We don’t have the time to explain everything about how
this feature works, but if you want clarification, you can check out
&lt;a href=&quot;https://docs.oracle.com/cd/B19306_01/server.102/b14223/sqlmodel.htm&quot;&gt;the Oracle documentation on MODEL&lt;/a&gt;
and try it out for yourself.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT
  SUBSTR(country, 1, 20) country,
  SUBSTR(product, 1, 15) product,
  year,
  sales
FROM
  TABLE(
    oracle.system.query(
      query =&amp;gt; &apos;SELECT
        *
      FROM
        sales_view
      MODEL
        RETURN UPDATED ROWS
        MAIN
          simple_model
        PARTITION BY
          country
        MEASURES
          sales
        RULES
          (sales[&apos;Bounce&apos;, 2001] = 1000,
          sales[&apos;Bounce&apos;, 2002] = sales[&apos;Bounce&apos;, 2001] + sales[&apos;Bounce&apos;, 2000],
          sales[&apos;Y Box&apos;, 2002] = sales[&apos;Y Box&apos;, 2001])
      ORDER BY
        country&apos;
    )
  );
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Funnily enough, Oracle also supports polymorphic table functions, so if you
wanted to, you could use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query&lt;/code&gt; function to then invoke a PTF in Oracle,
including any of the hypothetical examples we went into above! PTFs inside of
PTFs are possible! …though probably not the best idea.&lt;/p&gt;
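
&lt;p&gt;Purely as an illustration, a nested invocation could look like the following,
where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;my_oracle_ptf&lt;/code&gt; is a hypothetical table function defined on the
Oracle side and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;employees&lt;/code&gt; is a placeholder table:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT
  *
FROM
  TABLE(
    oracle.system.query(
      query =&amp;gt; &apos;SELECT * FROM my_oracle_ptf(employees)&apos;
    )
  );
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;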

&lt;h2 id=&quot;question-of-the-episode-where-are-we-at-and-whats-coming-next&quot;&gt;Question of the episode: Where are we at, and what’s coming next?&lt;/h2&gt;

&lt;p&gt;Right now, there are a few things on the radar for moving forward with PTFs. The
first and simpler task at hand is expanding the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query&lt;/code&gt; function to other
connectors. We started with the JDBC connectors, but we have also landed a
similar function called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;raw_query&lt;/code&gt; for Elasticsearch, are working on a BigQuery
implementation, and there may still be more yet to come.&lt;/p&gt;
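
&lt;p&gt;As a sketch of what that looks like for Elasticsearch, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;raw_query&lt;/code&gt; function
accepts a full query written in the Elasticsearch Query DSL. The catalog name
and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders&lt;/code&gt; index here are just placeholders for your own setup:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT
  *
FROM
  TABLE(
    elasticsearch.system.raw_query(
      schema =&amp;gt; &apos;default&apos;,
      index =&amp;gt; &apos;orders&apos;,
      query =&amp;gt; &apos;{
        &quot;query&quot;: {
          &quot;match&quot;: {
            &quot;status&quot;: &quot;shipped&quot;
          }
        }
      }&apos;
    )
  );
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;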

&lt;p&gt;On a broader scope, the reason this was the first PTF to be implemented is
that Trino doesn’t have to do anything to make it work. The next big step in
powering PTFs up is to create an operator and make the engine aware of them, so
that the engine can handle and process PTFs itself, which will open the door to
the wide array of possibilities we explored earlier.&lt;/p&gt;

&lt;p&gt;And finally, once that’s done, we plan on empowering you, the Trino community,
to go out and actually &lt;em&gt;make&lt;/em&gt; some polymorphic table functions. You already can
implement them today, but with some limitations: you can’t use table or
descriptor arguments, and the connector has to perform the execution. But once
the full framework for PTFs has been built, those examples from earlier (and
many possible others) still need to be implemented. There is a
&lt;a href=&quot;https://trino.io/docs/current/develop/table-functions.html&quot;&gt;developer guide&lt;/a&gt; on
implementing table functions which exists today, but there are plans to expand
it so that it’s easier to go in and add the PTFs which will make a difference
for you and your workflows.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Check out the in-person and virtual
&lt;a href=&quot;https://www.meetup.com/pro/trino-community/&quot;&gt;Trino Meetup groups&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests</summary>

      
      
    </entry>
  
    <entry>
      <title>37: Trino powers up the community support</title>
      <link href="https://trino.io/episodes/37.html" rel="alternate" type="text/html" title="37: Trino powers up the community support" />
      <published>2022-06-16T00:00:00+00:00</published>
      <updated>2022-06-16T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/37</id>
      <content type="html" xml:base="https://trino.io/episodes/37.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;p&gt;In this episode we have the pleasure to chat with our colleagues, who now make 
the Trino community better every day:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden/&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate at Starburst&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/n1neinchnick&quot;&gt;Jan Waś&lt;/a&gt;, Software Engineer at Starburst&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/KostasPardalis&quot;&gt;Kostas Pardalis&lt;/a&gt;, Group Product Manager at Starburst&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/Moni4489&quot;&gt;Monica Miller&lt;/a&gt;, Developer Advocate at Starburst&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-382-to-386&quot;&gt;Releases 382 to 386&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-382.html&quot;&gt;Trino 382&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for reading wildcard tables in the BigQuery connector.&lt;/li&gt;
  &lt;li&gt;Support for adding columns in the Delta Lake connector.&lt;/li&gt;
  &lt;li&gt;Support updating Iceberg table partitioning.&lt;/li&gt;
  &lt;li&gt;Improved &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; performance in the MySQL, Oracle, and PostgreSQL connectors.&lt;/li&gt;
  &lt;li&gt;Basic authentication in the Prometheus connector.&lt;/li&gt;
  &lt;li&gt;Exchange spooling on Google Cloud Storage.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-383.html&quot;&gt;Trino 383&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;json_exists&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;json_query&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;json_value&lt;/code&gt; functions.&lt;/li&gt;
  &lt;li&gt;Support for table comments in the Delta Lake connector.&lt;/li&gt;
  &lt;li&gt;Support IAM roles for exchange spooling on S3.&lt;/li&gt;
  &lt;li&gt;Improved performance for aggregation queries.&lt;/li&gt;
&lt;/ul&gt;
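
&lt;p&gt;As a quick sketch of those new JSON functions, using inline literal values
rather than real table data:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT
  json_exists(&apos;{&quot;name&quot;: &quot;Trino&quot;}&apos;, &apos;lax $.name&apos;) AS has_name,
  json_value(&apos;{&quot;name&quot;: &quot;Trino&quot;}&apos;, &apos;lax $.name&apos;) AS name,
  json_query(&apos;{&quot;tags&quot;: [&quot;sql&quot;, &quot;olap&quot;]}&apos;, &apos;lax $.tags&apos;) AS tags;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;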

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-384.html&quot;&gt;Trino 384&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for new pass-through query table function for Druid, MariaDB, MySQL,
Oracle, PostgreSQL, Redshift, SingleStore and SQL Server.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-385.html&quot;&gt;Trino 385&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;json_array&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;json_object&lt;/code&gt; functions.&lt;/li&gt;
  &lt;li&gt;Support for time travel syntax in the Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;timestamp(p)&lt;/code&gt; type in MariaDB connector.&lt;/li&gt;
  &lt;li&gt;Performance improvements in Iceberg connector.&lt;/li&gt;
&lt;/ul&gt;
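
&lt;p&gt;The Iceberg time travel syntax lets you query an earlier state of a table,
either by timestamp or by snapshot ID. The table name and snapshot ID below are
placeholders for your own environment:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT * FROM iceberg.logging.logs FOR TIMESTAMP AS OF TIMESTAMP &apos;2022-06-01 10:00:00 UTC&apos;;

SELECT * FROM iceberg.logging.logs FOR VERSION AS OF 2653342350928201552;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;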

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-386.html&quot;&gt;Trino 386&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved performance for fault-tolerant query execution&lt;/li&gt;
  &lt;li&gt;Faster queries on Delta Lake&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additional highlights worth a mention according to Manfred:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;383 had a regression, don’t use it.&lt;/li&gt;
  &lt;li&gt;As mentioned last time, exchange spooling is now supported on the three major
cloud object storage systems.&lt;/li&gt;
  &lt;li&gt;Query pass-through table function is a massive feature. We are adding this to
other connectors, and more details are coming in a future special episode.&lt;/li&gt;
  &lt;li&gt;Special props to &lt;a href=&quot;https://github.com/kasiafi&quot;&gt;Kasia&lt;/a&gt; for all the new JSON functions.&lt;/li&gt;
  &lt;li&gt;Phoenix 4 support is gone.&lt;/li&gt;
&lt;/ul&gt;
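
&lt;p&gt;As a rough sketch, enabling fault-tolerant execution with exchange spooling on
S3 involves configuration along these lines. The bucket name is a placeholder,
and the fault-tolerant execution documentation covers the full set of
properties:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# config.properties
retry-policy=TASK

# exchange-manager.properties
exchange-manager.name=filesystem
exchange.base-directories=s3://exchange-spooling-bucket
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;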

&lt;p&gt;More detailed information is available in the release notes for
&lt;a href=&quot;https://trino.io/docs/current/release/release-382.html&quot;&gt;Trino 382&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-383.html&quot;&gt;Trino 383&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-384.html&quot;&gt;Trino 384&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-385.html&quot;&gt;Trino 385&lt;/a&gt;,
and
&lt;a href=&quot;https://trino.io/docs/current/release/release-386.html&quot;&gt;Trino 386&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-episode-how-to-strengthen-the-trino-community&quot;&gt;Concept of the episode: How to strengthen the Trino community&lt;/h2&gt;

&lt;p&gt;What is community, and why has this word seen more use around technical projects,
particularly those in the open-source space? There’s really no formal definition
of community in the context of technology. David Spinks, author of the book 
“The Business of Belonging”, defines community as:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;A group of people who feel a shared sense of belonging.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For technical projects, this sense of belonging generally comes from the shared
affinity towards a specific product, like Trino, or it could be a brand that
hosts many products, like Google or Microsoft. There’s a lot that could be 
discussed here regarding why communities have become an essential ingredient to
a project’s success. The quick answer I like to offer is that projects,
open-source or proprietary, that have strong communities behind them
innovate and grow faster, and are more successful overall.&lt;/p&gt;

&lt;p&gt;As such, the Trino Software Foundation (TSF) recognizes that Trino will only be
as successful as the health of the community that builds, tests, uses, and 
shares it. The activities around building a technical community fall in between
engineering, marketing, and customer enablement. A common name that encompasses
the individuals that work in this space is developer relations, DevRel for
short. The goal of our work with the maintainers, contributors, users, and all 
other members of the community is the following:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Grow all aspects of the Trino project, and the Trino community to empower
current and future members of the community.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We introduce some new faces who are stewards in our journey to growing the
adoption of our favorite query engine, what each of them does, and how their
work impacts you as a community member! Most importantly, you can learn how to
get involved and help us learn how to best navigate ideas, issues, or any other
contributions you may have that help Trino to be the best query engine.&lt;/p&gt;

&lt;h3 id=&quot;improving-the-onboarding-and-getting-started-pages&quot;&gt;Improving the onboarding and getting started pages&lt;/h3&gt;

&lt;p&gt;We don’t really have a seamless onboarding experience for new users. Many 
members have asked questions on where to get started. One logical place people
tend to go when browsing the front page of the Trino site is the 
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;getting started tab&lt;/a&gt;, which is ironically 
still on the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino.io/download.html&lt;/code&gt; page. When you open this page, you are
brought to a page primarily containing the latest binary downloads, some
community links, and some reading material to books and other resources.&lt;/p&gt;

&lt;p&gt;The main thing you don’t really see is much getting started material. A lot of
the material is intermediate level at best. There are not many beginner-level
guides to offer the self-service onboarding many are looking for when they just
want to play around without having to bother or wait for anyone to respond. As
it stands today, Brian and Monica have started some work in this area to make
the onboarding simpler.&lt;/p&gt;

&lt;p&gt;A very common self-service getting started material is the 
&lt;a href=&quot;https://github.com/bitsondatadev/trino-getting-started&quot;&gt;trino-getting-started&lt;/a&gt;
repo that Brian created to host demonstrations for the broadcast
to show off new features or connector capabilities. This has been a good way
to offer a simple environment to get people started. However, the only way
to find this repository today is to ask someone. It would be ideal to showcase
getting started materials as part of the default experience of learning about
Trino.&lt;/p&gt;

&lt;p&gt;Monica is now working on building up some demos using SaaS products like
Starburst Galaxy as another method of using Trino without needing to install
Docker or use any of your own hardware to run through some examples.
These options are typically more UI driven and much more approachable by
members of the community who aren’t engineers or administrators.&lt;/p&gt;

&lt;h3 id=&quot;release-process&quot;&gt;Release process&lt;/h3&gt;

&lt;h4 id=&quot;filling-out-a-pull-request&quot;&gt;Filling out a pull request&lt;/h4&gt;

&lt;p&gt;We’ve got a handy PR template that exists for all contributors to use when 
they’re submitting a pull request to Trino. Most of it is simple and
self-explanatory. We ask you to describe what’s happening, where the change is
happening, and what type of change it is. These are for the sake of the
reviewers, giving them some important context so they understand what’s going on
when they review the code. For simpler changes, it’s not usually necessary to go
into a ton of detail here, but it’s nice to give a little summary for anyone looking at the PR.&lt;/p&gt;

&lt;p&gt;The next steps are what really matter for every single PR that’s going to be
merged - the documentation and release notes for a change. These are about
communicating to our users. Documentation refers to Trino docs, not code
comments. If Trino users need to be told how to use the feature you’re
changing because of how you’re changing it, that means we need to have
documentation for it. The PR template gives the options for how to go about
this, but it’s incredibly helpful to have this filled out. Similarly, we ask
whether or not release notes are necessary for the change, and what release
notes you propose for your change. Generally speaking, if it needs to be
documented, it almost always should have a release note. Even if it isn’t
documented, a release note is often a good idea - things like performance
improvements don’t require our users to change how they use Trino, but they
won’t mind knowing that something has gotten better! The release process
involves heavy editing of release notes, so it’s ok for the suggested note to be
imperfect.&lt;/p&gt;

&lt;h3 id=&quot;what-is-developer-experience-devex&quot;&gt;What is developer experience (DevEx)?&lt;/h3&gt;

&lt;p&gt;Trino is a technology that is built by developers, but also heavily used by 
developers. We want to ensure that the experience of both contributors and users
of Trino is the best possible. To do that, we have to focus on many different
aspects of this experience, from committing code to the CLIs and tools we offer
for debugging queries and most importantly to building a sustainable community
that can give answers and drive the future of the project. This is what DevEx is
for Trino.&lt;/p&gt;

&lt;h3 id=&quot;community-metrics&quot;&gt;Community metrics&lt;/h3&gt;

&lt;p&gt;A while ago we started gathering metrics related to &lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;the Trino GitHub repository&lt;/a&gt;.
This helped us identify issues like huge CI queue times. Most importantly, we can verify
that the changes we made improved things, and by how much.&lt;/p&gt;

&lt;p&gt;In February this year, the 95th percentile of the CI queue time (not even the 
total run time!) was as high as almost 7 hours. Trino uses public GitHub runners,
and only 60 jobs can run concurrently. This is a 
bottleneck because Trino has extensive test coverage for the core engine, all
connectors, and other plugins. Because we can’t increase the number of runners,
we looked into doing impact analysis to skip tests for modules not impacted by
any change in a pull request.&lt;/p&gt;

&lt;p&gt;Since April, the 95th percentile of the CI queue time has been under 1 hour, even 
though the number of contributions is at an all-time high.&lt;/p&gt;

&lt;p&gt;We keep track of these selected metrics in reports we create by running queries 
using the Trino CLI, saving the results in a markdown file, and publishing them 
as static pages using GitHub pages. The data is gathered using 
Trino connectors for the GitHub API and Git repositories. There’s a GitHub
Actions workflow running on a schedule that spins up a Trino server, so there’s no
infrastructure to maintain, except for a single S3 bucket. All of it is publicly
available in the &lt;a href=&quot;https://github.com/nineinchnick/trino-cicd&quot;&gt;nineinchnick/trino-cicd&lt;/a&gt; 
repository, and its sidebar links to the GitHub Pages site with the reports.&lt;/p&gt;
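
&lt;p&gt;As a rough sketch of that reporting flow, assuming a CLI version that supports
the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MARKDOWN&lt;/code&gt; output format, and with a placeholder server URL and a
hypothetical &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pulls&lt;/code&gt; table from the GitHub connector:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;trino --server https://trino.example.com \
  --catalog github --schema default \
  --execute &apos;SELECT state, count(*) AS prs FROM pulls GROUP BY state&apos; \
  --output-format=MARKDOWN &amp;gt; report.md
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;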

&lt;p&gt;We continue to add more reports, like tracking flaky tests or pull request 
activity:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://nineinchnick.github.io/trino-cicd/reports/flaky/&quot;&gt;Flaky tests&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://nineinchnick.github.io/trino-cicd/reports/pr/&quot;&gt;Pull request activity&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By being data-driven and transparent, we make sure to provide a good 
experience for everyone, and this also helps us figure out where we need to 
focus more resources.&lt;/p&gt;

&lt;p&gt;We’re open to suggestions on what to track and which metrics to report on, so 
feel free to open issues and pull requests in the repository mentioned above, or
start a thread on the Trino Slack.&lt;/p&gt;

&lt;h3 id=&quot;pull-request-triage&quot;&gt;Pull request triage&lt;/h3&gt;

&lt;p&gt;One of the things we’ve been tracking over the last couple of weeks has been the 
state of incoming PRs. We want to make sure that
each PR reaches a maintainer, and that they all receive timely feedback after 
asking for a review. The goal in looking into this process is to help 
streamline and improve the time-to-initial-comment. The pleasant discovery 
is that it doesn’t seem like we have a lot of room to improve on that front. Not
to pat ourselves on the back too heavily, but PRs find their way to maintainers,
and get an initial review quite quickly, and there’s little work to be done on
that front.&lt;/p&gt;

&lt;p&gt;Our next exploration is tracking PRs that don’t quickly get
approved and merged, and monitoring their life cycle and making sure follow-up
reviews are happening in a timely manner as well. We now know that we are
effective at giving initial feedback on a PR, but we also want to make sure that
these PRs aren’t falling off a cliff or turning into a long, drawn-out process
where each development iteration is slower than the last.&lt;/p&gt;

&lt;h2 id=&quot;pull-requests-of-the-episode-pr-12259-support-updating-iceberg-table-partitioning&quot;&gt;Pull requests of the episode: PR 12259: Support updating Iceberg table partitioning&lt;/h2&gt;

&lt;p&gt;This month’s &lt;a href=&quot;https://github.com/trinodb/trino/issues/12259&quot;&gt;PR of the episode&lt;/a&gt; 
was contributed by &lt;a href=&quot;https://github.com/alexjo2144&quot;&gt;alexjo2144&lt;/a&gt;. This feature is
an exciting update on the ability to modify the partition specification of a 
table in Iceberg. This is an update since Brian 
&lt;a href=&quot;/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg.html&quot;&gt;wrote about this feature&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;At the time of writing, Trino is able to perform reads from tables that have 
multiple partition spec changes but partition evolution write support does not
yet exist.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This brings us much closer to having more feature parity with other query 
engines to manage Iceberg tables entirely through Trino. Thanks to our friend 
&lt;a href=&quot;https://github.com/findinpath&quot;&gt;Marius Grama &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;findinpath&lt;/code&gt;&lt;/a&gt; for the review.&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-episode-iceberg-table-partition-migrations&quot;&gt;Demo of the episode: Iceberg table partition migrations&lt;/h2&gt;

&lt;p&gt;For this episode’s demo, you’ll need a local Trino coordinator, MinIO instance,
and Hive metastore backed by a database. Clone the 
&lt;a href=&quot;https://github.com/bitsondatadev/trino-getting-started&quot;&gt;trino-getting-started&lt;/a&gt; 
repository and navigate to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iceberg/trino-iceberg-minio&lt;/code&gt; directory. Then 
start up the containers using Docker Compose.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone git@github.com:bitsondatadev/trino-getting-started.git

cd trino-getting-started/iceberg/trino-iceberg-minio

docker-compose up -d
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This demo is actually very similar to a demo we did in 
&lt;a href=&quot;/episodes/15.html&quot;&gt;episode 15&lt;/a&gt;, except now we get to showcase one of Iceberg’s
most exciting features, partition evolution.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/**
 * Make sure to first create a bucket named &quot;logging&quot; in MinIO before running
 */

CREATE SCHEMA iceberg.logging
WITH (location = &apos;s3a://logging/&apos;);

CREATE TABLE iceberg.logging.logs (
   level varchar NOT NULL,
   event_time timestamp(6) with time zone NOT NULL,
   message varchar NOT NULL,
   call_stack array(varchar)
)
WITH (
   format = &apos;ORC&apos;,
   partitioning = ARRAY[&apos;day(event_time)&apos;]
);

/**
 * Inserting two records. Notice event_time is on the same day but different hours.
 */

INSERT INTO iceberg.logging.logs VALUES 
(
  &apos;ERROR&apos;, 
  timestamp &apos;2021-04-01 12:23:53.383345&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;, 
  &apos;1 message&apos;,
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;]
),
(
  &apos;ERROR&apos;, 
  timestamp &apos;2021-04-01 13:36:23&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;, 
  &apos;2 message&apos;, 
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;]
);

SELECT * FROM iceberg.logging.logs;
SELECT * FROM iceberg.logging.&quot;logs$partitions&quot;;

/**
 * Notice one partition was created for both records at the day granularity.
 */

/**
 * Update the partitioning from daily to hourly 🎉
 */
ALTER TABLE iceberg.logging.logs 
SET PROPERTIES partitioning = ARRAY[&apos;hour(event_time)&apos;];

/**
 * Inserting three records. Notice event_time is on the same day but different hours.
 */
INSERT INTO iceberg.logging.logs VALUES 
(
  &apos;ERROR&apos;, 
  timestamp &apos;2021-04-01 15:55:23&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;, 
  &apos;3 message&apos;, 
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;]
), 
(
  &apos;WARN&apos;, 
  timestamp &apos;2021-04-01 15:55:23&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;, 
  &apos;4 message&apos;, 
  ARRAY [&apos;bad things could be happening&apos;]
), 
(
  &apos;WARN&apos;, 
  timestamp &apos;2021-04-01 16:55:23&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;, 
  &apos;5 message&apos;, 
  ARRAY [&apos;bad things could be happening&apos;]
);

SELECT * FROM iceberg.logging.logs;
SELECT * FROM iceberg.logging.&quot;logs$partitions&quot;;

/**
 * Now there are three partitions:
 * 1) One partition at the day granularity containing our original records.
 * 2) One at the hour granularity for hour 15 containing two new records.
 * 3) One at the hour granularity for hour 16 containing the last new record.
 */

SELECT * FROM iceberg.logging.logs 
WHERE event_time &amp;lt; timestamp &apos;2021-04-01 16:55:23&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;;

/**
 * This query correctly returns 4 records with only the first two partitions
 * being touched. 
 */

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There’s been a lot of cool things going into the Iceberg connector these days,
and another exciting one that came out in release 381 was the support for 
&lt;a href=&quot;https://github.com/trinodb/trino/pull/12026&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt; in Iceberg&lt;/a&gt;. So we’re 
gonna showcase that:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/**
 * Update
 */
UPDATE
  iceberg.logging.logs
SET
  call_stack = call_stack || &apos;WHALE HELLO THERE!&apos;
WHERE
  lower(level) = &apos;warn&apos;;

DROP TABLE iceberg.logging.logs;

DROP SCHEMA iceberg.logging;

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;question-of-the-episode-can-i-force-a-pushdown-join-into-a-connected-data-source&quot;&gt;Question of the episode: Can I force a pushdown join into a connected data source?&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://www.trinoforum.org/t/forcing-push-down-join-into-connected-data-source/177&quot;&gt;Full question from Trino Forum&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Is there a way to “quote” a sub query, to tell the Trino planner to just push 
down the query and not bother making a sub plan?&lt;/p&gt;

&lt;p&gt;I have a star schema, with one huge table (&amp;gt;100M rows) and a dimension table 
that has static attributes of the huge table.
The dimension table is filtered to create a map, that is joined to the huge 
table. The result is grouped by a dimension, and finally some of the metrics 
from the huge table are aggregated to calculate stats.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Answer:&lt;/em&gt; We’ve recently introduced Polymorphic Table Functions to Trino in 
version 381.&lt;/p&gt;

&lt;p&gt;In version 384, which was just released a few days ago, the query table function
was added in PR 12325.&lt;/p&gt;

&lt;p&gt;For a quick example in MySQL:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;trino&amp;gt; USE mysql.tiny;
USE
trino:tiny&amp;gt; SELECT * FROM TABLE(system.query(query =&amp;gt; &apos;SELECT 1 a&apos;));
a
---
1
(1 row)

trino:tiny&amp;gt; SELECT * FROM TABLE(system.query(query =&amp;gt; &apos;SELECT @@version&apos;));
@@version
-----------
8.0.29
(1 row)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So this will run exactly that command on the underlying database (not exactly a 
pushdown, but a pass-through) and return the results to Trino as a table. 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT @@version&lt;/code&gt; is MySQL-specific syntax that returns MySQL output as a table
that Trino is now able to further process.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests</summary>

      
      
    </entry>
  
    <entry>
      <title>36: Trino plans to jump to Java 17</title>
      <link href="https://trino.io/episodes/36.html" rel="alternate" type="text/html" title="36: Trino plans to jump to Java 17" />
      <published>2022-05-19T00:00:00+00:00</published>
      <updated>2022-05-19T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/36</id>
      <content type="html" xml:base="https://trino.io/episodes/36.html">&lt;h2 id=&quot;releases-379-to-381&quot;&gt;Releases 379 to 381&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-379.html&quot;&gt;Trino 379&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New MariaDB connector&lt;/li&gt;
  &lt;li&gt;Performance improvements for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JOIN&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNION&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Support for Google Cloud Storage in the Delta Lake connector&lt;/li&gt;
  &lt;li&gt;Support for Pinot 0.10&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-380.html&quot;&gt;Trino 380&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Update Cassandra connector to support v5 and v6 protocols.&lt;/li&gt;
  &lt;li&gt;Rename properties controlling Hive view parsing.&lt;/li&gt;
  &lt;li&gt;Allow changing file and table format with the Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Add support for bulk data insertion in SQL Server connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-381.html&quot;&gt;Trino 381&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt; in Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Experimental support for table functions.&lt;/li&gt;
  &lt;li&gt;Support for exchange spooling on Azure Blob Storage.&lt;/li&gt;
  &lt;li&gt;Support reading snapshot tables and materialized views in BigQuery connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additional highlights worth a mention according to Manfred:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Next is exchange spooling on &lt;a href=&quot;https://github.com/trinodb/trino/pull/12360&quot;&gt;Google Cloud Storage&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Framework for table functions is in place, implementations in connectors are coming.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ldap.ssl-trust-certificate&lt;/code&gt; as legacy config removes upgrade failures.&lt;/li&gt;
  &lt;li&gt;Introduce the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;least-waste&lt;/code&gt; low memory task killer policy.&lt;/li&gt;
  &lt;li&gt;Disable auto-suggestion in CLI.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the release notes for
&lt;a href=&quot;https://trino.io/docs/current/release/release-379.html&quot;&gt;Trino 379&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-380.html&quot;&gt;Trino 380&lt;/a&gt;,
and
&lt;a href=&quot;https://trino.io/docs/current/release/release-381.html&quot;&gt;Trino 381&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;cinco-de-trino-recap-blog-post&quot;&gt;Cinco de Trino recap blog post&lt;/h3&gt;

&lt;p&gt;Check out this blog post that details all the cool talks that took place at
&lt;a href=&quot;/blog/2022/05/17/cinco-de-trino-recap.html&quot;&gt;Cinco de Trino&lt;/a&gt; and
includes video resources. This was a mini version of the Trino Summit, which
will take place later this year.&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-episode-will-trino-be-making-a-vectorized-c-version-of-trino-workers&quot;&gt;Question of the episode: Will Trino be making a vectorized C++ version of Trino workers?&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trinodb.slack.com/archives/CFLB9AMBN/p1638450883102500&quot;&gt;Full question from Trino Slack&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Answer:&lt;/em&gt; Writing a C++ worker would require each plugin to be implemented in
C++ as well. However, you don’t need C++ for vectorization. Java already does a
technique called &lt;a href=&quot;https://web.archive.org/web/20211111020334/http://daniel-strecker.com/blog/2020-01-14_auto_vectorization_in_java/&quot;&gt;auto-vectorization&lt;/a&gt;,
which we will demonstrate later in the show! Java 17 also introduces the new
&lt;a href=&quot;https://openjdk.java.net/jeps/414&quot;&gt;Vector API&lt;/a&gt;, which unlocks complex usage
patterns that we can invest in moving forward. However, there is much more to
making operations fast than bare metal speed, and that is what we are going to
focus on.&lt;/p&gt;

&lt;p&gt;To demonstrate this, I’d like to use an analogy. Comparing C++ and Java
implementations is like comparing the two fastest men in the world. Usain Bolt
holds the most world records in men’s track to date, and his teammate Yohan
Blake holds many of the second-place titles. Most of us know Usain Bolt is the
faster of the two, and you may not have known or remembered Yohan’s name before.
Want to hear something crazy? Yohan has beaten Usain Bolt in a few races. The
two are so close in speed that races between them come down to milliseconds. The
main difference in this analogy is that speed is the only thing that matters in
an Olympic race. However, programming languages and frameworks have a lot more
tradeoffs.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/36/usain-bolt-yohan-blake.webp&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;The point is, Java is fast and, more importantly, it removes a lot of the
burden of maintaining and scaling out the code. This is conducive to a healthy
open-source project, and lowers the barrier for collaboration. Rather than go
against this and take on the feat of rewriting an entire system in C++, why not
lean into the incredible innovation recent Java features have to offer to
improve performance even more?&lt;/p&gt;

&lt;p&gt;Another important aspect is that rather than chasing the fastest bare metal
speed, it’s also incredibly important to dedicate time to ensuring that Trino’s
optimizer is producing the best possible plans to avoid doing unnecessary work.
To continue with the analogy, imagine we have Usain and Yohan go head to head in
a 100m race on a 400m track. We may expect that Usain will likely win, given
his track record. However, if Usain is given the wrong instructions and runs in
the wrong direction (300m), my bet is that Yohan will win the race.&lt;/p&gt;

&lt;p&gt;In essence, Trino, while still picking up bare metal performance improvements
from the JVM, will focus on not wasting time on suboptimal query plans before or
during runtime. So many optimizations are added in every release that the result
is a work-smarter-not-harder query engine.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-episode-java-17-and-rearchitecting-trino&quot;&gt;Concept of the episode: Java 17 and rearchitecting Trino&lt;/h2&gt;

&lt;p&gt;As Trino prepares to &lt;a href=&quot;https://github.com/trinodb/trino/issues/9876&quot;&gt;update to Java 17&lt;/a&gt;,
we wanted to give a glimpse at what has happened between the current required
JDK version, JDK 11, and future version JDK 17. Both of these versions are
long-term support versions, and in the four years from 11 to 17 
&lt;a href=&quot;https://openjdk.java.net/projects/jdk/17/jeps-since-jdk-11&quot;&gt;a lot of exciting improvements were added&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;java-17-updates&quot;&gt;Java 17 updates&lt;/h3&gt;

&lt;p&gt;Here are some &lt;a href=&quot;https://openjdk.java.net/projects/jdk/17/jeps-since-jdk-11&quot;&gt;updates coming up in Java 17&lt;/a&gt;.&lt;/p&gt;

&lt;h4 id=&quot;performance&quot;&gt;Performance&lt;/h4&gt;

&lt;p&gt;There were several JDK Enhancement Proposals (JEPs) that improve performance,
as well as many small changes to the JVM:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.java.net/jeps/339&quot;&gt;JEP 339&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.java.net/jeps/352&quot;&gt;JEP 352&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.java.net/jeps/356&quot;&gt;JEP 356&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.java.net/jeps/387&quot;&gt;JEP 387&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.java.net/jeps/412&quot;&gt;JEP 412&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Performance is a multifaceted topic that includes factors like throughput, 
latency, memory footprint, startup, ramp up, pause times, and shut down time.&lt;/p&gt;

&lt;p&gt;You can use standardized benchmarks like
&lt;a href=&quot;https://www.spec.org/jbb2015/&quot;&gt;SPECjbb® 2015&lt;/a&gt; to test a Java application on
most of these performance factors. Aside from the formalized benchmarks, it’s
interesting to see the Java community come up with microbenchmarks to test
relative speedups of JVMs on their own applications.
&lt;a href=&quot;https://www.optaplanner.org/blog/2021/09/15/HowMuchFasterIsJava17.html&quot;&gt;This user benchmark&lt;/a&gt;
found an 8.66% improvement in speed when using the G1 garbage collector. They
isolated modules of their application to measure each microbenchmark separately.&lt;/p&gt;

&lt;p&gt;Martin did a similar test late last year, and reported anywhere from 10-15% 
improvement in speed in Java 17 using the G1 garbage collector. This is an 
exciting development and we hope to publish more about this as we get closer to 
updating.&lt;/p&gt;

&lt;h4 id=&quot;garbage-collectors&quot;&gt;Garbage collectors&lt;/h4&gt;

&lt;p&gt;Although garbage collectors are performance enhancements in their own right,
there are so many exciting changes around garbage collectors between Java 11 and
Java 17 that they earn their own section.&lt;/p&gt;

&lt;p&gt;First, not one but two concurrent garbage collectors have made their way out
of experimental status, and are ready for production use.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.java.net/jeps/377&quot;&gt;JEP 377: ZGC: A Scalable Low-Latency Garbage Collector&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.java.net/jeps/379&quot;&gt;JEP 379: Shenandoah: A Low-Pause-Time Garbage Collector&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
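
&lt;p&gt;If you want to try one of these collectors, selecting it is a single flag in
Trino’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jvm.config&lt;/code&gt;. As a sketch (G1 remains the
recommended default for Trino, so treat this as an experiment rather than a
recommendation):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# pick exactly one garbage collector
-XX:+UseZGC
# or
-XX:+UseShenandoahGC
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;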

&lt;p&gt;Aside from that, there are a bunch of big improvements to G1.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.java.net/jeps/344&quot;&gt;JEP 344: Abortable Mixed Collections for G1&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.java.net/jeps/345&quot;&gt;JEP 345: NUMA-Aware Memory Allocation for G1&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.java.net/jeps/346&quot;&gt;JEP 346: Promptly Return Unused Committed Memory from G1&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In a &lt;a href=&quot;https://kstefanj.github.io/2021/11/24/gc-progress-8-17.html&quot;&gt;fantastic writeup and benchmark&lt;/a&gt;
by Stefan Johansson, they ran the &lt;a href=&quot;https://www.spec.org/jbb2015/&quot;&gt;SPECjbb® 2015&lt;/a&gt;
benchmark to evaluate the improvements of the different garbage collectors
across LTS versions.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/36/throughput.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://kstefanj.github.io/2021/11/24/gc-progress-8-17.html&quot;&gt;Stefan Johansson&apos;s Blog&lt;/a&gt;
&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/36/latency.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://kstefanj.github.io/2021/11/24/gc-progress-8-17.html&quot;&gt;Stefan Johansson&apos;s Blog&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;Pay attention to this chart, as it showcases the advantage of having a 
concurrent garbage collector like ZGC or Shenandoah that doesn’t interfere with
your application code. It’s incredible that 99% of the GC operations only took 
0.1ms. Wild!&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/36/p99-pause.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://kstefanj.github.io/2021/11/24/gc-progress-8-17.html&quot;&gt;Stefan Johansson&apos;s Blog&lt;/a&gt;
&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/36/footprint.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://kstefanj.github.io/2021/11/24/gc-progress-8-17.html&quot;&gt;Stefan Johansson&apos;s Blog&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;Take particular note of the massive improvement of G1. This is especially
exciting because G1 is recommended for Trino usage. It’s still too early to
determine whether ZGC or Shenandoah will perform better overall, as this depends
on the context in which the JVM is running. One thing to look forward to is the
incredible drop in memory footprint over the different versions!&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/36/g1-memory-footprint.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://www.youtube.com/watch?v=0BpY132mKm0&quot;&gt;Java YouTube Channel&lt;/a&gt;
&lt;/p&gt;

&lt;h4 id=&quot;vector-api-2nd-incubator-status&quot;&gt;Vector API (2nd incubator status)&lt;/h4&gt;

&lt;p&gt;One available capability that is still incubating is the
&lt;a href=&quot;https://openjdk.java.net/jeps/414&quot;&gt;Vector API&lt;/a&gt;. Trino currently takes advantage
of the auto-vectorization that comes for free when the compiler detects a
vectorizable loop, like this one taken from Daniel Strecker’s
&lt;a href=&quot;https://web.archive.org/web/20211111020334/http://daniel-strecker.com/blog/2020-01-14_auto_vectorization_in_java/&quot;&gt;auto-vectorization blog&lt;/a&gt;:&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;cm&quot;&gt;/**
 * Run with this command to show native assembly:&amp;lt;br/&amp;gt;
 * java -XX:+UnlockDiagnosticVMOptions
 * -XX:CompileCommand=print,VectorizationMicroBenchmark.square
 * VectorizationMicroBenchmark
 */&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;VectorizationMicroBenchmark&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;

    &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;square&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;length&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;];&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// line 11&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;args&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1024&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;];&lt;/span&gt;

        &lt;span class=&quot;c1&quot;&gt;// repeatedly invoke the method under test. this&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;// causes the JIT compiler to optimize the method&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;square&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Without auto-vectorization, the compiler emits the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;vmulss&lt;/code&gt; instruction
(multiply scalar single-precision). With auto-vectorization, it emits
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;vmulps&lt;/code&gt; (multiply packed single-precision), a SIMD
instruction the JIT compiler substituted for us without manual intervention.&lt;/p&gt;

&lt;p&gt;However, this isn’t always so straightforward to detect. As you can see from the
comments in the example, special criteria need to be met. For this, you can use
the Vector API to directly interface with SIMD and GPU instructions. We will 
show more on this in the demo.&lt;/p&gt;
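
&lt;p&gt;As a sketch of what explicit vectorization looks like, here is the same
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;square&lt;/code&gt; loop rewritten against the incubating Vector API. The
class name is ours for illustration; compiling and running it requires
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--add-modules=jdk.incubator.vector&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

public class ExplicitSquare {

    static final VectorSpecies&amp;lt;Float&amp;gt; SPECIES = FloatVector.SPECIES_PREFERRED;

    static void square(float[] a) {
        int i = 0;
        // process as many full SIMD lanes as fit
        for (; i &amp;lt; SPECIES.loopBound(a.length); i += SPECIES.length()) {
            FloatVector v = FloatVector.fromArray(SPECIES, a, i);
            v.mul(v).intoArray(a, i);
        }
        // scalar tail for any leftover elements
        for (; i &amp;lt; a.length; i++) {
            a[i] = a[i] * a[i];
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;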

&lt;h4 id=&quot;language-features&quot;&gt;Language features&lt;/h4&gt;

&lt;p&gt;Beyond the performance improvements, Java 17 includes some exciting new Java 
language updates and improvements. While some may not consider this as exciting
as performance boosts, language enhancements make it easier to write higher 
quality and maintainable code. This is especially important for an open source 
project that is maintained by many individuals.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;A very useful change for Trino is the new support for 
&lt;a href=&quot;https://openjdk.java.net/jeps/378&quot;&gt;multiline text blocks&lt;/a&gt;. This allows you to 
go from having to write a SQL query represented in a one-dimensional string 
literal like this:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;  String query = &quot;SELECT \&quot;emp_id\&quot;, \&quot;last_name\&quot; FROM \&quot;employee\&quot;\n&quot; +
                 &quot;WHERE \&quot;city\&quot; = &apos;Indianapolis&apos;\n&quot; +
                 &quot;ORDER BY \&quot;emp_id\&quot;, \&quot;last_name\&quot;;\n&quot;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;

    &lt;p&gt;to a much more readable two-dimensional string block like this:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;  String query = &quot;&quot;&quot;
                 SELECT &quot;emp_id&quot;, &quot;last_name&quot; FROM &quot;employee&quot;
                 WHERE &quot;city&quot; = &apos;Indianapolis&apos;
                 ORDER BY &quot;emp_id&quot;, &quot;last_name&quot;;
                 &quot;&quot;&quot;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The new &lt;a href=&quot;https://openjdk.java.net/jeps/361&quot;&gt;switch expressions&lt;/a&gt; remove the
difficult-to-read syntax of switches that led to many bugs and confusing code
in the past. Particularly the ambiguity of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;break;&lt;/code&gt; statement logic:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;  switch (day) {
      case MONDAY:
      case FRIDAY:
      case SUNDAY:
          System.out.println(6);
          break;
      case TUESDAY:
          System.out.println(7);
          break;
      case THURSDAY:
      case SATURDAY:
          System.out.println(8);
          break;
      case WEDNESDAY:
          System.out.println(9);
          break;
  }
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;

    &lt;p&gt;is made much easier to reason about using a functional clause to define the
  correct code to execute for a set of labels:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;  switch (day) {
      case MONDAY, FRIDAY, SUNDAY -&amp;gt; System.out.println(6);
      case TUESDAY                -&amp;gt; System.out.println(7);
      case THURSDAY, SATURDAY     -&amp;gt; System.out.println(8);
      case WEDNESDAY              -&amp;gt; System.out.println(9);
  }
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Having to cast after checking for a type has always been an annoyance to
many Java developers.
&lt;a href=&quot;https://openjdk.java.net/jeps/394&quot;&gt;Pattern Matching for instanceof&lt;/a&gt; makes this
go away. Look at this example you may be familiar with:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;  if (obj instanceof String) {
      String s = (String) obj;    // grr...
      ...
  }
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;

    &lt;p&gt;Now imagine you don’t have to have a cast statement for every one of these
  lying around in your codebase:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;  if (obj instanceof String s) {
      // Let pattern matching do the work!
      ...
  }
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://openjdk.java.net/jeps/358&quot;&gt;Helpful NullPointerExceptions&lt;/a&gt; are
particularly exciting. Previously, a seemingly reasonless null gave you no
detail, and required you to chase down where it happened in the code. Now the
exception message includes information that pinpoints exactly what was null.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
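
&lt;p&gt;To make that last point concrete, here is a tiny, hypothetical example of the
difference. On Java 11 the exception carries no detail; on Java 17 the message
pinpoints what was null (the exact wording can vary with compiler debug
options):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;String[] words = new String[3];   // elements default to null
int n = words[0].length();        // throws a NullPointerException

// Java 11: java.lang.NullPointerException
// Java 17: java.lang.NullPointerException: Cannot invoke
//          &quot;String.length()&quot; because &quot;words[0]&quot; is null
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;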

&lt;h3 id=&quot;rearchitecting-trino&quot;&gt;Rearchitecting Trino&lt;/h3&gt;

&lt;p&gt;With all these exciting changes, what does this mean for Trino? Let’s first dive 
into the thing that many of our users dread…upgrading.&lt;/p&gt;

&lt;h4 id=&quot;upgrade-to-java-17-when-its-time&quot;&gt;Upgrade to Java 17 (When it’s time)&lt;/h4&gt;

&lt;p&gt;As mentioned before, Java 17 is the current LTS version, following Java 11. Java
17 provides significant improvements that we outlined before. We believe that 
once we update, everyone should be running version 17 to get the best experience
out of Trino. Moving to Java 17 allows us to take advantage of many improvements
to the JDK and the Java language that were introduced since Java 11. There are 
some reasons people say they can’t update.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;Updating Java in all the clients and code that calls Trino is tedious.&lt;/p&gt;

    &lt;blockquote&gt;
      &lt;p&gt;Luckily, you only need to update Java on the server that Trino is
 running on. The client or CLI can still run any version of Java.&lt;/p&gt;
    &lt;/blockquote&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;There are conflicting Java versions on the nodes where Trino servers run.&lt;/p&gt;

    &lt;blockquote&gt;
      &lt;p&gt;If you are running another application that depends on Java on the same
 node, you shouldn’t be. Ideally Trino runs on its own servers. If there is a
 smaller application there to, for example, monitor Trino, then you should be
 able to install a separate version of Java for it.&lt;/p&gt;
    &lt;/blockquote&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;There is a company policy requiring specific JDKs be installed on all 
 servers.&lt;/p&gt;

    &lt;blockquote&gt;
      &lt;p&gt;You can have side-by-side installs of multiple versions of the JDK and use
 the appropriate one. You just need to launch Trino with the correct Java
 command. If your company is against using a newer JDK, you can point out the
 arguments above to update the policy to at least include JDK 17.&lt;/p&gt;
    &lt;/blockquote&gt;
  &lt;/li&gt;
&lt;/ol&gt;
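
&lt;p&gt;For the side-by-side scenario, choosing the JDK at launch time can be as
simple as putting the right install first on the path before starting Trino. A
sketch, with a hypothetical install location:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# hypothetical location of the JDK 17 install
export JAVA_HOME=/usr/lib/jvm/jdk-17
export PATH=&quot;$JAVA_HOME/bin:$PATH&quot;

bin/launcher start
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;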

&lt;h4 id=&quot;iterating-and-improving-trino&quot;&gt;Iterating and improving Trino&lt;/h4&gt;

&lt;p&gt;We’re also in the process of revamping the core execution engine, which 
enables us to implement the following improvements:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Perform adaptive evaluation of expressions based on runtime cost.&lt;/li&gt;
  &lt;li&gt;Specialize evaluation for different data encodings (RLE, dictionary, etc.).&lt;/li&gt;
  &lt;li&gt;Implement tighter evaluation loops that make it easier for the VM to vectorize
automatically and generate better machine code.&lt;/li&gt;
  &lt;li&gt;Implement evaluation of certain operations more efficiently by taking 
advantage of SIMD or GPU-based processing.&lt;/li&gt;
  &lt;li&gt;Columnar evaluation.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;project-hummingbird&quot;&gt;Project Hummingbird&lt;/h4&gt;

&lt;p&gt;Just as we did with the efforts around
&lt;a href=&quot;/blog/2022/05/05/tardigrade-launch.html&quot;&gt;Project Tardigrade&lt;/a&gt;, we
want to centralize these efforts, gather a set of motivated community members
around them, and give the project a cool name.&lt;/p&gt;

&lt;p&gt;After some discussion, we would like to announce &lt;em&gt;Project Hummingbird&lt;/em&gt; is the
new banner for the efforts around improving performance and concentrated updates
to the core of Trino.&lt;/p&gt;

&lt;p&gt;We chose hummingbirds as mascots because they are adaptive, light, and fast.
Hummingbirds are the only birds capable of flying in any direction, and they are
incredibly quick. As Trino evolves into a query engine that is capable of
adapting to its environment during query runtime, the comparison to these agile
and beautiful creatures made sense.&lt;/p&gt;

&lt;h4 id=&quot;vectorization-is-not-a-silver-bullet&quot;&gt;Vectorization is not a silver bullet&lt;/h4&gt;

&lt;p&gt;There are many ways to parallelize the operations that we run on the Trino
server. There’s inter-node parallelization, which splits the data to be operated
on across nodes. There’s intra-node parallelization, which generally refers to
multithreading across the cores of a CPU.&lt;/p&gt;

&lt;p&gt;As we start to move towards vectorization, we become more hardware dependent,
and just like with any other hardware setting, your mileage may vary depending
on the limitations of the resources Trino is running on.&lt;/p&gt;

&lt;p&gt;Further, any time parallelization is applied, there is generally some overhead
to coordinate lookups, shuffle more data across processors, and so on.&lt;/p&gt;

&lt;h2 id=&quot;pull-requests-of-the-episode-pr-4649-disable-jit-byte-code-recompilation-cutoffs-in-default-jvmconfig&quot;&gt;Pull request of the episode: PR 4649: Disable JIT byte code recompilation cutoffs in default jvm.config&lt;/h2&gt;

&lt;p&gt;This episode’s &lt;a href=&quot;https://github.com/trinodb/trino/pull/4649&quot;&gt;pull request&lt;/a&gt; was
added by &lt;a href=&quot;https://github.com/shubhamtagra&quot;&gt;Shubham Tagra&lt;/a&gt; to raise the JIT
recompilation cutoffs for large methods in the JVM. If these limits are hit, the
JIT compiler calls an uncommon_trap to deoptimize the code. If the function is
continually retried, a continuous deopt or “deopt storm” can occur, causing a
large loss of CPU time. The underlying behavior is actually a bug in the JVM, so
this pull request provides a workaround.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-XX:PerMethodRecompilationCutoff=10000
-XX:PerBytecodeRecompilationCutoff=10000
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Multiple companies, from
&lt;a href=&quot;/blog/2021/10/06/jvm-issues-at-comcast.html&quot;&gt;Comcast&lt;/a&gt; to
&lt;a href=&quot;https://shopify.engineering/faster-trino-query-execution-infrastructure&quot;&gt;Shopify&lt;/a&gt;,
had reported these “random slowness” issues, which were resolved when these JVM
settings were added.&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-episode-fizzbuzz---simd-style&quot;&gt;Demo of the episode: FizzBuzz - SIMD style!&lt;/h2&gt;

&lt;p&gt;Today I’m stealing, no wait, borrowing a project created by our friend
&lt;a href=&quot;https://twitter.com/gunnarmorling&quot;&gt;Gunnar Morling&lt;/a&gt;. It showcases the
well-known &lt;a href=&quot;https://www.morling.dev/blog/fizzbuzz-simd-style/&quot;&gt;FizzBuzz&lt;/a&gt; game, but
programmatically generates the resulting patterns from the game.&lt;/p&gt;

&lt;p&gt;Make sure you &lt;a href=&quot;https://stackoverflow.com/questions/52524112&quot;&gt;install JDK 17&lt;/a&gt; 
before running this code.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone git@github.com:bitsondatadev/simd-fizzbuzz.git

mvn clean verify

java --add-modules=jdk.incubator.vector -jar target/benchmarks.jar -f 1 -wi 5 -i 5
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs and Documentation&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.java.net/projects/jdk/17/jeps-since-jdk-11&quot;&gt;JEPs in JDK 17 integrated since JDK 11&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://shopify.engineering/faster-trino-query-execution-infrastructure&quot;&gt;Shopify’s Path to a Faster Trino Query Execution: Infrastructure&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Videos&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=yQqBqix7yTA&quot;&gt;Vector API and Record Serialization&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=1JeoNr6-pZw&quot;&gt;The Vector API in JDK 17&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=0BpY132mKm0&quot;&gt;JDK 8 to JDK 18 in Garbage Collection: 10 Releases, 2000+ Enhancements&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=e2lXj_t7ZBc&quot;&gt;Concurrent Garbage collectors: ZGC &amp;amp; Shenandoah&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Releases 379 to 381</summary>

      
      
    </entry>
  
    <entry>
      <title>35: Packaging and modernizing Trino</title>
      <link href="https://trino.io/episodes/35.html" rel="alternate" type="text/html" title="35: Packaging and modernizing Trino" />
      <published>2022-04-21T00:00:00+00:00</published>
      <updated>2022-04-21T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/35</id>
      <content type="html" xml:base="https://trino.io/episodes/35.html">&lt;h2 id=&quot;releases-375-to-378&quot;&gt;Releases 375 to 378&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-375.html&quot;&gt;Trino 375&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for table comments in the MySQL connector.&lt;/li&gt;
  &lt;li&gt;Improved predicate pushdown for PostgreSQL.&lt;/li&gt;
  &lt;li&gt;Performance improvements for aggregations with filters.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-376.html&quot;&gt;Trino 376&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Better performance when reading Parquet data.&lt;/li&gt;
  &lt;li&gt;Join pushdown for MySQL.&lt;/li&gt;
  &lt;li&gt;Aggregation pushdown for Oracle.&lt;/li&gt;
  &lt;li&gt;Support table and column comments in ClickHouse connector.&lt;/li&gt;
  &lt;li&gt;Support for adding and deleting schemas in Accumulo connector.&lt;/li&gt;
  &lt;li&gt;Support system truststore in CLI and JDBC driver.&lt;/li&gt;
  &lt;li&gt;Two-way TLS/SSL certificate validation with LDAP authentication.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-377.html&quot;&gt;Trino 377&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for standard SQL &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trim&lt;/code&gt; syntax.&lt;/li&gt;
  &lt;li&gt;Better performance for Glue metastore.&lt;/li&gt;
  &lt;li&gt;Join pushdown for SQL Server connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-378.html&quot;&gt;Trino 378&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;to_base32&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;from_base32&lt;/code&gt; functions.&lt;/li&gt;
  &lt;li&gt;New &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;expire_snapshots&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_orphan_files&lt;/code&gt; table procedures for Iceberg.&lt;/li&gt;
  &lt;li&gt;Faster planning of queries with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IN&lt;/code&gt; predicates.&lt;/li&gt;
  &lt;li&gt;Faster query planning for Hive, Delta Lake, Iceberg, MySQL, PostgreSQL, and
SQL Server connectors.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additional highlights worth a mention according to Manfred:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Generally lots of improvements on Hive, Delta Lake, Iceberg, and main
JDBC-based connectors.&lt;/li&gt;
  &lt;li&gt;Full Iceberg v2 table format support, first for read and later for read and
write operations, is getting closer and closer.&lt;/li&gt;
  &lt;li&gt;Table statistics support for the PostgreSQL, MySQL, and SQL Server
connectors, enabling automatic join pushdown.&lt;/li&gt;
  &lt;li&gt;Fix failure of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DISTINCT .. LIMIT&lt;/code&gt; operator when input data is dictionary
encoded.&lt;/li&gt;
  &lt;li&gt;Add new page to display the runtime information of all workers in the cluster
in Web UI.&lt;/li&gt;
  &lt;li&gt;Remove &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;user&lt;/code&gt; property requirement in JDBC driver.&lt;/li&gt;
  &lt;li&gt;Require an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;internal-communication.shared-secret&lt;/code&gt; value when authentication is
used, a breaking change for many users that have not set that secret.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the release notes for
&lt;a href=&quot;https://trino.io/docs/current/release/release-375.html&quot;&gt;Trino 375&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-376.html&quot;&gt;Trino 376&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-377.html&quot;&gt;Trino 377&lt;/a&gt;,
and
&lt;a href=&quot;https://trino.io/docs/current/release/release-378.html&quot;&gt;Trino 378&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-episode-packaging-trino&quot;&gt;Concept of the episode: Packaging Trino&lt;/h2&gt;

&lt;p&gt;To adopt Trino you typically need to run it on a cluster of machines. These can
be bare metal servers, virtual machines, or even containers. The Trino project
provides a few binary packages to allow you to install Trino:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;tarball&lt;/li&gt;
  &lt;li&gt;rpm&lt;/li&gt;
  &lt;li&gt;container image&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of them include the Java libraries that constitute Trino
and all the plugins. As a result there are only a few requirements: a
Linux operating system, since some of the libraries and code indirectly require
Linux, and a Java 11 runtime.&lt;/p&gt;

&lt;p&gt;Beyond that there is just the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bin/launcher&lt;/code&gt; script, which is highly recommended, but
not required. It can be used as a service script or for manual
start/stop/status of Trino, and it only needs Python.&lt;/p&gt;
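
&lt;p&gt;As a quick sketch, assuming a tarball installation, the launcher is typically
invoked like this from the installation directory:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;bin/launcher run      # run in the foreground, logging to stdout
bin/launcher start    # start as a background daemon
bin/launcher status
bin/launcher stop
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;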

&lt;h3 id=&quot;tarball&quot;&gt;Tarball&lt;/h3&gt;

&lt;p&gt;The tarball is a gzip-compressed tar archive. For installation you just
extract the archive anywhere. It contains the following directory structure:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bin&lt;/code&gt;, the launcher script and related files&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lib&lt;/code&gt;, all globally needed libraries&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;plugins&lt;/code&gt;, connectors and other plugins, each with their own libraries in
separate sub-directories&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You need to create the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;etc&lt;/code&gt; directory with the needed configuration, since the
tarball does not include any defaults, and you cannot start the application
without them.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;etc/catalog/*.properties&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;etc/config.properties&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;etc/jvm.config&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;etc/log.properties&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;etc/node.properties&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note that all these files live within the extracted installation directory.&lt;/p&gt;
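
&lt;p&gt;As an illustration, a minimal single-node configuration could look like the
following sketch. The values shown are placeholders and need to be adapted to
your environment:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# etc/config.properties - coordinator and worker in one node
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
discovery.uri=http://localhost:8080

# etc/node.properties
node.environment=demo
node.data-dir=/tmp/trino-data
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;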

&lt;h3 id=&quot;rpm&quot;&gt;RPM&lt;/h3&gt;

&lt;p&gt;The RPM archive is suitable for RPM-based Linux distributions, but testing is
not very thorough across different versions and distributions.&lt;/p&gt;

&lt;p&gt;It adapts the tarball content to the Linux file system hierarchy, hooks the
launcher script up as a daemon script, and adds default configuration files. That
allows you to start Trino right after installing the package, as well as
automatically on system restarts.&lt;/p&gt;

&lt;p&gt;Locations used are &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/etc/trino&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/var/lib/trino&lt;/code&gt;, and others. These are
configured via the launcher script parameters.&lt;/p&gt;

&lt;p&gt;In a nutshell the RPM adds some convenience, but narrows down the supported
Linux distributions. It still requires Java and Python installation and
management.&lt;/p&gt;
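
&lt;p&gt;Installation and service management with the RPM roughly follows the usual
pattern. The file name below is a placeholder for the release you download:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;rpm -i trino-server-rpm-378.noarch.rpm
service trino start
service trino status
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;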

&lt;h3 id=&quot;container-image&quot;&gt;Container image&lt;/h3&gt;

&lt;p&gt;The container image for Trino bundles the necessary Linux operating system,
Java runtime, and Python, and adapts Trino to the container setup.&lt;/p&gt;

&lt;p&gt;The container adds even more convenience, since it is ready to use out of the
box. It allows usage on Kubernetes with the help of the &lt;a href=&quot;https://github.com/trinodb/charts&quot;&gt;Helm
charts&lt;/a&gt;, and includes the required operating
system and application parts automatically.&lt;/p&gt;
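
&lt;p&gt;For example, a single-node container with the default configuration can be
started and queried with the bundled CLI like this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;docker run -d --name trino -p 8080:8080 trinodb/trino
docker exec -it trino trino
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;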

&lt;h3 id=&quot;customization&quot;&gt;Customization&lt;/h3&gt;

&lt;p&gt;All three packages Trino ships are just defaults. They all require further
configuration to adapt Trino to your specific needs in terms of hardware,
connected data sources, security configuration, and so on. All of this can be
done manually or with many existing tools.&lt;/p&gt;

&lt;p&gt;However, you can also take it a step further and create your own package suited
to your needs. The tarball can be used as the source for any customization to
create your own package. The following is a list of options and scenarios:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Use the tarball, but remove unused plugins.&lt;/li&gt;
  &lt;li&gt;Use the tarball as the source to create your own specific package. For
example, a deb archive for use with Ubuntu, or an apk package for Alpine.&lt;/li&gt;
  &lt;li&gt;Create your own RPM similar to &lt;a href=&quot;https://github.com/simpligility/trino-packages&quot;&gt;Manfred’s proof of
concept&lt;/a&gt; that pulls out the
Trino RPM package creation into a separate project.&lt;/li&gt;
  &lt;li&gt;Create your own container image with different base distro, custom set of
plugins, and even with all your configuration baked into the image.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;others&quot;&gt;Others&lt;/h3&gt;

&lt;p&gt;You can also use &lt;a href=&quot;https://formulae.brew.sh/formula/trino&quot;&gt;brew on macOS&lt;/a&gt;, but
that is not suitable for production usage. It is more a convenience to get a
local Trino for playing around.&lt;/p&gt;

&lt;h2 id=&quot;additional-topic-of-the-episode-modernizing-trino-with-java-17&quot;&gt;Additional topic of the episode: Modernizing Trino with Java 17&lt;/h2&gt;

&lt;p&gt;Currently Java 11 is required for Trino. Java 17 is the latest Java LTS
release, with lots of performance, security, and language improvements.
The community has been working hard to make Java 17 support a reality. At this
stage core Trino fully supports Java 17; Starburst Galaxy, for example, already
uses it.&lt;/p&gt;

&lt;p&gt;The maintainers and contributors would like to move to fully support and also
require Java 17 soon. Here is where your input comes in, and we ask that you
let us know your thoughts about questions such as the following:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Are you looking forward to the new Java 17 language features and other
improvements as a contributor to Trino?&lt;/li&gt;
  &lt;li&gt;Are you already using Java 17 with Trino? In production or just testing?&lt;/li&gt;
  &lt;li&gt;If we require Java 17 in the next months, can you update to use Java 17 with
Trino?&lt;/li&gt;
  &lt;li&gt;If not, what are some of the hurdles?&lt;/li&gt;
  &lt;li&gt;Are you okay with staying at an older release, until you can use Java 17?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let us know on the #dev channel on Trino Slack or ping us directly. You can also
chime in on the &lt;a href=&quot;https://github.com/trinodb/trino/issues/9876&quot;&gt;roadmap issue&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;pull-requests-of-the-episode-worker-stats-in-the-web-ui&quot;&gt;Pull requests of the episode: Worker stats in the Web UI&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;https://github.com/trinodb/trino/issues/11653&quot;&gt;PR of the episode&lt;/a&gt; was
submitted by &lt;a href=&quot;https://github.com/whutpencil&quot;&gt;GitHub user whutpencil&lt;/a&gt;, and adds a
significant new feature to the web UI. It exposes the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;system.runtime.nodes&lt;/code&gt;
information, that is, statistics for each worker, in brand new pages. What a great
effort! Special thanks also go out to &lt;a href=&quot;https://github.com/dedep&quot;&gt;Dawid Adamek
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dedep&lt;/code&gt;&lt;/a&gt; for the review.&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-episode-tarball-installation-and-new-web-ui-feature&quot;&gt;Demo of the episode: Tarball installation and new Web UI feature&lt;/h2&gt;

&lt;p&gt;In the demo of the episode Manfred shows how to add a worker to a local
tarball install of a coordinator, and then demos the Web UI with the new feature
from the pull request of the episode.&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-episode-are-write-operations-in-delta-lake-supported-for-tables-stored-on-hdfs&quot;&gt;Question of the episode: Are write operations in Delta Lake supported for tables stored on HDFS?&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trinodb.slack.com/archives/CGB0QHWSW/p1650331073409229&quot;&gt;Full question from Slack&lt;/a&gt;:
I was trying the Delta Lake connector. I noticed that write operations are
supported for tables stored on Azure ADLS Gen2, S3 and S3-compatible storage.
Does that mean write operations are not supported for tables stored on HDFS?&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Answer:&lt;/em&gt; HDFS is always implicitly supported for data lake connectors. It isn’t
called out because it is assumed.&lt;/p&gt;

&lt;p&gt;The confusion actually came from an error message used when the user tried to
insert into a Delta Lake table they created in Spark. Then they tried inserting
a record into the table through IntelliJ IDEA and received the following error
message:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Unsupported target SQL type: -155
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;They thought the problem might be a wrong data type for the birthdate column,
and then used the statement below to insert a record into the table.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;INSERT INTO
  presto.people10m (id, firstname, middlename, lastname, gender, birthdate, ssn, salary)
VALUES (1, &apos;a&apos;, &apos;b&apos;, &apos;c&apos;, &apos;male&apos;, timestamp &apos;1990-01-01 00:00:00 +00:00&apos;, &apos;d&apos;, 10);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;However, they got an error message like this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Query 20220419_031201_00015_8qe76 failed:
Cannot write to table in hdfs://masters/presto.db/people10m; hdfs not supported
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This turned out to be an issue in the IntelliJ client, not in Trino.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.starburst.io/info/cinco-de-trino/&quot;&gt;Cinco de Trino&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/events/285087048/&quot;&gt;Constructing an Intelligent Data Trellis from your Data Mesh&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Releases 375 to 378</summary>

      
      
    </entry>
  
    <entry>
      <title>34: A big delta for Trino</title>
      <link href="https://trino.io/episodes/34.html" rel="alternate" type="text/html" title="34: A big delta for Trino" />
      <published>2022-03-17T00:00:00+00:00</published>
      <updated>2022-03-17T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/34</id>
      <content type="html" xml:base="https://trino.io/episodes/34.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;p&gt;In this episode Manfred has the pleasure to chat with two colleagues, who
are working on making Trino better every day:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/claudiusli&quot;&gt;Claudius Li&lt;/a&gt;, Product Manager at Starburst&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/jhlodin&quot;&gt;Joe Lodin&lt;/a&gt;, Information Engineer at Starburst&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Brian is out to add another member to his family!&lt;/p&gt;

&lt;h2 id=&quot;releases-372-373-and-374&quot;&gt;Releases 372, 373, and 374&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-372.html&quot;&gt;Trino 372&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trim_array&lt;/code&gt; function.&lt;/li&gt;
  &lt;li&gt;Support for reading ZSTD-compressed Avro files.&lt;/li&gt;
  &lt;li&gt;Support for column comments in Iceberg.&lt;/li&gt;
  &lt;li&gt;Support for Kerberos authentication in Kudu connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-373.html&quot;&gt;Trino 373&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New Delta Lake connector.&lt;/li&gt;
  &lt;li&gt;Improved performance of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIKE&lt;/code&gt; when querying Elasticsearch and PostgreSQL.&lt;/li&gt;
  &lt;li&gt;Improved performance when querying partitioned Hive tables.&lt;/li&gt;
  &lt;li&gt;Support access to S3 via HTTP proxy.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-374.html&quot;&gt;Trino 374&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Faster &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt; queries.&lt;/li&gt;
  &lt;li&gt;Vim/Emacs editing mode for CLI.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TRUNCATE TABLE&lt;/code&gt; in Cassandra connector.&lt;/li&gt;
  &lt;li&gt;Support &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;uint&lt;/code&gt; types in ClickHouse.&lt;/li&gt;
  &lt;li&gt;Support for Glue Metastore in Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Add &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE/DROP SCHEMA&lt;/code&gt;, table and column comments in MongoDB.&lt;/li&gt;
  &lt;li&gt;Improved pushdown for PostgreSQL.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additional highlights from Manfred:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Timeout configuration for LDAP authentication.&lt;/li&gt;
  &lt;li&gt;Values related to fault-tolerant execution in Web UI.&lt;/li&gt;
  &lt;li&gt;JDBC &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Driver.getProperties&lt;/code&gt; enables more client applications like DBVisualizer.&lt;/li&gt;
  &lt;li&gt;Vi and Emacs editing modes for interactive CLI usage.&lt;/li&gt;
  &lt;li&gt;Performance improvements in PostgreSQL connector.&lt;/li&gt;
  &lt;li&gt;SingleStore JDBC driver usage, end of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;memsql&lt;/code&gt; name.&lt;/li&gt;
  &lt;li&gt;Documentation for the atop connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the
&lt;a href=&quot;https://trino.io/docs/current/release/release-372.html&quot;&gt;Trino 372&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-373.html&quot;&gt;Trino 373&lt;/a&gt;, and
&lt;a href=&quot;https://trino.io/docs/current/release/release-374.html&quot;&gt;Trino 374&lt;/a&gt; release
notes.&lt;/p&gt;

&lt;h2 id=&quot;project-tardigrade-update&quot;&gt;Project Tardigrade update&lt;/h2&gt;

&lt;p&gt;The team around Project Tardigrade joined us in &lt;a href=&quot;./32.html&quot;&gt;episode 32&lt;/a&gt; to talk
about fault tolerant execution of queries in Trino. Now they have posted a
&lt;a href=&quot;/blog/2022/02/16/tardigrade-project-update.html&quot;&gt;status update on our blog&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It looks like things are really coming along well, and Joe has joined the effort
to &lt;a href=&quot;../docs/current/admin/fault-tolerant-execution.html&quot;&gt;create a first user-facing documentation
set&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The team has also posted a status update on the #project-tardigrade Slack
channel. Everything is ready for the community to perform first real world
testing, and help us make this a great feature set for Trino.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-episode-a-new-connector-for-delta-lake-object-storage&quot;&gt;Concept of the episode: A new connector for Delta Lake object storage&lt;/h2&gt;

&lt;p&gt;It is great to have a new connector in Trino, but what does that even mean?
Let’s find out.&lt;/p&gt;

&lt;h3 id=&quot;what-is-a-connector&quot;&gt;What is a connector?&lt;/h3&gt;

&lt;p&gt;Just a quick refresher. Trino allows you to query many different data sources
with SQL statements. You enable that by creating a &lt;em&gt;catalog&lt;/em&gt; that contains the
configuration to connect to a specific &lt;em&gt;data source&lt;/em&gt;. The data source can be a
relational database, a NoSQL database, or an object storage. A &lt;em&gt;connector&lt;/em&gt; is
the translation layer that maps the concepts in the data source to the Trino
concepts of schemas, tables, rows, columns, data types, and so on. The connector
needs to know how to retrieve the data itself from the data source, and also how
to interact with the metadata.&lt;/p&gt;

&lt;p&gt;Here are some example metadata questions to answer:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;What are the available tables in schema &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;xyz&lt;/code&gt;?&lt;/li&gt;
  &lt;li&gt;What columns does table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;abc&lt;/code&gt; have and what are the data types?&lt;/li&gt;
  &lt;li&gt;What file format is used by the storage for table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;efg&lt;/code&gt;?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And some queries about the actual data:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Give me the top 100 rows from table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Give me all files in partition &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; in the directory &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So having a connector for your data source in Trino is a big deal. A connector
unlocks the data to all your SQL analytics powered by Trino, and the underlying
data source doesn’t even have to support SQL.&lt;/p&gt;
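
&lt;p&gt;In SQL terms, the metadata and data questions above map to statements like
the following sketch, where the catalog, schema, and table names are
placeholders:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SHOW TABLES FROM example.xyz;
DESCRIBE example.xyz.abc;
SELECT * FROM example.xyz.abc LIMIT 100;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;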

&lt;h3 id=&quot;what-is-delta-lake&quot;&gt;What is Delta Lake?&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://delta.io/&quot;&gt;Delta Lake&lt;/a&gt; is an evolution of the Hive/Hadoop object
storage data source. It is an open-source storage format. Data is stored in
files, typically using binary formats such as Parquet or ORC. Metadata is stored
in a Hive Metastore Service (HMS).&lt;/p&gt;

&lt;p&gt;Delta Lake supports ACID transactions, time travel, and many other features that
are lacking in the legacy Hive/Hadoop setup. This combination of traditional
data lake storage with data warehouse features is often called a lake house.&lt;/p&gt;

&lt;h3 id=&quot;history-of-the-new-connector&quot;&gt;History of the new connector&lt;/h3&gt;

&lt;p&gt;Delta Lake is fully open source, and part of the larger enterprise platform for
a lake house offered by &lt;a href=&quot;https://databricks.com/&quot;&gt;Databricks&lt;/a&gt;.
&lt;a href=&quot;https://www.starburst.io/&quot;&gt;Starburst&lt;/a&gt; has supported Delta Lake users with a
connector for &lt;a href=&quot;https://docs.starburst.io/index.html#sep&quot;&gt;Starburst Enterprise&lt;/a&gt;
for nearly two years. To foster further adoption and innovation with the
community, the connector was &lt;a href=&quot;https://docs.starburst.io/blog/2022-03-15-delta-lake.html&quot;&gt;donated to Trino in
release 373&lt;/a&gt; and continues to
be improved.&lt;/p&gt;

&lt;h2 id=&quot;pull-requests-of-the-episode-add-delta-lake-connector-and-documentation&quot;&gt;Pull requests of the episode: Add Delta Lake connector and documentation&lt;/h2&gt;

&lt;p&gt;Over 25 developers helped &lt;a href=&quot;https://github.com/jirassimok&quot;&gt;Jakob&lt;/a&gt; with the effort
to &lt;a href=&quot;https://github.com/trinodb/trino/pull/10897&quot;&gt;open-source the connector&lt;/a&gt;. It
is a heavy lift to migrate such a full-featured connector into Trino. By
comparison the &lt;a href=&quot;https://github.com/trinodb/trino/pull/11229&quot;&gt;documentation was
easy&lt;/a&gt;, but it is very important
for enabling you to use the connector. Well done everyone!&lt;/p&gt;

&lt;p&gt;Let’s have a look at the code in a bit more detail. A couple of key facts:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The Delta Lake connector is just another plugin like all other connectors.&lt;/li&gt;
  &lt;li&gt;This is a feature-rich connector supporting read and write operations.&lt;/li&gt;
  &lt;li&gt;It shares implementation details with Hive and Iceberg connectors such as HMS
access, Parquet and ORC file readers, and so on.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;demo-of-the-episode-delta-lake-connector-in-action&quot;&gt;Demo of the episode: Delta Lake connector in action&lt;/h2&gt;

&lt;p&gt;Now let’s have a look at all of this in action. In the demo Claudius uses
docker-compose to start up an HMS as the metastore, MinIO as object storage, and
of course Trino as the query engine.&lt;/p&gt;

&lt;p&gt;If you want to follow along, all resources used for the demo are &lt;a href=&quot;https://github.com/bitsondatadev/trino-getting-started/tree/main/community_tutorials/delta-lake&quot;&gt;available on
our getting started
repository&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Here is the sample catalog &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delta.properties&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-properties highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;py&quot;&gt;connector.name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;delta-lake&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;hive.metastore.uri&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;thrift://hive-metastore:9083&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;hive.s3.endpoint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;http://minio:9000&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;hive.s3.aws-access-key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;minio&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;hive.s3.aws-secret-key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;minio123&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;hive.s3.path-style-access&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;true&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;delta.enable-non-concurrent-writes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once everything is up and running, we can start playing.&lt;/p&gt;

&lt;p&gt;Verify that the catalog is available:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SHOW&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CATALOGS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Check if there are any schemas:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SHOW&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SCHEMAS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Let’s create a new schema:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SCHEMA&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;location&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;s3a://claudiustestbucket/myschema&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Create a table, insert some records, and then verify:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mytable&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mytable&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;John&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;Jane&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mytable&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Run a query to get more data and insert it into a new table:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myothertable&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mytable&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myothertable&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now for some data manipulation:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;UPDATE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myothertable&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;set&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;Jonathan&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;where&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myothertable&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;DELETE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myothertable&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;where&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myothertable&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And finally, let’s clean up:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;ALTER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mytable&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;EXECUTE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;optimize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;file_size_threshold&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;10MB&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ANALYZE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myothertable&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;DROP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myothertable&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;DROP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mytable&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;DROP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SCHEMA&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As you can see, with Trino and Delta Lake you get full create, read, update,
and delete operations on your lakehouse.&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-episode-how-do-i-secure-the-connection-from-a-trino-cluster-to-the-data-source&quot;&gt;Question of the episode: How do I secure the connection from a Trino cluster to the data source?&lt;/h2&gt;

&lt;p&gt;Since we talked about connectors earlier, you already know that the
configuration for accessing a data source is assembled to create a catalog. This
approach uses a properties file in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;etc/catalog&lt;/code&gt;. For example, let’s look at the
recently updated &lt;a href=&quot;../docs/current/connector/sqlserver.html&quot;&gt;SQL Server connector
documentation&lt;/a&gt;:&lt;/p&gt;

&lt;div class=&quot;language-properties highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;py&quot;&gt;connector.name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;sqlserver&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;connection-url&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;jdbc:sqlserver://&amp;lt;host&amp;gt;:&amp;lt;port&amp;gt;;database=&amp;lt;database&amp;gt;;encrypt=false&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;connection-user&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;root&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;connection-password&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;secret&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The connector uses username and password authentication. It connects using the
JDBC driver, which in turn enables TLS by default. A number of other connectors
also use JDBC drivers with username and password authentication, but the details
vary a lot. However, for all of them you can use &lt;a href=&quot;../docs/current/security/secrets.html&quot;&gt;secrets support in
Trino&lt;/a&gt; to use environment variable
references instead of hardcoding passwords.&lt;/p&gt;
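&lt;p&gt;As a minimal sketch, assuming a hypothetical &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SQLSERVER_PASSWORD&lt;/code&gt; environment
variable is set on all cluster nodes, the catalog file can then reference the
secret instead of hardcoding the password:&lt;/p&gt;

&lt;div class=&quot;language-properties highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;connector.name=sqlserver
connection-url=jdbc:sqlserver://&amp;lt;host&amp;gt;:&amp;lt;port&amp;gt;;database=&amp;lt;database&amp;gt;
connection-user=root
connection-password=${ENV:SQLSERVER_PASSWORD}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;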

&lt;p&gt;When it comes to other connectors, the details of securing a connection vary even
more. Ultimately the answer to how to secure the connection, and whether that is even
possible, is the usual “It depends”. Luckily you can check the documentation for
each connector to find out more and ping us on Slack if you need more help.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://docs.starburst.io/blog/2022-03-15-delta-lake.html&quot;&gt;Starburst donates the Delta Lake connector to Trino&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/events/282794002/&quot;&gt;Operating Trino at Scale at Robinhood&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests</summary>

      
      
    </entry>
  
    <entry>
      <title>33: Trino becomes highly available for high demand</title>
      <link href="https://trino.io/episodes/33.html" rel="alternate" type="text/html" title="33: Trino becomes highly available for high demand" />
      <published>2022-02-17T00:00:00+00:00</published>
      <updated>2022-02-17T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/33</id>
      <content type="html" xml:base="https://trino.io/episodes/33.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Ramesh Bhanan, Vice President, at &lt;a href=&quot;https://developer.gs.com/discover/home&quot;&gt;Goldman Sachs&lt;/a&gt;
  (&lt;a href=&quot;https://www.linkedin.com/in/ramesh-bhanan-byndoor/&quot;&gt;@ramesh-bhanan-byndoor&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Sambit Dikshit, Managing Director, Tech Fellow at &lt;a href=&quot;https://developer.gs.com/discover/home&quot;&gt;Goldman Sachs&lt;/a&gt;
  (&lt;a href=&quot;https://www.linkedin.com/in/sambitdixit/&quot;&gt;@sambitdixit&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Siddhant Chadha, Senior Data Engineer at &lt;a href=&quot;https://developer.gs.com/discover/home&quot;&gt;Goldman Sachs&lt;/a&gt;
 (&lt;a href=&quot;https://www.linkedin.com/in/siddhant-chadha-838136142/&quot;&gt;@siddhant-chadha&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Suman Baliganahalli Narayan Murthy, Vice President at &lt;a href=&quot;https://developer.gs.com/discover/home&quot;&gt;Goldman Sachs&lt;/a&gt;
 (&lt;a href=&quot;https://www.linkedin.com/in/suman-b-n-08-03-1990/&quot;&gt;@suman-b-n&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Sumit Halder, Vice President at &lt;a href=&quot;https://developer.gs.com/discover/home&quot;&gt;Goldman Sachs&lt;/a&gt;
  (&lt;a href=&quot;https://www.linkedin.com/in/sumit-halder-a3732482/&quot;&gt;@sumit-halder&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-369-370-and-371&quot;&gt;Releases 369, 370, and 371&lt;/h2&gt;

&lt;p&gt;Trino 369&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Experimental support for task level retries.&lt;/li&gt;
  &lt;li&gt;Support for groups in OAuth2 claims.&lt;/li&gt;
  &lt;li&gt;Column comments in ClickHouse connector.&lt;/li&gt;
  &lt;li&gt;Write Bloom filters in ORC files.&lt;/li&gt;
  &lt;li&gt;Procedure for optimizing Iceberg tables.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino 370&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add CLI support for ARM64.&lt;/li&gt;
  &lt;li&gt;Improved performance for ORC.&lt;/li&gt;
  &lt;li&gt;Improved performance for map and row types.&lt;/li&gt;
  &lt;li&gt;Reduced latency for OAuth2.0 authentication.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino 371&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for secrets and user group selector in resource group manager.&lt;/li&gt;
  &lt;li&gt;Support AWS role session name in S3 security mapping configuration.&lt;/li&gt;
  &lt;li&gt;Many bug fixes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Notes from Manfred&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for using PostgreSQL and Oracle as backend database for resource
groups.&lt;/li&gt;
  &lt;li&gt;Remove &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;spill-order-by&lt;/code&gt;,  &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;spill-window-operator&lt;/code&gt;, and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.max-total-memory-per-node&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Add support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER MATERIALIZED VIEW ... SET PROPERTIES&lt;/code&gt; in the engine.&lt;/li&gt;
  &lt;li&gt;Prevent hanging query execution on failures with phased execution policy.&lt;/li&gt;
  &lt;li&gt;Support for renaming schemas in PostgreSQL and Redshift connectors.&lt;/li&gt;
  &lt;li&gt;Lots of improvements to the ClickHouse connector, thanks Yuya!&lt;/li&gt;
  &lt;li&gt;Update to newer ClickHouse version removed support for Altinity 20.3.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$properties&lt;/code&gt; table and other hidden tables in Iceberg connector, including
docs.&lt;/li&gt;
  &lt;li&gt;Automatically adjust &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ulimit&lt;/code&gt; setting when using the RPM package.&lt;/li&gt;
  &lt;li&gt;Docker images changes to UBI.&lt;/li&gt;
  &lt;li&gt;Remove support/need for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;allow-drop-table&lt;/code&gt; catalog property in JDBC connectors.&lt;/li&gt;
  &lt;li&gt;A bunch of SPI changes.&lt;/li&gt;
  &lt;li&gt;DML with Iceberg connector with fault tolerant mode and more Tardigrade improvements.&lt;/li&gt;
  &lt;li&gt;Drop support for Kudu 1.13.0.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the &lt;a href=&quot;https://trino.io/docs/current/release/release-369.html&quot;&gt;Trino
369&lt;/a&gt;, &lt;a href=&quot;https://trino.io/docs/current/release/release-370.html&quot;&gt;Trino
370&lt;/a&gt;, and &lt;a href=&quot;https://trino.io/docs/current/release/release-371.html&quot;&gt;Trino
371&lt;/a&gt; release notes.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-month-high-availability-with-trino&quot;&gt;Concept of the month: High availability with Trino&lt;/h2&gt;

&lt;p&gt;Goldman Sachs uses Trino to reduce last-mile ETL, and provide a unified way of 
accessing data through federated joins. Making a variety of data sets from 
different sources available in one spot for our data science team was a tall 
order. Data must be quickly accessible to data consumers, and systems like Trino
must be reliable for users to trust this singular access point for their data.&lt;/p&gt;

&lt;p&gt;In order for analysts and data scientists to use these services, they first need
to trust in the system. It was vital to Goldman Sachs that Trino has high 
availability. In the event of any failure, another Trino cluster is available to
process requests.&lt;/p&gt;

&lt;h3 id=&quot;integrating-trino-into-the-goldman-sachs-internal-ecosystem&quot;&gt;Integrating Trino into the Goldman Sachs internal ecosystem&lt;/h3&gt;

&lt;p&gt;Before high availability was a concern, the team had to first integrate Trino to
meet their requirements. This included integrating with internal security 
systems, observability systems, and credential stores. It also meant
adding integration with their governance services that manage cataloguing
services and data discovery engines. Finally, while many of the Trino connectors
that the team intended to use exist, there were many missing features and 
performance enhancements that would lead to a better user experience and more 
adoption. The team has since taken it upon themselves to work on these features
and contribute them back to Trino. We will cover some of these contributions in
the PR segment of this show.&lt;/p&gt;

&lt;h3 id=&quot;achieving-scaling-and-high-availability&quot;&gt;Achieving scaling and high availability&lt;/h3&gt;

&lt;p&gt;Once the team had much of Trino running for some initial use cases, the next 
step was to improve support for more simultaneous use cases and highly 
concurrent workloads. The team wanted trust in the system, so as they scaled,
the ability to run blue-green deployments, enable resource isolation, and
keep clusters highly available through failures became much more pertinent.&lt;/p&gt;

&lt;h3 id=&quot;trino-ecosystem-at-goldman-sachs&quot;&gt;Trino ecosystem at Goldman Sachs&lt;/h3&gt;

&lt;p&gt;Here is an overview of the Goldman Sachs ecosystem. It showcases the preexisting
services that needed to connect to Trino, the catalogs supported, and the method
in which Goldman Sachs achieves high availability through supporting multiple
clusters in various groups.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/33/trinoecosystem.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://developer.gs.com/blog/posts/enabling-highly-available-trino-clusters-at-goldman-sachs&quot;&gt;Goldman Sachs Blog&lt;/a&gt;
&lt;/p&gt;

&lt;h3 id=&quot;dynamic-query-routing&quot;&gt;Dynamic query routing&lt;/h3&gt;

&lt;p&gt;In order to ensure that all the clusters receive an even distribution of
traffic, the team created services that enable dynamic query routing across the
different cluster groups.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/33/trinodynamicqueryrouting.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://developer.gs.com/blog/posts/enabling-highly-available-trino-clusters-at-goldman-sachs&quot;&gt;Goldman Sachs Blog&lt;/a&gt;
&lt;/p&gt;

&lt;h3 id=&quot;query-routing-components&quot;&gt;Query routing components&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.envoyproxy.io/&quot;&gt;Envoy Proxy&lt;/a&gt; - open source edge and service proxy
that provides features such as routing, traffic management, load balancing, 
external authorization, rate limiting, and more.&lt;/li&gt;
&lt;/ul&gt;
&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/33/trinocontrolplane.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://developer.gs.com/blog/posts/enabling-highly-available-trino-clusters-at-goldman-sachs&quot;&gt;Goldman Sachs Blog&lt;/a&gt;
&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Cluster Groups - a cluster group is a set of Trino clusters that can
be assigned traffic by the router service.&lt;/li&gt;
  &lt;li&gt;Cluster Metadata Service - a service that provides the Envoy routers with all
the cluster-related configurations.&lt;/li&gt;
  &lt;li&gt;Router Service
    &lt;ul&gt;
      &lt;li&gt;Envoy Control Plane - The Envoy Control Plane is an xDS gRPC-based service
that is responsible for providing dynamic configurations to Envoy.&lt;/li&gt;
      &lt;li&gt;Upstream Cluster Selection - Envoy provides HTTP filters to parse and modify
both request and response headers. We use a custom Lua filter to parse the 
request and extract the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x-trino-user&lt;/code&gt; header. Then, we call the router 
service, which returns the upstream cluster address.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;
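&lt;p&gt;As a rough sketch of the upstream cluster selection step, an Envoy Lua filter
along these lines can extract the header and record the routing decision. The
cluster name, path, and response header used here are illustrative assumptions,
not the actual Goldman Sachs implementation:&lt;/p&gt;

&lt;div class=&quot;language-lua highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Illustrative sketch: read the Trino user and ask a router service
-- (registered as the &quot;router_service&quot; Envoy cluster) for the upstream.
function envoy_on_request(request_handle)
  local user = request_handle:headers():get(&quot;x-trino-user&quot;) or &quot;&quot;
  local headers, body = request_handle:httpCall(
    &quot;router_service&quot;,
    {
      [&quot;:method&quot;] = &quot;GET&quot;,
      [&quot;:path&quot;] = &quot;/v1/cluster?user=&quot; .. user,
      [&quot;:authority&quot;] = &quot;router-service&quot;
    },
    &quot;&quot;, 5000)
  -- Stash the chosen upstream in a header for later routing stages.
  request_handle:headers():replace(&quot;x-trino-upstream&quot;, body)
end
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;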

&lt;h2 id=&quot;pr-of-the-month-pr-8956-add-support-for-external-db-for-schema-management-in-mongodb-connector&quot;&gt;PR of the month: PR 8956 Add support for external db for schema management in MongoDB connector&lt;/h2&gt;

&lt;p&gt;This month’s &lt;a href=&quot;https://github.com/trinodb/trino/pull/8956&quot;&gt;PR of the month&lt;/a&gt; comes
from today’s guest Siddhant to solve &lt;a href=&quot;https://github.com/trinodb/trino/issues/8887&quot;&gt;this issue related to the MongoDB connector&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Siddhant created the issue in response to the common problem that MongoDB 
connector users face when they don’t have write capability in the MongoDB system.
Since MongoDB has no implicit schema, Trino uses a schema definition that is
written to a special MongoDB database. This PR lets users without write access
configure an external location for storing their schema, avoiding the issue.&lt;/p&gt;

&lt;p&gt;Thanks, Siddhant, for raising this issue, as it’s one that beginners using
the MongoDB connector face frequently.&lt;/p&gt;

&lt;h2 id=&quot;bonus-pr-of-the-month-pr-8202-metadata-for-alias-in-elasticsearch-connector-only-uses-the-first-mapping&quot;&gt;Bonus PR of the month: PR 8202 Metadata for alias in Elasticsearch connector only uses the first mapping&lt;/h2&gt;

&lt;p&gt;This bonus &lt;a href=&quot;https://github.com/trinodb/trino/pull/8202&quot;&gt;PR of the month&lt;/a&gt; comes
from another one of today’s guests, Suman. It solves multiple issues, meaning 
this feature is in high demand!&lt;/p&gt;

&lt;p&gt;The problems brought up by these issues also have to do with how we map
schemas onto NoSQL databases that don’t implicitly have a schema. In this case
Elasticsearch stores its schema in an object called a mapping. This mapping can
be strict or dynamic for various portions of the documents that get inserted.
The object that correlates to a table in Elasticsearch is called an index. To
keep Elasticsearch fast, multiple indexes are created periodically to support a
given document type, similar to partitioning in a database. In general, these 
indexes follow a very common mapping for a given type, but the reality is that 
Elasticsearch allows you to vary from the mapping. Trino currently simplifies
the way this is done by only reading the first mapping and assuming that all
indexes and documents follow this schema. This pull request addresses the issue
by scanning a much larger sample of mappings and merging the schemas to handle
any conflicts. It then goes further to cache these merged mappings for a given
amount of time.&lt;/p&gt;

&lt;p&gt;Thanks for all of your continued work on this Suman! It will help a lot!&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-month-trino-fiddle-a-tool-for-easy-online-testing-and-sharing-of-trino-sql-problems-and-their-solutions&quot;&gt;Demo of the month: Trino Fiddle: A tool for easy online testing and sharing of Trino SQL problems and their solutions&lt;/h2&gt;

&lt;p&gt;This month’s demo showcases Trino Fiddle, a tool that Brian adapted from the
&lt;a href=&quot;http://sqlfiddle.com/&quot;&gt;SQL Fiddle&lt;/a&gt; tool. It allows Trino users to share problems
and answer questions that other Trino users are facing.&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-month-does-trino-support-carbondata&quot;&gt;Question of the month: Does Trino support CarbonData?&lt;/h2&gt;

&lt;p&gt;This month’s &lt;a href=&quot;https://www.trinoforum.org/t/142&quot;&gt;question of the month&lt;/a&gt; 
comes from &lt;a href=&quot;https://www.trinoforum.org/u/masayyed/summary&quot;&gt;Mahebub Sayyed&lt;/a&gt; on 
Trino Forum. Mahebub asks, “Does Trino support CarbonData?”&lt;/p&gt;

&lt;p&gt;The answer is a little tricky, but it can be done!&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://carbondata.apache.org/&quot;&gt;CarbonData&lt;/a&gt; currently maintains a connector 
called &lt;a href=&quot;https://mvnrepository.com/artifact/org.apache.carbondata/carbondata-presto&quot;&gt;carbondata-presto&lt;/a&gt; 
that works with an older version of Trino, version 333 (an io.prestosql version 
&lt;a href=&quot;https://trino.io/blog/2020/12/27/announcing-trino.html&quot;&gt;before the rename&lt;/a&gt;). 
Someone has already opened &lt;a href=&quot;https://github.com/apache/carbondata/pull/4198&quot;&gt;a PR to update this connector to a current Trino version&lt;/a&gt;, 
but the work dates from the middle of 2021 and hasn’t made much progress 
recently.&lt;/p&gt;

&lt;p&gt;That being said, you could build and use 
&lt;a href=&quot;https://github.com/czy006/carbondata/tree/trino-358-alpha/integration/trino&quot;&gt;the Trino version of the connector&lt;/a&gt; 
this person was working on, and see if it works for you. If you are running on a 
version of Trino that is older than 351, you should be able to use the existing 
carbondata-presto connector.&lt;/p&gt;

&lt;p&gt;If anyone feels motivated, it would be wonderful if you could help get this 
contributed to the CarbonData project, or even work with them to have it land
in the Trino project!&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs and resources&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://developer.gs.com/blog/posts/enabling-highly-available-trino-clusters-at-goldman-sachs&quot;&gt;Enabling Highly Available Trino Clusters at Goldman Sachs&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=-5mlZGjt6H4&quot;&gt;Video: Building a Federated Cost-Effective Highly Efficient Query Platform&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://developer.gs.com/blog/posts&quot;&gt;Goldman Sachs Developer Blog&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.goldmansachs.com/careers/&quot;&gt;Goldman Sachs Careers Page&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/gsdeveloper&quot;&gt;Follow @GSDeveloper on Twitter&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests</summary>

      
      
    </entry>
  
    <entry>
      <title>32: Trino Tardigrade: Try, try, and never die</title>
      <link href="https://trino.io/episodes/32.html" rel="alternate" type="text/html" title="32: Trino Tardigrade: Try, try, and never die" />
      <published>2022-01-20T00:00:00+00:00</published>
      <updated>2022-01-20T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/32</id>
      <content type="html" xml:base="https://trino.io/episodes/32.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Andrii Rosa, Software Engineer at &lt;a href=&quot;https://www.starburst.io&quot;&gt;Starburst&lt;/a&gt;
  (&lt;a href=&quot;https://www.linkedin.com/in/andrii-rosa-79578561/&quot;&gt;@andrii-rosa-79578561&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Brian Zhan, Product Manager at &lt;a href=&quot;https://www.starburst.io&quot;&gt;Starburst&lt;/a&gt;
  (&lt;a href=&quot;https://twitter.com/brianzhan1&quot;&gt;@brianzhan1&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Lukasz Osipiuk, Software Engineer at &lt;a href=&quot;https://www.starburst.io&quot;&gt;Starburst&lt;/a&gt;
  (&lt;a href=&quot;https://twitter.com/losipiuk&quot;&gt;@losipiuk&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Martin Traverso, Trino &amp;amp; Presto Co-founder and CTO at &lt;a href=&quot;https://www.starburst.io&quot;&gt;Starburst&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/mtraverso&quot;&gt;@mtraverso&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Zebing Lin, Software Engineer at &lt;a href=&quot;https://www.starburst.io&quot;&gt;Starburst&lt;/a&gt;
 (&lt;a href=&quot;https://www.linkedin.com/in/linzebing/&quot;&gt;@linzebing&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;trino-summit-2021&quot;&gt;Trino Summit 2021&lt;/h2&gt;

&lt;p&gt;If you missed &lt;a href=&quot;https://www.starburst.io/resources/trino-summit/&quot;&gt;Trino Summit 2021&lt;/a&gt;,
you can watch it on demand, for free!&lt;/p&gt;

&lt;h2 id=&quot;releases-367-and-368&quot;&gt;Releases 367 and 368&lt;/h2&gt;

&lt;p&gt;Martin’s official announcements merged into one:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Lineage tracking for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WITH&lt;/code&gt; clauses and subqueries.&lt;/li&gt;
  &lt;li&gt;Option to hide inaccessible columns in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT *&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;flush_metadata_cache()&lt;/code&gt; procedure for the Hive connector.&lt;/li&gt;
  &lt;li&gt;Improve performance of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DECIMAL&lt;/code&gt; type.&lt;/li&gt;
  &lt;li&gt;File-based access control for the Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TIME&lt;/code&gt; type in the SingleStore connector.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BINARY&lt;/code&gt; type in the Phoenix connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred’s additional notes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Prevent data loss on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DROP SCHEMA&lt;/code&gt; in Hive and Iceberg connectors.&lt;/li&gt;
  &lt;li&gt;New default query execution policy &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;phased&lt;/code&gt; brings performance improvements.&lt;/li&gt;
  &lt;li&gt;And finally, numerous smaller improvements around memory management and query
processing for our project Tardigrade.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the &lt;a href=&quot;https://trino.io/docs/current/release/release-367.html&quot;&gt;Trino
367&lt;/a&gt; and &lt;a href=&quot;https://trino.io/docs/current/release/release-368.html&quot;&gt;Trino
368&lt;/a&gt; release notes.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-month-introducing-project-tardigrade&quot;&gt;Concept of the month: Introducing Project Tardigrade&lt;/h2&gt;

&lt;p&gt;Before we jump right into the project, let’s cover some of the history of ETL and
data warehousing to better understand the problems that Tardigrade solves.&lt;/p&gt;

&lt;h3 id=&quot;why-do-people-want-to-do-etl-in-trino&quot;&gt;Why do people want to do ETL in Trino?&lt;/h3&gt;

&lt;p&gt;Trino is used for Extract, Transform, Load (ETL) workloads in many companies,
like Salesforce, Shopify, Slack, and older versions of Trino at Facebook.&lt;/p&gt;

&lt;p&gt;First, the most important thing is query speed. Queries run a lot faster in 
Trino. Open data stack technologies like Hive and Spark retry the query from 
intermediate checkpoints when something fails, but there’s a performance 
cost to this. Trino has always been focused on delivering query results as 
quickly as possible. Now, Trino performs task-level retries, enabling failure 
recovery where needed for longer-running queries. More on this later 
though.&lt;/p&gt;

&lt;p&gt;Second, most companies have widely dispersed and fragmented data. It’s typical
for companies to have different storage systems for different use cases.
This only becomes more commonplace when a merger or acquisition happens, and
you have a ton of data stored in yet another location. The acquiring company 
ends up with key information living in a bunch of different places. The net 
result is that a data engineer spends weeks writing that simple dashboard, and 
a data scientist trying to understand a trend gets impeded every time they draw
data from a new source, and eventually gives up.&lt;/p&gt;

&lt;p&gt;Third, data engineers want to spend their time writing business logic, not 
moving SQL between engines. Unfortunately, this is where they end up spending 
much of their time. Many do their ad-hoc analytics in Trino, because it provides
a far more interactive experience than any other engine. If they don’t just use
Trino, they have a 1,000-line SQL ETL job that they now need to convert into
another dialect. You just need to search “convert Spark Presto SQL Stack 
Overflow” to see the numerous challenges that people face moving between 
engines.&lt;/p&gt;

&lt;p&gt;Whether it’s optimizations in one engine not working in the other, a UDF in
Trino not existing in Spark, strange differences in the SQL dialect tripping 
people up, or debugging that is extremely difficult, these factors always delay 
completing their tasks. Data engineers are especially paranoid about 
converting SQL correctly. Imagine reporting an incorrect revenue metric 
externally, billing a user of your platform the incorrect amount, or delivering
the wrong content to users due to any of these issues.&lt;/p&gt;

&lt;h3 id=&quot;why-are-people-reluctant-to-do-their-etl-in-trino&quot;&gt;Why are people reluctant to do their ETL in Trino?&lt;/h3&gt;

&lt;p&gt;Before the drive for big data and technologies like Hadoop showed up on the 
scene, systems like Teradata, Netezza, and Oracle were used to run ETL pipelines
in a largely offline manner. If a query failed, you simply had to restart it. 
Vendors would brag about their systems’ low failure rates.&lt;/p&gt;

&lt;p&gt;As Big Data came to the forefront, systems like the &lt;a href=&quot;https://static.googleusercontent.com/media/research.google.com/en//archive/gfs-sosp2003.pdf&quot;&gt;Google File System&lt;/a&gt;,
that largely inspired the design for the Hadoop Distributed File System, aimed 
to build large distributed systems that supported fault-tolerance. In essence,
faults were expected, and if a node in the system failed, no data would be lost.&lt;/p&gt;

&lt;p&gt;At this same time, compute and storage were becoming separate systems. 
Just as storage was built with fault-tolerance, compute systems like MapReduce
that processed and transformed data were also &lt;a href=&quot;https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf&quot;&gt;built with fault tolerance in mind&lt;/a&gt;.
Apache Hive is a syntax and metadata layer that enables generating MapReduce 
jobs without having to write code. Apache Spark came on the analytics scene
by &lt;a href=&quot;https://www.usenix.org/system/files/conference/nsdi12/nsdi12-final138.pdf&quot;&gt;introducing lineage&lt;/a&gt; 
as a way for engineers to have more control over how and when their datasets
are flushed to disk. This technique, while novel, still took the pessimistic
view that faults are a worst-case scenario to avoid.&lt;/p&gt;

&lt;p&gt;When Trino was created, it was designed with speed in mind. Trino creators 
Martin, Dain, and David chose not to add fault-tolerance to Trino, as they
recognized the tradeoff it imposes on fast analytics. Due to the nature of the 
streaming exchange in Trino, all tasks are interconnected, and a failure of any 
task results in a query failure. To support long-running queries, Trino has to 
be able to tolerate task failures.&lt;/p&gt;

&lt;p&gt;Having an all-or-nothing architecture makes it significantly more difficult to 
tolerate faults, regardless of how rare they are. The likelihood of a failure 
grows with the time it takes to complete a query. This risk also increases as 
the resource demands, such as memory requirements of a query, grow. It’s 
impossible to know the exact memory requirements for processing a query upfront.
In addition to increased likelihood of a failure, the impact of failing a long 
running query is much higher, as it often results in a significant waste of time
and resources.&lt;/p&gt;

&lt;p&gt;You may think all-or-nothing is a model destined to fail, especially when 
scaling to petabytes of data. On the contrary, Trino’s predecessor Presto was 
commonly used to execute batch workloads at this scale at Facebook. Even today,
companies like &lt;a href=&quot;https://medium.com/salesforce-engineering/how-to-etl-at-petabyte-scale-with-trino-5fe8ac134e36&quot;&gt;Salesforce&lt;/a&gt;, 
&lt;a href=&quot;https://www.starburst.io/resources/trino-summit/?wchannelid=2ug6mgs5ao&amp;amp;wmediaid=j1eq196a4y&quot;&gt;Doordash&lt;/a&gt;, 
and many others use Trino at petabyte scale to handle ETL workloads. While 
scaling Trino to run petabyte-scale ETL pipelines is possible, you really have 
to know what you’re doing.&lt;/p&gt;

&lt;p&gt;Resource management is another challenge. Users don’t know exactly what 
resource utilization to expect from a query they submit, which makes it 
challenging to properly size the cluster and avoid resource-related failures.&lt;/p&gt;

&lt;p&gt;In essence, most people avoid using Trino for ETL because they lack the 
understanding of how to correctly configure Trino at scale.&lt;/p&gt;

&lt;h3 id=&quot;what-are-the-limitations-of-the-current-architecture&quot;&gt;What are the limitations of the current architecture?&lt;/h3&gt;

&lt;p&gt;In the current architecture, Trino plans all tasks for processing a specific 
query upfront. These tasks interconnect with one another, as the results from
one task are the input for the next. This interdependency is necessary, but 
if any task fails along the way, the entire chain breaks.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/32/interconnected-tasks.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;Data is streamed through the task graph with no intermediate checkpointing. The 
only query execution state is the internal, volatile state of the operators 
running within tasks.&lt;/p&gt;

&lt;p&gt;As stated before, this architecture has advantages, most notably high throughput
and low latency. Yet it also implies some limitations. The most obvious one is 
that it does not allow for granular failure recovery. If one of the tasks dies, 
there is no way to restart processing from an intermediate state. The only 
option is to rerun the whole query from the very beginning.&lt;/p&gt;

&lt;p&gt;The other notable limitation is around memory consumption. With static task 
placement we have little control over resource utilization on nodes.&lt;/p&gt;

&lt;p&gt;Finally, the current architecture makes many decisions upfront during query
planning. The engine creates a query plan based on incomplete data using table 
statistics, or blindly if statistics are not available. After the coordinator 
creates the plan and query processing starts, there aren’t many ways to adapt, 
even though much more information is available at runtime. For example, we 
cannot change the number of tasks for a stage. If we observe data skew, we 
can’t move tasks away from the overworked node so that the affected tasks have 
more resources at hand. And we cannot change the plan for a subquery if we 
notice that a decision already made is not optimal.&lt;/p&gt;

&lt;h3 id=&quot;trino-engine-improvements-with-project-tardigrade&quot;&gt;Trino engine improvements with Project Tardigrade&lt;/h3&gt;

&lt;p&gt;Project Tardigrade aims to break the all-or-nothing execution barriers. It opens
many new opportunities around resource management, adaptive query optimization,
and failure recovery. We will use a technique called spooling that stores 
intermediate data in an efficient buffering layer at stage boundaries. The 
buffer stores intermediate results for the duration of a query or a stage, 
depending on the context. The project is named after the microscopic &lt;a href=&quot;https://en.wikipedia.org/wiki/Tardigrade&quot;&gt;Tardigrades&lt;/a&gt;
that are the world’s most indestructible creatures, akin to the resiliency we 
are adding to Trino.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/32/tardigrade-logo.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;Buffering intermediate results makes it possible to execute queries iteratively.
For example, the engine can process one or several tasks at a time, effectively 
reducing memory pressure and allowing memory-intensive queries to succeed without 
a need to expand the cluster. Tardigrade can significantly lower the cost of 
operation, specifically in situations where only a small number of queries 
requires more memory than is available.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/32/tardigrade-buffers.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h4 id=&quot;adaptive-planning&quot;&gt;Adaptive planning&lt;/h4&gt;

&lt;p&gt;The engine may also decide to re-optimize the query at stage boundaries. When
the engine buffers the intermediate data, it is possible to get better insight
into the nature of the data as it’s processed and adapt query plans accordingly.
For example, when the cost-based optimizer makes a bad decision because of 
incorrect statistics or estimates, it can pick the wrong type of join, or a 
suboptimal join order. The engine can then suspend the query, re-optimize the 
plan, and resume processing. Additionally, it may allow the engine to discover 
skewed datasets, and change query plans accordingly. This may significantly 
improve efficiency and landing time for workloads that are &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JOIN&lt;/code&gt; heavy.&lt;/p&gt;

&lt;h4 id=&quot;resource-management&quot;&gt;Resource management&lt;/h4&gt;

&lt;p&gt;Iterative query processing allows us to be more flexible in resource management.
Resource allocation can be adjusted as the queries run. For example, when a 
cluster is idle, we may allow a single query to utilize all available resources
on the cluster. When more workload kicks in, the resource allocation for the 
initial query can be gradually reduced, and available resources can be granted
to newly submitted workloads. With this model it is also significantly easier to
implement auto scaling. When the submitted workload requires more resources than
currently available in the cluster, the engine can request more nodes. Conversely, 
if the cluster is underutilized, it is easier to return resources, since there 
is no need to wait for slow-running tasks. Being able to better manage 
available resources, and adjust the resource pool based on the current workload 
submitted, would make the engine significantly more cost effective.&lt;/p&gt;

&lt;h4 id=&quot;fine-grained-failure-recovery&quot;&gt;Fine-grained failure recovery&lt;/h4&gt;

&lt;p&gt;Last, but not least, with Project Tardigrade we are going to provide 
fine-grained failure recovery. The buffering introduced at stage boundaries 
allows for a transparent restart of failed tasks. Fine-grained failure recovery
would make completion times for ETL pipelines significantly more predictable. 
It also opens up the opportunity to run ETL workloads on much cheaper, widely 
available spot instances, further optimizing operational costs.&lt;/p&gt;
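&lt;p&gt;As a rough sketch of what this looks like for operators: fault-tolerant 
execution in current Trino releases is enabled through configuration properties 
along these lines, with the exchange manager providing the spooling buffer. 
Treat the values, and in particular the bucket name, as placeholders:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# etc/config.properties
retry-policy=TASK

# etc/exchange-manager.properties
exchange-manager.name=filesystem
exchange.base-directories=s3://example-exchange-spooling
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;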

&lt;h3 id=&quot;opportunities-that-tardigrade-opens&quot;&gt;Opportunities that Tardigrade opens&lt;/h3&gt;

&lt;p&gt;In summary, in Project Tardigrade we are working on the following improvements to Trino:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Predictable query completion times.&lt;/li&gt;
  &lt;li&gt;The ability to scale up or down to match the workload at runtime.&lt;/li&gt;
  &lt;li&gt;Fine-grained resource management.&lt;/li&gt;
  &lt;li&gt;Support for non-homogenous hardware.&lt;/li&gt;
  &lt;li&gt;Adaptive resource limits for tasks.&lt;/li&gt;
  &lt;li&gt;Graceful shutdown improvements.&lt;/li&gt;
  &lt;li&gt;Cheaper compute costs using spot instances with lower availability guarantees.&lt;/li&gt;
  &lt;li&gt;Adaptive query replanning at runtime as context changes.&lt;/li&gt;
  &lt;li&gt;Handling of situations where certain tasks are affected by data skew.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;efficient-exchange-data-buffering-implementation&quot;&gt;Efficient exchange data buffering implementation&lt;/h3&gt;

&lt;p&gt;This all sounds incredible, but it raises the question: how do we best implement
these buffers? Enabling task-level retries requires us to store intermediate 
exchange data in a “distributed buffer”. Minimizing the disturbance that 
buffering causes to query performance requires careful design consideration.&lt;/p&gt;

&lt;p&gt;A naive implementation is to use cloud object storage as intermediate storage.
This allows you to scale without maintaining a separate service, and it is the 
initial option we are using as a prototype buffer. It is intended as a 
proof-of-concept and should be good enough for small clusters of ten to twenty
nodes. However, this option can be slow and won’t support high-cardinality 
exchanges, because the number of files grows quadratically with the number of 
partitions. Trino then has to keep track of the metadata of all these files in 
order to plan and schedule which tasks require which files for the query. With 
a high number of files, there is a memory cost to holding that metadata, as 
well as a penalty for the time and network bandwidth it takes to list them all. 
This is the well-known many-small-files problem in big data.&lt;/p&gt;
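&lt;p&gt;To see where the quadratic growth comes from: in a hash exchange, every 
upstream task writes one file per output partition. A quick back-of-the-envelope 
sketch, with illustrative numbers only (this is not Trino code):&lt;/p&gt;

```python
# Illustration of the many-small-files problem with an object storage
# exchange buffer: each upstream task writes one file per output
# partition, so the file count is the product of the two.
def exchange_file_count(upstream_tasks, partitions):
    return upstream_tasks * partitions

# Scaling both sides of the exchange by 10x grows the file count by 100x.
print(exchange_file_count(10, 10))      # 100 files
print(exchange_file_count(100, 100))    # 10000 files
```

&lt;p&gt;Keeping metadata for tens of thousands of files, and listing them all over the 
network, is where the memory and bandwidth costs come from.&lt;/p&gt;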

&lt;h4 id=&quot;distributed-memory-with-spilling-as-a-buffer&quot;&gt;Distributed memory with spilling as a buffer&lt;/h4&gt;

&lt;p&gt;This solution requires a long-running managed service, but improves performance.
Depending on the design we choose, we can use write-ahead buffers to output data 
belonging to the same partition and provide sequential I/O to downstream tasks.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;70%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/32/buffer-implementation.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-month-task-retries-with-project-tardigrade&quot;&gt;Demo of the month: Task retries with Project Tardigrade&lt;/h2&gt;

&lt;p&gt;In this month’s demo, Zebing showcases task retries using Project Tardigrade 
after throwing his EC2 instance out the window! See what happens next…&lt;/p&gt;

&lt;div class=&quot;youtube-video-container&quot;&gt;
  &lt;iframe width=&quot;702&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/Tnd-QsDCd2Q&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h2 id=&quot;pr-of-the-month-pr-10319-trino-lineage-fails-for-aliasedrelation&quot;&gt;PR of the month: PR 10319 Trino lineage fails for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AliasedRelation&lt;/code&gt;&lt;/h2&gt;

&lt;p&gt;This month’s &lt;a href=&quot;https://github.com/trinodb/trino/pull/10319&quot;&gt;PR of the month&lt;/a&gt; was
created to resolve &lt;a href=&quot;https://github.com/trinodb/trino/issues/10272&quot;&gt;an issue&lt;/a&gt; 
reported by Lyft Data Infrastructure Engineer, Arup Malakar (&lt;a href=&quot;https://github.com/amalakar&quot;&gt;@amalakar&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Arup reported that Trino lineage fails to capture upstream columns when a join 
and a transformation are used. More generally, the issue applied to any column 
used with a function whose arguments come from an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AliasedRelation&lt;/code&gt;. Starburst
engineer, Praveen Krishna (&lt;a href=&quot;https://github.com/Praveen2112&quot;&gt;@Praveen2112&lt;/a&gt;), 
resolved the issue two days later and, with the help of Arup and the Lyft team,
tested that the fix works!&lt;/p&gt;

&lt;p&gt;Thanks to both Arup and Praveen for the fix!&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-month-how-do-you-cast-json-to-varchar-with-trino&quot;&gt;Question of the month: How do you cast JSON to varchar with Trino?&lt;/h2&gt;

&lt;p&gt;This month’s &lt;a href=&quot;https://stackoverflow.com/questions/70701325&quot;&gt;question of the month&lt;/a&gt; 
comes from &lt;a href=&quot;https://stackoverflow.com/users/10924136&quot;&gt;Borislav Blagoev&lt;/a&gt; on Stack
Overflow. He asks, “How do you cast JSON to varchar with Trino?”&lt;/p&gt;

&lt;p&gt;This was answered by &lt;a href=&quot;https://stackoverflow.com/users/2501279&quot;&gt;Guru Stron&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;Use &lt;a href=&quot;https://trino.io/docs/current/functions/json.html#json_format&quot;&gt;json_format&lt;/a&gt;/
&lt;a href=&quot;https://trino.io/docs/current/functions/json.html#json_parse&quot;&gt;json_parse&lt;/a&gt; to handle json object conversions instead of casting:&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;select json_parse(&apos;{&quot;property&quot;: 1}&apos;) objstring_to_json, json_format(json &apos;{&quot;property&quot;: 2}&apos;) jsonobj_to_string
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Output:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;objstring_to_json&lt;/th&gt;
      &lt;th&gt;jsonobj_to_string&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;{&quot;property&quot;:1}&lt;/td&gt;
      &lt;td&gt;{&quot;property&quot;:2}&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs and resources&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/salesforce-engineering/how-to-etl-at-petabyte-scale-with-trino-5fe8ac134e36&quot;&gt;How to ETL at Petabyte-Scale with Trino&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests</summary>

      
      
    </entry>
  
    <entry>
      <title>31: Trinites II: Trino on AWS Kubernetes Service</title>
      <link href="https://trino.io/episodes/31.html" rel="alternate" type="text/html" title="31: Trinites II: Trino on AWS Kubernetes Service" />
      <published>2021-12-16T00:00:00+00:00</published>
      <updated>2021-12-16T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/31</id>
      <content type="html" xml:base="https://trino.io/episodes/31.html">&lt;h2 id=&quot;trino-summit-2021&quot;&gt;Trino Summit 2021&lt;/h2&gt;

&lt;p&gt;If you missed &lt;a href=&quot;https://www.starburst.io/resources/trino-summit/&quot;&gt;Trino Summit 2021&lt;/a&gt;,
you can watch it on demand, for free!&lt;/p&gt;

&lt;h2 id=&quot;releases-365-and-366&quot;&gt;Releases 365 and 366&lt;/h2&gt;

&lt;p&gt;Martin’s official announcement mentioned the following highlights:&lt;/p&gt;

&lt;p&gt;Trino 365&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Aggregations in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TRUNCATE TABLE&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Compatibility with Pinot 0.8.0&lt;/li&gt;
  &lt;li&gt;HTTP proxy support for OAuth2 authentication&lt;/li&gt;
  &lt;li&gt;Many improvements to Iceberg connector&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Release notes: &lt;a href=&quot;https://trino.io/docs/current/release/release-365.html&quot;&gt;https://trino.io/docs/current/release/release-365.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Trino 366&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for automatic query retries&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DENY&lt;/code&gt; security rules&lt;/li&gt;
  &lt;li&gt;Performance optimizations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Release notes: &lt;a href=&quot;https://trino.io/docs/current/release/release-366.html&quot;&gt;https://trino.io/docs/current/release/release-366.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Manfred’s additional notes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Cool new SQL like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TRUNCATE TABLE&lt;/code&gt; and support for time travel&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;contains&lt;/code&gt; function for IP check in CIDR&lt;/li&gt;
  &lt;li&gt;Lots of performance and correctness fixes on Hive and Iceberg connectors&lt;/li&gt;
  &lt;li&gt;Drop support for old Pinot versions&lt;/li&gt;
  &lt;li&gt;Support for Hive to Iceberg redirects&lt;/li&gt;
  &lt;li&gt;Automatic TLS for internal communication&lt;/li&gt;
  &lt;li&gt;Support for Java 17&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And a last note, full Java 17 support is becoming a reality.&lt;/p&gt;

&lt;p&gt;More detailed information is available in the &lt;a href=&quot;https://trino.io/docs/current/release/release-365.html&quot;&gt;365&lt;/a&gt;
and &lt;a href=&quot;https://trino.io/docs/current/release/release-366.html&quot;&gt;366&lt;/a&gt; release notes.&lt;/p&gt;

&lt;p&gt;To play around with query retries, you need to set the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;retry_policy&lt;/code&gt; session
property to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;QUERY&lt;/code&gt; with the following command: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SET SESSION retry_policy=QUERY;&lt;/code&gt;&lt;/p&gt;

&lt;h2 id=&quot;log4shell&quot;&gt;Log4Shell&lt;/h2&gt;

&lt;p&gt;There’s a new vulnerability in town that has the potential to affect Java
projects that use some Log4j2 versions. It is called Log4Shell, and it does not
affect Trino. Read &lt;a href=&quot;https://trino.io/blog/2021/12/13/log4shell-does-not-affect-trino.html&quot;&gt;the blog for more details&lt;/a&gt;.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;/assets/episode/31/log4shell.jpeg&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-month-replicasets-deployments-and-services&quot;&gt;Concept of the month: ReplicaSets, Deployments, and Services&lt;/h2&gt;

&lt;p&gt;In &lt;a href=&quot;/episodes/24.html&quot;&gt;the first installment of Trinetes&lt;/a&gt;, we talked about what 
containerization is and why we use it. We covered the difference between tools
like docker-compose and container orchestration systems like Kubernetes (k8s).
Finally, we went over the first k8s object called a &lt;a href=&quot;https://kubernetes.io/docs/concepts/workloads/pods/&quot;&gt;&lt;em&gt;pod&lt;/em&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;As a reminder, a pod is the basic unit of deployment in a k8s cluster. In this
episode, we cover how to scale, deploy, and connect these pods. If you are 
missing some context, you should review &lt;a href=&quot;/episodes/24.html&quot;&gt;the first installment of this series&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;replicasets&quot;&gt;ReplicaSets&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Replicas&lt;/em&gt; are one or more instances created from the same pod definition. In k8s,
the object used to manage replication is a &lt;a href=&quot;https://kubernetes.io/docs/concepts/workloads/controllers/replicaset/&quot;&gt;&lt;em&gt;ReplicaSet&lt;/em&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;ReplicaSets provide high availability by managing multiple instances based on a 
pod definition in the k8s cluster. Kubernetes automatically brings up replacements
for any pod instances that go down in a ReplicaSet, maintaining the number of 
replicas you specify in the definition.&lt;/p&gt;
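&lt;p&gt;As a sketch, a minimal ReplicaSet definition looks like the following. The 
names and labels here are hypothetical and not part of the Trino Helm chart:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: trino-worker
spec:
  replicas: 2            # number of pod instances to keep running
  selector:
    matchLabels:
      app: trino         # pods matching this label are managed
  template:              # the pod definition to replicate
    metadata:
      labels:
        app: trino
    spec:
      containers:
        - name: trino-worker
          image: trinodb/trino:latest
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;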

&lt;p&gt;Replication also enables load balancing IO traffic over multiple pods. You gain 
the flexibility to scale up or down as traffic increases or decreases without 
any downtime.&lt;/p&gt;

&lt;p&gt;To scale the number of pods in a live ReplicaSet, you can update the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;replicas&lt;/code&gt; 
value in the ReplicaSet definition file, then run the following command to
apply the change:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl replace -f replicaset-definition.yml
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You can also edit the live ReplicaSet without changing the local file:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl edit replicaset &amp;lt;replicaset-name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
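
&lt;p&gt;Alternatively, you can scale imperatively without touching any definition file
at all:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl scale replicaset &amp;lt;replicaset-name&amp;gt; --replicas=4
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;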

&lt;h3 id=&quot;labels-and-selectors&quot;&gt;Labels and selectors&lt;/h3&gt;

&lt;p&gt;Kubernetes objects have &lt;a href=&quot;https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/&quot;&gt;labels&lt;/a&gt;, 
which are just key/value properties used to identify and dynamically group k8s
objects. Labels should be meaningful and relevant to k8s users, making it easy 
to comprehend things like which application, version, component, and environment 
certain objects belong to. Labels are shared across instances, and so they are 
not unique.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors&quot;&gt;Selectors&lt;/a&gt;
specify a grouping of labels used to target a set of objects when deploying or 
applying other operations to them. For example, a ReplicaSet identifies the set 
of pods it manages with its selector. When you create the ReplicaSet, k8s 
creates the pods defined in the ReplicaSet’s template and matches them with its 
selector. If the pods crash, k8s brings up new pods and associates the new
pods with the ReplicaSet.&lt;/p&gt;
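
&lt;p&gt;Using the labels from the Trino Helm chart shown later in this episode, you
can select just the worker pods:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl get pods -l app=trino,release=tcb,component=worker
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;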

&lt;h3 id=&quot;deployments&quot;&gt;Deployments&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://kubernetes.io/docs/concepts/workloads/controllers/deployment/&quot;&gt;Deployment&lt;/a&gt;
objects allow you to manage a ReplicaSet and perform actions on it, 
such as creation, rolling updates, rollbacks, pod updates, and so on.&lt;/p&gt;
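
&lt;p&gt;For example, once the worker Deployment from the Trino Helm chart is 
installed, standard &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;kubectl rollout&lt;/code&gt; commands let you watch, inspect, and
revert updates:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl rollout status deployment/tcb-trino-worker
kubectl rollout history deployment/tcb-trino-worker
kubectl rollout undo deployment/tcb-trino-worker
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;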

&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/31/deployment.png&quot; /&gt;&lt;br /&gt;
 Source: https://www.udemy.com/course/learn-kubernetes/
&lt;/p&gt;

&lt;p&gt;The best way to start making sense of these concepts is to look at the k8s
configuration files. You can generate them from the Trino Helm chart with:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;helm template tcb trino/trino --version 0.3.0
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Below is the generated deployment configuration, 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino/templates/deployment-worker.yaml&lt;/code&gt;, with comments that delineate what
the different sections of the configuration define.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#-------------------------Deployment-----------------------------
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tcb-trino-worker
  labels:
    app: trino
    chart: trino-0.3.0
    release: tcb
    heritage: Helm
    component: worker
spec:
#-------------------------ReplicaSet-----------------------------
  replicas: 2
  selector:
    matchLabels:
      app: trino
      release: tcb
      component: worker
  template:
#----------------------------Pod---------------------------------
    metadata:
      labels:
        app: trino
        release: tcb
        component: worker
    spec:
      volumes:
        - name: config-volume
          configMap:
            name: tcb-trino-worker
        - name: catalog-volume
          configMap:
            name: tcb-trino-catalog
      imagePullSecrets:
        - name: registry-credentials
      containers:
        - name: trino-worker
          image: &quot;trinodb/trino:latest&quot;
          imagePullPolicy: IfNotPresent
          env:
            []
          volumeMounts:
            - mountPath: /etc/trino
              name: config-volume
            - mountPath: /etc/trino/catalog
              name: catalog-volume
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /v1/info
              port: http
          readinessProbe:
            httpGet:
              path: /v1/info
              port: http
          resources:
            {}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;configmap&quot;&gt;ConfigMap&lt;/h3&gt;

&lt;p&gt;You may have noticed that the pods define volumes that refer to an
object called a &lt;a href=&quot;https://kubernetes.io/docs/concepts/configuration/configmap/&quot;&gt;&lt;em&gt;ConfigMap&lt;/em&gt;&lt;/a&gt;.
This is a way to store non-confidential data in the form of key-value pairs.&lt;/p&gt;
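
&lt;p&gt;Once the chart is installed, you can inspect the contents of a ConfigMap in
the running cluster:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl get configmap tcb-trino-worker -o yaml
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;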

&lt;p&gt;ConfigMaps are how the Trino chart loads the &lt;a href=&quot;https://trino.io/docs/current/installation/deployment.html#configuring-trino&quot;&gt;Trino configurations&lt;/a&gt; 
into the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/etc/trino&lt;/code&gt; directory on the containers. The ConfigMap file, 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino/templates/configmap-worker.yaml&lt;/code&gt;, defines the files loaded into the 
worker nodes. The only real difference between the worker and coordinator 
ConfigMaps is in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;config.properties&lt;/code&gt; file, which specifies whether the node is a coordinator or not.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;apiVersion: v1
kind: ConfigMap
metadata:
  name: tcb-trino-worker
  labels:
    app: trino
    chart: trino-0.3.0
    release: tcb
    heritage: Helm
    component: worker
data:
  node.properties: |
    node.environment=production
    node.data-dir=/data/trino
    plugin.dir=/usr/lib/trino/plugin

  jvm.config: |
    -server
    -Xmx8G
    -XX:+UseG1GC
    -XX:G1HeapRegionSize=32M
    -XX:+UseGCOverheadLimit
    -XX:+ExplicitGCInvokesConcurrent
    -XX:+HeapDumpOnOutOfMemoryError
    -XX:+ExitOnOutOfMemoryError
    -Djdk.attach.allowAttachSelf=true
    -XX:-UseBiasedLocking
    -XX:ReservedCodeCacheSize=512M
    -XX:PerMethodRecompilationCutoff=10000
    -XX:PerBytecodeRecompilationCutoff=10000
    -Djdk.nio.maxCachedBufferSize=2000000

  config.properties: |
    coordinator=false
    http-server.http.port=8080
    query.max-memory=4GB
    query.max-memory-per-node=1GB
    query.max-total-memory-per-node=2GB
    memory.heap-headroom-per-node=1GB
    discovery.uri=http://tcb-trino:8080

  log.properties: |
    io.trino=INFO
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The only other ConfigMap defines the &lt;a href=&quot;https://trino.io/docs/current/installation/deployment.html#catalog-properties&quot;&gt;catalog properties files&lt;/a&gt;
in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/etc/trino/catalog&lt;/code&gt; folder. This ConfigMap only defines two catalogs.
They expose the TPC-H and TPC-DS benchmark datasets.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;apiVersion: v1
kind: ConfigMap
metadata:
  name: tcb-trino-catalog
  labels:
    app: trino
    chart: trino-0.3.0
    release: tcb
    heritage: Helm
    role: catalogs
data:
  tpch.properties: |
    connector.name=tpch
    tpch.splits-per-node=4
  tpcds.properties: |
    connector.name=tpcds
    tpcds.splits-per-node=4
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;networking&quot;&gt;Networking&lt;/h3&gt;

&lt;p&gt;Unlike in the Docker world, where containers run directly on the host and can 
be exposed there, pods in a k8s cluster run in a private network. 
Kubernetes exposes the internal IP address of the pod through the IP address of 
the k8s node and a unique port.&lt;/p&gt;

&lt;p&gt;While these IP addresses can be used to address pods internally, it’s not a 
good idea, as they are dynamic and subject to change upon termination and 
recreation. Instead, you set up routing that handles addressing by pod name 
rather than IP address.&lt;/p&gt;

&lt;p&gt;When you have multiple k8s nodes, you have multiple IP addresses set up for
the nodes. The routing software must be set up to assign the internal networks 
to each node to avoid conflicts across the cluster. This type of functionality 
exists in cloud services, such as Amazon EKS, Google GKE, and 
Azure AKS.&lt;/p&gt;

&lt;h3 id=&quot;services&quot;&gt;Services&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://kubernetes.io/docs/concepts/services-networking/service/&quot;&gt;&lt;em&gt;Services&lt;/em&gt;&lt;/a&gt; 
establish connectivity between different pods and can make pods available 
from the external k8s node IP address. This enables loose coupling between 
microservices in applications.&lt;/p&gt;

&lt;p&gt;There are three service types.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;ClusterIP - the service creates a virtual IP inside the cluster to enable 
communication between different services. This service is the default when you
don’t specify a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;type&lt;/code&gt; value under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;spec&lt;/code&gt; in the configuration.&lt;/li&gt;
  &lt;li&gt;NodePort - is used to expose the internal address of a pod using the IP 
address and port of the node it is running on.&lt;/li&gt;
  &lt;li&gt;LoadBalancer - this service creates a load balancer for the application in 
supported cloud providers. We won’t cover this one, but this is used when 
we create our cluster in EKS using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;eksctl&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here’s a diagram of the ClusterIP networking between different ReplicaSets.&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/31/clusterip.png&quot; /&gt;&lt;br /&gt;
 Source: https://www.udemy.com/course/learn-kubernetes/
&lt;/p&gt;

&lt;p&gt;NodePorts establish connectivity to a specific ReplicaSet of pod instances. 
They do not provide a generically accessible IP address for services to 
communicate with one another.&lt;/p&gt;

&lt;p&gt;In our case, we need a stable address for the coordinator.
The Helm chart defines a ClusterIP service to accomplish this. Notice the
selector targets the Trino app, the release label, and only the coordinator 
component, which we know is one node.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;apiVersion: v1
kind: Service
metadata:
  name: tcb-trino
  labels:
    app: trino
    chart: trino-0.3.0
    release: tcb
    heritage: Helm
spec:
  type: ClusterIP
  ports:
    - port: 8080
      targetPort: http
      protocol: TCP
      name: http
  selector:
    app: trino
    release: tcb
    component: coordinator
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;nodeport&quot;&gt;NodePort&lt;/h3&gt;

&lt;p&gt;The &lt;a href=&quot;https://kubernetes.io/docs/concepts/services-networking/service/#type-nodeport&quot;&gt;&lt;em&gt;NodePort&lt;/em&gt;&lt;/a&gt; 
service type creates a proxy service that forwards traffic from a specific port 
on the node to the pod.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/31/service.png&quot; /&gt;&lt;br /&gt;
 Source: https://www.udemy.com/course/learn-kubernetes/
&lt;/p&gt;

&lt;p&gt;There are three ports when setting up a NodePort.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;TargetPort - the port number on the pod itself, to which the service 
forwards traffic.&lt;/li&gt;
  &lt;li&gt;Port - the port used by the service itself.&lt;/li&gt;
  &lt;li&gt;NodePort - the port that is exposed by the worker node and made available 
externally. NodePorts can only be in the range of 30000 - 32767.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The only required port to set is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;port&lt;/code&gt;. By default &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;targetPort&lt;/code&gt; is the 
same as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;port&lt;/code&gt; and nodePort is automatically assigned a free port in the 
allowed range. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ports&lt;/code&gt; is also an array which is why the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-&lt;/code&gt; char is used.&lt;/p&gt;
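
&lt;p&gt;A NodePort service for the coordinator is not part of the Helm chart, but a 
sketch of one, reusing the chart’s labels with a hypothetical service name, 
looks like this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;apiVersion: v1
kind: Service
metadata:
  name: trino-external   # hypothetical, not defined by the chart
spec:
  type: NodePort
  ports:
    - port: 8080         # port used by the service
      targetPort: 8080   # port on the pod
      nodePort: 30080    # exposed on every node, must be 30000-32767
  selector:
    app: trino
    release: tcb
    component: coordinator
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;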

&lt;h3 id=&quot;amazon-eks-elastic-kubernetes-service&quot;&gt;Amazon EKS (Elastic Kubernetes Service)&lt;/h3&gt;

&lt;p&gt;Amazon EKS is a managed container service to run and scale Kubernetes 
applications in the cloud. EKS provides k8s clusters in the cloud without you 
having to manage the whole k8s platform yourself. Unlike with
your own k8s cluster, you can’t log into the control plane node in EKS, although
you won’t need to. You are able to access the workers, which are usually EC2 instances.&lt;/p&gt;

&lt;p&gt;There are &lt;a href=&quot;https://docs.aws.amazon.com/eks/latest/userguide/create-cluster.html&quot;&gt;many steps involved in setting up a Kubernetes cluster&lt;/a&gt; 
on EKS, unless you use a simple command line tool called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;eksctl&lt;/code&gt; that
provisions the cluster for you.&lt;/p&gt;

&lt;h3 id=&quot;eksctl&quot;&gt;eksctl&lt;/h3&gt;

&lt;p&gt;From the &lt;a href=&quot;https://eksctl.io/&quot;&gt;eksctl website&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;eksctl&lt;/code&gt; is a simple CLI tool for creating and managing clusters on EKS - 
Amazon’s managed Kubernetes service for EC2. It is written in Go, uses 
CloudFormation, was created by Weaveworks and it welcomes contributions from 
the community. Create a basic cluster in minutes with just one command.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2 id=&quot;demo-of-the-month-deploy-trino-k8s-to-amazon-eks&quot;&gt;Demo of the month: Deploy Trino k8s to Amazon EKS&lt;/h2&gt;

&lt;p&gt;First, you’ll need to install the following tools if you haven’t done so already:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/weaveworks/eksctl&quot;&gt;eksctl&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://kubernetes.io/docs/tasks/tools/&quot;&gt;kubectl&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://helm.sh/docs/intro/install/&quot;&gt;helm&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then you need to add your IAM credentials to the 
&lt;a href=&quot;https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html#cli-configure-files-where&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;~/.aws/credentials&lt;/code&gt; file&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Check the latest k8s version that is available on EKS.
&lt;a href=&quot;https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html&quot;&gt;https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html&lt;/a&gt;&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;eksctl create cluster \
 --name tcb-cluster \
 --version 1.21 \
 --region us-east-1 \
 --nodegroup-name k8s-tcb-cluster \
 --node-type t2.large \
 --nodes 2
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The command takes 10 to 15 minutes to complete. This is the first output you
see:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;2021-12-16 01:25:17 [ℹ]  eksctl version 0.76.0
2021-12-16 01:25:17 [ℹ]  using region us-east-1
2021-12-16 01:25:17 [ℹ]  setting availability zones to [us-east-1a us-east-1e]
2021-12-16 01:25:17 [ℹ]  subnets for us-east-1a - public:192.168.0.0/19 private:192.168.64.0/19
2021-12-16 01:25:17 [ℹ]  subnets for us-east-1e - public:192.168.32.0/19 private:192.168.96.0/19
2021-12-16 01:25:17 [ℹ]  nodegroup &quot;k8s-tcb-cluster&quot; will use &quot;&quot; [AmazonLinux2/1.21]
2021-12-16 01:25:17 [ℹ]  using Kubernetes version 1.21
2021-12-16 01:25:17 [ℹ]  creating EKS cluster &quot;tcb-cluster&quot; in &quot;us-east-1&quot; region with managed nodes
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;After some time, you notice that two EC2 instances have come up. The final 
output of the tool should look like this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;2021-12-16 02:00:17 [ℹ]  waiting for at least 2 node(s) to become ready in &quot;k8s-tcb-cluster&quot;
2021-12-16 02:00:17 [ℹ]  nodegroup &quot;k8s-tcb-cluster&quot; has 2 node(s)
2021-12-16 02:00:17 [ℹ]  node &quot;ip-192-168-2-123.ec2.internal&quot; is ready
2021-12-16 02:00:17 [ℹ]  node &quot;ip-192-168-55-167.ec2.internal&quot; is ready
2021-12-16 02:00:18 [ℹ]  kubectl command should work with &quot;~/.kube/config&quot;, try &apos;kubectl get nodes&apos;
2021-12-16 02:00:18 [✔]  EKS cluster &quot;tcb-cluster&quot; in &quot;us-east-1&quot; region is ready
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Take special note that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;eksctl&lt;/code&gt; overwrote your k8s configuration to point to 
the EKS cluster instead of a local cluster. To test that you can connect, run:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl get nodes
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You should see two nodes running. From here, installing Trino is simple: all
you have to do is reuse the Helm chart that we used to deploy Trino locally.
With the exact same command, you now deploy to EKS, since the tool updated
your settings.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;helm install tcb trino/trino --version 0.3.0
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;After you’ve installed the Helm chart, wait a minute or two for the Trino 
service to fully start and run:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl get deployments
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You should see output showing that the coordinator and both workers are available.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
tcb-trino-coordinator   1/1     1            1           67s
tcb-trino-worker        2/2     2            2           67s
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To connect to the cluster, the Helm output gives pretty good instructions on how
to create a tunnel from the cluster to your local laptop.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Get the application URL by running these commands:
  export POD_NAME=$(kubectl get pods --namespace default -l &quot;app=trino,release=tcb,component=coordinator&quot; -o jsonpath=&quot;{.items[0].metadata.name}&quot;)
  echo &quot;Visit http://127.0.0.1:8080 to use your application&quot;
  kubectl port-forward $POD_NAME 8080:8080
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Run that, then go to &lt;a href=&quot;http://127.0.0.1:8080&quot;&gt;http://127.0.0.1:8080&lt;/a&gt;, and you should see the Trino UI.&lt;/p&gt;

&lt;p&gt;To clear out the Helm install, run:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl delete service --all
kubectl delete deployment --all
kubectl delete configmap --all
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To tear down the entire k8s cluster, run:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;eksctl delete cluster --name tcb-cluster --region us-east-1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;pr-of-the-month-pr-8921-support-truncate-table-statement&quot;&gt;PR of the month: PR 8921: Support TRUNCATE TABLE statement&lt;/h2&gt;

&lt;p&gt;This week’s &lt;a href=&quot;https://github.com/trinodb/trino/issues/8921&quot;&gt;PR of the month&lt;/a&gt;
implements &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TRUNCATE TABLE&lt;/code&gt;. This command is very similar to a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt; statement,
with the exception that it does not perform deletes on individual rows. This 
ends up being a much faster operation than &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt;, as it uses fewer system 
and logging resources.&lt;/p&gt;
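
&lt;p&gt;For example, to remove all rows from a hypothetical &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders&lt;/code&gt; table:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;TRUNCATE TABLE orders;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;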

&lt;p&gt;Thanks to Yuya Ebihira for adding the support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TRUNCATE TABLE&lt;/code&gt;.&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-month-how-do-i-run-systemsync_partition_metadata-with-different-catalogs&quot;&gt;Question of the month: How do I run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;system.sync_partition_metadata&lt;/code&gt; with different catalogs?&lt;/h2&gt;

&lt;p&gt;This week’s &lt;a href=&quot;https://trinodb.slack.com/archives/CFLB9AMBN/p1639094856214800&quot;&gt;question of the month&lt;/a&gt; 
comes from Yu on Slack. Yu asks:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Hi team, in the following system procedure, how can we specify the catalog name?
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;system.sync_partition_metadata(schema_name, table_name, mode, case_sensitive)&lt;/code&gt;
We are using multiple catalogs and we need to call this procedure against 
non-default catalog.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I answered this with a link back to our &lt;a href=&quot;/episodes/5.html&quot;&gt;fifth episode&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;You need to set the catalog either in the jdbc string as I do in the video, or
you need to set the session catalog variable,
&lt;a href=&quot;https://trino.io/docs/current/sql/set-session.html&quot;&gt;https://trino.io/docs/current/sql/set-session.html&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
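
&lt;p&gt;You can also qualify the procedure with the catalog name directly in the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CALL&lt;/code&gt; statement. The catalog, schema, and table names here are hypothetical:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CALL hive.system.sync_partition_metadata('default', 'orders', 'FULL');
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;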

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs and resources&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://normanlimxk.com/2021/12/07/run-trino-presto-on-minikube-on-aws/&quot;&gt;Run Trino/Presto on Minikube on AWS&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/episodes/24.html&quot;&gt;Trinetes I: Trino on Kubernetes TCB episode&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://sbakiu.medium.com/diy-analytics-platform-66638cc6a92f&quot;&gt;DIY Analytics Platform&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=p6xDCz00TxU&quot;&gt;AWS EKS - Create Kubernetes cluster on Amazon EKS: the easy way&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Mega Man 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;

      

      <summary>Trino Summit 2021</summary>

      
      
    </entry>
  
    <entry>
      <title>30: Trino and dbt, a hot data mesh</title>
      <link href="https://trino.io/episodes/30.html" rel="alternate" type="text/html" title="30: Trino and dbt, a hot data mesh" />
      <published>2021-11-17T00:00:00+00:00</published>
      <updated>2021-11-17T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/30</id>
      <content type="html" xml:base="https://trino.io/episodes/30.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;José Cabeda, Data Engineer at &lt;a href=&quot;https://www.talkdesk.com&quot;&gt;Talkdesk&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/jecabeda&quot;&gt;@jecabeda&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Przemek Denkiewicz, Cloud Ecosystem Engineer at &lt;a href=&quot;https://www.starburst.io&quot;&gt;Starburst&lt;/a&gt;
  (&lt;a href=&quot;https://twitter.com/hovaesco&quot;&gt;@hovaesco&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;trino-summit-2021&quot;&gt;Trino Summit 2021&lt;/h2&gt;

&lt;p&gt;If you missed &lt;a href=&quot;https://www.starburst.io/resources/trino-summit/&quot;&gt;Trino Summit 2021&lt;/a&gt;,
you can watch it on demand, for free!&lt;/p&gt;

&lt;h2 id=&quot;release-364&quot;&gt;Release 364&lt;/h2&gt;

&lt;p&gt;Trino 364 shipped on the first of November, just after our last episode. 
Martin’s official announcement mentioned the following highlights:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for dynamic filtering in Iceberg connector&lt;/li&gt;
  &lt;li&gt;Performance improvements when querying small files&lt;/li&gt;
  &lt;li&gt;Procedure to merge small files in Hive tables&lt;/li&gt;
  &lt;li&gt;Support for Cassandra UUID type&lt;/li&gt;
  &lt;li&gt;Support for MemSQL datetime and timestamp types&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred’s additional notes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER MATERIALIZED VIEW ... RENAME TO&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;A whole bunch of performance improvements&lt;/li&gt;
  &lt;li&gt;Elasticsearch connector no longer fails with unsupported types&lt;/li&gt;
  &lt;li&gt;A lot of improvements on Hive and Iceberg connectors&lt;/li&gt;
  &lt;li&gt;Hive connector has optimize procedure now!&lt;/li&gt;
  &lt;li&gt;Parquet and avro fixes and improvements&lt;/li&gt;
  &lt;li&gt;Web UI performance improvement for long query texts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the &lt;a href=&quot;https://trino.io/docs/current/release/release-364.html&quot;&gt;release notes&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-week-trino-and-dbt-a-hot-data-mesh&quot;&gt;Concept of the week: Trino and dbt, a hot data mesh&lt;/h2&gt;

&lt;p&gt;Data mesh, the buzzword that follows data lakehouse, may feel rather irrelevant
for many. This is especially true for those that just want to move from a Hive 
and HDFS cluster, or from a cloud data warehouse, to storing data in object 
storage and querying it with Trino.&lt;/p&gt;

&lt;p&gt;While data mesh is certainly in the hype cycle phase, it’s actually not a new
idea and has very sound principles. Many companies have written their own 
software and created organizational policies that align with the strategies 
outlined by the data mesh principles. In essence, these principles aim to make
data management for analytics platforms decentralized. This means decentralizing
the infrastructure and data engineers managing it to different domains (or 
products) within a company.&lt;/p&gt;

&lt;p&gt;What’s really exciting about data mesh is that much of the technology today 
makes these theoretical principles more of a reality without having to invent 
your own services. The author of &lt;a href=&quot;https://martinfowler.com/articles/data-mesh-principles.html&quot;&gt;data mesh&lt;/a&gt;,
Zhamak Dehghani, lays out four principles that characterize a data mesh:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Domain-oriented, decentralized data ownership and architecture&lt;/li&gt;
  &lt;li&gt;Data as a product&lt;/li&gt;
  &lt;li&gt;Self-serve data infrastructure as a platform&lt;/li&gt;
  &lt;li&gt;Federated computational governance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s see what the engineers from Talkdesk are doing to implement their data 
mesh.&lt;/p&gt;

&lt;h3 id=&quot;talkdesk&quot;&gt;Talkdesk&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://www.talkdesk.com&quot;&gt;Talkdesk&lt;/a&gt; is a contact center as a service
company, created at a &lt;a href=&quot;https://www.twilio.com&quot;&gt;Twilio&lt;/a&gt; Hackathon in 2011,
that just hit a 10 billion dollar valuation. As a fast-growing startup, Talkdesk
evolves its product strategy at a fast pace and regularly deals with large data
sets to analyze.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/30/talkdesk-scale.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;The Talkdesk product is deployed in cloud infrastructure and provides all the 
infrastructure for operating a call center. Its architecture is heavily 
event-driven. Dealing with realtime events at scale is difficult and requires a 
reactive and flexible architecture.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/30/talkdesk-events.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;The early architecture for the analytics platform followed a traditional
approach using Spark and Fivetran to ingest data into Redshift. It had various
pipelines to update the data for downstream consumption.&lt;/p&gt;

&lt;p&gt;This centralized workflow made communication around data entity management much
simpler, as it all existed within the same team. However, scaling caused growing
backlogs, which delayed analysis and deployments. It also made it difficult to
handle differing use cases, such as realtime versus historical analysis.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/30/talkdesk-architecture.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;Analytics and transactional use cases are varied and overlapping. Live data
typically feeds into stateful databases that update as data arrives, and
analyzing data in motion requires a realtime database. Historical data keeps
multiple copies of different states over time, which enables trend analysis over
longer periods rather than just the present moment. One challenge Talkdesk faced
was building a robust architecture that supports analyzing live data as the
latest changes arrive in the OLTP databases, while also meeting all the
analytics use cases.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/30/olap-oltp.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;To enable analytics across the various use cases, Talkdesk integrated Trino into
their workflow to read data across both live and historic data and merge them.
Using Trino enabled reading from live data feeding into their stateful data 
stores, and reads across historic data stores to produce data in the form needed
to support Talkdesk products.&lt;/p&gt;
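&lt;p&gt;A query merging both sides could look like the following sketch. The catalog,
schema, and table names are hypothetical placeholders, not Talkdesk’s actual
setup:&lt;/p&gt;

```sql
-- Combine the latest live records with the historical archive
-- (catalogs, schemas, and tables are illustrative placeholders)
SELECT call_id, call_date, duration
FROM postgresql.live.calls
WHERE call_date = current_date
UNION ALL
SELECT call_id, call_date, duration
FROM hive.archive.calls
WHERE call_date &lt; current_date;
```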

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;90%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/30/talkdesk-architecture-2.0.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;Trino is also used to hide the complexity of the data platform, and allows 
merging data across multiple relational and object stores.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;60%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/30/talkdesk-architecture-2.0-external.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h3 id=&quot;why-dbt&quot;&gt;Why dbt?&lt;/h3&gt;

&lt;p&gt;In &lt;a href=&quot;/episodes/21.html&quot;&gt;episode 21&lt;/a&gt; we discussed using dbt and Trino in detail. As
we mentioned there:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;dbt is a transformation workflow tool that lets teams quickly and 
collaboratively deploy analytics code, following software engineering best 
practices like modularity, CI/CD, testing, and documentation. It enables 
anyone who knows SQL to build production-grade data pipelines.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You can achieve modular, repeatable, and testable units of processing by 
defining models and configurations for the data pipelines. For example:&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/30/dbt-definition.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;
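&lt;p&gt;For illustration, a dbt model is just a SQL file with Jinja templating; the
model and column names below are hypothetical:&lt;/p&gt;

```sql
-- models/orders_enriched.sql (hypothetical dbt model)
{{ config(materialized='table') }}

select
    o.order_id,
    o.total,
    c.region
from {{ ref('stg_orders') }} o
join {{ ref('stg_customers') }} c
    on o.customer_id = c.customer_id
```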

&lt;p&gt;Using the definitions above, Talkdesk engineers were able to consolidate all
these tasks into a much more simplified graph of operations.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/30/dbt-results.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h3 id=&quot;why-data-mesh&quot;&gt;Why data mesh?&lt;/h3&gt;

&lt;p&gt;While a lot of focus has gone into the technology aspects of data mesh, there is
also a lot to be said about the implications for the data team and the
socio-political policies that come with data mesh. Talkdesk also made structural
changes to their team to improve their data mesh strategy.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/30/talkdesk-data-team.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h3 id=&quot;how-data-mesh-affects-the-everyday-life-of-data-engineers&quot;&gt;How does data mesh affect the everyday life of data engineers?&lt;/h3&gt;

&lt;p&gt;There is a real fear that arises when management changes business 
policies. It can be hard to tell how these policies trickle down and affect
the engineer’s everyday work life. In general, engineers become more entrenched
in individual domains rather than trying to manage all domains under one 
architecture. Data engineers are distributed to product teams and specialize
in the domain’s data models. They also have specific knowledge of how to use
the self-service platform to integrate with other teams.&lt;/p&gt;

&lt;h3 id=&quot;comparing-microservices-based-applications-to-the-data-mesh&quot;&gt;Comparing microservices-based applications to the data mesh&lt;/h3&gt;

&lt;p&gt;When we think of a functional system for deploying and managing 
microservices-based applications, there are several features that we’ve come to
expect. It is very easy to compare the features of microservices-based 
applications to the features of a data mesh, as laid out in the &lt;a href=&quot;https://blog.starburst.io/data-mesh-a-software-engineers-perspective&quot;&gt;Data Mesh: A Software Engineer’s Perspective&lt;/a&gt;
blog post.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-partitioned-table-tests-and-fixed-pr-9757&quot;&gt;PR of the week: Partitioned table tests and fixed PR 9757&lt;/h2&gt;

&lt;p&gt;This week’s &lt;a href=&quot;https://github.com/trinodb/trino/pull/9757&quot;&gt;PR of the week&lt;/a&gt;
is for the Iceberg connector. Release 364 had quite a few improvements for 
Iceberg and handled small issues that could cause query failures in some
scenarios. This PR addressed a query failure when reading a partition on a 
UUID column.&lt;/p&gt;

&lt;p&gt;Thanks to Piotr Findeisen for fixing this and many other bugs, as well as
improving performance in the Iceberg connector!&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-week-whats-the-difference-between-location-and-external_location&quot;&gt;Question of the week: What’s the difference between &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;location&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;external_location&lt;/code&gt;?&lt;/h2&gt;

&lt;p&gt;This week’s &lt;a href=&quot;https://www.trinoforum.org/t/105&quot;&gt;question of the week&lt;/a&gt; comes from 
Aakash Nand on Slack and was ported to the Trino Forum. Aakash asks:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;When creating a Hive table in Trino, what is the difference between 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;external_location&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;location&lt;/code&gt;? If I have to create an external table I have
to use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;external_location&lt;/code&gt;, right? What is the difference between these two?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This was answered by Arkadiusz Czajkowski:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Tables created with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;location&lt;/code&gt; are managed tables. You have full control over 
them from their creation to modification. Tables created with 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;external_location&lt;/code&gt; are tables created by third-party systems. We just access 
them, mostly for reads. I would encourage you to use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;location&lt;/code&gt; in your case.&lt;/p&gt;
&lt;/blockquote&gt;
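&lt;p&gt;As a hedged sketch, the difference shows up in the table properties at creation
time. The catalog, schema, and bucket names below are hypothetical:&lt;/p&gt;

```sql
-- Managed table: data lives under the schema location and Trino owns its
-- lifecycle, so DROP TABLE also removes the underlying files
CREATE TABLE hive.sales.orders (order_id bigint, total double);

-- External table: points at files written by another system; Trino mostly
-- reads them, and DROP TABLE leaves the files in place
CREATE TABLE hive.sales.orders_ext (order_id bigint, total double)
WITH (external_location = 's3://example-bucket/warehouse/orders');
```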

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs and resources&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/geekculture/trino-dbt-a-match-in-sql-heaven-1df2a3d12b5e&quot;&gt;Trino + dbt = a match made in SQL heaven? Blog&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/episodes/21.html&quot;&gt;Trino + dbt = a match made in SQL heaven? TCB episode&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://martinfowler.com/articles/data-mesh-principles.html&quot;&gt;Data Mesh Principles&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://blog.starburst.io/data-mesh-a-software-engineers-perspective&quot;&gt;Data Mesh: A Software Engineer’s Perspective&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests</summary>

      
      
    </entry>
  
    <entry>
      <title>29: What is Trino and the Hive connector</title>
      <link href="https://trino.io/episodes/29.html" rel="alternate" type="text/html" title="29: What is Trino and the Hive connector" />
      <published>2021-10-28T00:00:00+00:00</published>
      <updated>2021-10-28T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/29</id>
      <content type="html" xml:base="https://trino.io/episodes/29.html">&lt;h2 id=&quot;release-364&quot;&gt;Release 364&lt;/h2&gt;

&lt;p&gt;Release 364 is just around the corner. Here is Manfred’s release preview:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER MATERIALIZED VIEW ... RENAME TO&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;A whole bunch of performance improvements&lt;/li&gt;
  &lt;li&gt;Elasticsearch connector no longer fails if fields with unsupported types exist&lt;/li&gt;
  &lt;li&gt;Hive connector has optimize procedure now!&lt;/li&gt;
  &lt;li&gt;Parquet and Avro fixes and improvements&lt;/li&gt;
&lt;/ul&gt;
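&lt;p&gt;The new rename statement from the list above works like the following sketch;
the catalog and view names are hypothetical:&lt;/p&gt;

```sql
-- Rename a materialized view within its schema
ALTER MATERIALIZED VIEW iceberg.analytics.daily_totals
RENAME TO daily_order_totals;
```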

&lt;h2 id=&quot;concept-of-the-week-what-is-trino&quot;&gt;Concept of the week: What is Trino?&lt;/h2&gt;

&lt;p&gt;Trino is the project created by Martin Traverso, Dain Sundstrom, David Phillips,
and Eric Hwang in 2012 to replace the 300PB Hive data warehouse at Facebook. The
goal of Trino is to run fast ad-hoc analytics queries over big data file systems
like HDFS and object stores like S3.&lt;/p&gt;

&lt;p&gt;An initially unintended but now characteristic feature of Trino is its ability 
to execute federated queries over various distributed data sources. This
includes, but is not limited to: Accumulo, BigQuery, Apache Cassandra, 
ClickHouse, Druid, Elasticsearch, Google Sheets, Apache Iceberg, Apache Hive, 
JMX, Apache Kafka, Kinesis, Kudu, MongoDB, MySQL, Oracle, Apache Phoenix, 
Apache Pinot, PostgreSQL, Prometheus, Redis, Redshift, SingleStore (MemSQL), 
and Microsoft SQL Server.&lt;/p&gt;
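&lt;p&gt;Federation means a single SQL statement can join several of these systems. A
sketch with hypothetical catalog and table names:&lt;/p&gt;

```sql
-- Join a data lake table with an operational database in one query
SELECT o.order_id, c.name
FROM hive.sales.orders o
JOIN postgresql.crm.customers c
    ON o.customer_id = c.customer_id;
```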

&lt;p&gt;How does Trino query across everything from data lakes, SQL, and NoSQL databases
at unprecedented speeds? It helps to start by going over Trino’s architecture:&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/29/1-architecture.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://trino.io/blog/2021/04/21/the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;Trino consists of two types of nodes, &lt;em&gt;coordinator&lt;/em&gt; and &lt;em&gt;worker&lt;/em&gt; nodes. The 
coordinator plans and schedules the processing of SQL queries, which are 
submitted by users directly or through connected SQL reporting tools. The workers 
carry out most of the processing by reading the data from the source and
performing various operations within the task(s) they are assigned.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/29/2-SPI.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://trino.io/blog/2021/04/21/the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;Trino is able to query multiple data sources by exposing a common interface
called the SPI (Service Provider Interface), which enables the core engine to
treat the interactions with each data source the same. Each connector must then
implement the SPI, which includes exposing metadata, statistics, and data 
locations, and establishing one or more connections with an underlying data source.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/29/3-parser-planner.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://trino.io/blog/2021/04/21/the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;Many of these interfaces are used in the coordinator during the analysis and 
planning phases. The analyzer, for example, uses the metadata SPI to make sure
the table in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FROM&lt;/code&gt; clause actually exists in the data source.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/29/4-distributed-query-plan.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://trino.io/blog/2021/04/21/the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;Once a logical query plan is generated, the coordinator converts it to a
distributed query plan that maps actions into stages containing tasks to be
run on nodes. Stages model the sequence of events as a directed acyclic graph
(DAG).&lt;/p&gt;
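&lt;p&gt;You can inspect this distributed plan for any query with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXPLAIN&lt;/code&gt;; the
table name below is a hypothetical placeholder:&lt;/p&gt;

```sql
-- Show the stages and fragments of the distributed plan
EXPLAIN (TYPE DISTRIBUTED)
SELECT region, count(*)
FROM hive.sales.orders
GROUP BY region;
```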

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/29/5-task-management.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://trino.io/blog/2021/04/21/the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;The coordinator then schedules tasks over the worker nodes as efficiently as 
possible, depending on the physical layout and distribution of the data.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/29/6-splits.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://trino.io/blog/2021/04/21/the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;Data is split and distributed across the worker nodes to provide 
inter-node parallelism.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/29/7-parallelism-over-drivers.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://trino.io/blog/2021/04/21/the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;Once this data arrives at a worker node, it is further divided and processed in 
parallel. Workers submit the processed data back to the coordinator. Finally, the 
coordinator provides the results of the query to the user.&lt;/p&gt;

&lt;h2 id=&quot;pr-8821-add-https-query-event-logger&quot;&gt;PR 8821 Add HTTP/S query event logger&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/8821&quot;&gt;Pull request 8821&lt;/a&gt; enables Trino cluster 
owners to log query processing metadata by submitting it to an HTTP endpoint. 
This may be used for usage monitoring and alerting, but it might also be used to
extract analytics on cluster usage, such as table and column usage metrics.&lt;/p&gt;

&lt;p&gt;Query events are serialized to JSON and sent to the provided address over HTTP 
or over HTTPS. Configuration allows selecting which events should be included.&lt;/p&gt;
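&lt;p&gt;As a minimal sketch, enabling the listener takes an event listener properties
file on the coordinator. The endpoint URL here is a placeholder; check the
documentation linked below for the full set of properties:&lt;/p&gt;

```properties
# etc/event-listener.properties (sketch, endpoint is a placeholder)
event-listener.name=http
http-event-listener.connect-ingest-uri=https://example.com/trino-events
http-event-listener.log-completed=true
```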

&lt;p&gt;Thanks for the contribution &lt;a href=&quot;https://github.com/mosiac1&quot;&gt;mosiac1&lt;/a&gt; and others at
Bloomberg!&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/admin/event-listeners-http.html&quot;&gt;Read the docs&lt;/a&gt; 
to learn more about this exciting feature!&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-week-does-the-hive-connector-depend-on-the-hive-runtime&quot;&gt;Question of the week: Does the Hive connector depend on the Hive runtime?&lt;/h2&gt;

&lt;p&gt;This week’s question covers a lot of the confusion around the &lt;a href=&quot;https://trino.io/docs/current/connector/hive.html&quot;&gt;Hive
connector&lt;/a&gt;. In short, the answer 
is that the Hive runtime is not required. There’s more information available in 
the &lt;a href=&quot;https://trino.io/blog/2020/10/20/intro-to-hive-connector.html&quot;&gt;Intro to the Hive Connector blog&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Videos&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=ZwaVZplVmVA&quot;&gt;An Overview of the Starburst Trino Query Optimizer (Karol Sobczak)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Release 364</summary>

      
      
    </entry>
  
    <entry>
      <title>28: Autoscaling streaming ingestion to Trino with Pravega</title>
      <link href="https://trino.io/episodes/28.html" rel="alternate" type="text/html" title="28: Autoscaling streaming ingestion to Trino with Pravega" />
      <published>2021-10-14T00:00:00+00:00</published>
      <updated>2021-10-14T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/28</id>
      <content type="html" xml:base="https://trino.io/episodes/28.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Derek Moore, Software Senior Principal Engineer at &lt;a href=&quot;https://www.delltechnologies.com/en-us/index.htm&quot;&gt;Dell EMC&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/derekm00r3&quot;&gt;@derekm00r3&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Andrew Robertson, Principal Software Engineer at &lt;a href=&quot;https://www.delltechnologies.com/en-us/index.htm&quot;&gt;Dell EMC&lt;/a&gt;
  (&lt;a href=&quot;https://www.linkedin.com/in/andrew-robertson-986b885/&quot;&gt;@andrew-robertson&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Karan Singh, Software Engineer 2 at &lt;a href=&quot;https://www.delltechnologies.com/en-us/index.htm&quot;&gt;Dell EMC&lt;/a&gt;
 (&lt;a href=&quot;https://www.linkedin.com/in/singhkaranrakesh/&quot;&gt;@singhkaranrakesh&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;trino-summit-2021&quot;&gt;Trino Summit 2021&lt;/h2&gt;

&lt;p&gt;Get ready for &lt;a href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;Trino Summit&lt;/a&gt;, coming
October 21st and 22nd! This annual Trino community event is where we gather 
practitioners who deploy Trino at scale to share their experiences and best 
practices with the rest of the community. While the planning for this event was 
a bit chaotic due to the pandemic, we have made the final decision to host the 
event virtually for the safety of all the attendees. We look forward to seeing
you there, and can’t wait to share more information in the coming weeks!&lt;/p&gt;

&lt;h2 id=&quot;release-363&quot;&gt;Release 363&lt;/h2&gt;

&lt;p&gt;Official announcement items from Martin:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New HTTP event listener plugin&lt;/li&gt;
  &lt;li&gt;Insert overwrite for S3-backed tables&lt;/li&gt;
  &lt;li&gt;Support for Elasticsearch &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;scaled_float&lt;/code&gt; type&lt;/li&gt;
  &lt;li&gt;Support for Cassandra &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tuple&lt;/code&gt; type&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;time&lt;/code&gt; type in MySQL connector&lt;/li&gt;
  &lt;li&gt;Support for SQLServer &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;datetimeoffset&lt;/code&gt; type&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred’s additional notes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Misc performance and memory usage improvements&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW ROLES&lt;/code&gt; fix&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXPLAIN ANALYZE&lt;/code&gt; fix for estimate display&lt;/li&gt;
  &lt;li&gt;Numerous improvements for Parquet files in Hive and Iceberg connectors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More info at &lt;a href=&quot;https://trino.io/docs/current/release/release-363.html&quot;&gt;https://trino.io/docs/current/release/release-363.html&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-week-event-stream-abstractions-and-pravega&quot;&gt;Concept of the week: Event stream abstractions and Pravega&lt;/h2&gt;

&lt;h3 id=&quot;events-and-streams&quot;&gt;Events and streams&lt;/h3&gt;

&lt;p&gt;What is an event? This sounds like a silly question when asked generally. The
answer is less clear when discussing event-driven systems, though. An &lt;strong&gt;event&lt;/strong&gt;
is an action or occurrence that is captured by a sensor or generated by a
source system, and emitted to a sink system. Some examples include user
events from an application, system events in telemetry systems, or sensor events
from monitoring applications.&lt;/p&gt;

&lt;p&gt;What is an event stream? Now knowing what an event is, an &lt;strong&gt;event stream&lt;/strong&gt; is an 
unbounded set of events that are tracked over time.&lt;/p&gt;

&lt;p&gt;In this simple view, an event stream contains a sequential list of events. The
list contains events that have been processed, and some that still need to be 
processed.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/28/event-stream.png&quot; /&gt;&lt;br /&gt;
Cloud Native Computing Foundation Presentation: &lt;a href=&quot;https://www.cncf.io/wp-content/uploads/2020/08/pravega-overview-cncf-apr-2020.pdf&quot;&gt;Source&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;This is very different from a more realistic view of event streams, which
considers that events arrive and are processed in parallel. Event load may also
fluctuate, as events can burst around specific occurrences or follow periodic
patterns. While taking event ingest (writes) into consideration, it is
also important to consider event egress (reads) as part of the problem of 
representing event streams.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/28/event-stream-realistic.png&quot; /&gt;&lt;br /&gt;
Cloud Native Computing Foundation Presentation: &lt;a href=&quot;https://www.cncf.io/wp-content/uploads/2020/08/pravega-overview-cncf-apr-2020.pdf&quot;&gt;Source&lt;/a&gt;.
&lt;/p&gt;

&lt;h3 id=&quot;pravega-and-segments&quot;&gt;Pravega and segments&lt;/h3&gt;

&lt;p&gt;Engineers at Dell Labs wanted to find a better abstraction to solve the 
problems they saw in existing event streaming systems. This included how to 
address this type of constant shift in scaling, while also addressing the 
brittle storage abstractions that event streams use today. The storage 
abstraction needs to allow for both real-time and historical analytics. The data
within a particular transaction also needs to be consistent.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/28/segment.png&quot; /&gt;&lt;br /&gt;
Cloud Native Computing Foundation Presentation: &lt;a href=&quot;https://www.cncf.io/wp-content/uploads/2020/08/pravega-overview-cncf-apr-2020.pdf&quot;&gt;Source&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;Their solution is Pravega. The core of Pravega models streams built around a 
storage unit called a segment. A &lt;strong&gt;segment&lt;/strong&gt; is an append-only sequence of bytes
(not events/records). This offers greater flexibility and better 
parallelism and serialization over streams. Pravega stream writers are then able
to write in parallel, increasing ingest throughput.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/28/autoscale-parallel-segment.png&quot; /&gt;&lt;br /&gt;
Cloud Native Computing Foundation Presentation: &lt;a href=&quot;https://www.cncf.io/wp-content/uploads/2020/08/pravega-overview-cncf-apr-2020.pdf&quot;&gt;Source&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;You can use &lt;strong&gt;routing keys&lt;/strong&gt; to map events to particular segments. Pravega
enforces order within a specific key, but does not guarantee ordering of events
across keys. The tradeoff is strict event ordering versus higher 
parallelism and better performance.&lt;/p&gt;

&lt;p&gt;With segments, you can also scale up and scale down the number of segments 
depending on the workload you’re experiencing. Another compelling capability
this enables is managing transactions in the stream. As writers submit data,
they write to a temporary segment, which is merged into a permanent segment on
commit.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/28/segment-transactions.png&quot; /&gt;&lt;br /&gt;
Cloud Native Computing Foundation Presentation: &lt;a href=&quot;https://www.cncf.io/wp-content/uploads/2020/08/pravega-overview-cncf-apr-2020.pdf&quot;&gt;Source&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;The following diagram displays autoscaling splits and merges as specific routing
keys become more popular. To provide a clearer example, say that the routing 
keys are hashed geo-location values for a taxi app, mapped between zero and 
one. Certain locations become crowded, let’s say because a lot of people are 
going home for the work day and many taxis are in the downtown area. The 
locations mapped to the downtown routing keys can automatically trigger a 
split, and once the rush hour is over and traffic slows down, these segments 
are merged again.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/28/segment-split-merge.png&quot; /&gt;&lt;br /&gt;
Pravega Docs: &lt;a href=&quot;https://pravega.io/docs/nightly/pravega-concepts/#elastic-streams-auto-scaling&quot;&gt;Source&lt;/a&gt;.
&lt;/p&gt;

&lt;h3 id=&quot;pravega-architecture&quot;&gt;Pravega architecture&lt;/h3&gt;

&lt;p&gt;The Pravega architecture comes with writer groups and reader groups that scale
up and down along with the autoscaling applied to the segments. It consists of
a controller that maintains stream metadata and a segment store that works off
of tier one storage (Apache BookKeeper) and tier two storage (object storage).&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/28/pravega-architecture.png&quot; /&gt;&lt;br /&gt;
Pravega Docs: &lt;a href=&quot;https://pravega.io/docs/nightly/pravega-concepts/#elastic-streams-auto-scaling&quot;&gt;Source&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;Just like Trino, Pravega also aims to build a rich set of connectors with
systems that act as a source and sink. This includes a connector used for Trino.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/28/pravega-connectors.png&quot; /&gt;&lt;br /&gt;
Pravega Docs: &lt;a href=&quot;https://pravega.io/docs/nightly/pravega-concepts/#elastic-streams-auto-scaling&quot;&gt;Source&lt;/a&gt;.
&lt;/p&gt;

&lt;h3 id=&quot;pravega-compared-to-other-event-streaming-platforms&quot;&gt;Pravega compared to other event streaming platforms&lt;/h3&gt;

&lt;p&gt;This chart is a very helpful resource summarizing Pravega against other popular 
streaming platforms. It comes from the Pravega site, so be sure to check there 
for an up-to-date feature list moving forward.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt;Pravega&lt;/th&gt;
      &lt;th&gt;Kafka&lt;/th&gt;
      &lt;th&gt;Pulsar&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Transactions&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Event streams&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Long-term retention&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Durable by default&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Auto-scaling&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Ingestion of large data (video)&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Efficient at high partition counts&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Consistent state replication&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Key-value tables&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Comparison between Pravega, Kafka, and Pulsar: &lt;a href=&quot;https://pravega.io&quot;&gt;Source&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-week-querying-pravega-from-trino&quot;&gt;Demo of the week: Querying Pravega from Trino&lt;/h2&gt;

&lt;p&gt;This week the Pravega team demonstrates an example from their &lt;a href=&quot;https://github.com/pravega/presto-connector/tree/main/getting-started&quot;&gt;getting-started&lt;/a&gt;
tutorial for the Trino connector.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pravega-presto-connector-pr-49&quot;&gt;PR of the week: Pravega presto-connector PR 49&lt;/h2&gt;

&lt;p&gt;This week’s &lt;a href=&quot;https://github.com/pravega/presto-connector/pull/49&quot;&gt;PR of the week&lt;/a&gt;
doesn’t come from the Trino repository, but rather from the presto-connector
repository. The Trino portion of the repository was contributed by Dell engineer 
Karan Singh. As the PR states, this makes Pravega available from Trino alongside 
the original Presto connector.&lt;/p&gt;

&lt;p&gt;Thanks Karan for adding Trino and Andrew for writing the original Presto-Pravega
connector!&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-week-what-is-the-point-of-trino-forum-and-what-is-the-relationship-to-trino-slack&quot;&gt;Question of the week: What is the point of Trino Forum and what is the relationship to Trino Slack?&lt;/h2&gt;

&lt;p&gt;Our &lt;a href=&quot;https://www.trinoforum.org/t/what-is-the-point-of-this-forum-and-what-is-the-relationship-to-trino-slack/28&quot;&gt;question of the week&lt;/a&gt;
comes from the new Trino Forum, which Brian and a few others at Starburst
created. Slack is a much more ad hoc platform for people to work 
through problems rather than a place to search for and find existing solutions.
The Trino community has a great amount of knowledge accumulated in Slack,
but there is no way for people to find answers unless they have already joined, 
and none of the information discussed there can be found by a search engine like
Google.&lt;/p&gt;

&lt;p&gt;Further, a lot of the answers are scattered across different conversations, and 
this too can be condensed and simplified. I pondered the best way for us 
to expose this knowledge, and considered adding an FAQ page on &lt;a href=&quot;https://trino.io&quot;&gt;trino.io&lt;/a&gt;, but that
would get stale quickly and would require a lot of maintenance work at scale
without a crowdsourcing element. Instead, a &lt;a href=&quot;https://www.discourse.org&quot;&gt;Discourse forum&lt;/a&gt; 
(not to be confused with Discord) acts as a central repository of knowledge and 
makes this information easily searchable. The forum is maintained by some of us 
at Starburst, but over time we want more moderators from the community (this 
happens through merit and consistent participation, using Discourse trust levels).&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs and resources&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.cncf.io/online-programs/pravega-rethinking-storage-for-streams/&quot;&gt;Pravega: Rethinking Storage For Streams&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests</summary>

      
      
    </entry>
  
    <entry>
      <title>27: Trino gits to wade in the data LakeFS</title>
      <link href="https://trino.io/episodes/27.html" rel="alternate" type="text/html" title="27: Trino gits to wade in the data LakeFS" />
      <published>2021-09-30T00:00:00+00:00</published>
      <updated>2021-09-30T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/27</id>
      <content type="html" xml:base="https://trino.io/episodes/27.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Paul Singman, Developer Advocate at &lt;a href=&quot;https://treeverse.io/&quot;&gt;Treeverse&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/datawhisp&quot;&gt;@datawhisp&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;trino-summit-2021&quot;&gt;Trino Summit 2021&lt;/h2&gt;

&lt;p&gt;Get ready for &lt;a href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;Trino Summit&lt;/a&gt;, coming
October 21st and 22nd! This annual Trino community event is where we gather 
practitioners that deploy Trino at scale, and share their experiences and best 
practices with the rest of the community. While the planning for this event was 
a bit chaotic due to the pandemic, we have made the final decision to host the 
event virtually for the safety of all the attendees. We look forward to seeing
you there, and can’t wait to share more information in the coming weeks!&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-week-lakefs-and-git-on-object-storage&quot;&gt;Concept of the week: LakeFS and Git on object storage&lt;/h2&gt;

&lt;p&gt;LakeFS offers git-like semantics over your files in the data lake. Akin to the
versioning you can do on Iceberg, you can also version your data with LakeFS, 
and roll back to previous commits when you make a mistake. LakeFS allows you to 
roll out new features in production or prod-like environments with ease and 
isolation from the real data. Join us as we dive into this awesome new way to 
approach versioning on your data!&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/27/trino-lakefs.jpg&quot; /&gt;&lt;br /&gt;
Why we built LakeFS: &lt;a href=&quot;https://lakefs.io/why-we-built-lakefs-atomic-and-versioned-data-lake-operations/&quot;&gt;Source&lt;/a&gt;.
&lt;/p&gt;

&lt;h3 id=&quot;features&quot;&gt;Features&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Exabyte-scale version control&lt;/li&gt;
  &lt;li&gt;Git-like operations: branch, commit, merge, revert&lt;/li&gt;
  &lt;li&gt;Zero copy branching for frictionless experiments&lt;/li&gt;
  &lt;li&gt;Full reproducibility of data and code&lt;/li&gt;
  &lt;li&gt;Pre-commit/merge hooks for data CI/CD&lt;/li&gt;
  &lt;li&gt;Instantly revert changes to data&lt;/li&gt;
&lt;/ul&gt;
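
&lt;p&gt;The zero copy branching idea can be modeled in a few lines: a branch is just
a mapping from logical paths to immutable object IDs, so creating a branch
copies metadata, never files. This is a conceptual sketch, not the lakeFS
implementation:&lt;/p&gt;

```python
# Toy model: branches are pointer maps from logical paths to object IDs.
repo = {"main": {"tiny/customer/part-0": "obj-111",
                 "tiny/orders/part-0": "obj-222"}}

def create_branch(repo, source, name):
    repo[name] = dict(repo[source])  # copy the pointer map, not the objects

def write(repo, branch, path, object_id):
    repo[branch][path] = object_id   # only this branch sees the new object

create_branch(repo, "main", "sandbox")
write(repo, "sandbox", "tiny/lineitem/part-0", "obj-333")

assert "tiny/lineitem/part-0" in repo["sandbox"]
assert "tiny/lineitem/part-0" not in repo["main"]  # main stays isolated
# unchanged paths on both branches still point at the same objects
assert repo["main"]["tiny/orders/part-0"] == repo["sandbox"]["tiny/orders/part-0"]
```

A merge in this model just folds the sandbox pointer map back into main, which
is why it can be atomic and instantly revertable.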

&lt;h3 id=&quot;use-cases&quot;&gt;Use cases&lt;/h3&gt;

&lt;h4 id=&quot;in-development&quot;&gt;In development&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;Experiment - try new tools, upgrade versions, and evaluate code changes in 
isolation. By creating a branch of the data you get an isolated snapshot to run 
experiments over, while others are not exposed. Compare between branches with 
different experiments or to the main branch of the repository to understand a 
change’s impact.&lt;/li&gt;
  &lt;li&gt;Debug - checkout specific commits in a repository’s commit history to 
materialize consistent, historical versions of your data. See the exact state of
your data at the point-in-time of an error to understand its root cause.&lt;/li&gt;
  &lt;li&gt;Collaborate - avoid managing data access at the two extremes of either 
treating your data lake like a shared folder or creating multiple copies of the
data to safely collaborate. Instead, leverage isolated branches managed by 
metadata (not copies of files) to work in parallel.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;during-deployment&quot;&gt;During deployment&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;Version Control - deploy data safely with CI/CD workflows borrowed from 
software engineering best practices. Ingest new data onto an isolated branch, 
perform data validations, then add to production through a merge operation.&lt;/li&gt;
  &lt;li&gt;Test - define pre-merge and pre-commit hooks to run tests that enforce schema 
and validate properties of the data to catch issues before they reach 
production.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;in-production&quot;&gt;In production&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;Roll back - recover from errors by instantly reverting data to a former, 
consistent snapshot of the data lake. Choose any commit in a repository’s commit
 history to revert in one atomic action.&lt;/li&gt;
  &lt;li&gt;Troubleshoot - investigate production errors by starting with a snapshot of 
the inputs to the failed process. Spend less time re-creating the state of 
datasets at the time of failure, and more time finding the solution.&lt;/li&gt;
  &lt;li&gt;Cross-collection consistency - provide consumers multiple synchronized 
collections of data in one atomic, revertable action. Using branches, writers 
provide consistency guarantees across different logical collections - merging to
 the main branch only after all relevant datasets have been created or updated 
 successfully.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Source: &lt;a href=&quot;https://docs.lakefs.io/#use-cases&quot;&gt;https://docs.lakefs.io/#use-cases&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-week-running-trino-on-lakefs&quot;&gt;Demo of the week: Running Trino on LakeFS&lt;/h2&gt;

&lt;p&gt;In order to run Trino and LakeFS, you need Docker installed on your system with at least 4GB
of memory allocated to Docker.&lt;/p&gt;

&lt;p&gt;Let’s start up the LakeFS instance and the required PostgreSQL instance along 
with the typical Trino containers used with the Hive connector. 
Clone the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino-getting-started&lt;/code&gt; repository and navigate to the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;community_tutorials/lakefs/trino-lakefs-minio/&lt;/code&gt; directory.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone git@github.com:bitsondatadev/trino-getting-started.git

cd trino-getting-started/community_tutorials/lakefs/trino-lakefs-minio/

docker-compose up -d
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once this is done, you can navigate to the following locations to verify that
everything started correctly.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Navigate to &lt;a href=&quot;http://localhost:8000&quot;&gt;http://localhost:8000&lt;/a&gt; to open the LakeFS user interface.&lt;/li&gt;
  &lt;li&gt;Log in with Access Key, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AKIAIOSFODNN7EXAMPLE&lt;/code&gt;, and Secret Access Key, 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Verify that the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;example&lt;/code&gt; repository exists in the UI and open it.&lt;/li&gt;
  &lt;li&gt;The branch &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main&lt;/code&gt; in the repository, found under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;example/main/&lt;/code&gt;, should be 
empty.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Once you have verified the repository exists, let’s go ahead and create a schema
under the Trino Hive catalog called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;minio&lt;/code&gt;. This catalog originally pointed directly
at MinIO, but is now wrapped by LakeFS to add the git-like layer around the file storage.&lt;/p&gt;

&lt;p&gt;Name the schema &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny&lt;/code&gt; as that is the schema we copy from the TPCH data set. 
Notice the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;location&lt;/code&gt; property of the schema. It now has a namespace that is 
prefixed before the actual &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny/&lt;/code&gt; table directory. The prefix contains the 
repository name, then the branch name. All together this follows the pattern of 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;protocol&amp;gt;://&amp;lt;repository&amp;gt;/&amp;lt;branch&amp;gt;/&amp;lt;schema&amp;gt;/&lt;/code&gt;.&lt;/p&gt;
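&lt;p&gt;A small helper (hypothetical, for illustration only) that builds and parses
this location pattern:&lt;/p&gt;

```python
# Build and parse the <protocol>://<repository>/<branch>/<schema> pattern
# used for LakeFS-backed schema locations.
def make_location(protocol, repository, branch, schema):
    return f"{protocol}://{repository}/{branch}/{schema}"

def parse_location(location):
    protocol, rest = location.split("://", 1)
    repository, branch, schema = rest.split("/", 2)
    return {"protocol": protocol, "repository": repository,
            "branch": branch, "schema": schema}

loc = make_location("s3a", "example", "main", "tiny")
assert loc == "s3a://example/main/tiny"
assert parse_location(loc)["branch"] == "main"
```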

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE SCHEMA minio.tiny
WITH (location = &apos;s3a://example/main/tiny&apos;);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now, create two tables, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;customer&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders&lt;/code&gt;, by setting &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;external_location&lt;/code&gt;
using the same namespace used in the schema and adding the table name. The query
retrieves the data from the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny&lt;/code&gt; TPCH data set.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE TABLE minio.tiny.customer
WITH (
  format = &apos;ORC&apos;,
  external_location = &apos;s3a://example/main/tiny/customer/&apos;
) 
AS SELECT * FROM tpch.tiny.customer;

CREATE TABLE minio.tiny.orders
WITH (
  format = &apos;ORC&apos;,
  external_location = &apos;s3a://example/main/tiny/orders/&apos;
) 
AS SELECT * FROM tpch.tiny.orders;

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Verify that you can see the table directories in LakeFS once they exist.
&lt;a href=&quot;http://localhost:8000/repositories/example/objects?ref=main&amp;amp;path=tiny%2F&quot;&gt;http://localhost:8000/repositories/example/objects?ref=main&amp;amp;path=tiny%2F&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Run a query on these two tables using the standard table pointing to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main&lt;/code&gt;
branch.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT ORDERKEY, ORDERDATE, SHIPPRIORITY
FROM minio.tiny.customer c, minio.tiny.orders o
WHERE MKTSEGMENT = &apos;BUILDING&apos; AND c.CUSTKEY = o.CUSTKEY AND
ORDERDATE &amp;lt; date&apos;1995-03-15&apos;
GROUP BY ORDERKEY, ORDERDATE, SHIPPRIORITY
ORDER BY ORDERDATE;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Open the &lt;a href=&quot;http://localhost:8000/repositories/example/objects?ref=main&quot;&gt;LakeFS UI again&lt;/a&gt; 
and click on the &lt;strong&gt;Unversioned Changes&lt;/strong&gt; tab. Click &lt;strong&gt;Commit Changes&lt;/strong&gt;. Type a 
commit message on the popup and click &lt;strong&gt;Commit Changes&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Once the changes are committed on branch &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main&lt;/code&gt;, click on the &lt;strong&gt;Branches&lt;/strong&gt; tab.
Click &lt;strong&gt;Create Branch&lt;/strong&gt;. Name a new branch &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sandbox&lt;/code&gt; that branches off of the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main&lt;/code&gt; branch. Now click &lt;strong&gt;Create&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Although there is a branch that exists called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sandbox&lt;/code&gt;, this only exists 
logically. We need to make Trino aware by adding another schema and tables 
that point to the new branch. Do this by making a new schema called 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny_sandbox&lt;/code&gt; and changing the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;location&lt;/code&gt; property to point to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sandbox&lt;/code&gt;
branch instead of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main&lt;/code&gt; branch.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE SCHEMA minio.tiny_sandbox
WITH (location = &apos;s3a://example/sandbox/tiny&apos;);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny_sandbox&lt;/code&gt; schema exists, we can copy the table definitions
of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;customer&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders&lt;/code&gt; table from the original tables created. We got
the schema for free by copying it directly from the TPCH data using the CTAS 
statement. We don’t want to use CTAS in this case as it not only copies the 
table definition, but also the data. This duplication of data is unnecessary and
is what creating a branch in LakeFS avoids. We want to just copy the table
definition using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW CREATE TABLE&lt;/code&gt; statement.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SHOW CREATE TABLE minio.tiny.customer;
SHOW CREATE TABLE minio.tiny.orders;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Take the output and update the schema to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny_sandbox&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;external_location&lt;/code&gt;
to point to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sandbox&lt;/code&gt; for both tables.&lt;/p&gt;
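&lt;p&gt;Since the branch name only appears in the schema name and the location here,
a mechanical string replace over the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW CREATE TABLE&lt;/code&gt; output is enough for
this sketch (illustrative only; real DDL may need more careful editing):&lt;/p&gt;

```python
# Rewrite a (shortened, example) DDL string so the copied definition targets
# the sandbox schema and the sandbox branch location.
ddl = """CREATE TABLE minio.tiny.customer (
   custkey bigint
)
WITH (
   external_location = 's3a://example/main/tiny/customer',
   format = 'ORC'
)"""

sandbox_ddl = (ddl
               .replace("minio.tiny.", "minio.tiny_sandbox.")
               .replace("s3a://example/main/", "s3a://example/sandbox/"))

assert "minio.tiny_sandbox.customer" in sandbox_ddl
assert "s3a://example/sandbox/tiny/customer" in sandbox_ddl
assert "example/main" not in sandbox_ddl  # no stale references to main
```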

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE TABLE minio.tiny_sandbox.customer (
   custkey bigint,
   name varchar(25),
   address varchar(40),
   nationkey bigint,
   phone varchar(15),
   acctbal double,
   mktsegment varchar(10),
   comment varchar(117)
)
WITH (
   external_location = &apos;s3a://example/sandbox/tiny/customer&apos;,
   format = &apos;ORC&apos;
);

CREATE TABLE minio.tiny_sandbox.orders (
   orderkey bigint,
   custkey bigint,
   orderstatus varchar(1),
   totalprice double,
   orderdate date,
   orderpriority varchar(15),
   clerk varchar(15),
   shippriority integer,
   comment varchar(79)
)
WITH (
   external_location = &apos;s3a://example/sandbox/tiny/orders&apos;,
   format = &apos;ORC&apos;
);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once these table definitions exist, go ahead and run the same query as before,
but update using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny_sandbox&lt;/code&gt; schema instead of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny&lt;/code&gt; schema.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT ORDERKEY, ORDERDATE, SHIPPRIORITY
FROM minio.tiny_sandbox.customer c, minio.tiny_sandbox.orders o
WHERE MKTSEGMENT = &apos;BUILDING&apos; AND c.CUSTKEY = o.CUSTKEY AND
ORDERDATE &amp;lt; date&apos;1995-03-15&apos;
ORDER BY ORDERDATE;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;One last bit of functionality we want to test is the merging capabilities. To
do this, create a table called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lineitem&lt;/code&gt; in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sandbox&lt;/code&gt; branch using a CTAS
statement.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE TABLE minio.tiny_sandbox.lineitem
WITH (
  format = &apos;ORC&apos;,
  external_location = &apos;s3a://example/sandbox/tiny/lineitem/&apos;
) 
AS SELECT * FROM tpch.tiny.lineitem;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Verify that you can see three table directories in LakeFS including &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lineitem&lt;/code&gt; 
in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sandbox&lt;/code&gt; branch.
&lt;a href=&quot;http://localhost:8000/repositories/example/objects?ref=sandbox&amp;amp;path=tiny%2F&quot;&gt;http://localhost:8000/repositories/example/objects?ref=sandbox&amp;amp;path=tiny%2F&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Verify that you do not see &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lineitem&lt;/code&gt; in the table directories in LakeFS in the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main&lt;/code&gt; branch.
&lt;a href=&quot;http://localhost:8000/repositories/example/objects?ref=main&amp;amp;path=tiny%2F&quot;&gt;http://localhost:8000/repositories/example/objects?ref=main&amp;amp;path=tiny%2F&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can also verify this by running queries against &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lineitem&lt;/code&gt; in the tables
pointing to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sandbox&lt;/code&gt; branch that should fail on the tables pointing to the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main&lt;/code&gt; branch.&lt;/p&gt;

&lt;p&gt;To merge the new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lineitem&lt;/code&gt; table into the main branch, first commit 
the new change to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sandbox&lt;/code&gt; by again going to the &lt;strong&gt;Unversioned Changes&lt;/strong&gt; tab. 
Click &lt;strong&gt;Commit Changes&lt;/strong&gt;. Type a commit message on the popup and click 
&lt;strong&gt;Commit Changes&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Once the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lineitem&lt;/code&gt; addition is committed, click on the &lt;strong&gt;Compare&lt;/strong&gt; tab. Set the
base branch to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main&lt;/code&gt; and the compared branch to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sandbox&lt;/code&gt;. You should see
the addition of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lineitem&lt;/code&gt; show up in the diff view. Click &lt;strong&gt;Merge&lt;/strong&gt; and then click
&lt;strong&gt;Yes&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Once this is merged you should see the table data show up in LakeFS. Verify that
you can see &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lineitem&lt;/code&gt; in the table directories in LakeFS in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main&lt;/code&gt; branch.
&lt;a href=&quot;http://localhost:8000/repositories/example/objects?ref=main&amp;amp;path=tiny%2F&quot;&gt;http://localhost:8000/repositories/example/objects?ref=main&amp;amp;path=tiny%2F&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As before, we won’t be able to query this data from Trino until we run the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW CREATE TABLE&lt;/code&gt; from the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny_sandbox&lt;/code&gt; schema and use the output to create
the table in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny&lt;/code&gt; schema that is pointing to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main&lt;/code&gt;.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-8762-add-query-error-info-to-cluster-overview-page-in-web-ui&quot;&gt;PR of the week: PR 8762 Add query error info to cluster overview page in web UI&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;https://github.com/trinodb/trino/pull/8762&quot;&gt;PR of the week&lt;/a&gt; adds some 
really useful context around query failures in the Trino Web UI. This PR was
created by &lt;a href=&quot;https://github.com/posulliv&quot;&gt;Pádraig O’Sullivan&lt;/a&gt;. For many, it can
be frustrating when a query fails and you have to do a lot of digging before you
even understand the type of error that is happening. This PR gives a better
highlight of what failed, so you don’t have to do a lot of investigation 
upfront to get a sense of what is happening and where to look next.&lt;/p&gt;

&lt;p&gt;Thank you so much Pádraig!&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-week-why-are-deletes-so-limited-in-trino&quot;&gt;Question of the week: Why are deletes so limited in Trino?&lt;/h2&gt;

&lt;p&gt;Our &lt;a href=&quot;https://trinodb.slack.com/archives/CGB0QHWSW/p1632775855390300&quot;&gt;question of the week&lt;/a&gt;
comes from Marius Grama on our Trino community Slack. Marius created the 
&lt;a href=&quot;https://github.com/findinpath/dbt-trino-incremental-hive&quot;&gt;dbt-trino&lt;/a&gt; adapter 
and wants to implement &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT OVERWRITE&lt;/code&gt; functionality.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT OVERWRITE&lt;/code&gt; checks whether there are entries in the target table that 
exist as well in the staging table, and it first deletes the target entries, 
before inserting the staging entries. Unfortunately, the delete didn’t work for
RDBMS, Hive, or Iceberg. His question is whether this is a limitation of Trino for 
all connectors, and how we can approach the “delete” part of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT OVERWRITE&lt;/code&gt;.&lt;/p&gt;
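&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT OVERWRITE&lt;/code&gt; semantics described above can be sketched in plain
Python (illustrative only, not Trino internals):&lt;/p&gt;

```python
# Rows in the target whose keys also appear in staging are deleted first,
# then all staging rows are inserted.
def insert_overwrite(target, staging, key):
    staged_keys = {row[key] for row in staging}
    kept = [row for row in target if row[key] not in staged_keys]
    return kept + list(staging)

target = [{"id": 1, "v": "old"}, {"id": 2, "v": "keep"}]
staging = [{"id": 1, "v": "new"}, {"id": 3, "v": "added"}]

result = insert_overwrite(target, staging, "id")
assert {"id": 1, "v": "new"} in result      # overwritten
assert {"id": 2, "v": "keep"} in result     # untouched
assert {"id": 3, "v": "added"} in result    # inserted
assert {"id": 1, "v": "old"} not in result  # deleted before insert
```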

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs and Resources&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://lakefs.io/hive-metastore-why-its-still-here-and-what-can-replace-it/&quot;&gt;Hive Metastore - Why its still here and what can replace it&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://lakefs.io/hive-metastore-it-didnt-age-well/&quot;&gt;Hive Metastore - It didn’t age well&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://lakefs.io/hudi-iceberg-and-delta-lake-data-lake-table-formats-compared/&quot;&gt;Hudi, Iceberg, Delta Lake Table Formats Compared&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://lakefs.io/the-docker-everything-bagel-spin-up-a-local-data-stack/&quot;&gt;The Docker Everything Bagel&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests</summary>

      
      
    </entry>
  
    <entry>
      <title>26: Trino discovers data catalogs with Amundsen</title>
      <link href="https://trino.io/episodes/26.html" rel="alternate" type="text/html" title="26: Trino discovers data catalogs with Amundsen" />
      <published>2021-09-16T00:00:00+00:00</published>
      <updated>2021-09-16T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/26</id>
      <content type="html" xml:base="https://trino.io/episodes/26.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;Mark Grover, Co-creator of Amundsen and Founder at &lt;a href=&quot;https://www.stemma.ai/&quot;&gt;Stemma&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/mark_grover&quot;&gt;@mark_grover&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;release-362&quot;&gt;Release 362&lt;/h2&gt;

&lt;p&gt;Official announcement items from Martin are not yet available since the release is 
not out… but soon.&lt;/p&gt;

&lt;p&gt;Manfred’s notes:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Add new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;listagg&lt;/code&gt; function contributed by Marius&lt;/li&gt;
  &lt;li&gt;Join performance and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DISTINCT&lt;/code&gt; performance improvements&lt;/li&gt;
  &lt;li&gt;SQL security related changes in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER SCHEMA&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Add &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IN table&lt;/code&gt; for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE&lt;/code&gt;/&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DROP&lt;/code&gt;/… &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROLE&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Whole bunch of improvements in the BigQuery connector&lt;/li&gt;
  &lt;li&gt;Numerous improvements for Parquet file usage in Hive connector&lt;/li&gt;
  &lt;li&gt;All connector docs now have a SQL support section&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-week-data-discovery-and-amundsen&quot;&gt;Concept of the week: Data discovery and Amundsen&lt;/h2&gt;

&lt;p&gt;Data discovery is a process that aids in the analysis of data where siloed data 
has been centralized, and it is difficult to find data or overlap between
disparate data sets. Many teams have their own view of the world when it comes 
to the data they need, but they commonly need to reason about how their data 
relates to data outside of their domain.&lt;/p&gt;

&lt;p&gt;There are typically questions about who owns what data to help identify 
individuals responsible for maintaining the standards. Additionally, there are 
also issues around providing documentation around the data, and to identify who 
to call for help if there are issues using the data. This allows analysts to 
discover patterns in the data, and periodically audit the data storage 
practices. Interesting questions also arise around existing policies, which can 
encourage a system of record that acts as a shared front end around data 
policies.&lt;/p&gt;

&lt;h3 id=&quot;what-is-amundsen&quot;&gt;What is Amundsen?&lt;/h3&gt;

&lt;p&gt;Amundsen provides data discovery by using ETL processes to scrape metadata from
all of the data sources. It creates a central location to collect all that 
metadata and enables search and other analytics of this metadata. Here’s how the
project describes itself on &lt;a href=&quot;https://www.amundsen.io/amundsen/&quot;&gt;the Amundsen website&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Amundsen is a data discovery and metadata engine for improving the 
productivity of data analysts, data scientists and engineers when interacting
with data. It does that today by indexing data resources (tables, dashboards,
streams, etc.) and powering a page-rank style search based on usage patterns 
(e.g. highly queried tables show up earlier than less queried tables).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Amundsen has an architecture that interacts primarily with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;information_schema&lt;/code&gt;
tables, among other metadata, depending on the data source. In Trino’s case, 
&lt;a href=&quot;https://github.com/amundsen-io/amundsen/blob/main/databuilder/databuilder/extractor/presto_view_metadata_extractor.py&quot;&gt;the extractor used&lt;/a&gt; 
connects directly to the Hive metastore database, for Trino views, since 
they’re stored there. Physical tables use the &lt;a href=&quot;https://github.com/amundsen-io/amundsen/blob/main/databuilder/databuilder/extractor/hive_table_metadata_extractor.py&quot;&gt;HiveTableMetadataExtractor&lt;/a&gt;
to load these tables into Amundsen. This makes sense since the data is stored in
the Hive table format. For non-Hive use cases, you generally want to bypass
using Trino (for now) and directly connect Amundsen to each data source.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/26/amundsen-architecture.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;Amundsen includes an ETL framework called &lt;a href=&quot;https://www.amundsen.io/amundsen/databuilder/&quot;&gt;databuilder&lt;/a&gt;
that runs multiple jobs. Jobs contain an ETL task to extract the metadata and 
load it into the two databases that are central to Amundsen, Neo4j and 
Elasticsearch. Neo4j stores the core metadata that is represented on the UI. 
Elasticsearch enables search over the many fields in the metadata. Ingestion via
ETL follows the following steps:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Ingest base data to Neo4j.&lt;/li&gt;
  &lt;li&gt;Ingest additional data and decorate Neo4j over base data.&lt;/li&gt;
  &lt;li&gt;Update Elasticsearch index using Neo4j data.&lt;/li&gt;
  &lt;li&gt;Remove stale data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each job contains an ETL task. The task must define an extractor and a loader, 
and optionally a translator. You can see example configurations for different
extractors on the website, like the &lt;a href=&quot;https://www.amundsen.io/amundsen/databuilder/#hivetablemetadataextractor&quot;&gt;example for the HiveTableMetadataExtractor&lt;/a&gt;.&lt;/p&gt;
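&lt;p&gt;As a rough illustration of the extractor/loader/transformer wiring described
above, here is a minimal extract-transform-load task in plain Python. The class
and function names are illustrative only, not the real databuilder API:&lt;/p&gt;

```python
# Minimal sketch of an ETL task in the spirit of an Amundsen databuilder
# job: an extractor emits records one at a time, an optional transformer
# decorates them, and a loader persists them. Names here are made up.

class StaticExtractor:
    def __init__(self, records):
        self._records = iter(records)

    def extract(self):
        # Return the next record, or None when the stream is exhausted
        return next(self._records, None)

class ListLoader:
    def __init__(self):
        self.loaded = []

    def load(self, record):
        self.loaded.append(record)

def run_task(extractor, loader, transformer=None):
    while (record := extractor.extract()) is not None:
        if transformer:
            record = transformer(record)
        loader.load(record)

extractor = StaticExtractor([{"table": "customer"}, {"table": "orders"}])
loader = ListLoader()
run_task(extractor, loader, transformer=lambda r: {**r, "source": "hive"})
```

&lt;p&gt;In real databuilder jobs the loader targets Neo4j or Elasticsearch rather
than an in-memory list, but the extract-transform-load loop has this shape.&lt;/p&gt;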

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/26/amundsen-job.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;The metadata is modeled using a graph representation in neo4j and optionally
&lt;a href=&quot;https://atlas.apache.org/#/&quot;&gt;Apache Atlas&lt;/a&gt; to model advanced concepts, such as,
lineage and other relations.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/26/amundsen-metadata.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;You can learn more about the &lt;a href=&quot;https://www.amundsen.io/amundsen/databuilder/docs/models/&quot;&gt;models in the metadata here&lt;/a&gt;.&lt;/p&gt;

&lt;h4 id=&quot;amundsen-resources&quot;&gt;Amundsen resources&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;Docs: &lt;a href=&quot;https://www.amundsen.io/amundsen/&quot;&gt;https://www.amundsen.io/amundsen/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;GitHub: &lt;a href=&quot;https://github.com/amundsen-io/amundsen&quot;&gt;https://github.com/amundsen-io/amundsen&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;YouTube: &lt;a href=&quot;https://www.youtube.com/playlist?list=PL0UJdxehTNlKnGU_h7k2fzJyvAiufeh1U&quot;&gt;https://www.youtube.com/playlist?list=PL0UJdxehTNlKnGU_h7k2fzJyvAiufeh1U&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Slack: &lt;a href=&quot;https://join.slack.com/t/amundsenworkspace/shared_invite/enQtNTk2ODQ1NDU1NDI0LTc3MzQyZmM0ZGFjNzg5MzY1MzJlZTg4YjQ4YTU0ZmMxYWU2MmVlMzhhY2MzMTc1MDg0MzRjNTA4MzRkMGE0Nzk&quot;&gt;Join&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;amundsen-as-a-subcomponent-to-data-mesh&quot;&gt;Amundsen as a subcomponent to data mesh&lt;/h3&gt;

&lt;p&gt;A new architecture, philosophy, and yes, &lt;a href=&quot;https://www.merriam-webster.com/dictionary/buzzword&quot;&gt;buzzword&lt;/a&gt; 
that is gaining momentum is the &lt;em&gt;data mesh&lt;/em&gt;. While it is certainly not yet 
concretely defined and is still in the research and development phase, data mesh is
gaining a lot of attention as a potential alternative to data lakes and data 
warehouses for analytics solutions.&lt;/p&gt;

&lt;p&gt;Data mesh mirrors the philosophy of microservice architecture. It argues that 
data should be defined and maintained by teams responsible for their business 
domain similar to how the responsibility is delegated at the service layer. 
Since not everyone is going to be a data engineer on the domain team, there must
be some consideration for the architecture of such a platform. The author of 
this paradigm, Zhamak Dehghani, lays out four principles that characterize a data 
mesh. They are listed below, with the systems that provide some or all of a 
solution for each principle in parentheses.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Domain-oriented decentralized data ownership and architecture (Trino &amp;amp; Amundsen)&lt;/li&gt;
  &lt;li&gt;Data as a product (Amundsen)&lt;/li&gt;
  &lt;li&gt;Self-serve data infrastructure as a platform (Trino)&lt;/li&gt;
  &lt;li&gt;Federated computational governance (Amundsen to some extent)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;stemma&quot;&gt;Stemma&lt;/h3&gt;

&lt;p&gt;Like with many successful open source projects, there are enterprise products 
that build on and support the open source project. &lt;a href=&quot;https://www.stemma.ai/&quot;&gt;Stemma&lt;/a&gt; 
is the enterprise company that supports Amundsen. It’s founded by Mark and 
others central to the open source project.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-index-trino-views&quot;&gt;PR of the week: Index Trino views&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;https://github.com/amundsen-io/amundsen/commit/4cfc55d311ca7bc9b02df26ece3b4bde5eedecd6#diff-1c6e94c4ea77e16625f97d4e029f5611d3f3b10d428ab6038edc0b931df4243c&quot;&gt;PR (or should we say commit) of the week&lt;/a&gt; 
adds the original Trino extractor. As mentioned above, this extractor is only
needed for views, as the physical tables exist in Hive and are retrieved with the 
HiveTableMetadataExtractor.&lt;/p&gt;

&lt;h3 id=&quot;call-to-contribute-to-amundsen&quot;&gt;Call to contribute to Amundsen&lt;/h3&gt;

&lt;p&gt;If you want to help out, you can consider adding the Trino image similar to 
&lt;a href=&quot;https://github.com/amundsen-io/amundsenfrontendlibrary/commit/4e24bfe1c1cd3c6cf568ee1b3e39580686fafbe6&quot;&gt;this commit completed a while back&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;demo-extracting-metadata-from-hive-metastore-and-loading-it-into-amundsen&quot;&gt;Demo: Extracting metadata from Hive metastore and loading it into Amundsen&lt;/h2&gt;

&lt;p&gt;There were technical difficulties on the day of broadcasting the show, so the
demo was moved to its own separate video.&lt;/p&gt;

&lt;div class=&quot;youtube-video-container&quot;&gt;
  &lt;iframe width=&quot;702&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/m-mL00FkWd0&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;The steps in this demo are adapted from the &lt;a href=&quot;https://www.amundsen.io/amundsen/installation/&quot;&gt;Amundsen installation page&lt;/a&gt;.
Clone the trino-getting-started repository and navigate to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino-getting-started/community_tutorials/amundsen&lt;/code&gt; 
directory. For this demo you need at least 3GB of memory allocated to your 
Docker application.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone git@github.com:bitsondatadev/trino-getting-started.git

cd community_tutorials/amundsen

docker-compose up -d
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once all the services are running, clone the Amundsen repository in a separate
terminal. Then navigate to the databuilder folder and install all the 
dependencies:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone --recursive https://github.com/amundsen-io/amundsen.git
cd amundsen/databuilder
python3 -m venv venv
source venv/bin/activate
pip3 install --upgrade pip
pip3 install -r requirements.txt
python3 setup.py install
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Navigate to MinIO at &lt;a href=&quot;http://localhost:9000&quot;&gt;http://localhost:9000&lt;/a&gt; to create the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny&lt;/code&gt; bucket for the
schema in Trino to map to. In Trino, create a schema and a couple tables in the 
existing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;minio&lt;/code&gt; catalog:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE SCHEMA minio.tiny
WITH (location = &apos;s3a://tiny/&apos;);

CREATE TABLE minio.tiny.customer
WITH (
  format = &apos;ORC&apos;,
  external_location = &apos;s3a://tiny/customer/&apos;
) 
AS SELECT * FROM tpch.tiny.customer;

CREATE TABLE minio.tiny.orders
WITH (
  format = &apos;ORC&apos;,
  external_location = &apos;s3a://tiny/orders/&apos;
) 
AS SELECT * FROM tpch.tiny.orders;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Navigate back to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino-getting-started/community_tutorials/amundsen&lt;/code&gt; directory in the same 
Python virtual environment you just opened.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;cd trino-getting-started/community_tutorials/amundsen
python3 assets/scripts/sample_trino_data_loader.py
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;View the Amundsen UI at &lt;a href=&quot;http://localhost:5000&quot;&gt;http://localhost:5000&lt;/a&gt; and try a search; it 
should return the tables you just created.&lt;/p&gt;

&lt;p&gt;You can verify dummy data has been ingested into Neo4j by visiting &lt;a href=&quot;http://localhost:7474/browser/&quot;&gt;http://localhost:7474/browser/&lt;/a&gt;.
Log in as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;neo4j&lt;/code&gt; with the password &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;test&lt;/code&gt; and run 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH (n:Table) RETURN n LIMIT 25&lt;/code&gt; in the query box. You should see a few tables.&lt;/p&gt;

&lt;p&gt;If you have any issues, look at some of the &lt;a href=&quot;https://www.amundsen.io/amundsen/installation/#troubleshooting&quot;&gt;troubleshooting steps&lt;/a&gt;
in the Amundsen installation page.&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-week-can-i-add-a-udf-without-restarting-trino&quot;&gt;Question of the week: Can I add a UDF without restarting Trino?&lt;/h2&gt;

&lt;p&gt;This week’s question of the week comes in from Chen Xuying on the Trino Slack.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Is there any way to register &lt;a href=&quot;https://trino.io/docs/current/develop/functions.html&quot;&gt;a new user defined function (UDF)&lt;/a&gt; 
and needn’t restart coordinator and worker?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Currently, no. In Java, JAR files and all the Java code are loaded at start 
time, so in order to load the files on all the worker nodes and the coordinator, you
need to restart. There are various ways UDFs could be implemented in a dynamic
way, so we are still looking for suggestions here.&lt;/p&gt;

&lt;p&gt;One option, as Manfred mentions, would be to load JavaScript as a UDF, since Java
allows compiling JavaScript. This would allow new functions to be added 
without a restart. There may be other ways to achieve this, and we invite you to
contribute your ideas!&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://martinfowler.com/articles/data-mesh-principles.html&quot;&gt;Data Mesh Principles&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://blog.starburst.io/trino-data-governance-and-accelerating-data-science&quot;&gt;Trino, Data Governance, and Accelerating Data Science&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests Mark Grover, Co-creator of Amundsen and Founder at Stemma (@mark_grover). Release 362</summary>

      
      
    </entry>
  
    <entry>
      <title>25: Trino going through changes</title>
      <link href="https://trino.io/episodes/25.html" rel="alternate" type="text/html" title="25: Trino going through changes" />
      <published>2021-09-02T00:00:00+00:00</published>
      <updated>2021-09-02T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/25</id>
      <content type="html" xml:base="https://trino.io/episodes/25.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Ayush Chauhan, Data Platform Engineer at &lt;a href=&quot;https://www.zomato.com/who-we-are&quot;&gt;Zomato&lt;/a&gt;
 (&lt;a href=&quot;https://www.linkedin.com/in/ayush-chauhan/&quot;&gt;Ayush Chauhan&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Gunnar Morling, Lead of Debezium and Open source software engineer at &lt;a href=&quot;https://www.redhat.com&quot;&gt;Red Hat&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/gunnarmorling&quot;&gt;@gunnarmorling&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Ashhar Hasan, Software Engineer at &lt;a href=&quot;https://starburst.io/&quot;&gt;Starburst&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/hashhar&quot;&gt;@hashhar&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;release-361&quot;&gt;Release 361&lt;/h2&gt;

&lt;p&gt;Official announcement items from Martin:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for OAuth2/OIDC opaque access tokens&lt;/li&gt;
  &lt;li&gt;Aggregation pushdown for Pinot&lt;/li&gt;
  &lt;li&gt;Better performance for Parquet files with column indexes&lt;/li&gt;
  &lt;li&gt;Support for reading fields as JSON values in Elasticsearch&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred’s additional notes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Predicate pushdown in Cassandra&lt;/li&gt;
  &lt;li&gt;Metadata cache size limitation in a few connectors&lt;/li&gt;
  &lt;li&gt;Lots of improvements for Hive view support&lt;/li&gt;
  &lt;li&gt;Glue table statistics improvements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More info at &lt;a href=&quot;https://trino.io/docs/current/release/release-361.html&quot;&gt;https://trino.io/docs/current/release/release-361.html&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-week-change-data-capture&quot;&gt;Concept of the week: Change Data Capture&lt;/h2&gt;

&lt;p&gt;If you know Trino, you know it allows for flexible architectures that include 
many systems with varying use cases they support. We’ve come to accept this 
potpourri of systems as a general modus operandi for most businesses.&lt;/p&gt;

&lt;p&gt;Many times the data gets copied to different systems to accomplish varying use 
cases from performance and data warehousing to merge cross cutting data into a 
single store. When copying data between systems, how do these systems stay in 
sync? It’s a critical need especially for Trino to know that the state across 
the data sources we query is valid.&lt;/p&gt;

&lt;p&gt;To answer this, we can use the concept of Change Data Capture (CDC). CDC is a 
powerful concept that considers one or more data sources, called systems of record, 
that store the true state of a system. The systems of record are monitored for
changes, and upon detecting changes, the CDC system propagates the changes to a 
number of target systems.&lt;/p&gt;
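&lt;p&gt;The propagation loop can be sketched in a few lines of Python. This is a toy
polling model with made-up record shapes; real CDC tools like Debezium read the
database’s transaction log rather than polling a list of changes:&lt;/p&gt;

```python
# Toy change-data-capture loop: read each change from a system of
# record's change log and apply it to every registered target system.

changelog = [
    {"op": "insert", "id": 1, "value": "a"},
    {"op": "insert", "id": 2, "value": "b"},
    {"op": "update", "id": 1, "value": "c"},
    {"op": "delete", "id": 2},
]

def apply_change(store, change):
    if change["op"] == "delete":
        store.pop(change["id"], None)
    else:  # insert or update both upsert the value
        store[change["id"]] = change["value"]

targets = [{}, {}]  # e.g. a warehouse table and a search index
for change in changelog:
    for target in targets:
        apply_change(target, change)
```

&lt;p&gt;After replaying the log, every target holds the same state as the system of
record, which is exactly the property Trino relies on when querying across
those data sources.&lt;/p&gt;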

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/25/cdc.png&quot; /&gt;&lt;br /&gt;
Change Data Capture: &lt;a href=&quot;https://medium.com/event-driven-utopia/a-gentle-introduction-to-event-driven-change-data-capture-683297625f9b&quot;&gt;Source&lt;/a&gt;.
&lt;/p&gt;

&lt;h3 id=&quot;debezium-for-cdc&quot;&gt;Debezium for CDC&lt;/h3&gt;

&lt;p&gt;One implementation of CDC that has grown tremendously in popularity since its 
inception is called Debezium. According to &lt;a href=&quot;https://debezium.io&quot;&gt;https://debezium.io&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Debezium is an open-source distributed platform for change data capture. Start
it up, point it at your databases, and your apps can start responding to all 
of the inserts, updates, and deletes that other apps commit to your databases.
Debezium is durable and fast, so your apps can respond quickly and never miss
an event, even when things go wrong.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The common way Debezium is deployed in the wild is using &lt;a href=&quot;https://docs.confluent.io/platform/current/connect/index.html&quot;&gt;Kafka Connect&lt;/a&gt; 
and defining the Debezium source connectors. You can then use the Kafka Connect 
ecosystem to write to different targets downstream.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/25/debezium-architecture.png&quot; /&gt;&lt;br /&gt;
The Debezium architecture with Kafka Connect: &lt;a href=&quot;https://debezium.io/documentation/reference/architecture.html&quot;&gt;Source&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;Another alternative, if you don’t want to use Kafka, is to use dedicated Debezium
servers to implement CDC and push the logs to the target database downstream 
using Debezium connectors.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/25/debezium-server-architecture.png&quot; /&gt;&lt;br /&gt;
The Debezium standalone server architecture: &lt;a href=&quot;https://debezium.io/documentation/reference/architecture.html&quot;&gt;Source&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;While CDC is the primary focus, Debezium also provides support for more advanced
concepts such as the &lt;a href=&quot;https://debezium.io/documentation/reference/integrations/outbox.html&quot;&gt;outbox pattern support for Quarkus apps&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;debezium--trino-at-zomato&quot;&gt;Debezium + Trino at Zomato&lt;/h3&gt;

&lt;p&gt;Zomato is a technology platform that connects customers, restaurant partners and
delivery partners, serving their multiple needs. Customers use their platform to
search and discover restaurants, read and write customer generated reviews and 
view and upload photos, order food delivery, book a table and make payments 
while dining out at restaurants. Clearly, there’s a lot of data that can flow
through a platform like this. You have operational databases to support
the applications in this platform, but also need big data stores to store and
analyze all of this data.&lt;/p&gt;

&lt;p&gt;Here is one of the earlier iterations of Zomato’s big data architecture before
they were able to integrate Debezium. Ayush covers some of the pain points they
experienced before implementing CDC.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/25/zomato-before.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;Once Zomato implemented CDC, they were able to keep their downstream Iceberg 
stores in sync across multiple operational systems. As a result the analytics 
data is now much more dependable.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/25/zomato-after.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-4140-implement-aggregation-pushdown-in-pinot&quot;&gt;PR of the week: PR 4140 Implement aggregation pushdown in Pinot&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;https://github.com/trinodb/trino/pull/6069&quot;&gt;PR of the week&lt;/a&gt; is actually a
throwback to &lt;a href=&quot;/episodes/13.html&quot;&gt;episode thirteen&lt;/a&gt;, &lt;em&gt;Trino takes a sip of Pinot&lt;/em&gt;,
where our guest &lt;a href=&quot;https://twitter.com/ElonAzoulay&quot;&gt;Elon Azoulay&lt;/a&gt; discussed some of
the features coming to the Pinot connector. Aggregation pushdown
was on that list, and it just landed in the 361 release!&lt;/p&gt;

&lt;p&gt;This PR implements aggregation pushdown for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;COUNT&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AVG&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MIN&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MAX&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SUM&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;COUNT(DISTINCT)&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;approx_distinct&lt;/code&gt;. It is enabled by default and can be 
disabled using the configuration property &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pinot.aggregation-pushdown.enabled&lt;/code&gt; 
or the catalog session property &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aggregation_pushdown_enabled&lt;/code&gt;.&lt;/p&gt;
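&lt;p&gt;For example, to turn the pushdown off you can set the configuration property
in the catalog’s properties file, or set the session property per query. The
file path and the catalog name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pinot&lt;/code&gt; here are assumptions for
illustration:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# etc/catalog/pinot.properties
pinot.aggregation-pushdown.enabled=false

# or, for the current session only:
SET SESSION pinot.aggregation_pushdown_enabled = false;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;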

&lt;p&gt;FYI: &lt;a href=&quot;https://github.com/trinodb/trino/pull/9208&quot;&gt;https://github.com/trinodb/trino/pull/9208&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Thanks Elon!&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-week-is-there-an-array-function-that-flattens-a-row-like-1--a-b-c-into-three-rows&quot;&gt;Question of the week: Is there an array function that flattens a row like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1 | [a, b, c]&lt;/code&gt; into three rows?&lt;/h2&gt;

&lt;p&gt;Our &lt;a href=&quot;https://trinodb.slack.com/archives/CFLB9AMBN/p1630241736052500&quot;&gt;question of the week&lt;/a&gt;
comes from Brian Hudson on our Trino community Slack. Brian is dealing with an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ARRAY&lt;/code&gt;
type in one column and an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INTEGER&lt;/code&gt; in another. This is common when 
processing nested denormalized data. The goal is to take this row, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1 | [a, b, c]&lt;/code&gt;,
and split the array into three rows:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;1 | a
1 | b
1 | c
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Kasia answered this question by using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNNEST&lt;/code&gt; on the array column. The
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNNEST&lt;/code&gt; statement produces a single column with one row per array element, and a 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JOIN&lt;/code&gt; is performed with the original &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INTEGER&lt;/code&gt; column.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;
WITH t(x, y) AS (VALUES (1, ARRAY[&apos;a&apos;, &apos;b&apos;, &apos;c&apos;]))
SELECT x, y_unnested
FROM t
LEFT JOIN UNNEST (t.y) t2(y_unnested) ON true;

trino&amp;gt; WITH t(x, y) AS (VALUES (1, ARRAY[&apos;a&apos;, &apos;b&apos;, &apos;c&apos;]))
     -&amp;gt; SELECT x, y_unnested
     -&amp;gt; FROM t
     -&amp;gt; LEFT JOIN UNNEST (t.y) t2(y_unnested) ON true;
 x | y_unnested
---+------------
 1 | a
 1 | b
 1 | c
(3 rows)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs and Resources&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/event-driven-utopia/a-gentle-introduction-to-event-driven-change-data-capture-683297625f9b&quot;&gt;A gentle introduction to Event Driven Change Data Capture&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/event-driven-utopia/a-visual-introduction-to-debezium-32563e23c6b8&quot;&gt;A Visual Introduction to Debezium&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://debezium.io/blog/&quot;&gt;Debezium Blog&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://debezium.io/documentation/reference/&quot;&gt;Debezium Docs&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/debezium/debezium-examples/&quot;&gt;Debezium Examples&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://debezium.io/documentation/online-resources/&quot;&gt;Debezium Resources&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Videos&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.infoq.com/presentations/data-streaming-kafka-debezium/&quot;&gt;Practical Change Data Streaming Use Cases with Apache Kafka &amp;amp; Debezium&lt;/a&gt;
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://speakerdeck.com/gunnarmorling/practical-change-data-streaming-use-cases-with-apache-kafka-and-debezium-qcon-san-francisco-2019&quot;&gt;Slides&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=QYbXDp4Vu-8&quot;&gt;Apache Kafka and Debezium / DevNation Tech Talk&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests</summary>

      
      
    </entry>
  
    <entry>
      <title>24: Trinetes I: Trino on Kubernetes</title>
      <link href="https://trino.io/episodes/24.html" rel="alternate" type="text/html" title="24: Trinetes I: Trino on Kubernetes" />
      <published>2021-08-19T00:00:00+00:00</published>
      <updated>2021-08-19T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/24</id>
      <content type="html" xml:base="https://trino.io/episodes/24.html">&lt;p&gt;This is the first episode in a series where we cover the basics and just enough
advanced Kubernetes features and information to understand how to deploy Trino 
on Kubernetes.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-week-k8s-architecture-containers-pods-and-kubelets&quot;&gt;Concept of the week: K8s architecture: Containers, Pods, and kubelets&lt;/h2&gt;

&lt;p&gt;For this concept of the week, we want to provide you a minimalistic overview of
what you need to know about Kubernetes to deploy Trino to a cluster.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Why Kubernetes?&lt;/strong&gt; Kubernetes is a container orchestration platform that allows
you to declare how containers are managed using YAML configuration files. This
definition can be tricky to understand without the proper context. To make sure
nobody is left behind, it is useful to cover what containers are:&lt;/p&gt;

    &lt;ul&gt;
      &lt;li&gt;
        &lt;p&gt;The traditional way to deploy an application is to take the compiled 
binary of that application and run it directly on computer hardware that has
an operating system. This works, but it depends heavily on the underlying
hardware and operating system and requires multiple applications to share the
same resources. If one application fails and causes any of the shared
resources to crash, it can bring down every application on that machine.&lt;/p&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;To remove these dependencies, engineers created virtual machines (VMs), 
using a VM manager called a hypervisor that emulates hardware environments 
to host other operating systems. This is a big step forward, as each 
application can now be isolated, but it comes at a great cost. Each virtual
machine hosts an entire operating system, making it resource intensive and slow.&lt;/p&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;Containers are the newest type of deployment. Containers enable a logical
isolation of resources while still physically running on shared resources. 
All resources created in the hardware and operating systems exist on the host
system. The isolation restricts any interference from other processes. 
Containers achieve the goals of virtualization without sacrificing much 
performance or efficiency.&lt;/p&gt;
      &lt;/li&gt;
    &lt;/ul&gt;

    &lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/24/container-evolution.svg&quot; /&gt;&lt;br /&gt;
 Source: https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/
&lt;/p&gt;

    &lt;ul&gt;
      &lt;li&gt;Containerization simplified a trend in service oriented architecture called 
microservices. Microservices deploy loosely coupled and modular applications
rather than all-encompassing monolithic applications. With containers, these
applications can be deployed and scaled up quickly across various virtual and
physical machines without affecting other applications on the same machine. 
This is great, but results in new complexities. Some examples are the need 
for new approaches to monitoring the health of applications, scaling the 
applications as requests grow and diminish, redeploying crashed applications, 
and networking the applications together. In summary, all of these activities
can be considered container orchestration and this is exactly what Kubernetes
solves!&lt;/li&gt;
    &lt;/ul&gt;

    &lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/24/load-balancer.jpeg&quot; /&gt;&lt;br /&gt;
 Source: https://www.slideshare.net/devopsdaysaustin/continuously-delivering-microservices-in-kubernetes-using-jenkins&lt;br /&gt;
 Here we have two services that each sit behind a load balancer provided and mapped by the Kubernetes cluster.
&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Kubernetes components and architecture&lt;/strong&gt;:&lt;/p&gt;

    &lt;ul&gt;
      &lt;li&gt;Node - The physical machine or VM running a kubelet and container runtime.&lt;/li&gt;
      &lt;li&gt;Control Plane - The container orchestration layer that exposes the API and 
interfaces to define, deploy, and manage the lifecycle of containers.&lt;/li&gt;
      &lt;li&gt;Cluster - A set of nodes connected to the same control plane.&lt;/li&gt;
      &lt;li&gt;Pod - A single instance of an application; the smallest object in Kubernetes.&lt;/li&gt;
    &lt;/ul&gt;

    &lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/24/components-of-kubernetes.svg&quot; /&gt;&lt;br /&gt;
 Source: https://kubernetes.io/docs/concepts/overview/components/
&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;h3 id=&quot;kubernetes-control-plane-components&quot;&gt;Kubernetes control plane components:&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;API server that nodes connect to, and the front end for users and 
 administrators of the cluster.&lt;/li&gt;
  &lt;li&gt;etcd, a distributed key-value store containing all data used to manage 
 the cluster.&lt;/li&gt;
  &lt;li&gt;Scheduler that distributes work across nodes and assigns newly created 
 containers to nodes.&lt;/li&gt;
  &lt;li&gt;Controllers that are the brains behind orchestration, monitoring for 
 nodes going down, and more.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;kubernetes-worker-node-components&quot;&gt;Kubernetes worker node components:&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;container runtime - underlying runtime used to manage containers&lt;/li&gt;
  &lt;li&gt;kubelet - agent that checks the health and manages the pods running on the node based on the desired state provided in the PodSpec&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;kube-proxy - network proxy that maintains network rules applied to nodes and allows network access between Pods in a cluster&lt;/p&gt;

    &lt;p&gt;You can scale up multiple pods on a single node until the node has no more 
resources, at which time a new node needs to be added and pod instances are 
distributed between the nodes.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
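
&lt;p&gt;As a sketch of how this works in practice (assuming a hypothetical worker
deployment named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tcb-trino-worker&lt;/code&gt;), you declare a new desired
replica count and let the scheduler place the pods across the available nodes:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Scale the (hypothetical) worker deployment to three pods.
kubectl scale deployment tcb-trino-worker --replicas=3

# Watch the scheduler distribute the pods across the nodes.
kubectl get pods -o wide --watch
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;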

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;So how does this relate to Trino?&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
  &lt;li&gt;Out of the box, Kubernetes can do these key things for Trino.
    &lt;ul&gt;
      &lt;li&gt;Simple scale up and down (manually tell k8s to start or kill Trino pods).&lt;/li&gt;
      &lt;li&gt;Kubernetes supports failover, meaning that your workers will restart if they die.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Advanced features that could exist, but are not currently available in open source.
    &lt;ul&gt;
      &lt;li&gt;Auto-scaling via the &lt;a href=&quot;https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/&quot;&gt;Horizontal Pod Autoscaler&lt;/a&gt; 
 and custom metrics.&lt;/li&gt;
      &lt;li&gt;Graceful shutdown hooks that you can add to your cluster, so that 
 workers wait before shutting down and callers avoid failed calls to a node 
 that has already shut down.&lt;/li&gt;
    &lt;/ul&gt;
    &lt;p align=&quot;center&quot;&gt;
     &lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/24/kubernetes-shutdown.svg&quot; /&gt;&lt;br /&gt;
     Source: https://learnk8s.io/graceful-shutdown
  &lt;/p&gt;
    &lt;p align=&quot;center&quot;&gt;
     &lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/24/graceful-shutdown.svg&quot; /&gt;&lt;br /&gt;
     Source: https://learnk8s.io/graceful-shutdown
  &lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
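
&lt;p&gt;As an illustrative sketch only (this is not an existing open source feature),
autoscaling Trino workers via the Horizontal Pod Autoscaler could start from a
manifest like the following, assuming a hypothetical
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tcb-trino-worker&lt;/code&gt; deployment; custom metrics such as queued
queries would still need to be wired up separately:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: tcb-trino-worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: tcb-trino-worker
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;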

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;What the heck are helm charts then?&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
  &lt;li&gt;Helm is a package manager for Kubernetes.&lt;/li&gt;
  &lt;li&gt;It removes the need to manage lots of Kubernetes-related YAML files.&lt;/li&gt;
  &lt;li&gt;It is the best way to deploy apps to Kubernetes.&lt;/li&gt;
  &lt;li&gt;Charts are available for many different applications.&lt;/li&gt;
  &lt;li&gt;A Helm chart exists for Trino.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-11-merge-contributor-version-of-k8s-charts-with-the-community-version&quot;&gt;PR of the week: PR 11 Merge contributor version of k8s charts with the community version&lt;/h2&gt;

&lt;p&gt;This week’s &lt;a href=&quot;https://github.com/trinodb/charts/pull/11&quot;&gt;PR of the week&lt;/a&gt; comes 
from a different repository under the trinodb org, &lt;a href=&quot;https://github.com/trinodb/charts&quot;&gt;trinodb/charts&lt;/a&gt;.
This PR merges contributions from &lt;a href=&quot;https://github.com/valeriano-manassero&quot;&gt;Valeriano Manassero&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Valeriano maintains a &lt;a href=&quot;https://github.com/valeriano-manassero/helm-charts/tree/main/valeriano-manassero/trino&quot;&gt;very useful Helm chart&lt;/a&gt; 
that predates the Trino org defining its own community chart. This pull 
request merges some of the useful features Valeriano added to his Trino Helm 
chart, so they can be maintained in the community version.&lt;/p&gt;

&lt;p&gt;Valeriano’s Trino Helm Chart: &lt;a href=&quot;https://artifacthub.io/packages/helm/valeriano-manassero/trino&quot;&gt;https://artifacthub.io/packages/helm/valeriano-manassero/trino&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It hasn’t been merged yet, but we are really looking forward to seeing it 
merged in. Thanks Valeriano!&lt;/p&gt;

&lt;h2 id=&quot;demo-running-the-trino-charts-with-kubectl&quot;&gt;Demo: Running the Trino charts with kubectl&lt;/h2&gt;

&lt;p&gt;For this week’s demo, you need to install &lt;a href=&quot;https://kubernetes.io/docs/tasks/tools/&quot;&gt;kubectl&lt;/a&gt;,
&lt;a href=&quot;https://minikube.sigs.k8s.io/docs/start/&quot;&gt;minikube&lt;/a&gt; using the &lt;a href=&quot;https://minikube.sigs.k8s.io/docs/drivers/docker/&quot;&gt;docker driver&lt;/a&gt;,
and &lt;a href=&quot;https://helm.sh/docs/intro/install/&quot;&gt;helm&lt;/a&gt;. You can find the Trino Helm 
chart on ArtifactHub at this URL.&lt;/p&gt;

&lt;p&gt;https://artifacthub.io/packages/helm/trino/trino&lt;/p&gt;

&lt;p&gt;First, start your minikube instance.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;minikube start --driver=docker
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now take a quick look at the state of your k8s cluster.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl get all
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Add the template for the different trino catalogs on coordinators and workers.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl apply -f - &amp;lt;&amp;lt;EOF
# Source: trino/templates/configmap-catalog.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: tcb-trino-catalog
  labels:
    app: trino
    chart: trino-0.2.0
    release: tcb
    heritage: Helm
    role: catalogs
data:
  tpch.properties: |
    connector.name=tpch
    tpch.splits-per-node=4
  tpcds.properties: |
    connector.name=tpcds
    tpcds.splits-per-node=4
EOF
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Add the template for a single coordinator configuration.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl apply -f - &amp;lt;&amp;lt;EOF
# Source: trino/templates/configmap-coordinator.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: tcb-trino-coordinator
  labels:
    app: trino
    chart: trino-0.2.0
    release: tcb
    heritage: Helm
    component: coordinator
data:
  node.properties: |
    node.environment=production
    node.data-dir=/data/trino
    plugin.dir=/usr/lib/trino/plugin

  jvm.config: |
    -server
    -Xmx8G
    -XX:+UseG1GC
    -XX:G1HeapRegionSize=32M
    -XX:+UseGCOverheadLimit
    -XX:+ExplicitGCInvokesConcurrent
    -XX:+HeapDumpOnOutOfMemoryError
    -XX:+ExitOnOutOfMemoryError
    -Djdk.attach.allowAttachSelf=true
    -XX:-UseBiasedLocking
    -XX:ReservedCodeCacheSize=512M
    -XX:PerMethodRecompilationCutoff=10000
    -XX:PerBytecodeRecompilationCutoff=10000
    -Djdk.nio.maxCachedBufferSize=2000000

  config.properties: |
    coordinator=true
    node-scheduler.include-coordinator=true
    http-server.http.port=8080
    query.max-memory=4GB
    query.max-memory-per-node=1GB
    query.max-total-memory-per-node=2GB
    memory.heap-headroom-per-node=1GB
    discovery-server.enabled=true
    discovery.uri=http://localhost:8080

  log.properties: |
    io.trino=INFO
EOF
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Add the tcb-trino service definition to run Trino.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl apply -f - &amp;lt;&amp;lt;EOF
# Source: trino/templates/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: tcb-trino
  labels:
    app: trino
    chart: trino-0.2.0
    release: tcb
    heritage: Helm
spec:
  type: ClusterIP
  ports:
    - port: 8080
      targetPort: http
      protocol: TCP
      name: http
  selector:
    app: trino
    release: tcb
    component: coordinator
EOF
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Add the deployment definition for the service.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl apply -f - &amp;lt;&amp;lt;EOF
# Source: trino/templates/deployment-coordinator.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tcb-trino-coordinator
  labels:
    app: trino
    chart: trino-0.2.0
    release: tcb
    heritage: Helm
    component: coordinator
spec:
  selector:
    matchLabels:
      app: trino
      release: tcb
      component: coordinator
  template:
    metadata:
      labels:
        app: trino
        release: tcb
        component: coordinator
    spec:
      securityContext:
        runAsUser: 1000
        runAsGroup: 1000
      volumes:
        - name: config-volume
          configMap:
            name: tcb-trino-coordinator
        - name: catalog-volume
          configMap:
            name: tcb-trino-catalog
      imagePullSecrets:
        - name: registry-credentials
      containers:
        - name: trino-coordinator
          image: &quot;trinodb/trino:latest&quot;
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - mountPath: /etc/trino
              name: config-volume
            - mountPath: /etc/trino/catalog
              name: catalog-volume
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /v1/info
              port: http
          readinessProbe:
            httpGet:
              path: /v1/info
              port: http
          resources:
            {}
EOF
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now check the state of the k8s cluster again.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl get all
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Run the following command to expose the URL and port of the service on the local system.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;minikube service tcb-trino --url
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Clean up all the resources.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl delete pod --all
kubectl delete replicaset --all
kubectl delete service tcb-trino
kubectl delete deployment tcb-trino-coordinator
kubectl delete configmap --all
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now you can run the same demo using the helm chart which includes all of these
templates out-of-the-box. First add the trino helm chart, check the templates
that are produced by helm, and run the install.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# HELM DEMO

helm repo add trino https://trinodb.github.io/charts/

helm template tcb trino/trino --version 0.2.0

helm install tcb trino/trino --version 0.2.0
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now that it’s installed, run the same command to expose the URL of the service.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;minikube service tcb-trino --url
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Clean up all the resources.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;minikube delete
helm repo remove trino
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Trino Summit is moving to 100% virtual: &lt;a href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;register here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>This is the first episode in a series where we cover the basics and just enough advanced Kubernetes features and information to understand how to deploy Trino on Kubernetes.</summary>

      
      
    </entry>
  
    <entry>
      <title>23: Trino looking for patterns</title>
      <link href="https://trino.io/episodes/23.html" rel="alternate" type="text/html" title="23: Trino looking for patterns" />
      <published>2021-08-02T00:00:00+00:00</published>
      <updated>2021-08-02T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/23</id>
      <content type="html" xml:base="https://trino.io/episodes/23.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;Kasia Findeisen, Software Engineer at &lt;a href=&quot;https://starburst.io/&quot;&gt;Starburst&lt;/a&gt;
 (&lt;a href=&quot;https://github.com/kasiafi&quot;&gt;@kasiafi&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;release-360&quot;&gt;Release 360&lt;/h2&gt;

&lt;p&gt;In our last episode we already had a glimpse of this release. Now it is really out.&lt;/p&gt;

&lt;p&gt;Official announcement items from Martin:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Automatic configuration of TLS for internal communication.&lt;/li&gt;
  &lt;li&gt;Improved correlated subqueries with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Support for assuming an IAM role in Elasticsearch connector.&lt;/li&gt;
  &lt;li&gt;Support for Trino views in Iceberg connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred’s additional notes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Documentation for materialized views SQL commands&lt;/li&gt;
  &lt;li&gt;Partial support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt; and batch insert support for various JDBC-based connectors&lt;/li&gt;
  &lt;li&gt;A bunch of performance and correctness fixes&lt;/li&gt;
  &lt;li&gt;Numerous improvements on Iceberg connector&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More info at &lt;a href=&quot;https://trino.io/docs/current/release/release-360.html&quot;&gt;https://trino.io/docs/current/release/release-360.html&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-week-row-pattern-matching-and-match_recognize&quot;&gt;Concept of the week: Row pattern matching and MATCH_RECOGNIZE&lt;/h2&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; syntax was introduced in the SQL:2016
specification. It is a super powerful tool for analyzing trends in your data. We are
proud to announce that Trino supports this great feature since
&lt;a href=&quot;https://trino.io/docs/current/release/release-356.html&quot;&gt;version 356&lt;/a&gt;. With
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt;, you can define a pattern using the well-known regular
expression syntax, and match it to a set of rows. Upon finding a matching row
sequence, you can retrieve all kinds of detailed or summary information about
the match, and pass it on to be processed by the subsequent parts of your
query. This is a new level of what a pure SQL statement can do.&lt;/p&gt;

&lt;p&gt;For more details, &lt;a href=&quot;/blog/2021/05/19/row_pattern_matching.html&quot;&gt;this blog post&lt;/a&gt; 
gives you a taste of row pattern matching capabilities, and a quick overview of 
the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; syntax.&lt;/p&gt;

&lt;p&gt;Let’s look at an example with data similar to the TPC-H data, with the same 
goal as the blog post: detect a “V”-shape of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;price&lt;/code&gt;
values over time for different customers.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;trino&amp;gt; WITH orders(customer_id, order_date, price) AS (VALUES
    (&apos;cust_1&apos;, DATE &apos;2020-05-11&apos;, 100),
    (&apos;cust_1&apos;, DATE &apos;2020-05-12&apos;, 200),
    (&apos;cust_2&apos;, DATE &apos;2020-05-13&apos;,   8),
    (&apos;cust_1&apos;, DATE &apos;2020-05-14&apos;, 100),
    (&apos;cust_2&apos;, DATE &apos;2020-05-15&apos;,   4),
    (&apos;cust_1&apos;, DATE &apos;2020-05-16&apos;,  50),
    (&apos;cust_1&apos;, DATE &apos;2020-05-17&apos;, 100),
    (&apos;cust_2&apos;, DATE &apos;2020-05-18&apos;,   6))
SELECT customer_id, start_price, bottom_price, final_price, start_date, final_date
    FROM orders
        MATCH_RECOGNIZE (
            PARTITION BY customer_id
            ORDER BY order_date
            MEASURES
                START.price AS start_price,
                LAST(DOWN.price) AS bottom_price,
                LAST(UP.price) AS final_price,
                START.order_date AS start_date,
                LAST(UP.order_date) AS final_date
            ONE ROW PER MATCH
            AFTER MATCH SKIP PAST LAST ROW
            PATTERN (START DOWN+ UP+)
            DEFINE
                DOWN AS price &amp;lt; PREV(price),
                UP AS price &amp;gt; PREV(price)
            );

 customer_id | start_price | bottom_price | final_price | start_date | final_date
-------------+-------------+--------------+-------------+------------+------------
 cust_1      |         200 |           50 |         100 | 2020-05-12 | 2020-05-17
 cust_2      |           8 |            4 |           6 | 2020-05-13 | 2020-05-18
(2 rows)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Two matches are detected, one for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cust_1&lt;/code&gt;, and one for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cust_2&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The matching algorithm was a collaboration between Martin and Kasia. This 
algorithm &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/core/trino-main/src/main/java/io/trino/operator/window/matcher/Matcher.java&quot;&gt;lives in the Matcher class&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;em&gt;running semantics&lt;/em&gt; is the default in both the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DEFINE&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MEASURES&lt;/code&gt;
clauses. Note that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FINAL&lt;/code&gt; only applies to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MEASURES&lt;/code&gt; clause.&lt;/p&gt;
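
&lt;p&gt;To illustrate the difference, here is a sketch (not from the episode) that
reuses the orders data from the example above. With &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALL ROWS PER MATCH&lt;/code&gt;, a
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RUNNING&lt;/code&gt; measure reflects only the rows matched so far, while the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FINAL&lt;/code&gt; version reports the value for the completed match on every output row:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT customer_id, order_date, price, running_up, final_up
    FROM orders
        MATCH_RECOGNIZE (
            PARTITION BY customer_id
            ORDER BY order_date
            MEASURES
                RUNNING LAST(UP.price) AS running_up,
                FINAL LAST(UP.price) AS final_up
            ALL ROWS PER MATCH
            AFTER MATCH SKIP PAST LAST ROW
            PATTERN (START DOWN+ UP+)
            DEFINE
                DOWN AS price &amp;lt; PREV(price),
                UP AS price &amp;gt; PREV(price)
            );
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;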

&lt;p&gt;To sum up, here’s one complex measure expression combining different elements
of the special syntax:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/match-recognize/measure-example.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-8348-document-row-pattern-recognition-in-window&quot;&gt;PR of the week: PR 8348 Document row pattern recognition in window&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;https://github.com/trinodb/trino/pull/8348&quot;&gt;PR of the week&lt;/a&gt; adds 
documentation for applying pattern matching over windows. This is yet another
piece of SQL functionality that Kasia added after getting pattern recognition to work
with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt;.&lt;/p&gt;

&lt;h2 id=&quot;demo-showing-match_recognize-functionality-by-example&quot;&gt;Demo: Showing MATCH_RECOGNIZE functionality by example&lt;/h2&gt;

&lt;p&gt;Here are a few examples that Kasia will be running in the demo:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;The initial query. That’s mostly the same query that’s in the blog post, the 
differences being:
    &lt;ul&gt;
      &lt;li&gt;Usage of a real table instead of a CTE.&lt;/li&gt;
      &lt;li&gt;Additional sort key for consistent ordering.&lt;/li&gt;
      &lt;li&gt;Two more measures.&lt;/li&gt;
    &lt;/ul&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; SELECT custkey, match_no, start_price, bottom_price, final_price, start_date, final_date, classy
               FROM orders
                   MATCH_RECOGNIZE (
                       PARTITION BY custkey
                       ORDER BY orderdate, orderkey
                       MEASURES
                           START.totalprice AS start_price,
                           LAST(DOWN.totalprice) AS bottom_price,
                           LAST(UP.totalprice) AS final_price,
                           START.orderdate AS start_date,
                           LAST(UP.orderdate) AS final_date,
                           MATCH_NUMBER() AS match_no,
                           CLASSIFIER() AS classy
                       ONE ROW PER MATCH
                       AFTER MATCH SKIP PAST LAST ROW
                       PATTERN (START DOWN+ UP+)
                       DEFINE
                           DOWN AS totalprice &amp;lt; PREV(totalprice),
                           UP AS totalprice &amp;gt; PREV(totalprice)
                       )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The query returns many results (many matches). Wrap it in a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;count(*)&lt;/code&gt; 
aggregation to check how many there are:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; SELECT count(*) FROM (SELECT custkey, match_no, start_price, bottom_price, final_price, start_date, final_date, classy
               FROM orders
                   MATCH_RECOGNIZE (
                       PARTITION BY custkey
                       ORDER BY orderdate, orderkey
                       MEASURES
                           START.totalprice AS start_price,
                           LAST(DOWN.totalprice) AS bottom_price,
                           LAST(UP.totalprice) AS final_price,
                           START.orderdate AS start_date,
                           LAST(UP.orderdate) AS final_date,
                           MATCH_NUMBER() AS match_no,
                           CLASSIFIER() AS classy
                       ONE ROW PER MATCH
                       AFTER MATCH SKIP PAST LAST ROW
                       PATTERN (START DOWN+ UP+)
                       DEFINE
                           DOWN AS totalprice &amp;lt; PREV(totalprice),
                           UP AS totalprice &amp;gt; PREV(totalprice)
                       ))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Modify the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PATTERN&lt;/code&gt; to limit the results. Now searching for a “big V”:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; SELECT count(*) FROM (SELECT custkey, match_no, start_price, bottom_price, final_price, start_date, final_date, classy
               FROM orders
                   MATCH_RECOGNIZE (
                       PARTITION BY custkey
                       ORDER BY orderdate, orderkey
                       MEASURES
                           START.totalprice AS start_price,
                           LAST(DOWN.totalprice) AS bottom_price,
                           LAST(UP.totalprice) AS final_price,
                           START.orderdate AS start_date,
                           LAST(UP.orderdate) AS final_date,
                           MATCH_NUMBER() AS match_no,
                           CLASSIFIER() AS classy
                       ONE ROW PER MATCH
                       AFTER MATCH SKIP PAST LAST ROW
                       PATTERN (START DOWN{3,} UP{4,})
                       DEFINE
                           DOWN AS totalprice &amp;lt; PREV(totalprice),
                           UP AS totalprice &amp;gt; PREV(totalprice)
                       ))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Remove the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;count(*)&lt;/code&gt; aggregation to see the actual matches:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; SELECT custkey, match_no, start_price, bottom_price, final_price, start_date, final_date, classy
               FROM orders
                   MATCH_RECOGNIZE (
                       PARTITION BY custkey
                       ORDER BY orderdate, orderkey
                       MEASURES
                           START.totalprice AS start_price,
                           LAST(DOWN.totalprice) AS bottom_price,
                           LAST(UP.totalprice) AS final_price,
                           START.orderdate AS start_date,
                           LAST(UP.orderdate) AS final_date,
                           MATCH_NUMBER() AS match_no,
                           CLASSIFIER() AS classy
                       ONE ROW PER MATCH
                       AFTER MATCH SKIP PAST LAST ROW
                       PATTERN (START DOWN{3,} UP{4,})
                       DEFINE
                           DOWN AS totalprice &amp;lt; PREV(totalprice),
                           UP AS totalprice &amp;gt; PREV(totalprice)
                       )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Change &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AFTER MATCH SKIP PAST LAST ROW&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AFTER MATCH SKIP TO NEXT ROW&lt;/code&gt; to 
detect overlapping matches:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; SELECT custkey, match_no, start_price, bottom_price, final_price, start_date, final_date, classy
               FROM orders
                   MATCH_RECOGNIZE (
                       PARTITION BY custkey
                       ORDER BY orderdate, orderkey
                       MEASURES
                           START.totalprice AS start_price,
                           LAST(DOWN.totalprice) AS bottom_price,
                           LAST(UP.totalprice) AS final_price,
                           START.orderdate AS start_date,
                           LAST(UP.orderdate) AS final_date,
                           MATCH_NUMBER() AS match_no,
                           CLASSIFIER() AS classy
                       ONE ROW PER MATCH
                       AFTER MATCH SKIP TO NEXT ROW
                       PATTERN (START DOWN{3,} UP{4,})
                       DEFINE
                           DOWN AS totalprice &amp;lt; PREV(totalprice),
                           UP AS totalprice &amp;gt; PREV(totalprice)
                       )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Change &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ONE ROW PER MATCH&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALL ROWS PER MATCH&lt;/code&gt; (and revert the previous 
change). Discuss the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;classy&lt;/code&gt; column and explain the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;running&lt;/code&gt; semantics using the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;final_date&lt;/code&gt; column as an example:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; SELECT custkey, match_no, start_price, bottom_price, final_price, start_date, final_date, classy
               FROM orders
                   MATCH_RECOGNIZE (
                       PARTITION BY custkey
                       ORDER BY orderdate, orderkey
                       MEASURES
                           START.totalprice AS start_price,
                           LAST(DOWN.totalprice) AS bottom_price,
                           LAST(UP.totalprice) AS final_price,
                           START.orderdate AS start_date,
                           LAST(UP.orderdate) AS final_date,
                           MATCH_NUMBER() AS match_no,
                           CLASSIFIER() AS classy
                       ALL ROWS PER MATCH
                       AFTER MATCH SKIP PAST LAST ROW
                       PATTERN (START DOWN{3,} UP{4,})
                       DEFINE
                           DOWN AS totalprice &amp;lt; PREV(totalprice),
                           UP AS totalprice &amp;gt; PREV(totalprice)
                       )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Change the semantics of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;final_date&lt;/code&gt; column to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FINAL&lt;/code&gt;:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; SELECT custkey, match_no, start_price, bottom_price, final_price, start_date, final_date, classy
               FROM orders
                   MATCH_RECOGNIZE (
                       PARTITION BY custkey
                       ORDER BY orderdate, orderkey
                       MEASURES
                           START.totalprice AS start_price,
                           LAST(DOWN.totalprice) AS bottom_price,
                           LAST(UP.totalprice) AS final_price,
                           START.orderdate AS start_date,
                           FINAL LAST(UP.orderdate) AS final_date,
                           MATCH_NUMBER() AS match_no,
                           CLASSIFIER() AS classy
                       ALL ROWS PER MATCH
                       AFTER MATCH SKIP PAST LAST ROW
                       PATTERN (START DOWN{3,} UP{4,})
                       DEFINE
                           DOWN AS totalprice &amp;lt; PREV(totalprice),
                           UP AS totalprice &amp;gt; PREV(totalprice)
                       )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
&lt;/ol&gt;
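For intuition about what the `PATTERN (START DOWN+ UP+)` queries above are doing, here is a small Python sketch (our own construction, not from the episode) that finds the same greedy, non-overlapping "V" shapes in a list of prices, mimicking `AFTER MATCH SKIP PAST LAST ROW`:

```python
# Hypothetical Python analogue of PATTERN (START DOWN+ UP+) with
# AFTER MATCH SKIP PAST LAST ROW: find non-overlapping "V" shapes in a
# price series (a start row, one or more strictly falling rows, then one
# or more strictly rising rows). For intuition only, not Trino's engine.

def find_v_matches(prices):
    """Return (start, bottom, end) index triples for each greedy V match."""
    matches = []
    i = 0
    n = len(prices)
    while i < n - 2:
        j = i + 1
        # DOWN+: consume strictly falling rows
        while j < n and prices[j] < prices[j - 1]:
            j += 1
        if j == i + 1:          # no DOWN row: no match starting at i
            i += 1
            continue
        bottom = j - 1
        k = j
        # UP+: consume strictly rising rows
        while k < n and prices[k] > prices[k - 1]:
            k += 1
        if k == j:              # no UP row: no match starting at i
            i += 1
            continue
        matches.append((i, bottom, k - 1))
        i = k                   # SKIP PAST LAST ROW: resume after the match
    return matches

prices = [10, 8, 6, 7, 9, 5, 4, 6]
print(find_v_matches(prices))   # two V shapes: indexes (0, 2, 4) and (5, 6, 7)
```

As in the demo queries, making the quantifiers stricter (for example requiring at least 3 falling and 4 rising rows, like `DOWN{3,} UP{4,}`) would simply add minimum-length checks on the two inner loops.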

&lt;h2 id=&quot;question-of-the-week-how-do-you-tag-a-list-of-rows-with-custom-periodic-rules&quot;&gt;Question of the week: How do you tag a list of rows with custom periodic rules?&lt;/h2&gt;

&lt;p&gt;A StackOverflow user asked how to tag orders in a table that meet a certain 
criterion relying on periodicity. You could certainly craft some complicated and
inefficient SQL queries to address this. With &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt;, however, you can now solve it directly
while taking advantage of the efficient matching capabilities that Martin and
Kasia have added.&lt;/p&gt;

&lt;p&gt;Here is an example orders table, represented as CSV:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Create_time, Order_id, person_id, variable_a
&apos;2021-06-01&apos;, 1234, 2232, 1
&apos;2021-06-02&apos;, 1235, 2232, 0.6
&apos;2021-06-03&apos;, 1236, 2232, 0.33
&apos;2021-06-04&apos;, 1237, 2232, 0.7
&apos;2021-06-05&apos;, 1238, 2232, 0.6
&apos;2021-06-06&apos;, 1239, 2232, 0.4
&apos;2021-06-07&apos;, 1240, 2232, 0.8
&apos;2021-06-08&apos;, 1241, 2232, 0.7
&apos;2021-06-09&apos;, 1242, 2232, 0.4
&apos;2021-06-10&apos;, 1243, 2232, 0.6
&apos;2021-06-11&apos;, 1244, 2232, 0.7
&apos;2021-06-12&apos;, 1245, 2232, 0.6
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The grace period logic produces the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;final_hit&lt;/code&gt; column according to these rules:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;is_hit&lt;/code&gt; column is true if &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;variable_a&lt;/code&gt; is less than or equal to 0.5.&lt;/li&gt;
  &lt;li&gt;A hit starts a grace period covering the 4 orders that follow it. Any hit 
within a grace period is ignored and does not start a new grace period; a hit
outside any grace period is marked in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;final_hit&lt;/code&gt; column.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Based on these rules, the desired result for the example is:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Create_time, Order_id, person_id, variable_a, is_hit, final_hit
&apos;2021-06-01&apos;, 1234, 2232, 1, NULL, NULL
&apos;2021-06-02&apos;, 1235, 2232, 0.6, NULL, NULL
&apos;2021-06-03&apos;, 1236, 2232, 0.33, true, true
&apos;2021-06-04&apos;, 1237, 2232, 0.7, NULL, NULL
&apos;2021-06-05&apos;, 1238, 2232, 0.6, NULL, NULL
&apos;2021-06-06&apos;, 1239, 2232, 0.4, true, NULL
&apos;2021-06-07&apos;, 1240, 2232, 0.8, NULL, NULL
&apos;2021-06-08&apos;, 1241, 2232, 0.7, NULL, NULL
&apos;2021-06-09&apos;, 1242, 2232, 0.4, true, true
&apos;2021-06-10&apos;, 1243, 2232, 0.6, NULL, NULL
&apos;2021-06-11&apos;, 1244, 2232, 0.7, NULL, NULL
&apos;2021-06-12&apos;, 1245, 2232, 0.6, NULL, NULL
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
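To make the rules concrete, here is a minimal Python sketch (our own, not from the episode; the function name and structure are invented) that reproduces the `is_hit` and `final_hit` tagging for the `variable_a` values above, using `None` to mirror SQL `NULL`:

```python
# Hypothetical re-implementation of the grace-period tagging rules in plain
# Python. A row is a hit when variable_a <= 0.5; a final hit opens a grace
# period covering the next `grace` rows, and hits inside a grace period are
# ignored (their final_hit stays None, mirroring SQL NULL).

def tag_final_hits(values, grace=4):
    """Return (is_hit, final_hit) tags for an ordered list of variable_a values."""
    tags = []
    remaining = 0  # rows left in the current grace period
    for value in values:
        is_hit = True if value <= 0.5 else None
        final_hit = None
        if remaining > 0:
            remaining -= 1          # still inside a grace period: ignore hits
        elif is_hit:
            final_hit = True        # hit outside any grace period
            remaining = grace       # open a new grace period
        tags.append((is_hit, final_hit))
    return tags

# variable_a values from the example table, in Create_time order
values = [1, 0.6, 0.33, 0.7, 0.6, 0.4, 0.8, 0.7, 0.4, 0.6, 0.7, 0.6]
print(tag_final_hits(values))
```

Running this marks rows 3 and 9 as final hits and leaves the hit on row 6 untagged, because it falls in the grace period opened by row 3, matching the desired result above.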

&lt;p&gt;To accomplish this with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt;, you can use the following statement, 
which produces the correct result:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;WITH data(Create_time, Order_id, person_id, variable_a) AS (
    VALUES
      (DATE &apos;2021-06-01&apos;, 1234, 2232, 1),
      (DATE &apos;2021-06-02&apos;, 1235, 2232, 0.6),
      (DATE &apos;2021-06-03&apos;, 1236, 2232, 0.33),
      (DATE &apos;2021-06-04&apos;, 1237, 2232, 0.7),
      (DATE &apos;2021-06-05&apos;, 1238, 2232, 0.6),
      (DATE &apos;2021-06-06&apos;, 1239, 2232, 0.4),
      (DATE &apos;2021-06-07&apos;, 1240, 2232, 0.8),
      (DATE &apos;2021-06-08&apos;, 1241, 2232, 0.7),
      (DATE &apos;2021-06-09&apos;, 1242, 2232, 0.4),
      (DATE &apos;2021-06-10&apos;, 1243, 2232, 0.6),
      (DATE &apos;2021-06-11&apos;, 1244, 2232, 0.7),
      (DATE &apos;2021-06-12&apos;, 1245, 2232, 0.6)
)
SELECT Create_time, Order_id, person_id, variable_a, if(variable_a &amp;lt;= 0.5, true, null) is_hit, final_hit
FROM data
   MATCH_RECOGNIZE (
     PARTITION BY person_id
     ORDER BY Create_time
     MEASURES if(classifier() = &apos;HIT&apos;, true, null) AS final_hit
     ALL ROWS PER MATCH WITH UNMATCHED ROWS
     AFTER MATCH SKIP PAST LAST ROW
     PATTERN (HIT G{,4})
     DEFINE /* G -- grace period */
            HIT AS HIT.variable_a &amp;lt;= 0.5
  )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Check out &lt;a href=&quot;https://stackoverflow.com/questions/68095763&quot;&gt;Martin and Kasia’s full answer to this question&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests Kasia Findeisen, Software Engineer at Starburst (@kasiafi). Release 360</summary>

      
      
    </entry>
  
    <entry>
      <title>22: TrinkedIn: LinkedIn gets a Trino promotion</title>
      <link href="https://trino.io/episodes/22.html" rel="alternate" type="text/html" title="22: TrinkedIn: LinkedIn gets a Trino promotion" />
      <published>2021-07-22T00:00:00+00:00</published>
      <updated>2021-07-22T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/22</id>
      <content type="html" xml:base="https://trino.io/episodes/22.html">&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/22/cbb-linkedin.png&quot; /&gt;&lt;br /&gt;
Commander Bun Bun, landing the job!
&lt;/p&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Akshay Rai, Staff Software Engineer at &lt;a href=&quot;https://www.linkedin.com/&quot;&gt;LinkedIn&lt;/a&gt;
 (&lt;a href=&quot;https://www.linkedin.com/in/akshayrai09/&quot;&gt;@akshayrai09&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Jithesh Rajan, Staff Software Engineer at &lt;a href=&quot;https://www.linkedin.com/&quot;&gt;LinkedIn&lt;/a&gt;
 (&lt;a href=&quot;https://www.linkedin.com/in/jithesh-tr-a3185b20/&quot;&gt;@jithesh-tr-a3185b20&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Laura Chen, Staff Software Engineer at &lt;a href=&quot;https://www.linkedin.com/&quot;&gt;LinkedIn&lt;/a&gt;
 (&lt;a href=&quot;https://www.linkedin.com/in/laura-yu-chen-3a75413/&quot;&gt;@laura-yu-chen-3a75413&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Pratham Desai, Software Engineer at &lt;a href=&quot;https://www.linkedin.com/&quot;&gt;LinkedIn&lt;/a&gt;
 (&lt;a href=&quot;https://www.linkedin.com/in/pratham-desai/&quot;&gt;@pratham-desai&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Raju Nalli, Staff Site Reliability Engineer at &lt;a href=&quot;https://www.linkedin.com/&quot;&gt;LinkedIn&lt;/a&gt;
 (&lt;a href=&quot;https://www.linkedin.com/in/rajunalli/&quot;&gt;@rajunalli&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;upcoming-release-and-trino-summit&quot;&gt;Upcoming release and Trino Summit&lt;/h2&gt;

&lt;h3 id=&quot;sneak-peek-items-for-360&quot;&gt;Sneak peek items for 360&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Automatic cluster internal TLS&lt;/li&gt;
  &lt;li&gt;Views support in Iceberg connector&lt;/li&gt;
  &lt;li&gt;Documentation for materialized views SQL commands&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt; and batch insert support for various JDBC-based connectors&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;trino-summit-2021&quot;&gt;Trino Summit 2021&lt;/h3&gt;

&lt;p&gt;Get excited for this year’s &lt;a href=&quot;https://blog.starburst.io/announcing-trino-summit-2021&quot;&gt;Trino Summit&lt;/a&gt;
hosted by &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;. 
&lt;a href=&quot;https://www.starburst.io/info/trino-summit-call-for-papers/&quot;&gt;Registration and call for papers&lt;/a&gt;
is now open!&lt;/p&gt;

&lt;h3 id=&quot;linkedin-is-hiring&quot;&gt;LinkedIn is hiring!&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/jobs/view/2402727250/?alternateChannel=search&amp;amp;refId=VRDXEQNgS2gxtpsJaHPXjQ%3D%3D&amp;amp;trackingId=0GzsJkrXWYt6qHWSUHTvCg%3D%3D&amp;amp;trk=d_flagship3_search_srp_jobs&quot;&gt;Software Engineer - Big Data Platform&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/jobs/view/2291645936/?eBP=CwEAAAF6y0tYtsROpAG7XxMEhLVgpq2rSMwpNv28Q_j06PdFsD_s11eFyh-sIv2rxm_Y8zN-p755Gts-ElMlR6XvK2hOMp3JMnxFPzOnZvvZnv_-oHaBslitgtWzsmJy7_f7BKljmgAUtfinG9WCp1Bpi574HZEBJwAsjzKx-89NUdnIBj_SBIPHES_G2RNqoKp5eZ4c0k7YaVJSuZJTyi2K6KoKJ7njT65FEOWvmS9S80ysbINbXjX_WSz71RNAugEpqIgE9-gB1MhW8tQ9z72jQhbjXMqSuUaYS43zFaP8ImXhjTrhbopTxyxTIN9yst6tvlcPo_T5RNAaf_0e8x_km2SGdw&amp;amp;recommendedFlavor=IN_NETWORK&amp;amp;refId=VRDXEQNgS2gxtpsJaHPXjQ%3D%3D&amp;amp;trackingId=5Qo2D07i3Wl%2FVhGeAvLtew%3D%3D&amp;amp;trk=flagship3_search_srp_jobs&quot;&gt;Senior Software Engineer - Big Data Platform&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-week-trino-at-linkedin&quot;&gt;Concept of the week: Trino at LinkedIn&lt;/h2&gt;

&lt;p&gt;The LinkedIn team covers the concept of the week in &lt;a href=&quot;https://www.youtube.com/watch?v=vlc84xB-Hfs&amp;amp;t=955s&quot;&gt;this section&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-digging-into-join-queries&quot;&gt;PR of the week: Digging into join queries&lt;/h2&gt;

&lt;p&gt;Today our PR of the week is from the future 🔮! 
&lt;a href=&quot;https://github.com/jitheshtr/trino/issues/1&quot;&gt;LinkedIn is currently investigating the issue&lt;/a&gt;.
This gives us a chance to talk about the research aspects that go into a PR.&lt;/p&gt;

&lt;p&gt;Consider a view &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;V&lt;/code&gt; that performs a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNION ALL&lt;/code&gt; over an old table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;O&lt;/code&gt; and a newly 
migrated table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;N&lt;/code&gt;. For &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;datepartition&lt;/code&gt; values older than &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;D&lt;/code&gt; (say 2021-06-05), 
data is read from table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;O&lt;/code&gt;, while for dates equal to or greater than &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;D&lt;/code&gt;,
data comes from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;N&lt;/code&gt;.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/22/view-old-new-tables.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;The query in question is:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT * FROM V
WHERE x IN (SELECT x2 FROM Z)
AND cast(substring(datepartition,1,10) as date) &amp;gt;= date(&apos;2021-06-08&apos;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Here, table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Z&lt;/code&gt; has stats available and contains only 17 rows, while the 
data from view &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;V&lt;/code&gt; (which comes entirely from underlying table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;N&lt;/code&gt; for this query) 
has, say, billions of rows.&lt;/p&gt;

&lt;p&gt;This query used to take about 39 seconds to run before our upgrade 
(PrestoSQL 333). After the upgrade (Trino 352) the runtime increased to 
approximately 35 minutes.&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-week-how-can-i-query-the-hive-views-from-trino&quot;&gt;Question of the week: How can I query the Hive views from Trino?&lt;/h2&gt;

&lt;p&gt;We actually covered the answer in &lt;a href=&quot;/episodes/18.html&quot;&gt;episode 18&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You can use the &lt;a href=&quot;https://engineering.linkedin.com/blog/2020/coral&quot;&gt;Coral&lt;/a&gt; 
project, which allows translation between different SQL dialects. For example, 
it processes HiveQL statements and converts them to an internal representation using
&lt;a href=&quot;https://calcite.apache.org/&quot;&gt;Apache Calcite&lt;/a&gt;. It then converts the internal
representation to Trino SQL. See &lt;a href=&quot;/docs/current/connector/hive.html#hive-views&quot;&gt;the docs&lt;/a&gt;
for more details.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;100%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/18/coral.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;This diagram shows the creation of a Hive view, then shows the sequence of events 
when Trino reads that view.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/18/hive-view-sequence.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://engineering.linkedin.com/blog/2020/coral&quot;&gt;https://engineering.linkedin.com/blog/2020/coral&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://engineering.linkedin.com/blog/2021/from-daily-dashboards-to-enterprise-grade-data-pipelines&quot;&gt;https://engineering.linkedin.com/blog/2021/from-daily-dashboards-to-enterprise-grade-data-pipelines&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://engineering.linkedin.com/blog/2018/11/using-translatable-portable-UDFs&quot;&gt;https://engineering.linkedin.com/blog/2018/11/using-translatable-portable-UDFs&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://engineering.linkedin.com/blog/2021/fastingest-low-latency-gobblin&quot;&gt;https://engineering.linkedin.com/blog/2021/fastingest-low-latency-gobblin&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Commander Bun Bun, landing the job!</summary>

      
      
    </entry>
  
    <entry>
      <title>21: Trino + dbt = a match made in SQL heaven?</title>
      <link href="https://trino.io/episodes/21.html" rel="alternate" type="text/html" title="21: Trino + dbt = a match made in SQL heaven?" />
      <published>2021-07-08T00:00:00+00:00</published>
      <updated>2021-07-08T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/21</id>
      <content type="html" xml:base="https://trino.io/episodes/21.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Amy Chen, Partner Solutions Architect at &lt;a href=&quot;https://www.getdbt.com/&quot;&gt;dbt Labs (formerly Fishtown Analytics)&lt;/a&gt;
 (&lt;a href=&quot;https://www.linkedin.com/in/yuanamychen/&quot;&gt;@yuanamychen&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Victor Coustenoble, Solutions Architect at &lt;a href=&quot;https://www.starburst.io/&quot;&gt;Starburst&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/victorcouste&quot;&gt;@victorcouste&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;release-359&quot;&gt;Release 359&lt;/h2&gt;

&lt;p&gt;Martin:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Row pattern recognition for window functions&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SET TIME ZONE&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;timestamp(n)&lt;/code&gt; with precision higher than 3 in MySQL&lt;/li&gt;
  &lt;li&gt;ARM64-compatible docker image&lt;/li&gt;
  &lt;li&gt;Support for granting &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt; privilege&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SET TIME ZONE&lt;/code&gt; is a feature from our guest Marius from last time!&lt;/li&gt;
  &lt;li&gt;The ARM64-compatible Docker image, together with the existing tar.gz and RPM archives, means Graviton and other ARM64 processors are now also an option for Kubernetes users. There are significant cost/performance benefits, so try it out&lt;/li&gt;
  &lt;li&gt;wow .. this time it took a whole month from 358 to 359&lt;/li&gt;
  &lt;li&gt;breaking change - need Java 11.0.11&lt;/li&gt;
  &lt;li&gt;more materialized view stuff, and I am working on docs!&lt;/li&gt;
  &lt;li&gt;Fix handling of multiple LDAP user bind patterns - for those of us in larger orgs..&lt;/li&gt;
  &lt;li&gt;network logging in CLI&lt;/li&gt;
  &lt;li&gt;rename &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;connector.name&lt;/code&gt; from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive-hadoop2&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More info at &lt;a href=&quot;https://trino.io/docs/current/release/release-359.html&quot;&gt;https://trino.io/docs/current/release/release-359.html&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-week-can-dbt-connect-to-different-databases-in-the-same-project&quot;&gt;Question of the week: Can dbt connect to different databases in the same project?&lt;/h2&gt;

&lt;p&gt;This week we are going a little out of order from our usual sequence on this
show. The question really gets to the heart of the concept of the week. We’ll 
cover this first then jump into the concept.&lt;/p&gt;

&lt;p&gt;This question was asked on &lt;a href=&quot;https://stackoverflow.com/questions/63002171&quot;&gt;StackOverflow&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;It seems dbt only works for a single database. If my data is in a different 
database, will that still work? For example, if my datalake is using delta, 
but I want to run dbt using Redshift, would dbt still work for this case?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Our guest Victor replied:&lt;/p&gt;

&lt;p&gt;You can use Trino with dbt to connect to multiple databases in the same project.&lt;/p&gt;

&lt;p&gt;The GitHub example project &lt;a href=&quot;https://github.com/victorcouste/trino-dbt-demo&quot;&gt;https://github.com/victorcouste/trino-dbt-demo&lt;/a&gt; 
contains a fully working setup that you can replicate and adapt to your needs.&lt;/p&gt;
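As a concrete illustration, a dbt profile targeting Trino might look roughly like the following sketch. The exact field names depend on the dbt adapter version in use, and the project name, host, catalog, and schema values here are invented; treat this as an assumption-laden example rather than a verified configuration:

```yaml
# Hypothetical profiles.yml entry for a Trino target (values are made up).
my_trino_project:
  target: dev
  outputs:
    dev:
      type: trino
      method: none            # no authentication, e.g. a local test cluster
      user: admin
      host: trino.example.com
      port: 8080
      database: hive          # the Trino catalog dbt materializes models into
      schema: analytics
      threads: 4
```

Because Trino federates catalogs, models in the project can still read from other catalogs (for example a data lake or PostgreSQL catalog) while dbt writes results into the one named here, which is what makes the multi-database setup from the question possible.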

&lt;h2 id=&quot;concept-of-the-week&quot;&gt;Concept of the week:&lt;/h2&gt;

&lt;h3 id=&quot;what-is-dbt&quot;&gt;What is dbt?&lt;/h3&gt;

&lt;p&gt;dbt is a transformation workflow tool that lets teams quickly and collaboratively 
deploy analytics code, following software engineering best practices like 
modularity, CI/CD, testing, and documentation. It enables anyone who knows SQL 
to build production-grade data pipelines.&lt;/p&gt;

&lt;p&gt;When referring to dbt, it can mean two slightly different things. dbt Core is 
the open source framework that provides the SQL compiler and the tooling to manage
your SQL workflow. You interact with it via a command line interface. In 
addition, dbt Labs offers the fully managed SaaS product dbt Cloud. You can use 
it to handle all of your dbt projects from development to deployment in a single 
browser-based tool. It provides useful features like a full IDE to develop and 
test code, orchestration, logging, and alerting. At the moment, dbt Cloud is not
available for Trino users.&lt;/p&gt;

&lt;p&gt;The framework allows you to check the quality of results, document the lineage, 
manage changes and versions of the SQL scripts, and orchestrate the queries, much
like a CI/CD framework for your data. dbt is not an extract and load tool. The 
focus is on transforming what is already in your data warehouse or data lake.&lt;/p&gt;

&lt;p&gt;Check out these links to learn more:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.getdbt.com/&quot;&gt;https://www.getdbt.com/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://docs.getdbt.com/docs/introduction&quot;&gt;https://docs.getdbt.com/docs/introduction&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;goals-of-dbt-and-how-that-differs-from-trino&quot;&gt;Goals of dbt and how that differs from Trino&lt;/h3&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/21/dbt-trino-architecture.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;Trino is the SQL execution engine and dbt is the framework to manage your SQL 
statements. dbt does not execute the SQL itself; rather, it pushes all of the 
compute down to the SQL engine. This SQL engine can be Trino, or an engine 
included in the data source, like the database itself. Using Trino as the SQL 
execution engine allows you to use the same SQL dialect for all connected data 
sources. This includes data sources that do not natively support SQL, like object
storage systems, Kafka, Elasticsearch, and many others.&lt;/p&gt;

&lt;h3 id=&quot;transformation-vs-ad-hoc-joins&quot;&gt;Transformation vs ad-hoc joins&lt;/h3&gt;

&lt;p&gt;Transformations done by dbt are in general used to clean and prepare data for 
analytics purposes. They often take you from raw data to ready-to-use 
data for reporting and analysis. dbt creates database objects like tables or 
views to be consumed by business users and analytics tools.&lt;/p&gt;

&lt;p&gt;On the other hand, even though Trino can also execute SQL to create tables and 
views, these SQL queries are not managed, just executed. Unlike dbt, Trino does
not provide a framework to version, audit, document, and orchestrate SQL 
scripts and their execution. Trino is more often used to execute SQL SELECT 
statements generated by users or BI tools to analyze data interactively.&lt;/p&gt;

&lt;h3 id=&quot;cases-for-why-you-need-both&quot;&gt;Cases for why you need both&lt;/h3&gt;

&lt;p&gt;Trino and dbt are complementary when you need to access different sources from
a single SQL query, or when you need to run SQL queries with good performance on
object storage systems like S3, GCS, ADLS, or HDFS.&lt;/p&gt;

&lt;p&gt;This is where Trino complements dbt: dbt can only access a single data 
warehouse connection in a SQL query, so in dbt alone there is no way to query 
multiple storage systems at the same time.&lt;/p&gt;
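&lt;p&gt;As an illustration of this federation, a single Trino query can join catalogs
backed by different systems. The catalog, schema, table, and column names below
are hypothetical:&lt;/p&gt;

```sql
-- Hypothetical federated query: joins a table in an object storage catalog
-- (hive) with a table in a relational catalog (postgresql) in one statement.
SELECT o.order_id, o.total_price, c.customer_name
FROM hive.sales.orders AS o
JOIN postgresql.public.customers AS c
  ON o.customer_id = c.customer_id
WHERE o.order_date >= DATE '2021-01-01';
```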

&lt;p&gt;Trino is recognized for great performance with object storage and data lake 
processing. Combined with dbt, it can transform and prepare data at scale. Trino 
also allows you to run dbt against a traditional, on-premises data warehouse, 
where normally dbt only runs on a modern cloud data warehouse like Snowflake, 
BigQuery, or Redshift.&lt;/p&gt;

&lt;h3 id=&quot;dbt-basics&quot;&gt;dbt basics&lt;/h3&gt;

&lt;p&gt;dbt Labs offers a &lt;a href=&quot;https://docs.getdbt.com/tutorial/setting-up&quot;&gt;good tutorial&lt;/a&gt;
which covers the fundamental concepts of dbt:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Project: A directory of SQL and YAML files, defined with a single project file.&lt;/li&gt;
  &lt;li&gt;Model: A single SQL file where you define your transformations to create a table or a view.&lt;/li&gt;
  &lt;li&gt;Profile: Defines the connections to your data sources.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then you have other resources like seeds, macros, tests, sources, snapshots.&lt;/p&gt;
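&lt;p&gt;As a small illustration, a model is just a SELECT statement in a .sql file
that dbt materializes as a table or view. The model and source names here are
hypothetical:&lt;/p&gt;

```sql
-- models/customer_orders.sql (hypothetical model and upstream model names)
-- dbt compiles the ref() call to the object built by the upstream model,
-- and materializes this SELECT according to the configuration.
{{ config(materialized='view') }}

select
    customer_id,
    count(*) as order_count,
    sum(total_price) as lifetime_value
from {{ ref('stg_orders') }}
group by customer_id
```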

&lt;h2 id=&quot;demo-querying-trino-from-a-dbt-project&quot;&gt;Demo: Querying Trino from a dbt project&lt;/h2&gt;

&lt;p&gt;Victor shows us a demo from 
&lt;a href=&quot;https://medium.com/geekculture/trino-dbt-a-match-in-sql-heaven-1df2a3d12b5e&quot;&gt;his blog post that inspired this episode&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you looked at the code, you may have noticed that it used an adapter 
called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dbt-presto&lt;/code&gt;. This adapter derives from the outdated Presto naming and is
still there for interaction with legacy Presto clusters. Although it can work,
it uses an outdated Python client to interact with Trino, and there is an open
&lt;a href=&quot;https://github.com/dbt-labs/dbt-presto/issues/39&quot;&gt;issue to create an official &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dbt-trino&lt;/code&gt; adapter&lt;/a&gt; 
that uses the updated &lt;a href=&quot;https://github.com/trinodb/trino-python-client&quot;&gt;trino-python-client&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to help with this, reach out on the issue itself and join the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;#db-presto-trino&lt;/code&gt; channel on the dbt Slack. 
&lt;a href=&quot;https://community.getdbt.com/&quot;&gt;https://community.getdbt.com/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After the show, &lt;a href=&quot;https://twitter.com/findinpath&quot;&gt;Marius Grama&lt;/a&gt; started &lt;a href=&quot;https://github.com/findinpath/dbt-trino&quot;&gt;work on
dbt-trino in his own repository&lt;/a&gt;.
Thanks for the quick turnaround, Marius!&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-8283-externalised-destination-table-cache-expiry-duration-for-bigquery-connector&quot;&gt;PR of the week: PR 8283 Externalised destination table cache expiry duration for BigQuery Connector&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;https://github.com/trinodb/trino/pull/8283&quot;&gt;PR of the week&lt;/a&gt; was committed 
by Ayush Bilala (&lt;a href=&quot;https://twitter.com/ayushbilala&quot;&gt;Twitter&lt;/a&gt;, &lt;a href=&quot;https://www.linkedin.com/in/ayush-bilala/&quot;&gt;LinkedIn&lt;/a&gt;), a Staff Software Engineer at
Walmart Global Tech.&lt;/p&gt;

&lt;p&gt;This fixes &lt;a href=&quot;https://github.com/trinodb/trino/issues/8236&quot;&gt;issue 8236&lt;/a&gt; by adding
a new configuration property for the BigQuery connector, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bigquery.views-cache-ttl&lt;/code&gt;, 
to allow configuring the cache expiration for BigQuery views.&lt;/p&gt;

&lt;p&gt;Thanks Ayush!&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;News&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The “frog” book has been &lt;a href=&quot;https://item.jd.com/10028492426649.html&quot;&gt;translated to Chinese&lt;/a&gt;!
 Keep your eyes peeled for the rebrand into Trino for the translation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/15/training-advanced-sql.html&quot;&gt;Advanced SQL Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/30/training-query-tuning.html&quot;&gt;Query Tuning Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/13/training-security.html&quot;&gt;Security Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/27/training-performance.html&quot;&gt;Performance and Tuning Training&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests</summary>

      
      
    </entry>
  
    <entry>
      <title>20: Trino for the Trinewbie</title>
      <link href="https://trino.io/episodes/20.html" rel="alternate" type="text/html" title="20: Trino for the Trinewbie" />
      <published>2021-06-23T00:00:00+00:00</published>
      <updated>2021-06-23T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/20</id>
      <content type="html" xml:base="https://trino.io/episodes/20.html">&lt;script async=&quot;&quot; defer=&quot;&quot; src=&quot;https://buttons.github.io/buttons.js&quot;&gt;&lt;/script&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Marius Grama, Data Engineer at &lt;a href=&quot;https://www.willhaben.at/&quot;&gt;willhaben internet service GmbH &amp;amp; Co KG&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/findinpath&quot;&gt;@findinpath&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-week-trino-for-the-trinewbie&quot;&gt;Concept of the week: Trino for the Trinewbie&lt;/h2&gt;

&lt;p&gt;One of the best and easiest ways to get an understanding of Trino, and how to
use it, is the book Trino: The Definitive Guide. The next three sections have a few 
excerpts from the book, which does an incredible job of introducing the space 
Trino is in. If you would like to read the book in its entirety, Starburst 
offers &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the digital copy for free&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;the-problems-with-big-data&quot;&gt;The Problems with Big Data&lt;/h3&gt;

&lt;p&gt;Everybody is capturing more and more data from device metrics, user behavior
tracking, business transactions, location data, software and system testing 
procedures and workflows, and much more. The insights gained from understanding
that data and working with it can make or break the success of any initiative,
or even a company.&lt;/p&gt;

&lt;p&gt;At the same time, the diversity of storage mechanisms available for data has 
exploded: relational databases, NoSQL databases, document databases, key-value 
stores, object storage systems, and so on. Many of them are necessary in today’s
organizations, and it is no longer possible to use just one of them.&lt;/p&gt;

&lt;h3 id=&quot;what-is-trino&quot;&gt;What is Trino?&lt;/h3&gt;

&lt;p&gt;Trino is not a database with storage, rather, it simply queries data where it 
lives. When using Trino, storage and compute are decoupled and can be scaled 
independently. Trino represents the compute layer, whereas the underlying data 
sources represent the storage layer.&lt;/p&gt;

&lt;p&gt;This allows Trino to scale up and down its compute resources for query 
processing, based on analytics demand to access this data. There is no need to 
move your data, and provision compute and storage to the exact needs of the 
current queries, or change that regularly, based on your changing query needs.&lt;/p&gt;

&lt;p&gt;Trino can scale the query power by scaling the compute cluster dynamically, and 
the data can be queried right where it lives in the data source. This 
characteristic allows you to greatly optimize your hardware resource needs and 
therefore reduce cost.&lt;/p&gt;

&lt;h3 id=&quot;sql-on-anything&quot;&gt;SQL-on-Anything&lt;/h3&gt;

&lt;p&gt;Trino was initially designed to query data from HDFS. And it can do that very 
efficiently, as you learn later. But that is not where it ends. On the contrary,
Trino is a query engine that can query data from object storage, relational
database management systems (RDBMSs), NoSQL databases, and other systems.&lt;/p&gt;

&lt;p&gt;Trino queries data where it lives and does not require a migration of data to a 
single location. So Trino allows you to query data in HDFS and other distributed
object storage systems. It allows you to query RDBMSs and other data sources. As
such, it can really query data wherever it lives and therefore be a replacement
to the traditional, expensive, and heavy extract, transform, and load (ETL) 
processes. Or at a minimum, it can help you with them and lighten the load. So 
Trino is clearly not just another SQL-on-Hadoop solution.&lt;/p&gt;

&lt;p&gt;Object storage systems include Amazon Web Services (AWS) Simple Storage Service
(S3), Microsoft Azure Blob Storage, Google Cloud Storage, and S3-compatible 
storage such as MinIO and Ceph. Trino can query traditional RDBMSs such as 
Microsoft SQL Server, PostgreSQL, MySQL, Oracle, Teradata, and Amazon Redshift. 
Trino can also query NoSQL systems such as Apache Cassandra, Apache Kafka, 
MongoDB, or Elasticsearch. Trino can query virtually anything and is truly a 
SQL-on-Anything system.&lt;/p&gt;

&lt;p&gt;For users, this means that suddenly they no longer have to rely on specific 
query languages or tools to interact with the data in those specific systems.
They can simply leverage Trino and their existing SQL skills and their 
well-understood analytics, dashboarding, and reporting tools. These tools, 
built on top of using SQL, allow analysis of those additional data sets, which 
are otherwise locked in separate systems. Users can even use Trino to query 
across different systems with the SQL they know.&lt;/p&gt;

&lt;h3 id=&quot;contributing-to-trino&quot;&gt;Contributing to Trino&lt;/h3&gt;

&lt;p&gt;In this episode, Marius Grama discusses his journey with Trino: joining the
community, his first impressions and experiences, and what led him to make 
sixteen commits over the last three months. We also ask him where he thinks we 
could improve to make the onboarding experience better.&lt;/p&gt;

&lt;p&gt;In the Trino project there are four &lt;a href=&quot;/development/roles.html&quot;&gt;roles&lt;/a&gt;.
You can immediately become a participant or reviewer. To be a contributor, you
need to follow some steps that are covered later in the episode. Likewise, for
maintainers, there is a path to becoming a maintainer that is discussed in 
detail on the roles page.&lt;/p&gt;

&lt;h4 id=&quot;participants&quot;&gt;Participants&lt;/h4&gt;

&lt;blockquote&gt;
  &lt;p&gt;Participants are those who show up and join in discussions about the project. 
Users, developers, and administrators can all be participants, as can 
literally anyone who has the time, energy, and passion to become involved. 
Participants suggest improvements and new features. They report bugs, 
regressions, performance issues, and so on. They work to make Trino better for
everyone.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4 id=&quot;contributors&quot;&gt;Contributors&lt;/h4&gt;

&lt;p&gt;Today’s episode covers the process that a contributor goes through to make a
code change, but simply put:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;A contributor submits code changes to Trino.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4 id=&quot;reviewers&quot;&gt;Reviewers&lt;/h4&gt;

&lt;blockquote&gt;
  &lt;p&gt;A reviewer reads a proposed change to Trino, and assesses how well the change 
aligns with the Trino vision and guidelines. This includes everything from 
high level project vision to low level code style. Everyone is invited and 
encouraged to review others’ contributions – you don’t need to be a maintainer
for that.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4 id=&quot;maintainers&quot;&gt;Maintainers&lt;/h4&gt;

&lt;blockquote&gt;
  &lt;p&gt;A maintainer is responsible for checking in code only after ensuring it has 
been reviewed thoroughly and aligns with the Trino vision and guidelines. In 
addition to merging code, a maintainer actively participates in discussions 
and reviews. Being a maintainer does not grant additional rights in the 
project to make changes, set direction, or anything else that does not align 
with the direction of the project. Instead, a maintainer is expected to bring
these to the project participants as needed to gain consensus. The maintainer
role is for an individual, so if a maintainer changes employers, the role is 
retained. However, if a maintainer is no longer actively involved in the 
project, their maintainer status will be reviewed.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;There is &lt;a href=&quot;https://cwiki.apache.org/confluence/display/Hive/BecomingACommitter&quot;&gt;a writeup on the Apache Hive process to become a committer.&lt;/a&gt;
For context, a committer is equivalent to a maintainer in Trino. This writeup
aligns precisely with the Trino philosophy. Here are a few good quotes from that
article:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Contributors often ask Hive PMC members the question, “What do I need to do in
order to become a committer?” The simple (though frustrating) answer to this 
question is, “If you want to become a committer, behave like a committer.” If 
you follow this advice, then rest assured that the PMC will notice, and 
committership will seek you out rather than the other way around.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
  &lt;p&gt;It should go without saying, but here it is anyway: your participation in the 
project should be a natural part of your work with Hive; if you find yourself 
undertaking tasks “so that you can become a committer”, then you’re doing it 
wrong, young padawan. This is particularly true if your motivations for 
wanting to become a committer are primarily negative or self-centered&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-8135-set-default-time-zone-for-the-current-session&quot;&gt;PR of the week: PR 8135 Set default time zone for the current session&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;https://github.com/trinodb/trino/pull/8135&quot;&gt;PR of the week&lt;/a&gt; was committed 
by today’s guest, &lt;a href=&quot;https://twitter.com/findinpath&quot;&gt;Marius Grama&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This fixes &lt;a href=&quot;https://github.com/trinodb/trino/issues/8112&quot;&gt;issue 8112&lt;/a&gt; by adding
support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SET TIME ZONE&lt;/code&gt; statement. The specified time zone is 
stored as a session property and has lower precedence than the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sql.forced-session-time-zone&lt;/code&gt; setting.&lt;/p&gt;
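&lt;p&gt;Based on the statement added in this PR, usage looks roughly like the
following; the zone name is just an example:&lt;/p&gt;

```sql
-- Set the time zone for the current session
SET TIME ZONE 'America/Los_Angeles';

-- Reset back to the default time zone
SET TIME ZONE LOCAL;
```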

&lt;p&gt;Thanks Marius!&lt;/p&gt;

&lt;h2 id=&quot;demo-contributing-to-trino&quot;&gt;Demo: Contributing to Trino&lt;/h2&gt;

&lt;p&gt;Here is the video that goes into detail on the steps below on how to contribute
code to Trino!&lt;/p&gt;

&lt;div class=&quot;youtube-video-container&quot;&gt;
  &lt;iframe width=&quot;702&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/gAqYkR2oGgM&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;Download an IDE.&lt;/p&gt;

    &lt;p&gt;First, you need to have an integrated development environment (IDE) to run 
 the code. We recommend &lt;a href=&quot;https://www.jetbrains.com/idea/download/&quot;&gt;Intellij Community Edition&lt;/a&gt;
 as it is the standard that is used by developers across the project. Of 
 course, you may use any IDE you like, but there may be issues that others 
 may not be able to help with as readily.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Install Git.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://git-scm.com/&quot;&gt;Git&lt;/a&gt; is distributed version control software 
 used to collaborate on code with other users. You must 
 &lt;a href=&quot;https://git-scm.com/book/en/v2/Getting-Started-Installing-Git&quot;&gt;install git&lt;/a&gt;
 in order to contribute to the project.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Install Docker.&lt;/p&gt;

    &lt;p&gt;The Trino testing framework runs Trino and other databases it connects to on
 Docker, a tool that runs different services in isolation using containers.&lt;br /&gt;
 Go ahead and &lt;a href=&quot;https://docs.docker.com/engine/install/&quot;&gt;install Docker&lt;/a&gt; on 
 your system.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Create and configure your GitHub account.&lt;/p&gt;

    &lt;p&gt;GitHub offers free Git repository hosting, and is a central point of collaboration
 for the Trino project. If you haven’t done so, please 
 &lt;a href=&quot;https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration&quot;&gt;create and configure your GitHub account&lt;/a&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Make a fork of the Trino repository on GitHub&lt;/p&gt;

    &lt;p&gt;Navigate to &lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;the Trino repository&lt;/a&gt; and 
 click the “fork” button. Or you can just click it here: &lt;a class=&quot;github-button&quot; href=&quot;https://github.com/trinodb/trino/fork&quot; data-icon=&quot;octicon-repo-forked&quot; data-size=&quot;large&quot;&gt;Fork&lt;/a&gt;.&lt;/p&gt;

    &lt;p&gt;You want to create a fork so that you can save your work without needing the
 special privileges it takes to commit code back to the Trino repository. 
 This way, you can upload (also called a “push” in Git) your code to your 
 fork and later open a pull request into the main Trino repository.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Clone your fork of the Trino repository to your computer and import into Intellij.&lt;/p&gt;

    &lt;p&gt;Execute the following clone command in your terminal:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; git clone git@github.com:&amp;lt;your_username&amp;gt;/trino.git
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;

    &lt;p&gt;Open the &lt;a href=&quot;https://www.jetbrains.com/help/idea/maven-support.html#maven_import_project_start&quot;&gt;Trino project in Intellij&lt;/a&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Add the Airlift code style checks to Intellij.&lt;/p&gt;

    &lt;p&gt;There are many unspoken rules to code style and formatting in any project. 
Trino is no exception. To make life simpler for contributors and reviewers, 
there is a &lt;a href=&quot;https://raw.githubusercontent.com/airlift/codestyle/master/IntelliJIdea2019/Airlift.xml&quot;&gt;Trino code style definition&lt;/a&gt; 
that &lt;a href=&quot;https://www.jetbrains.com/help/idea/copying-code-style-settings.html&quot;&gt;you can import into Intellij&lt;/a&gt; 
so that the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Reformat Code&lt;/code&gt; action formats code in the desired style of the project.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Build the project.&lt;/p&gt;

    &lt;p&gt;One of the greatest resources in Trino history is &lt;a href=&quot;https://gist.github.com/findepi/04c96f0f60dcc95329f569bb0c44a0cd&quot;&gt;this cheat sheet&lt;/a&gt;
created by &lt;a href=&quot;https://twitter.com/findepi&quot;&gt;Piotr Findeisen&lt;/a&gt;. I use it for some
of the commands, but the most important use is the “fast” build command he
adds at the top. In your terminal, make sure you are located in the root 
directory of the Trino project, and run the following command.&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;./mvnw -pl &apos;!:trino-server-rpm,!:trino-docs,!:trino-proxy,!:trino-verifier,!:trino-benchto-benchmarks&apos; clean install \
-T 2C -nsu \
-DskipTests \
-Dmaven.javadoc.skip=true \
-Dmaven.source.skip=true \
-Dair.check.skip-all=true
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;

    &lt;p&gt;This builds all necessary modules of the project to run almost everything
in Trino. The build excludes some modules, runs the compiler on multiple 
threads, and skips the tests, javadocs, and the Airlift code style checks. If you
would like to run the code style checks on a specific module (e.g. 
trino-elasticsearch), you can run the following command.&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;./mvnw -pl &apos;:trino-elasticsearch&apos; clean install \
-T 2C -nsu \
-DskipTests \
-Dmaven.javadoc.skip=true \
-Dmaven.source.skip=true 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Sign the CLA.&lt;/p&gt;

    &lt;p&gt;Sign the &lt;a href=&quot;https://github.com/trinodb/cla/blob/master/Trino%20Foundation%20Individual%20CLA.pdf&quot;&gt;contributor license agreement (CLA)&lt;/a&gt; 
 to agree that all of your code you commit to the project is subject to the 
 Apache License 2.0. Once you sign the agreement, scan and submit the form to
 &lt;a href=&quot;mailto:cla@trino.io&quot;&gt;cla@trino.io&lt;/a&gt;. This email gets checked every few days,
 and you can check if your name has been added to the &lt;a href=&quot;https://github.com/trinodb/cla/blob/master/contributors&quot;&gt;contributors&lt;/a&gt;
 list.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;At this point you can look for an &lt;a href=&quot;https://github.com/trinodb/trino/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22&quot;&gt;issue labeled “good first issue”&lt;/a&gt;.
This label identifies issues that we think are more approachable for developers who 
aren’t as familiar with the Trino repository yet.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;One final thing before you move on to the contribution process. Before you
start jumping in and changing the code, you’ll also want to create a special
branch for your changes. A branch in Git keeps all the changes you make 
isolated in a separate line of work. If something goes wrong, or you need to 
compare with an older branch, you can do so. The default branch may either be
named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;master&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main&lt;/code&gt;. See &lt;a href=&quot;https://git-scm.com/book/en/v2/Git-Branching-Basic-Branching-and-Merging&quot;&gt;more on branching in git&lt;/a&gt;.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To make a branch for your feature, you can run the following command:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git checkout -b my-feature-branch
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;ol start=&quot;12&quot;&gt;
  &lt;li&gt;Follow the remaining steps in the &lt;a href=&quot;https://trino.io/development/process.html&quot;&gt;contribution process page&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id=&quot;question-of-the-week-how-do-i-remove-nulls-from-an-array-in-trino&quot;&gt;Question of the week: How do I remove nulls from an array in Trino?&lt;/h2&gt;

&lt;p&gt;A &lt;a href=&quot;https://stackoverflow.com/questions/66162776&quot;&gt;question posted to StackOverflow&lt;/a&gt; 
asked the following:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;I’m extracting data from a json column in Trino and getting the output in an 
array like this &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[&apos;AL&apos;, NULL, &apos;NEW&apos;]&lt;/code&gt;. The problem is I need to remove the null since
the array has to be mapped to another array. I tried several options but no luck.
How can I remove the null and get only &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[&apos;AL&apos;, &apos;NEW&apos;]&lt;/code&gt; without unnesting?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href=&quot;https://twitter.com/findepi&quot;&gt;Piotr Findeisen&lt;/a&gt; replied:&lt;/p&gt;

&lt;p&gt;You can use &lt;a href=&quot;https://trino.io/docs/current/functions/array.html#filter&quot;&gt;filter()&lt;/a&gt;
for this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;trino&amp;gt; SELECT filter(ARRAY[&apos;AL&apos;, NULL,&apos;NEW&apos;], e -&amp;gt; e IS NOT NULL);
   _col0
-----------
 [AL, NEW]
(1 row)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
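&lt;p&gt;Since the asker mentions mapping the cleaned array into another array, note
that filter() composes with transform(); this follow-up example is ours, not
part of the original answer:&lt;/p&gt;

```sql
-- Remove nulls first, then map each remaining element (lowercasing here)
SELECT transform(
  filter(ARRAY['AL', NULL, 'NEW'], e -> e IS NOT NULL),
  x -> lower(x));
-- result: [al, new]
```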

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;News&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The “frog” book has been &lt;a href=&quot;https://item.jd.com/10028492426649.html&quot;&gt;translated to Chinese&lt;/a&gt;!
 Keep your eyes peeled for the rebrand into Trino for the translation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/15/training-advanced-sql.html&quot;&gt;Advanced SQL Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/30/training-query-tuning.html&quot;&gt;Query Tuning Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/13/training-security.html&quot;&gt;Security Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/27/training-performance.html&quot;&gt;Performance and Tuning Training&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary></summary>

      
      
    </entry>
  
    <entry>
      <title>19: Data Ingestion to Iceberg and Trino</title>
      <link href="https://trino.io/episodes/19.html" rel="alternate" type="text/html" title="19: Data Ingestion to Iceberg and Trino" />
      <published>2021-06-10T00:00:00+00:00</published>
      <updated>2021-06-10T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/19</id>
      <content type="html" xml:base="https://trino.io/episodes/19.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;Cory Darby, Principal Software Developer at &lt;a href=&quot;https://bluecatnetworks.com/&quot;&gt;BlueCat&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/ckdarby&quot;&gt;@ckdarby&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;release-358&quot;&gt;Release 358&lt;/h2&gt;

&lt;p&gt;Martin:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW STATS&lt;/code&gt; support for arbitrary queries.&lt;/li&gt;
  &lt;li&gt;Performance improvements for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY ... LIMIT&lt;/code&gt; queries on sorted data.&lt;/li&gt;
  &lt;li&gt;Support for Hive views containing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LATERAL VIEW&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Reduced graceful shutdown time&lt;/li&gt;
  &lt;li&gt;A bunch of performance and correctness fixes&lt;/li&gt;
  &lt;li&gt;Removed support for legacy JDBC string in driver &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jdbc:presto:&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More info at &lt;a href=&quot;https://trino.io/docs/current/release/release-358.html&quot;&gt;https://trino.io/docs/current/release/release-358.html&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;release-357&quot;&gt;Release 357&lt;/h2&gt;

&lt;p&gt;Martin:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Support for subquery expressions that produce multiple columns.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CURRENT_CATALOG&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CURRENT_SCHEMA&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Aggregation pushdown for ClickHouse connector.&lt;/li&gt;
  &lt;li&gt;Rule support for identifier mapping in various connectors.&lt;/li&gt;
  &lt;li&gt;New &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;format_number&lt;/code&gt; function.&lt;/li&gt;
  &lt;li&gt;Cast row types as JSON objects.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Print dynamic filters summary in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXPLAIN ANALYZE&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Fix trusted cert usage for OAuth&lt;/li&gt;
  &lt;li&gt;clear command in CLI&lt;/li&gt;
  &lt;li&gt;Numerous smaller connector changes - check your favourite connector&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More at &lt;a href=&quot;https://trino.io/docs/current/release/release-357.html&quot;&gt;https://trino.io/docs/current/release/release-357.html&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-week-ingesting-into-iceberg-with-pulsar-and-flink-at-bluecat&quot;&gt;Concept of the week: Ingesting into Iceberg with Pulsar and Flink at BlueCat&lt;/h2&gt;

&lt;p&gt;Here are Cory’s slides that you can use to follow along while listening to the 
podcast.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;iframe src=&quot;//www.slideshare.net/slideshow/embed_code/key/5KsmZMJtSOoxFx&quot; width=&quot;800&quot; height=&quot;650&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; 
margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; 
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-1905-add-format_number-function&quot;&gt;PR of the week: PR 1905 Add format_number function&lt;/h2&gt;

&lt;p&gt;The
&lt;a href=&quot;https://github.com/trinodb/trino/pull/1905&quot;&gt;PR of the week&lt;/a&gt; is a simple but
very useful PR by maintainer &lt;a href=&quot;https://twitter.com/ebyhr&quot;&gt;Yuya Ebihara&lt;/a&gt;.
It fixes &lt;a href=&quot;https://github.com/trinodb/trino/issues/1878&quot;&gt;issue 1878&lt;/a&gt; by
formatting very large numbers returned from a query as truncated values with a
unit suffix (B for billion, M for million, K for thousand, and so on). Rather 
than reuse the CLI’s 
&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/client/trino-cli/src/main/java/io/trino/cli/FormatUtils.java&quot;&gt;FormatUtils&lt;/a&gt;
class, which missed various cases, he created 
&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/core/trino-main/src/main/java/io/trino/operator/scalar/FormatNumberFunction.java&quot;&gt;his own implementation&lt;/a&gt; 
that handles those cases. Thanks Yuya!&lt;/p&gt;

&lt;h2 id=&quot;demo-showing-the-format_number-functionality&quot;&gt;Demo: Showing the format_number functionality&lt;/h2&gt;

&lt;p&gt;Here are the examples we ran in the show.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT format_number(DOUBLE &apos;1234.5&apos;);

SELECT format_number(DOUBLE &apos;-9223372036854775808&apos;);

SELECT format_number(DOUBLE &apos;9223372036854775807&apos;);

SELECT format_number(REAL &apos;-999&apos;);

SELECT format_number(REAL &apos;999&apos;);

SELECT format_number(DECIMAL &apos;-1000&apos;);

SELECT format_number(DECIMAL &apos;1000&apos;);

SELECT format_number(999999999);

SELECT format_number(1000000000);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;question-of-the-week-how-do-i-search-nested-objects-in-elasticsearch-from-trino&quot;&gt;Question of the week: How do I search nested objects in Elasticsearch from Trino?&lt;/h2&gt;

&lt;p&gt;A &lt;a href=&quot;https://stackoverflow.com/questions/67667313&quot;&gt;question posted to StackOverflow&lt;/a&gt; 
asked how to search nested objects using the Elasticsearch connector.&lt;/p&gt;

&lt;p&gt;Trino maps a &lt;a href=&quot;https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nested&lt;/code&gt;&lt;/a&gt; 
object type to a &lt;a href=&quot;https://trino.io/docs/current/language/types.html#row&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROW&lt;/code&gt;&lt;/a&gt;
the same way that it maps a standard 
&lt;a href=&quot;https://www.elastic.co/guide/en/elasticsearch/reference/current/object.html&quot;&gt;object&lt;/a&gt; 
type during a read. The nested designation itself serves no purpose to Trino 
since it only determines how the object is stored in Elasticsearch.&lt;/p&gt;
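
&lt;p&gt;As a sketch, assuming a hypothetical index named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;user&lt;/code&gt; with a 
nested &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;address&lt;/code&gt; object, the fields of the resulting 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROW&lt;/code&gt; can then be accessed with dot notation:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- index, catalog, and field names here are hypothetical
SELECT name, address.city
FROM elasticsearch.default.&quot;user&quot;
WHERE address.zip = &apos;80301&apos;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;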

&lt;p&gt;Check out &lt;a href=&quot;https://stackoverflow.com/a/67843697/2023810&quot;&gt;Brian’s full answer to this question&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;News&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The “frog” book has been &lt;a href=&quot;https://item.jd.com/10028492426649.html&quot;&gt;translated to Chinese&lt;/a&gt;!
 Keep your eyes peeled for the translation to be rebranded to Trino.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Blogs&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/adobetech/iceberg-at-adobe-88cf1950e866&quot;&gt;Iceberg at Adobe&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/05/03/a-gentle-introduction-to-iceberg.html&quot;&gt;Trino on ice I: A gentle introduction to Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg.html&quot;&gt;Trino on ice II: In-place table evolution and cloud compatibility with Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/30/iceberg-concurrency-snapshots-spec.html&quot;&gt;Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/08/12/deep-dive-into-iceberg-internals.html&quot;&gt;Trino on ice IV: Deep dive into Iceberg internals&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Videos&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Trino Meetup: &lt;a href=&quot;https://www.youtube.com/watch?v=ifXpOn0NJWk&quot;&gt;Apache Iceberg: A table format for data lakes with unforeseen use cases&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/15/training-advanced-sql.html&quot;&gt;Advanced SQL Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/30/training-query-tuning.html&quot;&gt;Query Tuning Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/13/training-security.html&quot;&gt;Security Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/27/training-performance.html&quot;&gt;Performance and Tuning Training&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests Cory Darby, Principal Software Developer at BlueCat (@ckdarby) Release 358</summary>

      
      
    </entry>
  
    <entry>
      <title>18: Trino enjoying the view</title>
      <link href="https://trino.io/episodes/18.html" rel="alternate" type="text/html" title="18: Trino enjoying the view" />
      <published>2021-05-20T00:00:00+00:00</published>
      <updated>2021-05-20T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/18</id>
      <content type="html" xml:base="https://trino.io/episodes/18.html">&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/18/trino-view.png&quot; /&gt;&lt;br /&gt;
Commander Bun Bun enjoying the views...
&lt;/p&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;Anjali Norwood, Senior Open Source Software Engineer at Netflix 
 (&lt;a href=&quot;https://www.linkedin.com/in/anjali-norwood-9521a16/&quot;&gt;@AnjaliNorwood&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-week-trino-views-hive-views-and-materialized-views&quot;&gt;Concept of the week: Trino Views, Hive Views and Materialized Views&lt;/h2&gt;

&lt;p&gt;Before diving into views, it can be helpful to take a step back and consider a 
well-understood abstraction, the table, to understand the purpose of a view.
Tables contain data in a vertical orientation, referred to as columns, and
represent instances of the data in a horizontal orientation, referred to as rows.
See the following &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;customer&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders&lt;/code&gt; tables from the TPC-H dataset.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;customer table&lt;/strong&gt;&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;custkey&lt;/th&gt;
      &lt;th&gt;name&lt;/th&gt;
      &lt;th&gt;nationkey&lt;/th&gt;
      &lt;th&gt;acctbal&lt;/th&gt;
      &lt;th&gt;mktsegment&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;376&lt;/td&gt;
      &lt;td&gt;Customer#000000376&lt;/td&gt;
      &lt;td&gt;16&lt;/td&gt;
      &lt;td&gt;4231.45&lt;/td&gt;
      &lt;td&gt;AUTOMOBILE&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;377&lt;/td&gt;
      &lt;td&gt;Customer#000000377&lt;/td&gt;
      &lt;td&gt;23&lt;/td&gt;
      &lt;td&gt;1043.72&lt;/td&gt;
      &lt;td&gt;MACHINERY&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;378&lt;/td&gt;
      &lt;td&gt;Customer#000000378&lt;/td&gt;
      &lt;td&gt;22&lt;/td&gt;
      &lt;td&gt;5718.05&lt;/td&gt;
      &lt;td&gt;BUILDING&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;strong&gt;orders table&lt;/strong&gt;&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;orderkey&lt;/th&gt;
      &lt;th&gt;custkey&lt;/th&gt;
      &lt;th&gt;orderstatus&lt;/th&gt;
      &lt;th&gt;totalprice&lt;/th&gt;
      &lt;th&gt;orderdate&lt;/th&gt;
      &lt;th&gt;orderpriority&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;1&lt;/td&gt;
      &lt;td&gt;376&lt;/td&gt;
      &lt;td&gt;O&lt;/td&gt;
      &lt;td&gt;172799.49&lt;/td&gt;
      &lt;td&gt;1996-01-02&lt;/td&gt;
      &lt;td&gt;5-LOW&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;376&lt;/td&gt;
      &lt;td&gt;O&lt;/td&gt;
      &lt;td&gt;38426.09&lt;/td&gt;
      &lt;td&gt;1996-12-01&lt;/td&gt;
      &lt;td&gt;1-URGENT&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;3&lt;/td&gt;
      &lt;td&gt;377&lt;/td&gt;
      &lt;td&gt;F&lt;/td&gt;
      &lt;td&gt;205654.3&lt;/td&gt;
      &lt;td&gt;1993-10-14&lt;/td&gt;
      &lt;td&gt;5-LOW&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The columns have a schema that enforces particular data types in particular 
columns and prevents insertion of invalid data into the table by throwing
an exception. This becomes extremely useful when reading and processing the data,
as there is a clear set of operations that can run on certain columns based on 
their type. This information is also useful when deserializing result sets into
various in-memory abstractions. Here is an example of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;customer&lt;/code&gt; table 
schema:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;customer table schema&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE TABLE customer (
   custkey bigint,
   name varchar(25),
   address varchar(40),
   nationkey bigint,
   phone varchar(15),
   acctbal double,
   mktsegment varchar(10),
   comment varchar(117)
)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;views-and-materialized-views&quot;&gt;Views and materialized views:&lt;/h3&gt;

&lt;p&gt;A view is structured like a regular database table in that it has columns,
rows, and a schema. What then do views offer over tables? Views offer a way to
encapsulate complex SQL statements. For example, take this SQL query that runs
over the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;customer&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders&lt;/code&gt; tables
defined before.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT 
 c.custkey, 
 name, 
 nationkey, 
 mktsegment, 
 sumtotalprice, 
 openstatuscount, 
 failedstatuscount, 
 partialstatuscount
FROM 
 customer c 
 JOIN (
  SELECT 
   custkey, 
   SUM(totalprice) AS sumtotalprice, 
   COUNT_IF(orderstatus = &apos;O&apos;) AS openstatuscount,
   COUNT_IF(orderstatus = &apos;F&apos;) AS failedstatuscount, 
   COUNT_IF(orderstatus = &apos;P&apos;) AS partialstatuscount
  FROM orders
  GROUP BY custkey
 ) o
 ON c.custkey = o.custkey;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This query performs some aggregations on the orders table grouped by customer.
Then there is a join performed on the aggregated orders table and customer table
by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;custkey&lt;/code&gt;.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;custkey&lt;/th&gt;
      &lt;th&gt;name&lt;/th&gt;
      &lt;th&gt;nationkey&lt;/th&gt;
      &lt;th&gt;mktsegment&lt;/th&gt;
      &lt;th&gt;sumtotalprice&lt;/th&gt;
      &lt;th&gt;openstatuscount&lt;/th&gt;
      &lt;th&gt;failedstatuscount&lt;/th&gt;
      &lt;th&gt;partialstatuscount&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;376&lt;/td&gt;
      &lt;td&gt;Customer#000000376&lt;/td&gt;
      &lt;td&gt;16&lt;/td&gt;
      &lt;td&gt;AUTOMOBILE&lt;/td&gt;
      &lt;td&gt;1600696.4700000002&lt;/td&gt;
      &lt;td&gt;3&lt;/td&gt;
      &lt;td&gt;6&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;377&lt;/td&gt;
      &lt;td&gt;Customer#000000377&lt;/td&gt;
      &lt;td&gt;23&lt;/td&gt;
      &lt;td&gt;MACHINERY&lt;/td&gt;
      &lt;td&gt;803271.9400000001&lt;/td&gt;
      &lt;td&gt;3&lt;/td&gt;
      &lt;td&gt;6&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;379&lt;/td&gt;
      &lt;td&gt;Customer#000000379&lt;/td&gt;
      &lt;td&gt;7&lt;/td&gt;
      &lt;td&gt;AUTOMOBILE&lt;/td&gt;
      &lt;td&gt;3155009.54&lt;/td&gt;
      &lt;td&gt;7&lt;/td&gt;
      &lt;td&gt;11&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;From here, there are many ways you could further evaluate the resulting data. 
You could filter and look at which market segment is spending the most on your
products. You could also look at where there are the most failed orders by the
nation column to evaluate where shipping lines may need to be improved. The 
table above, which results from the example query, is a good intermediate state 
of the data that can be reused for many future evaluations. Instead of defining
a new table, you can create a view on this data that encapsulates the complex
SQL that was used to calculate it. This is done using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE VIEW&lt;/code&gt; 
statement.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE VIEW customer_orders_view AS 
&amp;lt;complex SQL query above&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now, when you want to run any further analysis on this intermediate dataset, you
simply refer to the view instead of having to rewrite the statement above. As
mentioned, this view also has a schema and is treated much like a table when the
query engine does its planning. In this way it is also easier to map the data to
the application logic by enabling different shapes of the same data. Note that
these views are read-only and do not allow inserting, updating, or deleting
through the view.&lt;/p&gt;
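
&lt;p&gt;For example, a follow-up analysis can query the view like any table:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT mktsegment, SUM(sumtotalprice) AS segmenttotal
FROM customer_orders_view
GROUP BY mktsegment
ORDER BY segmenttotal DESC;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;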

&lt;p&gt;Another reason to create a view is to control read access to the data. You
get to choose which columns and rows are filtered out and which are returned
when users query the view. The authorization of a user is tied to the view and
its content, which can differ significantly from the complete data in the
underlying tables. For example, views can exclude sensitive data like social
security numbers, birth dates, credit card numbers, and many other facts.&lt;/p&gt;
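
&lt;p&gt;As a sketch, assuming a hypothetical &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;employee&lt;/code&gt; table with sensitive 
columns, a view can expose only a safe subset:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- table and column names here are hypothetical
CREATE VIEW employee_public AS
SELECT name, department, hire_date
FROM employee;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;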

&lt;p&gt;When creating a view, there are two security modes that determine which user 
runs the query defined in the view at query time. The view query can run as the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DEFINER&lt;/code&gt;, the user that created the view, or as the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INVOKER&lt;/code&gt;, the
user that runs the outer query against the view. The default mode is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DEFINER&lt;/code&gt;. See more 
&lt;a href=&quot;https://trino.io/docs/current/sql/create-view.html#security&quot;&gt;in the security section of the create view documentation&lt;/a&gt;.&lt;/p&gt;
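
&lt;p&gt;The mode is set with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SECURITY&lt;/code&gt; clause of 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE VIEW&lt;/code&gt;, for example:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- the view name here is hypothetical
CREATE OR REPLACE VIEW customer_names SECURITY INVOKER AS
SELECT custkey, name
FROM customer;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;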

&lt;p&gt;There are two types of views: logical and materialized views. The view defined
above is a standard logical view that gets expanded into its definition at query
time. Logical views do not provide any performance benefit, since no data is 
stored; the underlying query runs each time the view is queried. Materialized 
views persist the view data upon view creation by storing the query results.&lt;/p&gt;

&lt;p&gt;Materialized views make the overall query much faster to run, as part of the
query has already been computed. One issue with materialized views is that the 
data may become outdated and out of sync with the underlying table data. To keep 
the materialized view in sync with its tables, you have to refresh the view by 
issuing the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;REFRESH MATERIALIZED VIEW&lt;/code&gt; command 
periodically, or by scheduling it to run automatically.&lt;/p&gt;
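
&lt;p&gt;As a sketch, reusing table and column names from the earlier examples, a 
materialized view is created and refreshed like this (the view name is 
hypothetical):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE MATERIALIZED VIEW customer_order_totals AS
SELECT custkey, SUM(totalprice) AS sumtotalprice
FROM orders
GROUP BY custkey;

REFRESH MATERIALIZED VIEW customer_order_totals;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;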

&lt;h3 id=&quot;trino-views-so-many-views-so-little-time&quot;&gt;Trino views: So many views, so little time&lt;/h3&gt;

&lt;p&gt;View handling in Trino depends on the connector. In general, most connectors
expose views to Trino as if they are another set of tables available for Trino
to query. The main exceptions to this are the Hive and Iceberg connectors. The 
table below lists the currently possible Hive and Iceberg views.&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
  &lt;tr&gt;
    &lt;th colspan=&quot;2&quot;&gt;&lt;/th&gt;
    &lt;th&gt;Logical&lt;/th&gt;
    &lt;th&gt;Materialized&lt;/th&gt;
  &lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
  &lt;tr&gt;
    &lt;td rowspan=&quot;2&quot;&gt;Trino Created View&lt;/td&gt;
    &lt;td&gt;Hive Connector&lt;/td&gt;
    &lt;td&gt;✅&lt;/td&gt;
    &lt;td&gt;❌&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;Iceberg Connector&lt;/td&gt;
    &lt;td&gt;✅ (Edit: &lt;a href=&quot;https://github.com/trinodb/trino/pull/8540&quot;&gt;PR 8540&lt;/a&gt;)&lt;/td&gt;
    &lt;td&gt;✅&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td colspan=&quot;2&quot;&gt;Hive Created View&lt;/td&gt;
    &lt;td&gt;✅ (read-only)&lt;/td&gt;
    &lt;td&gt;✅ (read-only)&lt;/td&gt;
  &lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;You’ll notice that the materialized views cannot be created through the Hive
connector in Trino. You will get the following exception:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Caused by: java.sql.SQLException: Query failed (#...): 
This connector does not support creating materialized views.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Also, you cannot create logical views in Iceberg and you will get the following
exception:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Caused by: java.sql.SQLException: Query failed (#...): 
This connector does not support creating views.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 id=&quot;trino-reads-hive-views&quot;&gt;Trino reads Hive views&lt;/h4&gt;

&lt;p&gt;Before Trino there was Hive. Trino is a replacement for the Hive runtime for 
many users, and it is very useful for these users to also be able to read data 
from Hive views in Trino. Trino always aims to be compatible with as many Hive abstractions
as possible to make migrating away from Hive to Trino as painless as possible. 
So Trino supports reading data from Hive Views, though it doesn’t support 
updates on these views. You have to update these views through Hive and ideally
you will gradually migrate these views to Trino over time. Trino also supports
reading Hive materialized views, though Trino reads these as just another Hive 
table, since they are stored similarly to standard Hive tables. Since
Hive views are defined in HiveQL, the view definitions need to be translated to
Trino SQL syntax. This is done using LinkedIn’s Coral library.&lt;/p&gt;

&lt;h4 id=&quot;coral-the-unifier-of-the-bee-and-the-bunny&quot;&gt;Coral: the unifier of the bee and the bunny&lt;/h4&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/linkedin/coral&quot;&gt;Coral&lt;/a&gt; is a project that allows for 
translation between views from different SQL syntaxes. It can process HiveQL 
statements and convert them to an internal representation using
&lt;a href=&quot;https://calcite.apache.org/&quot;&gt;Apache Calcite&lt;/a&gt;. It then converts the internal
representation to Trino SQL.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;100%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/18/coral.png&quot; /&gt;
&lt;/p&gt;

&lt;h4 id=&quot;trino-reading-hive-view-sequence-diagrams&quot;&gt;Trino reading Hive view sequence diagrams&lt;/h4&gt;

&lt;p&gt;In both of these sequence diagrams, notice that the first actions are to create
a Hive view. The view is created and maintained by the Hive system, and it cannot 
be created or updated from Trino.&lt;/p&gt;

&lt;p&gt;This diagram shows the creation of a Hive view, then shows the sequence of events 
when Trino reads that view.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/18/hive-view-sequence.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;This diagram shows the creation of a Hive materialized view, then shows the 
sequence of events when Trino reads the materialized view.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/18/hive-materialized-view-sequence.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h4 id=&quot;trino-native-view-sequence-diagrams&quot;&gt;Trino native view sequence diagrams&lt;/h4&gt;

&lt;p&gt;This diagram shows the sequence diagram for a Trino view that is created using 
the Hive Connector.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/18/trino-view-hive-connector-sequence.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;This diagram shows the sequence diagram for a materialized Trino view that is 
created using the Iceberg Connector.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/18/trino-materialized-view-iceberg-connector-sequence.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h3 id=&quot;iceberg-materialized-view-refresh-currently-only-full-refresh-in-iceberg-connector&quot;&gt;Iceberg materialized view refresh (currently only full refresh in Iceberg connector)&lt;/h3&gt;

&lt;p&gt;Ideally, as the tables underlying a materialized view change, the materialized
view should be automatically and incrementally updated to keep its results 
in sync with the latest data.&lt;/p&gt;

&lt;p&gt;Automatically keeping materialized views fresh can be tricky from a resource 
management point of view, since the computation to materialize the materialized 
view can be expensive. Trino currently does not support automatic refresh of 
materialized views. It instead supports the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;REFRESH MATERIALIZED VIEW&lt;/code&gt; command 
that the user can issue to ensure that the materialized view is fresh.&lt;/p&gt;

&lt;p&gt;As a part of executing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;REFRESH MATERIALIZED VIEW&lt;/code&gt; command in Trino, existing
data in the materialized view is dropped and new data is inserted if there are 
any changes to base data. If the base data has not changed at all, the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;REFRESH MATERIALIZED VIEW&lt;/code&gt; command is a no-op.&lt;/p&gt;

&lt;p&gt;What happens if the user issues a query against the materialized view, and the 
materialized view is not fresh? Trino detects that the materialized view is 
stale, so it expands the materialized view definition, much like a logical view 
and executes that SQL statement. Trino runs the query against the base tables.&lt;/p&gt;

&lt;p&gt;Incremental or delta refresh of materialized views is a more efficient way of
keeping the materialized view in sync with the base data. An incremental refresh 
means that only the parts of the data that need to be updated in a materialized 
view are updated. The rest of the data is left untouched. For example, say you 
have a base table, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sales&lt;/code&gt;, partitioned on the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date&lt;/code&gt; column, that only 
receives inserts for the current day. If the materialized view is also 
partitioned on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date&lt;/code&gt;, a new partition for a day can be added and populated, 
while data for previous days and months is still fresh and can be left 
untouched. This is something on Netflix’s roadmap: one option is a 
partition-level refresh, and another is a more granular row-level refresh using 
functionality similar to the SQL &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; statement.&lt;/p&gt;
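Trino does not implement any of this today, but the partition-level idea can be sketched in plain SQL. Everything here is hypothetical and follows the example above: the catalog name, the `sales` base table, the `sales_mv` storage table backing the materialized view, and the `date` partitioning are all assumptions.

```sql
-- Hypothetical partition-level refresh: rebuild only today's partition
-- of the storage table that backs the materialized view.
DELETE FROM iceberg.mart.sales_mv
WHERE date = DATE '2021-05-16';

INSERT INTO iceberg.mart.sales_mv
SELECT date, mktsegment, sum(totalprice) AS total_sales
FROM iceberg.mart.sales
WHERE date = DATE '2021-05-16'
GROUP BY date, mktsegment;
```

A row-level refresh would instead replace the DELETE/INSERT pair with a single MERGE statement keyed on the grouping columns.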

&lt;h3 id=&quot;support-in-trino-and-at-netflix&quot;&gt;Support in Trino and at Netflix:&lt;/h3&gt;

&lt;h4 id=&quot;netflix-materialized-views&quot;&gt;Netflix materialized views&lt;/h4&gt;

&lt;p&gt;The main reason Netflix is interested in materialized views is to give analysts 
an easy way to compute and materialize their frequently used queries and keep 
the results refreshed without relying on an ETL pipeline to create and maintain 
those result sets. Some materialized views are as simple as queries that project
columns and apply filters, selecting data for a time range or for a test-id. 
Others are more complex, performing multi-level joins and aggregations.&lt;/p&gt;

&lt;h4 id=&quot;netflix-materialized-view-cross-compatibility-extension&quot;&gt;Netflix materialized view cross compatibility extension&lt;/h4&gt;

&lt;p&gt;Materialized views, much like logical views, are compatible across Trino and 
Spark, the two main engines used at Netflix. Spark is used at Netflix for ETL, 
including creating and populating tables. Trino is the most popular engine with 
analysts and developers for ad hoc and experimental queries as well as audits.&lt;/p&gt;

&lt;p&gt;Trino is also used for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE TABLE AS SELECT&lt;/code&gt; (CTAS) in some use cases. Both 
engines access data from tables using the Iceberg and Hive connectors, where the
data is stored in S3. Netflix built upon the Trino logical views to create 
common views that are accessible from both Spark and Trino. The difference 
between the Trino logical views and Netflix common views is where the metadata 
lives: Trino logical views store it in the Hive metastore, while common views 
store their metadata in JSON format in S3.&lt;/p&gt;

&lt;p&gt;A view object in the Hive metastore points to the S3 location of the metadata. 
It tracks the evolution of the view definition as versions, so that you can 
potentially revert a view to an older version. The main benefit of common views 
is interoperability between Spark and Trino: you can create, replace, query, and 
drop views from either engine, and support can be expanded to other engines. 
Netflix supports common views through both the Hive and Iceberg connectors.&lt;/p&gt;

&lt;p&gt;Currently, common views support SQL syntax common to both Spark and Trino. This 
support can be expanded in the future using LinkedIn’s Coral project, such that 
engine-specific syntax and semantics can be translated and interpreted by 
another engine. Netflix materialized views are an extension of Trino 
materialized views that makes them interoperable between Spark and Trino. The 
only difference between Trino and Netflix materialized views is where the 
metadata is stored, very similar to Trino and Netflix logical views.&lt;/p&gt;

&lt;h3 id=&quot;roadmap&quot;&gt;Roadmap:&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Netflix is looking into caching query results using materialized views and 
  memory connector.&lt;/li&gt;
  &lt;li&gt;Incremental refresh ideas.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-4832-add-iceberg-support-for-materialized-views&quot;&gt;PR of the week: PR 4832 Add Iceberg support for materialized views&lt;/h2&gt;

&lt;p&gt;Our guest, Anjali, is the author of this week’s 
&lt;a href=&quot;https://github.com/trinodb/trino/pull/4832&quot;&gt;PR of the week&lt;/a&gt;, which adds Iceberg
support for materialized views. Thanks, Anjali!&lt;/p&gt;

&lt;p&gt;Honorable PR mentions:&lt;/p&gt;

&lt;p&gt;In order for the PR of the week to work, Anjali 
&lt;a href=&quot;https://github.com/trinodb/trino/pull/3283&quot;&gt;added syntax support&lt;/a&gt; for Trino 
materialized views with commands: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE MATERIALIZED VIEW&lt;/code&gt;, 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;REFRESH MATERIALIZED VIEW&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DROP MATERIALIZED VIEW&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Before any of this was done, user &lt;a href=&quot;https://github.com/laurachenyu&quot;&gt;laurachenyu&lt;/a&gt; 
&lt;a href=&quot;https://github.com/trinodb/trino/pull/4661&quot;&gt;integrated Coral with Trino to enable querying Hive views&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;demo-showing-the-different-views-in-trino&quot;&gt;Demo: Showing the different views in Trino&lt;/h2&gt;

&lt;p&gt;In Trino, create some Hive tables in a Hive catalog named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hdfs&lt;/code&gt; that represents
the underlying storage Trino writes to.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE SCHEMA hdfs.tiny
WITH (location = &apos;/tiny/&apos;);

CREATE TABLE hdfs.tiny.customer
WITH (
  format = &apos;ORC&apos;,
  external_location = &apos;/tiny/customer/&apos;
) 
AS SELECT * FROM tpch.tiny.customer;

CREATE TABLE hdfs.tiny.orders
WITH (
  format = &apos;ORC&apos;,
  external_location = &apos;/tiny/orders/&apos;
) 
AS SELECT * FROM tpch.tiny.orders;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now, create a logical Hive view (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive_view&lt;/code&gt;), and a materialized Hive view
(&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive_materialized_view&lt;/code&gt;) from the Hive CLI.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;USE tiny;

CREATE VIEW hive_view AS 
SELECT c.custkey, c.name, nationkey, mktsegment, orderstatus, totalprice, orderpriority, orderdate 
FROM customer c JOIN orders o ON c.custkey = o.custkey;

CREATE MATERIALIZED VIEW hive_materialized_view AS
SELECT c.custkey, c.name, nationkey, mktsegment, orderstatus, totalprice, orderpriority, orderdate 
FROM customer c JOIN orders o ON c.custkey = o.custkey;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As you create the views, you can check their state in the Hive metastore.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT t.TBL_NAME, t.TBL_TYPE, t.VIEW_EXPANDED_TEXT, t.VIEW_ORIGINAL_TEXT 
FROM DBS d
 JOIN TBLS t ON d.DB_ID = t.DB_ID
WHERE d.NAME = &apos;tiny&apos;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once the Hive views exist, create the Trino views and query everything from Trino. Note that two of the statements below fail, as explained in the comments.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE VIEW hdfs.tiny.trino_view AS 
SELECT c.custkey, c.name, nationkey, mktsegment, orderstatus, totalprice, orderpriority, orderdate 
FROM hdfs.tiny.customer c JOIN hdfs.tiny.orders o ON c.custkey = o.custkey;

/* Fails: Caused by: java.sql.SQLException: Query failed (#20210516_032433_00002_6syuw): 
This connector does not support creating materialized views */
CREATE MATERIALIZED VIEW hdfs.tiny.trino_materialized_view AS 
SELECT c.custkey, c.name, nationkey, mktsegment, orderstatus, totalprice, orderpriority, orderdate 
FROM hdfs.tiny.customer c JOIN hdfs.tiny.orders o ON c.custkey = o.custkey;

/* Fails: Caused by: java.sql.SQLException: Query failed (#20210516_101856_00009_ihjur): 
This connector does not support creating views */
CREATE VIEW iceberg.tiny.iceberg_view AS 
SELECT c.custkey, c.name, nationkey, mktsegment, orderstatus, totalprice, orderpriority, orderdate 
FROM hdfs.tiny.customer c JOIN hdfs.tiny.orders o ON c.custkey = o.custkey;

CREATE MATERIALIZED VIEW iceberg.tiny.iceberg_materialized_view AS 
SELECT c.custkey, c.name, nationkey, mktsegment, orderstatus, totalprice, orderpriority, orderdate 
FROM hdfs.tiny.customer c JOIN hdfs.tiny.orders o ON c.custkey = o.custkey;

/* 
This REFRESH call failed during the show due to the fact that I created the 
materialized Trino view in the Iceberg (`iceberg`) catalog using tables from the
Hive(`hdfs`) catalog. I should have created the materialized view using the
iceberg catalog:

CREATE MATERIALIZED VIEW iceberg.tiny.iceberg_materialized_view AS 
SELECT c.custkey, c.name, nationkey, mktsegment, orderstatus, totalprice, orderpriority, orderdate 
FROM iceberg.tiny.customer c JOIN iceberg.tiny.orders o ON c.custkey = o.custkey;
*/
REFRESH MATERIALIZED VIEW iceberg.tiny.iceberg_materialized_view;

/* query tables */

SELECT * FROM hdfs.tiny.customer LIMIT 3;

SELECT * FROM hdfs.tiny.orders LIMIT 3;

/* query views */

SELECT * FROM hdfs.tiny.trino_view LIMIT 3;

SELECT * FROM hdfs.tiny.hive_view LIMIT 3;

SELECT * FROM hdfs.tiny.hive_materialized_view LIMIT 3;

SELECT * FROM iceberg.tiny.iceberg_materialized_view LIMIT 3;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;question-of-the-week-are-jdbc-drivers-backwards-compatible-with-older-trino-versions&quot;&gt;Question of the week: Are JDBC drivers backwards compatible with older Trino versions?&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Full question:&lt;/strong&gt; Are JDBC drivers backwards compatible with older Trino 
versions? I’m trying to install the 354 driver on a multi-tenanted Tableau 
server where there might be older Trino versions in play. Do I need to upgrade 
my Trino clients right away when upgrading my server to Trino version from 
&amp;lt;=350 to &amp;gt;350?&lt;/p&gt;

&lt;p&gt;For this particular user’s case, the answer is that they won’t need to upgrade 
their clients, assuming their servers are already running Trino. If their server 
versions are PrestoSQL &amp;lt;= 350, then they will need to hold off on upgrading to a 
Trino client.&lt;/p&gt;

&lt;p&gt;Trino’s JDBC drivers typically maintain compatibility with older server versions
(and vice versa). However, the project was renamed from PrestoSQL to Trino 
starting version 351, and as a consequence, JDBC drivers with version &amp;gt;= 351 are
not compatible with servers with version &amp;lt;= 350. More details at:
&lt;a href=&quot;https://trino.io/blog/2021/01/04/migrating-from-prestosql-to-trino.html&quot;&gt;https://trino.io/blog/2021/01/04/migrating-from-prestosql-to-trino.html&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In short, you can have a PrestoSQL client with a Trino server, but you can’t 
have a Trino client with a PrestoSQL server.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Join us for an awesome event on May 26th as Iceberg creator Ryan Blue dives 
 into some interesting and less conventional use cases of Apache Iceberg:
 &lt;a href=&quot;https://www.meetup.com/trino-americas/events/278103777/&quot;&gt;Trino Americas meetup&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://engineering.linkedin.com/blog/2020/coral&quot;&gt;https://engineering.linkedin.com/blog/2020/coral&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Videos&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.arcadiadata.com/lp/tech-talk-on-join-optimization/&quot;&gt;https://www.arcadiadata.com/lp/tech-talk-on-join-optimization/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/15/training-advanced-sql.html&quot;&gt;Advanced SQL Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/30/training-query-tuning.html&quot;&gt;Query Tuning Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/13/training-security.html&quot;&gt;Security Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/27/training-performance.html&quot;&gt;Performance and Tuning Training&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Commander Bun Bun enjoying the views...</summary>

      
      
    </entry>
  
    <entry>
      <title>17: Trino connector resurfaces API calls</title>
      <link href="https://trino.io/episodes/17.html" rel="alternate" type="text/html" title="17: Trino connector resurfaces API calls" />
      <published>2021-05-13T00:00:00+00:00</published>
      <updated>2021-05-13T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/17</id>
      <content type="html" xml:base="https://trino.io/episodes/17.html">&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/17/trino-resurface.png&quot; /&gt;&lt;br /&gt;
Commander Bun Bun is diving deep to find anomalies!
&lt;/p&gt;

&lt;h2 id=&quot;resurface-links&quot;&gt;Resurface links&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://resurface.io/&quot;&gt;Resurface site&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/resurfaceio&quot;&gt;Resurface GitHub&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://resurface.io/slack&quot;&gt;Resurface Slack&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;Rob Dickinson, Co-founder and CEO of &lt;a href=&quot;https://resurface.io/&quot;&gt;Resurface&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/robfromboulder&quot;&gt;@robfromboulder&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Martin Traverso, creator of Trino/Presto, and CTO at 
 &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt; (&lt;a href=&quot;https://twitter.com/mtraverso&quot;&gt;@mtraverso&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-week-resurface-and-the-resurface-connector&quot;&gt;Concept of the week: Resurface and the Resurface connector&lt;/h2&gt;

&lt;h3 id=&quot;what-is-resurface&quot;&gt;What is Resurface?&lt;/h3&gt;
&lt;p&gt;Resurface is an API system of record, which is a fancy way of saying that 
Resurface is a purpose-built database for API requests and responses. Like a 
weblog or access log, but on steroids because Resurface runs on Trino.&lt;/p&gt;

&lt;p&gt;Why do you need a system of record for your APIs? Because otherwise you’re 
guessing about how your APIs are used and attacked, and guessing doesn’t feel 
good. Resurface helps your DevOps and security teams instantly find API 
failures, slowdowns, and attacks – easily, responsibly, and at scale.&lt;/p&gt;

&lt;h3 id=&quot;how-resurface-differs-from-logs--metrics&quot;&gt;How Resurface differs from logs &amp;amp; metrics&lt;/h3&gt;
&lt;p&gt;You probably use system monitoring tools, which tell you what’s happening on 
your systems: what code is running, what code is slow, and what error codes are 
returned. That’s all great, but it still leaves a big gap between the 
system-level events you can see and what your API consumers actually see.&lt;/p&gt;

&lt;p&gt;Resurface helps you fill this gap with your own API system of record. Now your 
customers, your DevOps team, and your security team all have the same view of 
every transaction, because there is a record of the requests and responses.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/17/resurface-tcb1.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;The other obvious way to compare Resurface against other tools is to look at the
data model. System monitoring gives you time-series metrics, or timestamped log
messages with a severity and detail string. Resurface gives you all the request
and response data fields, including headers and payloads, in a schema where all
of those fields are discrete and searchable. Plus it adds a bunch of helpful
virtual and computed columns.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/17/resurface-tcb2.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h3 id=&quot;the-indexing-problem&quot;&gt;The indexing problem&lt;/h3&gt;

&lt;p&gt;Resurface has a very descriptive data model, but there’s a problem here – how
to partition and index this data for efficient searching. Partitioning based on
time is the obvious starting point, but within a time range, what then? Index
everything?&lt;/p&gt;

&lt;p&gt;Most databases work best when a subset of the columns are constrained at once –
but in their case, they have strong reasons for wanting to use all columns at 
once. A system monitoring tool might give you a count of “500 codes” – but they
want to detect silent failures, like malformed JSON payloads or airline tickets 
selling for less than twenty dollars. That means looking at the URL, content 
type, other headers, and payloads, all at the same time.&lt;/p&gt;

&lt;p&gt;They also want to classify kinds of API consumers by their behaviors – are they
using or attacking your API? To classify those behaviors, they again look at the
URL, content type, and payloads. If they can query for the yellow region below,
they find lost revenue that they can recover.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/17/resurface-tcb4.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;Now you might be thinking – maybe the best solution is to do all this 
processing when the API calls are captured, but then how would you identify a 
new zero-day failure or attack? The definition of “responses failed” and 
“threats” needs to be changeable without having to reprocess any data, which 
really favors query-time processing.&lt;/p&gt;

&lt;p&gt;The example below is pretty much as simple as this gets. I struggled to find 
one of these queries that actually fits in a reasonable amount of space.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/17/resurface-tcb5.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;So how to build a database that does these kinds of queries in reasonable time?&lt;/p&gt;

&lt;h3 id=&quot;the-resurface-connector&quot;&gt;The Resurface connector&lt;/h3&gt;

&lt;p&gt;The first prototype actually used the Trino memory connector, which gave them 
the kind of query performance that they were looking for, but wasn’t shippable 
(for obvious reasons).&lt;/p&gt;

&lt;p&gt;Then they tried Redis as a replacement in-memory database, but the problem is 
that every query pulls all the data in Redis over the network. Not cool.&lt;/p&gt;

&lt;p&gt;Trino allows you to move the queries closer to the data, and so that’s what they
did. They took inspiration from the “local file” connector, where the connector
reads directly from the filesystem instead of over the network.&lt;/p&gt;

&lt;p&gt;Then the question was, what file format to use?  They tried JSON, CSV, Protocol
Buffers, and ultimately found the fastest and simplest approach was just to
write a simple binary file format that requires no real parsing. When these
files fit in memory, their connector can process SQL queries at 4GB/sec per core. 
The connector was easy to write because they’re just mapping between fields in 
the binary files and the columns exposed to Trino. They built the first version
of their connector in a weekend!&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/17/resurface-tcb3.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h3 id=&quot;why-not-just-use-avro&quot;&gt;Why not just use Avro?&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;Simple requirements – basic versioning, no secondary objects, limited data 
 types&lt;/li&gt;
  &lt;li&gt;Zero-allocation reader for fast linear scan – one memcpy per physical column&lt;/li&gt;
  &lt;li&gt;Connector can report null/not-null without type conversion&lt;/li&gt;
  &lt;li&gt;Connector defers type conversion until getXXX() method&lt;/li&gt;
  &lt;li&gt;getSlice() just wraps an existing buffer (zero allocation)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most of these optimizations were realized by working backwards from the Trino 
connector API to get the best linear scan performance imaginable.&lt;/p&gt;

&lt;h3 id=&quot;combining-api-calls-with-other-data&quot;&gt;Combining API calls with other data&lt;/h3&gt;

&lt;p&gt;Now they can deliver API call data out to all the different kinds of SQL clients 
out there, and they’re also able to combine API call data with data stored in 
other databases.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/17/resurface-tcb6.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;This is really exciting because your Resurface database plays nicely with all 
your other databases that are bridged together with Trino. That means that 
actual API traffic can be brought into your customer data mart, or combined 
with data from any other systems, in real time!&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-4022-add-soundex-function&quot;&gt;PR of the week: PR 4022 Add Soundex function&lt;/h2&gt;

&lt;p&gt;A big shoutout to &lt;a href=&quot;https://github.com/tooptoop4&quot;&gt;tooptoop4&lt;/a&gt; for contributing this week’s
&lt;a href=&quot;https://github.com/trinodb/trino/pull/4022&quot;&gt;PR of the week&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This PR adds the &lt;a href=&quot;https://en.wikipedia.org/wiki/Soundex&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;soundex()&lt;/code&gt; function&lt;/a&gt;, 
a phonetic function that encodes words by how they sound. Such functions show up 
in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WHERE&lt;/code&gt; clause of a
query to find words that sound similar. There are a few examples in the demo
below.&lt;/p&gt;

&lt;p&gt;Thanks for this awesome contribution!&lt;/p&gt;
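For intuition about how those matches work, here is a minimal Python sketch of the classic American Soundex algorithm. This is an independent reimplementation for illustration only; Trino's built-in `soundex()` may differ in edge cases.

```python
def soundex(name):
    """Classic American Soundex: first letter plus three digits."""
    codes = {}
    for group, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")]:
        for ch in group:
            codes[ch] = digit
    # Keep only letters; the demo output suggests spaces are ignored.
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = codes.get(letters[0], "")
    for c in letters[1:]:
        if c in "hw":            # h and w do not separate duplicate codes
            continue
        code = codes.get(c, "")  # vowels map to "" and reset prev
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]  # pad with zeros, truncate to 4 chars

print(soundex("Christine"), soundex("Chris"), soundex("Kristine"))
# prints: C623 C620 K623
```

These codes line up with the demo results below: `Christine`, `Christ`, and `Christian` all encode to `C623`, while `Chris` stops short at `C620` and `Kristine` keeps its distinct first letter.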

&lt;h2 id=&quot;demo-using-the-soundex-function&quot;&gt;Demo: Using the soundex function&lt;/h2&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;
SELECT * 
FROM (
  VALUES 
  (1, &apos;Bri&apos;), 
  (2, &apos;Bree&apos;), 
  (3, &apos;Bryan&apos;), 
  (4, &apos;Brian&apos;), 
  (5, &apos;Briann&apos;), 
  (6, &apos;Brianna&apos;), 
  (7, &apos;Briannas&apos;),
  (8, &apos;Bri Jan&apos;),  
  (9, &apos;Bri Yan&apos;),  
  (10, &apos;Bob&apos;)
) names(id, name)
WHERE soundex(name) = soundex(&apos;Brian&apos;);

# Results:
# |id |name   |
# |---|-------|
# |3  |Bryan  |
# |4  |Brian  |
# |5  |Briann |
# |6  |Brianna|
# |9  |Bri Yan|

SELECT * 
FROM (
  VALUES 
  (1, &apos;Man&apos;), 
  (2, &apos;Fred&apos;), 
  (3, &apos;Manfred&apos;), 
  (4, &apos;Can fed&apos;), 
  (5, &apos;Tan bed&apos;), 
  (6, &apos;Man Fred&apos;), 
  (7, &apos;Man dread&apos;), 
  (8, &apos;Bob&apos;)
) names(id, name)
WHERE soundex(name) = soundex(&apos;Manfred&apos;);

# Results:
# |id |name    |
# |---|--------|
# |3  |Manfred |
# |6  |Man Fred|

SELECT * 
FROM (
  VALUES 
  (1, &apos;Martin&apos;), 
  (2, &apos;Mar teen&apos;), 
  (3, &apos;Mar tin&apos;), 
  (4, &apos;Marteen&apos;), 
  (5, &apos;Mart in&apos;)
) names(id, name)
WHERE soundex(name) = soundex(&apos;Martin&apos;);

# Results:
# |id |name    |
# |---|--------|
# |1  |Martin  |
# |2  |Mar teen|
# |3  |Mar tin |
# |4  |Marteen |
# |5  |Mart in |

SELECT * 
FROM (
  VALUES 
  (1, &apos;Robert&apos;), 
  (2, &apos;Rob&apos;), 
  (3, &apos;Bob&apos;), 
  (4, &apos;Bobert&apos;), 
  (5, &apos;Bobby&apos;)
) names(id, name)
WHERE soundex(name) = soundex(&apos;Rob&apos;);

# Results:
# |id |name|
# |---|----|
# |2  |Rob |


SELECT * 
FROM (
  VALUES 
  (1, &apos;Christ&apos;), 
  (2, &apos;Christeen&apos;), 
  (3, &apos;Christian&apos;), 
  (4, &apos;Christine&apos;), 
  (5, &apos;Chris&apos;), 
  (6, &apos;Kristine&apos;)
) names(id, name)
WHERE soundex(name) = soundex(&apos;Christine&apos;);

# Results:
# |id |name     |
# |---|---------|
# |1  |Christ   |
# |2  |Christeen|
# |3  |Christian|
# |4  |Christine|

# What the results actually return

SELECT name, soundex(name)
FROM (
  VALUES 
  (1, &apos;Christ&apos;), 
  (2, &apos;Christeen&apos;), 
  (3, &apos;Christian&apos;), 
  (4, &apos;Christine&apos;), 
  (5, &apos;Chris&apos;), 
  (6, &apos;Kristine&apos;)
) names(id, name);

# Results:
# |name     |_col1|
# |---------|-----|
# |Christ   |C623 |
# |Christeen|C623 |
# |Christian|C623 |
# |Christine|C623 |
# |Chris    |C620 |
# |Kristine |K623 |

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;question-of-the-week-how-to-export-query-results-into-a-file-eg-ctas-but-into-a-single-file&quot;&gt;Question of the week: How to export query results into a file (e.g. CTAS, but into a single file)?&lt;/h2&gt;

&lt;p&gt;This is possible using the &lt;a href=&quot;https://trino.io/docs/current/client/cli.html&quot;&gt;Trino CLI&lt;/a&gt;’s
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--execute&lt;/code&gt; option in conjunction with the redirect operator (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;gt;&lt;/code&gt;). You may also
use other options, such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--output-format&lt;/code&gt;, to specify the format of the data
written to the file (e.g. CSV, TSV, or JSON, with or without headers).&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Output format for batch mode [ALIGNED, VERTICAL, TSV, TSV_HEADER, CSV, 
CSV_HEADER, CSV_UNQUOTED, CSV_HEADER_UNQUOTED, JSON, NULL] (default: CSV)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here is an example of the command you would run using the cli executable &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;trino --execute &quot;select * from tpch.sf1.customer limit 5&quot; \
--server http://localhost:8080 \
--output-format CSV_HEADER &amp;gt; customer.csv
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If you’re running Trino in Docker, here is an example command to run this in a
temporary Trino container.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;docker run --rm -ti \
    --network=trino-hdfs3_trino-network \
    --name export-trino-data \
    trinodb/trino:latest \
    trino --execute &quot;select * from tpch.sf1.customer limit 5&quot; \
    --server http://trino-coordinator:8080 \
    --output-format CSV_HEADER &amp;gt; customer.csv
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If you have a very complex query that takes up multiple lines, or you don’t 
want to spend half of your day escaping quotation marks, you can put your SQL 
into a file and reference it using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-f&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--file&lt;/code&gt; option. The command 
above could then be written as:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;trino --file query.sql \
--server http://localhost:8080 \
--output-format CSV_HEADER &amp;gt; customer.csv
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This command, along with the following &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.sql&lt;/code&gt; file, produces an equivalent result:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;select * 
from tpch.sf1.customer 
limit 5;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Finally, one last trick is to stage the data using the memory connector and 
then export it. The &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;Trino Definitive Guide&lt;/a&gt; 
has an example of adding the Iris data set into memory connector storage with 
the CLI.&lt;/p&gt;
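As a rough sketch of that trick, assuming a catalog named `memory` backed by the memory connector (the catalog and table names here are assumptions, not from the episode):

```sql
-- Stage the result set once in the memory connector so it can be
-- exported repeatedly without re-running the original query.
CREATE TABLE memory.default.customer_stage AS
SELECT * FROM tpch.sf1.customer LIMIT 5;
```

You can then export the staged table with the `--execute` flag and a redirect, exactly as in the earlier examples, possibly several times or in different output formats.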

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Apache Iceberg: A table format for data lakes with unforeseen use cases
    &lt;ul&gt;
      &lt;li&gt;Americas meetup&lt;/li&gt;
      &lt;li&gt;May 26th, 2021 @ 5:30p EDT&lt;/li&gt;
      &lt;li&gt;Link: &lt;a href=&quot;https://www.meetup.com/trino-americas/events/278103777/&quot;&gt;https://www.meetup.com/trino-americas/events/278103777/&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Trino Summit
    &lt;ul&gt;
      &lt;li&gt;Hybrid event&lt;/li&gt;
      &lt;li&gt;September 15th, 2021&lt;/li&gt;
      &lt;li&gt;Link: &lt;a href=&quot;http://starburst.io/trinosummit2021&quot;&gt;http://starburst.io/trinosummit2021&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://resurface.io/blog/why-we-love-trino&quot;&gt;https://resurface.io/blog/why-we-love-trino&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://resurface.io/blog/what-is-api-observability&quot;&gt;https://resurface.io/blog/what-is-api-observability&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://resurface.io/blog/forking-open-source&quot;&gt;https://resurface.io/blog/forking-open-source&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/15/training-advanced-sql.html&quot;&gt;Advanced SQL Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/30/training-query-tuning.html&quot;&gt;Query Tuning Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/13/training-security.html&quot;&gt;Security Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/27/training-performance.html&quot;&gt;Performance and Tuning Training&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Commander Bun Bun is diving deep to find anomalies!</summary>

      
      
    </entry>
  
    <entry>
      <title>16: Make data fluid with Apache Druid</title>
      <link href="https://trino.io/episodes/16.html" rel="alternate" type="text/html" title="16: Make data fluid with Apache Druid" />
      <published>2021-04-29T00:00:00+00:00</published>
      <updated>2021-04-29T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/16</id>
      <content type="html" xml:base="https://trino.io/episodes/16.html">&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/16/trino-druid.png&quot; /&gt;&lt;br /&gt;
Commander Bun Bun the speedy druid!
&lt;/p&gt;

&lt;h2 id=&quot;druid-links&quot;&gt;Druid links&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://druid.apache.org/&quot;&gt;Apache Druid&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://druid.apache.org/community/&quot;&gt;Apache Druid Community&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.druidforum.org/&quot;&gt;Druid Forum&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;Samarth Jain, Software Engineer at Netflix 
 (&lt;a href=&quot;https://www.linkedin.com/in/samarthjain11/&quot;&gt;@samarthjain11&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Parth Brahmbhatt, Senior Software Engineer at Netflix 
 (&lt;a href=&quot;https://twitter.com/brahmbhattparth/&quot;&gt;@brahmbhattparth&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Rachel Pedreschi, VP Community and Developer Relations at 
 &lt;a href=&quot;https://imply.io/&quot;&gt;Imply&lt;/a&gt; (&lt;a href=&quot;https://twitter.com/rachelpedreschi&quot;&gt;@rachelpedreschi&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;release-356&quot;&gt;Release 356&lt;/h2&gt;

&lt;p&gt;Release notes discussed: &lt;a href=&quot;https://trino.io/docs/current/release/release-356.html&quot;&gt;https://trino.io/docs/current/release/release-356.html&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;General:
    &lt;ul&gt;
      &lt;li&gt;MATCH_RECOGNIZE clause support, used to detect patterns in a set of rows 
within a single query&lt;/li&gt;
      &lt;li&gt;soundex function&lt;/li&gt;
      &lt;li&gt;Property to limit planning time (and improved behavior about cancel during 
planning)&lt;/li&gt;
      &lt;li&gt;A bunch of performance improvements around pushdown (and start of docs for 
pushdowns)&lt;/li&gt;
      &lt;li&gt;Misc improvements around materialized views support&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;JDBC driver - OAuth2 token caching in memory&lt;/li&gt;
  &lt;li&gt;BigQuery - create and drop schema&lt;/li&gt;
  &lt;li&gt;Hive - Parquet, ORC and Azure ADL improvements&lt;/li&gt;
  &lt;li&gt;Iceberg - SHOW TABLES even when tables created elsewhere&lt;/li&gt;
  &lt;li&gt;Kafka - SSL support&lt;/li&gt;
  &lt;li&gt;Metadata caching improvements for a bunch of connectors&lt;/li&gt;
  &lt;li&gt;SPI: a couple of changes&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-week-apache-druid-and-realtime-analytics&quot;&gt;Concept of the week: Apache Druid and realtime analytics&lt;/h2&gt;

&lt;p&gt;This week covers Apache Druid, a modern, real-time OLAP database. Joining us is 
the head of developer relations at Imply, the company that creates an enterprise
 version of Druid, to cover what Druid is and the use cases it solves.&lt;/p&gt;

&lt;p&gt;Here are the slides that Rachel uses in the show:&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;iframe src=&quot;//www.slideshare.net/slideshow/embed_code/key/1fKHCGSRJwUjB7&quot; width=&quot;800&quot; height=&quot;650&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; 
margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; 
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h3 id=&quot;druid-architecture&quot;&gt;Druid Architecture&lt;/h3&gt;

&lt;p&gt;Druid has several process types:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Coordinator&lt;/strong&gt; processes manage data availability on the cluster.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Overlord&lt;/strong&gt; processes control the assignment of data ingestion workloads.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Broker&lt;/strong&gt; processes handle queries from external clients.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Router&lt;/strong&gt; processes are optional processes that can route requests to Brokers, Coordinators, and Overlords.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Historical&lt;/strong&gt; processes store queryable data.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;MiddleManager&lt;/strong&gt; processes are responsible for ingesting data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/16/druid-architecture.png&quot; /&gt;&lt;br /&gt;
The Druid architecture.
&lt;/p&gt;

&lt;p&gt;Druid processes can be deployed any way you like, but for ease of deployment we 
suggest organizing them into three server types: Master, Query, and Data.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Master: Runs Coordinator and Overlord processes, manages data availability and ingestion.&lt;/li&gt;
  &lt;li&gt;Query: Runs Broker and optional Router processes, handles queries from external clients.&lt;/li&gt;
  &lt;li&gt;Data: Runs Historical and MiddleManager processes, executes ingestion workloads and stores all queryable data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Source: &lt;a href=&quot;https://druid.apache.org/docs/latest/design/architecture.html&quot;&gt;https://druid.apache.org/docs/latest/design/architecture.html&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-3522-add-druid-connector&quot;&gt;PR of the week: PR 3522 Add Druid connector&lt;/h2&gt;

&lt;p&gt;Our guest, Samarth, is the author of this week’s 
&lt;a href=&quot;https://github.com/trinodb/trino/pull/3522&quot;&gt;PR of the week&lt;/a&gt;. 
&lt;a href=&quot;https://twitter.com/puneetjaiswal&quot;&gt;Puneet Jaiswal&lt;/a&gt; was the first engineer to
start work on adding a Druid connector. Later, Samarth picked up the torch, and 
the Trino Druid connector became available in 
&lt;a href=&quot;/docs/current/release/release-337.html&quot;&gt;release 337&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;An honorable mention goes to our other guest, Parth, for doing some 
&lt;a href=&quot;https://github.com/trinodb/trino/pull/3697&quot;&gt;preliminary work&lt;/a&gt; that enabled 
aggregation pushdown in the SPI. This enabled the Druid connector to scale
well once PR 4313 was completed (see future work below).&lt;/p&gt;

&lt;p&gt;A &lt;a href=&quot;https://github.com/trinodb/trino/pull/3881&quot;&gt;third honorable PR&lt;/a&gt;, 
that was completed by &lt;a href=&quot;https://twitter.com/findepi&quot;&gt;@findepi&lt;/a&gt;, was adding 
pushdown to the JDBC client, which appeared in release 337 along with the Druid 
connector.&lt;/p&gt;

&lt;p&gt;It is incredible to see the number of hands that various features and connectors
pass through to get to the final release.&lt;/p&gt;

&lt;h3 id=&quot;future-work&quot;&gt;Future work:&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/4249&quot;&gt;SPI and optimizer rule for connectors that can support complete topN (PR 4249)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/4313&quot;&gt;Implement aggregate pushdown for Druid (PR 4313)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/4554&quot;&gt;Optimizer rule to support aggregate pushdown with grouping sets (PR 4554)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;demo-using-the-druid-web-ui-to-create-an-ingestion-spec-querying-via-trino&quot;&gt;Demo: Using the Druid web UI to create an ingestion spec and query via Trino&lt;/h2&gt;

&lt;p&gt;Let’s start up the Druid cluster along with the required Zookeeper and 
PostgreSQL instance. Clone this repository and navigate to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino-druid&lt;/code&gt;
directory.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone git@github.com:bitsondatadev/trino-getting-started.git

cd community_tutorials/druid/trino-druid

docker-compose up -d
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To run a batch ingestion, navigate to the Druid web UI at 
&lt;a href=&quot;http://localhost:8888&quot;&gt;http://localhost:8888&lt;/a&gt; once it has finished starting up. Click the “Load data” button, 
choose “Example data”, and follow the prompts to create the native batch 
ingestion spec. Once the spec is created, run the job to ingest the data.
More information can be found here: &lt;a href=&quot;https://druid.apache.org/docs/latest/tutorials/index.html&quot;&gt;https://druid.apache.org/docs/latest/tutorials/index.html&lt;/a&gt;&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/16/druid-console.png&quot; /&gt;&lt;br /&gt;
The Druid console.
&lt;/p&gt;

&lt;p&gt;Once Druid completes the task, open up a Trino connection and validate that the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;druid&lt;/code&gt; catalog exists.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;docker exec -it trino-druid_trino-coordinator_1 trino

trino&amp;gt; SHOW CATALOGS;

 Catalog 
---------
 druid   
 system  
 tpcds   
 tpch    
(4 rows)

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now show the tables under the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;druid.druid&lt;/code&gt; schema.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;trino&amp;gt; SHOW TABLES IN druid.druid;
   Table   
-----------
 wikipedia 
(1 row)

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Run a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW CREATE TABLE&lt;/code&gt; to see the column definitions.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;trino&amp;gt; SHOW CREATE TABLE druid.druid.wikipedia;
             Create Table             
--------------------------------------
 CREATE TABLE druid.druid.wikipedia ( 
    __time timestamp(3) NOT NULL,     
    added bigint NOT NULL,            
    channel varchar,                  
    cityname varchar,                 
    comment varchar,                  
    commentlength bigint NOT NULL,    
    countryisocode varchar,           
    countryname varchar,              
    deleted bigint NOT NULL,          
    delta bigint NOT NULL,            
    deltabucket bigint NOT NULL,      
    diffurl varchar,                  
    flags varchar,                    
    isanonymous varchar,              
    isminor varchar,                  
    isnew varchar,                    
    isrobot varchar,                  
    isunpatrolled varchar,            
    metrocode varchar,                
    namespace varchar,                
    page varchar,                     
    regionisocode varchar,            
    regionname varchar,               
    user varchar                      
 )                                    
(1 row)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Finally, query the first 5 rows of data showing the user and how much they added.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;trino&amp;gt; SELECT user, added FROM druid.druid.wikipedia LIMIT 5;
      user       | added 
-----------------+-------
 Lsjbot          |    31 
 ワーナー成増    |   125 
 181.230.118.178 |     2 
 JasonAQuest     |     0 
 Kolega2357      |     0 
(5 rows)

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;question-of-the-week-why-doesnt-the-druid-connector-use-the-native-json-over-http-calls&quot;&gt;Question of the week: Why doesn’t the Druid connector use the native JSON over HTTP calls?&lt;/h2&gt;

&lt;p&gt;To answer this question I’m going to quote Samarth and Parth on this from 
&lt;a href=&quot;https://trinodb.slack.com/archives/CHD6386E4/p1589311502029000?thread_ts=1586167749.002500&amp;amp;cid=CHD6386E4&quot;&gt;this super long but enlightening thread&lt;/a&gt;
on the subject.&lt;/p&gt;

&lt;h3 id=&quot;samarths-take&quot;&gt;Samarth’s take:&lt;/h3&gt;

&lt;p&gt;Pro JDBC:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;Going forward, Druid SQL is going to be the de-facto way of accessing Druid 
 data, with native JSON queries being more of an advanced use case. A 
 benefit of going down the SQL route is that we can take advantage of all the changes 
 made in the Druid SQL optimizer land, like using vectorized query processing 
 when possible, when to use a TopN vs group by query type, etc. If we were to 
 hit historicals directly, which don’t support SQL querying, we potentially 
 won’t be taking advantage of such optimizations unless we keep
 porting/applying them to the trino-druid connector, which may not always be 
 possible.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;If we end up letting a Trino node act as a Druid broker (which is what 
 would happen I assume when you let a Trino node do the final merging), then, 
 you would need to allocate similar kinds of resources (direct memory buffers, 
 etc.) to all the Trino worker nodes as a Druid broker which may not be ideal.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;This is not necessarily a limitation but adds complexity - with your proposed 
 implementation, the Trino cluster will need to maintain state about what Druid
 segments are hosted on what data nodes (middle managers and historicals). The 
 Druid broker already maintains that state and having to replicate and store all
 that state on the Trino coordinator will demand more resources out of it.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;To your point on SCAN query overwhelming the broker - that shouldn’t be the 
 case, as the Druid scan query type streams results through the broker instead of 
 materializing all of them in memory. See: &lt;a href=&quot;https://druid.apache.org/docs/latest/querying/scan-query.html&quot;&gt;https://druid.apache.org/docs/latest/querying/scan-query.html&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Pro HTTP:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;One use case where directly hitting the historicals may help is when the 
 group by key space is large (like a group by on a UUID-like column). For a very 
 large data set, a Druid broker can get overwhelmed when performing the giant 
 merge. By hitting historicals directly, we can let historicals do first level 
 merge followed by multiple Trino workers doing the second level merge. I am 
 not sure if solving for this limited use case is worth going the http native
 query route, though. IMHO, Druid generally isn’t built for pulling lots of 
 data out of it. You can do it, but whether you want to push that work down to 
 the Druid cluster or let Trino directly pull it down for you is debatable.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I would advocate for going the Druid SQL route at least for the initial version 
of the connector. This would provide a solution for the majority of the use 
cases that Druid generally is used for (OLAP style queries over pre-aggregated 
data). We could, in the next version of the connector, possibly focus on adding a
new mode of the connector which can make native JSON queries directly to the 
Druid historicals and middle managers instead of submitting SQL queries to the 
broker.&lt;/p&gt;

&lt;h3 id=&quot;parths-take&quot;&gt;Parth’s take:&lt;/h3&gt;

&lt;p&gt;Our general take is that Druid is designed as an OLAP cube, so it is really fast
when it comes to aggregate queries over reasonable-cardinality dimensions, and 
will not work well for use cases that treat it like a regular data 
warehouse and try to do pure select scans with filters. The primary reasons 
most of our users would look to Trino’s Druid connector are:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;To be able to join already aggregated data in Druid to some other datastore 
 in our warehouse.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;To gain access through tooling that doesn’t have good support for Druid 
 inherently for dashboarding use cases (think Tableau).&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Even if we wanted to support the use cases that Druid is not designed for in a 
more efficient manner by going through historicals directly, it has other 
implications. We are now talking about partial aggregation pushdown which is 
more complicated IMO than our current approach of complete pushdown. We could 
choose to take the approach that others have taken where we can incrementally 
add a mode to Druid connector to either use JDBC or go directly to historical, 
but I really don’t think it’s a good idea to block the current development in 
hopes of a more efficient future version, especially when this is just an 
implementation detail that we can switch anytime without breaking any user 
queries.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Trino Summit:
&lt;a href=&quot;http://starburst.io/trinosummit2021&quot;&gt;http://starburst.io/trinosummit2021&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://netflixtechblog.com/how-netflix-uses-druid-for-real-time-insights-to-ensure-a-high-quality-experience-19e1e8568d06&quot;&gt;https://netflixtechblog.com/how-netflix-uses-druid-for-real-time-insights-to-ensure-a-high-quality-experience-19e1e8568d06&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://imply.io/post/apache-druid-joins&quot;&gt;https://imply.io/post/apache-druid-joins&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/gumgum-tech/optimized-real-time-analytics-using-spark-streaming-and-apache-druid-d872a86ed99d&quot;&gt;https://medium.com/gumgum-tech/optimized-real-time-analytics-using-spark-streaming-and-apache-druid-d872a86ed99d&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.inovex.de/blog/a-close-look-at-the-workings-of-apache-druid/&quot;&gt;https://www.inovex.de/blog/a-close-look-at-the-workings-of-apache-druid/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://leventov.medium.com/comparison-of-the-open-source-olap-systems-for-big-data-clickhouse-druid-and-pinot-8e042a5ed1c7&quot;&gt;https://leventov.medium.com/comparison-of-the-open-source-olap-systems-for-big-data-clickhouse-druid-and-pinot-8e042a5ed1c7&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Videos&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=mf8Hb0coI6o&quot;&gt;https://www.youtube.com/watch?v=mf8Hb0coI6o&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=Kxbzr7UP1dI&amp;amp;t=1274s&quot;&gt;https://www.youtube.com/watch?v=Kxbzr7UP1dI&amp;amp;t=1274s&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=QNmSXMQ-gY4&quot;&gt;https://www.youtube.com/watch?v=QNmSXMQ-gY4&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/15/training-advanced-sql.html&quot;&gt;Advanced SQL Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/30/training-query-tuning.html&quot;&gt;Query Tuning Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/13/training-security.html&quot;&gt;Security Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/27/training-performance.html&quot;&gt;Performance and Tuning Training&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Commander Bun Bun the speedy druid!</summary>

      
      
    </entry>
  
    <entry>
      <title>15: Iceberg right ahead!</title>
      <link href="https://trino.io/episodes/15.html" rel="alternate" type="text/html" title="15: Iceberg right ahead!" />
      <published>2021-04-15T00:00:00+00:00</published>
      <updated>2021-04-15T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/15</id>
      <content type="html" xml:base="https://trino.io/episodes/15.html">&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/15/trino-iceberg.png&quot; /&gt;&lt;br /&gt;
Looks like Commander Bun Bun is safe on this Iceberg&lt;br /&gt;
&lt;a href=&quot;https://joshdata.me/iceberger.html&quot;&gt;https://joshdata.me/iceberger.html&lt;/a&gt;
&lt;/p&gt;

&lt;h2 id=&quot;iceberg-links&quot;&gt;Iceberg links&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://iceberg.apache.org/&quot;&gt;Apache Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://iceberg.apache.org/community/&quot;&gt;Apache Iceberg Community&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;Ryan Blue, creator of Iceberg, and Senior Software Engineer at 
 Netflix (&lt;a href=&quot;https://github.com/rdblue&quot;&gt;@rdblue&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;David Phillips, creator of Trino/Presto, and CTO at 
 &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt; (&lt;a href=&quot;https://twitter.com/electrum32&quot;&gt;@electrum32&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;release-355&quot;&gt;Release 355&lt;/h2&gt;

&lt;p&gt;Release notes discussed: &lt;a href=&quot;https://trino.io/docs/current/release/release-355.html&quot;&gt;https://trino.io/docs/current/release/release-355.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Martin’s list:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Multiple password authentication plugins&lt;/li&gt;
  &lt;li&gt;Column and table lineage reporting in query events&lt;/li&gt;
  &lt;li&gt;Improved planning performance for queries against Phoenix or SQL Server&lt;/li&gt;
  &lt;li&gt;Improved performance for ORDER BY … LIMIT queries against Phoenix&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred’s notes:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Security overview and TLS pages and authentication types&lt;/li&gt;
  &lt;li&gt;Reiterate multiple authentication providers (ldap1, ldap2, password)&lt;/li&gt;
  &lt;li&gt;Improved parallelism when the table bucket count is small compared to the number of nodes&lt;/li&gt;
  &lt;li&gt;Include information about spill to disk in EXPLAIN ANALYZE&lt;/li&gt;
  &lt;li&gt;Unixtime function changes&lt;/li&gt;
  &lt;li&gt;Hive view support improvements&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-week-apache-iceberg-and-the-iceberg-spec&quot;&gt;Concept of the week: Apache Iceberg and the Iceberg spec&lt;/h2&gt;

&lt;h3 id=&quot;interview-with-ryan-blue&quot;&gt;Interview with Ryan Blue&lt;/h3&gt;

&lt;p&gt;In &lt;a href=&quot;/episodes/14.html&quot;&gt;the previous episode&lt;/a&gt;, we covered the 
differences between the Iceberg table format, and the Hive table format from a 
technical standpoint in the context of Trino. We highly recommend watching it
before this episode. In this episode we ask Ryan about the origins of Apache 
Iceberg and why he started the project. We cover some details of the 
&lt;a href=&quot;https://iceberg.apache.org/spec/&quot;&gt;Iceberg specification&lt;/a&gt; which is a nice change
from the ad-hoc specification that people adhere to when using Hive tables. Then
Ryan dives into several amazing use cases showing how Netflix and others use Iceberg.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-7233-fix-queries-on-tables-without-snapshot-id&quot;&gt;PR of the week: PR 7233 Fix queries on tables without snapshot id&lt;/h2&gt;

&lt;p&gt;This week’s &lt;a href=&quot;https://github.com/trinodb/trino/pull/7233&quot;&gt;PR of the week&lt;/a&gt; was 
submitted by one of the Trino maintainers,
&lt;a href=&quot;https://twitter.com/desai_pratham&quot;&gt;Pratham Desai&lt;/a&gt;. Pratham is a Software 
Engineer at LinkedIn who commits a lot of time in the Trino community helping
out on the slack channel, contributing code, and doing PR reviews. Thank you for
all you do Pratham!&lt;/p&gt;

&lt;p&gt;Had Brian known about this PR, he wouldn’t have had the issue he did with 
reading the empty snapshot created with the Iceberg Java API and would have been 
able to read and insert into the table just fine. If you come across this issue,
we introduced this feature in 
&lt;a href=&quot;/docs/current/release/release-344.html&quot;&gt;release 344&lt;/a&gt;!&lt;/p&gt;

&lt;h3 id=&quot;another-future-development-for-the-trino-iceberg-connector&quot;&gt;Another future development for the Trino Iceberg connector&lt;/h3&gt;

&lt;p&gt;Along with the future developments we discussed in the previous episode, another
core Iceberg functionality that we want to add in Trino is support for
&lt;a href=&quot;https://github.com/trinodb/trino/issues/7580&quot;&gt;partition migration&lt;/a&gt;. We also 
discussed future support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; capabilities for the Iceberg 
connector.&lt;/p&gt;
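
&lt;p&gt;At the time of this episode, neither statement is supported by the Iceberg
connector yet. Since both follow standard SQL, once support lands the usage
might look something like this (a hypothetical sketch; the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;new_logs&lt;/code&gt; source
table is made up for illustration):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- hypothetical: downgrade a noisy error once UPDATE is supported
UPDATE iceberg.logging.logs
SET level = &apos;WARN&apos;
WHERE message = &apos;Maybeh oh noes?&apos;;

-- hypothetical: upsert new log events once MERGE is supported
MERGE INTO iceberg.logging.logs t
USING iceberg.logging.new_logs s
  ON t.event_time = s.event_time AND t.message = s.message
WHEN MATCHED THEN
  UPDATE SET level = s.level
WHEN NOT MATCHED THEN
  INSERT VALUES (s.level, s.event_time, s.message, s.call_stack);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;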

&lt;h2 id=&quot;demo-creating-tables-with-iceberg-and-reading-the-data-in-trino&quot;&gt;Demo: Creating tables with Iceberg and reading the data in Trino&lt;/h2&gt;

&lt;p&gt;For this week’s demo, we continue to use the Iceberg Java API to create a table.
You also have the option to use Trino, Spark, or other engines to ingest and query
the data, but I wanted to use the vanilla Iceberg APIs to experience them and
hopefully solidify my learning of Iceberg concepts in the process. Make sure you
follow the instructions in the repository if you don’t have Docker or Java
installed.&lt;/p&gt;

&lt;p&gt;Let’s start up a local Trino coordinator and Hive metastore. Clone this 
repository and navigate to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iceberg/trino-iceberg-minio&lt;/code&gt; directory. Then
start up the containers using Docker Compose.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone git@github.com:bitsondatadev/trino-getting-started.git

cd iceberg/trino-iceberg-minio

docker-compose up -d
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In your favorite IDE, open the files under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iceberg/iceberg-java&lt;/code&gt; in your
project and run the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IcebergMain&lt;/code&gt; class.&lt;/p&gt;

&lt;p&gt;This class creates the logging schema, and a logging table within it if the 
table doesn’t already exist. Once you run this code, you can check that the 
table exists in the metastore under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TABLE_PARAMS&lt;/code&gt;.&lt;/p&gt;
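
&lt;p&gt;As a hypothetical sketch of that check (run against the metastore’s backing
database, not Trino; the table and key names follow the standard Hive metastore
schema):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- query the Hive metastore&apos;s backing database directly
SELECT PARAM_KEY, PARAM_VALUE
FROM TABLE_PARAMS
WHERE PARAM_KEY IN (&apos;table_type&apos;, &apos;metadata_location&apos;);

-- an Iceberg table shows table_type = &apos;ICEBERG&apos; and a metadata_location
-- pointing at the current metadata JSON file
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;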

&lt;p&gt;Now we transition from the Java API to running queries over Iceberg using Trino.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/**
 * This is the equivalent of running IcebergMain in the iceberg-java project.
 * Go ahead and inspect the java code you can use to interact with Iceberg
 * tables and metadata.
 */
CREATE TABLE iceberg.logging.logs (
   level varchar NOT NULL,
   event_time timestamp(6) with time zone NOT NULL,
   message varchar NOT NULL,
   call_stack array(varchar)
)
WITH (
   format = &apos;ORC&apos;,
   partitioning = ARRAY[&apos;hour(event_time)&apos;,&apos;level&apos;]
)

/**
 * Read From Trino
 */

SELECT * FROM iceberg.logging.logs;

/**
 * Write data from Trino and check data and snapshots
 */

INSERT INTO iceberg.logging.logs VALUES 
(
  &apos;ERROR&apos;, 
  timestamp &apos;2021-04-01&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;, 
  &apos;Oh noes&apos;,
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;]
);

SELECT * FROM iceberg.logging.logs;

SELECT * FROM iceberg.logging.&quot;logs$snapshots&quot;;

/**
 * Write more data from Trino and check data and snapshots
 */
INSERT INTO iceberg.logging.logs 
VALUES 
(
  &apos;ERROR&apos;, 
  timestamp &apos;2021-04-01&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;, 
  &apos;Oh noes&apos;, 
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;]
), 
(
  &apos;ERROR&apos;, 
  timestamp &apos;2021-04-01 15:55:23.383345&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;, 
  &apos;Double oh noes&apos;, 
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;]
), 
(
  &apos;WARN&apos;, 
  timestamp &apos;2021-04-01 15:55:23.383345&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;, 
  &apos;Maybeh oh noes?&apos;, 
  ARRAY [&apos;bad things could be happening&apos;]
);

 
SELECT * FROM iceberg.logging.logs;

SELECT * FROM iceberg.logging.&quot;logs$snapshots&quot;;

/**
 * Read data from an old snapshot (Time travel)
 */

SELECT * FROM iceberg.logging.&quot;logs@2806470637437034115&quot;;

/**
 * Add new column, notice there is no snapshots of the metadata
 */

ALTER TABLE iceberg.logging.logs ADD COLUMN severity INTEGER;

SHOW CREATE TABLE iceberg.logging.logs;

SELECT * FROM iceberg.logging.logs;

SELECT * FROM iceberg.logging.&quot;logs$snapshots&quot;;

/**
 * Insert new data with new column
 */

INSERT INTO iceberg.logging.logs VALUES 
(
  &apos;INFO&apos;, 
  timestamp &apos;2021-04-01 19:59:59.999999&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;, 
  &apos;es muy bueno&apos;, 
  ARRAY [&apos;It is all normal&apos;], 
  1
);

SELECT * FROM iceberg.logging.logs;

SELECT * FROM iceberg.logging.&quot;logs$snapshots&quot;;

/**
 * Rename column and drop column
 */

ALTER TABLE iceberg.logging.logs RENAME COLUMN severity TO priority;

SHOW CREATE TABLE iceberg.logging.logs;

SELECT * FROM iceberg.logging.logs;

ALTER TABLE iceberg.logging.logs DROP COLUMN priority;

SHOW CREATE TABLE iceberg.logging.logs;

SELECT * FROM iceberg.logging.logs;

SELECT * FROM iceberg.logging.&quot;logs$snapshots&quot;;

/**
 * Travel back to previous snapshots
 */

SELECT * FROM iceberg.logging.logs;

SELECT * FROM iceberg.logging.&quot;logs$snapshots&quot;;

SELECT * FROM iceberg.logging.&quot;logs@&amp;lt;insert-earlier-snapshot&amp;gt;&quot;;

CALL iceberg.system.rollback_to_snapshot(&apos;logging&apos;, &apos;logs&apos;, &amp;lt;insert-earlier-snapshot&amp;gt;);

/**
 * Back to the future snapshot
 */

SELECT * FROM iceberg.logging.&quot;logs$snapshots&quot;;

SELECT * FROM iceberg.logging.&quot;logs@&amp;lt;insert-latest-snapshot&amp;gt;&quot;;

CALL iceberg.system.rollback_to_snapshot(&apos;logging&apos;, &apos;logs&apos;, &amp;lt;insert-latest-snapshot&amp;gt;);

SELECT * FROM iceberg.logging.logs;

SELECT * FROM iceberg.logging.&quot;logs$partitions&quot;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;question-of-the-week-what-do-i-do-to-restart-the-test-pipeline-if-it-fails-on-me&quot;&gt;Question of the week: What do I do to restart the test pipeline if it fails on me?&lt;/h2&gt;

&lt;p&gt;When developing with Trino, there is an automated build that acts as 
verification of any PR. It is powered by a GitHub Actions definition and runs 
all the tests in Trino when developers add new code. Sometimes tests unrelated
to the changes in your PR fail, which makes the PR appear unmergeable even
though the failure has nothing to do with your changes.&lt;/p&gt;

&lt;p&gt;Developers are aware of these flaky tests, and need a mechanism to resubmit 
their PR and rerun the tests. There is unfortunately no way to enable users to 
rerun tests through GitHub without write permissions to the Trino repository, so
you have to do a dummy commit.&lt;/p&gt;

&lt;p&gt;This can easily be done with this one-line hack: 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;git commit --amend --no-edit &amp;amp;&amp;amp; git push -f&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The good news is, we have gone to great lengths to identify flaky
tests in the last year. These test failures are much rarer now, and we are 
constantly improving the build stability as an ongoing effort.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;h3 id=&quot;wtd-portland&quot;&gt;WTD Portland&lt;/h3&gt;

&lt;p&gt;Interested in supporting the Trino project, but don’t know where to start? A 
good place to start, with a lower barrier to entry, is contributing to the 
documentation. We will be supporting the 
&lt;a href=&quot;https://trino.io/blog/2021/04/14/wtd-writing-day.html&quot;&gt;writing day&lt;/a&gt; at the
Write the Docs (WTD) Portland conference this April! Join us to learn how to get
involved!&lt;/p&gt;

&lt;h3 id=&quot;virtual-trino-meetups&quot;&gt;Virtual Trino meetups&lt;/h3&gt;

&lt;p&gt;Come join us for the inaugural Virtual Trino meetup on April 21st in the virtual
meetup group in your region! See &lt;a href=&quot;./community.html&quot;&gt;the community page&lt;/a&gt; for more
details.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/events/277246268/&quot;&gt;Trino Americas meetup&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/events/277246173/&quot;&gt;Trino EMEA meetup&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/events/277246078/&quot;&gt;Trino APAC meetup&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At these meetups, the four Trino/Presto founders will be updating everyone on 
the state of Trino. We’ll discuss the rebrand, talk about the recent features, 
and discuss the trajectory of the project. Then we will host a hangout and an
ask me anything (AMA) session. Hope to see you all there!&lt;/p&gt;

&lt;p&gt;Blogs&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/05/03/a-gentle-introduction-to-iceberg.html&quot;&gt;Trino on ice I: A gentle introduction to Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg.html&quot;&gt;Trino on ice II: In-place table evolution and cloud compatibility with Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/30/iceberg-concurrency-snapshots-spec.html&quot;&gt;Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/08/12/deep-dive-into-iceberg-internals.html&quot;&gt;Trino on ice IV: Deep dive into Iceberg internals&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/expedia-group-tech/a-short-introduction-to-apache-iceberg-d34f628b6799&quot;&gt;https://medium.com/expedia-group-tech/a-short-introduction-to-apache-iceberg-d34f628b6799&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://engineering.linkedin.com/blog/2021/fastingest-low-latency-gobblin&quot;&gt;https://engineering.linkedin.com/blog/2021/fastingest-low-latency-gobblin&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/adobetech/iceberg-at-adobe-88cf1950e866&quot;&gt;https://medium.com/adobetech/iceberg-at-adobe-88cf1950e866&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/adobetech/high-throughput-ingestion-with-iceberg-ccf7877a413f&quot;&gt;https://medium.com/adobetech/high-throughput-ingestion-with-iceberg-ccf7877a413f&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/adobetech/taking-query-optimizations-to-the-next-level-with-iceberg-6c968b83cd6f&quot;&gt;https://medium.com/adobetech/taking-query-optimizations-to-the-next-level-with-iceberg-6c968b83cd6f&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://thenewstack.io/apache-iceberg-a-different-table-design-for-big-data/&quot;&gt;https://thenewstack.io/apache-iceberg-a-different-table-design-for-big-data/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Videos&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=mf8Hb0coI6o&quot;&gt;https://www.youtube.com/watch?v=mf8Hb0coI6o&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=Kxbzr7UP1dI&amp;amp;t=1274s&quot;&gt;https://www.youtube.com/watch?v=Kxbzr7UP1dI&amp;amp;t=1274s&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=QNmSXMQ-gY4&quot;&gt;https://www.youtube.com/watch?v=QNmSXMQ-gY4&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/15/training-advanced-sql.html&quot;&gt;Advanced SQL Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/30/training-query-tuning.html&quot;&gt;Query Tuning Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/13/training-security.html&quot;&gt;Security Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/27/training-performance.html&quot;&gt;Performance and Tuning Training&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Looks like Commander Bun Bun is safe on this Iceberg https://joshdata.me/iceberger.html</summary>

      
      
    </entry>
  
    <entry>
      <title>14: Iceberg: March of the Trinos</title>
      <link href="https://trino.io/episodes/14.html" rel="alternate" type="text/html" title="14: Iceberg: March of the Trinos" />
      <published>2021-04-01T00:00:00+00:00</published>
      <updated>2021-04-01T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/14</id>
      <content type="html" xml:base="https://trino.io/episodes/14.html">&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/14/trino-penguin.png&quot; /&gt;&lt;br /&gt;
March of the Trinos! Be careful Commander Bun Bun! That Iceberg doesn&apos;t look stable!&lt;br /&gt;
&lt;a href=&quot;https://joshdata.me/iceberger.html&quot;&gt;https://joshdata.me/iceberger.html&lt;/a&gt;
&lt;/p&gt;

&lt;h2 id=&quot;iceberg-links&quot;&gt;Iceberg links&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://iceberg.apache.org/&quot;&gt;Apache Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://iceberg.apache.org/community/&quot;&gt;Apache Iceberg Community&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;David Phillips, creator of Trino/Presto, and CTO at 
 &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt; (&lt;a href=&quot;https://twitter.com/electrum32&quot;&gt;@electrum32&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;release-354&quot;&gt;Release 354&lt;/h2&gt;

&lt;p&gt;Release notes discussed: &lt;a href=&quot;https://trino.io/docs/current/release/release-354.html&quot;&gt;https://trino.io/docs/current/release/release-354.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Martin’s list:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Support for OAuth 2.0 in CLI&lt;/li&gt;
  &lt;li&gt;Support for MemSQL 3.2&lt;/li&gt;
  &lt;li&gt;Pushdown of ORDER BY … LIMIT for MemSQL, MySQL and SQL Server connectors&lt;/li&gt;
  &lt;li&gt;Support for time(p) in SQL Server&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred’s notes:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;LEFT, RIGHT and FULL JOIN&lt;/li&gt;
  &lt;li&gt;Preferred write partitioning on by default (needs statistics)&lt;/li&gt;
  &lt;li&gt;Small but useful fix on Elasticsearch (single value array)&lt;/li&gt;
  &lt;li&gt;Hive connector&lt;/li&gt;
  &lt;li&gt;Fix ACID table DELETE and UPDATE - critical fix is in! Boom!&lt;/li&gt;
  &lt;li&gt;Avro format improvement&lt;/li&gt;
  &lt;li&gt;CSV and Glue metadata improvement&lt;/li&gt;
  &lt;li&gt;Iceberg - date and timestamp improvement&lt;/li&gt;
  &lt;li&gt;CREATE SCHEMA fixes  in MySQL, PostgreSQL, Redshift and SQL Server&lt;/li&gt;
  &lt;li&gt;Bunch of other fixes in those connectors&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-week-apache-iceberg-and-the-table-format&quot;&gt;Concept of the week: Apache Iceberg and the table format&lt;/h2&gt;

&lt;h3 id=&quot;the-hive-table-format&quot;&gt;The Hive table format&lt;/h3&gt;

&lt;p&gt;For the last decade or so, big data professionals’ only option to query their 
data was to use, in some shape or form, the Hive model. The Hive model is
very simple, but it enabled running queries over files in a distributed file
system.&lt;/p&gt;

&lt;p&gt;To accomplish this, Hive uses a metastore service which stores and manages
metadata. For Hive and Trino, this metadata acts as a pointer to the files
containing the data, contains the file format, and has the column structure and
types. This enabled Hive to query the correct files and data within those files
for a SQL query. For more information on Hive’s architecture, read the
&lt;a href=&quot;/blog/2020/10/20/intro-to-hive-connector.html&quot;&gt;Gentle Introduction to Hive&lt;/a&gt;
blog. After the initial model gained adoption, Hive added other features such as
partitioning, which uses the directory structure of the filesystem to split the 
data files into different directories based on a special partition column. We 
talk about this in more depth &lt;a href=&quot;/episodes/5.html&quot;&gt;a few episodes back&lt;/a&gt;.&lt;/p&gt;
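&lt;p&gt;As an illustration, a Hive table partitioned on a date column might be laid 
out on the filesystem as follows (the paths here are hypothetical):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/warehouse/logs/ds=2021-04-01/part-00000.orc
/warehouse/logs/ds=2021-04-01/part-00001.orc
/warehouse/logs/ds=2021-04-02/part-00000.orc
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;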

&lt;p&gt;The Hive model solved some initial issues facing engineers in big data, but 
it brought quite a few issues of its own. It is very rigid and unable to adapt
as your requirements change. For example, if you started partitioning your data
by date, segmented by month, that table is stuck with that partitioning forever.
The only way to change it is to create a new table with the new partition
values, and migrate all of your data from the old table to the new table. With
common data sizes such a migration is often a long process, sometimes even
impossible. Another issue stems from the separation of metadata stored in the 
metastore and data stored in the file system: many problems in Hive are caused 
by the metastore getting out of sync with the files. A third, but not final, 
issue is that operations like listing files are time-consuming on more modern 
object storage.&lt;/p&gt;

&lt;p&gt;As all these problems amassed over the years, clearly something needed to be 
done. In the last few years, a few candidate table formats have come to the 
forefront of data engineering trends: Apache Iceberg, Apache Hudi, and the 
proprietary Databricks Delta Lake. The goal of these systems is to modernize 
the old Hive data structure. To Trino, Iceberg is particularly promising due to 
features like schema versioning support and hidden partitioning. Let’s talk 
about some of these features in detail.&lt;/p&gt;

&lt;h3 id=&quot;the-iceberg-table-format&quot;&gt;The Iceberg table format&lt;/h3&gt;

&lt;p&gt;Iceberg is a new table format developed at Netflix that aims to replace older 
table formats like Hive with better flexibility as the schema evolves, atomic
operations, speed, and dependability. To be clear, it’s not a new file
format, as it still uses ORC, Parquet, and Avro, but a table format. Netflix 
donated Iceberg to the Apache Software Foundation and it is now a top-level
project!&lt;/p&gt;

&lt;p&gt;Iceberg stores the data on disk just like Hive, but it also stores the table
metadata in manifest files on disk along with the data itself. These &lt;em&gt;manifest 
files&lt;/em&gt; are Avro files that contain table metadata and list a subset of data 
files. &lt;em&gt;Manifest lists&lt;/em&gt; are a special type of manifest file that point to other 
manifest files. &lt;em&gt;Snapshots&lt;/em&gt; contain a manifest list that points to all the 
manifest files that belong to the snapshot. Another huge difference from Hive is
that the manifest files track table data at the file level, as opposed to the
directory level that Hive uses. By doing so, Iceberg avoids having to list all 
files in a directory, which is a very common and expensive operation.&lt;/p&gt;
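&lt;p&gt;As a rough text sketch of this hierarchy (the file names here are made up):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;snapshot
  manifest list: snap-1.avro
    manifest file: m0.avro  -&amp;gt; data files: *.orc
    manifest file: m1.avro  -&amp;gt; data files: *.orc
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;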

&lt;p align=&quot;center&quot;&gt;
  &lt;img align=&quot;center&quot; width=&quot;60%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/14/iceberg-metadata.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;By tracking files this way, we not only get better performance from object
storage, it also enables serializable isolation. This addresses the lack of
consistency between the metadata and file state experienced in Hive.&lt;/p&gt;

&lt;p&gt;One of the greater advantages of Iceberg over Hive is in-place table
evolution. This means that you can add, drop, rename, reorder, or update a 
column without any expensive refactoring of tables or moving data around, and 
with no adverse effects on your data or metadata.&lt;/p&gt;

&lt;p&gt;Partition evolution and hidden partitions are particularly invaluable. In 
Iceberg, the &lt;em&gt;partition spec&lt;/em&gt; is a description of how to partition data in a 
table, consisting of a list of source columns and transforms. Once the spec is 
created, it generates a partition tuple that is applied uniformly to the files 
created with that spec. Unlike Hive, which requires you to compute and write a 
special column that acts as the partition value, Iceberg stores partition values
unmodified. Here’s an example partition spec generated in the Java API.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;PartitionSpec spec = PartitionSpec.builderFor(schema)
        .hour(&quot;event_time&quot;)
        .identity(&quot;level&quot;)
        .build();
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This example creates a separate hourly partition on the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time&lt;/code&gt; field and 
uses the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;identity()&lt;/code&gt; function to generate another level of partitioning
on the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;level&lt;/code&gt; field. If at a later time you decide you are getting too many 
small files because your partitions are too fine-grained, you can update the 
partition spec, and Iceberg starts writing new files according to the updated 
spec. Again, this happens without creating a new table or moving data around, 
and all queries continue to return correct results. This kind of evolution is a 
problem with Hive.&lt;/p&gt;
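&lt;p&gt;As a sketch of what such an update can look like with the Iceberg Java API 
(this assumes a recent Iceberg version with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UpdatePartitionSpec&lt;/code&gt; API; the 
exact calls may differ):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;// switch from hourly to coarser daily partitioning on event_time
table.updateSpec()
        .removeField(Expressions.hour(&quot;event_time&quot;))
        .addField(Expressions.day(&quot;event_time&quot;))
        .commit();
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;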

&lt;p align=&quot;center&quot;&gt;
  &lt;img align=&quot;center&quot; width=&quot;60%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/14/partition-spec-evolution.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;If all that isn’t enough, you can also do time travel and version rollback with
Iceberg. As we mentioned above, Iceberg keeps track of various snapshots of 
your data in time through manifest files. As long as you keep those older 
snapshots around, the files associated with those snapshots stick around as 
well. This allows you to move around to previous views of the data. This is 
useful for testing, recovery, and many other purposes. Just as
you can time travel, you can make the time travel permanent by rolling back
any unintended changes and deleting the undesired snapshot.&lt;/p&gt;
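&lt;p&gt;For example, in Trino you can list the snapshots of a table and then query an 
older one directly (replace the placeholder with a real snapshot ID from the 
first query):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT snapshot_id, committed_at FROM iceberg.logging.&quot;logs$snapshots&quot;;

SELECT * FROM iceberg.logging.&quot;logs@&amp;lt;snapshot-id&amp;gt;&quot;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;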

&lt;p&gt;Iceberg is also able to offer fast scan planning by filtering out the metadata
files that are irrelevant to the scan, and using the partition spec to find only
the files that can contain data matching the query. Iceberg filters the metadata
by checking whether the partition value ranges recorded in the metadata overlap
with the query. Then, while processing the list of manifest files, Iceberg
filters files by the query predicates on the partition columns, and applies
column stats to help prune out files that don’t match. Iceberg also processes
manifests with multiple concurrent readers to speed things up as a final
measure.&lt;/p&gt;

&lt;p&gt;Saving the best for last: Iceberg is a community standard and has 
&lt;a href=&quot;https://iceberg.apache.org/spec/&quot;&gt;a full written specification&lt;/a&gt;, which is a nice
change from Hive, which only has an ad-hoc specification that people adhere to 
in some ways. There have been many issues over the years due to variance in how 
the unwritten specification gets interpreted. A written spec not only enables 
people to understand how to use Iceberg, but documents how others can implement 
the same features in entirely different systems. Let’s wait to do a deep dive on 
the spec for the next episode when we bring on Ryan Blue, creator of Iceberg, to 
dig into these details.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-1067-add-iceberg-connector&quot;&gt;PR of the week: PR 1067 Add Iceberg connector&lt;/h2&gt;

&lt;p&gt;A huge shoutout goes to &lt;a href=&quot;https://github.com/Parth-Brahmbhatt&quot;&gt;Parth Brahmbhatt&lt;/a&gt;,
a Senior Software Engineer at Netflix who created this week’s 
&lt;a href=&quot;https://github.com/trinodb/trino/pull/1067&quot;&gt;PR of the week&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-318.html&quot;&gt;Release 318&lt;/a&gt; 
introduced this code that supported querying tables from Apache Iceberg in
Trino. While the code existed, the Iceberg connector wasn’t officially
released or documented until a little over a year later in 
&lt;a href=&quot;/docs/current/release/release-341.html&quot;&gt;release 341&lt;/a&gt;, once the connector reached
maturity.&lt;/p&gt;

&lt;h3 id=&quot;future-development-for-the-trino-iceberg-connector&quot;&gt;Future development for the Trino Iceberg connector&lt;/h3&gt;

&lt;p&gt;Still, there are some strange artifacts that we’re facing today in the connector.
For example, if you create a table with the Iceberg Java API, &lt;a href=&quot;https://github.com/apache/iceberg/blob/996ed979f396f2c7cc12ca824a3fe758f2c486ce/hive/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java#L222&quot;&gt;it creates
Iceberg tables with &amp;lt;table_type, ICEBERG&amp;gt;&lt;/a&gt;
but Trino &lt;a href=&quot;https://github.com/prestosql/presto/blob/master/presto-iceberg/src/main/java/io/prestosql/plugin/iceberg/HiveTableOperations.java#L190&quot;&gt;creates and reads with &amp;lt;table_type, iceberg&amp;gt;&lt;/a&gt;.
See &lt;a href=&quot;https://github.com/trinodb/trino/issues/1592&quot;&gt;Issue 1592&lt;/a&gt; for status and 
details. In general, we can track some of the broader changes that are being 
made to &lt;a href=&quot;https://github.com/trinodb/trino/issues/1324&quot;&gt;the Iceberg connector here&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;demo-creating-tables-with-iceberg-and-reading-the-data-in-trino&quot;&gt;Demo: Creating tables with Iceberg and reading the data in Trino&lt;/h2&gt;

&lt;p&gt;For this week’s demo, I wanted to play around with the Iceberg Java API directly.
You also have the option to use Trino, Spark, or other engines to ingest and
query the data, but I wanted to use the vanilla Iceberg APIs to experience them
firsthand and hopefully solidify my learning of Iceberg concepts in the process.
Make sure you follow the instructions in the repository if you don’t have Docker
or Java installed.&lt;/p&gt;

&lt;p&gt;Let’s start up a local Trino coordinator and Hive metastore. Clone this 
repository and navigate to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iceberg/trino-iceberg-minio&lt;/code&gt; directory. Then
start up the containers using Docker Compose.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone git@github.com:bitsondatadev/trino-getting-started.git

cd iceberg/trino-iceberg-minio

docker-compose up -d
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In your favorite IDE, open the files under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iceberg/iceberg-java&lt;/code&gt; into your
project and run the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IcebergMain&lt;/code&gt; class.&lt;/p&gt;

&lt;p&gt;This class creates a logging table if it doesn’t exist. Once you run this code,
you can check that the table exists in the metastore under
TABLE_PARAMS. But if you run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW TABLES IN iceberg.logging;&lt;/code&gt; you’ll notice that
the table doesn’t show up due to &lt;a href=&quot;https://github.com/trinodb/trino/issues/1592&quot;&gt;the issue we discussed above&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Let’s update the TABLE_PARAMS entry in the metastore db and then query the table
again.&lt;/p&gt;
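&lt;p&gt;A sketch of that update, assuming the standard Hive metastore schema (adjust 
for your own metastore database):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- flip the table_type value so Trino recognizes the table
UPDATE TABLE_PARAMS
SET PARAM_VALUE = &apos;iceberg&apos;
WHERE PARAM_KEY = &apos;table_type&apos; AND PARAM_VALUE = &apos;ICEBERG&apos;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;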

&lt;h2 id=&quot;question-of-the-week-why-does-trino-still-depend-on-the-hive-metastore-if-metadata-for-iceberg-saves-to-the-filesystem&quot;&gt;Question of the week: Why does Trino still depend on the Hive metastore if metadata for Iceberg saves to the filesystem?&lt;/h2&gt;

&lt;p&gt;We kept the metastore because many existing tests for the Hive connector are
built around it, and we want to give the Iceberg connector ample time to
mature before we migrate entirely away from the metastore. We also wanted the 
metastore to be the initial method of use for Iceberg, as most developers 
would initially be migrating from an existing Hive catalog, and we wanted this 
transition to use existing, tested components.&lt;/p&gt;

&lt;p&gt;Currently, the metastore isn’t used the same way as in Hive. Trino stores a
top-level directory that points to the metadata manifest file location and other
statistics around the table in the TABLE_PARAMS table of the metastore. There
is a &lt;a href=&quot;https://github.com/trinodb/trino/pull/6977&quot;&gt;pull request created by Jack Ye&lt;/a&gt;
to migrate away from the requirement to use the Hive metastore when using 
Iceberg with Trino.&lt;/p&gt;

&lt;h2 id=&quot;tip-of-the-iceberg&quot;&gt;Tip of the Iceberg&lt;/h2&gt;

&lt;p&gt;One last bit of fun with Iceberg. Let’s do a little experiment called “Will 
the iceberg tip?”:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Go to &lt;a href=&quot;https://iceberg.apache.org/&quot;&gt;https://iceberg.apache.org/&lt;/a&gt; and take a look at the logo.&lt;/li&gt;
  &lt;li&gt;Now go to &lt;a href=&quot;https://joshdata.me/iceberger.html&quot;&gt;https://joshdata.me/iceberger.html&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Draw the Apache Iceberg logo and see what happens.&lt;/li&gt;
  &lt;li&gt;Now draw the iceberg in the image above that Commander Bun Bun is on.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When drawing the iceberg like the image with Commander Bun Bun, the iceberg tips
over. Careful Commander Bun Bun! It looks like the Apache logo wins! Shout out 
to &lt;a href=&quot;https://twitter.com/JoshData&quot;&gt;Joshua Tauberer&lt;/a&gt; for the web page. Shout out 
to &lt;a href=&quot;https://twitter.com/GlacialMeg&quot;&gt;Megan Thompson-Munson&lt;/a&gt; for the tweet that 
started the page. Shout out to 
&lt;a href=&quot;https://www.linkedin.com/in/bartonwright/&quot;&gt;Barton Wright&lt;/a&gt; from Manfred’s team 
of writers for being the geek to find this. Shout out to 
&lt;a href=&quot;https://twitter.com/aliLoney&quot;&gt;Ali&lt;/a&gt; for being a good sport and setting Commander 
Bun Bun on the iceberg.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Come join us for the inaugural Virtual Trino meetup on April 21st in the virtual
meetup group in your region!&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/events/277246268/&quot;&gt;Americas meetup&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/events/277246173/&quot;&gt;EMEA meetup&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/events/277246078/&quot;&gt;APAC meetup&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At these meetups, the four Trino/Presto founders will be updating everyone on the
state of Trino. We’ll discuss the rebrand, talk about the recent features, and 
discuss the trajectory of the project. Then we will host a hangout and AMA. Hope
to see you all there!&lt;/p&gt;

&lt;p&gt;Blogs&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/05/03/a-gentle-introduction-to-iceberg.html&quot;&gt;Trino on ice I: A gentle introduction to Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg.html&quot;&gt;Trino on ice II: In-place table evolution and cloud compatibility with Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/30/iceberg-concurrency-snapshots-spec.html&quot;&gt;Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/08/12/deep-dive-into-iceberg-internals.html&quot;&gt;Trino on ice IV: Deep dive into Iceberg internals&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/expedia-group-tech/a-short-introduction-to-apache-iceberg-d34f628b6799&quot;&gt;https://medium.com/expedia-group-tech/a-short-introduction-to-apache-iceberg-d34f628b6799&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://engineering.linkedin.com/blog/2021/fastingest-low-latency-gobblin&quot;&gt;https://engineering.linkedin.com/blog/2021/fastingest-low-latency-gobblin&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/adobetech/iceberg-at-adobe-88cf1950e866&quot;&gt;https://medium.com/adobetech/iceberg-at-adobe-88cf1950e866&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/adobetech/high-throughput-ingestion-with-iceberg-ccf7877a413f&quot;&gt;https://medium.com/adobetech/high-throughput-ingestion-with-iceberg-ccf7877a413f&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/adobetech/taking-query-optimizations-to-the-next-level-with-iceberg-6c968b83cd6f&quot;&gt;https://medium.com/adobetech/taking-query-optimizations-to-the-next-level-with-iceberg-6c968b83cd6f&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://thenewstack.io/apache-iceberg-a-different-table-design-for-big-data/&quot;&gt;https://thenewstack.io/apache-iceberg-a-different-table-design-for-big-data/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Videos&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=mf8Hb0coI6o&quot;&gt;https://www.youtube.com/watch?v=mf8Hb0coI6o&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=Kxbzr7UP1dI&amp;amp;t=1274s&quot;&gt;https://www.youtube.com/watch?v=Kxbzr7UP1dI&amp;amp;t=1274s&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=QNmSXMQ-gY4&quot;&gt;https://www.youtube.com/watch?v=QNmSXMQ-gY4&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup Groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;https://www.meetup.com/trino-americas/&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;https://www.meetup.com/trino-emea/&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;https://www.meetup.com/trino-apac/&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;https://www.meetup.com/trino-boston/&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;https://www.meetup.com/trino-nyc/&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;https://www.meetup.com/trino-san-francisco/&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;https://www.meetup.com/trino-los-angeles/&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;https://www.meetup.com/trino-chicago/&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>March of the Trinos! Be careful Commander Bun Bun! That Iceberg doesn&apos;t look stable! https://joshdata.me/iceberger.html</summary>

      
      
    </entry>
  
    <entry>
      <title>13: Trino takes a sip of Pinot!</title>
      <link href="https://trino.io/episodes/13.html" rel="alternate" type="text/html" title="13: Trino takes a sip of Pinot!" />
      <published>2021-03-18T00:00:00+00:00</published>
      <updated>2021-03-18T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/13</id>
      <content type="html" xml:base="https://trino.io/episodes/13.html">&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/13/trinot.png&quot; /&gt;&lt;br /&gt;
Commander Bun Bun loves sippin&apos; on Pinot after a hard day of data exploration!
&lt;/p&gt;

&lt;h2 id=&quot;pinot-links&quot;&gt;Pinot links&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://communityinviter.com/apps/apache-pinot/apache-pinot&quot;&gt;Apache Pinot Slack&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/apache-pinot/events/275991991/&quot;&gt;Pinot Meetup&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Xiang Fu, project management committee (PMC) chair and committer at &lt;a href=&quot;https://pinot.apache.org/&quot;&gt;Apache Pinot&lt;/a&gt;
  and co-founder of stealth mode startup (&lt;a href=&quot;https://twitter.com/xiangfu0&quot;&gt;@xiangfu0&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Elon Azoulay, software engineer at stealth mode startup (&lt;a href=&quot;https://twitter.com/ElonAzoulay&quot;&gt;@ElonAzoulay&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;release-353&quot;&gt;Release 353&lt;/h2&gt;

&lt;p&gt;Release notes discussed: &lt;a href=&quot;https://trino.io/docs/current/release/release-353.html&quot;&gt;https://trino.io/docs/current/release/release-353.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Martin’s list:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;New ClickHouse connector&lt;/li&gt;
  &lt;li&gt;Support for correlated subqueries involving UNNEST&lt;/li&gt;
  &lt;li&gt;CREATE/DROP TABLE in BigQuery connector&lt;/li&gt;
  &lt;li&gt;Reading and writing column stats in Glue Metastore&lt;/li&gt;
  &lt;li&gt;Support for Apache Phoenix 5.1&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred’s notes:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;New geometry functions&lt;/li&gt;
  &lt;li&gt;A whole bunch of correctness and performance improvements&lt;/li&gt;
  &lt;li&gt;Env var (and hence secrets) support for RPM-based installs&lt;/li&gt;
  &lt;li&gt;Hive - performance improvements for bucketed table inserts&lt;/li&gt;
  &lt;li&gt;Kafka - schema registry improvements&lt;/li&gt;
  &lt;li&gt;Experimental join pushdown in a bunch of JDBC connectors&lt;/li&gt;
  &lt;li&gt;Also a bunch of fixes on JDBC connectors&lt;/li&gt;
  &lt;li&gt;Quite a list of changes on the SPI - ensure to check if you have a plugin&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-week-data-cubes-and-apache-pinot&quot;&gt;Concept of the week: Data cubes and Apache Pinot&lt;/h2&gt;

&lt;p&gt;Before diving into Pinot, I think it’s worthwhile to discuss some theoretical
background to motivate some of the use cases Pinot solves for. We cover the 
concept of data cubes and how they are used in traditional data warehousing to 
speed up queries and minimize unnecessary work on your OLAP system.&lt;/p&gt;

&lt;h3 id=&quot;data-cubes-and-molap-multi-dimensional-online-analytics-processing&quot;&gt;Data cubes and MOLAP (Multi-dimensional online analytics processing)&lt;/h3&gt;

&lt;p&gt;In data analytics, there are many access patterns that tend to repeat themselves
over and over again. It is very common to need to split and merge data based on 
date and time values. Or perhaps you ask a lot of questions about a 
specific customer, or even a specific product. Answering these questions 
typically involves aggregations like sums, averages, and counts. 
Wouldn’t it make sense to cache some of these intermediary results?&lt;/p&gt;

&lt;p&gt;A common way to visualize the columns that are commonly bucketed to some value
or range of values is to show them as a cube that is sliced up into smaller
cells along each dimension. This visualization derives from the traditional form 
of OLAP, multi-dimensional OLAP (MOLAP).&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/13/data_cube.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;This cube represents a caching of data aggregations that are grouped by commonly
used dimensions. For example, the displayed cube would be the pre-aggregation of
the following query:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT part, store, customer, COUNT(*)
FROM cube_table
GROUP BY part, store, customer
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If we want to get the data for a particular customer, we can take a “slice” of
that cube by specifying a particular customer. The following query returns the
green square above from our cube.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT part, store, COUNT(*)
FROM cube_table
WHERE customer = &apos;Bob&apos;
GROUP BY part, store
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now what if we want to flatten one of the dimensions? While this can be managed
with a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt; as before, depending on the system this may ignore any cached
data and scan over all the rows. For this, SQL reserved a special set of
keywords around cubes. We won’t dive into that in depth now, but for our current
goal of flattening a dimension, we can use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROLLUP&lt;/code&gt;. Using the keyword &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROLLUP&lt;/code&gt;
indicates to the underlying system that you intend to aggregate over the 
pre-materialized data rather than scan over all rows to compute again. This
gives you the total count of parts per store using the counts of the data cube.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT part, store, COUNT(*)
FROM cube_table
GROUP BY ROLLUP (part, store)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now, although we used simple counts, you can precompute a lot of other aggregate
data like sums, minimums, maximums, percentiles, and so on. These can serve
frequently issued queries without requiring a new computation every time. That
is the goal of MOLAP and data cubes.&lt;/p&gt;
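
&lt;p&gt;As a sketch of this idea, a single pre-aggregation can carry several measures
at once. SQL exposes this through the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CUBE&lt;/code&gt; keyword, which materializes every
combination of the listed dimensions. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;quantity&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;price&lt;/code&gt; columns here
are hypothetical additions to our &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cube_table&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT part, store, customer, COUNT(*), SUM(quantity), AVG(price), MAX(price)
FROM cube_table
GROUP BY CUBE (part, store, customer)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;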

&lt;h3 id=&quot;apache-pinot&quot;&gt;Apache Pinot&lt;/h3&gt;

&lt;p&gt;Now let’s move on to Apache Pinot. It is a realtime distributed OLAP datastore, 
designed to answer OLAP queries with low latency. Although there may be a lot of
words there that overlap with the Trino description, the key differentiators are
realtime and low latency. Trino performs batch processing and is not a realtime
system, whereas Pinot is great for ingesting data in batch or as a stream. The
other key phrase, low latency, could technically apply to both Pinot and Trino,
but in the context of realtime subsecond latency, Trino is slow compared to
Pinot. This is due to the specialized indexes that Pinot uses to store the data,
which we cover shortly. Importantly, another big distinction is that Trino does
not store any data itself; it is purely a query engine. Xiang has a really great
summary slide that easily shows the strengths of each system and why they work
so well together.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/13/latency_flexibility.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;While Trino is not as fast as Pinot, it is able to handle a broader set of
use cases like performing broad joins over open data formats in data lakes. 
This is what motivated work on the Trino Pinot connector. You can have the speed
of Pinot, while having the flexibility of Trino.&lt;/p&gt;
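
&lt;p&gt;For example, once the connector is configured, a single Trino query can join
realtime Pinot data with a table in your data lake. The catalog, schema, and
table names below are made up for illustration:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT e.customer_id, c.region, COUNT(*) AS events
FROM pinot.default.click_events e
JOIN hive.crm.customers c ON e.customer_id = c.customer_id
GROUP BY e.customer_id, c.region
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;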

&lt;p&gt;Now that you understand the common use case for Pinot, it’s important to know 
the main goals of Pinot.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;One primary goal is to keep response times of aggregation queries
  predictable, regardless of how many requests Pinot handles. As it scales
  you won’t see a degradation of performance. This is achieved by Pinot’s
  custom indices and storage formats.
    &lt;p align=&quot;center&quot;&gt;
    &lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/13/data_value.jpeg&quot; /&gt;&lt;br /&gt;
 &lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Another goal of Pinot is to revive the value of historical data. Data
  reaches a particular point in its lifecycle where it becomes less valuable as
  it ages. While all data can add some value no matter its age, there is a
  tradeoff in scanning many rows to glean information from antiquated data.
  Pinot aims to remove this tradeoff: most questions around historical data are
  asked in aggregate, and those aggregates can be summarized and queried at a
  low cost.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;The final goal is to manage dimension explosion. One of the difficulties
  with managing a system that caches all this historic data is handling
  dimension explosion that occurs when you cache every possible combination of
  data. Above we showed a three-dimensional cube, but Pinot can handle a much
  larger number of dimensions. However, just because you can, doesn’t mean you
  should. Pinot has a lot of smarts around using the data, and some good
  defaults to determine the maximum number of buckets per dimension. This helps
  contain an exploding cache while maintaining fast results.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 id=&quot;pinot-architecture&quot;&gt;Pinot architecture&lt;/h3&gt;

&lt;p&gt;Now that we have covered Pinot theory and goals, let’s take a quick look at the
architecture.&lt;/p&gt;

&lt;p&gt;A &lt;a href=&quot;https://docs.pinot.apache.org/basics/components/cluster&quot;&gt;Pinot cluster&lt;/a&gt; 
consists of a &lt;a href=&quot;https://docs.pinot.apache.org/basics/components/controller&quot;&gt;controller&lt;/a&gt;, 
&lt;a href=&quot;https://docs.pinot.apache.org/basics/components/broker&quot;&gt;broker&lt;/a&gt;, 
&lt;a href=&quot;https://docs.pinot.apache.org/basics/components/server&quot;&gt;server&lt;/a&gt;, and
optionally a &lt;a href=&quot;https://docs.pinot.apache.org/basics/components/minion&quot;&gt;minion&lt;/a&gt;
to purge data.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/13/pinot_architecture.svg&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-2028-add-pinot-connector&quot;&gt;PR of the week: PR 2028 Add Pinot connector&lt;/h2&gt;

&lt;p&gt;Our guest on the show today, Elon Azoulay, is the author of 
&lt;a href=&quot;https://github.com/trinodb/trino/pull/2028&quot;&gt;this PR&lt;/a&gt;, so we can ask him all
about it now.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/13/trino_pinot_connector.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/connector/pinot.html#configuration&quot;&gt;Basic configuration (Pinot controller url, Pinot segment limit)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Two ways to connect to Pinot - broker and server - and their tradeoffs 
  (e.g. the segment limit for the server)&lt;/li&gt;
  &lt;li&gt;Talk about broker passthrough queries, i.e. select * from “select … from
  pinot_table …”&lt;/li&gt;
  &lt;li&gt;Server segment limit, which we eventually want to eliminate, and broker query parsing
    &lt;ul&gt;
      &lt;li&gt;How to crash the Pinot server.&lt;/li&gt;
      &lt;li&gt;Streaming server alternative&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;
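
&lt;p&gt;A minimal catalog file for the connector looks roughly like the sketch below.
The host and port are placeholders matching the demo setup later in these notes;
see the configuration documentation linked above for the full list of
properties:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;connector.name=pinot
pinot.controller-urls=pinot-controller:9000
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;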

&lt;h3 id=&quot;future-pinot-features-in-trino&quot;&gt;Future Pinot features in Trino&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/6069&quot;&gt;Aggregation pushdown (PR 6069)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;60%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/13/aggregation_pushdown.png&quot; /&gt;&lt;br /&gt;
 &lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/7162&quot;&gt;Pinot insert (PR 7162)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/7164&quot;&gt;Pinot create table (PR 7164)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/7160&quot;&gt;Pinot drop table (PR 7160)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/7163&quot;&gt;Pinot 6 (PR 7163)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Pinot filter clause parsing (see question of the week below)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;demo-pinot-batch-insertion-and-query-using-trino-pinot-connector&quot;&gt;Demo: Pinot batch insertion and query using Trino Pinot connector&lt;/h2&gt;

&lt;p&gt;To put this PR to the test, we set up a Pinot cluster using Docker Compose.&lt;/p&gt;

&lt;p&gt;To load the data, we’re going to use a simple batch import, but you can also 
&lt;a href=&quot;https://docs.pinot.apache.org/basics/data-import/upsert&quot;&gt;insert the data in a stream&lt;/a&gt;
using &lt;a href=&quot;https://kafka.apache.org/&quot;&gt;Kafka&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Let’s start up the Pinot cluster along with the required Zookeeper and Kafka
broker. Clone this repository and navigate to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pinot/trino-pinot&lt;/code&gt; directory.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone git@github.com:bitsondatadev/trino-getting-started.git

cd community_tutorials/pinot/trino-pinot

docker-compose up -d
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To do a batch insert, we stage a CSV file for Pinot to read the data from. 
Create a directory underneath a local temp folder and write the file there:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;mkdir -p /tmp/pinot-quick-start/rawdata

echo &quot;studentID,firstName,lastName,gender,subject,score,timestampInEpoch
200,Lucy,Smith,Female,Maths,3.8,1570863600000
200,Lucy,Smith,Female,English,3.5,1571036400000
201,Bob,King,Male,Maths,3.2,1571900400000
202,Nick,Young,Male,Physics,3.6,1572418800000&quot; &amp;gt; /tmp/pinot-quick-start/rawdata/transcript.csv
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In order for Pinot to understand the CSV data, we must provide it a 
&lt;a href=&quot;https://docs.pinot.apache.org/configuration-reference/schema&quot;&gt;schema&lt;/a&gt;.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;echo &quot;{
    \&quot;schemaName\&quot;: \&quot;transcript\&quot;,
    \&quot;dimensionFieldSpecs\&quot;: [
      {
        \&quot;name\&quot;: \&quot;studentID\&quot;,
        \&quot;dataType\&quot;: \&quot;INT\&quot;
      },
      {
        \&quot;name\&quot;: \&quot;firstName\&quot;,
        \&quot;dataType\&quot;: \&quot;STRING\&quot;
      },
      {
        \&quot;name\&quot;: \&quot;lastName\&quot;,
        \&quot;dataType\&quot;: \&quot;STRING\&quot;
      },
      {
        \&quot;name\&quot;: \&quot;gender\&quot;,
        \&quot;dataType\&quot;: \&quot;STRING\&quot;
      },
      {
        \&quot;name\&quot;: \&quot;subject\&quot;,
        \&quot;dataType\&quot;: \&quot;STRING\&quot;
      }
    ],
    \&quot;metricFieldSpecs\&quot;: [
      {
        \&quot;name\&quot;: \&quot;score\&quot;,
        \&quot;dataType\&quot;: \&quot;FLOAT\&quot;
      }
    ],
    \&quot;dateTimeFieldSpecs\&quot;: [{
      \&quot;name\&quot;: \&quot;timestampInEpoch\&quot;,
      \&quot;dataType\&quot;: \&quot;LONG\&quot;,
      \&quot;format\&quot; : \&quot;1:MILLISECONDS:EPOCH\&quot;,
      \&quot;granularity\&quot;: \&quot;1:MILLISECONDS\&quot;
    }]
}&quot; &amp;gt; /tmp/pinot-quick-start/transcript-schema.json
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now we are almost ready to create the &lt;a href=&quot;https://docs.pinot.apache.org/basics/components/table&quot;&gt;table&lt;/a&gt;. 
Instead of adding table configurations as part of a SQL command, Pinot enables
you to store table configurations as a file. This is a nice option that
decouples the configuration from the DDL, which makes for simpler scripting in
batch setups.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;echo &quot;{
    \&quot;tableName\&quot;: \&quot;transcript\&quot;,
    \&quot;segmentsConfig\&quot; : {
      \&quot;timeColumnName\&quot;: \&quot;timestampInEpoch\&quot;,
      \&quot;timeType\&quot;: \&quot;MILLISECONDS\&quot;,
      \&quot;replication\&quot; : \&quot;1\&quot;,
      \&quot;schemaName\&quot; : \&quot;transcript\&quot;
    },
    \&quot;tableIndexConfig\&quot; : {
      \&quot;invertedIndexColumns\&quot; : [],
      \&quot;loadMode\&quot;  : \&quot;MMAP\&quot;
    },
    \&quot;tenants\&quot; : {
      \&quot;broker\&quot;:\&quot;DefaultTenant\&quot;,
      \&quot;server\&quot;:\&quot;DefaultTenant\&quot;
    },
    \&quot;tableType\&quot;:\&quot;OFFLINE\&quot;,
    \&quot;metadata\&quot;: {}
}&quot; &amp;gt; /tmp/pinot-quick-start/transcript-table-offline.json
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once you have created these three files and verified that the Docker containers
are running, run the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Add Table&lt;/code&gt; command:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;docker run --rm -ti \
    --network=trino-pinot_trino-network \
    -v /tmp/pinot-quick-start:/tmp/pinot-quick-start \
    --name pinot-batch-table-creation \
    apachepinot/pinot:latest AddTable \
    -schemaFile /tmp/pinot-quick-start/transcript-schema.json \
    -tableConfigFile /tmp/pinot-quick-start/transcript-table-offline.json \
    -controllerHost pinot-controller \
    -controllerPort 9000 -exec
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now that the table exists, we can see it in the 
&lt;a href=&quot;http://localhost:9000/#/tables&quot;&gt;Pinot web UI&lt;/a&gt;. Let’s insert some data using a 
batch job specification:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;echo &quot;executionFrameworkSpec:
  name: &apos;standalone&apos;
  segmentGenerationJobRunnerClassName: &apos;org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner&apos;
  segmentTarPushJobRunnerClassName: &apos;org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner&apos;
  segmentUriPushJobRunnerClassName: &apos;org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner&apos;
jobType: SegmentCreationAndTarPush
inputDirURI: &apos;/tmp/pinot-quick-start/rawdata/&apos;
includeFileNamePattern: &apos;glob:**/*.csv&apos;
outputDirURI: &apos;/tmp/pinot-quick-start/segments/&apos;
overwriteOutput: true
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: &apos;csv&apos;
  className: &apos;org.apache.pinot.plugin.inputformat.csv.CSVRecordReader&apos;
  configClassName: &apos;org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig&apos;
tableSpec:
  tableName: &apos;transcript&apos;
  schemaURI: &apos;http://pinot-controller:9000/tables/transcript/schema&apos;
  tableConfigURI: &apos;http://pinot-controller:9000/tables/transcript&apos;
pinotClusterSpecs:
  - controllerURI: &apos;http://pinot-controller:9000&apos;&quot; &amp;gt; /tmp/pinot-quick-start/docker-job-spec.yml
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now launch this batch job with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LaunchDataIngestionJob&lt;/code&gt; task.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;docker run --rm -ti \
    --network=trino-pinot_trino-network \
    -v /tmp/pinot-quick-start:/tmp/pinot-quick-start \
    --name pinot-data-ingestion-job \
    apachepinot/pinot:latest LaunchDataIngestionJob \
    -jobSpecFile /tmp/pinot-quick-start/docker-job-spec.yml
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
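
&lt;p&gt;With the segment pushed, you can query the ingested data through the Trino
Pinot connector. Assuming a catalog named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pinot&lt;/code&gt; is configured against this
cluster, a query like the following aggregates the scores we just loaded:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT firstName, lastName, AVG(score) AS avg_score
FROM pinot.default.transcript
GROUP BY firstName, lastName
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;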

&lt;p&gt;We modified this demo from the tutorials available on the Pinot website:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://docs.pinot.apache.org/basics/getting-started/pushing-your-data-to-pinot&quot;&gt;https://docs.pinot.apache.org/basics/getting-started/pushing-your-data-to-pinot&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://docs.pinot.apache.org/basics/getting-started/running-pinot-in-docker&quot;&gt;https://docs.pinot.apache.org/basics/getting-started/running-pinot-in-docker&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;question-of-the-week-why-does-my-passthrough-query-not-work-in-the-pinot-connector&quot;&gt;Question of the week: Why does my passthrough query not work in the Pinot connector?&lt;/h2&gt;

&lt;p&gt;The passthrough queries may be failing due to uppercase constants that need to
be surrounded with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPPER()&lt;/code&gt;. For example, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&apos;FOO&apos;&lt;/code&gt; in this query would be 
rendered as all lowercase once it is passed to Pinot:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT * 
FROM &quot;SELECT col1, col2, COUNT(*) FROM pinot_table WHERE col2 = &apos;FOO&apos; GROUP BY col1, col2&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The fix is to pass &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&apos;FOO&apos;&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPPER()&lt;/code&gt; in the passthrough query.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT * 
FROM &quot;SELECT col1, col2, COUNT(*) FROM pinot_table WHERE col2 = UPPER(&apos;FOO&apos;) GROUP BY col1, col2&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It could also be due to parsing of functions in filters. A workaround is to put
the filter outside of the double quotes, which can work in some cases. For
example, column and table names can be mixed case, as the connector will auto
resolve them. Mixed-case constants, however, would not work with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPPER()&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT * 
FROM &quot;SELECT col1, col2, COUNT(*) FROM pinot_table WHERE col2 = &apos;Foo&apos; GROUP BY col1, col2&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The filter can be hoisted into the outer query:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT * 
FROM &quot;SELECT col1, col2, COUNT(*) FROM pinot_table GROUP BY col1, col2&quot; WHERE col2 = &apos;Foo&apos;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There is ongoing work to improve this parsing: 
&lt;a href=&quot;https://github.com/trinodb/trino/pull/7161&quot;&gt;Pinot filter clause parsing (PR 7161)&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/apache-pinot-developer-blog/real-time-analytics-with-presto-and-apache-pinot-part-i-cc672caea307&quot;&gt;https://medium.com/apache-pinot-developer-blog/real-time-analytics-with-presto-and-apache-pinot-part-i-cc672caea307&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/apache-pinot-developer-blog/real-time-analytics-with-presto-and-apache-pinot-part-ii-3d09ff937713&quot;&gt;https://medium.com/apache-pinot-developer-blog/real-time-analytics-with-presto-and-apache-pinot-part-ii-3d09ff937713&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/apache-pinot-developer-blog/exploring-olap-on-kubernetes-with-apache-pinot-32f12233dc0b&quot;&gt;https://medium.com/apache-pinot-developer-blog/exploring-olap-on-kubernetes-with-apache-pinot-32f12233dc0b&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/apache-pinot-developer-blog/building-a-climate-dashboard-with-apache-pinot-and-superset-d3ee8cb7941d&quot;&gt;https://medium.com/apache-pinot-developer-blog/building-a-climate-dashboard-with-apache-pinot-and-superset-d3ee8cb7941d&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/apache-pinot-developer-blog&quot;&gt;https://medium.com/apache-pinot-developer-blog&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://leventov.medium.com/comparison-of-the-open-source-olap-systems-for-big-data-clickhouse-druid-and-pinot-8e042a5ed1c7&quot;&gt;https://leventov.medium.com/comparison-of-the-open-source-olap-systems-for-big-data-clickhouse-druid-and-pinot-8e042a5ed1c7&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup Groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;https://www.meetup.com/trino-americas/&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;https://www.meetup.com/trino-emea/&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;Trino APAC - Coming Soon&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;https://www.meetup.com/trino-boston/&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;https://www.meetup.com/trino-nyc/&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;https://www.meetup.com/trino-san-francisco/&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;https://www.meetup.com/trino-los-angeles/&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Midwest
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;https://www.meetup.com/trino-chicago/&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Commander Bun Bun loves sippin&apos; on Pinot after a hard day of data exploration!</summary>

      
      
    </entry>
  
    <entry>
      <title>12: Trino gets super visual with Apache Superset!</title>
      <link href="https://trino.io/episodes/12.html" rel="alternate" type="text/html" title="12: Trino gets super visual with Apache Superset!" />
      <published>2021-03-04T00:00:00+00:00</published>
      <updated>2021-03-04T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/12</id>
      <content type="html" xml:base="https://trino.io/episodes/12.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Srini Kadamati, Developer Advocate at &lt;a href=&quot;https://preset.io/&quot;&gt;Preset&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/SriniKadamati&quot;&gt;@SriniKadamati&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Dr. Beto Dealmeida, Staff Engineer at &lt;a href=&quot;https://preset.io/&quot;&gt;Preset&lt;/a&gt; (&lt;a href=&quot;https://twitter.com/dealmeida&quot;&gt;@dealmeida&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;release-353--almost&quot;&gt;Release 353 – Almost&lt;/h2&gt;

&lt;p&gt;353 is right around the corner. Last show we said this would be a small release.
While there was a correctness issue we resolved, there didn’t seem to be as much
demand to get it out as quickly as we initially thought, so we decided to
continue adding more features to 353. It should be coming out shortly!&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-week-trino-clients-python-and-apache-superset&quot;&gt;Concept of the week: Trino clients, Python, and Apache Superset&lt;/h2&gt;

&lt;p&gt;What is the general data flow from a connected data source?&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino workers request data from the data source with the specific connector&lt;/li&gt;
  &lt;li&gt;Workers process the data and send it to the coordinator&lt;/li&gt;
  &lt;li&gt;The coordinator does the final processing&lt;/li&gt;
  &lt;li&gt;It supplies the data via an HTTP/REST stream to the requestor&lt;/li&gt;
  &lt;li&gt;The requestor is a “client” such as the JDBC driver or the Trino CLI&lt;/li&gt;
  &lt;li&gt;The client translates the data further and provides it to an application (a Java
application using the JDBC driver) or to a user interface/directly to the user (output in the CLI)&lt;/li&gt;
  &lt;li&gt;The user views part of the data and scrolls down&lt;/li&gt;
  &lt;li&gt;The client requests more data from the coordinator via HTTP/REST (and see above)&lt;/li&gt;
&lt;/ul&gt;
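&lt;p&gt;The flow above can be sketched as a loop: the client submits the query, then
repeatedly asks the coordinator for the next batch of rows until there are none
left. Here is a minimal pure-Python simulation of that pagination; no real HTTP
is involved, and the names FakeCoordinator and fetch_all are made up for
illustration, not part of the actual protocol:&lt;/p&gt;

```python
# Simulates the client/coordinator interaction described above:
# the client submits a query, then keeps asking the coordinator
# for the next batch of rows until none are left.

class FakeCoordinator:
    """Stands in for a Trino coordinator paging out results."""
    def __init__(self, rows, page_size):
        self.pages = [rows[i:i + page_size]
                      for i in range(0, len(rows), page_size)]

    def submit(self, sql):
        # Returns a page token; real Trino hands back a "next URI" instead.
        return 0

    def next_page(self, token):
        # Returns (rows, next_token); next_token is None when the query is done.
        if token == len(self.pages):
            return [], None
        return self.pages[token], token + 1

def fetch_all(coordinator, sql):
    """What a client such as the CLI or JDBC driver does internally."""
    rows, token = [], coordinator.submit(sql)
    while token is not None:
        page, token = coordinator.next_page(token)
        rows.extend(page)
    return rows

coordinator = FakeCoordinator(rows=list(range(10)), page_size=3)
print(fetch_all(coordinator, "SELECT * FROM t"))  # all 10 rows, fetched in 4 pages
```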

&lt;p&gt;What clients are provided by the Trino project?&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/client/jdbc.html&quot;&gt;JDBC driver&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/client/cli.html&quot;&gt;Trino CLI&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino-python-client&quot;&gt;Trino Python client&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino-go-client&quot;&gt;Trino Go client&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What other clients are there?&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://docs.starburst.io/data-consumer/clients/odbc.html&quot;&gt;ODBC driver from Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/ecosystem/client.html&quot;&gt;Various other clients&lt;/a&gt; from the open source community
    &lt;ul&gt;
      &lt;li&gt;R&lt;/li&gt;
      &lt;li&gt;NodeJS/Javascript&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What happens in the Python world?&lt;/p&gt;

&lt;p&gt;Disclaimer: I am not a Pythonista or Pythoneer.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;DB-API 2.0
    &lt;ul&gt;
      &lt;li&gt;PEP 249 &lt;a href=&quot;https://www.python.org/dev/peps/pep-0249/&quot;&gt;https://www.python.org/dev/peps/pep-0249/&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;Python standard library&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;trino-python-client
    &lt;ul&gt;
      &lt;li&gt;Wraps complexity of Trino HTTP / REST&lt;/li&gt;
      &lt;li&gt;Supports authentication and such&lt;/li&gt;
      &lt;li&gt;Provides DB API endpoints / implementation&lt;/li&gt;
      &lt;li&gt;Preferred method to query Trino&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.sqlalchemy.org/&quot;&gt;SQLAlchemy&lt;/a&gt;
    &lt;ul&gt;
      &lt;li&gt;SQL toolkit&lt;/li&gt;
      &lt;li&gt;ORM mapper&lt;/li&gt;
      &lt;li&gt;Widely used, e.g. in Apache Superset&lt;/li&gt;
      &lt;li&gt;Supports dialects&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;PyHive
    &lt;ul&gt;
      &lt;li&gt;Not really a SQL wrapper&lt;/li&gt;
      &lt;li&gt;Aimed at Hive QL&lt;/li&gt;
      &lt;li&gt;Only kind of useful for Trino, limited compatibility&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;JDBC driver (Java !) and PySpark
    &lt;ul&gt;
      &lt;li&gt;Possible, but a hack really&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;PyJDBC
    &lt;ul&gt;
      &lt;li&gt;Wraps a DB API around any JDBC driver&lt;/li&gt;
      &lt;li&gt;Kind of a hack since it goes through JDBC to HTTP, when the Trino Python
client does the same more directly&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;PyODBC
    &lt;ul&gt;
      &lt;li&gt;Similar hack to PyJDBC&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;It is also possible to talk to Trino via HTTP directly
    &lt;ul&gt;
      &lt;li&gt;That’s like reimplementing the trino-python-client&lt;/li&gt;
      &lt;li&gt;Also see question of the week later&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;
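&lt;p&gt;DB-API 2.0 defines the same connect/cursor/fetch shape regardless of the
backend. As a sketch, here it is with the standard library sqlite3 module,
which implements PEP 249; trino-python-client exposes the same interface, so
code written against this shape ports over with a different connect call:&lt;/p&gt;

```python
# PEP 249 (DB-API 2.0) in action using sqlite3 from the Python standard
# library. The trino-python-client follows the same interface, so code
# written against this shape only needs a different connect() call.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (id INTEGER, total REAL)")
cur.executemany("INSERT INTO orders VALUES (?, ?)",
                [(1, 20.0), (2, 5.0), (3, 42.5)])
cur.execute("SELECT count(*), sum(total) FROM orders")
count, total = cur.fetchone()  # DB-API cursors expose fetchone/fetchmany/fetchall
print(count, total)  # 3 67.5
conn.close()
```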

&lt;p&gt;Beyond that, it will vary from application to application.&lt;/p&gt;

&lt;p&gt;Let’s find out from our guests how this hangs together in Apache Superset, since
it is using Python.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-superset-pr-13105-feat-first-step-native-support-trino&quot;&gt;PR of the week: Superset PR 13105 feat: first step native support Trino&lt;/h2&gt;

&lt;p&gt;This week’s pull request, &lt;a href=&quot;https://github.com/apache/superset/pull/13105&quot;&gt;https://github.com/apache/superset/pull/13105&lt;/a&gt;, was
graciously added by &lt;a href=&quot;https://github.com/dungdm93&quot;&gt;dungdm93&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The first thing we need to understand about this addition is the concept of a
database engine in Superset. A database engine handles a lot of the custom
interactions between various databases and maps them to the interface that
Superset understands. If certain concepts are missing in a given database,
like time granularity or SQL syntax, the database engine for that database
indicates to Superset that this is not available. As a result, the option does
not show up in Superset, or a concise error message is reported. By default,
database engines use the &lt;a href=&quot;https://github.com/apache/superset/blob/master/superset/db_engine_specs/base.py&quot;&gt;base.py&lt;/a&gt;
methods, but each engine, like Trino, adds its custom mappings with a specific
engine implementation,
&lt;a href=&quot;https://github.com/apache/superset/blob/master/superset/db_engine_specs/trino.py&quot;&gt;trino.py&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The pull request adds a few basic custom changes to enable Trino usage with 
Superset. One change ensures that complex timestamps from Trino are truncated to
a format that Superset is able to support during time aggregation operations.&lt;/p&gt;

&lt;p&gt;This opens up a vast amount of functionality for using Trino and Superset. We
wanted to feature this because it goes to show how a small code change, even
one that is not in the Trino repository, can have an outsized effect on those
using Superset and Trino.&lt;/p&gt;

&lt;p&gt;Thank you so much to &lt;a href=&quot;https://github.com/dungdm93&quot;&gt;dungdm93&lt;/a&gt; for making this
change and further linking Trino into a fantastic project like &lt;a href=&quot;https://superset.apache.org/&quot;&gt;Apache
 Superset&lt;/a&gt;!&lt;/p&gt;

&lt;h2 id=&quot;demo-superset-querying-trino-to-create-visualization-dashboard&quot;&gt;Demo: Superset querying Trino to create visualization dashboard&lt;/h2&gt;

&lt;p&gt;To put this PR to the test, we need to connect Apache Superset to Trino as our
datasource.&lt;/p&gt;

&lt;p&gt;First, you need to follow &lt;a href=&quot;https://superset.apache.org/docs/installation/installing-superset-using-docker-compose&quot;&gt;these instructions&lt;/a&gt;
to install Docker (if you don’t already have it installed), and then clone the 
Superset repository:&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;git clone https://github.com/apache/superset.git&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Next, you need to set up the database driver for Trino. Navigate to the root
directory of the local Superset repository you just downloaded and run the
following.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;echo &quot;sqlalchemy-trino&quot; &amp;gt;&amp;gt; ./docker/requirements-local.txt&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This tells Superset scripts to install the sqlalchemy-trino library upon
startup. We know the name by looking up &lt;a href=&quot;https://superset.apache.org/docs/databases/trino&quot;&gt;the Trino driver page&lt;/a&gt;
for the driver documentation and how to use the connection string. If you were
to install these directly on a Superset node, you would refer to &lt;a href=&quot;https://superset.apache.org/docs/databases/installing-database-drivers&quot;&gt;this database
 drivers page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Make sure you’re in the root folder of the repo, then run the following
command to start up Superset.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;docker-compose -f docker-compose-non-dev.yml up&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;After Superset is running, you need to start Trino as well. We did so using a
separate docker-compose app.&lt;/p&gt;

&lt;p&gt;As soon as this is done, you can navigate to Superset’s homepage &lt;a href=&quot;http://localhost:8088&quot;&gt;http://localhost:8088&lt;/a&gt;
and scroll to the &lt;strong&gt;Data&lt;/strong&gt; &amp;gt; &lt;strong&gt;Databases&lt;/strong&gt; menu.&lt;/p&gt;

&lt;p&gt;Click the &lt;strong&gt;+Database&lt;/strong&gt; button.&lt;/p&gt;

&lt;p&gt;Set Name to “Trino” and URI to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino://trino@host.docker.internal:8080&lt;/code&gt;
and click &lt;strong&gt;Add&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you want to allow CTAS, CVAS, or DML operations, you’ll want to edit
the database you just created, click on the &lt;strong&gt;SQL LAB SETTINGS&lt;/strong&gt; tab, and
select the operations you want to allow.&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/12/connection_settings.png&quot; /&gt;&lt;br /&gt;
 Connection settings that allow for creation/manipulation of tables.
&lt;/p&gt;

&lt;p&gt;You can verify the connection under &lt;strong&gt;SQL Lab&lt;/strong&gt; &amp;gt; &lt;strong&gt;SQL Editor&lt;/strong&gt; by running a SELECT
query.&lt;/p&gt;

&lt;p&gt;We cover adding charts and creating a dashboard in the show. We linked some
blogs from &lt;a href=&quot;https://preset.io/&quot;&gt;Preset&lt;/a&gt; around how to do a lot of this workflow 
in great detail. Find these blogs linked below! Here’s a taste of what we
created in Superset with some &lt;a href=&quot;https://transtats.bts.gov/Fields.asp?gnoyr_VQ=FGJ&quot;&gt;BTS On-Time : Reporting Carrier On-Time
 Performance (1987-present)&lt;/a&gt;
and &lt;a href=&quot;https://data.cdc.gov/Case-Surveillance/COVID-19-Case-Surveillance-Public-Use-Data/vbim-akqf&quot;&gt;Covid Cases&lt;/a&gt; 
reported by the CDC.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/12/covid_flights_data.png&quot; /&gt;&lt;br /&gt;
 COVID-19 and flights data dashboard!
&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-week-how-do-i-use-the-trino-rest-api&quot;&gt;Question of the week: How do I use the Trino REST api?&lt;/h2&gt;

&lt;p&gt;I want to just use the REST API of Trino. Where is the documentation? How do I do that?&lt;/p&gt;

&lt;h3 id=&quot;the-short-answer&quot;&gt;The short answer:&lt;/h3&gt;

&lt;p&gt;Don’t do that. Use a Trino client instead.&lt;/p&gt;

&lt;h3 id=&quot;the-long-answer&quot;&gt;The long answer:&lt;/h3&gt;

&lt;p&gt;The typical desired use case for the REST API is to run a query and get
the result. However, that part of the API is not really a traditional REST API
(HTTP POST, HTTP GET). That just doesn’t work for returning large datasets.
Instead, it is a constantly open connection with a stream of data and interaction
between the client and Trino.&lt;/p&gt;

&lt;p&gt;The clients take care of all this complexity and provide it in a standard API for
the various platforms (JDBC, …). Use the clients!&lt;/p&gt;

&lt;p&gt;And if there is no client, or the existing client is not good enough, create an
open source one or contribute improvements.&lt;/p&gt;

&lt;h3 id=&quot;the-exception&quot;&gt;The exception:&lt;/h3&gt;

&lt;p&gt;There are other simple, pure REST API endpoints that you can use just straight
out of the box. Try &lt;a href=&quot;http://localhost:8080/v1/info&quot;&gt;http://localhost:8080/v1/info&lt;/a&gt; or
&lt;a href=&quot;http://localhost:8080/v1/status&quot;&gt;http://localhost:8080/v1/status&lt;/a&gt;.
You could use those for a liveness/readiness probe in k8s or for cluster status
display. By the way, the Web UI uses those and others.&lt;/p&gt;

&lt;h3 id=&quot;last-note&quot;&gt;Last note&lt;/h3&gt;

&lt;p&gt;If you really can’t help yourself, here are some docs.
&lt;a href=&quot;https://github.com/trinodb/trino/wiki/HTTP-Protocol&quot;&gt;https://github.com/trinodb/trino/wiki/HTTP-Protocol&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://preset.io/blog/2021-03-03-druid-prophet-pt1/&quot;&gt;https://preset.io/blog/2021-03-03-druid-prophet-pt1/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://preset.io/blog/2021-02-11-superset-geodata/&quot;&gt;https://preset.io/blog/2021-02-11-superset-geodata/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://preset.io/blog/2021-01-18-superset-1-0/&quot;&gt;https://preset.io/blog/2021-01-18-superset-1-0/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://preset.io/blog/2021-1-18-recap-2020/&quot;&gt;https://preset.io/blog/2021-1-18-recap-2020/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://preset.io/blog/2020-09-22-slack-dashboard/&quot;&gt;https://preset.io/blog/2020-09-22-slack-dashboard/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://preset.io/blog/2020-10-02-slack-dashboard-part-2/&quot;&gt;https://preset.io/blog/2020-10-02-slack-dashboard-part-2/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://preset.io/blog/2020-10-08-bigquery-superset-part-2/&quot;&gt;https://preset.io/blog/2020-10-08-bigquery-superset-part-2/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests Srini Kadamati, Developer Advocate at Preset (@SriniKadamati) Dr. Beto Dealmeida, Staff Engineer at Preset (@dealmeida)</summary>

      
      
    </entry>
  
    <entry>
      <title>11: Dynamic filtering and dynamic partition pruning</title>
      <link href="https://trino.io/episodes/11.html" rel="alternate" type="text/html" title="11: Dynamic filtering and dynamic partition pruning" />
      <published>2021-02-18T00:00:00+00:00</published>
      <updated>2021-02-18T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/11</id>
      <content type="html" xml:base="https://trino.io/episodes/11.html">&lt;h2 id=&quot;release-352&quot;&gt;Release 352&lt;/h2&gt;

&lt;p&gt;Release notes discussed: &lt;a href=&quot;https://trino.io/docs/current/release/release-352.html&quot;&gt;https://trino.io/docs/current/release/release-352.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;No new release to discuss yet except that 353 will be around the corner to fix
a low-impact correctness issue that came out in 352
&lt;a href=&quot;https://github.com/trinodb/trino/pull/6895&quot;&gt;https://github.com/trinodb/trino/pull/6895&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-week-dynamic-filtering&quot;&gt;Concept of the week: Dynamic filtering&lt;/h2&gt;

&lt;p&gt;So we’ve covered a lot on the Trino Community Broadcast to build our way up
to tackling this pretty big subject in the space called dynamic filtering. If
you haven’t seen episodes five through nine, you may want to go back and watch
those for some context for this episode. Episode eight actually diverted to the
Trino rebrand, so we won’t discuss that one. For the recap:&lt;/p&gt;

&lt;p&gt;In &lt;a href=&quot;/episodes/5.html&quot;&gt;episode five&lt;/a&gt;, we spoke about Hive partitions.
In order to save you time when you run a query, Hive stores data under
directories named by the values of the data written underneath that directory.
Take this directory structure for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders&lt;/code&gt; table partitioned by the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orderdate&lt;/code&gt; field:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;orders
├── orderdate=1992-01-01
│   ├── orders_1992-01-01_1.orc
│   ├── orders_1992-01-01_2.orc
│   ├── orders_1992-01-01_3.orc
│   └── ...
├── orderdate=1992-01-02
│   └── ...
├── orderdate=1992-01-03
│   └── ...
└── ...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;When querying for data under January 1st, 1992, according to the Hive model,
query engines like Hive and Trino will only scan ORC files under the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders/orderdate=1992-01-01&lt;/code&gt; directory. The idea is to avoid scanning
unnecessary data by grouping rows based on a field commonly used in a query.&lt;/p&gt;
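&lt;p&gt;This directory layout is what makes pruning cheap: an equality predicate on
the partition column maps straight to a directory name, so every other
directory can be skipped without reading a single file. A toy illustration
(the file listing below is hypothetical):&lt;/p&gt;

```python
# Toy illustration of Hive-style partition pruning: the partition value
# is encoded in the directory name, so a predicate on the partition
# column selects directories before any data file is opened.

# Hypothetical file listing for the partitioned orders table.
listing = {
    "orders/orderdate=1992-01-01": ["orders_1992-01-01_1.orc",
                                    "orders_1992-01-01_2.orc"],
    "orders/orderdate=1992-01-02": ["orders_1992-01-02_1.orc"],
    "orders/orderdate=1992-01-03": ["orders_1992-01-03_1.orc"],
}

def files_to_scan(listing, partition_column, value):
    """Keep only files under directories matching the predicate."""
    suffix = "{}={}".format(partition_column, value)
    return [f for directory, files in listing.items()
            if directory.endswith(suffix)
            for f in files]

# WHERE orderdate = '1992-01-01' touches only 2 of the 4 files.
print(files_to_scan(listing, "orderdate", "1992-01-01"))
```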

&lt;p&gt;In episode &lt;a href=&quot;/episodes/6.html&quot;&gt;six&lt;/a&gt; and &lt;a href=&quot;/episodes/7.html&quot;&gt;seven&lt;/a&gt;,
we discussed a bit about how a query gets represented internally to Trino once
you submit your SQL query. First, the Parser converts SQL to an abstract syntax
tree (AST) format. Then the planner generates a different tree structure called
the intermediate representation (IR) that contains nodes representing the steps
that need to be performed in order to answer the query. The leaves of the tree 
get executed first, and the parents of each node are dependent on the action of
its child completing before it can start. Finally, the planner and
cost-based optimizer (CBO) run various updates on the IR to optimize the query
plan until it is ready to be executed. To sum it all up, the planner and CBO
generate and optimize the plan by running optimization rules. Refer to chapter 
four in 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;
pg. 50 for more information.&lt;/p&gt;

&lt;p&gt;In episode &lt;a href=&quot;/episodes/9.html&quot;&gt;nine&lt;/a&gt;, we discussed how hash-joins work
by first drawing a nested-loop analogy to how joins work. We then discussed how
it is advantageous to read the inner loop into memory to avoid a lot of extra
disk calls. Since it is ideal to read an entire table into memory, you likely
want to make sure the table that is built in memory is the smaller of the
two tables. This smaller table is called the build table. The table that gets
streamed is called the probe table. We discussed a bit how hash joins work, which
is a common mechanism to execute joins in a distributed and parallel fashion.&lt;/p&gt;
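&lt;p&gt;The build/probe split can be sketched in a few lines: index the small table
in memory by its join key, then stream the large table against that index. This
is only a rough illustration of the idea, not Trino’s implementation:&lt;/p&gt;

```python
# Hash join sketch: build a hash table from the small (build) side,
# then stream the large (probe) side and look each row up.

def hash_join(build_rows, probe_rows, build_key, probe_key):
    # Build phase: index the small table by its join key.
    index = {}
    for row in build_rows:
        index.setdefault(row[build_key], []).append(row)
    # Probe phase: stream the big table and emit matching row pairs.
    for row in probe_rows:
        for match in index.get(row[probe_key], []):
            merged = dict(match)
            merged.update(row)
            yield merged

items = [{"id": 1, "name": "widget"}, {"id": 2, "name": "gadget"}]  # build side
sales = [{"item_id": 1, "qty": 3}, {"item_id": 2, "qty": 1},
         {"item_id": 1, "qty": 7}]                                  # probe side

for row in hash_join(items, sales, "id", "item_id"):
    print(row)
```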

&lt;p&gt;Another nomenclature akin to build and probe tables is dimension and
fact table, respectively. This nomenclature comes from the &lt;a href=&quot;https://en.wikipedia.org/wiki/Star_schema&quot;&gt;star schema&lt;/a&gt;
used in data warehousing. Typically, large tables called fact tables
live at the center of the schema. These tables tend to have many foreign
keys and a number of quantitative or measurable columns describing an event or
instance. The foreign keys connect these big fact tables to smaller dimension
tables that, when joined, provide human-readable context to enrich the records
in the fact table. The schema ends up looking like a star with the fact table
at the center. In essence, you just need to remember that when someone
describes a fact table, they mean a bigger table that is likely to end up on
the probe side of a join, whereas a dimension table is a more likely candidate
to fit into memory on the build side of a join.&lt;/p&gt;

&lt;p&gt;So let’s get onto the dynamic filtering shall we? First, let’s cover a few
concepts about dynamic filtering, then compare some variations of this concept.&lt;/p&gt;

&lt;p&gt;Dynamic filtering takes advantage of joins with big fact tables to smaller
dimension tables. What makes this filtering different from other types of
filtering is that you are using the smaller build table that is loaded at query
time to generate a list of values that exist in the join column between the
build table and probe table. We know that only values that match these criteria
are going to be returned from the probe side, so we can use this dynamically
generated list as a pushdown predicate on the join column of the probe side.
This means we are still scanning this data, but only sending the subset that
answers the query. We can look at &lt;a href=&quot;/blog/2019/06/30/dynamic-filtering.html&quot;&gt;the blog written for the original local
 dynamic filtering implementation&lt;/a&gt;
by Roman Zeyde for more insights on the original implementation for dynamic
filtering before Raunaq’s changes.&lt;/p&gt;

&lt;p&gt;Local dynamic filtering is definitely beneficial as it allows skipping 
unnecessary stripes or row-groups in the ORC or Parquet reader. However, it
works only for broadcast joins, and its effectiveness depends upon the 
selectivity of the min and max indices maintained in ORC or Parquet files. What
if we could prune entire partitions from the query execution based on dynamic
filters? In the next iteration of dynamic filtering, called dynamic partition 
pruning, we do just that. We take advantage of the partitioned layout of Hive 
tables to avoid generating splits on partitions that won’t exist in the final
query result. The coordinator can identify partitions for pruning based on the
dynamic filters sent to it from the workers processing the build side of the join.
This only works if the query contains a join condition on a column that is
partitioned.&lt;/p&gt;
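&lt;p&gt;The mechanism can be simulated directly: filter the build side, collect the
distinct join-key values that survive into a dynamic filter, and drop
probe-side partitions whose key is not in that set before any splits are
scheduled. A rough sketch with made-up partition and build-side data:&lt;/p&gt;

```python
# Dynamic partition pruning, simulated: the build side is filtered
# first, the surviving join keys become a dynamic filter, and probe
# partitions whose key is not in the filter are never scanned.

# Probe side: a fact table partitioned on its date key, mapped to the
# splits each partition would produce (values are made up).
partitions = {
    20200101: ["split-a", "split-b"],
    20200102: ["split-c"],
    20200103: ["split-d", "split-e"],
}

# Build side: dimension rows that already passed their filter
# (for example, only items with a price over 1000).
build_rows = [{"date_sk": 20200101}, {"date_sk": 20200103}]

# 1. Workers compute the dynamic filter from the filtered build side.
dynamic_filter = {row["date_sk"] for row in build_rows}

# 2. The coordinator prunes partitions before scheduling splits.
splits = [s for key, parts in partitions.items()
          if key in dynamic_filter
          for s in parts]

print(sorted(splits))  # only splits from the 2 matching partitions
```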

&lt;p&gt;With that basic understanding, let’s move on to the PR that implements dynamic
partition pruning!&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-1072-implement-dynamic-partition-pruning&quot;&gt;PR of the week: PR 1072 Implement dynamic partition pruning&lt;/h2&gt;

&lt;p&gt;In this week’s pull request &lt;a href=&quot;https://github.com/trinodb/trino/pull/1072&quot;&gt;https://github.com/trinodb/trino/pull/1072&lt;/a&gt; we
return with Raunaq Morarka and Karol Sobczak. This PR effectively brings in the 
second iteration of dynamic filtering, dynamic partition pruning, where instead
of relying on local dynamic filtering we collect dynamic filters from the
workers in the coordinator and prune out extra splits that aren’t needed with
the partition layout of the probe side table. A query like this for example, 
seen in &lt;a href=&quot;/blog/2020/06/14/dynamic-partition-pruning.html&quot;&gt;Raunaq’s blog about dynamic partition pruning&lt;/a&gt;
shows that if we partition &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;store_sales&lt;/code&gt; on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ss_sold_date_sk&lt;/code&gt; we can take
advantage of this information by sending it to the coordinator.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT COUNT(*) FROM 
sales JOIN items ON sales.item_id = items.id
WHERE items.price &amp;gt; 1000;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Below we show how the execution of this would look in a distributed manner if
you partitioned the sales table on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;item_id&lt;/code&gt;. This is a visual reference for
those listening in on the podcast:&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
1:
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/11/Dynamic
 Filtering1.png&quot; /&gt;&lt;br /&gt;
 Query is sent to the coordinator to be parsed, analyzed, and planned.
&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
2:
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/11/Dynamic
 Filtering2.png&quot; /&gt;&lt;br /&gt;
 All workers get a subset of the items (build) table and each worker filters
 out items with price &amp;gt; 1000.
&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
3:
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/11/Dynamic
 Filtering3.png&quot; /&gt;&lt;br /&gt;
 All workers create dynamic filter for their item subset and send it to the 
 coordinator.
&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
4:
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/11/Dynamic
 Filtering4.png&quot; /&gt;&lt;br /&gt;
 Coordinator uses dynamic filter list to prune out splits and partitions that
 do not overlap with the DF and submits splits to run on workers.
&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
5:
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/11/Dynamic
 Filtering5.png&quot; /&gt;&lt;br /&gt;
 Workers run splits over the sales (probe) table.
&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
6:
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/11/Dynamic
 Filtering6.png&quot; /&gt;&lt;br /&gt;
 Workers return final rows to be assembled into the final result on the
 coordinator.
&lt;/p&gt;

&lt;h2 id=&quot;pr-demo-pr-1072-implement-dynamic-partition-pruning&quot;&gt;PR Demo: PR 1072 Implement dynamic partition pruning&lt;/h2&gt;

&lt;p&gt;For this PR demo, we have set up one r5.4xlarge coordinator and four r5.4xlarge
workers in a cluster, with an sf100-scale TPC-DS dataset. We will run some of
the TPC-DS queries and perhaps a few others.&lt;/p&gt;

&lt;p&gt;The first query we ran through in the TPC-DS suite was &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/testing/trino-product-tests/src/main/resources/sql-tests/testcases/tpcds/q54.sql&quot;&gt;query 54&lt;/a&gt;.
With this query, we are using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive&lt;/code&gt; catalog pointing to AWS S3, with AWS Glue
as our metastore. We initially disable dynamic filtering, then compare to
the times when dynamic filtering is enabled. Without dynamic filtering we find the
query runs in about 92 seconds, whereas with dynamic filtering it runs in 42
seconds. We see similar findings for the semijoin we execute below and discuss
some implications of how the planner actually optimizes the semijoin into an
inner join.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/* turn dynamic filtering on or off to compare */
SET SESSION enable_dynamic_filtering=false;

SELECT ss_sold_date_sk, COUNT(*) from store_sales WHERE ss_sold_date_sk IN (
  SELECT ws_sold_date_sk FROM (
    SELECT ws_sold_date_sk, COUNT(*) FROM web_sales GROUP BY 1 ORDER BY 2 LIMIT 100
  )
)
GROUP BY 1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/codex/how-to-build-a-modern-data-lake-with-minio-db0455eec053&quot;&gt;https://medium.com/codex/how-to-build-a-modern-data-lake-with-minio-db0455eec053&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/quintoandar-tech-blog/building-a-sql-engine-infrastructure-at-quintoandar-73540e136c4e&quot;&gt;https://medium.com/quintoandar-tech-blog/building-a-sql-engine-infrastructure-at-quintoandar-73540e136c4e&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/codex/modern-data-platform-using-open-source-technologies-212ba8273eab&quot;&gt;https://medium.com/codex/modern-data-platform-using-open-source-technologies-212ba8273eab&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Upcoming events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Big Data Technology Warsaw Summit - Workshop Feb 23 - 24 &lt;a href=&quot;https://bigdatatechwarsaw.eu/agenda/&quot;&gt;https://bigdatatechwarsaw.eu/agenda/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Big Data Technology Warsaw Summit - Conference Feb 25 - 26 &lt;a href=&quot;https://bigdatatechwarsaw.eu/agenda/&quot;&gt;https://bigdatatechwarsaw.eu/agenda/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Past events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Starburst Datanova - on demand &lt;a href=&quot;https://www.starburst.io/info/datanova/&quot;&gt;https://www.starburst.io/info/datanova/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Release 352</summary>

      
      
    </entry>
  
    <entry>
      <title>10: Naming the bunny!</title>
      <link href="https://trino.io/episodes/10.html" rel="alternate" type="text/html" title="10: Naming the bunny!" />
      <published>2021-02-04T00:00:00+00:00</published>
      <updated>2021-02-04T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/10</id>
      <content type="html" xml:base="https://trino.io/episodes/10.html">&lt;h2 id=&quot;release-352&quot;&gt;Release 352&lt;/h2&gt;
&lt;p&gt;Release Notes discussed: &lt;a href=&quot;https://trino.io/docs/current/release/release-352.html&quot;&gt;https://trino.io/docs/current/release/release-352.html&lt;/a&gt;
At the time of recording, 352 was not out yet. We discuss a few of the
changes coming down the pipeline to look forward to!&lt;/p&gt;

&lt;h2 id=&quot;naming-our-new-bunny&quot;&gt;Naming our new bunny!&lt;/h2&gt;
&lt;p&gt;That’s right, you submitted your names, and we are now happy to announce the
top candidates. The final name will be chosen by a community poll.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/trino-og.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;The running names are:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Lepi: short for Lepus, the constellation under Orion that is in the shape
  of a bunny and said to be chased by Orion or Orion’s dogs. They cannot catch it
  because the bunny is fast: &lt;a href=&quot;https://en.wikipedia.org/wiki/Lepus_(constellation)&quot;&gt;https://en.wikipedia.org/wiki/Lepus_(constellation)&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Neut: an early name used informally by community members to refer to the
  bunny before it had a real name. This name, which forms a portmanteau when
  combined with Trino (Neut-Trino), became popular among a few members.&lt;/li&gt;
  &lt;li&gt;Nu: the Greek letter, with a similar prefix use of Nu + Trino to refer to
  the neutrino origins. In particle physics, nu also represents any of the three
  kinds of neutrino.&lt;/li&gt;
  &lt;li&gt;Commander Bun Bun: a name suggested by a community member’s child who loves
  the bunny!&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2019/05/21/optimizing-the-casts-away.html&quot;&gt;https://trino.io/blog/2019/05/21/optimizing-the-casts-away.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://towardsdatascience.com/statistics-in-spark-sql-explained-22ec389bf71b&quot;&gt;https://towardsdatascience.com/statistics-in-spark-sql-explained-22ec389bf71b&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.infoworld.com/article/3597971/on-premises-data-warehouses-are-dead.html&quot;&gt;https://www.infoworld.com/article/3597971/on-premises-data-warehouses-are-dead.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&quot;&gt;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Upcoming events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Feb 9 - Feb 10 &lt;a href=&quot;http://starburstdata.com/datanova&quot;&gt;http://starburstdata.com/datanova&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Presto® Summit Series - Real world usage&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/05/15/state-of-presto.html&quot;&gt;https://trino.io/blog/2020/05/15/state-of-presto.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&quot;&gt;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&quot;&gt;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Podcasts:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.contributor.fyi/presto&quot;&gt;Presto with Martin Traverso, Dain Sundstrom and David Phillips&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&quot;&gt;Simplify Your Data Architecture With The Presto Distributed SQL Engine&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/5980279&quot;&gt;How Open Source Presto Unlocks a Single Point of Access to Data&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/5656471&quot;&gt;The Data Access Struggle is Real&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://softwareengineeringdaily.com/2020/02/07/presto-with-justin-borgman/&quot;&gt;Presto with Justin Borgman&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/3923864&quot;&gt;The infrastructure renaissance and how it will power the modernization of analytics platforms&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://softwareengineeringdaily.com/2018/05/24/ubers-data-platform-with-zhenxiao-luo/&quot;&gt;Uber’s Data Platform with Zhenxiao Luo&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Release 352 Release Notes discussed: https://trino.io/docs/current/release/release-352.html At the time of recording 352 was not out yet. We will discuss a few of the changes coming down the pipeline to look forward to!</summary>

      
      
    </entry>
  
    <entry>
      <title>9: Distributed hash-joins, and how to migrate to Trino</title>
      <link href="https://trino.io/episodes/9.html" rel="alternate" type="text/html" title="9: Distributed hash-joins, and how to migrate to Trino" />
      <published>2021-01-21T00:00:00+00:00</published>
      <updated>2021-01-21T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/9</id>
      <content type="html" xml:base="https://trino.io/episodes/9.html">&lt;script type=&quot;text/x-mathjax-config&quot;&gt;
  MathJax.Hub.Config({
    tex2jax: {
      inlineMath: [ [&apos;$&apos;,&apos;$&apos;], [&quot;\\(&quot;,&quot;\\)&quot;] ],
      processEscapes: true
    }
  });
&lt;/script&gt;

&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML&quot;&gt;
&lt;/script&gt;

&lt;h2 id=&quot;release-351&quot;&gt;Release 351&lt;/h2&gt;
&lt;p&gt;Release Notes discussed: &lt;a href=&quot;https://trino.io/docs/current/release/release-351.html&quot;&gt;https://trino.io/docs/current/release/release-351.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This release was really all about renaming everything from a client perspective
to use Trino instead of Presto. Manfred covers all the work that was done
to accomplish this for the release.&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-week-how-do-i-migrate-from-presto-releases-earlier-than-350-to-trino-releases-351&quot;&gt;Question of the week: How do I migrate from presto releases earlier than 350 to Trino releases 351?&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/blog/2021/01/04/migrating-from-prestosql-to-trino.html&quot;&gt;https://trino.io/blog/2021/01/04/migrating-from-prestosql-to-trino.html&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-week-distributed-hash-join&quot;&gt;Concept of the week: Distributed Hash-join&lt;/h2&gt;
&lt;p&gt;Joins are one of the most useful and powerful operations performed by databases,
and there are many approaches to joining data. Various types of indices can
facilitate joins. The order in which joins get executed can vary depending on
the geographic distribution of the data, the selectivity of the query (the
fewer rows a predicate returns, the higher its selectivity), and the
information available from indexes and table statistics, all of which can
inform an execution engine how to plan a query. One thing that stays consistent
across virtually every query engine in the world is that joins are executed
over two tables at a time, no matter how many tables appear in the query. Some
joins may run in parallel, but any given join only ever involves two tables.&lt;/p&gt;

&lt;p&gt;If you wrote a simple program that did what a join does, it might look something
like a nested loop:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;public class CartesianProductNestedLoop {
    public static void main(String[] args) {
        int[] outerTable = {2, 4, 6, 8, 10, 12};
        int[] innerTable = {1, 2, 3, 4};

        for (int o : outerTable) {
            for (int i : innerTable) {
                System.out.println(o + &quot;, &quot; + i);
            }
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Since there is no predicate, such as something you would see in a WHERE clause,
the join returns the cartesian product of these two tables. It is also useful
to express these joins in relational algebra. For example, the join above is
written as $O \times I$, where $O$ is the outer table and $I$ is the inner table.
$\times$ indicates that the join we are using is the cartesian product, as we
see below. Another useful way to view this is to visualize the join as a graph.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;33%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/9/cartesian.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NOTE: When using relational algebra or a graph to represent a join, it
is conventional that the table in the outer loop of the join is always shown on
the left. This distinction becomes important, as you will see below.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here is the output from the cartesian product join above.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;2, 1
2, 2
2, 3
2, 4
4, 1
4, 2
4, 3
4, 4
6, 1
6, 2
6, 3
6, 4
8, 1
8, 2
8, 3
8, 4
10, 1
10, 2
10, 3
10, 4
12, 1
12, 2
12, 3
12, 4
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Notice also that we are treating these tables the same: since we have to read
every value to print out the cartesian product, it doesn’t yet make a
difference which table is the inner table and which is the outer. We could
swap the inner and outer tables and still get the same performance of
$O(n^2)$.&lt;/p&gt;
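&lt;p&gt;You can verify that symmetry directly by counting loop iterations; with the
same toy tables as above, both orderings do exactly $6 \times 4 = 24$ passes
through the inner loop body:&lt;/p&gt;

```java
public class LoopOrderCost {
    static int countIterations(int[] outer, int[] inner) {
        int iterations = 0;
        for (int o : outer) {
            for (int i : inner) {
                iterations++; // one pass of the inner body per pair of rows
            }
        }
        return iterations;
    }

    public static void main(String[] args) {
        int[] a = {2, 4, 6, 8, 10, 12};
        int[] b = {1, 2, 3, 4};
        // 6 * 4 = 24 iterations either way: the nested loop is symmetric in cost.
        System.out.println(countIterations(a, b)); // 24
        System.out.println(countIterations(b, a)); // 24
    }
}
```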

&lt;p&gt;Now, what if you did have some criteria that filtered out some of the rows
returned by this product? Since it is quite common to join tables by an id,
the most common criterion for a join is that the values are equal, because
rows with matching ids are related. Initially we can get away with just
adding an if statement, print when true, and be done with it. Let’s
do that.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;public class NaturalJoinNestedLoop {
    public static void main(String[] args) {
        int[] outerTable = {2, 4, 6, 8, 10, 12};
        int[] innerTable = {1, 2, 3, 4};

        for (int o : outerTable) {
            for (int i : innerTable) {
                if(o == i){
                    System.out.println(o + &quot;, &quot; + i);
                }
            }
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Let’s assume that the integers in these tables are values of a column called id
in both tables that uniquely identifies a row in each table. When you have a
commonly named column like this, the operation of joining based on columns that
share the same name is a natural join. In relational algebra it is denoted with
a little bowtie, for example, $O \bowtie I$. We could also use the equi-join
notation that specifies the exact join columns: $O \bowtie_{O.id = I.id} I$. The
graph looks about the same as before; only the operation we are performing
changes.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;33%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/9/natural_join.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Now we only get two rows of output, as we would expect.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;2, 2
4, 4
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;One important aspect that gets glossed over in this simple example is that
the data is small and in memory, whereas a database initially has to retrieve the
data from disk. Reading values from memory is roughly
&lt;a href=&quot;https://queue.acm.org/detail.cfm?id=1563874&quot;&gt;100,000 times faster than random access from disk&lt;/a&gt;.
That makes it really important to avoid reading the same values from disk over
and over again: the quadratic number of reads in the nested loop gets
multiplied by that 100,000x penalty.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/9/disk_vs_mem.jpg&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;It would be better if we could read one table into memory once, and reuse those
values as we scan over the data of the other table. Each of these tables has a
common name. Trino first reads the inner table into memory, to avoid having
to read it again for each row in the outer table. We call this the build
table, as the first scan builds the table in memory. Trino then streams
the rows from the outer table and performs the join against the build table. We
call this the probe table.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;import java.util.*;

public class BuildProbeLoops {
    public static void main(String[] args) {
        int[] probeTable = {2, 4, 6, 8, 10, 12};
        int[] buildTable = {1, 2, 3, 4};
        Map&amp;lt;Integer, Integer&amp;gt; buildTableCache = new HashMap&amp;lt;&amp;gt;();

        for (int row : buildTable) {
            //in this case the row is actually just the join column
            int hash = row;

            buildTableCache.put(hash, row);
        }

        for (int row : probeTable) {
            //in this case the row is actually just the join column
            int hash = row;

            Integer buildRow = buildTableCache.get(hash);
            if(buildRow != null){
                System.out.println(buildRow + &quot;, &quot; + row);
            }
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;While it may seem redundant to do all of this extra work for such a simple
example, it saves minutes to hours when reading from disk and the data is
big enough. The runtime complexity has now dropped from $O(n^2)$ to
linear, $O(n)$. The relational algebra for this join is now
$P \bowtie B$, where $P$ is the probe table and $B$ is the build table. Notice
that the relational algebra hasn’t really changed; we just now specify that we
build on the inner table and probe with the outer table.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;33%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/9/natural_join2.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;One thing to consider is the size of each table: if we are fitting one of the
tables into memory, it’s probably best to choose the smaller table as
the build table. Hopefully this helps you understand why we distinguish
between a build and a probe table. This will help in our discussions about
query optimization and dynamic filtering, which we will cover on the next
show.&lt;/p&gt;
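&lt;p&gt;That planning decision can be sketched in a couple of lines. A real planner
relies on table statistics rather than in-memory lengths, but the idea is
simply to build on the smaller side:&lt;/p&gt;

```java
public class BuildSideChoice {
    public static void main(String[] args) {
        int[] left = {2, 4, 6, 8, 10, 12};  // 6 rows
        int[] right = {1, 2, 3, 4};         // 4 rows

        // Build the hash table from the smaller side so it fits in memory,
        // and stream (probe with) the larger side.
        int[] build = left;
        int[] probe = right;
        if (left.length > right.length) {
            build = right;
            probe = left;
        }

        System.out.println("build rows: " + build.length); // build rows: 4
        System.out.println("probe rows: " + probe.length); // probe rows: 6
    }
}
```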

&lt;p&gt;Another interesting subtopic that we won’t get into today is &lt;a href=&quot;http://www.oaktable.net/content/right-deep-left-deep-and-bushy-joins&quot;&gt;left-deep and right-deep plans&lt;/a&gt;.
Since we now know that the probe table is always on the left and the build table
is on the right, the shape of our query plan matters. Consider the difference
between these two trees.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img width=&quot;33%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/9/left_deep.png&quot; /&gt;
&lt;img width=&quot;33%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/9/right_deep.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Left-deep versus right-deep trees have big implications for the speed of the
query, but that is a bit tangential for our talk today. Let’s finally move on to
hash-joins!&lt;/p&gt;

&lt;p&gt;In Trino, the hash-join is the common algorithm used to join tables. In
fact, the last snippet of code is really all that is involved in implementing a
hash-join, so in explaining probe and build, we have already covered how the
algorithm works conceptually.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/9/tables.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;The big difference is that Trino implements a distributed hash-join with two
levels of parallelism.&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Joined tables are distributed over the worker nodes to achieve inter-node
  parallelism. Instead of the hash value simply being used to match with other
  rows, it is also used to route rows to specific Trino worker nodes. Rows that
  meet the equijoin criteria are then processed by the worker responsible for
  that set of ids.&lt;/li&gt;
  &lt;li&gt;Within a node, workers can use the hash to further distribute the rows
  across threads. This intra-node parallelism allows for a single thread per
  hash partition.&lt;/li&gt;
  &lt;li&gt;Finally, once all of these threads have finished determining which rows pass
  the join criteria, the probe side begins to emit rows in larger batches,
  which can quickly be discarded or kept based on which partitions exist on a
  given worker.&lt;/li&gt;
&lt;/ol&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/9/parallelism.png&quot; /&gt;
&lt;/p&gt;
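&lt;p&gt;The routing in the first step can be illustrated with a toy partitioning
scheme: every row goes to worker hash(key) mod workerCount, so build and probe
rows with equal keys always meet on the same worker. This is a conceptual
sketch, not Trino’s actual exchange implementation:&lt;/p&gt;

```java
public class HashPartitioning {
    public static void main(String[] args) {
        int workerCount = 3;
        int[] buildKeys = {1, 2, 3, 4};
        int[] probeKeys = {2, 4, 6, 8, 10, 12};

        // Each row is routed to worker hash(key) mod workerCount (an int is
        // its own hash here). Equal keys always hash the same way, so matching
        // build and probe rows are guaranteed to land on the same worker.
        for (int worker = 0; worker != workerCount; worker++) {
            for (int probe : probeKeys) {
                if (Math.floorMod(probe, workerCount) == worker) {
                    for (int build : buildKeys) {
                        if (Math.floorMod(build, workerCount) == worker) {
                            if (probe == build) {
                                System.out.println("worker " + worker + " joins key " + probe);
                            }
                        }
                    }
                }
            }
        }
    }
}
```

&lt;p&gt;With three workers, the matching keys 4 and 2 are joined on workers 1 and 2
respectively, and no worker ever needs to see another worker’s partitions.&lt;/p&gt;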

&lt;p&gt;Great resources on this topic, from which some of the examples above derive:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Hash_join&quot;&gt;https://en.wikipedia.org/wiki/Hash_join&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.mathcs.emory.edu/~cheung/Courses/554/Syllabus/5-query-opt/join-order2.html&quot;&gt;http://www.mathcs.emory.edu/~cheung/Courses/554/Syllabus/5-query-opt/join-order2.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&quot;&gt;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;how-to-contribute-documentation-and-testimonials&quot;&gt;How to contribute documentation and testimonials&lt;/h2&gt;
&lt;p&gt;Instead of a PR this week, Manfred discusses some notes on how to contribute to
documentation and testimonials.&lt;/p&gt;

&lt;p&gt;If you want to show us some 💕, please &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/.github/star.png&quot;&gt;give us a ⭐ on GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2019/05/21/optimizing-the-casts-away.html&quot;&gt;https://trino.io/blog/2019/05/21/optimizing-the-casts-away.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://towardsdatascience.com/statistics-in-spark-sql-explained-22ec389bf71b&quot;&gt;https://towardsdatascience.com/statistics-in-spark-sql-explained-22ec389bf71b&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.infoworld.com/article/3597971/on-premises-data-warehouses-are-dead.html&quot;&gt;https://www.infoworld.com/article/3597971/on-premises-data-warehouses-are-dead.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&quot;&gt;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Upcoming events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Feb 9 - Feb 10 &lt;a href=&quot;http://starburstdata.com/datanova&quot;&gt;http://starburstdata.com/datanova&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Presto® Summit Series - Real world usage&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/05/15/state-of-presto.html&quot;&gt;https://trino.io/blog/2020/05/15/state-of-presto.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&quot;&gt;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&quot;&gt;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Podcasts:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.contributor.fyi/presto&quot;&gt;Presto with Martin Traverso, Dain Sundstrom and David Phillips&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&quot;&gt;Simplify Your Data Architecture With The Presto Distributed SQL Engine&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/5980279&quot;&gt;How Open Source Presto Unlocks a Single Point of Access to Data&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/5656471&quot;&gt;The Data Access Struggle is Real&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://softwareengineeringdaily.com/2020/02/07/presto-with-justin-borgman/&quot;&gt;Presto with Justin Borgman&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/3923864&quot;&gt;The infrastructure renaissance and how it will power the modernization of analytics platforms&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://softwareengineeringdaily.com/2018/05/24/ubers-data-platform-with-zhenxiao-luo/&quot;&gt;Uber’s Data Platform with Zhenxiao Luo&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary></summary>

      
      
    </entry>
  
    <entry>
      <title>8: Trino: A ludicrously fast query engine: past, present, and future</title>
      <link href="https://trino.io/episodes/8.html" rel="alternate" type="text/html" title="8: Trino: A ludicrously fast query engine: past, present, and future" />
      <published>2021-01-11T00:00:00+00:00</published>
      <updated>2021-01-11T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/8</id>
      <content type="html" xml:base="https://trino.io/episodes/8.html">&lt;h2 id=&quot;in-this-episode&quot;&gt;In this episode…&lt;/h2&gt;

&lt;p&gt;Well, we’re back, and no longer waving the Presto® flag like we did before. If
you haven’t heard, Presto® SQL is now Trino
(&lt;a href=&quot;https://trino.io/blog/2020/12/27/announcing-trino.html&quot;&gt;read more about that here&lt;/a&gt;).
In this episode, we sit down with the four original creators of Presto® and
discuss in more detail the journey that led us to our current trajectory with
the Presto® SQL project and why it is now being renamed to Trino. We also
discuss how this affects those that are using Trino. If you are developing on
Trino and still have the old namespace, check out the
&lt;a href=&quot;https://trino.io/blog/2021/01/04/migrating-from-prestosql-to-trino.html&quot;&gt;guide to migrate here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We also discuss the differences between the two projects. There are actually a
lot of them two years after the split, and we recommend looking at the
&lt;a href=&quot;https://trino.io/blog/2020/01/01/2019-summary.html&quot;&gt;blog we wrote at the end of 2019&lt;/a&gt;.
Keep your eyes peeled for the blog we are writing to summarize the changes
in 2020!&lt;/p&gt;

&lt;p&gt;Finally, we cover some sneak peeks at the roadmap for Trino in 2021.&lt;/p&gt;

&lt;p&gt;If you want to show us some 💕, please &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/.github/star.png&quot;&gt;give us a ⭐ on GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2019/05/21/optimizing-the-casts-away.html&quot;&gt;https://trino.io/blog/2019/05/21/optimizing-the-casts-away.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://towardsdatascience.com/statistics-in-spark-sql-explained-22ec389bf71b&quot;&gt;https://towardsdatascience.com/statistics-in-spark-sql-explained-22ec389bf71b&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.infoworld.com/article/3597971/on-premises-data-warehouses-are-dead.html&quot;&gt;https://www.infoworld.com/article/3597971/on-premises-data-warehouses-are-dead.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&quot;&gt;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Upcoming events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Feb 9 - Feb 10 &lt;a href=&quot;http://starburstdata.com/datanova&quot;&gt;http://starburstdata.com/datanova&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Presto® Summit Series - Real world usage&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/05/15/state-of-presto.html&quot;&gt;https://trino.io/blog/2020/05/15/state-of-presto.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&quot;&gt;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&quot;&gt;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Podcasts:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.contributor.fyi/presto&quot;&gt;Presto with Martin Traverso, Dain Sundstrom and David Phillips&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&quot;&gt;Simplify Your Data Architecture With The Presto Distributed SQL Engine&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/5980279&quot;&gt;How Open Source Presto Unlocks a Single Point of Access to Data&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/5656471&quot;&gt;The Data Access Struggle is Real&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://softwareengineeringdaily.com/2020/02/07/presto-with-justin-borgman/&quot;&gt;Presto with Justin Borgman&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/3923864&quot;&gt;The infrastructure renaissance and how it will power the modernization of analytics platforms&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://softwareengineeringdaily.com/2018/05/24/ubers-data-platform-with-zhenxiao-luo/&quot;&gt;Uber’s Data Platform with Zhenxiao Luo&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Mega Man 6 Game Play album by Krzysztof
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>In this episode…</summary>

      
      
    </entry>
  
    <entry>
      <title>7: Cost Based Optimizer, Decorrelate subqueries, and does Presto make my RDBMS faster?</title>
      <link href="https://trino.io/episodes/7.html" rel="alternate" type="text/html" title="7: Cost Based Optimizer, Decorrelate subqueries, and does Presto make my RDBMS faster?" />
      <published>2020-11-30T00:00:00+00:00</published>
      <updated>2020-11-30T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/7</id>
      <content type="html" xml:base="https://trino.io/episodes/7.html">&lt;h2 id=&quot;release-348&quot;&gt;Release 348&lt;/h2&gt;
&lt;p&gt;Release Notes discussed: &lt;a href=&quot;https://prestosql.io/docs/current/release/release-348.html&quot;&gt;https://prestosql.io/docs/current/release/release-348.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Martin’s announcement:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Support for OAuth2 authorization in Web UI&lt;/li&gt;
  &lt;li&gt;Support for S3 streaming uploads&lt;/li&gt;
  &lt;li&gt;Support for DISTINCT aggregations in correlated subqueries&lt;/li&gt;
  &lt;li&gt;Performance improvement for ORDER BY … LIMIT queries&lt;/li&gt;
  &lt;li&gt;Many improvements and bug fixes to JDBC driver&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred’s observations:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;SHOW STATS to play around with&lt;/li&gt;
  &lt;li&gt;switch to set Hive view translation to off, legacy, or the new Coral system&lt;/li&gt;
  &lt;li&gt;a bunch of other Hive connector improvements&lt;/li&gt;
  &lt;li&gt;Iceberg on GCP and Azure&lt;/li&gt;
  &lt;li&gt;Small SPI changes&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-week-cost-based-optimizer&quot;&gt;Concept of the week: Cost Based Optimizer&lt;/h2&gt;
&lt;p&gt;We’re continuing our series covering some fundamental topics that build up to
dynamic filtering! This week we’re discussing the cost-based optimizer with
Presto co-creator &lt;a href=&quot;https://twitter.com/mtraverso&quot;&gt;Martin Traverso&lt;/a&gt;!&lt;/p&gt;

&lt;h3 id=&quot;parseranalyzer&quot;&gt;Parser/Analyzer&lt;/h3&gt;

&lt;p&gt;To recap, in &lt;a href=&quot;6.html&quot;&gt;episode 6&lt;/a&gt; we discussed the various
forms a query takes from submission to the coordinator through actual
execution. The parser generates an abstract syntax tree (AST), and the
analyzer checks for valid SQL, including verifying functions and making sure
the tables and columns being referenced actually exist.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/episode/7/ast.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Here’s an example of an abstract syntax tree from last week’s episode for the query
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT * FROM (VALUES 1) t(a) WHERE a = 1 OR 1 = a OR a = 1;&lt;/code&gt;.&lt;/p&gt;

&lt;h3 id=&quot;planner&quot;&gt;Planner&lt;/h3&gt;

&lt;p&gt;The next phase we discussed was the planner. Internally, the planner and
optimizer overlap substantially, but you can think of the planner as the early
part of the planning phase that generates the logical plan, which over several
optimization iterations becomes an optimized distributed plan. The planner
generates a new tree data structure called the plan IR (intermediate
representation) that contains nodes representing the steps that need to be
performed in order to answer the query. The leaves of the tree are executed
first, and each parent node depends on its children completing before it can
start.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/episode/7/logical.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Here’s an example of a logical plan tree using the same query as the AST
above. Since this query isn’t pulling from a data source, the distributed
plan is equivalent to the logical plan.&lt;/p&gt;
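&lt;p&gt;If you want to inspect these plans yourself, Presto can print them with
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXPLAIN&lt;/code&gt;. A quick
sketch, using the same query as above:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- print the logical plan for the example query
EXPLAIN (TYPE LOGICAL)
SELECT * FROM (VALUES 1) t(a) WHERE a = 1 OR 1 = a OR a = 1;

-- print the distributed plan, which here is equivalent to the logical plan
EXPLAIN (TYPE DISTRIBUTED)
SELECT * FROM (VALUES 1) t(a) WHERE a = 1 OR 1 = a OR a = 1;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;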

&lt;h3 id=&quot;cost-based-optimizer-cbo&quot;&gt;Cost-Based Optimizer (CBO)&lt;/h3&gt;

&lt;p&gt;In the cost-based optimizer phase, various rules are applied to the plan IR
that gradually optimize the structure into the final distributed plan that is
then executed. To do this, the optimizer retrieves statistical metadata about
the tables and their data. This information includes table row counts, column
data sizes, column low/high values, distinct column value counts, and the
percentage of null values in each column. Using rules that leverage these
statistics, the optimizer restructures the query to improve parallelism based
on the number of workers and data sources.&lt;/p&gt;
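&lt;p&gt;You can look at the statistics the optimizer works with by using
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW STATS&lt;/code&gt;, as mentioned
in the release notes above. A quick sketch, with a hypothetical
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders&lt;/code&gt; table:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- per-column statistics: data size, distinct values, null fraction, low/high values
SHOW STATS FOR orders;

-- statistics for the filtered result of a query
SHOW STATS FOR (SELECT * FROM orders WHERE orderstatus = 'F');&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;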

&lt;p&gt;If you want to jump into the code, start at 
&lt;a href=&quot;https://github.com/prestosql/presto/blob/348/presto-main/src/main/java/io/prestosql/sql/planner/LogicalPlanner.java#L188&quot;&gt;the entry point&lt;/a&gt;
for the planner/optimizer and the initial planning starts on 
&lt;a href=&quot;https://github.com/prestosql/presto/blob/348/presto-main/src/main/java/io/prestosql/sql/planner/LogicalPlanner.java#L200&quot;&gt;this line&lt;/a&gt;. 
This loop is where the &lt;a href=&quot;https://github.com/prestosql/presto/blob/348/presto-main/src/main/java/io/prestosql/sql/planner/LogicalPlanner.java#L205&quot;&gt;actual optimization&lt;/a&gt;
occurs. So if you are interested, maybe grab a brandy 🥃 and take some time to
set your debugger at these points and watch the optimizer do its thing!&lt;/p&gt;

&lt;p&gt;Refer to chapter 4 in 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;
pg. 50.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-1415-decorrelate-subqueries-with-limit-or-topn&quot;&gt;PR of the week: PR 1415 Decorrelate subqueries with Limit or TopN&lt;/h2&gt;

&lt;p&gt;This week’s pull request, &lt;a href=&quot;https://github.com/prestosql/presto/pull/1415&quot;&gt;https://github.com/prestosql/presto/pull/1415&lt;/a&gt;,
was contributed by Presto contributor and Starburst engineer
&lt;a href=&quot;https://github.com/kasiafi&quot;&gt;kasiafi&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Before we can jump into this PR, let’s discuss what a subquery is, and
further, what a correlated subquery is. A subquery is a nested query that runs
within another query, typically embedded in a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WHERE&lt;/code&gt; clause or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; statement.
Take this query for example:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;table&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;table2&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;60&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;In this example, we have a standard non-correlated subquery that runs on
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;table2&lt;/code&gt;. It is not correlated because it has no dependencies
on the parent query that runs on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;table&lt;/code&gt;. This type of query lets the
SQL engine run the subquery first and then use its results to run the
parent query. In a correlated subquery, at least one criterion in the nested
query depends on the parent, which requires the nested query to be executed
for each row of the parent query. Take a look at this correlated query:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;table&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;table2&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;60&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;In this example, the subquery must run in the context of each row of the
parent query in order to evaluate the predicate on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t1.b&lt;/code&gt;. Running the
subquery for every row of the parent query is certainly not ideal when it is
not required, which is why subquery decorrelation is a common optimization
technique whenever an equivalent non-correlated subquery exists for a given
correlated subquery.&lt;/p&gt;
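&lt;p&gt;As a rough sketch of what decorrelation looks like in general (the table and
column names here are hypothetical), an aggregating correlated subquery can
often be rewritten as a join against a pre-aggregated inner query:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- correlated form: the subquery runs once per row of t1
SELECT t1.a
FROM table1 t1
WHERE t1.a &amp;gt; (SELECT max(t2.a) FROM table2 t2 WHERE t2.b = t1.b);

-- decorrelated form: aggregate the inner table once, then join
SELECT t1.a
FROM table1 t1
JOIN (SELECT b, max(a) AS max_a FROM table2 GROUP BY b) agg
  ON t1.b = agg.b
WHERE t1.a &amp;gt; agg.max_a;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;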

&lt;p&gt;This pull request adds a rule that enables Presto to decorrelate a subquery
containing a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt; or an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; + &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt; (i.e.
TopN) clause. The common trick during decorrelation is to turn the query into
one that can process the results from the inner table in one shot: the results
of executing the subquery for every row are flattened into a single stream of
rows before the query is finally ready for execution.&lt;/p&gt;

&lt;p&gt;This change also applies to a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LATERAL&lt;/code&gt; join, which behaves much like a nested
subquery, except that it acts as a table and can return multiple rows instead
of just a single row.&lt;/p&gt;

&lt;h2 id=&quot;pr-demo-pr-1415-decorrelate-subqueries-with-limit-or-topn&quot;&gt;PR Demo: PR 1415 Decorrelate subqueries with Limit or TopN&lt;/h2&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- Fails&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-- 1) Returns more than one row from the subquery.&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-- This query actually fails during execution, not during planning/optimizing,&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-- which is where the other two below fail.&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- 2) LIMIT and correlated non-equality predicate in the subquery&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- 3) TopN and correlated non-equality predicate in the subquery&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;After the show, Kasia pointed out that the failing queries were not all
failing for the same reason. The first failing query above actually gets planned
and executed, but the exception occurs during execution. The rest fail during
the planning and optimization phase, because they could not be decorrelated due
to the issue laid out in the comments above.&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-week&quot;&gt;Question of the week: Will running Presto on my relational database make processing faster?&lt;/h2&gt;

&lt;p&gt;In this week’s question, we answer: Will running Presto on my relational 
database make processing faster?&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;I have been going over the docs of PrestoSQL and it seems to fit some of my 
requirements. I am a little concerned about the resources needed to run Presto 
in production, because the size of my prod data is between 3-5 GB and there
will be very minimal data growth. Is Presto suitable for such a small 
data size?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The idea that Presto is fast often gets conflated with the idea that
Presto is a good fit for every use case. It is important to understand that
Presto is a) not a database, b) not designed for OLTP workloads, and c) built
to handle data at terabyte to petabyte scale with distributed queries.
Since Presto uses a connector framework, it also has the added benefit of running
federated queries against any data source that can return data in some
columnar representation.&lt;/p&gt;

&lt;p&gt;For relatively small data sets, try your relational database directly first.
Database indexes are very effective outside the big data world, and if you give
your SQL server, say, 10 GB of memory, it should run fully in memory and
therefore be fast.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;
&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://prestosql.io/blog/2019/05/21/optimizing-the-casts-away.html&quot;&gt;https://prestosql.io/blog/2019/05/21/optimizing-the-casts-away.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://towardsdatascience.com/statistics-in-spark-sql-explained-22ec389bf71b&quot;&gt;https://towardsdatascience.com/statistics-in-spark-sql-explained-22ec389bf71b&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.infoworld.com/article/3597971/on-premises-data-warehouses-are-dead.html&quot;&gt;https://www.infoworld.com/article/3597971/on-premises-data-warehouses-are-dead.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&quot;&gt;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Upcoming events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Dec 16 &lt;a href=&quot;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Feb 9 - Feb 10 &lt;a href=&quot;http://starburstdata.com/datanova&quot;&gt;http://starburstdata.com/datanova&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://prestosql.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://prestosql.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://prestosql.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://prestosql.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://prestosql.io/blog/2020/08/13/training-security.html&quot;&gt;https://prestosql.io/blog/2020/08/13/training-security.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://prestosql.io/blog/2020/08/27/training-performance.html&quot;&gt;https://prestosql.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Presto Summit Series - Real world usage&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://prestosql.io/blog/2020/05/15/state-of-presto.html&quot;&gt;https://prestosql.io/blog/2020/05/15/state-of-presto.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://prestosql.io/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;https://prestosql.io/blog/2020/06/16/presto-summit-zuora.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://prestosql.io/blog/2020/07/06/presto-summit-arm-td.html&quot;&gt;https://prestosql.io/blog/2020/07/06/presto-summit-arm-td.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://prestosql.io/blog/2020/07/22/presto-summit-pinterest.html&quot;&gt;https://prestosql.io/blog/2020/07/22/presto-summit-pinterest.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Podcasts:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.contributor.fyi/presto&quot;&gt;Presto with Martin Traverso, Dain Sundstrom and David Phillips&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&quot;&gt;Simplify Your Data Architecture With The Presto Distributed SQL Engine&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/5980279&quot;&gt;How Open Source Presto Unlocks a Single Point of Access to Data&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/5656471&quot;&gt;The Data Access Struggle is Real&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://softwareengineeringdaily.com/2020/02/07/presto-with-justin-borgman/&quot;&gt;Presto with Justin Borgman&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/3923864&quot;&gt;The infrastructure renaissance and how it will power the modernization of analytics platforms&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://softwareengineeringdaily.com/2018/05/24/ubers-data-platform-with-zhenxiao-luo/&quot;&gt;Uber’s Data Platform with Zhenxiao Luo&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Presto yourself, check out the O’Reilly
book Trino: The Definitive Guide. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Release 348 Release Notes discussed: https://prestosql.io/docs/current/release/release-348.html</summary>

      
      
    </entry>
  
    <entry>
      <title>6: Query Planning, Remove duplicate predicates, and Memory settings</title>
      <link href="https://trino.io/episodes/6.html" rel="alternate" type="text/html" title="6: Query Planning, Remove duplicate predicates, and Memory settings" />
      <published>2020-11-30T00:00:00+00:00</published>
      <updated>2020-11-30T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/6</id>
      <content type="html" xml:base="https://trino.io/episodes/6.html">&lt;h2 id=&quot;release-347&quot;&gt;Release 347&lt;/h2&gt;

&lt;p&gt;We discuss the Trino 347 release notes:
&lt;a href=&quot;https://trino.io/docs/current/release/release-347.html&quot;&gt;https://trino.io/docs/current/release/release-347.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Official release announcement from Martin Traverso:&lt;/p&gt;

&lt;p&gt;We’re happy to announce the release of Presto 347! This version includes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for EXCEPT ALL and INTERSECT ALL&lt;/li&gt;
  &lt;li&gt;New syntax for changing the owner of a view&lt;/li&gt;
  &lt;li&gt;Performance improvements when inserting data into Hive tables&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Notes from Manfred:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New contains_sequence function for arrays.&lt;/li&gt;
  &lt;li&gt;Docker image now based on CentOS 8.&lt;/li&gt;
  &lt;li&gt;Kudu connector gets dynamic filtering support.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-week-query-planning&quot;&gt;Concept of the week: Query planning&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;Planning happens entirely on the coordinator of the cluster.&lt;/li&gt;
  &lt;li&gt;Before a query can be planned, the coordinator receives the SQL query and
passes it to a parser.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Parser/Analyzer&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The parser parses the SQL query into an AST (abstract syntax tree).&lt;/li&gt;
  &lt;li&gt;The analyzer then checks that the SQL is valid, including functions and other constructs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Planner/Optimizer&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Request metadata about structure from catalogs.
    &lt;ul&gt;
      &lt;li&gt;Do the tables and columns exist?&lt;/li&gt;
      &lt;li&gt;What data types are used?&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Request metadata about content (table stats, data location).&lt;/li&gt;
  &lt;li&gt;Create logical plan
    &lt;ul&gt;
      &lt;li&gt;Are function parameters using right data types?&lt;/li&gt;
      &lt;li&gt;What catalogs/schema/tables/columns need to be accessed?&lt;/li&gt;
      &lt;li&gt;Are joins using compatible field data types?&lt;/li&gt;
      &lt;li&gt;Optimize
        &lt;ul&gt;
          &lt;li&gt;Eliminate redundant conditions.&lt;/li&gt;
          &lt;li&gt;Figure out the best order of operations.&lt;/li&gt;
          &lt;li&gt;Apply filtering as early as possible.&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Create distributed plan (More on this in the next episode!)
    &lt;ul&gt;
      &lt;li&gt;Break logical plan up.&lt;/li&gt;
      &lt;li&gt;Adapt to parallel access by multiple workers to data source.&lt;/li&gt;
      &lt;li&gt;Break up operations so workers aggregate and process data from other workers.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXPLAIN&lt;/code&gt; to learn what is planned.
Also refer to chapter 4 in 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;
pg. 50.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-730-remove-duplicate-predicates&quot;&gt;PR of the week: PR 730 Remove duplicate predicates&lt;/h2&gt;

&lt;p&gt;This week’s pull request, &lt;a href=&quot;https://github.com/trinodb/trino/pull/730&quot;&gt;https://github.com/trinodb/trino/pull/730&lt;/a&gt;,
comes from co-creator &lt;a href=&quot;https://github.com/martint&quot;&gt;Martin Traverso&lt;/a&gt;.
It removes duplicate predicates in logical binary expressions
(AND, OR) and canonicalizes commutative arithmetic expressions and comparisons
to handle a larger number of variants. Canonicalize is a big word, but all it
means is that multiple representations of the same logic or data are
simplified to an agreed-upon normal form.&lt;/p&gt;

&lt;p&gt;For example, the statement &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;COALESCE(a * (2 * 3), 1 - 1)&lt;/code&gt; is 
equivalent to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;COALESCE(6 * a, 0)&lt;/code&gt;, as the expression 2 * 3 can
be simplified to a constant integer.&lt;/p&gt;
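The constant-folding half of this can be sketched in a few lines of Python. This is a toy illustration of the idea only, not Trino's actual optimizer code, which operates on its own plan representation:

```python
# Toy constant folder for expression trees like a * (2 * 3) and 1 - 1.
# Expressions are tuples (op, left, right); anything else is a leaf
# (a number, or a column name such as "a").

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}

def fold(expr):
    """Recursively replace operator nodes with constants where both sides are numbers."""
    if not isinstance(expr, tuple):
        return expr  # leaf node: nothing to fold
    op, left, right = expr
    left, right = fold(left), fold(right)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return OPS[op](left, right)  # both sides constant: evaluate now
    return (op, left, right)

print(fold(("*", "a", ("*", 2, 3))))  # a * (2 * 3) folds to ('*', 'a', 6)
print(fold(("-", 1, 1)))              # 1 - 1 folds to 0
```

A real canonicalizer would additionally reorder commutative operands into a normal form (for example, constants first), which this sketch omits.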

&lt;p&gt;This is an example of a logical plan optimization because it rewrites the
query expressions themselves. It differs from the distributed plan in that we
are not determining how the plan will be distributed or where it will run, and
it does not include further optimizations handled by the cost-based optimizer,
such as predicate pushdown. We’ll talk about that step more in the next
episode. For now, let’s cover a few examples.&lt;/p&gt;

&lt;h2 id=&quot;demo-pr-730-remove-duplicate-predicates&quot;&gt;Demo: PR 730 Remove duplicate predicates&lt;/h2&gt;
&lt;p&gt;The EXPLAIN output format used is &lt;a href=&quot;https://graphviz.org/&quot;&gt;Graphviz&lt;/a&gt;. The
online tool used during the show is &lt;a href=&quot;http://viz-js.com/&quot;&gt;Viz.js&lt;/a&gt;. You can paste
the output of your EXPLAIN queries into it to visualize the query as a tree.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;EXPLAIN&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;FORMAT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GRAPHVIZ&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;TYPE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;LOGICAL&lt;/span&gt;
 &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;OR&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;OR&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;EXPLAIN&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;FORMAT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GRAPHVIZ&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;TYPE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;LOGICAL&lt;/span&gt;
 &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;OR&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;OR&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; 

&lt;span class=&quot;k&quot;&gt;EXPLAIN&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;FORMAT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GRAPHVIZ&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;TYPE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DISTRIBUTED&lt;/span&gt;
 &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tpch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tiny&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;  

&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tpch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tiny&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt; 
  &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tpch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tiny&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customer&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt; 
  &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;h2 id=&quot;question-of-the-week-how-should-i-allocate-memory-properties&quot;&gt;Question of the week: How should I allocate memory properties?&lt;/h2&gt;

&lt;p&gt;In this week’s question, we answer:&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;How should I allocate memory properties? CPU : 16Core  MEM:64GB&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Before answering this, we should make sure a few things about memory are clear.&lt;/p&gt;

&lt;h3 id=&quot;user-memory&quot;&gt;User memory&lt;/h3&gt;
&lt;p&gt;Memory for things the user can reason about:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Input Data&lt;/li&gt;
  &lt;li&gt;Hash tables used during execution&lt;/li&gt;
  &lt;li&gt;Sorting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Settings&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.max-memory-per-node&lt;/code&gt;&lt;/strong&gt; - maximum amount of user memory that a query
is allowed to use on a given worker.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.max-memory&lt;/code&gt;&lt;/strong&gt; (without the -per-node at the end) - This config caps 
the amount of user memory used by a single query over all worker nodes in your 
cluster.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;system-memory&quot;&gt;System memory&lt;/h3&gt;
&lt;p&gt;Memory needed to facilitate internal usage&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Shuffle buffers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;NOTE: There are no settings for system memory, as it is implicitly defined
by the user and total memory settings. Use these to calculate system memory:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;max system memory per node = &lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.max-total-memory-per-node&lt;/code&gt;&lt;/strong&gt; - 
 &lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.max-memory-per-node&lt;/code&gt;&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;max system memory = &lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.max-total-memory&lt;/code&gt;&lt;/strong&gt; - &lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.max-memory&lt;/code&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;total-memory&quot;&gt;Total memory&lt;/h3&gt;
&lt;p&gt;Total Memory = System + User, but there are only properties for total and
user memory.&lt;/p&gt;

&lt;p&gt;Settings&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.max-total-memory-per-node&lt;/code&gt;&lt;/strong&gt; - maximum amount of total memory that a
  query is allowed to use on a given worker.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.max-total-memory&lt;/code&gt;&lt;/strong&gt; (without the -per-node at the end) - This config 
caps the total memory used by a single query over all worker nodes in your
cluster.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;heap-headroom&quot;&gt;Heap headroom&lt;/h3&gt;
&lt;p&gt;The final setting I would like to cover is 
&lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;memory.heap-headroom-per-node&lt;/code&gt;&lt;/strong&gt;. This config sets aside a portion of
the JVM heap for allocations that are not tracked by Presto. You can typically
go with the default for this setting, which is 30% of the JVM’s max heap size 
(the -Xmx setting).&lt;/p&gt;

&lt;h3 id=&quot;jvm-heap-memory--xmx-setting&quot;&gt;JVM heap memory (-Xmx setting)&lt;/h3&gt;
&lt;p&gt;Presto is a Java application, so it runs on the JVM. None of these memory
settings mean anything until the JVM that Presto runs on has sufficient memory
set aside. So how do you know you are allocating sufficient memory based on
your settings?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.max-total-memory-per-node&lt;/code&gt;&lt;/strong&gt; + &lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;memory.heap-headroom-per-node&lt;/code&gt;&lt;/strong&gt; &amp;lt; 
 &lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-Xmx&lt;/code&gt; setting (Java heap)&lt;/strong&gt;&lt;/p&gt;
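That rule can be worked through with concrete numbers. A minimal sketch in Python, where the chosen values (a 48 GB heap, 20 GB user memory, 30 GB total memory per node) are illustrative assumptions, not recommendations:

```python
# Sanity-check hypothetical Presto memory settings on one worker.
# All values in GB; these numbers are illustrative, not recommendations.

xmx = 48                            # -Xmx: heap given to the JVM
max_memory_per_node = 20            # query.max-memory-per-node (user memory)
max_total_memory_per_node = 30      # query.max-total-memory-per-node (user + system)
heap_headroom_per_node = 0.3 * xmx  # memory.heap-headroom-per-node default: 30% of -Xmx

# Implied system memory budget per node (total minus user):
system_memory_per_node = max_total_memory_per_node - max_memory_per_node

# The rule above, stated as: -Xmx must exceed
# query.max-total-memory-per-node + memory.heap-headroom-per-node
assert xmx > max_total_memory_per_node + heap_headroom_per_node

print(system_memory_per_node)  # 10
```

With these numbers, 30 GB + 14.4 GB of headroom still fits under the 48 GB heap, so the configuration is consistent.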

&lt;p&gt;&lt;img src=&quot;/assets/episode/6/memory_pools.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Dain covers the proportions in detail in the recent training videos.
Here’s a snippet of what he recommends.&lt;/p&gt;

&lt;iframe width=&quot;1058&quot; height=&quot;595&quot; src=&quot;https://www.youtube.com/embed/Pu80FkBRP-k?start=2569&amp;amp;end=2674&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; 
encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;
&lt;/iframe&gt;

&lt;p&gt;All in all, try to estimate the amount of memory needed by your maximum
anticipated query load, and if possible provision even more than your estimate.
Once users discover Presto, they will use it more and more, and demands on the
system will grow.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;
&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2019/05/21/optimizing-the-casts-away.html&quot;&gt;https://trino.io/blog/2019/05/21/optimizing-the-casts-away.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://towardsdatascience.com/statistics-in-spark-sql-explained-22ec389bf71b&quot;&gt;https://towardsdatascience.com/statistics-in-spark-sql-explained-22ec389bf71b&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.infoworld.com/article/3597971/on-premises-data-warehouses-are-dead.html&quot;&gt;https://www.infoworld.com/article/3597971/on-premises-data-warehouses-are-dead.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&quot;&gt;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Upcoming events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Dec 8 &lt;a href=&quot;https://www.meetup.com/Warsaw-Data-Engineering/events/274939817/&quot;&gt;https://www.meetup.com/Warsaw-Data-Engineering/events/274939817/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 9 &lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-providence-ri/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-providence-ri/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 10 &lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-denver-co/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-denver-co/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 10 &lt;a href=&quot;https://www.evanta.com/cdo/san-francisco/2020-san-francisco-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/san-francisco/2020-san-francisco-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 16 &lt;a href=&quot;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Presto Summit Series - Real world usage&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/05/15/state-of-presto.html&quot;&gt;https://trino.io/blog/2020/05/15/state-of-presto.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&quot;&gt;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&quot;&gt;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Podcasts:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.contributor.fyi/presto&quot;&gt;Presto with Martin Traverso, Dain Sundstrom and David Phillips&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&quot;&gt;Simplify Your Data Architecture With The Presto Distributed SQL Engine&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/5980279&quot;&gt;How Open Source Presto Unlocks a Single Point of Access to Data&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/5656471&quot;&gt;The Data Access Struggle is Real&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://softwareengineeringdaily.com/2020/02/07/presto-with-justin-borgman/&quot;&gt;Presto with Justin Borgman&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/3923864&quot;&gt;The infrastructure renaissance and how it will power the modernization of analytics platforms&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://softwareengineeringdaily.com/2018/05/24/ubers-data-platform-with-zhenxiao-luo/&quot;&gt;Uber’s Data Platform with Zhenxiao Luo&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Presto yourself, check out the O’Reilly
book Trino: The Definitive Guide. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Release 347</summary>

      
      
    </entry>
  
    <entry>
      <title>5: Hive Partitions, sync_partition_metadata, and Query Exceeded Max Columns!</title>
      <link href="https://trino.io/episodes/5.html" rel="alternate" type="text/html" title="5: Hive Partitions, sync_partition_metadata, and Query Exceeded Max Columns!" />
      <published>2020-11-19T00:00:00+00:00</published>
      <updated>2020-11-19T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/5</id>
      <content type="html" xml:base="https://trino.io/episodes/5.html">&lt;p&gt;In this week’s concept, Manfred discusses Hive Partitioning.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Concept from RDBMS systems implemented in HDFS&lt;/li&gt;
  &lt;li&gt;Normally just multiple files in a directory per table&lt;/li&gt;
  &lt;li&gt;Lots of different file formats, but always one directory&lt;/li&gt;
  &lt;li&gt;Partitioning creates nested directories&lt;/li&gt;
  &lt;li&gt;Needs to be set up at start of table creation&lt;/li&gt;
  &lt;li&gt;CTAS query&lt;/li&gt;
  &lt;li&gt;Uses WITH (partitioned_by = ARRAY['date'])&lt;/li&gt;
  &lt;li&gt;Results in tablename/date=2020-11-19&lt;/li&gt;
  &lt;li&gt;Can also nest deeper: WITH (partitioned_by = ARRAY['date', 'countrycode'])&lt;/li&gt;
  &lt;li&gt;Can greatly enhance performance&lt;/li&gt;
  &lt;li&gt;Optimizer can determine what directories to read based on field&lt;/li&gt;
  &lt;li&gt;Especially useful when fields are used in WHERE clauses&lt;/li&gt;
  &lt;li&gt;Also useful for managing historic data over time, such as moving data out
to an archive, deleting data, replacing data with aggregates, or just
  running compaction on subsets&lt;/li&gt;
  &lt;li&gt;Presto can use DELETE on partitions with DELETE FROM table WHERE date=value&lt;/li&gt;
  &lt;li&gt;Also possible to create empty partitions upfront with CALL system.create_empty_partition&lt;/li&gt;
&lt;/ul&gt;
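
&lt;p&gt;The CTAS approach from the notes above can be sketched like this, assuming a
hypothetical source table &lt;code&gt;minio.part.events&lt;/code&gt; that already has a
&lt;code&gt;dt&lt;/code&gt; column:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;-- Create and populate a partitioned table in one statement (CTAS).
-- Partition columns must be listed last in the SELECT.
CREATE TABLE minio.part.events_by_date
WITH (
  format = 'ORC',
  partitioned_by = ARRAY['dt']
)
AS SELECT id, name, dt
FROM minio.part.events;&lt;/code&gt;&lt;/pre&gt;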

&lt;p&gt;See here for more details: &lt;a href=&quot;https://www.educba.com/partitioning-in-hive/&quot;&gt;https://www.educba.com/partitioning-in-hive/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This week’s pull request, &lt;a href=&quot;https://github.com/trinodb/trino/pull/223&quot;&gt;https://github.com/trinodb/trino/pull/223&lt;/a&gt;, 
came from contributor &lt;a href=&quot;https://github.com/luohao&quot;&gt;Hao Luo&lt;/a&gt;. The procedure it adds
is similar to Hive’s &lt;a href=&quot;https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RecoverPartitions(MSCKREPAIRTABLE)&quot;&gt;MSCK REPAIR TABLE&lt;/a&gt;:
if it finds a Hive partition directory in the filesystem that exists but has
no partition entry in the metastore, it adds the entry to the
metastore. If there is an entry in the metastore but the partition was deleted
from the filesystem, it removes the metastore entry. You can find
more information about &lt;a href=&quot;https://trino.io/docs/current/connector/hive.html#procedures&quot;&gt;this procedure in the documentation&lt;/a&gt;.&lt;/p&gt;
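
&lt;p&gt;As a sketch, the procedure takes a schema name, a table name, and a mode, and is
called against the Hive catalog in use (here with the schema and table names from
the demo):&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;-- ADD: create metastore entries for partition directories found in the filesystem
CALL system.sync_partition_metadata('part', 'orders', 'ADD');
-- DROP: remove metastore entries whose directories no longer exist
CALL system.sync_partition_metadata('part', 'orders', 'DROP');
-- FULL: do both ADD and DROP in one pass
CALL system.sync_partition_metadata('part', 'orders', 'FULL');&lt;/code&gt;&lt;/pre&gt;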

&lt;p&gt;Here are the commands and SQL statements run against Presto during the show.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;SHOW&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CATALOGS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;SHOW&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SCHEMAS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SHOW&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TABLES&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;IN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;part&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SCHEMA&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;part&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;location&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;s3a://part/&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- Create a table with no partitions&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;part&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;no_part&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dt&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;format&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;ORC&apos;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;part&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;no_part&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-1&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-18&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-2&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-18&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-3&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-19&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-4&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-19&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-5&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-20&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-6&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-20&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;part&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dt&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;format&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;ORC&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;partitioned_by&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ARRAY&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;dt&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;part&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-1&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-18&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-2&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-18&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-3&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-19&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-4&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-19&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-5&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-20&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-6&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-20&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;part&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;no_part&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dt&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-20&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
 
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;part&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dt&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-20&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;DELETE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;part&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dt&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-18&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;


&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;part&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- Make sure you are using the minio catalog (a renamed hive catalog)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CALL&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;system&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sync_partition_metadata&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;part&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;orders&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;ADD&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CALL&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;system&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sync_partition_metadata&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;part&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;orders&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;DROP&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CALL&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;system&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sync_partition_metadata&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;part&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;orders&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;FULL&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

 &lt;span class=&quot;c1&quot;&gt;-- Create a table with multi partitions&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;part&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;multi_part&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;year&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;month&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;day&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;format&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;ORC&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;partitioned_by&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ARRAY&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;year&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;month&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;day&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;part&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;multi_part&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-1&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;11&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;18&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-2&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;11&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;18&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-3&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;11&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;19&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-4&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;11&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;19&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-5&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;11&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;20&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-6&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;11&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;20&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-7&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2019&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;11&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;18&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-8&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2019&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;01&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;18&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;9&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-9&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2019&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;11&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;19&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-10&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2019&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;01&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;19&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;11&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-11&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2019&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;11&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;20&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;12&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-12&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2019&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;01&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;20&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;We ran some queries against the metastore database. It’s a complicated model, so 
here is a database diagram showing the different tables and their relations in
the metastore.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/episode/5/hive_metastore_database_diagram.png&quot; alt=&quot;&quot; /&gt;
This diagram was generated by niftimusmaximus on 
&lt;a href=&quot;https://analyticsanvil.wordpress.com/2016/08/21/useful-queries-for-the-hive-metastore/&quot;&gt;The Analytics Anvil&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;MariaDB (metastore database)&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;n&quot;&gt;USE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;metastore_db&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- show database&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DBS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- show tables given a database&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DBS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TBLS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DB_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DB_ID&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;NAME&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- show location and input format of the table given database/table names&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SD_ID&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;INPUT_FORMAT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;LOCATION&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SERDE_ID&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DBS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TBLS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DB_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DB_ID&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SDS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SD_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SD_ID&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TBL_NAME&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;orders&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;NAME&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;part&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- show (de)serializer format of the table given database/table names&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SERDE_ID&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;NAME&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SLIB&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DBS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TBLS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DB_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DB_ID&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SDS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SD_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SD_ID&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SERDES&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sd&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SERDE_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SERDE_ID&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TBL_NAME&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;orders&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;NAME&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;part&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- show columns of the table given database/table names&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DBS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TBLS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DB_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DB_ID&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SDS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SD_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SD_ID&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;COLUMNS_V2&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CD_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CD_ID&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TBL_NAME&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;orders&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;NAME&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;part&apos;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;by&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CD_ID&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;INTEGER_IDX&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- show partitions of the table given database/table names&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;LOCATION&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DBS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TBLS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DB_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DB_ID&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PARTITIONS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;p&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TBL_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TBL_ID&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SDS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SD_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SD_ID&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TBL_NAME&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;orders&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;NAME&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;part&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;In this week’s question, we answer:&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;Why am I getting, “Query exceeded maximum columns. Please reduce the number 
of columns referenced and re-run the query.”?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;I’m running this query to check for duplicates. My table has approx. 650
columns and I get this error.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT *, COUNT(1) 
FROM tbl 
GROUP BY * 
HAVING COUNT(1) &amp;gt; 1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;getting a stack trace like this:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;io.prestosql.spi.PrestoException: Compiler failed
	at io.prestosql.sql.planner.LocalExecutionPlanner$Visitor.visitScanFilterAndProject(LocalExecutionPlanner.java:1306)
	at io.prestosql.sql.planner.LocalExecutionPlanner$Visitor.visitProject(LocalExecutionPlanner.java:1185)
	at io.prestosql.sql.planner.LocalExecutionPlanner$Visitor.visitProject(LocalExecutionPlanner.java:705)
	at io.prestosql.sql.planner.plan.ProjectNode.accept(ProjectNode.java:82)
	at io.prestosql.sql.planner.LocalExecutionPlanner$Visitor.visitAggregation(LocalExecutionPlanner.java:1119)
	at io.prestosql.sql.planner.LocalExecutionPlanner$Visitor.visitAggregation(LocalExecutionPlanner.java:705)
	at io.prestosql.sql.planner.plan.AggregationNode.accept(AggregationNode.java:204)
	at io.prestosql.sql.planner.LocalExecutionPlanner.plan(LocalExecutionPlanner.java:461)
	at io.prestosql.sql.planner.LocalExecutionPlanner.plan(LocalExecutionPlanner.java:432)
	at io.prestosql.execution.SqlTaskExecutionFactory.create(SqlTaskExecutionFactory.java:75)
	at io.prestosql.execution.SqlTask.updateTask(SqlTask.java:382)
	at io.prestosql.execution.SqlTaskManager.updateTask(SqlTaskManager.java:383)
	at io.prestosql.server.TaskResource.createOrUpdateTask(TaskResource.java:128)
	at jdk.internal.reflect.GeneratedMethodAccessor480.invoke(Unknown Source)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The throwable that causes this error, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MethodTooLargeException&lt;/code&gt;, comes from the
&lt;a href=&quot;https://asm.ow2.io/&quot;&gt;ASM library&lt;/a&gt; when Trino asks it to create a method with more
bytecode than the JVM specification allows.&lt;/p&gt;

&lt;p&gt;Trino generates bytecode to handle the given query, and in this case the
generated code is too large. Since the amount of generated code is proportional
to the number of columns referenced, we rewrap the exception in a message that
is more meaningful to the user.&lt;/p&gt;

&lt;p&gt;The general strategy is to reduce the number of columns that you reference.&lt;/p&gt;

&lt;p&gt;The trade-off is that removing columns also removes information from the
query. In the duplicate-check example above, you won’t be able to rule out
false positive matches, but the result may still be good enough to narrow the
search space. As always, it depends…&lt;/p&gt;
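&lt;p&gt;One way to cut the column count is to group on a subset of columns that are
likely to identify a row. The column names below are placeholders for
illustration, not from the original question:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- group only on a few likely-identifying columns
SELECT col1, col2, col3, COUNT(1)
FROM tbl
GROUP BY col1, col2, col3
HAVING COUNT(1) &amp;gt; 1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;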

&lt;p&gt;To learn more about the JVM limit, search for code_length in the Java SE
specification:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.7.3&quot;&gt;SE8&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.7.3&quot;&gt;SE11&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Special thanks to &lt;a href=&quot;https://github.com/hashhar&quot;&gt;Ashhar Hasan&lt;/a&gt; for asking this 
question and providing some useful context!&lt;/p&gt;

&lt;p&gt;Release Notes discussed:
&lt;a href=&quot;https://trino.io/docs/current/release/release-346.html&quot;&gt;https://trino.io/docs/current/release/release-346.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Manfred’s Training - SQL at any scale
&lt;a href=&quot;https://www.simpligility.com/2020/10/join-me-for-presto-first-steps/&quot;&gt;https://www.simpligility.com/2020/10/join-me-for-presto-first-steps/&lt;/a&gt;
&lt;a href=&quot;https://learning.oreilly.com/live-training/courses/presto-first-steps/0636920462859/&quot;&gt;https://learning.oreilly.com/live-training/courses/presto-first-steps/0636920462859/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.javahelps.com/2020/05/presto-sql-for-newbies.html&quot;&gt;https://www.javahelps.com/2020/05/presto-sql-for-newbies.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.javahelps.com/2020/04/setup-presto-sql-development-environment.html&quot;&gt;https://www.javahelps.com/2020/04/setup-presto-sql-development-environment.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.javahelps.com/2019/11/presto-sql-types-of-joins.html&quot;&gt;https://www.javahelps.com/2019/11/presto-sql-types-of-joins.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&quot;&gt;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/analytics-vidhya/deploying-starburst-enterprise-presto-on-googles-kubernetes-engine-with-storage-and-postgres-72483b10ab62&quot;&gt;https://medium.com/analytics-vidhya/deploying-starburst-enterprise-presto-on-googles-kubernetes-engine-with-storage-and-postgres-72483b10ab62&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Upcoming events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Nov 19 Presto Tokyo Conference - Japanese &lt;a href=&quot;https://techplay.jp/event/795265&quot;&gt;https://techplay.jp/event/795265&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Nov 24 EMEA - Polish &lt;a href=&quot;https://www.meetup.com/Warsaw-Data-Engineering/events/274666392/&quot;&gt;https://www.meetup.com/Warsaw-Data-Engineering/events/274666392/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 2 &lt;a href=&quot;https://www.evanta.com/cdo/atlanta/2020-atlanta-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/atlanta/2020-atlanta-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 3 EMEA &lt;a href=&quot;https://www.starburstdata.com/introduction-to-presto/&quot;&gt;https://www.starburstdata.com/introduction-to-presto/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 9 &lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-providence-ri/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-providence-ri/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 10 &lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-denver-co/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-denver-co/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 10 &lt;a href=&quot;https://www.evanta.com/cdo/san-francisco/2020-san-francisco-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/san-francisco/2020-san-francisco-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 16 &lt;a href=&quot;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Presto Summit Series - Real world usage&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/05/15/state-of-presto.html&quot;&gt;https://trino.io/blog/2020/05/15/state-of-presto.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&quot;&gt;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&quot;&gt;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Recent Podcasts:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.contributor.fyi/presto&quot;&gt;https://www.contributor.fyi/presto&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&quot;&gt;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Presto yourself, you should check out the 
O’Reilly book Trino: The Definitive Guide. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>In this week’s concept, Manfred discusses Hive Partitioning. Concept from RDBMS systems implemented in HDFS Normally just multiple files in a directory per table Lots of different file formats, but always one directory Partitioning creates nested directories Needs to be set up at start of table creation CTAS query Uses WITH ( partitioned_by = ARRAY[‘date’]) Results in tablename/date=2020-11-19 Can also nest deeper WITH ( partitioned_by = ARRAY[‘date’, ‘countrycode’]) Can greatly enhance performance Optimizer can determine what directories to read based on field Especially useful when fields are used in WHERE clauses Also useful for historic data management over time such as moving data out to archive, deleting data, or replacing data with aggregates, or even just running compaction on subsets Presto can use DELETE on partitions using DELETE FROM table WHERE date=value Also possible to create empty partitions upfront CALL system.create_empty_partition See here for more details: https://www.educba.com/partitioning-in-hive/</summary>

      
      
    </entry>
  
    <entry>
      <title>4: Presto on ACID, row-level INSERT/DELETE, and why JDK11?</title>
      <link href="https://trino.io/episodes/4.html" rel="alternate" type="text/html" title="4: Presto on ACID, row-level INSERT/DELETE, and why JDK11?" />
      <published>2020-11-04T00:00:00+00:00</published>
      <updated>2020-11-04T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/4</id>
      <content type="html" xml:base="https://trino.io/episodes/4.html">&lt;p&gt;In this week’s concept, Manfred discusses ACID in general, CAP theorem, 
HDFS and Hive before ACID, and now ORC ACID and similar support.&lt;/p&gt;

&lt;p&gt;ACID &lt;a href=&quot;https://en.wikipedia.org/wiki/ACID&quot;&gt;https://en.wikipedia.org/wiki/ACID&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Atomicity - Transaction completely succeeds or completely fails, with no
partial results, so no inconsistent relationships are left dangling. The
database remains in a consistent state.&lt;/li&gt;
  &lt;li&gt;Consistency - database content always adheres to defined rules (key
 constraints).&lt;/li&gt;
  &lt;li&gt;Isolation - transactions are isolated from each other and can run in
  parallel with the same result as running sequentially.&lt;/li&gt;
  &lt;li&gt;Durability - no data is lost after transaction completion.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ACID used to be a crucial criterion for a “serious” relational database system.&lt;/p&gt;

&lt;p&gt;Then came big data and the CAP theorem. &lt;a href=&quot;https://en.wikipedia.org/wiki/CAP_theorem&quot;&gt;https://en.wikipedia.org/wiki/CAP_theorem&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Consistency&lt;/li&gt;
  &lt;li&gt;Availability&lt;/li&gt;
  &lt;li&gt;Partition tolerance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This week’s pull request, &lt;a href=&quot;https://github.com/trinodb/trino/pull/5402&quot;&gt;https://github.com/trinodb/trino/pull/5402&lt;/a&gt;,
came from contributor &lt;a href=&quot;https://github.com/djsstarburst&quot;&gt;David Stryker&lt;/a&gt;. David
covers some interesting aspects of working on this pull request. The commit
adds support for row-level insert and delete for Hive ACID tables, along with
product tests that verify that row-level insert and delete are allowed.&lt;/p&gt;

&lt;p&gt;Here is the SQL that we ran in the INSERT/DELETE demo&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/*
  Ran against Presto
*/
SHOW SCHEMAS IN minio;
SHOW TABLES IN minio.acid;

CREATE SCHEMA minio.acid
WITH (location = &apos;s3a://acid/&apos;);


CREATE TABLE minio.acid.test (a int, b int)
WITH (
   format=&apos;ORC&apos;,
   transactional=true
);

INSERT INTO minio.acid.test VALUES (10, 10), (20, 20);

SELECT * FROM  minio.acid.test;

DELETE FROM minio.acid.test WHERE a = 10;

/*
  Ran against Hive
*/

SHOW DATABASES;

SELECT * FROM acid.test;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;David also mentioned &lt;a href=&quot;http://shzhangji.com/blog/2019/06/10/understanding-hive-acid-transactional-table/&quot;&gt;this blog&lt;/a&gt;
to better understand the Hive ACID model.&lt;/p&gt;

&lt;p&gt;In this week’s question we answer, “Why is Java 11 needed in the newer
versions of Presto, and how do I get an older version of Presto? I need 328,
the latest version on Java 8, since Java 11 isn’t available for me to use.”&lt;/p&gt;

&lt;p&gt;Presto uses Java 11 because it is the next LTS version of Java after 8. Java 11 
provides significant performance and stability improvements, so we believe 
everyone should be running that version to get the best experience out of 
Presto. Moving to Java 11 also allows us to take advantage of many improvements 
to the JDK and the Java language introduced since Java 8.&lt;/p&gt;

&lt;p&gt;Older versions can be downloaded from Maven Central, and older versions of the documentation remain available:
&lt;a href=&quot;https://repo.maven.apache.org/maven2/io/prestosql/presto-server/&quot;&gt;https://repo.maven.apache.org/maven2/io/prestosql/presto-server/&lt;/a&gt;
&lt;a href=&quot;https://trino.io/docs/328/&quot;&gt;https://trino.io/docs/328/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One thing to point out is that only the server requires JDK 11; the client
can run on JDK 8. One reason you might need to run Presto on JDK 8 is that the
server has to share a machine with another service running JDK 8. We don’t
recommend this, as it degrades the performance of your cluster and can cause
other issues if Presto is fighting for resources.&lt;/p&gt;

&lt;p&gt;Another possibility is that there is
a company policy requiring specific JDKs be installed on all servers. You can
have side-by-side installs of multiple versions of the JDK and use the
appropriate one. You just need to launch Presto with the correct java command. 
If your company is reluctant to adopt a newer JDK, you can use the arguments
above to get the policy updated to at least include JDK 11.&lt;/p&gt;
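&lt;p&gt;For example, a side-by-side setup only needs the launcher to pick up the
right JDK. The install path below is an example; adjust it to your system:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# point JAVA_HOME at the JDK 11 install (example path)
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk
export PATH=&quot;$JAVA_HOME/bin:$PATH&quot;
bin/launcher start
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;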

&lt;p&gt;Release Notes discussed:
&lt;a href=&quot;https://trino.io/docs/current/release/release-345.html&quot;&gt;https://trino.io/docs/current/release/release-345.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Manfred’s Training - SQL at any scale
&lt;a href=&quot;https://www.simpligility.com/2020/10/join-me-for-presto-first-steps/&quot;&gt;https://www.simpligility.com/2020/10/join-me-for-presto-first-steps/&lt;/a&gt;
&lt;a href=&quot;https://learning.oreilly.com/live-training/courses/presto-first-steps/0636920462859/&quot;&gt;https://learning.oreilly.com/live-training/courses/presto-first-steps/0636920462859/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Blogs&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://postgresconf.org/conferences/postgres-webinar-series/program/proposals/live-demo-creating-a-single-point-of-access-to-multiple-postgres-servers-using-starburst-presto&quot;&gt;https://postgresconf.org/conferences/postgres-webinar-series/program/proposals/live-demo-creating-a-single-point-of-access-to-multiple-postgres-servers-using-starburst-presto&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://postgresconf.org/conferences/postgres-webinar-series/program/proposals/live-demo-unlock-data-in-postgres-servers-to-query-it-with-other-data-sources-like-hive-kafka-other-dbmss-and-more&quot;&gt;https://postgresconf.org/conferences/postgres-webinar-series/program/proposals/live-demo-unlock-data-in-postgres-servers-to-query-it-with-other-data-sources-like-hive-kafka-other-dbmss-and-more&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://blog.bigdataboutique.com/2020/09/presto-meets-elasticsearch-our-elasticsearch-connector-for-presto-video-mbywtm &quot;&gt;https://blog.bigdataboutique.com/2020/09/presto-meets-elasticsearch-our-elasticsearch-connector-for-presto-video-mbywtm &lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Upcoming events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Nov 12 Webinar: &lt;a href=&quot;https://www.starburstdata.com/webinar-lower-cdw-costs-starburst&quot;&gt;https://www.starburstdata.com/webinar-lower-cdw-costs-starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Nov 17 &lt;a href=&quot;https://databricks.com/session_eu20/presto-fast-sql-on-anything-including-delta-lake-snowflake-elasticsearch-and-more&quot;&gt;https://databricks.com/session_eu20/presto-fast-sql-on-anything-including-delta-lake-snowflake-elasticsearch-and-more&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Nov 19 &lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-detroit-mi/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-detroit-mi/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 2 &lt;a href=&quot;https://www.evanta.com/cdo/atlanta/2020-atlanta-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/atlanta/2020-atlanta-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 9 &lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-providence-ri/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-providence-ri/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 10 &lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-denver-co/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-denver-co/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 10 &lt;a href=&quot;https://www.evanta.com/cdo/san-francisco/2020-san-francisco-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/san-francisco/2020-san-francisco-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 16 &lt;a href=&quot;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Presto Summit Series - Real world usage&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/05/15/state-of-presto.html&quot;&gt;https://trino.io/blog/2020/05/15/state-of-presto.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&quot;&gt;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&quot;&gt;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Recent Podcasts:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.contributor.fyi/presto&quot;&gt;https://www.contributor.fyi/presto&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&quot;&gt;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Presto yourself, you should check out the 
O’Reilly book Trino: The Definitive Guide. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>In this week’s concept, Manfred discusses ACID in general, CAP theorem, HDFS and Hive before ACID, and now ORC ACID and similar support.</summary>

      
      
    </entry>
  
    <entry>
      <title>3: Running two Presto distributions and Kafka headers as Presto columns</title>
      <link href="https://trino.io/episodes/3.html" rel="alternate" type="text/html" title="3: Running two Presto distributions and Kafka headers as Presto columns" />
      <published>2020-10-22T00:00:00+00:00</published>
      <updated>2020-10-22T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/3</id>
      <content type="html" xml:base="https://trino.io/episodes/3.html">&lt;p&gt;In this week’s concept, Manfred discusses what an SPI (service provider 
interface) is and covers the connector architecture, including Presto, 
Starburst, and custom connectors.&lt;/p&gt;

&lt;p&gt;This week’s pull request, &lt;a href=&quot;https://github.com/trinodb/trino/pull/4462&quot;&gt;https://github.com/trinodb/trino/pull/4462&lt;/a&gt;, 
came from user &lt;a href=&quot;https://github.com/0xE282B0&quot;&gt;Sven Pfennig&lt;/a&gt;. Sven works for 
&lt;a href=&quot;https://syncier.com&quot;&gt;Syncier GmbH&lt;/a&gt; and as part of his role there he gets to contribute
to open source projects such as Presto. Thanks, Sven! We jump into a quick setup
of a kafka broker using the 
&lt;a href=&quot;https://kafka.apache.org/quickstart&quot;&gt;kafka quickstart tutorial&lt;/a&gt; and I use the 
&lt;a href=&quot;https://github.com/edenhill/kafkacat&quot;&gt;kafkacat tool&lt;/a&gt; to show off the addition 
of headers in Kafka that Sven has provided us and discuss why this is 
beneficial.&lt;/p&gt;

&lt;p&gt;Here’s the crazy select statement I used to decode the binary values of the
foo column to UTF-8 text:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT 
   _message, 
   reduce(element_at(_headers,&apos;foo&apos;), &apos;&apos;, (s, c) -&amp;gt; s || from_utf8(c), s -&amp;gt; s) AS foo 
FROM kafka.default.pcb 
WHERE contains(map_keys(_headers), &apos;foo&apos;);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
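&lt;p&gt;If the foo header appears at most once per message, a simpler variant (a
sketch, not shown in the episode) can decode just the first value:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT 
   _message, 
   from_utf8(element_at(_headers, &apos;foo&apos;)[1]) AS foo 
FROM kafka.default.pcb 
WHERE contains(map_keys(_headers), &apos;foo&apos;);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;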

&lt;p&gt;An alternative tutorial that uses the TPC dataset is available on the website:
&lt;a href=&quot;https://trino.io/docs/current/connector/kafka-tutorial.html&quot;&gt;https://trino.io/docs/current/connector/kafka-tutorial.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This week’s question was accidentally cut off: I had mapped Shift + R to
toggle streaming/recording, which cut the broadcast when I typed the R in
FROM.&lt;/p&gt;

&lt;p&gt;Release Notes discussed:
&lt;a href=&quot;https://trino.io/docs/current/release/release-344.html&quot;&gt;https://trino.io/docs/current/release/release-344.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Manfred’s Training - SQL at any scale
&lt;a href=&quot;https://www.simpligility.com/2020/10/join-me-for-presto-first-steps/&quot;&gt;https://www.simpligility.com/2020/10/join-me-for-presto-first-steps/&lt;/a&gt;
&lt;a href=&quot;https://learning.oreilly.com/live-training/courses/presto-first-steps/0636920462859/&quot;&gt;https://learning.oreilly.com/live-training/courses/presto-first-steps/0636920462859/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Blogs&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://postgresconf.org/conferences/postgres-webinar-series/program/proposals/live-demo-creating-a-single-point-of-access-to-multiple-postgres-servers-using-starburst-presto&quot;&gt;https://postgresconf.org/conferences/postgres-webinar-series/program/proposals/live-demo-creating-a-single-point-of-access-to-multiple-postgres-servers-using-starburst-presto&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://postgresconf.org/conferences/postgres-webinar-series/program/proposals/live-demo-unlock-data-in-postgres-servers-to-query-it-with-other-data-sources-like-hive-kafka-other-dbmss-and-more&quot;&gt;https://postgresconf.org/conferences/postgres-webinar-series/program/proposals/live-demo-unlock-data-in-postgres-servers-to-query-it-with-other-data-sources-like-hive-kafka-other-dbmss-and-more&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://blog.bigdataboutique.com/2020/09/presto-meets-elasticsearch-our-elasticsearch-connector-for-presto-video-mbywtm&quot;&gt;https://blog.bigdataboutique.com/2020/09/presto-meets-elasticsearch-our-elasticsearch-connector-for-presto-video-mbywtm&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Upcoming events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/BigDataATL/events/273435961/&quot;&gt;https://www.meetup.com/BigDataATL/events/273435961/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://edw2020chicago.dataversity.net/&quot;&gt;https://edw2020chicago.dataversity.net/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-portland-or-2/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-portland-or-2/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-minneapolis-mn-2/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-minneapolis-mn-2/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-jacksonville-fl/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-jacksonville-fl/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.evanta.com/cdo/san-francisco/2020-san-francisco-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/san-francisco/2020-san-francisco-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-detroit-mi/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-detroit-mi/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.evanta.com/cdo/atlanta/2020-atlanta-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/atlanta/2020-atlanta-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-providence-ri/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-providence-ri/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-denver-co/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-denver-co/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin:
&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Presto Summit Series - Real world usage
&lt;a href=&quot;https://trino.io/blog/2020/05/15/state-of-presto.html&quot;&gt;https://trino.io/blog/2020/05/15/state-of-presto.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&quot;&gt;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&quot;&gt;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Recent Podcasts:
&lt;a href=&quot;https://www.contributor.fyi/presto&quot;&gt;https://www.contributor.fyi/presto&lt;/a&gt;
&lt;a href=&quot;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&quot;&gt;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you want to learn more about Presto yourself, you should check out the 
O’Reilly book Trino: The Definitive Guide. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>In this week’s concept, Manfred discusses what an SPI (service provider interface) is and covers the connector architecture of Presto, Starburst, and Custom.</summary>

      
      
    </entry>
  
    <entry>
      <title>2: Kubernetes, arrays on Elasticsearch, and security breaks the UI</title>
      <link href="https://trino.io/episodes/2.html" rel="alternate" type="text/html" title="2: Kubernetes, arrays on Elasticsearch, and security breaks the UI" />
      <published>2020-10-07T00:00:00+00:00</published>
      <updated>2020-10-07T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/2</id>
      <content type="html" xml:base="https://trino.io/episodes/2.html">&lt;p&gt;This week we had a bit of a technical issue between Zoom and OBS, so some
editing was done to remove a portion of the broadcast, mainly cutting out our
coverage of the releases. We circle back and give a small summary, but
unfortunately lost the majority of that part of the conversation.&lt;/p&gt;

&lt;p&gt;In this week’s concept, we cover a general overview of Kubernetes and how it
is used when deploying and scaling up Presto. We also dive into how this is
being used at our guest Cory Darby’s company, BlueCat.&lt;/p&gt;

&lt;p&gt;This week’s pull request is
&lt;a href=&quot;https://github.com/trinodb/trino/pull/2462&quot;&gt;https://github.com/trinodb/trino/pull/2462&lt;/a&gt;, which closes issue
&lt;a href=&quot;https://github.com/trinodb/trino/issues/2441&quot;&gt;https://github.com/trinodb/trino/issues/2441&lt;/a&gt;. This was actually a PR Brian
submitted some months ago. He dives a bit into 
&lt;a href=&quot;https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html&quot;&gt;Elasticsearch mappings&lt;/a&gt; 
and how Elasticsearch models its data. He then covers how this motivated the 
pull request, which addresses the need for explicit mappings declaring which 
Elasticsearch fields are array types and which are scalar.&lt;/p&gt;
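
&lt;p&gt;As a rough sketch of the feature: the connector reads array markers from the
index mapping’s _meta section. Assuming a hypothetical index with a field named
array_string_field, the mapping would look roughly like this (the key was
presto at the time of this PR; current Trino versions read the trino key):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;{
    &quot;_meta&quot;: {
        &quot;presto&quot;: {
            &quot;array_string_field&quot;: {
                &quot;isArray&quot;: true
            }
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;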

&lt;p&gt;In this week’s question, we answer, “Why does the web UI say ‘disabled’?” This
typically comes from a security setup issue, and as a bonus we cover another
similar issue that occurs when you are using a proxy.&lt;/p&gt;
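
&lt;p&gt;As a hedged sketch (these property names come from the Trino documentation
and are not necessarily the exact fix discussed in the episode), the relevant
config.properties settings look like this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Keep the web UI enabled (it is on by default)
web-ui.enabled=true
# Behind a proxy or load balancer, process X-Forwarded-* headers so
# authentication and redirects see the original client scheme and address
http-server.process-forwarded=true
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;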

&lt;p&gt;Release Notes discussed:
&lt;a href=&quot;https://trino.io/docs/current/release/release-342.html&quot;&gt;https://trino.io/docs/current/release/release-342.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/docs/current/release/release-343.html&quot;&gt;https://trino.io/docs/current/release/release-343.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Manfred’s Training - SQL at any scale
&lt;a href=&quot;https://www.simpligility.com/2020/10/join-me-for-presto-first-steps/&quot;&gt;https://www.simpligility.com/2020/10/join-me-for-presto-first-steps/&lt;/a&gt;
&lt;a href=&quot;https://learning.oreilly.com/live-training/courses/presto-first-steps/0636920462859/&quot;&gt;https://learning.oreilly.com/live-training/courses/presto-first-steps/0636920462859/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Blogs&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://postgresconf.org/conferences/postgres-webinar-series/program/proposals/live-demo-creating-a-single-point-of-access-to-multiple-postgres-servers-using-starburst-presto&quot;&gt;https://postgresconf.org/conferences/postgres-webinar-series/program/proposals/live-demo-creating-a-single-point-of-access-to-multiple-postgres-servers-using-starburst-presto&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://postgresconf.org/conferences/postgres-webinar-series/program/proposals/live-demo-unlock-data-in-postgres-servers-to-query-it-with-other-data-sources-like-hive-kafka-other-dbmss-and-more&quot;&gt;https://postgresconf.org/conferences/postgres-webinar-series/program/proposals/live-demo-unlock-data-in-postgres-servers-to-query-it-with-other-data-sources-like-hive-kafka-other-dbmss-and-more&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://medium.com/@joshua_robinson/presto-and-fast-object-putting-backups-to-use-for-devops-and-machine-learning-s3-46876eef4ffa&quot;&gt;https://medium.com/@joshua_robinson/presto-and-fast-object-putting-backups-to-use-for-devops-and-machine-learning-s3-46876eef4ffa&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Upcoming events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/BigDataATL/events/273435961/&quot;&gt;https://www.meetup.com/BigDataATL/events/273435961/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://edw2020chicago.dataversity.net/&quot;&gt;https://edw2020chicago.dataversity.net/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-portland-or-2/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-portland-or-2/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-minneapolis-mn-2/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-minneapolis-mn-2/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-jacksonville-fl/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-jacksonville-fl/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.evanta.com/cdo/san-francisco/2020-san-francisco-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/san-francisco/2020-san-francisco-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-detroit-mi/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-detroit-mi/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.evanta.com/cdo/atlanta/2020-atlanta-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/atlanta/2020-atlanta-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-providence-ri/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-providence-ri/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-denver-co/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-denver-co/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin:
&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Presto Summit Series - Real world usage
&lt;a href=&quot;https://trino.io/blog/2020/05/15/state-of-presto.html&quot;&gt;https://trino.io/blog/2020/05/15/state-of-presto.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&quot;&gt;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&quot;&gt;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Recent Podcasts:
&lt;a href=&quot;https://www.contributor.fyi/presto&quot;&gt;https://www.contributor.fyi/presto&lt;/a&gt;
&lt;a href=&quot;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&quot;&gt;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you want to learn more about Presto yourself, you should check out the 
O’Reilly book Trino: The Definitive Guide. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>This week we had a bit of a technical issue between zoom and OBS so there was some editing done to remove a portion of the broadcast which mainly cuts out us covering the releases. We circle back and give a small summary but unfortunately lost the majority of that part of the conversation.</summary>

      
      
    </entry>
  
    <entry>
      <title>1: What is Presto, WITH RECURSIVE, and Hive connector</title>
      <link href="https://trino.io/episodes/1.html" rel="alternate" type="text/html" title="1: What is Presto, WITH RECURSIVE, and Hive connector" />
      <published>2020-09-24T00:00:00+00:00</published>
      <updated>2020-09-24T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/1</id>
      <content type="html" xml:base="https://trino.io/episodes/1.html">&lt;p&gt;Today’s concept covers a big overview of what Presto is for those that are new
to Presto. For more information about Presto, check out the following resources:
&lt;a href=&quot;/&quot;&gt;Website&lt;/a&gt;
&lt;a href=&quot;https://trino.io/docs/current/&quot;&gt;Documentation&lt;/a&gt;
Download the &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;Free Presto O’Reilly Book&lt;/a&gt;
Learn &lt;a href=&quot;/development/&quot;&gt;how to contribute&lt;/a&gt;
Join our community on the &lt;a href=&quot;/slack.html&quot;&gt;Slack channel&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this PR we covered &lt;a href=&quot;https://github.com/trinodb/trino/pull/5163&quot;&gt;pull request 5163&lt;/a&gt;,
which is actually just a documentation update around the existing experimental
WITH RECURSIVE support. The extended development of
this feature is still being tracked and documented in 
&lt;a href=&quot;https://github.com/trinodb/trino/issues/1122&quot;&gt;issue 1122&lt;/a&gt;. As with many 
problems in recursion, the solution space typically grows exponentially, so
the feature can easily be misused and cause problems. We run the 
query and discuss it, as well as some of the things that can go wrong. Check out
the pull request to see the documentation that was added around it.&lt;/p&gt;
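
&lt;p&gt;For reference, a minimal recursive query of the kind covered in the
documentation looks like this; the common table expression expands to the
values 1 through 4, so the query should return 10:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;WITH RECURSIVE t(n) AS (
    VALUES (1)
    UNION ALL
    SELECT n + 1 FROM t WHERE n &amp;lt; 4
)
SELECT sum(n) FROM t;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Because each recursion step can multiply the number of rows in play, queries
like this need a termination condition that is guaranteed to be reached.&lt;/p&gt;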

&lt;p&gt;In the question of the week, we covered a lot of the confusion around the
&lt;a href=&quot;https://trino.io/docs/current/connector/hive.html&quot;&gt;Hive connector&lt;/a&gt;. Feel free to 
try out the Katacoda example I created, which will be nested within an 
&lt;a href=&quot;blog/2020/10/20/intro-to-hive-connector.html&quot;&gt;intro to the Hive connector blog&lt;/a&gt;.
This is running on a free Katacoda account, so resources are scarce at times
and it may take a while to load. Nevertheless, the accompanying information
will help you quickly get a Presto environment to play with.&lt;/p&gt;

&lt;p&gt;Release Notes discussed:
&lt;a href=&quot;https://trino.io/docs/current/release/release-341.html&quot;&gt;https://trino.io/docs/current/release/release-341.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Upcoming events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/BigDataATL/events/273435961/&quot;&gt;https://www.meetup.com/BigDataATL/events/273435961/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://edw2020chicago.dataversity.net/&quot;&gt;https://edw2020chicago.dataversity.net/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-portland-or-2/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-portland-or-2/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-minneapolis-mn-2/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-minneapolis-mn-2/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-jacksonville-fl/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-jacksonville-fl/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.evanta.com/cdo/san-francisco/2020-san-francisco-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/san-francisco/2020-san-francisco-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-detroit-mi/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-detroit-mi/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.evanta.com/cdo/atlanta/2020-atlanta-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/atlanta/2020-atlanta-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-providence-ri/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-providence-ri/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-denver-co/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-denver-co/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin:
&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Presto Summit Series - Real world usage
&lt;a href=&quot;https://trino.io/blog/2020/05/15/state-of-presto.html&quot;&gt;https://trino.io/blog/2020/05/15/state-of-presto.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&quot;&gt;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&quot;&gt;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Recent Podcasts:
&lt;a href=&quot;https://www.contributor.fyi/presto&quot;&gt;https://www.contributor.fyi/presto&lt;/a&gt;
&lt;a href=&quot;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&quot;&gt;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you want to learn more about Presto yourself, you should check out the 
O’Reilly book Trino: The Definitive Guide. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Today’s concept covers a big overview of what Presto is for those that are new to Presto. For more information about Presto, check out the following resources: Website Documentation Download the Free Presto O’Reilly Book Learn how to contribute Join our community on the Slack channel</summary>

      
      
    </entry>
  
</feed>
