<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <generator uri="https://jekyllrb.com/" version="4.3.4">Jekyll</generator>
  <link href="https://trino.io/blog/feed.xml" rel="self" type="application/atom+xml" />
  <link href="https://trino.io/blog/" rel="alternate" type="text/html" />
  <updated>2026-04-08T03:00:47+00:00</updated>
  <id>https://trino.io/blog/feed.xml</id>

  <title>Trino Blog</title>

  <subtitle>Trino is a high performance, distributed SQL query engine for big data.</subtitle>

  
    <entry>
      <title>Introducing the NUMBER data type</title>
      <link href="https://trino.io/blog/2026/03/25/number-data-type.html" rel="alternate" type="text/html" title="Introducing the NUMBER data type" />
      <published>2026-03-25T00:00:00+00:00</published>
      <updated>2026-03-25T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2026/03/25/number-data-type</id>
      <content type="html" xml:base="https://trino.io/blog/2026/03/25/number-data-type.html">&lt;p&gt;One of Trino’s core strengths is breaking down data silos—enabling data
engineers to query diverse data sources through a single SQL interface. However,
when those sources use high-precision numeric types beyond Trino’s 38-digit
DECIMAL limit, that promise breaks down. Users faced an unpalatable choice: skip
the columns entirely and lose access to critical data, or accept lossy rounding
that compromises data integrity.&lt;/p&gt;

&lt;p&gt;This challenge required a new approach: a dedicated data type for high-precision,
variable-scale decimals.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Adding a new built-in data type to Trino is exceptionally rare. The last time we
introduced a new type was the UUID type in May 2019—nearly seven years ago.
Types are fundamental building blocks that touch many parts of the system, from
the type registry, through coercion rules to connectors, functions, and the protocol.
They require careful design and long-term commitment.&lt;/p&gt;

&lt;p&gt;With Trino 480, we’re excited to introduce the NUMBER type—a high-precision
decimal type that breaks down these data silos and enables seamless access to
numeric data across diverse database systems. This addition is particularly
powerful for data engineers working with Oracle, PostgreSQL, MySQL, MariaDB, and
SingleStore, which support numeric precision beyond the traditional 38-digit
DECIMAL limit.&lt;/p&gt;

&lt;p&gt;Let’s explore why NUMBER matters, how it works, and how it will simplify your
data integration workflows.&lt;/p&gt;

&lt;h2 id=&quot;the-challenge-precision-beyond-38-digits&quot;&gt;The challenge: precision beyond 38 digits&lt;/h2&gt;

&lt;p&gt;Trino’s DECIMAL type has long supported exact numeric values with precision up
to 38 decimal digits, which covers the vast majority of use cases. However,
many database systems support higher precision:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Oracle NUMBER&lt;/strong&gt;: when declared as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMBER(p, s)&lt;/code&gt;, precision must be in [1, 38] and
scale in [-84, 127]. When declared as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMBER&lt;/code&gt; without precision/scale, each value
can have different scale, and actual precision can reach 40 decimal digits. Oracle can
store values from 10^-130 to (but not including) 10^126.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;PostgreSQL NUMERIC&lt;/strong&gt;: when declared as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMERIC(p, s)&lt;/code&gt;, precision can reach 1000 and
scale can range from -1000 to 1000. When declared without precision/scale constraints,
each value can have a different scale, with up to 131,072 digits before the
decimal point and 16,383 digits after it.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;MySQL, MariaDB, SingleStore DECIMAL&lt;/strong&gt;: up to 65 digits of precision (scale 0-30)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Before Trino 480, accessing these high-precision numeric columns required
choosing between two unsatisfying options:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Skip the columns entirely&lt;/strong&gt; and lose access to potentially critical data.
This was the default behavior.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Accept lossy conversions&lt;/strong&gt; - Use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal-mapping=ALLOW_OVERFLOW&lt;/code&gt; with
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal-default-scale=S&lt;/code&gt; to force values into &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DECIMAL(38, S)&lt;/code&gt;, losing precision
through rounding and failing for numbers greater than or equal to 10^(38-S).
For example, with scale 10, values ≥ 10^28 would fail.&lt;/li&gt;
&lt;/ol&gt;
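
&lt;p&gt;As a sketch of the second option, assuming a catalog configured with
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal-mapping=ALLOW_OVERFLOW&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal-default-scale=10&lt;/code&gt;, and a
hypothetical table with an unconstrained NUMERIC column named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;precise_value&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- precise_value is forced into DECIMAL(38, 10): digits beyond scale 10
-- are rounded away, and values at or above 10^28 fail the query
SELECT precise_value
FROM postgresql.lab.measurements;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;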

&lt;p&gt;Neither option is ideal for data federation and warehousing scenarios where
preserving data fidelity is essential.&lt;/p&gt;

&lt;h2 id=&quot;enter-number-arbitrary-precision-decimals-in-trino&quot;&gt;Enter NUMBER: arbitrary-precision decimals in Trino&lt;/h2&gt;

&lt;p&gt;The NUMBER type solves this problem by supporting floating-point decimal numbers
of high precision and flexible scale. In practice, NUMBER supports values with
up to 200 digits of precision – far exceeding what most database workloads require.
Each value can have a different scale, allowing for values as small as 10^-16000
(or even smaller) and as large as 10^16000 (or even larger) within the same column.&lt;/p&gt;

&lt;p&gt;Here’s what NUMBER looks like in action:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;-- High-precision literal (50+ digits)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;3.1415926535897932384626433832795028841971693993751&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; 3.1415926535897932384626433832795028841971693993751
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;-- Scientific notation with extreme precision&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;12345678901234567890123456789012345678901234567890e30&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; 1.234567890123456789012345678901234567890123456789E+79
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;-- Verify the type&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;typeof&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;123.456&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; number
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;special-values&quot;&gt;Special values&lt;/h3&gt;

&lt;p&gt;NUMBER also supports special values similar to IEEE 754 floating-point types:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;Infinity&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;positive_infinity&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;-Infinity&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;negative_infinity&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;NaN&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;not_a_number&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; positive_infinity | negative_infinity | not_a_number
-------------------+-------------------+--------------
 +Infinity         | -Infinity         | NaN
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;These special values follow the comparison and ordering semantics of DOUBLE.
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NaN&lt;/code&gt; compares as unequal to all values, including
itself, so any comparison with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NaN&lt;/code&gt; returns false. When sorting, values are
ordered as follows: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-Infinity&lt;/code&gt;, all finite values, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;+Infinity&lt;/code&gt;, followed by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NaN&lt;/code&gt;.&lt;/p&gt;
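
&lt;p&gt;A short illustration of these semantics, following the DOUBLE behavior
described above:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Both comparisons return false: NaN is unequal to every value
SELECT
  NUMBER &apos;NaN&apos; = NUMBER &apos;NaN&apos; AS nan_equals_nan,
  NUMBER &apos;NaN&apos; &amp;lt; NUMBER &apos;1&apos; AS nan_less_than_one;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;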

&lt;p&gt;The special values are particularly useful for handling edge cases in source data.
For example, PostgreSQL’s NUMERIC type can represent &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NaN&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Infinity&lt;/code&gt;, and
these values now map seamlessly to NUMBER when queried through the PostgreSQL
connector.&lt;/p&gt;

&lt;h2 id=&quot;seamless-connector-integration&quot;&gt;Seamless connector integration&lt;/h2&gt;

&lt;p&gt;The real power of NUMBER becomes apparent when querying external databases. Five
connectors now automatically map high-precision numeric types to NUMBER,
requiring &lt;strong&gt;no configuration changes&lt;/strong&gt;:&lt;/p&gt;

&lt;h3 id=&quot;oracle-connector&quot;&gt;Oracle connector&lt;/h3&gt;

&lt;p&gt;Oracle’s NUMBER type supports variable precision and scale. The Oracle connector
now maps:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMBER(p, s)&lt;/code&gt; where p &amp;gt; 38 → Trino &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMBER&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMBER&lt;/code&gt; without precision/scale → Trino &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMBER&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMBER&lt;/code&gt; with extreme scale values → Trino &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMBER&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;-- Query an Oracle table with high-precision columns&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;order_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;unit_price&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;extended_price&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;oracle&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sales&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;extended_price&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;1000000000000000000000000&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;postgresql-connector&quot;&gt;PostgreSQL connector&lt;/h3&gt;

&lt;p&gt;PostgreSQL’s NUMERIC type supports very high precision and even “unconstrained”
precision. The connector automatically handles:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMERIC(p, s)&lt;/code&gt; where p &amp;gt; 38 → Trino &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMBER&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMERIC&lt;/code&gt; without precision/scale → Trino &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMBER&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;-- Access PostgreSQL scientific data without precision loss&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;measurement_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;precise_value&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;-- a NUMERIC column&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;postgresql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lab&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;measurements&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;mysql-mariadb-and-singlestore-connectors&quot;&gt;MySQL, MariaDB, and SingleStore connectors&lt;/h3&gt;

&lt;p&gt;These MySQL-compatible databases support DECIMAL precision up to 65 digits. The
connectors now map:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DECIMAL(p, s)&lt;/code&gt; where p &amp;gt; 38 → Trino &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMBER&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;-- Join across different databases with high precision&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;m&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;account_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;m&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;balance&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mysql_balance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;balance&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;oracle_balance&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mysql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;banking&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;accounts&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;m&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;oracle&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;banking&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;accounts&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;m&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;account_id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;account_id&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;abs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;m&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;balance&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;balance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;0.01&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;backwards-compatibility-and-migration&quot;&gt;Backwards compatibility and migration&lt;/h2&gt;

&lt;p&gt;The NUMBER type integration is designed to be seamless and backward compatible:&lt;/p&gt;

&lt;h3 id=&quot;automatic-mapping&quot;&gt;Automatic mapping&lt;/h3&gt;

&lt;p&gt;If you previously relied on the default behavior (no &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal-mapping&lt;/code&gt;
configuration), your queries now automatically use NUMBER for high-precision
columns. No configuration changes needed.&lt;/p&gt;
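
&lt;p&gt;For example, given a hypothetical PostgreSQL table with an unconstrained
NUMERIC column, you can confirm the mapping directly:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- precise_value is declared as unconstrained NUMERIC in PostgreSQL;
-- typeof confirms it now surfaces as NUMBER
SELECT typeof(precise_value)
FROM postgresql.lab.measurements
LIMIT 1;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;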

&lt;h3 id=&quot;legacy-configurations-still-work&quot;&gt;Legacy configurations still work&lt;/h3&gt;

&lt;p&gt;If you explicitly configured &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal-mapping=ALLOW_OVERFLOW&lt;/code&gt; or
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal-mapping=STRICT&lt;/code&gt;, your existing configuration continues to work. The
NUMBER mapping is disabled when these options are set, ensuring no surprises.&lt;/p&gt;

&lt;p&gt;However, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal-mapping&lt;/code&gt; configuration and related session properties
(&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal_mapping&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal_default_scale&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal_rounding_mode&lt;/code&gt;) are now
&lt;strong&gt;deprecated&lt;/strong&gt; and will be removed in a future Trino release. We recommend
migrating to NUMBER-based workflows:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before (with lossy conversion):&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&quot;language-properties highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# catalog/postgresql.properties
&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;connection-url&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;jdbc:postgresql://host:5432/database&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;connection-user&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;user&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;connection-password&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;password&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;decimal-mapping&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;ALLOW_OVERFLOW&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;decimal-default-scale&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;10&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;decimal-rounding-mode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;HALF_UP&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;After (lossless with NUMBER):&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&quot;language-properties highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# catalog/postgresql.properties
&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;connection-url&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;jdbc:postgresql://host:5432/database&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;connection-user&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;user&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;connection-password&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;password&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# No decimal-mapping needed - NUMBER is used automatically!
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For Oracle, if you previously used &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;oracle.number.rounding-mode&lt;/code&gt; to handle
high-precision NUMBER columns, you can now remove this configuration to enable
native NUMBER mapping.&lt;/p&gt;

&lt;h2 id=&quot;working-with-number&quot;&gt;Working with NUMBER&lt;/h2&gt;

&lt;h3 id=&quot;type-conversions&quot;&gt;Type conversions&lt;/h3&gt;

&lt;p&gt;NUMBER integrates naturally with Trino’s type system:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;-- Convert from other numeric types&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;CAST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;DECIMAL&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;123.45&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;from_decimal&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;CAST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;12345&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;from_integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;CAST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;123&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;45&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;e0&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;from_double&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; from_decimal | from_integer | from_double
--------------+--------------+-------------
 123.45       | 12345        | 123.45
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;-- Convert NUMBER to other types&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;CAST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;123.456&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;to_bigint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;CAST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;123.456&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;DOUBLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;to_double&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;CAST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;123.456&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;DECIMAL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;to_decimal&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; to_bigint | to_double | to_decimal
-----------+-----------+------------
 123       | 123.456   | 123.46
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;aggregate-functions&quot;&gt;Aggregate functions&lt;/h3&gt;

&lt;p&gt;Common aggregate functions work naturally with NUMBER:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;-- Aggregate high-precision values&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;department&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;revenue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;total_revenue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;avg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;revenue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;average_revenue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;min&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;revenue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;min_revenue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;max&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;revenue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;max_revenue&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;oracle&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sales&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;transactions&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;GROUP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;department&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;creating-tables-with-number-columns&quot;&gt;Creating tables with NUMBER columns&lt;/h3&gt;

&lt;p&gt;The Oracle and PostgreSQL connectors support creating tables with NUMBER columns:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;-- Create a PostgreSQL table with NUMBER column&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;postgresql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;schema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;measurements&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;precise_value&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- Create an Oracle table with NUMBER column&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;oracle&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;schema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;scientific_data&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;experiment_id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;measurement&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
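
&lt;p&gt;Once created, such tables accept NUMBER literals on write; continuing with the
hypothetical measurements table above:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Insert a value wider than 38 digits without rounding or overflow
INSERT INTO postgresql.schema.measurements (id, precise_value)
VALUES (1, NUMBER &apos;1234567890123456789012345678901234567890.123456789&apos;);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;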

&lt;h2 id=&quot;technical-characteristics-and-limitations&quot;&gt;Technical characteristics and limitations&lt;/h2&gt;

&lt;p&gt;While NUMBER provides high precision, it’s important to understand its
characteristics:&lt;/p&gt;

&lt;h3 id=&quot;precision-and-scale&quot;&gt;Precision and scale&lt;/h3&gt;

&lt;p&gt;Trino’s NUMBER type characteristics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Supported precision&lt;/strong&gt;: currently 200 decimal digits.
While we consider this an implementation detail that may change in future releases,
it is unlikely that maximum precision will be decreased.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Scale range&lt;/strong&gt;: -16,384 to 16,383&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Variable scale&lt;/strong&gt;: each value can have a different scale, similar to
PostgreSQL NUMERIC and Oracle NUMBER&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Special values&lt;/strong&gt;: supports &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NaN&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Infinity&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-Infinity&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
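&lt;p&gt;As a quick illustration of variable scale and special values, here is a
sketch that assumes casting from VARCHAR literals is supported; the exact
literal syntax may differ:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Values in the same NUMBER column can carry different scales
SELECT CAST('123.45' AS NUMBER) AS two_decimals,
       CAST('0.1234567890123456789012345678901234567890' AS NUMBER) AS forty_decimals;

-- Special values are first-class
SELECT CAST('NaN' AS NUMBER) AS not_a_number,
       CAST('Infinity' AS NUMBER) AS positive_infinity;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;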

&lt;p&gt;Comparison of decimal numeric types across database systems:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Database&lt;/th&gt;
      &lt;th&gt;Max Precision&lt;/th&gt;
      &lt;th&gt;Scale Range&lt;/th&gt;
      &lt;th&gt;Variable Scale&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Oracle NUMBER(p, s)&lt;/td&gt;
      &lt;td&gt;38&lt;/td&gt;
      &lt;td&gt;-84 to 127&lt;/td&gt;
      &lt;td&gt;No&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Oracle NUMBER&lt;/td&gt;
      &lt;td&gt;40&lt;/td&gt;
      &lt;td&gt;Approximately -130 to 126&lt;/td&gt;
      &lt;td&gt;Yes&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;PostgreSQL NUMERIC(p, s)&lt;/td&gt;
      &lt;td&gt;1000&lt;/td&gt;
      &lt;td&gt;-1000 to 1000&lt;/td&gt;
      &lt;td&gt;No&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;PostgreSQL NUMERIC&lt;/td&gt;
      &lt;td&gt;131,072&lt;/td&gt;
      &lt;td&gt;-1000 to 1000&lt;/td&gt;
      &lt;td&gt;Yes&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;MySQL/MariaDB/SingleStore DECIMAL&lt;/td&gt;
      &lt;td&gt;65&lt;/td&gt;
      &lt;td&gt;0 to 30&lt;/td&gt;
      &lt;td&gt;No&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Trino DECIMAL&lt;/td&gt;
      &lt;td&gt;38&lt;/td&gt;
      &lt;td&gt;0 to 38&lt;/td&gt;
      &lt;td&gt;No&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Trino NUMBER&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;200&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;-16,384 to 16,383&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h3 id=&quot;storage-and-representation&quot;&gt;Storage and representation&lt;/h3&gt;

&lt;p&gt;NUMBER uses a variable-width binary format optimized for flexibility:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;2-byte header encoding sign and scale&lt;/li&gt;
  &lt;li&gt;Variable-length magnitude in big-endian format&lt;/li&gt;
  &lt;li&gt;The binary format is considered unstable and may evolve in future releases to
enable optimizations and performance improvements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This flexibility allows Trino to improve NUMBER’s internal representation over
time without breaking connector compatibility.
The Trino SPI provides a stable API for connectors to read and write NUMBER values,
abstracting away the internal format.&lt;/p&gt;

&lt;h3 id=&quot;performance-considerations&quot;&gt;Performance considerations&lt;/h3&gt;

&lt;p&gt;NUMBER uses Java’s BigDecimal for arithmetic operations, which provides exact
precision at the cost of being slower than fixed-precision types like BIGINT,
DOUBLE or DECIMAL. For this reason, NUMBER is designed for scenarios where
precision is more important than computational speed:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Best for&lt;/strong&gt;: reading and storing high-precision data from source systems,
data federation, reporting, data warehousing&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Not optimal for&lt;/strong&gt;: computational heavy-lifting, complex mathematical
operations, high-performance analytics on numeric columns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your workload involves extensive numeric computation, consider whether DECIMAL
(for up to 38 digits), DOUBLE (for approximate arithmetic), or BIGINT (for
integer arithmetic) might be more appropriate.&lt;/p&gt;
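&lt;p&gt;If exact arithmetic within 38 digits is sufficient, a cast on the hot path
keeps the values exact while regaining DECIMAL performance. This sketch reuses
the hypothetical &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;measurements&lt;/code&gt; table from above:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Exact arithmetic on a bounded-precision copy of the column
SELECT sum(CAST(precise_value AS DECIMAL(38, 10))) AS total
FROM postgresql.schema.measurements;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;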

&lt;h3 id=&quot;function-support&quot;&gt;Function support&lt;/h3&gt;

&lt;p&gt;NUMBER supports essential operations:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Arithmetic: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;+&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;*&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Aggregations: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sum()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;avg()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;min()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;max()&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Mathematical functions: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;abs()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sign()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ceiling()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;floor()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;truncate()&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;round()&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Special value checks: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;is_nan()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;is_finite()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;is_infinite()&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Many advanced mathematical functions (trigonometric, logarithmic, etc.)
do not work with NUMBER directly and require explicit type conversions to DOUBLE or DECIMAL.&lt;/p&gt;
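&lt;p&gt;For example, computing a logarithm over a NUMBER column requires making the
cast explicit. This sketch reuses the hypothetical &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;scientific_data&lt;/code&gt; table from above:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Advanced math functions operate on DOUBLE, so cast explicitly
SELECT experiment_id,
       ln(CAST(measurement AS DOUBLE)) AS log_measurement
FROM oracle.schema.scientific_data;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;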

&lt;h2 id=&quot;whats-next&quot;&gt;What’s next&lt;/h2&gt;

&lt;p&gt;The NUMBER type support will continue to evolve. Additional connectors are
planned for future releases:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;ClickHouse&lt;/strong&gt;: for Decimal256 type mapping&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Apache Ignite&lt;/strong&gt;: for high-precision numeric support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We’re also exploring performance optimizations and expanding function support
based on community feedback.&lt;/p&gt;

&lt;h2 id=&quot;getting-started&quot;&gt;Getting started&lt;/h2&gt;

&lt;p&gt;NUMBER support is available now in Trino 480. To start using it:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Upgrade to Trino 480&lt;/strong&gt; - NUMBER is available out of the box&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Remove deprecated configs&lt;/strong&gt; - If you used &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal-mapping&lt;/code&gt; configurations,
consider removing them to enable automatic NUMBER mapping&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Query your data&lt;/strong&gt; - High-precision columns are now accessible without
configuration&lt;/li&gt;
&lt;/ol&gt;
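&lt;p&gt;After the upgrade, high-precision columns behave like any other column. This
sketch queries the hypothetical tables created earlier:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Read and aggregate high-precision values without extra configuration
SELECT id, precise_value
FROM postgresql.schema.measurements
WHERE precise_value IS NOT NULL;

SELECT max(measurement) AS largest_measurement
FROM oracle.schema.scientific_data;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;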

&lt;p&gt;For detailed documentation, refer to:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/language/types.html&quot;&gt;NUMBER type reference&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/oracle.html&quot;&gt;Oracle connector documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/postgresql.html&quot;&gt;PostgreSQL connector documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/mysql.html&quot;&gt;MySQL connector documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/mariadb.html&quot;&gt;MariaDB connector documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/singlestore.html&quot;&gt;SingleStore connector documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Have questions or feedback? Join the discussion on the &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;Trino community
Slack&lt;/a&gt; in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;#dev&lt;/code&gt; channel, or open an issue on
&lt;a href=&quot;https://github.com/trinodb/trino/issues&quot;&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The NUMBER type represents a significant milestone in Trino’s evolution,
eliminating precision loss barriers and making high-precision numeric data from
diverse sources readily accessible for analytics and reporting. We’re excited to
see how the community uses this powerful new capability!&lt;/p&gt;


      
        <author>
          <name>Piotr Findeisen, Starburst Data</name>
        </author>
      

      <summary>One of Trino’s core strengths is breaking down data silos—enabling data engineers to query diverse data sources through a single SQL interface. However, when those sources use high-precision numeric types beyond Trino’s 38-digit DECIMAL limit, that promise breaks down. Users faced an impossible choice: skip the columns entirely and lose access to critical data, or accept lossy rounding that compromises data integrity. This challenge required a new approach: a dedicated data type for high-precision, variable-scale decimals.</summary>

      
      
    </entry>
  
    <entry>
      <title>Core Principles and Design Practices of OLAP Engines</title>
      <link href="https://trino.io/blog/2025/03/27/olap-principles-book.html" rel="alternate" type="text/html" title="Core Principles and Design Practices of OLAP Engines" />
      <published>2025-03-27T00:00:00+00:00</published>
      <updated>2025-03-27T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2025/03/27/olap-principles-book</id>
      <content type="html" xml:base="https://trino.io/blog/2025/03/27/olap-principles-book.html">&lt;p&gt;Yiteng Xu and Yingju Gao are proud to announce their new book “Core Principles
and Design Practices of OLAP Engines” from China Machine Press. This is great news
for the Trino community, since the book is based on the open source project
Trino, specifically Trino 350. It took the two authors more than four years
to finish writing it. All concepts and details are explained with a Trino flavor and
generalized to all OLAP engines. Walk through the chapters with us and you will
find that the two authors dive deep into the source code and bring you many
treasures.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;author-introduction&quot;&gt;Author introduction&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/medsmeds&quot;&gt;Yiteng (Ivan) Xu&lt;/a&gt; is a data security engineer
currently utilizing Trino, Spark, and Calcite for SQL analysis. His work
encompasses various scenarios, including data warehouse metrics, SQL
auto-rewriting, SQL purpose detection, and the development of a SQL-based
purpose-aware access control system.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/garyelephant&quot;&gt;Yingju (Gary) Gao&lt;/a&gt; is an Apache SeaTunnel PMC
member and the lead of the time series database team. He currently serves as the
technical lead for the observability-engine team, and is responsible for
building the ecosystem for observability data, including metrics, trace, log,
and event data, providing a high-performance, high-throughput data pipeline from
ingestion to consumption, storage, querying, and data warehousing. Additionally,
he oversees metrics stability, multi-tenant access, and user requirement
integration.&lt;/p&gt;

&lt;p&gt;Both authors are passionate about sharing their technical knowledge. They have
delved deep into source code and excel in technical writing, breaking down
complex underlying principles into a linear and comprehensible format for
readers. They firmly believe that sharing is a virtue and are committed to
continuing their technical contributions.&lt;/p&gt;

&lt;p&gt;So now it is time to get the book, or read on for a walk through of the content:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; target=&quot;_blank&quot; href=&quot;https://product.dangdang.com/11974653727.html&quot;&gt;
        Get the book from dangdang.com
    &lt;/a&gt;
    &lt;a class=&quot;btn btn-pink&quot; target=&quot;_blank&quot; href=&quot;https://item.m.jd.com/product/10136949561522.html&quot;&gt;
        Get the book from jd.com
    &lt;/a&gt;
&lt;/div&gt;

&lt;h2 id=&quot;walk-through&quot;&gt;Walk through&lt;/h2&gt;

&lt;p&gt;Let’s have a look at the different chapters in a high-level walk through.&lt;/p&gt;

&lt;h3 id=&quot;part-1-background-knowledge&quot;&gt;Part 1: Background knowledge&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Chapter 1&lt;/strong&gt;: Introduces the concept of OLAP (Online Analytical Processing)
and provides a comparison of different engines like Trino, Impala, Doris, and others.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chapter 2&lt;/strong&gt;: Provides a comprehensive introduction to the Trino engine,
covering its principles, architecture, enterprise use cases, compilation, and
execution. It also compares Trino with the Presto project and introduces the
SQL statements that are referenced throughout the book.&lt;/p&gt;

&lt;h3 id=&quot;part-2-core-principles&quot;&gt;Part 2: Core principles&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Chapter 3&lt;/strong&gt;: Offers an overview of the distributed SQL query process, serving
as a high-level introduction to the subsequent chapters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chapter 4&lt;/strong&gt;: Begins with the generation of query execution plans, including
the transformation of SQL into abstract syntax trees, semantic analysis, and the
creation of initial logical plans. It then delves into the theoretical knowledge
of optimizers and the overall framework of the Trino optimizer.&lt;/p&gt;

&lt;h3 id=&quot;part-3-classic-sql&quot;&gt;Part 3: Classic SQL&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Chapter 5&lt;/strong&gt;: Explains the generation and optimization of execution plans for
SQL statements involving only &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TableScan&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Filter&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Project&lt;/code&gt; operations,
along with their scheduling and execution processes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chapter 6&lt;/strong&gt;: Focuses on SQL statements with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Limit&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Sort&lt;/code&gt; operations,
detailing the generation and optimization of execution plans, as well as their
scheduling and execution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chapter 7&lt;/strong&gt;: Introduces the basic principles of aggregate queries. It then
covers the generation and optimization of execution plans for grouped and
non-grouped aggregate SQL statements, along with their scheduling and execution
processes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chapter 8&lt;/strong&gt;: Discusses SQL statements with count distinct and multiple
aggregate operations, explaining the generation and optimization of execution
plans, as well as their scheduling and execution. This includes the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Scatter-Gather&lt;/code&gt; model and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MarkDistinct&lt;/code&gt; optimization. Finally, a complex SQL
statement is used to tie together the concepts from Chapters 5 to 8.&lt;/p&gt;

&lt;h3 id=&quot;part-4-data-exchange-mechanism&quot;&gt;Part 4: Data exchange mechanism&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Chapter 9&lt;/strong&gt;: Introduces the overall concept of data exchange mechanisms and
how data exchange is incorporated during the query optimization phase via the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AddExchanges&lt;/code&gt; optimizer, along with the design principles for scheduling and
execution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chapter 10&lt;/strong&gt;: Explains how tasks establish connections during the query
scheduling phase and the mechanisms for upstream and downstream data flow during
execution. It also covers the principles of intra-task data exchange, RPC
interaction mechanisms, and analyzes backpressure, Limit semantics, and
out-of-order request handling.&lt;/p&gt;

&lt;h3 id=&quot;part-5-plugin-mechanisms-and-connectors&quot;&gt;Part 5: Plugin mechanisms and connectors&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Chapter 11&lt;/strong&gt;: Begins with an introduction to Trino’s plugin system and SPI
mechanism, including plugin loading and JVM’s class loading principles. It then
dissects connectors, covering metadata modules, read modules, pushdown
optimization, and providing in-depth insights into connector design.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chapter 12&lt;/strong&gt;: Uses the example-http connector to help readers understand
connector design and implements a simple data source using Python’s Flask
framework.&lt;/p&gt;

&lt;h3 id=&quot;part-6-function-principles-and-development&quot;&gt;Part 6: Function principles and development&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Chapter 13&lt;/strong&gt;: Provides an overview of Trino’s function system, including
function types, lifecycle, and several function development methods. It delves
into the data structures and annotations related to functions and explains the
function registration and parsing process during semantic analysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chapter 14&lt;/strong&gt;: Focuses on how to write a UDF (user-defined function) in practice. It covers
annotation-based development methods for scalar functions, as well as low-level
development methods using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;codeGen&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;methodHandle&lt;/code&gt; APIs. For aggregate
functions, it introduces annotation-based development methods and low-level
methods where developers handle serialization and state on their own.&lt;/p&gt;

&lt;h3 id=&quot;why-trino&quot;&gt;Why Trino?&lt;/h3&gt;

&lt;p&gt;In 2020, one of the authors, Yiteng Xu, encountered a scenario at work where
data needed to be read from two Hive instances, each modified by different
internal teams. The company’s infrastructure team attempted a simple solution by
registering virtual tables and using MapReduce for federated queries. However,
this approach proved inadequate for the agile analysis needs of data analysts,
with complex queries taking nearly 12 hours to complete. One mistake in a SQL
query meant an entire day was wasted.&lt;/p&gt;

&lt;p&gt;Later, another team researched and adopted Presto (before Trino became
independent). By adapting the Hive engine at the connector level, they enabled
federated queries across the two Hive instances without data migration or
extensive code changes. Users only needed to be aware of a catalog prefix,
making the process incredibly convenient. The author later had the opportunity
to participate in the project and developed a strong interest in its source
code. The elegance of the open-source project, its plugin design, and the inner
workings of connectors and Airlift framework sparked a deep curiosity, leading
the author on a journey of source code exploration. As the PrestoSQL project was
more active and receptive to developer feedback, the author chose to continue
following the Trino project when it emerged in late 2020.&lt;/p&gt;

&lt;h2 id=&quot;get-your-copy&quot;&gt;Get your copy&lt;/h2&gt;

&lt;p&gt;Now it is time for you to get your copy of &lt;strong&gt;Core Principles and Design Practices of OLAP Engines&lt;/strong&gt;:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; target=&quot;_blank&quot; href=&quot;https://product.dangdang.com/11974653727.html&quot;&gt;
        Get the book from dangdang.com
    &lt;/a&gt;
    &lt;a class=&quot;btn btn-pink&quot; target=&quot;_blank&quot; href=&quot;https://item.m.jd.com/product/10136949561522.html&quot;&gt;
        Get the book from jd.com
    &lt;/a&gt;
&lt;/div&gt;</content>

      
        <author>
          <name>Yiteng Xu, Yingju Gao, Manfred Moser</name>
        </author>
      

      <summary>Yiteng Xu and Yingju Gao are proud to announce their new book “Core Principles and Design Practices of OLAP Engines” from China Machine Press. This is great news for the Trino community, since the book is based on the open source project Trino, specifically Trino 350. It took the two authors more than four years to finish writing it. All concepts and details are explained with a Trino flavor and generalized to all OLAP engines. Walk through the chapters with us and you will find that the two authors dive deep into the source code and bring you many treasures.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/core-principles-olap-book.jpg" />
      
    </entry>
  
    <entry>
      <title>Twenty four</title>
      <link href="https://trino.io/blog/2025/03/03/java-24.html" rel="alternate" type="text/html" title="Twenty four" />
      <published>2025-03-03T00:00:00+00:00</published>
      <updated>2025-03-03T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2025/03/03/java-24</id>
      <content type="html" xml:base="https://trino.io/blog/2025/03/03/java-24.html">&lt;p&gt;Six months ago &lt;a href=&quot;/blog/2024/09/17/java-23.html&quot;&gt;we adopted Java 23 as a requirement&lt;/a&gt;, following our standard procedure to upgrade with each Java version as soon
as it becomes available. This allows us to take advantage of all the great
improvements each release brings. The upgrade to 23 was pretty easy, since the
changes from 22 to 23 were not that big. The story turns out to be a bit
different now with our upgrade to Java 24.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;java-24-features&quot;&gt;Java 24 features&lt;/h2&gt;

&lt;p&gt;We have been &lt;a href=&quot;https://github.com/trinodb/trino/issues/23498&quot;&gt;planning and working towards the
upgrade&lt;/a&gt; consistently since the
23 bump in September. Java 24 is set to be released in March 2025 and the list
of changes is quite significant:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;JEP 450 Compact Object Headers (Experimental)&lt;/li&gt;
  &lt;li&gt;JEP 472 Prepare to Restrict the Use of JNI&lt;/li&gt;
  &lt;li&gt;JEP 475 Late Barrier Expansion for G1&lt;/li&gt;
  &lt;li&gt;JEP 478 Key Derivation Function API (Preview)&lt;/li&gt;
  &lt;li&gt;JEP 483 Ahead-of-Time Class Loading &amp;amp; Linking&lt;/li&gt;
  &lt;li&gt;JEP 484 Class-File API&lt;/li&gt;
  &lt;li&gt;JEP 485 Stream Gatherers&lt;/li&gt;
  &lt;li&gt;JEP 486 Permanently Disable the Security Manager&lt;/li&gt;
  &lt;li&gt;JEP 487 Scoped Values (Fourth Preview)&lt;/li&gt;
  &lt;li&gt;JEP 488 Primitive Types in Patterns, instanceof, and switch (Second Preview)&lt;/li&gt;
  &lt;li&gt;JEP 489 Vector API (Ninth Incubator)&lt;/li&gt;
  &lt;li&gt;JEP 490 ZGC: Remove the Non-Generational Mode&lt;/li&gt;
  &lt;li&gt;JEP 491 Synchronize Virtual Threads without Pinning&lt;/li&gt;
  &lt;li&gt;JEP 492 Flexible Constructor Bodies (Third Preview)&lt;/li&gt;
  &lt;li&gt;JEP 494 Module Import Declarations (Second Preview)&lt;/li&gt;
  &lt;li&gt;JEP 495 Simple Source Files and Instance Main Methods (Fourth Preview)&lt;/li&gt;
  &lt;li&gt;JEP 496 Quantum-Resistant Module-Lattice-Based Key Encapsulation Mechanism&lt;/li&gt;
  &lt;li&gt;JEP 497 Quantum-Resistant Module-Lattice-Based Digital Signature Algorithm&lt;/li&gt;
  &lt;li&gt;JEP 498 Warn upon Use of Memory-Access Methods in sun.misc.Unsafe&lt;/li&gt;
  &lt;li&gt;JEP 499 Structured Concurrency (Fourth Preview)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Beyond these JEPs, the list of smaller changes and enhancements is also quite
large. You can find more details
in the &lt;a href=&quot;https://jdk.java.net/24/release-notes&quot;&gt;release notes&lt;/a&gt; and each
individual JEP.&lt;/p&gt;

&lt;h2 id=&quot;trino-perspective&quot;&gt;Trino perspective&lt;/h2&gt;

&lt;p&gt;From a Trino perspective we want to specifically take advantage of performance
improvements to MemorySegment (mismatch, copy, fill), “JEP 491 Synchronize
Virtual Threads without Pinning” and “JEP 475 Late Barrier Expansion for G1”. On
the other hand &lt;a href=&quot;https://openjdk.org/jeps/486&quot;&gt;JEP 486 Permanently Disable the Security
Manager&lt;/a&gt; turned out to be the most impactful.&lt;/p&gt;

&lt;p&gt;Since Trino and its connectors have a large footprint of dependencies, there was
a high chance that some projects were not keeping up with the security manager
removal, even though it was first deprecated with Java 17 in 2021.&lt;/p&gt;

&lt;p&gt;At this stage the Kafka, Kudu, and Phoenix connectors are affected. The Kafka
project is planning to make a new compatible release available in time and we
will adopt that version.&lt;/p&gt;

&lt;p&gt;The Kudu and Phoenix connectors however will be removed, since it is not
possible to use them with Java 24 as requirement. Both connectors are not
heavily used in our community as we learned from our communication with numerous
users, integrators, and the results from our &lt;a href=&quot;/blog/2025/01/07/2024-and-beyond.html&quot;&gt;user survey&lt;/a&gt;. We are tracking progress for each removal in the
issues &lt;a href=&quot;https://github.com/trinodb/trino/issues/24419&quot;&gt;#24419 Phoenix connector&lt;/a&gt;
and &lt;a href=&quot;https://github.com/trinodb/trino/issues/24417&quot;&gt;#24417 Kudu connector&lt;/a&gt;. If
either of these communities ends up supporting Java 24, or a newer version as
required by Trino, in the future, we can potentially add the connectors back in
if community members contribute updated versions.&lt;/p&gt;

&lt;h2 id=&quot;release-plans&quot;&gt;Release plans&lt;/h2&gt;

&lt;p&gt;In terms of shipping the changes we follow our established pattern:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Clean up the codebase and get it ready; specifically, this includes the
removal of the Kudu and Phoenix connectors.&lt;/li&gt;
  &lt;li&gt;Cut a release that is completely ready to be used with Java 24, but does not
yet make it a hard requirement.&lt;/li&gt;
  &lt;li&gt;Allow for community testing and feedback using Java 24.&lt;/li&gt;
  &lt;li&gt;Introduce Java 24 as a hard requirement in another release.&lt;/li&gt;
  &lt;li&gt;Adopt Java 24 features and bring the benefits to our users with the following
releases.&lt;/li&gt;

&lt;p&gt;As you can see, there is a bunch of work waiting, so we better get back to it. As usual,
if you have questions or comments, chime in on the relevant issue or chat with
us on &lt;a href=&quot;/slack.html&quot;&gt;Trino Slack&lt;/a&gt; in the &lt;a href=&quot;https://trinodb.slack.com/messages/C07ABNN828M&quot;&gt;core-dev
channel&lt;/a&gt;.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Mateusz Gajewski</name>
        </author>
      

      <summary>Six months ago we adopted Java 23 as a requirement, following our standard procedure to upgrade with each Java version as soon as it becomes available. This allows us to take advantage of all the great improvements each release brings. The upgrade to 23 was pretty easy, since the changes from 22 to 23 were not that big. The story turns out to be a bit different now with our upgrade to Java 24.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/coffee-24.png" />
      
    </entry>
  
    <entry>
      <title>Out with the old file system</title>
      <link href="https://trino.io/blog/2025/02/10/old-file-system.html" rel="alternate" type="text/html" title="Out with the old file system" />
      <published>2025-02-10T00:00:00+00:00</published>
      <updated>2025-02-10T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2025/02/10/old-file-system</id>
      <content type="html" xml:base="https://trino.io/blog/2025/02/10/old-file-system.html">&lt;p&gt;What a long journey it has been! From the start Trino supported querying Hive
data and used libraries from the Hive and Hadoop ecosystem. With the release of
&lt;a href=&quot;/docs/current/release/release-470.html&quot;&gt;Trino 470&lt;/a&gt; we mark
another milestone to more features and better performance for data lake and
lakehouse querying with Trino. We deprecated the legacy file system support, and
will permanently remove them in an upcoming release.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;background&quot;&gt;Background&lt;/h2&gt;

&lt;p&gt;Trino always had a focus on performance and security. As a result we implemented
custom readers for file formats like Apache ORC and Apache Parquet many years
ago. We also have improved libraries for compression and decompression of files
from object storage, and implemented our own support for other table formats
with the Apache Iceberg, Delta Lake and Apache Hudi connectors.&lt;/p&gt;

&lt;p&gt;For the underlying object storage solutions and file systems, we originally
extended the libraries around the Hive system and added implementations for
Amazon S3, Azure Storage, Google Cloud Storage and others. Over time the
mismatch of the HDFS libraries and the cloud-centric usage with modern file
systems became more and more of a maintenance headache. It also represented an
unnecessary complexity overhead, resulted in performance problems, and forced us
to carry the Hadoop dependencies with all their baggage of old Java code and
security issues.&lt;/p&gt;

&lt;p&gt;In the end David Phillips, as our file system lead, decided in 2022 that it was
time to write our own file system support as needed for Trino. By summer of 2023
and with Trino 419 a &lt;a href=&quot;https://github.com/trinodb/trino/pull/17498&quot;&gt;first support for
S3&lt;/a&gt; became available for the
Iceberg and Delta Lake connectors. Over a year later in September 2024 and with
&lt;a href=&quot;/docs/current/release/release-458.html&quot;&gt;Trino 458&lt;/a&gt;, we declared
the old file system support on top of the Hadoop libraries legacy and advised
users to migrate.&lt;/p&gt;

&lt;p&gt;Since then you are required to declare what file system you want to enable in
each catalog with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fs.native-azure.enabled=true&lt;/code&gt;,&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fs.native-gcs.enabled=true&lt;/code&gt; or
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fs.native-s3.enabled=true&lt;/code&gt;. If you are truly using HDFS, or if you insist on
using the old legacy support you can also use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fs.hadoop.enabled=true&lt;/code&gt;.&lt;/p&gt;
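&lt;p&gt;For example, a minimal catalog properties file for an Iceberg catalog using
the native S3 file system might look like the following sketch; the region
value is a placeholder for your own setup:&lt;/p&gt;

&lt;div class=&quot;language-properties highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;connector.name=iceberg
fs.native-s3.enabled=true
s3.region=us-east-1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;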

&lt;h2 id=&quot;trino-470&quot;&gt;Trino 470&lt;/h2&gt;

&lt;p&gt;With the recent &lt;a href=&quot;/docs/current/release/release-470.html&quot;&gt;Trino 470
release&lt;/a&gt; from February
2025, we took the next step. All catalog configuration properties for using the
old, legacy support for accessing Azure Storage, Google Cloud Storage, S3, and
S3-compatible file systems are now &lt;strong&gt;deprecated&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;These properties include all names starting with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive.azure&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive.cos&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive.gcs&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive.s3&lt;/code&gt;. As a result of this deprecation, Trino emits a
warning in the server log during startup for each of these properties.&lt;/p&gt;

&lt;p&gt;We also removed all documentation for the old properties, leaving only relevant
migration guides in place.&lt;/p&gt;

&lt;h2 id=&quot;next-steps&quot;&gt;Next steps&lt;/h2&gt;

&lt;p&gt;Within the next few weeks or months we will completely remove all these
properties and the underlying code. We therefore renew the call we have made in
numerous contributor calls, Trino Community Broadcast episodes, and at our Trino
Fest and Trino Summit events:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Stop using the old legacy file systems today.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you need help, have a look at the documentation for your connector, the file
system you use, and the migration guide for each file system:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/delta-lake.html&quot;&gt;Delta Lake connector&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/hive.html&quot;&gt;Hive connector&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/hudi.html&quot;&gt;Hudi connector&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/iceberg.html&quot;&gt;Iceberg connector&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/object-storage/file-system-azure.html&quot;&gt;Azure Storage file system support&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/object-storage/file-system-gcs.html&quot;&gt;Google Cloud Storage file system support&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/object-storage/file-system-s3.html&quot;&gt;S3 file system support&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The new file systems are more stable and performant, and save you time and
money. Migrate today, and if you encounter any issues or find that features are
missing, ping us on &lt;a href=&quot;/slack.html&quot;&gt;Slack&lt;/a&gt; and chime in on the
&lt;a href=&quot;https://github.com/trinodb/trino/issues/24878&quot;&gt;roadmap issue for the removal of the legacy file system
support&lt;/a&gt;.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, David Phillips, Mateusz Gajewski</name>
        </author>
      

      <summary>What a long journey it has been! From the start Trino supported querying Hive data and used libraries from the Hive and Hadoop ecosystem. With the release of Trino 470 we mark another milestone toward more features and better performance for data lake and lakehouse querying with Trino. We deprecated the legacy file system support, and will permanently remove it in an upcoming release.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/hadoop-trashcan.png" />
      
    </entry>
  
    <entry>
      <title>Trino in 2024 and beyond</title>
      <link href="https://trino.io/blog/2025/01/07/2024-and-beyond.html" rel="alternate" type="text/html" title="Trino in 2024 and beyond" />
      <published>2025-01-07T00:00:00+00:00</published>
      <updated>2025-01-07T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2025/01/07/2024-and-beyond</id>
      <content type="html" xml:base="https://trino.io/blog/2025/01/07/2024-and-beyond.html">&lt;p&gt;Wow, what an amazing year 2024 was for Trino! Martin Traverso presented the
achievements and progress of the project at the &lt;a href=&quot;/blog/2024/12/18/trino-summit-2024-quick-recap.html&quot;&gt;recent Trino Summit
2024&lt;/a&gt;. Let me dive
deeper into the content of his keynote and elaborate some more on our amazing
plans for the future.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;statistics&quot;&gt;Statistics&lt;/h2&gt;

&lt;p&gt;On the first slide of his presentation &lt;strong&gt;Enduring with persistence to reach the
summit&lt;/strong&gt;, Martin presented some amazing statistics for the year:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Over 30 releases packed with features and improvements - &lt;a href=&quot;/docs/current/release.html#releases-2024&quot;&gt;Trino releases 436-467&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;5,000+ additional commits to the 40,000+ total commits since project start&lt;/li&gt;
  &lt;li&gt;225+ unique contributors in 2024, 925+ total&lt;/li&gt;
  &lt;li&gt;10.5k+ stars on GitHub&lt;/li&gt;
  &lt;li&gt;13,500+ Slack members&lt;/li&gt;
  &lt;li&gt;Trino Community Broadcast episodes 54-67&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;improvements&quot;&gt;Improvements&lt;/h2&gt;

&lt;p&gt;Some of the major improvements in Trino are:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Access controls with
&lt;a href=&quot;/docs/current/security/opa-access-control.html&quot;&gt;Open Policy Agent&lt;/a&gt; and
&lt;a href=&quot;/docs/current/security/ranger-access-control.html&quot;&gt;Apache Ranger&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Improved observability with &lt;a href=&quot;/docs/current/admin/event-listeners-openlineage.html&quot;&gt;OpenLineage&lt;/a&gt;, 
&lt;a href=&quot;/docs/current/admin/opentelemetry.html&quot;&gt;OpenTelemetry&lt;/a&gt;, OpenMetrics, and 
&lt;a href=&quot;/docs/current/admin/event-listeners-kafka.html&quot;&gt;Kafka&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Significant &lt;a href=&quot;/docs/current/client/client-protocol.html&quot;&gt;client protocol&lt;/a&gt; improvements&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/udf/python.html&quot;&gt;Python user-defined functions&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;New connectors such as &lt;a href=&quot;/docs/current/connector/faker.html&quot;&gt;Faker&lt;/a&gt;,
&lt;a href=&quot;/docs/current/connector/snowflake.html&quot;&gt;Snowflake&lt;/a&gt;, or
&lt;a href=&quot;/docs/current/connector/vertica.html&quot;&gt;Vertica&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Numerous improvements on object storage connectors and integrations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Of course we also paid a lot of attention to bug fixes and shipped tremendous
performance improvements.&lt;/p&gt;

&lt;h2 id=&quot;slides-and-video&quot;&gt;Slides and video&lt;/h2&gt;

&lt;p&gt;If you want to find out all the details, have a look at the
&lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-keynote.pdf&quot;&gt;&lt;strong&gt;slides&lt;/strong&gt;&lt;/a&gt;
and the video recording:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=wmR6kzOCo-I&quot;&gt;&lt;img src=&quot;https://img.youtube.com/vi/wmR6kzOCo-I/0.jpg&quot; alt=&quot;YouTube&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;other-projects&quot;&gt;Other projects&lt;/h2&gt;

&lt;p&gt;Martin also talked about the many improvements in other Trino projects such as
&lt;a href=&quot;https://trinodb.github.io/trino-gateway/&quot;&gt;Trino Gateway&lt;/a&gt;,
&lt;a href=&quot;https://github.com/trinodb/trino-python-client&quot;&gt;trino-python-client&lt;/a&gt;, the new
&lt;a href=&quot;https://github.com/trinodb/trino-js-client&quot;&gt;trino-js-client&lt;/a&gt;, and the new
&lt;a href=&quot;https://github.com/trinodb/trino-csharp-client&quot;&gt;trino-csharp-client&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;plans-for-2025&quot;&gt;Plans for 2025&lt;/h2&gt;

&lt;p&gt;For 2025, we have some pretty big plans in addition to our continued attention
to the software supply chain, performance improvements, and bug fixes.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Secrets management and dynamic catalogs&lt;/li&gt;
  &lt;li&gt;Client protocol improvements for all client drivers&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/22597&quot;&gt;Packaging improvements&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;More connectors such as DuckDB, LanceDB, HSQLDB, Loki, …&lt;/li&gt;
  &lt;li&gt;Continued and even increased work on performance improvements&lt;/li&gt;
  &lt;li&gt;Research and prototype towards a next generation optimizer&lt;/li&gt;
  &lt;li&gt;SQL language improvements such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PIVOT&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ASOF&lt;/code&gt; joins, …&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Of course, what actually happens with Trino in 2025 depends on you all. The
project lives and breathes only thanks to the efforts of all our contributors
and maintainers, and we look forward to working with you all.&lt;/p&gt;

&lt;h2 id=&quot;trino-survey&quot;&gt;Trino survey&lt;/h2&gt;

&lt;p&gt;Besides filing issues, sending pull requests, and discussing topics on Slack and
GitHub, we also have some specific questions and would really appreciate your
feedback. Answering should take less than a minute.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; target=&quot;_blank&quot; href=&quot;https://docs.google.com/forms/d/e/1FAIpQLSfrEIZ_5iyj17_hMJMdFhCIx9bQyHm6G-x6-CIq2VajURm6cQ/viewform?usp=sharing&quot;&gt;
        Help by answering the Trino survey
    &lt;/a&gt;
&lt;/div&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;With Trino being a huge collaborative effort, only one thing is certain:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;2025 will be an exciting year for Commander Bun Bun, Trino, and the Trino project.&lt;/p&gt;
&lt;/blockquote&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Wow, what an amazing year 2024 was for Trino! Martin Traverso presented the achievements and progress of the project at the recent Trino Summit 2024. Let me dive deeper into the content of his keynote and elaborate some more on our amazing plans for the future.</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino Summit 2024 resources</title>
      <link href="https://trino.io/blog/2024/12/18/trino-summit-2024-quick-recap.html" rel="alternate" type="text/html" title="Trino Summit 2024 resources" />
      <published>2024-12-18T00:00:00+00:00</published>
      <updated>2024-12-18T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/12/18/trino-summit-2024-quick-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2024/12/18/trino-summit-2024-quick-recap.html">&lt;p&gt;What a view we had at the summit! Over 700 live attendees enjoyed the sessions
and learned more about Trino-related use cases and projects. Now it is time for
the additional 1,000 registrants, our 13,000+ Trino users on
&lt;a href=&quot;/slack.html&quot;&gt;Slack&lt;/a&gt;, and everyone else in the Trino community
and beyond to enjoy the presentations and recordings at their leisure.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;day-1-sessions&quot;&gt;Day 1 sessions&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Enduring with persistence to reach the summit&lt;/strong&gt;
&lt;br /&gt;   Presented by Martin Traverso, co-creator of Trino and CTO at &lt;a href=&quot;/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/wmR6kzOCo-I&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-keynote.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Running Trino as exabyte-scale data warehouse&lt;/strong&gt;
&lt;br /&gt;   Presented by Alagappan Maruthappan from &lt;a href=&quot;/users.html#netflix&quot;&gt;Netflix&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/WuUS73QPuZE&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-netflix.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Data lake at Wise powered by Trino and Iceberg&lt;/strong&gt;
&lt;br /&gt;   Presented by Peter Kosztolanyi and Abdullah Alkhawatrah from &lt;a href=&quot;https://wise.com&quot;&gt;Wise&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/K5RmYtbeXAc&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Using Trino as a strangler fig&lt;/strong&gt;
&lt;br /&gt;   Presented by Trevor Kennedy from &lt;a href=&quot;https://www.fanduel.com/&quot;&gt;Fanduel&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/cVA5IPWdHRs&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-fanduel.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;A lakehouse that simply works&lt;/strong&gt;
&lt;br /&gt;   Presented by Vincenzo Cassaro from &lt;a href=&quot;https://prezi.com/&quot;&gt;Prezi&lt;/a&gt; 
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/6xdPRqpA8FA&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-prezi.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Empowering self-serve data analytics with a text-to-SQL assistant at LinkedIn&lt;/strong&gt; 
&lt;br /&gt;   Presented by Gaurav Ahlawat, Albert Chen, and Manas Bundele from
&lt;a href=&quot;/users.html#linkedin&quot;&gt;LinkedIn&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/rl4GLNEVkjo&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-linkedin-ai.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;How Trino and dbt unleashed many-to-many interoperability at Bazaar&lt;/strong&gt;
&lt;br /&gt;   Presented by Shahzad Siddiqi, Siddique Ahmad, and Usman Ghani from
  &lt;a href=&quot;/users.html#bazaar_technologies&quot;&gt;Bazaar&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/G9jafHdH8FY&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-bazaar.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Maximizing cost efficiency in data analytics with Trino and Iceberg&lt;/strong&gt;
&lt;br /&gt;   Presented by Gopi Bhagavathula from &lt;a href=&quot;https://www.branch.io/&quot;&gt;Branch&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/Yaz7fwvOPdY&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-branch.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Lessons and news from the AI world for Trino&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Manfred Moser, panel moderator and Trino maintainer at &lt;a href=&quot;/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Gunther Hagleitner, CEO and Co-founder at &lt;a href=&quot;https://waii.ai/&quot;&gt;Waii&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Rong Rong, Software Engineer at &lt;a href=&quot;https://character.ai/&quot;&gt;CharacterAI&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;William Chang, Co-founder and CTO of &lt;a href=&quot;/users.html#canner&quot;&gt;Canner&lt;/a&gt; and
&lt;a href=&quot;/ecosystem/client-application.html#wren-ai&quot;&gt;WrenAI&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Mustafa Sakalsiz, Founder and CEO at &lt;a href=&quot;/users.html#peaka&quot;&gt;Peaka&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dain Sundstrom, Trino co-creator and CTO at &lt;a href=&quot;/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/gobl6PhIWeE&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;day-2-sessions&quot;&gt;Day 2 sessions&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Trino for observability at Intuit&lt;/strong&gt; 
&lt;br /&gt;   Presented by Ujjwal Sharma and Riya John from &lt;a href=&quot;https://www.intuit.com/&quot;&gt;Intuit&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/47dMrURt7us&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-intuit.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Hassle-free dynamic policy enforcement in Trino&lt;/strong&gt;
&lt;br /&gt;   Presented by Ramanathan Ramu and Pratham Desai from &lt;a href=&quot;/users.html#linkedin&quot;&gt;LinkedIn&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/GAudNEmbvsc&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-linkedin-policy.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Empowering HugoBank’s digital services through Trino&lt;/strong&gt;
&lt;br /&gt;   Presented by Mustafa Mirza and Razi Moosa from &lt;a href=&quot;https://www.hugobank.com.pk&quot;&gt;HugoBank&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/51JVd25behQ&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-hugobank.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Optimizing Trino on Kubernetes: Helm chart enhancements for resilience and security&lt;/strong&gt; 
&lt;br /&gt;   Presented by Sebastian Daberdaku from &lt;a href=&quot;https://cardoai.com&quot;&gt;CardoAI&lt;/a&gt; and
Jan Waś from &lt;a href=&quot;/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/MGuOf45cGwA&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-cardoai.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Virtual view hierarchies with Trino&lt;/strong&gt;
&lt;br /&gt;   Presented by Rob Dickinson from &lt;a href=&quot;https://graylog.org/&quot;&gt;Graylog&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/z8eh_3vBpvg&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-graylog.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Opening up the Trino Gateway&lt;/strong&gt;
&lt;br /&gt;   Presented by Manfred Moser and Will Morrison from &lt;a href=&quot;/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;, 
&lt;br /&gt;   Vishal Jadhav from &lt;a href=&quot;https://www.bloomberg.com/company/values/tech-at-bloomberg/&quot;&gt;Bloomberg&lt;/a&gt;, and Jaehoo Yoo from &lt;a href=&quot;/users.html#naver&quot;&gt;Naver&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/MiQEngRJk8g&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-trino-gateway.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Wvlet: A new flow-style query language for functional data modeling and interactive analysis&lt;/strong&gt;
&lt;br /&gt;   Presented by Taro L. Saito from &lt;a href=&quot;/users.html#treasuredata&quot;&gt;Treasure Data&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/ot7z7J6h9rM&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-wvlet.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;
&lt;p&gt;&lt;strong&gt;Securing data pipelines at the storage layer&lt;/strong&gt;
&lt;br /&gt;   Presented by Andrew MacKay from &lt;a href=&quot;https://superna.io/&quot;&gt;Superna&lt;/a&gt;.
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/Lxr4Rzn27cw&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-superna.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;
&lt;p&gt;&lt;strong&gt;Empowering pharmaceutical drug launches with Trino-powered sales data analytics&lt;/strong&gt;
&lt;br /&gt;   Presented by Harpreet Singh from &lt;a href=&quot;https://www.gilead.com/&quot;&gt;Gilead&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/ELsBGx1Sv3o&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;
&lt;p&gt;&lt;strong&gt;Connecting to Trino with C# and ADO.net&lt;/strong&gt; 
&lt;br /&gt;   Presented by George Fischer from &lt;a href=&quot;https://www.microsoft.com&quot;&gt;Microsoft&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/x2rF6IEjFK0&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-csharp-client.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Our thanks go out to all our speakers as well as our event sponsor:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/users.html#starburst&quot;&gt;
&lt;img src=&quot;/assets/images/logos/starburst.png&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;See you at Trino Fest 2025, one of our &lt;a href=&quot;/community.html#events&quot;&gt;other events and
meetings&lt;/a&gt;, and on &lt;a href=&quot;/slack.html&quot;&gt;Trino
Slack&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred, Monica, and Anna&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Monica Miller, Anna Schibli</name>
        </author>
      

      <summary>What a view we had at the summit! Over 700 live attendees enjoyed the sessions and learned more about Trino-related use cases and projects. Now it is time for the additional 1,000 registrants, our 13,000+ Trino users on Slack, and everyone else in the Trino community and beyond to enjoy the presentations and recordings at their leisure.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2024/recap-blog-banner.png" />
      
    </entry>
  
    <entry>
      <title>The long journey to Apache Ranger</title>
      <link href="https://trino.io/blog/2024/12/02/ranger.html" rel="alternate" type="text/html" title="The long journey to Apache Ranger" />
      <published>2024-12-02T00:00:00+00:00</published>
      <updated>2024-12-02T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/12/02/ranger</id>
      <content type="html" xml:base="https://trino.io/blog/2024/12/02/ranger.html">&lt;p&gt;&lt;a href=&quot;/ecosystem/add-on.html#apache-ranger&quot;&gt;Apache Ranger&lt;/a&gt; has
arrived! With the new &lt;a href=&quot;/docs/current/release/release-466.html&quot;&gt;Trino
466&lt;/a&gt; you all get another
jam-packed release of Trino awesomeness. One of the goodies is a new plugin for
access control for your data with Apache Ranger, and it has gone through a long
story to get here.&lt;/p&gt;

&lt;p&gt;Apache Ranger has a long history and wide adoption as an access control system
for data lakes using Hadoop and Hive. Since Trino brings fast analytics to this
space, and also supports modern data lakehouses and other data sources, Apache
Ranger is a natural fit for access control on a Trino-powered data platform.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;the-beginnings&quot;&gt;The beginnings&lt;/h2&gt;

&lt;p&gt;Apache Ranger has been in use with Trino for a long time - in fact there are
&lt;a href=&quot;https://github.com/trinodb/trino/pull/244&quot;&gt;early&lt;/a&gt;,
&lt;a href=&quot;https://github.com/trinodb/trino/pull/1069&quot;&gt;rudimentary&lt;/a&gt; pull requests from
2019 that implemented some support. And even before then, various hacks existed.
In 2020, a plugin for PrestoSQL was added to Apache Ranger. Aakash Nand blogged
about &lt;a href=&quot;https://towardsdatascience.com/integrating-trino-and-apache-ranger-b808f6b96ad8&quot;&gt;Integrating Trino and Apache
Ranger&lt;/a&gt;
in 2021 to adjust for the changes to Trino. Jeff Xu followed up with
&lt;a href=&quot;https://medium.com/@jeff.xu.z/integrating-trino-and-apache-ranger-in-a-kerberos-secured-enterprise-environment-997c95cd10e9&quot;&gt;Integrating Trino and Apache Ranger in a Kerberos-secured enterprise
environment&lt;/a&gt;
in 2022, quickly followed by the addition of Trino support to the Apache
Ranger repository.&lt;/p&gt;

&lt;h2 id=&quot;testing-and-container-images&quot;&gt;Testing and container images&lt;/h2&gt;

&lt;p&gt;However, that was only half of the needed support. The Trino project moves very
fast with nearly weekly releases, so the best approach is to have the supporting
plugin in Trino directly, so that every release includes the relevant updates. &lt;a href=&quot;https://github.com/dprophet&quot;&gt;Erik
Anderson&lt;/a&gt; created a more mature plugin that was in
production use with Trino for quite a while. His &lt;a href=&quot;https://github.com/trinodb/trino/pull/13297&quot;&gt;pull request from July
2022&lt;/a&gt; included great background
reasoning for having the plugin in Trino. One of the issues that Erik solved for
the Trino project is testing. Trino plugins require the availability of a
container image for testing the integration. Apache Ranger still did not ship a
container image in 2022, but thanks to Erik’s lobbying efforts this changed, and
an image became available over the following months.&lt;/p&gt;

&lt;h2 id=&quot;a-long-sprint&quot;&gt;A long sprint&lt;/h2&gt;

&lt;p&gt;Unfortunately, priorities shifted, and while Erik’s PR existed and was useful,
it never made it to merge. That changed when &lt;a href=&quot;https://github.com/mneethiraj&quot;&gt;Madhan
Neethiraj&lt;/a&gt; from the Apache Ranger project stepped
up and created a &lt;a href=&quot;https://github.com/trinodb/trino/pull/22675&quot;&gt;new PR&lt;/a&gt; in July 2024.&lt;/p&gt;

&lt;p&gt;We knew this could be another shot at it, and that it would require a lot of
work to get done, since we put a high focus on quality so that we can maintain
the Trino codebase for the long run. While monitoring all PRs, &lt;a href=&quot;https://github.com/mosabua&quot;&gt;I (Manfred
Moser)&lt;/a&gt; noticed it and jumped in to help.&lt;/p&gt;

&lt;p&gt;Erik and other interested users chimed in.
&lt;a href=&quot;https://github.com/lozbrown&quot;&gt;lozbrown&lt;/a&gt; and Manfred helped with documentation
and getting other developers interested. The heavy technical reviews and lots of
guidance came from &lt;a href=&quot;https://github.com/ksobolew&quot;&gt;Krzysztof Sobolewski&lt;/a&gt; and
&lt;a href=&quot;https://github.com/kokosing&quot;&gt;Grzegorz Kokosiński&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;During the whole process, Madhan had to react to comments, update the code, and
also regularly rebase his PR to adjust for the constantly changing Trino
codebase in the master branch. Starburst recognized Madhan’s effort and
&lt;a href=&quot;https://www.starburst.io/community/trino-champions/&quot;&gt;featured him as Starburst Trino
Champion&lt;/a&gt;. Interestingly,
the container image ended up not being used for testing, but it will still be
crucially important for many users deploying Apache Ranger on Kubernetes.
Nearly 400 comments and over four months later we all got to celebrate. The
Trino maintainer Grzegorz took on the responsibility and merged the PR. &lt;a href=&quot;https://github.com/ebyhr&quot;&gt;Yuya
Ebihara&lt;/a&gt; and &lt;a href=&quot;https://github.com/martint&quot;&gt;Martin
Traverso&lt;/a&gt; followed up with
&lt;a href=&quot;https://github.com/trinodb/trino/pull/24238&quot;&gt;minor&lt;/a&gt;
&lt;a href=&quot;https://github.com/trinodb/trino/pull/24252&quot;&gt;cleanups&lt;/a&gt;, and we finally shipped
the plugin as part of &lt;a href=&quot;/docs/current/release/release-466.html&quot;&gt;Trino
466&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;A huge congratulations and thank you goes out to everyone involved.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now it is your turn to have a look at the
&lt;a href=&quot;/docs/current/security/apache-ranger-access-control.html&quot;&gt;documentation&lt;/a&gt;,
learn more about Trino and Apache Ranger, and maybe even proceed to help us
improve the integration.&lt;/p&gt;
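&lt;p&gt;As a starting sketch, the plugin is enabled with an
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;etc/access-control.properties&lt;/code&gt; file. The property names follow the linked
documentation; the service name and configuration file paths are placeholders to
adapt to your Ranger deployment:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;access-control.name=ranger
ranger.service.name=trino
ranger.plugin.config.resource=etc/ranger-trino-security.xml,etc/ranger-trino-audit.xml
&lt;/code&gt;&lt;/pre&gt;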

&lt;h2 id=&quot;next-steps&quot;&gt;Next steps&lt;/h2&gt;

&lt;p&gt;Beyond our celebration, more tasks are waiting for all of us:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Test it out in your usage and migrate from any old or custom versions.&lt;/li&gt;
  &lt;li&gt;Help us improve the
&lt;a href=&quot;/docs/current/security/apache-ranger-access-control.html&quot;&gt;documentation&lt;/a&gt;
significantly to allow easier adoption.&lt;/li&gt;
  &lt;li&gt;Work with lozbrown on adding support to the &lt;a href=&quot;https://github.com/trinodb/charts&quot;&gt;Helm chart&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Check out the codebase and help us fix bugs and add features.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And last, but not least - join us all to celebrate Trino at the upcoming &lt;a href=&quot;/blog/2024/11/22/trino-summit-2024-lineup.html&quot;&gt;Trino
Summit 2024 for two days of amazing sessions and interaction with your peers
from the Trino community&lt;/a&gt;
and the &lt;a href=&quot;/community.html#events&quot;&gt;Trino Contributor Call&lt;/a&gt; for
more open community chat and discussion.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Apache Ranger has arrived! With the new Trino 466 you all get another jam-packed release of Trino awesomeness. One of the goodies is a new plugin for access control for your data with Apache Ranger, and it has gone through a long story to get here. Apache Ranger has a long history and wide adoption as an access control system for data lakes using Hadoop and Hive. Since Trino brings fast analytics to this space, and also supports modern data lakehouses and other data sources, Apache Ranger is a natural fit for access control on a Trino-powered data platform.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/images/logos/apache-ranger.png" />
      
    </entry>
  
    <entry>
      <title>The glorious lineup for Trino Summit 2024</title>
      <link href="https://trino.io/blog/2024/11/22/trino-summit-2024-lineup.html" rel="alternate" type="text/html" title="The glorious lineup for Trino Summit 2024" />
      <published>2024-11-22T00:00:00+00:00</published>
      <updated>2024-11-22T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/11/22/trino-summit-2024-lineup</id>
      <content type="html" xml:base="https://trino.io/blog/2024/11/22/trino-summit-2024-lineup.html">&lt;p&gt;We just wrapped up our mini training series &lt;a href=&quot;/blog/2024/11/21/sql-basecamps-view.html&quot;&gt;SQL basecamps before Trino
Summit&lt;/a&gt;, and now Trino Summit 2024
is less than three busy weeks away. It’s a good thing that we have also been
working hard on all the preparations for the summit. Everything is coming
together, and we are excited to share the full lineup for the free, virtual,
two-day event today.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;In &lt;a href=&quot;/blog/2024/10/17/trino-summit-2024-tease.html&quot;&gt;our first glimpse at the summit&lt;/a&gt; we were able to share a few sessions with
more details. Now have a look at the whole lineup with speakers from all these
and many other companies:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2024/summit-wall.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Make sure you register to get up-to-date information and more details for all
the sessions. Registration allows you to join us live and chat with the speakers
during the event. You will also get important session follow-up information,
including when recordings and slide decks become available, so you can review,
watch anything you missed, and share sessions with your peers.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; target=&quot;_blank&quot; href=&quot;https://www.starburst.io/info/trino-summit-2024/?utm_medium=trino&amp;amp;utm_source=website&amp;amp;[…]mpaign=NORAM-FY25-Q4-CM-Trino-Summit-2024&amp;amp;utm_content=blog-3&quot;&gt;
        Register now
    &lt;/a&gt;
&lt;/div&gt;

&lt;h2 id=&quot;keynote&quot;&gt;Keynote&lt;/h2&gt;

&lt;p&gt;In the keynote &lt;strong&gt;Enduring with persistence to reach the summit&lt;/strong&gt; Martin
Traverso, co-creator of Trino and CTO at
&lt;a href=&quot;/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;, covers the developments from
2024 in the Trino projects and the Trino community. Martin also reveals details
about new features, new projects, and plans for 2025.&lt;/p&gt;

&lt;h2 id=&quot;panel-discussion&quot;&gt;Panel discussion&lt;/h2&gt;

&lt;p&gt;The hype and reality of AI has swept through the industry. In the panel
discussion &lt;strong&gt;Lessons and news from the AI world for Trino&lt;/strong&gt;, Manfred Moser is
moderating experts from the community:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Gunther Hagleitner, CEO and Co-founder at &lt;a href=&quot;https://waii.ai/&quot;&gt;Waii&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Rong Rong, Software Engineer at &lt;a href=&quot;https://character.ai/&quot;&gt;CharacterAI&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;William Chang, Co-founder and CTO of &lt;a href=&quot;/users.html#canner&quot;&gt;Canner&lt;/a&gt; and
&lt;a href=&quot;/ecosystem/client#wren-ai&quot;&gt;WrenAI&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Mustafa Sakalsiz, Founder and CEO at &lt;a href=&quot;/users.html#peaka&quot;&gt;Peaka&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dain Sundstrom, Trino co-creator and CTO at &lt;a href=&quot;/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All panelists have extensive experience with AI and Trino, and will share
their knowledge and different perspectives.&lt;/p&gt;

&lt;h2 id=&quot;sessions&quot;&gt;Sessions&lt;/h2&gt;

&lt;p&gt;The following sessions allow our speakers to really dig into the details of
their topic:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Optimizing Trino on Kubernetes: Helm chart enhancements for resilience and
security&lt;/strong&gt; presented by Sebastian Daberdaku from
&lt;a href=&quot;https://cardoai.com/&quot;&gt;CardoAI&lt;/a&gt; and Jan Waś from
&lt;a href=&quot;/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Trino for Observability at Intuit&lt;/strong&gt; presented by Ujjwal Sharma and Riya John
from &lt;a href=&quot;https://www.intuit.com/&quot;&gt;Intuit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Opening up the Trino Gateway&lt;/strong&gt; presented by the Trino Gateway maintainers&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Data Lake at Wise powered by Trino and Iceberg&lt;/strong&gt; presented by Peter
Kosztolanyi and Abdallah Alkhawatrah from &lt;a href=&quot;https://wise.com&quot;&gt;Wise&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Hassle-free dynamic policy enforcement in Trino&lt;/strong&gt; presented by Ramanathan
Ramu and Pratham Desai from &lt;a href=&quot;/users.html#linkedin&quot;&gt;LinkedIn&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Empowering self-serve data analytics with a text-to-SQL assistant at
LinkedIn&lt;/strong&gt; presented by Gaurav Ahlawat, Albert Chen, and Manas Bundele from
&lt;a href=&quot;/users.html#linkedin&quot;&gt;LinkedIn&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;A Lakehouse that simply works&lt;/strong&gt; presented by Vincenzo Cassaro from
  &lt;a href=&quot;https://prezi.com/&quot;&gt;Prezi&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Securing data pipelines at the storage layer&lt;/strong&gt; presented by Andrew MacKay
from &lt;a href=&quot;https://superna.io/&quot;&gt;Superna&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Maximizing cost efficiency in data analytics with Trino and Iceberg&lt;/strong&gt;
presented by Gopi Bhagavathula from &lt;a href=&quot;https://www.branch.io/&quot;&gt;Branch&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Wvlet: A new flow-style query language for functional data modeling and
interactive analysis&lt;/strong&gt; presented by Taro L. Saito from &lt;a href=&quot;/users.html#treasuredata&quot;&gt;Treasure
Data&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Running Trino as exabyte-scale data warehouse&lt;/strong&gt; presented by Alagappan
Maruthappan from &lt;a href=&quot;/users.html#netflix&quot;&gt;Netflix&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;lightning-talks&quot;&gt;Lightning talks&lt;/h2&gt;

&lt;p&gt;Our lightning talks provide inspiration with some great examples of Trino
adoption and usage:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Using Trino as a strangler fig&lt;/strong&gt; presented by Trevor Kennedy from
&lt;a href=&quot;https://www.fanduel.com/&quot;&gt;Fanduel&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Virtual view hierarchies with Trino&lt;/strong&gt; presented by Rob Dickinson from
&lt;a href=&quot;https://graylog.org/&quot;&gt;Graylog&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Empowering HugoBank’s digital services through Trino&lt;/strong&gt; presented by Mustafa
Mirza and Razi Moosa from &lt;a href=&quot;https://www.hugobank.com.pk&quot;&gt;HugoBank&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;How Trino and dbt unleashed many-to-many interoperability at Bazaar&lt;/strong&gt;
presented by Shahzad Siddiqi, Siddique Ahmad, and Usman Ghani from
&lt;a href=&quot;/users.html#bazaar_technologies&quot;&gt;Bazaar&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Connecting to Trino with C# and ADO.net&lt;/strong&gt; presented by George Fischer from
&lt;a href=&quot;https://www.microsoft.com&quot;&gt;Microsoft&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Our special thanks go out to all our speakers as well as our event sponsor:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/users.html#starburst&quot;&gt;
&lt;img src=&quot;/assets/images/logos/starburst.png&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;See you on the summit.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred, Monica, and Anna&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Monica Miller, Anna Schibli</name>
        </author>
      

      <summary>We just wrapped up our mini training series SQL basecamps before Trino Summit, and now Trino Summit 2024 is less than three busy weeks away. It’s a good thing that we have also been working hard on all the preparations for the summit. Everything is coming together, and we are excited to share the full lineup for the free, virtual, two-day event today.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2024/lineup-blog-banner.png" />
      
    </entry>
  
    <entry>
      <title>View the SQL basecamps before Trino Summit</title>
      <link href="https://trino.io/blog/2024/11/21/sql-basecamps-view.html" rel="alternate" type="text/html" title="View the SQL basecamps before Trino Summit" />
      <published>2024-11-21T00:00:00+00:00</published>
      <updated>2024-11-21T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/11/21/sql-basecamps-view</id>
      <content type="html" xml:base="https://trino.io/blog/2024/11/21/sql-basecamps-view.html">&lt;p&gt;Trino Summit is fast approaching, and we are busy with all the preparation.
Nevertheless, we thought we would bring you some more SQL and Trino-related training.
The two live classes from our &lt;a href=&quot;/blog/2024/10/07/sql-basecamps.html&quot;&gt;SQL basecamps before Trino Summit&lt;/a&gt; are now available for you all to enjoy, just in
case you missed them.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;In the two classes I teamed up with Dain Sundstrom and Martin Traverso, and
created interview-style training classes. Hopefully you learned something from
their insights and my guidance and questions.&lt;/p&gt;

&lt;p&gt;Check out the two session recordings and the supporting material:&lt;/p&gt;

&lt;h2 id=&quot;moving-supplies&quot;&gt;Moving supplies&lt;/h2&gt;

&lt;p&gt;In the first episode &lt;strong&gt;SQL basecamp 1 – Moving supplies&lt;/strong&gt; Dain and I discussed
the core concepts of a Trino-powered lakehouse, getting data in and maintaining
the lakehouse.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
  &lt;a class=&quot;btn btn-pink&quot; target=&quot;_blank&quot; href=&quot;https://trinodb.github.io/presentations/presentations/moving-supplies/index.html&quot;&gt;
    Look at the slides
  &lt;/a&gt;
&lt;/div&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/LyBSHiCd2A8&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h2 id=&quot;getting-ready-to-summit&quot;&gt;Getting ready to summit&lt;/h2&gt;

&lt;p&gt;The second episode &lt;strong&gt;SQL Basecamp 2 – Getting ready to summit&lt;/strong&gt; builds on the
foundation established in episode 1. Martin and I discussed some further details
for lakehouse usage and then looked at structural data types and views.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
  &lt;a class=&quot;btn btn-pink&quot; target=&quot;_blank&quot; href=&quot;https://trinodb.github.io/presentations/presentations/getting-ready-to-summit/index.html&quot;&gt;
    Look at the slides
  &lt;/a&gt;
&lt;/div&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/32uGABdBCTQ&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h2 id=&quot;next-up-trino-summit&quot;&gt;Next up, Trino Summit&lt;/h2&gt;

&lt;p&gt;If you think those two sessions were great, how about two days’ worth of great
presentations at Trino Summit?&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;https://www.starburst.io/info/trino-summit-2024/?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=NORAM-FY25-Q4-CM-Trino-Summit-2024&amp;amp;utm_content=sql-series-recap-blog&quot;&gt;
        Register now
    &lt;/a&gt;
&lt;/div&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Trino Summit is fast approaching, and we are busy with all the preparation. Nevertheless, we thought we would bring you some more SQL and Trino-related training. The two live classes from our SQL basecamps before Trino Summit are now available for you all to enjoy, just in case you missed them.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2024/sql-basecamps-2024.png" />
      
    </entry>
  
    <entry>
      <title>Trino and Javascript?! YES!</title>
      <link href="https://trino.io/blog/2024/11/18/javascript.html" rel="alternate" type="text/html" title="Trino and Javascript?! YES!" />
      <published>2024-11-18T00:00:00+00:00</published>
      <updated>2024-11-18T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/11/18/javascript</id>
      <content type="html" xml:base="https://trino.io/blog/2024/11/18/javascript.html">&lt;p&gt;Trino is written in Java. Trino contributors and maintainers are often veterans
in the Java ecosystem and community, and Trino is very modern when it comes to
Java. For example, Trino now requires the latest Java version and actively uses
new features.&lt;/p&gt;

&lt;p&gt;When it comes to JavaScript however, the story is a bit more complicated. Of
course, JavaScript is commonly used in the Trino ecosystem and codebase. Let’s
look at some of the specifics.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;client-driver-and-applications&quot;&gt;Client driver and applications&lt;/h2&gt;

&lt;p&gt;Client applications that allow users to submit queries to Trino and receive
the results are written in numerous languages. Trino has good support
for &lt;a href=&quot;/ecosystem/index.html#clients&quot;&gt;many of them&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Thanks to the collaboration with &lt;a href=&quot;https://github.com/regadas&quot;&gt;Filipe Regadas&lt;/a&gt;
and the contribution of his JavaScript client driver to the Trino community, we
now have an official
&lt;a href=&quot;https://github.com/trinodb/trino-js-client&quot;&gt;trino-js-client&lt;/a&gt; project. After his
initial donation we have applied numerous improvements and recently cut our
first release.&lt;/p&gt;

&lt;p&gt;The client is already used in the &lt;a href=&quot;/ecosystem/client#vscode&quot;&gt;Visual Studio Code
support&lt;/a&gt;, the &lt;a href=&quot;/ecosystem/client#emacs&quot;&gt;Emacs
support&lt;/a&gt;, the example project discussed
in &lt;a href=&quot;/episodes/63.html&quot;&gt;Trino Community Broadcast episode 63&lt;/a&gt;,
and numerous other applications.&lt;/p&gt;

&lt;p&gt;And we have big plans as well:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for more authentication methods supported in Trino&lt;/li&gt;
  &lt;li&gt;Improve documentation and example projects&lt;/li&gt;
  &lt;li&gt;Add support for the new spooling client protocol from Trino&lt;/li&gt;
  &lt;li&gt;Test with Trino Gateway and adjust as needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While this project is a great addition for many users of Trino and their custom
web applications, there are numerous other usages of JavaScript in the project.&lt;/p&gt;

&lt;h2 id=&quot;user-interfaces&quot;&gt;User interfaces&lt;/h2&gt;

&lt;p&gt;Web-based user interfaces are one important use of JavaScript. Trino includes
the &lt;a href=&quot;/docs/current/admin/web-interface.html&quot;&gt;Trino Web UI&lt;/a&gt; and
the ongoing effort to replace it with a more modern and feature-rich UI,
currently called the &lt;a href=&quot;/docs/current/admin/preview-web-interface.html&quot;&gt;Preview
UI&lt;/a&gt;. It was
inspired by the replacement of the legacy UI for &lt;a href=&quot;https://trinodb.github.io/trino-gateway/&quot;&gt;Trino
Gateway&lt;/a&gt; with a new UI based on
current tools and libraries.&lt;/p&gt;

&lt;p&gt;All three user interfaces require constant work: keeping libraries up to
date, fixing bugs, and adding new features.&lt;/p&gt;

&lt;h2 id=&quot;other-projects&quot;&gt;Other projects&lt;/h2&gt;

&lt;p&gt;Beyond the user interfaces we also provide a &lt;a href=&quot;https://github.com/trinodb/grafana-trino&quot;&gt;plugin for
Grafana&lt;/a&gt; that is mostly written in
JavaScript, and there might be more projects on the way.&lt;/p&gt;

&lt;h2 id=&quot;whats-next&quot;&gt;What’s next?&lt;/h2&gt;

&lt;p&gt;The skills and experience needed for all these JavaScript-based efforts are
distinct enough that developers can contribute to them without knowing much
about Trino and Java.&lt;/p&gt;

&lt;p&gt;If that is you, we want to hear from you. And if you are knowledgeable in
Trino, Java, and many other things, and also interested in helping with the
JavaScript work, we want to hear from you too. There is always more we want to
get done, and we need your help.&lt;/p&gt;

&lt;p&gt;So have a look at the codebase that interests you the most, chat with us on
&lt;a href=&quot;/slack.html&quot;&gt;Trino Slack&lt;/a&gt;, join an &lt;a href=&quot;/community.html#events&quot;&gt;upcoming Trino contributor
call&lt;/a&gt; and &lt;a href=&quot;/blog/2024/10/17/trino-summit-2024-tease.html&quot;&gt;Trino Summit&lt;/a&gt;, and let me know if you would be
interested in a regular Trino JavaScript call, perhaps monthly.&lt;/p&gt;

&lt;p&gt;And if you don’t want to code in Java or JavaScript? Well, you can help us write
&lt;a href=&quot;https://github.com/trinodb/trino/tree/master/docs&quot;&gt;documentation in Markdown&lt;/a&gt;,
work on the &lt;a href=&quot;https://github.com/trinodb/trino-python-client&quot;&gt;Python client&lt;/a&gt;, the
&lt;a href=&quot;https://github.com/trinodb/trino-go-client&quot;&gt;Go client&lt;/a&gt;, or maybe even
contribute a client we don’t even have yet.&lt;/p&gt;

&lt;p&gt;In all cases, we look forward to your help.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Trino is written in Java. Trino contributors and maintainers are often veterans in the Java ecosystem and community, and Trino is very modern when it comes to Java. For example, Trino now requires the latest Java version and actively uses new features. When it comes to JavaScript however, the story is a bit more complicated. Of course, JavaScript is commonly used in the Trino ecosystem and codebase. Let’s look at some of the specifics.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/images/logos/javascript-small.png" />
      
    </entry>
  
    <entry>
      <title>A glimpse at the summit</title>
      <link href="https://trino.io/blog/2024/10/17/trino-summit-2024-tease.html" rel="alternate" type="text/html" title="A glimpse at the summit" />
      <published>2024-10-17T00:00:00+00:00</published>
      <updated>2024-10-17T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/10/17/trino-summit-2024-tease</id>
      <content type="html" xml:base="https://trino.io/blog/2024/10/17/trino-summit-2024-tease.html">&lt;p&gt;Our efforts around &lt;a href=&quot;/blog/2024/07/11/trino-summit-2024-call-for-speakers.html&quot;&gt;Trino Summit 2024&lt;/a&gt; are ramping up and the event
is creeping closer and closer. We are really looking forward to the two-day,
free, virtual event in December about all things Trino.&lt;/p&gt;

&lt;p&gt;While we are working hard to put together the &lt;a href=&quot;/blog/2024/10/07/sql-basecamps.html&quot;&gt;SQL basecamps before Trino Summit
training sessions&lt;/a&gt; and &lt;a href=&quot;/community.html#events&quot;&gt;other community
events&lt;/a&gt;, a number of your awesome peers
from the Trino community submitted session proposals, and we are excited to
share this glimpse of the agenda for Trino Summit 2024.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;first-batch-of-sessions&quot;&gt;First batch of sessions&lt;/h2&gt;

&lt;p&gt;Let’s see what has already settled on the agenda.&lt;/p&gt;

&lt;h3 id=&quot;running-trino-as-exabyte-scale-data-warehouse&quot;&gt;Running Trino as exabyte-scale data warehouse&lt;/h3&gt;

&lt;p&gt;Presented by Alagappan Maruthappan from &lt;a href=&quot;https://netflix.com&quot;&gt;Netflix&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Netflix operates over 15 Trino clusters, efficiently handling more than 10
million queries each month. As the initial creator of Apache Iceberg,
Netflix has over 1 million Iceberg tables and makes extensive use of the Trino
Iceberg connector. In this session we talk about the operational challenges faced,
internal efficiency improvements, and our experience with upgrading to the
latest Trino version.&lt;/p&gt;

&lt;h3 id=&quot;a-lakehouse-that-simply-works&quot;&gt;A Lakehouse that simply works&lt;/h3&gt;

&lt;p&gt;Presented by Vincenzo Cassaro from &lt;a href=&quot;https://prezi.com/&quot;&gt;Prezi&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With the billions of tech and vendor proposals, it’s easy to lose track of what
truly matters. Vincenzo would like to show how a simple combination of established,
maintained, open source technologies can make a lakehouse that truly works for a
company with 150M users.&lt;/p&gt;

&lt;h3 id=&quot;how-trino-and-dbt-unleashed-many-to-many-interoperability-at-bazaar&quot;&gt;How Trino and dbt unleashed many-to-many interoperability at Bazaar&lt;/h3&gt;

&lt;p&gt;Presented by Shahzad Siddiqi, Siddique Ahmad, and Usman Ghani from
&lt;a href=&quot;/users.html#bazaar_technologies&quot;&gt;Bazaar&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Learn how Bazaar leveraged the combined power of Trino and dbt to scale their
data platform effectively. This talk delves into the strategies and technologies
used to enable many-to-many integration, fueling data-driven decision-making
across the organization.&lt;/p&gt;

&lt;h3 id=&quot;maximizing-cost-efficiency-in-data-analytics-with-trino-and-iceberg&quot;&gt;Maximizing cost efficiency in data analytics with Trino and Iceberg&lt;/h3&gt;

&lt;p&gt;Presented by Gopi Bhagavathula from &lt;a href=&quot;https://www.branch.io/&quot;&gt;Branch&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At Branch, we realized that our existing architecture was not only expensive
but also becoming unsustainable as data volumes grew for one of our business
units, and we decided to adopt Trino and Apache Iceberg. Our journey of migrating
from Apache Druid to Trino and Iceberg taught us that the right combination of
tools can transform data analytics for one of our internal business units,
offering the perfect balance between cost savings, performance, and scalability.
Learn more about how we achieved 7-figure savings with a few “compromises”.&lt;/p&gt;

&lt;h3 id=&quot;using-trino-as-a-strangler-fig&quot;&gt;Using Trino as a strangler fig&lt;/h3&gt;

&lt;p&gt;Presented by Trevor Kennedy from &lt;a href=&quot;https://www.fanduel.com/&quot;&gt;Fanduel&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This talk discusses how FanDuel uses Trino to migrate analysts from Redshift to
Delta Lake using Martin Fowler’s Strangler Fig pattern. Trino slowly took root
after initial trials, started replacing parts of the legacy system, and
eventually will become a complete replacement, leaving only a shadow of the
original system.&lt;/p&gt;

&lt;h3 id=&quot;enduring-with-persistence-to-reach-the-summit&quot;&gt;Enduring with persistence to reach the summit&lt;/h3&gt;

&lt;p&gt;Presented by Martin Traverso from &lt;a href=&quot;/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the keynote Martin presents the latest and greatest news from the Trino
project and the Trino community. With more contributors, more maintainers, and a
larger community we got a lot done since Trino Fest in June. Find out the
details from the co-creator of Trino.&lt;/p&gt;

&lt;p&gt;Surely, you don’t need any more convincing and you are ready to proceed to&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;https://www.starburst.io/info/trino-summit-2024/?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=NORAM-FY25-Q4-CM-Trino-Summit-2024-IMC-Upgrade&amp;amp;utm_content=blog-2&quot;&gt;
        Register to attend!
    &lt;/a&gt;
&lt;/div&gt;

&lt;h2 id=&quot;continued-call-for-speakers&quot;&gt;Continued call for speakers&lt;/h2&gt;

&lt;p&gt;Now that you have registered and seen what others have submitted and gotten
accepted, we are sure you are thinking:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Well, that’s interesting, but I could submit a talk like that, or even better!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We agree and know you are up to it, so go ahead and submit a proposal:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;https://sessionize.com/trino-summit-2024&quot;&gt;
        Submit a talk!
    &lt;/a&gt;
&lt;/div&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;And if necessary, check the &lt;a href=&quot;/blog/2024/07/11/trino-summit-2024-call-for-speakers.html&quot;&gt;original announcement&lt;/a&gt; for more tips and ideas.&lt;/p&gt;

&lt;h2 id=&quot;sponsor-trino-summit&quot;&gt;Sponsor Trino Summit&lt;/h2&gt;

&lt;p&gt;To make the event a smashing hit, we are also looking for more sponsors.
Starburst, as the organizing sponsor of the event, is excited to collaborate
with other organizations from the Trino community. If you are
interested in sponsoring, email
&lt;a href=&quot;mailto:events@starburstdata.com?subject=Sponsor%20Trino%20Summit&quot;&gt;events@starburstdata.com&lt;/a&gt;
for information.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Monica Miller, Anna Schibli</name>
        </author>
      

      <summary>Our efforts around Trino Summit 2024 are ramping up and the event is creeping closer and closer. We are really looking forward to the two-day, free, virtual event in December about all things Trino. While we are working hard to put together the SQL basecamps before Trino Summit training sessions and other community events, a number of your awesome peers from the Trino community submitted session proposals, and we are excited to share that glimpse on the agenda for Trino Summit 2024.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2024/lineup-blog-banner.png" />
      
    </entry>
  
    <entry>
      <title>A Kubernetes operator for Trino?</title>
      <link href="https://trino.io/blog/2024/10/10/operator.html" rel="alternate" type="text/html" title="A Kubernetes operator for Trino?" />
      <published>2024-10-10T00:00:00+00:00</published>
      <updated>2024-10-10T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/10/10/operator</id>
      <content type="html" xml:base="https://trino.io/blog/2024/10/10/operator.html">&lt;p&gt;Trino is deployed everywhere – on-premises, in private data centers, in the cloud
with hosting providers, on bare metal servers, on virtual machines, and with
containers. Among all these deployment options, a Kubernetes-based platform
running containers has emerged as the most widely used approach.&lt;/p&gt;

&lt;p&gt;The Trino project caters for this usage with our &lt;a href=&quot;/docs/current/installation/containers.html&quot;&gt;container
images&lt;/a&gt; for every
release and our &lt;a href=&quot;https://github.com/trinodb/charts&quot;&gt;Helm chart&lt;/a&gt;. However,
we keep hearing from people who want to use a Kubernetes operator…&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;existing-operators&quot;&gt;Existing operators&lt;/h2&gt;

&lt;p&gt;We know that various companies have Kubernetes operators developed internally,
and we also know that open source ones exist, for example:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/stackabletech/trino-operator&quot;&gt;trino-operator&lt;/a&gt; from
Stackable with integration in
&lt;a href=&quot;https://github.com/stackabletech/trino-lb&quot;&gt;trino-lb&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://charmhub.io/trino-k8s&quot;&gt;Charmed Trino K8s Operator&lt;/a&gt; from Canonical&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ideally these separate efforts can combine their work and create a great
operator in the Trino project that is closely aligned with Trino itself, and
also suitable for future integration with Trino Gateway. In fact, Trino
Gateway is a good example where different parties came together and considerably
innovated together. Hopefully we can achieve the same with the operator. It can
still be expandable and modular to suit specific needs on different
platforms and for different users.&lt;/p&gt;

&lt;p&gt;We also know that this is &lt;a href=&quot;https://github.com/trinodb/trino/issues/396&quot;&gt;a long-standing community wish from the
issue&lt;/a&gt; and various discussions with
users.&lt;/p&gt;

&lt;h2 id=&quot;discussing-next-steps&quot;&gt;Discussing next steps&lt;/h2&gt;

&lt;p&gt;However, there are some complications, such as the choice of programming
language and the commitment to help within the Trino project as subproject
maintainers. We kicked off some of these discussions in the past at Trino
contributor meetings, and hope that now is a good time to continue.&lt;/p&gt;

&lt;p&gt;To that end we are arranging a community meeting:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Virtual video call&lt;/li&gt;
  &lt;li&gt;30th of October 2024&lt;/li&gt;
  &lt;li&gt;8:00 PDT / 11:00 EDT / 15:00 GMT / 16:00 CET&lt;/li&gt;
  &lt;li&gt;Invite available from Manfred on Trino Slack or via email:&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;mailto:manfred@starburst.io?subject=trino-k8s-operator&quot;&gt;
        Tell Manfred you want to join
    &lt;/a&gt;
&lt;/div&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;We will also post connection details on the #kubernetes channel and we are
collecting related discussion points on
&lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-kubernetes-operator-discussion-30-oct-2024&quot;&gt;our contributor meeting page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Looking forward to a great discussion.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Martin Traverso</name>
        </author>
      

      <summary>Trino is deployed everywhere – on-premises, in private data centers, in the cloud with hosting providers, on bare metal servers, on virtual machines, and with containers. Among all these deployment options, a Kubernetes-based platform running containers has emerged as the most widely used approach. The Trino project caters for this usage with our container images for every release and our Helm chart. However, we keep hearing from people who want to use a Kubernetes operator…</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/images/logos/kubernetes.png" />
      
    </entry>
  
    <entry>
      <title>SQL basecamps before Trino Summit</title>
      <link href="https://trino.io/blog/2024/10/07/sql-basecamps.html" rel="alternate" type="text/html" title="SQL basecamps before Trino Summit" />
      <published>2024-10-07T00:00:00+00:00</published>
      <updated>2024-10-07T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/10/07/sql-basecamps</id>
      <content type="html" xml:base="https://trino.io/blog/2024/10/07/sql-basecamps.html">&lt;p&gt;Later in December your knowledge of our Trino SQL query engine will certainly
peak again at &lt;a href=&quot;/blog/2024/07/11/trino-summit-2024-call-for-speakers.html&quot;&gt;Trino Summit 2024&lt;/a&gt;. To reach those heights and
absorb all there is to learn at Trino Summit, you need to get ready.&lt;/p&gt;

&lt;p&gt;That is why I teamed up with our &lt;a href=&quot;/development/roles#benevolent-dictators-for-life-&quot;&gt;Trino creators and
BDFLs&lt;/a&gt; –
Martin Traverso, Dain Sundstrom, and David Phillips. We aim to be your coaches
and trainers to get you ready and get to the summit without the need for oxygen
masks and sherpas. Join us for the &lt;strong&gt;“SQL basecamps before Trino Summit”&lt;/strong&gt;,
where we expand on our &lt;a href=&quot;https://www.youtube.com/watch?v=SnvSBYhRZLg&amp;amp;list=PLFnr63che7wYzZoo5yyEF5R1QrOH6VRq3&quot;&gt;past SQL training
series&lt;/a&gt;
with two new episodes.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;https://www.starburst.io/info/sql-basecamps-before-trino-summit/?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=NORAM-FY25-Q4-SQL-Basecamps-Before-Trino-Summit&amp;amp;utm_content=blog-1&quot;&gt;
        Register now
    &lt;/a&gt;
&lt;/div&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Both planned sessions provide a high-level overview and some practical tips and
tricks over the course of an hour. Each session concludes with an open
question-and-answer segment with the speakers.&lt;/p&gt;

&lt;h2 id=&quot;moving-supplies&quot;&gt;Moving supplies&lt;/h2&gt;

&lt;p&gt;In the first episode &lt;strong&gt;SQL basecamp 1 – Moving supplies&lt;/strong&gt; David and Dain will
help me provide an overview of the wide range of possibilities when it comes to
moving data to Trino and moving data with Trino.&lt;/p&gt;

&lt;p&gt;We specifically look at the strengths of Trino for running your data lakehouse
and migrating to it from legacy data lakes or other systems. SQL skills
discussed include tips for creating schemas and tables, adding and updating
data, and inspecting metadata. We talk about table procedures for data
management and also cover some operational aspects. For example, we talk about
the right configuration in your catalogs for your object storage, specifically
the new file system support in Trino.&lt;/p&gt;

&lt;h2 id=&quot;getting-ready-to-summit&quot;&gt;Getting ready to summit&lt;/h2&gt;

&lt;p&gt;The second episode &lt;strong&gt;SQL Basecamp 2 – Getting ready to summit&lt;/strong&gt; builds on the
foundation established in episode 1. Data has moved into the lakehouse, powered
by Trino, and more data is added and changed as part of normal operation. In
this episode Martin and I look at maintaining the data in a healthy state
and explore some tips and tricks for querying data. For example, we look at data
management with procedures, analyzing data with window functions, and examine
more complex structural data.&lt;/p&gt;

&lt;h2 id=&quot;what-do-want-to-learn&quot;&gt;What do you want to learn?&lt;/h2&gt;

&lt;p&gt;So there you have it - enough reason to register. And if not, we can do better:
both sessions are aimed at all of you out there using Trino, and we are ready to
discuss your questions during class. More importantly though, I would also love
to hear your suggestions for these and other topics about SQL and Trino. We can
adjust this series, figure out a session for Trino Summit, or bring another SQL
training series to you next year.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;mailto:manfred@starburst.io?subject=SQL%20basecamp%20idea&quot;&gt;
        Submit an idea to Manfred
    &lt;/a&gt;
&lt;/div&gt;

&lt;h2 id=&quot;trino-summit-needs-you&quot;&gt;Trino Summit needs you!&lt;/h2&gt;

&lt;p&gt;Now with all that in mind, what are you waiting for? Get ready to learn more
about SQL with Trino in the series and at Trino Summit.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;https://www.starburst.io/info/sql-basecamps-before-trino-summit/?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=NORAM-FY25-Q4-SQL-Basecamps-Before-Trino-Summit&amp;amp;utm_content=blog-1&quot;&gt;
        I am convinced - register now
    &lt;/a&gt;
&lt;/div&gt;

&lt;p&gt;And of course, we are also interested in your 
&lt;a href=&quot;https://sessionize.com/trino-summit-2024&quot;&gt;speaker proposals&lt;/a&gt; and 
&lt;a href=&quot;mailto:events@starburstdata.com?subject=Sponsor%20Trino%20Summit%202024&quot;&gt;sponsorships&lt;/a&gt;
for Trino Summit to make it an awesome event for everyone again.&lt;/p&gt;

&lt;p&gt;See you soon,&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Later in December your knowledge of our Trino SQL query engine will certainly peak again at Trino Summit 2024. To reach those heights and absorb all there is to learn at Trino Summit, you need to get ready. That is why I teamed up with our Trino creators and BDFLs – Martin Traverso, Dain Sundstrom, and David Phillips. We aim to be your coaches and trainers to get you ready and get to the summit without the need for oxygen masks and sherpas. Join us for the “SQL basecamps before Trino Summit”, where we expand on our past SQL training series with two new episodes. Register now</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2024/sql-basecamps-2024.png" />
      
    </entry>
  
    <entry>
      <title>23 is a go, keeping pace with Java</title>
      <link href="https://trino.io/blog/2024/09/17/java-23.html" rel="alternate" type="text/html" title="23 is a go, keeping pace with Java" />
      <published>2024-09-17T00:00:00+00:00</published>
      <updated>2024-09-17T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/09/17/java-23</id>
      <content type="html" xml:base="https://trino.io/blog/2024/09/17/java-23.html">&lt;p&gt;Only about ten Trino releases or six months ago, we released &lt;a href=&quot;https://trino.io/docs/current/release/release-447.html&quot;&gt;Trino
447&lt;/a&gt; with the requirement to
use Java 22. In recent releases we started to take more and more advantage of
features that are only available with that upgrade. We made some big steps in
terms of performance and talked about some of those performance
enhancements around aircompressor in the recent &lt;a href=&quot;https://trino.io/episodes/65.html&quot;&gt;Trino Community Broadcast
65&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The Java community runs its release processes on a very predictable schedule -
March and September mean new Java releases. This time it’s Java 23, and
Trino will not be left behind. We are upgrading to &lt;a href=&quot;https://github.com/trinodb/trino/issues/21316&quot;&gt;use and require Java
23&lt;/a&gt; soon!&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;background-and-motivation&quot;&gt;Background and motivation&lt;/h2&gt;

&lt;p&gt;While the new features and improvements in Java 23 are not as impactful as in
Java 22, we still need to keep pace to take advantage of the improvements and
avoid any problems in the future. Here are the Java Enhancement Proposals that
are &lt;a href=&quot;https://openjdk.org/projects/jdk/23/&quot;&gt;included with Java 23&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.org/jeps/455&quot;&gt;JEP 455: Primitive Types in Patterns, instanceof, and switch (Preview)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.org/jeps/466&quot;&gt;JEP 466: Class-File API (Second Preview)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.org/jeps/467&quot;&gt;JEP 467: Markdown Documentation Comments&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.org/jeps/469&quot;&gt;JEP 469: Vector API (Eighth Incubator)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.org/jeps/473&quot;&gt;JEP 473: Stream Gatherers (Second Preview)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.org/jeps/471&quot;&gt;JEP 471: Deprecate the Memory-Access Methods in sun.misc.Unsafe for Removal&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.org/jeps/474&quot;&gt;JEP 474: ZGC: Generational Mode by Default&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.org/jeps/476&quot;&gt;JEP 476: Module Import Declarations (Preview)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.org/jeps/477&quot;&gt;JEP 477: Implicitly Declared Classes and Instance Main Methods (Third Preview)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.org/jeps/480&quot;&gt;JEP 480: Structured Concurrency (Third Preview)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.org/jeps/481&quot;&gt;JEP 481: Scoped Values (Third Preview)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.org/jeps/482&quot;&gt;JEP 482: Flexible Constructor Bodies (Second Preview)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more you can check out the &lt;a href=&quot;https://www.youtube.com/watch?v=ymuv5aUzWu0&quot;&gt;short summary
video&lt;/a&gt; or the &lt;a href=&quot;https://www.youtube.com/watch?v=QG9xKpgwOI4&quot;&gt;three hour long
launch stream&lt;/a&gt;. The &lt;a href=&quot;https://www.oracle.com/news/announcement/oracle-releases-java-23-2024-09-17/&quot;&gt;Oracle press
release&lt;/a&gt;
as well as the &lt;a href=&quot;https://blogs.oracle.com/java/post/the-arrival-of-java-23&quot;&gt;community
announcement&lt;/a&gt; also
bring you a wealth of further information.&lt;/p&gt;

&lt;p&gt;Overall our reasoning is unchanged from the &lt;a href=&quot;/blog/2023/11/03/java-21.html&quot;&gt;upgrade to 21&lt;/a&gt; and the &lt;a href=&quot;/blog/2024/03/13/java-22.html&quot;&gt;upgrade to 22&lt;/a&gt;.
So what are we specifically doing now?&lt;/p&gt;

&lt;h2 id=&quot;current-status-and-plans&quot;&gt;Current status and plans&lt;/h2&gt;

&lt;p&gt;Early access binaries have been in use in our continuous integration builds for
months. Java 23 launched today and the various JDK distribution binary packages
will become available shortly. We are executing on the same blueprint as last
time:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Wait for &lt;a href=&quot;https://adoptium.net/temurin/releases/&quot;&gt;Eclipse Temurin&lt;/a&gt; binaries.&lt;/li&gt;
  &lt;li&gt;Ensure everything works with Java 23.&lt;/li&gt;
  &lt;li&gt;Change the container image to use Java 23.&lt;/li&gt;
  &lt;li&gt;Cut a release and get community feedback from testing with the container.&lt;/li&gt;
  &lt;li&gt;Adjust to any feedback and available improvements for a few releases.&lt;/li&gt;
  &lt;li&gt;Switch the requirement for build and runtime to Java 23.&lt;/li&gt;
  &lt;li&gt;Cut another release and celebrate.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Timing on all the work depends on obstacles we find on the way and how we
progress with removing them. We use the &lt;a href=&quot;https://github.com/trinodb/trino/issues/21316&quot;&gt;Java 23 tracking
issue&lt;/a&gt; and the linked issues and
pull requests to manage progress, discuss next steps, and work with the
community.&lt;/p&gt;

&lt;p&gt;Feel free to chime in there, find us on the &lt;a href=&quot;https://trinodb.slack.com/messages/C07ABNN828M&quot;&gt;#core-dev
channel&lt;/a&gt; on the &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;Trino
community Slack&lt;/a&gt; or join us for a &lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings&quot;&gt;contributor
call&lt;/a&gt;.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Mateusz Gajewski</name>
        </author>
      

      <summary>Only about ten Trino releases or six months ago, we released Trino 447 with the requirement to use Java 22. In recent releases we started to take more and more advantage of features that are only available with that upgrade. We made some big steps in terms of performance and talked about some of those performance enhancements around aircompressor in the recent Trino Community Broadcast 65. The Java community runs its release processes on a very predictable schedule - March and September mean new Java releases. This time it’s Java 23, and Trino will not be left behind. We are upgrading to use and require Java 23 soon!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/images/logos/java-duke-23.png" />
      
    </entry>
  
    <entry>
      <title>Announcing Trino Summit 2024</title>
      <link href="https://trino.io/blog/2024/07/11/trino-summit-2024-call-for-speakers.html" rel="alternate" type="text/html" title="Announcing Trino Summit 2024" />
      <published>2024-07-11T00:00:00+00:00</published>
      <updated>2024-07-11T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/07/11/trino-summit-2024-call-for-speakers</id>
      <content type="html" xml:base="https://trino.io/blog/2024/07/11/trino-summit-2024-call-for-speakers.html">&lt;p&gt;Fresh off the heels of &lt;a href=&quot;/blog/2024/06/24/trino-fest-recap.html&quot;&gt;Trino Fest 2024&lt;/a&gt;, where Commander Bun Bun was busy meeting the Trino community in-person,
we’re already looking forward to another, bigger event to round out the year in
Trino. For those who’ve been here a while, you know that can only mean one
thing: Trino Summit 2024. Much like last year, it will be a two-day, fully
virtual event, hosting a wide range of talks covering all things Trino on the
11th and 12th of December. Read on for more info, or if you’re already
convinced…&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;https://www.starburst.io/info/trino-summit-2024/?utm_medium=trino&amp;amp;utm_source=website&amp;amp;[…]Y25-Q4-CM-Trino-Summit-2024-IMC-Upgrade&amp;amp;utm_content=CFS-Blog&quot;&gt;
        Register to attend!
    &lt;/a&gt;
&lt;/div&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;join-us-online&quot;&gt;Join us online&lt;/h2&gt;

&lt;p&gt;Trino Summit is an event that brings together engineers, analysts, data
scientists, and anyone else interested in using or contributing to Trino. As the
biggest Trino event of the year, we’re excited to bring together professionals
from the big data and analytics community, so they can share experiences and
insights, make connections, and learn from each other.&lt;/p&gt;

&lt;p&gt;The event will be broadcast live, and speakers will be addressing questions
asked in chat, so if you want the full experience, make sure to register and
attend while the talks are happening. Even if you can’t make it, registering
means you’ll be notified when we post videos of all talks to the Trino YouTube
channel after the event, &lt;a href=&quot;https://www.starburst.io/info/trino-summit-2024/?utm_medium=trino&amp;amp;utm_source=website&amp;amp;[…]Y25-Q4-CM-Trino-Summit-2024-IMC-Upgrade&amp;amp;utm_content=CFS-Blog&quot;&gt;so don’t fret - sign up!&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;call-for-speakers&quot;&gt;Call for speakers&lt;/h2&gt;

&lt;p&gt;Interested in speaking? We want to hear from everyone in the Trino community
who has something to share. We are looking for full sessions (about 30 minutes)
and lightning talks (15 minutes). We welcome beginner to highly advanced
submissions for talks that are connected to Trino.&lt;/p&gt;

&lt;p&gt;A two-day event means we’ve got room for everything, so if you’re unsure about
whether to submit a talk, go ahead and do it! We’ll review all submissions, and
we’ll do our best to work with you to turn your talk into a smash hit. Some
possible topics include:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Best practices and use cases&lt;/li&gt;
  &lt;li&gt;Data lake, lakehouse, and data federation architectures&lt;/li&gt;
  &lt;li&gt;Query federation and data migrations&lt;/li&gt;
  &lt;li&gt;Table formats, file formats, and metadata catalogs&lt;/li&gt;
  &lt;li&gt;Optimizations and performance improvements&lt;/li&gt;
  &lt;li&gt;Data engineering, including data cleaning, batch and streaming architectures,
and maintenance&lt;/li&gt;
  &lt;li&gt;Streaming and other data ingestion and pipelines&lt;/li&gt;
  &lt;li&gt;Data science workflows and analytics&lt;/li&gt;
  &lt;li&gt;SQL analytics, business intelligence, dashboarding and other visualizations&lt;/li&gt;
  &lt;li&gt;Data governance and security&lt;/li&gt;
  &lt;li&gt;Writing advanced SQL queries and pipelines&lt;/li&gt;
  &lt;li&gt;Help for Trino deployment on-premise and in the cloud&lt;/li&gt;
  &lt;li&gt;Developing custom connectors and other plugins&lt;/li&gt;
  &lt;li&gt;Contributing to Trino&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Want to speak?&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;https://sessionize.com/trino-summit-2024&quot;&gt;
        Submit a talk!
    &lt;/a&gt;
&lt;/div&gt;

&lt;h2 id=&quot;sponsor-trino-summit&quot;&gt;Sponsor Trino Summit&lt;/h2&gt;

&lt;p&gt;Starburst is the organizing sponsor of the event, but to make Trino Summit a
smashing success, they’re excited and interested in collaborating with other
organizations within the community. If you are interested in sponsoring, email
&lt;a href=&quot;mailto:events@starburstdata.com&quot;&gt;events@starburstdata.com&lt;/a&gt; for information.&lt;/p&gt;

&lt;p&gt;And regardless of whether you’re planning on attending, speaking, or sponsoring,
we look forward to seeing you soon!&lt;/p&gt;</content>

      
        <author>
          <name>Cole Bowden, Manfred Moser, and Monica Miller</name>
        </author>
      

      <summary>Fresh off the heels of Trino Fest 2024, where Commander Bun Bun was busy meeting the Trino community in-person, we’re already looking forward to another, bigger event to round out the year in Trino. For those who’ve been here a while, you know that can only mean one thing: Trino Summit 2024. Much like last year, it will be a two-day, fully virtual event, hosting a wide range of talks covering all things Trino on the 11th and 12th of December. Read on for more info, or if you’re already convinced… Register to attend!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2024/summit-logo.png" />
      
    </entry>
  
    <entry>
      <title>Trino Fest 2024 recap</title>
      <link href="https://trino.io/blog/2024/06/24/trino-fest-recap.html" rel="alternate" type="text/html" title="Trino Fest 2024 recap" />
      <published>2024-06-24T00:00:00+00:00</published>
      <updated>2024-06-24T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/06/24/trino-fest-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2024/06/24/trino-fest-recap.html">&lt;p&gt;Trino Fest 2024 is successfully in the books! While over 100 enthusiastic
members of the community gathered in Boston, over 650 virtual attendees joined
us worldwide to learn from our expert speakers as they discussed topics such as
table formats, enhancements and optimizations, and use cases with Trino both
large and small. And now it is your chance to revisit the presentations or catch
up on everything you missed.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;impressions&quot;&gt;Impressions&lt;/h2&gt;

&lt;p&gt;Judging from early results from attendee and speaker feedback, everyone enjoyed
the event. When asked which sessions they liked, the audience gave answers like&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;They were all very insightful.&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;All of it, but especially the realtime demos to see speed difference on query
optimization.&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;and &lt;em&gt;All of them, nothing was missed!&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Just like some attendees, our speakers travelled from Europe, Asia, and other
places, and enjoyed the event.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Thanks for organizing the awesome event and inviting me for the talk!&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Was great to finally meet you and we had a great time at Trino Fest!&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Thanks for a great event last week. It was a pleasure to meet you all.&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Many of us also &lt;a href=&quot;https://www.linkedin.com/posts/k-shreya-s_trinofest2024-bigdata-analytics-activity-7209236269774585857-p8-e?utm_source=share&amp;amp;utm_medium=member_desktop&quot;&gt;met Commander Bun Bun&lt;/a&gt;,
and &lt;a href=&quot;https://www.youtube.com/watch?v=4jPYpU9Jrrw&quot;&gt;we sent greetings to the remote audience as
well&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://trino.io/assets/blog/trino-fest-2024/cbb-manfred.jpg&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The keynote, the sessions, and all the talk in the hallways confirmed that Trino
continues to thrive and expand in usage. Large companies like &lt;a href=&quot;https://trino.io/users.html&quot;&gt;Apple, Microsoft,
LinkedIn, Amazon, and many other users&lt;/a&gt; openly talk
about shipping Trino as part of their products and using it internally as
well. Smaller companies either run Trino themselves or take advantage of
Trino-based products for all their data platform needs. Our sessions for Trino
Fest offered something to learn for everyone.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://trino.io/assets/blog/trino-fest-2024/hallway-chat.png&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;sponsors&quot;&gt;Sponsors&lt;/h2&gt;

&lt;p&gt;Bringing together the event was only possible thanks to the great Trino events
team around &lt;a href=&quot;https://www.linkedin.com/in/anna-schibli-418692172/&quot;&gt;Anna Schibli&lt;/a&gt;
at our main sponsor Starburst, and the assistance from all our other sponsors. A
heartfelt thank you from Commander Bun Bun and all of us goes out to you!&lt;/p&gt;

&lt;div class=&quot;container&quot;&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.starburst.io/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/starburst-small.png&quot; title=&quot;Starburst, event host and organizer&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.onehouse.ai/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/onehouse-small.png&quot; title=&quot;Onehouse, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.startree.ai/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/startree-small.png&quot; title=&quot;Startree, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.alluxio.io/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/alluxio-small.png&quot; title=&quot;Alluxio, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://cloudinary.com/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/cloudinary-small.png&quot; title=&quot;Cloudinary, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.upsolver.com/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/upsolver-small.png&quot; title=&quot;Upsolver, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;h2 id=&quot;sessions&quot;&gt;Sessions&lt;/h2&gt;

&lt;p&gt;Now, here is what you are really looking for: all the talks and speakers, with
short recaps, slide decks, video recordings, and the Q&amp;amp;A sessions that
followed. Enjoy!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What’s new in Trino this summer&lt;/strong&gt;
&lt;br /&gt;Presented by Martin Traverso from
&lt;a href=&quot;https://www.starburst.io&quot; target=&quot;_blank&quot;&gt;Starburst&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Martin recapped everything that’s happened in Trino over the last six months,
taking a look at the biggest new features and how Trino development is going
better than ever. He also gave a sneak peek at what we can expect soon in Trino.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=mk3n0_tAdZY&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-fest-2024/keynote.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Reducing query cost and query runtimes of Trino powered analytics platforms&lt;/strong&gt;
&lt;br /&gt;Presented by Jonas Irgens Kylling from
&lt;a href=&quot;https://dune.com/&quot; target=&quot;_blank&quot;&gt;Dune&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Jonas gave a detailed talk about how Dune has improved their performance of
Trino with a few key tweaks. That includes leveraging caching with Alluxio,
advanced cluster management, and storing, sampling, and filtering query results.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=11yhPXIXiBY&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-fest-2024/dune.pdf&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Enhancing Trino’s query performance and data management with Hudi: innovations and future&lt;/strong&gt;
&lt;br /&gt;Presented by Ethan Guo from
&lt;a href=&quot;https://www.onehouse.ai/&quot; target=&quot;_blank&quot;&gt;Onehouse&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Ethan gave a look into development on Hudi and Trino’s Hudi connector,
explaining multi-modal indexing and how it can improve query performance. He
also gave an overview of the roadmap and future of the connector.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=JMzS2BbeK0E&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-fest-2024/onehouse.pdf&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Trino Engineering @ Microsoft&lt;/strong&gt;
&lt;br /&gt;Presented by George Fisher and Ishan Patwa from
&lt;a href=&quot;https://www.microsoft.com/&quot; target=&quot;_blank&quot;&gt;Microsoft&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;George and Ishan gave a deep dive into what’s been going on with Microsoft’s
deployment and management of Trino. This included clients and integrations,
result caching, a sharded SQL connector, deep debugging and monitoring, and
seamless security integration with Azure.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=t7ndqYUhKSA&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Enhancing data governance in Trino with the OpenLineage integration&lt;/strong&gt;
&lt;br /&gt;Presented by Alok Kumar Prusty from
&lt;a href=&quot;https://www.apple.com/&quot; target=&quot;_blank&quot;&gt;Apple&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Alok’s lightning talk is all about how Apple deployed OpenLineage, an open
framework for data lineage collection and analysis, and built a Trino plugin to
publish OpenLineage compliant events that can be viewed and monitored.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=A7hj1M7IYj8&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Best practices and insights when migrating to Apache Iceberg for data engineers&lt;/strong&gt;
&lt;br /&gt;Presented by Amit Gilad from
&lt;a href=&quot;https://cloudinary.com/&quot; target=&quot;_blank&quot;&gt;Cloudinary&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Amit shared how Cloudinary expanded their data lake to use Apache Iceberg. He
demonstrated how moving from Snowflake to an open table format allowed them to
reduce storage costs and leverage different query and processing engines to run
more powerful analytics at scale.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=dKQ2zShNlyQ&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-fest-2024/cloudinary.pdf&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Trino query intelligence: insights, recommendations, and predictions&lt;/strong&gt;
&lt;br /&gt;Presented by Marton Bod from &lt;a href=&quot;https://www.apple.com/&quot; target=&quot;_blank&quot;&gt;Apple&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Marton’s lightning talk explored how Apple has monitored and stored metadata for
every Trino query execution, then used that data for real-time cluster
dashboarding, self-service troubleshooting, and automatic generation of
recommendations for users.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=K3iSXOJNaSQ&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;The open source journey of the Trino Delta Lake Connector&lt;/strong&gt;
&lt;br /&gt;Presented by Marius Grama from
&lt;a href=&quot;https://www.starburst.io&quot; target=&quot;_blank&quot;&gt;Starburst&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Marius went into a deep dive on all the work and collaboration that’s gone into
making the Delta Lake connector in Trino a robust, first-class connector. Casual
discussions, engineers working together, GitHub issues filed by the community,
and innovative contributions have all come together, and Marius’ talk shows why
an open source community is so powerful.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=mPfRYdvDcMo&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-fest-2024/delta-lake.pdf&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Tiny Trino; new perspectives in small data&lt;/strong&gt;
&lt;br /&gt;Presented by Ben Jeter and Thomas Zugibe from
&lt;a href=&quot;https://www.executivehomes.com/&quot; target=&quot;_blank&quot;&gt;Executive Homes&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Ben and Tommy explore how Executive Homes uses Trino’s robust suite of
integrations to handle data at a small scale. Instead of petabytes, how about a
handful of gigabytes in several different systems? It’s something that Trino is
well-equipped to handle thanks to how well-supported it is in the data
ecosystem, and they explain why.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=ZcY9LJDdB6Y&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-fest-2024/executive-homes.pdf&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Bridging the divide: running Trino SQL on a vector data lake powered by Lance&lt;/strong&gt;
&lt;br /&gt;Presented by Lei Xu from &lt;a href=&quot;https://lancedb.com/&quot; target=&quot;_blank&quot;&gt;LanceDB&lt;/a&gt;
and Noah Shpak from &lt;a href=&quot;https://character.ai/&quot; target=&quot;_blank&quot;&gt;Character.ai&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Lei and Noah give an overview of LanceDB, how it works, and what makes it a
great database for multimodal AI. Then they dive into a Trino connector for
Lance, and explore how Trino slots into Character.AI’s workload to blend
analytics with training and generating new models.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=jmOsVbGfon0&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-fest-2024/lance-characterai.pdf&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;How FourKites runs a scalable and cost-effective log analytics solution to
handle petabytes of logs&lt;/strong&gt;
&lt;br /&gt;Presented by Arpit Garg from
&lt;a href=&quot;https://www.fourkites.com/&quot; target=&quot;_blank&quot;&gt;FourKites&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;With nearly a petabyte of logs being managed at FourKites, it shouldn’t be a
huge surprise that they’ve turned to Trino to understand and analyze
them. Arpit discusses how they’ve scaled log ingestion, strategically used S3
with Parquet to minimize storage costs, transformed and extracted those logs at
scale, and leveraged Trino to search and explore the datasets with Superset as a
frontend for visualization.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=xdCZBQJt-0g&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-fest-2024/fourkites.pdf&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Observing Trino&lt;/strong&gt;
&lt;br /&gt;Presented by Matt Stephenson from
&lt;a href=&quot;https://www.starburst.io&quot; target=&quot;_blank&quot;&gt;Starburst&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Starburst has built a comprehensive observability platform around Trino to
better serve its users and customers. Matt explored all the components of it,
including how to integrate with Jaeger, Prometheus, and ELK.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=v7p72Ggcc5I&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-fest-2024/observing-trino.pdf&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Accelerate Performance at Scale: Best Practices for Trino with Amazon S3&lt;/strong&gt;
&lt;br /&gt;Presented by Dai Ozaki from &lt;a href=&quot;https://aws.amazon.com/&quot; target=&quot;_blank&quot;&gt;AWS&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Dai’s talk explores best practices to get the most out of using Trino in
conjunction with Amazon S3. He discusses partitioning, scaling workloads,
reducing latency, and resolving common bottlenecks, providing valuable insights
for anyone trying to manage and deploy Trino with S3.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=cjUUcHlUKxQ&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-fest-2024/aws-s3.pdf&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;whats-next&quot;&gt;What’s next&lt;/h2&gt;

&lt;p&gt;While you are busy catching up, we are still working hard on a recap of the
Trino Contributor Congregation. We also had a lot of great conversations that
led to follow-up action items such as more pull requests to review, new
contributors to onboard, and more projects to work on.&lt;/p&gt;

&lt;p&gt;Make sure to &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;join the community on Slack&lt;/a&gt; to learn
more in the coming weeks.&lt;/p&gt;

&lt;p&gt;Oh, and one last thing…&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;https://www.starburst.io/info/trino-summit-2024/?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=NORAM-FY25-Q4-CM-Trino-Summit-2024-IMC-Upgrade&amp;amp;utm_content=Trino-Fest-Blog-Recap&quot;&gt;
        Trino Summit 2024 registration is open
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;See you soon,&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred, Cole, and Monica&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Cole Bowden, Monica Miller</name>
        </author>
      

      <summary>Trino Fest 2024 is successfully in the books! While over 100 enthusiastic members of the community gathered in Boston, over 650 virtual attendees joined us worldwide to learn from our expert speakers as they discussed topics such as table formats, enhancements and optimizations, and use cases with Trino both large and small. And now it is your chance to revisit the presentations or catch up on everything you missed.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2024/trino-fest-talk.jpg" />
      
    </entry>
  
    <entry>
      <title>One busy week to go before Trino Fest 2024</title>
      <link href="https://trino.io/blog/2024/06/06/trino-fest-last-call.html" rel="alternate" type="text/html" title="One busy week to go before Trino Fest 2024" />
      <published>2024-06-06T00:00:00+00:00</published>
      <updated>2024-06-06T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/06/06/trino-fest-last-call</id>
      <content type="html" xml:base="https://trino.io/blog/2024/06/06/trino-fest-last-call.html">&lt;p&gt;This week has surely started off with a big bang and another boom in the data
platform world. Snowflake &lt;a href=&quot;https://www.snowflake.com/blog/introducing-polaris-catalog/&quot;&gt;introduced the open source Polaris
catalog&lt;/a&gt; as an
implementation of the Iceberg REST catalog specification. And Databricks, the
main driver of the Delta Lake table format, &lt;a href=&quot;https://www.databricks.com/blog/databricks-tabular&quot;&gt;announced their acquisition of
Tabular&lt;/a&gt;, a main driver in
the Apache Iceberg community.&lt;/p&gt;

&lt;p&gt;Interestingly enough, Trino is in the middle of all this with great support for
Delta Lake, Hudi, Iceberg, and also the Iceberg REST catalog. And if all that
interoperability with Trino is not enough reason to join us next week at Trino
Fest 2024, I have some more ideas for you to consider.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;reasons-to-attend-trino-fest&quot;&gt;Reasons to attend Trino Fest&lt;/h2&gt;

&lt;p&gt;Trino Fest is happening next week on the 13th of June, and following are all the
reasons I can think of why you should tune in.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The event is free for all attendees. It is available as an in-person event in
Boston and for virtual attendance across the rest of the world.&lt;/li&gt;
  &lt;li&gt;You can learn about real world experience with Trino, Delta Lake, Iceberg,
Hudi, and many &lt;a href=&quot;https://trino.io/ecosystem/index.html&quot;&gt;other data sources, clients, and add-ons&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Many Trino friends, users, and contributors from around the world and
companies like Amazon, Apple, Bloomberg, character.ai, Dune, LanceDB,
Microsoft, Onehouse, and Starburst are going to attend and present.&lt;/li&gt;
  &lt;li&gt;Monica Miller and Manfred Moser will guide you through the event with the help
of the awesome Starburst Trino events team.&lt;/li&gt;
  &lt;li&gt;In-person attendees might just meet our mascot, Commander Bun Bun.&lt;/li&gt;
  &lt;li&gt;On the following day, the &lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-congregation-14-june-2024&quot;&gt;Trino Contributor
Congregation&lt;/a&gt;
will dive super deep into technical details and collaborative efforts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Convinced yet, or still wondering? In either case, go and &lt;a href=&quot;http://www.starburst.io/info/trino-fest-2024?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=Global-FY25-Q2-EV-Trino-Fest-2024&amp;amp;utm_content=Blog-3&quot;&gt;have a look at the
detailed agenda and then register to attend&lt;/a&gt;.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;http://www.starburst.io/info/trino-fest-2024?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=Global-FY25-Q2-EV-Trino-Fest-2024&amp;amp;utm_content=Blog-3&quot;&gt;
        Register now!
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;And last, but not least, thank you to our sponsors for making this event happen…&lt;/p&gt;

&lt;div class=&quot;container&quot;&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.starburst.io/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/starburst-small.png&quot; title=&quot;Starburst, event host and organizer&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.onehouse.ai/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/onehouse-small.png&quot; title=&quot;Onehouse, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.startree.ai/&quot; target=&quot;_blank&quot;&gt;
&lt;img src=&quot;https://trino.io/assets/images/logos/startree-small.png&quot; title=&quot;StarTree, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.alluxio.io/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/alluxio-small.png&quot; title=&quot;Alluxio, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://cloudinary.com/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/cloudinary-small.png&quot; title=&quot;Cloudinary, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.upsolver.com/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/upsolver-small.png&quot; title=&quot;Upsolver, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

<summary>This week has surely started off with a big bang and another boom in the data platform world. Snowflake introduced the open source Polaris catalog as an implementation of the Iceberg REST catalog specification. And Databricks, the main driver of the Delta Lake table format, announced their acquisition of Tabular, a main driver in the Apache Iceberg community. Interestingly enough, Trino is in the middle of all this with great support for Delta Lake, Hudi, Iceberg, and also the Iceberg REST catalog. And if all that interoperability with Trino is not enough reason to join us next week at Trino Fest 2024, I have some more ideas for you to consider.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2024/announcement-banner.png" />
      
    </entry>
  
    <entry>
      <title>Big names round out the Trino Fest 2024 lineup</title>
      <link href="https://trino.io/blog/2024/05/08/trino-fest-lineup-finalized.html" rel="alternate" type="text/html" title="Big names round out the Trino Fest 2024 lineup" />
      <published>2024-05-08T00:00:00+00:00</published>
      <updated>2024-05-08T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/05/08/trino-fest-lineup-finalized</id>
      <content type="html" xml:base="https://trino.io/blog/2024/05/08/trino-fest-lineup-finalized.html">&lt;p&gt;We gave
&lt;a href=&quot;/blog/2024/04/15/trino-fest-2024-approaches.html&quot;&gt;a sneak peek of the Trino Fest lineup a month ago&lt;/a&gt;,
and we’re excited to now bring you the full lineup for the event. We’ve got some
major names being added, including Amazon, Microsoft, and another talk from
Apple. With FourKites and a joint talk with LanceDB and CharacterAI also added
to the schedule, we’re excited to present the
&lt;a href=&quot;https://www.starburst.io/info/trino-fest-2024/#agenda&quot;&gt;full lineup for Trino Fest 2024&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Trino Fest is barely a month away on the 13th of June, and whether you want to
attend live in Boston or tune in virtually, this is a reminder that you
should &lt;a href=&quot;http://www.starburst.io/info/trino-fest-2024?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=Global-FY25-Q2-EV-Trino-Fest-2024&amp;amp;utm_content=Blog-3&quot;&gt;register to attend!&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;trino-fest-the-contributor-congregation-and-logistics&quot;&gt;Trino Fest, the contributor congregation, and logistics&lt;/h2&gt;

&lt;p&gt;In case you missed
&lt;a href=&quot;/blog/2024/02/20/announcing-trino-fest-2024.html&quot;&gt;our announcement of Trino Fest&lt;/a&gt;,
it’s a hybrid event taking place from 9am-5pm Eastern Time on June 13th. It’ll
feature talks from a wide range of Trino users and contributors, with topics
ranging from use cases, migrations, cluster management and administration,
to lakehouse integrations and more. If you want to join us in-person, we’ll be at
the Hyatt Regency Boston. There will also be a meeting for Trino contributors
the day after the event at the Starburst office in Boston from 9am-1pm, and if
you’d be interested in attending that, please reach out to me (Cole Bowden)
or Manfred Moser on the &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;Trino Slack&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you still haven’t booked a hotel, we also have a discounted rate at the Hyatt
for the event to make life easy - whether that’s waking up and heading
downstairs for the start of the event, or being able to quickly duck back to
your room for a 30-minute meeting without missing too much. One link will take
you to a booking for just the night before the event, while the other allows
you to optionally book an extra night prior or include the night after Trino
Fest so you can stick around for the contributor congregation or explore Boston.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.hyatt.com/en-US/group-booking/BOSTO/G-STA4&quot;&gt;
        Book your hotel for June 12-13
    &lt;/a&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.hyatt.com/en-US/group-booking/BOSTO/G-STA3&quot;&gt;
        Book your hotel for June 11-14
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;h2 id=&quot;and-dont-forget-those-additional-speakers&quot;&gt;And don’t forget those additional speakers&lt;/h2&gt;

&lt;p&gt;George Fisher, Ishan Patwa, and Oleg Savin will be diving deep into how Trino is
leveraged at Microsoft. While we’ve previously had LinkedIn at Trino events,
this is the first time the Trino community is getting to hear about the scale of
Trino within Microsoft proper, and with their plans to cover clients,
integrations, result caching, a sharded connector, visualization for monitoring,
and AKS deployment with Azure, there will be a lot to learn.&lt;/p&gt;

&lt;p&gt;Alok Kumar Prusty and Amogh Margoor from Apple will be joining the lineup to
discuss Trino query intelligence. With the mountain of query metadata, the team
at Apple has been able to better understand Trino usage and use that knowledge
to create impactful improvements for their Trino users. With dashboarding,
self-service troubleshooting, and automatic recommendations for query
optimization, Alok and Amogh will detail how a world-class engineering team can
take an awesome tool like Trino and make it even better for the end users.&lt;/p&gt;

&lt;p&gt;Also relatively new to the Trino community is discussing AI workloads. Lei Xu
from &lt;a href=&quot;https://lancedb.com/&quot;&gt;LanceDB&lt;/a&gt; and Noah Shpak from
&lt;a href=&quot;https://character.ai/&quot;&gt;character.ai&lt;/a&gt; will be highlighting exactly that,
using Trino as an analytics engine on top of a LanceDB-powered vector data lake.
With AI data so often sitting in a silo, analyzing it with a traditional SQL
workload can be expensive or complicated… but Lei and Noah will be
demonstrating how character.ai’s LanceDB/Trino pairing maintains the power of
both systems while making it easy.&lt;/p&gt;

&lt;p&gt;Dai Ozaki from Amazon will be diving into how to optimize Trino with S3. Given
how many people are using Trino with S3 already, hearing directly from Dai, an
engineer at Amazon, regarding best practices and optimizations should prove
beneficial for a massive chunk of the Trino community. Dai plans on talking
about how Trino and S3 interact, and how that knowledge can be used to get the
most out of your stack and avoid common bottlenecks.&lt;/p&gt;

&lt;p&gt;And last but not least, Arpit Garg from &lt;a href=&quot;https://www.fourkites.com/&quot;&gt;FourKites&lt;/a&gt;
will be discussing utilizing Trino to handle nearly a petabyte of logs.
FourKites is able to ingest massive amounts of logs, use S3 and
Parquet to keep storage costs low, transform and extract logs at scale, and then
use Trino as the engine to query those logs and reference them in context with
other data sets and data stores. Arpit will also touch on using Superset as a
frontend for Trino.&lt;/p&gt;

&lt;p&gt;And keep in mind - all of that is in addition to the talks we’ve already
announced!
&lt;a href=&quot;http://www.starburst.io/info/trino-fest-2024?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=Global-FY25-Q2-EV-Trino-Fest-2024&amp;amp;utm_content=Blog-3&quot;&gt;Register to attend&lt;/a&gt;,
&lt;a href=&quot;https://www.hyatt.com/en-US/group-booking/BOSTO/G-STA3&quot;&gt;book your hotel&lt;/a&gt;, and
the Trino community is looking forward to seeing you there!&lt;/p&gt;

&lt;p&gt;Thank you to our sponsors for making this event happen…&lt;/p&gt;

&lt;div class=&quot;container&quot;&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.starburst.io/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/starburst-small.png&quot; title=&quot;Starburst, event host and organizer&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.onehouse.ai/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/onehouse-small.png&quot; title=&quot;Onehouse, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.startree.ai/&quot; target=&quot;_blank&quot;&gt;
&lt;img src=&quot;https://trino.io/assets/images/logos/startree-small.png&quot; title=&quot;StarTree, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.alluxio.io/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/alluxio-small.png&quot; title=&quot;Alluxio, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://cloudinary.com/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/cloudinary-small.png&quot; title=&quot;Cloudinary, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.upsolver.com/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/upsolver-small.png&quot; title=&quot;Upsolver, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;</content>

      
        <author>
          <name>Cole Bowden</name>
        </author>
      

<summary>We gave a sneak peek of the Trino Fest lineup a month ago, and we’re excited to now bring you the full lineup for the event. We’ve got some major names being added, including Amazon, Microsoft, and another talk from Apple. With FourKites and a joint talk with LanceDB and CharacterAI also added to the schedule, we’re excited to present the full lineup for Trino Fest 2024. Trino Fest is barely a month away on the 13th of June, and whether you want to attend live in Boston or tune in virtually, this is a reminder that you should register to attend!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2024/announcement-banner.png" />
      
    </entry>
  
    <entry>
      <title>A sneak peek of Trino Fest 2024</title>
      <link href="https://trino.io/blog/2024/04/15/trino-fest-2024-approaches.html" rel="alternate" type="text/html" title="A sneak peek of Trino Fest 2024" />
      <published>2024-04-15T00:00:00+00:00</published>
      <updated>2024-04-15T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/04/15/trino-fest-2024-approaches</id>
      <content type="html" xml:base="https://trino.io/blog/2024/04/15/trino-fest-2024-approaches.html">&lt;p&gt;Trino Fest is drawing ever closer. Commander Bun Bun has been hard at work
behind the scenes arranging the schedule and making sure that Trino’s trip to
Boston is going to be a great one. In case you missed it,
&lt;a href=&quot;/blog/2024/02/20/announcing-trino-fest-2024.html&quot;&gt;we announced Trino Fest&lt;/a&gt;
a couple months ago, and if you &lt;em&gt;have&lt;/em&gt; missed it, make sure to go register to
attend! All our speakers will be in person in downtown Boston on the 13th of
June, with plenty of opportunities for networking and a happy hour event at the
end of the day. But if you can’t make the trip to enjoy the lovely New England
summer, we’ll also be live-streaming the event, and you can register to join us
virtually.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;http://www.starburst.io/info/trino-fest-2024?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=Global-FY25-Q2-EV-Trino-Fest-2024&amp;amp;utm_content=Blog-2&quot;&gt;
        Register to attend!
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;Still on the fence, though? Read on for a preview of our speaker lineup and
brief summaries of their talks. Keep in mind this also isn’t the full lineup,
and we’ll follow up soon with the last few talks that round out the schedule.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;a-brief-word-from-our-sponsors&quot;&gt;A brief word from our sponsors…&lt;/h2&gt;

&lt;p&gt;Thank you to our sponsors for making this event happen…&lt;/p&gt;

&lt;div class=&quot;container&quot;&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.starburst.io/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/starburst-small.png&quot; title=&quot;Starburst, event host and organizer&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.onehouse.ai/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/onehouse-small.png&quot; title=&quot;Onehouse, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.startree.ai/&quot; target=&quot;_blank&quot;&gt;
&lt;img src=&quot;https://trino.io/assets/images/logos/startree-small.png&quot; title=&quot;StarTree, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.alluxio.io/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/alluxio-small.png&quot; title=&quot;Alluxio, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://cloudinary.com/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/cloudinary-small.png&quot; title=&quot;Cloudinary, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.upsolver.com/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/upsolver-small.png&quot; title=&quot;Upsolver, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;And now onto what you’re waiting for: a preview of most of the talks coming to
Trino Fest this year!&lt;/p&gt;

&lt;h2 id=&quot;lakehouses&quot;&gt;Lakehouses&lt;/h2&gt;

&lt;p&gt;It’s no secret that using Trino as part of your lakehouse has become one of its
major use cases in the past few years. We’re excited to say that at Trino Fest,
we’ll have representation for each of the modern big three table formats:
Iceberg, Delta Lake, and Hudi.&lt;/p&gt;

&lt;h3 id=&quot;iceberg&quot;&gt;Iceberg&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://iceberg.apache.org/&quot;&gt;Apache Iceberg&lt;/a&gt; will be covered twice: Amogh
Jahagirdar from &lt;a href=&quot;https://tabular.io/&quot;&gt;Tabular&lt;/a&gt; will be diving into the world of
Iceberg views and how they can be leveraged to coordinate across different query
languages and dialects. Amit Gilad from &lt;a href=&quot;https://cloudinary.com/&quot;&gt;Cloudinary&lt;/a&gt;
will be covering the story of migrating out of Snowflake to the wonderful world
of open table formats and Iceberg.&lt;/p&gt;

&lt;h3 id=&quot;delta-lake&quot;&gt;Delta Lake&lt;/h3&gt;

&lt;p&gt;Marius Grama, a Trino contributor at &lt;a href=&quot;https://www.starburst.io/&quot;&gt;Starburst&lt;/a&gt;,
will be going into detail on the history, development, and improvements to the
&lt;a href=&quot;https://delta.io/&quot;&gt;Delta Lake&lt;/a&gt; connector. With
&lt;a href=&quot;/blog/2024/04/11/time-travel-delta-lake.html&quot;&gt;time travel for the Delta Lake connector&lt;/a&gt;
landing in Trino 445, it’s one of the most exciting areas for development in
open source Trino, and there are some interesting stories that Marius is excited
to share with the community.&lt;/p&gt;

&lt;h3 id=&quot;hudi&quot;&gt;Hudi&lt;/h3&gt;

&lt;p&gt;Rounding out data lakes, Ethan Guo from &lt;a href=&quot;https://www.onehouse.ai/&quot;&gt;Onehouse&lt;/a&gt;
will be diving into Trino’s &lt;a href=&quot;https://hudi.apache.org/&quot;&gt;Hudi&lt;/a&gt; connector, giving
an update on what’s landed lately to improve performance and functionality.
He’ll also give a preview of what’s coming soon. The features are flying in, and
if you’re a current or prospective user of Hudi with Trino, you won’t want to
miss out.&lt;/p&gt;

&lt;h2 id=&quot;data-takes&quot;&gt;Data takes&lt;/h2&gt;

&lt;p&gt;Of course, there’s more to Trino than querying data lakes, and there’s a wide
variety of talks to discuss the other activities going on within the Trino
community.&lt;/p&gt;

&lt;h3 id=&quot;small-scale&quot;&gt;Small scale&lt;/h3&gt;

&lt;p&gt;Ben Jeter at &lt;a href=&quot;https://www.executivehomes.com/&quot;&gt;Executive Homes&lt;/a&gt;, who gave
&lt;a href=&quot;/blog/2023/07/25/trino-fest-2023-datto.html&quot;&gt;a talk at Trino Fest last year&lt;/a&gt;
while at &lt;a href=&quot;https://www.datto.com/&quot;&gt;Datto&lt;/a&gt;, is back to discuss running Trino at a
more moderate scale than we’re used to hearing about in the Trino space.
Forget petabytes and exabytes, and welcome a tiny cluster querying thousands,
not millions, of records that still derives huge value from Trino. It’s a great
playbook for smaller startups and enterprises who still need robust, flexible,
performant analytics.&lt;/p&gt;

&lt;h3 id=&quot;maximizing-performance&quot;&gt;Maximizing performance&lt;/h3&gt;

&lt;p&gt;Jonas Kylling from &lt;a href=&quot;https://dune.com/about&quot;&gt;Dune&lt;/a&gt; will be detailing how they’ve
managed to optimize Trino and squeeze out every ounce of performance to reduce
query costs and runtimes. That includes leveraging the new Alluxio-based file
system caching, emulating various cluster sizes to avoid expensive idle cluster
time, and storing, sampling, and filtering query results to avoid re-executing
queries.&lt;/p&gt;

&lt;h3 id=&quot;query-intelligence&quot;&gt;Query intelligence&lt;/h3&gt;

&lt;p&gt;Marton Bod and Vinitha Gankidi from Apple bring insights on query intelligence.
They’ll demonstrate how Apple has come to understand when their clusters are most
utilized and who’s using them, enabling slicing and dicing along different
dimensions. A query intelligence dataset can power real-time cluster
dashboarding, self-service troubleshooting, and automatic generation of
recommendations for users, all of which can make Trino better than
ever.&lt;/p&gt;

&lt;h2 id=&quot;and-more&quot;&gt;And more!&lt;/h2&gt;

&lt;p&gt;Of course, Trino’s own Martin Traverso will be giving a keynote on the latest
and greatest in the project, covering everything big that’s landed since Trino
Summit, as well as a glimpse at the roadmap for the project in the coming few
months. Several other big talks are falling into place that we can’t announce
just yet, so stay tuned for more info as the event draws nearer.&lt;/p&gt;

&lt;h2 id=&quot;trino-contributor-congregation&quot;&gt;Trino contributor congregation&lt;/h2&gt;

&lt;p&gt;The day after Trino Fest, we’ll also be hosting an in-person meetup for
Trino contributors and engineers to catch up, discuss the Trino roadmap, and
engage directly with the maintainers. It’s a great opportunity to put
faces and voices to those GitHub handles, align on the big ideas or tricky PRs
that have been moving slowly, and find more ways to get involved in Trino
development. If you’re interested in attending, message Manfred Moser or Cole
Bowden on the &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;Trino Slack&lt;/a&gt;, and we’ll get you added to
the attendee list and share more details.&lt;/p&gt;</content>

      
        <author>
          <name>Cole Bowden</name>
        </author>
      

      <summary>Trino Fest is drawing ever closer. Commander Bun Bun has been hard at work behind the scenes arranging the schedule and making sure that Trino’s trip to Boston is going to be a great one. In case you missed it, we announced Trino Fest a couple months ago, and if you have missed it, make sure to go register to attend! All our speakers will be in person in downtown Boston on the 13th of June, with plenty of opportunities for networking and a happy hour event at the end of the day. But if you can’t make the trip to enjoy the lovely New England summer, we’ll also be live-streaming the event, and you can register to join us virtually. Register to attend! Still on the fence, though? Read on for a preview of our speaker lineup and brief summaries of their talks. Keep in mind this also isn’t the full lineup, and we’ll follow up soon with the last few talks that round out the schedule.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2024/announcement-banner.png" />
      
    </entry>
  
    <entry>
      <title>Time travel in Delta Lake connector</title>
      <link href="https://trino.io/blog/2024/04/11/time-travel-delta-lake.html" rel="alternate" type="text/html" title="Time travel in Delta Lake connector" />
      <published>2024-04-11T00:00:00+00:00</published>
      <updated>2024-04-11T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/04/11/time-travel-delta-lake</id>
      <content type="html" xml:base="https://trino.io/blog/2024/04/11/time-travel-delta-lake.html">&lt;p&gt;Exciting news - time travel capability has finally arrived in the Delta Lake
connector! After introducing support for time travel in the Iceberg connector
back in 2022, we’re thrilled to announce that the Delta Lake connector now joins
the ranks as the second connector offering this feature.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;background-and-motivation&quot;&gt;Background and motivation&lt;/h2&gt;

&lt;p&gt;Time travel as a feature has a number of practical use cases:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Data recovery and rollback&lt;/strong&gt;: In the event of data corruption or erroneous
 updates, time travel allows users to roll back to a previous version of the
 data, restoring it to a known good state.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Auditing and compliance&lt;/strong&gt;: Time travel enables auditors and compliance
 teams to analyze data changes over time, ensuring regulatory compliance and
 providing transparency into data operations.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Historical analysis&lt;/strong&gt;: Data analysts and data scientists can perform
 historical analysis by querying data at different points in time, uncovering
 trends, patterns, and anomalies that may not be apparent in current data.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;time-travel-sql-example&quot;&gt;Time travel SQL example&lt;/h2&gt;

&lt;p&gt;Start by creating a catalog &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;example&lt;/code&gt; with the &lt;a href=&quot;https://trino.io/docs/current/connector/delta-lake.html&quot;&gt;Delta Lake
connector&lt;/a&gt;, create a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;demo&lt;/code&gt;
schema, and make them the current catalog and schema with the
&lt;a href=&quot;https://trino.io/docs/current/sql/use.html&quot;&gt;USE&lt;/a&gt; statement.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;USE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;example&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;demo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Let’s create a Delta Lake table, add some data, modify the table, and add some
more data using the following SQL statements:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;users&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;column_mapping_mode&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;name&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;users&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;Alice&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;Bob&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;Mallory&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ALTER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;users&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DROP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;COLUMN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;users&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Use the following statement to look at all data in the table:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;users&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; id
----
  1
  2
  3
  4
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$history&lt;/code&gt; metadata table offers a record of past operations:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;version&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;operation&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;&quot;users$history&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; version |             timestamp              |  operation
---------+------------------------------------+--------------
       0 | 2024-04-10 17:49:18.528 Asia/Tokyo | CREATE TABLE
       1 | 2024-04-10 17:49:18.755 Asia/Tokyo | WRITE
       2 | 2024-04-10 17:49:18.929 Asia/Tokyo | DROP COLUMNS
       3 | 2024-04-10 17:49:19.137 Asia/Tokyo | WRITE
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You can specify the version using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FOR VERSION AS OF&lt;/code&gt;. For example, to time
travel to version 1, which includes a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WRITE&lt;/code&gt; operation, the query would look
like this:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;users&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FOR&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;VERSION&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;OF&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As you can see, time travel rolls back not only the data but also the table definition:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;----+---------&lt;/span&gt;
  &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Alice&lt;/span&gt;
  &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Bob&lt;/span&gt;
  &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Mallory&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;technical-details&quot;&gt;Technical details&lt;/h2&gt;

&lt;p&gt;Delta Lake manages transaction logs in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_delta_log&lt;/code&gt; directory under
the table’s location.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Last checkpoint&lt;/strong&gt;: The optional &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_last_checkpoint&lt;/code&gt; file records the
version of the last checkpoint.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Delta log entries&lt;/strong&gt;: Each JSON file contains an atomic set of actions, for
example &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;00000000000000000000.json&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Checkpoints&lt;/strong&gt;: Each Parquet checkpoint file contains the complete replay of all actions
up to and including the checkpointed table version, for example
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;00000000000000000010.checkpoint.parquet&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More details are available in the &lt;a href=&quot;https://github.com/delta-io/delta/blob/master/PROTOCOL.md&quot;&gt;Delta Lake protocol
documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Following is an example of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_delta_log&lt;/code&gt; directory:&lt;/p&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;00000000000000000000.json
00000000000000000001.json
00000000000000000002.json
00000000000000000003.json
00000000000000000003.checkpoint.parquet
00000000000000000004.json
00000000000000000005.json
...
_last_checkpoint
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;When the specified version is older than the last checkpoint, such as version 2,
the connector reads the transaction log files starting from the first
transaction log file (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;00000000000000000000.json&lt;/code&gt;) up to the specified version
(&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;00000000000000000002.json&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;When the specified version is equal to the last checkpoint, in our example
version 3, the connector reads only the checkpoint file for that version
(&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;00000000000000000003.checkpoint.parquet&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;When the specified version is newer than the last checkpoint, so version 4, the
connector reads the checkpoint file for the last checkpoint version
(&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;00000000000000000003.checkpoint.parquet&lt;/code&gt;) and the transaction log file for the
specified version (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;00000000000000000004.json&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;The actual logic is more complex when the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_last_checkpoint&lt;/code&gt; file is absent, because the
connector cannot determine the checkpoints without listing the file names in the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_delta_log&lt;/code&gt; directory.&lt;/p&gt;
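
&lt;p&gt;The version selection logic above can be sketched in Python. This is a
simplified illustration, not the connector’s actual code: it assumes a
single-part checkpoint exists exactly at the version recorded in the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_last_checkpoint&lt;/code&gt; file:&lt;/p&gt;

```python
def files_to_read(version, last_checkpoint):
    """Return the _delta_log file names consulted for a table version.

    Simplified sketch: assumes a single-part checkpoint exists at the
    version recorded in the _last_checkpoint file.
    """
    def name(v):
        # Delta log file names are zero-padded to 20 digits.
        return f"{v:020d}"

    if last_checkpoint is None or version in range(last_checkpoint):
        # The requested version precedes the checkpoint (or there is no
        # checkpoint): replay the JSON log entries from version 0.
        return [f"{name(v)}.json" for v in range(version + 1)]
    # Start from the checkpoint, then apply any newer JSON log entries.
    files = [f"{name(last_checkpoint)}.checkpoint.parquet"]
    files += [f"{name(v)}.json" for v in range(last_checkpoint + 1, version + 1)]
    return files
```

&lt;p&gt;With the last checkpoint at version 3, requesting version 2 replays the JSON
log entries from version 0, version 3 reads only the checkpoint file, and
version 4 reads the checkpoint file plus the newer log entry, matching the three
cases described above.&lt;/p&gt;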

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Time travel in the Trino &lt;a href=&quot;https://trino.io/docs/current/connector/delta-lake.html&quot;&gt;Delta Lake
connector&lt;/a&gt; opens up new
possibilities for data exploration and analysis, empowering users to delve into
the past and derive insights from historical data. By seamlessly integrating
with Delta Lake’s versioning and transaction logs, Trino provides a powerful
tool for querying data as it appeared at different points in time. Whether it’s
auditing, historical analysis, or data recovery, time travel adds a valuable
dimension to data-driven decision-making, making it an indispensable feature for
modern data platforms.&lt;/p&gt;

&lt;h2 id=&quot;bonus&quot;&gt;Bonus&lt;/h2&gt;

&lt;p&gt;Join us for &lt;a href=&quot;/blog/2024/02/20/announcing-trino-fest-2024.html&quot;&gt;Trino Fest 2024&lt;/a&gt; where &lt;a href=&quot;https://github.com/findinpath&quot;&gt;Marius Grama&lt;/a&gt; presents &lt;em&gt;“The open
source journey of the Trino Delta Lake connector”&lt;/em&gt; and shares more tips and
tricks.&lt;/p&gt;</content>

      
        <author>
          <name>Yuya Ebihara</name>
        </author>
      

      <summary>Exciting news - time travel capability has finally arrived in the Delta Lake connector! After introducing support for time travel in the Iceberg connector back in 2022, we’re thrilled to announce that the Delta Lake connector now joins the ranks as the second connector offering this feature.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/images/logos/trino-delta.png" />
      
    </entry>
  
    <entry>
      <title>Blazing ahead with 22</title>
      <link href="https://trino.io/blog/2024/03/13/java-22.html" rel="alternate" type="text/html" title="Blazing ahead with 22" />
      <published>2024-03-13T00:00:00+00:00</published>
      <updated>2024-03-13T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/03/13/java-22</id>
      <content type="html" xml:base="https://trino.io/blog/2024/03/13/java-22.html">&lt;p&gt;It was not that long ago that we &lt;a href=&quot;/blog/2023/11/03/java-21.html&quot;&gt;first announced support for Java 21&lt;/a&gt;, and subsequently made it a build and runtime
requirement with &lt;a href=&quot;https://trino.io/docs/current/release/release-436.html&quot;&gt;Trino 436&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Since then, the codebase has received some significant improvements in
readability, and we have also seen better performance. However, innovation in
Trino and Java is not standing still; on the contrary, it’s accelerating. On the
Java community side, Java 22 is about to be released, and we think it is time to
drive innovation in Trino even further. Trino is going to use and require
Java 22 soon!&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;background-and-motivation&quot;&gt;Background and motivation&lt;/h2&gt;

&lt;p&gt;The planned move to use and require Java 22 for building and running Trino is
driven by several goals:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Take advantage of performance and runtime improvements of the new JVM version.&lt;/li&gt;
  &lt;li&gt;Use the newly available language features to further improve readability and
maintenance aspects of the codebase.&lt;/li&gt;
  &lt;li&gt;Enable the use of further performance improvements for Trino under the umbrella
of &lt;a href=&quot;https://github.com/trinodb/trino/issues/14237&quot;&gt;Project Hummingbird&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Attract and motivate more contributors by offering the opportunity to work
on a cutting-edge, complex application with a modern Java stack and its latest
language features and APIs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Speaking of APIs and new features, let’s look at the JDK Enhancement
Proposals (JEPs) that we are actively tracking. Specifically, we plan to
experiment with them and adopt any non-preview JEPs where we see benefits. We
also plan to report any issues and problems we encounter back upstream to the
Java community:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Region Pinning for G1 (&lt;a href=&quot;https://openjdk.org/jeps/423&quot;&gt;JEP 423&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Foreign Function &amp;amp; Memory API (&lt;a href=&quot;https://openjdk.org/jeps/454&quot;&gt;JEP 454&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Unnamed Variables and Patterns (&lt;a href=&quot;https://openjdk.org/jeps/456&quot;&gt;JEP 456&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Class File API in preview (&lt;a href=&quot;https://openjdk.org/jeps/457&quot;&gt;JEP 457&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;String Templates in second preview (&lt;a href=&quot;https://openjdk.org/jeps/459&quot;&gt;JEP 459&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Vector API in 7th incubator (&lt;a href=&quot;https://openjdk.org/jeps/460&quot;&gt;JEP 460&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Structured Concurrency in second preview (&lt;a href=&quot;https://openjdk.org/jeps/462&quot;&gt;JEP 462&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Scoped Values in second preview (&lt;a href=&quot;https://openjdk.org/jeps/464&quot;&gt;JEP 464&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Many of these APIs allow us to further modernize the feature set of Trino and
adapt it to current hardware and compute realities. Specifically, we can
continue our commitment to the Java ecosystem and avoid many of the
complexities and pitfalls of JNI, the traditional, now legacy approach to
integrating with native code and specific hardware features.&lt;/p&gt;

&lt;p&gt;Another aspect some of you might wonder about is the move from a Java LTS
version to a Java STS release – from “long term support” to “short term
support”. So far, Trino has required Java 8, Java 11, Java 17, and then Java 21.
Since all of them are LTS releases, some of you might have concluded that we
have a policy of only using Java LTS versions. That is not the case; it is only
a coincidence.&lt;/p&gt;

&lt;p&gt;We have always strived to use up-to-date source code, dependencies, runtime
environments, and so forth. The benefits, including better performance,
available and included bug fixes, reduced need for backports, fewer security
issues, and support for modern language features, development environments, and
tooling, have always far outweighed the effort of staying up to date.&lt;/p&gt;

&lt;p&gt;We are now finally at the long-planned point where we can move quickly enough
as a project to use the latest tools, dependencies, and Java releases while
keeping up our frequent release cadence. And that is exactly what we are doing
for the benefit of everyone contributing to and using Trino. Java 22 now. Then
later this year we can move to Java 23, and next year to 24 and 25.&lt;/p&gt;

&lt;p&gt;So what are we specifically doing now?&lt;/p&gt;

&lt;h2 id=&quot;current-status-and-plans&quot;&gt;Current status and plans&lt;/h2&gt;

&lt;p&gt;Java 22 is scheduled to ship in March 2024. The various JDK distribution
binary packages will become available shortly after the official release.&lt;/p&gt;

&lt;p&gt;Early access (EA) source and binaries are already available, and our continuous
integration builds already use an EA build successfully.&lt;/p&gt;

&lt;p&gt;Overall the transition is going well. Our plan is to follow the same approach as
our switch to Java 21:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Ensure everything works with Java 22.&lt;/li&gt;
  &lt;li&gt;Change the container image to use Java 22.&lt;/li&gt;
  &lt;li&gt;Cut a release and get community feedback from testing with the container.&lt;/li&gt;
  &lt;li&gt;Adjust to any feedback and available improvements for a few releases.&lt;/li&gt;
  &lt;li&gt;Switch the requirement for build and runtime to Java 22.&lt;/li&gt;
  &lt;li&gt;Cut another release and celebrate.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And then the real fun starts all over. We can update code, libraries, and start
working with the new APIs. Timing on all the work depends on obstacles we find
on the way and how we progress with removing them.&lt;/p&gt;

&lt;p&gt;We use the &lt;a href=&quot;https://github.com/trinodb/trino/issues/20980&quot;&gt;Java 22 tracking
issue&lt;/a&gt; and the linked issues and
pull requests to manage progress, discuss next steps, and work with the
community.&lt;/p&gt;

&lt;p&gt;Feel free to chime in there or find us on the &lt;a href=&quot;https://trinodb.slack.com/archives/CP1MUNEUX&quot;&gt;#dev
channel&lt;/a&gt; on the &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;Trino community
Slack&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Join us in this exciting next step for Trino.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Update from 8 May 2024:&lt;/strong&gt;
The release of &lt;a href=&quot;https://trino.io/docs/current/release/release-447.html&quot;&gt;Trino 447&lt;/a&gt;
includes the switch to Java 22 as a requirement for running Trino.&lt;/p&gt;
&lt;/blockquote&gt;</content>

      
        <author>
          <name>Manfred Moser, Martin Traverso, Dain Sundstrom, David Phillips</name>
        </author>
      

      <summary>It was not that long ago that we first announced support for Java 21, and subsequently made it a build and runtime requirement with Trino 436. Since then, the codebase received some significant improvements in readability, and we have also seen better performance. However, innovation in Trino and Java is not holding still, on the contrary - it’s accelerating. On the Java community side, Java 22 is just about to be released, and we think it is time to drive innovation in Trino even further. Trino is going to use and require Java 22 soon!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/images/logos/java-duke-22.png" />
      
    </entry>
  
    <entry>
      <title>A cache refresh for Trino</title>
      <link href="https://trino.io/blog/2024/03/08/cache-refresh.html" rel="alternate" type="text/html" title="A cache refresh for Trino" />
      <published>2024-03-08T00:00:00+00:00</published>
      <updated>2024-03-08T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/03/08/cache-refresh</id>
      <content type="html" xml:base="https://trino.io/blog/2024/03/08/cache-refresh.html">&lt;p&gt;Thinking about our recent work on caching in Trino reminds me of the famous
saying, &lt;a href=&quot;https://www.karlton.org/2017/12/naming-things-hard/&quot;&gt;“There are only two hard things in computer science: cache invalidation
and naming things&lt;/a&gt;.” Well,
in the Trino community we know all about caching and naming. With the recent
&lt;a href=&quot;https://trino.io/docs/current/release/release-439.html&quot;&gt;Trino 439 release&lt;/a&gt;, caching
from object storage file systems got a refresh. Catalogs using the Delta Lake,
Hive, Iceberg, and soon Hudi connectors now get to access performance benefits
from the new Alluxio-powered file system caching.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;in-the-past&quot;&gt;In the past&lt;/h2&gt;

&lt;p&gt;So how did we get here? A long, long time ago, Qubole open-sourced a &lt;a href=&quot;https://github.com/qubole/rubix&quot;&gt;lightweight
data caching framework called
RubiX&lt;/a&gt;. The library was integrated into the
Trino Hive connector, and it enabled &lt;a href=&quot;https://trino.io/docs/438/connector/hive-caching.html&quot;&gt;Hive connector storage
caching&lt;/a&gt;. But over time, any
open source project without active maintenance becomes stale. And like a stale
cache, a stale open source project can cause issues, or become outdated and
unsuitable for modern use. Though RubiX had once served Trino well, it was time
to remove the dust, and RubiX had to go.&lt;/p&gt;

&lt;h2 id=&quot;making-progress&quot;&gt;Making progress&lt;/h2&gt;

&lt;p&gt;Catching back up to 2024, Trino now includes powerful connectors for the modern
lakehouse formats Delta Lake, Hudi, and Iceberg:&lt;/p&gt;

&lt;div class=&quot;container&quot;&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://trino.io/docs/current/connector/delta-lake.html&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/delta-lake.png&quot; title=&quot;Delta Lake connector&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://trino.io/docs/current/connector/hudi.html&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/apache-hudi.png&quot; title=&quot;Hudi connector&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://trino.io/docs/current/connector/iceberg.html&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/apache-iceberg.png&quot; title=&quot;Iceberg connector&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;Hive is still around, just like HDFS, but we consider them both close to legacy
status. Yet all four connectors could benefit from caching. Good news came at
Trino Summit 2022 when Hope Wang and Beinan Wang from
&lt;a href=&quot;https://trino.io/ecosystem/add-on.html#alluxio&quot;&gt;Alluxio&lt;/a&gt; presented about their
integration with Trino and the Hive connector - &lt;a href=&quot;/blog/2023/07/21/trino-fest-2023-alluxio-recap.html&quot;&gt;Trino optimization with
distributed caching on data lake&lt;/a&gt;. They mentioned plans to open
source their implementation and an initial pull request (PR) was created.&lt;/p&gt;

&lt;div class=&quot;container&quot;&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;&lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;img src=&quot;https://trino.io/assets/images/logos/alluxio.png&quot; title=&quot;Alluxio&quot; /&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;&lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;h2 id=&quot;collaboration&quot;&gt;Collaboration&lt;/h2&gt;

&lt;p&gt;The initial presentation and PR planted a seed in the community. The Trino
project had been moving fast in terms of deprecating the old dependencies from
the Hadoop and Hive ecosystem, so the initial Alluxio PR was no longer up to
date or compatible with the latest Trino version. Discussions with &lt;a href=&quot;https://github.com/electrum&quot;&gt;David
Phillips&lt;/a&gt; laid out the path to adjust to the new
file system support and get ready for reviews towards a merge.&lt;/p&gt;

&lt;p&gt;In the end it was &lt;a href=&quot;https://github.com/pluies&quot;&gt;Florent Delannoy&lt;/a&gt; who started
another &lt;a href=&quot;https://github.com/trinodb/trino/pull/18719&quot;&gt;PR for file system caching support, specifically for the Delta Lake
connector&lt;/a&gt;. His teammate &lt;a href=&quot;https://github.com/jkylling&quot;&gt;Jonas
Irgens Kylling&lt;/a&gt;, also a &lt;a href=&quot;/blog/2023/07/14/trino-fest-2023-dune.html&quot;&gt;presenter from Trino Fest
2023&lt;/a&gt;, took over the work on the
PR. The collaboration on it was an &lt;strong&gt;epic effort&lt;/strong&gt;. After many months,
over 300 comments directly on GitHub, and countless hours of coding, reviewing,
testing, and discussion on Slack and elsewhere, the work finally resulted in a
successful merge, and therefore inclusion in the next release.&lt;/p&gt;

&lt;p&gt;Special props for helping Florent and Jonas must go out to &lt;a href=&quot;https://github.com/electrum&quot;&gt;David
Phillips&lt;/a&gt;, &lt;a href=&quot;https://github.com/raunaqmorarka&quot;&gt;Raunaq
Morarka&lt;/a&gt;, &lt;a href=&quot;https://github.com/findepi&quot;&gt;Piotr
Findeisen&lt;/a&gt;, &lt;a href=&quot;https://github.com/wendigo&quot;&gt;Mateusz
Gajewski&lt;/a&gt;, &lt;a href=&quot;https://github.com/beinan&quot;&gt;Beinan Wang&lt;/a&gt;,
&lt;a href=&quot;https://github.com/amoghmargoor&quot;&gt;Amogh Margoor&lt;/a&gt;, &lt;a href=&quot;https://github.com/osscm&quot;&gt;Manish
Malhotra&lt;/a&gt;, and &lt;a href=&quot;https://github.com/marton-bod&quot;&gt;Marton
Bod&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;finishing&quot;&gt;Finishing&lt;/h2&gt;

&lt;p&gt;In parallel to the work on the initial PR for Delta Lake, yours truly ended up
working on the documentation, and pulled together an &lt;a href=&quot;https://github.com/trinodb/trino/issues/20550&quot;&gt;issue and conversations to
streamline the rollout&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/wendigo&quot;&gt;Mateusz Gajewski&lt;/a&gt; had already put together a PR to
remove the old RubiX integration. With the merge of the initial PR we
were off to the races. We merged the removal of RubiX and the addition of the
docs. Mateusz also added support for OpenTelemetry.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/osscm&quot;&gt;Manish Malhotra&lt;/a&gt; and &lt;a href=&quot;https://github.com/amoghmargoor&quot;&gt;Amogh
Margoor&lt;/a&gt; sent a PR for Iceberg support. They
were also about to add Hive support, when &lt;a href=&quot;https://github.com/raunaqmorarka&quot;&gt;Raunaq
Morarka&lt;/a&gt; beat them to it and submitted that PR.&lt;/p&gt;

&lt;p&gt;After some final cleanup, &lt;a href=&quot;https://github.com/colebow&quot;&gt;Cole Bowden&lt;/a&gt; and &lt;a href=&quot;https://github.com/martint&quot;&gt;Martin
Traverso&lt;/a&gt; got the release notes together and shipped
&lt;a href=&quot;https://trino.io/docs/current/release/release-439.html&quot;&gt;Trino 439&lt;/a&gt;! Now you can use
it, too.&lt;/p&gt;

&lt;h2 id=&quot;using-file-system-caching&quot;&gt;Using file system caching&lt;/h2&gt;

&lt;p&gt;There are only a few relatively simple steps to add file system caching to your
catalogs that use the Delta Lake, Hive, or Iceberg connectors:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Provision fast local file system storage on all your Trino cluster nodes. How
you do that depends on your cluster provisioning.&lt;/li&gt;
  &lt;li&gt;Enable file system caching and configure the cache location, for example at
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/tmp/trino-cache&lt;/code&gt; on the nodes, in your catalog properties files.&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;fs.cache.enabled=true
fs.cache.directories=/tmp/trino-cache
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;After a cluster restart, file system caching is active for the configured
catalogs, and you can tweak it with &lt;a href=&quot;https://trino.io/docs/current/object-storage/file-system-cache.html&quot;&gt;further, optional configuration
properties&lt;/a&gt;.&lt;/p&gt;
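&lt;p&gt;Putting it together, a minimal Iceberg catalog properties file with caching
enabled could look like the following sketch. The metastore address is a
placeholder, and a real catalog likely carries additional settings:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;connector.name=iceberg
hive.metastore.uri=thrift://example.net:9083
fs.cache.enabled=true
fs.cache.directories=/tmp/trino-cache
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;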

&lt;h2 id=&quot;whats-next&quot;&gt;What’s next&lt;/h2&gt;

&lt;p&gt;What a success! It took many members from the global Trino village to get this
feature added. Now our users across the globe can enjoy even more benefits of
using Trino, and also participate in our next steps:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Further improvements to the current implementation, maybe adding
worker-to-worker connections for exchanging cached files.&lt;/li&gt;
  &lt;li&gt;Preparation to add file system caching to the Hudi connector is in progress
with &lt;a href=&quot;https://github.com/codope&quot;&gt;Sagar Sumit&lt;/a&gt; and &lt;a href=&quot;https://github.com/yihua&quot;&gt;Y Ethan
Guo&lt;/a&gt;, with the implementation to follow.&lt;/li&gt;
  &lt;li&gt;Adjust to any learnings from production usage.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Our thanks, and those from all current and future users, go out to everyone
involved in this effort. What are we going to do next?&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;PS: If you want to share your use of Trino or connect with other Trino users,
&lt;a href=&quot;/blog/2024/02/20/announcing-trino-fest-2024.html&quot;&gt;join us for the free Trino Fest 2024&lt;/a&gt; as a speaker or attendee, live in Boston
or virtually from your home.&lt;/p&gt;

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Thinking about our recent work on caching in Trino reminds me of the famous saying, “There are only two hard things in computer science: cache invalidation and naming things.” Well, in the Trino community we know all about caching and naming. With the recent Trino 439 release, caching from object storage file systems got a refresh. Catalogs using the Delta Lake, Hive, Iceberg, and soon Hudi connectors now get to access performance benefits from the new Alluxio-powered file system caching.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-cache-refresh.png" />
      
    </entry>
  
    <entry>
      <title>Japanese edition of Trino: The Definitive Guide</title>
      <link href="https://trino.io/blog/2024/02/27/the-definitive-guide-2-jp.html" rel="alternate" type="text/html" title="Japanese edition of Trino: The Definitive Guide" />
      <published>2024-02-27T00:00:00+00:00</published>
      <updated>2024-02-27T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/02/27/the-definitive-guide-2-jp</id>
      <content type="html" xml:base="https://trino.io/blog/2024/02/27/the-definitive-guide-2-jp.html">&lt;p&gt;Do you know where the name ‘Trino’ comes from? It’s actually a shortened form of
‘neutrino’. These fast and lightweight subatomic particles have recently made
their way to Japan. You can now reserve your copy of the Japanese edition of
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;!&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Today, we are happy to announce that the Japanese translation of the book
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt; is
available for communities all across Japan and far beyond. Preorder today
to get your copy from the first batch in mid-March. Hopefully it lowers the
barrier to Trino for native speakers. We invite you all to get your
own copy:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.hanmoto.com/bd/isbn/9784798071671&quot;&gt;
        分散SQLクエリエンジンTrino徹底ガイド 秀和システム
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;Our thanks go out to Masanori Nishida and his teams at Shuwa System. I would also
like to thank my great team of translators and collaborators, &lt;a href=&quot;https://github.com/Lewuathe&quot;&gt;Kai
Sasaki&lt;/a&gt;, &lt;a href=&quot;https://github.com/aajisaka&quot;&gt;Akira
Ajisaka&lt;/a&gt;, &lt;a href=&quot;https://github.com/eurekaeru&quot;&gt;Kaname
Nishizuka&lt;/a&gt;, and &lt;a href=&quot;https://github.com/mikiT&quot;&gt;Miki
Takata&lt;/a&gt; for their help in making the book a reality.
We hope many readers can benefit from the translated edition.&lt;/p&gt;

&lt;p&gt;We look forward to chatting with many of our new readers and Trino users on the
&lt;a href=&quot;https://trinodb.slack.com/app_redirect?channel=general-jp&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;general-jp&lt;/code&gt;&lt;/a&gt;
channel in &lt;a href=&quot;/slack.html&quot;&gt;the Trino community Slack&lt;/a&gt;, other
channels, and direct messaging.&lt;/p&gt;

&lt;p&gt;Also, don’t forget to tell us about your usage of &lt;a href=&quot;/blog/2024/02/20/announcing-trino-fest-2024.html&quot;&gt;Trino in the upcoming Trino
Fest 2024 as a speaker. Or just register to attend the free event&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Yuya Ebihara&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Yuya Ebihara</name>
        </author>
      

      <summary>Do you know where the name ‘Trino’ comes from? It’s actually a shortened form of ‘neutrino’. These fast and lightweight subatomic particles have recently made their way to Japan. You can now reserve your copy of the Japanese edition of Trino: The Definitive Guide!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/ttdg2-jp-cover.jpg" />
      
    </entry>
  
    <entry>
      <title>Trino Fest goes to Boston in 2024</title>
      <link href="https://trino.io/blog/2024/02/20/announcing-trino-fest-2024.html" rel="alternate" type="text/html" title="Trino Fest goes to Boston in 2024" />
      <published>2024-02-20T00:00:00+00:00</published>
      <updated>2024-02-20T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/02/20/announcing-trino-fest-2024</id>
      <content type="html" xml:base="https://trino.io/blog/2024/02/20/announcing-trino-fest-2024.html">&lt;p&gt;After the resounding success of Trino Fest and Trino Summit in 2023, Commander
Bun Bun has exciting news to share: we’re taking our biggest events of the year
back to being in-person. They’ll be hybrid, to be more specific, so if you can’t
travel, don’t fret, you’ll still be able to watch and ask questions in chat.
But if you can travel, you won’t want to miss out! Everything you already know
and love about Trino Fest is moving to the East Coast for the lovely Boston
summer. The event is on the 13th of June in the Hyatt Regency Boston, where
we’ll have a full day of talks, time to network, and a happy hour at the end of
the day. You may even get to meet Commander Bun Bun, who’s ditching the hiking
gear in favor of training for the Olympics. Sound exciting?&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;http://www.starburst.io/info/trino-fest-2024?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=Global-FY25-Q2-EV-Trino-Fest-2024&amp;amp;utm_content=Blog-1&quot;&gt;
        Register to attend!
    &lt;/a&gt;
&lt;/div&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;join-us-in-person&quot;&gt;Join us in person&lt;/h2&gt;

&lt;p&gt;Our event will be hosted at the Hyatt Regency in Boston, where we are planning a
full day of festivities followed by a happy hour on the Hyatt Regency deck.
There is a
&lt;a href=&quot;https://www.hyatt.com/en-US/group-booking/BOSTO/G-STA4&quot;&gt;discounted room block&lt;/a&gt;
set aside for those interested in attending live and staying with us in Boston.
If you are looking to book hotel dates in addition to what is provided on the
room block, email &lt;a href=&quot;mailto:events@starburstdata.com&quot;&gt;events@starburstdata.com&lt;/a&gt;,
and they will help you coordinate your reservation.&lt;/p&gt;

&lt;p&gt;Regardless of whether you plan on attending in person or online, you do need to
register, so make sure to click the button above!&lt;/p&gt;

&lt;h2 id=&quot;call-for-speakers&quot;&gt;Call for speakers&lt;/h2&gt;

&lt;p&gt;Interested in speaking? We want to hear from everyone in the Trino community
who has something to share. If you aren’t sure whether it’s worth it to submit,
submit anyway! We’ll review all submissions, and we’ll do our best to work with
you to turn your talk into a smash hit. We are looking for both full sessions
(about 30 minutes) and lightning talks (10-15 minutes). We welcome intermediate
to advanced submissions for talks that are connected to Trino on any of the
following topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Best practices and use cases&lt;/li&gt;
  &lt;li&gt;Data migrations&lt;/li&gt;
  &lt;li&gt;Optimizations and performance improvements&lt;/li&gt;
  &lt;li&gt;Data governance&lt;/li&gt;
  &lt;li&gt;Data engineering, including batch and streaming architectures&lt;/li&gt;
  &lt;li&gt;Data science&lt;/li&gt;
  &lt;li&gt;SQL analytics and BI&lt;/li&gt;
  &lt;li&gt;Cloud data lake use cases&lt;/li&gt;
  &lt;li&gt;Data lake architecture&lt;/li&gt;
  &lt;li&gt;Query federation&lt;/li&gt;
  &lt;li&gt;Table formats&lt;/li&gt;
  &lt;li&gt;Data ingestion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Want to speak?&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;https://sessionize.com/trino-fest-2024&quot;&gt;
        Submit a talk!
    &lt;/a&gt;
&lt;/div&gt;

&lt;h2 id=&quot;-trino-contributor-congregation&quot;&gt;&lt;a name=&quot;tcc&quot;&gt;&lt;/a&gt; Trino contributor congregation&lt;/h2&gt;

&lt;p&gt;The day after Trino Fest, we’ll also be hosting an in-person meetup for
Trino contributors and engineers to catch up, discuss the Trino roadmap, and
engage directly with the maintainers in-person. It’s a great opportunity to put
faces and voices to those GitHub handles, align on the big ideas or tricky PRs
that have been moving slowly, and find more ways to get involved in Trino
development. If you’re interested in attending, message Manfred Moser or Cole
Bowden on the &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;Trino Slack&lt;/a&gt;, and we’ll get you added to
the attendee list and share more details.&lt;/p&gt;

&lt;h2 id=&quot;sponsor-trino-fest&quot;&gt;Sponsor Trino Fest&lt;/h2&gt;

&lt;p&gt;Starburst is the organizing sponsor of the event, but to make Trino Fest a
smashing success, they’re excited and interested in collaborating with other
organizations within the community. If you are interested in sponsoring, email
&lt;a href=&quot;mailto:events@starburstdata.com&quot;&gt;events@starburstdata.com&lt;/a&gt; for information.&lt;/p&gt;

&lt;p&gt;And regardless of whether you’re planning on attending, speaking, or sponsoring,
we look forward to seeing you soon!&lt;/p&gt;</content>

      
        <author>
          <name>Cole Bowden</name>
        </author>
      

      <summary>After the resounding success of Trino Fest and Trino Summit in 2023, Commander Bun Bun has exciting news to share: we’re taking our biggest events of the year back to being in-person. They’ll be hybrid, to be more specific, so if you can’t travel, don’t fret, you’ll still be able to watch and ask questions in chat. But if you can travel, you won’t want to miss out! Everything you already know and love about Trino Fest is moving to the East Coast for the lovely Boston summer. The event is on the 13th of June in the Hyatt Regency Boston, where we’ll have a full day of talks, time to network, and a happy hour at the end of the day. You may even get to meet Commander Bun Bun, who’s ditching the hiking gear in favor of training for the Olympics. Sound exciting? Register to attend!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2024/announcement-banner.png" />
      
    </entry>
  
    <entry>
      <title>Open Policy Agent for Trino arrived</title>
      <link href="https://trino.io/blog/2024/02/06/opa-arrived.html" rel="alternate" type="text/html" title="Open Policy Agent for Trino arrived" />
      <published>2024-02-06T00:00:00+00:00</published>
      <updated>2024-02-06T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/02/06/opa-arrived</id>
      <content type="html" xml:base="https://trino.io/blog/2024/02/06/opa-arrived.html">&lt;p&gt;Trino now ships with an access control integration using the popular and widely
used &lt;a href=&quot;https://www.openpolicyagent.org/&quot;&gt;Open Policy Agent (OPA)&lt;/a&gt; from the Cloud Native
Computing Foundation. The release of &lt;a href=&quot;https://trino.io/docs/current/release/release-438.html&quot;&gt;Trino
438&lt;/a&gt; marks an important
milestone of the effort towards this integration.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;collaboration-and-history&quot;&gt;Collaboration and history&lt;/h2&gt;

&lt;p&gt;Open Policy Agent was first released in 2016 and has gained more and more
popularity in the ecosystem of cloud native applications and beyond.&lt;/p&gt;

&lt;p&gt;Initial efforts for an integration with Trino started at Bloomberg, Stackable,
Raft, and other places separately and sometimes in parallel, with only partial
collaboration. You might have first heard about it in August 2022 in the &lt;a href=&quot;https://trino.io/episodes/39.html&quot;&gt;Trino
Community Broadcast episode 39&lt;/a&gt; with a team from
Raft as guests.&lt;/p&gt;

&lt;p&gt;Usage and experience with OPA grew. In the end, Pablo Arteaga from
&lt;a href=&quot;https://www.techatbloomberg.com/&quot;&gt;Bloomberg&lt;/a&gt; and Sebastian Bernauer and Sönke
Liebau from &lt;a href=&quot;https://stackable.tech/&quot;&gt;Stackable&lt;/a&gt; took the initiative to start a
pull request to Trino. Their persistence and collaboration carried them through many
review comments, update commits, and even a second PR. Along the way, they submitted
a talk and eventually presented at Trino Summit 2023 about the Open Policy Agent
access control with Trino and their motivation to move from Apache Ranger to OPA.&lt;/p&gt;

&lt;h2 id=&quot;opa-at-trino-summit-2023&quot;&gt;OPA at Trino Summit 2023&lt;/h2&gt;

&lt;p&gt;The presentation from Pablo and Sönke titled “Trino OPA authorizer - An open
source love story” received a lot of interest from the audience at the event and
on YouTube since then. They explained the architectural differences between using
Ranger and OPA. Sönke described the usage of OPA in the Stackable platform and
how it enables a single access control platform to apply across many systems.
They discussed their collaboration on the pull request, and Pablo showed a
migration path from Ranger and a full demo of OPA with Trino.&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/fbqqapQbAv0&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;They also made the &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2023/opa-trino.pdf&quot;&gt;slide deck available for your
reference&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Edward Morgan and Bhaarat Sharma from &lt;a href=&quot;https://teamraft.com/&quot;&gt;Raft&lt;/a&gt; also
presented &lt;a href=&quot;https://www.youtube.com/watch?v=6KspMwCbOfI&quot;&gt;Avoiding pitfalls with query federation in data
lakehouses&lt;/a&gt; at Trino Summit, and
detailed their OPA usage in their Data Fabric platform. It combines Delta Lake,
Trino, Apache Kafka, and Open Policy Agent (OPA) into a robust lakehouse data
platform. They talked about access control in Trino overall and how important it
is for their customers, including the US Department of Defense. Their
presentation also included a demo of OPA with Trino.&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/6KspMwCbOfI&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h2 id=&quot;opa-on-the-way-to-trino&quot;&gt;OPA on the way to Trino&lt;/h2&gt;

&lt;p&gt;Pablo and Sebastian continued their efforts on the &lt;a href=&quot;https://github.com/trinodb/trino/pull/19532&quot;&gt;pull
request&lt;/a&gt; after Trino Summit. They
worked successfully with Dain on the code review and necessary changes, and
helped Manfred with the documentation.&lt;/p&gt;

&lt;p&gt;Finally, with the release of Trino 438, the &lt;a href=&quot;https://trino.io/docs/current/security/opa-access-control.html&quot;&gt;Open Policy Agent access
control&lt;/a&gt; is available
to all Trino users.&lt;/p&gt;

&lt;p&gt;The community is already taking notice with follow-up pull requests for further
improvements and blog posts such as &lt;a href=&quot;https://www.linkedin.com/pulse/enhancing-security-observability-trino-open-policy-agent-isa-inalcik-zhl9e/&quot;&gt;Enhancing Security and Observability in
Trino with Open Policy Agent and
OpenTelemetry&lt;/a&gt;
from Isa Inalcik.&lt;/p&gt;

&lt;h2 id=&quot;benefits-of-opa&quot;&gt;Benefits of OPA&lt;/h2&gt;

&lt;p&gt;The arrival of OPA support for Trino marks an important step. OPA is a mature
and widely used access control system. Its
&lt;a href=&quot;https://www.openpolicyagent.org/ecosystem/&quot;&gt;ecosystem&lt;/a&gt; includes many
integrations, user interfaces, development tools, and other resources.&lt;/p&gt;

&lt;p&gt;OPA is a very flexible authorization system, making it an ideal match for Trino.
Trino deployments are often part of a diverse data platform, spanning a variety
 of interconnected data sources, pipelines, client tools and applications.&lt;/p&gt;

&lt;p&gt;Trino users now have an alternative to the file-based access
control from the Trino project itself, to maintaining their own Ranger
integration, or to commercial offerings for access control.&lt;/p&gt;

&lt;h2 id=&quot;whats-next&quot;&gt;What’s next&lt;/h2&gt;

&lt;p&gt;We reached another milestone but we are not done yet. Specifically for OPA, we
are looking at the following next tasks:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Get more features from various older, private forks converted into pull
requests to Trino so everyone can benefit.&lt;/li&gt;
  &lt;li&gt;Update the documentation with more practical advice and tips.&lt;/li&gt;
  &lt;li&gt;Provide further resources for running OPA with Trino, writing rego scripts,
and helping the community.&lt;/li&gt;
  &lt;li&gt;Implementation of row level filtering and column masking, based on the
&lt;a href=&quot;https://github.com/bloomberg/trino/pull/16&quot;&gt;draft&lt;/a&gt; from Pablo.&lt;/li&gt;
&lt;/ul&gt;
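&lt;p&gt;To give a flavor of the Rego side of this work, here is a minimal policy
sketch that denies everything except requests from a single user. The input
fields shown reflect the general shape of the requests Trino sends to OPA, but
consult the OPA access control documentation for the exact structure before
relying on it:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;package trino

# Deny by default, then allow any operation for one user.
default allow = false

allow {
    input.context.identity.user == &quot;admin&quot;
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;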

&lt;p&gt;Special thanks go to everyone participating so far. Consider this an open
invitation to join the effort.&lt;/p&gt;

&lt;p&gt;Ping me on Slack directly or find us in #opa-dev.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Trino now ships with an access control integration using the popular and widely used Open Policy Agent (OPA) from the Cloud Native Computing Foundation. The release of Trino 438 marks an important milestone of the effort towards this integration.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/images/logos/opa-small.png" />
      
    </entry>
  
    <entry>
      <title>Trino 2023 wrapped</title>
      <link href="https://trino.io/blog/2024/01/19/trino-2023-wrapped.html" rel="alternate" type="text/html" title="Trino 2023 wrapped" />
      <published>2024-01-19T00:00:00+00:00</published>
      <updated>2024-01-19T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/01/19/trino-2023-wrapped</id>
      <content type="html" xml:base="https://trino.io/blog/2024/01/19/trino-2023-wrapped.html">&lt;p&gt;If &lt;a href=&quot;https://www.newsroom.spotify.com/2023-wrapped/&quot;&gt;“Wrapped” is good enough for Spotify&lt;/a&gt;, 
it’s good enough for Trino, right? As we look forward to a bright 2024, we can
also take a moment to get sentimental, look back at everything we’ve
accomplished, and reflect on the progress we’ve made. Commander Bun Bun has been
hard at work, so if you haven’t been paying close attention to Trino or want an
idea of all that went down in 2023, we’re happy to present you with an end of
year recap. We’ll be exploring what’s gone on in the community, on development,
the events we’ve hosted, and discuss the cool new features and technologies you
can use when you’re running Trino.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/IRq3ZNR9Dgs&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h2 id=&quot;2023-by-the-numbers&quot;&gt;2023 by the numbers&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;64,288 views 👀 on YouTube&lt;/li&gt;
  &lt;li&gt;5,872 hours watched ⌚on YouTube&lt;/li&gt;
  &lt;li&gt;5,018 new commits 💻 in GitHub&lt;/li&gt;
  &lt;li&gt;2,985 new stargazers ⭐ in GitHub&lt;/li&gt;
  &lt;li&gt;2,494 pull requests merged ✅ in GitHub&lt;/li&gt;
  &lt;li&gt;1,227 issues 📝 created in GitHub&lt;/li&gt;
  &lt;li&gt;704 new subscribers 📺 in YouTube&lt;/li&gt;
  &lt;li&gt;45 videos 🎥 uploaded to YouTube&lt;/li&gt;
  &lt;li&gt;39 blog ✍️ posts&lt;/li&gt;
  &lt;li&gt;30 Trino 🚀 releases&lt;/li&gt;
  &lt;li&gt;10 Trino Community Broadcast ▶️ episodes&lt;/li&gt;
  &lt;li&gt;2 Trino ⛰️ Summits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We’re excited to say that Trino continued to grow in 2023:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;GitHub stars increased by nearly 50% total and by 8% more than last year&lt;/li&gt;
  &lt;li&gt;Commits increased by 7%&lt;/li&gt;
  &lt;li&gt;Slack usage picked up dramatically&lt;/li&gt;
  &lt;li&gt;YouTube viewership was up 7% despite a lack of Pokemon-themed musical content compared to 2022 (our bad)&lt;/li&gt;
  &lt;li&gt;30 releases kept new versions of Trino coming out more than every other week.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Thanks in part to all that growth, it’s more important than ever to be on
&lt;a href=&quot;/slack.html&quot;&gt;our Slack&lt;/a&gt;. If you’re a Trino user or community member and aren’t
already on there, you’re missing out! Make sure to join up for community
announcements, release statuses, the shared expertise of the entire Trino
community, and event-specific channels for discussion when we’re hosting things 
like Trino Fest and Trino Summit. Speaking of those…&lt;/p&gt;

&lt;h2 id=&quot;trino-events&quot;&gt;Trino events&lt;/h2&gt;

&lt;p&gt;One of the best parts of being an open source community is that it’s easy to be
excited and connect with others about using such a cool piece of technology.
Whether that’s bringing Trino to new users who can take advantage of it, or
sharing what we’ve learned so other Trino users can make the most of it, events
are one of the best ways to distribute that knowledge. So what were we up to
this year?&lt;/p&gt;

&lt;h3 id=&quot;trino-fest-and-trino-summit&quot;&gt;Trino Fest and Trino Summit&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/blog/2023/06/20/trino-fest-2023-recap.html&quot;&gt;Trino Fest&lt;/a&gt; and
&lt;a href=&quot;https://trino.io/blog/2023/12/18/trino-summit-recap.html&quot;&gt;Trino Summit&lt;/a&gt; are
becoming mainstays on the Trino calendar each year, and 2023 was no different.
Formerly “Cinco de Trino,” we ditched the Cinco de Mayo theme and went with the
simpler “Trino Fest” in June, opting to theme it around Commander Bun Bun’s Lake
House Summer Camp, with a focus on integrating Trino with lakehouse and data
lake architectures. Trino Summit only wrapped up a little over a month ago,
rounding out the year and highlighting some amazing developments that we’ll be
talking about later in this blog post.&lt;/p&gt;

&lt;p&gt;Trino Fest has historically been the smaller event, but it did some catching up
in 2023, as both Trino Fest and Trino Summit were made virtual and expanded to 2
days this year. With the events easier to attend than ever before, we reached a
combined total of about 1,200 live attendees, with thousands more views on demand.&lt;/p&gt;

&lt;p&gt;The lineups were packed with 34 talks across both events, featuring speakers
from huge Trino users like Salesforce, Stripe, Apple, and Lyft, as well as from
major Trino contributors like Starburst, Tabular, and Bloomberg. You can
view &lt;a href=&quot;https://www.youtube.com/playlist?list=PLFnr63che7wbBu_czq-SS9iVdQ4CIv2z1&quot;&gt;recordings of every Trino Fest talk&lt;/a&gt;
and &lt;a href=&quot;https://www.youtube.com/playlist?list=PLFnr63che7wYeJLUjUaEftCFfjymhgLcq&quot;&gt;every Trino Summit talk&lt;/a&gt;
on the Trino YouTube channel if you missed out.&lt;/p&gt;

&lt;h3 id=&quot;meetups-and-international-events&quot;&gt;Meetups and international events&lt;/h3&gt;

&lt;p&gt;One of the more exciting developments was a major event in Japan -
&lt;a href=&quot;https://trino.io/blog/2023/10/11/a-report-about-trino-conference-tokyo-2023.html&quot;&gt;Trino Conference Tokyo&lt;/a&gt;. 
A virtual event with four sessions, it brought Trino to a Japanese-speaking
audience and further pushed our favorite query engine across language borders.
On top of that,
&lt;a href=&quot;https://www.starburst.io/info/india-trino-meetup-miq/?utm_source=trino&amp;amp;utm_medium=slack&amp;amp;utm_campaign=APAC-FY24-Q4-CM-india-Meetup-at-MiQ-Digital&quot;&gt;Starburst co-hosted a Trino meetup in Bengaluru&lt;/a&gt;, 
and the community organized the first-ever Korean Trino meetup (pictured below).&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/2023-review/trino-kr-meetup.png&quot; float=&quot;center&quot; /&gt;&lt;/p&gt;

&lt;p&gt;And last but not least,
&lt;a href=&quot;/trino-the-definitive-guide.html&quot;&gt;Trino, the Definitive Guide, 2nd Edition&lt;/a&gt;
was translated into Mandarin and Polish.&lt;/p&gt;

&lt;h2 id=&quot;the-trino-gateway&quot;&gt;The Trino Gateway&lt;/h2&gt;

&lt;p&gt;One of the biggest announcements in the Trino community this year was
the &lt;a href=&quot;https://trino.io/blog/2023/09/28/trino-gateway.html&quot;&gt;launch of the Trino Gateway&lt;/a&gt;. A proxy and
load-balancer, it’s a crucial piece of Trino infrastructure for organizations
that need more than one Trino cluster to suit their needs.&lt;/p&gt;

&lt;p&gt;Why would you want more than one Trino cluster? Maybe you want one cluster with
fault-tolerant execution enabled for ETL workloads and another cluster for
speedy ad-hoc analytics. Perhaps you have analysts performing wildly
differently-sized queries, and high-volume compute-intensive queries are proving
to be bad neighbors for lightweight and low-latency queries that shouldn’t take
more than milliseconds. Historically, users would have to manually manage
swapping between clusters, establish a new connection, and try not to get a
headache in the process.&lt;/p&gt;

&lt;p&gt;Enter the Trino Gateway! By routing all of your Trino traffic automatically,
it’s never been easier to manage, maintain, and query multiple Trino clusters at
once. Load balancing ensures that no one cluster gets overworked, and it’s the
perfect way to stop large queries from getting in the way of the little guys.
Add in the fact that you can seamlessly shut down an individual cluster for
updates or maintenance while the Trino Gateway routes traffic elsewhere, and
it’s easy to see why this is such a game-changer. We’re super excited for it to
be out there in the world, and we hope it makes running Trino at the largest
scales simpler and faster than ever before.&lt;/p&gt;

&lt;p&gt;For more information on the Trino Gateway, check out:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2023/09/28/trino-gateway.html&quot;&gt;The announcement blog post&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino-gateway/blob/main/docs/quickstart.md&quot;&gt;The quickstart guide&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino-gateway/tree/main&quot;&gt;The main Trino Gateway repo&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;new-features&quot;&gt;New features&lt;/h2&gt;

&lt;p&gt;With more development on Trino than ever before, there were obviously a ton of
new things being added to it. Let’s go over some of the biggest adds in 2023.&lt;/p&gt;

&lt;h3 id=&quot;sql-routines&quot;&gt;SQL routines&lt;/h3&gt;

&lt;p&gt;Whether you want to refer to them as SQL routines or as user-defined functions,
they’re a big deal. Fresh off the presses and only a few months old, they do
exactly what you’d expect them to do: you, a user, can define and re-use your
own functions! Define and use them inline as part of a query to make that query
cleaner, easier, and simpler to understand. Or, if you’re really cooking, you
can run a query that defines the routine in the schema of the catalog. This
allows other Trino users to access the same routine time and time again as part
of their other queries. It’s a level of customization that we’ve never had
before in Trino, and no longer do you need to write your own Java plugins to
create and re-use functions that do exactly what you need them to do.&lt;/p&gt;
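&lt;p&gt;As a small taste, an inline routine can be declared in a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WITH&lt;/code&gt; clause and
used immediately in the same query, along these lines (the function name is
just an illustration):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;WITH
  FUNCTION double_value(x bigint)
    RETURNS bigint
    RETURN x * 2
SELECT double_value(21); -- returns 42
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;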

&lt;p&gt;If you want to learn more about SQL routines, you can check
out &lt;a href=&quot;/docs/current/routines/introduction.html&quot;&gt;the introduction to SQL routines&lt;/a&gt;
in our documentation, as well as
&lt;a href=&quot;https://www.youtube.com/watch?v=1siAYR6BzzY&amp;amp;list=PLFnr63che7wYzZoo5yyEF5R1QrOH6VRq3&amp;amp;index=4&quot;&gt;a video from our SQL training series&lt;/a&gt;
and a few &lt;a href=&quot;/docs/current/routines/examples.html&quot;&gt;example routines&lt;/a&gt; which give a
good look at how they can be used.&lt;/p&gt;

&lt;h3 id=&quot;schema-evolution-and-dynamic-catalogs&quot;&gt;Schema evolution and dynamic catalogs&lt;/h3&gt;

&lt;p&gt;While we’re providing more power, customization, and flexibility to Trino users,
it’s also important to highlight just how much has been added this year to make
it easier to adjust things on the fly.&lt;/p&gt;

&lt;p&gt;Schema evolution in Hive was a big addition, allowing you to alter columns’ data
types, rename columns, and handle nested fields when dropping columns. Instead
of needing to modify the underlying database some other way and restart Trino,
Trino can handle the adjustments on the fly.&lt;/p&gt;

&lt;p&gt;But if you don’t use Hive and are feeling left out, we’ve experimentally taken
things one step further in 2023, adding dynamic catalogs to Trino. Rather than
adjusting your schema one column at a time, what about adding or dropping an
entire catalog in one go? You can do that now. Though it’s currently still
bleeding-edge and not ready for widespread use on your important production
data sources, we’re looking forward to improving it and making it resilient and
stable in 2024.&lt;/p&gt;
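&lt;p&gt;As a hedged sketch of what that looks like in practice, with placeholder
connection details for a PostgreSQL catalog:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE CATALOG example USING postgresql
WITH (
  &quot;connection-url&quot; = 'jdbc:postgresql://example.net:5432/database',
  &quot;connection-user&quot; = 'admin'
);

-- and when it is no longer needed
DROP CATALOG example;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;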

&lt;h3 id=&quot;project-hummingbird&quot;&gt;Project Hummingbird&lt;/h3&gt;

&lt;p&gt;Trino has always been about squeezing out every ounce of performance that you
can get. Check out our &lt;a href=&quot;/docs/current/release.html&quot;&gt;release notes&lt;/a&gt; and
you’ll see that every version includes at least a couple performance
improvements. Over time, these performance improvements add up to a substantial
gain, meaning that version-over-version, year-over-year, Trino is always getting
faster. Project Hummingbird was a concerted effort this year to take a look at
the core engine and make a number of architectural changes paired with small
improvements that would add up to something very substantial.
&lt;a href=&quot;https://github.com/trinodb/trino/issues/14237&quot;&gt;The GitHub issue tracking it&lt;/a&gt;
lists a ton of work that’s been accomplished already, with a lot of that work
done in 2023. Though stay tuned for more, because that’s only scratching the
surface…&lt;/p&gt;

&lt;h3 id=&quot;lakehouse-improvements&quot;&gt;Lakehouse improvements&lt;/h3&gt;

&lt;p&gt;Want to leverage the historical log of all actions taken on a table in Hudi? The
new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$timeline&lt;/code&gt; system table has you covered. How about in Delta Lake? We’ve got
the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;table_changes&lt;/code&gt; function for that, and views were added there, too. Too many
metadata tables to list were added to Iceberg, along with the REST, JDBC, and
Nessie catalogs for metadata.&lt;/p&gt;
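
&lt;p&gt;For instance, reading change history looks roughly like this; the schema and
table names are hypothetical:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;-- Hudi: inspect the timeline of actions on a table
SELECT * FROM &quot;orders$timeline&quot;;

-- Delta Lake: read row-level changes since a given table version
SELECT * FROM TABLE(
  system.table_changes(schema_name =&gt; 'example_schema', table_name =&gt; 'orders', since_version =&gt; 0)
);
&lt;/code&gt;&lt;/pre&gt;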

&lt;h3 id=&quot;java-21&quot;&gt;Java 21!&lt;/h3&gt;

&lt;p&gt;Java 21 is required to run Trino versions 436 and later. With
&lt;a href=&quot;https://trino.io/blog/2023/11/03/java-21.html&quot;&gt;the upgrade from Java 17 to 21&lt;/a&gt;
comes a ton of improvements that will make development on Trino easier and
better than ever, which will in turn make it faster and smoother than ever.
Though not as huge a deal as our upgrade to Java 17 last year, expect to see
the benefits coming down the pipeline as the engineers working on Trino are able
to take advantage of the latest and greatest features in Java.&lt;/p&gt;

&lt;h2 id=&quot;trino-ecosystem-updates&quot;&gt;Trino ecosystem updates&lt;/h2&gt;

&lt;p&gt;There’s more to Trino than Trino itself! With community updates and other
technologies integrating with Trino, the number of ways you can access and use
Trino are always growing. And the number of people taking care of Trino is
growing, too.&lt;/p&gt;

&lt;h3 id=&quot;python-clients&quot;&gt;Python clients&lt;/h3&gt;

&lt;p&gt;Trino’s own &lt;a href=&quot;https://github.com/trinodb/trino-python-client&quot;&gt;Python client&lt;/a&gt; saw
heavy development in 2023. It was updated to support SQLAlchemy 2.0 and had type
support fully fleshed out, making it a robust, free, and open-source tool for
running your Trino queries.&lt;/p&gt;

&lt;p&gt;Elsewhere in the Python ecosystem, we heard from
both &lt;a href=&quot;https://youtu.be/aKhI1Phfn-o&quot;&gt;Fugue&lt;/a&gt;
and &lt;a href=&quot;https://youtu.be/JMUtPl-cMRc&quot;&gt;Ibis&lt;/a&gt; at Trino Fest, two different Python
clients that integrate Trino with Python in new ways. Fugue is a wrapper that
helps integrate with other Python tools and clients, and Ibis can help convert
your Python code into SQL queries, making it feasible to be a 100% Python-based
organization that still leverages the speed and power of a SQL query engine like
Trino. We had Phillip Cloud from Voltron Data on
for &lt;a href=&quot;/episodes/49&quot;&gt;an episode of the Trino Community Broadcast&lt;/a&gt; to talk about
Ibis in even more detail.&lt;/p&gt;

&lt;h3 id=&quot;and-other-clients-too&quot;&gt;And other clients, too!&lt;/h3&gt;

&lt;p&gt;Also on the Trino Community Broadcast repping new client support for Trino in
2023 were &lt;a href=&quot;/episodes/45&quot;&gt;Dolphin Scheduler&lt;/a&gt;, &lt;a href=&quot;/episodes/51&quot;&gt;PopSQL&lt;/a&gt;,
and &lt;a href=&quot;/episodes/53&quot;&gt;Coginiti&lt;/a&gt;. Dolphin Scheduler is a workflow orchestrator - and
scheduler! - that can be used to routinely run and coordinate Trino queries.
PopSQL is like Google Drive for SQL, providing a suite of collaborative tools
for editing and working on queries as a team, including synchronous query
editing, storing query history, and a robust commenting and feedback system.
Coginiti is a high-powered data workspace that connects to Trino among many
other things, supporting a host of powerful features that make it easier to
reuse code and snippets of queries, as well as featuring embedded variables to
minimize redundancy. If you want to learn more about any of these clients, click
in on the links above to check out the Trino Community Broadcast where we went
in-depth with them!&lt;/p&gt;

&lt;p&gt;Oh, and don’t forget
the &lt;a href=&quot;https://regadas.dev/trino-js-client/&quot;&gt;Trino Typescript client&lt;/a&gt;, for when
you want to work at the beautiful intersection of web development and accessing
tons of data.&lt;/p&gt;

&lt;h3 id=&quot;new-maintainers&quot;&gt;New maintainers&lt;/h3&gt;

&lt;p&gt;Trino saw three new maintainers added to its ranks this year:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/mosabua&quot;&gt;Manfred Moser&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/pettyjamesm&quot;&gt;James Petty&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/wendigo&quot;&gt;Mateusz Gajewski&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred even took the liberty of updating the website’s
&lt;a href=&quot;/development/roles&quot;&gt;roles page&lt;/a&gt; to list out all our maintainers. Thank you to
them for their dedication to making Trino the best it can be, and
congratulations to them on their shiny maintainer titles!&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/blog/2023/01/10/trino-2022-the-rabbit-reflects.html&quot;&gt;2022 had been the busiest year in Trino’s history&lt;/a&gt;,
but 2023 has managed to surpass it. If you’re interested in contributing to
Trino, make sure to check it out on &lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;GitHub&lt;/a&gt;.
Even if you’re not interested in contributing, give us a
&lt;a href=&quot;https://trino.io/star&quot;&gt;star&lt;/a&gt; on GitHub, anyway! It’s been a great year for
Commander Bun Bun, and we can’t wait to show you what 2024 has in store for
everyone’s favorite data rabbit.&lt;/p&gt;</content>

      
        <author>
          <name>Cole Bowden</name>
        </author>
      

      <summary>If “Wrapped” is good enough for Spotify, it’s good enough for Trino, right? As we look forward to a bright 2024, we can also take a moment to get sentimental, look back at everything we’ve accomplished, and reflect on the progress we’ve made. Commander Bun Bun has been hard at work, so if you haven’t been paying close attention to Trino or want an idea of all that went down in 2023, we’re happy to present you with an end of year recap. We’ll be exploring what’s gone on in the community, on development, the events we’ve hosted, and discuss the cool new features and technologies you can use when you’re running Trino.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/2023-review/wrapped.png" />
      
    </entry>
  
    <entry>
      <title>Trino Summit 2023 recap</title>
      <link href="https://trino.io/blog/2023/12/18/trino-summit-recap.html" rel="alternate" type="text/html" title="Trino Summit 2023 recap" />
      <published>2023-12-18T00:00:00+00:00</published>
      <updated>2023-12-18T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/12/18/trino-summit-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2023/12/18/trino-summit-recap.html">&lt;p&gt;Two days of non-stop Trino action are done! Last week, Trino Summit 2023
took place virtually as another great community event. Presentations from Trino
experts across the globe showed different use cases and experiences with Trino.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;During the event, our lively audience of over 600 attendees asked questions
of the speakers and each other in the chat, and we had fun with Trino trivia questions.&lt;/p&gt;

&lt;p&gt;We talked about the &lt;a href=&quot;/blog/2023/11/09/routines.html&quot;&gt;SQL routine competition&lt;/a&gt; and announced Kevin Liu from Stripe and Jan Was from Starburst as the
winners. You can find their submissions in &lt;a href=&quot;https://trino.io/docs/current/routines/examples.html&quot; target=&quot;_blank&quot;&gt;the examples page for SQL
routines&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Starburst announced their &lt;a href=&quot;https://www.starburst.io/community/trino-champions/&quot; target=&quot;_blank&quot;&gt;Trino Champions
program&lt;/a&gt;.
Kevin and Jan are the first recipients of the award and will receive their swag
packs soon. Going forward, new champions will be crowned regularly, and
Starburst is &lt;a href=&quot;https://www.starburst.io/community/trino-champions/&quot; target=&quot;_blank&quot;&gt;looking for
nominations&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;sessions&quot;&gt;Sessions&lt;/h2&gt;

&lt;p&gt;If you missed out on the event, the following list of all the sessions provides
links to the recordings. Over time, we will follow up with blog posts about each
session with the presentation and further details.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=pXdZqpwgdxA&quot; target=&quot;_blank&quot;&gt;The mountains Trino climbed in 2023&lt;/a&gt;
presented by Martin Traverso from
&lt;a href=&quot;https://www.starburst.io&quot; target=&quot;_blank&quot;&gt;Starburst&lt;/a&gt;.
&lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2023/mountains-trino-climbed.pdf&quot;&gt;(Slides)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=qZejzyxT2fo&quot; target=&quot;_blank&quot;&gt;Trino workload management&lt;/a&gt;
presented by Jinyang Li and Tingting Ma from
&lt;a href=&quot;https://www.airbnb.com&quot; target=&quot;_blank&quot;&gt;Airbnb&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=FaytoXxKXOQ&quot; target=&quot;_blank&quot;&gt;Secure exchange SQL: Building a privacy-preserving data clean room service over Trino&lt;/a&gt;
presented by Taro Saito from
&lt;a href=&quot;https://www.treasuredata.com/&quot; target=&quot;_blank&quot;&gt;Treasure Data&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=MYLepz-hIys&quot; target=&quot;_blank&quot;&gt;Powering Bazaar’s business operation using Trino&lt;/a&gt;
presented by Umair Abro from
&lt;a href=&quot;https://www.youtube.com/watch?v=MYLepz-hIys&quot; target=&quot;_blank&quot;&gt;Bazaar&lt;/a&gt;.
&lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2023/powering-bazaar-business-operations.pdf&quot;&gt;(Slides)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=qUT-uaEE-Fk&quot; target=&quot;_blank&quot;&gt;Efficient Kappa architecture with Trino&lt;/a&gt;
presented by Sanghyun Lee from
&lt;a href=&quot;https://www.sktelecom.com&quot; target=&quot;_blank&quot;&gt;SK Telecom&lt;/a&gt;.
&lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2023/efficient-kappa-architecture-sk-telecom.pdf&quot;&gt;(Slides)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=2qwBcKmQSn0&quot; target=&quot;_blank&quot;&gt;Many clusters and only one gateway&lt;/a&gt;
presented by Will Morrison (&lt;a href=&quot;https://www.starburst.io/&quot; target=&quot;_blank&quot;&gt;Starburst&lt;/a&gt;),
Andy Su (&lt;a href=&quot;https://www.techatbloomberg.com/&quot; target=&quot;_blank&quot;&gt;Bloomberg&lt;/a&gt;), and
Jaeho Yoo (&lt;a href=&quot;https://www.naver.com&quot; target=&quot;_blank&quot;&gt;Naver&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=dg16M6bFN2w&quot; target=&quot;_blank&quot;&gt;Trino upgrade at exabytes scale&lt;/a&gt;
presented by Ramanathan Ramu from
&lt;a href=&quot;https://www.linkedin.com/&quot; target=&quot;_blank&quot;&gt;LinkedIn&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=ooUGJ6BYt90&quot; target=&quot;_blank&quot;&gt;Powering data marts through Trino Iceberg connector at Zomato&lt;/a&gt;
presented by Shubham Gupta and Bhanu Mittal from
&lt;a href=&quot;https://www.zomato.com/&quot; target=&quot;_blank&quot;&gt;Zomato&lt;/a&gt;.
&lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2023/powering-data-marts-at-zomato.pdf&quot;&gt;(Slides)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=RC8K6pIvAtI&quot; target=&quot;_blank&quot;&gt;Pinterest journey to achieving 2x efficiency improvement on Trino&lt;/a&gt;
presented by Carlos Benavides from
&lt;a href=&quot;https://www.pinterest.com/&quot; target=&quot;_blank&quot;&gt;Pinterest&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=6KspMwCbOfI&quot; target=&quot;_blank&quot;&gt;Avoiding pitfalls with query federation in data lakehouses&lt;/a&gt;
presented by Edward Morgan and
Bhaarat Sharma from &lt;a href=&quot;https://teamraft.com/&quot; target=&quot;_blank&quot;&gt;Raft&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=rmotnvBWXv4&quot; target=&quot;_blank&quot;&gt;Adopting Trino’s fault-tolerant execution mode at Quora&lt;/a&gt;
presented by Gabriel Fernandes de Oliveira and Yifan Pan from
&lt;a href=&quot;https://www.quora.com/&quot; target=&quot;_blank&quot;&gt;Quora&lt;/a&gt;.
&lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2023/fte-mode-at-quora.pdf&quot;&gt;(Slides)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=fYCoI8kkdRQ&quot; target=&quot;_blank&quot;&gt;Inherent race condition in Guava Cache invalidation and how to escape it&lt;/a&gt;
presented by Piotr Findeisen from
&lt;a href=&quot;https://www.starburst.io/&quot; target=&quot;_blank&quot;&gt;Starburst&lt;/a&gt;.
&lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2023/inherent-race-in-cache-invalidation.pdf&quot;&gt;(Slides)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=LynEiteEtPk&quot; target=&quot;_blank&quot;&gt;Unstructured data analysis using polymorphic table function in Trino&lt;/a&gt;
presented by YongHwan Lee from
&lt;a href=&quot;https://www.sktelecom.com&quot; target=&quot;_blank&quot;&gt;SK Telecom&lt;/a&gt;.
&lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2023/polymorphic-table-function-sk-telecom.pdf&quot;&gt;(Slides)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=_wocf0NK6Kc&quot; target=&quot;_blank&quot;&gt;Transitioning to Trino: Evaluating Lyft’s query engine capabilities&lt;/a&gt;
presented by Charles Song from
&lt;a href=&quot;https://www.lyft.com/&quot; target=&quot;_blank&quot;&gt;Lyft&lt;/a&gt;.
&lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2023/transition-to-trino-at-lyft.pdf&quot;&gt;(Slides)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=idk0GMxs8vE&quot; target=&quot;_blank&quot;&gt;Visualizing Trino with Apache Superset&lt;/a&gt;
presented by Evan Rusackas from
&lt;a href=&quot;https://preset.io/&quot; target=&quot;_blank&quot;&gt;Preset&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=fbqqapQbAv0&quot; target=&quot;_blank&quot;&gt;Trino OPA authorizer - An open source love story&lt;/a&gt;
presented by Sönke Liebau (&lt;a href=&quot;https://stackable.tech/&quot; target=&quot;_blank&quot;&gt;Stackable&lt;/a&gt;)
and Pablo Arteaga (&lt;a href=&quot;https://www.techatbloomberg.com/&quot; target=&quot;_blank&quot;&gt;Bloomberg&lt;/a&gt;).
&lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2023/opa-trino.pdf&quot;&gt;(Slides)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=RutbCY8i22Q&quot; target=&quot;_blank&quot;&gt;VAST database catalog&lt;/a&gt;
presented by Jason Russler from
&lt;a href=&quot;https://vastdata.com/&quot; target=&quot;_blank&quot;&gt;VAST&lt;/a&gt;.
&lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2023/vast-connector.pdf&quot;&gt;(Slides)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=ZJExdGeC4eA&quot; target=&quot;_blank&quot;&gt;Support for Parquet decryption and aggregate pushdown In Trino&lt;/a&gt;
presented by Amogh Margoor and Manish Malhotra from
&lt;a href=&quot;https://www.apple.com/&quot; target=&quot;_blank&quot;&gt;Apple&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;shout-outs&quot;&gt;Shout outs&lt;/h2&gt;

&lt;p&gt;Shout outs for all their work with the speakers and organizing the event go to
Anna Schibli, Mandy Darnell, and Monica Miller from the Trino Summit event team,
and everyone else at Starburst who helped make this event a success.&lt;/p&gt;

&lt;p&gt;Special thanks for making this Trino Software Foundation event a reality go out
to our hosting sponsor &lt;a href=&quot;https://starburst.io&quot; target=&quot;_blank&quot;&gt;Starburst&lt;/a&gt;, and
our other sponsors &lt;a href=&quot;https://www.alluxio.io/&quot; target=&quot;_blank&quot;&gt;Alluxio&lt;/a&gt;,
&lt;a href=&quot;https://www.coginiti.co&quot; target=&quot;_blank&quot;&gt;Coginiti&lt;/a&gt; and &lt;a href=&quot;https://www.montecarlodata.com/&quot; target=&quot;_blank&quot;&gt;Monte
Carlo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We will see you all at future Trino Contributor Congregations, Trino Fest 2024,
Trino Summit 2024, and &lt;a href=&quot;https://trino.io/community.html#events&quot;&gt;other events related to Trino&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;sponsors&quot;&gt;Sponsors&lt;/h2&gt;

&lt;div class=&quot;container&quot;&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.starburst.io/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/starburst.png&quot; title=&quot;Starburst, event host and organizer&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.alluxio.io/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/alluxio-small.png&quot; title=&quot;Alluxio, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.coginiti.co&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/coginiti-small.png&quot; title=&quot;Coginiti, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.montecarlodata.com/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/monte-carlo-small.png&quot; title=&quot;Monte Carlo, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;</content>

      
        <author>
          <name>Manfred Moser, Cole Bowden</name>
        </author>
      

      <summary>Two days of non-stop Trino action are done! Last week, Trino Summit 2023 took place virtually as another great community event. Presentations from Trino experts across the globe showed different use cases and experiences with Trino.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2023/summit-logo.png" />
      
    </entry>
  
    <entry>
      <title>Final reminder for Trino Summit 2023</title>
      <link href="https://trino.io/blog/2023/12/11/trino-summit-reminder.html" rel="alternate" type="text/html" title="Final reminder for Trino Summit 2023" />
      <published>2023-12-11T00:00:00+00:00</published>
      <updated>2023-12-11T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/12/11/trino-summit-reminder</id>
      <content type="html" xml:base="https://trino.io/blog/2023/12/11/trino-summit-reminder.html">&lt;p&gt;Are you ready? &lt;a href=&quot;https://www.starburst.io/info/trinosummit2023/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=NORAM-FY24-Q4-EV-Trino-Summit-2023&amp;amp;utm_content=final-reg-blog&quot;&gt;Trino Summit
2023&lt;/a&gt;
is just two days away, and our lineup of speakers, sponsors, and activities is
truly amazing. Make sure to register and join us live.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Over the two days of the event we will enjoy sessions with our speakers from
numerous well-known and respected companies, including Airbnb, Apple, Bloomberg,
LinkedIn, Pinterest, SK Telecom, and others. Look at the &lt;a href=&quot;https://www.starburst.io/info/trinosummit2023/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=NORAM-FY24-Q4-EV-Trino-Summit-2023&amp;amp;utm_content=final-reg-blog&quot;&gt;full lineup for
details&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Just like &lt;a href=&quot;/blog/2023/06/20/trino-fest-2023-recap.html&quot;&gt;last time at Trino Fest 2023&lt;/a&gt; we will have some fun Trino quiz
questions for you all to puzzle over, and are ready to reward your fast and
correct answers.&lt;/p&gt;

&lt;p&gt;Cole Bowden and I will guide you through the two days of the event as hosts. The
chat on the event platform as well as the Trino slack channel for the event will
allow you to talk to other community members and the presenters, ask questions,
and follow up for more answers and discussions.&lt;/p&gt;

&lt;p&gt;We will announce the winning entries for our SQL routine competition and look a
bit at the implementation. And if you are keen to write one, there is still
time to share your best SQL routine. You might be among the winners.&lt;/p&gt;

&lt;p&gt;So you see - Trino Summit 2023 will be great. The event is virtual and free, so
there really is no excuse for missing out:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trinosummit2023/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=NORAM-FY24-Q4-EV-Trino-Summit-2023&amp;amp;utm_content=final-reg-blog&quot;&gt;
        Register for Trino Summit 2023
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;Special thanks for their help with making this Trino Software Foundation event a
reality go out to our hosting sponsor &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;, and our
other sponsors &lt;a href=&quot;https://www.alluxio.io/&quot;&gt;Alluxio&lt;/a&gt;,
&lt;a href=&quot;https://www.coginiti.co&quot;&gt;Coginiti&lt;/a&gt; and &lt;a href=&quot;https://www.montecarlodata.com/&quot;&gt;Monte
Carlo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We all look forward to seeing you in just two days. So exciting!&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Are you ready? Trino Summit 2023 is just two days away, and our lineup of speakers, sponsors, and activities is truly amazing. Make sure to register and join us live.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2023/summit-logo.png" />
      
    </entry>
  
    <entry>
      <title>Functions with SQL and Trino</title>
      <link href="https://trino.io/blog/2023/11/29/sql-training-4.html" rel="alternate" type="text/html" title="Functions with SQL and Trino" />
      <published>2023-11-29T00:00:00+00:00</published>
      <updated>2023-11-29T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/11/29/sql-training-4</id>
      <content type="html" xml:base="https://trino.io/blog/2023/11/29/sql-training-4.html">&lt;p&gt;In the fourth part of our training series &lt;a href=&quot;/blog/2023/09/27/training-series.html&quot;&gt;Learning SQL with Trino from the
experts&lt;/a&gt; Martin Traverso, Dain
Sundstrom and I took on the big topic of aggregation functions, and covered the
two new and exciting features of table functions and SQL routines.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;The recording of the event allows you to watch it all as if you attended live,
jump to specific sections as desired, or pause while you follow along with the
demos:&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/1siAYR6BzzY&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;Following are a few specific timestamps for interesting topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=1siAYR6BzzY&amp;amp;t=582&quot;&gt;First simple aggregation example&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=1siAYR6BzzY&amp;amp;t=2384&quot;&gt;Table functions introduction&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=1siAYR6BzzY&amp;amp;t=3093&quot;&gt;Query pass through table function&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=1siAYR6BzzY&amp;amp;t=3442&quot;&gt;SQL routine use cases&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=1siAYR6BzzY&amp;amp;t=4355&quot;&gt;Human readable days example&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More timestamps for every part of the talk are in the description on
YouTube. Also make sure you take advantage of these additional resources:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/assets/blog/sql-training-series-starburst-2023.pdf&quot;&gt;General overview slide deck for the
series&lt;/a&gt;,
with links to resources like our &lt;a href=&quot;/slack.html&quot;&gt;community
chat&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Slide deck for &lt;a href=&quot;https://trinodb.github.io/presentations/presentations/sql-functions/index.html&quot;&gt;Functions with SQL and
Trino&lt;/a&gt;,
including files with all SQL statements, configurations and more ready to go&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/trino-the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With this last episode of the series for 2023 we are ready to showcase Trino
with an &lt;a href=&quot;/blog/2023/11/22/trino-summit-2023-nears-lineup.html&quot;&gt;amazing lineup of speakers and sessions&lt;/a&gt; at the upcoming Trino Summit 2023.
Register now and catch all the presenters live for questions in the chat:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trinosummit2023/&quot;&gt;
        Register for Trino Summit 2023
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;See you at Trino Summit 2023, upcoming &lt;a href=&quot;/broadcast/index.html&quot;&gt;Trino Community Broadcast
episodes&lt;/a&gt;, and maybe even more SQL
training in 2024.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>In the fourth part of our training series Learning SQL with Trino from the experts Martin Traverso, Dain Sundstrom and I took on the big topic of aggregation functions, and covered the two new and exciting features of table functions and SQL routines.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/trino-sql.png" />
      
    </entry>
  
    <entry>
      <title>Trino Summit 2023 nears with an awesome lineup</title>
      <link href="https://trino.io/blog/2023/11/22/trino-summit-2023-nears-lineup.html" rel="alternate" type="text/html" title="Trino Summit 2023 nears with an awesome lineup" />
      <published>2023-11-22T00:00:00+00:00</published>
      <updated>2023-11-22T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/11/22/trino-summit-2023-nears-lineup</id>
      <content type="html" xml:base="https://trino.io/blog/2023/11/22/trino-summit-2023-nears-lineup.html">&lt;p&gt;As winter nears, the days may be getting shorter, but so is the wait until
Trino Summit 2023! It’ll be here before you know it on December 13th and 14th.
We’ve got a packed speaker lineup full of exciting talks, and we’re ready to
share some details with the Trino community today. Read on for a preview of some
talks, and if you’re interested in attending, make sure to…&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trinosummit2023/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=NORAM-FY24-Q4-EV-Trino-Summit-2023&amp;amp;utm_content=blog-lineup-announcement&quot;&gt;
        Register!
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;!--more--&gt;

&lt;p&gt;So, who’s going to be talking at Trino Summit? Here’s a quick rundown of the
talks coming in from various companies.&lt;/p&gt;

&lt;h2 id=&quot;starburst-the-mountains-trino-climbed-in-2023&quot;&gt;Starburst: The mountains Trino climbed in 2023&lt;/h2&gt;

&lt;p&gt;As always, our keynote will come from Martin Traverso, Trino co-founder and
co-CTO at Starburst. He’ll be giving a project update on everything exciting
that’s happened in Trino since
&lt;a href=&quot;/blog/2023/06/20/trino-fest-2023-recap.html&quot;&gt;Trino Fest&lt;/a&gt;, as well as a
sneak peek at the roadmap for features coming to Trino in 2024. It’s one of the
best ways to keep up with the ongoing developments in the Trino community, and
you won’t want to miss it.&lt;/p&gt;

&lt;h2 id=&quot;starburst-bloomberg-and-naver-many-clusters-and-only-one-gateway&quot;&gt;Starburst, Bloomberg, and Naver: Many clusters and only one gateway&lt;/h2&gt;

&lt;p&gt;A second talk, which is a collaboration among Starburst, Bloomberg, and Naver,
will be exploring the new &lt;a href=&quot;https://github.com/trinodb/trino-gateway&quot;&gt;Trino Gateway&lt;/a&gt;,
a proxy and load-balancer that has been in the works for a long while in the
Trino community. There’s no more need to worry about noisy neighbors or huge
queries bullying out the quick and small workloads - with multiple clusters and
the Trino Gateway on top, users interact with Trino like normal, but under the
hood, queries get routed to available clusters to ensure that the time it takes
to get your insights is shorter than ever before.&lt;/p&gt;

&lt;h2 id=&quot;airbnb-trino-workload-management&quot;&gt;Airbnb: Trino workload management&lt;/h2&gt;

&lt;p&gt;Trino is the main interactive compute engine for offline ad-hoc analytics at
Airbnb. Recently, they’ve redesigned their query workload processing on Trino
clusters, introducing query cost forecasting and workload-aware scheduling
systems. This helps them deliver a more stable and consistent analytics query
service to offline data users at Airbnb, with improved performance and speed.
And they’ll be explaining how they did it!&lt;/p&gt;

&lt;h2 id=&quot;pinterest-journey-to-achieving-2x-efficiency-improvement-on-trino&quot;&gt;Pinterest: Journey to achieving 2x efficiency improvement on Trino&lt;/h2&gt;

&lt;p&gt;Trino usage has been growing at Pinterest each year, which comes with growing
costs and increased demand on the existing Trino clusters. To help reduce costs
and serve their Trino users, the engineering team there has migrated to AWS
Graviton, taken advantage of Trino improvements, consolidated traffic, improved
job scheduling, and worked to optimize their data and metadata formats. The end
result has been a reduction in cost &lt;em&gt;and&lt;/em&gt; an increase in query throughput.
They’ll be sharing the details on the effort it took to make Trino faster and
cheaper at the same time.&lt;/p&gt;

&lt;h2 id=&quot;quora-adopting-trinos-fault-tolerant-execution-mode&quot;&gt;Quora: Adopting Trino’s fault-tolerant execution mode&lt;/h2&gt;

&lt;p&gt;Quora will be covering how they adopted Trino’s fault-tolerant execution mode
to run some of their heaviest ETL jobs. They separate Trino queries
from their main data pipelines in two clusters, one running the FTE mode for
memory-intensive and longer jobs and another without it for lighter, general
pipelines. This separation helped lower query failure rates, improved
the execution time of long queries due to the more flexible autoscaling in
FTE, and provided an alternative to run queries that would otherwise run out of
memory without scaling up the cluster.&lt;/p&gt;

&lt;h2 id=&quot;linkedin-trino-upgrades-at-exabyte-scale&quot;&gt;LinkedIn: Trino upgrades at exabyte scale&lt;/h2&gt;

&lt;p&gt;LinkedIn has been keeping up with Trino releases at an impressive rate, but
getting to that point has required a lot of time, effort, and work on
streamlining the update process. They’ll be discussing the challenges of
breaking changes, applying internal patches, and ensuring that there are no
meaningful performance regressions. They’ve automated much of this, including
implementing a post-commit integration test suite that ensures nothing has
broken, and creating an automated test framework that can validate the
performance of each new Trino release before it deploys to users.&lt;/p&gt;

&lt;h2 id=&quot;ea-migrating-120-million-hms-metadata-records-without-customer-impact&quot;&gt;EA: Migrating 120 million HMS metadata records without customer impact&lt;/h2&gt;

&lt;p&gt;Migrating production databases is a scary task no matter who you are. It’s
scarier when you’re talking about 600+ databases, 35,000+ tables, and over 120
million partitions, all of which you need to migrate while avoiding any customer
impact. EA managed to pull it off with the help of Trino, and they’ll be at
Trino Summit to share how they made it work and what they learned along the way.&lt;/p&gt;

&lt;h2 id=&quot;sk-telecom-efficient-kappa-architecture-with-trino&quot;&gt;SK Telecom: Efficient Kappa architecture with Trino&lt;/h2&gt;

&lt;p&gt;SK Telecom is bringing us two talks this year, as they’ve got a lot going on and
some unique Trino stories to share!&lt;/p&gt;

&lt;p&gt;The first talk will dive into Kappa architecture and the challenges
involved in getting it to run in real-time at the massive scale SK Telecom
needs. They started with Trino’s Kafka connector, but the limitations of that
architecture steered them towards a solution with Flink and Trino’s Iceberg
connector, which they’ll explain. They’ll also be sharing some tips and tricks
for tuning Flink and Iceberg to get the most out of your Trino deployments.&lt;/p&gt;

&lt;h2 id=&quot;sk-telecom-unstructured-data-analysis-using-polymorphic-table-functions-in-trino&quot;&gt;SK Telecom: Unstructured data analysis using polymorphic table functions in Trino&lt;/h2&gt;

&lt;p&gt;The second talk will discuss the challenges of dealing with unstructured data.
Pre-processing is essential for analyzing unstructured data, and it’s difficult
for ordinary users and analysts to process large amounts of unstructured
data at scale. With the power of a custom-built polymorphic table function,
they were able to invoke Python code within Trino to help structure that data
for analysis, solving the problem in a powerful and fascinating way. We’ll get
to hear about polymorphic table functions, how they work in Trino, and how
anyone else may be able to leverage them to solve problems.&lt;/p&gt;

&lt;h2 id=&quot;raft-avoiding-pitfalls-with-query-federation-in-data-lakehouses&quot;&gt;Raft: Avoiding pitfalls with query federation in data lakehouses&lt;/h2&gt;

&lt;p&gt;Raft has partnered with the US Department of Defense to build a data fabric
on top of Delta Lake, Trino, Apache Kafka, and Open Policy Agent (OPA).
This talk will discuss the challenges involved, provide solutions and
considerations for each, and end with a demo of Raft’s data fabric. The talk
will focus on a plugin for Trino, developed by Raft, that uses OPA as a policy
engine to provide fine-grained access control at query time based on a user’s
JWT passed along with the query.&lt;/p&gt;

&lt;h2 id=&quot;treasure-data-secure-exchange-sql&quot;&gt;Treasure Data: Secure exchange SQL&lt;/h2&gt;

&lt;p&gt;Secure Exchange SQL is a production data clean room service deployed at Treasure
Data, which leverages Trino and differential privacy technology to enable
cross-company data analysis while mitigating the risk of privacy breaches.
In their session, they’ll introduce the concept of differential privacy and
discuss the privacy protection methods that need to be implemented during SQL
processing. To minimize changes to Trino’s codebase, they employed approaches of
SQL rewriting and validation at the logical plan level. They’ll explain these
methods and provide some practical use cases of their data clean room.&lt;/p&gt;

&lt;h2 id=&quot;zomato-powering-data-marts-through-the-trino-iceberg-connector&quot;&gt;Zomato: Powering data marts through the Trino Iceberg connector&lt;/h2&gt;

&lt;p&gt;It’s a common theme in the Trino community - Zomato recently migrated from a
traditional data warehouse to a Trino-powered data lakehouse in conjunction with
Iceberg. They’ll be discussing how this has enabled their analytics to run
better than ever, including periodic updates to their data marts and tackling
the challenges involved in maintaining Iceberg tables.&lt;/p&gt;

&lt;h2 id=&quot;bazaar-powering-bazaars-business-operations-using-trino&quot;&gt;Bazaar: Powering Bazaar’s business operations using Trino&lt;/h2&gt;

&lt;p&gt;Bazaar’s talk will discuss how they leverage Trino’s capabilities to optimize
data analysis and support data-driven decision-making. The talk specifically
explores real-time data querying across multiple sources and
performance optimization, illustrating Trino’s role in Bazaar’s data-centric
strategies. This presentation provides in-depth insights for individuals
well-versed in Trino, shedding light on the platform’s transformative impact on
enhancing e-commerce operations.&lt;/p&gt;

&lt;h2 id=&quot;preset-visualizing-trino-with-superset&quot;&gt;Preset: Visualizing Trino with Superset&lt;/h2&gt;

&lt;p&gt;Preset will be diving into the “last mile” of the modern data stack and
showing you how to query and visualize data pulled from Trino with Apache Superset
and/or Preset. Specifically, they’ll discuss things like Trino’s federated query
support (a common wish for Superset users) and how Superset can support
near-real-time analytics for Trino users. They’ll also give a demo of connecting
to Trino, building SQL queries, designing charts and dashboards, and other ways
to gain insight and stay on top of your data.&lt;/p&gt;

&lt;h2 id=&quot;vast-the-vast-database-catalog&quot;&gt;VAST: The VAST database catalog&lt;/h2&gt;

&lt;p&gt;The VAST Database connector for Trino was open-sourced this year! They’ll be
discussing the architecture of VAST and the connector, the purpose and major use
cases for it, and demonstrating the workflows surrounding the VAST Database in the
Trino ecosystem.&lt;/p&gt;

&lt;h2 id=&quot;and-still-more-to-come&quot;&gt;And still more to come!&lt;/h2&gt;

&lt;p&gt;Believe it or not, the great lineup we’ve gone over here still isn’t every talk.
Stay tuned here or on the &lt;a href=&quot;https://trino.io/slack&quot;&gt;Trino Slack&lt;/a&gt; to hear about the
other speakers as they’re announced. And of course, if you want to catch all
these talks live, engage in chat, and have an opportunity to ask questions, make
sure to &lt;a href=&quot;https://www.starburst.io/info/trinosummit2023/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=NORAM-FY24-Q4-EV-Trino-Summit-2023&amp;amp;utm_content=blog-lineup-announcement&quot;&gt;register to attend&lt;/a&gt;.&lt;/p&gt;</content>

      
        <author>
          <name>Cole Bowden</name>
        </author>
      

      <summary>As winter nears, the days may be getting shorter, but so is the wait until Trino Summit 2023! It’ll be here before you know it on December 13th and 14th. We’ve got a packed speaker lineup full of exciting talks, and we’re ready to share some details with the Trino community today. Read on for a preview of some talks, and if you’re interested in attending, make sure to… Register!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2023/lineup-blog-banner.png" />
      
    </entry>
  
    <entry>
      <title>Data management with SQL and Trino</title>
      <link href="https://trino.io/blog/2023/11/15/sql-training-3.html" rel="alternate" type="text/html" title="Data management with SQL and Trino" />
      <published>2023-11-15T00:00:00+00:00</published>
      <updated>2023-11-15T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/11/15/sql-training-3</id>
      <content type="html" xml:base="https://trino.io/blog/2023/11/15/sql-training-3.html">&lt;p&gt;In the third part of our training series &lt;a href=&quot;/blog/2023/09/27/training-series.html&quot;&gt;Learning SQL with Trino from the
experts&lt;/a&gt; David Phillips and I changed
gears from reading data and performing analytics with Trino. We looked at the
topic of write operations. We covered creating catalogs, schemas, and tables,
then inserting and updating data, and talked about related topics such as data
source and connector support.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;The recording of the event allows you to watch it all as if you attended live,
jump to specific sections as desired, or pause while you follow along with the
demos:&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/q2uyV7mBKVc&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;The full timestamps for every part of the talk are in the description on
YouTube.&lt;/p&gt;

&lt;p&gt;Also make sure you take advantage of these additional resources:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/assets/blog/sql-training-series-starburst-2023.pdf&quot;&gt;General overview slide deck for the
series&lt;/a&gt;,
with links to resources like our &lt;a href=&quot;/slack.html&quot;&gt;community
chat&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Slide deck for &lt;a href=&quot;https://trinodb.github.io/presentations/presentations/sql-data-mgt/index.html&quot;&gt;Data management with SQL and
Trino&lt;/a&gt;,
including a file with all SQL statements ready to go&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/trino-the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One more episode to go this year, and then we are going to celebrate our users
at Trino Summit 2023. Register now and catch us live for both events:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trino-training-series/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=Global-FY24-Trino-Training-Series&amp;amp;utm_content=1&quot;&gt;
        Register now
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;See you next time. I am excited to show you more about &lt;a href=&quot;/blog/2023/11/09/routines.html&quot;&gt;SQL routines&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>In the third part of our training series Learning SQL with Trino from the experts David Phillips and I changed gears from reading data and performing analytics with Trino. We looked at the topic of write operations. We covered creating catalogs, schemas, and tables, then inserting and updating data, and talked about related topics such as data source and connector support.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/trino-sql.png" />
      
    </entry>
  
    <entry>
      <title>Share your best Trino SQL routine</title>
      <link href="https://trino.io/blog/2023/11/09/routines.html" rel="alternate" type="text/html" title="Share your best Trino SQL routine" />
      <published>2023-11-09T00:00:00+00:00</published>
      <updated>2023-11-09T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/11/09/routines</id>
      <content type="html" xml:base="https://trino.io/blog/2023/11/09/routines.html">&lt;p&gt;We want to see the best &lt;a href=&quot;/docs/current/routines.html&quot;&gt;SQL routines&lt;/a&gt;
you can write, feature them as &lt;a href=&quot;/docs/current/routines/examples.html&quot;&gt;examples in the
documentation&lt;/a&gt;, and send you
some goodies as a reward!&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;With the recent &lt;a href=&quot;/docs/current/release/release-431.html&quot;&gt;Trino 431
release&lt;/a&gt; we shipped a
feature that has been awaited by many Trino users for a long, long time. &lt;a href=&quot;/docs/current/routines.html&quot;&gt;SQL
routines&lt;/a&gt; are an easy way to define your
own procedural, custom functions. All users on your Trino instance can then use
such a function in their queries to simplify them.&lt;/p&gt;

&lt;p&gt;Writing a routine in SQL in your client tool is an alternative
to the old way of creating a custom plugin in Java,
compiling it, and deploying the binary in your cluster. The time it takes
to get a new function working has gone from hours to minutes and a few commands!&lt;/p&gt;

&lt;p&gt;Our documentation includes details for all the supported statements:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BEGIN&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CASE&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DECLARE&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FUNCTION&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IF&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ITERATE&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LEAVE&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LOOP&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;REPEAT&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RETURN&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SET&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WHILE&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With the memory connector and the Hive connector supporting routine storage, you
can use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE FUNCTION&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DROP FUNCTION&lt;/code&gt;, so that everyone using the
cluster has access to your routines.&lt;/p&gt;
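
&lt;p&gt;As a minimal sketch of what such a catalog-stored routine can look like (the
catalog and schema names in &lt;code&gt;example.default&lt;/code&gt; are assumptions for
illustration, not from the release):&lt;/p&gt;

```sql
-- Hypothetical routine; assumes a catalog named "example" backed by a
-- connector with routine storage support, such as the memory connector.
CREATE FUNCTION example.default.uppercase_trimmed(input varchar)
RETURNS varchar
BEGIN
  RETURN upper(trim(input));
END;

-- Everyone on the cluster can then call the routine in their queries:
SELECT example.default.uppercase_trimmed('  trino ');
```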

&lt;p&gt;The unit tests and our &lt;a href=&quot;/docs/current/routines/examples.html&quot;&gt;examples
documentation&lt;/a&gt; contain a
number of routines that scratch the surface of what is possible. Now, we are
looking for you to help us improve the documentation and maybe even find some
bugs. So here is what we are asking from you:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Upgrade your Trino cluster, CLI, and other clients to 431 or newer. Support in
client tools may vary.&lt;/li&gt;
  &lt;li&gt;Learn from the documentation and write your own routines.&lt;/li&gt;
  &lt;li&gt;Send us your best SQL routine.
    &lt;ul&gt;
      &lt;li&gt;Create a pull request to add to the &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/docs/src/main/sphinx/routines/examples.md&quot;&gt;examples in the
documentation&lt;/a&gt;
with a new section, and request a review from &lt;a href=&quot;https://github.com/mosabua&quot;&gt;Manfred
(mosabua)&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;Alternatively, &lt;a href=&quot;mailto:manfred@starburst.io&quot;&gt;email the details&lt;/a&gt; and submit a
&lt;a href=&quot;https://github.com/trinodb/cla&quot;&gt;CLA&lt;/a&gt; separately.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Explain the use case, what the routine does, and maybe also how it works.&lt;/li&gt;
  &lt;li&gt;Include the full statement for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE FUNCTION&lt;/code&gt; definition and an example
invocation.&lt;/li&gt;
  &lt;li&gt;Add any necessary tables or data so we can test the function.&lt;/li&gt;
  &lt;li&gt;Reach out to us on the &lt;a href=&quot;/slack.html&quot;&gt;Trino community Slack&lt;/a&gt;,
if you need any help.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We plan to present submissions at &lt;a href=&quot;/blog/2023/09/14/trino-summit-2023-announcement.html&quot;&gt;Trino Summit 2023&lt;/a&gt;, write a blog post, add them to
the documentation, and &lt;a href=&quot;https://www.starburst.io/&quot;&gt;Starburst&lt;/a&gt; will send a cool
reward for the ten best entries.&lt;/p&gt;

&lt;p&gt;Also, if you have more great Trino usage to talk about and share, we would love
to see your &lt;a href=&quot;https://sessionize.com/trino-summit-2023/&quot;&gt;speaker proposal for Trino
Summit&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We look forward to seeing many great submissions from you all.&lt;/p&gt;

&lt;p&gt;See you at Trino Summit 2023, and don’t forget to
&lt;a href=&quot;https://www.starburst.io/info/trinosummit2023/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=NORAM-FY24-Q4-EV-Trino-Summit-2023&amp;amp;utm_content=blog-1&quot;&gt;register&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Martin, Dain, David, and Manfred&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Martin Traverso, Dain Sundstrom, David Phillips, Manfred Moser</name>
        </author>
      

      <summary>We want to see the best SQL routines you can write, feature them as examples in the documentation, and send you some goodies as a reward!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/trino-sql-routine.png" />
      
    </entry>
  
    <entry>
      <title>Trino is moving to Java 21</title>
      <link href="https://trino.io/blog/2023/11/03/java-21.html" rel="alternate" type="text/html" title="Trino is moving to Java 21" />
      <published>2023-11-03T00:00:00+00:00</published>
      <updated>2023-11-03T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/11/03/java-21</id>
      <content type="html" xml:base="https://trino.io/blog/2023/11/03/java-21.html">&lt;p&gt;We’re excited to announce that as of version 432, Trino can run with Java 21. In
fact, the Trino Docker image uses Java 21 now. We have done upgrades to newer
Java LTS versions successfully before when we upgraded to Java 11 and then &lt;a href=&quot;/blog/2022/07/14/trino-updates-to-java-17.html&quot;&gt;Java
17 with Trino 390&lt;/a&gt;. Each
time the improvements to the JVM runtime, the garbage collectors, the involved
libraries, and the dependencies resulted in performance gains that came nearly
for free.&lt;/p&gt;

&lt;p&gt;And each time we were able to take advantage of new language constructs and
standard libraries to improve the codebase for all contributors and maintainers
of the project.&lt;/p&gt;

&lt;p&gt;Now it is time to do it again.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;In September, &lt;a href=&quot;https://blogs.oracle.com/java/post/the-arrival-of-java-21&quot;&gt;Java 21 was
released&lt;/a&gt; as the
newest long-term support version. The &lt;a href=&quot;https://www.oracle.com/java/technologies/javase/21all-relnotes.html&quot;&gt;consolidated release
notes&lt;/a&gt; are
truly impressive when it comes to breadth and depth of improvements throughout
the runtime, the standard libraries, the included tools, and the overall system.&lt;/p&gt;

&lt;p&gt;Java 21 provides numerous great opportunities to improve Trino. Even without
many code changes, the performance benefits can have a significant impact on the
cost of running a Trino cluster.&lt;/p&gt;

&lt;p&gt;Taking it one step further, into the codebase and the libraries we use, we are
able to move our performance work to the next level. &lt;a href=&quot;https://github.com/trinodb/trino/issues/14237&quot;&gt;Project
Hummingbird&lt;/a&gt;, our performance
fine-tuning initiative, is buzzing already. &lt;a href=&quot;https://github.com/dain&quot;&gt;Dain Sundstrom&lt;/a&gt; has again shipped some great improvements recently. Just
like with our Java 17 upgrade, &lt;a href=&quot;https://github.com/wendigo&quot;&gt;Mateusz Gajewski&lt;/a&gt;
has been of critical importance to pull all the necessary changes together.&lt;/p&gt;

&lt;p&gt;With the &lt;a href=&quot;https://trino.io/docs/current/release/release-432.html&quot;&gt;Trino 432
release&lt;/a&gt; we have now
made the next big step. The Trino Docker image was changed to use the &lt;a href=&quot;https://adoptium.net/temurin/releases/&quot;&gt;Eclipse
Temurin&lt;/a&gt; distribution of Java 21. We
have been running our test suites with Java 21 for quite some time and all looks
good. With this release, you are now able to easily test Trino with Java 21.
Just use the Docker container in your deployment or testing, with your own
pipeline or with the &lt;a href=&quot;https://github.com/trinodb/charts&quot;&gt;Trino Helm charts&lt;/a&gt;. The
new version 0.14.0 of the chart already uses the right JVM configuration and
Trino 432 by default.&lt;/p&gt;

&lt;p&gt;Our plan is to make Java 21 the required runtime and move towards adopting the
new language features and libraries. However, before we do that, we want your
input. Are you ready to move to Java 21 for Trino? Did you do some testing with
it already? Are there any issues you encountered? We want to know all about your
experience. Find us on the Trino community chat and ping us in the &lt;a href=&quot;https://trinodb.slack.com/archives/CP1MUNEUX&quot;&gt;#dev
channel&lt;/a&gt;. Or leave comments in our
&lt;a href=&quot;https://github.com/trinodb/trino/issues/17017&quot;&gt;Java 21 tracking issue&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We want to hear from you. Any input and feedback is welcome.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Update from 11 Jan 2024:&lt;/strong&gt;
The release of &lt;a href=&quot;https://trino.io/docs/current/release/release-436.html&quot;&gt;Trino 436&lt;/a&gt;
includes the switch to Java 21 as a requirement for running Trino.&lt;/p&gt;
&lt;/blockquote&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>We’re excited to announce that as of version 432, Trino can run with Java 21. In fact, the Trino Docker image uses Java 21 now. We have done upgrades to newer Java LTS versions successfully before when we upgraded to Java 11 and then Java 17 with Trino 390. Each time the improvements to the JVM runtime, the garbage collectors, the involved libraries, and the dependencies resulted in performance gains that came nearly for free. And each time we were able to take advantage of new language constructs and standard libraries to improve the codebase for all contributors and maintainers of the project. Now it is time to do it again.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/images/logos/java-duke-21.png" />
      
    </entry>
  
    <entry>
      <title>Advanced analytics with SQL and Trino</title>
      <link href="https://trino.io/blog/2023/11/01/sql-training-2.html" rel="alternate" type="text/html" title="Advanced analytics with SQL and Trino" />
      <published>2023-11-01T00:00:00+00:00</published>
      <updated>2023-11-01T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/11/01/sql-training-2</id>
      <content type="html" xml:base="https://trino.io/blog/2023/11/01/sql-training-2.html">&lt;p&gt;In the second part of our training series &lt;a href=&quot;/blog/2023/09/27/training-series.html&quot;&gt;Learning SQL with Trino from the
experts&lt;/a&gt; Martin Traverso and I built
on top of the foundational knowledge from the &lt;a href=&quot;/blog/2023/10/18/sql-training-1.html&quot;&gt;first training session&lt;/a&gt;. We continued to learn more about data
types and working with them, including the important strings, numeric, temporal,
and JSON types.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;The recording of the event allows you to watch it all as if you attended live,
jump to specific sections as desired, or pause while you follow along with the
demos:&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/S-mfueDmXds&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;Following are a couple of timestamps for specific
topics of interest:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=S-mfueDmXds&amp;amp;t=601s&quot;&gt;Temporal data types&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=S-mfueDmXds&amp;amp;t=1920s&quot;&gt;Strings&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=S-mfueDmXds&amp;amp;t=2442s&quot;&gt;Numeric types&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=S-mfueDmXds&amp;amp;t=2705s&quot;&gt;URL parsing and more&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=S-mfueDmXds&amp;amp;t=2850s&quot;&gt;JSON&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The full timestamps for every part of the talk are in the description on
YouTube.&lt;/p&gt;

&lt;p&gt;Also make sure you take advantage of these additional resources:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/assets/blog/sql-training-series-starburst-2023.pdf&quot;&gt;General overview slide deck for the
series&lt;/a&gt;,
with links to resources like our &lt;a href=&quot;/slack.html&quot;&gt;community
chat&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Slide deck for &lt;a href=&quot;https://trinodb.github.io/presentations/presentations/sql-adv-analytics/index.html&quot;&gt;Advanced analytics with SQL and
Trino&lt;/a&gt;,
including a file with all SQL statements ready to go&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/trino-the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We are halfway through the series, and there is lots more to cover. Don’t forget
to register for the next session, join us to ask specific questions, and learn
much more about SQL and Trino:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trino-training-series/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=Global-FY24-Trino-Training-Series&amp;amp;utm_content=1&quot;&gt;
        Register now
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;See you next time,&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>In the second part of our training series Learning SQL with Trino from the experts Martin Traverso and I built on top of the foundational knowledge from the first training session. We continued to learn more about data types and working with them, including the important strings, numeric, temporal, and JSON types.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/trino-sql.png" />
      
    </entry>
  
    <entry>
      <title>Getting started with Trino and SQL</title>
      <link href="https://trino.io/blog/2023/10/18/sql-training-1.html" rel="alternate" type="text/html" title="Getting started with Trino and SQL" />
      <published>2023-10-18T00:00:00+00:00</published>
      <updated>2023-10-18T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/10/18/sql-training-1</id>
      <content type="html" xml:base="https://trino.io/blog/2023/10/18/sql-training-1.html">&lt;p&gt;In our training series &lt;a href=&quot;/blog/2023/09/27/training-series.html&quot;&gt;Learning SQL with Trino from the experts&lt;/a&gt; Martin Traverso, Dain Sundstrom, David Phillips,
and I will run through the wide range of SQL support and features of Trino with
our audience. In the first episode, we covered the concepts of Trino and SQL, and
then started to learn some basic SQL. Now you can take advantage of the
recording and available resources to learn at your own pace.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;The recording of the event allows you to watch it all as if you attended live,
jump to specific sections as desired, or pause while you follow along with the
demos:&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/SnvSBYhRZLg&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;Following are a couple of timestamps for specific
topics of interest:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=SnvSBYhRZLg&amp;amp;t=380&quot;&gt;What is Trino?&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=SnvSBYhRZLg&amp;amp;t=1163&quot;&gt;Catalogs and connectors&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=SnvSBYhRZLg&amp;amp;t=1658&quot;&gt;Clients&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=SnvSBYhRZLg&amp;amp;t=3224&quot;&gt;SQL WHERE statement&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The full timestamps for every part of the talk are in the description on
YouTube.&lt;/p&gt;

&lt;p&gt;Also make sure you take advantage of these additional resources:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/assets/blog/sql-training-series-starburst-2023.pdf&quot;&gt;General overview slide deck for the series&lt;/a&gt;, with links to resources like our &lt;a href=&quot;/slack.html&quot;&gt;community chat&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Slide deck for &lt;a href=&quot;https://trinodb.github.io/presentations/presentations/sql-trino/index.html&quot;&gt;SQL and Trino concepts&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Slide deck for &lt;a href=&quot;https://trinodb.github.io/presentations/presentations/sql-basics/index.html&quot;&gt;SQL basics with Trino&lt;/a&gt;, including a file with all SQL statements ready to go&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/trino-the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now that you know of the series and saw the first part of it, make sure you
register for the next ones, so you can ask specific questions and learn much
more about SQL and Trino:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trino-training-series/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=Global-FY24-Trino-Training-Series&amp;amp;utm_content=1&quot;&gt;
        Register now
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;See you then,&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>In our training series Learning SQL with Trino, the experts Martin Traverso, Dain Sundstrom, David Phillips, and I will run through the wide range of SQL support and features of Trino with our audience. In the first episode, we covered the concepts of Trino and SQL, and then started to learn some basic SQL. Now you can take advantage of the recording and available resources to learn at your own pace.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/trino-sql.png" />
      
    </entry>
  
    <entry>
      <title>A report from the Trino Conference Tokyo 2023</title>
      <link href="https://trino.io/blog/2023/10/11/a-report-about-trino-conference-tokyo-2023.html" rel="alternate" type="text/html" title="A report from the Trino Conference Tokyo 2023" />
      <published>2023-10-11T00:00:00+00:00</published>
      <updated>2023-10-11T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/10/11/a-report-about-trino-conference-tokyo-2023</id>
      <content type="html" xml:base="https://trino.io/blog/2023/10/11/a-report-about-trino-conference-tokyo-2023.html">&lt;p&gt;The Trino community in Japan held an online event on October 5th, 2023. This
article is a summary of the conference aiming to share the presentations and
provide an overview.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Watch a replay of the whole event, or jump to specific time stamps and topics of
interest:&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/CTwk2rkatx8&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;This year, there were 4 sessions:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Trino, Starburst Galaxy, and Enterprise&lt;/li&gt;
  &lt;li&gt;Log infrastructure using Trino and Iceberg&lt;/li&gt;
  &lt;li&gt;Data infrastructure using Spark and Trino on bare metal k8s&lt;/li&gt;
  &lt;li&gt;Getting started with Trino and a transactional data lake with serverless Athena&lt;/li&gt;
&lt;/ol&gt;

&lt;h1 id=&quot;trino-starburst-galaxy-and-enterprise&quot;&gt;Trino, Starburst Galaxy, and Enterprise&lt;/h1&gt;

&lt;p&gt;The first session was presented by Yuya Ebihara (me) from Starburst. I explained
the Trino changes from 2022 and 2023, as well as features of Starburst Galaxy
and Starburst Enterprise. The session introduced &lt;a href=&quot;https://prtimes.jp/main/html/rd/p/000000226.000025237.html&quot;&gt;a press release about the
partnership between Starburst and Dell Technologies in
Japan&lt;/a&gt;.&lt;/p&gt;

&lt;iframe src=&quot;https://docs.google.com/presentation/d/e/2PACX-1vRubtZB9peROzcGgaTQQYkLs-9jZEbWuRszNInKviuj1RdPwp5CrElssLwLYSUuVeGUfj58wv428UFw/embed&quot; frameborder=&quot;0&quot; width=&quot;595&quot; height=&quot;485&quot; allowfullscreen=&quot;true&quot; mozallowfullscreen=&quot;true&quot; webkitallowfullscreen=&quot;true&quot;&gt;&lt;/iframe&gt;

&lt;h1 id=&quot;log-infrastructure-using-trino-and-iceberg&quot;&gt;Log infrastructure using Trino and Iceberg&lt;/h1&gt;

&lt;p&gt;The second session was presented by Tadahisa Kamijo from Sakura Internet. He
explained some requirements for new analytics environments, such as concurrent
read/write, schema evolution, record-level modification, restoring past
snapshots, and addressing performance issues with the Hive metastore. They
decided to use Trino and Iceberg to meet these requirements. Kamijo-san also
introduced the file layout in Iceberg and demonstrated how to debug Iceberg
files using their Java client.&lt;/p&gt;

&lt;iframe class=&quot;speakerdeck-iframe&quot; frameborder=&quot;0&quot; src=&quot;https://speakerdeck.com/player/4c9229c81e36494ca0c722b20bfdf20e&quot; title=&quot;TrinoとIcebergで ログ基盤の構築 / 2023-10-05 Trino Presto Meetup&quot; allowfullscreen=&quot;true&quot; style=&quot;border: 0px; background: padding-box padding-box rgba(0, 0, 0, 0.1); margin: 0px; padding: 0px; border-radius: 6px; box-shadow: rgba(0, 0, 0, 0.2) 0px 5px 40px; width: 100%; height: auto; aspect-ratio: 560 / 315;&quot; data-ratio=&quot;1.7777777777777777&quot;&gt;&lt;/iframe&gt;

&lt;h1 id=&quot;data-infrastructure-using-spark-an-trino-on-bare-metal-k8s&quot;&gt;Data infrastructure using Spark and Trino on bare metal k8s&lt;/h1&gt;

&lt;p&gt;The third session was presented by Yasukazu Nagatomi from MicroAd. They started
a migration from Impala to Trino to resolve the following issues: separating
compute and storage, refreshing and utilizing table and column statistics even
with large tables, and supporting schema evolution. Nagatomi-san shared a use
case of the Trino features fault-tolerant execution and spill-to-disk, the
first public use case of these features in Japan.&lt;/p&gt;

&lt;iframe src=&quot;//www.slideshare.net/slideshow/embed_code/key/NTzgv4IUvAPIvp&quot; width=&quot;595&quot; height=&quot;485&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; &lt;/iframe&gt;
&lt;div style=&quot;margin-bottom:5px&quot;&gt; &lt;strong&gt; &lt;a href=&quot;//www.slideshare.net/microad_engineer/trino-conference-tokyo-2023&quot; title=&quot;ベアメタルで実現するSpark＆Trino on K8sなデータ基盤&quot; target=&quot;_blank&quot;&gt;ベアメタルで実現するSpark＆Trino on K8sなデータ基盤&lt;/a&gt; &lt;/strong&gt; from &lt;strong&gt;&lt;a href=&quot;//www.slideshare.net/microad_engineer&quot; target=&quot;_blank&quot;&gt;MicroAd, Inc.(Engineer)&lt;/a&gt;&lt;/strong&gt; &lt;/div&gt;

&lt;h1 id=&quot;getting-started-trino-and-a-transactional-data-lake-with-serverless-athena&quot;&gt;Getting started with Trino and a transactional data lake with serverless Athena&lt;/h1&gt;

&lt;p&gt;The last session was presented by Sotaro Hikita from AWS. Athena is a serverless
service for ad hoc analytics built on a Trino and Presto foundation. It supports not
only S3 data but also various data sources via Federated Query. In Athena, Iceberg
supports both read and write operations, while Hudi and Delta Lake only support
read operations.&lt;/p&gt;

&lt;iframe class=&quot;speakerdeck-iframe&quot; frameborder=&quot;0&quot; src=&quot;https://speakerdeck.com/player/e1f3188001ca4919b227177f3934b626&quot; title=&quot;サーバレスなAmazon Athenaで始めるTrinoとTransactional Data Lake&quot; allowfullscreen=&quot;true&quot; style=&quot;border: 0px; background: padding-box padding-box rgba(0, 0, 0, 0.1); margin: 0px; padding: 0px; border-radius: 6px; box-shadow: rgba(0, 0, 0, 0.2) 0px 5px 40px; width: 100%; height: auto; aspect-ratio: 560 / 315;&quot; data-ratio=&quot;1.7777777777777777&quot;&gt;&lt;/iframe&gt;

&lt;h1 id=&quot;wrap-up&quot;&gt;Wrap up&lt;/h1&gt;

&lt;p&gt;We sincerely appreciate the participation of community members in Japan. Thank
you so much for watching the live event. We are planning to hold an in-person
event next year. See you next time!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Yuya&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Yuya Ebihara</name>
        </author>
      

      <summary>The Trino community in Japan held an online event on October 5th, 2023. This article is a summary of the conference aiming to share the presentations and provide an overview.</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino Gateway has arrived</title>
      <link href="https://trino.io/blog/2023/09/28/trino-gateway.html" rel="alternate" type="text/html" title="Trino Gateway has arrived" />
      <published>2023-09-28T00:00:00+00:00</published>
      <updated>2023-09-28T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/09/28/trino-gateway</id>
      <content type="html" xml:base="https://trino.io/blog/2023/09/28/trino-gateway.html">&lt;p&gt;You started with one Trino cluster, and your users like the power of SQL and
&lt;a href=&quot;/ecosystem/index.html#data-sources&quot;&gt;querying all sorts of data sources&lt;/a&gt;.
Then you needed to upgrade and set up a testing cluster. That was a while
ago, and now you run a separate cluster configured for ETL workloads with
fault-tolerant execution, and some others with different configurations.&lt;/p&gt;

&lt;p&gt;With Trino Gateway, we now have an answer to your users’ request to provide one
URL for all the clusters. Trino Gateway has arrived!&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Today, we are happy to announce our &lt;a href=&quot;https://github.com/trinodb/trino-gateway/blob/main/docs/release-notes.md#trino-gateway-3-26-sep-2023&quot;&gt;first release of Trino
Gateway&lt;/a&gt;.
The release is the result of many, many months of effort to move the legacy
Presto Gateway to Trino, start a refactor of the project, and add numerous new
features.&lt;/p&gt;

&lt;p&gt;Many larger deployments across the Trino community rely on the gateway as a load
balancer, proxy server, and configurable routing gateway for multiple Trino
clusters. Users don’t need to worry about what catalog and data source is
available in what Trino cluster. Trino Gateway exposes one URL for them all.
Administrators can ensure routing is correct and use the REST API to configure
the necessary rules. This also allows seamless upgrades of clusters behind Trino
Gateway in a blue/green deployment mode.&lt;/p&gt;

&lt;p&gt;Up to now, many users had to maintain separate forks of the legacy Presto
Gateway. Some of these users created numerous improvements in isolation from each
other, sometimes even implementing the same feature multiple times. This first
release of Trino Gateway starts a strong collaboration among some of these users.
Bloomberg contributed the main bulk of the new features, including the
much-requested support for authentication and authorization on Trino Gateway
itself. Maintainers and contributors from Starburst pulled together the
stakeholders and managed the project, and collaborators from Naver, LinkedIn,
Dune, and others are already helping out and ready to move the project forward.&lt;/p&gt;

&lt;p&gt;There are exciting times ahead for the project, and we have big plans for
documentation, installation, and general modernizations of the app, so go and
have a look at the project, read the documentation and release notes, file an
issue, or submit a pull request:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://github.com/trinodb/trino-gateway&quot;&gt;
        Trino Gateway
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;Interested in finding out more? Find us and other users and contributors on the
&lt;a href=&quot;https://trinodb.slack.com/app_redirect?channel=trino-gateway&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino-gateway&lt;/code&gt;&lt;/a&gt;
and
&lt;a href=&quot;https://trinodb.slack.com/app_redirect?channel=trino-gateway-dev&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino-gateway-dev&lt;/code&gt;&lt;/a&gt;
channels in &lt;a href=&quot;/slack.html&quot;&gt;the Trino community Slack&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Also, don’t forget to tell us about your usage of Trino Gateway or Trino and
&lt;a href=&quot;https://sessionize.com/trino-summit-2023/&quot;&gt;submit a talk for Trino Summit
2023&lt;/a&gt;. And if you just want to learn
and listen to others, &lt;a href=&quot;https://www.starburst.io/info/trinosummit2023/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=NORAM-FY24-Q4-EV-Trino-Summit-2023&amp;amp;utm_content=blog-1&quot;&gt;register as an
attendee&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred, Martin, and all the other Trino Gateway contributors&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Martin Traverso</name>
        </author>
      

      <summary>You started with one Trino cluster, and your users like the power of SQL and querying all sorts of data sources. Then you needed to upgrade and set up a testing cluster. That was a while ago, and now you run a separate cluster configured for ETL workloads with fault-tolerant execution, and some others with different configurations. With Trino Gateway, we now have an answer to your users’ request to provide one URL for all the clusters. Trino Gateway has arrived!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/images/logos/trino-gateway-small.png" />
      
    </entry>
  
    <entry>
      <title>Learning SQL with Trino from the experts</title>
      <link href="https://trino.io/blog/2023/09/27/training-series.html" rel="alternate" type="text/html" title="Learning SQL with Trino from the experts" />
      <published>2023-09-27T00:00:00+00:00</published>
      <updated>2023-09-27T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/09/27/training-series</id>
      <content type="html" xml:base="https://trino.io/blog/2023/09/27/training-series.html">&lt;p&gt;Do you have a rough idea of what SQL is? Do you need to get data out of object
storage in the cloud and some relational database at the same time? You should
look at Trino and learn about SQL.&lt;/p&gt;

&lt;p&gt;Or do you know the ins and outs of joins and window functions, and are your
SQL queries counted in pages rather than lines? You may even be the SQL expert on
your team. You should &lt;em&gt;also&lt;/em&gt; look at Trino and SQL.&lt;/p&gt;

&lt;p&gt;Luckily for you all, we have the right SQL training for everyone in our upcoming
series with the founders of the Trino project and SQL experts Martin Traverso,
Dain Sundstrom, and David Phillips, and myself as host and co-trainer.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;In the SQL training series, we start with the basics of Trino. You will learn
that despite the fact that there is a leopard frog on the cover of &lt;a href=&quot;/trino-the-definitive-guide.html&quot;&gt;Trino: The
Definitive Guide&lt;/a&gt;, SQL does
not stand for Silly Quacking Leopardfrogs. Instead, SQL stands for Structured
Query Language, and you will learn about the benefits of connecting &lt;a href=&quot;/ecosystem/index.html#data-sources&quot;&gt;many
data sources&lt;/a&gt; to Trino and using
&lt;a href=&quot;/ecosystem/index.html#clients&quot;&gt;different clients&lt;/a&gt;, always with the
same powerful SQL. For the SQL pros, you learn about catalogs and
queries that span data sources.&lt;/p&gt;

&lt;p&gt;Then we’ll glance at the basic SQL foundations, since there are literally
hundreds of books, videos, and training courses around. All of them teach you
things like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; statements and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WHERE&lt;/code&gt; clauses, and unravel the confusion
around &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LEFT OUTER JOIN&lt;/code&gt; and the like.&lt;/p&gt;

&lt;p&gt;And after this, we get to the interesting stuff. The following is a list of
some of the topics we will cover:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino concepts like cluster, data source, client, catalog, and more&lt;/li&gt;
  &lt;li&gt;Overview of all the SQL support with statements, data types, functions, and
connector support&lt;/li&gt;
  &lt;li&gt;Working with data types, including numerical and text values, dates and times,
JSON, …&lt;/li&gt;
  &lt;li&gt;Lots of scalar, aggregation, window functions&lt;/li&gt;
  &lt;li&gt;Object storage and other data sources&lt;/li&gt;
  &lt;li&gt;Creating schemas, tables, and views&lt;/li&gt;
  &lt;li&gt;Inserting, merging, moving and deleting data&lt;/li&gt;
  &lt;li&gt;Metadata in general and in hidden tables like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$properties&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Table procedures&lt;/li&gt;
  &lt;li&gt;Trino views, Trino materialized views and other views&lt;/li&gt;
  &lt;li&gt;Global and connector level table functions, including query pass-through&lt;/li&gt;
  &lt;li&gt;Support for SQL routines, also known as user-defined functions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Interested now? No matter how great your SQL knowledge or Trino expertise is,
you will learn something new in this series. So what are you waiting for?&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trino-training-series/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=Global-FY24-Trino-Training-Series&amp;amp;utm_content=1&quot;&gt;
        Register now
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;Join us in one or all of the sessions on the following dates:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;18th of October 2023: &lt;a href=&quot;/blog/2023/10/18/sql-training-1.html&quot;&gt;Getting started with Trino and SQL&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;1st of November 2023: &lt;a href=&quot;/blog/2023/11/01/sql-training-2.html&quot;&gt;Advanced analytics with SQL and Trino&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;15th of November 2023: &lt;a href=&quot;/blog/2023/11/15/sql-training-3.html&quot;&gt;Data management with SQL and Trino&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;29th of November 2023: &lt;a href=&quot;/blog/2023/11/29/sql-training-4.html&quot;&gt;Functions with SQL and Trino&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We look forward to seeing you in class.&lt;/p&gt;

&lt;p&gt;Martin, Dain, David, and Manfred&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Videos, slide decks, and other resources for all classes are now available:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Getting started with Trino and SQL: &lt;a href=&quot;/blog/2023/10/18/sql-training-1.html&quot;&gt;Blog post with resources and video&lt;/a&gt;, &lt;a href=&quot;https://www.youtube.com/watch?v=SnvSBYhRZLg&quot;&gt;Video on YouTube&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Advanced analytics with SQL and Trino: &lt;a href=&quot;/blog/2023/11/01/sql-training-2.html&quot;&gt;Blog post with resources and video&lt;/a&gt;, &lt;a href=&quot;https://www.youtube.com/watch?v=S-mfueDmXds&quot;&gt;Video on YouTube&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Data management with SQL and Trino: &lt;a href=&quot;/blog/2023/11/15/sql-training-3.html&quot;&gt;Blog post with resources and video&lt;/a&gt;, &lt;a href=&quot;https://www.youtube.com/watch?v=q2uyV7mBKVc&quot;&gt;Video on YouTube&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Functions with SQL and Trino: &lt;a href=&quot;/blog/2023/11/29/sql-training-4.html&quot;&gt;Blog post with resources and video&lt;/a&gt;, &lt;a href=&quot;https://www.youtube.com/watch?v=1siAYR6BzzY&quot;&gt;Video on YouTube&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Do you have a rough idea of what SQL is? Do you need to get data out of object storage in the cloud and some relational database at the same time? You should look at Trino and learn about SQL. Or do you know the ins and outs of joins, window functions, and your SQL queries are counted by the pages and not lines? You may even be the expert on SQL on your team. You should also look at Trino and SQL. Luckily for you all, we have the right SQL training for everyone in our upcoming series with the founders of the Trino project and SQL experts Martin Traverso, Dain Sundstrom, and David Phillips, and myself as host and co-trainer.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/trino-sql.png" />
      
    </entry>
  
    <entry>
      <title>Chinese edition of Trino: The Definitive Guide</title>
      <link href="https://trino.io/blog/2023/09/21/the-definitive-guide-2-cn.html" rel="alternate" type="text/html" title="Chinese edition of Trino: The Definitive Guide" />
      <published>2023-09-21T00:00:00+00:00</published>
      <updated>2023-09-21T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/09/21/the-definitive-guide-2-cn</id>
      <content type="html" xml:base="https://trino.io/blog/2023/09/21/the-definitive-guide-2-cn.html">&lt;p&gt;Trino, Trino, Trino everywhere. Just looking at our website stats and the users
in our community chat, we know that Trino is going places. We also know that one
of these places with a large user community is China. And now we have good news
for you. A translation of the second edition of the book to Chinese is now
available.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Today, we are happy to announce that a Chinese translation of the book &lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;Trino:
The Definitive Guide&lt;/a&gt; is now
available for the communities all across China and far beyond, and hopefully
lowers the barrier to Trino for native speakers. We invite you all to get your
own copy:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://product.dangdang.com/11487789827.html&quot;&gt;
        Trino权威指南(原书第2版) 机械工业出版社
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;Our thanks go out to the teams at O’Reilly and dangdang for making this happen.
We hope many readers will benefit from the translated edition.&lt;/p&gt;

&lt;p&gt;We look forward to chatting with many of our new readers and Trino users on the
&lt;a href=&quot;https://trinodb.slack.com/app_redirect?channel=general-cn&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;general-cn&lt;/code&gt;&lt;/a&gt; channel in &lt;a href=&quot;/slack.html&quot;&gt;the Trino community Slack&lt;/a&gt;,
other channels, and direct messaging.&lt;/p&gt;

&lt;p&gt;Also, don’t forget to tell us about your usage of Trino. You can contact us on
Slack to be a guest in &lt;a href=&quot;/broadcast/index.html&quot;&gt;Trino Community
Broadcast&lt;/a&gt; or &lt;a href=&quot;https://sessionize.com/trino-summit-2023/&quot;&gt;submit a talk for Trino
Summit 2023&lt;/a&gt;. And if you just want
to learn and listen to others, &lt;a href=&quot;https://www.starburst.io/info/trinosummit2023/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=NORAM-FY24-Q4-EV-Trino-Summit-2023&amp;amp;utm_content=blog-1&quot;&gt;register as an
attendee&lt;/a&gt; for Trino Summit 2023.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred, Martin, and Matt&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Martin Traverso, Matt Fuller</name>
        </author>
      

      <summary>Trino, Trino, Trino everywhere. Just looking at our website stats and the users in our community chat, we know that Trino is going places. We also know that one of these places with a large user community is China. And now we have good news for you. A translation of the second edition of the book to Chinese is now available.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/ttdg2-cn-cover.png" />
      
    </entry>
  
    <entry>
      <title>Join us for Trino Summit 2023</title>
      <link href="https://trino.io/blog/2023/09/14/trino-summit-2023-announcement.html" rel="alternate" type="text/html" title="Join us for Trino Summit 2023" />
      <published>2023-09-14T00:00:00+00:00</published>
      <updated>2023-09-14T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/09/14/trino-summit-2023-announcement</id>
      <content type="html" xml:base="https://trino.io/blog/2023/09/14/trino-summit-2023-announcement.html">&lt;p&gt;The Trino community is buzzing. Commander Bun Bun is ready to invite you all to
join us for Trino Summit 2023. And “all” really means everyone in the community.
The event is free to attend, virtual, and full of news and shared knowledge from
your peers using Trino. Don’t hesitate to submit your talk and register to
attend now.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;We are pleased to announce the upcoming Trino Summit 2023. The summit is
scheduled as a virtual event on the &lt;strong&gt;13th and 14th of December 2023&lt;/strong&gt;, and
attendance is free!&lt;/p&gt;

&lt;p&gt;If you’d like to share your knowledge and information about Trino usage and give
a talk at this year’s Trino Summit, we’re putting out a call for speakers. We
are accepting submissions from now until the &lt;strong&gt;12th of November&lt;/strong&gt;, but we
recommend submitting as soon as possible, because we expect slots to fill up
fast.&lt;/p&gt;

&lt;p&gt;We’re looking for intermediate to advanced-level talks on a variety of themes.
If you have an interesting story about how you leverage Trino in your data
platform for analytics and other workloads, found a neat way to extend it with a
custom plugin or add-on, or swapped to Trino for a performance win, we’d love to
hear about it. We’re excited to expand our speaker lineup with talks from the
broader Trino community. Find more information about duration, technical
details, and more suggestions when you submit your talk.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trinosummit2023/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=NORAM-FY24-Q4-EV-Trino-Summit-2023&amp;amp;utm_content=blog-1&quot;&gt;
        Register to attend
    &lt;/a&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://sessionize.com/trino-summit-2023/&quot;&gt;
        Submit a talk
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;This Trino Software Foundation event is organized and sponsored by
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;, and we invite other sponsors to help make
this a successful event for the Trino community.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://starburst.io&quot;&gt;
  &lt;img src=&quot;/assets/images/logos/starburst-small.png&quot; title=&quot;Starburst&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If that interests you or your employer, &lt;a href=&quot;mailto:events@starburst.io&quot;&gt;contact the Trino events team for more
information&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;And of course, we’re looking forward to reading your proposals and seeing you
then.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>The Trino community is buzzing. Commander Bun Bun is ready to invite you all to join us for Trino Summit 2023. And “all” really means everyone in the community. The event is free to attend, virtual, and full of news and shared knowledge from your peers using Trino. Don’t hesitate to submit your talk and register to attend now.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2023/summit-logo.png" />
      
    </entry>
  
    <entry>
      <title>FugueSQL: Interoperable Python and Trino for interactive workloads</title>
      <link href="https://trino.io/blog/2023/07/27/trino-fest-2023-fugue-recap.html" rel="alternate" type="text/html" title="FugueSQL: Interoperable Python and Trino for interactive workloads" />
      <published>2023-07-27T00:00:00+00:00</published>
      <updated>2023-07-27T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/07/27/trino-fest-2023-fugue-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2023/07/27/trino-fest-2023-fugue-recap.html">&lt;p&gt;Fugue may be an unfamiliar name to those in the Trino ecosystem. It’s another
Python tool, a programming model built to enhance interoperability between
Python and SQL. On the Python side of things, it’s a wrapper around common tools
like pandas and Polars that converts code into SQL for high-performance,
large-scale query execution. So why are we talking about it at Trino Fest?
Because Fugue recently launched an integration with Trino, enabling you to write
Python code that can be converted to SQL to run on a high-powered Trino backend.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/aKhI1Phfn-o&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;Though Trino users are quite familiar with SQL, it does present some challenges.
Iterating on a SQL query and improving it can be difficult, and finding ways to
optimize or speed things up can be a challenge that requires sophisticated
external tools or working on hunches. Testing queries, especially incrementally,
has never been super easy, either. Compare that to Python, which does not have
those problems, but has issues of its own. Python, especially at scale, is not
very performant. So it’s natural to try to take the advantages of both, which is
what Fugue is aiming to do.&lt;/p&gt;

&lt;p&gt;After that brief intro to Fugue, the rest of the talk consists of technical
demos of the many things that you can do with Fugue. This includes
setting a query up, breaking it up into smaller parts, bringing it to pandas,
and demonstrating extensions that are built into Fugue. With all of these
intermediate steps, it becomes easier to unit test queries before sending them
into production, making sure that everything works as expected.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Kevin Kho and Cole Bowden</name>
        </author>
      

      <summary>Fugue may be an unfamiliar name to those in the Trino ecosystem. It’s another Python tool, a programming model built to enhance interoperability between Python and SQL. On the Python side of things, it’s a wrapper around common tools like pandas and Polars that convert code into SQL for high-performance, large-scale query execution. So why are we talking about it at Trino Fest? Because Fugue recently launched an integration with Trino, enabling you to write Python code that can be converted to SQL to run on a high-powered Trino backend.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/Fugue.png" />
      
    </entry>
  
    <entry>
      <title>Starburst Galaxy: A romance of many architectures</title>
      <link href="https://trino.io/blog/2023/07/25/trino-fest-2023-datto.html" rel="alternate" type="text/html" title="Starburst Galaxy: A romance of many architectures" />
      <published>2023-07-25T00:00:00+00:00</published>
      <updated>2023-07-25T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/07/25/trino-fest-2023-datto</id>
      <content type="html" xml:base="https://trino.io/blog/2023/07/25/trino-fest-2023-datto.html">&lt;p&gt;Let’s cut straight to the chase with this lightning talk from Benjamin Jeter, a
data architect, platform manager, and data engineer at Datto. For those who are
not familiar with Datto, they are an American cybersecurity and data backup
company. They’re the leading global provider of security and cloud-based
software solutions purpose-built for Managed Service Providers (MSPs). In
Benjamin’s talk, he goes through some of the considerations and design goals of
a reference architecture pattern that they use and why they chose to use Trino
with Starburst Galaxy.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/K3AlAWB-Gmg&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-fest-2023/TrinoFest2023Datto.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;But you might be wondering: what does Ben mean when he says “reference
architecture”? A reference architecture pattern is a pattern for making
arbitrary data available to end users in a reproducible and modular way. It’s an
opinionated representation of what best practices look like for a given class of
use cases. You can almost think of it as a conceptual tool for thinking
critically about specific patterns through a pragmatic balance of simplicity and
effectiveness. However, it is not something that will work for every use case,
nor is it necessarily the best solution.&lt;/p&gt;

&lt;p&gt;The main design goal that Benjamin had was to facilitate near real-time data
access while using only Trino. In addition, he wanted it to be simple, easy to
understand, flexible, and adaptable. Accomplishing this design goal requires
many steps, such as first having a daily batch transform that converts JSON
into Iceberg and serves as &lt;a href=&quot;https://www.investopedia.com/terms/t/tplus1.asp&quot;&gt;T-1
data&lt;/a&gt;. Then he created an
unpartitioned external table that is rebuilt every day as part of the daily
batch transform. Using the &lt;a href=&quot;https://docs.starburst.io/starburst-galaxy/sql/great-lakes.html&quot;&gt;Great Lakes
connectivity&lt;/a&gt;
with this table allows Datto to have scan-on-query semantics, which enables data
access about as close to real-time as you can get without a streaming solution like
Kafka or Kinesis. Benjamin shows how easy it is to design a use case with just a
couple lines of code using Trino with Starburst Galaxy.&lt;/p&gt;

&lt;p&gt;Interested? Check out the video where Benjamin shows the code and explains how
it works!&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Benjamin Jeter, Ryan Duan</name>
        </author>
      

      <summary>Let’s cut straight to the chase with this lightning talk from Benjamin Jeter, a data architect, platform manager, and data engineer at Datto. For those that are not familiar with Datto, they are an American cybersecurity and data backup company. They’re the leading global provider of security and cloud-based software solutions purpose-built for Managed Service Providers (MSPs). In Benjamin’s talk, he goes through some of the considerations and design goals of a reference architecture pattern that they use and why they chose to use Trino with Starburst Galaxy.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/Datto.png" />
      
    </entry>
  
    <entry>
      <title>Trino optimization with distributed caching on data lakes</title>
      <link href="https://trino.io/blog/2023/07/21/trino-fest-2023-alluxio-recap.html" rel="alternate" type="text/html" title="Trino optimization with distributed caching on data lakes" />
      <published>2023-07-21T00:00:00+00:00</published>
      <updated>2023-07-21T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/07/21/trino-fest-2023-alluxio-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2023/07/21/trino-fest-2023-alluxio-recap.html">&lt;p&gt;By 2025, there will be 100 zettabytes stored in the cloud. That’s
100,000,000,000,000,000,000,000 bytes - a huge, eye-popping number. But only
about 10% of that data is actually used on a regular basis. At Uber, for
example, only 1% of their disk space holds 50% of the data they access on
any given day. With so much data but such a small percentage being used, it
raises the question: how can we identify frequently-used data and make it more
accessible, efficient, and lower-cost to access?&lt;/p&gt;

&lt;p&gt;Once we have identified that “hot data,” the answer is data caching. By caching
that data in storage, you can reap a ton of benefits: performance gains, lower
costs, less network congestion, and reduced throttling on the storage layer.
Data caching sounds great, but why are we talking about it at a Trino event?
Because &lt;a href=&quot;https://github.com/trinodb/trino/pull/16375&quot;&gt;data caching with Alluxio is coming to Trino&lt;/a&gt;!&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/oK1A5U1WzFc&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-fest-2023/TrinoFest2023Alluxio.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;So what are the key features of data caching? The first and foremost is that the
frequently-accessed data gets stored on local SSDs. In the case of Trino, this
means that the Trino worker nodes will store data to reduce latency and decrease
the number of loads from object storage. Even if a worker restarts, it still has
that data cached. Caching will work on all the data lake connectors, so whether
you’re using Iceberg, Hive, Hudi, or Delta Lake, it’ll be speeding your queries
up. The best part is that once it’s in Trino, all you need to do is enable it,
set three configuration properties, and let the performance improvement speak
for itself. There’s no other change to how queries run or execute, so there’s no
headache or migration needed.&lt;/p&gt;
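
&lt;p&gt;As a rough sketch of what enabling the cache could look like in a catalog
properties file (the exact property names here are illustrative, since the
feature was still making its way through the pull request at the time of the
talk):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Turn on file system caching for this catalog
fs.cache.enabled=true
# Local SSD directory on each worker that holds cached data
fs.cache.directories=/mnt/trino-cache
# Cap how much of the disk the cache may consume
fs.cache.max-disk-usage-percentages=80
&lt;/code&gt;&lt;/pre&gt;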

&lt;p&gt;Hope then gives deeper technical detail on exactly how data caching works. She
highlights a few existing examples of how large-scale companies, Uber and
Shopee, have utilized data caching to reap massive performance gains. Then the
talk is passed off to Beinan, who gives further technical detail,
exploring cache invalidation, how to maximize cache hit rate, cluster
elasticity, cache storage efficiency, and data consistency. He also explores
ongoing work on semantic caching, native/off-heap caching, and distributed
caching, all of which have interesting upsides and benefits.&lt;/p&gt;

&lt;p&gt;Give the full talk a listen if you’re interested, as both Hope and Beinan go
into a lot of great, technical detail that you won’t want to miss out on. And
don’t forget to keep an eye on Trino release notes to see when it’s live!&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Hope Wang, Beinan Wang, and Cole Bowden</name>
        </author>
      

      <summary>By 2025, there will be 100 zettabytes stored in the cloud. That’s 100,000,000,000,000,000,000,000 bytes - a huge, eye-popping number. But only about 10% of that data is actually used on a regular basis. At Uber, for example, only 1% of their disk space holds 50% of the data they access on any given day. With so much data but such a small percentage being used, it raises the question: how can we identify frequently-used data and make it more accessible, efficient, and lower-cost to access? Once we have identified that “hot data,” the answer is data caching. By caching that data in storage, you can reap a ton of benefits: performance gains, lower costs, less network congestion, and reduced throttling on the storage layer. Data caching sounds great, but why are we talking about it at a Trino event? Because data caching with Alluxio is coming to Trino!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/Alluxio.png" />
      
    </entry>
  
    <entry>
      <title>Inspecting Trino on ice</title>
      <link href="https://trino.io/blog/2023/07/19/trino-fest-2023-stripe.html" rel="alternate" type="text/html" title="Inspecting Trino on ice" />
      <published>2023-07-19T00:00:00+00:00</published>
      <updated>2023-07-19T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/07/19/trino-fest-2023-stripe</id>
      <content type="html" xml:base="https://trino.io/blog/2023/07/19/trino-fest-2023-stripe.html">&lt;p&gt;For those unfamiliar, Stripe is an online payment processor that facilitates
online payments for digital-native merchants. They use Trino to facilitate ad
hoc analytics, enable dashboarding, and provide an API for internal services and
data apps. In Kevin Liu’s session at &lt;a href=&quot;/blog/2023/06/20/trino-fest-2023-recap.html&quot;&gt;Trino Fest 2023&lt;/a&gt;, he showcases the Trino Iceberg
connector and how it can replace more complex usage to access Iceberg metadata.
He also discusses how Trino is a core part of operations at Stripe.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/PSGuAMVc6-w&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-fest-2023/TrinoFest2023Stripe.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;Trino is the foundational infrastructure on which other data apps and services
are built. In Kevin’s words, “I call Trino the Swiss army knife in the data
ecosystem.”&lt;/p&gt;

&lt;p&gt;At Stripe, they use Iceberg tables extensively, replacing legacy Hive tables.
But Iceberg isn’t perfect: one problem with Iceberg is reading its metadata from
S3. To work with Iceberg metadata, Stripe developed an internal CLI tool. The
tool requires a privileged internal machine, which is only accessible to
developers, and it outputs results in JSON format, which is difficult to
process, read, and use for further analysis. However, Kevin found that the Trino
Iceberg connector can replace most of the functionality of the Iceberg CLI. The
connector brings Iceberg metadata information to Trino’s powerful analytical
engine and facilitates lightning fast debugging and analysis.&lt;/p&gt;
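
&lt;p&gt;As a concrete sketch of what this looks like, the Iceberg connector exposes
metadata through hidden &lt;code&gt;$&lt;/code&gt;-suffixed tables, so an inspection that
previously needed the internal CLI becomes a plain SQL query (the catalog,
schema, and table names below are made up for illustration):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- List the snapshots recorded for an Iceberg table
SELECT snapshot_id, committed_at, operation
FROM iceberg.analytics.&quot;payments$snapshots&quot;;

-- Inspect the data files behind the current snapshot
SELECT file_path, record_count, file_size_in_bytes
FROM iceberg.analytics.&quot;payments$files&quot;;
&lt;/code&gt;&lt;/pre&gt;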

&lt;p&gt;Unfortunately, there was no way to grab all desired table property information
from the Trino Iceberg connector, because they were using an older version.
Thus, they use the Trino PostgreSQL connector to connect directly to the backend
database of the Hive Metastore, allowing them to inspect table metadata
directly. With the two connectors, they have all the information about the data
warehouse, powering their analysis and meta-analysis of the data and how it’s
used.&lt;/p&gt;

&lt;p&gt;They also use Trino to inspect Iceberg usage patterns. They log every Trino
query using the Trino event listener and store that in another PostgreSQL
database. This gives the full information of every query that has ever run
through Trino, and allows them to perform analysis using historical queries.
Combined with Trino’s built-in query metadata enrichment, this method enables a
multitude of auditing, debugging, and optimization use cases.&lt;/p&gt;

&lt;p&gt;In the future, they plan to use Trino to improve data quality by leveraging it
as a validation framework, to perform Iceberg table maintenance, and to optimize
tables based on historical read patterns.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Kevin Liu, Ryan Duan</name>
        </author>
      

      <summary>For those unfamiliar, Stripe is an online payment processor that facilitates online payments for digital-native merchants. They use Trino to facilitate ad hoc analytics, enable dashboarding, and provide an API for internal services and data apps to utilize Trino. In Kevin Liu’s session at Trino Fest 2023, he showcases the Trino Iceberg connector and how it can replace more complex usage to access Iceberg metadata. He also discusses how Trino is a core part of operations at Stripe.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/Stripe.png" />
      
    </entry>
  
    <entry>
      <title>Data mesh implementation using Hive views</title>
      <link href="https://trino.io/blog/2023/07/17/trino-fest-2023-comcast-recap.html" rel="alternate" type="text/html" title="Data mesh implementation using Hive views" />
      <published>2023-07-17T00:00:00+00:00</published>
      <updated>2023-07-17T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/07/17/trino-fest-2023-comcast-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2023/07/17/trino-fest-2023-comcast-recap.html">&lt;p&gt;At Comcast, data is used in a data mesh ecosystem, with a vision where users can
discover data and request data through a self-service platform. With federation,
various tools, and the ability to create, read, and write data with different
platforms, it’s a full-blown data mesh. So how do you build that? With Trino, of
course, and with the power of Hive views. Tune into the 10-minute lightning talk
that Alejandro gave at Trino Fest to learn more about how Comcast pulled it off.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/ZgcVtPFkKHM&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;With various storage systems, like S3 and MinIO, and users who
want to be able to use a variety of data platforms, including Trino, but also
Databricks and Spark, Comcast needed something to sit between the data and those
platforms. The solution was the Hive CLI and Hive views, which could read from 
all their various forms of storage, and which could be read from all the
user-facing query engines and data platforms with no issues.&lt;/p&gt;

&lt;p&gt;By centralizing data, there was also the upside of easily integrating with
Privacera, which allowed for privacy policies to be implemented without much
issue. Users could request access to the data within the Hive views, and data
owners could approve or reject access as appropriate. Because of the
centralization, it was easy to go very fine-grained with data access rules,
allowing for access control as specific as column-level.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Alejandro Rojas, Cole Bowden</name>
        </author>
      

      <summary>At Comcast, data is used in a data mesh ecosystem, with a vision where users can discover data and request data through a self-service platform. With federation, various tools, and the ability to create, read, and write data with different platforms, it’s a full-blown data mesh. So how do you build that? With Trino, of course, and with the power of Hive views. Tune into the 10-minute lightning talk that Alejandro gave at Trino Fest to learn more about how Comcast pulled it off.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/Comcast.png" />
      
    </entry>
  
    <entry>
      <title>DuneSQL - A query engine for blockchain data</title>
      <link href="https://trino.io/blog/2023/07/14/trino-fest-2023-dune.html" rel="alternate" type="text/html" title="DuneSQL - A query engine for blockchain data" />
      <published>2023-07-14T00:00:00+00:00</published>
      <updated>2023-07-14T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/07/14/trino-fest-2023-dune</id>
      <content type="html" xml:base="https://trino.io/blog/2023/07/14/trino-fest-2023-dune.html">&lt;p&gt;The need to make blockchain data easily accessible has risen over the recent
years due to the popularity of cryptocurrencies, NFTs, and other uses of
blockchains. Dune has made it their mission to make blockchain data more
accessible. Dune is a community data platform for querying public blockchain
data and building beautiful dashboards. They use their own query engine called
DuneSQL, built as an extension of Trino, to query blockchain data. In the session,
Miguel and Jonas from Dune talk about the challenges of querying blockchain
data, their transition to Trino, and how DuneSQL is operated. Watch the
recording of the session or keep reading for a recap.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/sCJncarnGdU&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-fest-2023/TrinoFest2023Dune.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;The Dune community data platform is a serverless, open access, community-wide
collaboration portal. Dune experienced some difficulties with blockchain data,
such as processing and ingesting raw data, deserializing and decoding function
calls and arguments, and allowing the community to build abstractions. Their
engine, DuneSQL, is Trino with custom extensions that they created. It executes tens
of thousands of queries each day, which are saved and re-used.&lt;/p&gt;

&lt;p&gt;At first, Dune used PostgreSQL, where they sharded per blockchain and used
vertical scaling. However, they quickly ran into bottleneck issues on storage
size and IOPS (I/O operations per second). Thus, they switched to Apache Spark
with Databricks to allow horizontal scaling, process more blockchains, and
support their vast query volume. Unfortunately,
the result was not performant and not interactive enough. In the end, Miguel
says that, “Trino was our choice for performance reasons, for the good
environment and ecosystem, and to fully support our scheme and our datasets.”
Using Trino addressed the performance issues.&lt;/p&gt;

&lt;p&gt;Operating DuneSQL requires modifications and extensions of Trino to suit the
needs of the users and platform as a whole. DuneSQL needs to manage the whole
fleet and its capacity: they use over 4,000 CPUs per hour, make
more than 100 billion S3 requests per month, and operate over 10 clusters. To
handle the scheduling and load balancing of these massive operations, DuneSQL
uses query execution services and a
&lt;a href=&quot;https://github.com/lyft/presto-gateway&quot;&gt;gateway&lt;/a&gt;. Clusters have a fixed size to
provide predictable capacity and performance. The gateway fronts the clusters to
reduce the blast radius, so a failure in one cluster does not affect the others. Even
with all these adjustments, they still have work to do: they plan to reduce the
billions of S3 requests they make, improve data layout, and implement
sandboxed user defined functions.&lt;/p&gt;

&lt;p&gt;Interested in DuneSQL? Check out the video where Jonas goes over the
specificities and unique characteristics of DuneSQL.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Miguel Filipe, Jonas Irgens Kylling, Ryan Duan</name>
        </author>
      

      <summary>The need to make blockchain data easily accessible has risen over the recent years due to the popularity of cryptocurrencies, NFTs, and other uses of blockchains. Dune has made it their mission to make blockchain data more accessible. Dune is a community data platform for querying public blockchain data and building beautiful dashboards. They use their own query engine called DuneSQL, built as an extension of Trino, to query blockchain data. In the session, Miguel and Jonas from Dune talk about the challenges of querying blockchain data, their transition to Trino, and how DuneSQL is operated. Watch the recording of the session or keep reading for a recap.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/Dune.png" />
      
    </entry>
  
    <entry>
      <title>Let it snow for Trino</title>
      <link href="https://trino.io/blog/2023/07/12/trino-fest-2023-let-it-snow-recap.html" rel="alternate" type="text/html" title="Let it snow for Trino" />
      <published>2023-07-12T00:00:00+00:00</published>
      <updated>2023-07-12T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/07/12/trino-fest-2023-let-it-snow-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2023/07/12/trino-fest-2023-let-it-snow-recap.html">&lt;p&gt;In this recap, we can skip right to the exciting part: through the joint efforts
of engineers at ForePaaS and Bloomberg, there is a Snowflake connector coming
to Trino! Though it hasn’t landed yet, it has been tested and run in production
at both companies, and a pull request is open and working its way towards
completion as this blog post goes up. In the talk, Yu and Erik talk about
difficulties in developing the connector, the motivations to make it happen, and
the new features that come as part of it for Trino users to take advantage of.
Sound interesting? Give the talk a listen, or read on for more details.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/kmpO_yM8OAs&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-fest-2023/TrinoFest2023LetItSnow.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For those unfamiliar, Snowflake is a cloud-based data warehousing and analytics
platform. It offers a great combination of scale, flexibility, and performance,
with the downside of being proprietary, vendor-locked software: to use
Snowflake, you must go through Snowflake, Inc. ForePaaS and its
customers store data in Snowflake, but they also store data in many other 
formats and systems, and they rely on Trino to run their analytics. With no
Snowflake connector in Trino, this meant that while they could run analytics and
queries on most data, Trino had a blind spot. They needed to develop a Snowflake
connector in order to see and query 100% of their data. Bloomberg was in a
similar boat, having data in Snowflake, using Trino for analytics, and needing a
way to join those two together. With a shared need, ForePaaS and Bloomberg
joined forces and made the connector happen.&lt;/p&gt;

&lt;p&gt;The connector has been in use at both companies for some time, and it comes with
the full feature set one would expect from a Trino connector. With the connector,
you can query Snowflake directly from Trino, taking advantage of Trino’s
lightning-fast speeds and the underlying features of Snowflake with no issue.&lt;/p&gt;

&lt;p&gt;Curious to see more? For the rest of the talk, Erik Anderson at Bloomberg gives
a demo of the connector in action. Give the talk a watch, and you can check out
progress on how adding the connector to Trino is coming along on
&lt;a href=&quot;https://github.com/trinodb/trino/pull/17909&quot;&gt;the pull request contributing it&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Yu Teng, Erik Anderson, Cole Bowden</name>
        </author>
      

      <summary>In this recap, we can skip right to the exciting part: through the joint efforts of engineers at ForePaaS and Bloomberg, there is a Snowflake connector coming to Trino! Though it hasn’t landed yet, it has been tested and run in production at both companies, and a pull request is open and working its way towards completion as this blog post goes up. In the talk, Yu and Erik talk about difficulties in developing the connector, the motivations to make it happen, and the new features that come as part of it for Trino users to take advantage of. Sound interesting? Give the talk a listen, or read on for more details.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/ForePaaS%20and%20Bloomberg.png" />
      
    </entry>
  
    <entry>
      <title>Redis &amp; Trino - Real-time indexed SQL queries (new connector)</title>
      <link href="https://trino.io/blog/2023/07/10/trino-fest-2023-redis.html" rel="alternate" type="text/html" title="Redis &amp; Trino - Real-time indexed SQL queries (new connector)" />
      <published>2023-07-10T00:00:00+00:00</published>
      <updated>2023-07-10T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/07/10/trino-fest-2023-redis</id>
      <content type="html" xml:base="https://trino.io/blog/2023/07/10/trino-fest-2023-redis.html">&lt;p&gt;Ever since the pandemic, it has become clear that a digital-first
economy is increasingly necessary. As Redis’ Field CTO Allen Terleto
said during their talk from &lt;a href=&quot;/blog/2023/06/20/trino-fest-2023-recap.html&quot;&gt;Trino Fest 2023&lt;/a&gt;, “In a digital first economy, data is the
lifeblood of the organization, which makes the databases the heart of enterprise
architectures”. Redis, a popular open source project, is a distributed in-memory
key–value database. It includes a cache, message broker, and optional
durability. In his talk, Allen demonstrates Redis’ new connector for Trino. It
can push down advanced queries and aggregations while leveraging Redis’ unique
in-memory secondary indexing. As a result, performance with the new connector is
much higher than with the existing one.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/JjBtZ26IHYk&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;Redis is an open source, in-memory, NoSQL database that natively supports a
variety of data structures. Redis is designed for utmost performance and high
throughput use cases across different types of workloads. Redis is widely known
for being the fastest data store in the market with sub-millisecond performance,
its ease of use, and being a multi-model database. Redis is able to map
relational tables to a key-value database by adding a key-value pair as a hash
attribute for each column. However, how can you search for a certain key in a
way that scales well in high throughput databases? Redis has a unique way to
deal with this problem: secondary indexing and Redis Search.&lt;/p&gt;
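
&lt;p&gt;As a tiny illustration of that mapping (the key names and fields here are
made up), a relational row becomes a Redis hash with one field per column, and
Redis Search builds a secondary index over those fields:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# One relational row stored as a hash, one field per column
HSET user:1 name &quot;Ada&quot; city &quot;London&quot; age 36

# Secondary index over all user:* hashes via Redis Search
FT.CREATE idx:users ON HASH PREFIX 1 user: SCHEMA name TEXT city TAG age NUMERIC

# Indexed multi-field query instead of scanning every key
FT.SEARCH idx:users &quot;@city:{London} @age:[30 40]&quot;
&lt;/code&gt;&lt;/pre&gt;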

&lt;p&gt;Redis Search enables secondary indexing and full-text search, which allows Redis
to support many features such as multi-field queries, aggregations, exact phrase
matching, numeric filtering, geo-filtering, and vector similarity semantic
search on top of text queries. As Allen says, “Redis Search will be at the heart
of our new integration with Trino and game-changing better performance at scale
to the existing Redis Trino connector”. In addition, Redis supports a native
data model for JSON documents, allowing you to store, update, and retrieve JSON
values in a Redis database like other Redis data types. It also works with Redis
Search to let you index and query JSON documents.&lt;/p&gt;

&lt;p&gt;The syntax for Redis Search is a bit different from traditional SQL syntax, so
Redis is introducing a quicker and more reliable Redis-Trino connector that lets
you easily integrate with visualization frameworks and platforms that support
Trino. The connector is open source and publicly available on their
GitHub. In addition, it will be contributed directly to the Trino project.&lt;/p&gt;

&lt;p&gt;Want to see Redis in action? Check out the video where Julien does a demo on how
you can load data from some file system, relational database, or data warehouse
and query it without writing a single line of code.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Allen Terleto, Julien Ruaux, Ryan Duan</name>
        </author>
      

      <summary>Ever since the pandemic, it has become clear that a digital-first economy is increasingly necessary. As Redis’ Field CTO Allen Terleto said during their talk from Trino Fest 2023, “In a digital first economy, data is the lifeblood of the organization, which makes the databases the heart of enterprise architectures”. Redis, a popular open source project, is a distributed in-memory key–value database. It includes a cache, message broker, and optional durability. In his talk, Allen demonstrates Redis’ new connector for Trino. It can push down advanced queries and aggregations while leveraging Redis’ unique in-memory secondary indexing. As a result, performance with the new connector is much higher than with the existing one.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/Redis.png" />
      
    </entry>
  
    <entry>
      <title>Skip rocks and files: Turbocharge Trino queries with Hudi’s multi-modal indexing subsystem</title>
      <link href="https://trino.io/blog/2023/07/07/trino-fest-2023-onehouse-recap.html" rel="alternate" type="text/html" title="Skip rocks and files: Turbocharge Trino queries with Hudi’s multi-modal indexing subsystem" />
      <published>2023-07-07T00:00:00+00:00</published>
      <updated>2023-07-07T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/07/07/trino-fest-2023-onehouse-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2023/07/07/trino-fest-2023-onehouse-recap.html">&lt;p&gt;Optimizing data access and query performance is crucial to building low-latency
applications and running analytics. Even with the modern data lakehouse designed
to be as efficient and performant as possible, there are a number of bottlenecks
that can slow things down and plenty of challenges to overcome. Nadine and Sagar
explored this at Trino Fest, introducing us to multi-modal indexing and the
metadata table in Hudi, how they work, and how leveraging them with Trino can
unlock queries faster than ever before.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/IiDOmAEOXUM&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-fest-2023/TrinoFest2023Onehouse.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;When you’re building large-scale data-based applications, bottlenecks are
inevitable. Finding ways to address these bottlenecks and optimizing your
platform to avoid them can be hugely costly, so it pays to know your
requirements. In the same vein, if you know the types of services and features
you need to effectively scale, you can build with them in mind from the ground
up. Hudi has a couple of key features you might be interested in that aren’t
present in all lakehouses:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Write indexing, speeding up and optimizing inserts and upserts&lt;/li&gt;
  &lt;li&gt;Automated table services, which handle clustering, cleaning, compacting,
and metadata indexing without any need for manual orchestration or overhead&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Nadine also goes on a deep dive into exactly how the Hudi table format works,
but emphasizes that these extra features elevate it to being an entire platform,
not just a table format.&lt;/p&gt;

&lt;p&gt;From there, Nadine passes things off to Sagar, who explains the multi-modal
indexing subsystem in Hudi, which features a scalable metadata table, different
types of indexes, and an async indexer. All of these features minimize tradeoffs
while maximizing performance, helping you read and write data faster than ever.
And with Trino’s Hudi connector, the Trino coordinator is able to read the
feature-rich Hudi metadata and delegate work to workers more effectively,
leveraging that speed as the best-in-class query engine for running analytics on
your data stored in Hudi.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Nadine Farah, Sagar Sumit, Cole Bowden</name>
        </author>
      

      <summary>Optimizing data access and query performance is crucial to building low-latency applications and running analytics. Even with the modern data lakehouse designed to be as efficient and performant as possible, there are a number of bottlenecks that can slow things down and plenty of challenges to overcome. Nadine and Sagar explored this at Trino Fest, introducing us to multi-modal indexing and the metadata table in Hudi, how they work, and how leveraging them with Trino can unlock queries faster than ever before.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/Onehouse.png" />
      
    </entry>
  
    <entry>
      <title>AWS Athena (Trino) in the cybersecurity space</title>
      <link href="https://trino.io/blog/2023/07/05/trino-fest-2023-arcticwolf.html" rel="alternate" type="text/html" title="AWS Athena (Trino) in the cybersecurity space" />
      <published>2023-07-05T00:00:00+00:00</published>
      <updated>2023-07-05T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/07/05/trino-fest-2023-arcticwolf</id>
      <content type="html" xml:base="https://trino.io/blog/2023/07/05/trino-fest-2023-arcticwolf.html">&lt;p&gt;Arctic Wolf Networks, a cybersecurity company that provides security monitoring
against cyber threats, is one of the companies that have recently switched to using
AWS Athena as a new and efficient service to query their data using Trino. AWS
Athena is a serverless, interactive analytics service built on open-source
frameworks that runs on Trino, supporting open table and file formats and
providing a simplified, flexible way to analyze petabytes of data where it
lives. Senior software developer Anas Shakra from Arctic Wolf Networks gave a
talk at &lt;a href=&quot;/blog/2023/06/20/trino-fest-2023-recap.html&quot;&gt;Trino Fest 2023&lt;/a&gt;
detailing their switch to AWS Athena and how “queries that took hours with old
solution now take around a minute today”. Tune in to the talk, or read the
recap!&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/WCuJaW7zC8k&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;At Arctic Wolf, data access use cases fall under three categories: investigations,
compliance, and the customer self-serve platform. The process of preparing the
data follows an established pattern of starting with a datastore, performing an
operation to filter or transform the data, and then outputting the data in a
format like CSV or JSON, depending on the client’s needs. Arctic Wolf’s custom
legacy service was unable to match the growing service demand and had four main
problems:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Optimized for breadth over depth&lt;/li&gt;
  &lt;li&gt;Struggles to handle growing service demand&lt;/li&gt;
  &lt;li&gt;Proprietary query language&lt;/li&gt;
  &lt;li&gt;Complicated design&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This compelled Anas’ team to find a different and improved service: Trino as
provided by AWS Athena.&lt;/p&gt;

&lt;p&gt;They had four main objectives for the new service: defined access patterns,
performance at scale, user-friendliness, and deterministic pricing. AWS Athena
satisfied these objectives, while also providing numerous benefits such as using
a powerful query engine, being purpose-built for large datasets, using SQL
syntax, and having a clear pricing structure. However, with these benefits come
some drawbacks for Athena. These include being subject to quota limits, having
suboptimal file sizes for their system, and being unable to control access
sufficiently. Anas addresses this by using log queries that resolve these three
main impediments. As a next step, Anas is considering switching to a self-managed
Trino deployment for more control with the same performance gains.&lt;/p&gt;

&lt;p&gt;Want to learn more about log queries that they use? Check out Anas’ explanation
in the video!&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Anas Shakra, Ryan Duan</name>
        </author>
      

      <summary>Arctic Wolf Networks, a cybersecurity company that provides security monitoring against cyber threats, is one of the companies that have recently switched to using AWS Athena as a new and efficient service to query their data using Trino. AWS Athena is a serverless, interactive analytics service built on open-source frameworks that runs on Trino, supporting open table and file formats and providing a simplified, flexible way to analyze petabytes of data where it lives. Senior software developer Anas Shakra from Arctic Wolf Networks gave a talk at Trino Fest 2023 detailing their switch to AWS Athena and how “queries that took hours with old solution now take around a minute today”. Tune in to the talk, or read the recap!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/ArcticWolf.png" />
      
    </entry>
  
    <entry>
      <title>Ibis: Because SQL is everywhere and so is Python</title>
      <link href="https://trino.io/blog/2023/07/03/trino-fest-2023-ibis.html" rel="alternate" type="text/html" title="Ibis: Because SQL is everywhere and so is Python" />
      <published>2023-07-03T00:00:00+00:00</published>
      <updated>2023-07-03T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/07/03/trino-fest-2023-ibis</id>
      <content type="html" xml:base="https://trino.io/blog/2023/07/03/trino-fest-2023-ibis.html">&lt;p&gt;The PyData stack has been described as “unreasonably effective,” empowering its
users to glean insights and analyze moderate amounts of data with a high level
of flexibility and excellent visualization. The large-scale, production data
stack using a query engine like Trino sits on the other side of the world,
capable of handling petabytes and exabytes, but perhaps not integrating as
seamlessly with the Python ecosystem as one would hope. SQL has been a means of
bridging this gap, but we’ve now got an exciting solution to bridge it even
better: Ibis.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/JMUtPl-cMRc&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-fest-2023/TrinoFest2023Ibis.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A major problem with bridging the gap between Python and SQL engines has been
the lack of standardization in SQL. Though Trino prides itself on being
ANSI-compliant and many other SQL dialects strive to be similar, the reality is
that every SQL engine is different, and a complicated SQL query will error out
or return different results based on what engine you’re using. So if you want to
convert some Python code to SQL, the question is… which SQL? If you’re doing
your data analysis in Python because you prefer to use it, spending time
scratching your head and trying to work out a SQL conversion can be frustrating,
time-consuming, and painful. But SQL is everywhere, and for large, performant,
efficient queries, you may need a SQL engine like Trino.&lt;/p&gt;

&lt;p&gt;Enter Ibis, a lightweight Python library for “data wrangling.” It can easily
convert your Python code into SQL queries for 16 different engines, including
Trino. With Ibis, you can leverage the ease of writing Python code with the
power and performance of running queries in Trino, getting the best of both
worlds in both the Python and SQL ecosystems. Want to learn more? Check out
&lt;a href=&quot;https://ibis-project.org/&quot;&gt;the Ibis project website&lt;/a&gt;, give the talk a listen,
and tune in to the Trino Community Broadcast on July 6th, where we’ll be going
into even more detail about Ibis.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Phillip Cloud, Cole Bowden</name>
        </author>
      

      <summary>The PyData stack has been described as “unreasonably effective,” empowering its users to glean insights and analyze moderate amounts of data with a high level of flexibility and excellent visualization. The large-scale, production data stack using a query engine like Trino sits on the other side of the world, capable of handling petabytes and exabytes, but perhaps not integrating as seamlessly with the Python ecosystem as one would hope. SQL has been a means of bridging this gap, but we’ve now got an exciting solution to bridge it even better: Ibis.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/Ibis.png" />
      
    </entry>
  
    <entry>
      <title>CDC patterns in Apache Iceberg</title>
      <link href="https://trino.io/blog/2023/06/30/trino-fest-2023-apacheiceberg.html" rel="alternate" type="text/html" title="CDC patterns in Apache Iceberg" />
      <published>2023-06-30T00:00:00+00:00</published>
      <updated>2023-06-30T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/06/30/trino-fest-2023-apacheiceberg</id>
      <content type="html" xml:base="https://trino.io/blog/2023/06/30/trino-fest-2023-apacheiceberg.html">&lt;p&gt;Have you ever wanted to keep your data in a table and have an efficient way to
interact with it? Iceberg, an open standard table format, is
exactly what you need. One of the great and unique features of the Iceberg
table format is its support for change data capture (CDC). Co-creator of
Apache Iceberg, Ryan Blue, presented at &lt;a href=&quot;/blog/2023/06/20/trino-fest-2023-recap.html&quot;&gt;Trino Fest 2023&lt;/a&gt; this past week detailing the CDC support
and the trade-offs between different patterns that can be used for writing
CDC streams into Iceberg tables.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/GM7EvRc7_is&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-fest-2023/TrinoFest2023Iceberg.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;To begin, what is CDC and why should you use it? CDC is the idea that when
relational or transactional tables are modified, you emit an update stream.
This enables you to keep copies in sync by capturing changes to tables as
they happen. As Ryan states, “[CDC] is very lightweight on the source
database … rather than being super careful with what we run on the database,
what we want to do is just make a copy of it very easily and maintain that
copy.” Ryan continues with an example of a bank using a transactional table
in Iceberg to provide some context on what’s going on.&lt;/p&gt;

&lt;p&gt;Although CDC has many advantages, there are also some problems that make it
difficult:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Lower latency means more work&lt;/li&gt;
  &lt;li&gt;Write amplification - the work necessary to balance the trade-offs between
efficiency at write time and efficiency at read time&lt;/li&gt;
  &lt;li&gt;Batch writes with double update and possible inconsistency&lt;/li&gt;
  &lt;li&gt;Read requirements with the different types of deletes in a table&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With these types of problems, the trade-offs between the different patterns
become all the more important, given the need for utmost efficiency. The first
trade-offs that Ryan discusses are the storage trade-offs between using direct
writes and a change log table, which he considers the most important and often
overlooked decision. The next trade-offs concern the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; pattern’s
choice of lazy merge (merge-on-read) or eager merge (copy-on-write). In
addition, the commit frequency trade-offs have different benefits depending on
whether you prefer faster or slower commits. The change log pattern and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; pattern both
have benefits you may want, so Ryan suggests a hybrid of the two that may give
you the best of both. With Iceberg, you have the choice: the different CDC
patterns are supported so you can adjust your usage to your specific needs.
Check out the video and review the slides for more details!&lt;/p&gt;

&lt;p&gt;Want to read more about CDC? Check out some of Ryan Blue’s blog posts:
&lt;a href=&quot;https://tabular.io/blog/hello-world-of-cdc/&quot;&gt;Hello, World of CDC!&lt;/a&gt; and &lt;a href=&quot;https://tabular.io/blog/cdc-data-gremlins/&quot;&gt;CDC
Data Gremlins&lt;/a&gt;!&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Ryan Blue, Ryan Duan</name>
        </author>
      

      <summary>Have you ever wanted to keep your data in a table and have an efficient way to interact with it? Iceberg, an open standard table format, is exactly what you need. One of the great and unique features of the Iceberg table format is its support for change data capture (CDC). Co-creator of Apache Iceberg, Ryan Blue, presented at Trino Fest 2023 this past week detailing the CDC support and the trade-offs between different patterns that can be used for writing CDC streams into Iceberg tables.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/ApacheIceberg.png" />
      
    </entry>
  
    <entry>
      <title>Zero-cost reporting</title>
      <link href="https://trino.io/blog/2023/06/28/trino-fest-2023-starburst-recap.html" rel="alternate" type="text/html" title="Zero-cost reporting" />
      <published>2023-06-28T00:00:00+00:00</published>
      <updated>2023-06-28T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/06/28/trino-fest-2023-starburst-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2023/06/28/trino-fest-2023-starburst-recap.html">&lt;p&gt;Let’s say you have some data. Maybe it’s in a spreadsheet, a CSV file, a
relational database, or multiple terabytes of data in an S3 bucket. You need
to run SQL queries on this data, and you’d like to share those results with your
teammates, coworkers, and partner teams, but you want to do it in a way that
allows everyone to view those results on-demand, on the web, and with the latest
results without the need for any manual effort on your part.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/586qvEyuO_U&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;There are a lot of tools that might be able to do this for you, but whatever you
choose, you’ll need to spend time or money to set it up, and you don’t want to
spend a lot. With so many options, there’s the possibility of getting stuck in
analysis paralysis, and trying to find the best way forward may leave you
stymied. Jan Waś from Starburst has a suggestion: keep it simple with Trino,
plaintext files, Git, and GitHub Actions, and you can set it all up for free.&lt;/p&gt;

&lt;p&gt;To start, why put results into plaintext files? With Markdown, files are both
human-legible and machine-readable. By saving queries in normal files, it’s easy
to see and edit those queries. You can commit your queries and results to Git,
and then you can push them to a service like GitHub, where those files will be
even more readable thanks to the web UI. Then, once on GitHub, you can use the
power of GitHub Actions to re-run the queries, update your results on a
schedule, and keep things up to date for teammates to view via GitHub Pages.
Sound neat? Check out the talk to see how Jan does it!&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Jan Waś, Cole Bowden</name>
        </author>
      

      <summary>Let’s say you have some data. Maybe it’s in a spreadsheet, a CSV file, a relational database, or multiple terabytes of data in an S3 bucket. You need to run SQL queries on this data, and you’d like to share those results with your teammates, coworkers, and partner teams, but you want to do it in a way that allows everyone to view those results on-demand, on the web, and with the latest results without the need for any manual effort on your part.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/Starburst.png" />
      
    </entry>
  
    <entry>
      <title>Anomaly detection for Salesforce’s production data using Trino</title>
      <link href="https://trino.io/blog/2023/06/26/trino-fest-2023-salesforce.html" rel="alternate" type="text/html" title="Anomaly detection for Salesforce’s production data using Trino" />
      <published>2023-06-26T00:00:00+00:00</published>
      <updated>2023-06-26T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/06/26/trino-fest-2023-salesforce</id>
      <content type="html" xml:base="https://trino.io/blog/2023/06/26/trino-fest-2023-salesforce.html">&lt;p&gt;Rolling into our next presentation from &lt;a href=&quot;/blog/2023/06/20/trino-fest-2023-recap.html&quot;&gt;Trino Fest 2023&lt;/a&gt;, we’re excited to bring you
Tuli Nivas and Geeta Shankar’s talk from the Performance Engineering Team at
Salesforce. They provide numerous reasons why they need Trino and
further explain how it is essential for anomaly detection in
their data. It’s an insightful talk about using a query engine to ensure data
quality and how switching to Trino has massively improved their performance.
You definitely don’t want to miss it.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/nFuqpb2GjVI&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-fest-2023/TrinoFest2023Salesforce.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;Salesforce provides customer relationship management software and applications
focused on sales, customer service, marketing automation, e-commerce, analytics,
and application development. They host hundreds of thousands of customers that
generate millions of transactions per day. For a company of this size, they
need a query engine that is fast and efficient. During the talk, Tuli made it
clear how much Salesforce relies on Trino, stating, “Trino has been a one-stop
shop for analytics.” Trino is the perfect solution for them, as Tuli mentions,
“Because of how well Trino scales and how efficiently it has been able to
process even the most gnarly looking queries.” It allows them to do everything
they need.&lt;/p&gt;

&lt;p&gt;In addition, Trino has helped Salesforce get more value from their production
logging data by accelerating their access to it, speeding up their decision
making. For years, they used Splunk for all their production data, but after
switching to Trino, they have had numerous improvements:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Reducing their team’s analytics cost&lt;/li&gt;
  &lt;li&gt;Improving their cost-to-serve&lt;/li&gt;
  &lt;li&gt;Improving the time it takes to run the same query by 194%&lt;/li&gt;
  &lt;li&gt;Providing an SLA of 20-minute latency on all production logs&lt;/li&gt;
  &lt;li&gt;Retaining and accessing data for up to 2 years, compared to Splunk’s 30 days&lt;/li&gt;
  &lt;li&gt;Reducing the number of queries needed, which creates a smaller footprint&lt;/li&gt;
  &lt;li&gt;Creating tables and views for temporary data storage and analytics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With this, they use specific heuristics to create an anomaly detection framework
with a very quick response time that they can monitor constantly. This also lets
them observe customer behavior efficiently and respond quickly to any urgent
changes. In the future, they plan to expand and ramp up their usage of Trino
throughout their teams.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Tuli Nivas, Geeta Shankar, Ryan Duan</name>
        </author>
      

      <summary>Rolling into our next presentation from Trino Fest 2023, we’re excited to bring you Tuli Nivas and Geeta Shankar’s talk from the Performance Engineering Team at Salesforce. They provide numerous reasons why they need Trino and further explain how it is essential for anomaly detection in their data. It’s an insightful talk about using a query engine to ensure data quality and how switching to Trino has massively improved their performance. You definitely don’t want to miss it.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/Salesforce.png" />
      
    </entry>
  
    <entry>
      <title>Trino for lakehouses, data oceans, and beyond</title>
      <link href="https://trino.io/blog/2023/06/22/trino-fest-2023-keynote-recap.html" rel="alternate" type="text/html" title="Trino for lakehouses, data oceans, and beyond" />
      <published>2023-06-22T00:00:00+00:00</published>
      <updated>2023-06-22T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/06/22/trino-fest-2023-keynote-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2023/06/22/trino-fest-2023-keynote-recap.html">&lt;p&gt;&lt;a href=&quot;/blog/2023/06/20/trino-fest-2023-recap.html&quot;&gt;Trino Fest 2023&lt;/a&gt; got off to a
bang, as Trino co-creator and maintainer Martin Traverso gave an update on all
the amazing things that have happened to Trino since
&lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;Trino Summit last year&lt;/a&gt;. He
also provided some insight into what’s coming down the pipeline for Trino, with
a brief look at the project’s roadmap. You can watch the recording of the talk
if you want to see for yourself, or you can read on for the highlights.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/SJ1h-I7HoII&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-fest-2023/TrinoFest2023Keynote.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;It’s only been about 7 months since Trino Summit in 2022, but Trino moves
quickly. In the words of Martin, “the project is on fire” and “is as active as
it’s ever been,” leaving us a lot to catch up on since then:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;16 releases and 2,250 commits&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/episodes/47.html&quot;&gt;Two new maintainers&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Several new table functions&lt;/li&gt;
  &lt;li&gt;Simplified configuration and improved performance for fault-tolerant execution&lt;/li&gt;
  &lt;li&gt;Better support for schema evolution and lakehouse migration&lt;/li&gt;
  &lt;li&gt;45 bullet points worth of performance improvements&lt;/li&gt;
  &lt;li&gt;Tracing with OpenTelemetry&lt;/li&gt;
  &lt;li&gt;An improved Python client and dbt Cloud support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And keep in mind that these are the highlights of the highlights! In the talk,
Martin goes into depth on all of the above, making it a worthwhile watch or
listen. There’s also a lot to look forward to, which you’ll hear more about as
they roll out in the coming months:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;SQL 2023, including enhancements to JSON functions and numeric literals&lt;/li&gt;
  &lt;li&gt;A new Snowflake connector and an improved Redis connector&lt;/li&gt;
  &lt;li&gt;Java 21&lt;/li&gt;
  &lt;li&gt;Project Hummingbird, the ongoing effort to incrementally make Trino faster
than ever before&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Martin Traverso, Cole Bowden</name>
        </author>
      

      <summary>Trino Fest 2023 got off to a bang, as Trino co-creator and maintainer Martin Traverso gave an update on all the amazing things that have happened to Trino since Trino Summit last year. He also provided some insight into what’s coming down the pipeline for Trino, with a brief look at the project’s roadmap. You can watch the recording of the talk if you want to see for yourself, or you can read on for the highlights.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/Keynote.png" />
      
    </entry>
  
    <entry>
      <title>Trino Fest 2023 recap</title>
      <link href="https://trino.io/blog/2023/06/20/trino-fest-2023-recap.html" rel="alternate" type="text/html" title="Trino Fest 2023 recap" />
      <published>2023-06-20T00:00:00+00:00</published>
      <updated>2023-06-20T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/06/20/trino-fest-2023-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2023/06/20/trino-fest-2023-recap.html">&lt;p&gt;Last week we held Trino Fest, and it kept us all so busy, we forgot to spend
time chilling by the lakehouse! Great demos, amazing announcements, new plugins,
and use cases reached our active audience. Thanks go to our event host and
organizer &lt;a href=&quot;https://www.starburst.io/&quot;&gt;Starburst&lt;/a&gt;, to our sponsors
&lt;a href=&quot;https://aws.amazon.com/&quot;&gt;AWS&lt;/a&gt; and &lt;a href=&quot;https://www.alluxio.io/&quot;&gt;Alluxio&lt;/a&gt;, to our
many well-prepared speakers, and to our great live audience. Now you get a
chance to catch up on anything you missed.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;container&quot;&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.starburst.io/&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/starburst-small.png&quot; title=&quot; Starburst, event host and organizer &quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.alluxio.io/&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/alluxio-small.png&quot; title=&quot;Alluxio, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://aws.amazon.com/&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/aws-small.png&quot; title=&quot;AWS, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;In the weeks leading up to the event, we published numerous blog posts, and
racked up great interest in the Trino community and beyond. Over 1100
registrations blew away our numbers from last year. More importantly, during the
two half-days of the event, we had over 560 attendees watching live and
participating in the busy chat.&lt;/p&gt;

&lt;h2 id=&quot;sessions&quot;&gt;Sessions&lt;/h2&gt;

&lt;p&gt;If you could not attend every session, or if you missed the event entirely,
we’ve got great news for you! You still have a chance to learn
from the presentations and the experience and knowledge of our speakers.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/06/22/trino-fest-2023-keynote-recap.html&quot;&gt;Trino for lakehouses, data oceans, and beyond&lt;/a&gt;
presented by Martin Traverso, co-creator of Trino and CTO at
&lt;a href=&quot;https://www.starburst.io/&quot;&gt;Starburst&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/06/26/trino-fest-2023-salesforce.html&quot;&gt;Anomaly detection for Salesforce’s production data using
Trino&lt;/a&gt; presented by Geeta Shankar and Tuli Nivas
from &lt;a href=&quot;https://www.salesforce.com/&quot;&gt;Salesforce&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/06/28/trino-fest-2023-starburst-recap.html&quot;&gt;Zero-cost reporting&lt;/a&gt; presented by Jan Waś from
&lt;a href=&quot;https://www.starburst.io/&quot;&gt;Starburst&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/06/30/trino-fest-2023-apacheiceberg.html&quot;&gt;CDC patterns in Apache Iceberg&lt;/a&gt; presented by Ryan
Blue from &lt;a href=&quot;https://tabular.io/&quot;&gt;Tabular&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/07/03/trino-fest-2023-ibis.html&quot;&gt;Ibis: Because SQL is everywhere and so is Python&lt;/a&gt;
presented by Phillip Cloud from &lt;a href=&quot;https://voltrondata.com/&quot;&gt;Voltron Data&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/07/05/trino-fest-2023-arcticwolf.html&quot;&gt;AWS Athena (Trino) in the cybersecurity space&lt;/a&gt;
presented by Anas Shakra from &lt;a href=&quot;https://arcticwolf.com/&quot;&gt;Arctic Wolf&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/07/07/trino-fest-2023-onehouse-recap.html&quot;&gt;Skip rocks and files: Turbocharge Trino queries with Hudi’s multi-modal
indexing subsystem&lt;/a&gt;
presented by Nadine Farah and Sagar Sumit from &lt;a href=&quot;https://www.onehouse.ai/&quot;&gt;OneHouse&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/07/10/trino-fest-2023-redis.html&quot;&gt;Redis &amp;amp; Trino - Real-time indexed SQL queries (new
connector)&lt;/a&gt; presented by Allen Terleto and
Julien Ruaux from &lt;a href=&quot;https://redis.com/&quot;&gt;Redis&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/07/12/trino-fest-2023-let-it-snow-recap.html&quot;&gt;Let it SNOW for Trino&lt;/a&gt;
presented by Erik Anderson from &lt;a href=&quot;https://www.bloomberg.com/company/values/tech-at-bloomberg/open-source/projects/&quot;&gt;Bloomberg&lt;/a&gt;
and Yu Teng from &lt;a href=&quot;https://www.ovhcloud.com/en-ie/public-cloud/data-platform/&quot;&gt;ForePaaS&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/07/14/trino-fest-2023-dune.html&quot;&gt;DuneSQL, a query engine for blockchain data&lt;/a&gt; presented by Miguel Filipe and Jonas
Irgens Kylling from &lt;a href=&quot;https://dune.com/&quot;&gt;Dune&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/07/17/trino-fest-2023-comcast-recap.html&quot;&gt;Data Mesh implementation using Hive views&lt;/a&gt;
presented by Alejandro Rojas from &lt;a href=&quot;https://comcast.github.io/&quot;&gt;Comcast&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/07/19/trino-fest-2023-stripe.html&quot;&gt;Inspecting Trino on ice&lt;/a&gt; presented by Kevin Liu
from &lt;a href=&quot;https://stripe.com/&quot;&gt;Stripe&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/07/21/trino-fest-2023-alluxio-recap.html&quot;&gt;Trino optimization with distributed caching on Data Lake&lt;/a&gt;
presented by Hope Wang and Beinan Wang from &lt;a href=&quot;https://www.alluxio.io/&quot;&gt;Alluxio&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/07/25/trino-fest-2023-datto.html&quot;&gt;Starburst Galaxy: A romance of many architectures&lt;/a&gt; presented by Benjamin Jeter from
&lt;a href=&quot;https://www.datto.com/&quot;&gt;Datto&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/07/27/trino-fest-2023-fugue-recap.html&quot;&gt;FugueSQL, Interoperable Python and Trino for interactive workloads&lt;/a&gt;
presented by &lt;a href=&quot;https://www.linkedin.com/in/kvnkho/&quot;&gt;Kevin Kho&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;next-up&quot;&gt;Next up&lt;/h2&gt;

&lt;p&gt;This first recap shares all the video recordings with you, in case you can’t
wait. But stay tuned, because we’ll also be publishing individual recap blog
posts for each session, and they’ll include additional useful info:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Summary of the main lessons and takeaways from the session&lt;/li&gt;
  &lt;li&gt;Slide decks for you to browse on your own&lt;/li&gt;
  &lt;li&gt;Interesting and fun quotes from the speakers and audience&lt;/li&gt;
  &lt;li&gt;Notes and impressions from the audience and event hosts&lt;/li&gt;
  &lt;li&gt;Questions and answers during the event&lt;/li&gt;
  &lt;li&gt;Links to further documentation, tutorials, and other resources&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We’ll be rolling out recap posts for a few talks each week, so keep an eye out
on our &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;community chat&lt;/a&gt; or the website for updates.&lt;/p&gt;

&lt;p&gt;At the same time, we are already marching ahead and planning towards our next
major event in autumn. Trino Summit 2023 - here we come!&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Cole Bowden</name>
        </author>
      

      <summary>Last week we held Trino Fest, and it kept us all so busy, we forgot to spend time chilling by the lakehouse! Great demos, amazing announcements, new plugins, and use cases reached our active audience. Thanks go to our event host and organizer Starburst, to our sponsors AWS and Alluxio, to our many well-prepared speakers, and to our great live audience. Now you get a chance to catch up on anything you missed.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/trino-fest.png" />
      
    </entry>
  
    <entry>
      <title>Trino Fest nears with an all-star lineup</title>
      <link href="https://trino.io/blog/2023/06/01/trino-fest-hype-speaker-lineup.html" rel="alternate" type="text/html" title="Trino Fest nears with an all-star lineup" />
      <published>2023-06-01T00:00:00+00:00</published>
      <updated>2023-06-01T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/06/01/trino-fest-hype-speaker-lineup</id>
      <content type="html" xml:base="https://trino.io/blog/2023/06/01/trino-fest-hype-speaker-lineup.html">&lt;p&gt;Trino Fest is just around the corner! We’re only two weeks away, and we’re
excited to share that we’ve got an incredible speaker lineup with a wide variety
of talks about all things Trino. If you’re out of the loop,
&lt;a href=&quot;/2023-04-05-announcing-trino-fest-2023.html&quot;&gt;we announced Trino Fest&lt;/a&gt; back in
April as a two-day, free, virtual event. If you want to attend, see talks live,
and engage with our speakers in Q&amp;amp;As at the end of each session, you’ll need to
register, so don’t delay, and…&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;https://www.starburst.io/info/trinofest/&quot;&gt;
        Register to attend!
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;With that said, we’re also excited to bring you a preview of our
speaker lineup. Read on if you’d like to learn more.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;new-connectors&quot;&gt;New connectors&lt;/h2&gt;

&lt;p&gt;We’ve got two talks, one from Bloomberg and ForePaaS and another from Redis,
about ongoing efforts to extend Trino’s functionality to query even more data
sources. Erik Anderson from Bloomberg and Yu Teng from ForePaaS will talk about
their shared need for a Snowflake connector and the collaboration to combine
their two connectors and get the result merged into Trino. Allen Terleto and Julien Ruaux
at Redis will be talking about a new, custom, and improved Redis connector for
Trino, showing how you can leverage the speed of both Redis and Trino to run
queries faster than ever while seamlessly integrating with data visualization
frameworks.&lt;/p&gt;

&lt;h2 id=&quot;the-python-ecosystem&quot;&gt;The Python ecosystem&lt;/h2&gt;

&lt;p&gt;We’ve got talks from &lt;a href=&quot;https://github.com/fugue-project/fugue&quot;&gt;Fugue&lt;/a&gt; and
&lt;a href=&quot;https://ibis-project.org/&quot;&gt;Ibis&lt;/a&gt;, two different tools that integrate Python
with SQL, and then run that SQL on underlying data sources. Both have recently
added Trino support, and they’re excited to share their use cases and introduce
the Trino community to the new, powerful ways you can leverage it. Trino has
always been a SQL query engine, but with Fugue and Ibis, writing Python code to
run queries with Trino is suddenly a reality, and analysts and data scientists
may not even need to know much SQL to get the insights they’re looking for.&lt;/p&gt;

&lt;h2 id=&quot;data-lakes&quot;&gt;Data lakes&lt;/h2&gt;

&lt;p&gt;Ryan Blue, the co-founder of Iceberg and founder of Tabular, will be exploring
how to best write CDC (change data capture) streams into Iceberg tables. A talk
from Kevin Liu at Stripe will explore how a data engineer can monitor queries
being run on Iceberg to catch performance outliers and understand usage rates. A
talk from Alluxio highlights caching optimizations with Trino and data lakes.
OneHouse is giving a talk about using Trino with Hudi, exploring how to get
query latency down, how multi-modal indexing works in Hudi, and how Trino can
utilize that indexing to execute queries at astonishing speeds. A lightning talk
from Comcast will explore Hive views, and DuneSQL will be discussing its use of
Trino with Delta Lake, rounding out coverage on all four of Trino’s lakehouse
connectors.&lt;/p&gt;

&lt;h2 id=&quot;and-more&quot;&gt;And more!&lt;/h2&gt;

&lt;p&gt;We’ll hear from customers of Trino’s main commercial vendors - Datto will be
discussing their use of Starburst Galaxy, and Arctic Wolf will give an overview
of how AWS Athena helps them provide data to customers. Jan Waś from Starburst
has a lightning talk on avoiding the costs of BI tools or expensive
visualization software by setting things up for free with GitHub Actions. And
Walmart has a talk on finding ways to cut costs with cloud storage, rounding out
our expansive lineup.&lt;/p&gt;

&lt;p&gt;Does any of that sound exciting?
&lt;a href=&quot;https://www.starburst.io/info/trinofest/&quot;&gt;Go sign up to attend Trino Fest 2023&lt;/a&gt;,
and we look forward to seeing you there!&lt;/p&gt;</content>

      
        <author>
          <name>Cole Bowden</name>
        </author>
      

      <summary>Trino Fest is just around the corner! We’re only two weeks away, and we’re excited to share that we’ve got an incredible speaker lineup with a wide variety of talks about all things Trino. If you’re out of the loop, we announced Trino Fest back in April as a two-day, free, virtual event. If you want to attend, see talks live, engage with our speakers in Q&amp;amp;As at the end of each session, you’ll need to register, so don’t delay, and… Register to attend! With that said, we’re also excited to bring you a preview of our exciting speaker lineup. Read on if you’d like to learn more.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/trino-fest-featured-talks.png" />
      
    </entry>
  
    <entry>
      <title>Trino at Open Source Summit North America 2023</title>
      <link href="https://trino.io/blog/2023/05/15/oss-na.html" rel="alternate" type="text/html" title="Trino at Open Source Summit North America 2023" />
      <published>2023-05-15T00:00:00+00:00</published>
      <updated>2023-05-15T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/05/15/oss-na</id>
      <content type="html" xml:base="https://trino.io/blog/2023/05/15/oss-na.html">&lt;p&gt;Last week, I had the pleasure to attend &lt;a href=&quot;https://events.linuxfoundation.org/open-source-summit-north-america/&quot;&gt;Open Source Summit North America
2023&lt;/a&gt; in
Vancouver. A quick hop across the &lt;a href=&quot;https://en.wikipedia.org/wiki/Strait_of_Georgia&quot;&gt;Strait of
Georgia&lt;/a&gt; got me right into the
event and into the midst of my peers of open source developers, advocates, and
enthusiasts.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;A highlight of the event for me was catching up with many existing and new
friends from the open source communities. It was inspiring to learn details
about the success of open source projects, including
&lt;a href=&quot;https://opensearch.org/&quot;&gt;OpenSearch&lt;/a&gt;, &lt;a href=&quot;https://riscv.org/about/&quot;&gt;RISC-V&lt;/a&gt;, the
British Columbia government &lt;a href=&quot;https://developer.gov.bc.ca/&quot;&gt;DevHub project&lt;/a&gt;, NASA
&lt;a href=&quot;https://code.nasa.gov/&quot;&gt;open source&lt;/a&gt; and &lt;a href=&quot;https://data.nasa.gov/&quot;&gt;open data
projects&lt;/a&gt;, and many others.&lt;/p&gt;

&lt;p&gt;In my interview with John Furrier and Rob Strechay for &lt;a href=&quot;https://www.thecube.net/&quot;&gt;SiliconANGLE
theCUBE&lt;/a&gt;, I was able to share more information about
Trino, query engines, lakehouses, and &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;. We also
talked about the benefits of using Trino for different use cases, how data
continues to be crucial, and how it has become even more important thanks to
the new wave of large language models.&lt;/p&gt;

&lt;div style=&quot;padding-bottom: 1rem&quot;&gt;
  &lt;a class=&quot;btn btn-orange&quot; style=&quot;display: inline-grid;&quot; href=&quot;https://siliconangle.com/2023/05/11/making-data-accessibility-faster-and-friendly-using-distributed-query-insights-ossummit/&quot; target=&quot;_blank&quot;&gt;Read more about the interview and watch the video&lt;/a&gt;
&lt;/div&gt;

&lt;p&gt;SiliconANGLE theCUBE features &lt;a href=&quot;https://www.thecube.net/events/linux-foundation/open-source-summit-na-2023&quot;&gt;more interview coverage from the
summit&lt;/a&gt;,
and The Linux Foundation &lt;a href=&quot;https://events.linuxfoundation.org/open-source-summit-north-america/&quot;&gt;makes keynote and session videos as well as
presentation decks available&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;My special thanks goes to Starburst for sending me to represent the Trino
community at the summit. I also really appreciate the help with organizing Trino
Fest. The speaker proposals are all in, and the free, virtual event promises
to be a great showcase of Trino, modern lakehouse platforms and tools from the
community of users, contributors and vendors, and our increased adoption for a
wide range of use cases.&lt;/p&gt;

&lt;div style=&quot;padding-bottom: 1rem&quot;&gt;
  &lt;a class=&quot;btn btn-pink&quot; style=&quot;display: inline-grid;&quot; href=&quot;https://www.starburst.io/info/trinofest/&quot; target=&quot;_blank&quot;&gt;Register for Trino Fest 2023&lt;/a&gt;
&lt;/div&gt;

&lt;p&gt;Join us in June for the event; you don’t want to miss the announcements
and demos.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Last week, I had the pleasure to attend Open Source Summit North America 2023 in Vancouver. A quick hop across the Strait of Georgia got me right into the event and into the midst of my peers of open source developers, advocates, and enthusiasts.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/manfred-open-source-summit.jpg" />
      
    </entry>
  
    <entry>
      <title>Refreshing at the lakehouse summer camp</title>
      <link href="https://trino.io/blog/2023/05/03/refresh-at-trino-fest.html" rel="alternate" type="text/html" title="Refreshing at the lakehouse summer camp" />
      <published>2023-05-03T00:00:00+00:00</published>
      <updated>2023-05-03T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/05/03/refresh-at-trino-fest</id>
      <content type="html" xml:base="https://trino.io/blog/2023/05/03/refresh-at-trino-fest.html">&lt;p&gt;Summer is just around the corner, and we are busy getting ready for &lt;a href=&quot;/blog/2023/04/05/announcing-trino-fest-2023.html&quot;&gt;Trino Fest
2023&lt;/a&gt;. Everything is
ramping up. Early birds are starting to register, and &lt;a href=&quot;https://www.starburst.io/info/trinofest&quot;&gt;so should
you&lt;/a&gt;. Our Trino Fest theme song is
available for your listening pleasure, and we are reviewing speaker submissions.
The festival promises to be another great event to learn about lakehouse use
cases, and we are also featuring some great presentations about
querying data with Trino. And of course, we are still looking for more
presenters, so don’t hesitate and &lt;a href=&quot;https://sessionize.com/trino-fest-2023&quot;&gt;submit your
proposal&lt;/a&gt;.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Before you dive into the technical details of our upcoming conference, lean back
and listen to our theme song. Hopefully you are feeling the summer vibe coming
your way already.&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/6oN-70jSbF8&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;Our event host &lt;a href=&quot;https://www.starburst.io/&quot;&gt;Starburst&lt;/a&gt; is again helping us ensure
that Trino Fest is a venue for Trino beginners and experts to meet, exchange
ideas, and learn from each other. One of the Starburst engineers, &lt;a href=&quot;https://github.com/nineinchnick&quot;&gt;Jan
Waś&lt;/a&gt;, is scheduled to present about his
amazingly low-effort setup to use Trino for data analysis and report generation.&lt;/p&gt;

&lt;p&gt;Getting closer to the theme of the event “Lakehouse summer camp”, we are
planning to have sessions about Iceberg, Delta Lake, and Hudi usage with Trino.
Learn about the latest developments from these projects and practical tips and
tricks from the user community.&lt;/p&gt;

&lt;p&gt;In the keynote, Martin Traverso will speak about the many new features that
arrived in Trino since &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;Trino Summit last year&lt;/a&gt;. This includes the new Apache Ignite
connector we talked about in the &lt;a href=&quot;https://trino.io/episodes/46.html&quot;&gt;Trino Community Broadcast episode
46&lt;/a&gt;. At Trino Fest we are going to share some
more exciting news about new connectors and integrations for Trino. Specifically,
on the client tooling side, you can expect some great demos and news from the
Python community.&lt;/p&gt;

&lt;p&gt;So what are you waiting for? It’s time to register for the event. And if you
think you also want to share your knowledge and usage of Trino, submit a speaker
proposal.&lt;/p&gt;

&lt;div style=&quot;padding-bottom: 1rem&quot;&gt;
  &lt;a class=&quot;btn btn-orange&quot; style=&quot;display: inline-grid;&quot; href=&quot;https://www.starburst.io/info/trinofest/&quot; target=&quot;_blank&quot;&gt;Register&lt;/a&gt;
  &lt;a class=&quot;btn btn-pink&quot; style=&quot;display: inline-grid;&quot; href=&quot;https://sessionize.com/trino-fest-2023&quot; target=&quot;_blank&quot;&gt;Submit a talk&lt;/a&gt;
&lt;/div&gt;

&lt;p&gt;In either case, as your hosts and guides through the two half days, we look
forward to having you at the event.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred and Cole&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Cole Bowden</name>
        </author>
      

      <summary>Summer is just around the corner, and we are busy getting ready for Trino Fest 2023. Everything is ramping up. Early birds are starting to register, and so should you. Our Trino Fest theme song is available for your listening pleasure, and we are reviewing speaker submissions. The festival is promising to be another great event to learn about lakehouse use cases with Trino, but we are also featuring some great presentations for querying data with Trino. And of course, we are still looking for more presenters, so don’t hesitate and submit your proposal.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/trino-fest.png" />
      
    </entry>
  
    <entry>
      <title>Just the right time date predicates with Iceberg</title>
      <link href="https://trino.io/blog/2023/04/11/date-predicates.html" rel="alternate" type="text/html" title="Just the right time date predicates with Iceberg" />
      <published>2023-04-11T00:00:00+00:00</published>
      <updated>2023-04-11T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/04/11/date-predicates</id>
      <content type="html" xml:base="https://trino.io/blog/2023/04/11/date-predicates.html">&lt;p&gt;In the data lake world, data partitioning is a technique that is critical to the
performance of read operations. To avoid accidentally scanning large amounts of
data, and to limit the number of partitions processed by a query, a query engine
must push down constant expressions when filtering partitions.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Partitions in an Iceberg table tend to be fairly large, containing up to tens or
even hundreds of data files. It is therefore crucial to skip
irrelevant partitions while scanning a table in order to maintain high
query performance. When a table is created in a data lake, its partitioning
scheme constitutes a de-facto index, speeding up queries against it by pruning
out irrelevant partitions from the scan operation.&lt;/p&gt;

&lt;p&gt;Date and time are natural and universal partitioning candidates. Common
partition patterns revolve around month, day, or hour. One exciting feature of the
Iceberg table format is its &lt;a href=&quot;https://trino.io/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg.html#partition-specification-evolution&quot;&gt;hidden
partitioning&lt;/a&gt;.
Iceberg uses handy
&lt;a href=&quot;https://trino.io/docs/current/connector/iceberg.html#partitioned-tables&quot;&gt;transforms&lt;/a&gt;
such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;year&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;month&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;day&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hour&lt;/code&gt; to deal with the complexities of mapping
a raw timestamp value to an actual partition value in a manner that is
transparent to the user.&lt;/p&gt;

&lt;p&gt;Let’s look at a typical example of an Iceberg table containing log events which
are partitioned by day:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;logs&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;event_time&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;time&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;zone&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;level&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;message&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;partitioning&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ARRAY&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;day(event_time)&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;When dealing with logs, it often happens that we want to know what happened
today or within the last few days:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;logs&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;event_time&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CURRENT_DATE&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;logs&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;event_time&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CURRENT_DATE&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;INTERVAL&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;7&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DAY&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;constant-folding&quot;&gt;Constant folding&lt;/h2&gt;

&lt;p&gt;Trino uses the &lt;em&gt;constant folding&lt;/em&gt; optimization technique to deal with these
types of queries: it internally rewrites the filter expression as a comparison
predicate against a constant that is evaluated once, before executing the query,
to avoid recalculating the same expression for each row scanned:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/date-predicates/constant_folding.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
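
&lt;p&gt;As a sketch, the seven-day filter from the earlier query is evaluated once at
planning time and replaced with a constant. The exact value depends on when the
query runs; a start date of 2022-01-27 is assumed here purely for illustration:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- original filter
WHERE event_time &amp;gt;= CURRENT_DATE - INTERVAL &apos;7&apos; DAY
-- after constant folding, assuming the query runs on 2022-01-27
WHERE event_time &amp;gt;= TIMESTAMP &apos;2022-01-20 00:00:00.000000 UTC&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;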

&lt;h2 id=&quot;predicate-pushdown&quot;&gt;Predicate pushdown&lt;/h2&gt;

&lt;p&gt;Another common query scenario for log data is to query for a specific date in
the past. A seasoned SQL user, being aware of the underlying data type of the
partitioning column, would likely specify the date to be queried explicitly as
two timestamp constant filter expressions:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;logs&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;event_time&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;TIMESTAMP&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2022-01-20 00:00:00.000000 UTC&apos;&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;event_time&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;TIMESTAMP&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2022-01-21 00:00:00.000000 UTC&apos;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;A different flavor of the above-mentioned query would be to use
the &lt;a href=&quot;/docs/current/functions/comparison.html#range-operator-between&quot;&gt;BETWEEN&lt;/a&gt;
range operator:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;logs&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;event_time&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BETWEEN&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;TIMESTAMP&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2022-01-20 00:00:00.000000 UTC&apos;&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;TIMESTAMP&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2022-01-20 23:59:59.999999 UTC&apos;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Users can focus on writing queries that are concise and readable, and leave
the optimization grunt work to the query engine.&lt;/p&gt;

&lt;p&gt;A succinct way of querying the logs for a specific day would be to cast the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;timestamp&lt;/code&gt; field value to its corresponding &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date&lt;/code&gt; value and compare it with
the day containing the relevant logs:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;logs&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;CAST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;event_time&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;DATE&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2022-01-20&apos;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In this case, Trino &lt;a href=&quot;https://github.com/trinodb/trino/commit/49be4c2a&quot;&gt;unwraps the initial temporal
filter&lt;/a&gt; into a filter that tests
whether the column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time&lt;/code&gt; is within the constant timestamp range
corresponding to the date used in the initial filter, which is equivalent to the
most efficient of the explicit filters shown above.&lt;/p&gt;
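
&lt;p&gt;Conceptually, the unwrapped predicate corresponds to the half-open range
from the first query in this post:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;event_time &amp;gt;= TIMESTAMP &apos;2022-01-20 00:00:00.000000 UTC&apos;
AND event_time &amp;lt; TIMESTAMP &apos;2022-01-21 00:00:00.000000 UTC&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;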

&lt;p&gt;A different approach of querying the log data for a specific date is to use the
&lt;a href=&quot;/docs/current/functions/datetime.html#truncation-function&quot;&gt;date_trunc&lt;/a&gt;
function:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;logs&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;date_trunc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;day&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;event_time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;DATE&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2022-01-20&apos;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Trino again &lt;a href=&quot;https://github.com/trinodb/trino/commit/80c079f9&quot;&gt;replaces the initial temporal
filter&lt;/a&gt; with a filter that tests
whether the column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time&lt;/code&gt; is within the constant timestamp range
corresponding to the date used in the initial filter.&lt;/p&gt;

&lt;p&gt;A slightly different use case is querying the log data to see whether an exotic
error type was recorded at any point during a specific year, by making use of the
&lt;a href=&quot;/docs/current/functions/datetime.html#year&quot;&gt;year()&lt;/a&gt; function:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;logs&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt;
  &lt;span class=&quot;nb&quot;&gt;year&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;event_time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2023&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This time, Trino &lt;a href=&quot;https://github.com/trinodb/trino/commit/b8967a3c1550b6e64ad8d3e7979ea46fbfc51550&quot;&gt;rewrites the temporal
filter&lt;/a&gt;
applied on the column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time&lt;/code&gt; into a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BETWEEN&lt;/code&gt; filter covering the timestamp
range corresponding to the entire span of the specified year:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;event_time&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BETWEEN&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;TIMESTAMP&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2023-01-01 00:00:00.000000 UTC&apos;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;TIMESTAMP&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2023-12-31 23:59:59.999999 UTC&apos;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Without predicate pushdown, Trino applies the filter to each tuple after
scanning the entire content of the table:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/date-predicates/filter_basic_data_flow.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The optimization techniques employed by Trino to speed up these types of
queries all involve replacing the provided filter with an equivalent filter
expression. Constant replacement optimizations compare the table column against
a constant or a constant range, so that the filter can be pushed down to
&lt;a href=&quot;https://iceberg.apache.org/&quot;&gt;Iceberg&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;As a consequence, partition pruning happens in the metadata layer of the
table instead of filtering on top of the data itself, dramatically reducing the
number of data files scanned:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/date-predicates/filter_push_down_data_flow.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;As described in the &lt;a href=&quot;https://iceberg.apache.org/spec/&quot;&gt;Iceberg Table Spec&lt;/a&gt;, for
any snapshot of the table, Iceberg tracks each individual data file and the
partition to which it belongs. Iceberg uses a hierarchical index in its metadata
layer by storing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lower_bounds&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;upper_bounds&lt;/code&gt; for:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;each partition in the manifest list files&lt;/li&gt;
  &lt;li&gt;each data file in the manifest files&lt;/li&gt;
&lt;/ul&gt;
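
&lt;p&gt;With the Trino Iceberg connector, these bounds can be inspected directly
through the hidden &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$files&lt;/code&gt; metadata table. The following query is a
sketch; the exact set of columns depends on the connector version:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT file_path, record_count, lower_bounds, upper_bounds
FROM &quot;logs$files&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;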

&lt;p&gt;Desugaring seemingly variable filter expressions into comparison predicates
involving only columns and constants or constant ranges pays off. Not only does
it prune out partitions, but it can also skip portions of a data file (for
example, an Apache Parquet row group) or even the entire data file. For a
filter on a non-partition column, for instance, pruning and skipping occur when
the queried range does not overlap with the range of values recorded for the
file in the Iceberg metadata.&lt;/p&gt;

&lt;p&gt;To put things in perspective, the optimization techniques presented in this
article, which are already integrated in Trino, can reduce the execution time
of queries containing selective temporal filters from hours to seconds,
depending on the size of the table scanned.&lt;/p&gt;

&lt;p&gt;A reader keen to experiment and verify that the previously mentioned
optimization techniques are actually effective can use
&lt;a href=&quot;/docs/current/sql/explain.html&quot;&gt;EXPLAIN&lt;/a&gt; to examine the output
of the query planning stage. If the temporal predicate employed in the query is
pushed down, the scan operation should report fewer rows than the total number
of rows in the table.&lt;/p&gt;
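
&lt;p&gt;For example, examining the plan of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date_trunc&lt;/code&gt; query above should
reveal the rewritten range predicate in the scan node:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;EXPLAIN
SELECT *
FROM logs
WHERE date_trunc(&apos;day&apos;, event_time) = DATE &apos;2022-01-20&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;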

&lt;p&gt;The queries in this post showcase just a small fraction of the many
techniques that can be employed when querying date and time columns. Trino
continuously strives to streamline its users’ workflows by returning query
results as fast as possible.&lt;/p&gt;</content>

      
        <author>
          <name>Marius Grama</name>
        </author>
      

      <summary>In the data lake world, data partitioning is a technique that is critical to the performance of read operations. In order to avoid scanning large amounts of data accidentally, and also to limit the number of partitions that are being processed by a query, a query engine must push down constant expressions when filtering partitions.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/date-predicates/christian-pfeifer-l6OraG-v0d8-unsplash.jpg" />
      
    </entry>
  
    <entry>
      <title>Trino and the BDFL model: a renewed focus</title>
      <link href="https://trino.io/blog/2023/04/06/trino-bdfl-focus.html" rel="alternate" type="text/html" title="Trino and the BDFL model: a renewed focus" />
      <published>2023-04-06T00:00:00+00:00</published>
      <updated>2023-04-06T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/04/06/trino-bdfl-focus</id>
      <content type="html" xml:base="https://trino.io/blog/2023/04/06/trino-bdfl-focus.html">&lt;p&gt;For those who are paying close attention, you may notice updates to a few pages
across the Trino website with a renewed focus on leadership roles in Trino. This
is part of an effort to re-focus and make the operating model more transparent
both for contributors and for end users. While this is not a functional change,
this does involve clarifying our roles following the
&lt;a href=&quot;https://en.wikipedia.org/wiki/Benevolent_dictator_for_life&quot;&gt;BDFL (benevolent dictator for life)&lt;/a&gt;
model.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Trino has been a popular open source project used by many companies and
organizations since its inception in 2012. As a founder-led project, it has
consistently operated under a BDFL model, though not necessarily by name. The
model is used to describe the persons who can make the final decisions for the
direction and development of the project. Many successful open-source projects,
including Linux, Python, Scala, Ruby, and Rust, operate using a BDFL model.&lt;/p&gt;

&lt;h2 id=&quot;why-the-bdfl-model&quot;&gt;Why the BDFL model?&lt;/h2&gt;

&lt;p&gt;One of the key benefits of the BDFL model is that it allows for a clear
decision-making process. When a project has a large number of contributors, it
can be difficult to reach consensus on certain issues. The BDFL can step in and
make the final decision, which can be particularly helpful in situations where
time is of the essence. Additionally, having a BDFL can provide a sense of
stability and direction for the project.&lt;/p&gt;

&lt;p&gt;It’s important to emphasize that the use of the BDFL model is not a new
development in Trino’s history. We (Dain, David, and Martin) have acted in
this role since the beginning.&lt;/p&gt;

&lt;h2 id=&quot;why-now&quot;&gt;Why now?&lt;/h2&gt;

&lt;p&gt;Why is there a renewed focus on the BDFL model now? Trino has reached a level
of maturity and a community size that has made it increasingly important to have
clear leadership and decision-making processes. By making the BDFL model more
explicit, we can ensure that the project remains focused and continues to deliver
value to its users.&lt;/p&gt;

&lt;h2 id=&quot;more-info&quot;&gt;More info&lt;/h2&gt;

&lt;p&gt;You can check out the following pages for additional information:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/development/roles.html&quot;&gt;Roles&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/development/process.html&quot;&gt;Development process&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/individual-code-of-conduct.html&quot;&gt;Individual code of conduct&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content>

      
        <author>
          <name>Martin Traverso, Dain Sundstrom, David Phillips</name>
        </author>
      

      <summary>For those who are paying close attention, you may notice updates to a few pages across the Trino website with a renewed focus on leadership roles in Trino. This is part of an effort to re-focus and make the operating model more transparent both for contributors and for end users. While this is not a functional change, this does involve clarifying our roles following the BDFL (benevolent dictator for life) model.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/bdfl-blog/trino-logo.png" />
      
    </entry>
  
    <entry>
      <title>Polish edition of Trino: The Definitive Guide</title>
      <link href="https://trino.io/blog/2023/04/06/the-definitive-guide-2-pl.html" rel="alternate" type="text/html" title="Polish edition of Trino: The Definitive Guide" />
      <published>2023-04-06T00:00:00+00:00</published>
      <updated>2023-04-06T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/04/06/the-definitive-guide-2-pl</id>
      <content type="html" xml:base="https://trino.io/blog/2023/04/06/the-definitive-guide-2-pl.html">&lt;p&gt;At this stage Trino is used all around the globe as we know from the &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;community
chat&lt;/a&gt; and &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;our speakers at Trino Summit 2022&lt;/a&gt;. One large community of Trino
contributors and maintainers, many employed by &lt;a href=&quot;http://starburst.io&quot;&gt;Starburst&lt;/a&gt;,
is located in Poland. Poland also has very active participation from developers
and users in the Java and Big Data communities.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Today, we are happy to announce that a translation of the book &lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;Trino: The
Definitive Guide&lt;/a&gt; to Polish is
now available for the communities in Poland and beyond. We invite you all to get
your own copy:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://ksiazki.promise.pl/produkt/trino-profesjonalny-przewodnik-sql-w-dowolnej-skali-w-dowolnym-magazynie-i-w-dowolnym-srodowisku/&quot;&gt;
        Trino Profesjonalny Przewodnik
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;Our thanks for making this happen go out to the teams at O’Reilly and
&lt;a href=&quot;https://ksiazki.promise.pl/&quot;&gt;Promise&lt;/a&gt;. We hope many readers will benefit from
the translated edition.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred, Martin, and Matt&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Martin Traverso, Matt Fuller</name>
        </author>
      

      <summary>At this stage Trino is used all around the globe as we know from the community chat and our speakers at Trino Summit 2022. One large community of Trino contributors and maintainers, many employed by Starburst, is located in Poland. Poland also has a very active participation of developers and users in the Java and Big Data communities.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/ttdg2-pl-cover.png" />
      
    </entry>
  
    <entry>
      <title>Lakehouse summer camp at Trino Fest 2023</title>
      <link href="https://trino.io/blog/2023/04/05/announcing-trino-fest-2023.html" rel="alternate" type="text/html" title="Lakehouse summer camp at Trino Fest 2023" />
      <published>2023-04-05T00:00:00+00:00</published>
      <updated>2023-04-05T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/04/05/announcing-trino-fest-2023</id>
      <content type="html" xml:base="https://trino.io/blog/2023/04/05/announcing-trino-fest-2023.html">&lt;p&gt;Get ready to kick off your summer with Commander Bun Bun at Trino Fest 2023!
This year’s event is going virtual and will take place over two days, &lt;strong&gt;the 14th
and 15th of June&lt;/strong&gt;. The focus of the event will be on Trino as a data lakehouse
query engine, with discussions on how new features and the ecosystem around
Trino can support better data lakehouse management.&lt;/p&gt;

&lt;p&gt;Trino Fest 2023 is the new annual summer event dedicated to all things Trino.
Building on the success of last year’s &lt;a href=&quot;/blog/2022/05/17/cinco-de-trino-recap.html&quot;&gt;Cinco de
Trino&lt;/a&gt;, we’re excited to bring
the community together once again to explore the latest trends and innovations
in Trino and data lakehouse management. With a focus on education, community
collaboration, and inspiration, Trino Fest 2023 will be a valuable experience
for anyone interested in improving their data and analytics platform. We hope to
see you there as attendee, speaker, or sponsor! Read below to find out how to
sign up.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;call-for-speakers&quot;&gt;Call for speakers&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://sessionize.com/trino-fest-2023&quot;&gt;Call for speakers&lt;/a&gt; is now open, and we
invite you to submit a talk if you have an interesting perspective on Trino.
We’re particularly interested in talks related to:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Data lake and lakehouse use cases, architectures and experiences&lt;/li&gt;
  &lt;li&gt;Apache Iceberg&lt;/li&gt;
  &lt;li&gt;Delta Lake&lt;/li&gt;
  &lt;li&gt;Hudi&lt;/li&gt;
  &lt;li&gt;Industry use cases for Trino&lt;/li&gt;
  &lt;li&gt;Query federation&lt;/li&gt;
  &lt;li&gt;Data governance with Trino&lt;/li&gt;
  &lt;li&gt;SQL with Trino&lt;/li&gt;
  &lt;li&gt;ETL/ELT/batch query processing&lt;/li&gt;
  &lt;li&gt;Other tools and integrations in the Trino ecosystem&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The call for speakers closes on May 19th, so be sure to submit your talk soon!&lt;/p&gt;

&lt;h2 id=&quot;whats-new-this-year&quot;&gt;What’s new this year?&lt;/h2&gt;

&lt;p&gt;Aside from the new title, this year’s Trino Fest will differ from last
year’s short conference in a few ways. We’re featuring more talks from Trino
practitioners, the event will run over two shorter days to avoid the death march
of talks, and there will be more summer, lakehouse, and camping puns. Of course,
there will be continued use of the &lt;a href=&quot;https://www.youtube.com/watch?v=kfJ63DNbAuI&amp;amp;list=PLFnr63che7wYFsknFAqisURvfm96rW0Dr&amp;amp;index=4&quot;&gt;Trinoritaville song
&lt;/a&gt;.
Whether you’re just getting started with Trino or you’re a seasoned pro, there
will be something for everyone at Trino Fest.&lt;/p&gt;

&lt;h2 id=&quot;what-is-trino-fest-versus-trino-summit&quot;&gt;What is Trino Fest versus Trino Summit&lt;/h2&gt;

&lt;p&gt;Trino was &lt;a href=&quot;/blog/2020/10/20/intro-to-hive-connector.html&quot;&gt;built from the beginning to query Hive data&lt;/a&gt;, so Trino moving on to support a data
lakehouse is simply the evolution of its flagship use case. Trino Fest covers
the latest features and improvements to Trino that make it an even better choice
for data lakehouse management. You’ll hear from speakers who are using Trino in
innovative ways, and who can provide valuable insights and tips for managing
your own data lakehouse. Going with the chill summer theme, there will be plenty
of time to have fun and relax too!&lt;/p&gt;

&lt;h2 id=&quot;sponsor-trino-fest&quot;&gt;Sponsor Trino Fest&lt;/h2&gt;

&lt;p&gt;If you’re interested in sponsoring Trino Fest 2023, we’d love to hear from you!
Sponsoring the event is a great way to get your brand in front of a highly
engaged audience of Trino enthusiasts and data professionals. Your support will
help make the event a success, and in return, we’ll offer a range of benefits,
such as logo placement on our website, social media shoutouts, and more. To
learn more about sponsoring Trino Fest 2023, reach out to
&lt;a href=&quot;mailto:events@starburst.io&quot;&gt;events@starburst.io&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;see-you-there&quot;&gt;See you there&lt;/h2&gt;

&lt;p&gt;Mark your calendar to save &lt;strong&gt;the 14th
and 15th of June&lt;/strong&gt; for Trino Fest 2023: Lakehouse Summer Camp. Get ready
for a two-day event that will get you diving into the deep end of the data lake.
&lt;a href=&quot;https://www.starburst.io/info/trinofest&quot;&gt;Registration is open now&lt;/a&gt;, and &lt;a href=&quot;https://sessionize.com/trino-fest-2023&quot;&gt;the
call for speakers&lt;/a&gt; closes on April 28th,
so be sure to sign up and submit your talk soon!&lt;/p&gt;

&lt;p&gt;Happy querying!&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>Get ready to kick off your summer with Commander Bun Bun at Trino Fest 2023! This year’s event is going virtual and will take place over two days, the 14th and 15th of June. The focus of the event will be on Trino as a data lakehouse query engine, with discussions on how new features and the ecosystem around Trino can support better data lakehouse management. Trino Fest 2023 is the new annual summer event dedicated to all things Trino. Building on the success of last year’s Cinco de Trino, we’re excited to bring the community together once again to explore the latest trends and innovations in Trino and data lakehouse management. With a focus on education, community collaboration, and inspiration, Trino Fest 2023 will be a valuable experience for anyone interested in improving their data and analytics platform. We hope to see you there as attendee, speaker, or sponsor! Read below to find out how to sign up.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/trino-fest.png" />
      
    </entry>
  
    <entry>
      <title>The rabbit reflects on Trino in 2022</title>
      <link href="https://trino.io/blog/2023/01/10/trino-2022-the-rabbit-reflects.html" rel="alternate" type="text/html" title="The rabbit reflects on Trino in 2022" />
      <published>2023-01-10T00:00:00+00:00</published>
      <updated>2023-01-10T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/01/10/trino-2022-the-rabbit-reflects</id>
      <content type="html" xml:base="https://trino.io/blog/2023/01/10/trino-2022-the-rabbit-reflects.html">&lt;p&gt;It’s that time of the year where everyone gives excessively broad or niche
predictions about the finance market, venture capital, or even the data
industry. And we are now bombarded with &lt;a href=&quot;https://www.githubunwrapped.com/&quot;&gt;“year-in-review” 
summaries&lt;/a&gt; where we find out just how much
data is being collected to generate those summaries. End-of-year reflections are
always useful because you can find patterns of what’s going well and what’s
going poorly. It’s also good to pause and take stock of the things that did go
well, because without that, you’ll only be looking at the list of things that
you still have to do, and that isn’t healthy for anybody. In that spirit, let’s
reflect on what we’ve been able to accomplish as a community this year, as well
as what to look forward to in the next year!&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;2022-by-the-numbers&quot;&gt;2022 by the numbers&lt;/h2&gt;

&lt;p&gt;Let’s take a look at the Trino project’s growth and what happened specifically
in the past year:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;1,031,842 unique visits 🙋 to the Trino site&lt;/li&gt;
  &lt;li&gt;116,231 unique blog post views 👩‍💻 on the Trino site&lt;/li&gt;
  &lt;li&gt;60,296 views 👀 on YouTube&lt;/li&gt;
  &lt;li&gt;5,982 hours watched ⌚ on YouTube&lt;/li&gt;
  &lt;li&gt;4,696 new commits 💻 in GitHub&lt;/li&gt;
  &lt;li&gt;2,775 new members 👋 in Slack&lt;/li&gt;
  &lt;li&gt;2,769 new stargazers ⭐ in GitHub&lt;/li&gt;
  &lt;li&gt;2,550 pull requests merged ✅ in GitHub&lt;/li&gt;
  &lt;li&gt;1,465 issues 📝 created in GitHub&lt;/li&gt;
  &lt;li&gt;1,322 new followers 🐦 on Twitter&lt;/li&gt;
  &lt;li&gt;1,068 pull requests closed ❌ in GitHub&lt;/li&gt;
  &lt;li&gt;702 new subscribers 📺 in YouTube&lt;/li&gt;
  &lt;li&gt;658 average weekly members 💬 in Slack&lt;/li&gt;
  &lt;li&gt;56 videos 🎥 uploaded to YouTube&lt;/li&gt;
  &lt;li&gt;37 Trino 🚀 releases&lt;/li&gt;
  &lt;li&gt;36 blog ✍️ posts&lt;/li&gt;
  &lt;li&gt;12 Trino Community Broadcast ▶️ episodes&lt;/li&gt;
  &lt;li&gt;12 Trino 🍕 meetups&lt;/li&gt;
  &lt;li&gt;2 Trino ⛰️ Summits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Trino website got an impressive number of unique visits, also referred to as
entrances. This metric filters out refreshes and through traffic to count the
number of times a visitor started a unique session. Blog posts saw a 47 percent
increase from last year. Slack membership grew 13 percent and average weekly
active members grew an exciting 25 percent. YouTube views have increased by 218
percent. We’ve more than doubled the number of hours watched, which makes sense,
as we’ve nearly doubled the number of subscribers since last year.&lt;/p&gt;

&lt;p&gt;The project’s velocity hasn’t slowed down either. The number of commits grew 
27.6 percent this year and the number of created issues grew by 20 percent. This
increase in demand for features also pushed up the number of merged pull
requests by nearly 29 percent!&lt;/p&gt;

&lt;p&gt;Why are we pointing out the number of closed pull requests that weren’t merged?
We are improving communication with contributors regarding when and why we
explicitly decide not to move forward with a pull request. Part of this has
included a new initiative to close out old and inactive pull requests. There
have been a good number of pull requests that have fallen through the cracks and
are missing communication from the pull request creator or reviewer. The DevRel
team of Brian Olsen, Cole Bowden, and Manfred Moser is actively working on
improving the workflow around pull requests and issues. Cole recently posted a 
&lt;a href=&quot;/blog/2023/01/09/cleaning-up-the-trino-backlog.html&quot;&gt;blog that dives deeper&lt;/a&gt;
into what this team is actively working on to improve the experience of 
contributing to the project.&lt;/p&gt;

&lt;h3 id=&quot;trino-is-trending&quot;&gt;Trino is trending&lt;/h3&gt;

&lt;p&gt;A lot of these metrics indicate the growing popularity of Trino, but they also
help drive further awareness of the project to others. One metric we pay close
attention to is the number of visitors we get through blog posts, as they grow
Trino’s visibility. This increases the number of contributors and users that
shape Trino to be the best analytics SQL query engine on the planet. One of our
most successful blog posts was &lt;a href=&quot;/blog/2022/08/02/leaving-facebook-meta-best-for-trino.html&quot;&gt;Why leaving Facebook/Meta was the best thing we
could do for the Trino Community&lt;/a&gt;.
The day this blog post was released, it doubled the website traffic we received
and set the record for both blog post views and website views in a single day.
For reference, our previous record was set by the post announcing the project
rebranding.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/2022-review/web-views.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This post gained a lot of traction for two reasons. Posts related to Meta and
the inner workings of open source communities naturally perform well, as many
developers are interested in these topics - drama is exciting! But you can have
an interesting topic that doesn’t go viral if nobody sees it. The catalyst to
this success was actually when &lt;a href=&quot;https://news.ycombinator.com/item?id=32323746&quot;&gt;David Phillips posted this to Hacker
News&lt;/a&gt;. We hit the top ten of 
Hacker News and occupied the front page for about two days.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/2022-review/hacker-news.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;So what is the takeaway here? We need your help! While it made sense for David
to do this post once, &lt;a href=&quot;https://news.ycombinator.com/newsguidelines.html&quot;&gt;Hacker News generally looks down upon repeated
self-promotion&lt;/a&gt;. Clearly 
&lt;a href=&quot;http://redd.it/zbe333&quot;&gt;there’s a lot of people interested&lt;/a&gt; in Trino, and Hacker
News and many other social media outlets are how we get the word out. If you
don’t think that sharing has much effect, we hope sharing this impact motivates
you to help us. We don’t want to keep Trino the hidden secret of Silicon Valley
much longer. We need your help to really get people continuously reading and
hearing about all things Trino. So share any time you see something cool going
on in our community!&lt;/p&gt;

&lt;h3 id=&quot;trino-touches-the-world&quot;&gt;Trino touches the world&lt;/h3&gt;

&lt;p&gt;Let’s take a look at the number of users who initiated at least one session
on the Trino site in 2022, broken down by the top 10 countries. This goes to
show the true global reach this project has attained in 10 years.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;123,326 USA 🇺🇸 users&lt;/li&gt;
  &lt;li&gt;33,540 Indian 🇮🇳 users&lt;/li&gt;
  &lt;li&gt;30,955 Chinese 🇨🇳 users&lt;/li&gt;
  &lt;li&gt;12,282 British 🇬🇧 users&lt;/li&gt;
  &lt;li&gt;11,638 German 🇩🇪 users&lt;/li&gt;
  &lt;li&gt;10,760 Canadian 🇨🇦 users&lt;/li&gt;
  &lt;li&gt;9,980 Brazilian 🇧🇷 users&lt;/li&gt;
  &lt;li&gt;9,098 Singaporean 🇸🇬 users&lt;/li&gt;
  &lt;li&gt;8,649 South Korean 🇰🇷 users&lt;/li&gt;
  &lt;li&gt;8,636 Japanese 🇯🇵 users&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/2022-review/world.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Our reach currently favors the USA, but our aim is to grow Trino in all
countries that are starting to show interest. The new edition of “Trino: The
Definitive Guide” is being translated into Chinese, &lt;a href=&quot;https://simpligility.ca/2022/12/trino-guide-for-everyone-in-2023/&quot;&gt;Polish, and
Japanese&lt;/a&gt;. If
you want to translate the book to your local language, please reach out to
Manfred Moser.&lt;/p&gt;

&lt;h2 id=&quot;trino-celebrates-its-tenth-birthday&quot;&gt;Trino celebrates its tenth birthday&lt;/h2&gt;

&lt;p&gt;Of all the incredible things that happened, one that gave us cause to reflect
was Trino’s tenth birthday. Martin, Dain, and David &lt;a href=&quot;https://trino.io/development/vision.html&quot;&gt;cite
longevity&lt;/a&gt; of the project as one of the
core philosophies that govern decisions around Trino. We expect that Trino will
be used for at least the next 20 years. We build for the long term. This first
decade &lt;a href=&quot;/blog/2020/12/27/announcing-trino.html&quot;&gt;has been an adventurous
ride&lt;/a&gt;, and wow has it &lt;a href=&quot;/blog/2022/08/08/trino-tenth-birthday.html&quot;&gt;produced an
incredible system&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-tenth-birthday/how-it-started-going.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;We wanted to do something special with the community to celebrate this
milestone, so Brian put together a birthday video chronicling the evolution of
Presto and now Trino. We had a premiere watch party on the day of the tenth
anniversary and got some folks’ reactions. If you haven’t watched the video
yet, take a look; you don’t want to miss it.&lt;/p&gt;

&lt;div class=&quot;youtube-video-container&quot; style=&quot;text-align: center;&quot;&gt;
 
&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/hPD95_-bZZw&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;
      
&lt;/div&gt;

&lt;h2 id=&quot;trino-summit&quot;&gt;Trino Summit&lt;/h2&gt;

&lt;p&gt;The next event in 2022 was the Trino Summit, which was the first in-person
summit we’ve had as Trino, with well over 750 attendees. We had a stellar lineup
of speakers from companies like Apple, Astronomer, Bloomberg, Comcast,
Goldman Sachs, Lyft, Quora, Shopify, Upsolver, and Zillow.&lt;/p&gt;

&lt;p&gt;This summit had a Pokémon theme, making the analogy that data sources are much
like Pokémon and Trino is much like a Pokémon trainer trying to access and
federate all the data, train it, and level it up. Check out the video for
a small summary, and if you missed this event, we have all 
&lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;the recordings and slides available&lt;/a&gt;.&lt;/p&gt;

&lt;div class=&quot;youtube-video-container&quot; style=&quot;text-align: center;&quot;&gt;
 
&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/R1Z0VnKrQ9w&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;
      
&lt;/div&gt;

&lt;p&gt;We want to thank &lt;a href=&quot;https://starburst.io/&quot;&gt;Starburst&lt;/a&gt; for hosting this event and
all the sponsors for making this year’s summit possible. As usual, a huge thanks
to the community for showing up, engaging with each other, and bringing your
stories and curiosity.&lt;/p&gt;

&lt;h3 id=&quot;cinco-de-trino&quot;&gt;Cinco de Trino&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;/blog/2022/05/17/cinco-de-trino-recap.html&quot;&gt;Cinco de Trino&lt;/a&gt; was
our mini Trino Summit held in the first half of the year. It dove into using
Trino with complementary tools to build a data lakehouse. The virtual event was
held on Cinco de Mayo (5th of May), which gave it a Margaritaville, on-the-lake
vibe. We used this conference as a platform to &lt;a href=&quot;/blog/2022/05/05/tardigrade-launch.html&quot;&gt;launch the long-awaited Project
Tardigrade features&lt;/a&gt;
around the fault-tolerance mode for Trino.&lt;/p&gt;

&lt;h3 id=&quot;trino-contributor-congregation&quot;&gt;Trino Contributor Congregation&lt;/h3&gt;

&lt;p&gt;This year, we began what we are calling the Trino Contributor Congregation
(TCC), which brings together Trino contributors, maintainers, and developer
relations under the same roof. The congregation was created to counter the
siloed nature of Trino development that occurred during the pandemic. Many
community members felt like their work wasn’t being seen, and much of this was
due to a lack of communication, especially face-to-face communication, which
builds empathy and demands attention. The TCCs aim to increase connections and
collaboration between maintainers and contributors, create opportunities for
highly technical exchanges of ideas and plans for Trino, and let everyone learn
about usage scenarios and issues from each other. This is different from the
Trino Summit, since it gathers those who contribute code to keep the
conversations focused on developing features and removing blockers for
contributors.&lt;/p&gt;

&lt;p&gt;The first TCC happened just after Trino Summit in Palo Alto. This was
convenient for many, as a lot of folks were already in San Francisco to attend
the summit. Moving forward, we will continue holding in-person TCCs around Trino
Summit to minimize the travel required of anyone wanting to attend in
person.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/2022-review/tcc.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Along with the in-person TCC, we also had the first virtual TCC in December.
This included many people in Europe and Asia who weren’t able to travel to
San Francisco in November. We covered mostly similar topics, but with much more
interaction from those new voices.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/2022-review/virtual-tcc.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;During these discussions, the biggest topics were the timelines of existing
roadmap items and suggestions for other items that should get more attention.
We talked about upcoming connectors and plugins, and all the infrastructure
required to support them. A recurring theme was the need for better testing
infrastructure. The more information we can gather as a community, the quicker
we can resolve issues as new releases come out and increase adoption of newer
versions of Trino. We also discussed desired features around resource-intensive
and batch workloads, and the new polymorphic table function features.&lt;/p&gt;

&lt;p&gt;The biggest takeaway from these meetings was that everyone now had a better
basis to engage with each other. As we move forward, we will continue the
cadence of having these virtual TCCs to keep everyone on the same page, and have
in-person meetings when there is a larger conference. With that, let’s cover
some of the features we gained this year.&lt;/p&gt;

&lt;h2 id=&quot;features&quot;&gt;Features&lt;/h2&gt;

&lt;p&gt;Of course, one of the main deliverables of our project is Trino releases. In
2022, we improved our release process and cadence, shipping 37 releases packed
with features, and we’re about to dive into a high-level list of the most
exciting ones that made their way to you. For details, and to keep up, check out
the &lt;a href=&quot;https://trino.io/docs/current/release.html&quot;&gt;release notes&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;fault-tolerant-execution-mode&quot;&gt;Fault-tolerant execution mode&lt;/h3&gt;

&lt;p&gt;2022 was the year of resiliency for Trino. Users have long requested adding a 
&lt;a href=&quot;https://trino.io/docs/current/admin/fault-tolerant-execution.html&quot;&gt;fault-tolerant mechanism to 
Trino&lt;/a&gt; akin to
query engines like Apache Spark. Users wanted to take the queries they were
already running in Trino and scale them to larger data volumes and more
resource-intensive workloads. Experimental features were implemented in late 2021
for &lt;a href=&quot;https://github.com/trinodb/trino/pull/9361&quot;&gt;automatic query retries&lt;/a&gt; and
earlier this year for &lt;a href=&quot;https://github.com/trinodb/trino/pull/9818&quot;&gt;task-level
retries&lt;/a&gt;. The efforts for these
features were codenamed &lt;a href=&quot;https://trino.io/episodes/32.html&quot;&gt;Project Tardigrade&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Fault-tolerant execution relies on storing intermediate data between task
shuffles, persisting it in an exchange spool. The first supported spool was
AWS S3, but eventually Azure Blob Storage and Google Cloud Storage were
included. The Project Tardigrade engineers started &lt;a href=&quot;/blog/2022/02/16/tardigrade-project-update.html&quot;&gt;improving performance and
fixing bugs&lt;/a&gt; in
fault-tolerant execution as users tested the early implementation. Later, memory
efficiency for aggregations, faster data transfers, and dynamic filtering with
fault-tolerant query execution were added. The &lt;a href=&quot;/blog/2022/05/05/tardigrade-launch.html&quot;&gt;launch of fault-tolerant
execution&lt;/a&gt; happened at Cinco de
Trino. The first iterations only applied to queries run on object-storage
connectors such as Hive, Iceberg, and Delta Lake. Recently, support for MySQL,
PostgreSQL, and SQL Server was added. These contributions laid a foundation
for other JDBC connectors. A few companies, &lt;a href=&quot;https://trino.io/blog/2022/12/12/trino-summit-2022-lyft-recap.html&quot;&gt;most notably
Lyft&lt;/a&gt;, have
adopted this feature and are scaling it in production.&lt;/p&gt;

&lt;h3 id=&quot;sql-language-improvements&quot;&gt;SQL language improvements&lt;/h3&gt;

&lt;p&gt;Here are all the notable SQL features that made it to Trino this year:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/sql/merge.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; statement support&lt;/a&gt; is
 the most impactful SQL feature released this year. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; allows users to
 implement &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt; functionality in one statement.
 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; is not simply syntactic sugar; the implementation brings profound performance
 improvements. Many of your operations can be merged (pun intended) from 
 multiple tasks into a single scan over data. This functionality is absolutely
 critical for positioning Trino as a data lakehouse query engine. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; is 
 currently available in the Hive, Iceberg, Delta Lake, Kudu, and Raptor 
 connectors. We discussed this and did a demo with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; on the recent &lt;a href=&quot;https://trino.io/episodes/40.html&quot;&gt;Trino
 Community Broadcast with Iceberg&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Another massive update was the introduction of &lt;a href=&quot;/blog/2022/07/22/polymorphic-table-functions.html&quot;&gt;polymorphic table
 functions&lt;/a&gt; (
 &lt;a href=&quot;https://trino.io/docs/current/functions/table.html&quot;&gt;PTFs&lt;/a&gt;). Table functions
 were initially released with passthrough query functionality that we
 see in connectors like Pinot, Elasticsearch, MySQL, PostgreSQL,
 &lt;a href=&quot;https://github.com/trinodb/trino/pull/12325&quot;&gt;and other JDBC connectors&lt;/a&gt;.
 However, this is only one small instance of what can be achieved with PTFs and
 the &lt;a href=&quot;https://www.youtube.com/clip/UgkxQcokpdgPjiuMKMC5-3HwHvlbmZjxAvxe&quot;&gt;true power comes from the generalization of this
 feature&lt;/a&gt;. 
 Dain and David gave &lt;a href=&quot;https://www.youtube.com/clip/Ugkx62IKgPd_v9eGBaPUHP2hyaRkWSXh8w8h&quot;&gt;a simpler explanation of
 PTFs&lt;/a&gt;. To
 dive in deeper, watch &lt;a href=&quot;https://trino.io/episodes/38.html&quot;&gt;this episode of
 the Trino Community Broadcast&lt;/a&gt; where Kasia
 Findeisen and Martin discuss PTFs in greater detail.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/8&quot;&gt;Dynamic function resolution&lt;/a&gt; has
 been discussed for many years and finally arrived. This allows
 &lt;a href=&quot;https://youtu.be/mUq_h3oArp4?t=680&quot;&gt;connectors to provide functions at
 runtime&lt;/a&gt;. Unlike before, where you needed
 to statically register your functions ahead of time, you can now provide a
 plugin that contains these functions that are resolved at runtime. This enables
 features such as calling dynamically registered user-defined
 functions written in languages like JavaScript or Python. Martin and Dain go
 into great detail about how this works when &lt;a href=&quot;https://youtu.be/mUq_h3oArp4?t=1596&quot;&gt;answering this question at Trino
 Summit&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Trino gained support for JSON processing functions, which are part of the
 &lt;a href=&quot;https://en.wikipedia.org/wiki/SQL:2016&quot;&gt;ANSI SQL 2016&lt;/a&gt; specification. This
 resolves a large number of issues reported by the community over the years.
 This includes the
 &lt;a href=&quot;https://trino.io/docs/current/functions/json.html#json-array&quot;&gt;json_array&lt;/a&gt;,
 &lt;a href=&quot;https://trino.io/docs/current/functions/json.html#json-object&quot;&gt;json_object&lt;/a&gt;,
 &lt;a href=&quot;https://trino.io/docs/current/functions/json.html#json-exists&quot;&gt;json_exists&lt;/a&gt;,
 &lt;a href=&quot;https://trino.io/docs/current/functions/json.html#json-query&quot;&gt;json_query&lt;/a&gt;, and
 &lt;a href=&quot;https://trino.io/docs/current/functions/json.html#json-value&quot;&gt;json_value&lt;/a&gt;
 functions that were added to Trino this year.&lt;/li&gt;
  &lt;li&gt;The JSON format was added to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXPLAIN&lt;/code&gt; statement to provide an anonymized
 query plan output, enabling offline analysis.&lt;/li&gt;
  &lt;li&gt;It became possible to comment on tables, columns of tables, and even views for
 various connectors. Support for setting comments on views was introduced very
 recently and includes support for Hive and Iceberg.&lt;/li&gt;
  &lt;li&gt;A ton of new functions were added, including &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;to_base32&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;from_base32&lt;/code&gt;,
 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trim_array&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trim&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;performance-improvements&quot;&gt;Performance improvements&lt;/h3&gt;

&lt;p&gt;Despite all the hype about vectorization being a silver bullet to make databases
go fast, the real speed comes from &lt;a href=&quot;https://www.youtube.com/clip/UgkxQwDYDS6evVJelNVjWAgrIhzg_Q-cAEyq&quot;&gt;better algorithms and better data structures
that lead to lower resource consumption&lt;/a&gt;.
Following is a list of some improvements that made their way into Trino this
year:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino now offers improved performance for a variety of operations, including
 complex join criteria pushdown to connectors, faster aggregations, faster
 joins, and better performance for large clusters. We have also implemented
 improvements specifically for aggregations with filters and for the Glue
 metastore. In addition, we now support dynamic filtering for various connectors
 and have faster query planning for the Hive, Delta Lake, Iceberg, MySQL,
 PostgreSQL, and SQL Server connectors.&lt;/li&gt;
  &lt;li&gt;Along with general performance optimizations, there have been a great number of
 query planning optimizations that lead to better performance for specific SQL
 operators. These include faster &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; queries, improved performance for
 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIKE&lt;/code&gt; expressions and highly selective &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt; queries, and enhanced
 performance and reliability for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; operations. We also made
 performance improvements for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JOIN&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNION&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt; queries, as well
 as faster planning of queries with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IN&lt;/code&gt; predicates.&lt;/li&gt;
  &lt;li&gt;There are also optimizations for the performance of specific SQL types, such
 as the string, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DECIMAL&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MAP&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROW&lt;/code&gt; types. We have also made aggregations over 
 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DECIMAL&lt;/code&gt; columns faster and improved the performance of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROW&lt;/code&gt; type in
 aggregations.&lt;/li&gt;
  &lt;li&gt;A last set of improvements comes from reading open file formats like ORC and
 Parquet efficiently. We have improved the speed of reading and writing all
 data types from and to Parquet in general. There were also general performance
 improvements for ORC types, and Trino can now write Bloom filters in ORC files.
 We have also improved performance and efficiency for a wide range of ORC and
 Parquet-related operations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These improvements in aggregate are at the core of what makes Trino fast. There
is no silver bullet you can plug in to speed things up. It takes time, effort,
and smart changes to improve the speed of various systems.&lt;/p&gt;

&lt;h3 id=&quot;runtime-improvements&quot;&gt;Runtime improvements&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/9876&quot;&gt;Trino upgraded to Java 17&lt;/a&gt;. This
upgrade improves the overall speed and lowers the memory footprint of Trino with
various performance fixes to the JVM and garbage collectors. Trino uses the G1
garbage collector, which can now more efficiently reclaim memory and reduce pause
times.&lt;/p&gt;

&lt;p&gt;Aside from having to perform the upgrades, we get a lot of these performance
enhancements for free. On top of performance, upgrading to Java 17 adds new Java
language features to improve the ability to write and maintain higher quality
code.&lt;/p&gt;

&lt;p&gt;To learn more, read &lt;a href=&quot;/blog/2022/07/14/trino-updates-to-java-17.html&quot;&gt;this blog 
post&lt;/a&gt; and watch episode 36
of &lt;a href=&quot;https://trino.io/episodes/36.html&quot;&gt;the Trino Community Broadcast&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Along with the Java upgrade, Trino now has a Docker image for ppc64le and added
CLI support for ARM64, which means Trino’s Docker image can run on AWS Graviton
processors and the image and CLI can run on the new MacBooks.&lt;/p&gt;

&lt;h3 id=&quot;security&quot;&gt;Security&lt;/h3&gt;

&lt;p&gt;Trino added the following improvements and features relevant for authentication,
authorization, and integration with other security systems:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;There were a lot of updates to &lt;a href=&quot;https://trino.io/docs/current/security/oauth2.html&quot;&gt;OAuth 2.0
 authentication&lt;/a&gt;, like support for OAuth
 2.0 refresh tokens and allowing access token passthrough with refresh tokens
 enabled. We also added support for &lt;a href=&quot;https://trino.io/docs/current/security/oauth2.html#openid-connect-discovery&quot;&gt;automatic discovery of OpenID
 Connect&lt;/a&gt;
 metadata with OAuth 2.0 authentication, support for groups in OAuth 2.0 claims,
 and reduced latency for OAuth 2.0 authentication.&lt;/li&gt;
  &lt;li&gt;The Hive, Iceberg, and Delta Lake connectors gained support for AWS Security
 Token Service (STS) credentials for authentication with the Glue catalog, and
 now allow specifying an AWS role session name via the S3 security mapping
 config.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;object-storage-connectors-hive-iceberg-delta-lake-hudi&quot;&gt;Object storage connectors (Hive, Iceberg, Delta Lake, Hudi)&lt;/h3&gt;

&lt;p&gt;One of the most common uses for Trino is as a data lakehouse query engine.
This year we not only added two connectors to this category, but also delivered
a lot of performance improvements across the board through file reader and
writer improvements.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Earlier this year, we added the &lt;a href=&quot;https://trino.io/docs/current/connector/delta-lake.html&quot;&gt;Delta Lake
 connector&lt;/a&gt; to finally
 reach everyone using Trino in the Delta Lake community. Delta Lake is a table
 format that improves on the Hive table format in areas like better support for
 ACID transactions. After the initial release, we added read and write support
 on Google Cloud Storage, added support for Databricks 10.4 LTS, and improved
 overall performance of the connector. To learn more about the Delta Lake
 connector, watch the &lt;a href=&quot;https://trino.io/episodes/34.html&quot;&gt;Trino Community Broadcast on Delta 
 Lake&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/connector/hudi.html&quot;&gt;The Hudi connector&lt;/a&gt; is a
 more recent addition, but it’s just as exciting. Hudi was created at Uber with
 the goal of handling real-time ingestion into a data lake. This connector is the
 youngest of the three newest object storage connectors, so stay tuned to see
 more features land around this connector. See how Robinhood uses &lt;a href=&quot;https://trino.io/episodes/34.html&quot;&gt;Hudi and
 Trino in the Trino Community Broadcast&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;The Iceberg connector had a massive amount of improvements as well, bringing
 it to the same production-ready level as the Hive connector. Iceberg now has
 new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;expire_snapshots&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_orphan_files&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OPTIMIZE&lt;/code&gt; procedures.
 Having these capabilities, along with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt;, is really the key to being an
 effective lakehouse query engine. This year, Iceberg added support for the Glue
 metastore, the Avro file format, file-based access control, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt; and
 time travel syntax. Iceberg also received a lot of performance improvements and
 reduced latency when querying tables with many files.&lt;/li&gt;
  &lt;li&gt;Although it seems like Hive is gradually on its way out, there are many who
 still depend on the Hive connector to be performant. Hive received support for
 S3 Select pushdown for JSON data and IBM Cloud Object Storage,
 improved performance when querying partitioned Hive tables, and the
 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;flush_metadata_cache()&lt;/code&gt; procedure.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;other-connectors&quot;&gt;Other connectors&lt;/h3&gt;

&lt;p&gt;A major feature of Trino is the availability of other connectors to query all
sorts of databases with SQL, all with the speed that Trino users are used to.
Here are some of the major improvements that landed for these connectors in 2022:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;A new MariaDB connector.&lt;/li&gt;
  &lt;li&gt;Performance improvements with various pushdowns in the MongoDB, MySQL, Oracle,
 PostgreSQL, and SQL Server connectors.&lt;/li&gt;
  &lt;li&gt;Support for bulk data insertion in the SQL Server connector.&lt;/li&gt;
  &lt;li&gt;A query passthrough table function added to numerous connectors.&lt;/li&gt;
  &lt;li&gt;Expanded SQL features for various connectors by adding support for
 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TRUNCATE TABLE&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE&lt;/code&gt;/&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DROP&lt;/code&gt; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SCHEMA&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt;, and others.&lt;/li&gt;
  &lt;li&gt;An updated Cassandra connector with support for the v5 and v6 protocols.&lt;/li&gt;
  &lt;li&gt;A collection of improvements to the Pinot and BigQuery connectors.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;bug-fixes&quot;&gt;Bug fixes&lt;/h3&gt;

&lt;p&gt;Any software includes issues and bugs, Trino included. Thanks to our community
we learned about many of them, and fixed even more. Continue to test new
releases and report issues. Check out &lt;a href=&quot;https://trino.io/docs/current/release.html#releases-2022&quot;&gt;all the release notes for
details&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;updates-in-the-trino-ecosystem&quot;&gt;Updates in the Trino ecosystem&lt;/h2&gt;

&lt;p&gt;Outside of the excitement within the main Trino project, there was a great deal
going on in the larger Trino community and ecosystem:&lt;/p&gt;

&lt;h3 id=&quot;trino-the-definitive-guide-second-edition&quot;&gt;Trino: The Definitive Guide second edition&lt;/h3&gt;

&lt;p&gt;Martin, Manfred, and Matt released the &lt;a href=&quot;/blog/2022/10/03/the-definitive-guide-2.html&quot;&gt;second edition of Trino: The Definitive
Guide&lt;/a&gt;. This update of the
book from O’Reilly fixed errata, expanded the deployment coverage to include newer
Kubernetes installation methods, and updated features for all the additions that
had been released since the first version of the book. Along with this, &lt;a href=&quot;https://simpligility.ca/2022/12/trino-guide-for-everyone-in-2023/&quot;&gt;efforts
are underway to translate this
book&lt;/a&gt; to
different languages. Huge thanks to everyone involved in this!&lt;/p&gt;

&lt;h3 id=&quot;starburst-provides-trino-in-the-cloud&quot;&gt;Starburst provides Trino in the cloud&lt;/h3&gt;

&lt;p&gt;As a major community supporter, &lt;a href=&quot;https://starburst.io/&quot;&gt;Starburst&lt;/a&gt; helped us
with events, marketing, developer relations, and partner cooperation. Starburst
also provided a large part of development and code contributions to Trino and
its related projects. Starburst acquired Varada and integrated the object
storage indexing technology, and they shipped many Starburst Enterprise releases
for self-managed deployments. On top of all that amazing work, Starburst
launched &lt;a href=&quot;https://www.starburst.io/platform/starburst-galaxy/&quot;&gt;Starburst Galaxy&lt;/a&gt;
as a powerful, multi-cloud SaaS offering of Trino. Security, cluster management,
a query editor, and many other features are included in this new platform.&lt;/p&gt;

&lt;h3 id=&quot;amazon-upgrades-athena&quot;&gt;Amazon upgrades Athena&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;/blog/2022/12/01/athena.html&quot;&gt;Athena version three rolled out&lt;/a&gt;
and is now based on a recent Trino release. This is great news for Athena users
who were missing the many performance gains, expanded SQL support, and other
features from Trino, since the prior versions are based on old Presto releases.
As a result, the large Athena community and their feedback and knowledge have
become more integrated with the Trino community, and we are seeing positive
impact for Trino releases already.&lt;/p&gt;

&lt;h3 id=&quot;dbt-trino&quot;&gt;dbt-trino&lt;/h3&gt;

&lt;p&gt;dbt users rejoice! The &lt;a href=&quot;https://docs.getdbt.com/reference/warehouse-setups/trino-setup&quot;&gt;official dbt-Trino
integration&lt;/a&gt;
made it into dbt this year! This means that anyone who wanted to read or write
data to or from multiple data sources is now able to. If you want to dive into
it, &lt;a href=&quot;https://docs.starburst.io/blog/2022-11-30-dbt0-introduction.html&quot;&gt;check out this blog
post&lt;/a&gt; written
by the contributors of this integration.&lt;/p&gt;

&lt;h3 id=&quot;python-client-improvements&quot;&gt;Python client improvements&lt;/h3&gt;

&lt;p&gt;Development of the
&lt;a href=&quot;https://github.com/trinodb/trino-python-client&quot;&gt;trino-python-client&lt;/a&gt; doubled
this year. A major focus was on performance improvements in the SQLAlchemy
integration. There was also a wide range of bug fixes.&lt;/p&gt;

&lt;h3 id=&quot;airflow-integration&quot;&gt;Airflow integration&lt;/h3&gt;

&lt;p&gt;The long-awaited &lt;a href=&quot;https://airflow.apache.org/docs/apache-airflow-providers-trino/stable/index.html&quot;&gt;Trino/Airflow
integration&lt;/a&gt;
landed this year. This paired well with the new task-retry and fault-tolerant
execution features. To learn more about the full capabilities of pairing Trino’s
new fault-tolerant execution mode with Airflow, check out &lt;a href=&quot;https://www.youtube.com/watch?v=xKDN7RUJ5i4&quot;&gt;Philippe Gagnon’s
talk at this year’s Trino Summit&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;metabase-driver&quot;&gt;Metabase driver&lt;/h3&gt;

&lt;p&gt;A lot of folks in the community were asking for a &lt;a href=&quot;https://github.com/metabase/metabase/issues/17532&quot;&gt;Trino/Metabase
driver&lt;/a&gt; after Trino updated
its name. This was a large blocker for anyone who wanted to move to Trino and
used Metabase. Through a collaboration between the Metabase and Starburst engineers,
the &lt;a href=&quot;https://github.com/starburstdata/metabase-driver&quot;&gt;metabase-driver&lt;/a&gt; for
Trino was released, and we saw numerous users migrate to Trino.&lt;/p&gt;

&lt;h2 id=&quot;2023-roadmap&quot;&gt;2023 Roadmap&lt;/h2&gt;

&lt;p&gt;The upcoming roadmap was &lt;a href=&quot;https://youtu.be/mUq_h3oArp4?t=799&quot;&gt;covered in detail&lt;/a&gt;
by Martin at Trino Summit. To avoid extending this blog even further, we’ll
leave you with the featured project that covers many aspects of the Trino core
engine.&lt;/p&gt;

&lt;h3 id=&quot;project-hummingbird&quot;&gt;Project Hummingbird&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/14237&quot;&gt;Project Hummingbird&lt;/a&gt; aims to
improve Trino’s columnar and vectorized evaluation engine. Every year we report
on many incremental performance improvements. These improvements are typically
small in isolation but have a large aggregate impact. This incremental approach
is the real key to improving query engine performance, and there is always room
for further optimization. If you want to get involved with this exciting
project, or to learn about the latest innovations as they are being discussed,
join the #project-hummingbird channel in &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;the Trino Slack
workspace&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;2022 was by far the busiest year this bunny has ever had. Trino has consistently
continued growing as we’ve attracted more contributors. We believe this trend
will continue in 2023 as we begin to put more process in place around managing
pull requests. Remember to get the word out and share anything you genuinely
think is cool or important for others to hear! Looking forward to an even more
successful 2023 Trino nation!&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen, Manfred Moser, Cole Bowden, Martin Traverso</name>
        </author>
      

      <summary>It’s that time of the year where everyone gives excessively broad or niche predictions about the finance market, venture capital, or even the data industry. And we are now bombarded with “year-in-review” summaries where we find out just how much data is being collected to generate those summaries. End-of-year reflections are always useful because you can find patterns of what’s going well and what’s going poorly. It’s also good to pause and take stock of the things that did go well, because without that, you’ll only be looking at the list of things that you still have to do, and that isn’t healthy for anybody. In that spirit, let’s reflect on what we’ve been able to accomplish as a community this year, as well as what to look forward to in the next year!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/2022-review/cbb-reflection.png" />
      
    </entry>
  
    <entry>
      <title>Cleaning up the Trino pull request backlog</title>
      <link href="https://trino.io/blog/2023/01/09/cleaning-up-the-trino-backlog.html" rel="alternate" type="text/html" title="Cleaning up the Trino pull request backlog" />
      <published>2023-01-09T00:00:00+00:00</published>
      <updated>2023-01-09T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/01/09/cleaning-up-the-trino-backlog</id>
<content type="html" xml:base="https://trino.io/blog/2023/01/09/cleaning-up-the-trino-backlog.html">&lt;p&gt;At some point in the lifecycle of a successful open source project, the number
of incoming pull requests (PRs) starts to outpace the project’s ability to get
code merged. It happens for a huge variety of reasons, including
developers moving on to other projects before tying up every loose end,
reviewers who miss a request for review, and because some stagnant PRs were
never going to happen and should have been closed two years ago. The GitHub
notification system doesn’t do anyone any favors, either. Having too many open
PRs is a problem for a project, because they make it harder to tell what is
being worked on and what may as well be dead code walking.&lt;/p&gt;

&lt;p&gt;And when we cross 700 open pull requests in Trino, constantly adding a few more
to the pile every week, what do we do? We clean it up! Let’s talk about how
we’re doing it, why we’re doing it that way, and how we’re planning on
preventing this from happening again. The end result should be some process
improvements that make contributing to Trino a better, faster, and more painless
experience.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;spring-cleaning&quot;&gt;Spring cleaning&lt;/h2&gt;

&lt;p&gt;The “how” is an easy thing to talk about. The Trino developer relations team is
in the process of going through all open PRs, from oldest to newest, manually
taking a look at each one and checking in on how we may want to proceed. For PRs
where the author seems to have abandoned it and not responded to a review, we close
them down, encouraging the authors to open them right back up if they decide
they want to continue work. For everything else, though, we’ve been taking a
more measured approach, offering to help facilitate reviews or discussion for
these long-lasting bits of code that may still have a chance of making their way
into Trino.&lt;/p&gt;

&lt;p&gt;To anyone who’s managed a repository before, this may seem like more effort than
necessary. You can add a bot to close anything that’s been stale or inactive for
too long, and problem solved, right? Sure, that does solve the problem, but it
creates a couple of others.&lt;/p&gt;

&lt;p&gt;First, and perhaps most importantly: it’s not very human. Having a pull request
that you put time and effort into get shut down by a bot without having another
person swing by to say hello can be demoralizing, and it builds a negative
experience that might discourage future contributions to the project. We want
our contributors to like Trino and to enjoy the process of adding on to it, and
a GitHub bot slamming the door shut on their hard work isn’t going to help with
that. Having a bot do our work for us would also deprive us of a valuable
learning opportunity. Manually checking in on each pull request that slipped
through the cracks has allowed us to identify pain points in Trino code reviews
which we can try to mitigate moving forward, and it’s provided a ton of
valuable insights for deciding how best to improve the process.&lt;/p&gt;

&lt;p&gt;Second, and perhaps even more significant: there’s a lot of cool stuff we’d be
missing out on if we automatically closed everything. While going through the
backlog, we’ve found dozens of year-old pull requests that still have a lot of
value for Trino and only needed someone to take another look at them. For some,
the author may be missing, but the ideas are good and the PR can be handed off
to someone else to carry the torch and get it across the finish line. For
others, the author is still happy and ready to iterate on it, and all that’s
needed to get the ball rolling again is to ping a reviewer or two to take
another look. We’ve even found a couple PRs that were approved and ready to go,
and all it took was a simple click of the merge button. The effort-to-impact
ratio on that is off the charts - think of all the value we’d be missing out on
if we’d automatically closed those!&lt;/p&gt;

&lt;p&gt;The result of the effort so far has been excellent.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/backlog-blog/open-pull-requests-graph.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;We’re not completely done with the cleanup effort, but as you can see, we’re
slowing down. Our oldest PRs are increasingly recent, still in development,
and worth having open. Going from a peak of 700+ open pull requests to around
300 is a massive improvement, and the goal is to end up in the vicinity of about
200 open pull requests in Trino at any point in time.&lt;/p&gt;

&lt;h2 id=&quot;keeping-things-pristine&quot;&gt;Keeping things pristine&lt;/h2&gt;

&lt;p&gt;But with the cleanup being so manual, the next challenge is stopping the pull
requests from steadily piling back up while we’re not paying attention to them.
The fix for that is simple - we’re going to keep paying attention. The Trino
developer relations team is planning on tracking and getting involved in two
categories of pull requests to keep the number of open PRs stable.&lt;/p&gt;

&lt;p&gt;The first category is pull requests that don’t get any immediate attention from
a reviewer. While Trino reviewers are overall excellent and quick to take a look
at incoming pull requests, about five percent slip through the cracks, where a
contributor submits something that receives no reviews or comments and lives on
in the pull request backlog. That’s not a good experience for the contributor,
and it’s not good for Trino, either, because that contribution could have a lot
of value. We plan on stopping this from happening by implementing workflows
which spring Trino developer relations into action when these situations arise.
If a pull request goes a few days without a comment, we’ll be the safety net to
ask questions, get engineers involved, and make sure that at least a few pairs
of eyes take a look at every incoming PR in a timely manner.&lt;/p&gt;

&lt;p&gt;The second category is pull requests that get some reviews, but eventually
stagnate or stop being actively worked on. This happens for a lot of reasons,
but in all cases, if a pull request goes a few weeks with no activity, the
developer relations team will be checking in. Our goal will be to figure out the
proper path forward, whether that’s flagging down some reviewers again,
communicating that the pull request should be closed, or anything else. The end
result should be that nothing slips through the cracks and ends up going months
without human contact. If an author vanishes or everyone gets too busy to look
at a pull request again, though, the final stop will ultimately be a stale bot
which closes pull requests that have gone a few months with no activity.&lt;/p&gt;

&lt;p&gt;With all these processes in place, contributors should never feel like their
efforts are going unnoticed. Submitted code should be reviewed quickly,
iterated on in a timely manner, and merged without much delay. In situations
where a pull request is &lt;em&gt;not&lt;/em&gt; going to be merged, the Trino developer relations
team should be able to chime in quickly to make that clear, saving contributors
from wasting time and effort on a false impression that their code will be
landed. And if you have any questions, concerns, or suggestions about all of
this, don’t hesitate to reach out to us directly on the Trino Slack using
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@devrel-team&lt;/code&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Cole Bowden</name>
        </author>
      

      <summary>At some point in the lifecycle of a successful open source project, it reaches a point where the number of incoming pull requests (PRs) outpace the project’s ability to get code merged. It happens for a huge variety of reasons, including developers moving on to other projects before tying up every loose end, reviewers who miss a request for review, and because some stagnant PRs were never going to happen and should have been closed two years ago. The GitHub notification system doesn’t do anyone any favors, either. Having too many open PRs is a problem for a project, because they make it harder to tell what is being worked on and what may as well be dead code walking. And when we cross 700 open pull requests in Trino, constantly adding a few more to the pile every week, what do we do? We clean it up! Let’s talk about how we’re doing it, why we’re doing it that way, and how we’re planning on preventing this from happening again. The end result should be some process improvements that make contributing to Trino a better, faster, and more painless experience.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/backlog-blog/so-many-pull-requests.png" />
      
    </entry>
  
    <entry>
      <title>Using Trino to analyze a product-led growth (PLG) user activation funnel</title>
      <link href="https://trino.io/blog/2022/12/23/trino-summit-2022-upsolver-recap.html" rel="alternate" type="text/html" title="Using Trino to analyze a product-led growth (PLG) user activation funnel" />
      <published>2022-12-23T00:00:00+00:00</published>
      <updated>2022-12-23T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/12/23/trino-summit-2022-upsolver-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/12/23/trino-summit-2022-upsolver-recap.html">&lt;p&gt;As the holiday season approaches, we have reached the end of our
&lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;Trino Summit 2022 recap posts&lt;/a&gt;.
With the last talk of the summit, Mei Long from Upsolver gave an insightful
overview of how they use data to inform product decisions.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/MCB_1furnAo&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-summit-2022/Trino@Upsolver.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;When talking about product-led growth (PLG), it helps to start by defining what
it even means. The core idea is simple: see how users engage with your product,
and make decisions based on how you can improve the product to better serve
those users. At Upsolver, the goal of PLG is to maximize user value. The issue
is that while this can be simple in some situations, when you’re delivering
complicated analytics tools, it’s not always immediately clear what features
would be the most valuable or useful. You need a lot of data to glean a lot of
insight, and you need to make sure your insights can lead to action. And of
course, you need to be absolutely certain that your data is high-quality,
accurate, and trustworthy, lest you end up accidentally giving a customer a
ten million dollar discount.&lt;/p&gt;

&lt;p&gt;Mei explores the initial pass at using analytics to drive PLG at Upsolver,
letting her intern use a tool called Amplitude that worked for a time and for
limited use cases. As Upsolver grew, the analytics requirements did, too, and
Amplitude wasn’t powerful enough for Upsolver’s use case, nor for the more
complicated queries and analysis that needed to be run.&lt;/p&gt;

&lt;p&gt;Want to guess what query engine they swapped to using? Trino. Mei dives into a
quick demo that shows how Upsolver ingests all of its streaming data and stores
it for Trino to query, driving down time-to-insight to make it quick and
efficient to ask questions and make decisions based on those answers. With Trino
at the ready, Upsolver has never been better-equipped to work towards PLG.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, please consider sharing this on
Twitter, Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social
card and link to &lt;a href=&quot;https://trino.io/blog/2022/12/23/trino-summit-2022-upsolver-recap.html&quot;&gt;https://trino.io/blog/2022/12/23/trino-summit-2022-upsolver-recap.html&lt;/a&gt;. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/upsolver-social.png&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Mei Long, Cole Bowden</name>
        </author>
      

      <summary>As the holiday season approaches, we have reached the end of our Trino Summit 2022 recap posts. With the last talk of the summit, Mei Long from Upsolver gave an insightful overview of how they use data to inform product decisions.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/upsolver.jpg" />
      
    </entry>
  
    <entry>
      <title>Using Trino with Apache Airflow for (almost) all your data problems</title>
      <link href="https://trino.io/blog/2022/12/21/trino-summit-2022-astronomer-recap.html" rel="alternate" type="text/html" title="Using Trino with Apache Airflow for (almost) all your data problems" />
      <published>2022-12-21T00:00:00+00:00</published>
      <updated>2022-12-21T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/12/21/trino-summit-2022-astronomer-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/12/21/trino-summit-2022-astronomer-recap.html">&lt;p&gt;As we close in on the final talks from &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;Trino Summit 2022&lt;/a&gt;, this next talk dives into how to set up
Trino for batch processing. Trino has historically been well-known for
facilitating fast adhoc analytics queries as opposed to long-running, resource
intensive batch/ETL queries. This is due to the fact that Trino kills queries
that run out of resources in order to prioritize faster query execution. Earlier
this year, Trino added features to better support batch queries with a new 
&lt;a href=&quot;https://trino.io/blog/2022/05/05/tardigrade-launch.html&quot;&gt;fault-tolerant execution mode&lt;/a&gt;.
This mode backs up intermediate data during execution time, allowing Trino to
restart individual query tasks on failure rather than a query stage or the query
itself.&lt;/p&gt;
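&lt;p&gt;As a rough sketch of what enabling this mode looks like, it comes down to a
few configuration properties. The bucket path below is a placeholder; see the
Trino fault-tolerant execution documentation for the full set of options:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# config.properties: retry individual tasks instead of entire queries
retry-policy=TASK

# etc/exchange-manager.properties: where intermediate data is backed up
exchange-manager.name=filesystem
exchange.base-directories=s3://example-bucket/trino-exchange
&lt;/code&gt;&lt;/pre&gt;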

&lt;p&gt;Batch queries don’t typically involve human intervention and run asynchronously.
These tasks may depend on each other and have a complex workflow. This talk
describes how to orchestrate this complexity using Airflow’s new Trino
integration to run Trino batch queries to solve (almost) all your data problems.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/xKDN7RUJ5i4&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-summit-2022/Trino@Astronomer.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;In this talk, we’re going to hear from Philippe, a Trino contributor and
Solutions Architect at Astronomer, the company building a SaaS product around
Apache Airflow. Philippe describes a fictional trading scenario that initially
follows a traditional warehousing approach to storing data. This architecture
has data sources that are queried and submitted as raw data into a centralized
warehouse. Within the warehouse itself, the raw data is transformed into data
ready to be consumed.&lt;/p&gt;

&lt;p&gt;This model enforces centralization, in which one team runs the platform and
builds the integration between producers and consumers. This team focuses on the
aspects of the data platform, which further separates it from the business use
case. As source databases evolve, the central data team must keep up with these
changes. As the data consumers that rely on the data infrastructure grow, this
team commonly becomes a bottleneck.&lt;/p&gt;

&lt;p&gt;Trino allows you to move the queries as close as possible to the federated data
sources, removing the labor-intensive process of moving data into stages
before ingesting it into a central warehouse. This doesn’t mean that data
movement is no longer a necessity, but the necessity shifts from an availability
concern to a performance and scalability concern.&lt;/p&gt;

&lt;p&gt;Without investing into more resources, your data professionals are able to work
closely with producers and stakeholders with a shared understanding of the
domain. This increases data literacy and data availability throughout your
organization.&lt;/p&gt;

&lt;p&gt;Trino is not only for fast adhoc analytics with a human in the loop, but now 
provides a fault-tolerant execution mode that enables it to run resource
intensive batch jobs. This, paired with the federation capabilities, make Trino
able to ingest any data that can be represented in a tabular format. Users can
implement user-defined functions and run transformations using SQL without
involving intermediate systems.&lt;/p&gt;

&lt;p&gt;Running Trino batch queries at scale requires building complex interdependencies
between different tasks and monitoring for any failures that occur. It also
demands reactive automation to handle failing instances. Apache Airflow is an
open-source platform for developing, scheduling, and monitoring batch-oriented
workflows on systems like Trino, making it a perfect complement for handling
these intensive queries at scale.&lt;/p&gt;

&lt;p&gt;Even before introducing fault-tolerant execution mode, &lt;a href=&quot;https://engineering.salesforce.com/how-to-etl-at-petabyte-scale-with-trino-5fe8ac134e36/&quot;&gt;Trino was already being
used to run batch queries at scale&lt;/a&gt;.
In these scenarios, Trino and a tool like Airflow already work well together
because these jobs will take time and likely nobody wants to wait around to run
the pipeline components in sequence. Fault-tolerant execution mode brings the
Trino and Airflow combination to the forefront because Trino is expected to be
adopted as a batch query engine as running ETL jobs on Trino becomes as easy as
with other tools in the space.&lt;/p&gt;

&lt;p&gt;Philippe dives into building out basic Airflow jobs to run over Trino and
introduces the concept of a directed acyclic graph (DAG). He then dives into
multiple useful features that help break down large jobs into manageable tasks,
and jobs that can adjust the schedule based on runtime execution. Sharded job 
creation splits large batch jobs into smaller tasks that can easily be retried.
Dynamic task mapping splits jobs into smaller tasks based on data observed at
runtime. Finally, a new feature called data-aware scheduling can schedule tasks
based on interdependencies between datasets.&lt;/p&gt;

&lt;p&gt;To get started with Trino in Apache Airflow, check out the
&lt;a href=&quot;https://airflow.apache.org/docs/apache-airflow-providers-trino/stable/index.html&quot;&gt;Airflow Trino provider documentation&lt;/a&gt;.&lt;/p&gt;
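&lt;p&gt;As a minimal sketch of what such a pipeline can look like on recent Airflow
and provider versions, the following DAG chains two statements with the
&lt;code&gt;TrinoOperator&lt;/code&gt; from the provider package. The catalog, schema,
table, and connection names are illustrative only:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from datetime import datetime

from airflow import DAG
from airflow.providers.trino.operators.trino import TrinoOperator

with DAG(
    dag_id="trino_batch_example",
    start_date=datetime(2022, 12, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Each task submits a SQL statement to Trino via the configured connection
    create = TrinoOperator(
        task_id="create_table",
        trino_conn_id="trino_default",
        sql="CREATE TABLE IF NOT EXISTS iceberg.example.events (id bigint)",
    )
    load = TrinoOperator(
        task_id="load_data",
        trino_conn_id="trino_default",
        sql="INSERT INTO iceberg.example.events SELECT orderkey FROM tpch.tiny.orders",
    )
    # Run the load only after the table exists
    create &gt;&gt; load
&lt;/code&gt;&lt;/pre&gt;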

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, please consider sharing this on
Twitter, Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social
card and link to &lt;a href=&quot;https://trino.io/blog/2022/12/21/trino-summit-2022-astronomer-recap.html&quot;&gt;https://trino.io/blog/2022/12/21/trino-summit-2022-astronomer-recap.html&lt;/a&gt;. If you think Trino is awesome, 
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/astronomer-social.png&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Philippe Gagnon, Brian Olsen</name>
        </author>
      

      <summary>As we close in on the final talks from Trino Summit 2022, this next talk dives into how to set up Trino for batch processing. Trino has historically been well-known for facilitating fast adhoc analytics queries as opposed to long-running, resource intensive batch/ETL queries. This is due to the fact that Trino kills queries that run out of resources in order to prioritize faster query execution. Earlier this year, Trino added features to better support batch queries with a new fault-tolerant execution mode. This mode backs up intermediate data during execution time, allowing Trino to restart individual query tasks on failure rather than a query stage or the query itself. Batch queries don’t typically involve human intervention and run asynchronously. These tasks may depend on each other and have a complex workflow. This talk describes how to orchestrate this complexity using Airflow’s new Trino integration to run Trino batch queries to solve (almost) all your data problems.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/astronomer.jpg" />
      
    </entry>
  
    <entry>
      <title>Journey to Iceberg with Trino</title>
      <link href="https://trino.io/blog/2022/12/19/trino-summit-2022-sk-telecom-recap.html" rel="alternate" type="text/html" title="Journey to Iceberg with Trino" />
      <published>2022-12-19T00:00:00+00:00</published>
      <updated>2022-12-19T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/12/19/trino-summit-2022-sk-telecom-recap</id>
<content type="html" xml:base="https://trino.io/blog/2022/12/19/trino-summit-2022-sk-telecom-recap.html">&lt;p&gt;This post comes from &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;the second half of the Trino Summit 2022 sessions&lt;/a&gt;. Our friends JaeChang and Jennifer from
SK Telecom traveled across the globe from South Korea to join us in person! SK
Telecom recently had some issues scaling Trino on the Hive model, among other
issues that come with Hive. While some initial tweaking helped speed things up,
it ultimately never solved the problem. After switching to Iceberg, SK Telecom
ran initial performance tests with some very impressive results. In this talk,
Jennifer and JaeChang describe their journey to Iceberg with Trino.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/V9_aPLXATh8&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-summit-2022/Trino@SK-Telecom.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;SK Telecom is a South Korean telecom company that has built and operated an
on-premise data platform based on open source software to determine
manufacturing yield since 2015. SK Telecom’s goal has always been to build an observable
federated data platform on open source software at scale.&lt;/p&gt;

&lt;p&gt;SK Telecom manages on-premise Hadoop clusters to store their data. Previously,
they used tools like
&lt;a href=&quot;https://hadoop.apache.org/docs/stable/hadoop-distcp/DistCp.html&quot;&gt;distcp&lt;/a&gt; to
make data available in one center. SK Telecom started using Presto in 2016 and
shifted to Trino in 2021. To run batch queries on their warehouse, Trino workers
are deployed on HDFS data nodes. There is also an adhoc Trino cluster deployed
to manage federated queries over multiple data silos from an array of disparate
data sources. This was one of the slow and brittle processes that Trino
replaced. They chose Trino because it simplifies querying novel big data systems
and combines that data with more commonplace systems for their users.&lt;/p&gt;

&lt;p&gt;As Trino adoption grew within the company, reaching up to 300 requests per
minute, they eventually faced challenges with scaling. Not only was the number
of requests growing, but the range of data being queried grew as well; users were
evaluating petabytes of data, with terabyte-sized query input processed across
hundreds of nodes. Many user queries were blocked while waiting for resources to
become available. In response, the data engineering team began investigating how
they could both scale and improve individual query performance.&lt;/p&gt;

&lt;p&gt;To find the root cause, SK Telecom’s data engineers investigated cluster
behavior beyond what was exposed in the web UI. They began collecting all the
query plan JSON files, coordinator and worker JMX stats, system metrics, and
Trino logs to build out their own metrics dashboard. The two main
causes were that input data was too large, and there were spikes in the number
of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BlockedSplit&lt;/code&gt; operations leading to queries being blocked while waiting for
other tasks to complete. They initially aimed to address this by changing some
settings to increase thread counts and tuning the settings, but these changes
still didn’t achieve the desired results. The ultimate bottleneck was the Hive
metastore and the expensive list operations that caused many of the blocking
operations to finish slowly.&lt;/p&gt;

&lt;p&gt;At this point, the team reevaluated their needs to consider alternative
solutions. They needed a better indexing strategy on the data with a flexible
partitioning strategy. They also needed to remove the bottleneck on the metadata
for this data while still maintaining compatibility across multiple query
engines as Hive did.&lt;/p&gt;

&lt;p&gt;The team looked at the existing set of novel data lake connectors available in
Trino version 356, which at the time only included Iceberg. SK Telecom was 
immediately impressed by the metadata indexing in the Iceberg project. They 
particularly liked Iceberg’s snapshot isolation as data is created or modified.
They were able to speed up queries using data file pruning on partition and
column stats stored in the manifest file.&lt;/p&gt;

&lt;p&gt;After running a benchmark, the team found that Iceberg reduced the input data
size from hundreds of gigabytes down to under ten. They also investigated
adding a higher number of partitions to lower the input data size further, but
found that there’s a tradeoff where creating too many partitions
increases query planning time. Ultimately, they found a sweet spot where the
input data size was around six gigabytes and planning only took 70 milliseconds.&lt;/p&gt;
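&lt;p&gt;In Trino’s Iceberg connector, the partitioning that drives this kind of data
file pruning is declared as a table property. The table and column names below
are illustrative only:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CREATE TABLE iceberg.example.events (
    event_time timestamp(6),
    device_id bigint,
    payload varchar
)
WITH (
    -- hidden partitioning: queries filtering on event_time prune data files
    partitioning = ARRAY['day(event_time)']
);
&lt;/code&gt;&lt;/pre&gt;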

&lt;p&gt;This summary is just the tip of the iceberg of all the information JaeChang and
Jennifer shared with us about how Iceberg helped SK Telecom with their Trino
scaling issues. Watch this incredible talk to learn more if you’re considering
taking the leap from Hive to Iceberg!&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, please consider sharing this on
Twitter, Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social
card and link to &lt;a href=&quot;https://trino.io/blog/2022/12/19/trino-summit-2022-sk-telecom-recap.html&quot;&gt;https://trino.io/blog/2022/12/19/trino-summit-2022-sk-telecom-recap.html&lt;/a&gt;. If you think Trino is awesome, 
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/sk-telecom-social.png&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>JaeChang Song, Jennifer Oh, Brian Olsen</name>
        </author>
      

      <summary>This post comes from the second half of Trino Summit 2022 session. Our friends JaeChang and Jennifer from SK Telecom traveled across the globe from South Korea to join us in person! SK Telecom recently had some issues scaling Trino on the Hive model, among other issues that come with Hive. While some initial tweaking helped speed things up, it ultimately never solved the problem. After switching to Iceberg, SK Telecom ran initial performance tests with some very impressive results. In this talk, Jennifer and JaeChang describe their journey to Iceberg with Trino.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/sk-telecom.jpg" />
      
    </entry>
  
    <entry>
      <title>Trino at Quora: Speed, cost, reliability challenges, and tips</title>
      <link href="https://trino.io/blog/2022/12/16/trino-summit-2022-quora-recap.html" rel="alternate" type="text/html" title="Trino at Quora: Speed, cost, reliability challenges, and tips" />
      <published>2022-12-16T00:00:00+00:00</published>
      <updated>2022-12-16T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/12/16/trino-summit-2022-quora-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/12/16/trino-summit-2022-quora-recap.html">&lt;p&gt;As we near the end of the &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;Trino Summit 2022 recap series&lt;/a&gt;, it’s time to take a stop at Quora. At
Quora, being an engineer responsible for maintaining Trino comes with its fair
share of challenges. With concerns about cost, performance, and reliability,
Quora has taken several creative steps to ensure that they get the most out of
Trino. Other Trino users may be able to learn a few neat tips and tricks to
do the same by tuning in.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/Q03DzL_fm-I&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-summit-2022/Trino@Quora.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;Trino at Quora is used in the big ways that we’re all familiar with. It receives
queries from a variety of clients and services, then executes those queries
on an S3 data lake and Hive metastore to return results at high speeds. With a
wide variety of clients, Quora gets the most out of Trino, using it for ad-hoc
analysis, but also for ETL, backfill jobs, A/B testing, and time series queries.
But as with any large system being used for so many things, this isn’t without a
few challenges.&lt;/p&gt;

&lt;p&gt;The first challenge is a universal one - how can Quora keep the costs of running
Trino to a minimum? One of the biggest strategies was to migrate to AWS Graviton
instances to run Trino clusters, as they have proven to be more cost-efficient
than other AMD and Intel-based EC2 instances at Quora. Graviton does have lower 
availability, though, so they sometimes must be complemented with some AMD/Intel
instances in order to avoid any downtime. Auto-scaling also led to great cost
savings, as the workloads varied based on time of day. By anticipating usage,
ramping up the number of machines during the busy workday and ramping back
down when fewer jobs are in progress, Quora was able to minimize
idle machines and cut back on unnecessary spending. Finally, and perhaps most
obviously, the team at Quora worked to make ETL queries more efficient. By using
partitions effectively and creating a tool to detect inefficient queries
scanning too many partition keys, the result is efficient queries that take less
time and use fewer resources, saving on cost.&lt;/p&gt;
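
&lt;p&gt;As a sketch of the kind of query such a tool encourages (the table and
partition column names here are hypothetical), filtering directly on the
partition key lets Trino prune partitions instead of scanning them all:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Scans a single partition rather than the whole table
SELECT count(*)
FROM hive.events.page_views
WHERE ds = DATE '2022-12-01';
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;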

&lt;p&gt;Up next - how could Quora maximize Trino’s performance? With data analysts
expecting quick runtimes and occasionally running into problems, fine-tuning
Trino to run as well as it possibly can isn’t always an easy task. One
particular major issue they found at Quora was that some worker nodes which ran
for 24 hours or more straight would utilize less CPU and run slowly, bogging
things down. The fix? Gracefully restart worker nodes that run for over a day,
and implement a detector to flag and restart any nodes which showed signs of
behaving slowly.&lt;/p&gt;

&lt;p&gt;The final big concern at Quora is reliability, as users expect Trino to be up
and running whenever they need it. In one instance, they found that overwriting
a specific configuration option caused a cluster to crash repeatedly and
slow down to a crawl. The issue was that they’d steadily been bumping the value
of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.min-expire-age&lt;/code&gt; configuration property up and up and up from the
default value of 15 minutes, until eventually, unexpired query history was using
up too much memory and causing the cluster to falter. Lowering the value back
down to something more advisable saved the day in that situation. But wanting to
avoid similar situations from happening again, Quora built extensive monitoring
tools to track the health of their Trino clusters. They ensure that even when
user error does cause problems, those problems are flagged and alerts sent
out, bringing the data engineering team to the rescue.&lt;/p&gt;
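
&lt;p&gt;For reference, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.min-expire-age&lt;/code&gt; is set in the coordinator’s
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;config.properties&lt;/code&gt;. A minimal sketch, keeping the value near its default:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# How long a completed query stays in coordinator memory before it
# can be expired; raising this retains more history but uses more memory.
query.min-expire-age=15m
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;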

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, please consider sharing this on
Twitter, Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social
card and link to &lt;a href=&quot;https://trino.io/blog/2022/12/16/trino-summit-2022-quora-recap.html&quot;&gt;https://trino.io/blog/2022/12/16/trino-summit-2022-quora-recap.html&lt;/a&gt;. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/quora-social.jpg&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Yifan Pan, Cole Bowden</name>
        </author>
      

      <summary>As we near the end of the Trino Summit 2022 recap series, it’s time to take a stop at Quora. At Quora, being an engineer responsible for maintaining Trino comes with its fair share of challenges. With concerns about cost, performance, and reliability, Quora has taken several creative steps to ensure that they get the most out of Trino. Other Trino users may be able to learn a few neat tips and tricks to do the same by tuning in.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/quora.jpg" />
      
    </entry>
  
    <entry>
      <title>Federating them all on Starburst Galaxy</title>
      <link href="https://trino.io/blog/2022/12/14/trino-summit-2022-starburst-recap.html" rel="alternate" type="text/html" title="Federating them all on Starburst Galaxy" />
      <published>2022-12-14T00:00:00+00:00</published>
      <updated>2022-12-14T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/12/14/trino-summit-2022-starburst-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/12/14/trino-summit-2022-starburst-recap.html">&lt;p&gt;As the &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;Trino Summit 2022 recap post series&lt;/a&gt; continues on, I have been reading all the
wonderful posts by our awesome speakers, facilitated by the Trino developer
relations team. Because I have a perpetual fear of missing out, I convinced them
that I should get in on the fun. For this latest installment in the series, I
will be recapping my very own Trino Summit talk. Basically, I’m ripping off
Bo Burnham’s comedy bit where he &lt;a href=&quot;https://youtu.be/FZVMB8mrNO0?t=35&quot;&gt;reacts to his own reaction video&lt;/a&gt;,
blog style.&lt;/p&gt;

&lt;p&gt;In this session, I demonstrate building a data lakehouse architecture with
&lt;a href=&quot;https://www.starburst.io/platform/starburst-galaxy/&quot;&gt;Starburst Galaxy&lt;/a&gt;, the
fastest and easiest way to get up and running with Trino.
Before I dive into the recap, I want to thank the Trino community for showing
up. I am grateful that I was able to meet and learn from so many members of the
community in person.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/Zfmxwu0m98k&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;The premise of this example is that we have Pokémon Go data being ingested into
S3, which contains each Pokémon’s encounter information. This includes the
geo-location data of where each Pokémon spawned, and how long the Pokémon could
be found at that location. What we don’t have is any
information on that Pokémon’s abilities. That information is contained in the
Pokédex stored in MongoDB which I’ve cleverly nicknamed &lt;strong&gt;PokéMongoDB&lt;/strong&gt;. It
includes data about all the Pokémon including type, legendary status,
catch rate, and more. To create meaningful insights from our data, we need
to combine the incoming geo-location data with the static dimension CSV table
located in MongoDB.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/starburst-architecture.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;To do this, I build out a reporting structure in the data lake using
Starburst Galaxy. The first step is to read the raw data stored in the land
layer, then clean and optimize that data into more performant ORC files in the
structure layer. Finally, I join the spawn data and Pokédex data together into a
single table that is cleaned and ready to be utilized by a data consumer.
Next I apply role-based access control capabilities within Starburst
Galaxy, which provides the proper data governance so that data consumers only
have read permissions to that final table. I then create some visualizations to
analyze which Pokémon are common to spawn in the San Francisco area.&lt;/p&gt;
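
&lt;p&gt;A rough sketch of the structure-layer step (catalog, schema, and table names
here are made up for illustration) uses CTAS to rewrite the raw landed data as
ORC:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Rewrite raw land-layer data into a more performant ORC table
CREATE TABLE lake.structure.spawns
WITH (format = 'ORC') AS
SELECT *
FROM lake.land.raw_spawns;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;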

&lt;p&gt;I walk through all the setup required to put this data lakehouse architecture
into action including creating my catalogs, cluster, schemas, and tables. After
incorporating open table formats, applying native security, and building
out a reporting structure, I have confidence that my data lakehouse is built
to last, and end up with some really cool final Pokémon graphs.&lt;/p&gt;

&lt;h2 id=&quot;helpful-links&quot;&gt;Helpful links&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Sign up for &lt;a href=&quot;https://www.starburst.io/platform/starburst-galaxy/start/&quot;&gt;Starburst Galaxy&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Read the &lt;a href=&quot;https://docs.starburst.io/starburst-galaxy/index.html&quot;&gt;docs&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Try a
&lt;a href=&quot;https://docs.starburst.io/starburst-galaxy/tutorials/index.html&quot;&gt;tutorial&lt;/a&gt; for yourself&lt;/li&gt;
  &lt;li&gt;Register for &lt;a href=&quot;https://www.starburst.io/datanova/?utm_source=event&amp;amp;utm_medium=datanova&amp;amp;utm_campaign=[…]Event-Datanova-social-promo&amp;amp;utm_content=trinosummitrecapblog&quot;&gt;Datanova&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on
Twitter, Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social
card and link to &lt;a href=&quot;https://trino.io/blog/2022/12/14/trino-summit-2022-starburst-recap.html&quot;&gt;https://trino.io/blog/2022/12/14/trino-summit-2022-starburst-recap.html&lt;/a&gt;. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/starburst-social.jpg&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Monica Miller</name>
        </author>
      

<summary>As the Trino Summit 2022 recap post series continues on, I have been reading all the wonderful posts by our awesome speakers, facilitated by the Trino developer relations team. Because I have a perpetual fear of missing out, I convinced them that I should get in on the fun. For this latest installment in the series, I will be recapping my very own Trino Summit talk. Basically, I’m ripping off Bo Burnham’s comedy bit where he reacts to his own reaction video, blog style. In this session, I demonstrate building a data lakehouse architecture with Starburst Galaxy, the fastest and easiest way to get up and running with Trino. Before I dive into the recap, I want to thank the Trino community for showing up. I am grateful that I was able to meet and learn from so many members of the community in person.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/starburst.jpg" />
      
    </entry>
  
    <entry>
      <title>Trino for large scale ETL at Lyft</title>
      <link href="https://trino.io/blog/2022/12/12/trino-summit-2022-lyft-recap.html" rel="alternate" type="text/html" title="Trino for large scale ETL at Lyft" />
      <published>2022-12-12T00:00:00+00:00</published>
      <updated>2022-12-12T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/12/12/trino-summit-2022-lyft-recap</id>
<content type="html" xml:base="https://trino.io/blog/2022/12/12/trino-summit-2022-lyft-recap.html">&lt;p&gt;Buckle up for the next &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;post in the Trino Summit 2022 recap series&lt;/a&gt;. In this post, we’re covering the talk
given by Lyft engineers, Charles and Ritesh, on how they have scaled Trino as
adoption grew while using fewer nodes more effectively. They
also started moving to utilizing Trino more for ETL rather than just interactive
analytics. Get ready for a smooth ride as Lyft brings you large scale ETL with
Trino.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/FL3c1Ue7YWM&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-summit-2022/Trino@Lyft.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;Lyft uses Trino to perform ETL jobs reading 10 petabytes of data per day and
writing 100 terabytes per day. They run 250,000 queries per day, with around
2,000 unique users. This requires approximately 750 EC2 instances scaling up or
down with an autoscaler. Over 90 percent of queries complete within one to
three minutes.&lt;/p&gt;

&lt;p&gt;In the last year, Lyft cut their number of Trino nodes in half, while increasing
their workloads. This is possible due to recent improvements in Trino and
upgrades in Java versions. Lyft is not using fault-tolerant execution, but has
started seeing interest in using Trino for ETL jobs due to the faster
turnaround. Some issues Lyft has faced include how resource-hungry Trino is,
as well as the coordinator being a single point of failure for queries
executing on a cluster.&lt;/p&gt;

&lt;p&gt;Lyft was one of the earliest companies to really push using Trino for ETL use
cases. They built custom best effort rollback code in Apache Airflow. If a query
fails, the operation reverts to the state before the operation began. Lyft runs
four Trino clusters split by the type of workload used on that cluster. The best
practices are careful usage around broadcast joins, query sharding, and scaling
writers for ETL loads.&lt;/p&gt;
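
&lt;p&gt;As an illustration of tuning those areas per query (standard Trino session
property names, shown as a sketch rather than Lyft’s exact setup), session
properties control join distribution and writer scaling:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Avoid broadcasting a large build side in ETL joins
SET SESSION join_distribution_type = 'PARTITIONED';
-- Allow Trino to scale out writers as output volume grows
SET SESSION scale_writers = true;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;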

&lt;p&gt;One final point Lyft made is that keeping up with the rapid release cycle
of Trino was a challenge. Lyft showcases their regression testing using their
query replay framework. This session is a smooth five out of five ride. Enjoy!&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, please consider sharing this on
Twitter, Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social
card and link to &lt;a href=&quot;https://trino.io/blog/2022/12/12/trino-summit-2022-lyft-recap.html&quot;&gt;https://trino.io/blog/2022/12/12/trino-summit-2022-lyft-recap.html&lt;/a&gt;. If you think Trino is awesome, 
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/lyft-social.png&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Charles Song, Ritesh Varyani, Brian Olsen</name>
        </author>
      

<summary>Buckle up for the next post in the Trino Summit 2022 recap series. In this post, we’re covering the talk given by Lyft engineers, Charles and Ritesh, on how they have scaled Trino as adoption grew while using fewer nodes more effectively. They also started moving to utilizing Trino more for ETL rather than just interactive analytics. Get ready for a smooth ride as Lyft brings you large scale ETL with Trino.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/lyft.jpg" />
      
    </entry>
  
    <entry>
      <title>Rewriting History: Migrating petabytes of data to Apache Iceberg using Trino</title>
      <link href="https://trino.io/blog/2022/12/09/trino-summit-2022-shopify-recap.html" rel="alternate" type="text/html" title="Rewriting History: Migrating petabytes of data to Apache Iceberg using Trino" />
      <published>2022-12-09T00:00:00+00:00</published>
      <updated>2022-12-09T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/12/09/trino-summit-2022-shopify-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/12/09/trino-summit-2022-shopify-recap.html">&lt;p&gt;Rolling right along with another one of &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;our Trino Summit 2022 recap posts&lt;/a&gt;, we’re excited to bring you the engaging
talk from Marc Laforet at Shopify. He talked about the ordeal (or, if you look
at it in a positive light, the privilege) of migrating petabytes of data from
Hive to Iceberg table formats with the help of Trino. With details on why
Shopify chose to move to Iceberg, the various migration strategies that were
considered, and the ultimate process of moving all that data while the Trino
Iceberg connector was still in active development, it’s an insightful talk that
you don’t want to miss.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/nJBBw-xnLU8&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-summit-2022/Shopify@Trino.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;As with many other Trino users, it should come as no surprise that Shopify
has a lot of data to work with. First-party data comes in from a few different
sources, and there’s a mountain of modelled data to go along with it. In
Shopify’s case, one of the issues was that some data sets were built on top of
custom table formats. On top of that, the architecture wasn’t scaled with a
careful plan in mind, leading to limited interoperability of datasets among
various tools. With data scientists unable to unify data across different tools
and storages, it was time for a change.&lt;/p&gt;

&lt;p&gt;When you’ve got tons of data that isn’t currently in one place, what’s the fix?
Create a central lakehouse for all the data to be accessible from, a
single-service portal that could serve all users’ needs. The first question was
which table format to use, and if the title of the blog post didn’t already give
it away, they chose to go with Apache Iceberg. It was an easy, central vision
to work towards: all data in a centralized lakehouse stored in Iceberg, then
queryable by Trino.&lt;/p&gt;

&lt;p&gt;Having a plan and putting that plan into action are two different things,
though. When nothing is already in Iceberg, moving it all there is a migration
on the scale of thousands of tables and petabytes of data. In Marc’s words from
the talk, once Shopify committed to the migration and invested resources into
it, the realization was, “crap, now I have to build it.” Even worse, because the
old data was primarily in gzipped JSON format, it all needed to be rewritten…
and so it was.&lt;/p&gt;

&lt;p&gt;Then, enter Trino! With new Iceberg-based tables, Trino was identified as the
right tool for the job to process all that data. This wasn’t without snags, as
the migration happened while the Iceberg connector was still being aggressively
worked on and developed. There were a few different incidents where Shopify hit
a snag or an issue, and an update or bugfix to Trino’s Iceberg connector solved
those problems in a matter of days or weeks.&lt;/p&gt;

&lt;p&gt;The result of all of this? Some incredible benchmark results. Large tables saw a
96% reduction in planning time, a 96% reduction in cumulative user memory, and a
95% reduction in query execution time. That’s the difference between thousands
of terabytes of memory to under 100, and a query that would take an hour to run
only taking three minutes. For the absolute largest table at Shopify, some
queries saw a 99.9% reduction in execution time. Yes, that number is real.&lt;/p&gt;

&lt;p&gt;Moral of the story? If you find yourself using an old Hive table with outdated
file formats, lamenting the resources you need and the time it takes, the
decision is easy. Migrate to Iceberg with Trino. Shopify has shown us the way,
and the full talk has plenty of useful advice for how to best go about it.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on
Twitter, Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social
card and link to &lt;a href=&quot;https://trino.io/blog/2022/12/09/trino-summit-2022-shopify-recap.html&quot;&gt;https://trino.io/blog/2022/12/09/trino-summit-2022-shopify-recap.html&lt;/a&gt;. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/shopify-social.png&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Marc Laforet, Cole Bowden</name>
        </author>
      

      <summary>Rolling right along with another one of our Trino Summit 2022 recap posts, we’re excited to bring you the engaging talk from Marc Laforet at Shopify. He talked about the ordeal (or, if you look at it in a positive light, the privilege) of migrating petabytes of data from Hive to Iceberg table formats with the help of Trino. With details on why Shopify chose to move to Iceberg, the various migration strategies that were considered, and the ultimate process of moving all that data while the Trino Iceberg connector was still in active development, it’s an insightful talk that you don’t want to miss.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/shopify.jpg" />
      
    </entry>
  
    <entry>
      <title>Elevating data fabric to data mesh: Solving data needs in hybrid data lakes</title>
      <link href="https://trino.io/blog/2022/12/07/trino-summit-2022-comcast-recap.html" rel="alternate" type="text/html" title="Elevating data fabric to data mesh: Solving data needs in hybrid data lakes" />
      <published>2022-12-07T00:00:00+00:00</published>
      <updated>2022-12-07T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/12/07/trino-summit-2022-comcast-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/12/07/trino-summit-2022-comcast-recap.html">&lt;p&gt;Tune in for the next &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;post in the Trino Summit 2022 recap series&lt;/a&gt;. In this post, we’re joining Saj from
Comcast to talk about their migration from a data fabric to a data mesh. Saj
shows you that there is more to the buzzword than meets the eye. He gives a
solid overview of why Comcast is taking data mesh to heart.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/sSWBi7bBotQ&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-summit-2022/Trino@Comcast.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;Comcast engineer Sajuman Joseph walks us through how Comcast moved from its
initial use case of using Trino to power its data fabric architecture to
including more governance features, again by leveraging Trino. Data fabric enables
querying data across distributed data sets, but importantly, it allows Comcast
to transparently migrate data across on-prem and cloud storage without impacting
users.&lt;/p&gt;

&lt;p&gt;Despite offering query federation, data fabric still lacks the
higher-quality experience that data mesh aims to provide. Not only does having
access to the data matter, but also adding data quality checks and a dedicated
owner to ensure the data is correct and consumable. The ownership is split by
domains defined by Comcast. It is the responsibility of the owners to ensure
data quality, compliance, and security on the data they own. This data can be
exposed internally or externally as a data product. While many of the drivers
for this are done through company policy, there are technical means to make this
possible. This includes improving metadata on the data, access logs, global
data catalogs, and managing data access.&lt;/p&gt;

&lt;p&gt;Trino facilitates a single point of access and is the primary location where
policies are enforced. Comcast created an engine called the Enterprise Policy
Hub which syncs with all data stores and compute engines to enforce company
policy and update metadata on all data across Comcast. Trino, along with other
query engines, consults this engine to determine what information a user has
access to, who owns the data, and creates an audit trail of what queries are
run.&lt;/p&gt;

&lt;p&gt;There are still some open challenges Comcast is looking to overcome. Data
discovery is a large challenge for anyone looking to find a specific table and
who is responsible for updating it. Another interesting area Comcast is
researching is creating automated retention and minimization of data copies.
This talk was exciting and gives a pretty clear roadmap to some beneficial
changes many teams can make to improve the quality and governance of their data
sets.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social card and
link to &lt;a href=&quot;https://trino.io/blog/2022/12/07/trino-summit-2022-comcast-recap.html&quot;&gt;https://trino.io/blog/2022/12/07/trino-summit-2022-comcast-recap.html&lt;/a&gt;. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/comcast-social.jpg&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Sajuman Joseph, Brian Olsen</name>
        </author>
      

<summary>Tune in for the next post in the Trino Summit 2022 recap series. In this post, we’re joining Saj from Comcast to talk about their migration from a data fabric to a data mesh. Saj shows you that there is more to the buzzword than meets the eye. He gives a solid overview of why Comcast is taking data mesh to heart.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/comcast.jpg" />
      
    </entry>
  
    <entry>
      <title>AI and poetry meet Trino</title>
      <link href="https://trino.io/blog/2022/12/06/poetry-ai.html" rel="alternate" type="text/html" title="AI and poetry meet Trino" />
      <published>2022-12-06T00:00:00+00:00</published>
      <updated>2022-12-06T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/12/06/poetry-ai</id>
      <content type="html" xml:base="https://trino.io/blog/2022/12/06/poetry-ai.html">&lt;p&gt;Sometimes when working in the depth of the query engine core, the planner, the
optimizer, or some other tricky problem or code, you run out of steam and need a
distraction. Your brain will quietly work on the problem in the background while
you get to read or do something interesting.&lt;/p&gt;

&lt;p&gt;During one break we found out more about GPT-3 and the recent improvements on
the OpenAI project. So we had to try.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;It’s pretty amazing that you can feed the AI a prompt just like we did:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Write a short rhyming poem explaining Trino in easy but accurate terms
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And after a few trials, the best one produced by GPT-3 was this:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Trino is a tech platform,&lt;br /&gt;
That makes data integration a breeze,&lt;br /&gt;
It’s designed to help your business grow,&lt;br /&gt;
And optimise your systems with ease.&lt;br /&gt;
&lt;br /&gt;
It links together different sources,&lt;br /&gt;
Both big and small,&lt;br /&gt;
Gathers data of different types,&lt;br /&gt;
Helping you make decisions for all.&lt;br /&gt;
&lt;br /&gt;
So Trino is an integration tool,&lt;br /&gt;
That connects data for you,&lt;br /&gt;
It’ll ease the burden of data sync,&lt;br /&gt;
Giving you more time to do what you do.&lt;br /&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Cool, huh? Have a look at another prompt:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Write a short rhyming poem explaining Trino, the SQL query engine in complex
and accurate terms.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And the compelling result:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Trino is a query engine that does more than just run,&lt;br /&gt;
It’s a SQL engine with performance that stuns.&lt;br /&gt;
It is secure, efficient, and kind,&lt;br /&gt;
Providing real-time analytics of any kind.&lt;br /&gt;
&lt;br /&gt;
It’s a distributed system that’s built to scale,&lt;br /&gt;
Processing queries that never fail.&lt;br /&gt;
Connecting to myriad data sources,&lt;br /&gt;
It can quickly analyze any course.&lt;br /&gt;
&lt;br /&gt;
It’s flexible, reliable, and fast,&lt;br /&gt;
And is designed to make your queries last.&lt;br /&gt;
It can handle large volumes of data with ease,&lt;br /&gt;
Making the most complex queries a breeze.&lt;br /&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you are interested in learning more, you can look at the
&lt;a href=&quot;https://arstechnica.com/information-technology/2022/11/openai-conquers-rhyming-poetry-with-new-gpt-3-update/&quot;&gt;blog post on Ars Technica&lt;/a&gt;
or go and try it yourself &lt;a href=&quot;https://beta.openai.com/playground/&quot;&gt;on the playground&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Enjoy while we are heading back to &lt;a href=&quot;https://github.com/trinodb/trino/pulls&quot;&gt;working on Trino pull
requests&lt;/a&gt; and other code now.&lt;/p&gt;

&lt;p&gt;Martin and Marcos&lt;/p&gt;</content>

      
        <author>
          <name>Martin Traverso, Marcos Traverso</name>
        </author>
      

      <summary>Sometimes when working in the depth of the query engine core, the planner, the optimizer, or some other tricky problem or code, you run out of steam and need a distraction. Your brain will quietly work on the problem in the background while you get to read or do something interesting. During one break we found out more about GPT-3 and the recent improvements on the OpenAI project. So we had to try.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/images/graphics/trino-openapi-header.png" />
      
    </entry>
  
    <entry>
      <title>Leveraging Trino to power data at Goldman Sachs</title>
      <link href="https://trino.io/blog/2022/12/05/trino-summit-2022-goldman-sachs-recap.html" rel="alternate" type="text/html" title="Leveraging Trino to power data at Goldman Sachs" />
      <published>2022-12-05T00:00:00+00:00</published>
      <updated>2022-12-05T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/12/05/trino-summit-2022-goldman-sachs-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/12/05/trino-summit-2022-goldman-sachs-recap.html">&lt;p&gt;Continuing with &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;the Trino Summit 2022 sessions posts&lt;/a&gt;, we’re diving into an insightful
lightning talk from &lt;a href=&quot;https://www.goldmansachs.com&quot;&gt;Goldman Sachs&lt;/a&gt;. They explore
how they use Trino to help ensure data quality across the board for all users
and customers. By using Trino to federate their various data sources, querying
everything in one place provides them with the flexibility they need. With that
flexibility, they can validate that all data is as it should be where that data
lives, settling any concerns that may exist about data integrity.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/g9fLA3tFG-Q&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;Validating data quality can be a tricky and complicated process. Data resides
in many sources, with different rules and different processes for checking
quality. Goldman’s data ingestion team may not have a detailed understanding
of all data sets. Despite that, there is a need to autonomously verify and
validate all data to be confident in its quality and integrity. The solution to
this challenge? A queryable data quality platform powered by Trino.&lt;/p&gt;

&lt;p&gt;The underlying data quality platform’s logic handles the validation. Resting
on top of it is Trino, the scalable, fast solution to ensure that users can
query what they need. Even when the platform is profiling the data, enforcing
various quality rules, and validating the data in different ways, Trino is there
to provide access to everything contained within, proving that quality, speed,
and accessibility don’t need to be tradeoffs.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on
Twitter, Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social
card and link to &lt;a href=&quot;https://trino.io/blog/2022/12/05/trino-summit-2022-goldman-sachs-recap.html&quot;&gt;https://trino.io/blog/2022/12/05/trino-summit-2022-goldman-sachs-recap.html&lt;/a&gt;. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/goldman-social.png&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Sumit Halder, Siddhant Chadha, Suman-Newton, Ramesh Bhanan, Cole Bowden</name>
        </author>
      

      <summary>Continuing with the Trino Summit 2022 sessions posts, we’re diving into an insightful lightning talk from Goldman Sachs. They explore how they use Trino to help ensure data quality across the board for all users and customers. By using Trino to federate their various data sources, querying everything in one place provides them with the flexibility they need. With that flexibility, they can validate that all data is as it should be where that data lives, settling any concerns that may exist about data integrity.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/goldman-sachs.png" />
      
    </entry>
  
    <entry>
      <title>Optimizing Trino using spot instances with Zillow</title>
      <link href="https://trino.io/blog/2022/12/01/trino-summit-2022-zillow-recap.html" rel="alternate" type="text/html" title="Optimizing Trino using spot instances with Zillow" />
      <published>2022-12-01T00:00:00+00:00</published>
      <updated>2022-12-01T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/12/01/trino-summit-2022-zillow-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/12/01/trino-summit-2022-zillow-recap.html">&lt;p&gt;In this installment of &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;the Trino Summit 2022 sessions posts&lt;/a&gt;, we jump into an exciting topic by folks
from &lt;a href=&quot;https://www.zillow.com&quot;&gt;Zillow&lt;/a&gt; about running Trino on spot instances.
Spot instances are cheap, ephemeral nodes that reduce overall compute costs;
they cost less because they are not guaranteed to remain
available.&lt;/p&gt;

&lt;p&gt;In this session, Zillow engineers talk about how they use Trino on spots to take
advantage of the cost savings while handling the transitory nature of spots.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/vz9reBUgQTE&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-summit-2022/Trino@Zillow.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;Zillow’s BI platform team is tasked with enabling access to data and metrics
from their data lake in a self-service and performant manner. The platform must
handle generating up-to-date reports and metrics to unlock time-critical
opportunities. They also need to enable ad hoc analytics across multiple domains
within Zillow.&lt;/p&gt;

&lt;p&gt;There are close to 600 data pipelines and 65,000 queries running daily. The
average read covers 600 terabytes of data, and the average P95 time is around
20 seconds. They have six Trino clusters that service various workflows based on
load. These are all deployed on Amazon EKS with a range of eight to 60 workers
based on CPU utilization.&lt;/p&gt;

&lt;p&gt;When deploying Trino on EKS, Zillow uses worker groups, which enables them to
collocate nodes in AWS local zones. It also made it possible to choose spot 
instances, which are 90% cheaper than regular on-demand instances. A critical
aspect they needed to cover was to correctly tune the percentage of nodes that
were spot instances. They created pools of nodes that were entirely on-demand
for coordinators, since a coordinator going down brings down the entire cluster.
Other pools used for workers are tuned to an optimal blend of spot and
on-demand.&lt;/p&gt;

&lt;p&gt;Watch this session to learn how to properly optimize the number of spot
instances running for your Trino clusters, without losing reliability of your
service. Also learn some ways that Zillow is planning on using the
fault-tolerant execution mode.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, please consider sharing this on
Twitter, Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social
card and link to &lt;a href=&quot;https://trino.io/blog/2022/12/01/trino-summit-2022-zillow-recap.html&quot;&gt;https://trino.io/blog/2022/12/01/trino-summit-2022-zillow-recap.html&lt;/a&gt;. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/zillow-social.png&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Santhosh Venkatraman, Rupesh Kumar Perugu, Brian Olsen</name>
        </author>
      

      <summary>In this installment of the Trino Summit 2022 sessions posts, we jump into an exciting topic by folks from Zillow about running Trino on spot instances. Spot instances are cheap and ephemeral nodes that lead to reduced overall compute costs. Spot instances are cheaper as they are not guaranteed to remain available. In this session, Zillow engineers talk about how they use Trino on spots to take advantage of the cost savings while handling the transitory nature of spots.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/zillow.jpg" />
      
    </entry>
  
    <entry>
      <title>Trino delivers for Amazon Athena</title>
      <link href="https://trino.io/blog/2022/12/01/athena.html" rel="alternate" type="text/html" title="Trino delivers for Amazon Athena" />
      <published>2022-12-01T00:00:00+00:00</published>
      <updated>2022-12-01T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/12/01/athena</id>
      <content type="html" xml:base="https://trino.io/blog/2022/12/01/athena.html">&lt;p&gt;Our community just keeps growing! Today, it is time to reach out and welcome
another large group of Trino users. The release of the new engine version for
&lt;a href=&quot;https://aws.amazon.com/athena&quot;&gt;Amazon Athena&lt;/a&gt; upgrades Athena from a rather
old version of Trino to a recent one. This update brings a ton of
improvements from the Trino project to the users of the popular cloud-based
query service.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;shared-history&quot;&gt;Shared history&lt;/h2&gt;

&lt;p&gt;Amazon Athena and Trino share a long history. From the beginning of Athena, the
query engine under the hood was Trino, then still called Presto. Athena created
a low-maintenance, powerful access mode to your data in S3 and beyond. It
combined the performance and features of Trino with the convenience of a cloud
service, which enabled new users and use cases. You could take advantage of
Trino without needing a team of experts to deploy and operate a Trino cluster
for your organization. In fact, we wrote about this in the first edition of
&lt;strong&gt;Trino: The Definitive Guide&lt;/strong&gt;. There is also a section in the &lt;a href=&quot;/blog/2022/10/03/the-definitive-guide-2.html&quot;&gt;new second
edition&lt;/a&gt; that you can get for
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;free from Starburst&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;time-flies&quot;&gt;Time flies&lt;/h2&gt;

&lt;p&gt;But since the initial release of Athena, time has not stood still. In fact, the
Trino project has accelerated tremendously in &lt;a href=&quot;/blog/2022/08/04/decade-innovation.html&quot;&gt;innovation, features, and
releases&lt;/a&gt;. Until now, Athena
users missed out on these improvements. However, with the update, Amazon Athena
users now get access to many of these great features. As &lt;a href=&quot;https://aws.amazon.com/about-aws/whats-new/2022/10/amazon-athena-announces-upgraded-query-engine/&quot;&gt;AWS mentions in the
announcement&lt;/a&gt;,
“over 50 new SQL functions, 30 new features, and more than 90 query performance
improvements” are now available due to the upgrade to a new version of Trino. These
include &lt;a href=&quot;/blog/2021/05/19/row_pattern_matching.html&quot;&gt;Row pattern recognition with MATCH_RECOGNIZE&lt;/a&gt;, &lt;a href=&quot;/blog/2021/03/10/introducing-new-window-features.html&quot;&gt;new window features&lt;/a&gt;, support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt; or
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TRUNCATE&lt;/code&gt; statements, and many others.&lt;/p&gt;

&lt;p&gt;Performance improvements in our core engine and all the Trino connectors show up
in every release note. The &lt;a href=&quot;https://aws.amazon.com/blogs/big-data/upgrade-to-athena-engine-version-3-to-increase-query-performance-and-access-more-analytics-features/&quot;&gt;improvements observed by the Athena team in their
benchmarks&lt;/a&gt;
show the resulting gains nicely. This is great evidence that our approach of
constantly working on small improvements wherever we find potential works well.
This approach is necessary since Trino is already at a very high performance
level; like an elite athlete, every small improvement matters.

&lt;p&gt;It is also important to note that these improvements are only in the Trino
version of the engine, since the &lt;a href=&quot;/blog/2022/08/02/leaving-facebook-meta-best-for-trino.html&quot;&gt;Presto project does not include these
features&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;client-tools-and-collaboration&quot;&gt;Client tools and collaboration&lt;/h2&gt;

&lt;p&gt;Athena users also benefit from improvements for supporting client tools such as
Python clients, dbt, Metabase and others. Working with other communities is of
critical importance to the Trino project. The &lt;a href=&quot;https://trino.io/episodes/40.html&quot;&gt;innovations in our Iceberg
connector&lt;/a&gt; that are all now also available to
Athena users are a great example of how we can lead the way together. Working with
contributors from Amazon and other companies and projects has yielded some
amazing improvements. At the &lt;a href=&quot;https://trino.io/episodes/42.html&quot;&gt;Trino summit and contributor
congregation&lt;/a&gt;, we reconnected in person and
established even closer collaboration.&lt;/p&gt;

&lt;h2 id=&quot;looking-forward&quot;&gt;Looking forward&lt;/h2&gt;

&lt;p&gt;So, what is next for Trino and Athena users? First up, you should upgrade to the
new Trino engine in Athena, and avoid the legacy Presto engine.&lt;/p&gt;

&lt;p&gt;Second, check out some of the great presentations from &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;Trino Summit 2022&lt;/a&gt; and &lt;a href=&quot;https://trino.io/episodes/42.html&quot;&gt;hear about some of our
impressions&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;And last but not least, stay tuned for more goodness. Trino already shipped
further releases that included support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt;, table functions, and more
performance improvements. The Athena team is working hard on updating Trino for
your benefit regularly.&lt;/p&gt;

&lt;p&gt;Celebrating our &lt;a href=&quot;/blog/2022/09/12/tenth-birthday-celebration-recap.html&quot;&gt;first decade of the Trino project this last summer&lt;/a&gt; has shown a great trajectory for
the project and the community, and it looks like the next decade is going to be
even better!&lt;/p&gt;

&lt;p&gt;Sending a warm welcome from the Trino community to the Amazon Athena team and
users. Now you know that you were Trino users all along.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Martin and Manfred&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Martin Traverso</name>
        </author>
      

      <summary>Our community just keeps growing! Today, it is time to reach out and welcome another large group of Trino users. The release of the new engine version for Amazon Athena upgrades Athena to a recent version of Trino from a rather old version. This update brings a ton of improvements from the Trino project to the users of the popular cloud-based query service.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/trino-light.png" />
      
    </entry>
  
    <entry>
      <title>Enterprise-ready Trino at Bloomberg: One Giant Leap Toward Data Mesh!</title>
      <link href="https://trino.io/blog/2022/11/30/trino-summit-2022-bloomberg-recap.html" rel="alternate" type="text/html" title="Enterprise-ready Trino at Bloomberg: One Giant Leap Toward Data Mesh!" />
      <published>2022-11-30T00:00:00+00:00</published>
      <updated>2022-11-30T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/11/30/trino-summit-2022-bloomberg-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/11/30/trino-summit-2022-bloomberg-recap.html">&lt;p&gt;This post continues &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;a larger series of posts&lt;/a&gt; on the Trino Summit 2022 sessions.
Following the &lt;a href=&quot;/blog/2022/11/28/trino-summit-2022-apple-recap.html&quot;&gt;Trino at Apple talk&lt;/a&gt;, engineers from Bloomberg shared
the latest about their additions to Trino. Bloomberg uses Trino to federate huge
amounts of disparate financial data together. When you have many users with
different use cases and resource needs, you need something to ensure that the
huge workloads don’t bully the small ones. Enter the Trino Load Balancer, a
privacy-aware solution to help maintain high availability while still treating
data security as the first-class citizen that it should be.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/ePr-iVQ5ri4&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-summit-2022/Trino-at-Bloomberg.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;Bloomberg collects data, creates experimental data, and ingests data from
vendors. Its data analysts then refine, clean, and structure that data using
whatever their preferred method is, generating even more diverse data. Internal
teams and clients then want to look at and query that generated data, too. Sound
like a data mesh? That’s because it is. Trino isn’t new at Bloomberg, and it’s
been in use to help federate all of those varying data sets into one unified
access point.&lt;/p&gt;

&lt;p&gt;When trying to deploy multiple Trino clusters for such a wide array of users who
demand high uptime, high throughput, and fast response times, the Trino
coordinator becomes a single point of failure. There’s the risk of
infrastructure outages, the need to shut things down for occasional upgrades,
and some users run high-throughput jobs for millions of rows while others are
expecting low-latency jobs for only hundreds. Keeping Trino up, running, and
meeting all users’ expectations is no small task.&lt;/p&gt;

&lt;p&gt;And that’s where the Trino Load Balancer comes in! As a fork of the open-source
presto-gateway, it helps to do exactly what it says on the tin for Trino:
balance workloads. By being aware of what’s running on each cluster and how many
resources are being used, it can direct traffic to the ideal clusters to meet
each user’s needs. And with a brief demo, we get a look at how data owners
can set policies that are respected within the load balancer, ensuring that
users can only access and query what they’re supposed to.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on
Twitter, Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social
card and link to &lt;a href=&quot;https://trino.io/blog/2022/11/30/trino-summit-2022-bloomberg-recap.html&quot;&gt;https://trino.io/blog/2022/11/30/trino-summit-2022-bloomberg-recap.html&lt;/a&gt;. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/bloomberg-social.png&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Vishal Jadhav, Pablo Arteaga, Cole Bowden</name>
        </author>
      

      <summary>This post continues a larger series of posts on the Trino Summit 2022 sessions. Following the Trino at Apple talk, engineers from Bloomberg shared the latest about their additions to Trino. Bloomberg uses Trino to federate huge amounts of disparate financial data together. When you have many users with different use cases and resource needs, you need something to ensure that the huge workloads don’t bully the small ones. Enter the Trino Load Balancer, a privacy-aware solution to help maintain high availability while still treating data security as the first-class citizen that it should be.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/bloomberg.jpg" />
      
    </entry>
  
    <entry>
      <title>Trino at Apple</title>
      <link href="https://trino.io/blog/2022/11/28/trino-summit-2022-apple-recap.html" rel="alternate" type="text/html" title="Trino at Apple" />
      <published>2022-11-28T00:00:00+00:00</published>
      <updated>2022-11-28T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/11/28/trino-summit-2022-apple-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/11/28/trino-summit-2022-apple-recap.html">&lt;p&gt;This post continues &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;a larger series of posts&lt;/a&gt; on the Trino Summit 2022 sessions.
Following the &lt;a href=&quot;/blog/2022/11/22/trino-summit-2022-state-of-trino-keynote-recap.html&quot;&gt;Keynote: State of Trino session&lt;/a&gt;, engineers from Apple shared the
current usage of Trino at Apple. They discuss how they support Trino as a
service for multiple end-users, and the critical features that drew Apple to
Trino. They wrap up with some challenges they have faced and some development
they have planned to contribute to Trino.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/3afcRK6Yvio&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-summit-2022/Trino@Apple.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;blockquote&gt;
  &lt;p&gt;Trino is deployed at scale in Apple, and it continues to see tremendous
adoption across multiple teams at Apple. &lt;em&gt;Yathi Peddyshetty, Software Engineer @ Apple&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The commonplace ad hoc and BI analytics use cases make up a lot of how Apple uses
Trino today. They also have increasing uses in federated querying and A/B
testing.&lt;/p&gt;

&lt;p&gt;To deploy Trino as a service, Apple has an in-house Kubernetes operator to
manage the Trino cluster lifecycles. They also created an orchestrator to
provision and simplify cluster creation and management. They make this a
self-service console that allows users to provision their own clusters per
request. Their custom orchestrator also takes care of autoscaling and other
technical complexities of maintaining a scalable Trino system.&lt;/p&gt;

&lt;p&gt;Apple primarily uses Iceberg, Hive, and Cassandra connectors. They have a heavy
focus on Apache Iceberg as their table format and have contributed a significant
amount of PRs to improve interoperability between Trino and Spark, and increased
coverage of Iceberg APIs. Other challenges Apple faces stem from the lack of
flexible routing of queries to achieve zero downtime, and the lack of pluggable
optimizer rules and operators.&lt;/p&gt;

&lt;p&gt;Apple has various features on their roadmap to eventually contribute to the
community. These include exposing remaining functionality in the Iceberg APIs,
supporting all partition transforms, predicate pushdowns, bucketed joins, simple
aggregate pushdowns, Iceberg native views in Trino, and more.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, please consider sharing this on
Twitter, Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social
card and link to &lt;a href=&quot;https://trino.io/blog/2022/11/28/trino-summit-2022-apple-recap.html&quot;&gt;https://trino.io/blog/2022/11/28/trino-summit-2022-apple-recap.html&lt;/a&gt;. If you think Trino is awesome, 
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/apple-social.png&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Vinitha Gankidi, Yathi Peddyshetty, Brian Olsen</name>
        </author>
      

      <summary>This post continues a larger series of posts on the Trino Summit 2022 sessions. Following the Keynote: State of Trino session, engineers from Apple shared the current usage of Trino at Apple. They discuss how they support Trino as a service for multiple end-users, and the critical features that drew Apple to Trino. They wrap up with some challenges they have faced and some development they have planned to contribute to Trino.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/apple.jpg" />
      
    </entry>
  
    <entry>
      <title>Trino Summit 2022 recap: The state of Trino</title>
      <link href="https://trino.io/blog/2022/11/22/trino-summit-2022-state-of-trino-keynote-recap.html" rel="alternate" type="text/html" title="Trino Summit 2022 recap: The state of Trino" />
      <published>2022-11-22T00:00:00+00:00</published>
      <updated>2022-11-22T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/11/22/trino-summit-2022-state-of-trino-keynote-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/11/22/trino-summit-2022-state-of-trino-keynote-recap.html">&lt;p&gt;To kick off the &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;Trino Summit 2022&lt;/a&gt;,
we heard from Trino co-creators Martin Traverso, Dain Sundstrom, and David
Phillips. Martin gave a talk on the state of Trino and project plans for 2023,
then opened the floor to questions from the community. You can watch a recording
of the talk, or read on if you’re only interested in the highlights.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/mUq_h3oArp4&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-summit-2022/State-of-Trino-Nov-2022.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;So what &lt;em&gt;has&lt;/em&gt; happened in Trino over the last year?&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2022/08/08/trino-tenth-birthday.html&quot;&gt;We celebrated Trino’s 10th birthday!&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;It was the busiest year in project history, with 600+ contributors, 4000+
commits, and near-weekly releases.&lt;/li&gt;
  &lt;li&gt;Tons of new features were added, including &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt;, JSON functions, table
functions, fault-tolerant execution (look forward to a lot of talking about it
in later recaps!), upgrading to Java 17, and a slide so dense with other
goodies that it needed two columns.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And what’s coming down the pipeline?&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/14237&quot;&gt;Project Hummingbird&lt;/a&gt;, a large
set of core engine improvements.&lt;/li&gt;
  &lt;li&gt;Expanded table function support, including accepting tables as arguments.&lt;/li&gt;
  &lt;li&gt;Extra community support, so that contributors have an easier and better time
getting code merged into Trino.&lt;/li&gt;
  &lt;li&gt;New connectors, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE/DROP CATALOG&lt;/code&gt;, query tracing, and more!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There were also tons of great questions asked by live and online attendees
answered by Dain, David, and Martin, so if you want to hear more, take a listen
to the full talk!&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social card and
link to &lt;a href=&quot;https://trino.io/blog/2022/11/22/trino-summit-2022-state-of-trino-keynote-recap.html&quot;&gt;https://trino.io/blog/2022/11/22/trino-summit-2022-state-of-trino-keynote-recap.html&lt;/a&gt;. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/keynote-social.png&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Martin Traverso, Dain Sundstrom, David Phillips, Cole Bowden</name>
        </author>
      

      <summary>To kick off the Trino Summit 2022, we heard from Trino co-creators Martin Traverso, Dain Sundstrom, and David Phillips. Martin gave a talk on the state of Trino and project plans for 2023, then opened the floor to questions from the community. You can watch a recording of the talk, or read on if you’re only interested in the highlights.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/keynote-header.jpeg" />
      
    </entry>
  
    <entry>
      <title>Trino Summit 2022 recap</title>
      <link href="https://trino.io/blog/2022/11/21/trino-summit-2022-recap.html" rel="alternate" type="text/html" title="Trino Summit 2022 recap" />
      <published>2022-11-21T00:00:00+00:00</published>
      <updated>2022-11-21T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/11/21/trino-summit-2022-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/11/21/trino-summit-2022-recap.html">&lt;p&gt;Trino Summit 2022 was, in a word, invigorating. I’m still coming off the high
from the amount of energy I gained from being at this summit, meeting many of
you face-to-face for the first time. Most surprisingly, I learned that Trino
contributor James Petty from AWS was actually not famous painter
&lt;a href=&quot;https://en.wikipedia.org/wiki/Bob_Ross&quot;&gt;Bob Ross&lt;/a&gt;.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/james-petty.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;If you’ve ever planned a conference, you know that there are a lot of details to
iron out, and you can be left exhausted by the end. After this year’s Trino
Summit though, rather than being worn out, I felt like it ended too quickly and
I simply wanted more time to chat with everyone. A single day was just not
enough, and now all I can think about is the next summit. We not only got to
hear an incredible lineup of talks and discussions from first-time Trino Summit
speakers like Apple, Shopify, and Lyft, but also had many engaging discussions
outside the auditorium.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/swag.jpg&quot; /&gt;
&lt;img src=&quot;/assets/blog/trino-summit-2022/authors.jpg&quot; /&gt;
&lt;img src=&quot;/assets/blog/trino-summit-2022/talking-1.jpg&quot; /&gt;
&lt;img src=&quot;/assets/blog/trino-summit-2022/talking-2.jpg&quot; /&gt;&lt;/p&gt;

&lt;p&gt;There were cross-community discussions between Delta Lake, Airflow, and Alluxio
about how to turbo-charge Trino integrations with these communities. There were
many companies talking about best practices and gotchas while migrating from
Hive to Iceberg or Delta Lake. Others wanted to learn how to use fault-tolerant
execution. I spoke with managers of companies like LinkedIn and Bloomberg who
wanted to help develop their engineers to get more involved with contributing to
Trino. We all finally got to see the faces of people we had been talking to for
the past two to three years. People were getting their free
copies of Trino: The Definitive Guide signed by Manfred, Matt, and Martin, and
brought home other swag. After a long day of talks, we wrapped Trino Summit up
with two happy hours on the roof of the Commonwealth Club, watching the sunset
over the San Francisco Bay Bridge.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/speech.jpg&quot; /&gt;
&lt;img src=&quot;/assets/blog/trino-summit-2022/happy-hour.jpg&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;session-summaries&quot;&gt;Session summaries&lt;/h2&gt;

&lt;p&gt;I would like to quickly summarize a few short takeaways from each talk at
the summit. I highly recommend watching the full videos on the Trino YouTube
channel, which are linked in the talk titles:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=mUq_h3oArp4&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Keynote: State of Trino&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/11/22/trino-summit-2022-state-of-trino-keynote-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Trino co-creator Martin covers recently developed features and community
statistics, and discusses roadmap features like Project Hummingbird.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Dain and David join Martin on the stage to answer audience questions.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=mUq_h3oArp4&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/keynote.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=3afcRK6Yvio&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Trino at Apple&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/11/28/trino-summit-2022-apple-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Apple has an in-house k8s operator to manage Trino cluster lifecycles, and an
orchestrator to provision and simplify cluster creation and management.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Apple focuses heavily on Apache Iceberg as their table format and has
contributed a significant number of PRs to improve interoperability between
Trino and Spark and to increase coverage of the Iceberg APIs.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=3afcRK6Yvio&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/apple.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=ePr-iVQ5ri4&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Enterprise-ready Trino at Bloomberg: One Giant Leap Toward Data Mesh!&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/11/30/trino-summit-2022-bloomberg-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Bloomberg uses Trino to centralize access to their massive number of
catalogs spread across many different departments.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;To offer Trino-as-a-Service for varying workloads, they use a Trino Load
Balancer (a fork of the popular presto-gateway project from Lyft) that adds new
functionality. In talking with them after their presentation, the Bloomberg
team expressed interest in open sourcing this work to the community as a more
generalized solution than the gateway project.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=ePr-iVQ5ri4&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/bloomberg.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=vz9reBUgQTE&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Optimizing Trino using spot instances&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/12/01/trino-summit-2022-zillow-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;In an attempt to minimize costs, Zillow is measuring the efficacy of running
Trino ETL jobs on spot instances.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;This currently runs the risk of costly retries on failure, but future
work will look at utilizing the new fault-tolerant execution mode to mitigate
those retries when spot instances are reclaimed.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=vz9reBUgQTE&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/zillow.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=g9fLA3tFG-Q&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Leveraging Trino to Power Data at Goldman Sachs&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/12/05/trino-summit-2022-goldman-sachs-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Goldman Sachs uses Trino to power their data quality service, taking advantage
of the fact that Trino centralizes all visibility across their platform.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=g9fLA3tFG-Q&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/goldman-sachs.png&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=sSWBi7bBotQ&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Elevating data fabric to data mesh: Solving data needs in hybrid datalakes&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/12/07/trino-summit-2022-comcast-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Comcast takes us through their Trino architecture journey, starting with
the history of their Data Fabric service and then discussing the data governance
and culture changes required to realize a Data Mesh with Trino.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=sSWBi7bBotQ&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/comcast.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=nJBBw-xnLU8&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Rewriting History: Migrating petabytes of data to Apache Iceberg using Trino&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/12/09/trino-summit-2022-shopify-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Shopify recently migrated many of its workloads to Trino. One of the first
hurdles was dealing with the many issues in the Hive table format, so they quickly
upgraded to the Iceberg table format.&lt;/li&gt;
  &lt;li&gt;They initially encountered numerous issues, but experienced incredibly fast
turnaround on fixes from the Trino project that resolved their issues during
the migration.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;There’s also a benchmark showing how moving to a columnar file format and
the Iceberg table format drastically improves query performance.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=nJBBw-xnLU8&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/shopify.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=FL3c1Ue7YWM&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Trino for Large Scale ETL at Lyft&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/12/12/trino-summit-2022-lyft-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Lyft is using Trino to perform ETL jobs scanning 10PB of data per day, and
writing 100TB per day. They are not using fault-tolerant execution.&lt;/li&gt;
  &lt;li&gt;In the last year, Lyft cut their number of Trino nodes in half, while
increasing the volume of their workloads due to recent improvements in Trino
and upgrades in Java versions.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Keeping up with the rapid release cycle of Trino was a challenge, and
Lyft showcases their regression testing using a query replay framework.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=FL3c1Ue7YWM&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/lyft.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=Zfmxwu0m98k&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Federating them all on Starburst Galaxy&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/12/14/trino-summit-2022-starburst-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Running and scaling Trino is difficult. Starburst showcases Starburst Galaxy,
a SaaS data platform built around the Trino query engine.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The talk demos running federated queries over Pokémon data scattered
across MongoDB and Iceberg tables.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=Zfmxwu0m98k&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/starburst.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=Q03DzL_fm-I&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Trino at Quora: Speed, Cost, Reliability Challenges and Tips&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/12/16/trino-summit-2022-quora-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Quora uses a large number of Trino clusters for ad-hoc queries, ETL, time
series analysis, A/B testing, and data backfills.&lt;/li&gt;
  &lt;li&gt;Quora initially faced high costs on Trino due to inefficient use of
resources.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;To address this, they migrated to Graviton instances, implemented
autoscaling, and optimized query efficiency.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=Q03DzL_fm-I&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/quora.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=V9_aPLXATh8&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Journey to Iceberg with SK Telecom&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/12/19/trino-summit-2022-sk-telecom-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The speakers travelled all the way from South Korea to join us in person.&lt;/li&gt;
  &lt;li&gt;SK Telecom had a multitude of performance issues that all stemmed from the
lack of flexibility in the Hive model and metastore.&lt;/li&gt;
  &lt;li&gt;They migrated to Iceberg to address the performance issues and gained the
added benefits of Iceberg’s table format for improving developer workflows.&lt;/li&gt;
  &lt;li&gt;Housekeeping operations like optimize were already addressed by the Iceberg
community and quickly added to Trino.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;This reduced query processing time by 80%.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=V9_aPLXATh8&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/sk-telecom.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=xKDN7RUJ5i4&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Using Trino with Apache Airflow for (almost) all your data problems&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/12/21/trino-summit-2022-astronomer-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Airflow is a highly functional and widely adopted workflow management
platform for scheduling jobs on your data platform.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The Trino integration for Airflow recently landed, coinciding with the
GA arrival of fault-tolerant execution mode in Trino.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=xKDN7RUJ5i4&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/astronomer.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=MCB_1furnAo&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; How we use Trino to analyze our Product-led Growth (PLG) user activation funnel&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/12/23/trino-summit-2022-upsolver-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Upsolver solves a lot of common data problems on their platform.&lt;/li&gt;
  &lt;li&gt;One such problem is measuring activation rates in a product-led growth
team. This requires acting on many sources of data.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Trino is a natural fit for the challenge of joining this data together.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=MCB_1furnAo&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/upsolver.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;federate-em-all&quot;&gt;Federate ‘em all&lt;/h2&gt;

&lt;p&gt;After a whole day of throwing Trino balls out to the crowd, we got to see a
nice metaphor for federated data by throwing them all in the air and yelling,
“Federate ‘em all!”&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/balls.jpg&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;trino-contributor-congregation&quot;&gt;Trino Contributor Congregation&lt;/h2&gt;

&lt;p&gt;The day after the summit, we invited a relatively small group of our
contributors to meet for the inaugural Trino Contributor Congregation (TCC).
This gathered many of our long-time and heavy Trino contributors. We had folks
from companies like Starburst, AWS, Apple, Bloomberg, Lyft, Comcast, LinkedIn,
Treasure Data, and others. Let’s dive into some of the topics we discussed.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/contributor-congregation.jpg&quot; /&gt;&lt;/p&gt;

&lt;p&gt;We discussed feature proposals like:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The Trino load balancer, which is an adaptation of the popular gateway project from Lyft.&lt;/li&gt;
  &lt;li&gt;A Ranger plugin to be maintained by the Trino community rather than relying on the Ranger project.&lt;/li&gt;
  &lt;li&gt;A Snowflake connector that was traditionally held back by the lack of infrastructure.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We discussed the need for better shared testing datasets, beyond TPC-H and
TPC-DS, that are more representative of the real workloads many users run.&lt;/p&gt;

&lt;p&gt;We discussed the need for a clearer process for contributors to follow to
minimize the time to get features merged and avoid stale PRs. This is being
addressed by the backlog grooming performed by the developer relations team, and
by assigning maintainers to own various PRs. While there is never a promise to
merge a PR, improving the turnaround and communication on PRs is crucial to keep
contributors happy and improve the health of the project.&lt;/p&gt;

&lt;p&gt;While we were sad that not everyone could make the in-person TCC, we plan to
have virtual TCCs on a more frequent cadence and have the in-person TCCs
alongside larger in-person events. Getting these TCCs right is core to growing
the maintainership and continued success of the Trino project.&lt;/p&gt;

&lt;p&gt;We hope all of you who joined us in person and online enjoyed yourselves. We
all had such a blast! Stay tuned for updates on the next Trino Summit location!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/bun-bun-bye.jpg&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>Trino Summit 2022 was in a word, invigorating. I’m still coming off the high from the amount of energy I gained from being at this summit, meeting many of you face-to-face for the first time. Most surprisingly, I learned that Trino contributor James Petty from AWS was actually not famous painter Bob Ross.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/stage.jpg" />
      
    </entry>
  
    <entry>
      <title>Top five reasons to attend Trino Summit 2022</title>
      <link href="https://trino.io/blog/2022/10/31/trino-summit-2022-teaser-3.html" rel="alternate" type="text/html" title="Top five reasons to attend Trino Summit 2022" />
      <published>2022-10-31T00:00:00+00:00</published>
      <updated>2022-10-31T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/10/31/trino-summit-2022-teaser-3</id>
      <content type="html" xml:base="https://trino.io/blog/2022/10/31/trino-summit-2022-teaser-3.html">&lt;p&gt;This blog post wraps up a series of 
&lt;a href=&quot;/blog/2022/09/22/trino-summit-2022-teaser.html&quot;&gt;previous posts&lt;/a&gt;
&lt;a href=&quot;/blog/2022/10/19/trino-summit-2022-teaser-2.html&quot;&gt;teasing Trino Summit 2022&lt;/a&gt;.
The conference is free and takes place in San Francisco, California on November
10th. Join us either in-person or virtually!&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;
        Register now
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;!--more--&gt;

&lt;p&gt;Let’s dive right into the five reasons you should attend Trino Summit 2022. If
you’re not into these lists, go ahead and 
&lt;a href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;register now&lt;/a&gt;!&lt;/p&gt;

&lt;h3 id=&quot;1-hear-speakers-from-industry-leading-companies-talk-about-their-trino-architecture-and-use-cases&quot;&gt;1. Hear speakers from industry-leading companies talk about their Trino architecture and use cases&lt;/h3&gt;

&lt;p&gt;This year’s summit features industry leaders with varying workloads and
use cases. There are also sessions on tips and tricks to scale and lower the
cost of running Trino in production. Users from the following companies speak
about their challenges and how they use Trino to help overcome them:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Apple&lt;/li&gt;
  &lt;li&gt;Astronomer&lt;/li&gt;
  &lt;li&gt;Bloomberg&lt;/li&gt;
  &lt;li&gt;Comcast&lt;/li&gt;
  &lt;li&gt;Goldman Sachs&lt;/li&gt;
  &lt;li&gt;Lyft&lt;/li&gt;
  &lt;li&gt;Quora&lt;/li&gt;
  &lt;li&gt;Shopify&lt;/li&gt;
  &lt;li&gt;SK Telecom&lt;/li&gt;
  &lt;li&gt;Starburst&lt;/li&gt;
  &lt;li&gt;Upsolver&lt;/li&gt;
  &lt;li&gt;Zillow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To see more information about the talks and the agenda for the conference, check
out the &lt;a href=&quot;https://www.starburst.io/info/trinosummit#agenda&quot;&gt;Trino Summit 2022 agenda&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;2-meet-the-authors-of-the-trino-the-definitive-guide-and-get-that-trino-swag&quot;&gt;2. Meet the authors of the &lt;strong&gt;&lt;em&gt;Trino: The Definitive Guide&lt;/em&gt;&lt;/strong&gt; and get that Trino swag&lt;/h3&gt;

&lt;p&gt;This year, we are giving away autographed copies of the recently updated
&lt;a href=&quot;/blog/2022/10/03/the-definitive-guide-2.html&quot;&gt;&lt;strong&gt;Trino: The Definitive Guide&lt;/strong&gt;&lt;/a&gt; to attendees.
Already have a physical copy? Visit the Trino booth to get your book signed and
meet authors 
&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;Manfred Moser&lt;/a&gt;,
&lt;a href=&quot;https://twitter.com/mfullertweets&quot;&gt;Matt Fuller&lt;/a&gt;, and
&lt;a href=&quot;https://twitter.com/mtraverso&quot;&gt;Martin Traverso&lt;/a&gt; who literally wrote the book on
Trino.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;33%&quot; src=&quot;/assets/ttdg2-cover.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;We will be giving away swag packs containing an autographed copy of Trino: The
Definitive Guide, a Trino Summit 2022 shirt, a Commander Bun Bun plushie, and
more to both virtual and in-person attendees! This will be done during our
sponsored giveaway breaks between sessions, where we challenge both in-person
and virtual attendees in a race against time to bag the swag!&lt;/p&gt;

&lt;h3 id=&quot;3-federate-em-all&quot;&gt;3. Federate ‘em all&lt;/h3&gt;

&lt;p&gt;This year’s summit will be a free event that federates both data and humans. The
theme comes from a popular franchise that many of you know: Pokémon. To
understand the connection, let’s break down what we mean by federate ‘em
all. In the same way the Pokémon protagonist, Ash Ketchum, catches and trains
heterogeneous creatures called Pokémon, Trino queries and filters heterogeneous
data sets from various data sources.&lt;/p&gt;

&lt;p&gt;If you’re not familiar with Pokémon, a losing strategy is to train just one or
two Pokémon, since different types of Pokémon are better suited to different tasks.
In the same way, centralizing all of your data to a single data warehouse or
data lake doesn’t make sense either. There are different use cases and 
different needs across the company. Rather than spending your time building
brittle one-size-fits-all architectures, Trino enables you to connect to
&lt;a href=&quot;https://trino.io/docs/current/connector.html&quot;&gt;multiple data sources&lt;/a&gt; using ANSI SQL.&lt;/p&gt;

&lt;iframe src=&quot;https://www.youtube.com/embed/o2MJvRKG14M&quot; width=&quot;800&quot; height=&quot;500&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px;
margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt;
&lt;/iframe&gt;

&lt;h3 id=&quot;4-experience-beautiful-san-francisco&quot;&gt;4. Experience beautiful San Francisco&lt;/h3&gt;

&lt;p&gt;For those attending in person, you will get to enjoy the beautiful San Francisco
area. The &lt;a href=&quot;https://www.starburst.io/info/trinosummit/#location&quot;&gt;Commonwealth Club&lt;/a&gt;
is located right on the San Francisco Bay. The building is beautiful, with a
large auditorium for the main event and plenty of floors and rooms for socializing.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;/assets/blog/trino-summit-2022/commonwealth-club.jpeg&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;At the end of the summit, we will have a happy hour on the scenic roof deck that
looks out over the San Francisco Bay at the iconic Bay Bridge.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; src=&quot;/assets/blog/trino-summit-2022/san-francisco.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;We know this only applies to our in-person attendees, but remember, if you join
us virtually, there are still plenty of ways to network and interact
throughout the conference. We will be taking questions from our virtual audience,
and there will also be a chat forum for discussion with attendees from across the
globe. Plus, unlike those of us attending in person, no travel is required and
pajamas are optional during the event!&lt;/p&gt;

&lt;h3 id=&quot;5-collaborate-with-some-of-the-best-minds-working-on-trino&quot;&gt;5. Collaborate with some of the best minds working on Trino&lt;/h3&gt;

&lt;p&gt;Trino is a relatively new paradigm compared to the rest of the data world. If you
just realized that you don’t have to move all your data into one location,
you’re on the right track. However, there’s still a lot to learn when it comes
to scaling out a query engine whose usage grows over time. To get this right,
you need a community to be successful. The creators, Martin, Dain, and David,
and many of the core contributors of Trino will be attending, along with a long
list of folks who run multiple clusters over hundreds of petabytes of
data.&lt;/p&gt;

&lt;p&gt;Tap into this incredibly passionate group of Trino enthusiasts to augment your
experience with this revolutionary query engine!&lt;/p&gt;

&lt;h2 id=&quot;register-for-the-summit&quot;&gt;Register for the summit&lt;/h2&gt;

&lt;p&gt;Make sure to register quickly for in-person attendance, as it is limited to
250 seats. Spots are running out quickly, so don’t wait!&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;
        Register now
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;h2 id=&quot;announcing-the-final-round-of-sessions-and-the-agenda&quot;&gt;Announcing the final round of sessions and the agenda!&lt;/h2&gt;

&lt;p&gt;Now for the final list of sessions to announce for this year’s Trino Summit!
This week is quite the reveal, as we are showcasing a talk on how engineers at
Apple use Trino for their analytics challenges! 🎉🤯&lt;/p&gt;

&lt;p&gt;We also have three more amazing guests who are heavy hitters in the data and
analytics tech scene.&lt;/p&gt;

&lt;h3 id=&quot;trino-at-apple&quot;&gt;Trino at Apple&lt;/h3&gt;

&lt;p&gt;In this talk, the audience will learn how Apple uses Trino to accelerate
analytics, the challenges we face deploying analytics at scale at Apple, and the
areas we would like to collaborate on with the community.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Vinitha Gankidi, Software engineer at Apple&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Yathindranath Peddyshetty, Software engineer at Apple&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;enterprise-ready-trino-at-bloomberg-one-giant-leap-toward-data-mesh&quot;&gt;Enterprise-ready Trino at Bloomberg: One Giant Leap Toward Data Mesh!&lt;/h3&gt;

&lt;p&gt;Enterprises like Bloomberg love Trino. It allows us to embrace the data mesh
with ease. Providing Trino as a service in a highly available, configurable, and
access-controlled manner has been a key enabler for us in this paradigm shift.
Join us to learn how we have leveraged open-source components to achieve these
goals at Bloomberg.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Pablo Arteaga, Software Engineer at Bloomberg&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Vishal Jadhav, Software Engineer at Bloomberg&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;leveraging-trino-to-power-data-quality-at-goldman-sachs&quot;&gt;Leveraging Trino to power data quality at Goldman Sachs&lt;/h3&gt;

&lt;p&gt;Data is at the core of today’s business processes. We are responsible for making
accurate, timely, and modeled data available to our analytics and application
teams. The sources of these datasets can be quite heterogeneous, like HDFS, S3,
Sybase, Snowflake, Elasticsearch, and more. Also, with an increase in data
volume, velocity, and variety, data quality assurance is extremely critical to
ensure the trustworthiness of data and make it usable by consumers with
confidence. We have leveraged Trino to make high-quality data centrally
accessible through an efficient, secure, governed, and unified way of performing
analytics.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Sumit Halder, Vice President at Goldman Sachs&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Ramesh Bhanan, Vice President at Goldman Sachs&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Siddhant Chadha, Associate at Goldman Sachs&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Suman Baliganahalli Narayan Murthy, Vice President at Goldman Sachs&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;optimizing-trino-using-spot-instances&quot;&gt;Optimizing Trino using spot instances&lt;/h3&gt;

&lt;p&gt;Trino is a critical tool used at Zillow for analytics on the data lake. In this
talk, we aim to give a general overview of how we leverage Trino and dive deeper
into the optimizations we have done for scaling Trino at Zillow using spot
instances.&lt;/p&gt;

&lt;p&gt;In this session, we will show how fault-tolerant execution mode enables more
cost-effective and resilient execution when running Trino on spot instances.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Rupesh Kumar Perugu, Senior Software Engineer at Zillow&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Santhosh Venkatraman, Software Engineer at Zillow&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That finalizes all of our sessions! To see them all, check out the
&lt;a href=&quot;https://www.starburst.io/info/trinosummit#agenda&quot;&gt;Trino Summit 2022 agenda&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Get excited, the conference is in less than two weeks, so don’t forget to
&lt;a href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;register&lt;/a&gt;, and as always, &lt;strong&gt;&lt;em&gt;Federate them
all&lt;/em&gt;&lt;/strong&gt;! It is really shaping up to be an educational and fun-filled event with
Trino experts and aficionados.&lt;/p&gt;

&lt;p&gt;A huge thanks to our sponsors: Starburst, Privacera, Monte Carlo, Immuta,
CubeJS, Delta Lake, Hightouch, Backblaze, Databricks, Alluxio, and Tabular!&lt;/p&gt;

&lt;p&gt;Well, that’s a wrap. We’ll see you all in T-minus ten days!&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>This blog post wraps up a series of previous posts teasing Trino Summit 2022. The conference is free and takes place in San Francisco, California on November 10th. Join us either in-person or virtually! Register now</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/summit-logo.png" />
      
    </entry>
  
    <entry>
      <title>Trino Summit 2022: Federating humans and data</title>
      <link href="https://trino.io/blog/2022/10/19/trino-summit-2022-teaser-2.html" rel="alternate" type="text/html" title="Trino Summit 2022: Federating humans and data" />
      <published>2022-10-19T00:00:00+00:00</published>
      <updated>2022-10-19T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/10/19/trino-summit-2022-teaser-2</id>
      <content type="html" xml:base="https://trino.io/blog/2022/10/19/trino-summit-2022-teaser-2.html">&lt;p&gt;Trino has long been the de facto standard for querying large data sets over your
cloud or on-prem storage, also known as data lakes. This Trino Summit’s theme
will instead showcase Trino’s other claim to fame: query federation. Trino is a
query engine providing an access point that exposes ANSI SQL across 
&lt;a href=&quot;https://trino.io/docs/current/connector.html&quot;&gt;multiple data sources&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I urge you to join us either in person or virtually if you are a fan of Trino,
big data, open source, data engineering, Java, or all of the above! This conference
is free and takes place in San Francisco, California, on November 10th.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;register-for-the-summit&quot;&gt;Register for the summit&lt;/h2&gt;

&lt;p&gt;I can’t help but bring up the analogy of how Trino federates heterogeneous data
while this Trino Summit will federate many of us in the community from all
corners of the world. It really gives an appreciation for the international
reach of Trino and makes me look forward to more in-person events!&lt;/p&gt;

&lt;p&gt;Trino Summit will be held at the Commonwealth Club in San Francisco, California.
Make sure to register quickly for in-person attendance, as it is limited to
250 seats. Virtual registration is also picking up quickly, so register today!&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;
        Register now
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;h3 id=&quot;get-an-autographed-copy-of-trino-the-definitive-guide-2nd-ed&quot;&gt;Get an autographed copy of Trino: The Definitive Guide, 2nd ed.&lt;/h3&gt;

&lt;p&gt;Want to meet the authors who literally wrote the book on Trino? Visit 
&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;Manfred Moser&lt;/a&gt;,
&lt;a href=&quot;https://twitter.com/mfullertweets&quot;&gt;Matt Fuller&lt;/a&gt;, and
&lt;a href=&quot;https://twitter.com/mtraverso&quot;&gt;Martin Traverso&lt;/a&gt; at the Trino booth during the
conference. Bring your hard copy of &lt;a href=&quot;/blog/2022/10/03/the-definitive-guide-2.html&quot;&gt;&lt;strong&gt;Trino: The Definitive Guide&lt;/strong&gt;&lt;/a&gt; to get it signed by the authors!&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/ttdg2-cover.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;Don’t have a book? We’ll be giving away autographed copies of the book
throughout the conference!&lt;/p&gt;

&lt;h3 id=&quot;trino-summit-2022-teaser&quot;&gt;Trino Summit 2022 teaser&lt;/h3&gt;

&lt;p&gt;Check out the teaser for this year’s Trino Summit and get ready to &lt;strong&gt;&lt;em&gt;Federate ‘em
all&lt;/em&gt;&lt;/strong&gt;!&lt;/p&gt;

&lt;iframe src=&quot;https://www.youtube.com/embed/o2MJvRKG14M&quot; width=&quot;800&quot; height=&quot;500&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px;
margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt;
&lt;/iframe&gt;

&lt;h2 id=&quot;announcing-the-second-round-of-sessions-and-speakers&quot;&gt;Announcing the second round of sessions and speakers&lt;/h2&gt;

&lt;p&gt;As mentioned in the &lt;a href=&quot;/blog/2022/09/22/trino-summit-2022-teaser.html&quot;&gt;previous summit teaser&lt;/a&gt;, we announced some of our exciting
lineup of speakers! The topics range from architectures like data mesh and data
lakehouse, to running Trino at scale with fault-tolerant execution, and of
course, query federation.&lt;/p&gt;

&lt;p&gt;We have a full roster planned, but check out the next round of fully confirmed
sessions. Stay tuned for one more blog post announcing the final sessions in
our agenda as they are confirmed!&lt;/p&gt;

&lt;h3 id=&quot;sk-telecoms-journey-to-iceberg&quot;&gt;SK Telecom’s journey to Iceberg&lt;/h3&gt;

&lt;p&gt;SK Group is one of South Korea’s largest conglomerates, covering industries
from manufacturing to telecommunications. SK Telecom runs an on-premise data
platform at petabyte scale using Trino as its query engine. We chose Trino for
its ability to connect to heterogeneous data sources and for the fast
performance that plays a key role in our data platform.&lt;/p&gt;

&lt;p&gt;As data volumes and user demand to analyze long-term data increased, the Trino
Hive connector faced several challenges. Queries with an input data size
exceeding a terabyte put a great burden on the cluster. This caused many jobs to
fail, which is problematic because Trino’s resource sharing architecture affects
multiple users when a heavy query occurs.&lt;/p&gt;

&lt;p&gt;To address this situation, we optimized the data structure, tuned queries, and
used resource groups to isolate queries, but none of this fixed the problem.
We investigated Apache Iceberg and realized it could address some of these
scaling issues we were facing. In this talk, we will share our journey.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;JaeChang Song, Data Engineer at SKTelecom and Trino/Iceberg Contributor&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Jennifer OH, Data Engineer at SKTelecom&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;elevating-data-fabric-to-data-mesh-solving-data-needs-in-hybrid-data-lakes&quot;&gt;Elevating Data Fabric to Data Mesh: solving data needs in hybrid data lakes&lt;/h3&gt;

&lt;p&gt;At Comcast, we have long had a complex hybrid data lake environment consisting
of data lakes on-prem and in multiple cloud environments. Comcast uses Trino to
bridge the data in these environments using an architecture we call Data Fabric.
Data Fabric is an abstraction layer that uses an internally built connector that
connects to multiple instances of Trino. This enables us to query across all
of these environments from a single Trino instance.&lt;/p&gt;

&lt;p&gt;In recent years, emerging architectures like Data Mesh have nicely complemented
the goals we have been building toward for years. While we have effectively
implemented some aspects of a Data Mesh, there are still core tenets that
cannot be addressed by Trino alone. This is the journey we are on at Comcast,
and we would like to share our experience so far, the challenges we overcame,
and the ones yet to be resolved. Data abstraction, availability, movement, and
governance are the topics we will touch upon in this session.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Sajumon Joseph, Sr Principal Architect&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Pavan Madhineni, Sr. Manager; Product Development Engineering&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;trino-at-quora-speed-cost-reliability-challenges-and-tips&quot;&gt;Trino at Quora: Speed, Cost, Reliability Challenges and Tips&lt;/h3&gt;

&lt;p&gt;Trino has become an essential part of Quora’s tech stack and a major component
of our A/B testing framework that powers our decision-making on the product.
Trino has brought a lot of advantages to us. However, at Quora’s scale, we face
cost, speed, and reliability challenges when operating Trino.&lt;/p&gt;

&lt;p&gt;In this session, we will talk about how we resolve these challenges. Some
approaches are: auto-scaling Trino clusters; experimenting with different
cluster and JVM configurations and instance types; building checkers to detect
slow workers and inefficient queries; and setting up extensive monitoring.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Yifan Pan, Software Engineer of Data Infrastructure Team at Quora; 
Administrator/Primary Owner of Trino infrastructure at Quora&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;how-we-use-trino-to-analyze-our-product-led-growth-plg-user-activation-funnel&quot;&gt;How we use Trino to analyze our Product-led Growth (PLG) user activation funnel&lt;/h3&gt;

&lt;p&gt;Being a PLG company, we must track and analyze every action our users perform
within the product to remove friction and maximize usage and satisfaction. To
understand how effectively and quickly users become educated and then active in
the product, we had to instrument the user journey from signup to the Aha moment
and beyond.&lt;/p&gt;

&lt;p&gt;There are many tools on the market that can be used to analyze user behavior,
but none met our needs. In this session you will learn how we built a data
architecture to collect, model, and enrich user behavior events and optimized
Trino query performance, accelerating our ability to understand and improve
user conversion rates.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Roy Hasson, Head of Product at Upsolver&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;I hope you all are as excited as we are to finally federate the Trino community
face-to-face! This conference is shaping up to be educational, fun, and filled
with Trino experts and aficionados.&lt;/p&gt;

&lt;p&gt;Stay tuned for new developments in upcoming blog posts, don’t forget to
&lt;a href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;register&lt;/a&gt;, and always, &lt;strong&gt;&lt;em&gt;Federate them
all&lt;/em&gt;&lt;/strong&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>Trino has long been the de facto standard for querying large data sets over your cloud or on-prem storage, also known as data lakes. This Trino Summit’s theme will instead showcase Trino’s other claim to fame: query federation. Trino is a query engine providing an access point that exposes ANSI SQL across multiple data sources. I urge you to join us either in-person or virtually if you are a fan of Trino, big data, open source, data engineering, Java, or all the above! This conference is free and takes place in San Francisco, California on November 10th.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/summit-logo.png" />
      
    </entry>
  
    <entry>
      <title>Release of the second edition of Trino: The Definitive Guide</title>
      <link href="https://trino.io/blog/2022/10/03/the-definitive-guide-2.html" rel="alternate" type="text/html" title="Release of the second edition of Trino: The Definitive Guide" />
      <published>2022-10-03T00:00:00+00:00</published>
      <updated>2022-10-03T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/10/03/the-definitive-guide-2</id>
      <content type="html" xml:base="https://trino.io/blog/2022/10/03/the-definitive-guide-2.html">&lt;p&gt;It was time for a refresh. A little while ago in April 2021, we announced
the &lt;a href=&quot;https://trino.io/blog/2021/04/21/the-definitive-guide.html&quot;&gt;Trino version of our definitive guide&lt;/a&gt;. But again, Trino as a project and community
has continued to innovate and grow. Numerous smaller and larger details changed,
and the examples and resources needed to be fixed.&lt;/p&gt;

&lt;p&gt;Today, we are happy to announce that after a few months of updates, testing, and
editing, the second edition of &lt;strong&gt;Trino: The Definitive Guide&lt;/strong&gt; is available.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h2 id=&quot;get-a-free-copy-from-starburst-now&quot;&gt;&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;Get a free copy from Starburst now!&lt;/a&gt;&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;!--more--&gt;

&lt;p&gt;The &lt;a href=&quot;https://www.oreilly.com/library/view/trino-the-definitive/9781098137229/&quot;&gt;new edition of the book from
O’Reilly&lt;/a&gt;
is available in digital formats as well as physical copies. You can find more
information about the book on &lt;a href=&quot;/trino-the-definitive-guide.html&quot;&gt;our permanent page about
it&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The book is now updated to Trino release 392 for all filenames, installation
methods, commands, names, and properties. We also addressed all problems that
our readers found and reported to us.&lt;/p&gt;

&lt;p&gt;We updated to Java 17 usage, added more SQL statements, and added info about
&lt;a href=&quot;https://trino.io/blog/2022/09/20/python-progress.html&quot;&gt;Python tools like dbt&lt;/a&gt; and clients like Metabase. We talk about the lakehouse architecture and new
connectors like Iceberg and Delta Lake.&lt;/p&gt;

&lt;p&gt;So what are you waiting for? Go get a copy, check out the &lt;a href=&quot;https://github.com/trinodb/trino-the-definitive-guide&quot;&gt;updated example code
repository&lt;/a&gt;, &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/README.md&quot;&gt;give us a
star&lt;/a&gt;, provide feedback,
and contact us on &lt;a href=&quot;/slack.html&quot;&gt;Slack&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred, Martin, and Matt&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;And one last tip, join us at &lt;a href=&quot;/blog/2022/09/22/trino-summit-2022-teaser.html&quot;&gt;Trino Summit 2022&lt;/a&gt; in San Francisco in November for a chat
and maybe even a signed hardcopy of the book.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Martin Traverso, Matt Fuller</name>
        </author>
      

      <summary>It was time for a refresh. A little while ago in April 2021, we announced the Trino version of our definitive guide. But again, Trino as a project and community has continued to innovate and grow. Numerous smaller and larger details changed, and the examples and resources needed to be fixed. Today, we are happy to announce that after a few months of updates, testing, and editing, the second edition of Trino: The Definitive Guide is available. Get a free copy from Starburst now!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/ttdg2-cover.png" />
      
    </entry>
  
    <entry>
      <title>Trino Summit 2022 will be legendary</title>
      <link href="https://trino.io/blog/2022/09/22/trino-summit-2022-teaser.html" rel="alternate" type="text/html" title="Trino Summit 2022 will be legendary" />
      <published>2022-09-22T00:00:00+00:00</published>
      <updated>2022-09-22T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/09/22/trino-summit-2022-teaser</id>
      <content type="html" xml:base="https://trino.io/blog/2022/09/22/trino-summit-2022-teaser.html">&lt;p&gt;Commander Bun Bun is back and this year we have an exciting lineup of speakers.
Topics range from architectures like data mesh and data lakehouse, to running
Trino at scale with fault-tolerant execution, and query federation. This 
conference is free and takes place on November 10th. The summit is a hybrid
event for in-person and virtual attendance. Find out more details below!&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;register-for-the-summit&quot;&gt;Register for the summit&lt;/h2&gt;

&lt;p&gt;This year’s Trino Summit will be hosted at the Commonwealth Club in San 
Francisco, CA. In-person registration is limited to 250 seats so make sure you
register quickly before spots run out!&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;
        Register now
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;h3 id=&quot;trino-summit-2022-teaser&quot;&gt;Trino Summit 2022 teaser&lt;/h3&gt;

&lt;p&gt;Get ready to federate them all this year! Many times when folks think of Trino,
their first instinct is to consider the data lake use case where it replaces
Hive or other data lakehouse query engines. However, this summit will also drill
into the lesser discussed query federation use case. Federate ‘em all!&lt;/p&gt;

&lt;iframe src=&quot;https://www.youtube.com/embed/o2MJvRKG14M&quot; width=&quot;800&quot; height=&quot;500&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px;
margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt;
&lt;/iframe&gt;

&lt;h2 id=&quot;announcing-the-first-sessions-and-speakers&quot;&gt;Announcing the first sessions and speakers&lt;/h2&gt;

&lt;p&gt;We have a full roster planned, but here is a glance at a few fully confirmed
sessions. Stay tuned for future blog posts as we announce more sessions as they
are confirmed!&lt;/p&gt;

&lt;h3 id=&quot;state-of-trino-keynote&quot;&gt;State of Trino keynote&lt;/h3&gt;

&lt;p&gt;Hear the latest on the state of the open source Trino project. Trino
is the award-winning MPP SQL query engine. In this session, Trino creators
discuss the latest features that have landed in the last year, the roadmap for
the year ahead, and community growth highlights.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Martin Traverso, Co-Creator of Trino and CTO, Starburst&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Dain Sundstrom, Co-Creator of Trino and CTO, Starburst&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;David Phillips, Co-Creator of Trino and CTO, Starburst&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;trino-for-large-scale-etl-at-lyft&quot;&gt;Trino for large scale ETL at Lyft&lt;/h3&gt;

&lt;p&gt;At Lyft, we process petabytes of data daily through Trino for various use
cases. A single query can run for as long as four hours with terabytes of
memory reserved. There are quite a few challenges in operating Trino ETL at
such a scale: how to make all queries as performant as possible with low
failure rates; how to define clusters, routing groups, and resource groups for
volume that changes across the day; and how to keep our commitment to user SLOs
during unexpected spikes.&lt;/p&gt;

&lt;p&gt;We’ll share what we’ve done with configuration tuning, identification of large
queries and users, autoscaling, and fault-tolerance features to run Trino at
such a scale. We’ll also share our upcoming challenges and our plans to take
Trino adoption further across the company.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Charles Song, Senior Software Engineer at Lyft&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;rewriting-history-migrating-petabytes-of-data-to-apache-iceberg-using-trino&quot;&gt;Rewriting history: Migrating petabytes of data to Apache Iceberg using Trino&lt;/h3&gt;

&lt;p&gt;Dataset interoperability between data platform components continues to
be a difficult hurdle to overcome. This shortcoming often results in siloed
data and frustrated users. Although open table formats like Apache Iceberg aim
to break down these silos by providing a consistent and scalable table
abstraction, migrating your pre-existing data archive to a new format can still
be daunting. This talk will outline challenges we faced when rewriting petabytes
of Shopify’s data into the Iceberg table format using the Trino engine. In this
rapidly evolving landscape, I will highlight recent contributions to Trino’s
Iceberg integration that made our work possible, while also illustrating how we
designed our system to scale. Topics will include: what to consider when designing your
migration strategy, how we optimized Trino’s write performance and how to
recover from corrupt table states. Finally, I will compare the query performance
of old and migrated datasets using Shopify’s datasets as benchmarks.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Marc Laforet, Senior Data Engineer at Shopify&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;federating-them-all-on-starburst-galaxy&quot;&gt;Federating them all on Starburst Galaxy!&lt;/h3&gt;

&lt;p&gt;You’ve federated them all on Trino, but to beat the elite four at
Indigo Plateau, every data trainer needs help. In this talk, I will cover how
Starburst Galaxy is the fastest path to query federation and cover a demo that
trainers can follow later. We’ll also cover cool features like schema discovery
and fault-tolerance execution. The queries we’ll run will be with Pokémon data
so that you don’t have to witness yet another taxi cab or iris data set.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Monica Miller, Developer Advocate at Starburst&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;using-trino-with-apache-airflow-for-almost-all-your-data-problems&quot;&gt;Using Trino with Apache Airflow for (almost) all your data problems&lt;/h3&gt;

&lt;p&gt;Trino is incredibly effective at enabling users to extract insights
quickly and effectively from large amounts of data located in dispersed and
heterogeneous federated data systems. However, some business data problems are
more complex than interactive analytics use cases, and are best broken down into
a sequence of interdependent steps, a.k.a. a workflow. For these use cases,
dedicated software is often required in order to schedule and manage these
processes with a principled approach. In this session, we will look at how we
can leverage Apache Airflow to orchestrate Trino queries into complex workflows
that solve practical batch processing problems, all the while avoiding the use
of repetitive, redundant data movement.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Philippe Gagnon, Solutions Architect at Astronomer&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Stay tuned for new developments in upcoming blog posts, don’t forget to
&lt;a href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;register&lt;/a&gt;, and always, federate them
all!&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen, Dain Sundstrom</name>
        </author>
      

      <summary>Commander Bun Bun is back and this year we have an exciting lineup of speakers. Topics range from architectures like data mesh and data lakehouse, to running Trino at scale with fault-tolerant execution, and query federation. This conference is free and takes place on November 10th. The summit is a hybrid event for in-person and virtual attendance. Find out more details below!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/summit-logo.png" />
      
    </entry>
  
    <entry>
      <title>Trino charms Python</title>
      <link href="https://trino.io/blog/2022/09/20/python-progress.html" rel="alternate" type="text/html" title="Trino charms Python" />
      <published>2022-09-20T00:00:00+00:00</published>
      <updated>2022-09-20T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/09/20/python-progress</id>
      <content type="html" xml:base="https://trino.io/blog/2022/09/20/python-progress.html">&lt;p&gt;Wow, have we ever come a long way with Python support for Trino. It feels like
ages ago that we talked about DB-API, trino-python-client, SQLAlchemy, Apache
Superset, and more in &lt;a href=&quot;https://trino.io/episodes/12.html&quot;&gt;Trino Community Broadcast episode
12&lt;/a&gt;. More recently we talked about dbt in
&lt;a href=&quot;https://trino.io/episodes/21.html&quot;&gt;episode 21&lt;/a&gt; and &lt;a href=&quot;https://trino.io/episodes/30.html&quot;&gt;episode
30&lt;/a&gt;, but there is so much more for Pythonistas,
Pythonians, Python programmers, and simply users of Python-powered tools.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;where-are-we-now&quot;&gt;Where are we now&lt;/h2&gt;

&lt;p&gt;Python usage shows up in nearly every Trino deployment these days, and we have
had some really great developments for you all in recent months:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.starburst.io&quot;&gt;Starburst&lt;/a&gt; has really ramped up the contributions to
the foundation of a lot of Python tools connecting to Trino. The
&lt;a href=&quot;https://github.com/trinodb/trino-python-client&quot;&gt;trino-python-client&lt;/a&gt; receives
improvements regularly and is definitely a first-class client at the same
level as the JDBC driver or the CLI.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.getdbt.com/&quot;&gt;dbt Labs&lt;/a&gt; and Starburst have worked hard on
launching and improving the &lt;a href=&quot;https://github.com/starburstdata/dbt-trino&quot;&gt;dbt-trino
project&lt;/a&gt; and enabling automated
data transformation flows.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://airflow.apache.org/&quot;&gt;Apache Airflow&lt;/a&gt; use cases abound and the
&lt;a href=&quot;/blog/2022/07/13/how-to-use-airflow-to-schedule-trino-jobs.html&quot;&gt;integration is improving&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://superset.apache.org/&quot;&gt;Apache Superset&lt;/a&gt; and
&lt;a href=&quot;https://preset.io/&quot;&gt;Preset&lt;/a&gt; continue to add features and treat Trino as a
major data source and integration, and we should probably have another Trino
Community Broadcast episode to see that all in action.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://airbyte.com/&quot;&gt;Airbyte&lt;/a&gt; was &lt;a href=&quot;/blog/2022/05/17/cinco-de-trino-recap.html&quot;&gt;demoed at Cinco de Trino&lt;/a&gt; and is &lt;a href=&quot;/blog/2022/05/24/an-opinionated-guide-to-consolidating-our-data.html&quot;&gt;widely used by companies such as
Lyft&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
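As a sketch of what the dbt-trino adapter mentioned above enables, a minimal profiles.yml entry might look like the following. The profile name, host, catalog, and schema here are hypothetical placeholders, not details from this post:

```yaml
# profiles.yml - hypothetical minimal dbt-trino profile
# (profile name, host, catalog, and schema are placeholders)
my_trino_profile:
  target: dev
  outputs:
    dev:
      type: trino
      method: none          # no authentication, e.g. for a local test server
      host: trino.example.com
      port: 8080
      user: dbt
      catalog: hive
      schema: analytics
```

With a profile like this in place, dbt compiles your models and runs them as SQL statements against the configured Trino catalog and schema.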

&lt;p&gt;And of course there are well-known usages such as notebooks everywhere, on your
workstation, in your company, and out in the cloud. But is there more? There
must be!&lt;/p&gt;

&lt;h2 id=&quot;what-else-could-we-do&quot;&gt;What else could we do&lt;/h2&gt;

&lt;p&gt;All of these developments are great for our users. I want to encourage you all
to try these tools and learn how amazing they are with Trino. At the same time,
it feels like there has got to be even more. The Python ecosystem is so large,
and there are probably dozens of use cases we have never heard about, have not
considered, or have only dreamed about in our wildest dreams.&lt;/p&gt;

&lt;p&gt;On the other hand, I am sure there are still problems with these tools and
integrations. What is an edge case for us might be a daily task for you. What
we consider hard and complicated might be just what you have to deal with
anyway. And in the spirit of constant improvement, we really want to fix these
things and make it all amazing. But we need your help.&lt;/p&gt;

&lt;h2 id=&quot;let-us-know-what-you-think&quot;&gt;Let us know what you think&lt;/h2&gt;

&lt;p&gt;This is now your opportunity to tell us what you need to make your Trino and
Python experience better.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://forms.gle/4bzMPZxby6E4xKm98&quot; target=&quot;_blank&quot;&gt;
        Help Trino and Python
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Trino, Python, and all the tools in the ecosystem go from strength to strength.
We want to supercharge the tooling to hero levels, and with your help and input
we can do it.&lt;/p&gt;

&lt;p&gt;Join us in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;python-client&lt;/code&gt; channel on &lt;a href=&quot;https://trino.io/community.html&quot;&gt;Trino Slack&lt;/a&gt;,
and don’t forget to &lt;a href=&quot;https://forms.gle/4bzMPZxby6E4xKm98&quot;&gt;answer that survey&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Thanks, and see you at the &lt;a href=&quot;/blog/2022/06/30/trino-summit-call-for-speakers.html&quot;&gt;Trino Summit 2022&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred, Brian, and Dain&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Brian Zhan, Dain Sundstrom</name>
        </author>
      

      <summary>Wow, have we ever come a long way with Python support for Trino. It feels like ages ago that we talked about DB-API, trino-python-client, SQLAlchemy, Apache Superset, and more in Trino Community Broadcast episode 12. More recently we talked about dbt in episode 21 and episode 30, but there is so much more for Pythonistas, Pythonians, Python programmers, and simply users of Python-powered tools.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/images/logos/python.png" />
      
    </entry>
  
    <entry>
      <title>Trino&apos;s tenth birthday celebration recap</title>
      <link href="https://trino.io/blog/2022/09/12/tenth-birthday-celebration-recap.html" rel="alternate" type="text/html" title="Trino&apos;s tenth birthday celebration recap" />
      <published>2022-09-12T00:00:00+00:00</published>
      <updated>2022-09-12T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/09/12/tenth-birthday-celebration-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/09/12/tenth-birthday-celebration-recap.html">&lt;p&gt;What an exciting month we had in August! August marked the ten-year birthday of
the Trino project. Don’t worry if you missed all the excitement, as we’ve
condensed it all in this post.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;blog-posts&quot;&gt;Blog posts&lt;/h2&gt;

&lt;p&gt;We felt it necessary to chronicle the larger events that happened in the last
decade of the project through the lens of where we are today.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2022/08/02/leaving-facebook-meta-best-for-trino.html&quot;&gt;Why leaving Facebook/Meta was the best thing we could do for the Trino Community&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2022/08/04/decade-innovation.html&quot;&gt;A decade of query engine innovation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2022/08/08/trino-tenth-birthday.html&quot;&gt;Happy tenth birthday Trino!&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We shared these posts on Hacker News, and the Facebook/Meta and query innovation
posts both hit the front page. This resulted in one of the largest numbers of
page views on the Trino website in a single day - more than 25k views!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-tenth-birthday/hn-top.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;trino-ten-year-timeline-video&quot;&gt;Trino ten-year timeline video&lt;/h2&gt;

&lt;p&gt;Another way we celebrated was creating an epic ten-year montage video that
chronicles the incredible journey starting with the Presto project’s humble
beginnings, and how it evolved into the success that Trino is today:&lt;/p&gt;

&lt;iframe src=&quot;https://www.youtube.com/embed/hPD95_-bZZw&quot; width=&quot;800&quot; height=&quot;500&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px;
margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt;
&lt;/iframe&gt;

&lt;h2 id=&quot;birthday-celebration-with-the-creators-of-trino&quot;&gt;Birthday celebration with the creators of Trino&lt;/h2&gt;

&lt;p&gt;To cap things off last month, we hosted a meetup with the creators to reflect
on the last ten years, laugh and listen to some stories from the early days,
talk about the exciting features currently launching, and speculate on the next
ten years of Trino. Here are some highlights you missed:&lt;/p&gt;

&lt;h3 id=&quot;adding-dynamic-catalogs&quot;&gt;Adding dynamic catalogs&lt;/h3&gt;

&lt;p&gt;Dain discusses what dynamic catalogs could look like in Trino. Currently, to add
a catalog to Trino, you need to add a new catalog configuration file and then
restart Trino. With dynamic catalogs, you can add and remove catalogs at
runtime with no restart required. There is still no guarantee of exactly when
this feature will arrive, but some of the foundations are currently being
added. &lt;a href=&quot;https://www.youtube.com/clip/UgkxkYmwM6gmw9-GceMUb5IxqIKm0qNXt3fY&quot; target=&quot;_blank&quot;&gt;
&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Dain dives into this a bit
more in this clip&lt;/a&gt;&lt;/p&gt;
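To illustrate the static workflow described above, a catalog is defined by a properties file. Here is a hypothetical minimal example for a PostgreSQL catalog; the file name, connection URL, and credentials are placeholders, not values from this post:

```properties
# etc/catalog/postgresql.properties - hypothetical example catalog
# (host, database name, and credentials are placeholders)
connector.name=postgresql
connection-url=jdbc:postgresql://db.example.com:5432/exampledb
connection-user=trino
connection-password=secret
```

Today a file like this must be present on every node before startup, and adding or removing one requires a restart; the dynamic catalog work aims to make such changes take effect at runtime.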

&lt;h3 id=&quot;vectorization-and-performance&quot;&gt;Vectorization and performance&lt;/h3&gt;

&lt;p&gt;As more marketing around vectorized databases has come up recently, many have
asked whether Trino will follow the trend. This question comes up at an
interesting time, as
&lt;a href=&quot;https://trino.io/episodes/36.html&quot;&gt;Trino now requires Java 17 to run&lt;/a&gt;. Java 17
comes with a lot of capabilities to vectorize, and while we are excited to start
looking into these capabilities, simply updating workloads to use vectorization
doesn’t pack the performance punch that many would expect it to. The answer is
more complex:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Do modern workloads benefit from vectorization? 
&lt;a href=&quot;https://www.youtube.com/clip/UgkxmPAur8thP_D-_GpCcg-sqprEAqwWdyck&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
See Martin’s answer to this&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Is there a benefit to vectorization over Java’s auto-vectorization?
&lt;a href=&quot;https://www.youtube.com/clip/Ugkx1AKbq0jQyZhOH4MKNf3LO4i9kZAmLqpJ&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
Sometimes, but Dain elaborates on when&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;If not vectorization, what type of performance improvements does Trino focus on?
&lt;a href=&quot;https://www.youtube.com/clip/UgkxQwDYDS6evVJelNVjWAgrIhzg_Q-cAEyq&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
Martin and Dain list some simple but impactful ones&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;The debate around query-time optimization versus runtime adaptation.
&lt;a href=&quot;https://www.youtube.com/clip/Ugkxt5ryTBP-EPEEo_OOcW2PKvNiJkj5n8UR&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
Which should you optimize first?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;polymorphic-table-functions&quot;&gt;Polymorphic table functions&lt;/h3&gt;

&lt;p&gt;One feature that is top of mind for everyone in the Trino project is
&lt;a href=&quot;/blog/2022/07/22/polymorphic-table-functions.html&quot;&gt;polymorphic table functions&lt;/a&gt;,
or simply “table functions” as Dain prefers to call them.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;What is a table function?
&lt;a href=&quot;https://www.youtube.com/clip/Ugkx62IKgPd_v9eGBaPUHP2hyaRkWSXh8w8h&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
David and Dain discuss standard and polymorphic table functions&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Could we rewrite the &lt;a href=&quot;https://trino.io/docs/current/connector/googlesheets&quot;&gt;Google Sheets connector&lt;/a&gt;
as a table function?
&lt;a href=&quot;https://www.youtube.com/clip/UgkxKIhplQHgEULQkSrjKs4M5w8oNdQMJaoL&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
David and Dain discuss how this would work&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Why table functions are so incredibly powerful.
&lt;a href=&quot;https://www.youtube.com/clip/UgkxQcokpdgPjiuMKMC5-3HwHvlbmZjxAvxe&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
Eric and Dain talk about why PTFs are a game changer&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about polymorphic table functions, check out the
recent &lt;a href=&quot;https://trino.io/episodes/38.html&quot;&gt;Trino Community Broadcast episode&lt;/a&gt; that
covers the potential of these functions in much more detail.&lt;/p&gt;

&lt;h3 id=&quot;the-early-days-of-presto-and-trino&quot;&gt;The early days of Presto and Trino&lt;/h3&gt;

&lt;p&gt;We wanted to get some insight into what the early days of the project looked
like, and how Martin, Dain, David, and Eric began the daunting task of designing
and building a distributed query engine from scratch. Some of the discussions
were interesting while others were downright hilarious. Here are some steps you
can take to write your own query engine, at least if you want to do it the way
the Trino creators did it:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Look up a bunch of research papers to see how others are doing this 📑.
  &lt;a href=&quot;https://www.youtube.com/clip/UgkxGjPYZRx8rhtAndyho7AZgsM4e9wG9Jt4&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
  Video&lt;/a&gt;
    &lt;ul&gt;
      &lt;li&gt;Side note: Papers tend to be highly aspirational and skip important fundamentals.
&lt;a href=&quot;https://www.youtube.com/clip/Ugkx6Hqe5iglsTgrR9hVo9U3ITi8LSxxMu4U&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
Video&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Address the real challenges of making a query engine.
  &lt;a href=&quot;https://www.youtube.com/clip/Ugkx57PezuXyRWHrxxxoLaKni6jqFZ-StwY-&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
  Video&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Take your initial version and just throw it away 😂🗑🚮.
  &lt;a href=&quot;https://www.youtube.com/clip/UgkxJz7zve36QJZZDdtC3S29vI-Ak1jRifAH&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
  Video&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Expand outside the initial use cases by learning from other companies and
  building community 👥.
  &lt;a href=&quot;https://www.youtube.com/clip/UgkxQrBl0BzOrjvwDcEN4KAAyqehcRUc1tsf&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
  Video&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Cause a &lt;a href=&quot;https://en.wikipedia.org/wiki/Brownout_(software_engineering)&quot;&gt;brownout&lt;/a&gt;
  on the Facebook network 📉.
  &lt;a href=&quot;https://www.youtube.com/clip/Ugkx6SyQTFgwX_kdeH018VGt2pMUbldvuKtC&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
  Video&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Realize the system you replaced was actually faster in some cases, but
  for all the wrong reasons ❌🙅.
  &lt;a href=&quot;https://www.youtube.com/clip/UgkxTqBY2nMAALn-OkglE5DT9dHlBuC18qf8&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
  Video&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After much of the initial work was done, Presto was deployed at Facebook and
open sourced soon after. From there, the velocity of the project
picked up, and once the project became independent of Facebook, the features took
off even more. While everything may seem calculated in hindsight, it took a lot
of hard work to grow the community and adoption around Presto and now Trino.
The creators knew they were making a project that would be utilized outside the
walls of Facebook, but
&lt;a href=&quot;https://www.youtube.com/clip/Ugkxh2J-1bi1rUoBpuld_FAuXYZgz2bvqPPx&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; they could never have
anticipated the sheer scale of adoption Trino would see&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;We hope you enjoyed all the fun we had celebrating these first ten years of the
Trino project. We are thrilled to think of what the following decades will
bring. We’d like to leave you with closing thoughts from Dain:&lt;/p&gt;

&lt;iframe src=&quot;https://www.youtube.com/embed/6TFLKcF24HM?clip=Ugkx5bFnjvRX0USjk8vgRJdqLwZQo7Ffg0xm&amp;amp;clipt=ELfJ2gEY8o7eAQ&quot; width=&quot;800&quot; height=&quot;500&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px;
margin-bottom:5px; max-width: 100%;&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;
&lt;/iframe&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>What an exciting month we had in August! August marked the ten-year birthday of the Trino project. Don’t worry if you missed all the excitement as we’ve condensed it all in this post.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-tenth-birthday/creators.jpeg" />
      
    </entry>
  
    <entry>
      <title>Make your Trino data pipelines production ready with Great Expectations</title>
      <link href="https://trino.io/blog/2022/08/24/data-pipelines-production-ready-great-expectations.html" rel="alternate" type="text/html" title="Make your Trino data pipelines production ready with Great Expectations" />
      <published>2022-08-24T00:00:00+00:00</published>
      <updated>2022-08-24T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/08/24/data-pipelines-production-ready-great-expectations</id>
      <content type="html" xml:base="https://trino.io/blog/2022/08/24/data-pipelines-production-ready-great-expectations.html">&lt;p&gt;An important aspect of a good data pipeline is ensuring data quality. 
You need to verify that the data is what you’re expecting it to be at any given
state. &lt;a href=&quot;https://greatexpectations.io/&quot;&gt;Great Expectations&lt;/a&gt; is an open source
tool created in Python that allows you to write detailed tests called
&lt;a href=&quot;https://docs.greatexpectations.io/docs/terms/expectation/&quot;&gt;expectations&lt;/a&gt;
against your data. Users write these expectations to run validations against the
data as it enters your system. These expectations are expressed as methods in
Python, and stored in JSON and YAML files. One great advantage of expectations
is the human-readable documentation that results from these tests. As you roll
out different versions of the code, you get alerted to any unexpected changes
and have version-specific generated documentation for what changed. Let’s learn
how to write expectations on tables in Trino!&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;the-need-for-data-quality&quot;&gt;The need for data quality&lt;/h2&gt;

&lt;p&gt;Managing data pipelines is not for the faint of heart. Nodes fail, you run
out of memory, bursty traffic causes abnormal behavior, and that’s just the tip
of the iceberg. Lots of Trino community members build sophisticated
data pipelines and data applications using Trino. Building data pipelines in
Trino became more common with the addition of a
&lt;a href=&quot;/blog/2022/05/05/tardigrade-launch.html&quot;&gt;fault-tolerant execution mode&lt;/a&gt; to
safeguard against failures when executing long-running and 
resource-intensive queries.&lt;/p&gt;

&lt;p&gt;Aside from all the infrastructure problems that concern data teams, another
category of problems has been a silent one for quite some time:
data quality. Faulty data comes in, which can either cause data pipelines to
fail, or possibly go unnoticed and cause inaccurate downstream reporting.
Knowledge is scattered among domain experts, technical experts, and the code and
data itself. Maintenance becomes time-consuming and expensive. Documentation
gets out of date and unreliable. This is why data quality checks with
libraries like Great Expectations are so important when writing ETL applications.&lt;/p&gt;
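&lt;p&gt;To make the idea concrete, here is a minimal sketch of expectation-style
checks written in plain Python with only the standard library. It illustrates
the concept of expectations producing structured pass/fail results; it does not
use Great Expectations’ actual API, and the column names and sample rows are
hypothetical.&lt;/p&gt;

```python
# Conceptual sketch of expectation-style data checks (standard library only).
# This mimics the *idea* of Great Expectations, not its real API; the
# "combat_power" column and sample rows are hypothetical.
import csv
import io

def expect_column_values_to_not_be_null(rows, column):
    """Return a structured result, in the spirit of an expectation."""
    failures = [i for i, row in enumerate(rows) if not row.get(column)]
    return {"success": not failures, "unexpected_rows": failures}

def expect_column_values_to_be_between(rows, column, low, high):
    failures = [i for i, row in enumerate(rows)
                if not low <= float(row[column]) <= high]
    return {"success": not failures, "unexpected_rows": failures}

raw = io.StringIO("name,combat_power\nPikachu,320\nSnorlax,\n")
rows = list(csv.DictReader(raw))

null_check = expect_column_values_to_not_be_null(rows, "combat_power")
# The empty combat_power value for Snorlax fails the expectation,
# so null_check is {"success": False, "unexpected_rows": [1]}.
```

&lt;p&gt;In Great Expectations itself, results like these are additionally rendered
into the human-readable documentation mentioned above.&lt;/p&gt;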

&lt;h2 id=&quot;improve-data-quality-in-trino-with-great-expectations&quot;&gt;Improve data quality in Trino with Great Expectations&lt;/h2&gt;

&lt;p&gt;As data quality moves to the forefront of the Trino community, the Great
Expectations and Trino communities have partnered to do some events together:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=pcqAOq3O3Ts&amp;amp;list=PLFnr63che7wZij92ynF_egatbsrH7by7T&amp;amp;index=3&quot;&gt;Trino meetup to discuss Great Expectations&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=4SieRmibb0U&quot;&gt;Great Expectations meetup to discuss Trino&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://superconductive.ai/&quot;&gt;Superconductive&lt;/a&gt; joined this year’s mini Trino 
Summit event 
&lt;a href=&quot;https://www.youtube.com/watch?v=kfJ63DNbAuI&amp;amp;list=PLFnr63che7wYDHjUsmp43THLmAlqPDHlM&quot;&gt;Cinco de Trino&lt;/a&gt;
to showcase using 
&lt;a href=&quot;https://www.youtube.com/watch?v=9HE6LawCHP8&amp;amp;list=PLFnr63che7wYDHjUsmp43THLmAlqPDHlM&amp;amp;index=7&quot;&gt;managed solutions for Great Expectations and Trino&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Today, we’re walking through a demo that showcases a scenario with Trino running
as the data lake query engine, with multiple phases of data transformations on
some Pokemon data sets. At each phase, we need to validate that the data matches
the expected schema, row counts, and various other criteria. We use Trino
with a Hive table over CSV files for ingest, and then move to Iceberg tables for
the structure and consume phases. This showcases one of Trino’s great strengths:
you can operate across any of the popular table formats.&lt;/p&gt;

&lt;h2 id=&quot;trino-and-great-expectations-demo&quot;&gt;Trino and Great Expectations demo&lt;/h2&gt;

&lt;p&gt;In this scenario, we’re going to ingest Pokemon pokedex data and Pokemon Go
spawn location data, which lands as raw CSV files in our data lake. We then use
Trino’s Hive catalog to read the data from the landing files, clean it up, and
optimize that raw data into more performant ORC files in the structure tables.&lt;/p&gt;
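&lt;p&gt;In the demo, this ingest-to-structure cleanup happens with Trino SQL. As a
rough, hypothetical illustration of the kind of transformation involved (the
column names and sample rows are made up), the same idea looks like this in
plain Python:&lt;/p&gt;

```python
# Rough illustration of the ingest-to-structure cleanup step. In the demo
# this is done with Trino SQL over Hive and Iceberg tables; here we mimic
# the same idea in plain Python with hypothetical columns.
import csv
import io

raw_landing = io.StringIO(
    "pokemon_id,name,latitude,longitude\n"
    "25,Pikachu,37.77,-122.41\n"
    ",MissingId,0,0\n"  # malformed row: no id
    "143,Snorlax,40.71,-74.00\n"
)

structured = []
for row in csv.DictReader(raw_landing):
    if not row["pokemon_id"]:  # drop rows that fail basic checks
        continue
    structured.append({
        "pokemon_id": int(row["pokemon_id"]),
        "name": row["name"],
        "latitude": float(row["latitude"]),
        "longitude": float(row["longitude"]),
    })
# `structured` now holds typed rows ready for the structure table.
```

&lt;p&gt;In Trino, this step is typically expressed as a
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE TABLE ... AS SELECT&lt;/code&gt;
from the CSV-backed landing table into an ORC-backed structure table.&lt;/p&gt;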

&lt;p&gt;&lt;img src=&quot;/assets/blog/data-pipelines-production-ready-great-expectations/trino-ge-lakehouse.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The last step is to join and transform the spawn data and pokedex data into a
single table that is cleaned and ready to be utilized by a data analyst, data
scientist, or other data consumer. Every point in the pipeline where the data is
transformed introduces a liability: the data can go from good to bad when
infrastructure fails, or when newer versions of the pipeline roll out.
This is where adding Great Expectations is crucial.&lt;/p&gt;

&lt;p&gt;Now that you have a better understanding of the scenario, feel free to watch the
video, and try running it yourself!&lt;/p&gt;

&lt;iframe src=&quot;https://www.youtube.com/embed/h6UYOilESfQ&quot; width=&quot;800&quot; height=&quot;500&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; 
margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; 
&lt;/iframe&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md waves-effect waves-light&quot; href=&quot;https://github.com/bitsondatadev/trino-datalake/blob/main/tutorials/expecting-greatness-from-trino.md&quot;&gt;Try this Trino demo yourself »&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;While data quality has always been a requirement, the standards for it increase
as the complexity of data lakes increases. It is a necessity that improves the
trust that data consumers have in the data. Dive into the 
&lt;a href=&quot;https://docs.greatexpectations.io/docs/guides/connecting_to_your_data/database/trino/&quot;&gt;Great Expectations documentation&lt;/a&gt;
to learn more about the existing Trino support. If you run into any issues while
running the demo, reach out on &lt;a href=&quot;/slack.html&quot;&gt;Slack&lt;/a&gt; and let us 
know!&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen, Brian Zhan</name>
        </author>
      

      <summary>An important aspect of a good data pipeline is ensuring data quality. You need to verify that the data is what you’re expecting it to be at any given state. Great Expectations is an open source tool created in Python that allows you to write detailed tests called expectations against your data. Users write these expectations to run validations against the data as it enters your system. These expectations are expressed as methods in Python, and stored in JSON and YAML files. One great advantage of expectations is the human readable documentation that results from these tests. As you roll out different versions of the code, you get alerted to any unexpected changes and have version-specific generated documentation for what changed. Let’s learn how to write expectations on tables in Trino!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/data-pipelines-production-ready-great-expectations/trino-ge.png" />
      
    </entry>
  
    <entry>
      <title>Happy tenth birthday Trino!</title>
      <link href="https://trino.io/blog/2022/08/08/trino-tenth-birthday.html" rel="alternate" type="text/html" title="Happy tenth birthday Trino!" />
      <published>2022-08-08T00:00:00+00:00</published>
      <updated>2022-08-08T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/08/08/trino-tenth-birthday</id>
      <content type="html" xml:base="https://trino.io/blog/2022/08/08/trino-tenth-birthday.html">&lt;p&gt;It’s inspiring and mindblowing to reflect on the ten year journey that has
produced the community around Trino. Trino is the community-driven fork from
Presto, the distributed big data SQL query engine created at Facebook in 2012. We
are a community of engineers, scientists, analysts, and visionaries that work in
a fast paced world where the expectations on the time to insights from our
analytics and the scale of the data are ever-increasing. Sometimes words only do
so much justice to encompass a journey like this one, so we created a video to
let you experience it yourself! Enjoy!&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;trinos-first-ten-years-video&quot;&gt;Trino’s first ten years video&lt;/h1&gt;

&lt;iframe src=&quot;https://www.youtube.com/embed/hPD95_-bZZw&quot; width=&quot;800&quot; height=&quot;500&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; 
margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; 
&lt;/iframe&gt;

&lt;p&gt;As you watch the video and think back to the five years Presto and Trino shared,
you begin to appreciate the organic development of the community, and the
excitement around the solution space that the project brought to big data. As a
baseline, Trino offers a faster and more interactive alternative for accessing
data stored in HDFS via Hive. But the project didn’t stop there. Development of
the SPI abstracted metadata and storage access to different
systems, making Trino a suitable engine to query an entire data ecosystem from
one location using ANSI SQL! Since the two projects split, Trino development has
skyrocketed beyond the original project, adding an array of features that
we’ve listed out in the &lt;a href=&quot;/blog/2022/08/04/decade-innovation.html&quot;&gt;evolution of the Trino architecture blog post&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-tenth-birthday/trajectory.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;To really celebrate this milestone, we wanted to offer some exciting ways for
you to learn more about Trino, and spin up Trino on your own system to play
around with it. We have a list of blogs, project stats, and ways to get involved
below. Starburst is also celebrating by offering free Trino birthday t-shirts
when you 
&lt;a href=&quot;https://www.starburst.io/sweepstakes/?utm_campaign=space-quest&quot;&gt;complete their Space Quest League mission&lt;/a&gt;.
Also don’t forget to attend 
&lt;a href=&quot;/blog/2022/06/30/trino-summit-call-for-speakers.html&quot;&gt;our annual Trino Summit in November&lt;/a&gt;!&lt;/p&gt;

&lt;h1 id=&quot;learn-more-about-trino&quot;&gt;Learn more about Trino&lt;/h1&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/p/a5a1088d3114&quot;&gt;Intro to Trino for the Trinewbie&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2022/08/02/leaving-facebook-meta-best-for-trino.html&quot;&gt;Why leaving Facebook/Meta was the best thing we could do for the Trino Community&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2022/08/04/decade-innovation.html&quot;&gt;A decade of query engine innovation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/12/27/announcing-trino.html&quot;&gt;We’re rebranding PrestoSQL to Trino&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/01/01/2019-summary.html&quot;&gt;Summary of features in 2019&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/01/08/2020-review.html&quot;&gt;Summary of features in 2020&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/12/31/trino-2021-a-year-of-growth.html&quot;&gt;Summary of features in 2021&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;getting-started-with-trino&quot;&gt;Getting started with Trino&lt;/h1&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/10/20/intro-to-hive-connector.html&quot;&gt;A gentle introduction to the Hive connector&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2021/05/03/a-gentle-introduction-to-iceberg.html&quot;&gt;Trino on ice I: A gentle introduction to Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2019/07/04/cbo-introduction.html&quot;&gt;Introduction to the Trino cost-based optimizer&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/bitsondatadev/trino-getting-started&quot;&gt;Trino getting started repository&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;community-statistics&quot;&gt;Community statistics&lt;/h1&gt;

&lt;ul&gt;
  &lt;li&gt;28250+ commits 💻 in GitHub&lt;/li&gt;
  &lt;li&gt;5750+ stargazers ⭐ in GitHub&lt;/li&gt;
  &lt;li&gt;7350+ members 👋 in Slack&lt;/li&gt;
  &lt;li&gt;6950+ pull requests merged ✅ in GitHub&lt;/li&gt;
  &lt;li&gt;4000+ issues 📝 created in GitHub&lt;/li&gt;
  &lt;li&gt;3750+ followers 🐦 on Twitter&lt;/li&gt;
  &lt;li&gt;650+ average weekly members 💬 in Slack&lt;/li&gt;
  &lt;li&gt;1050+ subscribers 📺 in YouTube&lt;/li&gt;
  &lt;li&gt;38 Trino Community Broadcast ▶️ episodes&lt;/li&gt;
  &lt;li&gt;264 Presto + Trino 🚀 releases (not including PrestoDB releases since the 
fork)&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;join-our-community&quot;&gt;Join our community&lt;/h1&gt;

&lt;ul&gt;
  &lt;li&gt;Join the &lt;a href=&quot;/slack.html&quot;&gt;Trino Slack workspace&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Watch the &lt;a href=&quot;/broadcast/&quot;&gt;Trino Community Broadcast&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Subscribe to the &lt;a href=&quot;https://www.youtube.com/c/trinodb&quot;&gt;Trino YouTube channel&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Follow us on the &lt;a href=&quot;https://twitter.com/trinodb&quot;&gt;trinodb Twitter account&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Give us a star on the &lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;Trino GitHub repository&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Follow us on the &lt;a href=&quot;https://www.linkedin.com/company/trino-software-foundation&quot;&gt;Trino LinkedIn account&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;trino-summit-2022&quot;&gt;Trino Summit 2022&lt;/h1&gt;

&lt;p&gt;We hope you all join us in celebrating Trino’s birthday today. If you want to 
learn even more, 
&lt;a href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;sign up for our hybrid event, Trino Summit, on the 10th of November 2022&lt;/a&gt;.
If you have a talk you’d like to give around Trino, the 
&lt;a href=&quot;https://www.starburst.io/info/trinosummit/#sponsors&quot;&gt;call for speakers&lt;/a&gt; is open
until September 15th.&lt;/p&gt;

&lt;p&gt;Join our community. We look forward to having you!&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen, Martin Traverso, Dain Sundstrom, David Phillips, Eric Hwang</name>
        </author>
      

      <summary>It’s inspiring and mindblowing to reflect on the ten year journey that has produced the community around Trino. Trino is the community-driven fork from Presto, the distributed big data SQL query engine created at Facebook in 2012. We are a community of engineers, scientists, analysts, and visionaries that work in a fast paced world where the expectations on the time to insights from our analytics and the scale of the data are ever-increasing. Sometimes words only do so much justice to encompass a journey like this one, so we created a video to let you experience it yourself! Enjoy!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-tenth-birthday/how-it-started-going.png" />
      
    </entry>
  
    <entry>
      <title>A decade of query engine innovation</title>
      <link href="https://trino.io/blog/2022/08/04/decade-innovation.html" rel="alternate" type="text/html" title="A decade of query engine innovation" />
      <published>2022-08-04T00:00:00+00:00</published>
      <updated>2022-08-04T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/08/04/decade-innovation</id>
      <content type="html" xml:base="https://trino.io/blog/2022/08/04/decade-innovation.html">&lt;p&gt;It’s amazing how far we have come! Our massively-parallel processing SQL query
engine, Trino, has really grown up. We have moved beyond just querying object
stores using Hive, beyond just one company using the project, beyond usage in
Silicon Valley, beyond simple SQL &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; statements, and definitely also
beyond our expectations. Let’s have a look at some of the great technical and
architectural changes the project underwent, and how we all benefit from the
&lt;a href=&quot;/blog/2022/08/02/leaving-facebook-meta-best-for-trino.html&quot;&gt;commitment to quality, openness and collaboration&lt;/a&gt;.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;runtime-and-deployment&quot;&gt;Runtime and deployment&lt;/h2&gt;

&lt;p&gt;Starting with how you even install and run Trino, numerous changes came about
in the last decade. We moved from Java 7 to Java 8, then to Java 11, and &lt;a href=&quot;/blog/2022/07/14/trino-updates-to-java-17.html&quot;&gt;only
recently to the latest supported Java LTS release - Java 17&lt;/a&gt;. Each time we
benefited from the innovations in runtime performance as well as the
improved Java language features. With &lt;strong&gt;Java 17&lt;/strong&gt;, we are just getting started
on the next round of these improvements.&lt;/p&gt;

&lt;p&gt;When it comes to actually &lt;a href=&quot;https://trino.io/episodes/35.html&quot;&gt;running and deploying
Trino&lt;/a&gt;, the &lt;strong&gt;tarball&lt;/strong&gt; is still a good choice
for simple installation and as a base for other packages. Over time we added
&lt;strong&gt;RPM&lt;/strong&gt; archive support, which is being replaced more and more by Docker
&lt;strong&gt;containers&lt;/strong&gt;. The container images also enable modern deployment on Kubernetes
with &lt;a href=&quot;https://github.com/trinodb/charts&quot;&gt;our Helm chart&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;And let us add one last note about deployments. Trino was always designed to
work on large servers. However, the actual growth over a decade in the real world
has been amazing to see. Machine sizes keep growing to hundreds of CPU cores and
closer to a terabyte of memory, and these truly large machines are now running
as clusters with many workers of that size. And more and more of these
deployments take advantage of our added support for the &lt;strong&gt;ARM processor
architecture&lt;/strong&gt; and the increasing availability of suitable servers from the
cloud providers.&lt;/p&gt;

&lt;h2 id=&quot;security&quot;&gt;Security&lt;/h2&gt;

&lt;p&gt;What is security, authentication, authorization? In the first releases of
Trino, none of this existed. Two years after launch we added the first
simple authentication and authorization support. Today, the days when Kerberos
was critical and you needed to use the Java KeyStore in most deployments are
long gone. The wide adoption of Trino led to improvements such as support for
&lt;a href=&quot;https://trino.io/docs/current/security/internal-communication.html&quot;&gt;automatic certificate creation and TLS for internal
communication&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/security/secrets.html&quot;&gt;secret injection from environment
variables&lt;/a&gt;, and the many
&lt;a href=&quot;https://trino.io/docs/current/security/authentication-types.html&quot;&gt;authentication
types&lt;/a&gt;
starting with LDAP and password file, to the modern OAuth2.0 and SSO systems.
Trino supports fine-grained access control and &lt;a href=&quot;https://trino.io/docs/current/language/sql-support.html#security-operations&quot;&gt;security management SQL commands
like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GRANT&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;REVOKE&lt;/code&gt;&lt;/a&gt;.
You can secure connections from client tools, and use numerous methods to ensure
secured access to your data sources.&lt;/p&gt;
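&lt;p&gt;As a brief illustration, a coordinator’s
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;config.properties&lt;/code&gt;
enabling password authentication and secured internal communication might
contain properties like the following. Treat the exact names and values as a
sketch, and verify them against the security documentation for your Trino
version:&lt;/p&gt;

```properties
# Hypothetical config.properties fragment; verify property names against
# the security documentation for your Trino version.
http-server.authentication.type=PASSWORD
http-server.https.enabled=true
http-server.https.port=8443
# A shared secret enables automatic TLS for internal communication.
internal-communication.shared-secret=<long random secret>
internal-communication.https.required=true
```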

&lt;h2 id=&quot;client-tools-and-integrations&quot;&gt;Client tools and integrations&lt;/h2&gt;

&lt;p&gt;In the very beginning, all you could do was submit a query to the &lt;a href=&quot;https://trino.io/docs/current/develop/client-protocol.html&quot;&gt;client REST
API&lt;/a&gt;. Very quickly
we added the &lt;a href=&quot;https://trino.io/docs/current/installation/cli.html&quot;&gt;Trino CLI&lt;/a&gt;
and the &lt;a href=&quot;https://trino.io/docs/current/installation/jdbc.html&quot;&gt;JDBC driver&lt;/a&gt;. And
while it has continued to be widely used in the community, and gathered great
features such as command-completion and history, different output formats, and
much more, the Trino CLI is not the only tool anymore. The JDBC driver, the
&lt;a href=&quot;https://github.com/trinodb/trino-python-client&quot;&gt;Python client&lt;/a&gt;, the &lt;a href=&quot;https://github.com/trinodb/trino-go-client&quot;&gt;Go
client&lt;/a&gt;, and the ODBC driver from
&lt;a href=&quot;https://starburst.io/&quot;&gt;Starburst&lt;/a&gt;, all expanded the support for different
client tools. You can query Trino in your Java-based IDE, such as IntelliJ
IDEA, or database tool, such as &lt;a href=&quot;https://dbeaver.io/&quot;&gt;DBeaver&lt;/a&gt; or
&lt;a href=&quot;https://www.metabase.com/&quot;&gt;Metabase&lt;/a&gt;. You can take advantage of visualizations
in &lt;a href=&quot;https://superset.apache.org/&quot;&gt;Apache Superset&lt;/a&gt;, or automate with &lt;a href=&quot;https://airflow.apache.org/&quot;&gt;Apache
Airflow&lt;/a&gt;, &lt;a href=&quot;https://www.getdbt.com/&quot;&gt;dbt&lt;/a&gt;, or
&lt;a href=&quot;https://flink.apache.org/&quot;&gt;Apache Flink&lt;/a&gt;. And many commercial tools such as
&lt;a href=&quot;https://www.tableau.com/&quot;&gt;Tableau&lt;/a&gt;, &lt;a href=&quot;https://www.looker.com/&quot;&gt;Looker&lt;/a&gt;,
&lt;a href=&quot;https://powerbi.microsoft.com/&quot;&gt;PowerBI&lt;/a&gt;, or
&lt;a href=&quot;https://www.thoughtspot.com/&quot;&gt;ThoughtSpot&lt;/a&gt; also proudly support Trino users.&lt;/p&gt;

&lt;h2 id=&quot;sql&quot;&gt;SQL&lt;/h2&gt;

&lt;p&gt;All the client tools and integrations rely on the rich SQL support of Trino,
which has grown tremendously. Purely analytics-related support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; and
all its complexities was not enough. Trino gained support for data management to
create schema and tables, but also views and materialized views. And with that
&lt;a href=&quot;https://trino.io/docs/current/language/sql-support.html#write-operations&quot;&gt;write support we needed &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt;, and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt;&lt;/a&gt;.
That’s all done and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; is next. But the core language features were not
able to satisfy the needs of our users. We added functions for a large variety
of topics ranging from simple string and &lt;a href=&quot;https://trino.io/docs/current/functions/datetime.html&quot;&gt;date
functions&lt;/a&gt; to &lt;a href=&quot;https://trino.io/docs/current/functions/json.html&quot;&gt;JSON
support&lt;/a&gt;, &lt;a href=&quot;https://trino.io/docs/current/functions/geospatial.html&quot;&gt;geospatial
functions&lt;/a&gt;, and many
others.&lt;/p&gt;

&lt;p&gt;From the core language perspective we added newer SQL functionality, such as
&lt;a href=&quot;/blog/2021/05/19/row_pattern_matching.html&quot;&gt;window functions and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; support&lt;/a&gt;. Currently we are on a journey to implement
&lt;a href=&quot;/blog/2022/07/22/polymorphic-table-functions.html&quot;&gt;support for table functions, including polymorphic table functions&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;connectors-and-data-sources&quot;&gt;Connectors and data sources&lt;/h2&gt;

&lt;p&gt;When it comes to the new SQL language features, there are two categories. There
are generic functions and statements that build on top of commonly used
functionality like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt;. These typically work with any connector and therefore
any data sources. And then there are SQL language features that need support in
a connector. After all, inserting data into PostgreSQL and into an object
storage system are very different operations. Our community has been hard at
work, however, and numerous connectors have gone way beyond simple read-only access.&lt;/p&gt;

&lt;p&gt;Looking at the number of available connectors, innovation has been tremendous.
The original Hive connector, with support for HDFS and a Hive Metastore Service,
became a powerhouse of features. Support for object storage systems, including
Amazon S3 and compatible systems, Azure Data Lake Storage, and Google Cloud
Storage, was supplemented by support for AWS Glue as a metastore. We also
constantly added support for different file formats in these systems, and
improved performance for ORC, Parquet, Avro, and others.&lt;/p&gt;

&lt;p&gt;The initial idea to support other data sources led to connectors for over a
dozen other databases, including relational systems such as
&lt;a href=&quot;https://www.postgresql.org/&quot;&gt;PostgreSQL&lt;/a&gt;,
&lt;a href=&quot;https://www.oracle.com/database/&quot;&gt;Oracle&lt;/a&gt;, &lt;a href=&quot;https://www.microsoft.com/en-us/sql-server&quot;&gt;SQL
Server&lt;/a&gt;, and many others. We also
gained support for &lt;a href=&quot;https://www.elastic.co/elasticsearch/&quot;&gt;Elasticsearch&lt;/a&gt; and
&lt;a href=&quot;https://www.opensearch.org/&quot;&gt;OpenSearch&lt;/a&gt;, &lt;a href=&quot;https://www.mongodb.com/&quot;&gt;MongoDB&lt;/a&gt;,
&lt;a href=&quot;https://kafka.apache.org/&quot;&gt;Apache Kafka&lt;/a&gt;, and other systems that traditionally
are not available to query with SQL. Trino unlocks completely new use cases for
these systems.&lt;/p&gt;

&lt;p&gt;The wide range of supported systems includes traditional data lakes and data
warehouses. With the emerging new table formats and the related Trino
connectors, our project is a powerful tool to run your lakehouse system. &lt;a href=&quot;https://delta.io/&quot;&gt;Delta
Lake&lt;/a&gt; and &lt;a href=&quot;https://iceberg.apache.org/&quot;&gt;Apache Iceberg&lt;/a&gt;
connectors are already capable of full read and write operations and include
numerous other features. An &lt;a href=&quot;https://hudi.apache.org/&quot;&gt;Apache Hudi&lt;/a&gt; connector is
in the works and coming soon.&lt;/p&gt;

&lt;p&gt;We also have robust and widely used connectors for real-time analytics systems
like &lt;a href=&quot;https://pinot.apache.org/&quot;&gt;Apache Pinot&lt;/a&gt;, &lt;a href=&quot;https://druid.apache.org/&quot;&gt;Apache
Druid&lt;/a&gt;, and &lt;a href=&quot;https://clickhouse.com/&quot;&gt;ClickHouse&lt;/a&gt;,
which the community constantly improves.&lt;/p&gt;

&lt;h2 id=&quot;query-processing-and-performance&quot;&gt;Query processing and performance&lt;/h2&gt;

&lt;p&gt;Last but not least, these queries also need to be processed. From the start,
high efficiency and low latency were core design goals, and with features like
native compilation the resulting performance surpassed other systems. Over the
years our query analyzer and planner were supplemented by more and more
sophisticated algorithms and features. Connectors learned to retrieve and manage
table statistics, the optimizer was created and morphed into a &lt;a href=&quot;/blog/2019/07/04/cbo-introduction.html&quot;&gt;cost-based
optimizer&lt;/a&gt;, and we added further
improvements that benefit query processing performance. We added dynamic
filtering, &lt;a href=&quot;/blog/2020/06/14/dynamic-partition-pruning.html&quot;&gt;dynamic partition pruning&lt;/a&gt;, predicate pushdown, join pushdown,
aggregate function pushdown, and numerous others. Each of these improvements was
finely tuned, and runs in production with huge workloads, providing us with more
data on what to improve next.&lt;/p&gt;
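
&lt;p&gt;You can observe many of these optimizations yourself with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXPLAIN&lt;/code&gt;. As a
sketch, assuming a catalog named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;postgresql&lt;/code&gt; with a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tpch&lt;/code&gt; schema, the
plan reveals whether an aggregation is pushed down into the underlying database
instead of being computed by Trino:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- with aggregate pushdown, the plan contains no Trino-side
-- aggregation operator; the grouping runs in PostgreSQL
EXPLAIN
SELECT regionkey, count(*)
FROM postgresql.tpch.nation
GROUP BY regionkey;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;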

&lt;p&gt;One major recent addition is the &lt;a href=&quot;/blog/2022/05/05/tardigrade-launch.html&quot;&gt;fault-tolerant query
execution mode&lt;/a&gt;. With this feature enabled, query execution
can survive cluster node failures: parts of the execution are retried, and query
processing proceeds. Trino is moving on from being the best analytics engine to
being the best query engine for many more use cases!&lt;/p&gt;

&lt;h2 id=&quot;looking-forward&quot;&gt;Looking forward&lt;/h2&gt;

&lt;p&gt;As you can see, there is a lot to look back on and celebrate. But while we are
definitely proud of our successes working with the community, we see no time to rest.
There are many more improvements in the works. Just to tease you a bit, let
us mention that there will be more polymorphic table functions, new
lakehouse connectors and features, more client tools, and maybe even dynamic
configuration of the cluster.&lt;/p&gt;

&lt;p&gt;What would you like to add? Join us to celebrate and innovate towards your
favorite features. And who knows, we might see you at the &lt;a href=&quot;/blog/2022/06/30/trino-summit-call-for-speakers.html&quot;&gt;Trino Summit&lt;/a&gt; in November, or in a
future episode of the &lt;a href=&quot;/broadcast/index.html&quot;&gt;Trino Community Broadcast&lt;/a&gt;.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Martin Traverso, Dain Sundstrom, David Phillips</name>
        </author>
      

      <summary>It’s amazing how far we have come! Our massively-parallel processing SQL query engine, Trino, has really grown up. We have moved beyond just querying object stores using Hive, beyond just one company using the project, beyond usage in Silicon Valley, beyond simple SQL SELECT statements, and definitely also beyond our expectations. Let’s have a look at some of the great technical and architectural changes the project underwent, and how we all benefit from the commitment to quality, openness and collaboration.</summary>

      
      
    </entry>
  
    <entry>
      <title>Why leaving Facebook/Meta was the best thing we could do for the Trino Community</title>
      <link href="https://trino.io/blog/2022/08/02/leaving-facebook-meta-best-for-trino.html" rel="alternate" type="text/html" title="Why leaving Facebook/Meta was the best thing we could do for the Trino Community" />
      <published>2022-08-02T00:00:00+00:00</published>
      <updated>2022-08-02T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/08/02/leaving-facebook-meta-best-for-trino</id>
      <content type="html" xml:base="https://trino.io/blog/2022/08/02/leaving-facebook-meta-best-for-trino.html">&lt;p&gt;It might surprise some that our departure from Facebook was one of the simplest 
decisions we’ve ever made. Many posts that discuss leaving a FAANG company focus
on leaving some grand sum of money or prestige of working at the company. For 
us, we were leaving the company where we had launched a project that we knew 
would quickly outgrow the walls of Facebook, and solve a much larger set of 
problems in the analytics domain. At the time we didn’t quite anticipate that 
Presto, a distributed SQL query engine for big data analytics, would be adopted 
around the globe by thousands of companies and an overwhelming number of 
industries. We appreciate Facebook for serving as the launchpad that inspired 
others to adopt Presto. Despite the harmonious beginnings, once the needs of the
community and Facebook no longer aligned, we had to leave, but we’ll get to that
part shortly.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/leaving-facebook-meta-best-for-trino/original-gang.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;people-make-up-communities-not-companies&quot;&gt;People make up communities, not companies&lt;/h2&gt;

&lt;p&gt;When we created Presto, it was clear to us that it needed to be open source.
Presto started in 2012, just before the Facebook IPO. The culture was very
conducive to starting an open source project. At that time, Facebook was working
on Open Compute, which ended up disrupting the hardware industry, and we wanted
to achieve a similar impact for the analytics industry with Presto. We lobbied for and
gained approval from the VP of Infrastructure, Jay Parikh, and released 
&lt;a href=&quot;https://web.archive.org/web/20220203224702/https://www.computerworld.com/article/2485668/facebook-goes-open-source-with-query-engine-for-big-data.html&quot;&gt;Presto as an open source project&lt;/a&gt;. It’s something that we wanted to
do from the beginning, because we had worked with open source projects and 
believed that the most successful projects are open source.&lt;/p&gt;

&lt;p&gt;Getting other people and companies involved makes for a healthier project. You
end up not just building something that satisfies your needs, but needs from
everyone else, and in turn, you benefit. We reached out personally to
people from companies like Airbnb, Dropbox, Netflix, and LinkedIn to get them
involved because we wanted to bootstrap a real community. Five people at
Facebook hacking away was not enough. We actually had these companies beta test
Presto, so that when we launched, the problems that they had found were fixed.&lt;/p&gt;

&lt;p&gt;To really grasp our philosophy behind open source, it’s important to
understand why that’s beneficial. In reality, when we say we’re getting more
companies involved, that’s true, but more importantly, we’re getting people
involved. Individuals in the tech space are interested in solving technology
problems. Companies are interested in solving problems that benefit their board,
investors, and their customers. It’s incredibly common to see an overlap in the
problems that engineers, analysts, and scientists are interested in solving with
the problems that companies need to solve, but it’s never guaranteed.&lt;/p&gt;

&lt;p&gt;Moreover, the interest of a company is very susceptible to change from company
growth, IPOs, acquisitions, directional pivots, and general political and
cultural changes. As people start to put their time and energy into a project,
their own identity starts to blend with the success of the project. This is much
less the case with corporations. Since corporations include many people, it
only takes a small set of people in the right position to decide that a project
is no longer aligned with the direction or goals of a company.&lt;/p&gt;

&lt;p&gt;Those of us in the Trino Software Foundation believe that 
&lt;a href=&quot;https://venturebeat.com/2021/08/27/who-owns-open-source-projects-people-or-companies/&quot;&gt;individuals that work on Trino actually make up the community&lt;/a&gt; and not the companies who so graciously allow their employees to
contribute. We view our community as visionaries that want to solve problems and
build systems that last for decades into the future. We don’t allow near-sighted
decisions that may affect the quality of the system, or that may diminish the
value of the application to the greater problem space. Most people do not want
to work on something for years, and then have the company change direction and
throw away all their work.&lt;/p&gt;

&lt;p&gt;To be clear, we’re not saying it’s a bad thing when a company moves in another
direction. That is the nature of business and having corporate involvement can
also be a healthy component of open source. To us, however, the core of what
makes a project long-lasting and beneficial for everyone using the product are
the people who are there building the system and interested in the problem
space. So what happened at Facebook that caused us to leave?&lt;/p&gt;

&lt;h2 id=&quot;why-we-left-facebook&quot;&gt;Why we left Facebook&lt;/h2&gt;

&lt;p&gt;As Presto became central to the infrastructure of prominent projects in Facebook,
it attracted the attention of engineers and managers at Facebook who wanted to 
work on this project. This is a strong sign of success, but some of these folks
did not have the same commitment to the open-source community. This was the
source of much of the conflict as engaging in open-source takes a lot of time
and effort, and we had a strict policy of “no one is special”. This means that
everyone’s code is reviewed, and even if you work for Facebook, you still
have to earn commit rights. Engineers at Facebook are strongly motivated to
create “memorable” works to advance in the company, and from that perspective
this extra work was just slowing things down. Feedback from these engineers ultimately
culminated in the managers making the decision to give automatic contributor
rights to any Facebook engineer working on Presto, so that these engineers could
move faster.&lt;/p&gt;

&lt;p&gt;You may think Facebook engineers or managers are the big bad wolf in this
scenario, but they really are not. Engineers at these highly competitive
companies must create memorable work, or they will not get the promotions they
deserve. And if you are a junior engineer and do not get promoted, you get
fired. Corporate leaders also have the right to change how they allocate
resources to work on open-source projects. There’s nothing inherently wrong with
any of this. The problem was changing the commitment we made to keep the
open-source community neutral. It was at that point we knew that we had to
create a fork of the project if we wanted to keep the community’s interest at
the forefront for the project to remain healthy.&lt;/p&gt;

&lt;p&gt;It was also at this point we made our single biggest mistake. We didn’t change
the name away from Presto. It was admittedly hard to walk away from a name we
all knew and loved. We believed that we had set up the project, so that the name
“Presto” was owned by the community and not Facebook. The truth is that once the
community walked out of the project, Facebook was the only one left in Presto
and they became the sole owner. But, the biggest reason this was absolutely the
wrong choice is much simpler; it made the people that stayed at Facebook really
angry. We expected Facebook to do what they really wanted: stop doing the extra
open-source work, fork internally, and leave the community alone. Instead, they
somehow found the motivation to do a lot of work to set up a competing project.
Finally, we spent two additional years continuing to build the Presto name
rather than building the new name and brand. In hindsight, all of this was just
dumb, and we were suffering from our own sunk cost fallacy. So we continued
under the Presto name with the distinguishing suffix of PrestoSQL versus the
original project’s PrestoDB.&lt;/p&gt;

&lt;h2 id=&quot;building-the-trino-community&quot;&gt;Building the Trino community&lt;/h2&gt;

&lt;p&gt;The new PrestoSQL project gave a new home to the existing Presto community. It
provided a project that focused on the open source community and not just the
needs of Facebook. It also gave us time to troubleshoot problems of people who
used Presto. This is what we had been doing internally at Facebook, but now we
applied our knowledge of the system to the community at large. This was one of the
reasons why leaving Facebook was so beneficial. As we worked closer with
everyone else, we started learning what areas of the project we should focus on
and it turns out that many of the things we were working on at Facebook were
simply not problems that all the other people in the community were facing. This
wasn’t the only benefit to us leaving Facebook, though.&lt;/p&gt;

&lt;p&gt;The hardest part about making a new project successful is user adoption. 
Building great software doesn’t organically build a community. Presto gained 
some of its initial popularity because Facebook used it. We never had to try 
very hard to develop the community initially as the Facebook brand did a great 
job at getting people’s attention. But this community was exclusive to Silicon 
Valley companies. Leaving Facebook acted as a forcing function for us to build 
the community in a classic grassroots way. We went out and started talking to 
people, getting people connected, doing more promotions and events. We were 
pretty motivated after we left. However, all of this is a lot of work for a few 
programmers, and while it’s great to see people respond to your work, it takes a
lot out of you. This created the conditions for members of the new project to
step up and become more involved.&lt;/p&gt;

&lt;p&gt;We saw the pattern repeat when
&lt;a href=&quot;/blog/2020/12/27/announcing-trino.html&quot;&gt;we were forced to rebrand and changed the name to Trino&lt;/a&gt;.
We doubled down again on developing the community, and again participation
accelerated. It’s because of this that we believe the Trino community is stronger
than ever before.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/leaving-facebook-meta-best-for-trino/stars.jpeg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Since the split, the Trino release cadence has accelerated far beyond the speed
we had when we were running Presto. Once brand confusion was settled with the
change to the Trino name, the community numbers skyrocketed and we saw 
&lt;a href=&quot;/blog/2021/12/31/trino-2021-a-year-of-growth.html&quot;&gt;unprecedented growth in metrics like GitHub stars, YouTube subscribers, and Slack members&lt;/a&gt;. 
We have many new community-driven features released in Trino that we will be
discussing in more detail in another blog post coming soon. To name a few, Trino now offers
&lt;a href=&quot;/blog/2022/05/05/tardigrade-launch.html&quot;&gt;fault-tolerant execution mode&lt;/a&gt;,
&lt;a href=&quot;https://github.com/trinodb/trino/issues/37&quot;&gt;revamped timestamp support&lt;/a&gt;, 
&lt;a href=&quot;/blog/2020/06/14/dynamic-partition-pruning.html&quot;&gt;dynamic partition pruning&lt;/a&gt;,
&lt;a href=&quot;/blog/2022/07/22/polymorphic-table-functions.html&quot;&gt;polymorphic table functions&lt;/a&gt;,
&lt;a href=&quot;/blog/2021/03/10/introducing-new-window-features.html&quot;&gt;advanced window functions&lt;/a&gt;, 
and much, much more!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/leaving-facebook-meta-best-for-trino/trajectory.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;These metrics help confirm our experience in previous open source projects and
with Trino. In the long run, individual-driven open source projects tend to lead
to healthier communities and healthier ecosystems than company-driven open
source projects do. We believe that, we practice that, and we are now reaping the
benefits of it as we close the pages of the first decade of this remarkable
project. We can’t begin to express how thankful we are to all of you who
believed in us and have helped grow Trino to what it is today. Also, we do
thank the Facebook leadership, especially Jay Parikh, who gave us the green
light to create and open source Presto from the beginning. We are looking
forward to the twentieth and thirtieth anniversaries as we continue to disrupt
the analytics industry and improve the lives of those who work in it.&lt;/p&gt;</content>

      
        <author>
          <name>Martin Traverso, Dain Sundstrom, and David Phillips</name>
        </author>
      

      <summary>It might surprise some that our departure from Facebook was one of the simplest decisions we’ve ever made. Many posts that discuss leaving a FAANG company focus on leaving some grand sum of money or prestige of working at the company. For us, we were leaving the company where we had launched a project that we knew would quickly outgrow the walls of Facebook, and solve a much larger set of problems in the analytics domain. At the time we didn’t quite anticipate that Presto, a distributed SQL query engine for big data analytics, would be adopted around the globe by thousands of companies and an overwhelming number of industries. We appreciate Facebook for serving as the launchpad that inspired others to adopt Presto. Despite the harmonious beginnings, once the needs of the community and Facebook no longer aligned, we had to leave, but we’ll get to that part shortly.</summary>

      
      
    </entry>
  
    <entry>
      <title>Diving into polymorphic table functions with Trino</title>
      <link href="https://trino.io/blog/2022/07/22/polymorphic-table-functions.html" rel="alternate" type="text/html" title="Diving into polymorphic table functions with Trino" />
      <published>2022-07-22T00:00:00+00:00</published>
      <updated>2022-07-22T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/07/22/polymorphic-table-functions</id>
      <content type="html" xml:base="https://trino.io/blog/2022/07/22/polymorphic-table-functions.html">&lt;p&gt;In the Trino community, we know that being the coolest query engine is a tough
job. We boldly face the intricacies of the SQL standard to bring you the newest
and most powerful features. Today, we proudly announce that as of release 381,
Trino is on its way to full support for polymorphic table functions (PTFs).&lt;/p&gt;

&lt;p&gt;In this blog post, we explain the concept of table functions and
explore how they can be leveraged. We also look at what we have already
implemented, and take a sneak peek into the future.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h3 id=&quot;definition-time&quot;&gt;Definition time&lt;/h3&gt;

&lt;p&gt;There are several kinds of functions you can call in a SQL query: scalar
functions, aggregate functions, and window functions. They might process the
input row by row (scalar) or all at once (aggregate). One thing they have in
common is that they return scalar values. Table functions are different. They
return tables. In a query, they can appear in any place where a table reference
shows up, such as a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FROM&lt;/code&gt; clause:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;my_table_function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;foo&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You can also use table functions in joins:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;my_table_function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;bar&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;another_table_function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Polymorphic table functions (PTFs) are a subset of table functions where the
schema of the returned table is determined dynamically. The returned table
schema can depend on the arguments you pass to the function.&lt;/p&gt;

&lt;h3 id=&quot;ok-but-why-are-we-so-excited&quot;&gt;OK, but why are we so excited?&lt;/h3&gt;

&lt;p&gt;We are excited because this feature is a real game changer! Polymorphic table
functions make SQL extensible, provide a framework for processing data in
previously impossible ways, and can act as a bridge between the Trino engine and
external systems or resources you might need for processing your data.
Additionally, polymorphic table functions are standard SQL, and they are very
convenient to use.&lt;/p&gt;

&lt;h3 id=&quot;what-is-available-in-trino-today&quot;&gt;What is available in Trino today?&lt;/h3&gt;

&lt;p&gt;So far, we have added a framework for table functions that are executed by
the connector. Although this is not the full PTF feature yet, we couldn’t wait
to bring it to life. We added query pass-through table functions for JDBC-based
connectors and Elasticsearch. They mostly go by the name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query&lt;/code&gt;, and they take
a single argument, the query text:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;postgresql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;system&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;query&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt;
        &lt;span class=&quot;s1&quot;&gt;&apos;SELECT
          name
        FROM
          tpch.nation
        WHERE
          nationkey = 0&apos;&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And this will return:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;  &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;---------&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;ALGERIA&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;row&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Something you can’t tell from that example is that the entire query text you
pass in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query&lt;/code&gt; argument is handed to PostgreSQL for execution.
Whatever connector you’re using, the query argument you pass needs to be written
so that it works on the underlying database. On the flip side, and more
exciting: if you have a legacy query specific to a database with
non-standard SQL syntax that would be difficult to rewrite for Trino, you can now
pass that entire query down to the connector by wrapping it in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query&lt;/code&gt;
function, skipping the need to migrate it.&lt;/p&gt;

&lt;p&gt;Besides PostgreSQL, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query&lt;/code&gt; table function has equivalent implementations
for Druid, MySQL, Oracle, Redshift, SQL Server, MariaDB, and SingleStore.
Elasticsearch has a similar function called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;raw_query&lt;/code&gt;. You can check out the
&lt;a href=&quot;https://trino.io/docs/current/connector.html&quot;&gt;Trino docs for each supported connector&lt;/a&gt;
for full details.&lt;/p&gt;

&lt;p&gt;But while we’re here, another cool example to showcase is using query
pass-through to take advantage of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MODEL&lt;/code&gt; clause in Oracle:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;SUBSTR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;country&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;country&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;SUBSTR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;product&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;15&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;product&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;nb&quot;&gt;year&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;sales&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;oracle&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;system&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;query&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;SELECT
        *
      FROM
        sales_view
      MODEL
        RETURN UPDATED ROWS
        MAIN
          simple_model
        PARTITION BY
          country
        MEASURES
          sales
        RULES
          (sales[&apos;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Bounce&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;, 2001] = 1000,
          sales[&apos;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Bounce&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;, 2002] = sales[&apos;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Bounce&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;, 2001] + sales[&apos;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Bounce&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;, 2000],
          sales[&apos;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Y&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Box&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;, 2002] = sales[&apos;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Y&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Box&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;, 2001])
      ORDER BY
        country&apos;&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You can pass an entire query through to leverage a feature that isn’t a part of
the SQL standard, and with that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MODEL&lt;/code&gt; clause, Oracle can do some fancy
multidimensional array processing for you right then and there, returning the
results as a table back into Trino. We don’t want to get too sidetracked delving
into the specifics of non-Trino tech, so if you want to learn more about what
you can do, check out the connectors you use, and see what cool possibilities
are out there!&lt;/p&gt;

&lt;h2 id=&quot;whats-next&quot;&gt;What’s next?&lt;/h2&gt;

&lt;p&gt;Now that we’ve discussed what PTFs are, how they work in Trino, and what they do
today, it’s useful to look forward to what’s coming next. The next thing we’re
working on is adding the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query&lt;/code&gt; function to the BigQuery connector.&lt;/p&gt;

&lt;h3 id=&quot;big-ideas&quot;&gt;Big ideas&lt;/h3&gt;

&lt;p&gt;Beyond what’s currently planned, there’s a lot that polymorphic table functions
can do for us. One function that engineers and analysts commonly request
in Trino is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PIVOT&lt;/code&gt;. This capability dynamically groups the distinct
values of an input column and turns each distinct value into a column in the
output table. PTFs could enable a PIVOT-like transformation
on data, a feature that isn’t part of the standard SQL specification.&lt;/p&gt;
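
&lt;p&gt;To make the idea concrete, here is a small Python sketch (plain Python, not
Trino code) of what a PIVOT-like transformation computes: each distinct value of
one input column becomes its own column in the output. The column names below are
made up for illustration.&lt;/p&gt;

```python
from collections import defaultdict

def pivot(rows, key_col, pivot_col, value_col):
    """Group rows by key_col and spread pivot_col values into columns.

    rows is a list of dicts; the result maps each key to a dict with
    one entry per distinct pivot_col value.
    """
    result = defaultdict(dict)
    for row in rows:
        result[row[key_col]][row[pivot_col]] = row[value_col]
    return dict(result)

sales = [
    {"country": "PL", "year": 2001, "sales": 100},
    {"country": "PL", "year": 2002, "sales": 150},
    {"country": "US", "year": 2001, "sales": 300},
]
pivoted = pivot(sales, "country", "year", "sales")
# pivoted == {"PL": {2001: 100, 2002: 150}, "US": {2001: 300}}
```

&lt;p&gt;A SQL-level PIVOT would do the same grouping-and-spreading, only declaratively
inside the query.&lt;/p&gt;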

&lt;p&gt;Another exciting possibility is the ability to write scripts that transform or
generate tables in popular languages like Python, Scala, or JavaScript. These
could add even more capabilities that SQL is missing.&lt;/p&gt;

&lt;h3 id=&quot;looking-forward&quot;&gt;Looking forward&lt;/h3&gt;

&lt;p&gt;The journey to full PTF support in Trino has just begun. A dedicated operator
for table functions is the next big thing. Right now, Trino can handle PTFs, but
they must be pushed down to the connector and executed there. The Trino engine
does not yet know how to execute them. With an operator, the Trino engine will
be able to control and handle table function execution, and we will be able to
pass tables as arguments to table functions. This will unlock the full potential
of PTFs in Trino, and empower Trino to solve a new class of problems and expand
its potential for application in many new domains.&lt;/p&gt;

&lt;p&gt;If you have any questions or ideas for table functions that you would find
useful, reach out to us on the &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;Trino Slack&lt;/a&gt;, and
we would love to hear your thoughts and feedback. We’ll also be doing a Trino
Community Broadcast on PTFs on July 28th @ 1pm EDT, so tune in then to have your
questions answered live!&lt;/p&gt;

&lt;p&gt;If you want to learn more about how to implement PTFs, we are working on another
blog post for you already.&lt;/p&gt;

&lt;p&gt;Happy querying!&lt;/p&gt;</content>

      
        <author>
          <name>Kasia Findeisen, Brian Olsen, and Cole Bowden</name>
        </author>
      

      <summary>In the Trino community, we know that being the coolest query engine is a tough job. We boldly face the intricacies of the SQL standard to bring you the newest and most powerful features. Today, we proudly announce that as of release 381, Trino is on its way to full support for polymorphic table functions (PTFs). In this blog post, we are explaining the concept of table functions and exploring how they can be leveraged. We also look at what we have already implemented, and take a sneak peek into the future.</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino updates to Java 17</title>
      <link href="https://trino.io/blog/2022/07/14/trino-updates-to-java-17.html" rel="alternate" type="text/html" title="Trino updates to Java 17" />
      <published>2022-07-14T00:00:00+00:00</published>
      <updated>2022-07-14T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/07/14/trino-updates-to-java-17</id>
      <content type="html" xml:base="https://trino.io/blog/2022/07/14/trino-updates-to-java-17.html">&lt;p&gt;You’ve already read the title, and it’s exciting news - as of Trino version 390,
which releases today, Trino has officially been updated from Java 11 to Java 17.
This has a few implications, the most important of which is that if you aren’t
running the Docker image (which automatically comes with the correct version of
Java) and you’ve been running Trino on Java 16 or older, you’ll need to update
Java to run Trino versions 390 and later. It’s also worth mentioning that newer
versions of Java, such as Java 18 or 19, are not supported - they might work,
but they haven’t been tested or benchmarked - Java 17 is the new, recommended
version for Trino.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;The reason this change is exciting is that using a new and better version of
Java will make Trino better, too! This initial change is an update to the
runtime version, or what the Trino engine uses while it runs. Because the Java
runtime performs slightly better on the whole with this update, you may see
some small, across-the-board performance improvements when switching from Java
11 to Java 17. So when you’ve got the time, we strongly recommend making the
upgrade!&lt;/p&gt;
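
&lt;p&gt;If you manage your own installation, it’s worth confirming which Java your
hosts run before upgrading. The sketch below parses output in the style of
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;java -version&lt;/code&gt;; the version strings are just examples.&lt;/p&gt;

```python
import re

def java_major_version(version_output):
    """Extract the major version from 'java -version' style output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found")
    major = int(match.group(1))
    # Pre-Java 9 strings look like "1.8.0_292"; the major is the second part.
    if major == 1 and match.group(2) is not None:
        major = int(match.group(2))
    return major

assert java_major_version('openjdk version "17.0.3" 2022-04-19') == 17
assert java_major_version('java version "1.8.0_292"') == 8
```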

&lt;p&gt;The plan is to update the build to Java 17 a few weeks from now, which will also
allow us to use Java 17 APIs and the changes to the language in Trino code. With
new language features, there are more tools in the development toolkit, and
it’ll allow us to write cleaner and better code moving forwards.&lt;/p&gt;

&lt;p&gt;This upgrade has been in the works for a while and been a long time coming, so
if you want to learn more about the specifics, one of the best places to check
that out is the Trino Community Broadcast. Updating to Java 17 was the focus of
&lt;a href=&quot;https://trino.io/episodes/36.html&quot;&gt;episode 36&lt;/a&gt;, and we also talked about it
previously in &lt;a href=&quot;https://trino.io/episodes/35.html&quot;&gt;episode 35&lt;/a&gt;. If you want to
check out the code changes that made this happen, you can view
&lt;a href=&quot;https://github.com/trinodb/trino/issues/9876&quot;&gt;the tracking issue on Github&lt;/a&gt; for
more information.&lt;/p&gt;

&lt;p&gt;And finally, we want to give a shoutout to &lt;a href=&quot;https://github.com/wendigo&quot;&gt;Mateusz Gajewski&lt;/a&gt;
for all the hard work in driving this change.&lt;/p&gt;</content>

      
        <author>
          <name>Cole Bowden</name>
        </author>
      

      <summary>You’ve already read the title, and it’s exciting news - as of Trino version 390, which releases today, Trino has officially been updated from Java 11 to Java 17. This has a few implications, the most important of which is that if you aren’t running the Docker image (which automatically comes with the correct version of Java) and you’ve been running Trino on Java 16 or older, you’ll need to update Java to run Trino versions 390 and later. It’s also worth mentioning that newer versions of Java, such as Java 18 or 19, are not supported - they might work, but they haven’t been tested or benchmarked - Java 17 is the new, recommended version for Trino.</summary>

      
      
    </entry>
  
    <entry>
      <title>How to use Airflow with Trino</title>
      <link href="https://trino.io/blog/2022/07/13/how-to-use-airflow-to-schedule-trino-jobs.html" rel="alternate" type="text/html" title="How to use Airflow with Trino" />
      <published>2022-07-13T00:00:00+00:00</published>
      <updated>2022-07-13T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/07/13/how-to-use-airflow-to-schedule-trino-jobs</id>
      <content type="html" xml:base="https://trino.io/blog/2022/07/13/how-to-use-airflow-to-schedule-trino-jobs.html">&lt;p&gt;The recent addition of the &lt;a href=&quot;/docs/current/admin/fault-tolerant-execution.html&quot;&gt;fault-tolerant
execution&lt;/a&gt; architecture,
delivered to Trino by Project Tardigrade, makes the use of Trino for running
your ETL workloads an even more compelling alternative than ever before. We’ve
set up a demo environment for you to easily give it a try in &lt;a href=&quot;https://www.starburst.io/platform/starburst-galaxy/&quot;&gt;Starburst
Galaxy&lt;/a&gt;.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;With Project Tardigrade providing an out-of-the-box solution with advanced
resource-aware task scheduling and granular retries at the task/query level, we still
need a robust tool to schedule and manage workloads themselves. Apache
Airflow is a great choice for this purpose.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://airflow.apache.org/&quot;&gt;Apache Airflow&lt;/a&gt; is a widely used workflow engine that allows you to schedule and
run complex data pipelines. Airflow provides many plug-and-play operators and
hooks to integrate with many third-party services like Trino.&lt;/p&gt;

&lt;p&gt;To get started using Airflow to run data pipelines with Trino you need to
complete the following steps:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Install Apache Airflow 2.3+&lt;/li&gt;
  &lt;li&gt;Install the TrinoHook&lt;/li&gt;
  &lt;li&gt;Create a Trino connection in Airflow&lt;/li&gt;
  &lt;li&gt;Deploy a TrinoOperator&lt;/li&gt;
  &lt;li&gt;Deploy your DAGs&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;installing-apache-airflow-in-docker&quot;&gt;Installing Apache Airflow in Docker&lt;/h2&gt;

&lt;p&gt;The best way to get you going, if you don’t already have an Airflow cluster
available, is to run Airflow in a container using docker compose. Just be
aware that this is not best practice for a production environment.&lt;/p&gt;

&lt;p&gt;Requirements for the host:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Docker&lt;/li&gt;
  &lt;li&gt;Docker Compose 1.28+&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Step 1) Create a directory named airflow for all our configuration files.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ mkdir airflow
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Step 2) In the airflow directory, create three subdirectories called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dags&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;plugins&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;logs&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ cd airflow
$ mkdir dags plugins logs
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Step 3) Download the Airflow docker compose yaml file.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ curl -LfO &apos;https://airflow.apache.org/docs/apache-airflow/stable/docker-compose.yaml&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Step 4) Create an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.env&lt;/code&gt; configuration file:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ echo -e &quot;AIRFLOW_UID=$(id -u)&quot; &amp;gt; .env
$ echo &quot;AIRFLOW_GID=0&quot; &amp;gt;&amp;gt; .env 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Step 5) Start the Airflow containers&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ docker-compose up -d
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;installing-the-trinohook&quot;&gt;Installing the TrinoHook&lt;/h2&gt;

&lt;p&gt;If running Airflow in docker, you need to install the TrinoHook in
all the docker containers using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;apache/airflow:x.x.x&lt;/code&gt; image.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ docker ps 
CONTAINER ID   IMAGE                  PORTS                              NAMES
cffdfaeb757e   apache/airflow:2.3.0   0.0.0.0:8080-&amp;gt;8080/tcp             airflow_airflow-webserver_1
b0e72f479a66   apache/airflow:2.3.0   8080/tcp                           airflow_airflow-worker_1
4cdb11b3e5e3   apache/airflow:2.3.0   8080/tcp                           airflow_airflow-triggerer_1
41d3c3107ddb   apache/airflow:2.3.0   0.0.0.0:5555-&amp;gt;5555/tcp, 8080/tcp   airflow_flower_1
229a11e9cdd3   apache/airflow:2.3.0   8080/tcp                           airflow_airflow-scheduler_1
68160240857d   postgres:13            5432/tcp                           airflow_postgres_1
a96b98da85df   redis:latest           6379/tcp                           airflow_redis_1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To install the TrinoHook, run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pip install apache-airflow-providers-trino&lt;/code&gt; in
each of the five Airflow containers. Run the following command, replacing the
container id for each container in your deployment.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ docker exec -it &amp;lt;container_id&amp;gt; pip install apache-airflow-providers-trino
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once you have done that you need to restart all five containers:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ docker container restart &amp;lt;container_id_1&amp;gt; ... &amp;lt;container_id_5&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
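
&lt;p&gt;If you repeat this setup often, the install-and-restart steps can be
scripted. A minimal sketch that only builds the docker commands (the container
ids are placeholders; you would pass each command to your shell or to
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;subprocess.run&lt;/code&gt;):&lt;/p&gt;

```python
def provider_install_commands(container_ids):
    """Build the docker commands to install the Trino provider and restart."""
    installs = [
        "docker exec -it {} pip install apache-airflow-providers-trino".format(cid)
        for cid in container_ids
    ]
    restart = "docker container restart " + " ".join(container_ids)
    return installs + [restart]

commands = provider_install_commands(["cffdfaeb757e", "b0e72f479a66"])
# one install command per container, then a single restart command
```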

&lt;h2 id=&quot;creating-a-trino-connection&quot;&gt;Creating a Trino connection&lt;/h2&gt;

&lt;p&gt;After you have installed the TrinoHook and restarted Airflow you can create a
connection to your Trino cluster through the Airflow web UI.  If you just
installed Airflow, then go to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;http://localhost:8080&lt;/code&gt; on your browser and login.
The default credentials unless changed are &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;airflow&lt;/code&gt; for username and password.&lt;/p&gt;

&lt;p&gt;Go to &lt;strong&gt;Admin&lt;/strong&gt; &amp;gt; &lt;strong&gt;Connections&lt;/strong&gt;.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;75%&quot; src=&quot;/assets/blog/trino-airflow-blog/airflow-connections.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Click on the blue button to &lt;strong&gt;Add a new record&lt;/strong&gt;.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;75%&quot; src=&quot;/assets/blog/trino-airflow-blog/airflow-new-connection.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Select &lt;strong&gt;Trino&lt;/strong&gt; from the &lt;strong&gt;Connection Type&lt;/strong&gt; dropdown and provide the following information:&lt;/p&gt;

&lt;table&gt;
  &lt;tr&gt;
   &lt;td&gt;Connection Id&lt;/td&gt;
   &lt;td&gt;Whatever you want to call your connection.&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;
    Host
   &lt;/td&gt;
   &lt;td&gt;The hostname or host ip of your trino cluster, e.g., &lt;code&gt;localhost&lt;/code&gt;, &lt;code&gt;10.10.10.1&lt;/code&gt;, or &lt;code&gt;www.mytrino.com&lt;/code&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Schema&lt;/td&gt;
   &lt;td&gt;A schema in your Trino cluster.&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Login&lt;/td&gt;
   &lt;td&gt;The username of the user that Airflow uses to connect to Trino.  Best practice would be to create a service account like ‘airflow’. Just understand that this user access level is used to execute SQL statements in Trino.&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Password&lt;/td&gt;
   &lt;td&gt;The password of the user that Airflow uses to connect to Trino if authentication is enabled.&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Port&lt;/td&gt;
   &lt;td&gt;The port where the Trino Web UI can be accessed, e.g., &lt;code&gt;8080&lt;/code&gt;, &lt;code&gt;8443&lt;/code&gt;.&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Extra&lt;/td&gt;
   &lt;td&gt;Additional settings, like &lt;code&gt;protocol:https&lt;/code&gt; if using TLS, or &lt;code&gt;verify:false&lt;/code&gt; if you are using a self-signed certificate.&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;Be aware that the test button might not actually return any feedback for Trino connections.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;/assets/blog/trino-airflow-blog/airflow-add-connection.png&quot; /&gt;
&lt;/p&gt;
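
&lt;p&gt;For reference, Airflow can also express a connection as a single URI, which
is handy when you define connections through environment variables instead of
the web UI. The sketch below assembles one from the fields above; the values are
made up, and the extras handling is a simplification of what Airflow accepts.&lt;/p&gt;

```python
from urllib.parse import quote, urlencode

def trino_connection_uri(host, port, login, password="", schema="default",
                         extra=None):
    """Assemble an Airflow-style connection URI from the connection fields."""
    auth = quote(login)
    if password:
        auth = auth + ":" + quote(password)
    uri = "trino://{}@{}:{}/{}".format(auth, host, port, schema)
    if extra:
        # Extras such as protocol or verify become query parameters.
        uri = uri + "?" + urlencode(extra)
    return uri

uri = trino_connection_uri("localhost", 8080, "airflow", "airflow", "tiny",
                           {"protocol": "https", "verify": "false"})
```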

&lt;h2 id=&quot;deploying-a-trinooperator&quot;&gt;Deploying a TrinoOperator&lt;/h2&gt;

&lt;p&gt;At the time of writing this article there is no TrinoOperator, so you have to
write your own. You can find an implementation in the following section to get you
started. This operator allows you to execute any SQL statement that Trino supports,
such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SET SESSION&lt;/code&gt;, and others. You can run multiple statements in a single task so that
they are part of a single Trino session.&lt;/p&gt;

&lt;p&gt;To create the TrinoOperator use your favorite text editor to create a file called
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino_operator.py&lt;/code&gt; with the following code in it and place it in the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;airflow/plugins&lt;/code&gt; directory you created earlier. Airflow automatically compiles the code and you are ready to start
writing DAGs.&lt;/p&gt;

&lt;p&gt;For those new to Airflow, DAG (Directed Acyclic Graph) is a core Airflow
concept, a collection of tasks with dependencies and relationships that indicate
to Airflow how they should be executed. DAGs are written in Python.&lt;/p&gt;
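
&lt;p&gt;The “directed acyclic” part is what lets Airflow compute a valid run order
for the tasks. Stripped of the framework, the idea looks like the sketch below;
the task names are invented for illustration.&lt;/p&gt;

```python
def execution_order(dependencies):
    """Return a valid run order for tasks, given task: upstream-tasks pairs."""
    ordered = []
    remaining = {task: set(ups) for task, ups in dependencies.items()}
    while remaining:
        # Tasks whose upstream dependencies have all completed are ready.
        ready = sorted(t for t, ups in remaining.items() if not ups)
        if not ready:
            raise ValueError("cycle detected; not a DAG")
        for task in ready:
            ordered.append(task)
            del remaining[task]
        for ups in remaining.values():
            ups.difference_update(ready)
    return ordered

dag = {
    "create_table": set(),
    "insert_rows": {"create_table"},
    "run_report": {"insert_rows"},
}
order = execution_order(dag)
# order == ["create_table", "insert_rows", "run_report"]
```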

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;airflow.models.baseoperator&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BaseOperator&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;airflow.utils.decorators&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;apply_defaults&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;airflow.providers.trino.hooks.trino&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TrinoHook&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;typing&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Sequence&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Callable&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Optional&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;handler&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;fetchall&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TrinoCustomHook&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TrinoHook&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;autocommit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;bool&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;False&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Optional&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;dict&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;handler&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Optional&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Callable&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;sh&quot;&gt;&quot;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;:sphinx-autoapi-skip:&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&quot;&quot;&lt;/span&gt;

        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;super&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TrinoHook&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;autocommit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;autocommit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;handler&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;handler&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TrinoOperator&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BaseOperator&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;template_fields&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Sequence&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,)&lt;/span&gt;

    &lt;span class=&quot;nd&quot;&gt;@apply_defaults&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;__init__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;trino_conn_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;**&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;kwargs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;nf&quot;&gt;super&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;__init__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;**&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;kwargs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;trino_conn_id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;trino_conn_id&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;execute&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;context&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;task_instance&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;context&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;task&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;

        &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;Creating Trino connection&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;hook&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TrinoCustomHook&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;trino_conn_id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;trino_conn_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;n&quot;&gt;sql_statements&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;

        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;isinstance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql_statements&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql_statements&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;strip&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)))&lt;/span&gt;

            &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;Executing single sql statement&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
                &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hook&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;get_first&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

            &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;Executing multiple sql statements&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
                &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hook&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;autocommit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;False&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;handler&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;handler&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;isinstance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql_statements&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sql_statement&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sql_statements&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;extend&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql_statement&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;strip&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))))&lt;/span&gt;

            &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;Executing multiple sql statements&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hook&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;autocommit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;False&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;handler&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;handler&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;deploying-a-dag&quot;&gt;Deploying a DAG&lt;/h2&gt;

&lt;p&gt;Now that you have deployed the TrinoOperator, you can start writing DAGs for
your data pipelines. Let’s write and deploy a simple sample DAG. DAGs, just like
the TrinoOperator, are deployed into the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;airflow/dags&lt;/code&gt;
directory you created earlier.&lt;/p&gt;

&lt;p&gt;Create a file called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;my_first_trino_dag.py&lt;/code&gt; with the following code, and save it in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;airflow/dags&lt;/code&gt; directory.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pendulum&lt;/span&gt;

&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;airflow&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DAG&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;airflow.operators.python_operator&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PythonOperator&lt;/span&gt;

&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;trino_operator&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TrinoOperator&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;## This method is called by task2 (below) to retrieve and print to the logs the return value of task1
&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;print_command&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;**&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;kwargs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;task_instance&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kwargs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;task_instance&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
        &lt;span class=&quot;nf&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;Return Value: &lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;task_instance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;xcom_pull&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;task_ids&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;task_1&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;return_value&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;DAG&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;default_args&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;depends_on_past&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;False&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;dag_id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;my_first_trino_dag&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;schedule_interval&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;0 8 * * *&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;start_date&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pendulum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;datetime&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2022&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tz&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;US/Central&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;catchup&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;False&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;tags&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;example&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dag&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;## Task 1 runs a Trino select statement to count the number of records 
&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;## in the tpch.tiny.customer table
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;task1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TrinoOperator&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;task_id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;task_1&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;trino_conn_id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;trino_connection&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;select count(1) from tpch.tiny.customer&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;## Task 2 is a Python Operator that runs the print_command method above 
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;task2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;PythonOperator&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;task_id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;print_command&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;python_callable&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;print_command&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;provide_context&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;dag&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dag&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;## Task 3 demonstrates how you can use results from previous statements in new SQL statements
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;task3&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TrinoOperator&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;task_id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;task_3&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;trino_conn_id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;trino_connection&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;select { { task_instance.xcom_pull(task_ids=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;task_1&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;,key=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;return_value&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;)[0] } }&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;## Task 4 demonstrates how you can run multiple statements in a single session.
&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;## Best practice is to run a single statement per task; however, statements that change session
&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;## settings must be run in a single task. The set time zone statements in this example will
&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;## not affect any future tasks, but the two now() functions would return timestamps for the
&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;## time zone set before they were run.
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;task4&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TrinoOperator&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;task_id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;task_4&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;trino_conn_id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;trino_connection&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;set time zone &lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;America/Chicago&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;; select now(); set time zone &lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;UTC&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt; ; select now()&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;## The following syntax determines the dependencies between all the DAG tasks.
&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;## Task 1 will have to complete successfully before any other tasks run.
&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;## Tasks 3 and 4 won&apos;t run until Task 2 completes.
&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;## Tasks 3 and 4 can run in parallel if there are enough worker threads. 
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;task1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;task2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;task3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;task4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Just like the TrinoOperator, DAGs are picked up and compiled by Airflow
automatically. If Airflow fails to compile your DAG, it displays an error
message at the top of the main page where all the DAGs are listed. You can
refresh this page a few times until your DAG is either added to the list or you
see an error message. You can expand the message to see the source of the
error. Usually the information provided is enough to understand the issue.&lt;/p&gt;

&lt;p&gt;Once the DAG shows up in your list, you can trigger a manual run using the play
button on the right. I recommend switching to the Graph view, using the action
links on the right, to see how tasks change status as they run.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;75%&quot; src=&quot;/assets/blog/trino-airflow-blog/airflow-dag.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;You can see logs for each task by clicking on the corresponding box and selecting Log from the options at the top.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;60%&quot; src=&quot;/assets/blog/trino-airflow-blog/airflow-task.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Check out the logs for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;print_command&lt;/code&gt; task to see the return value of the select statement from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;task_1&lt;/code&gt;.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;60%&quot; src=&quot;/assets/blog/trino-airflow-blog/airflow-logs.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;As you can see, output from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;print()&lt;/code&gt; commands can be found in these logs.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Apache Airflow has been around for many years now, and it is used by many large
companies in production environments. The open source project has an active
community, and I expect that in the near future we will have an official
TrinoHook with additional out-of-the-box functionality. While there might be a
slight learning curve for new users, I think it is worth it.&lt;/p&gt;

&lt;p&gt;On the Trino side, there are some exciting enhancements for &lt;a href=&quot;/docs/current/admin/fault-tolerant-execution.html&quot;&gt;fault-tolerant
execution&lt;/a&gt; on
the roadmap of Project Tardigrade that will make Trino and Airflow an even
better combination.&lt;/p&gt;

&lt;p&gt;Stay tuned.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note from Trino community&lt;/em&gt;: We welcome blog submissions from the community. If
you have blog ideas, send a message in the #dev chat. We will mail you
Trino swag as a token of appreciation for successful submissions. Enter the &lt;a href=&quot;https://join.slack.com/t/trinodb/shared_invite/zt-1aek3l6bn-ZMsvFZJqP1ULx5pU17WP1Q&quot;&gt;Trino
Slack&lt;/a&gt;
and join the conversation in the #project-tardigrade
&lt;a href=&quot;https://join.slack.com/share/enQtMzc3OTczMzkxNDU0OC1mNzEyOWUzNjUyMTgyNDU3ZGJlYTZjYTllYTI1ZmFhMDBlMzYwZWQzOGVkMjhhOGNlMmQ5MWIxM2RmNzZjNWY0&quot;&gt;channel&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md waves-effect waves-light&quot; href=&quot;https://cutt.ly/airflow-reddit&quot;&gt;Discuss on Reddit&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md waves-effect waves-light&quot; href=&quot;https://news.ycombinator.com/item?id=32100426&quot;&gt;Discuss On Hacker News&lt;/a&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Willie Valdez</name>
        </author>
      

      <summary>The recent addition of the fault-tolerant execution architecture, delivered to Trino by Project Tardigrade, makes the use of Trino for running your ETL workloads an even more compelling alternative than ever before. We’ve set up a demo environment for you to easily give it a try in Starburst Galaxy.</summary>

      
      
    </entry>
  
    <entry>
      <title>Announcing the 2022 Trino Summit</title>
      <link href="https://trino.io/blog/2022/06/30/trino-summit-call-for-speakers.html" rel="alternate" type="text/html" title="Announcing the 2022 Trino Summit" />
      <published>2022-06-30T00:00:00+00:00</published>
      <updated>2022-06-30T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/06/30/trino-summit-call-for-speakers</id>
      <content type="html" xml:base="https://trino.io/blog/2022/06/30/trino-summit-call-for-speakers.html">&lt;p&gt;We are pleased to announce the upcoming 2022 Trino Summit. The summit is
scheduled as a &lt;em&gt;hybrid&lt;/em&gt; event on the 10th of November 2022, and attendance is
free! You will be able to join us online, or you can make the trip to San
Francisco and meet us at the Commonwealth Club on the downtown waterfront.
Please be aware that spots at the live event are limited, so register soon if
you want to attend. Note that you need to register regardless of whether you’ll
be joining us in-person or online.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;
        Register to attend
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;Starburst is the lead sponsor for the summit, but they welcome other sponsors to
help make this a successful event for the Trino community. If that interests you
or your employer, you should &lt;a href=&quot;mailto:events@starburst.io&quot;&gt;contact the Starburst team for more information.&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;If you’d like to share your knowledge and information about Trino usage and give
a talk at this year’s Trino Summit, we’re putting out a call for speakers. We
will be accepting submissions from now until September 15th, but we recommend
submitting soon, because slots are filling up fast.&lt;/p&gt;

&lt;p&gt;We’re looking for intermediate to advanced-level talks on a variety of themes.
If you have an interesting story about how you were able to leverage Trino,
found a neat way to extend it with a custom plugin, or swapped to Trino for a
performance win, we’d love to hear about it. We’re excited to expand our speaker
lineup with talks from the broader Trino community. If you’re interested, you
can check out the speaker registration page for more information.&lt;/p&gt;

&lt;p&gt;And of course, we’re looking forward to seeing you there, whether in-person or
online!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Update from 15th September 2022:&lt;/em&gt; The call for speakers is closed. Thank you
for all your submissions.&lt;/p&gt;</content>

      
        <author>
          <name>Cole Bowden</name>
        </author>
      

      <summary>We are pleased to announce the upcoming 2022 Trino Summit. The summit is scheduled as a hybrid event on the 10th of November 2022, and attendance is free! You will be able to join us online, or you can make the trip to San Francisco and meet us at the Commonwealth Club on the downtown waterfront. Please be aware that spots at the live event are limited, so register soon if you want to attend. Please also be aware that you need to register regardless of whether you’ll be joining us in-person or online. Register to attend Starburst is the lead sponsor for the summit, but they welcome other sponsors to help make this a successful event for the Trino community. If that interests you or your employer, you should contact the Starburst team for more information.</summary>

      
      
    </entry>
  
    <entry>
      <title>Using Trino as a batch processing engine</title>
      <link href="https://trino.io/blog/2022/06/24/trino-meetup-extract-trino-load.html" rel="alternate" type="text/html" title="Using Trino as a batch processing engine" />
      <published>2022-06-24T00:00:00+00:00</published>
      <updated>2022-06-24T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/06/24/trino-meetup-extract-trino-load</id>
      <content type="html" xml:base="https://trino.io/blog/2022/06/24/trino-meetup-extract-trino-load.html">&lt;p&gt;This past week, &lt;a href=&quot;https://github.com/arhimondr&quot;&gt;Andrii Rosa&lt;/a&gt; hosted a virtual
Trino meetup on the topic of using Trino as a batch processing engine. You can
view the talk from the meetup embedded below. Andrii dives into the history of
Trino as an engine for batch ETL (extract, transform, load) processing, some
challenges related to that, as well as the new fault-tolerant execution
capabilities being added to Trino and how they improve it for batch ETL use
cases.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;
&lt;iframe width=&quot;560&quot; height=&quot;400&quot; src=&quot;https://www.youtube.com/embed/2Ywqbz4T-Sw?t=1116&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;
&lt;/iframe&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;Andrii also gives an update on the work in progress on fault-tolerant
execution, where we are today, and what’s planned for the near future. The
meetup wraps up with an attendee Q&amp;amp;A. If you’d like to learn more, go check
out the talk!&lt;/p&gt;

      
        <author>
          <name>Cole Bowden</name>
        </author>
      

      <summary>This past week, Andrii Rosa hosted a virtual Trino meetup on the topic of using Trino as a batch processing engine. You can view the talk from the meetup embedded below. Andrii dives into the history of Trino as an engine for batch ETL (extract, transform, load) processing, some challenges related to that, as well as the new fault-tolerant execution capabilities being added to Trino and how they improve it for batch ETL use cases.</summary>

      
      
    </entry>
  
    <entry>
      <title>Building A Modern Data Stack for QazAI</title>
      <link href="https://trino.io/blog/2022/06/08/building-a-modern-data-stack-for-qaz-ai.html" rel="alternate" type="text/html" title="Building A Modern Data Stack for QazAI" />
      <published>2022-06-08T00:00:00+00:00</published>
      <updated>2022-06-08T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/06/08/building-a-modern-data-stack-for-qaz-ai</id>
      <content type="html" xml:base="https://trino.io/blog/2022/06/08/building-a-modern-data-stack-for-qaz-ai.html">&lt;p&gt;At QazAI, we build data lakes as a service for companies. In the original
architecture, we got raw data in S3, transformed the S3 data with Hive, and then
delivered the data to business units via our datamart built on ClickHouse (for
optimal delivery speeds). Over time, we were dragged down by the slower speeds
and high costs of running Hive, and started shopping for a faster and cheaper
open source engine to do our ETL data transformations.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;100%&quot; src=&quot;/assets/blog/qaz-ai-modern-data-stack/old-architecture.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;This diagram shows our existing stack. The big problem to solve was that the
Hadoop cluster was extremely inefficient. This led to slow queries and up to
10x higher costs.&lt;/p&gt;

&lt;p&gt;Like many others, I was initially drawn to Trino to run analytics over Hive
tables because of its speed, but found many other advantages as well. Key among
them are the following characteristics.&lt;/p&gt;

&lt;h2 id=&quot;speed&quot;&gt;Speed&lt;/h2&gt;

&lt;p&gt;Queries ran 10 to 100 times faster compared to our old stack. It was fantastic,
simply beyond our expectations.&lt;/p&gt;

&lt;h2 id=&quot;standard-sql&quot;&gt;Standard SQL&lt;/h2&gt;

&lt;p&gt;Trino speaks a standard SQL dialect that everyone already knew. Data analysts
loved getting to use a dialect they were already familiar with.&lt;/p&gt;

&lt;h2 id=&quot;federated-analytics&quot;&gt;Federated analytics&lt;/h2&gt;

&lt;p&gt;Trino can connect to other databases and run federated queries. After I had
connected all the available data sources, I showed the results to the data
analysts. They were simply amazed; some were shocked when a join between tables
from different databases completed successfully. To emphasize: this saved days
of work. You could join data from other data sources straight away, avoiding
the need to create a staging layer in the data warehouse.&lt;/p&gt;

&lt;h2 id=&quot;simplicity-of-setup&quot;&gt;Simplicity of setup&lt;/h2&gt;

&lt;p&gt;Trino just works out of the box. This is what makes it great. As open source
users, we’re used to going through complicated software setup processes. But
with Trino, there’s no need to deploy anything else. You simply install packages
from the open source repository, and things work. It’s magical. To top it off,
Trino feels like a commercial product, with detailed documentation and an active
Slack community that is willing to help you out with everything.&lt;/p&gt;

&lt;h2 id=&quot;exploring-trino-as-an-option-for-etl&quot;&gt;Exploring Trino as an option for ETL&lt;/h2&gt;

&lt;p&gt;A great number of connectors, standard SQL, high processing speed - all these
advantages raise an obvious question: ‘Why not use Trino for ETL processes as
well?’&lt;/p&gt;

&lt;p&gt;At QazAI, the key blocker to using Trino for ETL was that Trino doesn’t have
fault tolerance. As a result, our pipelines did not have reliable landing times,
and required a lot of manual monitoring.&lt;/p&gt;

&lt;p&gt;This is precisely what made Project Tardigrade so exciting for us. Proving
that Trino is indeed a true community-driven project, community members embarked
on the Tardigrade project to bring fault-tolerant execution to Trino. Its main
feature is the ability to split a query into smaller tasks and retry only the
failed ones. We’ve been running tests to explore this. Our ETL pipeline on
Trino, running on 5 bare metal nodes, is 20 times faster than the same ETL on a
stack consisting of Sqoop, HDFS, Hive, and custom Python scripts.&lt;/p&gt;
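
&lt;p&gt;For reference, enabling fault-tolerant execution takes only a few properties
(a minimal sketch; the S3 exchange location is a placeholder you would replace
with your own storage):&lt;/p&gt;

&lt;div class=&quot;language-properties highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# etc/config.properties
retry-policy=TASK

# etc/exchange-manager.properties
exchange-manager.name=filesystem
exchange.base-directories=s3://my-exchange-spooling-bucket
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;With task retries enabled, intermediate results are spooled to the exchange
storage so failed tasks can be restarted without rerunning the whole query.&lt;/p&gt;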

&lt;h2 id=&quot;testing-trino-for-etl&quot;&gt;Testing Trino for ETL&lt;/h2&gt;

&lt;p&gt;Let’s play a bit with the well-known DVD rental sample database.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;75%&quot; src=&quot;/assets/blog/qaz-ai-modern-data-stack/rentaldb-schema.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;For instance, we create the database shown above in PostgreSQL and work with the &lt;em&gt;rental&lt;/em&gt; table.&lt;/p&gt;

&lt;p&gt;First, we move the table from PostgreSQL to our warehouse in HDFS and Hive.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;test&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dvd_rental&lt;/span&gt;  
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;PARQUET&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; 
	&lt;span class=&quot;n&quot;&gt;rental_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rental_date&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rental_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;inventory_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customer_id&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;customer_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;return_date&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;return_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;staff_id&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;staff_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;last_update&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;last_update&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;postgresqldvd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rental&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now we perform the same operation, but this time with an Iceberg table on S3 that uses hidden partitioning.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;iceberg2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dvd_rental&lt;/span&gt;  
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;partitioning&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ARRAY&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;month(rental_date)&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;bucket(inventory_id, 10)&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;format&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;PARQUET&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; 
	&lt;span class=&quot;n&quot;&gt;rental_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;rental_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;inventory_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customer_id&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;customer_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;return_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;staff_id&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;staff_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;last_update&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;postgresqldvd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rental&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
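
&lt;p&gt;To confirm that hidden partitioning took effect, you can inspect the Iceberg
&lt;code&gt;$partitions&lt;/code&gt; metadata table of the table we just created (a quick
sanity check, not part of the pipeline itself):&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT partition, record_count, file_count
FROM iceberg2.ice.&quot;dvd_rental$partitions&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Each row describes one partition, with its month and bucket values and the
number of records and files it holds.&lt;/p&gt;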

&lt;p&gt;Great. What if there is a need to enrich the data with the employees’ and
customers’ names? To do this, we create the dimension tables, move them to the
core layer, and then apply denormalization.&lt;/p&gt;

&lt;p&gt;Here we move the dimension tables.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;test&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dvd_staff&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;PARQUET&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; 
	&lt;span class=&quot;n&quot;&gt;staff_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;first_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;last_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;address_id&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;address_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;email&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;store_id&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;store_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;active&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;username&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;password&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;last_update&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;last_update&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;picture&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;postgresqldvd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;staff&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;test&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dvd_customer&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;PARQUET&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; 
	&lt;span class=&quot;n&quot;&gt;customer_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;store_id&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;store_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;first_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;last_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;email&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;address_id&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;address_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;activebool&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;create_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;last_update&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;last_update&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;active&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;postgresqldvd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Let’s join the staff and customer tables to the rental table.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;test&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dvd_core_rental&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;PARQUET&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;rental_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;rental_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;inventory_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;cst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;first_name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;customer_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;--cast(customer_id as integer) as customer_id,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;cst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;last_name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;customer_lastname&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;return_date&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;return_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;stf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;first_name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;staff_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;--cast(staff_id as integer) as staff_id,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;stf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;last_name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;staff_lastname&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;rnt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;last_update&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;test&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dvd_rental&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rnt&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;LEFT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;test&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dvd_customer&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cst&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rnt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customer_id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customer_id&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;LEFT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;test&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dvd_staff&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stf&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rnt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;staff_id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;staff_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If data analysts need this table, we can easily move it to the data mart (the Clickhouse layer we use to deliver data to end users).&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;clickhouse&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;default&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rental_analysis_table&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;rental_id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;rental_date&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;inventory_id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;customer_name&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
	&lt;span class=&quot;n&quot;&gt;customer_lastname&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;return_date&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;staff_name&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;staff_lastname&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;last_update&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;date&lt;/span&gt;   
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;engine&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;MergeTree&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;order_by&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ARRAY&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;customer_name&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;customer_lastname&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;A simple &lt;code&gt;INSERT ... SELECT&lt;/code&gt; query and nothing more.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;clickhouse&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;default&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rental_analysis_table&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;test&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dvd_core_rental&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Alternatively, we can move the datamart to ClickHouse directly from PostgreSQL, without any intermediate data layers.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;clickhouse&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;default&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rental_analysis_table&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;rental_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;rental_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;inventory_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;cst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;first_name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;customer_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
	&lt;span class=&quot;n&quot;&gt;cst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;last_name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;customer_lastname&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;return_date&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;return_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;stf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;first_name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;staff_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
	&lt;span class=&quot;n&quot;&gt;stf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;last_name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;staff_lastname&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;rnt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;last_update&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;postgresqldvd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rental&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rnt&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;LEFT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;postgresqldvd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customer&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cst&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rnt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customer_id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customer_id&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;LEFT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;postgresqldvd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;staff&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stf&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rnt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;staff_id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;staff_id&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Great.&lt;/p&gt;

&lt;p&gt;One may object that this sample dataset is small, with only 16,000 rows,
while production ETL mostly runs over huge tables containing millions or
billions of rows. Let’s test that. We work with the &lt;em&gt;tpch&lt;/em&gt; catalog at
scale factor 3000.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;75%&quot; src=&quot;/assets/blog/qaz-ai-modern-data-stack/tpch-schema.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;For testing, we consider three tables: &lt;em&gt;lineitem&lt;/em&gt; (18 billion rows),
&lt;em&gt;orders&lt;/em&gt; (450 million rows) and &lt;em&gt;partsupp&lt;/em&gt; (2.4 billion rows).&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;iceberg2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tpch_sf3000_orders&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;--&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;450&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;M&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;ORC&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tpch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sf3000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;iceberg2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tpch_sf3000_lineitem&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;--&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;18&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;B&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;ORC&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tpch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sf3000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lineitem&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;iceberg2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tpch_sf3000_partsupp&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;--&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;B&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;ORC&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tpch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sf3000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;partsupp&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then we join all three tables as shown in the ER diagram. Let’s make it
more challenging by turning off one of the workers mid-query, which would
normally cause the query to fail. To enable automatic retries of failed
queries, we set &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;retry-policy=QUERY&lt;/code&gt; in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;config.properties&lt;/code&gt;.&lt;/p&gt;
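&lt;p&gt;As a sketch, the relevant &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;config.properties&lt;/code&gt; fragment on the coordinator looks like this (only the fault-tolerance setting is shown; larger query results may additionally require configuring an exchange manager):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Rerun the whole query if any of its tasks fail, e.g. after a worker goes down
retry-policy=QUERY
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;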

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;iceberg2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tpch_sf3000_lineitem_joined&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;ORC&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orderkey&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;partkey&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;suppkey&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;linenumber&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;quantity&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;extendedprice&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;discount&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tax&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;returnflag&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;linestatus&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shipdate&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;commitdate&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;receiptdate&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shipinstruct&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shipmode&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;comment&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;psupp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;availqty&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;psupp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;supplycost&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;ord&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shippriority&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;ord&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;totalprice&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;iceberg2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tpch_sf3000_lineitem&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;LEFT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;iceberg2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tpch_sf3000_partsupp&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;psupp&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;partkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;psupp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;partkey&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;suppkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;psupp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;suppkey&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;LEFT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;iceberg2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tpch_sf3000_orders&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ord&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orderkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ord&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orderkey&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The query completed in 4 hours. During processing, worker 22 was turned
off; the query was automatically started over and completed successfully. The
query joined all three tables (&lt;em&gt;the triple join&lt;/em&gt;): 18 billion rows x
2.4 billion rows x 450 million rows.&lt;/p&gt;
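&lt;p&gt;A quick way to sanity-check the result is, for example, a simple count over the joined table created above:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Should come out to lineitem&apos;s 18 billion rows, since both right-hand
-- tables are left-joined on keys that are unique on their side
SELECT count(*) FROM iceberg2.ice.tpch_sf3000_lineitem_joined;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;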

&lt;p&gt;This experiment gave us the confidence to move forward with our plan to
rebuild our architecture around Trino, performing analytical and
transformational work on data directly in S3, which allows us to remove HDFS
and Hive from these processes.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;100%&quot; src=&quot;/assets/blog/qaz-ai-modern-data-stack/new-architecture.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;As a result, we expect faster pipelines.&lt;/p&gt;

&lt;p&gt;A huge thanks to the Trino development team and the Trino community for an
excellent product that I enjoy using and that lets me go beyond conventional
usage patterns.&lt;/p&gt;

&lt;p&gt;If you are looking for help building your data warehouse, or if you’re
interested in joining us at QazAI, feel free to reach out to me at Baurzhan Kuspayev on the &lt;a href=&quot;https://join.slack.com/t/trinodb/shared_invite/zt-1aek3l6bn-ZMsvFZJqP1ULx5pU17WP1Q&quot;&gt;Trino Slack&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note from Trino community&lt;/em&gt;: We welcome blog submissions from the community. If you have blog ideas, please send a message in the #dev channel on the &lt;a href=&quot;https://join.slack.com/t/trinodb/shared_invite/zt-1aek3l6bn-ZMsvFZJqP1ULx5pU17WP1Q&quot;&gt;Trino Slack&lt;/a&gt;. We will mail you Trino swag as a token of appreciation for successful submissions.&lt;/p&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md waves-effect waves-light&quot; href=&quot;https://cutt.ly/qaz-ai-trino-reddit&quot;&gt;Discuss on Reddit&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md waves-effect waves-light&quot; href=&quot;https://news.ycombinator.com/item?id=31672725&quot;&gt;Discuss On Hacker News&lt;/a&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Baurzhan Kuspayev</name>
        </author>
      

      <summary>At QazAI, we build data lakes as a service for companies. In the original architecture, we get raw data in S3, transform the S3 data with Hive, and then deliver the data to business units via our datamart built on ClickHouse (for optimal delivery speeds). Over time, we were dragged down by the slower speeds and high costs of running Hive, and started shopping for a faster and cheaper open source engine for our ETL data transformations.</summary>

      
      
    </entry>
  
    <entry>
      <title>An opinionated guide to consolidating our data</title>
      <link href="https://trino.io/blog/2022/05/24/an-opinionated-guide-to-consolidating-our-data.html" rel="alternate" type="text/html" title="An opinionated guide to consolidating our data" />
      <published>2022-05-24T00:00:00+00:00</published>
      <updated>2022-05-24T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/05/24/an-opinionated-guide-to-consolidating-our-data</id>
      <content type="html" xml:base="https://trino.io/blog/2022/05/24/an-opinionated-guide-to-consolidating-our-data.html">&lt;h2 id=&quot;maximizing-your-experience-with-zero-choices&quot;&gt;Maximizing your experience with zero choices.&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;I’m publishing this blog post in partnership with the Trino community to go
along a lightning talk I’m giving for their event, Cinco de Trino. This article
was originally published &lt;a href=&quot;https://abhi-vaidyanatha.medium.com/an-opinionated-guide-to-consolidating-your-data-b09386b2b9b5&quot;&gt;on Abhi’s Medium
site&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“My data is all over the place and attempting to analyze or query it is not
only time consuming and expensive, but also emotionally taxing.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;!--more--&gt;

&lt;p&gt;Maybe you haven’t heard those exact words before, but data consolidation is a
real problem. It is common for organizations to have correlated data stored in
various silos or APIs. Performing consistent operations across these various
data sources requires understanding both architecture and surgery, skills that
you may not have picked up as a data practitioner. If you’re part of the Trino
community and are reading this post, you’ve likely encountered poorly performing
queries due to unconsolidated data.&lt;/p&gt;

&lt;p&gt;In the past, the data engineering world was not graced with the same level of
love and &lt;a href=&quot;https://tailwindcss.com/&quot;&gt;tooling&lt;/a&gt; as other communities, so we were
expected to make do with whatever came our way. In order to perform the wildly
basic task of moving our data around, we were asked to tithe large sums of money
to the closed-source ELT overlords.&lt;/p&gt;

&lt;p&gt;So where does that leave us? Thankfully things have changed, so here’s how you
can move all your data to a central location for free (well, minus the
infrastructure costs) while making few architectural choices.&lt;/p&gt;

&lt;h2 id=&quot;the-tool&quot;&gt;The tool&lt;/h2&gt;
&lt;p&gt;You don’t have too many choices for FOSS ELT/ETL.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://airbyte.com/&quot;&gt;Airbyte&lt;/a&gt; has been recently making waves as the main
contender for open-source ELT. As of writing this article, it’s only been around
for about two years, during which it has established itself as one of the fastest
growing startups in existence. It requires three terminal commands to deploy and
is managed entirely through a UI, so it’s operable by many. It also supports
syncing your data incrementally, so you don’t need to resync existing data when
you want to sync new data. It is relatively new, so some of the polish that
comes with an established project is not there yet. Think of it like a
precocious child.&lt;/p&gt;

&lt;p&gt;You could use &lt;a href=&quot;https://meltano.com/&quot;&gt;Meltano&lt;/a&gt; to take advantage of the large
&lt;a href=&quot;https://www.singer.io/&quot;&gt;Singer&lt;/a&gt; connector ecosystem, but it’s more complicated
to set up and is more of a holistic ops platform, which may be excessive for
your use case.&lt;/p&gt;

&lt;p&gt;You could also use this esoteric project called KETL that is only available at
this sketchy SourceForge &lt;a href=&quot;https://sourceforge.net/projects/ketl/&quot;&gt;link&lt;/a&gt;. But
maybe don’t do that.&lt;/p&gt;

&lt;p&gt;For consolidating your data, use Airbyte. It’s straightforward to set up,
requires minimal configuration, and has tightly scoped responsibilities.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;https://miro.medium.com/max/640/1*zqLMo7P3o_HG7EJ2E1dbpg.png&quot; /&gt;
&lt;/p&gt;

&lt;h2 id=&quot;the-destination&quot;&gt;The destination&lt;/h2&gt;

&lt;p&gt;Let’s use a data lake. Its unstructured nature leaves more flexibility in
how the data is used later, and we’ll assume that our data has not been
processed or filtered yet.&lt;/p&gt;

&lt;p&gt;Data warehouses are more expensive, require more upkeep, and benefit from the
ETL paradigm as opposed to ELT. Airbyte is an ELT tool focused mostly on the EL
bit, which makes it easier to use with unstructured data lakes.&lt;/p&gt;

&lt;p&gt;Additionally, S3 supports query engines such as Trino, which will allow us to
query and analyze our data once it’s been consolidated. Trino also functions as a
powerful data lake transformation engine, so if you’re on the fence due to data
malleability, this might help bring you over.&lt;/p&gt;

&lt;p&gt;We could use Azure Blob Storage or GCS, but for this tutorial, I’ll be keeping
it simple with Amazon S3. If you’ve set up an S3 bucket and IAM, skip the next
paragraph.&lt;/p&gt;

&lt;p&gt;Create a S3 bucket with default settings and grab an access key from IAM. To do
this, head to the top right of the screen in the AWS Management Console where it
says your email provider and then click on &lt;strong&gt;Security Credentials&lt;/strong&gt;. Click
&lt;strong&gt;Create New Access Key&lt;/strong&gt; and save that information for later.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;https://miro.medium.com/max/1202/1*mYeldXLcvi7iPBDZ1GKEug.png&quot; /&gt;
&lt;/p&gt;

&lt;h2 id=&quot;the-deployment&quot;&gt;The deployment&lt;/h2&gt;

&lt;p&gt;Today, we’ll be deploying Airbyte locally on a workstation. Alternatively, you
can deploy it on your own infrastructure, but this requires managing networking
and security, which is unpalatable for a quick demonstration. If you want your
syncs to continue running in perpetuity, you’ll want to deploy Airbyte
externally to your machine. For a guide to deploying Airbyte on EC2 click
&lt;a href=&quot;https://docs.airbyte.com/deploying-airbyte/on-aws-ec2&quot;&gt;here&lt;/a&gt;. For a guide to
deploying Airbyte on Kubernetes, click
&lt;a href=&quot;https://docs.airbyte.com/deploying-airbyte/on-plural&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To begin, install &lt;a href=&quot;https://www.docker.com/products/docker-desktop/&quot;&gt;Docker&lt;/a&gt; and
docker-compose on your workstation.&lt;/p&gt;

&lt;p&gt;Then clone the repository and spin up Airbyte with docker-compose.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone git@github.com:airbytehq/airbyte.git
cd airbyte
docker-compose up
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once you see the following banner, you’re good to go.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;https://miro.medium.com/max/1148/1*7Fg7Vwi5vgkg94SYRuACLQ.png&quot; /&gt;
&lt;/p&gt;

&lt;h2 id=&quot;the-data-sources&quot;&gt;The data sources&lt;/h2&gt;

&lt;p&gt;Head over to localhost:8000 on your machine, complete the sign-up flow, and
you’ll be greeted with an onboarding workflow. We’re going to skip this workflow
to emulate a traditional usage of Airbyte. Click on the Sources tab in the left
sidebar and click on +New Source. This is where we’ll be setting up all of our
disparate data sources.&lt;/p&gt;

&lt;p&gt;Search for your data sources in the drop down and fill out the required
configuration. If you’re having trouble setting up a particular data source,
head to the &lt;a href=&quot;https://docs.airbyte.com/&quot;&gt;Airbyte docs&lt;/a&gt;. There’s a dedicated page
for every connector; for example, this is the &lt;a href=&quot;https://docs.airbyte.com/integrations/sources/google-analytics-v4&quot;&gt;setup
guide&lt;/a&gt; for
the Google Analytics source. If you’re just testing Airbyte out, use the PokeAPI
source, as it lets you sync dummy data with no authentication. If your required
data source doesn’t exist, you can request it
&lt;a href=&quot;https://airbyte.com/connector-requests&quot;&gt;here&lt;/a&gt; or build it yourself by heading
&lt;a href=&quot;https://docs.airbyte.com/connector-development/&quot;&gt;here&lt;/a&gt; (isn’t open-source
great?)&lt;/p&gt;

&lt;p&gt;Once you have all of your data sources set up, it will look something like this.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;https://miro.medium.com/max/1400/1*6_sNtdhFKkSnicyqe2Hhmg.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Now we just need to set up our connection to S3 and we are good to go.&lt;/p&gt;

&lt;h2 id=&quot;the-destination-again&quot;&gt;The destination (again)&lt;/h2&gt;

&lt;p&gt;Head over to the &lt;em&gt;Destinations&lt;/em&gt; tab in the left sidebar and follow the same
process for setting up our connection to S3. Click on &lt;em&gt;+New Destination&lt;/em&gt; and
search for S3. Then fill out the configuration for your bucket. We’ll now use
that access key that we generated earlier!&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;https://miro.medium.com/max/1400/1*24LRs9-dB7l35DgsXU6pqQ.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;For output format, I recommend using Parquet for analytics purposes. It’s a
&lt;a href=&quot;https://www.qubole.com/tech-blog/columnar-format-in-data-lakes-for-dummies/&quot;&gt;columnar storage
format&lt;/a&gt;,
which is optimized for reads. JSON, CSV, and Avro are supported, but will be
less performant on read.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;https://miro.medium.com/max/1400/1*tVw2sbTLYDlHpKB97M7cKg.png&quot; /&gt;
&lt;/p&gt;

&lt;h2 id=&quot;the-connection&quot;&gt;The connection&lt;/h2&gt;

&lt;p&gt;Finally, head over to the &lt;strong&gt;Connections&lt;/strong&gt; tab in the sidebar and click &lt;strong&gt;+New
Connection&lt;/strong&gt;. You will need to do this process for each data source that you
have set up. Select any existing source and click your S3 Destination that you
set up from the drop down. I failed to set up a connection with my GitHub
source, so I navigated to the Airbyte Troubleshooting Discourse and filed an
issue. Response times are really fast there, so I’ll likely be able to resolve
this within a day or two.&lt;/p&gt;

&lt;p&gt;You will then be greeted with the following connection setup page. For most
analytics jobs, syncing more frequently than every 24 hours is expensive and
overkill, so stick with the default. For sources that support it, click on the
sync mode in the streams table to use the &lt;strong&gt;Incremental / Append&lt;/strong&gt; sync mode.
This ensures that every time you sync, Airbyte will check for new data and only
pull in data that you haven’t synced before.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;https://miro.medium.com/max/1400/1*FZyFWtb3P4sqO77p-WZjAw.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Once you hit &lt;strong&gt;Set up connection&lt;/strong&gt;, Airbyte will run your first sync! You can
click into your connection to get access to the sync logs, replication settings,
and transformation settings if supported.&lt;/p&gt;

&lt;p&gt;Checking our S3 bucket, we can see that our data has successfully arrived! If
you’re just testing things out, you’re done.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;https://miro.medium.com/max/1400/1*qrEc7u2hiUUZv4TO5qOv6A.png&quot; /&gt;
&lt;/p&gt;

&lt;h2 id=&quot;the-analysis&quot;&gt;The analysis&lt;/h2&gt;

&lt;p&gt;Now that you’ve set up your data pipelines, if you want to run transformation
jobs, Trino supports that use case well: Lyft, Pinterest, and Shopify have all
done this to great success. There’s also a &lt;a href=&quot;https://github.com/starburstdata/dbt-trino&quot;&gt;dbt-trino
plugin&lt;/a&gt; maintained by the folks over at
Starburst. Alternatively, you could accomplish this using &lt;a href=&quot;https://docs.aws.amazon.com/AmazonS3/latest/userguide/tutorial-s3-object-lambda-uppercase.html&quot;&gt;S3 Object
Lambda&lt;/a&gt;
if you want to stay within the AWS ecosystem where possible.&lt;/p&gt;
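
&lt;p&gt;As a sketch of what querying the synced files can look like (the schema,
table, column, and bucket names here are hypothetical examples), assuming
Airbyte wrote Parquet output for a GitHub stream, you could expose and query it
through the Hive connector:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Register the Parquet files Airbyte landed in S3 as an external table.
-- Table, column, and bucket names are placeholder examples.
CREATE TABLE hive.default.github_issues (
    id bigint,
    state varchar,
    created_at timestamp
)
WITH (
    external_location = 's3a://my-airbyte-bucket/github_issues/',
    format = 'PARQUET'
);

-- Query the synced data like any other Trino table.
SELECT state, count(*) AS issue_count
FROM hive.default.github_issues
GROUP BY state;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;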

&lt;p&gt;Once your data is in a queryable state, you can now use
&lt;a href=&quot;https://trino.io/docs/current/connector/hive-s3.html&quot;&gt;Trino&lt;/a&gt; or your favorite
query engine to your heart’s content! If you want to get started with querying
these heterogeneous data sources using Trino, here’s a &lt;a href=&quot;https://janakiev.com/blog/presto-trino-s3/&quot;&gt;getting-started
guide&lt;/a&gt; on how to do that. Finally,
join the &lt;a href=&quot;https://airbyte.com/community&quot;&gt;Airbyte&lt;/a&gt; and
&lt;a href=&quot;https://trino.io/community.html&quot;&gt;Trino&lt;/a&gt; communities to learn more about how
others are consolidating and querying their data.&lt;/p&gt;</content>

      
        <author>
          <name>Abhi Vaidyanatha</name>
        </author>
      

      <summary>Maximizing your experience with zero choices. I’m publishing this blog post in partnership with the Trino community to go along a lightning talk I’m giving for their event, Cinco de Trino. This article was originally published on Abhi’s Medium site “My data is all over the place and attempting to analyze or query it is not only time consuming and expensive, but also emotionally taxing.”</summary>

      
      
    </entry>
  
    <entry>
      <title>Cinco de Trino recap: Learn how to build an efficient data lake</title>
      <link href="https://trino.io/blog/2022/05/17/cinco-de-trino-recap.html" rel="alternate" type="text/html" title="Cinco de Trino recap: Learn how to build an efficient data lake" />
      <published>2022-05-17T00:00:00+00:00</published>
      <updated>2022-05-17T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/05/17/cinco-de-trino-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/05/17/cinco-de-trino-recap.html">&lt;p&gt;When Trino (formerly PrestoSQL) arrived on the scene almost 10 years ago, it
immediately became known as the much faster alternative to the data warehouse
of big data, Apache Hive. The use cases that you, as the community, have built
have far exceeded anything we imagined in complexity. Together we’ve made
Trino not only the fastest way to interactively query large data sets, but also
a convenient way to run federated queries across data sources to make moving all
the data optional.&lt;/p&gt;

&lt;p&gt;At Cinco de Trino, we came full circle back to the next iteration of analytics 
architecture with the data lake.  This conference offers advice from industry 
thought leaders about how to use the best lakehouse tools with Trino to manage that
data complexity. Hear from speakers like Martin Traverso
(Trino), Dain Sundstrom (Trino), James Campbell (Great Expectations), Jeremy 
Cohen (DBT Labs), Ryan Blue (Iceberg), Denny Lee (Delta Lake), Vinoth Chandar 
(Hudi). You can watch the talks on-demand on the 
&lt;a href=&quot;https://www.youtube.com/playlist?list=PLFnr63che7wYDHjUsmp43THLmAlqPDHlM&quot;&gt;Cinco de Trino playlist&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In this post, I’d like to cover the key items from each talk you won’t want to 
miss.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h3 id=&quot;keynote-trino-as-a-data-lakehouse&quot;&gt;Keynote: Trino as a data lakehouse&lt;/h3&gt;

&lt;p&gt;Trino co-creator, Martin Traverso, covers where Trino fits into the data lake
and brings you a sneak peek of the future of Trino. Polymorphic table
functions and adaptive query planning are just some of the many exciting features
Martin walks us through.&lt;/p&gt;

&lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/gwV3smFiGEg&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;h3 id=&quot;project-tardigrade&quot;&gt;Project Tardigrade&lt;/h3&gt;

&lt;p&gt;If you have one takeaway from the conference, let it be this: there’s a new way
in town to get 60% cost savings on your Trino deployment. Cory Darby walks
through how utilizing the fault-tolerant execution architecture has enabled
BlueCat to auto-scale their Trino clusters and run on spot instances, which
yielded massive cost savings. Zebing Lin goes through how this happens behind
the scenes, and how you can run resource-intensive ETL jobs using failure 
recovery delivered by the team behind Project Tardigrade.&lt;/p&gt;

&lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/MYBoeB_lQmo&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md waves-effect waves-light&quot; href=&quot;https://trino.io/blog/2022/05/05/tardigrade-launch.html&quot;&gt;Learn more in the Project Tardigrade blog »&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md waves-effect waves-light&quot; href=&quot;https://github.com/bitsondatadev/trino-getting-started/tree/main/kubernetes/tardigrade-eks&quot;&gt;Try Project Tardigrade Yourself »&lt;/a&gt;&lt;/p&gt;

&lt;h3 id=&quot;starburst-galaxy-lab&quot;&gt;Starburst Galaxy lab&lt;/h3&gt;

&lt;p&gt;Starburst Galaxy enables you to get Trino up and running without spending
your time setting up, scaling, and maintaining the infrastructure.
Trino co-creator, Dain Sundstrom, walks you through a fun-filled lab that
demonstrates how to use the Trino-as-a-service solution, Starburst Galaxy, to
generate &lt;a href=&quot;https://db-engines.com/en/ranking&quot;&gt;database rankings&lt;/a&gt; by ingesting,
cleaning, and analyzing Twitter and Stack Overflow data.&lt;/p&gt;

&lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/WQNqqkBd_Jo&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;h3 id=&quot;engineering-data-reliability-with-great-expectations&quot;&gt;Engineering data reliability with Great Expectations&lt;/h3&gt;

&lt;p&gt;Let’s be honest: when we claim to have run “tests” for our data pipelines, we
usually mean we checked that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;input != NULL&lt;/code&gt;, or that the dashboard isn’t broken.
James Campbell showcases the Great Expectations connector for Trino, officially
launched as the new way to write expectations (data quality checks) for your
data pipelines.&lt;/p&gt;

&lt;p&gt;What excites us the most?&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;The ability to take advantage of far more sophisticated data quality tests
than what any of us would write.&lt;/li&gt;
  &lt;li&gt;Having a really awesome UI to manage expectations.&lt;/li&gt;
  &lt;li&gt;The data source view that makes it easy to dynamically test your custom
data quality checks against backends.&lt;/li&gt;
&lt;/ol&gt;

&lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/9HE6LawCHP8&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;h3 id=&quot;bring-your-data-into-your-data-lake-with-airbyte&quot;&gt;Bring your data into your data lake with Airbyte&lt;/h3&gt;

&lt;p&gt;The first step of doing any analytics is bringing your data into the data lake.
Ingestion engines are a game-changer for centralizing your data in the data lake.
Until recently, there was no open source software to choose from in this category.
In just 10 minutes, Abhi Vaidyanatha takes us through the journey of taking in 
data from various places into your choice of data lake.&lt;/p&gt;

&lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/3E0jb4d2p0U&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md waves-effect waves-light&quot; href=&quot;https://abhi-vaidyanatha.medium.com/an-opinionated-guide-to-consolidating-your-data-b09386b2b9b5&quot;&gt;Read Abhi’s article about Airbyte + Trino »&lt;/a&gt;&lt;/p&gt;

&lt;h3 id=&quot;transforming-your-data-with-dbt&quot;&gt;Transforming your data with dbt&lt;/h3&gt;

&lt;p&gt;Ever had 300 lines of SQL in front of you, and wasted lots of time sifting
through it to find which part to edit to check for duplicate customers?&lt;/p&gt;

&lt;p&gt;Imagine having to update a decimal precision used frequently throughout that SQL
statement. What we &amp;lt;3 the most about dbt is that data engineering becomes much
more like software engineering, where you code in a much more modular way. Along
the way, you get many benefits. The ones we love the most? The data lineage graph
and automatic documentation: the stuff we always say is important, but never do.&lt;/p&gt;

&lt;p&gt;Even for dbt experts, there’s something new to learn. Jeremy Cohen goes through
new capabilities Trino brings to dbt, while showcasing cool features like
macros, a flexible alternative to SQL-defined functions.&lt;/p&gt;

&lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/UYS75sjTziU&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md waves-effect waves-light&quot; href=&quot;https://github.com/dbt-labs/trino-dbt-tpch-demo&quot;&gt;Check out Jeremy’s demo repo »&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;choosing-the-best-data-lakehouse-format-for-you&quot;&gt;Choosing the best data lakehouse format for you&lt;/h2&gt;

&lt;p&gt;Ever wonder about all the hype around the new table formats? Why is everyone
choosing Iceberg, Delta Lake, or Hudi over Hive? The founders of each of these
modern table formats showcase their projects and let you be the judge of which
format makes more sense for your architecture. Below are the highlights:&lt;/p&gt;

&lt;h3 id=&quot;iceberg&quot;&gt;Iceberg&lt;/h3&gt;

&lt;p&gt;Ryan Blue dives into important elements of your data lakehouse architecture that
affect daily operations and slow down developer efficiency. He then covers how
Iceberg is the solution he realized to solve those issues.&lt;/p&gt;

&lt;p&gt;The first special element of Iceberg is that it intentionally breaks
compatibility with the Hive format to bring you features like in-place table
partition and schema evolution. On the surface this may seem trivial, as we’ve
conditioned our minds to accept the limitations of Hive-like formats.&lt;/p&gt;

&lt;p&gt;The second special element is that Iceberg also builds a community-driven
specification that enables anyone to build a compatible implementation of the
Iceberg library.&lt;/p&gt;

&lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/1oXmBbB77ak&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;h3 id=&quot;delta-lake&quot;&gt;Delta Lake&lt;/h3&gt;

&lt;p&gt;90% of the time that our Trino data pipelines break, it’s because someone
committed a bad upstream change. With Delta Lake time travel (coming soon!), you
won’t need to spend a whole day pinpointing that bad change: just travel back in
time and identify which change that was. Denny Lee gives us a compelling 
argument for why users desire ACID guarantees in their data lakehouse and how
Delta Lake solves for that.&lt;/p&gt;

&lt;p&gt;Similar to Iceberg, Delta Lake offers optimistic concurrency, which allows
multiple writers to write to the same Delta Lake table while maintaining ACID
constraints on the data.&lt;/p&gt;

&lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/TB9Dxv71LxQ&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;h3 id=&quot;hudi-coming-soon-to-trino&quot;&gt;Hudi [Coming Soon to Trino]&lt;/h3&gt;

&lt;p&gt;The coolest part of the talk? Open up a world of new possibilities with near 
real-time analytics in Trino with Hudi. With Hudi, you get to serve real-time 
production systems, debug live issues, and more.&lt;/p&gt;

&lt;p&gt;Vinoth Chandar showcases the compelling use cases that drove innovation around
Hudi at Uber. He then covers how the architectures of data lakes and lakehouses
are starting to merge, and the implications this has for open versus proprietary
architectures.&lt;/p&gt;

&lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/r-fF9uqzUdE&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;h3 id=&quot;touch-talk-and-see-your-data-with-tableau&quot;&gt;Touch, talk, and see your data with Tableau&lt;/h3&gt;

&lt;p&gt;Tableau is our favorite data visualization tool, and in this session, Vlad 
Usatin of Tableau shares how to use Tableau to directly visualize your Trino 
data.&lt;/p&gt;

&lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/b6kKqNIMvuM&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;p&gt;Thank you to all who attended or viewed; we hope to see you again at our
upcoming events later this year. Continue the conversation in our
&lt;a href=&quot;https://join.slack.com/t/trinodb/shared_invite/zt-18acr4bvr-0DtaCwiLOrv1zetGnV_w~w&quot;&gt;Trino Slack&lt;/a&gt;.&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen, Brian Zhan</name>
        </author>
      

      <summary>When Trino (formerly PrestoSQL) arrived on the scene almost 10 years ago, it immediately became known as the much faster alternative to the data warehouse of big data, Apache Hive. The use cases that you, as the community, have built had far exceeded anything we had imagined in complexity. Together we’ve made Trino not only the fastest way to interactively query large data sets, but also a convenient way to run federated queries across data sources to make moving all the data optional. At Cinco de Trino, we came full circle back to the next iteration of analytics architecture with the data lake. This conference offers advice from industry thought leaders about how to use best lakehouse tools with Trino to manage that data complexity. Hear from industry thought leaders like Martin Traverso (Trino), Dain Sundstrom (Trino), James Campbell (Great Expectations), Jeremy Cohen (DBT Labs), Ryan Blue (Iceberg), Denny Lee (Delta Lake), Vinoth Chandar (Hudi). You can watch the talks on-demand on the Cinco de Trino playlist. In this post, I’d like to cover the key items from each talk you won’t want to miss.</summary>

      
      
    </entry>
  
    <entry>
      <title>Project Tardigrade delivers ETL at Trino speeds to early users</title>
      <link href="https://trino.io/blog/2022/05/05/tardigrade-launch.html" rel="alternate" type="text/html" title="Project Tardigrade delivers ETL at Trino speeds to early users" />
      <published>2022-05-05T00:00:00+00:00</published>
      <updated>2022-05-05T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/05/05/tardigrade-launch</id>
      <content type="html" xml:base="https://trino.io/blog/2022/05/05/tardigrade-launch.html">&lt;p&gt;After six months of challenging work on Project Tardigrade, we are ready to
launch. With this project, we improved the user experience of running
resource-intensive queries that are common in the Extract, Transform, Load
(ETL) and batch processing space. Getting to this point required some
significant and fascinating engineering. The latest Trino release includes
all the work from Project Tardigrade. Read on to learn how it all works, and
how to enable the fault-tolerant execution in Trino.&lt;/p&gt;

&lt;p align=&quot;center&quot; width=&quot;100%&quot;&gt;
    &lt;img width=&quot;50%&quot; src=&quot;/assets/blog/tardigrade-launch/tardigrade-logo.png&quot; /&gt;
&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;what-is-project-tardigrade&quot;&gt;What is Project Tardigrade?&lt;/h2&gt;

&lt;p&gt;What we love most about Trino is that you get fast query speeds, and you can
iterate fast with intuitive error messages, interactive experience, and query
federation.&lt;/p&gt;

&lt;p&gt;One big problem that has persisted for a long time is that configuring, tuning,
and managing Trino for long-running ETL workloads is very difficult. Following
are just some of the problems you have to deal with:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Reliable landing times:&lt;/em&gt; Queries that run for hours can fail. Restarting
them from scratch wastes resources and makes it hard for you to meet
your completion time requirements.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Cost-efficient clusters:&lt;/em&gt; Trino queries that need terabytes of distributed
memory require extremely large clusters due to the lack of iterative
execution.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Concurrency:&lt;/em&gt; Multiple independent clients may submit their queries
concurrently. Due to the lack of available resources at a certain moment some
of these queries may need to be killed and restarted from zero after a
while. This makes the landing time even more unpredictable.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://engineering.salesforce.com/how-to-etl-at-petabyte-scale-with-trino-5fe8ac134e36&quot;&gt;Structuring your workload&lt;/a&gt;
to avoid these problems can be done by a team of experts, but that expertise is
not accessible to most Trino users.&lt;/p&gt;

&lt;p&gt;The goal of Project Tardigrade is to provide an “out of the box” solution for the
problems mentioned above. We’ve designed a new
&lt;a href=&quot;https://github.com/trinodb/trino/wiki/Fault-Tolerant-Execution&quot;&gt;fault-tolerant execution architecture&lt;/a&gt;
that allows us to implement advanced resource-aware scheduling with granular
retries.&lt;/p&gt;

&lt;p&gt;Following are some of the benefits and results:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;When your long-running queries experience a failure, they don’t have to start
from scratch.&lt;/li&gt;
  &lt;li&gt;When queries require more memory than is currently available in the cluster,
they are still able to succeed.&lt;/li&gt;
  &lt;li&gt;When multiple queries are submitted concurrently they are able to share
resources in a fair way, and make steady progress.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino does all the hard work of allocating, configuring, and maintaining query
processing behind the scenes. Instead of spending time tuning Trino clusters to
match your workload requirements, or reorganizing your workload to match your
Trino cluster capabilities, you can spend your time on analytics and delivering
business value. And most importantly, your heart won’t skip a beat when you
wake up in the morning wondering whether that query landed on time.&lt;/p&gt;

&lt;h2 id=&quot;what-did-we-test-so-far&quot;&gt;What did we test so far?&lt;/h2&gt;

&lt;p&gt;Since there’s no publicly available testing query set for ETL use cases, we
handcrafted more than a hundred ETL-like queries based on the
&lt;a href=&quot;https://github.com/trinodb/trino-verifier-queries/tree/main/src/main/resources/queries/tpch/etl&quot;&gt;TPC-H&lt;/a&gt;
and
&lt;a href=&quot;https://github.com/trinodb/trino-verifier-queries/tree/main/src/main/resources/queries/tpcds/etl&quot;&gt;TPC-DS&lt;/a&gt;
datasets.&lt;/p&gt;

&lt;p&gt;To simulate real world settings, we deployed a cluster
&lt;a href=&quot;https://trino.io/docs/current/admin/fault-tolerant-execution.html&quot;&gt;configured for fault-tolerant execution&lt;/a&gt;
of 15 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;m5.8xlarge&lt;/code&gt; nodes and repeatedly executed thousands of queries over
datasets of different sizes (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;10GB&lt;/code&gt; / &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1TB&lt;/code&gt; / &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;10TB&lt;/code&gt;). The queries were
executed sequentially as well as with concurrency factors of 5, 10, and 20.
Failure recovery capabilities were tested by crashing a random node in a
cluster every couple of minutes while streaming a live workload.&lt;/p&gt;

&lt;p&gt;To validate new resource management capabilities we submitted all 22
&lt;a href=&quot;https://github.com/trinodb/trino-verifier-queries/tree/main/src/main/resources/queries/tpch/etl&quot;&gt;TPC-H&lt;/a&gt;
based queries simultaneously with fault-tolerant execution enabled and disabled.
With fault-tolerant execution disabled only two of them succeeded, while the 
remaining twenty queries failed with resource-related issues, such as
running out of memory. With fault tolerant execution enabled all of the
queries succeeded with no issues.&lt;/p&gt;

&lt;h2 id=&quot;how-do-i-enable-fault-tolerant-execution&quot;&gt;How do I enable fault-tolerant execution?&lt;/h2&gt;

&lt;p&gt;Fault-tolerant execution can only be enabled for an entire cluster.&lt;/p&gt;

&lt;p&gt;In general, we recommend running your long-running ETL queries and
short-running interactive workloads on different clusters.
This ensures that long-running ETL queries do not impact interactive workloads
and cause a bad user experience. Also note that any short-running,
interactive queries on a fault-tolerant cluster may experience higher latencies
due to the checkpoint mechanism.&lt;/p&gt;

&lt;h3 id=&quot;1-add-an-s3-bucket-for-checkpointing&quot;&gt;1. Add an S3 bucket for checkpointing&lt;/h3&gt;

&lt;p&gt;First you need to create an S3 bucket for spooling. We recommend configuring a
bucket lifecycle rule to automatically expire abandoned objects in the event of
a node crash. You can configure these rules using the
&lt;a href=&quot;https://docs.aws.amazon.com/cli/latest/reference/s3api/put-bucket-lifecycle-configuration.html&quot;&gt;s3api&lt;/a&gt;
command, as shown in the tutorial below.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;{
    &quot;Rules&quot;: [
        {
            &quot;Expiration&quot;: {
                &quot;Days&quot;: 1
            },
            &quot;ID&quot;: &quot;Expire&quot;,
            &quot;Filter&quot;: {},
            &quot;Status&quot;: &quot;Enabled&quot;,
            &quot;NoncurrentVersionExpiration&quot;: {
                &quot;NoncurrentDays&quot;: 1
            },
            &quot;AbortIncompleteMultipartUpload&quot;: {
                &quot;DaysAfterInitiation&quot;: 1
            }
        }
    ]
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
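
&lt;p&gt;As a sketch, assuming the JSON above is saved as
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lifecycle.json&lt;/code&gt; and your bucket is named
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;my-exchange-bucket&lt;/code&gt; (a placeholder), you can apply the rule
with a single &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s3api&lt;/code&gt; call:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;aws s3api put-bucket-lifecycle-configuration \
    --bucket my-exchange-bucket \
    --lifecycle-configuration file://lifecycle.json
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;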

&lt;h3 id=&quot;2-configure-the-trino-exchange-manager&quot;&gt;2. Configure the Trino exchange manager&lt;/h3&gt;

&lt;p&gt;Second, you need to configure the exchange manager. Add the file
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;exchange-manager.properties&lt;/code&gt; in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;etc&lt;/code&gt; folder of your Trino installation on
the coordinator and all workers with the following content:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;exchange-manager.name=filesystem
exchange.base-directories=s3://&amp;lt;bucket-name&amp;gt;
exchange.s3.region=us-east-1
exchange.s3.aws-access-key=&amp;lt;access-key&amp;gt;
exchange.s3.aws-secret-key=&amp;lt;secret-key&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;3-enable-task-level-retries&quot;&gt;3. Enable task level retries&lt;/h3&gt;

&lt;p&gt;Lastly, you need to configure and enable task level retries by adding the
following properties to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;config.properties&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;retry-policy=TASK
query.hash-partition-count=50
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Note: the filesystem exchange implementation currently does not support more
than 50 partitions.&lt;/p&gt;

&lt;h3 id=&quot;4-optional-recommended-settings&quot;&gt;4. Optional recommended settings&lt;/h3&gt;

&lt;p&gt;It is also recommended to enable compression to reduce the amount of data spooled
on S3 (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;exchange.compression-enabled=true&lt;/code&gt;) as well as reduce the low memory
killer delay to allow the resource manager to unblock nodes running short on memory
faster (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.low-memory-killer.delay=0s&lt;/code&gt;). Additionally, we recommend enabling
automatic writer scaling to optimize output file size for tables created with
Trino (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;scale-writers=true&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;To increase overall throughput and reduce resource-related task retries, we
recommend adjusting the concurrency settings based on the hardware
configuration you have chosen.&lt;/p&gt;

&lt;p&gt;Following are the settings for the hardware used in our testing (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;32&lt;/code&gt; vCPUs,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;128GB&lt;/code&gt; memory and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;10Gbit/s&lt;/code&gt; network):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;task.concurrency=8
task.writer-count=4
fault-tolerant-execution-target-task-input-size=4GB
fault-tolerant-execution-target-task-split-count=64
fault-tolerant-execution-task-memory=5GB
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;By default Trino is configured to wait up to five minutes for a task to recover
before considering it lost and rescheduling it. This timeout
can be increased or reduced as necessary by adjusting the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.remote-task.max-error-duration&lt;/code&gt; configuration property. For example:
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.remote-task.max-error-duration=1m&lt;/code&gt;&lt;/p&gt;

&lt;h2 id=&quot;deploying-on-aws-with-helm-and-kubernetes&quot;&gt;Deploying on AWS with Helm and Kubernetes&lt;/h2&gt;

&lt;p&gt;To test out Tardigrade features, you need at least a cluster with a dedicated
coordinator and two workers for a minimal level of parallelism and performance.
The quickest and easiest way to meet all of the specifications mentioned
above is by using the
&lt;a href=&quot;https://artifacthub.io/packages/helm/trino/trino&quot;&gt;Trino Helm chart&lt;/a&gt; with the
provided &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;values.yml&lt;/code&gt; and deploying a cluster to the AWS EKS cloud
service. If you are not familiar with deploying Trino on Kubernetes, we
recommend you take a look at the Trino Community Broadcast episodes covering
&lt;a href=&quot;https://trino.io/episodes/24.html&quot;&gt;local Trino on Kubernetes&lt;/a&gt; and
&lt;a href=&quot;https://trino.io/episodes/31.html&quot;&gt;deploying Trino on EKS&lt;/a&gt;.&lt;/p&gt;

&lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/4isawxYjDnE&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md waves-effect waves-light&quot; href=&quot;https://github.com/bitsondatadev/trino-getting-started/tree/main/kubernetes/tardigrade-eks&quot;&gt;Try Project Tardigrade Yourself »&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;closing-notes&quot;&gt;Closing notes&lt;/h2&gt;

&lt;p&gt;Project Tardigrade has been a great success for us already. We learned a lot
and significantly improved Trino. Now we are ready to share this with you all,
and look forward to fixing anything you run into. We really want you to push
the limits, and let us know what you find.&lt;/p&gt;

&lt;p&gt;If running fast batch jobs on the fastest state-of-the-art query engine 
interests you, consider playing around with the tutorial above and giving us 
your feedback. You can reach us on the &lt;a href=&quot;https://bit.ly/3IFlNXy&quot;&gt;#project-tardigrade&lt;/a&gt; 
channel in our &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;Slack&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you would like to write about your experience and results, or become a
contributor, also let us know on the &lt;a href=&quot;https://bit.ly/3IFlNXy&quot;&gt;#project-tardigrade&lt;/a&gt;
channel. We are happy to send you Tardigrade swag as a thank you.&lt;/p&gt;

&lt;p&gt;Thanks for reading and learning with us today. Happy Querying!&lt;/p&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md waves-effect waves-light&quot; href=&quot;https://www.reddit.com/r/dataengineering/comments/uj2aez/etl_at_trino_speeds_and_a_stepbystep_tutorial_on/&quot;&gt;Discuss on Reddit&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md waves-effect waves-light&quot; href=&quot;https://news.ycombinator.com/item?id=31276058&quot;&gt;Discuss On Hacker News&lt;/a&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Andrii Rosa, Brian Olsen, Brian Zhan, Lukasz Osipiuk, Martin Traverso, Zebing Lin</name>
        </author>
      

      <summary>After six months of challenging work on Project Tardigrade, we are ready to launch. With the project we improved the user experience of running resource intensive queries that are common in the Extract, Transform, Load (ETL) and batch processing space. It required some significant and fascinating engineering to get us to the current status. The latest Trino release includes all the work from Project Tardigrade. Read on to learn how it all works, and how to enable the fault-tolerant execution in Trino.</summary>

      
      
    </entry>
  
    <entry>
      <title>Tardigrade Project Update</title>
      <link href="https://trino.io/blog/2022/02/16/tardigrade-project-update.html" rel="alternate" type="text/html" title="Tardigrade Project Update" />
      <published>2022-02-16T00:00:00+00:00</published>
      <updated>2022-02-16T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/02/16/tardigrade-project-update</id>
      <content type="html" xml:base="https://trino.io/blog/2022/02/16/tardigrade-project-update.html">&lt;p&gt;Over the last couple of months we’ve added support for full query retries, landed experimental support 
for task level retries and provided a proof of concept implementation of a distributed exchange plugin 
(description below). We are still working on improving scheduling algorithms as well as optimizing 
exchange plugin implementation to make the task level retries fully usable.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Here is a quick summary of our progress so far:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Added support for &lt;a href=&quot;https://github.com/trinodb/trino/pull/9361&quot;&gt;automatic query retries&lt;/a&gt;. This functionality 
is ready to use and can be enabled by setting the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;retry_policy=QUERY&lt;/code&gt; session property. Now 
&lt;a href=&quot;https://github.com/trinodb/trino/pull/10507&quot;&gt;it is possible&lt;/a&gt; to enable automatic retries for queries that 
produce more than &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;32MB&lt;/code&gt; of output. Dynamic filtering is now also 
&lt;a href=&quot;https://github.com/trinodb/trino/pull/10274&quot;&gt;fully supported&lt;/a&gt; with automatic query retries enabled.&lt;/li&gt;
  &lt;li&gt;Landed an &lt;a href=&quot;https://github.com/trinodb/trino/pull/9818&quot;&gt;initial set of changes&lt;/a&gt; to support task level retries. 
To enable them, a plugin implementing the 
&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/core/trino-spi/src/main/java/io/trino/spi/exchange/ExchangeManager.java&quot;&gt;ExchangeManager&lt;/a&gt; 
interface must be installed.&lt;/li&gt;
  &lt;li&gt;Landed a &lt;a href=&quot;https://github.com/trinodb/trino/pull/10823&quot;&gt;proof of concept implementation&lt;/a&gt; of the 
&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/core/trino-spi/src/main/java/io/trino/spi/exchange/ExchangeManager.java&quot;&gt;ExchangeManager&lt;/a&gt; 
interface. The implementation is fully functional; however, we are still &lt;a href=&quot;https://github.com/trinodb/trino/issues/11050&quot;&gt;working on optimizing the read path&lt;/a&gt;. 
For now, only S3-compatible file systems are supported.&lt;/li&gt;
  &lt;li&gt;Added support for automatic retries in &lt;a href=&quot;https://github.com/trinodb/trino/issues/10252&quot;&gt;Hive&lt;/a&gt; and &lt;a href=&quot;https://github.com/trinodb/trino/pull/10622&quot;&gt;Iceberg&lt;/a&gt;. 
Supporting automatic retries for &lt;a href=&quot;https://github.com/trinodb/trino/issues/10254&quot;&gt;JDBC based connectors&lt;/a&gt; is up for grabs.&lt;/li&gt;
  &lt;li&gt;Implemented &lt;a href=&quot;https://github.com/trinodb/trino/pull/10837&quot;&gt;weight based split assignment&lt;/a&gt; for balanced work distribution between fault tolerant tasks.&lt;/li&gt;
  &lt;li&gt;Working on an &lt;a href=&quot;https://github.com/trinodb/trino/pull/11023&quot;&gt;adaptive sizing strategy for intermediate tasks&lt;/a&gt; to minimize scheduling overhead 
while keeping the cost of a single task failure to a minimum.&lt;/li&gt;
  &lt;li&gt;Making progress on introducing an &lt;a href=&quot;https://github.com/trinodb/trino/pull/10432&quot;&gt;advanced memory aware scheduling&lt;/a&gt; that would allow us 
to better support memory intensive queries, improve resource utilization and ensure fair resource allocation between queries.&lt;/li&gt;
  &lt;li&gt;Started working on &lt;a href=&quot;https://github.com/trinodb/trino/issues/9935&quot;&gt;supporting dynamic filtering&lt;/a&gt; for queries with task level retries enabled.&lt;/li&gt;
  &lt;li&gt;Working on &lt;a href=&quot;https://github.com/trinodb/trino/issues/10734&quot;&gt;accommodating failed attempts&lt;/a&gt; in various internal statistics reported by 
the engine (e.g.: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;QueryInfo&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;QueryCompletedEvent&lt;/code&gt;). &lt;a href=&quot;https://github.com/trinodb/trino/issues/10754&quot;&gt;UI changes&lt;/a&gt; will come next.&lt;/li&gt;
&lt;/ul&gt;
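
&lt;p&gt;As a quick sketch, trying out automatic query retries only requires the
session property mentioned above; the queried table here is just an example
using the TPC-H connector:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SET SESSION retry_policy = &apos;QUERY&apos;;
-- subsequent queries in this session are automatically retried on failure
SELECT count(*) FROM tpch.sf1000.lineitem;
&lt;/code&gt;&lt;/pre&gt;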

&lt;p&gt;Over the next couple of weeks we are planning to focus on:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/11050&quot;&gt;Optimizing read path for the reference implementation of the exchange plugin&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Landing &lt;a href=&quot;https://github.com/trinodb/trino/pull/10432&quot;&gt;memory aware scheduling for fault tolerant execution&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Landing &lt;a href=&quot;https://github.com/trinodb/trino/pull/11023&quot;&gt;adaptive sizing for intermediate tasks&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/10734&quot;&gt;Accommodating failed attempts into query statistics reporting&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Making progress on &lt;a href=&quot;https://github.com/trinodb/trino/issues/9935&quot;&gt;supporting dynamic filtering&lt;/a&gt; for queries with task level retries enabled&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The current state of development can be tracked by following this &lt;a href=&quot;https://github.com/trinodb/trino/issues/9101&quot;&gt;issue&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Stay tuned!&lt;/p&gt;</content>

      
        <author>
          <name>Andrii Rosa</name>
        </author>
      

      <summary>Over the last couple of months we’ve added support for full query retries, landed experimental support for task level retries and provided a proof of concept implementation of a distributed exchange plugin (description below). We are still working on improving scheduling algorithms as well as optimizing exchange plugin implementation to make the task level retries fully usable.</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino 2021 Wrapped: A Year of Growth</title>
      <link href="https://trino.io/blog/2021/12/31/trino-2021-a-year-of-growth.html" rel="alternate" type="text/html" title="Trino 2021 Wrapped: A Year of Growth" />
      <published>2021-12-31T00:00:00+00:00</published>
      <updated>2021-12-31T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/12/31/trino-2021-a-year-of-growth</id>
      <content type="html" xml:base="https://trino.io/blog/2021/12/31/trino-2021-a-year-of-growth.html">&lt;p&gt;As we reflect on Trino’s journey in 2021, one thing stands out: growth has
accelerated even beyond previous years. Yes, this is what all these
year-in-retrospect blog posts say, but this one carries some special
significance. This week marked the one-year anniversary since the 
project &lt;a href=&quot;https://trino.io/blog/2020/12/27/announcing-trino.html&quot;&gt;dropped the Presto name and moved to the Trino name&lt;/a&gt;.
Immediately after the announcement, the &lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;Trino GitHub repository&lt;/a&gt;
started trending in number of stargazers. Up until this point, the PrestoSQL
GitHub repository had only amassed 1,600 stargazers in the two years since it 
had split from the PrestoDB repository. However, within four months after the 
renaming, the number of stargazers had doubled. GitHub stars, issues, pull 
requests and commits started growing at a new trajectory.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p align=&quot;center&quot;&gt;
 &lt;a href=&quot;https://twitter.com/bitsondatadev/status/1344028682126565381&quot; target=&quot;_blank&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;/assets/blog/2021-review/trending.png&quot; /&gt;
 &lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;At the time of writing, we just hit 4,600 stargazers on GitHub. This means we 
have grown by over 3,000 stargazers in the last year, a 187% increase. While we 
are on the subject, let’s talk about the health of the Trino community.&lt;/p&gt;

&lt;h2 id=&quot;2021-by-the-numbers&quot;&gt;2021 by the numbers&lt;/h2&gt;

&lt;p&gt;Let’s take a look at the Trino project growth by the numbers:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;3679 new commits 💻 in GitHub&lt;/li&gt;
  &lt;li&gt;3015 new stargazers ⭐ in GitHub&lt;/li&gt;
  &lt;li&gt;2450 new members 👋 in Slack&lt;/li&gt;
  &lt;li&gt;1979 pull requests merged ✅ in GitHub&lt;/li&gt;
  &lt;li&gt;1213 issues 📝 created in GitHub&lt;/li&gt;
  &lt;li&gt;988 new followers 🐦 on Twitter&lt;/li&gt;
  &lt;li&gt;525 average weekly members 💬 in Slack&lt;/li&gt;
  &lt;li&gt;491 new subscribers 📺 in YouTube&lt;/li&gt;
  &lt;li&gt;23 Trino Community Broadcast ▶️ episodes&lt;/li&gt;
  &lt;li&gt;17 Trino 🚀 releases&lt;/li&gt;
  &lt;li&gt;13 blog ✍️ posts&lt;/li&gt;
  &lt;li&gt;10 Trino 🍕 meetups&lt;/li&gt;
  &lt;li&gt;1 Trino ⛰️ Summit&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Along with the growth we’ve seen in GitHub, we have seen a 47% growth of &lt;a href=&quot;https://twitter.com/trinodb&quot;&gt;the Trino Twitter&lt;/a&gt; 
followers this year. &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;The Trino Slack community&lt;/a&gt;,
where a large amount of troubleshooting and development discussions occur, saw a
75% growth, nearing 6,000 members. Finally, &lt;a href=&quot;https://www.youtube.com/c/TrinoDB&quot;&gt;the Trino YouTube channel&lt;/a&gt;
has seen an impressive 280% growth in subscribers.&lt;/p&gt;

&lt;p&gt;A lot of the increase on this channel was due to the &lt;a href=&quot;/broadcast/&quot;&gt;Trino Community Broadcast&lt;/a&gt;, 
which brought users and contributors from the community to cover 23 episodes
about the following topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;7 episodes on the Trino ecosystem (dbt, Amundsen, Debezium, Superset)&lt;/li&gt;
  &lt;li&gt;4 episodes on the Trino project (Renaming Trino, Intro to Trino, Trinewbies)&lt;/li&gt;
  &lt;li&gt;4 episodes on Trino connectors (Iceberg, Druid, Pinot)&lt;/li&gt;
  &lt;li&gt;4 episodes on Trino internals (Distributed Hash-Joins, Dynamic Filtering, Views)&lt;/li&gt;
  &lt;li&gt;2 episodes on Trino using Kubernetes (Trinetes series)&lt;/li&gt;
  &lt;li&gt;2 episodes on Trino users (LinkedIn, Resurface)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While stargazers, subscribers, episodes, and followers tell the story of the 
growing awareness of the Trino project with the new name, what about the actual
rate of development on the project?&lt;/p&gt;

&lt;p&gt;At the start of the year, there were 21,924 commits. This year, we pushed 3,679 
commits to the repository, sitting at over 25,600 now. Looking at the graph, this
keeps us pretty consistent with 2020’s throughput.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;75%&quot; src=&quot;/assets/blog/2021-review/commits.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;With the project’s trajectory displayed in numbers, let’s examine the top 
features that landed in Trino this year.&lt;/p&gt;

&lt;h2 id=&quot;features&quot;&gt;Features&lt;/h2&gt;

&lt;p&gt;Here’s a high-level list of the most exciting features that made their way into
Trino in 2021. For details and to keep up you can check out the &lt;a href=&quot;https://trino.io/docs/current/release.html&quot;&gt;release notes&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;sql-language-improvements&quot;&gt;SQL language improvements&lt;/h3&gt;

&lt;p&gt;SQL language support is crucial as the complexity of queries and the usage of
Trino increase. In 2021 we added numerous new language features and 
improvements:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2021/05/19/row_pattern_matching.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt;&lt;/a&gt;,
a feature that allows for complex analysis across multiple rows. To learn more 
about this feature, watch &lt;a href=&quot;/episodes/23.html&quot;&gt;the Community Broadcast show&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/sql/select.html#window-clause&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WINDOW&lt;/code&gt;&lt;/a&gt; clause.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2021/03/10/introducing-new-window-features.html#new%20features&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RANGE&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROWS&lt;/code&gt;&lt;/a&gt;
keywords for usage within a window function.&lt;/li&gt;
  &lt;li&gt;Time travel support and syntax, like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FOR VERSION AS OF&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FOR TIMESTAMP AS OF&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/sql/update.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt;&lt;/a&gt; is supported.&lt;/li&gt;
  &lt;li&gt;Subquery expressions that return multiple columns. Example: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT x = (VALUES (1, &apos;a&apos;))&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Add support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER MATERIALIZED VIEW&lt;/code&gt; … &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RENAME TO&lt;/code&gt; …&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/functions/geospatial.html#from_geojson_geometry&quot;&gt;from_geojson_geometry/to_geojson_geometry&lt;/a&gt; functions.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/functions/ipaddress.html#ip-address-contains&quot;&gt;contains&lt;/a&gt; 
function for checking if a CIDR contains an IP address.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/functions/aggregate.html#listagg&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;listagg&lt;/code&gt;&lt;/a&gt;
function that returns concatenated values separated by a specified separator.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/functions/string.html#soundex&quot;&gt;soundex&lt;/a&gt; function
that returns the phonetic representation of a string.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/functions/conversion.html#format_number&quot;&gt;format_number&lt;/a&gt; function.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/sql/set-time-zone.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SET TIME ZONE&lt;/code&gt;&lt;/a&gt; to set the
 current time zone for the session.&lt;/li&gt;
  &lt;li&gt;Arbitrary queries in &lt;a href=&quot;https://trino.io/docs/current/sql/show-stats.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW STATS&lt;/code&gt;&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CURRENT_CATALOG&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CURRENT_SCHEMA&lt;/code&gt; session functions.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TRUNCATE TABLE&lt;/code&gt; which allows for a more efficient delete.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DENY&lt;/code&gt; statement, which enables you to remove a user&apos;s or group&apos;s access via SQL.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IN &amp;lt;catalog&amp;gt;&lt;/code&gt; clause to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE ROLE&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DROP ROLE&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GRANT ROLE&lt;/code&gt;, 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;REVOKE ROLE&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SET ROLE&lt;/code&gt; to specify the target catalog of the statement 
instead of using the current session catalog.&lt;/li&gt;
&lt;/ul&gt;
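
&lt;p&gt;A few of these additions in action. This is only a sketch: the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders&lt;/code&gt; table in the
time travel query is hypothetical, and time travel requires a connector that
supports it:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- set the time zone for the current session
SET TIME ZONE &apos;America/Los_Angeles&apos;;

-- concatenate values with the new listagg function
SELECT listagg(name, &apos;, &apos;) WITHIN GROUP (ORDER BY name)
FROM tpch.tiny.region;

-- query a table as of a past point in time
SELECT * FROM orders FOR TIMESTAMP AS OF TIMESTAMP &apos;2021-12-01 00:00:00 UTC&apos;;
&lt;/code&gt;&lt;/pre&gt;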

&lt;h3 id=&quot;query-processing-improvements&quot;&gt;Query processing improvements&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Added support for automatic query retries (this feature is very experimental
with some limitations for now).&lt;/li&gt;
  &lt;li&gt;Transparent query retries.&lt;/li&gt;
  &lt;li&gt;Updated the behavior of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROW&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JSON&lt;/code&gt; cast to produce &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JSON&lt;/code&gt; objects instead
of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JSON&lt;/code&gt; arrays.&lt;/li&gt;
  &lt;li&gt;Column and table lineage tracking in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;QueryCompletedEvent&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
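
&lt;p&gt;To illustrate the changed cast behavior, a named
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROW&lt;/code&gt; now maps to a JSON
object keyed by field name (a sketch; the exact rendering of numeric values may
differ):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SELECT CAST(CAST(ROW(1, 2.0) AS ROW(x BIGINT, y DOUBLE)) AS JSON);
-- previously rendered as a JSON array; now produces an object such as {&quot;x&quot;:1,&quot;y&quot;:2.0}
&lt;/code&gt;&lt;/pre&gt;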

&lt;h2 id=&quot;performance-improvements&quot;&gt;Performance improvements&lt;/h2&gt;

&lt;p&gt;Improved performance for the following operations:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Querying Parquet data for files containing column indexes.&lt;/li&gt;
  &lt;li&gt;Reading dictionary-encoded Parquet files.&lt;/li&gt;
  &lt;li&gt;Queries using &lt;a href=&quot;https://trino.io/docs/current/functions/window.html#rank&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rank()&lt;/code&gt;&lt;/a&gt; window function.&lt;/li&gt;
  &lt;li&gt;Queries using &lt;a href=&quot;https://trino.io/docs/current/functions/aggregate.html#sum&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sum()&lt;/code&gt;&lt;/a&gt;
and &lt;a href=&quot;https://trino.io/docs/current/functions/aggregate.html#avg&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;avg()&lt;/code&gt;&lt;/a&gt; for 
decimal types.&lt;/li&gt;
  &lt;li&gt;Queries using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt; with single grouping column.&lt;/li&gt;
  &lt;li&gt;Aggregation on decimal values.&lt;/li&gt;
  &lt;li&gt;Evaluation of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WHERE&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; clause.&lt;/li&gt;
  &lt;li&gt;Computing the product of decimal values with precision larger than 19.&lt;/li&gt;
  &lt;li&gt;Queries that process row or array data.&lt;/li&gt;
  &lt;li&gt;Queries that contain a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DISTINCT&lt;/code&gt; clause.&lt;/li&gt;
  &lt;li&gt;Reduced memory usage and improved performance of joins.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY LIMIT&lt;/code&gt; performance was improved when data was pre-sorted.&lt;/li&gt;
  &lt;li&gt;Node-local dynamic filtering.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;security&quot;&gt;Security&lt;/h2&gt;

&lt;p&gt;Added the following improvements and features relevant for authentication, 
authorization and integration with other security systems:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Automatic configuration of TLS for 
&lt;a href=&quot;https://trino.io/docs/current/security/internal-communication.html&quot;&gt;secure internal communication&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Handling of Server Name Indication (SNI) for multiple TLS certificates.
This removes the need to provision per-worker TLS certificates.&lt;/li&gt;
  &lt;li&gt;Access control for materialized views.&lt;/li&gt;
  &lt;li&gt;OAuth2/OIDC &lt;a href=&quot;https://trino.io/docs/current/security/oauth2.html&quot;&gt;opaque access tokens&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Configuring HTTP proxy for OAuth2 authentication.&lt;/li&gt;
  &lt;li&gt;Configuring &lt;a href=&quot;https://trino.io/docs/current/security/authentication-types.html#multiple-password-authenticators&quot;&gt;multiple password authentication plugins&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Hiding inaccessible columns from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT *&lt;/code&gt; statement.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;data-sources&quot;&gt;Data Sources&lt;/h2&gt;

&lt;h3 id=&quot;bigquery-connector&quot;&gt;BigQuery connector&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Added &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE TABLE&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DROP TABLE&lt;/code&gt; support.&lt;/li&gt;
  &lt;li&gt;Added support for case insensitive name matching for BigQuery views.&lt;/li&gt;
  &lt;li&gt;Support reading &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bignumeric&lt;/code&gt; type whose precision is less than or equal to 
38.&lt;/li&gt;
  &lt;li&gt;Added support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE SCHEMA&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DROP SCHEMA&lt;/code&gt; statements.&lt;/li&gt;
  &lt;li&gt;Improved support for BigQuery datetime and timestamp types.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;cassandra-connector&quot;&gt;Cassandra connector&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Mapped Cassandra &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;uuid&lt;/code&gt; type to Trino &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;uuid&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Added support for Cassandra &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tuple&lt;/code&gt; type.&lt;/li&gt;
  &lt;li&gt;Changed minimum number of speculative executions from two to one.&lt;/li&gt;
  &lt;li&gt;Support for reading user-defined types.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;clickhouse-connector&quot;&gt;ClickHouse connector&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Added &lt;a href=&quot;https://trino.io/docs/current/connector/clickhouse.html&quot;&gt;ClickHouse connector&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Improved performance of aggregation queries by computing aggregations within 
ClickHouse. Currently, the following aggregate functions are eligible for
pushdown: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;count&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;min&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;max&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sum&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;avg&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Added support for dropping columns.&lt;/li&gt;
  &lt;li&gt;Map ClickHouse &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UUID&lt;/code&gt; columns as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UUID&lt;/code&gt; type in Trino instead of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VARCHAR&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;hdfs-s3-azure-and-cloud-object-storage-systems&quot;&gt;HDFS, S3, Azure and cloud object storage systems&lt;/h3&gt;

&lt;p&gt;A core use case of Trino uses the Hive and Iceberg connectors to connect to
a data lake. These connectors differ from most, as Trino is the sole query engine
rather than a client calling another system. Here are some of the changes made
to these connectors:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Enabled Glue statistics to support better query planning when using AWS.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt; support for ACID tables.&lt;/li&gt;
  &lt;li&gt;Many Hive view improvements.&lt;/li&gt;
  &lt;li&gt;Parquet column indexes.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;target_max_file_size&lt;/code&gt; configuration to control the file size of data written
by Trino.&lt;/li&gt;
  &lt;li&gt;Streaming uploads to S3 by default to improve performance and reduce disk usage.&lt;/li&gt;
  &lt;li&gt;Improved performance for tables with small files and partitioned tables.&lt;/li&gt;
  &lt;li&gt;Transparent redirection from a Hive catalog to Iceberg catalog if the table is
an Iceberg table.&lt;/li&gt;
  &lt;li&gt;Updated to Iceberg 0.11.0 behavior for transforms of dates and timestamps
before 1970.&lt;/li&gt;
  &lt;li&gt;Added procedure &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;system.flush_metadata_cache()&lt;/code&gt; to flush metadata caches.&lt;/li&gt;
  &lt;li&gt;Avoid generating splits for empty files.&lt;/li&gt;
  &lt;li&gt;Sped up Iceberg query performance when dynamic filtering can be leveraged.&lt;/li&gt;
  &lt;li&gt;Increased Iceberg performance when reading timestamps from Parquet files.&lt;/li&gt;
  &lt;li&gt;Improved Iceberg performance for queries on nested data through dereference
pushdown.&lt;/li&gt;
  &lt;li&gt;Added support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT OVERWRITE&lt;/code&gt; operations on S3-backed tables.&lt;/li&gt;
  &lt;li&gt;Made the Iceberg &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;uuid&lt;/code&gt; type available.&lt;/li&gt;
  &lt;li&gt;Trino views made available in Iceberg.&lt;/li&gt;
&lt;/ul&gt;
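
&lt;p&gt;Two of these additions as a sketch, assuming a Hive catalog named
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive&lt;/code&gt; (the property
value shown is just an example):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- cap the size of files written by Trino for this session
SET SESSION hive.target_max_file_size = &apos;1GB&apos;;

-- flush the metadata caches of the catalog
CALL hive.system.flush_metadata_cache();
&lt;/code&gt;&lt;/pre&gt;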

&lt;h3 id=&quot;elasticsearch-connector&quot;&gt;Elasticsearch connector&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Added support for reading fields as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;json&lt;/code&gt; values.&lt;/li&gt;
  &lt;li&gt;Fixed failure when documents contain fields of unsupported types.&lt;/li&gt;
  &lt;li&gt;Added support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;scaled_float&lt;/code&gt; type.&lt;/li&gt;
  &lt;li&gt;Added support for assuming an IAM role.&lt;/li&gt;
  &lt;li&gt;Added retry requests with backoff when Elasticsearch is overloaded.&lt;/li&gt;
  &lt;li&gt;Better support for Elastic Cloud.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;mongodb-connector&quot;&gt;MongoDB connector&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Added &lt;a href=&quot;https://trino.io/docs/current/connector/mongodb.html#timestamp_objectid&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;timestamp_objectid()&lt;/code&gt;&lt;/a&gt;
function.&lt;/li&gt;
  &lt;li&gt;Enabled &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mongodb.socket-keep-alive&lt;/code&gt; config property by default.&lt;/li&gt;
  &lt;li&gt;Add support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;json&lt;/code&gt; type.&lt;/li&gt;
  &lt;li&gt;Support reading MongoDB &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DBRef&lt;/code&gt; type.&lt;/li&gt;
  &lt;li&gt;Allow skipping creation of an index for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_schema&lt;/code&gt; collection, if it 
already exists.&lt;/li&gt;
  &lt;li&gt;Added support to redact the value of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mongodb.credentials&lt;/code&gt; in the server log.&lt;/li&gt;
  &lt;li&gt;Added support for dropping columns.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;mysql-connector&quot;&gt;MySQL connector&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Added support for reading and writing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;timestamp&lt;/code&gt; values with precision higher
than three.&lt;/li&gt;
  &lt;li&gt;Added support for predicate pushdown on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;timestamp&lt;/code&gt; columns.&lt;/li&gt;
  &lt;li&gt;Exclude an internal &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sys&lt;/code&gt; schema from schema listings.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;pinot-connector&quot;&gt;Pinot connector&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Updated Pinot connector to be compatible with versions &amp;gt;= 0.8.0 and dropped 
support for older versions.&lt;/li&gt;
  &lt;li&gt;Added support for pushdown of filters on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;varbinary&lt;/code&gt; columns to Pinot.&lt;/li&gt;
  &lt;li&gt;Fixed incorrect results for queries that contain aggregations and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IN&lt;/code&gt; and 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NOT IN&lt;/code&gt; filters over varchar columns.&lt;/li&gt;
  &lt;li&gt;Fixed failure for queries with filters on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;real&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;double&lt;/code&gt; columns having 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;+Infinity&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-Infinity&lt;/code&gt; values.&lt;/li&gt;
  &lt;li&gt;Implemented aggregation pushdown.&lt;/li&gt;
  &lt;li&gt;Allowed HTTPS URLs in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pinot.controller-urls&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;phoenix-connector&quot;&gt;Phoenix connector&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Phoenix 5 support was added.&lt;/li&gt;
  &lt;li&gt;Reduced memory usage for some queries.&lt;/li&gt;
  &lt;li&gt;Improved performance by adding ability to parallelize queries within Trino.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;features-added-to-various-connectors&quot;&gt;Features added to various connectors&lt;/h3&gt;

&lt;p&gt;In addition to the above some more features were added that apply to connectors
that use common code. These features improve performance using:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-352.html#mysql-connector&quot;&gt;Statistical aggregate function pushdown &lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-353.html&quot;&gt;TopN pushdown and join pushdown&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-353.html&quot;&gt;Improved planning times by reducing number of connections opened&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-356.html&quot;&gt;Improved performance by improving metadata caching hit rate&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-357.html&quot;&gt;Rule based identifier mapping support&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-360.html&quot;&gt;DELETE, non-transactional inserts and write-batch-size &lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-361.html&quot;&gt;Metadata cache max size&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-365.html&quot;&gt;TRUNCATE TABLE&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-366.html&quot;&gt;Improved handling of the Gregorian-Julian switch for the date type&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Ensured correctness when pushing down predicates and TopN to remote systems 
that are case-insensitive or sort differently from Trino.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;runtime-improvements&quot;&gt;Runtime improvements&lt;/h2&gt;

&lt;p&gt;There are a lot of performance improvements to list from the &lt;a href=&quot;https://trino.io/docs/current/release.html&quot;&gt;release notes&lt;/a&gt;.
Here are a few examples:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved coordinator CPU utilization.&lt;/li&gt;
  &lt;li&gt;Improved query performance by reducing CPU overhead of repartitioning data 
across worker nodes.&lt;/li&gt;
  &lt;li&gt;Reduced graceful shutdown time for worker nodes.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;everything-else&quot;&gt;Everything else&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Added an &lt;a href=&quot;https://trino.io/docs/current/admin/event-listeners-http.html&quot;&gt;HTTP event listener&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Added support for ARM64 in the &lt;a href=&quot;https://hub.docker.com/r/trinodb/trino&quot;&gt;Trino Docker image&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Added &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;clear&lt;/code&gt; command to the Trino CLI to clear the screen.&lt;/li&gt;
  &lt;li&gt;Improved tab completion for the Trino CLI.&lt;/li&gt;
  &lt;li&gt;Added support for custom connector metrics.&lt;/li&gt;
  &lt;li&gt;Fixed many, many, many bugs!&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;trino-summit&quot;&gt;Trino Summit&lt;/h2&gt;

&lt;p&gt;In 2021 we also enjoyed a successful inaugural Trino Summit, hosted by 
Starburst, with well over 500 attendees. Wonderful talks were given at this 
event by speakers from companies like Doordash, EA, LinkedIn, Netflix, 
Robinhood, Stream Native, and Tabular. If you missed this event, we have the 
&lt;a href=&quot;https://www.starburst.io/resources/trino-summit/&quot;&gt;recordings and slides available&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;As a teaser, the event started with Commander Bun Bun playing guitar to AC/DC’s
“Back In Black”.&lt;/p&gt;

&lt;iframe src=&quot;https://www.youtube.com/embed/c_qUp0SGeKE&quot; width=&quot;800&quot; height=&quot;500&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; 
margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; 
&lt;/iframe&gt;

&lt;h2 id=&quot;renaming-from-prestosql-to-trino&quot;&gt;Renaming from PrestoSQL to Trino&lt;/h2&gt;

&lt;p&gt;As mentioned above, we renamed the project this year. What followed was an 
outpouring of support and shock from the larger tech community. Community 
members immediately got to work. The project had to change the namespace 
practically overnight from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;io.prestosql&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;io.trino&lt;/code&gt;, and a 
&lt;a href=&quot;https://trino.io/blog/2021/01/04/migrating-from-prestosql-to-trino.html&quot;&gt;migration blog post&lt;/a&gt;
was published. Because the Linux Foundation moved hastily to enforce the
Presto trademark, users had to adapt quickly.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
 &lt;a href=&quot;https://twitter.com/trinodb/status/1343330429684703232?s=20&quot; target=&quot;_blank&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;100%&quot; src=&quot;/assets/blog/2021-review/tweets.png&quot; /&gt;
 &lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;This &lt;a href=&quot;https://stackoverflow.com/questions/67414714&quot;&gt;confused many in the community&lt;/a&gt;,
especially once the old PrestoSQL accounts were taken down by the
Linux Foundation. The &lt;a href=&quot;https://prestosql.io&quot;&gt;https://prestosql.io&lt;/a&gt; site had broken documentation links,
JDBC URLs had to change from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jdbc:presto&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jdbc:trino&lt;/code&gt;, header protocol
names had to be changed from the prefix &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;X-Presto-&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;X-Trino-&lt;/code&gt;, and various other
user-impacting changes had to be made in a matter of weeks. Even the legacy 
Docker images were removed from the &lt;a href=&quot;https://hub.docker.com/r/prestosql/presto&quot;&gt;prestosql/presto Docker repository&lt;/a&gt;,
causing disruptions for many users, who immediately had to upgrade to the 
&lt;a href=&quot;https://hub.docker.com/r/trinodb/trino&quot;&gt;trinodb/trino Docker repository&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We reached out to multiple projects to update their compatibility with
Trino:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/dbeaver/dbeaver/pull/10925&quot;&gt;DBeaver&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/pinterest/querybook/issues/509&quot;&gt;QueryBook&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/Homebrew/homebrew-core/pull/83185&quot;&gt;Homebrew&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/dbt-labs/dbt-presto/issues/39&quot;&gt;dbt&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/dungdm93/sqlalchemy-trino/issues/20&quot;&gt;sqlalchemy&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/sqlpad/sqlpad/pull/974&quot;&gt;sqlpad&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/apache/superset/pull/13105&quot;&gt;Apache Superset&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/getredash/redash/pull/5411&quot;&gt;Redash&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/akullpp/awesome-java/pull/917&quot;&gt;Awesome Java&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/MunGell/awesome-for-beginners/pull/933&quot;&gt;Awesome For Beginners&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/apache/airflow/pull/15187&quot;&gt;Airflow&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/lyft/presto-gateway/issues/134&quot;&gt;trino-gateway&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/metabase/metabase/issues/17532&quot;&gt;Metabase&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;and so much more…&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Despite the breaking changes, once the immediate hurdles were behind us, the 
community was not only excited and supportive about the brand change, but 
also particularly fond of the new mascot. Our adorable bunny was soon 
after &lt;a href=&quot;/episodes/10.html&quot;&gt;named Commander Bun Bun by the community&lt;/a&gt;.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
 &lt;a href=&quot;https://twitter.com/jtannady/status/1346888143459545092&quot; target=&quot;_blank&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;/assets/blog/2021-review/cbb.png&quot; /&gt;
 &lt;/a&gt;
&lt;/p&gt;

&lt;h2 id=&quot;2022-roadmap-project-tardigrade&quot;&gt;2022 Roadmap: Project Tardigrade&lt;/h2&gt;

&lt;p&gt;One of the interesting developments that came out of Trino Summit was a feature
that Trino co-creator Martin talked about in &lt;a href=&quot;https://www.starburst.io/resources/trino-summit/?wchannelid=2ug6mgs5ao&amp;amp;wmediaid=o264qw85dj&quot;&gt;the State of Trino presentation&lt;/a&gt;.
He proposed adding granular fault-tolerance and performance-improving features 
to the core engine. While Trino has been proven to run batch analytics workloads
at scale, many have avoided long-running batch jobs for fear of query failures. 
The fault-tolerance feature is a first step toward first-class support in the 
Trino project for long-running batch queries at massive scale.&lt;/p&gt;

&lt;p&gt;The granular fault-tolerance is being thoughtfully crafted to maintain the 
speed advantage that Trino has over other query engines, while increasing the 
resiliency of queries. In other words, rather than restarting an entire query 
when it runs out of resources or fails for any other reason, only a subset of 
the query is retried. To support this, intermediate stage data is persisted to 
replicated RAM or SSD.&lt;/p&gt;

&lt;p&gt;&lt;a title=&quot;Schokraie E, Warnken U, Hotz-Wagenblatt A, Grohme MA, Hengherr S, et al. (2012), CC BY 2.5 &amp;lt;https://creativecommons.org/licenses/by/2.5&amp;gt;, via Wikimedia Commons&quot; href=&quot;https://commons.wikimedia.org/wiki/File:SEM_image_of_Milnesium_tardigradum_in_active_state_-_journal.pone.0045682.g001-2.png&quot;&gt;&lt;img width=&quot;512&quot; alt=&quot;SEM image of Milnesium tardigradum in active state - journal.pone.0045682.g001-2&quot; src=&quot;https://upload.wikimedia.org/wikipedia/commons/thumb/c/cd/SEM_image_of_Milnesium_tardigradum_in_active_state_-_journal.pone.0045682.g001-2.png/512px-SEM_image_of_Milnesium_tardigradum_in_active_state_-_journal.pone.0045682.g001-2.png&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The project to introduce granular fault-tolerance into Trino is called
Project Tardigrade. It is a focus for many contributors now, and we will 
share details in the coming months. The project is named after the 
microscopic tardigrades, the world’s most indestructible creatures, akin 
to the resiliency we are adding to Trino’s queries. We look forward to telling 
you more as features unfold.&lt;/p&gt;

&lt;p&gt;Along with Project Tardigrade comes a series of changes focused on faster
performance in the query engine using columnar evaluation, adaptive planning,
and better scheduling for SIMD and GPU processors. We will also be working on
dynamically resolved functions, MERGE support, time travel queries in data lake
connectors, Java 17, improved caching mechanisms, and much, much more!&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;In summary, living this first year under the banner of Trino was nothing short
of a wild endeavor. Any engineer knows that naming things is hard, and renaming
things is all the more difficult.&lt;/p&gt;

&lt;p&gt;As we head into 2022, we can be certain of one thing. Trino will be reaching 
into newer areas of development and breaking norms just as it did as Presto in 
previous eras. The addition of native fault-tolerance to a lightning-fast query
engine will bring Trino to a new level of adoption. Keep your eyes peeled for 
more about Project Tardigrade.&lt;/p&gt;

&lt;p&gt;Along with Project Tardigrade, we are looking forward to another year filled
with features, issues, and suggestions from our amazing and passionate community.
Thank you all for an incredible year. We can’t wait to see what you all bring in
2022!&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen, Martin Traverso, Manfred Moser</name>
        </author>
      

      <summary>As we reflect on Trino’s journey in 2021, one thing stands out. Compared to previous years we have seen even further accelerated, tremendous growth. Yes, this is what all these year-in-retrospect blog posts say, but this has some special significance to it. This week marked the one-year anniversary since the project dropped the Presto name and moved to the Trino name. Immediately after the announcement, the Trino GitHub repository started trending in number of stargazers. Up until this point, the PrestoSQL GitHub repository had only amassed 1,600 stargazers in the two years since it had split from the PrestoDB repository. However, within four months after the renaming, the number of stargazers had doubled. GitHub stars, issues, pull requests and commits started growing at a new trajectory.</summary>

      
      
    </entry>
  
    <entry>
      <title>Log4Shell does not affect Trino</title>
      <link href="https://trino.io/blog/2021/12/13/log4shell-does-not-affect-trino.html" rel="alternate" type="text/html" title="Log4Shell does not affect Trino" />
      <published>2021-12-13T00:00:00+00:00</published>
      <updated>2021-12-13T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/12/13/log4shell-does-not-affect-trino</id>
      <content type="html" xml:base="https://trino.io/blog/2021/12/13/log4shell-does-not-affect-trino.html">&lt;p&gt;In the last few days we had a surge of folks in our community reaching out with
concerns over the &lt;a href=&quot;https://www.lunasec.io/docs/blog/log4j-zero-day/&quot;&gt;Log4Shell exploit&lt;/a&gt;
(&lt;a href=&quot;https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-44228&quot;&gt;CVE-2021-44228&lt;/a&gt;),
and we want to inform you that &lt;strong&gt;Trino is not affected&lt;/strong&gt;. Trino does not use log4j
in the core engine or runtime classes. Some connectors pull in the log4j 
dependency from client libraries, but it is either not used or is not a 
version affected by the Log4Shell vulnerability. Regular security reviews, 
including code and dependency analysis, are part of our standard development 
process. As we learn more, we will update the code to keep vulnerabilities out 
of the codebase.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;/assets/blog/log4shell/log4shell.jpeg&quot; /&gt;
&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;trino-connectors-with-the-log4j-dependency&quot;&gt;Trino connectors with the Log4j dependency&lt;/h2&gt;

&lt;p&gt;If you do a search in the Trino repository, you’ll notice that log4j shows up 
as a direct dependency in two of the connectors, Accumulo and Elasticsearch.&lt;/p&gt;

&lt;h3 id=&quot;accumulo&quot;&gt;Accumulo&lt;/h3&gt;

&lt;p&gt;The Accumulo connector depends on log4j 1.2.17, which, although not vulnerable
to Log4Shell, has other vulnerabilities. These vulnerabilities do not apply to 
how we’ve used the loggers in the connector code. To be clear, despite the small
use of this logger in the Accumulo connector, there is still no threat even if 
you are using it. We are &lt;a href=&quot;https://github.com/trinodb/trino/issues/8781&quot;&gt;working on removing&lt;/a&gt;
the uses of this log4j library in an upcoming release to avoid any confusion.&lt;/p&gt;

&lt;h3 id=&quot;elasticsearch&quot;&gt;Elasticsearch&lt;/h3&gt;

&lt;p&gt;The Elasticsearch connector did have an affected dependency 
&lt;a href=&quot;https://github.com/trinodb/trino/commit/2018a94253d48cfdce283538855ee65950f9be3d&quot;&gt;that was recently removed&lt;/a&gt;.
Log4j was not being used in the connector, so despite the presence of the 
dependency in the Elasticsearch connector, there was no direct use of the 
vulnerable library.&lt;/p&gt;

&lt;h2 id=&quot;avoiding-future-introduction-of-log4shell&quot;&gt;Avoiding future introduction of Log4Shell&lt;/h2&gt;

&lt;p&gt;We take security seriously on the Trino project, as it provides a single point 
of access to your data sources. We’re taking precautionary measures to prevent 
the vulnerability from creeping its way into future versions. In version
366, we’re removing that dependency and &lt;a href=&quot;https://github.com/trinodb/trino/commit/10ba96c63ed3875d9dcca335e49bc73f5c0a6a8c&quot;&gt;adding a dedicated rule&lt;/a&gt;
to the build process to ban log4j as a direct dependency.&lt;/p&gt;

&lt;h2 id=&quot;what-should-you-do&quot;&gt;What should you do?&lt;/h2&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;Rest assured that there is no vulnerability in your Trino cluster.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;If you’ve created your own plugin with one of the affected log4j libraries, 
you should upgrade as quickly as possible to 2.15.0 or higher.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;In the coming weeks, upgrade to the 366 release at your convenience.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;
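
&lt;p&gt;For the second point, if your plugin build uses Maven, one possible sketch is a
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dependencyManagement&lt;/code&gt; entry like the following in your plugin’s
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pom.xml&lt;/code&gt; to pin the patched version. The coordinates shown are the standard
log4j 2 ones; use the latest patched version available to you:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;lt;dependencyManagement&amp;gt;
  &amp;lt;dependencies&amp;gt;
    &amp;lt;dependency&amp;gt;
      &amp;lt;groupId&amp;gt;org.apache.logging.log4j&amp;lt;/groupId&amp;gt;
      &amp;lt;artifactId&amp;gt;log4j-core&amp;lt;/artifactId&amp;gt;
      &amp;lt;version&amp;gt;2.15.0&amp;lt;/version&amp;gt;
    &amp;lt;/dependency&amp;gt;
  &amp;lt;/dependencies&amp;gt;
&amp;lt;/dependencyManagement&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;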

&lt;p&gt;We know there can be a lot of concern when vulnerabilities come up. We wish you
all the best of luck while you work hard to mitigate the risk of exploits in 
your systems. If you have any questions, reach out on the &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;Trino Slack&lt;/a&gt;.&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>In the last few days we had a surge of folks in our community reaching out with concerns over the Log4Shell exploit (CVE-2021-44228), and we want to inform you that Trino is not affected. Trino does not use log4j in the core engine or runtime classes. There are some connectors that include the log4j dependency from client dependencies, but are either not used or are not versions affected by the Log4Shell vulnerability. Regular security reviews, including code and dependency analysis, are part of the regular development process. As we learn more we will update the code to keep vulnerabilities out of the code.</summary>

      
      
    </entry>
  
    <entry>
      <title>JVM challenges in production</title>
      <link href="https://trino.io/blog/2021/10/06/jvm-issues-at-comcast.html" rel="alternate" type="text/html" title="JVM challenges in production" />
      <published>2021-10-06T00:00:00+00:00</published>
      <updated>2021-10-06T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/10/06/jvm-issues-at-comcast</id>
      <content type="html" xml:base="https://trino.io/blog/2021/10/06/jvm-issues-at-comcast.html">&lt;p&gt;At Comcast, we have a large on-premise Trino cluster. It enables us to extract
insights from data no matter where it resides, and prepares the company for a
more cloud-centric future. Recently, however, we experienced and overcame
challenges related to the Java virtual machine (JVM). We wanted to share what
we encountered and learned in hopes that it might be useful for the Trino
community.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;jit-recompilation&quot;&gt;JIT recompilation&lt;/h2&gt;

&lt;p&gt;Some users complained that nightly reports were taking far too long to
complete. Queries that ran for six hours made very little progress.&lt;/p&gt;

&lt;p&gt;First, we looked at the queries involved in these nightly reports. We
noticed that all these queries involved two particular tables. In this post,
let’s call them table A and table B.&lt;/p&gt;

&lt;p&gt;Our initial suspicion was that there could be an issue with the table data in
HDFS. Thus, we tried to reproduce the performance problem by using queries that
performed simple scans against these tables.&lt;/p&gt;

&lt;p&gt;We tried a simple table scan with no filters, a range filter on a partitioned
column, and so on. We ran these queries multiple times, and execution times were
consistent. This ruled out a potential problem with HDFS.&lt;/p&gt;

&lt;p&gt;Next, we took a closer look at the portion of the slow-running queries
involving table A, and came up with the simplest possible query that could
demonstrate the problem. We discovered that the following query did not exhibit
the performance problem:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT
 count(a.c1)
FROM
 hive.schema1.A a, hive.schema2.B da
WHERE
 a.day_id = da.date_id
 AND a.day_id BETWEEN &apos;2021-03-22&apos; AND &apos;2021-04-21&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;But adding a predicate, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a.c2 = &apos;4 (Success)&apos;&lt;/code&gt;, caused the performance problem
to appear:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT
 count(a.c1)
FROM
 hive.schema1.A a, hive.schema2.B da
WHERE
 a.day_id = da.date_id
 AND a.day_id BETWEEN &apos;2021-03-22&apos; AND &apos;2021-04-21&apos;
 AND a.c2 = &apos;4 (Success)&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We narrowed the problem down to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Scan/Filter/Project&lt;/code&gt; operator using the
output of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXPLAIN ANALYZE&lt;/code&gt; from Trino. For the query that performed as
expected, this stage had the following CPU stats:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CPU: 2.39h, Scheduled: 4.47h, Input: 17434967615 rows (357.47GB)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For the version of the query with the additional predicate, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a.c2 = &apos;4 (Success)&apos;&lt;/code&gt;,
that exhibited the performance problem, the same stage has the following CPU
stats:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CPU: 3.73d, Scheduled: 48.01d, Input: 17052985227 rows (413.98GB)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This shows that for roughly the same amount of data, Trino used
significantly more CPU: 3.73 days versus 2.39 hours! Our next step was to
determine possible reasons.&lt;/p&gt;

&lt;p&gt;We generated a few &lt;a href=&quot;https://docs.oracle.com/javase/7/docs/technotes/tools/share/jstack.html&quot;&gt;jstack&lt;/a&gt;
and Java flight recorder (JFR) profiles of the Trino Java process from
one of the worker nodes while the scan stage was running. After analyzing these
profiles, we found no obvious problem. Trino performed as expected.&lt;/p&gt;

&lt;p&gt;Next, we looked at the list of tasks in the web UI to see what the distribution
of CPU times for each stage was:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/jvm-issues-at-comcast/web_ui_before.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Some workers have tasks that only use up a few minutes of CPU time and others
have tasks that use up to 2 hours of CPU time! Different query runs showed this
happening on different workers, so it was not a problem with any one
individual worker.&lt;/p&gt;

&lt;p&gt;We discussed this with Starburst engineer, &lt;a href=&quot;https://github.com/findepi&quot;&gt;Piotr Findeisen&lt;/a&gt;,
and came to the conclusion that this could potentially be an issue with JVM
code deoptimization. After re-compiling a method a certain number of times,
the JVM refuses to do so any more and will run the method in interpreted
mode, which is much slower.&lt;/p&gt;

&lt;p&gt;The evidence for this is what we highlighted above: the CPU used by the
same tasks on different workers varies by a factor of approximately 30. This is
the typical difference between compiled and interpreted code, according to
Piotr’s experience at Starburst.&lt;/p&gt;

&lt;p&gt;The following JVM options were added to the Trino &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jvm.config&lt;/code&gt; file to help
with this issue:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-XX:PerMethodRecompilationCutoff=10000&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-XX:PerBytecodeRecompilationCutoff=10000&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These settings increase the recompilation cutoff limit. They have also been
included in the default &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jvm.config&lt;/code&gt; settings that ship with Trino since the
348 release.&lt;/p&gt;

&lt;p&gt;Since we have been running Trino in production since before that release, we
did not have these settings in our &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jvm.config&lt;/code&gt;.&lt;/p&gt;
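
&lt;p&gt;As an illustrative sketch, and not a tuning recommendation, a worker’s
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jvm.config&lt;/code&gt; with these options added might look like this
(the heap size and GC flags here are placeholders for your own settings):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-server
-Xmx54G
-XX:+UseG1GC
-XX:+ExplicitGCInvokesConcurrent
-XX:PerMethodRecompilationCutoff=10000
-XX:PerBytecodeRecompilationCutoff=10000
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;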

&lt;h3 id=&quot;initial-results&quot;&gt;Initial results&lt;/h3&gt;

&lt;p&gt;Execution time observed with the JVM options in place was 4 minutes and 51
seconds. The CPU stats for the scan/filter/project stage for this query now
look like this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CPU: 3.22h, Scheduled: 7.21h, Input: 17631445897 rows (428.03GB)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The CPU used by individual tasks is much more uniform:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/jvm-issues-at-comcast/web_ui_after.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;code-cache&quot;&gt;Code cache&lt;/h2&gt;

&lt;p&gt;We noticed that the cluster’s overall CPU utilization decreased after the
cluster was up for a few days, and there would be a few workers where tasks
were running slowly.&lt;/p&gt;

&lt;p&gt;When looking at these workers with slow-running tasks, we found that system
load was very high:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[root@worker-node log]# uptime
 21:36:57 up 20 days, 20:39,  1 user,  load average: 149.92, 152.83, 144.82
[root@worker-node log]#
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We also noticed all these workers had messages like this in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;launcher.log&lt;/code&gt;
file:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[219756.210s][warning][codecache] Try increasing the code heap size using -XX:ProfiledCodeHeapSize=
OpenJDK 64-Bit Server VM warning: CodeHeap &apos;profiled nmethods&apos; is full. Compiler has been disabled.
OpenJDK 64-Bit Server VM warning: Try increasing the code heap size using -XX:ProfiledCodeHeapSize=
CodeHeap &apos;non-profiled nmethods&apos;: size=258436Kb used=235661Kb max_used=257882Kb free=22774Kb
 bounds [0x00007f466f980000, 0x00007f467f5e1000, 0x00007f467f5e1000]
CodeHeap &apos;profiled nmethods&apos;: size=258432Kb used=207330Kb max_used=216383Kb free=51101Kb
 bounds [0x00007f465fd20000, 0x00007f466f980000, 0x00007f466f980000]
CodeHeap &apos;non-nmethods&apos;: size=7420Kb used=1881Kb max_used=3766Kb free=5538Kb
 bounds [0x00007f465f5e1000, 0x00007f465fab1000, 0x00007f465fd20000]
 total_blobs=64220 nmethods=62699 adapters=1432
 compilation: disabled (not enough contiguous free space left)
              stopped_count=4, restarted_count=3
 full_count=3
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once the code cache is full, the JVM won’t compile any additional code until
space is freed.&lt;/p&gt;

&lt;p&gt;We were running with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-XX:ReservedCodeCacheSize&lt;/code&gt; JVM option set to 512M.
To see what’s taking up space in the code cache, we used jcmd:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;jcmd &amp;lt;TRINO_PID&amp;gt; Compiler.CodeHeap_Analytics
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We ran this at various intervals so we could compare how the code cache changed
over time.&lt;/p&gt;

&lt;p&gt;30 of the top 48 non-profiled methods were &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PagesHashStrategy&lt;/code&gt; classes, which are
generated per query. These can’t be removed from the cache until the query
completes, so the amount of cache needed is relative to the
concurrency. We have a very busy cluster with significant concurrency at our
busiest times.&lt;/p&gt;

&lt;p&gt;Next, we set &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-XX:ReservedCodeCacheSize&lt;/code&gt; to 2G to see if that would help. We
have not seen the code cache fill up since increasing the size to 2GB. We can
also monitor the size of the code cache over
time using JMX. One query that can be used if you have the JMX catalog enabled
on your cluster is:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT
    node,
    regexp_extract(usage, &apos;max=(-?\d*)&apos;, 1) as max,
    regexp_extract(usage, &apos;used=(-?\d*)&apos;, 1) AS used
FROM
  jmx.current.&quot;java.lang:name=codeheap &apos;non-profiled nmethods&apos;,type=memorypool&quot;
ORDER BY used DESC
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;off-heap-memory-usage&quot;&gt;Off heap memory usage&lt;/h2&gt;

&lt;p&gt;One final JVM issue we noticed in our production cluster was that off-heap
memory on some workers grew to be quite large. We allocate approximately 85%
of the physical memory on our workers for the JVM heap. Recently, we received
alerts from our monitoring systems that memory consumption on our workers got
dangerously close to the physical limit on the machines.&lt;/p&gt;

&lt;p&gt;We noticed some memory-related issues from the Alluxio client in the Trino
worker logs on machines generating these high memory alerts. Upon further
investigation, we noticed that Trino was running with the open source version
of the Alluxio client. Trino ships with version 2.4.0 of the Alluxio client. We
are an Alluxio customer and use it in our environment.&lt;/p&gt;

&lt;p&gt;After discussing with Alluxio, they suggested we upgrade to version 2.4.1 of
their Enterprise client which includes a fix for an off-heap memory leak bug.
After upgrading to the Alluxio Enterprise client, the off-heap memory usage
became a lot more stable.&lt;/p&gt;

&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;

&lt;p&gt;This post outlined some of the JVM issues we encountered while running Trino in
production. We only hit many of these issues in our production environment, and
they were difficult to replicate outside of it. Thus, we wanted to write up our 
experience in the hopes of helping other Trino users in the future!&lt;/p&gt;

      
        <author>
          <name>Sajumon Joseph, David Leach, Bryan Aller, Pavan Madhineni, Lavanya Ragothaman, Pratap Moturi, Pádraig O&apos;Sullivan (Starburst)</name>
        </author>
      

      <summary>At Comcast, we have a large on-premise Trino cluster. It enables us to extract insights from data no matter where it resides, and prepares the company for a more cloud-centric future. Recently, however, we experienced and overcame challenges related to the Java virtual machine (JVM). We wanted to share what we encountered and learned in hopes that it might be useful for the Trino community.</summary>

      
      
    </entry>
  
    <entry>
      <title>Announcing Trino Summit</title>
      <link href="https://trino.io/blog/2021/09/23/announcing_trino_summit.html" rel="alternate" type="text/html" title="Announcing Trino Summit" />
      <published>2021-09-23T00:00:00+00:00</published>
      <updated>2021-09-23T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/09/23/announcing_trino_summit</id>
      <content type="html" xml:base="https://trino.io/blog/2021/09/23/announcing_trino_summit.html">&lt;p&gt;Greetings Trino nation,&lt;/p&gt;

&lt;p&gt;Get ready for this year’s virtual Trino Summit event! This year’s summit feels a
little different as the name of the event has changed from Presto to Trino. So
this will be the first event of the project hosted &lt;a href=&quot;https://trino.io/blog/2020/12/27/announcing-trino.html&quot;&gt;under the new banner of Trino&lt;/a&gt;.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;This year’s Summit is hosted virtually by Starburst on October 21st and 22nd. We’d originally set the date for September 15th, but later realized that it conflicted with Yom Kippur. While we had originally set out to make this event a hybrid format, we had to make the difficult decision of moving the event to fully virtual in light of the growing health concerns around contracting and spreading the delta variant. If you haven’t registered yet, &lt;a href=&quot;http://starburst.io/trinosummit2021&quot;&gt;register here&lt;/a&gt;. If you planned on attending in person, we will still have your registration, and you will still be able to attend virtually.&lt;/p&gt;

&lt;p&gt;Get excited for our great lineup of speakers, panels, and presentations! We’re always on the lookout for speakers who are excited to share their Trino experiences.&lt;/p&gt;

&lt;p&gt;We look forward to seeing you there!&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>Greetings Trino nation, Get ready for this year’s virtual Trino Summit event! This year’s summit feels a little different as the name of the event has changed from Presto to Trino. So this will be the first event of the project hosted under the new banner of Trino.</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino on ice IV: Deep dive into Iceberg internals</title>
      <link href="https://trino.io/blog/2021/08/12/deep-dive-into-iceberg-internals.html" rel="alternate" type="text/html" title="Trino on ice IV: Deep dive into Iceberg internals" />
      <published>2021-08-12T00:00:00+00:00</published>
      <updated>2021-08-12T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/08/12/deep-dive-into-iceberg-internals</id>
      <content type="html" xml:base="https://trino.io/blog/2021/08/12/deep-dive-into-iceberg-internals.html">&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;100%&quot; height=&quot;100%&quot; src=&quot;/assets/blog/trino-on-ice/trino-iceberg.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Welcome to the Trino on ice series, covering the details around how the Iceberg
table format works with the Trino query engine. The examples build on each
previous post, so it’s recommended to read the posts sequentially and reference
them as needed later. Here are links to the posts in this series:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/05/03/a-gentle-introduction-to-iceberg.html&quot;&gt;Trino on ice I: A gentle introduction to Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg.html&quot;&gt;Trino on ice II: In-place table evolution and cloud compatibility with Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/30/iceberg-concurrency-snapshots-spec.html&quot;&gt;Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/08/12/deep-dive-into-iceberg-internals.html&quot;&gt;Trino on ice IV: Deep dive into Iceberg internals&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So far, this series has covered some very interesting user-level concepts of the
Iceberg model, and how you can take advantage of them using the Trino query
engine. This blog post dives into some implementation details of Iceberg by
dissecting some of the files that result from various operations carried out
using Trino. Dissection requires some surgical instruments, namely Trino, Avro
tools, the MinIO client tool, and Iceberg’s core library. Dissecting these files
is useful not only to understand how Iceberg works, but also to aid in
troubleshooting, should you run into any issues while ingesting or querying your
Iceberg tables. I like to think of this type of debugging as a fun game of
Operation, where you’re looking to see what causes the red errors to fly by on
your screen.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-on-ice/operation.gif&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;understanding-iceberg-metadata&quot;&gt;Understanding Iceberg metadata&lt;/h2&gt;

&lt;p&gt;Iceberg can use any compatible metastore, but the Trino Iceberg connector
currently supports only the Hive metastore and AWS Glue, just like the Hive
connector. This is because there is already a vast amount of testing and support
for using the Hive metastore in Trino. Likewise, many Trino use cases that run
on data lakes already use the Hive connector, and therefore the Hive metastore.
This makes it convenient as the leading supported use case, since existing users
can easily migrate from Hive to Iceberg tables. The diagram of the Hive
connector architecture gives no indication of which connector is actually
executing, so it serves as a diagram for both Hive and Iceberg. The only
difference is the connector used: if you create a table in Hive, you can view
the same table in Iceberg.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-on-ice/iceberg-metadata.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;To recap the steps taken in the first three blog posts: the first post created
an events table, and the first two posts each ran an insert statement. The first
insert contained three records, while the second insert contained a single
record.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-on-ice/iceberg-snapshot-files.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Up until this point, the state of the files in MinIO hasn’t really been shown,
except for some of the manifest list pointers from the snapshot in the third
blog post. Using the &lt;a href=&quot;https://docs.min.io/minio/baremetal/reference/minio-cli/minio-mc.html&quot;&gt;MinIO client tool&lt;/a&gt;,
you can list the files that Iceberg generated through all these operations and
then try to understand what purpose they serve.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;% mc tree -f local/
local/
└─ iceberg
   └─ logging.db
      └─ events
         ├─ data
         │  ├─ event_time_day=2021-04-01
         │  │  ├─ 51eb1ea6-266b-490f-8bca-c63391f02d10.orc
         │  │  └─ cbcf052d-240d-4881-8a68-2bbc0f7e5233.orc
         │  └─ event_time_day=2021-04-02
         │     └─ b012ec20-bbdd-47f5-89d3-57b9e32ea9eb.orc
         └─ metadata
            ├─ 00000-c5cfaab4-f82f-4351-b2a5-bd0e241f84bc.metadata.json
            ├─ 00001-27c8c2d1-fdbb-429d-9263-3654d818250e.metadata.json
            ├─ 00002-33d69acc-94cb-44bc-b2a1-71120e749d9a.metadata.json
            ├─ 23cc980c-9570-42ed-85cf-8658fda2727d-m0.avro
            ├─ 92382234-a4a6-4a1b-bc9b-24839472c2f6-m0.avro
            ├─ snap-2720489016575682283-1-92382234-a4a6-4a1b-bc9b-24839472c2f6.avro
            ├─ snap-4564366177504223943-1-23cc980c-9570-42ed-85cf-8658fda2727d.avro
            └─ snap-6967685587675910019-1-bcbe9133-c51c-42a9-9c73-f5b745702cb0.avro
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There are a lot of files here, but there are a couple of patterns that you
can observe.&lt;/p&gt;

&lt;p&gt;First, the top two directories are named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;metadata&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/&amp;lt;bucket&amp;gt;/&amp;lt;database&amp;gt;/&amp;lt;table&amp;gt;/data/&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/&amp;lt;bucket&amp;gt;/&amp;lt;database&amp;gt;/&amp;lt;table&amp;gt;/metadata/&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;As you might expect, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data&lt;/code&gt; contains the actual ORC files split by partition.
This is akin to what you would see in a Hive table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data&lt;/code&gt; directory. What is
really of interest here is the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;metadata&lt;/code&gt; directory. There are specifically
three patterns of files you’ll find here.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/&amp;lt;bucket&amp;gt;/&amp;lt;database&amp;gt;/&amp;lt;table&amp;gt;/metadata/&amp;lt;file-id&amp;gt;.avro&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/&amp;lt;bucket&amp;gt;/&amp;lt;database&amp;gt;/&amp;lt;table&amp;gt;/metadata/snap-&amp;lt;snapshot-id&amp;gt;-&amp;lt;version&amp;gt;-&amp;lt;file-id&amp;gt;.avro&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/&amp;lt;bucket&amp;gt;/&amp;lt;database&amp;gt;/&amp;lt;table&amp;gt;/metadata/&amp;lt;version&amp;gt;-&amp;lt;commit-UUID&amp;gt;.metadata.json&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Iceberg has a persistent tree structure that manages various snapshots of the
data that are created for every mutation of the data. This enables not only a
concurrency model that supports serializable isolation, but also cool features
like time travel across a linear progression of snapshots.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-on-ice/iceberg-metastore-files.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This tree structure contains two types of Avro files, manifest lists and
manifest files. Manifest list files contain pointers to various manifest files
and the manifest files themselves point to various data files. This post starts
out by covering these manifest files, and later covers the table metadata files
that are suffixed by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.metadata.json&lt;/code&gt;.&lt;/p&gt;
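This tree can be pictured with a few plain data structures: a snapshot references one manifest list, each manifest-list entry references a manifest file, and each manifest entry references a data file. Here is a minimal Python sketch of that traversal, populated with the second snapshot from this post; the class and field names are illustrative only, not the Iceberg library's API:

```python
# Toy model of Iceberg's persistent tree: snapshot -> manifest list
# -> manifest files -> data files. All names are illustrative only.
from dataclasses import dataclass, field
from typing import List

@dataclass
class DataFile:
    file_path: str
    record_count: int

@dataclass
class ManifestFile:
    manifest_path: str
    entries: List[DataFile] = field(default_factory=list)

@dataclass
class Snapshot:
    snapshot_id: int
    manifest_list: List[ManifestFile] = field(default_factory=list)

def all_data_files(snapshot: Snapshot) -> List[DataFile]:
    """Walk from a snapshot down to every data file it references."""
    return [df for mf in snapshot.manifest_list for df in mf.entries]

# The second snapshot: one manifest covering the two files created by the
# three-row insert (one row on 2021-04-01, two rows on 2021-04-02).
snap = Snapshot(
    snapshot_id=2720489016575682283,
    manifest_list=[
        ManifestFile(
            "metadata/92382234-a4a6-4a1b-bc9b-24839472c2f6-m0.avro",
            [DataFile("data/event_time_day=2021-04-01/51eb1ea6.orc", 1),
             DataFile("data/event_time_day=2021-04-02/b012ec20.orc", 2)],
        )
    ],
)

print(len(all_data_files(snap)))                            # 2 data files
print(sum(df.record_count for df in all_data_files(snap)))  # 3 rows
```

A real reader does the same walk, except each hop is an Avro file fetched from object storage rather than an in-memory list.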

&lt;p&gt;&lt;a href=&quot;/blog/2021/07/30/iceberg-concurrency-snapshots-spec.html&quot;&gt;The last blog covered&lt;/a&gt;
the command in Trino that shows the snapshot information that is stored in the
metastore. Here is that command and its output again for your review.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT manifest_list 
FROM iceberg.logging.&quot;events$snapshots&quot;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;snapshots&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;s3a://iceberg/logging.db/events/metadata/snap-6967685587675910019-1-bcbe9133-c51c-42a9-9c73-f5b745702cb0.avro&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;s3a://iceberg/logging.db/events/metadata/snap-2720489016575682283-1-92382234-a4a6-4a1b-bc9b-24839472c2f6.avro&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;s3a://iceberg/logging.db/events/metadata/snap-4564366177504223943-1-23cc980c-9570-42ed-85cf-8658fda2727d.avro&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;You’ll notice that the query returns the Avro files prefixed with
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;snap-&lt;/code&gt;. These files are directly correlated with the snapshot
records stored in the metastore. As the diagram above shows, snapshots are
records in the metastore that contain the URL of the manifest list Avro
file. Avro files are binary files and not something you can just open up in a
text editor to read. Using the 
&lt;a href=&quot;https://downloads.apache.org/avro/avro-1.10.2/java/avro-tools-1.10.2.jar&quot;&gt;avro-tools.jar tool&lt;/a&gt;
distributed by the 
&lt;a href=&quot;https://avro.apache.org/docs/current/index.html&quot;&gt;Apache Avro project&lt;/a&gt;,
you can inspect the contents of these files to get a better understanding
of how they are used by Iceberg.&lt;/p&gt;

&lt;p&gt;The first snapshot was generated when the events table was created. Upon
inspecting this file, you notice that it is empty: the output is just a newline,
which the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jq&lt;/code&gt; JSON command-line utility strips when pretty-printing.
This snapshot represents the empty state of the table upon creation. To
investigate the snapshots you need to download the files to your local
filesystem. Let’s move them to the home directory:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;% java -jar  ~/Desktop/avro_files/avro-tools-1.10.0.jar tojson ~/snap-6967685587675910019-1-bcbe9133-c51c-42a9-9c73-f5b745702cb0.avro | jq .
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result (is empty):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The second snapshot is a little more interesting and actually shows us the 
contents of a manifest list.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;% java -jar  ~/Desktop/avro_files/avro-tools-1.10.0.jar tojson ~/snap-2720489016575682283-1-92382234-a4a6-4a1b-bc9b-24839472c2f6.avro | jq .
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;{
   &quot;manifest_path&quot;:&quot;s3a://iceberg/logging.db/events/metadata/92382234-a4a6-4a1b-bc9b-24839472c2f6-m0.avro&quot;,
   &quot;manifest_length&quot;:6114,
   &quot;partition_spec_id&quot;:0,
   &quot;added_snapshot_id&quot;:{
      &quot;long&quot;:2720489016575682000
   },
   &quot;added_data_files_count&quot;:{
      &quot;int&quot;:2
   },
   &quot;existing_data_files_count&quot;:{
      &quot;int&quot;:0
   },
   &quot;deleted_data_files_count&quot;:{
      &quot;int&quot;:0
   },
   &quot;partitions&quot;:{
      &quot;array&quot;:[
         {
            &quot;contains_null&quot;:false,
            &quot;lower_bound&quot;:{
               &quot;bytes&quot;:&quot;\u001eI\u0000\u0000&quot;
            },
            &quot;upper_bound&quot;:{
               &quot;bytes&quot;:&quot;\u001fI\u0000\u0000&quot;
            }
         }
      ]
   },
   &quot;added_rows_count&quot;:{
      &quot;long&quot;:3
   },
   &quot;existing_rows_count&quot;:{
      &quot;long&quot;:0
   },
   &quot;deleted_rows_count&quot;:{
      &quot;long&quot;:0
   }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To understand the values in each of these rows, you can refer to the
&lt;a href=&quot;https://iceberg.apache.org/spec/#manifest-lists&quot;&gt;manifest lists section of the Iceberg specification&lt;/a&gt;.
Instead of covering these exhaustively, let’s focus on a few key fields. Below
are the fields and their definitions according to the specification.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;manifest_path&lt;/code&gt; - Location of the manifest file.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;partition_spec_id&lt;/code&gt; - ID of a partition spec used to write the manifest; must
be listed in table metadata partition-specs.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;added_snapshot_id&lt;/code&gt; - ID of the snapshot where the manifest file was added.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;partitions&lt;/code&gt; - A list of field summaries for each partition field in the spec.
Each field in the list corresponds to a field in the manifest file’s partition
spec.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;added_rows_count&lt;/code&gt; - Number of rows in all files in the manifest that have
status ADDED, when null this is assumed to be non-zero.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As mentioned above, manifest lists hold references to various manifest files.
These manifest paths are the pointers in the persistent tree that tell any
client using Iceberg where to find all of the manifest files associated with a
particular snapshot. To traverse the tree, you look over the different manifest
paths to find all the manifest files associated with the particular snapshot
you want to traverse. Partition spec ids identify the partition specification
used to write the manifest, which is stored with the table metadata in the
metastore. Added snapshot ids tell you which snapshot is associated with the
manifest list. Partitions hold some high-level partition bound information to
make for faster querying: if a query is looking for a particular value, it only
traverses the manifest files where the query values fall within the range of the
file values. Finally, you get a few metrics, like the number of changed rows and
data files, one of which is the count of added rows. The first operation
inserted three rows and the second operation inserted one row, so using the row
counts you can easily determine which manifest file belongs to which
operation.&lt;/p&gt;
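The `lower_bound` and `upper_bound` byte strings in the output are Iceberg's single-value serialization of the partition value: an int is stored as four little-endian bytes, so `\u001eI\u0000\u0000` is `0x0000491e`, or 18718 days since the Unix epoch. A short sketch of how a reader could decode a bound and skip manifests whose range cannot match a query (illustrative code, not the connector's implementation):

```python
import struct

def decode_int_bound(raw: bytes) -> int:
    # Iceberg serializes int bounds as 4 little-endian bytes.
    return struct.unpack("<i", raw)[0]

def manifest_may_match(lower: bytes, upper: bytes, query_day: int) -> bool:
    # A manifest can be skipped when the queried partition value falls
    # outside its [lower_bound, upper_bound] range.
    return decode_int_bound(lower) <= query_day <= decode_int_bound(upper)

# Bounds from the manifest list above: "\u001eI\u0000\u0000" and "\u001fI\u0000\u0000".
lower = b"\x1eI\x00\x00"  # 18718 = 2021-04-01 as days since the epoch
upper = b"\x1fI\x00\x00"  # 18719 = 2021-04-02

print(decode_int_bound(lower))                  # 18718
print(manifest_may_match(lower, upper, 18719))  # True: scan this manifest
print(manifest_may_match(lower, upper, 18720))  # False: prune it
```
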

&lt;p&gt;The following command shows the final snapshot after both operations executed
and filters out only the fields pointed out above.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;% java -jar  ~/Desktop/avro_files/avro-tools-1.10.0.jar tojson ~/snap-4564366177504223943-1-23cc980c-9570-42ed-85cf-8658fda2727d.avro | jq &apos;. | {manifest_path: .manifest_path, partition_spec_id: .partition_spec_id, added_snapshot_id: .added_snapshot_id, partitions: .partitions, added_rows_count: .added_rows_count }&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;{
   &quot;manifest_path&quot;:&quot;s3a://iceberg/logging.db/events/metadata/23cc980c-9570-42ed-85cf-8658fda2727d-m0.avro&quot;,
   &quot;partition_spec_id&quot;:0,
   &quot;added_snapshot_id&quot;:{
      &quot;long&quot;:4564366177504223700
   },
   &quot;partitions&quot;:{
      &quot;array&quot;:[
         {
            &quot;contains_null&quot;:false,
            &quot;lower_bound&quot;:{
               &quot;bytes&quot;:&quot;\u001eI\u0000\u0000&quot;
            },
            &quot;upper_bound&quot;:{
               &quot;bytes&quot;:&quot;\u001eI\u0000\u0000&quot;
            }
         }
      ]
   },
   &quot;added_rows_count&quot;:{
      &quot;long&quot;:1
   }
}
{
   &quot;manifest_path&quot;:&quot;s3a://iceberg/logging.db/events/metadata/92382234-a4a6-4a1b-bc9b-24839472c2f6-m0.avro&quot;,
   &quot;partition_spec_id&quot;:0,
   &quot;added_snapshot_id&quot;:{
      &quot;long&quot;:2720489016575682000
   },
   &quot;partitions&quot;:{
      &quot;array&quot;:[
         {
            &quot;contains_null&quot;:false,
            &quot;lower_bound&quot;:{
               &quot;bytes&quot;:&quot;\u001eI\u0000\u0000&quot;
            },
            &quot;upper_bound&quot;:{
               &quot;bytes&quot;:&quot;\u001fI\u0000\u0000&quot;
            }
         }
      ]
   },
   &quot;added_rows_count&quot;:{
      &quot;long&quot;:3
   }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In the listing of the manifest list for the last snapshot, you notice that the
first operation, where three rows were inserted, is described by the manifest
file in the second JSON object. You can determine this from the snapshot id, as
well as the number of rows added in the operation. The first JSON object
contains the last operation, which inserted a single row. So operations are
listed in reverse commit order, with the most recent first.&lt;/p&gt;

&lt;p&gt;The next command performs the same kind of listing you ran on the manifest
list, except on the manifest files themselves, to expose their contents. To
begin with, run the command to show the contents of the manifest file associated
with the insertion of three rows.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;% java -jar  ~/avro-tools-1.10.0.jar tojson ~/Desktop/avro_files/92382234-a4a6-4a1b-bc9b-24839472c2f6-m0.avro | jq .
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;{
   &quot;status&quot;:1,
   &quot;snapshot_id&quot;:{
      &quot;long&quot;:2720489016575682000
   },
   &quot;data_file&quot;:{
      &quot;file_path&quot;:&quot;s3a://iceberg/logging.db/events/data/event_time_day=2021-04-01/51eb1ea6-266b-490f-8bca-c63391f02d10.orc&quot;,
      &quot;file_format&quot;:&quot;ORC&quot;,
      &quot;partition&quot;:{
         &quot;event_time_day&quot;:{
            &quot;int&quot;:18718
         }
      },
      &quot;record_count&quot;:1,
      &quot;file_size_in_bytes&quot;:870,
      &quot;block_size_in_bytes&quot;:67108864,
      &quot;column_sizes&quot;:null,
      &quot;value_counts&quot;:{
         &quot;array&quot;:[
            {
               &quot;key&quot;:1,
               &quot;value&quot;:1
            },
            {
               &quot;key&quot;:2,
               &quot;value&quot;:1
            },
            {
               &quot;key&quot;:3,
               &quot;value&quot;:1
            },
            {
               &quot;key&quot;:4,
               &quot;value&quot;:1
            }
         ]
      },
      &quot;null_value_counts&quot;:{
         &quot;array&quot;:[
            {
               &quot;key&quot;:1,
               &quot;value&quot;:0
            },
            {
               &quot;key&quot;:2,
               &quot;value&quot;:0
            },
            {
               &quot;key&quot;:3,
               &quot;value&quot;:0
            },
            {
               &quot;key&quot;:4,
               &quot;value&quot;:0
            }
         ]
      },
      &quot;nan_value_counts&quot;:null,
      &quot;lower_bounds&quot;:{
         &quot;array&quot;:[
            {
               &quot;key&quot;:1,
               &quot;value&quot;:&quot;ERROR&quot;
            },
            {
               &quot;key&quot;:3,
               &quot;value&quot;:&quot;Oh noes&quot;
            }
         ]
      },
      &quot;upper_bounds&quot;:{
         &quot;array&quot;:[
            {
               &quot;key&quot;:1,
               &quot;value&quot;:&quot;ERROR&quot;
            },
            {
               &quot;key&quot;:3,
               &quot;value&quot;:&quot;Oh noes&quot;
            }
         ]
      },
      &quot;key_metadata&quot;:null,
      &quot;split_offsets&quot;:null
   }
}
{
   &quot;status&quot;:1,
   &quot;snapshot_id&quot;:{
      &quot;long&quot;:2720489016575682000
   },
   &quot;data_file&quot;:{
      &quot;file_path&quot;:&quot;s3a://iceberg/logging.db/events/data/event_time_day=2021-04-02/b012ec20-bbdd-47f5-89d3-57b9e32ea9eb.orc&quot;,
      &quot;file_format&quot;:&quot;ORC&quot;,
      &quot;partition&quot;:{
         &quot;event_time_day&quot;:{
            &quot;int&quot;:18719
         }
      },
      &quot;record_count&quot;:2,
      &quot;file_size_in_bytes&quot;:1084,
      &quot;block_size_in_bytes&quot;:67108864,
      &quot;column_sizes&quot;:null,
      &quot;value_counts&quot;:{
         &quot;array&quot;:[
            {
               &quot;key&quot;:1,
               &quot;value&quot;:2
            },
            {
               &quot;key&quot;:2,
               &quot;value&quot;:2
            },
            {
               &quot;key&quot;:3,
               &quot;value&quot;:2
            },
            {
               &quot;key&quot;:4,
               &quot;value&quot;:2
            }
         ]
      },
      &quot;null_value_counts&quot;:{
         &quot;array&quot;:[
            {
               &quot;key&quot;:1,
               &quot;value&quot;:0
            },
            {
               &quot;key&quot;:2,
               &quot;value&quot;:0
            },
            {
               &quot;key&quot;:3,
               &quot;value&quot;:0
            },
            {
               &quot;key&quot;:4,
               &quot;value&quot;:0
            }
         ]
      },
      &quot;nan_value_counts&quot;:null,
      &quot;lower_bounds&quot;:{
         &quot;array&quot;:[
            {
               &quot;key&quot;:1,
               &quot;value&quot;:&quot;ERROR&quot;
            },
            {
               &quot;key&quot;:3,
               &quot;value&quot;:&quot;Double oh noes&quot;
            }
         ]
      },
      &quot;upper_bounds&quot;:{
         &quot;array&quot;:[
            {
               &quot;key&quot;:1,
               &quot;value&quot;:&quot;WARN&quot;
            },
            {
               &quot;key&quot;:3,
               &quot;value&quot;:&quot;Maybeh oh noes?&quot;
            }
         ]
      },
      &quot;key_metadata&quot;:null,
      &quot;split_offsets&quot;:null
   }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now this is a very big output, but in summary, there’s really not too much to
these files. As before, there is a 
&lt;a href=&quot;https://iceberg.apache.org/spec/#manifests&quot;&gt;Manifest section in the Iceberg spec&lt;/a&gt;
that details what each of these fields means. Here are the important fields:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;snapshot_id&lt;/code&gt; - Snapshot id where the file was added, or deleted if status is
two. Inherited when null.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data_file&lt;/code&gt; - Field containing metadata about the data files pertaining to the
manifest file, such as file path, partition tuple, metrics, etc…&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data_file.file_path&lt;/code&gt; - Full URI for the file with FS scheme.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data_file.partition&lt;/code&gt; - Partition data tuple, schema based on the partition
spec.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data_file.record_count&lt;/code&gt; - Number of records in the data file.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data_file.*_count&lt;/code&gt; - Multiple fields that each contain a map from column id
to the number of values, nulls, or NaNs in the file. These can be used to
quickly filter out unnecessary get operations.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data_file.*_bounds&lt;/code&gt; - Multiple fields that contain a map from column id to
lower or upper bound in the column serialized as binary. Each value must be less
than or equal to all non-null, non-NaN values in the column for the file.&lt;/li&gt;
&lt;/ul&gt;
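One detail worth calling out in the output above: the `partition` value `18718` is how the `day` partition transform stores a date, as a count of whole days since the Unix epoch. You can confirm the mapping back to the partition directory names with a couple of lines of Python:

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)

def day_partition_to_date(days: int) -> date:
    # Iceberg's day transform stores dates as whole days since 1970-01-01.
    return EPOCH + timedelta(days=days)

print(day_partition_to_date(18718))  # 2021-04-01, i.e. event_time_day=2021-04-01
print(day_partition_to_date(18719))  # 2021-04-02
```
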

&lt;p&gt;Each data file struct contains the partition and data file that it maps to. These
files are only scanned and returned if the criteria for the query are met when
checking all of the count, bounds, and other statistics recorded in the
manifest. Ideally, only files that contain data relevant to the query are
scanned at all. Having information like the record count may also help the
query planning process determine splits and other information. This
particular optimization hasn’t been completed yet, as planning typically happens
before traversal of the files. It is still under discussion and
&lt;a href=&quot;https://youtu.be/ifXpOn0NJWk?t=2132&quot;&gt;is discussed a bit by Iceberg creator Ryan Blue in a recent meetup&lt;/a&gt;.
If this is something you are interested in, stay posted on the Slack channel and
releases as the Trino Iceberg connector progresses in this area.&lt;/p&gt;

&lt;p&gt;As mentioned above, the last set of files that you find in the metadata
directory are suffixed with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.metadata.json&lt;/code&gt;. These files are a bit
strange at first glance, as they aren’t stored in the Avro format but in JSON.
This is because they are not part of the persistent tree structure.
These files are essentially a copy of the table metadata that is stored in the
metastore. You can find the fields for the table metadata listed
&lt;a href=&quot;https://iceberg.apache.org/spec/#table-metadata-fields&quot;&gt;in the Iceberg specification&lt;/a&gt;.
These tables are typically stored persistently in a metastore, much like the Hive
metastore, but that could easily be replaced by any datastore that supports
&lt;a href=&quot;https://iceberg.apache.org/spec/#metastore-tables&quot;&gt;an atomic swap (check-and-put) operation&lt;/a&gt;,
which Iceberg requires for its optimistic concurrency model.&lt;/p&gt;

&lt;p&gt;The naming of the table metadata includes a table version and UUID: 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;table-version&amp;gt;-&amp;lt;UUID&amp;gt;.metadata.json&lt;/code&gt;. To commit a new metadata version, which
just adds 1 to the current version number, the writer performs these steps:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;It creates a new table metadata file using the current metadata.&lt;/li&gt;
  &lt;li&gt;It writes the new table metadata to a file following the naming with the next
version number.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;It requests the metastore swap the table’s metadata pointer from the old
location to the new location.&lt;/p&gt;

    &lt;ol&gt;
      &lt;li&gt;If the swap succeeds, the commit succeeded. The new file is now the 
 current metadata.&lt;/li&gt;
      &lt;li&gt;If the swap fails, another writer has already created their own. The
 current writer goes back to step 1.&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
&lt;/ol&gt;
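The retry loop above can be sketched against a toy metastore that offers nothing but a check-and-put primitive. This is illustrative only; the real connector performs the swap through the Hive metastore, and the file names here are placeholders:

```python
class ToyMetastore:
    """Holds one metadata pointer per table. The swap succeeds only if the
    caller still holds the current value, i.e. check-and-put."""
    def __init__(self):
        self.pointers = {}

    def check_and_put(self, table, expected, new):
        if self.pointers.get(table) != expected:
            return False  # another writer committed first
        self.pointers[table] = new
        return True

def commit(metastore, table, write_new_metadata):
    while True:
        current = metastore.pointers.get(table)   # step 1: read current metadata
        new = write_new_metadata(current)         # step 2: write the next version
        if metastore.check_and_put(table, current, new):
            return new                            # step 3a: swap succeeded
        # step 3b: swap failed, loop back to step 1 and retry

ms = ToyMetastore()
ms.pointers["logging.events"] = "00000-aaaa.metadata.json"

def next_version(current):
    # Bump the leading table version; the UUID part is a placeholder.
    version = int(current.split("-")[0]) + 1
    return "%05d-aaaa.metadata.json" % version

print(commit(ms, "logging.events", next_version))  # 00001-aaaa.metadata.json
```

Losing the swap costs only a rewrite of the small metadata file, which is what makes optimistic concurrency cheap enough for this design.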

&lt;p&gt;If you want to see where this is stored in the Hive metastore, you can reference
the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TABLE_PARAMS&lt;/code&gt; table. At the time of writing, this is the only method of
using the metastore that is supported by the Trino Iceberg connector.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT PARAM_KEY, PARAM_VALUE
FROM metastore.TABLE_PARAMS;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;PARAM_KEY                &lt;/th&gt;
      &lt;th&gt;PARAM_VALUE                                                                                     &lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;EXTERNAL                 &lt;/td&gt;
      &lt;td&gt;TRUE                                                                                            &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;metadata_location        &lt;/td&gt;
      &lt;td&gt;s3a://iceberg/logging.db/events/metadata/00002-33d69acc-94cb-44bc-b2a1-71120e749d9a.metadata.json&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;numFiles                 &lt;/td&gt;
      &lt;td&gt;2                                                                                               &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;previous_metadata_location&lt;/td&gt;
      &lt;td&gt;s3a://iceberg/logging.db/events/metadata/00001-27c8c2d1-fdbb-429d-9263-3654d818250e.metadata.json&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;table_type               &lt;/td&gt;
      &lt;td&gt;iceberg                                                                                         &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;totalSize                &lt;/td&gt;
      &lt;td&gt;5323                                                                                            &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;transient_lastDdlTime    &lt;/td&gt;
      &lt;td&gt;1622865672                                                                                      &lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;As you can see, the metastore shows that the current metadata location is the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;00002-33d69acc-94cb-44bc-b2a1-71120e749d9a.metadata.json&lt;/code&gt; file. Now you can
dive in to see the table metadata that is being used by the Iceberg connector.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;% cat ~/Desktop/avro_files/00002-33d69acc-94cb-44bc-b2a1-71120e749d9a.metadata.json
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;{
   &quot;format-version&quot;:1,
   &quot;table-uuid&quot;:&quot;32e3c271-84a9-4be5-9342-2148c878227a&quot;,
   &quot;location&quot;:&quot;s3a://iceberg/logging.db/events&quot;,
   &quot;last-updated-ms&quot;:1622865686323,
   &quot;last-column-id&quot;:5,
   &quot;schema&quot;:{
      &quot;type&quot;:&quot;struct&quot;,
      &quot;fields&quot;:[
         {
            &quot;id&quot;:1,
            &quot;name&quot;:&quot;level&quot;,
            &quot;required&quot;:false,
            &quot;type&quot;:&quot;string&quot;
         },
         {
            &quot;id&quot;:2,
            &quot;name&quot;:&quot;event_time&quot;,
            &quot;required&quot;:false,
            &quot;type&quot;:&quot;timestamp&quot;
         },
         {
            &quot;id&quot;:3,
            &quot;name&quot;:&quot;message&quot;,
            &quot;required&quot;:false,
            &quot;type&quot;:&quot;string&quot;
         },
         {
            &quot;id&quot;:4,
            &quot;name&quot;:&quot;call_stack&quot;,
            &quot;required&quot;:false,
            &quot;type&quot;:{
               &quot;type&quot;:&quot;list&quot;,
               &quot;element-id&quot;:5,
               &quot;element&quot;:&quot;string&quot;,
               &quot;element-required&quot;:false
            }
         }
      ]
   },
   &quot;partition-spec&quot;:[
      {
         &quot;name&quot;:&quot;event_time_day&quot;,
         &quot;transform&quot;:&quot;day&quot;,
         &quot;source-id&quot;:2,
         &quot;field-id&quot;:1000
      }
   ],
   &quot;default-spec-id&quot;:0,
   &quot;partition-specs&quot;:[
      {
         &quot;spec-id&quot;:0,
         &quot;fields&quot;:[
            {
               &quot;name&quot;:&quot;event_time_day&quot;,
               &quot;transform&quot;:&quot;day&quot;,
               &quot;source-id&quot;:2,
               &quot;field-id&quot;:1000
            }
         ]
      }
   ],
   &quot;default-sort-order-id&quot;:0,
   &quot;sort-orders&quot;:[
      {
         &quot;order-id&quot;:0,
         &quot;fields&quot;:[
            
         ]
      }
   ],
   &quot;properties&quot;:{
      &quot;write.format.default&quot;:&quot;ORC&quot;
   },
   &quot;current-snapshot-id&quot;:4564366177504223943,
   &quot;snapshots&quot;:[
      {
         &quot;snapshot-id&quot;:6967685587675910019,
         &quot;timestamp-ms&quot;:1622865672882,
         &quot;summary&quot;:{
            &quot;operation&quot;:&quot;append&quot;,
            &quot;changed-partition-count&quot;:&quot;0&quot;,
            &quot;total-records&quot;:&quot;0&quot;,
            &quot;total-data-files&quot;:&quot;0&quot;,
            &quot;total-delete-files&quot;:&quot;0&quot;,
            &quot;total-position-deletes&quot;:&quot;0&quot;,
            &quot;total-equality-deletes&quot;:&quot;0&quot;
         },
         &quot;manifest-list&quot;:&quot;s3a://iceberg/logging.db/events/metadata/snap-6967685587675910019-1-bcbe9133-c51c-42a9-9c73-f5b745702cb0.avro&quot;
      },
      {
         &quot;snapshot-id&quot;:2720489016575682283,
         &quot;parent-snapshot-id&quot;:6967685587675910019,
         &quot;timestamp-ms&quot;:1622865680419,
         &quot;summary&quot;:{
            &quot;operation&quot;:&quot;append&quot;,
            &quot;added-data-files&quot;:&quot;2&quot;,
            &quot;added-records&quot;:&quot;3&quot;,
            &quot;added-files-size&quot;:&quot;1954&quot;,
            &quot;changed-partition-count&quot;:&quot;2&quot;,
            &quot;total-records&quot;:&quot;3&quot;,
            &quot;total-data-files&quot;:&quot;2&quot;,
            &quot;total-delete-files&quot;:&quot;0&quot;,
            &quot;total-position-deletes&quot;:&quot;0&quot;,
            &quot;total-equality-deletes&quot;:&quot;0&quot;
         },
         &quot;manifest-list&quot;:&quot;s3a://iceberg/logging.db/events/metadata/snap-2720489016575682283-1-92382234-a4a6-4a1b-bc9b-24839472c2f6.avro&quot;
      },
      {
         &quot;snapshot-id&quot;:4564366177504223943,
         &quot;parent-snapshot-id&quot;:2720489016575682283,
         &quot;timestamp-ms&quot;:1622865686278,
         &quot;summary&quot;:{
            &quot;operation&quot;:&quot;append&quot;,
            &quot;added-data-files&quot;:&quot;1&quot;,
            &quot;added-records&quot;:&quot;1&quot;,
            &quot;added-files-size&quot;:&quot;746&quot;,
            &quot;changed-partition-count&quot;:&quot;1&quot;,
            &quot;total-records&quot;:&quot;4&quot;,
            &quot;total-data-files&quot;:&quot;3&quot;,
            &quot;total-delete-files&quot;:&quot;0&quot;,
            &quot;total-position-deletes&quot;:&quot;0&quot;,
            &quot;total-equality-deletes&quot;:&quot;0&quot;
         },
         &quot;manifest-list&quot;:&quot;s3a://iceberg/logging.db/events/metadata/snap-4564366177504223943-1-23cc980c-9570-42ed-85cf-8658fda2727d.avro&quot;
      }
   ],
   &quot;snapshot-log&quot;:[
      {
         &quot;timestamp-ms&quot;:1622865672882,
         &quot;snapshot-id&quot;:6967685587675910019
      },
      {
         &quot;timestamp-ms&quot;:1622865680419,
         &quot;snapshot-id&quot;:2720489016575682283
      },
      {
         &quot;timestamp-ms&quot;:1622865686278,
         &quot;snapshot-id&quot;:4564366177504223943
      }
   ],
   &quot;metadata-log&quot;:[
      {
         &quot;timestamp-ms&quot;:1622865672894,
         &quot;metadata-file&quot;:&quot;s3a://iceberg/logging.db/events/metadata/00000-c5cfaab4-f82f-4351-b2a5-bd0e241f84bc.metadata.json&quot;
      },
      {
         &quot;timestamp-ms&quot;:1622865680524,
         &quot;metadata-file&quot;:&quot;s3a://iceberg/logging.db/events/metadata/00001-27c8c2d1-fdbb-429d-9263-3654d818250e.metadata.json&quot;
      }
   ]
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As you can see, these JSON files can quickly grow as you perform different
updates on your table. This file contains a pointer to all of the snapshots and
manifest list files, much like the output you found from looking at the
snapshots in the table. A really important piece to note is that the schema is
stored here. This is what Trino uses for validation on inserts and reads. As you
may expect, there is the root location of the table itself, as well as a unique
table identifier. The final part I’d like to note about this file is the
partition-spec and partition-specs fields. The partition-spec field holds the
current partition spec, while partition-specs is an array that holds all
partition specs that have ever existed for this table. As pointed out
earlier, you can have many different manifest files that use different partition
specs. That wraps up all of the metadata file types you can expect to see in
Iceberg!&lt;/p&gt;
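&lt;p&gt;If you want to poke at one of these metadata files yourself, a short script
can pull out the pieces discussed above. This is a hypothetical helper for
exploration, not part of Iceberg or Trino:&lt;/p&gt;

```python
import json

def summarize_metadata(text):
    # Summarize a metadata.json document: table location, column names,
    # current partition spec, and the current snapshot's manifest list.
    meta = json.loads(text)
    current_id = meta['current-snapshot-id']
    current = next(s for s in meta['snapshots']
                   if s['snapshot-id'] == current_id)
    return {
        'location': meta['location'],
        'columns': [f['name'] for f in meta['schema']['fields']],
        'partition-spec': [f['name'] for f in meta['partition-spec']],
        'current-manifest-list': current['manifest-list'],
        'snapshot-count': len(meta['snapshots']),
    }
```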

&lt;p&gt;This post wraps up the Trino on ice series. Hopefully these blog posts serve as
a helpful initial dialogue about what is expected to grow as a vital portion of
an open data lakehouse stack. What are you waiting for? Come join the fun and
help us implement some of the missing features or instead go ahead and try 
&lt;a href=&quot;https://github.com/bitsondatadev/trino-getting-started/tree/main/iceberg/trino-iceberg-minio&quot;&gt;Trino on Ice(berg)&lt;/a&gt;
yourself!&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>Welcome to the Trino on ice series, covering the details around how the Iceberg table format works with the Trino query engine. The examples build on each previous post, so it’s recommended to read the posts sequentially and reference them as needed later. Here are links to the posts in this series: Trino on ice I: A gentle introduction to Iceberg Trino on ice II: In-place table evolution and cloud compatibility with Iceberg Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec Trino on ice IV: Deep dive into Iceberg internals So far, this series has covered some very interesting user level concepts of the Iceberg model, and how you can take advantage of them using the Trino query engine. This blog post dives into some implementation details of Iceberg by dissecting some files that result from various operations carried out using Trino. To dissect you must use some surgical instrumentation, namely Trino, Avro tools, the MinIO client tool and Iceberg’s core library. It’s useful to dissect how these files work, not only to help understand how Iceberg works, but also to aid in troubleshooting issues, should you have any issues during ingestion or querying of your Iceberg table. I like to think of this type of debugging much like a fun game of operation, and you’re looking to see what causes the red errors to fly by on your screen.</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec</title>
      <link href="https://trino.io/blog/2021/07/30/iceberg-concurrency-snapshots-spec.html" rel="alternate" type="text/html" title="Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec" />
      <published>2021-07-30T00:00:00+00:00</published>
      <updated>2021-07-30T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/07/30/iceberg-concurrency-snapshots-spec</id>
      <content type="html" xml:base="https://trino.io/blog/2021/07/30/iceberg-concurrency-snapshots-spec.html">&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;100%&quot; height=&quot;100%&quot; src=&quot;/assets/blog/trino-on-ice/trino-iceberg.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Welcome to the Trino on ice series, covering the details around how the Iceberg
table format works with the Trino query engine. The examples build on each
previous post, so it’s recommended to read the posts sequentially and reference
them as needed later. Here are links to the posts in this series:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/05/03/a-gentle-introduction-to-iceberg.html&quot;&gt;Trino on ice I: A gentle introduction to Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg.html&quot;&gt;Trino on ice II: In-place table evolution and cloud compatibility with Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/30/iceberg-concurrency-snapshots-spec.html&quot;&gt;Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/08/12/deep-dive-into-iceberg-internals.html&quot;&gt;Trino on ice IV: Deep dive into Iceberg internals&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the last two blog posts, we’ve covered a lot of cool feature improvements of
Iceberg over the Hive model. I recommend you take a look at those if you haven’t
yet. We introduced concepts and issues that table formats address. This blog post
closes out the overview of Iceberg features by discussing the concurrency model
Iceberg uses to ensure data integrity, how to use snapshots via Trino, and the
&lt;a href=&quot;https://iceberg.apache.org/spec/&quot;&gt;Iceberg Specification&lt;/a&gt;.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;concurrency-model&quot;&gt;Concurrency Model&lt;/h2&gt;

&lt;p&gt;One issue with the Hive model is that the metadata and the data files are
stored in distinct locations, by distinct services. Having your data and metadata
split up like this is a recipe for disaster when trying to apply updates to both
services atomically.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-on-ice/iceberg-metadata.png&quot; alt=&quot;Iceberg metadata diagram of runtime, and file storage&quot; /&gt;&lt;/p&gt;

&lt;p&gt;A very common problem with Hive is that if a writing process failed during
insertion, many times you would find the data written to file storage, but the
metastore writes failed to occur. Or conversely, the metastore writes were
successful, but the data failed to finish writing to file storage due to a 
network or file IO failure. There’s a good 
&lt;a href=&quot;https://trino.io/episodes/5.html&quot;&gt;Trino Community Broadcast episode&lt;/a&gt; that talks
about a function in Trino that exists to resolve these issues by syncing the
metastore and file storage. You can watch 
&lt;a href=&quot;https://www.youtube.com/watch?v=OXyJFZSsX5w&amp;amp;t=2097s&quot;&gt;a simulation of this error&lt;/a&gt;
on that episode.&lt;/p&gt;

&lt;p&gt;Aside from having issues due to the split state in the system, there are many 
other issues that stem from the file system itself. In the case of HDFS, 
depending on the specific filesystem implementation you are using, you may have
&lt;a href=&quot;https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/filesystem/introduction.html#Core_Expectations_of_a_Hadoop_Compatible_FileSystem&quot;&gt;different atomicity guarantees for various file systems and their operations&lt;/a&gt;,
such as creating, deleting, and renaming files and directories. HDFS isn’t the
only troublemaker here. Aside from Amazon S3’s 
&lt;a href=&quot;https://aws.amazon.com/about-aws/whats-new/2020/12/amazon-s3-now-delivers-strong-read-after-write-consistency-automatically-for-all-applications/&quot;&gt;recent announcement of strong consistency in their S3 service,&lt;/a&gt;
most object storage systems only offer &lt;em&gt;eventual&lt;/em&gt; consistency and may not show
the latest files immediately after writes. Despite storage systems showing more
progress towards offering better performance and guarantees, these systems still
offer no reliable locking mechanism.&lt;/p&gt;

&lt;p&gt;Iceberg addresses all of these issues in a multitude of ways. One of the primary
ways Iceberg introduces transactional guarantees is by storing the metadata in
the same datastore as the data itself. This simplifies handling commit failures
down to rolling back on one system rather than trying to coordinate a rollback
across two systems like in Hive. Writers independently write their metadata and
attempt to perform their operations, needing no coordination with other writers.
The only time the writers coordinate is when they attempt to commit their
operations. To commit, they acquire a lock on the current snapshot record in a
database. This concurrency model, where writers eagerly do the work upfront, is
called &lt;strong&gt;&lt;em&gt;optimistic concurrency control&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Currently, in Trino, this method still uses the Hive metastore to perform the
lock-and-swap operation necessary to coordinate the final commits. Iceberg 
creator, &lt;a href=&quot;https://www.linkedin.com/in/rdblue/&quot;&gt;Ryan Blue&lt;/a&gt;, 
&lt;a href=&quot;https://youtu.be/-iIY2sOFBRc?t=1351&quot;&gt;covers this lock-and-swap mechanism&lt;/a&gt; and
how the metastore can be replaced with alternate locking methods. In the event
that &lt;a href=&quot;https://iceberg.apache.org/reliability/#concurrent-write-operations&quot;&gt;two writers attempt to commit at the same time&lt;/a&gt;,
the writer that first acquires the lock successfully commits by swapping its
snapshot as the current snapshot, while the second writer must reapply its
changes and try again. The second writer should have no problem with this, assuming
there are no conflicting changes between the two snapshots.&lt;/p&gt;
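&lt;p&gt;The commit-or-retry behavior can be sketched as follows. All names here are
hypothetical, and the pointer is an in-process object rather than a metastore
lock, but the control flow is the same:&lt;/p&gt;

```python
class TablePointer:
    # The single coordinated resource: the table's current snapshot ID.
    def __init__(self):
        self.current = 0

    def compare_and_swap(self, expected, new):
        # Succeeds only if no other writer committed in the meantime.
        if self.current != expected:
            return False
        self.current = new
        return True

def optimistic_commit(table, apply_changes):
    # Do all the work up front; coordinate only at commit time.
    retries = 0
    while True:
        base = table.current
        new_snapshot = apply_changes(base)   # write data and metadata files
        if table.compare_and_swap(base, new_snapshot):
            return new_snapshot, retries
        retries += 1                         # lost the race: rebase and retry
```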

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-on-ice/iceberg-files.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This works similarly to a git workflow where the main branch is the locked
resource, and two developers try to commit their changes at the same time. The
first developer’s changes may conflict with the second developer’s changes. The
second developer is then forced to rebase or merge the first developer’s code
with their changes before committing to the main branch again. The same logic
applies to merging data files. Currently, Iceberg clients use a
&lt;a href=&quot;https://iceberg.apache.org/reliability/#concurrent-write-operations&quot;&gt;copy-on-write mechanism&lt;/a&gt;
that makes a new file out of the merged data in the next snapshot. This enables
accurate time traveling and preserves previous split versions of the files. At
the time of writing, upserts via &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE INTO&lt;/code&gt; syntax are not supported in Trino,
but 
&lt;a href=&quot;https://github.com/trinodb/trino/issues/7708&quot;&gt;this is in active development&lt;/a&gt;.
&lt;strong&gt;&lt;em&gt;UPDATE:&lt;/em&gt;&lt;/strong&gt; Since the original writing of this post, the 
&lt;a href=&quot;https://github.com/trinodb/trino/pull/7933&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; syntax exists as of version 393&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;One of the great benefits of tracking each individual change that gets written
to Iceberg is that you are given a view of the data at every point in time. This
enables a really cool feature that I mentioned earlier called &lt;strong&gt;&lt;em&gt;time travel&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;h2 id=&quot;snapshots-and-time-travel&quot;&gt;Snapshots and Time Travel&lt;/h2&gt;

&lt;p&gt;To showcase snapshots, it’s best to go over a few examples drawing from the
events table we 
&lt;a href=&quot;/blog/2021/05/03/a-gentle-introduction-to-iceberg.html&quot;&gt;created in the previous blog posts&lt;/a&gt;.
This time we’ll only be working with the Iceberg table, as this capability is
not available in Hive. Snapshots allow you to have an immutable set of your data
at a given time. They are automatically created on every append or removal of
data. One thing to note is that for now, they do not store the state of your
metadata.&lt;/p&gt;

&lt;p&gt;Say that you have created your events table and inserted the three initial rows
as we did previously. Let’s look at the data we get back and see how to check
the existing snapshots in Trino:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT level, message
FROM iceberg.logging.events;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;level&lt;/th&gt;
      &lt;th&gt;message&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;Double oh noes&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;WARN&lt;/td&gt;
      &lt;td&gt;Maybeh oh noes?&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;Oh noes&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;To query the snapshots, append the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$&lt;/code&gt; operator to the end of the
table name, followed by the name of the hidden table, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;snapshots&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT snapshot_id, parent_id, operation
FROM iceberg.logging.&quot;events$snapshots&quot;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;snapshot_id&lt;/th&gt;
      &lt;th&gt;parent_id&lt;/th&gt;
      &lt;th&gt;operation&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;7620328658793169607&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt;append&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;2115743741823353537&lt;/td&gt;
      &lt;td&gt;7620328658793169607&lt;/td&gt;
      &lt;td&gt;append&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Let’s take a look at the manifest list files that are associated with each 
snapshot ID. You can tell which file belongs to which snapshot based on the 
snapshot ID embedded in the filename:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT manifest_list
FROM iceberg.logging.&quot;events$snapshots&quot;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;manifest_list&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;s3a://iceberg/logging.db/events/metadata/snap-7620328658793169607-1-cc857d89-1c07-4087-bdbc-2144a814dae2.avro&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;s3a://iceberg/logging.db/events/metadata/snap-2115743741823353537-1-4cb458be-7152-4e99-8db7-b2dda52c556c.avro&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
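&lt;p&gt;Because the snapshot ID is embedded in the filename, a small helper can map a
manifest list path back to its snapshot. This is a hypothetical convenience for
exploration, relying only on the snap-{snapshot-id}-{attempt}-{uuid}.avro naming
shown above:&lt;/p&gt;

```python
def snapshot_id_from_manifest_list(path):
    # Manifest list files are named snap-{snapshot-id}-{attempt}-{uuid}.avro,
    # so the snapshot ID is the second dash-separated field of the filename.
    name = path.rsplit('/', 1)[-1]   # drop the s3a://.../metadata/ prefix
    if not name.startswith('snap-'):
        raise ValueError('not a manifest list file: ' + name)
    return int(name.split('-')[1])
```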

&lt;p&gt;Now, let’s insert another row into the table:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;INSERT INTO iceberg.logging.events
VALUES
(
'INFO',
timestamp '2021-04-02 00:00:11.1122222',
'It is all good',
ARRAY ['Just updating you!']
);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Let’s check the snapshot table again:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT snapshot_id, parent_id, operation
FROM iceberg.logging.&quot;events$snapshots&quot;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;snapshot_id&lt;/th&gt;
      &lt;th&gt;parent_id&lt;/th&gt;
      &lt;th&gt;operation&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;7620328658793169607&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt;append&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;2115743741823353537&lt;/td&gt;
      &lt;td&gt;7620328658793169607&lt;/td&gt;
      &lt;td&gt;append&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;7030511368881343137&lt;/td&gt;
      &lt;td&gt;2115743741823353537&lt;/td&gt;
      &lt;td&gt;append&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Let’s also verify that our row was added:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT level, message
FROM iceberg.logging.events;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;level&lt;/th&gt;
      &lt;th&gt;message&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;Oh noes&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;INFO&lt;/td&gt;
      &lt;td&gt;It is all good&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;Double oh noes&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;WARN&lt;/td&gt;
      &lt;td&gt;Maybeh oh noes?&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Since Iceberg is already tracking the list of files added and removed at each
snapshot, it would make sense that you can travel back and forth between these
different views of the table, right? This concept is called time travel.
You specify which snapshot you would like to read from, and you see the view of
the data as of that snapshot. In Trino, you use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@&lt;/code&gt;
operator, followed by the ID of the snapshot you wish to read from:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT level, message
FROM iceberg.logging.&quot;events@2115743741823353537&quot;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;level&lt;/th&gt;
      &lt;th&gt;message&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;Double oh noes&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;WARN&lt;/td&gt;
      &lt;td&gt;Maybeh oh noes?&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;Oh noes&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
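&lt;p&gt;Conceptually, traveling to a point in time rather than to a snapshot ID
amounts to scanning the table’s snapshot history for the latest snapshot
committed at or before that time. A minimal sketch of that lookup (hypothetical
helper, not Trino’s implementation):&lt;/p&gt;

```python
def snapshot_at(snapshot_log, ts_ms):
    # snapshot_log: entries with 'timestamp-ms' and 'snapshot-id' keys,
    # in commit order. Return the ID of the latest snapshot committed
    # at or before ts_ms, or None if the table did not exist yet.
    chosen = None
    for entry in snapshot_log:
        if entry['timestamp-ms'] > ts_ms:
            break
        chosen = entry['snapshot-id']
    return chosen
```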

&lt;p&gt;If you determine there is some issue with your data, you can always roll back to
a previous state permanently as well. In Trino, we have a procedure called
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rollback_to_snapshot&lt;/code&gt; to move the table state back to another snapshot:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CALL system.rollback_to_snapshot('logging', 'events', 2115743741823353537);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now that we have rolled back, observe what happens when we query the events
table with:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT level, message
FROM iceberg.logging.events;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;level&lt;/th&gt;
      &lt;th&gt;message&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;Double oh noes&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;WARN&lt;/td&gt;
      &lt;td&gt;Maybeh oh noes?&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;Oh noes&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Notice the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INFO&lt;/code&gt; row is still missing even though we query the table without
specifying a snapshot ID. Just because we rolled back doesn’t mean we’ve
lost the snapshot we rolled back from. In fact, we can roll forward, or as
I like to call it, 
&lt;a href=&quot;https://en.wikipedia.org/wiki/Back_to_the_Future&quot;&gt;back to the future&lt;/a&gt;! In
Trino, you use the same procedure call, but with a successor of the current
snapshot:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CALL system.rollback_to_snapshot('logging', 'events', 7030511368881343137);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And now we should be able to query the table again and see the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INFO&lt;/code&gt; row 
return:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT level, message
FROM iceberg.logging.events;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;level&lt;/th&gt;
      &lt;th&gt;message&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;Oh noes&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;INFO&lt;/td&gt;
      &lt;td&gt;It is all good&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;Double oh noes&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;WARN&lt;/td&gt;
      &lt;td&gt;Maybeh oh noes?&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;As expected, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INFO&lt;/code&gt; row returns when you roll back to the future.&lt;/p&gt;

&lt;p&gt;Having snapshots not only provides you with a level of immutability that is
key when working with eventually consistent storage, but also gives you a rich
set of features to version and move between different versions of your data,
like a git repository.&lt;/p&gt;

&lt;h2 id=&quot;iceberg-specification&quot;&gt;Iceberg Specification&lt;/h2&gt;

&lt;p&gt;Perhaps saving the best for last, the benefit of using Iceberg is the community
that surrounds it, and the support you receive. It can be daunting to have to
choose a project that replaces something so core to your architecture. While
Hive has so many drawbacks, one of the things keeping many companies locked in
is the fear of the unknown. How do you know which table format to choose? Are
there unknown data corruption issues that I’m about to take on? What if this
doesn’t scale like it promises on the label? It is worth noting that 
&lt;a href=&quot;https://lakefs.io/hudi-iceberg-and-delta-lake-data-lake-table-formats-compared/&quot;&gt;alternative table formats are also emerging in this space&lt;/a&gt; 
and we encourage you to investigate these for your own use cases. When sitting
down with Iceberg creator, Ryan Blue, 
&lt;a href=&quot;https://www.twitch.tv/videos/989098630&quot;&gt;comparing Iceberg to other table formats&lt;/a&gt;, 
he claims the community’s greatest strength is its ability to look forward.
They intentionally broke compatibility with Hive to enable them to provide a
richer level of features. Unlike Hive, the Iceberg project explained their
thinking in a spec.&lt;/p&gt;

&lt;p&gt;The strongest argument I can see for Iceberg is that it has a 
&lt;a href=&quot;https://iceberg.apache.org/spec/&quot;&gt;specification&lt;/a&gt;. This is something that has
largely been missing from Hive and shows a real maturity in how the Iceberg
community has approached the issue. On the Trino project, we think standards are
important. We adhere to many standards ourselves, such as ANSI SQL syntax and
exposing clients through JDBC connections. By creating a standard around
this, you’re no longer tied to any particular technology, not even Iceberg
itself. You are adhering to a standard that will hopefully become the de facto
standard over a decade or two, much like Hive did. Having the standard in clear
writing invites multiple communities to the table and brings even more use 
cases. Doing so improves the standards and therefore the technologies that
implement them.&lt;/p&gt;

&lt;p&gt;The previous three blog posts of this series covered the features and massive
benefits of using this novel table format. The following post dives deeper
into how Iceberg achieves some of this functionality, with an
overview of some of the internals and metadata layouts. In the meantime, feel
free to try 
&lt;a href=&quot;https://github.com/bitsondatadev/trino-getting-started/tree/main/iceberg/trino-iceberg-minio&quot;&gt;Trino on Ice(berg)&lt;/a&gt;.&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>Welcome to the Trino on ice series, covering the details around how the Iceberg table format works with the Trino query engine. The examples build on each previous post, so it’s recommended to read the posts sequentially and reference them as needed later. Here are links to the posts in this series: Trino on ice I: A gentle introduction to Iceberg Trino on ice II: In-place table evolution and cloud compatibility with Iceberg Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec Trino on ice IV: Deep dive into Iceberg internals In the last two blog posts, we’ve covered a lot of cool feature improvements of Iceberg over the Hive model. I recommend you take a look at those if you haven’t yet. We introduced concepts and issues that table formats address. This blog closes up the overview of Iceberg features by discussing the concurrency model Iceberg uses to ensure data integrity, how to use snapshots via Trino, and the Iceberg Specification.</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino on ice II: In-place table evolution and cloud compatibility with Iceberg</title>
      <link href="https://trino.io/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg.html" rel="alternate" type="text/html" title="Trino on ice II: In-place table evolution and cloud compatibility with Iceberg" />
      <published>2021-07-12T00:00:00+00:00</published>
      <updated>2021-07-12T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg</id>
      <content type="html" xml:base="https://trino.io/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg.html">&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;100%&quot; height=&quot;100%&quot; src=&quot;/assets/blog/trino-on-ice/trino-iceberg.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Welcome to the Trino on ice series, covering the details around how the Iceberg
table format works with the Trino query engine. The examples build on each
previous post, so it’s recommended to read the posts sequentially and reference
them as needed later. Here are links to the posts in this series:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/05/03/a-gentle-introduction-to-iceberg.html&quot;&gt;Trino on ice I: A gentle introduction to Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg.html&quot;&gt;Trino on ice II: In-place table evolution and cloud compatibility with Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/30/iceberg-concurrency-snapshots-spec.html&quot;&gt;Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/08/12/deep-dive-into-iceberg-internals.html&quot;&gt;Trino on ice IV: Deep dive into Iceberg internals&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;/blog/2021/05/03/a-gentle-introduction-to-iceberg.html&quot;&gt;The first post&lt;/a&gt; 
covered how Iceberg is a table format and not a file format. It demonstrated the
benefits of hidden partitioning in Iceberg in contrast to exposed partitioning 
in Hive. There really is no such thing as “exposed partitioning.” I just thought
that sounded better than not-hidden partitioning. If any of that wasn’t clear, I
recommend either that you stop reading now, or go back to the first post before 
starting this one. This post discusses evolution. No, the post isn’t covering 
Darwinian nor Pokémon evolution, but in-place table evolution!&lt;/p&gt;

&lt;!--more--&gt;

&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/blog/trino-on-ice/evolution.gif&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;You may find it a little odd that I am getting excited over tables evolving 
in-place, but as mentioned in the last post, if you have experience performing 
table evolution in Hive, you’d be as happy as Ash Ketchum when Charmander 
evolved into Charmeleon upon discovering that Iceberg supports partition evolution 
and schema evolution. That is, until Charmeleon started treating Ash like a jerk
after the evolution from Charmander. Hopefully, you won’t face the same issue 
when your tables evolve.&lt;/p&gt;

&lt;p&gt;Another important aspect covered here is how Iceberg is developed with cloud
storage in mind. Hive and other data lake technologies were developed with file
systems as their primary storage layer. This is still a very common layer today,
but as more companies moved to object storage, table formats did not 
adapt to the needs of object stores. Let’s dive in!&lt;/p&gt;

&lt;h2 id=&quot;partition-specification-evolution&quot;&gt;Partition Specification evolution&lt;/h2&gt;

&lt;p&gt;In Iceberg, you can update the partition specification, shortened to 
partition spec, on a live table. You do not need to perform a table 
migration as you do in Hive. In Hive, partition specs don’t explicitly exist 
because they are tightly coupled with the creation of the Hive table. This means
that if you ever need to change the granularity of your data partitions at any 
point, you need to create an entirely new table and move all the data to the new 
partition granularity you desire. No pressure on choosing the right granularity
or anything!&lt;/p&gt;

&lt;p&gt;In Iceberg, you’re not required to choose the perfect partition specification 
upfront, and you can have multiple partition specs in the same table, and query
across the different-sized partition specs. How great is that! This means if 
you’re initially partitioning your data by month, and later you decide to move 
to a daily partitioning spec due to a growing ingest from all your new 
customers, you can do so with no migration, and query over the table with no 
issue.&lt;/p&gt;

&lt;p&gt;This is conveyed pretty succinctly in this graphic from the Iceberg 
documentation. Through the end of 2008, partitioning occurs at a monthly 
granularity, and starting in 2009, it moves to a daily granularity. When a query 
pulls data between December 14th, 2008 and January 13th, 2009, the entire month 
of December gets scanned due to the monthly partition, but for the dates in 
January, only the first 13 days are scanned to answer the query.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/blog/trino-on-ice/partition-spec-evolution.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;At the time of writing, Trino is able to perform reads from tables that have 
multiple partition spec changes, but partition evolution write support does not 
yet exist. &lt;a href=&quot;https://github.com/trinodb/trino/issues/7580&quot;&gt;There are efforts to add this support in the near future&lt;/a&gt;.&lt;/p&gt;
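
&lt;p&gt;In the meantime, engines that already support writing partition spec changes can
evolve the spec in place. As a sketch, moving a hypothetical table from the monthly
to the daily granularity shown in the graphic could look like this in Spark SQL with
the Iceberg extensions enabled. The catalog, table, and column names are placeholders,
and the exact syntax may vary by Iceberg version:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Spark SQL with the Iceberg extensions enabled; names are placeholders
ALTER TABLE my_catalog.db.sample DROP PARTITION FIELD months(ts);

ALTER TABLE my_catalog.db.sample ADD PARTITION FIELD days(ts);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;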

&lt;h2 id=&quot;schema-evolution&quot;&gt;Schema evolution&lt;/h2&gt;

&lt;p&gt;Iceberg also handles schema evolution much more elegantly than Hive. In Hive, 
adding columns worked well enough, as data inserted before the schema change 
just reports null for that column. For formats that use column names, like ORC 
and Parquet, deletes are also straightforward for Hive, as it simply ignores 
fields that are no longer part of the table. For unstructured files like CSV 
that use the position of the column, deletes would still cause issues, as 
deleting one column shifts the rest of the columns. Renames pose an 
issue for all formats in Hive, as data written prior to the rename is not 
migrated to the new field name. This effectively works the same as if you deleted 
the old field and added a new column with the new name. This uneven support for
schema evolution across file types in Hive requires memorizing the format
underneath each table, and easily leads to user
errors if someone executes one of the unsupported operations on the wrong table.&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
  &lt;tr&gt;
    &lt;th colspan=&quot;4&quot;&gt;Hive 2.2.0 schema evolution based on file type and operation.&lt;/th&gt;
  &lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;/td&gt;
    &lt;td&gt;Add&lt;/td&gt;
    &lt;td&gt;Delete&lt;/td&gt;
    &lt;td&gt;Rename&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;CSV/TSV&lt;/td&gt;
    &lt;td&gt;✅&lt;/td&gt;
    &lt;td&gt;❌&lt;/td&gt;
    &lt;td&gt;❌&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;JSON&lt;/td&gt;
    &lt;td&gt;✅&lt;/td&gt;
    &lt;td&gt;✅&lt;/td&gt;
    &lt;td&gt;❌&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;ORC/Parquet/Avro&lt;/td&gt;
    &lt;td&gt;✅&lt;/td&gt;
    &lt;td&gt;✅&lt;/td&gt;
    &lt;td&gt;❌&lt;/td&gt;
  &lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Currently in Iceberg, schemaless position-based data formats such as CSV and TSV
are not supported, though there are &lt;a href=&quot;https://github.com/apache/iceberg/issues/118&quot;&gt;some discussions on adding limited support 
for them&lt;/a&gt;. This would be useful from
a reading standpoint: loading data from CSV into an Iceberg-supported format
with all the guarantees that Iceberg offers.&lt;/p&gt;

&lt;p&gt;While JSON doesn’t rely on positional data, it does have an explicit dependency
on names. This means that if I remove a text column named 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;severity&lt;/code&gt; from a JSON table, and later add a new int column called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;severity&lt;/code&gt;, I 
encounter an error when deserializing the JSON files, as older rows still hold 
string values under that name. Even worse would be if the new
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;severity&lt;/code&gt; column you add has the same type as the original but a semantically 
different meaning. This results in old rows containing values that are 
unknowingly from a different domain, which can lead to wrong analytics. After 
all, someone who adds the new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;severity&lt;/code&gt; column might not even be aware of the 
old &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;severity&lt;/code&gt; column if it was dropped quite some time ago.&lt;/p&gt;

&lt;p&gt;ORC, Parquet, and Avro do not suffer from these issues, as they are 
self-describing formats that keep a schema internal to the file itself, and each 
format tracks changes to the columns through IDs rather than names or positions. 
Iceberg uses these unique column IDs to keep track of the columns as changes are 
applied.&lt;/p&gt;

&lt;p&gt;In general, Iceberg can only allow this small set of file formats due to the 
&lt;a href=&quot;https://iceberg.apache.org/evolution/#correctness&quot;&gt;correctness guarantees&lt;/a&gt; it 
provides. In Trino, you can add, delete, or rename columns using the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER TABLE&lt;/code&gt; command. Here’s an example that continues from the table created 
in the last post, which inserted three rows. The DDL statement looked like this.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE TABLE iceberg.logging.events (
  level VARCHAR,
  event_time TIMESTAMP(6), 
  message VARCHAR,
  call_stack ARRAY(VARCHAR)
) WITH (
  format = &apos;ORC&apos;,
  partitioning = ARRAY[&apos;day(event_time)&apos;]
);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Here is an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER TABLE&lt;/code&gt; sequence that adds a new column named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;severity&lt;/code&gt;, 
inserts data including into the new column, renames the column, and prints the 
data.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ALTER TABLE iceberg.logging.events ADD COLUMN severity INTEGER; 

INSERT INTO iceberg.logging.events VALUES 
(
  &apos;INFO&apos;, 
  timestamp 
  &apos;2021-04-01 19:59:59.999999&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;, 
  &apos;es muy bueno&apos;, 
  ARRAY [&apos;It is all normal&apos;], 
  1
);

ALTER TABLE iceberg.logging.events RENAME COLUMN severity TO priority;

SELECT level, message, priority
FROM iceberg.logging.events;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;level&lt;/th&gt;
      &lt;th&gt;message&lt;/th&gt;
      &lt;th&gt;priority&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;Double oh noes&lt;/td&gt;
      &lt;td&gt;NULL&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;WARN&lt;/td&gt;
      &lt;td&gt;Maybeh oh noes?&lt;/td&gt;
      &lt;td&gt;NULL&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;Oh noes&lt;/td&gt;
      &lt;td&gt;NULL&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;INFO&lt;/td&gt;
      &lt;td&gt;es muy bueno&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ALTER TABLE iceberg.logging.events 
DROP COLUMN priority;

SHOW CREATE TABLE iceberg.logging.events;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE TABLE iceberg.logging.events (
   level varchar,
   event_time timestamp(6),
   message varchar,
   call_stack array(varchar)
)
WITH (
   format = &apos;ORC&apos;,
   partitioning = ARRAY[&apos;day(event_time)&apos;]
)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Notice that neither the priority nor the severity column is present in the schema.
As noted in the table above, renames cause issues for all file formats in Hive. Yet
in Iceberg, performing all of these operations causes no issues with the table or the
underlying data.&lt;/p&gt;

&lt;h2 id=&quot;cloud-storage-compatibility&quot;&gt;Cloud storage compatibility&lt;/h2&gt;

&lt;p&gt;Not all developers consider or are aware of the performance implications of 
using Hive over a cloud object storage solution like S3 or Azure Blob storage. 
One thing to remember is that Hive was developed with the Hadoop Distributed 
File System (HDFS) in mind. HDFS is a filesystem that is particularly well suited
to listing files, because they are stored in a 
contiguous manner. When Hive stores data associated with a table, it assumes 
there is a contiguous layout underneath it and performs list operations that are
expensive on cloud storage systems.&lt;/p&gt;

&lt;p&gt;The common cloud storage systems are typically object stores that do not lay out
the files in a contiguous manner based on paths. Therefore, it becomes very 
expensive to list out all the files in a particular path. Yet, these list 
operations are executed for every partition that could be included in a query, 
even if only a single row in a single file out of thousands 
needs to be retrieved to answer the query. Even ignoring the performance costs
for a minute, object stores may also pose issues for Hive due to eventual 
consistency. Inserting and deleting can cause inconsistent results for readers, 
if the files you end up reading are out of date.&lt;/p&gt;

&lt;p&gt;Iceberg avoids all of these issues by tracking the data at the file level, 
rather than the partition level. By tracking the files, Iceberg only accesses 
the files containing data relevant to the query, as opposed to accessing files 
in the same partition looking for the few files that are relevant to the query. 
Further, this allows Iceberg to control for the inconsistency issue in 
cloud-based file systems by using a locking mechanism at the file level. See the
file layout below comparing the Hive layout to the Iceberg layout. As you can see in 
the next image, Iceberg makes no assumptions about the data being contiguous or 
not. It simply builds a persistent tree using the snapshot (S) location stored 
in the metadata, that points to the manifest list (ML), which points to 
manifests containing partitions (P). Finally, these manifest files contain the 
file (F) locations and stats that can quickly be used to prune data versus 
needing to do a list operation and scanning all the files.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;100%&quot; height=&quot;100%&quot; src=&quot;/assets/blog/trino-on-ice/cloud-file-layout.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Referencing the picture above, if you were to run a query where the result set 
only contains rows from file F1, Hive would require a list operation and 
scanning the files, F2 and F3. In Iceberg, file metadata exists in the manifest 
file, P1, that would have a range on the predicate field that prunes out files 
F2 and F3, and only scans file F1. This example only shows a couple of files, 
but imagine storage that scales up to thousands of files! Listing becomes 
expensive on files that are not contiguously stored. Having this 
flexibility in the logical layout is essential to increasing query performance. 
This is especially true on cloud object stores.&lt;/p&gt;

&lt;p&gt;If you want to play around with Iceberg using Trino, check out the 
&lt;a href=&quot;https://trino.io/docs/current/connector/iceberg.html&quot;&gt;Trino Iceberg docs&lt;/a&gt;. 
To avoid issues like the eventual consistency issue, as well as other problems 
of trying to sync operations across systems, Iceberg provides optimistic 
concurrency support, which is covered in more detail in
&lt;a href=&quot;/blog/2021/07/30/iceberg-concurrency-snapshots-spec.html&quot;&gt;the next post&lt;/a&gt;.&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>Welcome to the Trino on ice series, covering the details around how the Iceberg table format works with the Trino query engine. The examples build on each previous post, so it’s recommended to read the posts sequentially and reference them as needed later. Here are links to the posts in this series: Trino on ice I: A gentle introduction to Iceberg Trino on ice II: In-place table evolution and cloud compatibility with Iceberg Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec Trino on ice IV: Deep dive into Iceberg internals The first post covered how Iceberg is a table format and not a file format. It demonstrated the benefits of hidden partitioning in Iceberg in contrast to exposed partitioning in Hive. There really is no such thing as “exposed partitioning.” I just thought that sounded better than not-hidden partitioning. If any of that wasn’t clear, I recommend either that you stop reading now, or go back to the first post before starting this one. This post discusses evolution. No, the post isn’t covering Darwinian nor Pokémon evolution, but in-place table evolution!</summary>

      
      
    </entry>
  
    <entry>
      <title>Row pattern recognition with MATCH_RECOGNIZE</title>
      <link href="https://trino.io/blog/2021/05/19/row_pattern_matching.html" rel="alternate" type="text/html" title="Row pattern recognition with MATCH_RECOGNIZE" />
      <published>2021-05-19T00:00:00+00:00</published>
      <updated>2021-05-19T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/05/19/row_pattern_matching</id>
      <content type="html" xml:base="https://trino.io/blog/2021/05/19/row_pattern_matching.html">&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; syntax was introduced in the latest SQL specification
of 2016, SQL:2016. It is a super powerful tool for analyzing trends in your data. We are
proud to announce that Trino has supported this great feature since
&lt;a href=&quot;https://trino.io/docs/current/release/release-356.html&quot;&gt;version 356&lt;/a&gt;. With
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt;, you can define a pattern using the well-known regular
expression syntax, and match it to a set of rows. Upon finding a matching row
sequence, you can retrieve all kinds of detailed or summary information about
the match, and pass it on to be processed by the subsequent parts of your
query. This is a new level of what a pure SQL statement can do.&lt;/p&gt;

&lt;p&gt;This blog post gives you a taste of row pattern matching capabilities, and a
quick overview of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; syntax.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;a-regular-expression-and-a-table-a-fruitful-relationship&quot;&gt;A regular expression and a table: a fruitful relationship&lt;/h2&gt;

&lt;p&gt;The regex matching we all know is about searching for patterns in character
strings. But how does a regex match a sequence of rows? Certainly, a row of
data is a more complex structure than a character. And so, row pattern matching
is more expressive than regex matching in text. Unlike characters, which stay
fixed in their places in a string, rows aren’t assigned up-front to
pattern components. This is where the additional level of complexity comes
from: whether the row is an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;B&lt;/code&gt;, or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C&lt;/code&gt; is conditional. It is revealed as
the pattern matching goes forward. It depends on the data in the row, but also
on the context of the current match and even on the match number. Also, the
same row can match different labels in different matches.&lt;/p&gt;

&lt;p&gt;Consider this simple example:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;PATTERN: A B+ C D?
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;First, let’s match it to the string &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&quot;ABBCEE&quot;&lt;/code&gt;. There is exactly one way to
match it: the prefix &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&quot;ABBC&quot;&lt;/code&gt; is a match.&lt;/p&gt;

&lt;p&gt;Now, let’s see what it takes to match a pattern to rows of a table.
Consider the table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;numbers&lt;/code&gt; with a single column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;number&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/match-recognize/table-numbers.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;You need &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;defining conditions&lt;/code&gt; to define how the rows of the table can be
mapped to pattern components &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;B&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;D&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;DEFINE:
    A &amp;lt;- true (matches every row)
    B &amp;lt;- number is greater than previous number
    C &amp;lt;- number is lower or equal to A
    D &amp;lt;- matches every row, but only in the first match;
         otherwise doesn&apos;t match any row
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As you can see, the conditions can refer to other pattern components (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C&lt;/code&gt;
 depends on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A&lt;/code&gt;), or the sequential match number (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;D&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;When searching for a match, the engine goes row by row, and assigns labels
according to the pattern. Every time the pattern shows the next component
(label) to be matched, the defining condition of that component is evaluated
for the current row in the context of the partial match.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/match-recognize/first-match.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;After finding a match, you can step one row forward and search for another one.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/match-recognize/second-match.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;So far, two matches were found in the same set of rows. Interestingly, a row
that was labeled as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;B&lt;/code&gt; in the first match, became &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A&lt;/code&gt; in the second match.
Let’s try to find another match.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/match-recognize/third-match.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;time-to-get-more-technical&quot;&gt;Time to get more technical&lt;/h2&gt;

&lt;p&gt;…and use some real &lt;s&gt;life&lt;/s&gt; money examples.&lt;/p&gt;

&lt;p&gt;In the preceding examples, the pattern consisted of components &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;B&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C&lt;/code&gt;
and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;D&lt;/code&gt;. They were chosen this way to capture the analogy between pattern
matching in a string and pattern matching in a set of rows. According to the
SQL specification, row pattern components can be named with arbitrary
identifiers, as long as they are compliant with the SQL identifier semantics,
so you don’t need to limit yourself to single-letter names, and instead you can
use more verbose labels.&lt;/p&gt;

&lt;p&gt;Officially, the pattern components, or labels, are called the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;primary pattern
variables&lt;/code&gt;. They are the basic components of the row pattern. Consider the
following example:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;PATTERN( START DOWN+ UP+ )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There are three primary pattern variables: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;START&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DOWN&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UP&lt;/code&gt;. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;+&lt;/code&gt; is
the “one or more” quantifier you know from the regex syntax. Intuitively, this
pattern should match a sequence of rows which are first “decreasing”, and then
“increasing”. You need to inform the engine how it should map rows to the
variables. In other words, you need to define what the “decreasing” and
“increasing” rows are:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;DEFINE DOWN AS price &amp;lt; PREV(price),
       UP AS price &amp;gt; PREV(price)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now it’s clear that “decreasing” and “increasing” is about the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;price&lt;/code&gt; values.
There is no defining condition for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;START&lt;/code&gt; variable, which informs the
engine that the match can start anywhere.&lt;/p&gt;
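
&lt;p&gt;Putting the two clauses together, a minimal query could look like the following
sketch. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;stock_prices&lt;/code&gt; table
with its &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;price&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trade_date&lt;/code&gt; columns is hypothetical,
and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MEASURES&lt;/code&gt; clause, covered
below, reports values from each match:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT bottom_price, top_price
FROM stock_prices
  MATCH_RECOGNIZE (
    ORDER BY trade_date
    MEASURES
      LAST(DOWN.price) AS bottom_price,
      LAST(UP.price) AS top_price
    PATTERN ( START DOWN+ UP+ )
    DEFINE
      DOWN AS price &amp;lt; PREV(price),
      UP AS price &amp;gt; PREV(price)
  );
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;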

&lt;p&gt;The preceding example shows the two key clauses of row pattern recognition:
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PATTERN&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DEFINE&lt;/code&gt;. Let’s see what other keywords there are in the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; clause.&lt;/p&gt;

&lt;h2 id=&quot;syntax-overview&quot;&gt;Syntax overview&lt;/h2&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; syntax is long and rich enough to capture everything that
a pattern matching tool needs, and all the options which let you easily toggle
your matching strategies.&lt;/p&gt;

&lt;p&gt;Technically, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; is part of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FROM&lt;/code&gt; clause:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT ...
    FROM some_table
        MATCH_RECOGNIZE (
          [ PARTITION BY column [, ...] ]
          [ ORDER BY column [, ...] ]
          [ MEASURES measure_definition [, ...] ]
          [ rows_per_match ]
          [ AFTER MATCH skip_to ]
          PATTERN ( row_pattern )
          [ SUBSET subset_definition [, ...] ]
          DEFINE variable_definition [, ...]
          )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; can be used in the query as one of the stages of processing
data. You can &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; from its results or even stream them into another
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PATTERN&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DEFINE&lt;/code&gt; clauses are the heart of row pattern recognition.
They are also the only two required subclauses of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt;. They were
touched upon in the previous section.&lt;/p&gt;
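
&lt;p&gt;As a reminder, here is the pair used in this post’s examples, matching a
falling and then rising &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;price&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;PATTERN (START DOWN+ UP+)
DEFINE
    DOWN AS price &amp;lt; PREV(price),
    UP AS price &amp;gt; PREV(price)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Note that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;START&lt;/code&gt; has no definition: a pattern variable without a defining
condition matches any row.&lt;/p&gt;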

&lt;p&gt;The pattern syntax is close to regular expression syntax. It also supports some
extensions specific to row pattern recognition. They are explained in
&lt;a href=&quot;#pattern-syntax&quot;&gt;Row pattern syntax&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PARTITION BY&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; clauses are similar to those in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WINDOW&lt;/code&gt;
syntax. They help you structure the input data. You can use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PARTITION BY&lt;/code&gt; to
break up your data into independent chunks. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; is useful to establish 
the order of rows before searching for the pattern. Typically, you want to
analyze series of events over time, so ordering by date is a good choice.&lt;/p&gt;
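
&lt;p&gt;In the orders example used throughout this post, that amounts to:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;PARTITION BY customer_id
ORDER BY order_date
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;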

&lt;p&gt;&lt;img src=&quot;/assets/blog/match-recognize/partition-by-order-by.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;In the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MEASURES&lt;/code&gt; clause, you can specify what information you need about every
match that is found. For example, if you’re interested in the order date, the
lowest value of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;price&lt;/code&gt;, and the sequential number of the match, this is how
to retrieve them:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;MEASURES order_date AS date,
         LAST(DOWN.price) AS bottom_price,
         MATCH_NUMBER() AS match_no
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bottom_price&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;match_no&lt;/code&gt; are exposed by the pattern recognition
clause as output columns.&lt;/p&gt;

&lt;p&gt;The expressions in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MEASURES&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DEFINE&lt;/code&gt; clauses allow you to combine the
input data with the information about the matched pattern. They support many
extensions and special constructs to help you get the most out of your data, both
when defining the pattern, and retrieving useful information after a successful
match. The special keyword &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LAST&lt;/code&gt; is one example. For the full list of the
magic spells, check &lt;a href=&quot;#expressions&quot;&gt;Expressions for special tasks&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; clause has two useful toggles. The first of them lets you
choose whether the output includes all rows of the match, or a single-row
summary. For all rows, specify &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALL ROWS PER MATCH&lt;/code&gt;. For a single row, choose
the default &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ONE ROW PER MATCH&lt;/code&gt;. There are also sub-options available, enabling
different handling of empty matches and unmatched rows.&lt;/p&gt;
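
&lt;p&gt;Together, the variants of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rows_per_match&lt;/code&gt; option look like this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ONE ROW PER MATCH
ALL ROWS PER MATCH
ALL ROWS PER MATCH SHOW EMPTY MATCHES
ALL ROWS PER MATCH OMIT EMPTY MATCHES
ALL ROWS PER MATCH WITH UNMATCHED ROWS
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;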

&lt;p&gt;&lt;img src=&quot;/assets/blog/match-recognize/rows-per-match.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Another toggle is the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AFTER MATCH SKIP&lt;/code&gt; clause. It allows you to specify where
the row pattern matching resumes after finding a match. The default option is
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AFTER MATCH SKIP PAST LAST ROW&lt;/code&gt;, but you can also skip to the next row or to a
specific position in the match based on the matched pattern variables.&lt;/p&gt;
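
&lt;p&gt;The available &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;skip_to&lt;/code&gt; options are:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;AFTER MATCH SKIP PAST LAST ROW
AFTER MATCH SKIP TO NEXT ROW
AFTER MATCH SKIP TO FIRST pattern_variable
AFTER MATCH SKIP TO LAST pattern_variable
AFTER MATCH SKIP TO pattern_variable
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SKIP TO pattern_variable&lt;/code&gt; is shorthand for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SKIP TO LAST pattern_variable&lt;/code&gt;.&lt;/p&gt;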

&lt;p&gt;&lt;img src=&quot;/assets/blog/match-recognize/after-match-skip.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SUBSET&lt;/code&gt; clause is where &lt;em&gt;union pattern variables&lt;/em&gt; are defined. They
are a concise way to refer to a group of primary pattern variables:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SUBSET U = (DOWN, UP)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The following expression returns the value of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;price&lt;/code&gt; from the last row
matched either to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DOWN&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UP&lt;/code&gt; primary variable:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;LAST(U.price)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;-row-pattern-syntax&quot;&gt;&lt;a name=&quot;pattern-syntax&quot;&gt;&lt;/a&gt; Row pattern syntax&lt;/h2&gt;

&lt;p&gt;The basic element of a row pattern is the primary pattern variable. Other syntax
components include:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Concatenation&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;A B C
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Alternation&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;A | B | C
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Permutation&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;PERMUTE(A, B, C)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Grouping&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;(A B C)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Partition start anchor&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;^
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Partition end anchor&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Empty pattern&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;()
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Exclusion syntax&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;{- row_pattern -}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Exclusion syntax is useful in combination with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALL ROWS PER MATCH&lt;/code&gt; option.
If you find some sections of the match uninteresting, you can wrap them in the
exclusion, and they are dropped from the output.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/match-recognize/exclusion.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quantifiers&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Row pattern syntax supports all kinds of quantifiers: the basic ones &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;*&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;+&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;?&lt;/code&gt;, and others, which let you specify the exact number of repetitions, or the
accepted range: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{n}&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{n, m}&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{n,}&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{,n}&lt;/code&gt;. Make sure you don’t confuse
those:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{n}&lt;/code&gt; is for exactly n repetitions,&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{n,}&lt;/code&gt; is equal to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{n, ∞}&lt;/code&gt;,&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{,n}&lt;/code&gt; is equal to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{0, n}&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Quantifiers are greedy by default, meaning that they prefer a higher number of
repetitions over a lower one. If you want it the other way, you can change a
quantifier to reluctant by appending &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;?&lt;/code&gt; immediately after it. So, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(pattern)?&lt;/code&gt;
prefers a single match of the pattern, while &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(pattern)??&lt;/code&gt; would rather omit
the pattern altogether.&lt;/p&gt;
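
&lt;p&gt;For example, in the following pattern (with hypothetical variables &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;B&lt;/code&gt;,
and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C&lt;/code&gt;), &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A+&lt;/code&gt; greedily matches one or more rows, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;B*?&lt;/code&gt; reluctantly matches
zero or more rows, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C{2,4}&lt;/code&gt; matches between two and four rows:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;PATTERN (A+ B*? C{2,4})
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;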

&lt;h3 id=&quot;match-preference&quot;&gt;Match preference&lt;/h3&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; produces at most one match starting from any
specific row. If multiple matches are possible, the winner is chosen based
on the order of preference. The greedy and reluctant quantifiers are one
example of preference. Other pattern components have their own rules:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;pattern alternation prefers the left-hand components to the right-hand ones.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;pattern permutation is equivalent to alternation of all permutations of its
components. If multiple matches are possible, the match is chosen based on the
lexicographical order established by the order of components in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PERMUTE&lt;/code&gt;
list. For &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PERMUTE(A, B, C)&lt;/code&gt;, the preference of options goes as follows:
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A B C&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A C B&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;B A C&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;B C A&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C A B&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C B A&lt;/code&gt;.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;-expressions-for-special-tasks&quot;&gt;&lt;a name=&quot;expressions&quot;&gt;&lt;/a&gt; Expressions for special tasks&lt;/h2&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; clause provides special expression syntax, available in
the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MEASURES&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DEFINE&lt;/code&gt; clauses. Its purpose is to combine the input data
with the information about the match. The syntax includes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Pattern variable references&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They allow referring to certain components of the match, for example
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DOWN.price&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UP.order_date&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Logical navigation operations: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LAST&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FIRST&lt;/code&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They allow you to navigate over the rows of a match based on the pattern
variables assigned to them. For example, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LAST(DOWN.price, 3)&lt;/code&gt; navigates to the
last row labeled as “DOWN”, goes three occurrences of the “DOWN” label
backwards, and gets the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;price&lt;/code&gt; value from that row. The default offset is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0&lt;/code&gt;:
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LAST(DOWN.price)&lt;/code&gt; gets the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;price&lt;/code&gt; value from the last row labeled as “DOWN”.
If the logical navigation goes beyond the match bounds, the operation returns
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;null&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Physical navigation operations: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PREV&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NEXT&lt;/code&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They let you navigate over the rows of the partition by a specified offset.
Physical navigations use logical navigations as the starting point. For
example, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NEXT(DOWN.price, 5)&lt;/code&gt; first navigates to the last row labeled as
“DOWN”. Starting from there, it goes five rows forward and gets the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;price&lt;/code&gt;
value from that row. In the preceding example, the logical navigation &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LAST&lt;/code&gt; is
implicit, but you can specify the nested logical navigation explicitly, for
example &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NEXT(FIRST(DOWN.price, 4), 5)&lt;/code&gt;. The default offset is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1&lt;/code&gt;, which means
that by default &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PREV&lt;/code&gt; goes one row backward, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NEXT&lt;/code&gt; one row
forward.&lt;/p&gt;

&lt;p&gt;The physical navigation can retrieve values beyond the match bounds. It gives
you great flexibility. For example, the defining conditions of pattern
variables can peek at the values ahead. Also, when computing row pattern
measures, you can refer to the wider context of the match.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CLASSIFIER&lt;/code&gt; function&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It returns the primary pattern variable associated with the row.&lt;/p&gt;
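
&lt;p&gt;Combined with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALL ROWS PER MATCH&lt;/code&gt;, it makes a handy per-row measure showing how
each row was labeled (the output name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;matched_variable&lt;/code&gt; is arbitrary):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;MEASURES CLASSIFIER() AS matched_variable
ALL ROWS PER MATCH
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;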

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_NUMBER&lt;/code&gt; function&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It returns the sequential number of the match within the partition.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RUNNING&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FINAL&lt;/code&gt; keywords&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The expressions in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DEFINE&lt;/code&gt; clause are evaluated when the pattern matching
is in progress. At each step, the engine only knows a part of the match. This
is the &lt;em&gt;running semantics&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The expressions of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MEASURES&lt;/code&gt; clause are evaluated when the match is
complete. The engine can see the whole match from the position of the final
row. This is the &lt;em&gt;final semantics&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;However, with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALL ROWS PER MATCH&lt;/code&gt; option, when the match result is
processed row by row, you can choose either approach to compute the measures.
To do that, you can specify the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RUNNING&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FINAL&lt;/code&gt; keyword before the logical
navigation operation, for example &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RUNNING LAST(DOWN.price)&lt;/code&gt; or
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FINAL LAST(DOWN.price)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;em&gt;running semantics&lt;/em&gt; is the default both in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DEFINE&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MEASURES&lt;/code&gt;
clauses. Note that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FINAL&lt;/code&gt; only applies to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MEASURES&lt;/code&gt; clause.&lt;/p&gt;
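
&lt;p&gt;For example, with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALL ROWS PER MATCH&lt;/code&gt;, the first measure below changes from
row to row as the match grows, while the second reports the same final value in
every output row of the match (the output names are arbitrary):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;MEASURES
    RUNNING LAST(DOWN.price) AS lowest_so_far,
    FINAL LAST(DOWN.price) AS bottom_price
ALL ROWS PER MATCH
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;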

&lt;p&gt;To sum up, here’s one complex measure expression combining different elements
of the special syntax:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/match-recognize/measure-example.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;trino-cli-show-off-time&quot;&gt;Trino CLI show-off time!&lt;/h2&gt;

&lt;p&gt;Now, let’s see the whole machinery come to life. This is the same example data
that we used before, and the same goal: detect a “V”-shape of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;price&lt;/code&gt;
values over time for different customers.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;trino&amp;gt; WITH orders(customer_id, order_date, price) AS (VALUES
    (&apos;cust_1&apos;, DATE &apos;2020-05-11&apos;, 100),
    (&apos;cust_1&apos;, DATE &apos;2020-05-12&apos;, 200),
    (&apos;cust_2&apos;, DATE &apos;2020-05-13&apos;,   8),
    (&apos;cust_1&apos;, DATE &apos;2020-05-14&apos;, 100),
    (&apos;cust_2&apos;, DATE &apos;2020-05-15&apos;,   4),
    (&apos;cust_1&apos;, DATE &apos;2020-05-16&apos;,  50),
    (&apos;cust_1&apos;, DATE &apos;2020-05-17&apos;, 100),
    (&apos;cust_2&apos;, DATE &apos;2020-05-18&apos;,   6))
SELECT customer_id, start_price, bottom_price, final_price, start_date, final_date
    FROM orders
        MATCH_RECOGNIZE (
            PARTITION BY customer_id
            ORDER BY order_date
            MEASURES
                START.price AS start_price,
                LAST(DOWN.price) AS bottom_price,
                LAST(UP.price) AS final_price,
                START.order_date AS start_date,
                LAST(UP.order_date) AS final_date
            ONE ROW PER MATCH
            AFTER MATCH SKIP PAST LAST ROW
            PATTERN (START DOWN+ UP+)
            DEFINE
                DOWN AS price &amp;lt; PREV(price),
                UP AS price &amp;gt; PREV(price)
            );

 customer_id | start_price | bottom_price | final_price | start_date | final_date
-------------+-------------+--------------+-------------+------------+------------
 cust_1      |         200 |           50 |         100 | 2020-05-12 | 2020-05-17
 cust_2      |           8 |            4 |           6 | 2020-05-13 | 2020-05-18
(2 rows)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Two matches are detected, one for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cust_1&lt;/code&gt;, and one for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cust_2&lt;/code&gt;.&lt;/p&gt;

&lt;h2 id=&quot;empty-matches-explained&quot;&gt;Empty matches explained&lt;/h2&gt;

&lt;p&gt;An empty match is a legitimate result of row pattern recognition. Different
pattern constructs can result in an empty match. The empty pattern syntax &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;()&lt;/code&gt;
is the trivial one. An empty match can also result from quantification, such as
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A*&lt;/code&gt;, or alternation, such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A | ()&lt;/code&gt;.&lt;/p&gt;
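
&lt;p&gt;For example, this pattern (a sketch reusing the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DOWN&lt;/code&gt; definition from earlier)
can produce an empty match: at a row where the price does not drop, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DOWN*&lt;/code&gt;
still succeeds by matching zero rows.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;PATTERN (DOWN*)
DEFINE DOWN AS price &amp;lt; PREV(price)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;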

&lt;p&gt;An empty match does not consume any input rows, but like every match, it is
associated with a row, called the &lt;em&gt;starting row&lt;/em&gt;. That is the row at which the
pattern matching started. Note that if the pattern allows an empty match, it
guarantees that no rows remain unmatched. Also, an empty match, as well as
non-empty matches, gets a sequential number, which can be retrieved by the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_NUMBER&lt;/code&gt; function.&lt;/p&gt;

&lt;p&gt;Depending on your use case, you can consider empty matches informative or just
see them as a leftover of the algorithm.&lt;/p&gt;

&lt;p&gt;There’s one more thing linked to empty matches. Some patterns have the
dangerous potential of looping endlessly over a piece that doesn’t consume any
rows. It doesn’t have to be as explicit as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;()*&lt;/code&gt;. There are complex patterns
that don’t show their looping potential at first glance. We handled them
carefully so that you never have to waste your time on looping queries.&lt;/p&gt;

&lt;h2 id=&quot;in-a-few-words-whats-so-cool-about-row-pattern-matching&quot;&gt;In a few words, what’s so cool about row pattern matching?&lt;/h2&gt;

&lt;p&gt;From the SQL viewpoint, you can think of row pattern matching as extended
window functions. Window functions allow you to capture some dependencies in
rows of data based on their relative position or value. Row pattern matching
allows you to detect arbitrarily complicated dependencies, based not only on
the input values but also on the details of the actual match and on the match
number.&lt;/p&gt;

&lt;p&gt;Before the introduction of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt;, you had to feed your data to
external tools to reason about trends and patterns. Now, you can achieve it
directly in your query, and even build your query upon the pattern recognition
clause to further process the match results.&lt;/p&gt;

&lt;p&gt;Row pattern matching is typically used:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;in trade applications for tracking trends or identifying customers with
specific behavioral patterns,&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;in shipping applications for tracking packages through all possible valid
paths,&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;in financial applications for detecting unusual incidents, which might signal
fraud.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What’s your use case?&lt;/p&gt;

&lt;p&gt;I hope you enjoy Trino’s new feature. Refer to
&lt;a href=&quot;https://trino.io/docs/current/sql/match-recognize.html&quot;&gt;Trino docs&lt;/a&gt; for even
more details, examples and usage tips. &lt;a href=&quot;/slack.html&quot;&gt;Please &lt;strong&gt;do&lt;/strong&gt; reach out to us with any
questions or issues&lt;/a&gt;. We plan to support row pattern matching in
the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WINDOW&lt;/code&gt; clause soon, so stay tuned!&lt;/p&gt;</content>

      
        <author>
          <name>Kasia Findeisen (kasiafi)</name>
        </author>
      

      <summary>The MATCH_RECOGNIZE syntax was introduced in the latest SQL specification of 2016. It is a super powerful tool for analyzing trends in your data. We are proud to announce that Trino supports this great feature since version 356. With MATCH_RECOGNIZE, you can define a pattern using the well-known regular expression syntax, and match it to a set of rows. Upon finding a matching row sequence, you can retrieve all kinds of detailed or summary information about the match, and pass it on to be processed by the subsequent parts of your query. This is a new level of what a pure SQL statement can do. This blog post gives you a taste of row pattern matching capabilities, and a quick overview of the MATCH_RECOGNIZE syntax.</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino on ice I: A gentle introduction To Iceberg</title>
      <link href="https://trino.io/blog/2021/05/03/a-gentle-introduction-to-iceberg.html" rel="alternate" type="text/html" title="Trino on ice I: A gentle introduction To Iceberg" />
      <published>2021-05-03T00:00:00+00:00</published>
      <updated>2021-05-03T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/05/03/a-gentle-introduction-to-iceberg</id>
      <content type="html" xml:base="https://trino.io/blog/2021/05/03/a-gentle-introduction-to-iceberg.html">&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;100%&quot; height=&quot;100%&quot; src=&quot;/assets/blog/trino-on-ice/trino-iceberg.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Welcome to the Trino on ice series, covering the details around how the Iceberg
table format works with the Trino query engine. The examples build on each
previous post, so it’s recommended to read the posts sequentially and reference
them as needed later. Here are links to the posts in this series:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/05/03/a-gentle-introduction-to-iceberg.html&quot;&gt;Trino on ice I: A gentle introduction to Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg.html&quot;&gt;Trino on ice II: In-place table evolution and cloud compatibility with Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/30/iceberg-concurrency-snapshots-spec.html&quot;&gt;Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/08/12/deep-dive-into-iceberg-internals.html&quot;&gt;Trino on ice IV: Deep dive into Iceberg internals&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Back in the &lt;a href=&quot;/blog/2020/10/20/intro-to-hive-connector.html&quot;&gt;Gentle introduction to the Hive connector&lt;/a&gt; 
blog post, I discussed a commonly misunderstood architecture and uses of the 
Trino Hive connector. In short, while some may think the name indicates Trino 
makes a call to a running Hive instance, the Hive connector does not use the 
Hive runtime to answer queries. Instead, the connector is named Hive connector 
because it relies on Hive conventions and implementation details from the Hadoop
ecosystem - the invisible Hive specification.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;I call this specification invisible because it doesn’t exist. It lives in the 
Hive code and the minds of those who developed it. This makes it very 
difficult for anybody else who has to integrate with any distributed object 
storage that uses Hive, since they have to rely on reverse engineering and 
keeping up with the changes. The way you interact with Hive changes based on 
&lt;a href=&quot;https://medium.com/hashmapinc/four-steps-for-migrating-from-hive-2-x-to-3-x-e85a8363a18&quot;&gt;which version of Hive or Hadoop&lt;/a&gt; 
you are running. It also varies depending on whether you run in the cloud or over an object store.
Spark has even &lt;a href=&quot;https://spark.apache.org/docs/2.4.4/sql-migration-guide-hive-compatibility.html&quot;&gt;modified the Hive spec&lt;/a&gt;
in some ways to fit the Hive model to their use cases. It’s a big mess that data 
engineers have put up with for years. Yet despite the confusion and lack of 
organization due to Hive’s number of unwritten assumptions, the Hive connector 
is the most popular connector in use for Trino. Virtually every big data query 
engine uses the Hive model today in some form. As a result it is used by 
numerous companies to store and access data in their data lakes.&lt;/p&gt;

&lt;p&gt;So how did something with no specification become so ubiquitous in data lakes? 
Hive was first in the large object storage and big data world as part of Hadoop.
Hadoop became popular through good marketing that positioned it as the solution
to the explosion of data during the Web 2.0 boom. Of course, Hive didn’t
get everything wrong. In fact, without Hive, and the fact that it is open 
source, there may not have been a unified specification at all. Despite the many
hours data engineers have spent bashing their heads against the wall with all 
the unintended consequences of Hive, it still served a very useful purpose.&lt;/p&gt;

&lt;p&gt;So why did I just rant about Hive for so long if I’m here to tell you about 
&lt;a href=&quot;https://iceberg.apache.org/&quot;&gt;Apache Iceberg&lt;/a&gt;? It’s impossible for a teenager 
growing up today to truly appreciate music streaming services without knowing 
what it was like to have an iPod with limited storage, or listening to a 
scratched burnt CD that skips, or flipping your tape or record to side-B. Just
as anyone born before the turn of the millennium truly appreciates streaming
services, you too will appreciate Iceberg once you’ve learned the 
intricacies of managing a data lake built on Hive and Hadoop.&lt;/p&gt;

&lt;p&gt;If you haven’t used Hive before, this blog post outlines just a few pain points 
that come with this data warehousing software to give you proper context. If you
have already lived through these headaches, this post acts as a guide for moving
from Hive to Iceberg. This post is the first in a series of blog posts discussing
Apache Iceberg in great detail, through the lens of the Trino query engine user.
If you’re not aware of Trino (formerly PrestoSQL) yet, it is the project that 
houses the founding Presto community after the 
&lt;a href=&quot;https://trino.io/blog/2020/12/27/announcing-trino.html&quot;&gt;founders of Presto left Facebook&lt;/a&gt;.
This and the next couple of posts discuss the Iceberg specification and all
the features Iceberg has to offer, often in comparison with Hive.&lt;/p&gt;

&lt;p&gt;Before jumping into the comparisons, what is Iceberg exactly? The first thing to
understand is that Iceberg is not a file format, but a table format. That
distinction may not mean much on its own, but the function of a table format 
becomes clearer as the improvements Iceberg brings over the Hive table 
standard materialize. Iceberg doesn’t replace file formats like ORC and Parquet,
but is the layer between the query engine and the data. Iceberg maps and indexes
the files in order to provide a higher-level abstraction that handles the 
relational table format for data lakes. You will understand more about table 
formats through examples in this series.&lt;/p&gt;

&lt;h2 id=&quot;hidden-partitions&quot;&gt;Hidden Partitions&lt;/h2&gt;

&lt;h3 id=&quot;hive-partitions&quot;&gt;Hive Partitions&lt;/h3&gt;

&lt;p&gt;Since most developers and users interact with the table format via the query 
language, a noticeable difference is the flexibility you have while creating a 
partitioned table. Assume you are trying to create a table for tracking events 
occurring in your system. You run both sets of SQL commands from Trino, just 
using the Hive and Iceberg connectors, which are designated by the catalog name 
(i.e. a catalog name starting with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive.&lt;/code&gt; uses the Hive connector, while one starting with
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iceberg.&lt;/code&gt; uses the Iceberg connector). To begin with, the first DDL 
statement attempts to create an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;events&lt;/code&gt; table in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;logging&lt;/code&gt; schema in the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive&lt;/code&gt; catalog, which is configured to use the Hive connector. It also 
attempts to partition the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;events&lt;/code&gt; table on the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time&lt;/code&gt; field, which is a
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TIMESTAMP&lt;/code&gt; field.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE TABLE hive.logging.events (
  level VARCHAR,
  event_time TIMESTAMP,
  message VARCHAR,
  call_stack ARRAY(VARCHAR)
) WITH (
  format = &apos;ORC&apos;,
  partitioned_by = ARRAY[&apos;event_time&apos;]
);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Running this in Trino using the Hive connector produces the following error message.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Partition keys must be the last columns in the table and in the same order as the table properties: [event_time]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The Hive DDL is very dependent on column ordering, specifically for 
partition columns. Partition fields must occupy the final column 
positions, in the same order as the partitioning declared in the DDL statement.
The next statement attempts to create the same table, but now with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time&lt;/code&gt; field 
moved to the last column position.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE TABLE hive.logging.events (
  level VARCHAR,
  message VARCHAR,
  call_stack ARRAY(VARCHAR),
  event_time TIMESTAMP
) WITH (
  format = &apos;ORC&apos;,
  partitioned_by = ARRAY[&apos;event_time&apos;]
);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This time, the DDL command works successfully, but you likely don’t want to
partition your data on the plain timestamp. Doing so results in a separate 
partition, and therefore at least one file, for each distinct timestamp value in
your table (likely close to one file per event). In Hive, there’s no native way 
to indicate the time granularity at which you want to partition. The workaround
in Hive is to create a new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VARCHAR&lt;/code&gt; column, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time_day&lt;/code&gt;, derived from the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time&lt;/code&gt; column, to hold the date partition value.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE TABLE hive.logging.events (
  level VARCHAR,
  event_time TIMESTAMP,
  message VARCHAR,
  call_stack ARRAY(VARCHAR),
  event_time_day VARCHAR
) WITH (
  format = &apos;ORC&apos;,
  partitioned_by = ARRAY[&apos;event_time_day&apos;]
);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This method wastes space by adding a new column to your table. Even worse,
it puts the burden of knowledge on the user to include this new column for 
writing data. It is then necessary to use that separate column for any read 
access to take advantage of the performance gains from the partitioning.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;INSERT INTO hive.logging.events
VALUES
(
  &apos;ERROR&apos;,
  timestamp &apos;2021-04-01 12:00:00.000001&apos;,
  &apos;Oh noes&apos;, 
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;], 
  &apos;2021-04-01&apos;
),
(
  &apos;ERROR&apos;,
  timestamp &apos;2021-04-02 15:55:55.555555&apos;,
  &apos;Double oh noes&apos;,
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;],
  &apos;2021-04-02&apos;
),
(
  &apos;WARN&apos;, 
  timestamp &apos;2021-04-02 00:00:11.1122222&apos;,
  &apos;Maybeh oh noes?&apos;,
  ARRAY [&apos;Bad things could be happening??&apos;], 
  &apos;2021-04-02&apos;
);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Notice that the partition value in the last column, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&apos;2021-04-01&apos;&lt;/code&gt;, has to match the date of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TIMESTAMP&lt;/code&gt; 
during insertion. Hive performs no validation to make sure this is the case, 
because it only requires a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VARCHAR&lt;/code&gt; and partitions on whatever 
distinct values it receives.&lt;/p&gt;
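
&lt;p&gt;For example, nothing stops an insert from writing a partition value that
disagrees with the timestamp. The following is a hypothetical statement (the
row contents are invented for illustration); Hive happily files the row under
the wrong day:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;INSERT INTO hive.logging.events
VALUES
(
  &apos;ERROR&apos;,
  timestamp &apos;2021-04-03 08:00:00.000000&apos;,
  &apos;Misfiled oh noes&apos;,
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;],
  &apos;2021-04-01&apos;
);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;A query filtering on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time_day = &apos;2021-04-03&apos;&lt;/code&gt; never finds this row, even though its
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time&lt;/code&gt; falls on that day.&lt;/p&gt;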

&lt;p&gt;On the other hand, if a user runs the following query:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT *
FROM hive.logging.events
WHERE event_time &amp;lt; timestamp &apos;2021-04-02&apos;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;they get the correct results back, but have to scan all the data in the table:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;level&lt;/th&gt;
      &lt;th&gt;event_time&lt;/th&gt;
      &lt;th&gt;message&lt;/th&gt;
      &lt;th&gt;call_stack&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;2021-04-01 12:00:00&lt;/td&gt;
      &lt;td&gt;Oh noes&lt;/td&gt;
      &lt;td&gt;Exception in thread “main” java.lang.NullPointerException&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;This happens because the user forgot to include the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time_day &amp;lt; &apos;2021-04-02&apos;&lt;/code&gt; predicate in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WHERE&lt;/code&gt; 
clause. This eliminates all the benefits that led us to create the partition in
the first place, and yet users of these tables frequently miss it.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT *
FROM hive.logging.events
WHERE event_time &amp;lt; timestamp &apos;2021-04-02&apos; 
AND event_time_day &amp;lt; &apos;2021-04-02&apos;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;level&lt;/th&gt;
      &lt;th&gt;event_time&lt;/th&gt;
      &lt;th&gt;message&lt;/th&gt;
      &lt;th&gt;call_stack&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;2021-04-01 12:00:00&lt;/td&gt;
      &lt;td&gt;Oh noes&lt;/td&gt;
      &lt;td&gt;Exception in thread “main” java.lang.NullPointerException&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h3 id=&quot;iceberg-partitions&quot;&gt;Iceberg Partitions&lt;/h3&gt;

&lt;p&gt;The following DDL statement illustrates how these issues are handled in Iceberg
via the Trino Iceberg connector.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE TABLE iceberg.logging.events (
  level VARCHAR,
  event_time TIMESTAMP(6),
  message VARCHAR,
  call_stack ARRAY(VARCHAR)
) WITH (
  partitioning = ARRAY[&apos;day(event_time)&apos;]
);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Take note of a few things. First, notice that the partition on the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time&lt;/code&gt; 
column is defined without having to move the column to the last position. There 
is also no need to create a separate field to handle the daily partition on the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time&lt;/code&gt; field. The &lt;em&gt;&lt;strong&gt;partition specification&lt;/strong&gt;&lt;/em&gt; is maintained internally
by Iceberg, and neither the writer nor the reader of this table needs to know 
anything about it to take advantage of it. This concept
is called &lt;em&gt;&lt;strong&gt;hidden partitioning&lt;/strong&gt;&lt;/em&gt;, where only the table creator/maintainer 
has to know the &lt;em&gt;&lt;strong&gt;partition specification&lt;/strong&gt;&lt;/em&gt;. Here is what the insert 
statement looks like now:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;INSERT INTO iceberg.logging.events
VALUES
(
  &apos;ERROR&apos;,
  timestamp &apos;2021-04-01 12:00:00.000001&apos;,
  &apos;Oh noes&apos;, 
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;]
),
(
  &apos;ERROR&apos;,
  timestamp &apos;2021-04-02 15:55:55.555555&apos;,
  &apos;Double oh noes&apos;,
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;]
),
(
  &apos;WARN&apos;, 
  timestamp &apos;2021-04-02 00:00:11.1122222&apos;,
  &apos;Maybeh oh noes?&apos;,
  ARRAY [&apos;Bad things could be happening??&apos;]
);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VARCHAR&lt;/code&gt; dates are no longer needed. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time&lt;/code&gt; field is 
internally converted to the proper partition value for each row. Also,
notice that the same query that ran in Hive returns the same results. The big 
difference is that no extra predicate is needed: filtering on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time&lt;/code&gt; alone both 
prunes partitions and filters the results.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT *
FROM iceberg.logging.events
WHERE event_time &amp;lt; timestamp &apos;2021-04-02&apos;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;level&lt;/th&gt;
      &lt;th&gt;event_time&lt;/th&gt;
      &lt;th&gt;message&lt;/th&gt;
      &lt;th&gt;call_stack&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;2021-04-01 12:00:00&lt;/td&gt;
      &lt;td&gt;Oh noes&lt;/td&gt;
      &lt;td&gt;Exception in thread “main” java.lang.NullPointerException&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
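
&lt;p&gt;If you are curious how Iceberg bucketed the rows, the Trino Iceberg connector
exposes a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$partitions&lt;/code&gt; metadata table for each table. As a sketch (the exact
output columns depend on your Trino version):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT * FROM iceberg.logging.&quot;events$partitions&quot;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This should show one row per daily partition, along with row and file counts,
without any &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time_day&lt;/code&gt; column ever existing in the table itself.&lt;/p&gt;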

&lt;p&gt;So hopefully that gives you a glimpse into what a table format and specification
are, and why Iceberg is such a wonderful improvement over the existing and 
outdated method of storing your data in your data lake. While this post covers
a lot of aspects of Iceberg’s capabilities, this is just the tip of the Iceberg…&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/blog/trino-on-ice/see_myself_out.gif&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;If you want to play around with Iceberg using Trino, check out the 
&lt;a href=&quot;https://trino.io/docs/current/connector/iceberg.html&quot;&gt;Trino Iceberg docs&lt;/a&gt;.
The next post covers how table evolution works in Iceberg, as well as how 
Iceberg is an improved storage format for cloud storage.&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>Welcome to the Trino on ice series, covering the details around how the Iceberg table format works with the Trino query engine. The examples build on each previous post, so it’s recommended to read the posts sequentially and reference them as needed later. Here are links to the posts in this series: Trino on ice I: A gentle introduction to Iceberg Trino on ice II: In-place table evolution and cloud compatibility with Iceberg Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec Trino on ice IV: Deep dive into Iceberg internals Back in the Gentle introduction to the Hive connector blog post, I discussed a commonly misunderstood architecture and uses of the Trino Hive connector. In short, while some may think the name indicates Trino makes a call to a running Hive instance, the Hive connector does not use the Hive runtime to answer queries. Instead, the connector is named Hive connector because it relies on Hive conventions and implementation details from the Hadoop ecosystem - the invisible Hive specification.</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino: The Definitive Guide</title>
      <link href="https://trino.io/blog/2021/04/21/the-definitive-guide.html" rel="alternate" type="text/html" title="Trino: The Definitive Guide" />
      <published>2021-04-21T00:00:00+00:00</published>
      <updated>2021-04-21T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/04/21/the-definitive-guide</id>
      <content type="html" xml:base="https://trino.io/blog/2021/04/21/the-definitive-guide.html">&lt;p&gt;Just over a year ago we &lt;a href=&quot;https://trino.io/blog/2020/04/11/the-definitive-guide.html&quot;&gt;announced the availability of the first book about
Trino&lt;/a&gt; - our
definitive guide. Back then the project was still called Presto, and the rename
with the end of 2020 was a good reason for us to give the book a refresh.&lt;/p&gt;

&lt;p&gt;Today, we are happy to announce that a new edition now titled &lt;strong&gt;Trino: The
Definitive Guide&lt;/strong&gt; is available.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h2 id=&quot;get-a-free-copy-of-trino-the-definitive-guide-from-starburst-now&quot;&gt;&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;Get a free copy of Trino: The Definitive Guide&lt;/a&gt; from &lt;a href=&quot;https://www.starburst.io&quot;&gt;Starburst&lt;/a&gt; now!&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;!--more--&gt;

&lt;p&gt;&lt;img src=&quot;/assets/ttdg-cover.png&quot; align=&quot;right&quot; style=&quot;float: right; margin-left: 20px; margin-bottom: 20px; width: 100%; max-width: 350px;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The new edition of the book from O’Reilly is available in digital formats
as well as physical copies. You can find more information about the book on &lt;a href=&quot;/trino-the-definitive-guide.html&quot;&gt;our
permanent page about it&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The book is now updated to Trino release 354 for all filenames, installation
methods, commands, names, and properties. We also addressed all problems our
readers found and reported to us.&lt;/p&gt;

&lt;p&gt;Our major supporter, &lt;a href=&quot;https://www.starburst.io&quot;&gt;Starburst&lt;/a&gt;, allowed us to work
on the book and bring it across the finish line again. You can get a
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;free digital copy from Starburst&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;So what are you waiting for? Go get a copy, check out the &lt;a href=&quot;https://github.com/trinodb/trino-the-definitive-guide&quot;&gt;updated example code
repository&lt;/a&gt;,
provide feedback and contact us on &lt;a href=&quot;/slack.html&quot;&gt;Slack&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Looking forward to it all!&lt;/p&gt;

&lt;p&gt;Matt, Manfred and Martin&lt;/p&gt;</content>

      
        <author>
          <name>Matt Fuller, Manfred Moser and Martin Traverso</name>
        </author>
      

      <summary>Just over a year ago we announced the availability of the first book about Trino - our definitive guide. Back then the project was still called Presto, and the rename with the end of 2020 was a good reason for us to give the book a refresh. Today, we are happy to announce that a new edition now titled Trino: The Definitive Guide is available. Get a free copy of Trino: The Definitive Guide from Starburst now!</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino at Writing Day</title>
      <link href="https://trino.io/blog/2021/04/14/wtd-writing-day.html" rel="alternate" type="text/html" title="Trino at Writing Day" />
      <published>2021-04-14T00:00:00+00:00</published>
      <updated>2021-04-14T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/04/14/wtd-writing-day</id>
      <content type="html" xml:base="https://trino.io/blog/2021/04/14/wtd-writing-day.html">&lt;p&gt;First time Trino blogger, long time lurker on the Trino slack. My name is 
&lt;a href=&quot;https://twitter.com/ZelWms&quot;&gt;Rose Williams&lt;/a&gt; and I’m an open source docs enthusiast! 
I’ve had the pleasure of contributing to this community for the past few months. 
Recently I’ve been working with &lt;a href=&quot;https://twitter.com/bitsondatadev&quot;&gt;Brian Olsen&lt;/a&gt;, our fearless 
developer advocate, as well as some of our other Trino doc contributors, to get 
Trino ready for the Write the Docs &lt;a href=&quot;https://www.writethedocs.org/conf/portland/2021/writing-day/&quot;&gt;Writing Day&lt;/a&gt; open source event!&lt;/p&gt;

&lt;p&gt;If you’re not familiar with &lt;a href=&quot;https://www.writethedocs.org&quot;&gt;Write the Docs&lt;/a&gt;, it’s
a global community of people who care about documentation.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“We consider everyone who cares about communication, documentation, and their
users to be a member of our community. This can be programmers, tech writers,
developer advocates, customer support, marketers, and anyone else who wants
people to have great experiences with software.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href=&quot;https://www.writethedocs.org/conf/portland/2021/writing-day/&quot;&gt;Writing Day&lt;/a&gt; is
the first day of their upcoming virtual documentation conference, &lt;a href=&quot;https://www.writethedocs.org/conf/portland/2021/&quot;&gt;Write the
Docs Portland (PST)&lt;/a&gt; April
25-27, 2021. The goal of Writing Day is to get a bunch of interesting people in
a room together and introduce them to cool open source projects that they can
onboard and contribute to.&lt;/p&gt;

&lt;p&gt;Writing Day is open to all conference attendees and several Trino enthusiasts are
attending as mentors. Leading up to the conference, we’re focused on identifying
docs issues that are ideal for first time contributors. If you’re a regular
Trino contributor, you might notice that we’re going through and tagging items
as “good first issue” and “docs” - we’ll be using those tags to create an 
&lt;a href=&quot;https://github.com/trinodb/trino/issues?q=is%3Aopen+label%3Adocs+label%3A%22good+first+issue%22&quot;&gt;issues filter&lt;/a&gt; 
for the event. We’re also doing some work on the Trino docs readme to
help folks onboard faster.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.writethedocs.org/conf/portland/2021/tickets/&quot;&gt;Snag a ticket&lt;/a&gt; if
you’re interested in participating; we hope to see you there! Our goal is to
continue curating good first issues for future writers and developers.&lt;/p&gt;

&lt;p&gt;Join the new &lt;a href=&quot;https://trinodb.slack.com/messages/C01TEP0HJTH&quot;&gt;#documentation channel&lt;/a&gt; 
on the &lt;a href=&quot;./slack.html&quot;&gt;Trino slack&lt;/a&gt; and 
&lt;a href=&quot;https://github.com/trinodb/trino/stargazers&quot;&gt;favorite the Trino project&lt;/a&gt; on GitHub.&lt;/p&gt;

&lt;p&gt;If you’re interested in learning more about &lt;a href=&quot;https://www.writethedocs.org&quot;&gt;Write the Docs&lt;/a&gt; 
or &lt;a href=&quot;https://www.writethedocs.org/conf/portland/2021/writing-day/&quot;&gt;Writing Day&lt;/a&gt;, 
feel free to reach out to me (&lt;a href=&quot;https://twitter.com/ZelWms&quot;&gt;Rose Williams&lt;/a&gt;), 
&lt;a href=&quot;https://twitter.com/bitsondatadev&quot;&gt;Brian Olsen&lt;/a&gt;, or 
&lt;a href=&quot;https://twitter.com/mosabua&quot;&gt;Manfred Moser&lt;/a&gt; on twitter or the &lt;a href=&quot;./slack.html&quot;&gt;Trino slack&lt;/a&gt;. You 
can also check out the Write the Docs &lt;a href=&quot;https://www.writethedocs.org/slack/&quot;&gt;slack community&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you have an open source project that you’re interested in bringing to Writing
Day, chat with me, &lt;a href=&quot;https://twitter.com/ZelWms&quot;&gt;Rose Williams&lt;/a&gt;, on twitter or 
on the Trino or Write the Docs slack communities.&lt;/p&gt;</content>

      
        <author>
          <name>Rose Williams (she/her)</name>
        </author>
      

      <summary>First time Trino blogger, long time lurker on the Trino slack. My name is Rose Williams and I’m an open source docs enthusiast! I’ve had the pleasure of contributing to this community for the past few months. Recently I’ve been working with Brian Olsen, our fearless developer advocate, as well as some of our other Trino doc contributors, to get Trino ready for the Write the Docs Writing Day open source event!</summary>

      
      
    </entry>
  
    <entry>
      <title>Introducing new window features</title>
      <link href="https://trino.io/blog/2021/03/10/introducing-new-window-features.html" rel="alternate" type="text/html" title="Introducing new window features" />
      <published>2021-03-10T00:00:00+00:00</published>
      <updated>2021-03-10T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/03/10/introducing-new-window-features</id>
      <content type="html" xml:base="https://trino.io/blog/2021/03/10/introducing-new-window-features.html">&lt;p&gt;In Trino, we are thrilled to get feedback and feature requests from our
fantastic community, and we’re tirelessly motivated to meet the expectations!
The SQL specification is another source of inspiration. From time to time, we
go through those encrypted scrolls to give you a new feature that you didn’t
even know you needed!&lt;/p&gt;

&lt;p&gt;Recently, there was a push in Trino to extend support for window functions.
In this post, we explain the complexities of window functions, and describe a
couple of our recent additions. If “window” doesn’t sound familiar, read on.
Already a window expert? Skip to &lt;a href=&quot;#new features&quot;&gt;what’s new&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A window is the structure you run your window function &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OVER&lt;/code&gt;. It has three
components:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;partitioning&lt;/li&gt;
  &lt;li&gt;ordering&lt;/li&gt;
  &lt;li&gt;frame&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You use partitioning to break your input data into independent chunks. Ordering
is to order rows within the partition. And frame is a kind of “sliding window”.
For every processed row, the frame encloses a certain portion of the sorted
partition. Your window function processes this portion and yields the result
for the row.&lt;/p&gt;

&lt;p&gt;A “running average” is one simple example:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT avg(totalprice) OVER (
    PARTITION BY custkey
    ORDER BY orderdate
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM orders
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For a particular customer identified by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;custkey&lt;/code&gt;, it sorts their orders by
date and computes a sequence of average prices since the beginning up to each
consecutive entry. The window frame for a row includes all rows from the start
up to and including that row.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/window-features/running-average.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;According to standard SQL, there are three ways to specify the frame. The first way
is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROWS&lt;/code&gt; (like in the example). With &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROWS&lt;/code&gt;, you can specify frame bounds by a
physical offset from the current row. While &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW&lt;/code&gt; means “between the beginning of the partition and the current
row”, you can also specify precisely where the frame starts and ends, for
example with: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROWS BETWEEN 10 PRECEDING AND 5 FOLLOWING&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RANGE&lt;/code&gt; is a more complicated way of defining frame on ordered data. It does
not rely on physical offset (in rows), but on logical offset (in value). That
is, the frame includes rows where the value is within a certain range from the
value in the current row.&lt;/p&gt;

&lt;p&gt;Until recently, Trino only supported &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RANGE&lt;/code&gt; in limited cases.
You could use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RANGE UNBOUNDED PRECEDING&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CURRENT ROW&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNBOUNDED
FOLLOWING&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNBOUNDED PRECEDING&lt;/code&gt; includes all rows since the partition start,&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNBOUNDED FOLLOWING&lt;/code&gt; includes all rows until the partition end,&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CURRENT ROW&lt;/code&gt; is trickier. It includes all rows where values of the sort key
are the same as in the current row. We call them a &lt;em&gt;peer group&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;
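
&lt;p&gt;To make the &lt;em&gt;peer group&lt;/em&gt; concrete, here is a small sketch using a made-up
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;scores&lt;/code&gt; table. With frame type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RANGE&lt;/code&gt;, a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CURRENT ROW&lt;/code&gt; frame end pulls in all
peers of the current row, so tied rows get the same running count:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;WITH scores(name, points) AS (VALUES
    (&apos;a&apos;, 1), (&apos;b&apos;, 2), (&apos;c&apos;, 2), (&apos;d&apos;, 3))
SELECT
    name,
    points,
    count(*) OVER (
        ORDER BY points
        RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_count
FROM scores;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Both rows with 2 points report a running count of 3, because each one’s frame
extends through its entire peer group; with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROWS&lt;/code&gt; they would report 2 and 3.&lt;/p&gt;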

&lt;p&gt;It’s time to introduce the first new feature:&lt;/p&gt;

&lt;h2 id=&quot;-full-support-for-frame-type-range&quot;&gt;&lt;a name=&quot;new features&quot;&gt;&lt;/a&gt; Full support for frame type RANGE&lt;/h2&gt;

&lt;p&gt;Since &lt;a href=&quot;https://trino.io/docs/current/release/release-346.html&quot;&gt;version 346&lt;/a&gt;, it is
possible to specify &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RANGE&lt;/code&gt; with an offset value. The frame includes all rows
whose value is within this range from the current row.&lt;/p&gt;

&lt;p&gt;Let’s modify our example:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT avg(totalprice) OVER (
    PARTITION BY custkey
    ORDER BY orderdate
    RANGE BETWEEN interval &apos;1&apos; month PRECEDING AND CURRENT ROW)
FROM orders
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now, for every row, we get the average price from the preceding month. Note that
the offset &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;interval &apos;1&apos; month&lt;/code&gt; applies to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orderdate&lt;/code&gt;, which is the sorting
column.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/window-features/running-average-range.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Of course, we don’t have to order by date. The sorting column can be of any
numeric or date/time type, and the offset must be compatible. Also, the offset
doesn’t have to be a literal. It can come from another column of a table or,
generally, it can be any expression, as long as the type matches.&lt;/p&gt;

&lt;p&gt;A frame of type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RANGE&lt;/code&gt; does not quite fit in the abstraction of a “sliding
window”. Frames can be bigger or smaller depending not only on the offset
values but also on the actual input data. A long series of similar entries can
produce a huge frame, while a gap in input values can result in an empty frame.&lt;/p&gt;

&lt;p&gt;For illustration, imagine a group of students, and the results of some test they
took. Our table has two columns: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;student_id&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;result&lt;/code&gt;, which is the number
of points. For each student, let’s find how many students did better by 1 to 2
points:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;WITH students_results(student_id, result) AS (VALUES
    (&apos;student_1&apos;, 17),
    (&apos;student_2&apos;, 16),
    (&apos;student_3&apos;, 18),
    (&apos;student_4&apos;, 18),
    (&apos;student_5&apos;, 10),
    (&apos;student_6&apos;, 20),
    (&apos;student_7&apos;, 16))
SELECT
    student_id,
    result,
    count(*) OVER (
        ORDER BY result
        RANGE BETWEEN 1 FOLLOWING AND 2 FOLLOWING) AS close_better_scores_count
FROM students_results;

 student_id | result | close_better_scores_count
------------+--------+---------------------------
 student_5  |     10 |                         0
 student_7  |     16 |                         3
 student_2  |     16 |                         3
 student_1  |     17 |                         2
 student_3  |     18 |                         1
 student_4  |     18 |                         1
 student_6  |     20 |                         0
(7 rows)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Note that the frame does not contain the current row. For a particular student,
it only includes students with better results, and not themselves. For the
unfortunate &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;student_5&lt;/code&gt;, there are no students with similar test results. The
frame is also empty for the lucky &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;student_6&lt;/code&gt; who scored the most points.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/window-features/students-range.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Besides &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROWS&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RANGE&lt;/code&gt;, there is another way to specify the frame on
ordered data. And yes, Trino supports this mechanism! Let me introduce the
second of our recent additions:&lt;/p&gt;

&lt;h2 id=&quot;support-for-frame-type-groups&quot;&gt;Support for frame type GROUPS&lt;/h2&gt;

&lt;p&gt;This feature, added in
&lt;a href=&quot;https://trino.io/docs/current/release/release-346.html&quot;&gt;version 346&lt;/a&gt;, allows you to
include or exclude the whole &lt;em&gt;peer groups&lt;/em&gt; of rows in ordered data.&lt;/p&gt;

&lt;p&gt;For illustration, let’s consider again the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;students_results&lt;/code&gt; table. For each
student, let’s find the gap between their result and the result of a student (or
students) who did slightly better.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;WITH students_results(student_id, result) AS (VALUES
    (&apos;student_1&apos;, 17),
    (&apos;student_2&apos;, 16),
    (&apos;student_3&apos;, 18),
    (&apos;student_4&apos;, 18),
    (&apos;student_5&apos;, 10),
    (&apos;student_6&apos;, 20),
    (&apos;student_7&apos;, 16))
SELECT
    student_id,
    result,
    max(result) OVER (
        ORDER BY result
        GROUPS BETWEEN CURRENT ROW AND 1 FOLLOWING) - result AS gap_till_better_score
FROM students_results;

 student_id | result | gap_till_better_score
------------+--------+-----------------------
 student_5  |     10 |                     6
 student_7  |     16 |                     1
 student_2  |     16 |                     1
 student_1  |     17 |                     1
 student_3  |     18 |                     2
 student_4  |     18 |                     2
 student_6  |     20 |                     0
(7 rows)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The window function returns, for each student, the closest better result. The
frame of type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUPS&lt;/code&gt; used here includes all entries equal to the current
entry in terms of points (that is, the student’s &lt;em&gt;peer group&lt;/em&gt;), and the next
group.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/window-features/students-groups.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;In frames of type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUPS&lt;/code&gt;, like in other frame types, the offset doesn’t have
to be constant. It can be any expression, as long as its type is exact numeric
with scale 0. Simply put, we can skip any integer number of groups.&lt;/p&gt;
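&lt;p&gt;As a minimal sketch (the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;groups_ahead&lt;/code&gt; column is hypothetical), the
number of peer groups to include could come from the data itself:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT
    student_id,
    result,
    max(result) OVER (
        ORDER BY result
        -- per-row offset: include as many following peer groups as groups_ahead says
        GROUPS BETWEEN CURRENT ROW AND groups_ahead FOLLOWING) AS best_nearby_result
FROM students_results
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;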

&lt;h3 id=&quot;under-the-covers&quot;&gt;Under the covers&lt;/h3&gt;

&lt;p&gt;How do we find the frame bounds efficiently? With &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROWS&lt;/code&gt; it’s easy:
we only need to skip a determined number of rows forward or backward.&lt;/p&gt;

&lt;p&gt;With &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RANGE&lt;/code&gt;, we need to examine the actual values to see if they fall within
the given range. Our approach is optimized for the case where the offset values
are constant for all rows. Our solution involves caching frame bounds computed
for the preceding row, and using them as the starting point to find frame
bounds for the current row. Ideally, we never have to move the frame bounds
back as we process subsequent rows. In such a case, the amortized cost of frame
bound calculations per row is constant.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/window-features/sliding-frame-range.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Our strategy for determining frame bounds for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUPS&lt;/code&gt; is similar. We cache the
frame bounds computed for the preceding row and use them as the starting point
for the current row. If the frame offset is constant, frame bounds slide from
one peer group to another every time the processed row leaves one peer group and
enters the next one.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/window-features/sliding-frame-groups.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;support-for-window-clause&quot;&gt;Support for WINDOW clause&lt;/h2&gt;

&lt;p&gt;As all the preceding examples show, a window function is a big chunk of syntax.
What if we wanted to use several window functions over the same window? Say, we
need an average price and a total price from the preceding month. And the top
price. Does it have to look like the query below?&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT
    avg(totalprice) OVER (
        PARTITION BY custkey 
        ORDER BY orderdate
        RANGE BETWEEN interval &apos;1&apos; month PRECEDING AND CURRENT ROW),
    sum(totalprice) OVER (
        PARTITION BY custkey 
        ORDER BY orderdate
        RANGE BETWEEN interval &apos;1&apos; month PRECEDING AND CURRENT ROW),
    max(totalprice) OVER (
        PARTITION BY custkey 
        ORDER BY orderdate
        RANGE BETWEEN interval &apos;1&apos; month PRECEDING AND CURRENT ROW)
FROM orders
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Well, no more. Starting with
&lt;a href=&quot;https://trino.io/docs/current/release/release-352.html&quot;&gt;Trino 352&lt;/a&gt;, you can
predefine a window specification, and then reuse or refine it wherever you
need it. This is thanks to the third of our new additions: support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WINDOW&lt;/code&gt;
clause.&lt;/p&gt;

&lt;p&gt;Technically speaking, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WINDOW&lt;/code&gt; clause is part of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FROM&lt;/code&gt; clause:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT …
    FROM …
        WHERE …
        GROUP BY …
        HAVING …
        WINDOW …
ORDER BY …
OFFSET …
LIMIT / FETCH …
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WINDOW&lt;/code&gt; clause, you can define any number of named windows. Then you
can simply refer to them by their names in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; list or an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt;
clause.&lt;/p&gt;

&lt;p&gt;Let’s check how the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WINDOW&lt;/code&gt; clause helps with our example query:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT 
	avg(totalprice) OVER w,
	sum(totalprice) OVER w,
	max(totalprice) OVER w
FROM orders
WINDOW w AS (
    PARTITION BY custkey
    ORDER BY orderdate
    RANGE BETWEEN interval &apos;1&apos; month PRECEDING AND CURRENT ROW)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To be even more concise, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WINDOW&lt;/code&gt; clause allows you to define more
specialized windows from existing window definitions:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;WINDOW 
	w1 AS (PARTITION BY custkey),
	w2 AS (w1 ORDER BY orderdate),
	w3 AS (w2 RANGE BETWEEN interval &apos;1&apos; month PRECEDING AND CURRENT ROW)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Alternatively, you can define the window only partially and then complete it
where it’s used:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT 
	avg(totalprice) OVER (w ROWS BETWEEN 10 PRECEDING AND CURRENT ROW) AS recent_average,
	sum(totalprice) OVER (w ROWS BETWEEN CURRENT ROW AND 10 FOLLOWING) AS next_buys
FROM orders
    WINDOW w AS (PARTITION BY custkey ORDER BY orderdate)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There are some ANSI rules, though, that you need to follow when redefining windows:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PARTITION BY&lt;/code&gt; is only allowed in the base definition,&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; can only be specified once in the named windows reference chain,&lt;/li&gt;
  &lt;li&gt;the frame can only be specified in the final definition.&lt;/li&gt;
&lt;/ul&gt;
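&lt;p&gt;For example, this hypothetical chain breaks the second rule, because &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;w2&lt;/code&gt;
tries to specify a second &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt;, so a query using it is rejected:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;WINDOW
	w1 AS (PARTITION BY custkey ORDER BY orderdate),
	w2 AS (w1 ORDER BY totalprice) -- invalid: ORDER BY is already specified in w1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;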

&lt;p&gt;In case you wonder: there’s no need to worry if some predefined windows are
eventually unused. Unused windows do not affect the efficiency of your query
execution. Partitioning, sorting, and frame bound computations are costly
operations, so we made sure that unused window parts do not appear in the
query plan.&lt;/p&gt;

&lt;p&gt;There’s one last detail about the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WINDOW&lt;/code&gt; clause that needs clarification. The
columns referenced in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WINDOW&lt;/code&gt; clause are columns of the input table. In the
following example, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;country_code&lt;/code&gt; is clearly a column of the table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;countries&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;... FROM countries WINDOW w AS (ORDER BY country_code)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Obvious enough. So why am I telling you this?&lt;/p&gt;

&lt;p&gt;Window functions can be used in two different clauses of a query, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt;. With the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; clause, there is a rule that column references
used there refer to the output table rather than the input table. Consider this
query:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;WITH countries(country_code) AS (VALUES &apos;pol&apos;, &apos;CAN&apos;, &apos;USA&apos;)
SELECT upper(country_code) AS country_code
    FROM countries
    WINDOW w AS (ORDER BY country_code)
ORDER BY row_number() OVER w
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Window &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;w&lt;/code&gt; is used in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; clause. So, does the window’s ordering use
the original &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;country_code&lt;/code&gt; column from the input table, or does it “see” the
uppercased &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;country_code&lt;/code&gt; from the output table?&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/window-features/country-code.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The SQL spec is clear about it: a column reference in the named window always
refers to the original column, no matter where you use this window. In the
example, the result is ordered according to the original values: lowercase &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pol&lt;/code&gt;
after uppercase &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;USA&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/window-features/country-code-result.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;As expected:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; country_code
--------------
 CAN
 USA
 POL
(3 rows)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And here the story ends. Thanks for your attention! I hope you enjoy Trino’s
new superpowers. In case of questions or issues — &lt;a href=&quot;/slack.html&quot;&gt;you
know where to find us&lt;/a&gt;. More goodies are on the way, so stay tuned! How
about regex matching on tables?&lt;/p&gt;</content>

      
        <author>
          <name>Kasia Findeisen (kasiafi)</name>
        </author>
      

      <summary>In Trino, we are thrilled to get feedback and feature requests from our fantastic community, and we’re tirelessly motivated to meet the expectations! The SQL specification is another source of inspiration. From time to time, we go through those encrypted scrolls to give you a new feature that you didn’t even know you needed!</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino in 2020 - An amazing year in review</title>
      <link href="https://trino.io/blog/2021/01/08/2020-review.html" rel="alternate" type="text/html" title="Trino in 2020 - An amazing year in review" />
      <published>2021-01-08T00:00:00+00:00</published>
      <updated>2021-01-08T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/01/08/2020-review</id>
      <content type="html" xml:base="https://trino.io/blog/2021/01/08/2020-review.html">&lt;p&gt;&lt;strong&gt;Wow!&lt;/strong&gt; If you had to sum up what happened in the last year in this
great community, &lt;strong&gt;wow&lt;/strong&gt; would be it. It is truly awe-inspiring to be part of
this incredible journey of Trino. Oh yeah, on that note. Our community and
project &lt;a href=&quot;/blog/2020/12/27/announcing-trino.html&quot;&gt;chose the new name Trino&lt;/a&gt;,
to be able to continue to innovate and develop freely as a community of peers.
Presto® and Presto® SQL are a thing of the past.&lt;/p&gt;

&lt;p&gt;Now that that’s out of the way, let’s dive right in and see what all our community
members across the globe have created with us!&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;&lt;a href=&quot;/blog/2020/01/01/2019-summary.html&quot;&gt;2019 was a big year for us&lt;/a&gt;, but check
out how 2020 eclipsed even that!&lt;/p&gt;
&lt;h2 id=&quot;by-the-numbers&quot;&gt;By the numbers&lt;/h2&gt;

&lt;p&gt;Even the size and growth of &lt;a href=&quot;/slack.html&quot;&gt;our community on Slack&lt;/a&gt; is impressive:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Started in January 2020 with ~1600 members and 280 weekly active&lt;/li&gt;
  &lt;li&gt;Over 3200 members by December 2020&lt;/li&gt;
  &lt;li&gt;560 members active weekly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The innovation and change of &lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;the source code on GitHub&lt;/a&gt; is a result of the hard work of the community:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Over 4000 commits merged&lt;/li&gt;
  &lt;li&gt;More than 2800 pull requests received&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/release.html#releases-2020&quot;&gt;23 releases&lt;/a&gt;, basically one
every two weeks!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As you can see, the excitement around the name change quickly increased the
number of stars we have on GitHub. While some of this certainly stems from an
initial buzz around a shiny new name, we also believe that the name change has
brought clarity to the community. Trino is an improved version, supported by
the founders and creators of Presto®, along with the major contributors.&lt;/p&gt;

&lt;p&gt;And if you have not done so already, make sure to &lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;star the
repository&lt;/a&gt; and &lt;a href=&quot;/slack.html&quot;&gt;join us on slack&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;features-and-code&quot;&gt;Features and code&lt;/h2&gt;

&lt;p&gt;While everything mentioned is already exciting, the true work is visible in the
new features and improvements in Trino. It is a long list, but read on. You
won’t want to miss anything.&lt;/p&gt;

&lt;h3 id=&quot;improvements-to-ansi-sql-support&quot;&gt;Improvements to ANSI SQL support&lt;/h3&gt;

&lt;p&gt;A core feature of Trino is the ability to use the same standard SQL for any
connected data source. These improvements empower all users.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Variable-precision temporal types, with precision down to picoseconds
(10&lt;sup&gt;−12&lt;/sup&gt;s). This is a very important feature for time-critical
systems such as financial transaction processing&lt;/li&gt;
  &lt;li&gt;Correct, and now SQL specification compliant timestamp semantics, making
migration of SQL statements from other compliant systems such as many RDBMSs
easier&lt;/li&gt;
  &lt;li&gt;Implicit coercions for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; clause&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RANGE&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUPS&lt;/code&gt;-based window frames&lt;/li&gt;
  &lt;li&gt;More support for various shapes of correlated subqueries&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INTERSECT ALL&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXCEPT ALL&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Parameter support in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OFFSET&lt;/code&gt; clauses&lt;/li&gt;
  &lt;li&gt;Experimental support for &lt;a href=&quot;/docs/current/sql/select.html?highlight=recursive#with-recursive-clause&quot;&gt;recursive queries&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Enforcement of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NOT NULL&lt;/code&gt; constraints when inserting data&lt;/li&gt;
  &lt;li&gt;Quantified comparisons (e.g., &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;gt; ALL (...)&lt;/code&gt;) in aggregation queries&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;other-query-improvements&quot;&gt;Other query improvements&lt;/h3&gt;

&lt;p&gt;A number of other features were added to make querying your data sources with
Trino even more powerful:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/language/types.html#t-digest&quot;&gt;T-digest data type&lt;/a&gt; and functions
for approximate quantile computations&lt;/li&gt;
  &lt;li&gt;Support for setting and reading column comments&lt;/li&gt;
  &lt;li&gt;Numerous new functions including &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;concat_ws()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;regexp_count()&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;regexp_position()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;contains_sequence()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;murmur3()&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;from_unixtime_nanos()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;from_iso8601_timestamp_nanos()&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;human_readable_seconds()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bitwise&lt;/code&gt; operations, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;luhn_check()&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;approx_most_frequent()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;translate()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;starts_with()&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;performance&quot;&gt;Performance&lt;/h2&gt;

&lt;p&gt;Trino is already &lt;a href=&quot;/index.html&quot;&gt;ludicrously fast&lt;/a&gt;. But then again, even faster is
better, so we worked on that:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved pushdown of complex operations into connectors, including
&lt;a href=&quot;/docs/current/optimizer/pushdown.html&quot;&gt;aggregation pushdown&lt;/a&gt; and TopN
pushdown.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/06/14/dynamic-partition-pruning.html&quot;&gt;Dynamic filtering and partition pruning&lt;/a&gt;, which can improve performance of
highly selective joins manyfold.&lt;/li&gt;
  &lt;li&gt;Cost-based decisions for queries containing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IN &amp;lt;subquery&amp;gt;&lt;/code&gt; in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WHERE&lt;/code&gt; clause.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;information_schema&lt;/code&gt; performance improvements, which benefit third-party BI
tools that need to inspect table metadata, such as DBeaver, DataGrip,
Power BI, Tableau, Looker, and others.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/14/dereference-pushdown.html&quot;&gt;Faster queries on nested data in Parquet and ORC&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Faster and more accurate &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;approx_percentile&lt;/code&gt;, based on t-digest data structure.&lt;/li&gt;
  &lt;li&gt;Support of Bloom filters in ORC.&lt;/li&gt;
  &lt;li&gt;Experimental, optimized Parquet writer.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;security&quot;&gt;Security&lt;/h2&gt;

&lt;p&gt;The more data you access with Trino, the more it becomes critical to secure it.
With that in mind we added a lot of improvements:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The &lt;a href=&quot;/docs/current/admin/web-interface.html&quot;&gt;Web UI&lt;/a&gt; now requires
authentication. Various actions such as viewing query details, killing
queries, etc., are protected with authorization checks based on the identity
of the user. Additionally, the UI now supports OAuth2 for user identification.&lt;/li&gt;
  &lt;li&gt;External and internal APIs are now properly secured with authentication and
authorization checks. Importantly, this fixes a &lt;a href=&quot;https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-15087&quot;&gt;CVE reported
vulnerability&lt;/a&gt;
that affects all older versions of Presto®.&lt;/li&gt;
  &lt;li&gt;A &lt;a href=&quot;/docs/current/security/secrets.html&quot;&gt;new mechanism to externalize secrets in configuration
 files&lt;/a&gt; that makes it easier to integrate
 with third-party secret managers and deployment tools.&lt;/li&gt;
  &lt;li&gt;Support for JSON Web Key (JWK) authentication and &lt;a href=&quot;/docs/current/develop/certificate-authenticator.html&quot;&gt;pluggable certificate
authenticators&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;A new &lt;a href=&quot;/docs/current/security/salesforce.html&quot;&gt;Salesforce authenticator&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;The query engine and access control SPIs now support injecting row filters and
column masks.&lt;/li&gt;
  &lt;li&gt;New syntax for managing permissions (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GRANT/REVOKE&lt;/code&gt; on schema,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER TABLE/SCHEMA/VIEW ... SET AUTHORIZATION&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;data-sources&quot;&gt;Data sources&lt;/h2&gt;

&lt;p&gt;Trino empowers you to use one platform to access all data sources. Connectors
enable this and we added numerous new connectors:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/iceberg.html&quot;&gt;Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/prometheus.html&quot;&gt;Prometheus&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/oracle.html&quot;&gt;Oracle&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/pinot.html&quot;&gt;Pinot&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/druid.html&quot;&gt;Druid&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/bigquery.html&quot;&gt;BigQuery&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/memsql.html&quot;&gt;MemSQL&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All other connectors received a large host of improvements. Let’s just look at
two popular connectors:&lt;/p&gt;

&lt;h3 id=&quot;hive-connector-for-hdfs-s3-azure-and-cloud-object-storage-systems&quot;&gt;Hive connector for HDFS, S3, Azure and cloud object storage systems&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Support for complex Hive views, allowing integration with Hive and simplifying
migration from Hive&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/06/01/hive-acid.html&quot;&gt;ACID transactional tables&lt;/a&gt; with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt;
and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt; support&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/hive-caching.html&quot;&gt;Built-in storage caching&lt;/a&gt; and
support for &lt;a href=&quot;/docs/current/connector/hive-alluxio.html&quot;&gt;external caching with
Alluxio&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;New procedures: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;system.drop_stats()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;register_partition()&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unregister_partition()&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Support for &lt;a href=&quot;/docs/current/connector/hive-azure.html&quot;&gt;Azure object storage&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Support for &lt;a href=&quot;/docs/current/connector/hive-s3.html&quot;&gt;S3 encrypted files, flexible S3 security mappings and
Intelligent-Tiering S3 storage&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;elasticsearch-connector&quot;&gt;Elasticsearch connector&lt;/h3&gt;

&lt;p&gt;The &lt;a href=&quot;/docs/current/connector/elasticsearch.html&quot;&gt;Elasticsearch connector&lt;/a&gt;
received numerous powerful improvements:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Password authentication&lt;/li&gt;
  &lt;li&gt;Support for index aliases&lt;/li&gt;
  &lt;li&gt;Support for array types, Nested, and IP type&lt;/li&gt;
  &lt;li&gt;Support for Elasticsearch 7.x&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;runtime-improvements&quot;&gt;Runtime improvements&lt;/h2&gt;

&lt;p&gt;Operating and maintaining a Trino cluster takes a significant amount of
resources, so any work to improve the runtime has a significant positive
impact:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/installation/deployment.html#java-runtime-environment&quot;&gt;Requirement to use Java
11&lt;/a&gt;, with
better GC performance, overall performance, and improved container
support&lt;/li&gt;
  &lt;li&gt;Support for ARM64-based processors to run Trino&lt;/li&gt;
  &lt;li&gt;Support for requiring a minimum number of workers before a query starts, useful for
implementing autoscaling&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/06/25/data-integrity-protection.html&quot;&gt;Data integrity checks for network transfers&lt;/a&gt; to prevent data corruption during
processing&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;everything-else&quot;&gt;Everything else&lt;/h2&gt;

&lt;p&gt;There is so much more to capture, and you really would have to read all the
&lt;a href=&quot;/docs/current/release.html#releases-2020&quot;&gt;release notes&lt;/a&gt; in detail to know it
all. To save you from that, here are a few more noteworthy changes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Experimental support for materialized views in Iceberg connector&lt;/li&gt;
  &lt;li&gt;JDBC driver backward compatibility tests&lt;/li&gt;
  &lt;li&gt;Support for multiple event listeners&lt;/li&gt;
  &lt;li&gt;Added Python client support for exec with parameters&lt;/li&gt;
  &lt;li&gt;New look and navigation for the &lt;a href=&quot;/docs/current/index.html&quot;&gt;documentation&lt;/a&gt;, and
lots of new content&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;community-resources-and-events&quot;&gt;Community resources and events&lt;/h2&gt;

&lt;p&gt;Beyond the raw code and helping each other, the community collaborated on other
helpful resources like books and in-depth video tutorials.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/mattsfuller&quot;&gt;Matt&lt;/a&gt;, &lt;a href=&quot;https://github.com/mosabua&quot;&gt;Manfred&lt;/a&gt;,
and &lt;a href=&quot;https://github.com/martint&quot;&gt;Martin&lt;/a&gt; published the book &lt;a href=&quot;/trino-the-definitive-guide.html&quot;&gt;Trino: The
Definitive Guide&lt;/a&gt; with O’Reilly. Over 5000
readers took advantage of the &lt;a href=&quot;/blog/2020/04/11/the-definitive-guide.html&quot;&gt;free digital copy&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Brian and Manfred launched the live streaming event &lt;a href=&quot;/broadcast/index.html&quot;&gt;Trino Community
Broadcast&lt;/a&gt;, and grew their audience and back catalog to
include some very useful material. If you have not seen it yet, go and &lt;a href=&quot;/broadcast/episodes.html&quot;&gt;watch
some old episodes&lt;/a&gt; and join us in the next ones.&lt;/p&gt;

&lt;p&gt;We also had a number of other online events and presentations, with direct
participation of our community members:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;A &lt;a href=&quot;/blog/2020/11/21/a-report-about-presto-conference-tokyo-2020.html&quot;&gt;dedicated conference event&lt;/a&gt;
for the community in Japan was very successful.&lt;/li&gt;
  &lt;li&gt;The &lt;a href=&quot;/blog/2020/09/28/argentina-big-data-meetup.html&quot;&gt;Argentina Big Data Meetup&lt;/a&gt; had a large audience from the
community in South America&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A series of virtual events around the project started with a roadmap and
overview meeting and included a number of real-world use case examples at scale:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/05/15/state-of-presto.html&quot;&gt;State of Trino&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/22/presto-summit-pinterest.html&quot;&gt;Trino at Pinterest&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/06/presto-summit-arm-td.html&quot;&gt;Trino Migration at ARM Treasure Data&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;Trino at Zuora&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Another series of training classes with the project founders was hugely
successful. It includes very valuable content for any Trino user, from beginners
to experts, that you should not miss:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/15/training-advanced-sql.html&quot;&gt;Advanced SQL in Trino with David&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/30/training-query-tuning.html&quot;&gt;Understanding and Tuning Trino Query Processing with Martin&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/13/training-security.html&quot;&gt;Securing Trino with Dain&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/27/training-performance.html&quot;&gt;Configuring and Tuning Trino with Dain&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;2020 was a wild ride for us all. Trino and the Trino community definitely
emerged as a winner, and we are looking forward to a very bright future with you
all.&lt;/p&gt;

&lt;p&gt;Several pieces of ongoing work are already underway and very promising:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Optimized Parquet reader, on par with ORC reader support&lt;/li&gt;
  &lt;li&gt;Support for SQL &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; statements&lt;/li&gt;
  &lt;li&gt;OAuth 2.0 support for JDBC&lt;/li&gt;
  &lt;li&gt;Support for SQL &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WINDOW&lt;/code&gt; clause and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; usage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We’re starting the new year with a shiny new name, a cute little bunny, and a
very vibrant community. The future is looking great for Trino!&lt;/p&gt;

&lt;p&gt;Don’t miss out on all the benefits of Trino. Join us &lt;a href=&quot;/slack.html&quot;&gt;on
Slack&lt;/a&gt; to get started!&lt;/p&gt;</content>

      
        <author>
          <name>Martin Traverso, Manfred Moser, Brian Olsen</name>
        </author>
      

      <summary>Wow! If you would have to sum up what happened in the last year in this great community, wow would be it. It is truly awe-inspiring to be part of this incredible journey of Trino. Oh yeah, on that note. Our community and project chose the new name Trino, to be able to continue to innovate and develop freely as a community of peers. Presto® and Presto® SQL are a thing of the past. Now that is out of the way, let’s dive right in and see what all our community members across the globe have created with us!</summary>

      
      
    </entry>
  
    <entry>
      <title>Migrating from PrestoSQL to Trino</title>
      <link href="https://trino.io/blog/2021/01/04/migrating-from-prestosql-to-trino.html" rel="alternate" type="text/html" title="Migrating from PrestoSQL to Trino" />
      <published>2021-01-04T00:00:00+00:00</published>
      <updated>2021-01-04T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/01/04/migrating-from-prestosql-to-trino</id>
      <content type="html" xml:base="https://trino.io/blog/2021/01/04/migrating-from-prestosql-to-trino.html">&lt;p&gt;As we previously announced, we’re
&lt;a href=&quot;/blog/2020/12/27/announcing-trino.html&quot;&gt;rebranding Presto SQL as Trino&lt;/a&gt;.
Now comes the hard part: migrating to the new version of the software.
We just released the first version,
&lt;a href=&quot;/docs/current/release/release-351.html&quot;&gt;Trino 351&lt;/a&gt;,
which uses the name Trino everywhere, both internally and externally.
Unfortunately, there are some unavoidable compatibility aspects that
administrators of Trino need to know about. We hope this post makes the
transition as smooth as possible.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;things-that-havent-changed&quot;&gt;Things that haven’t changed&lt;/h1&gt;

&lt;p&gt;Let’s start with the good news. For end users running queries against Trino,
everything should be the same. There are no changes to the SQL language,
SQL functions, session properties, etc.&lt;/p&gt;

&lt;p&gt;Users now see &lt;em&gt;Trino&lt;/em&gt; in error messages, a different logo in the web UI,
and error stack traces have a different package name, but otherwise they
won’t know that anything has changed. All of their views, reports,
or other stored queries will work as before.&lt;/p&gt;

&lt;p&gt;Similarly for administrators, except for a few things noted in the
&lt;a href=&quot;/docs/current/release/release-351.html&quot;&gt;Trino 351 release notes&lt;/a&gt;,
all the configuration properties are the same.&lt;/p&gt;

&lt;h1 id=&quot;client-protocol-compatiblity&quot;&gt;Client protocol compatibility&lt;/h1&gt;

&lt;p&gt;The client protocol is how clients, such as the
&lt;a href=&quot;/docs/current/client/cli.html&quot;&gt;CLI&lt;/a&gt; or
&lt;a href=&quot;/docs/current/client/jdbc.html&quot;&gt;JDBC driver&lt;/a&gt;,
talk to Trino. It uses standard HTTP as the underlying communications
protocol, with some custom HTTP headers to communicate values
to and from Trino. Unfortunately, those header names started with
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;X-Presto-&lt;/code&gt; and thus had to be changed to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;X-Trino-&lt;/code&gt;.&lt;/p&gt;
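As a rough illustration of what the header rename means on the wire (the localhost coordinator below is hypothetical, and the request is only built, never sent), a client request to the statement endpoint now carries `X-Trino-` headers such as `X-Trino-User`:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class TrinoHeaderSketch {
    public static void main(String[] args) {
        // Build (but do not send) a request like the one the CLI or JDBC
        // driver issues; the coordinator address here is a made-up example.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/v1/statement"))
                // Before Trino 351 this header was named X-Presto-User
                .header("X-Trino-User", "alice")
                .POST(HttpRequest.BodyPublishers.ofString("SELECT 1"))
                .build();

        System.out.println(request.headers().firstValue("X-Trino-User").orElse(""));
    }
}
```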

&lt;p&gt;The Trino CLI and JDBC driver send the new headers, so they are
&lt;strong&gt;only compatible with Trino versions 351 and newer&lt;/strong&gt;. Users should
wait to upgrade the CLI or JDBC driver until the Trino servers they
talk to have been upgraded.&lt;/p&gt;

&lt;p&gt;Out of the box, the Trino server does not work with older clients.
However, in order to support a graceful transition, you can allow the
server to support older clients by adding a configuration property:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;protocol.v1.alternate-header-name=Presto
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;We recommend using version 350 of CLI and JDBC driver as the transition version&lt;/strong&gt;.
It has all the newest features such as variable precision timestamps,
has been tested with a range of older server versions, and is the last
version to support older servers.&lt;/p&gt;

&lt;h1 id=&quot;jdbc-driver&quot;&gt;JDBC driver&lt;/h1&gt;

&lt;p&gt;The URL prefix for the JDBC driver now starts with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jdbc:trino:&lt;/code&gt; instead
of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jdbc:presto:&lt;/code&gt;. This means that any client applications using the
JDBC driver need to update their connection configuration. The old
prefix is still supported, but will be removed in a future release.&lt;/p&gt;

&lt;p&gt;The class name of the driver is now &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;io.trino.jdbc.TrinoDriver&lt;/code&gt;. This is
of no concern to most users, as the driver is normally accessed via the
standard JDBC auto-discovery mechanism based on the URL. As with the URL prefix,
the old name is still supported, but will be removed in a future release.&lt;/p&gt;
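As a minimal sketch of the migration (the coordinator host, catalog, and schema below are hypothetical), updating an existing connection string is a simple prefix swap; with the Trino JDBC driver on the classpath, standard JDBC auto-discovery then selects the driver based on the new prefix, so an explicit Class.forName call is normally unnecessary:

```java
public class TrinoUrlMigration {
    public static void main(String[] args) {
        // Old PrestoSQL-style JDBC URL; the coordinator host, catalog, and
        // schema are made-up examples
        String oldUrl = "jdbc:presto://coordinator.example.com:8080/hive/default";

        // Rewrite the prefix to the new jdbc:trino: form
        String newUrl = oldUrl.replaceFirst("^jdbc:presto:", "jdbc:trino:");

        System.out.println(newUrl);
        // prints jdbc:trino://coordinator.example.com:8080/hive/default
    }
}
```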

&lt;h1 id=&quot;server-rpm&quot;&gt;Server RPM&lt;/h1&gt;

&lt;p&gt;The name of the RPM has changed, so it is treated as a different RPM, and
thus you cannot simply upgrade from the old version to the new version.
All of the directories for the RPM that contained the name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;presto&lt;/code&gt; now
use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino&lt;/code&gt; instead. You likely want to uninstall the old RPM, rename
the config and log directories, then install the new RPM.&lt;/p&gt;

&lt;h1 id=&quot;docker-image&quot;&gt;Docker image&lt;/h1&gt;

&lt;p&gt;The &lt;a href=&quot;https://hub.docker.com/r/trinodb/trino&quot;&gt;Trino Docker image&lt;/a&gt; is now
published as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trinodb/trino&lt;/code&gt;. The supported configuration directory is
now &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/etc/trino&lt;/code&gt;. The CLI is now named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino&lt;/code&gt; instead of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;presto&lt;/code&gt;.&lt;/p&gt;

&lt;h1 id=&quot;jmx-mbean-naming&quot;&gt;JMX MBean naming&lt;/h1&gt;

&lt;p&gt;Trino runs on the JVM, which has the JMX framework as a standard way to expose
system and application metrics. Trino exposes a huge number of JMX metrics for
administrators to monitor their clusters. You might be using these metrics
via your monitoring system, or perhaps you are accessing them in SQL via the
Trino &lt;a href=&quot;/docs/current/connector/jmx.html&quot;&gt;JMX connector&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The metrics for Trino server now start with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino&lt;/code&gt; instead of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;presto&lt;/code&gt;. You
might need to update this name in your monitoring system, or you can revert
to the old name:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;jmx.base-name=presto
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Similarly, the metrics for the Elasticsearch, Hive, Iceberg, Raptor, and Thrift
connectors now start with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino.plugin&lt;/code&gt; instead of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;presto.plugin&lt;/code&gt;. Again,
you might need to update these names in your monitoring system, or you can
revert to the old name. For example, for the Hive connector:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;jmx.base-name=presto.plugin.hive
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h1 id=&quot;thrift-connector&quot;&gt;Thrift connector&lt;/h1&gt;

&lt;p&gt;The &lt;a href=&quot;/docs/current/connector/thrift.html&quot;&gt;Thrift connector&lt;/a&gt; had many
&lt;a href=&quot;/docs/current/release/release-351.html#thrift-connector-changes&quot;&gt;backwards incompatible changes&lt;/a&gt;
to both the Thrift service interface and the configuration properties. You need
to update all of your implementations of the Thrift service used by the connector.&lt;/p&gt;

&lt;h1 id=&quot;spi&quot;&gt;SPI&lt;/h1&gt;

&lt;p&gt;If you have any custom plugins for Trino, such as connectors or functions,
these need to be updated. The package name is now &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;io.trino.spi&lt;/code&gt;, and a
few classes were renamed:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PrestoException&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TrinoException&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PrestoPrincipal&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TrinoPrincipal&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PrestoWarning&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TrinoWarning&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are no functional changes, so all you should need to do is update
your imports and rename the references to the above class names.&lt;/p&gt;

&lt;h1 id=&quot;migration-guide&quot;&gt;Migration guide&lt;/h1&gt;

&lt;p&gt;Now that you understand what is different and what you need to change,
you can start thinking about the list of steps needed to perform the
migration. The following is a rough plan for upgrading your environment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Prepare to deploy the new version&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Let users know the name is changing, so they are not surprised by the logo changes in the UI.&lt;/li&gt;
  &lt;li&gt;Make sure that users are using recent client versions. Ideally, upgrade them all to
version 350, as mentioned above. You can check the HTTP request logs for the coordinator
to see what client versions are in use.&lt;/li&gt;
  &lt;li&gt;Update your server configuration with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;protocol.v1.alternate-header-name=Presto&lt;/code&gt;
to allow supporting all of your existing Presto clients.&lt;/li&gt;
  &lt;li&gt;If you are using the RPM, have a plan to deal with the new RPM name
and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino&lt;/code&gt; directory names.&lt;/li&gt;
  &lt;li&gt;If you are using Docker, use the new image name, make sure your configuration will
be mounted using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino&lt;/code&gt; path name, and remember that the CLI is now named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Update any custom plugins to use the new SPI.&lt;/li&gt;
  &lt;li&gt;Check if you have anything using JMX to monitor your clusters, and decide if you will
update them to the new names or set a Trino config to revert to the old names.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Upgrade your servers to Trino 351+&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Upgrade development and staging servers.&lt;/li&gt;
  &lt;li&gt;Upgrade production servers. If you have multiple clusters, you can do them one
at a time, and verify everything is working before moving on to the next one.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Upgrade clients&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Upgrade all clients, including the CLI, JDBC driver, Python client, etc., to their Trino versions.&lt;/li&gt;
  &lt;li&gt;Update any applications using JDBC to use the new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jdbc:trino:&lt;/code&gt; connection URL prefix.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step 4: Cleanup&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Remove the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;protocol.v1.alternate-header-name&lt;/code&gt; configuration property.&lt;/li&gt;
  &lt;li&gt;If you configured Trino to use the old JMX names, convert your monitoring system
to use the new JMX names and remove the fallback configs.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;getting-help&quot;&gt;Getting help&lt;/h1&gt;

&lt;p&gt;We’re here to help! If you run into any issues while upgrading, or have any
questions or concerns, &lt;a href=&quot;/slack.html&quot;&gt;ask on Slack&lt;/a&gt;.&lt;/p&gt;</content>

      
        <author>
          <name>David Phillips, Dain Sundstrom</name>
        </author>
      

      <summary>As we previously announced, we’re rebranding Presto SQL as Trino. Now comes the hard part: migrating to the new version of the software. We just released the first version, Trino 351, which uses the name Trino everywhere, both internally and externally. Unfortunately, there are some unavoidable compatibility aspects that administrators of Trino need to know about. We hope this post makes the transition as smooth as possible.</summary>

      
      
    </entry>
  
    <entry>
      <title>We’re rebranding PrestoSQL as Trino</title>
      <link href="https://trino.io/blog/2020/12/27/announcing-trino.html" rel="alternate" type="text/html" title="We’re rebranding PrestoSQL as Trino" />
      <published>2020-12-27T00:00:00+00:00</published>
      <updated>2020-12-27T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/12/27/announcing-trino</id>
      <content type="html" xml:base="https://trino.io/blog/2020/12/27/announcing-trino.html">&lt;p&gt;We’re rebranding PrestoSQL as Trino. The software and the community you have come to love and depend on aren’t 
going anywhere; we are simply renaming. &lt;strong&gt;Trino is the new name for PrestoSQL&lt;/strong&gt;, the project supported by the founders 
and creators of Presto® along with the major contributors – just under a shiny new name. And now you can find us here:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;GitHub: &lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;https://github.com/trinodb/trino&lt;/a&gt;. Please give it a &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/.github/star.png&quot;&gt;star&lt;/a&gt;!&lt;/li&gt;
  &lt;li&gt;Twitter: &lt;a href=&quot;https://twitter.com/trinodb&quot;&gt;@trinodb&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Slack: &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;https://trino.io/slack.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn why we’re doing this, read on…&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;In 2012, Dain, David and Martin joined the Facebook data infrastructure team. Together with Eric Hwang, we created 
Presto® to address the problems of low latency interactive analytics over Facebook’s massive Hadoop data warehouse. 
One of our non-negotiable conditions was for Presto® to be an open source project. Open source is in our DNA - we had 
all used and participated in open source projects to various degrees in the past, and we recognized the power of open 
communities and developers coming together to build successful software that can stand the test of time.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-announcement/team.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Over the next six years, we worked hard to build a healthy open source community and ecosystem around the project. We 
worked with developers and users all over the world and welcomed them into the Presto® community. Presto® was on a path 
of increasing growth and success, in large part because of the contributions from developers across many fields and all 
over the world.&lt;/p&gt;

&lt;p&gt;Unfortunately in 2018, it became clear that Facebook management wanted to have tighter control over the project and its 
future. This culminated with their decision to grant Facebook developers commit rights on the project without any prior 
experience in Presto®. We strongly believe that this kind of decision is not compatible with having a healthy, open 
community. Moreover, they made this decision by fiat without engaging the Presto® community. As a matter of principle, 
we had no choice but to leave Facebook in order to focus on making sure Presto® continued to be a successful project 
with an open, collaborative and independent community. In reality, the choice was easy.&lt;/p&gt;

&lt;p&gt;We started the Presto Software Foundation in January 2019 as an independent entity to oversee the development of the 
software and community, continuing the meritocratic system that had been in place over the previous 6 years. The community 
quickly consolidated under this new home. We intentionally stayed unemployed over the next 10 months to focus on expanding 
and strengthening the community by working directly with major users and contributors, as well as reaching out to a wider 
group of users and developers across the globe. This resulted in new use cases and an injection of energy, making the 
project more vibrant than ever before as even more new users and developers became engaged. But, don’t take our word for 
it, let the data speak for itself:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-announcement/commits.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Months after this consolidation, Facebook decided to create a competing community using The Linux Foundation®. As a first 
action, Facebook applied for a trademark on Presto®. This was a surprising, norm-breaking move because up until that point, 
the Presto® name had been used without constraints by commercial and non-commercial products for over 6 years. In September 
of 2019, Facebook established the Presto Foundation at The Linux Foundation®, and immediately began working to enforce this 
new trademark. We spent the better part of the last year trying to agree to terms with Facebook and The Linux Foundation 
that would not negatively impact the community, but unfortunately we were unable to do so. The end result is that we must 
now change the name in a short period of time, with little ability to minimize user disruption.&lt;/p&gt;

&lt;p&gt;On a personal note, and as the founders who named the project Presto® in the first place, this is an incredibly sad and 
disappointing turn of events. And while we will always have fondness for the name Presto®, we have come to accept that a 
name is just a name. To be frank, we’re tired of this endless distraction, and we intend to focus on what matters most 
and what we are best at doing – building high quality software everyone can rely on and fostering a healthy community 
of users and developers that build it and support it. We’re not going anywhere – we’re the same people, the same amazing 
software, under a new name: Trino.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you love this project, you already love Trino. ❤️&lt;/strong&gt;&lt;/p&gt;

&lt;html&gt;
&lt;p style=&quot;font-size:0.8em&quot;&gt;Facebook is a registered trademark of Facebook Inc.  The Linux Foundation and Presto are trademarks of The Linux Foundation.&lt;/p&gt;
&lt;/html&gt;</content>

      
        <author>
          <name>Martin Traverso, Dain Sundstrom, David Phillips</name>
        </author>
      

      <summary>We’re rebranding PrestoSQL as Trino. The software and the community you have come to love and depend on aren’t going anywhere, we are simply renaming. Trino is the new name for PrestoSQL, the project supported by the founders and creators of Presto® along with the major contributors – just under a shiny new name. And now you can find us here: GitHub: https://github.com/trinodb/trino. Please give it a star! Twitter: @trinodb Slack: https://trino.io/slack.html If you want to learn why we’re doing this, read on…</summary>

      
      
    </entry>
  
    <entry>
      <title>A Report about Presto Conference Tokyo 2020 Online</title>
      <link href="https://trino.io/blog/2020/11/21/a-report-about-presto-conference-tokyo-2020.html" rel="alternate" type="text/html" title="A Report about Presto Conference Tokyo 2020 Online" />
      <published>2020-11-21T00:00:00+00:00</published>
      <updated>2020-11-21T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/11/21/a-report-about-presto-conference-tokyo-2020</id>
<content type="html" xml:base="https://trino.io/blog/2020/11/21/a-report-about-presto-conference-tokyo-2020.html">&lt;p&gt;On November 11th, 2020, the Japan Presto Community held the 2nd Presto Conference, 
welcoming Martin Traverso and Brian Olsen.
The conference was hosted on YouTube Live.
This article summarizes the conference and shares highlights from the talks.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;presto-community-updates&quot;&gt;Presto Community Updates&lt;/h1&gt;

&lt;p&gt;First, Martin introduced recent Presto updates, 
covering changes and enhancements achieved by the community.
Attendees also learned about several new features that will be available soon.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Update / Merge (&lt;a href=&quot;https://github.com/prestosql/presto/issues/3325&quot;&gt;issue #3325&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Materialized views (&lt;a href=&quot;https://github.com/prestosql/presto/pull/3283&quot;&gt;pull request #3283&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Dynamically resolved functions&lt;/li&gt;
  &lt;li&gt;Optimized Parquet reader&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In addition, during the Q&amp;amp;A, he suggested that new developers who want to contribute to PrestoSQL 
check the &lt;a href=&quot;https://github.com/prestosql/presto/labels/good%20first%20issue&quot;&gt;“good first issue”&lt;/a&gt; label on GitHub. 
These issues are a good first step for newcomers to contribute.&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/NxDBBEA67Ws&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h1 id=&quot;presto-community---how-to-get-involved&quot;&gt;Presto Community - How to get involved&lt;/h1&gt;

&lt;p&gt;To help attendees get involved, Martin provided a guide to navigating the Presto community. 
He shared his team’s principles for the community and talked about their education strategy for new Presto users.
The principles are worth quoting here:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;We are passionate about open source&lt;/li&gt;
  &lt;li&gt;We help others be successful with what we create&lt;/li&gt;
  &lt;li&gt;We create robust long-lasting software&lt;/li&gt;
  &lt;li&gt;We are egalitarian (nobody is more important than anyone else)&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;support-presto-as-a-feature-of-saas&quot;&gt;Support Presto as a feature of SaaS&lt;/h1&gt;

&lt;p&gt;Next, Satoru Kamikaseda, Technical Support Engineer at Treasure Data, provided an overview of how Treasure Data supports Presto in their service. 
Presto is heavily used to support many enterprise use cases on their customer data platform, 
and it is becoming the hub component for processing high-throughput workloads from many kinds of clients, such as Spark, ODBC, and JDBC.&lt;/p&gt;

&lt;p&gt;He described statistics about Presto support cases on their platform and how each kind is handled. 
In the stats, one third are investigations of job failures and query results, one third are requests for help with a client’s SQL, 
and the rest are notifications to clients and performance investigations. 
His talk is useful for any SaaS company that provides a query engine to its clients, showing how difficult it is to support a distributed query engine.&lt;/p&gt;

&lt;iframe src=&quot;//www.slideshare.net/slideshow/embed_code/key/GR6e3dfKKJ8w4c&quot; width=&quot;595&quot; height=&quot;485&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; &lt;/iframe&gt;
&lt;div style=&quot;margin-bottom:5px&quot;&gt; &lt;strong&gt; &lt;a href=&quot;//www.slideshare.net/SatoruKamikaseda/support-presto-as-a-feature-of-saas&quot; title=&quot;Support Presto as a feature of SaaS&quot; target=&quot;_blank&quot;&gt;Support Presto as a feature of SaaS&lt;/a&gt; &lt;/strong&gt; from &lt;strong&gt;&lt;a href=&quot;https://www.slideshare.net/SatoruKamikaseda&quot; target=&quot;_blank&quot;&gt;SatoruKamikaseda&lt;/a&gt;&lt;/strong&gt; &lt;/div&gt;

&lt;h1 id=&quot;how-to-use-presto-with-aws-efficiently&quot;&gt;How to use Presto with AWS efficiently&lt;/h1&gt;

&lt;p&gt;Noritaka Sekiyama, Sr. Big Data Architect at Amazon Web Services Japan, showed how to use Presto with AWS, 
including Presto on EMR, Presto on EC2, Presto via Athena, and AWS Glue.
He also shared a comparison of the Presto-on-AWS options (EC2, EMR, Athena). 
If you are new to Presto, his talk gives you insight into choosing your first Presto environment.&lt;/p&gt;

&lt;iframe src=&quot;//www.slideshare.net/slideshow/embed_code/key/kWzJ1XqR96A9di&quot; width=&quot;595&quot; height=&quot;485&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; &lt;/iframe&gt;
&lt;div style=&quot;margin-bottom:5px&quot;&gt; &lt;strong&gt; &lt;a href=&quot;//www.slideshare.net/ssuserca76a5/aws-presto&quot; title=&quot;AWS で Presto を徹底的に使いこなすワザ&quot; target=&quot;_blank&quot;&gt;AWS で Presto を徹底的に使いこなすワザ&lt;/a&gt; &lt;/strong&gt; from &lt;strong&gt;&lt;a href=&quot;https://www.slideshare.net/ssuserca76a5&quot; target=&quot;_blank&quot;&gt;Noritaka Sekiyama&lt;/a&gt;&lt;/strong&gt; &lt;/div&gt;

&lt;h1 id=&quot;presto--line-2020&quot;&gt;Presto @ LINE 2020&lt;/h1&gt;

&lt;p&gt;LINE provides the biggest mobile messaging app in Japan (think WhatsApp for Japan). Yuya Ebihara, one of the Presto maintainers, 
showed how they have improved Presto on their platform since they presented at &lt;a href=&quot;/blog/2019/07/11/report-for-presto-conference-tokyo.html&quot;&gt;the previous conference&lt;/a&gt;. 
Their Presto usage has grown significantly since 2019: the number of Presto workers went from 100 to 300, and daily queries increased from 20,000 to 50,000. 
We also learned how they upgraded Presto from 314 to 339 and resolved issues along the way.&lt;/p&gt;

&lt;iframe src=&quot;https://docs.google.com/presentation/d/e/2PACX-1vS2QdQjhLsiSuVdWlEmT23ixqoZXkRrKKMRGa1hrZHg65OpcH18RpzARotOMYvIBSwP57lPPAHkUQOx/embed&quot; frameborder=&quot;0&quot; width=&quot;595&quot; height=&quot;485&quot; allowfullscreen=&quot;true&quot; mozallowfullscreen=&quot;true&quot; webkitallowfullscreen=&quot;true&quot;&gt;&lt;/iframe&gt;

&lt;h1 id=&quot;dive-into-amazon-athena---serverless-presto-2020&quot;&gt;Dive into Amazon Athena - Serverless Presto, 2020&lt;/h1&gt;

&lt;p&gt;Makoto Kawamura, Solutions Architect at Amazon Web Services Japan, 
introduced the latest features of Amazon Athena along with performance tuning tips. It is helpful for developers on AWS who want to explore Amazon Athena.&lt;/p&gt;

&lt;div style=&quot;width: 90%&quot;&gt;&lt;script async=&quot;&quot; class=&quot;speakerdeck-embed&quot; data-id=&quot;92a399aad5344df197279cd4195d9464&quot; data-ratio=&quot;1.77777777777778&quot; src=&quot;//speakerdeck.com/assets/embed.js&quot;&gt;&lt;/script&gt;&lt;/div&gt;

&lt;h1 id=&quot;presto-cassandra-connector-hack-at-repro&quot;&gt;Presto Cassandra Connector Hack at Repro&lt;/h1&gt;

&lt;p&gt;Repro provides a customer engagement platform that enables companies to personalize their communication strategies with the right message at the right time to drive better retention and lifetime value. 
They use Presto as the backend of the segmentation system in their service, building lists of audiences that match certain conditions.&lt;/p&gt;

&lt;p&gt;Takeshi Arabiki gave an in-depth presentation on modifications to the Presto Cassandra connector to stabilize Presto and improve its performance, 
in addition to covering how Repro uses Presto.
His talk covers a wide range of topics, from investigating the bottleneck to resolving it.&lt;/p&gt;

&lt;script async=&quot;&quot; class=&quot;speakerdeck-embed&quot; data-id=&quot;9289d942805a4bf2be908cf42a122a29&quot; data-ratio=&quot;1.77777777777778&quot; src=&quot;//speakerdeck.com/assets/embed.js&quot;&gt;&lt;/script&gt;

&lt;h1 id=&quot;testing-distributed-query-engine-as-a-service&quot;&gt;Testing Distributed Query Engine as a Service&lt;/h1&gt;

&lt;p&gt;Finally, Naoki Takezoe from Treasure Data talked about their challenges with Presto upgrades and 
how hard it is to migrate a variety of workloads while maintaining performance stability. 
In an actual production-scale environment running multiple clients, testing is one of the big challenges. 
He showed how they simulate client workloads with a query simulator they developed, to cover various corner cases and verify data correctness.&lt;/p&gt;

&lt;iframe src=&quot;//www.slideshare.net/slideshow/embed_code/key/yCrep8qbYUzNzh&quot; width=&quot;595&quot; height=&quot;485&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; &lt;/iframe&gt;
&lt;div style=&quot;margin-bottom:5px&quot;&gt; &lt;strong&gt; &lt;a href=&quot;//www.slideshare.net/takezoe/testing-distributed-query-engine-as-a-service&quot; title=&quot;Testing Distributed Query Engine as a Service&quot; target=&quot;_blank&quot;&gt;Testing Distributed Query Engine as a Service&lt;/a&gt; &lt;/strong&gt; from &lt;strong&gt;&lt;a href=&quot;https://www.slideshare.net/takezoe&quot; target=&quot;_blank&quot;&gt;takezoe&lt;/a&gt;&lt;/strong&gt; &lt;/div&gt;

&lt;h1 id=&quot;wrap-up&quot;&gt;Wrap Up&lt;/h1&gt;

&lt;p&gt;This conference was the first online Presto conference in Tokyo. 
Unfortunately, we didn’t have a chance to talk with the community developers and creators face-to-face. We hope we’ll get such a great opportunity in the near future.
Even so, it was a great time, with many presentations from community members and a lot of new things to learn from their wonderful experience.
During the conference, the average number of YouTube Live viewers was over 100, 
and the total number of attendees was around 180. 
The previous conference had 89 attendees, so the number of Presto developers and users in Japan seems to be increasing gradually. 
We really appreciate the developers and creators in the community. Thank you so much for coming to the conference, and see you next time!&lt;/p&gt;

&lt;h1 id=&quot;youtube-live-link&quot;&gt;YouTube Live link&lt;/h1&gt;

&lt;p&gt;The event was held mainly in Japanese.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://youtu.be/NxDBBEA67Ws&quot;&gt;Presto Conference Tokyo 2020 Online&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content>

      
        <author>
          <name>Toru Takahashi, Treasure Data</name>
        </author>
      

      <summary>On Nov 11th, 2020, Japan Presto Community held the 2nd Presto Conference welcoming Martin Traverso and Brian Olsen. The conference was hosted on YouTube Live. This article is a summary of the conference, aiming to share their great talks.</summary>

      
      
    </entry>
  
    <entry>
      <title>Announcing Presto Conference Tokyo 2020</title>
      <link href="https://trino.io/blog/2020/10/21/announcing-presto-conference-tokyo-2020.html" rel="alternate" type="text/html" title="Announcing Presto Conference Tokyo 2020" />
      <published>2020-10-21T00:00:00+00:00</published>
      <updated>2020-10-21T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/10/21/announcing-presto-conference-tokyo-2020</id>
      <content type="html" xml:base="https://trino.io/blog/2020/10/21/announcing-presto-conference-tokyo-2020.html">&lt;p&gt;Last year, &lt;a href=&quot;/blog/2019/07/11/report-for-presto-conference-tokyo.html&quot;&gt;Presto Conference Tokyo 2019&lt;/a&gt; 
was held in Japan with Martin Traverso, Dain Sundstrom and David Phillips, 
the founders of the Presto Software Foundation.&lt;/p&gt;

&lt;p&gt;This year, the event has moved to an online-only format. Presto Conference 
Tokyo 2020 is happening on the 20th of November. 
You can &lt;a href=&quot;https://techplay.jp/event/795265&quot;&gt;find out details and register right now&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;The event includes six sessions from Treasure Data, Amazon Web Services 
Japan, Repro and LINE, as well as open sessions with Martin and Brian Olsen, 
a Developer Advocate at Starburst Data.
This is a valuable opportunity to hear from engineers who are actually using 
Presto. It has something for those who are using Presto for data engineering
and those who don’t use Presto yet but are interested in it.&lt;/p&gt;

&lt;!--more--&gt;</content>

      
        <author>
          <name>Yuya Ebihara, LINE</name>
        </author>
      

      <summary>Last year, Presto Conference Tokyo 2019 was held in Japan with Martin Traverso, Dain Sundstrom and David Phillips, the founders of the Presto Software Foundation. This year, the event has moved to an online-only format. Presto Conference Tokyo 2020 is happening on the 20th of November. You can find out details and register right now! The event includes six sessions from Treasure Data, Amazon Web Services Japan, Repro and LINE, as well as open sessions with Martin and Brian Olsen, a Developer Advocate at Starburst Data. This is a valuable opportunity to hear from engineers who are actually using Presto. It has something for those who are using Presto for data engineering and those who don’t use Presto yet but are interested in it.</summary>

      
      
    </entry>
  
    <entry>
      <title>A gentle introduction to the Hive connector</title>
      <link href="https://trino.io/blog/2020/10/20/intro-to-hive-connector.html" rel="alternate" type="text/html" title="A gentle introduction to the Hive connector" />
      <published>2020-10-20T00:00:00+00:00</published>
      <updated>2020-10-20T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/10/20/intro-to-hive-connector</id>
      <content type="html" xml:base="https://trino.io/blog/2020/10/20/intro-to-hive-connector.html">&lt;p&gt;TL;DR: The Hive connector is what you use in Trino for reading data from object
storage that is organized according to the rules laid out by Hive, without using
the Hive runtime code.&lt;/p&gt;

&lt;p&gt;One of the most confusing aspects when starting Trino is the Hive connector. 
Typically, you seek out the use of Trino when you experience an intensely slow
query turnaround from your existing Hadoop, Spark, or Hive infrastructure. In
fact, the genesis of Trino, formerly known as Presto, came about due to these 
slow Hive query conditions at Facebook back in 2012.&lt;/p&gt;

&lt;p&gt;So when you learn that Trino has a Hive connector,
it can be rather confusing since you moved to Trino to circumvent the slowness
of your current Hive cluster. Another common source of confusion is when you
want to query your data from your cloud object storage, such as AWS S3, MinIO, 
and Google Cloud Storage. This too uses the Hive connector. If that 
confuses you, don’t worry, you are not alone. This blog aims to explain this
commonly confusing nomenclature.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;hive-architecture&quot;&gt;Hive architecture&lt;/h1&gt;

&lt;p&gt;To understand the origins and inner workings of Trino’s Hive connector, you
first need to know a few high level components of the Hive architecture.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/intro-to-hive-connector/hive.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;You can simplify the Hive architecture to four components:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The runtime&lt;/em&gt; contains the logic of the query engine that translates the
SQL-like Hive Query Language (HQL) into MapReduce jobs that run over files stored 
in the filesystem.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The storage&lt;/em&gt; component is simply that: it stores files in various formats, along
with index structures to recall these files. The file formats can be anything
from formats as simple as JSON and CSV to more complex columnar formats like ORC
and Parquet. Traditionally, Hive runs on top of the Hadoop Distributed
Filesystem (HDFS). As cloud-based options became more prevalent, object storage
like Amazon S3, Azure Blob Storage, Google Cloud Storage, and others needed
to be leveraged as well and replaced HDFS as the storage component.&lt;/p&gt;

&lt;p&gt;In order for Hive to process these files, it must have a mapping
from SQL tables in &lt;em&gt;the runtime&lt;/em&gt; to files and directories in &lt;em&gt;the storage&lt;/em&gt;
component. To accomplish this, Hive uses the Hive Metastore Service (HMS), 
often shortened to &lt;em&gt;the metastore&lt;/em&gt;, to manage metadata about the files, such
as table columns, file locations, and file formats.&lt;/p&gt;

&lt;p&gt;The last component, not included in the image, is Hive’s &lt;em&gt;data organization
specification&lt;/em&gt;. This specification is documented only in the Hive code itself, 
and it has been reverse engineered by other systems like Trino to remain 
compatible with Hive.&lt;/p&gt;
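
&lt;p&gt;To make this concrete, a partitioned Hive table typically maps to a
directory tree like the following sketch (the table, partition column, and
paths here are hypothetical, and naming details vary between Hive versions):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/user/hive/warehouse/sales/    # table location recorded in the metastore
  ds=2020-10-19/               # one directory per partition value
    000000_0                   # data files: ORC, Parquet, CSV, ...
    000001_0
  ds=2020-10-20/
    000000_0
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;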

&lt;p&gt;Trino reuses all of these components except for &lt;em&gt;the runtime&lt;/em&gt;. This is the same
approach most compute engines, such as Spark, Drill, and Impala, take when
dealing with data in object stores. When you think of the Hive
connector, you should think about a connector that is capable of reading data
organized by the unwritten Hive specification.&lt;/p&gt;

&lt;h3 id=&quot;trino-runtime-replaces-hive-runtime&quot;&gt;Trino runtime replaces Hive runtime&lt;/h3&gt;

&lt;p&gt;In the early days of big data systems, many expected query turnaround to take a 
long time due to the high volume of unstructured data in ETL workloads. The
primary goal in early iterations of these systems was simply throughput over
large volumes of data while maintaining fault-tolerance. Now, more businesses
want to run fast interactive queries over their big data instead of running jobs
that take hours and produce possibly undesirable results. Many companies have
petabytes of data and metadata in their data warehouse. Data in storage is
cumbersome to move and the data in the metastore takes a long time to repopulate
in other formats. Since only the runtime that executed Hive queries needs
replacement, the Trino engine utilizes the existing metastore metadata and
files residing in storage, and the Trino runtime effectively replaces the
Hive runtime responsible for analyzing the data.&lt;/p&gt;

&lt;h1 id=&quot;trino-architecture&quot;&gt;Trino Architecture&lt;/h1&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/intro-to-hive-connector/trino.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;the-hive-connector-nomenclature&quot;&gt;The Hive connector nomenclature&lt;/h3&gt;

&lt;p&gt;Notice that the only change in the Trino architecture is &lt;em&gt;the runtime&lt;/em&gt;. The
HMS still exists along with &lt;em&gt;the storage&lt;/em&gt;. This is not by accident. This design
addresses a common problem faced by many companies: it simplifies the
migration from using Hive to using Trino. Regardless of &lt;em&gt;the storage&lt;/em&gt; component
used, &lt;em&gt;the runtime&lt;/em&gt; makes use of the HMS, and that is why this connector is
called the Hive connector.&lt;/p&gt;

&lt;p&gt;The confusion tends to arise when you search for a connector
from the context of the storage system you want to query. You may not even be 
aware that &lt;em&gt;the metastore&lt;/em&gt; is a necessity or even exists. Typically, you look for an
S3 connector, a GCS connector, or a MinIO connector. All you need is the Hive 
connector and the HMS to manage the metadata of the objects in your storage.&lt;/p&gt;
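
&lt;p&gt;As a sketch, a catalog file for querying MinIO through the Hive connector
might look like the following (the file name, metastore host, endpoint, and
credentials are placeholders for your environment, and property names can
differ between versions):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# etc/catalog/minio.properties
connector.name=hive-hadoop2
hive.metastore.uri=thrift://hive-metastore:9083
hive.s3.endpoint=http://minio:9000
hive.s3.aws-access-key=minio
hive.s3.aws-secret-key=minio123
hive.s3.path-style-access=true
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;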

&lt;h3 id=&quot;the-hive-metastore-service&quot;&gt;The Hive Metastore Service&lt;/h3&gt;

&lt;p&gt;The HMS is the only Hive process used in the entire Trino ecosystem when using
the Hive connector. The HMS is actually a simple service with a binary API using
&lt;a href=&quot;https://thrift.apache.org/&quot;&gt;the Thrift protocol&lt;/a&gt;. This service makes updates to
the metadata, stored in an RDBMS such as PostgreSQL, MySQL, or MariaDB. There
are compatible replacements for the HMS, such as AWS Glue, which acts as a
drop-in substitute.&lt;/p&gt;

&lt;h3 id=&quot;getting-started-with-the-hive-connector-on-trino&quot;&gt;Getting started with the Hive Connector on Trino&lt;/h3&gt;

&lt;p&gt;To drive this point home, I created a tutorial that showcases using Trino and
looking at the metadata it produces. In the following scenario, the docker 
environment contains four docker containers:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino&lt;/code&gt; - &lt;em&gt;the runtime&lt;/em&gt; in this scenario that replaces Hive.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;minio&lt;/code&gt; - &lt;em&gt;the storage&lt;/em&gt; in this scenario, an open-source object storage server.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive-metastore&lt;/code&gt; -  &lt;em&gt;the metastore&lt;/em&gt; service instance.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mariadb&lt;/code&gt; - the database that &lt;em&gt;the metastore&lt;/em&gt; uses to store the metadata.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can play around with the system and optionally view the configurations. The
scenario asks you to run a query to populate data in MinIO and then see the
resulting metadata populated in MariaDB by the HMS. The next step asks you to
run queries over the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mariadb&lt;/code&gt; database which holds the generated
metadata from &lt;em&gt;the metastore&lt;/em&gt;.&lt;/p&gt;
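
&lt;p&gt;As an illustration of those two steps, you might run something like the
following (the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;minio&lt;/code&gt; catalog and schema names are placeholders, and the
metastore’s internal tables, such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TBLS&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SDS&lt;/code&gt;, vary slightly between Hive versions):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- In Trino: create a schema and a table backed by MinIO
CREATE SCHEMA minio.tiny WITH (location = 's3a://tiny/');
CREATE TABLE minio.tiny.customer AS SELECT * FROM tpch.tiny.customer;

-- In MariaDB: inspect the metadata the HMS recorded
SELECT t.TBL_NAME, s.LOCATION
FROM TBLS t JOIN SDS s ON t.SD_ID = s.SD_ID;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;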

&lt;p&gt;If you have any questions or run into any issues with the example, you can find
us on &lt;a href=&quot;/slack.html&quot;&gt;slack&lt;/a&gt; on the #dev or #general channels.&lt;/p&gt;

&lt;p&gt;Have fun!&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/bitsondatadev/trino-getting-started/tree/main/hive/trino-minio&quot; target=&quot;_blank&quot;&gt;
&lt;img src=&quot;/assets/blog/intro-to-hive-connector/intro-to-hive.jpeg&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>TL;DR: The Hive connector is what you use in Trino for reading data from object storage that is organized according to the rules laid out by Hive, without using the Hive runtime code. One of the most confusing aspects when starting Trino is the Hive connector. Typically, you seek out the use of Trino when you experience an intensely slow query turnaround from your existing Hadoop, Spark, or Hive infrastructure. In fact, the genesis of Trino, formerly known as Presto, came about due to these slow Hive query conditions at Facebook back in 2012. So when you learn that Trino has a Hive connector, it can be rather confusing since you moved to Trino to circumvent the slowness of your current Hive cluster. Another common source of confusion is when you want to query your data from your cloud object storage, such as AWS S3, MinIO, and Google Cloud Storage. This too uses the Hive connector. If that confuses you, don’t worry, you are not alone. This blog aims to explain this commonly confusing nomenclature.</summary>

      
      
    </entry>
  
    <entry>
      <title>Launching Presto First Steps training</title>
      <link href="https://trino.io/blog/2020/10/07/presto-first-steps.html" rel="alternate" type="text/html" title="Launching Presto First Steps training" />
      <published>2020-10-07T00:00:00+00:00</published>
      <updated>2020-10-07T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/10/07/presto-first-steps</id>
      <content type="html" xml:base="https://trino.io/blog/2020/10/07/presto-first-steps.html">&lt;p&gt;Writing the book &lt;a href=&quot;/trino-the-definitive-guide.html&quot;&gt;Trino: The Definitive
Guide&lt;/a&gt; with Matt and Martin earlier this
year, and then publishing it with &lt;a href=&quot;https://www.oreilly.com/&quot;&gt;O’Reilly&lt;/a&gt; was a
great experience and has been a great success. Lots of readers took advantage of
getting a &lt;a href=&quot;/blog/2020/04/11/the-definitive-guide.html&quot;&gt;free digital copy of the book from Starburst&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Now it is time to follow up with a training class. I am pleased to let you know
that you can join me for three hours of
&lt;a href=&quot;https://learning.oreilly.com/live-training/courses/presto-first-steps/0636920462859/&quot;&gt;Presto First Steps&lt;/a&gt;
in November.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;The new course is aimed at beginners with Presto, who want to accelerate their
initial understanding and adoption. You ramp up quickly to install and configure
Presto, use the CLI, and learn how to query connected data sources with SQL. The
class is completely interactive, and I look forward to many of you joining me
and bringing lots of great questions.&lt;/p&gt;

&lt;p&gt;The class includes three interactive training exercises on
&lt;a href=&quot;https://katacoda.com/&quot;&gt;Katacoda&lt;/a&gt;. They allow you to get hands on experience
with Presto immediately. Lots of useful tips and tricks are covered in my
material, and of course I plan to run a bunch of additional demos. You can find
more details about the content of the class in &lt;a href=&quot;https://learning.oreilly.com/live-training/courses/presto-first-steps/0636920462859/&quot;&gt;the registration
page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Don’t miss out and make sure you &lt;a href=&quot;https://learning.oreilly.com/live-training/courses/presto-first-steps/0636920462859/&quot;&gt;reserve your ticket
now&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Writing the book Trino: The Definitive Guide with Matt and Martin earlier this year, and then publishing it with O’Reilly was a great experience and has been a great success. Lots of readers took advantage of getting a free digital copy of the book from Starburst. Now it is time to follow up with a training class. I am pleased to let you know that you can join me for three hours of Presto First Steps in November.</summary>

      
      
    </entry>
  
    <entry>
      <title>Hello I&apos;m Brian, Presto Developer Advocate</title>
      <link href="https://trino.io/blog/2020/10/01/intro-developer-advocate.html" rel="alternate" type="text/html" title="Hello I&apos;m Brian, Presto Developer Advocate" />
      <published>2020-10-01T00:00:00+00:00</published>
      <updated>2020-10-01T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/10/01/intro-developer-advocate</id>
      <content type="html" xml:base="https://trino.io/blog/2020/10/01/intro-developer-advocate.html">&lt;p&gt;Hello, Presto nation!&lt;/p&gt;

&lt;p&gt;My name is Brian, and I’m a new developer advocate working at Starburst. Let me 
give you a little background on how I got here, and cover how my role can help
the Presto community.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/developer-advocate/brian.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;My career in computation and databases started in the military. As luck would
have it, I worked on a big data team as my first job out of college! I was in a
Hive shop that dealt with the typical outdated runtime and slow query
turnaround. Eventually, our architect introduced us to Presto as an alternative.
I worked with him to start testing and moving our existing use cases built on
Hive to use Presto. We also used Elasticsearch and had a few cases that needed
to perform joins and unions over the datasets in both Elasticsearch and Hive.
There were a few use cases that were not going to immediately be transferable
without some modification to the Presto Elasticsearch connector.&lt;/p&gt;

&lt;h2 id=&quot;joining-the-presto-community&quot;&gt;Joining the Presto community&lt;/h2&gt;

&lt;p&gt;The first modification was &lt;a href=&quot;https://github.com/trinodb/trino/issues/2441&quot;&gt;adding support for Elasticsearch array 
types&lt;/a&gt;, and the second was, 
&lt;a href=&quot;https://github.com/trinodb/trino/issues/754&quot;&gt;support for nested types&lt;/a&gt;. My 
first interaction with the Presto community was incredible! As a serial
open-source attempter, I always wanted to get invested in an open-source
project. I had started pull requests in various projects. Sometimes I ran into 
unpleasant maintainers; in other cases, the rules were daunting or too confusing
to get started with. Or I created a pull request only to have it sit there with no
communication as to why it wasn’t accepted or even looked at. However, when I
first joined &lt;a href=&quot;/slack.html&quot;&gt;Slack&lt;/a&gt;, I searched to see if there was already a
discussion about array types in the history. I ran into &lt;a href=&quot;https://trinodb.slack.com/archives/CP1MUNEUX/p1570064139005900&quot;&gt;a discussion between 
Dain and Martin about this 
issue&lt;/a&gt;. I
conversed with Martin, who was incredibly polite and willing to take time to 
discuss how this should be implemented.&lt;/p&gt;

&lt;h2 id=&quot;contributing&quot;&gt;Contributing&lt;/h2&gt;

&lt;p&gt;When I actually pulled the code, I saw how well written and maintained it was
compared to many open-source projects I had seen in the past. I made a few
changes, wrote a test around my use case, and signed a CLA agreement. After a
couple of weeks, my pull request was merged and I had finally contributed to an
open-source project. After that interaction, and seeing the code, I wanted to do
more. I really saw something special with this community.&lt;/p&gt;

&lt;p&gt;While many Presto contributors are doing amazing work contributing code, I
noticed there were some holes in other areas of the community that needed to be
filled. I started answering questions on Slack, LinkedIn, and Twitter and I
planned out a Udemy course for Presto. The &lt;a href=&quot;https://youtu.be/RPaG0Gu2I6c&quot;&gt;initial 
video&lt;/a&gt; I piloted is about tuning the memory
configuration of Presto.&lt;/p&gt;

&lt;h2 id=&quot;becoming-a-developer-advocate&quot;&gt;Becoming a developer advocate&lt;/h2&gt;

&lt;p&gt;Around this time I got into contact with some folks at Starburst about joining 
them to work with the community and Presto full-time! As I joined, we hadn’t
figured out what my exact role was at Starburst. Eventually, we decided I would
best serve as a developer advocate. What I’ve come to find is that this role 
aims to do exactly what I set out to do before I joined. As a developer
advocate, I serve the community and act as a liaison between Starburst and the
Presto community. Up until this time, that responsibility has been unofficially
shared by many of the maintainers of Presto. I am here to simply take some of
that responsibility from them and focus all of my efforts on community growth
and health.&lt;/p&gt;

&lt;p&gt;The health of a community is difficult to define and is generally
subject to various signals that we can observe. These signals include an
increase in helpful interactions within the community, new members joining the
community, members who are actively engaging in the community, diversity of the
community, and more. If we start by focusing on making the community successful,
the success of the project will follow. Keeping the goal in mind that co-creator
David Phillips mentions:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;This is the type of project that we look at Postgres as the inspiration. 
Postgres started in the eighties, it became a SQL system in the nineties, and
it’s still in active use and active development today. We say we want Presto
to have the same kind of history. - David Phillips&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2 id=&quot;next-steps&quot;&gt;Next Steps&lt;/h2&gt;

&lt;p&gt;My first goal is to create a larger set of free learning materials that expand
upon my initial goals when planning for my Udemy course. I recently started a
show with Manfred Moser called the Presto Community Broadcast. The show landing 
page is &lt;a href=&quot;/broadcast.html&quot;&gt;here&lt;/a&gt; and contains all the information about the show
schedule and where to find new and old episodes. This helps as we can use any
relevant material we create on this show for future teaching or blogs. We want
these live sessions to be interactive, and look forward to your feedback to
understand if our efforts are actually helping, or if you have ideas to improve
the show. This show, along with blogs, documentation, and interactive tutorials, 
is how I initially intend to answer some common questions that come in
through our &lt;a href=&quot;/slack.html&quot;&gt;Slack&lt;/a&gt; and &lt;a href=&quot;https://stackoverflow.com/questions/tagged/presto&quot;&gt;Stack 
Overflow&lt;/a&gt; channels. Another
goal of adding these materials is to attract new members to the community. Not
all the material may be super relevant to the existing members of the community,
but this makes the community much more viable for newer members.&lt;/p&gt;

&lt;p&gt;Outside of providing new learning materials, your feedback helps us to
understand common problems and allows us to fix them. This feedback will aid us
in focusing on issues that are commonly voiced within the community but somehow
get lost in translation. This could mean improving the Presto code itself,
making the documentation better, or addressing common confusion, even if the
confusion comes from a force outside of the Presto community.&lt;/p&gt;

&lt;p&gt;For example, I recently &lt;a href=&quot;https://bitsondata.dev/what-is-benchmarketing-and-why-is-it-bad/&quot;&gt;wrote a 
blog&lt;/a&gt; about
some shady benchmarketing practices that were painting Presto in a bad light. 
The goal here was to make fun of the wildly bogus claims brought against Presto 
and the community. What better way to do that than to write a nerdy Justin
Bieber parody?&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/FSy8V-R0_Zw&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;While I have hopefully convinced you all of my mission here, I can’t accomplish
any of this in a vacuum. The whole point of my work starts and ends with all of
you. I look forward to speaking with and one day post COVID-19, meeting you all
at meetups and conferences. For now virtual meetups and the Presto Community
Broadcast are a great start. If you have ideas or want to reach out to introduce
yourself, you can find me on 
&lt;a href=&quot;/slack.html&quot;&gt;Slack&lt;/a&gt; or &lt;a href=&quot;https://twitter.com/bitsondatadev&quot;&gt;Twitter&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Thanks for reading this and being a part of this community. One last thing to
tell you about myself, I’m a sucker for cheesy sign-offs so…&lt;/p&gt;

&lt;p&gt;&lt;em&gt;For fast data at resto, Presto is the besto!&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>Hello, Presto nation! My name is Brian, and I’m a new developer advocate working at Starburst. Let me give you a little background on how I got here, and cover how my role can help the Presto community.</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto at Argentina Big Data Meetup 2020-09-23</title>
      <link href="https://trino.io/blog/2020/09/28/argentina-big-data-meetup.html" rel="alternate" type="text/html" title="Presto at Argentina Big Data Meetup 2020-09-23" />
      <published>2020-09-28T00:00:00+00:00</published>
      <updated>2020-09-28T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/09/28/argentina-big-data-meetup</id>
      <content type="html" xml:base="https://trino.io/blog/2020/09/28/argentina-big-data-meetup.html">&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/IkjNcW7cS2w&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;Martin made a guest appearance at the 
&lt;a href=&quot;https://www.meetup.com/Argentina-Big-Data-Meetup/&quot;&gt;Argentina Big Data Meetup&lt;/a&gt;
(online), where in the first hour he talks about Presto’s past, present, and
future. This includes the history from Facebook to Starburst, some context to
some early architectural decisions, as well as why Presto was open-sourced. 
Finally, Martin covers recent changes along with some upcoming changes on the
roadmap.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/blog/argentina-big-data-meetup/Presto%20-%20Big%20Data%20Meetup%20Argentina%202020-09-23.pdf&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The next hour is an interesting talk given by Federico Palladoro, covering his
company Jampp’s migration strategy from Presto on EMR to Docker, comparing Nomad
and Kubernetes.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/blog/argentina-big-data-meetup/Big%20Data%20Meetup_%20Presto%20on%20Docker.pdf&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These presentations are in Spanish.&lt;/p&gt;

&lt;!--more--&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>Martin made a guest appearance at the Argentina Big Data Meetup (online) where in the first hour Martin talks about Presto’s past, present, and future. This includes the history from Facebook to Starburst, some context to some early architectural decisions, as well as, why Presto was open-sourced. Finally, Martin covers recent changes along with some upcoming changes on the roadmap. Slides The next hour is an interesting talk given by Federico Palladoro covering his company, Jampp’s, migration strategy from EMR Presto to Docker using Nomad vs Kubernetes. Slides These presentations are in Spanish.</summary>

      
      
    </entry>
  
    <entry>
      <title>Read support for original files of Hive transactional tables in Presto</title>
      <link href="https://trino.io/blog/2020/09/23/hive-acid-original-files.html" rel="alternate" type="text/html" title="Read support for original files of Hive transactional tables in Presto" />
      <published>2020-09-23T00:00:00+00:00</published>
      <updated>2020-09-23T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/09/23/hive-acid-original-files</id>
      <content type="html" xml:base="https://trino.io/blog/2020/09/23/hive-acid-original-files.html">&lt;p&gt;In &lt;a href=&quot;https://trino.io/docs/current/release/release-331.html&quot;&gt;Presto 331&lt;/a&gt;,
read support for Hive transactional tables was introduced. It works well if a
user creates a new Hive transactional table and reads it from Presto. However,
if an existing table is converted to a Hive transactional table, Presto would
fail to read data from such a table because read support for original files was
missing. Original files are those files in a Hive transactional table that
existed before the table was converted into a Hive transactional table.
Until version 340, Presto expected all files in a Hive transactional table to be
in Hive ACID format. Users would have to perform a major compaction to convert
original files into ACID files (i.e. base files) in such tables. This is not
always possible as the original flat table (table in non-ACID format) could be
huge and converting all the existing data into ACID format can be very
expensive.&lt;/p&gt;

&lt;p&gt;This blog is an extension of the blog &lt;a href=&quot;/blog/2020/06/01/hive-acid.html&quot;&gt;Hive ACID and transactional tables’
support in Presto&lt;/a&gt;. It first describes
original files and then goes into details of read support for such files that
was added in Presto 340.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;what-are-the-original-files&quot;&gt;What are the original files?&lt;/h1&gt;

&lt;p&gt;Files present in non-transactional ORC tables have the standard ORC schema. When
a flat table is converted into a transactional table, existing files are not
converted into Hive ACID format. Such files in a transactional table, which are
not in Hive ACID format, are called original files. These files are named
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;000000_X&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;000000_X_copy_Y&lt;/code&gt;. These files don’t have ACID columns, and their
schema differs as follows:&lt;/p&gt;

&lt;p&gt;Table Schema&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;n_nationkey : int,
n_name : string,
n_regionkey : int,
n_comment : string
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Original File Schema&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;struct {
    n_nationkey : int,
    n_name : string,
    n_regionkey : int,
    n_comment : string
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Delta File Schema&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;struct {
    operation : int,
    originalTransaction : bigint,
    bucket : int,
    rowId : bigint,
    currentTransaction : bigint,
    row : struct {
        n_nationkey : int,
        n_name : string,
        n_regionkey : int,
        n_comment : string
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Before Presto 340, Presto failed any query that read from a Hive
transactional table containing original files.&lt;/p&gt;

&lt;h1 id=&quot;update-and-delete-support-on-original-files&quot;&gt;Update and delete support on original files&lt;/h1&gt;

&lt;p&gt;Hive supports updates and deletes on rows in original files by synthetically
generating ACID columns for those files. Presto follows the same mechanism of
generating ACID columns synthetically, as discussed below.&lt;/p&gt;

&lt;h2 id=&quot;acid-column-generation-on-original-files&quot;&gt;ACID column generation on original files&lt;/h2&gt;

&lt;p&gt;Files in Hive ACID format have five ACID columns, but only three of them,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;originalTransactionId&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bucketId&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rowId&lt;/code&gt;, are needed to uniquely identify a row. In
this section, we will see how these three columns are synthetically generated for
original files.&lt;/p&gt;

&lt;h3 id=&quot;original-transaction-id&quot;&gt;Original transaction ID&lt;/h3&gt;

&lt;p&gt;An original transaction ID is the write ID when a record is first created. For
original files, the original transaction ID is always 0.&lt;/p&gt;

&lt;h3 id=&quot;bucket-id&quot;&gt;Bucket ID&lt;/h3&gt;

&lt;p&gt;Bucket ID is retrieved from the original file name. For the original file
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0000ABC_DEF&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0000ABC_DEF_copy_G&lt;/code&gt;, the bucket ID will be &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ABC&lt;/code&gt;.&lt;/p&gt;
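The parsing described above can be sketched in a few lines. This is an illustrative snippet only, not Presto's actual parser (which relies on Hive's AcidUtils); the class and method names are hypothetical:

```java
// Illustrative sketch: derive the bucket ID from an original file name such as
// 000000_0 or 000123_0_copy_2. The digits before the first underscore, with
// leading zeros stripped, form the bucket ID.
public class BucketIdParser {
    static int bucketId(String fileName) {
        // Take everything before the first underscore; parseInt drops leading zeros.
        String prefix = fileName.substring(0, fileName.indexOf('_'));
        return Integer.parseInt(prefix);
    }

    public static void main(String[] args) {
        System.out.println(bucketId("000000_0"));        // 0
        System.out.println(bucketId("000123_0_copy_2")); // 123
    }
}
```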

&lt;h3 id=&quot;row-id&quot;&gt;Row ID&lt;/h3&gt;

&lt;p&gt;To calculate the row ID, first the total row count of all the original files
that come before the current one in lexicographical order is computed.
The global row ID is then the sum of that value and the local row ID within
the current original file.&lt;/p&gt;

&lt;p&gt;Here is an example of calculating the global row ID of the 3rd row of the original
file &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;000000_0_copy_2&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;000000_0            -&amp;gt; 	X1 Rows (returned by ORC footer field numberOfRows)

000000_0_copy_1     -&amp;gt; 	X2 Rows (returned by ORC footer field numberOfRows)

000000_0_copy_2     -&amp;gt;	[ Row 0 ]
                        [ Row 1 ]
                        [ Row 2 ]   &amp;lt;- Local Row ID (returned by filePosition in OrcRecordReader) = 2
                                       Global Row ID = (X1+X2+2)
                        [ Row 3 ]

000000_0_copy_3     -&amp;gt;  X4 Rows
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Since additional computation is required to generate row IDs
while reading original files, reads are slower than for ACID format files
in a transactional table.&lt;/p&gt;
&lt;/blockquote&gt;
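The row ID derivation above boils down to a prefix sum over lexicographically preceding files. The sketch below is illustrative only; the file names and row counts are made up, and this is not Presto's actual OriginalFilesUtils implementation:

```java
import java.util.Map;

// Sketch: global row ID = (sum of row counts of all original files that
// precede this file lexicographically) + local row ID within this file.
// In Presto, per-file row counts come from the ORC footer field numberOfRows.
public class OriginalFileRowId {
    static long globalRowId(Map<String, Long> rowCounts, String fileName, long localRowId) {
        long preceding = 0;
        for (Map.Entry<String, Long> e : rowCounts.entrySet()) {
            if (e.getKey().compareTo(fileName) < 0) { // strictly earlier files only
                preceding += e.getValue();
            }
        }
        return preceding + localRowId;
    }

    public static void main(String[] args) {
        Map<String, Long> rowCounts = Map.of(
                "000000_0", 100L,
                "000000_0_copy_1", 50L,
                "000000_0_copy_2", 4L);
        // 3rd row (local row ID 2) of 000000_0_copy_2 -> 100 + 50 + 2
        System.out.println(globalRowId(rowCounts, "000000_0_copy_2", 2)); // 152
    }
}
```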

&lt;p&gt;Once Presto has the three ACID columns for a row, it can check for an update or
delete on it. Delete deltas written by Hive for original files have row IDs
generated by the same strategy as discussed above. Hence, the same logic of
filtering out deleted rows as discussed in &lt;a href=&quot;/blog/2020/06/01/hive-acid.html&quot;&gt;Hive ACID and transactional tables’ support in Presto
&lt;/a&gt; works with original files too.&lt;/p&gt;

&lt;h1 id=&quot;changes-in-presto-to-support-reading-original-files&quot;&gt;Changes in Presto to support reading original files&lt;/h1&gt;

&lt;p&gt;Presto’s split generation logic and ORC reader were modified to add read support
for original files. The following changes were made at the coordinator and worker
level:&lt;/p&gt;

&lt;h2 id=&quot;split-generation&quot;&gt;Split generation&lt;/h2&gt;

&lt;p&gt;A new class named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AcidInfo&lt;/code&gt; stores the original files and delete delta files for a
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;HiveSplit&lt;/code&gt;. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BackgroundSplitLoader.loadPartitions&lt;/code&gt; is called in an executor to create splits for each partition. In addition
to the steps mentioned in the blog &lt;a href=&quot;/blog/2020/06/01/hive-acid.html&quot;&gt;Hive ACID and transactional tables’ support in
Presto&lt;/a&gt;, Presto does the following:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Original files and ACID subdirectories (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;base&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delta&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt;) are
discovered by listing the partition location using the Hive AcidUtils helper class.&lt;/li&gt;
  &lt;li&gt;A registry for delete deltas, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DeleteDeltaInfo&lt;/code&gt;, is created, containing the minimal
information from which workers can construct the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt; path.&lt;/li&gt;
  &lt;li&gt;A registry for original files, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OriginalFileInfo&lt;/code&gt;, is created, containing
information such as file name, size and bucket ID.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AcidInfo.Builder&lt;/code&gt; keeps a map
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AcidInfo.Builder.bucketIdToOriginalFileInfoMap&lt;/code&gt; from bucket ID to the list of
original files belonging to that bucket.&lt;/li&gt;
  &lt;li&gt;Hive splits are created for each original file and for the base and delta directories.
Each Hive split carries an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AcidInfo&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;For an original file split, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AcidInfo&lt;/code&gt; has:&lt;/p&gt;

    &lt;ol&gt;
      &lt;li&gt;&lt;strong&gt;Bucket ID:&lt;/strong&gt; Bucket ID of the original file.&lt;/li&gt;
      &lt;li&gt;&lt;strong&gt;OriginalFilesList:&lt;/strong&gt; List of all the original files belonging to the
 same bucket, calculated from
 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AcidInfo.Builder.bucketIdToOriginalFileInfoMap&lt;/code&gt;.&lt;/li&gt;
      &lt;li&gt;&lt;strong&gt;DeleteDeltaFilesList:&lt;/strong&gt; List of delete deltas.&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;For a base/delta file split, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AcidInfo&lt;/code&gt; has:&lt;/p&gt;

    &lt;ol&gt;
      &lt;li&gt;&lt;strong&gt;DeleteDeltaFilesList:&lt;/strong&gt; List of delete deltas.&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id=&quot;reading-hive-original-files-data-in-workers&quot;&gt;Reading Hive original files data in workers&lt;/h2&gt;

&lt;p&gt;Hive splits generated during the split generation phase make their way to worker
nodes, where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcPageSourceFactory&lt;/code&gt; is used to create a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PageSource&lt;/code&gt; for the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TableScan&lt;/code&gt; operator. In addition to the steps mentioned in the blog &lt;a href=&quot;/blog/2020/06/01/hive-acid.html&quot;&gt;Hive ACID and
transactional tables’ support in Presto&lt;/a&gt;, Presto does the following:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcDeletedRows&lt;/code&gt; is created for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt; locations, if any.&lt;/li&gt;
  &lt;li&gt;For an original file split, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcPageSourceFactory&lt;/code&gt; fetches &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;originalFilesList&lt;/code&gt;
from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AcidInfo&lt;/code&gt;, calculates &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;originalFileRowId&lt;/code&gt; by calling
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OriginalFilesUtils.getPrecedingRowCount&lt;/code&gt;, and sends this information to
 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcPageSource&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcPageSource&lt;/code&gt; returns the rows from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcRecordReader&lt;/code&gt; that are not present in
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcDeletedRows&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;h1 id=&quot;follow-up&quot;&gt;Follow up&lt;/h1&gt;

&lt;p&gt;For an original file split, the current implementation may take quadratic time
in the worst case to calculate the global row ID, since it reads row counts from
the footers of the original files. This could be optimized by keeping a
query-level cache on worker nodes, or by precomputing global row IDs in the
coordinator during split computation.&lt;/p&gt;

&lt;h1 id=&quot;acknowledgements&quot;&gt;Acknowledgements&lt;/h1&gt;

&lt;p&gt;I would like to express my gratitude to everyone who helped me throughout
the development of this feature. Thank you
&lt;a href=&quot;https://in.linkedin.com/in/shubham-tagra-267a5838&quot;&gt;Shubham Tagra&lt;/a&gt; for
brainstorming sessions and providing continuous guidance on Presto Hive ACID.
Thank you &lt;a href=&quot;https://www.linkedin.com/in/piotrfindeisen/&quot;&gt;Piotr Findeisen&lt;/a&gt; for
helping me further refine the code with insightful code reviews.&lt;/p&gt;</content>

      
        <author>
          <name>Harmandeep Singh, Qubole</name>
        </author>
      

      <summary>In Presto 331, read support for Hive transactional tables was introduced. It works well, if a user creates a new Hive transactional table and reads it from Presto. However, if an existing table is converted to a Hive transactional table, Presto would fail to read data from such a table because read support for original files was missing. Original files are those files in a Hive transactional table that existed before the table was converted into a Hive transactional table. Until version 340, Presto expected all files in a Hive transactional table to be in Hive ACID format. Users would have to perform a major compaction to convert original files into ACID files (i.e. base files) in such tables. This is not always possible as the original flat table (table in non-ACID format) could be huge and converting all the existing data into ACID format can be very expensive. This blog is an extension of the blog Hive ACID and transactional tables’ support in Presto. It first describes original files and then goes into details of read support for such files that was added in Presto 340.</summary>

      
      
    </entry>
  
    <entry>
      <title>Configuring and Tuning Presto Performance with Dain</title>
      <link href="https://trino.io/blog/2020/08/27/training-performance.html" rel="alternate" type="text/html" title="Configuring and Tuning Presto Performance with Dain" />
      <published>2020-08-27T00:00:00+00:00</published>
      <updated>2020-08-27T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/08/27/training-performance</id>
      <content type="html" xml:base="https://trino.io/blog/2020/08/27/training-performance.html">&lt;p&gt;With the help of &lt;a href=&quot;/blog/2020/07/15/training-advanced-sql.html&quot;&gt;David’s training about advanced SQL&lt;/a&gt;, you composed a number of useful queries.
You gained valuable insights from the resulting data. However, these complex
queries take time to run. If only you could make them run faster. I think we
have just what you need:&lt;/p&gt;

&lt;p&gt;Join us for a free webinar &lt;strong&gt;Understanding and Tuning Presto Query Processing&lt;/strong&gt;
with Dain Sundstrom.&lt;/p&gt;

&lt;p&gt;Update:&lt;/p&gt;

&lt;p&gt;We did it again! Joined by over 120 eager students, we discussed all sorts of
aspects of sizing and tuning your Presto cluster. Yet again we received so many
questions that we went over our planned time budget. The material covered is
crucial to run a Presto deployment successfully in production, so make sure you
check out the recording and the slide deck:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.starburst.io/wp-content/uploads/2020/09/Presto-Training-Series-Configuring-Tuning-Presto-Performance.pdf&quot;&gt;Download the slides&lt;/a&gt;&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/Pu80FkBRP-k&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;!--more--&gt;

&lt;p&gt;In our new &lt;a href=&quot;https://bit.ly/2NO26Cm&quot;&gt;Presto Training Series&lt;/a&gt; we give Presto users
an opportunity to learn advanced skills from the co-creators of Presto –
&lt;a href=&quot;https://github.com/electrum&quot;&gt;David Phillips&lt;/a&gt;, 
&lt;a href=&quot;https://github.com/martint&quot;&gt;Martin Traverso&lt;/a&gt; and 
&lt;a href=&quot;https://github.com/dain&quot;&gt;Dain Sundstrom&lt;/a&gt;. Beyond the basics, each of the four 
training sessions covers critical topics for scaling Presto to more users and
use cases.&lt;/p&gt;

&lt;p&gt;This training session is geared towards helping users tune and size their Presto
deployment for optimal performance. Delivered by Dain Sundstrom,  this session
covers the following topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Cluster configuration and node sizing&lt;/li&gt;
  &lt;li&gt;Memory configuration and management&lt;/li&gt;
  &lt;li&gt;Improving task concurrency and worker scheduling&lt;/li&gt;
  &lt;li&gt;Tuning your JVM configuration&lt;/li&gt;
  &lt;li&gt;Investigating queries for join order and other criteria&lt;/li&gt;
  &lt;li&gt;Tuning the cost-based optimizer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Date: Wednesday, 9 September 2020&lt;/p&gt;

&lt;p&gt;Time: 10am PDT (San Francisco), 1pm EDT (New York), 6pm BST (London), 5pm UTC&lt;/p&gt;

&lt;p&gt;Duration: 2h&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h2 id=&quot;register-now&quot;&gt;&lt;a href=&quot;https://bit.ly/38kt5ih&quot;&gt;Register now!&lt;/a&gt;&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;p&gt;We look forward to many Presto users joining us.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>With the help of David’s training about advanced SQL, you composed a number of useful queries. You gained valuable insights from the resulting data. However these complex queries take time to run. If only you could make them run faster. I think we have just what you need: Join us for a free webinar Understanding and Tuning Presto Query Processing with Dain Sundstrom. Update: We did it again! Joined by over 120 eager students we discussed all sorts of aspects of sizing and tuning your Presto cluster. Yet again we received so many questions that we went over our planned time budget. The material covered is crucial to run a Presto deployment successfully in production, so make sure you check out the recording and the slide deck: Download the slides</summary>

      
      
    </entry>
  
    <entry>
      <title>Faster Queries on Nested Data</title>
      <link href="https://trino.io/blog/2020/08/14/dereference-pushdown.html" rel="alternate" type="text/html" title="Faster Queries on Nested Data" />
      <published>2020-08-14T00:00:00+00:00</published>
      <updated>2020-08-14T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/08/14/dereference-pushdown</id>
      <content type="html" xml:base="https://trino.io/blog/2020/08/14/dereference-pushdown.html">&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-334.html&quot;&gt;Presto 334&lt;/a&gt;
adds significant performance improvements for queries
accessing nested fields inside struct columns. These queries are optimized through
the pushdown of dereference expressions: query execution eagerly prunes
structural data, extracting only the necessary fields.&lt;/p&gt;

&lt;h1 id=&quot;motivation&quot;&gt;Motivation&lt;/h1&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RowType&lt;/code&gt; is a built-in data type of Presto, storing the in-memory
representation of commonly used nested data types of the connectors, e.g. the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;STRUCT&lt;/code&gt; type in Hive. Datasets often contain wide and deeply nested structural
columns, e.g. a struct column having hundreds of fields, with the fields being
nested themselves.&lt;/p&gt;

&lt;p&gt;Although such &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RowType&lt;/code&gt; columns can contain plenty of data, most
analytical queries access just a few of their fields. Without dereference
pushdown, Presto scans the whole column and shuffles all that data around
before projecting the necessary fields. This suboptimal execution causes higher
CPU usage, higher memory usage, and higher query latencies than necessary. The
unnecessary operations get even more expensive with wider/deeper structs and
more complex query plans.&lt;/p&gt;

&lt;p&gt;LinkedIn’s data ecosystem makes heavy usage of nested columns. It is common to
have 2-3 levels of nesting, and up to 50 fields in most of our tracking tables.
Because of the query execution inefficiency for nested fields, ETL pipelines
were set up at LinkedIn to copy the nested columns as a set of top-level columns
 corresponding to subfields. This step added overhead in our ingestion process
and delayed data availability for analytics. It also caused ORC schemas to be
inconsistent with the rest of the infrastructure, making it harder to migrate
from existing flows on row-oriented formats.&lt;/p&gt;

&lt;p&gt;Similarly, Lyft’s schemas make heavy use of nested data to decompose a ride
into its routes, riders, segments, modes, and geo-coordinates. Prior to the
performance improvements, analytical queries would either need to be run on
clusters with very long timeouts, or the data would have to be flattened before
being analyzed, adding an extra ETL step. Not only would this be costly, it
would also cause the original schema to diverge in our data warehouse making it
more difficult for data scientists to understand.&lt;/p&gt;

&lt;p&gt;The dereference pushdown optimization in Presto is having a massive impact on
the ingestion story at both LinkedIn and Lyft. Nested data is now being made
available faster for consumption with a consistency of structure across all
stores, while maintaining performance parity for analytical queries.&lt;/p&gt;

&lt;h1 id=&quot;example&quot;&gt;Example&lt;/h1&gt;

&lt;p&gt;Say we have a Hive table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jobs&lt;/code&gt;, with a struct-typed column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;job_info&lt;/code&gt; in the
schema. The column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;job_info&lt;/code&gt; is wide and deeply nested, e.g. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROW(company
varchar, requirements ROW(skills array(...), education ROW(...), salary ...) ,
...)&lt;/code&gt;. Most queries access a small percentage of data from this struct
using the dereference projection (the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.&lt;/code&gt; operator). Consider the query &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Q&lt;/code&gt;
below.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;A&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;appid&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;J&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;job_info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;company&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;applications&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;A&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;jobs&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;J&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;A&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;jobid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;J&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;jobid&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It should suffice to scan only the single field &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;company&lt;/code&gt; from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;J.job_info&lt;/code&gt; to
execute this query. But without dereference pushdown, Presto scans and
shuffles everything from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;job_info&lt;/code&gt;, only to project a single field at the end.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/dereference-pushdown/original_plan.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h1 id=&quot;solution-pushdown-of-dereference-expressions&quot;&gt;Solution: Pushdown of Dereference Expressions&lt;/h1&gt;

&lt;p&gt;With dereference pushdown, Presto optimizes queries by extracting the required
 fields from a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROW&lt;/code&gt; as early as possible. This is achieved by modifying the
query plan through a set of optimizers, and can be broadly divided into two
parts.&lt;/p&gt;

&lt;p&gt;First, dereference projections are extracted in the query plan and pushed as
close to the table scan as possible. This happens independently of the
connector. Second, there is a further improvement for Hive tables: the
Hive connector and ORC/Parquet readers have been optimized to scan only the
required subfield columns.&lt;/p&gt;

&lt;p&gt;Pushdown of predicates on the subfields is also a crucial optimization. For
example, if a query has filters on subfields (e.g. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a.b &amp;gt; 5&lt;/code&gt;), they should be
utilized by the ORC/Parquet readers while scanning files. The pushdown helps with
the pruning of files, stripes, and row groups based on column-level statistics.
This optimization is achieved as a byproduct of the above two optimizations.&lt;/p&gt;

&lt;p&gt;With the dereference pushdown, queries observe significant performance gains in
terms of CPU/memory usage and query runtime, roughly proportional to the
relative size of nested columns compared to the accessed fields.&lt;/p&gt;

&lt;h2 id=&quot;pushdown-in-query-plan&quot;&gt;Pushdown in Query Plan&lt;/h2&gt;

&lt;p&gt;The goal here is to execute dereference projections as early as possible. This
usually means performing them right after the table scans.&lt;/p&gt;

&lt;p&gt;A projection operation that performs dereferencing on input symbols (i.e.
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;job_info.company&lt;/code&gt;) reduces the amount of data going up the plan tree. Pushing
dereference projections down means that we are pruning data early. It reduces
the amount of data being processed and shuffled in query execution. For the
example query &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Q&lt;/code&gt;, the query plan looks like the following when dereference
pushdown is enabled.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/dereference-pushdown/transformed_plan.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The projection &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;job_info.company&lt;/code&gt; now directly follows the scan of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jobs&lt;/code&gt; table,
 avoiding the propagation of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;job_info&lt;/code&gt; through the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Limit&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Join&lt;/code&gt; nodes. Note
that all of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;job_info&lt;/code&gt; is still being scanned, and pruning it in the reader
requires connector-dependent optimizations.&lt;/p&gt;

&lt;h2 id=&quot;pushdown-in-the-hive-connector&quot;&gt;Pushdown in the Hive Connector&lt;/h2&gt;

&lt;p&gt;In columnar formats like ORC and Parquet, the data is laid out in a columnar
fashion even for subfields. If we have a column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;STRUCT(f1, f2, f3)&lt;/code&gt;, the
subfields &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;f1&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;f2&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;f3&lt;/code&gt; are stored as independent columns. An optimized
query engine should only scan the required fields through its ORC reader,
skipping the rest. This optimization has been added for the Hive connector.&lt;/p&gt;

&lt;p&gt;Dereference projections above a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TableScanNode&lt;/code&gt; are pushed down in the Hive
connector as “virtual” (or “projected”) columns. The query plan is modified to
refer to these new columns. For the query &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Q&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jobs&lt;/code&gt; table would be scanned
differently with this optimization, as shown below. The projection is now
embedded in the Hive connector. Here, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;job_info#company&lt;/code&gt; can be thought of as
a virtual column representing the subfield &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;job_info.company&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/dereference-pushdown/connector_pushdown.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The Hive connector handles the projections before returning columns to Presto’s
engine. It provides the required virtual columns to format-specific readers.
ORC and Parquet readers optimize their scans based on the subfields required,
increasing their read throughput. Subfield pruning is not possible for
row-oriented format readers (e.g. Avro). For these, the Hive connector performs an
adaptation to project the required fields.&lt;/p&gt;
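As a rough sketch of that adaptation step for row-oriented formats, the whole struct is materialized first and the required subfield is projected afterwards. The Map-based row representation and method name below are hypothetical, chosen only to illustrate the dereference; they are not the connector's actual types:

```java
import java.util.Map;

// Sketch: after a row-oriented reader returns the full struct, walk the
// dereference path (e.g. job_info.company) to project the required subfield.
public class SubfieldProjection {
    @SuppressWarnings("unchecked")
    static Object dereference(Map<String, Object> row, String... path) {
        Object current = row;
        for (String field : path) {
            // Each step descends one level into the nested struct.
            current = ((Map<String, Object>) current).get(field);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> jobInfo = Map.of("company", "Acme", "requirements", Map.of("salary", 100));
        Map<String, Object> row = Map.of("job_info", jobInfo);
        System.out.println(dereference(row, "job_info", "company")); // Acme
    }
}
```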

&lt;h2 id=&quot;pushdown-of-predicates-on-subfields&quot;&gt;Pushdown of Predicates on Subfields&lt;/h2&gt;

&lt;p&gt;Columnar formats store per-column statistics in the data files, which can be
used by the readers for filtering. For example, if a query contains the filter &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y = 5&lt;/code&gt; for a
top-level column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt;, Presto’s ORC reader can skip ORC stripes and files by
looking at the upper and lower bounds for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt; in the statistics.&lt;/p&gt;
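For an equality predicate, this statistics-based pruning reduces to a range check against the recorded bounds. The following is a minimal sketch with a simplified signature, not Presto's actual reader API:

```java
// Sketch: a stripe (or file) whose [min, max] statistics for a column cannot
// contain the predicate value can be skipped without reading its data.
public class StripePruning {
    // True if the equality predicate "column = value" can never match
    // within a stripe whose column statistics are [min, max].
    static boolean canSkipEquality(long min, long max, long value) {
        return value < min || value > max;
    }

    public static void main(String[] args) {
        System.out.println(canSkipEquality(10, 20, 5)); // true: 5 lies outside [10, 20]
        System.out.println(canSkipEquality(1, 8, 5));   // false: stripe may contain 5
    }
}
```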

&lt;p&gt;The same concept of predicate-based pruning can work for filters involving
subfields, since statistics are also stored for subfield columns. That is,
Presto’s ORC/Parquet reader should be able to filter based on a constraint like
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x.f1 = 5&lt;/code&gt; for more optimal scans. Good news! In the final optimized plan,
predicates on a subfield are pushed down to the Hive connector as a constraint
on the corresponding virtual column, and later used for optimizing the scan.
The complete logic is too involved to explain here, but it can be illustrated
through the following example.&lt;/p&gt;

&lt;p&gt;Given an initial plan with a predicate on a dereferenced field (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x.f1 = 5&lt;/code&gt;), a
chain of optimizers transforms it into a more optimal plan with reader-level
predicates. In the future, the same optimization will be added to the Parquet
reader.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/dereference-pushdown/predicate_pushdown.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;In the final plan, the Hive connector knows to scan the column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt; and the subfield
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x.f1&lt;/code&gt;. It also takes advantage of the “virtual” column constraint &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x#f1 = 5&lt;/code&gt;
for reader-level pruning.&lt;/p&gt;

&lt;h2 id=&quot;performance-improvement&quot;&gt;Performance Improvement&lt;/h2&gt;

&lt;p&gt;Dereference pushdown improves performance for queries accessing nested fields
in multiple ways. First, it increases the read throughput for table scans,
reducing CPU time. The pruning of fields during the scan also means less
data to process for all downstream operators and tasks, so the early
projections result in more efficient execution for any operations that
shuffle or copy data. Moreover, for ORC/Parquet, read performance
improves in the case of selective filters on subfields.&lt;/p&gt;

&lt;p&gt;Below are some experimental results on a production dataset at LinkedIn which
contains 3 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;STRUCT&lt;/code&gt; columns, each having ~20-30 small subfields. The
example queries used in the analysis access only a few subfields. The queries
are listed by their approximate query shape for the sake of brevity. The
plots compare CPU usage, peak memory usage, and average query wall time.&lt;/p&gt;

&lt;table&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;img src=&quot;/assets/blog/dereference-pushdown/cpu_perf.png&quot; alt=&quot;&quot; /&gt;&lt;/td&gt;
      &lt;td&gt;&lt;img src=&quot;/assets/blog/dereference-pushdown/memory_perf.png&quot; alt=&quot;&quot; /&gt;&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/dereference-pushdown/runtime_perf.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;CPU usage and peak memory usage show orders-of-magnitude improvement in the
presence of dereference pushdown. Query wall times also drop considerably,
and the improvement is more drastic for the relatively complex &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JOIN&lt;/code&gt; query,
as expected.&lt;/p&gt;

&lt;p&gt;Please note that these are not benchmarks! The performance improvement you’ll
see will vary depending on how many fields your nested data contains
versus how many you reference. At Lyft we saw improvements of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;50x&lt;/code&gt; for some
queries!&lt;/p&gt;

&lt;h2 id=&quot;future-work&quot;&gt;Future Work&lt;/h2&gt;

&lt;p&gt;The pushdown of dereference expressions can be extended to arrays: dereference
operations applied after unnesting an array should also get pushed
down to the readers. For example, using our jobs table from before, the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jobs.job_info&lt;/code&gt; structure may contain a repeated structure such as
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;required_skills&lt;/code&gt;. With the following query, the entire &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;required_skills&lt;/code&gt;
structure would be read even though only a small part of it is referenced.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;S&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;description&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;jobs&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;J&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CROSS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;UNNEST&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;job_info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;required_skills&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;S&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;S&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;years_of_experience&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The work for this improvement is being tracked in &lt;a href=&quot;https://github.com/trinodb/trino/issues/3925&quot;&gt;this issue&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Similar to the Hive connector, connector-level dereference pushdown can be extended
to other connectors that support nested types.&lt;/p&gt;

&lt;p&gt;Another future improvement will be the pushdown of predicates on subfields for
data stored in Parquet format. Although the pruning of nested fields occurs
with Parquet, the predicates are not yet pushed down into the reader.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Pushing down dereference operations in the query provides massive performance
gains, especially while operating on large structs. At LinkedIn and Lyft, this
feature has shown great impact for analytical queries on nested datasets.&lt;/p&gt;

&lt;p&gt;We’re excited for the Presto community to try it out. Feel free to dig into
&lt;a href=&quot;https://github.com/trinodb/trino/issues/1953&quot;&gt;this GitHub issue&lt;/a&gt; for
technical details. Please reach out to us on &lt;a href=&quot;/slack.html&quot;&gt;Slack&lt;/a&gt; for further
discussion or to report issues.&lt;/p&gt;</content>

      
        <author>
          <name>Pratham Desai (LinkedIn), James Taylor (Lyft)</name>
        </author>
      

      <summary>Presto 334 adds significant performance improvements for queries accessing nested fields inside struct columns. They have been optimized through the pushdown of dereference expressions. With this feature, the query execution prunes structural data eagerly, extracting the necessary fields.</summary>

      
      
    </entry>
  
    <entry>
      <title>Securing Presto with Dain</title>
      <link href="https://trino.io/blog/2020/08/13/training-security.html" rel="alternate" type="text/html" title="Securing Presto with Dain" />
      <published>2020-08-13T00:00:00+00:00</published>
      <updated>2020-08-13T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/08/13/training-security</id>
      <content type="html" xml:base="https://trino.io/blog/2020/08/13/training-security.html">&lt;p&gt;All the useful and fast-running queries you created with the knowledge from
&lt;a href=&quot;/blog/2020/07/15/training-advanced-sql.html&quot;&gt;David’s training about advanced SQL&lt;/a&gt; and &lt;a href=&quot;/blog/2020/07/30/training-query-tuning.html&quot;&gt;Martin’s training about query
tuning&lt;/a&gt; created a problem. You
now have lots of users on your Presto cluster who want to access all sorts of
different data sources and have different privileges, and corporate security has asked
about your plans. How about you tap into some help from Dain:&lt;/p&gt;

&lt;p&gt;Join us for a free webinar &lt;strong&gt;Securing Presto&lt;/strong&gt; with Dain Sundstrom.&lt;/p&gt;

&lt;p&gt;Update:&lt;/p&gt;

&lt;p&gt;What a great training session! Dain captivated the audience, and lots of questions
were covered beyond all the great material from the slides. Everything is now
available for your convenience:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.starburst.io/wp-content/uploads/2020/08/Presto-Training-Securing-Presto.pdf&quot;&gt;Download the slides&lt;/a&gt;&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/KiMyRc3PSh0&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;!--more--&gt;

&lt;p&gt;In our new &lt;a href=&quot;https://bit.ly/2NO26Cm&quot;&gt;Presto Training Series&lt;/a&gt; we give Presto users
an opportunity to learn advanced skills from the co-creators of Presto –
&lt;a href=&quot;https://github.com/electrum&quot;&gt;David Phillips&lt;/a&gt;, 
&lt;a href=&quot;https://github.com/martint&quot;&gt;Martin Traverso&lt;/a&gt; and 
&lt;a href=&quot;https://github.com/dain&quot;&gt;Dain Sundstrom&lt;/a&gt;. Beyond the basics, each of the four 
training sessions covers critical topics for scaling Presto to more users and
use cases.&lt;/p&gt;

&lt;p&gt;In this training session Dain teaches you how to securely deploy Presto at
scale. We cover how to secure Presto itself, access to Presto, and access to
your underlying data. This session covers the following topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Presto authentication, including password &amp;amp; LDAP Authentication&lt;/li&gt;
  &lt;li&gt;Authorization to access your data sources&lt;/li&gt;
  &lt;li&gt;Encryption including Presto client-to-coordinator communication&lt;/li&gt;
  &lt;li&gt;Secure communication in the cluster&lt;/li&gt;
  &lt;li&gt;Support for Kerberos&lt;/li&gt;
  &lt;li&gt;Secrets usage for configuration files including catalogs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Date: Wednesday, 26 August 2020&lt;/p&gt;

&lt;p&gt;Time: 10am PDT (San Francisco), 1pm EDT (New York), 6pm BST (London), 5pm UTC&lt;/p&gt;

&lt;p&gt;Duration: 2h&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h2 id=&quot;register-now&quot;&gt;&lt;a href=&quot;https://bit.ly/3ioQu7c&quot;&gt;Register now!&lt;/a&gt;&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;p&gt;We look forward to many Presto users joining us.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>All the useful and fast-running queries you created with the knowledge from David’s training about advanced SQL and Martin’s training about query tuning created a problem. You now have lots of users on your Presto cluster who want to access all sorts of different data sources and have different privileges, and corporate security has asked about your plans. How about you tap into some help from Dain: Join us for a free webinar Securing Presto with Dain Sundstrom. Update: What a great training session! Dain captivated the audience, and lots of questions were covered beyond all the great material from the slides. Everything is now available for your convenience: Download the slides</summary>

      
      
    </entry>
  
    <entry>
      <title>Happy Eighth Birthday Presto!</title>
      <link href="https://trino.io/blog/2020/08/08/presto-eighth-birthday.html" rel="alternate" type="text/html" title="Happy Eighth Birthday Presto!" />
      <published>2020-08-08T00:00:00+00:00</published>
      <updated>2020-08-08T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/08/08/presto-eighth-birthday</id>
      <content type="html" xml:base="https://trino.io/blog/2020/08/08/presto-eighth-birthday.html">&lt;p&gt;Today, Presto turned eight years old! As Presto co-creator
Dain Sundstrom points out, there’s a reason why the eighth birthday is a
little special:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://twitter.com/daindumb/status/1292296395219595264&quot; target=&quot;_blank&quot;&gt;
&lt;img src=&quot;/assets/blog/presto-eighth-birthday/dain-tweet.png&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Even though Presto is a relatively young project, countless consumers, 
developers, and business personnel have felt its impact. It’s pretty clear
that a lot has been going on with this project since its inception eight years
ago. Recently, the Presto project hit a stunning twenty thousand commits:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://twitter.com/mtraverso/status/1289036458670448641&quot; target=&quot;_blank&quot;&gt;
&lt;img src=&quot;/assets/blog/presto-eighth-birthday/martin-tweet.png&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It makes you ponder how Presto became so successful in such a short amount of
time. Should the credit be given to the four founders who brought Presto to
life? Perhaps the supporting companies that provided the conditions that
called for such innovation? Or was it the community built around Presto since
its inception that has enabled this radical success?&lt;/p&gt;

&lt;p&gt;In my mind, it’s a combination of these conditions, but with a special
emphasis on the latter. Without the founders’ dedication to designing Presto
for speed and extensibility, and their emphasis on a welcoming and
inclusive open-source community, we wouldn’t have seen Presto outside the
walls of Facebook. Without companies like Facebook, Teradata, Netflix, and
Treasure Data that acted as catalysts for this change, we wouldn’t have the initial
use cases that tested Presto’s scalable design and shined a light on Presto
to bring awareness to the masses. Finally, without the passionate community
of developers who took an interest in giving back their time and efforts, 
Presto wouldn’t be anywhere near as robust or flexible as it is today. Now 
Presto has reached an unprecedented level of maturity and helped many
developers, scientists, and analysts find the answers they were looking for. 
It speaks volumes about just how special the project really is.&lt;/p&gt;

&lt;p&gt;This community of developers is really special in that the barrier to entry
for developers new to OSS (open source software) is refreshingly low.
Speaking from personal experience as a serial OSS attempter, when
I joined I noticed everyone treating each other with respect, a
willingness to teach, and a deliberate openness to new ideas. I interfaced
with engineers working at Starburst, the founders of Presto, and many
passionate developers like myself who also knew a thing or two about the
project and were so helpful to me. This was unlike other experiences I had
in the past, where joining an open source community felt like an elite club that
only existing members had access to. To me, this inclusiveness is why the
Presto community is thriving.&lt;/p&gt;

&lt;p&gt;The Presto community is most vibrant in &lt;a href=&quot;/slack.html&quot;&gt;the Slack channel&lt;/a&gt;. Here users and
developers ask questions about installing and using Presto, discuss
bug fixes or design changes, or sometimes just share great experiences or
news related to Presto. The Slack channel has recently grown to 2300 users
with around 500 active users at any given time.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://twitter.com/prestosql/status/1278393800092643328&quot; target=&quot;_blank&quot;&gt;
&lt;img src=&quot;/assets/blog/presto-eighth-birthday/presto-tweet.png&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To celebrate Presto really means to celebrate this community, and while we
can’t thank every individual who has contributed, we want to thank just a
handful of you for your hard work. Thanks to these engineers for their
contributions to the Presto project!&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/ebyhr&quot;&gt;ebyhr&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/kasiafi&quot;&gt;kasiafi&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/Praveen2112&quot;&gt;Praveen2112&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/phd3&quot;&gt;phd3&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/lxynov&quot;&gt;lxynov&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/pettyjamesm&quot;&gt;pettyjamesm&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/Lewuathe&quot;&gt;Lewuathe&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/raunaqmorarka&quot;&gt;raunaqmorarka&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/elonazoulay&quot;&gt;elonazoulay&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/luohao&quot;&gt;luohao&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While linking you to a blog post may not be a satisfactory thank you, the
gratitude is perhaps best &lt;a href=&quot;https://groups.google.com/g/presto-users/c/647v2ckRyGA&quot;&gt;stated on the presto-users&lt;/a&gt; Google group by co-creator Martin Traverso:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;When Dain, David, Eric and I started the project that many
years ago, we had the goal to make it open source and build a community
around it. What we never imagined was how far it would go, how widely it
would be adopted across the entire world, and how many amazing people we
would meet and get a chance to work with along the way.&lt;/p&gt;

  &lt;p&gt;Congratulations to everyone who played a part in that journey. It’s been a
great ride so far. Here’s to another 8 years!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Thanks to everyone who has contributed to Presto, and congratulations to the
founders for starting such an amazing project. Together let’s make Presto the
most useful analytics tool yet!&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>Today, Presto turned eight years old! As Presto co-creator Dain Sundstrom points out, there’s a reason why the eighth birthday is a little special:</summary>

      
      
    </entry>
  
    <entry>
      <title>Understanding and Tuning Presto Query Processing with Martin</title>
      <link href="https://trino.io/blog/2020/07/30/training-query-tuning.html" rel="alternate" type="text/html" title="Understanding and Tuning Presto Query Processing with Martin" />
      <published>2020-07-30T00:00:00+00:00</published>
      <updated>2020-07-30T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/07/30/training-query-tuning</id>
      <content type="html" xml:base="https://trino.io/blog/2020/07/30/training-query-tuning.html">&lt;p&gt;With the help of &lt;a href=&quot;/blog/2020/07/15/training-advanced-sql.html&quot;&gt;David’s training about advanced SQL&lt;/a&gt; you composed a number of useful queries.
You gained valuable insights from the resulting data. However, these complex
queries take time to run. If only you could make them run faster. I think we
have just what you need coming up.&lt;/p&gt;

&lt;p&gt;Join us for a free webinar &lt;strong&gt;Understanding and Tuning Presto Query Processing&lt;/strong&gt;
with Martin Traverso.&lt;/p&gt;

&lt;p&gt;Update:&lt;/p&gt;

&lt;p&gt;We are delighted that such an advanced topic attracted close to 150 attendees.
Everyone learned a lot, and many additional questions came up during class and in
the Q&amp;amp;A overtime. Take advantage of the slides and recording to recap, or if
you could not attend:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.starburst.io/wp-content/uploads/2020/08/Presto-Training-Understanding-and-Tuning-Presto-Query-Processing.pdf&quot;&gt;Download the slides&lt;/a&gt;&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/GcS02yTNwC0&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;!--more--&gt;

&lt;p&gt;In our new &lt;a href=&quot;https://bit.ly/2NO26Cm&quot;&gt;Presto Training Series&lt;/a&gt; we give Presto users
an opportunity to learn advanced skills from the co-creators of Presto –
&lt;a href=&quot;https://github.com/electrum&quot;&gt;David Phillips&lt;/a&gt;, 
&lt;a href=&quot;https://github.com/martint&quot;&gt;Martin Traverso&lt;/a&gt; and 
&lt;a href=&quot;https://github.com/dain&quot;&gt;Dain Sundstrom&lt;/a&gt;. Beyond the basics, each of the four 
training sessions covers critical topics for scaling Presto to more users and
use cases.&lt;/p&gt;

&lt;p&gt;In this training session Martin helps you understand how Presto executes queries.
That knowledge can help you improve query performance. For example, the explain
plan is a powerful tool, but reading plans and making sense of them can be
overwhelming. We explore how to create an explain plan for your query and how to
read it. We look at the work the cost-based optimizer performs and how you can
potentially help Presto run your queries even faster. This session covers the
following topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Explain the EXPLAIN&lt;/li&gt;
  &lt;li&gt;Learn how queries are analyzed and executed&lt;/li&gt;
  &lt;li&gt;Understand what the optimizer does, including some of its limitations&lt;/li&gt;
  &lt;li&gt;Showcase the cost-based optimizer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Date: Wednesday, 12 August 2020&lt;/p&gt;

&lt;p&gt;Time: 10am PDT (San Francisco), 1pm EDT (New York), 6pm BST (London), 5pm UTC&lt;/p&gt;

&lt;p&gt;Duration: 2h&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h2 id=&quot;register-now&quot;&gt;&lt;a href=&quot;https://bit.ly/2VB9DZP&quot;&gt;Register now!&lt;/a&gt;&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;p&gt;We look forward to many Presto users joining us.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>With the help of David’s training about advanced SQL you composed a number of useful queries. You gained valuable insights from the resulting data. However, these complex queries take time to run. If only you could make them run faster. I think we have just what you need coming up. Join us for a free webinar Understanding and Tuning Presto Query Processing with Martin Traverso. Update: We are delighted that such an advanced topic attracted close to 150 attendees. Everyone learned a lot, and many additional questions came up during class and in the Q&amp;amp;A overtime. Take advantage of the slides and recording to recap, or if you could not attend: Download the slides</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto for Analytics at Pinterest</title>
      <link href="https://trino.io/blog/2020/07/22/presto-summit-pinterest.html" rel="alternate" type="text/html" title="Presto for Analytics at Pinterest" />
      <published>2020-07-22T00:00:00+00:00</published>
      <updated>2020-07-22T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/07/22/presto-summit-pinterest</id>
      <content type="html" xml:base="https://trino.io/blog/2020/07/22/presto-summit-pinterest.html">&lt;p&gt;After &lt;a href=&quot;/blog/2020/05/15/state-of-presto.html&quot;&gt;State of Presto&lt;/a&gt; and the two
real-world examples from &lt;a href=&quot;/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;Zuora&lt;/a&gt;
and &lt;a href=&quot;/blog/2020/07/06/presto-summit-arm-td.html&quot;&gt;Arm Treasure Data&lt;/a&gt;, I hope
you are ready to hear from a well-known brand using Presto in their analytics
ecosystem – &lt;a href=&quot;https://www.pinterest.com&quot;&gt;Pinterest&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Presto: A key component for analytics at Pinterest&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Update:&lt;/p&gt;

&lt;p&gt;Our webinar was well received and prompted a whole bunch of questions. Check out
the slides and video recording:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.starburst.io/wp-content/uploads/2020/08/Presto-Summit-Webinar-Series-Presto-at-Pinterest.pdf&quot;&gt;Download the slides&lt;/a&gt;&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/mZ59CTOPkl8&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;!--more--&gt;

&lt;p&gt;Join us to learn how Pinterest uses Presto to meet the company’s rapidly
increasing analytics needs, while keeping costs low.&lt;/p&gt;

&lt;p&gt;Presto plays an important role in Pinterest’s analytics ecosystem. Find out how
Pinterest runs Presto, how the company leverages warning systems to guide
users to write better queries, and how Pinterest scales up their clusters to
meet their rapidly growing and complex workloads.&lt;/p&gt;

&lt;p&gt;The following topics are discussed:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Presto integrated with Pinterest infrastructure&lt;/li&gt;
  &lt;li&gt;Setup of warning systems to guide users to write better queries&lt;/li&gt;
  &lt;li&gt;Management of complex workloads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Speakers:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/puchengy/&quot;&gt;Pucheng Yang&lt;/a&gt; is a software engineer
at Pinterest working on the Presto, SparkSQL and Hive query engines. He joined
the company two years ago as a new grad.&lt;/li&gt;
  &lt;li&gt;Yi He is a software engineer at Pinterest. Prior to Pinterest, he worked at
Facebook on Presto OLAP and query federation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Date: Wednesday, 19 August 2020&lt;/p&gt;

&lt;p&gt;Time: 10am PDT (San Francisco), 1pm EDT (New York), 6pm BST (London), 5pm UTC&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h2 id=&quot;register-now&quot;&gt;&lt;a href=&quot;https://bit.ly/32FfRfm&quot;&gt;Register now!&lt;/a&gt;&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;p&gt;We look forward to many Presto users joining us and participating in the webinar
with their questions.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>After State of Presto and the two real-world examples from Zuora and Arm Treasure Data, I hope you are ready to hear from a well-known brand using Presto in their analytics ecosystem – Pinterest: Presto: A key component for analytics at Pinterest Update: Our webinar was well received and prompted a whole bunch of questions. Check out the slides and video recording: Download the slides</summary>

      
      
    </entry>
  
    <entry>
      <title>Advanced SQL in Presto with David</title>
      <link href="https://trino.io/blog/2020/07/15/training-advanced-sql.html" rel="alternate" type="text/html" title="Advanced SQL in Presto with David" />
      <published>2020-07-15T00:00:00+00:00</published>
      <updated>2020-07-15T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/07/15/training-advanced-sql</id>
      <content type="html" xml:base="https://trino.io/blog/2020/07/15/training-advanced-sql.html">&lt;p&gt;You have read our book &lt;a href=&quot;/blog/2020/04/11/the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;, practiced with various SQL examples, and
consulted our &lt;a href=&quot;https://trino.io/docs&quot;&gt;Presto documentation&lt;/a&gt;. Great steps to
become a Presto and SQL expert. However, learning efficient and advanced SQL can
take years of experience. Luckily we have some help from an expert coming your
way.&lt;/p&gt;

&lt;p&gt;Join us for a free webinar &lt;strong&gt;Advanced SQL in Presto&lt;/strong&gt; with David Phillips.&lt;/p&gt;

&lt;p&gt;Update:&lt;/p&gt;

&lt;p&gt;With nearly 200 live attendees and a two hour session we ended with lots of
questions from the engaged audience. After 20 minutes overtime we wrapped up the
successful event. Check out the presentation slides and the recording:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.starburst.io/wp-content/uploads/2020/07/Presto-Training-Series-Advanced-SQL-Features-in-Presto.pdf&quot;&gt;Download the slides&lt;/a&gt;&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/HN_95ObHAiw&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;!--more--&gt;

&lt;p&gt;In our new &lt;a href=&quot;https://bit.ly/2NO26Cm&quot;&gt;Presto Training Series&lt;/a&gt; we give Presto users
an opportunity to learn advanced skills from the co-creators of Presto –
&lt;a href=&quot;https://github.com/electrum&quot;&gt;David Phillips&lt;/a&gt;, 
&lt;a href=&quot;https://github.com/martint&quot;&gt;Martin Traverso&lt;/a&gt; and 
&lt;a href=&quot;https://github.com/dain&quot;&gt;Dain Sundstrom&lt;/a&gt;. Beyond the basics, each of the four 
training sessions covers critical topics for scaling Presto to more users and
use cases.&lt;/p&gt;

&lt;p&gt;Our first session is geared towards helping users understand how to
run more complex and comprehensive SQL queries with Presto. Delivered by David
Phillips, this session covers the following topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Using JSON and other complex data types&lt;/li&gt;
  &lt;li&gt;Advanced aggregation techniques&lt;/li&gt;
  &lt;li&gt;Window functions&lt;/li&gt;
  &lt;li&gt;Array and map functions&lt;/li&gt;
  &lt;li&gt;Lambda expressions&lt;/li&gt;
  &lt;li&gt;Many other SQL functions and features&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Date: Wednesday, 29 July 2020&lt;/p&gt;

&lt;p&gt;Time: 10am PDT (San Francisco), 1pm EDT (New York), 6pm BST (London), 5pm UTC&lt;/p&gt;

&lt;p&gt;Duration: 2h&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h2 id=&quot;register-now&quot;&gt;&lt;a href=&quot;https://bit.ly/2YOtx5f&quot;&gt;Register now!&lt;/a&gt;&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;p&gt;We look forward to many Presto users joining us.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>You have read our book Trino: The Definitive Guide, practiced with various SQL examples, and consulted our Presto documentation. Great steps to become a Presto and SQL expert. However, learning efficient and advanced SQL can take years of experience. Luckily we have some help from an expert coming your way. Join us for a free webinar Advanced SQL in Presto with David Phillips. Update: With nearly 200 live attendees and a two hour session we ended with lots of questions from the engaged audience. After 20 minutes overtime we wrapped up the successful event. Check out the presentation slides and the recording: Download the slides</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto Migration at Arm Treasure Data</title>
      <link href="https://trino.io/blog/2020/07/06/presto-summit-arm-td.html" rel="alternate" type="text/html" title="Presto Migration at Arm Treasure Data" />
      <published>2020-07-06T00:00:00+00:00</published>
      <updated>2020-07-06T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/07/06/presto-summit-arm-td</id>
      <content type="html" xml:base="https://trino.io/blog/2020/07/06/presto-summit-arm-td.html">&lt;p&gt;Both events of our virtual Presto Summit tour,
&lt;a href=&quot;/blog/2020/05/15/state-of-presto.html&quot;&gt;State of Presto&lt;/a&gt; and the
&lt;a href=&quot;/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;Zuora presentation&lt;/a&gt;,
were well received, and recordings are available for you to watch. Your next
chance to learn more about Presto in the real world comes from Arm Treasure
Data and is presented by Taro L. Saito:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Presto at Arm Treasure Data: A Journey of Migrating 1 Million Presto Queries&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Update:&lt;/p&gt;

&lt;p&gt;We had a great event with some in-depth, detailed questions from the audience.
Check out the recording to learn more:&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/NGMugRsNraE&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;!--more--&gt;

&lt;p&gt;Join us to discover how, as part of their customer data platform, Arm Treasure
Data utilizes Presto as the query engine processing over 1 million queries per
day. This system supports the data business of over 500 companies in three
regions: US, EU, and Asia.&lt;/p&gt;

&lt;p&gt;Arm Treasure Data had been using Presto 0.205, and in 2019 started a big
migration project to Presto 317. Although they performed extensive query
simulations to check for incompatibilities, the team faced many unexpected challenges.
In this session you learn more about their migration of the production system:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Technical details on many challenges&lt;/li&gt;
  &lt;li&gt;Key lessons learned&lt;/li&gt;
  &lt;li&gt;Latest updates on AWS Graviton2, the next generation of 64-bit Arm instance
types that can be used for running Presto&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Our speaker, Taro L. Saito, is a principal software engineer at Arm Treasure
Data and holds a Ph.D. in computer science from the University of Tokyo. He has built a
cloud database service at Arm Treasure Data, which processes millions
of queries every day. Previously, he worked as an assistant professor at the
University of Tokyo, studying distributed database systems and their
applications to genome sciences. He has created several open-source projects,
including Airframe, MessagePack, and various sbt plugins (sbt-sonatype,
sbt-pack) for Scala that help to publish thousands of OSS projects.&lt;/p&gt;

&lt;p&gt;Date: Thursday, 16 July 2020&lt;/p&gt;

&lt;p&gt;Time: 10am PDT (San Francisco), 1pm EDT (New York), 6pm BST (London), 5pm UTC&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h2 id=&quot;register-now&quot;&gt;&lt;a href=&quot;https://bit.ly/38wrS80&quot;&gt;Register now!&lt;/a&gt;&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;p&gt;We look forward to many Presto users joining us.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Both events of our virtual Presto Summit tour, State of Presto and the Zuora presentation, were well received, and recordings are available for you to watch. Your next chance to learn more about Presto in the real world comes from Arm Treasure Data and is presented by Taro L. Saito: Presto at Arm Treasure Data: A Journey of Migrating 1 Million Presto Queries Update: We had a great event with some in-depth, detailed questions from the audience. Check out the recording to learn more:</summary>

      
      
    </entry>
  
    <entry>
      <title>Data Integrity Protection in Presto</title>
      <link href="https://trino.io/blog/2020/06/25/data-integrity-protection.html" rel="alternate" type="text/html" title="Data Integrity Protection in Presto" />
      <published>2020-06-25T00:00:00+00:00</published>
      <updated>2020-06-25T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/06/25/data-integrity-protection</id>
      <content type="html" xml:base="https://trino.io/blog/2020/06/25/data-integrity-protection.html">&lt;p&gt;It all started on a Thursday afternoon in March, when &lt;a href=&quot;https://github.com/sopel39&quot;&gt;Karol Sobczak&lt;/a&gt;
was grilling Presto with heavy rounds of benchmarks, as we were ramping up to the Starburst Enterprise
Presto 332-e release. Karol discovered what seemed to be a serious regression, which turned out to be an even more
serious cloud environment issue.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;presto-benchmarks&quot;&gt;Presto Benchmarks&lt;/h1&gt;

&lt;p&gt;At the Presto project, we take stability and efficiency seriously, so releases undergo
rigorous performance benchmarks. The intention is to safeguard against any performance regressions
or stability problems. Usually, performance improvements are benchmarked separately when they
are added to the codebase. At Starburst, those benchmarks are even more important, especially
for the Starburst Enterprise Presto LTS releases.&lt;/p&gt;

&lt;p&gt;On a side note, we use &lt;a href=&quot;https://github.com/trinodb/benchto&quot;&gt;Benchto&lt;/a&gt; for organizing
&lt;a href=&quot;https://github.com/trinodb/trino/tree/master/presto-benchto-benchmarks&quot;&gt;Presto benchmark suites&lt;/a&gt;,
executing them and collecting the results. We use managed &lt;a href=&quot;https://kubernetes.io/&quot;&gt;Kubernetes&lt;/a&gt; in a public
cloud for provisioning Presto clusters, along with &lt;a href=&quot;https://www.starburst.io/platform/deployment-options/starburst-on-kubernetes/&quot;&gt;Starburst Enterprise Presto Kubernetes&lt;/a&gt;.
We use &lt;a href=&quot;https://jupyter.org/&quot;&gt;Jupyter&lt;/a&gt; for producing result reports in HTML and PDF formats.&lt;/p&gt;

&lt;h1 id=&quot;alleged-regression&quot;&gt;Alleged Regression&lt;/h1&gt;

&lt;p&gt;It all started in March, when &lt;a href=&quot;https://github.com/sopel39&quot;&gt;Karol Sobczak&lt;/a&gt;
was grilling Presto with heavy rounds of benchmarks for the Starburst Enterprise Presto 332-e release.
On one Thursday afternoon he reported stability problems, with a few benchmark runs failing with
exceptions similar to:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Query failed (#20200326_150852_00338_dj225): Unknown block encoding:
LONG_ARRAY� � �� � @@@���� �@  @ � �@@@ @@� @�@D�� @@��@ `� @@� @#�@ � 0�
... (9550 more bytes)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In Presto, a block encoding is a way of encoding a particular block type (here, a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LongArrayBlock&lt;/code&gt;).
Encodings are used when exchanging blocks of data between Presto nodes, or when spilling to disk.
Blocks form a polymorphic class hierarchy, so every time a block is encoded, we need
to also store the encoding identifier. The encoding identifier (here, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LONG_ARRAY&lt;/code&gt; string)
is written as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;string length&amp;gt;&lt;/code&gt; (a 4-byte, signed integer in little-endian) followed by
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;string bytes&amp;gt;&lt;/code&gt; containing the UTF-8 representation of the encoding id. Clearly, in the case above,
the receiver read the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;encoding id length&amp;gt;&lt;/code&gt; as 9623 instead of 10! How could that ever be possible?&lt;/p&gt;
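&lt;p&gt;As an illustration only (Presto’s actual implementation is in Java), the following Python sketch shows this length-prefixed layout; the function names are made up for the example:&lt;/p&gt;

```python
# Sketch of the length-prefixed block encoding id described above:
# a 4-byte little-endian signed length, then the UTF-8 bytes of the id.
def write_encoding_id(encoding_id):
    raw = encoding_id.encode("utf-8")
    return len(raw).to_bytes(4, "little", signed=True) + raw

def read_encoding_id(buf):
    length = int.from_bytes(buf[0:4], "little", signed=True)
    return buf[4:4 + length].decode("utf-8")

frame = write_encoding_id("LONG_ARRAY")
assert frame[0:4] == (10).to_bytes(4, "little", signed=True)
assert read_encoding_id(frame) == "LONG_ARRAY"
```

&lt;p&gt;With framing like this, any displacement of the length prefix makes the receiver interpret arbitrary bytes as the encoding id, which is exactly the symptom in the exception above.&lt;/p&gt;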

&lt;p&gt;Presto 332 brought a lot of good changes, and the upgrade to Java 11 was one of them.
Therefore, Starburst Enterprise Presto 332-e was the first Starburst release using Java 11 by default.
For earlier releases, we ran benchmarks using AWS EC2 machines orchestrated with &lt;a href=&quot;https://www.starburst.io/platform/deployment-options/aws/&quot;&gt;Starburst’s Presto
CloudFormation Template (CFT)&lt;/a&gt;. This was also the first time we ran
Presto release benchmarks on Kubernetes clusters, with AWS EKS. We could suspect many different factors
as the cause. We started to sift through the code, the team’s “collective brain”, and
the Internet for any ideas. One of the important sources was Vijay Pandurangan’s writeup on the &lt;a href=&quot;https://tech.vijayp.ca/linux-kernel-bug-delivers-corrupt-tcp-ip-data-to-mesos-kubernetes-docker-containers-4986f88f7a19&quot;&gt;data
corruption bug discovered by Twitter in 2015&lt;/a&gt;. Of course, we also repeated benchmark runs. Seeing is believing.&lt;/p&gt;

&lt;h1 id=&quot;production-issues&quot;&gt;Production issues&lt;/h1&gt;

&lt;p&gt;On the next day, a customer reported similar problems with their Presto cluster. Of course, they
were not running the yet-to-be-released version that we were still benchmarking. They ran into what seemed to
be a very serious regression in the Starburst Enterprise Presto 323-e release line. The customer was also using
the AWS cloud, but not the Kubernetes deployment. They were using the &lt;a href=&quot;https://www.starburst.io/platform/deployment-options/aws/&quot;&gt;CFT-based deployment&lt;/a&gt;
– the same stack we had been using for all our release benchmarks so far – and we had never run into issues like this before.
As the customer was using a fresh-off-the-press latest minor release, we decided (in the spirit of the global health care trend)
to “quarantine” that release and roll back the customer installation to the previous version.&lt;/p&gt;

&lt;p&gt;However, the fact that a small bug fix release triggered data problems was unnerving. The fact that we
had not discovered any of these problems before was even more unnerving.&lt;/p&gt;

&lt;h1 id=&quot;more-testing--the-data-corruption&quot;&gt;More testing – the data corruption&lt;/h1&gt;

&lt;p&gt;As we were running more and more, and even more test runs, we discovered new failure modes.
For example:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Query failed (#20200327_001931_00020_8di4r): Cannot cast DECIMAL(7, 2) &apos;18734974449861284.67&apos; to DECIMAL(12, 2)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Well, this message is not &lt;em&gt;wrong&lt;/em&gt;. It’s not possible to cast &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;18734974449861284.67&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DECIMAL(12, 2)&lt;/code&gt;.
Except that it is &lt;em&gt;also&lt;/em&gt; not possible to have a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DECIMAL(7, 2)&lt;/code&gt; with such a value. Something had gone wrong with the
data. At that moment, we realized the problem was very serious, because data could become corrupted.
This corrupted data could lead to a failure (like the one above), but it could also lead to incorrect query results,
or incorrect data being persisted (in the case of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE TABLE AS&lt;/code&gt; queries). We created
a virtual War Room (that is, a Slack channel) and gathered all our Presto experts and our experienced field team
to discuss potential causes, further diagnostics, and mitigation strategies.&lt;/p&gt;

&lt;p&gt;Since the problem was affecting data exchanges between Presto nodes, we listed the following strategies
to try to dissect the problem:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;determining which query or queries are causing failures,&lt;/li&gt;
  &lt;li&gt;running with HTTP/2,&lt;/li&gt;
  &lt;li&gt;reverting to running on Java 8,&lt;/li&gt;
  &lt;li&gt;enabling exchange compression (as decompression is very sensitive to data corruption),&lt;/li&gt;
  &lt;li&gt;trying to upgrade Jetty,&lt;/li&gt;
  &lt;li&gt;determining whether failures correlate with JVM GC activity,&lt;/li&gt;
  &lt;li&gt;inspecting the source code.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;different-configuration&quot;&gt;Different configuration&lt;/h1&gt;

&lt;p&gt;We were able to quickly prototype and verify some of the ideas. Switching to HTTP/2 or
upgrading Jetty to the latest version did not help. Nor did downgrading to a Jetty version
that we had been using for a long time. We also verified that the problem was reproducible with Java 8,
so we concluded that Java 11 was not the cause of it.&lt;/p&gt;

&lt;h1 id=&quot;checksums&quot;&gt;Checksums&lt;/h1&gt;

&lt;p&gt;We identified that the problem occurs somewhere within exchanges, between one Presto worker
node serializing a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Page&lt;/code&gt; object (the basic unit of data processing in Presto) and another node
deserializing it.&lt;/p&gt;

&lt;p&gt;While the decimal cast failure didn’t directly point at the data corruption problem (there could
be many other reasons for it), there was no other explanation for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Unknown block encoding&lt;/code&gt; exceptions.
The serialization is done in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PagesSerde.serialize&lt;/code&gt; (used by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TaskOutputOperator&lt;/code&gt;, the data sender) and
deserialization is done in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PagesSerde.deserialize&lt;/code&gt; (used by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ExchangeOperator&lt;/code&gt;, the
receiver of the data). As the logic is nicely encapsulated in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PagesSerde&lt;/code&gt; class, we
added checksums to the serialized data: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;checksum&amp;gt; &amp;lt;serialized page&amp;gt;&lt;/code&gt;.
This felt like a smart move – except that it gave us nothing more than a confirmation that
there was a problem (“checksum failure”) – which we already knew.&lt;/p&gt;

&lt;p&gt;We considered adding logging to capture data going out from one node and coming in on
another node, but that would be a huge amount of logs. One run of the benchmarks transfers
hundreds of terabytes of data between the nodes.&lt;/p&gt;

&lt;p&gt;We went ahead and created a Presto build that added data redundancy to be able to reconstruct
the data on the receiving side.
There are many &lt;a href=&quot;https://en.wikipedia.org/wiki/Erasure_code&quot;&gt;well-known error-correction codes&lt;/a&gt;
(e.g. &lt;a href=&quot;https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction&quot;&gt;Reed–Solomon error correction&lt;/a&gt;
available in Hadoop 3). In our case, speed of &lt;em&gt;implementation&lt;/em&gt; (a.k.a. simplicity) was a deciding factor,
so we added data mirroring: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;checksum&amp;gt; &amp;lt;serialized page&amp;gt; &amp;lt;serialized page&amp;gt;&lt;/code&gt;.
To avoid logging all the data exchanges, we attached the serialized pages (both copies)
to the exceptions being raised.&lt;/p&gt;
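&lt;p&gt;A minimal sketch of that debugging frame layout, in Python rather than Presto’s Java, with CRC32 standing in for whatever checksum function the build actually used:&lt;/p&gt;

```python
# Sketch of the assumed debugging-build frame: checksum, then two
# identical copies of the serialized page. crc32 is a stand-in here.
import zlib

def wrap(page_bytes):
    checksum = zlib.crc32(page_bytes).to_bytes(4, "little")
    return checksum + page_bytes + page_bytes

def unwrap(frame, page_size):
    checksum = int.from_bytes(frame[0:4], "little")
    first = frame[4:4 + page_size]
    second = frame[4 + page_size:4 + 2 * page_size]
    if zlib.crc32(first) != checksum:
        # First copy corrupted in transit; the mirror lets us see *how*.
        return ("corrupted", first, second)
    return ("ok", first, second)

frame = wrap(b"serialized page")
assert unwrap(frame, 15)[0] == "ok"
```

&lt;p&gt;Mirroring doubles the exchange volume, which is why a proper error-correction code would be preferable in production – but for a short-lived diagnostic build, simplicity wins.&lt;/p&gt;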

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;java.sql.SQLException: Query failed (#20200401_113622_00676_p7qp7): Hash mismatch, read: 1251072184702746109, calculated: 7591448164918409110
    Suppressed: java.lang.RuntimeException: Slice, first half: 040000000A0000004C4F4E475F415252.... (945 kilobytes)
    Suppressed: java.lang.RuntimeException: Slice, secnd half: 040000000A0000004C4F4E475F415252.... (945 kilobytes)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The exception told us the first part had changed, since the read checksum did not match the calculated
checksum (the calculated checksum was derived from the first copy of the data and differed from the checksum
computed on the sending side).
With the encoded data in the exception like that, it was easy to extract the actual data and compare it,
so now we could see &lt;em&gt;how&lt;/em&gt; the data was changed.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;cat failure.txt | grep &apos;Slice, first half&apos; | cut -d: -f4- | sed &apos;s/^ *//&apos; | xxd -r -p &amp;gt; changed
cat failure.txt | grep &apos;Slice, secnd half&apos; | cut -d: -f4- | sed &apos;s/^ *//&apos; | xxd -r -p &amp;gt; original
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Comparing binary files is fun, but in practice it can be more convenient to compare &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hexdump&lt;/code&gt; output.
The output below was created with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;vimdiff &amp;lt;(hexdump -Cv original) &amp;lt;(hexdump -Cv changed)&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;++--6064 lines: 00000000  04 00 00 00 0a 00 00 00  4c 4f 4...|+ +--6064 lines: 00000000  04 00 00 00 0a 00 00...
 00017b00  00 cb 6a 25 00 00 00 00  00 cb 6a 25 00 00 00 00  |  00 cb 6a 25 00 00 00 00  00 cb 6a 25 00 00 00 00
 00017b10  00 cb 6a 25 00 00 00 00  00 cb 6a 25 00 00 00 00  |  00 cb 6a 25 00 00 00 00  00 cb 6a 25 00 00 00 00
 00017b20  00 cb 6a 25 00 00 00 00  00 e1 67 25 00 00 00 00  |  00 cb 6a 25 00 00 00 00  00 e1 67 25 00 00 00 00
 00017b30  00 e1 67 25 00 00 00 00  00 e1 67 25 00 00 00 00  |  00 e1 67 25 00 00 00 00  00 e1 67 25 00 00 00 00
 00017b40  00 e1 67 25 00 00 00 00  00 e1 67 25 00 00 00 00  |  00 e1 67 25 00 00 00 00  00 e1 67 25 00 00 00 00
 00017b50  00 e1 67 25 00 00 00 00  00 e1 67 25 00 00 00 00  |  00 e1 67 25 00 00 00 00  00 e1 67 25 00 00 00 00
 00017b60  00 e1 67 25 00 00 00 00  00 e1 67 25 00 00 00 00  |  00 e1 67 25 00 00 00 00  e1 67 25 00 00 00 00 00
 00017b70  00 e1 67 25 00 00 00 00  00 fb 69 25 00 00 00 00  |  e1 67 25 00 00 00 00 00  fb 69 25 00 00 00 00 00
 00017b80  00 fb 69 25 00 00 00 00  00 fb 69 25 00 00 00 00  |  fb 69 25 00 00 00 00 00  fb 69 25 00 00 00 00 00
 00017b90  00 fb 69 25 00 00 00 00  00 fb 69 25 00 00 00 00  |  fb 69 25 00 00 00 00 00  fb 69 25 00 00 00 00 00
 00017ba0  00 fb 69 25 00 00 00 00  00 fb 69 25 00 00 00 00  |  fb 69 25 00 00 00 00 00  fb 69 25 00 00 00 00 00
 00017bb0  00 fb 69 25 00 00 00 00  00 fb 69 25 00 00 00 00  |  fb 69 25 00 00 00 00 00  fb 69 25 00 00 00 00 00
 00017bc0  00 fb 69 25 00 00 00 00  00 fb 69 25 00 00 00 00  |  fb 69 25 00 00 00 00 00  fb 69 25 00 00 00 00 00
 00017bd0  00 fb 69 25 00 00 00 00  00 fb 69 25 00 00 00 00  |  fb 69 25 00 00 00 00 00  fb 69 25 00 00 00 00 00
 00017be0  00 fb 69 25 00 00 00 00  00 5e 6a 25 00 00 00 00  |  fb 69 25 00 00 00 00 00  5e 6a 25 00 00 00 00 00
 00017bf0  00 5e 6a 25 00 00 00 00  00 5e 6a 25 00 00 00 00  |  5e 6a 25 00 00 00 00 00  5e 6a 25 00 00 00 00 00
 00017c00  00 5e 6a 25 00 00 00 00  00 5e 6a 25 00 00 00 00  |  5e 6a 25 00 00 00 00 00  5e 6a 25 00 00 00 00 00
 00017c10  00 5e 6a 25 00 00 00 00  00 5e 6a 25 00 00 00 00  |  5e 6a 25 00 00 00 00 00  5e 6a 25 00 00 00 00 00
 00017c20  00 5e 6a 25 00 00 00 00  00 5e 6a 25 00 00 00 00  |  5e 6a 25 00 00 00 00 00  5e 6a 25 00 00 00 00 00
 00017c30  00 5e 6a 25 00 00 00 00  00 5e 6a 25 00 00 00 00  |  5e 6a 25 00 00 00 00 00  5e 6a 25 00 00 00 00 00
 00017c40  00 5e 6a 25 00 00 00 00  00 5e 6a 25 00 00 00 00  |  5e 6a 25 00 00 00 00 00  5e 6a 25 00 00 00 00 00
 00017c50  00 5e 6a 25 00 00 00 00  00 5e 6a 25 00 00 00 00  |  5e 6a 25 00 00 00 00 00  5e 6a 25 00 00 00 00 00
 00017c60  00 34 68 25 00 00 00 00  00 34 68 25 00 00 00 00  |  34 68 25 00 00 00 00 00  34 68 25 00 00 00 00 00
 00017c70  00 34 68 25 00 00 00 00  00 34 68 25 00 00 00 00  |  34 68 25 00 00 00 00 00  34 68 25 00 00 00 00 00
 00017c80  00 34 68 25 00 00 00 00  00 34 68 25 00 00 00 00  |  34 68 25 00 00 00 00 00  34 68 25 00 00 00 00 00
 00017c90  00 34 68 25 00 00 00 00  00 34 68 25 00 00 00 00  |  34 68 25 00 00 00 00 00  34 68 25 00 00 00 00 00
 00017ca0  00 34 68 25 00 00 00 00  00 2e 6b 25 00 00 00 00  |  34 68 25 00 00 00 00 00  2e 6b 25 00 00 00 00 00
 00017cb0  00 2e 6b 25 00 00 00 00  00 2e 6b 25 00 00 00 00  |  2e 6b 25 00 00 00 00 00  2e 6b 25 00 00 00 00 00
 00017cc0  00 2e 6b 25 00 00 00 00  00 2e 6b 25 00 00 00 00  |  2e 6b 25 00 00 00 00 00  2e 6b 25 00 00 00 00 00
 00017cd0  00 2e 6b 25 00 00 00 00  00 2e 6b 25 00 00 00 00  |  2e 6b 25 00 00 00 00 00  2e 6b 25 00 00 00 00 00
 00017ce0  00 2e 6b 25 00 00 00 00  00 2e 6b 25 00 00 00 00  |  2e 6b 25 00 00 00 00 00  2e 6b 25 00 00 00 00 00
 00017cf0  00 2e 6b 25 00 00 00 00  00 2e 6b 25 00 00 00 00  |  2e 6b 25 00 00 00 00 00  2e 6b 25 00 00 00 00 00
 00017d00  00 2e 6b 25 00 00 00 00  00 2e 6b 25 00 00 00 00  |  2e 6b 25 00 00 00 00 00  2e 6b 25 00 00 00 00 00
 00017d10  00 2e 6b 25 00 00 00 00  00 cf 68 25 00 00 00 00  |  2e 6b 25 00 00 00 00 00  cf 68 25 00 00 00 00 00
 00017d20  00 cf 68 25 00 00 00 00  00 cf 68 25 00 00 00 00  |  cf 68 25 00 00 00 00 00  cf 68 25 00 00 00 00 00
 00017d30  00 cf 68 25 00 00 00 00  00 cf 68 25 00 00 00 00  |  cf 68 25 00 00 00 00 00  cf 68 25 00 00 00 00 00
 00017d40  00 cf 68 25 00 00 00 00  00 cf 68 25 00 00 00 00  |  cf 68 25 00 00 00 00 00  cf 68 25 00 00 00 00 00
 00017d50  00 cf 68 25 00 00 00 00  00 cf 68 25 00 00 00 00  |  cf 68 25 00 00 00 00 00  cf 68 25 00 00 00 00 00
 00017d60  00 cf 68 25 00 00 00 00  00 cf 68 25 00 00 00 00  |  cf 68 25 00 00 00 00 00  cf 68 25 00 00 00 00 00
 00017d70  00 cf 68 25 00 00 00 00  00 cf 68 25 00 00 00 00  |  cf 68 25 00 00 00 00 00  cf 68 25 00 00 00 00 00
 00017d80  00 cf 68 25 00 00 00 00  00 6b 69 25 00 00 00 00  |  cf 68 25 00 00 00 00 00  6b 69 25 00 00 00 00 00
 00017d90  00 6b 69 25 00 00 00 00  00 6b 69 25 00 00 00 00  |  6b 69 25 00 00 00 00 00  6b 69 25 00 00 00 00 00
 00017da0  00 6b 69 25 00 00 00 00  00 6b 69 25 00 00 00 00  |  6b 69 25 00 00 00 00 00  6b 69 25 00 00 00 00 00
 00017db0  00 6b 69 25 00 00 00 00  00 6b 69 25 00 00 00 00  |  6b 69 25 00 00 00 00 00  6b 69 25 00 00 00 00 00
 00017dc0  00 6b 69 25 00 00 00 00  00 7e 66 25 00 00 00 00  |  6b 69 25 00 00 00 00 00  7e 66 25 00 00 00 00 00
 00017dd0  00 7e 66 25 00 00 00 00  00 7e 66 25 00 00 00 00  |  7e 66 25 00 00 00 00 00  7e 66 25 00 00 00 00 00
 00017de0  00 7e 66 25 00 00 00 00  00 7e 66 25 00 00 00 00  |  7e 66 25 00 00 00 00 00  7e 66 25 00 00 00 00 00
 00017df0  00 7e 66 25 00 00 00 00  00 7e 66 25 00 00 00 00  |  7e 66 25 00 00 00 00 00  7e 66 25 00 00 00 00 00
 00017e00  00 7e 66 25 00 00 00 00  00 7e 66 25 00 00 00 00  |  7e 66 25 00 00 00 00 00  7e 66 25 00 00 00 00 00
 00017e10  00 7e 66 25 00 00 00 00  00 7e 66 25 00 00 00 00  |  7e 66 25 00 00 00 00 00  7e 66 25 00 00 00 00 00
 00017e20  00 7e 66 25 00 00 00 00  00 7e 66 25 00 00 00 00  |  7e 66 25 00 00 00 00 00  7e 66 25 00 00 00 00 00
 00017e30  00 a9 66 25 00 00 00 00  00 a9 66 25 00 00 00 00  |  a9 66 25 00 00 00 00 00  a9 66 25 00 00 00 00 00
 00017e40  00 a9 66 25 00 00 00 00  00 a9 66 25 00 00 00 00  |  a9 66 25 00 00 00 00 00  a9 66 25 00 00 00 00 00
 00017e50  00 a9 66 25 00 00 00 00  00 a9 66 25 00 00 00 00  |  a9 66 25 00 00 00 00 00  a9 66 25 00 00 00 00 00
 00017e60  00 a9 66 25 00 00 00 00  00 a9 66 25 00 00 00 00  |  a9 66 25 00 00 00 00 00  a9 66 25 00 00 00 00 00
 00017e70  00 a9 66 25 00 00 00 00  00 fb 67 25 00 00 00 00  |  a9 66 25 00 00 00 00 00  fb 67 25 00 00 00 00 00
 00017e80  00 fb 67 25 00 00 00 00  00 fb 67 25 00 00 00 00  |  fb 67 25 00 00 00 00 00  fb 67 25 00 00 00 00 00
 00017e90  00 fb 67 25 00 00 00 00  00 fb 67 25 00 00 00 00  |  fb 67 25 00 00 00 00 00  fb 67 25 00 00 00 00 00
 00017ea0  00 fb 67 25 00 00 00 00  00 fb 67 25 00 00 00 00  |  00 fb 67 25 00 00 00 00  00 fb 67 25 00 00 00 00
 00017eb0  00 fb 67 25 00 00 00 00  00 fb 67 25 00 00 00 00  |  00 fb 67 25 00 00 00 00  00 fb 67 25 00 00 00 00
 00017ec0  00 fb 67 25 00 00 00 00  00 fb 67 25 00 00 00 00  |  00 fb 67 25 00 00 00 00  00 fb 67 25 00 00 00 00
 00017ed0  00 fb 67 25 00 00 00 00  00 fb 67 25 00 00 00 00  |  00 fb 67 25 00 00 00 00  00 fb 67 25 00 00 00 00
 00017ee0  00 fb 67 25 00 00 00 00  00 fb 67 25 00 00 00 00  |  00 fb 67 25 00 00 00 00  00 fb 67 25 00 00 00 00
 00017ef0  00 fb 67 25 00 00 00 00  00 5e 6b 25 00 00 00 00  |  00 fb 67 25 00 00 00 00  00 5e 6b 25 00 00 00 00
++--23429 lines: 00017f00  00 5e 6b 25 00 00 00 00  00 5e ...|+ +--23429 lines: 00017f00  00 5e 6b 25 00 00 0...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It is perhaps no surprise that 0 bytes occupied a lot of the data transfer. For performance reasons,
Presto uses a fixed-length representation for fixed-length data types, such as integers or decimals.
Compressing data for network exchanges makes sense if your network is saturated and
your CPU is not, and it is off by default. If we replace 0 bytes with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;__&lt;/code&gt;, we see that the difference
between the original (left) and the changed data (right) is pretty interesting: it looks like one 0 byte was
shifted from offset &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0x00017b60+5&lt;/code&gt; (approximately) to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0x00017e90+12&lt;/code&gt; (approximately).
This is a very unusual data change. We got other failure samples showing similar data changes,
with varying offsets.&lt;/p&gt;
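&lt;p&gt;The effect is easy to reproduce synthetically. In this hypothetical Python sketch (offsets and values chosen purely for illustration), deleting a single zero byte and re-inserting one further on displaces every 8-byte little-endian value in between, so those values read back divided by 256:&lt;/p&gt;

```python
# Hypothetical illustration of the observed corruption pattern: drop one
# zero byte at one offset, re-insert a zero byte at a later offset.
values = [0x2567e100] * 8            # values whose low byte happens to be zero
data = b"".join(v.to_bytes(8, "little") for v in values)

drop, reinsert = 8, 40               # offsets chosen for the sketch
corrupt = data[:drop] + data[drop + 1:reinsert] + b"\x00" + data[reinsert:]

assert len(corrupt) == len(data)     # same total length, so no framing error
decoded = [int.from_bytes(corrupt[i:i + 8], "little")
           for i in range(0, len(corrupt), 8)]
# values inside the shifted window read back divided by 256
assert decoded[0] == 0x2567e100      # before the window: intact
assert decoded[1] == 0x2567e100 // 256
assert decoded[5] == 0x2567e100      # after the window: intact again
```

&lt;p&gt;Because the overall length is unchanged, checksums over the whole payload are the only thing that catches this – field-by-field validation sees plausible, merely wrong, values.&lt;/p&gt;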

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;++--6064 lines: 00000000  04 00 00 00 0a 00 00 00  4c 4f 4...|+ +--6064 lines: 00000000  04 00 00 00 0a 00 00...
 00017b00  __ cb 6a 25 __ __ __ __  __ cb 6a 25 __ __ __ __  |  __ cb 6a 25 __ __ __ __  __ cb 6a 25 __ __ __ __
 00017b10  __ cb 6a 25 __ __ __ __  __ cb 6a 25 __ __ __ __  |  __ cb 6a 25 __ __ __ __  __ cb 6a 25 __ __ __ __
 00017b20  __ cb 6a 25 __ __ __ __  __ e1 67 25 __ __ __ __  |  __ cb 6a 25 __ __ __ __  __ e1 67 25 __ __ __ __
 00017b30  __ e1 67 25 __ __ __ __  __ e1 67 25 __ __ __ __  |  __ e1 67 25 __ __ __ __  __ e1 67 25 __ __ __ __
 00017b40  __ e1 67 25 __ __ __ __  __ e1 67 25 __ __ __ __  |  __ e1 67 25 __ __ __ __  __ e1 67 25 __ __ __ __
 00017b50  __ e1 67 25 __ __ __ __  __ e1 67 25 __ __ __ __  |  __ e1 67 25 __ __ __ __  __ e1 67 25 __ __ __ __
 00017b60  __ e1 67 25 __ __ __ __  __ e1 67 25 __ __ __ __  |  __ e1 67 25 __ __ __ __  e1 67 25 __ __ __ __ __
 00017b70  __ e1 67 25 __ __ __ __  __ fb 69 25 __ __ __ __  |  e1 67 25 __ __ __ __ __  fb 69 25 __ __ __ __ __
 00017b80  __ fb 69 25 __ __ __ __  __ fb 69 25 __ __ __ __  |  fb 69 25 __ __ __ __ __  fb 69 25 __ __ __ __ __
 00017b90  __ fb 69 25 __ __ __ __  __ fb 69 25 __ __ __ __  |  fb 69 25 __ __ __ __ __  fb 69 25 __ __ __ __ __
 00017ba0  __ fb 69 25 __ __ __ __  __ fb 69 25 __ __ __ __  |  fb 69 25 __ __ __ __ __  fb 69 25 __ __ __ __ __
 00017bb0  __ fb 69 25 __ __ __ __  __ fb 69 25 __ __ __ __  |  fb 69 25 __ __ __ __ __  fb 69 25 __ __ __ __ __
 00017bc0  __ fb 69 25 __ __ __ __  __ fb 69 25 __ __ __ __  |  fb 69 25 __ __ __ __ __  fb 69 25 __ __ __ __ __
 00017bd0  __ fb 69 25 __ __ __ __  __ fb 69 25 __ __ __ __  |  fb 69 25 __ __ __ __ __  fb 69 25 __ __ __ __ __
 00017be0  __ fb 69 25 __ __ __ __  __ 5e 6a 25 __ __ __ __  |  fb 69 25 __ __ __ __ __  5e 6a 25 __ __ __ __ __
 00017bf0  __ 5e 6a 25 __ __ __ __  __ 5e 6a 25 __ __ __ __  |  5e 6a 25 __ __ __ __ __  5e 6a 25 __ __ __ __ __
 00017c00  __ 5e 6a 25 __ __ __ __  __ 5e 6a 25 __ __ __ __  |  5e 6a 25 __ __ __ __ __  5e 6a 25 __ __ __ __ __
 00017c10  __ 5e 6a 25 __ __ __ __  __ 5e 6a 25 __ __ __ __  |  5e 6a 25 __ __ __ __ __  5e 6a 25 __ __ __ __ __
 00017c20  __ 5e 6a 25 __ __ __ __  __ 5e 6a 25 __ __ __ __  |  5e 6a 25 __ __ __ __ __  5e 6a 25 __ __ __ __ __
 00017c30  __ 5e 6a 25 __ __ __ __  __ 5e 6a 25 __ __ __ __  |  5e 6a 25 __ __ __ __ __  5e 6a 25 __ __ __ __ __
 00017c40  __ 5e 6a 25 __ __ __ __  __ 5e 6a 25 __ __ __ __  |  5e 6a 25 __ __ __ __ __  5e 6a 25 __ __ __ __ __
 00017c50  __ 5e 6a 25 __ __ __ __  __ 5e 6a 25 __ __ __ __  |  5e 6a 25 __ __ __ __ __  5e 6a 25 __ __ __ __ __
 00017c60  __ 34 68 25 __ __ __ __  __ 34 68 25 __ __ __ __  |  34 68 25 __ __ __ __ __  34 68 25 __ __ __ __ __
 00017c70  __ 34 68 25 __ __ __ __  __ 34 68 25 __ __ __ __  |  34 68 25 __ __ __ __ __  34 68 25 __ __ __ __ __
 00017c80  __ 34 68 25 __ __ __ __  __ 34 68 25 __ __ __ __  |  34 68 25 __ __ __ __ __  34 68 25 __ __ __ __ __
 00017c90  __ 34 68 25 __ __ __ __  __ 34 68 25 __ __ __ __  |  34 68 25 __ __ __ __ __  34 68 25 __ __ __ __ __
 00017ca0  __ 34 68 25 __ __ __ __  __ 2e 6b 25 __ __ __ __  |  34 68 25 __ __ __ __ __  2e 6b 25 __ __ __ __ __
 00017cb0  __ 2e 6b 25 __ __ __ __  __ 2e 6b 25 __ __ __ __  |  2e 6b 25 __ __ __ __ __  2e 6b 25 __ __ __ __ __
 00017cc0  __ 2e 6b 25 __ __ __ __  __ 2e 6b 25 __ __ __ __  |  2e 6b 25 __ __ __ __ __  2e 6b 25 __ __ __ __ __
 00017cd0  __ 2e 6b 25 __ __ __ __  __ 2e 6b 25 __ __ __ __  |  2e 6b 25 __ __ __ __ __  2e 6b 25 __ __ __ __ __
 00017ce0  __ 2e 6b 25 __ __ __ __  __ 2e 6b 25 __ __ __ __  |  2e 6b 25 __ __ __ __ __  2e 6b 25 __ __ __ __ __
 00017cf0  __ 2e 6b 25 __ __ __ __  __ 2e 6b 25 __ __ __ __  |  2e 6b 25 __ __ __ __ __  2e 6b 25 __ __ __ __ __
 00017d00  __ 2e 6b 25 __ __ __ __  __ 2e 6b 25 __ __ __ __  |  2e 6b 25 __ __ __ __ __  2e 6b 25 __ __ __ __ __
 00017d10  __ 2e 6b 25 __ __ __ __  __ cf 68 25 __ __ __ __  |  2e 6b 25 __ __ __ __ __  cf 68 25 __ __ __ __ __
 00017d20  __ cf 68 25 __ __ __ __  __ cf 68 25 __ __ __ __  |  cf 68 25 __ __ __ __ __  cf 68 25 __ __ __ __ __
 00017d30  __ cf 68 25 __ __ __ __  __ cf 68 25 __ __ __ __  |  cf 68 25 __ __ __ __ __  cf 68 25 __ __ __ __ __
 00017d40  __ cf 68 25 __ __ __ __  __ cf 68 25 __ __ __ __  |  cf 68 25 __ __ __ __ __  cf 68 25 __ __ __ __ __
 00017d50  __ cf 68 25 __ __ __ __  __ cf 68 25 __ __ __ __  |  cf 68 25 __ __ __ __ __  cf 68 25 __ __ __ __ __
 00017d60  __ cf 68 25 __ __ __ __  __ cf 68 25 __ __ __ __  |  cf 68 25 __ __ __ __ __  cf 68 25 __ __ __ __ __
 00017d70  __ cf 68 25 __ __ __ __  __ cf 68 25 __ __ __ __  |  cf 68 25 __ __ __ __ __  cf 68 25 __ __ __ __ __
 00017d80  __ cf 68 25 __ __ __ __  __ 6b 69 25 __ __ __ __  |  cf 68 25 __ __ __ __ __  6b 69 25 __ __ __ __ __
 00017d90  __ 6b 69 25 __ __ __ __  __ 6b 69 25 __ __ __ __  |  6b 69 25 __ __ __ __ __  6b 69 25 __ __ __ __ __
 00017da0  __ 6b 69 25 __ __ __ __  __ 6b 69 25 __ __ __ __  |  6b 69 25 __ __ __ __ __  6b 69 25 __ __ __ __ __
 00017db0  __ 6b 69 25 __ __ __ __  __ 6b 69 25 __ __ __ __  |  6b 69 25 __ __ __ __ __  6b 69 25 __ __ __ __ __
 00017dc0  __ 6b 69 25 __ __ __ __  __ 7e 66 25 __ __ __ __  |  6b 69 25 __ __ __ __ __  7e 66 25 __ __ __ __ __
 00017dd0  __ 7e 66 25 __ __ __ __  __ 7e 66 25 __ __ __ __  |  7e 66 25 __ __ __ __ __  7e 66 25 __ __ __ __ __
 00017de0  __ 7e 66 25 __ __ __ __  __ 7e 66 25 __ __ __ __  |  7e 66 25 __ __ __ __ __  7e 66 25 __ __ __ __ __
 00017df0  __ 7e 66 25 __ __ __ __  __ 7e 66 25 __ __ __ __  |  7e 66 25 __ __ __ __ __  7e 66 25 __ __ __ __ __
 00017e00  __ 7e 66 25 __ __ __ __  __ 7e 66 25 __ __ __ __  |  7e 66 25 __ __ __ __ __  7e 66 25 __ __ __ __ __
 00017e10  __ 7e 66 25 __ __ __ __  __ 7e 66 25 __ __ __ __  |  7e 66 25 __ __ __ __ __  7e 66 25 __ __ __ __ __
 00017e20  __ 7e 66 25 __ __ __ __  __ 7e 66 25 __ __ __ __  |  7e 66 25 __ __ __ __ __  7e 66 25 __ __ __ __ __
 00017e30  __ a9 66 25 __ __ __ __  __ a9 66 25 __ __ __ __  |  a9 66 25 __ __ __ __ __  a9 66 25 __ __ __ __ __
 00017e40  __ a9 66 25 __ __ __ __  __ a9 66 25 __ __ __ __  |  a9 66 25 __ __ __ __ __  a9 66 25 __ __ __ __ __
 00017e50  __ a9 66 25 __ __ __ __  __ a9 66 25 __ __ __ __  |  a9 66 25 __ __ __ __ __  a9 66 25 __ __ __ __ __
 00017e60  __ a9 66 25 __ __ __ __  __ a9 66 25 __ __ __ __  |  a9 66 25 __ __ __ __ __  a9 66 25 __ __ __ __ __
 00017e70  __ a9 66 25 __ __ __ __  __ fb 67 25 __ __ __ __  |  a9 66 25 __ __ __ __ __  fb 67 25 __ __ __ __ __
 00017e80  __ fb 67 25 __ __ __ __  __ fb 67 25 __ __ __ __  |  fb 67 25 __ __ __ __ __  fb 67 25 __ __ __ __ __
 00017e90  __ fb 67 25 __ __ __ __  __ fb 67 25 __ __ __ __  |  fb 67 25 __ __ __ __ __  fb 67 25 __ __ __ __ __
 00017ea0  __ fb 67 25 __ __ __ __  __ fb 67 25 __ __ __ __  |  __ fb 67 25 __ __ __ __  __ fb 67 25 __ __ __ __
 00017eb0  __ fb 67 25 __ __ __ __  __ fb 67 25 __ __ __ __  |  __ fb 67 25 __ __ __ __  __ fb 67 25 __ __ __ __
 00017ec0  __ fb 67 25 __ __ __ __  __ fb 67 25 __ __ __ __  |  __ fb 67 25 __ __ __ __  __ fb 67 25 __ __ __ __
 00017ed0  __ fb 67 25 __ __ __ __  __ fb 67 25 __ __ __ __  |  __ fb 67 25 __ __ __ __  __ fb 67 25 __ __ __ __
 00017ee0  __ fb 67 25 __ __ __ __  __ fb 67 25 __ __ __ __  |  __ fb 67 25 __ __ __ __  __ fb 67 25 __ __ __ __
 00017ef0  __ fb 67 25 __ __ __ __  __ 5e 6b 25 __ __ __ __  |  __ fb 67 25 __ __ __ __  __ 5e 6b 25 __ __ __ __
++--23429 lines: 00017f00  00 5e 6b 25 00 00 00 00  00 5e ...|+ +--23429 lines: 00017f00  00 5e 6b 25 00 00 00...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h1 id=&quot;outside-of-presto&quot;&gt;Outside of Presto&lt;/h1&gt;

&lt;p&gt;We captured a cluster of 10 nodes manifesting the problem and held on to it for further investigation.
Our testing showed that TPC-DS query 72 is significantly more likely to fail than other queries.
On the isolated cluster, a loop running TPC-DS query 72 would reproduce a failure within 2 hours.
We added information to the exception reporting the checksum failure, to identify on which
node the failure happens and which node is the sender of the data. For all the failures on the isolated
10-node cluster, the failure would always happen with one worker node (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;10.83.28.124&lt;/code&gt;, the Receiver) reading data
from one particular other worker node (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;10.142.0.84&lt;/code&gt;, the Sender). We stopped all other workers and attempted to
reproduce the problem outside of Presto.&lt;/p&gt;

&lt;p&gt;One of the things we tried was checking the network reliability with netcat.
On the Sender node, we ran the following:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;dd if=/dev/urandom of=/tmp/small-data bs=$[1024*1024] count=1
ncat -l 20165 --keep-open --max-conns 100 --sh-exec &quot;cat /tmp/small-data&quot; -v
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;On the Receiver node, we ran the following in a loop:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ncat --recv-only 10.142.0.84 20165 &amp;gt; &quot;/tmp/received&quot;
sha1sum &quot;/tmp/received&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Running this in a loop for just a few dozen seconds resulted in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/tmp/received&lt;/code&gt; differing
from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/tmp/small-data&lt;/code&gt;. Sometimes &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/tmp/received&lt;/code&gt; would be “just” a prefix of the original data,
and sometimes there would be data displacements within the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/tmp/received&lt;/code&gt; file. We cross-checked these
observations on a different pair of nodes and also on a different public cloud, using the same netcat version.
We observed the same behavior everywhere we checked, with a varying but high error rate, over 1%. This high
error rate was what led us to discard this evidence – either there was something wrong with the way we
used netcat, we violated netcat’s assumptions, or netcat was not the right tool for this task.&lt;/p&gt;

&lt;p&gt;We searched for other tools that we could use. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iperf&lt;/code&gt; is a well-known tool for stressing the network.
Sadly, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iperf&lt;/code&gt; &lt;a href=&quot;https://github.com/esnet/iperf/issues/157&quot;&gt;does not have the ability to verify exchanged data integrity yet&lt;/a&gt;.
We deployed a &lt;a href=&quot;https://github.com/findepi/netsum&quot;&gt;home-made, Java-based tool&lt;/a&gt; instead. Using this tool,
we were able to reproduce the data corruption problem between the Sender and Receiver nodes. The error rate
was very low. To reproduce the problem we had to saturate the network and use multiple concurrent TCP connections
(which is very similar to how Presto uses the network). This validated our
observation that the data corruption problem was happening outside of Presto. Interestingly, we were unable
to reproduce the problem when stressing the network with a single TCP connection.&lt;/p&gt;
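&lt;p&gt;The essence of such a tool can be sketched in a few lines of Python. This is a hypothetical stand-in for the actual Java tool, not its real code: it streams a payload over TCP with a trailing SHA-1 digest, and the receiver recomputes the digest to detect corruption in transit:&lt;/p&gt;

```python
# Minimal sketch (hypothetical, not the actual netsum tool): stream a
# payload over TCP with a trailing SHA-1 digest, verify it on receipt.
import hashlib
import socket
import threading

def send_with_checksum(sock, payload):
    # Append the digest so the receiver can verify end-to-end integrity.
    sock.sendall(payload + hashlib.sha1(payload).digest())
    sock.close()

def recv_and_verify(sock):
    chunks = []
    while True:
        chunk = sock.recv(65536)
        if not chunk:
            break
        chunks.append(chunk)
    data = b"".join(chunks)
    body, digest = data[:-20], data[-20:]
    # Any corruption in transit makes the recomputed digest differ.
    return hashlib.sha1(body).digest() == digest

if __name__ == "__main__":
    server = socket.socket()
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]
    payload = bytes(range(256)) * 4096  # 1 MiB test payload
    client = socket.create_connection(("127.0.0.1", port))
    conn, _ = server.accept()
    sender = threading.Thread(target=send_with_checksum, args=(conn, payload))
    sender.start()
    print(recv_and_verify(client))  # True when the data arrived intact
    sender.join()
```

&lt;p&gt;Saturating the link requires running many such transfers concurrently over multiple connections; the point here is only the checksum-on-receive structure.&lt;/p&gt;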

&lt;h1 id=&quot;mystery-unsolved&quot;&gt;Mystery unsolved&lt;/h1&gt;

&lt;p&gt;Obviously, with such strong evidence gathered so far, we opened a support ticket with AWS.
The support team was great and did a lot of investigation on their own. Unfortunately, the problem went
away before the support team was able to get to the bottom of it. It was April already.
Perhaps, one day someone will find the smoking gun and write the rest of this story.&lt;/p&gt;

&lt;h1 id=&quot;conclusions&quot;&gt;Conclusions&lt;/h1&gt;

&lt;p&gt;We implemented a data integrity protection measure in Presto. We used &lt;a href=&quot;https://github.com/martint&quot;&gt;Martin Traverso’s&lt;/a&gt;
Java implementation of the &lt;a href=&quot;https://github.com/Cyan4973/xxHash&quot;&gt;XXHash64&lt;/a&gt; algorithm. Thanks to its
speed, we could enable it by default, with negligible impact on overall query performance.
By default, data integrity violation results in query failure, but Presto can be configured to retry as well,
by setting the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;exchange.data-integrity-verification&lt;/code&gt; configuration property.&lt;/p&gt;
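&lt;p&gt;Conceptually, the verification works like the sketch below. Presto computes XXHash64 over the exchanged data; the stand-in uses CRC32 from the Python standard library only because XXHash64 is not available there, and the function names are illustrative rather than Presto’s actual code:&lt;/p&gt;

```python
# Conceptual sketch of exchange data verification. Presto uses XXHash64;
# zlib.crc32 stands in here only because it ships with the standard library.
import zlib

def checksum(pages):
    # Fold all page bytes into a single running checksum, as the sender would.
    value = 0
    for page in pages:
        value = zlib.crc32(page, value)
    return value

def verify(pages, read_checksum):
    calculated = checksum(pages)
    if calculated != read_checksum:
        raise ValueError(
            "Data corruption, read checksum: %#x, calculated checksum: %#x"
            % (read_checksum, calculated))

pages = [b"page-1", b"page-2"]
token = checksum(pages)  # sent alongside the data
verify(pages, token)     # passes for intact data
try:
    verify([b"page-1", b"page-X"], token)  # a flipped byte is detected
except ValueError as error:
    print(error)
```

&lt;p&gt;XXHash64 fills the same role as the CRC here, but at a throughput that makes always-on verification affordable.&lt;/p&gt;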

&lt;p&gt;This chapter of Presto’s history should remain closed, and we should be able to forget about all this.
However, a couple days ago, a customer running Presto on Azure Kubernetes Service (AKS) reported an exception like
the one below. On the next day, we bumped into this as well. We were doing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE TABLE AS SELECT&lt;/code&gt;
to prepare a new benchmark dataset on Azure Storage.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Query failed (#20200622_124803_00000_abcde): Checksum verification failure on 10.12.3.47
    when reading from http://10.12.3.53:8080/v1/task/20200622_124803_00000_abcde.2.6/results/5/8:
    Data corruption, read checksum: 0xe17e6eaeb665dc6e, calculated checksum: 0xb3540697373195f1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It is no fun when a query fails like this. However, what a joy, and what a point of pride, that it did not silently
return incorrect query results. Rest assured, Presto will not return incorrect results, wherever you
run it.&lt;/p&gt;

&lt;h1 id=&quot;credits&quot;&gt;Credits&lt;/h1&gt;

&lt;p&gt;Special thanks go to our customers, for your understanding and the trust you have in us.
Without you, Starburst wouldn’t be as fun a place as it is!
Thanks to &lt;a href=&quot;https://github.com/lukasz-walkiewicz&quot;&gt;Łukasz Walkiewicz&lt;/a&gt; and &lt;a href=&quot;https://github.com/sopel39&quot;&gt;Karol Sobczak&lt;/a&gt;
for fantastic benchmark and experimentation automation and your help with running the experiments!
Thanks to &lt;a href=&quot;https://github.com/willmostly&quot;&gt;Will Morrison&lt;/a&gt; for finding the Sender and Receiver machines
that reproduced the problem so nicely!
Thanks to &lt;a href=&quot;https://github.com/martint&quot;&gt;Martin Traverso&lt;/a&gt;, &lt;a href=&quot;https://github.com/dain&quot;&gt;Dain Sundstrom&lt;/a&gt;
and &lt;a href=&quot;https://github.com/electrum&quot;&gt;David Phillips&lt;/a&gt; for guidance, ideas, clever tips and code pointers!
Thanks to &lt;a href=&quot;https://github.com/losipiuk&quot;&gt;Łukasz Osipiuk&lt;/a&gt; for running experiments, cross-checking
the results, and helping us keep our sanity. Shout out to the whole Starburst team – it was truly a team effort!&lt;/p&gt;

&lt;p&gt;□&lt;/p&gt;</content>

      
        <author>
          <name>Piotr Findeisen, Starburst Data</name>
        </author>
      

      <summary>It all started on a Thursday afternoon in March, when Karol Sobczak was grilling Presto with heavy rounds of benchmarks as we were ramping up to the Starburst Enterprise Presto 332-e release. Karol discovered what seemed to be a serious regression, and it turned out to be an even more serious cloud environment issue.</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto at Zuora</title>
      <link href="https://trino.io/blog/2020/06/16/presto-summit-zuora.html" rel="alternate" type="text/html" title="Presto at Zuora" />
      <published>2020-06-16T00:00:00+00:00</published>
      <updated>2020-06-16T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/06/16/presto-summit-zuora</id>
      <content type="html" xml:base="https://trino.io/blog/2020/06/16/presto-summit-zuora.html">&lt;p&gt;The Presto Summit is morphing into a series of virtual events, and we already
started with the &lt;a href=&quot;/blog/2020/05/15/state-of-presto.html&quot;&gt;State of Presto webinar&lt;/a&gt; recently. Next up is a talk about Presto with
lots of practical insights at &lt;a href=&quot;https://zuora.com/&quot;&gt;Zuora&lt;/a&gt; presented by Henning
Schmiedehausen:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Using Presto as Query Layer in a Distributed Microservices Architecture&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Update:&lt;/p&gt;

&lt;p&gt;We had a great event with lots of questions from the audience, taking us beyond
the planned time frame. Check out the recording to learn more:&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/ICAPZksjP0k&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;!--more--&gt;

&lt;p&gt;Presto has found its place as a SQL-based query engine for big data in the new
stack, but it does not have to be limited to big data and large scale analytics
applications.&lt;/p&gt;

&lt;p&gt;In this presentation, Henning highlights how Presto helped Zuora to transform
its monolithic data architecture for an online transactional system into a
loosely coupled, services-based architecture. In doing so, it helped to solve the
most pressing problem when splitting up data: providing direct access to
production data across many services and enabling complex data queries across
live data. Zuora Data Query was an instant success when it was launched.&lt;/p&gt;

&lt;p&gt;In this webinar you will discover:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The technical architecture that embedded Presto in the Zuora service stack&lt;/li&gt;
  &lt;li&gt;The pieces of Presto that could be used directly off the shelf&lt;/li&gt;
  &lt;li&gt;How we productized it into a system that now serves huge numbers of small
queries against live data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Our speaker, Henning Schmiedehausen, Chief Architect at Zuora, is a thought
leader in the open source Java community with more than 25 years of experience
contributing to successful open source projects. At Zuora he serves as the chief
architect and is responsible for the technical aspects of transforming the Zuora
system to a new, scalable, and flexible Microservices Architecture. Prior to
Zuora he worked at Facebook and Groupon as a principal engineer. Henning also
served as a board member at the Apache Software Foundation.&lt;/p&gt;

&lt;p&gt;Date: Tuesday, 30 June 2020&lt;/p&gt;

&lt;p&gt;Time: 10am PDT (San Francisco), 1pm EDT (New York), 6pm BST (London), 5pm UTC&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h2 id=&quot;register-now&quot;&gt;&lt;a href=&quot;https://bit.ly/2YfPNne&quot;&gt;Register now!&lt;/a&gt;&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;p&gt;We look forward to many Presto users joining us.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>The Presto Summit is morphing into a series of virtual events, and we already started with the State of Presto webinar recently. Next up is a talk about Presto with lots of practical insights at Zuora presented by Henning Schmiedehausen: Using Presto as Query Layer in a Distributed Microservices Architecture Update: We had a great event with lots of questions from the audience, taking us beyond the planned time frame. Check out the recording to learn more:</summary>

      
      
    </entry>
  
    <entry>
      <title>Dynamic partition pruning</title>
      <link href="https://trino.io/blog/2020/06/14/dynamic-partition-pruning.html" rel="alternate" type="text/html" title="Dynamic partition pruning" />
      <published>2020-06-14T00:00:00+00:00</published>
      <updated>2020-06-14T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/06/14/dynamic-partition-pruning</id>
      <content type="html" xml:base="https://trino.io/blog/2020/06/14/dynamic-partition-pruning.html">&lt;p&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Star_schema&quot;&gt;Star-schema&lt;/a&gt; is one of the most widely used data mart patterns. 
The star schema consists of fact tables (usually partitioned) and dimension tables, 
which are used to filter rows from fact tables.
Consider the following query which captures a common pattern of a fact table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;store_sales&lt;/code&gt; partitioned by the column 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ss_sold_date_sk&lt;/code&gt; joined with a filtered dimension table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date_dim&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT COUNT(*) FROM 
store_sales JOIN date_dim ON store_sales.ss_sold_date_sk = date_dim.d_date_sk
WHERE d_following_holiday=&apos;Y&apos; AND d_year = 2000;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Without dynamic filtering, Presto will push predicates for the dimension table down to the table scan on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date_dim&lt;/code&gt;, but
it will scan all the data in the fact table, since there are no filters on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;store_sales&lt;/code&gt; in the query.
The join operator will end up throwing away most of the probe-side rows, as the join criterion is highly selective.
The current implementation of &lt;a href=&quot;https://trino.io/blog/2019/06/30/dynamic-filtering.html&quot;&gt;dynamic filtering&lt;/a&gt; improves
on this; however, it is limited to broadcast joins on tables stored in ORC or Parquet format.
Additionally, it does not take advantage of the layout of partitioned Hive tables.&lt;/p&gt;

&lt;p&gt;With dynamic partition pruning, which extends the current implementation of dynamic filtering, every worker node collects
the values eligible for the join from the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date_dim.d_date_sk&lt;/code&gt; column and passes them to the coordinator.
The coordinator can then skip processing the partitions of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;store_sales&lt;/code&gt; which don’t meet the join criteria.
This greatly reduces the amount of data scanned from the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;store_sales&lt;/code&gt; table by worker nodes.
This optimization is applicable to any storage format and to both broadcast and partitioned joins.&lt;/p&gt;
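&lt;p&gt;The mechanism can be illustrated with a small sketch (illustrative Python, not Presto’s actual code): collect the eligible join keys from the filtered dimension rows, then keep only the fact-table partitions whose partition key appears in that set:&lt;/p&gt;

```python
# Illustrative sketch of dynamic partition pruning: prune fact-table
# partitions using join keys collected from the filtered dimension table.
def collect_build_side_keys(dimension_rows, predicate, key):
    # Each worker collects eligible join-key values from its build-side rows.
    return {row[key] for row in dimension_rows if predicate(row)}

def prune_partitions(partitions, eligible_keys):
    # The coordinator skips partitions whose key cannot match the join.
    return [p for p in partitions if p in eligible_keys]

# Hypothetical sample data mirroring the query above:
date_dim = [
    {"d_date_sk": 2451545, "d_year": 2000, "d_following_holiday": "Y"},
    {"d_date_sk": 2451546, "d_year": 2000, "d_following_holiday": "N"},
    {"d_date_sk": 2451911, "d_year": 2001, "d_following_holiday": "Y"},
]
keys = collect_build_side_keys(
    date_dim,
    lambda r: r["d_following_holiday"] == "Y" and r["d_year"] == 2000,
    "d_date_sk")
# store_sales partitions are identified by ss_sold_date_sk values:
print(prune_partitions([2451545, 2451546, 2451911], keys))  # [2451545]
```

&lt;p&gt;Only one of the three partitions survives pruning, so worker nodes never read the other two at all.&lt;/p&gt;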

&lt;!--more--&gt;

&lt;h1 id=&quot;design-considerations&quot;&gt;Design considerations&lt;/h1&gt;

&lt;p&gt;This optimization requires dynamic filters collected by worker nodes to be communicated to the coordinator over the network.
We needed to ensure that this additional communication overhead does not overload the coordinator.
This was achieved by packing dynamic filters into Presto’s existing framework for sending status updates from worker to coordinator.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-main/src/main/java/io/prestosql/server/DynamicFilterService.java&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DynamicFilterService&lt;/code&gt;&lt;/a&gt; 
was added on the coordinator node to perform dynamic filter collection asynchronously.
Queries registered with this service can request dynamic filters while scheduling splits without blocking any operations.
This service is also responsible for ensuring that all the build-side tasks of a join stage have completed execution before 
constructing dynamic filters to be used in the scheduling of probe-side table scans by the coordinator.&lt;/p&gt;
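&lt;p&gt;The non-blocking contract can be sketched as follows (illustrative Python only; the name mirrors, but is not, the actual Java class): the filter is handed out only after every build-side task has reported, and callers that ask earlier simply proceed without it:&lt;/p&gt;

```python
# Hypothetical sketch of asynchronous dynamic filter collection: split
# scheduling asks for the filter without blocking, and gets it only once
# every build-side task has reported its values.
class DynamicFilterService:
    def __init__(self, build_side_tasks):
        self.pending = set(build_side_tasks)
        self.collected = set()

    def report(self, task, values):
        # Called as each build-side task completes on a worker.
        self.pending.discard(task)
        self.collected.update(values)

    def current_filter(self):
        # Non-blocking: callers fall back to scanning everything until
        # the filter is complete.
        if self.pending:
            return None
        return self.collected

service = DynamicFilterService(["task-0", "task-1"])
print(service.current_filter())  # None: build side still running
service.report("task-0", {1, 2})
service.report("task-1", {2, 3})
print(service.current_filter())  # {1, 2, 3}
```

&lt;p&gt;Returning an incomplete filter would be incorrect, since it could prune partitions that a still-running build-side task would have made eligible.&lt;/p&gt;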

&lt;h1 id=&quot;implementation&quot;&gt;Implementation&lt;/h1&gt;

&lt;p&gt;To identify opportunities for dynamic filtering in the logical plan, we rely on the implementation added in
&lt;a href=&quot;https://github.com/trinodb/trino/pull/91&quot;&gt;#91&lt;/a&gt;. Dynamic filters are modeled as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FunctionCall&lt;/code&gt; expressions which 
evaluate to a boolean value. They are created in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PredicatePushDown&lt;/code&gt; optimizer rule from the equi-join clauses of inner join 
nodes and pushed down in the plan along with other predicates. Dynamic filters are added to the plan after the cost-based 
optimization rules. This ensures that dynamic filters do not interfere with cost estimation and join reordering.
The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PredicatePushDown&lt;/code&gt; rule can end up pushing dynamic filters to unsupported places in the plan via predicate inference.
This was solved by adding the 
&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-main/src/main/java/io/prestosql/sql/planner/iterative/rule/RemoveUnsupportedDynamicFilters.java&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RemoveUnsupportedDynamicFilters&lt;/code&gt;&lt;/a&gt;
optimizer rule which is responsible for ensuring that:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Dynamic filters are present only directly above a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TableScan&lt;/code&gt; node and only if the subtree is on the probe side of some downstream &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JoinNode&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Dynamic filters are removed from a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JoinNode&lt;/code&gt; if there is no consumer for them in its probe-side subtree.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We also run &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-main/src/main/java/io/prestosql/sql/planner/sanity/DynamicFiltersChecker.java&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DynamicFiltersChecker&lt;/code&gt;&lt;/a&gt;
at the end of the planning phase to ensure that the above conditions have been satisfied by the optimized plan.&lt;/p&gt;

&lt;p&gt;We reuse the existing &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-main/src/main/java/io/prestosql/operator/DynamicFilterSourceOperator.java&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DynamicFilterSourceOperator&lt;/code&gt;&lt;/a&gt;
in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LocalExecutionPlanner&lt;/code&gt; to collect build-side values from each inner join on each worker node. In addition to passing the collected &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TupleDomain&lt;/code&gt;
to &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-main/src/main/java/io/prestosql/sql/planner/LocalDynamicFiltersCollector.java&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LocalDynamicFiltersCollector&lt;/code&gt;&lt;/a&gt; 
within the same worker node for use in broadcast join probe-side scans, we also pass them to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TaskContext&lt;/code&gt; to populate task 
status updates for the coordinator.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ContinuousTaskStatusFetcher&lt;/code&gt; on the coordinator node pulls task status updates from all worker nodes every
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;task.status-refresh-max-wait&lt;/code&gt; seconds (1 second by default) at the latest, or sooner if the task status changes. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DynamicFilterService&lt;/code&gt; 
on the coordinator regularly polls for dynamic filters from task status updates through &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SqlQueryExecution&lt;/code&gt; and provides
an interface to supply dynamic filters when they are ready. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ConnectorSplitManager#getSplits&lt;/code&gt; API has been updated to
optionally utilize dynamic filters supplied by the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DynamicFilterService&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In the Hive connector, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BackgroundHiveSplitLoader&lt;/code&gt; can apply dynamic filtering by either completely skipping the listing
of files within a partition, or by avoiding the creation of splits within a loaded partition if the dynamic filters 
become available in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;InternalHiveSplitFactory#createInternalHiveSplit&lt;/code&gt; due to lazy enumeration of splits.&lt;/p&gt;
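&lt;p&gt;The lazy enumeration is what makes mid-scan pruning possible, as in this hypothetical sketch (not the actual connector code): splits are generated on demand, so a dynamic filter that becomes available part-way through still prunes all remaining partitions:&lt;/p&gt;

```python
# Hypothetical sketch of lazy split pruning: partitions are enumerated
# lazily, so a dynamic filter that arrives mid-scan still prunes the
# remaining work, including skipping file listings for whole partitions.
def generate_splits(partitions, files_in, dynamic_filter):
    for partition in partitions:
        # Skip the whole partition (no file listing) once the filter is known.
        if dynamic_filter.ready and partition not in dynamic_filter.values:
            continue
        for file_name in files_in(partition):
            yield (partition, file_name)

class DynamicFilter:
    def __init__(self):
        self.ready = False
        self.values = set()

f = DynamicFilter()
splits = generate_splits([1, 2, 3], lambda p: ["a", "b"], f)
first = next(splits)  # filter not ready yet: partition 1 is scanned
f.ready = True
f.values = {3}        # filter arrives: only partition 3 can match
rest = list(splits)   # partition 2 is skipped entirely
print(first, rest)
```

&lt;p&gt;Splits produced before the filter arrived are still processed, which is safe: pruning only skips work, and never changes results.&lt;/p&gt;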

&lt;h1 id=&quot;benchmarks&quot;&gt;Benchmarks&lt;/h1&gt;

&lt;p&gt;We ran TPC-DS queries on a cluster of 5 r4.8xlarge worker nodes, using data stored in ORC format.
The TPC-DS tables were partitioned as follows:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;catalog_returns&lt;/code&gt; on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cr_returned_date_sk&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;catalog_sales&lt;/code&gt; on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cs_sold_date_sk&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;store_returns&lt;/code&gt; on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sr_returned_date_sk&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;store_sales&lt;/code&gt; on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ss_sold_date_sk&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;web_returns&lt;/code&gt; on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;wr_returned_date_sk&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;web_sales&lt;/code&gt; on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ws_sold_date_sk&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/hdinsight/tpcds-hdinsight/blob/master/ddl/createAllORCTables.hql&quot;&gt;createAllORCTables.hql&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The following queries ran faster by more than 20% with dynamic partition pruning (measuring elapsed time in seconds,
CPU time in minutes, and data read in MB).&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Query&lt;/th&gt;
      &lt;th&gt;Baseline elapsed&lt;/th&gt;
      &lt;th&gt;Dynamic partition pruning elapsed&lt;/th&gt;
      &lt;th&gt;Baseline CPU&lt;/th&gt;
      &lt;th&gt;Dynamic partition pruning CPU&lt;/th&gt;
      &lt;th&gt;Baseline data read&lt;/th&gt;
      &lt;th&gt;Dynamic partition pruning data read&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;q01&lt;/td&gt;
      &lt;td&gt;10.96&lt;/td&gt;
      &lt;td&gt;8.50&lt;/td&gt;
      &lt;td&gt;10.2&lt;/td&gt;
      &lt;td&gt;8.9&lt;/td&gt;
      &lt;td&gt;17.91&lt;/td&gt;
      &lt;td&gt;14.53&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q04&lt;/td&gt;
      &lt;td&gt;21.63&lt;/td&gt;
      &lt;td&gt;10.80&lt;/td&gt;
      &lt;td&gt;23.6&lt;/td&gt;
      &lt;td&gt;16.1&lt;/td&gt;
      &lt;td&gt;34.81&lt;/td&gt;
      &lt;td&gt;12.99&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q05&lt;/td&gt;
      &lt;td&gt;41.38&lt;/td&gt;
      &lt;td&gt;14.94&lt;/td&gt;
      &lt;td&gt;57.1&lt;/td&gt;
      &lt;td&gt;16.8&lt;/td&gt;
      &lt;td&gt;54.81&lt;/td&gt;
      &lt;td&gt;11.45&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q07&lt;/td&gt;
      &lt;td&gt;12.35&lt;/td&gt;
      &lt;td&gt;9.26&lt;/td&gt;
      &lt;td&gt;26.4&lt;/td&gt;
      &lt;td&gt;14.6&lt;/td&gt;
      &lt;td&gt;30.28&lt;/td&gt;
      &lt;td&gt;17.31&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q08&lt;/td&gt;
      &lt;td&gt;10.48&lt;/td&gt;
      &lt;td&gt;6.43&lt;/td&gt;
      &lt;td&gt;11.0&lt;/td&gt;
      &lt;td&gt;4.7&lt;/td&gt;
      &lt;td&gt;10.19&lt;/td&gt;
      &lt;td&gt;3.52&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q11&lt;/td&gt;
      &lt;td&gt;20.04&lt;/td&gt;
      &lt;td&gt;14.82&lt;/td&gt;
      &lt;td&gt;35.6&lt;/td&gt;
      &lt;td&gt;27.8&lt;/td&gt;
      &lt;td&gt;25.37&lt;/td&gt;
      &lt;td&gt;9.72&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q17&lt;/td&gt;
      &lt;td&gt;24.05&lt;/td&gt;
      &lt;td&gt;9.87&lt;/td&gt;
      &lt;td&gt;26.4&lt;/td&gt;
      &lt;td&gt;12.0&lt;/td&gt;
      &lt;td&gt;30.18&lt;/td&gt;
      &lt;td&gt;9.75&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q18&lt;/td&gt;
      &lt;td&gt;13.98&lt;/td&gt;
      &lt;td&gt;6.00&lt;/td&gt;
      &lt;td&gt;17.5&lt;/td&gt;
      &lt;td&gt;7.7&lt;/td&gt;
      &lt;td&gt;20.29&lt;/td&gt;
      &lt;td&gt;8.81&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q25&lt;/td&gt;
      &lt;td&gt;18.91&lt;/td&gt;
      &lt;td&gt;8.04&lt;/td&gt;
      &lt;td&gt;26.9&lt;/td&gt;
      &lt;td&gt;9.1&lt;/td&gt;
      &lt;td&gt;37.54&lt;/td&gt;
      &lt;td&gt;11.12&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q27&lt;/td&gt;
      &lt;td&gt;11.98&lt;/td&gt;
      &lt;td&gt;5.58&lt;/td&gt;
      &lt;td&gt;25.1&lt;/td&gt;
      &lt;td&gt;8.6&lt;/td&gt;
      &lt;td&gt;26.69&lt;/td&gt;
      &lt;td&gt;10.12&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q29&lt;/td&gt;
      &lt;td&gt;24.11&lt;/td&gt;
      &lt;td&gt;15.46&lt;/td&gt;
      &lt;td&gt;30.5&lt;/td&gt;
      &lt;td&gt;18.5&lt;/td&gt;
      &lt;td&gt;30.18&lt;/td&gt;
      &lt;td&gt;13.50&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q31&lt;/td&gt;
      &lt;td&gt;27.81&lt;/td&gt;
      &lt;td&gt;12.77&lt;/td&gt;
      &lt;td&gt;48.2&lt;/td&gt;
      &lt;td&gt;21.3&lt;/td&gt;
      &lt;td&gt;39.53&lt;/td&gt;
      &lt;td&gt;13.73&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q32&lt;/td&gt;
      &lt;td&gt;11.51&lt;/td&gt;
      &lt;td&gt;8.15&lt;/td&gt;
      &lt;td&gt;12.7&lt;/td&gt;
      &lt;td&gt;10.3&lt;/td&gt;
      &lt;td&gt;15.05&lt;/td&gt;
      &lt;td&gt;12.76&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q33&lt;/td&gt;
      &lt;td&gt;15.95&lt;/td&gt;
      &lt;td&gt;4.31&lt;/td&gt;
      &lt;td&gt;24.3&lt;/td&gt;
      &lt;td&gt;5.4&lt;/td&gt;
      &lt;td&gt;31.26&lt;/td&gt;
      &lt;td&gt;6.67&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q35&lt;/td&gt;
      &lt;td&gt;15.10&lt;/td&gt;
      &lt;td&gt;5.22&lt;/td&gt;
      &lt;td&gt;13.8&lt;/td&gt;
      &lt;td&gt;6.2&lt;/td&gt;
      &lt;td&gt;4.83&lt;/td&gt;
      &lt;td&gt;1.70&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q36&lt;/td&gt;
      &lt;td&gt;11.68&lt;/td&gt;
      &lt;td&gt;6.43&lt;/td&gt;
      &lt;td&gt;22.4&lt;/td&gt;
      &lt;td&gt;11.4&lt;/td&gt;
      &lt;td&gt;24.28&lt;/td&gt;
      &lt;td&gt;12.78&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q38&lt;/td&gt;
      &lt;td&gt;21.08&lt;/td&gt;
      &lt;td&gt;16.20&lt;/td&gt;
      &lt;td&gt;39.4&lt;/td&gt;
      &lt;td&gt;31.6&lt;/td&gt;
      &lt;td&gt;5.65&lt;/td&gt;
      &lt;td&gt;3.15&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q40&lt;/td&gt;
      &lt;td&gt;37.40&lt;/td&gt;
      &lt;td&gt;11.98&lt;/td&gt;
      &lt;td&gt;37.7&lt;/td&gt;
      &lt;td&gt;8.4&lt;/td&gt;
      &lt;td&gt;17.02&lt;/td&gt;
      &lt;td&gt;9.20&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q46&lt;/td&gt;
      &lt;td&gt;11.57&lt;/td&gt;
      &lt;td&gt;9.06&lt;/td&gt;
      &lt;td&gt;24.4&lt;/td&gt;
      &lt;td&gt;17.3&lt;/td&gt;
      &lt;td&gt;18.51&lt;/td&gt;
      &lt;td&gt;14.19&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q48&lt;/td&gt;
      &lt;td&gt;20.48&lt;/td&gt;
      &lt;td&gt;12.65&lt;/td&gt;
      &lt;td&gt;42.3&lt;/td&gt;
      &lt;td&gt;22.5&lt;/td&gt;
      &lt;td&gt;20.71&lt;/td&gt;
      &lt;td&gt;11.54&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q49&lt;/td&gt;
      &lt;td&gt;26.69&lt;/td&gt;
      &lt;td&gt;16.01&lt;/td&gt;
      &lt;td&gt;38.8&lt;/td&gt;
      &lt;td&gt;12.0&lt;/td&gt;
      &lt;td&gt;68.67&lt;/td&gt;
      &lt;td&gt;30.57&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q50&lt;/td&gt;
      &lt;td&gt;46.90&lt;/td&gt;
      &lt;td&gt;33.22&lt;/td&gt;
      &lt;td&gt;43.4&lt;/td&gt;
      &lt;td&gt;42.5&lt;/td&gt;
      &lt;td&gt;21.30&lt;/td&gt;
      &lt;td&gt;16.77&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q54&lt;/td&gt;
      &lt;td&gt;43.05&lt;/td&gt;
      &lt;td&gt;11.39&lt;/td&gt;
      &lt;td&gt;27.5&lt;/td&gt;
      &lt;td&gt;14.8&lt;/td&gt;
      &lt;td&gt;17.71&lt;/td&gt;
      &lt;td&gt;11.52&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q56&lt;/td&gt;
      &lt;td&gt;16.23&lt;/td&gt;
      &lt;td&gt;4.12&lt;/td&gt;
      &lt;td&gt;23.8&lt;/td&gt;
      &lt;td&gt;5.5&lt;/td&gt;
      &lt;td&gt;31.26&lt;/td&gt;
      &lt;td&gt;6.72&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q60&lt;/td&gt;
      &lt;td&gt;16.39&lt;/td&gt;
      &lt;td&gt;6.02&lt;/td&gt;
      &lt;td&gt;25.1&lt;/td&gt;
      &lt;td&gt;6.6&lt;/td&gt;
      &lt;td&gt;31.26&lt;/td&gt;
      &lt;td&gt;7.42&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q61&lt;/td&gt;
      &lt;td&gt;17.18&lt;/td&gt;
      &lt;td&gt;5.50&lt;/td&gt;
      &lt;td&gt;33.4&lt;/td&gt;
      &lt;td&gt;7.1&lt;/td&gt;
      &lt;td&gt;42.63&lt;/td&gt;
      &lt;td&gt;9.37&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q66&lt;/td&gt;
      &lt;td&gt;13.67&lt;/td&gt;
      &lt;td&gt;6.59&lt;/td&gt;
      &lt;td&gt;19.1&lt;/td&gt;
      &lt;td&gt;8.9&lt;/td&gt;
      &lt;td&gt;19.63&lt;/td&gt;
      &lt;td&gt;8.34&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q69&lt;/td&gt;
      &lt;td&gt;9.89&lt;/td&gt;
      &lt;td&gt;7.46&lt;/td&gt;
      &lt;td&gt;10.5&lt;/td&gt;
      &lt;td&gt;6.1&lt;/td&gt;
      &lt;td&gt;4.83&lt;/td&gt;
      &lt;td&gt;3.16&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q71&lt;/td&gt;
      &lt;td&gt;17.32&lt;/td&gt;
      &lt;td&gt;6.11&lt;/td&gt;
      &lt;td&gt;23.3&lt;/td&gt;
      &lt;td&gt;6.6&lt;/td&gt;
      &lt;td&gt;31.26&lt;/td&gt;
      &lt;td&gt;8.06&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q74&lt;/td&gt;
      &lt;td&gt;16.86&lt;/td&gt;
      &lt;td&gt;9.44&lt;/td&gt;
      &lt;td&gt;24.1&lt;/td&gt;
      &lt;td&gt;17.6&lt;/td&gt;
      &lt;td&gt;22.59&lt;/td&gt;
      &lt;td&gt;8.08&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q75&lt;/td&gt;
      &lt;td&gt;122.04&lt;/td&gt;
      &lt;td&gt;69.45&lt;/td&gt;
      &lt;td&gt;102.7&lt;/td&gt;
      &lt;td&gt;62.9&lt;/td&gt;
      &lt;td&gt;110.86&lt;/td&gt;
      &lt;td&gt;63.91&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q77&lt;/td&gt;
      &lt;td&gt;23.94&lt;/td&gt;
      &lt;td&gt;7.51&lt;/td&gt;
      &lt;td&gt;29.3&lt;/td&gt;
      &lt;td&gt;6.8&lt;/td&gt;
      &lt;td&gt;49.95&lt;/td&gt;
      &lt;td&gt;12.20&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q80&lt;/td&gt;
      &lt;td&gt;43.46&lt;/td&gt;
      &lt;td&gt;18.57&lt;/td&gt;
      &lt;td&gt;45.8&lt;/td&gt;
      &lt;td&gt;11.5&lt;/td&gt;
      &lt;td&gt;37.25&lt;/td&gt;
      &lt;td&gt;11.78&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q85&lt;/td&gt;
      &lt;td&gt;20.97&lt;/td&gt;
      &lt;td&gt;16.54&lt;/td&gt;
      &lt;td&gt;16.9&lt;/td&gt;
      &lt;td&gt;14.7&lt;/td&gt;
      &lt;td&gt;14.65&lt;/td&gt;
      &lt;td&gt;10.52&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/dynamic-partition-pruning/benchmark.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;18 TPC-DS queries improved runtime by over 50% while decreasing CPU usage by an average of 64%.
Data read was decreased by 66%.&lt;/li&gt;
  &lt;li&gt;7 TPC-DS queries improved by 30% to 50% while decreasing CPU usage by an average of 47%.
Data read was decreased by 54%.&lt;/li&gt;
  &lt;li&gt;29 TPC-DS queries improved by 10% to 30% while decreasing CPU by an average of 20%.
Data read was decreased by 27%.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note that the baseline here includes the improvements from the existing 
&lt;a href=&quot;https://github.com/trinodb/trino/pull/1686&quot;&gt;node local dynamic filtering&lt;/a&gt; implementation.&lt;/p&gt;

&lt;h1 id=&quot;discussion&quot;&gt;Discussion&lt;/h1&gt;

&lt;p&gt;For dynamic filtering to work, the smaller dimension table needs to be chosen as a join’s build side.
The cost-based optimizer can do this automatically using table statistics from the metastore.
Therefore, we generated table statistics prior to running this benchmark and relied on the CBO to correctly place
the smaller table on the build side of the join.&lt;/p&gt;
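
&lt;p&gt;As a sketch, the statistics can be collected with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ANALYZE&lt;/code&gt; statement before running the benchmark queries; the catalog and schema names here are illustrative:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Collect table and column statistics so the CBO can place
-- the smaller table on the build side of the join
ANALYZE hive.tpcds.store_sales;
ANALYZE hive.tpcds.date_dim;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;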

&lt;p&gt;It is quite common for large fact tables to be partitioned by dimensions like time.
Queries joining such tables with filtered dimension tables benefit significantly from dynamic partition pruning. 
This optimization is applicable to partitioned Hive tables stored in any data format.
It also works with both broadcast and partitioned joins. Other connectors can easily take advantage of dynamic filters
by implementing the new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ConnectorSplitManager#getSplits&lt;/code&gt; API, which supplies dynamic filters to the connector.&lt;/p&gt;
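
&lt;p&gt;For illustration, the query shape that benefits is a partitioned fact table joined with a filtered dimension table; the tables below follow the TPC-DS schema used in the benchmark:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- store_sales is partitioned on ss_sold_date_sk; the dynamic filter
-- built from the filtered date_dim rows can prune whole partitions
SELECT COUNT(*)
FROM store_sales
JOIN date_dim ON store_sales.ss_sold_date_sk = date_dim.d_date_sk
WHERE d_following_holiday = &apos;Y&apos;
  AND d_year = 2000;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;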

&lt;h1 id=&quot;future-work&quot;&gt;Future work&lt;/h1&gt;

&lt;ul&gt;
  &lt;li&gt;Support for using &lt;a href=&quot;https://github.com/trinodb/trino/pull/3871&quot;&gt;min-max range&lt;/a&gt; in DynamicFilterSourceOperator when 
the build-side contains too many values.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/3972&quot;&gt;Passing dynamic filters back to the worker nodes&lt;/a&gt; from the coordinator
to allow ORC and Parquet readers to use dynamic filters with partitioned joins.&lt;/li&gt;
  &lt;li&gt;Allow connectors to &lt;a href=&quot;https://github.com/trinodb/trino/pull/3414&quot;&gt;block probe-side scan&lt;/a&gt; until dynamic filters are ready.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/2674&quot;&gt;Support dynamic filtering with inequality operators&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/2190&quot;&gt;Support for semi-joins&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Take advantage of dynamic filters in connectors other than Hive.&lt;/li&gt;
&lt;/ul&gt;</content>

      
        <author>
          <name>Raunaq Morarka, Qubole and Karol Sobczak, Starburst Data</name>
        </author>
      

      <summary>Star-schema is one of the most widely used data mart patterns. The star schema consists of fact tables (usually partitioned) and dimension tables, which are used to filter rows from fact tables. Consider the following query which captures a common pattern of a fact table store_sales partitioned by the column ss_sold_date_sk joined with a filtered dimension table date_dim: SELECT COUNT(*) FROM store_sales JOIN date_dim ON store_sales.ss_sold_date_sk = date_dim.d_date_sk WHERE d_following_holiday=&apos;Y&apos; AND d_year = 2000; Without dynamic filtering, Presto will push predicates for the dimension table to the table scan on date_dim but it will scan all the data in the fact table since there are no filters on store_sales in the query. The join operator will end up throwing away most of the probe-side rows as the join criteria is highly selective. The current implementation of dynamic filtering improves on this, however it is limited only to broadcast joins on tables stored in ORC or Parquet format. Additionally, it does not take advantage of the layout of partitioned Hive tables. With dynamic partition pruning, which extends the current implementation of dynamic filtering, every worker node collects values eligible for the join from date_dim.d_date_sk column and passes it to the coordinator. Coordinator can then skip processing of the partitions of store_sales which don’t meet the join criteria. This greatly reduces the amount of data scanned from store_sales table by worker nodes. This optimization is applicable to any storage format and to both broadcast and partitioned join.</summary>

      
      
    </entry>
  
    <entry>
      <title>Hive ACID and transactional tables&apos; support in Presto</title>
      <link href="https://trino.io/blog/2020/06/01/hive-acid.html" rel="alternate" type="text/html" title="Hive ACID and transactional tables&apos; support in Presto" />
      <published>2020-06-01T00:00:00+00:00</published>
      <updated>2020-06-01T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/06/01/hive-acid</id>
      <content type="html" xml:base="https://trino.io/blog/2020/06/01/hive-acid.html">&lt;p&gt;Hive ACID and transactional tables have been supported in Presto since the 331
release. Hive ACID support is an important step towards GDPR/CCPA compliance,
and also towards Hive 3 support as &lt;a href=&quot;https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.0/hive-overview/content/hive_upgrade_changes.html&quot;&gt;certain distributions&lt;/a&gt;
of Hive 3 create transactional tables by default.&lt;/p&gt;

&lt;p&gt;In this blog post we cover the concepts of Hive ACID and transactional
tables along with the changes done in Presto to support them. We also cover the
performance tests on this integration and look at the future plans for this
feature.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;how-to-use-hive-acid-and-transactional-tables-in-presto&quot;&gt;How to use Hive ACID and transactional tables in Presto&lt;/h1&gt;

&lt;p&gt;Hive transactional tables are readable in Presto without any configuration
changes; you only need to meet these requirements:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Use Presto version 331 or higher&lt;/li&gt;
  &lt;li&gt;Use a Hive 3 Metastore server. Presto does not support Hive transactional
tables created with Hive before version 3.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Note that Presto cannot create or write to Hive transactional tables yet. You
can create and write to Hive transactional tables via
&lt;a href=&quot;https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions&quot;&gt;Hive&lt;/a&gt;
or via Spark with &lt;a href=&quot;https://github.com/qubole/spark-acid&quot;&gt;Hive ACID Data Source plugin&lt;/a&gt; and
use Presto to read these tables.&lt;/p&gt;
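
&lt;p&gt;As a minimal sketch, assuming a Hive 3 cluster, the flow looks like this; the table name and catalog are illustrative:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- In Hive: create and populate a CRUD transactional table
CREATE TABLE events (id INT, payload STRING)
STORED AS ORC
TBLPROPERTIES (&apos;transactional&apos;=&apos;true&apos;);

INSERT INTO events VALUES (1, &apos;created&apos;), (2, &apos;updated&apos;);

-- In Presto: read it like any other Hive table
SELECT * FROM hive.default.events;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;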

&lt;h1 id=&quot;what-is-hive-acid-and-hive-transactional-tables&quot;&gt;What are Hive ACID and Hive transactional tables&lt;/h1&gt;
&lt;p&gt;Hive transactional tables are the tables in Hive that provide ACID semantics.
This excerpt from
&lt;a href=&quot;https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions&quot;&gt;Hive documentation&lt;/a&gt;
covers ACID traits well:&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;“ACID stands for four traits of database transactions:
Atomicity (an operation either succeeds completely or fails,
it does not leave partial data), Consistency (once an application performs an
operation the results of that operation are visible to it in every subsequent
operation), Isolation (an incomplete operation by one user does not cause
unexpected side effects for other users), and Durability (once an operation is
complete it will be preserved even in the face of machine or system failure).
These traits have long been expected of database systems as part of their
transaction functionality.“&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1 id=&quot;need-for-hive-acid-and-transactional-tables&quot;&gt;Need for Hive ACID and transactional tables&lt;/h1&gt;
&lt;p&gt;In any organisation, there is always a need to update or delete existing entries
in tables, e.g., a user updates the review for an item purchased a week ago, or a
transaction’s status changes after a day.
With regulations like GDPR/CCPA, updates and deletes become even more frequent, as
users can ask the organisation to delete the data it holds on them, and
organisations are obligated to fulfill these requests.&lt;/p&gt;

&lt;p&gt;The standard practice for updating data has been to overwrite the partition or
table with the updated data, but this is inefficient and unreliable. Rewriting
all of the existing data just to update a few entries takes a lot of resources,
and more importantly there are isolation issues: the overwrite starts deleting
data that in-flight reads are still consuming. To solve these issues
several solutions have been developed; many of them are covered
&lt;a href=&quot;https://www.qubole.com/blog/qubole-open-sources-multi-engine-support-for-updates-and-deletes-in-data-lakes/&quot;&gt;in this blog post&lt;/a&gt;,
and Hive ACID is one of them.&lt;/p&gt;

&lt;h1 id=&quot;concepts-of-hive-acid-and-transactional-tables&quot;&gt;Concepts of Hive ACID and transactional tables&lt;/h1&gt;

&lt;p&gt;Hive adds several concepts, such as transactions, WriteIds, delta directories,
and locks, to achieve ACID semantics. To understand the changes made in Presto to
support Hive ACID and transactional tables, covered in the next section, it is
important to understand these concepts first. So let’s look at them in detail.&lt;/p&gt;

&lt;h2 id=&quot;types-of-hive-transactional-tables&quot;&gt;Types of Hive transactional tables&lt;/h2&gt;
&lt;p&gt;There are two types of Hive transactional tables: Insert-Only transactional
tables and CRUD transactional tables.
The following table compares the two:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th style=&quot;text-align: center&quot;&gt;Type of transactional table&lt;/th&gt;
      &lt;th style=&quot;text-align: center&quot;&gt;Hive DML Operations Supported&lt;/th&gt;
      &lt;th style=&quot;text-align: center&quot;&gt;Input Formats supported&lt;/th&gt;
      &lt;th style=&quot;text-align: center&quot;&gt;Synthetic columns in file?&lt;/th&gt;
      &lt;th style=&quot;text-align: center&quot;&gt;Additional Table Properties&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Insert-Only Transactional Tables&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;INSERT&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;All input formats&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;No&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&apos;transactional&apos;=&apos;true&apos;&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&apos;transactional_properties&apos;=&apos;insert_only&apos;&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;CRUD Transactional Tables&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;INSERT, UPDATE, DELETE&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;ORC&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Yes&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&apos;transactional&apos;=&apos;true&apos;&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
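
&lt;p&gt;In Hive DDL the two types differ only in the storage format and the table properties they are created with; the table names below are illustrative:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Insert-Only transactional table: any input format works
CREATE TABLE clicks_insert_only (id INT, url STRING)
TBLPROPERTIES (&apos;transactional&apos;=&apos;true&apos;,
               &apos;transactional_properties&apos;=&apos;insert_only&apos;);

-- CRUD transactional table: must be stored as ORC
CREATE TABLE clicks_crud (id INT, url STRING)
STORED AS ORC
TBLPROPERTIES (&apos;transactional&apos;=&apos;true&apos;);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;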

&lt;h2 id=&quot;hive-transactions&quot;&gt;Hive Transactions&lt;/h2&gt;
&lt;p&gt;Hive transactional tables should only be accessed under Hive transactions. Note that
these transactions are different from Presto transactions and are managed by
Hive. Running DML queries under separate transactions provides atomicity: each
transaction is independent, and when rolled back it has no impact on the
state of the table.&lt;/p&gt;

&lt;h2 id=&quot;writeids&quot;&gt;WriteIds&lt;/h2&gt;
&lt;p&gt;DML queries under a transaction write to a unique location under the partition/table,
described in detail later in the “New Sub-Directories” section. This location is derived
from the WriteId allocated to the transaction. This provides isolation of DML queries,
so such queries can run in parallel, whenever they can, without interfering
with each other.&lt;/p&gt;

&lt;h2 id=&quot;valid-writeids&quot;&gt;Valid WriteIds&lt;/h2&gt;
&lt;p&gt;Read queries under a transaction get a list of valid WriteIds that belong to the
transactions which were successfully committed. This ensures Consistency by
making the results of committed transactions available to all future
transactions, and also provides Isolation: DML and read queries can run in
parallel, with read queries never seeing partial data written by DML queries.&lt;/p&gt;

&lt;h2 id=&quot;new-sub-directories&quot;&gt;New Sub-Directories&lt;/h2&gt;
&lt;p&gt;Results of a DML query are written to a unique location derived from the WriteId
of the transaction. These unique locations are delta directories under the
partition/table location. Apart from the WriteId, this unique location encodes
the DML operation, and depending on the operation type there can be two
types of delta directories:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Delete Delta Directory: This delta directory is created for results of
DELETE statements and is named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta_&amp;lt;writeId&amp;gt;_&amp;lt;writeId&amp;gt;&lt;/code&gt; under
partition/table location.&lt;/li&gt;
  &lt;li&gt;Delta Directory: This type is created for the results of INSERT statements
and is named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delta_&amp;lt;writeId&amp;gt;_&amp;lt;writeId&amp;gt;&lt;/code&gt; under partition/table location.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Apart from delta directories, there is another sub-directory, called the “base
directory”, which is named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;base_&amp;lt;writeId&amp;gt;&lt;/code&gt; under the partition/table
location. This type of directory is created by an INSERT OVERWRITE TABLE query or
by major compaction, which is described later.&lt;/p&gt;
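
&lt;p&gt;A hedged sketch of how these sub-directories accumulate for an unpartitioned table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&lt;/code&gt;; the WriteIds and the source table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s&lt;/code&gt; are illustrative:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;INSERT INTO t VALUES (1);      -- creates delta_0000001_0000001/
INSERT INTO t VALUES (2);      -- creates delta_0000002_0000002/
DELETE FROM t WHERE id = 1;    -- creates delete_delta_0000003_0000003/
INSERT OVERWRITE TABLE t
SELECT * FROM s;               -- creates base_0000004/
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;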

&lt;p&gt;The following animation shows how these new sub-directories are created in the
filesystem along with transaction management at metastore with different
queries:
&lt;img src=&quot;/assets/blog/hive-acid/directories.gif&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;rowid&quot;&gt;RowID&lt;/h2&gt;
&lt;p&gt;To uniquely identify each row in the table, a synthetic rowId is created and
added to each row. RowIds are added to CRUD transactional tables only, because
they are needed only for DELETE statements. When a DELETE is performed, the
rowIds of the rows it deletes are written into the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt;
directory, and subsequent reads will read all but these rows.&lt;/p&gt;

&lt;p&gt;RowId is made up of five fields today: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;operation&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;originalTransaction&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bucket&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rowId&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;currentTransaction&lt;/code&gt;, but the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;operation&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;currentTransaction&lt;/code&gt; fields
are redundant now.
RowId is added in the root STRUCT of ORC and hence the schema of ORC files is
different from the schema defined in the table, e.g.:&lt;/p&gt;

&lt;p&gt;Schema of CRUD transactional Hive Table:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;n_nationkey : int,
n_name : string,
n_regionkey : int,
n_comment : string
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Schema of ORC file for this table:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;struct {
    operation : int,
    originalTransaction : bigint,
    bucket : int,
    rowId : bigint,
    currentTransaction : bigint,
    row : struct {
        n_nationkey : int,
        n_name : string,
        n_regionkey : int,
        n_comment : string
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Note that one level of nesting of the table schema, like the inner struct above, is
applicable to flat Hive tables too. The two-level nesting of data columns is
added for ORC files of CRUD transactional tables to keep the rowId columns isolated
from the data columns.&lt;/p&gt;

&lt;h2 id=&quot;compactions&quot;&gt;Compactions&lt;/h2&gt;
&lt;p&gt;The mechanism described above, with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delta&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt; directories for each
transaction, makes DML queries execute fast but has
the following impact on read queries:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Many delta directories, each containing little data, slow down
execution of read queries. This is the well-known
small-files problem, where engines end up spending more time opening files than actually
processing the data.&lt;/li&gt;
  &lt;li&gt;Cross-referencing all &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt; directories to remove all deleted rows
slows down reads.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To solve these problems, Hive compacts delta directories asynchronously at two
levels:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Minor Compaction: This compaction combines active &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delta&lt;/code&gt; directories into one
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delta&lt;/code&gt; directory and active &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt; directories into one &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt;
directory thereby decreasing the number of small files. Limiting scope of this
compaction to combining only &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delta&lt;/code&gt; directories keeps it fast. Minor compaction
is automatically triggered as soon as active delta directories count reaches
10 (configurable). This compaction creates new delta directories like
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delta_&amp;lt;start_write_id&amp;gt;_&amp;lt;end_write_id&amp;gt;&lt;/code&gt; where [start_write_id, end_write_id]
gives the range of existing delta directories that were compacted. A similar naming
convention is used for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt; directories.&lt;/li&gt;
  &lt;li&gt;Major Compaction: Minor compaction does not merge base, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delta&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt; directories, as that requires rewriting the data with only the
non-deleted rows and is hence time-consuming. This work is handled by a separate, less
frequent and longer-running compaction called Major compaction. Major
compaction is triggered when the total size of delta directories reaches
10% (configurable) of the base directory size. This compaction creates a new
Base directory.&lt;/li&gt;
&lt;/ol&gt;
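
&lt;p&gt;Besides the automatic triggers, compactions can also be requested manually with Hive DDL, as a sketch; the table name is illustrative:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Merge active delta/delete_delta directories into one of each
ALTER TABLE t COMPACT &apos;minor&apos;;

-- Rewrite base and deltas into a new base directory
ALTER TABLE t COMPACT &apos;major&apos;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;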

&lt;h2 id=&quot;locks&quot;&gt;Locks&lt;/h2&gt;
&lt;p&gt;Hive uses shared locks to control what operations can run in parallel on
partition/table. For example, DML queries take a write-lock on partitions they
are modifying while read queries take a read-lock on partitions they are
reading. The read-locks taken by read queries prevent Hive from cleaning up the
delta directories that have been compacted while they are still being read by a
query.&lt;/p&gt;

&lt;h1 id=&quot;changes-in-presto-to-support-hive-acid-and-transactional-ables&quot;&gt;Changes in Presto to support Hive ACID and transactional tables&lt;/h1&gt;

&lt;p&gt;At a high level, there are changes in two places in Presto to support Hive ACID
and transactional tables: in the split generation logic that runs on the coordinator,
and in the ORC reader that is used on the workers.&lt;/p&gt;

&lt;h2 id=&quot;split-generation&quot;&gt;Split generation&lt;/h2&gt;

&lt;ol&gt;
  &lt;li&gt;Hive ACID state is set up in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SemiTransactionalHiveMetastore.beginQuery&lt;/code&gt;,
only for Hive transactional tables:
    &lt;ol&gt;
      &lt;li&gt;A new Hive transaction is opened per query&lt;/li&gt;
      &lt;li&gt;A shared read-lock is obtained from Metastore server for the partitions
 read in the query&lt;/li&gt;
      &lt;li&gt;A heartbeat mechanism is set up to periodically inform the Metastore server
 about liveness. The heartbeat frequency is obtained from the
 Metastore server but can be overridden with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive.transaction-heartbeat-interval&lt;/code&gt;
 property.&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BackgroundSplitLoader&lt;/code&gt; is set up with valid WriteIds for the partitions as
provided by Metastore server&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BackgroundSplitLoader.loadPartitions&lt;/code&gt; is called in an Executor to create
splits for each partition:
    &lt;ol&gt;
      &lt;li&gt;ACID sub-directories: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;base&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delta&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt; directories are
 figured out by listing the partition location&lt;/li&gt;
      &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DeleteDeltaLocations&lt;/code&gt;, a registry of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt; directories, is
 created. It contains minimal information through which &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt;
 directory paths can be recreated at workers.&lt;/li&gt;
      &lt;li&gt;HiveSplits are created with each location of base and delta directories.
 Each HiveSplit contains the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DeleteDeltaLocations&lt;/code&gt;&lt;/li&gt;
      &lt;li&gt;If the table is an Insert-Only transactional table then
 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DeleteDeltaLocations&lt;/code&gt; is empty and the HiveSplit is the same as a HiveSplit
 on a flat/non-transactional Hive table&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id=&quot;reading-hive-transactional-data-in-workers&quot;&gt;Reading Hive transactional data in workers&lt;/h2&gt;

&lt;p&gt;The HiveSplits generated during the split generation phase make their way to
worker nodes, where OrcPageSourceFactory is used to create a PageSource for the
TableScan operator.&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Insert-Only transactional tables are read in the same way as non-transactional
tables: an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcPageSource&lt;/code&gt; is created for their splits, which reads the
data for the split and makes it available to TableScanOperator&lt;/li&gt;
  &lt;li&gt;CRUD transactional tables need special handling during reads because their file
schema does not match the table schema: the synthetic RowId column
introduces additional struct nesting, as mentioned earlier:
    &lt;ol&gt;
      &lt;li&gt;RowId columns are added to the list of columns to be read from file&lt;/li&gt;
      &lt;li&gt;The ORC reader is set up to access columns by name from the file instead of
 using the column indexes from the table schema, equivalent to forcing
 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive.orc.use-column-names=true&lt;/code&gt; for CRUD transactional tables&lt;/li&gt;
      &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcRecordReader&lt;/code&gt; is created for the ORC file of the split&lt;/li&gt;
      &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcDeletedRows&lt;/code&gt; is created for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt; locations, if any.&lt;/li&gt;
      &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcPageSource&lt;/code&gt; is created that returns rows from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcRecordReader&lt;/code&gt;
 which are not present in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcDeletedRows&lt;/code&gt;. This cross referencing of deleted
 rows is done lazily for each &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Block&lt;/code&gt; of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Page&lt;/code&gt; only when that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Block&lt;/code&gt; is
 needed to be read from the PageSource. This works well with the lazy
 materialization logic of Presto to skip over Blocks if a predicate does not
 apply to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Page&lt;/code&gt; at all.&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;h1 id=&quot;performance-numbers&quot;&gt;Performance numbers&lt;/h1&gt;
&lt;p&gt;Each INSERT on a Hive transactional table can create additional splits for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delta&lt;/code&gt;
directories, and each DELETE can create &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt; directories, which add
the extra work of cross-referencing deleted rows while reading the split. To
measure the impact of these operations on reads from Presto, we ran the following
performance tests, where multiple Hive transactional tables were created with a
varying number of INSERT and DELETE operations and the runtimes of different
read-focused Presto queries were recorded:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th style=&quot;text-align: center&quot;&gt;Table Type&lt;/th&gt;
      &lt;th style=&quot;text-align: center&quot;&gt;Description&lt;/th&gt;
      &lt;th style=&quot;text-align: center&quot;&gt;delta directories&lt;/th&gt;
      &lt;th style=&quot;text-align: center&quot;&gt;delete_delta directories&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Flat&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;TPCDS store_sales scale 3000 table, 8.6B rows&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;0&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Only Base&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Hive transactional store_sales scale 3000 table: 8.6B rows&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;0&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Base + 1-Delete&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Derived from “Only Base” with rows having customer_id=100 deleted by 1 DELETE query: 347 deleted entries&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;0&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;1&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Base + 1-Delete + 1-Insert&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Derived from “Base + 1 Delete” with deleted rows added back by 1 INSERT query: 347 deleted entries + 347 inserted entries&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;1&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;1&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Base + 5-Deletes&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Derived from “Only Base” with rows for 5 customer_ids deleted by 5 DELETE queries: 1355 rows deleted&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;0&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;5&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Base + 5-Deletes + 5-Inserts&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Derived from “Base + 1 Delete” with deleted rows added back by 5 INSERT queries: 1355 deleted entries + 1355 inserted entries&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;5&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;5&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
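
&lt;p&gt;The derived table variants were produced with simple Hive DML along these lines; the statements and the backup table are illustrative (the TPC-DS column is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ss_customer_sk&lt;/code&gt;):&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- &quot;Base + 1-Delete&quot;: remove the rows of one customer
DELETE FROM store_sales WHERE ss_customer_sk = 100;

-- &quot;Base + 1-Delete + 1-Insert&quot;: add the deleted rows back
INSERT INTO store_sales
SELECT * FROM store_sales_backup WHERE ss_customer_sk = 100;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;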

&lt;p&gt;The following are the results of these tests, run on a cluster with 5 c3.4xlarge
machines on AWS:
&lt;img src=&quot;/assets/blog/hive-acid/perf.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;There is an impact of deleted rows on read performance, which
is expected, as the work for the reader increases in this case. But with
predicates in place, this impact is reduced, as the amount of data to be read
goes down.&lt;/p&gt;

&lt;h1 id=&quot;ongoing-and-future-work&quot;&gt;Ongoing and Future work&lt;/h1&gt;
&lt;p&gt;There has been ongoing work on the Hive ACID integration, and some improvements
are planned for the future, notably:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Bucketed Hive transactional table support has been added (&lt;a href=&quot;https://github.com/trinodb/trino/pull/1591&quot;&gt;#1591&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Support for original files is in progress (&lt;a href=&quot;https://github.com/trinodb/trino/pull/2930&quot;&gt;#2930&lt;/a&gt;);
this will allow Presto to read Hive tables that were converted to
transactional tables at some point after holding non-transactional data&lt;/li&gt;
  &lt;li&gt;Write support will be taken up in the future (&lt;a href=&quot;https://github.com/trinodb/trino/issues/1956&quot;&gt;#1956&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;There is ongoing work on the Hive side for ACID on the Parquet format. Once that
lands, Presto’s implementation will be extended to support Parquet too.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;acknowledgements-and-conclusion&quot;&gt;Acknowledgements and Conclusion&lt;/h1&gt;
&lt;p&gt;Thanks to the folks who helped out in the development of this feature:
&lt;a href=&quot;https://www.linkedin.com/in/abhishek-somani-a946aa1b&quot;&gt;Abhishek Somani&lt;/a&gt; provided
continuous guidance on internals of Hive ACID,
&lt;a href=&quot;https://www.linkedin.com/in/dainsundstrom&quot;&gt;Dain&lt;/a&gt; helped with simplifying the
ORC reader, and together with &lt;a href=&quot;https://www.linkedin.com/in/piotrfindeisen/&quot;&gt;Piotr&lt;/a&gt;
helped refine the code through multiple rounds of reviews.&lt;/p&gt;

&lt;p&gt;While we continue development on this feature towards full-fledged support,
including writes, you can start using it on Hive transactional tables that do
not contain files in flat format. If you have such tables and want to use Presto
with them, you can apply &lt;a href=&quot;https://github.com/trinodb/trino/pull/2930&quot;&gt;this fix&lt;/a&gt;
to your Presto installation, or you can trigger a major compaction on all
partitions to migrate the full table into the CRUD transactional table format.&lt;/p&gt;</content>

      
        <author>
          <name>Shubham Tagra, Qubole</name>
        </author>
      

      <summary>Hive ACID and transactional tables are supported in Presto since the 331 release. Hive ACID support is an important step towards GDPR/CCPA compliance, and also towards Hive 3 support as certain distributions of Hive 3 create transactional tables by default. In this blog post we cover the concepts of Hive ACID and transactional tables along with the changes done in Presto to support them. We also cover the performance tests on this integration and look at the future plans for this feature.</summary>

      
      
    </entry>
  
    <entry>
      <title>Apache Pinot Connector</title>
      <link href="https://trino.io/blog/2020/05/25/pinot-connector.html" rel="alternate" type="text/html" title="Apache Pinot Connector" />
      <published>2020-05-25T00:00:00+00:00</published>
      <updated>2020-05-25T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/05/25/pinot-connector</id>
      <content type="html" xml:base="https://trino.io/blog/2020/05/25/pinot-connector.html">&lt;p&gt;Presto 334 introduces the new &lt;a href=&quot;https://trino.io/docs/current/connector/pinot.html&quot;&gt;Pinot Connector&lt;/a&gt;
which allows Presto to query data stored in &lt;a href=&quot;https://pinot.apache.org/&quot;&gt;Apache Pinot™&lt;/a&gt;.
Not only does this allow access to Pinot tables but gives users the ability to do things they could not do with Pinot
alone such as join Pinot tables to other tables and use Presto’s scalar functions, window functions and complex aggregations.&lt;/p&gt;

&lt;p&gt;Pinot UDFs can be used directly by including the Pinot SQL query in quotes, as explained below in the &lt;em&gt;Pinot SQL Passthrough&lt;/em&gt; section.
This enables aggregations and other complex query types to be executed directly in Pinot.&lt;/p&gt;

&lt;p&gt;This connector supports Pinot 0.3.0 and newer.&lt;/p&gt;

&lt;h1 id=&quot;setup&quot;&gt;Setup&lt;/h1&gt;

&lt;p&gt;Create a properties file in the catalog directory, such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;etc/catalog/pinot.properties&lt;/code&gt; which includes at least the
following to get started:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;connector.name=pinot
pinot.controller-urls=host1:9000,host2:9000
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pinot.controller-urls&lt;/code&gt; property is a comma-separated list of controller hosts. If Pinot is deployed via &lt;a href=&quot;https://kubernetes.io/&quot;&gt;Kubernetes&lt;/a&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pinot.controller-urls&lt;/code&gt; needs to point to the controller Service endpoint. The Pinot broker and server must be accessible
via DNS, as Pinot returns hostnames and not IP addresses.&lt;/p&gt;

&lt;p&gt;If you have fewer Pinot servers than Presto workers, or a relatively small number of rows per Pinot segment,
you can minimize the requests to Pinot by increasing the number of Pinot segments per split (the default is 1 segment per split):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;pinot.segments-per-split=15
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If DNS resolution is slow or you get &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Request timed out&lt;/code&gt; errors, you can increase the request timeout as follows:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;pinot.request-timeout=3m
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h1 id=&quot;schema&quot;&gt;Schema&lt;/h1&gt;

&lt;p&gt;Pinot supports the following data types; null values are currently not supported. The corresponding Presto data types are:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Pinot Datatype&lt;/th&gt;
      &lt;th&gt;Presto Datatype&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;boolean&lt;/td&gt;
      &lt;td&gt;boolean&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;integer&lt;/td&gt;
      &lt;td&gt;integer&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;float, double&lt;/td&gt;
      &lt;td&gt;double&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;string, bytes*&lt;/td&gt;
      &lt;td&gt;varchar&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;integer_array&lt;/td&gt;
      &lt;td&gt;array(integer)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;float_array, double_array&lt;/td&gt;
      &lt;td&gt;array(double)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;long_array&lt;/td&gt;
      &lt;td&gt;array(bigint)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;string_array&lt;/td&gt;
      &lt;td&gt;array(varchar)&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;ul&gt;
  &lt;li&gt;The Pinot &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bytes&lt;/code&gt; type is converted to a hex-encoded varchar. See the &lt;a href=&quot;https://pinot.apache.org/&quot;&gt;Pinot docs&lt;/a&gt; for more information.&lt;/li&gt;
&lt;/ul&gt;
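&lt;p&gt;Since a Pinot &lt;code&gt;bytes&lt;/code&gt; column arrives as a hex-encoded varchar, it can be decoded back to raw bytes on the client side. A minimal Python sketch; the sample value is made up for illustration:&lt;/p&gt;

```python
# Decode a hex-encoded varchar as returned for a Pinot "bytes" column.
# The sample value is hypothetical, for illustration only.
hex_value = "48656c6c6f2050696e6f74"

raw = bytes.fromhex(hex_value)
# raw is now b"Hello Pinot"
```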

&lt;h1 id=&quot;pinot-sql-passthrough&quot;&gt;Pinot SQL Passthrough&lt;/h1&gt;

&lt;p&gt;If you would like to leverage Pinot’s fast aggregations you can use a “dynamic” table where you specify the Pinot SQL 
query as the table name and it is passed directly to Pinot:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pinot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;default&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;&quot;SELECT col3, col4, MAX(col1), COUNT(col2) FROM pinot_table GROUP BY col3, col4&quot;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col3&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;IN&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;FOO&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;BAR&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col4&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;30000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The filter in the outer Presto query will be pushed down into the Pinot query via Presto’s
&lt;a href=&quot;https://github.com/trinodb/trino/blob/334/presto-spi/src/main/java/io/prestosql/spi/connector/ConnectorMetadata.java#L746&quot;&gt;applyFilter()&lt;/a&gt;.
These queries are routed to the broker and
should not return huge amounts of data as broker queries currently return a single response with all the results. This
is more suited to aggregate queries.&lt;/p&gt;

&lt;p&gt;Limits are pushed into the “dynamic” Pinot query via Presto’s
&lt;a href=&quot;https://github.com/trinodb/trino/blob/334/presto-spi/src/main/java/io/prestosql/spi/connector/ConnectorMetadata.java#L727&quot;&gt;applyLimit()&lt;/a&gt;.
Pinot functions such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PERCENTILEEST&lt;/code&gt; can be used in the quoted SQL.&lt;/p&gt;

&lt;p&gt;The above query would yield the following Pinot PQL query:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;MAX&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;col1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;COUNT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;col2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pinot_table&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col3&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;IN&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;FOO&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;BAR&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col4&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;30000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If you are returning a larger dataset, you can issue a normal Presto query, which gets routed to the Pinot servers that
store the Pinot segments. Filters and limits are pushed down to Pinot for regular queries as well.&lt;/p&gt;

&lt;h1 id=&quot;future-work&quot;&gt;Future Work&lt;/h1&gt;

&lt;p&gt;As Presto and Pinot continue to evolve the Pinot connector will leverage new features such as aggregation pushdown and more.&lt;/p&gt;</content>

      
        <author>
          <name>Elon Azoulay</name>
        </author>
      

      <summary>Presto 334 introduces the new Pinot Connector which allows Presto to query data stored in Apache Pinot™. Not only does this allow access to Pinot tables but gives users the ability to do things they could not do with Pinot alone such as join Pinot tables to other tables and use Presto’s scalar functions, window functions and complex aggregations.</summary>

      
      
    </entry>
  
    <entry>
      <title>State of Presto</title>
      <link href="https://trino.io/blog/2020/05/15/state-of-presto.html" rel="alternate" type="text/html" title="State of Presto" />
      <published>2020-05-15T00:00:00+00:00</published>
      <updated>2020-05-15T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/05/15/state-of-presto</id>
      <content type="html" xml:base="https://trino.io/blog/2020/05/15/state-of-presto.html">&lt;p&gt;Presto is continuing to gain adoption across many industries and use cases. Our
community is growing rapidly and there is a lot going on, so we are taking the
Presto Summit online. And we are starting with a State of Presto webinar with
the founders of the project.&lt;/p&gt;

&lt;p&gt;Update:&lt;/p&gt;

&lt;p&gt;We had a great event with lots of questions from the audience, taking us beyond
the planned time frame. Check out the recording to learn more:&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/epdgIsAT3EA&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;!--more--&gt;

&lt;p&gt;Join us virtually to hear Presto co-creators 
&lt;a href=&quot;https://github.com/martint&quot;&gt;Martin Traverso&lt;/a&gt;,
&lt;a href=&quot;https://github.com/dain&quot;&gt;Dain Sundstrom&lt;/a&gt;, and 
&lt;a href=&quot;https://github.com/electrum&quot;&gt;David Phillips&lt;/a&gt; talk about the state of Presto,
followed by a live Q&amp;amp;A moderated by Presto maintainer
&lt;a href=&quot;https://github.com/findepi&quot;&gt;Piotr Findeisen&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Agenda:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;2020 project milestones&lt;/li&gt;
  &lt;li&gt;Community and technical growth&lt;/li&gt;
  &lt;li&gt;Recent Presto updates&lt;/li&gt;
  &lt;li&gt;Project roadmap&lt;/li&gt;
  &lt;li&gt;Live Q&amp;amp;A&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Date: Thursday, 21 May 2020&lt;/p&gt;

&lt;p&gt;Time: 11am PDT (San Francisco), 2pm EDT (New York), 7pm BST (London), 6pm UTC&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h2 id=&quot;register-now&quot;&gt;&lt;a href=&quot;https://www.starburstdata.com/webinar-state-of-presto/?utm_campaign=Webinar%20-%20State%20of%20Presto%20-%202020%20-%20May&amp;amp;utm_source=trino.io&amp;amp;utm_medium=blog&quot;&gt;Register now!&lt;/a&gt;&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;p&gt;We look forward to many questions and a lively webinar.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Presto is continuing to gain adoption across many industries and use cases. Our community is growing rapidly and there is a lot going on, so we are taking the Presto Summit online. And we are starting with a State of Presto webinar with the founders of the project. Update: We had a great event with lots of questions from the audience, taking us beyond the planned time frame. Check out the recording to learn more:</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto on FLOSS Weekly</title>
      <link href="https://trino.io/blog/2020/05/06/floss-weekly.html" rel="alternate" type="text/html" title="Presto on FLOSS Weekly" />
      <published>2020-05-06T00:00:00+00:00</published>
      <updated>2020-05-06T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/05/06/floss-weekly</id>
      <content type="html" xml:base="https://trino.io/blog/2020/05/06/floss-weekly.html">&lt;p&gt;Spreading the word about our project is an important task to grow the community
around Presto. With a large, lively community we can ensure the success of
Presto. Today we had the opportunity to talk about Presto on the long-running
open source podcast &lt;a href=&quot;https://twit.tv/shows/floss-weekly&quot;&gt;FLOSS Weekly&lt;/a&gt;.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;&lt;a href=&quot;http://www.stonehenge.com/merlyn/&quot;&gt;Randal Schwartz&lt;/a&gt; was joined by his co-host
&lt;a href=&quot;https://webmink.com/about/&quot;&gt;Simon Phipps&lt;/a&gt;. We introduced Presto overall and
talked about use cases of Presto and the problems it can solve. Both hosts, as
well as the live audience, had some great questions and we did our best to
answer them.&lt;/p&gt;

&lt;p&gt;We moved through the history of Presto, current users and usage, the community
around the project, and Dain talked about some of the upcoming improvements. In
the end it seemed like we just scratched the surface and all wanted to keep
talking about the project.&lt;/p&gt;

&lt;p&gt;It was a great conversation and you should check it out!&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h2 id=&quot;watch-a-recording-of-the-presto-episode-of-floss-weekly-now&quot;&gt;&lt;a href=&quot;https://twit.tv/shows/floss-weekly/episodes/577?autostart=false&quot;&gt;Watch a recording of the Presto episode of FLOSS Weekly now!&lt;/a&gt;&lt;/h2&gt;
&lt;/blockquote&gt;</content>

      
        <author>
          <name>Dain Sundstrom and Manfred Moser</name>
        </author>
      

      <summary>Spreading the word about our project is an important task to grow the community around Presto. With a large, lively community we can ensure the success of Presto. Today we had the opportunity to talk about Presto on the long running open source podcast FLOSS Weekly.</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto: The Definitive Guide</title>
      <link href="https://trino.io/blog/2020/04/11/the-definitive-guide.html" rel="alternate" type="text/html" title="Presto: The Definitive Guide" />
      <published>2020-04-11T00:00:00+00:00</published>
      <updated>2020-04-11T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/04/11/the-definitive-guide</id>
<content type="html" xml:base="https://trino.io/blog/2020/04/11/the-definitive-guide.html">&lt;p&gt;Nearly two years ago Matt and Martin got the ball rolling on a book
about Presto. A thriving project and community like the one around
Dain, David and Martin, the founders and creators of Presto, just needs a book.
Even in this digital age of online documentation, communities on chat and other
platforms, and videos everywhere, there is great value in a well-structured and
well-written book. Today, we are happy to announce our book &lt;strong&gt;Presto: The
Definitive Guide&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h2 id=&quot;get-a-free-copy-of-trino-the-definitive-guide-from-starburst-now&quot;&gt;&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;Get a free copy of Trino: The Definitive Guide&lt;/a&gt; from &lt;a href=&quot;https://www.starburst.io&quot;&gt;Starburst&lt;/a&gt; now!&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;p&gt;This first book about Presto is finally available for you all to get, read, and
hopefully learn from.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Update April 2021&lt;/strong&gt;: The project has moved to the
&lt;a href=&quot;/blog/2020/12/27/announcing-trino.html&quot;&gt;new name Trino&lt;/a&gt;, and the content
of our book
&lt;a href=&quot;/blog/2021/04/21/the-definitive-guide.html&quot;&gt;has been updated&lt;/a&gt; to
&lt;a href=&quot;/trino-the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;!--more--&gt;

&lt;p&gt;&lt;img src=&quot;/assets/ttdg-cover.png&quot; align=&quot;right&quot; style=&quot;float: right; margin-left: 20px; margin-bottom: 20px; width: 100%; max-width: 350px;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;With the help of O’Reilly, the book is now available in digital form, and paper
copies are just around the corner as well. You can find more information about
the book on &lt;a href=&quot;/trino-the-definitive-guide.html&quot;&gt;our permanent page about
it&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It is based on the very recent 330 release of Presto, but applicable to any
Presto version. The book is broken up into three parts. No matter if
you are a beginner keen to learn, a user with just a bit of command line and SQL
knowledge, or an advanced or even expert Presto user, we are certain that you
can learn something from the book and encourage you to check it out.&lt;/p&gt;

&lt;p&gt;The first part of the book establishes what Presto is, and gets you quick wins
to install a minimal setup, run it, connect to it with the CLI and an
application using the JDBC driver and run some SQL queries.&lt;/p&gt;

&lt;p&gt;The second part dives into the details of the Presto architecture, query
planning, connectors for all sorts of data sources and SQL usage. There is a lot
to learn and digest in these main sections.&lt;/p&gt;

&lt;p&gt;In the third part we round things out with tuning tips, a good overview
of the Web UI, usage of other tools, security configuration and more tips to get
Presto into production.&lt;/p&gt;

&lt;p&gt;Of course, putting all this information together requires work from many people.
And in fact we did get lots of help from members of the Presto community and
O’Reilly.&lt;/p&gt;

&lt;p&gt;Specifically, we have some great news from our major supporter, Starburst!
Starburst allowed us to work on the book and bring it across the finish line.&lt;/p&gt;

&lt;p&gt;And that turns out to be great news for you all as well. Not only is the book
finished now, you can also get a
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;free digital copy of Trino: The Definitive Guide&lt;/a&gt;
from &lt;a href=&quot;https://www.starburst.io&quot;&gt;Starburst&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;So what are you waiting for? Go get a copy, check out the &lt;a href=&quot;https://github.com/trinodb/trino-the-definitive-guide&quot;&gt;code repository for
the book&lt;/a&gt;, provide
feedback and contact us on &lt;a href=&quot;/slack.html&quot;&gt;Slack&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Looking forward to it all!&lt;/p&gt;

&lt;p&gt;Matt, Manfred and Martin&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Exhausted, but happy authors&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Matt Fuller, Manfred Moser and Martin Traverso</name>
        </author>
      

      <summary>Nearly two years ago Matt and Martin got the ball rolling on getting a book about Presto happening. A thriving project and community like everyone around Dain, David and Martin, the founders and creators of Presto, just needs a book. Even in this digital age of online documentation, communities on chat and other platforms, and videos everywhere, there is great value in a well structured and written book. Today, we are happy to announce that our book Presto: The Definitive Guide. Get a free copy of Trino: The Definitive Guide from Starburst now! This first book about Presto, is finally available for you all to get, read and hopefully learn from. Update April 2021: The project has moved to the new name Trino, and the content of our book has been updated to Trino: The Definitive Guide.</summary>

      
      
    </entry>
  
    <entry>
      <title>Beyond LIMIT, Presto meets OFFSET and TIES</title>
      <link href="https://trino.io/blog/2020/02/03/beyond-limit-presto-meets-offset-and-ties.html" rel="alternate" type="text/html" title="Beyond LIMIT, Presto meets OFFSET and TIES" />
      <published>2020-02-03T00:00:00+00:00</published>
      <updated>2020-02-03T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/02/03/beyond-limit-presto-meets-offset-and-ties</id>
      <content type="html" xml:base="https://trino.io/blog/2020/02/03/beyond-limit-presto-meets-offset-and-ties.html">&lt;p&gt;Presto follows the SQL Standard faithfully. We extend it only when it is well justified,
we strive to never break it and we always prefer the standard way of doing things.
There was one situation where we stumbled, though. We had a non-standard way of limiting
query results with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT n&lt;/code&gt; without implementing the standard way of doing that first.
We have corrected that, adding the ANSI SQL way of limiting query results, discarding initial
results and – a hidden gem – retaining additional rows in case of ties.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;limiting-query-results&quot;&gt;Limiting query results&lt;/h1&gt;

&lt;p&gt;Probably everyone using relational databases knows the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT n&lt;/code&gt; syntax for limiting query
results. It is supported by e.g. MySQL, PostgreSQL and many more SQL engines following
their example. It is so common that one could think that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT n&lt;/code&gt; is the standard way
of limiting the query results.  Let’s have a look at how various popular SQL engines
provide this feature.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;DB2, MySQL, MariaDB, PostgreSQL, Redshift, MemSQL, SQLite and many others provide the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;... LIMIT n&lt;/code&gt; syntax.&lt;/li&gt;
  &lt;li&gt;SQL Server provides &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT TOP n ...&lt;/code&gt; syntax.&lt;/li&gt;
  &lt;li&gt;Oracle provides &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;... WHERE ROWNUM &amp;lt;= n&lt;/code&gt; syntax.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And what does the SQL Standard say?&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;my_table&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FETCH&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FIRST&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ROWS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ONLY&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If we look again at the database systems mentioned above, it turns out many of them support the standard
syntax too: Oracle, DB2, SQL Server and PostgreSQL (although that’s not documented currently).&lt;/p&gt;

&lt;p&gt;And Presto? Presto has supported &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT n&lt;/code&gt; since 2012. In &lt;a href=&quot;https://trino.io/docs/current/release/release-310.html&quot;&gt;Presto 310&lt;/a&gt;,
we also added &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST n ROWS ONLY&lt;/code&gt; support.&lt;/p&gt;

&lt;p&gt;Let’s have a look beyond the limits.&lt;/p&gt;

&lt;h1 id=&quot;tie-break&quot;&gt;Tie break&lt;/h1&gt;

&lt;p&gt;Admittedly, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST n ROWS ONLY&lt;/code&gt; syntax is way more verbose than the short &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT n&lt;/code&gt; syntax Presto
always supported (and still does). However, it is also more powerful: it allows selecting rows “top n,
ties included”. Consider a case where you want to list the top 3 students with the highest score on an exam.
What happens if the 3&lt;sup&gt;rd&lt;/sup&gt;, 4&lt;sup&gt;th&lt;/sup&gt; and 5&lt;sup&gt;th&lt;/sup&gt; students have equal scores? Which
ones should be returned? Instead of getting an arbitrary (and indeterminate) result you can use
the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST n ROWS WITH TIES&lt;/code&gt; syntax:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;student_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;score&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;student&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;exam_result&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;student_id&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;score&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FETCH&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FIRST&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ROWS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TIES&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST n ROWS WITH TIES&lt;/code&gt; clause retains all rows with equal values of the ordering keys (the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; clause) as
the last row that would be returned by the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST n ROWS ONLY&lt;/code&gt; clause.&lt;/p&gt;
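&lt;p&gt;The semantics can also be emulated on engines without &lt;code&gt;WITH TIES&lt;/code&gt; by filtering on a &lt;code&gt;RANK()&lt;/code&gt; window function. A minimal Python sketch using SQLite; the table and scores are made up for illustration:&lt;/p&gt;

```python
import sqlite3

# Emulate FETCH FIRST 3 ROWS WITH TIES using RANK(): every row whose rank
# falls within the first 3 is kept, so ties on the boundary are included.
# The table and scores below are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE exam_result (student_name TEXT, score INT)")
conn.executemany(
    "INSERT INTO exam_result VALUES (?, ?)",
    [("ann", 90), ("bob", 85), ("cat", 80), ("dan", 80), ("eve", 70)],
)
rows = conn.execute(
    """
    SELECT student_name, score FROM (
        SELECT student_name, score,
               RANK() OVER (ORDER BY score DESC) AS rnk
        FROM exam_result
    ) WHERE rnk BETWEEN 1 AND 3
    ORDER BY score DESC
    """
).fetchall()
# "cat" and "dan" tie at 80, so 4 rows come back instead of 3
```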

&lt;h1 id=&quot;offset&quot;&gt;Offset&lt;/h1&gt;

&lt;p&gt;Per the SQL Standard, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST n ROWS ONLY&lt;/code&gt; clause can be prepended with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OFFSET m&lt;/code&gt;, to skip &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;m&lt;/code&gt; initial rows.
In such a case, it makes sense to use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH NEXT ...&lt;/code&gt; variant of the clause – it’s allowed with and without &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OFFSET&lt;/code&gt;,
but definitely looks better with that clause.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;student_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;score&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;student&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;exam_result&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;student_id&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;score&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;OFFSET&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FETCH&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NEXT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ROWS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TIES&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As an extension to the SQL Standard, and for brevity, we also allow &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OFFSET&lt;/code&gt; with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;student_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;score&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;student&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;exam_result&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;student_id&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;score&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;OFFSET&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h1 id=&quot;concluding-notes&quot;&gt;Concluding notes&lt;/h1&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt; / &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST ... ROWS ONLY&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST ... WITH TIES&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OFFSET&lt;/code&gt; are powerful and very useful clauses
that come in especially handy when writing ad-hoc queries over big data sets. They offer a certain syntactic freedom beyond
what is described here, so check out the documentation of the &lt;a href=&quot;/docs/current/sql/select.html#offset-clause&quot;&gt;OFFSET Clause&lt;/a&gt; and
&lt;a href=&quot;/docs/current/sql/select.html#limit-or-fetch-first-clauses&quot;&gt;LIMIT or FETCH FIRST Clauses&lt;/a&gt; for all the options.
Since the semantics of these clauses depend on query results being well ordered, they are best used with an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; clause that
defines a proper ordering. Without one, the results are arbitrary (except for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WITH TIES&lt;/code&gt;, which requires &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt;), which may or may
not be a problem, depending on the use case.&lt;/p&gt;

&lt;p&gt;For scheduled queries, or queries that are part of some workflow (as opposed to ad-hoc), we recommend using query
predicates (where relevant) instead of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OFFSET&lt;/code&gt;. Read more at
&lt;a href=&quot;https://use-the-index-luke.com/sql/partial-results/fetch-next-page&quot;&gt;https://use-the-index-luke.com/sql/partial-results/fetch-next-page&lt;/a&gt;.&lt;/p&gt;
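&lt;p&gt;For example, instead of paging with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OFFSET&lt;/code&gt;, a scheduled job can remember where the previous page ended and filter on that value. A minimal sketch, reusing the tables from the examples above and assuming the previous page ended at a score of 82:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT student_name, score
FROM student s JOIN exam_result e ON s.id = e.student_id
WHERE score &gt; 82 -- the highest score seen on the previous page
ORDER BY score
FETCH FIRST 3 ROWS ONLY
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;When scores can repeat, add a unique tie-breaking column (such as the student id) to both the predicate and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; clause, so that rows are neither skipped nor duplicated between pages.&lt;/p&gt;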

&lt;p&gt;□&lt;/p&gt;</content>

      
        <author>
          <name>Piotr Findeisen, Starburst Data</name>
        </author>
      

      <summary>Presto follows the SQL Standard faithfully. We extend it only when it is well justified, we strive to never break it, and we always prefer the standard way of doing things. There was one situation where we stumbled, though. We had a non-standard way of limiting query results with LIMIT n without implementing the standard way of doing that first. We have corrected that, adding the ANSI SQL way of limiting query results, discarding initial results, and – a hidden gem – retaining initial results in case of ties.</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto in 2019: Year in Review</title>
      <link href="https://trino.io/blog/2020/01/01/2019-summary.html" rel="alternate" type="text/html" title="Presto in 2019: Year in Review" />
      <published>2020-01-01T00:00:00+00:00</published>
      <updated>2020-01-01T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/01/01/2019-summary</id>
      <content type="html" xml:base="https://trino.io/blog/2020/01/01/2019-summary.html">&lt;p&gt;What a great year for the Presto community! We started the year with the launch of the 
&lt;a href=&quot;/blog/2019/01/31/presto-software-foundation-launch.html&quot;&gt;Presto Software Foundation&lt;/a&gt;, 
with the long-term goal of ensuring the project remains collaborative, open, and independent from 
any corporate interest for years to come.&lt;/p&gt;

&lt;p&gt;Since then, the community around Presto has grown and consolidated. We’ve seen contributions 
from more than 120 people across over 20 companies. Every week, 280 users and developers 
interact in the project’s &lt;a href=&quot;/slack.html&quot;&gt;Slack channel&lt;/a&gt;. We’d like to take this opportunity to thank 
everyone who contributed to the project in one way or another. Presto wouldn’t be what it is without your 
help.&lt;/p&gt;

&lt;p&gt;With the collaboration of companies such as &lt;a href=&quot;https://starburstdata.com&quot;&gt;Starburst&lt;/a&gt;, &lt;a href=&quot;https://qubole.com&quot;&gt;Qubole&lt;/a&gt;, 
&lt;a href=&quot;https://varada.io&quot;&gt;Varada&lt;/a&gt;, &lt;a href=&quot;https://twitter.com&quot;&gt;Twitter&lt;/a&gt;, &lt;a href=&quot;https://www.treasuredata.com&quot;&gt;ARM Treasure Data&lt;/a&gt;,
&lt;a href=&quot;https://wix.com&quot;&gt;Wix&lt;/a&gt;, &lt;a href=&quot;https://www.redhat.com&quot;&gt;Red Hat&lt;/a&gt;, and the &lt;a href=&quot;https://www.meetup.com/Big-things-are-happening-here/&quot;&gt;Big Things community&lt;/a&gt;,
we ran several Presto summits across the world:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2019/05/03/Presto-Conference-Israel.html&quot;&gt;Tel Aviv, Israel, April 2019&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2019/05/17/Presto-Summit.html&quot;&gt;San Francisco, USA, June 2019&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2019/07/11/report-for-presto-conference-tokyo.html&quot;&gt;Tokyo, Japan, July 2019&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2019/09/05/Presto-Summit-Bangalore.html&quot;&gt;Bangalore, India, September, 2019&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.starburstdata.com/technical-blog/nyc-presto-summit-recap/&quot;&gt;New York, USA, December 2019&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All these events were a huge success and brought thousands of Presto users, contributors and other community members together to 
share their knowledge and experiences.&lt;/p&gt;

&lt;p&gt;The project has been more active than ever. We completed 28 releases comprising more than 2,850 
commits in over 1,500 pull requests. Of course, that alone is not a good measure of progress, so 
let’s take a closer look at everything that went in. And there is a lot to look at!&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;language-features&quot;&gt;Language Features&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;/docs/current/sql/select.html#limit-or-fetch-first-clauses&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST n ROWS [ONLY | WITH TIES]&lt;/code&gt;&lt;/a&gt; 
standard syntax. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WITH TIES&lt;/code&gt; clause is particularly useful when some of the rows have the same 
value for the columns used to order the results of a query. Consider a case where you want to 
list the top 5 students with the highest scores on an exam. If the 6th student has the same score as the 5th, you 
want to know that as well, instead of getting an arbitrary, non-deterministic result:&lt;/p&gt;

    &lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;student_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;score&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;student&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;exam_result&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;USING&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;student_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;score&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FETCH&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FIRST&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ROWS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TIES&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/sql/select.html#offset-clause&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OFFSET&lt;/code&gt;&lt;/a&gt; syntax, which is especially useful in ad-hoc queries.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/sql/comment.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;COMMENT ON &amp;lt;table&amp;gt;&lt;/code&gt;&lt;/a&gt; syntax to 
set or remove table comments. Comments can be shown via &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DESCRIBE&lt;/code&gt;
or the new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;system.metadata.table_comments&lt;/code&gt; table.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LATERAL&lt;/code&gt; in the context of an outer join.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNNEST&lt;/code&gt; in the context of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LEFT JOIN&lt;/code&gt;. With this feature, it is now possible 
to preserve the outer row when the array contains zero elements or is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NULL&lt;/code&gt;. Most common usages
of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNNEST&lt;/code&gt; in a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CROSS JOIN&lt;/code&gt; should actually use this form.&lt;/p&gt;

    &lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;LEFT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;UNNEST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;u&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;v&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IGNORE NULLS&lt;/code&gt; clause for window functions. This is useful when combined with 
functions such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lead&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lag&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;first_value&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;last_value&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nth_value&lt;/code&gt; if the dataset contains nulls.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROW&lt;/code&gt; expansion using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.*&lt;/code&gt; operator.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/sql/create-schema.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE SCHEMA&lt;/code&gt;&lt;/a&gt; syntax and support 
in various connectors (Hive, Iceberg, MySQL, PostgreSQL, Redshift, SQL Server, Phoenix).&lt;/li&gt;
  &lt;li&gt;Support for correlated subqueries containing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt;+&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Subscript operator to access &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROW&lt;/code&gt; type fields by index. This greatly improves usability 
and readability of queries when dealing with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROW&lt;/code&gt; types containing anonymous fields.&lt;/p&gt;

    &lt;p&gt;&lt;img src=&quot;/assets/blog/2019-review/row-ordinal.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
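&lt;p&gt;As a small illustration of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IGNORE NULLS&lt;/code&gt; clause listed above, the following sketch returns, for each row, the most recent earlier non-null reading. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sensor_reading&lt;/code&gt; table is hypothetical:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT ts, reading,
       lag(reading) IGNORE NULLS OVER (ORDER BY ts) AS previous_reading
FROM sensor_reading
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;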

&lt;h2 id=&quot;query-engine&quot;&gt;Query Engine&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Generalize conditional, lazy loading and processing (a.k.a. Late Materialization) beyond 
Table Scan, Filter and Projection to support Join, Window, TopN and SemiJoin operators. This can dramatically 
reduce latency, CPU and I/O for highly selective queries. This is one of the most important performance 
optimizations in recent times, and we will blog about it more in the coming weeks.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2019/05/21/optimizing-the-casts-away.html&quot;&gt;Unwrap cast/predicate pushdown&lt;/a&gt; optimizations.&lt;/li&gt;
  &lt;li&gt;Connector pushdown during planning for operations such as limit, table sample, or projections. This allows 
connectors to optimize how data is accessed before it’s provided to the Presto engine for further processing.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2019/06/30/dynamic-filtering.html&quot;&gt;Dynamic filtering&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;The cost-based optimizer can now consider the &lt;a href=&quot;https://github.com/trinodb/trino/pull/247&quot;&gt;estimated query peak memory&lt;/a&gt; 
footprint. This is especially useful for optimizing bigger queries, where not all parts of the query can 
be run concurrently.&lt;/li&gt;
  &lt;li&gt;Improved handling of &lt;a href=&quot;https://github.com/trinodb/trino/pull/1431&quot;&gt;projections&lt;/a&gt;, 
&lt;a href=&quot;https://github.com/trinodb/trino/pull/864&quot;&gt;aggregations&lt;/a&gt; and &lt;a href=&quot;https://github.com/trinodb/trino/pull/1359&quot;&gt;cross joins&lt;/a&gt; 
in the cost-based optimizer.&lt;/li&gt;
  &lt;li&gt;Improved accounting and reporting of physical and network data read or transmitted during query processing.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;performance&quot;&gt;Performance&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2019/08/23/unnest-operator-performance-enhancements.html&quot;&gt;10x performance improvement for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNNEST&lt;/code&gt;&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;2-7x improvement in performance of &lt;a href=&quot;/blog/2019/04/23/even-faster-orc.html&quot;&gt;ORC decoders&lt;/a&gt;, resulting in a 
10% global CPU improvement for the TPC-DS benchmark.&lt;/li&gt;
  &lt;li&gt;Improvements when reading small Parquet files, files with a large number of columns, or files with small row
groups. We found this very useful, for example, when working with data exported from Snowflake.&lt;/li&gt;
  &lt;li&gt;Support for new ORC bloom filters.&lt;/li&gt;
  &lt;li&gt;Remove &lt;a href=&quot;/blog/2019/06/03/redundant-order-by.html&quot;&gt;redundant &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt;&lt;/a&gt; clauses.&lt;/li&gt;
  &lt;li&gt;Improvements for &lt;a href=&quot;/blog/2019/06/03/redundant-order-by.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IN&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NOT-IN&lt;/code&gt;&lt;/a&gt; with subquery expressions (i.e., semijoin).&lt;/li&gt;
  &lt;li&gt;Huge performance improvements when &lt;a href=&quot;https://github.com/trinodb/trino/pull/1329&quot;&gt;reading from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;information_schema&lt;/code&gt;&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Reduce query latency and Hive metastore load for both &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; queries.&lt;/li&gt;
  &lt;li&gt;Improve metadata handling during planning. This can result in dramatic improvements in latency, 
especially for connectors such as MySQL, PostgreSQL, Redshift, SQL Server, etc. Some queries like 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW SCHEMAS&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW TABLES&lt;/code&gt; that could take several minutes to complete now finish in a few seconds.&lt;/li&gt;
  &lt;li&gt;Improved stability, performance, and security when spilling is enabled.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;functions&quot;&gt;Functions&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/functions/array.html#combinations&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;combinations&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/functions/conversion.html#format&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;format&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/functions/uuid.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UUID&lt;/code&gt; type&lt;/a&gt; and related functions.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/functions/array.html#all_match&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;all_match&lt;/code&gt;&lt;/a&gt;,
&lt;a href=&quot;/docs/current/functions/array.html#any_match&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;any_match&lt;/code&gt;&lt;/a&gt; and 
&lt;a href=&quot;/docs/current/functions/array.html#none_match&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;none_match&lt;/code&gt;&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Support flexible aggregation with lambda expressions using
  &lt;a href=&quot;/docs/current/functions/aggregate.html#reduce_agg&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;reduce_agg&lt;/code&gt;&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;New date and time functions: &lt;a href=&quot;/docs/current/functions/datetime.html#last_day_of_month&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;last_day_of_month&lt;/code&gt;&lt;/a&gt;,
&lt;a href=&quot;/docs/current/functions/datetime.html#at_timezone&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;at_timezone&lt;/code&gt;&lt;/a&gt; and 
&lt;a href=&quot;/docs/current/functions/datetime.html#with_timezone&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;with_timezone&lt;/code&gt;&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
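&lt;p&gt;To give a feel for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;reduce_agg&lt;/code&gt;, the sketch below sums values per group using lambda expressions; the inline &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VALUES&lt;/code&gt; data is purely illustrative:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT id, reduce_agg(value, 0, (a, b) -&gt; a + b, (a, b) -&gt; a + b) AS sum_values
FROM (VALUES (1, 3), (1, 4), (1, 5), (2, 6), (2, 7)) AS t(id, value)
GROUP BY id
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This returns 12 for id 1 and 13 for id 2.&lt;/p&gt;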

&lt;h2 id=&quot;security&quot;&gt;Security&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/sql/create-role.html&quot;&gt;Role-based access control&lt;/a&gt; and related commands.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/sql/create-view.html#security&quot;&gt;INVOKER security mode&lt;/a&gt; for views, which allows views to be run using the permissions of the 
current user.&lt;/li&gt;
  &lt;li&gt;Prevent replay attacks and result hijacking in client APIs.&lt;/li&gt;
  &lt;li&gt;JWT-based &lt;a href=&quot;/docs/current/security/internal-communication.html#internal-authentication&quot;&gt;internal communication&lt;/a&gt; authentication,
which removes the need for Kerberos or certificates and greatly simplifies secure setups.&lt;/li&gt;
  &lt;li&gt;Credential passthrough, which allows Presto to authenticate with the underlying data source with 
credentials provided by the user running a query. This is especially useful when dealing with
Google Storage in GCP or SQL databases that manage user authentication and authorization on 
their own.&lt;/li&gt;
  &lt;li&gt;Impersonation for &lt;a href=&quot;/docs/current/connector/hive.html#hive-thrift-metastore-configuration-properties&quot;&gt;Hive metastore&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Support for reading and writing encrypted files in HDFS using Hadoop KMS.&lt;/li&gt;
  &lt;li&gt;Support for &lt;a href=&quot;https://trino.io/docs/current/admin/spill.html#spill-encryption&quot;&gt;encrypting spilled data&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;geospatial&quot;&gt;Geospatial&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;New geospatial functions: 
&lt;a href=&quot;/docs/current/functions/geospatial.html#ST_Points&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ST_Points&lt;/code&gt;&lt;/a&gt;, 
&lt;a href=&quot;/docs/current/functions/geospatial.html#ST_Length&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ST_Length&lt;/code&gt;&lt;/a&gt;, 
&lt;a href=&quot;/docs/current/functions/geospatial.html#ST_Area&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ST_Area&lt;/code&gt;&lt;/a&gt;, 
&lt;a href=&quot;/docs/current/functions/geospatial.html#line_interpolate_point&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;line_interpolate_point&lt;/code&gt;&lt;/a&gt; and 
&lt;a href=&quot;/docs/current/functions/geospatial.html#line_interpolate_points&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;line_interpolate_points&lt;/code&gt;&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SphericalGeography&lt;/code&gt; type and &lt;a href=&quot;/docs/current/functions/geospatial.html#to_spherical_geography&quot;&gt;related functions&lt;/a&gt; 
to support spatial features in geographic coordinates (latitude / longitude) using a spherical model of the earth.&lt;/li&gt;
  &lt;li&gt;Support for Google Maps Polyline format via &lt;a href=&quot;/docs/current/functions/geospatial.html#to_encoded_polyline&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;to_encoded_polyline&lt;/code&gt;&lt;/a&gt;
and &lt;a href=&quot;/docs/current/functions/geospatial.html#from_encoded_polyline&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;from_encoded_polyline&lt;/code&gt;&lt;/a&gt; functions.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/functions/geospatial.html#geometry_from_hadoop_shape&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;geometry_from_hadoop_shape&lt;/code&gt;&lt;/a&gt; to decode geometry objects in 
Spatial Framework for Hadoop representation.&lt;/li&gt;
&lt;/ul&gt;
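&lt;p&gt;As a quick taste of these functions, the following sketch computes the area of a unit square, which yields 1.0; the result is expressed in the units of the input coordinates:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT ST_Area(ST_GeometryFromText('POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))'))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;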

&lt;h2 id=&quot;cloud-integration&quot;&gt;Cloud Integration&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Support for Azure Data Lake Blob and ADLS Gen2 storage.&lt;/li&gt;
  &lt;li&gt;Support for &lt;a href=&quot;/docs/current/connector/hive-gcs-tutorial.html&quot;&gt;Google Cloud Storage&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Several &lt;a href=&quot;/blog/2019/05/06/faster-s3-reads.html&quot;&gt;performance improvements&lt;/a&gt; for AWS S3.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;cli-and-jdbc-driver&quot;&gt;CLI and JDBC Driver&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;JSON output format and improvements to CSV output format.&lt;/li&gt;
  &lt;li&gt;Support and stability improvements for running the CLI and JDBC driver with Java 11.&lt;/li&gt;
  &lt;li&gt;Improve compatibility of JDBC driver with third-party tools.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Syntax highlighting and multi-line editing.&lt;/p&gt;

    &lt;p&gt;&lt;img src=&quot;/assets/blog/2019-review/presto-cli.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;new-connectors&quot;&gt;New Connectors&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/elasticsearch.html&quot;&gt;Elasticsearch&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/googlesheets.html&quot;&gt;Google Sheets&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/kinesis.html&quot;&gt;Amazon Kinesis&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2019/06/04/phoenix-connector.html&quot;&gt;Apache Phoenix&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/memsql.html&quot;&gt;MemSQL&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Apache Iceberg (preview version still under development)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;other-improvements&quot;&gt;Other Improvements&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://hub.docker.com/r/prestosql/presto&quot;&gt;Presto Docker image&lt;/a&gt; that provides an out-of-the-box single-node 
cluster with the JMX, memory, TPC-DS, and TPC-H catalogs. It can be deployed as a full cluster by 
mounting in configuration and can be used for Kubernetes deployments.&lt;/li&gt;
  &lt;li&gt;Support for LZ4 and Zstd compression in Parquet and ORC. LZ4 is currently the recommended algorithm for fast, lightweight
compression, and Zstd is recommended otherwise.&lt;/li&gt;
  &lt;li&gt;Support for insert-only Hive transactional tables and Hive bucketing v2 as part of 
&lt;a href=&quot;/blog/2019/12/28/hive-3.html&quot;&gt;making Presto compatible with Hive 3&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Improvements in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ANALYZE&lt;/code&gt; statement for Hive connector.&lt;/li&gt;
  &lt;li&gt;Support for &lt;a href=&quot;/blog/2019/05/29/improved-hive-bucketing.html&quot;&gt;multiple files per bucket&lt;/a&gt; 
for Hive tables. This allows inserting data into bucketed tables without having to rewrite entire partitions
and improves Presto compatibility with Hive and other tools.&lt;/li&gt;
  &lt;li&gt;Support for upper- and mixed-case table and column names in JDBC-based connectors.&lt;/li&gt;
  &lt;li&gt;New features and improvements in type mappings in PostgreSQL, MySQL, SQL Server and Redshift
connectors. This includes support for PostgreSQL arrays and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;timestamp with time zone&lt;/code&gt; type, and 
the ability to read columns of unsupported types.&lt;/li&gt;
  &lt;li&gt;Improvements in &lt;a href=&quot;https://github.com/trinodb/trino/pull/833&quot;&gt;Hive compatibility with Hive version 2.3&lt;/a&gt; 
and &lt;a href=&quot;https://github.com/trinodb/trino/pull/1937&quot;&gt;with Cloudera (CDH)’s Hive&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Connector provided view definitions, which allow connectors to generate the definition dynamically at query time. 
For example, the connector can provide a union of two tables filtered on a disjoint time range, with the cutoff 
time determined at resolution time.&lt;/li&gt;
  &lt;li&gt;Lots and lots of bug fixes!&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;coming-up&quot;&gt;Coming Up…&lt;/h1&gt;

&lt;p&gt;These are some of the projects that are currently in progress and are likely to land in the short term.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for pushing down row dereference expressions into connectors. This will help reduce 
the amount of data and CPU needed to process highly nested columnar formats such as ORC and Parquet.&lt;/li&gt;
  &lt;li&gt;Extend dynamic filtering to support distributed joins and other operators. Use dynamic filters for 
pruning partitions at runtime when querying Hive.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/2418&quot;&gt;Extended Late Materialization&lt;/a&gt; support to queries involving 
complex correlated subqueries.&lt;/li&gt;
  &lt;li&gt;Finalize &lt;a href=&quot;/blog/2019/12/28/hive-3.html&quot;&gt;Hive 3 support&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Improved &lt;a href=&quot;https://github.com/trinodb/trino/pull/2358&quot;&gt;INSERT into partitioned tables&lt;/a&gt;, which will help with 
large ETL queries.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/1324&quot;&gt;Improvements and features&lt;/a&gt; in Iceberg connector.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/2028&quot;&gt;Pinot&lt;/a&gt; connector.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/1959&quot;&gt;Oracle&lt;/a&gt; connector.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/2397&quot;&gt;Influx&lt;/a&gt; connector.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/2321&quot;&gt;Prometheus&lt;/a&gt; connector.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trinodb.slack.com/archives/CQT2JH4KG/p1576038838027500&quot;&gt;Salesforce&lt;/a&gt; connector.&lt;/li&gt;
  &lt;li&gt;Support for &lt;a href=&quot;https://github.com/trinodb/trino/pull/2106&quot;&gt;Confluent registry in Kafka connector&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Revamp of the function registry and function resolution to support dynamically-resolved 
functions and SQL-defined functions.&lt;/li&gt;
  &lt;li&gt;A new &lt;a href=&quot;https://github.com/trinodb/trino/pull/2004&quot;&gt;Parquet writer&lt;/a&gt; optimized to work efficiently 
within Presto.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;… and many, many more.&lt;/p&gt;</content>

      
        <author>
          <name>Martin Traverso</name>
        </author>
      

      <summary>What a great year for the Presto community! We started the year with the launch of the Presto Software Foundation, with the long-term goal of ensuring the project remains collaborative, open, and independent from any corporate interest for years to come. Since then, the community around Presto has grown and consolidated. We’ve seen contributions from more than 120 people across over 20 companies. Every week, 280 users and developers interact in the project’s Slack channel. We’d like to take this opportunity to thank everyone who contributed to the project in one way or another. Presto wouldn’t be what it is without your help. With the collaboration of companies such as Starburst, Qubole, Varada, Twitter, ARM Treasure Data, Wix, Red Hat, and the Big Things community, we ran several Presto summits across the world: Tel Aviv, Israel, April 2019; San Francisco, USA, June 2019; Tokyo, Japan, July 2019; Bangalore, India, September 2019; New York, USA, December 2019. All these events were a huge success and brought thousands of Presto users, contributors and other community members together to share their knowledge and experiences. The project has been more active than ever. We completed 28 releases comprising more than 2,850 commits in over 1,500 pull requests. Of course, that alone is not a good measure of progress, so let’s take a closer look at everything that went in. And there is a lot to look at!</summary>

      
      
    </entry>
  
    <entry>
      <title>Hive 3 support in Presto</title>
      <link href="https://trino.io/blog/2019/12/28/hive-3.html" rel="alternate" type="text/html" title="Hive 3 support in Presto" />
      <published>2019-12-28T00:00:00+00:00</published>
      <updated>2019-12-28T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/12/28/hive-3</id>
      <content type="html" xml:base="https://trino.io/blog/2019/12/28/hive-3.html">&lt;p&gt;The Hive community is centered around a few different Hive distributions, one of them
being Hortonworks Data Platform (HDP). Even after the Cloudera-Hortonworks merger, there
is strong interest in HDP 3, featuring Hive 3. Presto is ready for the game.&lt;/p&gt;

&lt;p&gt;In this post, we summarize which Hive 3 features Presto already supports, covering
all the work that went into Presto to achieve that. We also outline the next steps
ahead.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;introduction&quot;&gt;Introduction&lt;/h1&gt;

&lt;p&gt;There are several Hive versions in active use by the Hive community: 0.x, 1.x, 2.x
and 3.x. The Hive 3 major release brings a number of interesting features, including:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;support for Hadoop Erasure Coding (EC), allowing &lt;a href=&quot;https://blog.cloudera.com/introduction-to-hdfs-erasure-coding-in-apache-hadoop/&quot;&gt;much better HDFS storage capacity
utilization&lt;/a&gt;
without reducing data availability,&lt;/li&gt;
  &lt;li&gt;update to ORC ACID transactional tables - they no longer need to be bucketed,&lt;/li&gt;
  &lt;li&gt;transactional tables for all file formats (full ACID for ORC, “insert-only” for other formats),&lt;/li&gt;
  &lt;li&gt;materialized views,&lt;/li&gt;
  &lt;li&gt;a new bucketing function, offering better data distribution and less data skew,&lt;/li&gt;
  &lt;li&gt;new timestamp semantics and timestamp-related changes in file formats,&lt;/li&gt;
  &lt;li&gt;and a lot more (skipping features and changes that are not interesting from the
Presto perspective).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s no surprise that many people want to try out all these features and run Hive 3,
either the Apache project’s official release or the one shipped in HDP version 3.&lt;/p&gt;

&lt;h1 id=&quot;hive-3-in-presto&quot;&gt;Hive 3 in Presto&lt;/h1&gt;

&lt;p&gt;The Presto community expressed interest in using Presto with Hive 3, both in the project’s
&lt;a href=&quot;https://github.com/trinodb/trino/issues/576&quot;&gt;issues&lt;/a&gt; and on &lt;a href=&quot;/slack.html&quot;&gt;Slack&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You spoke, we listened. Actually – we, the community, spoke &lt;em&gt;and&lt;/em&gt; listened.&lt;/p&gt;

&lt;p&gt;Through collaboration between Starburst, Qubole and the wider Presto community, Presto has
gradually improved its compatibility with Hive 3:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Presto 319 &lt;a href=&quot;https://github.com/trinodb/trino/pull/1532&quot;&gt;fixed issues with backwards-incompatible changes in Hive metastore thrift API&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Presto 320 &lt;a href=&quot;https://github.com/trinodb/trino/pull/1614&quot;&gt;added continuous integration with Hive 3&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Presto 321 &lt;a href=&quot;https://github.com/trinodb/trino/pull/1697&quot;&gt;added support for Hive bucketing v2&lt;/a&gt;
(&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&quot;bucketing_version&quot;=&quot;2&quot;&lt;/code&gt;)&lt;/li&gt;
  &lt;li&gt;Presto 325 &lt;a href=&quot;https://github.com/trinodb/trino/pull/1958&quot;&gt;added continuous integration with HDP 3’s Hive 3&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Presto 327 &lt;a href=&quot;https://github.com/trinodb/trino/pull/1034&quot;&gt;added support for reading from insert-only transactional tables&lt;/a&gt;, and &lt;a href=&quot;https://github.com/trinodb/trino/pull/2099&quot;&gt;added compatibility with timestamp
values stored in ORC by Hive 3.1&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Upcoming improvements already being worked on include:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/2068&quot;&gt;Read support for ORC ACID tables&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/1591&quot;&gt;Read support for bucketed ORC ACID tables&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;try-it-out&quot;&gt;Try it out&lt;/h1&gt;

&lt;p&gt;The &lt;a href=&quot;https://twitter.com/findepi/status/1204783485094944768&quot;&gt;amazing Presto community&lt;/a&gt; is working hard on
getting Hive 3 support fully integrated in the Presto project and a lot is already accomplished.
Chances are that all you need is already included in the latest release. If you need one of the upcoming
improvements, watch the pull requests linked above, the &lt;a href=&quot;https://github.com/trinodb/trino/issues/1218&quot;&gt;roadmap issue&lt;/a&gt;,
join &lt;a href=&quot;/slack.html&quot;&gt;Slack&lt;/a&gt; and stay tuned for upcoming release announcements. In the meantime, you
can try out the features today by running the &lt;a href=&quot;https://docs.starburstdata.com/latest/release/release-323-e.html&quot;&gt;323-e release&lt;/a&gt; of Starburst Presto.&lt;/p&gt;

</content>

      
        <author>
          <name>Piotr Findeisen, Starburst Data</name>
        </author>
      

      <summary>The Hive community is centered around a few different Hive distributions, one of them being Hortonworks Data Platform (HDP). Even after the Cloudera-Hortonworks merger there is vivid interest in HDP 3, featuring Hive 3. Presto is ready for the game. In this post, we summarize which Hive 3 features Presto already supports, covering all the work that went into Presto to achieve that. We also outline the next steps ahead.</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto Experiment with Graviton Processor</title>
      <link href="https://trino.io/blog/2019/12/23/Presto-Experiment-with-Graivton-Processor.html" rel="alternate" type="text/html" title="Presto Experiment with Graviton Processor" />
      <published>2019-12-23T00:00:00+00:00</published>
      <updated>2019-12-23T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/12/23/Presto-Experiment-with-Graivton-Processor</id>
      <content type="html" xml:base="https://trino.io/blog/2019/12/23/Presto-Experiment-with-Graivton-Processor.html">&lt;p&gt;This December, AWS announced new instance types powered by the &lt;a href=&quot;https://aws.amazon.com/about-aws/whats-new/2019/12/announcing-new-amazon-ec2-m6g-c6g-and-r6g-instances-powered-by-next-generation-arm-based-aws-graviton2-processors/&quot;&gt;Arm-based AWS Graviton2 Processor&lt;/a&gt;. The M6g, C6g, and R6g instances are designed to deliver up to 40% better price/performance than the current generation of instance types, so they promise significant cost savings. Presto is just a Java application, so we should be able to run our workloads on these cost-effective instances without any modification.&lt;/p&gt;

&lt;p&gt;But is that true? Initially, we did not have a clear answer to how much effort it would take to bring Presto to a different processor architecture. Not having to care about the underlying platform is generally beneficial for development, but if a different processor can improve the performance and stability of Presto, we must care about it. Anything unclear must be proven by experiment.&lt;/p&gt;

&lt;p&gt;This article reports what we need to do to run Presto on the Arm platform, and how much benefit we can potentially obtain from the Graviton processor.&lt;/p&gt;

&lt;p&gt;As the Graviton 2 based instance types are still in preview, we ran Presto on an A1 instance, which contains the first generation of the Graviton processor. It should still be a helpful anchor for understanding the potential benefit of the Graviton 2 processor.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;how-to-make-presto-compatible-with-arm&quot;&gt;How to make Presto compatible with Arm&lt;/h1&gt;

&lt;p&gt;We first need to build a Presto binary that supports the Arm platform. It turns out there is not much to do: as long as the JVM supports the Arm platform, Presto should work without any modification to the application code. However, Presto restricts the platforms it runs on, to protect its functionality, including plugins. For example, the latest Presto supports only &lt;a href=&quot;https://github.com/trinodb/trino/blob/ee05ee5221690d66598039c6e397f7c7cb4c202b/presto-main/src/main/java/io/prestosql/server/PrestoSystemRequirements.java#L69&quot;&gt;x86 and PowerPC architectures&lt;/a&gt;. This limitation prevents us from using Presto on the Arm platform.&lt;/p&gt;

&lt;p&gt;To make Presto runnable on an Arm machine, we need to modify the &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-main/src/main/java/io/prestosql/server/PrestoSystemRequirements.java&quot;&gt;PrestoSystemRequirements&lt;/a&gt; class to allow the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aarch64&lt;/code&gt; architecture. For experimental purposes, we can apply a patch like the following to remove the restriction altogether.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;diff --git a/presto-main/src/main/java/io/prestosql/server/PrestoSystemRequirements.java b/presto-main/src/main/java/io/prestosql/server/PrestoSystemRequirements.java
index 07b7d12c64..b6a1249681 100644
--- a/presto-main/src/main/java/io/prestosql/server/PrestoSystemRequirements.java
+++ b/presto-main/src/main/java/io/prestosql/server/PrestoSystemRequirements.java
@@ -71,9 +71,9 @@ final class PrestoSystemRequirements
 String osName = StandardSystemProperty.OS_NAME.value();
 String osArch = StandardSystemProperty.OS_ARCH.value();
 if (&quot;Linux&quot;.equals(osName)) {
- if (!&quot;amd64&quot;.equals(osArch) &amp;amp;&amp;amp; !&quot;ppc64le&quot;.equals(osArch)) {
- failRequirement(&quot;Presto requires amd64 or ppc64le on Linux (found %s)&quot;, osArch);
- }
 if (&quot;ppc64le&quot;.equals(osArch)) {
 warnRequirement(&quot;Support for the POWER architecture is experimental&quot;);
 }
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This patch is all we need to run Presto on the Arm platform. It should work in most cases, except when using the &lt;a href=&quot;https://trino.io/docs/current/connector/hive.html&quot;&gt;Hive connector&lt;/a&gt;, which depends on native code that is not yet available for the Arm platform.&lt;/p&gt;

&lt;h1 id=&quot;prepare-docker-images&quot;&gt;Prepare Docker Images&lt;/h1&gt;

&lt;p&gt;A Docker container is a convenient way to run Presto experimentally, thanks to its availability and ease of use. But there is one thing to do to build a Docker image that supports multiple platforms.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://docs.docker.com/buildx/working-with-buildx/&quot;&gt;Docker buildx&lt;/a&gt; is an experimental feature providing full support for the &lt;a href=&quot;https://github.com/moby/buildkit&quot;&gt;Moby BuildKit toolkit&lt;/a&gt;. It lets us build a Docker image supporting multiple platforms, including Arm, with a one-line command. However, the feature is not enabled in a typical Docker installation; on macOS, it must be enabled via the experimental flag as follows.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/presto-experiment-with-graviton-processor/docker-daemon.png&quot; alt=&quot;Docker Daemon Experimental Feature&quot; /&gt;&lt;/p&gt;

&lt;p&gt;And make sure to restart the Docker daemon. We can then build the Docker image for Presto supporting the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aarch64&lt;/code&gt; architecture with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;buildx&lt;/code&gt; command. We used the source code of &lt;a href=&quot;https://github.com/trinodb/trino/commit/b0c07249de5c70a70b3037875df4fd0477dec9fc&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;317-SNAPSHOT&lt;/code&gt;&lt;/a&gt; with the earlier patch to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PrestoSystemRequirements&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ docker buildx build \
 --build-arg VERSION=317-SNAPSHOT \
 --platform linux/arm64 \
 -f presto-base/Dockerfile-aarch64 \
 -t lewuathe/presto-base:317-SNAPSHOT-aarch64 \
 presto-base --push

$ docker buildx build \
 --build-arg VERSION=317-SNAPSHOT-aarch64 \
 --platform linux/arm64 \
 -t lewuathe/presto-coordinator:317-SNAPSHOT-aarch64 \
 presto-coordinator --push

$ docker buildx build \
 --build-arg VERSION=317-SNAPSHOT-aarch64 \
 --platform linux/arm64 \
 -t lewuathe/presto-worker:317-SNAPSHOT-aarch64 \
 presto-worker --push
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We should be able to specify multiple platform names in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--platform&lt;/code&gt; option. Unfortunately, the Docker image of OpenJDK for Arm is distributed under &lt;a href=&quot;https://hub.docker.com/r/arm64v8/openjdk/&quot;&gt;a separate organization&lt;/a&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;arm64v8/openjdk&lt;/code&gt;, so building an image supporting Arm requires a separate &lt;a href=&quot;https://github.com/Lewuathe/docker-presto-cluster/blob/master/presto-base/Dockerfile-aarch64&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Dockerfile&lt;/code&gt;&lt;/a&gt;. In any case, Docker images containing Presto with Arm support are now available.&lt;/p&gt;

&lt;h1 id=&quot;setup-a1-instance&quot;&gt;Setup A1 Instance&lt;/h1&gt;

&lt;p&gt;The following setup prepares the environment to run docker-compose on the A1 instance. &lt;a href=&quot;https://github.com/docker/compose/issues/5342&quot;&gt;As no docker-compose binary for Arm&lt;/a&gt; is distributed officially, we need to build and install docker-compose with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pip&lt;/code&gt;. Make sure to run these commands after the instance initialization completes.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Install Docker
$ sudo yum update -y
$ sudo amazon-linux-extras install docker -y
$ sudo service docker start
$ sudo usermod -a -G docker ec2-user

# Install docker-compose
$ sudo yum install python2-pip gcc libffi-devel openssl-devel -y
$ sudo pip install -U docker-compose
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h1 id=&quot;performance-comparison&quot;&gt;Performance Comparison&lt;/h1&gt;

&lt;p&gt;Let’s briefly look at the performance the Graviton processor provides. We use &lt;a href=&quot;https://aws.amazon.com/ec2/instance-types/a1/&quot;&gt;a1.4xlarge&lt;/a&gt; as the benchmark instance for the Graviton processor.&lt;/p&gt;

&lt;p&gt;Here is our specification of the benchmark conditions.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;We use the commit &lt;a href=&quot;https://github.com/trinodb/trino/commit/b0c07249de5c70a70b3037875df4fd0477dec9fc&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;b0c07249de5c70a70b3037875df4fd0477dec9fc&lt;/code&gt;&lt;/a&gt; + the patch previously described.&lt;/li&gt;
  &lt;li&gt;1 coordinator + 2 worker processes run by &lt;a href=&quot;https://docs.docker.com/compose/&quot;&gt;docker-compose&lt;/a&gt; on a single instance.&lt;/li&gt;
  &lt;li&gt;We compare a1.4xlarge with c5.4xlarge, which has the same number of CPU cores and the same memory as a1.4xlarge. We also compare with m5.2xlarge, whose on-demand cost is close to that of a1.4xlarge.&lt;/li&gt;
  &lt;li&gt;We use &lt;a href=&quot;https://github.com/trinodb/trino/tree/master/presto-benchto-benchmarks/src/main/resources/sql/presto/tpch&quot;&gt;q01, q10, q18, and q20&lt;/a&gt; run on the TPCH connector. Since the Presto TPCH connector does not access external storage, we can measure pure CPU performance without worrying about network variance.&lt;/li&gt;
  &lt;li&gt;We choose &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sf1&lt;/code&gt; as the scaling factors of the TPCH connector.&lt;/li&gt;
  &lt;li&gt;For every query, we measure the average runtime of 5 runs after 5 warmup runs.&lt;/li&gt;
&lt;/ul&gt;
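&lt;p&gt;The measurement methodology above can be sketched in a few lines. This is only a minimal illustration of the timing logic, not the actual harness used in this experiment; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;run_query&lt;/code&gt; is a hypothetical callable that executes a single query.&lt;/p&gt;

```python
import time

def benchmark(run_query, warmups=5, runs=5):
    # Discard the warmup executions, as in the methodology above.
    for _ in range(warmups):
        run_query()
    # Average the runtime of the measured executions, in milliseconds.
    timings = []
    for _ in range(runs):
        start = time.monotonic()
        run_query()
        timings.append((time.monotonic() - start) * 1000.0)
    return sum(timings) / len(timings)
```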

&lt;h4 id=&quot;openjdk-8&quot;&gt;OpenJDK 8&lt;/h4&gt;
&lt;p&gt;Here is the result of our experiment. The vertical axis represents the running time in milliseconds.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/presto-experiment-with-graviton-processor/openjdk8-performance.png&quot; alt=&quot;OpenJDK 8 Performance&quot; /&gt;&lt;/p&gt;

&lt;p&gt;It shows that c5.4xlarge consistently achieves the best performance in every case. Between a1.4xlarge and m5.2xlarge, the winner varies by query type; the two instances are roughly competitive with each other.&lt;/p&gt;

&lt;p&gt;Although we use OpenJDK 8 in this case, it may not generate code fully optimized for the Arm architecture. In general, later versions such as &lt;a href=&quot;https://medium.com/@carlosedp/java-benchmarks-on-arm64-17edd8b9ff79&quot;&gt;OpenJDK 9 or 11 give us better performance&lt;/a&gt;.&lt;/p&gt;

&lt;h4 id=&quot;openjdk-11&quot;&gt;OpenJDK 11&lt;/h4&gt;
&lt;p&gt;Let’s try running Presto with OpenJDK 11. There is one thing to do first: since JDK 9, self-attachment via the &lt;a href=&quot;https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8180425&quot;&gt;Attach API&lt;/a&gt; is disabled by default. We found that we needed to allow it by adding the following option to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jvm.config&lt;/code&gt; file; otherwise, an error appears at the bootstrap phase.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-Djdk.attach.allowAttachSelf=true
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Here is the performance comparison with OpenJDK 11.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/presto-experiment-with-graviton-processor/openjdk11-performance.png&quot; alt=&quot;OpenJDK 11 Performance&quot; /&gt;&lt;/p&gt;

&lt;p&gt;a1.4xlarge and c5.4xlarge achieve even higher performance than with OpenJDK 8 in every case. In contrast, m5.2xlarge shows slower results in some cases.
While c5.4xlarge is still the best instance in terms of performance, the gaps between instances are smaller than in the OpenJDK 8 cases. In particular, a1.4xlarge shows relatively competitive performance with the smaller dataset (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny&lt;/code&gt;). How does the scaling factor influence performance? We’ll see.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/presto-experiment-with-graviton-processor/sf-comparison.png&quot; alt=&quot;Scaling Factor Comparison&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The above chart shows how performance is affected by the scaling factor. c5.4xlarge demonstrates the most stable running time, regardless of the scaling factor; if stable performance is the priority, c5.4xlarge is the best option on the list. a1.4xlarge and m5.2xlarge show similar volatility against the scaling factor this time.&lt;/p&gt;

&lt;p&gt;Considering that a1.4xlarge is 40% cheaper than c5.4xlarge, it may make sense to use a1.4xlarge in specific cases: the on-demand cost of &lt;a href=&quot;https://aws.amazon.com/ec2/pricing/on-demand/&quot;&gt;a1.4xlarge is $9.8/day, while c5.4xlarge is $16.3/day&lt;/a&gt;. The public announcement says &lt;a href=&quot;https://aws.amazon.com/ec2/graviton/&quot;&gt;Graviton 2 delivers 7x performance compared to the Graviton processor&lt;/a&gt;, so we may expect even better results from the new generation. We cannot wait for the general availability of Graviton 2.&lt;/p&gt;
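&lt;p&gt;As a quick sanity check of the price gap, using the daily on-demand figures quoted above:&lt;/p&gt;

```python
# On-demand prices quoted above, in USD per day.
a1_4xlarge = 9.8
c5_4xlarge = 16.3

# Fraction saved by choosing a1.4xlarge over c5.4xlarge.
savings = 1 - a1_4xlarge / c5_4xlarge
print(f"a1.4xlarge is about {savings:.0%} cheaper")  # roughly 40%
```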

&lt;h4 id=&quot;amazon-corretto&quot;&gt;Amazon Corretto&lt;/h4&gt;
&lt;p&gt;How about other JVM distributions? Amazon Corretto also supports the Arm architecture, and it distributes &lt;a href=&quot;https://hub.docker.com/layers/amazoncorretto/library/amazoncorretto/11/images/sha256-8f06c4a09e6a0784d6da3fb580bd57c4881df3fc8f56de1f3c0fd66dde20e43c&quot;&gt;a Docker image built for Arm&lt;/a&gt;. Let’s try Amazon Corretto in the same way.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/presto-experiment-with-graviton-processor/a1-instance-performance.png&quot; alt=&quot;A1 Performance&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This chart illustrates the performance results for three JDK implementations: OpenJDK 8, OpenJDK 11, and Amazon Corretto 11. Overall, OpenJDK 11 seems to be the best, but interestingly, Amazon Corretto achieves even better performance in some of the sf1 cases. This indicates that Presto with Amazon Corretto may provide better performance for some query types.&lt;/p&gt;

&lt;h1 id=&quot;wrap-up&quot;&gt;Wrap Up&lt;/h1&gt;

&lt;p&gt;As Presto is just a Java application, there is not much to do to support the Arm platform. Applying one patch and one JVM option gives us a Presto binary supporting the latest platform. It is always exciting to see a new technology used in a complicated distributed system such as Presto. The combination of cutting-edge technologies surely takes us on a journey to a new horizon of technological innovation.&lt;/p&gt;

&lt;p&gt;Last but not least, we used docker-compose and the TPCH connector to quickly execute queries against a Presto cluster on the Arm platform. Note that the performance of a distributed system such as Presto depends on many kinds of factors, so be sure to run your own benchmark carefully before adopting a new instance type in your production environment.&lt;/p&gt;

&lt;p&gt;We have uploaded the Docker images used for this experiment publicly. Feel free to use them if you are interested in running Presto on the Arm platform.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Image for Armv8 using OpenJDK 11
$ docker pull lewuathe/presto-coordinator:327-SNAPSHOT-aarch64
$ docker pull lewuathe/presto-worker:327-SNAPSHOT-aarch64


# Image for Armv8 using Amazon Corretto 11
$ docker pull lewuathe/presto-coordinator:327-SNAPSHOT-corretto
$ docker pull lewuathe/presto-worker:327-SNAPSHOT-corretto
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I have also raised &lt;a href=&quot;https://github.com/trinodb/trino/issues/2262&quot;&gt;an issue&lt;/a&gt; to start a discussion about supporting the Arm architecture in the community. It would be great to get feedback from anyone who is interested.&lt;/p&gt;

&lt;p&gt;Thanks!&lt;/p&gt;</content>

      
        <author>
          <name>Kai Sasaki, Arm Treasure Data</name>
        </author>
      

      <summary>This December, AWS announced new instance types powered by the Arm-based AWS Graviton2 Processor. The M6g, C6g, and R6g instances are designed to deliver up to 40% better price/performance than the current generation of instance types, so they promise significant cost savings. Presto is just a Java application, so we should be able to run our workloads on these cost-effective instances without any modification. But is that true? Initially, we did not have a clear answer to how much effort it would take to bring Presto to a different processor architecture. Not having to care about the underlying platform is generally beneficial for development, but if a different processor can improve the performance and stability of Presto, we must care about it. Anything unclear must be proven by experiment. This article reports what we need to do to run Presto on the Arm platform, and how much benefit we can potentially obtain from the Graviton processor. As the Graviton 2 based instance types are still in preview, we ran Presto on an A1 instance, which contains the first generation of the Graviton processor. It should still be a helpful anchor for understanding the potential benefit of the Graviton 2 processor.</summary>

      
      
    </entry>
  
    <entry>
      <title>First Presto Summit in India, Bangalore, September 2019</title>
      <link href="https://trino.io/blog/2019/09/05/Presto-Summit-Bangalore.html" rel="alternate" type="text/html" title="First Presto Summit in India, Bangalore, September 2019" />
      <published>2019-09-05T00:00:00+00:00</published>
      <updated>2019-09-05T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/09/05/Presto-Summit-Bangalore</id>
      <content type="html" xml:base="https://trino.io/blog/2019/09/05/Presto-Summit-Bangalore.html">&lt;p&gt;&lt;img src=&quot;/assets/blog/Bangalore-2019/MyPost.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.qubole.com/developers/presto-on-qubole/&quot;&gt;Qubole&lt;/a&gt; organized the first ever Presto Summit in India on September 05, 2019.
Bangalore, as the technology and startup hub of India, was the perfect venue for India’s first Presto Summit. Presto has seen a lot
of interest and adoption in the South Asia and Asia Pacific region, as was evident from the
turnout at the last two Presto meetups organized by Qubole over the past year. The conference venue, Courtyard by Marriott
on Outer Ring Road (ORR) - a 17 KM stretch that hosts 10% of Bangalore’s working population (around 1 million people) -
proved to be an ideal destination for Presto enthusiasts, several of whom work in its immediate vicinity.&lt;/p&gt;

&lt;p&gt;With 150 attendees from more than 75 companies, the Presto community in India was super excited and
eager to meet and interact with the Presto co-creators - &lt;a href=&quot;https://www.linkedin.com/in/traversomartin/&quot;&gt;Martin Traverso&lt;/a&gt;,
&lt;a href=&quot;https://www.linkedin.com/in/dainsundstrom/&quot;&gt;Dain Sundstrom&lt;/a&gt; and
&lt;a href=&quot;https://www.linkedin.com/in/electrum/&quot;&gt;David Phillips&lt;/a&gt; - who flew down to Bangalore for the event.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;welcome-note-by-joydeep-sen-sarma&quot;&gt;Welcome Note by Joydeep Sen Sarma&lt;/h1&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Bangalore-2019/JE1A1895.JPG&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.linkedin.com/in/joydeeps/&quot;&gt;Joydeep Sen Sarma&lt;/a&gt;, co-creator of Hive and co-founder of Qubole, kicked off the event by welcoming
the Presto co-creators, speakers and all the attendees. He also provided a brief historical perspective
on Qubole’s contributions to Presto and highlighted the importance of Presto to Qubole’s customer base.&lt;/p&gt;

&lt;h1 id=&quot;keynote-by-martin-dain-and-david&quot;&gt;Keynote by Martin, Dain and David&lt;/h1&gt;
&lt;p&gt;&lt;a href=&quot;https://go.qubole.com/rs/510-QPZ-296/images/Presto%20Summit%20India%20-%201.%20Keynote%20by%20Martin%2C%20David%2C%20Dain.pdf&quot;&gt;Slides&lt;/a&gt;
&lt;a href=&quot;https://youtu.be/viBY8Fa3OjI&quot;&gt;Video&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Bangalore-2019/JE1A1911.JPG&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This was followed by the most awaited presentation of the day -
the keynote from Martin, Dain and David. Martin took the audience through Presto’s journey - from its birth, growth,
and adoption at Facebook to the present, with the formation of the Presto Software Foundation
for wider community involvement. He also highlighted some of their design choices and some missteps along the way.&lt;/p&gt;

&lt;h1 id=&quot;presto-at-grab&quot;&gt;Presto at Grab&lt;/h1&gt;
&lt;p&gt;&lt;a href=&quot;https://go.qubole.com/rs/510-QPZ-296/images/Presto%20Summit%20India%20-%202.%20Talk%20by%20Edwin%20Law%20Grab.pdf&quot;&gt;Slides&lt;/a&gt;
&lt;a href=&quot;https://youtu.be/0TR7Nzs8asc&quot;&gt;Video&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Bangalore-2019/grab-talk.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The first industry speaker of the day was &lt;a href=&quot;https://www.linkedin.com/in/edwinlawhh/&quot;&gt;Edwin Hui Hean Law&lt;/a&gt;,
Data Engineering Lead at &lt;a href=&quot;https://www.grab.com/sg/&quot;&gt;Grab, Singapore&lt;/a&gt;. He and his team flew all the way
from Singapore for the Presto Summit - a true testament to their passion and interest in Presto. His talk
covered Grab’s experience of using Presto on Amazon EMR, followed by their migration to Presto on Qubole,
with his insights on the relative pros and cons of the two platforms. The final part of his talk covered his
team’s recent experimentation with Presto on Kubernetes.&lt;/p&gt;

&lt;h1 id=&quot;read-support-for-hive-acid-tables-in-presto&quot;&gt;Read Support for Hive ACID tables in Presto&lt;/h1&gt;
&lt;p&gt;&lt;a href=&quot;https://go.qubole.com/rs/510-QPZ-296/images/Presto%20Summit%20India%20-%203.%20Talk%20by%20Shubham%20Tagra%20Qubole.pdf&quot;&gt;Slides&lt;/a&gt;
&lt;a href=&quot;https://youtu.be/Q2Nv18ohegA&quot;&gt;Video&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Bangalore-2019/JE1A2023.JPG&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Next, &lt;a href=&quot;https://www.linkedin.com/in/shubham-tagra-267a5838/&quot;&gt;Shubham Tagra&lt;/a&gt;, Sr. Staff at &lt;a href=&quot;https://www.qubole.com/developers/presto-on-qubole/&quot;&gt;Qubole&lt;/a&gt;,
presented his work on read support for Hive ACID tables in Presto. This has become increasingly important with the arrival of
data privacy regulations like GDPR and CCPA that grant users the “right to erasure” and the “right to rectification”:
organisations storing user data are obligated to delete or update that data upon a user’s request.
Hive ACID is an open source solution that addresses these problems around deletes and updates.
Shubham’s talk covered why he picked Hive ACID over other options available in open source, as well as
details of the Hive ACID and Presto integration that he added.&lt;/p&gt;

&lt;h1 id=&quot;presto-optimizations-at-zoho-corporation&quot;&gt;Presto Optimizations at Zoho Corporation&lt;/h1&gt;
&lt;p&gt;&lt;a href=&quot;https://go.qubole.com/rs/510-QPZ-296/images/Presto%20Summit%20India%20-%204.%20Talk%20by%20Praveen%20Krishna%20Zoho.pdf&quot;&gt;Slides&lt;/a&gt;
&lt;a href=&quot;https://youtu.be/mffX12yZTaU&quot;&gt;Video&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Bangalore-2019/JE1A2072.JPG&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;After lunch, &lt;a href=&quot;https://www.linkedin.com/in/praveenkrishna2112/&quot;&gt;Praveen Krishna&lt;/a&gt; from &lt;a href=&quot;https://www.zohocorp.com/&quot;&gt;Zoho Corporation&lt;/a&gt;
presented a summary of his team’s journey with Presto. In order to serve their teams with a fairly small cluster,
they had to optimize Presto at various levels. Praveen’s team started by analyzing the various phases of query execution
and their impact on performance. They optimized Presto’s planner and reduced planning time by
20-30% for queries involving multiple joins on wide tables. He also highlighted how they integrated
Apache Lucene to speed up full-text search operations. After several iterations, his team came up with a model
where they maintain the Lucene index for each row group in the ORC file itself. For columns with a higher null ratio,
replacing normal blocks with run-length-encoded blocks reduced memory consumption. With this logic implemented
in the ORC reader and core Presto, they were able to reduce memory pressure in the cluster.&lt;/p&gt;

&lt;h1 id=&quot;presto-at-walmart-labs&quot;&gt;Presto at Walmart Labs&lt;/h1&gt;
&lt;p&gt;&lt;a href=&quot;https://go.qubole.com/rs/510-QPZ-296/images/Presto%20Summit%20India%20-%205.%20Talk%20by%20Ashish%20Tadose%20Walmart%20Labs.pdf&quot;&gt;Slides&lt;/a&gt;
&lt;a href=&quot;https://youtu.be/wap7Hr7P8Bo&quot;&gt;Video&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Bangalore-2019/JE1A2092.JPG&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The second presentation in this session was from &lt;a href=&quot;https://www.linkedin.com/in/ashish-tadose-78773b22/&quot;&gt;Ashish Kumar Tadose&lt;/a&gt;, 
Principal Engineer at &lt;a href=&quot;https://www.walmartlabs.com/&quot;&gt;Walmart Labs&lt;/a&gt;. He gave an overview of how his team is 
using Presto on Google Cloud Platform (GCP). 
He highlighted the challenges associated with querying diverse data sources at Walmart and how his team has 
tackled these challenges using Presto. His talk also described how his team has implemented monitoring, autoscaling, 
caching (via Alluxio), and security policies (via Ranger).&lt;/p&gt;

&lt;h1 id=&quot;presto-at-inmobi&quot;&gt;Presto at InMobi&lt;/h1&gt;
&lt;p&gt;&lt;a href=&quot;https://go.qubole.com/rs/510-QPZ-296/images/Presto%20Summit%20India%20-%206.%20Talk%20by%20Rohit%20Chatter%20InMobi.pdf&quot;&gt;Slides&lt;/a&gt;
&lt;a href=&quot;https://youtu.be/zEvqrAss7Iw&quot;&gt;Video&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Bangalore-2019/JE1A2222.JPG&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;After a coffee break, &lt;a href=&quot;https://www.linkedin.com/in/rohit-chatter-525b62/&quot;&gt;Rohit Chatter&lt;/a&gt;, CTO at &lt;a href=&quot;https://www.inmobi.com/&quot;&gt;InMobi&lt;/a&gt;, 
provided a historical perspective on how his team migrated from Hive in private data centers to Presto on the 
public cloud. His talk covered various aspects of how his team handles autoscaling and workload management in the cloud.&lt;/p&gt;

&lt;h1 id=&quot;presto-scheduler-changes-for-rubix&quot;&gt;Presto Scheduler Changes for Rubix&lt;/h1&gt;
&lt;p&gt;&lt;a href=&quot;https://go.qubole.com/rs/510-QPZ-296/images/Presto%20Summit%20India%20-%207.%20Talk%20by%20Garvit%20Gupta%2C%20Microsoft%20and%20Ankit%20Dixit%2C%20Qubole.pdf&quot;&gt;Slides&lt;/a&gt;
&lt;a href=&quot;https://youtu.be/x8xIWuQnEFs&quot;&gt;Video&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Bangalore-2019/JE1A2258.JPG&quot; alt=&quot;&quot; /&gt;
&lt;img src=&quot;/assets/blog/Bangalore-2019/JE1A2248.JPG&quot; alt=&quot;&quot; /&gt;
Next, &lt;a href=&quot;https://www.linkedin.com/in/garvitg/&quot;&gt;Garvit Gupta&lt;/a&gt; from &lt;a href=&quot;http://www.microsoft.com&quot;&gt;Microsoft&lt;/a&gt; presented his work on 
Presto scheduler changes for data locality and optimized scheduling for caching engines like &lt;a href=&quot;https://www.qubole.com/rubix/&quot;&gt;RubiX&lt;/a&gt;. 
This work was done primarily during his internship at Qubole. The talk was co-presented 
by &lt;a href=&quot;https://www.linkedin.com/in/ankit-dixit-a725545b/&quot;&gt;Ankit Dixit&lt;/a&gt; from &lt;a href=&quot;https://www.qubole.com/developers/presto-on-qubole/&quot;&gt;Qubole&lt;/a&gt;, 
who first gave an overview of the RubiX caching engine and its architecture. Garvit highlighted the need to consider locality as another dimension 
when assigning splits to nodes, and how this led to the implementation of a new Presto scheduler. 
The new scheduling model prioritizes locality while ensuring a uniform distribution of work across nodes, and 
improves the efficacy of any data caching framework used with Presto. His talk covered the new scheduler 
changes in detail, and concluded with performance numbers showing up to 9x improvement in cached/local reads with RubiX.&lt;/p&gt;

&lt;h1 id=&quot;presto-at-miq-digital&quot;&gt;Presto at MiQ Digital&lt;/h1&gt;
&lt;p&gt;&lt;a href=&quot;https://go.qubole.com/rs/510-QPZ-296/images/Presto%20Summit%20India%20-%208.%20Talk%20by%20Rohit%20Srivastava%20MIQ.pdf&quot;&gt;Slides&lt;/a&gt;
&lt;a href=&quot;https://youtu.be/nOmI48iqlU4&quot;&gt;Video&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Bangalore-2019/JE1A2274.JPG&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The final presentation of the day was from &lt;a href=&quot;https://www.linkedin.com/in/rohitsrivastava20/&quot;&gt;Rohit Srivastava&lt;/a&gt;, 
Engineering Manager at &lt;a href=&quot;http://www.wearemiq.com/&quot;&gt;MiQ Digital&lt;/a&gt;, who presented an overview of the Unified Insights &amp;amp; Data 
Analytics platform at MiQ. He highlighted several challenges his team had to overcome: scaling the 
team, infrastructure, and company; dealing with data copies; the duplication of data pre-processing and the cost and 
effort that goes into it; and meeting strict SLAs. He described how using Presto on Qubole for all 
dashboarding needs, along with standardizing most of their data into the Apache Parquet format 
on S3, has helped overcome some of these challenges.&lt;/p&gt;

&lt;p&gt;In summary, the first Presto Summit in India had a great mix of talks - some covered Presto usage and 
the experience of operating large Presto deployments across multiple clouds, while others focused on niche 
technical contributions around Presto scheduler changes for data locality, speeding up the ORC reader, and read support for 
Hive ACID tables in Presto. Participants had interesting and engaging questions for all the speakers and, in general, 
enjoyed interacting with the Presto founders and other Presto users and developers in the region.&lt;/p&gt;

&lt;p&gt;Videos and slides for all talks can be found &lt;a href=&quot;https://go.qubole.com/2019-09-05---FE---Presto-Summit-19-Bangalore_Post-Summit-Videos-LP-2.html&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We look forward to the next Presto Summit in this region soon!&lt;/p&gt;</content>

      
        <author>
          <name>Vijay Mann, Director of Engineering, Qubole</name>
        </author>
      

      <summary>Qubole organized the first ever Presto Summit in India on September 05, 2019. Bangalore, as the technology and startup hub of India, was the perfect venue for India’s first Presto Summit. Presto has seen a lot of interest and adoption in the South Asia and Asia-Pacific region, as was evident from the turnout at the last two Presto meetups organized by Qubole over the past year. The conference venue, Courtyard By Marriott on Outer Ring Road (ORR) - a 17 km stretch that hosts 10% of Bangalore’s working population (around 1 million people) - proved to be an ideal destination for Presto enthusiasts, several of whom work in its immediate vicinity. With 150 attendees from more than 75 companies, the Presto community in India was excited and eager to meet and interact with Presto co-creators Martin Traverso, Dain Sundstrom and David Phillips, who flew to Bangalore for this event.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/Bangalore-2019/MyPost.png" />
      
    </entry>
  
    <entry>
      <title>Unnest Operator Performance Enhancement with Dictionary Blocks</title>
      <link href="https://trino.io/blog/2019/08/23/unnest-operator-performance-enhancements.html" rel="alternate" type="text/html" title="Unnest Operator Performance Enhancement with Dictionary Blocks" />
      <published>2019-08-23T00:00:00+00:00</published>
      <updated>2019-08-23T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/08/23/unnest-operator-performance-enhancements</id>
      <content type="html" xml:base="https://trino.io/blog/2019/08/23/unnest-operator-performance-enhancements.html">&lt;p&gt;Queries with a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CROSS JOIN UNNEST&lt;/code&gt; clause are expected to see a significant performance improvement starting with version 316.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;executive-summary&quot;&gt;Executive Summary&lt;/h1&gt;

&lt;p&gt;The execution plans for queries with a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CROSS JOIN UNNEST&lt;/code&gt; clause contain an Unnest Operator. The previous implementation of the Unnest Operator performed a deep copy of all input blocks to generate output blocks. This caused high CPU consumption and memory allocation in the operator, and hurt the performance of such queries. The impact was worse for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNNEST&lt;/code&gt; queries accessing a large number of columns, or even a few columns with deeply nested schemas.&lt;/p&gt;

&lt;p&gt;We realized that the implementation could be made more efficient by avoiding copies in the Unnest Operator wherever possible. Using dictionary blocks to create output blocks that point to input elements gives significant CPU and memory benefits by avoiding those copies. The benchmark results for the new Unnest Operator implementation show more than a 10x gain in CPU time and a 3x-5x gain in memory allocation.&lt;/p&gt;

&lt;p&gt;Let’s try to understand this change with an example. At LinkedIn, the most common usage of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CROSS JOIN UNNEST&lt;/code&gt; clause is unnesting a single array or map column. A sample query with the clause looks like the following:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unnest_c1&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CROSS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;UNNEST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unnest_c1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The plots below compare the performance of the Unnest Operator in the previous and the current implementation for three different cases. Each case evaluates the Unnest Operator performance for a query like the one above, on a table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;T&lt;/code&gt; with two columns, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;c0&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;c1&lt;/code&gt;. In all three cases, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;c0&lt;/code&gt; is a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VARCHAR&lt;/code&gt; column, while the nested column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;c1&lt;/code&gt; is of type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ARRAY(VARCHAR)&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MAP(VARCHAR, VARCHAR)&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ARRAY(ROW(VARCHAR, VARCHAR, VARCHAR))&lt;/code&gt; respectively. All the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VARCHAR&lt;/code&gt; elements in both columns have length 50, and the arrays in the second column have lengths distributed uniformly between 0 and 300.&lt;/p&gt;

&lt;p&gt;We used a JMH &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-main/src/test/java/io/prestosql/operator/BenchmarkUnnestOperator.java&quot;&gt;benchmark&lt;/a&gt; to measure the performance of the queries in terms of CPU time and memory allocation per operation. An “operation” (for the purposes of this measurement) is defined as the processing of 10,000 rows by an unnest operator.
These results reflect the speedup of the operator alone and may not extend to the overall query execution.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/unnest-operator-dictionary-block/unnest-blogpost-cpu.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The figure above compares the CPU times before and after the enhancement. In all three cases, every operation finishes more than 10x faster. The new implementation removes the need for copying data during output block generation, giving us significant CPU time savings.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/unnest-operator-dictionary-block/unnest-blogpost-memory.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The figure above compares the memory allocation per operation before and after the enhancement. The new Unnest Operator implementation does not allocate new large memory chunks for output blocks. Instead, it uses integer typed pointers pointing to input block elements, which results in smaller memory allocations than creating new VARCHAR blocks. This brings down the allocation rate by 3x-5x in this example.&lt;/p&gt;

&lt;p&gt;Let’s dig into the design and implementation details.&lt;/p&gt;

&lt;h1 id=&quot;background&quot;&gt;Background&lt;/h1&gt;

&lt;p&gt;An Operator in Presto performs a step of computation on data. The local execution plan for a task involves pipelines of operators. Operators process pages coming from the previous Operator in the pipeline, and produce output pages for the next one. Code for an Operator has to be efficient, since it may be evaluated billions of times for a single query.&lt;/p&gt;

&lt;p&gt;A &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Page&lt;/code&gt; is made up of a set of blocks storing data for different columns. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DictionaryBlock&lt;/code&gt; is one of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Block&lt;/code&gt; implementations in Presto. The elements in a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DictionaryBlock&lt;/code&gt; are represented using an integer array (called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ids&lt;/code&gt;) and a reference to another block. The values in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ids&lt;/code&gt; array represent elements of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DictionaryBlock&lt;/code&gt; by pointing to element indices in the referenced block. DictionaryBlocks enable more efficient encoding of columns with duplicate values.&lt;/p&gt;
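A minimal sketch of this idea in Python (not Presto's actual Java implementation; the class and field names here just mirror the concepts above): the element at position i of a dictionary block is simply values[ids[i]], so duplicated values are stored only once.

```python
# Conceptual sketch of a dictionary-encoded block: element i is
# values[ids[i]], so duplicate values are stored only once.
class DictionaryBlock:
    def __init__(self, values, ids):
        self.values = values  # the referenced ("dictionary") block
        self.ids = ids        # integer indices into values

    def get(self, position):
        return self.values[self.ids[position]]

    def __len__(self):
        return len(self.ids)

# A column with many duplicates: five entries, two distinct strings.
block = DictionaryBlock(["engineer", "manager"], [0, 0, 1, 0, 1])
assert [block.get(i) for i in range(len(block))] == [
    "engineer", "engineer", "manager", "engineer", "manager"]
```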

&lt;p&gt;The Unnest Operator was implemented before the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DictionaryBlock&lt;/code&gt; was added. We saw an opportunity to enhance the performance of this Operator by using DictionaryBlocks. A &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DictionaryBlock&lt;/code&gt; can enable the Unnest Operator to reuse already constructed input blocks. Using DictionaryBlock for the Unnest operator eliminates the need for expensive copies and results in significant compute and memory savings.&lt;/p&gt;

&lt;h1 id=&quot;design&quot;&gt;Design&lt;/h1&gt;

&lt;p&gt;Consider the following &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CROSS JOIN UNNEST&lt;/code&gt; query on a table with one &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VARCHAR&lt;/code&gt; column and one &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ARRAY(VARCHAR)&lt;/code&gt; column.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/unnest-operator-dictionary-block/unnest-blogpost-input-data.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unnested_position&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CROSS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;UNNEST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;positions_held&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unnested_position&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/unnest-operator-dictionary-block/unnest-blogpost-output-data.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Elements of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;name&lt;/code&gt; column are replicated while we unnest elements in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;positions_held&lt;/code&gt; column. In this example, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;name&lt;/code&gt; is a “replicated column”, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;positions_held&lt;/code&gt; will be referred to as an “unnested column”.&lt;/p&gt;

&lt;p&gt;Multiple unnest columns are also allowed (e.g. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNNEST(positions_held, company_name) AS U(unnested_position, unnested_company)&lt;/code&gt;), but that case is less common. It requires special handling, which we discuss &lt;a href=&quot;#dealing-with-multiple-unnest-columns&quot;&gt;later&lt;/a&gt; in the post.&lt;/p&gt;

&lt;p&gt;In the old design, an element from a replicated column would get copied over &lt;em&gt;n&lt;/em&gt; times for building the output, where &lt;em&gt;n&lt;/em&gt; is the cardinality of the element in the unnest column. For example, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Alice&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Bob&lt;/code&gt; will be copied 2 and 3 times respectively. In the new design, the output block will contain &lt;em&gt;n&lt;/em&gt; pointers to the element in the input block, without actually copying. It will store a reference to the input block as well. The benefits here are proportional to the replicated column element sizes. &lt;em&gt;The bigger the element size, the greater the speedup.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/unnest-operator-dictionary-block/unnest-blogpost-replicate-name.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
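To make the pointer-based replication concrete, here is a small Python sketch (a simplification, not Presto's Java code): instead of copying "Alice" twice and "Bob" three times, the output just repeats row indices into the input block.

```python
# Sketch: build the replicated column of an UNNEST output without
# copying. Each input row index is repeated n times, where n is the
# cardinality of that row's array in the unnest column.
def replicate_ids(cardinalities):
    ids = []
    for row, n in enumerate(cardinalities):
        ids.extend([row] * n)
    return ids

names = ["Alice", "Bob"]     # input block of the replicated column
cardinalities = [2, 3]       # lengths of positions_held per row
ids = replicate_ids(cardinalities)
assert ids == [0, 0, 1, 1, 1]
# The output "dictionary block" is (names, ids); reading through it:
assert [names[i] for i in ids] == ["Alice", "Alice", "Bob", "Bob", "Bob"]
```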

&lt;p&gt;Unnest columns are handled the same way. The previous design would copy them over one by one. This becomes CPU intensive and requires new memory allocations, especially in case of deeply nested columns, since a deep copy is required. In the new design, we try to use pointers instead of copies in most of the cases. The following figure shows the output block structure of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unnested_positions&lt;/code&gt; column in the query above, for the old and the new implementation.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/unnest-operator-dictionary-block/unnest-blogpost-unnest-positions.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The indices in the output block &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;B3&lt;/code&gt; shown above are strictly increasing starting from 0, but that is not always the case. The same input block can be used to generate multiple output blocks, with a different set of indices. Another interesting scenario is when multiple columns are being unnested. In that case, the output may require null appends because of the difference in cardinalities. We look for null elements in the input block and use their indices for handling the null-appends. If that is not possible, we have to fall back to copying data. We discuss this in more detail in the next section.&lt;/p&gt;

&lt;h1 id=&quot;implementation-challenges&quot;&gt;Implementation Challenges&lt;/h1&gt;

&lt;h4 id=&quot;extracting-input-from-nested-blocks&quot;&gt;Extracting Input from Nested Blocks&lt;/h4&gt;

&lt;p&gt;Data in the input unnest columns is represented in terms of nested structures (e.g. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ArrayBlock&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MapBlock&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RowBlock&lt;/code&gt;), which create a layer of indirection on top of the actual element blocks. For the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;positions_held&lt;/code&gt; column from the example above, the input block is an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ArrayBlock&lt;/code&gt;, which contains:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;offset information for representing arrays in every row&lt;/li&gt;
  &lt;li&gt;actual data in the form of an underlying element block storing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VARCHAR&lt;/code&gt;s.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For building an output &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DictionaryBlock&lt;/code&gt;, we create pointers to this underlying block. While processing entries from input array block, array offsets are translated to indices of the underlying block. Similar translation has been implemented for unnest columns with array type, map type and array of row type. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ColumnarMap&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ColumnarArray&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ColumnarRow&lt;/code&gt; structures are used for enabling such translation of indices.&lt;/p&gt;
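The offset-to-index translation can be illustrated with a simplified Python sketch (hypothetical structure; the real implementation works on Presto's ColumnarArray and related classes): an array block stores a flat element block plus per-row offsets, and unnesting a row emits the index range between consecutive offsets.

```python
# Sketch: an "array block" as offsets into a flat element block.
# Row r holds elements[offsets[r]:offsets[r + 1]], so unnesting row r
# emits the indices offsets[r] .. offsets[r + 1] - 1 without copying.
elements = ["CEO", "Engineer", "Engineer", "Manager", "VP"]
offsets = [0, 2, 5]  # row 0 -> elements[0:2], row 1 -> elements[2:5]

def unnest_indices(offsets, rows):
    ids = []
    for r in rows:
        ids.extend(range(offsets[r], offsets[r + 1]))
    return ids

ids = unnest_indices(offsets, [0, 1])
assert ids == [0, 1, 2, 3, 4]
assert [elements[i] for i in ids] == elements
```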

&lt;h4 id=&quot;dealing-with-multiple-unnest-columns&quot;&gt;Dealing with Multiple Unnest Columns&lt;/h4&gt;

&lt;p&gt;When a table has more than one nested column, a user may want to unnest multiple columns in the same query. Consider a table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;S&lt;/code&gt; with 3 columns: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;name&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;schools_attended&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;graduation_dates&lt;/code&gt;, of types &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VARCHAR&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ARRAY(VARCHAR)&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ARRAY(VARCHAR)&lt;/code&gt; respectively. Every row in this table indicates the schools attended and the corresponding graduation dates for a person. Let’s say a user wants to unnest the contents of the two array columns into &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unnested_school&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unnested_graduation_date&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;One naive way of doing this is to use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CROSS JOIN UNNEST&lt;/code&gt; clause twice, on the two different columns. This translates to two different &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNNEST&lt;/code&gt; operators (as shown in the query below), each with a single unnest column, producing two independent cross joins, and the execution proceeds the way we discussed earlier. This query structure is not very helpful, since it produces a blown-up cross join of the data.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;S&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unnested_school&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unnested_graduation_date&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;S&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CROSS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;UNNEST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;schools_attended&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unnested_school&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;CROSS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;UNNEST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;graduation_dates&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unnested_graduation_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The correct way to unnest the two columns is to use them in the same unnest clause, as shown below.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;S&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unnested_school&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unnested_graduation_date&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;S&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;CROSS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;UNNEST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;schools_attended&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;graduation_dates&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unnested_school&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;unnested_graduation_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The arrays/maps being unnested in multiple columns can have different cardinalities. In this example, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;graduation_date&lt;/code&gt; value for the last school may not be present, if the user has not yet graduated. Null elements need to be appended to the output unnest columns in such cases.&lt;/p&gt;

&lt;p&gt;In the example data shown below, a NULL element is appended in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unnested_graduation_date&lt;/code&gt; column since the array in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;graduation_dates&lt;/code&gt; column is shorter than that in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;schools_attended&lt;/code&gt; column.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/unnest-operator-dictionary-block/unnest-blogpost-corner-case.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Since we are using a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DictionaryBlock&lt;/code&gt; for building the unnest output column, appending a null gets slightly tricky. How do we create a pointer for representing a NULL? The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DictionaryBlock&lt;/code&gt; implementation, as of now, does not have a way to represent null elements. In such cases, we first check for existence of a null element in the input block. If we find a NULL element there, we use the index of that element while appending NULLs in the output. Otherwise we copy elements from the input to create a new output block, like we used to do in the previous implementation.&lt;/p&gt;
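The fallback logic can be sketched in Python as follows (a simplification of the strategy described above, not the actual Java code): reuse the index of an existing null element of the input when one exists, otherwise materialize a copy.

```python
# Sketch of the null-append strategy: point appended NULLs at an
# existing null element of the input block if there is one; otherwise
# fall back to copying the data into a new block.
def build_output(values, ids, null_appends):
    null_idx = next((i for i, v in enumerate(values) if v is None), None)
    if null_idx is not None:
        # Dictionary path: each appended NULL is an index pointing at
        # the existing null element.
        return values, ids + [null_idx] * null_appends
    # Copy path: materialize a new block with trailing nulls.
    copied = [values[i] for i in ids] + [None] * null_appends
    return copied, list(range(len(copied)))

# Input already contains a null: the dictionary path is taken.
values, ids = build_output(["2001", "2005", None], [0, 1], null_appends=1)
assert [values[i] for i in ids] == ["2001", "2005", None]

# No null in the input: we fall back to copying.
values, ids = build_output(["2001", "2005"], [0, 1], null_appends=1)
assert [values[i] for i in ids] == ["2001", "2005", None]
```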

&lt;p&gt;In cases with multiple unnest columns, the lengths of the arrays/maps are usually the same, and misalignments are infrequent. That said, misalignments can result in copying data while building output blocks if no NULL elements are present in the input. This may reduce the CPU and memory savings (and even increase the average memory allocation in some cases), but this specific case is not common.&lt;/p&gt;

&lt;h1 id=&quot;future-work&quot;&gt;Future Work&lt;/h1&gt;

&lt;p&gt;Performance of queries with a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CROSS JOIN UNNEST&lt;/code&gt; clause can be further improved through the following optimizations.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;While unnesting a deeply nested column of type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;array(row(.....))&lt;/code&gt;, the user is often interested in a small subset of fields from the row. Such cases can benefit from optimization of the logical plan through the pushdown of dereference projections. There are ongoing efforts in the community in this direction.&lt;/li&gt;
&lt;/ul&gt;

&lt;ul&gt;
  &lt;li&gt;The dictionary blocks created in the discussed implementation use the input block as a reference. What happens if the input itself is a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DictionaryBlock&lt;/code&gt;? We end up with two levels of dereferencing. Such cases can be further optimized by collapsing the multiple indirections into a single one.&lt;/li&gt;
&lt;/ul&gt;
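A brief Python sketch of collapsing the indirection mentioned above (hypothetical and simplified): composing the two index arrays yields a single dictionary over the base values.

```python
# Sketch: collapse two levels of dictionary indirection into one.
# outer_ids index into inner_ids, which index into the base values.
def flatten(inner_ids, outer_ids):
    return [inner_ids[i] for i in outer_ids]

base = ["a", "b", "c"]
inner_ids = [2, 0, 1]      # first dictionary over base
outer_ids = [1, 1, 2, 0]   # dictionary built on top of the dictionary
flat = flatten(inner_ids, outer_ids)
assert flat == [0, 0, 1, 2]
# A single lookup now suffices:
assert [base[i] for i in flat] == [base[inner_ids[i]] for i in outer_ids]
```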

&lt;ul&gt;
  &lt;li&gt;The common case for unnest column does not involve any NULL appends. The unnested output &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DictionaryBlock&lt;/code&gt; in this case represents a range over the input block. We can avoid the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DictionaryBlock&lt;/code&gt; creation by using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;getRegion&lt;/code&gt; method on the input block.&lt;/li&gt;
&lt;/ul&gt;

&lt;!--more--&gt;

&lt;ul&gt;
  &lt;li&gt;For variable-width and complex columns, usage of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DictionaryBlock&lt;/code&gt; can be beneficial in terms of CPU and memory. This may be overkill for primitive types (booleans or integers) and we might be better off copying rather than creating a dictionary block. Selectively choosing to use dictionary blocks based on the type can be helpful.&lt;/li&gt;
&lt;/ul&gt;
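&lt;p&gt;The dictionary-collapsing idea above can be sketched with plain index arrays. The following is a hypothetical Python illustration, not Presto’s actual Java implementation: a dictionary over a dictionary is just a composition of index arrays, so two levels of lookup can be flattened into one.&lt;/p&gt;

```python
# Hypothetical sketch: a DictionaryBlock is modeled as (values, ids),
# where block[i] == values[ids[i]]. Two stacked dictionaries compose
# their index arrays, so they can be collapsed into a single level.

def collapse(values, inner_ids, outer_ids):
    """Flatten dictionary-over-dictionary into one indirection."""
    flat_ids = [inner_ids[i] for i in outer_ids]
    return values, flat_ids

values = ["a", "b", "c"]
inner_ids = [2, 0, 1]          # first dictionary over values
outer_ids = [1, 1, 2, 0]       # second dictionary over the first

vals, ids = collapse(values, inner_ids, outer_ids)
resolved = [vals[i] for i in ids]
print(resolved)  # ['a', 'a', 'b', 'c'] -- one lookup per row instead of two
```

&lt;p&gt;The composition is independent of the value type, which is why collapsing can be a pure index-array operation.&lt;/p&gt;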

&lt;h1 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h1&gt;

&lt;p&gt;LinkedIn’s data ecosystem makes heavy use of tables with deeply nested columns, and this change is beneficial for handling Presto queries on such tables. In our internal experiments with production data, we have seen queries perform up to ~9x faster with as much as ~13x less CPU usage.&lt;/p&gt;

&lt;p&gt;We look forward to people in the community trying this out starting with the 316 release. We would love to hear others’ observations of performance after this change. Feel free to reach out to me on &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;Slack&lt;/a&gt; (handle @padesai) or &lt;a href=&quot;https://www.linkedin.com/in/pratham-desai/&quot;&gt;LinkedIn&lt;/a&gt; with questions or feedback.&lt;/p&gt;</content>

      
        <author>
          <name>Pratham Desai, LinkedIn</name>
        </author>
      

      <summary>Queries with CROSS JOIN UNNEST clause are expected to have a significant performance improvement starting version 316.</summary>

      
      
    </entry>
  
    <entry>
      <title>A Report of First Ever Presto Conference Tokyo</title>
      <link href="https://trino.io/blog/2019/07/11/report-for-presto-conference-tokyo.html" rel="alternate" type="text/html" title="A Report of First Ever Presto Conference Tokyo" />
      <published>2019-07-11T00:00:00+00:00</published>
      <updated>2019-07-11T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/07/11/report-for-presto-conference-tokyo</id>
      <content type="html" xml:base="https://trino.io/blog/2019/07/11/report-for-presto-conference-tokyo.html">&lt;p&gt;Nowadays, Presto is attracting attention from a wide variety of companies all around 
the world, and Japan is no exception. Many companies use Presto as their primary data 
processing engine.&lt;/p&gt;

&lt;p&gt;To keep the community members in Japan in touch with each other, we held the 
first ever Presto conference in Tokyo, welcoming the Presto creators, &lt;a href=&quot;https://github.com/dain&quot;&gt;Dain Sundstrom&lt;/a&gt;, 
&lt;a href=&quot;https://github.com/martint&quot;&gt;Martin Traverso&lt;/a&gt;, and &lt;a href=&quot;https://github.com/electrum&quot;&gt;David Phillips&lt;/a&gt;. 
The conference was hosted at the Tokyo office of &lt;a href=&quot;https://www.treasuredata.com/&quot;&gt;Arm Treasure Data&lt;/a&gt;. 
This article summarizes the conference, aiming to convey the excitement in the room.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/presto-conference-tokyo/overall-view.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;presto-current-and-future&quot;&gt;Presto: Current and Future&lt;/h1&gt;

&lt;p&gt;First of all, the Presto creators introduced their recent work and the software foundation 
launched last year. They covered the following changes and enhancements achieved by 
the community recently.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Presto Software Foundation&lt;/li&gt;
  &lt;li&gt;New Connectors
    &lt;ul&gt;
      &lt;li&gt;Phoenix&lt;/li&gt;
      &lt;li&gt;Elasticsearch&lt;/li&gt;
      &lt;li&gt;Apache Ranger&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Attendees also learned about several plans for the near future.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for more complex pushdown to connectors&lt;/li&gt;
  &lt;li&gt;Case-sensitive identifiers&lt;/li&gt;
  &lt;li&gt;Timestamp semantics&lt;/li&gt;
  &lt;li&gt;Dynamic filtering&lt;/li&gt;
  &lt;li&gt;New connectors such as Iceberg, Kinesis, and Druid&lt;/li&gt;
  &lt;li&gt;Coordinator high availability&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;reading-the-source-code-of-presto&quot;&gt;Reading The Source Code of Presto&lt;/h1&gt;

&lt;p&gt;To get attendees accustomed to the technical talks at the conference, 
&lt;a href=&quot;https://github.com/xerial&quot;&gt;Leo&lt;/a&gt; provided a guide to walking through the 
Presto source code. Since the Presto source code repository is enormous, such a guide 
is helpful for developers exploring the forest of the codebase.&lt;/p&gt;

&lt;div style=&quot;text-align: center;&quot;&gt;
&lt;iframe src=&quot;//www.slideshare.net/slideshow/embed_code/key/vTpEZFzu03tVhv&quot; width=&quot;440&quot; height=&quot;330&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; &lt;/iframe&gt; &lt;div style=&quot;margin-bottom:5px&quot;&gt; &lt;strong&gt; &lt;a href=&quot;//www.slideshare.net/taroleo/reading-the-source-code-of-presto&quot; title=&quot;Reading The Source Code of Presto&quot; target=&quot;_blank&quot;&gt;Reading The Source Code of Presto&lt;/a&gt; &lt;/strong&gt; from &lt;strong&gt;&lt;a href=&quot;https://www.slideshare.net/taroleo&quot; target=&quot;_blank&quot;&gt;Taro L. Saito&lt;/a&gt;&lt;/strong&gt; &lt;/div&gt;
&lt;/div&gt;

&lt;h1 id=&quot;presto-at-arm-treasure-data&quot;&gt;Presto At Arm Treasure Data&lt;/h1&gt;

&lt;p&gt;Then &lt;a href=&quot;https://github.com/Lewuathe&quot;&gt;Kai&lt;/a&gt; (that’s me) provided an overview of how Arm Treasure 
Data uses Presto in its service. Presto is heavily used to support many enterprise use 
cases, including IoT data analysis, and it is becoming the hub component processing 
high-throughput workloads from many kinds of clients such as Spark, ODBC, and JDBC.&lt;/p&gt;

&lt;div style=&quot;text-align: center;&quot;&gt;
&lt;iframe src=&quot;//www.slideshare.net/slideshow/embed_code/key/cVfDINF85hx0Vx&quot; width=&quot;440&quot; height=&quot;330&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; &lt;/iframe&gt; &lt;div style=&quot;margin-bottom:5px&quot;&gt; &lt;strong&gt; &lt;a href=&quot;//www.slideshare.net/taroleo/presto-at-arm-treasure-data-2019-updates&quot; title=&quot;Presto At Arm Treasure Data - 2019 Updates&quot; target=&quot;_blank&quot;&gt;Presto At Arm Treasure Data - 2019 Updates&lt;/a&gt; &lt;/strong&gt; from &lt;strong&gt;&lt;a href=&quot;https://www.slideshare.net/taroleo&quot; target=&quot;_blank&quot;&gt;Taro L. Saito&lt;/a&gt;&lt;/strong&gt; &lt;/div&gt;
&lt;/div&gt;

&lt;h1 id=&quot;large-scale-migration-from-hive-to-presto-in-yahoo-japan&quot;&gt;Large Scale Migration from Hive to Presto in Yahoo! JAPAN&lt;/h1&gt;

&lt;p&gt;We learned how hard it is to migrate a large-scale workload from Hive to Presto from the 
presentation given by &lt;a href=&quot;https://github.com/oneonestar&quot;&gt;Star&lt;/a&gt; from Yahoo! JAPAN. Quite a few people 
seemed to be interested in the tool they created to convert HiveQL into Presto SQL. They may 
have faced the same types of challenges.&lt;/p&gt;

&lt;div style=&quot;text-align: center;&quot;&gt;
&lt;iframe src=&quot;//www.slideshare.net/slideshow/embed_code/key/ld3tI0uIzAQe1&quot; width=&quot;440&quot; height=&quot;330&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; &lt;/iframe&gt; &lt;div style=&quot;margin-bottom:5px&quot;&gt; &lt;strong&gt; &lt;a href=&quot;//www.slideshare.net/techblogyahoo/large-scale-migration-fromhive-to-presto-at-yahoo-japan&quot; title=&quot;Large scale migration fromHive to Presto at Yahoo! JAPAN&quot; target=&quot;_blank&quot;&gt;Large scale migration fromHive to Presto at Yahoo! JAPAN&lt;/a&gt; &lt;/strong&gt; from &lt;strong&gt;&lt;a href=&quot;https://www.slideshare.net/techblogyahoo&quot; target=&quot;_blank&quot;&gt;Yahoo!デベロッパーネットワーク&lt;/a&gt;&lt;/strong&gt; &lt;/div&gt;
&lt;/div&gt;

&lt;h1 id=&quot;presto-at-line&quot;&gt;Presto At LINE&lt;/h1&gt;

&lt;p&gt;LINE is the biggest company providing a mobile communication tool in Japan (think of it as the WhatsApp of Japan). 
&lt;a href=&quot;https://github.com/wyukawa&quot;&gt;Wataru Yukawa&lt;/a&gt; and &lt;a href=&quot;https://github.com/ebyhr&quot;&gt;Yuya Ebihara&lt;/a&gt; showed us how 
they improve their platform by collaborating with the community. We learned about difficulties 
and challenges primarily caused by dependencies on other Hadoop ecosystem components such as HDFS and Spark.&lt;/p&gt;

&lt;div style=&quot;text-align: center;&quot;&gt;
&lt;iframe src=&quot;//www.slideshare.net/slideshow/embed_code/key/Hx9oz6Pi1su5rj&quot; width=&quot;440&quot; height=&quot;330&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; &lt;/iframe&gt; &lt;div style=&quot;margin-bottom:5px&quot;&gt; &lt;strong&gt; &lt;a href=&quot;//www.slideshare.net/wyukawa/presto-conferencetokyo2019&quot; title=&quot;Presto conferencetokyo2019&quot; target=&quot;_blank&quot;&gt;Presto conferencetokyo2019&lt;/a&gt; &lt;/strong&gt; from &lt;strong&gt;&lt;a href=&quot;https://www.slideshare.net/wyukawa&quot; target=&quot;_blank&quot;&gt;wyukawa &lt;/a&gt;&lt;/strong&gt; &lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;One notable moment in the session was the discussion about how to make the error 
messages provided by Presto excellent. David and the other creators genuinely care about the 
error messages shown by the system. Improving error messages is one of the best ways to reduce 
the time spent dealing with inquiries about errors. That is the primary reason they keep the 
error messages easy to understand.&lt;/p&gt;

&lt;h1 id=&quot;qa-session&quot;&gt;Q&amp;amp;A Session&lt;/h1&gt;

&lt;p&gt;At the end of the conference, attendees got a chance to freely ask the Presto creators about a bunch of 
topics, not only Presto technical matters but also their working style and thoughts. Here is a part of 
the Q&amp;amp;A from the conference.&lt;/p&gt;

&lt;p&gt;Q: What do you expect most from the Japan community?&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;Judging from our communication with the Israeli community, gaining diversity of use cases will make 
Presto better. We are expecting that kind of diversity. Japan surely has a unique community solving 
its own difficulties. Having a Japanese Slack channel might be a good idea to help each other :)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Q: How do you review the pull request code? How to keep the quality of the code review process?&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;Code review difficulty depends on the complexity of the PR itself. We use IntelliJ extensively to read 
the code base. There are mainly two things that keep the code review quality high. One is that getting 
involved in actual code reviews will make you a good reviewer. The other is automating minor checks 
such as code style. These things help keep the code review process functional.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
  &lt;p&gt;Making the code readable is the most important thing in the Presto codebase.&lt;/p&gt;
  &lt;ul&gt;
    &lt;li&gt;Do not use abbreviations or slang, because not everyone can understand those words at a glance&lt;/li&gt;
    &lt;li&gt;Write a comment -&amp;gt; write the code -&amp;gt; delete the comment. That is the process that makes the code readable in itself.&lt;/li&gt;
  &lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;Q: SQL on Everything approach vs. pursuing the performance. Which direction should Presto move forward?&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;It depends on the community’s decision. However, in our discussions with several companies 
in the community, not a single company has been unconcerned about the performance of Presto.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1 id=&quot;wrap-up&quot;&gt;Wrap Up&lt;/h1&gt;

&lt;p&gt;This conference was the first ever Presto conference in Tokyo with the Presto creators invited. We were
able to have exciting discussions with the community developers and creators. One of the great 
things we found at the conference was the enthusiasm of the creators for making Presto usable 
by every developer. They genuinely care about the error messages seen by users and the quality 
of the code read by developers. Thanks to this kind of usability, from the viewpoint of both 
users and developers, Presto keeps gaining attraction from the community.&lt;/p&gt;

&lt;p&gt;It was a great time to have many conversations with the community members. We really appreciate 
the developers in the community and the creators. Thank you so much for coming to the conference, 
and see you next time!&lt;/p&gt;

&lt;h1 id=&quot;reference&quot;&gt;Reference&lt;/h1&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://techplay.jp/event/733772&quot;&gt;Presto Conference Tokyo 2019&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.slideshare.net/taroleo/reading-the-source-code-of-presto&quot;&gt;Reading The Source Code of Presto&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.slideshare.net/taroleo/presto-at-arm-treasure-data-2019-updates&quot;&gt;Presto At Arm Treasure Data - 2019 Updates&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.slideshare.net/techblogyahoo/large-scale-migration-fromhive-to-presto-at-yahoo-japan&quot;&gt;Large Scale Migration from Hive to Presto in Yahoo! JAPAN&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.slideshare.net/wyukawa/presto-conferencetokyo2019&quot;&gt;Presto At LINE&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content>

      
        <author>
          <name>Kai Sasaki, Arm Treasure Data</name>
        </author>
      

      <summary>Nowadays, Presto is attracting attention from a wide variety of companies all around the world, and Japan is no exception. Many companies use Presto as their primary data processing engine. To keep the community members in Japan in touch with each other, we held the first ever Presto conference in Tokyo, welcoming the Presto creators, Dain Sundstrom, Martin Traverso, and David Phillips. The conference was hosted at the Tokyo office of Arm Treasure Data. This article summarizes the conference, aiming to convey the excitement in the room.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/presto-conference-tokyo/overall-view.jpg" />
      
    </entry>
  
    <entry>
      <title>Introduction to Trino Cost-Based Optimizer</title>
      <link href="https://trino.io/blog/2019/07/04/cbo-introduction.html" rel="alternate" type="text/html" title="Introduction to Trino Cost-Based Optimizer" />
      <published>2019-07-04T00:00:00+00:00</published>
      <updated>2019-07-04T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/07/04/cbo-introduction</id>
      <content type="html" xml:base="https://trino.io/blog/2019/07/04/cbo-introduction.html">&lt;p&gt;Last edited 15 June 2022: Update to use the Trino project name.&lt;/p&gt;

&lt;p&gt;The Cost-Based Optimizer (CBO) in Trino achieves stunning results in industry
standard benchmarks (and not only in benchmarks)! The CBO makes decisions based
on several factors, including shape of the query, filters and table statistics.
I would like to tell you more about what the table statistics are in Trino and
what information can be derived from them.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;This post was originally published at &lt;a href=&quot;https://www.starburstdata.com/technical-blog/introduction-to-presto-cost-based-optimizer/&quot;&gt;Starburst Data Engineering
Blog&lt;/a&gt;.&lt;/p&gt;

&lt;h1 id=&quot;background&quot;&gt;Background&lt;/h1&gt;

&lt;p&gt;Before diving deep into how Trino analyzes statistics, let’s set the stage so
that our considerations are framed in some context. Let’s consider a Data
Scientist who wants to know which customers spend the most money with the
company, based on history of orders (probably to offer them some discounts).
They would probably fire up a query like this:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;price&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;customer&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lineitem&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orderkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orderkey&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;GROUP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;price&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now, Trino needs to create an execution plan for this query. It does so by
first transforming a query to a plan in the simplest possible way — here it
will create CROSS JOINS for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FROM customer c, orders o, lineitem l&lt;/code&gt; part of the
query and FILTER for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WHERE c.custkey = o.custkey AND l.orderkey = o.orderkey&lt;/code&gt;.
The initial plan is very naïve — CROSS JOINS will produce humongous amounts of
intermediate data. There is no point in even trying to execute such a plan and
Trino won’t do that. Instead, it applies transformation to make the plan more
what user probably wanted, as shown below. Note: for succinctness, only part of
the query plan is drawn, without aggregation (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt;) and sorting (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER
BY&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/cbo-introduction/trino-eliminate-cross-join.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Indeed, this is much better than the CROSS JOINS. But we can do even better, if
we consider &lt;em&gt;cost&lt;/em&gt;.&lt;/p&gt;

&lt;h1 id=&quot;cost-based-optimizer&quot;&gt;Cost-Based Optimizer&lt;/h1&gt;

&lt;p&gt;Without going into database internals on how JOIN is implemented, let’s take
for granted that it makes a big difference which table is right and which is
left in the JOIN. (Simple explanation would be that the table on the right
basically needs to be kept in the memory while JOIN result is calculated).
Because of that, the following plans produce the same result, but may have
different execution time or memory requirements.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/cbo-introduction/trino-join-flip.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;CPU time, memory requirements and network bandwidth usage are the three
dimensions that contribute to query execution time, both in single query and
concurrent workloads. These dimensions are captured as the &lt;em&gt;cost&lt;/em&gt; in Trino.&lt;/p&gt;

&lt;p&gt;Our Data Scientist knows that most of the customers made at least one order and
every order had at least one item (and many orders had many items), so
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lineitem&lt;/code&gt; is the biggest table, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders&lt;/code&gt; is medium and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;customer&lt;/code&gt; is the
smallest. When joining &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;customer&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders&lt;/code&gt;, having &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders&lt;/code&gt; on the right
side of the JOIN is not a good idea! However, how can the planner know that? In
the real world, the query planner cannot reliably deduce information just from
table names. This is where table statistics kick in.&lt;/p&gt;

&lt;h2 id=&quot;table-statistics&quot;&gt;Table statistics&lt;/h2&gt;

&lt;p&gt;Trino has &lt;a href=&quot;https://trino.io/docs/current/develop/connectors.html&quot;&gt;connector-based
architecture&lt;/a&gt;. A
connector can provide &lt;a href=&quot;https://trino.io/docs/current/optimizer/statistics.html&quot;&gt;table and column
statistics&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;number of rows in a table,&lt;/li&gt;
  &lt;li&gt;number of distinct values in a column,&lt;/li&gt;
  &lt;li&gt;fraction of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NULL&lt;/code&gt; values in a column,&lt;/li&gt;
  &lt;li&gt;minimum/maximum value in a column,&lt;/li&gt;
  &lt;li&gt;average data size for a column.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Of course, if some information is missing — e.g. the average text length in a
varchar column is unknown — a connector can still provide the other information, and
the Cost-Based Optimizer will be able to use it.&lt;/p&gt;
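&lt;p&gt;As a rough sketch (the field names here are illustrative and do not match Trino’s actual SPI), partially known statistics can be modeled as a record whose fields default to “unknown”, so the optimizer consumes whichever values happen to be present:&lt;/p&gt;

```python
# Illustrative sketch only -- these field names do not match Trino's
# actual connector SPI. The point: every statistic is optional, and
# the optimizer uses whichever values are known.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ColumnStats:
    distinct_values: Optional[int] = None
    null_fraction: Optional[float] = None
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    avg_data_size: Optional[float] = None  # bytes; may be unknown

stats = ColumnStats(distinct_values=1_500_000, null_fraction=0.0)
known = {k: v for k, v in vars(stats).items() if v is not None}
print(known)  # only the provided statistics are available for costing
```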

&lt;p&gt;In our Data Scientist’s example, data sizes can look something like the
following:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/cbo-introduction/trino-data-table-statistics.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Having this knowledge, &lt;a href=&quot;https://trino.io/docs/current/optimizer/cost-based-optimizations.html&quot;&gt;Trino’s Cost-Based
Optimizer&lt;/a&gt;
will come up with completely different join ordering in the plan.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/cbo-introduction/trino-cbo-results.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;filter-statistics&quot;&gt;Filter statistics&lt;/h2&gt;

&lt;p&gt;As we saw, knowing the sizes of the tables involved in a query is fundamental
to properly reordering the joins in the query plan. However, knowing just the
sizes is not enough. Returning to our example, the Data Scientist might want to
drill down into results of their previous query, to know which customers
repeatedly bought and spent most money on a particular item (clearly, this must
be some consumable, or a mobile phone). For this, they will use almost
identical query as the original one, adding one more condition.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;price&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;customer&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lineitem&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orderkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orderkey&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;item&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;106170&lt;/span&gt;                              &lt;span class=&quot;c1&quot;&gt;--- additional condition&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;GROUP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;price&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The additional FILTER might be applied after the JOIN or before. Obviously,
filtering as early as possible is the best strategy, but this also means the
actual size of the data involved in the JOIN will be different now. In our Data
Scientist’s example, the join order will indeed be different.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/cbo-introduction/trino-cbo-results-with-filter.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
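&lt;p&gt;For an equality predicate such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;l.item = 106170&lt;/code&gt;, a common textbook heuristic (shown here as a simplified sketch, not Trino’s exact estimation code) divides the non-NULL row count by the number of distinct values:&lt;/p&gt;

```python
# Simplified selectivity sketch -- not Trino's exact estimation code.
# For an equality predicate col = constant, a standard heuristic is:
#   surviving_rows ~= rows * (1 - null_fraction) / distinct_values

def estimate_equality_filter(rows, distinct_values, null_fraction=0.0):
    if distinct_values is None or distinct_values == 0:
        return None  # unknown statistics: no estimate possible
    return rows * (1.0 - null_fraction) / distinct_values

# e.g. 6M lineitem rows and 2k distinct items: roughly 3k rows pass
print(estimate_equality_filter(6_000_000, 2_000))  # 3000.0
```

&lt;p&gt;With the filtered row count this much smaller, the side of the join on which the filtered table lands can flip, which is exactly what the plan above shows.&lt;/p&gt;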

&lt;h1 id=&quot;under-the-hood&quot;&gt;Under the Hood&lt;/h1&gt;

&lt;h2 id=&quot;execution-time-and-cost&quot;&gt;Execution Time and Cost&lt;/h2&gt;

&lt;p&gt;From an external perspective, only three things really matter:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;execution time,&lt;/li&gt;
  &lt;li&gt;execution cost (in dollars),&lt;/li&gt;
  &lt;li&gt;ability to run (sufficiently) many concurrent queries at a time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The execution time is often called “wall time” to emphasize that we’re not
really interested in “CPU time” or number of machines/nodes/threads involved.
Our Data Scientist’s clock on the wall is the ultimate judge. It would be nice
if they were not forced to get coffee/eat lunch during each query they run. On
the other hand, a CFO will be interested in keeping cluster costs at the lowest
possible level (without, of course, impeding employees’ effectiveness). Lastly,
a System Administrator needs to ensure that all cluster users can work at the
same time. That is, that the cluster can handle many queries at a time,
yielding enough throughput that “wall time” observed by each of the users is
satisfactory.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/cbo-introduction/under-the-hood.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;It is possible to optimize for only one of the above dimensions. For example,
we can have a single-node cluster and the CFO will be happy (but employees will go
somewhere else). Conversely, we may have a thousand-node cluster even if the
company cannot afford it. Users will be (initially) happy, until the company
goes bankrupt. Ultimately, however, we need to balance these trade-offs, which
basically means that queries need to be executed as fast as possible, with as
little resources as possible.&lt;/p&gt;

&lt;p&gt;In Trino, this is modeled with the concept of the cost, which captures
properties like CPU cost, memory requirements and network bandwidth usage.
Different variants of a query execution plan are explored, assigned a cost and
compared. The variant with the least overall cost is selected for execution.
This approach neatly balances the needs of cluster users, administrators and
the CFO.&lt;/p&gt;
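&lt;p&gt;The variant-selection loop can be sketched as a toy model. The cost formulas and row counts below are made up for illustration and are not Trino’s actual cost functions:&lt;/p&gt;

```python
# Toy cost model -- formulas and numbers are illustrative only.
# The build (right) side of a hash join is held in memory, so its
# size dominates memory cost; both sides contribute to CPU cost.

def join_cost(probe_rows, build_rows):
    cpu = probe_rows + build_rows    # rows processed on both sides
    memory = build_rows              # hash table kept on the right side
    network = build_rows             # right side is shuffled/broadcast
    return cpu + memory + network    # collapse into one comparable scalar

variants = {
    "customer JOIN orders": join_cost(probe_rows=1_500_000, build_rows=15_000_000),
    "orders JOIN customer": join_cost(probe_rows=15_000_000, build_rows=1_500_000),
}
best = min(variants, key=variants.get)
print(best)  # orders JOIN customer -- the smaller table goes on the right
```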

&lt;p&gt;The cost of each operation in the query plan is calculated in a way appropriate
for the type of the operation, taking into account statistics of the data
involved in the operation. Now, let’s see where the statistics come from.&lt;/p&gt;

&lt;h2 id=&quot;statistics&quot;&gt;Statistics&lt;/h2&gt;

&lt;p&gt;In our Data Scientist’s example, the row counts for tables were taken directly
from table statistics, i.e. provided by a connector. But where did “~3K rows”
come from? Let’s dive into some nitty-gritty details.&lt;/p&gt;

&lt;p&gt;A query execution plan is made of “building block” operations, including:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;table scans (reading the table; at runtime this is actually combined with a
filter)&lt;/li&gt;
  &lt;li&gt;filters (SQL’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WHERE&lt;/code&gt; clause or any other conditions deduced by the query
planner)&lt;/li&gt;
  &lt;li&gt;projections (i.e. computing output expressions)&lt;/li&gt;
  &lt;li&gt;joins&lt;/li&gt;
  &lt;li&gt;aggregations (in fact there are a few different “building blocks” for
aggregations, but that’s a story for another time)&lt;/li&gt;
  &lt;li&gt;sorting (SQL’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt;)&lt;/li&gt;
  &lt;li&gt;limiting (SQL’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt;)&lt;/li&gt;
  &lt;li&gt;sorting and limiting combined (SQL’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY .. LIMIT ..&lt;/code&gt; deserves
specialized support)&lt;/li&gt;
  &lt;li&gt;and a lot more!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;How the statistics are computed for the most interesting “building blocks”
is discussed below.&lt;/p&gt;

&lt;h2 id=&quot;table-scan-statistics&quot;&gt;Table Scan statistics&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/cbo-introduction/table-scan-statistics.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;As explained in the “Table statistics” section, the connector which defines the
table is responsible for providing the table statistics. Furthermore, the
connector is informed about any filtering conditions that are to be
applied to the data read from the table. This may be important, e.g. in the case
of a Hive partitioned table, where statistics are stored on a per-partition basis.
If the filtering condition excludes some (or many) partitions, the statistics
cover a smaller data set (the remaining partitions) and are therefore more
accurate.&lt;/p&gt;

&lt;p&gt;To recall, a connector can provide the following table and column statistics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;number of rows in a table,&lt;/li&gt;
  &lt;li&gt;number of distinct values in a column,&lt;/li&gt;
  &lt;li&gt;fraction of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NULL&lt;/code&gt; values in a column,&lt;/li&gt;
  &lt;li&gt;minimum/maximum value in a column,&lt;/li&gt;
  &lt;li&gt;average data size for a column.&lt;/li&gt;
&lt;/ul&gt;
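&lt;p&gt;These statistics can be inspected directly from SQL with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW STATS&lt;/code&gt; command. As a quick sketch (the table name is illustrative):&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SHOW STATS FOR lineitem;
-- statistics restricted to the rows a filter would select:
SHOW STATS FOR (SELECT * FROM lineitem WHERE item = 106170);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;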

&lt;h2 id=&quot;filter-statistics-1&quot;&gt;Filter statistics&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/cbo-introduction/filter-statistics.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;When considering a filtering operation, a filter’s condition is analyzed and
the following estimations are calculated:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;the probability that a data row will pass the filtering condition; from
this, the expected number of rows after the filter is derived,&lt;/li&gt;
  &lt;li&gt;the fraction of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NULL&lt;/code&gt; values for columns involved in the filtering condition (for
most conditions, this will simply be 0%),&lt;/li&gt;
  &lt;li&gt;the number of distinct values for columns involved in the filtering condition,&lt;/li&gt;
  &lt;li&gt;the number of distinct values for columns that were not part of the filtering
condition, if their original number of distinct values was higher than the
expected number of data rows that pass the filter.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, for a condition like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;l.item = 106170&lt;/code&gt; we can observe that:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;no rows with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;l.item&lt;/code&gt; being &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NULL&lt;/code&gt; will meet the condition,&lt;/li&gt;
  &lt;li&gt;there will be only one distinct value of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;l.item&lt;/code&gt; (106170) after the
filtering operation,&lt;/li&gt;
  &lt;li&gt;on average, the number of data rows expected to pass the filter will be equal to
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;number_of_input_rows * fraction_of_non_nulls / distinct_values&lt;/code&gt;. (This
assumes, of course, that users most often drill down into the data they really
have, which is a reasonable and safe assumption to make.)&lt;/li&gt;
&lt;/ul&gt;
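&lt;p&gt;As a numeric sketch (the input row count and distinct-value count below are assumed purely for illustration): if the filter’s input has 6,000,000 rows, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;l.item&lt;/code&gt; has no &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NULL&lt;/code&gt; values and 2,000 distinct values, the estimate works out to:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;number_of_input_rows * fraction_of_non_nulls / distinct_values
= 6,000,000 * 1.0 / 2,000
= 3,000 rows
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;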

&lt;h2 id=&quot;projection-statistics&quot;&gt;Projection statistics&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/cbo-introduction/projection-statistics.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Projections (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;l.item - 1 AS iid&lt;/code&gt;) are similar to filters, except that, of
course, they do not impact the expected number of rows after the operation.&lt;/p&gt;

&lt;p&gt;For a projection, the following types of column statistics are calculated (if
possible for given projection expression):&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;number of distinct values produced by the projection,&lt;/li&gt;
  &lt;li&gt;fraction of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NULL&lt;/code&gt; values produced by the projection,&lt;/li&gt;
  &lt;li&gt;minimum/maximum value produced by the projection.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Naturally, if &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iid&lt;/code&gt; is only returned to the user, then these statistics are not
useful. However, if it is later used in a filter or join operation, these
statistics are important to correctly estimate the number of rows that meet the
filter condition or are returned from the join.&lt;/p&gt;
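&lt;p&gt;To make the propagation concrete, here is a sketch for a projection like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;l.item - 1 AS iid&lt;/code&gt; (the input statistics below are assumed for illustration; subtracting a constant shifts the min/max by one and preserves the distinct count and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NULL&lt;/code&gt; fraction):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;l.item: distinct values = 2,000, NULL fraction = 0%, min = 1, max = 200,000
iid:    distinct values = 2,000, NULL fraction = 0%, min = 0, max = 199,999
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;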

&lt;h1 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h1&gt;

&lt;p&gt;Summing up, Trino’s Cost-Based Optimizer is conceptually very simple:
alternative query plans are considered, and the best plan is chosen and executed.
The details are not so simple, though. Fortunately, to use
&lt;a href=&quot;https://trino.io/&quot;&gt;Trino&lt;/a&gt;, one doesn’t need to know all these details.
Of course, anyone with a technical inclination who likes to wander in database
internals is invited to study &lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;the Trino code&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;Enabling Trino CBO is really simple:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;set &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;optimizer.join-reordering-strategy=AUTOMATIC&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;join-distribution-type=AUTOMATIC&lt;/code&gt; in your &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;config.properties&lt;/code&gt;,&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/sql/analyze.html&quot;&gt;analyze&lt;/a&gt; your tables,&lt;/li&gt;
  &lt;li&gt;no, there is no third step. That’s it!&lt;/li&gt;
&lt;/ul&gt;
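&lt;p&gt;In other words (the catalog, schema, and table names below are illustrative):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# etc/config.properties
optimizer.join-reordering-strategy=AUTOMATIC
join-distribution-type=AUTOMATIC
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ANALYZE hive.default.lineitem;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;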

&lt;p&gt;Take Trino CBO for a spin today and let us know about &lt;em&gt;your&lt;/em&gt; Trino
experience!&lt;/p&gt;

&lt;p&gt;□&lt;/p&gt;</content>

      
        <author>
          <name>Piotr Findeisen, Starburst Data</name>
        </author>
      

      <summary>Last edited 15 June 2022: Update to use the Trino project name. The Cost-Based Optimizer (CBO) in Trino achieves stunning results in industry standard benchmarks (and not only in benchmarks)! The CBO makes decisions based on several factors, including shape of the query, filters and table statistics. I would like to tell you more about what the table statistics are in Trino and what information can be derived from them.</summary>

      
      
    </entry>
  
    <entry>
      <title>Dynamic filtering for highly-selective join optimization</title>
      <link href="https://trino.io/blog/2019/06/30/dynamic-filtering.html" rel="alternate" type="text/html" title="Dynamic filtering for highly-selective join optimization" />
      <published>2019-06-30T00:00:00+00:00</published>
      <updated>2019-06-30T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/06/30/dynamic-filtering</id>
      <content type="html" xml:base="https://trino.io/blog/2019/06/30/dynamic-filtering.html">&lt;p&gt;By using dynamic filtering via run-time predicate pushdown, we can significantly optimize highly-selective inner-joins.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;introduction&quot;&gt;Introduction&lt;/h1&gt;

&lt;p&gt;In the highly-selective join scenario, most of the probe-side rows are dropped immediately after being read, since they 
don’t match the join criteria.&lt;/p&gt;
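&lt;p&gt;A typical shape of such a query looks like this (the table and column names are hypothetical): the filtered dimension table yields few build-side rows, so most of the fact-table rows read on the probe side are dropped by the join:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT s.orderkey, s.totalprice
FROM sales s
JOIN items i ON s.item_id = i.id
WHERE i.category = &apos;toys&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;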

&lt;p&gt;Our idea was to extend Presto’s predicate pushdown support from the planning phase to run-time, in order to skip reading 
the non-relevant rows from &lt;a href=&quot;https://www.slideshare.net/OriReshef/presto-for-apps-deck-varada-prestoconf&quot;&gt;our connector&lt;/a&gt; 
into Presto&lt;sup id=&quot;fnref:1&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot; role=&quot;doc-noteref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. It should allow much faster joins, when the build-side scan results in a low-cardinality table:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/dynamic-filtering/dynamic-filtering.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The approach above is called “dynamic filtering”, and there is &lt;a href=&quot;https://github.com/trinodb/trino/issues/52&quot;&gt;an ongoing effort&lt;/a&gt; 
to integrate it into Presto.&lt;/p&gt;

&lt;p&gt;The main difficulty is the need to pass the build-side values from the inner-join operator to the probe-side scan operator, 
since the operators may run on different machines. A possible solution is to use the coordinator to facilitate the message 
passing. However, it requires multiple changes in the existing Presto codebase and careful design is needed to avoid overloading
the coordinator.&lt;/p&gt;

&lt;p&gt;Since it’s a complex feature with lots of moving parts, we suggest the approach below that allows solving it in a simpler way 
for specific join use-cases. We note that parts of the implementation below will also help implementing the general dynamic 
filtering solution.&lt;/p&gt;

&lt;h1 id=&quot;design&quot;&gt;Design&lt;/h1&gt;

&lt;p&gt;Our approach relies on the &lt;a href=&quot;https://www.starburst.io/wp-content/uploads/2018/09/Presto-Cost-Based-Query-Optimizer-WP.pdf&quot;&gt;cost-based optimizer&lt;/a&gt; 
(CBO) that allows using “broadcast” join, since in our case the build-side is much smaller than the probe-side. In this case, 
the probe-side scan and the inner-join operators are running in the same process - so the message passing between them becomes 
much simpler.&lt;/p&gt;

&lt;p&gt;Therefore, most of the required changes are at the 
&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-main/src/main/java/io/prestosql/sql/planner/LocalExecutionPlanner.java&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LocalExecutionPlanner&lt;/code&gt;&lt;/a&gt; 
class, and there are no dependencies on the planner or the coordinator.&lt;/p&gt;

&lt;h1 id=&quot;implementation&quot;&gt;Implementation&lt;/h1&gt;

&lt;p&gt;First, we make sure that a broadcast join is used and that the local stage query plan contains the probe-side 
&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-main/src/main/java/io/prestosql/sql/planner/plan/TableScanNode.java&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TableScan&lt;/code&gt;&lt;/a&gt; node.
Otherwise, we don’t apply the optimization, since we need access to the probe-side &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-main/src/main/java/io/prestosql/split/PageSourceProvider.java&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PageSourceProvider&lt;/code&gt;&lt;/a&gt; 
for predicate pushdown.&lt;/p&gt;

&lt;p&gt;Then, we add a new “collection” operator, just before the hash-builder operator as described below:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/dynamic-filtering/operators.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This operator collects the build-side values, and after its input is over, exposes the resulting dynamic filter as a 
&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-spi/src/main/java/io/prestosql/spi/predicate/TupleDomain.java&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TupleDomain&lt;/code&gt;&lt;/a&gt; 
to the probe-side &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-main/src/main/java/io/prestosql/split/PageSourceProvider.java&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PageSourceProvider&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Since the probe-side scan operators are running concurrently with the build-side collection, we don’t block the first probe-side 
splits - but allow them to be processed while dynamic filters collection is in progress.&lt;/p&gt;

&lt;p&gt;The lookup-join operator is not changed, but the optimization above allows it to process far fewer probe-side rows, while 
keeping the result the same.&lt;/p&gt;

&lt;h1 id=&quot;benchmarks&quot;&gt;Benchmarks&lt;/h1&gt;

&lt;p&gt;We ran TPC-DS queries on an i3.metal 3-node Varada cluster using TPC-DS scale 1000 data.
The following queries benefit the most from our dynamic filtering implementation (measuring the elapsed time in seconds).&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Query&lt;/th&gt;
      &lt;th&gt;Dynamic filtering &amp;amp; CBO&lt;/th&gt;
      &lt;th&gt;Only CBO&lt;/th&gt;
      &lt;th&gt;No CBO&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q10.sql&quot;&gt;q10&lt;/a&gt;&lt;/td&gt;
      &lt;td&gt;2.5&lt;/td&gt;
      &lt;td&gt;8.9&lt;/td&gt;
      &lt;td&gt;10.0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q20.sql&quot;&gt;q20&lt;/a&gt;&lt;/td&gt;
      &lt;td&gt;3.9&lt;/td&gt;
      &lt;td&gt;12.6&lt;/td&gt;
      &lt;td&gt;26.7&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q31.sql&quot;&gt;q31&lt;/a&gt;&lt;/td&gt;
      &lt;td&gt;6.5&lt;/td&gt;
      &lt;td&gt;34.8&lt;/td&gt;
      &lt;td&gt;41.5&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q32.sql&quot;&gt;q32&lt;/a&gt;&lt;/td&gt;
      &lt;td&gt;6.9&lt;/td&gt;
      &lt;td&gt;23.0&lt;/td&gt;
      &lt;td&gt;29.7&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q34.sql&quot;&gt;q34&lt;/a&gt;&lt;/td&gt;
      &lt;td&gt;3.1&lt;/td&gt;
      &lt;td&gt;11.4&lt;/td&gt;
      &lt;td&gt;14.1&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q69.sql&quot;&gt;q69&lt;/a&gt;&lt;/td&gt;
      &lt;td&gt;2.7&lt;/td&gt;
      &lt;td&gt;8.9&lt;/td&gt;
      &lt;td&gt;9.9&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q71.sql&quot;&gt;q71&lt;/a&gt;&lt;/td&gt;
      &lt;td&gt;9.9&lt;/td&gt;
      &lt;td&gt;91.8&lt;/td&gt;
      &lt;td&gt;107.4&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q77.sql&quot;&gt;q77&lt;/a&gt;&lt;/td&gt;
      &lt;td&gt;3.5&lt;/td&gt;
      &lt;td&gt;17.9&lt;/td&gt;
      &lt;td&gt;18.1&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q96.sql&quot;&gt;q96&lt;/a&gt;&lt;/td&gt;
      &lt;td&gt;1.9&lt;/td&gt;
      &lt;td&gt;8.0&lt;/td&gt;
      &lt;td&gt;10.2&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q98.sql&quot;&gt;q98&lt;/a&gt;&lt;/td&gt;
      &lt;td&gt;5.8&lt;/td&gt;
      &lt;td&gt;26.5&lt;/td&gt;
      &lt;td&gt;57.1&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/dynamic-filtering/benchmark.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;For example, running the &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q71.sql&quot;&gt;TPC-DS q71 query&lt;/a&gt; 
results in ~9x performance improvement:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Dynamic filtering&lt;/th&gt;
      &lt;th&gt;Enabled&lt;/th&gt;
      &lt;th&gt;Disabled&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Elapsed (sec)&lt;/td&gt;
      &lt;td&gt;10&lt;/td&gt;
      &lt;td&gt;92&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;CPU (min)&lt;/td&gt;
      &lt;td&gt;14&lt;/td&gt;
      &lt;td&gt;127&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Data read (GB)&lt;/td&gt;
      &lt;td&gt;11&lt;/td&gt;
      &lt;td&gt;112&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h1 id=&quot;discussion&quot;&gt;Discussion&lt;/h1&gt;

&lt;p&gt;These queries are joining large fact “sales” tables with much smaller and filtered dimension tables (e.g. “items”, “customers”, “stores”) - 
resulting in significant optimization by using dynamic filtering.&lt;/p&gt;

&lt;p&gt;Note that we rely on the fact that our connector allows efficient run-time filtering of the probe-side table, by using an inline index 
for every column of each split.&lt;/p&gt;

&lt;p&gt;We also rely on the CBO and statistics’ estimation to correctly convert join distribution type to “broadcast” join. Since current statistics’ 
estimation doesn’t support all query plans, this optimization cannot be currently applied for some types of 
&lt;a href=&quot;https://github.com/trinodb/trino/blob/58b86da0eda9d479d418d9752b8cdd4d2c44d9ae/presto-main/src/main/java/io/prestosql/cost/AggregationStatsRule.java&quot;&gt;aggregations&lt;/a&gt; 
(e.g. &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q19.sql&quot;&gt;TPC-DS q19 query&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;In addition, our current dynamic filtering doesn’t support multiple join operators in the same stage, so there are some TPC-DS queries 
(e.g. &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q13.sql&quot;&gt;q13&lt;/a&gt;) 
that may be optimized further.&lt;/p&gt;

&lt;h1 id=&quot;future-work&quot;&gt;Future work&lt;/h1&gt;

&lt;p&gt;The implementation above is currently in the process of being &lt;a href=&quot;https://github.com/trinodb/trino/pull/931&quot;&gt;reviewed&lt;/a&gt; and will be 
available in a release soon. In addition, we intend to improve the existing implementation to resolve the limitations described above, 
and to support more join patterns.&lt;/p&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot;&gt;
      &lt;p&gt;Initially we had experimented with adding &lt;a href=&quot;https://github.com/trinodb/trino/blob/1afbe98bb1eebfcf9050efa5c9a6bb6ccad80c8c/presto-spi/src/main/java/io/prestosql/spi/connector/ConnectorMetadata.java#L527-L533&quot;&gt;Index Join support&lt;/a&gt; to our connector, but since it requires a global index and efficient lookups for high performance, we switched to the dynamic filtering approach. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;</content>

      
        <author>
          <name>Roman Zeyde</name>
        </author>
      

      <summary>By using dynamic filtering via run-time predicate pushdown, we can significantly optimize highly-selective inner-joins.</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 315</title>
      <link href="https://trino.io/blog/2019/06/15/release-315.html" rel="alternate" type="text/html" title="Release 315" />
      <published>2019-06-15T00:00:00+00:00</published>
      <updated>2019-06-15T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/06/15/release-315</id>
      <content type="html" xml:base="https://trino.io/blog/2019/06/15/release-315.html">&lt;p&gt;This version adds support for
&lt;a href=&quot;https://trino.io/docs/current/sql/select.html#limit-or-fetch-first-clauses&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST ... WITH TIES&lt;/code&gt;&lt;/a&gt;
syntax, locality-awareness to default scheduler for better workload balancing, the new
&lt;a href=&quot;https://trino.io/docs/current/functions/conversion.html#format&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;format()&lt;/code&gt;&lt;/a&gt; function,
and improved support for ORC bloom filters. Additionally, connectors can now provide
view definitions, which opens up several new use cases.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-315.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>This version adds support for FETCH FIRST ... WITH TIES syntax, locality-awareness to default scheduler for better workload balancing, the new format() function, and improved support for ORC bloom filters. Additionally, connectors can now provide view definitions, which opens up several new use cases. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 314</title>
      <link href="https://trino.io/blog/2019/06/08/release-314.html" rel="alternate" type="text/html" title="Release 314" />
      <published>2019-06-08T00:00:00+00:00</published>
      <updated>2019-06-08T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/06/08/release-314</id>
      <content type="html" xml:base="https://trino.io/blog/2019/06/08/release-314.html">&lt;p&gt;This version adds support for reading ZSTD and LZ4-compressed Parquet data
and writing ZSTD-compressed ORC data, improves compatibility with the Hive
2.3+ metastore, supports mixed-case field names in Elasticsearch, adds JSON
output format for the CLI, and improves the rendering of the plan structure
in &lt;a href=&quot;https://trino.io/docs/current/sql/explain.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXPLAIN&lt;/code&gt;&lt;/a&gt; output.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-314.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>This version adds support for reading ZSTD and LZ4-compressed Parquet data and writing ZSTD-compressed ORC data, improves compatibility with the Hive 2.3+ metastore, supports mixed-case field names in Elasticsearch, adds JSON output format for the CLI, and improves the rendering of the plan structure in EXPLAIN output. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Apache Phoenix Connector</title>
      <link href="https://trino.io/blog/2019/06/04/phoenix-connector.html" rel="alternate" type="text/html" title="Apache Phoenix Connector" />
      <published>2019-06-04T00:00:00+00:00</published>
      <updated>2019-06-04T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/06/04/phoenix-connector</id>
      <content type="html" xml:base="https://trino.io/blog/2019/06/04/phoenix-connector.html">&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-312.html&quot;&gt;Presto 312&lt;/a&gt;
introduces a new &lt;a href=&quot;https://trino.io/docs/current/connector/phoenix.html&quot;&gt;Apache Phoenix Connector&lt;/a&gt;, 
which allows Presto to query data stored in &lt;a href=&quot;https://hbase.apache.org/&quot;&gt;HBase&lt;/a&gt;
using &lt;a href=&quot;https://phoenix.apache.org/&quot;&gt;Apache Phoenix&lt;/a&gt;.  This unlocks new capabilities that previously
weren’t possible with Phoenix alone, such as federation (querying of multiple Phoenix clusters) and
joining Phoenix data with data from other Presto data sources.&lt;/p&gt;

&lt;h1 id=&quot;setup&quot;&gt;Setup&lt;/h1&gt;
&lt;p&gt;To get started, simply drop in a new catalog properties file, such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;etc/catalog/phoenix.properties&lt;/code&gt;,
which defines the following:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;connector.name=phoenix
phoenix.connection-url=jdbc:phoenix:host1,host2,host3:2181:/hbase
phoenix.config.resources=/path/to/hbase-site.xml
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;phoenix.connection-url&lt;/code&gt; is the standard Phoenix connection string, which contains the zookeeper
quorum host information and root zookeeper node.&lt;/p&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;phoenix.config.resources&lt;/code&gt; is a comma separated list of configuration files, used to specify any
&lt;a href=&quot;https://phoenix.apache.org/tuning.html&quot;&gt;custom connection properties&lt;/a&gt;.&lt;/p&gt;

&lt;h1 id=&quot;schema&quot;&gt;Schema&lt;/h1&gt;
&lt;p&gt;For the most part, data types in Phoenix match up with those in Presto, with a few
&lt;a href=&quot;https://trino.io/docs/current/connector/phoenix.html#data-types&quot;&gt;minor exceptions&lt;/a&gt;.  One thing
to note, however, is that tables in Phoenix require a primary key, whereas Presto has no concept of
primary keys.  To handle this, the Phoenix connector uses a table property to specify the primary key. 
For example, consider the following statement in Phoenix:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;example&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;pk_part_1&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;pk_part_2&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;bigint&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;CONSTRAINT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pk&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;PRIMARY&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;KEY&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pk_part_1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pk_part_2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The equivalent statement in Presto would look something like:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;phoenix&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;default&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;example&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;pk_part_1&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;pk_part_2&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;bigint&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;rowkeys&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;pk_part_1,pk_part_2&apos;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Additional Phoenix and HBase table properties can be specified in a 
&lt;a href=&quot;https://trino.io/docs/current/connector/phoenix.html#table-properties-phoenix&quot;&gt;similar way&lt;/a&gt;. 
Note also that the default (empty) schema in Phoenix will always map to a Presto schema named “default”.&lt;/p&gt;

&lt;h1 id=&quot;beyond-mapreduce&quot;&gt;Beyond MapReduce&lt;/h1&gt;
&lt;p&gt;When Phoenix users want to run long-running queries that scan over all or most of the data in a table,
they have typically used the Phoenix &lt;a href=&quot;https://phoenix.apache.org/phoenix_mr.html&quot;&gt;MapReduce integration&lt;/a&gt;. 
However, this approach has limitations, as the documentation states:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Note: The SELECT query must not perform any aggregation or use DISTINCT as these are not supported by our map-reduce integration.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is because the framework only constructs simple Mappers which scan over each region.  To
do more complex operations like aggregations, the framework would need Reducers as well.
Someone could implement that, but then they would essentially be on the path towards rewriting
Hive from scratch.&lt;/p&gt;

&lt;p&gt;Presto now provides the ability to do these more complex operations.  The Phoenix connector
performs the same filtered scans as the MapReduce framework, but now the Presto engine does
the aggregations, joins, etc.&lt;/p&gt;
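&lt;p&gt;For instance, using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;example&lt;/code&gt; table defined above, an aggregation that the MapReduce integration cannot perform is now a plain query:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT pk_part_1, sum(val) AS total
FROM phoenix.default.example
GROUP BY pk_part_1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;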

&lt;h1 id=&quot;federation&quot;&gt;Federation&lt;/h1&gt;
&lt;p&gt;With the Phoenix connector, querying multiple Phoenix clusters is as easy as querying the
respective catalogs.  As a simple example, suppose we have one cluster in region &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;us-west&lt;/code&gt; and
another cluster in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;us-east&lt;/code&gt;.  If we create two catalog files, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;phoenix_west.properties&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;phoenix_east.properties&lt;/code&gt;, then we can query both:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;us-west&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;region&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;phoenix_west&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;default&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;example&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;UNION&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;us-east&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;region&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;phoenix_east&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;default&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;example&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h1 id=&quot;joining-with-other-data-sources&quot;&gt;Joining with other data sources&lt;/h1&gt;
&lt;p&gt;Another nice feature of Presto is the ability to join data in Phoenix with other data sources.
Suppose we have the following tables:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;customer (
  custkey bigint,
  comment varchar,
  ...
)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;orders (
  orderkey bigint,
  custkey bigint,
  totalprice double,
  ...
)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Suppose further that:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Either table can hold large amounts of data&lt;/li&gt;
  &lt;li&gt;The customer &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;comment&lt;/code&gt; field can change frequently&lt;/li&gt;
  &lt;li&gt;We want to be able to query for orders with a certain &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;totalprice&lt;/code&gt; range, and join with the
customer table to get the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;comment&lt;/code&gt; for these orders&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Phoenix/HBase is a row-oriented storage solution with very fast lookup by primary key.  On the
other hand, ORC is a column-oriented file format that can filter results by column value very
efficiently.  So in this use case, it might make sense to store the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;customer&lt;/code&gt; table in Phoenix
with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;custkey&lt;/code&gt; as the primary key, and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders&lt;/code&gt; table in ORC, perhaps in an object store like
S3.  We can then use Presto to leverage the strengths of each of our data stores and combine OLTP
with OLAP:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;comment&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;totalprice&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;phoenix&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tpch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customer&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;INNER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;totalprice&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tpch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;totalprice&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h1 id=&quot;insertingupdating-data&quot;&gt;Inserting/Updating data&lt;/h1&gt;
&lt;p&gt;In the prior example, since our &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;customer&lt;/code&gt; data is coming from Phoenix, our OLTP store, we can
easily insert new data:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;phoenix&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tpch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customer&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;101&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;some comment&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Since Presto’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; translates to Phoenix’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPSERT&lt;/code&gt;, inserting is the same as updating:
if there’s already a row with a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;custkey&lt;/code&gt; of 101, then its &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;comment&lt;/code&gt; gets updated instead.&lt;/p&gt;
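&lt;p&gt;For example, running a second &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; with the same key overwrites the existing row rather than failing or creating a duplicate (hypothetical values shown):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;INSERT INTO phoenix.tpch.customer VALUES (101, &apos;updated comment&apos;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;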

&lt;h1 id=&quot;future-work&quot;&gt;Future work&lt;/h1&gt;
&lt;p&gt;With upcoming improvements to Presto, there will be opportunities to further optimize the performance
of the Phoenix connector.&lt;/p&gt;

&lt;p&gt;One of the biggest ways Phoenix optimizes performance is through the use of 
&lt;a href=&quot;https://www.3pillarglobal.com/insights/hbase-coprocessors&quot;&gt;HBase coprocessors&lt;/a&gt;, which allow custom
code to be run on each regionserver.  For example, to do aggregations, Phoenix runs a partial
aggregation in the coprocessor of each table region, and the result for each region is then passed
back to the client for a final aggregation.  That way, the table data itself doesn’t need to be
sent from each region to the client - just the partial aggregation result.  However, currently only
filters are pushed down to the Phoenix connector.  With the ongoing work in Presto to support more
&lt;a href=&quot;https://github.com/trinodb/trino/issues/18&quot;&gt;complex pushdown&lt;/a&gt; to connectors, we will be able to
push down operations like aggregations to the Phoenix connector, which in turn can push them further
down to the HBase coprocessors.&lt;/p&gt;
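&lt;p&gt;As a sketch, with aggregation pushdown a query like the following (against a hypothetical &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;phoenix.tpch.orders&lt;/code&gt; table) could let each regionserver compute partial counts and sums in its coprocessor, sending back only those partial results instead of the raw rows:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT custkey, count(*), sum(totalprice)
FROM phoenix.tpch.orders
GROUP BY custkey
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;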

&lt;p&gt;Another area of potential improvement is integration with Presto’s 
&lt;a href=&quot;https://www.starburstdata.com/technical-blog/introduction-to-presto-cost-based-optimizer/&quot;&gt;cost-based optimizer&lt;/a&gt;,
which can analyze table statistics to do things like join reordering. Phoenix already supports
&lt;a href=&quot;https://phoenix.apache.org/update_statistics.html&quot;&gt;statistics collection&lt;/a&gt;, with more improvements
underway, so this is just a matter of integrating with the Presto statistics framework.&lt;/p&gt;
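&lt;p&gt;Once that integration lands, the statistics Phoenix collects could be inspected from Presto with the existing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW STATS&lt;/code&gt; command, for example:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SHOW STATS FOR phoenix.tpch.customer
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;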

&lt;h1 id=&quot;questions&quot;&gt;Questions?&lt;/h1&gt;
&lt;p&gt;If you have any questions about the connector, or Phoenix in general, feel free to ask on the
Phoenix dev mailing list: &lt;a href=&quot;mailto:dev@phoenix.apache.org&quot;&gt;dev@phoenix.apache.org&lt;/a&gt;.&lt;/p&gt;</content>

      
        <author>
          <name>Vincent Poon</name>
        </author>
      

      <summary>Presto 312 introduces a new Apache Phoenix Connector, which allows Presto to query data stored in HBase using Apache Phoenix. This unlocks new capabilities that previously weren’t possible with Phoenix alone, such as federation (querying of multiple Phoenix clusters) and joining Phoenix data with data from other Presto data sources.</summary>

      
      
    </entry>
  
    <entry>
      <title>Removing redundant ORDER BY</title>
      <link href="https://trino.io/blog/2019/06/03/redundant-order-by.html" rel="alternate" type="text/html" title="Removing redundant ORDER BY" />
      <published>2019-06-03T00:00:00+00:00</published>
      <updated>2019-06-03T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/06/03/redundant-order-by</id>
      <content type="html" xml:base="https://trino.io/blog/2019/06/03/redundant-order-by.html">&lt;p&gt;Optimizers are all about doing work in the most cost-effective manner and avoiding unnecessary work.
Some SQL constructs such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; do not affect query results in many situations, and can negatively
affect performance unless the optimizer is &lt;em&gt;smart enough&lt;/em&gt; to remove them.&lt;/p&gt;

&lt;p&gt;Until very recently, Presto would insert a sorting step for each &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; clause in a query. This, combined
with users and tools inadvertently using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; in places that have no effect, could result in severe
performance degradation and waste of resources. We finally fixed this in
&lt;a href=&quot;https://trino.io/docs/current/release/release-312.html&quot;&gt;Presto 312&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;Quoting from the SQL specification (ISO 9075 Part 2):&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;A &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;query expression&amp;gt;&lt;/code&gt; can contain an optional &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;order by clause&amp;gt;&lt;/code&gt;. The ordering of the rows of the table
 specified by the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;query expression&amp;gt;&lt;/code&gt; is guaranteed only for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;query expression&amp;gt;&lt;/code&gt; that immediately 
 contains the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;order by clause&amp;gt;&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This means that a query engine is free to ignore any &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; clause that doesn’t fit that context. Let’s consider
some examples where the clause is irrelevant.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;some_table&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;another_table&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;field&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;While this query has the semblance of creating a sorted table, that’s not so. Tables in SQL are inherently
unordered. Once the data is written, there’s no guarantee it will come out sorted when read. This is 
particularly true for a parallel, distributed query engine like Presto that reads and processes data using
many threads simultaneously. Note that some storage engines may store data sorted, but that is not controlled
during data insertion. Executing the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; just causes the query to perform poorly due to reduced 
parallelism in the merging step of a distributed sort, and consumes more CPU and memory to sort the data.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;some_table&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;another_table&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;field&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;u&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;some_table&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;key&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;u&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;key&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In this case, whether the tables involved in the join are sorted doesn’t matter, since Presto is going to 
build a hash lookup table out of one of them to execute the join operation. As in the previous example,
preserving the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; just causes the query to perform poorly.&lt;/p&gt;

&lt;p&gt;When &lt;em&gt;does&lt;/em&gt; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; matter? Since it is “guaranteed only for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;query expression&amp;gt;&lt;/code&gt; that immediately 
contains the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;order by clause&amp;gt;&lt;/code&gt;”, only operations that are part of the same &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;query expression&amp;gt;&lt;/code&gt; are 
sensitive to it.&lt;/p&gt;

&lt;p&gt;A query expression is a block with the following structure:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;lt;query expression&amp;gt; ::=
  [ &amp;lt;with clause&amp;gt; ] 
  &amp;lt;query expression body&amp;gt;
  [ &amp;lt;order by clause&amp;gt; ] 
  [ &amp;lt;result offset clause&amp;gt; ] 
  [ &amp;lt;fetch first clause&amp;gt; ]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;query expression body&amp;gt;&lt;/code&gt; resolves to one of the set operations (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNION&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INTERSECT&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXCEPT&lt;/code&gt;), 
a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; construct, or a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VALUES&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TABLE&lt;/code&gt; clause.&lt;/p&gt;

&lt;p&gt;The only operations that occur after an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; are &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST&lt;/code&gt; (a.k.a., &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt;) and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OFFSET&lt;/code&gt;. So, 
unless a subquery contains one of these two clauses, the query engine is free to remove the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; 
clause without breaking the semantics dictated by the specification.&lt;/p&gt;

&lt;p&gt;Here’s an example where the clause is meaningful:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;some_table&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;field&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; 
    &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;another_table&lt;/span&gt; 
    &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; 
    &lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Other databases tackle this in a variety of ways. &lt;a href=&quot;https://mariadb.com/kb/en/library/why-is-order-by-in-a-from-subquery-ignored/&quot;&gt;MariaDB&lt;/a&gt;
and &lt;a href=&quot;https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.remove.orderby.in.subquery&quot;&gt;Hive 3.0&lt;/a&gt;
will ignore redundant &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; clauses. SQL Server, on the other hand, will produce an error:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table
expressions, unless TOP or FOR XML is also specified.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2 id=&quot;whats-the-catch&quot;&gt;What’s the catch?&lt;/h2&gt;

&lt;p&gt;It is a common mistake for users to think the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; clause is meaningful regardless of where it 
appears in a query. The fact that, for implementation reasons, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; has been significant for Presto in some of these cases 
complicates matters. We often see users rely on this when formulating queries where aggregation or window functions 
are sensitive to the order of their inputs:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;array_agg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nation&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;row_number&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;OVER&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nation&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The Right Way™ of doing this in SQL is to use the aggregation or window-specific &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; clause. For the
examples above:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;array_agg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nation&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;row_number&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;OVER&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nation&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In order to ease the transition, the new behavior can be turned off globally via the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;optimizer.skip-redundant-sort&lt;/code&gt;
configuration option or on a per-session basis via the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;skip_redundant_sort&lt;/code&gt; session property. 
These options will be removed in a future version.&lt;/p&gt;
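&lt;p&gt;For example, to restore the old behavior for the current session only:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SET SESSION skip_redundant_sort = false;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;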

&lt;p&gt;Additionally, any time Presto detects a redundant &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; clause, it will warn users about it:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/redundant-order-by/redundant-order-by.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Martin Traverso</name>
        </author>
      

      <summary>Optimizers are all about doing work in the most cost-effective manner and avoiding unnecessary work. Some SQL constructs such as ORDER BY do not affect query results in many situations, and can negatively affect performance unless the optimizer is smart enough to remove them.</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 313</title>
      <link href="https://trino.io/blog/2019/06/01/release-313.html" rel="alternate" type="text/html" title="Release 313" />
      <published>2019-06-01T00:00:00+00:00</published>
      <updated>2019-06-01T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/06/01/release-313</id>
      <content type="html" xml:base="https://trino.io/blog/2019/06/01/release-313.html">&lt;p&gt;This version fixes incorrect results for queries involving &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUPING SETS&lt;/code&gt;
and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt;, fixes selecting the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UUID&lt;/code&gt; type from the CLI and JDBC driver,
and adds support for compression and encryption when using
&lt;a href=&quot;https://trino.io/docs/current/admin/spill.html&quot;&gt;Spill to Disk&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-313.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>This version fixes incorrect results for queries involving GROUPING SETS and LIMIT, fixes selecting the UUID type from the CLI and JDBC driver, and adds support for compression and encryption when using Spill to Disk. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Using Precomputed Hash in SemiJoin Operations</title>
      <link href="https://trino.io/blog/2019/05/30/semijoin-precomputed-hasd.html" rel="alternate" type="text/html" title="Using Precomputed Hash in SemiJoin Operations" />
      <published>2019-05-30T00:00:00+00:00</published>
      <updated>2019-05-30T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/05/30/semijoin-precomputed-hasd</id>
      <content type="html" xml:base="https://trino.io/blog/2019/05/30/semijoin-precomputed-hasd.html">&lt;p&gt;Queries involving &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IN&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NOT IN&lt;/code&gt; over a subquery are much faster in 
&lt;a href=&quot;https://trino.io/docs/current/release/release-312.html&quot;&gt;Presto 312&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/semijoin-precomputed-hash/semijoin-precomputed-hash-gains.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;We ran the benchmark above with 3 workers (r3.2xlarge) and 1 coordinator (r3.xlarge) on 
TPC-DS scale 1000 stored in ORC format using the following queries:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;store_sales&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;store_sales&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ss_customer_sk&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;IN&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c_customer_sk&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;customer&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;store_sales&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;store_sales&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ss_store_sk&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;IN&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s_store_sk&lt;/span&gt; 
    &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;store&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s_hours&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;8AM-4PM&apos;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h1 id=&quot;what-was-the-improvement&quot;&gt;What was the improvement?&lt;/h1&gt;

&lt;p&gt;We found that the optimization to use precomputed hashes, which is enabled by 
default, was missing in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SemiJoin&lt;/code&gt; operator.  Hash values were precomputed at the leaf 
stages, but they were not being used in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SemiJoin&lt;/code&gt; operator, leading to re-calculation 
of the hash values at this operator. Since queries involving &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IN&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NOT IN&lt;/code&gt; over a 
subquery use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SemiJoin&lt;/code&gt; operator, &lt;a href=&quot;https://github.com/trinodb/trino/pull/767&quot;&gt;the fix to use the precomputed hash in the SemiJoin operator&lt;/a&gt; 
improves the performance of such queries significantly.&lt;/p&gt;

&lt;h1 id=&quot;how-does-optimize-hash-generation-optimization-work&quot;&gt;How does the &lt;em&gt;optimize-hash-generation&lt;/em&gt; optimization work?&lt;/h1&gt;

&lt;p&gt;Presto divides a query plan into parts called stages, which can run in parallel on 
multiple nodes, with each node working on a different set of data. There are two types of stages:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Leaf stages: the stages at the leaves of the query plan, which read 
data from a data source, such as a Hive table.&lt;/li&gt;
  &lt;li&gt;Intermediate stages: all other stages, which process 
data from upstream stages.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Exchange&lt;/code&gt; operator shuffles and transfers the output of upstream stages to the
intermediate stages. For certain operators, such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JOIN&lt;/code&gt;, the output of
the leaf stage is partitioned by the values of a column, and the shuffle ensures
that a given partition is always processed by the same task of the intermediate stage.
This partitioning requires computing a hash of that column’s values during the exchange,
and the same hash is needed again in the intermediate stage when the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt;
or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JOIN&lt;/code&gt; is executed. To avoid redundant calculations, Presto computes this hash
in the leaf stage, uses it in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Exchange&lt;/code&gt; operator, and includes it in the output so that the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JOIN&lt;/code&gt; in the intermediate stage can reuse it.&lt;/p&gt;
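The reuse can be sketched in a few lines of Python. This is an illustrative model, not Presto code; the stage functions, the task count, and Python's built-in hash standing in for the engine's hash function are all made up for the example:

```python
NUM_TASKS = 4  # hypothetical number of intermediate-stage tasks

def leaf_stage(rows):
    # Compute the hash of the partitioning column once, alongside each row.
    return [(row, hash(row["city"])) for row in rows]

def exchange(rows_with_hash):
    # Route each row to a task using the precomputed hash; rows with the
    # same city always land in the same partition.
    partitions = {task: [] for task in range(NUM_TASKS)}
    for row, h in rows_with_hash:
        partitions[h % NUM_TASKS].append((row, h))
    return partitions

def final_aggregation(partition):
    # Reuse the same precomputed hash as part of the group lookup key,
    # avoiding a second hash computation per row.
    counts = {}
    for row, h in partition:
        key = (h, row["city"])
        counts[key] = counts.get(key, 0) + 1
    return {city: n for (_, city), n in counts.items()}

rows = [{"city": c} for c in ["Oslo", "Pune", "Oslo", "Pune", "Lima"]]
partitions = exchange(leaf_stage(rows))
result = {}
for partition in partitions.values():
    result.update(final_aggregation(partition))
# result == {"Oslo": 2, "Pune": 2, "Lima": 1}
```

Each row's hash is computed exactly once, in the leaf stage, and both the shuffle and the aggregation consume it.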

&lt;p&gt;Consider this query to count the number of stores per city:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;city&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stores&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;GROUP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;city&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The simplified query plan and its division into stages look like this:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/semijoin-precomputed-hash/query-plan.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The leaf stage (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Stage2&lt;/code&gt;) reads the table from a data source, feeds the partially 
aggregated data to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Stage1&lt;/code&gt; where final aggregation happens, and finally, the result is available 
via &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Stage0&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Each row produced by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Stage2&lt;/code&gt; needs to be partitioned by the value of its &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;city&lt;/code&gt; column to ensure
that data for the same city is processed by the same task of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Stage1&lt;/code&gt;. After the exchange, when a row is consumed
in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Stage1&lt;/code&gt;, it needs to be hashed again to find the group for the row, so that the final aggregation
accumulates results for each city in its corresponding group bucket. Hashing the
values of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;city&lt;/code&gt; column twice is avoided by doing this calculation once while reading the data and then
using it in both the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Exchange&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Final Aggregation&lt;/code&gt; operations, which reduces the CPU usage of the query.
Additionally, pushing this calculation into the leaf stage, which is better parallelized when there is
a large number of splits, improves query latency.&lt;/p&gt;

&lt;h1 id=&quot;how-to-get-this-fix&quot;&gt;How to get this fix?&lt;/h1&gt;

&lt;p&gt;This fix is available in Presto version 312 and above. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;optimize-hash-generation&lt;/code&gt; setting is enabled
by default, so the fix takes effect as soon as you upgrade your Presto installation.&lt;/p&gt;</content>

      
        <author>
          <name>Shubham Tagra, Qubole</name>
        </author>
      

      <summary>Queries involving IN and NOT IN over a subquery are much faster in Presto 312.</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 312</title>
      <link href="https://trino.io/blog/2019/05/29/release-312.html" rel="alternate" type="text/html" title="Release 312" />
      <published>2019-05-29T00:00:00+00:00</published>
      <updated>2019-05-29T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/05/29/release-312</id>
      <content type="html" xml:base="https://trino.io/blog/2019/05/29/release-312.html">&lt;p&gt;This version has many performance improvements (including
&lt;a href=&quot;/blog/2019/05/21/optimizing-the-casts-away.html&quot;&gt;cast optimization&lt;/a&gt;),
a new &lt;a href=&quot;https://trino.io/docs/current/language/types.html#uuid-type&quot;&gt;UUID&lt;/a&gt; data type
and &lt;a href=&quot;https://trino.io/docs/current/functions/uuid.html#uuid&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;uuid()&lt;/code&gt;&lt;/a&gt; function,
a new &lt;a href=&quot;https://trino.io/docs/current/connector/phoenix.html&quot;&gt;Apache Phoenix connector&lt;/a&gt;,
support for the PostgreSQL &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TIMESTAMP WITH TIME ZONE&lt;/code&gt; data type,
support for the MySQL &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JSON&lt;/code&gt; data type,
&lt;a href=&quot;/blog/2019/05/29/improved-hive-bucketing.html&quot;&gt;improved support for Hive bucketed tables&lt;/a&gt;,
and some bug fixes.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-312.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>This version has many performance improvements (including cast optimization), a new UUID data type and uuid() function, a new Apache Phoenix connector, support for the PostgreSQL TIMESTAMP WITH TIME ZONE data type, support for the MySQL JSON data type, improved support for Hive bucketed tables, and some bug fixes. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Improved Hive Bucketing</title>
      <link href="https://trino.io/blog/2019/05/29/improved-hive-bucketing.html" rel="alternate" type="text/html" title="Improved Hive Bucketing" />
      <published>2019-05-29T00:00:00+00:00</published>
      <updated>2019-05-29T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/05/29/improved-hive-bucketing</id>
      <content type="html" xml:base="https://trino.io/blog/2019/05/29/improved-hive-bucketing.html">&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-312.html&quot;&gt;Presto 312&lt;/a&gt;
adds support for the more flexible bucketing introduced in recent
versions of Hive. Specifically, it allows any number of files per bucket,
including zero. This allows inserting data into an existing partition without
having to rewrite the entire partition, and improves the performance of
writes by not requiring the creation of files for empty buckets.&lt;/p&gt;

&lt;h1 id=&quot;hive-bucketing-overview&quot;&gt;Hive bucketing overview&lt;/h1&gt;

&lt;p&gt;Hive bucketing is a simple form of hash partitioning. A table is bucketed
on one or more columns with a fixed number of hash buckets. For example,
a table definition in Presto syntax looks like this:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;page_views&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;user_id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;bigint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;page_url&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;dt&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;date&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;partitioned_by&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ARRAY&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;dt&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;bucketed_by&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ARRAY&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;user_id&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;bucket_count&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The bucketing happens within each partition of the table (or across the entire
table if it is not partitioned). In the above example, the table is partitioned
by date and is declared to have &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;50&lt;/code&gt; buckets using the user ID column. This
means that the table will have &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;50&lt;/code&gt; buckets for each date. The assigned bucket
for each row is determined by hashing the user ID value, so all rows with the
same user ID go into the same bucket.&lt;/p&gt;
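The assignment rule can be sketched in a line of Python; Python's built-in hash stands in here for Hive's actual bucketing hash function, which is different:

```python
BUCKET_COUNT = 50  # matches the table definition above

def bucket_for(user_id):
    # A row's bucket is the hash of the bucketing column modulo the bucket
    # count, so equal user IDs always land in the same bucket.
    return hash(user_id) % BUCKET_COUNT

assert bucket_for(12345) == bucket_for(12345)   # deterministic
assert bucket_for(12345) in range(BUCKET_COUNT)  # always a valid bucket
```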

&lt;h1 id=&quot;original-hive-bucketing&quot;&gt;Original Hive bucketing&lt;/h1&gt;

&lt;p&gt;Originally, Hive required exactly one file per bucket. The files were named
such that the bucket number was implicit based on the file’s position within
the lexicographic ordering of the file names. For example, each of the following
lists of files represents buckets &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;2&lt;/code&gt;, respectively:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;00000_0
00001_0
00002_0
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;file0
file3
file5
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;bucketA
bucketB
bucketD
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The file names are meaningless aside from their ordering with respect to the
other file names.&lt;/p&gt;
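In other words, recovering the bucket numbers only requires sorting the names. A small Python sketch of the positional scheme:

```python
def implicit_buckets(file_names):
    # Under the original scheme, a file's bucket number is simply its
    # position in the lexicographic ordering of the directory listing.
    return {name: bucket for bucket, name in enumerate(sorted(file_names))}

# Each of the example listings above maps to buckets 0, 1, and 2:
assert implicit_buckets(["file3", "file0", "file5"]) == {
    "file0": 0, "file3": 1, "file5": 2,
}
assert implicit_buckets(["bucketD", "bucketA", "bucketB"])["bucketD"] == 2
```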

&lt;h1 id=&quot;whats-the-problem&quot;&gt;What’s the problem?&lt;/h1&gt;

&lt;p&gt;The original Hive bucketing scheme has a couple of problems:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Inserting data into the table by adding additional files is not possible.
Instead, an insert operation requires rewriting all of the existing files,
which can be quite expensive.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;If the data is sparse, some of the buckets might be empty, but because there
must be a file for every bucket, the writer must create an empty file for
each bucket. Some file formats, such as ORC, support zero-byte files as empty
files. Other formats require writing a file with a valid header and footer.
Creating these files adds latency to the write operation, and storing these
tiny files is inefficient for file systems like HDFS which are designed for
large files.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;improved-hive-bucketing&quot;&gt;Improved Hive bucketing&lt;/h1&gt;

&lt;p&gt;Newer versions of Hive support a bucketing scheme where the bucket number is
included in the file name. This is the same naming scheme that Hive has always
used, so it is backward compatible with existing data. The naming convention
puts the bucket number at the start of the file name, zero-padded with leading
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0&lt;/code&gt;s so that the names still sort in bucket order.&lt;/p&gt;

&lt;p&gt;The following list of files shows what data written by Hive might look like for
a table with a bucket count of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;4&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;000000_0            # bucket 0
000000_0_copy_1     # bucket 0
000000_0_copy_2     # bucket 0
000001_0            # bucket 1
000001_0_copy_1     # bucket 1
000003_0            # bucket 3
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We can see that there are multiple files for buckets &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1&lt;/code&gt;, one file for
bucket &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;3&lt;/code&gt;, and no files for bucket &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;2&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Unfortunately, Presto used a different naming convention that was valid
according to the lexicographical ordering requirement, but not the newer
explicit numbering convention. File names written by Presto used to look
like this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;20180102_030405_00641_x1y2z_bucket-00234
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;20180102_030405_00641_x1y2z&lt;/code&gt; value at the start of the file name
is the Presto query ID for the query that wrote the data. This is followed
by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bucket-&lt;/code&gt; plus the padded bucket number. Presto now writes file names
that match the new Hive naming convention, with the bucket number at the
start and the query ID at the end:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;000234_0_20180102_030405_00641_x1y2z
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;When reading bucketed tables, Presto supports both the new Hive convention
and the old Presto convention. Additionally, it still supports the original
Hive scheme when the files do not match either of the naming conventions,
keeping the requirement that there must be exactly one file per bucket.&lt;/p&gt;
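A simplified Python sketch of recognizing the two explicit conventions; the regular expressions and function name here are illustrative, and Presto's actual parsing logic is more involved:

```python
import re

# New Hive convention: zero-padded bucket number first, e.g. 000234_0...
HIVE_NAME = re.compile(r"^(\d+)_\d+")
# Old Presto convention: query ID first, then "bucket-" and the number.
OLD_PRESTO_NAME = re.compile(r"bucket-(\d+)$")

def bucket_number(file_name):
    match = OLD_PRESTO_NAME.search(file_name)
    if match:
        return int(match.group(1))
    match = HIVE_NAME.match(file_name)
    if match:
        return int(match.group(1))
    # Neither convention matched: fall back to the original positional
    # scheme, which requires exactly one file per bucket.
    return None

assert bucket_number("000234_0_20180102_030405_00641_x1y2z") == 234
assert bucket_number("20180102_030405_00641_x1y2z_bucket-00234") == 234
assert bucket_number("000001_0_copy_1") == 1
```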

&lt;h1 id=&quot;skipping-empty-buckets-for-faster-writes&quot;&gt;Skipping empty buckets for faster writes&lt;/h1&gt;

&lt;p&gt;Now that Hive and Presto no longer require files for empty buckets, Presto
does not need to create them. They are still created by default for
compatibility with earlier versions of Hive, Presto, and other tools, but
we expect to disable this behavior in a future release, making writes faster by default.
You may also choose to disable it now if that works for your environment.
This is controlled by the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive.create-empty-bucket-files&lt;/code&gt; configuration
property or the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;create_empty_bucket_files&lt;/code&gt; session property.&lt;/p&gt;</content>

      
        <author>
          <name>David Phillips</name>
        </author>
      

      <summary>Presto 312 adds support for the more flexible bucketing introduced in recent versions of Hive. Specifically, it allows any number of files per bucket, including zero. This allows inserting data into an existing partition without having to rewrite the entire partition, and improves the performance of writes by not requiring the creation of files for empty buckets.</summary>

      
      
    </entry>
  
    <entry>
      <title>Optimizing the Casts Away</title>
      <link href="https://trino.io/blog/2019/05/21/optimizing-the-casts-away.html" rel="alternate" type="text/html" title="Optimizing the Casts Away" />
      <published>2019-05-21T00:00:00+00:00</published>
      <updated>2019-05-21T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/05/21/optimizing-the-casts-away</id>
      <content type="html" xml:base="https://trino.io/blog/2019/05/21/optimizing-the-casts-away.html">&lt;p&gt;The next release of Presto (version 312) will include a new optimization to remove unnecessary casts 
which might have been added implicitly by the query planner or explicitly by users when they wrote the query.&lt;/p&gt;

&lt;p&gt;This is a long post explaining how the optimization works. If you’re only interested in the results,
skip to the &lt;a href=&quot;#results&quot;&gt;last section&lt;/a&gt;. For the full details, read on!&lt;/p&gt;

&lt;script type=&quot;text/javascript&quot; async=&quot;&quot; src=&quot;https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js?config=TeX-AMS_CHTML&quot;&gt;
&lt;/script&gt;

&lt;div style=&quot;display:none&quot;&gt;
$$ 
\newcommand\cast[2]{
    \text{cast}_{\text{#1} \rightarrow \text{#2}}
} 
\newcommand\trueOrNull[1]{
  \text{if}(#1 \text{ is null}, \text{null}, \text{true})
} 
\newcommand\falseOrNull[1]{
  \text{if}(#1 \text{ is null}, \text{null}, \text{false})
} 
$$
&lt;/div&gt;

&lt;p&gt;Like many programming languages, SQL allows certain operations between values of different 
types if there are implicit conversions (a.k.a., implicit casts or coercions) between those types.
This improves usability, as it allows writing expressions like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1.5 &amp;gt; 2&lt;/code&gt; without worrying &lt;em&gt;too much&lt;/em&gt;
about whether the types are compatible (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1.5&lt;/code&gt; is of type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal(2,1)&lt;/code&gt;, while &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;2&lt;/code&gt; is an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;integer&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;During query analysis and planning, Presto introduces explicit casts for any implicit conversion in the
original query as it translates it into the intermediate query plan representation the engine uses 
internally for optimization and execution. This eliminates a layer of complexity for the optimizer, 
which, as a result, doesn’t need to reason about types (type inference) or worry about whether expressions 
are properly typed.&lt;/p&gt;

&lt;p&gt;More importantly, it simplifies the job of defining and implementing operators (e.g., &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;gt;&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;=&lt;/code&gt;, etc). 
Without implicit conversions, there would need to exist a variant of every operator for every combination
 of compatible types. For example, it would be necessary to have an implementation of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;=&lt;/code&gt; operator for 
 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(tinyint, tinyint)&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(tinyint, smallint)&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(tinyint, integer)&lt;/code&gt;, 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(tinyint, bigint)&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(smallint, integer)&lt;/code&gt;, and so on.&lt;/p&gt;

&lt;p&gt;Given two columns, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s :: tinyint&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t :: smallint&lt;/code&gt;, and an expression such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s = t&lt;/code&gt;, the planner 
determines that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt; can be implicitly coerced to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;smallint&lt;/code&gt; and derives the following expression:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CAST(s AS smallint) = t   
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This is not without challenges. The predicate pushdown logic relies on simple equality and 
range comparisons to move predicates around, and importantly, to infer that certain predicates
in one branch of a join can be used to constrain the values on the other side of the join. An
expression like the one above is not “simple” from this perspective due to the type conversion 
involved, and it can defeat the (arguably simplistic) predicate inference algorithm.&lt;/p&gt;

&lt;p&gt;Secondly, if &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&lt;/code&gt; is a constant (or an expression that is effectively constant), the engine has to 
convert every value of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s&lt;/code&gt; it sees during query execution in order to compare it with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&lt;/code&gt;. This 
brings up the obvious question: “can’t it somehow convert &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt; and compare directly”?
It would look like:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;s = CAST(t AS tinyint)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Since &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&lt;/code&gt; is a constant, the term &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CAST(t AS tinyint)&lt;/code&gt; can be trivially pre-computed and reused 
for the entire query. It’s not that simple in the general case, though. A narrowing cast, such
as a conversion from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;smallint&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt;, or from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;double&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;integer&lt;/code&gt;, can fail or alter
the value due to rounding or truncation, so we must take special care to avoid errors or 
change query semantics. We discuss this at length in the sections below.&lt;/p&gt;
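The core safety check can be sketched in Python, with a range standing in for tinyint. The function name and the FALSE shortcut are illustrative only; the real planner also has to preserve null semantics and handles range comparisons separately:

```python
TINYINT_RANGE = range(-128, 128)  # tinyint values are -128..127

def rewrite_equality(constant):
    # Rewrite CAST(s AS smallint) = constant into a predicate on s itself.
    if constant in TINYINT_RANGE:
        # Safe: the constant round-trips through tinyint unchanged.
        return f"s = TINYINT '{constant}'"
    # No tinyint value widens to this constant, so the comparison can
    # never be true; null handling is omitted for brevity.
    return "FALSE"

assert rewrite_equality(127) == "s = TINYINT '127'"
assert rewrite_equality(1000) == "FALSE"
```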

&lt;h1 id=&quot;some-properties-of-well-behaved-implicit-casts&quot;&gt;Some properties of (well-behaved) implicit casts&lt;/h1&gt;

&lt;p&gt;Let’s take a short detour and talk briefly about some properties of well-behaved implicit 
casts we can exploit to do the transformation we described in the previous section.&lt;/p&gt;

&lt;p&gt;Since the query engine is free to insert implicit casts wherever it sees fit, these functions
need to follow some ground rules. Failure to do so can result in queries producing incorrect
results due to changes in query semantics.&lt;/p&gt;

&lt;p&gt;Implicit casts need to have the following properties:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Injective_function&quot;&gt;Injective&lt;/a&gt;. Given \(\cast{S}{T}\) every value in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;S&lt;/code&gt; 
must map to a distinct value in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;T&lt;/code&gt; (this does not imply that every value in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;T&lt;/code&gt; has to map to a value 
in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;S&lt;/code&gt;, though).&lt;/li&gt;
  &lt;li&gt;Order-preserving. Given \(s_1 \in S\) and \(s_2 \in S\),&lt;/li&gt;
&lt;/ul&gt;

\[\begin{equation}
s_1 = s_2 \quad \Rightarrow \quad \cast{S}{T}(s_1) = \cast{S}{T}(s_2) \\
s_1 &amp;lt; s_2 \quad \Rightarrow \quad \cast{S}{T}(s_1) &amp;lt; \cast{S}{T}(s_2) \\
s_1 &amp;gt; s_2 \quad \Rightarrow \quad \cast{S}{T}(s_1) &amp;gt; \cast{S}{T}(s_2)
\end{equation}\]

&lt;p&gt;For exact numeric types (e.g., &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;smallint&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;integer&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal&lt;/code&gt;, etc.), this holds as long as 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;T&lt;/code&gt; has enough integer digits to hold the integral part of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;S&lt;/code&gt; and enough fractional digits to 
hold the fractional part of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;S&lt;/code&gt;.&lt;/p&gt;
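These properties are easy to spot-check. In this sketch, Python ints stand in for tinyint and Python's Decimal for a wider decimal(4,1) type:

```python
from decimal import Decimal

def widen(s):
    # tinyint value to decimal(4,1): always exact, since decimal(4,1) has
    # enough integer digits for -128..127 and the fractional part is zero.
    return Decimal(s).quantize(Decimal("0.1"))

widened = [widen(s) for s in range(-128, 128)]

# Injective: distinct inputs map to distinct outputs.
assert len(set(widened)) == len(widened)
# Order-preserving: the widened sequence is still strictly increasing.
assert widened == sorted(widened)
```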

&lt;p&gt;As an example, the picture below depicts how every value of type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt;, which has a range
of \([-128, 127]\), maps to a distinct value of a wider type such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;smallint&lt;/code&gt;. Also, every value 
of the wider type that is within the range of representable values of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt; has a distinct 
mapping to a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt;. So, for the values within the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt; range, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;smallint&lt;/code&gt;
conversion is &lt;a href=&quot;https://en.wikipedia.org/wiki/Bijection&quot;&gt;bijective&lt;/a&gt;. This is not necessary for the 
transformation to work, but it simplifies one of the cases we’ll consider. We’ll cover this more later.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/optimizing-casts/tinyint-integer.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;On the other hand, some conversions such as those between integer types and decimal types with fractional
parts are injective but not bijective, even when excluding the values outside the range of the narrower
 type.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/optimizing-casts/tinyint-decimal.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The properties clearly hold for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;smallint&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;integer&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bigint&lt;/code&gt;. They also hold for:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal(3,0)&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal(4,1)&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal(5,2)&lt;/code&gt; → …&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;smallint&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal(5,0)&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal(6,1)&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal(7,2)&lt;/code&gt; → …&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;integer&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal(10,0)&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal(11,1)&lt;/code&gt; → …&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bigint&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal(19,0)&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal(20, 1)&lt;/code&gt; → …&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It even works for conversions between exact and approximate numbers, such as:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;smallint&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;real&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;real&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;double&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;integer&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;double&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It does &lt;em&gt;not&lt;/em&gt; work for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bigint&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;double&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;integer&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;real&lt;/code&gt;, or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;double&lt;/code&gt; when precision is large
because not all &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bigint&lt;/code&gt;s fit in a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;double&lt;/code&gt; (64 bits vs 53-bit mantissa) and not all &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;integer&lt;/code&gt;s fit in a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;real&lt;/code&gt; 
(32 bits vs a 24-bit mantissa). Sadly, for legacy reasons Presto allows those conversions implicitly. We “justify” 
it with the argument that “users are dealing with approximate numerics anyway, and since the conversions only 
lose precision in the least significant digits, they are sort of ok”. This is something we’ll revisit in the
future once we have a reasonable story for handling the inherent break in backward compatibility that
removing such conversions would entail.&lt;/p&gt;

&lt;p&gt;Finally, the properties also apply for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;varchar&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;varchar&lt;/code&gt; conversions:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;varchar(0)&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;varchar(1)&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;varchar(2)&lt;/code&gt; → … → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;varchar&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;getting-to-the-point&quot;&gt;Getting to the point…&lt;/h1&gt;

&lt;p&gt;With this in mind, let’s look at the simplest scenario: conversions between integer types.&lt;/p&gt;

&lt;p&gt;As in the example we covered in the introduction, the transformation is straightforward 
when the constant can be represented in the narrower type. Given &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s :: tinyint&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CAST(s AS smallint) = smallint &apos;1&apos;     ⟺  s = tinyint &apos;1&apos;
CAST(s AS smallint) = smallint &apos;127&apos;   ⟺  s = tinyint &apos;127&apos;
CAST(s AS smallint) = smallint &apos;-128&apos;  ⟺  s = tinyint &apos;-128&apos;

CAST(s AS smallint) &amp;gt; smallint &apos;10&apos;    ⟺  s &amp;gt; tinyint &apos;10&apos;
CAST(s AS smallint) &amp;lt; smallint &apos;10&apos;    ⟺  s &amp;lt; tinyint &apos;10&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Of course, when the value is at the edge of the range of the narrower type, we can cleverly 
turn some inequalities into equalities:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CAST(s AS smallint) &amp;gt;= smallint &apos;127&apos;   ⟺  s &amp;gt;= tinyint &apos;127&apos;  
                                        ⟺  s =  tinyint &apos;127&apos;
                                       
CAST(s AS smallint) &amp;lt;= smallint &apos;-128&apos;  ⟺  s &amp;lt;= tinyint &apos;-128&apos;  
                                        ⟺  s =  tinyint &apos;-128&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
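&lt;p&gt;As a quick sanity check, these edge-of-range collapses can be verified exhaustively over the &lt;code&gt;tinyint&lt;/code&gt; domain. Plain Python stands in for the SQL semantics here; this is not engine code:&lt;/p&gt;

```python
# Exhaustive check that the edge-of-range inequalities collapse to
# equalities over the full tinyint domain [-128, 127].
TINYINT = range(-128, 128)

assert all((s >= 127) == (s == 127) for s in TINYINT)
assert all((s <= -128) == (s == -128) for s in TINYINT)
```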

&lt;p&gt;Additionally, we may be able to tell that an expression is always &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;true&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;false&lt;/code&gt;. Special
care needs to be taken when the value is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;null&lt;/code&gt;, though, since in SQL any comparison with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;null&lt;/code&gt; 
yields &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;null&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CAST(s AS smallint) &amp;gt; smallint &apos;127&apos;    ⟺  s &amp;gt; tinyint &apos;127&apos;  
                                        ⟺  if(s is null, null, false)
                                        
CAST(s AS smallint) &amp;lt;= smallint &apos;127&apos;   ⟺  s &amp;lt;= tinyint &apos;127&apos;  
                                        ⟺  if(s is null, null, true)

CAST(s AS smallint) &amp;lt; smallint &apos;-128&apos;   ⟺  s &amp;lt; tinyint &apos;-128&apos;  
                                        ⟺  if(s is null, null, false)
                                        
CAST(s AS smallint) &amp;gt;= smallint &apos;-128&apos;  ⟺  s &amp;gt;= tinyint &apos;-128&apos;  
                                        ⟺  if(s is null, null, true)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We can make similar inferences when the value is outside the range of possible values
for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt;. For equality comparisons, it’s trivial.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CAST(s AS smallint) = smallint &apos;1000&apos;  ⟺  if(s is null, null, false)    
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Conversely,&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CAST(s AS smallint) &amp;lt;&amp;gt; smallint &apos;1000&apos;  ⟺  if(s is null, null, true)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Just like the earlier cases involving comparisons with values at the edge of the range,
we can apply the same idea when the value falls outside of the range:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CAST(s AS smallint) &amp;lt; smallint &apos;1000&apos;   ⟺  if(s is null, null, true) 
CAST(s AS smallint) &amp;lt; smallint &apos;-1000&apos;  ⟺  if(s is null, null, false)

CAST(s AS smallint) &amp;gt; smallint &apos;1000&apos;   ⟺  if(s is null, null, false) 
CAST(s AS smallint) &amp;gt; smallint &apos;-1000&apos;  ⟺  if(s is null, null, true)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
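&lt;p&gt;The same exhaustive style confirms the out-of-range reductions: for every non-null &lt;code&gt;tinyint&lt;/code&gt; value the comparison is a constant, and only the null case requires the &lt;code&gt;if(...)&lt;/code&gt; guard. Again, plain Python standing in for the SQL semantics:&lt;/p&gt;

```python
# For constants outside the tinyint range, every comparison is constant
# over the whole domain; only the null case needs the if(...) guard.
TINYINT = range(-128, 128)

assert all(s < 1000 for s in TINYINT)        # always true
assert not any(s < -1000 for s in TINYINT)   # always false
assert not any(s > 1000 for s in TINYINT)    # always false
assert all(s > -1000 for s in TINYINT)       # always true
assert not any(s == 1000 for s in TINYINT)   # equality is always false
```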

&lt;h1 id=&quot;unrepresentable-values&quot;&gt;Unrepresentable values&lt;/h1&gt;

&lt;p&gt;Values that are outside the range of the narrower type may not be the only ones without a mapping. 
For example, for a type such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal(2,1)&lt;/code&gt;, any value with a fractional part (e.g., &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1.5&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;2.3&lt;/code&gt;) cannot 
be represented as a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;We can tell whether a value &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&lt;/code&gt; in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;T&lt;/code&gt; is representable in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;S&lt;/code&gt; by converting it to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;S&lt;/code&gt; and back to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;T&lt;/code&gt;. We’ll 
call this value &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&apos;&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t &amp;lt;&amp;gt; t&apos;&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&lt;/code&gt; is not representable in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;S&lt;/code&gt;, and rules similar to those for out-of-range values apply when the 
expression involves an equality. For example, given &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s :: tinyint&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CAST(s AS double) =  double &apos;1.1&apos;  ⟺  if(s is null, null, false)    
CAST(s AS double) &amp;lt;&amp;gt; double &apos;1.1&apos;  ⟺  if(s is null, null, true)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
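&lt;p&gt;The round-trip test is easy to express in code. Here is a minimal sketch, with Python&apos;s &lt;code&gt;int()&lt;/code&gt; standing in for a hypothetical &lt;code&gt;double&lt;/code&gt; → &lt;code&gt;tinyint&lt;/code&gt; cast:&lt;/p&gt;

```python
def representable_in_tinyint(t: float) -> bool:
    """Round-trip t through the narrower type and compare: t -> s' -> t'."""
    if not -128.0 <= t <= 127.0:   # the cast to tinyint would fail outright
        return False
    s_prime = int(t)               # T -> S (int() stands in for the cast)
    t_prime = float(s_prime)       # S -> T, i.e. t'
    return t_prime == t            # t <> t' means t is not representable

assert representable_in_tinyint(1.0)
assert not representable_in_tinyint(1.1)     # no tinyint maps to 1.1
assert not representable_in_tinyint(1000.0)  # out of range entirely
```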

&lt;p&gt;When some values in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;T&lt;/code&gt; are not representable in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;S&lt;/code&gt;, the cast between &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;T → S&lt;/code&gt; will generally either truncate
or round. The SQL specification doesn’t mandate which of those alternatives an implementation should follow,
and even allows that to vary for conversions between various combinations of types.&lt;/p&gt;

&lt;p&gt;This throws a bit of a wrench in our plans, so to speak. If we can’t tell whether a cast will round or truncate,
how would we know whether a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;gt;&lt;/code&gt; comparison should turn into a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;gt;&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;gt;=&lt;/code&gt; in the resulting expression? To 
illustrate, let’s consider this example. Given &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s :: tinyint&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CAST(s AS double) &amp;gt; double &apos;1.9&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If the conversion from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;double&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt; truncates, the expression above is equivalent to:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;s &amp;gt; tinyint &apos;1&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;On the other hand, if the conversion rounds, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1.9&lt;/code&gt; becomes &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;2&lt;/code&gt;, and the expression is equivalent to:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;s &amp;gt;= tinyint &apos;2&apos;              
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In order to know which operator to use in the transformed expression (e.g., &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;gt;&lt;/code&gt; vs &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;gt;=&lt;/code&gt;), it is therefore 
crucial to distinguish between those two behaviors. The good news is that there’s a simple and elegant way
out of this hole.&lt;/p&gt;

&lt;p&gt;An important observation is that we don’t need to know how the conversion behaves &lt;em&gt;in general&lt;/em&gt;, but only how 
it behaves when applied to the constant &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&lt;/code&gt;. Regardless of whether the conversion truncates or rounds, for a 
given value of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&lt;/code&gt;, the outcome can be seen to &lt;em&gt;round up&lt;/em&gt; or &lt;em&gt;round down&lt;/em&gt;, as depicted below.&lt;/p&gt;

&lt;table&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;img src=&quot;/assets/blog/optimizing-casts/round-down.svg&quot; alt=&quot;&quot; /&gt;&lt;/td&gt;
      &lt;td&gt;&lt;img src=&quot;/assets/blog/optimizing-casts/round-up.svg&quot; alt=&quot;&quot; /&gt;&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;We can easily tell which of those scenarios applies by comparing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&lt;/code&gt; with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&apos;&lt;/code&gt;: if &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t &amp;gt; t&apos;&lt;/code&gt;, the operation rounded
down. Conversely, if &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t &amp;lt; t&apos;&lt;/code&gt;, it rounded up. If &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t = t&apos;&lt;/code&gt;, the value is representable in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;S&lt;/code&gt;, and the rules from the 
previous section apply.&lt;/p&gt;
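&lt;p&gt;This observation translates directly into code. The sketch below (a hypothetical rewrite for &lt;code&gt;&amp;gt;&lt;/code&gt; only) works whether the engine&apos;s &lt;code&gt;double&lt;/code&gt; → &lt;code&gt;tinyint&lt;/code&gt; cast truncates or rounds, because it only inspects the round trip of the single constant &lt;code&gt;t&lt;/code&gt;:&lt;/p&gt;

```python
import math

def rewrite_gt(t: float, cast_to_tinyint) -> tuple:
    """Rewrite CAST(s AS double) > t into (operator, constant) on s :: tinyint."""
    s_prime = cast_to_tinyint(t)   # T -> S, however the cast behaves
    t_prime = float(s_prime)       # S -> T, i.e. t'
    if t < t_prime:                # the cast rounded up
        return ('>=', s_prime)
    return ('>', s_prime)          # rounded down, or t was representable

# The 1.9 example from the text, under both possible cast behaviors:
assert rewrite_gt(1.9, math.trunc) == ('>', 1)   # truncating cast
assert rewrite_gt(1.9, round) == ('>=', 2)       # rounding cast
```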

&lt;h1 id=&quot;oh-the-nullability&quot;&gt;Oh, the nullability&lt;/h1&gt;

&lt;p&gt;Let’s take another quick detour and talk about the issue of nullability. After all, no discussion about
SQL is complete without an exploration of the semantics of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;null&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;SQL uses &lt;a href=&quot;https://en.wikipedia.org/wiki/Three-valued_logic#Application_in_SQL&quot;&gt;three-valued logic&lt;/a&gt;. In addition
to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;true&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;false&lt;/code&gt;, logical expressions can evaluate to an &lt;em&gt;unknown&lt;/em&gt; value, which is indicated by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;null&lt;/code&gt;.
Logical operations &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AND&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OR&lt;/code&gt; behave according to the following rules:&lt;/p&gt;

\[\begin{array}{|c|c|c|c|}
\hline
\text{A} &amp;amp; \text{B} &amp;amp; \text{A and B} &amp;amp; \text{A or B} \\ 
\hline
\text{true}&amp;amp; \text{null} &amp;amp; \text{null} &amp;amp; \text{true} \\ 
\hline
\text{false}&amp;amp; \text{null} &amp;amp; \text{false} &amp;amp; \text{null} \\ 
\hline
\end{array}\]

&lt;p&gt;The logical comparison operators =, &amp;lt;&amp;gt;, &amp;gt;, ≥, &amp;lt;, ≤ evaluate to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;null&lt;/code&gt; when one or both operands are &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;null&lt;/code&gt;.
Hence, if &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&lt;/code&gt; is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;null&lt;/code&gt;, our expression &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cast(s as smallint) = t&lt;/code&gt; can be simply replaced with a constant &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;null&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;As we mentioned in the previous section, there are cases where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cast(s as smallint) = t&lt;/code&gt; can be reduced to 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;true&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;false&lt;/code&gt;, &lt;em&gt;except&lt;/em&gt; for the fact that if &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s&lt;/code&gt; is null, the expression needs to return &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;null&lt;/code&gt; to preserve
semantics. So, we use the following forms to capture this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;if(s IS null, null, false)
if(s IS null, null, true)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The catch is that the optimizer does not understand the semantics of these &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;if&lt;/code&gt; expressions and cannot 
use them to derive additional properties. In essence, they become an optimization barrier. On the other hand,
the optimizer is pretty good at manipulating logical conjunctions (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AND&lt;/code&gt;) and disjunctions (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OR&lt;/code&gt;). So, let’s see 
how we can use boolean logic to obtain an equivalent formulation.&lt;/p&gt;

&lt;p&gt;We can exploit the properties of SQL boolean logic to derive expressions that behave in the same manner as the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;if()&lt;/code&gt; constructs from above:&lt;/p&gt;

\[\begin{align}
    \text{if}(s \text{ is null}, \text{null}, \text{false}) &amp;amp; \iff (s \text{ is null}) \text{ and null} \\
    \text{if}(s \text{ is null}, \text{null}, \text{true})  &amp;amp; \iff (s \text{ is not null}) \text{ or null} \\
\end{align}\]

&lt;p&gt;Let’s break it down to see why that works.&lt;/p&gt;

\[\begin{align}         
   \text{if}(s \text{ is null}, \text{null}, \text{false}) &amp;amp; = (s \text{ is null}) \text{ and null} \\ 
      &amp;amp; = \begin{cases}
             \text{true and null}  &amp;amp; = \text{null},   &amp;amp; \text{if } s \text{ is null} \\
             \text{false and null} &amp;amp; = \text{false},  &amp;amp; \text{if } s \text{ is not null} 
          \end{cases} \\[5pt]
   \text{if}(s \text{ is null}, \text{null}, \text{true})  &amp;amp; = (s \text{ is not null}) \text{ or null} \\
      &amp;amp; = \begin{cases}
              \text{false or null}  &amp;amp; = \text{null},   &amp;amp; \text{if } s \text{ is null} \\
              \text{true or null}   &amp;amp; = \text{true},   &amp;amp; \text{if } s \text{ is not null} 
           \end{cases}
\end{align}\]
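&lt;p&gt;We can replay this derivation mechanically. In the sketch below, &lt;code&gt;None&lt;/code&gt; stands in for &lt;code&gt;null&lt;/code&gt;, and two small functions model SQL&apos;s three-valued &lt;code&gt;AND&lt;/code&gt; and &lt;code&gt;OR&lt;/code&gt;:&lt;/p&gt;

```python
def tv_and(a, b):
    """SQL three-valued AND, with None standing in for null."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def tv_or(a, b):
    """SQL three-valued OR, with None standing in for null."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def if_s_is_null(s, otherwise):
    """The if(s IS null, null, otherwise) construct."""
    return None if s is None else otherwise

# The two equivalences hold for null and non-null s alike:
for s in (None, 0, 42):
    assert tv_and(s is None, None) == if_s_is_null(s, False)
    assert tv_or(s is not None, None) == if_s_is_null(s, True)
```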

&lt;h1 id=&quot;putting-it-all-together&quot;&gt;Putting it all together&lt;/h1&gt;

&lt;p&gt;Now that we’ve had a taste of how this optimization works, let’s put it all together into one rule to rule
them all.&lt;/p&gt;

&lt;p&gt;Given an expression of the following form,&lt;/p&gt;

\[\cast{S}{T}(s) \otimes t \quad s \in S, t \in T, \otimes \in \{=, \ne, &amp;lt;, \le, &amp;gt;, \ge\}\]

&lt;p&gt;we derive a transformation based on the rules below.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;If \(t \text{ is null} \Rightarrow \cast{S}{T}(s) \otimes t \iff \text{null} \tag{1}\) \(\\[5pt]\)&lt;/li&gt;
  &lt;li&gt;If \(\exists s&apos; \in S \ldotp s&apos; = \cast{T}{S}(t)\), we calculate \(t&apos; = \cast{S}{T}(s&apos;)\) and consider 
the following cases:
    &lt;ol&gt;
      &lt;li&gt;&lt;a name=&quot;2.1&quot;&gt;&lt;/a&gt; If \(t = t&apos; \Rightarrow \cast{S}{T}(s) \otimes t \iff s \otimes \cast{T}{S}(t) \tag{2.1}\) \(\\[5pt]\)
        &lt;ul&gt;
          &lt;li&gt;&lt;a name=&quot;2.1.1&quot;&gt;&lt;/a&gt; In the special case where \(\\[5pt]\) \(\quad  s&apos; = \text{min}_S  \Rightarrow   
 \left\{
  \begin{array}{@{}ll@{}}
 \cast{S}{T}(s) &amp;gt; t   &amp;amp; \iff s \ne \text{min}_{S}     \\
 \cast{S}{T}(s) \ge t &amp;amp; \iff \trueOrNull{s}           \\
 \cast{S}{T}(s) &amp;lt;   t &amp;amp; \iff \falseOrNull{s}          \\
 \cast{S}{T}(s) \le t &amp;amp; \iff s = \text{min}_{S}
  \end{array}\right. \tag{2.1.1}  \\[5pt]\)&lt;/li&gt;
          &lt;li&gt;&lt;a name=&quot;2.1.2&quot;&gt;&lt;/a&gt; In the special case where \(\\[5pt]\) \(\quad s&apos; = \text{max}_S  \Rightarrow 
 \left\{
  \begin{array}{@{}ll@{}}
\cast{S}{T}(s) &amp;gt; t   &amp;amp; \iff \falseOrNull{s}        \\
\cast{S}{T}(s) \ge t &amp;amp; \iff s = \text{max}_{S}     \\
\cast{S}{T}(s) &amp;lt;   t &amp;amp; \iff s \ne \text{max}_{S}   \\
\cast{S}{T}(s) \le t &amp;amp; \iff \trueOrNull{s}
  \end{array}\right. \tag{2.1.2} \\[5pt]\)&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;Otherwise, \(\\[5pt]\) \(\quad  t \ne t&apos; \Rightarrow 
 \left\{
  \begin{array}{@{}ll@{}}
   \cast{S}{T}(s) = t   &amp;amp; \iff \falseOrNull{s}        \\
   \cast{S}{T}(s) \ne t &amp;amp; \iff \trueOrNull{s}            
  \end{array}\right. \tag{2.2} \\[5pt]\)&lt;/p&gt;

        &lt;ul&gt;
          &lt;li&gt;
            &lt;p&gt;Further, if \(\\[5pt]\) \(\quad \quad  t &amp;lt; t&apos; \Rightarrow 
 \left\{
  \begin{array}{@{}ll@{}}
\cast{S}{T}(s) &amp;gt; t   &amp;amp; \iff s \ge \cast{T}{S}(t)    \\
\cast{S}{T}(s) \ge t &amp;amp; \iff s \ge \cast{T}{S}(t)    \\
\cast{S}{T}(s) &amp;lt;   t &amp;amp; \iff s &amp;lt;  \cast{T}{S}(t)     \\
\cast{S}{T}(s) \le t &amp;amp; \iff s &amp;lt;  \cast{T}{S}(t)
  \end{array}\right. \tag{2.2.1} \\[5pt]\)&lt;br /&gt;
 In the special case where \(\\[5pt]\) \(\quad \quad s&apos; = \text{max}_S  \Rightarrow  
 \left\{
  \begin{array}{@{}ll@{}}
\cast{S}{T}(s) &amp;gt; t   &amp;amp; \iff s = \text{max}_{S}    \\
\cast{S}{T}(s) \ge t &amp;amp; \iff s = \text{max}_{S}    \\
  \end{array}\right. \\[5pt] \tag{2.2.1.1}\)&lt;/p&gt;
          &lt;/li&gt;
          &lt;li&gt;
            &lt;p&gt;Otherwise, if \(\\[5pt]\) \(\quad \quad  t &amp;gt; t&apos; \Rightarrow
 \left\{
  \begin{array}{@{}ll@{}}
\cast{S}{T}(s) &amp;gt; t   &amp;amp; \iff s &amp;gt;    \cast{T}{S}(t)    \\
\cast{S}{T}(s) \ge t &amp;amp; \iff s &amp;gt;    \cast{T}{S}(t)    \\
\cast{S}{T}(s) &amp;lt;   t &amp;amp; \iff s \le  \cast{T}{S}(t)    \\
\cast{S}{T}(s) \le t &amp;amp; \iff s \le  \cast{T}{S}(t)
  \end{array}\right. \\[5pt] \tag{2.2.2}\)&lt;br /&gt;
 In the special case where \(\\[5pt]\) \(\quad \quad s&apos; = \text{min}_S  \Rightarrow  
  \left\{
  \begin{array}{@{}ll@{}}
\cast{S}{T}(s) &amp;lt;   t &amp;amp; \iff s = \text{min}_{S}    \\
\cast{S}{T}(s) \le t &amp;amp; \iff s = \text{min}_{S}
 \end{array}\right. \\[5pt] \tag{2.2.2.1}\)&lt;/p&gt;
          &lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
  &lt;li&gt;If \(\cast{T}{S}\) is undefined or \(\cast{T}{S}(t)\) fails, \(\\[5pt]\) \(t &amp;lt; \cast{S}{T}(\text{min}_S) \Rightarrow  
  \left\{
 \begin{array}{@{}ll@{}}
         \cast{S}{T}(s) =   t &amp;amp; \iff \falseOrNull{s}    \\
         \cast{S}{T}(s) \ne t &amp;amp; \iff \trueOrNull{s}     \\
         \cast{S}{T}(s) &amp;lt;   t &amp;amp; \iff \falseOrNull{s}    \\
         \cast{S}{T}(s) \le t &amp;amp; \iff \falseOrNull{s}    \\
         \cast{S}{T}(s) &amp;gt;   t &amp;amp; \iff \trueOrNull{s}     \\
         \cast{S}{T}(s) \ge t &amp;amp; \iff \trueOrNull{s}     
\end{array}\right. \\[5pt] \tag{3.1}\)
\(t = \cast{S}{T}(\text{min}_S) \Rightarrow  
  \left\{
 \begin{array}{@{}ll@{}}
         \cast{S}{T}(s) =   t &amp;amp; \iff s = \text{min}_S       \\
         \cast{S}{T}(s) \ne t &amp;amp; \iff s &amp;gt; \text{min}_S       \\
         \cast{S}{T}(s) &amp;lt;   t &amp;amp; \iff \falseOrNull{s}        \\
         \cast{S}{T}(s) \le t &amp;amp; \iff s = \text{min}_S       \\
         \cast{S}{T}(s) &amp;gt;   t &amp;amp; \iff s &amp;gt; \text{min}_S       \\
         \cast{S}{T}(s) \ge t &amp;amp; \iff \trueOrNull{s}     
\end{array}\right. \\[5pt] \tag{3.2}\)
\(t &amp;gt; \cast{S}{T}(\text{max}_S) \Rightarrow  
  \left\{
    \begin{array}{@{}ll@{}}
            \cast{S}{T}(s) =   t &amp;amp; \iff \falseOrNull{s}    \\
            \cast{S}{T}(s) \ne t &amp;amp; \iff \trueOrNull{s}     \\
            \cast{S}{T}(s) &amp;lt;   t &amp;amp; \iff \trueOrNull{s}     \\
            \cast{S}{T}(s) \le t &amp;amp; \iff \trueOrNull{s}     \\
            \cast{S}{T}(s) &amp;gt;   t &amp;amp; \iff \falseOrNull{s}    \\
            \cast{S}{T}(s) \ge t &amp;amp; \iff \falseOrNull{s}    
   \end{array}\right. \\[5pt] \tag{3.3}\)
\(t = \cast{S}{T}(\text{max}_S) \Rightarrow  
 \left\{
   \begin{array}{@{}ll@{}}
           \cast{S}{T}(s) =   t &amp;amp; \iff s = \text{max}_S   \\
           \cast{S}{T}(s) \ne t &amp;amp; \iff s &amp;lt; \text{max}_S   \\
           \cast{S}{T}(s) &amp;lt;   t &amp;amp; \iff s &amp;lt; \text{max}_S   \\
           \cast{S}{T}(s) \le t &amp;amp; \iff \trueOrNull{s}     \\
           \cast{S}{T}(s) &amp;gt;   t &amp;amp; \iff \falseOrNull{s}    \\
           \cast{S}{T}(s) \ge t &amp;amp; \iff s = \text{max}_S       
  \end{array}\right. \\[5pt] \tag{3.4}\) &lt;br /&gt;
 Otherwise, the transformation is not applicable.&lt;/li&gt;
&lt;/ol&gt;
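&lt;p&gt;To make the rule concrete, here is a simplified sketch in Python covering the integer-source case (&lt;code&gt;s :: tinyint&lt;/code&gt;, &lt;code&gt;T = double&lt;/code&gt;), assuming a truncating cast and omitting the null guard. It is brute-force checked against direct evaluation; it is an illustration, not production code:&lt;/p&gt;

```python
import math

TINY_MIN, TINY_MAX = -128, 127
OPS = {'=':  lambda a, b: a == b, '<>': lambda a, b: a != b,
       '<':  lambda a, b: a < b,  '<=': lambda a, b: a <= b,
       '>':  lambda a, b: a > b,  '>=': lambda a, b: a >= b}

def rewrite(op, t):
    """Rewrite CAST(s AS double) <op> t into a predicate on s alone."""
    if t < TINY_MIN or t > TINY_MAX:
        # Rule 3: every tinyint lies on the same side of t, so the
        # comparison is constant; 0 serves as a representative value.
        truth = OPS[op](0.0, t)
        return lambda s: truth
    s_prime = math.trunc(t)       # CAST(t AS tinyint), i.e. s'
    t_prime = float(s_prime)      # the round trip, t'
    if t == t_prime:              # rule 2.1: t is representable
        return lambda s: OPS[op](s, s_prime)
    if op == '=':                 # rule 2.2: no s maps exactly to t
        return lambda s: False
    if op == '<>':
        return lambda s: True
    if t < t_prime:               # rule 2.2.1: the cast rounded up
        if op in ('>', '>='):
            return lambda s: s >= s_prime
        return lambda s: s < s_prime
    if op in ('>', '>='):         # rule 2.2.2: the cast rounded down
        return lambda s: s > s_prime
    return lambda s: s <= s_prime

# Brute-force equivalence check over the whole tinyint domain:
for t in (1.0, 1.9, -1.9, 64.5, 127.0, -128.0, 1000.0, -1000.0):
    for op in OPS:
        for s in range(TINY_MIN, TINY_MAX + 1):
            assert rewrite(op, t)(s) == OPS[op](float(s), t), (op, t, s)
```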

&lt;h1 id=&quot;omgwtfnan&quot;&gt;OMGWTFNaN&lt;/h1&gt;

&lt;p&gt;As if all of this weren’t enough, there’s an additional complication we need to handle for types such
as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;real&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;double&lt;/code&gt;. Those types are what the SQL specification calls &lt;em&gt;approximate numeric&lt;/em&gt; types.
Presto implements them as &lt;a href=&quot;https://en.wikipedia.org/wiki/IEEE_754&quot;&gt;IEEE-754&lt;/a&gt; single and double 
precision floating point numbers, respectively.&lt;/p&gt;

&lt;p&gt;In addition to finite numbers, IEEE-754 defines an additional set of values: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;∞&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NaN&lt;/code&gt; (not a number).
It is worth noting that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-∞&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;+∞&lt;/code&gt; do not behave like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;∞&lt;/code&gt; in the mathematical sense. They are actual values
in the ordered set of numbers, but they don’t represent any finite number. Therefore, the following relations hold:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-∞ &amp;lt; -1.23E30 &amp;lt; 0 &amp;lt; 3.45E25 &amp;lt; +∞
-∞ = -∞
+∞ = +∞ 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Since &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-∞&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;+∞&lt;/code&gt; can be treated as regular values, we can use them as the minimum and maximum values of the range
for these types. Any other choice would not work, since all values of a type must be contained within the range of the type
for the transformation to be valid. That is,&lt;/p&gt;

\[\forall v \in T \quad T_{\text{min}} \le v \le T_{\text{max}}\]

&lt;p&gt;Let’s look at an example to understand why this is necessary. Instead of using \([-∞, ∞]\) as the range, 
let’s say we picked the minimum and maximum representable values for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;real&lt;/code&gt; type (-3.4028235E38 and 3.4028235E38), and
consider this expression (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s :: real&lt;/code&gt;):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;cast(s AS double) &amp;gt;= double &apos;3.4028235E38&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;From the rules in the previous section, \(t = 3.4028235\text{E}38\), \(s&apos; = 3.4028235\text{E}38\) and \(t&apos; = 3.4028235\text{E}38\). Since 
\(t = t&apos;\) and \(s&apos; = \text{max}_S\), from &lt;a href=&quot;#2.1.2&quot;&gt;rule 2.1.2&lt;/a&gt;, the expression reduces to:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;s = 3.4028235E38 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This is clearly incorrect. When &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s = Infinity&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cast(s AS double)&lt;/code&gt; results in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;double &apos;Infinity&apos;&lt;/code&gt;, which is not equal
to 3.4028235E38.&lt;/p&gt;
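&lt;p&gt;A couple of lines of Python (whose floats are also IEEE-754 doubles) make the failure concrete:&lt;/p&gt;

```python
import math

s = math.inf                         # s :: real is Infinity
assert float(s) >= 3.4028235e38      # the original predicate is true...
assert not (s == 3.4028235e38)       # ...but the "reduced" form is false
```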

&lt;p&gt;On the other hand, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NaN&lt;/code&gt; doesn’t obey any of the comparison rules. It is neither equal to nor distinct from itself, and
it is neither larger nor smaller than any other value:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;NaN =  NaN  ⟺  false  
NaN &amp;lt;&amp;gt; NaN  ⟺  false
NaN &amp;gt; 0     ⟺  false
NaN = 0     ⟺  false
NaN &amp;lt; 0     ⟺  false
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NaN&lt;/code&gt; is not part of the ordered set of values for these types, and the requirement that every value be contained 
in the range doesn’t hold. From &lt;a href=&quot;#2.1.1&quot;&gt;rule 2.1.1&lt;/a&gt;, an expression such as:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;cast(s AS double) &amp;gt;= double &apos;-Infinity&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;reduces to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;if(s is null, null, true)&lt;/code&gt;, which is incorrect, since the expression returns &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;false&lt;/code&gt; when &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s&lt;/code&gt; is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NaN&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Is all hope lost for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;real&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;double&lt;/code&gt;? Fortunately, not. The range is only needed as an optimization. If we
forgo defining a range for types that don’t have the required properties, the special cases &lt;a href=&quot;#2.1.1&quot;&gt;2.1.1&lt;/a&gt; and 
&lt;a href=&quot;#2.1.2&quot;&gt;2.1.2&lt;/a&gt; don’t apply, and by &lt;a href=&quot;#2.1&quot;&gt;rule 2.1&lt;/a&gt;, the expression is equivalent to:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;s &amp;gt;= real &apos;-Infinity&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;which correctly returns &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;false&lt;/code&gt; when &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s&lt;/code&gt; is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NaN&lt;/code&gt;.&lt;/p&gt;
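&lt;p&gt;Again, this is easy to confirm with IEEE-754 doubles in Python:&lt;/p&gt;

```python
import math

s = math.nan
# Reducing the predicate to `true` would be wrong for NaN; the unreduced
# comparison against -Infinity correctly evaluates to false instead.
assert not (float(s) >= -math.inf)   # cast(s AS double) >= -Infinity
assert not (s >= -math.inf)          # s >= real '-Infinity' agrees
```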

&lt;h1 id=&quot;-show-me-the-money&quot;&gt;&lt;a name=&quot;results&quot;&gt;&lt;/a&gt; Show me the money!&lt;/h1&gt;

&lt;p&gt;So, does all of this even matter? Why, yes! Glad you asked.&lt;/p&gt;

&lt;p&gt;As with any performance optimization, you can improve things by working smarter (avoiding work that can be 
proven unnecessary) or by working harder (doing the work you must do more efficiently). This
optimization does a little of both. Let’s consider three scenarios where it has a positive effect.&lt;/p&gt;

&lt;h4 id=&quot;dead-code&quot;&gt;Dead code&lt;/h4&gt;

&lt;p&gt;Since in some cases the optimizer can prove that a comparison will always produce &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;false&lt;/code&gt;, regardless of the input,
it can short-circuit entire conditions or subplans before a single row of data is read. Some query-generation 
tools are not very sophisticated and may emit queries containing such constructs. Also, everyone makes
mistakes, and it’s not hard to end up with queries that contain what’s effectively &lt;em&gt;dead code&lt;/em&gt;.  The last thing you
want is to sit in front of the screen waiting for a query to complete … waiting … waiting … just for Presto
to tell you &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;¯\_(ツ)_/¯&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For example, given:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;smallint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- &amp;lt;insert lots of rows into t&amp;gt; --&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;IS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000000&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Produces the following query plan (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Values&lt;/code&gt; is an empty inline table):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;- Output[x]
  - Values
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
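The reasoning behind this plan can be sketched outside of SQL: a smallint can never exceed its maximum value of 32767, so a comparison against 1000000 is provably false for every row. A small Python illustration (not the optimizer's actual code):

```python
# Range of Trino's smallint (16-bit signed integer).
SMALLINT_MIN, SMALLINT_MAX = -(2 ** 15), 2 ** 15 - 1

literal = 1000000

# Because the literal lies above the entire smallint range, the filter
# "x > 1000000" cannot match any row, and the table scan can be removed.
always_false = literal > SMALLINT_MAX
print(always_false)  # True
```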

&lt;h4 id=&quot;improved-join-performance&quot;&gt;Improved JOIN performance&lt;/h4&gt;

&lt;p&gt;What’s nice about this optimization is that it &lt;em&gt;enables&lt;/em&gt; other optimizations to work better. We mentioned earlier
that comparisons that are not simple expressions between columns, or between columns and constants, make it harder for the
predicate pushdown optimization to infer predicates that can be propagated to the other branch of a join.&lt;/p&gt;

&lt;p&gt;Given two tables:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;v&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;smallint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;v&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;bigint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And the following query:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;v&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;v&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;v&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;BIGINT&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;1&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The query plan without this optimization is:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;- Output[name]
  - InnerJoin[expr = v]
    - ScanFilterProject[t1, filter = CAST(v AS bigint) = BIGINT &apos;1&apos;]
        expr := CAST(v AS bigint)
    - TableScan[t2]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The optimization allows the predicate pushdown logic to apply the condition to the other side of the join, producing
a much better plan. If data in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t1&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t2&lt;/code&gt; is somehow organized by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;v&lt;/code&gt; (e.g., a partition key in Hive), or if the
connector understands how to apply the filter at the source, the query won’t even need to read certain parts of the
table. The query plan with the optimization enabled:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;- Output[name]
  - CrossJoin
    - ScanFilterProject[t1, filter = (v = SMALLINT &apos;1&apos;)]
    - ScanFilterProject[t2, filter = (v = BIGINT &apos;1&apos;)]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
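The inference at work here can be summarized as transitivity: given the join condition t1.v = t2.v and the filter t1.v = 1, the planner can derive t2.v = 1 and apply it to the other side. A simplified Python model of this idea (not Trino's planner code):

```python
def infer_pushdown(join_pairs, filters):
    """Derive filters for join partners: join_pairs is a list of
    (left_column, right_column) equi-join pairs; filters maps a column
    to a constant it is compared against with equality."""
    derived = {}
    for left, right in join_pairs:
        if left in filters:
            derived[right] = filters[left]
        if right in filters:
            derived[left] = filters[right]
    return derived

# t1.v = t2.v combined with t1.v = 1 lets us push t2.v = 1 to the other side.
print(infer_pushdown([("t1.v", "t2.v")], {"t1.v": 1}))  # {'t2.v': 1}
```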

&lt;h4 id=&quot;best-bang-for-the-buck&quot;&gt;Best bang for the buck&lt;/h4&gt;

&lt;p&gt;Finally, if the condition absolutely needs to be evaluated, the transformed expression could be significantly
more efficient, especially when the cast between the two types is expensive. To illustrate, given a table
with 1 billion rows and a column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;k :: bigint&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;count_if&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;k&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CAST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;decimal&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;19&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)))&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Without the optimization:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;- [...]
    - ScanProject
===&amp;gt;    CPU: 3.75m (66.34%), Scheduled: 5.56m (145.22%)
        expr := (CAST(&quot;k&quot; AS decimal(19,0)) &amp;gt; CAST(DECIMAL &apos;0&apos; AS decimal(19,0)))
        
        
Query 20190515_072240_00006_rgzb4, FINISHED, 4 nodes
Splits: 110 total, 110 done (100.00%)
0:22 [1000M rows, 8.4GB] [46M rows/s, 395MB/s]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;With the optimization:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;- [...]
    - ScanProject
===&amp;gt;    CPU: 29.93s (58.17%), Scheduled: 47.44s (145.07%)
        expr := (&quot;k&quot; &amp;gt; BIGINT &apos;0&apos;)
        
        
Query 20190515_071912_00005_bz6cb, FINISHED, 4 nodes
Splits: 110 total, 110 done (100.00%)
0:03 [1000M rows, 8.4GB] [335M rows/s, 2.81GB/s]        
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
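The two plans compute the same result; the rewrite simply narrows the literal to the column's type once, at planning time, instead of widening every row. A rough Python model of the equivalence (a sketch, assuming values stay within the bigint range):

```python
from decimal import Decimal

def eval_with_cast(k):
    # Unoptimized: widen k to decimal(19,0) for every row.
    return Decimal(k) > Decimal(0)

def eval_unwrapped(k):
    # Optimized: compare natively as bigint.
    return k > 0

# The two predicates agree for all bigint values of k.
print(all(eval_with_cast(k) == eval_unwrapped(k)
          for k in (-(2 ** 63), -1, 0, 1, 2 ** 63 - 1)))  # True
```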

&lt;p&gt;Thirsty for more? Here’s the &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-main/src/main/java/io/prestosql/sql/planner/iterative/rule/UnwrapCastInComparison.java&quot;&gt;code&lt;/a&gt;. 
Happy querying!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Many thanks to &lt;a href=&quot;https://github.com/kasiafi&quot;&gt;kasiafi&lt;/a&gt; for their thoughtful and thorough feedback on early
drafts of this post.&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Martin Traverso</name>
        </author>
      

      <summary>The next release of Presto (version 312) will include a new optimization to remove unnecessary casts which might have been added implicitly by the query planner or explicitly by users when they wrote the query.</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto Summit 2019 @TwitterSF</title>
      <link href="https://trino.io/blog/2019/05/17/Presto-Summit.html" rel="alternate" type="text/html" title="Presto Summit 2019 @TwitterSF" />
      <published>2019-05-17T00:00:00+00:00</published>
      <updated>2019-05-17T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/05/17/Presto-Summit</id>
      <content type="html" xml:base="https://trino.io/blog/2019/05/17/Presto-Summit.html">&lt;p&gt;Next month will mark the 2nd annual Presto Summit hosted by the
&lt;a href=&quot;https://trino.io/foundation.html&quot;&gt;Presto Software Foundation&lt;/a&gt;,
&lt;a href=&quot;https://starburstdata.com&quot;&gt;Starburst Data&lt;/a&gt;, and &lt;a href=&quot;https://twitter.com&quot;&gt;Twitter&lt;/a&gt;. Last year’s event was
a great success (see the
&lt;a href=&quot;https://www.starburstdata.com/technical-blog/presto-summit-2018-recap/&quot;&gt;Presto Summit 2018 recap&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Please join the community of Presto users and developers for an all-day event dedicated to the world’s fastest 
distributed SQL query engine. At the Summit we’ll share the latest on Presto and learn how some of the most 
innovative companies are using this technology to power their analytics platforms.&lt;/p&gt;

&lt;p&gt;The agenda will feature talks from some of the world’s largest and most innovative Presto users:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Comcast&lt;/li&gt;
  &lt;li&gt;Twitter&lt;/li&gt;
  &lt;li&gt;Nordstrom&lt;/li&gt;
  &lt;li&gt;Grubhub&lt;/li&gt;
  &lt;li&gt;Lyft&lt;/li&gt;
  &lt;li&gt;Netflix&lt;/li&gt;
  &lt;li&gt;LinkedIn&lt;/li&gt;
  &lt;li&gt;Criteo&lt;/li&gt;
  &lt;li&gt;Starburst&lt;/li&gt;
  &lt;li&gt;Presto Software Foundation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;(the details will be announced soon)&lt;/p&gt;

&lt;p&gt;If you wish to speak at the event, the call for papers is still open:
&lt;a href=&quot;https://www.starburstdata.com/2019-presto-summit-speaker-registration/&quot;&gt;2019 Presto Summit – Speaker Registration&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Please RSVP to secure your spot (space is limited):
&lt;a href=&quot;https://prestosummit.splashthat.com/&quot;&gt;Presto Summit 2019 @TwitterSF&lt;/a&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Kamil Bajda-Pawlikowski</name>
        </author>
      

      <summary>Next month will mark the 2nd annual Presto Summit hosted by the Presto Software Foundation, Starburst Data, and Twitter. Last year’s event was a great success (see the Presto Summit 2018 recap).</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 311</title>
      <link href="https://trino.io/blog/2019/05/15/release-311.html" rel="alternate" type="text/html" title="Release 311" />
      <published>2019-05-15T00:00:00+00:00</published>
      <updated>2019-05-15T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/05/15/release-311</id>
      <content type="html" xml:base="https://trino.io/blog/2019/05/15/release-311.html">&lt;p&gt;This version adds standard
&lt;a href=&quot;https://trino.io/docs/current/sql/select.html#offset-clause&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OFFSET&lt;/code&gt;&lt;/a&gt;
syntax, a new function
&lt;a href=&quot;https://trino.io/docs/current/functions/array.html#combinations&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;combinations()&lt;/code&gt;&lt;/a&gt;
for computing k-combinations of array elements,
and support for nested collections in Cassandra.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-311.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>This version adds standard OFFSET syntax, a new function combinations() for computing k-combinations of array elements, and support for nested collections in Cassandra. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto Community Meeting 2019-05-08</title>
      <link href="https://trino.io/blog/2019/05/08/Presto-Community-Meeting.html" rel="alternate" type="text/html" title="Presto Community Meeting 2019-05-08" />
      <published>2019-05-08T00:00:00+00:00</published>
      <updated>2019-05-08T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/05/08/Presto-Community-Meeting</id>
      <content type="html" xml:base="https://trino.io/blog/2019/05/08/Presto-Community-Meeting.html">&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/FL0O62iCkE8&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h3 id=&quot;agenda&quot;&gt;Agenda&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;Existing function support&lt;/li&gt;
  &lt;li&gt;Function namespaces&lt;/li&gt;
  &lt;li&gt;Connector-resolved functions&lt;/li&gt;
  &lt;li&gt;SQL-defined functions&lt;/li&gt;
  &lt;li&gt;Remote functions&lt;/li&gt;
  &lt;li&gt;Polymorphic table functions&lt;/li&gt;
&lt;/ul&gt;

&lt;!--more--&gt;</content>

      

      <summary>Agenda Existing function support Function namespaces Connector-resolved functions SQL-defined functions Remote functions Polymorphic table functions</summary>

      
      
    </entry>
  
    <entry>
      <title>Faster S3 Reads</title>
      <link href="https://trino.io/blog/2019/05/06/faster-s3-reads.html" rel="alternate" type="text/html" title="Faster S3 Reads" />
      <published>2019-05-06T00:00:00+00:00</published>
      <updated>2019-05-06T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/05/06/faster-s3-reads</id>
      <content type="html" xml:base="https://trino.io/blog/2019/05/06/faster-s3-reads.html">&lt;p&gt;Presto is known for working well with Amazon S3. We recently made an
improvement that greatly reduces network utilization and latency when
reading ORC or Parquet data.&lt;/p&gt;

&lt;h1 id=&quot;the-problem&quot;&gt;The problem&lt;/h1&gt;

&lt;p&gt;The improvement started with a question
from &lt;a href=&quot;https://github.com/bzillins&quot;&gt;Brenton Zillins&lt;/a&gt;
at &lt;a href=&quot;https://www.stackpath.com/&quot;&gt;Stackpath&lt;/a&gt;
on our &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;Slack&lt;/a&gt; workspace. He noticed
that the network traffic to Presto workers was many times larger than the
amount of input data reported by Presto for the query.&lt;/p&gt;

&lt;p&gt;After a lively discussion on the Slack channel, we found the cause. Parquet
would perform a positioned read against the S3 file system to ask for an
exact byte range (start and end). However, the file system only implemented
the streaming API, so it would tell S3 about the starting location, but
not the end location. The file system would stop reading from the stream once
it reached the requested end location, but substantial additional data could
be read from S3 due to various buffers in different parts of the system.&lt;/p&gt;

&lt;p&gt;The streaming API has an additional problem. Establishing a new connection
to S3 incurs latency, especially when using secure connections over TLS.
There is no way to abort a streaming request to S3, other than by closing
the connection, so the file system is forced to close connections after
every request, thus preventing the connection from being reused.&lt;/p&gt;

&lt;h1 id=&quot;the-fix&quot;&gt;The fix&lt;/h1&gt;

&lt;p&gt;We solved this by implementing positioned reads in the S3 file system.
Positioned reads, which are the only kind used by ORC and Parquet, work by
asking S3 for the exact byte range required. These reads use the minimal
amount of network traffic and allow the connection to be reused.&lt;/p&gt;
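Conceptually, a positioned read maps onto an HTTP Range request, which lets S3 return exactly the requested bytes and keep the connection alive afterwards. A sketch of the idea in Python (a hypothetical helper, not the actual Presto file system code):

```python
def range_header(offset, length):
    # HTTP byte ranges are inclusive, so reading `length` bytes at
    # `offset` requests bytes offset .. offset + length - 1.
    return {"Range": "bytes=%d-%d" % (offset, offset + length - 1)}

# A positioned read of 50 bytes starting at offset 100:
print(range_header(100, 50))  # {'Range': 'bytes=100-149'}
```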

&lt;p&gt;Brenton tested out the change and reported success:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This PR brought us from &amp;gt;1 GB/s object read rate to under 10 MB/s for
the same query. Thank you.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;While this issue is obvious in retrospect, we are surprised that it took
so long to find it, given that S3 is one of the most popular storage systems.
This is a great example of how the community makes everything better.
Being observant and reporting an issue can be a huge win for everyone.&lt;/p&gt;

&lt;h1 id=&quot;how-to-get-it&quot;&gt;How to get it&lt;/h1&gt;

&lt;p&gt;This improvement is in &lt;a href=&quot;https://trino.io/download.html&quot;&gt;Presto 302+&lt;/a&gt;,
so you will need to upgrade if you are using an earlier version.&lt;/p&gt;</content>

      
        <author>
          <name>David Phillips</name>
        </author>
      

      <summary>Presto is known for working well with Amazon S3. We recently made an improvement that greatly reduces network utilization and latency when reading ORC or Parquet data.</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 310</title>
      <link href="https://trino.io/blog/2019/05/03/release-310.html" rel="alternate" type="text/html" title="Release 310" />
      <published>2019-05-03T00:00:00+00:00</published>
      <updated>2019-05-03T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/05/03/release-310</id>
      <content type="html" xml:base="https://trino.io/blog/2019/05/03/release-310.html">&lt;p&gt;This version adds standard
&lt;a href=&quot;https://trino.io/docs/current/sql/select.html#limit-or-fetch-first-clauses&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST&lt;/code&gt;&lt;/a&gt;
syntax, support for using an
&lt;a href=&quot;https://trino.io/docs/current/connector/hive.html#s3-credentials&quot;&gt;alternate AWS role&lt;/a&gt;
when accessing S3 or Glue, and improved handling of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DECIMAL&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DOUBLE&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;REAL&lt;/code&gt;
when Hive table and partition metadata differ.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-310.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>This version adds standard FETCH FIRST syntax, support for using an alternate AWS role when accessing S3 or Glue, and improved handling of DECIMAL, DOUBLE, and REAL when Hive table and partition metadata differ. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>A review of the first international Presto Conference, Tel Aviv, April 2019</title>
      <link href="https://trino.io/blog/2019/05/03/Presto-Conference-Israel.html" rel="alternate" type="text/html" title="A review of the first international Presto Conference, Tel Aviv, April 2019" />
      <published>2019-05-03T00:00:00+00:00</published>
      <updated>2019-05-03T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/05/03/Presto-Conference-Israel</id>
      <content type="html" xml:base="https://trino.io/blog/2019/05/03/Presto-Conference-Israel.html">&lt;p&gt;&lt;strong&gt;Community&lt;/strong&gt;, &lt;em&gt;noun&lt;/em&gt;: “A feeling of fellowship with others, as a result of sharing common attributes, interests, and goals”&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Israel-2019/audience.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The fun picture you see here was taken at the first lecture of the first international
Presto Conference in Israel last month.&lt;/p&gt;

&lt;p&gt;The atmosphere in the room during the various presentations was unique. It’s as if you
could physically feel the brainpower of 250 engineers fascinated by technology in one room.&lt;/p&gt;

&lt;p&gt;We would like to share with you a bit of the content that was discussed during
the conference. Enjoy the read and the videos!&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;presto-software-foundation-presentation&quot;&gt;Presto Software Foundation presentation&lt;/h1&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Israel-2019/intro.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The day started with &lt;a href=&quot;https://www.linkedin.com/in/dainsundstrom/&quot;&gt;Dain Sundstrom&lt;/a&gt;,
&lt;a href=&quot;https://www.linkedin.com/in/traversomartin/&quot;&gt;Martin Traverso&lt;/a&gt;, and
&lt;a href=&quot;https://www.linkedin.com/in/electrum/&quot;&gt;David Phillips&lt;/a&gt;, Presto founders
who gave us a great panoramic view on &lt;a href=&quot;https://trino.io/foundation.html&quot;&gt;Presto Software Foundation&lt;/a&gt;,
past, present, and future roadmap.&lt;/p&gt;

&lt;p&gt;The Presto founders presented in their talk the following topics:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Presto foundation creation&lt;/li&gt;
  &lt;li&gt;ORC improvements&lt;/li&gt;
  &lt;li&gt;The complex pushdown algorithm in details&lt;/li&gt;
  &lt;li&gt;The opensource roadmap strategy and more&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Israel-2019/pushdown.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;You can find the entire video of the presentation &lt;a href=&quot;https://vimeo.com/331764101&quot;&gt;here&lt;/a&gt; and the
slides &lt;a href=&quot;https://www.slideshare.net/OriReshef/presto-summit-israel-201904&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h1 id=&quot;varada-presentation&quot;&gt;Varada presentation&lt;/h1&gt;

&lt;p&gt;&lt;a href=&quot;https://www.linkedin.com/in/david-krakov/&quot;&gt;David Krakov&lt;/a&gt;, co-founder and CTO at &lt;a href=&quot;https://varada.io&quot;&gt;Varada&lt;/a&gt;,
explained how Varada is an example of how Presto can be leveraged to create innovative new technology that
allows interactive analytics on top of data sets extracted from data lakes, or in other words, Presto for apps.&lt;/p&gt;

&lt;p&gt;David presented the axes of innovation that the Varada team created to achieve indexed big
data on a distributed platform:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;SSD and NVMeF distributed calculation&lt;/li&gt;
  &lt;li&gt;All dimensions are indexed in the ingest process&lt;/li&gt;
  &lt;li&gt;Synchronization&lt;/li&gt;
  &lt;li&gt;Fully automated copy management directly connected to the raw data in the data lake.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Israel-2019/varada1.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;You can find the video of the presentation &lt;a href=&quot;https://vimeo.com/331767154&quot;&gt;here&lt;/a&gt; and the slides
&lt;a href=&quot;https://www.slideshare.net/OriReshef/presto-for-apps-deck-varada-prestoconf&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h1 id=&quot;wix-open-sourcing-quix&quot;&gt;WiX open sourcing Quix&lt;/h1&gt;

&lt;p&gt;The big announcement of the conference came from &lt;a href=&quot;https://www.linkedin.com/in/valeryfrolov/&quot;&gt;Valery Frolov&lt;/a&gt;
of &lt;a href=&quot;http://wix.com/&quot;&gt;Wix&lt;/a&gt;. As a web-scale data-driven company, with 150M users, Wix has more than 1000 users
of Presto, and over 100K daily queries.&lt;/p&gt;

&lt;p&gt;All those queries come through a unified front end for data discovery, transformation, and query: the Quix
IDE. Quix is simultaneously:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;A notebook manager for users to write and share executable notes&lt;/li&gt;
  &lt;li&gt;A dataset explorer showing catalogs and metadata&lt;/li&gt;
  &lt;li&gt;A feature-rich SQL query editor&lt;/li&gt;
  &lt;li&gt;A job scheduler for ETL jobs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Wix has open-sourced most of Quix, available under an MIT license at &lt;a href=&quot;https://github.com/wix-incubator/quix&quot;&gt;https://github.com/wix-incubator/quix&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Israel-2019/wix.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;As a Presto-centric company, Wix has developed a few more exciting enhancements:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;HBase + Parquet interleaving to mix compacted historic data and the latest 14 days of data&lt;/li&gt;
  &lt;li&gt;One SQL - a query rewriter that unifies usage of Presto and BigQuery to one SQL&lt;/li&gt;
  &lt;li&gt;ActiveDirectory data security layer to control access to data&lt;/li&gt;
  &lt;li&gt;Google Drive integration - run Presto SQL directly on Google Sheets. This is one of the coolest connectors
to be created and generated a lot of excitement. Can’t wait for Wix to open source this one as well!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;See more in the &lt;a href=&quot;https://vimeo.com/331767442&quot;&gt;video&lt;/a&gt;,
&lt;a href=&quot;https://www.slideshare.net/OriReshef/quix-presto-ide-presto-summit-il&quot;&gt;slides&lt;/a&gt;,
&lt;a href=&quot;https://github.com/wix-incubator/quix&quot;&gt;source code&lt;/a&gt;.&lt;/p&gt;

&lt;h1 id=&quot;ironsource----analyzing-data-at-a-petabyte-scale&quot;&gt;Ironsource -  Analyzing data at a petabyte scale.&lt;/h1&gt;

&lt;p&gt;&lt;a href=&quot;https://www.ironsrc.com/&quot;&gt;Ironsource&lt;/a&gt; is the ad network of choice for the gaming industry, supplying
solutions for application developers, customer engagement, and ad monetization. Ironsource collects
terabytes of events on a daily basis.&lt;/p&gt;

&lt;p&gt;In his talk, &lt;a href=&quot;https://www.linkedin.com/in/korenor/&quot;&gt;Or Koren&lt;/a&gt;, head of the data team at Ironsource, shared
their journey from terabyte scale to petabyte scale. He showed how their entire interactive
analytics platform was rebuilt around Presto, and the huge savings that resulted, including new
business insights coming from their data science and data analyst teams.&lt;/p&gt;

&lt;table&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;img src=&quot;/assets/blog/Israel-2019/ironsource1.png&quot; alt=&quot;&quot; /&gt;&lt;/td&gt;
      &lt;td&gt;&lt;img src=&quot;/assets/blog/Israel-2019/ironsource2.png&quot; alt=&quot;&quot; /&gt;&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The before and after slides that Or presented show very clearly the reduction in cost and the increase
in efficiency that the use of Presto brought to Ironsource.&lt;/p&gt;

&lt;p&gt;See Or’s slides &lt;a href=&quot;https://www.slideshare.net/OriReshef/data-analytics-at-a-petabyte-scale-final&quot;&gt;here&lt;/a&gt; and the
talk &lt;a href=&quot;https://vimeo.com/333732300&quot;&gt;video&lt;/a&gt;.&lt;/p&gt;

&lt;h1 id=&quot;datorama-on-mutable-data-at-scale&quot;&gt;Datorama on mutable data at scale&lt;/h1&gt;

&lt;p&gt;A charismatic presenter, &lt;a href=&quot;https://www.linkedin.com/in/afinkelstein/&quot;&gt;Alexey Finkelstein&lt;/a&gt; from
&lt;a href=&quot;https://datorama.com/&quot;&gt;Salesforce Datorama&lt;/a&gt; had the room rolling with laughter more than once, and
on a topic that is no laughing matter: managing mutable data with Presto. Datorama provides a marketing intelligence
platform with 30,000 customers, who can interact with 1.5PB of data available for interactive
queries.&lt;/p&gt;

&lt;p&gt;For that, Datorama provides a “data lake as a service”, called a DatoLake. Files on data lakes by their nature
are not transactionally updatable on a row level, but the users of Datorama require the ability to delete or update
specific rows in a transactional manner.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Israel-2019/datorama.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;To solve this, Datorama embarked on a journey based on partitioning the data by a version number (such as
20190101_&lt;strong&gt;009&lt;/strong&gt;) and rebuilding a partition when updates occur. There were three attempts along the
journey, with lessons learned at each step:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;At first, using an external Postgres metastore to store the versions, swapping versions in that metastore, and
using it as part of a sub-query in Presto to select the correct version. This approach did not allow partition pruning
to be pushed down.&lt;/li&gt;
  &lt;li&gt;Next, moving the metastore query to happen before query generation, and dynamically generating the right filter
for each sub-query. This approach required two-pass processing for each query and did not support direct SQL access for clients.&lt;/li&gt;
  &lt;li&gt;And finally, swapping the partition in a transactional manner directly in the Hive Metastore database (MySQL),
and refreshing the Presto Hive cache. With this approach, queries do not need to know about the version change, and
full separation of the mutability logic from the query is achieved.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;See much more details in the &lt;a href=&quot;https://vimeo.com/333759030&quot;&gt;video&lt;/a&gt;, &lt;a href=&quot;https://www.slideshare.net/OriReshef/mutable-data-scale&quot;&gt;slides&lt;/a&gt;.&lt;/p&gt;

&lt;h1 id=&quot;varada-join-optimization-and-dynamic-filtering&quot;&gt;Varada, Join Optimization and Dynamic filtering&lt;/h1&gt;

&lt;p&gt;&lt;a href=&quot;https://www.linkedin.com/in/romanzeyde/&quot;&gt;Roman Zeyde&lt;/a&gt; is Varada’s Presto architect. Roman has a unique
algorithmic background, being a Talpiot graduate and an ex-Googler.&lt;/p&gt;

&lt;p&gt;Roman’s talk discussed a new approach to make Joins work faster. Varada will contribute Roman’s work on dynamic
filtering back to the community. Stay tuned :)&lt;/p&gt;

&lt;p&gt;The talk went over the following major topics:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Presto Cost Based Optimizer feature as a basis for Join optimization&lt;/li&gt;
  &lt;li&gt;Join optimization strategies&lt;/li&gt;
  &lt;li&gt;Dynamic filtering in the application for join optimization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Israel-2019/varada2.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Roman’s &lt;a href=&quot;https://vimeo.com/331946107&quot;&gt;talk&lt;/a&gt;, &lt;a href=&quot;https://www.slideshare.net/OriReshef/dynamic-filtering-for-presto-join-optimisation&quot;&gt;slides&lt;/a&gt;.&lt;/p&gt;

&lt;h1 id=&quot;qa-session&quot;&gt;Q&amp;amp;A session&lt;/h1&gt;

&lt;p&gt;The event finished with an hour-long Q&amp;amp;A session led by &lt;a href=&quot;https://www.linkedin.com/in/demibenari/&quot;&gt;Demi Ben-Ari&lt;/a&gt;, VP R&amp;amp;D at
&lt;a href=&quot;https://www.panorays.com/&quot;&gt;Panorays&lt;/a&gt; and co-founder of Big Things, an Israeli meetup group with 5,000 members,
all fans of big data technologies.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Israel-2019/qa.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;See you all at the Second International Presto Conference in Tel Aviv!&lt;/p&gt;</content>

      
        <author>
          <name>Ori Reshef, VP Product, Varada</name>
        </author>
      

      <summary>Community, noun: “A feeling of fellowship with others, as a result of sharing common attributes, interests, and goals” The fun picture you see here was taken at the first lecture of the First international Presto summit in Israel last month. The atmosphere in the room during the various presentations was unique. It’s as if you could physically feel the brainpower of 250 engineers fascinated by technology in one room. We would like to share with you a bit of the content that was discussed during the conference. Enjoy the read and the videos!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/Israel-2019/audience.jpg" />
      
    </entry>
  
    <entry>
      <title>Release 309</title>
      <link href="https://trino.io/blog/2019/04/25/release-309.html" rel="alternate" type="text/html" title="Release 309" />
      <published>2019-04-25T00:00:00+00:00</published>
      <updated>2019-04-25T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/04/25/release-309</id>
      <content type="html" xml:base="https://trino.io/blog/2019/04/25/release-309.html">&lt;p&gt;This version adds support for case-insensitive name matching in
JDBC-based connectors, more data types in the
&lt;a href=&quot;https://trino.io/docs/current/connector/postgresql.html&quot;&gt;PostgreSQL connector&lt;/a&gt;,
and some bug fixes.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-309.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>This version adds support for case-insensitive name matching in JDBC-based connectors, more data types in PostgreSQL connector, and some bug fixes. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Even Faster ORC</title>
      <link href="https://trino.io/blog/2019/04/23/even-faster-orc.html" rel="alternate" type="text/html" title="Even Faster ORC" />
      <published>2019-04-23T00:00:00+00:00</published>
      <updated>2019-04-23T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/04/23/even-faster-orc</id>
<content type="html" xml:base="https://trino.io/blog/2019/04/23/even-faster-orc.html">&lt;p&gt;Trino is known for being the fastest SQL-on-Hadoop engine, and our custom ORC
reader implementation is a big reason for this speed – now it is even faster!&lt;/p&gt;

&lt;h2 id=&quot;why-is-this-important&quot;&gt;Why is this important?&lt;/h2&gt;

&lt;p&gt;For the TPC-DS benchmark, the new reader reduced overall query time by ~5%
and CPU usage by ~9%, which improves user experience while reducing cost.&lt;/p&gt;

&lt;h2 id=&quot;what-improved&quot;&gt;What improved?&lt;/h2&gt;

&lt;p&gt;ORC uses a two-step process to decode data. The first step is a traditional
compression algorithm, like gzip, that generically reduces data size. The second
step uses data-type-specific compression algorithms that convert the raw bytes
into values (e.g., text, numbers, timestamps). It is this latter step that we
improved.&lt;/p&gt;
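&lt;p&gt;The two steps can be sketched as follows. This is a minimal illustration, not Trino’s actual reader code: it uses gzip for the generic step and little-endian longs for the type-specific step, and all class and method names are made up for the example.&lt;/p&gt;

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration of two-step decoding: a generic decompression
// pass shared by all column types, followed by a type-specific pass that
// turns the raw bytes into typed values.
public class TwoStepDecode {
    // step 1: generic decompression (gzip here)
    static byte[] decompress(byte[] compressed) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            in.transferTo(out);
            return out.toByteArray();
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // step 2: type-specific decoding of the raw bytes into long values
    static long[] decodeLongs(byte[] raw) {
        long[] values = new long[raw.length / Long.BYTES];
        ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN).asLongBuffer().get(values);
        return values;
    }

    // helper used only to produce example input
    static byte[] compress(byte[] raw) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
                gzip.write(raw);
            }
            return out.toByteArray();
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```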

&lt;h2 id=&quot;how-much-faster-is-the-decoder&quot;&gt;How much faster is the decoder?&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/orc-speedup.svg&quot; alt=&quot;ORC Speedup&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;why-exactly-is-this-faster&quot;&gt;Why exactly is this faster?&lt;/h2&gt;

&lt;p&gt;Explaining why the new code is faster requires a brief explanation of the
existing code. In the old code, a typical value reader looked like this:&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dataStream&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;presentStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;skip&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;RunLengthEncodedBlock&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;create&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;nc&quot;&gt;BlockBuilder&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;createBlockBuilder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;presentStream&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;writeLong&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;presentStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;nextBit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;writeLong&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;appendNull&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;build&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This code does a few things well. First, for the &lt;em&gt;all values are null&lt;/em&gt; case, it
returns a run length encoded block which has custom optimizations throughout
Trino (this &lt;a href=&quot;https://github.com/trinodb/trino/pull/229&quot;&gt;optimization&lt;/a&gt; was
recently added by &lt;a href=&quot;https://github.com/Praveen2112&quot;&gt;Praveen Krishna&lt;/a&gt;). Second,
it separates the unconditional &lt;em&gt;no nulls&lt;/em&gt; loop from the conditional &lt;em&gt;mixed nulls&lt;/em&gt;
loop. It is common to have a column without nulls, so it makes sense to split
this out, since unconditional loops are faster than conditional loops.&lt;/p&gt;

&lt;p&gt;On the downside, this code has several performance issues:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Many data encodings can be efficiently read in bulk, but this code reads one
value at a time.&lt;/li&gt;
  &lt;li&gt;In some cases, the code can be called with different type instances, which
results in slow dynamic dispatch call sites in the loop.&lt;/li&gt;
  &lt;li&gt;Value reading in the null loop is conditional, which is expensive.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;optimize-for-bulk-reads&quot;&gt;Optimize for bulk reads&lt;/h3&gt;

&lt;p&gt;As you can see from the code above, Trino always loads values in batches
(typically 1024). This makes the reader and the downstream code more efficient, as
the overhead of processing data is amortized over the batch, and in some cases
data can be processed in parallel. ORC has a small number of low-level decoders
for booleans, numbers, bytes, and so on. These encodings are optimized for each
data type, which means each must be optimized individually. In some cases, the
decoders already had internal batch output buffers, so the optimization was
trivial. In another equally trivial case, we changed the float and double stream
decoders from loading a value byte at a time to bulk loading an entire array of
values directly from the input, improving performance by more than 10x.&lt;/p&gt;
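&lt;p&gt;The float/double change amounts to the difference between the two methods below. This is an illustrative sketch, not the actual decoder: the class and method names are invented, and a little-endian encoding is assumed.&lt;/p&gt;

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch of the double decoder change: instead of assembling
// each IEEE 754 value a byte at a time, reinterpret the whole run of raw
// bytes as doubles in one bulk operation.
public class BulkDoubleRead {
    // old style: reassemble one value at a time from individual bytes
    static double[] readOneAtATime(byte[] input, int count) {
        double[] values = new double[count];
        for (int i = 0; i < count; i++) {
            long bits = 0;
            for (int b = 0; b < 8; b++) {
                // little-endian: low-order byte first
                bits |= (input[i * 8 + b] & 0xFFL) << (8 * b);
            }
            values[i] = Double.longBitsToDouble(bits);
        }
        return values;
    }

    // new style: bulk load the entire array directly from the input
    static double[] readBulk(byte[] input, int count) {
        double[] values = new double[count];
        ByteBuffer.wrap(input).order(ByteOrder.LITTLE_ENDIAN)
                .asDoubleBuffer().get(values, 0, count);
        return values;
    }
}
```

Both methods produce identical output; the bulk variant simply does far less per-value work.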

&lt;p&gt;Some changes, however, were significantly more complex. One example is the
boolean reader, which was changed from decoding a single bit at a time to
decoding 8 bits at a time. This sounds simple, but in practice doing this
efficiently is complex, since reads are not aligned to 8 bits, and there is the
general problem of forming JVM friendly loops. For those interested, the code is
&lt;a href=&quot;https://github.com/trinodb/trino/blob/308/presto-orc/src/main/java/io/prestosql/orc/stream/BooleanInputStream.java#L218&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
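&lt;p&gt;The core idea can be sketched like this. This is a simplified illustration, not the linked implementation: it assumes most-significant-bit-first packing and only handles the aligned case in the fast path, whereas the real reader must also cope with reads that start mid-byte.&lt;/p&gt;

```java
// Hypothetical sketch of the boolean decoder idea: instead of extracting
// one bit per call, expand all 8 bits of each byte into the output array
// with straight-line code, falling back to bit-by-bit for the tail.
public class BooleanBatchDecode {
    // one bit at a time (old approach)
    static boolean[] decodeBitByBit(byte[] packed, int count) {
        boolean[] out = new boolean[count];
        for (int i = 0; i < count; i++) {
            out[i] = (packed[i / 8] & (0x80 >>> (i % 8))) != 0;
        }
        return out;
    }

    // 8 bits at a time (new approach, aligned fast path)
    static boolean[] decodeByByte(byte[] packed, int count) {
        boolean[] out = new boolean[count];
        for (int i = 0; i + 8 <= count; i += 8) {
            int b = packed[i / 8] & 0xFF;
            out[i]     = (b & 0x80) != 0;
            out[i + 1] = (b & 0x40) != 0;
            out[i + 2] = (b & 0x20) != 0;
            out[i + 3] = (b & 0x10) != 0;
            out[i + 4] = (b & 0x08) != 0;
            out[i + 5] = (b & 0x04) != 0;
            out[i + 6] = (b & 0x02) != 0;
            out[i + 7] = (b & 0x01) != 0;
        }
        // tail for counts that are not a multiple of 8
        for (int i = count & ~7; i < count; i++) {
            out[i] = (packed[i / 8] & (0x80 >>> (i % 8))) != 0;
        }
        return out;
    }
}
```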

&lt;h3 id=&quot;avoid-dynamic-dispatch-in-loops&quot;&gt;Avoid dynamic dispatch in loops&lt;/h3&gt;

&lt;p&gt;This is the kind of problem that is not obvious when reading code, and it is
easily missed in benchmarks. The core problem happens when you have a loop
containing a method call whose target class can vary over the lifetime of the
execution. For example, this simple loop from above may or may not be fast,
depending on how many different classes it sees for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;type&lt;/code&gt; across multiple
executions:&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;writeLong&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Most of the ORC column readers can only be called with a single type
implementation, but the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LongStreamReader&lt;/code&gt; is called with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BIGINT&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INTEGER&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SMALLINT&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TINYINT&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DATE&lt;/code&gt; types. This causes the JVM to generate a dynamic
dispatch in the core of the loop. Besides the obvious extra work to select the
target code and branch prediction problems, dynamic dispatch calls are normally
not inlined, which disables many powerful optimizations in the JVM. The good news
is that the fix is trivial:&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;type&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;instanceof&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;BigintType&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;nc&quot;&gt;BlockBuilder&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;createBlockBuilder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;writeLong&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;build&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;type&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;instanceof&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;IntegerType&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;nc&quot;&gt;BlockBuilder&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;createBlockBuilder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;writeLong&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;build&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The hard part is knowing that this is a problem. The existing benchmarks for ORC
only tested a single type at a time, which allowed the JVM to inline the target
method and produce much more efficient code. In this case, we happen to know that
the code is invoked with multiple types, so we updated the benchmark to
warm up the JVM with multiple types before benchmarking.&lt;/p&gt;

&lt;p&gt;For more information on this kind of optimization, I suggest reading Aleksey
Shipilëv’s blog posts on JVM performance. Specifically, &lt;a href=&quot;https://shipilev.net/blog/2015/black-magic-method-dispatch&quot;&gt;The Black Magic of (Java)
Method Dispatch&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;improve-null-reading&quot;&gt;Improve null reading&lt;/h3&gt;

&lt;p&gt;With the above improvements, we were getting great performance of 0.5ns to 3ns
per value for most types without nulls, but the benchmarks with nulls were taking
an additional ~6ns per value. Some of that is expected, since we must decode the
additional &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;present&lt;/code&gt; boolean stream, but booleans decode at a rate of ~0.5ns per
value, so that isn’t the problem. &lt;a href=&quot;https://github.com/martint&quot;&gt;Martin Traverso&lt;/a&gt;
and I built and benchmarked many different implementations, but we only found one
with really good performance.&lt;/p&gt;

&lt;p&gt;The first implementation we built was simply to bulk read a null array, bulk read
the values packed into the front of an array, and then spread the nulls across
the array:&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;// bulk read and count null values&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;boolean&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;isNull&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;boolean&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;];&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nullCount&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;presentStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getUnsetBits&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;isNull&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// bulk read non-null values into an array large enough for the full result&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;];&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;dataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;longNonNullValueTemp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nullCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// copy non-null values into output position (in reverse order)&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nullSuppressedPosition&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nullCount&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;outputPosition&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;isNull&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;length&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;outputPosition&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;outputPosition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;--)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;isNull&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputPosition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;])&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputPosition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputPosition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nullSuppressedPosition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;];&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;nullSuppressedPosition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;--;&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This is better because it always bulk reads the values, but there is still a ~4ns
per value penalty for nulls. We haven’t been able to explain why it happens, but
we’ve observed that the number drops dramatically after we adjusted the code to
assign to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;result[outputPosition]&lt;/code&gt; outside the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;if&lt;/code&gt; block. We can’t do that
in-place, as in the snippet above, so we introduce a temporary buffer:&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;// bulk read and count null values&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;boolean&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;isNull&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;boolean&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;];&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nullCount&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;presentStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getUnsetBits&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;isNull&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// bulk read non-null values into a temporary array&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;dataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tempBuffer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nullCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// copy values into result&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;isNull&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;length&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;];&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;position&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;isNull&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;length&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tempBuffer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;position&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;];&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(!&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;isNull&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;])&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;position&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++;&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;With this change, the null penalty drops to ~1.5ns per value, which is reasonable
given that just reading the null flag costs ~0.5ns per value. There are two
downsides to this approach. First, it requires an extra temporary buffer, but
since the reader is single-threaded, we can reuse one buffer for the whole file
read. Second, the null values are no longer zero. This should not be a problem
for correctly written code, but it could trigger latent bugs. We did find
another approach that left the null values zeroed, but it was a bit slower and
required yet another temporary buffer, so we settled on this approach.&lt;/p&gt;
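&lt;p&gt;The expansion pass above can be sketched as a minimal, self-contained Java
example. The &lt;code&gt;expand&lt;/code&gt; method and the driver are hypothetical
illustrations of the technique, not the actual reader code; note how the null
positions end up holding copies of neighboring values rather than zero.&lt;/p&gt;

```java
// Hypothetical sketch of the dense-decode-then-expand technique described
// above. Non-null values are assumed to have been decoded densely into
// tempBuffer; expand() spreads them out to their final positions. Null slots
// receive whatever value "position" currently points at, so they are not
// guaranteed to be zero.
public class NullExpansion {
    static long[] expand(long[] tempBuffer, boolean[] isNull) {
        long[] result = new long[isNull.length];
        int position = 0;
        for (int i = 0; i != isNull.length; i++) {
            result[i] = tempBuffer[position];
            if (!isNull[i]) {
                position++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // three non-null values with a null between each
        long[] dense = {10, 20, 30};
        boolean[] isNull = {false, true, false, true, false};
        System.out.println(java.util.Arrays.toString(expand(dense, isNull)));
        // prints [10, 20, 20, 30, 30]: the null slots hold copies, not zeros
    }
}
```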

&lt;h2 id=&quot;how-much-will-my-setup-improve&quot;&gt;How much will my setup improve?&lt;/h2&gt;

&lt;p&gt;We tested the performance using the standard TPC-DS and TPC-H benchmarks on zlib
compressed ORC files:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Benchmark&lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;Duration improvement&lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;CPU improvement&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;TPC-DS&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;5.6%&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;9.3%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;TPC-H&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;4.5%&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;8.3%&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;There are a number of reasons you may get a larger or smaller win:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The exact queries matter: In the benchmarks above, some queries saved more than
20% CPU while others saved only 1%.&lt;/li&gt;
  &lt;li&gt;The compression matters: In our tests we used zlib, which is the most expensive
compression supported by ORC. Compression algorithms that use less CPU (e.g.,
Zstd, LZ4, or Snappy) will generally see larger relative improvements.&lt;/li&gt;
  &lt;li&gt;This improvement is only in &lt;a href=&quot;https://trino.io/download.html&quot;&gt;Trino 309+&lt;/a&gt;,
so if you are using an earlier version you will need to upgrade. Also, if you are
still using Facebook’s version of Presto, you can either upgrade to Trino 309+ or
wait to see if they backport it.&lt;/li&gt;
&lt;/ul&gt;</content>

      
        <author>
          <name>Dain Sundstrom, Martin Traverso</name>
        </author>
      

      <summary>Trino is known for being the fastest SQL on Hadoop engine, and our custom ORC reader implementation is a big reason for this speed – now it is even faster!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/orc-speedup.png" />
      
    </entry>
  
    <entry>
      <title>Release 308</title>
      <link href="https://trino.io/blog/2019/04/12/release-308.html" rel="alternate" type="text/html" title="Release 308" />
      <published>2019-04-12T00:00:00+00:00</published>
      <updated>2019-04-12T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/04/12/release-308</id>
      <content type="html" xml:base="https://trino.io/blog/2019/04/12/release-308.html">&lt;p&gt;This version includes significant 
&lt;a href=&quot;/blog/2019/04/23/even-faster-orc.html&quot;&gt;performance improvements&lt;/a&gt;
when reading ORC data, authorization checks for 
&lt;a href=&quot;https://trino.io/docs/current/sql/show-columns.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW COLUMNS&lt;/code&gt;&lt;/a&gt;,
and limit pushdown for JDBC-based connectors.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-308.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>This version includes significant performance improvements when reading ORC data, authorization checks for SHOW COLUMNS, and limit pushdown for JDBC-based connectors. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 307</title>
      <link href="https://trino.io/blog/2019/04/08/release-307.html" rel="alternate" type="text/html" title="Release 307" />
      <published>2019-04-08T00:00:00+00:00</published>
      <updated>2019-04-08T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/04/08/release-307</id>
      <content type="html" xml:base="https://trino.io/blog/2019/04/08/release-307.html">&lt;p&gt;This version includes some important security fixes, support for inner and outer
joins involving lateral derived tables (&lt;a href=&quot;https://trino.io/docs/current/sql/select.html#lateral&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LATERAL&lt;/code&gt;&lt;/a&gt;),
new syntax for setting &lt;a href=&quot;https://trino.io/docs/current/sql/comment.html&quot;&gt;table comments&lt;/a&gt;, and performance
improvements.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-307.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>This version includes some important security fixes, support for inner and outer joins involving lateral derived tables (LATERAL), new syntax for setting table comments, and performance improvements. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto Community Meeting 2019-04-03</title>
      <link href="https://trino.io/blog/2019/04/03/Presto-Community-Meeting.html" rel="alternate" type="text/html" title="Presto Community Meeting 2019-04-03" />
      <published>2019-04-03T00:00:00+00:00</published>
      <updated>2019-04-03T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/04/03/Presto-Community-Meeting</id>
      <content type="html" xml:base="https://trino.io/blog/2019/04/03/Presto-Community-Meeting.html">&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/VQhDBPltUyk&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h3 id=&quot;agenda&quot;&gt;Agenda&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;Memory management&lt;/li&gt;
  &lt;li&gt;Spilling&lt;/li&gt;
&lt;/ul&gt;

&lt;!--more--&gt;</content>

      

      <summary>Agenda Memory management Spilling</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 306</title>
      <link href="https://trino.io/blog/2019/03/16/release-306.html" rel="alternate" type="text/html" title="Release 306" />
      <published>2019-03-16T00:00:00+00:00</published>
      <updated>2019-03-16T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/03/16/release-306</id>
      <content type="html" xml:base="https://trino.io/blog/2019/03/16/release-306.html">&lt;p&gt;This version includes some bug fixes, as well as performance improvements when decoding ORC data.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-306.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>This version includes some bug fixes, as well as performance improvements when decoding ORC data. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto Community Meeting 2019-03-13</title>
      <link href="https://trino.io/blog/2019/03/13/Presto-Community-Meeting.html" rel="alternate" type="text/html" title="Presto Community Meeting 2019-03-13" />
      <published>2019-03-13T00:00:00+00:00</published>
      <updated>2019-03-13T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/03/13/Presto-Community-Meeting</id>
      <content type="html" xml:base="https://trino.io/blog/2019/03/13/Presto-Community-Meeting.html">&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/hMmFM1MBEB8&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h3 id=&quot;agenda&quot;&gt;Agenda&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;Dynamic Filtering&lt;/li&gt;
  &lt;li&gt;Changes to TIMESTAMP semantics&lt;/li&gt;
&lt;/ul&gt;

&lt;!--more--&gt;</content>

      

      <summary>Agenda Dynamic Filtering Changes to TIMESTAMP semantics</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 305</title>
      <link href="https://trino.io/blog/2019/03/08/release-305.html" rel="alternate" type="text/html" title="Release 305" />
      <published>2019-03-08T00:00:00+00:00</published>
      <updated>2019-03-08T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/03/08/release-305</id>
      <content type="html" xml:base="https://trino.io/blog/2019/03/08/release-305.html">&lt;p&gt;Changes in this version include peak-memory awareness in
&lt;a href=&quot;https://trino.io/docs/current/optimizer/cost-based-optimizations.html&quot;&gt;cost-based optimizer&lt;/a&gt;,
improved handling of CSV output in &lt;a href=&quot;https://trino.io/docs/current/client/cli.html&quot;&gt;CLI&lt;/a&gt;,
and performance improvements for Parquet.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-305.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>Changes in this version include peak-memory awareness in cost-based optimizer, improved handling of CSV output in CLI, and performance improvements for Parquet. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 304</title>
      <link href="https://trino.io/blog/2019/02/27/release-304.html" rel="alternate" type="text/html" title="Release 304" />
      <published>2019-02-27T00:00:00+00:00</published>
      <updated>2019-02-27T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/02/27/release-304</id>
      <content type="html" xml:base="https://trino.io/blog/2019/02/27/release-304.html">&lt;p&gt;New features include &lt;a href=&quot;https://trino.io/docs/current/admin/spill.html&quot;&gt;spilling&lt;/a&gt; for queries
that use ORDER BY or window functions, support for PostgreSQL’s json and jsonb types, and a Hive 
&lt;a href=&quot;https://trino.io/docs/current/connector/hive.html#procedures&quot;&gt;procedure&lt;/a&gt; to synchronize 
partition metadata with the file system.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-304.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>New features include spilling for queries that use ORDER BY or window functions, support for PostgreSQL’s json and jsonb types, and a Hive procedure to synchronize partition metadata with the file system. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto Community Meeting 2019-02-27</title>
      <link href="https://trino.io/blog/2019/02/27/Presto-Community-Meeting.html" rel="alternate" type="text/html" title="Presto Community Meeting 2019-02-27" />
      <published>2019-02-27T00:00:00+00:00</published>
      <updated>2019-02-27T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/02/27/Presto-Community-Meeting</id>
      <content type="html" xml:base="https://trino.io/blog/2019/02/27/Presto-Community-Meeting.html">&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/7bclzfYUfQg&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h3 id=&quot;agenda&quot;&gt;Agenda&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;Pushdown of complex operations (filter, project, join, etc.)&lt;/li&gt;
  &lt;li&gt;Coordinator high availability&lt;/li&gt;
&lt;/ul&gt;

&lt;!--more--&gt;</content>

      

      <summary>Agenda Pushdown of complex operations (filter, project, join, etc.) Coordinator high availability</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 303</title>
      <link href="https://trino.io/blog/2019/02/14/release-303.html" rel="alternate" type="text/html" title="Release 303" />
      <published>2019-02-14T00:00:00+00:00</published>
      <updated>2019-02-14T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/02/14/release-303</id>
      <content type="html" xml:base="https://trino.io/blog/2019/02/14/release-303.html">&lt;p&gt;This version includes bug fixes and performance improvements.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-303.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>This version includes bug fixes and performance improvements. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 302</title>
      <link href="https://trino.io/blog/2019/02/06/release-302.html" rel="alternate" type="text/html" title="Release 302" />
      <published>2019-02-06T00:00:00+00:00</published>
      <updated>2019-02-06T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/02/06/release-302</id>
      <content type="html" xml:base="https://trino.io/blog/2019/02/06/release-302.html">&lt;p&gt;New features include native support for 
&lt;a href=&quot;https://trino.io/docs/current/connector/hive-gcs-tutorial.html&quot;&gt;Google Cloud Storage&lt;/a&gt; 
and a connector for 
&lt;a href=&quot;https://trino.io/docs/current/connector/elasticsearch.html&quot;&gt;Elasticsearch&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-302.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>New features include native support for Google Cloud Storage and a connector for Elasticsearch. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto Community Meeting 2019-02-06</title>
      <link href="https://trino.io/blog/2019/02/06/Presto-Community-Meeting.html" rel="alternate" type="text/html" title="Presto Community Meeting 2019-02-06" />
      <published>2019-02-06T00:00:00+00:00</published>
      <updated>2019-02-06T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/02/06/Presto-Community-Meeting</id>
      <content type="html" xml:base="https://trino.io/blog/2019/02/06/Presto-Community-Meeting.html">&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/YfDe_YVzMyI&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h3 id=&quot;agenda&quot;&gt;Agenda&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;About the Foundation&lt;/li&gt;
  &lt;li&gt;Getting involved&lt;/li&gt;
  &lt;li&gt;Summary of new features&lt;/li&gt;
  &lt;li&gt;Top requested features&lt;/li&gt;
  &lt;li&gt;Release verification&lt;/li&gt;
&lt;/ul&gt;

&lt;!--more--&gt;</content>

      

      <summary>Agenda About the Foundation Getting involved Summary of new features Top requested features Release verification</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 301</title>
      <link href="https://trino.io/blog/2019/01/31/release-301.html" rel="alternate" type="text/html" title="Release 301" />
      <published>2019-01-31T00:00:00+00:00</published>
      <updated>2019-01-31T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/01/31/release-301</id>
      <content type="html" xml:base="https://trino.io/blog/2019/01/31/release-301.html">&lt;p&gt;New features include role-based access control and 
&lt;a href=&quot;https://trino.io/docs/current/sql/create-role.html&quot;&gt;role management&lt;/a&gt;, 
&lt;a href=&quot;https://trino.io/docs/current/sql/create-view.html#security&quot;&gt;invoker security&lt;/a&gt;
mode for views, and &lt;a href=&quot;https://trino.io/docs/current/sql/analyze.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ANALYZE&lt;/code&gt;&lt;/a&gt;
syntax for collecting table statistics.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-301.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>New features include role-based access control and role management, invoker security mode for views, and ANALYZE syntax for collecting table statistics. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto Software Foundation Launch</title>
      <link href="https://trino.io/blog/2019/01/31/presto-software-foundation-launch.html" rel="alternate" type="text/html" title="Presto Software Foundation Launch" />
      <published>2019-01-31T00:00:00+00:00</published>
      <updated>2019-01-31T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/01/31/presto-software-foundation-launch</id>
      <content type="html" xml:base="https://trino.io/blog/2019/01/31/presto-software-foundation-launch.html">&lt;p&gt;We are pleased to &lt;a href=&quot;https://www.prweb.com/releases/prweb16070792.htm&quot;&gt;announce&lt;/a&gt;
the launch of the Presto Software Foundation,
a not-for-profit organization dedicated to the advancement of the Presto
open source distributed SQL engine. The foundation is committed to ensuring
the project remains open, collaborative and independent for decades to come.&lt;/p&gt;

&lt;p&gt;We started the Presto project in 2012 as a small team at Facebook,
with the goals of building a high performance, standards compliant, easy-to-use
and dependable query engine capable of scaling to the largest datasets
(exabyte scale) in the world. From day one, we designed and developed Presto
to be maintained by an independent open source community.&lt;/p&gt;

&lt;p&gt;In 2013, we released Presto under the Apache License and opened development to the public.
Since then, the Presto community has expanded globally, with developers in
Brazil, China, Germany, India, Israel, Japan, Poland, Singapore, the U.S., the U.K.,
and more. In recent years, the center of gravity of the Presto community has shifted,
with the majority of contributions now coming from developers outside of Facebook.&lt;/p&gt;

&lt;p&gt;From the beginning, we stressed the importance of code quality, architectural
extensibility, and open collaboration with the community. With the rapid expansion
of both the Presto user base and Presto developer community over the last several
years, establishing a non-profit to institutionalize these values is the next
logical step to ensure that this project stands the test of time.&lt;/p&gt;

&lt;p&gt;The foundation is dedicated to preserving the vision of high quality, performant
and dependable software developed by an open, collaborative and independent
community of developers throughout the world. Everyone is welcome to participate,
whether it be via code contributions, suggestions for improvements, or bug reports.&lt;/p&gt;</content>

      
        <author>
          <name>Martin Traverso, Dain Sundstrom, David Phillips</name>
        </author>
      

      <summary>We are pleased to announce the launch of the Presto Software Foundation, a not-for-profit organization dedicated to the advancement of the Presto open source distributed SQL engine. The foundation is committed to ensuring the project remains open, collaborative and independent for decades to come.</summary>

      
      
    </entry>
  
</feed>
