logo

Create a SQL Audience

Octolis uses Nunjucks, the Javascript implementation of Jinja, a widely used templating language.
This means that all the templating syntax from Jinja is available inside of Octolis SQL Audiences to generate powerful queries.
Octolis also extends this templating syntax with some functions often inspired by DBT (which also uses Jinja).
Especially, please note the {{ ref("Source name") }} syntax that should be used instead of directly referring to the Source table name.
This way, users don't have to know the actual table name of the Source to write queries.

Adapt/Optimize your query depending on the execution context

The $scope variable of a SQL Audience execution

  1. $scope='standalone':
    1. The SQL query runs in the context of either an entireAudience or a sql execution. All data from all Sources is fetched, and the SQL query runs againts them. All you need to do is using the usual SELECT ... FROM {{ ref("Source name") }} syntax.
  1. $scope='increment':
    1. The SQL query runs in the context of a connectionSource, syncSource, or webhookSource execution. The incremental data of only one Source enters the SQL Audience. Here it becomes interesting as this is THE way of adapting/improving your query. Why run a query against all data of all Sources when you know there are only a few new incremental changes?

The $origin_name variable and {{ ref($origin) }} table of a SQL Audience execution

  1. $origin_name is a variable equal to name of the Source that emitted the incremental changes.
  1. {{ ref($origin) }} is the table containing the incremental changes.
⚠️
This is only applicable when $scope='increment'

A first example

Let's say you compute some orders aggregates in a SQL Audience. A usual query would look like the following:
sql
SELECT orders.contact_masterid, orders.total_amount, orders.__modified_at__ as modified_at FROM {{ ref("Orders source") }} orders GROUP BY contact_masterid
Whatever the kind of execution, this will always work, as it always fetches all data from the orders Source and computes aggregates based on it.
Now let's take a step back. This is not efficient at all in the case of a connectionSource execution. Instead of only processing the incremental changes, it will work against all the Source data.
Let's adapt the query:
sql
SELECT orders.contact_masterid, orders.total_amount, orders.__modified_at__ as modified_at {% if $scope === "increment" %} FROM {{ ref($origin) }} {% else %} FROM {{ ref("Orders source") }} {% endif %} GROUP BY contact_masterid
Which is of course equivalent to:
sql
SELECT orders.contact_masterid, orders.total_amount, orders.__modified_at__ as modified_at FROM {% if $scope === "increment" %}{{ ref($origin) }}{% else %}{{ ref("Orders source") }}{% endif %} GROUP BY contact_masterid
This is already way better.
This simple syntax only works because there is only one source, otherwise, you would have to use:
{% if $scope === "increment" and $origin_name === 'Contacts source' %}
instead of only {% if $scope === "increment" %}.

A second example

Now let's say you have two contacts and order Sources that can trigger your SQL Audience.
A basic query would look like the following:
sql
SELECT contacts.__masterid__::text as contact_masterid, contacts.first_name, contacts.last_name, sum(orders.total_amount) as ltv, greatest(contacts.__modified_at__, max(orders.__modified_at__)) as modified_at FROM {{ ref('Contacts source') }} contacts LEFT JOIN {{ ref('Orders source') }} orders ON contacts.__masterid__::text = orders.contact_masterid;
Now let's optimize it so that we don't compute aggregates for all contacts again when there are only a few incremental changes coming from the contacts Source:
sql
SELECT contacts.__masterid__::text as contact_masterid, contacts.first_name, contacts.last_name, sum(orders.total_amount) as ltv, greatest(contacts.__modified_at__, max(orders.__modified_at__)) as modified_at {% if $scope === "increment" and $origin_name === 'Contacts source' %} FROM {{ ref($origin) }} contacts {% else %} FROM {{ ref("Orders source") }} {% endif %} LEFT JOIN {{ ref('Orders source') }} orders ON contacts.__masterid__::text = orders.contact_masterid;
Now let's optimize it also for the case when there are incremental changes coming from the orders Source:
sql
SELECT contacts.__masterid__::text as contact_masterid, contacts.first_name, contacts.last_name, sum(orders.total_amount) as ltv, greatest(contacts.__modified_at__, max(orders.__modified_at__)) as modified_at FROM {% if $scope === "increment" and $origin_name === 'Contacts source' %}{{ ref($origin) }}{% else %}{{ ref("Contacts source") }}{% endif %} contacts LEFT JOIN {% if $scope === "increment" and $origin_name === 'Orders source' %}{{ ref($origin) }}{% else %}{{ ref('Orders source') }}{% endif %} orders ON contacts.__masterid__::text = orders.contact_masterid {% if $scope === "increment" and $origin_name === 'Orders source' %} WHERE contacts.__masterid__::text IN (SELECT origin.contact_masterid FROM {{ ref($origin) }} AS origin) {% endif %}
And that's it! This way your query will always run in an optimized way over the mimimum required number of records.
πŸ’‘
This is how Octolis works internally in "builder" Audiences to ensure performance.

Warning

When you read this you maybe thought it makes sense to implement those optimizations into each and every SQL Audience. But this does not always make sense...
Let's take the RFM segment for instance: it should always be computed again each time there is a new order or a new contact.

Helpful?