RFI Segmentation

📊 Business Context

What is it about?

The objective is to segment customers according to this progress in the purchasing journey. We need to know for each customer whether he or she is ahead, on time, or behind his or her individual purchasing rhythm. It concerns customers with at least 3 purchases.

Customers are grouped into several segments: recent purchase, soon to be reached, overdue, significantly overdue, and inactive.

This segmentation could be decomposed by product categories, which could make sense when there are some categories with products consumed on a recurring basis.

What are the main use cases?

Stimulate the re-purchase of overdue customers by setting up triggers or specific reactivation campaigns.

Implement multichannel anti-churn actions for significantly overdue and inactive customers.

Manage the activity by analyzing the share of early and late customers.

What business impact can I expect?

Impact on: Retention, LTV

Level of impact: ⭐️⭐️️⭐️️

↔️ Inputs & Outputs

Data input

All you need as an input is a single dataset made of Orders with at least the following attributes :

order_id: the unique identifier of the transaction.

customer_id: the unique identifier of the customer who made the purchase.

order_date: date at which the purchase was made.

order_amount: the total amount that was spent by the customer for this order.

[Option] You might also provide a list of product category ids.

User Settings

The user can:

Adjust RFI segment labels and thresholds (see default values in Our approach > Step 3 - RFI segmentation ).

Add a list of product categories to compute an additional rfi_label for each of them. This category RFI will be based on the global threshold but computed only on purchases of products of this category.

Our approach

Step 1 - Compute delays

Step 2 - Assess purchasing rhythm

Step 3 - RFI Segmentation

Expected output

The output of this recipe is a new Dataset containing a single row of contacts who made at least 3 purchases.

In this Dataset, besides contact_id you will have 2 output columns :

rfi_label: this is the name of the global RFI segment.
rfi_by_cat: concatenation of category names and rfi_labels under the format "cat1":"RFI SEGMENT","cat2":"RFI SEGMENT" (optional)
time_since_last_order : see definition in Our approach
avg_delay_between_orders : see definition in Our approach
purchasing_rythm : see definition in Our approach

You can then make a join between your new RFI Dataset and your existing Contacts Dataset to map the global rfi_label in addition to the rfi_by_cat.

🖥️ Implementation in Octolis

1. Prepare your input data

Make sure that you have built an Dataset containing all your orders.

👉 Create a dataset

In case you have several order Sources (e.g. eShop & PoS) you need to group all sources into an orders master Dataset.

👉 Add a source to an existing dataset

2. Apply the SQL Template “RFI Segmentation”

Create a new Dataset using SQL expert mode.

Copy / past the following SQL template.

After pasting the template, make sure to modify all variables with the names of your input columns.

Add your RFI Segmentation Dataset as a source of your Contacts Dataset, map the needed fields to import them into your Contacts Dataset.

sql
/* RFI Segmentation SQL Template */

WITH filtered_orders AS (SELECT "contact_id",
                                "order_id",
                                DATE_TRUNC('day', "created_at")::TIMESTAMP AS "created_at"
                         FROM raw.orders
                         WHERE
                             /* Optional filters depending on your dataset */
                             "total_amount" > 0
                           AND "completed" IS TRUE),

     general_orders AS (SELECT DISTINCT ON ("contact_id", "created_at") *
                        FROM filtered_orders),

     categorical_orders AS (SELECT DISTINCT ON ("contact_id", "created_at", p."category") o.*,
                                                                                          p."category" AS "product_category"
                            FROM filtered_orders o
                                     INNER JOIN raw.ordersitems USING ("order_id")
                                     INNER JOIN raw.products p USING ("product_id")),

     with_previous_order_date AS (SELECT "contact_id",
                                         "created_at",
                                         NULL AS "product_category",
                                         (LAG("created_at", 1) OVER (PARTITION BY "contact_id" ORDER BY "created_at")) AS "previous_order_date"
                                  FROM general_orders
                                  UNION ALL
                                  SELECT "contact_id",
                                         "created_at",
                                         "product_category",
                                         (LAG("created_at", 1) OVER (PARTITION BY "contact_id", "product_category" ORDER BY "created_at")) AS "previous_order_date"
                                  FROM categorical_orders
                                  WHERE "product_category" IN ('cat_1', 'cat_2', 'cat_3')),

     delays AS (SELECT "contact_id",
                       "product_category",
                        COUNT(*) AS "orders_count",
                        DATE_PART('day', NOW() - MAX("created_at")) AS "time_since_last_order", ROUND(AVG(DATE_PART('day', "created_at" - "previous_order_date"))) AS "avg_delay_between_orders"
                FROM with_previous_order_date
                GROUP BY "contact_id", "product_category"),

     with_purchasing_rythm AS (SELECT "contact_id",
                                      "product_category",
                                      "time_since_last_order",
                                      "avg_delay_between_orders",
                                      (ROUND(CAST("time_since_last_order" / "avg_delay_between_orders" AS NUMERIC),1)) AS "purchasing_rythm"
                               FROM delays
                               WHERE "orders_count" >= 3
                                 AND "avg_delay_between_orders" <> 0),

     with_rfi_label AS (SELECT "contact_id",
                               "product_category",
                               "time_since_last_order",
                               "avg_delay_between_orders",
                               "purchasing_rythm",
                               (CASE
                                    WHEN "purchasing_rythm" < 0.6 THEN 'Recent purchase'
                                    WHEN "purchasing_rythm" < 1 THEN 'Soon to be reached'
                                    WHEN "purchasing_rythm" < 1.5 THEN 'Overdue'
                                    WHEN "purchasing_rythm" < 2.5 THEN 'Significantly Overdue'
                                    ELSE 'Inactive' END
                                   ) AS "rfi_label"
                        FROM with_purchasing_rythm)

SELECT DISTINCT ON ("contact_id") "contact_id",
                                  "time_since_last_order",
                                  "avg_delay_between_orders",
                                  "purchasing_rythm",
                                  "rfi_label",
                                  STRING_AGG(FORMAT('"%s":"%s"', "product_category", "rfi_label"), ',')
                                  FILTER (WHERE "product_category" IS NOT NULL) OVER (PARTITION BY "contact_id" ORDER BY "product_category") AS "rfi_by_cat"
FROM with_rfi_label
ORDER BY "contact_id", "product_category" NULLS FIRST;

Value	Segment
< 60%	Recent purchase
< 100%	Soon to be reached
< 150%	Overdue
< 250%	Significantly Overdue
> 250%	Inactive