RFI Segmentation

📊 Business Context


What is it about?
  • The objective is to segment customers according to this progress in the purchasing journey. We need to know for each customer whether he or she is ahead, on time, or behind his or her individual purchasing rhythm. It concerns customers with at least 3 purchases.
  • Customers are grouped into several segments: recent purchase, soon to be reached, overdue, significantly overdue, and inactive.
  • This segmentation could be decomposed by product categories, which could make sense when there are some categories with products consumed on a recurring basis.
What are the main use cases?
  • Stimulate the re-purchase of overdue customers by setting up triggers or specific reactivation campaigns.
  • Implement multichannel anti-churn actions for significantly overdue and inactive customers.
  • Manage the activity by analyzing the share of early and late customers.
What business impact can I expect?
  • Impact on: Retention, LTV
  • Level of impact: ⭐️⭐️️⭐️️

↔️ Inputs & Outputs


Data input
All you need as an input is a single dataset made of Orders with at least the following attributes :
  • order_id: the unique identifier of the transaction.
  • customer_id: the unique identifier of the customer who made the purchase.
  • order_date: date at which the purchase was made.
  • order_amount: the total amount that was spent by the customer for this order.
[Option] You might also provide a list of product category ids.
User Settings
The user can:
  • Adjust RFI segment labels and thresholds (see default values in Our approach > Step 3 - RFI segmentation ).
  • Add a list of product categories to compute an additional rfi_label for each of them. This category RFI will be based on the global threshold but computed only on purchases of products of this category.
Our approach
Step 1 - Compute delays
Step 2 - Assess purchasing rhythm
Step 3 - RFI Segmentation
Expected output
  • The output of this recipe is a new Dataset containing a single row of contacts who made at least 3 purchases.
  • In this Dataset, besides contact_id you will have 2 output columns :
    • rfi_label: this is the name of the global RFI segment.
    • rfi_by_cat: concatenation of category names and rfi_labels under the format "cat1":"RFI SEGMENT","cat2":"RFI SEGMENT" (optional)
    • time_since_last_order : see definition in Our approach
    • avg_delay_between_orders : see definition in Our approach
    • purchasing_rythm : see definition in Our approach
  • You can then make a join between your new RFI Dataset and your existing Contacts Dataset to map the global rfi_label in addition to the rfi_by_cat.

🖥️ Implementation in Octolis


1. Prepare your input data
Make sure that you have built an Dataset containing all your orders.
In case you have several order Sources (e.g. eShop & PoS) you need to group all sources into an orders master Dataset.
2. Apply the SQL Template “RFI Segmentation”
  • Create a new Dataset using SQL expert mode.
  • Copy / past the following SQL template.
  • After pasting the template, make sure to modify all variables with the names of your input columns.
  • Add your RFI Segmentation Dataset as a source of your Contacts Dataset, map the needed fields to import them into your Contacts Dataset.
sql
/* RFI Segmentation SQL Template */ WITH filtered_orders AS (SELECT "contact_id", "order_id", DATE_TRUNC('day', "created_at")::TIMESTAMP AS "created_at" FROM raw.orders WHERE /* Optional filters depending on your dataset */ "total_amount" > 0 AND "completed" IS TRUE), general_orders AS (SELECT DISTINCT ON ("contact_id", "created_at") * FROM filtered_orders), categorical_orders AS (SELECT DISTINCT ON ("contact_id", "created_at", p."category") o.*, p."category" AS "product_category" FROM filtered_orders o INNER JOIN raw.ordersitems USING ("order_id") INNER JOIN raw.products p USING ("product_id")), with_previous_order_date AS (SELECT "contact_id", "created_at", NULL AS "product_category", (LAG("created_at", 1) OVER (PARTITION BY "contact_id" ORDER BY "created_at")) AS "previous_order_date" FROM general_orders UNION ALL SELECT "contact_id", "created_at", "product_category", (LAG("created_at", 1) OVER (PARTITION BY "contact_id", "product_category" ORDER BY "created_at")) AS "previous_order_date" FROM categorical_orders WHERE "product_category" IN ('cat_1', 'cat_2', 'cat_3')), delays AS (SELECT "contact_id", "product_category", COUNT(*) AS "orders_count", DATE_PART('day', NOW() - MAX("created_at")) AS "time_since_last_order", ROUND(AVG(DATE_PART('day', "created_at" - "previous_order_date"))) AS "avg_delay_between_orders" FROM with_previous_order_date GROUP BY "contact_id", "product_category"), with_purchasing_rythm AS (SELECT "contact_id", "product_category", "time_since_last_order", "avg_delay_between_orders", (ROUND(CAST("time_since_last_order" / "avg_delay_between_orders" AS NUMERIC),1)) AS "purchasing_rythm" FROM delays WHERE "orders_count" >= 3 AND "avg_delay_between_orders" <> 0), with_rfi_label AS (SELECT "contact_id", "product_category", "time_since_last_order", "avg_delay_between_orders", "purchasing_rythm", (CASE WHEN "purchasing_rythm" < 0.6 THEN 'Recent purchase' WHEN "purchasing_rythm" < 1 THEN 'Soon to be reached' WHEN "purchasing_rythm" < 1.5 THEN 'Overdue' WHEN "purchasing_rythm" < 2.5 THEN 'Significantly Overdue' ELSE 'Inactive' END ) AS "rfi_label" FROM with_purchasing_rythm) SELECT DISTINCT ON ("contact_id") "contact_id", "time_since_last_order", "avg_delay_between_orders", "purchasing_rythm", "rfi_label", STRING_AGG(FORMAT('"%s":"%s"', "product_category", "rfi_label"), ',') FILTER (WHERE "product_category" IS NOT NULL) OVER (PARTITION BY "contact_id" ORDER BY "product_category") AS "rfi_by_cat" FROM with_rfi_label ORDER BY "contact_id", "product_category" NULLS FIRST;