Historical RFM

📊 Business Context

What is it about?

RFM segmentation classifies customers according to their individual buying behavior.

Historical RFM stores the RFM value of a customer for a given period of time, e.g. yearly or quarterly.

By default, we compute an RFM segment from 30 days ago, and from 90 days ago.

👉

See RFM Segmentation for more details.

What are the main use cases?

Analyze the effectiveness of your marketing actions in a given period by monitoring the evolution of RFM segment size.

Analyze how each individual customer behavior evolves in time.

Target customers according to their RFM “trend”, meaning the evolution of the segment they belong to (from A to C).

What business impact can I expect?

Impact on: Retention rate, Number of orders per contact, Marketing effectiveness

Level of impact: ⭐️⭐️️⭐️⭐️️⭐️️

↔️ Inputs & Outputs

Data input

You need a dataset made of Orders with at least the following attributes :

order_id: unique identifier of the transaction.

customer_id: unique identifier of the customer who made the purchase.

order_date: date at which the purchase was made.

order_amount: total amount that was spent by the customer for this order.

You will also need a Customers dataset with the following attributes

customer_id: unique identifier of the customer who made the purchase.

User Settings

The user can choose:

The periodicity of historicalization: yearly, quarterly or monthly.

To restrict orders taken into account to a given timeframe (like the last 3 years).

Expected output

The output of this recipe is a new Dataset containing a row by time period (year, quarter, or month) for each of your contacts who made at least one purchase.

In this Dataset, besides contact_id you will have 4 output columns :

rfm_recency: this is the recency scoring of the contact, meaning the R value in RFM.
rfm_frequency_monetary: this scoring covers both Frequency (F) and Monetary (M) of RFM.
rfm_label: the segment label for the contact defined according to the values of R and FxM.
previous_rfm_label: the segment label of the previous time period.

You can then make a join between your new RFM Dataset and your existing Contacts Dataset to map your rfm_labeland previous_rfm_label for instance.

🖥️ Implementation in Octolis

1. Prepare your input data

Make sure that you have built an Dataset containing all your orders.

👉 Create a dataset

In case you have several order Sources (e.g. eShop & PoS) you need to group all sources into an orders master Dataset.

👉 Add a source to an existing dataset

2. Apply the SQL Template “Historical RFM”

Create a new Dataset using SQL expert mode.

Copy / past the following SQL template.

After pasting the template, make sure to modify all variables with the names of your input columns.

Add your Historical RFM Dataset as a source of your Contacts Dataset, map the needed fields to import them into your Contacts Dataset.

sql
/* Historical RFM SQL Template */

WITH periodicity_setting AS (
    -- Can be managed by templating
    SELECT
        /* Choose one of the bellow periodicity. */
        -- 'Yearly'                                     AS "periodicity",
        -- 'Quarterly'                                  AS "periodicity",
        'Monthly'                                       AS "periodicity",
        DATE_TRUNC('month', NOW() - INTERVAL '1 month') AS "start_date" -- Choose a date that is included in the start period.
),

     constants AS (SELECT "start_date",
                          (CASE
                               WHEN "periodicity" = 'Yearly' THEN 'YYYY'
                               WHEN "periodicity" = 'Quarterly' THEN 'YYYY Q'
                               ELSE 'YYYY MM' END
                              ) AS "period_format",
                          (CASE
                               WHEN "periodicity" = 'Yearly' THEN INTERVAL '1 year'
                               WHEN "periodicity" = 'Quarterly' THEN INTERVAL '3 months'
                               ELSE INTERVAL '1 month' END
                              ) AS "period_interval"
                   FROM periodicity_setting),

     orders_with_period AS (SELECT "contact_id",
                                   "created_at"::TIMESTAMP,
                                   "total_amount"::FLOAT,
                                   "start_date",
                                   "period_format",
                                   "period_interval",
                                /* Assign '0000' period for records that belong to older periods than the start period. */
                                   (CASE
                                        WHEN TO_CHAR("created_at", "period_format") <
                                             TO_CHAR("start_date", "period_format") THEN '0000'
                                        ELSE TO_CHAR("created_at", "period_format")
                                       END
                                       ) AS "period"
                            FROM raw.orders
                                     CROSS JOIN constants
                            WHERE
                                /* Optional filters depending on your dataset */
                                "created_at"::TIMESTAMP BETWEEN CURRENT_DATE - INTERVAL '3 years' AND CURRENT_DATE
                              AND "total_amount" > 0
                              AND "completed" IS TRUE
                              AND TO_CHAR("created_at"::TIMESTAMP, "period_format") <> TO_CHAR(NOW(), "period_format") -- Skip current period which is the uncompleted one
     ),

-- Placeholders to ensure that there will be an RFM row for each period once a user has placed an order
     placeholders AS (SELECT TO_CHAR(
                                     GENERATE_SERIES(
                                             (CASE
                                                 WHEN "start_date" > MIN("created_at") THEN "start_date"
                                                 ELSE DATE_TRUNC('month', MIN("created_at"))
                                                 END),
                                             DATE_TRUNC('month', NOW() - "period_interval"),
                                             "period_interval"
                                         ),
                                     "period_format"
                                 )           AS "period",
                             "contact_id",
                             TO_TIMESTAMP(0) AS "last_order_date",
                             0               AS "count_order",
                             0               AS "sum_amount"
                      FROM orders_with_period
                      GROUP BY "contact_id", "start_date", "period_interval", "period_format"),

     aggregates_per_period AS (SELECT "contact_id",
                                      "period",
                                      MAX("created_at")   AS "last_order_date",
                                      COUNT(*)            AS "count_order",
                                      SUM("total_amount") AS "sum_amount"
                               FROM orders_with_period
                               GROUP BY "contact_id", "period"),

     pre_running_aggregates AS (SELECT "contact_id",
                                       "period",
                                       COALESCE(aggs."last_order_date", plc."last_order_date") AS "last_order_date",
                                       COALESCE(aggs."count_order", 0)                         AS "count_order",
                                       COALESCE(aggs."sum_amount", 0)                          AS "sum_amount"
                                FROM aggregates_per_period aggs
                                         FULL JOIN placeholders plc USING ("contact_id", "period")),

     running_aggregates AS (SELECT DISTINCT ON ("contact_id", "period") "contact_id",
                                                                        "period",
                                                                        MAX("last_order_date") OVER (PARTITION BY "contact_id" ORDER BY "period") AS "last_order_date",
                                                                        SUM("count_order") OVER (PARTITION BY "contact_id" ORDER BY "period")     AS "count_order",
                                                                        SUM("sum_amount") OVER (PARTITION BY "contact_id" ORDER BY "period")      AS "sum_amount"
                            FROM pre_running_aggregates),

     pre_rfm_scores AS (SELECT "contact_id",
                               "period",
                               "last_order_date",
                               "count_order",
                               (CASE WHEN "count_order" <> 0 THEN "sum_amount" / "count_order" ELSE 0 END) AS "avg_amount"
                        FROM running_aggregates),

     three_demensional_rfm_scores AS (SELECT "contact_id",
                                             "period",
                                             NTILE(6) OVER (PARTITION BY "period" ORDER BY "last_order_date", "contact_id") AS "rfm_recency",
                                             NTILE(6) OVER (PARTITION BY "period" ORDER BY "count_order", "contact_id") AS "rfm_frequency",
                                             NTILE(6) OVER (PARTITION BY "period" ORDER BY "avg_amount", "contact_id") AS "rfm_monetary"
                                      FROM pre_rfm_scores),

     two_dimensional_rfm_scores AS (SELECT "contact_id",
                                           "period",
                                           "rfm_recency",
                                           NTILE(6)
                                           OVER (PARTITION BY "period" ORDER BY "rfm_frequency" * "rfm_monetary", "contact_id") AS "rfm_frequency_monetary"
                                    FROM three_demensional_rfm_scores),

     rfm_scores_and_label AS (SELECT "contact_id",
                                     "period",
                                     "rfm_recency",
                                     "rfm_frequency_monetary",
                                     (CASE
                                          WHEN "rfm_recency" IN (5, 6) AND "rfm_frequency_monetary" IN (5, 6) THEN 'Top Customers'
                                          WHEN "rfm_recency" IN (3, 4, 5, 6) AND "rfm_frequency_monetary" IN (4, 5, 6) THEN 'Loyal Customers'
                                          WHEN "rfm_recency" IN (4, 5, 6) AND "rfm_frequency_monetary" IN (2, 3) THEN 'Potential Loyalist'
                                          WHEN "rfm_recency" IN (4, 5, 6) AND "rfm_frequency_monetary" IN (1) THEN 'Promising Customers'
                                          WHEN "rfm_recency" IN (3) AND "rfm_frequency_monetary" IN (3) THEN 'Needing Attention'
                                          WHEN "rfm_recency" IN (3) AND "rfm_frequency_monetary" IN (1, 2) THEN 'About To Sleep'
                                          WHEN "rfm_recency" IN (1, 2) AND "rfm_frequency_monetary" IN (3, 4, 5, 6) THEN 'At risk'
                                          WHEN "rfm_recency" IN (1, 2) AND "rfm_frequency_monetary" IN (1, 2) THEN 'Lost'
                                         END) AS "rfm_label"
                              FROM two_dimensional_rfm_scores),

     with_previous_label AS (SELECT "contact_id",
                                    "period",
                                    "rfm_recency",
                                    "rfm_frequency_monetary",
                                    "rfm_label",
                                    (LAG("rfm_label", 1) OVER (PARTITION BY "contact_id" ORDER BY "period")) AS "previous_rfm_label"
                             FROM rfm_scores_and_label)

SELECT "contact_id",
       "period",
       "rfm_recency",
       "rfm_frequency_monetary",
       "rfm_label",
       "previous_rfm_label"
FROM with_previous_label
WHERE "period" <> '0000'
ORDER BY "contact_id", "period";