Historical RFM

📊 Business Context

What is it about?
  • RFM segmentation classifies customers according to their individual buying behavior.
  • Historical RFM stores the RFM value of a customer for a given period of time, e.g. yearly or quarterly.
  • By default, we compute an RFM segment from 30 days ago, and from 90 days ago.
See RFM Segmentation for more details.
What are the main use cases?
  • Analyze the effectiveness of your marketing actions in a given period by monitoring the evolution of RFM segment size.
  • Analyze how each individual customer behavior evolves in time.
  • Target customers according to their RFM “trend”, meaning the evolution of the segment they belong to (from A to C).
What business impact can I expect?
  • Impact on: Retention rate, Number of orders per contact, Marketing effectiveness
  • Level of impact: ⭐️⭐️️⭐️⭐️️⭐️️

↔️ Inputs & Outputs

Data input
You need a dataset made of Orders with at least the following attributes :
  • order_id: unique identifier of the transaction.
  • customer_id: unique identifier of the customer who made the purchase.
  • order_date: date at which the purchase was made.
  • order_amount: total amount that was spent by the customer for this order.
You will also need a Customers dataset with the following attributes
  • customer_id: unique identifier of the customer who made the purchase.
User Settings
The user can choose:
  • The periodicity of historicalization: yearly, quarterly or monthly.
  • To restrict orders taken into account to a given timeframe (like the last 3 years).
Expected output
  • The output of this recipe is a new Dataset containing a row by time period (year, quarter, or month) for each of your contacts who made at least one purchase.
  • In this Dataset, besides contact_id you will have 4 output columns :
    • rfm_recency: this is the recency scoring of the contact, meaning the R value in RFM.
    • rfm_frequency_monetary: this scoring covers both Frequency (F) and Monetary (M) of RFM.
    • rfm_label: the segment label for the contact defined according to the values of R and FxM.
    • previous_rfm_label: the segment label of the previous time period.
  • You can then make a join between your new RFM Dataset and your existing Contacts Dataset to map your rfm_labeland previous_rfm_label for instance.

🖥️ Implementation in Octolis

1. Prepare your input data
Make sure that you have built an Dataset containing all your orders.
In case you have several order Sources (e.g. eShop & PoS) you need to group all sources into an orders master Dataset.
2. Apply the SQL Template “Historical RFM”
  • Create a new Dataset using SQL expert mode.
  • Copy / past the following SQL template.
  • After pasting the template, make sure to modify all variables with the names of your input columns.
  • Add your Historical RFM Dataset as a source of your Contacts Dataset, map the needed fields to import them into your Contacts Dataset.
/* Historical RFM SQL Template */ WITH periodicity_setting AS ( -- Can be managed by templating SELECT /* Choose one of the bellow periodicity. */ -- 'Yearly' AS "periodicity", -- 'Quarterly' AS "periodicity", 'Monthly' AS "periodicity", DATE_TRUNC('month', NOW() - INTERVAL '1 month') AS "start_date" -- Choose a date that is included in the start period. ), constants AS (SELECT "start_date", (CASE WHEN "periodicity" = 'Yearly' THEN 'YYYY' WHEN "periodicity" = 'Quarterly' THEN 'YYYY Q' ELSE 'YYYY MM' END ) AS "period_format", (CASE WHEN "periodicity" = 'Yearly' THEN INTERVAL '1 year' WHEN "periodicity" = 'Quarterly' THEN INTERVAL '3 months' ELSE INTERVAL '1 month' END ) AS "period_interval" FROM periodicity_setting), orders_with_period AS (SELECT "contact_id", "created_at"::TIMESTAMP, "total_amount"::FLOAT, "start_date", "period_format", "period_interval", /* Assign '0000' period for records that belong to older periods than the start period. */ (CASE WHEN TO_CHAR("created_at", "period_format") < TO_CHAR("start_date", "period_format") THEN '0000' ELSE TO_CHAR("created_at", "period_format") END ) AS "period" FROM raw.orders CROSS JOIN constants WHERE /* Optional filters depending on your dataset */ "created_at"::TIMESTAMP BETWEEN CURRENT_DATE - INTERVAL '3 years' AND CURRENT_DATE AND "total_amount" > 0 AND "completed" IS TRUE AND TO_CHAR("created_at"::TIMESTAMP, "period_format") <> TO_CHAR(NOW(), "period_format") -- Skip current period which is the uncompleted one ), -- Placeholders to ensure that there will be an RFM row for each period once a user has placed an order placeholders AS (SELECT TO_CHAR( GENERATE_SERIES( (CASE WHEN "start_date" > MIN("created_at") THEN "start_date" ELSE DATE_TRUNC('month', MIN("created_at")) END), DATE_TRUNC('month', NOW() - "period_interval"), "period_interval" ), "period_format" ) AS "period", "contact_id", TO_TIMESTAMP(0) AS "last_order_date", 0 AS "count_order", 0 AS "sum_amount" FROM orders_with_period GROUP BY "contact_id", "start_date", "period_interval", "period_format"), aggregates_per_period AS (SELECT "contact_id", "period", MAX("created_at") AS "last_order_date", COUNT(*) AS "count_order", SUM("total_amount") AS "sum_amount" FROM orders_with_period GROUP BY "contact_id", "period"), pre_running_aggregates AS (SELECT "contact_id", "period", COALESCE(aggs."last_order_date", plc."last_order_date") AS "last_order_date", COALESCE(aggs."count_order", 0) AS "count_order", COALESCE(aggs."sum_amount", 0) AS "sum_amount" FROM aggregates_per_period aggs FULL JOIN placeholders plc USING ("contact_id", "period")), running_aggregates AS (SELECT DISTINCT ON ("contact_id", "period") "contact_id", "period", MAX("last_order_date") OVER (PARTITION BY "contact_id" ORDER BY "period") AS "last_order_date", SUM("count_order") OVER (PARTITION BY "contact_id" ORDER BY "period") AS "count_order", SUM("sum_amount") OVER (PARTITION BY "contact_id" ORDER BY "period") AS "sum_amount" FROM pre_running_aggregates), pre_rfm_scores AS (SELECT "contact_id", "period", "last_order_date", "count_order", (CASE WHEN "count_order" <> 0 THEN "sum_amount" / "count_order" ELSE 0 END) AS "avg_amount" FROM running_aggregates), three_demensional_rfm_scores AS (SELECT "contact_id", "period", NTILE(6) OVER (PARTITION BY "period" ORDER BY "last_order_date", "contact_id") AS "rfm_recency", NTILE(6) OVER (PARTITION BY "period" ORDER BY "count_order", "contact_id") AS "rfm_frequency", NTILE(6) OVER (PARTITION BY "period" ORDER BY "avg_amount", "contact_id") AS "rfm_monetary" FROM pre_rfm_scores), two_dimensional_rfm_scores AS (SELECT "contact_id", "period", "rfm_recency", NTILE(6) OVER (PARTITION BY "period" ORDER BY "rfm_frequency" * "rfm_monetary", "contact_id") AS "rfm_frequency_monetary" FROM three_demensional_rfm_scores), rfm_scores_and_label AS (SELECT "contact_id", "period", "rfm_recency", "rfm_frequency_monetary", (CASE WHEN "rfm_recency" IN (5, 6) AND "rfm_frequency_monetary" IN (5, 6) THEN 'Top Customers' WHEN "rfm_recency" IN (3, 4, 5, 6) AND "rfm_frequency_monetary" IN (4, 5, 6) THEN 'Loyal Customers' WHEN "rfm_recency" IN (4, 5, 6) AND "rfm_frequency_monetary" IN (2, 3) THEN 'Potential Loyalist' WHEN "rfm_recency" IN (4, 5, 6) AND "rfm_frequency_monetary" IN (1) THEN 'Promising Customers' WHEN "rfm_recency" IN (3) AND "rfm_frequency_monetary" IN (3) THEN 'Needing Attention' WHEN "rfm_recency" IN (3) AND "rfm_frequency_monetary" IN (1, 2) THEN 'About To Sleep' WHEN "rfm_recency" IN (1, 2) AND "rfm_frequency_monetary" IN (3, 4, 5, 6) THEN 'At risk' WHEN "rfm_recency" IN (1, 2) AND "rfm_frequency_monetary" IN (1, 2) THEN 'Lost' END) AS "rfm_label" FROM two_dimensional_rfm_scores), with_previous_label AS (SELECT "contact_id", "period", "rfm_recency", "rfm_frequency_monetary", "rfm_label", (LAG("rfm_label", 1) OVER (PARTITION BY "contact_id" ORDER BY "period")) AS "previous_rfm_label" FROM rfm_scores_and_label) SELECT "contact_id", "period", "rfm_recency", "rfm_frequency_monetary", "rfm_label", "previous_rfm_label" FROM with_previous_label WHERE "period" <> '0000' ORDER BY "contact_id", "period";