Transform a dataset

Understand the different ways of preparing your data

There are several ways to prepare your data sources. When you create a new dataset, you choose one or more sources that feed it. These sources are deduplicated and merged to form a unified table (the dataset).
Most data preparation should be applied at the dataset level, after the sources are unified, but you can also apply some data preparation recipes at the source level, before the dedupe.

Dataset level - Add no code data prep recipes

How do data prep recipes work?

In your dataset view, the right sidebar lists all the data prep recipes applied to your dataset. If you click on "Add new recipe", you will see the list of all available recipes and can quickly add a new one.
All these recipes are processed in near real time each time a record is created or updated in a source feeding the dataset. They run in the order in which they appear in the sidebar, from top to bottom. This is why some "system recipes" are created by default: they make sure that the merge of the various sources and the dedupe process run first.
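To illustrate why the position of a recipe matters, here is a minimal Python sketch. It is not how the platform is implemented: `trim_name`, `capitalize_name` and `apply_recipes` are hypothetical names made up for this example. It only shows that the same two recipes can give a different result when their order changes.

```python
from typing import Callable

Record = dict[str, str]
Recipe = Callable[[Record], Record]


def trim_name(record: Record) -> Record:
    """Hypothetical recipe: remove surrounding whitespace from 'name'."""
    return {**record, "name": record["name"].strip()}


def capitalize_name(record: Record) -> Record:
    """Hypothetical recipe: uppercase the first character of 'name'."""
    return {**record, "name": record["name"].capitalize()}


def apply_recipes(record: Record, recipes: list[Recipe]) -> Record:
    """Apply each recipe in list order, like the sidebar from top to bottom."""
    for recipe in recipes:
        record = recipe(record)
    return record


raw = {"name": "  alice "}

# Trim first, then capitalize: the first character is a letter.
trim_then_capitalize = apply_recipes(raw, [trim_name, capitalize_name])

# Capitalize first, then trim: the first character is a space, so
# capitalizing changes nothing and the name stays in lowercase.
capitalize_then_trim = apply_recipes(raw, [capitalize_name, trim_name])

print(trim_then_capitalize)
print(capitalize_then_trim)
```

With the first order the name ends up capitalized, with the second it does not, which is why it is worth checking where a new recipe sits relative to the existing ones.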
When you add a new recipe, it is added at the end of the sidebar. You can click on an existing recipe to modify or delete it. For now, you cannot easily change the position of a recipe in the list, but this will be possible soon.
Please don't forget to click on "Save" after modifying the data prep recipes. Once you click on "Save", the dataset is rebuilt.

What are all the data prep recipes available?

List of all no code dataprep recipes available [Work in progress]

Dataset level - Build a full SQL dataset

When you create a new dataset, you're invited to choose between "No code" and "SQL". The main purpose of SQL-based datasets is to give you more flexibility when preparing your data sources.
Some advanced data preparation can be difficult or almost impossible to do within the “no code” dataset builder. For example, suppose you want to create a “Consents” dataset based on the “Optin Email” and “Optin SMS” columns of your contacts. This means turning columns into rows (one row per contact and per channel), and there is no way to do that with the “no code” data preparation recipes.
What should you know to use SQL-based datasets?
- The SQL code should follow Postgres syntax.
- We use the “Jinja” templating language so that you can add functions, variables, etc.
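As an illustration of the “Consents” example, here is a sketch of a query that turns the two opt-in columns into rows with UNION ALL. The table name `contacts` and the column names `contact_id`, `optin_email` and `optin_sms` are assumptions made for this example, and the way source tables are referenced through Jinja in the product is not shown here. The query is wrapped in Python and run on an in-memory SQLite database only so that it can be tried locally. It uses plain SQL that should also be valid in Postgres, with opt-ins stored as 0/1 for simplicity.

```python
import sqlite3

# In-memory database with a hypothetical "contacts" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contacts ("
    "contact_id INTEGER, optin_email INTEGER, optin_sms INTEGER)"
)
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?)",
    [(1, 1, 0), (2, 0, 1)],
)

# One SELECT per opt-in column, stacked with UNION ALL:
# each contact produces one row per channel.
CONSENTS_SQL = """
SELECT contact_id, 'email' AS channel, optin_email AS optin FROM contacts
UNION ALL
SELECT contact_id, 'sms' AS channel, optin_sms AS optin FROM contacts
ORDER BY contact_id, channel
"""

consents = conn.execute(CONSENTS_SQL).fetchall()
for row in consents:
    print(row)
```

Each contact now has two rows, one for the email channel and one for the SMS channel, which is the shape a “Consents” dataset needs.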

Source level - Prepare some columns before dedupe

When you add a new source, in the mapping step, it’s possible to add some preparation rules for each column. The main purpose of source-level data preparation is to prepare columns used for dedupe.
Because the dedupe is processed just after the sources are imported, you may need to normalize the columns used in the dedupe at the very first step, right after importing the source files. For example, if you want to dedupe your contacts based on Email x Phone number, you will need to normalize these two columns to be sure that “ x 0660036339” matches “ x +33660036339”.
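The effect of this normalization can be sketched in Python. This is only an illustration of the idea, not the rules applied by the platform: `normalize_phone` and `dedupe_key` are hypothetical helpers, the default country code 33 is an assumption taken from the phone numbers above, and the email address is a made-up placeholder.

```python
import re


def normalize_phone(raw: str, default_country_code: str = "33") -> str:
    """Return the number in a single '+<country><number>' form.

    Assumption: numbers written with a single leading 0 are national
    numbers of the default country (33 here).
    """
    international = raw.strip().startswith("+")
    digits = re.sub(r"\D", "", raw)  # drop spaces, dots, dashes, '+'
    if international:
        return "+" + digits
    if digits.startswith("00"):
        return "+" + digits[2:]
    if digits.startswith("0"):
        return "+" + default_country_code + digits[1:]
    return digits


def dedupe_key(email: str, phone: str) -> tuple[str, str]:
    """Build the Email x Phone key after normalizing both columns."""
    return (email.strip().lower(), normalize_phone(phone))


# Placeholder values: the same contact written in two different ways.
key_a = dedupe_key("Jane.Doe@example.com", "0660036339")
key_b = dedupe_key("jane.doe@example.com ", "+33660036339")

print(key_a)
print(key_a == key_b)
```

Without the normalization step, the two records would produce different keys and would be kept as two separate contacts.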
Most data prep options available at this step are quite easy to understand. For each column, you can choose to convert the column value to “lowercase” or to apply a “Find & replace” function.
Zoom on SQL functions