1. List all business entities


We suggest listing all your business entities by analyzing your data sources, the entities they contain, and the use cases you are targeting.
To inspire you, here is a typical list of business entities for an e-commerce / retail business:
  • Contacts (customers or just contacts)
  • Consents
  • Orders
  • Order items
  • Products
  • Email interactions
  • Website interactions

2. List all data sources for each business entity


For each business entity, the idea is to define all the data sources precisely. By precisely, we mean anticipating how each data source will be imported.
For example, you could have two data sources for your “Contacts” (a short sketch of this inventory follows the list):
  • Shopify
    • Method: direct Shopify connector
    • Frequency: Octolis fetches data every hour by default
    • Scope: all contacts, customers or not
    • Incremental mode: only new or updated contacts
    • Key: Customer ID
  • Newsletter signups (Typeform)
    • Method: webhooks generated by Typeform (or CSV import via an FTP server?)
    • Frequency: Real-time if using webhooks
    • Scope: all newsletter signups, with email, first name, date
    • Incremental mode: only new or updated contacts
    • Key: Email
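
To keep this source inventory in one place, here is a minimal Python sketch of such a listing. The field names (entity, method, frequency, scope, incremental, key) simply mirror the checklist above; they are a planning artifact, not an Octolis API.

```python
from dataclasses import dataclass

@dataclass
class DataSource:
    """One data source feeding a business entity (planning artifact, not an Octolis API)."""
    entity: str       # business entity it feeds, e.g. "Contacts"
    name: str         # human-readable source name
    method: str       # connector, webhooks, FTP/CSV file, API...
    frequency: str    # fetch cadence, or "real-time" for webhooks
    scope: str        # which records and fields are pulled
    incremental: str  # how new or updated records are detected
    key: str          # reconciliation key within the entity

CONTACT_SOURCES = [
    DataSource("Contacts", "Shopify", "direct Shopify connector",
               "hourly fetch", "all contacts, customers or not",
               "only new or updated contacts", "customer_id"),
    DataSource("Contacts", "Newsletter signups (Typeform)",
               "Typeform webhooks (or CSV import via FTP)",
               "real-time via webhooks", "email, first name, signup date",
               "only new or updated signups", "email"),
]
```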

3. Understand the data prep requirements for each source file


For each data source, you may have some custom data preparation work to do. The idea is to first define the data model for each business entity, and then adjust each data source to fit this data model.
We suggest you start by listing the target columns for each business entity, and the expected format for each one. For example, a “Country” column whose expected format is a 3-letter ISO code.
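
As an illustration, here is a small Python sketch of a target schema for “Contacts” and of a prep step that adjusts one raw source record to it. The column names, the formats, raw field names like `created_at`, and the tiny country table are assumptions made for the example, not a prescribed model.

```python
# Target schema for the "Contacts" entity: column name -> expected format.
CONTACTS_SCHEMA = {
    "customer_id": "string, source primary key",
    "email": "lowercase, trimmed",
    "first_name": "title case",
    "country": "ISO 3166-1 alpha-3 code, e.g. FRA",
    "signup_date": "ISO 8601 date (YYYY-MM-DD)",
}

# Illustrative mapping only; in practice use a complete ISO country table.
COUNTRY_TO_ISO3 = {"france": "FRA", "germany": "DEU", "united kingdom": "GBR"}

def prep_contact(raw: dict) -> dict:
    """Adjust one raw source record to the target Contacts data model."""
    return {
        "customer_id": str(raw["id"]),
        "email": raw["email"].strip().lower(),
        "first_name": raw.get("first_name", "").strip().title(),
        "country": COUNTRY_TO_ISO3.get(raw.get("country", "").strip().lower(), ""),
        "signup_date": raw.get("created_at", "")[:10],  # keep the date part only
    }

# prep_contact({"id": 42, "email": " Jane@Example.com ", "first_name": "jane",
#               "country": "France", "created_at": "2023-05-02T10:15:00Z"})
# -> {"customer_id": "42", "email": "jane@example.com", "first_name": "Jane",
#     "country": "FRA", "signup_date": "2023-05-02"}
```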

4. Anticipate the need for destination sync pipelines


Because some destination syncs will require specific datasets, it is worth listing the main outgoing flows you will need.
For each flow, you could define the following (a sketch follows this list):
  • Destination: the system you want to send data to.
  • Method: is there an existing connector (see the connectors listing), or will you use FTP files, webhooks, or an API?
  • Frequency: on each change in the Octolis database, or every hour?
  • Data scope: which columns? Any specific data prep?
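
For instance, the flow inventory can be written down as a simple structure like the sketch below. The destinations (Klaviyo, BigQuery) and the column names are assumed examples, and the keys mirror the checklist above rather than any Octolis configuration format.

```python
# One entry per outgoing flow (planning sketch, not an Octolis configuration format).
# "Klaviyo" and "BigQuery" are assumed example destinations.
DESTINATION_SYNCS = [
    {
        "destination": "Klaviyo",
        "method": "existing connector",   # or FTP files / webhooks / API
        "frequency": "on each change",    # or "hourly"
        "columns": ["email", "first_name", "country", "last_order_date"],
        "data_prep": "compute last_order_date from the Orders dataset",
    },
    {
        "destination": "BigQuery",
        "method": "API export",
        "frequency": "hourly",
        "columns": ["*"],                 # full dataset
        "data_prep": None,
    },
]
```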

5. Design the dataset organization


We suggest you start with a macro map (sketched below) that highlights:
  • the sources of each dataset
  • the main destinations of each dataset
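
A macro map can be as simple as a table or a structure like the sketch below; the dataset, source, and destination names are illustrative.

```python
# Macro map: for each dataset, its sources and its main destinations.
# Dataset, source, and destination names are illustrative.
MACRO_MAP = {
    "contacts": {
        "sources": ["Shopify", "Newsletter signups (Typeform)"],
        "destinations": ["Email tool", "Data warehouse"],
    },
    "orders": {
        "sources": ["Shopify"],
        "destinations": ["Data warehouse"],
    },
    "order_items": {
        "sources": ["Shopify"],
        "destinations": ["Data warehouse"],
    },
}
```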

6. Specify the target data model of each dataset, and the links between the datasets


Once you have reviewed the first macro map, you can specify the data model of each dataset in more detail, and also anticipate how you will link the datasets to one another.
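
As a starting point, the target data model and the links between datasets can be noted as in the sketch below. Column names, types, and keys are assumptions for the example; adapt them to your own model.

```python
# Target data model per dataset, plus the keys that link datasets together.
# Column names and types are assumptions for the example.
DATA_MODELS = {
    "contacts": {
        "columns": {"customer_id": "string", "email": "string", "country": "string"},
        "primary_key": "customer_id",
    },
    "orders": {
        "columns": {"order_id": "string", "customer_id": "string", "total": "decimal"},
        "primary_key": "order_id",
        "links": {"customer_id": "contacts.customer_id"},   # each order belongs to a contact
    },
    "order_items": {
        "columns": {"order_id": "string", "product_id": "string", "quantity": "integer"},
        "primary_key": ("order_id", "product_id"),
        "links": {
            "order_id": "orders.order_id",         # each item belongs to an order
            "product_id": "products.product_id",   # each item refers to a product
        },
    },
}
```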