4 things a Data Warehouse alone won't solve
Or how to save 9 months of missed €$ opportunities💸
With a data warehouse, my life would be easier.
As a Growth and Marketing leader, you just want to access clean customer data to personalize your go-to-market actions, and grow revenues.
That's what you think a data warehouse will help you with. Unfortunately, it's not as straightforward as it seems to be.
Here is what I wish I had known earlier about this topic, that would have made me save months of work, and corresponding opportunity cost.
TL;DR:
A data warehouse is like a warehouse: 4 bare walls within which you store things, that happen to be data.
Growth teams often confuse it with a cosy centrally-located apartment, with heating, running water, high-speed wifi and a fully stacked fridge.
I've seen teams waiting for 9 months for a data warehouse in the hope of implementing business actions, based on real-time info (e.g., as soon as a lead fills a contact form, a sales calls her back), only to see their hopes (and revenue KPIs) crushed by the reality.
👋 💸 9 months of revenue opportunity cost is a lot!
In a nutshell: data warehouses have been designed to enable data analysis on past events (passive and aggregated approach), not to trigger actions based on user profiles (active and personalized approach), let alone in a 'real-time' fashion.
Here are 4 things to keep in mind and key questions every Growth and Marketing leader should ask her engineering team before advocating for a data warehouse:
1/ Data warehouses are not natively sync'ed to application tools
(e.g., CRMs, emailing)
As said earlier, a data warehouse is a warehouse located in place A.
Your application tools: CRMs, emailing, ads tools for instance, are in other places: B, C, D.
There is no pre-existing routes or bridges linking A to B, A to C, A to D.
Meaning:
If Jenifer (a lead) signs up to an Enterprise plan…
…this information will be logged in the Database (production table in the data warehouse),
…but not necessarily sent to 'outbound sales tools', such as Lemlist or Salesloft.
Why should you care?
Jenifer might keep receiving sales email or calls, and being targeted by ads. Part of your ads budget will be wasted on an existing customer, plus you might alienate her.
⚠️ Questions Growth and Marketing teams should ask to Engineering teams:
How are you going to connect the data warehouse to application tools? (e.g., CRMs, Emailing, Ads tools)
Once again, prioritize:
The tools you need to connect to the data warehouse according to your business needs: maybe the Sales CRM first, and then the Ads platforms?
The specific data (which properties or events) that need to be sent from the data warehouse to these tools
Tip: avoid saying: ‘all data and all tools’, it's usually not that useful and might get your engineering team super frustrated if their work isn't put to good use afterwards!
2/ Customer data in the data warehouse is just stored in silos, not unified into a single customer view
What the heck does it mean?
Have you ever had the feeling that the customer service you were talking to had no context about you as a client: your previous complaints, your tenure, the products you used?
In other words, they had no clue who you were.
This is because customer data is usually stored in silos, defined by their sources (the tools they come from, each tool = 1 silo):
Let's say your name is Jenifer, and you're the head of Ops at SUPERCAT Company.
You might have spoken to a Sales person from CYBERDOG (a cybersecurity company) at an event that logs your data in their Sales CRM: Hubspot for instance
You have then received emails sent by their outbound marketing team through Lemlist for instance.
You have also chatted with CYBERDOG’s community manager at a webinar, and the conversation is logged in their webinar tool: e.g., Livestorm
You decided to start a free trial, and received emails sent by their product marketing team through Customer.io
You finally talk to CYBERDOG customer service, but their team only uses Zendesk, and can't access a timeline showing your previous touchpoints with their teams.
You're super frustrated.
Also, at the same time, the CEO of CYBERDOG has touchbased with the CEO of SUPERCAT (your boss).
- CYBERDOG CEO has tagged SUPERCAT as a 'high potential, please provide priority service' lead in the Sales CRM (Hubspot).
- CYBERDOG customer Service had no means to make the link between you and your CEO in Zendesk, because data was siloed (Zendesk <> Hubspot).
- They wish they had a 'company profile' (SUPERCAT) in which touchpoints from different 'users' were stored (your touchpoints: Jenifer, head of Ops, and your CEO’s).
Although unification (also called 'identity resolution') does seem trivial to understand, it is complex to implement as:
There are many sources of data (each tool is a source of data, and a destination)
Especially in B2B, you need an 'Account' view (based on a Company level), and a 'User' view (based on physical persons' level). For instance for Slack:
In general, 1 Slack account has several users: CYBERDOG is an account, each team member is a user on Slack
But a user can be linked to several accounts: if you have several Slack accounts (and I bet you have), you're in this case
It's not always obvious which id should be used for the deduplication of profiles:
E.g.,
- Anonymous users
- People who download your mobile app but never logged in
- People who interact with their business email, and personal email (via social media) at the same timeRules for identity resolution should be defined by business rules, and then implemented by data/tech teams
⚠️ Questions Growth and Marketing teams should ask to Engineering teams:
Are you able to create a single customer view across data tables / tools?
The best way to work on this with Engineers, is to take concrete examples: for instance the journey described for Jenifer at CYBERDOG.
You'll be able to be more specific when describing your needs, which will be highly appreciated by your engineering team.
What processes in my team should I change to make sure you can unify data?
If the way your company collects data is through manual input (e.g., a Sales person logs informations about a call with a lead), and that manual input is not consistent (or does not exist), there will be little chance a data team or a tool (even the ones that say they 'automagically' ✨ solve your issues) can unify data.
Tip: at the risk of seeming ‘obvious’: this is team work here.
The more precise you are at specifying your needs and disciplined when it comes to manual inputs (in the CRM), the better your tech team will be at performing identity resolution.
3/ Data warehouses are generally refreshed only ONCE a day
Data warehouses were initially created and designed to analyze product data, not to take actions on it.
Updates of data tables require extracting data from their sources (a production database, a CRM such as Hubspot or Salesforce), loading it in the data warehouse, and potentially transforming it so that it is easily reusable.
It's quite a process, and that's why data warehouses are usually refreshed only once a day.
But what if you were counting on the data warehouse to drive quick 'win back campaigns', or 'sales actions'?
For instance, when a newly acquired client drops during onboarding, would you not like to know in the minute, to take actions: send an email, have a Customer Success person call them back? Rather than wait for the next data warehouse update: in potentially 24 hours?
⚠️ Question Growth and Marketing teams should ask to Engineering teams:
How often will data be refreshed in the data warehouse? Have you designed it to be a :
(i) Reporting data warehouse, enabling the analysis of past events, in a passive and aggregated approach?
-> looking at dashboards to determine past trends among cohorts
or a
(ii) 'Production' data warehouse, aiming at taking real-time actions based on users’ individual and specific behavior?
-> as soon as a user complies with a set of conditions (e.g., completes sign-up but does not activate), trigger a messaging/ads display/calling campaign, until its state changes (e.g., becomes active)Even better, prepare a list of data that would ideally need to be refreshed more often, and prioritize your needs:
For high-level reporting, once per day may be enough
For sales sensitive data points, every minute or every 10 minutes might be better
Tip: once again, back to basics.
Don’t jump to a potential technical solution (your tech team will know better and honestly might be a little annoyed), but work on formulating your needs, challenging them: why do I need a real-time refresh? Would every hour actually be enough? Will I really trigger actions on all these data points, or can I use only one as a proxy?
4/ Analyzing data from a data warehouse requires BI / Tech skills
Imagine that you’d like perform an analysis on users who signed up on CYBERDOG following the christmas campaign in Berlin in December 2020.
This data must be somewhere in the data warehouse, right?
A data warehouse layers on top of a large amount of databases.
Think of a huge warehouse with a lot of stocks (data tables). The analysis you’re asking for is like preparing an order in a huge Amazon warehouse.
Let’s take this comparison further:
You need to pick different products in your Amazon warehouse
-> several data points in a data warehouseThese products are in multiple aisles
-> different data tables in a data warehouseYou need to have a map, for orientation around the aisles
-> data catalogs, documentation, and a good sense of orientation in a data warehouseYou might need a cart, to ride around the warehouse (it’s huge)
-> a specific tool
As a result, you'll probably not be able to navigate and analyze data in a warehouse without the help of data engineers/scientists, unless you have a specific ‘support system’: tooling, documentation, etc.
⚠️ Question Growth and Marketing teams should ask to Engineering teams:
How are we going to access data within the data warehouse without going through the tech team?
A few points to check:
- What requests will I be able to perform without data engineering skills or advanced SQL skills (unless you’re advanced in SQL)?
- How can I understand the data tables structure? Is there a documentation I can access?
- If I want to perform an advanced request, how should I specify my need, and who on the data team can help me? Are there specific ressources allocated to my needs, or each of my requests will compete with everything else the data team has on its plate?Tip: A lot of vendors will claim they automagically 💫 solve this problem from A to Z. The truth is you need 3 things to solve this problem: a BI tool, proper documentation, cross-team collaboration/modus operandi.
If you’d like to benchmark different tools or org design, let me know, happy to help!
Want to go further?
Book a personalized 25-min ‘data for business’ coaching session with Lago team. What we usually help on: your tool stack, tracking plan, team organization, recruitment.
Join our invite-only community: comment this post, and I’ll follow-up!
Have you enjoyed this article? Please share the love 💝
See you next Thursday! 👇
I’ll talk about getting your tracking plan right.
There are many things you can’t control in life (and regarding data), but a proper tracking plan is definitely in your hands, and can have a huge impact.
N.B.: Photo by Sikai Gu on Unsplash
4 things a Data Warehouse alone won't solve
Love the use cases and real life situations. Interested to follow the Lago project through the community!
Thanks for the great post. Interested in joining the community!