Whether you call it a Data Warehouse, data platform, data hub or data management system, you can be sure that most businesses have some kind of platform that centralizes data for analytical purposes. For this post, I’ll use the term Data Warehouse.

A business has two ways of obtaining a Data Warehouse:

  1. Build one in-house
  2. Have external developers build one

This post will be about the second option, where external developers build a Data Warehouse for a business. It doesn’t matter if it’s built from scratch, using a semifinished product or a ready off-the-shelf solution. I’m going to talk about the difficulty selling it from the external (consulting) side, the difficult buying it from the customer side, and proposing a solution in the form of a “Data Warehouse Configurator” modeled after Car Configurators from various car manufacturers.

Data is everywhere, at all kinds and sizes of businesses. Recently I’ve found a lot of pleasure in helping smaller companies build their first Data Warehouse. This post is written from the perspective of selling Data Warehouse projects to smaller prospects, think less than 200 days of effort each. The larger projects are disappearing, because the big guys are able to build in-house. Knowledge and skills are easier to obtain than before, with the coming of easier tools and education providers providing data tracks.

Difficulty selling: The Data Warehouse proposition

A typical project to build a Data Warehouse at a customer starts with extracting and analyzing a few source systems. For example, those that hold financial data and employee data. For these source system there will be built:

  • An extraction mechanism to retrieve data from the source system
  • A transformation layer, to parse the data and transform it into usable information
  • A visualization layer, to where the charts and graphs are made

Most of the work is technical, which has no real business impact. That’s because it doesn’t help the customer understand their business, and doesn’t improve the bottom line. The cliché is to use a picture of an iceberg where the useful graphs and figures are above the water, and the rest of the Data Warehouse is ⇩ expandableunderwater. This iceberg image means that most of the time and effort that is spent on building a Data Warehouse is completely invisible to the business users, but it needs to happen before useful graphs and figures can be shown.

This is a conundrum. If most of the work is technical (about 70%), and the technical things hide underwater, how can all the technical work be sold to the customer that just wants the tip of the iceberg? Certain things can be explained, see The Confusing Technical Details later, however there’s still this big chunk of work that has to be billed that the customer can hardly make sense of. This is often solved by talking the customer through many slides of the company’s experience, trustworthiness, partnership promises and previous customer cases. The customer is promised beautiful outcomes, if they would only trust us and sign the contract.

In practice, you’ll see the lowest bidder winning the contract. If you can’t see and compare the differences, it doesn’t matter much who you pick.

Difficulty buying: The confusing technical details

The Data Warehouse is a complex thing for those uninitiated, and experts tend to either explain too much or too little. In my quotes I work with a calculation sheet where I put in all the parameters of a project, such as:

  • The type of source system
  • The number of ⇩ expandableentities
  • The estimated complexity of an extraction
  • The estimated effort needed for the transformations
  • The estimated effort needed for building the reports

Think of an Excel sheet with about fifty rows of items with an estimate in hours and days, with a few detail sheets for working out the estimate for extracting source systems and building the transformations.

The problem I regularly face is that the breakdown structure is too technical. Customers are not interested in how much time building a “slowly changing dimension” costs, they want to see historic values. Customers are not interested in ⇩ expandabledelta loads, instead they want to reduce the runtime and cost of a load job. They don’t care about tables and facts and dimensions, they care about data made actionable through pretty dashboards.

For each feature I should list the price, instead of just breaking down the work in parts.

Difficulty buying: The labor and pricing

Some things are complex to do, and other things are easy to do. Extracting from a common ⇩ expandableRDBMS source system, such as a Microsoft/Azure SQL Server, Oracle Database or PostgreSQL, is relatively straightforward. However, extracting data from an undocumented API, which is rate-limited and needs additional software or licenses to connect, is much more complicated. If the customer is told something along the lines of “the simple source A costs €100 to extract, the complex source B costs €10,000 to extract”, they might not find the data worth the effort of extracting.

The same goes for other requirements. For example, the customer comes up with a certain use-case: When they enter something into their finance application, they want it available in their Data Warehouse right away. There are several options:

  • To quote a real-time architecture, with a higher project budget and higher resource usage cost
  • To quote a batch-driven architecture, and advising the customer that they don’t need real-time
  • To quote a real-time architecture, and a batch-driven architecture and make the customer make a decision based on cost

How can I show customers what the impact would be of each addition or reduction in their requirements, without being the man in the middle? Making adjustments in calculations are time-consuming. I want to give my (potential) customers the tools to find what kind of Data Warehouse works best for them, and what the cost would be. I want to empower the customer to make their own choices.

So I asked myself, where does one spend a 50-150k on a complex machine, self configured to their specific needs? Well, a car of course. Additionally, I’m sure that most business decision makers have experienced the purchase of a car, so it should be a familiar experience. The picking of a model, the balancing of options and budget, and finally ordering the car and waiting for the delivery day.

For each feature I should list the price, and allow the customer to add or remove it to a calculation dynamically.

Reviewing prior art: the Car Configurator

Cars are expensive, complex things. When purchasing a car, customers are gradually guided into picking a model, a version, and set of options. The customers are given the tools to “build” their own cars via the website. Let’s take a look at the car manufacturer Volvo:

First image

In this model overview the most popular cars are shown first. Currently, those are the hybrid models. The cars are categorized by type, such as SUV, Estate and Sedan. Most customers have a specific preference and already know what type they want.

When I pick the XC40 I get redirected to a page with general information about the car. It’s time to sell the model, Volvo does that by having pretty pictures of the car, then talking about why this model is a good fit for the customer. They write:

For every you Smart. Versatile. Expressive. Meet our pure electric compact SUV – for every version of you.

Lesson learned: Show the different models, and give for each model a high over description.

First image

Car battery technology is being improved upon rapidly. I’m sure they could put a lot of technobabble on the page about how their batteries are the best, but Volvo keeps in mind that it’s not about what’s inside the battery, it’s all about what it enables customers to do: drive more.

Fewer charging minutes, more miles Our next generation electric powertrains increase the capacity of your XC40 Recharge, giving you longer range and quicker DC fast charging.

Lesson learned: talk more about what it does for the customer instead of what’s inside it.

Further down the page there are the variants of the model. Volvo chooses to divide them in Essential, Core, Plus and Ultimate. The essential is a good car, but the Core is better, and ultimately there is the Ultimate level that includes all the previous features and some more.

First image

It seems to me that it helps customers identify the things they might require of a car. Mind you, all the items are clickable to provide details if you’re unsure what an option does. Throughout the whole website all option titles are clickable for additional information in understandable language.

Lesson learned: Give understandable descriptions of functionality, but only when asked

First image

Deciding on the “Plus” level, I click on the “Design Yours” button. The final choices are:

  • A review of the variant with the possibility to change it
  • A color palette to choose from
  • The choice of rims
  • The interior options
  • And additional packages such as rooftop boxes and bike holders

Lesson learned: Give the opportunity to add various options to personalize the product

BMW offers a ⇩ expandablesimilar experience. I haven’t checked all manufacturers, but I’m pretty sure it’s a common mechanic. Let’s try to apply this mechanic to a Data Warehouse

The Data Warehouse Configurator

Car manufacturers are able to provide a lot of variations and options, to cater to different needs and requirements. Customers apparently don’t mind paying for options, as long as the price and impact is clear. They might not need all the options that are available, but then they won’t build them in. I’m going to apply this model to a Data Warehouse.

The different models

For the purpose of this blog post, I’ve created four models of Data Warehouses. I’ll use the abbreviation DWH from time to time. In this table I’ve put high level information about each model:

  Small DWH Medium DWH Large DWH Sporty
Model name Groceree Compacto Familii Zoofs
Type Entry Starter Allround Performance
Customization None Little Standard Standard
Business results Near-instant Few weeks Few months Many months
Refresh capacity Daily loads Multi-daily Every 15 minutes (near) real-time
Integrations Standard systems Standard systems + few custom Standard + many custom Standard + many custom + real time
Complexity Low Low-Medium Medium High
0-100 12s 9.5s 8s 3.9s
External developers 1 3 7 17+
Cost 10-25k 25-50k 50-200k 200k+

Whereas most customers come from ⇩ expandableExcelytics, the Goveree or Compacto offering might be a good step up from their existing situation. It would already be a big improvement over doing things manually. It has no use researching the capabilities of the Sporty model if it’s not something that fits the budget and use case.

Detailed model information

If we click through to the detail pages of each model, we find various texts that describe the model in more information. It has to answer the questions such as:

  • Why is this model good for my use case?
  • What can I do with this model?
  • What are the additional options available for my model?

Small DWH: Groceree

The Groceree DWH is an entry level Data Warehouse proposition. It offers a streamlined process, a standard Data Warehouse, and preconfigured reports delivered in a very short time-to-market. Within a week you can expect daily reports on the most popular Finance, HR and ERP systems. It empowers business users to make good decisions fast, with a minimal investment.

It is especially a good fit for customers that either have no existing analytics capability, or are currently using Excel or other business user tools to gain insight in their business processes.

Your employees will be trained in the use of standard reports, both how to look at them and how to interpret the results.

Medium DWH: Compacto

The Compacto DWH is a perfect vehicle for those just starting out with analytics, but require more data loads and more source systems to be connected. It provides the ability to connect to some custom source systems. This model is perfect for customers who want to start out, but a bit more customized.

Due to the ability to customize the platform and reports, the Data Warehouse can dynamically adapt to your business needs.

Large DWH: Familii

The Familii DWH is bigger, faster and more connected. There are no barriers and no limits. Connect all your systems, integrate all your data and create a centralized Data Warehouse for your whole business, both business users and other systems.

The Familii is built to remain, offering advanced Data Quality Checks, pro-active monitoring and support, Advanced Analytics and capabilities to support Machine Learning use cases.

Sporty DWH: Zoofs

The Zoofs is the state-of-the-art Data Warehouse. It’s the most advanced offering, giving the business all the tools to support high performance business requirements, fast.

At this tier (near) real-time data processing is supported, which is perfect when you need to know everything, now. It can process enormous volumes of data, and support thousands of consumers.

The model variant tiers

The model variant tiers preselect options. Generally there are three or four tiers, getting pricier and fancier as it goes up. In the case of Volvo this is “Core, Plus and Ultimate”. Here the game is to play with the option sets and seduce the customer to pick one tier higher.

Sure you can pick the sunroof for €2000, but if you pick the Ultimate you pay €2500, and then you have not only the sunroof but the LED Headlights as well!

  Core Plus (Core, with additionally) Ultimate (Plus, with additionally)
Small/ Groceree Basic Foundation Additional Report Mobile/PDA Reports, Support Prepaid (1x)
Medium/ Compacto Basic Foundation Historical Data Data Quality Checks
Large/ Familii Premium Foundation Historical Data Change Data Capture, Data Quality Checks
Sporty/ Zoofs Premium Foundation Historical Data, Change Data Capture, Data Quality Checks Bring Your Own Cloud

Examples of additional options

The options are additional functionalities the customer can purchase to customize the proposition to their needs. It is very important to avoid all technobabble in the descriptions of the options. Try to use terms and descriptions that business people understand, so they can properly judge whether they need the functionality.

For example:

Historical data

Source systems capture states of processes. These states can be captured in the Data Warehouse to be analyzed or reported upon. An indicator can be built for various situations, such as when a status change took too long.

Industry Examples

E-Commerce

Customers place orders on the website. An order has multiple states, such as placed, processing, shipped and returned. When a sales order status changes, the event can be recorded in the Data Warehouse. This allows us to recognize and see situations where the processing time took longer than usual. This allows business users to check and improve the processes.

Ticketing

Ticketing systems allows users or customers to make requests. These requests contain information about the request, when it was made and by whom, but also the status of the request. By keeping history of the status changes, issues and inefficiencies in the ticket handling process can be found.

Requirements

In order to be able to process historical data, the unique records in a table need to be found. Ask your technical department whether for the information you’d like to keep track of, whether there is a unique data field or combination of data fields in the specific table(s).

Costs

The cost of enabling Historical data is a rate of €100 per table entity. If a source system contains 10 tables, then the cost will be 10x€100: €1000.

Change Data Capture

When a source system is extracted, all the records are retrieved. As source systems will grow over time, this can eventually take a long time. By implementing Change Data Capture, only the data that has been added or modified will be retrieved. This reduces load-times and associated costs. Large tables can be extracted in minutes instead of hours.

Industry Examples

Education

An education institute started recording data in their student information system since 1994. Every day the teachers mark the presence of students in their classroom into the system. Currently, there are about 84 million entries in the presence registration table. Extraction of this table takes four hours. By enabling Change Data Capture the education institute reduced the load time to 20 minutes. This does not only save them time, but also reduced load costs due to shorter running sessions.

Requirements

In order to implement Change Data Capture, it needs to be possible to recognize which records have been added and modified. Ask your technical department whether in the source system the following data fields exist in your source table:

  • A data field that indicates when a field has been modified (such as, an update date, modified date)
  • (Optional, beneficial) A data field that indicates when a field has been added (such as, an added date, create date)
  • (Optional, beneficial) A data field that indicates whether a field has been deleted

Additionally, the unique records in a table need to be found. Ask your technical department whether for the information you’d like to keep track of, whether there is a unique data field or combination of data fields in the specific table(s).

Costs

The cost of Change Data Capture is at a rate of €100 per table entity. If a source system contains 10 tables, then the cost will be 10x€100: €1000.

Data Quality Checks

All businesses struggle with Data Quality issues. Data Quality issues come in many forms, such as mistyped information, incorrect classifications and missing values. With the Data Quality Checks you have the ability to enter rules that will be tested over ranges of data. Daily a report will be generated to show the state of the data.

Industry Examples

Food industry

There have been complaints about products being sold without proper allergen information in the product description on the website. The costs are high, due to returns, alternate products that have to be provided and of course the reputation damage. By implementing Data Quality Checks it’s possible to find missing product information, reducing costs and frustration.

Requirements

In order to implement Data Quality Checks, the customer needs to provide the rules that have to be tested for.

Costs

The cost of Data Quality Checks Framework is €1899, with an option for an €799 4-hour workshop for building the first Data Quality Check rules.

Support Prepaid

Purchase support with the prepaid model. Get all the help you need, without having a subscription running when you don’t need it.

Conditions

Each prepaid block consists of 32 hours of support. Each support event decreases the prepaid balance with a minimum of 4 hours. Each prepaid block is valid for up to 12 months.

Pricing

Each prepaid block cost €3999

The following options are left to the reader as an exercise:

  • Basic Foundation
  • Premium Foundation
  • Data Governance Tools
  • Mobile/PDA reports
  • Bring your own cloud
  • (Near) Real-Time
  • Additional Report
  • Additional Source System
  • Support Business Hours Subscription

Summary

While the options and pricing are just fictive, it does give an idea on how to approach the selling of a Data Warehouse in a nontraditional way. It would give transparency, flexibility and the tools to customers in the areas where it’s often neglected. By looking at established techniques of car manufacturers, inspiration can be drawn in order to sell something else which is similarly complex and expandable, in a straightforward and easy to navigate structure.

The next step is to actually build a Data Warehouse Configurator as a website. Something that end-users can visit and play around with, and to put actual, real prices there. This way customers can think, consider and compare with other Data Warehouse vendors. I hope to find some good examples in the future.

Final thoughts

Things not covered (but are worth listing) are:

  • Consider the dependencies between options (can’t have real-time without the premium foundation)
  • Consider that not all models support all options (the basic models don’t support the premium foundation)
  • Consider that your platform will be a little more product-shaped
  • Consider that upgrades can be sold separately, later as well. This helps customers to space out investments
  • Consider the possibility of automatically converting the quote into a project plan
  • Consider the automatic delivery estimation based on availability