Written by: David Carneal – Digital Efficiency Consulting Group – DECG
Read Time: 8 min
Why are we arguing about the same KPI again?
Picture a leadership meeting where everyone shows up with the same agenda and three different “truths.” Sales has a dashboard. Ops has a dashboard. Finance has a dashboard. The KPI on the slide is the same name in all three places, and somehow it has three different values.
Then the spreadsheet gets summoned like a wizard’s staff. Someone says, “Hold on, I have the export.” Another person replies, “That export is from last Tuesday.” And a third person opens a file named final_v9_REALfinal.xlsx like it’s an ancient artifact you’re not allowed to touch without gloves.
This article is about making that argument boring. Not “mysterious and dramatic.” Boring. Because when your numbers are boringly consistent, your decisions get a whole lot more exciting (in the good way).
- Stop dashboard debates by making numbers explainable and reproducible.
- Separate raw data from standardized data from meaning from presentation.
- Build a habit where “where did this number come from?” has a one-minute answer.
Why data integrity becomes everyone’s problem
Data integrity sounds like a job title (“Integrity Manager, reporting to the Spreadsheet Council”). In reality, it’s a business problem that shows up as wasted time, slow decisions, and quiet distrust.
When different teams pull “the truth” from different places, you don’t just get different reports. You get different realities. And now every meeting includes a side quest: reconcile the number before you can discuss the number.
- Multiple source systems and multiple extracts create multiple versions of reality.
- Manual exports multiply quickly: export, re-export, “fix,” and re-upload.
- Meetings become forensic labs: “Where did this number come from?”
- People stop trusting dashboards and start trusting whoever yells the loudest.
What data integrity actually means in business terms
Plain-English definition: you can explain the number and reproduce it, every time, without a heroic effort.
Integrity isn’t a single control; it’s a chain. If any link is weak, your KPI turns into a rumor with a logo.
The integrity chain (and where it breaks)
If you can’t trace a KPI back through that chain, you don’t have integrity. You have vibes.
- Source systems (ERP, CRM, WMS, service tools, finance)
- Ingestion (how data lands in your environment)
- Storage (raw preservation vs curated structures)
- Transformation (standardizing, mapping, and calculating)
- Consumption (dashboards, reports, exports, and “quick” pivots)
What a data farm is (and what it isn’t)
A data farm is a governed, layered pattern for handling data on purpose. It has fences, labels, gates, and rules. It’s designed so that traceability is normal, not a detective story.
A data farm is not a shared drive named “Data.” It’s not a folder that looks like an abandoned storage unit. And it is definitely not “whatever happens when someone with admin access gets bored on a Friday.”
The goal is simple: make “where the number came from” boringly easy to answer.
- Raw stays raw (you keep the receipts).
- Standardized data makes entities look the same everywhere.
- A semantic layer defines meaning and KPIs with adult supervision.
- Dashboards become the farmers market, not the kitchen sink.
Data farm vs data lake vs warehouse vs lakehouse
Welcome to the buzzword petting zoo. These patterns can all work, but they fail in very predictable ways when governance is missing.
- Data lake: Big and flexible. Great for raw storage, and also great at becoming a swamp without rules.
- Data warehouse: Curated and structured. Analytics-first. Usually stricter on schema and access.
- Lakehouse: Tries to be both. Sometimes brilliant, sometimes a costume party with invoices.
- Data farm (pattern): A practical way to separate zones on purpose: raw → standardized → meaning → dashboards.
The layers of a data farm (from dirt to dashboards)
Think of the farm as zones with different rules. Each layer has a job. If you skip layers, you usually end up rebuilding them later while pretending it was “a planned refactor.”
Layer 1: The Raw Field (copy it exactly, don’t “help” it)
Purpose: land data as-received, timestamped, preserved. This is your audit breadcrumb trail.
Rule: humans don’t write here. Service accounts do. You keep source IDs and codes intact because they’re the receipts.
- Examples of source systems (high-level menu):
  - ERP: NetSuite, Microsoft Dynamics 365, SAP, Epicor
  - CRM: Salesforce, HubSpot
  - WMS: Manhattan, Blue Yonder, Körber/HighJump
  - Service: ServiceNow, Zendesk, Jira Service Management
  - Billing/Finance: QuickBooks, Sage Intacct, Stripe
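To make “humans don’t write here” concrete, here is a minimal Python sketch of a raw landing step. The function name, folder layout, and metadata fields are illustrative choices, not a prescribed standard; the point is that the payload is stored byte-for-byte, timestamped, and checksummed the moment it arrives.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path

def land_raw_file(source_system: str, payload: bytes, landing_root: Path) -> dict:
    """Land a payload as-received: timestamped path, no edits, checksum recorded."""
    received_at = datetime.now(timezone.utc)
    # Partition by source system and arrival date so replays and audits are easy.
    target_dir = landing_root / source_system / received_at.strftime("%Y/%m/%d")
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{received_at.strftime('%H%M%S%f')}.raw"
    target.write_bytes(payload)  # byte-for-byte copy; nobody "helps" the data
    return {
        "path": str(target),
        "received_at": received_at.isoformat(),
        "sha256": hashlib.sha256(payload).hexdigest(),  # the receipt
        "bytes": len(payload),
    }
```

The returned metadata record is what makes the raw zone an audit trail instead of a junk drawer: the checksum proves the file hasn’t been “fixed” since it landed.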
Ingestion (boring is the goal)
If ingestion is exciting, your quarter is about to get spicy. You want repeatable, logged, monitored, and recoverable.
Treat file drops as untrusted input: validate, log, quarantine when needed.
- Common approaches (overview only):
  - API pulls on schedules
  - CDC/replication from databases
  - SFTP drops (with strict controls and validation)
  - Managed connectors (the “plumbing subscription” vendors)
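“Treat file drops as untrusted input” can be as simple as a gate that routes each file to accept or quarantine with reasons attached. This is a sketch assuming a hypothetical CSV drop with a required column set; your schema and rules will differ.

```python
import csv
import io

# Hypothetical required schema for an inbound orders file.
REQUIRED_COLUMNS = {"order_id", "customer_id", "order_date"}

def validate_drop(raw_csv: str) -> dict:
    """Validate an inbound CSV drop and route it to accept or quarantine."""
    reasons = []
    reader = csv.DictReader(io.StringIO(raw_csv))
    header = set(reader.fieldnames or [])
    missing = REQUIRED_COLUMNS - header
    if missing:
        reasons.append(f"missing columns: {sorted(missing)}")
    rows = list(reader)
    if not rows:
        reasons.append("file is empty")
    # Quarantined files are logged and held for review, never silently loaded.
    return {"status": "quarantine" if reasons else "accept",
            "rows": len(rows), "reasons": reasons}
```

Boring on purpose: every drop gets the same checks, every failure gets a named reason, and nothing sneaks into the warehouse through a side door.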
Layer 2: The Universal Packing House (standardize the chaos)
Purpose: make the same entity look the same everywhere. This is where you stop arguing about what “Won” means.
- Universal format themes:
  - Date/time standards and timezone strategy
  - Shared identifiers (customer, product/SKU, location)
  - Normalized statuses (“WIN”, “Won”, “✅” become one governed status)
  - Mapped codes (GL, departments, reason codes, shipment statuses)
  - Predictable EDI translation (850/856/810 stop being secret runes)
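Two of those themes, governed statuses and a timezone strategy, can be sketched in a few lines. The mapping table below is a made-up example; the real one would be owned, versioned, and reviewed like any other transformation.

```python
from datetime import datetime, timezone

# Hypothetical governed mapping: every source spelling collapses to one status.
STATUS_MAP = {"WIN": "won", "Won": "won", "won": "won", "✅": "won",
              "LOST": "lost", "Closed Lost": "lost"}

def standardize_status(raw_status: str) -> str:
    """Map a source-specific status to the one governed value; fail loudly on unknowns."""
    try:
        return STATUS_MAP[raw_status.strip()]
    except KeyError:
        # Unknown codes go to review instead of silently slipping through.
        raise ValueError(f"unmapped status: {raw_status!r}")

def standardize_timestamp(ts: datetime) -> str:
    """Store every timestamp in UTC, ISO-8601, so dashboards agree on 'yesterday'."""
    if ts.tzinfo is None:
        raise ValueError("naive timestamp: source must declare its timezone")
    return ts.astimezone(timezone.utc).isoformat()
```

Note the failure behavior: an unmapped status is an error, not a pass-through. That single design choice is what stops “✅” from quietly becoming its own category in next quarter’s pivot.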
Layer 3: The Semantic Layer (meaning, not just rows)
This is what BI tools should touch: curated, documented, consistent. It’s where KPIs stop being personal opinions.
- Common outputs:
  - Conformed dimensions (Customers, Products, Dates)
  - Fact tables (Orders, Shipments, Invoices, Tickets)
  - Documented KPI logic with an owner (adult supervision included)
- Where logic can live (high-level examples):
  - Governed SQL models
  - Versioned transformation projects
  - Certified semantic models
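“Documented KPI logic with an owner” is easier to picture as a structure than as a slogan. Here is one illustrative shape for it in Python; the KPI, the owner, and the formula are all hypothetical, and in practice this often lives in a governed SQL model rather than application code.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class KpiDefinition:
    """One governed KPI: a plain-English definition, an owner, one shared formula."""
    name: str
    owner: str
    definition: str
    compute: Callable[[list], float]

# Hypothetical example: every dashboard calls this one formula, not its own copy.
win_rate = KpiDefinition(
    name="win_rate",
    owner="VP Sales",
    definition="Won opportunities divided by all closed opportunities, this period.",
    compute=lambda opps: (
        sum(1 for o in opps if o["status"] == "won")
        / max(1, sum(1 for o in opps if o["status"] in ("won", "lost")))
    ),
)
```

The important part is not the dataclass; it’s that the definition, the owner, and the logic travel together, so “what does win rate mean?” has exactly one answer and one accountable human.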
Layer 4: Dashboards (the farmers market, not the kitchen sink)
Dashboards are where stakeholders shop for decisions. They should pull from certified datasets, not directly from source systems.
- BI tool examples: Power BI, Tableau, Looker, Qlik, Sigma
- Governance pattern:
  - Executive dashboards pull from certified/endorsed datasets.
  - Self-service is allowed, but “custom” needs a label and guardrails.
Lock it down: tamper resistance and auditability
If your raw zone is editable by humans, someone will “fix” something. It might even be well-intentioned. It will still break traceability.
You want the ability to answer: who changed what, when, and why? Without needing to interview half the company.
- Access control and separation of duties (raw is protected, curated is certified).
- Versioning and review for transformations (no silent KPI rewrites).
- Clear ownership for KPI definitions and datasets.
- Logs and monitoring so issues are found by alerts, not angry meetings.
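One cheap way to make hand-edits to the raw zone detectable is a checksum snapshot: record a hash per file, then compare snapshots later. This is a minimal sketch of the idea, not a substitute for real access controls and platform audit logs.

```python
import hashlib
from pathlib import Path

def snapshot_checksums(raw_root: Path) -> dict:
    """Record a checksum per raw file so later edits are detectable."""
    return {str(p.relative_to(raw_root)): hashlib.sha256(p.read_bytes()).hexdigest()
            for p in sorted(raw_root.rglob("*")) if p.is_file()}

def detect_tampering(before: dict, after: dict) -> list:
    """Compare two snapshots and name every file that changed or vanished."""
    findings = []
    for path, digest in before.items():
        if path not in after:
            findings.append(f"deleted: {path}")
        elif after[path] != digest:
            findings.append(f"modified: {path}")
    return findings
```

Run the snapshot on a schedule and alert on any findings, and “someone fixed a raw file by hand” becomes a same-day alert instead of a quarter-end mystery.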
Data quality checks: trust, but verify automatically
You don’t need perfection. You need early warning. These checks catch the “yesterday vanished” problems before the quarterly review.
- Baseline checks:
  - Row counts and volume checks (Did yesterday vanish?)
  - Reconciliation checks (billing totals, revenue tie-outs)
  - Referential integrity checks (orphans get flagged)
  - Anomaly detection (spikes, drops, weirdness)
- When something fails:
  - Quarantine bad records instead of quietly blending them in.
  - Route exceptions to a small approval workflow.
  - Fix at the source when possible, then reprocess.
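Two of those baseline checks, volume and referential integrity, fit in a few lines each. The thresholds and field names below are illustrative assumptions; tune them to your own data.

```python
def check_row_volume(today_rows: int, baseline_rows: int, tolerance: float = 0.5) -> list:
    """Flag days where volume swings past the tolerance: the 'did yesterday vanish?' check."""
    issues = []
    if baseline_rows and abs(today_rows - baseline_rows) / baseline_rows > tolerance:
        issues.append(f"volume anomaly: {today_rows} rows vs baseline {baseline_rows}")
    return issues

def check_orphans(orders: list, known_customers: set) -> list:
    """Referential integrity: orders pointing at unknown customers get flagged."""
    return [f"orphan order {o['order_id']}: unknown customer {o['customer_id']}"
            for o in orders if o["customer_id"] not in known_customers]
```

Neither check is sophisticated, and that’s the point: a dumb check that runs every morning beats a clever one that nobody wired up.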
Implementation roadmap (practical, not heroic)
This is how you avoid a two-year “data transformation” that produces exactly one screenshot and a lot of emotional damage.
- Align on definitions: pick 5–10 executive KPIs and define them in plain English.
- Build raw plus automated ingestion for 1–2 systems (start where money meets truth).
- Add the universal format layer so entities and statuses stop shapeshifting.
- Add the semantic layer plus certified datasets (this is where trust is earned).
- Deploy dashboards and governance habits so the system stays sane.
Common failure modes (and how to avoid them)
Most data initiatives fail the same way: the logic lives in someone’s head, the raw zone isn’t protected, and problems are discovered by surprise instead of monitoring.
Here’s a quick scan you can run without opening a single tool.
- Failure modes:
  - KPI logic lives in a spreadsheet (or worse, in someone’s memory).
  - Dashboards connect directly to source systems (fast, fragile, confusing).
  - No monitoring (problems discovered by angry meetings).
  - Raw data gets “fixed” by hand (goodbye traceability).
- Mini checklist (quick scan):
  - An owner exists for each KPI definition.
  - Traceability (source → dashboard) is documented.
  - Raw zone is write-protected for humans.
  - Transforms are versioned and reviewed.
  - Certified datasets feed executive dashboards.
One small step this week
Pick the KPI that causes the most arguments. Trace it end-to-end (source → dashboard). Write the definition in plain English. Assign an owner. Make it official.
Then enjoy your next meeting, when someone asks “where did this number come from?” and the answer takes one minute instead of one week.
If you want a fast way to turn KPI arguments into a governed, repeatable system, start with a single KPI trace and a simple layered plan. Small improvements create real change.
