Somewhere in your organisation, there's a spreadsheet that functions as critical infrastructure.
Maybe it's a store mapping file — the one that connects POS store IDs to inventory system codes so your data warehouse can produce a unified view of a "store." Maybe it's a product category hierarchy that three analysts maintain independently, each with their own version. Maybe it's a currency code list, a cost-centre mapping, or a set of customer tier classifications that someone exported from a source system eighteen months ago and never updated.
You know the one. It lives on SharePoint, or in a shared drive, or — worse — in someone's email. It was supposed to be temporary. It always is.
If you've ever had an ETL job fail at 2am because someone added a new row to a lookup table and forgot to tell anyone, you know this problem. If you've ever had two analysts pull the same report and get different numbers because they're referencing different versions of the product category mapping, you know this problem. If you've ever spent a Friday afternoon trying to figure out which copy of the store hierarchy is the "real" one, you definitely know this problem.
This is the reference data problem, and almost every data team has it. Most just don't call it that.
Nobody sets out to build critical business logic on a spreadsheet. It happens gradually, and it happens for entirely rational reasons.
It starts innocently. A data engineer needs to map store identifiers between two systems — say, the POS system uses Store ID 123 and the inventory system uses Store Code ABC. There's no automated way to link them. So someone exports a list, builds a mapping in Excel, and the ETL pipeline references it. Problem solved.
Except it's not solved. It's deferred.
A month later, a new store opens. Someone updates the spreadsheet. But "someone" is vague — it might be the data engineer, or the operations team, or a business analyst who happens to know about the file. If it's a good day, they update the right copy. If it's a bad day, they update a copy, but not the one the pipeline reads from.
Three months in, the spreadsheet has grown. It's not just store mappings now. Someone added a tab for product categories. Someone else added brand classifications and lifecycle stages — attributes the source POS system doesn't capture, but the analytics team needs for reporting. The file is doing real work. It's also completely ungoverned.
Six months in, the spreadsheet is load-bearing. Multiple pipelines depend on it. Business users update it without telling the engineering team. The engineering team hard-codes references to specific columns and row ranges. Nobody has a clear picture of what changed, when, or why. There's no audit trail. There's no validation. There's just the spreadsheet.
And then someone leaves the company. The person who understood the spreadsheet's logic, who knew which tabs were current and which were legacy, who remembered why column F exists — they're gone. And now the rest of the team is reverse-engineering a spreadsheet that has become, without anyone intending it, the single point of failure for a significant portion of the data platform.
This isn't a failure of discipline. It's a failure of tooling.
There's no natural "home" for this kind of data in most data architectures. Source systems don't manage it — they produce it, but they don't govern it across system boundaries. Data warehouses consume it, but they're not designed to be edited by business users. dbt seed files are a step in the right direction, but they're version-controlled CSVs that live in a developer workflow, not a business-user interface for maintaining classifications and mappings. Custom-built admin pages work until the developer who built them moves to another project, and then they become their own maintenance burden.
So it ends up in Excel. Not because anyone chose Excel as the ideal platform for managing reference data, but because Excel was there, and nothing better was available.
The spreadsheet itself isn't the problem. The problem is what happens downstream when your reference data is unmanaged.
Your reports lie — quietly. Two analysts build dashboards for the same quarterly review. One uses last month's product category mapping. The other uses a version from six months ago that doesn't include the new "Premium" tier. Both dashboards look correct. Both show different numbers. Nobody catches it until the VP of Sales asks why the numbers in one presentation don't match the other. The investigation takes a week. The root cause is a stale lookup table.
This isn't hypothetical. It's the most common symptom of unmanaged reference data, and it erodes trust in the data platform. Once business users lose confidence in the numbers, they start maintaining their own spreadsheets — which compounds the original problem.
Your pipelines are fragile. An ETL job that references a lookup table is only as reliable as the process for updating that table. If the process is "someone edits an Excel file and maybe tells the data team," you have an unmanaged dependency in your pipeline. When a new store code appears and the mapping file doesn't include it, the join fails silently or the pipeline breaks loudly. Either way, you're debugging at 2am because a business user added a row to a spreadsheet at 4pm.
The insidious version of this is the silent failure — the pipeline runs successfully, but produces wrong results because the lookup data is stale or incomplete. Nobody notices until someone downstream builds a report on bad data and makes a decision based on it.
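This failure mode is easy to reproduce. Here's a minimal sketch in pandas — store IDs, regions, and figures are hypothetical — showing a left join against a stale mapping that "succeeds" while silently dropping a new store from every regional total, plus the cheap guardrail that would have caught it:

```python
import pandas as pd

# A stale store mapping: exported before store "S4" opened.
mapping = pd.DataFrame({
    "store_id": ["S1", "S2", "S3"],
    "region": ["North", "North", "South"],
})

sales = pd.DataFrame({
    "store_id": ["S1", "S2", "S3", "S4"],  # S4 is new, unmapped
    "revenue": [100, 200, 150, 300],
})

# The join runs without error — S4 simply gets a NaN region.
enriched = sales.merge(mapping, on="store_id", how="left")

# Revenue by region silently excludes S4's 300 from every named region.
by_region = enriched.groupby("region")["revenue"].sum()

# A cheap guardrail: surface incomplete lookups instead of swallowing them.
unmapped = enriched.loc[enriched["region"].isna(), "store_id"].tolist()
if unmapped:
    print(f"WARNING: unmapped store IDs: {unmapped}")
```

Passing `indicator=True` to `merge` is another way to surface rows that found no match, and `validate="many_to_one"` catches the opposite problem of duplicate keys in the lookup.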
You can't answer "who changed this, and when?" In any governed data environment, you need to know what your reference data looked like at any point in time. Which product categories were active in Q3? When was Store 247 reclassified from "Medium" to "Large"? Who approved the change to the supplier tier mapping?
With a spreadsheet, the answer is usually a combination of "check the SharePoint version history" and "ask Sarah, she might remember." That's not governance. In regulated industries — financial services, insurance, healthcare — it's a compliance risk. But even outside regulated industries, the inability to trace reference data changes makes root-cause analysis of data quality issues dramatically harder.
Knowledge walks out the door. The most dangerous property of spreadsheet-managed reference data is that the logic lives in someone's head, not in the system. The person who built the spreadsheet knows that rows 2–150 are active mappings and rows 151–200 are deprecated. They know that the "Override" column was added for a specific edge case during last year's migration. They know that column G has a formula that references column M, and if you sort the sheet, it breaks.
None of this is documented. When that person leaves — or goes on holiday, or gets reassigned — the institutional knowledge goes with them. The next person inherits a spreadsheet with no context, no documentation, and no way to understand the decisions embedded in it.
So if not Excel, then what?
The answer isn't more discipline or better spreadsheet hygiene. The answer is giving reference data a proper home — infrastructure that's designed for the job, not repurposed from a general-purpose tool.
Here's what well-managed reference data looks like:
It has a single source of truth. One place where the store mapping, the product categories, the supplier codes, and every other piece of lookup data lives. Not three copies on SharePoint. Not a version in the Git repo and another in someone's Downloads folder. One authoritative source that every pipeline, dashboard, and system can reference.
Business users can edit it without engineering involvement. This is where most solutions break down. Data engineers can manage lookup tables in code — dbt seed files, SQL scripts, configuration files in version control. But the people who understand the business logic behind reference data are usually business users: the operations manager who knows the store hierarchy, the merchandiser who defines product categories, the finance analyst who maintains cost-centre mappings. They need an interface they can use directly, without filing a Jira ticket and waiting for a developer to make the change.
Every change has an audit trail. Who changed it, when they changed it, what the previous value was, and (ideally) why they changed it. Not as a nice-to-have, but as a fundamental property of the system. This is what turns reference data from "a thing we hope is correct" into "a thing we can prove is correct."
It's accessible via API. Your data pipelines shouldn't be reading from a file path on a shared drive. Reference data should be available programmatically — through an API, an SDK, or a direct database connection — so that ETL jobs, integration pipelines, and applications can consume it reliably without depending on file-system conventions or manual data loads.
It's versioned and restorable. If someone makes a mistake — and they will — you should be able to see exactly what changed and roll it back. You should be able to answer the question "what did this mapping look like on March 15th?" without digging through email threads.
These aren't aspirational features. They're the baseline requirements for managing data that other systems depend on. We wouldn't accept a source-code repository without version history, or a database without access controls. Reference data deserves the same rigour.
If you're reading this and thinking "okay, but why doesn't something already exist for this?" — you're asking the right question.
The enterprise software world has a category called Master Data Management (MDM). It's been around for decades, and it nominally covers reference data. The problem is that MDM platforms — Informatica, TIBCO EBX, SAP Master Data Governance — are designed for multi-year, enterprise-wide data governance initiatives. They're powerful. They're also expensive, slow to implement, and dramatically overbuilt for the problem of managing lookup tables.
If all you need is a governed place to manage your store mappings, product categories, and code crosswalks, an MDM platform is like buying a 747 to fly to the next suburb. You don't need golden record management, probabilistic matching algorithms, or a twelve-month implementation with a systems integrator. You need something lighter, faster, and purpose-built.
On the other end of the spectrum, there are the DIY approaches: Excel spreadsheets, dbt seed files, custom-built admin pages, SharePoint lists. These are free or cheap, and they work — until they don't. The moment you need audit trails, multi-user governance, API access, or the ability to let a business user manage data without a developer intermediary, you've outgrown them.
The gap is in the middle. Between "too heavy" and "too fragile." Between the MDM platform that takes a year to implement and the spreadsheet that takes a year to become unmanageable.
That's the gap we built TitanRDM to fill.
TitanRDM is a reference data management platform — purpose-built for the specific problem this article describes. It gives your lookup tables, mappings, classifications, and hierarchies a proper home: centralised, governed, API-accessible, and designed for both business users and data engineers to work with.
It's not MDM. It doesn't try to be. There's no golden record management, no probabilistic matching, no twelve-month implementation. You can set up your first managed lookup table in minutes, import data from your existing spreadsheets, and start serving it to your pipelines via API or SDK on the same day.
Business users get an intuitive interface for maintaining the data they already understand — store classifications, product hierarchies, supplier mappings. Data engineers get API and SDK access (Python, SQL, REST) to integrate reference data into existing pipelines without changing their architecture. And everyone gets audit trails, version history, and the confidence that the reference data driving their reports is governed, not guessed at.
If the problems in this article sound familiar, that's because we've lived them too. TitanRDM exists because we got tired of watching data teams manage critical business logic in spreadsheets they were vaguely ashamed of.
If you're ready to give your lookup tables a proper home:
Set up your first managed lookup table in 10 minutes. Free tier. No credit card. No sales call.