05 · Ideas to Build

Ideas to build.

Not built—yet. Architectures I’ve worked through end to end, with the business case already made. The difference from the shipped work: these live in my head, not in production.

01 devops · review throughput

Lineage-driven blast-radius review

Draft
Thesis

Change a key table and you can’t see the downstream blast radius — a known unknown.

Build

Column-level lineage makes the before/after impact analysis tractable — one AI-assisted pass.

Value

Review pipelines you’re only adjacent to, with confidence — not chasing every unknown.

  • dbt Fusion
  • column lineage
  • auto-tests
  • AI review

The problem

Simple, well-documented pipelines aren’t scary to review. Changes to high-leverage tables with known unknowns are. The impact of a logic change across the downstream population is unknown: did the code quietly assume a scenario never happens when it actually does? Is the field or row being added or removed relevant to some other downstream calculation, and did that calculation account for it? The pain runs the whole value chain. Developers ship with less confidence and spend more time tracing paths, and still leave unknowns. Reviewers face the same choice: investigate, or accept the risk. Bad code slips through wherever safeguards are thin. And the strategic cost is bigger: low trust in AI-generated code plus long reviews means some organizations don’t allow AI-assisted development at all.

The build

Detailed before/after analysis of how the shape of downstream features changes is rarely done at a fine grain, and table-level lineage can’t support it, because one table can feed thousands of fields. Column-level lineage (a dbt Fusion–style engine) makes the targeted query feasible: analyze just the impacted paths. Then bring all the pieces (lineage, encoded standards, automatic tests, AI review) together in one PR workflow. Third-party tools can be parts of it; the value is having everything in one place to maximize review throughput.
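A minimal sketch of the "analyze just the impacted paths" step, assuming the lineage engine can export column-to-column edges. The `LINEAGE` mapping, the column names, and the `blast_radius` helper are all hypothetical, not part of dbt Fusion or any specific tool. It walks downstream from the changed columns only.

```python
from collections import deque

# Hypothetical column-level lineage edges: upstream column -> columns derived from it.
# In practice these would come from a lineage engine's export; the names are made up.
LINEAGE = {
    "orders.amount": ["fct_sales.gross_revenue"],
    "orders.status": ["fct_sales.is_completed"],
    "fct_sales.gross_revenue": ["scorecard.revenue_per_site", "finance.monthly_revenue"],
    "fct_sales.is_completed": ["scorecard.conversion_rate"],
}


def blast_radius(changed_columns, lineage):
    """Return every column reachable downstream of the changed columns (breadth-first)."""
    seen = set()
    queue = deque(changed_columns)
    while queue:
        column = queue.popleft()
        for child in lineage.get(column, []):
            if child not in seen:  # also guards against cycles in the lineage graph
                seen.add(child)
                queue.append(child)
    return seen


# Only the paths that descend from orders.amount are flagged for before/after checks.
impacted = blast_radius(["orders.amount"], LINEAGE)
print(sorted(impacted))
```

Here a change to `orders.amount` leaves `scorecard.conversion_rate` out of the review set, which is the narrowing that table-level lineage cannot provide.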

Why it's not built yet

A Fusion-style column-lineage engine isn’t readily accessible yet. And this kind of work always reads as a nice-to-have, so it drops down the backlog when there’s no clear path and the development effort isn’t scoped.

Where it comes from

The real goal was empowering a team to review code for pipelines they’re only adjacent to. Standards, process, AI support, and automatic tests all help gauge expected results versus actual — and at least ensure due diligence was done. On fast-iterating, complex codebases the gain compounds; hard to put a clean number on it, but it could be 2x or more.

02 ingestion · control plane

Standardized ingestion + CDC base

Draft
Thesis

iPaaS connectors don’t fit odd sources, so custom pipelines drift into a dozen flavors.

Build

A portable dlt + field-level CDC pattern standardizes the raw→staging layer.

Value

Agents and humans both build fast on one consistent base — capacity stops being the bottleneck.

  • dlt
  • standardized raw→staging
  • field-level CDC
  • Streamlit control

feeds → scorecard stack · feeds → workflow layer

The problem

iPaaS connectors can lack the flexibility you need for non-blue-chip source systems, and then you end up building custom code anyway. But when you build your own connectors, it’s very easy for them to start looking and feeling slightly different across your environment. You look up one day and a few pipelines do this well, but most have slightly different flavors, strategies, and reasons for doing this or that. Your agents get confused. Your humans get confused. Everything slows down, and your data lakehouse feels much heavier than it needs to. The data team feels it first and the business users feel it next: they have to decide whether to pony up budget for more capacity or wait longer to get requests across the finish line.

The build

The goal is to standardize the raw-to-staging layer as much as possible, so agents and humans alike can quickly and easily build and maintain these pipelines. The field-level CDC part is the foundation: essentially rebuilding full audit-log immutability — not exactly, but as close as you can get — for batch or micro-batch ingestion. That consistent base is what you build super-performant incremental pipelines on (something like Feldera), or a custom event pipeline to trigger processes and integrations. A Streamlit control plane lets the business steer what’s captured.
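One way to picture the field-level CDC foundation is a diff between two batch snapshots of the same entity, unpivoted to one row per changed field. This is a sketch under assumptions: the row shape (`entity_id`, `field`, `old_value`, `new_value`, `captured_at`) and the `field_changes` helper are hypothetical and are not a dlt API.

```python
from datetime import datetime, timezone


def field_changes(entity_id, before, after, captured_at):
    """Unpivot the difference between two snapshots into one row per changed field."""
    rows = []
    for field in sorted(set(before) | set(after)):
        # Note: a missing field and an explicit None are treated the same here.
        old, new = before.get(field), after.get(field)
        if old != new:
            rows.append({
                "entity_id": entity_id,
                "field": field,
                "old_value": old,
                "new_value": new,
                "captured_at": captured_at,
            })
    return rows


# Two consecutive batch snapshots of one made-up customer record.
before = {"status": "lead", "owner": "ana", "first_visit_date": None}
after = {"status": "customer", "owner": "ana", "first_visit_date": "2024-03-02"}

changes = field_changes(
    "cust-42", before, after, datetime(2024, 3, 3, tzinfo=timezone.utc)
)
for row in changes:
    print(row["field"], row["old_value"], "->", row["new_value"])
```

Appending these rows and never updating them is what approximates an audit log for batch sources: the unchanged `owner` field produces no row, and every change keeps its before and after values.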

Why it's not built yet

Teams reach for an iPaaS connector to avoid the lift, then quietly build custom around its gaps — without ever standardizing the result. A clean, consistent pattern is foundation work with no clean day-one business ask, so it rarely gets prioritized over the next feature request.

Where it comes from

It clicked when I built field-level CDC on an odd webhook-XML pipeline. The feed carried a lot of entities and the business was growing, so it was hard to predict what the right thing to build was, and handling everything wasn’t realistic. One day we decided to just capture as many important fields as we could for the core entity and build a field-level CDC unpivot pipeline. Almost immediately every request felt lighter. Ad-hoc questions were answered straight from that table, because you knew the data was there somewhere, and you’d often repeat a pattern from before rather than starting from scratch.

03 analytics · multi-site ops

Agile scorecard stack

Draft
Thesis

Multi-site scorecards are always in flux, so the data team plays whack-a-mole with logic.

Build

Marts derive from a field-level change log — definitions and anchor dates become config.

Value

Re-base a metric and restate years of history in one run, not a pipeline project.

  • field-level change log
  • incremental marts
  • anomaly/trend
  • root-cause
  • next-best-action

← fed by ingestion base · feeds → ROI engine

The problem

Think of any operation that runs its analysis segmented across dozens to hundreds of individual locations, with a manager → district → regional → area hierarchy. Everyone has a scorecard: five, ten, or 350 KPIs they feel they need to track their operation. Managers get these scorecards and are supposed to figure out who to talk to about which problem. They may look at it daily, weekly, or monthly, but in the data warehouse you track it across many dimensions for every location. The hard part is upstream: these metrics often need many variations and anchor dates, and business needs and triggers change. The data team pays for that flux with database and business-logic sprawl, inefficient pipelines, and whack-a-mole every time the business uncovers a new anchor date it believes in or changes the temporal groupings it wants. You want your data team focused on the hard things (the architecture, understanding the business, why the data matters), not wiring up more silver-layer logic and gold-layer CTEs to add yet another metric to the scorecard.

The build

Start with field-level change-data tracking to truly capture events in a tidy way, and make your data marts just a derivative of detailed, rebuilt audit logs. As long as those logs are intact, everything can be rebuilt from scripts — the only reason to materialize a view is performance, so there’s essentially no friction beyond spinning up a large warehouse and running a stored procedure to start over. Downstream of the scorecard sits automatic anomaly and trend detection at the location level, and downstream of that a root-cause analysis engine. All of it can run without AI — but it can also feed agentic AI that drives next-best-action without having to run all that heavy math itself.
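A small sketch of what "marts as a derivative of the change log" can look like when the anchor date is configuration rather than pipeline code. The tuple layout of `CHANGE_LOG`, the field names, and `monthly_count` are hypothetical; a real version would more likely be SQL over the change-log table.

```python
from collections import Counter

# Hypothetical change-log rows: (entity_id, field, new_value, changed_on as ISO date).
CHANGE_LOG = [
    ("c1", "created_date", "2024-01-28", "2024-01-28"),
    ("c1", "first_visit_date", "2024-02-03", "2024-02-03"),
    ("c2", "created_date", "2024-02-10", "2024-02-10"),
    ("c2", "first_visit_date", "2024-02-15", "2024-02-16"),
    ("c3", "created_date", "2024-02-20", "2024-02-20"),
]


def monthly_count(change_log, anchor_field):
    """Count entities per month, bucketed by the first value recorded for anchor_field."""
    first_value = {}
    # Walk the log in change order so the earliest recorded value wins.
    for entity_id, field, new_value, _changed_on in sorted(change_log, key=lambda r: r[3]):
        if field == anchor_field and new_value is not None:
            first_value.setdefault(entity_id, new_value)
    # ISO dates start with YYYY-MM, so the first seven characters are the month bucket.
    return Counter(value[:7] for value in first_value.values())


# Re-basing the metric is a config change: same log, different anchor field.
by_created = monthly_count(CHANGE_LOG, "created_date")
by_first_visit = monthly_count(CHANGE_LOG, "first_visit_date")
print(dict(by_created))
print(dict(by_first_visit))
```

Because both results are computed from the same immutable log, restating history under a new anchor is a rerun rather than a new pipeline.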

Why it's not built yet

Most teams feel it’s too much upfront cost for abstraction. They’re handed the business’s requirements and figure they’re close enough to the long-term ones. Maybe that’s true for some businesses, but many are in enough flux that being agile and efficient matters. It isn’t particularly hard; it’s building a foundation that doesn’t always have a clear day-one business ask to tie it to. It’s also a real architecture challenge to take high-volume batch data and incrementally maintain field-level change-capture primitives. And for what? Just to re-aggregate them slightly more cleanly in the next table? It’s a long play most teams don’t have the appetite for.

Where it comes from

Rebasing the temporal grouping of a core metric like conversion rate to a different anchor date you don’t even have captured in your data model is a major headache. Say monthly sales count moves from date-created to first-visit date — and first-visit date isn’t even a field the source system maintains properly, so you add another line to your pipeline to capture it correctly. Then, if you want to rebuild history, you have to hope the data history is clean and the query to get it is reliably partitioned over your temporal period. That’s a lot of work compared to an architecture that naturally supports inferring events from the timeline of field changes and before/after values.

04 strategy · roi

Best-case / worst-case ROI engine

Draft
Thesis

Roadmaps chase line items that feel urgent but can’t actually move the P&L.

Build

Ride the KPI tree to estimate each bet’s best-/worst-case P&L impact — days, not months.

Value

A fast ROI range before you commit teams to the wrong KPI.

  • CPM
  • KPI trees
  • P&L variance
  • best/worst-case

← fed by scorecard stack

The problem

The big one is opportunity cost. Time, energy, money, and human capital all get spent on initiatives that will realistically be extremely hard to turn into actual business value, even if the line item you’re targeting feels very important. Anyone responsible for cross-functional or whole-department performance meets the stakeholder who owns a process and is convinced their data needs are critical to maximizing their own metric. They sound convincing in the moment: “We see this issue every month where xyz is doing abc and we’re getting leakage, and this number that’s supposed to be 20% is 40%. It’s a major problem.” But when you zoom out to the P&L, the number often wouldn’t move much even if the proposed data product had the ideal effect.

The build

This isn’t especially novel; plenty of places do a version of it in the CPM world. The problem is that most formal CPM projects are months-long engagements with several consultants, and I’ve rarely seen it used at all in the places I’ve worked. How do you get to value faster? If your KPIs are modeled correctly, per the scorecard stack in idea 03, it becomes fairly straightforward to at least estimate the drivers, best case, and worst case for an initiative. A mart downstream of CPM and your KPI trees can find associations and help tell the story of which P&L line items are the most variable and what metrics might influence them.
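To make the best-case/worst-case estimate concrete, here is a deliberately simple sketch. It assumes a first-order linear model in which each KPI drives a known annual dollar amount on one P&L line; the `Bet` class, `KPI_TO_PNL` mapping, and every figure are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Bet:
    name: str
    kpi: str
    worst_change: float  # relative KPI change in the worst case, e.g. 0.0 for no effect
    best_change: float   # relative KPI change in the best case, e.g. -0.25 for a 25% drop


# Hypothetical KPI tree, flattened to: KPI -> (P&L line, annual dollars that KPI drives).
KPI_TO_PNL = {
    "overtime_hours": ("labor_cost", 400_000.0),
    "refund_rate": ("refunds", 25_000.0),
}


def pnl_range(bet, kpi_to_pnl):
    """Return (P&L line, lowest dollar change, highest dollar change) for one bet."""
    line, dollars = kpi_to_pnl[bet.kpi]
    impacts = (dollars * bet.worst_change, dollars * bet.best_change)
    return line, min(impacts), max(impacts)


bets = [
    Bet("scheduling tool", "overtime_hours", worst_change=0.0, best_change=-0.25),
    Bet("refund dashboard", "refund_rate", worst_change=0.0, best_change=-0.5),
]
ranges = {bet.name: pnl_range(bet, KPI_TO_PNL) for bet in bets}
for name, (line, low, high) in ranges.items():
    print(f"{name}: {line} changes by {low:,.0f} to {high:,.0f} per year")
```

Even with an optimistic 50% improvement, the refund bet is capped by the small dollar amount behind its KPI, which is the zoom-out the range is meant to force before teams are committed.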

Why it's not built yet

At the abstract level, blame goals. Most companies think in terms of budgets and targets, not best case and worst case. If you build a process around managing labor cost to a 20% target, you get a lot of 18–22% numbers. Someone came up with 20%, and maybe they had good reason, but over time that just becomes the norm, without a deep process around whether it’s the right target or whether major operational changes could drive it even lower. It’s very rare for anyone to consider what a realistic high number would be, or whether the distribution is left- or right-skewed.

Where it comes from

Anyone who guides strategy and tactics would use this to support cost-benefit analysis, estimate the potential ROI of new projects, and decide how to deploy both human and financial resources. It’s hard to say what a wrong bet costs. If you’re deploying a half-dozen teams across a half-dozen initiatives, maybe not much. But if you’re deploying a significant portion of your time, energy, and development talent attacking the wrong KPI, it could be massive.

05 enterprise · workflow

Cross-system workflow layer

Draft
Thesis

Approvals hop systems and stall at the seams; the process is capped by the vendor.

Build

One activity schema on the lakehouse orchestrates human-, automation-, and agent-owned steps.

Value

One surface for enterprise workflow — stitch the cross-system processes vendors won’t.

  • activity schema
  • lakehouse-native
  • human + agent orchestration
  • governance

← fed by ingestion base · kin to TilesApp

The problem

Think of a regional manager in a multi-site operation. Who knows how many approvals come across their desk: requests to clean something up, new-hire validations to monitor, access requests, expense-report approvals, you name it. And for a single flow they might have to end a process in one system, copy the data into an email, wait for a document to upload, and so on. The pain shows up at the seams: one moment everything’s fine, the next the regional is on PTO, some systems have their backup coded and others don’t, and they’re trying to handle things from a phone. Their week is scattered with context-switching, and approvals and access get delayed. It’s tough to manage. Worse, your business processes become limited by the third-party software, whereas taking control of your own automation gives you the flexibility to stitch cross-system workflows together. So who pays? The company, in the opportunity cost of what doesn’t get done.

The build

The key is the architecture. It’s an activity-schema-like model where any workflow app’s structure could fit — similar to my TilesApp productivity app, written as generic entities that can belong to other entities. You build the template for your process by nesting what is and isn’t possible and allowed. The actual activities for every process and workflow item are one core table, living in a central location. You can then orchestrate a process as a combination of human-owned activities and automation- or agent-owned ones. Some steps fire on an action — completing a list, uploading a file — and some are automations waiting for a trigger to then sync to another system that waits on human action. All of it can be managed, observed, and forked off your central lakehouse. It can be embedded in a core company-wide workflow app, or pieces of it can be embedded directly in the systems they work in — UKG, Salesforce, whatever.
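A rough sketch of the single core activity table, assuming a few columns that the description implies but does not spell out. The `Activity` fields, the `depends_on` link, and the `ready` helper are hypothetical and are not taken from TilesApp.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Activity:
    activity_id: str
    parent_id: Optional[str]          # generic nesting: an activity can belong to another
    kind: str                         # e.g. "process", "approval", "sync"
    owner_type: str                   # "human", "automation", or "agent"
    status: str = "pending"           # "pending" or "done"
    depends_on: Optional[str] = None  # activity that must be done first, if any


# One table holds the process and all of its steps, whoever owns them.
ACTIVITIES = [
    Activity("onboard-1", None, "process", "human"),
    Activity("a1", "onboard-1", "approval", "human", status="done"),
    Activity("a2", "onboard-1", "sync", "automation", depends_on="a1"),
    Activity("a3", "onboard-1", "approval", "human", depends_on="a2"),
]


def ready(activities, owner_type):
    """List pending steps for one owner type whose prerequisite is already done."""
    done = {a.activity_id for a in activities if a.status == "done"}
    return [
        a.activity_id
        for a in activities
        if a.status == "pending"
        and a.owner_type == owner_type
        and a.parent_id is not None  # skip the top-level process row itself
        and (a.depends_on is None or a.depends_on in done)
    ]


print(ready(ACTIVITIES, "automation"))
print(ready(ACTIVITIES, "human"))
```

Because human, automation, and agent steps share one table, the same query pattern can drive a task inbox, a trigger for a sync job, or an agent's work queue.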

Why it's not built yet

It’s only relatively recently that a central platform like Snowflake made this feasible without a heavy lift. You could have built all of this yourself ten years ago; it just came with much more friction. Now that it’s possible, it still doesn’t exist, mostly because it’s very ambitious and migrating processes is a lot of friction. You need a brand-new workflow that’s difficult to build in an existing system to begin to justify it, and most of the time the risk and the unknown drive people to some third-party software that claims it can handle it.

Where it comes from

The proof is simply having lived as the de facto director of data and analytics for a multi-site operation in this space, where there was friction everywhere. The one use case that was going to create an opportunity to implement this was a complicated onboarding flow involving HR, Ops, and IT, with hold windows and data events orchestrating the process in between the human elements. This workflow architecture was going to drive that. It got pushed.
