04 · What I'm Watching

Six trend takes.

Six current trends in AI, data, and data engineering — reacted to from first principles. One throughline runs under all of them.

"The illusion of an answer is more dangerous than no answer at all."

Each take is a summary in my own framing. The full, unedited words sit underneath it — and link back to the source conversation in the archive.

01 Trend

"AI is replacing the data analyst" — LLMs + NL-to-SQL, so business users just query their data directly.

Self-serve access isn't business value.

In my words

I would expand this to include chat with data over a semantic layer (ontology, semantic model). This goes back to my earlier point: just because you are using a data product to answer a question does not mean you are providing business value.

As it relates to this, not everyone is a data analyst, wants to be a data analyst, or is capable of being a data analyst. Even if we pretend anything is possible in terms of end users answering their own questions and building their own artifacts (reusable recipes, self-made dashboards, push reports, whatever), that does not necessarily mean they will provide the business any more value than if they were just "winging it". It's very possible they end up making worse decisions because they see false signals and become overconfident, or run into the problem that "the illusion of an answer is much more dangerous than no answer at all".

I think if we are going to be putting this type of tool out there, the business must focus on CPM / BPM, truly deep diving on what metrics ACTUALLY matter and how to think about them.

see source ↗ Conversation · 11:30 PM · conv-4-p1

02 Trend

The semantic / metrics layer push — dbt Semantic Layer, MetricFlow, LookML: define metrics once, centrally.

Don't build the semantic monster — spend the week with the business.

In my words

This is related to the above. It is important, but it doesn't matter if the metrics aren't any good in terms of actual bottom-line business value.

I think the biggest problem here is what Snowflake is attempting to solve with the, not sure of the exact name, but Open Semantic Interchange it might be called. Right now there are so many flavors of how to write these semantic models, and they all answer certain things better than the others... and I don't know that any of them are really flexible enough to answer the wide variety of twists and nuances on the metrics without becoming a monster to maintain, with no real value add compared to the alternative.

I do think chat with data is very important and some layer of this matters (ontology in general), but I'm not super sold on the benefits of trying to put everything into a semantic layer. If you were to pressure me today, I would say some version of a lightweight semantic model as a reference point or base for some extremely core and common data marts / metrics... but focus much more on the CPM side and business-process-specific examples.

I'd rather spend a week working with the business, documenting examples, queries, and deep-dive analysis that resulted in real business impact, than building a formal semantic layer. The north-star first principles, traps, and data nuances from that week-long deep dive can give conversational analytics some of the specifics it needs to steer end users toward positive ROI instead of negative ROI. "The answer to your question is this, but this hides a number of real world issues that need to be broken down or clarified before you make your decision", or something like that.
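A minimal sketch of what that steering could look like in practice, assuming the notes from the deep dive are stored next to each metric and appended to every answer. Everything named here is hypothetical and for illustration only: the `MetricNote` class, the `NOTES` dictionary, the `gross_margin` metric, and the example caveats are assumptions, not anything from a real deployment.

```python
from dataclasses import dataclass, field


@dataclass
class MetricNote:
    """Business context captured during a deep dive (all content hypothetical)."""
    metric: str
    caveats: list[str] = field(default_factory=list)


# Hypothetical notes; in practice these would come from working sessions
# with the business, not from the data team's guesses.
NOTES = {
    "gross_margin": MetricNote(
        metric="gross_margin",
        caveats=[
            "Excludes freight rebates, which post one month in arrears.",
            "Blends two product lines with very different cost structures.",
        ],
    ),
}


def answer_with_caveats(metric: str, value: float) -> str:
    """Return the raw answer followed by any documented traps for that metric."""
    lines = [f"The answer to your question is {metric} = {value:.1%}."]
    note = NOTES.get(metric)
    if note and note.caveats:
        lines.append("This hides issues to clarify before you decide:")
        lines.extend(f"- {caveat}" for caveat in note.caveats)
    else:
        # No deep dive has covered this metric, so say so rather than imply safety.
        lines.append("No documented caveats exist for this metric; treat it with care.")
    return "\n".join(lines)


print(answer_with_caveats("gross_margin", 0.412))
```

The point of the design is that the caveats are plain text written with the business, so they cost a conversation to produce rather than a modeling project.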

see source ↗ Conversation · 11:30 PM · conv-4-p2

03 Trend

Data mesh — decentralized data ownership, domain teams owning their own pipelines.

Centralized excellence first; mesh only as far as it's real.

In my words

I don't think this matters all that much in practice, in the sense that each company is realistically limited in how much flexibility it truly has here based on size, scale, and scope. That said, I think the best model is generally centralized IT as a center of excellence, with data mesh as far as is realistic. For many that may look like IT-owned resources that specialize in one or two business functions, but still roll up to IT. That's fine.

But the principle is the same as before. You have to understand the business to deliver valuable data products, and in a larger company you generally have to understand it better than the people making the data requests. You have to understand it at a C-suite level (or damn near).

see source ↗ Conversation · 11:30 PM · conv-4-p3

04 Trend

The modern data stack consolidating — Snowflake vs. Databricks as the two poles.

All-in on simple — Snowflake + dbt, MotherDuck, Feldera.

In my words

In general I am all in on Snowflake for simplicity, and I favor consolidation just to make things as straightforward as possible for your data team. I don't understand stressing over the pennies and ending up with tech sprawl compared to paying more and keeping it simple. A 15% markup on token cost for a Snowflake-managed solution, for example, is a no-brainer to me compared to self-hosting on Azure, unless you are talking massive scale.
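To make the "unless you are talking massive scale" caveat concrete, here is a rough break-even sketch. It assumes a deliberately simple cost model: the managed option bills raw token spend plus a 15% markup, and the self-hosted option bills raw token spend plus a fixed monthly overhead for engineering time and infrastructure. The $6,000 overhead and the spend levels are hypothetical placeholders, not real pricing.

```python
def managed_cost(token_spend: float, markup: float = 0.15) -> float:
    """Monthly cost when the vendor bills tokens plus a percentage markup."""
    return token_spend * (1 + markup)


def self_hosted_cost(token_spend: float, fixed_overhead: float) -> float:
    """Monthly cost when you pay raw token prices plus your own running costs."""
    return token_spend + fixed_overhead


def break_even_spend(fixed_overhead: float, markup: float = 0.15) -> float:
    """Token spend at which both options cost the same under this model."""
    return fixed_overhead / markup


# Hypothetical: self-hosting consumes $6,000/month of engineering and infrastructure.
OVERHEAD = 6_000.0

for spend in (5_000.0, 20_000.0, 200_000.0):
    m = managed_cost(spend)
    s = self_hosted_cost(spend, OVERHEAD)
    cheaper = "managed" if m < s else "self-hosted"
    print(f"spend ${spend:>9,.0f}: managed ${m:>9,.0f}, self-hosted ${s:>9,.0f} -> {cheaper}")

print(f"break-even at roughly ${break_even_spend(OVERHEAD):,.0f} of token spend per month")
```

Under these assumptions the markup only loses once monthly token spend exceeds the overhead divided by 0.15, which is the arithmetic behind treating the markup as cheap below massive scale.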

That said, my favorite data stack right now would likely involve Snowflake + dbt as the core, MotherDuck as the gold presentation layer, and Feldera for heavy incremental processing. Depending on the business you may be able to adjust that quite a bit... but I do think the landscape is quite different from what Snowflake was originally built for.

I think distributed, smaller-scale, incremental pipelines make a lot of sense, and the value and leverage of what Snowflake was built for is not what it once was. I am not a technical expert on the actual engineering side of databases or the detailed pros and cons of execution.

see source ↗ Conversation · 11:30 PM · conv-4-p4

05 Trend

Agentic AI / AI data engineers — agents writing pipelines, transformations, and data-quality monitoring.

Joins were never the bottleneck — understanding the business was.

In my words

I think AI can help an extreme amount with drafting structures, template architectures, unit testing, code reviews, smoke tests, and improved CI/CD in particular... and I think advanced incremental pipelines and column-level lineage are pretty vital to that. I think the best data platforms are going to be built and engineered from the ground up to be maintained and enhanced by AI, but I think we are a long way off from AI being able to handle a variety of actual engineering challenges.

I always laugh when these topics promote how much time it saves building an "end to end" pipeline, because normal data modeling on relatively clean data was never the blocker. A data pipeline's time to value is not limited by writing joins and bringing together pretty clean dim/fact tables out of source systems. It's limited by understanding the nuance of the business process and the data, and handling that complexity consistently.

I do think there are interesting architectures within the realm of reason that could help here. If you have the right environments, test data, and unit tests, you could probably set up a "goal driven" AI pipeline that understands the end goal and iterates on solutions until it is met... if it has the right setup of input data and unit tests to work with. Writing the unit tests would still be the hardest part, in terms of working with the business to understand all the scenarios that are possible (and whether you can write off any edge cases or solve for them differently). Even there, though, if you had the raw data, AI could help speed up this process as well.

Ultimately it ends up back where we started... How do we know all of this is even valuable? That's where AI is still not even close to close, but it can probably help smoke out an initial pass and expedite the discovery process.

I think businesses will succeed and fail by using AI to leverage either luck or skill presented as great business intuition. The companies that move fast with correct intuitions on which levers really matter, how to think about them, and how to apply them to the future (extrapolation is inherently risky) will be able to build analytics processes and CPM that make their AI-assisted workflows and automations provide positive value while others are fiddling with zero or even negative ROI. Likely a story of the rich getting richer, but it can also make disruptors more disruptive, faster.
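A minimal sketch of that "goal driven" loop, assuming the goal is expressed as input/expected-output test cases and a proposer hands back one candidate transformation per attempt. The AI agent is replaced here by a fixed list of hand-written candidates; `propose`, `CANDIDATES`, the order rows, and the test case are all hypothetical stand-ins, not a real agent API.

```python
from typing import Callable, Iterable, Optional

Row = dict
Transform = Callable[[list[Row]], list[Row]]
Case = tuple[list[Row], list[Row]]  # (input rows, expected output rows)


def passes_all(transform: Transform, cases: Iterable[Case]) -> bool:
    """True when the transform reproduces the expected output for every case."""
    for rows, expected in cases:
        try:
            if transform(rows) != expected:
                return False
        except Exception:
            return False  # a crashing candidate counts as a failure
    return True


def goal_driven_search(
    propose: Callable[[int], Transform],
    cases: list[Case],
    max_attempts: int = 5,
) -> Optional[Transform]:
    """Ask for candidates until one meets the goal or the budget runs out."""
    for attempt in range(max_attempts):
        candidate = propose(attempt)
        if passes_all(candidate, cases):
            return candidate
    return None


# Stand-in for an AI agent: a fixed list of hand-written candidates (hypothetical).
def drop_nothing(rows: list[Row]) -> list[Row]:
    return list(rows)


def drop_cancelled(rows: list[Row]) -> list[Row]:
    return [r for r in rows if r["status"] != "cancelled"]


CANDIDATES = [drop_nothing, drop_cancelled]


def propose(attempt: int) -> Transform:
    return CANDIDATES[attempt % len(CANDIDATES)]


# The hard part in practice: cases agreed with the business, edge cases included.
CASES = [
    (
        [{"id": 1, "status": "shipped"}, {"id": 2, "status": "cancelled"}],
        [{"id": 1, "status": "shipped"}],
    ),
]

winner = goal_driven_search(propose, CASES)
print(winner.__name__ if winner else "no candidate met the goal")
```

The loop itself is trivial; the quality of the result is bounded entirely by `CASES`, which is why writing the tests with the business remains the hardest part.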

see source ↗ Conversation · 11:30 PM · conv-4-p5

06 Trend

Vibe coding / AI-first development — LLMs writing most of the code, humans supervising.

Absolutely the way — the exact ratio is just a detail.

In my words

First thought, with a qualification at the end... this is absolutely the way, and thinking differently just means you haven't figured out how to do it right yet. Even if no one in the world has yet (this is almost certainly not true), there is absolutely a way for AI to write the majority of the code given proper north-star principles, examples, goals, tests, standards, etc.

That said, I'm flexible on the exact ratio that is optimal here. Would I be extremely shocked if someone said that the code in mature tech stacks is 40% purely AI-generated, 30% AI-started and heavily modified, and 30% written by hand? Probably not.

I just think vibe coding and AI-assisted development offer so much more than just the code output. I think they can enable so much that we were limited on with a fleet of human developers, while also, yes, having some of the same drawbacks as human developers.

see source ↗ Conversation · 11:30 PM · conv-4-p6
