Welcome to the Agents of Data podcast. In this episode we dig into semantics. What is a semantic layer and what does it do? What are the challenges of implementing semantic layers? And where does agentic AI fit into the picture? We're joined by Frank Weigel, Chief of Product at Matillion, to dig into how we can avoid stale data, battlegrounds over ownership and the shift in human roles. As ever, share your thoughts on Matillion's LinkedIn, Instagram and Reddit.
So in a previous episode we spoke about MCP, A2A and semantic layers. Today we're going to double-click into the semantic layer. I'm joined by our Chief of AI, Julian, and Frank, our CPO. Let's get to it. So, Frank, can you start off by telling me: what is a semantic layer?
Sure. I think it's actually an interesting problem, and it's one that a lot of people in the agentic space are going to need to be familiar with. There's a fundamental issue when it comes to LLMs and business problems: ultimately, users are going to talk in natural language and explain a problem like they would to one of their colleagues. When we talk about something, we use a lot of terms that are specific to our company, and you all understand what they mean because we all work at the same place. That same problem exists for any agentic AI: it needs to understand what people are talking about at a very high level of specificity, because you typically ask it to access data, access a system, do specific tasks or do a calculation.
So there are two levels to what we call a semantic layer. The higher level is what I just talked about: mapping certain language terms to a more specific meaning for a company. In accounting, say, when somebody asks for the revenue numbers, a specific company will know how revenue is calculated for them; you may have this in the BI system today. Or if they talk about one of their products, the system needs to understand, "Oh, this is one of my products." So when you ask what revenue you have for your magical widget A, it needs to know that magical widget A is the name of one of your products. Things like this. So you have that level of semantic layer, which is about understanding a lot of the business language and then, often, mapping it to definitions.
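A minimal sketch of that higher, business-language level, assuming the glossary is a plain lookup table consulted before the agent answers. The term "revenue" and the product "magical widget A" come from the conversation; the `Glossary` class, its fields and the revenue definition string are hypothetical illustrations, not Matillion's implementation.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Glossary:
    """Hypothetical business glossary: company language -> specific meaning."""
    # Business term -> company-specific definition (illustrative text only).
    terms: dict[str, str] = field(default_factory=dict)
    # Known entity name -> entity type, e.g. product names.
    entities: dict[str, str] = field(default_factory=dict)

    def resolve(self, question: str) -> dict[str, dict[str, str]]:
        """Return the glossary terms and entities mentioned in a question."""
        text = question.lower()

        def mentioned(name: str) -> bool:
            # Case-insensitive whole-phrase match.
            pattern = rf"\b{re.escape(name.lower())}\b"
            return re.search(pattern, text) is not None

        return {
            "terms": {t: d for t, d in self.terms.items() if mentioned(t)},
            "entities": {e: k for e, k in self.entities.items() if mentioned(e)},
        }


glossary = Glossary(
    terms={"revenue": "SUM(net_amount), excluding refunds (hypothetical rule)"},
    entities={"magical widget A": "product"},
)
print(glossary.resolve("What revenue do I have for my magical widget A?"))
```

Real questions would need fuzzier matching (synonyms, misspellings) than this whole-phrase search; the point is only that the company-specific meaning lives outside the model and is looked up per question.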
Now, for systems like the ones we're building for data, you then have an additional problem: language is always very imprecise when it comes to selecting something specific in data. Say you ask a human or a virtual agent, it doesn't actually matter which, to do something with products, and we look at a data warehouse and there are 10 product tables in there. The first problem is: which of those were you actually talking about, which one did you want me to use? And once you've got to it, which of the eight different revenue columns do you need? A human or an agentic AI needs to know how to pick, and you do this by ultimately providing more metadata, more technical data, about what's stored in a table and what it means. For example: this revenue table is for this subsidiary, or for this business unit that was an acquisition, so it's still kept in a separate accounting system. Or: this product table is one Julian created as a copy because he was playing around and trying some things out, so don't ever use that one. And of course this isn't just at the table level; in our case it goes all the way down to individual columns in a database. In other domains it might not be about data at all: it might be about things in a form, or about images and how you associate certain business meaning with them. So that's the concept of a semantic layer: adding an additional depth of understanding that isn't implicit in the thing itself, so that agentic AI can reason about it, use it better and ultimately do what a human would do with it.
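To illustrate the lower, technical level, here is a sketch of table annotations an agent could consult before choosing among several product tables. The `TableNote` structure, the `do_not_use` flag and all table and column names are hypothetical; the notes mirror the examples above (an acquired business unit's table, a personal experimental copy), and column meanings show that annotations can go down to individual columns.

```python
from dataclasses import dataclass, field


@dataclass
class TableNote:
    """Hypothetical annotation attached to one warehouse table."""
    name: str
    concepts: set[str]                                      # business concepts it holds
    notes: list[str] = field(default_factory=list)          # free-text context for the agent
    columns: dict[str, str] = field(default_factory=dict)   # column -> meaning
    do_not_use: bool = False                                # e.g. a personal scratch copy


def candidate_tables(catalog: list[TableNote], concept: str) -> list[TableNote]:
    """Tables tagged with a concept, minus any explicitly flagged as unusable."""
    return [t for t in catalog if concept in t.concepts and not t.do_not_use]


catalog = [
    TableNote("sales.products", {"product"},
              columns={"list_price": "Price before discounts"}),
    TableNote("acquired_unit.products", {"product"},
              notes=["Acquired business unit; still on a separate accounting system."]),
    TableNote("scratch.products_copy", {"product"},
              notes=["Personal copy made while experimenting."], do_not_use=True),
]

for table in candidate_tables(catalog, "product"):
    print(table.name, table.notes, table.columns)
```

The filter only removes tables someone has explicitly ruled out; choosing between the remaining candidates is left to the agent, which now has the notes to reason with.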
In our use case, where we're providing an agentic data engineer, or data engineering system, we need to give it the same notes you'd give a new data engineer joining the team. The kind of intro you'd give them, even if it's a short five-minute or half-hour thing: this schema is the bronze layer, this is silver, this is gold; people tend to prefix their personal tables with their initials. Just the simple intro bullet points. And how do you feed that into the system to make sure your agent is able to take advantage of it, so you can teach it just like onboarding a new member of the team?
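One way to hand an agent onboarding bullet points like these is to encode them as simple rules. This sketch assumes bronze, silver and gold mean raw, cleaned and business-ready data (a common medallion-architecture reading, not something stated in the episode) and uses made-up initials; `describe_table` is a hypothetical helper.

```python
# Hypothetical onboarding notes, encoded as rules instead of prose.
LAYER_BY_SCHEMA = {
    "bronze": "raw data",             # assumed medallion meanings
    "silver": "cleaned data",
    "gold": "business-ready data",
}
TEAM_INITIALS = {"ab", "cd"}          # made-up initials used as personal prefixes


def describe_table(qualified_name: str) -> list[str]:
    """Apply the team's naming conventions to a 'schema.table' name."""
    schema, table = qualified_name.lower().split(".", 1)
    hints = []
    if schema in LAYER_BY_SCHEMA:
        hints.append(f"{schema} layer: {LAYER_BY_SCHEMA[schema]}")
    # A leading "<initials>_" marks someone's personal table.
    prefix, sep, _ = table.partition("_")
    if sep and prefix in TEAM_INITIALS:
        hints.append(f"personal table of '{prefix}'; avoid in shared pipelines")
    return hints


for name in ["gold.customer_summary", "silver.ab_customers_test", "staging.orders"]:
    print(name, describe_table(name))
```

Rules like these are cheap to apply to every table in a large warehouse, whereas the prose version of the same notes would have to be re-read by the model each time.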
So where do people actually store this information today? What sorts of places can you get it from?
A lot of it is in the heads of the old and gray members of the team who've been there a few years. Some people remember it as pointers: go and ask Bob. And actually, just knowing who's working with particular systems, and knowing who to ask, is a valuable piece of information in itself; how we support agents with that is something we may get to one day. Sometimes it's wikis. Occasionally they were even written or updated this calendar year, but often it was a project somebody diligently did two or three years ago, and it's the best you've got because no one's touched it since.
That's actually a really interesting point: this isn't a new problem. Just like Julian described, it exists with humans. If you take AI out, there have already been various attempts to solve it for humans, because knowing that Bob is the person you need to talk to works great, and Bob is amazing, but what if Bob is ill? So people started to try to collect it. You've got the unstructured approaches, like the onboarding document every new person reads for data. And there's the very structured approach, where people build data catalogs and master data management, though I don't think those are typically all that great or up to date, which is a whole other angle of the problem.
Yeah, the information becomes stale, but sometimes stale information is better than none; that's the major factor there. Then you've obviously got the automatically captured metadata that sits within a system, like a database structure, and anything we collect around lineage. But again, how do you use that in a systematic way? There are all sorts of different challenges. In some ways we have an advantage in that this is an LLM-shaped problem, because collating multiple sources in varying and even inconsistent formats is something LLMs are actually quite good at. I can give one a chunk of a table, a sample of some records and some free-text notes, bundle them all together and say, "Okay, can you try and make sense of this?"
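Because the point here is that an LLM can make sense of mixed inputs, a first step is simply bundling them. This hypothetical `build_context` helper only assembles text from a schema, sample records and free-text notes; sending the result to a model is left out, and labelling the notes as possibly stale is a design choice, not something from the conversation. The table, rows and note are invented.

```python
def build_context(table: str, columns: dict[str, str],
                  sample_rows: list[dict], notes: list[str]) -> str:
    """Bundle schema, sample records and free-text notes into one prompt section."""
    lines = [f"Table: {table}", "Columns:"]
    lines += [f"  - {name} ({dtype})" for name, dtype in columns.items()]
    lines.append("Sample rows:")
    lines += [f"  {row}" for row in sample_rows]
    # Flag notes as possibly out of date so the model weighs them accordingly.
    lines.append("Notes (may be stale or inconsistent):")
    if notes:
        lines += [f"  - {note}" for note in notes]
    else:
        lines.append("  (none)")
    return "\n".join(lines)


ctx = build_context(
    "crm.accounts",
    {"account_id": "NUMBER", "region": "VARCHAR"},
    [{"account_id": 1, "region": "EMEA"}],
    ["Wiki page from a few years ago: region codes follow the old sales territories."],
)
print(ctx)
```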
But just from this conversation, it all works for humans: I can talk to Bob, Bob can talk to me, I know Bob has that information in his head, I go and get it and then I work on the problem. For AI agents, as we've just been talking about, this stuff is all over the place. I'm sure there are businesses out there that have built a semantic layer, spent loads of time on it, and it's just perfect. But the reality is that most businesses don't have that, and agentic AI only knows what it can see. So how do we solve that problem? How do we centralize all this information so that it's available to the AI agents that work on our problems?
Well, we asked Sam to build this one. But I think it might be interesting to talk a little bit about how we're currently looking to solve this, because it's a problem we're very much working on right now. How do you populate that layer? One interesting problem is how you handle the aging of the information and convey its relative recency. Under what circumstances do you want information to be forgotten? That's another challenge.
I think, first, you've obviously got to start with the basics: you get the metadata. An easy environment to talk about is a data warehouse like Snowflake. You've got column information, table information, even the descriptive attributes of those tables, like what type they are. That can all feed into the model, and agents can see it. They can start to reason about it and to assume which columns might go where and which tables can be joined to which. But assumptions are a bad thing: if an agent assumes incorrectly early on and goes down the wrong path, you're in trouble. So then you've got to think about how to augment all that information and where to save it. Is that another database, another metadata layer or a data catalog? Or is it something more like a graph database that can track all the relationships between things? I know we're looking at that a little bit. I'm quite a fan of the graph database for this problem, because we have a lot of stuff that has connections but doesn't fall into neat categorical structures, and if we try to force it into them we're going to struggle and make compromises all over the place.
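The risk described here, an agent assuming which tables join, can be made explicit by recording guesses as unverified hypotheses rather than facts. This sketch assumes column metadata arrives as `(table, column, data_type)` rows, for example from a query against the warehouse's `INFORMATION_SCHEMA.COLUMNS` view (the query isn't shown). The shared-`_id`-column heuristic and the `candidate_joins` name are assumptions for illustration, not a rule from the episode.

```python
from collections import defaultdict
from itertools import combinations


def candidate_joins(columns: list[tuple[str, str, str]]) -> list[dict]:
    """Propose joins between tables that share an *_id column of the same type.

    `columns` holds (table_name, column_name, data_type) rows. Every proposal is
    marked "unverified": it is a guess for the agent (or a human) to confirm.
    """
    tables_by_key = defaultdict(set)  # (column, type) -> tables containing it
    for table, column, dtype in columns:
        if column.lower().endswith("_id"):
            tables_by_key[(column.lower(), dtype.upper())].add(table)

    proposals = []
    for (column, _dtype), tables in sorted(tables_by_key.items()):
        for left, right in combinations(sorted(tables), 2):
            proposals.append({"left": left, "right": right,
                              "on": column, "status": "unverified"})
    return proposals


cols = [
    ("ORDERS", "CUSTOMER_ID", "NUMBER"),
    ("CUSTOMERS", "CUSTOMER_ID", "NUMBER"),
    ("CUSTOMERS", "NAME", "VARCHAR"),
    ("TICKETS", "CUSTOMER_ID", "VARCHAR"),  # same name, different type: not proposed
]
print(candidate_joins(cols))
```

Keeping a `status` on each proposal lets a later step (a test query, or a person) promote it to a confirmed relationship instead of the agent silently building on a guess.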
So, just for people who haven't thought about that problem much: part of what we're discovering is that there are indeed lots of existing sources of metadata; in the data world, at least, it's very rich. If you just read it all together, you can start storing it in whatever database. But, Julian, as you referred to, we sometimes want to understand more than just a relationship like "is a child of". Ideally we want to be able to walk semantic ones too, like "this is similar to that one".
It's understanding the relationships. If I'm looking at a sales database, say my extracts from Salesforce, I need to understand that an account may have come from an opportunity, which may have been a lead, which had a contact in its past, and I still need to associate those contacts with the account, because I need to know who to call to see how my customer's doing. All of that information is absolutely gold dust when you're trying to build that report.
And some of those might not even be certain relationships, right? We think this might refer to something, or we have some knowledge about a table, but some of the column information has changed, because what was in the data catalog isn't the latest anymore. So being able to say "we think this is about the same table", having those kinds of relationships and being able to walk them, I think will be important.
Yeah, and schema drift happens more often than people think. The number of times we've seen data pipelines fail because an optional or configurable column in some SaaS system has gone away: we're no longer tracking that category on the customer, so the column was deleted, and suddenly the pipeline can't talk to it. And of course the semantic layer needs to keep on top of that and understand that those entries are no longer valid. So this is a
very complicated problem, right? It means you ultimately need something watching the data sources, watching the data warehouse tables, learning from what's changing and updating that model over time, ideally as close to real time as possible, so that when you give your agent the next problem it can go and solve it.
I feel like there's always been a need for semantic layers, but as we've spoken about, the knowledge is always in someone's head, or someone knows where to look for it. So, thinking more specifically about agentic systems, and we'll go to you, Julian: why does an agentic system need a semantic layer? Why are they suddenly the cool thing that everyone needs to have?
That kind of information is certainly very valuable for informing some of the activities your agentic system may need to do, but not all of them. Again, in our use case, if I'm asking Maia to edit an existing pipeline and commit it to the repo, that's not a use case where it needs a semantic layer. But if I'm asking it to build a new sales report or a new customer view, it absolutely needs to be able to find out which tables are there. That's fine in a toy example, in my nice dev database with only 20 tables, but where there are 200 or 2,000 it becomes a challenge. The agentic system fits in neatly if we give it a tool to say: go query the data catalog, go query the semantic information, go query the notes. The alternative is passing the whole set of notes and documentation in with every prompt, which slows it down, gets very costly and also just adds noise. So we need to provide some mechanism to search for and find the relevant information, rather than asking the LLM to read every piece of documentation on every corner of our landscape. That's where things like the graph
database come in: okay, how do I search, and how do I get the relevant concepts back?
One thing I find really interesting is that, at least at the metadata level, this is very closely related to data catalogs, and as we were pointing out, those are rarely truly well maintained, up to date and authoritative. But with LLMs we do have one advantage, because they can cope with data that's roughly right. It's better than nothing, and it will more likely let the model make the right choices. If additional columns were added to the schema, it can typically cope with that much better than traditional systems. So for me, that's one reason why, where every past attempt to build authoritative master data catalogs has failed, I think the modern semantic layer for LLMs will end up being much better. Not because it's necessarily a complete, accurate catalog, but because the LLM can deal with some uncertainty and some conflicting information, and it can deal with unstructured data.
Yeah, and for us it also means making use of that catalog more often. I've always found that data quality is directly related to how much downstream activity there is on that data, because people will come back, say "I can trace an error to this" and challenge it; then there's a reason to want to update it, because it's being heavily used. If I've got a database of shipping data and a parcel goes to the wrong place because the data is wrong, you can bet that's going to be challenged. A database of sales figures that relates to somebody's bonus is going to be the cleanest data in your warehouse. But if it's notes on which marketing events a person attended, and no downstream process breaks, that data could be messy and no one's really paying attention to it
because it's not being used. So the LLM using it gives the folks responsible for the catalog an incentive to get it clean, and gives us a reason to start paying attention to it.
I'm a believer that, over time (and it's happening with Maia now), the responsibilities of the people who currently spend most of their time building data pipelines, or pipelines of some kind that rely on this information, will slowly shift to the left and the right: validating the output of those pipelines, but also maintaining and looking after the catalogs and the semantic layer. I think we're going to see more and more pipelines created and more and more information automatically generated and pushed into these semantic models.
We've seen that shift in people's jobs and focus in other gen AI-enabled use cases. We've seen it with support-case answering: first, more of the incident response gets automated, especially if the data and the documentation are good, so people have more time to look at the knowledge management layer. But that knowledge management layer is now valuable not just to the human user but to the LLM, and we've seen how much good documentation affects it. We had a support case kicked back saying the model gave the wrong answer. We reviewed it and went and read the article that had been fed through in the RAG, and look: two people agreed with the LLM and two people disagreed. The root cause was that the article itself was ambiguous. I think we're going to see the same thing with the semantic layer: you spend less time building the pipeline and more time asking, "Have I got my documentation up to date? Am I providing accurate information?"
And I think this is maybe also one of those areas where AI just
isn't magic, right? Sometimes people think agentic AI is going to do everything, and do it so much better than any person. But what's often overlooked is that, yes, LLMs have some amazing capabilities, but they still need the context. Context, we continuously find, is king: it's the difference between getting great answers out and getting less reliable answers out. Just like you used to train your apprentice, and a lot of that was knowledge transfer to provide that context, I think that will shift, as you suggest, to spending time providing that knowledge in a form that both the LLM and humans can take advantage of.
It's actually a good mental model to think of the LLM as a new employee coming in, a junior employee. They're bright, they're enthusiastic, they've got an infinite work ethic, but they're very dependent on you giving them the right instructions and the right contextual information to do their job.
I think it's okay that, just as a real human being working on a problem doesn't get it right first time every time, it's going to be the same with our AI agents in whatever tool we use. You're going to do a lot of things with them, building data pipelines or whatever you're doing, and then you're going to check the outputs: "Yeah, that looks good. Oh, that one's not quite right." Maybe the root cause was actually that the semantic layer was wrong, so what you'll probably do is revert that, go back, update the information in the semantic layer and set it off again.
It also gives some opportunities in the validation space: once I've built my pipeline, or my chunk of code or similar, being able to say, "Okay, let's evaluate this semantically. I think I have built something that gives
me this report." Think of the number of times, in a management or auditing role, we go, "Okay, we think this dashboard is showing us these figures," and then, when you really clarify it: "Okay, well, here are the little conditions. Oh, it's excluding... our definition of paying customers is this." That sort of thing, actually.
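The idea raised earlier in this exchange, giving the agent a tool to search the catalog and notes instead of passing all documentation with every prompt, can be sketched with a deliberately simple word-overlap ranking. `search_notes` and the note texts are hypothetical; a real system would more likely use full-text, vector or graph search, as the conversation suggests.

```python
import re


def tokens(text: str) -> set[str]:
    """Lower-case word set; dots split schema.table names into parts."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def search_notes(query: str, notes: dict[str, str], k: int = 2) -> list[str]:
    """Return ids of the (up to) k notes sharing the most words with the query.

    A stand-in for a real search backend: only the matching notes would be
    handed to the agent, rather than every document on every prompt.
    """
    query_words = tokens(query)
    scored = [(len(query_words & tokens(body)), note_id)
              for note_id, body in notes.items()]
    hits = [s for s in scored if s[0] > 0]
    hits.sort(key=lambda s: (-s[0], s[1]))  # best score first, then by id
    return [note_id for _, note_id in hits[:k]]


notes = {
    "bronze": "The bronze schema holds raw extracts loaded from Salesforce.",
    "revenue": "Revenue means net sales after refunds; use gold.revenue_daily.",
    "customers": "The customer view joins accounts to their contacts.",
}
print(search_notes("build a revenue report per customer", notes))
```

Exposed to the agent as a tool, a function like this keeps the prompt small: the unrelated bronze-schema note never reaches the model for this question.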
I think so far we've talked a lot about what we would call more technical metadata, which is truly in the guts: what does this field in the database mean, or where can I find this kind of data? Now you're talking about higher-level semantics, and that's often where things get stuck. Those of us in the data analytics space have numerous times had somebody say, "Your dashboard doesn't agree with their dashboard," and you have to go off and ask: did we get it wrong, is there a mistake? Or, 90% of the time, it's actually an apples-to-oranges comparison, because there are a few quite subtle differences in what's being counted, which dates are used and what assumptions go into it, and that's buried somewhere, either in the data engineering code or in the dashboard. Logging where those business decisions were made, and those definitions, means that the next time somebody builds it, with or without an AI tool, hopefully they use the same definitions. A semantic layer and a gen AI assistant can actually help drive that and help you capture that tribal knowledge, rather than us ending up once again with two inconsistent, clashing sets of numbers.
So where are we going to get that from?
That's where I think you start to get the business ontology view: the stuff that needs to sit alongside "here's what's in this column". There's a whole lot of stuff that needs to be recorded, and it's not in a neat, consistent structure; it's almost a log of the decisions or assumptions made. Once again, this is where I see some opportunities for graph technologies: this decision has been made about this report, which uses these three things, which relate to these two or three entities, and I need to track that relationship so that when
somebody starts to query "customer", I can make my LLM aware that there's been a business decision about our definition of a paying customer, or a churned customer.
And we did focus a little on the lower-level database stuff, but this kind of stuff is likely to just be in a Confluence space somewhere, or several, or a common dashboard. It could literally even be Jira tickets. It could be anywhere.
Actually, I think BI dashboards, because with a lot of BI tools you end up building the underlying model, and then in the report you can often deduce the definition, because it will literally have the calculation with the name. Being able to extract that is certainly something for us.
That reminds me of earlier in my career, turning out dashboards and putting the notes into the dashboard in such a way that you couldn't crop or copy and paste the chart without taking the notes with it, so you couldn't get it into some exec presentation without the caveats. How do we make that stick to the data in some way? How do we keep them close together?
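The two-hop relationship described here (a decision is made about a report, and the report uses certain entities) can be sketched so that a question about "customer" surfaces the recorded definitions. The report names, decision texts and sources below are invented for illustration; `decisions_for` is a hypothetical helper, and a graph store could replace the plain dictionaries.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    text: str      # the business decision or definition
    report: str    # report the decision was made about
    source: str    # where it was found, e.g. a ticket or meeting notes


# Which business entities each report uses (hypothetical).
REPORT_ENTITIES = {
    "churn_dashboard": {"customer", "subscription"},
    "bonus_report": {"customer", "revenue"},
    "event_log": {"marketing_event"},
}

DECISIONS = [
    Decision("A paying customer has an invoice paid in the last 90 days.",
             "bonus_report", "finance wiki page"),
    Decision("Churn is counted at contract end date, not cancellation date.",
             "churn_dashboard", "ticket comment"),
]


def decisions_for(entity: str) -> list[Decision]:
    """Decisions recorded against any report that uses the entity (two hops)."""
    reports = {r for r, ents in REPORT_ENTITIES.items() if entity in ents}
    return [d for d in DECISIONS if d.report in reports]


for d in decisions_for("customer"):
    print(f"{d.report}: {d.text} (source: {d.source})")
```

Keeping the source next to each decision preserves the caveat alongside the definition, in the same spirit as notes that can't be cropped off a chart.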
So what do you guys think will happen over time? Do you think customers, for us and generally out there in the industry, will push to centralize this semantic layer in a single place? Or do you think we'll still end up in a scattered environment, where they have a BI tool with its semantic layer, an ETL tool with its semantic layer, and so on?
I suspect, and correct me if you disagree, that it's going to be forever fragmented. There's always going to be a reason why this week we want it all here, but next week over there is a more attractive place to put it.
It's actually a really interesting question, because given how important semantics are for agents working well, I actually think it might become a bit of a battleground. Whoever owns the layer that understands the most about the business will be able to have the best-working agents. So it's going to be interesting: is each tool or agentic system going to try to build up its own base? Or is there going to be a drive from customers to say, "No, we need an open standard, with everything centralized"? Though of course, then you have a centralized system that tries to serve lots and lots of different processes.
We just talked about the knowledge graph you're building; that could become so complicated. Then are we just back in the world where there's a data catalog that in theory is amazing but is never actually used?
Is it even going to become a bit of a political battleground over who owns the definitions? "I would like to own the definition of what we consider a revenue-generating customer." Why? Because it relates to compensation, or my targets, or my KPIs, or similar.
That is true. For those things I feel there might already be processes, because for anything financial or accounting there's probably already some kind of process. There's going to have to be a way for that central owner to audit that logic, or at least have visibility of it.
Actually, that's a whole other thing we haven't talked about yet. We've talked about it very abstractly in terms of how agents will use it, so should we talk a bit about that last piece, the visibility? I think we need to make sure people can actually see what's in it and go, "Oh, that's wrong, let me fix it." Have you thought much about how we would do that in the product?
Yeah, I guess it does depend on the ownership a little bit. But at the end of the day, if it's a graph, do you give the user a nice UI where they can look at the graph, traverse it, look at this table and that table and what they mean, and add those things in? I think you would need that. It depends on whether it's flat or a graph. But especially given how good LLMs now are at taking unstructured data and using it to update the model, I think it's also about giving users the ability and the tools, an interface to pass in their unstructured information and push it into the catalog as well. Then the conversation history of the interactions with the LLM that took this in, and
the inputs themselves, become valuable artifacts in their own right. Once again, when we get the whole dashboard discussion, "Okay, we want to audit these numbers because these things are inconsistent," the most valuable part of that process tends to be tracking your way back: who made the decision that we'd record it this way? That's probably in some meeting minutes, if you're lucky. But if we can capture that process, we can capture the assumptions that went in, even if it was the LLM making the assumption: "Well, I got this from the notes that said here's the revenue field I should be using, so I put that field in, and here are my other bits of business logic." That's explainability about where the number came from. And of course you can chat with the LLM and say, "Go back to your history and tell me why," just like querying the analyst: "Why does the report look like that?" "Well, because you told me to. I said I thought you should use this column," says my LLM, "but you insisted we use that one."
When our agent starts saying "I told you so", we'll know we're doing well.
You talked about one thing there: the unstructured documents, and then putting them into some sort of structure. I mean, even if it's a graph, it's still things that can be walked. Have either of you thought about how we could do that? It doesn't seem straightforward.
I mean, in a graph situation you have nodes and edges. The nodes are the facts or the details, and they can have attributes of their own: a type, or tags we could filter on, like "this relates to a table", "this relates to a column", "this relates to a decision". That also allows you to put
date stamps on them, and forgetting, memory or recency is an important part. Then you have the edges, and actually the edges can be distilled into a simple tabular structure: I've got a start node, an end node and the edge that connects them, and within that edge I've probably got a definition of what the relationship is. So if I were building one from a bunch of documentation, or a bunch of HR records, I might have node one as Julian, node two as Matillion, and the edge relationship as "works for", or "is the employer of", that sort of thing. You might also have some measure of the depth or strength of that relationship, such as whether it occurs in multiple places. But the key thing is that these are fairly simple facts; it's the way you join them together that's complex, and we want to be able to filter them based on the types of fact.
But isn't figuring that out hard? If I have a wiki or a doc that talks about some table, isn't it still going to be really hard for us to figure out which table it's actually talking about? There might be multiple warehouses, or it might not be in the warehouse at all; it might talk about another source. Have we thought about how to match that?
I suspect you can't just throw an LLM at it, say "give me the structure" and then feed that directly into a graph or a database. Part of it is going to be: if the information just isn't there, can the model come back? Do you converse with another agent on this, and does that model come back with, "I can see these tables, and I can see that they may relate to these tables in the data warehouse; which ones are they?" So it's still a somewhat human-orchestrated approach, but the model is doing a lot of the heavy lifting.
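The simple tabular edge structure described here (start node, end node, relationship, plus tags and date stamps) can be sketched as follows. The "Julian works for Matillion" edge comes from the conversation; the other nodes are invented, and the exponential half-life used to fade old facts is an assumed way of handling recency and forgetting, not a method the speakers specify.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Edge:
    start: str                      # start node
    end: str                        # end node
    relation: str                   # e.g. "works_for", "describes"
    recorded_at: datetime           # date stamp used for recency
    tags: frozenset = frozenset()   # fact types to filter on, e.g. {"table"}


def recency_weight(edge: Edge, now: datetime, half_life_days: float = 180.0) -> float:
    """Assumed exponential decay: the weight halves every half_life_days."""
    age_days = (now - edge.recorded_at).total_seconds() / 86400
    return 0.5 ** (age_days / half_life_days)


def neighbours(edges, node, now, relation=None, tag=None, min_weight=0.1):
    """Edges leaving `node`, filtered by relation and tag; stale ones are 'forgotten'."""
    result = []
    for e in edges:
        if e.start != node:
            continue
        if relation is not None and e.relation != relation:
            continue
        if tag is not None and tag not in e.tags:
            continue
        w = recency_weight(e, now)
        if w >= min_weight:
            result.append((e.end, e.relation, round(w, 2)))
    return result


now = datetime(2025, 1, 1)
edges = [
    Edge("Julian", "Matillion", "works_for", now - timedelta(days=30), frozenset({"person"})),
    Edge("wiki:revenue", "gold.revenue_daily", "describes",
         now - timedelta(days=180), frozenset({"table"})),
    Edge("wiki:revenue", "old.revenue_v1", "describes",
         now - timedelta(days=1200), frozenset({"table"})),
]
print(neighbours(edges, "Julian", now))
print(neighbours(edges, "wiki:revenue", now, tag="table"))
```

Here the 1,200-day-old edge falls below the threshold and is effectively forgotten, while the rest are returned with a weight an agent could use to prefer fresher facts.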
It's one of the concepts from some of our other work: we've been chatting with some of the folks at Manchester University, and they've been giving us some interesting ideas. One of the interesting spaces there was inferring which entities should exist, or are logically likely to exist, in a system like this, and then looking at the gaps. Okay, I can see I've got accounts, I've got customers, and I've probably got products they're buying, those sorts of things. I look at what I've captured: I've got the account, I've got the customer, but I don't see a product details table. That gap is actually a useful piece of information to go looking for. I can say: I should be looking for this; I'll look through the tables to see if there's anything that looks like a product table, even though it's not been mentioned, or just go back to the users and say, "Look, there seems to be some data missing; maybe we've got a source missing here."
I really liked what you proposed, that you actually just ask a human for clarification at the end of it. This is why humans are not going anywhere in this process. Our roles and our jobs may change over time, but there are just going to be gaps in what the LLM knows. Like you say, context is key: if there's a lack of context, assumptions should never be made, and someone who has the knowledge needs to step in. Getting the agentic framework to acknowledge its limitations and its gaps is going to be really hard to do with LLMs, because they hate saying no and they hate saying they don't know something; that's just not how they fundamentally work.
Very human, right? Yeah. As I've mentioned before to others of a certain age, getting the AI to say "I'm sorry, I can't do
that dave is our ultimate challenge um but that's and then also we you know we talk a lot about human to machine and machine to machine or agent to agent type learning but i see actually machine to human might be an interesting one though when does the llm know it needs a slack bulb and ask him a question about oh you know what was in that tape yes that that'll be a really interesting one to and how do we make sure it doesn't bug him too much well and especially because things keep changing right so so far we i think maybe talk more about how does the central semantic layer know kind of what's already out there but then of course i mean in our case like a data agent will actually literally create new data because that's the purpose of a data pipeline so then do you think like what the agent themselves will kind of keep updating with the newly generated to keep growing that i think they definitely i feel like they can they're they're the ones potentially building the pipelines now so they're the ones that know they've just you know potentially joined two tables together to create a new table that now needs its own semantic meaning or extra additional descriptions and metadata on top of so they can do that but at a fundamental system level as well can the platform do that you know a more deterministic way of like two things have been joined together now i'm going to relate those in a in a graph maybe somewhere or something like that ideally in most things you'd still do stuff deterministically programmatically but agents can infer so much more if they already have the awareness of your project in the context of the question you ask them you know your platform might not know that at runtime but the agent does so that can help to facilitate that knowledge we're in a space where the agent can test hypotheses in some ways it can what do you mean but so the agent sees the information semantically so i think these tables are the right ones i think this is how they connect 
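The entity-gap idea from earlier in this exchange (infer which entities a system like this should probably contain, compare them with what has actually been catalogued, then either search for candidate tables or ask a human) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Matillion or the Manchester researchers implement it: the expected-entity list, the keyword-matching heuristic and the function names are all hypothetical.

```python
# Hypothetical sketch: compare the entities a domain is expected to have
# against the entities already catalogued, and look for candidate tables.

# Assumed domain model: entity name -> keywords that suggest a matching table.
EXPECTED_ENTITIES = {
    "account": ["account", "acct"],
    "customer": ["customer", "cust", "client"],
    "product": ["product", "prod", "sku", "item"],
}


def match_entity(table_name: str, keywords: list[str]) -> bool:
    """Crude heuristic: a table 'looks like' an entity if its name contains a keyword."""
    name = table_name.lower()
    return any(k in name for k in keywords)


def find_entity_gaps(catalogued: set[str], all_tables: list[str]) -> dict[str, list[str]]:
    """For each expected entity not yet catalogued, list the tables that might hold it.

    An empty candidate list is the signal to ask a human rather than guess.
    """
    gaps = {}
    for entity, keywords in EXPECTED_ENTITIES.items():
        if entity in catalogued:
            continue  # already captured, nothing to look for
        gaps[entity] = [t for t in all_tables if match_entity(t, keywords)]
    return gaps


if __name__ == "__main__":
    catalogued = {"account", "customer"}
    tables = ["crm_accounts", "crm_customers", "erp_sku_master", "web_sessions"]
    for entity, candidates in find_entity_gaps(catalogued, tables).items():
        if candidates:
            print(f"Missing '{entity}': candidate tables {candidates}")
        else:
            print(f"Missing '{entity}': no candidates found, ask a human")
```

A real system would presumably replace the keyword heuristic with something richer, such as column profiling or an LLM judging whether a table looks like a product table; the point of the sketch is only that a gap with no candidates is where the agent should stop and ask.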
In terms of what we're building: the core function of the agent is to build data transformations, so we could allow it to take its best-guess tables and, just the same way a human would, try to build a pipeline that joins them together and look at the output. Is there any data there at all when you make that join? Even if the join works, when you aggregate or count it up, do you get the volume of results you'd expect? If not, it's going to try something different.

Yeah, and I guess the semantic layer helps, right? Because now, as you said, when I look at the data I don't just check whether there is data. I can literally say, hmm, I expected the answer to be an email address or a postcode or whatever, look at it, and go: this doesn't look like the right thing. You need a semantic understanding of both the source and the question to know what to expect.

I mean, the semantic layer is also where you might deal with some really nasty problems like that. You might have heterogeneity in the data, meaning inconsistent types of data in the same table, because it came from different sources, was slightly transformed or joined together, or the process for what goes in there changed a couple of years ago. How do you deal with that? Normally our expert data team has some notes that say: remember that we hijacked this field for a little project; or we flipped over from one source system to another, so some of these columns aren't being populated, or they've got different categorical values; or our sale amount now includes sales tax, whereas in the previous system it didn't. Knowing that one is a real gotcha. That's the sort of thing you can include in the semantic layer, to say: look out for these little ifs and buts in how our data was processed.

That would be very cool. And I think this is one of the things where, before LLMs, this kind of annotation was really difficult. A traditional database engine could probably handle temporal data, schema versions and whatnot, but not the half a dozen gotchas of "always remember these odd little things that happened over time." This all makes me wonder about one thing. We've talked about these magical semantic layers, and there's an interesting question we touched on a little earlier: will there be one company-wide for all agents, one per product, or one per department? And then who owns it?
Any thoughts, Sam? What's your prediction?

That's a tough one. I think if you model this stuff as a graph, you have an opportunity to form quite complicated relationships between different domains or areas of a business, and I suppose it depends how that business splits up what it does. If teams share a lot of the same information and need a lot of the same semantic data, it's natural to collect it once, store it once and share it. If teams use the same metadata from an underlying database, but in a slightly different context that completely changes the way the agent has to think about that data, then you'd want a separate layer, with a separate set of meanings about that data for each context. What I reckon is that it has to be owned by the people whose business process that topic covers, so it flows naturally from the way they do business and the way they operate. Every step you move away from the person who's actually doing the process, or making a decision based on the information, the more likely you are to get inconsistencies, which then lead to semantic errors, because the way the data is captured, reported and processed doesn't reflect reality. So I think you've got to have a system that can accommodate departmental or domain-specific nodes, at least in terms of what gets fed in, and then somehow bring it all together. Where there's friction, hopefully that becomes a forcing function for some tough conversations about the fact that teams haven't been reporting consistently against each other all this time.

It's going to be interesting, I think, because I don't think you can expect each department to build up its own from scratch, right? The challenge is figuring out what should be shared and what shouldn't.

Yeah, that sounds like a really interesting process. I'm sure we've seen it many times over the years: at the interface between sales and marketing, the definitions of the things being handed over are hotly debated, or people just quietly hold differing opinions on them. Can you make the catalog itself an agent and have it talk to another one? Could they come to an agreement, or debate among themselves about what they think should be there?

I mean, there is actually one piece that's a good use for the LLM: taking these two sources of truth and finding where they differ and where the other inconsistencies are. I think that might be the process. But I like the idea of "let my agent talk to your agent."

Yeah, they'll hash it out. Nobody really wants to have that discussion in a three-hour meeting on what counts as a lead or not. I can definitely see this crazy world of agents. I mean, Agents of Data, this podcast, is what we're talking about: a crazy world of agents talking to agents.
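The "two sources of truth" comparison raised here does not have to start with an LLM: a deterministic diff can surface exactly where two teams' definitions disagree, and only those conflicts then need an LLM, or a meeting. A minimal sketch, assuming each department's semantic layer can be exported as a plain term-to-definition mapping; the glossary contents, the `Conflict` type and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Conflict:
    """One term on which two glossaries disagree (None means the term is absent)."""
    term: str
    left: Optional[str]
    right: Optional[str]


def normalise(definition: str) -> str:
    """Ignore case and extra whitespace so trivial formatting differences don't count."""
    return " ".join(definition.lower().split())


def diff_glossaries(left: dict[str, str], right: dict[str, str]) -> list[Conflict]:
    """Return terms that are missing from one side or defined differently, sorted by term."""
    conflicts = []
    for term in sorted(left.keys() | right.keys()):
        a, b = left.get(term), right.get(term)
        if a is None or b is None or normalise(a) != normalise(b):
            conflicts.append(Conflict(term, a, b))
    return conflicts


if __name__ == "__main__":
    # Hypothetical exports from two departments' semantic layers.
    marketing = {
        "qualified lead": "Contact who downloaded a whitepaper",
        "customer": "Account with at least one paid invoice",
    }
    sales = {
        "qualified lead": "Contact with a booked discovery call",
        "customer": "account with at least one  paid invoice",
        "churned customer": "Customer with no paid invoice in 12 months",
    }
    for c in diff_glossaries(marketing, sales):
        print(f"{c.term!r}: marketing={c.left!r} sales={c.right!r}")
```

Here "customer" matches once case and spacing are ignored, so only "churned customer" (missing on one side) and "qualified lead" (defined differently) would be handed on for the agents, or the humans, to hash out.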
A2A and MCP are things we want to cover later as another fun topic, and I look forward to that one. But yeah, Frank, Julian, thank you so much for coming on Agents of Data. Cheers.
Thanks for listening to the Agents of Data podcast, brought to you by Matillion, the intelligent data integration platform. To discover agentic data engineering, visit matillion.com/maia.
Don't forget to subscribe, and let us know what you think of everything we discussed over on Matillion's LinkedIn and Instagram.