Mostly Unstructured
00:00:00 Speaker: All right. It's another episode of the Mostly Unstructured Podcast. We're back, we're back. Ed and Clay, back on the mic. Yeah, exactly. Either we are doing a pretty good job or there's nobody else dumb enough to get on here. I think the latter, but let's roll with it. We'll go with that. Yeah, we'll go with the latter. Let's just jump right in. So what we're going to talk about today are some trends. And it seems like it's a trend to put out trends every year. Yeah, there is. That's a hard thing. You've got it in, like, what, baby names, all kinds of stuff. Maybe we'll release a trends-of-trends thing. I think that might be a good hook, and technology is rife with that. Right. And we're big fans of Deep Analysis. Yeah, absolutely. They've been around for a while, they do a great job, very trusted. They released their 2026 trends, which we're going to date our podcast by saying the name, right. The date, right. The trends this year, they've put out eleven, which in itself could be a trend, right? Instead of ten, you get eleven. They turned it up to eleven. It's kind of like finding the extra fries in the bottom of your bag. Do you like those fries? Yeah. Oh, score. And this trend report is available to anybody. Yeah, absolutely. So let's talk about some of these. We'll go in not in any particular order. Sure. One of the ones they suggest up front is that enterprise-grade AI governance will become non-negotiable. Absolutely. And I love what they say: as AI scales, so does the chaos. Yeah. And really, this unstructured data collection is rife with opportunity for problems with security, right? Compliance issues. So let's talk about, what is governance in general?
When we're talking about it in the AI space, what are we talking about? Sure. Well, one of the things that I would start with is, if you think about the roots of KeyMark and the roots of how we serve our customers, one of the things about the content industry is you are typically storing a corporation's most sensitive data. Most of what we do is in highly regulated environments around sensitive data. Like HR information, you know, patient data. We're in a number of banks, so you're talking about financial records. And when you think about those things, we never go through a meeting with a potential customer without diving deep into governance, and how things are secured, and all those things. So I think that with the hype around AI, and the fact that at a high level everyone understands that AI needs to consume data, there is sort of this rush to, we'll just get the data, right? We'll just lift it out. We'll get the data. And essentially ripping off this governance layer that's been there for very specific reasons, and just saying, well, we'll just pull all that data over here. And that's obviously wildly concerning. And I think, to your point, as people are in this kind of rush to, we need to stand up this agent, or we need to do this, you're starting to find these kinds of chaotic episodes that are going to necessitate, I mean, you're going to have people from legal and compliance coming out of the woodwork in these organizations saying, whoa, whoa, whoa, we've got to have a governance layer. I mean, it's mandatory. So when you talk about that, we're also thinking external and internal, right?
Like, from an internal standpoint, nobody wants the Trojan horse getting inside the walls and being able to get to your information. But even inside the walls of your organization, not everybody needs access, from an agentic standpoint, to query the corporate LLM, as we might call it, and go, hey, I need this, and be able to get somebody else's personal records or something I shouldn't have access to. So that's part of it. I mean, to your point about HR, right? It would be really nice if I work in HR, or I'm a senior executive, to be able to query an agent and just say, hey, I need payroll information for XYZ. Well, without an appropriate governance methodology, maybe now we've exposed all the payroll data. Even, to your point, if it's behind our firewall, it's almost worse to me if all of our employees are now combing through other people's payroll data. So, you know, it's funny, there are people that will look at this and kind of roll their eyes. But what has happened is AI has caused this sort of throw-the-baby-out-with-the-bathwater moment. Let's just get the data. We have to show success. We have to please the boss. We have to say we've got this AI. And that comes with risks that people are only now uncovering. So we're old enough to remember the old filing cabinets, right? Oh, yeah. It's more than just leaving the cabinet drawer unlocked. Yeah, that's exactly correct. I mean, version control. Sure. That's another element of governance. It's not just security. It's also making sure that what you're ingesting is being put in the right place at the right time. Correct. Right. Correct. And then just having those confidence thresholds, knowing that what is being ingested is what we want.
And it's not going to create hallucinations, that we're not extracting stuff we don't want. Right. So these are all part of the governance layer. And again, it's been a differentiator, right? Oh, we've got this. But now I think Deep Analysis is dead on: it's table stakes. If you don't have that, you might as well not be in the business. Yeah. I mean, you've got two really significant risk points. One is what we've talked about, which is data that should be governed and secured just being generally available, even internally, let alone externally. And secondly, agents just spitting out bad data, bad information, and that, again, both internally but also externally, can cause material harm. And the last thing I'm saying is that AI is bad. AI is amazing, and it's only getting more so. It's that it has this core dependency on the data that you feed it. And that data needs to be not only accurate, but you have to have contemplated the security model. Yep. So let's move into another trend. We could dive into all of these; we're going to cover them a little bit, on a surface level. So another one of the statements they make is that knowledge workers will relearn to read and software vendors will rediscover enterprise. I love this idea that professionals, well, a lot of people, not everybody, have developed this habit of summarization. Yeah. And then just using that as your final word. And this shift backwards, in a way, of going, okay, I'm going to use that as a starting point, but I'm also going to actually go back and read some of this. And I think that's what's interesting to me about this: it's so connected to human nature. Yes.
Like, I don't think anybody would, if somebody came in and just did everything you wanted, cleaned your house, made your food, made your bed, did everything for you, would you turn that down? No. It sounds great so far. Sounds amazing. Yeah. We can easily fall into a bit of laziness. And I don't know about you, I've seen that, I've had to fight against that myself. I'm not pointing a finger at anybody else. It's really easy to go: copy, paste, boom, done. Yeah. And I think that trend is true. I think people are starting to see the pitfalls of just that copy-paste, take-it-all-as-gospel approach. And then, yes, it's great to have a twenty-page report summarized for you. Especially at the executive level, that's wonderful. But at some point there's a lot of stuff missing in there. What are some things you see that that creates, that laziness? Like, what are we missing when we do that? Yeah. You know, I think that's such a good point, because realistically, one of the things that we know is, over the last five, ten, fifteen years, what's happened to the human attention span, the statistics behind it are terrifying. And so, to your point, if you're going to allow me to not have to read ten pages, and you're going to give me two paragraphs and say, well, this is all you need to know, human nature is like, perfect. That's great. And I think what's happening is that your source data, right, one, it better be right. And interestingly, as you and I have talked about, particularly you as a marketer, when I go to Google today, what I primarily get is this AI summary, right? Which is challenging for marketers, and we're working through that.
What I think is an interesting exercise to do is, normally we go to Google to find things out we don't know. Go in and type in some things you do know, see what it says, and then read it. Yeah. And what I found is that it's kind of directionally correct. One of my favorite sayings is directionally correct. So there's a lot of good stuff in there. There's also some stuff that's anywhere from kind of off all the way to just not true. Yeah. And I think the point that we're both trying to emphasize is, look, if you don't get the source data right, and have a means by which to determine that, you have an obligation to question that which you're reading. Right. An obligation. Because, to your point, if I go copy-paste that with the intent of sending it to someone of importance, for example, boy, I'm living with some risk. And that's the thing about what's happening: people are just going, oh, I'll just use this LLM and I'm good. It'll all be great. And it'll be mostly great. What's funny is I think we missed the fact that humans need training just as much as the AI does. It's not really the AI's fault. I've got to tell this story. I have a friend who's in mortgage lending, and he's like, hey, what are you doing these days? And I explained it to him, and he's like, I hate AI. I mean, he got really upset about it. And I'm like, whoa, whoa, whoa. Who peed in your cornflakes? That's the hill you're going to die on? Like, why do you hate it so much? And he went into this whole story. He works with a very, very large entity, who will remain nameless, that does underwriting and stuff for him. And he knew all of the people there that were processing loans.
And these account folks now are using AI to pre-approve, and he was getting stuff back that should have never been approved. He was getting other things back that were denied when there was no reason for them to be denied. And it was the AI's fault. And I said, well, no, not really. It's said company's fault for not training their agents to still do the job they've always done and to pay attention, because he made the comment: they could look at that information, just a couple lines of it, and go, well, you wouldn't deny that for that element of that piece of data. So it's not the AI's fault, and I think that's an important point to make. Well, it's funny. I'm going to date myself. Back when I still had hair and I was heavy into the OCR industry, which has kind of evolved, and now we're talking about IDP, you would go talk to prospective customers and they'd say, well, the OCR has to be one hundred percent. It's got to be one hundred percent. Well, it's not going to be one hundred percent. And the funny thing about that was, the thing you would sort of have to ease your way into with the customer is: your people aren't one hundred percent either. We're taking what somebody is typing today, and now we're going to OCR it. Neither is one hundred percent. But if you do this and say, I can OCR it at, pick your number, ninety, ninety-four, ninety-five percent, but you have a human look at that remainder, then our chances of one hundred percent in terms of the output are really, really high. And there's a lot of similarities to what we're talking about. We talked, I think, on a prior podcast about the idea of human in the loop. Your buddy's example is the precise example, right?
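That threshold-plus-human-review flow can be sketched in a few lines. This is a rough illustration only, not any particular product's logic; the threshold value, field names, and document IDs are all invented:

```python
# Hypothetical human-in-the-loop routing: OCR/IDP results at or above a
# confidence threshold pass straight through; everything else is queued
# for a person to verify. All names and values here are illustrative.

CONFIDENCE_THRESHOLD = 0.94  # "pick your number": ninety, ninety-four, ninety-five percent

def route_extraction(result):
    """Return 'auto' for high-confidence results, 'human_review' otherwise."""
    if result["confidence"] >= CONFIDENCE_THRESHOLD:
        return "auto"
    return "human_review"

batch = [
    {"doc_id": "inv-001", "field": "invoice_total", "confidence": 0.99},
    {"doc_id": "inv-002", "field": "invoice_total", "confidence": 0.71},
]

queues = {"auto": [], "human_review": []}
for r in batch:
    queues[route_extraction(r)].append(r["doc_id"])

print(queues)  # inv-001 flows through, inv-002 goes to a reviewer
```

The point of the pattern is exactly what Ed describes: the machine handles the bulk, a human handles only the uncertain remainder, and the combined output gets very close to one hundred percent.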
What we have to guard against, though, is this sort of human reaction of, oh, the AI is doing it, so I don't need to look at it. And that can't be the process. It's that laziness factor? Yeah. And I think, going back to our earlier point, it's that assumptive, well, it's AI, so it must be. Right. Right. If it had the right data, then it probably would be. That's right. Yeah. And I don't want to say that those people are being lazy. They may well have been trained to do it that way. Sure, it's a process issue. Hey, this is how we do it, and I'm not going to mess that up. No. Right. It is interesting how we all behave with these new tools and how we all have to adapt to them. So let's cover a couple more of these before we go. Sounds good. This one is near and dear to our heart. Oh, yeah. Yep. I know what's coming. This unstructured data gold mine. Now we're talking. Yeah. And Deep Analysis is saying that the gold rush will finally begin. I hear that and I go, finally? Like, where have you been? There's plenty of people like us going, oh, this is where the gold lies, in your unstructured data. The vast majority of your organizational knowledge is stuck. It's trapped. Why the wait? It's got to be hype cycle blindness, or deafness, maybe. Well, I mean, first of all, I love the term gold rush, because I think you and I, maybe even in this podcast, talked about this gold that our customers are sitting on in terms of the unstructured content. And I was visiting a customer the other day and, having been in this industry a long time, explaining the amount of time spent trying to convince people that content is cool.
And pretty much having to acknowledge that nobody thinks it's cool. Yeah. Until now. Right. Yeah. And what's happened, honestly, is that we have for years and years and years treated unstructured content as: I store it, I may retrieve it for some period of useful time, and at that point it becomes an archive, one that I may have to have. And so, you know, whatever. But what's changed is the ability to now reach into it for useful information. Right. I remember getting a call many, many years ago in healthcare, and we were talking about storing all this clinical information. One of the questions was, can I do a query for every baby that was born breech between X and Y dates? And I said, well, if you put breech as a metadata point on every document, then sure. But you wouldn't. And so there's always been sort of this opacity to the content store. Yes, it's great, I could pull up all those charts and look through them to find it, but that's not what the ask was, right? Now, all of a sudden, you're saying, well, I could reach in and find all kinds of useful information. I could add it to the data lake where I'm doing analytics against my EMR data and my base data. And that extends to, think about claims, right? You've got insurance claims, you've got all kinds of data in the line-of-business system, the claims system. But there's also all this information that got submitted with the claim itself. The police report, or images, right? All those kinds of things. Those are in the archive, and somebody could pull them up.
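The breech-babies story boils down to the difference between querying fixed metadata and searching the content itself. A toy sketch of that contrast; the records, field names, and chart text here are entirely made up:

```python
# Toy contrast: a fixed-metadata query versus a search over the extracted
# text of the documents themselves. All records are invented examples.

charts = [
    {"patient_id": "A-17", "doc_type": "delivery_note",
     "metadata": {"date": "2024-03-02"},
     "text": "Vaginal breech delivery at 39 weeks; no complications."},
    {"patient_id": "B-42", "doc_type": "delivery_note",
     "metadata": {"date": "2024-04-11"},
     "text": "Routine cephalic delivery; mother and baby discharged."},
]

def metadata_query(docs, key, value):
    """Old-style lookup: only finds what somebody indexed up front."""
    return [d["patient_id"] for d in docs if d["metadata"].get(key) == value]

def content_search(docs, term):
    """Reaching into the content itself, no upfront indexing decision needed."""
    return [d["patient_id"] for d in docs if term.lower() in d["text"].lower()]

# Nobody put "breech" on the documents as a metadata point, so the
# old-style query comes back empty:
print(metadata_query(charts, "presentation", "breech"))  # []
# Searching the content itself surfaces the right chart:
print(content_search(charts, "breech"))  # ['A-17']
```

Real systems would use semantic search rather than a substring match, but the opacity problem and its fix are the same: the answer was always in the documents, just not in the index.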
But if you started to surface data from that in concert with claims data, well, everything from fraud modeling to anything else becomes a possibility. So I think as people are starting to realize that, kind of the cheesy gold rush analogy, there's gold in them there hills. And there are hills, right? This truly now is exciting. Now, it comes with the caveats we talked about earlier. You can't just go, all right, I had all this governed for particular segments of my company, and now we're just going to reach in and expose it everywhere. But the fact that today there is a real ability to go in and turn that into useful data, for agents, for analytics, for automation, and so on, I mean, that is the gold. And that's why I think people are so excited. It's why we're so excited, because we've been working with customers for a very long time, helping them store all this critical information that we now can exponentially expand the usefulness of. Yeah. So let's do a little bit of comparison really quick before we move on to the last topic. There are plenty of solutions now that would claim there's no need for intelligent document processing or ingestion of all of that data. Yeah. I can just layer it over, it's agnostic. And, you know, we know people in this industry, and it's great. I'm not saying that it's bad. I'm just saying it's different, and I want to point out some of that differentiation. I think you could really help with that. So just because I can layer an agentic solution over the top of all of my content, and it can be agnostic to the source, and I can query off of that. Sure. That sounds like some people might go, well, then why do I need to ingest all this into my data warehouse or data lake? Like, why do I need IDP?
We talk about that being a first mile of AI and of intelligent processing for organizations. Where's the need for both of those? Well, I mean, there's multiple components, right? When you think about IDP, one of the first things I think about is the sort of always-on, go-forward, transactional set of content that's coming into the organization. We've spent years and years helping people with inbound transactional information, whether it was claims in insurance, or charts in healthcare, or mortgages in banks, those kinds of things. One of the things that's happened is that, because of the capabilities that IDP has today, you can expand the set of solution types. So IDP, for as powerful as it was for years and years and years, where did it get implemented? AP, right? Mortgage, etc. And it really didn't get too far outside of that, because as the volumes came down, the return on investment came down with it, because it took so much to get set up. Well, that's what's changed, right? The setup to have a useful IDP solution for any number of document types is reduced drastically. And so there's this go-forward of, wow, I can tackle more problems with it. Then there's the fact that you think about scale and performance. A good IDP solution can get into the tens of millions, hundreds of millions of documents per year. And while that's not critical to everyone, it's certainly critical for many people in the enterprise. And so scaling and performance matter.
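The first mile being described, classify the incoming document, then extract with a per-type schema, can be sketched at its absolute simplest like this. The document types, keywords, and regexes are invented for illustration; a real IDP system would use trained models in both stages rather than keyword rules:

```python
# Minimal classify-then-extract sketch of an IDP pipeline. Everything
# here (types, keywords, patterns) is illustrative, not from any vendor.
import re

CLASSIFIER_RULES = {
    "invoice": ["invoice", "amount due"],
    "claim": ["claim number", "policyholder"],
}

EXTRACTORS = {
    "invoice": {"total": re.compile(r"amount due:\s*\$([\d.]+)")},
    "claim": {"claim_number": re.compile(r"claim number:\s*(\w+)")},
}

def classify(text):
    """Stage 1: decide what kind of document this is."""
    lowered = text.lower()
    for doc_type, keywords in CLASSIFIER_RULES.items():
        if any(k in lowered for k in keywords):
            return doc_type
    return "unknown"

def process(text):
    """Stage 2: run the extractors that belong to the classified type."""
    doc_type = classify(text)
    fields = {}
    for name, pattern in EXTRACTORS.get(doc_type, {}).items():
        match = pattern.search(text.lower())
        if match:
            fields[name] = match.group(1)
    return {"doc_type": doc_type, "fields": fields}

print(process("Invoice #99. Amount due: $1042.50"))
# {'doc_type': 'invoice', 'fields': {'total': '1042.50'}}
```

The classification stage is the piece Ed notes many newer tools skip entirely by assuming you already know the document type; at tens of millions of documents a year, both stages also have to be engineered for throughput, not just accuracy.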
I think the other thing is, I see this fork in the road in IDP that I'm hopeful will return to the mean a little bit, or at least start to get categorized. Many of the IDP solutions are undertaking the, just give us your stuff, feed it in, and we're going to magically read it and spit it out, and everything like that. And the tools are great. I mean, they really are great. However, they work under a lot of assumptions. Many of them assume you already know the document type, so they don't have the ability to classify it. Many of them are using kind of these back-end open-source tools that know what they know. They're not necessarily scaling for production performance. What if you have a ton of document types? What if you have large scale? So I like to think that there's going to be some categorization of IDP solution types, because there's now almost five hundred vendors being looped under the definition of doing the same thing. Oh yeah. Not only is that not fair to those of us in the industry, it's particularly unfair to the customer, because the customer is not necessarily armed with the questions to ask to distinguish. Because, I think we talked about before, it will demo great. Oh yeah, it will demo great. But when you turn it on for real, can it handle what it needs to? Yeah, a good segue into the last of the trends I want to talk about, which is the fact that they make the comment that purpose-built machine learning models will be making a comeback. Right. And that's an interesting thing, because you just talked about it. It's nice that, oh wow, you can just plug it in, the power of it, if you've ever seen a real demo.
Not one you just kind of refer to as a marketing setup, that we messaged what to do and recorded, but a real live one: dropping in any kind of document, multiple types, and watching the AI separate it, extract it, classify it. It's absolutely mind blowing. And it does it incredibly well. That's right. Very little training needed on that. But Deep Analysis is saying, wait a minute, there's still a place for those. We need those really predictable outputs. Is it going away? What do you think? Do you agree with them? Disagree with them? Is there still room? It seems like it's not an or, it's an and. It's definitely an and. And there's a reason we like those guys. They know what they're talking about. Yeah. And I think their point on this one is absolutely spot on. The idea that you can just go out and leverage a very generic LLM and say that's going to be my backbone of how I'm going to classify or extract or what have you just belies reality, right? We've done a lot of work over the years in the mortgage space, as I mentioned, and we've done work in the healthcare space. There's just specialty language in those documents, specialty content. There's things that won't translate well. So the idea that you're going to be able to just say, oh yeah, we'll just use this big generic model and that's going to get it done? And unfortunately, that's not only happening with IDP vendors. There's a ton of homegrown going on. People in IT saying, well, I can go grab this tool and an LLM, and I'll just slam this together. Well, okay. What if it's wrong? Did you build an interface to verify that? How are you going to validate it? Do you do lookups into your existing databases? And that might sound old school, but I prefer to think of it as the learnings of production reality. Right.
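The "do you do lookups into your existing databases" question is the old-school check that catches a plausible-but-wrong extraction before anything downstream trusts it. A minimal sketch, where the "database" is just a dict standing in for a real system of record, and the vendor table, IDs, and field names are invented:

```python
# Validating extracted fields against a system of record. The master
# table here is a stand-in dict; all names and IDs are illustrative.

KNOWN_VENDORS = {"ACME-001": "Acme Corp", "GLOBEX-002": "Globex"}

def validate_extraction(extracted):
    """Cross-check an extracted vendor against the master table.
    Returns a list of error strings; empty means the record passed."""
    errors = []
    vendor_id = extracted.get("vendor_id")
    if vendor_id not in KNOWN_VENDORS:
        errors.append(f"unknown vendor_id: {vendor_id!r}")
    elif extracted.get("vendor_name") != KNOWN_VENDORS[vendor_id]:
        errors.append("vendor name does not match master record")
    return errors

# A plausible-looking but wrong extraction fails the lookup instead of
# flowing silently through the process:
print(validate_extraction({"vendor_id": "ACME-001", "vendor_name": "Acme Inc"}))
print(validate_extraction({"vendor_id": "ACME-001", "vendor_name": "Acme Corp"}))  # []
```

Anything that fails validation would typically land in the same human-review queue discussed earlier, rather than being rejected outright.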
And so when you go to tackle the problem, the idea of a purpose-built, to your point, ML component, we know that works well, because it's been in place in the IDP industry for a long time. But there's also a reason why things have evolved, right? If I can now marry these two, or use things like RAG, retrieval-augmented generation, where I can layer on top this more-specific-to-me, more-specific-to-my-business language model or RAG model that is going to enhance what that machine learning component does, now I'm in kind of a utopian state in terms of the accuracy of my output. And the great thing is, as much as maybe I made that sound complicated, the speed to get that up and running is a small percentage of what it used to take. Ten years ago, if you wanted to do mortgage, you had to feed it thousands and thousands of examples to learn on, etc., and that's just not it anymore. And so it's really the combo of the two that unleashes the full power of what IDP is about. And I think the Deep Analysis guys get that absolutely right. I think what we're seeing, and to be honest with you, a big reason for this podcast is to further educate, so people don't buy into the magic demo. You can't spell magic without AI, you know. The magic demo is what's leading to those ninety-five percent failure stats, like the MIT study, right? It's going, yeah, that looks great, I've got to have that. Well, of course it looks great. But I think what you're seeing is people are now being forced to take a step back and go, all right, there's a reason there are tried and true approaches to this. My hope is people don't hear that as stodgy.
What I'm saying is, when you combine kind of the old and the new is when you get to the next level. Yeah. Well, I know we're hugely appreciative of the work that a lot of these analysts do, including Deep Analysis. Yeah, absolutely. Totally recommend checking them out. They've got a lot of knowledge and years behind them. So with that, I think we'll wrap this podcast up. Yeah, that sounds good. Again, hope you learned something. And if you heard something brilliant, then just assume we planned it. That's exactly right. All right. Thank you. Yeah, man. We'll see you. Appreciate it. Sounds good.