MATTHEW LUNGREN: All right.
Welcome to another episode of The Pod.
Today's guest, if you're a regular listener
of this podcast, really needs no introduction.
But it's Peter Lee, who's one of my colleagues here at Microsoft,
and he's the president of Microsoft Research.
And he's focused on steering the company's worldwide network
of research labs, incubating breakthroughs,
research-powered business and artificial intelligence,
computer foundations, health and life sciences.
So covering the entire breadth.
Before joining Microsoft, Peter headed the Transformational
Convergence Technology Office at DARPA,
and also served as chair of Carnegie Mellon University's
Computer Science Department.
He's a fellow of the ACM, and actually
a recently elected member of the Mayo Clinic Board of Trustees.
And, of course, you may know him as well
from his very influential book from 2023,
called The AI Revolution in Medicine: GPT-4 and Beyond.
And then, of course, now, the follow-up
podcast series, which we'll get into today.
So welcome, Peter.
Thanks for joining us.
PETER LEE: Oh, thanks, Matt.
Or should I say Dr. Lungren?
It's great to be here.
MATTHEW LUNGREN: Only part time now.
So that.
So, Justin, I know that you and I have been trying to keep up,
as we always do, with this show of what's the most current event
and how are we following some of the trends
and then how does it impact health care.
I feel like the last few days has been even more
frenzied than usual, and I'm hearing that July, overall,
will be crazy.
But I think we're recording this around the time of the Grok
launch.
So I think it's probably the first topic that, at least,
I'm interested in covering.
So I don't know if you have some of the latest info.
JUSTIN NORDEN: So I've just pulled up now.
They just released their benchmarks.
This will be very familiar to people who've been listening.
New model comes out, more money, more compute,
better performance all state.
Just a few other parts.
I missed a few other things.
The CEO has stepped down.
There's been some differences on alignment and training,
and some controversy around the model.
We won't dive into that today, but a few
of the other things, just to go fly
through from the data, new model kind of jumping
to the top of the benchmarks.
This is starting to outcompete o3 Pro, Gemini 2.5 Pro,
et cetera, on a multimodal benchmark putting together here.
And then I'll show one more here talking about some
of those specific exams, ARC-AGI, et cetera.
And I know, Matt, actually you had some extra comments you
wanted to cover on this one.
MATTHEW LUNGREN: Well, yeah.
I think this is a great opportunity.
Now that we have Peter with us too, it's just that, I think,
we talk a lot about bigger models, larger clusters,
maybe even more data.
And that pre-training world.
And I feel like, especially since o1, if not o3, I've used
DeepSeek and some others, we really have started to think
more about post-training and test-time compute.
And I don't know how you're seeing this.
I think there was that Noam Brown quote from OpenAI who
basically said that just a few seconds of letting the model,
quote-unquote, "think" is equivalent to scaling up
by a couple x.
How are you looking at this?
And then do you feel like this is yet another opportunity just
to continue to infinitely scale at some level
to the limits of physics, I suppose, and electricity
to get to that next big breakthrough?
PETER LEE: Well, I think for sure,
at least for people who are involved in the more research
side of the development of these systems
that the hot area and the focus is on post-training
and on inference time compute.
I think that's where a lot of the thought
is going in, because it's getting harder and harder
to get breakthroughs and real advances
in the pre-training phase.
And pre-training also is just high stakes.
It's getting to the point where to have another pre-training run
at scale is like making a commitment to,
I don't know, a new silicon processor
architecture that you can't do too many of those things.
And so there's just a lot more flexibility and a lot more
opportunity if you are an innovator
to try out your ideas in the post-training phase.
So there's just a huge flurry of activity there.
One thing, though, I think does seem to be--
we don't know everything about why these reasoning
models are working so well.
But one thing that seems to be true
is that the pre-trained model, that base model,
has to be very good in order to reliably get
good results in these reasoning paradigms.
MATTHEW LUNGREN: I feel like, it seems
like there's so many different paths on the test-time
compute, but also that I worry that we're chasing benchmarks
because I feel like, in some cases,
it shows super awesome performance
on a lot of these benchmarks.
There's that old like, is it like Goodhart's law
or whatever, where it's when a measure becomes a target,
it ceases to be a good measure.
That whole idea I feel like it's now almost we
all are becoming more comfortable with various models,
maybe in our daily life.
And then we get the vibes, I want to say, or feel,
and we know our tasks pretty well,
and we get that Ethan Mollick's jagged edge understanding,
and then we can say, OK, that one is better
because I've used o3 or others for this exact same task.
And I feel like that is you almost
have to wait for the reaction as opposed
to overreacting to the benchmarks.
But this is pretty promising either way.
PETER LEE: Oh, yeah.
I think it is.
Part of the evaluation problem, of course,
is hard because it depends so much
on what it is that you want out of these models.
But I think we're getting a better and better sense of this.
Out of Microsoft Research, there's
a very experimental evaluation approach
called ADeLe, which actually used ideas
from psychometrics to try to evaluate these things.
And I think that's different than the current benchmarks
that people are using, which stress-test
problem-solving ability, logical reasoning, and world knowledge
much more directly.
But in the end, I think what we want are collaborators.
We want things that can work with us.
And so the question is, how do you
evaluate, when you hire an intern,
how do you evaluate whether you have a good one or not?
And so I think we're still trying to figure that out.
By the way, the--
oh, go ahead, Justin.
Yeah.
JUSTIN NORDEN: No, I was going to say, but you know,
and there are things you expect your intern to do.
Do they retrieve the right data that you ask?
Do they follow the instructions well?
Are they good at communicating back the results?
Are they giving you leverage on your time,
which is the ultimate metric that we're shooting for?
And I do love and I know you spoke with Ethan Mollick
as well, who talks a lot about just these concepts.
So there are some ways that we talk about it.
PETER LEE: Yeah, I think you want
someone who's a good listener, also
knows when to ask the right questions.
Doesn't waste your time with those.
Knows when to go somewhere else for help.
So there are things--
there are some things that are still missing.
Memory is a big one.
You want someone who kind of learns from experience
and remembers what you like, what you don't like.
There's something called entitlements,
where if you give permission to use certain kinds of tools
that it knows how to go out, learn how to use those, and use
those kinds of things.
And so things are coming along.
And I think we're going to see continued breakthroughs
as these new capabilities get integrated into the models
that we have available to us.
MATTHEW LUNGREN: I mean, I think this is a great allegory
for health care.
Because I was a part of the admissions committee for med school.
We obviously interviewed residents
and yes, test scores played a role in that.
But everyone was always looking for that magic combination
of factors that led to phenomenal physicians.
And it's just so hard to do with humans.
I mean, for a while, there was a period of time, I think,
just in the, it was maybe the early 2000s,
where it was like liberal arts majors,
but also good MCAT scores.
That was the thing.
And they were kind of moving away from your typical biology
majors.
And then everyone was like, where
do we have an interesting philosophy major we can grab?
And I don't know if that actually led
to any meaningful selection.
There was lots of theories behind it.
But there's a similar, I think, effect here,
where it really is feel and using it and all those things.
And to your point, the things that are coming or the things
that are being explored around being
able to complete a complex task in the way
that you want it to without a lot of intervention.
I think those are the kinds of bars
that I'm looking for personally, because I get more frustrated.
If you look at this kind of data,
this is what folks are really looking to do.
And this may require additional post-training work
in terms of literally RL on a job category,
like the folks at Thinking Machines I think are famously
trying to look towards, which is take a given task,
and actually have folks think their way through the task,
could that potentially be a path towards a reinforcement
learning?
And that agent-based approach is that it ends up
being really good at completing these long tasks without getting
off into the ditch, which you get super frustrated.
You're like, oh, it's going to do it.
And then it comes back, you're like,
man, I waited for 20 minutes and it totally missed the boat.
And you almost want to pause it and stop it and you can't.
There's moments like that.
You know it's going to change, but.
PETER LEE: Well, Matt, you were involved
in that very interesting work on the health care agent
orchestrator.
And for people who aren't familiar with that,
it's literally an agent that participates in an online tumor
board meeting, and can facilitate the use of other AI
models, but also helps facilitate a meeting in a really
interesting way.
And I think people who are getting
to use that kind of early on, particularly at Stanford, I
think are finding it to be surprisingly useful.
And there's a framework there that's interesting.
But you and I have discussed this, Matt.
It is still so limited.
For example, as a participant in the meeting,
it doesn't have the ability to raise its hand
and butt into the conversation.
There's a level of proactiveness.
It's a pure, it's a second-class citizen still today.
And as we think about the future of these types of systems
and the human-machine collaboration,
that obviously where we're at today with that orchestrator
agent can't be the endpoint.
There's something that I think is
going to be much more of, I think,
an eventual equal citizen, first-class citizen
in those kinds of settings.
JUSTIN NORDEN: It is really interesting
that we all spend a ton of time looking at the newest
research, things that are coming out, very, very little today
is still at the bedside actually delivering care.
And Matt's been following the AI and radiology
side of this for forever.
We're just starting to get more and more adoption.
And so what we can prove in a lab
is still just so far away from what we actually
see in practice.
MATTHEW LUNGREN: And it may turn out that, to your point,
we have been building tons of narrow models for a long time.
The way my framework has shifted, especially with,
to your point, Peter, about having different agents,
is now it almost makes it, like I can think
it's 1 plus 1 equals 3 to me.
Because like those narrow models,
we spend a lot of time on those.
Those are tools now.
For a given agent as long as, to your point,
the super agent has at least enough contextual understanding,
knows what I need to get done, understands the intent,
and whatever protocol ends up winning the day looks like MCP
or whatever that's going to end up being, can figure out, OK,
you're a model that does X, you measure lung nodules.
Great.
That was part of the thing that I was asked to do in my task.
I'm going to check that box off, and I'm going
to use that model to run that.
I mean, I see it coming into view.
And then I think we're starting to feel like in terms
of the scales where there's the camp that thinks we just
need one really super smart model to do it all,
and we can debate that.
But I also feel like the narrative today, to me,
is multiple agents that are more specialized and overall
coming up with a better way to complete a given task.
And I think you flashed that paper up, Justin.
But similar to that MAI work, can we actually
start to look at the comparison between here's
the various tasks that walk me through a diagnosis
and then base model versus multiple models.
And it's interesting, especially when you start
to put other lenses on it.
This paper shows that you could
order every test in the world and you'll get the diagnosis.
But is that economically feasible?
No.
So can we start to look at this more practically.
And I think they did a great job with this.
PETER LEE: Oh, I think this latest work,
and I'm aware of three or four labs,
at least, around the world that are pursuing this same thing.
But I think the MAI team here at Microsoft
is the first out on this.
And internally, we refer to this as sequential diagnosis.
What's so interesting for people that haven't read
the paper is the model starts with the simplest
of all prompts.
You get the presentation of a patient that's literally a one
liner, like an 18-year-old woman presents with a cough and sore
throat.
So something as very minimal and simple as that.
And at that point, the model has to be able to ask questions,
has to be smart and economic about doing an exam,
ordering labs, perhaps making referrals
to other agents or other medical specialists,
and there's a penalty for the costs of those things.
And so as you start to really delve into this,
yes, the AI model itself is interesting,
but I think to your point, Matt and Justin,
to my mind, what's even more interesting is
that the evaluation setup for this thing, because it's really
having to be a collaborator, it's having to understand
the context of medical care, work
with other agents, both human and AI,
in order to achieve an economically reasonable outcome
for the case.
And, of course, the headline is this thing
does four times better than human doctors.
But that's really not the point.
The real point is that starting from that just simple prompt,
it's able to proceed with diagnosis.
Now, there's still huge questions.
There's the question of what happens
with a totally healthy patient that just needs a cup of tea
and needs to go have a good couple days of rest.
What happens in those situations?
And so there's still a lot of work to do.
But the evaluation framework, at least right now,
appears to be able to accommodate the study
of that kind of thing.
So I think that's super exciting.
JUSTIN NORDEN: Yeah.
And I think this paper has gotten a ton of attention
from the media and other places, especially with those headlines.
And it's something I think people
who don't spend a ton of time focused on AI and health--
it's like, oh, but the models won't
be able to ask the questions, the models
won't be able to follow.
Actually, no, no, no, they can--
to your point of starting with a very simple prompt.
Interestingly, though, what you mentioned on the evaluation
framework, there's been some discussion too of,
well, wait a second, a hospital actually doesn't necessarily
want the lowest cost path.
This is an issue if we're not ordering the expensive tests.
And so it just shows the really interesting point you bring up,
which is, what we choose for these models to evaluate
is just so important to make sure
we're kind of getting towards the answers
we want, and we won't get into the whole health care economics
debate for what happens and the incentives to use AI properly.
But the evaluation is just so important.
PETER LEE: It's such a good point, Justin.
And maybe this is a question for Matt,
since he's connected with the Nuance business at Microsoft.
But there's always been a question in the story
that I've told on a podcast not too long ago, is during COVID,
I had to go see a dermatologist because I
had a growth on my left cheek.
So I go see the dermatologist, and this clinic
happened to be wealthy enough that they had a human scribe
in the exam room.
And so I get treated.
They freeze off this growth.
And a few days later, I go online
and look at the submitted note, clinical note.
And the note basically says that the treatment was necessary
because I was unable to wear a COVID mask, which was not really
true, but it did allow a code that then achieved,
it was upcoding basically.
And so I was thinking about this because at that time, Matt and I
and a bunch of others were really hard at work
on our medical clinical note-taking applications.
And actually it's something we've
been working on since 2018.
We started a project called EmpowerMD, which
actually resulted in both Dragon Copilot and Abridge.
And I started to wonder, would our product
or Abridge or Ambience or any of these products
do similar kinds of coding, and what
is the revenue impact on clinics like that one?
MATTHEW LUNGREN: I mean, it's funny that you mentioned that,
because I think revenue cycle management is a massive space.
And I think where does the ambient note move into that?
The proper words in my field and in medical imaging,
there are certain things you have to mention in order to say,
is it a complete, is it a limited, those kinds
of things that come up.
I think that there are both really optimistic ways
to look at this and then also pessimistic ways.
And again, this goes into just the health economics side.
But you could go as far as to say into some of the darker
areas like literally fraud and other things
that we know occur that cause a lot of waste and expense
to the health system on top of the administrative costs.
And so maybe this is another way to say,
is the automated approach going to give us
a better handle on it, or a better way to audit
or monitor that at a system level,
as opposed to these one-offs that are happening
or folks, frankly, even having to potentially go back, and then
add these things to the notes post treatment, which again,
doesn't necessarily reflect reality in some cases?
PETER LEE: Yeah.
By the way, this touches on something
you brought up earlier about whether there's
one superintelligence that does it all
or whether there's lots of specialized AI models.
The one obvious next gap for something,
for this ambient listening AI, I think
is some connection to quality measures.
And the reason I think that makes
sense is there's a business logic there,
because quality measures are directly
tied to improved revenues for most health
care delivery organizations.
And right now, that's a problem for these ambient tools
because they are pure costs.
They're costs that doctors and nurses seem to like.
And so I think that's why they're doing surprisingly
well in the market right now.
But eventually, you would want these things also
to improve the kind of cost structures that these places have.
And so quality measures seems, there's
a lot of other interest in prior authorization and referrals
and so on, as well as reducing errors, like medication
errors or other kind of diagnostic errors.
But quality measure seems like the one
that just it would surprise me if there
aren't real advances there.
And particularly with the receptiveness
of organizations like Medicare and Medicaid
to these kinds of ideas.
And so the question to my mind is
whether that ends up being a system of AI agents
specialized in different ways, or whether you need some more
holistic, integrated AI.
And I don't know the answer to that question right now,
but my instinct has always been the more integrated approach,
even though agents and more specialized models
seem to be the popular approach at the moment.
JUSTIN NORDEN: I won't have Matt comment on this because I know
he's--
but, yes, I spend my days speaking with health systems
across the country and it's Nuance
and Abridge that come up in the ambient documentation space.
Although I'll say, interestingly, Ambience,
I'd say another company in this space,
they are trying to publish now, by the way,
we let you bill more.
And they actually, so is that the macro health care outcome
we want to get to?
No.
Of course not.
However, they do think that is the kind of early proof
point for where we're running with these tools.
I agree with you, Peter.
I would love if quality were the ultimate metric.
It's not really measured yet, especially with these tools.
PETER LEE: Actually, Justin, just
I don't want to let go of that point
that you're making because the one thing that's bothered me
the most over the last five or eight years in medicine
is that both technologists and policy
makers in a good-hearted attempt to improve matters,
have tended more often than not to put more burden on providers,
more cognitive burden, more effort burden, and more cost
burden.
And so we pushed very hard with the government on the FHIR
mandates, the Fast Healthcare Interoperability Resources data
standard mandates.
But so much of the burden of actually making that happen
has fallen on the shoulders of providers.
And this is a repeating pattern.
And techies like me have just blindly
done that over and over again.
We just think that doctors can just do more and more,
nurses can just do more and more.
We have to get way smarter about that somehow,
and spread the burden in smarter ways.
So that's a little bit of a rant.
But if there's one thing I've learned in my role
here at Microsoft Research over, let's say the last eight years,
is that we've got to stop doing that.
MATTHEW LUNGREN: Peter, we encourage rants on this show.
That's OK.
No, but no, but I mean, honestly though, I
think to just to touch on I think
the ambient space is absolutely on fire
right now in terms of, in a good way.
I think that clinicians are saying,
and it actually is almost a comment
to reflect what you just said about the burden on providers.
If just saving that time is a revolution,
I mean, physicians are saying they were going to retire
and then they're not.
If that's how low the bar kind of is,
which means our health care workforce is suffering.
And there's no question about it.
In fact, to the point where, and we've
talked about this on the show multiple times,
some of the patterns of behavior in the market
are like the consumer patterns of behavior.
In other words, I'm going to use these models on my phone
if I have to, because I know they can save me time.
They're saving me time in my personal life.
I'm going to find ways that they help me.
And then, there's this tension because of the safety
and all the issues that we worry about,
privacy in the health system.
And a lot of folks are finding that just to shortcut directly.
Can I connect the models compliantly either
with FHIR or other interop standards directly to the EHR?
And there have been several versions of this.
I mean, this is just an article from one of the Stanford
efforts, ChatEHR, which is intimately tied to the work
that we're doing in the Multiplayer version of this,
which is the work with the health care agent
orchestrator and more complex workflows.
But if you look at the work that they've
done here is essentially, hey, we know it's useful.
We're going to put you through training so you at least have
an understanding of where things could go wrong
and you're aware of that.
But nonetheless, we want to make the best available technology
directly attached to the clinical systems of record.
So because the other cool thing, though,
that's happening, by the way, is in doing this,
now, they're seeing the most common use.
They're able to say like, OK, I used ChatEHR yesterday,
man, this prompt killed it.
I really got the really great care off note or whatever it is.
And that shared learning is really accelerating too.
So I'm extremely bullish on this.
I don't know where this starts to fit
in with more specific purpose-built models, like how
high up into the product land you need to go
versus just being able to provide the latest intelligence
directly into the context.
I don't know the answer.
PETER LEE: Yeah.
Well, first off, the ChatEHR work in Stanford
has really impressed me, because as you know, Matt,
within Microsoft Research, we've worked for years on machine
learning in order to understand clinical data,
largely unstructured clinical data.
And it's harder than it looks to do that.
And we've had some big wins.
I think the first really big win was working with the Providence
health system on their reporting to seven different state cancer
registries, which used to involve
about 3 dozen nurses all day gathering the clinical notes,
and then figuring out how to extract
the right information for these 7 different registry input
forms.
It's just a big mess.
And across 51 hospitals, that they have in their system,
it's a huge, huge mess.
And this was pre-generative AI. Hoifung Poon
led an effort that actually created
operational capability that's actually currently
in operation at Providence.
But now, I had this funny episode.
I was working with what we now know as GPT-4,
but it wasn't released to the public yet,
and I wanted to test some things out.
And so I was interacting with Hoifung,
and I told him that I had just met this new postdoc at Harvard
Medical School, and just wanted to test his abilities
to do certain things.
And so we were going back and forth
with some ability to read some oncology research papers
and answer questions.
And I think Hoifung was really impressed.
In fact, there was a funny moment
when they disagreed on a certain point in the paper,
and they had to agree to disagree, and at some point,
Hoifung realized this is not a person.
This is an AI.
Well, I tell that story because after Hoifung finally
got his hands on that experimental GPT-4,
he managed to recreate that Providence work in one week.
And it's a watershed moment.
And it didn't solve all the problems.
But I agree with you, Matt, that the potential is so huge.
As you know, we're doing a lot with the Cosmos database
at Epic.
And that's where we're learning a lot of our lessons
about what's hard and what's easy, what works
and what doesn't work.
I think the dream of having a new modality
in diagnosis and treatment, that is very data-driven, where you
have the presentation of a patient, some labs,
physical exam, other data.
And you have one component of your doctoring being,
can you summarize 50 patients just like this?
Tell me how they were diagnosed, treated, their outcomes,
and let's have a conversation about that.
I think that can be done.
And tools like ChatEHR at Stanford are step one
on a 12-step path to get there.
MATTHEW LUNGREN: Absolutely.
Well, and this really starts to lead
into some of the comments because most people know
we've done just tons of work on NLP
in health care like as a field.
And MSR, in general, really have led a lot of those techniques.
And then, now saying, because we have all this experience,
we know where some of the biggest problems to tackle
are, and how do we apply the newest intelligence,
and where does it work, where does it fall short.
I think as you start to look at connecting
the intelligence directly in the EHR, this comes up a lot.
And I think as I listen to your podcast,
and how you've revisited some of the predictions
in the book around this, I think some themes
are starting to emerge.
I don't know if you're, I mean, you're obviously
having a lot of these interviews with luminaries
across the spectrum, but I think the folks are making
the assumption at this point that you almost
are obligated to at least be double-checking with some
of these models on various tasks.
It's almost like it's not just like a nice to have.
It's almost mandatory to some extent.
And I don't know when that will flip entirely.
But I'm curious have you started--
JUSTIN NORDEN: When do you think?
MATTHEW LUNGREN: Well, I mean--
JUSTIN NORDEN: What's your bet?
Well, we can all go around.
MATTHEW LUNGREN: I'm the most biased guy here, in terms of--
I'm such a fanboy of some of this stuff--
but also realize where the pitfalls are.
But I will say, I'm willing to look past some of the faults
because I feel as though I have this mantra in my head
that this is the worst it'll ever be.
So I'm just OK with that.
And so I think it's almost now.
I mean, honestly, I think we really are at the point where,
especially just for literacy to get a feel of it if you haven't.
I think that that's been my, as you know,
my soapbox for a while now.
PETER LEE: Yeah, I think there's a difference between what
is medically best and what both doctors and patients are
ready for.
And so, I think that now I agree with you
that I think that there would be a benefit, for example,
in the reduction of medical errors,
or misdiagnosis if AI were used much more routinely
as a second opinion or second set of eyes.
It's just more, more data, more intelligence applied to things
that would happen right today.
Whether doctors and patients would trust that
or would go that extra mile, I don't know.
But I do agree that I think in just a single digit
number of years, at least patients would be alarmed
if they found out that their doctors weren't
getting the assistance of AI.
And so that flip, I think, is much less than 10 years.
JUSTIN NORDEN: Yeah, I think I agree with that, Peter.
And as I, and we've talked about this before, 5% to 10%
of OpenAI queries are medically related.
Some of those doctors, some of those patients,
it's happening now.
I'll give a number.
I'll say two years.
And at least in certain cities--
take San Francisco, Silicon Valley,
where Matt and I are based--
it's a very tech-forward population.
A lot of early adopters in general.
If the patients are going to come in with significant turns
on AI to their doctors, and if the doctors
aren't at least capable to have a discussion with those patients
about those results and be literate on what's happening,
they're going to start to lose trust of their patients.
And so it's going, I think it's going to be forced,
and then the future is here.
It's just not evenly distributed.
PETER LEE: I mean, if you think of Stanford Medicine,
I don't know if Stanford Medicine has its own patient
portal or if they use MyChart or some blend.
But it's inconceivable, say, in that two-year time
frame, Justin, that patients wouldn't
be able to have a normal conversation with their chart
through the patient portal, whatever it is.
It could be an Epic-supplied MyChart thing, where
you have a conversation about things,
or it could be a Stanford thing or some blend.
But I think that the patient demand will be there.
And it's just such a natural thing
because everyone is motivated to have patients
engage with that portal more.
Again, it's an economic driver.
And right now, let's face it.
If you've just had a surgical procedure,
you look at your MyChart, let's say,
it's inscrutable to any normal patient.
You have no idea what these pathology results
are, and so on.
And so to be able to ask questions,
to have a conversation, explain it to me like a person who
has no medical knowledge or training or I'm 6 years old,
or whatever it is, and to have the conversation like that I
think is absolutely inevitable.
And I cannot imagine Stanford wouldn't be providing its
patients with that capability in that two-year time frame.
MATTHEW LUNGREN: I think, I mean,
what you're saying is, what's the ChatEHR equivalent
for patients in there?
Because the current problem was that doctors were having to go
cut and paste into this other place.
Patients are doing that.
They're taking the cutting and pasting.
How do we connect that?
And I think it's, to your point, complex.
And right now, the behavior that Justin is pointing out
is patients are using it.
And you have plenty of anecdotes,
where they're catching things or they're
able to understand things.
We've talked about the information asymmetry
that has plagued medicine since the dawn of the field.
Just how well you can explain things to your patient
and be really on a peer-level journey together,
as opposed to this dictating at them
and expecting them just to follow along.
I think this is going to be inevitable in the very
short term.
PETER LEE: Yeah.
Since you mentioned, Matt, the podcast series, maybe
I can plug that a little bit.
Carey Goldberg, Zak Kohane, and I wrote a book,
and published it in March of 2023.
And we made a whole bunch of guesses
about what might happen with generative AI in medicine.
But since no one had access to GPT-4
at the time that we published the book,
it was all a work of pure speculation, informed
speculation, but still speculation.
And so two years on, the question
is, what have we learned, and what's really going on for real?
And the threat was, we'd have to write another book, which is
the last thing I wanted to do.
And so I agreed instead to do a series of 12 podcasts
to talk to people who have been hands on out in the field,
or observing the business aspects of this,
both clinical and business aspects,
as well as technological aspects.
And so that's part of the Microsoft Research podcast series,
if people are interested.
And Matt, you are, of course, a great guest on this.
MATTHEW LUNGREN: We had this conversation before we came on,
but our production value is very much not at your level.
But no, it's a phenomenal listen and I
was saying before I think some of the comments
are echoed by different-- you have folks
from all these different backgrounds, all
these different areas of expertise,
but the themes are really pretty clear, I think.
So it's almost like a reinforcement of some things.
A lot of open questions are raised that I think are valid.
I think one of the most provocative things,
and we can probably wrap with this last topic.
But I think it was Zak that talked about,
how does the field of medicine change with this?
Not just that we're going to be using the models
and making our lives easier, but does the subspecialist
start to fade, and we go back to having a generalist that has
these superpowers with these?
I thought that was a very interesting topic,
because, right, for the last, what, 30-plus years,
even in my, I mean, I'm a subspecialist.
And part of it's the information doubles every 90 days.
All the things we know that make it important to know your field.
But the amount of knowledge that you can possibly
stuff in your head is limited.
But the generalist with these tools,
I think that was a very interesting comment.
And that would literally shift the culture of medicine,
even just what folks choose to go into and all kinds of things.
I don't know if you had a--
PETER LEE: Yeah, I thought about this,
and Zak and I obviously have discussed and debated
this a little bit.
For me, I always have the problem.
I see two opposing arguments here.
Historically, technology has only
increased the number of medical specialties, not decreased it.
And I've said publicly there have
been two times where technology has eliminated specialties.
We don't have phrenology anymore,
because of all the technologies that
have led to the rise of neurology and so on.
And we also don't have barbers doing bloodlettings anymore.
And there's lots of specialties that are, in part,
technology-powered.
But beyond that, you can't find examples of medical specialties
that have disappeared because of advances in technology.
In fact, just the opposite.
So that's one argument.
The other argument, though, is--
and I think Sébastien Bubeck, in the podcast, made this point--
humans have a hard time coping with a huge amount of knowledge.
And so that's one of the likely reasons
why we have medical specialties to begin with.
You can become an endocrinologist,
but it's hard to become an endocrinologist
and a nephrologist and a cardiologist and so on.
It's just too much for a single human being to cope with.
But an AI could, and at least in other fields,
we've seen this most vividly in agronomics.
You can actually see a benefit to an AI model understanding
agriculture in Brazil versus Europe versus the US,
and being able to essentially triangulate
across those different agricultural practices
to become a superhuman agronomist.
And so the counterargument is, is it possible
that a single AI can be as good as any human being in 50
medical specialties?
It might then enable general practitioners,
that is, human beings to be general practitioners.
And I don't know which way it would-- in fact,
I think I'm less equipped than both of you,
Justin and you, Matt, in predicting which way it'll go.
JUSTIN NORDEN: Well, my own version
is I also need more time.
I don't have a strong opinion yet.
And I know I'm copping out on the answer right now,
but I guess maybe it's a push to really think through
it and get there soon.
But I agree with you.
I see good arguments both ways.
Does primary care shift solely AI-first,
and care move towards specialties and interventions
and doing?
Does it shift the opposite and everything
gets done by a superpowered primary care physician?
I think, in some cases, we may see both. I think early on,
actually, we see both.
We see some primary care physicians
who take on so, so much more, and really lean into the tools
and cover many more patients, much more in depth.
And I think we see certain routes in primary care
where patients are going to self-diagnose,
go to Amazon or something, get what
used to be a primary care task done almost in a fully automated
way.
And so I see both of those things happening,
almost immediately.
Then the question is, how does that
perturb the system, on which I haven't
come to my own conclusions yet.
PETER LEE: If I could just say, I think it's so important for us
to be thinking about these things now,
and to be really grappling.
And I have tremendous optimism, because what
I see in the medical world as a techie is
the medical world, and especially
leading institutions like Stanford and others,
really confronting these things head on.
Because another question that's similar
is, if we empower every person with this kind
of medical superintelligence, will we finally
realize the true benefits of early diagnosis and better
health, and therefore, reduce cost and burden on the system?
Or will just the opposite happen,
where medicine is just always going
to be inexact enough that your AI is always
going to find things that are wrong with you,
and is going to now motivate people
to be an even bigger burden on an overstressed health care
system?
And again, I think it's very hard
to know which way it will go.
But what I think I feel good about
is that really, really smart people in the field
are thinking really hard about these things right now.
And we just have to keep pressing on that.
JUSTIN NORDEN: Well, we pulled this up.
I didn't go through it before.
So this is from a friend, Morgan Cheatham,
and he wrote and talked about what you mentioned,
the shifting of value, where it's
been focused on diagnosis here in the middle,
and maybe it'll shift towards diagnosis or intervention.
I think this may be someday, someday, someday where it goes.
But I actually think value is going to shift far to the right.
I don't think we're near.
I don't think we're close at all to shifting value
upstream towards diagnostic and payment models.
Even though in theory, that's possible from the technology,
I think we're really going to shift to the right, which
is what we're talking about with changes
in coding practices, changes in finding high-value patients
and procedures, hospitals doing a lot more.
So maybe Morgan was just optimistic from his timeline.
But I think we're going to shift a lot to the right, at least
as a first step for where this goes.
PETER LEE: One thing I like about this chart, though,
is I would, let me just stick to Microsoft Research,
but I think every technology R&D organization is the same.
10 years ago, if a techie researcher
thought about medicine, they immediately
gravitated to diagnosis.
And that's good and that's important.
But that is not health care and medicine.
And so one thing I think that we've
gotten smarter about in Microsoft Research--
but we're far from alone, I think, lots and lots of places
have gotten smarter-- is we're seeing the bigger picture better
than we used to.
And so when I see Morgan's chart here, that's what I see,
is that we actually understand that there
are things like prevention and intervention
that could also be helped through technology.
And that's actually a major advance for the tech world.
MATTHEW LUNGREN: It's the start of the journey,
and it really is the diagnostic step.
And oftentimes, Justin, I mean, most of the patients
I've seen in my career even come with essentially
having that diagnosis.
And then it's about the decision-making, the judgment,
and the knowledge to make the right decisions later over time.
So anyway, there is more to come here,
but yeah, this has been absolutely phenomenal.
Thank you, Peter, for sharing your time
with us and your insights.
And as you know, we try to keep this as an ongoing thread
through multiple discussions.
And I think this really added a lot to the prior ones.
So thank you so much for joining us.
PETER LEE: It was really fun.
Matt, thanks, and thanks to you, Justin, for having me.