Audioboom

Transcript for AI in Healthcare: Accelerating the AI Revolution in Medicine, with Peter Lee, Microsoft

00:00:00,000 → 00:00:15,094

00:00:15,094 → 00:00:16,219

MATTHEW LUNGREN: All right.

00:00:16,219 → 00:00:19,080

Welcome to another episode of The Pod.

00:00:19,080 → 00:00:22,680

Today's guest, if you're a regular listener

00:00:22,680 → 00:00:25,580

of this podcast, really needs no introduction.

00:00:25,580 → 00:00:30,420

But it's Peter Lee, who's one of my colleagues here at Microsoft,

00:00:30,420 → 00:00:33,080

and he's the president of Microsoft Research.

00:00:33,080 → 00:00:35,880

And he's focused on steering the company's worldwide network

00:00:35,880 → 00:00:38,420

of research labs, incubating breakthroughs,

00:00:38,420 → 00:00:40,940

research-powered business and artificial intelligence,

00:00:40,940 → 00:00:44,020

computer foundations, health and life sciences.

00:00:44,020 → 00:00:46,080

So covering the entire breadth.

00:00:46,080 → 00:00:49,920

Before joining Microsoft, Peter headed the Transformational

00:00:49,920 → 00:00:52,520

Convergence Technology Office at DARPA,

00:00:52,520 → 00:00:55,480

and also served as chair of Carnegie Mellon University's

00:00:55,480 → 00:00:57,400

Computer Science Department.

00:00:57,400 → 00:00:59,800

He's a fellow of the ACM, and actually

00:00:59,800 → 00:01:03,560

was recently elected a member of the Mayo Clinic Board of Trustees.

00:01:03,560 → 00:01:05,600

And, of course, you may know him as well

00:01:05,600 → 00:01:08,680

from his very influential book from 2023,

00:01:08,680 → 00:01:12,815

called The AI Revolution in Medicine: GPT-4 and Beyond.

00:01:12,815 → 00:01:14,440

And then, of course, now, the follow-up

00:01:14,440 → 00:01:17,280

podcast series, which we'll get into today.

00:01:17,280 → 00:01:18,240

So welcome, Peter.

00:01:18,240 → 00:01:19,933

Thanks for joining us.

00:01:19,933 → 00:01:21,100

PETER LEE: Oh, thanks, Matt.

00:01:21,100 → 00:01:23,320

Or should I say Dr. Lungren?

00:01:23,320 → 00:01:25,360

It's great to be here.

00:01:25,360 → 00:01:26,860

MATTHEW LUNGREN: Only part time now.

00:01:26,860 → 00:01:27,500

So that.

00:01:27,500 → 00:01:30,040

00:01:30,040 → 00:01:34,820

So, Justin, I know that you and I have been trying to keep up,

00:01:34,820 → 00:01:38,408

as we always do, with this show of what's the most current event

00:01:38,408 → 00:01:40,200

and how are we following some of the trends

00:01:40,200 → 00:01:41,867

and then how does it impact health care.

00:01:41,867 → 00:01:46,040

I feel like the last few days have been even more

00:01:46,040 → 00:01:48,880

frenzied than usual, and I'm hearing that July, overall,

00:01:48,880 → 00:01:50,600

will be crazy.

00:01:50,600 → 00:01:54,280

But I think we're recording this around the time of the Grok

00:01:54,280 → 00:01:55,760

launch.

00:01:55,760 → 00:01:59,280

So I think it's probably the first topic that, at least,

00:01:59,280 → 00:02:01,520

I'm interested in covering.

00:02:01,520 → 00:02:04,680

So I don't know if you have some of the latest info.

00:02:04,680 → 00:02:06,740

JUSTIN NORDEN: So I've just pulled it up now.

00:02:06,740 → 00:02:09,100

They just released their benchmarks.

00:02:09,100 → 00:02:11,620

This will sound very familiar to people who've been listening.

00:02:11,620 → 00:02:15,200

New model comes out, more money, more compute,

00:02:15,200 → 00:02:18,460

better performance, all state of the art.

00:02:18,460 → 00:02:20,800

Just a few other parts.

00:02:20,800 → 00:02:22,420

I missed a few other things.

00:02:22,420 → 00:02:24,020

The CEO has stepped down.

00:02:24,020 → 00:02:26,640

There's been some differences on alignment and training,

00:02:26,640 → 00:02:29,520

and some controversy around the model.

00:02:29,520 → 00:02:32,680

We won't dive into that today, but a few

00:02:32,680 → 00:02:34,320

of the other things, just to fly

00:02:34,320 → 00:02:39,200

through the data: new model kind of jumping

00:02:39,200 → 00:02:41,040

to the top of the benchmarks.

00:02:41,040 → 00:02:45,780

This is starting to outcompete o3 Pro, Gemini 2.5 Pro,

00:02:45,780 → 00:02:51,000

et cetera, on a multimodal benchmark they put together here.

00:02:51,000 → 00:02:53,400

And then I'll show one more here talking about some

00:02:53,400 → 00:02:56,863

of those specific exams, ARC-AGI, et cetera.

00:02:56,863 → 00:02:59,280

And I know, Matt, actually you had some extra comments you

00:02:59,280 → 00:03:00,640

wanted to cover on this one.

00:03:00,640 → 00:03:01,807

MATTHEW LUNGREN: Well, yeah.

00:03:01,807 → 00:03:04,920

I think this is a great opportunity.

00:03:04,920 → 00:03:08,200

Now that we have Peter with us too, it's just that, I think,

00:03:08,200 → 00:03:13,840

we talk a lot about bigger models, larger clusters,

00:03:13,840 → 00:03:14,900

maybe even more data.

00:03:14,900 → 00:03:16,500

And that pre-training world.

00:03:16,500 → 00:03:22,505

And I feel like, especially since o1, if not o3, and

00:03:22,505 → 00:03:24,880

DeepSeek and some others, we really have started to think

00:03:24,880 → 00:03:29,740

more about post-training and test-time compute.

00:03:29,740 → 00:03:31,620

And I don't know how you're seeing this.

00:03:31,620 → 00:03:35,200

I think there was that Noam Brown quote from OpenAI who

00:03:35,200 → 00:03:39,500

basically said that just a few seconds of letting the model,

00:03:39,500 → 00:03:43,400

quote-unquote, "think" is equivalent to scaling up

00:03:43,400 → 00:03:46,560

by a couple x.

00:03:46,560 → 00:03:47,740

How are you looking at this?

00:03:47,740 → 00:03:51,320

And then do you feel like this is yet another opportunity just

00:03:51,320 → 00:03:53,440

to continue to infinitely scale at some level

00:03:53,440 → 00:03:57,820

to the limits of physics, I suppose, and electricity,

00:03:57,820 → 00:04:01,600

to get to that next big breakthrough?

00:04:01,600 → 00:04:03,960

PETER LEE: Well, I think for sure,

00:04:03,960 → 00:04:07,040

at least for people who are involved in the more research

00:04:07,040 → 00:04:10,080

side of the development of these systems

00:04:10,080 → 00:04:15,453

that the hot area and the focus is on post-training

00:04:15,453 → 00:04:18,519

and on inference time compute.

00:04:18,519 → 00:04:21,000

I think that's where a lot of the thought

00:04:21,000 → 00:04:23,880

is going in, because it's getting harder and harder

00:04:23,880 → 00:04:28,240

to get breakthroughs and real advances

00:04:28,240 → 00:04:30,570

in the pre-training phase.

00:04:30,570 → 00:04:32,820

And pre-training also is just high stakes.

00:04:32,820 → 00:04:38,560

It's getting to the point where to have another pre-training run

00:04:38,560 → 00:04:41,280

at scale is like making a commitment to,

00:04:41,280 → 00:04:43,680

I don't know, a new silicon processor

00:04:43,680 → 00:04:47,880

architecture; you can't do too many of those things.

00:04:47,880 → 00:04:53,000

And so there's just a lot more flexibility and a lot more

00:04:53,000 → 00:04:55,880

opportunity if you are an innovator

00:04:55,880 → 00:04:59,620

to try out your ideas in the post-training phase.

00:04:59,620 → 00:05:02,760

So there's just a huge flurry of activity there.

00:05:02,760 → 00:05:06,400

One thing, though, I think does seem to be--

00:05:06,400 → 00:05:09,320

we don't know everything about why these reasoning

00:05:09,320 → 00:05:11,240

models are working so well.

00:05:11,240 → 00:05:13,360

But one thing that seems to be true

00:05:13,360 → 00:05:17,680

is that the pre-trained model, that base model,

00:05:17,680 → 00:05:22,480

has to be very good in order to reliably get

00:05:22,480 → 00:05:28,800

good results in these reasoning paradigms.

00:05:28,800 → 00:05:31,560

MATTHEW LUNGREN: I feel like, it seems

00:05:31,560 → 00:05:34,880

like there are so many different paths on the test-time

00:05:34,880 → 00:05:39,720

compute, but also that I worry that we're chasing benchmarks

00:05:39,720 → 00:05:41,880

because I feel like, in some cases,

00:05:41,880 → 00:05:43,680

it shows super awesome performance

00:05:43,680 → 00:05:45,480

on a lot of these benchmarks.

00:05:45,480 → 00:05:47,480

There's that old like, is it like Goodhart's law

00:05:47,480 → 00:05:50,848

or whatever, where it's when a measure becomes a target,

00:05:50,848 → 00:05:52,140

it ceases to be a good measure.

00:05:52,140 → 00:05:55,440

That whole idea I feel like it's now almost we

00:05:55,440 → 00:05:57,980

all are becoming more comfortable with various models,

00:05:57,980 → 00:05:58,980

maybe in our daily life.

00:05:58,980 → 00:06:02,000

And then we get the vibes, I want to say, or feel,

00:06:02,000 → 00:06:03,580

and we know our tasks pretty well,

00:06:03,580 → 00:06:07,460

and we get that Ethan Mollick "jagged frontier" understanding,

00:06:07,460 → 00:06:10,000

and then we can say, OK, that one is better

00:06:10,000 → 00:06:13,220

because I've used o3 or others for this exact same task.

00:06:13,220 → 00:06:15,160

And I feel like that is you almost

00:06:15,160 → 00:06:16,840

have to wait for the reaction as opposed

00:06:16,840 → 00:06:19,040

to overreacting to the benchmarks.

00:06:19,040 → 00:06:21,440

But this is pretty promising either way.

00:06:21,440 → 00:06:22,420

PETER LEE: Oh, yeah.

00:06:22,420 → 00:06:23,960

I think it is.

00:06:23,960 → 00:06:27,020

Part of the evaluation problem, of course,

00:06:27,020 → 00:06:29,480

is hard because it depends so much

00:06:29,480 → 00:06:33,040

on what it is that you want out of these models.

00:06:33,040 → 00:06:37,400

But I think we're getting a better and better sense of this.

00:06:37,400 → 00:06:39,120

Out of Microsoft Research, there's

00:06:39,120 → 00:06:42,520

a very experimental evaluation approach

00:06:42,520 → 00:06:44,800

called ADeLe, which actually uses ideas

00:06:44,800 → 00:06:49,840

from psychometrics to try to evaluate these things.

00:06:49,840 → 00:06:54,720

And I think that's different than the current benchmarks

00:06:54,720 → 00:06:59,520

that people are using, which stress-test

00:06:59,520 → 00:07:03,520

problem-solving ability, logical reasoning, and world knowledge

00:07:03,520 → 00:07:05,480

much more directly.

00:07:05,480 → 00:07:09,780

But in the end, I think what we want are collaborators.

00:07:09,780 → 00:07:13,500

We want things that can work with us.

00:07:13,500 → 00:07:16,120

And so the question is, how do you

00:07:16,120 → 00:07:18,260

evaluate, when you hire an intern,

00:07:18,260 → 00:07:21,600

how do you evaluate whether you have a good one or not?

00:07:21,600 → 00:07:26,400

And so I think we're still trying to figure that out.

00:07:26,400 → 00:07:28,200

By the way, the--

00:07:28,200 → 00:07:29,300

oh, go ahead, Justin.

00:07:29,300 → 00:07:29,800

Yeah.

00:07:29,800 → 00:07:31,670

JUSTIN NORDEN: No, I was going to say, but you know,

00:07:31,670 → 00:07:33,740

and there are things you expect your intern to do.

00:07:33,740 → 00:07:35,620

Do they retrieve the right data that you ask?

00:07:35,620 → 00:07:37,520

Do they follow the instructions well?

00:07:37,520 → 00:07:40,080

Are they good at communicating back the results?

00:07:40,080 → 00:07:42,540

Are they giving you leverage on your time,

00:07:42,540 → 00:07:46,720

which is the ultimate metric that we're shooting for?

00:07:46,720 → 00:07:49,720

And I do love and I know you spoke with Ethan Mollick

00:07:49,720 → 00:07:52,980

as well, who talks a lot about just these concepts.

00:07:52,980 → 00:07:57,560

So there are some ways that we talk about it.

00:07:57,560 → 00:07:59,560

PETER LEE: Yeah, I think you want

00:07:59,560 → 00:08:01,920

someone who's a good listener, also

00:08:01,920 → 00:08:04,480

knows when to ask the right questions.

00:08:04,480 → 00:08:07,160

Doesn't waste your time with those.

00:08:07,160 → 00:08:09,680

Knows when to go somewhere else for help.

00:08:09,680 → 00:08:12,080

So there are things--

00:08:12,080 → 00:08:14,200

there are some things that are still missing.

00:08:14,200 → 00:08:16,360

Memory is a big one.

00:08:16,360 → 00:08:20,160

You want someone who kind of learns from experience

00:08:20,160 → 00:08:23,240

and remembers what you like, what you don't like.

00:08:23,240 → 00:08:25,720

There's something called entitlements,

00:08:25,720 → 00:08:30,160

where if you give permission to use certain kinds of tools

00:08:30,160 → 00:08:34,983

that it knows how to go out, learn how to use those, and use

00:08:34,983 → 00:08:35,900

those kinds of things.

00:08:35,900 → 00:08:38,539

And so things are coming along.

00:08:38,539 → 00:08:43,159

And I think we're going to see continued breakthroughs

00:08:43,159 → 00:08:45,960

as these new capabilities get integrated into the models

00:08:45,960 → 00:08:47,465

that we have available to us.

00:08:47,465 → 00:08:49,840

MATTHEW LUNGREN: I mean, I think this is a great analogy

00:08:49,840 → 00:08:51,940

for health care.

00:08:51,940 → 00:08:56,260

Because I was a part of the admissions committee for med school.

00:08:56,260 → 00:08:58,440

We obviously interviewed residents

00:08:58,440 → 00:09:03,080

and yes, test scores played a role in that.

00:09:03,080 → 00:09:06,680

But everyone was always looking for that magic combination

00:09:06,680 → 00:09:10,380

of factors that led to phenomenal physicians.

00:09:10,380 → 00:09:14,885

And it's just so hard to do with humans.

00:09:14,885 → 00:09:17,260

I mean, for a while, there was a period of time, I think,

00:09:17,260 → 00:09:19,880

just in the, it was maybe the early 2000s,

00:09:19,880 → 00:09:22,560

where it was like liberal arts majors,

00:09:22,560 → 00:09:24,340

but also good MCAT scores.

00:09:24,340 → 00:09:26,462

That was the thing.

00:09:26,462 → 00:09:28,920

And they were kind of moving away from your typical biology

00:09:28,920 → 00:09:29,420

majors.

00:09:29,420 → 00:09:30,920

And then everyone was like, where

00:09:30,920 → 00:09:34,732

do we have an interesting philosophy major we can grab?

00:09:34,732 → 00:09:36,680

And I don't know if that actually led

00:09:36,680 → 00:09:39,518

to any meaningful selection.

00:09:39,518 → 00:09:41,060

There were lots of theories behind it.

00:09:41,060 → 00:09:43,240

But there's a similar, I think, effect here,

00:09:43,240 → 00:09:47,500

where it really is feel and using it and all those things.

00:09:47,500 → 00:09:50,600

And to your point, the things that are coming or the things

00:09:50,600 → 00:09:53,760

that are being explored around being

00:09:53,760 → 00:09:57,000

able to complete a complex task in the way

00:09:57,000 → 00:10:00,782

that you want it to without a lot of intervention.

00:10:00,782 → 00:10:02,240

I think those are the kinds of bars

00:10:02,240 → 00:10:05,900

that I'm looking for personally, because I get more frustrated.

00:10:05,900 → 00:10:07,780

If you look at this kind of data,

00:10:07,780 → 00:10:10,140

this is what folks are really looking to do.

00:10:10,140 → 00:10:13,640

And this may require additional post-training work

00:10:13,640 → 00:10:17,660

in terms of literally RL on a job category,

00:10:17,660 → 00:10:20,960

like the folks at Thinking Machines I think are famously

00:10:20,960 → 00:10:24,880

trying to look towards, which is take a given task,

00:10:24,880 → 00:10:29,120

and actually have folks think their way through the task,

00:10:29,120 → 00:10:32,440

could that potentially be a path towards reinforcement

00:10:32,440 → 00:10:32,940

learning?

00:10:32,940 → 00:10:36,120

And that agent-based approach is that it ends up

00:10:36,120 → 00:10:39,640

being really good at completing these long tasks without getting

00:10:39,640 → 00:10:42,638

off into the ditch, which is super frustrating.

00:10:42,638 → 00:10:44,180

You're like, oh, it's going to do it.

00:10:44,180 → 00:10:45,860

And then it comes back, you're like,

00:10:45,860 → 00:10:50,603

man, I waited for 20 minutes and it totally missed the boat.

00:10:50,603 → 00:10:53,020

And you almost want to pause it and stop it and you can't.

00:10:53,020 → 00:10:54,103

There's moments like that.

00:10:54,103 → 00:10:57,430

You know it's going to change, but.

00:10:57,430 → 00:10:59,880

PETER LEE: Well, Matt, you were involved

00:10:59,880 → 00:11:03,520

in that very interesting work on the health care agent

00:11:03,520 → 00:11:05,440

orchestrator.

00:11:05,440 → 00:11:09,910

And for people who aren't familiar with that,

00:11:09,910 → 00:11:14,520

it's literally an agent that participates in online tumor

00:11:14,520 → 00:11:19,160

board meetings, and can facilitate the use of other AI

00:11:19,160 → 00:11:24,240

models, but also helps facilitate a meeting in a really

00:11:24,240 → 00:11:25,400

interesting way.

00:11:25,400 → 00:11:28,000

And I think people who are getting

00:11:28,000 → 00:11:32,520

to use that kind of early on, particularly at Stanford, I

00:11:32,520 → 00:11:35,640

think are finding it to be surprisingly useful.

00:11:35,640 → 00:11:38,660

And there's a framework there that's interesting.

00:11:38,660 → 00:11:42,740

But you and I have discussed this, Matt.

00:11:42,740 → 00:11:43,900

It is still so limited.

00:11:43,900 → 00:11:46,680

For example, as a participant in the meeting,

00:11:46,680 → 00:11:49,280

it doesn't have the ability to raise its hand

00:11:49,280 → 00:11:53,720

and butt into the conversation.

00:11:53,720 → 00:11:56,000

There's a level of proactiveness.

00:11:56,000 → 00:12:01,240

It's a pure, it's a second-class citizen still today.

00:12:01,240 → 00:12:06,240

And as we think about the future of these types of systems

00:12:06,240 → 00:12:10,320

and the human machine collaboration,

00:12:10,320 → 00:12:14,080

that obviously where we're at today with that orchestrator

00:12:14,080 → 00:12:17,720

agent can't be the endpoint.

00:12:17,720 → 00:12:19,160

There's something that I think is

00:12:19,160 → 00:12:23,200

going to be much more of, I think,

00:12:23,200 → 00:12:28,600

an eventual equal citizen, first class citizen

00:12:28,600 → 00:12:31,640

in those kinds of settings.

00:12:31,640 → 00:12:33,440

JUSTIN NORDEN: It is really interesting

00:12:33,440 → 00:12:36,280

that we all spend a ton of time looking at the newest

00:12:36,280 → 00:12:41,120

research, things that are coming out, very, very little today

00:12:41,120 → 00:12:46,040

is still at the bedside actually delivering care.

00:12:46,040 → 00:12:48,680

And Matt's been following the AI in radiology

00:12:48,680 → 00:12:51,720

side of this for forever.

00:12:51,720 → 00:12:55,620

We're just starting to get more and more adoption.

00:12:55,620 → 00:12:58,200

And so what we can prove in a lab

00:12:58,200 → 00:13:01,720

is still just so far away from what we actually

00:13:01,720 → 00:13:03,800

see in practice.

00:13:03,800 → 00:13:06,300

MATTHEW LUNGREN: And it may turn out that, to your point,

00:13:06,300 → 00:13:09,940

we have been building tons of narrow models for a long time.

00:13:09,940 → 00:13:12,520

The way my framework has shifted, especially with,

00:13:12,520 → 00:13:15,040

to your point, Peter, about having different agents,

00:13:15,040 → 00:13:18,520

is now it almost makes it, like I can think

00:13:18,520 → 00:13:20,740

it's 1 plus 1 equals 3 to me.

00:13:20,740 → 00:13:22,147

Because like those narrow models,

00:13:22,147 → 00:13:23,480

we spend a lot of time on those.

00:13:23,480 → 00:13:25,612

Those are tools now.

00:13:25,612 → 00:13:28,100

For a given agent as long as, to your point,

00:13:28,100 → 00:13:31,300

the super agent has at least enough contextual understanding,

00:13:31,300 → 00:13:34,280

knows what I need to get done, understands the intent,

00:13:34,280 → 00:13:36,880

and whatever protocol ends up winning the day, looks like MCP

00:13:36,880 → 00:13:40,730

or whatever that's going to end up being, can figure out, OK,

00:13:40,730 → 00:13:43,060

you're a model that does X, you measure lung nodules.

00:13:43,060 → 00:13:43,560

Great.

00:13:43,560 → 00:13:48,163

That was part of the thing that I was asked to do in my task.

00:13:48,163 → 00:13:50,080

I'm going to check that box off, and I'm going

00:13:50,080 → 00:13:51,330

to use that model to run that.

00:13:51,330 → 00:13:54,720

I mean, I see it coming into view.

00:13:54,720 → 00:13:57,160

And then I think we're starting to feel like in terms

00:13:57,160 → 00:13:59,840

of the scales where there's the camp that thinks we just

00:13:59,840 → 00:14:02,340

need one really super smart model to do it all,

00:14:02,340 → 00:14:04,060

and we can debate that.

00:14:04,060 → 00:14:07,100

But I also feel like the narrative today, to me,

00:14:07,100 → 00:14:12,412

is multiple agents that are more specialized and overall

00:14:12,412 → 00:14:14,620

coming up with a better way to complete a given task.

00:14:14,620 → 00:14:17,180

And I think you flashed that paper up, Justin.

00:14:17,180 → 00:14:20,880

But similar to that MAI work, can we actually

00:14:20,880 → 00:14:25,160

start to look at the comparison between here's

00:14:25,160 → 00:14:27,480

the various tasks that walk me through a diagnosis

00:14:27,480 → 00:14:31,560

and then base model versus multiple models.

00:14:31,560 → 00:14:33,720

And it's an interesting, especially when you start

00:14:33,720 → 00:14:36,400

to put other lenses on it.

00:14:36,400 → 00:14:38,760

What this paper shows is that you could

00:14:38,760 → 00:14:41,640

order every test in the world and you'll get the diagnosis.

00:14:41,640 → 00:14:43,740

But is that economically feasible?

00:14:43,740 → 00:14:44,240

No.

00:14:44,240 → 00:14:46,308

So can we start to look at this more practically?

00:14:46,308 → 00:14:48,100

And I think they did a great job with this.

00:14:48,100 → 00:14:52,200

PETER LEE: Oh, I think this latest work,

00:14:52,200 → 00:14:55,100

and I'm aware of three or four labs,

00:14:55,100 → 00:14:58,940

at least, around the world that are pursuing this same thing.

00:14:58,940 → 00:15:02,440

But I think the MAI team here at Microsoft

00:15:02,440 → 00:15:04,500

is the first out on this.

00:15:04,500 → 00:15:08,180

And internally, we refer to this as sequential diagnosis.

00:15:08,180 → 00:15:10,600

What's so interesting for people that haven't read

00:15:10,600 → 00:15:17,400

the paper is the model starts with the simplest

00:15:17,400 → 00:15:18,560

of all prompts.

00:15:18,560 → 00:15:21,360

You get the presentation of a patient that's literally a one

00:15:21,360 → 00:15:27,560

liner, like an 18-year-old woman presents with a cough and sore

00:15:27,560 → 00:15:28,760

throat.

00:15:28,760 → 00:15:33,600

So something very minimal and simple like that.

00:15:33,600 → 00:15:40,320

And at that point, the model has to be able to ask questions,

00:15:40,320 → 00:15:46,820

has to be smart and economical about doing an exam,

00:15:46,820 → 00:15:50,120

ordering labs, perhaps making referrals

00:15:50,120 → 00:15:55,600

to other agents or other medical specialists,

00:15:55,600 → 00:15:59,920

and there's a penalty for the costs of those things.

00:15:59,920 → 00:16:04,900

And so as you start to really delve into this,

00:16:04,900 → 00:16:08,360

yes, the AI model itself is interesting,

00:16:08,360 → 00:16:12,680

but I think to your point, Matt and Justin,

00:16:12,680 → 00:16:14,800

to my mind, what's even more interesting is

00:16:14,800 → 00:16:19,840

the evaluation setup for this thing, because it's really

00:16:19,840 → 00:16:23,080

having to be a collaborator, it's having to understand

00:16:23,080 → 00:16:26,680

the context of medical care, work

00:16:26,680 → 00:16:30,120

with other agents, both human and AI,

00:16:30,120 → 00:16:37,180

in order to achieve an economically reasonable outcome

00:16:37,180 → 00:16:38,140

for the case.

00:16:38,140 → 00:16:40,760

And, of course, the headline is this thing

00:16:40,760 → 00:16:44,180

does four times better than human doctors.

00:16:44,180 → 00:16:46,600

But that's really not the point.

00:16:46,600 → 00:16:52,520

The real point is that starting from that just simple prompt,

00:16:52,520 → 00:16:54,620

it's able to proceed with diagnosis.

00:16:54,620 → 00:16:57,406

Now, there's still huge questions.

00:16:57,406 → 00:16:59,160

There's the question of what happens

00:16:59,160 → 00:17:02,080

with a totally healthy patient who just needs a cup of tea

00:17:02,080 → 00:17:05,560

and a good couple of days of rest.

00:17:05,560 → 00:17:07,371

What happens in those situations?

00:17:07,371 → 00:17:09,079

And so there's still a lot of work to do.

00:17:09,079 → 00:17:14,488

But the evaluation framework, at least right now,

00:17:14,488 → 00:17:16,280

appears to be able to accommodate the study

00:17:16,280 → 00:17:17,197

of that kind of thing.

00:17:17,197 → 00:17:19,667

So I think that's super exciting.

00:17:19,667 → 00:17:20,500

JUSTIN NORDEN: Yeah.

00:17:20,500 → 00:17:22,800

And I think this paper has gotten a ton of attention

00:17:22,800 → 00:17:27,400

from the media and other places, especially with those headlines.

00:17:27,400 → 00:17:29,600

And it's something I think people

00:17:29,600 → 00:17:32,340

who don't spend a ton of time focused on AI and health--

00:17:32,340 → 00:17:33,813

it's like, oh, but the models won't

00:17:33,813 → 00:17:35,480

be able to ask the questions, the models

00:17:35,480 → 00:17:36,760

won't be able to follow.

00:17:36,760 → 00:17:38,360

Actually, no, no, no, they can,

00:17:38,360 → 00:17:41,000

to your point of starting with a very simple prompt.

00:17:41,000 → 00:17:43,680

Interestingly, though, what you mentioned on the evaluation

00:17:43,680 → 00:17:46,650

framework, there's been some discussion too of,

00:17:46,650 → 00:17:50,120

well, wait a second, a hospital actually doesn't necessarily

00:17:50,120 → 00:17:52,840

want the lowest cost path.

00:17:52,840 → 00:17:56,700

This is an issue if we're not ordering the expensive tests.

00:17:56,700 → 00:18:00,860

And so it just shows the really interesting point you bring up,

00:18:00,860 → 00:18:04,420

which is, what we choose for these models to evaluate

00:18:04,420 → 00:18:07,092

is just so important to make sure

00:18:07,092 → 00:18:08,800

we're kind of getting towards the answers

00:18:08,800 → 00:18:12,040

we want, and we won't get into the whole health care economics

00:18:12,040 → 00:18:18,320

debate for what happens and the incentives to use AI properly.

00:18:18,320 → 00:18:21,740

But the evaluation is just so important.

00:18:21,740 → 00:18:23,600

PETER LEE: It's such a good point, Justin.

00:18:23,600 → 00:18:26,020

And maybe this is a question for Matt,

00:18:26,020 → 00:18:29,060

since he's connected with the Nuance business at Microsoft.

00:18:29,060 → 00:18:31,680

But there's always been a question. In a story

00:18:31,680 → 00:18:40,066

that I've told on a podcast not too long ago, during COVID,

00:18:40,066 → 00:18:42,160

I had to go see a dermatologist because I

00:18:42,160 → 00:18:45,600

had a growth on my left cheek.

00:18:45,600 → 00:18:49,040

So I go see the dermatologist, and this clinic

00:18:49,040 → 00:18:53,120

happened to be wealthy enough that they had a human scribe

00:18:53,120 → 00:18:54,260

in the exam room.

00:18:54,260 → 00:18:55,640

And so I get treated.

00:18:55,640 → 00:18:59,800

They freeze off this growth.

00:18:59,800 → 00:19:03,400

And a few days later, I go online

00:19:03,400 → 00:19:08,520

and look at the submitted note, clinical note.

00:19:08,520 → 00:19:16,320

And the note basically says that the treatment was necessary

00:19:16,320 → 00:19:20,880

because I was unable to wear a COVID mask, which was not really

00:19:20,880 → 00:19:27,520

true, but it did allow a code that then achieved--

00:19:27,520 → 00:19:31,080

it was upcoding, basically.

00:19:31,080 → 00:19:34,960

And so I was thinking about this because at that time, Matt and I

00:19:34,960 → 00:19:37,840

and a bunch of others were really hard at work

00:19:37,840 → 00:19:45,905

on our medical clinical note-taking applications.

00:19:45,905 → 00:19:47,280

And actually it's something we've

00:19:47,280 → 00:19:49,300

been working on since 2018.

00:19:49,300 → 00:19:52,600

We started a project called EmpowerMD, which

00:19:52,600 → 00:19:57,440

actually resulted in both Dragon Copilot and the Bridge.

00:19:57,440 → 00:20:02,320

And I tried to wonder, would our product

00:20:02,320 → 00:20:05,480

or Abridge or Ambience or any of these products

00:20:05,480 → 00:20:07,120

do similar kinds of coding, and what

00:20:07,120 → 00:20:10,620

is the revenue impact on clinics like that one?

00:20:10,620 → 00:20:13,120

MATTHEW LUNGREN: I mean, it's funny that you mentioned that,

00:20:13,120 → 00:20:17,460

because I think revenue cycle management is a massive space.

00:20:17,460 → 00:20:22,010

And I think where does the ambient note move into that?

00:20:22,010 → 00:20:25,860

The proper words matter in my field, medical imaging;

00:20:25,860 → 00:20:29,120

there are certain things you have to mention to establish:

00:20:29,120 → 00:20:31,160

is it a complete, is it a limited, those kinds

00:20:31,160 → 00:20:33,000

of things that come up.

00:20:33,000 → 00:20:36,780

I think that there are both really optimistic ways

00:20:36,780 → 00:20:39,680

to look at this and then also pessimistic ways.

00:20:39,680 → 00:20:42,940

And again, this goes into just the health economics side.

00:20:42,940 → 00:20:46,240

But you could go as far as to say into some of the darker

00:20:46,240 → 00:20:48,800

areas like literally fraud and other things

00:20:48,800 → 00:20:52,000

that we know occur that cause a lot of waste and expense

00:20:52,000 → 00:20:55,780

to the health system on top of the administrative costs.

00:20:55,780 → 00:20:58,840

And so maybe this is another way to say,

00:20:58,840 → 00:21:00,720

is the automated approach going to give us

00:21:00,720 → 00:21:04,960

a better handle on it, or a better way to audit

00:21:04,960 → 00:21:08,760

or monitor that at a system level,

00:21:08,760 → 00:21:11,120

as opposed to these one-offs that are happening

00:21:11,120 → 00:21:14,880

or folks, frankly, even having to potentially go back, and then

00:21:14,880 → 00:21:21,150

add these things to the notes post-treatment, which, again,

00:21:21,150 → 00:21:24,413

doesn't necessarily reflect reality in some cases?

00:21:24,413 → 00:21:25,080

PETER LEE: Yeah.

00:21:25,080 → 00:21:27,040

By the way, this touches on something

00:21:27,040 → 00:21:29,280

you brought up earlier about whether there's

00:21:29,280 → 00:21:31,720

one superintelligence that does it all

00:21:31,720 → 00:21:36,360

or whether there's lots of specialized AI models.

00:21:36,360 → 00:21:41,760

The one obvious next gap

00:21:41,760 → 00:21:45,480

for this ambient listening AI, I think,

00:21:45,480 → 00:21:51,600

is some connection to quality measures.

00:21:51,600 → 00:21:53,640

And the reason I think that makes

00:21:53,640 → 00:21:55,620

sense is there's a business logic there,

00:21:55,620 → 00:21:57,480

because quality measures are directly

00:21:57,480 → 00:22:00,320

tied to improved revenues for most health

00:22:00,320 → 00:22:03,160

care delivery organizations.

00:22:03,160 → 00:22:08,400

And right now, that's a problem for these ambient tools

00:22:08,400 → 00:22:11,800

because they are pure costs.

00:22:11,800 → 00:22:14,920

They're costs that doctors and nurses seem to like.

00:22:14,920 → 00:22:18,000

And so I think that's why they're doing surprisingly

00:22:18,000 → 00:22:19,800

well in the market right now.

00:22:19,800 → 00:22:22,640

But eventually, you would want these things also

00:22:22,640 → 00:22:26,940

to improve the kind of cost structures of these places.

00:22:26,940 → 00:22:29,480

And so quality measures seem-- there's

00:22:29,480 → 00:22:33,120

a lot of other interest in prior authorization and referrals

00:22:33,120 → 00:22:37,320

and so on, as well as reducing errors, like medication

00:22:37,320 → 00:22:40,080

errors or other kind of diagnostic errors.

00:22:40,080 → 00:22:42,040

But quality measures seem like the one

00:22:42,040 → 00:22:47,880

that just it would surprise me if there

00:22:47,880 → 00:22:51,720

aren't real advances there.

00:22:51,720 → 00:22:53,600

And particularly with the receptiveness

00:22:53,600 → 00:22:57,200

of organizations like Medicare and Medicaid

00:22:57,200 → 00:22:58,960

to these kinds of ideas.

00:22:58,960 → 00:23:01,760

And so the question to my mind is

00:23:01,760 → 00:23:05,640

whether that ends up being a system of AI agents

00:23:05,640 → 00:23:09,960

specialized in different ways, or whether you need some more

00:23:09,960 → 00:23:12,560

holistic, integrated AI.

00:23:12,560 → 00:23:15,300

And I don't know the answer to that question right now,

00:23:15,300 → 00:23:18,700

but my instinct has always been the more integrated approach,

00:23:18,700 → 00:23:21,400

even though agents and more specialized models

00:23:21,400 → 00:23:27,495

seem to be the popular approach at the moment.

00:23:27,495 → 00:23:30,120

JUSTIN NORDEN: I won't have Matt comment on this because I know

00:23:30,120 → 00:23:30,680

he's--

00:23:30,680 → 00:23:34,680

but, yes, I spend my days speaking with health systems

00:23:34,680 → 00:23:37,240

across the country and its nuance

00:23:37,240 → 00:23:40,840

and a bridge that come up on the ambient documentation space.

00:23:40,840 → 00:23:45,060

Although I'll say, interestingly, Ambience,

00:23:45,060 → 00:23:48,240

I'd say another company in this space,

00:23:48,240 → 00:23:51,940

they are trying to publish now, by the way,

00:23:51,940 → 00:23:54,400

we let you bill more.

00:23:54,400 → 00:23:59,560

And they actually, so is that the macro health care outcome

00:23:59,560 → 00:24:01,560

we want to get to?

00:24:01,560 → 00:24:02,060

No.

00:24:02,060 → 00:24:03,020

Of course not.

00:24:03,020 → 00:24:06,520

However, they do think that is the kind of early proof

00:24:06,520 → 00:24:08,920

point for where we're running with these tools.

00:24:08,920 → 00:24:10,780

I agree with you, Peter.

00:24:10,780 → 00:24:13,360

I would love if quality were the ultimate metric.

00:24:13,360 → 00:24:18,600

It's not really measured yet, especially with these tools.

00:24:18,600 → 00:24:20,460

PETER LEE: Actually, Justin, just

00:24:20,460 → 00:24:21,960

I don't want to let go of that point

00:24:21,960 → 00:24:26,640

that you're making because the one thing that's bothered me

00:24:26,640 → 00:24:34,040

the most over the last five or eight years in medicine

00:24:34,040 → 00:24:37,240

is that both technologists and policy

00:24:37,240 → 00:24:42,500

makers in a good-hearted attempt to improve matters,

00:24:42,500 → 00:24:50,960

have tended more often than not to put more burden on providers,

00:24:50,960 → 00:24:54,240

more cognitive burden, more effort burden, and more cost

00:24:54,240 → 00:24:55,640

burden.

00:24:55,640 → 00:25:01,800

And so we pushed very hard with the government on the FHIR

00:25:01,800 → 00:25:05,680

mandates, the Fast Healthcare Interoperability Resources data

00:25:05,680 → 00:25:07,280

standard mandates.

00:25:07,280 → 00:25:10,200

But so much of the burden of actually making that happen

00:25:10,200 → 00:25:13,960

has fallen on the shoulders of providers.

00:25:13,960 → 00:25:16,940

And this is a repeating pattern.

00:25:16,940 → 00:25:22,000

And techies like me have just blindly

00:25:22,000 → 00:25:23,620

done that over and over again.

00:25:23,620 → 00:25:25,980

We just think that doctors can just do more and more,

00:25:25,980 → 00:25:27,420

nurses can just do more and more.

00:25:27,420 → 00:25:30,930

We have to get way smarter about that somehow,

00:25:30,930 → 00:25:34,870

and spread the burden in smarter ways.

00:25:34,870 → 00:25:37,110

So that's a little bit of a rant.

00:25:37,110 → 00:25:41,210

But if there's one thing I've learned in my role

00:25:41,210 → 00:25:44,230

here at Microsoft Research over, let's say the last eight years,

00:25:44,230 → 00:25:47,417

is that we've got to stop doing that.

00:25:47,417 → 00:25:49,750

MATTHEW LUNGREN: Peter, we encourage rants on this show.

00:25:49,750 → 00:25:51,490

That's OK.

00:25:51,490 → 00:25:53,610

No, but no, but I mean, honestly though, I

00:25:53,610 → 00:25:55,890

think, just to touch on it, I think

00:25:55,890 → 00:25:59,970

the ambient space is absolutely on fire

00:25:59,970 → 00:26:02,060

right now, in a good way.

00:26:02,060 → 00:26:04,730

I think that clinicians are saying,

00:26:04,730 → 00:26:07,170

and it actually is almost a comment

00:26:07,170 → 00:26:09,670

to reflect what you just said about the burden on providers.

00:26:09,670 → 00:26:13,598

If just saving that time is a revolution,

00:26:13,598 → 00:26:15,890

I mean, physicians are saying they were going to retire

00:26:15,890 → 00:26:17,000

and then they're not.

00:26:17,000 → 00:26:20,270

If that's how low the bar kind of is,

00:26:20,270 → 00:26:24,015

which means our health care workforce is suffering.

00:26:24,015 → 00:26:25,390

And there's no question about it.

00:26:25,390 → 00:26:27,050

In fact, to the point where, and we've

00:26:27,050 → 00:26:30,820

talked about this on the show multiple times,

00:26:30,820 → 00:26:34,910

some of the patterns of behavior in the market

00:26:34,910 → 00:26:37,190

are like the consumer patterns of behavior.

00:26:37,190 → 00:26:41,490

In other words, I'm going to use these models on my phone

00:26:41,490 → 00:26:43,790

if I have to, because I know they can save me time.

00:26:43,790 → 00:26:45,742

They're saving me time in my personal life.

00:26:45,742 → 00:26:47,450

I'm going to find ways that they help me.

00:26:47,450 → 00:26:50,485

And then, there's this tension because of the safety

00:26:50,485 → 00:26:52,110

and all the issues that we worry about,

00:26:52,110 → 00:26:54,330

privacy in the health system.

00:26:54,330 → 00:26:58,650

And a lot of folks are finding that just to shortcut directly.

00:26:58,650 → 00:27:03,290

Can I connect the models compliantly either

00:27:03,290 → 00:27:07,130

with FHIR or other interop standards directly to the EHR?

00:27:07,130 → 00:27:09,310

And there have been several versions of this.

00:27:09,310 → 00:27:13,010

I mean, this is just an article from one of the Stanford

00:27:13,010 → 00:27:15,370

efforts, ChatEHR, which is intimately tied to the work

00:27:15,370 → 00:27:18,657

that we're doing in the Multiplayer version of this,

00:27:18,657 → 00:27:20,490

which is the work with the health care agent

00:27:20,490 → 00:27:22,910

orchestrator and more complex workflows.

00:27:22,910 → 00:27:24,930

But if you look at the work that they've

00:27:24,930 → 00:27:28,490

done here is essentially, hey, we know it's useful.

00:27:28,490 → 00:27:32,310

We're going to put you through training so you at least have

00:27:32,310 → 00:27:35,090

an understanding of where things could go wrong

00:27:35,090 → 00:27:36,190

and you're aware of that.

00:27:36,190 → 00:27:40,530

But nonetheless, we want to make the best available technology

00:27:40,530 → 00:27:44,470

directly attached to the clinical systems of record.

00:27:44,470 → 00:27:46,910

So because the other cool thing, though,

00:27:46,910 → 00:27:50,090

that's happening, by the way, is in doing this,

00:27:50,090 → 00:27:52,190

now, they're seeing the most common use.

00:27:52,190 → 00:27:55,230

They're able to say like, OK, I used ChatEHR yesterday,

00:27:55,230 → 00:27:57,390

man, this prompt killed it.

00:27:57,390 → 00:28:00,710

I got a really great handoff note, or whatever it is.

00:28:00,710 → 00:28:02,950

And that shared learning is really accelerating too.

00:28:02,950 → 00:28:04,990

So I'm extremely bullish on this.

00:28:04,990 → 00:28:07,690

I don't know where this starts to fit

00:28:07,690 → 00:28:12,290

in with more specific purpose-built models, like how

00:28:12,290 → 00:28:15,195

high up into the product land you need to go

00:28:15,195 → 00:28:17,570

versus just being able to provide the latest intelligence

00:28:17,570 → 00:28:18,870

directly into the context.

00:28:18,870 → 00:28:20,003

I don't know the answer.

00:28:20,003 → 00:28:20,670

PETER LEE: Yeah.

00:28:20,670 → 00:28:23,890

Well, first off, the ChatEHR work at Stanford

00:28:23,890 → 00:28:28,330

has really impressed me, because as you know, Matt,

00:28:28,330 → 00:28:33,210

within Microsoft Research, we've worked for years on machine

00:28:33,210 → 00:28:39,810

learning in order to understand clinical data,

00:28:39,810 → 00:28:41,430

largely unstructured clinical data.

00:28:41,430 → 00:28:44,470

And it's harder than it looks to do that.

00:28:44,470 → 00:28:46,210

And we've had some big wins.

00:28:46,210 → 00:28:50,530

I think the first really big win was working with the Providence

00:28:50,530 → 00:28:54,930

health system on their reporting to seven different state cancer

00:28:54,930 → 00:28:57,170

registries, which used to involve

00:28:57,170 → 00:29:00,770

about three dozen nurses all day gathering the clinical notes,

00:29:00,770 → 00:29:03,130

and then figuring out how to extract

00:29:03,130 → 00:29:08,570

the right information for these seven different registry input

00:29:08,570 → 00:29:09,530

forms.

00:29:09,530 → 00:29:11,430

It's just a big mess.

00:29:11,430 → 00:29:16,850

And across 51 hospitals, that they have in their system,

00:29:16,850 → 00:29:18,670

it's a huge, huge mess.

00:29:18,670 → 00:29:23,410

And this was pre-generative AI. Hoifung Poon

00:29:23,410 → 00:29:27,330

led an effort that actually created

00:29:27,330 → 00:29:30,490

operational capability that's actually currently

00:29:30,490 → 00:29:33,530

in operation at Providence.

00:29:33,530 → 00:29:38,010

But now, I had this funny episode.

00:29:38,010 → 00:29:42,270

I was working with what we now know as GPT-4,

00:29:42,270 → 00:29:45,010

but it wasn't released to the public yet,

00:29:45,010 → 00:29:47,350

and I wanted to test some things out.

00:29:47,350 → 00:29:49,030

And so I was interacting with Hoifung,

00:29:49,030 → 00:29:53,290

and I told him that I had just met this new postdoc at Harvard

00:29:53,290 → 00:29:58,910

Medical School, and just wanted to test his abilities

00:29:58,910 → 00:30:01,050

to do certain things.

00:30:01,050 → 00:30:04,690

And so we were going back and forth

00:30:04,690 → 00:30:09,970

with some ability to read some oncology research papers

00:30:09,970 → 00:30:13,210

and answer questions.

00:30:13,210 → 00:30:16,995

And I think Hoifung was really impressed.

00:30:16,995 → 00:30:18,370

In fact, there was a funny moment

00:30:18,370 → 00:30:21,410

when they disagreed on a certain point in the paper,

00:30:21,410 → 00:30:24,550

and they had to agree to disagree, and at some point,

00:30:24,550 → 00:30:26,790

Hoifung realized this is not a person.

00:30:26,790 → 00:30:28,170

This is an AI.

00:30:28,170 → 00:30:30,970

Well, I tell that story because after Hoifung finally

00:30:30,970 → 00:30:34,150

got his hands on that experimental GPT-4,

00:30:34,150 → 00:30:40,210

he managed to recreate that Providence work in one week.

00:30:40,210 → 00:30:45,690

And it was a watershed moment.

00:30:45,690 → 00:30:47,430

And it didn't solve all the problems.

00:30:47,430 → 00:30:54,130

But I agree with you, Matt, that the potential is so huge.

00:30:54,130 → 00:30:58,530

As you know, we're doing a lot with the Cosmos database

00:30:58,530 → 00:30:59,603

at Epic.

00:30:59,603 → 00:31:01,770

And that's where we're learning a lot of our lessons

00:31:01,770 → 00:31:03,850

about what's hard and what's easy, what works

00:31:03,850 → 00:31:05,530

and what doesn't work.

00:31:05,530 → 00:31:11,890

I think the dream of having a new modality

00:31:11,890 → 00:31:18,170

in diagnosis and treatment, that is very data-driven, where you

00:31:18,170 → 00:31:21,550

have the presentation of a patient, some labs,

00:31:21,550 → 00:31:23,970

physical exam, other data.

00:31:23,970 → 00:31:32,810

And you haven't-- one component of your doctoring being,

00:31:32,810 → 00:31:37,530

can you summarize 50 patients just like this?

00:31:37,530 → 00:31:43,482

Tell me how they were diagnosed, treated, their outcomes,

00:31:43,482 → 00:31:45,190

and let's have a conversation about that.

00:31:45,190 → 00:31:48,650

I think that can be done.

00:31:48,650 → 00:31:54,170

And tools like ChatEHR at Stanford are step one

00:31:54,170 → 00:31:57,843

on a 12-step path to get there.

00:31:57,843 → 00:31:59,010

MATTHEW LUNGREN: Absolutely.

00:31:59,010 → 00:32:00,610

Well, and this really starts to lead

00:32:00,610 → 00:32:06,130

into some of the comments because most people know

00:32:06,130 → 00:32:10,250

we've done just tons of work on NLP

00:32:10,250 → 00:32:13,870

and health care like as a field.

00:32:13,870 → 00:32:17,030

And MSR, in general, has really led a lot of those techniques.

00:32:17,030 → 00:32:20,070

And then, now saying, because we have all this experience,

00:32:20,070 → 00:32:22,728

we know where some of the biggest problems to tackle

00:32:22,728 → 00:32:24,770

are, and how do we apply the newest intelligence,

00:32:24,770 → 00:32:26,890

and where does it work, where does it fall short.

00:32:26,890 → 00:32:29,930

I think as you start to look at connecting

00:32:29,930 → 00:32:34,310

the intelligence directly in the EHR, this comes up a lot.

00:32:34,310 → 00:32:36,970

And I think as I listen to your podcast,

00:32:36,970 → 00:32:39,250

and how you've revisited some of the predictions

00:32:39,250 → 00:32:42,290

in the book around this, I think some themes

00:32:42,290 → 00:32:43,630

are starting to emerge.

00:32:43,630 → 00:32:45,890

I don't know if you're, I mean, you're obviously

00:32:45,890 → 00:32:47,890

having a lot of these interviews with luminaries

00:32:47,890 → 00:32:51,410

across the spectrum, but I think the folks are making

00:32:51,410 → 00:32:56,370

the assumption at this point that you almost

00:32:56,370 → 00:32:59,450

are obligated to at least be double-checking with some

00:32:59,450 → 00:33:01,350

of these models on various tasks.

00:33:01,350 → 00:33:03,670

It's almost like it's not just like a nice to have.

00:33:03,670 → 00:33:05,910

It's almost mandatory to some extent.

00:33:05,910 → 00:33:08,790

And I don't know when that will flip entirely.

00:33:08,790 → 00:33:14,502

But I'm curious, have you started--

00:33:14,502 → 00:33:15,970

JUSTIN NORDEN: When do you think?

00:33:15,970 → 00:33:16,310

MATTHEW LUNGREN: Well, I mean--

00:33:16,310 → 00:33:17,602

JUSTIN NORDEN: What's your bet?

00:33:17,602 → 00:33:18,890

Well, we can all go around.

00:33:18,890 → 00:33:22,230

MATTHEW LUNGREN: I'm the most biased guy here, in terms of--

00:33:22,230 → 00:33:24,150

I'm such a fanboy of some of this stuff--

00:33:24,150 → 00:33:25,870

but also realize where the pitfalls are.

00:33:25,870 → 00:33:28,970

But I will say, I'm willing to look past some of the faults

00:33:28,970 → 00:33:33,530

because I feel as though I have this mantra in my head

00:33:33,530 → 00:33:37,250

that this is the worst it'll ever be.

00:33:37,250 → 00:33:40,230

So I'm just OK with that.

00:33:40,230 → 00:33:44,792

And so I think it's almost now.

00:33:44,792 → 00:33:47,250

I mean, honestly, I think we really are at the point where,

00:33:47,250 → 00:33:52,350

especially just for literacy to get a feel of it if you haven't.

00:33:52,350 → 00:33:54,090

I think that that's been my, as you know,

00:33:54,090 → 00:33:59,110

my soapbox for a while now.

00:33:59,110 → 00:34:11,250

PETER LEE: Yeah, I think there's a difference between what

00:34:11,250 → 00:34:16,409

is medically best and what both doctors and patients are

00:34:16,409 → 00:34:18,869

ready for.

00:34:18,869 → 00:34:22,290

And so, I think that now I agree with you

00:34:22,290 → 00:34:27,710

that I think that there would be a benefit, for example,

00:34:27,710 → 00:34:31,690

in the reduction of medical errors,

00:34:31,690 → 00:34:38,449

or misdiagnosis if AI were used much more routinely

00:34:38,449 → 00:34:42,489

as a second opinion or second set of eyes.

00:34:42,489 → 00:34:46,330

It's just more, more data, more intelligence applied to things

00:34:46,330 → 00:34:49,770

that would happen right today.

00:34:49,770 → 00:34:55,489

Whether doctors and patients would trust that

00:34:55,489 → 00:34:59,210

or would go that extra mile, I don't know.

00:34:59,210 → 00:35:02,930

But I do agree that I think in just a single digit

00:35:02,930 → 00:35:07,810

number of years, at least patients would be alarmed

00:35:07,810 → 00:35:10,530

if they found out that their doctors weren't

00:35:10,530 → 00:35:13,590

getting the assistance of AI.

00:35:13,590 → 00:35:21,810

And so that flip, I think, is much less than 10 years.

00:35:21,810 → 00:35:26,030

JUSTIN NORDEN: Yeah, I think I agree with that, Peter.

00:35:26,030 → 00:35:31,210

And as we've talked about on the show before, 5% to 10%

00:35:31,210 → 00:35:34,430

of OpenAI queries are medically related.

00:35:34,430 → 00:35:37,170

Some of those doctors, some of those patients,

00:35:37,170 → 00:35:41,120

it's happening now.

00:35:41,120 → 00:35:42,210

I'll give a number.

00:35:42,210 → 00:35:45,070

I'll say two years.

00:35:45,070 → 00:35:46,670

And at least in certain cities--

00:35:46,670 → 00:35:49,510

take San Francisco, Silicon Valley,

00:35:49,510 → 00:35:51,330

where Matt and I are based--

00:35:51,330 → 00:35:53,730

it's a very tech-forward population.

00:35:53,730 → 00:35:57,050

A lot of early adopters in general.

00:35:57,050 → 00:36:01,690

If the patients are going to come in with significant turns

00:36:01,690 → 00:36:04,810

on AI to their doctors, and if the doctors

00:36:04,810 → 00:36:08,650

aren't at least capable of having a discussion with those patients

00:36:08,650 → 00:36:12,490

about those results and be literate on what's happening,

00:36:12,490 → 00:36:15,170

they're going to start to lose the trust of their patients.

00:36:15,170 → 00:36:18,810

And so it's going, I think it's going to be forced,

00:36:18,810 → 00:36:21,470

and then the future is here.

00:36:21,470 → 00:36:22,910

It's just not evenly distributed.

00:36:22,910 → 00:36:25,850

PETER LEE: I mean, if you think of Stanford Medicine,

00:36:25,850 → 00:36:28,170

I don't know if Stanford Medicine has its own patient

00:36:28,170 → 00:36:32,750

portal or if they use MyChart or some blend.

00:36:32,750 → 00:36:35,770

But it's inconceivable, say, in that two-year time

00:36:35,770 → 00:36:38,730

frame, Justin, that patients wouldn't

00:36:38,730 → 00:36:43,970

be able to have a normal conversation with their chart

00:36:43,970 → 00:36:46,610

through the patient portal, whatever it is.

00:36:46,610 → 00:36:50,450

It could be an Epic-supplied MyChart thing, where

00:36:50,450 → 00:36:52,110

you have a conversation about things,

00:36:52,110 → 00:36:54,550

or it could be a Stanford thing or some blend.

00:36:54,550 → 00:36:57,433

But I think that the patient demand will be there.

00:36:57,433 → 00:36:58,850

And it's just such a natural thing

00:36:58,850 → 00:37:03,130

because everyone is motivated to have patients

00:37:03,130 → 00:37:04,670

engage with that portal more.

00:37:04,670 → 00:37:07,770

Again, it's an economic driver.

00:37:07,770 → 00:37:10,730

And right now, let's face it.

00:37:10,730 → 00:37:12,430

If you've just had a surgical procedure,

00:37:12,430 → 00:37:15,330

you look at your MyChart, let's say,

00:37:15,330 → 00:37:18,530

it's inscrutable to any normal patient.

00:37:18,530 → 00:37:21,650

You have no idea what these pathology results

00:37:21,650 → 00:37:23,650

are, and so on.

00:37:23,650 → 00:37:26,130

And so to be able to ask questions,

00:37:26,130 → 00:37:32,090

to have a conversation, explain it to me like a person who

00:37:32,090 → 00:37:35,090

has no medical knowledge or training or I'm 6 years old,

00:37:35,090 → 00:37:38,890

or whatever it is, and to have the conversation like that I

00:37:38,890 → 00:37:46,350

think is absolutely inevitable.

00:37:46,350 → 00:37:49,370

And I cannot imagine Stanford wouldn't be providing its

00:37:49,370 → 00:37:53,175

patients with that capability in that two-year time frame.

00:37:53,175 → 00:37:54,550

MATTHEW LUNGREN: I think, I mean,

00:37:54,550 → 00:37:57,890

what you're saying is, what's the ChatEHR equivalent

00:37:57,890 → 00:37:59,790

for patients in there?

00:37:59,790 → 00:38:04,010

Because the current problem was that doctors were having to go

00:38:04,010 → 00:38:06,810

cut and paste into this other place.

00:38:06,810 → 00:38:08,110

Patients are doing that.

00:38:08,110 → 00:38:10,110

They're taking the cutting and pasting.

00:38:10,110 → 00:38:11,070

How do we connect that?

00:38:11,070 → 00:38:13,870

And I think it's, to your point, complex.

00:38:13,870 → 00:38:16,610

And right now, the behavior that Justin is pointing out

00:38:16,610 → 00:38:19,550

is patients are using it.

00:38:19,550 → 00:38:21,890

And you have plenty of anecdotes,

00:38:21,890 → 00:38:24,330

where they're catching things or they're

00:38:24,330 → 00:38:25,817

able to understand things.

00:38:25,817 → 00:38:27,650

We've talked about the information asymmetry

00:38:27,650 → 00:38:31,350

that has plagued medicine since the dawn of the field.

00:38:31,350 → 00:38:36,730

Just how well you can explain things to your patient

00:38:36,730 → 00:38:39,410

and really be on a peer-level journey together,

00:38:39,410 → 00:38:42,930

as opposed to this dictating at them

00:38:42,930 → 00:38:44,910

and expecting them just to follow along.

00:38:47,802 → 00:38:50,690

I think this is going to be inevitable in the very

00:38:50,690 → 00:38:51,890

short term.

00:38:51,890 → 00:38:53,250

PETER LEE: Yeah.

00:38:53,250 → 00:38:56,690

Since you mentioned, Matt, the podcast series, maybe

00:38:56,690 → 00:39:00,142

I can plug that a little bit.

00:39:00,142 → 00:39:04,690

Carey Goldberg, Zak Kohane, and I wrote a book,

00:39:04,690 → 00:39:08,530

and it was published in March of 2023.

00:39:08,530 → 00:39:12,170

And we made a whole bunch of guesses

00:39:12,170 → 00:39:17,922

about what might happen with generative AI in medicine.

00:39:17,922 → 00:39:21,210

But since no one had access to GPT-4

00:39:21,210 → 00:39:23,930

at that time that we published the book,

00:39:23,930 → 00:39:26,570

it was all a work of pure speculation, informed

00:39:26,570 → 00:39:29,490

speculation, but still speculation.

00:39:29,490 → 00:39:31,930

And so two years on, the question

00:39:31,930 → 00:39:35,550

is, what have we learned, and what's really going on for real?

00:39:35,550 → 00:39:40,050

And the threat was, we'd have to write another book, which is

00:39:40,050 → 00:39:42,770

the last thing I wanted to do.

00:39:42,770 → 00:39:48,650

And so we agreed instead to do a series of 12 podcasts

00:39:48,650 → 00:39:53,590

to talk to people who have been hands on out in the field,

00:39:53,590 → 00:39:57,210

or observing the business aspects of this,

00:39:57,210 → 00:39:58,870

both clinical and business aspects,

00:39:58,870 → 00:40:00,590

as well as technological aspects.

00:40:00,590 → 00:40:03,170

It's part of the Microsoft Research Podcast series,

00:40:03,170 → 00:40:04,570

if people are interested.

00:40:04,570 → 00:40:09,170

And Matt, you are, of course, a great guest on this.

00:40:09,170 → 00:40:11,670

MATTHEW LUNGREN: We had this conversation before we came on,

00:40:11,670 → 00:40:17,110

but our production value is very much not at your level.

00:40:17,110 → 00:40:20,450

But no, it's a phenomenal listen and I

00:40:20,450 → 00:40:23,610

was saying before I think some of the comments

00:40:23,610 → 00:40:25,302

are echoed by different-- you have folks

00:40:25,302 → 00:40:27,010

from all these different backgrounds, all

00:40:27,010 → 00:40:28,750

these different areas of expertise,

00:40:28,750 → 00:40:32,830

but the themes are really pretty clear, I think.

00:40:32,830 → 00:40:35,650

So it's almost like a reinforcement of some things.

00:40:35,650 → 00:40:39,010

A lot of open questions are raised that I think are valid.

00:40:39,010 → 00:40:40,950

I think one of the most provocative things,

00:40:40,950 → 00:40:43,550

and we can probably wrap with this last topic.

00:40:43,550 → 00:40:47,290

But I think it was Zak that talked about,

00:40:47,290 → 00:40:51,090

how does the field of medicine change with this?

00:40:51,090 → 00:40:53,090

Not just that we're going to be using the models

00:40:53,090 → 00:40:59,170

and making our lives easier, but does the subspecialist

00:40:59,170 → 00:41:02,770

start to fade, and we go back to having a generalist that has

00:41:02,770 → 00:41:04,150

these superpowers with these?

00:41:04,150 → 00:41:05,983

I thought that was a very interesting topic,

00:41:05,983 → 00:41:08,390

because right for the last, what, 30-plus years,

00:41:08,390 → 00:41:11,100

even in my, I mean, I'm a subspecialist.

00:41:11,100 → 00:41:14,190

And part of it's the information doubles every 90 days.

00:41:14,190 → 00:41:17,650

All the things we know that make it important to know your field.

00:41:17,650 → 00:41:19,730

But the amount of knowledge that you can possibly

00:41:19,730 → 00:41:21,330

stuff in your head is limited.

00:41:21,330 → 00:41:24,117

But the generalist with these tools,

00:41:24,117 → 00:41:25,950

I think that was a very interesting comment.

00:41:25,950 → 00:41:29,230

And that would literally shift the culture of medicine,

00:41:29,230 → 00:41:33,380

even just what folks choose to go into and all kinds of things.

00:41:33,380 → 00:41:35,210

I don't know if you had a--

00:41:35,210 → 00:41:37,370

PETER LEE: Yeah, I thought about this,

00:41:37,370 → 00:41:41,010

and Zak and I obviously have discussed and debated

00:41:41,010 → 00:41:41,950

this a little bit.

00:41:47,260 → 00:41:49,270

For me, I always have the problem.

00:41:49,270 → 00:41:54,150

I see two opposing arguments here.

00:41:54,150 → 00:41:57,370

Historically, technology has only

00:41:57,370 → 00:42:01,770

increased the number of medical specialties, not decreased it.

00:42:01,770 → 00:42:05,930

And I've said publicly there have

00:42:05,930 → 00:42:09,730

been two times where technology has eliminated specialties.

00:42:09,730 → 00:42:13,630

We don't have phrenology anymore,

00:42:13,630 → 00:42:15,130

because of all the technologies that

00:42:15,130 → 00:42:18,190

have led to the rise of neurology and so on.

00:42:18,190 → 00:42:22,270

And we also don't have barbers doing bloodletting anymore.

00:42:22,270 → 00:42:28,570

And there's lots of specialties that are, in part,

00:42:28,570 → 00:42:30,930

technology-powered.

00:42:30,930 → 00:42:36,530

But beyond that, you can't find examples of medical specialties

00:42:36,530 → 00:42:39,990

that have disappeared because of advances in technology.

00:42:39,990 → 00:42:41,470

In fact, just the opposite.

00:42:41,470 → 00:42:43,770

So that's one argument.

00:42:43,770 → 00:42:46,450

The other argument, though, is--

00:42:46,450 → 00:42:52,930

and I think Sebastian Bubeck, in the podcast, made this point--

00:42:52,930 → 00:42:59,770

humans have a hard time coping with a huge amount of knowledge.

00:42:59,770 → 00:43:03,730

And so that's one of the likely reasons

00:43:03,730 → 00:43:07,490

why we have medical specialties to begin with.

00:43:07,490 → 00:43:09,540

You can become an endocrinologist,

00:43:09,540 → 00:43:11,290

but it's hard to become an endocrinologist

00:43:11,290 → 00:43:15,210

and a nephrologist and a cardiologist and so on.

00:43:15,210 → 00:43:18,530

It's just too much for a single human being to cope with.

00:43:18,530 → 00:43:23,290

But an AI could, and at least in other fields,

00:43:23,290 → 00:43:28,570

we've seen this most vividly in agronomics.

00:43:28,570 → 00:43:33,370

You can actually see a benefit to an AI model understanding

00:43:33,370 → 00:43:38,970

agriculture in Brazil versus Europe versus the US,

00:43:38,970 → 00:43:42,410

and being able to essentially triangulate

00:43:42,410 → 00:43:45,250

across those different agricultural practices

00:43:45,250 → 00:43:49,810

to become a superhuman agronomist.

00:43:49,810 → 00:43:53,450

And so the counterargument is, is it possible

00:43:53,450 → 00:44:02,690

that a single AI can be as good as any human being in 50

00:44:02,690 → 00:44:05,410

medical specialties?

00:44:05,410 → 00:44:08,610

It might then enable general practitioners,

00:44:08,610 → 00:44:11,910

that is, human beings, to be general practitioners.

00:44:11,910 → 00:44:18,630

And I don't know which way it would-- in fact,

00:44:18,630 → 00:44:22,270

I think I'm less equipped than both of you,

00:44:22,270 → 00:44:26,880

Justin and you, Matt, in predicting which way it'll go.

00:44:31,490 → 00:44:33,450

JUSTIN NORDEN: Well, my own version

00:44:33,450 → 00:44:36,010

is I also need more time.

00:44:36,010 → 00:44:37,550

I don't have a strong opinion yet.

00:44:37,550 → 00:44:40,010

And I know I'm copping out on the answer right now,

00:44:40,010 → 00:44:44,650

but I guess maybe it's a push to really think through

00:44:44,650 → 00:44:45,870

it and get there soon.

00:44:45,870 → 00:44:47,630

But I agree with you.

00:44:47,630 → 00:44:51,710

I see good arguments both ways.

00:44:51,710 → 00:44:55,210

Does primary care shift solely AI-first,

00:44:55,210 → 00:44:57,710

and care move towards specialties and interventions

00:44:57,710 → 00:44:59,170

in doing?

00:44:59,170 → 00:45:01,330

Does it shift the opposite and everything

00:45:01,330 → 00:45:05,870

gets done by a superpowered primary care physician?

00:45:05,870 → 00:45:10,530

I think, in some cases, we may see both of, I think early on,

00:45:10,530 → 00:45:11,450

actually we see both.

00:45:11,450 → 00:45:13,010

We see some primary care physicians

00:45:13,010 → 00:45:17,650

who take on so, so much more, and really lean into the tools

00:45:17,650 → 00:45:22,170

and cover many more patients, much more in depth.

00:45:22,170 → 00:45:25,370

And I think we see certain routes in primary care

00:45:25,370 → 00:45:27,970

where patients are going to self-diagnose,

00:45:27,970 → 00:45:30,810

go to Amazon or something, get what

00:45:30,810 → 00:45:34,490

used to be a primary care task done almost in a fully automated

00:45:34,490 → 00:45:35,290

way.

00:45:35,290 → 00:45:39,650

And so I see both of those things happening,

00:45:39,650 → 00:45:41,690

almost immediately.

00:45:41,690 → 00:45:43,610

Then the question is, how does that

00:45:43,610 → 00:45:45,450

perturb the system, which I haven't

00:45:45,450 → 00:45:47,583

come to my own conclusions on yet.

00:45:47,583 → 00:45:50,250

PETER LEE: If I could just say, I think it's so important for us

00:45:50,250 → 00:45:52,850

to be thinking about these things now,

00:45:52,850 → 00:45:54,310

and to be really grappling.

00:45:54,310 → 00:45:56,490

And I have tremendous optimism, because what

00:45:56,490 → 00:46:00,250

I see in the medical world as a techie is

00:46:00,250 → 00:46:01,890

the medical world, and especially

00:46:01,890 → 00:46:04,810

leading institutions like Stanford and others,

00:46:04,810 → 00:46:07,250

really confronting these things head on.

00:46:07,250 → 00:46:10,250

Because another question that's similar

00:46:10,250 → 00:46:16,610

is, if we empower every person with this kind

00:46:16,610 → 00:46:20,170

of medical superintelligence, will we finally

00:46:20,170 → 00:46:25,010

realize the true benefits of early diagnosis and better

00:46:25,010 → 00:46:28,990

health, and therefore, reduce cost and burden on the system?

00:46:28,990 → 00:46:31,610

Or will just the opposite happen,

00:46:31,610 → 00:46:34,090

where medicine is just always going

00:46:34,090 → 00:46:37,370

to be inexact enough that your AI is always

00:46:37,370 → 00:46:39,930

going to find things that are wrong with you,

00:46:39,930 → 00:46:42,410

and is going to now motivate people

00:46:42,410 → 00:46:45,970

to be an even bigger burden on an overstressed health care

00:46:45,970 → 00:46:46,470

system?

00:46:46,470 → 00:46:48,850

And again, I think it's very hard

00:46:48,850 → 00:46:51,170

to know which way it will go.

00:46:51,170 → 00:46:53,410

But what I think I feel good about

00:46:53,410 → 00:46:57,290

is that really, really smart people in the field

00:46:57,290 → 00:46:59,990

are thinking really hard about these things right now.

00:46:59,990 → 00:47:04,250

And we just have to keep pressing on that.

00:47:04,250 → 00:47:06,670

JUSTIN NORDEN: Well, we pulled this up.

00:47:06,670 → 00:47:08,050

I didn't go through it before.

00:47:08,050 → 00:47:12,170

So this is from a friend, Morgan Cheatham,

00:47:12,170 → 00:47:16,310

and he wrote and talked about what you mentioned,

00:47:16,310 → 00:47:18,650

the shifting of value, where it's

00:47:18,650 → 00:47:20,550

been focused on diagnosis here in the middle,

00:47:20,550 → 00:47:23,150

and maybe it'll shift towards prevention or intervention.

00:47:23,150 → 00:47:25,690

00:47:25,690 → 00:47:30,950

I think this may be someday, someday, someday where it goes.

00:47:30,950 → 00:47:35,130

But I actually think value is going to shift far to the right.

00:47:35,130 → 00:47:36,910

I don't think we're near.

00:47:36,910 → 00:47:40,090

I don't think we're close at all to shifting value

00:47:40,090 → 00:47:43,090

upstream towards diagnostic and payment models.

00:47:43,090 → 00:47:45,630

Even though in theory, that's possible from the technology,

00:47:45,630 → 00:47:48,290

I think we're really going to shift to the right, which

00:47:48,290 → 00:47:50,450

is what we're talking about with changes

00:47:50,450 → 00:47:53,690

in coding practices, changes in finding high-value patients

00:47:53,690 → 00:47:56,650

and procedures, hospitals doing a lot more.

00:47:56,650 → 00:48:00,670

So maybe Morgan was just optimistic on his timeline.

00:48:00,670 → 00:48:04,890

But I think we're going to shift a lot to the right, at least

00:48:04,890 → 00:48:06,822

as a first step for where this goes.

00:48:06,822 → 00:48:09,030

PETER LEE: One thing I like about this chart, though,

00:48:09,030 → 00:48:14,470

is I would, let me just stick to Microsoft Research,

00:48:14,470 → 00:48:21,050

but I think every technology R&D organization is the same.

00:48:21,050 → 00:48:25,210

10 years ago, if a techie researcher

00:48:25,210 → 00:48:27,970

thought about medicine, they immediately

00:48:27,970 → 00:48:31,714

gravitated to diagnosis.

00:48:31,714 → 00:48:34,070

And that's good and that's important.

00:48:34,070 → 00:48:39,750

But that is not health care and medicine.

00:48:39,750 → 00:48:41,757

And so one thing I think that we've

00:48:41,757 → 00:48:43,590

gotten smarter about in Microsoft Research--

00:48:43,590 → 00:48:46,130

but we're far from alone, I think, lots and lots of places

00:48:46,130 → 00:48:49,530

have gotten smarter-- is we're seeing the bigger picture better

00:48:49,530 → 00:48:50,590

than we used to.

00:48:50,590 → 00:48:55,650

And so when I see Morgan's chart here, that's what I see,

00:48:55,650 → 00:48:59,050

is that we actually understand that there

00:48:59,050 → 00:49:02,110

are things like prevention and intervention

00:49:02,110 → 00:49:04,870

that could also be helped through technology.

00:49:04,870 → 00:49:07,392

And that's actually a major advance for the tech world.

00:49:07,392 → 00:49:09,350

MATTHEW LUNGREN: It's the start of the journey,

00:49:09,350 → 00:49:11,470

and it really is the diagnostic step.

00:49:11,470 → 00:49:16,930

And oftentimes, Justin, I mean, most of the patients

00:49:16,930 → 00:49:19,890

I've seen in my career even come with essentially

00:49:19,890 → 00:49:21,710

having that diagnosis.

00:49:21,710 → 00:49:26,050

And then it's about the decision-making, the judgment,

00:49:26,050 → 00:49:31,310

and the knowledge to make the right decisions later over time.

00:49:31,310 → 00:49:34,742

So anyway, there's more to come here,

00:49:34,742 → 00:49:37,590

but yeah, this has been absolutely phenomenal.

00:49:37,590 → 00:49:39,810

Thank you, Peter, for sharing your time

00:49:39,810 → 00:49:42,530

with us and your insights.

00:49:42,530 → 00:49:47,470

And as you know, we try to keep this as an ongoing thread

00:49:47,470 → 00:49:49,510

through multiple discussions.

00:49:49,510 → 00:49:54,030

And I think this really added a lot to the prior ones.

00:49:54,030 → 00:49:55,722

So thank you so much for joining us.

00:49:55,722 → 00:49:56,930

PETER LEE: It was really fun.

00:49:56,930 → 00:50:01,040

Matt, thanks, and thanks to you, Justin, for having me.
