Submitted by w_is_h t3_zqlczr in MachineLearning

Hi everyone, my lab has recently made Foresight - in short, it is a GPT-3-like language model that can simulate a patient's future (forecast disorders, medications, procedures, symptoms, ...). It was trained and tested on data from two large UK hospitals covering both physical and mental health. Any feedback is much appreciated (Twitter or here).

Paper: arxiv

Demo: foresight
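
Roughly, the idea is to treat each patient's history as a time-ordered sequence of clinical concepts and train a GPT-style decoder to predict the next concept. A toy sketch of that setup (not our actual model or training code; the concepts and sizes are placeholders):

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Toy vocabulary of clinical concepts (in practice, SNOMED CT concepts extracted from notes).
concepts = ["<pad>", "type_2_diabetes", "metformin", "chest_pain", "ecg", "myocardial_infarction", "aspirin"]
tok2id = {c: i for i, c in enumerate(concepts)}

# Small GPT-2-style decoder over the concept vocabulary.
config = GPT2Config(vocab_size=len(concepts), n_positions=64, n_embd=64, n_layer=2, n_head=2)
model = GPT2LMHeadModel(config)

# One patient's (toy) history as a sequence of concept ids.
history = torch.tensor([[tok2id["type_2_diabetes"], tok2id["metformin"], tok2id["chest_pain"]]])

# Training minimises next-token cross-entropy over many such timelines;
# at inference the model scores likely next concepts (random here, since this model is untrained).
next_logits = model(history).logits[0, -1]
top3 = torch.topk(next_logits, k=3).indices.tolist()
print([concepts[i] for i in top3])
```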

22

Comments


EmmyNoetherRing t1_j0yo8zu wrote

Have you disaggregated your evaluation results? I.e., how does your accuracy differ by demographic segment or illness category?

2

w_is_h OP t1_j0yr1yu wrote

Hi, we did not do that but will note it for the next iteration. During the manual tests we did not see any obvious biases or problems with respect to ethnicity/sex, but it would probably be good to do a quantitative analysis.

1

EmmyNoetherRing t1_j0yrv8i wrote

Don't forget to check accuracy by illness category too. Humans have biases because of social issues; machines also pick up biases from the relative shapes/distributions of the various concepts they're trying to learn, doing better on the simpler and more common ones. You might get high accuracy on cold/flu cases that show up frequently in the corpus and have very simple treatment paths, and because they show up frequently that may bump up your overall accuracy. But at the same time you want to check how it's handling less common cases whose diagnosis/treatment will likely be spread across multiple records over a period of time, like cancer or auto-immune issues.

It’s a good idea to verify that your simulation process isn’t accidentally stripping the diversity out of the original data, by generating instances of the rarer or more complex cases that are biased towards having traits from the simpler and more common cases (especially in this context that might result in some nonsensical record paths for more complex illnesses).
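
Concretely, a disaggregated check could look something like this (a rough sketch; the record fields and group keys are hypothetical, not tied to your data model):

```python
from collections import defaultdict

def accuracy_by_group(records, group_key):
    """records: dicts with 'predicted', 'actual' and metadata fields (hypothetical schema)."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        g = r[group_key]
        totals[g] += 1
        hits[g] += int(r["predicted"] == r["actual"])
    return {g: hits[g] / totals[g] for g in totals}

records = [
    {"predicted": "flu", "actual": "flu", "sex": "F", "category": "respiratory"},
    {"predicted": "flu", "actual": "lymphoma", "sex": "M", "category": "oncology"},
    {"predicted": "asthma", "actual": "asthma", "sex": "M", "category": "respiratory"},
]
print(accuracy_by_group(records, "sex"))       # accuracy per demographic segment
print(accuracy_by_group(records, "category"))  # exposes the weak oncology slice
```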

3

w_is_h OP t1_j0z6n68 wrote

We did not explore intrinsic biases in the data, like doctors prescribing a certain medication or giving a certain diagnosis because of someone's social status, because something is more common, or for any other reason. This certainly happens, and there are many papers on these problems in healthcare; in fact, we think the model (Foresight) can be used to explore biases in the data. In the future, we hope to address this by also training the models on medical guidelines and biomedical literature, not just hospital text.

We did analyse the predictions for problems like the model always predicting the most common or simplest concepts. I will add a histogram of the F1 scores over different concepts to the paper, but it does show that the model accurately predicts a very wide range of concepts. We also did a manual analysis, where 5 clinicians checked the model predictions, and the model is in fact better at predicting complex and unusual cases. But this is expected, because forecasting someone's future and saying they will have the flu in 3 months is nearly impossible.
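
For reference, the per-concept scoring behind that histogram looks roughly like this (a simplified sketch, not our exact evaluation code):

```python
from collections import defaultdict
import matplotlib.pyplot as plt

def per_concept_f1(pairs):
    """pairs: list of (predicted_concepts, actual_concepts) sets, one per prediction step."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for pred, actual in pairs:
        for c in pred & actual: tp[c] += 1
        for c in pred - actual: fp[c] += 1
        for c in actual - pred: fn[c] += 1
    return {c: 2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in set(tp) | set(fp) | set(fn)}

pairs = [({"flu"}, {"flu"}), ({"flu"}, {"lymphoma"}), ({"asthma", "flu"}, {"asthma"})]
f1s = per_concept_f1(pairs)
plt.hist(list(f1s.values()), bins=10)   # distribution of F1 across concepts
plt.xlabel("F1 per concept")
plt.ylabel("number of concepts")
plt.show()
```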

2

FHIR_HL7_Integrator t1_j0z41ws wrote

I work as an interop architect in healthcare, on the payer, provider, device, and EMR/EHR side. This is of interest to me. I will read your paper, try your demo, and give you my thoughts.

I read your abstract but couldn't read the whole thing; arXiv access isn't working for me this morning. I tested the app, and I believe the timeline view lists a chronological sequence of encounters (visits) and symptoms, and the Predict box below then lists predicted potential morbidities. I understand this is a POC app, so it doesn't show a ton of detail.

My question is: how are you getting at this data? Is it a database ETL from the EHR? Are you capturing incoming messages (HL7, CDA, FHIR, etc.)? I am working on a project for spontaneously building healthcare communication links between different healthcare entities; as it stands, we have to set up data links manually, and I want AI to translate between entities.

Anyway, fantastic job and congratulations. I hope you have continued success!

1

w_is_h OP t1_j0z8x15 wrote

I can send the paper if needed. Regarding the timeline view, you are exactly right. And yes, this is just a quick demo; new features will be coming out in the following months (this is a research tool, nothing commercial).

We take all free text from a hospital EHR (collected using CogStack, a data harmonisation platform for hospitals) and extract disorders, symptoms, medications, and all other relevant biomedical concepts using MedCAT. We then create the timelines, enrich them with any structured data we have access to, and train the models.
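
At a high level the extraction step looks roughly like this (a simplified sketch assuming MedCAT's model-pack API; the pack path and note fields are placeholders, and this is not our production pipeline):

```python
from medcat.cat import CAT

# Hypothetical model pack path; a real deployment would use a pack trained on its own data.
cat = CAT.load_model_pack("path/to/medcat_model_pack.zip")

def note_to_concepts(note_text):
    """Extract biomedical concepts (disorders, symptoms, medications, ...) from free text."""
    entities = cat.get_entities(note_text)["entities"].values()
    return [e["pretty_name"] for e in entities]

def build_timeline(notes):
    """notes: list of (timestamp, note_text); returns a time-ordered concept sequence."""
    timeline = []
    for ts, text in sorted(notes):
        timeline += [(ts, concept) for concept in note_to_concepts(text)]
    return timeline

# The resulting concept timeline, optionally enriched with structured data
# (age, admission type, ...), is what the forecasting model is trained on.
```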

Thank you for the feedback.

1

FHIR_HL7_Integrator t1_j0z9l0i wrote

Would love to read the paper. If I DM my email would you be ok with that? Thanks

1

w_is_h OP t1_j0z9x5k wrote

Of course, please go ahead.

1

persistentrobot t1_j128d70 wrote

You should take a look at UniHPF. It makes minimal assumptions about data format/mapping by chucking everything into a large language model, and its performance is comparable to what you get with FHIR embeddings. I think this is an interesting avenue for machine learning in health, but the failure modes of large language models are difficult to uncover. For example, how much can covariates shift before a prediction flips? Or is the act of measuring the covariate the only information we need?
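
A toy illustration of that first question, with a stand-in classifier and features rather than UniHPF or Foresight:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                  # toy covariates
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # toy outcome
clf = LogisticRegression().fit(X, y)

x = X[0].copy()
base = clf.predict([x])[0]
for shift in np.linspace(0, 3, 31):            # gradually shift the first covariate
    x_shifted = x.copy()
    x_shifted[0] += shift
    if clf.predict([x_shifted])[0] != base:
        print(f"prediction flips after a shift of about {shift:.1f}")
        break
```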

1