Submitted by ShakeNBakeGibson t3_10wblpv in IAmA

We’re Chris Gibson u/ShakeNBakeGibson, CEO and co-founder of Recursion Pharmaceuticals, and Imran Haque u/IHaque_Recursion, Recursion’s VP of Data Science. Our company was founded in 2013 by two grad students and a professor looking to take a less biased approach to drug discovery, using tech like AI and robotic automation.

Our work focuses on generating massive amounts of biological and chemical data in-house in our own labs using lots of robots, and using it to train our machine learning algorithms to get better at predicting the result of experiments before we do them! Our drug discovery engine maps biology and chemistry, and helps scientists navigate this map by generating trillions of predicted relationships between genes and chemical compounds. We also release some of this data to the public - we recently deployed our 5th open-source dataset of this information.

We’re all about figuring out how to predict how to treat diseases best! With 5 programs in clinical trials, and dozens more in the works, we’re here and looking forward to answering your questions on drug discovery, AI, data science and more. We'll kick off at 1PM PT / 2PM MT / 4PM ET - Ask us anything!

Proof: Here's my proof

Here's Imran's proof

Edit: Lots of great questions and comments! Our two hours have come to a close. Thank you to everyone who turned out. For more info on MolRec, you can check out the details here. For more info on our open source dataset, RxRx3, you can find that here. You can also catch us over on Twitter, YouTube, or email us at That’s a wrap, folks!




Novel-Time-1279 t1_j7mcq3y wrote

What evidence exists that the insights gained via single-cell perturbations can help uncover novel disease targets? A critic might say single-cell perturbations are simply not a good model for complex multicellular disease processes, as the disease phenotype is rarely a linear sum of single-cell phenotypes. Is the method most applicable to rare diseases with a clearly understood gene driver, or also to highly prevalent diseases? I think Yumanity failed recently with their yeast disease model in neurology, so I’m curious how you address this criticism.


ShakeNBakeGibson OP t1_j7mh7nq wrote

All reductions of complex biology cut out some of the information and become poorer representations of the patient. Scale and translation are opposing forces in biological experimentation. The most translational model is human - which is hardest to scale. The least translational model is in silico, but is easiest to scale.

What we do at Recursion is work in a human cell, the smallest unit of biology that has all of the instructions. It is not perfectly translational, but there are many examples of where it has worked well. But it does allow us to scale across biology and chemistry (whole genome scale, ~1M compounds, etc).

Using that model, we find the strong correlates of gene function and patient biology from the world’s knowledge of disease, and explore those in our dataset to find ways of modifying those processes. We then do the rigorous work of translating success from our cellular models in much more complex systems. Our clinical programs demonstrate that we are able to confirm these insights from the platform in more complex in vivo models.


wellboys t1_j7okvwl wrote

How/do you anticipate overcoming regulatory hurdles associated with that type of use case? I can see how this data would be valuable, but this whole concept sounds like a giant HIPAA violation as soon as you try and operationalize it.

ETA: I don't think the limiting factor on big data applications to public health is the lack of conceptual frameworks, I think it's a failure of this type of plan when the rubber hits the road. I'd rather be wrong, so tell me how I am!


WhatsFairIsFair t1_j7pw8ae wrote

I don't get where you're coming from. Is it the combining with the world's datasets piece? They're probably using either publicly available datasets or have specific agreements with companies to make use of their datasets.

HIPAA concerns patient identity mainly, so if the dataset is anonymized or fictionalized then it's likely fine. Or if it can't be anonymized then they'll just add some extra paperwork before sharing.

Don't think that HIPAA means your data isn't shared with other companies. It just means the companies will sign some paperwork first.

Edit: also the rubber was on the road 9 years ago apparently because they've been doing this since 2013


t_rexinated t1_j89cjqf wrote

They use a combo of already available public datasets plus strategic partnerships or licensing deals that give them access to otherwise walled-off, yet potentially highly valuable, data sources.

Regardless of where it comes from, everything is regulatory/HIPAA compliant prior to the data actually changing hands.


IAmA_Nerd_AMA t1_j7puy0x wrote

To simplify: you let the AI do the brainstorming at the cellular level but you test the most successful of those predictions using traditional methods.


reddit455 t1_j7m7apr wrote

which outcome provides the most scientific benefit?

which one contributes more to our collective brain?


the millions of simulations that fail


the one that solves the problem


wasn't viagra a hair loss drug with an "unfortunate" yet common side effect identified during trials :P


is the AI looking for "alternative uses"?


ShakeNBakeGibson OP t1_j7mcrga wrote

Love that we have one of our first questions even before the official start. Honestly, the millions of simulations that fail enable the one that solves the problem. Both matter!

…and yes, Viagra was a drug originally developed for hypertension and angina pectoris, and as the story goes, when the drug didn’t work that well for those indications and they stopped the trials, none of the participants wanted to give back their clinical trial drugs…. because, well, you know…

But counting on serendipity to give us outcomes like that, in diseases of higher unmet need of course, is not a recipe for success. So we’ve created Recursion to systemize serendipity. But we aren’t stopping at known drugs… we’ve built a dataset spanning over a million molecules that could help us find totally new drugs for many diseases. So it’s alternative uses, new uses, unexpected uses, and more.

My super fun lawyer would want me to also say: this discussion may contain forward looking statements that are based on current day estimates and operations and importantly are subject to a number of risks. For more details please see the "Risk Factors" in our 10-Q and 10-K SEC filings.

EDIT: added link to comment


EmilyU1F984 t1_j7odx3x wrote

They didn’t stop the trials mate.

Viagra was brought to market first for pulmonary hypertension, and is still on the market for that indication.

After post-release reports showed massive benefit in ED, approval for that second indication was obtained.

It is still the major treatment option for pulmonary hypertension, an otherwise very quickly lethal disease, and now progression can be delayed by decades at best.


ShakeNBakeGibson OP t1_j7qcsog wrote

Please see the following paper with many helpful refs. Since it is behind a paywall, here's the relevant bit...

"Pfizer was seeking a drug for angina when it originally created sildenafil (Viagra) in the 1980s. As an inhibitor of phosphodiesterase-5 (PDE5), sildenafil was intended to relax coronary arteries and therefore allow greater coronary blood flow. The desired cardiovascular effects were not observed on the healthy volunteers tested at the Sandwich, England, R&D facility in 1991–1992. However, several volunteers reported in their questionnaires that they had had unusually strong and persistent erections. Pfizer researchers did not immediately realize that they had a blockbuster on their hands, but when a member of the team read a report that identified PDE5 as a key enzyme in the biochemical pathway mediating erections, a trial in impotent men was quickly set up. A large-scale study carried out on 3,700 men worldwide with erectile dysfunction between 1993 and 1995 confirmed that it was effective in 63% of men tested with the lowest dose level and in 82% of men tested with the highest dose. Of note, in many of these studies, Pfizer’s researchers had difficulties retrieving unused sample of the drug from many subjects in the experimental group as they did not want to give the pills back! By 2003, sildenafil had annual sales of US $1.88 billion and nearly 8 million men were taking sildenafil in the United States alone."

Sildenafil was approved for ED in the US in 1998, and was later approved for pulmonary hypertension in the US in 2005.


Trumpfreeaccount t1_j7q372i wrote

What a surprise a guy who's touting his ai based business is full of shit. Lol.


JackIsBackWithCrack t1_j7qj9jz wrote

Settle down buddy


Trumpfreeaccount t1_j7qkzq5 wrote

Pretty settled. Not sure why you're defending a guy who is just making stuff up in an AMA to make himself look knowledgeable. And using phrases like systemize serendipity.


JackIsBackWithCrack t1_j7qwc6f wrote

Homie 99% of the posts on this god-forsaken subreddit are literally for the express purpose of free advertising. Claiming this guy is unknowledgeable solely because he is planning on using AI is both pointless and obtuse. Pointless because he’s probably just some PR intern and obtuse because AI has the potential to revolutionize the medical field (among others).


Hipshotopotamus t1_j7mjbiq wrote

Do you start with active sites and conformation and then try to identify a match from ChEMBL? How do you pick where to start?


IHaque_Recursion t1_j7n0hnm wrote

We actually don’t start our drug discovery efforts from single targets – check out my earlier reply in the AMA for more details. ChEMBL certainly is an excellent source of structural information, but our insights come not from these data, but rather from high-dimensional relationships between cells treated with compounds and genetic knockout. We advance series of compounds using this data prior to having any information about the target itself.


ShivohumShivohum t1_j7mkdbk wrote

How widely used are GNN based frameworks in your research?


IHaque_Recursion t1_j7n0rk7 wrote

GNNs are in the suite of methods that we use and evaluate. But it’s useful to recognize that although we often draw molecules as graphs, that is not necessarily the only useful (or best) representation for molecules in machine learning. We recently published (poster and talk, paper) research using DeBERTa-style representations and self-supervision over molecular graphs, achieving SOTA results on 9/22 tasks in the Therapeutic Data Commons ADMET tasks.
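To make the "graphs are not the only representation" point concrete, here is a minimal sketch showing one small molecule (ethanol) in two toy forms: a node/edge graph of the kind a GNN consumes, and a token sequence of the kind a sequence model consumes. The SMILES string as the sequence form, the helper `adjacency`, and the naive character tokenizer are all assumptions chosen for illustration; none of this is taken from Recursion's published method.

```python
# Two toy views of ethanol (heavy atoms only; hydrogens are implicit).
# Hypothetical illustration, not Recursion's pipeline.

# View 1: a graph, the natural input for a GNN.
atoms = ["C", "C", "O"]      # node labels
bonds = [(0, 1), (1, 2)]     # undirected single bonds between atom indices


def adjacency(n_atoms, bonds):
    """Build an adjacency list from an undirected bond list."""
    adj = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    return adj


graph_view = adjacency(len(atoms), bonds)

# View 2: a token sequence, the natural input for a sequence model.
# Character-level splitting is a simplification: it would wrongly
# split two-letter atoms such as "Cl" or "Br".
smiles = "CCO"
sequence_view = list(smiles)

print(graph_view)      # {0: [1], 1: [0, 2], 2: [1]}
print(sequence_view)   # ['C', 'C', 'O']
```

Both views encode the same molecule; which one a model learns from best is an empirical question, which is the point of the comment above.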


nucleosome t1_j7nfycc wrote

Do you guys need someone who can do CyTOF?


Softcorps_dn t1_j7my7oy wrote

Viagra was studied for use against high blood pressure before it became a boner pill.


Chance-Mammoth1245 t1_j7m8xyz wrote

You recently posted on LinkedIn that you were publicly sharing millions of microscopy images that Recursion had collected in order to enhance community drug development efforts. BUT, in that release, you purposefully kept 16,000 of the genes anonymous.

Are you trying to get the benefits of appearing to support "open science", while not actually providing data that could help your competitors?

Linkedin post:


IHaque_Recursion t1_j7mn89n wrote

So, data sharing in industrial science is complicated. I’ve spent my career in biotech driving for greater openness and data release in the companies where I’ve been. The “natural” state of data is to be siloed. This isn’t just an industrial thing – I’ve read plenty of papers from academic groups with “data available on request” (lol nope, I tried) – and the driver is always the same: a fear that “we spent this money to make the data, how do we get value out of it?”

One of the reasons I joined Recursion in 2019 was that Chris and the team shared that commitment to sharing learnings back to the world. The balance we’ve struck to support open science, but also use this data to drive internal research and develop therapeutics as a public company, is to share a huge dataset that is partially blinded. In RxRx3 we are revealing ~700 genes and 1600 compounds. We’ve sometimes chosen different points on the balance; for example, our COVID datasets RxRx19a and RxRx19b were released completely openly (CC-BY) because we thought the public health crisis was more important than any commercial interest we might have in the data. Our current aim is to continue to unblind parts of the RxRx3 dataset over time, so please stay tuned for additional releases over time.

We have also contributed to open science releasing not just datasets, but tools. Associated with our COVID datasets, we released a data explorer allowing folks to explore the results from our COVID screens. Along with RxRx3, we released a tool (MolRec) where people outside of Recursion can explore some of the same insights that our scientists use to generate novel therapeutic hypotheses and advance new discovery programs, and get a look at how Recursion is turning drug discovery from a trial-and-error process into a search problem.


70looking20 t1_j7mg0vb wrote

  1. How is the job market for biotech 2023/2024? Especially for computational scientists?
  2. I’m a Comp Chem PhD graduating end of 2023, looking to switch to CADD. What qualities are you guys looking for from a computational drug discovery scientist apart from those mentioned in the job descriptions? Thank you!

IHaque_Recursion t1_j7mp91v wrote

Though there have been a lot of painful layoffs in biotech and tech lately, we and many other companies are still hiring. That said, computational chemistry is without a doubt going to be a critical component of the future of drug discovery and it’s awesome you’re kicking off your career in this space. We will certainly be continuing to grow in this space and would love to hear more about your work and journey in this field. As you can probably tell, we look to hire innovators who are passionate about their work and committed to bold, outside the box thinking in pursuit of our mission.


NotAPreppie t1_j7mmep1 wrote

Is it true that to understand recursion you must first understand recursion?


sneaky_squirrel t1_j7oekt3 wrote

I'll take the first recursion joke I can I'll take the first recursion joke I can I'll take the first recursion joke I can ...


BioRevolution t1_j7m99oh wrote

  1. What is your reason behind not hosting quarterly earnings calls to address and expand on certain topics together with analysts and make them available on your website/YouTube?

  2. Are you planning to repeat the Recursion Download Day as a yearly event?


ShakeNBakeGibson OP t1_j7mexdn wrote

We don’t currently do earnings calls but we like engaging with people where they are, like here on reddit.


Download Day was a great event! We’re currently thinking we’ll do it every 12-24 months–stay tuned.


avelak t1_j7mxlh9 wrote

Wait you honestly think a reddit AMA is a better use of your time as CEO than actual earnings calls???

Who's your target audience? 14-year-olds with a weekly allowance?


ShakeNBakeGibson OP t1_j7n0xuf wrote

We spend a lot of time with investors and analysts in a wide variety of forums from the JP Morgan Healthcare conference to social media. For example, we recently spent a whole day with our analysts and many key investors digging deep into our strategy, platform, pipeline and partnerships at Download Day. You can watch all four hours of detailed content, including questions from analysts, at the link.
We think spending <1% of our time finding creative ways to connect to new audiences is a good use of time. We know there are potential future employees on reddit, potential partners and collaborators and more on here. And if we can inspire a bunch of 14 year olds to use their talents for science, that sounds like a win too.


t_rexinated t1_j89eo1v wrote

haha wtf do you even know about anything, you dummy


SpaceElevatorMusic t1_j7m6pxd wrote

Hi, and thanks for this AMA.

I've read that AI could be used for reducing the amount of computation necessary to model really complex things like protein folding. Does your work touch on that, or are you otherwise able to comment on whether or not that's true?

In general, how much success have you had in "predicting the result of experiments before we do them"?

Lastly, while I realize you're a company and seeking to make money, do you have any standards in place that you're committed to to avoid price gouging people and/or taxpayers for access to the results of your healthcare-related research?


ShakeNBakeGibson OP t1_j7mcdru wrote

Thank you for the questions!
AI has made huge inroads into tough problems like protein folding. Huge credit to Deepmind and so many others there!
We’ve gone after a different problem than AlphaFold (and others). Can we understand the function of all the proteins in our body without necessarily needing to know the structure? If one could understand cause and effect of all the proteins (when they are overactive, not present, or broken, etc), we could start to better understand what protein to target… and that is important because 90% of drugs that go into clinical trials fail and most often that is because the wrong target is picked.
In terms of successes predicting the results of experiments — we can test ourselves by looking for “ground truths” about biology and chemistry – relationships and pathways that have been proven out in humans – that show up in our maps of biology and chemistry. When our teams search the map and see landmarks they expect, it gives them (and us) extra confidence to explore new ideas surfaced there.
And to your final question – while I can’t say exactly what we’ll charge for future medicines because we’re still fairly early in the development process, I do believe the best way to bring down drug prices is to industrialize the drug discovery process. If we can find a way to scale our pipeline, bringing better medicines to patients faster, with less failure, we can start to bend the cost curve. That’s our goal in the coming decades.
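The "ground truth" check described above can be sketched as a small recall test: given an embedding per gene, does each known partner show up as the nearest neighbor on the map? This is a minimal sketch under stated assumptions. The gene names, the three-dimensional vectors, the use of cosine similarity, and the helpers `nearest` and `recall_at_1` are all made up for illustration and are not Recursion's actual data or method.

```python
import math

# Hypothetical embeddings; real maps would be far higher-dimensional.
embeddings = {
    "GENE_A": [1.0, 0.1, 0.0],
    "GENE_B": [0.9, 0.2, 0.1],
    "GENE_C": [0.0, 1.0, 0.2],
    "GENE_D": [0.1, 0.9, 0.3],
}

# Hypothetical "ground truth": pairs assumed to be related.
known_pairs = [("GENE_A", "GENE_B"), ("GENE_C", "GENE_D")]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(name):
    """Return the most similar other entry in the map."""
    others = [k for k in embeddings if k != name]
    return max(others, key=lambda k: cosine(embeddings[name], embeddings[k]))


def recall_at_1(pairs):
    """Fraction of known pairs whose partner is the top neighbor."""
    hits = sum(1 for a, b in pairs if nearest(a) == b)
    return hits / len(pairs)


print(recall_at_1(known_pairs))  # 1.0 for this toy data
```

A high recall on known biology does not prove a new prediction is right; it only gives some confidence that the map places familiar landmarks where they are expected.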


BioRevolution t1_j7mc4rl wrote

What are your "dream" partnerships? Are there any companies out there that you are excited/inspired by and would love to have by your side (Other than Bayer and Roche of course :))


ShakeNBakeGibson OP t1_j7meg5a wrote

I love this question. We’re really lucky to already be working with two dream partners! One with Bayer in fibrosis and one with Roche/Genentech in neuroscience and a single oncology indication.

What we look for in new, transformational partnerships are threefold:

  1. Learning for us - can we learn from a partner to make the company better for the future?
  2. Impact - can we drive value for patients and our shareholders?
  3. Data - can we gain access to, retain access to, subsidize access to, or otherwise build our dataset?

[Edited - list formatting]


BioRevolution t1_j7mb1we wrote

What was your reason behind the sequential entry into the different "omics" technologies: Phenomics makes sense, but why not then move into Metabolomics or Proteomics, which are more established in comparison to transcriptomics?


IHaque_Recursion t1_j7mg5db wrote

Might be some personal bias here – I come from a sequencing background before Recursion – but I don’t necessarily think metabolomics or proteomics are more established than transcriptomics (especially in a research context; clinical testing is different!). The past 10-15 years have seen an absolute _explosion_ in the ability to generate (and analyze/interpret) sequencing data at scale. One of our core principles is being able to generate high-dimensional data at scale, and from that perspective, transcriptomics is a great complement to phenomics. Metabolomic and proteomic technologies (whether affinity or MS-based) are still more expensive and smaller scale than what you can achieve by sequencing. That being said, as technology advances and we find the right application areas, we’re interested in exploring what these other readouts can do for us.


Linooney t1_j7mszql wrote

As a computational proteomics researcher who works mostly in MS, it feels like there are dozens more transcriptomics colleagues around me per metabolomics/proteomics person lol. Though there are definitely exciting developments in high throughput technologies, even at single cell scale, coming up.


BioRevolution t1_j7mbq3y wrote

Questions regarding your lab automation:

  1. What are your ambitions for the automated chemical synthesis platforms? And how do they compare to e.g. the Eli Lilly platforms that they built together with Strateos?

  2. Have you looked into partnering up for advanced automation with companies such as Zymergen/Ginkgo Bioworks and buying their RACs (Reconfigurable Automation Carts)?

  3. Which vendors are you most happy with/planning to continue using in that area? Hamilton/Tecan/Thermo Fisher/Chemspeed...

  4. Can you show more footage of your automated labs?


IHaque_Recursion t1_j7my3wq wrote

1 - We aim to close the loop between high-dimensional, biological profiling of compounds and rapidly learning how to drive the compound series’ evolution to higher potency, lower risk and better kinetics. This is a huge and critical component of the overall vision of industrializing drug discovery. In practice we are dedicating major efforts to ML-guided SAR, and how automated synthesis integrates into this plan is part of our roadmap.

2,3 - given the highly custom nature of the automation systems we have built, and the need for ultra-high control over experimental precision, we have relationships with several automation experts in this space. As far as partnerships in this space are concerned, we can’t comment on specific business development plans or transactions until we announce them publicly. What I can say is that we recognize the work it has taken over the last decade to map and navigate biology, and we believe there are many other teams and technologies that have been developing in parallel and we’re always exploring options to bring in additional capabilities that may accelerate our mission.

4 - The “Recursion 101” video we released in October of 2022 has some of the most current footage of our automation labs — if you haven’t seen the video, we (selfishly) think it’s worth the watch. We have also released “Recursion's Mapping & Navigating Demonstration” which shows footage of our laboratories.


Novel-Time-1279 t1_j7md0o5 wrote

Are you limited by capital or by discovery? Eg have you discovered what you think are disease targets with unmet need where you’re reasonably confident you have a real target, but you have to deprioritize it due to trial costs? Or is the limiting factor finding targets and agonists/antagonists for them?


ShakeNBakeGibson OP t1_j7mncyd wrote

Neither. Time is the most limited resource. So much unmet need and so much science to explore. Having a searchable database of 3 trillion gene and compound relationships results in a superabundance of potential insights. We want to focus our efforts on those where we have the highest confidence in the compound<>gene relationship and that addressing this biology has a high likelihood of addressing patient needs. To do this, we integrate additional automated layers of information, such as transcriptomics and SAR tractability to accelerate discovery and reveal which insights have the highest potential to benefit our vision of a diverse pipeline of high-impact programs. We have to spend a lot of time onboarding folks to think this way and that’s why time is our most limited resource.


corgis_are_awesome t1_j7o26aw wrote

I’ve recently become obsessed with the idea of using AI and technology to solve the problem of human longevity. I want to figure out how to beat cancer and other diseases before they end up killing me or one of my loved ones.

I don’t understand why so many people are distracting themselves with random careers when they could be literally saving their own lives if they just went into medical research.

So my question for you is this:

How can I help you?

I am currently on a sabbatical, in between projects, and I’m looking for my next thing to dedicate my life to.

I am a software engineer with over 20 years of professional experience in the field. I have worked on tech and software in HIPAA healthcare environments as well as FERPA educational environments. I have helped maintain servers in physical data centers. I have built and scaled large virtual server systems. I have built numerous web apps and tools. I have built machine learning data pipelines and data warehouses. My most recent project was building out an ai voice home shopping assistant for a major retailer.

You say your most constrained resource is time. What if I could help with that?


apfejes t1_j7oofx7 wrote

Let me take a crack at this. It’s not my AMA, but it’s a question that comes up periodically in bioinformatics - the cross disciplinary field that deals with data science/programming and biology.

Most importantly, the field already exists, and the low hanging fruit was mostly picked 30 years ago, when it was reasonably possible for a programmer to work on a problem that hadn’t been tackled yet, and automate something that the biologists hadn’t gotten around to.

Alas, those days are gone. Bioinformaticians are usually very competent programmers, and rarely can make use of people from computer science without training them in biology first. Biology, after all, is the field in which nature has evolved solutions to problems, and exceptions are more common than the rules they break.

Thus, time may be short in this field, but insight is truly the valuable commodity. Understanding how to interpret the biological data is far far more important than automation. While we do see machine learning helping somewhat, pattern finding and the patterns themselves are useless without someone to interpret them and decide if they’re real. Or worth following up on. Usually they aren’t. Biology data is inherently very noisy.

So, all of that is the long way of saying that longevity or curing cancer isn’t going to be a question of automating our way to a solution. If you want to understand the complexity of the problem, you would need to understand more about the problem itself. There are no simple solutions here, and time is only part of the missing piece needed to make real progress.


corgis_are_awesome t1_j7penly wrote

First of all, I greatly appreciate your time in writing a response!

If I understand you correctly, you believe that I would not have much to offer in longevity research without first going deep into training about biology.

If so, maybe that’s what I need to do next. I’m not opposed to going to school or being an apprentice somewhere while I transition into a helpful contributor.

With that said, I do feel like I could be a helpful contributor right now, as it is, even if I don’t have a degree in biology. I’m not too fond of wasting years of my life in college while the world goes by and all my other skills atrophy and become dated.

My thoughts on how to get around the time constraints revolve around multiplying the efficiency of our time by building and leveraging AI-powered data processing pipelines and ai workflows to analyze, summarize, and filter data and train new iterations or models. You say you want more insights and more time. That’s what I’m talking about—leveraging AI automation. I don’t have to know about biology to build those types of systems or to help you come up with out-of-the-box solutions to various problems.

I don’t have to spend years mastering how to solve Rubik’s cubes. I can use an app on my phone to solve any scrambled cube in less than a minute.

I think the life and longevity science fields could use a few more “ethical mad scientists” and “ethical biohackers” to help them think outside of the box instead of being so stuck up in academia and clinically focused on minutiae.


apfejes t1_j7pm3bk wrote

It’s not being “stuck up” - it’s just that the field doesn’t really lack for your skill set. There are plenty of people who know what you know who also know the biology side.

I’m not trying to gate keep, or tell you that your skills are useless. I am just trying to tell you the harsh reality that the skill sets that make you so valuable in your own field aren’t sufficient for you to solve difficult problems in my field.

Do you know the Dunning-Kruger curve? It basically translates to people having way more confidence in their own skills than warranted when venturing into areas they know little about, and usually being faced with a shocking “wake up call” when they start learning the complexity of the problems.

I’m not saying you can’t contribute, but I am saying that the low hanging “let’s apply AI to all these problems!” days are already here for biology. What’s needed isn’t more programmers, it’s programmers who understand the complexity of the data they’re working on.

I’ve watched two decades of programmers volunteer to solve biology’s greatest problems, and all fall short. That shouldn’t stop you from trying, but keep in mind that your skills are necessary - but not sufficient - to accomplish your goals.


corgis_are_awesome t1_j7pnww7 wrote

I seriously doubt that the biology field is wholly saturated with ai engineers and that the only way to be helpful is to have a deep knowledge of biology.

I’m a generally intelligent person, and I learn and adapt to new problems reasonably quickly. The future will be full of the need for human-guided thinking machines of all levels of complexity.


apfejes t1_j7pr60b wrote

Well, it is not my job to stop you from trying. I just wanted to explain the issue, to save you some pain and surprise.

Please prove me wrong - that’s how it’s done in this field. Let me know when you have solved the problems that have stumped us for the last 60-odd years without a deep understanding of what those problems are.

Edit: can you design a Rubik’s cube solver without understanding how to solve a Rubik’s cube?


corgis_are_awesome t1_j7puwt4 wrote

Yes, I can 100% design a machine that will iteratively develop an algorithm that can solve Rubik’s cubes, without ever knowing exactly how to solve them myself.


apfejes t1_j7pw9ls wrote

Feel free to join the crowd of people who are trying to do that.

I've spent the last year talking with people in this space, and all of the big pharmaceutical companies are now saying they won't work with AI-based companies because their algorithms don't work on complex biology data. Too many people have made the claim that they could use machine learning to mine patterns out of biology data sets and failed.

It's not a knock on ML or AI. How would your algorithm know that the data it's working on is unreliable and that biology data often has 50% false positive rates on yeast-2-hybrid screens, or a given SNP may be a miscall that has propagated through 10 generations of reference genomes? Or that the assay that generated the data you're looking at used a promiscuous antibody that's triggered on a related protein that happens to express in the lab culture you're working on? If the data you're working on isn't clean, how are you planning on getting a clean signal out?

Rubik's cubes are child's play compared to the networks that Recursion is working on.


corgis_are_awesome t1_j7pzjpd wrote

Draw a circle around the intersection of Data Scientist, Programmer, Superman, and Bioinformatician.

That’s basically my career target


apfejes t1_j7q02uh wrote

Thank you for citing my own figure to refute me!

You can't be in the "superman" or bioinformatician areas without having an understanding of biology - that's how Venn diagrams work.


corgis_are_awesome t1_j7q2bzp wrote

Haha yeah I figured you might like that. :-)

Do you have any recommendations on the most efficient way to become knowledgeable about biology, especially in the way that would be useful to longevity research?

Would I have to go through a full college degree on the topic, or is there a way to bypass a lot of the noise and focus on learning the key parts that matter? I have a long history of rapidly learning new things. I like to start with a problem and work my way backwards towards the solution, learning and leveraging different technologies as I iterate toward a solution.

For example, when I was 13, I was approached by a company that wanted a software system that would let them have a communal inbox for their support staff, and a way for individual team members to pick up an email and start responding to it without stepping on someone else’s toes. So I repurposed a Matt’s Script Archive forum perl script, taught myself the basics of the perl language, and then molded it into a support ticket system that met their needs. I did that in a matter of weeks, at the age of 13, with a language I didn’t even know.

That was a long time ago, sure, but I have since learned many other languages and built many other solutions for companies over the years. For example, I learned Python and got a job working with AI in education, specifically because I knew that Python was big in the machine learning world, and I wanted to move my career in that general direction.


apfejes t1_j7q4thu wrote

Actually, I don't have a recommendation, unfortunately. There are many different fields in biology, and learning each one can be a few years of work, plus the common foundations - so the question isn't how do you learn but "How much do you need to know to do a specific job?"

Unfortunately, biology is the opposite of programming. Programming is a logical set of tools that build on each other. If you learn arrays, or dictionaries or data structures, you can go out and apply them logically. You can figure out which one will have the best performance in a given situation, and optimization is a logical extension of what you know. You can spend a life time learning, but the basics don't change.

In biology, EVERYTHING is an exception to something else. Learn the entire "biochemical pathway" chart, and then you'll discover that some animals do things differently, or short circuit pieces of it, or just get a specific chemical from their diet and don't need to do a certain part of it. It's all chaos. Biology is the mad hatter's perspective and there's no real guarantee that something is going to work the way you think it should, or the way you were taught. E.g., translation of RNA to protein always begins with a methionine (AUG codon)... except that sometimes it doesn't. Sometimes organisms have found a way to get things started with a missing base, or sometimes things are just wobbly... or maybe sometimes it's just not at all what you think it's going to be.

That's the rambly way of saying that you'll never know what you need to know until it's too late and you discover something was wrong. For my Masters thesis, I worked on a really slow growing bacteria, and was trying to convince it to do something for months (take up a plasmid so I could knock out a gene). I worked on that system for about a year, and never got it to work. A couple years later, working on a different project, I discovered that the post-doc who set up the system had missed a critical detail: the half life of one of the antibiotics, around which the entire system had been built, was shorter than the incubation time of the bacteria we were growing. The system could never have worked on that organism, and no amount of work would ever have changed it. I wasted months on that, and never once thought to validate the actual system that had been used by the guy for a year before I started. Who knows what to make of the data he'd recorded... is it all garbage? I really don't know.

How deep would I have to have studied to know to look at the half life of kanamycin? I haven't a clue. In biology, it's not what you know that gets you - it's what you don't know.


corgis_are_awesome t1_j7q87s5 wrote

I don’t know… to be honest, the more you describe biological systems, the more I think of how real-world software systems actually evolve in the wild, and the nightmare that is debugging large, complex, undocumented systems. But even if it seems chaotic, there are logical patterns that can be found, and an understanding that can be developed.

Out in the real world, software programs rarely grow into the perfectly optimized and well organized logical constructs taught about in college. More often than not, they are full of extremely wonky solutions and poorly documented workarounds that have been duct taped together years ago by random people pasting code from stack overflow.

In my mind, biology isn’t even a biology problem as much as it is a particle physics problem.

For example - Particle Life:


apfejes t1_j7qablb wrote

> In my mind, biology isn’t even a biology problem as much as it is a particle physics problem.

Emergence is a thing, but 3.7 Billion years of emergent property evolution has created levels of complexity that are far FAR beyond the level of the simple software tools that can mimic the surface level complexity you see in "computer life" simulations.

The computer complexity you're talking about with wonky solutions and poorly documented code are, on average, about 40 years old.

The biological equivalence would be to continue building the same way for about 100,000,000x longer.

I don't dispute the analogy, but it's a bit of Dunning-Kruger, again. The level of complexity isn't going to be obvious to you until you start trying to solve the problems. 3.7 Billion years of wonky solutions layered on top of each other is a lot different than 40 years.


t_rexinated t1_j89jsip wrote

the overhype-underdelivery cycle is real and that's led to very understandable vaporware vibes amongst bigger biotech and pharma.

honestly, if you think that you'll simply be able to just pop the data from your absolute trash of an experiment into a magical shiny black box and get anything meaningful out of it, then you're an idiot and you deserve to lose your money on something you think will solve all of your problems for you.

agreed: if you're shoveling hot garbage in, hot garbage is def gonna be coming out.

when done properly and when done well, AI/ML/GNNs/CNNs/GANs/blah blah blah are absolutely amazing and powerful tools. it just takes a lot of hard work to get to that point, and few do it well. when done well though, peeps are doing some really awesome work tho...especially in image processing phenotypic profiling:


apfejes t1_j89mgov wrote

Completely agree that AI has massive potential, but only when paired with people who understand the data they’re feeding in.


Zouden t1_j7u7gbv wrote

> I seriously doubt that the biology field is wholly saturated with ai engineers

It's not saturated at all. But from the tone of your comments it sounds like you think biotech companies haven't thought about hiring AI engineers, which isn't the case. Of course they see the benefit of engineers. Have you looked to see if Recursion is hiring?


corgis_are_awesome t1_j7u9im5 wrote

I guess I just wasn’t sure which one of their many job listings my particular skill set would best fall under, so I was hoping I could have a conversation, ya know?


BioRevolution t1_j7mbwuo wrote

What's the best outcome so far out of releasing the public datasets such as RXRX1/2 and now 3? Do you expect to continue releasing more and more data sets like this?


IHaque_Recursion t1_j7mm269 wrote

I’ve been super excited to see how our datasets have driven academic research out in the world. Recursion has been on the cutting edge of developing phenomics as a high-throughput biological modality, and the RxRx datasets are among the largest and best-organized public datasets out there for folks to work with. I’ve seen blog posts, conference posters, MS theses, and more written on our datasets. (We’ve also hired a number of folks to our team based on their work on these data!)


Captain-Moroni t1_j7pputk wrote

Who is the sexiest member of your senior staff and why is it Mason Victors?


mr-kodiak t1_j82a5wo wrote

I mean, have you seen that guy? What possible evidence could you conceive of that would make it NOT Mason Victors... I contend there is no such evidence.


SandwichNo5059 t1_j7mb56j wrote

What steps do you take for controlling for batch variability?

How far do you think you’re from novel chemical matter rather than drug repurposing trials?


IHaque_Recursion t1_j7mjumw wrote

Batch effects are probably the most annoying part about doing machine learning in biology – if you’re not careful, ML methods will preferentially learn batch signal rather than the “real” biological signal you want.

We actually put out a dataset, RxRx1, back in 2019, to address this question. You can check this here. Here is some of what we learned (ourselves, and via the crowdsourced answers we got on Kaggle).

Handling batch effects takes a combination of physical and computational processes. To answer at a high level:

  1. We’ve carefully engineered and automated our lab to minimize experimental variability (you’d be surprised how clearly the pipetting patterns of different scientists can come out in the data – which is why we automate).
  2. We’ve scaled our lab, so that we can afford to ($ and time!) collect multiple replicates of each data point. This can be at multiple levels of replication – exactly the same system, different batches of cells, different CRISPR guides targeting the same gene, etc. – which enables us to characterize different sources of variation. Our phenomics platform can do up to 2.2 million experiments per week!
  3. We’ve both applied known computational methods and built custom ML methods to control / exclude batch variability. Papers currently under review!
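
As a rough illustration of the "known computational methods" mentioned in point 3 (and not Recursion's own approach, which the comment says is in papers under review), one of the simplest options is to center and scale each batch's features using that batch's own control wells. The sketch below runs on synthetic data with NumPy; the function name `center_scale_by_batch_controls`, the array shapes, and the simulated batch shifts are all assumptions made for illustration.

```python
import numpy as np

def center_scale_by_batch_controls(features, batch_ids, is_control, eps=1e-8):
    """Standardize every well using the mean/std of the control wells in its own batch."""
    features = np.asarray(features, dtype=float)
    batch_ids = np.asarray(batch_ids)
    is_control = np.asarray(is_control, dtype=bool)
    corrected = np.empty_like(features)
    for batch in np.unique(batch_ids):
        in_batch = batch_ids == batch
        controls = features[in_batch & is_control]
        if len(controls) == 0:
            raise ValueError(f"batch {batch!r} has no control wells")
        mu = controls.mean(axis=0)
        sigma = controls.std(axis=0)
        corrected[in_batch] = (features[in_batch] - mu) / (sigma + eps)
    return corrected

# Synthetic data (hypothetical): 3 batches, 200 wells each, 5 features per well.
rng = np.random.default_rng(0)
n_batches, wells_per_batch, n_features = 3, 200, 5
batch_ids = np.repeat(np.arange(n_batches), wells_per_batch)
# Assume the first 50 wells of every batch are untreated controls.
is_control = np.tile(np.arange(wells_per_batch) < 50, n_batches)
# Each batch gets its own large additive shift, standing in for a batch effect.
batch_shift = rng.normal(0, 5, size=(n_batches, n_features))
features = rng.normal(0, 1, size=(n_batches * wells_per_batch, n_features))
features = features + batch_shift[batch_ids]

corrected = center_scale_by_batch_controls(features, batch_ids, is_control)
```

After correction, the control wells in every batch have mean 0 and unit spread, so a model can no longer tell batches apart from the controls alone. This only removes additive and multiplicative shifts; it relies on the replicates and controls described in points 1 and 2, and real batch effects can be far less well-behaved.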

SandwichNo5059 t1_j7mc3zs wrote

How do you balance time in dry lab machine learning predictions vs. experimental work in cells or animals to validate a compound?


ShakeNBakeGibson OP t1_j7mka71 wrote

We actually think about this a lot and we believe that these processes need to learn from each other. We build feedback and feed forward loops between dry lab and experimental work - essentially we think iteration is most important. We do up to 2.2 million experiments in our wet lab each week to feed machine learning predictions and those predictions feed back into the wet lab experiment design. We do all of this in service of decoding biology and delivering therapeutics to patients.


EDIT: Removed a typo.


BioRevolution t1_j7md1lk wrote

What are your ambitions/activities around 3-dimensional cell assays/co-cultivation/organ-on-a-chip technologies to further advance your phenomics studies and bring them closer to animal models and finally to humans?


ShakeNBakeGibson OP t1_j7mze4o wrote

We’ve done a lot of work on co-culture at Recursion and we agree that 3D assays have a lot of utility; as a company focused on innovation these are areas that are highly interesting to us. Unfortunately we aren’t able to discuss all the methods and areas of research but feel free to take a look at our presentation from Download Day for some flavor on where we are innovating.


Neat_Caterpillar_759 t1_j7mdz7c wrote

Why do you suppose it has been so difficult for Recursion to keep a CSO (been without since 8/2021) and a CMO (been without since 6/2022)? How do you feel like the lack of such experienced leadership has affected your ability to rapidly translate your insights into medicines?


ShakeNBakeGibson OP t1_j7mj1gd wrote

I’m really hard to work for…
In all seriousness, almost all of the executives at Recursion today have been with the company for four or more years, and we are proud of that track-record. That said, we have a really ambitious mission at the intersection of many diverse fields, and we fully support our current leadership while we make sure we get the right people into these roles.


YBGMelloYello t1_j7mfccd wrote

Heard that RXRX is 3x better than Moderna’s drug discovery yet moderna has way more drugs in the pipeline as well as many in phase 2 and 3. Isn’t mrna easier to work with vs small molecules? When do we see the 3x performance materialize?


ShakeNBakeGibson OP t1_j7mmu8i wrote

Always great to hear from a fan… we’re blushing.

But your question is good - mRNA works really well in some important parts of biology - like tricking your body into thinking it has seen components of a virus so it mounts an immune response. But mRNA is probably not the right tool for other areas of biology (like inhibiting an overactive protein).

We think Moderna’s work is awesome


YBGMelloYello t1_j7msas6 wrote

I’m an investor of both companies. I’ve been working my cost avg down on RXRX. And my cost avg on MRNA is moving up. I still believe in both platforms and have yet to sell any stock of either company. The future is bright for both of you. God speed.


BioRevolution t1_j7mfm0h wrote

When are you opening your first labs/offices in Europe (and where would you like them to be), so that you can also tap more extensively into the European talent pool without them having to relocate?


ShakeNBakeGibson OP t1_j7n12sm wrote

We don’t have any immediate plans for an expansion in Europe right now.


robin_arjn t1_j7mj5u0 wrote

Do you plan to export/adapt your software internationally?
Do you plan to collect data from other laboratories (national and international research)?


ShakeNBakeGibson OP t1_j7mpp3c wrote

We don’t sell software. Check out a demo of one of our internal tools, MolRec. We don’t collect data from other laboratories but we do partner closely with select drug discovery partners.


iamsupaman t1_j7ms0no wrote

Q1: What is your opinion on open-sourcing the full dataset, and the possible benefits for medicine of doing so?

Q2: What is your biggest struggle at this moment to go to the next level?


ShakeNBakeGibson OP t1_j7mwav6 wrote

Q1 - We just open-sourced RxRx3, the largest public dataset of its kind so far… but as for unblinding the rest… [insert picture of Dr. Evil with hairless cat]
Q2 - My biggest learning as a founder has been that the most complex thing in building a company with a mission as ambitious as ours is not the science, it is the people. Helping everyone here work at their maximum potential, together, and rowing in the same direction is and always will be (IMO at least), the hardest struggle.


NachoR t1_j7m82ja wrote

1 - On drug discovery: Are you researching new compounds, natural or synthetic? Or trying to map possible interactions of known compounds?

2 - Is your research in any way related to the work of AlphaFold?


ShakeNBakeGibson OP t1_j7mfids wrote

OK, Imran answered this question, but he’s currently restarting his computer, because Murphy’s Law… so from Imran:
In our early years we focused on using our approach to enable drug repurposing programs (“known compounds”), hence why 4 of our 5 clinical stage programs are with repurposed molecules. But for the last few years we’ve been using our maps to discover & optimize novel chemical entities, including both natural and synthetic ones - in fact our first new chemical entity (synthetic compound) just entered Phase 1 clinical trials!

For 2, see above!


DuckProfessional6774 t1_j7mcbtq wrote

Would you rather fight 100 duck-sized horses or 1 horse-sized duck?


IHaque_Recursion t1_j7mdie8 wrote

On a scale from Darkwing to the duckling in my kid's bedtime book that wandered away from his nest after specifically being told not to, what kind of ducks are we talking about?


nervez t1_j7ophr3 wrote

finally a question i understand.


Redcat16 t1_j7mdejn wrote

How does your technology compare to this automated scientist platform?


IHaque_Recursion t1_j7ms2y4 wrote

Directing evolution of bacteria to change their small molecule output is indeed a great example of the utility of AI and is definitely similar to how we view AI in the overall evolution of a compound series. Today, our core applications of AI are at a lower level in the stack – for example, taking raw images from our microscopes and projecting them into biologically meaningful embedding spaces. That said, we’re building our discovery technologies with an eye towards building closed-loop optimization cycles in small-molecule discovery. We actually just presented more about this a couple weeks ago – if you’re curious, see more here in the Recursion OS section from our recent Download Day.


Novel-Time-1279 t1_j7mdfcu wrote

To what extent (if any) do you think that a database profiling common human genetic variation in e.g. KRAS tumors would be helpful so that you can design antibodies that will be broadly applicable? Do you analyze mass datasets from e.g. TCGA or Genomics England and try to design antibodies considering common variants, or do you pick a canonical target and work from there?


IHaque_Recursion t1_j7mqu1g wrote

I have genetics on the brain, so yes: I definitely think that data from both germline GWAS and somatic variation studies can be valuable for drug discovery. We don’t work on antibodies at Recursion today (though we have piloted them and they worked great on the platform), but we certainly make use of genetics data to inform our directions. As far as canonical targets, our platform allows us to be agnostic and to explore without having to select a target. As we move through our drug discovery process we aim to understand as much as possible about the target and its mechanism of action.


ReleaseSalty t1_j7mdoa4 wrote

Do you have the capability to utilize available ultra large chemical spaces?

At some point, will you be able to connect such implicit, non-enumerated spaces with predicted activity?


IHaque_Recursion t1_j7mo48q wrote

Yes - our digital chemistry platform allows our scientists to search and expand hits across multi-billion molecule virtual libraries and growing!


rubixd t1_j7mdsii wrote

Given the scale of opiate crisis and the general lack of reliable addiction treatment are you or your competitors looking into developing less or even non addictive pain management drugs?

Perhaps alternatives to opiates?


ShakeNBakeGibson OP t1_j7mqi3l wrote

This is not an area we are working on, but we think it is really important. We founded a biotech and healthcare incubator called Altitude Lab to help grow the next Recursion and support underrepresented founders here in the Mountain West, and there is a young company there working on this exact problem.


Novel-Time-1279 t1_j7me0bc wrote

Do you see any use cases for looking at metagenomics data in your drug discovery or lead optimization efforts?


ShakeNBakeGibson OP t1_j7mteeo wrote

We have a vibrant innovation arm and we actively seek opportunities to enhance the use of our data to decode biology and develop therapeutics for patients. While we can’t comment on the specifics of our explorative biology and tech, metagenomics is certainly in the spirit of the work we do.


BioRevolution t1_j7mf1eh wrote

  1. The area of AI-enabled drug discovery is a fast-moving field: when do you plan to update the Frost & Sullivan analysis slide showing the top companies? It will most likely require regular updating.

  2. What made you change the visualization of your pipeline slide? (Going from the horizontal "scatter" plot with the different programs from early discovery to clinical, to the newer bar-plot illustration that no longer shows the number of early-stage programs.)


ShakeNBakeGibson OP t1_j7n1a5j wrote

We agree. It has been a while. Keeping up with all the great work in the space is hard, but this is on the list.

We changed the pipeline slide visualization based on feedback from lots of investors who appreciated seeing something they were more familiar with.


PatentSavvy t1_j7mo0tm wrote

Are you guys engaged in protecting your methods of drug discovery via patent applications? Or do you guys plan on protecting any potential candidates once their existence becomes known through the methods? Or both?

As a patent attorney, your model sounds interesting and I hope you protect your discoveries and inventions. I have been involved in patents relating to pharmaceutical design and drug development and have seen the various processes first hand. It definitely is an iterative and arduous process but it can be totally worth it in the end if you have that one successful candidate that proves therapeutically effective and obtains FDA approval.


ShakeNBakeGibson OP t1_j7msdaz wrote

We certainly protect and will continue to protect our development candidates using industry standard kinds of patent filings. But, as you imply, our development candidates are only a small part of the innovation that happens at Recursion. We do have multiple patents and filings on our RecursionOS, but we also look at protecting inventions in the biology and hardware spaces where we innovate. We also protect some of the key advances on our platform via trade secret. This doesn’t even take into account the massive amount of proprietary data we’ve generated.
That said, we think we can contribute a lot to open-science without giving away our advantage - see our RxRx datasets and publications.


scootty83 t1_j7msbmi wrote

Can this technology lead to customized healthcare on a per individual level?

Can you take someone’s genetic info, run it through the AI and pinpoint which medications would be best for that individual and/or synthesize new medications that would work best for that one person?


ShakeNBakeGibson OP t1_j7n216a wrote

We very much hope that the computationally-accelerated advancements in biology and chemistry one day result in exactly this - the ability to create the precise compound to treat a disease, even on the individual level. We think that may be a couple decades away, but we are going to keep pushing to make those crazy ideas real.


freedomofnow t1_j7mx7er wrote

How is it looking in the field of curing hearing damage through auditory trauma along with hyperacusis?


ShakeNBakeGibson OP t1_j7n28k9 wrote

We are not currently working on any auditory trauma indications, but are cheering on the organizations that are finding treatments.


freedomofnow t1_j7n2c14 wrote

Okay, thanks for the response. Do you see anything happening in the future?


zean_rm t1_j7n3x9z wrote

How often do you use the climbing wall?


AmbitiousExample9355 t1_j7n43mf wrote

Are there any cases within drug discovery where the source distribution shifts such that it differs from the original dataset?


gamingchemist952 t1_j7o2e2o wrote

Is your algorithm compatible with Oligonucleotide therapeutics? Not quite small molecules but not quite biologicals either.


agissilver t1_j7qnqca wrote

I don't work at Recursion but I'd venture the answer is that they have a variety of libraries ranging from oligos and small molecules to CRISPR constructs.


MyNameIsIgglePiggle t1_j7p7vtd wrote

If DNA is just the source code of living creatures, why can't we make an "emulator" to run it?


another_grackle t1_j7p9zsb wrote

So are you going to use AI to help people get more affordable healthcare or just exploit people in need to get rich?


IAmAModBot t1_j7mcgkd wrote

For more AMAs on this topic, subscribe to r/IAmA_Tech, and check out our other topic-specific AMA subreddits here.


Novel-Time-1279 t1_j7mgwv6 wrote

For your repurposing efforts, have you considered partnering with one of the large-scale EHR data providers and running causal inference algorithms to try to identify potential unexpected effects of certain therapeutics or combinations thereof in longitudinal outcome data?


IHaque_Recursion t1_j7mu67h wrote

It’s an interesting idea, but we think our unique advantage is being able to generate scalable, relatable, and reliable data in-house. Clinical data are extremely challenging to work with from a statistical perspective (the number of confounders is astounding, and once you stratify you may be left with very few samples). That said, real-world evidence is certainly interesting from a clinical development perspective for understanding the patient landscape, longitudinal disease progression, and for informing patient selection strategies in clinical trials; and other population-scale datasets may be of interest for advancing our discovery and development pipelines.
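
To make the stratification point concrete: if every combination of k binary confounders becomes its own stratum, there are 2^k strata, so even a large cohort thins out quickly. A minimal arithmetic sketch follows; the cohort size and confounder counts are made-up numbers, not real clinical data, and `mean_patients_per_stratum` is a hypothetical helper.

```python
def mean_patients_per_stratum(n_patients, n_binary_confounders):
    """Average stratum size when each combination of binary confounders is one stratum."""
    n_strata = 2 ** n_binary_confounders
    return n_patients / n_strata

# Hypothetical cohort of 100,000 patients, stratified on 0, 5, 10, or 15 binary confounders.
for k in (0, 5, 10, 15):
    print(f"{k:>2} confounders -> {mean_patients_per_stratum(100_000, k):,.1f} patients per stratum")
```

This assumes patients are spread evenly, which is the best case. Real confounders are unbalanced and often not binary, so many strata end up even smaller than the average suggests.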


mediaacc t1_j7micfw wrote

Doesn't the use of AI massively restrict the creative discoveries that could be made, restricting the discoveries to the information base present in the AI's machine learning algorithms?


ShakeNBakeGibson OP t1_j7mpbro wrote

The scale of data required to understand biology, paired with our susceptibility to bias as humans, is a big limiting factor on our (useful) creativity in biology. Augmenting our team with less biased ML and AI systems to explore the complexity of biology and chemistry is a recipe for success for increasing creativity IMO.


Crackracket t1_j7mij6w wrote

What's the most interesting drug you've discovered so far in terms of use?


ShakeNBakeGibson OP t1_j7mr7p0 wrote

That’s like asking us to choose a favorite child… can’t say.


carocllb t1_j7mj31n wrote

What are the similarities between your AI and ChatGPT ?


ShakeNBakeGibson OP t1_j7molmg wrote

We asked ChatGPT…
It says: “Recursion Pharmaceuticals uses artificial intelligence as a tool to discover new medicines, but its AI is not similar to ChatGPT. ChatGPT is a language generation AI model that can generate human-like text based on input data. In contrast, Recursion Pharmaceuticals uses AI for image analysis and high-throughput screening to identify new drug targets and develop new treatments for diseases. The AI used by Recursion Pharmaceuticals is more specialized and focused on drug discovery, while ChatGPT is a more general-purpose language generation AI model.”

Thanks ChatGPT!


Pookie_0 t1_j7mjidv wrote

We all know that ChatGPT made mistakes at its beginning - which is the point of machine learning and AI. But considering that your AI is in the pharmaceutical domain, this is more of a life or death situation. How do you plan on dealing with such mistakes?


ShakeNBakeGibson OP t1_j7mttku wrote

This is why we don’t just take the inferences from our maps of biology and send them into clinical trials. The FDA has a lot of useful restrictions on testing drugs in humans that ensure that everyone does a ton of work to minimize risk of experimenting in humans. For example, we do numerous validation experiments in human cells, animal models and preclinical models after our AI gives us input but before we go into trials and many of these experiments address safety. That said, one can never minimize risk to zero and we take our responsibility to patients seriously.


[deleted] t1_j7mjupl wrote



IHaque_Recursion t1_j7mpdo7 wrote

It looks like the cloud. It also looks like BioHive-1, our private supercomputer (#115 in the world on the latest TOP500 list).


BioRevolution t1_j7mjzjz wrote

Last question from my side: What are you plans around Closed Loop optimization?

You are experts in AI/ML and super-users/heavy on lab. automation. Do you have any ambitions on implementing workflows for autonomous experiments (also called self driving labs in some publications)?

Thanks a lot for taking the time to do this and answer all the questions, I appreciate it.


GimmickNG t1_j7mm8cs wrote

Now that Google DeepMind and other AI tools can predict protein structures, what's the real utility of programs like Folding@Home and FoldIt?


IHaque_Recursion t1_j7mv6ze wrote

I did my PhD in the Folding@home lab, so I like this one. There’s a distinction between what’s formally called “ground-state structure” and “structural dynamics”. “Ground state structure” is the lowest-energy, most stable structure of a protein; for me, the ground state structure is “lying in bed”. But only knowing that doesn’t tell you how the structure moves around, which it turns out is important. For example, when I sprained my shoulder, the movement of my arm was highly restricted, but you wouldn’t have known that from looking at one position in which I sleep (you creep). Folding@home is more focused on modeling the dynamics of proteins than their ground state structures. For example, the most effective recent COVID vaccines used a modification to the spike protein called “S-2P”/“prefusion-stabilized” that effectively froze the protein in one particular shape rather than allowing it to fluctuate, which enhanced its ability to generate a useful immune response.
That said, dynamics is the obvious next step for ML methods in protein structure, so I would not be surprised to see new developments here!


GimmickNG t1_j7n4avs wrote

I see, thanks! Good to know the effort in running Folding@Home hasn't been made redundant by AI just yet, although I certainly look forward to developments in the field!


Revlis-TK421 t1_j7mmzz2 wrote

> predicted relationships between genes and chemical compounds.

Are you controlling for expressed vs non-expressed genes for a given cell type / stage of development? Epigenetic factors?


IHaque_Recursion t1_j7n1l9p wrote

We build maps of biology in a range of cell types for exactly this reason – different cell types express different genes. For example, in our partnership with Roche and Genentech, we are building maps in a range of neuroscience-relevant cell types to capture their unique biology.


bo_rrito t1_j7moagt wrote

Why the decision to ignore structure based drug design?


IHaque_Recursion t1_j7mws6x wrote

The majority of drugs don’t fail because we can’t engage the target with a small or large molecule - they fail because we pick the wrong target. Hence our focus on mapping and navigating causal biology. Our platform is exceptionally well-suited to target-agnostic identification of compounds that impact biology, which absolutely means we don’t always know the target of our compounds. However, one of the major advantages of our map is that it can often uncover the real targets of our active compounds, enabling us to use advancements in structure-based design. Additionally, the underlying learnings in this field are even useful in the target-agnostic space, as we try to featurize compounds and learn how to make molecules not only more potent against their primary target, but also better in their overall efficacy, safety and metabolic profile.
That said, we actually do make use of structure-based methods where appropriate. What we don’t do is limit ourselves to solely identifying particular targets (and their structures) ahead of time when initiating discovery programs.


bo_rrito t1_j7n15as wrote

Thank you-- this is an interesting perspective! I spend large amounts of time convincing structure-based scientists that dynamics, thermodynamics, and kinetics are important to understand drug binding and biological function (and especially allostery), so circumventing structure seems like a whole other paradigm.

If you can point me to any comprehensive papers describing your approach, I'd be really grateful!


ReadsAndLearns t1_j7mq5gf wrote

Have y'all experimented with single cell Multiomic platforms like 10x or Missionbio?

The major benefit that I see with single cell data is that it provides clonal information which aren't available in bulk methods. Do you see any benefits of these technologies in drug discovery? Can they help improve your models?


IHaque_Recursion t1_j7n2ly1 wrote

I can’t comment about all of our internal technologies. But! We did recently publish work with our collaborators at Genentech on benchmarking methods to build maps of biology, which we evaluated on both our phenomics data and (publicly-available) 10x scRNA-seq (Perturb-seq) data – check it out here. So, draw your own conclusions…


supertyson t1_j7mqwnc wrote

It's great that large datasets are being pulled in, but what are procedures around making sure that the data itself is good/useful?


IHaque_Recursion t1_j7n1vgs wrote

We run our experiments in house so that we can control the quality and relevance of the data. This type of attention to detail requires doing a lot of the unsexy behind-the-scenes operational improvements to control for as many as possible of the 'exogenous' factors that can influence what actually takes place in our experimental wells. To manage this, we have (to an extent) backward integrated with our supply chain so that we can (i) anticipate where possible or (ii) correct for changes in the media our vendors supply, different coatings that suppliers may put on plates, etc. Additionally, we have built an incredibly robust tracking process that allows us to measure the metadata from every step in our multi-day assay, so that we maintain precise control over things like volume transfers, compound dwell times, plate movements, etc. to further ensure this relatability. I also wrote more earlier in the AMA about how we handle batch effects!


Groggolog t1_j7mrxcu wrote

Have you looked at using Conformal Prediction for uncertainty quantification in your ML pipeline? If not, why not? It's a technique that has been around for a while but I don't see it very widely used, though some of the example use cases I have seen were drug discovery NNs.


IHaque_Recursion t1_j7mzd3e wrote

Conformal prediction is indeed an interesting method (or family thereof). I can’t comment on our undisclosed internal machine learning research, but what I can say is that machine learning on biological problems tends to be much, much harder than that on common toy or benchmarking datasets. Uncertainty quantification is usually an even harder problem than pure accuracy measurement, especially when you have a mix of known and unknown systematic and random effects in your data-generating process.
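
(Editorial illustration, not from the AMA and not Recursion's method.) For readers who haven't met the technique, here is a minimal sketch of split conformal prediction for regression. Everything in it is an assumption made for illustration: the toy linear `model`, the synthetic Gaussian noise, and the helper names `conformal_quantile` and `predict_interval`. It also assumes calibration and test points are exchangeable, which is exactly the assumption that batch effects and other systematic shifts in real biological data can break.

```python
import math
import random


def conformal_quantile(scores, alpha):
    """Threshold on calibration scores for target miscoverage alpha."""
    n = len(scores)
    # Finite-sample corrected rank (1-indexed): ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points to guarantee coverage at this alpha.
        return math.inf
    return sorted(scores)[rank - 1]


def predict_interval(model, x, q):
    """Symmetric interval around the point prediction with half-width q."""
    y_hat = model(x)
    return y_hat - q, y_hat + q


def model(x):
    # Hypothetical pre-trained predictor; stands in for any fitted model.
    return 2.0 * x


def sample(rng, n):
    # Synthetic data: y = 2x + unit Gaussian noise (illustrative only).
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0.0, 1.0)) for x in xs]


rng = random.Random(0)

# 1. Score a held-out calibration set with absolute residuals.
calibration = sample(rng, 500)
scores = [abs(y - model(x)) for x, y in calibration]

# 2. Pick the interval half-width for roughly 90% coverage.
q = conformal_quantile(scores, alpha=0.1)

# 3. Check empirical coverage on fresh data from the same distribution.
test_set = sample(rng, 2000)
covered = 0
for x, y in test_set:
    lo, hi = predict_interval(model, x, q)
    if lo <= y <= hi:
        covered += 1
coverage = covered / len(test_set)
print(f"half-width={q:.3f}, empirical coverage={coverage:.3f}")
```

On exchangeable data like this toy set, the empirical coverage should land near the 90% target. The guarantee is marginal (averaged over inputs), and it says nothing once the test distribution drifts away from the calibration distribution.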


jreverblades20 t1_j7mts1e wrote

How can we cure muscular dystrophy!?


ShakeNBakeGibson OP t1_j7mzmiu wrote

We are not working on this indication at this point in time as the genetics behind it are not a good fit for the technical parameters of our platform today, but it is a devastating disease and we are rooting for those who are actively pursuing discovery in that area.


jreverblades20 t1_j7n3c5n wrote

Any great resources to find those people that you’re able to share?


VitaScientiae t1_j7mv04c wrote

Why have you stayed in SLC as your headquarters, vs moving it to Silicon Valley or Cambridge or somewhere more biotech dense?


ShakeNBakeGibson OP t1_j7myf9q wrote

There are pros and cons to any geography today, many of which are being blurred by the move to (or from) remote work. We ended up in Salt Lake City serendipitously. I spun the company out of my dissertation work at the University of Utah with my co-founders back in 2013.

As we grew the company, we found a lot of great scientific and technical talent here in Utah. However, we had a harder time finding experienced, senior talent from biotech and pharma in the area. What that meant is that we had to build a really strong recruiting arm to the company, but once people commit to Recursion they tend to stay for a long time with little turnover, which is huge for us when building something this complex. We’re a proud leader of Utah’s Biohive community and believe deeply in the community we’ve created here in SLC. Not to mention all the fun things that come with being based in a mountainous state!

That said, we are now ~500 people and want to have the best talent in the world, and so we have remote staff, as well as teams in CA and Canada. And we certainly could imagine opening offices in other places in the future.


haunted-liver-1 t1_j7ogdcd wrote

What's the percent of chemicals your AI has discovered that would be classified as biological weapons?


agissilver t1_j7qo1qn wrote

I am late to the party but wondering how the expansion for new work cells is going? I interviewed for an automation engineering position last year and then was told it was unexpectedly cut and maybe there would be availability again in a year or so.


BioRevolution t1_j7tncw5 wrote

For everyone late to the party or re-reading the answers from Recursion:

Are you interested in staying up to date on Recursion and the AI/robotics enabled drug discovery field? Feel free to join/check out the UNOFFICIAL Recursion Pharma community on reddit: r/RecursionPharma and join in on the discussion, where we share related news/patents/interviews and discuss the technology/progress in the surrounding space.