AI in Science: Promise and Peril
This week, Google has launched a new AI tool called Co-Scientist. We hear from one researcher who has tried it out with stunning results. But how much should we trust tools like this - and what are the dangers?
And what about the problem of AI-generated text and images? We talk to an ‘image integrity analyst’ who hunts down fake or manipulated pictures in scientific papers.
Finally, the planets of the solar system are coming into an unusual alignment. Astronomer Royal for Scotland Catherine Heymans shares how to glimpse the planetary parade.
Presenter: Victoria Gill
Producers: Ilan Goodman, Sophie Ormiston & Ella Hubber
Editor: Martin Smith
Production Co-ordinator: Jana Bennett-Holesworth
Transcript
BBC Sounds, Music, Radio, Podcasts.
Hello, lovely, curious-minded people.
Welcome to Inside Science, a programme that was first broadcast on the 27th of February, 2025.
I'm Victoria Gill, and today we are going to delve straight into a scientific revolution, artificial intelligence, and its transformational role in science.
Over the next half hour, we're going to unravel the power that's promised and the threats that are posed by this transformational technology.
We'll investigate whether AI really solved, in just a few days, a major microbiological mystery that had taken human scientists years to crack.
And we're examining the bizarre world of AI fakery and how it's making its way into scientific papers.
To navigate all of this, we have technology expert and science communication lecturer Gareth Mitchell in the studio.
Hello, Gareth.
Hello, nice to be here.
It's very nice to have you.
And this is right up your street, isn't it?
Oh, very nice.
That's why you're here.
Oh, I'm having such a good programme.
You know, because I'm so fascinated by machine learning.
I've been reporting on it for years.
I know you have as well, Vic.
And I'm fascinated by the way science happens.
And so the story we're talking about today brings both of those bumping into each other with amazing consequences, you know, for good, we hope.
But, you know, I've spoken to some scientists who are already sounding a bit sceptical about it as well.
So, this has put the cat among the pigeons a little bit.
So, yeah, it's a story that matters for the world of science.
But by default, it means it matters for the rest of us too.
Yeah, yeah, absolutely.
Let's get right into it then, Gareth.
You have been speaking to a scientist who had a bit of a bewildering experience when he tested a new AI tool, is that right?
Yeah, absolutely, yes.
So, um, this was at Imperial College, where I went to speak to Dr. Thiago Costa.
And he and his team have been testing a tool that's being developed by Google.
And this is called the Co-Scientist.
So I suppose the clue's in the name a little bit about what this is about.
And so far, this tool is being aimed specifically at biomedical applications.
And Thiago and his team, they study bacteria, but also these little things called phages.
And I hope you don't mind, Vic, but I might put you on the spot here.
Do you know as a top correspondent and presenter what a phage is?
Top correspondent? I'm going to take that. Um, okay, I mean, without referring at all to my script or any preparatory conversations I've had with you: phages are like little tiny viruses that can infect bacteria, right?
Indeed, yes. And they do so through these little tails, and it's through the tail that the genetic material, DNA, in the phage goes into the bacteria. And there's some excitement around them. I mean, scientifically they're very interesting, but potentially therapeutically as well.
So these phages matter.
So, and we're talking about phages quite a lot in this interview we're about to hear.
So, yep, you have the phage, has some DNA in it, and there's a mechanism for that DNA going into the bacteria.
Right, that's what we need to hold in our heads.
So, you spoke to Thiago about his experience with Co-Scientist.
Um, so let's hear more from that conversation.
A phage is an entity made of proteins: this very big, large ball with the DNA inside, and a tube through which the DNA gets injected into the bacteria.
These non-infectious phages are lacking a tail.
And the tail is what is important for a phage to bind the surface of a bacteria and inject the DNA.
What we found was that the DNA of these non-infectious phages was embedded in different bacterial species, and we didn't know why or how it was getting there.
So, you're trying to find out how this process works.
That's ultimately what you're trying to get down to.
So, you need to sit down at some point and think, we think it might be this thing, as I understand, and then in science, you have to then go and test your hypothesis.
But Google came to you, didn't they, at one point and said, We've been working on something, we want you to help us try it out.
Is that how it happens?
So, they asked us to basically understand whether the scientific hypotheses that the AI system was generating were valid.
And the only way to validate those hypotheses is actually go to the lab and test whether those hypotheses are correct or not.
So what does the system look like?
I'm just trying to think, is it a bit like Chat GPT?
Well, actually, we didn't have direct access to the system, because this was still under development by Google.
So what we did was to send them what our scientific question was and a few lines of introduction and a few references.
So what were you typing into the Google co-scientist to kind of get the hypothesis that you wanted?
So it was a very simple question, was how these phages, which are non-infectious, are able to infect bacteria from different species and integrate themselves into their DNA.
What was the mechanism?
So after a few years of research, we understood that what those non-infectious entities were doing was ejecting the tails from different phages in order to inject their DNA, which was encapsulated in this ball-shaped structure on top of the tail.
So we knew the answer to the question before we challenged the AI system.
So the question was like, how are these phages getting their DNA into the bacteria, even though they don't have the equipment, the mechanism, the means really to do any of that?
Yes, that was exactly the question.
And then, two days later, the AI system came up with five different hypotheses.
So, the algorithm spits this out for you after 48 hours.
What was your reaction when you saw this?
Well, we were very surprised, because basically the main hypothesis was a mirror of the experimental results that we had obtained, and which took us several years to reach. So it was a shock, I would say.
Is it possible, though, just from the literature you'd already put out, because you'll be putting out preprints and stuff as you go along, presenting it at conferences, that this had kind of crept into Google's data set, as it would do?
And so really, it was just reciting your own research back to you, or not?
No, so this preprint was kept secret for a while.
Why? Because we saw an opportunity to patent this technology, which we did. So that's the reason why this did not become public until the patent was filed. And it was during this process that we started to talk with Google. There's no way that Google would have had access to our preliminary data while they were generating the scientific hypotheses.
Some of this seems a little bit opaque to me and I don't know if Google have paid you to do this research either.
No, there was zero funding.
So this was a benefit for both, but zero pounds involved in this.
As we all know, anybody who's used any of these LLMs and chatbots will know they hallucinate.
They can come up with stuff that seems plausible, but is wrong.
So, one of the hypotheses that it generated, we had never thought about. And from the preliminary data that we already have, it looks very, very likely that that's a novel mechanism of DNA transfer. So it has not only validated our experimental work, it has generated a novel hypothesis that we had never thought about, and the preliminary data we have looks promising.
Were any of these hypotheses that it generated complete nonsense?
I wouldn't say nonsense, but I would say that some are less relevant to the question, that they do not directly answer the question. And this is why it's so important for the human brain to critically evaluate those hypotheses, and not take any of them as the final or definitive answer to the scientific question.
I think you say that with genuine belief as somebody who is paid to be a scientist and doesn't want to be replaced anytime soon.
Yeah, well,
definitely this system won't replace humans.
That's for sure.
That's for sure.
So it won't replace humans.
That's somewhat comforting.
This is such an intriguing story though.
And we're going to try and unpack exactly what happened here.
To help with that, we have Maria Liakata from Queen Mary University of London, who is also a Turing AI Fellow.
She is professor of natural language processing, and some of her research focuses on the benefits and limitations of AI in science.
Hi, Maria.
Welcome to Inside Science.
Hi, I'm really glad to be here.
Oh, well, it's a pleasure to have you and to hold our hands through how exactly this works.
So many of us have at least played with things like ChatGPT or Google's Gemini.
Is Google Co-Scientist something very different from those chatbots that we might be more familiar with?
How does it work?
So, Co-Scientist is based on LLM technology.
So, large language models.
Co-Scientist itself is a much more complex system than a single large language model. But basically, this is how large language models work: they are state-of-the-art probabilistic algorithms that have been trained on large amounts of data.
And they are trained to generate outputs given particular inputs.
Right.
And inputs are prompts.
So they usually are texts like a phrase, a sentence, a question, or more complex instructions.
So when we are asking a question to an LLM, we are providing it essentially with a prompt to generate an output.
Right.
I sort of see it as them hoovering up all of this information, consuming it, and then, when you give them a prompt, working out what the most probable next word is from all of that text and info that they've consumed.
Is that fair, if massively oversimplified?
It's simplified because they are making some really complex associations.
So they're not just predicting the next word, but essentially they're predicting what would be an appropriate output given the prompt that you've given.
Right.
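To make that idea a little more concrete, here is a minimal Python sketch of next-word prediction: a toy counting model over an invented three-sentence corpus. Real LLMs learn far richer associations with deep neural networks over vast training sets; this is purely illustrative.

```python
# A toy "language model": count which word follows which, then rank the
# most probable continuations of a prompt word. Purely illustrative.
from collections import Counter, defaultdict

corpus = "phages infect bacteria . phages inject dna . bacteria resist phages .".split()

# Build a bigram table: how often each word follows each other word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def next_word(prompt_word):
    counts = following[prompt_word]
    total = sum(counts.values())
    # Rank candidate continuations by probability, highest first.
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(word, count / total) for word, count in ranked]

print(next_word("phages"))  # e.g. [('infect', 0.33...), ('inject', 0.33...), ('.', 0.33...)]
```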
And Google described Co-Scientist as a multi-agent system, sort of multiple LLMs, large language models.
What does that mean in practice?
So that's correct.
So Co-Scientist itself is not an LLM.
It's rather a coalition of LLMs, and they each specialize in a different task.
And so what this multi-agent architecture means is that you have multiple LLMs that are interacting with each other to generate hypotheses that are evaluated and further refined.
And this kind of high-level supervisor agent, which coordinates all the others, is, I would say, the main technical novelty in the system.
And I find something quite reassuring about that in a way, Maria, because we have a lot of discussions about artificial general intelligence.
And, you know, this idea that before long we're going to end up with an amazing brain.
And you just say, hey, brain, how do you cure a range of diseases?
And it'll just do it.
Clearly, that's probably never going to happen, certainly not in the way that I've described.
This, Maria, from what you're saying, when you have multi-agents, it's just lots of LLMs and other machine learning systems that specialize in certain things.
One of them might sort of generate some of the initial hypotheses.
One might run the tournament that matches one off against the other and works out which is the best, like a ranking kind of thing.
In other words, this is artificial, very specific intelligence.
It's going the opposite way from this idea of artificial general intelligence, isn't it?
Yes, I think the trend is to go for more sort of smaller, more specialized systems than having just one big system that does everything.
And the idea here is that it's not only LLMs: they also use external tools, like web searches and access to online databases. So it's quite a complex pipeline that consists of a lot of different components.
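As a rough illustration of that generate-review-rank pattern, here is a hedged Python sketch. The agent roles, the call_llm() placeholder and the Elo-style tournament scoring are assumptions made for illustration; they are not Google's actual architecture or API.

```python
# Illustrative multi-agent loop: a supervisor has a "generator" agent
# propose hypotheses, then a "reviewer" agent compares them pairwise in a
# tournament, nudging Elo-style ratings up or down. All names hypothetical.
from dataclasses import dataclass
import itertools

@dataclass
class Hypothesis:
    text: str
    score: float = 1000.0  # Elo-style rating, updated by pairwise "matches"

def call_llm(role: str, prompt: str) -> str:
    # Placeholder for a real LLM call; each role would get its own system prompt.
    return f"[{role} output for: {prompt[:40]}...]"

def generate(question: str, n: int = 5) -> list[Hypothesis]:
    return [Hypothesis(call_llm("generator", question)) for _ in range(n)]

def judge(a: Hypothesis, b: Hypothesis) -> Hypothesis:
    # A reviewer agent picks the better hypothesis; this stub always picks a.
    verdict = call_llm("reviewer", f"Compare A: {a.text} with B: {b.text}")
    return a if "A" in verdict else b

def tournament(hyps: list[Hypothesis], k: float = 32.0) -> list[Hypothesis]:
    # Every hypothesis plays every other; winners gain rating, losers lose it.
    for a, b in itertools.combinations(hyps, 2):
        winner, loser = (a, b) if judge(a, b) is a else (b, a)
        expected = 1 / (1 + 10 ** ((loser.score - winner.score) / 400))
        winner.score += k * (1 - expected)
        loser.score -= k * (1 - expected)
    return sorted(hyps, key=lambda h: h.score, reverse=True)

ranked = tournament(generate("How do tailless phages transfer their DNA?"))
for h in ranked:
    print(round(h.score), h.text)
```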
One thing that came out of the interview with Thiago is that Co-Scientist came up with one or two hypotheses that had nothing to do with the existing literature or any training data that was already there.
These seem to be absolutely novel.
And I was rather stunned by that, Maria.
Is that something that you might have expected from such a co-scientist as well?
So I think that this is possible.
So, I mean, a novel hypothesis can be created by essentially generating outputs that synthesize, and stem from, indirect associations, right?
So you have these kind of complex contexts that you wouldn't necessarily put together, but by having this kind of iterative hypothesis generation and refinement, it can happen.
Right.
So I can see there how those tools by coming up with things that scientists hadn't spotted could help scientists make progress quickly.
But what are the pitfalls of this approach?
So what is presented here is a very, very complex pipeline.
It's very resource-intensive. This is not really discussed very much in the paper, which, by the way, is more of a showcase paper than one that actually reveals a lot of the technical details.
But the fact that it's so resource intensive means that it's not something that would be very readily replicable by scientists.
So they would have to sort of be working with someone like Google to do this.
So what do you think are the potential dangers of scientists overestimating and over-relying on technology like this?
My main concern is becoming complacent: not really understanding how hypotheses are generated, not understanding the reasoning process itself, not having the background knowledge.
I mean, at the moment, there are quite a few limitations to this system in particular, but assuming that these can be very much improved in the future, I think this is a concern I have.
Yeah.
Well, Maria Liakata from Queen Mary University of London, thank you very much indeed for taking us through that.
So it seems kind of to come back, Gareth, to this idea that co-scientist isn't a scientist, it's not a human, it's a tool, and understanding the limitations and how that tool works is really important for science.
Yep, hugely.
And by sheer good fortune, I happened to spend a day with a whole room full of early career scientists yesterday.
So I asked them what they thought.
And I was reassured that there didn't seem to be too much potential complacency setting in.
If anything, they were really quite cautious about systems like this.
And they were worried also about their data, you know, because they'd have to give up a lot of their data to feed into the AI and prompts in order to get whatever results out.
And they had worries about their ownership of that data and then what might come out of the machine the other side.
And also that for them, hypothesis generation is a key, really important professional skill for any scientist, especially an early career scientist.
So it is lovely that you have a machine that will help you with your hypothesis.
But unlike your lab mate who might riff with you about your hypothesis and give you a few ideas, you can't say, wow, that's a genius idea.
Where did that idea come from?
You can't ask a black box how it did that.
So I think they were healthily skeptical about it.
On the other hand, there's huge pressure in science to get out there and publish.
So if you can use tools that are going to speed up your pipeline that your rival lab isn't using, you know, you can see some temptation there as well.
Yeah, which is, you know, where these kind of pitfalls interact with all of that promise, right?
So I should say that Google told Inside Science that Co-Scientist is still in development.
It's currently available via their trusted tester program, and that's a program that research institutions can apply to join.
Now though, AI has also started making its presence felt in scientific papers, specifically generative AI, which can create new content based on all the information it's been trained on, including text or images.
Gareth, you have some striking examples for me.
Yes, I do.
This one, I'm blushing just thinking about this.
It was a paper with a rather dry title, really.
There was an image that went viral on social media, one of these machine learning-generated images.
And it was of a rat, okay.
And we can all remember, you know, in our school textbooks, maybe a biological picture of a rat where you can see it with its, you know, an artist's impression of, you know, the fur, the whiskers, and it looks quite cute, but then maybe a bit of it's been cut away as a kind of dissection.
Yeah.
And that was the case with this beautiful AI rendition.
But what it was trying to illustrate was the rat's penis.
But the only problem was that the penis, which was also shown in that dissection cutaway style where you could see all the vessels and the tissue inside, was nearly as thick as the rat was wide, and so large that it extended outside the frame.
That's about twice the size of the rat's body.
It was about twice the size of the actual rat.
And some very puerile people on the internet thought it was incredibly shareable, and red faces all round; the journal did retract the article and apologise.
I mean, this is a fairly well-known journal as well, wasn't it?
And some of the labelling on that diagram made no sense whatsoever.
Yeah, like serotomacell, I think it was, and sanctolic stem cells.
It did also point to the rat and get that correct: it said this was indeed a rat.
But this is just the tip of the iceberg.
AI-generated images are making an increasingly frequent appearance in published scientific papers.
So is that a problem?
Joining us now is image integrity analyst Jana Christopher.
Hi, Jana.
Welcome to the programme.
Hello, and thanks for having me.
It's an absolute pleasure.
Now, it's your job to check images in research papers before they're accepted for journal publication.
Is that right?
AI must have really changed things for you then?
Yeah, with AI, obviously, things have changed.
We were talking there, you know, somewhat tongue-in-cheek, about an image that was very obviously and ridiculously AI-produced. But how difficult, generally, in your job is it to distinguish an AI-produced image from a genuine image?
I mean, in a nutshell, it is very difficult.
And it can almost be impossible.
So at this point AI image generators still occasionally make mistakes.
If you have an image of a human or an everyday object that's AI generated, then you might find flaws like extra digits or body parts that don't align or things like that.
But when you're talking about histology images, for example, it becomes much more difficult.
So Jana, can you just explain what you mean by a histological image?
So by that I mean, you know, a microscopy image showing a piece of tissue on a cellular level so that you can see all the details.
Right.
So some of these more complex scientific images, you're delving into kind of more detail, more complexity.
And so it's more of a problem?
That's right, yes.
And it's hard to know, you know, when there's like a little glitch, whether that's actually in the tissue or whether that's something to do with the AI generation. So there was an interesting study, actually, with over 800 participants, published in Nature in November last year. They studied the ability of human subjects to discriminate between artificial and genuine histological images, and they found a clear difference between naive and expert test subjects. So an example of an expert would be an oncologist studying liver cancer, looking at images of liver cells.
Right.
So whereas naive participants only classified about half of the images correctly, the experts performed significantly better, with a hit rate of about 70%.
So I guess this might give us hope that some of the incidences might be picked out during peer review.
But experience shows that peer reviewers rarely pay that much attention to manuscript figures and they might actually need to be prompted to do so.
They're more focused on the text.
And you can see how if you're presenting data looking, say, at the impact of a treatment on cancer cells, for example, that that's a real problem if that's being faked by AI.
How widespread is the problem?
Do we know?
We don't really know how many papers have AI-generated images in them, simply because we lack reliable tools to detect them.
We're seeing an exponential growth of academic articles published per year over the last decade.
whilst the time spent on obtaining the results and validating them and peer reviewing them has decreased significantly.
We've seen some journals trying to tackle this directly.
Nature has actually banned the use of AI-generated images in scientific papers. But how else do we tackle this issue?
What can be done?
In terms of guidelines, most journals still permit the use of gen AI and large language models like ChatGPT for authors to improve the readability of their own writing, of course.
However, they are accountable for the accuracy of their publication and any use of AI must be disclosed.
So do you think academics need to be specifically trained in spotting AI?
Well it's useful if scientists, you know, if the readership is able to spot these things, but I suppose, you know, it's not going to be possible without tools.
And so the publishers and journals are really at the front line of this, and they are responding to what many regard as an integrity crisis in scientific publishing. And they're building up their defences by expanding their integrity departments.
But we also have a choice of image integrity tools, all of which mainly look for image duplications.
And these detection tools are also attempting to detect AI-generated images.
Is that using AI to detect AI?
That's right, exactly.
That's right.
And at this point, it has to be said that they are very unreliable, unfortunately.
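For the flavour of the duplication checks Jana describes, here is a minimal sketch assuming the third-party Pillow and ImageHash Python packages. A perceptual hash stays similar under resizing or recompression, so a small Hamming distance between two figure panels flags a possible duplicate; nothing this simple would reliably catch AI-generated images.

```python
# Minimal duplicate-image check using perceptual hashing.
# Assumes: pip install Pillow imagehash  (the file names below are hypothetical).
from PIL import Image
import imagehash

def looks_duplicated(path_a: str, path_b: str, threshold: int = 5) -> bool:
    # phash produces a 64-bit perceptual hash; subtracting two hashes gives
    # their Hamming distance (0 means visually identical).
    hash_a = imagehash.phash(Image.open(path_a))
    hash_b = imagehash.phash(Image.open(path_b))
    return (hash_a - hash_b) <= threshold

# Hypothetical figure panels from two submissions:
# print(looks_duplicated("figure2_panel_a.png", "figure5_panel_c.png"))
```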
Well, thank you very much, Jana Christopher from the Federation of European Biochemical Societies.
And thank you, Gareth Mitchell, for taking us on this fascinating deep dive into AI.
I think it's probably raised more questions than answers.
Certainly has.
And that image will stay with me for a while.
It's been a real pleasure, though, Vic.
Thank you.
Absolute pleasure.
Now, though, we're going to end today by turning away from AI-generated content and looking up at the night sky.
Because January and February, when it's not been raining, have given us some stargazing highlights.
And for the past few nights and into the weekend, a parade of planets has been visible.
I'm joined now by Catherine Heymans, Astronomer Royal for Scotland.
Hello, Catherine.
Hello, Vic.
Have you been looking up in the night sky and admiring the planetary parade?
Well, I had a little glimpse the night before last, but the cloud has just been, as per usual, really dominating my view of the night sky.
But yeah, I did have a little bit.
You can find me, Vic, shaking my fists at the clouds.
Yeah, well, I'm in the northwest of England.
You're in Scotland.
We're maybe not blessed with the most stargazing-friendly weather.
But talk us through, Catherine, what exactly is a planetary parade?
It's such a lovely phrase.
It is a lovely phrase, isn't it? Yes, as you said, for the last few months we've had six planets up in our night sky, and this week Mercury is joining the pack. So that completes your planetary bingo card: you've got Mercury, Venus, Earth, which we're sat on, of course, Mars, Jupiter, Saturn, Uranus and Neptune, all up in the night sky about half an hour after sunset. You don't need to stay up really late at night, you don't need to get up early in the morning. Just eat your dinner, go out, and tick all of those planets off your card: the solar system on parade.
How often does this happen? How rare is this?
Yeah, so there are always planets up in the night sky to look at.
The Earth goes around the Sun once every year.
So at some point during the year, you'll be able to see where the planets are.
But to have all seven up in the night sky at the same time isn't going to happen again until 2040.
So this is quite rare.
But some of my astronomy colleagues are a little bit grumpy about the hype because actually it's not the best time to see some of the planets.
So Saturn has been beautiful in our night sky up until a couple of weeks ago, and now it's getting really close to the Sun, and so to see Saturn you're seeing it in the glare of the sunset on the western horizon.
So it's really hard to see Saturn at the moment and Mercury the same.
It's only just popped out from the glare of the sunset and in a few weeks time it's going to be much easier to see Mercury but by then Saturn will have gone.
So we're kind of in this sweet spot right now where we've got all seven up, but it's actually really hard to see Mercury and Saturn at the moment. And Uranus and Neptune you always need a telescope for anyway. But Venus is really easy to see: super bright in the west. If you think it's an aeroplane but it's not moving, that's Venus.
You've got Jupiter right up above your head at the moment, just after about sort of six o'clock in the evening. And if you draw a line in your mind from Venus in the west across to Jupiter, and then extend that in an arc all the way across to the east, then you will hit the red planet, Mars.
Is there a best time for that? Should we be going out in the middle of the night?
Is this right after sunset?
When's the best time to get the best display?
If you want all seven planets at the same time you've got a very short window about half an hour after sunset.
So head out about six o'clock.
I would advise people to download an app on your smartphone to cheat.
So there are lots of different star apps.
You can get one called Stellarium that's free and then you can just point your smartphone up at the night sky and it will tell you where everything is.
And that's a much easier way of finding things, particularly Uranus and Neptune, which you can't see with your own eyes anyway.
But as the night goes on, Neptune, Mercury and Saturn will set, but you've still got Venus, Jupiter and Mars shining brightly for a lot of the night.
And that will carry on throughout the rest of March anyway.
And Mercury is going to get easier to see as well.
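For the curious, this is roughly the calculation apps like Stellarium do under the hood. Here is a hedged sketch using the third-party Skyfield library; the Edinburgh coordinates and the exact time are illustrative choices, not part of the broadcast.

```python
# Which planets are above the horizon shortly after sunset?
# Assumes: pip install skyfield  (downloads the JPL de421 ephemeris on first run).
from skyfield.api import load, wgs84

ts = load.timescale()
eph = load('de421.bsp')
observer = eph['earth'] + wgs84.latlon(55.95, -3.19)  # roughly Edinburgh

t = ts.utc(2025, 2, 27, 18, 30)  # about half an hour after a late-February sunset

targets = ['mercury', 'venus', 'mars', 'jupiter barycenter',
           'saturn barycenter', 'uranus barycenter', 'neptune barycenter']
for name in targets:
    # Apparent altitude above the local horizon; negative means not visible.
    alt, az, _ = observer.at(t).observe(eph[name]).apparent().altaz()
    status = 'above' if alt.degrees > 0 else 'below'
    print(f"{name.split()[0]:8s} {alt.degrees:6.1f} deg, {status} the horizon")
```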
Wonderful.
Get your coats on, take your smartphones, and then once you've picked out the planets, put the phones away and just stare up at the night sky.
Thank you very much indeed, Catherine.
It's been an absolute pleasure to talk to you about the planetary parade.
But that is all the night sky wonder and disturbing AI-generated imagery that we have time for this week.
You have been listening to BBC Inside Science with me, Victoria Gill.
The producers were Ilan Goodman, Ella Hubber, and Sophie Ormiston.
Technical production was by Matt Chamberlain and Chris Mather, and the show was made in Cardiff by BBC Wales and West.
Do you think you know more about space than we do?
Head to bbc.co.uk, search for BBC Inside Science, and follow the links to the Open University to try the Open University Space quiz.
And if you have any questions or comments for the Inside Science team, do contact us by email on insidescience at bbc.co.uk.
Until next time, thanks for listening.
BBC Sounds, music, radio, podcasts.