Steve Haigh · Computational Biology

How do LLMs work? — a shallow-dive for scientists

Steve Haigh — Wed, 27 May 2026 00:00:00 GMT

A few weeks ago I gave a short lab-meeting talk introducing large language models to the group. The audience was scientists, not engineers, so it’s pitched at the “what is this thing actually doing” level rather than the implementation level. I’ve put the deck on the site now in case it’s useful to anyone else looking for the same kind of pitch.

How do LLMs work? →

A few caveats up front. This is not a rigorous machine-learning talk — it’s a shallow-dive on purpose. I deliberately leave out a lot of detail (no maths, no architecture diagrams, no training objectives beyond hand-waving about next-token prediction) to keep it short enough for a lab meeting slot. If you want the proper version, I link to the right people at the end of the deck — Karpathy, 3Blue1Brown, Jay Alammar’s Illustrated Transformer, Raschka’s Build an LLM from Scratch. Those are where to go after this.

What I do try to cover:

What an LLM actually is — a giant neural network trained to predict the next token, with everything (words, proteins, images) mapped into a high-dimensional vector space.
Embeddings as geometry — the punchline that “meaning is a direction in space”, and why the same trick powers the protein and chemistry models people in the lab will increasingly bump into.
Attention, briefly — what it does, why it scales, and why “every frontier system today is a transformer” isn’t an exaggeration.
The base model vs. the product — the gap between a raw LLM and ChatGPT / Claude / Copilot, and what RLHF actually does (with a thesis-editing analogy).
A live-ish demo comparing an untuned base model to a tuned one when you ask it something it shouldn’t help with — the kind of thing that’s hard to convey without seeing.
Using them well in research — where they actually help (coding, literature triage, data wrangling from PDFs), and where they bite (hallucination, plausibility bias, sycophancy, sampling variance, and the data-leaving-your-machine problem for unpublished or patient data).
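
For anyone who prefers code to slides, here is a rough sketch of two of the ideas above: embeddings as geometry, and next-token sampling (which is where sampling variance comes from). It is not taken from the deck. The vectors and scores are invented toy numbers, and the names EMBEDDINGS, nearest and sample_next_token are hypothetical helpers written for this illustration; a real model learns vectors with hundreds or thousands of dimensions.

```python
import math
import random

# Toy 3-dimensional "embeddings". These numbers are made up for illustration;
# real models learn them from data.
EMBEDDINGS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.2, 0.8],
    "apple": [0.2, 0.5, 0.5],
}


def cosine(a, b):
    """Cosine similarity: how closely two vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(vector, exclude=()):
    """Return the known word whose embedding points most nearly the same way."""
    candidates = [w for w in EMBEDDINGS if w not in exclude]
    return max(candidates, key=lambda w: cosine(vector, EMBEDDINGS[w]))


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits_by_token, temperature=1.0, rng=random):
    """Draw one token at random, weighted by its softmax probability."""
    tokens = list(logits_by_token)
    probs = softmax([logits_by_token[t] for t in tokens], temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


# "Meaning is a direction": king - man + woman lands nearest to queen
# in this toy space.
analogy = [
    k - m + w
    for k, m, w in zip(EMBEDDINGS["king"], EMBEDDINGS["man"], EMBEDDINGS["woman"])
]

# Invented scores for what might follow "The platelet is a ...".
TOY_LOGITS = {"cell": 2.0, "protein": 1.5, "banana": -1.0}

if __name__ == "__main__":
    print(nearest(analogy, exclude=("king", "man", "woman")))
    # Repeated sampling gives different answers from the same prompt.
    print([sample_next_token(TOY_LOGITS) for _ in range(5)])
```

Running the sampling line a few times gives different outputs from identical input, which is the toy version of why the same prompt can produce different answers. Lowering the temperature argument makes the top-scoring token win more often.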

The framing I keep coming back to is: treat the model like a fast but unreliable collaborator. You’d review an RA’s draft before signing your name to it — do the same here.

The deck is built with Quarto reveal.js from the same template I wrote up the other day. PDF and PPTX downloads are linked from the Projects & Decks page.

The Story So Far

Steve Haigh — Mon, 25 May 2026 00:00:00 GMT

This post is an amalgamation of a few LinkedIn posts, so if you follow me there you may have seen this already.

From Debugging Code to Debugging Cells: A Year of Switching Fields

Last August, after thirty years in software (Microsoft, Skyscanner, Ericsson, Avanade, Logica, MDA Space), I decided to leave it all behind. Not for another tech company, but to start an M.Res. in Biomedicine at the University of Reading. This is a short account of how that’s gone so far, written partly for friends and former colleagues who’ve asked, and partly for anyone else weighing up a similar leap.

Why biology, and why now

The interest wasn’t sudden. It had been building for years through BBC Horizon documentaries, books by Dawkins, Ridley, Nick Lane, and — probably the one that tipped it — Siddhartha Mukherjee’s The Emperor of All Maladies. I even started semi-seriously studying in my own time using MOOCs: the MITx Introduction to Biology course taught by Eric Lander was a turning point, and I worked through much of the MITx biochemistry, molecular and cell biology catalogue after it.

At some point the reading stopped feeling like enough. I didn’t just want to follow the science from the outside; I wanted to do some of it. So I went back to full-time education. My plan was vague (it still is), but enrolling on a Master’s course near where I lived was low risk, and whatever followed, I knew I’d enjoy learning again.

The first six weeks: lectures, journal clubs, and introduction to a real lab

The course opened with six weeks of lectures and seminars — a tour through the research happening at Reading, journal clubs to get used to reading papers critically, and time to start thinking about where my own project might sit. I sat in on some undergraduate cancer lectures too, which were genuinely useful for filling gaps.

Then came the labs. Before this course I had never set foot in a life-sciences lab, so the first week felt like jumping in at the deep end. I learned to culture E. coli, use them to produce a human protein, then extract and purify it. Then came a more challenging practical: culturing a cancer cell line and looking for specific proteins to verify whether the cells were migrating. I came out the other side with a stack of techniques I didn’t have before, plus a useful refresher on stats and Excel. More importantly, to me at least, I absolutely loved the practical work. It’s one thing to read the theory; it is quite another to spend days preparing a sample and then look under a microscope and see results.

End of semester one: the unexpected hard part

By the end of the first semester I’d handed in my first major piece of coursework and had a chance to reflect.

The biggest worry going in had been whether I could keep up with classmates who’d done a Biomedicine BSc. The prep paid off, though: the MITx courses, YouTube lectures, and a lot of self-directed revision had left me well prepared. I’m still leaning on those resources now. Uri Alon’s systems biology lectures are excellent, and 3Blue1Brown has me genuinely enjoying calculus again, which I wouldn’t have predicted.

The harder adjustments were ones I hadn’t anticipated. Choosing a research project was much tougher than expected — there are so many interesting directions that committing to one felt like a real dilemma. Although I was new to lab work I really wanted to pursue that, but at the same time I also knew I could do some really meaningful work on the computational side too. I went the computational route (for now), but more on that later.

Academic submissions are a different rhythm from industry work: in a software team you sanity-check things constantly with colleagues. Hitting “submit” without that back-and-forth takes some getting used to. The staff support is excellent when you ask for it, but the default mode is more solitary than I was used to.

Some things have been just great: the people are friendly and supportive, and the lab facilities at Reading punch well above the university’s size. It’s an easy environment to stay curious in.

I’ve also picked up a whole new toolkit along the way: Zotero for references, BioRender for figures that make it look like I know what I’m doing, GraphPad for analysis, and even MATLAB, which I hadn’t expected to be learning at this stage but has been surprisingly fun (spoiler: I’m not using MATLAB for my current work; I went back to my happy place of Python in the end).

The project: a whole-cell model of a platelet

The research project — the bulk of the course — is now well underway. I’m building a computer model of a human platelet. Platelets are the cells responsible for wound healing, but they’re also central to heart attacks and strokes, so well worth understanding in detail.

The project sits at the intersection of physics, computation, and biology, which is a good place for someone with my background to be useful. The first version of the model is running, which is a milestone I’m pleased with, and the work has been a genuine two-way exchange — my software skills get a real workout, and I’m absorbing a lot of biology in the process. The plan is to write up and submit the thesis in August.

What comes next

I’m making tentative plans for what comes after the M.Res., though my ideas change weekly. I could extend the model to other cell types. I could use it to study aspects of cancer. I could stay on platelets and push the research further. The idea currently in the lead is to take what I’ve built — usable by a software developer in its current form — and turn it into something a biologist with no programming background could actually use.

Whatever I land on, I’m certain it will be at the intersection of computing and biology. One of the things I’ve learned this year is the cliché that turns out to be true: the more you learn, the more you realise there is to learn. I have far too many ideas and not enough time, which feels like the right problem to have.

If you’re thinking about something similar

So, 8 months in, no regrets. Coming back to study was the right call. If you’re a software person thinking about a move into research, or you’ve already made a similar jump — especially into biomedical research — I’d genuinely like to hear from you. And if you’re working at the computing/biology intersection and have thoughts on what useful routes forward look like, even more so.

An ‘academic’ deck template, and why it looks the way it does

Steve Haigh — Mon, 25 May 2026 00:00:00 GMT

I was never very impressed with the default PowerPoint templates, so for my lab-meeting deck I made my own set of slides (with some help from my friend Claude).

Downloads

Blank template (.pptx) — open in PowerPoint or Keynote and start typing.
Worked example: the platelet feasibility talk (.pptx) — the same template with real content.

Why this template

There are two things I wanted to avoid. The first is the corporate template with too much chrome: a wide logo bar eating the top of every slide, two stripes of branding at the bottom, and maybe 60% of the page left for the thing the audience came to see. The other is a blank deck where every slide ends up looking slightly different because there’s nothing pulling them together, which is a mistake I’ve made before.

The template tries to land somewhere in between. Here’s the rationale (I’m no designer, so this may sound a bit amateur, but I think it works).

Content fills the slide. No big banner, no side rail. Just a title, a small section breadcrumb above it, a footer line with the talk metadata, and a slide number in the corner. That’s it.

Two-column slides are first-class. Several of the slides in the example are some variant of “what I considered” on the left, “what I did” on the right. Forcing this into two columns is a useful constraint, as it stops a slide from sprawling.

Section breadcrumb instead of divider slides. Rather than a full “Section 2: Methods” divider every few minutes, each slide carries a small uppercase label above its title showing which section it belongs to. Same orientation cue for the audience, no break in the flow.

The Quarto version

I’ve experimented with using Quarto to create a PPTX. It didn’t work out so well. I have had success using Quarto tooling to take an existing PowerPoint deck and create a web-presentable deck and a PDF. Maybe when I get the chance I’ll try again, using Quarto to write the original PPTX too, but to be honest I’m not sure that’s worth the effort.

The PPTX template above is for when you just want to open a file and edit in PowerPoint.

Caveats

It’s not pretty in the way a designed deck is pretty. There’s no illustration system, no animation, no carefully balanced colour palette. If you need any of that, you’ll want a real template from a designer (or BioRender, which has some nice-looking templates). This is more of a minimalist template.

Lab meeting: the platelet model, first showing

Steve Haigh — Fri, 22 May 2026 00:00:00 GMT

Presented the platelet whole-cell model work at the Dash lab meeting yesterday — first time talking it through end-to-end outside my own head. Lots of good questions and (unexpectedly) a few offers of follow-up collaborations, including what might turn into a small future study for me. More on that once it firms up.

Slides are here:

The feasibility of a whole-cell model of the human platelet →

(PDF and PPTX downloads are linked from the Projects & Decks page.)