# [AI in] Scientific discovery

%[https://www.youtube.com/watch?v=1UUYjd2rjsE] 

<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">A wise man said: <em>“Write smaller posts, but more often”</em>. And I said “no.”; so, here’s a multi-link behemoth (it’ll get better, I swear!).</div>
</div>

![Bugs Bunny No Meme Origins And History](https://uploads.dailydot.com/2024/06/bugs-bunny-no.jpg?auto=compress&fm=pjpg align="center")

*Literally me*

One thing - I’ll add my own **thoughts on where each area is moving** (science/ventures) so we’ll be able to point fingers and laugh later!

# [DeepMind’s co-scientist](https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/)

![AICoScientist-1-Components](https://storage.googleapis.com/gweb-research2023-media/images/AICoScientist-1-Components.width-1250.png align="left")

So, after [AlphaFold](https://alphafoldserver.com/) (protein folding prediction, one of the ‘unsolvable scientific mysteries’) and [GNoME](https://deepmind.google/discover/blog/millions-of-new-materials-discovered-with-deep-learning/) (discovering 2M+ materials, 300K+ of which are indeed worthy of future research), DeepMind has released an agentic system for scientific discovery. Unlike the multitude of [LLM-generated bullcrap that has predictably infested publishing houses](https://misinforeview.hks.harvard.edu/article/gpt-fabricated-scientific-papers-on-google-scholar-key-features-spread-and-implications-for-preempting-evidence-manipulation/), and even [Stanford’s STORM](https://storm.genie.stanford.edu/) ([source code](https://github.com/stanford-oval/storm/)), it generates hypotheses and debates them.

## Any results?

![AICoScientist-10-RediscoveryTimeline](https://storage.googleapis.com/gweb-research2023-media/images/AICoScientist-10-RediscoveryTimeline.width-1250.png align="left")

One of the fascinating <s>cherry-picked</s> examples is **packing ~10 years of antimicrobial resistance research into 2 days of agent inference**. The solutions were comparable to or better than current SotA research on antimicrobial resistance.

Co-scientist is able to use the Internet, but **currently seems unable to automatically simulate/check the hypotheses**. Still, this proves to be an incredibly powerful tool - DeepMind’s at it again ❣️

## Further developments (🧂)

*<s>Meaning “take this with a grain of salt”</s>*

Loosely based on [this comment](https://www.linkedin.com/feed/update/urn:li:activity:7252466712107773952?commentUrn=urn%3Ali%3Acomment%3A%28activity%3A7252466712107773952%2C7252601131116171264%29&dashCommentUrn=urn%3Ali%3Afsd_comment%3A%287252601131116171264%2Curn%3Ali%3Aactivity%3A7252466712107773952%29) of mine:

1. Heuristic/metric **automated ideation and development**
    
2. **Checking the hypotheses** where possible, using the required tools (e.g. an exposed API to an affinity checker, etc.)
    
3. **Generating the critic-generator-judge** (later meta-judge, e.g. [meta-rewarding LLMs](https://llmagents-learning.org/slides/Jason-Weston-Reasoning-Alignment-Berkeley-Talk.pdf)) sub-agent structure
    
4. <s>No, please no</s> Auto-**publishing** after human verification
    
5. <s>No, no, no, no</s> Auto-**reviewing** other papers, e.g. LLMs-as-scientific-reviewers
    

And **products on each and every stage.**

Anyway, given the recent rise of LLMs-as-judges and other *“let’s take this 0.5-12B param and finetune so it…”* cases, we could IMO expect **further productization of different LLM-based specialization areas**. The old *“do one thing and do it really well”* adage applies here without any friction.
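
To make the critic-generator-judge idea from the list above a bit more tangible, here is a minimal sketch. Everything in it is an assumption for illustration: the `Hypothesis` class, the three stub functions, and the keyword rules stand in for real LLM calls and checking tools, and none of it reflects how co-scientist is actually built.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    text: str
    critiques: list[str] = field(default_factory=list)
    score: float = 0.0


def generate(topic: str) -> list[Hypothesis]:
    """Generator stub: a real system would call an LLM here."""
    templates = [
        "{t} has cause A",
        "{t} has cause B, testable with assay X",
        "{t} is random",
    ]
    return [Hypothesis(tpl.format(t=topic)) for tpl in templates]


def critique(h: Hypothesis) -> Hypothesis:
    """Critic stub: attaches objections using toy keyword rules."""
    if "testable" not in h.text:
        h.critiques.append("no experiment proposed")
    if "random" in h.text:
        h.critiques.append("no mechanism proposed")
    return h


def judge(h: Hypothesis) -> Hypothesis:
    """Judge stub: fewer open objections means a higher score."""
    h.score = 1.0 / (1 + len(h.critiques))
    return h


def run_round(topic: str) -> list[Hypothesis]:
    """One generate -> critique -> judge pass, best hypothesis first."""
    ranked = [judge(critique(h)) for h in generate(topic)]
    return sorted(ranked, key=lambda h: h.score, reverse=True)


for hyp in run_round("Resistance"):
    print(f"{hyp.score:.2f}  {hyp.text}  {hyp.critiques}")
```

A meta-judge would be one more function of the same shape that scores the judge’s verdicts, which is why each stage is easy to imagine as a separate product.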

# Graphs for scientific discovery

Launching in 3…2…1…Pew-pew-pew!

Recently I’ve stumbled upon [awesome work by Prof. Markus Buehler from MIT](https://news.mit.edu/2024/graph-based-ai-model-maps-future-innovation-1112). It goes from fine-tuned [models for materials science](https://arxiv.org/pdf/2310.10445) to more [complex graph-based reasoning](https://www.linkedin.com/posts/markus-j-buehler-2245682_how-can-we-build-ai-models-that-do-not-just-activity-7264601243258437635-e1UX?utm_source=share&utm_medium=member_desktop&rcm=ACoAABLqCHQB-uNcud8rCCpyx2VRi__GJH-e5Ms). Being interested in graphs and neurosymbolic approaches ([covered before](https://posts.teleogenic.com/graph-news)), I’ve started diving in <s>like a ravenous whale</s>.

![](https://media.licdn.com/dms/image/v2/D4E22AQHcnl2DafHs2Q/feedshare-shrink_1280/feedshare-shrink_1280/0/1732015904882?e=1744243200&v=beta&t=4dUn1UA4Qmb_wezmOMfZogLFxq7Qz0E9sKM8ND4fyGw align="left")

## LLM approach was checked out first…

So:

* a graph was created from 1000+ papers
    
* node metrics were calculated
    
* optimal graph reasoning algorithms and techniques were devised
    
* a lot of weird similarities were found between materials’ properties graphs and, e.g., Beethoven’s 9th Symphony
    
    ![](https://cdn.hashnode.com/res/hashnode/image/upload/v1741186833332/ac9ae52d-6112-488e-b02c-fba3eaaa3972.png align="center")
    
* graph reasoning and principles from Kandinsky’s ‘Composition VII’ painting (that was weird, okay) were used to generate materials, incl. with DALL-E
    

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1741187035345/1653b417-4f6b-4188-9150-b4e4ef1952a3.png align="center")
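
The first two steps of the list above (build a graph from papers, then calculate node metrics) can be sketched in a few lines. This is a toy under loud assumptions: the four "papers" and their concept lists are made up, co-occurrence within a paper is my stand-in for whatever extraction the actual work uses, and degree centrality is just one of many possible node metrics.

```python
from collections import defaultdict
from itertools import combinations

# Made-up toy corpus: paper id -> concepts mentioned in it.
papers = {
    "paper_1": ["silk", "collagen", "toughness"],
    "paper_2": ["collagen", "hierarchy", "toughness"],
    "paper_3": ["silk", "hierarchy"],
    "paper_4": ["toughness", "graphene"],
}


def build_graph(corpus):
    """Link every pair of concepts that co-occur in the same paper."""
    graph = defaultdict(set)
    for concepts in corpus.values():
        for a, b in combinations(sorted(set(concepts)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Share of the other nodes that each node is directly connected to."""
    n = len(graph)
    if n <= 1:
        return {node: 0.0 for node in graph}
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


graph = build_graph(papers)
centrality = degree_centrality(graph)
hub = max(centrality, key=centrality.get)
print(hub, centrality[hub])
```

Hubs found this way are the natural starting points for the graph reasoning step, since they connect otherwise distant concepts.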

## Graph attention next!

The next frontier I loved was [replacing the usual Transformer attention module with Graph Isomorphism Networks](https://www.linkedin.com/posts/markus-j-buehler-2245682_how-can-we-build-ai-models-that-think-rather-activity-7292904582374932480-6kb1/) ([paper](https://arxiv.org/abs/2501.02393), [source code](https://github.com/lamm-mit/Graph-Aware-Transformers)).

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1741441123767/a248f9e3-759c-40d6-b76a-2ebfc6cd67c8.png align="center")

### TLDR (explanation [here, too](https://www.youtube.com/watch?v=qld6cIH5iJ4))

* the usual attention is Q, K, V - Query, Key, Value; nicely explained [by e.g. Ebrahim Pichka](https://epichka.com/blog/2023/qkv-transformer/), or an [awesome (!) visualization by Brendan Bycroft](https://bbycroft.net/llm), and [another one](https://poloclub.github.io/transformer-explainer/). Multi-head latent attention, as the ‘trendiest’ variant, is shown below:
    
* ![DeepSeek Technical Analysis — (2)Multi-Head Latent Attention | by Jinpeng  Zhang | Jan, 2025 | Medium](https://miro.medium.com/v2/resize:fit:1200/1*g0kJ90z4LW4ZgtCKur4gTg.png align="left")
    
    in the paper, the usual attention is replaced by a [graph isomorphism](https://towardsdatascience.com/how-to-design-the-most-powerful-graph-neural-network-3d18b07a6e66) network-like structure (i.e. *are the graphs of the same shape, and what is the difference*); to tell the truth, it’s [not the only lab that pursues GTs](https://github.com/wehos/awesome-graph-transformer)
    

![Rethinking Attention with Performers](https://storage.googleapis.com/gweb-research2023-media/original_images/6ebe377c24a370b71518471f96dc1d48-image12.jpg align="center")

* each **token** is viewed as a **node**, and each **attention score** as an **edge weight**
    
    * an **adjacency matrix** (graph representation) is computed, and aggregated to calculate the output
        
    * for multi-head attention, **each head computes an individual GIN adjacency matrix**, and these are then **concatenated**
        
    * GIN introduces a **sharpening parameter α** (⬆️ α leads to sharper attention, while ⬇️ α produces a smoother distribution) to dynamically shift ‘attention focus’
        
    * **sparse graphs** are used for greater expressivity and lower computational overhead <s>(e.g. why put up a 0.000(…) weight when you can optimize)</s>
            

<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">Attention mechanisms can be seen as <strong>dynamically evolving adjacency matrices</strong>, where <strong>learned relationships between input elements correspond to edge weights</strong> in a graph</div>
</div>
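
Here is a minimal single-head sketch of that view, in plain Python so it runs anywhere. It is my own simplification, not the paper’s code: I assume α simply multiplies the scores before the softmax, that sparsification is a hard threshold on edge weights, and that the GIN step is a sum aggregation with an `eps` self-term and no MLP. The names `alpha`, `threshold`, and `eps` are hypothetical.

```python
import math


def sharpened_softmax(scores, alpha=1.0):
    """Softmax over alpha * scores; larger alpha gives a peakier row."""
    scaled = [alpha * s for s in scores]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def attention_adjacency(queries, keys, alpha=1.0, threshold=0.0):
    """Treat tokens as nodes and attention weights as edge weights."""
    dim = len(queries[0])
    adjacency = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        row = sharpened_softmax(scores, alpha)
        # Assumed sparsification: drop edges below the threshold.
        adjacency.append([w if w >= threshold else 0.0 for w in row])
    return adjacency


def gin_style_update(adjacency, features, eps=0.0):
    """GIN-style sum aggregation: (1 + eps) * own features + weighted neighbours.
    A real GIN layer would pass the result through an MLP, omitted here."""
    width = len(features[0])
    updated = []
    for i, row in enumerate(adjacency):
        neighbours = [sum(w * features[j][c] for j, w in enumerate(row)) for c in range(width)]
        updated.append([(1 + eps) * features[i][c] + neighbours[c] for c in range(width)])
    return updated


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy embeddings, reused as Q, K and V
smooth = attention_adjacency(tokens, tokens, alpha=1.0)
sharp = attention_adjacency(tokens, tokens, alpha=5.0)
sparse = attention_adjacency(tokens, tokens, alpha=5.0, threshold=0.1)
output = gin_style_update(sparse, tokens)
```

Comparing `smooth` and `sharp` row by row shows the ‘attention focus’ shift: the same tokens, but with higher α more of each row’s weight lands on the strongest edges, and the thresholded version then drops the weak edges entirely.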

[Later](https://www.linkedin.com/posts/markus-j-buehler-2245682_we-trained-a-graph-native-reasoning-ai-then-activity-7299404678662873088-XuXr) on, it still seems like a multi-agent scientific reasoner ([SciAgents](https://github.com/lamm-mit/SciAgentsDiscovery)), but using graphs as a world model:

> We trained a graph-native reasoning AI, then let it think for days & discovered that it **formed a dynamic relational world model on its own**, no pre-programming. Emergent hubs, small-world properties, modularity, & scale-free structures arose naturally. The model then **exploited compositional reasoning & uncovered uncoded properties from deep synthesis**: Materials with memory, microbial repair, self-evolving systems.

![](https://media.licdn.com/dms/image/v2/D4E12AQG2Mx0oeNS0XQ/article-inline_image-shrink_1500_2232/B4EZVeOHHkH0AU-/0/1741042512915?e=1746662400&v=beta&t=5T2HMP1jZEcoinAMHl1VffzMPCGiVuVQKlwEaxPsXbY align="left")

## [Recent paper + breakdown](https://www.linkedin.com/pulse/small-worlds-yield-big-ideas-markus-j-buehler-ix12e)

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1741195140184/8971aad4-42da-4e81-a36f-1d371a62ad76.png align="center")

It outlines how the latest version of the sci-discovery system works:

1. Question or topic for pondering (*“Does Shrek think of Roman Empire?”*)
    
2. Thinking in *&lt;thinking&gt;&lt;/thinking&gt;* tags à la DeepSeek et al., but more akin to a knowledge graph
    
3. Extracting new knowledge (i.e. local subgraphs to integrate into the ‘world model’ graph, e.g. *“wood + fire = coal + ash”* → *ADD {“wood IS material”, “wood COMBINES with fire”})*
    
4. Merging with the big global graph
    
5. Next prompt
    
6. Repeat → (1)
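
The extract-and-merge part of the loop above can be sketched with plain sets of triples. This is a toy under stated assumptions: `think` is a canned stub instead of a reasoning model, the second prompt and its statements are made up, and `parse_triple` assumes every statement has at least three words with the object last.

```python
def think(prompt):
    """Stub for the reasoning model: returns canned statements per prompt."""
    canned = {
        "wood + fire": ["wood IS material", "wood COMBINES with fire"],
        "fire + water": ["fire IS process", "water EXTINGUISHES fire"],  # made up
    }
    return canned.get(prompt, [])


def parse_triple(statement):
    """Turn 'wood COMBINES with fire' into ('wood', 'COMBINES', 'fire')."""
    subject, relation, *rest = statement.split()
    return (subject, relation, rest[-1])  # filler words like 'with' are dropped


def merge(world, local):
    """Merge a local subgraph into the global graph; return what was new."""
    new = local - world
    world |= local
    return new


def run_loop(prompts):
    world = set()  # the global 'world model' graph as a set of triples
    for prompt in prompts:
        local = {parse_triple(s) for s in think(prompt)}
        added = merge(world, local)
        print(f"{prompt!r}: {len(added)} new triples")
    return world


world = run_loop(["wood + fire", "fire + water", "wood + fire"])
```

Revisiting a prompt adds nothing new, which hints at a cheap stopping signal: when merges stop contributing triples, it is time for a different prompt.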
    

And there’s also a simple **generator-critic style agent loop** that incorporates graph enrichers/generators, too:

![](https://media.licdn.com/dms/image/v2/D4E12AQEIQcCwDUYuwg/article-inline_image-shrink_1500_2232/B4EZVgezxfGwAY-/0/1741080446434?e=1746662400&v=beta&t=tGD8Mse95v0Sem1F604otRubT8tspLebpITEyXl-xqw align="left")

In the end…we’re getting a system that has an **interpretable** and **explainable** world model:

![](https://media.licdn.com/dms/image/v2/D4E12AQECcKBredk9Ow/article-inline_image-shrink_1500_2232/B4EZVgegjlHMAU-/0/1741080366367?e=1746662400&v=beta&t=6LQOPp2EKV5qsNY5qEieqnrDe9NKrE8WDTO_4XVjATU align="left")

I may be mistaken, but I’d propose that there should be a **world graph consensus/verification mechanism or agent** somewhere! <s>An idea for a pet project… Another one…</s>

### Other approaches, e.g. JEPA

I’ve written about [Yann LeCun’s JEPA architecture before](https://posts.teleogenic.com/yann-lecun-and-the-jepa-of-ai). This approach seems to be moving forward, with causal understanding of, for example, the laws of physics:

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1741268706322/77a73b12-08dc-4c57-a687-b456f0d34f80.jpeg align="center")

As soon as I pondered combining JEPA learning with graph representations, it turned out someone had [already published a paper](https://www.aimodels.fyi/papers/arxiv/graph-level-representation-learning-joint-embedding-predictive) on it.

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1741269112441/f96c9232-508d-4b14-84f4-13ded909f662.png align="center")

A graph representation is learned; but those are **not knowledge graphs at the moment.** Another thing - nothing [enriches the graph using logic laws](https://www.perplexity.ai/search/knowledge-graph-enrichment-usi-elE7KFA7R_qx59_t1zUGTw).

![Knowledge Graph Reasoning Made Simple [3 Technical Methods]](https://i0.wp.com/spotintelligence.com/wp-content/uploads/2024/02/query2vec.jpg?fit=1200%2C675&ssl=1 align="left")
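
To show what I mean by enriching a graph with logic laws, here is a minimal sketch of one such law: transitivity, applied by forward chaining until nothing new appears. The relation names and the toy triples are made up, and real knowledge-graph reasoners support far richer rule sets than this.

```python
def enrich_transitive(triples, relation):
    """Add (a, relation, c) whenever (a, relation, b) and (b, relation, c) exist."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        edges = [(s, o) for s, r, o in facts if r == relation]
        for a, b in edges:
            for c, d in edges:
                if b == c and a != d and (a, relation, d) not in facts:
                    facts.add((a, relation, d))
                    changed = True
    return facts


triples = {
    ("oak", "IS_A", "wood"),
    ("wood", "IS_A", "material"),
    ("material", "IS_A", "matter"),
    ("oak", "GROWS_IN", "forest"),  # untouched: the rule only covers IS_A
}
enriched = enrich_transitive(triples, "IS_A")
for fact in sorted(enriched - triples):
    print("inferred:", fact)
```

The inferred triples were never stated anywhere in the input, which is exactly the kind of enrichment a learned graph representation does not give you for free.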

## Other things that caught my attention

* [Tokenized architecture](https://arxiv.org/html/2501.02007v1) - IMO any kind of parametric encoding/hot-swapping is a **step in the right direction**
    
* [Google Titans](https://substack.com/inbox/post/156966320) as a new memory paradigm (surprise-based, usage-retaining, dynamically reorganizing → all the features of real memory)
    
    ![Google Titans: End of Transformer based LLMs? | by Mehul Gupta | Data  Science in your pocket | Jan, 2025 | Medium](https://miro.medium.com/v2/resize:fit:1400/1*_YLjLN1GDHtAFWhr3rW5Pw.png align="left")
    
* A generalized world model graph, or maybe specialized ones? [WikiData](https://www.wikidata.org/wiki/Wikidata:Main_Page) seems OK for a start. Or [Google’s](https://support.google.com/knowledgepanel/answer/9787176?hl=en).
    

![Cortical Labs](https://corticallabs.com/images/neuron-on-chip.jpg align="left")

* Definitely some bio-inspired computing, or bio-computing altogether. [Cortical Labs, I’m looking at you](https://newatlas.com/brain/cortical-bioengineered-intelligence/)! How long until this computer [learns to play Dota](https://www.cell.com/neuron/fulltext/S0896-6273\(22\)00806-6)…
    
* Other emerging computing paradigms that may (!) lift off are:
    
    * [reversible computing](https://spectrum.ieee.org/reversible-computing) (i.e. internally recycling the heat/energy generated by calculations), spearheaded by [Vaire](https://www.datacenterdynamics.com/en/analysis/vaire-computing-reversible-computing-semiconductor-chip/). A pretty [awesome write-up here](https://www.sandia.gov/app/uploads/sites/210/2022/06/ECI22-talk-v7.pdf)
        
        ![](https://cdn.hashnode.com/res/hashnode/image/upload/v1741193096191/27c97dbd-8ac2-40f5-88b4-05dc74cd1074.png align="center")
        
    * photonic chips and [Lightmatter](https://lightmatter.co/) + recent [MIT advances](https://news.mit.edu/2024/photonic-processor-could-enable-ultrafast-ai-computations-1202) (*“was able to complete the key computations for a machine-learning classification task in **less than half a nanosecond** while achieving **more than 92 percent accuracy**”*). It’s a chip that uses photons instead of electrons for several improvements:
        
        ![Hot Chips 32 Lightmatter Why Photonics - ServeTheHome](https://www.servethehome.com/wp-content/uploads/2020/08/Hot-Chips-32-Lightmatter-Why-Photonics.jpg align="left")
        
    * the long-hyped quantum computing (explanation [here](https://www.nextias.com/ca/current-affairs/21-02-2025/majorana-1-quantum-computing-chip) → uses quantum phenomena to process information faster, e.g. superposition and entanglement to work on several states at once), where lots of players are battling: from Microsoft with its recent Majorana 1 release to e.g. [Rigetti](https://www.rigetti.com/) with a 9-qubit (at the moment) chip, [D-Wave and Quantum Computing](https://www.fastcompany.com/91281779/quantum-computing-stocks-d-wave-rigetti-surge-microsoft-chip-news)
        
        ![Majorana 1': A Quantum Chip](https://www.nextias.com/ca/wp-content/uploads/2025/02/Majorana-1-A-Quantum-Chip.png align="left")
        

The landscape is [pretty, pretty dense](https://www.reddit.com/r/coolguides/comments/100li1p/the_different_types_of_quantum_computers_being/) even with a 2-year-old chart:

![The different types of quantum computers being developed : r/coolguides](https://preview.redd.it/the-different-types-of-quantum-computers-being-developed-v0-6rlhvk5xnh9a1.jpg?width=640&crop=smart&auto=webp&s=7af01dc0b85c3603e8937ebf97f72a1e5fceb29e align="center")

# Future directions (🧂)

* Productized → on-premises → commodified **“specific scientific discovery AIs”** for one’s own purposes (i.e. one hooks it up to one’s own simulation/heuristic systems to check hypotheses, working in collaboration with scientists, or at some point alone).
    
    * **Market**: CAGR at a [whopping 53.7%](https://market.us/report/agentic-ai-in-scientific-discovery-research-market/)
        
    * Possible **strategy**: backing companies leveraging the advances / backing the ‘boring’ enablers (APIs/products/security, etc.)
        
* **Goal/architecture-encoded “scientific AI”** akin to TART, so it’s more generalized and adaptable
    
* **Fast-faster chips** (one of the paradigms will blow up for sure), and IMO more **specialized chips** (e.g. **programmable-encodable FPGAs** for the particular task at hand, it seems) akin to [Lisp machines of the past](https://en.wikipedia.org/wiki/Lisp_machine).
    
    * The **market** is so big it’s indecent to mention - [at least $1T by 2030](https://www2.deloitte.com/us/en/insights/industry/technology/technology-media-telecom-outlooks/semiconductor-industry-outlook.html) (will actually be bigger if we consider the emerging chip paradigm race)
        
    * Possible **strategy** - backing several early-to-mid stage paradigms
        
* Infrastructure/enablers, i.e. **storage**, **data transfer**, **orchestration**, **energy storage** and **production**, **automation**… You name it.
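
For a feel of what the 53.7% CAGR quoted above means, here is the compounding arithmetic as a tiny sketch. The five-year horizon is an arbitrary pick for illustration and is not taken from the linked report; the function names are mine.

```python
import math


def growth_multiple(cagr, years):
    """How many times a market grows after compounding at `cagr` for `years`."""
    return (1 + cagr) ** years


def doubling_time(cagr):
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + cagr)


cagr = 0.537
print(f"5-year multiple: {growth_multiple(cagr, 5):.1f}x")
print(f"doubling time: {doubling_time(cagr):.2f} years")
```

At that rate the market would double in well under two years and grow more than eightfold in five, which is why the figure deserves the same grain of salt as the rest of this section.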
    

---

Welcome to **Teleogenic**❣️

Other places I cross-post (not always) to:

* [**Hashnode**](https://posts.teleogenic.com)
    
* [**Medium**](https://baldr.medium.com/)
    
* [**Telegram**](https://t.me/ohmyboi)
    
* [**Twitter**](https://twitter.com/ZakharKogan)
    
* [**LinkedIn**](https://www.linkedin.com/in/zakhar-kogan/)