
GPU/TPU vs. evolution

Or, hyperthreading at its best. And some news


But first, let’s listen to the ode to touching grass:

Now, to the dull parts.

💡
This is not intended to be a direct comparison - rather, an estimate of the dimensionality and parallel-processing demands of the brain.

Paving the landscape

I’ve always been drawn to the GPU vs. brain computing derby - maybe out of sheer enjoyment of those “Top X of…” articles and comparisons, but still.

Also:

  • Why there’s so much fuss about wetware computing (mentioned here)

  • Where we’ll potentially be headed computing- and storage-wise (e.g. the new Poxiao drive tech from China - looks great, but let’s see if it gets adopted in real life)

  • Why LISP machines failed (they were outpaced by commodity microchips on their own ground)

  • Why 42

And other deep theoretical questions, you know.

First, there are some maxims that I’d like to introduce:

Let’s do a comparison on the nature of data transmitted, and the parallelization capabilities.

Data


  • 🤖 conputer: bits comprising numbers and operations. But in the end, it’s binary - 0 or 1. Quantum computing changes that, yeah-yeah.

    • in neural networks, we’re definitely operating bigger structures - usually tensors
  • 🧠 brain: at first sight it’s binary (a neuron either fires or it doesn’t), yet it’s a lot more complex on second glance (someone even did more back-of-the-napkin calculations)

    • action potential is slightly ‘more’ than binary: it can be stronger or weaker, at the very least

    • ion gradients, i.e. the differences in ion concentrations on the two sides of the membrane (they influence the activity)

    • secondary messengers, neurotransmitters, neurosteroids, cytokines, neuron-glia interactions etc. transferring the information inside, between cells and regions. To me, this is the inflection point - those transmit and store a lot of information: whether to activate, how strong to activate, how much to spread, which pathways to induce etc. That can be aptly called a neuron’s metabolome.

      A simple example - a single messenger, cAMP, can reach ~40 bits/hour.

    • synapses seem to hold 4.1-4.7 bits, a neuron has 7,000 synapses, and we have 86,000,000,000 neurons on average (I know of a SWIM who’s able to write articles with, like, half that!)
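Those numbers multiply out to a staggering raw capacity. A back-of-the-napkin sketch in Python, using only the figures above (order-of-magnitude estimate, nothing more):

```python
# Order-of-magnitude estimate of raw synaptic storage, using the figures
# above: ~4.7 bits/synapse, ~7,000 synapses/neuron, 86e9 neurons.
NEURONS = 86_000_000_000
SYNAPSES_PER_NEURON = 7_000
BITS_PER_SYNAPSE = 4.7  # upper end of the 4.1-4.7 estimate

total_synapses = NEURONS * SYNAPSES_PER_NEURON
total_bits = total_synapses * BITS_PER_SYNAPSE
total_terabytes = total_bits / 8 / 1e12

print(f"synapses: {total_synapses:.2e}")      # ~6.0e14
print(f"capacity: {total_terabytes:.0f} TB")  # ~354 TB
```

That’s roughly a third of a petabyte of raw synaptic state - before even counting the metabolome.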

Let’s, highly theoretically, represent a neuron’s state as a $d$-dimensional vector $x$ (metabolome and data), the per-synapse information rates as a vector $r$, and the state–synapse weighting as a $[d \times s]$ matrix $W$ ($s$ synapses), whose column $W_j$ belongs to synapse $j$ of neuron $i$:

$$W = \left[\, W_1 \;\; W_2 \;\; \cdots \;\; W_s \,\right] \in \mathbb{R}^{d \times s}$$

Individual information per synapse:

$$\mathrm{inf}_j(x) := W_j \cdot x$$

A-and the information rate per synapse:

$$I_j = r_j \, \mathrm{inf}_j(x)$$

Finally we arrive at the possible information throughput per neuron:

$$I_i (x) = \sum_{j=1}^s I_j$$
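The formulas above can be played with numerically. A toy sketch in plain Python - all dimensions, rates and weights are made up purely to illustrate the shapes:

```python
# Toy rendition of the per-neuron throughput: I_i(x) = sum_j r_j * (W_j . x).
# All numbers here are invented for illustration only.
d, s = 4, 3  # state dimension, synapse count (tiny, for readability)

x = [1.0, 0.5, 0.0, 2.0]  # neuron state (metabolome + data)
r = [2.0, 1.0, 0.5]       # information rate per synapse
W = [                     # one d-vector (column W_j) per synapse
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.5, 0.5, 0.5, 0.5],
]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

inf = [dot(W_j, x) for W_j in W]                 # inf_j := W_j . x
I = [r_j * inf_j for r_j, inf_j in zip(r, inf)]  # I_j = r_j * inf_j(x)
I_total = sum(I)                                 # I_i(x) = sum over synapses
print(I_total)
```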

Despite possible math-notation fock-ups, that’s still a couple of orders of magnitude more complex than “it fires or it doesn’t”. For more rigor, we’d have to turn to computational neuroscience - I’m no real Pinocchio (at least as of now).

My shabby 5-min drawing skills at it:

Processing

Let’s now consider parallel capabilities.

🤖 An RTX 4090 has 16,384 CUDA cores, each running at a 2.52 GHz boost clock. Okay, I didn’t expect these levels… 41,287,680,000,000, or 4.13×10¹³, operations per second (counting one op per core per cycle).

🧠 The brain has 86 billion × 7,000 “cores” (synapses), each operating a lot slower (firing rates on the order of tens to hundreds of Hz) - but operating simultaneously, and always carrying the state/metabolome. Each neuron and, by extension, each synapse, operates in parallel with all the others.
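The raw parallelism numbers, side by side - note that the ~10 Hz average firing rate below is my own ballpark assumption, not a figure from this post:

```python
# GPU throughput: cores x clock. Brain "throughput": synapses x firing rate
# (the 10 Hz average is an assumed ballpark, purely for comparison).
gpu_ops = 16_384 * 2.52e9          # RTX 4090: ~4.13e13 ops/s
synapses = 86_000_000_000 * 7_000  # ~6.0e14 parallel units
brain_events = synapses * 10       # assuming ~10 Hz average firing

print(f"GPU:   {gpu_ops:.2e} ops/s")
print(f"Brain: {brain_events:.2e} synaptic events/s")
```

Even with a deliberately modest firing rate, the brain’s event count lands a couple of orders above the GPU - the difference is that each GPU op is exact, while each synaptic event is slow, noisy, and state-laden.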

TL;DR

🤖:

  • a lot faster per core, yet synchronous

  • homogeneous architecture (mostly; we still have differing core types on CPUs and GPUs, right?)

  • requires less upkeep - nothing spent on movement, vision, proprioception, etc.

  • precise in the sense that everything is low-noise, stored and calculated deterministically

  • even single-bit errors pose problems and require correction

  • almost no on-chip re-learning, rather static

  • state/memory is stored separately, needs transmission
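On the “even single-bit errors pose problems” point, a minimal sketch of why: a single parity bit can detect (though not locate) one flipped bit, which is the simplest ancestor of the stronger codes (e.g. Hamming SECDED) that ECC memory actually uses:

```python
# Minimal single-bit error detection with a parity bit. Real ECC memory
# uses stronger codes (e.g. Hamming SECDED), but the idea starts here.
def parity(bits):
    """Even parity: 1 if the number of set bits is odd, else 0."""
    return sum(bits) % 2

word = [1, 0, 1, 1, 0, 0, 1, 0]
p = parity(word)           # stored alongside the data

corrupted = word.copy()
corrupted[3] ^= 1          # a cosmic ray flips one bit

print(parity(word) == p)        # True  -> data looks intact
print(parity(corrupted) == p)   # False -> error detected
```

The brain, by contrast, shrugs off this kind of noise - no single synaptic misfire corrupts the whole computation.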

🧠:

  • a lot slower per unit, yet massively parallel and asynchronous

  • heterogeneous: many neuron types, messengers, and pathways

  • spends a lot on upkeep - movement, vision, proprioception, etc.

  • noisy and probabilistic rather than deterministic

  • tolerates errors gracefully - no single misfire breaks the computation

  • constantly re-learns in place by changing synaptic weights

  • state/memory lives right where the computation happens

Neuromorphic chips try to mimic the best of the brain’s capabilities (namely, parallel processing). And yeah - before the optimizations started flowing, the very reason GPUs and TPUs fared better was the same one: more cores working in parallel. More on them a tad later.

One more product I’d love to touch on is GreenArrays’ chips:

  • 144 separate CPUs

  • each has built-in RAM and can communicate with adjacent cores

  • energy mostly consumed during computation

  • ability to continue the computation after power outage

Looks a bit more like the brain, eh?

Also, let’s consider the promised land neuromorphic chips, e.g. Brainchip’s Akida:


  • Is made of ‘spiking neurons’ (1.2 M of them) connected via ‘synapses’, all organized into 80 nodes with 6 MB of on-device memory each

  • Converts data into ‘spikes’, i.e. activation signals that can propagate further via synapses (figure from here):


  • Can scale up to 1,024 chips

  • They state nodes work independently, but apparently they share the same clock (from their deck)

  • Can perform 🧠 on-chip learning - i.e. changing weights

So, these look more or less like the brain (hence the neuromorphic, genius)
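To make the ‘spiking neuron’ idea concrete, here’s a toy leaky integrate-and-fire neuron - not Akida’s actual model (those details aren’t in this post), and the leak/threshold parameters are purely illustrative:

```python
# Toy leaky integrate-and-fire neuron: the membrane potential leaks toward
# zero, integrates incoming input, and emits a spike (then resets) once it
# crosses the threshold. Parameters are illustrative, not Akida's.
def lif(inputs, leak=0.9, threshold=1.0):
    v, spikes = 0.0, []
    for x in inputs:
        v = leak * v + x      # leak old potential, integrate new input
        if v >= threshold:
            spikes.append(1)  # fire!
            v = 0.0           # reset after the spike
        else:
            spikes.append(0)
    return spikes

print(lif([0.3, 0.3, 0.3, 0.3, 0.9, 0.0]))  # -> [0, 0, 0, 1, 0, 0]
```

Notice the neuron only “costs” anything when it actually spikes - which is exactly the sparsity neuromorphic hardware exploits for energy efficiency.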

And that’s my inner mental landscape after trying to understand how virtually anything works:


Strategic conclusion

I hope that was okay as a short-ish snapshot of the differences. As always, the strategic question remains: which paradigm will be leading in the next 1-2-5-10-30 years?

IMO there are a lot of factors that could influence that:

  1. Research funding and breakthroughs (something becoming real or cheap/reproducible enough)

  2. Legal implications (the legal-illegal/tax-intensive/incentivized spectra)

  3. Big players pushing some paradigm (or a couple to be safe) to the markets, sometimes because of (2)

  4. Some paradigm becoming or staying a dead end (e.g. because (1) hasn’t happened yet)

News and viz’s

Conputer.

Banana.

Interested in how many people have slipped on banana peels throughout history? Here’s an investigation.

Music


The Pudding at it again - a pretty awesome insight into how melodies and sounds themselves are “plagiarized”… or, rather, inherited.

Best-ial wines

Another awesome Pudding viz - which animals correlate with higher wine ratings.

TLDR: choose fish and bugs, don’t choose lizards.

Aging like shitty fine wine (movies)

An exploration of how different movies age, judged by ratings in two time periods: 1995-2003 and 2018-2023.

TLDR: movies age badly on average; some genres, like horror, age better.

Think twice about animals at all…

They’ve got a weird (!) effect on life satisfaction (1) - slightly to the south.

However, as authors mention, that may root in reverse causation: people get pets to deal with loneliness and dissatisfaction.


Welcome to Teleogenic❣️

Other places I cross-post (not always) to: