Over the past few years I’ve been thinking about whole brain emulation (WBE) and the required computational resources. My conclusion is that the required technology level will be reached in the 2025 – 2030 time frame.
Although most estimates focus on calculations per second, the relevant parameters are:
- Calculations per second
- Memory size
- Memory bandwidth per node
- Inter-node communication bandwidth
Here I assume the WBE detail level to be somewhere between level 4 (spiking neural model) and level 5 (electrophysiological model) from the Whole Brain Emulation Roadmap. The computational capacity required would be around 100 exaFLOPS (1e20 FLOPS) and the memory capacity around 10 petabytes (1e16 bytes). Plugging these into the CPU and memory timeline calculators gives expected arrival times of 2026 and 2020 respectively at the $1M price point.
Based on my models, it appears that memory bandwidth will be a significant bottleneck, while inter-CPU node bandwidth will not be.
I’ve created a spreadsheet where the two bandwidths are estimated.
The architecture I envision is 1 million compute nodes arranged in a 2-D grid, 1,000 nodes per side. Each node has 100 TFLOPS of compute capacity and 10 GB of memory. Nodes are connected to their four nearest neighbors by a high-speed bus; longer distances are covered by multiple hops.
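As a sanity check, dividing the whole-brain totals across the cluster should reproduce the per-node figures. A minimal sketch (the totals are the 100 exaFLOPS and 10 GB/node × 1e6 nodes = 1e16 bytes from above):

```python
# Divide whole-emulation totals across the grid to get per-node figures.
TOTAL_FLOPS = 1e20      # 100 exaFLOPS for the whole emulation
TOTAL_MEMORY = 1e16     # 10 PB of total state
NODES = 1000 * 1000     # 1,000 x 1,000 2-D grid

flops_per_node = TOTAL_FLOPS / NODES    # 1e14 FLOPS = 100 TFLOPS
memory_per_node = TOTAL_MEMORY / NODES  # 1e10 bytes = 10 GB

print(f"{flops_per_node:.0e} FLOPS/node, {memory_per_node:.0e} B/node")
```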
The inter-node bandwidth depends on the distance that voltage information has to travel, which is set by axon length. Although some axons are very long, axon lengths probably follow a power-law distribution, and long ones are rare. Based on a 1 kHz update frequency and ~60 nodes within the average axon distance, the required communication bandwidth is 250 Gb/s, which is close to the capabilities of current technology.
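The spreadsheet's exact parameters aren't reproduced here, but a rough reconstruction shows how an estimate of this kind comes together. The neuron count per node and bits per voltage sample below are my assumptions, not figures from the text; the point is only that the result lands in the same range as the quoted 250 Gb/s:

```python
# Rough reconstruction of the inter-node bandwidth estimate.
NEURONS_PER_NODE = 1e5   # assumed: ~1e11 neurons spread over 1e6 nodes
UPDATE_HZ = 1e3          # 1 kHz update rate (from the text)
TARGET_NODES = 60        # nodes within average axon distance (from the text)
BITS_PER_SAMPLE = 16     # assumed size of one voltage sample

# One node's voltage traffic, delivered to every node within axon range:
bits_per_second = NEURONS_PER_NODE * UPDATE_HZ * TARGET_NODES * BITS_PER_SAMPLE
print(f"{bits_per_second / 1e9:.0f} Gb/s")  # ~96 Gb/s, same order as 250 Gb/s
```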
The local node memory bandwidth depends on the amount of synapse state memory and the required refresh rate. Based on a 1 kHz update rate, this works out to 14 TB/s, about three orders of magnitude beyond current technology.
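The shape of this estimate is simple: every byte of synapse state must be touched once per update tick. Using the 10 GB/node figure gives ~10 TB/s, the same order as the spreadsheet's 14 TB/s (which presumably includes some read/write overhead):

```python
# Memory bandwidth = state touched per tick x ticks per second.
STATE_BYTES_PER_NODE = 10e9  # 10 GB of synapse state per node
UPDATE_HZ = 1e3              # 1 kHz refresh rate

bandwidth = STATE_BYTES_PER_NODE * UPDATE_HZ  # bytes per second
print(f"{bandwidth / 1e12:.0f} TB/s")  # prints "10 TB/s"
```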
The high memory bandwidth requirement makes it likely that some kind of CPU/memory on-die integration will be needed; the memristor seems a good candidate.
It is interesting to think of the brain as a 2.x-dimensional structure, due to its non-zero depth and its folds. The question has been asked whether this poses a barrier to implementation. The depth aspect is not an issue, since dividing the brain on a 2-D grid will put most vertical columns on the same node. Folds effectively shorten the distance an axon has to travel. In the worst case, two points that “should” be a meter apart (the brain's diameter when laid flat) are only 10 cm apart (the brain's actual diameter). Because the number of nodes reachable within a given hop distance on a 2-D grid grows with the square of that distance, a 10-fold reduction in effective axon length may increase the inter-node bandwidth by a factor of 100, if most axons take advantage of the added dimensionality for routing. Since the inter-node bandwidth requirement is relatively modest, it is doubtful that even a 100-fold increase would make it the bottleneck instead of memory bandwidth.
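The factor of 100 from a factor-of-10 change in reach can be checked directly by counting grid nodes within a radius, a small sketch:

```python
# In a 2-D grid, the number of nodes within hop radius r grows as r^2,
# so a 10x increase in reach multiplies the nodes in range by ~100.
def nodes_within(radius):
    """Count grid nodes within Euclidean distance `radius` of the origin."""
    r = int(radius)
    return sum(1 for dx in range(-r, r + 1) for dy in range(-r, r + 1)
               if dx * dx + dy * dy <= radius * radius and (dx, dy) != (0, 0))

ratio = nodes_within(40) / nodes_within(4)
print(f"10x radius -> ~{ratio:.0f}x nodes in range")  # roughly 100x
```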
In sum, here are the performance requirements:
- 1 million compute nodes in a 2-D topology
- 100 TFLOPS per node
- 10 GB memory per node
- 14 TB/s memory bandwidth per node
- 4 links to neighboring nodes at 250 Gb/s each
It seems likely that this kind of machine will be available for $1M by 2030, and possibly as early as 2025. It would be useful to measure the historical growth of memory bandwidth and extrapolate it to this time frame.
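A quick way to frame that extrapolation: with a ~3-order-of-magnitude shortfall, the arrival time depends almost entirely on the assumed doubling time. The doubling time below is an assumption for illustration, not a measured trend:

```python
import math

# Years needed to close a ~1000x memory-bandwidth gap, given an
# assumed doubling time.
SHORTFALL = 1000           # ~3 orders of magnitude (from the text)
DOUBLING_TIME_YEARS = 1.5  # assumed; the real figure should be measured

doublings = math.log2(SHORTFALL)       # ~10 doublings
years = doublings * DOUBLING_TIME_YEARS
print(f"{doublings:.1f} doublings -> ~{years:.0f} years")  # ~15 years
```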