If that "1 task" happens to be performing matrix multiplication (or even merely fused multiply-add), you can do a heck of a lot with that. You still need digital circuitry to support the I/O, but the key idea is doing linear algebra in a way that is faster and/or generates less heat per unit of compute.
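A minimal sketch of why fused multiply-add alone is enough: a matrix-vector product decomposes entirely into repeated FMAs, which is the primitive such a unit would accelerate. (Names here are illustrative; `math.fma` is available from Python 3.13, with a plain fallback otherwise.)

```python
import math

def matvec_fma(A, x):
    """Matrix-vector product expressed purely as fused multiply-adds.

    Each output element is built by repeatedly applying
    acc = fma(a, xi, acc), the single operation an analog or
    specialized unit would need to support.
    """
    # math.fma (Python 3.13+) rounds once; fall back to a*b + c otherwise.
    fma = getattr(math, "fma", lambda a, b, c: a * b + c)
    y = []
    for row in A:
        acc = 0.0
        for a, xi in zip(row, x):
            acc = fma(a, xi, acc)
        y.append(acc)
    return y

A = [[1.0, 2.0], [3.0, 4.0]]
x = [5.0, 6.0]
print(matvec_fma(A, x))  # [17.0, 39.0]
```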
Not a stupid question. Economically, memory density will hit a brick wall soon. Developers should prefer to waste time and save space, since parallel computation will not hit a similar limit in the foreseeable future. The memory-to-core ratio is going to keep falling.
TLDR is you can't. For a very simple example: a lookup table of all 3x3 matrix * length-3 vector products in Float16 has 12 Float16 inputs, i.e. 2^(12*16) = 2^192 entries; at 6 bytes per output vector that's nearly 2^195 bytes (which is obviously impractical).
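The arithmetic behind that figure, as a back-of-envelope check (the exact byte count depends on how you key and pack the table):

```python
import math

# Lookup table for all 3x3 Float16 matrix * length-3 Float16 vector products.
INPUT_VALUES = 9 + 3          # matrix entries plus vector entries
BITS_PER_FLOAT16 = 16
entries = 2 ** (INPUT_VALUES * BITS_PER_FLOAT16)  # 2^192 distinct inputs
bytes_per_output = 3 * 2      # length-3 Float16 result vector
total_bytes = entries * bytes_per_output

print(math.log2(total_bytes))  # ~194.58, i.e. nearly 2^195 bytes
```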
Part one (teaser): https://www.youtube.com/watch?v=IgF3OX8nT0w