I argue that JEPA and its Energy-Based Model (EBM) framework fail to capture the deeply intertwined nature of learning and prediction in the human brain—the “yin and yang” of intelligence. Contemporary machine learning approaches remain heavily reliant on resource-intensive, front-loaded training phases. I advocate for a paradigm shift toward seamlessly integrating training and prediction, aligning with the principles of online learning.
Update: Interesting paper, thanks. A comment on selection for Hydra: you mention v1 uses an arithmetic mean across timescales for prediction. If the longer windows capture different timescales, it would be interesting to train a layer that predicts how to weight each timescale's prediction. Essentially: is this a moment where I need to focus on what just happened, or one in which my long-range predictions matter more?
Ty for reading the paper! I completely agree! Assigning context-dependent soft weights to each window is a fascinating research area. The idea resembles Ebbinghaus' forgetting curve, which emphasizes recency bias while requiring repeated exposure for long-term retention.
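To make the suggestion concrete, here is a minimal sketch of what a learned weighting could look like next to the arithmetic mean. It is not taken from the Hydra paper: the function names, the linear-softmax gate, the `gate_weights` matrix, and the context vector are all assumptions chosen for illustration.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over a 1-D array."""
    shifted = scores - np.max(scores)
    exps = np.exp(shifted)
    return exps / exps.sum()

def combine_mean(head_preds):
    """Baseline: arithmetic mean across the per-timescale predictions."""
    return float(np.mean(head_preds))

def combine_gated(head_preds, context, gate_weights):
    """Hypothetical gate: one softmax weight per timescale, computed from context.

    gate_weights has shape (n_heads, n_context_features).
    Returns the weighted prediction and the weights themselves.
    """
    weights = softmax(gate_weights @ context)
    return float(weights @ head_preds), weights

def gate_sgd_step(head_preds, context, gate_weights, target, lr=0.5):
    """One online gradient step on 0.5 * (prediction - target)**2 w.r.t. the gate."""
    pred, weights = combine_gated(head_preds, context, gate_weights)
    # d(pred)/d(score_i) = w_i * (p_i - pred) for a softmax-weighted sum.
    grad_scores = (pred - target) * weights * (head_preds - pred)
    return gate_weights - lr * np.outer(grad_scores, context)

# Illustrative values only: predictions from short, medium and long windows.
head_preds = np.array([1.0, 2.0, 4.0])
context = np.array([1.0, 0.0])
gate = np.zeros((3, 2))  # a zero gate reproduces the arithmetic mean

# Repeatedly observing a target that matches the short window shifts
# weight toward it ("focus on what just happened").
for _ in range(200):
    gate = gate_sgd_step(head_preds, context, gate, target=1.0)
trained_pred, trained_weights = combine_gated(head_preds, context, gate)
```

With an all-zero gate the softmax weights are uniform, so the gated combination starts out identical to the mean and only departs from it as the gate is trained.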
So you believe humans spend more energy on prediction, relative to computers? Isn't that because personal computers are not powerful enough to train big models, and most people have no desire to? It is more economically efficient to socialize the cost of training, as is done now. Are you thinking of distributed training, where we split the work and the cost? That could happen when robots become more widespread.
The human brain operates at roughly 25W of power, less than the monitor you're likely using right now, whereas AI models like ChatGPT are estimated to consume nearly 1GWh every 24 hours!
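Since one figure is a power and the other is an energy per day, it helps to put them in the same unit. Taking both numbers above at face value (they are the comment's estimates, not measurements verified here), the arithmetic works out as:

```latex
\[
P_{\text{ChatGPT}} \approx \frac{1\,\text{GWh}}{24\,\text{h}}
  = \frac{10^{9}\,\text{Wh}}{24\,\text{h}}
  \approx 4.2 \times 10^{7}\,\text{W} \approx 42\,\text{MW},
\qquad
\frac{P_{\text{ChatGPT}}}{P_{\text{brain}}}
  \approx \frac{4.2 \times 10^{7}\,\text{W}}{25\,\text{W}}
  \approx 1.7 \times 10^{6}.
\]
```

So under these assumptions the service draws on the order of a million times the power of a single brain, although it also serves a very large number of users at once.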
As I discuss in the paper, predictive coding suggests that the brain actively generates predictions and compares them to incoming sensory data (vision, hearing, etc.), prioritizing anomalies. Its efficiency stems from a hierarchical memory system that continuously updates only the "deltas"—the differences that matter. Embracing this approach could lead to a paradigm shift, enabling the development of significantly more energy-efficient AI in the future.
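As a toy illustration of the "predict, compare, update only the deltas" loop, here is a minimal sketch. It is not the hierarchical memory system from the paper and not a model of the brain: the function name, the single scalar estimate, and the `threshold` and `learning_rate` parameters are assumptions made up for this example.

```python
def delta_update_stream(observations, threshold=0.5, learning_rate=0.5):
    """Predict each observation from a running estimate; learn only from surprises.

    Prediction and learning happen in the same loop (online), and the
    estimate is touched only when the prediction error exceeds `threshold`.
    Returns the final estimate, the number of updates, and the predictions made.
    """
    estimate = observations[0]
    updates = 0
    predictions = []
    for observed in observations[1:]:
        predictions.append(estimate)      # predict before seeing the input
        error = observed - estimate       # compare prediction with the input
        if abs(error) > threshold:        # only anomalies trigger learning
            estimate += learning_rate * error
            updates += 1
    return estimate, updates, predictions

# Illustrative stream: small noise around 1.0, then a jump to 5.0.
stream = [1.0, 1.0, 1.1, 0.9, 5.0, 5.0, 5.0]
final_estimate, n_updates, preds = delta_update_stream(stream)
```

In this run the small fluctuations around 1.0 cost no updates at all, and work is spent only once the signal jumps, which is the intuition behind updating deltas rather than retraining on everything.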
Disclosure: I am the author of this paper.
Reference: (PDF) Hydra: Enhancing Machine Learning with a Multi-head Predictions Architecture. Available from: https://www.researchgate.net/publication/381009719_Hydra_Enh... [accessed Mar 14, 2025].