In every great scientific revolution, there is a heretic—a brilliant, fiercely independent, and often controversial figure who claims to have discovered the key principles long before they were accepted by the establishment. In the epic tale of Artificial Intelligence, that role is played, with undeniable passion and relentless persistence, by the German computer scientist Jürgen Schmidhuber. For decades, he has been both a pioneering researcher and a vocal, unyielding crusader for what he sees as his rightful place in the history of the field.
While the “Godfathers of AI” are celebrated for their breakthroughs in the 2010s, Schmidhuber argues, with a mountain of evidence in the form of papers and technical reports, that his labs in Germany and Switzerland were developing the core concepts of modern deep learning back in the early 1990s. His story is a complex and fascinating one, a narrative of groundbreaking invention, a long fight for recognition, and a stark reminder that the history of science is often a messy, contested affair, written by the victors. He is the ghost in the machine of modern AI, the man who insists the revolution actually began decades earlier, in his own lab.
An Early Pioneer in the AI Winter
Jürgen Schmidhuber’s journey into AI began in the 1980s, a time when the field was still dominated by symbolic, rule-based approaches. Like Hinton, LeCun, and Bengio, he was drawn to the then-unfashionable idea of learning with artificial neural networks. Working on his diploma thesis in 1987 and later his PhD at the Technical University of Munich, he was already tackling problems that would become central to the field decades later. He explored concepts like unsupervised pre-training, generative models, and the use of neural networks to control robotic agents.
His work was characterized by a deep, almost obsessive focus on the fundamental principles of learning and computation. He was not just trying to solve a specific problem; he was trying to derive a universal theory of intelligence. This led him to work that was often years, if not decades, ahead of its time. He developed meta-learning systems (“learning to learn”), created AI that could generate its own goals, and theorized about artificial curiosity and creativity.
However, his most significant and enduring contribution from this era was the Long Short-Term Memory (LSTM) network. In 1991, his student Sepp Hochreiter, under his guidance, completed a seminal diploma thesis that identified a critical flaw in traditional Recurrent Neural Networks (RNNs): the "vanishing gradient problem." As error signals are backpropagated through many time steps, they are multiplied by the network's weights again and again, and typically shrink exponentially toward zero (or, with large weights, explode). This was the very issue that prevented RNNs from learning long-term dependencies in sequential data, like the relationship between the beginning and end of a long sentence.
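Hochreiter's analysis is easy to reproduce in a few lines. The sketch below (illustrative NumPy, with hypothetical sizes and weight scales) backpropagates through a plain tanh RNN and shows the gradient norm collapsing; with larger weights, the same product would explode instead.

```python
import numpy as np

# Minimal demonstration of the vanishing gradient problem in a plain tanh RNN.
# The dimensions and weight scale below are illustrative assumptions.
rng = np.random.default_rng(0)
T, d = 50, 16
W = rng.normal(size=(d, d)) * 0.4 / np.sqrt(d)  # deliberately small recurrent weights

# Forward pass (no external inputs, for simplicity), storing every hidden state.
h = [rng.normal(size=d)]
for _ in range(T):
    h.append(np.tanh(W @ h[-1]))

# Backward pass: the gradient w.r.t. the initial state is a product of T Jacobians,
# each of the form diag(1 - h_t^2) @ W, so its norm shrinks exponentially with T.
grad = np.ones(d)
for t in range(T, 0, -1):
    grad = W.T @ ((1 - h[t] ** 2) * grad)

print(f"gradient norm after {T} steps: {np.linalg.norm(grad):.2e}")
```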
To solve this, Schmidhuber and Hochreiter developed the LSTM architecture, publishing the core concepts in 1997. The LSTM was a brilliantly engineered solution. It was a type of RNN with a more complex internal structure, featuring a series of "gates": an input gate, an output gate, and, added a few years later by Schmidhuber's student Felix Gers, a crucial "forget gate." These gates allowed the network to learn to control its own memory. It could learn to store important information for long periods, forget irrelevant information, and decide when to output what it had stored. It was, in effect, a neural network with a programmable, persistent memory cell.
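For readers who want to see the machinery, here is a minimal sketch of one LSTM step in Python with NumPy. This is the standard modern formulation with a forget gate rather than the original 1997 design, and the weight names and shapes are illustrative assumptions, not taken from any particular implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, Wf, Wi, Wo, Wg, bf, bi, bo, bg):
    """One forward step of a standard LSTM cell (post-2000 variant with forget gate)."""
    z = np.concatenate([h_prev, x])   # gates see the previous hidden state and the input
    f = sigmoid(Wf @ z + bf)          # forget gate: what to erase from the memory cell
    i = sigmoid(Wi @ z + bi)          # input gate: what to write into the cell
    o = sigmoid(Wo @ z + bo)          # output gate: what to reveal to the rest of the net
    g = np.tanh(Wg @ z + bg)          # candidate content to store
    c = f * c_prev + i * g            # additive cell update: the persistent memory
    h = o * np.tanh(c)                # hidden state passed onward
    return h, c
```

The key design choice is the additive cell update c = f * c_prev + i * g: the gradient flowing back through the cell is scaled by the forget gate's activation, which the network can learn to hold near 1, so error signals persist across long time spans instead of vanishing.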
The LSTM was a monumental breakthrough. For over a decade, before the rise of the Transformer architecture in 2017, it was the undisputed king of sequential data processing. LSTMs and their variants became the core technology behind a revolution in speech recognition, handwriting recognition, and machine translation. By the mid-2010s, the speech recognition systems on millions of Android phones, as well as Apple's Siri and Amazon's Alexa, relied on LSTMs at their core. Jürgen Schmidhuber's lab had created the dominant language-processing AI of its time.
The Battle for Credit
Despite the immense commercial and technical success of the LSTM, Schmidhuber found himself increasingly on the periphery of the official narrative of the deep learning revolution. The spotlight in the 2010s focused intensely on the “Godfathers” and their work on deep feedforward networks and computer vision, particularly after the 2012 ImageNet victory. While their contributions were undeniable, Schmidhuber grew increasingly frustrated that the foundational role of his lab’s work on LSTMs and other key concepts was being, in his view, overlooked or minimized.
This began his long and public campaign for recognition. He became a fixture in the comments sections of articles, on social media, and in interviews, meticulously pointing out the publication dates of his lab’s papers and drawing direct lines from his early work to the modern breakthroughs being celebrated. His argument is not just about the LSTM. He points to a long list of innovations from his research groups:
- He claims his own 1991 work on the "neural history compressor" was the first to show the power of deep, unsupervised pre-training, years before it was popularized by others.
- He argues that his lab's GPU-accelerated deep convolutional networks were winning image recognition competitions by 2011, before the far more celebrated ImageNet result of 2012.
- He points to his 1991 "fast weight programmers" as precursors of the attention mechanisms found in the Transformer.
- He asserts that his work on policy gradients for reinforcement learning was fundamental to the later successes of systems like AlphaGo.
His campaign is detailed, relentless, and backed by an extensive bibliography of his lab’s work. To his supporters, he is a truth-teller, a righteous academic fighting to correct a historical record that has been skewed by the powerful marketing and public relations machines of Big Tech and a handful of well-connected researchers. They see him as a victim of a “winner-takes-all” culture in science, where credit is often consolidated around a few famous names.
To his critics, however, his constant and sometimes aggressive campaign for credit can come across as sour grapes. They argue that while his lab did indeed produce many important early ideas, science is not just about having the first idea. It’s about building on those ideas, demonstrating their power on a large scale, and popularizing them within the broader scientific community. They contend that while Schmidhuber’s lab may have planted many seeds, it was the work of Hinton, LeCun, Bengio, and others that cultivated those seeds, provided the right conditions (like massive datasets and GPU computing), and ultimately made the forest grow. The awarding of the 2018 Turing Award to the three “Godfathers,” without including him, was seen by many as the definitive statement from the establishment.
The Vision of a Universal Problem Solver
Beyond the controversy, Jürgen Schmidhuber remains a profoundly original and ambitious thinker. His ultimate goal has always been far grander than just building a better speech recognizer. He is driven by the quest to create a true artificial general intelligence (AGI), what he sometimes calls a "universal problem solver."
His vision is rooted in algorithmic information theory and the principles of compression. He believes that learning is, at its core, a process of finding and exploiting regularities in data to create a compressed, predictive model of the world. A simple example is the mathematical constant Pi. The sequence 3.14159… seems random, but a very short computer program (a compressed representation) can generate its digits infinitely. For Schmidhuber, this is the essence of intelligence: finding the simple program that explains the complex data.
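His favorite example can be made literal. The short program below (Gibbons' unbounded "spigot" algorithm, shown here in Python) streams the decimal digits of Pi forever; in the algorithmic-information sense, these dozen lines are a complete, compressed description of the infinite sequence.

```python
from itertools import islice

def pi_digits():
    """Yield the decimal digits of pi forever (Gibbons' unbounded spigot algorithm)."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n  # the next digit is certain: emit it and rescale the state
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            # not enough information yet: fold in another term of the series
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)

print(list(islice(pi_digits(), 10)))  # -> [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
```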
This philosophy has led him to explore concepts like artificial curiosity. He designed systems that are intrinsically motivated to explore their environment, not to achieve an external reward, but to find data that helps them improve their internal world model—data that is “interesting” because it is both novel and compressible. He sees this as the driving force behind the creativity of scientists and artists, and he believes it will be a key component of any true AGI.
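A toy version of this idea fits in a few lines. In the sketch below (purely illustrative: the linear world model, the learning rate, and the class name are assumptions, not Schmidhuber's actual formulation), the intrinsic reward for a transition is the amount by which the agent's predictive model improves after learning from it, i.e., its compression progress.

```python
import numpy as np

class CompressionProgressReward:
    """Toy intrinsic reward in the spirit of Schmidhuber's artificial curiosity:
    reward = how much the agent's predictive world model improves on a transition.
    The linear model and the learning rate are illustrative assumptions."""

    def __init__(self, dim, lr=0.05):
        self.W = np.zeros((dim, dim))  # predicts the next observation from the current one
        self.lr = lr

    def reward(self, obs, next_obs):
        err_before = np.mean((self.W @ obs - next_obs) ** 2)
        # One gradient step on the squared prediction error: the model "compresses" the data.
        self.W -= self.lr * np.outer(self.W @ obs - next_obs, obs)
        err_after = np.mean((self.W @ obs - next_obs) ** 2)
        return err_before - err_after  # compression progress: positive when something was learned
```

Pure noise earns no reward, since the model cannot improve on it, and fully mastered data soon earns none either; the agent is pushed toward the learnable regularities in between, which is precisely Schmidhuber's definition of the "interesting."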
Now, as the Scientific Director of the Swiss AI Lab IDSIA and the co-founder of a company called NNAISENSE (pronounced "nascence," a play on the birth of neural-network-based AI), he continues to pursue this grand vision. He aims to build the first "practical general purpose AI," a single system capable of tackling a vast range of problems across industries, from manufacturing to finance.
Conclusion: The Unyielding Pioneer
Jürgen Schmidhuber’s place in the history of AI will likely remain a subject of debate for years to come. He is a complex figure: a brilliant, undeniably prescient pioneer whose foundational work on LSTMs powered a decade of progress, and a relentless, often abrasive critic of the very field he helped to build. His story highlights the uncomfortable truth that the process of assigning scientific credit is as much a social and political process as it is a factual one.
Whether one views him as a slighted genius or a difficult revisionist, his technical contributions are undeniable. The Long Short-Term Memory network was not a minor tweak; it was a fundamental architectural breakthrough that solved a critical problem and enabled the first wave of truly useful natural language AI. His early explorations into meta-learning, unsupervised pre-training, and artificial curiosity were years ahead of their time and laid conceptual groundwork that is still being explored today.
Ultimately, Schmidhuber’s legacy may be that of the unyielding pioneer, the researcher who toiled in the AI winter and refused to let his contributions be buried by the snows of time. He forces the AI community to look back, to question the simplicity of its own origin story, and to acknowledge the tangled, multi-threaded nature of scientific discovery. He is the heretic who challenges the canon, demanding that we remember that the roots of this revolution run deeper and spread wider than the official narrative might suggest.