Most of the time, these things are resource hogs arriving way before their time to shine, either needing Moore's law to let the hardware catch up, or some nerd to wrestle with the combinatorial explosion and win. Transformers can be seen as a variation on Markov chains, but the innovation of attention mechanisms means you can use vocabularies of hundreds of thousands of tokens and sequences thousands of tokens long without the problem space going all Buzz Lightyear on you.
Ultra Hal was a best-in-class chatbot back when fixed-response systems like Alice/AIML were the standard. Ultra Hal used Markov chains and some clever pruning, but it dealt with a vocabulary of only a few hundred word tokens and sequences only 2 or 3 tokens deep. It occasionally produced novel and relevant output, like a really shitty GPT-2.
I think we may see a resurgence of expert systems soon, as GPT-3 and transformers have proved capable of automating rule creation in systems like Cyc. Direct lookups into static databases have already been incorporated in RETRO-type models. Incorporating predicate-logic inference engines seems like the logical and potent next step. GPT could serve as a personality and process engine that eliminates the flaw (tedium) in the massive, human-level micro-tasking systems of GOFAI.
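To make the inference-engine half concrete, here's a minimal sketch of a forward-chaining engine over subject-relation-object triples; the facts and rules are hypothetical stand-ins for what a Cyc-style knowledge base (or an LLM writing rules) might supply.

    # Minimal forward-chaining inference over subject-relation-object triples.
    # The facts and rules are hypothetical examples, standing in for whatever a
    # Cyc-style knowledge base or a rule-generating LLM might provide.
    from itertools import product

    facts = {
        ("socrates", "is_a", "human"),
        ("human", "subclass_of", "mammal"),
        ("mammal", "subclass_of", "animal"),
    }

    # Each rule: (list of premise patterns, conclusion pattern).
    # Tokens starting with "?" are variables.
    rules = [
        # subclass_of is transitive
        ([("?a", "subclass_of", "?b"), ("?b", "subclass_of", "?c")],
         ("?a", "subclass_of", "?c")),
        # membership propagates up the class hierarchy
        ([("?x", "is_a", "?a"), ("?a", "subclass_of", "?b")],
         ("?x", "is_a", "?b")),
    ]

    def match(pattern, fact, bindings):
        """Unify one pattern with one fact under existing variable bindings."""
        bindings = dict(bindings)
        for p, f in zip(pattern, fact):
            if p.startswith("?"):
                if bindings.get(p, f) != f:
                    return None
                bindings[p] = f
            elif p != f:
                return None
        return bindings

    def forward_chain(facts, rules):
        """Apply rules repeatedly until no new facts are derived (naive fixpoint)."""
        facts = set(facts)
        changed = True
        while changed:
            changed = False
            for premises, conclusion in rules:
                for combo in product(facts, repeat=len(premises)):
                    bindings = {}
                    for pat, fact in zip(premises, combo):
                        bindings = match(pat, fact, bindings)
                        if bindings is None:
                            break
                    if bindings is None:
                        continue
                    new_fact = tuple(bindings.get(t, t) for t in conclusion)
                    if new_fact not in facts:
                        facts.add(new_fact)
                        changed = True
        return facts

    for f in sorted(forward_chain(facts, rules)):
        print(f)   # derives, among others, ("socrates", "is_a", "animal")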
It's worth going through the literature all the way back to the 1956 Dartmouth summer workshop and hunting for ideas that just didn't work yet.
...Markov chains (via MCMC) underlie most Bayesian inference problems, and pretty much all stochastic dynamical-systems models are based on Markov chains.
Not to mention that the entire class of Markov chain Monte Carlo techniques forms only a subset of the general uses of Markov chains.
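For the Bayesian side: a Metropolis sampler is literally just a Markov chain whose stationary distribution is the posterior. A minimal sketch with a made-up toy model (normal data, normal prior):

    # Minimal Metropolis sampler: the samples form a Markov chain whose
    # stationary distribution is the target posterior. Toy example only.
    import math, random

    random.seed(0)
    data = [random.gauss(2.0, 1.0) for _ in range(50)]   # synthetic observations

    def log_post(mu):
        # Normal(0, 10) prior on mu, Normal(mu, 1) likelihood, up to a constant.
        log_prior = -mu * mu / (2 * 10.0 ** 2)
        log_lik = -sum((x - mu) ** 2 for x in data) / 2
        return log_prior + log_lik

    samples, mu = [], 0.0
    for _ in range(20000):
        proposal = mu + random.gauss(0, 0.5)             # symmetric random-walk proposal
        # Accept with probability min(1, post(proposal) / post(mu)).
        if math.log(random.random()) < log_post(proposal) - log_post(mu):
            mu = proposal
        samples.append(mu)

    burned = samples[5000:]
    print(sum(burned) / len(burned))   # posterior mean, roughly the true mean of ~2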
Markov chains form the basis of n-gram language models, which are still useful today.
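For anyone who hasn't seen one, an n-gram model really is just "count transitions, then sample". A minimal bigram sketch with a toy corpus:

    # Order-1 (bigram) Markov chain language model: count transitions, then sample.
    import random
    from collections import defaultdict

    corpus = "the cat sat on the mat the dog sat on the rug".split()

    transitions = defaultdict(list)
    for prev, nxt in zip(corpus, corpus[1:]):
        transitions[prev].append(nxt)            # empirical next-word distribution

    random.seed(1)
    word = "the"
    out = [word]
    for _ in range(8):
        if word not in transitions or not transitions[word]:
            break                                # no observed continuation
        word = random.choice(transitions[word])  # sample next word given current word
        out.append(word)
    print(" ".join(out))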
Markov chains are also the basis of the PageRank algorithm.
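Rough sketch of that connection: PageRank is the stationary distribution of a random surfer's Markov chain, which you can get by power iteration. The link graph here is made up:

    # PageRank as a Markov chain: the score vector is the stationary distribution
    # of a random surfer who follows links (with probability d) or teleports.
    links = {                      # hypothetical tiny link graph
        "a": ["b", "c"],
        "b": ["c"],
        "c": ["a"],
        "d": ["c"],
    }
    pages = list(links)
    d = 0.85                       # damping factor
    rank = {p: 1.0 / len(pages) for p in pages}

    for _ in range(50):            # power iteration until roughly converged
        new = {p: (1 - d) / len(pages) for p in pages}
        for p, outgoing in links.items():
            for q in outgoing:
                new[q] += d * rank[p] / len(outgoing)
        rank = new

    print(rank)                    # "c" ends up with the highest score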
Hidden Markov Models (which are just an extension of Markov chains to include unobserved states) are powerful and commonly used time-series models found all over the place in industry.
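A minimal sketch of the forward algorithm, which scores an observation sequence under an HMM by summing over all hidden-state paths; the probabilities here are made-up illustration values:

    # Forward algorithm for a tiny HMM: hidden weather states, observed activities.
    # All probabilities below are made-up illustration values.
    states = ["rainy", "sunny"]
    start = {"rainy": 0.6, "sunny": 0.4}
    trans = {"rainy": {"rainy": 0.7, "sunny": 0.3},
             "sunny": {"rainy": 0.4, "sunny": 0.6}}
    emit  = {"rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
             "sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

    def likelihood(observations):
        """P(observations) marginalised over all hidden-state paths."""
        alpha = {s: start[s] * emit[s][observations[0]] for s in states}
        for obs in observations[1:]:
            alpha = {s: emit[s][obs] * sum(alpha[p] * trans[p][s] for p in states)
                     for s in states}
        return sum(alpha.values())

    print(likelihood(["walk", "shop", "clean"]))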
In the pre-deep-learning era, Markov chains (and HMMs in particular) had very widespread usage in speech processing.
They are probably one of the most practical statistical techniques out there (outside of obvious examples like linear models).
Not to mention, it was less than a decade ago that one could have said about neural networks: "Decades pass and you realize they either have little to no application or are incredibly niche".
To give an example: Prediction by Partial Matching is basically a Markov chain in disguise, and an incredibly powerful compression technique that beats most other forms of text compression (at the price of a lot more memory overhead).
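Here's a much simplified sketch of the context-modelling half of PPM (order-2 with fallback to shorter contexts); real PPM assigns explicit escape probabilities and feeds the resulting distribution into an arithmetic coder, both omitted here:

    # Simplified flavour of PPM's context modelling: predict the next character
    # from the longest previously seen context, falling back to shorter ones.
    from collections import defaultdict, Counter

    MAX_ORDER = 2
    counts = [defaultdict(Counter) for _ in range(MAX_ORDER + 1)]  # one table per order

    def update(history, symbol):
        for order in range(min(MAX_ORDER, len(history)) + 1):
            counts[order][history[len(history) - order:]][symbol] += 1

    def predict(history):
        """Most likely next symbol from the longest context with any statistics."""
        for order in range(min(MAX_ORDER, len(history)), -1, -1):
            context = history[len(history) - order:]
            if counts[order][context]:
                return counts[order][context].most_common(1)[0][0]
        return None

    text = "abracadabra abracadabra"
    history, hits = "", 0
    for ch in text:
        if predict(history) == ch:
            hits += 1                              # predictable = cheap to encode
        update(history, ch)
        history += ch

    print(f"{hits}/{len(text)} characters predicted")   # high ratio = compressible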
Decades pass and you realize they either have little to no application or are incredibly niche :(
Too bad that "solution in a search of a problem" is generally bad approach to problem-solving. I wish our industry was more fun as a whole.