Chick Fil A's Edge Enterprise Architecture (medium.com/chick-fil-atech)
93 points by freshrap6 on Jan 12, 2023 | 50 comments



> The goal of the Restaurant Edge Compute platform was to create a robust platform in each restaurant where our DevOps Product teams could deploy and manage applications to help Operators and Team Members keep pace with ever-growing business, whether in the kitchen, the supply chain, or in directly serving customers.

> (Previous article) Our hypothesis: By making smarter kitchen equipment we can collect more data. By applying data to our restaurant, we can build more intelligent systems. By building more intelligent systems, we can better scale our business.

I must admit, from an outsider's perspective, it really sounds like a bunch of buzzwords justifying a solution in search of a problem. Their examples of forecasting waffle fries reminds me of a failed startup that forecasted how many checkout lines to open via computer vision (which I can't find on Google). In the end, it turned out it was a lot easier for a human manager to simply open a new line when required, and the computer vision forecasts were too often wrong to be useful. I wonder what CFA's success criteria and metrics are for this project.

Tech-wise, wouldn't it be a lot simpler to do a single node, single application that gets updated via something like RAUC? Especially if you have a small team (which they emphasized), it seems to me like adding a Kubernetes cluster at the edge adds complication without much benefit, other than "redundancy" (how redundant is a single rack with the same power source anyways?). Also, how would they get an important security update to the host, if it becomes necessary?

It's a lot of nitpicks, but the project overall is very cool. Sounds like they solved a lot of hard tech problems and executed well on the ops.


Fundamentally, what they've deployed here is something a lot of organisations struggle with or don't even recognize as worth having: a reliable edge which you can trust sufficiently to form the functional core of a site.

For a restaurant chain, this is something worth putting development effort into, because once you've figured it out and run it for a few years to demonstrate its reliability, you can pitch the shift from a network-optional edge at each site to a network-dependent site with intelligent components hanging off it and depending on it. That's a pathway to a major competitive advantage in the medium term, one your competitors won't be able to put into place overnight once they realize you've left them behind.

You can't get there with the amount of effort often put into untrusted edge sites like this, aka a PC in a cupboard. You also can't get there with the cloud when the weakest element in the chain is unreliable site connectivity.

They could have done it in a lot of different ways, but going with cheap commodity hardware and avoiding expensive cluster license nonsense (vSphere etc.) were smart choices. Spend that money on a competent centralized tech team rather than vendor shininess, and you can do a hell of a lot more (and often move faster, to boot).


> Their examples of forecasting waffle fries reminds me of a failed startup that forecasted how many checkout lines to open via computer vision (which I can't find on Google).

CFA leads the industry in revenue per site. I think more accurate forecasting is a significant factor contributing to this. Their sites aren't larger or better located than their competitors'. In fact, they're often right next to their competitors in a similar footprint. Since I have a CFA nearby which I drive by multiple times a day, I've seen firsthand that they always have substantially more cars in the drive-thru line and parking lot than their competitors in the same strip mall parking lot.

Customers will see the overflowing CFA drive-thru line yet still choose to pull into it because they've learned that CFA's throughput is dramatically faster than their competitors'. In my experience I'd guess about 2x-3x faster, which is incredible when you think about it. They achieve that by getting a lot of things right, but it seems obvious that keeping their order backlog moving as fast as possible through more accurate load prediction would be a key factor.


Buzzwords or not, Chick-fil-A has an incredibly efficient setup that has really improved significantly in the last four years.


All I know is that I can be car #105 in line at Chick-fil-A and I'll get my food faster than being car #2 in line at McDonald's. Whatever they're doing is working.


Yeah, in my experience they are by far both the fastest and most consistent fast food restaurant. I've read they also have the highest per-site revenue of any chain.


I read that they have an incredibly efficient setup; sometimes in the drive-thru you get served from 100 places back sooner than you would at some competitors.

And they have the best revenue per site.


I agree that K8s is adding unnecessary complexity here. But the edge computing idea might actually apply in a couple ways.

Mostly though, any data they collect will be very valuable, as forecasting is a core component of fast food logistics. Fast food lives and dies on efficiency.


> Their examples of forecasting waffle fries reminds me of a failed startup that forecasted how many checkout lines to open via computer vision

Maybe this?

https://ixr.com/queue-prediction


Instead of building the entire stack yourself (OS + K8s), why not use Azure IoT Edge or AWS IoT Greengrass for fleet management? These services seem to have already solved a lot of the problems (low footprint, redundancy, cloud management).


Picking standard open source starting places seems like a more than obvious move.

Saying you want to invest in ongoing, intense, data-driven store innovation, then building the whole thing atop a platform you don't control and can't rely on (it may get discontinued, its price may become huge, it may become a barrier to technical innovation) seems like an obviously bad move.

Finding smart people, rolling up your sleeves, & recognizing this as a core competency, an enabler, a driver of your business, & not outsourcing the problems, is the right move. If future teams do a better job building edge kubernetes, there should also be good portability.


Cloud services like those tend to be limited in annoying ways. Also sometimes they simply vanish (@google). The lock in is also a negative.


The lock in isn't a negative, it's just a cost. If you didn't build it yourself, that was a time and expertise savings. If they go away, you either just use the other vendor, or you have to pay for the time and expertise now, which you would have been paying anyway if you didn't use the vendored solution to begin with.


> The lock in isn't a negative, it's just a cost.

> just use the other vendor,

The point of lock in is to make it not a “just” to use a different vendor.

I also disagree with characterizing it as a cost. It is a negative because of its risk.


And they mention that in the article, too.


They talked about this at a KubeCon, IIRC. I wasn’t sure if it was an elaborate prank, but they were seriously smart folks who patiently explained why they needed to do this, and I remember being very impressed.


People laughing at Chick-fil-A should look at Domino's stock price, which has absolutely exploded due to technical innovations (and a new recipe).


HN discussion on their initial 2018 blog post: https://news.ycombinator.com/item?id=17820626 (570 points, 392 comments)


Refreshing to see dynamic and nimble solutions from a large organization.


It's how they dominate.

It's all the more important as walk-ins and drive-thrus decline while deliveries continue to rise.


   Our hypothesis: By making smarter kitchen equipment we can collect more data. By applying data to our restaurant, we can build more intelligent systems. By building more intelligent systems, we can better scale our business.

   As a simple example, imagine a forecasting model that attempts to predict how many Waffle Fries (or replace with your favorite Chick-fil-A product) should be cooked over every minute of the day. The forecast is created by an analytics process running in the cloud that uses transaction-level sales data from many restaurants. This forecast can most certainly be produced with a little work. Unfortunately, it is not accurate enough to actually drive food production. Sales in Chick-fil-A restaurants are prone to many traffic spikes and are significantly affected by local events (traffic, sports, weather, etc.).

   However, if we were to collect data from our point-of-sale system’s keystrokes in real-time to understand current demand, add data from the fryers about work in progress inventory, and then micro-adjust the initial forecast in-restaurant, we would be able to get a much more accurate picture of what we should cook at any given moment. This data can then be used to give a much more intelligent display to a restaurant team member that is responsible for cooking fries (for example), or perhaps to drive cooking automation in the future.

   Goals like this led us to develop an Internet of Things (IOT) platform for our restaurants. To successfully scale our business we need the ability to 1) collect data and 2) use it to drive automation in the restaurant.

The football game next door is over and the home team won? Start extra burgers in anticipation of hungry fans - great. I buy that.

The whole thing can be one app running on an iPad with multiple redundant data plans enabled (eSIMs from AT&T and Verizon or whatever). You're going to need a touchscreen tablet for the POS anyway, so there's no need for additional hardware or Kubernetes.
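
The "micro-adjust the initial forecast in-restaurant" step from the article excerpt quoted above is small enough to sketch, which is partly this commenter's point about running it on one device. This is a hypothetical illustration, not Chick-fil-A's actual method: the names `FryerState` and `adjust_fry_forecast`, the blend `weight`, and the rule of scaling the cloud forecast by observed-vs-forecast POS demand are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class FryerState:
    """Hypothetical fryer telemetry: portions already cooking (work in progress)."""
    portions_in_progress: int


def adjust_fry_forecast(
    cloud_forecast_next_min: float,
    forecast_so_far: float,
    observed_so_far: float,
    fryer: FryerState,
    weight: float = 0.5,
) -> int:
    """Portions to start now, after correcting the cloud forecast with live data.

    cloud_forecast_next_min: baseline demand for the next minute from the cloud model.
    forecast_so_far / observed_so_far: forecast vs. POS-observed demand over a
        recent window (e.g. the last 15 minutes).
    weight: trust in the live signal, from 0 (ignore POS data) to 1 (follow it fully).
    """
    ratio = observed_so_far / forecast_so_far if forecast_so_far > 0 else 1.0
    # Blend the baseline forecast with the live demand signal.
    scale = (1 - weight) + weight * ratio
    adjusted_demand = cloud_forecast_next_min * scale
    # Don't re-cook what is already in the fryer; round up to avoid running short.
    return max(0, math.ceil(adjusted_demand - fryer.portions_in_progress))


# Lunch is running 50% hotter than forecast and 2 portions are already cooking.
print(adjust_fry_forecast(8, forecast_so_far=20, observed_so_far=30, fryer=FryerState(2)))
```

With `weight=0` the sketch just reproduces the cloud forecast; raising it lets local POS keystrokes pull the plan up or down, and the fryer's work-in-progress count keeps it from double-producing.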


The game is over and carryout orders are starting to flood in from hundreds of customers on the smartphone app. And now Grubhub, Doordash, and UberEats are sending orders too.

The iPad is going to handle that and signal to the cooks to drop more chicken tenders?


Do people go out to eat more after wins than losses?


No idea, but I'm sure a super fancy machine learning big data model can run on the iPad/POS itself instead of 2000 Kubernetes clusters.


Could we have replaced the EdgeCommander with infrastructure tools like Puppet or Ansible?


Year 2023: FAA systems fail nationwide due to a “corrupt file”, and Chick Fil A is highly redundant with a Kubernetes cluster running in every store.


The private sector will almost always be more efficient than the public sector, but that said, the last time that system in question was in the news it was because it had been given a more "inclusive" name. It speaks to the priorities of the regime.


Isn’t NOTAM run by a private company already? I don’t think the FAA runs the actual hardware.


But why?


When I was at Walmart the holy grail was edge compute in every store (which I think is now live). There is a mainframe in each store that powers point-of-sale, inventory, etc. But in terms of building modern apps, you couldn't assume 24/7 local connectivity was a thing or that it would be fast enough for what you wanted to do.

Making each location its own failure domain was also a huge win. Imagine a cloud outage taking out hundreds of stores.


> There is a mainframe in each store

Nope. vSphere clusters in every store. Possibly the biggest ‘field’ footprint of vSphere that exists.


That might be true, but the heart of every store is an AS/400 or whatever IBM now calls the iSeries replacement.


HA design for stores is interesting. You want the store to stay up (dispensing goods and accepting payments generally, additionally preparing food in this case) as much as possible. At the same time you want a solution that is cheap to deploy and maintain.

Making everything in the store a dumb client is cheap and easy, but also fragile. Doing as much computing in each store (and even on each POS) as possible is great for HA but now you have more complicated hardware and software deployment problems. Different merchants trade these off in different ways.

CFA seem to have gone for a lot of computing in the store, and the rest of the design is about mitigating those deployment and maintenance problems and costs. I like the NUC cluster, GitOps, API and support team stories. I'm less keen on the K3s deployment per store; it seems like a questionable choice of orchestration engine for this scenario, but maybe there are details of the rest of their store architecture that I'm missing.


I remember 10 years ago when my friend asked me to go look at his restaurant client's computer and diagnose/fix if possible.

The 486 based PC had a mix of grease and lint/dust on every possible surface including the power supply fan, all cabling and the entirety of the motherboard. It had been placed on a shelf near one of the deep fryers and had run without problems for years. Certainly the other end of the 'long tail' of computing!


The previous article linked in the post goes more into the "Why"

https://medium.com/chick-fil-atech/enterprise-restaurant-com...



It sounds a lot better than what we had when I worked at a fast food restaurant: every location remote-desktopping into an overloaded server in the regional office, so simple things like printing an inventory list involved a lot of waiting.


Why do all the tech? The goal is the ultimate model of just-in-time manufacturing: never run out of supplies and, without sounding creepy, know precisely what the customer plans to order and have it in a bag and in their hands the moment they arrive at the store.

There are upstream benefits for marketing to see the feedback of their campaigns in real-time, but this is mostly about keeping inventories low and service times lower. That equates directly to profit.

CFA isn't the only chain working on this, but they're the most open about how they're implementing the infrastructure.


Justifying the most expensive franchise 'take' of any chain in the nation, apparently.

They're basically the mob of the fast food industry. They only want $10k from you to start your franchise; they cover all the costs of starting the business. The tradeoff is that they take the highest percentage of any franchise.


Almost like they gave you a bunch of capital and they want their money back. Weird huh? "mob of the fast food industry" That's funny.




It's easy to have a low-effort cynical take about Chick-Fil-A's use of Kubernetes in its restaurants, but the actual post outlining why they chose to provision each restaurant with a 3-node Kubernetes cluster [1] makes a fairly compelling argument. Simply put, they needed to ensure that their point-of-sale terminals had bulletproof reliability, and they wanted to do it in a way that leveraged commodity hardware. The solution they came up with, Kubernetes clusters running on consumer-grade Intel NUCs, is a fairly elegant solution to that problem. It reminds me of how Google chose to run on "commodity" Intel servers, rather than "enterprise-grade" hardware from IBM or Sun, realizing that, at scale, failure is guaranteed, no matter how overdesigned your hardware is, and it's more important to design for redundancy, easy provisioning and replacement than it is to design for absolute uptime. Chick-Fil-A has come to a similar realization for their point of sale systems.

[1]: https://medium.com/chick-fil-atech/edge-computing-at-chick-f...
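
Why three nodes rather than two? A sketch of the quorum arithmetic, assuming the cluster's state store uses majority quorum the way Raft-based stores such as etcd do (the post above doesn't say which datastore the restaurant clusters use):

```python
def quorum_size(nodes: int) -> int:
    """Votes needed for a majority, as in Raft-based stores such as etcd."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many nodes can be lost while a majority is still reachable."""
    if nodes < 1:
        raise ValueError("need at least one node")
    return nodes - quorum_size(nodes)


for n in range(1, 6):
    print(f"{n} node(s): quorum {quorum_size(n)}, survives {tolerated_failures(n)} failure(s)")
```

Under that assumption, two nodes survive no more failures than one, three is the smallest cluster that keeps running after losing a single NUC, and a fourth node adds hardware without adding failure tolerance.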


Low effort is pointing out just one thing that may or may not even help. This doesn't address the machine learning pieces, 100k sensors per restaurant, fryer sensors, everything else that is listed. It's a chicken sandwich restaurant. How much will all of this really buy over typical weekly/monthly/yearly sales trends?

It's not just CFA. For some reason I watched a Taco Bell video on LinkedIn that laid out one small problem regarding the ability to accept orders, and how a -completely- new k8s microservice architecture addressed it. This smells a lot like the same thing, mixed with more tech overoptimism. GE had the same spiels 10 years ago, with better use cases, and realized most of it was a dead end.


I don't know what the Chick-fil-As near you are like, but at the ones I've been to, there are often lines of cars wrapping around the building a couple of times. And when that happens, several Chick-fil-A employees walk car to car with tablets, pre-emptively taking orders and charging customers.

That lets them batch-make orders for cars that would otherwise have to wait until the cars ahead got their food before placing an order.

Not having to have the stores talk to a remote server to record an order during peak load could be a win. Not sure if it justifies "edge computing" but if a fast food restaurant needed it, it'd be them. I've never seen anything like that level of congestion at McDonalds or Taco Bell.
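
The "don't make order entry wait on a remote server" idea is usually implemented as store-and-forward: record the order locally first, then sync it upstream when the link allows. A minimal sketch, assuming a hypothetical `send_to_cloud` callback; this is not a description of Chick-fil-A's actual POS design:

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque


@dataclass
class LocalOrderQueue:
    """Hypothetical store-and-forward buffer for a single restaurant.

    Orders are recorded locally first, so taking an order never waits on the
    WAN; they are forwarded to the cloud whenever the uplink is available.
    """

    send_to_cloud: Callable[[dict], bool]  # returns True if the upload succeeded
    pending: Deque[dict] = field(default_factory=deque)

    def accept(self, order: dict) -> None:
        # Local append only: this works even when the internet link is down.
        self.pending.append(order)

    def flush(self) -> int:
        """Forward queued orders oldest-first; stop at the first failed upload."""
        sent = 0
        while self.pending:
            if not self.send_to_cloud(self.pending[0]):
                break  # uplink unavailable: keep the order and retry on the next flush
            self.pending.popleft()
            sent += 1
        return sent


# Usage with a fake uplink that starts out offline.
uplink_up = False
uploaded = []


def fake_send(order: dict) -> bool:
    if uplink_up:
        uploaded.append(order)
        return True
    return False


queue = LocalOrderQueue(send_to_cloud=fake_send)
queue.accept({"id": 1, "item": "sandwich"})
queue.accept({"id": 2, "item": "waffle fries"})
print(queue.flush())  # link is down: nothing sent, both orders stay queued
uplink_up = True
print(queue.flush())  # link is back: the backlog drains in order
```

Stopping at the first failure keeps orders in their original sequence upstream; the trade-off is that one persistently rejected order blocks everything behind it, which a real system would need to handle.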


Yep, same. Actually, same everywhere I've ever lived in the US, come to think of it.

I think CFA has a cult following, due to politeness and quality. In all my rambling, I guess my point is that CFA could go back to pen and paper and non-electric, cash-only registers and still probably not miss a beat.

Given that we both know there will always be a line blocking the highway, every day but Sunday, what do all the sensors, machine learning, etc really enable? I guess I'm just asking for a bit of pragmatism, even where it obviously isn't my place to say so.


Yeah, but on the other hand, just a few days ago, I went into a Potbelly and had to leave because the point of sale system was "rebooting", and they could neither process credit cards nor even get the cash drawer open to handle sales the old fashioned way. Pretty much everyone in line behind me left too once they realized what was going on. Maybe if they'd had a Kubernetes cluster keeping their local systems online, they wouldn't have missed out on a busy lunch rush.


It sounds like you are skeptical that computers and automation may be useful to streamline and make more efficient what is otherwise a tedious, manual process with many logistical elements.


They also seem to be conveniently located in a lot of places. I never ate there until recently when I moved and wound up with one right next to my gym. Now I eat there multiple times a week, and yes the politeness is a big factor I’d say.


Dual redundant internet(!), the wifi gear to reliably support that many devices, the cost of more expensive equipment, the added cost of support contracts to keep those fancy sensors working, especially since in a lot of locations they're probably the only food service place using them.

...and then the infrastructure for running the cloud services that support things like every restaurant's K8s cluster hitting a git server constantly throughout the day.

...and then there's the overhead of three separate DevOps teams.

etc.
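
To put "hitting a git server constantly" in numbers, here's a back-of-the-envelope sketch. It assumes each cluster polls independently with one request per poll, and uses the ~2000-cluster figure mentioned elsewhere in the thread; the poll intervals are hypothetical, since the thread doesn't say how often the clusters check in:

```python
from typing import Tuple


def git_poll_load(clusters: int, poll_interval_s: float) -> Tuple[float, int]:
    """Return (requests per second, requests per day) hitting the git server.

    Assumes every cluster polls independently at a fixed interval and that each
    poll is one request; real GitOps agents may batch, cache, or use webhooks.
    """
    per_second = clusters / poll_interval_s
    per_day = round(clusters * 86_400 / poll_interval_s)
    return per_second, per_day


# ~2000 clusters is the figure used elsewhere in this thread; intervals are hypothetical.
for interval in (60, 300):
    rps, per_day = git_poll_load(2000, interval)
    print(f"every {interval}s: {rps:.1f} req/s, {per_day:,} req/day")
```

At these assumed intervals the aggregate load stays in the tens of requests per second or below; whether that is expensive depends mostly on what each poll transfers.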


See my other comment [1] re: my experience at Potbelly. Restaurant usage is extremely bursty. For a lot of restaurants in business-y areas, you basically have a single spike of traffic from roughly 11:30 to 13:30 that drives >90% of the day's sales. You miss that, and you may as well write off the entire day's revenue. Investing in the redundancy (two internet connections, three servers, etc) to ensure that the point-of-sale system is online for that spike can be worth it, even for a relatively low-margin business. In fact, it might be more important for a low-margin business given the ephemeral nature of restaurant sales. It's not like e.g. Amazon, where if you're down for a few hours it doesn't matter, because customers will wait to buy their knick-knacks. If you're down for a lunch rush, you've lost that day's sales forever. After all, it's not as if customers are going to stay hungry and come back and buy two sandwiches tomorrow. They'll just go somewhere else for lunch.

[1]: https://news.ycombinator.com/item?id=34350556
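
The revenue argument can be made concrete with a toy model. The half-hourly sales distribution below is invented to match the comment's ">90% between 11:30 and 13:30" claim; it is not real restaurant data:

```python
import math

# Hypothetical share of one restaurant's daily sales by half-hour slot start time,
# shaped to match the comment's claim that >90% falls between 11:30 and 13:30.
SALES_SHARE = {
    "10:30": 0.02, "11:00": 0.03,
    "11:30": 0.20, "12:00": 0.30, "12:30": 0.25, "13:00": 0.17,
    "13:30": 0.02, "14:00": 0.01,
}
assert math.isclose(sum(SALES_SHARE.values()), 1.0)


def lost_share(outage_slots: list) -> float:
    """Fraction of the day's revenue lost if the POS is down for these slots.

    Assumes lost sales are never recovered later (customers just go elsewhere).
    """
    return sum(SALES_SHARE.get(slot, 0.0) for slot in outage_slots)


print(f"{lost_share(['12:00', '12:30']):.0%}")  # one hour down at peak
print(f"{lost_share(['15:00', '15:30']):.0%}")  # one hour down mid-afternoon
```

With a distribution that concentrated, an hour of downtime at noon costs more than half the day, while the same hour mid-afternoon costs almost nothing; that asymmetry is the case for paying for redundancy that only really matters a couple of hours a day.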


Other restaurants could use some of the ML stuff. Starbucks, for instance: I go there all the time and order an iced mocha, and 1 out of every 5-10 times I get a white mocha and they have to remake it. That's $7 they lose out on each time, even if it was all profit. This could be easily fixed with a camera and an ML model that checks the drink matches the order.
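
Taking the comment's own figures at face value ($7 per remake, one remake every 5 to 10 iced mochas), the cost spreads out per order as below. Treating each remake as the full $7 follows the comment's framing rather than an actual ingredient-and-labor cost:

```python
def remake_cost_per_order(cost_per_remake: float, remake_every_n_orders: int) -> float:
    """Average remake cost spread across every order of that drink."""
    return cost_per_remake / remake_every_n_orders


# Uses the comment's figures: $7 per remake, one remake every 5 to 10 orders.
for n in (5, 10):
    print(f"1 in {n}: ${remake_cost_per_order(7.0, n):.2f} per order")
```

For a camera-plus-model check to pay for itself, it would need to cost less per drink than this, after accounting for its own mistakes.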



