eystein's comments

eystein · on Jan 29, 2020

I'm involved with Mender, which focuses on OTA updates and does not limit the type of update you can deploy. Some popular choices:

* full system image (with a robust A/B layout for rollback support; can also do delta updates to reduce bandwidth by 70-90%)
* Docker containers
* single files or directories
* deb packages

You can see the ones available out of the box (OOTB) here: https://hub.mender.io/c/update-modules

Also, you can write your own update type if you have a more custom use case: https://docs.mender.io/2.2/devices/update-modules

Does that make sense? Git and buildpack support is not yet available OOTB, but it should be fairly simple to add with a new update module. Note, however, that those types of updates may not be easy to roll back (or make atomic, meaning you can end up with partially installed updates and bricked devices/applications if you lose power).
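For a single file, the usual way to avoid a half-written result after power loss is write-to-temp, fsync, then rename. Below is a minimal Python sketch of that technique (the `atomic_write` helper is hypothetical, not part of Mender). It only makes *one* file atomic; an update that touches many files, like a git checkout, has no single rename to hide behind, which is why those update types are harder to make atomic.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so readers see either the old or the new file, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory: same filesystem, so the rename below is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-update-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # new contents reach the disk before the switch
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        # On any failure, drop the temp file and leave the original untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory so the rename itself survives a power loss (POSIX only).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```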

eystein · on Jan 29, 2020

Hi, I am also working with the Mender project and we have been in touch earlier.

The device support for mender-convert will be drastically widened (actually, there will not be any limit as such) in the Mender 2.3 release, beta ETA in a week or two and final in March. But you can already test it with the development branch today: https://docs.mender.io/development/devices/debian-family

There will also be stock converted images for the most common development boards.

Let us know what you think!

eystein · on Jan 18, 2018

Disclaimer: I work on the Mender project.

Signing and verification in Mender is covered here: https://docs.mender.io/artifacts/signing-and-verification

eystein · on Feb 27, 2017

Signing an archive would probably be good enough for many cases. Block-level signing is a bit simpler (all or nothing) and thus carries less risk of mixing in unsigned parts (sideloading attacks).
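To make the "all or nothing" property concrete, here is a minimal Python sketch of block-level verification: the image is split into fixed-size blocks, each block is hashed with SHA-256, and the list of digests is what would be signed. The block size and function names are assumptions, and the signature check over the digest list is left out; the sketch assumes that list has already been verified. (Linux's dm-verity applies a related idea, using a hash tree rather than a flat list.)

```python
import hashlib
from typing import List

BLOCK_SIZE = 4096  # assumed block size; real systems choose their own


def block_digests(image: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """SHA-256 digest of every fixed-size block (the last block may be short)."""
    return [
        hashlib.sha256(image[i:i + block_size]).digest()
        for i in range(0, len(image), block_size)
    ]


def verify_blocks(image: bytes, signed_digests: List[bytes],
                  block_size: int = BLOCK_SIZE) -> List[int]:
    """Indices of blocks that do not match the (already signature-checked) digest list."""
    actual = block_digests(image, block_size)
    bad = [i for i, (a, e) in enumerate(zip(actual, signed_digests)) if a != e]
    # Extra or missing blocks are untrusted too.
    if len(actual) != len(signed_digests):
        bad.extend(range(min(len(actual), len(signed_digests)),
                         max(len(actual), len(signed_digests))))
    return bad
```

The image is accepted only when `verify_blocks` returns an empty list; a single bad block rejects the whole thing.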

For security-sensitive embedded devices (e.g. payment terminals), block-level signatures would also allow hardware verification during boot (the first-stage bootloader verifies the second stage, which verifies the kernel, etc.) if designed correctly.
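A rough Python model of that boot chain: each stage carries the expected SHA-256 digest of the next one, and the first digest stands in for a value held in immutable hardware (ROM or fuses). Real secure-boot implementations typically verify public-key signatures rather than bare digests; the `Stage` layout and helper names here are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Stage:
    name: str
    code: bytes
    # Expected digest of the *next* stage, embedded in this stage (hypothetical layout).
    next_digest: Optional[bytes] = None


def stage_digest(stage: Stage) -> bytes:
    """Digest covering the stage's code and the next-stage digest it embeds."""
    return hashlib.sha256(stage.code + (stage.next_digest or b"")).digest()


def verify_chain(root_digest: bytes, stages: List[Stage]) -> List[str]:
    """Names of the stages that verified before the chain broke (or all of them)."""
    expected: Optional[bytes] = root_digest  # stands in for a value in ROM/fuses
    trusted: List[str] = []
    for stage in stages:
        if expected is None or stage_digest(stage) != expected:
            break  # never hand control to an unverified stage
        trusted.append(stage.name)
        expected = stage.next_digest
    return trusted
```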

eystein · on Feb 27, 2017

Yocto has quite a large community and is growing fast. That said, think of Yocto as the first integration, not the only one; Buildroot is surely interesting too, but we had to start somewhere. :)

eystein · on Feb 27, 2017

I work on Mender, so I can tell you how automated rollback works there.

The update is written to the inactive rootfs partition, uboot is configured to boot from it and the device is rebooted. Using the bootcount feature of uboot it is possible to roll back automatically if booting fails. Once the mender daemon comes up it will try to report the success of the deployment to the server. If this fails it will also roll back. Only after successfully reporting success to the server will Mender "commit" the update, meaning it configures uboot to boot persistently from the updated partition.
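The flow above can be modeled as a small state machine. This is a simplified Python sketch, not Mender's code: the field names `upgrade_available`, `bootcount`, and `bootlimit` echo U-Boot's bootcount feature, but the exact environment handling is an assumption, and the "write image" and "report to server" steps are stubbed out.

```python
from dataclasses import dataclass


@dataclass
class BootEnv:
    """Simplified model of bootloader environment variables (illustrative names)."""
    active: str = "A"               # partition the bootloader boots
    upgrade_available: bool = False
    bootcount: int = 0
    bootlimit: int = 1              # boot attempts allowed for a fresh update


def other(part: str) -> str:
    return "B" if part == "A" else "A"


def rollback(env: BootEnv) -> None:
    """Switch back to the previous partition and disarm the trial."""
    env.active = other(env.active)
    env.upgrade_available = False
    env.bootcount = 0


def install_update(env: BootEnv) -> str:
    """Write the update to the inactive partition (stubbed) and arm a trial boot."""
    env.active = other(env.active)
    env.upgrade_available = True
    env.bootcount = 0
    return env.active


def bootloader_select(env: BootEnv) -> str:
    """Run on every boot: count trial boots and fall back once the limit is exceeded."""
    if env.upgrade_available:
        env.bootcount += 1
        if env.bootcount > env.bootlimit:
            rollback(env)
    return env.active


def daemon_started(env: BootEnv, reported_ok: bool) -> None:
    """Daemon is up: commit only if the server acknowledged the deployment."""
    if not env.upgrade_available:
        return
    if reported_ok:
        env.upgrade_available = False  # commit: keep booting the new partition
        env.bootcount = 0
    else:
        rollback(env)                  # next reboot lands on the old partition
```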

Mender already does compression, but you are right that there are optimizations that can be made for application updates, e.g. delta or other types of updates. We are planning to implement this as well. The first priority for Mender is to make it robust, i.e. make sure the update is atomic and that you can always roll back.
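As a rough illustration of why delta updates save bandwidth, here is a naive fixed-block delta in Python: only blocks that differ between the old and new image are shipped, and the device rebuilds the new image from its current one. The block size and function names are assumptions; production delta tools use far more sophisticated matching, and this is not Mender's delta format.

```python
from typing import Dict

BLOCK = 4096  # assumed block size


def make_delta(old: bytes, new: bytes, block: int = BLOCK) -> Dict[int, bytes]:
    """Map block index -> new contents for every block that differs from `old`."""
    delta: Dict[int, bytes] = {}
    for i in range(0, len(new), block):
        chunk = new[i:i + block]
        if old[i:i + block] != chunk:
            delta[i // block] = chunk
    return delta


def apply_delta(old: bytes, delta: Dict[int, bytes], new_len: int,
                block: int = BLOCK) -> bytes:
    """Rebuild the new image from the old one plus the changed blocks."""
    # Truncate or zero-pad the old image to the new length, then patch blocks in.
    out = bytearray(old[:new_len].ljust(new_len, b"\0"))
    for idx, chunk in delta.items():
        out[idx * block: idx * block + len(chunk)] = chunk
    return bytes(out)
```

The bandwidth saved depends entirely on how many blocks actually change between releases.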

dividuum · on Feb 27, 2017

> Using the bootcount feature of uboot it is possible to roll back automatically if booting fails.

I see. Thanks for the info. I suspected that u-boot does have support for that, but I wasn't sure.

> Once the mender daemon comes up it will try to report the success of the deployment to the server. If this fails it will also roll back.

Is there any deadline at all for that? I explicitly spawn a reboot command that ensures that even if everything gets stuck (in software, not in hardware) for whatever reason, the system falls back to the previous version (unless the reboot command gets killed too, in which case a manual restart is required). Any thoughts on that?
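The "explicit reboot deadline" idea can be expressed as a minimal Python sketch: arm a timer after booting into the new version and cancel it only once the update is confirmed. The `RebootDeadline` class is hypothetical and the reboot action is injected; on a real device it might invoke the system's reboot command. A timer living inside a process dies with that process, which is exactly the gap a hardware watchdog closes.

```python
import threading
from typing import Callable


class RebootDeadline:
    """Fallback reboot that fires unless the update is confirmed in time (hypothetical)."""

    def __init__(self, timeout_s: float, reboot: Callable[[], None]) -> None:
        self._timer = threading.Timer(timeout_s, reboot)
        self._timer.daemon = True  # do not keep the interpreter alive just for this

    def arm(self) -> None:
        self._timer.start()

    def confirm(self) -> None:
        """Call once the new version is known to be healthy; disarms the reboot."""
        self._timer.cancel()
```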

ralphmender · on Feb 27, 2017

This is a valid point. If booting hangs after the bootloader but before the Mender daemon comes up, that is actually quite tricky to manage.

We have looked into using a hardware watchdog for this, but it is in the gray zone of what an updater should be involved in. This is actually a more generic problem: the device may hang even when you did not deploy an update. Unfortunately, support for hardware watchdogs also varies across boards.

Most of the time it will not just hang; it may crash or kernel panic, and in those cases Mender will roll back. But the indefinite-hanging case is quite tricky and not yet handled.

Would be open to ideas here.

xyzzy_plugh · on March 1, 2017

Hey! Love the product, but I'm out of the embedded game. Thought I'd give my $0.02:

The first step of our boot process was to enable the watchdog. We extend the timeout periodically during the boot process, but generally if userspace isn't reached within 30 seconds or so we reset. Once in userspace, the daemon validates that things look good (this includes things beyond just application of the update -- did services start up correctly? Is the hardware operating as we expect?) before disabling the watchdog and marking the update as a success, at which point rollback isn't possible. At this point we might consider applying new updates, etc.
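The sequence described above (arm the watchdog early, validate, then commit and disarm) could look roughly like this in Python. The helper is hypothetical; the watchdog handle is assumed to be `/dev/watchdog` opened unbuffered (`open("/dev/watchdog", "wb", buffering=0)`). On Linux, any write resets the timer, and writing `V` just before closing disarms it on drivers that support "magic close". The checks are placeholders for questions like "did services start?" and "is the hardware behaving?".

```python
from typing import BinaryIO, Callable, Dict, Optional


def validate_and_commit(
    watchdog: BinaryIO,
    checks: Dict[str, Callable[[], bool]],
    commit: Callable[[], None],
) -> Optional[str]:
    """Run post-boot checks while the watchdog is armed; commit only if all pass.

    Returns None on success, or the name of the first failing check. On failure
    the handle stays open and the watchdog stays armed, so the board resets and
    the bootloader can fall back to the previous version.
    """
    for name, check in checks.items():
        watchdog.write(b"\0")  # any write "pets" the Linux watchdog device
        watchdog.flush()
        if not check():
            return name
    commit()               # mark the update as good; no rollback after this
    watchdog.write(b"V")   # "magic close" character: disarm when the device is closed
    watchdog.flush()
    watchdog.close()
    return None
```

This sketch commits before disarming, so a crash between the two steps only triggers a reset into the already-committed version.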

We also modified our first-stage bootloader to be resilient to bootloader update issues, and chain-loaded our second-stage bootloader from a stub that could roll back.

We also niced the update process to avoid resource contention, allowed updates to be delayed until the network was quiet, and paused them when it became noisy, to make for a good user experience. There was a server-side flag to force updates to apply regardless, with higher priority, as well as one to disable essentially all other functionality in the case of an unforeseen serious, perhaps security-related, issue.

We actually had a discrete watchdog service that was responsible for petting an always-on watchdog, to rescue the system if it locked up or became unresponsive (if certain processes were not running or not responding, the watchdog would not be petted).
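A watchdog service like that can be sketched the same way: keep the device open and pet it only while a health check passes. `watchdog_service` and the `is_healthy` callback are hypothetical; the loop runs forever when `max_iterations` is `None`, and the bound exists only for testing. When the check fails, the loop skips the pet but keeps the device open, because on Linux, closing it without the magic `V` can disable the watchdog on drivers that lack "magic close".

```python
import time
from typing import BinaryIO, Callable, Optional


def watchdog_service(
    watchdog: BinaryIO,
    is_healthy: Callable[[], bool],
    interval_s: float = 5.0,
    max_iterations: Optional[int] = None,
) -> int:
    """Pet an always-on watchdog only while the system looks healthy; returns pet count."""
    pets = 0
    iteration = 0
    while max_iterations is None or iteration < max_iterations:
        iteration += 1
        if is_healthy():
            watchdog.write(b"\0")  # reset the hardware timer
            watchdog.flush()
            pets += 1
        # Unhealthy: skip the pet and let the hardware timer run out.
        time.sleep(interval_s)
    return pets
```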

All of this led to effectively zero failures in the field and a seamless user experience (except for the 30-second reboot when inactive). I wish everything I owned worked this way.

I could talk ad nauseam about this stuff. It's very cool to see the designs of others. I feel this is an underappreciated and underexplored problem space.

eystein · on Feb 27, 2017

It is not that uncommon for an updater to support both local and remote updates. For example, Mender has two modes of operation: standalone and managed [0].

Like you, many teams are still doing local updates, or transitioning from local to OTA, at least for some products.

[0] https://docs.mender.io/1.0/Architecture/Overview#modes-of-op...

eystein · on Feb 27, 2017

Thanks! :)

Yes, we have looked into it and the nice thing is that TUF seems to be quite easy to add as an additional security layer down the road.

One interesting challenge is downgrade attacks. How do you allow rollback of a bad deployment while disallowing an attacker to deploy an old and vulnerable version?
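One common answer, sketched here as an assumption rather than Mender's design, is a monotonic "security version" floor kept in tamper-resistant storage, similar in spirit to the rollback index in Android Verified Boot. Ordinary releases share a security version, so downgrades among them stay possible; the floor is raised only after a version is committed, so the A/B fallback to the previous partition during a trial boot is unaffected. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AntiRollback:
    """Hypothetical anti-rollback policy; `floor` would live in tamper-resistant storage."""
    floor: int = 0  # lowest security version the device will install again

    def may_install(self, artifact_security_version: int) -> bool:
        # Reject artifacts below the floor, even if they are correctly signed.
        return artifact_security_version >= self.floor

    def on_commit(self, committed_security_version: int) -> None:
        # Only ever move the floor up, and only once the new version is committed.
        self.floor = max(self.floor, committed_security_version)
```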

eystein · on Feb 27, 2017

Cryptographic signing and verification is in scope for Mender [0], and frankly, it should be in scope for all updaters: too many hacks have happened due to a lack of code signing.

[0] https://tracker.mender.io/projects/MEN/issues/MEN-1020