Extreme Pi Boot Optimization (kittenlabs.de)
558 points by todsacerdoti 5 months ago | 153 comments



Power is really one of the weaknesses of the rpi family (I'm quite excited for the new pico 2 for exactly this reason - it seems like they're finally making it easy to enter a relatively deep sleep without external hardware).

I built some cameras for an application like this using a Google Coral mini, whose camera is not nearly as good as the HQ cam, unfortunately, but it supports a built in suspend + wake from onboard RTC that is very easy to use and perfect for a periodic camera app - while still having enough oomph and 2GB of memory to handle a high resolution image. (You can physically hook an HQ camera up but the software pipeline doesn't exist to manage it on the coral AFAIK.)

The Rpi ecosystem is a lot more mature and (sorry, friends) I trust the future availability of rpi more than I trust Google to keep delivering the coral line, but it really underscored how helpful good power support in the hw was.

(Ironically, we ended up outsourcing the next version of these cameras to a firm that built them using an rpi and we just threw in a much larger battery to compensate. Which means I have a stack of 100 unopened coral dev minis + cameras looking for either good ideas or to sell to someone. Oops.)


Isn't the Coral line already dead/discontinued? The site (coral.ai) seems to have been last updated in 2021 and it says Copyright 2020.

Oh god, just searched for "google coral twitter" looking for an official twitter presence of the project and the second hit was a tweet of yours looking to sell your 100 excess boards.


I haven't seen anything official indicating it's discontinued, and they've released a few updated libraries, but there certainly seems to be very little momentum and I'm skeptical about its future. I think some friends have still bought the USB accelerator version of it to use with the frigate DVR, though.

Not a good sign when offers to sell old ones feature prominently in the search results. :)

It's a shame. They're actually really quite nice boards. I may have to sell them from the startup to my academic self and use them for some project or another if nobody wants them - but I don't really want to teach a machine vision course.


Well, they haven't announced it's dead.

In the electronics industry, even 30-year-old microcontrollers often have pin- and software-compatible replacements available to this day. So just because something has long been surpassed, doesn't mean it becomes unavailable.

The electronics industry also has an orderly process for discontinuing parts, where customers are given advance notice and a chance to place one last order if they want to stock up. This process hasn't been started for the Coral accelerators.

With all that said - it's pretty clear that Google has lost interest in the product. When you're making hundreds of billions of dollars from ads, who's got the time for a product line that's bringing in less than 0.1% of that?

So personally I wouldn't base a new product on Coral today.


It is really common for Google to kill projects out of the blue. So don't be surprised that there haven't been many updates.


Interesting, I’ve always run pis from wall power. Is the pi hardware incapable of similar power optimizations to coral, or is this a problem of a lack of software support for power management on pi? (I assume from your mention of external hardware to manage power that it’s not just a software issue.)


A typical battery-powered IoT device like the Ring video doorbell will last ~2 months on a ~6000mAh ~3.3V battery, which works out to an average power draw of about 14 milliwatts.

This is quite low - a power LED can use more than 14 milliwatts. Of course some products have power consumption even lower than that, right down to the tens-of-microwatts level.

Meanwhile a raspberry pi, when idle, consumes ~3 watts [1]. That's 200x more than the video doorbell.

Getting the power consumption down requires (a) that your hardware draws very little power when it's in sleep mode, and (b) that it spends as much time as possible in that sleep mode. Hardware and software have to work together to achieve this, and the software changes can be extensive.

[1] https://www.jeffgeerling.com/blog/2024/new-2gb-pi-5-has-33-s...
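
To put rough numbers on that gap (back-of-the-envelope, assuming the sleep current is negligible):

    6\,\mathrm{Ah} \times 3.3\,\mathrm{V} \approx 20\,\mathrm{Wh}, \qquad 20\,\mathrm{Wh} / (2 \times 730\,\mathrm{h}) \approx 14\,\mathrm{mW}

    14\,\mathrm{mW} / 3\,\mathrm{W} \approx 0.5\%\ \text{duty cycle} \approx 0.3\,\mathrm{s}\ \text{awake per minute}

So a board that draws ~3 W while awake has to spend well under a second per minute awake to match that battery life, which is why both the sleep current and the time spent awake matter.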


I'm really mixing two things and wasn't very clear about it.

The pico can kinda deep sleep but it requires an external wakeup trigger. It can't deep sleep from its own clock. Even so its deep sleep is pretty high power compared to most embedded chips.

The zero (w) and zero 2 w don't have the equivalent of suspend-to-ram with a low sleep current. I'm not sure if that's a limitation of the SOC or the driver or both, but rpi was fairly clear it wasn't in the cards: https://github.com/raspberrypi/linux/issues/1281


I'm pretty sure it's the hardware. The SOCs they use in the mainline pi products are generally targeted at applications which aren't battery powered, so they don't focus on fast sleep support and similar power optimisations (which make the system design much more complex). Unfortunately, if you want that you generally need to go with phone SOCs, which are incredibly NDA-bound, very hard to buy even if you're as big as the rpi company, and have short availability windows that run counter to the requirements of large parts of the SBC market segment.


Curious if you have looked at the BeagleBone hardware with its PRU devices for low-power operation; they can stay awake while the system sleeps.


Out of curiosity what are you doing that can’t be done with an OTS camera these days?


Dual cameras at 90deg to each other with lightweight onboard ml to decide capture / no capture + geotargeted higher frame rate captures for regions of interest. In a housing that can handle hard impacts from brush and off-road use (we take pictures of pasture).

A lot of the video solutions have much worse image quality than a still camera operating at 1fps to 1/20 fps; you can stick a quite good camera on an rpi.

It's quite likely we could COTS this but there was no interest from vendors when we wanted to start with 200 of them. So we went with a custom solution.

(We are https://enriched.ag -- if you scroll down you can see our overly-heavy-metal but quite tough camera unit. Some day it will be injection molded instead...)


I will contact you, I may be able to help with a better solution. This is something I do a lot of.


Please do, I'd love to learn. We just completed a run of our latest version so we're a ways from thinking about a new design, but it would be helpful to know what I should be thinking about for our roadmap.

dave.andersen at Gmail is probably my easiest address.


You can still save quite a bit by bundling your application in an initramfs linked into the kernel, which obviates the need for any filesystem mounts in simple cases.

In some cases, you can even replace something like BusyBox init with a simple bash script that does the bare minimum to boot your application: mounting devtmpfs, proc, sysfs, etc. Dumping glibc is also worth exploring, if feasible.

Chroot is a good tool to test your initramfs and see if all the necessary application dependencies are present before bundling it into the kernel. If you can run it in a chroot, the kernel can run it during boot, and the development loop is much tighter.

Disabling kernel module support and linking only the features you need into the kernel will save further space and boot time.

It would also be helpful to test zstd compression instead of gzip.
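
For reference, a sketch of the kernel options involved (option names as found in recent mainline Kconfig; the cpio path is a placeholder):

    # Embed the initramfs into the kernel image
    CONFIG_BLK_DEV_INITRD=y
    CONFIG_INITRAMFS_SOURCE="/path/to/rootfs.cpio"
    # Compress it with zstd instead of gzip
    CONFIG_RD_ZSTD=y
    CONFIG_INITRAMFS_COMPRESSION_ZSTD=y
    # Drop loadable module support; build the few needed drivers in
    # CONFIG_MODULES is not set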


On the flip side of this, if your kernel + initramfs is being loaded slowly (either because the previous boot stage is not using the hardware at its full capacity, or because the image is large enough that it would be better to do something else in parallel with loading it), then having the smallest practical image, one that loads the remaining software after userspace starts, can be faster.


Absolutely, there's a point of size and complexity where trying to bundle everything into an initramfs image is counter-productive.

One example of loading the kernel and initramfs slowly is netbooting over TFTP. You're better off with a smaller kernel that can load the rest of the rootfs over a faster protocol, e.g. NFS, NBD, iSCSI, etc. Alternatively, you can load a bootloader that supports a faster protocol, such as GRUB, which can load the kernel binary over HTTP.


You actually don't need a shell script to mount the different pseudo filesystems. You can do that in your application. So all that remains is an initramfs with a statically linked binary.


Very true, you can call into the C library or use system calls directly and have your application do all the init itself.
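
Something along these lines, for example (a minimal, untested sketch; /app is a placeholder for your statically linked application):

    /* init.c - tiny PID 1 for an initramfs: mount pseudo filesystems, exec the app */
    #include <stdio.h>
    #include <sys/mount.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        /* Mount points usually already exist in the cpio archive; create them just in case. */
        mkdir("/dev", 0755);
        mkdir("/proc", 0555);
        mkdir("/sys", 0555);

        /* Errors are ignored here: a missing filesystem just means the app can't use it. */
        mount("devtmpfs", "/dev", "devtmpfs", 0, NULL);
        mount("proc", "/proc", "proc", 0, NULL);
        mount("sysfs", "/sys", "sysfs", 0, NULL);

        /* Become the application; as PID 1 it should never exit, or the kernel panics. */
        execl("/app", "/app", (char *)NULL);
        perror("execl /app");
        return 1;
    }

Build it with something like `gcc -static -Os init.c -o init` (or against musl to keep it small) and place it at /init in the archive.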


Two other good articles on decreasing Pi boot times are:

- https://www.furkantokac.com/rpi3-fast-boot-less-than-2-secon...

- http://himeshp.blogspot.com/2018/08/fast-boot-with-raspberry...

I used these two to make a digital photo frame with a Pi that boots very quickly to a browser in kiosk mode. If you have very minimal requirements, you can get some very impressive boot times.


Reading the first article, it seems like OP could benefit from using start_cd.elf (the 3rd stage bootloader, but with the graphics subsystem removed); they report a 0.5s improvement in loading time.


Thanks for the link! Do you have the code for this kiosk picture frame accessible anywhere, by any chance?


The real tragedy is the proprietary bootcode.bin gpu code that is a blackbox and we don't have the source code for.

How horrible that a tinkering/hobbies project has to have these hidden secret blackboxes that can't be modified.


> The real tragedy is the proprietary bootcode.bin gpu code that is a blackbox and we don't have the source code for.

The Pi firmware is ThreadX, later bought by Microsoft and renamed Azure RTOS.

It is now FOSS.

https://www.theregister.com/2023/11/28/microsoft_opens_sourc...

That does not mean the whole Pi firmware is automatically FOSS -- drivers are not -- but they could if they wanted.


I'm guessing that having the source of the bootcode available would allow for extreme tinkering to the point that RPI can no longer guarantee it functioning properly? Or maybe it has something to do with proprietary drivers loading? Curious as well what's in there that they need to keep it closed source.


Well, you can already do extreme tinkering to the point that RPi can no longer guarantee it functioning properly :)

IIRC it's just that the bootcode.bin file is provided by Broadcom, and not the RPi foundation, so they can't open source it because they don't have the license to do so (this isn't the only proprietary blob in the default Pi distribution, but most of the other ones have open source alternatives/aren't that necessary/are open source now that they use the RP1 chip instead of broadcom peripherals).

There are similar, slightly more open arrangements with the Pi Pico W, where they can't provide the firmware for the WiFi chip, but they can provide a library to interface with it, with the caveat that the license _only_ allows that library to be used with the RP* family of microcontrollers [1]

[1]: https://github.com/georgerobotics/cyw43-driver/blob/faf36381...


Other than broadcom licensing, I can guess that there's a warranty issue because the firmware controls the voltage.


You can kill a Pi in a thousand ways. I don’t really understand how a warranty could work outside of obvious cases…


The firmware is in control of the clocks, and in case too high of a clock is specified in config.txt it burns a 'warranty void' fuse.


I like the article as a whole, but I'm unsure about this point:

> For example: Disabling CPU turbo just to save some current consumption is a bad choice, because the resulting extra time will use more energy than just getting the job done quickly and shutting off.

In one of my computer engineering classes, I learned that power consumption rises as the square of clock frequency - so doubling the clock will quadruple the power.

That seems like it'd imply that you'd actually have to measure the power difference to determine if the quadratic increase from the clock boost will outweigh the product of the constant power consumption with the additional time spent on the task.

Related - it'd be nice if the Pi's CPUs included granular power consumption information, either derivable from the datasheet, or as real-time values exposed in registers.


> In one of my computer engineering classes, I learned that power consumption rises as the square of clock frequency - so doubling the clock will quadruple the power.

This is not quite correct. Switching power of a chip (ignoring static leakage) is proportional to voltage squared times frequency. Most chips require a higher voltage to reach higher clock speeds, so there is a quadratic relationship there. However, I believe that the raspberry pi does not have dynamic voltage control, so reducing clock speed without also reducing voltage will not affect total switching energy consumption.
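
In symbols (the standard first-order CMOS model, with alpha the activity factor and C the switched capacitance):

    P_{\mathrm{switch}} \approx \alpha C V^2 f, \qquad E_{\mathrm{task}} = P_{\mathrm{switch}} \cdot t \approx \alpha C V^2 \cdot N_{\mathrm{cycles}}

Since f times t is just the number of cycles the task needs, lowering f at a fixed voltage leaves the switching energy per task roughly unchanged; only lowering V changes the energy per cycle.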



This is a well-understood power optimization strategy called race to idle. It works because there are a lot of peripherals taking power in addition to the cpu that you can’t switch off until the cpu is done.

There is also definitely a sweet spot. If you overclock the cpu too much your performance per watt drops too far and race to idle won’t work anymore.


For a continuous workload that's a reasonable rule of thumb, but it doesn't tell the whole story. You always have a certain static power draw, just from having a component enabled. So modern embedded systems will often use a "race-to-sleep" or "race-to-halt" strategy where they will execute tasks really quickly, before shutting down most of their components waiting for the next event to trigger.


There's a base amount of power overhead the device will use no matter what, even if it does nothing. They even provide benchmarks that show that current consumption for turbo increases 10% but reduces boot time by 11%, for a small but measurable difference in total energy used.
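
Plugging those two figures in (a rough check, assuming the supply voltage stays the same so energy scales with current times time):

    E \propto I \times t \approx 1.10 \times 0.89 \approx 0.98

i.e. about a 2% net energy saving from enabling turbo.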


What about the internal resistance of the battery? Doesn't that increase with higher current?

As in 1A for 2 seconds uses less actual battery power than 2A for 1 second due to internal loss in the battery?

I may be remembering this wrong; it has been a long time since I studied this stuff.


That’s true when you’ve saturated all of those subsystems but not when you’re just CPU bound. If you’re doing high throughput from disk to memory to CPU and back to disk, there are levels of use where throttling IO helps with battery draw. There are old papers on the subject, and I have a suspicion that OS X started doing something of the sort when they went to nonremovable batteries in the MacBook. There’s a 30% reduction in power draw in that generation that they brag about but don’t really explain, and it was a handful of years after that first paper showed up.


This is very interesting, thanks for sharing!

So if it takes 1J to do some computation in 1 second (say 1GHz at 1W), you're saying that in the perfectly spherical cow case, it takes 2J to do that same computation in 0.5 seconds (2GHz at 4W).

However, that's just CPU consumption, if the overall system has a static rate of 4W, then it takes 5J (1J CPU, 4J system) at 1Ghz to do the task in a second, or 4J (2J CPU, 2J system) at 2GHz to do the task in 0.5 seconds.

Am I understanding you correctly? Basically, if the overall system's power consumption is similar to the CPU's power consumption at turbo, then it makes sense to turbo, if not, it doesn't?


I think you have it right, but also my experience from optimising Android power usage was: your intuitions are helpful for knowing what to try, but you have to test and measure everything as there are always complications. Luckily you are well equipped to benchmark it already :)


Yes, this is all correct, as long as you're implicitly assuming that the CPU itself has some static power dissipation as well (which it does) in addition to the rest of the system.

Unfortunately I missed the actual benchmarks in the article that empirically measured the power difference.


Impressive. But every time I read one of these pieces I remember when I recorded Plan 9 booting on a Pi Zero: https://taoofmac.com/space/blog/2020/09/02/1900#resurrecting (GIF is real time output).


Nice in its own way.

But will that allow OP to load the camera and wifi drivers he needs for his project?


IMHO boot times of Linux distros in general are rather sad, which is then significantly amplified on weak(er) hardware such as this. I've gone through similar efforts with the MQ-Pro SBC. One can also really feel this on laptops (except Macbooks I guess). Annoying.


It very much depends what you define as "boot time". For example, Windows optimizes for time to fist UI, meanwhile everything else continues to load and the PC stays unusable for multiple seconds after "boot".


This is frequently due to startup apps, not Windows itself. I generally find that when I disable stuff like Steam, Teams, Creative Cloud, OEM (e.g. HP) software, etc., Windows is usable as soon as I log on. My work Windows laptop has most startup apps disabled (with the exception of OneDrive and Teams, which I use frequently enough that it makes sense for them to be enabled there) and I log on to a desktop ready to go. My personal laptop has everything except OneDrive disabled.

YMMV, and I acknowledge that the default of letting apps insert themselves as startup apps without user approval is not ideal. I note this is also an issue on macOS, from experience at my previous employer, who issued me a Macbook configured per corporate policy to load various apps, antivirus included, on login.

The only platform so far immune to this in my experience is open-source distros like Debian (using GNOME, its default desktop environment) et al, but I've yet to work for an employer that uses Linux.


"Windows optimizes for time to fist UI,"

I'm going to apologise right now for giggling at your typo. Microsoft have done some horrible things to UI.

The start menu - yes they created the fucking thing, yes it is now called start by everyone - own it (think Biro and co and stop being dicks) and it belongs at the left hand side. "We" know better than you, lets put it in the middle and surround it with weird shit and lets make it odd and put fucking games controllers on a corporate laptop and other wankery that we can't be bothered to curate because we are so poor but if you love our weather forecasts and shitty ... whomever will pay us .. whatever thing.

I think that Microsoft have lost interest in humanity as anything than a pool of subscription slaves to contribute to their bottom line.

That is some pretty aggressive fisting.


The principal menu is still bottom left, doing what they clearly see as the most important thing in the OS: advertising shit and feeding you heavily politicised propaganda. So useful!


My Windows box starts fast, gets to login in moments, and is usable immediately. Are you sure you don't have a bunch of extra software set to run at startup? Steam? Adobe Creative Cloud? Oculus Support?

My corp Mac actually has this issue as it launches some security software, launches the browser, checks that apps are up to date and security patches are applied etc. It takes several seconds (10-20?) before it's ready to use.


you all reboot windows? madness.

I got devuan down to about 8 seconds from startup -> login -> shutdown on the tty. Gentoo boots pretty fast to a desktop (maybe 15-20 seconds) - a lot of the slowdown on "modern" linux is systemd waiting for NICs and whatever to quiesce. Hilariously, Ubuntu is by far the worst at boot times, i've had ubuntu sit there for minutes because it was airgapped.

I kinda lost interest in attempting to speed up boots more than that, maybe if i had some funding i could get debian or gentoo down to a couple of seconds of boot overhead before X/wayland/whatever runs.


Windows is pretty fast nowadays on modern hardware (especially if it's booting off an SSD) and Debian/GNOME is also pretty fast to a desktop. Slow boots off an SSD would point at an issue somewhere regardless of your OS.

I generally do shutdown on my gaming tower PC as the idle power usage isn't great and shutdown/boot is fast enough that it's not much difference compared to suspend. Laptops though I just suspend -- I often get weeks of uptime in Windows.


> Windows is pretty fast nowadays on modern hardware

This feels the wrong way around. Windows development just stagnated for about a decade while hardware made huge leaps, so hardware speed increased much faster than Windows became slower. Therefore, "Windows is pretty fast nowadays on modern hardware".

I remember Windows XP being able to boot from a hard drive without the progress bar making two full revolutions (so, like, 3-4 seconds?).

Windows 7 managed to do something similar (finish booting before the animated Windows logo has fully "bloomed"), but already required an SSD and a multi-GHz multi-core processor to do so.

Windows 10 does a few spinny spins, but uses a CPU that's another integer multiple faster and an NVMe SSD delivering hundreds of thousands of IOPS and gigabytes per second of bandwidth.


I almost want to test this out - shut down my PC, unplug it, hit the power button, wait like 15 minutes, then turn it back on. Will it still be "pretty fast"? My (probably wrong) understanding was windows boot times are so fast because it's really just hibernating when you shut down. I'm sure there's technical names for the power state (S0?) - but ram is kept "warm" so that resume basically just needs to check that all the devices it needs are connected and reset the software clock.

I'm currently showing 50.6GB "in use" and 62GB commit on my windows machine. My boot drive is an intel SSD on sata, less than a GB/second - this implies if windows does recover memory from disk on "cold" boot my machine will take about a minute to restore RAM.

i may be conflating things, but i've lived with this assumption that shutting down my PC is merely hibernating it, and as such, i always pull mains power before removing hardware.


If you see the BIOS screen, then it isn't suspended.

It could still be suspend-to-disk, but desktop PCs don't do that.

Windows 'fast boot' does work by saving some critical state on shutdown, but it's still a complete shutdown.


when booting my 5950x into windows, i never see the bios screen. It's about 50/50 if i see the nvidia bios screen, then i see the motherboard logo with the windows spinner under it. there's about 500ms where i can push F2 or whatever between when the logo shows up and the spinner shows up that i can "enter bios" - but that itself takes about 5 seconds to become fully realized on the screen.

I know of what you speak, but i don't know if this is universally true.


The motherboard or manufacturer logo is the bios running...

For example on xps, bios is now just when the plain dell logo displays. Thank Apple for that sort of thing.


> Windows 'fast boot' does work by saving some critical state on shutdown, but it's still a complete shutdown.

Nope.

The easy way to falsify this is to dual-boot.

If you have Windows Fast Boot enabled, Linux will not be able to mount Windows's NTFS drives, because they are still mounted. This tells us the kernel is just hibernated, not stopped.

This is one of my personal annoyances with systemd: if a drive in `/etc/fstab` isn't available, systemd waits for it to become available... forever. So if a drive is mounted (because another OS is hibernated), or it's changed UUID because it's been reformatted, or it's been deleted: your computer won't boot, because systemd is trapped in a retry loop.

The easy way to disable fast boot is just to open an Admin Command Prompt and type:

powercfg /h off

This disables Hibernation. The `C:\HIBERFIL.SYS` file is deleted and Windows now will do a full shutdown and full boot every time... and when you boot into Linux, you can now mount the Windows drives.

Note, with hibernation as opposed to suspend, you still see the BIOS screen.


That's the exact state I was talking about... but you did go into more detail. :)


Exactly my experience as well. I moved away from Windows (10) earlier this year, and login times (waiting for the user desktop to come up after putting in credentials) took ~10 seconds. Sometimes it was faster after a fresh install, but it never lasted, even when regularly managing what services and apps were loaded on startup.


When I reboot my Windows PC every 3 months or so, I can start using it as soon as the desktop shows up. No slowdown. I have 5800x3d/64GB/2TB/4090. I don't have any startup apps other than the ones that come built-in with enterprise edition.

Why do you have such a different experience?


> Why do you have such a different experience?

Well...

> I don't have any startup apps other than the ones that come built-in with enterprise edition.

There is the matter where 99% of people aren't on enterprise, and of the 1% who are, virtually all are running it laden with a pile of corporate garbage.


Guy goes into the doctor's office. Says, "Doctor, it hurts when I do this." You know what the doctor says?

"Don't do that"

Seriously tho, anyone can activate enterprise edition with MAS. And even if you're on home edition, why would it take longer to boot up? All the crap is on your SSD, not your RAM


I'm pretty sure crowdstrike and its competitors can slow down any computer regardless of the OS to the point of uselessness.


I consider boot time from power off until I can start the first application I want.

I am a bit strict about what I consider slow on modern hardware, though. Why should a regular desktop distro wait around five seconds for networking and NTP before displaying login, or why should it take UEFI 5s to start the OS? I can forgive SBCs running off an SD card taking 15-30 seconds to boot, but not a PC that's significantly faster in all other aspects.

I'm not even going to start with all the crap that starts on an average Windows desktop. It's disgusting.


> Why should a regular desktop distro wait around five seconds for networking and NTP before displaying login

Well, for one, there may be a Kerberos ticket based auth system in place that requires accurate time. Or for AD on the Windows side.


So those who need that should configure their systems as such and the majority should get to enjoy faster boot times?


sorry man but that typo had me lol, early mornings here I do apologize.


> except Macbooks I guess???

My M1 MacBook takes an order of magnitude longer to start than my Windows Desktop PC. Once it's started up, leaving it on re-logging in takes no time but rebooting takes a while.


It really is a shame. When Mac OS X first implemented launchd, its boot times improved drastically. It booted really fast even when computers were still using HDDs. So when Macs got SSDs, they booted incredibly quickly.

But then, with some macOS update, they screwed it up and never bothered to fix it. Imagine how fast those things could boot with their super fast drives and SOCs if someone fixed this regression…


There’s something wrong with your M1 MacBook. I’ve used three and none of them has taken longer than 20-30 seconds from power button to desktop.


20-30 seconds is much longer than the ~6 seconds for my Windows PC.

Old article but:

https://www.tomshardware.com/reviews/fastest-windows-10-boot...


IMO having Windows Fast Boot enabled is cheating.


Would it be cheating if an equivalent Apple technology was enabled?


20-30s seems extremely slow but I'm not normally using a Mac. Anecdotally on Linux and Windows systems I commonly see 6-10 seconds on a hard reset with absolutely no optimizations


It takes a while to verify all the signatures in every chip to make sure you didn’t do something evil like repair your own hardware.


Linux can boot quite quickly with the right settings; I've written about it at [0]. But distros (reasonably) build very generic kernels and initramfs, which are not particularly fast to boot.

[0]: https://blog.davidv.dev/posts/minimizing-linux-boot-times/


I'm not sure how much distros can do here, the userspace part of boot time is negligible (unless there is some horrible misconfiguration, like networkmanager waiting 90 seconds for nonexistent wifi...). My linux box takes about 4 seconds until graphical.target, most of which is connecting to wifi and ntpd, both of which are optional in principle.

If you really want a fast boot, ditch all the bootloader compatibility layers, abstractions and dynamic configuration possibilities like initramfs. But then you would be at the mercy of the hardware vendor, which is definitely not worth it.


> My linux box takes about 4 seconds until graphical.target, most of which is connecting to wifi and ntpd, both of which are optional in principle.

But why should a login screen wait behind networking target for example? That ordering is up to the distributions.

> If you really want a fast boot, ditch all the bootloader compatibility layers, abstractions and dynamic configuration possibilities like initramfs. But then you would be at the mercy of the hardware vendor, which is definitely not worth it.

You'd expect that would be the case with SBCs, most if not all do overlays instead of ACPI. Very few also offer UEFI, so there isn't a slow(er) layer there either, but you are at the mercy of the vendor.


The login screen doesn't wait for graphical.target, it's the other way around - the display manager must be started for graphical.target to be complete. So in my case, ly-dm started 3 seconds into boot and it doesn't depend on anything significant. Either way, the broader point is that even if distros somehow managed to cut the time in half, that's still just 2 seconds compared to the massive time needed for firmware.

The only thing that pops out is systemd-binfmt.service somehow taking almost 1 second, which is strange since AFAIK it just echoes some strings into a /proc file. There is still some room for optimization by mounting external drives asynchronously, but that's not a safe optimization to make for general use.
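
If anyone wants to chase down which unit is eating the time, the standard systemd tools already cover it:

    systemd-analyze blame           # per-unit startup time, slowest first
    systemd-analyze critical-chain  # the chain of units gating the default target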


I'm confused by this statement. For me Linux boot is incredibly fast, even on old machines with slow storage. For example, my MacBook Air 11 (running Linux) boots to login so fast I barely see any boot logs. systemd-analyze reports the graphical target is reached in < 4 s.

Two things seem to be key here. I don't use a desktop environment. I either boot in text mode (and then startx as needed), or I boot to X with a lightweight login manager (lightdm). The important bit is that skipping the DE reduces the number of services by an order of magnitude; those services put a lot of I/O pressure on old hardware during boot. The booted system is less than 200 MB, even when running X. The second thing that can speed things up is EFI stub: https://wiki.archlinux.org/title/EFISTUB.


> Two things seem to be key here. I don't use a desktop environment. I either boot in text mode (and then startx as needed), or I boot to X with a lightweight login manager (lightdm).

> I'm confused by this statement.

Should you be confused?


My first instinct: can we use some other core? Do we really need Linux to take a photo and transfer it to the cloud?

I'm not a hw person so curious how to complete the task with minimum budget.

Interesting read. Thank you!


My same first thought.

For no reason other than I have a pair of them sitting on my kitchen table right now, I wondered how the ESP32-CAM setup would compare. I think it's only good for 2-megapixel images, but I'd bet both its startup time and its power consumption would be close to an order of magnitude lower. (Here are some details if you're curious: https://components101.com/modules/esp32-cam-camera-module )


I'm always a little perplexed by the world of microcontrollers. How would you program this without having some kind of embedded linux? And where does the OS live in this modules? Or does this sit on a Pi?


In many cases there is no OS, just bare metal. I have dabbled in embedded programming (but never really in hardware) briefly and the process looks like this: you manipulate some pins and they make things work. You read manuals to figure out which pins to use and how to manipulate them to make certain things happen. For example, to make a peripheral work, you first need to connect certain pins (following the manual), then you need to send some black magic signals to these pins to make it work in certain ways (think ROM reading/writing, LCD screen display, etc.). Reading the manual and the data sheets, IMO, is where the real complexity comes from -- and you can always use "standard" components and use a library.

Here is a good textbook: https://web.eece.maine.edu/~zhu/book/

If you need an OS, sometimes an RTOS is considered instead of Linux. Embedded Linux is pretty "heavy" in the embedded world AFAIK.


In the context of the camera modules above, what would you wire the pins into? Surely a Pi, running Linux?


You kinda cheat. There's a bunch of dedicated hardware/firmware that you use to run the camera module. That lets a fairly simple microcontroller and code mostly ignore the electrical details of the camera; it's abstracted away so you can just send commands to the camera via pre-existing libraries, then deal with the data representing an image.

Here's a reasonable overview:

https://lastminuteengineers.com/getting-started-with-esp32-c...


Oh wow, so that tiny chip can run a whole webserver with bluetooth and wifi, programmed purely using arduino. Pretty cool!


Craziest thing about it is, you can get one including the camera for under $10: https://www.aliexpress.com/item/1005006299363624.html


No OS. There's just a setup() function and a main loop() that runs forever.

It's really really fun, at least to my brain.

If you want to see how it works without spending any money, TinkerCAD (https://tinkercad.com) will let you layout, program, and simulate an Arduino. They're somewhat less powerful than the ESP32 CAM proposed to replace this, but it's a good way to "dip your feet" in programming and wiring up microcontrollers.


The problem is that this particular project uses camera and wireless networking, both requiring very non-trivial drivers. It is possible, in principle, to do it on bare metal, but getting the required peripherals working won't be easy.


ESP is a platform that has both though - wireless and camera on the esp32. Those can quickly resume out of a low-power sleep, connect to Wi-Fi, and dump a picture or a series of pictures - I don't know which is more efficient.


Are you talking about the 2MP ESP32-CAM modules? Those things are an order of magnitude worse when it comes to fps and perceptual image quality vs Arducam's offerings for the RPi. Also all sorts of specialized hardware like depth sensing cameras work out of the box with the RPi.

ESP32 can do both wifi and cameras in the same sense that I can run back to back marathons. I just gotta take a couple of naps at hotels along the way.


You can connect the esp32 to a proper camera, you don't just have to use the development board for it.

If you're just taking a picture and uploading it via wifi, you're better off doing it bare metal. It can do everything stated in OP's post. MIPI support isn't available until the ESP32-P4 though.


How can the esp32 be connected to a proper camera? All the RPi cameras are MIPI, so what's an example of a compatible camera? Do you have to debayer and sharpen the image yourself? I've seen a lot of half-assed ISPs and they make good cameras look like crap.


ESP32-P4 has MIPI support, which will support the wider variety of high-end ones, but there are a small selection of SPI cameras.

If you wanted something that's available now, you could use the STM32F4/F7 or the STM32MP1.


Or Pi Pico W, I've used that on a few projects. Nearly instant boot


I'm not sure if you've worked with embedded but everything you just described shows up for free when you compile your first hello world with the platform SDK. All trivial, solved problems.

Take a look at nearly any consumer camera and note that it isn't running linux, or anything like linux.

There's a reason RPi isn't used to build actual consumer products. It's a neat toy for tinkering, handy around the shop and home for a bunch of purposes, but it's also making all the wrong tradeoffs for something you can deploy and support at scale. Nothing in the OP use case requires linux, you can do everything cheaper, faster, and FAR more efficiently on an ESP32 or similar.


I wondered why a custom kernel came so late. If you want to optimize, wouldn't you start with LFS or some source-based distribution? Autonomous software updates don't seem to be a necessity anyway on such a device.

In addition, I wonder if it would be possible to optimize the EFI/BIOS on such a device. At least on my standard Arch Linux desktop, it takes a significant amount of boot time:

  $ systemd-analyze 
  Startup finished in 10.076s (firmware) + 1.339s (loader) + 1.569s (kernel) + 2.974s (initrd) + 3.894s (userspace) = 19.854s


> I wondered why a custom kernel came so late. If you want to optimize, wouldn't you start with LFS or some source-based distribution? Autonomous software updates don't seem to be a necessity anyway on such a device.

Buildroot (which they used) is made exactly for this. With buildroot, you configure your own "Distribution" and generate a single bootable image from it.

> In addition, I wonder if it would be possible to optimize the EFI/BIOS on such a device. At least on my standard Arch Linux desktop, it takes a significant amount of boot time:

Not exactly sure about raspberry pi hardware, but a lot of other embedded SoCs have a pretty minimal bootloader that runs with u-boot, which is typically very fast (at least if you set the delay it waits for user input to 0)


> wouldn't you start with LFS or some source-based distribution

You don't ever want to actually use LFS (the manual from the LFS project) in the real world as compiling GNU is far too much work. A minimalistic kernel + busybox system is much less pain. But Gentoo would not be a bad option too.


christ, you've just shown me that I need to optimize my boot loader (systemd-boot), and how great apparently my firmware is.

    > systemd-analyze
    Startup finished in 3.259s (firmware) + 35.127s (loader) + 1.823s (kernel) + 2.927s (userspace) = 43.138s


3.5s is cool, but if the entire scenario was really connecting to WiFi and uploading an image every couple of minutes, an ESP32 would've been a much better choice for power consumption (unless the camera module you need for the Pi has some specific features that none of the esp32-cam compatible cameras do).


The ESP32 only supports up to 4MB of PSRAM, while a single RPi HQ Camera frame is still ~18MB.


Doesn't the camera have its own framebuffer that the MCU can stream? I don't see why the MCU would have to hold the whole frame in memory.


The library that comes with the ESP32 uses DMA for the image stream. I don't think you can work around that unless you write your own driver.


At least with ESP32-CAM api, the instruction to capture an image returns a pointer to image data in psram.

I would imagine a Pi Zero is more efficient at converting that raw image data to some compressed file format too.
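
For what it's worth, a rough ESP-IDF-style sketch of that flow (capture, hand the buffer off, then deep-sleep until the next shot). Camera and Wi-Fi setup are omitted, and send_jpeg() is a placeholder for however you upload the data:

    #include <stddef.h>
    #include <stdint.h>
    #include "esp_camera.h"   /* esp32-camera component */
    #include "esp_sleep.h"

    /* Placeholder: HTTP POST, raw TCP, MQTT, whatever you use to ship the bytes. */
    extern void send_jpeg(const uint8_t *buf, size_t len);

    void capture_and_sleep(void)
    {
        /* Grab one frame; with PIXFORMAT_JPEG the buffer in PSRAM is already
           compressed, so it stays well below the PSRAM limit. */
        camera_fb_t *fb = esp_camera_fb_get();
        if (fb) {
            send_jpeg(fb->buf, fb->len);
            esp_camera_fb_return(fb);
        }

        /* Deep-sleep for ~5 minutes; on wake the chip resets and runs app_main() again. */
        esp_sleep_enable_timer_wakeup(5ULL * 60 * 1000 * 1000);
        esp_deep_sleep_start();
    }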


The ESP32-P4 will support up to 32MB of PSRAM, MIPI, and hardware H.264 encoding. It'll be a great chip for video.


I might recommend a slightly higher end micro with a mipi csi interface but otherwise agree. This is so much work to do what microcontrollers can do almost effortlessly.


Just stay booted and use a lower power microcontroller … 105mA … that’s not the right order of magnitude


Every single person is reading this with a voice in their head asking "why not use an esp32 et al?"

But of course the article is good enough because it's interesting, even if it's not the right tool for the job.


Answer is simple: speed of development.

I am (sort of) in the same boat as the author. I have a silly little project that takes pictures that I PoC/mocked up with an RPi Zero. It does use a battery, but I am not overly concerned with battery life - however, boot time is a killer. It takes a loooong time to boot and be ready. However, I wrote all the code in an afternoon and it just worked.

I could use a higher-level language (Python) to glue together all the C libraries that are used to talk to my camera, screen, and wifi stack.

I know that I could achieve way better battery life and boot time by switching to an ESP32, which is why I ordered one the day after I got my PoC working. The parts have sat 9 months in my projects bin. I did write the basics of getting the image bytes off of the camera, but after that I couldn't find the motivation to continue. I already have a working version of the thing I wanted to make. It does everything I want already. Why spend a bunch of time figuring out the way more esoteric libraries and actually trying to implement everything (again) in C just to get a bit better battery life and boot time?

Of course in a "production" system it would be very weird and suboptimal to continue using RPi after this point, but for a small one off or even low run project any extra development effort feels like a waste.

I should dig up my project and see if I could flip some config bits and get the boot time lower. I'd be extremely happy with anything sub-10 seconds, and then I could completely bury the ESP32 idea as redundant.


OTOH if you do want to use the ESP32, I find using AI assistance is great to help translate codebases/programs/apps across languages/usecases (or outright generate it).


While I am sure nothing is more optimized and battery-conserving than C code hallucinated by an LLM, I really prefer writing my code myself when I work on my personal projects.


I don't understand what this means exactly, could you please elaborate?


I’m sure there are others, but as mentioned elsewhere in this thread ESP32 or NRF70 could take care of this for a lot less (off of bare metal or RTOS if you just need WiFi and Camera.)


A good reason would be the lack of support for MIPI for the camera.


Why did I always have the impression that decompressing data is much faster than reading uncompressed data off disk? Like, reading just 5MB and decompressing it would take less time than just reading 10MB off a disk, for example. But this article kinda states otherwise.


This is actually something that flip-flops across hardware generations and platforms. Hard drives used to be really slow and consumer machines would often reduce data bandwidth by compressing things because processors were “faster” than disk. But today’s SSDs are actually really fast, sometimes so fast that CPUs can barely keep up just processing the data coming off of them, so the balance can also shift in the opposite direction. And in embedded your storage might be slow but you may not have the processing power to spare decompressing. Or maybe you do and it saves on flash write cycles. This is a complicated topic!
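
A crude way to think about it (ignoring any overlap between reading and decompressing): loading the compressed image wins on time when

    S_c / B_{\mathrm{read}} + S_u / B_{\mathrm{decomp}} \;<\; S_u / B_{\mathrm{read}}

where S_u and S_c are the uncompressed and compressed sizes and B_read, B_decomp the storage and decompression throughputs. On a slow SD card the left side wins easily; on fast NVMe, or when the metric is energy rather than wall-clock time (as in the article), it can flip.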


The article states a "net-positive energy result", not necessarily a faster time for this specific optimization. They say GZIP decompression is energy-intensive, so while the combination of read + decompress may be faster, the CPU load during decompression and memory relocation of the decompressed data ultimately consumes more energy than reading an uncompressed kernel and running it directly.


Might've been the case with "spinning rust" (hard drives) but solid state storage can have lower access and read times -- no need to wait for a disk to spin up or move read heads to the right position on a platter etc


Ok decreasing the regulator voltage was a real surprise! I thought switching regulators would be far more efficient at higher voltages! (Less current = less heat)


Same. I really didn't expect that (and thus didn't even test it at first).


I was thinking that Circle (https://github.com/rsta2/circle) might be faster to boot than a kernel, but it doesn't seem to support MIPI cameras.


Very impressive. I've toyed with using the Pi for an intelligent trail camera. Startup time is critical - a PIR sensor detects an animal passing and you want to be taking photos ASAP so every second counts.

Lowering the power usage is awesome too.


Just use a purpose-built trail cam. Sub-second response, optics optimised for trail conditions, weatherproof, robust, durable, long battery life, designed to be secured to trees.

There's a bit of engineering in a reliable trail cam. (Don't buy no-name Chinese.)


I already have a couple of Bushnells. My desire is to be able to attach a small network of PIR sensors to each camera to get better information.


I wonder if booting the OpenBSD kernel would be faster. Although, the OpenBSD init system is notoriously slow.

Also I feel (but don't know for sure) most of the time before executing the user space program would be spent by systemd.


>Although, the OpenBSD init system is notoriously slow.

Is this on the Pi where it is slow? On my T420 it seems fine, but the re-linking of various daemons does add time. But that is done for security, so I am fine living with it.

Me, I want fast power-down time so I can get out the door fast. And so far NetBSD, OpenBSD and Linux seem to meet that need :)


Don’t these types of systems generally just nuke init entirely and provide a custom PID 1 ?


If you like Rasp Pi ecosystem you might want to try the Pi Pico W, it's similar in spirit to microcontrollers like ESP32 but allows you to use micropython and has a neat set of peripherals that work "out of the box": https://shop.pimoroni.com/products/raspberry-pi-pico-w?varia...


micropython supports esp32 too


Supposedly on the Pi 5, the SoC could be put to sleep while RP1 remains active, and the RP1 has enough compute to handle like 4 or 8 pixels of data from an attached camera... I think RPi might be able to get much better suspend support with their new PMIC and RP1. But so far still waiting to see something handy like Wake on LAN support native in Pi OS.


> Supposedly on the Pi 5, the SoC could be put to sleep while RP1 remains active [..]

You mean "echo +60 > /sys/class/rtc/rtc0/wakealarm && halt"?


It is such a shame the RPi Zero2 does not support "traditional" sleep modes like the ESP32 for example - which is why we have to optimize the Linux boot process. https://forums.raspberrypi.com/viewtopic.php?t=243719


Assuming it stays up for about 10-15s this is a saving over staying idle of around 85%, based on the idle burn rate from toms hardware. Not bad at all!


Is the Pi connected to the network with a static IP? Getting a fresh one from DHCP can, in this context, take quite a bit of time and energy.


Three seconds? A purpose built trail cam is considered slow if it takes 0.7 seconds to boot up and take a picture.

0.15 s is the going rate these days.


Ehm, instead of spending like weeks on this, why not use hardware that is meant for such applications, like an ESP32?


According to the back of my jar of mayonnaise there is 8400kJ stored in it, enough energy to power this rpi for ~62 days. This is probably a stupid question but just out of curiosity, why do people express electrical energy in watt-seconds or watt-hours instead of joules? Unless school physics has deserted me completely, 1 Ws = 1 J, no?


Because no one talks about units of Ws. It's all Wh or kWh. Or they talk about power instead of energy.


Well you didn't read the article because the comparisons of the energy consumed are listed in Ws (Watt Seconds)

e.g.

> We can now boot into a Linux user space program in less than 3.5s!

>~400ms is spent in the Linux kernel (difference between pin 0 and pin 1)

>Total energy consumption: 0.364 As * 5.0 V = 1.82 Ws
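
And since 1 Ws = 1 J, the two numbers combine directly (just arithmetic on the figures already quoted):

    8400\,\mathrm{kJ} / 1.82\,\mathrm{Ws} = 8.4 \times 10^6\,\mathrm{J} / 1.82\,\mathrm{J} \approx 4.6\ \text{million boot cycles per jar}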


Why not just boot baremetal, something like rsta2/circle?


Was expecting to see different governors tested.

Is that not a thing on a pi?


I'm fairly sure it is; I recall playing with this when I first got my Raspi.


Lovely project


You really should look into using the right hardware for the purpose instead. (Disclaimer - I despise Raspberry and their overpriced closed devices, and also HN maniacally trying to use them for stuff they are the wrong choice for.)


What sort of hardware would you recommend for this use case?


A bunch of other SoC manufacturers have working system sleep implementations, either manufacturer-supported or community-supported. Never mind much faster boot options (like hundreds of ms to kernel).

So you just need to pick one that also has whatever camera interface you need supported. Say any RK3399 based board can be made to boot to simple userspace in 1-2s and have working upstream camera MIPI-CSI drivers and ISP. System sleep is ~300mW so ~60mA@5V. Pick one with wifi onboard if you need that. And it's all opensource software, no binary crap that can't be optimized.


Literally any ESP32 would be better suited. There is zero reason to be booting an entire OS to take a picture and blip it over WiFi.


ESP32 is great, but it simply can't work with the IMX477 camera used in this project. This camera has resolution of 4072x3176, or about 12M pixels, which is way above what any ESP32 can handle.


I can imagine the following should be doable (with the assumption that the IMX477 has its own buffer and doesn't DMA directly): 1) take a picture, 2) read some lines, 3) stream them via WiFi to some server, 4) repeat 2-3 until the whole picture is read, 5) reconstruct the picture from slices on the server side.


The sensor doesn’t have a framebuffer (because it’s just a sensor) and the RPi HQ cam is basically just a sensor on a board with some mipi connectors. You might be able to buy a package with an IMX477 sensor and some microcontroller/FPGA and frame buffer ram, but that would cost a lot more.


But the "community"!!! :D





