GGO's comments

I have seen variations of this question so many times that I am surprised HN does not auto-delete AHA posts.

S3 on us-east-2 stabilized for us as of 2 minutes ago


Is there going to be a D0 stepping for the 8GB version?


I don't see why there wouldn't be, it's cheaper to manufacture with seemingly no downsides. They probably won't revise the 4GB and 8GB versions until their stocks of the original stepping are used up though, and once they do introduce revised versions it may be a lottery which version you get for a while.


I'll bet they just slipstream them in. There was a huge backlog of the 8GB, which now looks pretty much cleared out, so it could be a while before the D0 shows up.


TSMC charged just a hair under $4000 per 16nm wafer in 2020.

Wafer calculators at 0.2 defects/cm² on a 300mm wafer give 950 fully-good dies out of 1061 for the old die (~89% good) and 1469 out of 1584 (~93%) for the new die.

Dividing that out gives $4.21/chip for the old chip and $2.72/chip for the new chip. At $80 for an 8GB board, that represents a ~1.9% increase in profit per board. For the $60 4GB version, it's more like a 2.5% increase per board.

In real-world terms, if they sell 10M Pi5 units with the new chip, they'll have an extra $15M in the bank in saved production costs alone (minus whatever costs to strip everything out and tape out again). Furthermore, the new chip gets cheaper with every chip they make as the R&D costs get more and more diluted.
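For anyone who wants to sanity-check the arithmetic, here is the back-of-the-envelope math written out as a query (all figures are the estimates from above, not insider numbers):

    -- rough per-chip cost from wafer price / good dies, and margin gain per board
    SELECT 4000.0 / 950  AS old_cost_per_die,                     -- ≈ $4.21
           4000.0 / 1469 AS new_cost_per_die,                     -- ≈ $2.72
           (4000.0 / 950 - 4000.0 / 1469) / 80 AS gain_8gb_board, -- ≈ 1.9%
           (4000.0 / 950 - 4000.0 / 1469) / 60 AS gain_4gb_board; -- ≈ 2.5%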


I am happy to see this example project is using Vue instead of React. It is a great framework and it deserves more visibility.


Thanks! Yeah, I really love Vue and how simple it feels to work with.


I don't understand the recommendation of using bigserial with a UUID column when you can use UUIDv7. I get that it made sense years ago when there was no UUIDv7, but why people keep recommending it over UUIDv7 now beats me.


As UUIDv7 IDs embed time information linked to the record, they can help bad actors with timing attacks or pattern recognition.

You can infer how much time elapsed between two UUIDv7 IDs.

They can only be used if they're not shown to the user (so not in the form mysite.com/mypage?id=0190854d-7f9f-78fc-b9bc-598867ebf39a).

A bigserial starting at a high number doesn't provide that time information.
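To make the leak concrete: the first 48 bits of a UUIDv7 are the unix timestamp in milliseconds, so anyone holding the ID from the URL above can recover the record's creation time. A sketch in Postgres syntax:

    -- first 12 hex digits = unix ms; this yields roughly 2024-07 for the example ID
    SELECT to_timestamp(
             ('x' || replace(left('0190854d-7f9f-78fc-b9bc-598867ebf39a', 13), '-', ''))::bit(48)::bigint
             / 1000.0
           );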


Bigserial is sequential and it's very easy to guess the next number, so you get the problem of sequential-key enumeration attacks…

If you use only the UUID in your outward-facing API, then you still have the problem of slow queries, since you need it to find the object (as mentioned below).

UUIDv7 has a random part, can be created distributedly, and indexes well.

It's the best choice for modern applications that support distributed data creation.
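For reference, on Postgres versions without native UUIDv7 support you can generate one in plain SQL. This is a sketch along the lines of snippets floating around, not production-vetted code: overwrite the first 48 bits of a random v4 with the unix-ms timestamp, then flip the version nibble to 7.

    -- assumes gen_random_uuid() is available (PG13+, or the pgcrypto extension)
    CREATE FUNCTION uuid_generate_v7() RETURNS uuid AS $$
      -- replace the first 6 bytes with the timestamp; setting bits 52-53
      -- turns the v4 version nibble (0100) into v7 (0111)
      SELECT encode(
               set_bit(
                 set_bit(
                   overlay(uuid_send(gen_random_uuid())
                           placing substring(int8send((extract(epoch FROM clock_timestamp()) * 1000)::bigint) FROM 3)
                           FROM 1 FOR 6),
                   52, 1),
                 53, 1),
               'hex')::uuid;
    $$ LANGUAGE sql VOLATILE;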


Sequential numbers cannot be used publicly.

Also, security can be built around not allowing querying records which are not yours.

I'm all for a little security through obscurity, including UUIDs, but it shouldn't be the sole thing. It's easier to generate a UUID for the sequential row and let the database do what it does best (relate many serials among each other).

The other part is being able to use what's built into the database out of the box without a lot more configuration.

Selfishly, I always appreciate learning more about Postgres though :)


You never expose the bigserial; you generate an ID (like a UUID) for external use/identification and simply have an index over that column for fast selects.
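A minimal sketch of that layout (Postgres, table and column names made up): the bigserial stays internal for joins and batching, and the UUID is the only thing that crosses the API boundary.

    CREATE TABLE account (
        id         bigserial PRIMARY KEY,                          -- internal only, never exposed
        public_id  uuid NOT NULL DEFAULT gen_random_uuid() UNIQUE, -- external identifier
        created_at timestamptz NOT NULL DEFAULT now()
    );

    -- the UNIQUE constraint already provides the index for fast external lookups:
    -- SELECT id FROM account WHERE public_id = $1;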


If you have an index on the UUID anyway, having a separate bigserial field for the PK doesn't help that much.


As mentioned elsewhere, it ensures the ability to perform resumable and consistent batching queries across the data set without missing records.

Ordering over an insertion timestamp is not enough if two records may have the same timestamp: You may miss a record (or visit a record twice) across multiple queries.
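Concretely, the batching this enables looks something like the sketch below (names assumed). Each batch resumes exactly where the previous one ended, and because the PK only grows, no committed row can slip in behind the cursor.

    -- resumable keyset batching over a monotonic bigserial PK
    SELECT *
    FROM account
    WHERE id > :last_seen_id   -- cursor: max id seen in the previous batch
    ORDER BY id
    LIMIT 1000;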


This is solved by sorting by timestamp first, then by the random PK UUID. I don't think slightly simpler batch queries justify leaking time and quantity information, or the complexity of handling two types of IDs.
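For completeness, that (timestamp, random-UUID) ordering would be a row-wise comparison like this sketch; Postgres tuple comparisons can use a matching composite index on (created_at, id).

    -- resumable batching ordered by (created_at, id)
    SELECT *
    FROM account
    WHERE (created_at, id) > (:last_ts, :last_uuid)
    ORDER BY created_at, id
    LIMIT 1000;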


You wouldn't expose the numeric IDs publicly, and ideally you'd use your database's automatic ID selection to avoid any complexity.

The UUID sorting works in the common case, but if you happen to end your batch near the current time, you still run the risk of losing a few records if the insert frequency is sufficiently high. Admittedly this is only a problem when you are batching through all the way to current insertions.


I agree with not baking more intelligence into a piece of data than needed, especially an index.


Having an index over the uuid is equivalent to it being a PK, so why would you bother having both?


Because it's much better for range queries and joins. When you inevitably need to take a snapshot of the table or migrate the schema somehow you'll be wishing you had something else other than a UUID as the PK.


This. Highly recommend using a numeric primary key + UUID. Using UUID relations internally can have some strategic advantages, but when UUIDv4 is used as the only primary key, you completely lose the ability to reliably iterate all records across multiple independent queries.

Also, the external thing isn't just for exposing it out to your own apps via APIs, but way more importantly for providing an unmistakable ID to store within external related systems. For example, in your Stripe metadata.

Doing this ensures that ID either exists in your own database or does not, regardless of database rollbacks, database inconsistencies etc. In those situations a numeric ID is a big question mark: Does this record correspond with the external system or was there a reuse of that ID?

I've been burnt taking over poorly managed systems that saved numeric IDs externally, and in trying to heal and migrate that data, ran into tons of problems because of ill-considered rollbacks of the database. At least after I leave the systems I build won't be subtly broken by such bad practices in the future.


Ha? Please elaborate.


When running a batched migration, it is important to batch using a strictly monotonic field so that new rows won't get inserted into an already-processed range.


It doesn't even necessarily need to be strictly monotonic, though that part does help, as you don't need to skip rows.

For me the bigger thing is the randomness. A UUID being random for a given row means the opposite is true: any given index entry points to a completely random heap entry.

When backfilling this leads to massive write amplification. Consider a table with rows taking up 40 bytes, so roughly 200 entries per page. If I backfill 1k rows sorted by the id, then under normal circumstances I'd expect to update 6-7 pages, which is ~50 KiB of heap writes.

Whereas if I do that sort of backfill with a UUID, I'd expect to hit a different page for nearly every row. That means 1k rows backfilled is going to be around 8MB of writes to the heap.
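In other words, the locality win comes from walking the backfill in PK order, roughly like this sketch (names hypothetical), so consecutive rows land on the same few heap pages:

    -- backfill by id range: each batch touches a handful of adjacent pages
    UPDATE items
    SET new_col = old_col                 -- stand-in for the real backfill expression
    WHERE id > :last_id AND id <= :last_id + 1000;
    -- (a range batch may process fewer than 1000 rows if the sequence has holes)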


Isn't that solved because UUIDv7 can be ordered by time?


Yeah, pretty much, although plain integer IDs can still be a little better. The big problem for us is that we need the security of UUIDs not leaking information, and so v7 isn't appropriate.

We do use a custom UUID generator that uses the timestamp as a prefix which rotates on a medium-term scale. That ensures we get some degree of clustering for records based on insertion time, but you can't go backwards to figure out the actual time. It's still a problem when backfilling, and is more about helping with live reads.


Are page misses still a thing in the age of SSDs?


Strictly monotonic fields are quite expensive and the bigserial PK alone won't give you that.


PG bigserial is already strictly monotonic


No, they're not, even with a `cache` value of 1. Sequence values are issued at insert time rather than commit time. A transaction that commits later (which is what makes its inserts visible) can have an earlier value than a previous transaction.

This is problematic if you try to depend on the ordering. Nothing is stopping some batch process that started an hour ago from committing a value 100k lower than where you thought the sequence was at. That's an extreme example but the consideration is the same when dealing with millisecond timeframes.
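A quick sketch of the failure mode, as two interleaved Postgres sessions:

    -- session A:
    BEGIN;
    INSERT INTO t DEFAULT VALUES;   -- sequence hands out id = 100
    -- ...A stalls here for a while...

    -- session B, meanwhile:
    BEGIN;
    INSERT INTO t DEFAULT VALUES;   -- sequence hands out id = 101
    COMMIT;                         -- id 101 becomes visible first

    -- session A, later:
    COMMIT;                         -- id 100 appears *after* 101 was already readable

A reader batching by id that already saw 101 will never go back for 100.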


Okay, but in a live DB, typically you won't have only inserts while migrating, will you?


Yes, but updates are covered by updated app code


Would creation/last-modified timestamps cover this requirement?


Yes, although timestamps may have collisions depending on resolution and traffic, no? Bigserials (at least in PG) are strictly monotonic (with holes).


Amen (or similar)


I don’t understand how that’s an issue. Do you have an example of a possible attack using UUIDv7 timestamp? Is there evidence of this being a real security flaw?


The draft spec for uuid v7 has details about the security considerations : https://www.ietf.org/archive/id/draft-peabody-dispatch-new-u...

The way I see it is that uuid v7 in itself is great for some use but not for all uses.

You have to remember that a v7 always carries the ID's creation time as metadata, whether you want it or not. And if you let external users get the v7, they can get that metadata.

I'm not a security expert but I know enough to know that you should only give the minimal data to a user.

My only guess is that with v7 being so new, attacks aren't widespread for now. And I know why the author decided not to focus on whether UUID is the right format for a key, because the answer is no 99% of the time.


That just seems overly cautious. I’d rather use UUIDv7 unless I have a reason not to. The convenience of sortable ids and increased index locality are very much worth the security issues associated with UUIDv7. Maybe I wouldn’t use UUIDv7 for tokens or stuff like that, but DB IDs seem pretty safe.


I don't get it either. If UUIDv7 lacks security due to its revelation of a timestamp, why don't bigserials also lack security? After all, given a bigserial ID, you can tell whether it was generated before or after some other bigserial ID and thereby infer something about the time it was generated.


BigSerials come from a relational database world for me.

The use of UUIDs for documents may come from a no-sql background.

I use bigserials for relational data in relational databases, and if there is a unique document value needed, a UUID is good.


I don’t understand this thinking. If you understand what’s at play, you can infer the potential security implications. What you’re advocating for is being entirely reactive instead of also being proactive.


No, I don't. Even with a timestamp, UUIDs are not enumerable, and honestly I don't care that the timestamp they were created at is public. Is the version of the UUID being part of the UUID considered a leak too?


For almost all use cases just showing a UUIDv7 or sequential ID is fine. There are a few exceptions, but it's not the common case.


How would it be fine, e.g. for e-commerce, which is arguably a very large portion of the use cases?

With a sequential ID, you would be immediately leaking how many orders a day your business is getting.


> You would be immediately leaking how many orders a day your business is getting with sequential id.

Which is fine for almost all of them. All brick-and-mortar stores "leak" this too; it's really not that hard to guess the number of orders for most businesses, and it's not really a problem for the overwhelming majority.

And "Hi, this is Martin, I'd like to ask a question about order 2bf8aa01-6f4e-42ae-8635-9648f70a9a05" doesn't really work. Neither does "John, did you already pay order 2bf8aa01-6f4e-42ae-8635-9648f70a9a05" or "Alice, isn't 2bf8aa01-6f4e-42ae-8635-9648f70a9a05 the same as what we ordered with 7bb027c3-83ea-481a-bb1e-861be18d21ea?"

Especially for order IDs, UUIDs are a huge PITA because unlike user IDs and other more "internal" IDs, people can and do want to talk about them. You will need some secondary human-friendly unique ID regardless (possibly obfuscated, if you really want to), and if you have that, then why bother giving UUIDs to people?


The best solution is to have a serial identifier internally and a generated ID for external use. And yes, it shouldn't be a UUID, as they are user-hostile; it should be something like 6-10 letters and digits.
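As a sketch, a short external ID from an unambiguous alphabet (no I/O/0/1) can even be generated in the database; this is illustrative only, and you'd still want a UNIQUE constraint plus retry on collision:

    -- 8 chars from a 32-symbol alphabet ≈ 40 bits; random() is not crypto-grade,
    -- so for real use derive the characters from pgcrypto's gen_random_bytes()
    SELECT string_agg(
             substr('ABCDEFGHJKLMNPQRSTUVWXYZ23456789',
                    floor(random() * 32)::int + 1, 1),
             '')
    FROM generate_series(1, 8);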


There are jurisdictions, e.g. Germany, in which a consecutive sequence for invoice numbers is a mandatory, legislated requirement (mercifully, gaps are generally permitted, with caveats).

For extra spice, in some places this is legislated as a per-seller sequence, and in others as a per-customer sequence, so there’s no policy you can apply globally, and this once again highlights the separation of concerns between a primary key and a record locator/identifier.
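Where a truly gapless sequence is required, the usual pattern is a counter row bumped in the same transaction that inserts the invoice (names here are assumed): the row lock serializes issuers, and a rollback takes the counter back with it, so no gap appears.

    -- one counter row per sequence (per seller, per customer, per prefix, ...)
    UPDATE invoice_counter
    SET    last_no = last_no + 1
    WHERE  prefix = 'B'
    RETURNING last_no;  -- use the returned number for the invoice in this same transaction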


> consecutive sequence for invoice numbers is a mandatory, legislated requirement (mercifully, gaps are generally permitted, with caveats)

That’s surprising. In Denmark gaps are not allowed. You have to account for all invoices and if you have an invoice numbered 50, then you have at least 50 invoices to account for.


It's nice, when you change invoicing software, to be able to have gaps. For example, invoices from before Stripe are <500 and Stripe invoices are >500. This makes it simple for humans to determine where an invoice may be located during the transition year. Further, it means we can plan the entire switch-over in advance, vs. only knowing the invoice number AFTER the switch-over. That makes a huge difference in internal communications with customer support, because you can let them know how things will operate once the switch is done. If you can't have gaps, you won't know how to tell customer support where to find new/old invoices until after the switch.


In the Netherlands gaps aren't allowed either, and I'm surprised that they are elsewhere, as that allows you to get rid of unwanted invoices whenever you want.

However you are allowed to have multiple sequences, differentiated through a prefix, but all starting at 0. That’s what we recently did to switch invoice generation tools (we actually still run both of them alongside each other atm).

Of course you could still drop some invoices from the end when you do this, but I guess tax authorities accept that risk.


The prefixes have to be in order, though. You cannot start a prefix with A after already using a prefix starting with B.


> There are jurisdictions e.g. Germany in which a consecutive sequence for invoice numbers is a mandatory

Same in France. I thought it was a pretty common requirement.


Can I ask (as a humble application developer, not a backend/database person), if the two requirements are:

1. The UUIDs should be ordered internally, for B-tree performance

2. The UUIDs should not be ordered externally, for security reasons

Why not use encryption? The unencrypted ID is a sequential id, but as soon as it leaves the database, it's always encrypted. Like, when getting it out:

    SELECT encrypt(id) FROM table WHERE something = whatever;
and when putting stuff in:

    UPDATE table SET something = whatever WHERE id = decrypt(<encrypted-key>)
Seems like the best of both worlds, and you don't need to store separate things.
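For what it's worth, something close to this can be sketched with pgcrypto's raw encrypt()/decrypt(): an 8-byte bigint pads out to one 16-byte AES block, so the external ID ends up UUID-sized. Key management is hand-waved here; this is an illustration, not a vetted design.

    CREATE EXTENSION IF NOT EXISTS pgcrypto;

    -- outbound: bigint -> 32 hex chars (:key must be a 16/24/32-byte AES key)
    SELECT encode(encrypt(int8send(id), :key::bytea, 'aes'), 'hex') AS external_id
    FROM account;

    -- inbound: 32 hex chars -> bigint
    SELECT ('x' || encode(decrypt(decode(:external_id, 'hex'), :key::bytea, 'aes'), 'hex'))::bit(64)::bigint;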


>Why not use encryption?

Because then you have a key management problem, which adds complexity.


If the key and encryption mechanism are ever leaked, those opaque external IDs can be converted easily back to sequence numbers, and vice versa, which might pose a risk for you or your users. And you won't be able to rotate the encryption key without breaking everything external that tracks those encrypted IDs: third-party services, SEO, user bookmarks, etc.


You store the key in the database, right? Like, if the database leaks, it doesn't matter if your IDs are sequenced or unsequenced, because all the data has leaked anyway. The key leaking doesn't seem like a realistic security issue.


Ideally if you do this, you store the key in a separate schema with proper roles so that you can call encrypt() with the database role, which can't select the key. Even then, the decrypted metadata should not be particularly sensitive - and should immutably reference a point in time so you can validate against some known key revocation retroactively.

My take is that it's rarely necessary for a token that you give to an external entity to have any embedded metadata at all - 99.9% of apps aren't operating at a scale where even a million-key hashmap sitting in RAM and syncing changes to disk on update would cause any performance difference.


This is a very weird thread: half the people are arguing that having these timestamps is not a realistic security problem at all, and the other half is arguing that any fix to it has to have Fort Knox level security policies.

It seems to me: the actual value of knowing these ids/timestamps to a hacker is tiny, but it's not nothing (German tank problem and all that). Like, if a hacker was able to decode the timestamps, it's not ideal, but it's not like a catastrophe either (especially given that half the people in this thread thinks it has no security value at all). Given that threat model, a simple scheme like I suggested seems fine to me.


> The key leaking doesn’t seem like a realistic security issue.

But it is.

If you have a password in a system, you want to rotate it regularly or at least have that ability (for example, when angry colleague leaves).


> As uuid v7 hold time information, they can help bad actors for timing attacks or pattern recognition because they contain a time information linked to the record.

Are you then not doing security by randomness if that is the thing that worries you?


The comment above warns against it because the embedded timestamp is an info-leak risk. Perhaps that was a problem for them in some circumstance.


It wasn’t a problem for me directly but was observed and related by a colleague: an identifier for an acquired entity embedded the record’s creation timestamp and effectively leaked the date of acquisition despite it being commercial-in-confidence information. Cue post-M&A ruckus at board level.

Just goes to show you can’t inadvertently disclose anything these days.


You're saving storage space but potentially leaking details. Is that ok for your application? No one can answer but your org.


The details part is so minuscule that I doubt it even matters. You'd have a difficult time trying to enumerate UUIDv7s anyway.


Leaking time leaks information about customer growth and usage. It may matter to your competitors.


64bit coming soon


2 different machines


Why wait this long (until Sept 2024) to open-source the code?


Winamp has a ton of proprietary licensed library code (codecs, Gracenote API, etc) that all has to be replaced with open source equivalents before the code can be released. I believe the skeleton crew that they had working on maintaining Winamp a few years ago started on some of this work, but I'm assuming that the whole codebase needs to be audited to make sure that they're legally in the clear.


Note that they don't actually say "open source" anywhere.


It's also weird that the timestamp on the press release is "Dec 16, 1".

There are 5 press releases total on that site, 2 from 2023, 2 from 2024, and this one from year "1". It just seems very strange.


You can be required to keep logs, so they need to design a system that cannot collect logs. You cannot share what you do not have.


I’d be more interested in a system that can prove to me that it’s not collecting logs. Hard, but not impossible.


As long as we are talking about classical communication (and not quantum) it is impossible to prove that it isn't collecting at least ciphertext logs.


Consider a certified tamper-resistant operating system which cryptographically certifies the versions of software it operates, and prohibits uncertified processes from running. The certificate of authenticity verifying the software is made available to the clients which connect to the remote application. This cert specifies all of the program transforms which were required in order to produce the compiled software, and they specify the capabilities required for the transform.

It is certainly a very hard and complex problem but I wouldn’t necessarily go as far as “impossible”. Maybe you know something I don’t know, though.


> Consider a certified tamper-resistant operating system which cryptographically certifies the versions of software it operates, and prohibits uncertified processes from running.

If I own the hardware, I can decide how the software is executed, including containerizing your certification processes to make them feel warm and fuzzy and happy but in reality they are running inside a simulation.

If push comes to shove I could theoretically manufacture my own RAM sticks that copy everything and your OS wouldn't even know, but there's a 99% chance I could successfully pull it off at the kernel virtualization level.


Not really. Tor, I2P, and Monero manage this just fine. Building on these technologies should allow one to have privacy and anonymity without any exotic quantum technology.


Well, they don't actually. Tor especially has enormous numbers of government nodes, so they can trace and log exactly what and who. And all of those still rely on the IP network, which will always allow logging without you ever knowing. It's just math, really: proving you're not being logged is impossible.


Interesting, do you have a source? All fully P2P networks are vulnerable to Sybil attacks to some extent, but I mean specifically a source showing that Tor actually has enough "government nodes" to de-anonymize everything.


These technologies give privacy and anonymity under normal conditions, but they do not prevent anyone from logging ciphertexts. If someone has logged ciphertext, and the government subpoenas someone to divulge their private key and subpoenas whoever has the ciphertext, those ciphertexts are as good as plaintext.


I mean, I don't think anyone really expects that encrypted messages are necessarily secure in the context of stolen private keys. I assume that a lot of encrypted traffic is either recorded at the ISP/backbone level or at least can be on demand.


I buy hard drives based on these reports. Thank you Backblaze.


Where do you buy your drives? Last time I was in the market, I couldn't find a reputable seller selling the exact models in the report. I'm afraid that the less reputable sellers (random 3rd party sellers on Amazon) are selling refurbished drives.

I ended up buying a similar sounding but not same model from CDW.


These are useful data points, but I've found that at my risk tolerance level, I get a lot more TB/$ buying refurbished drives. Amazon has a couple of sellers that specialize in server pulls from datacenters; even after 3 years of minimal use, the vendors provide 5 years of additional warranty.


> even after 3 years of minimal use, the vendors provide 5 years of additional warranty to you.

The Amazon refurb drives (in this class) typically come with 40k-43k hours of data center use. Generally they're well used for 4½-5yrs. Price is ~30% of new.

I think refurb DC drives have their place (replaceable data). I've bought them - but I followed other buyers' steps to maximize my odds.

I chose my model (of HGST) carefully, put it through an intensive 24h test, and checked the SMART stats afterward.

As far as the 5-year warranty goes, it's from the seller, and they don't all stick around for 5 years. But they are around for a while, so heavily test that drive after purchase.


Buying refurbished also makes it much easier to avoid having the same brand/model/batch/uptime, for firmware and hardware issues. I do carefully test for bad sectors and verify capacity, just in case.


I think you're better off buying used and using the savings for either mirroring or off-site backup. I'd take two mirrored used drives from different vendors over one new drive any day.


There was a Backblaze report a while ago that said, essentially, that most individual drives are either immediate lemons or run to warranty.

If you buy used, you're avoiding the first form of failure.


Indeed: RAID used to stand for Redundant Array of Inexpensive Disks. The point was to throw a bunch of disks together, and with redundancy it didn't matter how unreliable they were. Using blingy drives with RAID feels counter-intuitive, at least as a hobbyist.


A lot of those resellers do not disclose that the drive isn't new, even labeling the item as new.

GoHardDrive is notorious for selling "new" drives with years of power-on time. Neither Newegg nor Amazon seems to do anything about those sellers.


Any specific sellers you'd recommend?


Refurbed drives have a MUCH HIGHER failure rate. I used to send back lots of drives to Seagate; they come back with the service sticker, and that means trouble. YMMV.


These generally aren't refurbed drives, they are used drives that sat in a datacenter for 3-5 years.


In Europe, LambdaTek is my go-to for enterprise hardware as a retail customer.


Lots of good options here: https://diskprices.com/



Note that they list at least one vendor as selling "New" drives when they are not even close to being new.


It's definitely scraped with a few simple queries and not moderated by a human; you have to manually check before buying, of course. It just saves a few minutes by automating the initial search.


I think there will eventually be a false advertising lawsuit or some regulatory action against Amazon about this. Until that happens, it’s hard to say for certain which items are used.


And for stuff like this, many companies will have an approved vendor, and you have to buy what they offer or go through a justification for an exception.


B&H has quite a few


I guess it isn’t that surprising given the path the development took, but it is always funny to me that one of the most reputable consumer tech companies is a photography place.


Similar to how the most popular online retailer is a bookstore. Successful businesses are able to expand and I wish B&H the best of luck on that path, we need more companies like them.


I'd rather companies stick to one thing and do it well, rather than expand into every industry out there and slowly creep into every facet of society.

Like that bookstore that just happens to retail some stuff too.


B&H seems to be pretty focused on techy things (and cameras of all sorts have always been techy things, though that corner of the tech market has been declining for a long time now).

When they branch out to selling everything including fresh vegetables, motor oil, and computing services, then maybe they might be more comparable to the overgrown bookstore.


I definitely lean towards B&H for electronic things. It's quite a bit less "internet flea market" than Amazon often is.


There used to be a much more distinct market for cameras, and all the ancillary gear and consumables, than there is now. Though B&H still sells a ton of lighting and audio gear, as well as printers and consumables for same.

They sell other stuff too but they’re still pretty photo and video-centric, laptops notwithstanding.


AWS alone is a Fortune 500 company.


I buy most, but not all, of my tech at B&H and have now for more than a decade. Especially peripherals.


What's the risk of buying from Amazon and running a SMART/CrystalDiskInfo check?


I don’t buy hard drives based on these reports. I buy SSDs and let my cloud providers deal with hard drives.

