The Hitchhiker's Guide was a personality-defining read for me when I was younger.
I also recommend this video:
Parrots, the Universe and Everything
https://www.youtube.com/watch?v=_ZG8HBuDjgc
I recommend 'The Cyberiad' by Stanislaw Lem. Get the Michael Kandel translation.
As an Adams fan since high school, I was floored when I eventually read The Cyberiad and realized that Lem had laid all of the groundwork fourteen years earlier. It's very much the proto-Hitchhiker's Guide. It's got it all: intergalactic protagonists on a series of highly absurd adventures, enabled by fantastical tech, and a playful approach to themes at the intersection of technology, philosophy, and contemporary physics.
It is laugh-out-loud funny, especially once you get into the First, Second, and Third Sallies. The humor is fiendishly clever. Lem is incredibly punny, and it blows my mind that it was translated from Polish(!). I hope Michael Kandel gets his due for keeping the spirit of this book intact, because it really hinges on some very clever use of language.
> Just this week I was looking for more humour writing in the vein of Douglas Adams and PG Wodehouse.
They are both masters of producing an absolutely perfect phrase that could have come from no-one but them—so's Pterry, by the way—but otherwise it'd never occur to me to lump them together. They seem radically different tonally to me.
Culture is mostly generational habits, lessons, hacks, etc. Generations of poverty and scarcity can scar people into adopting certain habits as a way to survive, and those habits can fade once the causal conditions go away and others take their place.
I love gevent, but I was never 100% sure that nothing was secretly breaking or hitting some weird thread-safety issue. In a large SaaS app, all sorts of third-party libs do weird background threading stuff, or someone randomly starts using `threading.local` and shared global context. After hitting some weird hanging redis-py client issues, I turned gevent off and the problem went away. I never really got around to spending time debugging it (especially since it happened in prod and was hard to replicate on stage/local).
Does your app have a lot of dependencies that run background threads? Like LaunchDarkly (feature flags), Redis, spyne (RPC), and on and on.
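One quick way to answer that for your own app is to list the threads that are alive after startup. This is a minimal sketch using only the standard library's `threading.enumerate()`; the helper name `background_threads` and the `example-poller` thread are hypothetical. The output may look different once gevent has monkey-patched `threading`, so treat it as a rough inventory rather than a complete picture.

```python
import threading


def background_threads():
    """Return (name, daemon, class name) for every live thread except the main one.

    A rough inventory of which libraries started their own threads
    (SDK pollers, connection-pool reapers, metrics flushers, ...).
    """
    main = threading.main_thread()
    return [
        (t.name, t.daemon, type(t).__name__)
        for t in threading.enumerate()
        if t is not main
    ]


if __name__ == "__main__":
    # Hypothetical stand-in for a library's background poller.
    stop = threading.Event()
    poller = threading.Thread(target=stop.wait, name="example-poller", daemon=True)
    poller.start()
    for name, daemon, kind in background_threads():
        print(f"{name}: daemon={daemon} class={kind}")
    stop.set()
    poller.join()
```

Calling it right after the app has finished booting (and again after a few requests) shows which dependencies spin up threads lazily.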
We also heavily use gevent but this is indeed the greatest frustration. Random and difficult to diagnose issues in external libraries like sockets being closed prematurely or timing out.
I've had success with a few strategies for introducing abstractions/patterns at my current job (doing this alone for an enterprise SaaS company with ~200 devs). It's odd that, as far as I know, we don't teach or talk about these in software engineering; I see them being reinvented all the time.
To borrow from medicine: the first step is always to stop the hemorrhage, then clean the wound, and then protect the wound (or wounds).
- Add a deprecation marker. In Python this can be a decorator, a context manager, or even a magic comment string. Ideally I add it while first introducing the new pattern; it makes searching easier next time.
- Create a linter with an escape hatch. If you can statically analyze or type-hint your way there, great! In Python I write AST-based, semgrep, or other custom checks to catch these, but provide a magic comment similar to `# noqa` or `# type: ignore` to exempt existing code. That gives you a way to track and fix the offending places, and you can turn it into a metric.
- Everything in the system has to have an owner (person, squad, team, or department). Endpoints have owners, async tasks have owners, Kafka consumers might have owners, test cases might have owners. That way, when anything fails, you can surface it on the corresponding owner's SLO dashboard.
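The first two steps can be sketched together in a few lines of Python. This is a minimal illustration, not a drop-in tool: `deprecated`, `find_deprecated_calls`, `escape_hatch_count`, `old_client`, and the `noqa: deprecated` marker are all hypothetical names. On Python 3.13+, the standard library's `warnings.deprecated` (PEP 702) is an alternative to a hand-rolled decorator.

```python
import ast
import functools
import warnings

# Hypothetical escape-hatch comment; pick whatever string your team greps for.
ESCAPE_HATCH = "noqa: deprecated"


def deprecated(reason):
    """Deprecation marker: calling the wrapped function emits a DeprecationWarning."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(f"{func.__qualname__} is deprecated: {reason}",
                          DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)
        return wrapper
    return decorator


def find_deprecated_calls(source, deprecated_names):
    """Linter: return sorted (line, name) pairs for calls to deprecated names,
    skipping calls whose first line carries the escape-hatch comment."""
    lines = source.splitlines()
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name):          # old_client()
            name = func.id
        elif isinstance(func, ast.Attribute):   # svc.old_client()
            name = func.attr
        else:
            continue
        if name in deprecated_names and ESCAPE_HATCH not in lines[node.lineno - 1]:
            hits.append((node.lineno, name))
    return sorted(hits)


def escape_hatch_count(source):
    """The metric: how many call sites are still grandfathered in."""
    return sum(ESCAPE_HATCH in line for line in source.splitlines())


sample = "old_client()\nclient = old_client()  # noqa: deprecated\n"
print(find_deprecated_calls(sample, {"old_client"}))  # [(1, 'old_client')]
print(escape_hatch_count(sample))  # 1
```

New violations fail the lint, while existing ones stay visible through the escape-hatch count, which should only ever go down.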
An alternative to the ownership step is, where possible, to have a platform squad take over and do the refactor at zero cost to the product squads. Of course, the product squads still have to help test/approve, etc. It's easier to get people to adopt a pattern if you do it for them. But the ROI on the pattern has to be there, and the platform squad does sometimes get stuck doing crufty, thankless work. If you do this judiciously, the win might be thanks enough: more robust systems, better observability/traces, fewer flaky tests, etc.
Tests might cover more than a single unit, spanning code owned by different teams, and thus end up with multiple owners. Prefer squads over individuals as owners.
But just like documentation, ownership data can go stale and out of sync. The idea is to let some reds on the SLO dashboards correct it over time, since it's not always possible to automatically link tests to the code they exercise.
End-to-end tests might get tricky. But unit tests should be owned by the person/team/squad that owns the unit.
And unit tests should never break/be red. If the code needs to be changed, the test needs to be changed at the same time.
End-to-end tests can be flaky. Those probably shouldn't block deployments and can be red for a while. Before ignoring them, though, you should probably manually confirm whether the test is acting up, the behavior has changed, or something is legitimately broken.
While I somewhat agree with the post above, the answer isn't so black and white; it depends on your context, e.g. scale, app complexity, search needs, data size, etc.
>the fulltext tool, can and should hold only 'active' data
The same can be said about your DB. You can create separate tables or partitions to hold only active data. I assume materialized views work too (though I've never used them for FTS). You can even create a separate Postgres instance and use it only for FTS data.
The reason to do that would be to avoid coupling your business logic to another ORM/DSL and making your team learn another query language and its gotchas.
> as data size is smaller, it better fits in RAM
> as the indexed data are mostly read-only, the VM where it runs can be relatively easily cloned
> as the indexed data are mostly read-only, they can be easily backed up
> as the backups are smaller, restoring a backup can be very fast
Once the PG tables are separate and properly indexed, I assume PG can also keep most of the data in memory.
There isn't anything stopping you from using a different instance of PG for FTS if needed.
> as FTS tools are usually schema-less, there is no outage during schema changes
True. But in practice ES, for example, does have a schema (mappings, fields, indexes), and will have you re-index your rows/data, in some cases rebuilding the index entirely to be safe. There are field types, and your queries depend on the field types you choose. I remember Solr did too, because I had to figure out geospatial field types to run those queries, but I haven't used it in a decade, so I can't say how things stand now.
While the OP's point stands, in a sufficiently complex search project you'll need all of those features, and you'll have to deal with the following on search-oriented DBs:
- Schema migrations or async jobs to re-index data. In fact it was worse than Postgres, because at least in an RDBMS migrations are well understood. In ES, devs would change field types and expect everything to work, without realizing that only newly indexed data was picking up the change. So we sometimes had to re-index entire indexes after each schema change to get around this.
- At scale you'll have to tap into the WAL via CDC/Debezium to ensure the data in your search index is up to date and no rows were missed, which means dealing with robust queues/pub-sub.
- A whole other ORM or DSL for Elasticsearch. If you don't use one, your queries will soon become a mishmash of string concatenations or f-strings, which is even worse for maintainability.
- Unless your search server is directly serving browser traffic, you'll add latency from the extra hops. In some cases Meilisearch or Typesense might work here.
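One middle ground between adopting a full DSL SDK and concatenating strings is to build Elasticsearch Query DSL bodies as plain Python data and serialize them with `json`. This is a minimal sketch: `build_search_query` is a hypothetical helper, and the `title`/`body`/`status` field names are assumptions to swap for your own mapping. The `bool`, `must`, `filter`, `multi_match`, and `term` keys are standard Query DSL constructs.

```python
import json


def build_search_query(text, filters=None, fields=("title", "body")):
    """Build an Elasticsearch Query DSL body as a dict.

    Full-text matching goes in `must` (scored); exact-value constraints go
    in `filter` (not scored). Field names here are hypothetical.
    """
    return {
        "query": {
            "bool": {
                "must": [{"multi_match": {"query": text, "fields": list(fields)}}],
                "filter": [{"term": {field: value}}
                           for field, value in (filters or {}).items()],
            }
        }
    }


# The f-string approach breaks as soon as the user types a double quote.
user_text = 'the "hitchhiker" guide'
naive = f'{{"query": {{"match": {{"title": "{user_text}"}}}}}}'
try:
    json.loads(naive)
except json.JSONDecodeError:
    print("f-string query is not valid JSON")

# json.dumps escapes the quotes correctly.
print(json.dumps(build_search_query(user_text, {"status": "active"})))
```

Because the query is ordinary data, it can be unit-tested, composed, and logged without a separate SDK.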
I usually recommend that engineers starting out on a new search feature begin with FTS on Postgres and move to another search DB as and when needed. FTS support has improved greatly in Python frameworks like Django. I've made the other choice, jumping too soon to a separate search DB, and came to regret it: I had to either build abstractions on top or use a DSL SDK, then ensure the data in both stayed "synced", maintain observability/telemetry on the new DB, and so on. The time/effort investment was not linear, and the ROI wasn't in the same range for the use case I was working on.
I actually got more mileage out of search by dumping small CSV datasets into S3, downloading them in the browser, and doing FTS client-side via JS libs. That got me essentially zero-latency search, albeit only for small per-user datasets.
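Keeping two stores "synced" usually ends up needing a periodic drift check alongside CDC. Here is a minimal sketch, assuming both the database and the search index can produce an `{id: updated_at}` snapshot; `Drift` and `find_drift` are hypothetical names, and the counts are what you might export as telemetry.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Drift:
    missing: set    # in the DB but not in the index (e.g. missed by CDC)
    stale: set      # indexed, but the DB row changed afterwards
    orphaned: set   # in the index but deleted from the DB


def find_drift(db_versions, index_versions):
    """Compare {id: updated_at} snapshots from the database and the search index."""
    db_ids, idx_ids = db_versions.keys(), index_versions.keys()
    return Drift(
        missing=set(db_ids - idx_ids),
        stale={i for i in db_ids & idx_ids if index_versions[i] < db_versions[i]},
        orphaned=set(idx_ids - db_ids),
    )


db = {1: datetime(2024, 1, 2), 2: datetime(2024, 1, 1), 3: datetime(2024, 1, 5)}
idx = {1: datetime(2024, 1, 2), 3: datetime(2024, 1, 4), 4: datetime(2023, 12, 31)}
print(find_drift(db, idx))  # Drift(missing={2}, stale={3}, orphaned={4})
```

For large tables you would compare in ID ranges or by checksums rather than full snapshots, but the three categories stay the same.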
Yes, it always depends on application and purpose.
But once you have to deal with a real FTS load, as you say, you need separate instances and replication, materialized views, etc., and you find yourself halfway to implementing an ETL pipeline and, because of the replicas, with a more complicated setup than a dedicated FTS tool would need. And then somebody finds out what vector search is and asks you if there is a PG extension for it (yes, there is).
So IMHO, with FTS in the database you'll probably face almost the same problems as with an external FTS tool (materialized views, triggers, reindexing, replication, migrations), but without all its features and with the constraints of an ACID database (locks, transactions, writes).
Btw, I have Solr right behind OpenResty, so there are no extra hops. With the database there would be one more hop and a bunch of SQL queries, because it doesn't speak HTTP (although I'm sure there's a PG extension for that ;-)
Seems like it's deliberately using "committee" as a pejorative to make a political point. Flip it to "you'll find no statues of groups of people" and it's patently false.
I think so. I'd normalize the text first: lowercase it and remove all non-alphanumeric characters. E.g. for the phrase "What now?" I'd create these trigrams: wha, hat, atn, tno, now.
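The normalization and trigram steps above fit in a few lines of Python. This is a minimal sketch: `normalize`, `trigrams`, and `similarity` are hypothetical names, `str.isalnum()` is used so non-ASCII letters survive normalization, and Jaccard overlap of trigram sets is just one common way to score matches, not necessarily what a given search engine does.

```python
def normalize(text):
    """Lowercase and drop every non-alphanumeric character (spaces included)."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def trigrams(text):
    """Overlapping three-character windows over the normalized text."""
    s = normalize(text)
    return [s[i:i + 3] for i in range(len(s) - 2)]


def similarity(a, b):
    """Jaccard overlap of the two trigram sets (0.0 to 1.0)."""
    ta, tb = set(trigrams(a)), set(trigrams(b))
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


print(trigrams("What now?"))  # ['wha', 'hat', 'atn', 'tno', 'now']
```

"What now?" and "what not" share four of six distinct trigrams, so they score about 0.67 and would still match on a fuzzy lookup.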
Disclaimer: I live in Chile, but I'm not a Chilean national (nor of similar ethnicity), and I'm certainly not a historian.
The dispute is seen differently in Chile and is not as simplistic as Chile invading a port. In general, I've gotten the sense that the populace believes Bolivia (with its secret alliance with Peru) had other intentions.
>In February 1878, Bolivia increased taxes on the Chilean mining company Compañía de Salitres y Ferrocarril de Antofagasta (CSFA), in violation of the Boundary Treaty of 1874 which established the border between both countries and prohibited tax increases for mining. Chile protested the violation of the treaty and requested international arbitration, but the Bolivian government, presided by Hilarión Daza, considered this an internal issue subject to the jurisdiction of the Bolivian courts. Chile insisted that the breach of the treaty would mean that the territorial borders denoted in it were no longer settled.
>Ill-defined borders and oppressive measures allegedly taken against the Chilean migrant population in these territories furnished Chile with a pretext for invasion.
Just this week I was looking for more humour writing in the vein of Douglas Adams and PG Wodehouse. I came across this award, which seems interesting: https://en.wikipedia.org/wiki/Bollinger_Everyman_Wodehouse_P...
I plan to read some of them this year.