> and may even be useful in higher-level garbage-collected languages to reduce pressure on the garbage collector.

> The approach described here works very well though with a data-oriented architecture, where central systems work on arrays of data items packed tightly in memory.

I have been pushing us into a new data-oriented architecture for our software, but for reasons of domain modeling (managing complexity) rather than performance. This is a very interesting read from my perspective.

Let me give an example - We took deep, circular object graphs like:

  Account.Customers[0].Accounts[1].Customers[2].FirstName
And turned them into two flat arrays of Account and Customer. We also developed a third relation type to arbitrarily relate these types together. The relation type conveys all facts that pertain to the nature of the relationship, rather than forcing that knowledge onto Account or Customer.

This ideology is carried to the logical extreme - We do not permit any nesting of complex types in our revised domain model object graph. There is a root type which contains all of the List<T>s of domain types (and applicable relation types). The domain model is composed such that we can directly map every single type & public property into SQL tables. And we do. Because SQL is amazing for executing business logic.
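
A rough sketch of what that flattened shape can look like in C# (type and property names here are illustrative, not the actual model):

  using System;
  using System.Collections.Generic;

  public sealed class Account
  {
      public Guid Id { get; set; }
      public string Name { get; set; }
  }

  public sealed class Customer
  {
      public Guid Id { get; set; }
      public string FirstName { get; set; }
  }

  // The relation type carries every fact about the relationship itself,
  // so neither Account nor Customer needs to know about the other.
  public sealed class AccountCustomerRelation
  {
      public Guid AccountId { get; set; }
      public Guid CustomerId { get; set; }
      public string Role { get; set; }
  }

  // Root type: flat List<T>s only, no nesting of complex types below this.
  public sealed class DomainModel
  {
      public List<Account> Accounts { get; } = new();
      public List<Customer> Customers { get; } = new();
      public List<AccountCustomerRelation> AccountCustomers { get; } = new();
  }

Each type maps 1:1 to a SQL table, and every cross-type fact lives in a relation row keyed by GUIDs.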

Now that I am thinking this through, performance is certainly also seeing a benefit here. We are no longer serializing these absolutely horrific JSON object graphs. It's just a bunch of flat complex type arrays now, with everything related together using GUIDs.

From a developer QOL perspective, one could argue that this is more difficult to work with because everything is separated by relations. I would say if you are fortunate enough to be working with something like .NET, you can abuse the shit out of LINQ and not really feel the added complexity.
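
For example, continuing with the hypothetical names from the sketch above (with model being a DomainModel instance and customerId a Guid already in scope), resolving a customer's accounts through the relation list is a short join:

  using System.Linq;

  // All accounts related to a given customer, resolved via the relation rows.
  var accountsForCustomer = model.AccountCustomers
      .Where(rel => rel.CustomerId == customerId)
      .Join(model.Accounts,
            rel => rel.AccountId,
            acc => acc.Id,
            (rel, acc) => acc)
      .ToList();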




It sounds like you’re describing normalised data, is that a fair statement?


Yes - Something around 3NF/BCNF.


I was going to implement exactly this, but out of laziness ended up using this: https://www.npmjs.com/package/json-serialize-refs

For my use case, the serialized JSON was 30+% smaller.


> I would say if you are fortunate enough to be working with something like .NET, you can abuse the shit out of LINQ and not really feel the added complexity.

In my own database-mapping code for my game (I use RethinkDB), I made a type called DatabaseLink<T> where T : DatabaseEntry, with a Value property that retrieves the link target from the database cache. It's almost as fast and convenient as a simple field, and it serializes to a GUID.
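
A minimal sketch of the idea (the cache and serializer wiring here are invented stand-ins, not the actual RethinkDB mapping code):

  using System;
  using System.Collections.Generic;

  public abstract class DatabaseEntry
  {
      public Guid Id { get; set; }
  }

  // Hypothetical in-memory cache of already-loaded entries, keyed by GUID.
  public static class DatabaseCache
  {
      static readonly Dictionary<Guid, DatabaseEntry> entries = new();
      public static void Put(DatabaseEntry entry) => entries[entry.Id] = entry;
      public static T Get<T>(Guid id) where T : DatabaseEntry => (T)entries[id];
  }

  public sealed class DatabaseLink<T> where T : DatabaseEntry
  {
      // The only thing that needs to be serialized.
      public Guid Id { get; set; }

      // Resolved lazily from the cache on access, so it reads like a field.
      public T Value => DatabaseCache.Get<T>(Id);
  }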


> Because SQL is amazing for executing business logic.

Can you explain this a bit more? I've always heard that there are supposed benefits to writing business logic in SQL, and I've made efforts to put it into practice, but the more I deal with it, the more I dislike it.

I just finished rewriting a complex calculation that was being done in SQL into C#, and the C# runs about 10 times faster, is easier to understand, easier to debug, easier to change, and easier to test. The SQL was written as a series of views, each building on top of another, some doing grouping, some doing calculations, etc., until it gets to the final result, which is copied into a table.

Let's say I need to calculate intermediate result A (like maybe a giant case statement) as input to calculating B, C, and D, which will then go on to be combined in various ways. In order to avoid inlining complex calculation A in three different places, I can write a sub-select or a CTE, put it in a view, or copy it to a new table variable. But no matter how I handle it, I end up with layers and layers of nested select statements, each performing one more step of the calculation. Doing this in an imperative language usually ends up seeming trivial by comparison. In C#, I'd just add a function or read-only property to calculate A, and then use that wherever it's needed, but in SQL, adding such a requirement can mean restructuring the entire query.
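
To illustrate the C# side of that comparison (the names and formulas here are made up for the example):

  public class OrderCalculation
  {
      public decimal Quantity { get; set; }
      public decimal UnitPrice { get; set; }
      public decimal DiscountRate { get; set; }

      // Intermediate result "A": defined once, reused below instead of being
      // inlined (or pushed into a sub-select/CTE/view) in three places.
      public decimal NetAmount => Quantity * UnitPrice * (1m - DiscountRate);

      public decimal Tax      => NetAmount * 0.2m;            // "B"
      public decimal Shipping => NetAmount > 100m ? 0m : 5m;  // "C"
      public decimal Total    => NetAmount + Tax + Shipping;  // "D"
  }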

In another case, I took a stored procedure that selected data into a table variable and manipulated that, and rewrote the whole thing in C#, getting a 15x performance gain (from almost a minute down to 3-4 seconds, and pretty much all of that time is spent retrieving the relevant rows from various tables). It does the exact same work, but with classes instead of a table variable. When I originally started working on the stored procedure, it used cursors and took 15 minutes to run.

In another case I was recently working on, what should be an efficient calculation joining 5 tables together into a view ended up taking several seconds to select one record, and it is called quite frequently (around 50 times a second during application startup). I added an index to speed it up, but that index made other queries unbearably slow, because the optimizer completely changed how those queries were executed based on the new index. So I added triggers to copy the view to a cache table every time something the view depends on changes. Now each row update takes several seconds, which is still a win, because writes happen maybe a few times a day while reads are continuous. But it means that updating multiple rows at once will be really, really slow, because the cache table always has to be up to date, and the trigger has to update it eagerly rather than lazily, since querying the cache table can't cause it to update itself. I might end up having to rewrite the trigger to do the joining in C# if it causes too many performance problems.

On another database, I have to manually call update statistics pretty frequently, or the query optimizer breaks (i.e. extreme slowness), because frequently appending hundreds of records to a table that holds over a hundred thousand rows with an auto-increment primary key should obviously invalidate the entire query plan. I'm not blaming this on a poor job by the query optimizer. Rather, I blame the fact that SQL depends too much on letting a generic query optimizer decide on the fly what it thinks is the best way to accomplish a complex query, instead of letting you write code that will always have deterministic performance characteristics no matter what the statistics say. It's a blessing and a curse that the query plan can change based on actual data. That seems much more important for writing ad hoc queries; when writing an application, I think it's more important to have direct control over the performance characteristics.

SQL definitely has its strengths, and I have no desire to stop using it, but I don't understand the claim that it's good for writing general business logic. Maybe I just haven't seen the places where it really shines in this regard, so I would like to hear about ways it's better, because maybe I'm doing it wrong.


We don't use SQL as the authoritative store of business state. We use C# types and all the code that goes around them for most of our heavy-lifting.

We map our C# domain model instance into a SQL database (SQLite in-memory) each time we want to execute business rule SQL against the state. These databases are ephemeral. For some interactions, we might generate 5-6 of them per user action. It is very fast to do this when your row counts are low (i.e. projections per user or per user transaction).
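
A minimal sketch of that flow, assuming the Microsoft.Data.Sqlite provider and the illustrative Account type from upthread (the real projection and business-rule SQL will of course be more involved):

  using Microsoft.Data.Sqlite;

  // Build a throwaway in-memory database, project the domain model into it,
  // run a business-rule query, and let the whole database die with the scope.
  using var conn = new SqliteConnection("Data Source=:memory:");
  conn.Open();

  var ddl = conn.CreateCommand();
  ddl.CommandText = "CREATE TABLE Account (Id TEXT PRIMARY KEY, Name TEXT)";
  ddl.ExecuteNonQuery();

  foreach (var account in model.Accounts)
  {
      var insert = conn.CreateCommand();
      insert.CommandText = "INSERT INTO Account (Id, Name) VALUES ($id, $name)";
      insert.Parameters.AddWithValue("$id", account.Id.ToString());
      insert.Parameters.AddWithValue("$name", account.Name);
      insert.ExecuteNonQuery();
  }

  var rule = conn.CreateCommand();
  rule.CommandText = "SELECT COUNT(*) FROM Account WHERE Name LIKE $pattern";
  rule.Parameters.AddWithValue("$pattern", "%Holding%");
  var matches = (long)rule.ExecuteScalar();
  // When conn goes out of scope it is disposed, and the in-memory database
  // vanishes with it.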


What are the advantages of this over using LINQ on the in-memory object model?


You don't have to recompile the application to write new SQL queries.



