In theory it's a nice abstraction, and the benefit is clear. In practice, your repository likely ends up forwarding its arguments one-for-one to SQLAlchemy's select() or session.query().
That's aside from their particular example of SQLAlchemy sessions, which is extra weird because a Session is already a repository, more or less.
I mean, sure, there's a difference between your repository for your things and types you might consider foreign, in theory, but how theoretical are we going to get? For what actual gain? How big of an app are we talking?
You could alias Repository = Session, or define a simple protocol with stubs for some of Session's methods, just for typing, and you'd get the same amount of theoretical decoupling with no extra layer. If you want to test without a database, don't bind your models to a session. If you want to use a session anyway but still not touch the database, replace your Session's scopefunc and your tested code will never know the difference.
It's not a convincing example.
Building your repository layer over theirs, admittedly you stop the Query type from leaking out. But then you implement essentially the Query interface in little bits for use in different layers, just probably worse, and lacking twenty years of testing.
Thanks, that makes a lot of sense. I don't have a whole bunch of experience with SQLAlchemy itself. In general, I prefer not to use ORMs but just write queries and map the results into value objects. That work I would put into a Repository.
Also in my opinion it's important to decouple the database structure from the domain model in the code. One might have a Person type which is constructed by getting data from 3 tables. A Repository class could do that nicely: maybe run a join query and a separate query, combine the results together, and return the Person object. ORMs usually tightly couple with the DB schema, which might create the risk of coupling the rest of the application as well (again, I don't know how flexible SQLAlchemy is in this).
There could be some value in hiding SQLAlchemy, in case one would ever like to replace it with a better alternative. I don't have enough experience with Python to understand if that ever will be the case though.
All in all, trade-offs are always important to consider. A tiny microservice consisting of a few functions: just do whatever. A growing modulith with various evolving domains which have not been fully settled yet: put some effort into decoupling and separating concerns.
I've used SqlAlchemy in a biggish project. Had many problems, the worst ones were around session scoping and DB hitting season limits, but we had issues around the models too.
The argument for hiding SqlAlchemy is nothing to do with "what if we change the DB"; that's done approximately never, and, even if so, you have some work to do, so do it at the time. YAGNI
The argument is that SA models are funky things with lazy loading. IIRC, that's the library where the metaclasses have metaclasses! It's possible to accidentally call the DB just by accessing a property.
It can be a debugging nightmare. You can have data races. I remember shouting at the code, "I've refreshed the session you stupid @#£*"
The responsible thing to do is flatten them to, say, a pydantic DTO. Then you can chuck them about willy-nilly. Your type checker will highlight a DTO problem that an SA model would have slipped underneath your nose.
The difficulty you have following that is that, when you have nested models, you need to know in advance what fields you want so you don't overfetch. I guess you're thinking "duh, I handcraft my queries" and my goodness I see the value of that approach now. However, SA still offers benefits even if you're doing this more tightly-circumscribed fetch-then-translate approach.
This is partly how I got from the eager junior code golf attitude to my current view, which is, DO repeat yourself, copy-paste a million fields if you need, don't sweat brevity, just make a bunch of very boring data classes.
Just a heads-up if you haven't seen it: Overriding lazy-loading options at query time can help with overfetching.
class Author(Model):
books = relationship(..., lazy='select')
fetch_authors = select(Author).options(raiseload(Author.books))
Anything that gets its Authors with fetch_authors will get instances that raise instead of doing a SELECT for the books. You can throw that in a smoke test and see if there's anything sneaking a query. Or if you know you never want to lazy-load, relationship(..., lazy='raise') will stop it at the source.
SQLModel is supposed to be the best of both Pydantic and SQLAlchemy, but by design
an SQLModel entity backed by a database table doesn't validate its fields on creation, which is the point of Pydantic.
I can't take a position without looking under the hood, but what concerns me is "SqlModel is both a pydantic model and an SA model", which makes me think it may still have the dynamic unintended-query characteristics that I'm warning about.
I seem to recall using SqlModel in a pet project and having difficulty expressing many-to-many relationships, but that's buried in some branch somewhere. I recall liking the syntax more than plain SA. I suspect the benefits of SqlModel are syntactical rather than systemic?
"Spaghetti" is an unrelated problem. My problem codebase was spaghetti, and that likely increased the problem surface, but sensible code doesn't eliminate the danger
I mean that from the point of view of YAGNI for a small app. For a big one, absolutely, you will find the places where the theoretical distinctions suddenly turn real. Decoupling your data model from your storage is a real concern and Session on its own won't give you that advantage of a real repository layer.
SQLAlchemy is flexible, though. You can map a Person from three tables if you need to. It's a data mapper, then a separate query builder on top, then a separate ORM on top of that, and then Declarative which ties them all together with an ActiveRecord-ish approach.
> I prefer not to use ORMs but just write queries and map the results into value objects. That work I would put into a Repository.
Yep, I hear ya. Maybe if they'd built on top of something lower-level like stdlib sqlite3, it wouldn't be so tempting to dismiss as YAGNI. I think my comment sounded more dismissive than I really meant.
SQLAlchemy Session is actually a unit of work (UoW), which they also build on top. By the end of the book they are using their UoW to collect and dispatch events emitted by the services. How would they have done that if they just used SQLAlchemy directly?
You might argue that they should have waited until they wanted their own UoW behaviour before actually implementing it, but that means by the time they need it they need to go and modify potentially hundreds of bits of calling code to swap out SQLAlchemy for their own wrapper. Why not just build it first? The worst that happens is it sits there being mostly redundant. There have been far worse things.
The tricks you mention for the tests might work for SQLAlchemy, but what if we're not using SQLAlchemy? The repository pattern works for everything. That's what makes it a pattern.
I understand not everyone agrees on what "repository" means. The session is a UoW (at two or three levels) and also a repository (in the sense of object-scoped persistence) and also like four other things.
I'm sort of tolerant of bits of Session leaking into things. I'd argue that its leaking pieces are the application-level things you'd implement, not versions of them from the lower layers that you need to wrap.
When users filter data and their filters go from POST submissions to some high-level Filter thing I'd pass to a repository query, what does that construct look like? Pretty much Query.filter(). When I pick how many things I want from the repository, it's Query.first() or Query.one(), or Query.filter().filter().filter().all().
Yes, it's tied to SQL, but only in a literal sense. The API would look like that no matter what, even if it wasn't. When the benefit outweighs the cost, I choose to treat it like it is the thing I should have written.
It isn't ideal or ideally correct, but it's fine, and it's simple.
You seem to have stopped reading my comment after the first sentence. I asked some specific questions about how you would do what they did if you just use SQLAlchemy as your repository/UoW.
That's aside from their particular example of SQLAlchemy sessions, which is extra weird because a Session is already a repository, more or less.
I mean, sure, there's a difference between your repository for your things and types you might consider foreign, in theory, but how theoretical are we going to get? For what actual gain? How big of an app are we talking?
You could alias Repository = Session, or define a simple protocol with stubs for some of Session's methods, just for typing, and you'd get the same amount of theoretical decoupling with no extra layer. If you want to test without a database, don't bind your models to a session. If you want to use a session anyway but still not touch the database, replace your Session's scopefunc and your tested code will never know the difference.
It's not a convincing example.
Building your repository layer over theirs, admittedly you stop the Query type from leaking out. But then you implement essentially the Query interface in little bits for use in different layers, just probably worse, and lacking twenty years of testing.