Yes, and this is where statically typed languages shine (in my opinion). I like programming in a style that makes heavy use of the type system to enforce this.
For example, when writing an API endpoint to create a task, I would typically deserialise the JSON into a CreateTaskRequest. If the object is created without exceptions, I can be sure it is valid. CreateTaskRequest implements the ToTask interface. The service layer takes only objects of this interface and converts them into a Task object that gets persisted. The persisted Task then gets converted into a TaskResponse, so only valid JSON comes back out.
Lots of classes and interfaces but they are all small and with a single purpose.
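Roughly, in Kotlin, that layering might look like the sketch below (the field names and exact shapes are invented for illustration):

    // What the service layer accepts: anything that can yield a Task.
    interface ToTask {
        fun toTask(): Task
    }

    // Deserialisation target: if construction succeeds, the shape of the request is valid.
    data class CreateTaskRequest(
        val title: String,
        val assignee: String? = null,
    ) : ToTask {
        override fun toTask() = Task(title = title, assignee = assignee)
    }

    // Domain object that gets handed to persistence.
    data class Task(val title: String, val assignee: String?)

    // The only thing that is ever serialised back out.
    data class TaskResponse(val id: Long, val title: String)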
This is more or less how my teams' back end REST API server code is written as well.
It's far, far superior to the idiotic God classes most Java devs tend to write - the ones covered in attributes that are sometimes populated, other times not, plus annotations for ten different purposes. Just no. Don't use the same single class to parse incoming requests, persist records at the innermost DB layer, serialize the outgoing JSON, etc etc etc.
The sane way to do it is to have a dedicated class to parse incoming requests, another dedicated class that represents a validated and decorated record to be persisted (that doesn't have an id attribute, because it hasn't been persisted yet), another that represents a persisted record (with an id now, that is guaranteed to never be null), another that represents the outgoing response etc etc.
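Something like this sketch (names invented; the point is that the pre-persistence type simply has no id field, and the post-persistence type has a non-nullable one):

    import java.math.BigDecimal

    // Validated record that is ready to be stored - deliberately has no id yet.
    data class NewOrder(val customerId: Long, val total: BigDecimal)

    // Record as it exists after persistence - the id exists and can never be null.
    data class PersistedOrder(val id: Long, val customerId: Long, val total: BigDecimal)

    interface OrderRepository {
        // The signature itself documents that persisting is what produces the id.
        fun insert(order: NewOrder): PersistedOrder
    }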
A common criticism of this style is that it's wordier, there are too many types, and it can be hard to follow if you're not used to it. The thing is, that's the price you pay for more accurately modelling what's actually going on at each step. The "use a single God class for everything" alternative is "easy to follow" only because it omits important differences that are actually there at each step, which doesn't mean they're not there - they are there, they're just hidden.
I feel the same way. Much better to encode the complexity of what you're modeling in types instead of ignoring the inherent complexity. By precisely encoding the shape of the data in types you are forced to engage with the values it can represent which leads to better understanding and better code in my experience, similar to how writing tests can lead to better code, except it's much faster to get feedback.
That's basically how most modern frameworks do it.
Django-Rest-Framework does that with view + serializer + validators.
A more recent example, with a leaner and cooler implementation, is FastAPI, where type hints are also used to declare validation on your endpoints, while a model is used for serializing the response.
> If the object is created without exceptions I can be sure it is valid.
Without some kind of custom validation system in place (JSON Schema, property attributes, etc), that doesn't tell you much. With most serialization libraries I've used, the default settings would let you deserialize {} into any class without an exception, and all the properties would just have the default value for their type. Using stricter settings gets you a little more, but definitely no guarantee of a logically valid state. If I want to go even one step past what little validation static typing provides, and I usually do, I'd rather just take the type noise completely out of my data and move all validation to a single place.
FWIW, this sort of thing can be done with a dynamic language, too. It's true that some static languages happen to be really far ahead of the curve with this sort of thing. But it's also true that some dynamic languages put you in a better position on this front than many of the most popular static languages do.
(And for those of us who really do love an intractable quagmire, there's always JavaScript.)
For my part, I tend to find this debate to be mostly a distraction, because the influence of the language's type discipline is quite small relative to the influence of the programmer's coding discipline. In the group I'm working with currently, the most committed fans of static typing tend to also be the ones who have the greatest tendency to assume, "If it compiles, it works," and proceed to check in glaring bugs. I don't bring that up in order to point fingers at static typing proponents (I tend to prefer static myself, though my preference is not particularly strong) so much as to point out that we should be wary of memes that subtly encourage us to become complacent.
That's an interesting perspective (and now that you mention it I can think of someone at work who loves types and also tends to break stuff). I myself love static types because it allows me to avoid certain classes of errors and I try my hardest not to introduce bugs. Static typing also greatly informs my workflow. I tend to practice type driven development to the extent possible so when I go back to dynamic languages I feel like I'm coding with one hand tied behind my back since I can't encode my assumptions in types and rely on the type checker to help me validate them.
Just as bad are those who check in code that works, but no longer makes logical sense when you read it. The following is a simple case of what I'm talking about:
var flag_is_unset = flag_is_set
I generally land more on the dynamic side of things, but there are certainly problem domains where I love types. The more closed and "mathy" the domain, the better I think types fit. I just wish it was less an all-or-nothing choice, and that people were less religious about it.
I keep wishing someone would take C#'s concept of dynamic references and run with it.
Optional static typing, like what you get in Typescript or MyPy, is interesting, but defaulting to dynamic and making static opt-in undermines a lot of the potential benefit of static typing. I don't think that it works the other way around, though. My hunch, which I am not particularly prepared to defend, is that, at least as long as you've got good type inference, defaulting to static and making dynamic opt-in does let you keep most of the practical benefits of dynamic typing.
I guess it's kind of like unsafe code blocks, also from C#: Pointers and weak typing can be very useful in some circumstances, but I'm generally much happier having the compiler try to guarantee as much as it can, and then sometimes be able to tell it, "Nah, hang on, I got this one."
> defaulting to dynamic and making static opt-in undermines a lot of the potential benefit of static typing. I don't think that it works the other way around
Yes, there's certainly truth to this. However, I write a lot of TypeScript and it provides an _exquisite_ on ramp in that you can add types gradually and telling the compiler not to worry about it can be super useful, especially if you're dealing with poorly behaved third party stubs that may have diverged from the actual implementation. The fact that you can use TypeScript as a super-duper linter _or_ try to maximally leverage its powerful type system is a huge strength. It's also very helpful for adoption. As frustrated as I get sometimes with TypeScript's unsoundness and the lack of pattern matching, the fact that I get to use it instead of JavaScript (and have also got half the company using it!) is a huge win.
Strongly depends on the language. If you have ML-style data types (eg ML, Haskell, OCaml, Rust) and some way to abstract types (ML structures, Haskell modules hiding type constructors, OCaml modules, Rust modules), then it is generally possible to design your data structures in such a way that invalid states cannot be represented. If a user has an optional first/last name, but if one is specified then so is the other, your user type has a name of type (eg) Maybe (string, string) [1]. If your deserialisation framework just fills in default values for everything or lets you have unused fields then it is, in my opinion, broken. If you have a field that should be a positive number, you should have a type for a positive number that fails to deserialise if you give it a negative number.
[1] the caveat is that it somewhat sucks to change these restrictions and therefore the types. Compilers can hopefully make these refactors easier, but they may still suck. In languages like Clojure, you mostly have to try to write programs/tests to be resilient to any reasonable changes to the data structures that you can imagine.
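A rough Kotlin sketch of those two invariants (a nullable pair for "both names or neither", and a wrapper type that refuses to exist with a negative value; the names are made up):

    // "If one of first/last name is specified then so is the other":
    // the pair is either wholly present or wholly absent.
    data class FullName(val first: String, val last: String)

    data class User(val name: FullName?)   // the Maybe (string, string) from above

    // A positive number that cannot be constructed from a negative value,
    // so deserialisation into it fails instead of silently accepting bad data.
    @JvmInline
    value class PositiveInt(val value: Int) {
        init {
            require(value > 0) { "expected a positive number, got $value" }
        }
    }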
That depends on the language and deserialisation library used. And that is also the reason why I have multiple classes.
My CreateTaskRequest class, which is used for deserialisation only, does not have defaults (unless intentional), so it throws when required values are not present in the source JSON. It also takes care of date format/UUID parsing. The output is always fully valid and typed, or it throws. It's a 1-to-1 mapping of what is described in the API docs.
The service layer that takes CreateTaskRequest and converts it to Task for persistence is where the business-logic validation happens (valid foreign keys, date ranges, uniqueness checks, etc). Cleanly separated from deserialization.
For reference: I use Kotlin with Jackson which has great support for this.
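Roughly what that looks like with Kotlin and Jackson (module setup may vary between versions; the fields here are invented):

    import com.fasterxml.jackson.datatype.jsr310.JavaTimeModule
    import com.fasterxml.jackson.module.kotlin.jacksonObjectMapper
    import com.fasterxml.jackson.module.kotlin.readValue
    import java.time.LocalDate
    import java.util.UUID

    // Non-nullable fields, no defaults: if "projectId" or "dueDate" is missing
    // or malformed, readValue throws instead of returning a half-initialised object.
    data class CreateTaskRequest(
        val projectId: UUID,
        val title: String,
        val dueDate: LocalDate,
    )

    private val mapper = jacksonObjectMapper().registerModule(JavaTimeModule())

    fun parseCreateTaskRequest(json: String): CreateTaskRequest = mapper.readValue(json)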
You implement your own custom validation. The point is to encode the fact that you've validated a value in its type. So you have e.g. DeserializeJSON<Foo>(String) -> Foo, and then ValidateFoo(Foo) -> ValidatedFoo. And then all the business code works on ValidatedFoo.
I know the point, I'm a C# dev by trade, but if I'm going to implement validation that goes beyond static type checking, which is most of the time, I'd rather put it all in a single place and not have to deal with the type nuisance in every single line of code I write.
Types are a huge help for implementing validation in one place - by having separate types for non-validated and validated versions, you can get the compiler to ensure that every code path actually goes through validation. Static type checking isn't in opposition to custom validation code, it supports it.
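In Kotlin terms the same idea might look like this (Foo/ValidatedFoo taken from the comment above, the fields and rules invented):

    // The only way to get hold of a ValidatedFoo is to go through validate(),
    // so the compiler guarantees business code never sees an unvalidated Foo.
    data class Foo(val email: String, val age: Int)

    class ValidatedFoo private constructor(val email: String, val age: Int) {
        companion object {
            fun validate(foo: Foo): ValidatedFoo {
                require("@" in foo.email) { "invalid email: ${foo.email}" }
                require(foo.age >= 0) { "age must be non-negative: ${foo.age}" }
                return ValidatedFoo(foo.email, foo.age)
            }
        }
    }

    // Business code only ever accepts the validated type.
    fun register(foo: ValidatedFoo) { /* ... */ }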
Isn't this as far from Clojure's "it's just data" as it gets? The author was making the case for not doing this, ie. turning data into Java-style classes and interfaces.
One thing I wish for is some kind of "type tags". Being able to express concepts like List[Widget], List[Widget, Nonempty], List[Widget, Nonempty, Sorted], Vector[User, Sorted], etc. - or even, more generic, <Container>[<T>, NonEmpty] (where <Container> and <T> are parameters, like in C++ templates) - without implementing an explicit new type for each. Logic verification through typing would then involve not just changing "main" types, but also adding and dropping "tags" from the "set of tags" attached to the "main" type. This should cut down on the amount of boilerplate.
Hell, in the extreme, perhaps types in general could be generalized as a set of tags?
This is quite similar to units (e.g. F# units of measure) or taints (Perl, for tracing user input and detecting unsafe usage).
A large part of the point of these systems is that you can write code which generalizes across the different subtypes (different units, tainted vs untainted values) yet it still passes through invariants by default (values calculated from tainted input are themselves tainted, a unit-quantified value multiplied by a scalar retains its unit, and so on).
I've done similar things with syntax trees in compilers. With multiple passes, you have a representation for the output of the parser, then you might do type annotation, rewrites due to coercion etc., symbol binding and overload resolution, constant expression evaluation, and so on in separate passes. If the input and output trees for each pass have different types, it's easier to keep track of what's going on, of what invariants hold for any given tree node, or indeed which node types are permissible to exist in the input or output of a pass (e.g. you don't want unbound overload calls after overload resolution).
One problem is that tags probably aren't enough when you're dealing with data structures rather than simple zero-dimensional values. Some methods shouldn't be called, or some fields shouldn't be accessed, if a value has a type with the wrong tag. But if you have a wholly different type, then you can end up reimplementing a lot of the same logic, once per type, simply to get the types to flow through.
One solution I've used a couple of times for the tree problem is tree grammars. That is, a data structure which encodes a description of a valid tree and can encode invariants like nodes of type T need to have attributes of type Y with values satisfying predicate P, and between N and M children of type U, V or W, and so on. The grammar is defined in terms of an abstract supertype, and the concrete subtypes have the grammar specific to their phase baked in. This is a hybrid between static and dynamic typing, a compromise necessary for languages like Java without much expressiveness in the type domain.
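A stripped-down Kotlin sketch of the "different tree type per pass" idea (node types invented):

    // One tree type per phase, so a node that is only legal before overload
    // resolution simply cannot appear in the later tree.

    // Output of the parser: calls are still unresolved names.
    sealed interface ParsedExpr
    data class ParsedCall(val name: String, val args: List<ParsedExpr>) : ParsedExpr
    data class ParsedIntLit(val value: Int) : ParsedExpr

    // Output of overload resolution: every call is bound to a concrete function,
    // and every node carries a type.
    sealed interface ResolvedExpr { val type: String }
    data class ResolvedCall(
        val target: FunctionSymbol,
        val args: List<ResolvedExpr>,
        override val type: String,
    ) : ResolvedExpr
    data class ResolvedIntLit(val value: Int, override val type: String = "Int") : ResolvedExpr

    data class FunctionSymbol(val qualifiedName: String)

    // Each pass is then just a total function between the two tree types.
    fun resolveOverloads(expr: ParsedExpr): ResolvedExpr = TODO()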
I will, thanks. I never had a chance to catch up on the "state of the art" in typing systems.
I'm not even sure if my "types as a set of tags" idea makes much sense - perhaps it decays to what is typically understood as types. Or perhaps it hits computability problems.
I did some mental experiments on a "set of tags" type system last night, and I quickly realized the complexity will be around deciding when to keep a tag (property) on a type, and when to drop it. With a set of tags being open, a function could not possibly know about them. This creates a problem.
Consider a function like: Sort([List[T], ...], [Function([T, T] -> Bool)]) -> [List[T], Sorted, ...]. Takes a list and a comparator, sorts the list, attaches a "Sorted" tag to return type, retains all other tags. Looks like a reasonable definition. Let's look at the use cases below (with P being some applicable predicate):
- Sort([List[Int], NonEmpty], P) -> [List[Int], Sorted, NonEmpty] -- Awesome, exactly what I want!
- Sort([List[Int], Foobar], P) -> ...? Should it be [List[Int], Sorted, Foobar]? But what if Foobar designates some business rule that depends on the element ordering? How is Sort supposed to know?
Without knowing a tag, we'd sometimes want a function to drop it, and other times want the function to retain it. I have no idea how to solve this at compile time (or even at runtime, without crippling the tagging system).
A language like Kotlin can do some of these things using delegates, interfaces and extension methods.
For example a MutableList<T> can be dropped down to a List<T> which is not mutable. And a generic conversion from Collection<T> to NonEmptyCollection<T> should be trivial to write as an extension method.
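For instance (NonEmptyCollection here is hypothetical, not a stdlib type):

    // A wrapper that can only be obtained from a collection that was actually checked.
    class NonEmptyCollection<T> internal constructor(backing: Collection<T>) :
        Collection<T> by backing

    fun <T> Collection<T>.toNonEmptyOrNull(): NonEmptyCollection<T>? =
        if (isEmpty()) null else NonEmptyCollection(this)

    // MutableList<T> is already a List<T>, so "dropping" mutability is just an upcast:
    fun freeze(items: MutableList<String>): List<String> = items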
Can Kotlin handle multiple "tags" on a type as a set, and not a sequence? I'm not familiar with the language, so I'll use a C++ analogy. If you tried to tag types in C++, you'd end up with something like:
TaggedType<std::vector, NonEmpty, Sorted>
but such type is strictly not the same as:
TaggedType<std::vector, Sorted, NonEmpty>
What I mean by "set" instead of a "sequence" is to have the two lines above represent the same type, i.e. the order of tags should not matter.
You can maybe get there in kotlin with a generic type with multiple constraints in a where clause. Let’s say you have Sorted and NonEmpty as interfaces (could be empty marker interfaces so they behave like tags). Then you can write a method
fun <T> doSomething(values: T) where T: Sorted, T: NonEmpty {}
And that function will take any type that has both Sorted and NonEmpty interfaces.
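Fleshed out a little (a sketch; none of these types is stdlib):

    // Sorted and NonEmpty as empty marker interfaces, per the suggestion above.
    interface Sorted
    interface NonEmpty

    // A list type that carries both tags.
    class SortedNonEmptyList<E>(backing: List<E>) : List<E> by backing, Sorted, NonEmpty

    fun <T> doSomething(values: T) where T : Sorted, T : NonEmpty { /* ... */ }

    fun demo() {
        doSomething(SortedNonEmptyList(listOf(1, 2, 3)))  // compiles: both tags present
        // doSomething(listOf(1, 2, 3))                   // rejected: no tags
    }

And since the where clause is just a set of constraints, the order you list the tags in doesn't matter: Sorted + NonEmpty and NonEmpty + Sorted are the same requirement.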
Personally I like that bit of extra code because it gives every class one reason to exist. There are no conflicts so the type system can be used fully without ambiguity.
I’ve always disliked the way early Rails promoted fat models that combined serialisation, deserialisation, validation, persistence, querying and business logic in the same class.
I personally bounce back and forth about this. My experience is probably colored by the fact that I'm doing this in C++. Boilerplate gets annoying there (and attempts to cut it down tend to produce lots of incomprehensible function templates). I like the idea of using types to encode assertions at a fine granularity. I dislike the amount of tiny little functions this creates. I also dislike that the resulting code is only navigable with an IDE - otherwise you spend 50% of your time chasing definitions of these little types.
Ok yes, C++ might not be the greatest language for this.
My experience here is mostly from Kotlin which is a great language for this. Nullability, extension methods, (reified) generics, data classes, delegates, etc can all help reduce boilerplate.
You should take a look at Scala. Lots of things that have to be special-case language features in Kotlin become just a straightforward use of higher-kinded types or a combination of a couple of existing language features.
I think it is always important to understand that the data models and the approaches to modelling the world are totally different between Clojure and strongly typed languages. I’m going to ignore C++ and Java style languages because I think they give a Clojure-style model of the world without any of the benefits of a well-suited programming language or a type system that can enforce new invariants.
In the ML-family of language, you have a few key things:
1. Primitive types like ints, bools, strings, floats, arrays, not much else.
2. Product types, which are records/tuples of other types
3. Sum types, which are proper tagged unions of types, including things like optional or result (aka or-error) types, and also list (= nil or cons) types.
4. Abstract types which are types whose representations are hidden. You can have an abstract type called “hour_of_day” which is secretly backed by an int but which you can only interact with by using conversion functions or eg something that adds two values (mod 24).
5. Polymorphic types: you can have a list type which can be a list of units or a list of floats but not really a list of a mix of arbitrary different things.
The idea is to represent with these types a model of the world in such a way that only valid states of the world can be constructed. A user’s bank account balance isn’t an int, it’s a positive_dollars and if you try to do a transaction to make it negative, that isn’t possible as you can’t construct a suitable positive_dollars value. This can have annoying difficulties for maintainability because it is tedious to invert or change a one-to-one or one-to-many relation (eg previously 1 tax_number per person, now 1 person per uk_tax_number and one or two people per us_tax_number) and hard to represent with types a many-to-many relation like “every person has at least 1 bank account, and every bank account is associated with at least one person, and the bank accounts associated with person A have A amongst their associated persons.”
The promise is that the type system makes the practice and correctness of these refactorings easier.
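The hour_of_day abstract type from point 4 can be approximated in Kotlin by hiding the representation behind a private constructor (a rough sketch):

    // The backing Int is hidden, so callers can only go through the conversion
    // functions and the mod-24 arithmetic.
    class HourOfDay private constructor(private val value: Int) {
        companion object {
            fun of(hour: Int): HourOfDay {
                require(hour in 0..23) { "not an hour of day: $hour" }
                return HourOfDay(hour)
            }
        }

        operator fun plus(hours: Int): HourOfDay = HourOfDay((value + hours).mod(24))

        fun toInt(): Int = value
    }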
In Clojure, the data model is more like:
1. There is a rich set of primitive, atomic types, eg strings and ints, but also dates, symbols, keywords, fractions, URIs and so on
2. There are collections like lists (of logically unalike objects), vectors (of logically alike objects), hash tables, and sets.
Data is built up out of e.g. hash tables of keywords to objects. The language has a rich set of features for acting on these types, so one can do a lot with hash tables, whereas in an ML system only a few operations are available for a record type (eg constructing, reading fields, maybe updating them) and functions that do general things with any record can’t really be written. Clojure is a language for manipulating data in general more than a framework for writing functions to manipulate your small, strict data types.
Clojure tries to model the world in a metarational way, accepting that it is unlikely that one can write a strict scheme capturing all and only valid states and instead programs should try to allow for the possibility of extra or missing information. You don’t want to care about whether you have a us_person or an eu_person so much as whether your data has a :person/preferred-name field. Issues with relations come up less because those relations are not trying to be forced into rigid types (they may be enforced by a database though—another cultural difference between Clojure and ML-family languages).
Fundamentally, I think the differences stem more from philosophies about modelling the world than type systems.
> Fundamentally, I think the differences stem more from philosophies about modelling the world than type systems.
It seems like people mistake the battleground for the countries. If you only saw the US and Japanese battles in early World War II, you'd think they were tiny nations of ships, soldiers and aircraft fighting over a handful of islands. It wouldn't be clear that there were full countries with big populations backing all this, and that they were thousands of miles apart.
So Rich Hickey gives a talk criticizing types, and a Haskeller responds with a blog post, and you think that types are the difference between the camps. But it's not. It's just that this is the spot that's close enough to home to reach, but contentious enough to fight over.
> Yes, and this is where statically typed languages shine (in my opinion).
Indeed. This feels like a just-before-the-moment-of-realization situation.
The endless cycle between "more dynamic" and "more static" continues it seems.
I wonder if there is any correlation between experience in the field and static vs. dynamic vs. "fail fast dynamic".
(I'd say Erlang falls in the latter category and it has a pretty good track record for reliability, but so does Python. It's an imperfect axis for sure.)
I like this design for making requests, but I don't like it for serving requests. Because if you don't just make every field a String in CreateTaskRequest, and you want to examine it in some way, then you're losing information. E.g. if you make a field an int, and they provide a non-int value, your CreateTaskRequest can't hold that value, so it's just gonna have a 0.
A class full of strings feels like a code smell - but that is the proper representation of serving the request. And the alternative is needlessly restrictive. Ultimately it feels like pointless ceremony.
If a field is an int and the incoming data has a string the parsing shouldn't succeed - instead it should give an error like "expected an int but saw a string". Maybe include the string it saw in the error. Add in the JSON path for bonus debugging points.
You shouldn't get to the point of having a CreateTaskRequest with a 0 that shouldn't be there.
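With Jackson's tree model, for example, that kind of check-and-report is easy enough to sketch (the helper below is hypothetical; the JsonNode calls themselves are standard):

    import com.fasterxml.jackson.databind.JsonNode
    import com.fasterxml.jackson.module.kotlin.jacksonObjectMapper

    // Hypothetical helper: reject a non-int value up front with a descriptive error,
    // instead of letting it silently decay to 0.
    fun readRequiredInt(root: JsonNode, field: String): Int {
        val node = root.get(field)
            ?: throw IllegalArgumentException("$field: expected an int but the field is missing")
        if (!node.isInt) {
            throw IllegalArgumentException(
                "$field: expected an int but saw ${node.nodeType} (${node.asText()})"
            )
        }
        return node.intValue()
    }

    fun main() {
        val root = jacksonObjectMapper().readTree("""{"priority": "high"}""")
        readRequiredInt(root, "priority")  // priority: expected an int but saw STRING (high)
    }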
My concern with all-or-nothing validation of the sort you're suggesting is that sometimes the invalid data is data that isn't your concern and doesn't matter to the task you're trying to accomplish. It doesn't even have to be a mistake on the sender's side; maybe something evolved, maybe you made the mistake. I prefer defining the minimum I need for a specific function, instead of defining exactly what I expect. For much the same reasons as we scribble in the margins sometimes.
When encoding types for requests like this, I'm exactly as strict as I need to be: no more, no less. Any assumption about the shape of the data that my code will make is encoded in the parser, but the parser is very generous with everything else.
For example, if a request has a field I didn't expect, that's not an error, I simply ignore that field because I obviously didn't need it. Likewise if requests start using a more specific schema, such as now sending only positive integers. If, however, the schema changes such that a field I rely on is dropped or changes type, my code will eventually fail anyway as soon as it makes an assumption about the data. Why not fail at the point of entry instead of saving it for later?
That sounds like it might work. I was commenting based on my experience of people deciding there had to be a single representation for an idea/type no matter what it was used for, across multiple projects and repositories. I was not dismissing appropriate validation at a boundary, just trying to point out that, at least in my experience, people get way too wound up about enforcing unnecessary consistency.
I understand and have done that before with XyzRequest classes. What I'm saying is the field shouldn't be an int. It's not the proper level of abstraction for what it represents, and tossing out, transforming, or hiding under layers of abstraction the data that comes into your system like this is a bad idea - it's the sort of thing that makes people hate OOP. Especially when perfectly fine alternatives exist for this.
Interesting. I'd say the proper level of abstraction for the incoming JSON is some structure that can represent all valid JSON. Then you convert from that JSON thing into the form you expect - in this case a CreateTaskRequest (with int fields and whatnot) - which would be the proper level of abstraction for the code that deals with creating tasks.
Is that substantially different from what you suggest?
I've caused myself a lot of problems by using/creating types that don't make sense. Coding myself into a corner. So I don't think it's fair to not count type declarations as code.
For any type A -> B operation, it's possible to fail - that is the whole point of parsing into type B, to catch when B's invariants would not be satisfied. The bug could be as simple as neglecting to handle that failure scenario.
Some languages make this less likely (Haskell, Rust) but most mainstream languages will happily let you introduce this bug.
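A Kotlin sketch of one way to make that bug harder to write, by forcing callers to acknowledge the failure case (names invented):

    // Return a sealed result so callers must pattern-match on both outcomes.
    sealed interface ParseResult<out B> {
        data class Ok<B>(val value: B) : ParseResult<B>
        data class Failed(val reason: String) : ParseResult<Nothing>
    }

    fun parseAge(raw: String): ParseResult<Int> {
        val n = raw.toIntOrNull() ?: return ParseResult.Failed("not a number: $raw")
        if (n < 0) return ParseResult.Failed("age cannot be negative: $n")
        return ParseResult.Ok(n)
    }

    fun handle(raw: String): Int = when (val result = parseAge(raw)) {
        is ParseResult.Ok -> result.value
        is ParseResult.Failed -> throw IllegalArgumentException(result.reason)
        // leaving out either branch is a compile error: a `when` used as an
        // expression over a sealed interface must be exhaustive
    }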