I've been working on a similar idea for replacing types. If you can specify how to take apart a type, name the parts, and put them back together, you can then get validators, pattern matching, recursive-descent parsing with backtracking, generic traversals, lenses and generators more or less for free. By using simple data structures to describe the types and compiling them to Clojure at the call site, you can have first-class types without being penalised in performance.
The opening paragraphs really resonated with me including that code example that can be replaced with any function I wrote yesterday.
A lot of my time in Clojure is spent re-reading and repl-evaluating my function implementations just to remind myself what their symbols look like. Even code I wrote an hour ago. Often I stoop to the level of caching my findings in a docstring/comment.
(defn alpha-map-sum-combining-thing
  "TODO: Think of better name.
  Example input: {:a 1 :b 2}
  Example output: {[:a :b] 3}"
  [alpha-map]
  ...)
Sometimes I can spot abstractions and patterns and protocols that unify a namespace of functions so that their inputs/outputs are obvious, but often I can't.
"Often I stoop to the level of caching my findings in a docstring/comment."
I am not ashamed of this. When my co-workers have to read what I wrote, this kind of information makes clear what the function does. If I don't do this, then they will eventually figure it out by running it at the REPL and examining the input and the output, but if I document that in the docstring, I have saved them a lot of time. I also make it possible for them to actually read the code -- otherwise they can't really read it, except by repeating it at the REPL.
While browsing the docs in a web browser you can fill out some fields for input and see what the output for that input is.
My friend forked it and extracts and lists any example data so you can click to load that example into those fields (my idea!). I tried it out and it's a very nice way to learn/explore a new API, especially testing edge cases where the docs are ambiguous.
It really hits a sweet spot when you have documentation browsing, live execution and example data. I think creating a similar interface combining docstrings, the repl and example data from test fixtures would be a nice tool to have (I find doctests to be one of those things that are better in theory than in practice as they get too long and maintenance is annoying, although perhaps a more literate style in your source code would change that).
Thanks! I definitely have personally run into the issue a lot and our Clojure teams at Prismatic struggled with commenting discipline. We find schemas are easier to maintain and apply, plus they save a lot of time during dev and testing.
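For readers following along, here is a rough sketch of how the docstring example from earlier in the thread might look with schemas instead of comments. It assumes a reasonably recent release of Schema (schema.core). The schema names AlphaMap and CombinedSum are made up for illustration, and the function body is a guess, since the original elides it with "...".

```clojure
(require '[schema.core :as s])

;; Shapes taken from the "Example input" / "Example output" docstring lines.
;; AlphaMap and CombinedSum are hypothetical names, not part of Schema.
(def AlphaMap    {s/Keyword s/Num})
(def CombinedSum {[s/Keyword] s/Num})

(s/defn alpha-map-sum-combining-thing :- CombinedSum
  [alpha-map :- AlphaMap]
  ;; Guessed body: collect the keys into one vector and sum the values.
  {(vec (sort (keys alpha-map))) (reduce + (vals alpha-map))})

;; Validation is off by default, so turn it on around the call.
(s/with-fn-validation
  (alpha-map-sum-combining-thing {:a 1 :b 2}))
```

The input and output shapes now live in the signature, so they cannot silently drift away from the code the way a comment can.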
It might be easy to consider Schema as a "competitor" to core.typed. In fact, it's the opposite: once they play nicely together they will form a formidable bug-fighting, finely-documenting team. :)
core.typed has accurate compile time checking, and Schema gives an expressive contracts interface for runtime checking.
Once they understand each other, you can start pushing and pulling between static and dynamic checking by using both libraries to their strengths.
Currently, core.typed requires all vars have top level annotations. This is partly because there is no way to recover type information once inside a function body. However, if we have an entire namespace using Schema liberally, we can use schemas to recover information!
This means we can lean on schemas for most annotations, and rely on core.typed to catch arity mismatches, bad arguments to clojure.core functions, potential null-pointer exceptions and many more nasties at compile time.
Then you might start adding static annotations or removing schemas, depending on the kind of code you're dealing with. You might do some "static" debugging to ask whether a schema is needed to prevent a type error. core.typed would also let you know when your contracts are insufficient to rule out type errors. Really, you're free to use both tools as you'd like.
Schema looks very nice, thanks for open sourcing it Prismatic folks!
I've been really pumped about core.typed since I first heard about it, and I'm excited about the possibilities of combining (or eventually replacing) schema with core.typed.
I think core.typed has come a long way since last time I looked at it (when we started developing schema). Obviously it's a much bigger and more serious effort, and schema was about getting as much bang right away as we could for a few bucks.
I'd love to talk in more detail about how we can collaborate, or at least make sure we play nice together.
I wonder if there's a way to combine schema, core.typed and clj-schema into something that is better than the sum of its parts. (https://github.com/runa-dev/clj-schema)
I regrettably haven't had much time to see where the cross-section between these libraries is exactly, being very occupied at work. But I'd rather work on one grand thing, than end up with disparate libraries that aren't as fully-featured.
This is a very good idea. And the reason I say that is because in our code base we have developed something similar, feeling the need for it :-). Our stuff is half-baked, though, and I never got around to releasing it.
What I'd like to see is something similar to what we did — checking more than just types. Here's an example from our code:
This describes a map containing mappings from strings to maps of a certain kind (where "products" and "type" are required, "products" must map to a sequence of strings and "type" must map to a sequence of strings from a specific set).
From our "conforms" docstring:
Returns true if value conforms to typespec.
typespec can be either:
- a string, which value must match literally;
- a predicate, which value must satisfy;
- [seq sub-typespec], which means that value must be a
  collection of elements, each of which must conform to
  sub-typespec;
- [map specifiers...], where optional specifiers may include:
    :from sub-typespec -- keys in the map must match sub-typespec;
    :to sub-typespec -- likewise for values;
    :required-keys [key1 typespec1 key2 typespec2...] -- map must
      include the required keys and their values must match the
      typespecs.
We don't use that for function contracts and it might be overkill for that, but I'm pretty certain I'd like to be able to specify more than just types for map keys. A list of possible values would be tremendously useful.
This could provide you with some ideas. I can of course contribute the "conforms" function (although it isn't that difficult to write).
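For anyone curious, a minimal sketch of what a function matching that docstring might look like in plain Clojure. This is not the poster's actual code. The argument order (typespec first), the use of the keywords :seq and :map as tags, the helper name map-conforms?, and the example values "book" and "toy" are all assumptions made for illustration.

```clojure
(declare conforms)

(defn- map-conforms?
  "Checks a value against the specifiers of a [:map ...] typespec."
  [value specifiers]
  (let [{:keys [from to required-keys]} (apply hash-map specifiers)]
    (and (map? value)
         ;; :from and :to are optional; when absent, keys/values are unchecked.
         (every? #(or (nil? from) (conforms from %)) (keys value))
         (every? #(or (nil? to) (conforms to %)) (vals value))
         ;; :required-keys is a flat [key typespec key typespec ...] vector.
         (every? (fn [[k spec]]
                   (and (contains? value k)
                        (conforms spec (get value k))))
                 (partition 2 required-keys)))))

(defn conforms
  "Returns true if value conforms to typespec (see the docstring above)."
  [typespec value]
  (boolean
   (cond
     (string? typespec) (= typespec value)
     (vector? typespec) (let [[kind & more] typespec]
                          (case kind
                            :seq (and (coll? value)
                                      (every? #(conforms (first more) %) value))
                            :map (map-conforms? value more)
                            false))
     ;; Anything callable is treated as a predicate; a set works as
     ;; "one of these values".
     (ifn? typespec) (typespec value)
     :else false)))

;; A spec shaped like the prose description: strings map to maps that
;; require "products" (seq of strings) and "type" (seq of allowed strings).
(def catalog-spec
  [:map
   :from string?
   :to [:map
        :required-keys ["products" [:seq string?]
                        "type"     [:seq #{"book" "toy"}]]]])

(conforms catalog-spec {"shelf-1" {"products" ["p1" "p2"] "type" ["book"]}})
```

Using a set as the predicate gives the "list of possible values" check mentioned above without any extra machinery.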
You mention generating core.typed annotations from schemas, to allow for some compile-time checking.
I would have thought that this would be extremely hard to automate for non-trivial schemas, or at least that core.typed would have a hard time proving that return values match a schema. Is this only for a subset?
core.typed will have to be extended to understand schemas. core.typed already plays nicely with assertions and branches (via occurrence typing), so we just need information on what type a schema "casts" to. This will not be difficult; it might not even have to live in core.typed's implementation, and could instead be added as an extension via multimethods.
One of the problems with gradually annotating an untyped namespace with core.typed is we immediately have to be very accurate with our type annotations.
However if the namespace is completely "schema'ed", we can assume functions simply take `Any` as arguments and rely on Schema to regain type information.
Hey, I like what you're doing here, I think I'll use this... it gives you a lot of the benefits of a deep type system while being completely orthogonal to the design of the program and without the cognitive overhead of a complex type system.
Also, functional programming is heavily focused on data transformations, which in practice means lots of deeply nested heterogenous data structures... these types of structure are usually tedious to put into a static type system, but your system appears to make it easy.
Thanks! Yeah, I think a traditional type system wouldn't make sense for Clojure; you need something that can describe data structures more succinctly, which is what we were shooting for.
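To make the "deeply nested data" point concrete, here is a small sketch assuming a reasonably recent release of Schema. The Order shape is invented purely for illustration and does not come from the thread.

```clojure
(require '[schema.core :as s])

;; A hypothetical nested shape: the schema mirrors the data it describes.
(def Order
  {:id       s/Int
   :customer {:name                   s/Str
              (s/optional-key :email) s/Str}
   :items    [{:sku  s/Str
               :qty  s/Int
               :tags #{s/Keyword}}]})

(def good-order
  {:id 1
   :customer {:name "Ann"}
   :items [{:sku "A-1" :qty 2 :tags #{:gift}}]})

;; s/check returns nil when the value matches, and a description of the
;; mismatches otherwise.
(s/check Order good-order)
(s/check Order (assoc-in good-order [:items 0 :qty] "2"))
```

The second call reports the problem at the nested path rather than throwing, which is handy at the REPL.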
Just so readers don't deduce you're talking about core.typed, core.typed is plenty expressive enough to represent many idiomatic Clojure data structures, and very succinctly.
Definitely an inspiration. The core utility of Schema is that the declarations are still data and can be utilized in code. I think we'll have some interesting applications of this idea soon...
Thanks so much. We spent a long time thinking about how to have the benefits of types in a Clojure-y data-oriented way. The advantage of having Schemas as data is that we can use Clojure code to process them and generate things like Objective-C classes (https://github.com/Prismatic/schema/blob/d82bf0b049fc1205a81...) and Avro specifications. It will be our glue to declarations in other languages.
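As a rough illustration of the "schemas are data" idea (not Prismatic's actual generator), here is a plain-Clojure sketch that walks a simplified schema-like map and emits Avro-shaped data. It assumes the leaves are plain Java classes. The names avro-primitives, ->avro-type and ->avro-record are hypothetical, and only a handful of leaf types are handled.

```clojure
;; Hypothetical mapping from leaf classes to Avro primitive type names.
(def avro-primitives
  {String "string", Long "long", Double "double", Boolean "boolean"})

(declare ->avro-type)

(defn ->avro-record
  "Turns a map of keyword -> schema into an Avro-style record description."
  [record-name schema]
  {:type   "record"
   :name   record-name
   ;; Sort fields by key so the output order is deterministic.
   :fields (vec (for [[k v] (sort-by key schema)]
                  {:name (name k)
                   :type (->avro-type (str record-name "_" (name k)) v)}))})

(defn ->avro-type
  "Maps are nested records, one-element vectors are arrays, and
  everything else must be a known leaf class."
  [context-name schema]
  (cond
    (map? schema)    (->avro-record context-name schema)
    (vector? schema) {:type "array"
                      :items (->avro-type context-name (first schema))}
    :else            (or (avro-primitives schema)
                         (throw (ex-info "Unsupported leaf schema"
                                         {:schema schema})))))

(->avro-record "User" {:name String :age Long :tags [String]})
```

Because the input is an ordinary map, the whole generator is a recursive walk with no macros or reflection involved.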
What timing! I have been making this exact thing myself.
Would you be interested in pull requests making your API simpler? Maybe allow parameters to not have to have shapes? Perhaps allowing a syntax that gives a simple way to put shapes in the meta, alongside the ability to change the signature, for people who don't want to couple too tightly to the library? When I showed mine to my local user group last month, that was one of their biggest requests (that I was in the middle of on mine but would be happy to attempt in yours).
Schemas are already optional -- you can have a s/defn with no, some, or full schematization. You can also put the schemas in the meta if you don't like the :- syntax. I'm not quite sure what you mean by the other point though. My email is in my profile, let's talk :)
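A short sketch of what "no, some, or full schematization" can look like, assuming a reasonably recent release of Schema. The function names here are invented for illustration.

```clojure
(require '[schema.core :as s])

;; No schemas: behaves like an ordinary defn.
(s/defn plain-add [a b]
  (+ a b))

;; Some: only the first argument is annotated.
(s/defn scale [factor :- s/Num xs]
  (mapv #(* factor %) xs))

;; Full: input and output are both annotated.
(s/defn total :- s/Num
  [xs :- [s/Num]]
  (reduce + xs))

;; Validation is off by default, so a bad call is not caught here:
;; reduce over a one-element vector returns that element untouched.
(total ["a"])   ; => "a"

;; Inside with-fn-validation the same call throws, because the
;; input does not match [s/Num].
(try
  (s/with-fn-validation (total ["a"]))
  (catch Exception e
    (println "validation failed:" (.getMessage e))))
```

Leaving validation off by default keeps the annotations free in production, while tests can wrap calls in with-fn-validation to turn them into real checks.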
Also, what about having your defn macro generate a second function like foo-t that just calls with-fn-validation on foo, to clean up that extra call? Would you be interested in such a pull request?
I'd have to think about that a bit -- I'm hesitant to create names not provided by the user, but maybe there is another way.
Quite honestly that part of things (turning validation on and off, etc) isn't really done yet -- we have plans to make it much more flexible and powerful, but haven't got there yet.
I feel your argument against just using Scala was weak, but you guys are obviously smart and feel building Schema was the right choice. So could you elaborate a bit on the decision to stick with Clojure when your problem domain lends itself to types?
First and foremost, schemas are simple, minimal, easy to read and write, and gracefully extend Clojure's existing type hints. This means that (in my biased opinion) they are significantly better for documentation, which was the primary motivation for developing them.
Second, schemas are data, so it's easy to do more with them beyond documentation. Runtime data validation is one such use, but we can also easily do things like generate core.typed annotations, generate model classes for clients, generate test data, and so on.
I think core.typed is restrictive enough to allow annotations like the one above, while being opinionated enough to guide the programmer to write clear, comprehensible types.
Sorry, that didn't come across the way I wanted -- it certainly wasn't meant as a dig on the expressiveness or power of core.typed, which is a project I'm really excited about.
The main driver for Schema was to make annotating function inputs and outputs as simple and readable as possible. Personally, I find annotations directly on the function arguments easier to parse than separate function type declarations, but I suppose that's a matter of preference.
FWIW there are macros like fn> that allow you to write (fn> :- Number [a :- Symbol] 1), but you lose the ability to write ordered function types with multiple cases, so I don't use it very much.
Seems like the major distinctions I see here compared to, say, Haskell (and Scala to the degree that Scala is equivalent to Hs) are that Schema are dynamic and first-class.
Dynamic means that their usefulness is tied somewhat to your ability to exercise code paths. Compare this to Hs's static types where the type logic of your entire program (and all dependent libraries) is checked upfront before compilation. Schemata likely must be triggered by a validation function being called on live code. Endless further argumentation about this distinction goes here.
First-class comes from Schema's dynamic nature as well but is worth further investigation. Schema look like they can be arbitrary functions of the arguments, much like inserting `assert`s at the beginning of a function and then flipping those on or off at a later time. They can also be composed/decomposed/analyzed as Clojure values. This vastly increases the flexibility and complexity of Schema for better and worse. You can express much more sophisticated invariants in your Schema than you can in Haskell types. It looks like it's even possible for these invariants to be value-based—a concept which, in static typing land, is deep into research territory.
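A small sketch of both points, value-based invariants and schemas manipulated as ordinary Clojure values. It assumes a recent release of Schema that provides s/constrained. All the schema names below are invented for illustration.

```clojure
(require '[schema.core :as s])

;; A value-based invariant on a single value.
(def PositiveInt (s/constrained s/Int pos? 'positive?))

;; An invariant relating two fields of the same map.
(def IntRange
  (s/constrained {:lo s/Int :hi s/Int}
                 (fn [{:keys [lo hi]}] (<= lo hi))
                 'lo<=hi))

;; Schemas are plain data, so ordinary map functions compose them.
(def Base      {:id PositiveInt :name s/Str})
(def WithEmail (assoc Base (s/optional-key :email) s/Str))
(def NameOnly  (select-keys Base [:name]))

(s/check IntRange {:lo 1 :hi 5})   ; matches, so this is nil
(s/check IntRange {:lo 5 :hi 1})   ; a non-nil error description
```

The flip side, as noted above, is that nothing stops a predicate from being arbitrarily expensive or opaque, so these checks say little until the code path is actually exercised.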
I'd say these contract-like invariant checkers are in a pretty different boat from static types. They check different classes of things at different times and make vastly different promises. What they both provide however, so long as your Schema don't get too complex, is some wonderful "living" documentation.
It's still a work in progress but there are working examples as of https://github.com/jamii/strucjure/commit/e0e56a25c1b880c382...
It's similar to the old ideas at http://scattered-thoughts.net/blog/2012/12/04/strucjure-moti... but significantly simpler. I'm hoping to be able to release at least the core functionality in a few weeks.