Schema for Clojure(Script) Data Shape Declaration and Validation (getprismatic.com)
130 points by sorenmacbeth on Sept 5, 2013 | 56 comments



I've been working on a similar idea for replacing types. If you can specify how to take apart a type, name the parts and put them back together you can then get validators, pattern matching, recursive-descent parsing with backtracking, generic traversals, lenses and generators more or less for free. By using simple data-structures to describe the types and compiling them to clojure at the call site you can have first-class types without being penalised in performance.

It's still work in progress but there are working examples as of https://github.com/jamii/strucjure/commit/e0e56a25c1b880c382...

    (using strucjure.sugar
      ;; define a pattern
      (def peano-pattern
        (graph num ~(or ~succ ~zero)
               succ (succ ~num)
               zero zero))
      (comment ;; desugars to 
        {'num (->Or (->Node 'succ) (->Node 'zero))
         'succ (list 'succ (->Node 'num))
         'zero 'zero})

      ;; define a view over that pattern
      (def peano->int
        (view peano-pattern {'succ (fnk [num] (inc num))
                           'zero (fnk [] 0)}))

      (peano->int 'zero) ;; => 0
      (peano->int '(succ (succ zero))) ;; => 2
      (peano->int '(succ (succ succ))) ;; => throws MatchFailure
    )
                           
It's similar to the old ideas at http://scattered-thoughts.net/blog/2012/12/04/strucjure-moti... but significantly simpler. I'm hoping to be able to release at least the core functionality in a few weeks.


Very cool. Also check out the implementation of encapsulation in John Shutt's Kernel Lisp: http://web.cs.wpi.edu/~jshutt/kernel.html


Interesting. I've seen his fexpr work before on LTU but never really looked at it closely. I suppose the only way to go is to read the whole thesis?


Nice! Really looking forward to the release!


The opening paragraphs really resonated with me including that code example that can be replaced with any function I wrote yesterday.

A lot of my time in Clojure is spent re-reading and repl-evaluating my function implementations just to remind myself what its symbols look like. Even code I wrote an hour ago. Often I stoop to the level of caching my findings in a docstring/comment.

    (defn alpha-map-sum-combining-thing
      "TODO: Think of better name.
       Example input: {:a 1 :b 2}
       Example output: {[:a :b] 3}"
      [alpha-map]
      ...)

Sometimes I can spot abstractions and patterns and protocols that unify a namespace of functions so that their inputs/outputs are obvious, but often I can't.

This kind of tool is essential for me.


I do this a lot:

"Often I stoop to the level of caching my findings in a docstring/comment."

I am not ashamed of this. When my co-workers have to read what I wrote, this kind of information makes clear what the function does. If I don't do this, then they will eventually figure it out by running it at the REPL and examining the input and the output, but if I document that in the docstring, I have saved them a lot of time. I also make it possible for them to actually read the code -- otherwise they can't really read it, except by repeating it at the REPL.


Yeah, you're right. I would love if every Clojure function came with an example input and output.

What I meant to suggest is that my docstring example is something that could be expressed in code.
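
As a minimal sketch of what "expressed in code" could look like with Schema: the example input and output from the docstring above become schemas on the argument and return value. This assumes the `s/defn` and `:-` syntax discussed in this thread and schema names from recent releases of the library (`s/Keyword`, `s/Num`, `s/with-fn-validation`). The function name and body are hypothetical, since the original body is elided.

```clojure
(require '[schema.core :as s])

;; Hypothetical stand-in for the elided function: maps the (sorted) vector
;; of keys to the sum of the values. The schemas replace the
;; "Example input / Example output" lines of the docstring.
(s/defn sum-by-keys :- {[s/Keyword] s/Num}
  [alpha-map :- {s/Keyword s/Num}]
  {(vec (sort (keys alpha-map))) (reduce + (vals alpha-map))})

;; Schemas are only checked when validation is switched on.
(s/with-fn-validation
  (sum-by-keys {:a 1 :b 2}))
;; => {[:a :b] 3}
```

With validation on, a call such as `(sum-by-keys {"a" 1})` should fail at the function boundary rather than somewhere inside the body.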


A friend of mine has been doing similar things with the Swagger documentation spec (https://developers.helloreverb.com/swagger/)

While browsing the docs in a web browser you can fill out some fields for input and see what the output for that input is.

My friend forked it and extracts and lists any example data so you can click to load that example into those fields (my idea!). I tried it out and it's a very nice way to learn/explore a new API, especially testing edge cases where the docs are ambiguous.

It really hits a sweet spot when you have documentation browsing, live execution and example data. I think creating a similar interface combining docstrings, the repl and example data from test fixtures would be a nice tool to have (I find doctests to be one of those things that are better in theory than in practice as they get too long and maintenance is annoying, although perhaps a more literate style in your source code would change that).


This would let you run tests, too: https://en.wikipedia.org/wiki/Doctest

On the other hand, why not just make it statically typed and list the types for each function?


You would think that, with most functions being pure, these kinds of docstrings would be more common (or even standard).


This is exactly what R does. It's a huge timesaver. Best. docs. ever.


May I suggest stealing some of core.typed's syntax for your docstrings, especially for polymorphic functions?

It scales very well, from simple types like clojure.core/symbol:

https://github.com/clojure/core.typed/blob/57da1175037dfd61c...

to polymorphic types like clojure.core/repeat:

https://github.com/clojure/core.typed/blob/57da1175037dfd61c...

to succinct representations of complicated polymorphic types like clojure.core/fnil

https://github.com/clojure/core.typed/blob/57da1175037dfd61c...

and I'll spare you the really insane annotations.

Of course you can pick and choose what's to your liking.

Here's a brief tutorial on type syntax: https://github.com/clojure/core.typed/wiki/Types


Thanks! I definitely have personally run into the issue a lot and our Clojure teams at Prismatic struggled with commenting discipline. We find schemas are easier to maintain and apply, plus they save a lot of time during dev and testing.


This has been my experience also, even as I'm simultaneously learning clojure and building my own webapp.


core.typed author here.

It might be easy to consider Schema as a "competitor" to core.typed. In fact, it's the opposite: once they play nicely together they will form a formidable bug-fighting, finely-documenting team. :)

core.typed has accurate compile time checking, and Schema gives an expressive contracts interface for runtime checking.

Once they understand each other, you can start pushing and pulling between static and dynamic checking by using both libraries to their strengths.

Currently, core.typed requires all vars have top level annotations. This is partly because there is no way to recover type information once inside a function body. However, if we have an entire namespace using Schema liberally, we can use schemas to recover information!

This means we can lean on schemas for most annotations, and rely on core.typed to catch arity mismatches, bad arguments to clojure.core functions, potential null-pointer exceptions and many more nasties at compile time.

Then you might start adding static annotations or removing schemas, depending on the kind of code you're dealing with. You might do some "static" debugging to ask whether a schema is needed to prevent a type error. core.typed would also let you know when your contracts are insufficient to rule out type errors. Really, you're free to use both tools as you'd like.

Schema looks very nice, thanks for open sourcing it Prismatic folks!


I've been really pumped about core.typed since I first heard about it, and I'm excited about the possibilities of combining (or eventually replacing) schema with core.typed.

I think core.typed has come a long way since last time I looked at it (when we started developing schema). Obviously it's a much bigger and more serious effort, and schema was about getting as much bang right away as we could for a few bucks.

I'd love to talk in more detail about how we can collaborate, or at least make sure we play nice together.


Fantastic, bring it on!


I wonder if there's some way to combine schema, core.typed and clj-schema in some way that is better than the sum of its parts. (https://github.com/runa-dev/clj-schema)

I regrettably haven't had much time to see where the cross-section between these libraries is exactly, being very occupied at work. But I'd rather work on one grand thing, than end up with disparate libraries that aren't as fully-featured.


Author here. Happy to answer any questions or comments.


This is a very good idea. And the reason I say that is because in our code base we have developed something similar, feeling the need for it :-). Our stuff is half-baked, though and I never got around to releasing it.

What I'd like to see is something similar to what we did — checking more than just types. Here's an example from our code:

    (conforms example-object
              [map :from string?
                   :to [map :required-keys {"products" [seq string?]
                                            "type" [seq #{"type1" "type2"}]}]])

This describes a map containing mappings from strings to maps of a certain kind (where "products" and "type" are required, "products" must map to a sequence of strings and "type" must map to a sequence of strings from a specific set).

From our "conforms" docstring:

Returns true if value conforms to typespec. typespec can be either:
- a string, which value must match literally;
- a predicate, which value must satisfy;
- [seq sub-typespec], which means that value must be a collection of elements, each of which must conform to sub-typespec;
- [map specifiers...], where optional specifiers may include:
  :from sub-typespec -- keys in the map must match sub-typespec;
  :to sub-typespec -- likewise for values;
  :required-keys [key1 typespec1 key2 typespec2...] -- map must include the required keys and their values must match the typespecs.

We don't use that for function contracts and it might be overkill for that, but I'm pretty certain I'd like to be able to specify more than just types for map keys. A list of possible values would be tremendously useful.

This could provide you with some ideas. I can of course contribute the "conforms" function (although it isn't that difficult to write).
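
For readers curious what such a function might look like, here is a rough sketch reconstructed only from the docstring and the example call above. It is not the commenter's actual code. It assumes that `seq` and `map` inside a typespec are the `clojure.core` functions themselves (compared by identity), that sets count as predicates, and that `:required-keys` may be given either as a map (as in the example) or as a flat vector (as in the docstring).

```clojure
(defn conforms
  "Sketch: returns true if value conforms to typespec."
  [value typespec]
  (cond
    ;; a string typespec must match literally
    (string? typespec) (= value typespec)

    ;; [seq sub] or [map specifiers...]; checked before ifn? because
    ;; vectors are themselves callable
    (vector? typespec)
    (let [[kind & args] typespec]
      (cond
        (= kind seq)
        (and (coll? value)
             (every? #(conforms % (first args)) value))

        (= kind map)
        (let [{:keys [from to required-keys]} (apply hash-map args)
              required (if (map? required-keys)
                         required-keys
                         (partition 2 required-keys))]
          (and (map? value)
               (or (nil? from) (every? #(conforms % from) (keys value)))
               (or (nil? to) (every? #(conforms % to) (vals value)))
               (every? (fn [[k spec]]
                         (and (contains? value k)
                              (conforms (get value k) spec)))
                       required)))

        :else false))

    ;; anything else callable (functions, sets) is treated as a predicate
    (ifn? typespec) (boolean (typespec value))
    :else false))
```

Called with the typespec from the example, this accepts `{"x" {"products" ["a"] "type" ["type1"]}}` and rejects a value whose "type" sequence contains a string outside the set.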


Thanks for the feedback!

Actually, schema can express arbitrary constraints. Your example translates to schema as:

  {String {(s/required-key "products") [String]
           (s/required-key "type") (s/enum "type1" "type2")
           s/Any s/Any}} ;; allow any other k-v pairs


Interesting! Is the enum automatically a vector/sequence, or is that just an omission in the example?

Looks like I'll be using this library sooner than I thought, thanks!


Nope, I missed that in the example -- the enum should be in square brackets.
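
Putting that correction together, the full schema might look roughly like the sketch below. It uses names from recent Schema releases (`s/Str` rather than the bare `String` class, and `s/check`, which returns nil when a value matches); the name `ProductIndex` and the sample data are made up for illustration.

```clojure
(require '[schema.core :as s])

;; Map from strings to maps with two required string keys;
;; "type" is now a sequence of enum values.
(def ProductIndex
  {s/Str {(s/required-key "products") [s/Str]
          (s/required-key "type")     [(s/enum "type1" "type2")]
          s/Any s/Any}}) ;; allow any other k-v pairs

;; s/check returns nil on success and a description of the mismatch otherwise.
(s/check ProductIndex {"shop" {"products" ["a" "b"] "type" ["type1"]}})
;; => nil

(s/check ProductIndex {"shop" {"products" ["a"] "type" ["type3"]}})
;; => non-nil: "type3" is not one of the enum values
```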


You mention generating core.typed annotations from schemas, to allow for some compile-time checking.

I would have thought that this would be extremely hard to automate for non-trivial schemas, or at least that core.typed would have a hard time proving that return values match a schema. Is this only for a subset?


core.typed will have to be extended to understand schemas. core.typed already plays nicely with assertions and branches (via occurrence typing), so we just need information on what type a schema "casts" to. This will not be difficult; it might not even have to live in core.typed's implementation, and could instead be extended via multimethods.

One of the problems with gradually annotating an untyped namespace with core.typed is we immediately have to be very accurate with our type annotations.

However if the namespace is completely "schema'ed", we can assume functions simply take `Any` as arguments and rely on Schema to regain type information.

This could be a very nice way to work.


Hey, I like what you're doing here, I think I'll use this... it gives you a lot of the benefits of a deep type system while being completely orthogonal to the design of the program and without the cognitive overhead of a complex type system.

Also, functional programming is heavily focused on data transformations, which in practice means lots of deeply nested heterogenous data structures... these types of structure are usually tedious to put into a static type system, but your system appears to make it easy.


Thanks! Yeah I think a traditional type system wouldn't make sense for Clojure, you need something that can describe data structures more succinctly, which is what we were shooting for


Just so readers don't deduce you're talking about core.typed, core.typed is plenty expressive enough to represent many idiomatic Clojure data structures, and very succinctly.

eg. Heterogeneous keyword maps https://github.com/frenchy64/core.typed-example/blob/master/...



Thank you for this. I might want to start using Clojure in production now :)


You're welcome :) Please let us know if you have ideas for improving it


I posted on the wrong level, my suggestions are here https://news.ycombinator.com/item?id=6339708


This is really brilliant, and damned useful. Reminds me of how type declarations provide self documenting code in Haskell. Thanks a lot!


Definitely an inspiration. The core utility of Schema is that the declarations are still data and can be utilized in code. I think we'll have some interesting applications of this idea soon...


Great stuff, thanks for releasing this. Definitely going to use it soon.


Thanks so much. We spent a long time thinking about how to have the benefits of types in a Clojure-y data-oriented way. The advantage of having Schemas as data is that we can use Clojure code to process them and generate things like Objective-C classes (https://github.com/Prismatic/schema/blob/d82bf0b049fc1205a81...) and Avro specifications. It will be our glue to declarations in other languages.


Interesting work. BTW, there's a typo: "shee here" should be "see here".


I assumed it was meant to be read in Sean Connery's voice.


What timing! I have been making this exact thing myself.

Would you be interested in pull requests making your api simpler? Maybe allow parameters to not have to have shapes? Perhaps a syntax that offers a simple way to put shapes in the meta, alongside the ability to change the signature for people who don't want to couple too tightly to the library? When I showed mine to my local user group last month, that was one of their biggest requests (that I was in the middle of on mine but would be happy to attempt in yours).

Check my (simpler and more immature) library here https://github.com/steveshogren/deft


Sure, we're happy to consider pull requests.

Schemas are already optional -- you can have a s/defn with no, some, or full schematization. You can also put the schemas in the meta if you don't like the :- syntax. I'm not quite sure what you mean by the other point though. My email is in my profile, let's talk :)


Also, what about having your defn macro generate a second function like foo-t that just calls with-fn-validation on foo, to clean up that extra call? Would you be interested in such a pull request?


I'd have to think about that a bit -- I'm hesitant to create names not provided by the user, but maybe there is another way.

Quite honestly that part of things (turning validation on and off, etc) isn't really done yet -- we have plans to make it much more flexible and powerful, but haven't got there yet.
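
One way to get a similar effect without generating new names inside the macro is an ordinary higher-order wrapper that the caller names explicitly. This is only a sketch, not part of Schema: `validated`, `add-one` and `add-one-t` are hypothetical, and it assumes `s/defn` and `s/with-fn-validation` behave as in recent releases.

```clojure
(require '[schema.core :as s])

(s/defn add-one :- s/Int
  [x :- s/Int]
  (inc x))

(defn validated
  "Returns a function that calls f with schema validation turned on."
  [f]
  (fn [& args]
    (s/with-fn-validation
      (apply f args))))

;; The caller chooses the name of the always-validated variant.
(def add-one-t (validated add-one))

(add-one-t 1)
;; => 2
;; (add-one-t "1") is rejected by the input schema before inc is reached
```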


I really like that your defn macro generates a real function, good call on that!


I feel your argument against just using Scala was weak, but you guys are obviously smart and feel building Schema was the right choice. So could you elaborate a bit on the decision to stick with Clojure when your problem domain lends itself to types?


My guess would be they want to continue using a Lisp-style language.


Very interesting. I have a somewhat related project called Herbert, which attempts to define a schema language for EDN.

https://github.com/miner/herbert


This looks like it occupies the same space as core.contracts and core.typed. Is the main benefit, over those two, cljs support?


I think there are a few benefits.

First and foremost, schemas are simple, minimal, easy to read and write, and gracefully extend Clojure's existing type hints. This means that (in my biased opinion) they are significantly better for documentation, which was the primary motivation for developing them.

Second, schemas are data, so it's easy to do more with them beyond documentation. Runtime data validation is one such use, but we can also easily do things like generate core.typed annotations, generate model classes for clients, generate test data, and so on.
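
A small illustration of the "schemas are data" point, as a sketch with made-up names (`User`, `Admin`) and schema names from recent releases: a map schema is an ordinary Clojure map, so it can be extended and inspected with core functions before being used for validation.

```clojure
(require '[schema.core :as s])

(def User
  {:name s/Str
   :age  s/Int})

;; Plain map operations derive a new schema from an existing one.
(def Admin
  (assoc User :permissions [s/Keyword]))

;; The schema can be inspected like any other map,
;; e.g. to drive code or documentation generation.
(set (keys Admin))
;; => #{:name :age :permissions}

(s/check Admin {:name "ada" :age 36 :permissions [:read :write]})
;; => nil
```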


Author of core.typed here.

I will challenge that schemas are "significantly" better for documentation than core.typed types.

With unions, intersections, heterogeneous maps, parameterised classes, recursive types we can be very expressive.

Here's some examples of the syntax for types: https://github.com/clojure/core.typed/wiki/Types

And some declarative types in action: https://github.com/frenchy64/core.typed-example/blob/master/...

I think core.typed is restrictive enough to allow annotations like the one above, while being opinionated enough to guide the programmer to write clear, comprehensible types.


Sorry, that didn't come across the way I wanted -- it certainly wasn't meant as a dig on the expressiveness or power of core.typed, which is a project I'm really excited about.

The main driver for Schema was to make annotating function inputs and outputs as simple and readable as possible. Personally, I find annotations directly on the function arguments easier to parse than separate function type declarations, but I suppose that's a matter of preference.


No problem, just was eager to clarify :)

FWIW there are macros like fn> that allow you to write (fn> :- Number [a :- Symbol] 1), but you lose the ability to write ordered function types with multiple cases, so I don't use it very much.


Newbie here, mostly familiar with OOP.

What is the difference between how Clojure and other functional programming languages declare types?


The major distinctions I see here, compared to say Haskell (and Scala, to the degree that Scala is equivalent to Hs), are that Schema are dynamic and first-class.

Dynamic means that their usefulness is tied somewhat to your ability to exercise code paths. Compare this to Hs's static types where the type logic of your entire program (and all dependent libraries) is checked upfront before compilation. Schemata likely must be triggered by a validation function being called on live code. Endless further argumentation about this distinction goes here.

First-class comes from Schema's dynamic nature as well but is worth further investigation. Schema look like they can be arbitrary functions of the arguments, much like inserting `assert`s at the beginning of a function and then flipping those on or off at a later time. They can also be composed/decomposed/analyzed as Clojure values. This vastly increases the flexibility and complexity of Schema for better and worse. You can express much more sophisticated invariants in your Schema than you can in Haskell types. It looks like it's even possible for these invariants to be value-based—a concept which, in static typing land, is deep into research territory.
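
To make the value-based point concrete, here is a hedged sketch using `s/constrained` from recent Schema releases (the release discussed in this thread may spell this differently). The invariant depends on the value itself, not just its class, and the result is still an ordinary value that can be nested in other schemas. The names `PositiveInt` and `Order` are invented for the example.

```clojure
(require '[schema.core :as s])

;; An integer that must also satisfy a predicate on its value.
(def PositiveInt
  (s/constrained s/Int pos? 'positive?))

;; Schemas compose as ordinary Clojure values.
(def Order
  {:item     s/Str
   :quantity PositiveInt})

(s/check Order {:item "widget" :quantity 3})
;; => nil

(s/check Order {:item "widget" :quantity 0})
;; => non-nil: the quantity fails the positive? constraint
```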

I'd say these contract-like invariant checkers are in a pretty different boat from static types. They check different classes of things at different times and make vastly different promises. What they both provide however, so long as your Schema don't get too complex, is some wonderful "living" documentation.


pretty cool - I've used something similar for python JSON validation:

https://code.google.com/p/jsonvalidator/

but love the idea of something for validating nested data structures as well.


I had come across another schema validation lib for Python on Github a while back (but never tried/used it):

https://github.com/halst/schema


This is excellent. I used to occasionally use assertions to accomplish the same thing.



