Doctree (github.com/sourcegraph)
144 points by tosh on April 30, 2022 | 36 comments



Whoa, wasn't expecting to see this on HN! I've only done a week of work on this, super early stages - here are our plans for it:

* 100% OSS tool, run locally on your machine (static Go binary) or use https://doctree.org (not online yet, plug in a repository name, get docs) - really want this to be a proper, useful FOSS tool.

* Work with any language, based on tree-sitter.

* Provide symbol-level search functionality.

* Surface real-world usage examples automatically, probably based on some statistical analysis of how functions are commonly used in open source code via Sourcegraph API, similar to what https://codestat.dev is doing.

Tech details (again, just a week in):

* Go for backend, Elm for frontend

* Indexers will be written in Go, use tree-sitter queries to produce a standard index schema which then gets served to Elm frontend for rendering. https://github.com/sourcegraph/doctree/blob/main/doctree/sch...
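To illustrate the "common schema" idea above, here is a minimal sketch in Python. It assumes each language indexer emits the same record shape so one frontend and one search index can consume all of them. Every class and field name here (`Symbol`, `Index`, `kind`, `docs`) is hypothetical and is not taken from doctree's actual schema file.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Symbol:
    # Hypothetical fields; doctree's real schema may differ.
    name: str
    kind: str       # e.g. "function" or "type"
    language: str   # e.g. "go" or "python"
    docs: str = ""


@dataclass
class Index:
    project: str
    symbols: List[Symbol] = field(default_factory=list)

    def search(self, query: str) -> List[Symbol]:
        """Case-insensitive substring match over symbol names."""
        q = query.lower()
        return [s for s in self.symbols if q in s.name.lower()]

    def to_json(self) -> str:
        """Serialize the whole index so a frontend could render it."""
        return json.dumps(asdict(self), indent=2)


# Two indexers (one per language) feeding the same index.
index = Index(
    project="example",
    symbols=[
        Symbol("FetchURL", "function", "go", "FetchURL downloads a page."),
        Symbol("fetch_url", "function", "python", "Download a page."),
    ],
)
matches = index.search("fetch")
```

Because both records share one shape, a single search call can return symbols from more than one language.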

Probably not worth trying out right now, but if you're interested in it we set up a Discord server for collaboration, etc.

https://discord.com/invite/vqsBW8m5Y8

Happy to answer any questions!


After a week of work, how much of this have you managed to accomplish so far? (Very little, I assume, but I could be wrong.)


Quite a lot, actually! It's definitely not ready for prime-time usage or anything, but I feel good about the week I've spent on it:

* There are some Docker commands you can use to try it out on some Go code right now[0] (still working on getting binary releases for each OS so Docker is not necessary)

* There's a not-too-bad frontend (written in Elm), screenshots in the README are real & it all functions!

* There's an indexer implemented for the Go language[1] that runs tree-sitter queries & emits a basic schema[2], the idea is each language would emit to a common schema like this and then the frontend can serve it, we can index it for search, etc.

So, I mean, yeah - just a week into it, but like - you can already view documentation for Go functions in it so moving quickly!

[0] https://github.com/sourcegraph/doctree#try-it-out-extremely-...

[1] https://github.com/sourcegraph/doctree/tree/main/doctree/ind...

[2] https://github.com/sourcegraph/doctree/blob/main/doctree/sch...


Impressive, thanks for clarifying!


Why is this open source? Sourcegraph could make this a successful paid, enterprise product.


Fair question! Really, the goal is just to find a way to make an actually useful library documentation tool like this that people enjoy using. Beyang (CTO), Quinn (CEO), and I have all wanted something like this for a while and think there's potential for something cool here.

It'll require _a lot_ of iteration to make this work really well, though, and make it something that everyone feels good about using *for their projects*. We don't want there to be barriers to using it, and if it were enterprise/paid it'd be tough to avoid them.

There will be features/functionality doctree _can_ gain if you connect it to a Sourcegraph instance, but largely because they'd be impossible to do otherwise:

* Usage examples - we need statistical analysis of a large corpus of open source code to find good real-world usage examples, so we'll leverage Sourcegraph for that (it already has that data).

* Respecting repository permissions, OAuth integration, etc.; very important in enterprise environments, super complex/annoying to do. Sourcegraph already has all this data about your GitHub/GitLab/Bitbucket repos, user accounts, etc. and so maybe you can one-click connect doctree to a Sourcegraph instance to gain this functionality if you're some large enterprise that needs it.

I think there are some great synergistic ways doctree will work with Sourcegraph if you use that (or are OK with it contacting Sourcegraph.com for public code, but very important to make that respectful / opt in.)

I want to be clear, though, doctree is 100% open source, it'll be a proper OSS project - just want to make a useful tool for everyone first and foremost.


Can someone explain why I was downvoted?


I interpreted your question as "How does it make sense for Sourcegraph to build this as an open source thing, and not as a paid product?" which I thought was a super reasonable question.

I think "Why is this open source?", though, is what got you down-voted because it implies it should be closed source, when folks obviously prefer open source.

Hope that helps!


I'm ecstatic that someone is building this.

Figuring out sourcecode-to-apidocs for one language is annoying, and figuring it out in the context of multi-language monorepos is exhausting. Then on top of that, I want to fail the build if someone adds public APIs to a library and doesn't document them! Now I have to go back to all my doc generators and get some kind of metadata out of them??????? And what if I want to make my docs pretty, and link to each other across languages?????? SFLSJHDFKJSDHKJF

So, this is great. A small dream, coming true. Best of luck to y'all!


I've been wanting this to exist for years, ever since using Elixir and the amazing first class doc support. I hope it is super successful!


I’m getting “server not found” when I click on the web page


I’m helping with the doctree effort at Sourcegraph. Apologies, the site isn’t actually up yet. This project is still very early stages and we wrote up the README to serve as a sort of launch spec that we could update in response to feedback we receive. We made the repo public so we could build in public but didn’t expect it to receive this much attention this early!

So sorry that the site isn’t up yet. We’ll update the README soon to reflect that. If folks are interested in trying out a super early version, there’s the Docker run command and if you’d like to help us realize this vision, please join our Discord! https://discord.gg/vqsBW8m5Y8


And it's not just DNS issues. The domain doesn't even seem to be registered: it's available for purchase...

edit: No longer! Hopefully someone benevolent picked it up


Sweet, a phishing opportunity!

    | Welcome to the doctree.dev demo site!
    |---
    | Enter your github oauth2 token to see it work


To be clear, we do own the domain. It was registered about 7 days ago. It's just not deployed yet.


May want to double check that: domain data shows it was only registered today about an hour ago: https://client.rdap.org/?type=domain&object=doctree.dev

It was definitely available to purchase when I commented


..huh, yep, you're right. That's.. super embarrassing and a huge screw-up on my part, ugh. I was 100% positive I submitted the order through Namecheap before pushing the repo up to GitHub for this exact reason, and that it went through.. but yeah, looks like I didn't and we don't own it. :(

Good news is we've got doctree.org, so will be using that instead. I've removed all references to the other domain.

If it was a good samaritan, shoot me an email -> stephen@sourcegraph.com


I love Namecheap, but numerous times I've walked away thinking I'd completed an order, only to find that there was a second order confirmation screen.


This was probably it.. luckily, it seems like it may have been a good samaritan from HN, they reached out to me just now. People here rock :)


Probably some developer misusing the .dev domain for their own personal test projects, yet again.


You mean "some dev deciding to keep using the .dev for personal test projects, as was the standard before ICANN and Google took a standard community resource and privatized it, yet again"


> as was the standard

Which standard was that? I thought only example.com was special-cased?


Before ICANN started auctioning off TLDs it was common practice to use .dev and .test (probably others that escape me).

It wasn't formalized, but that doesn't really matter. It was well known and commonly done.

In fact, it couldn't have been formalized, because the TLDs were limited and by definition any non standard TLD was for internal use only. It would make no sense to have a defined standard for an impossible situation.


> In fact, it couldn't have been formalized, because the TLDs were limited and by definition any non standard TLD was for internal use only.

No, there never was any guarantee that the existing TLDs were all that would exist ever, so non-standard TLDs were just that: non-standard, undefined what happens to them. And you even provided a counter-example: .test is explicitly reserved by an internet standard to never be in public DNS and thus safe to use for testing purposes.


Using .dev was always contrary to spec. Dunno how common it actually was—I personally never encountered it. Clearly ICANN decided it wasn’t such a hazard as .home and .corp, which are both indefinitely delayed (https://icannwiki.org/Name_Collision) due to their popularity (despite being contrary to spec). You should instead have used something like .localhost (reserved in RFC 2606) if it’s on your local machine, or .test (reserved in RFC 2606) in a local network, or some domain that you control (even if it’s not publicly routable).
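As a small illustration of the point above, here is a sketch that checks whether a hostname falls under one of the four top-level names RFC 2606 reserves (`.test`, `.example`, `.invalid`, `.localhost`). The function name `is_reserved_name` is made up for this example.

```python
# Top-level names reserved by RFC 2606; these never resolve in public DNS.
RESERVED_TLDS = {"test", "example", "invalid", "localhost"}


def is_reserved_name(hostname: str) -> bool:
    """Return True if the hostname's last label is an RFC 2606 reserved TLD."""
    # Drop a trailing root dot and compare case-insensitively.
    labels = hostname.rstrip(".").lower().split(".")
    return labels[-1] in RESERVED_TLDS


safe = is_reserved_name("myapp.test")      # reserved, safe for local use
unsafe = is_reserved_name("myapp.dev")     # a real, delegated TLD
```

A name like `myapp.dev` fails the check because `.dev` is an ordinary delegated TLD, which is why a local project using it can collide with a real domain.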


https://github.com/basecamp/pow

Pow used .dev and .test in the 2000s-2010s


That’s not a standard; that’s just someone using them.


Incidentally, I've been getting a lot of timeouts from .dev domains today.


Unless "tosh" is one of the project authors, I'd guess this was submitted to HN before an official launch. That is, I bet there are a lot of broken links from the website field in GitHub projects, and the only reason you expected this one to work was because it made the front page of HN and was in sourcegraph's GitHub org



https://mir-algorithm.dpldocs.info/mir.ndslice.html is a D-specific approach to something like this (it doesn't use the same approach as the built-in doc generator that ships with the D compiler; e.g. everything appears in the docs, and some things are simply noted as being undocumented)


This seems like a really slick idea.

Something I've always wanted is better multi-language documentation support. Like, suppose I have a C++ project that is integrated with Python via pybind11. The Python bindings may be the high-level interface, but Sphinx doesn't make it easy (as far as I know) to integrate Python documentation generated from docstrings with C++ Doxygen-style briefs, especially in a way that lets you navigate seamlessly between the two.

I wonder if you have considered a use-case like this?


Definitely want to enable this use case better. I think it will be quite easy to have a "you're viewing your project, you search for a function named 'curl fetch' and it turns up both the Python and C++ method named that"-type experience.

Linking between the two may be tougher (showing you with confidence they're related), but maybe possible for us to do something there, not sure yet.


Interesting - wonder if there's a way to combine this with fzf + some kind of html or e.g. markdown viewer to search and read docs straight in the terminal


Since `doctree` will be a static Go binary you can run locally, the idea is to have a "get docs straight in the terminal" via the CLI in the future. But probably a little ways away from that since we're just a week into it!


If the docs are generated from comments in the code a simple ripgrep search of the source (with fzf or other integration) will find things just as well too. Hound is a nice little web UI that does this: https://github.com/hound-search/hound
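A rough sketch of that idea in Python, as a stand-in for what ripgrep would do much faster: scan source text for comment lines that mention a query. The function name `search_comments` is hypothetical, and the pattern only recognizes `//` and `#` line comments.

```python
import re
from typing import Iterator, Tuple

# Matches a whole-line comment starting with // or #, capturing its text.
COMMENT = re.compile(r"^\s*(//|#)\s?(.*)$")


def search_comments(source: str, query: str) -> Iterator[Tuple[int, str]]:
    """Yield (line number, comment text) for comments containing the query."""
    q = query.lower()
    for lineno, line in enumerate(source.splitlines(), start=1):
        m = COMMENT.match(line)
        if m and q in m.group(2).lower():
            yield lineno, m.group(2)


source = "// FetchURL downloads a page.\nfunc FetchURL() {}\n# unrelated\n"
hits = list(search_comments(source, "downloads"))
```

This finds text in comments, but unlike a symbol-level index it has no notion of which declaration a comment belongs to.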



