If people are interested in this area and not already aware, there has been an Eclipse project that operates in this general area for a while. https://www.eclipse.org/recommenders/
I want something like this for math. Write an equation, or a definition and see a bunch of different 'versions' of that snippet and where they are being used.
This would help so much with understanding concepts and merging fields. It is way too common for different fields to independently "discover" some concept and be completely ignorant of all the work that has been done on that concept by some other field.
Interesting. In typical Facebook style, they do not attempt to fix the root problem (there's too much code, and much of it has been copy-pasted) but instead expend even more resources just to allow (or even encourage) it to proliferate. The effort would be far better expended on a tool to refactor out all that duplication, because they've created something that can clearly identify duplication.
It reminds me of how they hit a limit in the Android VM because their code had so many classes, and decided to work around it instead of reflecting (no pun intended) on how they ended up with so much code in the first place: https://news.ycombinator.com/item?id=5321634
Maybe on topic: I would like a voice-controlled system that works with me. For example, I say "I need a loop over a list" and the loop promptly appears in my text editor. Or "I need to open a file and read its contents". Or "Create an object ThisAndThat, with three properties", etc. Ideally it would even ask for more details, like what kind of list it is, or how the file should be read.
As a software engineer, a part of me dreads that day, as this may be the beginning of the end of our profession as it is. But when I think about it, the AI that can accomplish this will be trained on existing code, and not just static code: it will learn by looking at commits and how code evolves over time. So it is bound to also learn to write bugs, change its mind on design, refactor things that don't need refactoring, do premature optimization, rewrite it all in Go/Rust/the newest cool language on the block, and then get stuck because all of its questions got closed as non-constructive on Stack Overflow. So maybe we'll still have a job after all.
I think my idea is about rapid prototyping, where I build the skeleton of a program faster, without worrying about the boilerplate code, and then work out the details.
Intellisense or shortcuts do this up to a point, but the current big IDEs are limited. Maybe an editor built on the Vim concept of separate command and edit modes would be a better fit for working like that.
I've envisioned something like this even for written code. Basically unify the language and the editor, so that you could (theoretically) right-click on main() and say "add loop" and have the correct code auto-generated. Not because a mouse is somehow better (it's much worse in fact), but basically an editor/UI that only allows you to produce valid code.
Currently for most languages, we have: "type productions of a particular syntax and try really-really-hard to color between the lines, and subject yourself to the chinese-water-torture of syntax errors till YOU get better at it".
Why not invert that, whether via mouse input, a visual (as in literally visual, not Microsoft-visual) connection, or a text editor that simply doesn't let you type invalid productions? Like Intellisense, but taken to the function or block level. You cannot save the file or even leave insert mode until the code compiles. Or even better, you cannot even temporarily input invalid syntax. From the first keystroke, it inserts a variable declaration; click/type up or down to choose a function call, control structure, etc.
Some vim-like integration would go as follows:
command mode:
* <space>+F outputs a function called func1, auto-highlighted for you to rename (or accept default).
* <space>+R on the func name lets you set its return type
* <space>+A for args,
* <space>+B to edit the function body
At no point would you be allowed to input non-compiling syntax. Things like indentation would be non-issues, set uniformly by defaults.
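To make the idea concrete, here is a rough sketch of what those four commands could look like if the buffer were a syntax tree instead of text. Everything here is made up for illustration: the class name StructuredBuffer, the method names, and the choice of Python's standard ast module as the backing model are all assumptions, not an existing editor's API. The point is only that each command either produces a valid tree or is rejected.

```python
import ast


class StructuredBuffer:
    """Toy structure editor (hypothetical): commands edit an AST, never raw text."""

    def __init__(self):
        self.module = ast.parse("")  # empty but valid module

    def add_function(self, name="func1"):
        # <space>+F: insert a function with a placeholder body, valid immediately.
        if not name.isidentifier():
            raise SyntaxError(f"not a valid identifier: {name!r}")
        func = ast.parse(f"def {name}():\n    pass").body[0]
        self.module.body.append(func)
        return func

    def set_return_type(self, func, type_expr):
        # <space>+R: the annotation must parse as an expression.
        func.returns = ast.parse(type_expr, mode="eval").body

    def add_arg(self, func, arg_name):
        # <space>+A: reuse the parser to build a well-formed argument node.
        if not arg_name.isidentifier():
            raise SyntaxError(f"not a valid identifier: {arg_name!r}")
        template = ast.parse(f"def _({arg_name}):\n    pass").body[0]
        func.args.args.append(template.args.args[0])

    def set_body(self, func, source):
        # <space>+B: the new body replaces the old one only if it parses.
        new_body = ast.parse(source).body
        if not new_body:
            raise SyntaxError("a function body cannot be empty")
        func.body = new_body

    def render(self):
        # Indentation and layout come from the renderer, not from the user.
        return ast.unparse(self.module)


if __name__ == "__main__":
    buf = StructuredBuffer()
    f = buf.add_function()
    buf.set_return_type(f, "int")
    buf.add_arg(f, "x")
    buf.set_body(f, "return x + 1")
    print(buf.render())
```

This only guards against syntax errors (it needs Python 3.9+ for ast.unparse); a real version would also need type checking to honor the "never non-compiling" rule for a compiled language.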
Yes, that is awesome. Now imagine connecting that to a machine-learning backend and letting it slowly train itself on how to write software. Yes, I know ML doesn't need this Vim-type language specifically, but it should help by only feeding it valid productions.
As you expanded your database of common tasks, do you think that would eventually become a repository of “things I shouldn’t have to remember”, which could then be used to redesign languages?
Machine code instructions designed by hand are not necessarily the best fit for the code we actually generate. Similarly, might our approach to language design lack pragmatic insight as to which constructs should be favoured, adopted, simplified etc?
This is actually a good idea. Also, let's not forget Stack Overflow's huge "library" of quick solutions. Someone has to harness that vast knowledge source!
The most interesting (and I think difficult) approach here is properly representing the ASTs as vectors. There is a lot more possible when you get this right.
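For anyone wondering what "ASTs as vectors" can mean at its simplest, here is a minimal sketch using Python's own ast module. It is an assumption-laden toy, not Aroma's actual featurization: the features (node types plus parent>child type pairs) and the function names ast_features and cosine are invented for illustration. Because variable names never enter the features, two snippets with the same structure get identical vectors.

```python
import ast
from collections import Counter
from math import sqrt


def ast_features(source):
    """Sparse feature vector: counts of node types and parent>child type pairs."""
    tree = ast.parse(source)
    feats = Counter()
    for node in ast.walk(tree):
        parent = type(node).__name__
        feats[parent] += 1
        for child in ast.iter_child_nodes(node):
            feats[f"{parent}>{type(child).__name__}"] += 1
    return feats


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    loop_a = ast_features("for x in xs:\n    total += x")
    loop_b = ast_features("for item in items:\n    acc += item")
    klass = ast_features("class Foo:\n    pass")
    print(cosine(loop_a, loop_b), cosine(loop_a, klass))
```

The hard part the comment points at is exactly what this toy ignores: deciding which names, literals, and longer-range relationships should count as features.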
The primary use case I experience for searching for idiomatic usage patterns is to know how to do a higher level refactoring, meaning I don’t want results that have syntax tree similarity to what I’ve got or even the small bit I start from to create the query. I want the intention of my search query but expressed in a better design.
Separately, for very micro-level idiomatic things, like use of a certain data type operation or efficient constructor patterns, I need to search by natural language descriptions of the subtle differences between options. This is what makes Stack Overflow so helpful, the accompanying natural language description of intentionality or special cases, even if the code that is found isn’t precisely what’s needed, it demonstrates directionally what to do.
This tool seems like yet another example of trying to force machine learning solutions to problems nobody actually has.
Considering the idea that I’d need to integrate this into my coding environment, I’ll say No Thanks!
> This is what makes Stack Overflow so helpful, the accompanying natural language description of intentionality or special cases, even if the code that is found isn’t precisely what’s needed, it demonstrates directionally what to do.
You're entirely right, but if you're in an incredibly huge monorepo like Facebook, this information literally doesn't exist; that's part of the problem that Aroma is trying to solve - "how can we show people the Facebook App Way To Do That Thing, even if That Thing doesn't have current documentation"
(Disclaimer: I worked on the coding environment UX for Aroma)
Wouldn’t it make more sense to spend the effort annotating these things? Or building models to provide the annotation? I mean, I work professionally in embedding models for computer vision and NLP, and my reaction to the article is that this seems like totally the wrong approach. You’re putting all this effort into creating the embedding model out of the part that is both most superficial and least human-interpretable (the AST).
Building models for natural language _and_ code for either NL/intent-based code search or automatically annotating code is indeed another hot research area!
I'd argue Aroma solves a different problem in that it surfaces more idiomatic patterns based on the code you already have. This can also be important, especially in a production environment, when you need to do things "the right way".
Your website is the first I've heard of "Information Foraging" as a field of study. Absolutely fascinating. Any recommendations on where I might dive into the topic?
The paper applies IFT to software engineering, but IFT has also been applied to navigating websites or even physical offices. Use Scholar.Google.com to find a PDF of the paper if you don't have ACM access.
The related work cited in the CodeDeviant paper may help.
CodeDeviant itself is a tool to help programmers perform manual refactorings without unit tests (in a visual programming language), so it may not be helpful for you :)
Was anyone able to find a link to Aroma in that document? I found the colours made it very difficult to differentiate the links from the text and I couldn't find it.
A quick search through Facebook's profile on GitHub turned up nothing.
Not directly about the article but I am annoyed that the FB AI blog is very much a part of FB the social network. While reading the blog I got three useless notifications (boxes in the bottom left corner). The whole page has no other indication that I am logged into FB nor any option to log out.
Is this project open source? I would like to experiment with something like this to generate boilerplate code. Often when programming I copy something that already does kind of what I want, then modify it until it does exactly what I want.
From what I read, it's doing search and clustering on AST-based feature vectors. I'm a bit lost on the learning part: how does the system improve over time?
One danger I can see coming up is that if someone writes incorrect code it could end up propagating throughout other codebases. I guess this is still an issue without automatic tools, but I feel like this might make it easier…
Aroma would only surface what it thinks are "idiomatic" coding patterns. So if you have many instances of incorrect code, you might already be in trouble :)
If you discover a better pattern, it might be easier to convert across the application if the same pattern is followed everywhere. So you might consider this a win.
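A toy way to see both points at once: if "idiomatic" is approximated as "the structural shape that occurs most often", then whatever is most common wins, correct or not. The sketch below makes that assumption explicit. The names shape and most_common_pattern are hypothetical, grouping by exact anonymized AST dump is far cruder than real clustering, and none of this is taken from Aroma itself.

```python
import ast


def shape(source):
    """Structural fingerprint: the AST dump with names and literals anonymized."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            node.id = "_"  # variable and function names do not matter
        elif isinstance(node, ast.arg):
            node.arg = "_"
        elif isinstance(node, ast.Constant):
            node.value = type(node.value).__name__  # keep only the literal's type
    # Attribute names such as .read are kept, since they identify the API used.
    return ast.dump(tree)


def most_common_pattern(snippets):
    """Return (representative snippet, count) for the most frequent shape."""
    if not snippets:
        raise ValueError("empty corpus")
    groups = {}
    for snippet in snippets:
        groups.setdefault(shape(snippet), []).append(snippet)
    best = max(groups.values(), key=len)  # ties go to the shape seen first
    return best[0], len(best)


if __name__ == "__main__":
    corpus = [
        "with open(path) as fh:\n    text = fh.read()",
        "f = open(p)\ndata = f.read()",
        "with open(name) as src:\n    body = src.read()",
        "with open(p) as f:\n    data = f.read()",
    ]
    print(most_common_pattern(corpus))
```

Flip the proportions in the corpus and the version that never closes the file becomes the "recommended" one, which is the propagation risk raised above. The flip side is that a uniform shape is also easy to find and rewrite in bulk.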