> just mount the remote filesystem locally using FUSE somehow
This is the step that never works consistently for me. There is always some amount of random extra latency that makes this workflow painful. I work with some extremely large data files, so random access to these is the primary issue.
In general, the idea is that it is often better to do compute where the data already is. My experience is that you should also do the programming close to where the data is. This tends to make the iterative development loop tighter.
But this is highly dependent upon what you’re doing.
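To put rough numbers on the latency point, here is a back-of-the-envelope sketch. The per-read latencies and the read count are made-up assumptions for illustration, not measurements, and the model ignores caching, read-ahead, and parallel requests.

```python
# Rough model: total time spent waiting on sequentially issued random reads.
# All numbers below are hypothetical, chosen only to show the scaling.

def total_wait_seconds(num_reads: int, latency_ms: float) -> float:
    """Time spent waiting if each random read pays the full latency once."""
    return num_reads * latency_ms / 1000.0

NUM_READS = 100_000        # assumed number of random seeks in one analysis pass
LOCAL_LATENCY_MS = 0.1     # assumed latency of a read on local storage
REMOTE_LATENCY_MS = 20.0   # assumed round trip through a network mount

local_wait = total_wait_seconds(NUM_READS, LOCAL_LATENCY_MS)
remote_wait = total_wait_seconds(NUM_READS, REMOTE_LATENCY_MS)

print(f"local:  {local_wait:.0f} s")
print(f"remote: {remote_wait:.0f} s")
```

Under those assumptions the same access pattern goes from seconds to tens of minutes, which is why a per-read delay that looks small can dominate an interactive loop.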
That's a different thing, though. You don't edit the data in a text editor interactively, do you? I would do any interactive editing with a local editor and then fire off remote processes to operate on the data.
It's funny because my reasons against using a text editor remotely are exactly the same: to make the development loop tighter. I am very upset by latency and always try to remove it where possible. I think this is the kind of thing where we'd need to look over each other's shoulders to understand our respective workflows.
> You don't edit the data in a text editor interactively, do you?
That’s exactly what I’m doing. The code is written on the remote server. VSCode’s remote setup is actually very good at this, mainly because it is really a web editor that is hosted remotely, and you use a local browser (Electron) to interact with it. The processing loop then happens entirely on the remote side.
But really, I’m talking more about data analysis, exploration, or visualization work. This is when I need to have good (random) access to hundreds of GB of data (genomics data, not ML). For these programs, having the full dataset present during development is very important.
If I’m working on more traditional programming projects, I can work locally and then sync, but recently I’ve been using more Docker-based devcontainers. These are great for setting up projects to run wherever, and even in this case, the Docker containers could be hosted remotely or locally (or more accurately in a VM).
Yeah I used to work with genomics data and never did I think I needed to have part of my text editor running on the high performance cluster.
I think people are just talking about different things and confusing each other. The original comment I replied to was arguing against SSHing in (or using VNC or something) and running the text editor there. VSCode isn't doing that. It is running the interactive part locally. It's hard for me to understand why it needs a server part, though. If you want to edit something locally it has to send it across the network. There's no way around it. It seems like six of one and half a dozen of the other.