> Implement their methods from scratch (i.e. numpy not pytorch)
lol this is basically impossible and completely pointless. please show me a numpy implementation of BERT or CycleGAN or deformable convolutions (note that jax != numpy). it's like telling someone who wants to learn about virtual memory or scheduling to go implement a whole kernel.
better advice would be to take a paper, implement the model in pytorch without looking at the authors' implementation, and fiddle with that.
2. Implement their methods from scratch (i.e. numpy not pytorch)
3. Experiment a bit, tweaking the models/algs to gain intuition
4. Repeat 1-3