
A lovely knot to unravel!

First, get everything in source control!

Next, make it possible to spin service up locally, pointing at production DB.

Then, get the db running locally.
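
One low-ceremony way to do the local service and local DB steps, assuming the stack is PHP + MySQL; the image tag, password, and path are placeholders:

    # sketch: run a throwaway DB and the app locally; names and versions are assumptions
    docker run -d --name legacy-db -p 3306:3306 \
      -e MYSQL_ROOT_PASSWORD=devpass mysql:8.0

    # PHP's built-in web server is enough for poking at the app during this phase
    php -S localhost:8080 -t /path/to/checkout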

Then get another server and set up CD to that server, including creating the DB, schema, and sample data.

Then add tests that run on each PR, then code review, then auto-deploy to the new server.
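
As a rough sketch of that pipeline, assuming GitHub Actions and PHPUnit; the PHP version, paths, and the rsync deploy line are placeholders for whatever actually fits the setup:

    # .github/workflows/ci.yml -- minimal sketch, names are assumptions
    name: ci
    on:
      pull_request:
      push:
        branches: [main]
    jobs:
      test:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: shivammathur/setup-php@v2
            with:
              php-version: '8.2'
          - run: composer install --no-interaction
          - run: vendor/bin/phpunit
      deploy:
        needs: test
        if: github.ref == 'refs/heads/main'
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          # placeholder: replace with however code actually reaches the new server
          - run: rsync -az --delete ./ deploy@new-server:/var/www/app/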

This should stop the bleeding… no more index-new_2021-test-john_v2.php

Add tests and start deleting code.

Spin up a production server, load balance to it. When confident it works, blow away the old one and redeploy to it. Use the new server for blue/green deployments.
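
If nginx is already in front (as mentioned elsewhere in the thread), the blue/green cutover can be as simple as switching which upstream the proxy points at. A rough sketch, with made-up addresses:

    # sketch: blue/green via nginx upstreams; addresses are placeholders
    upstream app_blue  { server 10.0.0.10:8080; }
    upstream app_green { server 10.0.0.11:8080; }

    server {
        listen 80;
        location / {
            # point at app_blue or app_green, then reload nginx to cut over
            proxy_pass http://app_blue;
            proxy_set_header Host $host;
        }
    }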

Write more tests for pages, clean up more code.

Pick a framework and use it for new pages; rewrite old pages only when major functionality changes. Don’t worry about multiple jQuery versions on a page, the lack of MVC, or the lack of a framework, unless you’re overhauling that page.



I largely agree with this approach, but with 2 important changes:

1) "Next, make it possible to spin service up locally, pointing at production DB."

Do this, but NOT pointing at production DB. Why? You don't know if just spinning up the service causes updates to the database. And if it does, you don't want to risk corruption of production DB. This is too risky. Instead, make a COPY of the production DB and spin up locally against the COPY.

2) OP mentions team of 3 with all of them being junior. Given the huge mess of things, get at least 1 more experienced engineer on the team (even if it's from another team on loan). If not, hire an experienced consultant with a proven track record on your technology stack. What? No budget? How would things look when your house of cards comes crashing down and production goes offline? OP needs to communicate how dire the risk is to upper management and get their backing to start fixing it immediately.


Yeah, having the experimental codebase pointing at the production database sounds like fun. I did that. We had a backup. I'm still alive.


This is the right way to think about it. My only disagreement is that I'd do the local DB before the local service. A bunch of local versions of the service pointing at the production DB sounds like a time bomb.

And it's definitely worth emphasizing that having no framework, MVC, or templating library is not a real problem. Those things are nice if you're familiar with them, but if the team is familiar with 2003 vintage PHP, you should meet them there. That's still a thing you can write a website in.


> if the team is familiar with 2003 vintage PHP, you should meet them there. That's still a thing you can write a website in.

You can write a website in it, but you cannot test it for shit.


If this is true, OP can consider writing tests for the website using a frontend testing tool like Cypress, especially with access to local instances connected to local databases.
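
For example, a few Cypress smoke tests pointed at the local instance can lock in the critical flows before any refactoring starts. A sketch; the port, page names, and selectors are made up:

    // cypress/e2e/smoke.cy.js -- sketch; URLs and selectors are assumptions
    describe('critical pages', () => {
      it('loads the login page', () => {
        cy.visit('http://localhost:8080/login.php');
        cy.get('form').should('exist');
      });

      it('creates a record and sees it listed', () => {
        cy.visit('http://localhost:8080/orders.php');
        cy.get('input[name="customer"]').type('Test Customer');
        cy.get('form').submit();
        cy.contains('Test Customer');
      });
    });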


There's no value to retroactive unit testing. Retroactive tests should be end-to-end or integration level, which you certainly can do without a framework.
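
Even with no framework in the application code, an integration test can just hit the locally running pages over HTTP and assert on the response. A minimal sketch, assuming PHPUnit and a local instance on port 8080; the page name is made up:

    <?php
    // tests/PagesSmokeTest.php -- sketch; URL and page names are assumptions
    use PHPUnit\Framework\TestCase;

    final class PagesSmokeTest extends TestCase
    {
        private const BASE = 'http://localhost:8080';

        /** The page should respond and render without a fatal error. */
        public function testOrdersPageRenders(): void
        {
            $html = file_get_contents(self::BASE . '/orders.php');
            $this->assertNotFalse($html);
            $this->assertStringNotContainsString('Fatal error', $html);
            $this->assertStringContainsString('<table', $html);
        }
    }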


Frameworks are not needed to test. I was testing and validating my code way back, in C. Not because I was an early adopter (I'm still not), but because I needed to debug it, and tests made that faster.


Good strategy. I would suggest not hooking it up to the prod DB at the start. Rather, script out something that restores prod DB backups nightly to a staging env. That way you can hook up non-prod instances to it and keep testing while the other engineers continue with what they do, until you can do the flip-over as suggested. The key here is always having a somewhat up-to-date DB that matches prod but isn't prod, so you don't step on toes and have time to figure this out.
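
A minimal sketch of that nightly restore, assuming MySQL; the hosts, credentials, and database names are placeholders (and see the PII-scrubbing caveat elsewhere in the thread):

    #!/usr/bin/env bash
    # restore-staging.sh -- sketch; hosts, users, and DB names are assumptions
    set -euo pipefail

    # dump last night's production data (or reuse an existing backup file)
    mysqldump --single-transaction -h prod-db -u readonly_user -p"$PROD_PW" appdb \
      > /backups/appdb-$(date +%F).sql

    # load it into staging, replacing whatever was there
    mysql -h staging-db -u staging_user -p"$STAGING_PW" \
      -e "DROP DATABASE IF EXISTS appdb; CREATE DATABASE appdb;"
    mysql -h staging-db -u staging_user -p"$STAGING_PW" appdb \
      < /backups/appdb-$(date +%F).sql

    # cron entry to run it nightly at 02:00:
    # 0 2 * * * /opt/scripts/restore-staging.sh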

Note that going from no source control to the first CD instance in prod is going to take time... so assume you need a rollout strategy that won't block the other engineers.

Considering what sounds like reluctance to change, the switch to source control is also going to be hard. You might want to consider scripting something that takes the prod code and dumps it into source control automatically, until you have prod CD going... after that, the engineers switch over to your garden-variety commit-based reviews and manually triggered prod deploys.

Good luck! It sounds like an interesting problem.


> Next, make it possible to spin service up locally, pointing at production DB.

I think this is bad advice, just skip it.

I would make a fresh copy of the production DB, remove PII if/where necessary and then work from a local DB. Make sure your DB server version is the same as on prod, same env etc.
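
As a sketch of the PII scrubbing, something like the following could run against the copy right after the restore; the table and column names are entirely made up:

    -- sketch: anonymize PII in the copied DB; table/column names are assumptions
    UPDATE users
    SET email     = CONCAT('user', id, '@example.test'),
        full_name = CONCAT('User ', id),
        phone     = NULL;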

You never know what type of routines you trigger when testing out things - and you do not want to hit the prod DB with this.


I am inclined to agree. The other advice was excellent, but pointing local instances to production databases is a footgun.


I've kind of reconsidered this a bit. Right now, the only way to test that the database and frontend interact properly is to visit the website and enter data and see it reflected either in the database or in the frontend.

It's less terrible to have a local instance that does the same thing. As long as the immediate next step is setting up and running a local database.


But the thing is, you have no idea whether even a single GET request fires off an internal batch job that does X on the DB.

I mean, there are plenty of systems that do this somehow (WordPress cron, I think), so it's not unheard of.

For me, still a nope: do not run against the prod DB, especially if the live system accounts for 20M in yearly revenue.


Agree with this approach. You have nginx in front of it already so you can replace one page at a time without replacing everything.
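
Concretely, per-path routing in the existing nginx lets rewritten pages be served by a new app while everything else keeps hitting the legacy PHP. A sketch; the paths and upstream addresses are made up:

    # sketch: strangler-fig routing in the existing nginx; names are placeholders
    upstream legacy_app { server 127.0.0.1:8080; }   # existing PHP app
    upstream new_app    { server 127.0.0.1:3000; }   # rewritten pages

    server {
        listen 80;

        # pages that have been rewritten move here, one location at a time
        location /reports/ {
            proxy_pass http://new_app;
            proxy_set_header Host $host;
        }

        # everything else stays on the legacy app
        location / {
            proxy_pass http://legacy_app;
            proxy_set_header Host $host;
        }
    }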

One thing I haven’t seen mentioned here is introducing SSO on top of the existing stack, if it’s not there. SSO gives you heaps of flexibility in terms of where and how new pages can be developed. If you can get the old system to speak the new SSO, that can make it much easier to start writing new pages.

Ultimately, a complete rewrite is a huge risk; you can spend a year or 2 or more on it, and have it fail on launch, or just never finish. Smaller changes are less exciting, but (a) you find out quickly if it isn’t going to work out, and (b) once it’s started, the whole team knows how to do it; success doesn’t require you to stick around for 5 years. An evolutionary change is harder to kick off, but much more likely to succeed, since all the risk is up front.

Good luck.


I think "SSO" here maybe doesn't mean "single sign-on"? Something else?


No, I meant single sign on.

In my experience, if you can get SSO working for (or at least in parallel with) the old codebase, it makes it much easier to introduce a new codebase because you can bounce the user outside of the legacy nginx context for new functionality, which lets the new code become a lot more independent of the old infra.

I mean there are obviously ways to continue using the old auth infra/session, but if the point is to replace the old system from the outside (strangler fig pattern) then the auth layer is pretty fundamental.

That's where I faced a similar situation: I needed to come up with ways to ensure the new code was legacy-free, and SSO turned out to be a big one. But of course YMMV.


I'd add putting a static code analysis tool in there, because that will give you a number for how bad it is (the total number of issues at level 1 will do). That number can be given to upper management, and then, while doing all of the above, you can show that the number is going down.


There is significant danger that management will use these metrics to micromanage your efforts. They will refuse changes that temporarily drive that number up, and force you to drive it down just to satisfy the tool.

For example, it is easy to see that low code coverage is a problem. The correct takeaway from that is to identify spots where coverage is weakest, rank them by business impact and actual risk (judged by code quality and expected or past changes) and add tests there. Iterate until satisfied.

The wrong approach would be to set something above 80% coverage as a strict goal, and force inconsequential and laborious test suites on to old code.


Many tools allow you to set the existing output as a baseline. That's your 0 or 100 or whatever. You can track new changes from that, and only look for changes that bring your number over some threshold. You can't necessarily fix all the existing issues, but you can track whether you introduce new ones.
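
For example, with a tool like PHPStan (one option among several, assuming Composer is available or can be introduced), the existing issues can be frozen into a baseline file so that only new ones fail the build:

    # sketch, assuming PHPStan; this is its standard baseline workflow
    composer require --dev phpstan/phpstan
    vendor/bin/phpstan analyse src --level=1 --generate-baseline

    # phpstan.neon -- new issues fail the run, old ones live in the baseline
    includes:
        - phpstan-baseline.neon
    parameters:
        level: 1
        paths:
            - src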


The results might also be overwhelming in the beginning.


Solid advice. I did 2 full rewrites with great success. To add to this list, I would also make sure you are communicating with executives (possibly gently at first, depending on the situation), really learning about the domain and the requirements (it takes time to understand the application), and investing in your team (or changing resources; caution, not right away and not all at once, since there is a knowledge gap here). The rewrite will basically have massive benefits to the business. In our case: stability (fewer bugs), the ability to add new features faster and cheaper, scalability, better user experience, etc. This can get exciting to executives, depending on the lifecycle of the company. Getting them excited and behind it is one of the core tasks. Don't embark on this right away, as you need more information, but this will matter.


Among the things I'd prioritize is to make a map of all services/APIs/site structure and how everything falls into place. This would help you make informed decisions when adding new features or assessing which part of the monolith is most prone to failure.


Best advice so far.


This is the way.


This hacker agiles.



