Each test result creating *several* rows of data seems like a problem too. In cl...

remus · on Oct 5, 2020

Looking from an outside perspective I would agree, but in practice we don't actually know what this dataset is, so several rows per test may make sense in the context of this stage of the data processing.

sjansen · on Oct 5, 2020

It never ceases to amaze me how often we as developers are quick to point out other people's obviously incorrect decisions, only to defend our own Rube Goldberg implementations 30 minutes later by pointing out that critics just don't understand the design constraints.

Or at least how often my peers do that. Obviously all of my systems and code are perfect.

jamil7 · on Oct 6, 2020

Nothing seems as easy as another engineer’s problem.

This also amazes me, especially developers whining about other people's code, truly believing that they would do it better. I've even seen inherited code posted to be ridiculed/shamed/bashed in some Slacks and subreddits.

vhold · on Oct 5, 2020

I'd say it depends on the data.

If the schema is consistent between rows, and it turns out a test result is made up of several rows because the test is composed of several stages, I would leave it as is until reporting time.

If you pivot prematurely, you could end up dropping data because there are new stages that didn't exist when you implemented the pivot.
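
A rough sketch of that failure mode, in plain Python. The stage names, values, and both helper functions are made up for illustration; nothing here reflects the actual dataset or pipeline.

```python
# Long format: one row per (test, stage). All names and values are hypothetical.
rows = [
    {"test_id": 1, "stage": "sample", "value": "ok"},
    {"test_id": 1, "stage": "lab", "value": "positive"},
    {"test_id": 2, "stage": "sample", "value": "ok"},
    {"test_id": 2, "stage": "lab", "value": "negative"},
    {"test_id": 2, "stage": "retest", "value": "positive"},  # stage added later
]

# The stages that existed when the early pivot was written.
KNOWN_STAGES = ("sample", "lab")


def pivot_fixed(rows, stages=KNOWN_STAGES):
    """Pivot to one record per test using a fixed list of stage columns."""
    wide = {}
    for r in rows:
        rec = wide.setdefault(r["test_id"], {s: None for s in stages})
        if r["stage"] in stages:
            rec[r["stage"]] = r["value"]
        # Rows whose stage is not in `stages` are silently dropped here.
    return wide


def pivot_at_report_time(rows):
    """Pivot using whatever stages are actually present in the data."""
    stages = sorted({r["stage"] for r in rows})
    return pivot_fixed(rows, stages)


early = pivot_fixed(rows)          # loses the "retest" row for test 2
late = pivot_at_report_time(rows)  # keeps it as an extra column
```

With the fixed column list, test 2 comes out looking negative and the later positive retest is gone without any error. Keeping the rows long and pivoting only when reporting avoids that.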

diarrhea · on Oct 6, 2020

> If the schema is consistent between rows, and it turns out a test result is made up of several rows because the test is composed of several stages, I would leave it as is until reporting time.

In that scenario, indeed no action seems to be needed, because each row is one observation: every test stage is an observation. So it would seem to make sense.

One could then argue that each patient deserves their own table (observational unit).

But as other commenters pointed out, this is all speculation.

kobe_bryant · on Oct 5, 2020

these are data entries from all over the UK so it's most certainly not clean data