
Each test result creating several rows of data seems like a problem too. In clean data, every observation is one row. It makes working with the dataset much easier. In this scenario, I would expect one observation to correspond to one test result. The multiple rows are then better off pivoted into columns.
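A minimal sketch of the pivot I mean, in plain Python. The column names (`test_id`, `field`, `value`) and the sample rows are made up for illustration, since nobody here knows the real schema:

```python
from collections import defaultdict

# Hypothetical long-format rows: several rows per test result.
long_rows = [
    {"test_id": 1, "field": "result", "value": "positive"},
    {"test_id": 1, "field": "lab", "value": "A"},
    {"test_id": 2, "field": "result", "value": "negative"},
    {"test_id": 2, "field": "lab", "value": "B"},
]

def pivot_wide(rows):
    """Collapse long rows into one dict per test_id, with fields as keys."""
    wide = defaultdict(dict)
    for row in rows:
        wide[row["test_id"]][row["field"]] = row["value"]
    # test_id values are unique keys, so sorting orders by test_id only.
    return [{"test_id": tid, **fields} for tid, fields in sorted(wide.items())]

wide_rows = pivot_wide(long_rows)
```

After this, `wide_rows` holds one record per test result, which is the "one observation, one row" shape.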



Looking from an outside perspective I would agree, but in practice we don't actually know what this dataset is, so several rows per test may make sense in the context of this stage of the data processing.


It never ceases to amaze me how often we as developers are quick to point out other people's obviously incorrect decisions, only to defend our own Rube Goldberg implementations 30 minutes later by pointing out that critics just don't understand the design constraints.

Or at least how often my peers do that. Obviously all of my systems and code are perfect.


Nothing seems as easy as another engineer’s problem.

This also amazes me, especially the whining about other people's code from developers who truly believe that they would do it better. I've even seen inherited code posted to be ridiculed/shamed/bashed in some Slacks and subreddits.


I'd say it depends on the data.

If the schema is consistent between rows, and it turns out a test result is made up of several rows because the test is composed of several stages, I would leave it as is until reporting time.

If you pivot prematurely, you could end up dropping data because there are new stages that didn't exist when you implemented the pivot.
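A rough sketch of how that can happen, assuming a pivot written against a fixed list of stages. The stage names and rows here are hypothetical, not taken from the actual dataset:

```python
# Stages known when the pivot was written (hypothetical names).
KNOWN_STAGES = ["sample", "analysis"]

stage_rows = [
    {"test_id": 1, "stage": "sample", "status": "ok"},
    {"test_id": 1, "stage": "analysis", "status": "ok"},
    {"test_id": 1, "stage": "confirmation", "status": "failed"},  # added later
]

def pivot_fixed(rows, stages):
    """Pivot into one record per test, keeping only the listed stages."""
    out = {}
    for row in rows:
        record = out.setdefault(row["test_id"], {s: None for s in stages})
        if row["stage"] in stages:
            record[row["stage"]] = row["status"]
        # Rows with an unlisted stage are silently ignored here.
    return out

pivoted = pivot_fixed(stage_rows, KNOWN_STAGES)
# Rows that never made it into the pivoted output.
dropped = [r for r in stage_rows if r["stage"] not in KNOWN_STAGES]
```

The `confirmation` row ends up in `dropped` and never appears in `pivoted`, and nothing raises an error. Keeping the long format until reporting time avoids that.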


> If the schema is consistent between rows, and it turns out a test result is made up of several rows because the test is composed of several stages, I would leave it as is until reporting time.

In that scenario, indeed no action seems to be needed, because each row is one observation: every test stage is an observation. So it would seem to make sense.

One could then argue that each patient deserves their own table (observational unit).

But as other commenters pointed out, this is all speculation.


These are data entries from all over the UK, so it's most certainly not clean data.



