
> I think it can yield the highest success rate.

At what?

If you want to be an SRE for a data platform, sure, but this is pretty thankless work:

- cleaning up dodgy data

- cleaning up behind low code data pipelines and other painful integration work with systems that suck and sometimes just don’t work (like PowerBI).

- cleaning up behind data scientists that create models in arcane and imaginative ways and expect you to “productionise” them.

- cleaning up behind brain dead scheduling systems that fail unexpectedly.

- constant churn with partners and cloud products for whatever the latest hotness is.

If you want to be solving actual problems with ML, this is a dead end.

It’s IT support for the people doing real work.

…so, it depends on your goals. Getting a job? Sure! Everyone wants a workhorse they can dump all the annoying, problematic on-call tasks on.

Learning ML, contributing to research, building models?

This isn’t a path that leads there.




> It’s IT support for the people doing real work.

This is an appalling perspective. Good MLE skills seem a lot harder to find than good ML ones.


Someone who can engineer infrastructure, pipelines and fire fight production issues is hard to find, but that’s not the point I was making.

My apologies; it is real work. The point I was making is that it’s not ML work, any more than writing a yaml file is ML work.

If you want to write yaml files, any number of possibilities exist.

If you want to work with machine learning, then don’t become a data engineer. The skills are, mostly, not related to ML, and are more closely aligned with SRE / devops.

It’s not infrastructure and helping build models as you mature and advance: it’s almost literally just infrastructure and fire fighting… in my, limited, 3 years of experience as such.


What?

A lot of the hard part isn't the model, especially in a world where BERT, XGBoost, Optuna, PyTorch, etc. have solved much of the classic problem and forced 'real' DS to specialize on either the business consulting side (not math/engineering) or the theory side (barely implemented). The rebrand of 'data analyst' (SQL, PowerBI, ...) to 'data scientist' by even top tech companies underscores this. It's not yet where web dev has gotten in terms of global $20/hr Fiverr contractors, but it's already at, say, $40/hr for someone who can build real production models for more boring scenarios.

The result is the vast bulk of data scientists (phd, self-trained, consulting, ...) we interview are weak engineers, so going from a make-believe notebook to a trickier production scenario requires the data engineer / MLOps / etc to solve a lot that a typical DS doesn't really understand in practice. Scale, latency, distributed systems, testing, etc. Likewise, the part the DS solves has little to do with the latest neuroips paper, and more just about lifecycle tasks like getting better data, which the other folks on the team will often be involved with as well.

So 2 natural high-paying paths here:

data engineer / MLOps -> MLEngineer -> DS

data engineer -> all-in-one data analyst/scientist -> ML/AI data scientist


I agree with this. In my experience, most of the data scientists I have worked with never left the world of Jupyter notebooks. For them, code management, CI/CD, dev/stage/prod separation, etc. is a world of its own that they are not very comfortable with. Heck, they even used SageMaker to create a git repo for their Jupyter notebooks.

It doesn't mean that there aren't data scientists who have some engineering experience as well, but this seems to be rare. For that reason, getting those ML models that they painstakingly build to where they'll generate some real value is super hard. They just don't know where to start. Working across multiple teams and multiple functions is very challenging and it often creates friction. Therefore, creating tools and systems that will enable those data scientists to see the actual value of their labor is paramount.

That's why we're seeing a huge resurgence of so-called MLOps tools and platforms that aim to solve all or some of the problems of the entire stack. We are very, very early in this journey, but I believe the 2020s will be for ML and AI what the 2010s were for the cloud and data, i.e. new Snowflakes and Databricks but for the actual ML apps. It's exciting.


Definitely agree with your first two paragraphs, but am confused by the pay paths. Can you expand on what the paths mean?


It's useful to work backwards from the knowledge a DS needs to be worth their weight. Imagine a small team of $400K/yr DS + $400K/yr DE + ... and whatever hw/sw. So say a $2-3M/yr project driving $3M+ of new, growing revenue or $6-12M of annual savings. At bigger companies, even bigger magnitudes & pressure :)

The DS will likely:

- be close to the business case & business stakeholders to ask questions a normal lead can't

- know the relevant math + ML algorithms, and build up specializations pairing DS niches ("time series forecasting") with industry niches ("supply chains in manufacturing")

- enough engineering & performance understanding to work with a DE on going from small data sets to big ones

- have an intuitive feel for all of the above - how data/usecases/etc. go right/wrong

That's a lot!!

One path is jumping in as a low-paid intern or new grad and doing your time. But a pivot is different, esp. to get paid along the way. Most CS grads had little math ("intros to stats, combinatorics, & algs; dropped linear algebra"), weak ML ("did algs; intro to ML only covered kmeans & bayes; tried running a BERT model on some data"), and little intuition for how ML typically goes wrong ("what's class imbalance?"). So if they do get hired directly as a mid-level DS, it's probably on a team of the blind-leading-the-blind. Oops.

BUT SQL/Spark/K8S/pandas/regex are real skills. Doing the data engineering, ML operations, etc., around making an ML pipeline more than a fanciful notebook that wouldn't last a minute in production is real work. That stuff does pay well, and by working with the ML folks, you'd naturally get pulled into the ML tasks as well. DS write all sorts of bugs that surface as production evolves and that the full team works on together, and new features that need a team to make real. So taking a job that mixes engineering specialties with ML specialties is a smoother pivot path for the typical CS backgrounds I've seen. Over time, drift toward the more ML-y aspects of the projects until you can do the full hop. (Nit: That won't teach the math & deeper intuition, so I'd still do courses + projects on the side.)
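
For anyone wondering about the "what's class imbalance?" aside, here is a minimal Python sketch of the failure mode. The 1%-positive toy labels and the always-negative "model" are made up for illustration, not taken from any real project:

```python
# Toy illustration of class imbalance: accuracy looks great while recall is zero.
# The labels and the "model" below are hypothetical, made up for this sketch.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model found."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_for_positives:
        return 0.0
    hits = sum(1 for p in preds_for_positives if p == positive)
    return hits / len(preds_for_positives)

# 1,000 examples, only 10 of them positive (think: fraud cases).
y_true = [1] * 10 + [0] * 990

# A "model" that always predicts the majority class.
y_pred = [0] * 1000

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy={acc:.3f} recall={rec:.3f}")
```

The point of the sketch: a model that never finds a single positive still scores 99% accuracy, which is why accuracy alone tells you little on skewed data.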


In general, does the DE have higher salary than DS?

Am I understanding correctly that there is much more demand for DE than for DS?


I wish I had real numbers. So instinct from what I've seen:

- a data analyst role rebranded as a DS role will be lower paid than a DE role, maybe 50% diff

- an actual DS role is probably higher paid than a DE role, but really depends on the job+co

- a great DS role and a great DE role are both super well compensated. Though maybe again DS higher than DE in most just b/c ability to more directly drive $. Unless something like an infra company, the DS will be inherently closer to the business & outcomes. ("I did this clever thing that netted 2% revenue spike that adds up to $40M/yr in new revenue, what did you do?")


NeurIPS paper, not neuroips paper


still not used to the new name ;-)


With the right mindset it can be insanely fun building infrastructure, automating things, and engineering solutions such that improvements ratchet forward and fires get put out before they have a chance to grow. While the ML people, bless them, are chasing a 0.001 improvement in the metric of the day, data engineers can be having huge impact and changing the game. Meanwhile, ML is becoming commoditized in its most common use cases.


I first had the DE title 7 years ago (going into it having never heard of DE), and have been doing MLE/platform work for the past 5. You’re projecting your limited experience onto a poorly defined role that varies wildly from company to company. My experience is much different from yours: little firefighting, lots of actual building. Yes there is infrastructure, but any good programmer these days should be able to stand up some basic infrastructure.

Yes, don’t get into it if you want to do ML research or apply ML, but if you are interested a bit in it and find building models the least creative, most boring shit ever like I do, and prefer traditional coding, it’s a nice spot to be in.


What is the average salary for DE currently in US?


Great comments. I agree with your take on what being an ML Eng actually means. Of course this will vary to a degree from team to team and company to company, but I think you still capture it well.

I absolutely think MLEng is important and much needed, but too often underappreciated. Being this half-breed, part engineer and part ML, often leaves you on a lonely island in many orgs. The ML managers don't really understand what you do and neither do the engineering managers. It is kind of thankless unless your management really understands your role and appropriately advocates for you.

MLEng is often an engineer who wanted to get into the sexy ML space and since it is in the title it feels cool. Then you realize you're more an Ops engineer who deals with the inane code of many "true" DS/ML scientists. Thankless, indeed.


Especially in the edge / embedded space, MLEng will imply more than just doing ops.

Stuff to do could include:

- Getting a network architecture to run.

- Applying optimization depending on target arch (pruning, quantisation, custom CUDA kernels, etc).

- Integrating models (rule of thumb: a product is 95% ordinary code, 5% is ML related).

- Constructing benchmarks, monitoring.
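
As a rough illustration of the quantisation item, here is a minimal pure-Python sketch of symmetric per-tensor int8 quantisation. The weights and function names are hypothetical, and real toolchains typically work per layer or per channel with calibration data, so treat this as the core idea only:

```python
# Sketch of symmetric per-tensor int8 quantisation on a plain list of floats.
# Assumption: one shared scale for the whole tensor; names here are made up.

def quantize_int8(weights):
    """Map floats to integer codes in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]

weights = [0.02, -0.5, 0.31, 1.27, -1.0]  # hypothetical layer weights
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# The round-trip error is bounded by half a quantisation step.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(codes, scale, max_err)
```

The trade is storage and arithmetic in 8-bit integers in exchange for a bounded rounding error per weight; whether that error is acceptable is something you would measure on the target model.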


Sure - fair clarification. But conversely, you could make some awesome automation that you rule with a light touch as an engineer, or as a DS you could come up with that ML that Amazon has to recommend you endless TVs as soon as you bought a TV :)


I think he meant "real work" from the perspective of the overeducated people who tend to end up in (and resent being in) corporate data science roles.

From the boss's perspective, the grungy IT support stuff is closer to real work (although the only thing that's actually respected is management, because it's what they do) than the shiny ML stuff that one star hire is allowed to do because it makes the company look cooler than it actually is.


It highly depends. I was hired for a small research group that didn't have a product in production. I got hired for programming, and was at the table discussing and contributing to research within a couple of months without any background in ML.


These types of anecdotes make the actual practice of both ML and AI seem rather, well, less than scientific. There is supposed to be Ph.D. level math behind all of this, yet an amateur with admittedly no ML background is part of the team.

In Star Wars, it takes Luke Skywalker years to learn to use a light saber skillfully. Then in The Force Awakens, some ex-Stormtrooper with no training picks up the light saber and within 5 minutes is a pro. Kinda ruined the mystique of Star Wars, just like people jumping into ML with no training ruins the mystique of ML.


I am very against the idea that somebody with a PhD is the only one that can do a certain kind of work. But I am of course biased given that you call me out.

Creative and critical thinking is not exclusive to people with a PhD. The ability to understand one's strengths and weaknesses is not exclusive to a PhD.

I would never attempt to write or publish a paper without help of somebody with stronger mathematical or statistical knowledge. On the other hand they should not write source code for a paper without consulting somebody with a strong background in sw engineering. You complement each other. Power is in recognizing that.

You would be surprised how many software bugs I have found that invalidated entire (draft) papers. A PhD in ML doesn't save you from that.


Is mystique worth having? In fiction, sure, but I think not in R&D. I think a lot of ML and AI isn't especially scientific, but I also think there's a lot of low-hanging fruit in applying it to new areas, and both of those make it easier for an amateur to contribute.


Let me expound on this, as there's a lot of PhD hate in the comments parallel to mine.

What is unique to a PhD is that you took a very long time to master a small slice of the knowledge pie. The emphasis here is on long time: Most people simply aren't willing to go for years on a low salary and tedious job.

It doesn't mean PhDs are more (or less) creative, top coders and whatnot; it means we took the time to read all the papers, to know all previous solution attempts and who all the big players in the field are ("all" w.r.t. our niche). It also means we can read papers much faster than other people because that is basically what we do all day.

There you have it! Now don't send me ML job offers, cause I gotta read this next obscure paper to figure out if they are legit. :D Just kidding.


I think your metaphor kinda plays against your point.

Years and years ago in the movie universe, Luke had to painfully learn to use his lightsaber from a mentor who passed down techniques and philosophy to the student.

In the current day of the movies, lightsabers are understood to be powerful, yet temperamental and exotic, weapons. Mildly trained individuals can use them, even if it's to a limited extent (i.e. flick switch; shiny side is the business end; heat bad, ouch, no touch).

To belabor the metaphor, you've also had a tradition of people, from all walks of life, using vibroblades (basic to advanced standard statistical analysis and regression) in order to achieve some level of parity against users of lightsabers.


There is PhD-level math involved. And yet, ML (deep learning in particular) is much more of an empirical endeavor than many would like to admit. A deep understanding of the underlying mathematics does not necessarily give you a better model. Modern models are so complicated that no one can reason through them. Parameter spaces are non-convex and full of ugly pathologies that make neat and tidy analysis methods useless.

From one perspective, it is disheartening that a deep understanding of the underlying methods doesn't necessarily win the day. From another, it is quite remarkable that having good implementation skills and a methodical mindset can get you quite far.


Fuck mystique.


Much 'applied' ML is based on tweaking existing models, getting them production ready and integrating them into a product. That's inherently a bit of non-trivial engineering work, but as you pointed out, not scientific per se.


You're deluded if you think a PhD is what makes the difference.

Smart people will be able to contribute even if they don't have a PhD. Some PhDs are useless and everyone wonders how the hell they got through it.


In smaller teams/companies one gets to wear multiple hats. However, the term "Data engineer" was specifically created by/for ML folks to get rid of unpleasant repetitive work that has to be done but nobody looks forward to it.


Sorry, no. The term "ML scientist" was specifically created by/for data folks to get rid of unpleasant repetitive work with math equations that has to be done but nobody looks forward to it.

If you've ever crafted a pipeline and tuned it to hum along, then watched it break with new/more/messier data, then figured out creative ways to fix it or replace parts of it with more robust parts, iterating on that and scaling it up, you would know some of the fantastic pleasure of whatever you call it, data engineering.


Sounds like what the media entertainment companies call a "pipeline engineer". Hmmm...


> However, the term "Data engineer" was specifically created by/for ML folks to get rid of unpleasant repetitive work that has to be done but nobody looks forward to it.

This may indeed be how the term "data engineer" is used sometimes, but I have my doubts that it was originally created with this meaning. Not really sure where/when the term "data engineer" was actually created, but ICDE started in 1984 [1] and the Data Engineering Bulletin was renamed in 1987 [2] (from "Database Engineering"). It seems likely that the term "data engineer" has also been used since at least then.

Of course ML did also already exist then, but it's certainly a while before the current "big data" / "deep learning" time. And regarding the topics considered "data engineering" at that time, this is from the foreword of the December 1987 issue of the Data Engineering bulletin:

> The reasons for the recent surge of interest in the area of Databases and Logic go beyond the theoretical foundations that were explored by early work [...] and include the following three motivations:

> a) The projected future demand for Knowledge Management Systems. These will have to combine inference mechanisms from Logic with the efficient and secure management of large sets of information from Database Systems.

Which sounds just as relevant today as it did back then. It also does sound like a rather challenging task, and not exactly like "unpleasant repetitive work". Or at least not any more repetitive than: change some model parameters / retrain model / evaluate results / repeat ;)

[1]: https://ieeexplore.ieee.org/xpl/conhome/1000178/all-proceedi...

[2]: http://sites.computer.org/debull/bull_issues.html


Data engineering jobs named as such started to pop up only in the past few years, coinciding with MapReduce/Spark availability. I wouldn't be surprised if the term was re-introduced by one of the companies developing those systems (like Databricks, Cloudera etc.) to distinguish themselves, as a sort of marketing. In the past we had DBAs; now DBA + DevOps + unspecified everything has morphed into data engineering.

I used to be a member of SIGMOD and the "data engineering" you mentioned was just an academic term.


Data engineers exist at organisations without any ML work.


Yes, but they are basically what DBAs were before with the addition of ETL. OP is asking about data engineers in the context of ML.


Can confirm.

While data engineer is an excellent role for a fresh graduate, the data engineering profession shares many similarities with the SRE/IT professional.

The best data engineers are the folk who had the job title of DBA in yesteryear.

You will always be the supporting cast, rarely the star.


SRE = Site Reliability Engineer?


yes


This, 100%. That said, most data scientists don't do what you would consider real work (meaning, I assume, interesting work with significant mathematical/analytical meat). There just isn't a lot that's both interesting and useful to private-sector rent-seekers whose opinions of your work determine whether or not you advance.

Most of the people doing real ML in industry are prestige hires--they're hired because their names draw people in, but basically get to work on whatever they want--and you need a top-10 PhD at an absolute minimum to be eligible for those.

The ugly truth about industry is that 99.9997% of it is flow capture based on power relationships, found artifacts (i.e., corruption opportunities) within the state, and the implementation of very simple processes but in a way such that the threat to executive reputations as a first priority, and profit as an important second one, are minimized. This doesn't exactly make a market for ML innovation, unless your boss for some weird reason still cares about being a co-author on your papers (which his bosses will pressure him not to let you publish, because after all, this publishing is a distraction from your paid work).

On the other hand, if you want to be able to afford a house in the Bay Area, and to be tapped for (indeed, most likely forced into, both due to losing interest in and being unhireable for IC work) management in your mid-30s... then go for industry. The poison carrot will make you sick but it will kill you more slowly than poverty, so that ain't so bad, now is it?


Strongly disagree.

There's a vast amount of work that doesn't involve unethical recommendation systems.

Expand your horizon outside the Bay Area.

The plurality of work I see is straightforward computer vision/NLP applications.


I suspect the work you're talking about could be easily handled by an intern working with Core ML and a MacBook.

The landscape is varied. There are companies doing real actual big leading edge stuff, there are companies where ML is sprinkled onto projects as a buzzword but no real interesting work happens, and companies that just need a practical small solution like the ones you mentioned, and could get by with Core ML, but don't because they hire a PhD who isn't aware of Core ML.


What does productionizing coreml look like if I wanted to stand a model up as an rpc service?


I really like your style. I think you should write a book (this is not sarcasm)


I’ve kinda developed the view that large organisations come to mirror the Russian Communist Party.

I’m interested in “flow capture based on power relationships”.

Do you have any recommended reading on this?


Every large organisation tends to be like a small government. Inefficient, drowning in politics and unable to change.

There are exceptions, where some principled dictator imposes a VC-style model in which teams basically become independent startups and die or succeed. 100 fail, one becomes the next revenue maker for the company. That's how AWS was born.


There’s way less inefficiency and way more accountability in the public sector. Look at how efficient publicly funded schools are, for example, or publicly funded rail or healthcare. You could literally pick almost any industry.

Accountability comes from elections. If managers in companies had to be re-elected it would be interesting.


This. Corporations are great at imposing mean-spirited personal accountability (i.e., if you're perceived to have fucked up, you get fucked) but that doesn't actually solve problems or change anything. People get fired, careers end, new faces replace the old, nothing gets learned. Of course, once you get into middle management you're exempted from the stack-ranking bukkake, and executives write their own performance reviews and almost never face consequences for their actions.

Companies are fantastic at making it look like accountability exists, because people at the bottom get punished for even the smallest mistakes, but avoiding any consequences that would affect high-ranking members or force the organization to change how it does business.


That sounds a lot like the DARPA model too.


> I’ve kinda developed the view that large organisations come to mirror the Russian Communist Party.

Only the ones which have an unkillable cash cow. So, I suspect Google or large banks are mostly like that, but places like SpaceX or even large consulting firms (Deloitte, IBM etc., where managers essentially eat what they kill) cannot allow themselves to degenerate into a Chinese court.


Now this is interesting. I've always found it fascinating that when profit is on the table, democracy is nowhere to be found. I've looked, not too hard TBH, for essays and literature discussing the correlations between business management structures and government/nation political hierarchies: not education-level material (propaganda), but critical analysis. I've been an employee of several of the top corporations on our planet, and the idea that corruption is not rampant is a farce. One simply lives within the environmental constraints and leaves when it gets to be too much. Does caring about corporate ethics (and the larger realm of ethics) render one incompatible with a modern corporate hierarchy?


Unfortunately, the only way to prevent hierarchy is to create a limited hierarchy (this is the purpose of constitutions) a priori; hierarchically naive organizations fail on this account. External parties will demand hierarchy simply because they want to know your organization (or nation) isn't wasting their time--no one wants to deliver a sales pitch to people who can't authorize purchases. If they're not careful, a group of people can end up in a state where the necessary-for-external-relations hierarchy becomes a total one. You see this with startup founders; the one who talks to the investors the most ends up in charge, and the ones who deal with employees or low-status counterparties lose power. This is why "flat" organizations can't really work; people who need things from the organization demand to know who to talk to in order to actually get things done, and eventually those "who to talk to" people end up with informal, then formal, power and it's very difficult to get them to give it back.

The large-scale failure of democracy that's happening all over the world is something different, though. Regulation is struggling to keep up with technology, and it doesn't help that nation-states have already been doing a piss-poor job of protecting people from their employers. If the US falls in the next 20 years, it won't be due to Covid or Trump or nation-level adversaries; it'll be due to the obscene power given to employers, who can literally ruin an employee's life--not just fire him, but anally ravage him in perpetuity with bad references--for any reason or none. Eventually, unless national governments start dropping serious lead pipe on employers' heads, people are going to tire of paying 30+ percent of their incomes to a government that lets bosses get away with this shit.


Interesting take. In British history the first positive step towards freedom that I note was the creation of law courts. These gave serfs some power over their lords and provided some level of fairness rather than everything being about favour.

The US does seem to lag the UK and Europe in terms of employment law in some cases (no formal employment contracts for most employees, can be fired without notice, little statutory holiday, maternity or paternity leave entitlement, etc. etc.)

It has been argued that the union movement, while susceptible to corruption, was a hugely positive force in economic and political terms for American workers. Unfortunately Thatcher and Reagan saw this as such a threat that they attempted to destroy their own manufacturing base in order to smash the unions.


> If the US falls in the next 20 years, it won't be due to Covid or Trump or nation-level adversaries; it'll be due to the obscene power given to employers, who can literally ruin an employee's life--not just fire him, but anally ravage him in perpetuity with bad references--for any reason or none.

How do you define "US falls"?

>Eventually, unless national governments start dropping serious lead pipe on employers' heads, people are going to tire of paying 30+ percent of their incomes to a government that lets bosses get away with this shit.

People endured much worse in medieval times, and endure much worse right now in China.


> People endured much worse in medieval times, and endure much worse right now in China.

Really? The USA has hollowed out portions of the country equal/worse than the worst 3rd world countries.


I hope you're being hyperbolic. The worst 3rd world countries have no governance (unless you count local warlords), 5 year olds working in dangerous and toxic conditions, hunger and slavery.


You don't realize what is going on in the United States. We have portions of the USA where the police don't even bother, and are run by local gangs. We also have children working, in dangerous and toxic conditions. We also have hunger, and yes we have slavery: prison labor. The USA is not what you think it is.


> We have portions of the USA where the police don’t even bother, and are run by local gangs.

The places where the police do “bother” are, ipso facto, also run by local gangs.


> We also have children working, in dangerous and toxic conditions.

Can you elaborate on that? I've never heard that particular thing about the US. For reference, in Congo, there are 5 year olds today carrying heavy buckets in makeshift cobalt mines, à la 19th-century England or France (plus the toxicity of cobalt: people who work in these mines get cancer if they don't die in an accident first). Even with whole families working in such conditions, the pay is not enough and not stable enough to sustain the family, and they are often working while hungry. Is there anything comparable going on in the US?


Gee, the news appears to be scrubbed from most of the 'net now, but I recently read about Mitsubishi using child labor in the US: https://flipboard.com/article/major-car-company-used-child-l... This is not as bad as your reference, but know that where our police do not go, anything is on the table. The US plays extreme.


I'm confused here, does migrant mean illegal or is there some program similar to farms to bring people across the border to work?


It means both. To the employer they are good, low-expense labor and the business is wise to hire them; to the working class they are illegals taking jobs (their illegal status tends to be in the control of their employer, btw); to the political class they are a source of outrage funding; to the workers themselves they are simply struggling to survive any way they can, caught by bad luck and an unforgiving world.


Interesting you say that because there’s some critical analysis I’ve come across in the past that states that taxation is a key driver towards democracy and that captured wealth (such as from oil) promotes oligarchy and dictatorship. There are outliers in either direction, naturally.



