Show HN: A data science fellowship to solve the world’s toughest problems (bayesimpact.org)
80 points by pyduan on July 1, 2014 | 25 comments



I appreciate what you guys are trying to do, but I can't see many mathematicians or statisticians applying for this unless you provide a little more information about what these "hard" problems are.

Honestly it reads like you're offering basic training in a random selection of tools and then hoping some non profits present a problem with nice clean data that can be solved through application of a few methods from scikit.learn.

If you want to attract math people, my suggestion would be to identify a few intriguing and hard problems ahead of time and take applications specifically for them...you can always suggest a change if you think an applicant would be better suited to a different one. Providing intriguing problems that might match up with people's pre-existing research interests is key...there is lots of room for cross-pollination and growth, but a Bayesian statistician is going to be much more intrigued by something that might benefit from a hierarchical model than something that needs ODEs or online convex optimization.

Worse, 4-6 months might not even be enough time to formulate a problem that needs a solution and get the required data in place. Non profits are generally extremely overworked and take a long time to do things. They will not have their data in anything resembling a database or standardized format...think shorthand notes in Word files if you're lucky. Identifying people and data you can work with on this end ahead of time is key.

For the record I work for a non profit analyzing complex diseases and my background is in math. I've also sat on the board of and been involved in a few other non profits.


Paul from Bayes Impact here. I appreciate the sentiment, though with all due respect it does seem like most of your concerns are addressed on the website, either on the fellowship page or on the others.

> unless you provide a little more information about what these "hard" problems are

The second paragraph does go briefly over the problems we are currently working on (granted, not in much detail for the sake of brevity, but enough to give an idea of what type of challenges they are). There is a little bit more information on the front page, but granted since we started Bayes Impact two months ago we haven't been able to put as much work into the website content as we'd like to.

> Honestly it reads like you're offering basic training in a random selection of tools

This is simply not the case -- while their level of experience varies, our current fellows actually comprise some well-established data scientists in their own right. It is precisely because the problems worth solving are tough to solve that we need to round up talented individuals who are able to commit to working on social impact projects full-time and pair them up with industry and domain experts who have the domain knowledge but may not have the time.

They each bring their own set of skills -- for example, someone who built Lyft's grid optimization system might be uniquely suited to help save lives by improving ambulance and fire truck dispatch and reducing average emergency response times.

> and then hoping some non profits present a problem with nice clean data that can be solved through application of a few methods from scikit.learn

This is precisely the point of Bayes Impact and why a longer engagement model such as fellowships is needed in the space (most current data-science-for-social-good organizations work on a volunteer basis), so that we have the time to build these longer relationships with nonprofits and leverage data science even in cases where data is messy or sensitive. We go a little more in-depth about it in our article here: http://blog.bayesimpact.org/blog/the-bayes-impact-mission/

> Worse, 4-6 months might not even be enough time to formulate a problem that needs a solution

This is why they're not 4-6 months, but typically 6-12. We do have a pilot 3-month program in the summer for problems that are comparatively easier to work on.

> and then hoping some non profits present a problem with nice clean data that can be solved through application of a few methods from scikit.learn

This is why we have a fellowship application page and not a project application page -- we actually tend to identify and scope projects ourselves.

On that note though, I want to point out there is no need to be so dismissive of the work nonprofit and civic organizations have been doing in collecting and storing clean data. For example, most fire departments we talked to had surprisingly good data, and some, such as the Fire Department of New York, had even started initiatives of their own to use data science to improve their processes. By integrating building permit data with their own systems, they've been able to direct inspectors to where fires were predicted to be more likely to occur.

One direction we've been headed towards is seeking out these data-educated organizations to create pilot projects, then using the results as a basis to export these solutions to similar institutions whose data practices may not be as good. To that end, we are helped by some data engineers from companies like Splunk and Cloudera, so we do believe in working with these organizations in the long run to bring them up to speed. This is precisely the problem we're trying to solve with our model!

> For the record I work for a non profit analyzing complex diseases

Then you might be interested in the project we are doing on Parkinson's with the Michael J. Fox Foundation! Feel free to email me for more details.


I'm trying to offer constructive, if harsh, criticism based on my own experience, which includes recruiting for similar positions and working with large and small 501(c)(3)s.

I don't mean to come off as dismissive, but to suggest that your write-up is vague to the point of being easily dismissed, and to provide feedback on how someone from outside your local peer group might read it.

And there are organizations out there with great IT and clean data but I and most people in this field have lost months writing hideous combinations of NLP and regular expressions to pull data out of old medical records and the like and hand-validate it, or correcting for batch effects in supposedly clean data.

I think that fleshing out the projects and areas of investigation you guys already have lined up would go a long way towards addressing my concerns and making the program more appealing to the typical analytical folks I've worked with. I'd also suggest focusing the intensive course on analytical methods not the tools; this is what will intrigue people with expertise. At the moment it reads like it is aimed at people new to the field with no programming experience.

What data sets/types are you using for the Parkinson's thing? My main focus is on analysis methods that resist the noise, imbalance, heterogeneity and other issues typical in extremely wide/multivariate genetic+clinical+proteomic studies...a few sentences about the study in the write-up would have told me a lot about whether my skills could be useful. (I'm not looking to relocate but I am always open to collaborations and correspondence with people working on similar things.)


As I said earlier -- I definitely appreciate the sentiment, and constructive criticism is always welcome when actually substantiated. I also took your post as an opportunity to elaborate a bit more on our model so my post got longer as a result.

> And there are organizations out there with great IT and clean data but (...)

This argument also works the other way round -- there are organizations out there with terrible data (and this is especially common with medical data), but there are also many high impact projects for which the data does exist in a workable form that are begging to be solved (and that we are actually working on solving). We are focusing on these in the short term, while laying the groundwork for the others in the medium-long term (both through the research arm we are building, and our data engineers). There is no reason not to get the low-hanging fruit first.

> I think that fleshing out the projects and areas of investigation you guys already have lined up (...)

Agreed. Since we created Bayes Impact two months ago our main focus has been on building the program from scratch and working on the projects as well, so the website has unfortunately taken a backseat. Another problem is that government organizations are very sensitive about communication and we can only communicate about our projects on their timeline. This results in us not having a website as fleshed out as we'd like, but this is par for the course for a new organization.

> I'd also suggest focusing the intensive course on analytical methods not the tools

Ah, I just saw the paragraph you're referring to. I get how the language may be a bit confusing and will make the appropriate changes -- our goal is actually the opposite: we bring on individuals who already have the analytical methods, though some may not have had exposure to industry best practices. Because we focus on building production systems and not just writing case studies, it's important to bring them up to speed in that minor respect. This is why we can spend only a week teaching tools -- teaching analytical methods to people without the required background would likely take much longer, and they are not our target audience.

At a broad level we simply provide an avenue for data scientists to work on social impact problems in collaboration with domain experts, with us taking care of the overhead of scoping projects and doing the dirty work of acquiring and preparing the data as well as defining the implementation strategy. We also smooth out the edges in our Fellows' backgrounds if any but this is really not the core of the program.

Fortunately the pool of applicants as well as our current fellows does not seem to echo your fears but I'll review and see which changes to the fellowship page could help remove ambiguities in the future.

Hope it helps clarify. Regarding the Parkinson's project, feel free to reach out to me by email -- unfortunately we need to wait for the press release from the MJFF and the other partner before I can actually communicate about the details publicly.


Seems like you've got big data problems to solve and data scientists up the wazoo.

I would think the missing element would be avant-garde problem-solvers, with or without (advanced) degrees, who are as outstanding in that specialty as the data scientists are in theirs.


To get more exposure, consider posting the fellowship to these subreddits:

http://www.reddit.com/r/datascience

http://www.reddit.com/r/datasets/

http://www.reddit.com/r/statistics

http://www.reddit.com/r/machinelearning/

If you have not already, I would recommend reaching out to these companies to sponsor: Cloudera, Palantir, New Relic, Tableau, Domo.


Awesome - thanks for the feedback. We're indeed going to post to those subreddits and reach out to those companies to potentially sponsor us. If you know a good contact, we'd love to be introduced!


You will also probably find people interested in this on: http://lesswrong.com/


We just tried to, but couldn't b/c of the karma requirement :(


This is an awesome initiative. It's good to see an organization using and promoting data science for something other than "optimizing click ads." Kick some ass guys!


I love the last item in the FAQ,

> I am a frequentist. Can I still join Bayes Impact?


Always glad to see these skills put to uses besides selling products and eyeballs!

Here's another fellowship using data science towards non-commercial goals (global health research): http://www.healthdata.org/get-involved/fellowships

Full disclosure: I participated in the fellowship in 2008.


Hi kfor, the fellowship program sounds really interesting. Do you mind chatting with our team and telling us about your experience?


I have a vehicle routing solution (minimal routes via multiple destinations, with time windows, capacity constraints, weekly scheduling; it's a website service on top of Google Maps) that I would be happy to provide for free to social impact projects. Email is in my profile if you're interested.


This site does not work properly on Firefox, because of cross-origin requests of fonts.

  downloadable font: download failed (font-family: "sinkin_sans600_semibold" style:normal weight:normal stretch:normal src index:1): 
  bad URI or cross-site access not allowed
  source: http://d1arcc3qu8ndpn.cloudfront.net/fonts/SinkinSans-600SemiBold-webfont.woff


Thanks gulbrandr! We're fixing right now


For those who think this is an awesome idea, but that don't want to relocate and/or work full-time, I recommend you check out the similarly minded http://www.datakind.org/


Thanks for the link man, this really looks wonderful.


Can you elaborate on what a "Fully funded fellowship" means? I'm guessing it's vague because you haven't figured out how much support you'll be able to provide yet?


Hi Shoyer, one of the founders here! For our fall fellowship, support will likely be in the range of $4,000-6,000 per month based on experience. We also provide a fellowship house in San Francisco for our fellows to live in.


Hi ajiang, this is a great initiative. Glad to see data science knowledge put to use for noble causes. I am a mentor in a data science/analytics program based in the Bay Area where we help professionals looking for a career change to data science. We are always hunting for interesting projects for them to work on. Would love to have them work on real projects with noble goals. Love to connect to discuss this possibility. If interested, please ping me. You can find my email in my profile. Thanks.


Hi hsshah, that sounds interesting. Shoot us a note at hello@bayesimpact.org - we'd love to talk!


I am hunting on your webpage for program dates but can't find any... how would the fellowship work if I'm a grad student?


Hi Balsam, the program would start in mid-September and run for 6-12 months, with some flexibility on timing. For grad students, we partner with a number of universities to work with capstone / final project programs. Reach out to us at fellowships@bayesimpact.org.


This sounds amazing.



