clured's comments

clured · on June 1, 2020

Open Syllabus | Senior Machine Learning Engineer, Full Stack Software Engineer | Remote | Full-time | https://docs.opensyllabus.org/

Open Syllabus (OS) is a non-profit organization that collects and analyzes millions of university syllabi to support novel teaching and learning applications. Open Syllabus' first two applications - the Syllabus Explorer and Co-Assignment Galaxy - are recognized as major contributions to the open learning ecosystem. The project has been featured in The New York Times (twice), The Washington Post, Nature, Time, FiveThirtyEight, FastCompany, Lifehacker, and dozens of other publications and media.

To learn more about Open Syllabus, check out:

- Open Syllabus Explorer (https://opensyllabus.org/): Top-ranked books and articles in the corpus, sliced by author, field, university, publisher, and country

- Open Syllabus Galaxy (https://galaxy.opensyllabus.org/): Visualization of the book and article citation graph (node2vec -> UMAP)

- Dataset documentation (https://docs.opensyllabus.org/): Description of the underlying dataset, with details about the ETL and model inference pipeline

We're hiring for two roles to help us build tools to query and explore OS's 22-billion-word corpus of syllabi:

- Senior Machine Learning Engineer (NLP, recommender systems) - https://docs.google.com/document/d/15lhJY9gzAmUe23WH3D8qKmaS...

- Full Stack Software Engineer https://docs.google.com/document/d/1A-xICUedIK6iG0t0Ji58XDe8...

Get in touch at contact@opensyllabus.org. Come help us build tools that help people learn things!

clured · on May 5, 2017

Open Syllabus Project | Web Applications Developer (Python and Javascript) | Full-time | NYC or Remote

The Open Syllabus Project is an academic data mining project based at Columbia University that's analyzing a corpus of 1M+ college course syllabi. We launched a beta version of the project with an op-ed in the New York Times last year [1]. Since then the project has appeared in Nature, Time, The Washington Post, The Chronicle of Higher Education, MarketWatch, Der Spiegel, Business Insider, Lifehacker, FiveThirtyEight, WNYC, QZ, and elsewhere. It's also been picked up by major news outlets in Europe, Russia, China, Japan, South Korea, Ukraine, Egypt, and Mexico. With new funding from the Sloan, Hewlett, and Templeton foundations, we're working towards a second release of the project that will feature much larger collections of syllabi, books, authors, institutions, and publishers.

http://explorer.opensyllabusproject.org/

We're hiring a full-stack web applications developer to take a leading role in the development of these public-facing web services. We're looking for a developer who has significant experience at both layers of the web stack - someone who enjoys building large, stateful Javascript applications, and also is able to build and maintain the server-side APIs that feed these client applications.

PROJECTS

- Build an API in Python (Flask or Django) that organizes the results of the metadata extraction pipeline into web-facing data stores (Elasticsearch, Postgres) and exposes well-structured REST endpoints for the client application.

- Build a front-end application using React and Redux / MobX that surfaces the data on the web.

- Work with the data engineering team to define data requirements for the front end application.

QUALIFICATIONS

- 3+ years of professional experience in software engineering.

- Demonstrated ability to build high-quality, fast web applications that serve sizable traffic.

- Experience building large, stateful Javascript applications with React and React-ecosystem libraries like Redux and MobX.

- Experience with modern Javascript build tools like Webpack or Gulp.

- Experience with server-side Python development with Flask or Django.

- Commitment to sustainable engineering practices - automated testing and deployment, continuous integration, and reproducible development environments.

- An eye for clean, readable, extensible, well-tested code.

- Experience with remote / distributed collaboration on GitHub.

Drop us a line at syllabusopen@gmail.com.

[1] https://www.nytimes.com/2016/01/24/opinion/sunday/what-a-mil...

clured · on April 4, 2016

Full-Stack Engineer / Data Scientist | The Open Syllabus Project (http://explorer.opensyllabusproject.org) | NYC / SF | Full-time | NYC or Remote

The Open Syllabus Project is an academic data-mining project at Columbia and Stanford that’s extracting structured information from a corpus of 1M+ college course syllabi. What’s actually being taught in college classrooms? How has this changed over time? What can we learn about the organization of the modern university from large-scale trends in the texts that are being assigned? How can insights from these data be applied to curriculum development, education policy, and lifelong learning?

We launched a beta version of the platform with an op-ed in the New York Times in January, and since then the project has appeared in The Washington Post, Time, The Chronicle of Higher Education, MarketWatch, Der Spiegel, Business Insider, Lifehacker, FiveThirtyEight, WNYC, QZ, and elsewhere. It's also been picked up by major news outlets in Europe, Russia, China, Japan, South Korea, Ukraine, Egypt, and Mexico.

We're looking for someone who has experience with large-scale data analysis, natural language processing, web archiving, and web application development to help us grow OSP into a comprehensive, feature-rich authority about teaching trends in higher education. Some of the things we're going to be working on in the coming months:

* Build a scalable infrastructure for crawling university websites for syllabi, with the goal of growing the corpus to 4-5M documents in the next 6 months.

* Expand the universe of books and articles that we search for in syllabi by identifying new bibliographic databases (Citeseer, arXiv) and integrating them into OSP’s data extraction pipeline.

* Write classifiers to improve the accuracy of the citation and metadata extraction jobs.

* Expand the public-facing web application to surface new types of information – visualize change in assignment trends over time, add profile pages for authors and publishers, and build richer ways to explore the citation graph.

* Help develop a research program around the data. We’re interested in applications to information science, literary studies, education policy, history of science, and canon / university studies.

If these kinds of projects sound interesting, we'd love to hear from you! We use Python for the data extraction rig and the public-facing website (Flask), Elasticsearch for citation extraction, React+Redux on the front end, and Ansible to manage infrastructure on AWS. Beyond specific technologies, though – first and foremost we're looking for a collaborator and partner who can help us build on what we have and push the project in new directions.

Drop us a line at syllabusopen@gmail.com.

Links:

* http://www.nytimes.com/2016/01/24/opinion/sunday/what-a-mill...

* https://www.washingtonpost.com/news/wonk/wp/2016/02/03/what-...

* http://time.com/4234719/college-textbooks-female-writers

* http://www.spiegel.de/unispiegel/studium/aristoteles-bis-mar...

* http://www.businessinsider.com/the-most-popular-required-rea...

* http://lifehacker.com/open-syllabus-project-shows-the-books-...

* http://fivethirtyeight.com/features/to-kill-a-mockingbird-au...

clured · on Jan 23, 2016

No, I don't believe there is, but something like that would be fantastic. I believe some universities in the UK publish reading lists in XML format, but I don't know of any US universities that do that.

It would certainly make this kind of citation analysis much easier and more accurate. Syllabi are tricky to work with because there's basically no standardization in how texts are referenced / assigned. Sometimes there's a full, structured bibliographic citation, but more often it's just a title / author pair, and the formatting can vary widely. It's an interesting information extraction problem.
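To give a feel for the title / author case, here is a minimal sketch in Python. It is not the project's actual pipeline: the `catalog` of (title, surname) pairs, the function names, and the matching rule (both the normalized title and the surname must appear as token runs in the same line) are all assumptions made for illustration.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase, strip accents and punctuation, and split into tokens."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.findall(r"[a-z0-9]+", ascii_text.lower())


def contains_phrase(tokens, phrase):
    """Return True if `phrase` occurs as a contiguous run inside `tokens`."""
    n = len(phrase)
    if n == 0:
        return False
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def find_assignments(syllabus_line, catalog):
    """Return the (title, surname) pairs from a hypothetical catalog that
    both appear in one line of syllabus text, regardless of order,
    capitalization, or punctuation."""
    tokens = normalize(syllabus_line)
    hits = []
    for title, surname in catalog:
        title_found = contains_phrase(tokens, normalize(title))
        author_found = contains_phrase(tokens, normalize(surname))
        if title_found and author_found:
            hits.append((title, surname))
    return hits


# Hypothetical catalog entries and syllabus lines, for illustration only.
catalog = [("The Republic", "Plato"), ("Leviathan", "Hobbes")]
lines = [
    "Week 2: Plato, The Republic (Books I-IV)",
    "HOBBES -- Leviathan, ch. 13",
    "Office hours: Tuesdays",
]
for line in lines:
    print(line, "=>", find_assignments(line, catalog))
```

Exact matching like this handles differences in order and punctuation, but it misses abbreviated or misspelled titles and over-matches short, common titles, which is part of what makes the real problem hard.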

jkaraganis · on Jan 23, 2016

Other author here... There is this new W3C initiative: https://www.w3.org/community/schema-course-extend/

danso · on Jan 23, 2016

Thanks! Excellent article btw

jkaraganis · on Jan 23, 2016

More generally, we decided that trying to get faculty to adopt better structured authoring tools for syllabi was hopeless. There are lots of good, unused syllabus-building tools.

elviejo · on Jan 24, 2016

Examples of such tools? Preferably open source

clured · on Jan 23, 2016

Co-author here. Yes, this UI is showing metadata extracted from the syllabi (namely, text assignments). Not the documents themselves, which, unfortunately, we're unable to make public for a mix of copyright and privacy reasons.

SeanLuke · on Jan 24, 2016

Why do you need to provide the documents? Why not just provide links to them? Google can do this with no legal issues: surely you can too.

gingerlime · on Jan 24, 2016

Like the GP, I was also disappointed not to have links to the original syllabi if they're available online.

For subjects like Human Anatomy, the list of potential books is very limited, but I'm more interested in how the course is structured: which anatomical structures or body regions are highlighted or covered, whether the syllabus uses a systems or regional approach, etc.

clured · on July 3, 2014

Hm, weird, definitely not intentional - that may just be something about how the font (IM Fell French Canon, via Google) gets rendered on Windows. I'll see if I can find something better.