- US & State Government tax information. The tax code is amazingly complex. It would be good if you could figure out the tax implications of various events. Better if they simplified the tax code to where this was possible.
Moreover, data.gov so far is a disappointment. Maybe they'll improve it. But I'd love to dig into purchasing and personnel costs for every branch of government.
- Good stock market historical tick data (and streaming data). Opentick tried for a time, but mostly you need to contact expensive services. This makes it hard to mess around.
- A good local event API. Lots of companies have tried, few have good results.
- LinkedIn: They've been promising an open contact book API since 2007, but have kept it closed. If they had an open API (that lets you actually store data/invite), it would mean a lot of sites would be building on them.
- My dream: The big scientific journals would require everyone publishing a paper to upload all relevant datasets to a central repository which could be queried.
- Anyone you have an account with (Financial firms, banks, vendors) would have a standard commerce API. Sure, today you can export stuff in Quicken, etc formats, but Mint had to do a big deal with Yodlee to get the data in a uniform, queryable way.
And, mad props to Microsoft for opening the Bing API with pretty good terms. Google used to have a search API and it had horrible terms. Then they decommissioned it. Who would have guessed that MS would be more developer-friendly than G?
"Who would have guessed that MS would be more developer-friendly than G?"
Anyone who's done Microsoft-based development? MS is incredibly developer-friendly compared to many platform vendors. It's a point of business strategy for them (if memory serves me, that's what Ballmer's infamous "developers, developers, developers, developers" thing was about in context).
Just don't try and mix in any technology from one of MS's competitors and you'll be fine.
True, they are developer-friendly. .NET is a good platform: it's extensive and well documented, and there is innovation in C# as well. But the purpose of all that goodness is to make you sell their licenses. Where Oracle has its sales force, Microsoft has its developers to do the selling for them.
Unfortunately their licensing is from an era that has come and gone. Building software that uses things like Windows Server, SQL Server, SharePoint, or Office means limiting your scale to what they call a "Micro ISV". You provide that package to your client and 95% of your revenues go straight to Microsoft.
You can build on their stuff but you won't scale and you won't grow, not because their technology doesn't scale, but because their licensing doesn't scale.
"You can build on their stuff but you won't scale and you won't grow, not because their technology doesn't scale, but because their licensing doesn't scale."
What about their licensing for Azure? Does that scale or is it more of the same?
I don't think it's entirely clear yet what direction Azure will take. The base offering using only .NET and Windows with simple storage is priced almost exactly the same as google app engine. It's difficult to compare to Amazon because the architecture is so different. Google and Azure (I believe) won't let you do any serious computation in memory whereas Amazon does support that very well.
However, look at SQL Azure. Microsoft thinks that their SQL offering is worth paying 66 times what you pay for google's database (or Azure BLOBs&Tables) and that's just for storage alone ($1 per GB, maxing out at 10GB right now). Add data transfer and you may get to 100 times.
Yes SQL Server has vastly more features than google's db, but does it make me 100 times more productive or profitable? I don't think so. If I need more SQL features I could run Postgres on Amazon or use Amazon's MySQL service for roughly 10% of what SQL Azure costs.
So if SQL Azure is the model of what's to come then I think it's indeed more of the same.
"My dream: The big scientific journals would require everyone publishing a paper to upload all relevant datasets to a central repository which could be queried."
I work on a site that's trying to do that for the learning sciences.
Most of the data so far is from various studies in the Pittsburgh Science of Learning Center, because the head of the PSLC can tell the researchers to put their data in there, but we'd like to convince others to share, too.
I also happen to be working on a web services API to this data at work right now.
What quotas do they have, though? I remember people reporting being blocked by Google for "looking like a bot". So I thought the search API would only work if the requests came from lots of different IP addresses, as would be the case for use in AJAX applications.
Tax information would be HUGE. I would also like to see an API for general government records.
For example, if we had voting records, we could finally stop hearing all this "but you voted on -----" "no I didn't go back and check the record" "no look you voted on ----- which is basically the same" "shut up I served in the war" "yay war" stuff every election. We could just look it up and say "hey look you did". The Times "Congress API" is on its way to this, but last I checked, all it had was the attendance records.
Interestingly, since the failure of OFX, many many banks in North America prefer screen scraping because they can guarantee the web interface has accurate data since their actual customers use it. Conversely, these banks find supporting APIs to be highly fragile since no one is watching them.
I find that logic amazing, but considering how old the online banking software is and the high risk of changing it wholesale, I don't think they are in a position to fix it in the short term.
Check out http://www.cruxle.com. It recommends movies and TV shows airing on TV that you might love to watch. We are planning to open up our TV guide recommendations via an XML-based API. Please send us an email at info@cruxle.com if you would like access to our APIs.
Generally: I wish practically everything had an open API. It's incredible what people can build when they have decent access to an API.
Specifically:
TuneCore, as it would make my startup idea a whole lot easier.
School registrations. I really wish there was some standardized API for universities. That'd make it possible to plug in the classes you want to take along with when you want to take them and get back a personalized schedule. As it is now (at least for my school), you pretty much have to write down on paper the classes you want to take along with when they're available and do it all yourself. I'd prefer something like this be standardized so one could make one website that would serve all universities and their students.
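For what it's worth, the core scheduling step is just a small constraint search. A minimal sketch (the course names and time slots below are made up, and real registration data would be far messier):

```python
from itertools import product

# Hypothetical offerings: course -> list of (day, hour) sections. Real data
# would come from each university's registration system.
offerings = {
    "CS101": [("Mon", 9), ("Wed", 13)],
    "MATH200": [("Mon", 9), ("Tue", 11)],
    "ENG150": [("Wed", 13), ("Thu", 15)],
}

def schedules(offerings):
    """Yield every combination of sections with no two classes at the same time."""
    courses = list(offerings)
    for combo in product(*(offerings[c] for c in courses)):
        if len(set(combo)) == len(combo):  # no time-slot collisions
            yield dict(zip(courses, combo))

print(len(list(schedules(offerings))))  # 4 conflict-free schedules
```

Brute force blows up for big catalogs, but with per-student course lists it stays tiny; the hard part is the missing API, not the search.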
Appliances: stove, oven, coffee maker, refrigerator. Various other household items: garage doors, locks, etc. This overly expensive, yet dumb, flat panel TV we just bought. Extending out to the driveway...my car. The energy meter on the side of the house (read-only access, obviously, just to keep the power co. happy). The most mundane stuff could use an API, if you ask me!
But more than a specific API, it would be cool if websites simply provided an XML (or JSON, etc.) version of every URL. E.g., http://news.ycombinator.com/item.xml?id=955077 would return the data on this page in XML format. This would be pretty simple to create (at least as a read-only API), would handle the situations where people resort to HTML scraping, and would effectively remove the need for API docs.
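A sketch of what that could look like server-side, with an invented item record (the field names here are assumptions, not HN's real schema):

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical HN item; in a real site this would come from the database.
item = {"id": 955077, "title": "Ask HN: What would you like an API for?", "score": 42}

def render(item, fmt):
    """Render the same record as JSON or XML, mimicking item.json / item.xml URLs."""
    if fmt == "json":
        return json.dumps(item)
    root = ET.Element("item")
    for key, value in item.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")

print(render(item, "json"))
print(render(item, "xml"))
```

The point is that once the page is rendered from structured data anyway, emitting a machine-readable variant is a few lines, not a separate API project.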
There's also RSS, but all of these only provide access to mail functionality, while Gmail is a lot more than that. The uses I have in mind are closer to Google Labs' "Unsend", or having web-service-like access for moving stuff around in Gmail, creating alternate UIs, etc.
Definite business opportunity there; the two services I have found (in this case I'm looking for Texas data) both consider using an iframe to be "integration".
Yes. I got hired to integrate MLS data and man was that a nightmare. Many are surprisingly low-tech and almost all use different formats. There is a half-assed attempt at standards (RETS), but that's for accessing the data not the data itself. Not to mention getting and presenting the data legally--each MLS group has different rules about what you're allowed to display (and it can be incredibly restricting). tl;dr the project went down in flames.
I'd love to see Google get ahold of the data and make it available through Base.
I'm sure Google will get there eventually, as they're doing with all industries that traditionally make information inaccessible. Google just enhanced Google Scholar to be a major competitor to Westlaw, which should make a lot of frugal lawyers happy.
I work for a sports data integration company, so you can get this data easily. Oh, you mean for free? Yeah, that's not gonna happen any time soon. Someone has to pay people to collect those statistics and then make them available.
You know how sportscasters always have some sort of weird statistic to pull out of their ass, like "Brett Favre has never lost a game, at home, with the temperature less than 34 degrees"? Is there some sort of API or query system so someone at the networks can throw this shit together out of the raw data in real time?
Again, the company I work for offers this kind of data but you have to pay for it. We have lots of clients who run fantasy baseball web sites and make use of our real-time stats feeds to update their games.
Do you ever have any issues with leagues about using their data? I figured something like this would be a good idea, to format stats etc, but thought I might get in trouble for using "their" stats inappropriately or whatever.
We offer stuff ranging from simple on-demand calls to our web service (you purchase credits, docs cost you X credits per access) to a Perl-based app that captures an XML feed and parses and inserts the data into a DB for you to use as you will at your end, usually MySQL but we support MS-SQL too.
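For illustration, the feed-into-database step might look something like this in Python with sqlite3 (the feed below is a toy; the element names are invented, not the vendor's actual schema):

```python
import sqlite3
import xml.etree.ElementTree as ET

# Toy stats feed -- a real vendor feed would have far more fields and games.
FEED = """<games>
  <game home="Yankees" away="Red Sox" home_score="5" away_score="3"/>
</games>"""

db = sqlite3.connect(":memory:")  # a real client would use MySQL or MS-SQL
db.execute("CREATE TABLE games (home TEXT, away TEXT, home_score INT, away_score INT)")
for g in ET.fromstring(FEED).iter("game"):
    db.execute("INSERT INTO games VALUES (?, ?, ?, ?)",
               (g.get("home"), g.get("away"),
                int(g.get("home_score")), int(g.get("away_score"))))
row = db.execute("SELECT home, home_score FROM games").fetchone()
print(row)  # ('Yankees', 5)
```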
We've got some pretty big clients: ESPN, USA Today are amongst them. We supplied Google with Olympic content during the Beijing Olympics.
I've been building a little side thing called snailpad (http://www.snailpad.com) and have a beta API running for some of the paying customers at this point.
Disclaimer: I'm an amateur in every sense of the word but eager to learn. A Craigslist API could make for some really cool mashups but many of those could probably also be created using the RSS feeds they provide. Do you see any substantial advantage to a distinct Craigslist API?
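For what it's worth, the RSS route gets you pretty far with just the standard library. A toy sketch (the listing below is made up, and real Craigslist feeds differ in format):

```python
import xml.etree.ElementTree as ET

# A made-up feed item in a simplified RSS shape; real feeds are more involved.
RSS = """<rss><channel>
  <item>
    <title>Free couch (mission district)</title>
    <link>http://example.org/listing/123</link>
  </item>
</channel></rss>"""

root = ET.fromstring(RSS)
listings = [{"title": i.findtext("title").strip(), "link": i.findtext("link").strip()}
            for i in root.iter("item")]
print(listings[0]["title"])
```

A real API's advantage over the feeds would be search parameters, pagination, and posting, none of which RSS gives you.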
There was a mashup that did something with Craigslist and photos, but Craigslist shut them down. I think it allowed you to view the listing with the photos in it (rather than needing to click on each listing to see the photos).
No kidding. He's looking to get "comments for a given thread, score of posts and comments, and user information", and you can use an HTML parser (like Hpricot, if you're using Rails) to retrieve any info you need.
You're being annoying. Nobody wants to screenscrape, it's awkward and fiddly and fragile and verbose and requires reverse engineering and is wasteful of bandwidth.
It really doesn't matter if you can get the information by screenscraping. The thread topic is "what would you like an API for?", not "scorn people about what they want an API for."
Care to explain why I'm being downvoted?
Because I want less of this sort of thing on HN. I want it to fall to the bottom of the page, and to discourage it in future. I cannot articulate precisely what it is that I want less of, it's a big vague fuzzy blob of things, some of which I even agree with, and your few posts here and above here are within it, so down they go.
Subway. At the stations here they have displays that show how many minutes until the next train arrives (which is real-time data). It would be cool if they broadcast this information on the net; you could see whether you need to hurry to the station or not.
A real-world grep. This might be solved in my lifetime, since books, notes, etc. are all moving to digital. Sometimes I just wish I could grep for x and have it find x in all my books, notebooks, etc.
I'm actually working on this problem. It's remarkable the amount of content we actually generate and the number of services/mediums we generate them on. Simply fetching all the data and hosting it centrally is a large enough problem, let alone indexing all of it.
I really want an API that provides info on TV shows. Ever since I heard about the Tommyverse[1] I've wanted to build an application that automatically maps it and lets you play around with the connections between shows, and while it's technically possible to do it with the data IMDB does make available, you can't easily make it public and it's a real hassle.
Fortunately there are a few different startups working on this, and there's a rudimentary way to hack it with the Google AJAX API, but there's still nobody who can let me simply punch in a name+city search and get back the address and geocode metadata reliably for any city in the world, without cache restrictions.
Google has it all but needs to open this data up better.
Is that even possible at the moment? Most software I've tried just isn't ready for something like that (granted, I haven't tried any of the commercial products). Tesseract is pretty good, but definitely not perfect.
The general answer to the character recognition problem is negative (currently not possible). However, several important subproblems are solvable. For example, if you know that a particular image came from a printed text, or from handwriting, or a (printed) form, etc, there are some very good solvers. Not perfect (neither is a human, in case of handwriting or a poor fax), but good to very good.
For further googling, see terms ICR, OCR, character recognition.
As a user, I have had very good results with Finereader. Have not tried Tesseract. Parascript was good with online character recognition, but that market is small, and I have not looked at them for a while (disclaimer: I used to work in a previous incarnation of the company).
"Not perfect" is good enough for me. There's a whole class of applications where being able to take a photo of some words and have at least a few of them understood would be useful. I'd like to be able to index against the words that can be understood, and present the original image as the search result.
I think it would be a fantastic idea to be able to hold up a cameraphone, take a picture of a sign in a foreign country, send it to the cloud, and have the text recognised and Google-translated.
"Not perfect" would be fine for many non-life-or-death signs.
College class schedules (as well as registration schedules). Currently, even the best are buggy, slow and cumbersome. I have to take certain classes at different schools in order to fulfill my degree requirements and keep working full time. So, I often need to search for a single class across multiple institutions for the time that fits.
I had the same problem: I worked overtime (startup life) and took classes at two institutions as a way around the bureaucracy for pre-reqs. I ended up automating the class lookup process using Selenium on Firefox. But then they called me in because my account was making requests every second, and they were worried more students would try what I was doing and crash the system. (They DID tell me to keep checking for open spots; they never said not to use scripts.)
This was the only way I could get the classes I needed to graduate on time
Suggest/autocomplete. Well, it's not really enough to call it an API, since it's already just AJAX requests for JSON, but it'd be nice if the format was standardized and they all supported JSONP so that it can be cross-domain.
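The JSONP part is trivial to support server-side: wrap the JSON payload in a caller-supplied callback so a plain script tag can load it cross-domain. A minimal sketch (callback name and suggestions are invented):

```python
import json

def jsonp(callback, suggestions):
    """Wrap a JSON payload in a callback call so a <script> tag can load it
    cross-domain; the callback name comes from the request's query string."""
    return "%s(%s);" % (callback, json.dumps(suggestions))

print(jsonp("onSuggest", ["api", "apache", "apple"]))
```

A standardized format would then just mean agreeing on the query parameter names and the shape of the suggestions array.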
I've always wanted to break into CL's space due to their utter disregard for the garden of web APIs growing up around them. They could be doing so much more with their service for the community, and I don't mean going the big time corporate route obsessed with short-term revenue.
There have been many CL clones (Kijiji, for one), but none gained enough eyeballs to make an interoperating API work. I suspect these attempts failed because they didn't pay attention to the community and instead acted like closed corporations right out of the starting gate. I thought the CL killer would appear on Facebook, but that hasn't happened because of the totally different dynamics in Facebook's ecosystem.
Actually, I'm surprised that I haven't seen someone on here posting a hacked-up API wrapper around Craigslist as a mini-startup.
Sorely needed. I can imagine pretty amazing mashups that revolve around your local community if Craigslist had an API. I'd imagine there would be a much bigger developer community than Twitter's because of the usefulness of their data and how important they are to local businesses.
A semantic equivalent to an "#include" or "using" statement that looked for the library in a central repository instead of on the local disk, automatically managing caching and versioning.
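A toy sketch of the resolver half of that idea, with the network call stubbed out (everything here, names included, is hypothetical):

```python
import os
import tempfile

def resolve(name, version, fetch, cache_dir):
    """Return a local path for name@version, fetching from a central
    repository only on a cache miss."""
    path = os.path.join(cache_dir, "%s-%s" % (name, version))
    if not os.path.exists(path):
        with open(path, "w") as f:
            f.write(fetch(name, version))  # stand-in for the network call
    return path

cache = tempfile.mkdtemp()
calls = []

def fake_fetch(name, version):
    """Pretend repository client; records each fetch so we can see cache hits."""
    calls.append((name, version))
    return "source of %s %s" % (name, version)

p1 = resolve("leftpad", "1.0", fake_fetch, cache)
p2 = resolve("leftpad", "1.0", fake_fetch, cache)  # cache hit, no second fetch
print(len(calls))  # fetched exactly once
```

Pinning an exact version in the statement is what makes the caching safe; a floating "latest" would reintroduce all the usual dependency headaches.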
Pandora - Music Genome Project. I've been working on a media player, AuraMP, for a number of years and I've really wanted to plug into sites like Pandora.
Correct, they do have an open API, but it just doesn't always seem to have the capabilities or content that Pandora has. Ideally I'd use as many of the major providers as possible.
I wrote a parser for NBA games that gets the data straight from Yahoo. Using Python and BeautifulSoup, it's a piece of cake. I would think MLB could work the same way.
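A rough idea of the approach, using only the stdlib's html.parser instead of BeautifulSoup so it's self-contained (the markup below is invented, not Yahoo's actual page structure):

```python
from html.parser import HTMLParser

# Hypothetical box-score markup; real pages are messier and change without notice.
HTML = ('<table><tr><td class="team">Lakers</td><td class="pts">102</td></tr>'
        '<tr><td class="team">Celtics</td><td class="pts">98</td></tr></table>')

class ScoreParser(HTMLParser):
    """Collect (team, points) rows from cells tagged with class attributes."""
    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None
    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._cell = dict(attrs).get("class")
    def handle_data(self, data):
        if self._cell == "team":
            self.rows.append({"team": data})
        elif self._cell == "pts":
            self.rows[-1]["pts"] = int(data)
    def handle_endtag(self, tag):
        if tag == "td":
            self._cell = None

parser = ScoreParser()
parser.feed(HTML)
print(parser.rows)
```

The fragility everyone complains about is right there: one class-name change on the site and the parser silently breaks, which is exactly why an official API would be nicer.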
Yeah, HTML scraping ESPN.com is not very hard. I did it for college basketball once in a strange sort of database that attempted to predict March Madness brackets. It worked really well for teams that had encountered each other during the season... except only 2 or 3 pairs of teams had ever played each other during the regular season. Works much better on NBA and NFL games, where there are far fewer teams, they see each other more often, and the rosters are a little more static.
I've seen a few people respond with this "you can just html scrape that." Sure you can HTML-scrape for the information, but the topic of this "Ask HN" is about what APIs you would like to see. Maybe he already HTML-scrapes ESPN.com, but would prefer that there was an official API for it, no?
and justMagicallyWork(), the call that papers over leaky abstractions, off-by-one errors, misunderstandings, logical errors in your thinking and just does whatever it is you were trying to do.
>>> data = urlopen('example.org/lovelydata/2009/')
YetAnotherException: [...]
It needs some kind of web form cookie login non-standard authentication. Aaargh I can't spend any more time on this sub-project it was only supposed to take two minutes!
>>> import bigGuns
Warning, Universe enters a fragile state. Tread carefully.
>>> data = justMagicallyWork(urlopen('example.org/lovelydata/2009/'))
Success. Cost: 12 Karma.
>>> del bigGuns
normality restored
Local movie times. It's stupid this isn't available from someone. The only thing it's going to do is increase the number of people going to see a movie.
Maybe there's one now but 3 years ago there wasn't one and I had to build a scraper for Yahoo movies to build the product I wanted.
An API for a service that prints and snail mails documents cheaply and professionally in the Netherlands. That would let me build a cool online billing app.