Beyond Local Pattern Matching: Recent Advances in Machine Reading

nopinsight · on Feb 28, 2019

It is noticeable that all of the top 4 results in the recent Stanford Conversational Question Answering (CoQA) challenge are from groups based in China (the top 2 are from Microsoft Research Asia; the 3th and 4th, Sogou and Fudan; the 5th, anonymous):

https://stanfordnlp.github.io/coqa/

Chinese universities are ranked at 1st, 3th, and 4th in the world for papers published in top AI conferences in 2018. (Tsinghua at the top; followed by CMU, Beijing, CAS, and Stanford) Although some people might disagree with the methodology (e.g. larger departments have an advantage), it is still a decent indicator for China’s progress in the field.

http://csrankings.org/#/fromyear/2018/toyear/2018/index?ai&v...

Perhaps it is time for the US as a nation to spend more resources to develop and attract more people into AI research.

See also an Economist’s review of a recent book: “In the struggle for AI supremacy, China will prevail; Or so reckons Kai-Fu Lee, a tech insider, in “AI Superpowers””

https://www.economist.com/books-and-arts/2018/09/27/in-the-s...

1_over_n · on Feb 28, 2019

Yesterday i was researching IP strategies for AI companies and came across this paper from a legal firm.

https://www.bereskinparr.com/files/file/IAM88_AI-and-IP_Jim%...

It blew my mind how much China is filing patents around AI compared to the rest of the world. It should also be considered that from an IP perspective American companies could be going down the trade secret route and academia is open sourcing. I don't have any data on this. I know in the USA trade secrets do now have some legal protection.

https://www.uspto.gov/patents-getting-started/international-...

speedplane · on Feb 28, 2019

China has been filing tons of patents because of the national prestige it brings. A large percentage are of dubious quality and they are setting themselves up for a huge patent troll problem 10 years from now, much like the US had as a result of bogus late 1990/early 2000s patents.

The US patent system has become much better (although Europe is still even better). They now reject over 90% of business method patents (i.e., doing business with a computer). Many of those would be granted in China.

The number of granted patents in any given country is not a good indicator of technical progress.

1_over_n · on Feb 28, 2019

So the question becomes - will China respect the USA's patents, will the USA respect Chinese patents?

speedplane · on Feb 28, 2019

There are well established treaties for this. No country's patent office is forced to grant a patent filed in another country. For example, if you get a patent granted in the U.S., there's no guarantee it'll be granted in Britain. If you want a patent in both countries, you'll need to file a separate patent application in both countries.

The same goes for the U.S. and China. If a U.S. company only has U.S. patents, companies operating in China can make a product that directly infringes it. That said, the second that product is imported into the U.S., the U.S. patent will kick in, and you can get a judge to block it at the border.

Going back to the AI space, the fact that China has a ton of AI patents in their country won't stop Google from developing the same in the U.S. However, if Google tried to offer those infringing services to Chinese residents, they could be sued/stopped in China.

JamesBarney · on Feb 28, 2019

I think I understand how this works with a tangible good. But I have no idea how this works for more abstract goods, for instance like a machine learning patents.

Say I was given a u.s. patent on a an AI process that looks at photos of parking lots and tells you number, type, and length of stay of different cars. Now if someone steals my patent and sets up a server in country I don't like a patent like China. And then another company sends pictures of parking lots to the Chinese copycat company which runs my patented algorithm and then sends them back the data.

Is this illegal? Is there any way to stop it?

speedplane · on March 2, 2019

> Say I was given a u.s. patent on a an AI process .... Now if someone steals my patent and sets up a server in country I don't like a patent like China. And ... runs my patented algorithm and then sends them back the data. > > Is this illegal? Is there any way to stop it?

Yes, under U.S. patent law, this would still probably be illegal (but see below for some nuance). Given how globalized our economy has become, a very similar issue already came up over ten years ago. It wasn't between U.S. and China, but between U.S. and Canada.

Ten years ago, Blackberry reigned supreme in cell phones, and they were a Canadian company. They were sued by companies in U.S. courts holding U.S. patents over networking technology. Blackberry argued that most of the tech covered by the U.S. patents actually took place in Canada (where there were no corresponding patents). The U.S. judges said that, even though most of the steps were performed outside the U.S., U.S. patent law still applied. This is the short version of the story, you can read a much better write-up here: https://www.finnegan.com/en/insights/the-extraterritorial-re...

So, going back to your example of U.S. patented AI algorithms being run in China: If everything stays in China, then it won't be covered by U.S. patent law. However, even if just the results from that patented process that was done entirely in China get imported back into the U.S., there's a good chance that U.S. patent law would kick-in and whoever did that importing could be sued.

That said, this is a relatively new area of patent law. Patent law has been around for hundreds of years, but globalization has only been around for a few decades. The truth is that U.S. judges don't come across these issues very often, so even though it's likely illegal, maybe it isn't, it's very fact and case-by-case specific. It's likely the exact parameters of what is and isn't allowed will be fought in courts for many more years to come.

theblackcat1002 · on Feb 28, 2019

FYI, the top 6 results are mainly based on BERT which is proposed by Google[1]. [1] https://arxiv.org/abs/1810.04805

nopinsight · on Feb 28, 2019

Yes, BERT is a significant advance. I think most would agree that many key advances still originate from the US. (And the US is probably still ahead overall in research.) China is apparently faster at adapting and deploying these changes in some areas and they are very focused on excelling at the technology, which has many major implications on society.

antpls · on Feb 28, 2019

Those rankings based on countries don't say much IMHO. It depends on :

- the score. If the top 5 achieves 85/100 and the the rest of the top 20 achieves 83/100, they are all on par.

- the field. If it's academic with open source code and scientific results shared with everyone on Earth, who cares who is in the top 5?

It's a global effort

nopinsight · on Feb 28, 2019

Actual figures for CoQA from the link above: No.1 is 86.8%; no.5, 81.4%; no.20, 66.5%. No.5, though quite impressive (humans are at 88.8%), still give about two-fifth more errors than no.1.

These technologies tend toward natural monopolies as users tend to choose among the top 3-4 providers which allow them to spread costs, lower prices, and acquire yet more data. The winners at the right time tend to hold long-term monopolistic/oligopolistic power, until a major breakthrough happens.

Also, all results in the top 20 from the same link are from either the US or China (except for a couple of anonymous results). At least in this subfield, there could be only two national contenders, despite many key papers being open.

Exact algorithms, implementation details, and system engineering and integration count.

andreyk · on Feb 28, 2019

Some other fun recent-ish developments in QA datasets:

- Natural Questions (https://ai.googleblog.com/2019/01/natural-questions-new-corp...)

- SearchQA (https://arxiv.org/abs/1704.05179)

Many AI researchers worry the community is overfitting to getting good numbers and not thinking big enough, but it seems to me datasets evolve fast enough to keep up with doing interesting things.

palad1n · on Feb 28, 2019

At what point would we say that a machine/system actually understands natural language? Or is the Chinese Room Argument always going to apply?

serioussecurity · on Feb 28, 2019

Maybe we reckon with the issue that "understand" doesn't mean anything in the absence of it's implications and we rephrase the question. What do you view as consequences of deciding that it "understands"?

palad1n · on Feb 28, 2019

This might go with the question of whether the Turing Test is a valid measure of "intelligence". If we can't tell the difference between a machine understanding something and a person, is that all that is required?

pas · on Feb 28, 2019

There's a view that intelligence without context is meaningless. (You can't do pattern matching without definition of what your signal is -- and if you just use a trivial measure, like entropy, then white noise is the best signal!) So, you can't be generally intelligent. You can be great at a lot of things, but that means you have priors for those things, which inherently predisposes you against other priors. (Though there are of course basically infinite dimensions, in which you can have priors, but let's just assume some dimensions are more important than others.)

What we want is adaptability under goal permanence, meta-goals, usually called alignment. We want an intelligence that is very similar to us. And we like to talk a lot, and we can talk about everything, so we can represent our intelligence through talking, so a Turing test could be a great way to measure how human-similar two "intelligences" are.

mjburgess · on Feb 28, 2019

The machine has no reference for its terms, "salt" does not refer.

To a machine "salt" is just a distributional pattern within a large body of text, it isnt salt.

Ie., what computer scientists call "semantics" is again metaphorical BS, as much as calling an if-statement a "decision maker".

Denotation is the most fundamental part of what a natural language is (ie., it is about the world, and caused by the world in a fundamental way). Machines processing terms without their denotations are not using language, they are shuffling it aorund.

kreelman · on Feb 28, 2019

This looks to be very useful. I wonder how long it will be before we can have one of these datasets being used around the house. Would help with cheating on homework!