File:Linguistic Regularities in Continuous Space Word Representations Rvecs.pdf
Tomas Mikolov∗, Wen-tau Yih, Geoffrey Zweig, Microsoft Research, Redmond, WA 98052
Continuous space language models have recently demonstrated outstanding results across a variety of tasks. In this paper, we examine the vector-space word representations that are implicitly learned by the input-layer weights. We find that these representations are surprisingly good at capturing syntactic and semantic regularities in language, and that each relationship is characterized by a relation-specific vector offset. This allows vector-oriented reasoning based on the offsets between words. For example, the male/female relationship is automatically learned, and with the induced vector representations, “King − Man + Woman” results in a vector very close to “Queen.” We demonstrate that the word vectors capture syntactic regularities by means of syntactic analogy questions (provided with this paper), and are able to correctly answer almost 40% of the questions. We demonstrate that the word vectors capture semantic regularities by using the vector offset method to answer SemEval-2012 Task 2 questions. Remarkably, this method outperforms the best previous systems.
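The vector offset idea in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-dimensional vectors in `TOY_VECTORS` are invented (they are not taken from the paper's model), the helper names `cosine` and `solve_analogy` are hypothetical, and ranking candidates by cosine similarity while excluding the three question words is one common way to pick the answer, which the abstract itself does not specify.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# They are NOT the representations learned by the paper's model.
TOY_VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.2, 0.8],
    "apple": [0.0, 0.1, 0.1],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' with the vector offset b - a + c.

    The three question words are excluded from the candidates
    (an assumption of this sketch).
    """
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# "man is to king as woman is to ?" corresponds to King - Man + Woman.
answer = solve_analogy("man", "king", "woman", TOY_VECTORS)
print(answer)
```

With these hand-picked vectors the offset lands on the vector for "queen"; with real learned representations the result is only approximately close to the answer word, which is why a nearest-neighbour search is needed.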