Wiki word vectors
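We are publishing pre-trained word vectors for 294 languages, trained on Wikipedia using fastText. These 300-dimensional vectors were obtained with the skip-gram model described in Bojanowski et al. (2016), using default parameters.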
Please note that a newer version of the multilingual word vectors is available at: https://fasttext.cc/docs/en/crawl-vectors.html
Models
The models can be downloaded from the list below:
Abkhazian: bin+text, text | Acehnese: bin+text, text | Adyghe: bin+text, text |
Afar: bin+text, text | Afrikaans: bin+text, text | Akan: bin+text, text |
Albanian: bin+text, text | Alemannic: bin+text, text | Amharic: bin+text, text |
Anglo_Saxon: bin+text, text | Arabic: bin+text, text | Aragonese: bin+text, text |
Aramaic: bin+text, text | Armenian: bin+text, text | Aromanian: bin+text, text |
Assamese: bin+text, text | Asturian: bin+text, text | Avar: bin+text, text |
Aymara: bin+text, text | Azerbaijani: bin+text, text | Bambara: bin+text, text |
Banjar: bin+text, text | Banyumasan: bin+text, text | Bashkir: bin+text, text |
Basque: bin+text, text | Bavarian: bin+text, text | Belarusian: bin+text, text |
Bengali: bin+text, text | Bihari: bin+text, text | Bishnupriya Manipuri: bin+text, text |
Bislama: bin+text, text | Bosnian: bin+text, text | Breton: bin+text, text |
Buginese: bin+text, text | Bulgarian: bin+text, text | Burmese: bin+text, text |
Buryat: bin+text, text | Cantonese: bin+text, text | Catalan: bin+text, text |
Cebuano: bin+text, text | Central Bicolano: bin+text, text | Chamorro: bin+text, text |
Chavacano: bin+text, text | Chechen: bin+text, text | Cherokee: bin+text, text |
Cheyenne: bin+text, text | Chichewa: bin+text, text | Chinese: bin+text, text |
Choctaw: bin+text, text | Chuvash: bin+text, text | Classical Chinese: bin+text, text |
Cornish: bin+text, text | Corsican: bin+text, text | Cree: bin+text, text |
Crimean Tatar: bin+text, text | Croatian: bin+text, text | Czech: bin+text, text |
Danish: bin+text, text | Divehi: bin+text, text | Dutch: bin+text, text |
Dutch Low Saxon: bin+text, text | Dzongkha: bin+text, text | Eastern Punjabi: bin+text, text |
Egyptian Arabic: bin+text, text | Emilian_Romagnol: bin+text, text | English: bin+text, text |
Erzya: bin+text, text | Esperanto: bin+text, text | Estonian: bin+text, text |
Ewe: bin+text, text | Extremaduran: bin+text, text | Faroese: bin+text, text |
Fiji Hindi: bin+text, text | Fijian: bin+text, text | Finnish: bin+text, text |
Franco_Provençal: bin+text, text | French: bin+text, text | Friulian: bin+text, text |
Fula: bin+text, text | Gagauz: bin+text, text | Galician: bin+text, text |
Gan: bin+text, text | Georgian: bin+text, text | German: bin+text, text |
Gilaki: bin+text, text | Goan Konkani: bin+text, text | Gothic: bin+text, text |
Greek: bin+text, text | Greenlandic: bin+text, text | Guarani: bin+text, text |
Gujarati: bin+text, text | Haitian: bin+text, text | Hakka: bin+text, text |
Hausa: bin+text, text | Hawaiian: bin+text, text | Hebrew: bin+text, text |
Herero: bin+text, text | Hill Mari: bin+text, text | Hindi: bin+text, text |
Hiri Motu: bin+text, text | Hungarian: bin+text, text | Icelandic: bin+text, text |
Ido: bin+text, text | Igbo: bin+text, text | Ilokano: bin+text, text |
Indonesian: bin+text, text | Interlingua: bin+text, text | Interlingue: bin+text, text |
Inuktitut: bin+text, text | Inupiak: bin+text, text | Irish: bin+text, text |
Italian: bin+text, text | Jamaican Patois: bin+text, text | Japanese: bin+text, text |
Javanese: bin+text, text | Kabardian: bin+text, text | Kabyle: bin+text, text |
Kalmyk: bin+text, text | Kannada: bin+text, text | Kanuri: bin+text, text |
Kapampangan: bin+text, text | Karachay_Balkar: bin+text, text | Karakalpak: bin+text, text |
Kashmiri: bin+text, text | Kashubian: bin+text, text | Kazakh: bin+text, text |
Khmer: bin+text, text | Kikuyu: bin+text, text | Kinyarwanda: bin+text, text |
Kirghiz: bin+text, text | Kirundi: bin+text, text | Komi: bin+text, text |
Komi_Permyak: bin+text, text | Kongo: bin+text, text | Korean: bin+text, text |
Kuanyama: bin+text, text | Kurdish (Kurmanji): bin+text, text | Kurdish (Sorani): bin+text, text |
Ladino: bin+text, text | Lak: bin+text, text | Lao: bin+text, text |
Latgalian: bin+text, text | Latin: bin+text, text | Latvian: bin+text, text |
Lezgian: bin+text, text | Ligurian: bin+text, text | Limburgish: bin+text, text |
Lingala: bin+text, text | Lithuanian: bin+text, text | Livvi_Karelian: bin+text, text |
Lojban: bin+text, text | Lombard: bin+text, text | Low Saxon: bin+text, text |
Lower Sorbian: bin+text, text | Luganda: bin+text, text | Luxembourgish: bin+text, text |
Macedonian: bin+text, text | Maithili: bin+text, text | Malagasy: bin+text, text |
Malay: bin+text, text | Malayalam: bin+text, text | Maltese: bin+text, text |
Manx: bin+text, text | Maori: bin+text, text | Marathi: bin+text, text |
Marshallese: bin+text, text | Mazandarani: bin+text, text | Meadow Mari: bin+text, text |
Min Dong: bin+text, text | Min Nan: bin+text, text | Minangkabau: bin+text, text |
Mingrelian: bin+text, text | Mirandese: bin+text, text | Moksha: bin+text, text |
Moldovan: bin+text, text | Mongolian: bin+text, text | Muscogee: bin+text, text |
Nahuatl: bin+text, text | Nauruan: bin+text, text | Navajo: bin+text, text |
Ndonga: bin+text, text | Neapolitan: bin+text, text | Nepali: bin+text, text |
Newar: bin+text, text | Norfolk: bin+text, text | Norman: bin+text, text |
North Frisian: bin+text, text | Northern Luri: bin+text, text | Northern Sami: bin+text, text |
Northern Sotho: bin+text, text | Norwegian (Bokmål): bin+text, text | Norwegian (Nynorsk): bin+text, text |
Novial: bin+text, text | Nuosu: bin+text, text | Occitan: bin+text, text |
Old Church Slavonic: bin+text, text | Oriya: bin+text, text | Oromo: bin+text, text |
Ossetian: bin+text, text | Palatinate German: bin+text, text | Pali: bin+text, text |
Pangasinan: bin+text, text | Papiamentu: bin+text, text | Pashto: bin+text, text |
Pennsylvania German: bin+text, text | Persian: bin+text, text | Picard: bin+text, text |
Piedmontese: bin+text, text | Polish: bin+text, text | Pontic: bin+text, text |
Portuguese: bin+text, text | Quechua: bin+text, text | Ripuarian: bin+text, text |
Romani: bin+text, text | Romanian: bin+text, text | Romansh: bin+text, text |
Russian: bin+text, text | Rusyn: bin+text, text | Sakha: bin+text, text |
Samoan: bin+text, text | Samogitian: bin+text, text | Sango: bin+text, text |
Sanskrit: bin+text, text | Sardinian: bin+text, text | Saterland Frisian: bin+text, text |
Scots: bin+text, text | Scottish Gaelic: bin+text, text | Serbian: bin+text, text |
Serbo_Croatian: bin+text, text | Sesotho: bin+text, text | Shona: bin+text, text |
Sicilian: bin+text, text | Silesian: bin+text, text | Simple English: bin+text, text |
Sindhi: bin+text, text | Sinhalese: bin+text, text | Slovak: bin+text, text |
Slovenian: bin+text, text | Somali: bin+text, text | Southern Azerbaijani: bin+text, text |
Spanish: bin+text, text | Sranan: bin+text, text | Sundanese: bin+text, text |
Swahili: bin+text, text | Swati: bin+text, text | Swedish: bin+text, text |
Tagalog: bin+text, text | Tahitian: bin+text, text | Tajik: bin+text, text |
Tamil: bin+text, text | Tarantino: bin+text, text | Tatar: bin+text, text |
Telugu: bin+text, text | Tetum: bin+text, text | Thai: bin+text, text |
Tibetan: bin+text, text | Tigrinya: bin+text, text | Tok Pisin: bin+text, text |
Tongan: bin+text, text | Tsonga: bin+text, text | Tswana: bin+text, text |
Tulu: bin+text, text | Tumbuka: bin+text, text | Turkish: bin+text, text |
Turkmen: bin+text, text | Tuvan: bin+text, text | Twi: bin+text, text |
Udmurt: bin+text, text | Ukrainian: bin+text, text | Upper Sorbian: bin+text, text |
Urdu: bin+text, text | Uyghur: bin+text, text | Uzbek: bin+text, text |
Venda: bin+text, text | Venetian: bin+text, text | Vepsian: bin+text, text |
Vietnamese: bin+text, text | Volapük: bin+text, text | Võro: bin+text, text |
Walloon: bin+text, text | Waray: bin+text, text | Welsh: bin+text, text |
West Flemish: bin+text, text | West Frisian: bin+text, text | Western Punjabi: bin+text, text |
Wolof: bin+text, text | Wu: bin+text, text | Xhosa: bin+text, text |
Yiddish: bin+text, text | Yoruba: bin+text, text | Zazaki: bin+text, text |
Zeelandic: bin+text, text | Zhuang: bin+text, text | Zulu: bin+text, text |
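Format
The word vectors come in both the binary and the text default formats of fastText.
In the text format, each line contains a word followed by its vector. The values are separated by spaces. Words are sorted by frequency, in descending order.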
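The text format described above can be read with a few lines of Python. The following is a minimal sketch, not the official loader: the function name `parse_vec` and the `limit` parameter are hypothetical, and it assumes that the first line of the file is a header of the form `<word count> <dimension>`, which this page does not state. Because words are sorted by descending frequency, stopping after `limit` entries keeps the most frequent words.

```python
import io
from typing import Dict, Iterable, List, Optional, Tuple


def parse_vec(
    lines: Iterable[str], limit: Optional[int] = None
) -> Tuple[int, Dict[str, List[float]]]:
    """Parse text-format vectors: one '<word> <v1> ... <vd>' entry per line."""
    it = iter(lines)
    # Assumption: the first line is a header "<word count> <dimension>".
    header = next(it).split()
    dim = int(header[1])

    vectors: Dict[str, List[float]] = {}
    for line in it:
        if limit is not None and len(vectors) >= limit:
            break  # file is frequency-sorted, so these are the top words
        tokens = line.rstrip().split(" ")
        if len(tokens) != dim + 1:
            continue  # skip lines that do not match the declared dimension
        vectors[tokens[0]] = [float(value) for value in tokens[1:]]
    return dim, vectors


# Tiny in-memory sample standing in for a downloaded text file.
sample = io.StringIO(
    "3 4\n"
    "the 0.1 0.2 0.3 0.4\n"
    "of -0.5 0.0 0.25 1.0\n"
    "cat 1 2 3 4\n"
)
dim, vectors = parse_vec(sample, limit=2)
print(dim, list(vectors))
```

For a real download, the same function can be applied to a file object opened with `open(path, encoding="utf-8")`, where `path` points to whichever text file you fetched.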
License
The word vectors are distributed under the Creative Commons Attribution-ShareAlike 3.0 License.
References
If you use these word vectors, please cite the following paper:
P. Bojanowski*, E. Grave*, A. Joulin, T. Mikolov, Enriching Word Vectors with Subword Information
@article{bojanowski2016enriching,
title={Enriching Word Vectors with Subword Information},
author={Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas},
journal={arXiv preprint arXiv:1607.04606},
year={2016}
}