
GaroVec v1.0 is the first publicly documented Latin-script Garo embedding model, developed by MWire Labs to support linguistic equity and low-resource NLP for Northeast India. It combines FastText-style subword embeddings with bilingual alignment techniques to create a hybrid English↔Garo vector space, enabling cross-lingual applications such as lexicon building, translation support, and semantic search.
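As a rough illustration of the two ideas named above, the sketch below shows how FastText-style character n-grams can be extracted from a word and how a nearest-neighbour lookup works once English and Garo vectors share one space. It is not GaroVec's implementation: the n-gram range (3 to 6), the helper names `char_ngrams`, `cosine`, and `nearest`, the three-dimensional toy vectors, and the placeholder keys `garo_token_1` and `garo_token_2` are all assumptions made for the example, and the alignment step itself is not shown.

```python
import math


def char_ngrams(word, min_n=3, max_n=6):
    """Return FastText-style character n-grams of a word.

    The word is wrapped in '<' and '>' so that prefixes and suffixes
    get their own n-grams. The 3..6 range is an assumed default here,
    not a documented GaroVec setting.
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest(query_vec, candidates):
    """Return the key in `candidates` whose vector is closest to `query_vec`."""
    return max(candidates, key=lambda key: cosine(query_vec, candidates[key]))


# Toy vectors, invented purely for illustration. In an aligned bilingual
# space, an English query vector can be compared directly with Garo vectors.
english_query = [0.9, 0.1, 0.0]
garo_side = {
    "garo_token_1": [0.8, 0.2, 0.1],  # hypothetical placeholder entry
    "garo_token_2": [0.0, 0.1, 0.9],  # hypothetical placeholder entry
}

print(char_ngrams("garo", 3, 3))
print(nearest(english_query, garo_side))
```

Subword n-grams matter for a low-resource language because a word that never appeared in training can still be given a vector from the n-grams it shares with known words. The cross-lingual lookup is only meaningful after the two languages have been mapped into a common space.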


This model was built in collaboration with native Garo speakers and is part of a broader initiative to create reproducible, timestamped language resources for endangered and underrepresented languages. GaroVec is optimized for Latin-script Garo, which is commonly used in digital communication and educational contexts, and is designed to be modular and extensible for future dialectal or phonetic variants.


Released under the CC BY-SA 4.0 license, which allows reuse with attribution and share-alike redistribution, GaroVec v1.0 is intended for public use in research, education, and civic technology. It is hosted on Hugging Face with full documentation, including training methodology, evaluation notes, and usage examples. This submission aims to make GaroVec discoverable to linguists, educators, and technologists working to preserve and revitalize the Garo language.

ELP Language
Garo
ELP Categories
Language and Technology Language Revitalization, Education, and Learning
Resource Types
App/Software
Country
India
Tag
Computational Linguistics and NLP, Creating Digital Materials, Technology

Source URL: https://www.endangeredlanguages.com/resource/garovec-v10-hybrid-english-garo-embedding-model