Language

Garo

Resource Type

App/Software

Computational Linguistics and NLP

Creating Digital Materials

Technology

GaroVec v1.0 — Hybrid English↔Garo Embedding Model

GaroVec v1.0 is the first publicly documented Latin-script Garo embedding model, developed by MWire Labs to support linguistic equity and low-resource NLP for Northeast India. It combines FastText-style subword embeddings with bilingual alignment techniques to create a hybrid English↔Garo vector space, enabling cross-lingual applications such as lexicon building, translation support, and semantic search.
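The page does not publish GaroVec's training code, so what follows is only a minimal plain-Python sketch of two generic ideas that paragraph names. The first is fastText-style subword vectors: a word is split into character n-grams with `<` and `>` boundary markers, and its vector is the average of the n-gram vectors, so even an unseen spelling gets a vector. The second is cosine nearest-neighbour lookup, which is how a shared English↔Garo space is typically queried for lexicon building or semantic search. The hashed, pseudo-random n-gram table stands in for learned weights. `DIM`, `BUCKETS`, and every function name are hypothetical. The alignment step itself (commonly an orthogonal mapping learned from a seed dictionary) is not shown and is not confirmed by the source.

```python
import hashlib
import math
import random

# Hypothetical toy settings; the GaroVec page does not state its hyperparameters.
DIM = 8
BUCKETS = 100_000


def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> list[str]:
    """Return fastText-style subwords: the word wrapped in '<' and '>'
    boundary markers, split into every character n-gram with minn <= n <= maxn."""
    token = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        for i in range(len(token) - n + 1):
            grams.append(token[i:i + n])
    return grams


def _bucket(gram: str) -> int:
    # Stable hash: Python's built-in hash() is salted per process.
    digest = hashlib.md5(gram.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little") % BUCKETS


def ngram_vector(gram: str) -> list[float]:
    # Stand-in for a learned embedding row (hypothetical): a deterministic
    # pseudo-random vector seeded by the n-gram's hash bucket.
    rng = random.Random(_bucket(gram))
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


def word_vector(word: str) -> list[float]:
    """Average the n-gram vectors, so any spelling receives a vector."""
    grams = char_ngrams(word)
    if not grams:
        return [0.0] * DIM
    total = [0.0] * DIM
    for gram in grams:
        for d, x in enumerate(ngram_vector(gram)):
            total[d] += x
    return [x / len(grams) for x in total]


def cosine(u, v) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbours(query, vocab: dict, k: int = 5) -> list[str]:
    """Rank vocabulary entries by cosine similarity to a query vector.
    In an aligned bilingual space, an English query vector could be ranked
    against Garo entries this way."""
    ranked = sorted(vocab, key=lambda w: cosine(query, vocab[w]), reverse=True)
    return ranked[:k]
```

Because `word_vector` needs only the spelling, a Latin-script Garo form missing from the training vocabulary still gets a vector. With real trained n-gram weights (not these random stand-ins), forms that share many n-grams tend to lie close together.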


This model was built in collaboration with native Garo speakers and is part of a broader initiative to create reproducible, timestamped language resources for endangered and underrepresented languages. GaroVec is optimized for Latin-script Garo, which is commonly used in digital communication and educational contexts, and is designed to be modular and extensible for future dialectal or phonetic variants.


Released under an open license (CC BY-SA 4.0, which permits reuse and adaptation with attribution, provided that derivative works are shared under the same license), GaroVec v1.0 is intended for public use in research, education, and civic technology. It is hosted on Hugging Face with full documentation, including training methodology, evaluation notes, and usage examples. This submission aims to make GaroVec discoverable to linguists, educators, and technologists working to preserve and revitalize the Garo language.

Recommended Resources

Outreach and Awareness

Language Diversity and Language Endangerment

Endangered Languages: A Fact Sheet

Submitted by

ELP

Poetry and Literature

Multilingualism and Bilingualism

Language Learning and Teaching

Creating Digital Materials

1001 Languages on bilingual-picturebooks.org

Submitted by

ELP Community

Linguistics

Language Learning and Teaching

Grammars and Language Description

Beginner Aymara Course

Submitted by

ELP Community