Seanghay Yath

Senior ML Engineer


I am currently a Senior ML Engineer at the Digital Government Committee, where I previously led the front-end development team. My interests include product design, product development, graphics programming, music, and deep learning.

Products

Verify.gov.kh

A national document verification platform for Cambodia

StopCOVID.gov.kh

An innovative national COVID-19 tracing web app

Sarika.gov.kh

A high-quality Khmer text-to-speech engine for government use

Joon.com.kh

Local food directory and food review social network

Koh Santepheap Daily

A well-known newspaper company. I work on its native Android client and backend API.

Pi Pay (Android)

The most innovative and complete cashless payment platform in Cambodia

Khmer Dictionary

The blazingly fast Khmer dictionary

KhmerOCR

An OCR app for any language

CarnetDia (Android)

The first mobile app to raise awareness about diabetes in the Khmer language

QR.GOV.KH

A fast and productive QR code generator

Research

Khmer Text-to-Speech System (Proprietary)

An end-to-end Khmer speech synthesis system that includes a custom phonemizer, text normalizer, tokenizer, and vocoder optimized for the Khmer language.

KhmerTagger

Inverse Text Normalization for Khmer Automatic Speech Recognition

Khmer Forced Aligner

A fast Khmer forced aligner powered by Wav2Vec2CTC and Phonetisaurus

Khmer Pronounce

A Khmer pronunciation toolkit

Tha (ថា)

A Python toolkit for Khmer text normalization and verbalization

Acoustic Model for Khmer language

A text-to-audio forced aligner similar to KFA (Khmer Forced Aligner), but trained with the Montreal Forced Aligner instead of Wav2Vec2.

Khmer Punctuate

Punctuation restoration for the Khmer language

XLM-RoBERTa for Khmer Language

Trained from scratch with a masked language modeling (MLM) objective for 1M steps on 5M Khmer sentences (162M words, 578K unique words).

Open-Source

Sone

A declarative Canvas layout engine for JavaScript with advanced rich text support.

sosap (សូរសព្ទ)

Python binding for Phonetisaurus

Joint Khmer Part-of-Speech Tagger and Word Segmenter

An open-source part-of-speech tagger for the Khmer language using a BiLSTM.

khnormal.cpp

Khmer encoding normalization implementation in C++.

khmercut

A (fast) Khmer word segmentation toolkit

khmer-unicode-converter

Khmer Unicode Converter for JavaScript

SoundCheck

A multi-processing audio-checking tool

Khmer Segment

A Khmer word segmentation tool built around the NIPTICT (now CADT) Khmer word segmentation CRF model.

pycrfpp

Python binding for CRF++

CleanVoice

A fast speech enhancement toolkit using Conv-TasNet (Yi Luo and Nima Mesgarani)

web-crfsuite

A CRFSuite port for Node, Browser & Deno

Khmer Lunar Calendar

A simple, lightweight Khmer lunar calendar (1.7 kB minified).

vector-drawable-svg

Convert Android VectorDrawable to SVG

Contact

You can reach me at seanghay.dev@gmail.com