Thoai's DataBrain

About Me

Hi there! 👋 I'm Thoai.

I work in the cloud and data platform space, mostly around Kubernetes, Kafka, Spark, and the tools that make modern data systems run smoothly.

I enjoy digging into how data actually flows through systems, how storage works under the hood (pages, blocks, execution…), and how to turn a bunch of scattered services into a clean, maintainable pipeline.

Recently, I've been focusing more on Data Engineering, including:

designing reliable and scalable data platforms
lakehouse technologies such as Iceberg, Trino, and Spark
data modeling techniques
building ETL/ELT pipelines
and so on...

I created this blog/docs site to capture what I learn, what I experiment with, and the mistakes I run into along the way. Hopefully it helps someone else, or at least helps future me.

If you're into data, distributed systems, or just want to debug a burning pipeline together, feel free to reach out.

I also keep longer writeups and experiments in my blog/docs notes.

My Experience

Data Platform Engineer

UDP - The CrownX - Masan Group Ho Chi Minh City, Vietnam

March 2026 - Present

Designed and implemented an LLM-powered data dictionary and metadata governance workflow on Databricks, built on Unity Catalog and covering 10K+ datasets.
Generated table and column descriptions, business grain, ownership and governance tags, key field suggestions, and PII classifications.
Added human-in-the-loop validation to improve catalog completeness and support downstream data quality, masking, access control, and discovery use cases.

Databricks Azure Blob Storage

Data Engineer

ZDA - Zalopay - VNG Corporation Ho Chi Minh City, Vietnam

September 2025 - February 2026

Designed and integrated a unified feature store based on Feast into a legacy data platform, supporting batch and real-time feature engineering and API-based online serving with GitOps-style governance and versioning.
Operated feature serving at scale with 14M MAUs, around 2M streaming events/day, and sub-200 ms feature retrieval latency.
Built batch and streaming ETL / feature engineering pipelines using Spark and Airflow.
Developed internal frameworks and libraries to automate streaming pipeline deployment and management, allowing ML Engineers and Analysts to focus on business logic instead of infrastructure complexity.
Owned and improved the Risk data and ML platform with Spark, Airflow, HDFS, and related systems, ensuring stable daily operation of 50-60 Spark applications, each processing up to 500M-1B records per run.

Feast Spark Airflow TiDB Hadoop Docker

Data Platform Engineer

xPlat - FPT Smart Cloud - FPT Corporation Ho Chi Minh City, Vietnam

April 2023 - August 2025

Contributed as a core member of the Data Platform team at a cloud service provider, delivering a platform-as-a-service for real-time ingestion, distributed processing, governed access, and self-service analytics to enterprise clients.

Engineered a full-fledged LakeHouse platform with comprehensive data governance for Spark and Trino.
Integrated OAuth2-based identity propagation, fine-grained access control, and dynamic data masking through Apache Ranger.
Added automated lineage tracking through OpenMetadata and standardized encryption at rest with S3 SSE-C.
Developed high-throughput CDC pipelines (100 GB/day, 5K TPS) using Kafka Connect and Debezium, migrating 500+ PostgreSQL tables to ClickHouse, Iceberg, and S3.
Built a self-service Spark environment on JupyterHub with a custom Profile Manager, secure session provisioning, LakeHouse integration, and dynamic environment configuration.
Enhanced Spark orchestration by creating custom Airflow plugins integrated with Spark Operator for modular job submission, runtime tracking, and real-time log streaming.
Built unified monitoring dashboards using Prometheus and Grafana to track pipeline SLAs and detect anomalies across Spark, Kafka, and Airflow.

Kubernetes ArgoCD Debezium Kafka ClickHouse Airflow JupyterHub FastAPI Iceberg Spark Prometheus Grafana

Backend Engineer Intern

xPlat - FPT Smart Cloud - FPT Corporation Hanoi, Vietnam

October 2022 - March 2023

Researched Kafka architecture and deployment feasibility, then designed Kafka-as-a-Service solutions on both VMs and Kubernetes.
Deployed Kafka on Kubernetes using Strimzi and implemented end-to-end monitoring with JMX, Telegraf, Prometheus, Grafana, and alerting through Telegram.
Built Kong plugins and integrated API gateway into microservices on Kubernetes.

Kafka Strimzi Kong API Gateway Spring Boot

My Education

B.Sc. in Computer Science

Hanoi University of Science and Technology Hanoi, Vietnam

September 2018 - September 2023

The program was a five-year engineering track. It is recognized locally as an engineer's degree and is generally mapped to a B.Sc. internationally.

Open Source Contributions

Feast

Optimized MySQL Online Store write performance by implementing batch insert and transaction grouping, significantly reducing write latency. #5699
Introduced HDFS Registry backend, allowing teams to manage Feast feature definitions on Hadoop-compatible file systems. #5655
Added HDFS Staging support for Spark Offline Store, enabling distributed materialization and more efficient large-scale feature computation. #5635

ClickHouse Kafka Sink Connector

Refactored Hikari connection pool logic to prevent NPEs, avoid memory leaks, and improve thread safety using ConcurrentHashMap. #1048