Hi there! 👋 I'm Thoai.
I work in the cloud and data platform space, mostly around Kubernetes, Kafka, Spark, and the tools that make modern data systems run smoothly.
I enjoy digging into how data actually flows through systems, how storage works under the hood (pages, blocks, execution…), and how to turn a bunch of scattered services into a clean, maintainable pipeline.
Recently, I've been focusing more on Data Engineering, including:
- designing reliable and scalable data platforms
- lakehouse stacks built on tools such as Iceberg, Trino, and Spark
- data modeling techniques
- building ETL/ELT pipelines
- and so on...
I created this blog/docs site to capture what I learn, what I experiment with, and the mistakes I run into along the way. Hopefully it helps someone else, or at least helps future me.
If you're into data, distributed systems, or just want to debug a burning pipeline together, feel free to reach out.
I also keep longer writeups and experiments in my blog/docs notes.




