Published 2026-05-15 · 8 min read · English
Building VietLex: An Open, Free Vietnamese Legal Data Infrastructure
Today I am releasing VietLex.vn to the international community — an open-source, open-data, free-forever portal for Vietnamese laws. 30,000+ legal documents, 85,000 procurement notices, an MCP server for Claude/Cursor/Continue, OpenAPI 3.1, OAI-PMH 2.0, and a 187,000-row dataset under CC BY 4.0. This post is the story, the architecture, and what is next.
Why VietLex exists
Vietnamese legal information is scattered across dozens of government portals. Each portal has its own update cadence, its own search semantics (often based on diacritic-sensitive substring match), and its own UI quirks. Lawyers, journalists, researchers, and ordinary citizens spend hours hunting for a single decree. Foreign investors face an even higher wall — most portals have no English interface, and the Vietnamese-to-English machine translation of legal text is unreliable.
I built VietLex during evenings and weekends with one mantra: information that exists in public should be findable in public. Not behind a subscription wall, not behind a 90 MB government client app, not requiring registration. Just a URL.
What is in v1
- 30,000+ legal documents: laws, codes, decrees, circulars, decisions, resolutions, dating from 1945 to today.
- 28,000 citator edges: which document amends, repeals, supersedes which — a 5-tier color system (green / yellow / orange / red / gray).
- 85,000 public procurement notices and 10,000 award outcomes — crawled hourly from the National Procurement Portal.
- 3,400 academic theses harvested via OAI-PMH from 18 Vietnamese and international university repositories.
- Multilingual titles: every document title translated into English, Chinese, Japanese, Korean, French, Russian, Spanish for the multilingual search front-ends at /en, /zh, …
- AI integration: an MCP server (/cho-ai/mcp) gives Claude Desktop, Cursor, Continue, Cline, and any MCP client one-line access to all of the above.
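To make the citator idea concrete, here is a minimal sketch of how an edge and a document's status color could be modeled. The three relation names come from the list above. Everything else is an assumption made for illustration: the field names, the document IDs, and in particular the mapping of relations to the five colors, which is not VietLex's published schema.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    AMENDS = "amends"
    REPEALS = "repeals"
    SUPERSEDES = "supersedes"


@dataclass(frozen=True)
class CitatorEdge:
    source_id: str      # the newer document that acts (hypothetical field)
    target_id: str      # the older document being acted upon (hypothetical field)
    relation: Relation


# Assumed ordering: the most severe incoming relation decides the color.
_SEVERITY = {Relation.AMENDS: 1, Relation.SUPERSEDES: 2, Relation.REPEALS: 3}
_COLOR = {0: "green", 1: "yellow", 2: "orange", 3: "red"}


def status_color(doc_id, edges, known_ids):
    """Return an assumed status color for doc_id from its incoming edges."""
    if doc_id not in known_ids:
        return "gray"  # assumed meaning: status cannot be determined
    worst = max(
        (_SEVERITY[e.relation] for e in edges if e.target_id == doc_id),
        default=0,  # no incoming edges: treated as still in force
    )
    return _COLOR[worst]
```

With edges "B amends A" and "C repeals A", `status_color("A", ...)` returns "red" under this mapping, because the repeal outranks the amendment.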
Architecture (the short version)
VietLex runs entirely on a single personal machine behind a Cloudflare Tunnel. No VPS, no Kubernetes, no managed Postgres. Stack:
- Next.js 16 + React 19, mostly Server Components rendering directly from Meilisearch.
- Meilisearch 1.x as the only database (6 indexes, ~2.5 GB total).
- Cloudflare Tunnel for TLS + DDoS + IP hiding.
- Windows Scheduled Tasks driving 20+ crawlers (4 hierarchical tiers: central → ministry → province → commune).
- LLM chain (Groq → Gemini → OpenRouter) for an in-site assistant.
The single-machine bet pays off because Vietnam's legal corpus is finite. 60,000 documents fit in 720 MB of Meilisearch index. Daily updates are measured in megabytes, not gigabytes. The whole site, including 85,000 procurement records and a 5-tier citator graph, costs less RAM than a single Chrome tab on most machines.
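The LLM chain in the stack list (Groq → Gemini → OpenRouter) amounts to an ordered fallback. A minimal sketch, assuming each provider is wrapped in a plain function: the three stubs below are hypothetical stand-ins, not real SDK calls, and the rule that any exception moves on to the next provider is an assumption rather than a description of the site's actual error handling.

```python
from typing import Callable, Sequence, Tuple

Provider = Tuple[str, Callable[[str], str]]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has failed."""


def ask_with_fallback(prompt: str, chain: Sequence[Provider]) -> Tuple[str, str]:
    """Try each provider in order and return (provider_name, answer)."""
    errors = []
    for name, call in chain:
        try:
            return name, call(prompt)
        except Exception as exc:  # assumption: any failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-ins for real API clients.
def groq_stub(prompt: str) -> str:
    raise TimeoutError("rate limited")


def gemini_stub(prompt: str) -> str:
    return f"answer to: {prompt}"


def openrouter_stub(prompt: str) -> str:
    return "unused in this example"


CHAIN = [("groq", groq_stub), ("gemini", gemini_stub), ("openrouter", openrouter_stub)]
```

Calling `ask_with_fallback("hi", CHAIN)` skips the failing first stub and returns the second provider's answer. Returning the provider name alongside the answer makes it easy to log which tier actually served each request.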
The slime mold gambit
One late-night reading of Tero, Kobayashi, and Nakagaki (2007), the famous paper showing the slime mold Physarum polycephalum can grow a transport network mathematically equivalent to Tokyo's subway in 26 hours, gave me a design principle.
VietLex is engineered as four feedback loops: Explore (crawl new sources), Sense (log every view and search), Reinforce (cache hot keys with adaptive TTL, raise sitemap priority for hot pages), Prune (auto-mark documents unseen for 90 days as cold, circuit-break failing upstreams). The system rebalances itself every minute, the way slime mold tubes thicken in the direction of food and shrink elsewhere. No human operator decides which page is "important". Traffic decides.
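The Reinforce and Prune loops can be sketched in a few lines. This is an illustration under stated assumptions, not the production code: only the 90-day cold threshold comes from the text. The baseline and maximum TTL, the "double per 10 hits" rule, and the failure threshold of 3 are made-up values, and the circuit breaker omits the usual half-open recovery state for brevity.

```python
from datetime import datetime, timedelta

COLD_AFTER = timedelta(days=90)   # from the text: unseen for 90 days
BASE_TTL_S = 60                   # assumed baseline cache lifetime
MAX_TTL_S = 3600                  # assumed upper bound


def adaptive_ttl(hits_last_hour: int) -> int:
    """Reinforce: hotter keys live longer (doubling per 10 hits, capped)."""
    steps = min(hits_last_hour // 10, 6)  # bound the exponent
    return min(BASE_TTL_S * 2 ** steps, MAX_TTL_S)


def is_cold(last_seen: datetime, now: datetime) -> bool:
    """Prune: a document unseen for 90 days or more is marked cold."""
    return now - last_seen >= COLD_AFTER


class CircuitBreaker:
    """Prune: stop calling an upstream after repeated consecutive failures."""

    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures
        self.failures = 0

    @property
    def open(self) -> bool:
        return self.failures >= self.max_failures

    def record(self, ok: bool) -> None:
        # A success resets the count; a failure increments it.
        self.failures = 0 if ok else self.failures + 1
```

A scheduler that runs every minute would call `adaptive_ttl` when refreshing cache entries, `is_cold` when sweeping the index, and check `breaker.open` before each crawl of an upstream.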
Open data, open source, open API
Everything is open:
- CC BY 4.0 dataset — 187,000 rows of structured legal metadata, citator edges, procurement notices, academic theses. NDJSON, UTF-8, SHA-256-verified. Free to download, free to redistribute, free to train AI on.
- OpenAPI 3.1 spec — 10 endpoints, interactive Swagger UI. No auth, fair-use rate limit.
- OAI-PMH 2.0 endpoint — Dublin Core, compatible with Google Scholar, BASE, CORE, OpenAIRE.
- MCP server: MIT-licensed, install with npx @vietlex/mcp-server.
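For consumers of the NDJSON dataset, checking the SHA-256 digest before parsing is a cheap safeguard. A minimal sketch using only the Python standard library: the record fields (`id`, `title`) in the usage example are hypothetical, and the data is held in memory for simplicity, whereas a real download of this size would more likely be hashed and parsed in chunks.

```python
import hashlib
import json
from typing import List


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def read_ndjson(data: bytes, expected_sha256: str) -> List[dict]:
    """Verify the checksum, then parse one JSON object per non-empty line."""
    actual = sha256_hex(data)
    if actual != expected_sha256.lower():
        raise ValueError(f"checksum mismatch: got {actual}")
    rows = []
    # Split on b"\n" only, so unusual Unicode separators inside JSON
    # strings are never treated as record boundaries.
    for raw_line in data.split(b"\n"):
        if raw_line.strip():
            rows.append(json.loads(raw_line.decode("utf-8")))
    return rows


# Usage with two made-up records (field names are assumptions).
sample = (
    '{"id": "doc-1", "title": "Luật Đất đai"}\n'
    '{"id": "doc-2", "title": "Bộ luật Dân sự"}\n'
).encode("utf-8")
rows = read_ndjson(sample, sha256_hex(sample))
```

In practice the expected digest would come from the checksum published alongside the download, not be computed from the same bytes as in this toy usage.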
What I am not building
VietLex deliberately does not include:
- AI-generated legal advice. After an incident where two fake phone numbers slipped into an AI-generated advice page, I purged the entire advice section. Vietnamese Criminal Code Article 288 punishes the publication of false information online. The risk is real and personal. The chatbot now refuses to give specific Article numbers or penalty amounts that have not been verified.
- Subscription tiers, at least in this first month. Information that exists in public should be findable in public.
- Tracking pixels, analytics on identifiable users, third-party ads. No personal data collection beyond what is strictly needed to serve the page.
The one-month autonomous experiment
For the next four weeks I have asked Claude — the AI assistant I have been pair-programming with — to run this project autonomously: optimize feedback loops, publish open data, write academic preprints, do community outreach. I am stepping back. The site will keep improving according to the slime-mold loop without me approving every step.
If you are a legal researcher, an NLP engineer, a civic-tech builder, a journalist covering Vietnam — try the API, hook the MCP server into your editor, download the dataset, and tell me what you build. Issues and PRs welcome on github.com/vietlex-vn.
Cite this
@dataset{vietlex_open_2026,
  author    = {Hoàng, Quốc Hải and contributors},
  title     = {{VietLex Open Vietnamese Legal Dataset v1.0}},
  year      = 2026,
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.PENDING},
  url       = {https://vietlex.vn/du-lieu},
}

Hoàng Quốc Hải is a journalist at Báo Công Thương. VietLex.vn is his personal, non-profit project. Contact: [email protected].