
Stéphane Sobucki
Head of Data Engineering
Published on 15 February 2026
We build a Knowledge Base of regulatory content — tax rulings, legislation, court decisions, social security guidance — for accountants and auditors across Europe. Each country has its own tax authority, its own legislation database, its own court archive. Each source needs a web spider, an ETL cleaner, a text chunker, database registrations, an Airflow DAG, and quality control gates.
Imagine the dream scraping setup: one that helps build itself.
Every regulatory source we cover goes through the same stages:
Scrape raw content → Clean HTML → Chunk text → Vectorize → Upload to search index

The interesting part isn't the pipeline itself — it's that adding a new source means repeating this pattern with source-specific logic at each stage. A Finnish tax authority and a Polish court archive need different spiders, different cleaning rules, different category mappings. But the shape of the work is identical.
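As a sketch, the five stages compose like ordinary functions. The bodies below are illustrative stubs, not our production code, which runs each stage as a separate Airflow task:

```python
def scrape(url: str) -> str:
    """Download raw content; no parsing happens at this stage."""
    return f"<html>raw content from {url}</html>"  # stub download

def clean(raw_html: str) -> str:
    """Strip markup and boilerplate, keep the regulatory text."""
    return raw_html.replace("<html>", "").replace("</html>", "")

def chunk(text: str, size: int = 40) -> list[str]:
    """Split cleaned text into fixed-size chunks for embedding."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def vectorize(chunks: list[str]) -> list[tuple[str, list[float]]]:
    """Pair each chunk with an embedding (stubbed here)."""
    return [(c, [0.0]) for c in chunks]

def upload(vectors: list[tuple[str, list[float]]]) -> int:
    """Push vectors to the search index; return the count uploaded."""
    return len(vectors)

def run_pipeline(url: str) -> int:
    return upload(vectorize(chunk(clean(scrape(url)))))
```

The source-specific logic lives inside each stage; the composition never changes.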
That repetition is what made it possible to hand off to agents.
We split pipeline development into three roles:
| # | Role | Does what | Produces |
|---|---|---|---|
| 1 | Accounting Engineer | Researches a country's regulatory landscape | Source inventory with priority tiers |
| 2 | Product Owner | Translates business requirements into technical specs | YAML source specifications |
| 3 | Developer | Implements the pipeline from those specs | Spider, cleaner, chunker, DAG config |
The Accounting Engineer is human (sometimes AI-assisted). The other two are AI agents running in Claude Code.
What makes this work isn't the agents themselves — it's the handoff between them. The Product Owner agent produces a structured YAML spec. The Developer agent reads that spec and scaffolds the full pipeline from it. No human needs to re-explain anything in between.
Here's a fragment of what one of those specs looks like:
We have 73 of these across 15 countries. Each one is a complete technical blueprint.
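A minimal sketch of how a Developer agent, or any script, might consume such a spec. This assumes PyYAML; the embedded fragment is abridged, and the required-field list is an assumption for illustration:

```python
import yaml  # PyYAML

# Abridged spec fragment, using field names from one of our specs.
SPEC = """
name: agenziaentrate
country_code: it
spider_type: search
categories:
  - id: circolari
    label: Circolari
"""

def load_spec(text: str) -> dict:
    spec = yaml.safe_load(text)
    # Fail fast on missing required fields rather than letting an
    # agent guess: the schema is the contract between the two roles.
    for field in ("name", "country_code", "spider_type"):
        if field not in spec:
            raise ValueError(f"spec missing required field: {field}")
    return spec

spec = load_spec(SPEC)
```

Because every spec follows the same schema, the scaffolding step is the same for all 73 of them.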
Early on, we wrote long prompts every session: "here's the database schema, here's how spiders work, here's the file structure, now write a cleaner for this source." Every session started from zero. Mistakes repeated.
We replaced this with skills — self-contained procedure documents that get loaded into the agent's context when invoked. A skill isn't "write me a spider." A skill is the full workflow: what a spider is responsible for, what it delegates to the cleaner, which base class to extend, how to handle metadata, and how to verify the output.
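For illustration, a skill file might look like the sketch below. The name, frontmatter, and steps are invented for this example, though the spider/cleaner boundary it encodes is the real one:

```markdown
---
name: create-cleaner
description: Write an ETL cleaner for a scoped source. Use when a
  YAML spec exists and the spider already downloads raw content.
---

# Create a cleaner

1. Read the source spec; confirm `content_format` and categories.
2. Extend the base cleaner class. The cleaner owns ALL HTML parsing
   and metadata extraction — never the spider.
3. Map source categories to the Knowledge Base taxonomy.
4. Run the cleaner on a sample batch and verify chunk output.
```

The point is that the skill carries the whole procedure, so no session starts from zero.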
/scope-country, /scope-website
/create-spec, /create-implementation-plan, /create-tasks, /gap-analysis
/scaffold-pipeline, /create-scrape, /create-cleaner
/run-scrape, /run-etl, /airflow, /investigate-failures, /trace-document
/db-query, /db-migrate, /db-restore, /inspect-blob, /show-config

The scoping skills research regulatory landscapes and conduct structured interviews to produce YAML specs. The building skills consume those specs to scaffold pipelines, implement spiders (download-only — no HTML parsing), and write cleaners (which own all content extraction). The operating skills run pipelines, manage DAGs, and trace documents through every stage.
Each skill enforces boundaries that matter. Spiders download raw content and nothing else. Cleaners own all HTML parsing and metadata extraction. This separation exists because agents produce better code when their scope is narrow and unambiguous.
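A sketch of how narrow those boundaries can be made in code. The class and method names here are hypothetical, but the division of labour is the one described above:

```python
from abc import ABC, abstractmethod

class BaseSpider(ABC):
    """Downloads raw content and nothing else -- no parsing here."""

    @abstractmethod
    def fetch(self, url: str) -> bytes:
        """Return raw bytes exactly as served by the source."""

class BaseCleaner(ABC):
    """Owns all HTML parsing and metadata extraction."""

    @abstractmethod
    def extract(self, raw: bytes) -> dict:
        """Return cleaned text plus metadata from a raw download."""

# A concrete pair for one source. Because the interfaces are this
# narrow, an agent implementing one class cannot drift into the
# other's responsibilities.
class ExampleSpider(BaseSpider):
    def fetch(self, url: str) -> bytes:
        return b"<h1>Circolare n. 1</h1><p>testo</p>"  # stub download

class ExampleCleaner(BaseCleaner):
    def extract(self, raw: bytes) -> dict:
        html = raw.decode()
        title = html.split("<h1>")[1].split("</h1>")[0]
        return {"title": title, "raw_length": len(raw)}
```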
Every source moves through a state machine:
needs_scoping → planned → in_progress → review → live

Each country has a progress file that both agents and humans read. After scoping a website, the agent updates the status to planned. After building a pipeline, it moves to in_progress. A human reviews, marks review, and promotes to live.
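The transitions can be sketched as a small lookup table. The exact set of allowed moves, such as whether review can bounce back to in_progress, is an assumption here:

```python
# Allowed transitions for a source's lifecycle; anything not in this
# table is rejected, which keeps the progress files trustworthy.
TRANSITIONS = {
    "needs_scoping": {"planned"},
    "planned": {"in_progress"},
    "in_progress": {"review"},
    "review": {"live", "in_progress"},  # assumed: review may bounce back
    "live": set(),
}

def advance(status: str, new_status: str) -> str:
    if new_status not in TRANSITIONS.get(status, set()):
        raise ValueError(f"illegal transition: {status} -> {new_status}")
    return new_status

status = "needs_scoping"
for step in ("planned", "in_progress", "review", "live"):
    status = advance(status, step)
```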
This gave us something we didn't expect: visibility without meetings. At any point we can check where things stand across every country without asking anyone.
This all happened on a single branch.
These are production pipelines with failure tracking, quality control, batch processing, and automated Kubernetes orchestration. They weren't generated and forgotten — they went through the same review → live process as anything we'd write by hand.
Process clarity is the bottleneck. The agents weren't limited by capability. They were limited by how clearly we could describe our workflows. Building skills forced us to articulate things that previously lived in engineers' heads — and that made our human engineers faster too.
Structured handoffs matter more than smart agents. The biggest quality jump didn't come from better prompts. It came from replacing natural language handoffs with structured YAML specs. Agents are good at following schemas. They're unreliable at interpreting ambiguous prose.
The hard part moved. After this, adding a new country wasn't an engineering problem. It was a domain problem: which sources matter, what content is in scope, how should categories map. The humans who understood European regulatory frameworks became the bottleneck, not the humans who could write Python.
We're still iterating on the skill system. Some skills are too broad, some too narrow, and the boundaries between scoping and building aren't always clean. But the basic pattern — structured specs, procedural skills, narrow agent roles — has held up across ten countries and counting.
We're building an AI-powered accounting workspace at Taxxa. The Knowledge Base described here powers our RAG application and regulatory newsfeed for accountants and auditors across Europe.
The spec fragment referenced earlier, for Italy's Agenzia delle Entrate:

```yaml
name: agenziaentrate
country_code: it
language: it
spider_type: search
technical:
  content_format: pdf
  javascript_required: false
categories:
  - id: circolari
    label: Circolari
  - id: risoluzioni
    label: Risoluzioni
```