Senior Data Engineer
Tech Stack: Python, Airflow, Scrapy, Azure AI Search, Azure, SQL. Bring ambition and hunger!
About the Role
Join our core engineering team to build the data backbone of our AI-powered accounting platform. You will take full ownership of our data ingestion and processing infrastructure—transforming raw, unstructured web data into high-quality search indexes for our LLMs. We value engineers who treat data pipelines as production products and can navigate the complexity of high-scale web scraping.
Responsibilities
- Architect resilient scraping infrastructure: Build and maintain high-volume, compliant web scrapers using Scrapy to ingest financial and regulatory data from diverse sources.
- Power the AI Context Window: Design pipelines to clean, chunk, and index data specifically for Azure AI Search, ensuring our RAG systems have the most relevant and up-to-date context.
- Orchestrate complex workflows: Design and optimize data pipelines (ETL/ELT) using Apache Airflow, ensuring data quality and timely delivery.
- Manage anti-bot evasion and proxies: Implement strategies for handling CAPTCHAs, IP rotation, and headless browsing to maintain 99.9% pipeline uptime.
Requirements
- 5+ years of data engineering experience, with a heavy focus on Python.
- Deep knowledge of web scraping and building scraping pipelines at scale (handling anti-bot countermeasures, dynamic content, and headless browsers).
- Experience configuring and optimizing Azure AI Search indexes (vector search, semantic search, hybrid retrieval).
- Proficiency with Apache Airflow for DAG authoring and scheduling.
- Strong SQL skills and experience modeling data for analytics.
Nice to Have
- Familiarity with running workloads on Kubernetes.
- Experience fine-tuning ranking algorithms or scoring profiles in search indexes.
- Knowledge of LLM integration patterns (RAG).