Senior Data Engineer
Tech Stack: Python, Airflow, Scrapy, Azure AI Search, Azure, SQL. Bring ambition and hunger!
About the Role
Join our core engineering team to build the data backbone of our AI-powered accounting platform. You will take full ownership of our data ingestion and processing infrastructure—transforming raw, unstructured web data into high-quality search indexes for our LLMs. We value engineers who treat data pipelines as production products and can navigate the complexity of high-scale web scraping.
Responsibilities
- Architect resilient scraping infrastructure: Build and maintain high-volume, compliant web scrapers using Scrapy to ingest financial and regulatory data from diverse sources.
- Power the AI Context Window: Design pipelines to clean, chunk, and index data specifically for Azure AI Search, ensuring our RAG systems have the most relevant and up-to-date context.
- Orchestrate complex workflows: Design and optimize data pipelines (ETL/ELT) using Apache Airflow, ensuring data quality and timely delivery.
- Manage anti-bot evasion and proxies: Implement strategies for handling CAPTCHAs, IP rotation, and headless browsing to maintain 99.9% pipeline uptime.
Requirements
- 5+ years of data engineering experience, with a heavy focus on Python.
- Deep knowledge of web scraping and building scraping pipelines at scale (handling anti-bot countermeasures, dynamic content, and headless browsers).
- Experience configuring and optimizing Azure AI Search indexes (vector search, semantic search, hybrid retrieval).
- Proficiency with Apache Airflow for DAG authoring and scheduling.
- Strong SQL skills and experience modeling data for analytics.
Nice to Have
- Familiarity with running workloads on Kubernetes.
- Experience fine-tuning ranking algorithms or scoring profiles in search indexes.
- Knowledge of LLM integration patterns (RAG).