Job Title:
Founding Data Engineer – Speech & Data Quality
Overview
We are seeking a Founding Data Engineer with deep experience in speech data, conversational AI, and large-scale data infrastructure. This individual will be responsible for transforming raw conversational audio into structured, high-quality datasets. The role blends engineering rigor with domain expertise in transcription, categorization, and data quality, ensuring that outputs are technically robust and aligned with customer needs.
Key Responsibilities
- Build and maintain scalable data pipelines for ingesting, cleaning, and structuring conversational audio.
- Implement transcription workflows integrated with speech-to-text systems; validate and refine results.
- Design and enforce data quality standards, including categorization (e.g., by topic or accent) and fraud detection.
- Leverage platforms such as Snowflake for querying, analysis, and delivery of datasets to customers.
- Partner with sales and customer-facing teams to prepare and present data in clear, usable formats.
- Develop automated systems to detect anomalies, fraudulent activity, and low-quality submissions.
- Collaborate with product and research teams to optimize data workflows for machine learning and analytics use cases.
Qualifications
- Prior experience at leading AI, conversational AI, cloud, or big-data companies (e.g., Google, AWS, Meta, OpenAI, Anthropic, ElevenLabs) or in equivalent high-scale environments.
- Strong coding skills in Python (preferred), Java, or Scala, with solid SQL expertise.
- Experience with data platforms such as Snowflake, BigQuery, Redshift, or Databricks.
- Familiarity with audio/speech data, transcription pipelines, and NLP/ASR systems.
- Proven ability to design and implement data quality and fraud mitigation frameworks.
- Strong communication skills for collaboration across technical and customer-facing teams.
Preferred Skills