Neon Founding Data Engineer

Job Title:

Founding Data Engineer – Speech & Data Quality

Overview

We are seeking a Founding Data Engineer with deep experience in speech data, conversational AI, and large-scale data infrastructure. This individual will be responsible for transforming raw conversational audio into structured, high-quality datasets. The role blends engineering rigor with domain expertise in transcription, categorization, and data quality, ensuring that outputs are technically robust and aligned with customer needs.

Key Responsibilities

Build and maintain scalable data pipelines for ingesting, cleaning, and structuring conversational audio.
Implement transcription workflows integrated with speech-to-text systems; validate and refine results.
Design and enforce data quality standards, including categorization (e.g., by topic or accent) and fraud detection.
Leverage platforms such as Snowflake for querying, analysis, and delivery of datasets to customers.
Partner with sales and customer-facing teams to prepare and present data in clear, usable formats.
Develop automated systems to detect anomalies, fraudulent activity, and low-quality submissions.
Collaborate with product and research teams to optimize data workflows for machine learning and analytics use cases.

Qualifications

Prior experience at leading AI, conversational AI, cloud, or big-data companies (e.g., Google, AWS, Meta, OpenAI, Anthropic, ElevenLabs) or in equivalent high-scale environments.
Strong coding skills in Python (preferred), Java, or Scala, with solid SQL expertise.
Experience with data platforms such as Snowflake, BigQuery, Redshift, or Databricks.
Familiarity with audio/speech data, transcription pipelines, and NLP/ASR systems.
Proven ability to design and implement data quality and fraud mitigation frameworks.
Strong communication skills for collaboration across technical and customer-facing teams.

Preferred Skills