📅 February 11, 2021 (Clubhouse Data & AI Group)

🎙️ Bilal Mahmood - Head of Product @ Amplitude (the product analytics SaaS company). Past: CEO @ ClearBrain (causal analytics platform, acq. by Amplitude), PM Data Science @ Optimizely

🔑 Plaintext his. Italic comments mine.

Product Development Framework

Evaluate product market fit without code.

  1. Interview without preconceptions. Start agnostic to the solution (including whether you need AI at all). Interview 10 to 20 potential customers about their problem, with no bias toward a solution.
  2. Validate with contracts. Iterate to come up with a solution. Ask: does this solve your problem, and will you pay me ~$1k+ per month for it? The biggest sign of validation (as opposed to polite agreement) is pen to paper.
  3. Build with 'question-validation' sprints. Each two-week sprint validates a question or de-risks a component. Epics are hypotheses, user stories are solutions.

Scaling Challenges for AI SaaS Products

The transformation layer

The transformation and data cleaning layer that integrates with the data lake is the hardest technical problem to solve. Input sources and types vary widely, and non-obvious data problems have to be removed or canonicalized before anything can be processed across entire datasets. Hire ML engineers before data scientists, because ETL problems are more pressing than investigating state-of-the-art ML models.

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/4478f558-cabb-4650-9917-ac82bf7c0216/Screen_Shot_2021-02-04_at_8.46.15_PM.png

This echoes the classic paper 'Hidden Technical Debt in Machine Learning Systems'. The common trope about ML systems is their vast complexity: the actual model represents only a small fraction of the total infrastructure. The rest is held together with (hopefully not) pipeline sprawl and spaghetti code.

Data idiosyncrasies cause crashes. ETL pipelines have a long tail of potential inputs. Compute on dataflows is hard (Apache Spark is a sharp, finicky knife). Handling the long tail of varied user input is a serious scalability issue for AI systems.
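One way to picture handling the long tail without crashing is a canonicalization step that normalizes what it can and quarantines what it cannot. The sketch below is mine, not from the talk: the field aliases, the `canonicalize` function, and the output schema are all hypothetical, and a production pipeline would run equivalent logic inside its dataflow engine rather than in plain Python.

```python
from datetime import datetime, timezone
from typing import Any, Optional

# Hypothetical field aliases seen across input sources (illustrative only).
USER_ID_KEYS = ("user_id", "userId", "uid")
TIMESTAMP_KEYS = ("timestamp", "ts", "event_time")


def first_present(record: dict, keys: tuple) -> Any:
    """Return the first non-empty value found under any of the aliases."""
    for key in keys:
        value = record.get(key)
        if value not in (None, ""):
            return value
    return None


def parse_timestamp(value: Any) -> Optional[datetime]:
    """Accept epoch seconds, epoch milliseconds, or ISO 8601 strings."""
    try:
        if isinstance(value, bool):
            return None
        if isinstance(value, (int, float)):
            # Assumption: values this large are milliseconds, not seconds.
            seconds = value / 1000 if value > 1e11 else value
            return datetime.fromtimestamp(seconds, tz=timezone.utc)
        if isinstance(value, str):
            text = value.strip().replace("Z", "+00:00")
            parsed = datetime.fromisoformat(text)
            if parsed.tzinfo is None:
                # Assumption: naive timestamps are treated as UTC.
                parsed = parsed.replace(tzinfo=timezone.utc)
            return parsed
    except (ValueError, OverflowError, OSError):
        return None
    return None


def canonicalize(records):
    """Split raw records into canonical rows and quarantined rejects."""
    clean, rejected = [], []
    for record in records:
        if not isinstance(record, dict):
            rejected.append((record, "not a mapping"))
            continue
        user_id = first_present(record, USER_ID_KEYS)
        when = parse_timestamp(first_present(record, TIMESTAMP_KEYS))
        if user_id is None or when is None:
            rejected.append((record, "missing or unparseable field"))
            continue
        clean.append({
            "user_id": str(user_id).strip().lower(),
            "timestamp": when.isoformat(),
        })
    return clean, rejected
```

The design choice here is that a bad record never raises out of the loop: it lands in `rejected` with a reason, so one odd input cannot take down a whole batch and the rejects can be inspected later.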

Identity resolution across multiple input sources is hard. Luckily, data warehouse APIs expose standardized, similarly formatted schemas, which removes the need for one-off pipelines. APIs help you scale. Avoid data lakes without consistent taxonomies.
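To make the identity resolution problem concrete, here is a minimal sketch of deterministic matching: records from different sources are merged into one identity whenever they share any identifier, using a union-find structure. This is my illustration, not the approach described in the talk. The identifier fields (`email`, `device_id`, `crm_id`) and the `resolve_identities` function are hypothetical, and real systems usually add fuzzy or probabilistic matching on top.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving: point x at its grandparent as we walk up.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def resolve_identities(records, id_fields=("email", "device_id", "crm_id")):
    """Group record indices that are linked by any shared identifier."""
    uf = UnionFind()
    for index, record in enumerate(records):
        node = ("record", index)
        uf.find(node)  # register records that carry no identifiers at all
        for field in id_fields:
            value = record.get(field)
            if value:
                # Normalize so "D1" and "d1" count as the same identifier.
                uf.union(node, (field, str(value).strip().lower()))
    groups = defaultdict(list)
    for index in range(len(records)):
        groups[uf.find(("record", index))].append(index)
    return sorted(groups.values())
```

For example, a web event with an email and a device ID, a mobile event with the same device ID and a CRM ID, and a CRM row with that CRM ID all collapse into one identity, even though the first and last records share no field directly.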

Building for actual user need

"Transforming data isn't a product, it's a job."

You need the whole end-to-end system. Marketers don't want a data engineering product, they want churn prediction at scale for their digital experiences.

No-code is still too hard for marketers. KISS for business audiences. The persona to work with is Product (the actual users). Marketers don't live in data tools. Changing workflows is hard: functionality that users have to remember how to use, because it is not ingrained in their routine, introduces friction. Build for the everyday user.

Expansion beyond initial big contracts