ML/DS Casual Tech Talk @March 29, 2024

* Unless otherwise noted, images in this document are taken from the URLs listed alongside them.

LLMOps: What It Is, Why It Matters, and How to Implement It

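A comprehensive blog post on LLMOps.
It covers the concerns LLMOps deals with, plus the challenges and best practices that come up when handling them.
It treats LLMOps as a subfield of MLOps and summarizes where it differs from conventional MLOps, which I found useful.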

Task	MLOps	LLMOps
Primary focus	Developing and deploying machine-learning models.	Specifically focused on LLMs.
Model adaptation	If employed, it typically focuses on transfer learning and retraining.	Centers on fine-tuning pre-trained models like GPT-3.5 with efficient methods and enhancing model performance through prompt engineering and retrieval augmented generation (RAG).
Model evaluation	Evaluation relies on well-defined performance metrics.	Evaluating text quality and response accuracy often requires human feedback due to the complexity of language understanding (e.g., using techniques like RLHF, see https://huggingface.co/blog/rlhf).
Model management	Teams typically manage their models, including versioning and metadata.	Models are often externally hosted and accessed via APIs.
Deployment	Deploy models through pipelines, typically involving feature stores and containerization.	Models are part of chains and agents, supported by specialized tools like vector databases.
Monitoring	Monitor model performance for data drift and model degradation, often using automated monitoring tools.	Expands traditional monitoring to include prompt-response efficacy, context relevance, hallucination detection, and security against prompt injection threats.

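Also covers best practices and challenges for prompt engineering from a developer-productivity standpoint
Includes architecture diagrams for RAG systems, among other things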
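
As a rough illustration of the retrieval step that such RAG architecture diagrams usually show, here is a minimal sketch. It is not taken from the blog post: the bag-of-words `embed` function stands in for a real embedding model, the in-memory `DOCS` list stands in for a vector database, and all names (`embed`, `retrieve`, `build_prompt`, `DOCS`) are hypothetical. The LLM call itself is left out.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts (assumption: stands in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Augment the user question with retrieved context before it is sent to an LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


# Hypothetical document store (a vector database would play this role in practice).
DOCS = [
    "LLMOps covers prompt management and evaluation of LLM applications",
    "Feature stores serve precomputed features to ML models",
    "Vector databases store embeddings for similarity search",
]

if __name__ == "__main__":
    print(build_prompt("what do vector databases store", DOCS, k=1))
```

In a real system the prompt returned by `build_prompt` would be passed to a hosted model via its API, which is where the monitoring concerns from the table (context relevance, hallucination detection) come in.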


@suk1yak1

Analyzing A/B Tests of Recommendation Logic at ABEMA

ABEMAにおけるレコメンドロジックのA/Bテストを分析してみた | CyberAgent Developers Blog

Evaluates the effect of an A/B test on a two-stage model, comparing a reranker against a similarity-based approach as the second stage
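
To make the setup concrete, here is a minimal sketch of a two-stage recommender whose second stage is switched per user between a similarity-based ordering and a reranker. This is only an illustration under assumptions, not ABEMA's implementation: the item data in `ITEMS`, the popularity-based first stage, the fixed linear weights in `stage2_reranker`, and the hash-based 50/50 split in `assign_variant` are all hypothetical.

```python
import hashlib
import math

# Hypothetical item embeddings and a popularity feature (illustrative values only).
ITEMS = {
    "drama_a": {"vec": [0.9, 0.1], "popularity": 0.2},
    "anime_b": {"vec": [0.1, 0.9], "popularity": 0.9},
    "news_c": {"vec": [0.6, 0.4], "popularity": 0.5},
    "sports_d": {"vec": [0.5, 0.5], "popularity": 0.1},
}


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0


def stage1_candidates(n=3):
    """Stage 1: cheap candidate generation (assumption: top-n by popularity)."""
    return sorted(ITEMS, key=lambda i: ITEMS[i]["popularity"], reverse=True)[:n]


def stage2_similarity(user_vec, candidates):
    """Variant A: order candidates by cosine similarity to the user vector."""
    return sorted(candidates, key=lambda i: cosine(user_vec, ITEMS[i]["vec"]), reverse=True)


def stage2_reranker(user_vec, candidates, w_sim=0.5, w_pop=0.5):
    """Variant B: stand-in reranker, a fixed linear score over two features.

    A real reranker would be a trained model; the weights here are made up.
    """
    def score(i):
        return w_sim * cosine(user_vec, ITEMS[i]["vec"]) + w_pop * ITEMS[i]["popularity"]
    return sorted(candidates, key=score, reverse=True)


def assign_variant(user_id: str) -> str:
    """Deterministic bucketing: the same user always lands in the same variant."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return "reranker" if int(digest, 16) % 2 else "similarity"


def recommend(user_id: str, user_vec):
    """Run stage 1, then whichever stage-2 variant the user is assigned to."""
    candidates = stage1_candidates()
    if assign_variant(user_id) == "reranker":
        return stage2_reranker(user_vec, candidates)
    return stage2_similarity(user_vec, candidates)


if __name__ == "__main__":
    print(assign_variant("user-1"), recommend("user-1", [1.0, 0.0]))
```

Both variants receive the same stage-1 candidates, so any difference in the metrics between the two groups can be attributed to the second stage alone.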