| | Show HN: RULER – Easily apply RL to any agent (openpipe.ai) |
| 81 points by kcorbitt 39 days ago | past | 11 comments |
|
| | Summary-RL (openpipe.ai) |
| 1 point by s16h 54 days ago | past |
|
| | Everything I know about reward hacking (openpipe.ai) |
| 3 points by kcorbitt 68 days ago | past |
|
| | ART·E: how we built an email research agent that beats o3 (openpipe.ai) |
| 3 points by kcorbitt 3 months ago | past | 2 comments |
|
| | PII-Redact – SOTA PII Redaction on Your Laptop (openpipe.ai) |
| 6 points by Arctic_fly 4 months ago | past | 1 comment |
|
| | Using GRPO to Beat o1, o3-mini and R1 at “Temporal Clue” (openpipe.ai) |
| 199 points by kcorbitt 5 months ago | past | 55 comments |
|
| | Analyzing OpenAI's Reinforcement Fine-Tuning: Less Data, Better Results (openpipe.ai) |
| 4 points by kcorbitt 7 months ago | past |
|
| | Using reinforcement learning and $4.80 of GPU time to find the best HN post (openpipe.ai) |
| 217 points by kcorbitt 9 months ago | past | 95 comments |
|
| | DPO fine-tuning outperforms SFT (openpipe.ai) |
| 1 point by kcorbitt 10 months ago | past |
|
| | OpenPipe (openpipe.ai) |
| 1 point by handfuloflight 10 months ago | past |
|
| | Fine-Tuning Best Practices: Models (openpipe.ai) |
| 2 points by gk1 10 months ago | past |
|
| | Fine-Tuning for Production Apps (openpipe.ai) |
| 2 points by ijidak 11 months ago | past |
|
| | Fine-Tuning Best Practices Series Introduction and Chapter 1: Training Data (openpipe.ai) |
| 3 points by sebg 11 months ago | past |
|
| | LLM Fine-Tuning Best Practices: Base Models Proprietary/Open Source, Large/Small (openpipe.ai) |
| 2 points by billmalarky 11 months ago | past | 1 comment |
|
| | LLM Fine-Tuning Best Practices for Training Data Curation (openpipe.ai) |
| 1 point by billmalarky on Aug 2, 2024 | past | 2 comments |
|
| | OpenPipe Mixture of Agents: Outperform GPT-4 at 1/25th the Cost (openpipe.ai) |
| 13 points by kcorbitt on June 20, 2024 | past | 2 comments |
|
| | What we've learned in 3 days of Llama 3 (openpipe.ai) |
| 3 points by kcorbitt on April 22, 2024 | past |
|
| | Mixtral Curious? Comparing Mistral 7B and Mixtral for fine-tuning (openpipe.ai) |
| 1 point by kcorbitt on Feb 29, 2024 | past |
|
| | S-LoRA: Serving Thousands of Models from One GPU for Fun and Profit (openpipe.ai) |
| 1 point by kcorbitt on Jan 18, 2024 | past |
|
| | Mistral 7B Fine-Tune Optimized (openpipe.ai) |
| 234 points by tosh on Dec 20, 2023 | past | 103 comments |
|
| | Is AI the next crypto? Insights from HN comments (openpipe.ai) |
| 237 points by kcorbitt on Nov 8, 2023 | past | 367 comments |
|