| 1 | E2B | 43s | 13 | 1 | 0 | $0.47 | 4/5 | 88 (B): zero errors, first-try success, 43s from key to working sandbox |
| 2 | Vercel Sandbox | 1m 37s | 19 | 1 | 0 | $1.23 | 4/5 | 81 (B): excellent docs, zero code errors, first-run success, but OIDC auth model requires human OAuth device flow |
| 3 | Daytona | 2m 8s | 19 | 1 | 1 | $0.52 | 4/5 | 75 (B): clean core SDK, one `fs.uploadFile` doc gap, 4/5 discoverability |
| 4 | Freestyle | 2m 33s | 22 | 1 | 0 | $0.96 | 4.5/5 | 75 (B): zero errors, clean first-run success, but 22 tool calls and incomplete VM lifecycle docs drag the score |
| 5 | Cloudflare Sandbox | 5m 5s | 29 | 0 | 1 | $1.66 | 3.5/5 | 69 (C): zero interruptions and excellent template scaffold, but bare Ubuntu container without Python cost 4 debug cycles |
| 6 | Blaxel | 3m 46s | 34 | 1 | 1 | $1.01 | 5/5 | 66 (C): clean SDK, scattered docs, image catalog gap caused first-run failure |
| 7 | Modal | 2m 50s | 31 | 1 | 2 | $0.63 | 4/5 | 66 (C): `pipInstall` mismatch between Python and Node SDKs, fragmented Node docs, 31 tool calls |
| 8 | CodeSandbox | 3m 25s | 32 | 2 | 1 | $2.11 | 4/5 | 62 (C): typed SDK with clean quickstart, but underdocumented return types and auth friction |