How I Reduced Our Startup's LLM Costs by Almost 90%

@redeemed2000 I take it they don't keep your endpoints warm all the time, then. Do you have any insight into the cold start times? You likely don't care about cold starts since I figure you do batch inference once a day, but I'm trying to find out how viable this is for real-time inference (say, is sub-5-second latency (startup + inference) for 1k tokens achievable?).
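For anyone wanting to measure this themselves, here's a minimal sketch of how you could time the cold-start overhead on a scale-to-zero deployment: fire one request while the endpoint is cold, another immediately after while it's warm, and take the difference. The `ENDPOINT` URL and payload shape are placeholders, not anything from the article; substitute whatever your own inference server (e.g. a vLLM or TGI deployment) expects.

```python
import time

import requests

# Placeholder endpoint and payload; replace with your own deployment's
# URL and request schema.
ENDPOINT = "https://example.com/v1/completions"
PAYLOAD = {"prompt": "Hello", "max_tokens": 1000}


def timed_request() -> float:
    """Return total wall-clock seconds for one request.

    On a scaled-to-zero deployment, the first call pays the cold start
    (container spin-up + model load) on top of generation time.
    """
    start = time.perf_counter()
    resp = requests.post(ENDPOINT, json=PAYLOAD, timeout=300)
    resp.raise_for_status()
    return time.perf_counter() - start


cold = timed_request()  # first call: cold start + inference
warm = timed_request()  # second call: endpoint still warm, inference only
print(f"cold: {cold:.1f}s, warm: {warm:.1f}s, startup overhead: {cold - warm:.1f}s")
```

If the warm number alone already exceeds your latency budget for 1k tokens, no amount of keeping endpoints warm will get you under 5 seconds, so it's worth checking generation throughput before worrying about cold starts.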
 
@redeemed2000 Yes, I always wondered about the running costs and OpenAI bills. It seems you made the right decision in sending API requests only when necessary! Great job. I'd love to read more about that side of "AI startups".
 
