— Things I find online and am looking into.
LLM Token Rate Optimization:
Calculating GPU Memory for Serving LLMs
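Quick back-of-the-envelope sketch I keep alongside these links (my own assumptions, not taken from the articles): serving memory ≈ model weights + KV cache + runtime overhead, and decode token rate is roughly bounded by how fast the GPU can re-read the weights each step. The Llama-7B-like shapes and the ~2000 GB/s bandwidth figure below are assumed placeholders.

```python
# Rough sketch (my assumptions, not from the linked articles):
# serving memory ~= model weights + KV cache + runtime overhead.

def weights_gb(n_params_b: float, bytes_per_param: float = 2.0) -> float:
    """Model weights; FP16/BF16 is 2 bytes per parameter."""
    return n_params_b * 1e9 * bytes_per_param / 1e9

def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                seq_len: int, batch_size: int,
                bytes_per_elem: float = 2.0) -> float:
    """KV cache: 2 (K and V) x layers x kv_heads x head_dim per token."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * seq_len * batch_size / 1e9

def decode_tokens_per_sec(weights_total_gb: float, hbm_bw_gb_s: float) -> float:
    """Crude roofline bound: each decoded token re-reads all the weights."""
    return hbm_bw_gb_s / weights_total_gb

if __name__ == "__main__":
    # Hypothetical 7B model: 32 layers, 32 KV heads, head_dim 128,
    # FP16, 4k context, batch of 8 concurrent sequences.
    w = weights_gb(7, bytes_per_param=2.0)
    kv = kv_cache_gb(32, 32, 128, seq_len=4096, batch_size=8)
    total = (w + kv) * 1.2  # assume ~20% overhead for activations/fragmentation
    print(f"weights ~{w:.1f} GB, KV cache ~{kv:.1f} GB, total ~{total:.1f} GB")
    # assume ~2000 GB/s of HBM bandwidth (A100-class card)
    print(f"decode bound ~{decode_tokens_per_sec(w, 2000):.0f} tokens/s per sequence")
```

For the assumed shapes this lands around 14 GB of weights plus ~17 GB of KV cache, i.e. roughly 37 GB with overhead, and a decode ceiling on the order of 140 tokens/s per sequence; just a sanity-check sketch, not a substitute for profiling.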