Random papers I find interesting:


DEEP COMPRESSION: COMPRESSING DEEP NEURAL NETWORKS WITH PRUNING, TRAINED QUANTIZATION AND HUFFMAN CODING

arxiv.org


AWQ: ACTIVATION-AWARE WEIGHT QUANTIZATION FOR ON-DEVICE LLM COMPRESSION AND ACCELERATION

arxiv.org


TRANSFORMERS WITHOUT NORMALIZATION
