Plurrrr

Thu 13 Apr 2023

Transformer Deep Dive: Parameter Counting

Recently, the machine learning community has been abuzz with discussions about how many parameters state-of-the-art models have. This blog post aims to offer a comprehensive overview of parameter counting, deriving the parameter counts of various transformer-like architectures from first principles and shedding light on how the majority of parameters can be attributed to the feedforward network. We will mostly focus on OpenAI’s GPT-3 architecture and Google’s PaLM architecture.
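As a rough illustration of the idea, here is a minimal sketch of such a count for a GPT-3-style decoder. It assumes the standard formulas (4·d² for the attention projections, 8·d² for a feedforward network with a 4x expansion) and GPT-3 175B's published hyperparameters; biases and layer norms are ignored since they contribute only a tiny fraction.

```python
# Approximate parameter count for a GPT-3-style decoder-only transformer.
# Assumes: 4x FFN expansion, tied input/output embeddings,
# biases and layer norms omitted (negligible at this scale).

def transformer_params(d_model, n_layers, vocab_size):
    attn = 4 * d_model ** 2       # Q, K, V and output projections
    ffn = 8 * d_model ** 2        # two matrices: d -> 4d and 4d -> d
    embed = vocab_size * d_model  # token embedding table
    total = n_layers * (attn + ffn) + embed
    return total, n_layers * ffn / total

# GPT-3 175B: d_model = 12288, 96 layers, ~50k BPE vocabulary
total, ffn_frac = transformer_params(d_model=12288, n_layers=96,
                                     vocab_size=50257)
print(f"{total / 1e9:.1f}B parameters, {ffn_frac:.0%} in feedforward layers")
```

The estimate lands close to the advertised 175B, with roughly two thirds of the parameters sitting in the feedforward layers, which is the point the article develops in detail.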

Source: Transformer Deep Dive: Parameter Counting, an article by Oren Leung.