Plurrrr

Thu 01 Jun 2023

Transformer Math 101

A lot of basic, important information about transformer language models can be computed quite simply. Unfortunately, the equations for this are not widely known in the NLP community. The purpose of this document is to collect these equations along with related knowledge about where they come from and why they matter.

Source: Transformer Math 101, an article by Quentin Anthony, Stella Biderman, and Hailey Schoelkopf.
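One of the rules of thumb the article derives is the training-compute estimate C ≈ 6PD FLOPs for a model with P parameters trained on D tokens. A minimal back-of-the-envelope sketch in Python, with hypothetical model and hardware numbers:

    def training_flops(params, tokens):
        # The article's rule of thumb: C ≈ 6 * P * D FLOPs. The forward
        # pass costs ~2 FLOPs per parameter per token; the backward pass
        # costs roughly twice the forward, hence the factor of 6.
        return 6 * params * tokens

    # Hypothetical example: a 7B-parameter model trained on 2T tokens.
    flops = training_flops(7e9, 2e12)
    print(f"{flops:.1e} FLOPs")  # 8.4e+22

    # Dividing by sustained throughput gives wall-clock time, e.g. an
    # A100 at an assumed 50% of its 312 TFLOP/s bf16 peak:
    gpu_seconds = flops / (0.5 * 312e12)
    print(f"{gpu_seconds / 86400:,.0f} A100-days")  # ~6,200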

Data Compression Drives the Internet. Here’s How It Works.

With more than 9 billion gigabytes of information traveling the internet every day, researchers are constantly looking for new ways to compress data into smaller packages. Cutting-edge techniques focus on lossy approaches, which achieve compression by intentionally “losing” information from a transmission. Google, for instance, recently unveiled a lossy strategy where the sending computer drops details from an image and the receiving computer uses artificial intelligence to guess the missing parts. Even Netflix uses a lossy approach, downgrading video quality whenever the company detects that a user is watching on a low-resolution device.
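The trade-off is easy to see in miniature. Below is a toy sketch of lossy compression, not any real codec: it quantizes 8-bit samples down to 4 bits, and the receiver can only guess at the discarded detail.

    def compress(samples):
        # Keep only the top 4 bits of each 8-bit sample ("losing" detail),
        # then pack two 4-bit values per byte: half the size, for good.
        nibbles = [s >> 4 for s in samples]
        if len(nibbles) % 2:
            nibbles.append(0)
        return bytes((a << 4) | b for a, b in zip(nibbles[::2], nibbles[1::2]))

    def decompress(packed):
        # The receiver cannot recover the dropped bits; it guesses by
        # placing every value in the middle of its quantization bucket.
        nibbles = [n for byte in packed for n in (byte >> 4, byte & 0xF)]
        return [(n << 4) | 0x8 for n in nibbles]

    original = [12, 200, 37, 255, 90, 91, 90, 88]
    restored = decompress(compress(original))
    print(restored)  # [8, 200, 40, 248, 88, 88, 88, 88] -- close, not exact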

Very little research, by contrast, is currently being pursued on lossless strategies, where transmissions are made smaller, but no substance is sacrificed. The reason? Lossless approaches are already remarkably efficient. They power everything from the PNG image standard to the ubiquitous software utility PKZip. And it’s all because of a graduate student who was simply looking for a way out of a tough final exam.

Source: How Lossless Data Compression Works, an article by Elliot Lichtman.
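The graduate student was David Huffman, and the scheme he devised, Huffman coding, is still the textbook lossless method. A minimal sketch of it, building the code tree with Python's heapq:

    import heapq
    from collections import Counter

    def huffman_codes(text):
        # One heap entry per symbol: (frequency, tiebreaker, subtree).
        # The integer tiebreaker keeps entries comparable when counts tie.
        heap = [(f, i, s) for i, (s, f) in enumerate(Counter(text).items())]
        heapq.heapify(heap)
        n = len(heap)
        # Repeatedly merge the two least frequent subtrees into one node.
        while len(heap) > 1:
            f1, _, left = heapq.heappop(heap)
            f2, _, right = heapq.heappop(heap)
            heapq.heappush(heap, (f1 + f2, n, (left, right)))
            n += 1
        # Walk the finished tree: left edges emit "0", right edges "1".
        codes = {}
        def walk(node, prefix):
            if isinstance(node, tuple):
                walk(node[0], prefix + "0")
                walk(node[1], prefix + "1")
            else:
                codes[node] = prefix or "0"  # lone-symbol edge case
        walk(heap[0][2], "")
        return codes

    codes = huffman_codes("abracadabra")
    print(codes)  # frequent symbols get the shortest codes
    print("".join(codes[c] for c in "abracadabra"))  # 23 bits vs 88 raw

Frequent symbols get short codes and rare ones long codes, and since the mapping is invertible, the original text comes back exactly: smaller, with no substance sacrificed.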