Forgive me for starting with a cliché, one the financial sector has lent to the technical lexicon, but I’m afraid I have to talk about “moats.” Popularized decades ago by Warren Buffett to mean a company’s durable competitive advantage, the phrase made its way into Silicon Valley pitch decks, and in 2023 a memo leaked from Google, titled “We Have No Moat, and Neither Does OpenAI,” fretted that open-source AI would overtake Big Tech’s proprietary models.
Years later, the castle walls remain intact. Apart from a brief scare when DeepSeek first appeared, open-source AI models have not fared much better than proprietary ones. Even so, none of the frontier labs—OpenAI, Anthropic, Google—has a moat to speak of.
The company that does own a moat is Nvidia. Its CEO, Jensen Huang, has called it “the most valuable asset.” No, it is not, as you might expect of a chip company, a piece of hardware. It’s something called CUDA. What sounds like an FDA-banned drug may be the one real moat in AI.
CUDA technically stands for Compute Unified Device Architecture, but, as with laser or scuba, no one bothers to expand the acronym; we just say “KOO-duh.” So what is this most valuable asset? Forced to give a one-word answer: parallelism.
Here is a simple example. Suppose we want to fill in a 9 × 9 multiplication table. On a computer with one core, all 81 entries are computed one by one. But a GPU with nine cores can distribute the work so that each core takes a different row—one computes 1×1 through 1×9, another 2×1 through 2×9, and so on—for a nine-fold speedup. Modern GPUs can be cleverer still. If they were designed to recognize commutativity—7 × 9 = 9 × 7—they could skip duplicate operations, reducing 81 multiplications to 45 and roughly halving the workload. When a single training run costs hundreds of millions of dollars, every optimization counts.
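To make the division of labor concrete, here is a minimal CUDA sketch of that multiplication table, with one GPU thread standing in for each of the nine hypothetical cores (the kernel and variable names are mine, chosen for illustration):

```
#include <cuda_runtime.h>
#include <cstdio>

// One thread per row: thread r computes (r+1) x 1 through (r+1) x 9.
__global__ void multiplicationTable(int *table, int n) {
    int row = threadIdx.x;  // each thread takes a different row
    if (row < n) {
        for (int col = 0; col < n; ++col)
            table[row * n + col] = (row + 1) * (col + 1);
    }
}

int main() {
    const int n = 9;
    int *table;
    // Unified memory keeps the sketch short; host and device share the pointer.
    cudaMallocManaged(&table, n * n * sizeof(int));

    multiplicationTable<<<1, n>>>(table, n);  // one block, nine threads
    cudaDeviceSynchronize();

    for (int r = 0; r < n; ++r) {
        for (int c = 0; c < n; ++c) printf("%4d", table[r * n + c]);
        printf("\n");
    }
    cudaFree(table);
    return 0;
}
```

The nine rows are written in parallel rather than one after another; a real workload would launch thousands of threads, not nine.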
Nvidia GPUs were originally designed to render graphics for video games. In the early 2000s, a Stanford PhD student named Ian Buck, who first got into GPUs as a gamer, realized that their architecture could be repurposed for high-performance computing. He created a programming language called Brook, was hired by Nvidia, and, with John Nickolls, led the development of CUDA. If AI ushers in an era of abundance and autonomous everything, know that it will be because someone, somewhere, playing Doom thought a demon’s scrotum should jiggle at 60 frames per second.
CUDA is not exactly a programming language but a “platform.” I use the weasel word because, like The New York Times—a newspaper that is also a games company—CUDA is more than one thing: for years it has been accreting a bundle of AI programming libraries. Each one shaves nanoseconds off a single calculation; added up, they make GPUs, in the company’s words, go brrr.
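To give a flavor of what calling into that bundle looks like, here is a sketch that hands a matrix multiplication to cuBLAS, one of CUDA’s oldest libraries, rather than writing the kernel by hand (error checks omitted for brevity; cuBLAS assumes column-major storage, which doesn’t matter for this all-ones example):

```
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const int n = 9;
    float *A, *B, *C;
    cudaMallocManaged(&A, n * n * sizeof(float));
    cudaMallocManaged(&B, n * n * sizeof(float));
    cudaMallocManaged(&C, n * n * sizeof(float));
    for (int i = 0; i < n * n; ++i) { A[i] = 1.0f; B[i] = 2.0f; }

    cublasHandle_t handle;
    cublasCreate(&handle);

    // C = alpha * A * B + beta * C, dispatched to kernels hand-tuned
    // for the specific GPU generation under the hood.
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                n, n, n, &alpha, A, n, B, n, &beta, C, n);

    cudaDeviceSynchronize();
    printf("C[0] = %.1f\n", C[0]);  // 9 * (1.0 * 2.0) = 18.0

    cublasDestroy(handle);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```

The point is not the arithmetic but the dispatch: the one-line call routes to whichever tuned kernel suits your card, which is where those nanoseconds get shaved.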
A modern graphics card is not just a circuit board full of chips and memory and fans. It is an intricate arrangement of caches and specialized units called “tensor cores” and “streaming multiprocessors.” In this sense, what the chip company sells is like a professional kitchen, and its many cores are the burners. But even a kitchen with 30 burners can’t turn out meals quickly without a head chef capable of intelligently distributing tasks—which is what CUDA does for GPU cores.
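You can even ask a card how many “burners” it has. A minimal sketch using the CUDA runtime’s device query (the printout wording is mine):

```
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);  // query device 0

    // Streaming multiprocessors are the burners; CUDA's scheduler
    // decides which blocks of work cook on which one, and when.
    printf("%s: %d streaming multiprocessors\n",
           prop.name, prop.multiProcessorCount);
    return 0;
}
```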
Extending the analogy, hand-crafted CUDA libraries designed for one matrix operation are the equivalent of kitchen tools built for a single task and nothing else—a cherry pitter, a shrimp deveiner—overkill for home cooks, but not if you have 10,000 shrimp guts to get out. Which brings us back to DeepSeek. Its engineers went a layer deeper, working directly in PTX, the assembly-like language of Nvidia GPUs. Say the job is peeling garlic. A generic program would go: “Take the skin off the clove.” CUDA might advise: “Crush the clove with the flat of a knife.” PTX lets you dictate every motion: “Raise the blade 2.35 inches above the cutting board, align it with the equator of the clove, and bring your hand down with a force of 36.2 newtons.”
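For a taste of what dictating individual knife strokes looks like, CUDA does let you embed raw PTX inside a kernel via inline assembly. A minimal sketch, assuming a CUDA-capable machine (the kernel is mine; add.s32, PTX’s 32-bit integer add, is real):

```
#include <cuda_runtime.h>
#include <cstdio>

// Adds two integers with a hand-written PTX instruction instead of
// letting the compiler pick one: "add.s32" is PTX's 32-bit integer add.
__global__ void addWithPtx(const int *a, const int *b, int *out) {
    int result;
    asm("add.s32 %0, %1, %2;" : "=r"(result) : "r"(*a), "r"(*b));
    *out = result;
}

int main() {
    int *a, *b, *out;
    cudaMallocManaged(&a, sizeof(int));
    cudaMallocManaged(&b, sizeof(int));
    cudaMallocManaged(&out, sizeof(int));
    *a = 7; *b = 9;

    addWithPtx<<<1, 1>>>(a, b, out);
    cudaDeviceSynchronize();
    printf("%d + %d = %d\n", *a, *b, *out);  // 7 + 9 = 16

    cudaFree(a); cudaFree(b); cudaFree(out);
    return 0;
}
```

Hand-writing a single add buys nothing on its own; the payoff comes when thousands of such low-level choices—instruction selection, register use, scheduling—compound across an entire training run, which is the level at which DeepSeek’s engineers were reportedly working.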