A Brief History of AI

October 2025

From the 1956 Dartmouth conference to ChatGPT and beyond — the short version of how we got here.

The idea of artificial intelligence is older than computers. But most of what matters happened in the last decade or so. Here is the short version.

Where it started (1956)

The term “artificial intelligence” was coined for a 1956 summer workshop at Dartmouth College. A group of researchers gathered to explore whether machines could be made to think. Their working assumption was that the answer was yes, and the event gave the new discipline its name.

For the next few decades, AI mostly meant rules. Programmers hand-coded the logic: if X, then Y. By the 1970s and 1980s this produced expert systems, which knew a lot about one narrow topic and nothing about anything else. They were brittle and broke constantly. Progress came in cycles: bursts of optimism, then loss of funding and interest. These slow periods became known as AI winters.

The core problem: you cannot write enough rules to cover the real world.

The turning point (2012)

The shift came from a different approach. Instead of writing rules, let the machine learn from examples.

This idea, called machine learning, had been around since the 1950s. What changed in 2012 was scale. A neural network called AlexNet, trained on more than a million labeled images using graphics cards (GPUs) originally built for video games, won the ImageNet image recognition competition by an enormous margin: a top-5 error rate of about 15%, against about 26% for the runner-up. The gap was so large that it effectively settled the debate.

Within a few years, the entire field moved to this approach: large neural networks, trained on massive datasets, using GPU hardware.

The paper that changed everything (2017)

In 2017, a team at Google published a research paper called “Attention Is All You Need.” It described a new way to build neural networks, called the Transformer architecture.

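Without getting into the details: before Transformers, most language models were recurrent networks that processed text one word at a time, which made them slow to train and poor at keeping track of long-range context. The paper's key move was to build the model entirely around attention, a mechanism that lets the model relate any word in a passage to any other word, regardless of how far apart they are. Because every position can be processed at once, Transformers could be trained in parallel on GPUs. That is what made large language models practical at scale.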
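For readers who want a peek under the hood, the core calculation is small. The sketch below computes scaled dot-product self-attention, the operation at the heart of the 2017 paper, in plain Python. It is a deliberate simplification: real models first multiply each word vector by learned projection matrices to get separate queries, keys, and values, run many attention heads in parallel, and use vectors with hundreds or thousands of dimensions. The 2-D "word" vectors here are made up for illustration.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with no learned weights.

    Simplifying assumption: queries, keys, and values are all the raw
    input vectors. A real Transformer would first apply learned
    projection matrices to produce each of them.
    """
    d = len(vectors[0])
    outputs = []
    for query in vectors:
        # Score this position against every position, near or far.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in vectors
        ]
        weights = softmax(scores)
        # The output is a weighted average of all value vectors.
        mixed = [
            sum(w * value[i] for w, value in zip(weights, vectors))
            for i in range(d)
        ]
        outputs.append(mixed)
    return outputs

# Four hypothetical 2-D word embeddings; the first and last are identical.
words = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]
for row in self_attention(words):
    print([round(x, 3) for x in row])
```

Each output is a blend of every input vector, and the blending weight depends on how similar two vectors are, not on how far apart they sit in the sequence. That is why the first and last words, which are identical, end up with identical outputs here; real Transformers add positional information separately so that word order still counts.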

Every major language model today — GPT, Claude, Gemini, Llama — is built on this architecture.

The LLM era (2018 to now)

Things moved fast from there.

In 2018, Google released BERT, one of the first major Transformer-based language models, designed for understanding text rather than generating it. In 2019, OpenAI released GPT-2, initially withholding the full model because it worried the system could be misused to generate convincing fake text.

In 2020, GPT-3 arrived. 175 billion parameters. It could write essays, answer questions, generate code, translate languages. The public perception of what AI could do shifted dramatically.

In November 2022, ChatGPT launched. It reached one million users in five days, and by early 2023 it was widely reported as the fastest-growing consumer application to date. Anyone with a browser could now have a conversation with a large language model. Most people had never done that before.

The two years after that saw model after model: GPT-4, Claude, Gemini, Llama, Mistral. Open-weight models, often loosely called open-source, became competitive with commercial ones, and running capable models locally on a laptop became practical. Reasoning models appeared: models that work through a problem step by step before answering. Coding agents arrived that could read, write, and run code with little human intervention.

The two companies worth understanding

OpenAI was founded in 2015 as a non-profit by Sam Altman, Elon Musk, Greg Brockman, and others. The stated mission: build safe artificial general intelligence for the benefit of humanity. In 2019 it created a capped-profit arm, controlled by the non-profit, to raise investment. It built ChatGPT and the GPT model family, and became the most recognized name in AI.

Anthropic was founded in 2021 by Dario Amodei, Daniela Amodei, and several other former OpenAI staff. They reportedly left over differences about the company's direction, particularly around safety. Anthropic's focus is on understanding why models behave the way they do and on building systems that behave reliably. Its model family is Claude.

Where we are now

As of 2025, leading models match or beat typical human scores on many academic and professional benchmarks, including standardized exams. This does not mean they are smarter than people. It means they score higher on the tests. The distinction matters.

Coding in particular has been transformed. Models can now write working software from a description, debug errors, explain codebases, and act as autonomous agents within a project. This is not the future. It is already day-to-day reality for many people who build software.

The pace is not slowing.