All Time Reads
- The Metamorphosis of Prime Intellect
- The Fable of the Dragon-Tyrant
- "The Coming Technological Singularity" by Vernor Vinge
- "Understand" by Ted Chiang
- "The Gentle Seduction"
June 2025
- Thiel's infamous essay from 2009:
  "I stand against confiscatory taxes, totalitarian collectives, and the ideology of the inevitability of the death of every individual."
  "The fate of our world may depend on the effort of a single person who builds or propagates the machinery of freedom that makes the world safe for capitalism."
  https://www.cato-unbound.org/2009/04/13/peter-thiel/education-libertarian/
February 2025
- How to Make Superbabies
  https://www.lesswrong.com/posts/DfrSZaf3JC8vJdbZL/how-to-make-superbabies
October 2023
- Phi-1.5 Model: A Case of Comparing Apples to Oranges?
  https://pratyushmaini.github.io/phi-1_5/
- Flash-Decoding for long-context inference
  https://pytorch.org/blog/flash-decoding/
- RingAttention
  https://arxiv.org/abs/2310.01889
  - The urge to go full Tri Dao et al. and port that thing from JAX to a CUDA/Triton kernel…
  - This would not only let RingAttention scale the sequence length with the number of devices used during training, but could also achieve higher Model FLOPs Utilization than FlashAttention-2 by computing the full transformer block in a blockwise manner in one kernel (see the sketch after this list).
  - You could fine-tune a CodeLLaMA 7B to a 4-million-token context window with just 32x A100s and literally fit every code repository in the context…
- It's time to be a definite techno-optimist
  https://a16z.com/the-techno-optimist-manifesto/
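Since the RingAttention itch above deserves more than a bullet: here's a minimal NumPy sketch of the blockwise-attention trick it builds on. This is my own illustration, not the paper's JAX code; the single-process loop over key/value blocks stands in for the ring of devices, each of which would hold one block and pass its K/V chunk around the ring.

```python
import numpy as np

def blockwise_attention(q, k, v, block_size):
    """Computes softmax(q @ k.T / sqrt(d)) @ v without materializing
    the full attention matrix, by streaming over key/value blocks and
    keeping running softmax statistics (the online-softmax trick behind
    FlashAttention and RingAttention)."""
    d = q.shape[-1]
    scale = 1.0 / np.sqrt(d)
    m = np.full(q.shape[0], -np.inf)  # running max logit per query row
    l = np.zeros(q.shape[0])          # running softmax denominator
    acc = np.zeros_like(q)            # running weighted sum of values
    for start in range(0, k.shape[0], block_size):
        kb = k[start:start + block_size]
        vb = v[start:start + block_size]
        s = (q @ kb.T) * scale                 # logits for this block
        m_new = np.maximum(m, s.max(axis=-1))  # updated running max
        correction = np.exp(m - m_new)         # rescale old accumulators
        p = np.exp(s - m_new[:, None])         # block softmax numerator
        l = l * correction + p.sum(axis=-1)
        acc = acc * correction[:, None] + p @ vb
        m = m_new
    return acc / l[:, None]

# Sanity check against the naive implementation.
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(16, 8)) for _ in range(3))
naive = np.exp((q @ k.T) / np.sqrt(8))
naive = (naive / naive.sum(-1, keepdims=True)) @ v
assert np.allclose(blockwise_attention(q, k, v, block_size=4), naive)
```

The point: memory per device stays proportional to the block size while the result is exactly softmax attention, which is why the sequence length can scale with the number of devices.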
June 2023
- Large Language Models can Simulate Everything
  https://kliu.io/post/llms-can-simulate-everything/
  - It might be time to build a General LLM Company: a virtual company of LLMs, with each "employee" specialized into a particular task.
- Large Language Models as Tool Makers
  https://arxiv.org/abs/2305.17126
  - In similar fashion to the recent Voyager paper
- Blockwise Parallel Transformer for Long Context Large Models
  https://arxiv.org/abs/2305.19370
  - Created the urge in me to go full Tri Dao et al. and write a custom kernel for this neat trick of applying blockwise computation to the FeedForward network as well (a minimal sketch follows this list).
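And the FeedForward half of that trick, again as my own minimal NumPy illustration rather than the paper's kernel: the FFN acts on every token position independently, so you can stream the sequence through it in blocks and never materialize the full (seq_len, 4*d) hidden activation.

```python
import numpy as np

def feedforward(x, w1, b1, w2, b2):
    """Standard transformer FFN (ReLU standing in for GELU)."""
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2

def blockwise_feedforward(x, w1, b1, w2, b2, block_size):
    """Process the sequence in blocks: peak activation memory drops from
    (seq_len, 4*d) to (block_size, 4*d) with identical output."""
    out = np.empty_like(x)
    for start in range(0, x.shape[0], block_size):
        blk = slice(start, start + block_size)
        out[blk] = feedforward(x[blk], w1, b1, w2, b2)
    return out

# Sanity check on a toy "sequence" of 64 tokens with d=16.
rng = np.random.default_rng(0)
d = 16
x = rng.normal(size=(64, d))
w1, b1 = rng.normal(size=(d, 4 * d)), np.zeros(4 * d)
w2, b2 = rng.normal(size=(4 * d, d)), np.zeros(d)
assert np.allclose(blockwise_feedforward(x, w1, b1, w2, b2, 8),
                   feedforward(x, w1, b1, w2, b2))
```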
May 2023
- Jason Wei's response to the "emergent abilities of LLMs are a mirage" arguments
  https://www.jasonwei.net/blog/common-arguments-regarding-emergent-abilities
April 2023
- Scaffolded LLMs are not just cool toys but actually the substrate of a new type of general-purpose natural language computer
  https://www.beren.io/2023-04-11-Scaffolded-LLMs-natural-language-computers/
March 2023
- Is ChatGPT 175 Billion Parameters? Technical Analysis
  https://orenleung.super.site/is-chatgpt-175-billion-parameters-technical-analysis
  - Interesting counterarguments in the comments: https://twitter.com/O42nl/status/1631820805972668416
- A step towards self-improving LLMs
  https://finbarr.ca/self-improving-LLMs/
- Alexey Guzey's Lifehacks
  https://guzey.com/lifehacks/
- Huge L for Chomsky
  https://scottaaronson.blog/?p=7094
  - "like the Jesuit astronomers declining to look through Galileo's telescope, what Chomsky and his followers are ultimately angry at is reality itself, for having the temerity to offer something up that they didn't predict and that doesn't fit their worldview."
- The Waluigi Effect of LLMs
  https://www.lesswrong.com/posts/D7PumeYTDPfBTp3i7/the-waluigi-effect-mega-post
  - I stopped myself from reading the Waluigi post until today because I don't really think it's beneficial for the space to make up words that no one outside the LW sphere understands (even though the term is quite self-explanatory). But I have to admit it's a really good post. Go check it out.
- Could you train a ChatGPT-beating model for $85,000 and run it in a browser?
  https://simonwillison.net/2023/Mar/17/beat-chatgpt-in-a-browser/
July 2022
- The effective altruist work ethic and the spirit of utilitarianism
  https://www.dwarkeshpatel.com/p/ea-billionaires
- The Track Record of Futurists Seems ... Fine
  https://www.cold-takes.com/the-track-record-of-futurists-seems-fine/
- Balaji's new book The Network State
  https://thenetworkstate.com/
June 2022
- Gwern's GPT-3 2nd Anniversary predictions
  https://www.reddit.com/r/mlscaling/comments/uznkhw/gpt3_2nd_anniversary/
  - Check here for a summary: https://twitter.com/johannes_hage/status/1530898189162782721
April 2022
- DeepMind releases new scaling laws that contradict the ones from OpenAI (a rough worked example follows this list)
  https://arxiv.org/abs/2203.15556
- PaLM - 540B parameter model by Google AI
  https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html
- DALL-E-2 by OpenAI
  https://openai.com/dall-e-2/
- Second order effects of the rise of large language models
  https://twitter.com/russelljkaplan/status/1513128005828165634
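Rough numbers behind that contradiction (the 20-tokens-per-parameter figure is the popular rule of thumb distilled from the Chinchilla paper, not its exact fitted law): Kaplan et al. said to spend extra compute mostly on more parameters, while Chinchilla says to scale parameters and training tokens together.

```python
# Chinchilla rule of thumb: compute-optimal training uses roughly
# 20 tokens per parameter (an approximation of the fitted scaling law).
params = 70e9                 # Chinchilla's 70B parameters
optimal_tokens = 20 * params  # ~1.4e12, the 1.4T tokens it was trained on

# GPT-3 by comparison: 175B parameters but only ~300B tokens,
# i.e. ~1.7 tokens per parameter: heavily undertrained by this rule.
print(f"{optimal_tokens:.2e}")
```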
March 2022
- George Hotz - Ride or Die
  https://return.life/2022/03/07/george-hotz-comma-ride-or-die/
- Super realistic AI takeoff scenario by Gwern, based on current models and scaling effects
  https://www.lesswrong.com/posts/a5e9arCnbDac9Doig/it-looks-like-you-re-trying-to-take-over-the-world
- Examples of barbell strategies for everyday life
  https://dwarkeshpatel.com/barbell-strategies/
- Deep Neural Nets - 33 years ago and 33 years from now → Andrej Karpathy reimplemented one of the first neural net papers by LeCun from 1989 and analysed whether we have made any fundamental progress
  https://karpathy.github.io/2022/03/14/lecun1989/
- Directory of all Large Language Models
  https://docs.google.com/spreadsheets/d/1gc6yse74XCwBx028HV_cvdxwXkmXejVjkO-Mz2uwE0k/edit#gid=0
- BigScience published several interesting blog posts this month on how they are training their 176B parameter language model
  https://bigscience.huggingface.co/blog
February 2022
- Slate Star Codex analysis of AGI timelines
  https://astralcodexten.substack.com/p/biological-anchors-a-trick-that-might
- Theses on Sleep by Alexey Guzey
  https://guzey.com/theses-on-sleep/
- Why Tyler Cowen with Emergent Ventures has been so successful in curating talent, and why the first batch of YC and the Thiel Fellowship were so successful
  https://www.highmodernism.com/blog/talentcuration
January 2022
- Motivation for the roaring 20s. We choose to solve problems like alignment (+aging) not because they are easy but because they are hard!
  https://www.lesswrong.com/posts/BseaxjsiDPKvGtDrm/we-choose-to-align-ai
December 2021
- Cool newsletter by Sonia Joseph
  https://mirror.xyz/soniajoseph.eth/AsFhFt-JOjqdyb6GCVhcNCIK6whBIZpcu_XbSvpc6W8
- Sequence to understand the relationship between Progress Studies and Effective Altruism
  https://www.highmodernism.com/sequence
- New AGI Workshop at Mila Quebec with a bunch of videos
  https://sites.google.com/mila.quebec/scaling-laws-workshop/schedule
- WebGPT: Improving the factual accuracy of language models through web browsing
  https://openai.com/blog/improving-factual-accuracy/
- Gopher: DeepMind's 280B parameter model with new SOTAs across the board
  https://deepmind.com/blog/article/language-modelling-at-scale
- First published work by Aleph Alpha
  https://arxiv.org/pdf/2112.05253.pdf
- New developments on nuclear reactors in Wyoming!
  https://www.terrapower.com/natrium-demo-kemmerer-wyoming/
October 2021
- Deep Learning Diminishing Returns → Important piece, but I don't agree with a lot of the stuff in it
  https://spectrum.ieee.org/deep-learning-computational-cost
- Whole Brain Emulation → No Progress on C. elegans After 10 Years
  https://www.lesswrong.com/posts/mHqQxwKuzZS69CXX5/whole-brain-emulation-no-progress-on-c-elgans-after-10-years
  - Also check out the Reddit discussion about the WBE topic: https://www.reddit.com/r/slatestarcodex/comments/q0hlyh/whole_brain_emulation_no_progress_on_c_elegans/
- For fellow ML Engineers - How to Train Really Large Models on Many GPUs?
  https://lilianweng.github.io/lil-log/2021/09/24/train-large-neural-networks.html
- Your chance to invest in a longevity company that went through YC with a reasonable valuation
  https://wefunder.com/gerostate.alpha
- How to Train Large Deep Learning Models as a Startup
  https://www.assemblyai.com/blog/how-to-train-large-deep-learning-models-as-a-startup/
- The Vitalik Buterin Fellowships in AI Existential Safety
  https://grants.futureoflife.org/
- 530B parameter language model by Microsoft + NVIDIA
  https://developer.nvidia.com/blog/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model/
  - Also check out this awesome thread about the model: https://twitter.com/BlancheMinerva/status/1447560921530896389
- State of AI Report 2021
  https://www.stateof.ai/
- Bryan Johnson measuring all his 70+ organs to maximally reverse the quantified biological age of each
  https://blueprint.bryanjohnson.co/
- Awesome video about the Scaling Laws
  https://www.youtube.com/watch?v=StLtMcsbQes
- Just Ask for Generalization by Eric Jang
  https://evjang.com/2021/10/23/generalization.html
September 2021
- In What Sense is Matter 'Programmable'? → A lot of interesting ideas inspired by David Deutsch's ideas in The Beginning of Infinity
  https://jaredtumiel.github.io/blog/2021/08/14/programmable-matter.html
- Co-founder of Neuralink is building a new company called Science
  https://maxhodak.com/nonfiction/2021/09/03/science.html
- Summary of Sam Altman Q&A on AGI predictions and GPT-4
  https://www.lesswrong.com/posts/aihztgJrknBdLHjd2/sam-altman-q-and-a-gpt-and-agi
August 2021
- New chip cluster that will make 120 trillion parameter models possible (~700x GPT-3's 175B)
  https://www.wired.com/story/cerebras-chip-cluster-neural-networks-ai/
- Scott Alexander on AGI risks + his answers to the comments are super interesting
  https://astralcodexten.substack.com/p/contra-acemoglu-onoh-god-were-doing
  - Highlights from the comments: https://astralcodexten.substack.com/p/highlights-from-the-comments-on-acemoglu
- We need founder-led Biotech companies
  https://www.pillar.vc/news/the-future-of-biotech-is-founder-led/
- If Einstein Had The Internet: An Interview With Balaji Srinivasan
  https://sotonye.substack.com/p/if-einstein-had-the-internet-an-interview
July 2021
- Must-read on the arguments for a slow AGI takeoff
  https://sideways-view.com/2018/02/24/takeoff-speeds/
- Prompt design for neural networks (learning how to talk to an AI) will be a superpower in the future. Adding "dramatic atmospheric ultra high definition free desktop wallpaper" to the prompt for CLIP produces much more realistic images (a minimal sketch follows this list).
  https://ai-weirdness.ghost.io/the-art-of-asking-nicely/
- There will be a dope Ethereum documentary
  https://ethereumfilm.mirror.xyz/3SV8gLXHIW8Ot45h3RL7aOgDINxN2hjLfFVOvyatB2A
- Funniest AI blog post I've ever read
  https://blog.eleuther.ai/year-one/
- How to apply for an insane amount of free TPUs as an ML Engineer
  https://blog.gpt4.org/jaxtpu
- Building Europe's AGI
  https://www.aleph-alpha.de/
- Putting the power of AlphaFold into the world's hands
  https://deepmind.com/blog/article/putting-the-power-of-alphafold-into-the-worlds-hands
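The prompt trick above, in code form. The helper below is hypothetical; only the quality-tag string comes from the post:

```python
# Hypothetical sketch of the prompt-augmentation trick from the AI
# Weirdness post: bolt "quality keywords" onto the subject so that
# CLIP-guided generators steer toward polished, wallpaper-like images.
QUALITY_TAGS = "dramatic atmospheric ultra high definition free desktop wallpaper"

def augment_prompt(subject: str) -> str:
    return f"{subject}, {QUALITY_TAGS}"

print(augment_prompt("a lighthouse in a storm"))
# -> "a lighthouse in a storm, dramatic atmospheric ultra high definition free desktop wallpaper"
```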
June 2021
- Breakthrough Initiatives → Research initiative that wants to go to Alpha Centauri by 2060 via an ultra-light uncrewed spacecraft travelling at 20% of the speed of light
  https://breakthroughinitiatives.org/
- Inexperienced engineers tend to undervalue simplicity → a justification of the cross-entropy loss (a minimal sketch follows)
  https://jacobjackson.com/cross-entropy/
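For reference, the simplicity the post defends, in a few lines of NumPy (my own sketch, not the post's code): cross-entropy is just the negative log-probability the model assigns to the correct class, computed stably via log-sum-exp.

```python
import numpy as np

def cross_entropy(logits, target):
    """Negative log-probability of the correct class. Subtracting the
    max before exponentiating is the standard log-sum-exp trick for
    numerical stability."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[target]

# A confident correct prediction gives a small loss ...
print(cross_entropy(np.array([5.0, 0.0, 0.0]), target=0))  # ~0.013
# ... a confident wrong one gives a large loss.
print(cross_entropy(np.array([5.0, 0.0, 0.0]), target=1))  # ~5.013
```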