All Time Reads
- The Metamorphosis of Prime Intellect
- The Fable of the Dragon-Tyrant
- "The Coming Technological Singularity" by Vernor Vinge
- "Understand" by Ted Chiang
- "The Gentle Seduction"
June 2025
- Thiel's infamous essay from 2009:
  "I stand against confiscatory taxes, totalitarian collectives, and the ideology of the inevitability of the death of every individual."
  "The fate of our world may depend on the effort of a single person who builds or propagates the machinery of freedom that makes the world safe for capitalism."
  https://www.cato-unbound.org/2009/04/13/peter-thiel/education-libertarian/
February 2025
- How to Make Superbabies
  https://www.lesswrong.com/posts/DfrSZaf3JC8vJdbZL/how-to-make-superbabies
October 2023
- Phi-1.5 Model: A Case of Comparing Apples to Oranges?
  https://pratyushmaini.github.io/phi-1_5/
- Flash-Decoding for long-context inference
  https://pytorch.org/blog/flash-decoding/
- RingAttention
  https://arxiv.org/abs/2310.01889
  - The urge to go full Tri Dao et al. and port that thing from JAX to a CUDA/Triton kernel…
  - This would not only let RingAttention scale the sequence length with the number of devices used during training, but could also achieve higher Model FLOPs Utilization than FlashAttention-2 by computing the full transformer block in a blockwise manner in one kernel (see the sketch after this list).
  - You could fine-tune a CodeLLaMA 7B to a 4-million-token context window with just 32x A100s and literally fit every code repository in the context…
- It's time to be a definite techno-optimist
  https://a16z.com/the-techno-optimist-manifesto/
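Since the RingAttention itch above deserves more than a bullet: here's a minimal NumPy sketch of the blockwise-attention trick it builds on. This is my own illustration, not the paper's JAX code; the single-process loop over key/value blocks stands in for the ring of devices, each of which would hold one block and pass its K/V chunk around the ring.

```python
import numpy as np

def blockwise_attention(q, k, v, block_size):
    """Computes softmax(q @ k.T / sqrt(d)) @ v without materializing
    the full attention matrix, by streaming over key/value blocks and
    keeping running softmax statistics (the online-softmax trick behind
    FlashAttention and RingAttention)."""
    d = q.shape[-1]
    scale = 1.0 / np.sqrt(d)
    m = np.full(q.shape[0], -np.inf)  # running max logit per query row
    l = np.zeros(q.shape[0])          # running softmax denominator
    acc = np.zeros_like(q)            # running weighted sum of values
    for start in range(0, k.shape[0], block_size):
        kb = k[start:start + block_size]
        vb = v[start:start + block_size]
        s = (q @ kb.T) * scale                 # logits for this block
        m_new = np.maximum(m, s.max(axis=-1))  # updated running max
        correction = np.exp(m - m_new)         # rescale old accumulators
        p = np.exp(s - m_new[:, None])         # block softmax numerator
        l = l * correction + p.sum(axis=-1)
        acc = acc * correction[:, None] + p @ vb
        m = m_new
    return acc / l[:, None]

# Sanity check against the naive implementation.
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(16, 8)) for _ in range(3))
naive = np.exp((q @ k.T) / np.sqrt(8))
naive = (naive / naive.sum(-1, keepdims=True)) @ v
assert np.allclose(blockwise_attention(q, k, v, block_size=4), naive)
```

The point: memory per device stays proportional to the block size while the result is exactly softmax attention, which is why the sequence length can scale with the number of devices.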
June 2023
- Large Language Models can Simulate Everything
  https://kliu.io/post/llms-can-simulate-everything/
  - It might be time to build a General LLM Company: a virtual company of LLMs, with each "employee" specialized into a particular task.
- Large Language Models as Tool Makers
  https://arxiv.org/abs/2305.17126
  - In similar fashion to the recent Voyager paper
- Blockwise Parallel Transformer for Long Context Large Models
  https://arxiv.org/abs/2305.19370
  - Created the urge in me to go full Tri Dao et al. and write a custom kernel for this neat trick of applying blockwise computation to the FeedForward network as well (a minimal sketch follows this list).
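And the FeedForward half of that trick, again as my own minimal NumPy illustration rather than the paper's kernel: the FFN acts on every token position independently, so you can stream the sequence through it in blocks and never materialize the full (seq_len, 4*d) hidden activation.

```python
import numpy as np

def feedforward(x, w1, b1, w2, b2):
    """Standard transformer FFN (ReLU standing in for GELU)."""
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2

def blockwise_feedforward(x, w1, b1, w2, b2, block_size):
    """Process the sequence in blocks: peak activation memory drops from
    (seq_len, 4*d) to (block_size, 4*d) with identical output."""
    out = np.empty_like(x)
    for start in range(0, x.shape[0], block_size):
        blk = slice(start, start + block_size)
        out[blk] = feedforward(x[blk], w1, b1, w2, b2)
    return out

# Sanity check on a toy "sequence" of 64 tokens with d=16.
rng = np.random.default_rng(0)
d = 16
x = rng.normal(size=(64, d))
w1, b1 = rng.normal(size=(d, 4 * d)), np.zeros(4 * d)
w2, b2 = rng.normal(size=(4 * d, d)), np.zeros(d)
assert np.allclose(blockwise_feedforward(x, w1, b1, w2, b2, 8),
                   feedforward(x, w1, b1, w2, b2))
```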
May 2023
- Jason Wei's response to the "emergent abilities of LLMs are a mirage" arguments
  https://www.jasonwei.net/blog/common-arguments-regarding-emergent-abilities
April 2023
- Scaffolded LLMs are not just cool toys but actually the substrate of a new type of general-purpose natural language computer
  https://www.beren.io/2023-04-11-Scaffolded-LLMs-natural-language-computers/
March 2023
- Is ChatGPT 175 Billion Parameters? Technical Analysis
  https://orenleung.super.site/is-chatgpt-175-billion-parameters-technical-analysis
  - Interesting counterarguments in the comments: https://twitter.com/O42nl/status/1631820805972668416
- A step towards self-improving LLMs
  https://finbarr.ca/self-improving-LLMs/
- Alexey Guzey's Lifehacks
  https://guzey.com/lifehacks/
- Huge L for Chomsky
  https://scottaaronson.blog/?p=7094
  - "like the Jesuit astronomers declining to look through Galileo's telescope, what Chomsky and his followers are ultimately angry at is reality itself, for having the temerity to offer something up that they didn't predict and that doesn't fit their worldview."
- The Waluigi Effect of LLMs
  https://www.lesswrong.com/posts/D7PumeYTDPfBTp3i7/the-waluigi-effect-mega-post
  - I stopped myself from reading the Waluigi post until today because I don't really think it's beneficial for the space to make up words that no one outside the LW sphere understands (even though the term is quite self-explanatory). But I have to admit it's a really good post. Go check it out.
- Could you train a ChatGPT-beating model for $85,000 and run it in a browser?
  https://simonwillison.net/2023/Mar/17/beat-chatgpt-in-a-browser/
July 2022
- The effective altruist work ethic and the spirit of utilitarianism
  https://www.dwarkeshpatel.com/p/ea-billionaires
- The Track Record of Futurists Seems ... Fine
  https://www.cold-takes.com/the-track-record-of-futurists-seems-fine/
- Balaji's new book The Network State
  https://thenetworkstate.com/
June 2022
- Gwern's GPT-3 2nd Anniversary predictions
  https://www.reddit.com/r/mlscaling/comments/uznkhw/gpt3_2nd_anniversary/
  - Check here for a summary: https://twitter.com/johannes_hage/status/1530898189162782721
April 2022
- DeepMind releases new scaling laws that contradict the ones from OpenAI (a rough worked example follows this list)
  https://arxiv.org/abs/2203.15556
- PaLM - 540B parameter model by Google AI
  https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html
- DALL-E-2 by OpenAI
  https://openai.com/dall-e-2/
- Second order effects of the rise of large language models
  https://twitter.com/russelljkaplan/status/1513128005828165634
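Rough numbers behind that contradiction (the 20-tokens-per-parameter figure is the popular rule of thumb distilled from the Chinchilla paper, not its exact fitted law): Kaplan et al. said to spend extra compute mostly on more parameters, while Chinchilla says to scale parameters and training tokens together.

```python
# Chinchilla rule of thumb: compute-optimal training uses roughly
# 20 tokens per parameter (an approximation of the fitted scaling law).
params = 70e9                 # Chinchilla's 70B parameters
optimal_tokens = 20 * params  # ~1.4e12, the 1.4T tokens it was trained on

# GPT-3 by comparison: 175B parameters but only ~300B tokens,
# i.e. ~1.7 tokens per parameter: heavily undertrained by this rule.
print(f"{optimal_tokens:.2e}")
```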
March 2022
- George Hotz - Ride or Die
  https://return.life/2022/03/07/george-hotz-comma-ride-or-die/
- Super realistic AI takeoff scenario by Gwern, based on current models and scaling effects
  https://www.lesswrong.com/posts/a5e9arCnbDac9Doig/it-looks-like-you-re-trying-to-take-over-the-world
- Examples of barbell strategies for everyday life
  https://dwarkeshpatel.com/barbell-strategies/
- Deep Neural Nets - 33 years ago and 33 years from now → Andrej Karpathy reimplemented one of the first neural net papers by LeCun from 1989 and analysed whether we have made any fundamental progress
  https://karpathy.github.io/2022/03/14/lecun1989/
- Directory of all Large Language Models
  https://docs.google.com/spreadsheets/d/1gc6yse74XCwBx028HV_cvdxwXkmXejVjkO-Mz2uwE0k/edit#gid=0
- BigScience published several interesting blog posts this month on how they are training their 176B parameter language model
  https://bigscience.huggingface.co/blog
February 2022
- Slate Star Codex analysis of AGI timelines
  https://astralcodexten.substack.com/p/biological-anchors-a-trick-that-might
- Theses on Sleep by Alexey Guzey
  https://guzey.com/theses-on-sleep/
- Why Tyler Cowen with Emergent Ventures has been so successful in curating talent, and why the first batch of YC and the Thiel Fellowship were so successful
  https://www.highmodernism.com/blog/talentcuration
January 2022
- Motivation for the roaring 20s. We choose to solve problems like alignment (+aging) not because they are easy but because they are hard!
  https://www.lesswrong.com/posts/BseaxjsiDPKvGtDrm/we-choose-to-align-ai
December 2021
- Cool newsletter by Sonia Joseph
  https://mirror.xyz/soniajoseph.eth/AsFhFt-JOjqdyb6GCVhcNCIK6whBIZpcu_XbSvpc6W8
- Sequence to understand the relationship between Progress Studies and Effective Altruism
  https://www.highmodernism.com/sequence
- New AGI Workshop at Mila Quebec with a bunch of videos
  https://sites.google.com/mila.quebec/scaling-laws-workshop/schedule
- WebGPT: Improving the factual accuracy of language models through web browsing
  https://openai.com/blog/improving-factual-accuracy/
- Gopher: DeepMind's 280B parameter model with new SOTAs across the board
  https://deepmind.com/blog/article/language-modelling-at-scale
- First published work by Aleph Alpha
  https://arxiv.org/pdf/2112.05253.pdf
- New developments on nuclear reactors in Wyoming!
  https://www.terrapower.com/natrium-demo-kemmerer-wyoming/
October 2021
- Deep Learning Diminishing Returns → Important piece, but I don't agree with a lot of the stuff in it
  https://spectrum.ieee.org/deep-learning-computational-cost
- Whole Brain Emulation → No Progress on C. elegans After 10 Years
  https://www.lesswrong.com/posts/mHqQxwKuzZS69CXX5/whole-brain-emulation-no-progress-on-c-elgans-after-10-years
  - Also check out the Reddit discussion about the WBE topic: https://www.reddit.com/r/slatestarcodex/comments/q0hlyh/whole_brain_emulation_no_progress_on_c_elegans/
- For fellow ML Engineers - How to Train Really Large Models on Many GPUs?
  https://lilianweng.github.io/lil-log/2021/09/24/train-large-neural-networks.html
- Your chance to invest in a longevity company that went through YC with a reasonable valuation
  https://wefunder.com/gerostate.alpha
- How to Train Large Deep Learning Models as a Startup
  https://www.assemblyai.com/blog/how-to-train-large-deep-learning-models-as-a-startup/
- The Vitalik Buterin Fellowships in AI Existential Safety
  https://grants.futureoflife.org/
- 530B parameter language model by Microsoft + NVIDIA
  https://developer.nvidia.com/blog/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model/
  - Also check out this awesome thread about the model: https://twitter.com/BlancheMinerva/status/1447560921530896389
- State of AI Report 2021
  https://www.stateof.ai/
- Bryan Johnson measuring all his 70+ organs to maximally reverse the quantified biological age of each
  https://blueprint.bryanjohnson.co/
- Awesome video about the Scaling Laws
  https://www.youtube.com/watch?v=StLtMcsbQes
- Just Ask for Generalization by Eric Jang
  https://evjang.com/2021/10/23/generalization.html
September 2021
- In What Sense is Matter 'Programmable'? → A lot of interesting ideas inspired by David Deutsch's ideas in The Beginning of Infinity
  https://jaredtumiel.github.io/blog/2021/08/14/programmable-matter.html
- Co-founder of Neuralink is building a new company called Science
  https://maxhodak.com/nonfiction/2021/09/03/science.html
- Summary of Sam Altman Q&A on AGI predictions and GPT-4
  https://www.lesswrong.com/posts/aihztgJrknBdLHjd2/sam-altman-q-and-a-gpt-and-agi
August 2021
- New chip cluster that will make 120 trillion parameter models possible (~700x GPT-3's 175B)
  https://www.wired.com/story/cerebras-chip-cluster-neural-networks-ai/
- Scott Alexander on AGI risks + his answers to the comments are super interesting
  https://astralcodexten.substack.com/p/contra-acemoglu-onoh-god-were-doing
  - Highlights from the comments: https://astralcodexten.substack.com/p/highlights-from-the-comments-on-acemoglu
- We need founder-led Biotech companies
  https://www.pillar.vc/news/the-future-of-biotech-is-founder-led/
- If Einstein Had The Internet: An Interview With Balaji Srinivasan
  https://sotonye.substack.com/p/if-einstein-had-the-internet-an-interview
July 2021
- Must-read on the arguments for a slow AGI takeoff
  https://sideways-view.com/2018/02/24/takeoff-speeds/
- Prompt design for neural networks (learning how to talk to an AI) will be a superpower in the future. Adding "dramatic atmospheric ultra high definition free desktop wallpaper" to the prompt for CLIP produces much more realistic images (a minimal sketch follows this list).
  https://ai-weirdness.ghost.io/the-art-of-asking-nicely/
- There will be a dope Ethereum documentary
  https://ethereumfilm.mirror.xyz/3SV8gLXHIW8Ot45h3RL7aOgDINxN2hjLfFVOvyatB2A
- Funniest AI blog post I've ever read
  https://blog.eleuther.ai/year-one/
- How to apply for an insane amount of free TPUs as an ML Engineer
  https://blog.gpt4.org/jaxtpu
- Building Europe's AGI
  https://www.aleph-alpha.de/
- Putting the power of AlphaFold into the world's hands
  https://deepmind.com/blog/article/putting-the-power-of-alphafold-into-the-worlds-hands
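The prompt trick above, in code form. The helper below is hypothetical; only the quality-tag string comes from the post:

```python
# Hypothetical sketch of the prompt-augmentation trick from the AI
# Weirdness post: bolt "quality keywords" onto the subject so that
# CLIP-guided generators steer toward polished, wallpaper-like images.
QUALITY_TAGS = "dramatic atmospheric ultra high definition free desktop wallpaper"

def augment_prompt(subject: str) -> str:
    return f"{subject}, {QUALITY_TAGS}"

print(augment_prompt("a lighthouse in a storm"))
# -> "a lighthouse in a storm, dramatic atmospheric ultra high definition free desktop wallpaper"
```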
June 2021
- Breakthrough Initiatives → Research initiative that wants to go to Alpha Centauri by 2060 via an ultra-light uncrewed spacecraft travelling at 20% of the speed of light
  https://breakthroughinitiatives.org/
- Inexperienced engineers tend to undervalue simplicity → a justification of the cross-entropy loss (a minimal sketch follows)
  https://jacobjackson.com/cross-entropy/
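For reference, the simplicity the post defends, in a few lines of NumPy (my own sketch, not the post's code): cross-entropy is just the negative log-probability the model assigns to the correct class, computed stably via log-sum-exp.

```python
import numpy as np

def cross_entropy(logits, target):
    """Negative log-probability of the correct class. Subtracting the
    max before exponentiating is the standard log-sum-exp trick for
    numerical stability."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[target]

# A confident correct prediction gives a small loss ...
print(cross_entropy(np.array([5.0, 0.0, 0.0]), target=0))  # ~0.013
# ... a confident wrong one gives a large loss.
print(cross_entropy(np.array([5.0, 0.0, 0.0]), target=1))  # ~5.013
```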