Cath Wang

My Notes on AI Safety Papers

Folder: My-Notes-on-AI-Safety-Papers

6 items under this folder.

  • Sep 25, 2025

    Causal Scrubbing, 2022

    • ai_safety
    • mech_interp
  • Sep 25, 2025

    Induction Heads, 2022

    • ai_safety
    • mech_interp
  • Sep 25, 2025

    Mathematical Framework for Transformers, 2021

    • ai_safety
    • mech_interp
  • Sep 25, 2025

    The case for AI control and criticisms

    • control
    • ai_safety
  • Sep 25, 2025

    Towards Monosemanticity, 2023

    • ai_safety
    • mech_interp
  • Sep 25, 2025

    Toy Models of Superposition, 2022

    • ai_safety
    • mech_interp

Created with Quartz v4.5.1 © 2026
