Cath Wang

Tag: ai_safety

9 items with this tag.

  • Apr 18, 2026

    AI x Infohazards x Biosecurity @ Non-Trivial

    • ai_safety
    • research
  • Apr 18, 2026

    Invariance-aware diverse reward ensembles

    • research
    • ai_safety
  • Dec 22, 2025

    Attack Selection @ MARS

    • ai_safety
    • control
  • Sep 25, 2025

    Causal Scrubbing, 2022

    • ai_safety
    • mech_interp
  • Sep 25, 2025

    Induction Heads, 2022

    • ai_safety
    • mech_interp
  • Sep 25, 2025

    Mathematical Framework for Transformers, 2021

    • ai_safety
    • mech_interp
  • Sep 25, 2025

    The case for AI control and criticisms

    • control
    • ai_safety
  • Sep 25, 2025

    Towards Monosemanticity, 2023

    • ai_safety
    • mech_interp
  • Sep 25, 2025

    Toy Models of Superposition, 2022

    • ai_safety
    • mech_interp

Created with Quartz v4.5.1 © 2026
