Differentiable programming

Differentiable programming is a programming paradigm in which a numeric computer program can be differentiated throughout via automatic differentiation.^[1]^[2]^[3]^[4] This allows for gradient-based optimization of the program's parameters, often via gradient descent. Differentiable programming has found use in a wide variety of areas, particularly scientific computing and artificial intelligence.^[4]
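As an illustration of differentiating a program and then optimizing one of its parameters, the following is a minimal sketch in Python using forward-mode automatic differentiation with dual numbers. The names `Dual`, `loss`, `grad`, and `gradient_descent`, as well as the toy data, are hypothetical and chosen only for this example; production frameworks are considerably more elaborate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative with respect to one chosen input."""
    val: float
    dot: float = 0.0

    def _lift(self, other):
        # Plain numbers are treated as constants (derivative 0).
        return other if isinstance(other, Dual) else Dual(float(other))

    def __add__(self, other):
        other = self._lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __sub__(self, other):
        other = self._lift(other)
        return Dual(self.val - other.val, self.dot - other.dot)

    def __rsub__(self, other):
        return self._lift(other) - self

    def __mul__(self, other):
        other = self._lift(other)
        # Product rule.
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def loss(w):
    """Hypothetical program: squared error of the model y = w * x on toy data."""
    data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
    total = 0.0
    for x, y in data:  # an ordinary loop; derivatives are carried through it
        err = w * x - y
        total = total + err * err
    return total


def grad(f, w):
    """Derivative of f at w, obtained by seeding the input with dot = 1."""
    return f(Dual(w, 1.0)).dot


def gradient_descent(f, w, lr=0.01, steps=200):
    """Repeatedly step against the derivative to reduce f."""
    for _ in range(steps):
        w = w - lr * grad(f, w)
    return w


w_fit = gradient_descent(loss, 0.0)  # approaches 3.0, the slope of the toy data
```

Here `loss` is written as ordinary Python, and the derivative is obtained by running it on `Dual` inputs rather than by deriving a formula by hand.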

Approaches

Most differentiable programming frameworks work by constructing a graph containing the control flow and data structures in the program.^[5] Earlier attempts generally fall into two groups:

Static, compiled-graph-based approaches such as TensorFlow,^[note 1] Theano, and MXNet. They tend to allow for good compiler optimization and easier scaling to large systems, but their static nature limits interactivity and the types of programs that can be created easily (e.g. those involving loops or recursion), as well as making it harder for users to reason effectively about their programs.^[5]^[6]^[7]

Operator-overloading, dynamic-graph-based approaches such as PyTorch and AutoGrad. Their dynamic and interactive nature lets most programs be written and reasoned about more easily. However, they incur interpreter overhead (particularly when composing many small operations), scale less well, and struggle to benefit from compiler optimization.^[6]^[7]^[4]

Both of these early approaches are only able to differentiate code written in a suitable manner for the framework, limiting their interoperability with other programs.
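The operator-overloading, dynamic-graph style described above can be sketched in a few lines of Python. This is a toy reverse-mode implementation under simplifying assumptions (scalars only, addition and multiplication only); the names `Var`, `backward`, and `power` are hypothetical and do not correspond to the API of PyTorch, AutoGrad, or any other framework.

```python
class Var:
    """A scalar that records how it was computed, so gradients can flow back."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (input Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __radd__ = __add__
    __rmul__ = __mul__


def backward(output):
    """Accumulate d(output)/d(node) into node.grad for every recorded node."""
    order, seen = [], set()

    def visit(node):
        # Depth-first post-order: inputs are listed before the nodes using them.
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):  # each node is handled after all its consumers
        for parent, local in node.parents:
            parent.grad += local * node.grad


def power(x, n):
    """A plain Python loop; the graph is whatever operations actually ran."""
    result = Var(1.0)
    for _ in range(n):
        result = result * x
    return result


x = Var(2.0)
y = power(x, 3) + x  # y = x**3 + x
backward(y)          # x.grad now holds dy/dx = 3 * x**2 + 1
```

Because the graph is rebuilt on every run, loops and branches need no special treatment, but each small operation pays the cost of a Python-level call and allocation, which reflects the interpreter overhead noted above. The sketch also only differentiates code written against `Var`, mirroring the interoperability limit of such frameworks.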

A more recent package for the Julia programming language, Zygote, addresses the issues that earlier attempts faced by treating the language's syntax as the graph. The intermediate representation of arbitrary code can then be differentiated directly, optimized, and compiled.^[5]^[8]
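The general idea of differentiating a program's own representation, rather than a graph recorded by a framework, can be loosely illustrated in Python with the standard `ast` module. This sketch is not how Zygote works: Zygote operates on Julia's intermediate representation and handles arbitrary code, whereas the toy below only rewrites arithmetic expressions built from constants, names, `+`, `-`, and `*`. The function names `d` and `differentiate` are hypothetical.

```python
import ast


def d(node, wrt):
    """Return an AST for the derivative of expression `node` with respect to `wrt`."""
    if isinstance(node, ast.Constant):
        return ast.Constant(value=0.0)
    if isinstance(node, ast.Name):
        return ast.Constant(value=1.0 if node.id == wrt else 0.0)
    if isinstance(node, ast.BinOp):
        left, right = node.left, node.right
        dleft, dright = d(left, wrt), d(right, wrt)
        if isinstance(node.op, (ast.Add, ast.Sub)):
            # Sum rule: differentiate each side, keep the operator.
            return ast.BinOp(left=dleft, op=node.op, right=dright)
        if isinstance(node.op, ast.Mult):
            # Product rule: (l * r)' = l' * r + l * r'
            return ast.BinOp(
                left=ast.BinOp(left=dleft, op=ast.Mult(), right=right),
                op=ast.Add(),
                right=ast.BinOp(left=left, op=ast.Mult(), right=dright),
            )
    raise NotImplementedError(ast.dump(node))


def differentiate(source, wrt):
    """Transform a Python expression into a compiled function for its derivative."""
    tree = ast.parse(source, mode="eval")
    dtree = ast.Expression(body=d(tree.body, wrt))
    ast.fix_missing_locations(dtree)
    code = compile(dtree, "<derivative>", "eval")
    # Variables are supplied as keyword arguments when the derivative is called.
    return lambda **env: eval(code, {"__builtins__": {}}, env)


df = differentiate("x * x * x + 2.0 * x", "x")
slope = df(x=3.0)  # derivative 3 * x**2 + 2 evaluated at x = 3
```

The derivative here is ordinary compiled code, so in principle it can be optimized like any other code, which is the property the source-transformation approach aims for.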

Myia,^[6] a programming language that is "currently under development and is not yet ready for use", allows a model to be defined using a subset of Python, which is then compiled to Myia.

Applications

Differentiable programming has been applied in areas such as combining deep learning with physics engines in robotics, solving electronic structure problems with differentiable density functional theory, differentiable ray tracing, image processing, and probabilistic programming.^[9]^[10]^[11]^[12]^[13]^[4]

Notes

[note 1] TensorFlow 1 uses the static graph approach, whereas TensorFlow 2 uses the dynamic graph approach by default.

References

[1] Baydin, Atilim Gunes; Pearlmutter, Barak; Radul, Alexey Andreyevich; Siskind, Jeffrey (2018). "Automatic differentiation in machine learning: a survey". Journal of Machine Learning Research. 18: 1–43.
[2] Wang, Fei; Decker, James; Wu, Xilun; Essertel, Gregory; Rompf, Tiark (2018), Bengio, S.; Wallach, H.; Larochelle, H.; Grauman, K. (eds.), "Backpropagation with Callbacks: Foundations for Efficient and Expressive Differentiable Programming" (PDF), Advances in Neural Information Processing Systems 31, Curran Associates, Inc., pp. 10201–10212, retrieved 2019-02-13
[3] Innes, Mike (2018). "On Machine Learning and Programming Languages" (PDF). SysML Conference 2018.
[4] Innes, Mike; Edelman, Alan; Fischer, Keno; Rackauckas, Chris; Saba, Elliot; Viral B Shah; Tebbutt, Will (2019), ∂P: A Differentiable Programming System to Bridge Machine Learning and Scientific Computing, arXiv:
[5] Innes, Michael; Saba, Elliot; Fischer, Keno; Gandhi, Dhairya; Rudilosso, Marco Concetto; Joy, Neethu Mariya; Karmali, Tejan; Pal, Avik; Shah, Viral (2018-10-31). "Fashionable Modelling with Flux". arXiv: [cs.PL].
[6] "Automatic Differentiation in Myia" (PDF). Retrieved 2019-06-24.
[7] "TensorFlow: Static Graphs". Retrieved 2019-03-04.
[8] Innes, Michael (2018-10-18). "Don't Unroll Adjoint: Differentiating SSA-Form Programs". arXiv: [cs.PL].
[9] Degrave, Jonas; Hermans, Michiel; Dambre, Joni; wyffels, Francis (2016-11-05). "A Differentiable Physics Engine for Deep Learning in Robotics". arXiv: [cs.NE].
[10] Li, Li; Hoyer, Stephan; Pederson, Ryan; Sun, Ruoxi; Cubuk, Ekin D.; Riley, Patrick; Burke, Kieron (2021). "Kohn-Sham Equations as Regularizer: Building Prior Knowledge into Machine-Learned Physics". Physical Review Letters. 126 (3): 036401. doi:.
[11] "Differentiable Monte Carlo Ray Tracing through Edge Sampling". people.csail.mit.edu. Retrieved 2019-02-13.
[12] "SciML Scientific Machine Learning Open Source Software Organization Roadmap". sciml.ai. Retrieved 2020-07-19.
[13] "Differentiable Programming for Image Processing and Deep Learning in Halide". people.csail.mit.edu. Retrieved 2019-02-13.


