Teaching

Stochastic Gradient Descent Methods (CS 331)

Graduate course, Teaching Assistant, KAUST, Spring 2024

Stochastic gradient descent (SGD), in one or another of its many variants, is the workhorse method for training modern supervised machine learning models. However, the world of SGD methods is vast and expanding, which makes it hard for practitioners and even experts to understand its landscape and inhabitants. This course is a mathematically rigorous and comprehensive introduction to the field, based on the latest results and insights. It develops a convergence and complexity theory for serial, parallel, and distributed variants of SGD in the strongly convex, convex, and nonconvex settings, with randomness coming from sources such as subsampling and compression. Additional topics, such as acceleration via Nesterov momentum or curvature information, are covered as well. A substantial part of the course offers a unified analysis of a large family of SGD variants which have so far required different intuitions and convergence analyses, have different applications, and have been developed separately in various communities. This framework covers methods with and without the following tricks, and their combinations: variance reduction, data sampling, coordinate sampling, arbitrary sampling, importance sampling, mini-batching, quantization, sketching, dithering, and sparsification.
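For a concrete picture of the object studied in the course, here is a minimal, illustrative sketch of mini-batch SGD on a least-squares problem; all names, sizes, and parameter values below are assumptions, not course code.

```python
# Illustrative sketch only (not course material): mini-batch SGD on a
# least-squares problem, the kind of baseline the course's theory analyzes.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 20
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.01 * rng.normal(size=n)            # noisy linear measurements

x = np.zeros(d)                                       # iterate
step, batch = 0.01, 32                                # fixed stepsize and mini-batch size
for epoch in range(50):
    for _ in range(n // batch):
        idx = rng.integers(0, n, size=batch)          # subsampling: pick a random mini-batch
        g = A[idx].T @ (A[idx] @ x - b[idx]) / batch  # unbiased stochastic gradient estimate
        x -= step * g                                 # SGD step

print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```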

Graduate Seminar (CS 398)

Graduate course, Teaching Assistant, KAUST, Spring 2023

Graduate seminar focusing on special topics within the field.

Stochastic Gradient Descent Methods (CS 331)

Graduate course, Teaching Assistant, KAUST, Fall 2022

Same course description as the Spring 2024 offering of CS 331 above.

Graduate Seminar (CS 398)

Graduate course, Teaching Assistant, KAUST, Fall 2022

Graduate seminar focusing on special topics within the field.

Stochastic Gradient Descent Methods (CS 331)

Graduate course, Teaching Assistant, KAUST, Fall 2021

Same course description as the Spring 2024 offering of CS 331 above.

Optimization and Applications 1

Graduate course, Teaching Assistant, Ozon Masters, Spring 2020

An introductory course on convex optimization and modern optimization methods.

Optimization Methods

Undergraduate course, Teaching Assistant, MIPT, Fall 2019

Theory: convex sets and functions, optimality conditions, foundations of duality theory.
Practice: formulating optimization problems, methods for unconstrained problems, methods for problems with simple constraints (see the sketch below), linear programming, conic optimization problems and SDP.
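As an illustration of the methods for problems with simple constraints mentioned above, here is a minimal, hedged sketch of projected gradient descent on a box-constrained least-squares problem; all names, sizes, and values are assumptions.

```python
# Illustrative sketch only: projected gradient descent for the box-constrained
# least-squares problem  min_{0 <= x <= 1}  0.5 * ||A x - b||^2.
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(50, 10))
b = rng.normal(size=50)

L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient (spectral norm squared)
x = np.full(10, 0.5)                     # start in the middle of the box
for _ in range(200):
    grad = A.T @ (A @ x - b)             # gradient of the smooth objective
    x = np.clip(x - grad / L, 0.0, 1.0)  # gradient step followed by projection onto [0, 1]^10

print("objective:", 0.5 * np.linalg.norm(A @ x - b) ** 2)
```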

Machine Learning

Undergraduate course, Teaching Assistant, MIPT, Spring 2019

This course introduces students to the modern state of Machine Learning and Artificial Intelligence. It spans one year (two terms at MIPT), with approximately 15 lectures and seminars per term.

Physics Tutor

Tutoring, Physics Olympiads and Tests, 2017-2019

Prepared 7th-11th grade students for physics olympiads and tests.