KTH Royal Institute of Technology, Stockholm

**Contact:** Kathlén Kohn, kathlen@kth.se

The research group on Applied Algebraic Geometry in Data Science and AI, led by Kathlén Kohn and part of the division Mathematics of Data and AI at the Department of Mathematics at KTH Stockholm, has two open postdoctoral positions with an agreed-upon starting date between April and September 2022:

**Postdoc Fellow:** employment for 2 years

**Requirement:** a doctoral degree or an equivalent foreign degree, obtained within the last three years prior to the application deadline (with exceptions for special reasons, e.g., sick or parental leave)

**How to apply:** read the instructions here and click on Login and apply at the very bottom of the page

**Application deadline:** February 11, 2022

**Researcher:** employment for 1 year

**Requirement:** a doctoral degree or an equivalent foreign degree

**How to apply:** read the instructions here and click on Apply for position at the very bottom of the page

**Application deadline:** February 15, 2022

The positions are funded by WASP (Wallenberg AI, Autonomous Systems and Software Program) and The Göran Gustafsson Foundation for Research at Uppsala University and KTH.

The successful candidates will work on the multidisciplinary project described below.

Besides a doctoral degree as specified above, the successful candidates for both positions are expected to have:

- a strong background in algebraic, complex, differential, or discrete geometry, or related areas of mathematics,
- a strong interest in applications, particularly machine learning and computer vision,
- written and spoken English proficiency, very good communication and teamwork skills, in particular a willingness to collaborate with engineers and industry, and
- strong motivation and ability to work independently.

This multidisciplinary project is a WASP NEST (Novelty, Excellence, Synergy, Teams)
with a total duration of 5 years, starting in April 2022.

**The Team:** The project is led by the four principal investigators listed above.
In addition to the postdoc and researcher from this announcement, three PhD students will be hired for the project (one at each of the three involved universities).
To facilitate transfer of results and experience from the project for further exploitation in industry, the project will collaborate with a group of industrial partners: H&M, Volvo Cars, Zenseact, and Embellence Group.

The NEST will organize regular joint seminars and workshops. Moreover, regular visits among the involved sites are possible and encouraged.

**The Goal:**
In recent years, generative neural network models for the creation of photo-realistic images have become increasingly popular.
Their training results in a low-dimensional latent-space representation of a distribution from which new examples can be drawn and new images generated.
To be able to control and manipulate the images, one aims for a disentangled latent representation, so that different qualities of the resulting images are kept separate.
For example, the shape of an object can be separated from its material properties, the viewing direction, and the overall illumination.

In this project, we will go one step further and develop disentangled latent representations, not just for individual objects, but for full three-dimensional scenes with multiple objects that might change over time.
The goal is to support operations like adding, modifying, and removing objects, changing scene conditions, and modelling scene dynamics, as well as automatically completing missing parts of the 3D scene.

**Possible Projects:**
To successfully complete the project, it is necessary to develop both new theory and algorithms.
On the theoretical side, algebra and geometry can be used to address the following problems:

**Disentanglement** is a widely used concept and one of the most ambitious challenges in learning. Yet, it still lacks a widely accepted formal definition. The challenge is to develop learning algorithms that disentangle the different factors of variation in the data, but what exactly that means depends highly on the application at hand. For instance, in order to be able to modify the content of an indoor scene, we seek algorithms that learn situated and semantic encodings (e.g., positions and colors of objects, or lighting). Several mathematical ideas have been proposed to provide a formal definition of disentanglement (e.g., statistical independence, flattening manifolds, irreducible representations of groups, direct products of group actions), but none of them applies to all practical settings where disentanglement is used. We aim to develop a more generally applicable formal definition and to study theoretical guarantees of what can be achieved with disentanglement.

**Autoencoders** are at the core of our project. An important aspect is their ability to memorize the training data, which has recently been explored from a dynamical-systems perspective. Empirical results suggest that training examples form attractors of these autoencoders, but the theoretical reasons behind that mechanism are still not clear. Algebraic techniques can be applied in the setting of ReLU autoencoders with Euclidean loss, as the underlying geometric problem is to find a closest point on a semi-algebraic set. An open conjecture is that all training examples are attractors in a (global) minimum of the Euclidean loss of a sufficiently deep ReLU autoencoder. We aim to investigate that conjecture as well as further conditions under which attractors are formed.

**Equivariances** under group actions are a key factor in our project. A large part of the success of deep learning for 2D image interpretation is the convolutional layer, which induces translation equivariance.
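To make this translation equivariance concrete, here is a minimal NumPy sketch (the `conv1d` helper is a toy stand-in for a convolutional layer, written purely for illustration): a circular 1D convolution commutes with cyclic shifts of its input.

```python
import numpy as np

def conv1d(x, k):
    """Circular 1D cross-correlation: a minimal stand-in for a
    convolutional layer (no bias, no nonlinearity)."""
    n = len(x)
    return np.array([sum(x[(i + j) % n] * k[j] for j in range(len(k)))
                     for i in range(n)])

rng = np.random.default_rng(0)
x = rng.standard_normal(8)   # a toy 1D "image"
k = rng.standard_normal(3)   # a toy filter
s = 2                        # translation amount

# Equivariance: convolving a translated input yields the translated output.
lhs = conv1d(np.roll(x, s), k)   # translate first, then convolve
rhs = np.roll(conv1d(x, k), s)   # convolve first, then translate
assert np.allclose(lhs, rhs)
```

The same commutation property is what the project seeks for richer groups acting on 3D scene representations, where such identities no longer come for free from the layer structure.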
We want to analyze, construct, and develop efficient implementations of 3D equivariances, e.g., under SO(3), SE(3), and their subgroups, for instance, azimuthal rotations. Moreover, the 3D representation should be equivariant under permutation of the objects in the scene. We also plan to investigate the benefits of equivariant representations for disentanglement.

**Scene completion** will be a challenge in our project. Given a trained encoder-decoder model, we need to be able to perform inference on incomplete data. Typically, objects are visible only from a few viewpoints, or even a single viewpoint, whereas training of the encoder-decoder model usually relies on complete data. One possible approach to cope with missing data is to optimize the latent 3D representation so that it matches the given observations. More specifically, for a given 2D observation, we find the best 3D representation by minimizing the Euclidean distance between the given 2D data and the projected 3D representation.
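A minimal sketch of this latent-code fitting, with random linear maps standing in for the decoder and the camera projection (both hypothetical placeholders chosen for this illustration; the project's actual models would be nonlinear networks):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-ins: a linear "decoder" D (latent code -> 3D
# representation) and a linear camera "projection" P (3D representation
# -> 2D observation), composed into one map A.
D = rng.standard_normal((12, 4))   # latent dim 4 -> 3D-representation dim 12
P = rng.standard_normal((6, 12))   # 3D representation -> 2D-observation dim 6
A = P @ D                          # latent code -> projected 2D data

z_true = rng.standard_normal(4)
y = A @ z_true                     # the given 2D observation

# Fit the latent code by gradient descent on f(z) = 0.5 * ||A z - y||^2,
# i.e., the squared Euclidean distance between the projected 3D
# representation and the given 2D data.
z = np.zeros(4)
lr = 1.0 / np.linalg.norm(A, 2) ** 2   # safe step size: 1 / sigma_max(A)^2
for _ in range(50_000):
    z -= lr * A.T @ (A @ z - y)        # gradient of f at z

residual = np.linalg.norm(A @ z - y)   # should be close to zero
```

In the linear case the minimizer could of course be found in closed form; the iterative formulation is shown because it is what carries over when decoder and projection are replaced by differentiable networks.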