Institute Talks

  • Simon Donne
  • Virtual, Live stream at Max-Planck-Ring 4, N3, Aquarium

Current diffusion models generate only RGB images. To make progress towards graphics-ready 3D content generation, we need a PBR foundation model, but there is not enough PBR data available to train one from scratch. We introduce Collaborative Control, which tightly links a new PBR diffusion model to a frozen, pre-trained RGB model. We show that this dual architecture avoids catastrophic forgetting, outputs high-quality PBR images, and generalizes well beyond the PBR training dataset. Furthermore, the frozen base model remains compatible with techniques such as IP-Adapter.
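
The abstract leaves the linking mechanism abstract; below is a minimal PyTorch sketch of one plausible reading of the dual architecture, in which a frozen RGB branch feeds its intermediate features to a trainable PBR branch through learned 1x1 linking convolutions. All module names, shapes, and the linking scheme are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ToyBlock(nn.Module):
    """Stand-in for one denoising block of a diffusion UNet."""
    def __init__(self, ch):
        super().__init__()
        self.conv = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x):
        return torch.relu(self.conv(x))

class CollaborativeDenoiser(nn.Module):
    """Frozen RGB branch tightly linked to a trainable PBR branch (sketch)."""
    def __init__(self, ch=64, depth=4):
        super().__init__()
        self.rgb_blocks = nn.ModuleList([ToyBlock(ch) for _ in range(depth)])
        self.pbr_blocks = nn.ModuleList([ToyBlock(ch) for _ in range(depth)])
        # hypothetical cross-branch links; only these and the PBR blocks train
        self.links = nn.ModuleList([nn.Conv2d(2 * ch, ch, 1) for _ in range(depth)])
        for p in self.rgb_blocks.parameters():  # keep the pre-trained base frozen
            p.requires_grad_(False)

    def forward(self, rgb_latent, pbr_latent):
        for rgb_blk, pbr_blk, link in zip(self.rgb_blocks, self.pbr_blocks, self.links):
            rgb_latent = rgb_blk(rgb_latent)
            # the PBR branch sees its own features plus the frozen base's
            pbr_latent = pbr_blk(link(torch.cat([pbr_latent, rgb_latent], dim=1)))
        return rgb_latent, pbr_latent

model = CollaborativeDenoiser()
rgb, pbr = model(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
print(rgb.shape, pbr.shape)
```

Because the base model's weights never change, it stays usable with add-ons built for it, which is consistent with the IP-Adapter compatibility claimed above.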

Organizers: Soubhik Sanyal


  • Slava Elizarov
  • Virtual, Live stream at Max-Planck-Ring 4, N3, Aquarium

In this talk, I will present Geometry Image Diffusion (GIMDiffusion), a novel method designed to generate 3D objects from text prompts efficiently. GIMDiffusion uses geometry images, a 2D representation of 3D shapes, which allows the use of existing image-based architectures instead of complex 3D-aware models. This approach reduces computational costs and simplifies the model design. By incorporating Collaborative Control, the method exploits the rich priors of pre-trained text-to-image models such as Stable Diffusion, enabling strong generalization even with limited 3D training data. GIMDiffusion produces 3D objects with semantically meaningful, separable parts and internal structures, making them easy to manipulate and edit.
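
To make the geometry-image idea concrete: each pixel of a geometry image stores an XYZ surface position, so a mesh can be recovered simply by connecting neighbouring pixels into triangles. Below is a minimal NumPy sketch under that assumption (it ignores UV charts, seams, and the per-part separation mentioned above; the function is ours, not part of GIMDiffusion):

```python
import numpy as np

def geometry_image_to_mesh(gim):
    """Turn a geometry image (H, W, 3 array of XYZ positions) into a
    triangle mesh by connecting each pixel to its grid neighbours."""
    h, w, _ = gim.shape
    vertices = gim.reshape(-1, 3)
    idx = np.arange(h * w).reshape(h, w)
    # two triangles per grid cell
    a, b = idx[:-1, :-1].ravel(), idx[:-1, 1:].ravel()
    c, d = idx[1:, :-1].ravel(), idx[1:, 1:].ravel()
    faces = np.concatenate([np.stack([a, b, c], 1), np.stack([b, d, c], 1)])
    return vertices, faces

# toy example: a flat 4x4 patch in the z = 0 plane
ys, xs = np.mgrid[0:4, 0:4].astype(float)
v, f = geometry_image_to_mesh(np.stack([xs, ys, np.zeros_like(xs)], axis=-1))
print(v.shape, f.shape)  # (16, 3) and (18, 3)
```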

Organizers: Soubhik Sanyal


  • Adriana Cabrera
  • Copper (2R04)

This talk explores the prototyping of e-textiles and the integration of Soft Robotics systems, grounded in experimentation within digital fabrication spaces and Open Innovation environments like Fab Labs. By leveraging CNC fabrication methods and soft material manipulation, this approach reduces barriers between high and low tech, making experimentation more accessible. It also enables the integration of pneumatic actuators, sensors, and data collection systems into e-textiles and wearable technologies. The presentation will highlight how these developments open up new possibilities for creating smart textiles with soft robotic capabilities. Finally, it aims to inspire discussions on the application of haptics and actuators, such as HASEL, in wearables and e-textiles, fostering co-creation of future solutions that blend these innovative technologies with design.

Organizers: Paul Abel, Christoph Keplinger


The Atomic Human: Understanding ourselves in the age of AI

Max Planck Lecture
  • 17 October 2024 • 16:00—18:00
  • Neil Lawrence
  • Lecture Hall 2D5, Heisenbergstraße 1, Stuttgart

The Max Planck Institute for Intelligent Systems is delighted to invite you to its 2024 Max Planck Lecture in Stuttgart.

Organizers: Michael Black, Barbara Kettemann, Valeria Rojas

Advancements in 3D Facial Expression Reconstruction

Talk
  • 23 September 2024 • 12:00—13:00
  • Panagiotis Filntisis and George Retsinas
  • Hybrid

Recent advances in 3D face reconstruction from in-the-wild images and videos have excelled at capturing the overall facial shape associated with a person's identity. However, they often struggle to accurately represent the perceptual realism of facial expressions, especially subtle, extreme, or rarely observed ones. In this talk, we will present two contributions focused on improving 3D facial expression reconstruction. The first part introduces SPECTRE—"Visual Speech-Aware Perceptual 3D Facial Expression Reconstruction from Videos"—which offers a method for precise 3D reconstruction of mouth movements linked to speech articulation. This is achieved using a novel "lipread" loss function that enhances perceptual realism. The second part covers SMIRK—"3D Facial Expressions through Analysis-by-Neural-Synthesis"—where we explore how neural rendering techniques can overcome the limitations of differentiable rendering. This approach provides better gradients for 3D reconstruction and allows us to augment training data with diverse expressions for improved generalization. Together, these methods set new standards in accurately reconstructing facial expressions.
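
As a rough illustration of the "lipread" loss: features of a frozen lip-reading network are compared between rendered and real mouth crops, so the gradient pushes the reconstructed 3D expression toward perceptually correct articulation rather than mere pixel agreement. The sketch below uses a hypothetical `LipReadingEncoder` stand-in with made-up shapes; it is not SPECTRE's actual network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LipReadingEncoder(nn.Module):
    """Hypothetical stand-in for a frozen, pre-trained lip-reading network
    mapping mouth-crop sequences (B, T, 3, 88, 88) to per-frame features."""
    def __init__(self, dim=256):
        super().__init__()
        self.proj = nn.Linear(3 * 88 * 88, dim)

    def forward(self, crops):
        b, t = crops.shape[:2]
        return self.proj(crops.reshape(b, t, -1))

def lipread_loss(rendered_crops, real_crops, encoder):
    """Match lip-reading features of rendered vs. real mouth regions;
    gradients reach the face model only through the rendered branch."""
    with torch.no_grad():
        target = encoder(real_crops)
    return F.mse_loss(encoder(rendered_crops), target)

encoder = LipReadingEncoder().eval().requires_grad_(False)
rendered = torch.rand(1, 8, 3, 88, 88, requires_grad=True)  # from the renderer
real = torch.rand(1, 8, 3, 88, 88)                          # from the video
lipread_loss(rendered, real, encoder).backward()
```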

Organizers: Victoria Fernandez Abrevaya


Generalizable Object-aware Human Motion Synthesis

Talk
  • 12 September 2024 • 14:00—15:00
  • Wanyue Zhang
  • Max-Planck-Ring 4, N3, Aquarium

Data-driven virtual 3D character animation has recently witnessed remarkable progress. The realism of virtual characters is a core contributing factor to the quality of computer animations and user experience in immersive applications like games, movies, and VR/AR. However, existing automatic approaches for 3D virtual character motion synthesis supporting scene interactions do not generalize well to new objects outside training distributions, even when trained on extensive motion capture datasets with diverse objects and annotated interactions. In this talk, I will present ROAM, an alternative framework that generalizes to unseen objects of the same category without relying on a large dataset of human-object animations. In addition, I will share some preliminary findings from an ongoing project on hand motion interaction with articulated objects.

Organizers: Nikos Athanasiou


  • Lorena Velásquez
  • Hybrid - Webex plus in-person attendance in Oxygen (5N18)

Individuals with limb loss often choose prosthetic devices to complete activities of daily living (ADLs), as these devices can provide enhanced dexterity and customizable utility. Despite these benefits, high abandonment rates persist due to uncomfortable, cumbersome, and unreliable designs. Moreover, although prostheses restore motor function, dexterous sensorimotor control remains severely impaired in the absence of haptic feedback. This presentation details the design and evaluation of tendon-actuated mock prostheses with integrated state-based haptic feedback and their anthropomorphic tendon-actuated end effectors.
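
The abstract does not define "state-based" precisely; one plausible reading, sketched below, is a small finite-state machine over grasp states that fires a discrete haptic cue on each state transition rather than streaming continuous feedback. The states, tendon-tension thresholds, and cue mapping are all illustrative assumptions.

```python
from enum import Enum, auto

class GraspState(Enum):
    OPEN = auto()
    CONTACT = auto()
    GRIP = auto()

def classify(tension, contact_thresh=0.5, grip_thresh=2.0):
    """Map measured tendon tension (N; hypothetical thresholds) to a grasp state."""
    if tension < contact_thresh:
        return GraspState.OPEN
    return GraspState.CONTACT if tension < grip_thresh else GraspState.GRIP

def update(prev_state, tension, actuate):
    """Fire a haptic cue only when the grasp state changes."""
    state = classify(tension)
    if state is not prev_state:
        actuate(state)  # e.g., a short vibration burst distinct to each state
    return state

state = GraspState.OPEN
for tension in [0.1, 0.7, 1.5, 2.5, 0.2]:
    state = update(state, tension, lambda s: print("cue:", s.name))
```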

Organizers: Katherine Kuchenbecker, Uli Bartels


Modelling the Musculoskeletal System

Talk
  • 04 September 2024 • 10:30—11:30
  • Thor Besier
  • Max-Planck-Ring 4, N3

Thor Besier leads the musculoskeletal modelling group at the Auckland Bioengineering Institute and will provide an overview of the institute and some of the current research projects of his team, including the Musculoskeletal Atlas Project, Harmonising clinical gait analysis data, Digital Twins for shoulder arthroplasty, and Reproducibility of Knee Models (NIH funded KneeHUB project).

Organizers: Marilyn Keller


Technologies of Thin Films for High-power Laser Systems

Talk
  • 22 August 2024 • 10:00—11:30
  • Prof. Zhanshan Wang
  • Copper (2R04)

High-power laser systems play a decisive role in major scientific problems and high-tech industries, and thin films are among their core components. As output power grows and application scenarios diversify, the thin-film components of laser systems must satisfy increasingly stringent requirements on damage threshold, optical loss, and optical field control.

To improve the laser damage threshold, we revealed the physical mechanism of "localized strong points", which induce laser damage in thin films, and established a "field control" design method that manipulates the distribution of standing-wave fields by adjusting the film structure. We further proposed a new method for obtaining quantitative damage laws of localized strong points using artificial defects, which lays the foundation for a "full-process quantification" approach to controlling defect formation in thin films.

Regarding optical loss in thin films, we clarified the relationship between optical factors, interface correlation, and film interface scattering. We then proposed four engineering strategies: 1) multi-objective optimization techniques for synergistically controlling optical factors and spectral efficiency, 2) oblique growth for changing the interface PSD correlation, 3) ion-activated oxygen, nano-composite material, and high-temperature annealing technologies for reducing film absorption loss, and 4) defect-flattening technology for mitigating absorption and scattering losses.

Regarding optical field control, we discussed in detail the pros and cons of traditional optical thin films and metasurfaces for controlling the amplitude and phase of electromagnetic waves. A quasi-3D multilayer metasurface structure enhances non-local energy-flow control through efficient coupling of transmitted waves and Bloch waves, achieving, for the first time, an efficiency exceeding 99% in anomalous reflection at optical frequencies. We elucidated how the degrees of freedom of the metasurface structure control the phase difference and phase dispersion of the Bragg modes within it, and achieved an efficiency exceeding 99% in broadband, polarization-independent perfect Littrow diffraction through topological optimization of the metasurface shape. Additionally, we developed a new additive manufacturing method based on atomic layer deposition and etching that avoids the microstructure shape changes and localized hotspots caused by etching, effectively improving the efficiency and damage threshold of multilayer-film metasurface structures.
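
The "field control" design method above builds on the standard transfer-matrix description of multilayer coatings, in which the layer thicknesses determine where the standing-wave field peaks inside the stack. As a minimal, self-contained illustration of that formalism (not the group's designs; indices and the stack below are illustrative), this sketch computes the normal-incidence reflectance of a quarter-wave mirror:

```python
import numpy as np

def layer_matrix(n, d, lam):
    """Characteristic matrix of one homogeneous layer at normal incidence
    (standard transfer-matrix method; n = refractive index, d = thickness)."""
    delta = 2 * np.pi * n * d / lam
    return np.array([[np.cos(delta), 1j * np.sin(delta) / n],
                     [1j * n * np.sin(delta), np.cos(delta)]])

def reflectance(layers, lam, n_in=1.0, n_sub=1.52):
    """Reflectance of a stack of (n, d) layers between air and a substrate."""
    M = np.eye(2, dtype=complex)
    for n, d in layers:
        M = M @ layer_matrix(n, d, lam)
    B, C = M @ np.array([1.0, n_sub])
    r = (n_in * B - C) / (n_in * B + C)
    return abs(r) ** 2

# illustrative 8-pair quarter-wave high/low-index mirror at 1064 nm
lam = 1064e-9
stack = [(2.1, lam / (4 * 2.1)), (1.45, lam / (4 * 1.45))] * 8
print(f"R = {reflectance(stack, lam):.4f}")  # close to 1 for a good mirror
```

The same matrices give the internal standing-wave field distribution, which is the quantity such a "field control" design manipulates.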

Organizers: Christoph Keplinger


Real Virtual Humans

Talk
  • 22 August 2024 • 14:00—15:00
  • István Sárándi
  • Max-Planck-Ring 4, N3

With the explosive growth of available training data, 3D human pose and shape estimation is on the verge of a transition to a data-centric paradigm. To leverage data at scale, we need flexible models trainable from heterogeneous data sources. To this end, our latest work, Neural Localizer Fields, seamlessly unifies different human pose and shape-related tasks and datasets through the ability, both at training and test time, to query any point of the human volume and obtain its estimated 3D location from a single RGB image. We achieve this by learning a continuous neural field of body-point localizer functions, each of which is a differently parameterized 3D heatmap-based convolutional point localizer. This way, we can naturally exploit differently annotated data sources, including parametric meshes, 2D/3D skeletons, and dense pose, without having to explicitly convert between them, and thereby train large-scale 3D human mesh and skeleton estimation models that outperform the state of the art by a considerable margin on several public benchmarks, including 3DPW, EMDB, and SSP-3D.
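
A schematic of the core idea, with illustrative shapes and layer sizes (not the paper's architecture): a small hypernetwork, the "field", maps any queried canonical body point to the weights of a point localizer, which turns an image-derived feature volume into a 3D heatmap and finally a soft-argmax location.

```python
import torch
import torch.nn as nn

class NeuralLocalizerField(nn.Module):
    """Sketch: a field over canonical body points emits per-point localizer
    weights, applied convolutionally (here a 1x1x1 product) to features."""
    def __init__(self, feat_ch=64):
        super().__init__()
        self.field = nn.Sequential(nn.Linear(3, 128), nn.ReLU(),
                                   nn.Linear(128, feat_ch))

    def forward(self, feat_volume, query_points):
        # feat_volume: (B, C, D, H, W) lifted from a single RGB image
        # query_points: (P, 3) canonical coordinates (mesh vertex, joint, ...)
        w = self.field(query_points)                       # (P, C) weights
        heat = torch.einsum('bcdhw,pc->bpdhw', feat_volume, w)
        prob = heat.flatten(2).softmax(-1).view_as(heat)   # 3D heatmap
        d, h, wd = heat.shape[2:]
        grid = torch.stack(torch.meshgrid(
            torch.linspace(-1, 1, d), torch.linspace(-1, 1, h),
            torch.linspace(-1, 1, wd), indexing='ij'), -1)
        return torch.einsum('bpdhw,dhwk->bpk', prob, grid)  # soft-argmax

model = NeuralLocalizerField()
xyz = model(torch.randn(2, 64, 8, 8, 8), torch.rand(10, 3))
print(xyz.shape)  # torch.Size([2, 10, 3])
```

Because any point can be queried, a mesh-annotated sample can supervise vertex queries while a skeleton-annotated sample supervises joint queries, which is how heterogeneously labeled datasets train one model.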

Organizers: Marilyn Keller


  • Yijie Gong
  • Hybrid - Webex plus in-person attendance in 2P04 (MPI-IS Stuttgart)

Teleoperation allows workers on a construction site to assemble pre-fabricated building components by controlling powerful machines from a safe distance. However, teleoperation's primary reliance on visual feedback limits the operator's efficiency in situations with stiff contact or poor visibility, compromising their situational awareness and thus increasing the difficulty of the task. To bridge this gap, we created AiroTouch, a naturalistic vibrotactile feedback system tailored for use on construction sites but suitable for many other applications of telerobotics. We then evaluated AiroTouch, exploring the effects of the naturalistic vibrotactile feedback it delivers in three user studies conducted either in laboratory settings or on a construction site.
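
The abstract describes AiroTouch only at a high level. A common recipe for naturalistic vibrotactile feedback, sketched below as an assumption rather than the published pipeline, is to band-pass the acceleration measured at the machine's tool to the tactile frequency range and scale it to drive a voice-coil actuator held by the operator:

```python
import numpy as np
from scipy.signal import butter, sosfilt

def naturalistic_feedback(accel, fs=1000, band=(20, 400), gain=1.0):
    """Band-pass tool-tip acceleration to the tactile range and scale it
    for a voice-coil actuator. Cut-offs and gain are illustrative, not
    AiroTouch's published parameters."""
    sos = butter(4, band, btype='bandpass', fs=fs, output='sos')
    return gain * sosfilt(sos, accel)

fs = 1000
t = np.arange(0, 1.0, 1 / fs)
accel = np.sin(2 * np.pi * 150 * t) + 0.3 * np.random.randn(t.size)  # contact buzz + noise
drive = naturalistic_feedback(accel, fs=fs)
print(drive.shape)  # (1000,) actuator drive signal
```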

Organizers: Yijie Gong, Katherine Kuchenbecker


4D Dynamic Scene Reconstruction, Editing, and Generation

Talk
  • 25 July 2024 • 14:00—15:00
  • Jiawei Liu
  • Virtual (Zoom)

We live in a dynamic, moving 4D world. While videos are the most convenient medium for capturing this world, they cannot convey its 4D nature. 4D video reconstruction, free-viewpoint rendering, and high-quality editing and generation therefore offer innovative opportunities for content creation, virtual reality, telepresence, and robotics. Although promising, these capabilities pose significant challenges in efficiency, 4D motion and dynamics, temporal and subject consistency, and text-3D/video alignment. In light of these challenges, this talk will discuss our recent progress on representing and learning the 4D dynamic world, from its underlying dynamics to the reconstruction, editing, and generation of 4D dynamic scenes, and will motivate discussion of future directions in multi-modal 4D human-object-scene reconstruction, generation, and perception.

Organizers: Omid Taheri


  • Angelica Lim
  • Virtual (Zoom)

Science fiction has long promised us interfaces and robots that interact with us as smoothly as humans do - Rosie the Robot from The Jetsons, C-3PO from Star Wars, and Samantha from Her. Today, interactive robots and voice user interfaces are moving us closer to effortless, human-like interactions in the real world. In this talk, I will discuss the opportunities and challenges in finely analyzing, detecting and generating non-verbal communication in context, including gestures, gaze, auditory signals, and facial expressions. Specifically, I will discuss how we might allow robots and virtual agents to understand human social signals (including emotions, mental states, and attitudes) across cultures as well as recognize and generate expressions with controllability, transparency, and diversity in mind.

Organizers: Yao Feng, Michael Black


  • Siheng Chen
  • N3

With the rapid growth of AI techniques, we may witness AI agents entering our lives like a new species. Ensuring that these agents integrate well into human life is a profound challenge: they must be highly performant, safe, and well aligned with human values. However, directly training and testing AI agents in real-world environments to guarantee their performance and safety is costly and can disrupt everyday life. We are therefore exploring a simulation-based approach to incubating these agents. In this talk, we will highlight the role of simulation in two key scenarios: large language models (LLMs) and autonomous driving. Through these two studies, I will demonstrate how simulation can effectively facilitate the development of LLM agents and driving agents, ensuring they are both powerful and safe for human use.

Organizers: Yao Feng