Invited Speakers



Assoc. Prof. Muhammad Tariq Mahmood

Korea University of Technology and Education (KOREATECH), Korea

He received the MCS degree in Computer Science from AJK University, Muzaffarabad, Pakistan, in 2004, the MS degree in Intelligent Software Systems from Blekinge Institute of Technology, Sweden, in 2006, and the PhD degree in Information and Mechatronics from Gwangju Institute of Science and Technology (GIST), Korea, in 2011. Early in his career, he worked for over eight years as a Software Engineer at Khaksar and Co., Islamabad, Pakistan. He has been involved in several research projects funded by the National Research Foundation (NRF) of Korea, focusing on areas such as shape-from-focus/defocus, smart cities, and underwater imaging. He has authored more than 100 research articles published in reputable journals and international conferences. He is currently serving as an Associate Professor in the School of Computer Science and Engineering at Korea University of Technology and Education (KOREATECH), Cheonan, Korea. His research interests include image processing, 3D shape recovery from image focus, computer vision, pattern recognition, and machine/deep learning.

Speech Title: Focal Stack Size Agnostic Depth from Focus via Multi-Scale Recurrent Networks

Depth from Focus (DFF) estimates scene depth by identifying the focal slice where each pixel appears sharpest within a captured focal stack. Existing deep learning approaches typically collapse this 3D focus volume into a depth map in a single pass, which often leads to amplified noise, blurred object boundaries, and loss of contextual information. To overcome these limitations, we reformulate DFF as a learned iterative energy minimization process, inspired by classical optimization-based formulations. Our architecture consists of two parallel encoders: a multi-scale focal encoder that extracts spatial features from each slice of the stack, and a guidance encoder that processes the mean image of the stack to provide global priors. A focus mapping module integrates these features through 3D convolutions to produce per-slice focus likelihood volumes, which are then fused into a slice-invariant embedding, allowing support for stacks of arbitrary length. An initial coarse depth map is estimated from the highest-resolution focus volume and is progressively refined using a stack of ConvGRU layers. These recurrent units, guided by the global priors, iteratively correct residual focus errors, enabling sharper object boundaries and more spatially consistent depth estimates. A learned upsampling head restores high-frequency details lost during encoding, and the model is supervised at every iteration, enforcing depth consistency throughout the refinement process. Our method produces high-quality depth maps with crisp edges and coherent surfaces, and is capable of operating effectively with as few as two input images. Extensive experiments on synthetic and real-world datasets demonstrate superior accuracy and generalization compared to state-of-the-art DFF approaches.
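
To make the pipeline concrete, the following is a minimal PyTorch-style sketch (not the authors' implementation) of the two ideas that make the method stack-size agnostic: pooling per-slice features over the stack dimension into a slice-invariant embedding, and iteratively refining an initial depth map with a ConvGRU supervised at every iteration. All module names, channel sizes, and the single-scale simplification are illustrative assumptions; the actual model additionally uses multi-scale encoders, 3D-convolution focus mapping, and a learned upsampling head.

# Illustrative sketch only; hyperparameters and module names are assumptions.
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    # Convolutional GRU: the hidden state h is updated from input x each step.
    def __init__(self, hidden_ch, input_ch):
        super().__init__()
        self.convz = nn.Conv2d(hidden_ch + input_ch, hidden_ch, 3, padding=1)
        self.convr = nn.Conv2d(hidden_ch + input_ch, hidden_ch, 3, padding=1)
        self.convq = nn.Conv2d(hidden_ch + input_ch, hidden_ch, 3, padding=1)

    def forward(self, h, x):
        hx = torch.cat([h, x], dim=1)
        z = torch.sigmoid(self.convz(hx))   # update gate
        r = torch.sigmoid(self.convr(hx))   # reset gate
        q = torch.tanh(self.convq(torch.cat([r * h, x], dim=1)))
        return (1 - z) * h + z * q

class DFFRefiner(nn.Module):
    # Toy stack-size-agnostic DFF: per-slice features are pooled (mean) over
    # the stack dimension, so any number S of focal slices is accepted.
    def __init__(self, feat_ch=32, iters=4):
        super().__init__()
        self.iters = iters
        self.slice_enc = nn.Sequential(      # shared per-slice encoder
            nn.Conv2d(3, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU())
        self.guide_enc = nn.Sequential(      # global prior from the mean image
            nn.Conv2d(3, feat_ch, 3, padding=1), nn.ReLU())
        self.gru = ConvGRUCell(feat_ch, feat_ch + 1)
        self.head = nn.Conv2d(feat_ch, 1, 3, padding=1)  # depth / residual head

    def forward(self, stack):                # stack: (B, S, 3, H, W)
        B, S, C, H, W = stack.shape
        feats = self.slice_enc(stack.view(B * S, C, H, W)).view(B, S, -1, H, W)
        fused = feats.mean(dim=1)            # slice-invariant embedding
        guide = self.guide_enc(stack.mean(dim=1))  # priors from the mean image
        depth = self.head(fused)             # coarse initial depth estimate
        h = guide                            # hidden state seeded by the prior
        preds = []
        for _ in range(self.iters):          # iterative residual refinement
            h = self.gru(h, torch.cat([fused, depth], dim=1))
            depth = depth + self.head(h)     # correct residual focus errors
            preds.append(depth)              # one prediction per iteration
        return preds                         # supervise every element of preds

stack = torch.rand(1, 5, 3, 64, 64)          # any S works, down to S = 2
preds = DFFRefiner()(stack)
print(len(preds), preds[-1].shape)           # 4 torch.Size([1, 1, 64, 64])

Because the fusion is a permutation- and size-invariant pooling over the stack dimension, the same weights handle two slices or twenty; returning the full list of per-iteration predictions is what allows the loss to enforce depth consistency throughout the refinement, as described in the abstract.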