Ajad Chhatkuli

I am a postdoc at the Computer Vision Lab (CVL) at ETH Zurich, in the group of Prof. Luc Van Gool. Before that, I completed my PhD under the supervision of Prof. Adrien Bartoli and Prof. Daniel Pizarro in the field of non-rigid 3D reconstruction. My current research covers two areas: i) learning mid- to high-level understanding from images and ii) 3D shape analysis and 3D implicit functions.

Email / GitHub / Google Scholar

Publications

	TADA: Taxonomy Adaptive Domain Adaptation Rui Gong, Martin Danelljan, Dengxin Dai, Wenguan Wang, Danda Pani Paudel, Ajad Chhatkuli, Fisher Yu, Luc Van Gool To appear in ECCV, 2022 arxiv / We propose a few-shot domain adaptation method for combining source and target data that have inconsistent label spaces as well as different input domains. At the label level, we employ a bilateral mixed sampling strategy to augment the target domain, and a relabelling method to unify and align the label spaces. We address the image-level domain gap by proposing an uncertainty-rectified contrastive learning method, leading to more domain-invariant and class-discriminative features.
	Neural Vector Fields for Surface Representation and Inference Edoardo Mello Rella, Ajad Chhatkuli, Ender Konukoglu, Luc Van Gool CoRR, 2022 arxiv / We propose a novel representation for implicit surfaces. This is the first method that can represent open surfaces and also obtain a mesh representation with a feedforward network and a Marching Cubes-like algorithm.
	Zero pixel directional boundary by vector transform Edoardo Mello Rella, Ajad Chhatkuli, Yun Liu, Ender Konukoglu, Luc Van Gool ICLR, 2022 pdf / We revisit the boundary detection problem and introduce a principled approach based on a redefinition of boundary with dense labels without label imbalance. We specifically use the unit vector to the closest boundary, beating standard binary label based methods and a baseline method based on distance transform in thorough experiments.
	ZippyPoint: Fast Interest Point Detection, Description, and Matching through Mixed Precision Discretization Simon Mauer, Menelaos Kanakis, Matteo Spallanzani, Ajad Chhatkuli, Luc Van Gool CoRR, 2022 arxiv / We propose a method for training a fast binary descriptor for local image points by adapting network quantization techniques. The method achieves unprecedented speed among deep local image descriptors.
	Unsupervised Monocular Depth Reconstruction of Non-Rigid Scenes Ayça Takmaz, Danda Pani Paudel, Thomas Probst, Ajad Chhatkuli, Martin R Oswald, Luc Van Gool 3DV, 2021 arxiv / We present an unsupervised monocular framework for dense depth estimation of dynamic scenes, which jointly reconstructs rigid and non-rigid parts without explicitly modelling the camera motion. Our method uses the as-rigid-as-possible deformation prior.
	Cluster, split, fuse, and update: Meta-learning for open compound domain adaptive semantic segmentation Rui Gong, Yuhua Chen, Danda Pani Paudel, Yawei Li, Ajad Chhatkuli, Wen Li, Dengxin Dai, Luc Van Gool CVPR, 2021 pdf / The paper tackles the challenging problem of Open Compound Domain Adaptation (OCDA), where the target domain is modeled as a compound of multiple unknown homogeneous domains. We propose a principled meta-learning based approach to OCDA for semantic segmentation, MOCDA, by modeling the unlabeled target domain continuously.
	Efficient conditional GAN transfer with knowledge propagation across classes Mohamad Shahbazi, Zhiwu Huang, Danda Pani Paudel, Ajad Chhatkuli, Luc Van Gool CVPR, 2021 pdf / We introduce a new GAN transfer method to explicitly propagate the knowledge from the old classes to the new classes. The key idea is to enforce the popularly used conditional batch normalization (BN) to learn the class-specific information of the new classes from that of the old classes, with implicit knowledge sharing among the new ones. This allows for efficient knowledge propagation from the old classes to the new ones, with the BN parameters increasing linearly with the number of new classes.
	Transformer in convolutional neural networks Yun Liu, Guolei Sun, Yu Qiu, Le Zhang, Ajad Chhatkuli, Luc Van Gool CVPR, 2021 arxiv / This paper tackles the low-efficiency flaw of the vision transformer caused by the high computational/space complexity in Multi-Head Self-Attention (MHSA). To this end, we propose the Hierarchical MHSA (H-MHSA), whose representation is computed in a hierarchical manner. Specifically, we first divide the input image into patches as commonly done, and each patch is viewed as a token. Then, the proposed H-MHSA learns token relationships within local patches, serving as local relationship modeling. Next, the small patches are merged into larger ones, and H-MHSA models the global dependencies for the small number of merged tokens. Finally, the local and global attentive features are aggregated to obtain features with powerful representation capacity.
	Learning Condition Invariant Features for Retrieval-Based Localization from 1M Images Janine Thoma, Danda Pani Paudel, Ajad Chhatkuli, Luc Van Gool CoRR, 2020 arxiv / In this paper, we train and evaluate several localization methods on three different benchmark datasets, including Oxford RobotCar with over one million images. This large scale evaluation yields valuable insights into the generalizability and performance of retrieval-based localization. Based on our findings, we develop a novel method for learning more accurate and better generalizing localization features. It consists of two main contributions: (i) a feature volume-based loss function, and (ii) hard positive and pairwise negative mining.
	Unsupervised learning of category-specific symmetric 3D keypoints from point sets Clara Fernandez-Labrador, Ajad Chhatkuli, Danda Pani Paudel, Jose J Guerrero, Cédric Demonceaux, Luc Van Gool ECCV, 2020 pdf / code / This paper aims at learning semantic 3D keypoints across misaligned shapes in a category, in an unsupervised manner. In order to do so, we model shapes defined by the keypoints, within a category, using the symmetric linear basis shapes without assuming the plane of symmetry to be known. The plane of symmetry and the basis shapes are learned as weights of the network for a category, while the coefficients are predicted per shape instance.
	Self-Calibration Supported Robust Projective Structure-from-Motion Rui Gong, Danda Pani Paudel, Ajad Chhatkuli, Luc Van Gool CoRR, 2020 arxiv / In this paper, we propose a unified SfM method, in which the matching process is supported by self-calibration constraints. We use the idea that good matches should yield a valid calibration. In this process, we make use of the Dual Image of Absolute Quadric projection equations within a multiview correspondence framework, in order to obtain robust matching from a set of putative correspondences.
	Geometrically mappable image features Janine Thoma, Danda Pani Paudel, Ajad Chhatkuli, Luc Van Gool RAL, 2020 pdf / In this work, we propose a method that learns image features targeted for image-retrieval-based localization. Retrieval-based localization has several benefits, such as easy maintenance and quick computation. However, the state-of-the-art features only provide visual similarity scores which do not explicitly reveal the geometric distance between query and retrieved images. Knowing this distance is highly desirable for accurate localization, especially when the reference images are sparsely distributed in the scene. Therefore, we propose a novel loss function for learning image features which are both visually representative and geometrically relatable.
	Convex relaxations for consensus and non-minimal problems in 3D vision Thomas Probst, Danda Pani Paudel, Ajad Chhatkuli, Luc Van Gool ICCV, 2019 pdf / The proposed method exploits the well-known Shor’s or Lasserre’s relaxations, whose theoretical aspects are also discussed. Notably, we further exploit the Polynomial Optimization Problem (POP) formulation of the non-minimal solver for generic consensus maximization problems in 3D vision. We support the proposed framework with three diverse applications in 3D vision, namely rigid body transformation estimation, Non-Rigid Structure-from-Motion (NRSfM), and camera autocalibration.
	Unsupervised learning of consensus maximization for 3D vision problems Thomas Probst, Danda Pani Paudel, Ajad Chhatkuli, Luc Van Gool CVPR, 2019 pdf / In this paper, we propose for the first time an unsupervised learning framework for consensus maximization, in the context of solving 3D vision problems. For that purpose, we establish a relationship between inlier measurements, represented by an ideal of inlier set, and the subspace of polynomials representing the space of target transformations. Using this relationship, we derive a constraint that must be satisfied by the sought inlier set.
	What Correspondences Reveal About Unknown Camera and Motion Models? Thomas Probst, Ajad Chhatkuli, Danda Pani Paudel, Luc Van Gool CVPR, 2019 pdf / The work describes finding a particular camera geometry from two-view image correspondences. We first describe a framework that can be used to compute the ‘simplest’ camera model for a given set of correspondences. We further provide a theoretical analysis of what types of motions and camera models are discoverable from two-view correspondences.
	Mapping, localization and path planning for image-based navigation using visual features and map Janine Thoma, Danda Pani Paudel, Ajad Chhatkuli, Thomas Probst, Luc Van Gool CVPR, 2019 pdf / The problem of localization often arises as part of a navigation process. In this paper we summarize the reference images as a set of landmarks, which meet the requirements for image-based navigation. A contribution of this paper is to formulate such a set of requirements for the two sub-tasks involved: compact map construction and accurate self localization. These requirements are then exploited for compact map representation and accurate self-localization, using the framework of a network flow problem. During this process, we formulate the map construction and self-localization problems as convex quadratic and second-order cone programs, respectively.
	Model-free consensus maximization for non-rigid shapes Thomas Probst, Ajad Chhatkuli, Danda Pani Paudel, Luc Van Gool ECCV, 2018 pdf / We formulate the model-free consensus maximization as an Integer Program in a graph using ‘rules’ on measurements. We then provide a method to solve it optimally using the Branch and Bound (BnB) paradigm. We focus its application on non-rigid shapes, where we apply the method to remove outlier 3D correspondences and achieve performance superior to the state of the art.
	Incremental non-rigid structure-from-motion with unknown focal length Thomas Probst, Danda Pani Paudel, Ajad Chhatkuli, Luc Van Gool ECCV, 2018 pdf / In this paper we present a method for incremental Non-Rigid Structure-from-Motion (NRSfM) with the perspective camera model and the isometric surface prior with unknown focal length. In the template-based case, we provide a method to estimate four parameters of the camera intrinsics. For the template-less scenario of NRSfM, we propose a method to upgrade reconstructions obtained for one focal length to another based on local rigidity and the so-called Maximum-Depth Heuristic (MDH). On this basis we propose a method to simultaneously recover the focal length and the non-rigid shapes. We further address the problems of incorporating a large number of points and adding more views in MDH-based NRSfM, and solve them efficiently with Second-Order Cone Programming (SOCP).
	Automatic tool landmark detection for stereo vision in robot-assisted retinal surgery Thomas Probst, Danda Pani Paudel, Ajad Chhatkuli, Luc Van Gool ICRA/RAL, 2018 arxiv / In this paper, we solve the problem of the calibration of stereo-microscope and consequently that of the 3D reconstruction of an unknown scene under the microscope. For the first time using a single pipeline, starting from uncalibrated cameras we achieve the metric 3D reconstruction and registration, for retinal microsurgery. The key ingredients of our method are: (a) surgical tool landmark detection, and (b) 3D reconstruction with the stereo microscope, using the detected landmarks. To address the former, we propose a novel deep learning method that detects and recognizes keypoints in high definition images at higher than real-time speed. We use the detected 2D keypoints along with their corresponding 3D coordinates obtained from the robot sensors to calibrate the stereo microscope using an affine projection model. We design an online 3D reconstruction pipeline that makes use of smoothness constraints and performs robot-to-camera registration.
	Inextensible non-rigid structure-from-motion by second-order cone programming Ajad Chhatkuli, Daniel Pizarro, Toby Collins, Adrien Bartoli T-PAMI, 2017 pdf / We present a global and convex formulation for the template-less 3D reconstruction of a deforming object with the perspective camera. We show for the first time how to construct a Second-Order Cone Programming (SOCP) problem for Non-Rigid Structure-from-Motion (NRSfM) using the Maximum-Depth Heuristic (MDH). In this regard, we deviate strongly from the general trend of using affine cameras and factorization-based methods to solve NRSfM, which do not perform well with complex nonlinear deformations. In MDH, the points’ depths are maximized so that the distance between neighbouring points in camera space is upper bounded by the geodesic distance. In NRSfM both geodesic and camera space distances are unknown. We show that, nonetheless, given point correspondences and the camera’s intrinsics the whole problem can be solved with SOCP. This is the first convex formulation for NRSfM with physical constraints. We further present how robustness and temporal continuity can be included in the formulation to handle outliers and decrease the problem size, respectively.
	Inextensible non-rigid shape-from-motion by second-order cone programming Ajad Chhatkuli, Daniel Pizarro, Toby Collins, Adrien Bartoli CVPR, 2016 pdf / We present a global and convex formulation for template-less 3D reconstruction of a deforming object with the perspective camera. We show for the first time how to construct a Second-Order Cone Programming (SOCP) problem for Non-Rigid Shape-from-Motion (NRSfM) using the Maximum-Depth Heuristic (MDH). In this regard, we deviate strongly from the general trend of using affine cameras and factorization-based methods to solve NRSfM. In MDH, the points’ depths are maximized so that the distance between neighbouring points in camera space is upper bounded by the geodesic distance. In NRSfM both geodesic and camera space distances are unknown. We show that, nonetheless, given point correspondences and the camera’s intrinsics the whole problem is convex and solvable with SOCP.
	A stable analytical framework for isometric shape-from-template by surface integration Ajad Chhatkuli, Daniel Pizarro, Adrien Bartoli, Toby Collins T-PAMI, 2016 pdf / Shape-from-Template (SfT) reconstructs the shape of a deforming surface from a single image, a 3D template and a deformation prior. For isometric deformations, this is a well-posed problem. However, previous methods which require no initialization break down when the perspective effects are small, which happens when the object is small or viewed from larger distances. That is, they do not handle all projection geometries. We propose stable SfT methods that accurately reconstruct the 3D shape for all projection geometries. We follow the existing approach of using first-order differential constraints and obtain local analytical solutions for depth and the first-order quantities: the depth-gradient or the surface normal. Previous methods use the depth solution directly to obtain the 3D shape. We prove that the depth solution is unstable when the projection geometry tends to affine, while the solution for the first-order quantities remains stable for all projection geometries. We therefore propose to solve SfT by first estimating the first-order quantities (either depth-gradient or surface normal) and integrating them to obtain shape.
	Non-Rigid Shape-from-Motion for Isometric Surfaces using Infinitesimal Planarity Ajad Chhatkuli, Daniel Pizarro, Adrien Bartoli BMVC, 2014 pdf / This paper proposes a general framework to solve Non-Rigid Shape-from-Motion (NRSfM) with the perspective camera under isometric deformations. Contrary to the usual low-rank linear shape basis, isometry allows us to recover complex shape deformations from a sparse set of images. Existing methods suffer from ambiguities and may be very expensive to solve. We bring four main contributions. First, we formulate isometric NRSfM as a system of first-order Partial Differential Equations (PDE) involving the shape’s depth and normal field and an unknown template. Second, we show this system cannot be locally resolved. Third, we introduce the concept of infinitesimal planarity and show that it makes the system locally solvable for at least three views. Fourth, we derive an analytic solution which involves convex, linear least-squares optimization only, and outperforms existing works.
	Live image parsing in uterine laparoscopy Ajad Chhatkuli, Adrien Bartoli, Abed Malti, Toby Collins ISBI, 2014 pdf / Augmented Reality (AR) can improve the information delivery to surgeons. In laparosurgery, the primary goal of AR is to provide multimodal information overlaid in live laparoscopic videos. For gynecologic laparoscopy, the 3D reconstruction of the uterus and its deformable registration to preoperative data form the major problems in AR. Shape-from-Shading (SfS) and inter-frame registration require an accurate identification of the uterus region, the occlusions due to surgical tools, specularities, and other tissues. We propose a cascaded patient-specific real-time segmentation method to identify these four important regions. We use a color-based Gaussian Mixture Model (GMM) to segment the tools and a more elaborate color and texture model to segment the uterus. The specularities are obtained by a saturation test. We show that our segmentation improves SfS and inter-frame registration of the uterus.
	Stable template-based isometric 3D reconstruction in all imaging conditions by linear least-squares Ajad Chhatkuli, Daniel Pizarro, Adrien Bartoli CVPR, 2014 pdf / Reconstructing an isometric surface from a single 2D input image matched to a 3D template has been shown to be a well-posed problem. This however does not tell us how reconstruction algorithms will behave in practical conditions, where the amount of perspective is generally small and the projection thus behaves like weak-perspective or orthography. We here bring answers to what is theoretically recoverable in such imaging conditions, and explain why existing convex numerical solutions and analytical solutions to 3D reconstruction may be unstable. We then propose a new algorithm which works under all imaging conditions, from strong to loose perspective by using the algebraic solution of the depth’s Jacobian.
	Separating compound figures in journal articles to allow for subfigure classification Ajad Chhatkuli, Antonio Foncubierta-Rodríguez, Dimitrios Markonis, Fabrice Meriaudeau, Henning Müller SPIE, 2013 pdf / Journal images represent an important part of the knowledge stored in the medical literature. Figure classification has received much attention as the information on image types can be used in a variety of contexts to focus image search and filter out unwanted information or “noise”, for example non-clinical images. A major problem in figure classification is the fact that many figures in the biomedical literature are compound figures and often contain more than a single figure type. Some journals do separate compound figures into several parts but many do not, thus currently requiring manual separation. In this work, a technique of compound figure separation is proposed and implemented based on systematic detection and analysis of uniform space gaps. The method discussed in this article is evaluated on a dataset of journal figures of the open access literature that was created for the ImageCLEF 2012 benchmark and contains about 3000 compound figures. Automatic tools can easily reach a relatively high accuracy in separating compound figures. To further increase accuracy, efforts are needed to improve the detection process as well as to avoid over-separation with powerful analysis strategies.

Design and source code from Jon Barron's website and Leonid Keselman's Jekyll fork.
