Publications
2024
- [ECCV] UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues
  Vandad Davoodnia, Saeed Ghorbani, Marc-André Carbonneau, Alexandre Messier, and Ali Etemad
  European Conference on Computer Vision (ECCV), 2024
Abstract. We introduce UPose3D, a novel approach for multi-view 3D human pose estimation, addressing challenges in accuracy and scalability. Our method advances existing pose estimation frameworks by improving robustness and flexibility without requiring direct 3D annotations. At the core of our method, a pose compiler module refines predictions from a 2D keypoints estimator that operates on a single image by leveraging temporal and cross-view information. Our novel cross-view fusion strategy is scalable to any number of cameras, while our synthetic data generation strategy ensures generalization across diverse actors, scenes, and viewpoints. Finally, UPose3D leverages the prediction uncertainty of both the 2D keypoint estimator and the pose compiler module. This provides robustness to outliers and noisy data, resulting in state-of-the-art performance in out-of-distribution settings. In addition, for in-distribution settings, UPose3D yields performance rivalling methods that rely on 3D annotated data while being the state-of-the-art among methods relying only on 2D supervision.
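  The cross-view fusion described above can be grounded in confidence-weighted triangulation. Below is a minimal sketch (not the authors' released code) of weighted linear (DLT) triangulation of a single joint from any number of calibrated views, where per-view confidences down-weight noisy 2D detections; all names and dimensions are illustrative.

  ```python
  import numpy as np

  def triangulate_weighted(points_2d, confidences, projections):
      """Confidence-weighted linear (DLT) triangulation of one joint.

      points_2d:   (V, 2) pixel coordinates of the joint in V views
      confidences: (V,)   per-view confidence weights
      projections: (V, 3, 4) camera projection matrices
      Returns the 3D point (3,) minimizing the weighted algebraic error.
      """
      rows = []
      for (u, v), w, P in zip(points_2d, confidences, projections):
          # Each view contributes two weighted linear constraints.
          rows.append(w * (u * P[2] - P[0]))
          rows.append(w * (v * P[2] - P[1]))
      A = np.stack(rows)
      # Homogeneous least squares: right singular vector of smallest value.
      _, _, vt = np.linalg.svd(A)
      X = vt[-1]
      return X[:3] / X[3]

  # Quick check with synthetic cameras: project a random 3D point, recover it.
  rng = np.random.default_rng(0)
  X_true = rng.normal(size=3)
  Ps = rng.normal(size=(4, 3, 4))
  uv = np.stack([P @ np.append(X_true, 1.0) for P in Ps])
  uv = uv[:, :2] / uv[:, 2:]
  print(triangulate_weighted(uv, np.ones(4), Ps), X_true)
  ```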
- [ECCVW] SkelFormer: Markerless 3D Pose and Shape Estimation using Skeletal Transformers
  Vandad Davoodnia, Saeed Ghorbani, Alexandre Messier, and Ali Etemad
  ECCV Workshop on Video Games, 2024
2023
- [CGF] ZeroEGGS: Zero-shot Example-based Gesture Generation from Speech
  Saeed Ghorbani, Ylva Ferstl, Daniel Holden, Nikolaus F. Troje, and Marc-André Carbonneau
  Computer Graphics Forum, 2023
Abstract. We present ZeroEGGS, a neural network framework for speech-driven gesture generation with zero-shot style control by example. This means style can be controlled via only a short example motion clip, even for motion styles unseen during training. Our model uses a variational framework to learn a style embedding, making it easy to modify style through latent space manipulation or blending and scaling of style embeddings. The probabilistic nature of our framework further enables the generation of a variety of outputs given the input, addressing the stochastic nature of gesture motion. In a series of experiments, we first demonstrate the flexibility and generalizability of our model to new speakers and styles. In a user study, we then show that our model outperforms previous state-of-the-art techniques in naturalness of motion, appropriateness for speech, and style portrayal. Finally, we release a high-quality dataset of full-body gesture motion including fingers, with speech, spanning 19 different styles. Our code and data are publicly available at https://github.com/ubisoft/ubisoft-laforge-ZeroEGGS.
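  As a rough illustration of the latent-space style manipulation the abstract describes, the sketch below shows a toy variational style encoder with reparameterized sampling, followed by blending and scaling of style embeddings. The module, dimensions, and clips are hypothetical stand-ins, not ZeroEGGS's actual architecture.

  ```python
  import torch
  import torch.nn as nn

  class StyleEncoder(nn.Module):
      """Toy stand-in for a variational style encoder (sizes illustrative)."""
      def __init__(self, in_dim=64, z_dim=16):
          super().__init__()
          self.mu = nn.Linear(in_dim, z_dim)
          self.logvar = nn.Linear(in_dim, z_dim)

      def forward(self, clip):
          h = clip.mean(dim=1)          # pool over time: (B, T, D) -> (B, D)
          return self.mu(h), self.logvar(h)

  def reparameterize(mu, logvar):
      # z = mu + sigma * eps, keeping sampling differentiable
      return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

  enc = StyleEncoder()
  clip_a = torch.randn(1, 120, 64)      # exemplar motion clip A: (B, T, D)
  clip_b = torch.randn(1, 120, 64)      # exemplar motion clip B
  z_a = reparameterize(*enc(clip_a))
  z_b = reparameterize(*enc(clip_b))

  z_blend = 0.5 * z_a + 0.5 * z_b       # blend two styles
  z_strong = 1.5 * z_a                  # scale to exaggerate a style
  ```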
2022
- [ICMI] Exemplar-based stylized gesture generation from speech: An entry to the GENEA Challenge 2022
  Saeed Ghorbani, Ylva Ferstl, and Marc-André Carbonneau
  In International Conference on Multimodal Interaction, 2022
We present our entry to the GENEA Challenge of 2022 on data-driven co-speech gesture generation. Our system is a neural network that generates gesture animation from an input audio file. The motion style generated by the model is extracted from an exemplar motion clip. Style is embedded in a latent space using a variational framework. This architecture allows generation in styles unseen during training. Moreover, the probabilistic nature of our variational framework enables the generation of a variety of outputs given the same input, addressing the stochastic nature of gesture motion. The GENEA challenge evaluation showed that our model produces full-body motion with highly competitive levels of human-likeness.
2021
- [APIN] Estimating Pose from Pressure Data for Smart Beds with Deep Image-based Pose Estimators
  Vandad Davoodnia, Saeed Ghorbani, and Ali Etemad
  Applied Intelligence, 2021
In-bed pose estimation has shown value in fields such as hospital patient monitoring, sleep studies, and smart homes. In this paper, we explore different strategies for detecting body pose from highly ambiguous pressure data, with the aid of pre-existing pose estimators. We examine the performance of pre-trained pose estimators by using them either directly or by retraining them on two pressure datasets. We also explore other strategies utilizing a learnable pre-processing domain adaptation step, which transforms the vague pressure maps to a representation closer to the expected input space of common pose estimation modules. Accordingly, we use a fully convolutional network with multiple scales to provide the pose-specific characteristics of the pressure maps to the pre-trained pose estimation module. Our complete analysis of different approaches shows that combining a learnable pre-processing module with retraining pre-existing image-based pose estimators on the pressure data overcomes issues such as highly ambiguous pressure points and achieves very high pose estimation accuracy.
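  To make the learnable pre-processing idea concrete, here is a toy PyTorch sketch: a small multi-scale fully convolutional adapter maps a one-channel pressure map into a three-channel image-like tensor that a pre-trained pose estimator can consume. The layer sizes and the stand-in pose network are illustrative, not the paper's architecture.

  ```python
  import torch
  import torch.nn as nn

  class PressureToImage(nn.Module):
      """Toy multi-scale fully convolutional pre-processing module mapping a
      1-channel pressure map to a 3-channel image-like tensor."""
      def __init__(self):
          super().__init__()
          self.branches = nn.ModuleList([
              nn.Conv2d(1, 8, k, padding=k // 2) for k in (3, 5, 7)
          ])
          self.fuse = nn.Conv2d(24, 3, 1)

      def forward(self, x):
          feats = [torch.relu(b(x)) for b in self.branches]
          return torch.sigmoid(self.fuse(torch.cat(feats, dim=1)))

  adapter = PressureToImage()
  pose_net = nn.Conv2d(3, 17, 1)        # stand-in for a pre-trained estimator
  for p in pose_net.parameters():       # keep the pose estimator frozen, or
      p.requires_grad = False           # unfreeze it to also retrain it

  pressure = torch.randn(1, 1, 64, 32)  # one pressure frame
  heatmaps = pose_net(adapter(pressure))
  ```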
- [ICASSP] In-bed pressure-based pose estimation using image space representation learning
  Vandad Davoodnia, Saeed Ghorbani, and Ali Etemad
  In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021
Recent advances in deep pose estimation models have proven to be effective in a wide range of applications such as health monitoring, sports, animations, and robotics. However, pose estimation models fail to generalize when facing images acquired from in-bed pressure sensing systems. In this paper, we address this challenge by presenting a novel end-to-end framework capable of accurately locating body parts from vague pressure data. Our method exploits the idea of equipping an off-the-shelf pose estimator with a deep trainable neural network, which pre-processes and prepares the pressure data for subsequent pose estimation. Our model transforms the ambiguous pressure maps to images containing shapes and structures similar to the common input domain of the pre-existing pose estimation methods. As a result, we show that our model is able to reconstruct unclear body parts, which in turn enables pose estimators to accurately and robustly estimate the pose. We train and test our method on a manually annotated public pressure map dataset using a combination of loss functions. Results confirm the effectiveness of our method through the high visual quality of the generated images and the high pose estimation accuracy achieved.
- [PLOS ONE] MoVi: A large multi-purpose human motion and video dataset
  Saeed Ghorbani, Kimia Mahdaviani, Anne Thaler, Konrad Kording, Douglas James Cook, Gunnar Blohm, and Nikolaus F. Troje
  PLOS ONE, 2021
Human movements are both an area of intense study and the basis of many applications such as character animation. For many applications, it is crucial to identify movements from videos or analyze datasets of movements. Here we introduce a new human Motion and Video dataset, MoVi, which we make available publicly. It contains 60 female and 30 male actors performing a collection of 20 predefined everyday actions and sports movements, and one self-chosen movement. In five capture rounds, the same actors and movements were recorded using different hardware systems, including an optical motion capture system, video cameras, and inertial measurement units (IMU). For some of the capture rounds, the actors were recorded wearing natural clothing; for the other rounds they wore minimal clothing. In total, our dataset contains 9 hours of motion capture data, 17 hours of video data from 4 different points of view (including one hand-held camera), and 6.6 hours of IMU data. In this paper, we describe how the dataset was collected and post-processed; we present state-of-the-art estimates of skeletal motions and full-body shape deformations associated with skeletal motion. We discuss examples of potential studies this dataset could enable.
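  For readers wanting to inspect the motion data, the snippet below sketches how an AMASS-style .npz sequence might be loaded (MoVi is also distributed through AMASS). The file path is hypothetical and the key names follow AMASS conventions, which may differ from MoVi's own release format.

  ```python
  import numpy as np

  # Hypothetical path; the keys below follow AMASS SMPL+H conventions.
  data = np.load("MoVi/Subject_1_F_walking_poses.npz")

  poses = data["poses"]          # (T, 156) SMPL+H axis-angle pose parameters
  trans = data["trans"]          # (T, 3) root translation in metres
  betas = data["betas"]          # (16,) body shape coefficients
  fps = float(data["mocap_framerate"])

  print(f"{poses.shape[0] / fps:.1f} s of motion at {fps:.0f} fps")
  ```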
2020
- [CGF] Probabilistic Character Motion Synthesis using a Hierarchical Deep Latent Variable Model
  Saeed Ghorbani, Calden Wloka, Ali Etemad, Marcus A. Brubaker, and Nikolaus F. Troje
  Computer Graphics Forum (Symposium on Computer Animation), 2020
We present a probabilistic framework to generate character animations based on weak control signals, such that the synthesized motions are realistic while retaining the stochastic nature of human movement. The proposed architecture, which is designed as a hierarchical recurrent model, maps each sub-sequence of motions into a stochastic latent code using a variational autoencoder extended over the temporal domain. We also propose an objective function which respects the impact of each joint on the pose and compares the joint angles based on angular distance. We use two novel quantitative protocols and human qualitative assessment to demonstrate the ability of our model to generate convincing and diverse periodic and non-periodic motion sequences without the need for strong control signals.
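  The angular-distance comparison of joint angles mentioned above can be made concrete with quaternions: the geodesic angle between two unit quaternions, accounting for the double cover, gives a natural per-joint error. A small worked sketch follows; the weights and the loss form are illustrative, not the paper's exact objective.

  ```python
  import numpy as np

  def quat_angle(q1, q2):
      """Geodesic angle (radians) between two unit quaternions.
      |dot| handles the double cover (q and -q are the same rotation)."""
      d = np.clip(abs(np.dot(q1, q2)), 0.0, 1.0)
      return 2.0 * np.arccos(d)

  # Example: identity vs. a 90-degree rotation about the z-axis.
  q_id = np.array([1.0, 0.0, 0.0, 0.0])
  q_90z = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
  print(quat_angle(q_id, q_90z))        # ~1.5708 rad (90 degrees)

  def pose_loss(pred, target, weights):
      # Per-joint weighted sum of angular distances between rotations.
      return sum(w * quat_angle(p, t) for p, t, w in zip(pred, target, weights))

  # Only the first joint differs, so the loss is 1.0 * pi/2 + 0.5 * 0.
  print(pose_loss([q_id, q_90z], [q_90z, q_90z], [1.0, 0.5]))   # ~1.5708
  ```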
- [ICPR] Gait Recognition using Multi-Scale Partial Representation Transformation with Capsules
  Alireza Sepas-Moghaddam, Saeed Ghorbani, Nikolaus F. Troje, and Ali Etemad
  International Conference on Pattern Recognition, 2020
Gait recognition, referring to the identification of individuals based on the manner in which they walk, can be very challenging due to the variations in the viewpoint of the camera and the appearance of individuals. Current methods for gait recognition have been dominated by deep learning models, notably those based on partial feature representations. In this context, we propose a novel deep network, learning to transfer multi-scale partial gait representations using capsules to obtain more discriminative gait features. Our network first obtains multi-scale partial representations using a state-of-the-art deep partial feature extractor. It then recurrently learns the correlations and co-occurrences of the patterns among the partial features in forward and backward directions using Bi-directional Gated Recurrent Units (BGRU). Finally, a capsule network is adopted to learn deeper part-whole relationships and assigns more weights to the more relevant features while ignoring the spurious dimensions. That way, we obtain final features that are more robust to both viewing and appearance changes. The performance of our method has been extensively tested on two gait recognition datasets, CASIA-B and OU-MVLP, using four challenging test protocols. The results of our method have been compared to the state-of-the-art gait recognition solutions, showing the superiority of our model, notably when facing challenging viewing and carrying conditions.
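  A toy sketch of the strip-then-BGRU portion of such a pipeline is shown below; the partial feature extractor and the capsule stage are omitted, and all dimensions are illustrative rather than the paper's.

  ```python
  import torch
  import torch.nn as nn

  class PartialBGRU(nn.Module):
      """Toy sketch: split a gait feature map into horizontal strips and model
      the part sequence with a bidirectional GRU (capsule stage omitted)."""
      def __init__(self, channels=32, parts=8, hidden=64):
          super().__init__()
          self.parts = parts
          self.bgru = nn.GRU(channels, hidden, bidirectional=True,
                             batch_first=True)

      def forward(self, fmap):                 # fmap: (B, C, H, W)
          strips = fmap.chunk(self.parts, dim=2)
          # Pool each strip to one descriptor: (B, parts, C)
          seq = torch.stack([s.mean(dim=(2, 3)) for s in strips], dim=1)
          out, _ = self.bgru(seq)              # (B, parts, 2 * hidden)
          return out

  feats = PartialBGRU()(torch.randn(2, 32, 64, 32))
  print(feats.shape)                           # torch.Size([2, 8, 128])
  ```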
2019
- [CGI] Auto-labelling of markers in optical motion capture by permutation learning
  Saeed Ghorbani, Ali Etemad, and Nikolaus F. Troje
  In Computer Graphics International, 2019
Optical marker-based motion capture is a vital tool in applications such as motion and behavioural analysis, animation, and biomechanics. Labelling, that is, assigning optical markers to pre-defined positions on the body, is a time-consuming and labour-intensive post-processing part of current motion capture pipelines. The problem can be considered a ranking process in which markers shuffled by an unknown permutation matrix are sorted to recover the correct order. In this paper, we present a framework for automatic marker labelling which first estimates a permutation matrix for each individual frame using a differentiable permutation learning model and then utilizes temporal consistency to identify and correct remaining labelling errors. Experiments conducted on the test data show the effectiveness of our framework.
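  Differentiable permutation learning of this kind is commonly implemented with Sinkhorn normalization, which relaxes a permutation matrix into a doubly-stochastic one. A minimal sketch under that assumption (the paper's exact model may differ):

  ```python
  import numpy as np

  def sinkhorn(scores, n_iters=50):
      """Turn a score matrix into a near-doubly-stochastic matrix by
      alternating row and column normalization (Sinkhorn iterations).
      A hard assignment can then be read off per frame, e.g. with the
      Hungarian algorithm."""
      P = np.exp(scores)                      # positive entries
      for _ in range(n_iters):
          P /= P.sum(axis=1, keepdims=True)   # normalize rows
          P /= P.sum(axis=0, keepdims=True)   # normalize columns
      return P

  # Example: 4 detected markers scored against 4 labels.
  rng = np.random.default_rng(0)
  P = sinkhorn(rng.normal(size=(4, 4)))
  print(P.round(2), P.argmax(axis=1))         # soft matrix, greedy labelling
  ```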
- [CVR] Automatic initialization and tracking of markers in optical motion capture by learning to rank
  Saeed Ghorbani, Ali Etemad, and Nikolaus F. Troje
  In CVR Vision Conference, 2019
2010
- [WCSP] Sub-pixel image registration based on physical forces
  Ali Ghayoor, Saeed Ghorbani, and Ali Asghar Beheshti Shirazi
  In International Conference on Wireless Communications & Signal Processing (WCSP), 2010
A method for image registration based on physical forces was previously proposed by the authors; the registration parameters are translation and rotation. The method treats images as charged materials that attract each other: one image moves in the direction of the applied force while the other remains still, and the movement continues until the resultant force becomes zero. This approach estimates the registration parameters simultaneously, leading to a better-optimized set of registration parameters. The registration error of this method is 1 to 3 pixels. In this paper, we extend the method to applications that require sub-pixel accuracy. First, by applying the Canny edge detector to the input images, edge information is incorporated into the registration process to increase robustness in the presence of noise. Then, sub-pixel accuracy is achieved using interpolation techniques.
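  A toy sketch of the charged-images analogy for the translation component: edge pixels of the two images attract with inverse-square strength, and the moving image is stepped along the net force until it vanishes. A simple gradient-magnitude threshold stands in for the Canny detector here, and everything below is illustrative rather than the authors' implementation.

  ```python
  import numpy as np

  def edges(img, thresh=0.2):
      # Gradient-magnitude edges as a simple stand-in for the Canny detector.
      gy, gx = np.gradient(img.astype(float))
      mag = np.hypot(gx, gy)
      return mag > thresh * mag.max()

  def net_force(fixed, moving):
      """Net attractive 'force' on the moving image: every pair of edge
      pixels attracts with inverse-square strength."""
      fy, fx = np.nonzero(edges(fixed))
      my, mx = np.nonzero(edges(moving))
      d = np.stack([fy[:, None] - my[None, :],
                    fx[:, None] - mx[None, :]], axis=-1).astype(float)
      r3 = ((d ** 2).sum(-1) + 1e-9) ** 1.5
      return (d / r3[..., None]).sum(axis=(0, 1))   # (dy, dx)

  # Translation-only loop: step along the force until it vanishes. Integer
  # steps here; interpolation would refine the shift to sub-pixel accuracy.
  fixed = np.zeros((64, 64)); fixed[20:40, 20:40] = 1.0
  moving = np.roll(fixed, (3, -2), axis=(0, 1))
  shift = np.array([0, 0])
  for _ in range(50):
      f = net_force(fixed, np.roll(moving, shift, axis=(0, 1)))
      step = np.sign(np.round(f / (np.abs(f).max() + 1e-9)))
      if not step.any():
          break
      shift = shift + step.astype(int)
  print(shift)   # recovered translation: [-3  2]
  ```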