Publications
2024
- [ECCV] UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues
  Vandad Davoodnia, Saeed Ghorbani, Marc-André Carbonneau, Alexandre Messier, and Ali Etemad
  European Conference on Computer Vision (ECCV), 2024
Abstract. We introduce UPose3D, a novel approach for multi-view 3D human pose estimation, addressing challenges in accuracy and scalability. Our method advances existing pose estimation frameworks by improving robustness and flexibility without requiring direct 3D annotations. At the core of our method, a pose compiler module refines predictions from a 2D keypoints estimator that operates on a single image by leveraging temporal and cross-view information. Our novel cross-view fusion strategy is scalable to any number of cameras, while our synthetic data generation strategy ensures generalization across diverse actors, scenes, and viewpoints. Finally, UPose3D leverages the prediction uncertainty of both the 2D keypoint estimator and the pose compiler module. This provides robustness to outliers and noisy data, resulting in state-of-the-art performance in out-of-distribution settings. In addition, for in-distribution settings, UPose3D yields performance rivalling methods that rely on 3D annotated data while being the state-of-the-art among methods relying only on 2D supervision.
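  The cross-view fusion described above can be grounded in confidence-weighted triangulation. Below is a minimal sketch (not the authors' released code) of weighted linear (DLT) triangulation of a single joint from any number of calibrated views, where per-view confidences down-weight noisy 2D detections; all names and dimensions are illustrative.

  ```python
  import numpy as np

  def triangulate_weighted(points_2d, confidences, projections):
      """Confidence-weighted linear (DLT) triangulation of one joint.

      points_2d:   (V, 2) pixel coordinates of the joint in V views
      confidences: (V,)   per-view confidence weights
      projections: (V, 3, 4) camera projection matrices
      Returns the 3D point (3,) minimizing the weighted algebraic error.
      """
      rows = []
      for (u, v), w, P in zip(points_2d, confidences, projections):
          # Each view contributes two weighted linear constraints.
          rows.append(w * (u * P[2] - P[0]))
          rows.append(w * (v * P[2] - P[1]))
      A = np.stack(rows)
      # Homogeneous least squares: right singular vector of smallest value.
      _, _, vt = np.linalg.svd(A)
      X = vt[-1]
      return X[:3] / X[3]

  # Quick check with synthetic cameras: project a random 3D point, recover it.
  rng = np.random.default_rng(0)
  X_true = rng.normal(size=3)
  Ps = rng.normal(size=(4, 3, 4))
  uv = np.stack([P @ np.append(X_true, 1.0) for P in Ps])
  uv = uv[:, :2] / uv[:, 2:]
  print(triangulate_weighted(uv, np.ones(4), Ps), X_true)
  ```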
- [ECCVW] SkelFormer: Markerless 3D Pose and Shape Estimation using Skeletal Transformers
  Vandad Davoodnia, Saeed Ghorbani, Alexandre Messier, and Ali Etemad
  ECCV Workshop on Video Games, 2024
2023
- [CGF] ZeroEGGS: Zero-shot Example-based Gesture Generation from Speech
  Saeed Ghorbani, Ylva Ferstl, Daniel Holden, Nikolaus F. Troje, and Marc-André Carbonneau
  Computer Graphics Forum, 2023
Abstract. We present ZeroEGGS, a neural network framework for speech-driven gesture generation with zero-shot style control by example. This means style can be controlled via only a short example motion clip, even for motion styles unseen during training. Our model uses a variational framework to learn a style embedding, making it easy to modify style through latent space manipulation or blending and scaling of style embeddings. The probabilistic nature of our framework further enables the generation of a variety of outputs given the input, addressing the stochastic nature of gesture motion. In a series of experiments, we first demonstrate the flexibility and generalizability of our model to new speakers and styles. In a user study, we then show that our model outperforms previous state-of-the-art techniques in naturalness of motion, appropriateness for speech, and style portrayal. Finally, we release a high-quality dataset of full-body gesture motion including fingers, with speech, spanning 19 different styles. Our code and data are publicly available at https://github.com/ubisoft/ubisoft-laforge-ZeroEGGS.
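  As a rough illustration of the latent-space style manipulation the abstract describes, the sketch below shows a toy variational style encoder with reparameterized sampling, followed by blending and scaling of style embeddings. The module, dimensions, and clips are hypothetical stand-ins, not ZeroEGGS's actual architecture.

  ```python
  import torch
  import torch.nn as nn

  class StyleEncoder(nn.Module):
      """Toy stand-in for a variational style encoder (sizes illustrative)."""
      def __init__(self, in_dim=64, z_dim=16):
          super().__init__()
          self.mu = nn.Linear(in_dim, z_dim)
          self.logvar = nn.Linear(in_dim, z_dim)

      def forward(self, clip):
          h = clip.mean(dim=1)          # pool over time: (B, T, D) -> (B, D)
          return self.mu(h), self.logvar(h)

  def reparameterize(mu, logvar):
      # z = mu + sigma * eps, keeping sampling differentiable
      return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

  enc = StyleEncoder()
  clip_a = torch.randn(1, 120, 64)      # exemplar motion clip A: (B, T, D)
  clip_b = torch.randn(1, 120, 64)      # exemplar motion clip B
  z_a = reparameterize(*enc(clip_a))
  z_b = reparameterize(*enc(clip_b))

  z_blend = 0.5 * z_a + 0.5 * z_b       # blend two styles
  z_strong = 1.5 * z_a                  # scale to exaggerate a style
  ```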
2022
- [ICMI] Exemplar-based stylized gesture generation from speech: An entry to the GENEA Challenge 2022
  Saeed Ghorbani, Ylva Ferstl, and Marc-André Carbonneau
  In International Conference on Multimodal Interaction, 2022
We present our entry to the GENEA Challenge of 2022 on data-driven co-speech gesture generation. Our system is a neural network that generates gesture animation from an input audio file. The motion style generated by the model is extracted from an exemplar motion clip. Style is embedded in a latent space using a variational framework. This architecture allows generation in styles unseen during training. Moreover, the probabilistic nature of our variational framework enables the generation of a variety of outputs given the same input, addressing the stochastic nature of gesture motion. The GENEA challenge evaluation showed that our model produces full-body motion with highly competitive levels of human-likeness.
2021
- [APIN] Estimating Pose from Pressure Data for Smart Beds with Deep Image-based Pose Estimators
  Vandad Davoodnia, Saeed Ghorbani, and Ali Etemad
  Applied Intelligence, 2021
In-bed pose estimation has shown value in fields such as hospital patient monitoring, sleep studies, and smart homes. In this paper, we explore different strategies for detecting body pose from highly ambiguous pressure data, with the aid of pre-existing pose estimators. We examine the performance of pre-trained pose estimators by using them either directly or by retraining them on two pressure datasets. We also explore other strategies utilizing a learnable pre-processing domain adaptation step, which transforms the vague pressure maps to a representation closer to the expected input space of common pose estimation modules. Accordingly, we use a fully convolutional network with multiple scales to provide the pose-specific characteristics of the pressure maps to the pre-trained pose estimation module. Our complete analysis of different approaches shows that combining a learnable pre-processing module with retraining pre-existing image-based pose estimators on the pressure data overcomes issues such as highly ambiguous pressure points and achieves very high pose estimation accuracy.
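  To make the learnable pre-processing idea concrete, here is a toy PyTorch sketch: a small multi-scale fully convolutional adapter maps a one-channel pressure map into a three-channel image-like tensor that a pre-trained pose estimator can consume. The layer sizes and the stand-in pose network are illustrative, not the paper's architecture.

  ```python
  import torch
  import torch.nn as nn

  class PressureToImage(nn.Module):
      """Toy multi-scale fully convolutional pre-processing module mapping a
      1-channel pressure map to a 3-channel image-like tensor."""
      def __init__(self):
          super().__init__()
          self.branches = nn.ModuleList([
              nn.Conv2d(1, 8, k, padding=k // 2) for k in (3, 5, 7)
          ])
          self.fuse = nn.Conv2d(24, 3, 1)

      def forward(self, x):
          feats = [torch.relu(b(x)) for b in self.branches]
          return torch.sigmoid(self.fuse(torch.cat(feats, dim=1)))

  adapter = PressureToImage()
  pose_net = nn.Conv2d(3, 17, 1)        # stand-in for a pre-trained estimator
  for p in pose_net.parameters():       # keep the pose estimator frozen, or
      p.requires_grad = False           # unfreeze it to also retrain it

  pressure = torch.randn(1, 1, 64, 32)  # one pressure frame
  heatmaps = pose_net(adapter(pressure))
  ```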
- [ICASSP] In-bed pressure-based pose estimation using image space representation learning
  Vandad Davoodnia, Saeed Ghorbani, and Ali Etemad
  In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021
Recent advances in deep pose estimation models have proven to be effective in a wide range of applications such as health monitoring, sports, animations, and robotics. However, pose estimation models fail to generalize when facing images acquired from in-bed pressure sensing systems. In this paper, we address this challenge by presenting a novel end-to-end framework capable of accurately locating body parts from vague pressure data. Our method exploits the idea of equipping an off-the-shelf pose estimator with a deep trainable neural network, which pre-processes and prepares the pressure data for subsequent pose estimation. Our model transforms the ambiguous pressure maps to images containing shapes and structures similar to the common input domain of the pre-existing pose estimation methods. As a result, we show that our model is able to reconstruct unclear body parts, which in turn enables pose estimators to accurately and robustly estimate the pose. We train and test our method on a manually annotated public pressure map dataset using a combination of loss functions. Results confirm the effectiveness of our method through the high visual quality of the generated images and the high pose estimation accuracy achieved.
- [PLOS ONE] MoVi: A large multi-purpose human motion and video dataset
  Saeed Ghorbani, Kimia Mahdaviani, Anne Thaler, Konrad Kording, Douglas James Cook, Gunnar Blohm, and Nikolaus F. Troje
  PLOS ONE, 2021
Human movements are both an area of intense study and the basis of many applications such as character animation. For many applications, it is crucial to identify movements from videos or analyze datasets of movements. Here we introduce a new human Motion and Video dataset, MoVi, which we make available publicly. It contains 60 female and 30 male actors performing a collection of 20 predefined everyday actions and sports movements, and one self-chosen movement. In five capture rounds, the same actors and movements were recorded using different hardware systems, including an optical motion capture system, video cameras, and inertial measurement units (IMU). For some of the capture rounds, the actors were recorded wearing natural clothing; for the other rounds they wore minimal clothing. In total, our dataset contains 9 hours of motion capture data, 17 hours of video data from 4 different points of view (including one hand-held camera), and 6.6 hours of IMU data. In this paper, we describe how the dataset was collected and post-processed; we present state-of-the-art estimates of skeletal motions and full-body shape deformations associated with skeletal motion. We discuss examples of potential studies this dataset could enable.
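  For readers wanting to inspect the motion data, the snippet below sketches how an AMASS-style .npz sequence might be loaded (MoVi is also distributed through AMASS). The file path is hypothetical and the key names follow AMASS conventions, which may differ from MoVi's own release format.

  ```python
  import numpy as np

  # Hypothetical path; the keys below follow AMASS SMPL+H conventions.
  data = np.load("MoVi/Subject_1_F_walking_poses.npz")

  poses = data["poses"]          # (T, 156) SMPL+H axis-angle pose parameters
  trans = data["trans"]          # (T, 3) root translation in metres
  betas = data["betas"]          # (16,) body shape coefficients
  fps = float(data["mocap_framerate"])

  print(f"{poses.shape[0] / fps:.1f} s of motion at {fps:.0f} fps")
  ```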
2020
- [CGF] Probabilistic Character Motion Synthesis using a Hierarchical Deep Latent Variable Model
  Saeed Ghorbani, Calden Wloka, Ali Etemad, Marcus A. Brubaker, and Nikolaus F. Troje
  Computer Graphics Forum (Symposium on Computer Animation), 2020
We present a probabilistic framework to generate character animations based on weak control signals, such that the synthesized motions are realistic while retaining the stochastic nature of human movement. The proposed architecture, which is designed as a hierarchical recurrent model, maps each sub-sequence of motions into a stochastic latent code using a variational autoencoder extended over the temporal domain. We also propose an objective function which respects the impact of each joint on the pose and compares the joint angles based on angular distance. We use two novel quantitative protocols and human qualitative assessment to demonstrate the ability of our model to generate convincing and diverse periodic and non-periodic motion sequences without the need for strong control signals.
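  The angular-distance comparison of joint angles mentioned above can be made concrete with quaternions: the geodesic angle between two unit quaternions, accounting for the double cover, gives a natural per-joint error. A small worked sketch follows; the weights and the loss form are illustrative, not the paper's exact objective.

  ```python
  import numpy as np

  def quat_angle(q1, q2):
      """Geodesic angle (radians) between two unit quaternions.
      |dot| handles the double cover (q and -q are the same rotation)."""
      d = np.clip(abs(np.dot(q1, q2)), 0.0, 1.0)
      return 2.0 * np.arccos(d)

  # Example: identity vs. a 90-degree rotation about the z-axis.
  q_id = np.array([1.0, 0.0, 0.0, 0.0])
  q_90z = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
  print(quat_angle(q_id, q_90z))        # ~1.5708 rad (90 degrees)

  def pose_loss(pred, target, weights):
      # Per-joint weighted sum of angular distances between rotations.
      return sum(w * quat_angle(p, t) for p, t, w in zip(pred, target, weights))

  # Only the first joint differs, so the loss is 1.0 * pi/2 + 0.5 * 0.
  print(pose_loss([q_id, q_90z], [q_90z, q_90z], [1.0, 0.5]))   # ~1.5708
  ```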
- [ICPR] Gait Recognition using Multi-Scale Partial Representation Transformation with Capsules
  Alireza Sepas-Moghaddam, Saeed Ghorbani, Nikolaus F. Troje, and Ali Etemad
  International Conference on Pattern Recognition, 2020
Gait recognition, referring to the identification of individuals based on the manner in which they walk, can be very challenging due to the variations in the viewpoint of the camera and the appearance of individuals. Current methods for gait recognition have been dominated by deep learning models, notably those based on partial feature representations. In this context, we propose a novel deep network, learning to transfer multi-scale partial gait representations using capsules to obtain more discriminative gait features. Our network first obtains multi-scale partial representations using a state-of-the-art deep partial feature extractor. It then recurrently learns the correlations and co-occurrences of the patterns among the partial features in forward and backward directions using Bi-directional Gated Recurrent Units (BGRU). Finally, a capsule network is adopted to learn deeper part-whole relationships and assigns more weights to the more relevant features while ignoring the spurious dimensions. That way, we obtain final features that are more robust to both viewing and appearance changes. The performance of our method has been extensively tested on two gait recognition datasets, CASIA-B and OU-MVLP, using four challenging test protocols. The results of our method have been compared to the state-of-the-art gait recognition solutions, showing the superiority of our model, notably when facing challenging viewing and carrying conditions.
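  A toy sketch of the strip-then-BGRU portion of such a pipeline is shown below; the partial feature extractor and the capsule stage are omitted, and all dimensions are illustrative rather than the paper's.

  ```python
  import torch
  import torch.nn as nn

  class PartialBGRU(nn.Module):
      """Toy sketch: split a gait feature map into horizontal strips and model
      the part sequence with a bidirectional GRU (capsule stage omitted)."""
      def __init__(self, channels=32, parts=8, hidden=64):
          super().__init__()
          self.parts = parts
          self.bgru = nn.GRU(channels, hidden, bidirectional=True,
                             batch_first=True)

      def forward(self, fmap):                 # fmap: (B, C, H, W)
          strips = fmap.chunk(self.parts, dim=2)
          # Pool each strip to one descriptor: (B, parts, C)
          seq = torch.stack([s.mean(dim=(2, 3)) for s in strips], dim=1)
          out, _ = self.bgru(seq)              # (B, parts, 2 * hidden)
          return out

  feats = PartialBGRU()(torch.randn(2, 32, 64, 32))
  print(feats.shape)                           # torch.Size([2, 8, 128])
  ```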
2019
- [CGI] Auto-labelling of markers in optical motion capture by permutation learning
  Saeed Ghorbani, Ali Etemad, and Nikolaus F. Troje
  In Computer Graphics International, 2019
Optical marker-based motion capture is a vital tool in applications such as motion and behavioural analysis, animation, and biomechanics. Labelling, that is, assigning optical markers to pre-defined positions on the body, is a time-consuming and labour-intensive post-processing part of current motion capture pipelines. The problem can be considered a ranking process in which markers shuffled by an unknown permutation matrix are sorted to recover the correct order. In this paper, we present a framework for automatic marker labelling which first estimates a permutation matrix for each individual frame using a differentiable permutation learning model and then utilizes temporal consistency to identify and correct remaining labelling errors. Experiments conducted on the test data show the effectiveness of our framework.
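  Differentiable permutation learning of this kind is commonly implemented with Sinkhorn normalization, which relaxes a permutation matrix into a doubly-stochastic one. A minimal sketch under that assumption (the paper's exact model may differ):

  ```python
  import numpy as np

  def sinkhorn(scores, n_iters=50):
      """Turn a score matrix into a near-doubly-stochastic matrix by
      alternating row and column normalization (Sinkhorn iterations).
      A hard assignment can then be read off per frame, e.g. with the
      Hungarian algorithm."""
      P = np.exp(scores)                      # positive entries
      for _ in range(n_iters):
          P /= P.sum(axis=1, keepdims=True)   # normalize rows
          P /= P.sum(axis=0, keepdims=True)   # normalize columns
      return P

  # Example: 4 detected markers scored against 4 labels.
  rng = np.random.default_rng(0)
  P = sinkhorn(rng.normal(size=(4, 4)))
  print(P.round(2), P.argmax(axis=1))         # soft matrix, greedy labelling
  ```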
- [CVR] Automatic initialization and tracking of markers in optical motion capture by learning to rank
  Saeed Ghorbani, Ali Etemad, and Nikolaus F. Troje
  In CVR Vision Conference, 2019
2010
- [WCSP] Sub-pixel image registration based on physical forces
  Ali Ghayoor, Saeed Ghorbani, and Ali Asghar Beheshti Shirazi
  In International Conference on Wireless Communications & Signal Processing (WCSP), 2010
A method for image registration based on physical forces was previously proposed by the authors; the registration parameters are translation and rotation. The method treats images as charged materials that attract each other: one image moves in the direction of the applied force while the other remains still, and the movement continues until the resultant force becomes zero. This approach estimates the registration parameters simultaneously, leading to a better-optimized set of registration parameters. The registration error of this method is 1 to 3 pixels. In this paper, we extend the method to applications that require sub-pixel accuracy. First, by applying the Canny edge detector to the input images, edge information is incorporated into the registration process to increase robustness in the presence of noise. Then, sub-pixel accuracy is achieved using interpolation techniques.
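  A toy sketch of the charged-images analogy for the translation component: edge pixels of the two images attract with inverse-square strength, and the moving image is stepped along the net force until it vanishes. A simple gradient-magnitude threshold stands in for the Canny detector here, and everything below is illustrative rather than the authors' implementation.

  ```python
  import numpy as np

  def edges(img, thresh=0.2):
      # Gradient-magnitude edges as a simple stand-in for the Canny detector.
      gy, gx = np.gradient(img.astype(float))
      mag = np.hypot(gx, gy)
      return mag > thresh * mag.max()

  def net_force(fixed, moving):
      """Net attractive 'force' on the moving image: every pair of edge
      pixels attracts with inverse-square strength."""
      fy, fx = np.nonzero(edges(fixed))
      my, mx = np.nonzero(edges(moving))
      d = np.stack([fy[:, None] - my[None, :],
                    fx[:, None] - mx[None, :]], axis=-1).astype(float)
      r3 = ((d ** 2).sum(-1) + 1e-9) ** 1.5
      return (d / r3[..., None]).sum(axis=(0, 1))   # (dy, dx)

  # Translation-only loop: step along the force until it vanishes. Integer
  # steps here; interpolation would refine the shift to sub-pixel accuracy.
  fixed = np.zeros((64, 64)); fixed[20:40, 20:40] = 1.0
  moving = np.roll(fixed, (3, -2), axis=(0, 1))
  shift = np.array([0, 0])
  for _ in range(50):
      f = net_force(fixed, np.roll(moving, shift, axis=(0, 1)))
      step = np.sign(np.round(f / (np.abs(f).max() + 1e-9)))
      if not step.any():
          break
      shift = shift + step.astype(int)
  print(shift)   # recovered translation: [-3  2]
  ```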