All you need to know about coordinate conversions

Last updated on Apr 7, 2024 · 2 min read

We often need to obtain the position and rotation of a camera or an object from different software applications. However, each application may represent these parameters differently: we might get a camera-to-world matrix, a world-to-camera matrix, a rotation matrix, or other forms such as quaternions or Euler angles.
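A camera-to-world matrix and a world-to-camera matrix are inverses of each other, so either one can be recovered from the other. The sketch below assumes 4x4 homogeneous rigid transforms (rotation plus translation, no scale) that act on column vectors; `invert_rigid` is a hypothetical helper name, not part of any library.

```python
import numpy as np

def invert_rigid(T):
    """Invert a 4x4 rigid transform [R | t] using R^T instead of a general inverse."""
    T = np.asarray(T, dtype=float)
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T          # inverse of a rotation is its transpose
    T_inv[:3, 3] = -R.T @ t      # undo the translation in the rotated frame
    return T_inv

# Example pose (assumed values): camera at (1, 2, 3), rotated 90 degrees about world Z.
c2w = np.array([[0.0, -1.0, 0.0, 1.0],
                [1.0,  0.0, 0.0, 2.0],
                [0.0,  0.0, 1.0, 3.0],
                [0.0,  0.0, 0.0, 1.0]])
w2c = invert_rigid(c2w)

# The camera centre in world coordinates should map to the camera origin.
center_in_camera = w2c @ np.array([1.0, 2.0, 3.0, 1.0])
```

A quick sanity check is that `w2c @ c2w` is the identity and that the camera centre lands on the origin of the camera frame.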

Moreover, our data might come from various sources, such as Blender, Unreal, or CodeMap. We might also want to convert our data to be compatible with other software like OpenCV or PyTorch3D.
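As one concrete case, a camera pose exported from Blender can be adapted to OpenCV by flipping the camera's local Y and Z axes: a Blender camera looks along its local -Z axis with +Y up, whereas an OpenCV camera looks along +Z with +Y down. The following is a minimal sketch, assuming a 4x4 camera-to-world matrix that acts on column vectors and an unchanged world frame; the function names are illustrative only.

```python
import numpy as np

# Flips the camera's local Y and Z axes (X stays pointing right).
FLIP_YZ = np.diag([1.0, -1.0, -1.0, 1.0])

def blender_c2w_to_opencv_c2w(c2w_blender):
    """Re-express a Blender camera-to-world pose with OpenCV camera axes."""
    return np.asarray(c2w_blender, dtype=float) @ FLIP_YZ

def blender_c2w_to_opencv_w2c(c2w_blender):
    """World-to-camera (extrinsic) matrix in the OpenCV camera convention."""
    return np.linalg.inv(blender_c2w_to_opencv_c2w(c2w_blender))

# Assumed example: a Blender camera at the origin with no rotation looks down world -Z.
w2c_cv = blender_c2w_to_opencv_w2c(np.eye(4))

# A world point 5 units in front of the camera and 1 unit above its optical axis.
p_world = np.array([0.0, 1.0, -5.0, 1.0])
p_cam_cv = w2c_cv @ p_world   # positive depth, negative y (y points down in OpenCV)
```

Only the camera-local axes change here; world coordinates are left as they were.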

So, what should we do? Our approach is to introduce an intermediate layer that is more intuitive for users. Every format and data source is converted to and from this layer, so any two of them can exchange information through it.
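The idea of routing every conversion through one intermediate frame can be sketched as a small registry. Everything here is hypothetical: the names `TO_INTERMEDIATE` and `convert_point`, and the two source frames `y_up` and `y_down`, are made-up examples rather than the official conventions of any tool. The intermediate frame is taken to be right-handed with X out of the screen, Y to the right, and Z up.

```python
import numpy as np

# Each matrix maps a point from a source frame INTO the intermediate frame.
TO_INTERMEDIATE = {
    "intermediate": np.eye(3),
    # Hypothetical source frame: X right, Y up, Z out of the screen.
    "y_up": np.array([[0.0, 0.0, 1.0],
                      [1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0]]),
    # Hypothetical source frame: X right, Y down, Z into the screen.
    "y_down": np.array([[0.0,  0.0, -1.0],
                        [1.0,  0.0,  0.0],
                        [0.0, -1.0,  0.0]]),
}

def convert_point(p, src, dst):
    """Convert a 3D point from frame `src` to frame `dst` via the intermediate frame."""
    p_mid = TO_INTERMEDIATE[src] @ np.asarray(p, dtype=float)
    # Change-of-basis matrices are orthogonal, so the inverse is the transpose.
    return TO_INTERMEDIATE[dst].T @ p_mid

# A point 1 unit right, 2 units up, and 3 units toward the viewer.
p_mid = convert_point([1.0, 2.0, 3.0], "y_up", "intermediate")
p_down = convert_point([1.0, 2.0, 3.0], "y_up", "y_down")
```

With this layout, supporting a new format means adding one entry to the registry instead of writing a separate converter for every pair of formats.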

This intermediate layer uses a right-handed coordinate system in which the X-axis points out of the screen, the Y-axis points to the right, and the Z-axis points upwards. In this system, the rotation of an object is represented by pitch, roll, and yaw.
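To make the angle representation concrete, here is a minimal sketch that turns roll, pitch, and yaw into a rotation matrix. The text does not say which axis each angle belongs to or in what order the rotations are applied, so the sketch assumes a common convention: roll about X, pitch about Y, yaw about Z, applied in that order about the fixed axes. `rpy_to_matrix` is a hypothetical helper name.

```python
import numpy as np

def rpy_to_matrix(roll, pitch, yaw):
    """Rotation matrix from roll (X), pitch (Y), yaw (Z), in radians.

    Assumed convention: fixed-axis rotations applied roll first, then pitch,
    then yaw, acting on column vectors.
    """
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])
    Ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    Rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])
    # The rotation applied first sits rightmost in the product.
    return Rz @ Ry @ Rx

# A pure yaw of 90 degrees should swing the X-axis onto the Y-axis.
R = rpy_to_matrix(0.0, 0.0, np.pi / 2)
x_after_yaw = R @ np.array([1.0, 0.0, 0.0])
```

If your data follows a different axis assignment or rotation order, only the final product line needs to change.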

We chose this coordinate system because it is easy to understand and widely used in practice. For example, Blender's world coordinate system is very similar: it is also right-handed with the Z-axis pointing up.

There are two reasons for using pitch, roll, and yaw. First, the representation is intuitive: three Euler angles describe the orientation of an object. Second, the Euler angles here are defined as rotations about the fixed axes of the coordinate system, rather than about axes that rotate along with the object. This makes it easier to visualize the position of a camera or the pose of an object directly. By visualizing these angles, you can check whether your conversion logic is correct.
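The difference between rotating about fixed axes and rotating about axes that move with the object can be checked numerically. The sketch below assumes roll about X, pitch about Y, and yaw about Z, applied in that order about the fixed axes; the text does not specify the axis assignment or order, and `rotate_about` is a hypothetical helper built on Rodrigues' rotation formula.

```python
import numpy as np

def rotate_about(axis, angle):
    """Rotation matrix for `angle` radians about `axis` (Rodrigues' formula)."""
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

X, Y, Z = np.eye(3)                               # fixed world axes
roll, pitch, yaw = np.radians([20.0, 35.0, 50.0])  # arbitrary test angles

# Fixed axes: roll, then pitch, then yaw, always about the original X, Y, Z.
R_fixed = rotate_about(Z, yaw) @ rotate_about(Y, pitch) @ rotate_about(X, roll)

# Moving axes, reverse order: yaw, then pitch about the rotated Y, then roll
# about the twice-rotated X. This reproduces the fixed-axis result.
R_moving = rotate_about(Z, yaw)
R_moving = rotate_about(R_moving @ Y, pitch) @ R_moving
R_moving = rotate_about(R_moving @ X, roll) @ R_moving

# Moving axes, same order (roll, pitch, yaw): a different orientation.
R_other = rotate_about(X, roll)
R_other = rotate_about(R_other @ Y, pitch) @ R_other
R_other = rotate_about(R_other @ Z, yaw) @ R_other
```

Applying `R_fixed` to a known direction, such as the X-axis, and plotting the result is one way to confirm visually that a conversion behaves as expected.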

Zhiyuan Gao

PhD Student@USC