From f8f743032fde5948790706df7858fdf24dfda7e6 Mon Sep 17 00:00:00 2001
From: Swathi Sheshadri
Date: Mon, 2 Dec 2019 15:02:48 +0100
Subject: [PATCH] doc edits from Hans comments

---
 .DS_Store      | Bin 14340 -> 14340 bytes
 paper/paper.md |  19 +++++++++----------
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/.DS_Store b/.DS_Store
index 1dae0c35d527292088a190795e005bea22bbaba5..76feab139487db68137b1eaaee6c95a9f41ac51e 100644
GIT binary patch

diff --git a/paper/paper.md b/paper/paper.md
index 8318a11..19d842e 100755
--- a/paper/paper.md
+++ b/paper/paper.md
@@ -29,21 +29,20 @@ bibliography: paper.bib
 
 # Summary
 
-Markerless tracking is a crucial experimental requirement for behavioral studies conducted in many species in different environments. 
A recently developed toolbox called DeepLabCut (DLC) (@Mathis2018) leverages Artificial Neural Network (ANN) based computer vision to make precise markerless tracking possible for scientific experiments. DLC uses a deep convolutional neural network, ResNet (@He2016) pre-trained on ImageNet database (@Imagenet) and adapts it to make it applicable for behavioral tracking tasks. To track complex behaviors such as grasping with object interaction in 3D, experimental setups with multiple cameras have to be developed. Development of such systems can largely benefit from a robust and easy to use camera calibration and 3D reconstruction toolbox. The current version of DLC allows 3D reconstruction of features tracked in 2D from pairs of cameras only (@Nath2019). This solution is not sufficient when behavior is tracked with more than 2 cameras. Furthermore, for recording conditions that might result in noisy 2D tracking such as low light and low resolution, the accuracy in the tracked 3D coordinate can be improved by using data from more than two cameras.
+Markerless tracking is a crucial experimental requirement for behavioral studies conducted in many species in different environments. A recently developed toolbox called DeepLabCut (DLC) (@Mathis2018) leverages Artificial Neural Network (ANN) based computer vision to make precise markerless tracking possible for scientific experiments. DLC uses a deep convolutional neural network, ResNet (@He2016), pre-trained on the ImageNet database (@Imagenet) and adapts it to behavioral tracking tasks. To track complex behaviors such as grasping with object interaction in 3D, experimental setups with multiple cameras are required. Such systems can benefit greatly from a robust and easy-to-use camera calibration and 3D reconstruction toolbox. The current version of DLC allows 3D reconstruction of features tracked in 2D from pairs of cameras only (@Nath2019) and is therefore not sufficient when behavior is tracked with more than two cameras. Furthermore, under noisy 2D tracking conditions, such as low light or low resolution, the accuracy of tracked 3D coordinates can be improved by evaluating data from more than two cameras.
To facilitate 3D tracking of complex behaviors requiring multiple cameras (n>=2), we developed pose3d: a Matlab (The MathWorks Inc., Natick, Massachusetts) implementation for 3D reconstruction. pose3d consists of two main parts: camera calibration, with pairwise estimation of the cameras' relative positions, and 3D reconstruction, with triangulation performed over the 2D feature coordinates tracked in all cameras. Camera calibration refers to the estimation of the intrinsic camera parameters, including the level of magnification captured by the focal length and the position in space given by the optical center of the lens; the extrinsic camera parameters, including the location of the camera in 3D coordinates; and the distortion of the camera image caused by the lens. In stereo camera calibration, one of the cameras is fixed as the primary camera and the position of the secondary camera with respect to the primary camera is estimated and given as an additional parameter. Stereo camera calibration in pose3d is performed using a checkerboard presented at varied angles to the pair of cameras being calibrated, establishing the correspondence between 2D image coordinates and 3D coordinates. Following calibration, the 2D features tracked across cameras are transformed into 3D coordinates using triangulation. pose3d implements triangulation by minimizing the distance of the estimated 3D feature coordinate from the lines passing through the focal center and the image plane of all cameras simultaneously.
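The triangulation criterion above can be sketched as a least-squares problem. The following is an illustrative Python sketch, not pose3d's actual Matlab code; the function names and the ray parameterization (camera center plus direction) are assumptions made for the example.

```python
# Illustrative sketch of n-camera triangulation as a least-squares
# problem (Python for readability; pose3d itself is Matlab, and these
# function names are invented).  Each camera i back-projects a tracked
# 2D feature into a ray with centre c_i and unit direction d_i; the 3D
# estimate x minimises sum_i ||(I - d_i d_i^T)(x - c_i)||^2, i.e. the
# summed squared distance of x to all rays.  The normal equations are
#   [sum_i (I - d_i d_i^T)] x = sum_i (I - d_i d_i^T) c_i.

def solve3(A, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(3):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * p for a, p in zip(M[r], M[col])]
    return [M[i][3] / M[i][i] for i in range(3)]

def triangulate_rays(centers, directions):
    """Least-squares 3D point closest to n >= 2 camera rays."""
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for c, d in zip(centers, directions):
        norm = sum(x * x for x in d) ** 0.5
        d = [x / norm for x in d]                 # normalise the direction
        for r in range(3):
            for s in range(3):
                P = (1.0 if r == s else 0.0) - d[r] * d[s]
                A[r][s] += P
                b[r] += P * c[s]
    return solve3(A, b)
```

For exactly intersecting rays the estimate recovers the intersection point; with noisy 2D tracking it balances the residuals across all n cameras, which is why data from more than two cameras can improve accuracy.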
# pose3d usage and features -To use pose3d for stereo camera calibration and 3D reconstruction the users need only to edit a configuration file to enter their experiment specific details. The configuration file is extensively commented to help users through the process of editing it. Running the main function of the repository ‘main_pose3d.m’ with user’s configuration file, creates the folder structure required for stereo camera calibration and 3D reconstruction automatically. Then, pose3d launches Matlab’s stereoCameraCalibrator GUI sequentially for every pair of primary and secondary cameras to estimate the parameters of the cameras in the user’s experimental setup. The GUI first parses through checkerboard frames recorded simultaneously from the two cameras being calibrated to select frames in which checkerboard can be detected in both cameras. At this point, the user can select the number of distortion coefficients which determines the degree of polynomial used to estimate distortion. If the lens used in the experiment causes higher distortion of images, 3 distortion coefficients can be used instead of the 2 (default) by simply selecting the radio-button on the GUI corresponding to it. After this, clicking the calibrate button on the GUI, estimates camera parameters and relative positioning of secondary camera with respect to the primary. The GUI also displays reprojection errors across calibration frames allowing users to iteratively recalibrate after removing outliers (this can be done by right-clicking on the data browser of the GUI and selecting the option remove and recalibrate). After calibration is complete, session can be saved by clicking the save as button. This procedure is then repeated for all camera pairs to be calibrated. pose3d prompts users throughout this procedure, providing messages on the next steps to be carried out.
-After calibrating all cameras in the setup with respect to the primary, the 2D feature of interest tracked across cameras are corrected for distortion and 3D coordinate is estimated by triangulation. The function for triangulation across ‘n’ cameras is called ‘triangulate_ncams’. This function is available for users in three modes that can be selected based on prior knowledge of experimental setup. The first mode of triangulation is referred to as ‘all’ and uses data from all ‘n’ cameras that are used to track the feature in 2D. The second mode is referred to as ‘avg’ and performs triangulation over all pairs of cameras in the setup and provides the average over all pairs as the result. The third mode is referred to as ‘best-pair’ and selects the best camera pair for every time point and feature of interest for triangulation. While the first and second modes can be used for 3D reconstruction of features tracked with any software, the third mode is applicable only when tracked 2D features are associated with likelihood values which are a given when using DLC.
-Overall, pose3d provides a semi-automated 3D reconstruction workflow going beyond Matlab’s existing functions (details in section titled ‘Comparison of pose3d to existing Matlab functions') that takes users through the entire process of camera calibration, undistortion, triangulation as well as post processing steps such as filtering to reduce outliers. pose3d also allows users to try different pre- and post-processing parameters and visualize its effect on the accuracy of 3D reconstruction to help perform manual parameter tuning before saving final results. There exists other OpenCV (@opencv) based implementations in python particularly Anipose (@Anipose) which provide similar functionality as pose3d. However, in comparison to OpenCV based implementations, pose3d provides a user-friendly application by integrating with the feature-rich graphical user interface (GUI) for stereo camera calibration in Matlab (The MathWorks Inc., Natick, Massachusetts). A more detailed comparison between pose3d and anipose is included in a subsequent section titled 'Comparison of pose3d and anipose'. Furthermore, we provide two demo datasets (details of which are provided in the next paragraph) illustrating the different capabilities of pose3d that users can refer to and get a quick-view of the features included in pose3d. Given the popularity of Matlab in academia and its well documented and easy to use core functions for camera calibration, we believe this toolbox will help make 3D reconstruction of tracked 2D behavior easier for researchers to use.
+To use pose3d for stereo camera calibration and 3D reconstruction, users need only edit a configuration file to enter their experiment-specific details. The configuration file is extensively commented to guide users through the process of editing it. Running the main function of the repository, ‘main_pose3d.m’, with the user’s configuration file automatically creates the folder structure required for stereo camera calibration and 3D reconstruction. pose3d then launches Matlab’s stereoCameraCalibrator GUI sequentially for every pair of primary and secondary cameras to estimate the parameters of the cameras in the user’s experimental setup. The GUI first parses the checkerboard frames recorded simultaneously from the two cameras being calibrated and selects frames in which the checkerboard can be detected in both cameras. At this point, the user can select the number of distortion coefficients, which determines the degree of the polynomial used to estimate distortion. If the lens used in the experiment causes stronger image distortion, 3 distortion coefficients can be used instead of the default 2 by simply selecting the corresponding radio button on the GUI. After this, clicking the calibrate button on the GUI estimates the camera parameters and the relative positioning of the secondary camera with respect to the primary. The GUI also displays reprojection errors across calibration frames, allowing users to iteratively recalibrate after removing outliers (this can be done by right-clicking on the data browser of the GUI and selecting the option remove and recalibrate). After calibration is complete, the session can be saved by clicking the save as button. This procedure is then repeated for all camera pairs to be calibrated. pose3d prompts users throughout this procedure, providing messages on the next steps to be carried out.
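A user's configuration might look like the following sketch. Only exp_path and exp_name are variable names documented for pose3d; every other field name and value here is a hypothetical placeholder, not pose3d's actual configuration schema.

```matlab
% Hypothetical sketch of a pose3d configuration file -- only exp_path and
% exp_name are documented variable names; the remaining fields are
% illustrative placeholders.
exp_path   = '/data/grasping_study';  % same path across sessions reuses calibration
exp_name   = 'session_01';            % change this per recording session
ncams      = 5;                       % number of cameras (hypothetical field)
primary    = 1;                       % index of the primary camera (hypothetical)
mode       = 'all';                   % triangulation mode: 'all', 'avg' or 'best-pair'
llh_thresh = 0.9;                     % DLC likelihood threshold (hypothetical)
```

Running ‘main_pose3d.m’ with such a file then creates the folder structure and launches the calibration GUI as described above.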
+After calibrating all cameras in the setup with respect to the primary camera, the 2D features of interest tracked across cameras are corrected for distortion and the 3D coordinates are estimated by triangulation. The function for triangulation across ‘n’ cameras is called ‘triangulate_ncams’. This function is available in three modes that can be selected based on prior knowledge of the experimental setup. The first mode, referred to as ‘all’, uses data from all ‘n’ cameras that track the feature in 2D. The second mode, referred to as ‘avg’, performs triangulation over all pairs of cameras in the setup and returns the average over all pairs as the result. The third mode, referred to as ‘best-pair’, selects the best camera pair for every time point and feature of interest for triangulation. While the first and second modes can be used for 3D reconstruction of features tracked with any software, the third mode is applicable only when tracked 2D features are associated with likelihood values, e.g., as provided by DLC.
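The ‘avg’ and ‘best-pair’ modes can be illustrated schematically. The Python sketch below is purely illustrative (the helper name is invented; ‘triangulate_ncams’ itself is a Matlab function): it combines per-pair 3D estimates, whereas the ‘all’ mode would instead solve a single triangulation over every camera jointly.

```python
# Illustrative sketch (hypothetical helper, not pose3d's API): combining
# per-camera-pair 3D estimates under the 'avg' and 'best-pair' modes.
# The 'all' mode would instead triangulate once using all cameras.

def combine_pairwise(pair_estimates, likelihoods, mode):
    # pair_estimates: {(cam_i, cam_j): (x, y, z)} per-pair triangulations
    # likelihoods:    per-camera 2D tracking confidence ('best-pair' only)
    if mode == "avg":
        # Average the 3D estimates over all camera pairs.
        pts = list(pair_estimates.values())
        return tuple(sum(p[k] for p in pts) / len(pts) for k in range(3))
    if mode == "best-pair":
        # Pick the pair whose cameras have the highest combined likelihood.
        i, j = max(pair_estimates,
                   key=lambda ij: likelihoods[ij[0]] + likelihoods[ij[1]])
        return pair_estimates[(i, j)]
    raise ValueError("unknown mode: %s" % mode)
```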
+Overall, pose3d offers a semi-automated 3D reconstruction workflow going beyond Matlab’s existing functions (details in the section titled ‘Comparison of pose3d to existing Matlab functions’) that takes users through the entire process of camera calibration, undistortion, triangulation, and post-processing steps such as filtering to reduce outliers. pose3d also allows users to try different pre- and post-processing parameters and visualize their effect on the accuracy of 3D reconstruction, helping to tune parameters manually before saving final results. Other OpenCV (@opencv) based Python implementations with similar functionality exist, notably Anipose (@Anipose). However, in comparison to OpenCV based implementations, pose3d provides a user-friendly application by integrating with a feature-rich graphical user interface (GUI) for stereo camera calibration in Matlab. A more detailed comparison between pose3d and anipose is included in the subsequent section ‘Comparison of pose3d and anipose’. Furthermore, we provide two demo datasets (see next paragraph) illustrating the capabilities of pose3d and giving a quick view of the included features. Given the popularity of Matlab in academia and its well documented and easy-to-use core functions for camera calibration, we believe this toolbox will make 3D reconstruction of tracked 2D behavior easier for researchers.
# Demo datasets and error measurement in 3D reconstruction -Demo datasets provided in pose3d were acquired by moving a Rubik’s cube on a table along different directions and by recording it simultaneously from a 5-camera tracking system. For the demos, we track the corners of the cube using DLC in the first example and manually in the second example. Manual annotations are used to mimic the usage of pose3d for any other 2D tracking software.
-The key difference between the two examples is as follows. DLC, in addition to 2D tracking provides users with a likelihood value for every tracked feature that informs the users on how confident the network is about the inferred location of a particular feature of interest at any given time point. pose3d makes use of this information by applying a threshold and automatically selecting the cameras that cross this threshold for 3D reconstruction. From the 2D tracked corners we use pose3d to track corners in 3D for 1000 example frames with DLC and 20 example frames with manual annotations. Following this, we reconstruct the edges of the cube and compare it to the standard edge length of a Rubik’s cube (57 mm). In the demo data using DLC based 2D annotations, we obtain on average an error of 1.39 mm in 3D reconstructed edge lengths computed over all 12 edges of the cube across 1000 example frames. For the demo data using manual annotations across 20 frames we obtain an average error of 1.16 mm over all 12 edges computed over 20 manually annoted frames. Furthermore, using ‘all’ mode of triangulation provided significantly better results in both our demo datasets than the other two modes of triangulation (comparison tests for the 3 modes of 3D reconstruction included in the demo functions for reference).
+Demo datasets provided in pose3d were acquired by moving a Rubik’s cube on a table along different directions while recording it simultaneously with a 5-camera tracking system. For the demos, we tracked the corners of the cube using DLC in the first example and manually in the second example. Manual annotations are used to mimic the usage of pose3d with any other 2D tracking software.
+The key difference between the two examples is as follows. In addition to 2D tracking, DLC provides users with a likelihood value for every tracked feature that indicates how confident the network is about the inferred location of a particular feature of interest at any given time point. pose3d makes use of this information by applying a threshold and automatically selecting the cameras that cross this threshold for 3D reconstruction. From the 2D tracked corners, we use pose3d to track the corners in 3D for 1000 example frames with DLC and 20 example frames with manual annotations. Following this, we reconstruct the edges of the cube and compare them to the standard edge length of a Rubik’s cube (57 mm). In the demo data using DLC based 2D annotations, we obtain an average error of 1.39 mm in the 3D reconstructed edge lengths, computed over all 12 edges of the cube across 1000 example frames. For the demo data using manual annotations, we obtain an average error of 1.16 mm over all 12 edges, computed over 20 manually annotated frames. Furthermore, the ‘all’ mode of triangulation gave significantly better results in both of our demo datasets than the other two modes (comparison tests for the 3 modes of 3D reconstruction are included in the demo functions for reference).
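The likelihood-based camera selection and the edge-length error measure described above can be sketched as follows. This is an illustrative Python sketch, not pose3d's Matlab code; the function names and the 0.9 default threshold are assumptions.

```python
# Illustrative sketches (hypothetical names, not pose3d's API).

def select_cameras(points2d, likelihoods, threshold=0.9):
    # Keep only cameras whose DLC likelihood for this feature and frame
    # crosses the threshold (the 0.9 default is an assumed value).
    return [p for p, l in zip(points2d, likelihoods) if l >= threshold]

def mean_edge_error(corners3d, edges, true_len=57.0):
    # corners3d: {corner_id: (x, y, z)}; edges: pairs of corner ids.
    # Compares reconstructed edge lengths to the 57 mm Rubik's cube edge.
    errs = []
    for a, b in edges:
        d = sum((u - v) ** 2
                for u, v in zip(corners3d[a], corners3d[b])) ** 0.5
        errs.append(abs(d - true_len))
    return sum(errs) / len(errs)
```

In the demos this kind of error measure would be averaged over all 12 edges and all annotated frames.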
# Comparison of pose3d to existing Matlab functions -pose3d is a 3D reconstruction workflow integrated closely with functions from Matlab’s computer vision toolbox. This workflow fixes one of the cameras in the experimental setup as primary which is crucial to ensure that the data points across camera pairs are within the same coordinate system. The stereo camera calibration part of our implementation can be considered a wrapper around functions from Matlab’s computer vision toolbox, however, the triangulation function included with pose3d provides an extension to the existing function in Matlab for triangulation which is currently for use with 2 cameras only.
+pose3d is a 3D reconstruction workflow integrated closely with functions from Matlab’s computer vision toolbox. This workflow fixes one of the cameras in the experimental setup as the primary camera, which is crucial to ensure that the data points are within the same coordinate system across camera pairs. The stereo camera calibration part of our implementation can be considered a wrapper around functions from Matlab’s computer vision toolbox; however, the triangulation function included with pose3d extends Matlab’s existing triangulation function, which operates on 2 cameras only.
# Comparison of pose3d and anipose -Functionally, pose3d and anipose provide users with pre- and post-processing tools for 3D reconstruction including the process of camera calibration. The main difference is that pose3d provides a user-friendly GUI utility which is missing in Anipose. With pose3d by entering experiment details in configuration file and running a single function in Matlab namely ‘main_pose3d.m’ the user can perform the entire procedure of 3D reconstruction. Furthermore, integration of pose3d with stereoCameraCalibrator GUI in Matlab makes the task of camera calibration very intuitive. This allows users to estimate camera and lens distortion parameters, select outlier frames from calibration with the click of a button. -On the other hand, anipose uses OpenCV functions which can be quite difficult for non-experts to get into. For instance, with pose3d, due to its integration with Matlab’s stereoCameraCalibrator GUI, all pairs of images used for camera calibration are visualized along with their reprojection errors allowing easy identification and removal of images with larger error values. With OpenCV functions, users get error messages that are often quite technical in nature and can take longer for non-experts to perform root-cause analysis. Lastly, with pose3d demo datasets including example calibration images are included along with codes to run 3D reconstruction on demo datasets. Furthermore, tutorial videos illustrating the usage of pose3d are included.
-Some other minor similarities and differences between anipose and pose3d include the following. First, both anipose and pose3d allow users to have the same set of calibration files across sessions or have a separate set per session. This can be decided by the user depending on if the user wants to have the same calibration files across sessions or not. This is possible in pose3d by having the same experiment path (assigned to variable exp_path) and changing only the experiment name (assigned to variable exp_name) in the configuration file when the same calibration files are to be used across sessions. With anipose this is implemented by having a hierarchy for the selection of calibration videos placed in different folders. Second, the triangulation approach used is the same in the two packages. However, with pose3d for comparison purposes we have included 3 modes of 3D reconstruction namely 'all', 'avg' and 'best-pair'. Of these three modes 'all' corresponds to the one implemented in anipose. On our demo dataset, we have shown 'all' mode to give the lowest on average error and is the recommended mode. Third, anipose allows for camera calibration using charuco boards, checker boards or aruco boards of which it recommends the usage of charuco and checker boards. Since pose3d integrates with the camera calibration GUI from MATLAB, only checker board is used for calibration.
+Functionally, pose3d and anipose provide users with pre- and post-processing tools for 3D reconstruction, including the process of camera calibration. The main difference is that pose3d provides a user-friendly GUI utility, which is missing in Anipose. With pose3d, by entering the experimental details in a configuration file and running a single Matlab function (‘main_pose3d.m’), the user can perform the entire procedure of 3D reconstruction. Furthermore, the integration of pose3d with the stereoCameraCalibrator GUI in Matlab makes the task of camera calibration very intuitive. This allows users to estimate camera and lens distortion parameters and to remove outlier calibration frames with the click of a button. On the other hand, anipose uses OpenCV functions that can be quite difficult for non-experts to operate. For instance, with pose3d, due to its integration with Matlab’s stereoCameraCalibrator GUI, all pairs of images used for camera calibration are visualized along with their reprojection errors, which allows easy identification and removal of images with larger error values. With OpenCV functions, users get error messages that are often quite technical in nature, so root-cause analysis can take non-experts longer. Lastly, pose3d provides demo datasets, including example calibration images, along with code to run 3D reconstruction on the demo datasets, as well as video tutorials illustrating its operation.
+Some other minor similarities and differences between anipose and pose3d include the following. First, both anipose and pose3d allow users to share the same set of calibration files across sessions or to have a separate set per session. In pose3d, keeping the same experiment path (assigned to the variable exp_path) and changing only the experiment name (assigned to the variable exp_name) in the configuration file reuses the same calibration files across sessions. With anipose this is implemented by a hierarchy for the selection of calibration videos placed in different folders. Second, the triangulation approach used is the same in the two packages. However, with pose3d we have included 3 modes of 3D reconstruction, namely 'all', 'avg', and 'best-pair'. Of these three modes, 'all' corresponds to the one implemented in anipose. With our demo dataset, we have shown that the 'all' mode gives the lowest average error and is therefore recommended. Third, anipose allows for camera calibration using charuco boards, checkerboards, or aruco boards, of which it recommends charuco and checkerboards. Since pose3d integrates with the camera calibration GUI from MATLAB, only checkerboards are used for calibration.
For further technical reading on the details of triangulation for 3D reconstruction, please refer to our [supporting document](Appendix.pdf).
# References