
GS2Mesh: Surface Reconstruction from Gaussian Splatting via Novel Stereo Views

Yaniv Wolf · Amit Bracha · Ron Kimmel

ECCV 2024

[Figure: GS2Mesh pipeline overview]

We introduce a novel pipeline for multi-view 3D reconstruction from uncontrolled videos, capable of extracting smooth meshes from noisy Gaussian Splatting clouds. While other geometry-extraction methods rely on the locations of the Gaussian elements and attempt to regularize them with geometric constraints, we simply use a pre-trained stereo model as a real-world geometric prior: it extracts accurate depth from every view, and the depths are fused into a smooth mesh without any further optimization. Our method achieves state-of-the-art reconstruction results and requires only a small overhead on top of the GS optimization.

Please read "Common Issues and Tips" before opening a new issue!

Updates

  • 2024/09/11:
    • Small fix in renderer.
    • Removed old functions.
  • 2024/09/01:
    • Added support for automatic masking using GroundingDINO! Now you can remove the background automatically or reconstruct a specific object in the scene using a text prompt as input to the script. See the "Custom Data" section for more details.
    • Improved visualizations in custom_data.ipynb - You can now see which areas of the scene will be taken into account in the TSDF process when visualizing the renderer poses.
    • Cleaned function calls.
    • Improved handling of non-360 scenes, enabled with the no-renderer_scene_360 parameter for scenes that aren't 360 (for example, DTU scenes). If your scene is captured from a partial sphere, set this parameter to get a better estimate of the scene radius and the required horizontal baseline.
  • 2024/08/22:
    • Added option for local SAM2 weights.
  • 2024/08/21:
    • Small bug fix in renderer initialization.
  • 2024/08/14:
    • Added support for SAM2! You can now extract meshes of specific objects with ease using the interactive notebook custom_data.ipynb.
    • Changed the default CUDA to 11.8 and the default Python to 3.8 for better compatibility. If you have already installed a previous version of the environment, please remove it using "conda remove -n gs2mesh --all" and re-install it using the instructions below.
    • Improved COLMAP on custom data to fix distortion.
    • Added support for downsampling - relevant especially for MipNeRF360 dataset.
  • 2024/07/30: Code is released! Includes preprocessing, reconstruction and evaluation code for DTU, TNT and MobileBrick, as well as code for reconstruction of custom data and MipNerf360.

Future Work

  • Add support for additional state-of-the-art GS models, and the gsplat framework
  • Add support for automatic background removal with SAM2 (without the need for an interactive prompt)
  • Add support for SAM2 in the interactive notebook
  • Release Google Colab demo
  • Release notebook for interactive reconstruction of custom data
  • Release reconstruction/evaluation code for DTU, TNT, MobileBrick, MipNerf360

Installation

Environment

We tested the environment on Ubuntu 20.04 with Python 3.8.18, CUDA 11.8, and an Nvidia A40/L40 GPU.

First, clone the repository:

git clone https://github.com/yanivw12/gs2mesh.git

Then, create the environment, activate it and install dependencies:

# create the environment
conda create --name gs2mesh python=3.8
# activate the environment
conda activate gs2mesh
# install conda dependencies
conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=11.8 cudatoolkit=11.8 colmap -c pytorch -c nvidia -c conda-forge
# install additional dependencies
cd gs2mesh
pip install -r requirements.txt
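
Optionally, you can run a quick sanity check that PyTorch was installed with CUDA support (torch.cuda.is_available and torch.version.cuda are standard PyTorch calls; this check is not part of the original instructions):

# optional: should print True and 11.8
python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"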

Download DLNR Stereo weights

# create the folder for the pretrained models
cd third_party/DLNR
mkdir pretrained
cd pretrained
# download Middlebury model
wget https://github.com/David-Zhao-1997/High-frequency-Stereo-Matching-Network/releases/download/v1.0.0/DLNR_Middlebury.pth
# download Sceneflow model
wget https://github.com/David-Zhao-1997/High-frequency-Stereo-Matching-Network/releases/download/v1.0.0/DLNR_SceneFlow.pth

Download SAM2 weights

Downloading local SAM2 weights is optional, as SAM2 automatically downloads its weights from Hugging Face. If you still prefer local weights, download them and set the masker_SAM2_local parameter to True.

# navigate to the folder for the pretrained model
cd third_party/segment-anything-2/checkpoints
# download pretrained model
wget https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_large.pt
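
If you downloaded the local weights, pass the masker_SAM2_local parameter when running the reconstruction. The example below is hypothetical and assumes the parameter is exposed as a boolean command-line flag, as listed in the parameters table below:

# example: use the locally downloaded SAM2 weights instead of the automatic Huggingface download
python run_single.py --colmap_name <data_name> --masker_SAM2_local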

Download GroundingDINO weights

# create the folder for the pretrained model
cd third_party/GroundingDINO
mkdir weights
cd weights
# download pretrained model
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth

Datasets

DTU

For the DTU dataset, download the preprocessed dataset from 2DGS, the DTU Sample Set with observation masks from the DTU website, and the ground-truth point clouds, also from the DTU website.

The dataset should look as follows, under gs2mesh/data/DTU/:

DTU
|---SampleSet
|	|---...
|	|---MVS_Data
|	|	|---...
|	|	|---Points
|---scan24
|	|---...
|	|---depths
|	|---images
|---...

Notice that the ground-truth points are placed in gs2mesh/data/DTU/SampleSet/MVS_Data/.

To run the reconstruction and evaluation, run the command

python run_and_evaluate_dtu.py

For additional parameters, see the 'parameters' table below.

The output of the evaluation should be in gs2mesh/evaluation/DTU/eval_output/.
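
If you only want to process specific scans, the scans parameter from the table below can presumably be overridden on the command line; the exact list syntax below is an assumption and may differ in practice:

# hypothetical example: reconstruct and evaluate only scans 24 and 37
python run_and_evaluate_dtu.py --scans 24 37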

Tanks and Temples

For the TNT dataset, download the training dataset from the TNT website. You will need the image set, Camera Poses, Alignment, Cropfiles and ground truth. The dataset should be organized as follows, under gs2mesh/data/TNT/:

TNT
|---Barn
|	|---images_raw
|	|	|---000001.jpg
|	|	|---...
|	|---Barn_COLMAP_SfM.log
|	|---Barn_trans.txt
|	|---Barn.json
|	|---Barn.ply
|---...

Run the preprocessing code:

python preprocess_tnt.py

Afterwards, the dataset should look as follows:

TNT
|---Barn
|	|---images
|	|	|---000001.jpg
|	|	|---...
|	|---sparse
|	|	|---0
|	|	|	|---cameras.bin
|	|	|	|---cameras.txt
|	|	|	|---images.bin
|	|	|	|---images.txt
|	|	|	|---points3D.bin
|	|	|	|---points3D.txt
|	|---Barn_COLMAP_SfM.log
|	|---Barn_trans.txt
|	|---Barn.json
|	|---Barn.ply
|	|---database.db
|---...

To run the reconstruction, run the command

python run_tnt.py

For additional parameters, see the 'parameters' table below.

The TNT evaluation code requires an older version of Open3D (0.9). Create a separate Conda environment with the following libraries:

matplotlib>=1.3
open3d==0.9
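
For example, such an environment can be created as follows (a sketch; the environment name and the Python version are assumptions, chosen because Open3D 0.9 requires an older Python such as 3.7):

# create and activate a separate environment for the TNT evaluation
conda create --name gs2mesh_tnt_eval python=3.7
conda activate gs2mesh_tnt_eval
# install the required libraries
pip install "matplotlib>=1.3" open3d==0.9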

Then, inside that environment, run the command

python evaluate_tnt.py

Use the same parameters that you used for the reconstruction, as the evaluation script constructs the PLY filename from these parameters.

The output of the evaluation should be in gs2mesh/evaluation/TNT/eval_output/.

Note: The results reported in the paper are from an older version where the RADIAL camera model was used as input to GS (an older GS version as well). The current results using the evaluation code with the updated GS model and a PINHOLE camera model are slightly better:

TNT Results
Method | Barn | Caterpillar | Ignatius | Truck | Mean
Paper (RADIAL) | 0.21 | 0.17 | 0.64 | 0.46 | 0.37
Code (PINHOLE) | 0.33 | 0.20 | 0.68 | 0.47 | 0.42

MobileBrick

For the MobileBrick dataset, download the dataset from the official site and extract it into gs2mesh/data/MobileBrick/.

Run the preprocessing code:

python preprocess_mobilebrick.py

To run the reconstruction and evaluation, run the command

python run_and_evaluate_mobilebrick.py

For additional parameters, see the 'parameters' table below.

The output of the evaluation should be in gs2mesh/evaluation/MobileBrick/eval_output/.

Note: In the results reported in the paper, we used a horizontal baseline of 14% of the scene radius instead of 7%. The results for the 7% baseline are slightly worse (most likely because the camera views are above the object, making the scene radius significantly different from the camera-location radius, which is used to calculate the baseline):

MobileBrick Results
Method | Accuracy (2.5mm) | Recall (2.5mm) | F1 (2.5mm) | Accuracy (5mm) | Recall (5mm) | F1 (5mm) | Chamfer Distance
Paper (14%) | 68.77 | 69.27 | 68.94 | 89.46 | 87.37 | 88.28 | 4.94
Code (7%) | 67.63 | 69.00 | 68.20 | 88.57 | 87.36 | 87.85 | 5.05
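
If you want to override the horizontal baseline on the command line (for example, to reproduce the "Code (7%)" row above), the renderer_baseline_percentage parameter from the table below can be used; the exact invocation is an assumption:

# hypothetical example: run MobileBrick with a horizontal baseline of 7% of the scene radius
python run_and_evaluate_mobilebrick.py --renderer_baseline_percentage 7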

MipNerf360

For the MipNerf360 dataset, download the dataset from the official site and extract it into gs2mesh/data/MipNerf360/.

Run the preprocessing code:

python preprocess_mipnerf360.py

To run the reconstruction, run the command

python run_mipnerf360.py

Results are saved in gs2mesh/output/MipNerf360_<params>.

Note: Since the original resolution of the MipNeRF360 scenes is very high, we added a downsample argument that reduces the size of the images before processing.
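
For example (a sketch; the factor 3 matches the MipNerf360 default in the parameters table below):

# example: downsample the MipNeRF360 images by a factor of 3 before processing
python run_mipnerf360.py --downsample 3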

Custom Data

Place your data under gs2mesh/data/custom/<data_name>.

If your input is a video:

custom
|---<data_name>
|	|---<data_name>.<extension>

If your input is a set of images:

custom
|---<data_name>
|	|---images
|	|	|---000001.png
|	|	|---...

If your input is a COLMAP dataset:

custom
|---<data_name>
|	|---images
|	|	|---000001.png
|	|	|---...
|	|---sparse
|	|	|---0
|	|	|	|---cameras.bin
|	|	|	|---cameras.txt
|	|	|	|---images.bin
|	|	|	|---images.txt
|	|	|	|---points3D.bin
|	|	|	|---points3D.txt

You can either run the interactive notebook custom_data.ipynb, which also provides visualizations and allows masking objects interactively, or run the script run_single.py with the automatic masking parameters:

# if your input is a video
python run_single.py --colmap_name <data_name> --video_extension <video_extension without '.'>
# if your input is a set of images
python run_single.py --colmap_name <data_name> --skip_video_extraction
# if your input is a COLMAP dataset
python run_single.py --colmap_name <data_name> --skip_video_extraction --skip_colmap

# For automatic masking add params:
--masker_automask --masker_prompt <object_name>
# (default object name is main_object which works on most object-centered scenes)

Results are saved in gs2mesh/output/custom_<params>.
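
As a fuller, illustrative example, a 30 FPS video with automatic object masking could be processed as follows; the video_interval value of 10 follows the ~3 FPS guideline from the parameters table, and all values are placeholders you should adapt:

# hypothetical example: video input, keep every 10th frame (~3 FPS from a 30 FPS video), automatic masking
python run_single.py --colmap_name <data_name> --video_extension mp4 --video_interval 10 --masker_automask --masker_prompt main_object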

Parameters
Parameter | Description | Custom | DTU | TNT | MobileBrick | MipNerf360
colmap_name | Name of the directory with the COLMAP sparse model | sculpture | scan24 | Ignatius | aston | counter
dataset_name | Name of the dataset. Options: custom, DTU, TNT, MobileBrick, MipNerf360 | custom | DTU | TNT | MobileBrick | MipNerf360
experiment_folder_name | Name of the experiment folder | None | None | None | None | None
downsample | Downsampling factor | 1 | 1 | 1 | 1 | 3
GS_iterations | Number of Gaussian Splatting iterations | 30000 | 30000 | 30000 | 30000 | 30000
GS_save_test_iterations | Gaussian Splatting test iterations to save | [7000, 30000] | [7000, 30000] | [7000, 30000] | [7000, 30000] | [7000, 30000]
GS_white_background | Use white background in Gaussian Splatting | False | False | False | False | False
renderer_max_images | Maximum number of images to render | 1000 | 1000 | 1000 | 1000 | 1000
renderer_baseline_absolute | Absolute value of the renderer baseline (None uses 7 percent of scene radius) | None | None | None | None | None
renderer_baseline_percentage | Percentage value of the renderer baseline | 7 | 7 | 7 | 14 | 7
renderer_scene_360 | Scene is a 360 scene | True | True | True | True | True
no-renderer_scene_360 | Scene is not a 360 scene | False | False | False | False | False
renderer_folder_name | Name of the renderer folder (None uses the colmap name) | None | None | None | None | None
renderer_save_json | Save renderer data to JSON (default True - disable with no-renderer_save_json) | True | True | True | True | True
no-renderer_save_json | Disable renderer_save_json | False | False | False | False | False
renderer_sort_cameras | Sort cameras in the renderer (True if using unordered set of views) | False | False | False | False | False
stereo_model | Stereo model to use | DLNR_Middlebury | DLNR_Middlebury | DLNR_Middlebury | DLNR_Middlebury | DLNR_Middlebury
stereo_occlusion_threshold | Occlusion threshold for stereo model (lower value masks out more areas) | 3 | 3 | 3 | 3 | 3
stereo_warm | Use the previous disparity as initial disparity for current view (disable with no-stereo_warm) | False | True | True | True | False
no-stereo_warm | Disable stereo_warm | ---- | False | False | False | ----
stereo_shading_eps | Small value used for visualization of the depth gradient. Adjusted according to the scale of the scene. | 1e-4 | 1e-4 | 1e-4 | 1e-4 | 1e-4
masker_automask | Use GroundingDINO for automatic object detection for masking with SAM2 | False | ---- | ---- | ---- | ----
masker_prompt | Prompt for GroundingDINO | main_object | ---- | ---- | ---- | ----
masker_SAM2_local | Use local SAM2 weights | False | ---- | ---- | ---- | ----
TSDF_scale | Fix depth scale | 1.0 | 1.0 | 1.0 | 0.1 | 1.0
TSDF_dilate | Take every n-th image (1 to take all images) | 1 | 1 | 1 | 1 | 1
TSDF_valid | Choose valid images as a list of indices (None to ignore) | None | None | None | None | None
TSDF_skip | Choose non-valid images as a list of indices (None to ignore) | None | None | None | None | None
TSDF_use_occlusion_mask | Ignore occluded regions in stereo pairs for better geometric consistency (default True - disable with no-TSDF_use_occlusion_mask) | True | True | True | True | True
no-TSDF_use_occlusion_mask | Disable TSDF_use_occlusion_mask | False | False | False | False | False
TSDF_use_mask | Use object masks (optional) (default True - disable with no-TSDF_use_mask) | False | True | False | True | False
no-TSDF_use_mask | Disable TSDF_use_mask | ---- | False | ---- | False | ----
TSDF_invert_mask | Invert the background mask for TSDF | False | False | False | False | False
TSDF_erode_mask | Erode masks in TSDF (default True - disable with no-TSDF_erode_mask) | True | True | True | True | True
no-TSDF_erode_mask | Disable TSDF_erode_mask | False | False | False | False | False
TSDF_erosion_kernel_size | Erosion kernel size in TSDF | 10 | 10 | 10 | 10 | 10
TSDF_closing_kernel_size | Closing kernel size in TSDF | 10 | 10 | 10 | 10 | 10
TSDF_voxel | Voxel size (voxel length is TSDF_voxel/512) | 2 | 2 | 2 | 2 | 2
TSDF_sdf_trunc | SDF truncation in TSDF | 0.04 | 0.04 | 0.04 | 0.04 | 0.04
TSDF_min_depth_baselines | Minimum depth baselines in TSDF | 4 | 4 | 2 | 4 | 4
TSDF_max_depth_baselines | Maximum depth baselines in TSDF | 20 | 20 | 10 | 20 | 15
TSDF_cleaning_threshold | Minimal cluster size for clean mesh | 100000 | 100000 | 100000 | 10000 | 100000
video_extension | Video file extension | mp4 | ---- | ---- | ---- | ----
video_interval | Extract every n-th frame - aim for 3fps | 10 | ---- | ---- | ---- | ----
GS_port | GS port number (relevant if running several instances at the same time) | 8080 | 8080 | 8080 | 8080 | 8080
skip_video_extraction | Skip the video extraction stage | False | True | True | True | True
skip_colmap | Skip the COLMAP stage | False | True | True | True | True
skip_GS | Skip the GS stage | False | False | False | False | False
skip_rendering | Skip the rendering stage | False | False | False | False | False
skip_masking | Skip the masking stage | False | False | False | False | False
skip_TSDF | Skip the TSDF stage | False | False | False | False | False
scans | Scan names/numbers | ---- | [24, 37, 40, 55, 63, 65, 69, 83, 97, 105, 106, 110, 114, 118, 122] | [Barn, Caterpillar, Ignatius, Truck] | [aston, audi, beetles, big_ben, boat, bridge, cabin, camera, castle, colosseum, convertible, ferrari, jeep, london_bus, motorcycle, porsche, satellite, space_shuttle] | [counter, garden]

Common Issues and Tips

Below are some common issues raised by users. Please note that we are constantly improving the code and adding new features according to user feedback.

  • TSDF integration gets killed/segfault. This is a bug with Open3D ScalableTSDFVolume. It seems to happen in older versions of Ubuntu such as 18.04, and in newer versions of Python (from our experiments, it happened on Python 3.9 and 3.10, and worked on Python 3.7 and 3.8). To avoid this bug, we recommend using Ubuntu 20.04 with Python 3.8 and Open3D 0.17.0.
  • Mesh not visible. The main culprit is usually TSDF_min/max_depth_baselines. This parameter truncates the depth maps before TSDF fusion and is expressed as a multiple of the horizontal baseline. The default for a custom 360 scene with a 7% baseline is between 4 and 20 baselines. If the object is not visible, it usually means that your baseline is too small. If your scene is not a 360 scene (for example, the DTU dataset), use a larger baseline. To avoid dealing with the min/max depth baselines, you can segment the object using SAM2 in the interactive notebook custom_data.ipynb, or using the parameters --masker_automask and optionally --masker_prompt <object> as input to the script run_single.py, and set TSDF_max_depth_baselines to a high value that won't affect truncation.
  • Poor mesh quality. There are many factors that contribute to the quality of the mesh:
    • The video should cover the reconstructed object completely. We recommend the following practices for a good video:
      • Filming in landscape.
      • If the object is larger, do 2 cycles around the object from 2 different heights and angles, while keeping the object centered and close to the camera. It's ok if the object is not fully in the frame at all times.
      • When extracting the video, make sure you maintain around 3 FPS. Lower frame rates can cause COLMAP/GS to fail or produce low quality reconstructions.
      • We recommend a resolution of 1920x1080, since larger resolutions can take much longer to process, and lower resolutions reduce the resulting reconstruction's quality.
    • Naturally, stereo matching models tend to struggle with transparent/textureless surfaces, and the resulting mesh might be noisy/missing in these places. We hope to improve robustness in an upcoming release. Setting the no-stereo_warm flag appears to help in this case, since it doesn't propagate errors between depth maps.
    • Sometimes, increasing the horizontal baseline for larger objects can improve stereo matching. Make sure to adjust TSDF_min/max_depth_baselines proportionally to how much you changed the default baseline percentage (for example, if you double the percentage from 7 to 14, divide the values of TSDF_min/max_depth_baselines by 2); see the example command after this list.
    • With custom datasets that come with pre-computed poses (other than the officially supported ones), the TSDF_scale parameter might need to be adjusted (usually from 1 to 0.1); otherwise, the resolution of the resulting mesh will be very low due to a mismatch with the scale the TSDF algorithm expects.
  • Code runs slow. There are several factors that can slow down the code:
    • Image size - Images larger than 1920x1080 take much longer to process in the stereo model and TSDF. Set the downsample argument to reduce the size of the images at the beginning of the process.
    • Depth truncation - If the depth truncation for TSDF is too loose, the TSDF will struggle to integrate the background and take much longer. Make sure that you are either truncating depth to ignore the background, or using a segmentation mask on the object you want to reconstruct using the automatic masking tool or the interactive masking tool.
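
As an example of the baseline adjustment mentioned above: doubling the default custom-scene baseline from 7% to 14% means halving the default depth-truncation values of 4 and 20 baselines. The invocation below is illustrative only:

# hypothetical example: doubled baseline (7% -> 14%), so the default min/max depth baselines (4/20) are halved to 2/10
python run_single.py --colmap_name <data_name> --renderer_baseline_percentage 14 --TSDF_min_depth_baselines 2 --TSDF_max_depth_baselines 10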

Acknowledgements

We thank the following works and repositories:

  • 3DGS for their pioneering work which made all this possible.
  • DLNR for their stereo model.
  • Open3D for the TSDF fusion algorithm.
  • SAM2 and GroundingDINO for the masking and object detection models.
  • 2DGS and GOF for their preprocessing code for TNT and the preprocessed DTU dataset and evaluation code.
  • Tanks and Temples and MobileBrick for their evaluation codes.

Citation

If you use GS2Mesh or parts of our code in your work, please cite the following paper:

@article{wolf2024surface,
	title={GS2Mesh: Surface Reconstruction from Gaussian Splatting via Novel Stereo Views},
	author={Wolf, Yaniv and Bracha, Amit and Kimmel, Ron},
	journal={arXiv preprint arXiv:2404.01810},
	year={2024}
}

License

The Gaussian Splatting library is under the Gaussian-Splatting License.
