If our project is helpful for your research, please consider citing:
@inproceedings{wang2022tokencut,
title={Self-supervised Transformers for Unsupervised Object Discovery using Normalized Cut},
author={Wang, Yangtao and Shen, Xi and Hu, Shell Xu and Yuan, Yuan and Crowley, James L. and Vaufreydaz, Dominique},
booktitle={Conference on Computer Vision and Pattern Recognition},
year={2022}
}
2. Installation
2.1 Dependencies
This code was implemented with Python 3.7, PyTorch 1.7.1 and CUDA 11.2. Please refer to the official PyTorch installation instructions. If CUDA 10.2 has been properly installed:
pip install torch==1.7.1 torchvision==0.8.2
To install the additional dependencies, run the following command:
pip install -r requirements.txt
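After installing, it can help to confirm which versions were actually picked up. The snippet below is a minimal sketch and not part of the TokenCut repository: the helper name report_versions is hypothetical, and the expected versions are simply the ones quoted in the pip command above.

```python
import importlib

# Versions named in this README; adjust if you install a different build.
EXPECTED = {"torch": "1.7.1", "torchvision": "0.8.2"}


def report_versions(expected=EXPECTED):
    """Return {package: (installed_version_or_None, matches_expected)}.

    Hypothetical helper for illustration; it only imports each package
    and reads its __version__ attribute.
    """
    report = {}
    for name, wanted in expected.items():
        try:
            module = importlib.import_module(name)
            version = getattr(module, "__version__", None)
            installed = None if version is None else str(version)
        except ImportError:
            installed = None
        # Local build tags such as "+cu102" are ignored when comparing.
        matches = installed is not None and installed.split("+")[0] == wanted
        report[name] = (installed, matches)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in report_versions().items():
        print(f"{name}: installed={installed} expected={EXPECTED[name]} ok={ok}")
```

A package that is missing is reported as (None, False) rather than raising, so the check can run before the installation is complete.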
2.2 Data
We provide quick download commands in DOWNLOAD_DATA.md for VOC2007, VOC2012, COCO, CUB, ImageNet, ECSSD, DUTS and DUT-OMRON as well as DINO checkpoints.
3. Quick Start
3.1 Detecting an object in one image
We provide TokenCut visualizations of the bounding box prediction (pred) and the attention map (attn). Use --visualize all to produce all visualization results.
python main_tokencut.py --image_path examples/VOC07_000036.jpg --visualize pred
python main_tokencut.py --image_path examples/VOC07_000036.jpg --visualize attn
python main_tokencut.py --image_path examples/VOC07_000036.jpg --visualize all
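To visualize several images in one go, the commands above can be generated in a loop. This is a hedged sketch rather than a script shipped with the repository: the helpers build_command and run_batch are hypothetical, and only the --image_path and --visualize flags shown above are used. By default it only prints the commands; actually running them assumes the current directory is the repository root.

```python
import subprocess
import sys

# The three values of --visualize shown in this README.
MODES = ("pred", "attn", "all")


def build_command(image_path, mode="all"):
    """Build the argv list for one main_tokencut.py visualization run."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {MODES}")
    return [sys.executable, "main_tokencut.py",
            "--image_path", image_path, "--visualize", mode]


def run_batch(image_paths, mode="all", dry_run=True):
    """Print one command per image and, unless dry_run, execute it."""
    commands = [build_command(path, mode) for path in image_paths]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the batch at the first failing image.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_batch(["examples/VOC07_000036.jpg"], mode="all")
```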
3.2 Segmenting a salient region in one image
We provide TokenCut segmentation results as follows:
cd unsupervised_saliency_detection
python get_saliency.py --sigma-spatial 16 --sigma-luma 16 --sigma-chroma 8 --vit-arch small --patch-size 16 --img-path ../examples/VOC07_000036.jpg --out-dir ./output
4. Evaluation
The following steps reproduce the TokenCut results presented in the paper.
4.1 Unsupervised object discovery
PASCAL-VOC
To apply TokenCut and compute CorLoc results (VOC07 68.8, VOC12 72.1), please launch:
mkdir features
python main_lost.py --dataset VOC07 --set trainval --save-feat-dir features/VOC2007
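For readers unfamiliar with the metric: CorLoc (correct localization) is commonly defined as the percentage of images in which the predicted box overlaps at least one ground-truth box with an intersection-over-union (IoU) of 0.5 or more. The sketch below illustrates that definition only; it is not the repository's evaluation code. The function names are hypothetical, and boxes are assumed to be (x1, y1, x2, y2) tuples with continuous coordinates (some implementations add 1 to widths and heights for pixel indices).

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Width and height of the overlap, clamped to zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def corloc(predictions, ground_truths, threshold=0.5):
    """Percentage of images whose single predicted box hits any GT box.

    predictions: one box per image.
    ground_truths: one list of boxes per image, in the same order.
    """
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have equal length")
    if not predictions:
        return 0.0
    hits = sum(
        any(box_iou(pred, gt) >= threshold for gt in gts)
        for pred, gts in zip(predictions, ground_truths)
    )
    return 100.0 * hits / len(predictions)


if __name__ == "__main__":
    preds = [(0, 0, 10, 10), (0, 0, 10, 10)]
    gts = [[(0, 0, 10, 10)], [(20, 20, 30, 30)]]
    print(corloc(preds, gts))  # one hit out of two images: 50.0
```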
COCO
Results are reported on the 2014 annotations, following previous work. The following command gives results on the 20k-image subset of the COCO dataset (CorLoc 58.8) used in previous literature. Note that the 20k images are a subset of the train set.
(CVPR 2022) TokenCut
PyTorch implementation of TokenCut:
Self-supervised Transformers for Unsupervised Object Discovery using Normalized Cut
Yangtao Wang, Xi Shen, Shell Xu Hu, Yuan Yuan, James L. Crowley, Dominique Vaufreydaz
[Project page] [GitHub (Video Segmentation)] [Paper]
[Hugging Face Spaces]
1. Updates
09/06/2022 The TokenCut video segmentation extension is released!
03/10/2022 Created a 480p demo using Gradio. Try out the web demo: [Hugging Face Spaces]
Internet image results:
02/26/2022 Integrated into Hugging Face Spaces 🤗 using Gradio. Try out the web demo: [Hugging Face Spaces]
02/26/2022 A simple TokenCut Colab Demo is available.
02/21/2022 Initial commit: the TokenCut code is released, including evaluation of unsupervised object discovery, unsupervised saliency detection, and weakly supervised object localization.
If you want to extract DINO features, which correspond to the KEY features in DINO, use the main_lost.py command with --save-feat-dir shown under PASCAL-VOC above.
Different models
We have tested the method on different setups of the ViT model; CorLoc results are presented in the following table (more can be found in the paper).
Previous results on the VOC07 dataset can be obtained by launching:
4.2 Unsupervised saliency detection
To evaluate on the ECSSD, DUTS, and DUT-OMRON datasets:
This should give:
4.3 Weakly supervised object detection
Fine-tune DINO small on CUB
To fine-tune ViT-S/16 on CUB on a single node with 4 GPUs for 1000 epochs, run:
Evaluation on CUB
To evaluate a fine-tuned ViT-S/16 on CUB val with a single GPU, run:
This should give:
Evaluation on ImageNet
To evaluate ViT-S/16 fine-tuned on ImageNet val with a single GPU, run:
5. Acknowledgement
The TokenCut code is built on top of LOST, DINO, Segswap, and Bilateral_Solver. We sincerely thank the authors for their great work.