MaGGIe utilizes the V-HIM2K5 and V-HIM60 datasets, categorizing video instance matting into three difficulty levels based on occlusion and overlap.

Video Data Synthesis: Categorizing Matting Difficulty by Instance Overlap

Abstract and 1. Introduction

  2. Related Works

  3. MaGGIe

    3.1. Efficient Masked Guided Instance Matting

    3.2. Feature-Matte Temporal Consistency

  4. Instance Matting Datasets

    4.1. Image Instance Matting and 4.2. Video Instance Matting

  5. Experiments

    5.1. Pre-training on image data

    5.2. Training on video data

  6. Discussion and References

Supplementary Material

  7. Architecture details

  8. Image matting

    8.1. Dataset generation and preparation

    8.2. Training details

    8.3. Quantitative details

    8.4. More qualitative results on natural images

  9. Video matting

    9.1. Dataset generation

    9.2. Training details

    9.3. Quantitative details

    9.4. More qualitative results

9. Video matting

This section elaborates on the video matting aspect of our work, providing details about dataset generation and offering additional quantitative and qualitative analyses. For an enhanced viewing experience, we recommend visiting our website, which contains video samples from V-HIM60 and results of our method on real videos compared to baseline approaches.

9.1. Dataset generation

To create our video matte dataset, we utilized the BG20K dataset for backgrounds and incorporated video backgrounds from VM108. We allocated 88 videos for training and 20 for testing, limiting each video to 30 frames. To maintain realism, each instance within a composited video contributed the same number of frames, randomly sampled from its source video, and instance sizes were adjusted to fit within the background height without excessive overlap.
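For illustration, the core compositing step can be viewed as standard alpha blending of the placed instances over the background frame. The sketch below is a minimal, assumed implementation (the `composite_frame` helper and array conventions are ours for illustration), not the exact synthesis pipeline used to build V-HIM2K5/V-HIM60.

```python
import numpy as np

def composite_frame(background, foregrounds, alphas):
    """Alpha-blend a list of instances onto one background frame.

    background:  (H, W, 3) float image in [0, 1]
    foregrounds: list of (H, W, 3) float images, already resized and placed
    alphas:      list of (H, W) float alpha mattes in [0, 1]
    Instances are blended in order, so later instances occlude earlier ones.
    """
    frame = background.copy()
    for fg, alpha in zip(foregrounds, alphas):
        a = alpha[..., None]                  # broadcast matte over RGB channels
        frame = a * fg + (1.0 - a) * frame    # standard compositing equation
    return frame
```

Applying this per frame over the 30-frame clip yields one composited video; repeating it with different instance placements controls how much the instances overlap.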

We categorized the dataset into three levels of difficulty based on the extent of instance overlap (a categorization sketch follows this list):

• Easy Level: Features 2-3 distinct instances per video with no overlap.

• Medium Level: Includes up to 5 instances per video, with per-frame occlusion ranging from 5% to 50%.

• Hard Level: Also comprises up to 5 instances, but with a higher occlusion range of 50% to 85%, presenting more complex instance interactions.
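As referenced above, the snippet below sketches one plausible way to bucket a frame by occlusion level. The `frame_occlusion` definition (largest fraction of an instance's binarized area covered by other instances) and the `difficulty_level` thresholds are assumptions for illustration; the exact metric used to build V-HIM may differ.

```python
import numpy as np

def frame_occlusion(alphas, thresh=0.5):
    """Return the largest fraction of any instance's area that is covered
    by other instances in one frame. alphas: list of (H, W) mattes in [0, 1]."""
    masks = [a > thresh for a in alphas]
    ratios = []
    for i, m in enumerate(masks):
        others = np.zeros_like(m)
        for j, n in enumerate(masks):
            if j != i:
                others |= n
        area = m.sum()
        if area > 0:
            ratios.append(float((m & others).sum()) / float(area))
    return max(ratios) if ratios else 0.0

def difficulty_level(occlusion):
    """Map a per-frame occlusion ratio to the three V-HIM difficulty levels."""
    if occlusion == 0.0:
        return "easy"    # 2-3 instances, no overlap
    if occlusion <= 0.50:
        return "medium"  # occlusion between 5% and 50%
    return "hard"        # occlusion between 50% and 85%
```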

During training, we applied dilation and erosion kernels to binarized alpha mattes to generate input masks. For testing, masks were created with XMem, starting from the binarized alpha matte of the first frame.
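A minimal sketch of the training-time mask perturbation, assuming OpenCV morphology with a randomly sized elliptical kernel (the `perturb_mask` helper, binarization threshold, and kernel-size range are illustrative, not the paper's exact settings):

```python
import cv2
import numpy as np

def perturb_mask(alpha, thresh=0.5, max_kernel=15):
    """Binarize an alpha matte and randomly dilate or erode it to mimic
    the coarse guidance masks used as input during training."""
    mask = (alpha > thresh).astype(np.uint8)
    k = np.random.randint(3, max_kernel + 1)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (k, k))
    if np.random.rand() < 0.5:
        mask = cv2.dilate(mask, kernel, iterations=1)
    else:
        mask = cv2.erode(mask, kernel, iterations=1)
    return mask
```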

We have prepared examples from the testing dataset across all three difficulty levels, which can be viewed on our website for a more immersive experience. The V-HIM2K5 and V-HIM60 datasets will be made publicly available following the acceptance of this work.


:::info Authors:

(1) Chuong Huynh, University of Maryland, College Park ([email protected]);

(2) Seoung Wug Oh, Adobe Research (seoh,[email protected]);

(3) Abhinav Shrivastava, University of Maryland, College Park ([email protected]);

(4) Joon-Young Lee, Adobe Research ([email protected]).

:::


:::info This paper is available on arxiv under CC by 4.0 Deed (Attribution 4.0 International) license.

:::

