Introduction

Current person re-identification (re-id) methods typically rely on features extracted from single-frame imagery and ignore the space-time information available in image sequences. Single-frame (single-shot) visual appearance matching is inherently limited for person re-id in public spaces due to the visual ambiguity arising from non-overlapping camera views, where viewpoint and lighting changes can cause significant appearance variation. In this work, we present a novel model that automatically selects the most discriminative video fragments from noisy image sequences of people, from which more reliable space-time features can be extracted, whilst simultaneously learning a video ranking function for person re-id. We also introduce a new image sequence re-id dataset (iLIDS-VID) based on the i-LIDS MCT benchmark data. Using the iLIDS-VID and PRID 2011 sequence re-id datasets, we conducted extensive comparative evaluations to demonstrate the advantages of the proposed model over contemporary gait recognition, holistic image sequence matching, and state-of-the-art single-shot/multi-shot re-id methods.

Contribution Highlights

  1. We derive a multi-fragment space-time feature representation of image sequences of people. This representation combines HOG3D features with the optic flow energy profile over each image sequence, and is designed to automatically break unregulated video clips of people into multiple fragments.
  2. We propose a discriminative video ranking model for cross-view re-identification by simultaneously selecting and matching more reliable space-time features from video fragments. The model is formulated using a multi-instance ranking strategy for learning from pairs of image sequences over non-overlapping camera views. This method can significantly relax the strict assumptions required by gait recognition techniques.
  3. We introduce a new image sequence based person re-identification dataset called iLIDS-VID, extracted from the i-LIDS Multiple-Camera Tracking Scenario (MCTS). To our knowledge, this is the largest image sequence based re-identification dataset that is publicly available.
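The fragment-selection idea in point 1 can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the flow energy profile is taken as the mean optical-flow magnitude per frame, and fragments are cut around its local minima (roughly one per walking cycle). The function names, the `half_len` parameter, and the minima rule are assumptions made for illustration.

```python
import numpy as np

def flow_energy_profile(flow_mags):
    """Flow Energy Profile (FEP): mean optical-flow magnitude per frame.
    `flow_mags` is a list of per-frame flow-magnitude maps (2-D arrays)."""
    return np.array([m.mean() for m in flow_mags])

def candidate_fragments(energy, half_len=5):
    """Cut a sequence into candidate fragments centred on local minima of
    the FEP. `half_len` (an illustrative parameter) is the number of
    frames kept on each side of a minimum."""
    minima = [t for t in range(1, len(energy) - 1)
              if energy[t] < energy[t - 1] and energy[t] <= energy[t + 1]]
    frags = []
    for t in minima:
        lo = max(0, t - half_len)
        hi = min(len(energy), t + half_len + 1)
        frags.append((lo, hi))                 # frame index range [lo, hi)
    return frags

# A periodic profile (one cycle per 20 frames) yields one fragment per cycle:
energy = 1.0 + np.sin(2 * np.pi * np.arange(60) / 20)
print(candidate_fragments(energy))             # fragments around t = 15, 35, 55
```

On real video the profile would come from a dense optical-flow estimate rather than a synthetic sinusoid, but the cutting logic is the same.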

Citation

  1. Person Re-Identification by Video Ranking.
    T. Wang, S. Gong, X. Zhu, and S. Wang.
    In Proc. European Conference on Computer Vision, Zurich, Switzerland, September 2014.  
    [PDF]   [CMC]   [Data Splits]   [Spotlight Video]  
  2. Person Re-Identification by Discriminative Selection in Video Ranking.
    T. Wang, S. Gong, X. Zhu, and S. Wang.
    IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 38, No. 12, pp. 2501-2514, December 2016.  
    [PDF]   [arXiv]   [CMC]  

Images

Overview of our approach:

Discriminative Video Ranking (DVR) model learning pipeline:

  1. Generating candidate fragment pools by Flow Energy Profiling (FEP)
  2. Creating candidate fragment pairs as positive and negative instances
  3. Simultaneously selecting and ranking the most discriminative fragment pairs
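The multi-instance intuition behind step 3 can be illustrated with a toy hinge loss: among all candidate fragment pairs for a person, only the best-scoring positive pair must outrank the best-scoring negative pair. This is a hypothetical numpy sketch of that intuition; the loss form, scoring function, and names are illustrative assumptions, not the paper's exact objective.

```python
import numpy as np

def mi_rank_loss(w, pos_pairs, neg_pairs, margin=1.0):
    """Multi-instance ranking hinge loss (sketch).

    w          -- linear scoring weights (illustrative choice of model)
    pos_pairs  -- feature vectors of candidate positive fragment pairs
    neg_pairs  -- feature vectors of candidate negative fragment pairs

    Only the most discriminative (highest-scoring) instance on each side
    enters the margin, so noisy fragments are ignored automatically.
    """
    s_pos = max(float(w @ x) for x in pos_pairs)
    s_neg = max(float(w @ x) for x in neg_pairs)
    return max(0.0, margin - (s_pos - s_neg))

w = np.array([1.0, 0.0])
pos = [np.array([2.0, 0.0]), np.array([0.0, 1.0])]   # best positive scores 2.0
neg = [np.array([0.5, 0.0]), np.array([0.0, 3.0])]   # best negative scores 0.5
print(mi_rank_loss(w, pos, neg))                      # margin satisfied: 0.0
```

Minimising such a loss over many cross-view sequence pairs jointly selects which fragments to trust and learns the ranking function.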

Datasets:

Examples of two image sequence based re-id datasets.

Evaluation:

Comparison of person re-id performance between different methods using Cumulated Matching Characteristics (CMC).
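For reference, a CMC curve can be computed from a probe-by-gallery distance matrix as below. This is a generic sketch of the standard metric (assuming, for simplicity, that the true match of probe i is gallery entry i), not code from the paper.

```python
import numpy as np

def cmc(dist):
    """Cumulated Matching Characteristic curve.

    dist -- (n, n) probe x gallery distance matrix; the true match of
            probe i is assumed to be gallery entry i.
    Returns an array where entry k-1 is the fraction of probes whose
    true match appears within the top-k ranked gallery entries.
    """
    n = dist.shape[0]
    order = np.argsort(dist, axis=1)                   # gallery sorted per probe
    match_rank = np.array([int(np.where(order[i] == i)[0][0]) for i in range(n)])
    return np.array([(match_rank < k).mean() for k in range(1, n + 1)])

# Toy example: probe 2 is matched at rank 1, probes 0 and 1 at rank 2.
dist = np.array([[1.0, 0.0, 2.0],
                 [0.0, 1.0, 2.0],
                 [2.0, 1.0, 0.0]])
print(cmc(dist))                                       # rank-1 accuracy is 1/3
```

Rank-1 accuracy (the first entry of the curve) is the figure most commonly quoted when comparing re-id methods.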

Dataset

iLIDS Video re-IDentification (iLIDS-VID) Dataset

An image sequence based person re-identification dataset captured in a crowded public space.

Details ...