---
layout: default
title: Dataset Preparation with MediaSequence
parent: Solutions
nav_order: 15
---
# Dataset Preparation with MediaSequence
{: .no_toc }
<details close markdown="block">
<summary>
Table of contents
</summary>
{: .text-delta }
1. TOC
{:toc}
</details>
---
## Overview
MediaPipe is a useful and general framework for media processing that can
assist with research, development, and deployment of ML models. This example
focuses on development by demonstrating how to prepare video data for training
a TensorFlow model.
The MediaSequence library provides an extensive set of tools for storing data
in TensorFlow `SequenceExample`s. SequenceExamples match the semantics of most
video tasks and are efficient to use with TensorFlow. The sequence semantics
allow for a variable number of annotations per frame, which is necessary for
tasks like video object detection but very difficult to encode in flat
TensorFlow `Example`s. The goal of MediaSequence is to simplify working with
SequenceExamples and to automate common preparation tasks. More information
about the MediaSequence pipeline, including how to use it to process new data
sets, is available in the documentation of
[MediaSequence](https://github.com/google/mediapipe/tree/master/mediapipe/util/sequence).
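
To make the sequence semantics concrete, here is a minimal sketch in plain
TensorFlow of storing a variable number of annotations per frame, something a
flat `tf.train.Example` cannot express naturally. The feature key mirrors
MediaSequence's naming for region labels but is illustrative here.

```python
import tensorflow as tf

# Each entry in a feature_list is one frame, and each frame can hold a
# different number of values, e.g. labels for a varying number of detections.
example = tf.train.SequenceExample()
labels = example.feature_lists.feature_list["region/label/string"]
labels.feature.add().bytes_list.value.extend([b"dog", b"cat"])  # frame 0: 2 boxes
labels.feature.add().bytes_list.value.extend([b"dog"])          # frame 1: 1 box
labels.feature.add()                                            # frame 2: no boxes
```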
## Preparing an example data set
1.  Checkout the mediapipe repository.

    ```bash
    git clone https://github.com/google/mediapipe.git
    cd mediapipe
    ```
1.  Compile the MediaSequence demo C++ binary.

    ```bash
    bazel build -c opt mediapipe/examples/desktop/media_sequence:media_sequence_demo --define MEDIAPIPE_DISABLE_GPU=1
    ```

    MediaSequence uses C++ binaries to improve multimedia processing speed and
    to encourage a strong separation between annotations and the image data or
    other features. The binary is very general in that it reads from files
    into input side packets and writes output side packets to files when
    completed, but it also links in all of the calculators necessary for the
    MediaPipe graphs preparing the Charades data set.
1.  Download and prepare the data set through Python.

    To run this step, you must have Python 2.7 or 3.5+ with the TensorFlow
    1.14+ package installed.

    ```bash
    python -m mediapipe.examples.desktop.media_sequence.demo_dataset \
      --path_to_demo_data=/tmp/demo_data/ \
      --path_to_mediapipe_binary=bazel-bin/mediapipe/examples/desktop/media_sequence/media_sequence_demo \
      --path_to_graph_directory=mediapipe/graphs/media_sequence/
    ```

    The arguments define where data is stored. `--path_to_demo_data` defines
    where the data will be downloaded to and where prepared data will be
    generated. `--path_to_mediapipe_binary` is the path to the binary built in
    the previous step. `--path_to_graph_directory` defines where to look for
    MediaPipe graphs during processing.

    Running this module:

    1.  Downloads videos from the internet.
    1.  For each annotation in a CSV, creates a structured metadata file.
    1.  Runs MediaPipe to extract images as defined by the metadata.
    1.  Stores the results in a numbered set of TFRecord files.

    MediaSequence uses SequenceExamples as the format of both inputs and
    outputs. Annotations are encoded as inputs in a SequenceExample of metadata
    that defines the labels and the path to the corresponding video file. This
    metadata is passed as input to the C++ `media_sequence_demo` binary, and the
    output is a SequenceExample filled with images and annotations ready for
    model training. A minimal sketch of such a metadata SequenceExample, and of
    reading the prepared data back, appears after this list.
1.  Reading the data in TensorFlow.

    To read the data in TensorFlow, first add the repo to your `PYTHONPATH`:

    ```bash
    export PYTHONPATH="${PYTHONPATH}:$(pwd)"
    ```

    Then you can import the data set in Python using
    [read_demo_dataset.py](https://github.com/google/mediapipe/tree/master/mediapipe/examples/desktop/media_sequence/read_demo_dataset.py).
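
As a concrete illustration of the metadata described in step 3, here is a
minimal sketch of building an input SequenceExample with the MediaSequence
Python API. The video path, timestamps, and label are placeholders, not values
from the demo's CSV.

```python
import tensorflow as tf
from mediapipe.util.sequence import media_sequence as ms

# Metadata naming the video and the clip to extract; values are placeholders.
metadata = tf.train.SequenceExample()
ms.set_clip_data_path(b"/tmp/demo_data/my_video.mp4", metadata)
ms.set_clip_start_timestamp(0, metadata)         # in microseconds
ms.set_clip_end_timestamp(10000000, metadata)    # in microseconds
ms.set_clip_label_index((0,), metadata)
ms.set_clip_label_string((b"demo label",), metadata)
```

And a minimal sketch of reading the prepared TFRecord files back into decoded
frames. read_demo_dataset.py is the authoritative version; the `image/encoded`
key follows MediaSequence's naming convention for encoded frames, but treat
the exact schema here as an assumption and consult the MediaSequence
documentation.

```python
import tensorflow as tf

def parse_images(serialized_example):
  # "image/encoded" holds one encoded image per frame (MediaSequence convention).
  _, sequences = tf.io.parse_single_sequence_example(
      serialized_example,
      sequence_features={
          "image/encoded": tf.io.FixedLenSequenceFeature([], tf.string)})
  # Decode every stored frame in the clip.
  return tf.map_fn(lambda f: tf.image.decode_jpeg(f, channels=3),
                   sequences["image/encoded"], dtype=tf.uint8)

files = tf.data.Dataset.list_files("/tmp/demo_data/*.tfrecord")
dataset = tf.data.TFRecordDataset(files).map(parse_images)
```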
## Preparing a practical data set
As an example of processing a practical data set, a similar set of commands will
prepare the [Charades data set](https://allenai.org/plato/charades/). Charades
is a human action recognition data set collected and maintained by the Allen
Institute for Artificial Intelligence. To follow this codelab, you must abide
by the [license](https://allenai.org/plato/charades/license.txt) for the
Charades data set provided by the Allen Institute.
The Charades data set is large (~150 GB) and will take considerable time to
download and process (4-8 hours).
```bash
bazel build -c opt mediapipe/examples/desktop/media_sequence:media_sequence_demo --define MEDIAPIPE_DISABLE_GPU=1
python -m mediapipe.examples.desktop.media_sequence.charades_dataset \
  --alsologtostderr \
  --path_to_charades_data=/tmp/demo_data/ \
  --path_to_mediapipe_binary=bazel-bin/mediapipe/examples/desktop/media_sequence/media_sequence_demo \
  --path_to_graph_directory=mediapipe/graphs/media_sequence/
```
## Preparing your own data set
The process for preparing your own data set is described in the
[MediaSequence documentation](https://github.com/google/mediapipe/tree/master/mediapipe/util/sequence).
The Python code for Charades can easily be modified to process most annotations,
but the MediaPipe processing warrants further discussion. MediaSequence uses
MediaPipe graphs to extract features related to the metadata or previously
extracted data. Each graph can focus on extracting a single type of feature, and
graphs can be chained together to extract derived features in a composable way.
For example, one graph may extract images from a video at 10 fps and another
graph may extract images at 24 fps. A subsequent graph can then extract
ResNet-50 features from the output of either preceding graph. In this way,
MediaPipe enables a composable data-processing interface for machine learning
at multiple levels.
The MediaPipe graph for adding images to a data set, with brief annotations, is
as follows. Common modifications include changing the `frame_rate` or the
encoding quality of frames.
```
# Convert the string input into a decoded SequenceExample.
node {
  calculator: "StringToSequenceExampleCalculator"
  input_side_packet: "STRING:input_sequence_example"
  output_side_packet: "SEQUENCE_EXAMPLE:parsed_sequence_example"
}

# Unpack the data path and clip timing from the SequenceExample.
node {
  calculator: "UnpackMediaSequenceCalculator"
  input_side_packet: "SEQUENCE_EXAMPLE:parsed_sequence_example"
  output_side_packet: "DATA_PATH:input_video_path"
  output_side_packet: "RESAMPLER_OPTIONS:packet_resampler_options"
  options {
    [type.googleapis.com/mediapipe.UnpackMediaSequenceCalculatorOptions]: {
      base_packet_resampler_options {
        frame_rate: 24.0
        base_timestamp: 0
      }
    }
  }
}

# Decode the entire video.
node {
  calculator: "OpenCvVideoDecoderCalculator"
  input_side_packet: "INPUT_FILE_PATH:input_video_path"
  output_stream: "VIDEO:decoded_frames"
}

# Extract the subset of frames we want to keep.
node {
  calculator: "PacketResamplerCalculator"
  input_stream: "decoded_frames"
  output_stream: "sampled_frames"
  input_side_packet: "OPTIONS:packet_resampler_options"
}

# Encode the images to store in the SequenceExample.
node {
  calculator: "OpenCvImageEncoderCalculator"
  input_stream: "sampled_frames"
  output_stream: "encoded_frames"
  node_options {
    [type.googleapis.com/mediapipe.OpenCvImageEncoderCalculatorOptions]: {
      quality: 80
    }
  }
}

# Store the images in the SequenceExample.
node {
  calculator: "PackMediaSequenceCalculator"
  input_side_packet: "SEQUENCE_EXAMPLE:parsed_sequence_example"
  output_side_packet: "SEQUENCE_EXAMPLE:sequence_example_to_serialize"
  input_stream: "IMAGE:encoded_frames"
}

# Serialize the SequenceExample to a string for storage.
node {
  calculator: "StringToSequenceExampleCalculator"
  input_side_packet: "SEQUENCE_EXAMPLE:sequence_example_to_serialize"
  output_side_packet: "STRING:output_sequence_example"
}
```
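
Tying the graph back to the `media_sequence_demo` binary: the demo's Python
module drives the binary by serializing the metadata SequenceExample to a
file, feeding it in as the `input_sequence_example` side packet, and reading
the `output_sequence_example` file back. The sketch below assumes the flag
names used by MediaPipe's file-to-side-packet demo binaries and a hypothetical
graph file name; check `media_sequence_demo.cc` and `demo_dataset.py` for the
exact interface.

```python
import subprocess
import tensorflow as tf

BINARY = "bazel-bin/mediapipe/examples/desktop/media_sequence/media_sequence_demo"
GRAPH = "mediapipe/graphs/media_sequence/demo_graph.pbtxt"  # hypothetical name

def run_media_sequence(metadata, in_path, out_path):
  # Serialize the metadata SequenceExample for the input side packet.
  with open(in_path, "wb") as f:
    f.write(metadata.SerializeToString())
  # Flag names are assumptions based on MediaPipe's file-IO demo binaries.
  subprocess.check_call([
      BINARY,
      "--calculator_graph_config_file=" + GRAPH,
      "--input_side_packets=input_sequence_example=" + in_path,
      "--output_side_packets=output_sequence_example=" + out_path,
  ])
  # The output file holds the serialized, fully populated SequenceExample.
  with open(out_path, "rb") as f:
    return tf.train.SequenceExample.FromString(f.read())
```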