2020-06-06 01:49:27 +02:00
|
|
|
---
|
|
|
|
layout: default
|
|
|
|
title: YouTube-8M Feature Extraction and Model Inference
|
|
|
|
parent: Solutions
|
2020-12-10 04:13:05 +01:00
|
|
|
nav_order: 15
|
2020-06-06 01:49:27 +02:00
|
|
|
---
|
|
|
|
|
|
|
|
# YouTube-8M Feature Extraction and Model Inference
|
|
|
|
{: .no_toc }
|
|
|
|
|
2020-12-10 04:13:05 +01:00
|
|
|
<details close markdown="block">
|
|
|
|
<summary>
|
|
|
|
Table of contents
|
|
|
|
</summary>
|
|
|
|
{: .text-delta }
|
2020-06-06 01:49:27 +02:00
|
|
|
1. TOC
|
|
|
|
{:toc}
|
2020-12-10 04:13:05 +01:00
|
|
|
</details>
|
2020-06-06 01:49:27 +02:00
|
|
|
---
|
2019-09-05 03:19:36 +02:00
|
|
|
|
|
|
|
MediaPipe is a useful and general framework for media processing that can assist
|
|
|
|
with research, development, and deployment of ML models. This example focuses on
|
2019-10-25 23:12:58 +02:00
|
|
|
model development by demonstrating how to prepare training data and do model
|
|
|
|
inference for the YouTube-8M Challenge.
|
|
|
|
|
|
|
|
## Extracting Video Features for YouTube-8M Challenge
|
2019-09-05 03:19:36 +02:00
|
|
|
|
|
|
|
[Youtube-8M Challenge](https://www.kaggle.com/c/youtube8m-2019) is an annual
|
|
|
|
video classification challenge hosted by Google. Over the last two years, the
|
|
|
|
first two challenges have collectively drawn 1000+ teams from 60+ countries to
|
|
|
|
further advance large-scale video understanding research. In addition to the
|
|
|
|
feature extraction Python code released in the
|
|
|
|
[google/youtube-8m](https://github.com/google/youtube-8m/tree/master/feature_extractor)
|
|
|
|
repo, we release a MediaPipe based feature extraction pipeline that can extract
|
|
|
|
both video and audio features from a local video. The MediaPipe based pipeline
|
|
|
|
utilizes two machine learning models,
|
|
|
|
[Inception v3](https://github.com/tensorflow/models/tree/master/research/inception)
|
|
|
|
and
|
|
|
|
[VGGish](https://github.com/tensorflow/models/tree/master/research/audioset/vggish),
|
|
|
|
to extract features from video and audio respectively.
|
|
|
|
|
|
|
|
To visualize the
|
|
|
|
[graph](https://github.com/google/mediapipe/tree/master/mediapipe/graphs/youtube8m/feature_extraction.pbtxt),
|
|
|
|
copy the text specification of the graph and paste it into
|
|
|
|
[MediaPipe Visualizer](https://viz.mediapipe.dev/). The feature extraction
|
|
|
|
pipeline is highly customizable. You are welcome to add new calculators or use
|
|
|
|
your own machine learning models to extract more advanced features from the
|
|
|
|
videos.
|
|
|
|
|
|
|
|
### Steps to run the YouTube-8M feature extraction graph
|
|
|
|
|
2020-01-10 02:51:05 +01:00
|
|
|
1. Checkout the repository and follow
|
|
|
|
[the installation instructions](https://github.com/google/mediapipe/blob/master/mediapipe/docs/install.md)
|
|
|
|
to set up MediaPipe.
|
2019-09-05 03:19:36 +02:00
|
|
|
|
|
|
|
```bash
|
|
|
|
git clone https://github.com/google/mediapipe.git
|
|
|
|
cd mediapipe
|
|
|
|
```
|
|
|
|
|
2019-10-25 23:12:58 +02:00
|
|
|
2. Download the PCA and model data.
|
2019-09-05 03:19:36 +02:00
|
|
|
|
|
|
|
```bash
|
|
|
|
mkdir /tmp/mediapipe
|
|
|
|
cd /tmp/mediapipe
|
|
|
|
curl -O http://data.yt8m.org/pca_matrix_data/inception3_mean_matrix_data.pb
|
|
|
|
curl -O http://data.yt8m.org/pca_matrix_data/inception3_projection_matrix_data.pb
|
|
|
|
curl -O http://data.yt8m.org/pca_matrix_data/vggish_mean_matrix_data.pb
|
|
|
|
curl -O http://data.yt8m.org/pca_matrix_data/vggish_projection_matrix_data.pb
|
|
|
|
curl -O http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz
|
|
|
|
tar -xvf /tmp/mediapipe/inception-2015-12-05.tgz
|
|
|
|
```
|
|
|
|
|
2019-10-25 23:12:58 +02:00
|
|
|
3. Get the VGGish frozen graph.
|
2019-09-05 03:19:36 +02:00
|
|
|
|
|
|
|
Note: To run step 3 and step 4, you must have Python 2.7 or 3.5+ installed
|
|
|
|
with the TensorFlow 1.14+ package installed.
|
|
|
|
|
|
|
|
```bash
|
|
|
|
# cd to the root directory of the MediaPipe repo
|
|
|
|
cd -
|
2020-03-11 02:07:12 +01:00
|
|
|
|
|
|
|
pip3 install tf_slim
|
2019-09-05 03:19:36 +02:00
|
|
|
python -m mediapipe.examples.desktop.youtube8m.generate_vggish_frozen_graph
|
|
|
|
```
|
|
|
|
|
2019-10-25 23:12:58 +02:00
|
|
|
4. Generate a MediaSequence metadata from the input video.
|
2019-09-05 03:19:36 +02:00
|
|
|
|
2019-10-29 23:53:13 +01:00
|
|
|
Note: the output file is /tmp/mediapipe/metadata.pb
|
2019-09-05 03:19:36 +02:00
|
|
|
|
|
|
|
```bash
|
2019-10-25 23:12:58 +02:00
|
|
|
# change clip_end_time_sec to match the length of your video.
|
2019-09-05 03:19:36 +02:00
|
|
|
python -m mediapipe.examples.desktop.youtube8m.generate_input_sequence_example \
|
2019-10-25 23:12:58 +02:00
|
|
|
--path_to_input_video=/absolute/path/to/the/local/video/file \
|
|
|
|
--clip_end_time_sec=120
|
2019-09-05 03:19:36 +02:00
|
|
|
```
|
|
|
|
|
2019-10-25 23:12:58 +02:00
|
|
|
5. Run the MediaPipe binary to extract the features.
|
2019-09-05 03:19:36 +02:00
|
|
|
|
|
|
|
```bash
|
2020-03-11 02:07:12 +01:00
|
|
|
bazel build -c opt --linkopt=-s \
|
2019-09-05 03:19:36 +02:00
|
|
|
--define MEDIAPIPE_DISABLE_GPU=1 --define no_aws_support=true \
|
|
|
|
mediapipe/examples/desktop/youtube8m:extract_yt8m_features
|
|
|
|
|
2019-10-25 23:12:58 +02:00
|
|
|
GLOG_logtostderr=1 bazel-bin/mediapipe/examples/desktop/youtube8m/extract_yt8m_features \
|
2019-09-05 03:19:36 +02:00
|
|
|
--calculator_graph_config_file=mediapipe/graphs/youtube8m/feature_extraction.pbtxt \
|
2019-10-29 23:53:13 +01:00
|
|
|
--input_side_packets=input_sequence_example=/tmp/mediapipe/metadata.pb \
|
|
|
|
--output_side_packets=output_sequence_example=/tmp/mediapipe/features.pb
|
|
|
|
```
|
|
|
|
|
|
|
|
6. [Optional] Read the features.pb in Python.
|
|
|
|
|
|
|
|
```
|
|
|
|
import tensorflow as tf
|
|
|
|
|
|
|
|
sequence_example = open('/tmp/mediapipe/features.pb', 'rb').read()
|
|
|
|
print(tf.train.SequenceExample.FromString(sequence_example))
|
2019-09-05 03:19:36 +02:00
|
|
|
```
|
2019-10-25 23:12:58 +02:00
|
|
|
|
|
|
|
## Model Inference for YouTube-8M Challenge
|
|
|
|
|
|
|
|
MediaPipe can help you do model inference for YouTube-8M Challenge with both
|
|
|
|
local videos and the YouTube-8M dataset. To visualize
|
|
|
|
[the graph for local videos](https://github.com/google/mediapipe/tree/master/mediapipe/graphs/youtube8m/local_video_model_inference.pbtxt)
|
|
|
|
and
|
|
|
|
[the graph for the YouTube-8M dataset](https://github.com/google/mediapipe/tree/master/mediapipe/graphs/youtube8m/yt8m_dataset_model_inference.pbtxt),
|
|
|
|
copy the text specification of the graph and paste it into
|
|
|
|
[MediaPipe Visualizer](https://viz.mediapipe.dev/). We use the baseline model
|
|
|
|
[(model card)](https://drive.google.com/file/d/1xTCi9-Nm9dt2KIk8WR0dDFrIssWawyXy/view)
|
|
|
|
in our example. But, the model inference pipeline is highly customizable. You
|
|
|
|
are welcome to add new calculators or use your own machine learning models to do
|
|
|
|
the inference for both local videos and the dataset
|
|
|
|
|
|
|
|
### Steps to run the YouTube-8M model inference graph with Web Interface
|
|
|
|
|
|
|
|
1. Copy the baseline model
|
|
|
|
[(model card)](https://drive.google.com/file/d/1xTCi9-Nm9dt2KIk8WR0dDFrIssWawyXy/view)
|
|
|
|
to local.
|
|
|
|
|
|
|
|
```bash
|
2020-05-21 18:46:31 +02:00
|
|
|
curl -o /tmp/mediapipe/yt8m_baseline_saved_model.tar.gz http://data.yt8m.org/models/baseline/saved_model.tar.gz
|
2019-10-25 23:12:58 +02:00
|
|
|
|
|
|
|
tar -xvf /tmp/mediapipe/yt8m_baseline_saved_model.tar.gz -C /tmp/mediapipe
|
|
|
|
```
|
|
|
|
|
|
|
|
2. Build the inference binary.
|
|
|
|
|
|
|
|
```bash
|
2020-03-11 02:07:12 +01:00
|
|
|
bazel build -c opt --define='MEDIAPIPE_DISABLE_GPU=1' --linkopt=-s \
|
2019-10-25 23:12:58 +02:00
|
|
|
mediapipe/examples/desktop/youtube8m:model_inference
|
|
|
|
```
|
|
|
|
|
|
|
|
3. Run the python web server.
|
|
|
|
|
2020-03-11 02:07:12 +01:00
|
|
|
Note: pip3 install absl-py
|
2019-10-25 23:12:58 +02:00
|
|
|
|
|
|
|
```bash
|
|
|
|
python mediapipe/examples/desktop/youtube8m/viewer/server.py --root `pwd`
|
|
|
|
```
|
|
|
|
|
|
|
|
Navigate to localhost:8008 in a web browser.
|
|
|
|
[Here](https://drive.google.com/file/d/19GSvdAAuAlACpBhHOaqMWZ_9p8bLUYKh/view?usp=sharing)
|
|
|
|
is a demo video showing the steps to use this web application. Also please
|
|
|
|
read
|
|
|
|
[youtube8m/README.md](https://github.com/google/mediapipe/tree/master/mediapipe/examples/desktop/youtube8m/README.md)
|
|
|
|
if you prefer to run the underlying model_inference binary in command line.
|
|
|
|
|
|
|
|
### Steps to run the YouTube-8M model inference graph with a local video
|
|
|
|
|
2019-10-29 23:53:13 +01:00
|
|
|
1. Make sure you have the features.pb from the feature extraction pipeline.
|
2019-10-25 23:12:58 +02:00
|
|
|
|
|
|
|
2. Copy the baseline model
|
|
|
|
[(model card)](https://drive.google.com/file/d/1xTCi9-Nm9dt2KIk8WR0dDFrIssWawyXy/view)
|
|
|
|
to local.
|
|
|
|
|
|
|
|
```bash
|
2020-05-21 18:46:31 +02:00
|
|
|
curl -o /tmp/mediapipe/yt8m_baseline_saved_model.tar.gz http://data.yt8m.org/models/baseline/saved_model.tar.gz
|
2019-10-25 23:12:58 +02:00
|
|
|
|
|
|
|
tar -xvf /tmp/mediapipe/yt8m_baseline_saved_model.tar.gz -C /tmp/mediapipe
|
|
|
|
```
|
|
|
|
|
|
|
|
3. Build and run the inference binary.
|
|
|
|
|
|
|
|
```bash
|
2020-03-11 02:07:12 +01:00
|
|
|
bazel build -c opt --define='MEDIAPIPE_DISABLE_GPU=1' --linkopt=-s \
|
2019-10-25 23:12:58 +02:00
|
|
|
mediapipe/examples/desktop/youtube8m:model_inference
|
|
|
|
|
|
|
|
# segment_size is the number of seconds window of frames.
|
|
|
|
# overlap is the number of seconds adjacent segments share.
|
|
|
|
GLOG_logtostderr=1 bazel-bin/mediapipe/examples/desktop/youtube8m/model_inference \
|
|
|
|
--calculator_graph_config_file=mediapipe/graphs/youtube8m/local_video_model_inference.pbtxt \
|
2019-10-29 23:53:13 +01:00
|
|
|
--input_side_packets=input_sequence_example_path=/tmp/mediapipe/features.pb,input_video_path=/absolute/path/to/the/local/video/file,output_video_path=/tmp/mediapipe/annotated_video.mp4,segment_size=5,overlap=4
|
2019-10-25 23:12:58 +02:00
|
|
|
```
|
|
|
|
|
|
|
|
4. View the annotated video.
|