## Extracting Video Features for YouTube-8M Challenge

MediaPipe is a useful and general framework for media processing that can assist
with research, development, and deployment of ML models. This example focuses on
model development by demonstrating how to prepare training data for the
YouTube-8M Challenge.

The [YouTube-8M Challenge](https://www.kaggle.com/c/youtube8m-2019) is an annual
video classification challenge hosted by Google. Over the last two years, the
first two challenges collectively drew 1000+ teams from 60+ countries to further
advance large-scale video understanding research. In addition to the feature
extraction Python code released in the
[google/youtube-8m](https://github.com/google/youtube-8m/tree/master/feature_extractor)
repo, we release a MediaPipe-based feature extraction pipeline that can extract
both video and audio features from a local video. The pipeline utilizes two
machine learning models,
[Inception v3](https://github.com/tensorflow/models/tree/master/research/inception)
and
[VGGish](https://github.com/tensorflow/models/tree/master/research/audioset/vggish),
to extract features from video and audio respectively.
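
For orientation, these are the same kinds of features shipped with the
YouTube-8M dataset: roughly one 1024-dimensional visual feature per frame
(Inception v3, after PCA) and one 128-dimensional audio feature per second
(VGGish). The sketch below only illustrates those shapes; the dimensions and
the one-feature-per-second rate are assumptions based on the dataset
description, not output produced by running this pipeline.

```python
# Illustrative shapes only: a 2-minute video sampled once per second
# would yield roughly 120 visual and 120 audio feature vectors.
import numpy as np

num_seconds = 120
rgb_features = np.zeros((num_seconds, 1024))     # per-frame Inception v3 features (post-PCA)
audio_features = np.zeros((num_seconds, 128))    # per-second VGGish features
print(rgb_features.shape, audio_features.shape)  # (120, 1024) (120, 128)
```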

To visualize the
[graph](https://github.com/google/mediapipe/tree/master/mediapipe/graphs/youtube8m/feature_extraction.pbtxt),
copy the text specification of the graph and paste it into
[MediaPipe Visualizer](https://viz.mediapipe.dev/). The feature extraction
pipeline is highly customizable. You are welcome to add new calculators or use
your own machine learning models to extract more advanced features from the
videos.

### Steps to run the YouTube-8M feature extraction graph

1. Check out the MediaPipe repository

    ```bash
    git clone https://github.com/google/mediapipe.git
    cd mediapipe
    ```

2. Download the PCA and model data

    ```bash
    mkdir /tmp/mediapipe
    cd /tmp/mediapipe
    curl -O http://data.yt8m.org/pca_matrix_data/inception3_mean_matrix_data.pb
    curl -O http://data.yt8m.org/pca_matrix_data/inception3_projection_matrix_data.pb
    curl -O http://data.yt8m.org/pca_matrix_data/vggish_mean_matrix_data.pb
    curl -O http://data.yt8m.org/pca_matrix_data/vggish_projection_matrix_data.pb
    curl -O http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz
    tar -xvf /tmp/mediapipe/inception-2015-12-05.tgz
    ```
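
    The mean and projection matrices downloaded here drive the PCA step that
    the graph applies to each raw feature vector: a mean-centering followed by
    a linear projection, as in the conceptual sketch below. The arrays are
    random stand-ins, and the 2048-to-1024 shapes are assumptions based on the
    YouTube-8M setup, not values read from the .pb files.

    ```python
    # Conceptual PCA reduction of one Inception v3 frame feature.
    import numpy as np

    rng = np.random.default_rng(0)
    frame_feature = rng.normal(size=2048)        # raw Inception v3 feature
    mean = rng.normal(size=2048)                 # cf. inception3_mean_matrix_data.pb
    projection = rng.normal(size=(1024, 2048))   # cf. inception3_projection_matrix_data.pb

    reduced = projection @ (frame_feature - mean)
    print(reduced.shape)  # (1024,) -- the per-frame feature stored downstream
    ```
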
3. Get the VGGish frozen graph

    Note: To run step 3 and step 4, you must have Python 2.7 or 3.5+ and the
    TensorFlow 1.14+ package installed.

    ```bash
    # cd to the root directory of the MediaPipe repo
    cd -
    python -m mediapipe.examples.desktop.youtube8m.generate_vggish_frozen_graph
    ```
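
    As an optional sanity check, you can confirm the frozen graph parses as a
    valid GraphDef. The output path below is an assumption about where the
    script writes the graph; adjust it if your copy differs.

    ```python
    # Optional: confirm the frozen VGGish graph parses as a valid GraphDef.
    import tensorflow as tf

    graph_def = tf.compat.v1.GraphDef()
    with open('/tmp/mediapipe/vggish_new.pb', 'rb') as f:  # path is an assumption
        graph_def.ParseFromString(f.read())
    print(len(graph_def.node), 'nodes in the frozen graph')
    ```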

4. Generate MediaSequence metadata from the input video

    Note: the output file is /tmp/mediapipe/metadata.tfrecord

    ```bash
    python -m mediapipe.examples.desktop.youtube8m.generate_input_sequence_example \
      --path_to_input_video=/absolute/path/to/the/local/video/file
    ```
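
    If you want to verify what was written, a SequenceExample stored in a
    TFRecord can be inspected with a few lines of TensorFlow (a minimal
    sketch, assuming the default output path from the note above):

    ```python
    # Print the metadata SequenceExample produced by step 4.
    import tensorflow as tf

    for record in tf.compat.v1.io.tf_record_iterator('/tmp/mediapipe/metadata.tfrecord'):
        example = tf.train.SequenceExample.FromString(record)
        print(example)
    ```
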
5. Run the MediaPipe binary to extract the features

    ```bash
    bazel build -c opt \
      --define MEDIAPIPE_DISABLE_GPU=1 --define no_aws_support=true \
      mediapipe/examples/desktop/youtube8m:extract_yt8m_features

    ./bazel-bin/mediapipe/examples/desktop/youtube8m/extract_yt8m_features \
      --calculator_graph_config_file=mediapipe/graphs/youtube8m/feature_extraction.pbtxt \
      --input_side_packets=input_sequence_example=/tmp/mediapipe/metadata.tfrecord \
      --output_side_packets=output_sequence_example=/tmp/mediapipe/output.tfrecord
    ```
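
    The extracted features land in /tmp/mediapipe/output.tfrecord as a
    SequenceExample. The exact feature keys are defined by the graph, so the
    sketch below simply lists whatever context features and feature lists the
    record actually contains rather than assuming key names:

    ```python
    # List the contents of the extracted-feature SequenceExample.
    import tensorflow as tf

    for record in tf.compat.v1.io.tf_record_iterator('/tmp/mediapipe/output.tfrecord'):
        example = tf.train.SequenceExample.FromString(record)
        print('context keys:', sorted(example.context.feature.keys()))
        print('feature lists:', sorted(example.feature_lists.feature_list.keys()))
    ```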