Virtual Cinematographer

By Mohamed Tanweer Khatieb

Introduction

The Virtual Cinematographer is crucial in generating video output that is both smaller in size and easier to follow. This is done by using computer vision algorithms to find the lecturer and then crop each 4K frame down to 720p such that only the lecturer (and most of the board he or she is working on) is in the frame.
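
As a rough illustration of this cropping step, the sketch below takes a single 4K frame and a detected lecturer position and returns a 1280x720 crop centred on the lecturer, clamped to the frame edges. The function name, the use of NumPy arrays and the simple vertical framing are illustrative assumptions; the lecturer-detection step itself is not shown.

    import numpy as np

    def crop_to_720p(frame: np.ndarray, lecturer_x: int,
                     out_w: int = 1280, out_h: int = 720) -> np.ndarray:
        """Crop a 4K frame to a 720p window around the lecturer.

        lecturer_x is the lecturer's horizontal position in the full frame,
        e.g. the centre of a bounding box produced by the detection step.
        """
        frame_h, frame_w = frame.shape[:2]
        # Centre the window on the lecturer horizontally, then clamp it so
        # the crop never runs off the edge of the original image.
        left = max(0, min(lecturer_x - out_w // 2, frame_w - out_w))
        # Vertical framing is kept simple here: take the middle band of the frame.
        top = max(0, (frame_h - out_h) // 2)
        return frame[top:top + out_h, left:left + out_w]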

Aims

This module aims to provide the viewer with a smaller video in such a way that:

  1. The video can be streamed online without much buffering.
  2. The reduced-resolution video is of sufficient quality that no crucial information is lost from the lecturer's gestures, facial expressions or writing on the boards.
  3. The output video pans smoothly, responds appropriately and, in general, resembles the work of a human camera operator.

Component Overview

The diagram below shows a high-level overview of the virtual cinematographer module; each component is then discussed in more detail.



Pan Sequencing

This diagram represents the pan operations that the cropping window will follow. These pan operations are generated from the lecturer's positions and are stored as objects containing the start point, end point, number of frames and direction of the pan. This information is needed when the cropping window pans smoothly over these operations to produce the output video. The horizontal axis is the lecturer's position in the venue, and the vertical axis is time (top = start; bottom = end). The blue arrows represent a leftward movement (start = right; end = left), while the red arrows represent a rightward movement (start = left; end = right). Black arrows are noise arrows with no single direction: they are a collection of small left and right movements combined into one operation so as to yield a smooth pan.
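
One way such a pan operation might be represented is as a small record type, as sketched below; the class and field names are illustrative and are not taken from the project's code.

    from dataclasses import dataclass
    from enum import Enum

    class Direction(Enum):
        LEFT = "left"    # blue arrows: start = right, end = left
        RIGHT = "right"  # red arrows: start = left, end = right
        NOISE = "noise"  # black arrows: small movements merged into one operation

    @dataclass
    class PanOperation:
        start_x: int      # cropping-window position at the start of the pan
        end_x: int        # cropping-window position at the end of the pan
        num_frames: int   # number of frames the pan is spread over
        direction: Direction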

Cropping

This figure shows how the cropping window moves over a pan operation. The curve above the frames is the function used to determine the step sizes by which the cropping window moves from the start point to the end point over the specified number of frames. The function is cos(x - 135) + 1, where x is the lecturer's position along the x-axis of the frame.
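
The sketch below shows one possible reading of this: the curve is sampled over the pan and used to weight the per-frame step sizes, so the window accelerates out of the start point and decelerates into the end point. The sampling interval (45 to 225 degrees, one hump of the curve) and the normalisation are assumptions made for the sake of the example.

    import math

    def pan_positions(start_x: float, end_x: float, num_frames: int) -> list[float]:
        """Cropping-window positions over one pan operation.

        Per-frame step sizes are weighted by samples of cos(x - 135) + 1 and
        normalised so the steps add up to the full pan distance, giving a pan
        that eases in and out rather than moving at a constant speed.
        """
        steps = num_frames - 1
        angles = [math.radians(45 + 180 * (i + 0.5) / steps) for i in range(steps)]
        weights = [math.cos(a - math.radians(135)) + 1 for a in angles]
        total = sum(weights)
        positions = [float(start_x)]
        for w in weights:
            positions.append(positions[-1] + (w / total) * (end_x - start_x))
        return positions

With this weighting, the early and late steps of a pan are small and the middle steps larger, which is what produces the smooth start and stop.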

Output

Here is an example of the output video's quality and framing. The lecturer is usually framed towards the edge of the frame in order to keep visible the content to which he or she is referring. If the lecturer is not using or referring to any of the boards in the venue, the cropping window frames him or her in the centre of the frame.
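
As a rough sketch of this framing rule, the function below chooses the left edge of the cropping window: if the lecturer is working at a board (a hypothetical board_centre_x is supplied), the lecturer is pushed towards the edge of the frame away from the board; otherwise the lecturer is centred. The one-quarter offset is an illustrative value, not the project's.

    def window_left(lecturer_x: int, frame_w: int, out_w: int = 1280,
                    board_centre_x: int | None = None) -> int:
        """Left edge of the 720p cropping window for one frame."""
        if board_centre_x is None:
            centre = lecturer_x                   # no board in use: centre the lecturer
        elif board_centre_x > lecturer_x:
            centre = lecturer_x + out_w // 4      # board to the right: lecturer framed left
        else:
            centre = lecturer_x - out_w // 4      # board to the left: lecturer framed right
        return max(0, min(centre - out_w // 2, frame_w - out_w))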

Results and Conclusions

In the end, the Virtual Cinematographer was a success. It achieved all of its primary objectives: it made the video significantly smaller, framed the lecturer correctly, and panned responsively and appropriately. CILT provided a professional evaluation of the output video, stating that its quality matched (and in some cases exceeded) the quality produced by their current implementations. Overall, the module works well, but it is not perfect; there is still scope in future work to improve the framing by adding zoom functionality as well as panning up and down.