CILT has indicated that one of the most important factors for an automated tracking system is the ability to see all the content written on the boards by the lecturer. A proposed solution to this request is to create a separate video stream at a very low frame rate (1 frame per second, to reduce file size) that includes only the board region and serves as an overview video.
The board detection module of the Track4k system is responsible for detecting the boards in the venue and finding the tightest bounding box that encloses all of them. This crop region is saved to a video file. The primary purpose is to focus on the writing on the boards, as required by CILT.
This module investigates how to detect the boards, determine when they are used, and segment/crop the region containing them. This information can then be used to make better panning adjustments. CILT has stated two expectations for this module: the blackboards must be legible, and, to avoid a video post-processing backlog from forming, videos must be processed within 8 hours of recording, with each video taking no more than 300% of its length to process. Given these constraints, we need to answer the following to determine whether the project is a success:
- Can we detect blackboard usage in a lecture recording video stream using computer vision algorithms?
- Using only modest computer hardware, can the required processing be completed in a time that is less than 90% of the video length?
The diagram below shows a high-level overview of the board detection module. After that, we discuss each component in more detail.
Detect Areas of Motion
This functionality detects where motion occurs over a set number of frames and then bounds these regions with rectangles. This is useful for the tracking stage of the project, as it reduces the search space when looking for the lecturer. It works by subtracting consecutive frames to find the difference between them, then combining all the resulting differences into one final image representing all motion over those frames.
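The frame-subtraction idea above can be sketched as follows. This is a minimal numpy illustration, not the Track4k implementation: it assumes grayscale frames of equal size, and the `accumulate_motion` and `bounding_box` names are hypothetical (a production system would likely use OpenCV routines such as `cv2.absdiff` and `cv2.boundingRect`).

```python
import numpy as np

def accumulate_motion(frames, threshold=25):
    """Combine absolute differences of consecutive frames into one
    binary mask representing all motion over the sequence.

    frames: list of 2-D uint8 grayscale arrays of identical shape.
    """
    motion = np.zeros(frames[0].shape, dtype=bool)
    for prev, curr in zip(frames, frames[1:]):
        # Widen to int16 so the subtraction cannot wrap around.
        diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
        motion |= diff > threshold
    return motion

def bounding_box(mask):
    """Tightest (top, left, bottom, right) rectangle around True pixels."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return (int(ys.min()), int(xs.min()), int(ys.max()) + 1, int(xs.max()) + 1)

# Two synthetic 100x100 frames in which a 10x10 bright block appears.
a = np.zeros((100, 100), dtype=np.uint8)
b = a.copy()
b[40:50, 60:70] = 255
box = bounding_box(accumulate_motion([a, b]))
print(box)  # (40, 60, 50, 70)
```

The same mask could equally be split into several rectangles (one per connected motion region) to shrink the lecturer search space further.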
Board Segmentation
To find the board regions, we look for shapes in the frame. An edge detector detects the boundaries of the boards and the board columns, which are then used to segment the boards. Each board is bounded with a rectangle and passed into the feature detection stage to detect usage.
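The edge-based segmentation step can be illustrated with the sketch below. To stay self-contained it uses a crude gradient-magnitude edge map in place of a proper detector such as Canny; the function names and thresholds are assumptions, not the source's implementation.

```python
import numpy as np

def edge_mask(gray, threshold=40):
    """Rough edge map from horizontal/vertical intensity gradients.

    A numpy stand-in for a real edge detector (e.g. cv2.Canny);
    gray is a 2-D uint8 array.
    """
    g = gray.astype(np.float32)
    gx = np.zeros_like(g)
    gy = np.zeros_like(g)
    gx[:, 1:-1] = g[:, 2:] - g[:, :-2]   # central difference along x
    gy[1:-1, :] = g[2:, :] - g[:-2, :]   # central difference along y
    return np.hypot(gx, gy) > threshold

def board_bounding_box(gray):
    """Tightest rectangle enclosing all detected edge pixels."""
    ys, xs = np.nonzero(edge_mask(gray))
    if ys.size == 0:
        return None
    return (int(ys.min()), int(xs.min()), int(ys.max()) + 1, int(xs.max()) + 1)

# Synthetic frame: bright wall (200) with one dark board (30).
frame = np.full((120, 160), 200, dtype=np.uint8)
frame[20:100, 30:130] = 30
print(board_bounding_box(frame))  # (19, 29, 101, 131)
```

A real frame would produce many spurious edges, so contour grouping (e.g. via `cv2.findContours`) would be needed to keep only board-shaped rectangles before taking the enclosing box.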
Board Usage Detection
To detect when a board is used, we need some way of measuring how much of the board is covered and of detecting when this changes. We use a feature detection algorithm that returns the number of key points, such as corners and edges, which are typically associated with writing. A significant change in the number of features indicates that the board was used.
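The usage test above can be sketched as follows. The source does not name its feature detector (FAST, ORB, or SIFT would be typical choices), so this numpy stand-in simply counts high-gradient pixels as a proxy for keypoints; `feature_count`, `board_used`, and both thresholds are illustrative assumptions.

```python
import numpy as np

def feature_count(gray, threshold=40):
    """Proxy for a keypoint detector: count high-gradient pixels.

    A real system would count detector keypoints (e.g. FAST corners);
    this captures the same idea that writing adds edges and corners.
    """
    g = gray.astype(np.float32)
    gx = np.abs(np.diff(g, axis=1))  # horizontal intensity steps
    gy = np.abs(np.diff(g, axis=0))  # vertical intensity steps
    return int((gx > threshold).sum() + (gy > threshold).sum())

def board_used(before, after, rel_change=0.5, min_delta=50):
    """Flag usage when the feature count changes significantly
    between two samples of the same board region."""
    c0, c1 = feature_count(before), feature_count(after)
    delta = abs(c1 - c0)
    return delta >= min_delta and delta >= rel_change * max(c0, 1)

# Blank board vs. the same board with simulated chalk strokes.
blank = np.full((80, 120), 40, dtype=np.uint8)
written = blank.copy()
written[10:12, 10:110] = 230   # a horizontal stroke
written[30:60, 20:22] = 230    # a vertical stroke
print(board_used(blank, written))  # True
```

Requiring both an absolute and a relative change guards against flagging usage from sensor noise on a blank board.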
Results and Conclusions
We needed to develop a system that could segment the blackboards while detecting board usage. Computer vision algorithms were used to identify board usage in an input lecture video, and the following conclusions were drawn. It is possible to detect blackboard usage in a lecture recording using computer vision algorithms. Although our system struggled to identify the boards in low-light conditions and when the boards were very dusty, the algorithms are robust in the sense that detection only needs to succeed at some point during the video for the board region to be segmented correctly. We also found that the video can be processed in less than 90% of the original video's length. This finding satisfies the requirement from CILT that processing time should not exceed 300% of the original video length (or 100% for this module).