The mission of QuTech is to bring quantum technology to industry and society by translating fundamental scientific research into applied research. To this end we are developing Quantum Inspire (QI), a full-stack quantum computer prototype for future co-development and collaborative R&D in quantum computing. A prerelease of this prototype system already offers the public cloud-based access to QuTech technologies, such as a programmable quantum computer simulator (with up to 31 qubits), along with tutorials and background material on quantum information science (www.quantum-inspire.com). Access to a programmable, CMOS-compatible silicon spin-qubit quantum processor will be provided in the next deployment phase. The first generation of QI’s quantum processors consists of a double quantum dot hosted in an in-house grown SiGe/28Si/SiGe heterostructure and defined with a single layer of Al gates. Here we give an overview of important aspects of the QI full stack. We illustrate QI’s modular system architecture and touch on the manufacturing and electrical characterization of its first-generation two-spin-qubit quantum processor unit. We close with a section on QI’s qubit calibration framework, taking the definition of a single-qubit Pauli X gate as a concrete example of matching an experiment to a component of the circuit model for quantum computation.
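As an illustration of the kind of matching described above, the sketch below checks in plain Python/NumPy that a resonant pi-pulse of calibrated duration realises the Pauli X gate up to a global phase. The Rabi frequency and pulse duration are illustrative values, not QI calibration parameters.

```python
import numpy as np
from scipy.linalg import expm

# Pauli X in the computational basis {|0>, |1>}.
X = np.array([[0, 1],
              [1, 0]], dtype=complex)

def rx(theta):
    """Rotation about the x-axis of the Bloch sphere by angle theta."""
    return expm(-1j * theta / 2 * X)

# A calibrated pi-pulse (hypothetical Rabi frequency and duration) should
# realise Rx(pi), which equals the Pauli X gate up to a global phase.
rabi_frequency = 5e6                  # Hz, illustrative value
pi_duration = 0.5 / rabi_frequency    # theta = 2*pi*f_Rabi*t = pi
theta = 2 * np.pi * rabi_frequency * pi_duration

U = rx(theta)
phase = U[0, 1] / X[0, 1]             # strip the global phase before comparing
assert np.allclose(U / phase, X)
print("pi-pulse matches Pauli X up to a global phase")
```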
Person tracking across non-overlapping cameras and other types of video analytics benefit from spatial calibration information that relates pixel coordinates to world coordinates within a camera and provides an estimate of the distance between cameras. In a large environment with many cameras, or for frequent ad-hoc deployments of cameras, the cost of this calibration is high, which creates a barrier to the use of video analytics. Automating the calibration shortens configuration time and enables video analytics in a wider range of scenarios, including ad-hoc crisis situations and large-scale surveillance systems. We present an autocalibration method based entirely on pedestrian detections in surveillance video from multiple non-overlapping cameras. In this paper, we show the two main components of automatic calibration. The first is intra-camera geometry estimation, which yields estimates of the tilt angle, focal length and camera height and is needed to convert between pixels and meters. The second is inter-camera topology inference, which yields an estimate of the distance between cameras and is needed for spatio-temporal analysis in multi-camera tracking. This paper describes each of these methods and provides results on realistic video data.
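A minimal sketch of the pixel-to-meter conversion that the intra-camera estimates enable, assuming a simple pinhole camera looking down at a flat ground plane; the function and parameter names are illustrative and not the paper's implementation.

```python
import numpy as np

def pixel_to_ground(u, v, f, cx, cy, tilt, height):
    """Project the foot point (u, v) of a detected pedestrian onto the
    ground plane, given focal length f (pixels), principal point (cx, cy),
    camera tilt below the horizon (radians) and camera height (metres).
    Returns (X, Y): lateral offset and distance along the ground, in metres."""
    du, dv = u - cx, v - cy
    denom = dv * np.cos(tilt) + f * np.sin(tilt)
    if denom <= 0:
        raise ValueError("pixel is above the horizon; no ground intersection")
    s = height / denom
    X = s * du
    Y = s * (f * np.cos(tilt) - dv * np.sin(tilt))
    return X, Y

# Illustrative values: 20 deg tilt, 4 m mounting height, 1000 px focal length.
print(pixel_to_ground(960, 700, f=1000, cx=960, cy=540,
                      tilt=np.radians(20), height=4.0))
```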
Object recognition and localization are important to automatically interpret video and allow better querying on its content. We propose a method for object localization that learns incrementally and addresses four key aspects. Firstly, we show that for certain applications, recognition is feasible with only a few training samples. Secondly, we show that novel objects can be added incrementally without retraining existing objects, which is important for fast interaction. Thirdly, we show that an unbalanced number of positive training samples leads to biased classifier scores that can be corrected by modifying weights. Fourthly, we show that the detector performance can deteriorate due to hard-negative mining for similar or closely related classes (e.g., for Barbie and dress, because the doll is wearing a dress). This can be solved by our hierarchical classification. We introduce a new dataset, which we call TOSO, and use it to demonstrate the effectiveness of the proposed method for the localization and recognition of multiple objects in images.
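As a rough illustration of the third aspect (correcting biased classifier scores caused by unbalanced positive samples by modifying weights), the sketch below uses scikit-learn's class_weight="balanced" option; the data and classifier choice are placeholders, not the detector used in the paper.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical setup: a few positives for a new object class vs. many negatives.
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=1.0, size=(5, 64))      # 5 positive samples
X_neg = rng.normal(loc=0.0, size=(500, 64))    # 500 negative samples
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 5 + [0] * 500)

# "balanced" reweights each class inversely to its frequency, so the few
# positives are not drowned out and scores for different objects stay comparable.
clf = LinearSVC(class_weight="balanced", C=1.0).fit(X, y)
print(clf.decision_function(X_pos[:2]))
```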
In the security domain, cameras are important for assessing critical situations. Apart from fixed surveillance cameras, we observe an increasing number of sensors on mobile platforms, such as drones, vehicles and persons. Mobile cameras allow rapid, local deployment, enabling many novel applications and effects, such as the reduction of violence between police and citizens. However, the increased use of bodycams also creates challenges: how can end-users extract information from the abundance of video, how can that information be presented, and how can an officer retrieve it efficiently? At the same time, such video offers the opportunity to stimulate the professional’s memory and to support complete and accurate reporting. In this paper, we show how video content analysis (VCA) can address these challenges and seize these opportunities. To this end, we focus on methods for creating a complete summary of the video, which allows quick retrieval of relevant fragments. The content analysis for summarization consists of several components, such as stabilization, scene selection, motion estimation, localization, pedestrian tracking and action recognition in the video from a bodycam. The different components and visual representations of summaries are presented for retrospective investigation.
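The summarization components listed above could be chained as in the following sketch; the stage functions are empty placeholders standing in for the actual VCA algorithms and are only meant to show the pipeline structure.

```python
from dataclasses import dataclass, field

@dataclass
class Summary:
    """Container for the artefacts that each VCA stage adds (illustrative)."""
    frames: list
    annotations: dict = field(default_factory=dict)

# Placeholder stages standing in for the components named in the abstract;
# the real algorithms (stabilization, tracking, action recognition, ...) are not shown.
def stabilize(s):         s.annotations["stabilized"] = True; return s
def select_scenes(s):     s.annotations["scenes"] = [(0, len(s.frames))]; return s
def estimate_motion(s):   s.annotations["motion"] = [0.0] * len(s.frames); return s
def track_pedestrians(s): s.annotations["tracks"] = []; return s
def recognize_actions(s): s.annotations["actions"] = []; return s

PIPELINE = [stabilize, select_scenes, estimate_motion,
            track_pedestrians, recognize_actions]

def summarize(frames):
    s = Summary(frames)
    for stage in PIPELINE:
        s = stage(s)
    return s

print(summarize(frames=list(range(100))).annotations.keys())
```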
Airborne platforms, such as UAVs with Wide Area Motion Imagery (WAMI) sensors, can cover multiple square kilometers and produce large amounts of video data. Analyzing all of this data to satisfy information needs becomes increasingly labor-intensive for an image analyst. Furthermore, the capacity of the datalink in operational areas may be inadequate to transfer all data to the ground station. Automatic detection and tracking of people and vehicles makes it possible to send only the most relevant footage to the ground station and assists the image analysts in effective data searches. In this paper, we propose a method for detecting and tracking vehicles in high-resolution WAMI images from a moving airborne platform. For the vehicle detection we use a cascaded set of classifiers trained with AdaBoost on Haar features. This detector works on individual images and therefore does not depend on image motion stabilization. For the vehicle tracking we use a local template-matching algorithm. This approach has two advantages: first, it does not depend on image motion stabilization and it counters the inaccuracy of the GPS data embedded in the video; second, it can find matches when the vehicle detector misses a detection, resulting in long tracks even when the imagery has a low frame rate. To minimize false detections, we also integrate height information from a 3D reconstruction created from the same images. By using the locations of buildings and roads, we are able to filter out false detections and increase the performance of the tracker. We show that the vehicle tracks can also be used to detect more complex events, such as traffic jams and fast-moving vehicles. This enables the image analyst to perform a faster and more effective search of the data.
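A rough sketch of how such a detector/tracker pair could be wired up with OpenCV: a Haar cascade for per-frame detection (the cascade file name is a placeholder; a vehicle-trained model would be required) and normalized cross-correlation template matching in a local search window for tracking. This is an illustration of the general technique, not the paper's implementation.

```python
import cv2
import numpy as np

# A cascade trained on vehicle Haar features would be loaded here; the file
# name is a placeholder, not the cascade used in the paper.
detector = cv2.CascadeClassifier("vehicle_haar_cascade.xml")

def detect_vehicles(frame_gray):
    """Per-frame detection: works on single images, so no motion stabilization is needed."""
    return detector.detectMultiScale(frame_gray, scaleFactor=1.1, minNeighbors=3)

def track_by_template(prev_gray, cur_gray, box, search_margin=20):
    """Local template matching around the previous box to follow a vehicle,
    bridging frames in which the detector misses it."""
    x, y, w, h = box
    template = prev_gray[y:y + h, x:x + w]
    y0, y1 = max(0, y - search_margin), min(cur_gray.shape[0], y + h + search_margin)
    x0, x1 = max(0, x - search_margin), min(cur_gray.shape[1], x + w + search_margin)
    window = cur_gray[y0:y1, x0:x1]
    scores = cv2.matchTemplate(window, template, cv2.TM_CCOEFF_NORMED)
    _, _, _, (mx, my) = cv2.minMaxLoc(scores)
    return (x0 + mx, y0 + my, w, h)
```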
Proactive detection of incidents is required to decrease the cost of security incidents. This paper focuses on the automatic early detection of suspicious pickpocket behavior using track-based features in a crowded shopping mall. Our method consists of several steps: pedestrian tracking, feature computation and pickpocket recognition. This is challenging because the environment is crowded, people move freely through areas that cannot be covered by a single camera, the actual snatch is a subtle action, and collaboration is complex social behavior. We carried out an experiment with more than 20 validated pickpocket incidents. We used a top-down approach to translate expert knowledge into features and rules, and a bottom-up approach to learn discriminating patterns with a classifier. The classifier was used to separate the pickpockets from normal passers-by shopping in the mall. We performed cross-validation to train and evaluate our system. In this paper, we describe our method, identify the most valuable features, and analyze the results obtained in the experiment. We estimate the quality of these features and the performance of automatic detection of (collaborating) pickpockets. The results show that many of the pickpockets can be detected at a low false alarm rate.
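A hedged sketch of track-based feature computation and cross-validated classification as described above; the features and the random-forest classifier are illustrative stand-ins, not the exact expert-derived feature set or classifier used in the experiment.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def track_features(track):
    """Illustrative track-based features (not the paper's exact feature set);
    track is an array of (t, x, y) positions for one pedestrian."""
    t, xy = track[:, 0], track[:, 1:]
    step = np.linalg.norm(np.diff(xy, axis=0), axis=1)
    speed = step / np.maximum(np.diff(t), 1e-6)
    return np.array([
        t[-1] - t[0],                                              # dwell time in the mall
        step.sum(),                                                # path length
        speed.mean(), speed.std(),                                 # speed statistics
        np.linalg.norm(xy[-1] - xy[0]) / max(step.sum(), 1e-6),    # path straightness
    ])

def evaluate(tracks, labels):
    """Cross-validated separation of pickpockets (1) from normal passers-by (0)."""
    X = np.stack([track_features(tr) for tr in tracks])
    clf = RandomForestClassifier(n_estimators=200, class_weight="balanced")
    return cross_val_score(clf, X, labels, cv=5, scoring="roc_auc")
```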
Automatic detection of abnormal behavior in CCTV cameras is important for improving security in crowded environments, such as shopping malls, airports and railway stations. This behavior can be characterized at different time scales, e.g., by small-scale subtle and obvious actions or by large-scale walking patterns and interactions between people. For example, pickpocketing can be recognized from the actual snatch (small scale), or from the perpetrator following the victim or interacting with an accomplice before and after the incident (longer time scale). This paper focuses on event recognition by detecting large-scale track-based patterns. Our event recognition method consists of several steps: pedestrian detection, object tracking, track-based feature computation and rule-based event classification. In the experiment, we focused on single-track actions (walk, run, loiter, stop, turn) and track interactions (pass, meet, merge, split). The experiment includes a controlled setup in which 10 actors perform these actions. The method is also applied to all tracks generated in a crowded shopping mall within a selected time frame. The results show that most of the actions can be detected reliably (on average 90%) at a low false positive rate (1.1%), and that the interactions obtain lower detection rates (70% at 0.3% FP). This method may become one of the components that assist operators in finding threatening behavior and enrich the selection of videos to be observed.
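A minimal sketch of rule-based classification of the single-track actions from simple speed and heading statistics; the thresholds and rules are illustrative, not the ones tuned for the experiment.

```python
import numpy as np

def classify_action(track, walk_speed=0.5, run_speed=3.0, loiter_radius=2.0):
    """Rule-based labelling of one track as stop/loiter/run/turn/walk.
    track: array of (t, x, y); thresholds (m/s, m) are illustrative values."""
    t, xy = track[:, 0], track[:, 1:]
    speed = np.linalg.norm(np.diff(xy, axis=0), axis=1) / np.maximum(np.diff(t), 1e-6)
    headings = np.arctan2(*np.diff(xy, axis=0).T[::-1])      # arctan2(dy, dx)
    total_turn = np.abs(np.diff(np.unwrap(headings))).sum()
    radius = np.linalg.norm(xy - xy.mean(axis=0), axis=1).max()

    if speed.mean() < walk_speed and radius < loiter_radius:
        return "stop" if radius < 0.5 else "loiter"
    if speed.mean() > run_speed:
        return "run"
    if total_turn > np.pi:
        return "turn"
    return "walk"
```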
KEYWORDS: Video, 3D image processing, Panoramic photography, 3D video streaming, 3D displays, Surgery, 3D modeling, Laparoscopy, Endoscopy, 3D image reconstruction
In comparison to open surgery, endoscopic surgery suffers from impaired depth perception and a narrower field of view. To improve depth perception, the Da Vinci robot offers three-dimensional (3-D) video on the console for the surgeon but not for assistants, although both must collaborate. We improved the shared perception of the whole surgical team by connecting live 3-D monitors to all three available Da Vinci generations, probed user experience after two years by questionnaire, and compared time measurements of a predefined complex interaction task performed with a 3-D monitor versus a two-dimensional one. Additionally, we investigated whether the complex mental task of reconstructing a 3-D overview from an endoscopic video can be performed by a computer and shared among users. During the study, 925 robot-assisted laparoscopic procedures were performed in three hospitals, including prostatectomies, cystectomies, and nephrectomies. Thirty-one users participated in our questionnaire. Eighty-four percent preferred 3-D monitors and 100% reported improved spatial perception. All participating urologists indicated quicker performance of tasks requiring delicate collaboration (e.g., clip placement) when assistants used 3-D monitors. Eighteen users participated in a timing experiment during a delicate in vitro cooperation task. Teamwork was significantly (40%) faster with the 3-D monitor. Computer-generated 3-D reconstructions from recordings offered very wide interactive panoramas with educational value, although the present embodiment is vulnerable to movement artifacts.
We present a robust method for landing zone selection using obstacle detection to be used for UAV emergency landings. The method is simple enough to allow real-time implementation on a UAV system. The method is able to detect objects in the presence of camera movement and motion parallax. Using the detected obstacles we select a safe landing zone for the UAV. The motion and structure detection uses background estimation of stabilized video. The background variation is measured and used to enhance the moving objects if necessary. In the motion and structure map a distance transform is calculated to find a suitable location for landing.
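A minimal sketch of the final selection step, assuming the motion-and-structure detection has already produced a binary obstacle mask: a distance transform gives, for every free pixel, the clearance to the nearest obstacle, and the maximum is taken as the landing location. The OpenCV calls are standard; the surrounding detection pipeline is not shown.

```python
import cv2
import numpy as np

def select_landing_zone(obstacle_mask):
    """obstacle_mask: uint8 map in which 255 marks detected motion/structure.
    Returns the free pixel with the largest clearance to any obstacle."""
    free = cv2.bitwise_not(obstacle_mask)                  # free space = non-obstacle pixels
    dist = cv2.distanceTransform(free, cv2.DIST_L2, 5)     # clearance per free pixel
    _, max_dist, _, max_loc = cv2.minMaxLoc(dist)
    return max_loc, max_dist                               # (x, y) pixel and clearance in pixels
```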
Compared to open surgery, minimally invasive surgery offers reduced trauma and faster recovery. However, the lack of direct view limits spatial perception. Stereo-endoscopy improves depth perception, but is still restricted to the direct endoscopic field of view. We describe a novel technology that reconstructs 3D panoramas from endoscopic video streams, providing a much wider cumulative overview. The method is compatible with any endoscope. We demonstrate that it is possible to generate photorealistic 3D environments from mono- and stereoscopic endoscopy. The resulting 3D reconstructions can be directly applied in simulators and e-learning. Extended to real-time processing, the method looks promising for telesurgery and other remote vision-guided tasks.
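A hedged sketch of one generic building block such a reconstruction could rely on: estimating the relative camera pose between two consecutive monocular endoscopic frames from ORB feature matches and the essential matrix. This is a standard structure-from-motion step, not the specific pipeline of this work.

```python
import cv2
import numpy as np

def relative_pose(img1, img2, K):
    """Estimate the relative camera pose between two monocular frames from
    ORB matches and the essential matrix. K: 3x3 camera intrinsics matrix."""
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(img1, None)
    k2, d2 = orb.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(d1, d2)
    pts1 = np.float32([k1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([k2[m.trainIdx].pt for m in matches])
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t  # rotation and unit-scale translation between the two frames
```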