In this article, we will detail key considerations for building a workstation best equipped for facial recognition.
Workstations are high-performing computers, much more than your average PC. Intended for business or professional use, they are designed to improve workflows and perform multiple functions at once.
Technically speaking, workstations are built from a variety of hardware, such as a high-end x86 Intel CPU (either Core i or Xeon) or an AMD CPU, paired with an NVIDIA GPU. They often run Linux or Windows. For facial recognition, a configuration like this works well for running and managing multiple live video streams on a single workstation.
The use cases for facial recognition workstations spread across industries. They typically involve multiple video feeds capturing multiple faces that need to be analyzed concurrently, for example a security system to which all IP cameras from a single building are connected. And more often than not, workstations need to also run other applications at the same time. Let’s go over a couple of examples to illustrate how it works.
Security is a major concern for all types of buildings, whether public or private. Think about power plants, public transportation stations, commercial and office buildings, industrial plants, schools, malls and more. In these environments, the security and safety of individuals as well as asset protection are of the utmost importance. A facial recognition workstation is well suited to this task. It could be housed in an IT or computer room in the building and, when equipped with facial recognition, can monitor feeds from tens to hundreds of security cameras or smart devices throughout the facility.
For example, devices like smart locks, surveillance cameras and self-service kiosks can run facial detection, while the workstation they are connected to runs facial recognition. This offloads detection to the thin devices and reserves the computing power of the workstation, which is much more powerful, for recognition itself.
Facial recognition can also be deployed to a server that lives in the cloud or a data center, providing service for mobile or web applications. In these use cases, thousands or millions of end-user devices capture images or videos that are sent back to the server for facial recognition and authentication.
For example, imagine a user interacting with a mobile banking app. When the user opens the app, they are prompted to log in using their face as their ID. The device captures the facial vectors, encrypts the information in a small file and sends it to the server, which runs facial recognition to validate that the user is the actual account holder. Once confirmed, the user is allowed access to the app and can perform their desired task, such as a mobile deposit.
Another example is car, bike or scooter rental services built on web applications. A user seeking to rent a car would go to the service’s website and make their desired selection. Instead of manually entering all their information, including a driver’s license number, they could let a web camera capture, encrypt and send live facial vector data to the server. The server would then confirm that the user is a registered member and verify their driver’s license details. Once confirmed, the user would be able to complete their reservation. The entire process could be completed in a matter of seconds.
There are two common workstation infrastructure setups: on-premise and in the cloud. Below we detail factors to consider for each.
Workstations used for building security and surveillance are best installed on-premise so you can connect them directly to devices via the intranet (internal network). In these environments, there will be tens to hundreds of cameras capturing video for facial recognition throughout a single building, plant or campus, all communicating back to the workstation. The intranet helps ensure security, with all image and video feeds controlled and staying within the organization’s firewall.
In addition to security, intranets provide benefits for workstation performance. Many surveillance cameras produce 1080p video streams, each of which takes about 2-8 Mbps of bandwidth. High-speed intranets are capable of meeting the combined bandwidth requirements when many cameras stream at once.
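To make the bandwidth figure concrete, here is a minimal sketch that multiplies the 2-8 Mbps per-stream range by a camera count and checks the worst case against a network link. The function names, the 1 Gbps link and the 70% headroom factor are illustrative assumptions, not figures from our testing.

```python
def aggregate_bandwidth_mbps(num_cameras, low_mbps=2.0, high_mbps=8.0):
    """Return (low, high) total bandwidth in Mbps for num_cameras 1080p streams.

    The 2-8 Mbps per-stream defaults come from the range quoted in this article.
    """
    if num_cameras < 0:
        raise ValueError("num_cameras must be non-negative")
    return num_cameras * low_mbps, num_cameras * high_mbps


def fits_link(num_cameras, link_mbps, headroom=0.7):
    """True if worst-case camera traffic stays within a fraction of the link.

    headroom is an assumed safety margin (0.7 means use at most 70% of the link).
    """
    _, worst_case = aggregate_bandwidth_mbps(num_cameras)
    return worst_case <= link_mbps * headroom


if __name__ == "__main__":
    for cameras in (10, 50, 100):
        low, high = aggregate_bandwidth_mbps(cameras)
        ok = fits_link(cameras, link_mbps=1000)  # assumed 1 Gbps intranet link
        print(f"{cameras} cameras: {low:.0f}-{high:.0f} Mbps, fits 1 Gbps link: {ok}")
```

Under these assumptions, 100 cameras at the high end of the range would need more than a single 1 Gbps link comfortably provides, which is why the network deserves the same planning attention as the GPU.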
When designed optimally, on-premise workstations are able to decode large amounts of data in real time. This is one of the most important factors to consider. Surveillance cameras encode video streams with the H.264 (AVC) or H.265 (HEVC) codec for transmission. Before AI vision algorithms can analyze the video, the compressed stream must be decoded into raw video buffers. Adding GPUs to a workstation allows it to decode all of this data in real time. Many GPUs can decode more than 40 video streams simultaneously. GPUs by NVIDIA in particular can also accelerate the AI algorithms that run on top of the decoded video buffers, offering unmatched runtime performance and computing capacity.
As an alternative to on-premise setups, workstations can also be installed in the cloud or in data centers. These are either public cloud, private cloud or a hybrid of the two. Cloud installations are hosted by third-party providers, whereas on-premise is hosted in house. Because of this, workstations in the cloud do not require much effort from the end user. They also allow businesses to scale up or down easily. However, the security risks of a cloud installation can be greater than those of an on-premise one. If a business rents space at a data center but sets up its own servers or workstations, risks are limited. In cases where it also rents virtual processing on shared machines, exposure can be significantly higher, as the data is managed by a third party.
It is most common for a server to be deployed in the cloud when the use case has devices, such as mobile phones, send in images or videos for facial recognition.
The two most popular configurations are workstation grade and server grade. Below we outline the details of each.
For most on-premise installations where you need to monitor tens to hundreds of surveillance cameras, a workstation-grade configuration is best.
Workstation: Intel Core i, Xeon CPU or AMD server-class CPU; with NVIDIA Quadro GPU
Here are examples of configurations well suited to perform facial recognition, as well as other AI vision algorithms, for multiple surveillance camera channels and feeds. Supermicro™ makes workstations that match these popular configurations. They typically include up to four GPU cards in a single system, enabling them to handle hundreds of video channels, especially when using a fully optimized facial recognition algorithm, like FaceMe®.
For optimal performance, we recommend using the Quadro GPU series, such as the Quadro RTX 6000/8000, or the newly released NVIDIA RTX A6000. If budget is a constraint, the Quadro RTX 5000 is a more affordable solution. However, it cannot handle as much traffic or as many video channels as the higher-level models. The Quadro RTX 6000 and Quadro RTX 8000 have similar performance for facial recognition. When we tested them with the FaceMe® VH model, both delivered 340 fps. The Quadro RTX 5000 delivered 220 fps. The highest performing GPU is the RTX A6000, which outperformed all the others at 410 fps.
Other viable options include GeForce series GPUs, such as the RTX 3090. While they offer high performance at a low cost, they are not designed to support 24/7 use and only have a one-year warranty. For these reasons, the Quadro series offers better options. Quadro series GPU cards are well designed. They come with fans and do not require strict temperature or humidity control. They are flexible and easy to deploy in nearly any facility, from smart offices to factories to commercial buildings.
If you want to install the workstation in a data center or the cloud and be able to handle facial recognition requests from hundreds to millions of devices, you will need a server-grade configuration.
Server: CPU and Fanless (passive cooling) GPU
Because server-grade configurations are housed in data centers and server rooms, temperature and humidity must be controlled. GPUs often include fans to help exhaust heat and maintain high performance under heavy workloads. However, passive cooling cards, like the Tesla V100 or T4, are completely fanless. Instead, they are built with a heat sink and require systems that provide good airflow through it to ensure the GPU operates within thermal limits. Fanless GPUs with passive cooling are best for server and data center environments.
We recommend either the iEi HTB-200-C236 or Advantech MIC-770 systems. Both are well designed, providing great airflow and catering to the NVIDIA T4 heat sink. Temperature is controlled, keeping them well under thermal limits while the GPU operates at its full workload. For more information on qualified servers, you can check out NVIDIA’s Qualified Server Catalog.
The number of servers and GPUs needed to best run facial recognition depends on how many transactions per second are required during peak hours. Think about the user base, including the number of daily active users, as well as how frequently each user authenticates. We tested the FaceMe® VH model and found the T4 was able to support facial recognition at 192 fps with the GPU temperature well controlled.
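As a rough illustration of this sizing exercise, the sketch below turns daily active users and usage frequency into a peak request rate and then into a GPU count, using the 192 fps T4 figure above. It assumes one recognition request corresponds to one processed frame. The share of daily traffic that lands in the busiest hour and the target utilization are hypothetical planning values that you would replace with your own measurements.

```python
import math


def peak_requests_per_second(daily_active_users, requests_per_user_per_day,
                             peak_hour_share=0.2):
    """Estimate peak requests per second.

    peak_hour_share is an assumed fraction of daily traffic that arrives
    during the single busiest hour (0.2 means 20%).
    """
    daily_requests = daily_active_users * requests_per_user_per_day
    return daily_requests * peak_hour_share / 3600.0


def gpus_needed(peak_rps, gpu_fps=192.0, target_utilization=0.7):
    """Number of GPUs needed to serve peak_rps.

    gpu_fps defaults to the T4 figure quoted in this article (VH model).
    target_utilization is an assumed safety margin, so each GPU is planned
    at 70% of its measured throughput. One request is assumed to equal
    one frame.
    """
    if peak_rps <= 0:
        return 0
    return math.ceil(peak_rps / (gpu_fps * target_utilization))


if __name__ == "__main__":
    # Hypothetical app: 1,000,000 daily active users, 3 logins each per day.
    rps = peak_requests_per_second(1_000_000, 3)
    print(f"Peak load: {rps:.1f} requests/s, GPUs needed: {gpus_needed(rps)}")
```

A proof of concept with real traffic is still the best way to validate these numbers, since image size, faces per frame and network overhead all affect actual throughput.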
To save on server computation power, we suggest moving workloads from the server to the edge. For example, you could run face detection on the edge and face recognition on the server. This split frees up server power, since the server only has to process image frames that contain a face. You could also have face detection, face template extraction and anti-spoofing on the edge, handled entirely by high-end smartphones (such as Android phones using Snapdragon). The server would only need to handle face template extraction for lower-end to middle-end smartphones (like legacy iPhones).
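The edge and server split described above can be sketched as a simple filter: the edge device runs detection on every frame and forwards only the frames that contain a face. This is a minimal sketch, not the FaceMe® API. The `Frame` type, `detect_faces` and `frames_for_server` are hypothetical names, and the detector is a stub that stands in for a real edge model.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Frame:
    camera_id: str
    index: int
    face_count: int  # stand-in for pixel data; a real frame would hold an image


def detect_faces(frame: Frame) -> int:
    """Hypothetical edge-side detector stub: returns the number of faces found."""
    return frame.face_count


def frames_for_server(frames: Iterable[Frame],
                      detector: Callable[[Frame], int] = detect_faces) -> List[Frame]:
    """Keep only the frames in which the edge detector found at least one face."""
    return [frame for frame in frames if detector(frame) > 0]


if __name__ == "__main__":
    # Simulated stream in which every fifth frame contains a face.
    stream = [Frame("lobby", i, face_count=1 if i % 5 == 0 else 0) for i in range(20)]
    forwarded = frames_for_server(stream)
    print(f"{len(forwarded)} of {len(stream)} frames forwarded to the server")
```

In a deployment, the forwarded frames (or extracted templates, on higher-end devices) would be encrypted and sent to the server, which then spends its GPU time only on recognition.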
Overall, when thinking about server configurations and which is best for your needs, you will want to choose a facial recognition algorithm that supports both server and edge deployment.
As we outlined in our in-depth article, Facial Recognition at the Edge - The Ultimate Guide 2021, it can be challenging to design a good facial recognition system running on a high performance workstation or PC with GPU (or VPU). There are dozens of concurrent video streams running between CPU, GPU and memory over the system bus. If not properly implemented on the system architecture, even the strongest facial recognition algorithms will be slow. The system architecture should always minimize the data flow between CPU, GPU and memory.
FaceMe® has optimized its system architecture through several iterations to ensure it delivers the absolute best performance. For example, on a single workstation, FaceMe® with NVIDIA RTX A6000 can handle 340-410 fps (the exact number may vary depending on which FaceMe® facial recognition model is used). At 10 fps per channel, this is equivalent to handling 34-41 concurrent video channels per GPU, an outstanding cost-performance offering. The following table shows the performance for facial recognition of a handful of popular GPUs that we tested with our latest model:
* Tested with 1080p image, each image contains one face.
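To relate throughput to camera counts, the sketch below converts the VH-model fps figures quoted earlier in this article into video channels per GPU and then into a workstation count. It assumes 10 fps per channel, one face per frame (matching the test condition above), up to four GPUs per workstation, and that recognition throughput rather than decoding or I/O is the bottleneck. The function names are illustrative.

```python
import math

# Throughput figures quoted in this article for the FaceMe VH model (fps).
GPU_FPS = {
    "Quadro RTX 5000": 220,
    "Quadro RTX 6000": 340,
    "Quadro RTX 8000": 340,
    "RTX A6000": 410,
}


def channels_per_gpu(gpu_fps, channel_fps=10):
    """Whole number of video channels one GPU can serve at channel_fps each."""
    if channel_fps <= 0:
        raise ValueError("channel_fps must be positive")
    return gpu_fps // channel_fps


def workstations_needed(num_channels, gpu_fps, channel_fps=10, gpus_per_workstation=4):
    """Workstations required for num_channels, assuming fully populated GPU slots."""
    per_workstation = channels_per_gpu(gpu_fps, channel_fps) * gpus_per_workstation
    if per_workstation <= 0:
        raise ValueError("GPU throughput is too low for even one channel")
    return math.ceil(num_channels / per_workstation)


if __name__ == "__main__":
    for name, fps in GPU_FPS.items():
        print(f"{name}: {channels_per_gpu(fps)} channels per GPU")
    # Hypothetical site with 300 cameras, using RTX A6000 cards.
    print("Workstations for 300 cameras:", workstations_needed(300, GPU_FPS["RTX A6000"]))
```

Scenes with several faces per frame consume more recognition throughput per channel, so treat these channel counts as an upper bound.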
The two most common operating systems for servers and workstations are Windows and Linux. Both have unique pros and cons that you should consider before deciding which is right for you. One of the first items to look at is which applications, or SDKs, you plan to use. A good facial recognition algorithm, like FaceMe®, supports both Linux and Windows, providing equal functionality and performance on each. This gives you much more flexibility to plan and manage your platforms. Let’s take a closer look at each.
Linux is most often used for server-grade configurations. It’s a popular choice because there are no license fees. It’s open source, reliable and stable – able to operate 24/7. Linux is also easier to control and manage. We highly recommend it.
Of the Linux variations, Ubuntu is the most popular, with Debian and Red Hat following close behind. Ubuntu is well liked because it’s easy to set up Docker, which is the most popular container deployment and management platform.
Windows is a dominant desktop OS. Many IT teams are much more familiar with it than with Linux. If this is the case for your business, choosing a server that runs Windows is a good option. In addition, if you want your system to run other Microsoft applications, such as Exchange, Microsoft SQL Server or Active Directory, it will be smart to keep the OS consistent. This will make it much easier to write, develop and maintain all application components on the server.
There are several considerations when building a workstation that best suits your specific needs for facial recognition. When evaluating options, one of the most important things is to understand the use case and performance needs, as well as where you want the workstation to live: on-premise or in the cloud.
Once you think you have the right build and design in mind, we recommend conducting proof of concept (POC) projects before the application is installed for real-time and real-life use. This way you can understand any improvements that need to be made and adjust before launching fully.
For a full overview of facial recognition, how it works and how it can be deployed, read Edge-based Facial Recognition - The Ultimate Guide.
To learn how facial recognition is used in 2021, read Facial Recognition – How is It Used in 2021?