The new image sensors in Sony's IMX500 Intelligent Vision series integrate AI image analysis directly on the chip, which opens up new and faster capabilities for cameras.
The announcement describes two new Intelligent Vision CMOS chip models, the Sony IMX500 and the IMX501. As far as I can tell, it's the same basic chip, except that the 500 is the bare-chip product, while the 501 is a packaged product.
They are both 1/2.3-inch chips with 12.3 effective megapixels. It seems clear that one of the main markets for the new chip is security and system cameras. On-chip AI processing, however, offers some exciting new opportunities for future video cameras, especially those mounted on drones or in action cameras like a GoPro or Insta360.
What can the Sony IMX500 sensor do?
An outstanding ability of the new chip lies in functions such as identifying objects or people, either by tracking such objects or by actually identifying them. The chip's output also doesn't have to be in image form: it can output metadata instead, so that a description of what the sensor sees can be sent without the accompanying visual image. This can reduce data storage requirements by a factor of up to 10,000.
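Sony hasn't published the metadata format, but a rough back-of-the-envelope sketch shows why the savings are so dramatic. The record below is purely illustrative (the field names are hypothetical, not Sony's actual output), compared against the raw 12.3-megapixel frame it would replace:

```python
import json

# Hypothetical per-frame metadata record (field names are illustrative,
# not Sony's actual output format).
metadata = {
    "frame": 1042,
    "detections": [
        {"label": "person", "confidence": 0.97, "bbox": [312, 118, 96, 240]},
        {"label": "person", "confidence": 0.91, "bbox": [640, 130, 88, 232]},
    ],
}

metadata_bytes = len(json.dumps(metadata).encode("utf-8"))

# A raw 12.3-megapixel frame at 8 bits per pixel, three color channels.
raw_frame_bytes = 12_300_000 * 3

print(f"metadata:  {metadata_bytes} bytes")
print(f"raw frame: {raw_frame_bytes:,} bytes")
```

Even against compressed video rather than raw frames, a few hundred bytes of metadata per frame easily clears the claimed 10,000x reduction.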
For security or system camera purposes, a camera equipped with the new chip can count the number of people who pass it or spot low inventory on a store shelf. It could even be programmed to analyze customer behavior using heat maps.
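The people-counting case is easy to sketch once the sensor emits detection metadata rather than pixels. The record format below is hypothetical; the actual output would depend on the neural network model loaded onto the sensor:

```python
# A sketch of people counting from per-frame detection metadata.
# The detection record format is hypothetical, not Sony's actual API.

def count_people(frames, min_confidence=0.5):
    """Return the peak number of people seen in any single frame."""
    peak = 0
    for detections in frames:
        people = sum(
            1
            for d in detections
            if d["label"] == "person" and d["confidence"] >= min_confidence
        )
        peak = max(peak, people)
    return peak

frames = [
    [{"label": "person", "confidence": 0.97}],
    [{"label": "person", "confidence": 0.91}, {"label": "person", "confidence": 0.88}],
    [{"label": "dog", "confidence": 0.95}],
]
print(count_people(frames))  # prints 2
```

The point is that this logic runs on tiny text records, not on video, which is what makes on-sensor analysis so cheap to store and transmit.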
With conventional cameras, autofocus systems could be improved by identifying and tracking subjects much more precisely. Such AI systems could make autofocus smarter by identifying the areas of an image you are likely to want in focus. For example, if you want to photograph a flower, the AF system must focus on it and not, say, on the branch behind it. Face recognition would also become much faster and more reliable.
Autofocus systems are already becoming incredibly good these days, but if they were supported by ultra-fast on-chip object identification, they could be even better. The ability to use more reliable object tracking metadata also helps with post-processing for 360 cameras.
Why do we need AI on the chip?
There are two main drivers for placing AI capabilities directly on the chip. The first is that it makes processing much, much faster. The Sony IMX500 can run its analysis at video frame rates instead of having to send the data along a pipeline to be processed elsewhere. The other advantage is greater security. Data for AI image analysis is very often sent via the cloud; when these systems live on the chip, that potential vulnerability is eliminated.
Cloud AI cannot be used offline and also limits the ability to perform analysis reliably in real time. Cloud computing also consumes ever more energy and money, and that's not good for the environment.
For small cameras like GoPros, this means that this type of processing no longer has to be carried out by another chip elsewhere in the camera. That saves power, but it also means the camera's main processing chip and memory can be freed up for other tasks such as better electronic stabilization or color processing.
Limited only by your imagination
The capabilities of the new chip, which developers can program individually to do exactly what they need, are limited only by imagination. Sony uses a car as an example: the chip identifies the driver and automatically adjusts the car's seating position. Another example is recognizing whether the driver has fallen asleep.
With sports cameras, the device might be able to identify your body position as you move. If you want to improve your yoga or martial arts, for example, this could help identify opportunities for improvement by comparing your movement to a "perfect" example. Speech recognition from lip movements could potentially be done much faster and be included in all cameras. For people filming drama, this would have great potential for logging takes or matching them to a script, with the camera outputting the performed dialogue as text at the same time the picture is captured.
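The pose-comparison idea can be sketched in a few lines. Everything here is a toy assumption: a real system would get keypoints from the sensor's on-chip pose-estimation model, while this sketch just compares hand-picked joint angles against a "perfect" reference:

```python
# Toy sketch of comparing a captured pose to a reference pose.
# Poses are hypothetical (joint name -> angle in degrees); a real
# system would derive these from on-chip pose-estimation keypoints.

def pose_deviation(captured, reference):
    """Mean absolute joint-angle difference in degrees."""
    diffs = [abs(captured[j] - reference[j]) for j in reference]
    return sum(diffs) / len(diffs)

reference = {"knee": 90.0, "hip": 120.0, "elbow": 175.0}
captured = {"knee": 98.0, "hip": 115.0, "elbow": 170.0}
print(round(pose_deviation(captured, reference), 1))  # prints 6.0
```

A camera could surface a score like this live, flagging the joints that deviate most from the reference form.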
The IMX500 also looks like a high-performance chip from a pure video perspective. It is capable of 4K at up to 60 fps and 1080p at up to 240 fps, although it is currently limited to 30 fps when running full video and AI processing together.
All in all, while this is only the first generation of such chips, you can expect this kind of capability to make its way into other, more conventional chips over time, so it's a significant development worth watching.
Do you have any ideas on how on-chip AI would enable a camera function that you would like to see? Let us know in the comments below!