Computer Vision: Technology Development and Applications
Table of Contents
Introduction
The Challenges of Computer Vision
The Role of Machine Learning in Computer Vision
Applications of Computer Vision
Conclusion
References

Introduction

Anyone somewhat literate in computers and the Internet knows that a computer stores images as millions of pixels and bytes; computer scientists would say that images are just arrays of bytes. There is no reason to expect a computer to know who or what an image represents, because the image is stored only as bytes encoding RGB values and transparency. This has been changing completely with developments in computer vision over the last sixty years. Computer vision is a subfield of the rapidly growing field of artificial intelligence. As the name suggests, it is the field in which researchers attempt to replicate humanity's ability to perceive and interpret digital images and videos on a computer. Sight, something most humans take for granted, is incredibly difficult to replicate, especially on a machine that natively speaks in binary. The difficulty of computer vision seems justified because it has many beneficial and useful applications in both the workplace and personal life.

The Challenges of Computer Vision

Computer vision is not natural for computers, and yet it seems necessary for machines in the age of technology. As Richard Szeliski, founding director of the Computational Photography Group at Facebook, said in his book Computer Vision: Algorithms and Applications, computer vision is very difficult, partly because it is an inverse problem (Szeliski, 2010). Being an inverse problem makes it much more difficult than computer graphics for games or computer-generated imagery for movies, both of which themselves seemed impossible for decades.
Those forward problems can be solved by deeply analyzing the physics and modeling of an object in the physical world and applying them to the generated images (Szeliski, 2010). Computer vision, in contrast, starts from a real capture of the physical world and must describe it. Szeliski shed light on the scale of computer vision's challenges by noting that even having computer vision function at the level of a two-year-old was still an elusive dream when his book was published in 2010 (Szeliski, 2010). In just under a decade, computer vision had a breakthrough and surpassed the goal of matching a two-year-old at image interpretation. This essentially allowed computers to recognize images better than humans, which seems incredible because imitations are usually worse than the original. Computer vision has made this rapid progress in recent years thanks to machine learning, and in particular deep learning.

The Role of Machine Learning in Computer Vision

Integrating machine learning with computer vision has made the problem much simpler. In 2012, a competition-winning network called AlexNet completely redefined what constituted computer vision (Szegedy, Vanhoucke, Ioffe, Shlens & Wojna, 2016). This network was ultimately applied to numerous computer vision projects and led to the first widespread uses of machine learning and deep learning in the field. Deep convolutional neural networks and other deep learning methods have brought significant improvements to computer vision performance. The convolutional neural network architecture is the most widely used because it continues to work even when the content of an image shifts position. This is particularly important for many computer vision applications such as image recognition, facial recognition, and label scanning.
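A rough illustration of why convolutional layers tolerate such shifts: a convolution produces a feature map that moves along with its input, so the same pattern is detected wherever it appears. This is a minimal NumPy sketch with a toy edge-detector kernel; the helper name `conv2d` is an assumption for illustration, not a library function.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation: slide the kernel over the image
    and record one response per position, producing a feature map."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge detector applied to an image containing one bright column.
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])

image = np.zeros((6, 6))
image[:, 2] = 1.0                       # bright column at x = 2
shifted = np.roll(image, 1, axis=1)     # same column, shifted to x = 3

fm1 = conv2d(image, kernel)
fm2 = conv2d(shifted, kernel)

# The feature map shifts along with the input; the response pattern is the same.
assert np.allclose(np.roll(fm1, 1, axis=1), fm2)
```

Real networks stack many such learned kernels, but the equivariance shown by the final assertion is the property that makes the architecture robust to where an object sits in the frame.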
Athanasios Voulodimos, an assistant professor researching computer vision in the Department of Computer Science and Computer Engineering at the University of West Attica in Greece, stated that deep learning methods have eclipsed basic machine learning techniques, especially in computer vision (Voulodimos, Doulamis, Doulamis, & Protopapadakis, 2018). To imitate a human function such as perceiving an image, one must also imitate the functioning of the human brain. Voulodimos and his collaborators at the National Technical University of Athens state that this was achieved primarily using the deep convolutional neural network architecture (Voulodimos et al., 2018). This architecture passes the input through three types of layers: convolutional layers, pooling layers, and fully connected layers. Together, these layers map the input into feature maps that the machine can understand, reduce and simplify those feature maps, and then use the neural network to mimic what happens in a human brain when a person sees an image (Voulodimos et al., 2018). Christian Szegedy, a Google researcher specializing in computer vision using deep learning, demonstrated the effectiveness of the architecture in 2016. Using a novel form of this architecture, Szegedy and his team reduced the image classification error rate to only 21.2% when the model's single best guess must be correct and 5.6% when any of its five best guesses may count (Szegedy et al., 2016). This is significantly better than computer vision built on earlier machine learning, and especially better than computer vision in 2010, when even a two-year-old could beat a computer at image classification. In just a few years, the reliability of computer vision has increased from around 50% before 2012 to almost 99% today.
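The 21.2% and 5.6% figures above are what the literature calls top-1 and top-5 error rates. How such rates are computed from a model's per-class scores can be sketched as follows; the scores and the helper name `top_k_error` are invented for illustration.

```python
import numpy as np

def top_k_error(scores, labels, k):
    """Fraction of samples whose true label is NOT among the k
    highest-scoring classes predicted by the model."""
    top_k = np.argsort(scores, axis=1)[:, -k:]   # k best classes per sample
    hits = np.array([label in row for label, row in zip(labels, top_k)])
    return 1.0 - hits.mean()

# Toy scores for 4 samples over 6 classes (rows: samples, cols: classes).
scores = np.array([
    [0.10, 0.70, 0.10, 0.05, 0.03, 0.02],  # true class 1 -> top-1 hit
    [0.30, 0.20, 0.25, 0.10, 0.10, 0.05],  # true class 2 -> only a top-5 hit
    [0.05, 0.05, 0.10, 0.60, 0.10, 0.10],  # true class 3 -> top-1 hit
    [0.40, 0.30, 0.10, 0.10, 0.08, 0.02],  # true class 5 -> miss even at top-5
])
labels = np.array([1, 2, 3, 5])

print(top_k_error(scores, labels, 1))   # 0.5  (2 of 4 missed at top-1)
print(top_k_error(scores, labels, 5))   # 0.25 (1 of 4 missed at top-5)
```

Top-5 error is the metric usually quoted for ImageNet-scale benchmarks, since with a thousand classes several labels can be equally reasonable for one image.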
This will only improve as computer vision develops, as convolutional neural network architectures are refined to reduce computational cost and to optimize how feature maps are computed and reduced (Szegedy et al., 2016).

Applications of Computer Vision

Given how difficult computer vision was to implement well, it should have many revolutionary applications. One of the most useful applications of computer vision allows robots to see and perceive their immediate surroundings. This lets robots work more flexibly and frees them from the constraint of a rigid script for performing an action. Weiwei Wan, a researcher at the National Institute of Advanced Industrial Science and Technology, worked on such a robot with his team in 2016. He wanted to develop an "intelligent robot assembly system using multimodal vision" that could reproduce a series of human actions by observing them through a live 3D camera (Wan, Lu & Harada, 2016). Before deep learning came to computer vision this would have been impossible, and it remains difficult even with it. According to Wan, the main difficulty lies in the imperfect accuracy of visually detecting human movements, which requires analysis of 3D images that is both fast and accurate (Wan et al., 2016). Implementing such a robot assembly system would be incredibly innovative, as the system could mirror the actions of a skilled worker without requiring months or even years of training to reach the same level. The process would be versatile because it can follow any action, not just a predefined sequence of actions. Fine-tuning this system could make it more efficient than even the most skilled human workers in industrial manufacturing assembly. Another useful application that has recently gained popularity is facial recognition and detection. In particular, Apple uses a facial recognition system called Face ID to provide convenience to users of the iPhone X and later, running iOS 11 and later (Apple, 2017).
Facial recognition was previously not feasible because it was inaccurate and slow. With deep learning models, Apple's computer vision and machine learning teams achieved it by encrypting and decrypting the images used for facial recognition and sending them to and from a dedicated image processing unit (Apple, 2017). This makes the facial recognition process fast, accurate, and memory efficient. Encrypting and decrypting the images protects user privacy, which is an important issue in computer vision: users want the convenience of facial recognition while maintaining their privacy. With Face ID, users can unlock their phones without entering a password every time they want to access them from a locked state, providing more convenience and reducing frustration and wasted time.

One application of computer vision that might surprise some people is privacy itself. The general assumption is that better technology means less privacy for users, because technology can become too personalized and personal information can be sent to databases elsewhere. However, applying computer vision to surveillance tasks can help protect privacy by replacing the humans who analyze surveillance camera footage. This is better than having no surveillance or having a human analyze the footage. According to Andrew Tzer-Yeu Chen, a member of the Department of Electrical and Computer Engineering at the University of Auckland, the human brain will "inadvertently collect a lot more information" than intended, which is unnecessary for surveillance aimed at catching criminals (Chen, Biglari-Abhari, & Wang, 2017). Machines with fast and accurate computer vision solve this problem by doing the same work without any human involvement. Chen notes that while this may not yet be possible with current technologies, it is the best approach to protecting privacy because it preserves confidentiality (Chen et al., 2017).
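Apple does not publish Face ID's internals, but a common approach in face recognition generally is to compare a fixed-length embedding of the probe face against a template stored at enrollment. This is a minimal sketch of that template-matching step, with random vectors standing in for real embeddings and a hypothetical threshold; it is not Apple's method.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def matches(enrolled, probe, threshold=0.8):
    """Accept the probe face if its embedding is close enough
    to the enrolled template (threshold chosen for illustration)."""
    return cosine_similarity(enrolled, probe) >= threshold

rng = np.random.default_rng(0)
enrolled = rng.normal(size=128)                          # stored template
same_user = enrolled + rng.normal(scale=0.1, size=128)   # slight variation
other_user = rng.normal(size=128)                        # unrelated face

print(matches(enrolled, same_user))    # True
print(matches(enrolled, other_user))   # False
```

In a real system the embedding would come from a deep network, and the threshold trades off false accepts against false rejects; the privacy point in the text is that only the compact template, not raw images, needs to be retained.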
This means that the footage is viewed entirely by machines and no archives are kept. Current surveillance, by contrast, relies primarily on human interpreters, retained archives, and a lack of censorship, which constitutes a violation of privacy.