Goo Goo Gaga is a simple interjection. It is an example of onomatopoeia, a word that comes from the Greek “onoma” (“name”) and “poiein” (“to make”) and so means “to make a name.” The phrase resembles the sounds that babies make before they are old enough to carry on a conversation (Figure 1). Somehow, as if by osmosis, babies naturally learn to speak and ultimately converse in the native language of the place where they live. If only it were like this for computers and machines. Instead, they need to be programmed by humans. Natural language processing (NLP), a subset of artificial intelligence (AI), combines technology with the linguistics of human language, enabling machines and humans to communicate. It gives machines the ability to understand both written and spoken human communication, so that humans and machines can speak the same language and exchange information and ideas. This article looks at how AI is helping us have conversations with our machines.
Figure 1: A young boy babbling on a mobile phone. He is too young to have a meaningful discussion. (Source: Mouser Electronics)
Human Language
The field of linguistics studies how processed information is communicated, both within and between intelligent beings. Humans use what is called natural language. Natural language can be coupled with technology. Conversational AI is helping to merge humans’ sophisticated communication and intellectual abilities with the capabilities of the technology they have built.
Human communication happens by way of complex symbols that are perceivable to the senses. Notable examples include speech (hearing), written or sign language (sight), and physical contact such as handshakes and hugs (touch). Raymond Kurzweil, who was hired by Google in 2012 with the mission of bringing natural language insight and understanding to the company, asserts that human language was the first invention of humanity. Language is how humans work together, build a society, have culture, and create technology.
All communication between intelligent beings requires a method to structure language. Grammar, syntax, and discourse provide structure to a language so that its constituent components may be appropriately understood and interpreted. Author, critic, and educator Neil Postman (1931–2003) believed that language is “pure ideology” and should be viewed as an “invisible technology.” By this, Postman meant that language is not neutral. It reflects the starting assumptions of those who use it, and its use frames all of the information that an intelligence works with.
Interpretation is the art of receiving a communication and processing it in the manner that the communicator intended. The circumstances surrounding the specific grammar, syntax, and discourse employed are called the context. Context provides the external and internal environment in which the information is processed, and it is a critical key to ascertaining what a communication means from the perspective of the communicator’s intention. Because the communicator’s intention matters to the context, the issue of agency comes into play. If intention is not part of the communicated message, then whatever is transmitted cannot produce meaningful action, since it arises only from happenstance, and no meaning can be asserted from happenstance.
Humans can create and utilize symbols to express themselves in new and unique ways without limitation. A human can cry in pain, read Shakespeare, or sing an opera (Figure 2). These symbols come to have meaning as a result of social interaction and agreement. Because human language is based upon social purpose, it allows both change over time and unlimited variety as society develops new symbols to communicate what people experience.
Figure 2: The title page from an antique book of the plays of Shakespeare. (Source: Mouser Electronics)
Other life forms can communicate in a manner that is natural yet distinct from human language. This communication is generally a form of signaling understood within the species, but it does not involve the manipulation of symbols and creative thought. For example, a dog’s bark may provide information to other dogs, who receive and understand it in a manner beyond general human understanding. Animals may also communicate in ways that humans do not understand at first glance, such as the dance that bees perform to indicate the direction to fly to obtain pollen for the hive. Scientists recognize that, even though animals may communicate with other animals of their species, no animal, including apes such as chimpanzees, can manipulate signs and symbols to the degree that humanity can. Animals communicate only about particular contexts and not about universal or abstract relationships.
From Humans to Machines
In contrast, machines and computers do not use human language. Their intelligence takes the form of AI. All AI relies on programming, which enables it to receive information, compute, and act in an attempt to make sense of what it is experiencing. Humans created these machines, and they created programming languages so that they could direct what the machines are capable of doing.
These languages follow a specific set of rules that have been agreed upon by social and, primarily, scientific convention. Because they are meant to be used universally, they are most often constructed formally; that is, there is a universally agreed-upon method to the logic contained within the artificial language. Artificial languages (machine code or code) can be set up to perform specific predefined tasks.
Programming is the art and science of writing machine code. It is performed by manipulating elements that are functionally equivalent to those found in human language, including grammar, syntax, semantics, and discourse. Programming is initially set up by humans but can be handed over to machines (robots or computers) after the initial setup. An algorithm is a set of instructions that has been formatted and arranged to achieve a specific function. Programming code is generally broken down into a long series of discrete binary digital signals. These signals, representing particular ON and OFF sequences, are then stored, analyzed, and processed by the machine using whatever intelligence it has available. All AI programming is based upon human conceptions of structure. AI semantics and syntax thus function in a manner that emulates humans rather than, for instance, another species such as apes, dolphins, or rats.
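To make the idea of ON and OFF sequences concrete, here is a minimal Python sketch. It is only an illustration and not how any particular machine stores its programs: it assumes UTF-8 text encoding, and the function names text_to_bits and bits_to_text are invented for this example. Each function is also a small algorithm in the sense described above, a fixed set of instructions arranged to achieve one specific function.

```python
def text_to_bits(text: str) -> str:
    """Return the UTF-8 bytes of `text` as space-separated groups of 8 bits."""
    # Each byte becomes eight binary digits: 1 stands for ON, 0 for OFF.
    return " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def bits_to_text(bits: str) -> str:
    """Invert text_to_bits: rebuild the original string from 8-bit groups."""
    return bytes(int(group, 2) for group in bits.split()).decode("utf-8")


encoded = text_to_bits("Hi")
print(encoded)                # 01001000 01101001
print(bits_to_text(encoded))  # Hi
```

The round trip shows that nothing is lost in translation: the same two letters a human reads are, to the machine, sixteen ON and OFF states.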
Large Model Sizes
Human language is vast and complicated. It is a collection of shared knowledge and wisdom. Understanding and meaning are derived from experience and context. The tremendous number of variables means that the model size for one language, such as English, is vast. When a model is expanded to understand other languages that operate in different ways, such as Chinese, French, German, Hindi, Japanese, and Spanish, the model sizes required are genuinely staggering. Language models must train on the largest and broadest data sets available to capture as much as possible of the nuance implied in a message. The upshot is that AI and NLP models must be able to handle a vast amount of data and access it quickly and efficiently to have everything needed for understanding.
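A rough sense of what model size means for hardware can be had from simple arithmetic: the memory needed just to hold a model is approximately the number of parameters multiplied by the bytes used to store each one. The Python sketch below illustrates that calculation. The parameter counts are hypothetical round numbers chosen for illustration, not figures from any real model, and the function name model_footprint_gib is invented for this example.

```python
def model_footprint_gib(num_parameters: int, bytes_per_parameter: int) -> float:
    """Approximate memory needed to hold a model's parameters, in GiB."""
    return num_parameters * bytes_per_parameter / 2**30


# Hypothetical parameter counts, for illustration only.
example_models = {
    "hypothetical small model": 100_000_000,
    "hypothetical large model": 10_000_000_000,
}

for name, parameters in example_models.items():
    as_float32 = model_footprint_gib(parameters, 4)  # 32-bit values
    as_int8 = model_footprint_gib(parameters, 1)     # 8-bit values
    print(f"{name}: {as_float32:.2f} GiB at 32-bit, {as_int8:.2f} GiB at 8-bit")
```

The estimate covers parameter storage only. Training data, intermediate results, and working memory add to it.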
High Computation Demands
Machines must be able to train themselves quickly from a vast field of language in order to understand humans. This requires high computational capabilities. GPUs, FPGAs, CPUs, ASICs, crossover processors, and microcontroller units (MCUs) are among the building blocks of a successful implementation. Let’s look further at how a crossover processor might be part of a conversational AI solution.
Applications processors and MCUs are both employed in embedded applications. Applications processors provide excellent integration and performance, while MCUs are easy to use and low cost. NXP Semiconductors has combined the strengths of these two product types in a single low-cost part that simultaneously provides high performance, low latency, power efficiency, and security. This product is well suited to a variety of human language and voice-assistance applications.
The NXP Semiconductors i.MX RT106A Crossover Processor is a solution-specific variant of the i.MX RT1060 family of MCUs, targeting cloud-based embedded voice applications (Figure 3). It features NXP’s advanced implementation of the Arm Cortex-M7 core, which operates at speeds up to 600MHz to provide high CPU performance and the best real-time response. i.MX RT106A-based solutions enable system designers to easily add voice control capabilities to a wide variety of smart appliances, smart home, smart retail, and smart industry devices. The i.MX RT106A is licensed to run NXP turnkey voice-assistant software solutions, which may include a far-field audio front-end softDSP, wake-word inference engine, media player/streamer, and a host of associated items.
Figure 3: NXP Semiconductors i.MX RT106A Crossover Processor. (Source: Mouser Electronics)
Instant Inferencing
Beyond training quickly on the massive field of human language, machines must also be able to draw inferences exceptionally fast, with extremely low latency if not in real time. Products like Intel Xeon Second Generation Scalable Gold Processors are enhanced to produce excellent inferencing results (Figure 4).
Figure 4: Intel Xeon Second Generation Scalable Gold Processors optimized for inferencing. (Source: Mouser Electronics)
These Intel processors are 64-bit, multicore server microprocessors built on 14nm lithography process technology. The processors are based on the Cascade Lake microarchitecture, which allows for higher clock speeds. The processors are also optimized for demanding mainstream data centers, multi-cloud computing, and network and storage workloads. These processors offer up to 22 cores/44 threads and feature Intel Turbo Boost Technology 2.0 that ramps up to 4.4GHz. The processors also feature up to four-socket scalability and support up to 46 bits of physical address space and 48 bits of virtual address space. The devices take embedded AI performance to the next level with new AI acceleration, including new Intel Deep Learning Boost.
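The address-space figures are easier to appreciate once converted into capacity. An address that is n bits wide can name 2 to the power n distinct byte locations, so the short Python sketch below converts the 46-bit and 48-bit figures above into tebibytes (TiB). The function name address_space_tib is invented for this example, and the two-threads-per-core value is simply the ratio implied by the 22 cores/44 threads figure.

```python
def address_space_tib(address_bits: int) -> int:
    """Addressable bytes for a given address width, expressed in TiB (2**40 bytes)."""
    return 2**address_bits // 2**40


print(address_space_tib(46))  # 64  (physical address space, in TiB)
print(address_space_tib(48))  # 256 (virtual address space, in TiB)

# Thread count implied by the core count, at two hardware threads per core.
cores = 22
threads_per_core = 2
print(cores * threads_per_core)  # 44
```

These numbers are upper bounds on what the processor can address, not the amount of memory installed in any given system.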
Conclusion
Technology is talking sense. The implementation of AI and NLP is an example of an emerging technology that will allow humans and machines to communicate seamlessly and in real time. Humans and machines are now starting to speak the same language.
About the Author
Paul Golata joined Mouser Electronics in 2011. As a Senior Technical Content Specialist, Golata is accountable for contributing to the success in driving the strategic leadership, tactical execution, and overall product line and marketing direction for advanced technology-related products. Golata provides design engineers with the latest information, delivered through the creation of unique and valuable technical content that facilitates and enhances Mouser Electronics as the distributor of choice.