The Ultimate AI and ML Glossary

Feb 4, 2020 9:15:00 AM


Artificial intelligence (AI) and machine learning (ML) are two things we’ve all been hearing plenty about. Our AI and ML super-page lays out everything you need to know about these technologies in one spot. But what about all the other terms that surround them? Here’s the ultimate AI and ML glossary, defining the commonly used terms so you’ll never be confused again. From digital footprints to reinforcement learning, we break it all down; learn everything you need to know about the next-level technologies we use at Edify daily and that impact the contact center and CX industries...and beyond. Life is easier when you’re in the know.

 

Algorithm: a process or set of step-by-step instructions a computer follows to perform a calculation or solve a problem (algorithms are used to generate machine learning models)

Artificial intelligence: a wide-ranging branch of computer science that aims to replicate or simulate human intelligence in machines; the theory and development of computer systems able to perform tasks that normally require human intelligence; true, fully general AI would require independent cognitive thought and thus doesn’t exist yet

Attribute: a piece of information that describes a property of a field or tag in a database (e.g., customer ID, first name, address)

Augmented intelligence: a framing of AI that emphasizes that AI is designed and built to enhance human intelligence, not replace it (AI and humans should work together)

Bias metric: the average difference between what you predicted and the correct value
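
As a minimal sketch (assuming NumPy and made-up numbers, not anything from Edify's systems), the bias metric is just the signed average of the prediction errors:

  import numpy as np

  # Hypothetical predictions and correct values (e.g., predicted vs. actual handle times)
  predicted = np.array([4.0, 5.5, 6.0, 3.5])
  actual    = np.array([4.5, 5.0, 7.0, 4.0])

  # Bias metric: the average signed difference between predictions and correct values
  bias = np.mean(predicted - actual)
  print(f"Bias: {bias:.2f}")  # negative here, so this model tends to under-predict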

Categorical variables: variables with a discrete set of possible values (examples are sex, age group, and educational level); these represent types of data that may be divided into groups
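
Before modeling, categorical variables are often converted into indicator columns. A small sketch, assuming pandas and a made-up "plan_type" column:

  import pandas as pd

  # Hypothetical customer data with one categorical column
  df = pd.DataFrame({"plan_type": ["basic", "pro", "basic", "enterprise"]})

  # One-hot encode the categorical variable: one indicator column per category
  encoded = pd.get_dummies(df["plan_type"], prefix="plan")
  print(encoded)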

Classification: systematically arranging items into groups or categories according to previously established criteria; in machine learning, predicting a categorical output (a short code sketch follows this list)

  • Binary classification: classifying instances into one of two classes (e.g., is this person an existing customer or not?)
  • Multi-class classification: the problem of classifying instances into one of three or more classes (e.g., recognizing handwritten digits 0–9)
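
Here is a minimal binary classification sketch, assuming scikit-learn and a tiny, made-up dataset where the two features are, say, number of purchases and months since signup:

  from sklearn.linear_model import LogisticRegression

  # Hypothetical features: [number of purchases, months since signup]
  X = [[0, 1], [5, 12], [1, 2], [8, 24], [0, 3], [6, 18]]
  # Labels: 1 = existing customer, 0 = not a customer
  y = [0, 1, 0, 1, 0, 1]

  # Fit a simple classifier and classify a new person
  model = LogisticRegression()
  model.fit(X, y)
  print(model.predict([[4, 10]]))        # predicted class (0 or 1)
  print(model.predict_proba([[4, 10]]))  # probability of each class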


Classification threshold: a decision threshold, i.e., the lowest probability value at which you’re comfortable asserting a positive classification
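
A small illustration of how a threshold turns predicted probabilities into yes/no decisions, assuming NumPy and made-up probabilities:

  import numpy as np

  # Hypothetical predicted probabilities of the positive class from some classifier
  probabilities = np.array([0.15, 0.48, 0.52, 0.80, 0.95])

  # A default threshold of 0.5 vs. a stricter threshold of 0.8
  print((probabilities >= 0.5).astype(int))  # [0 0 1 1 1]
  print((probabilities >= 0.8).astype(int))  # [0 0 0 1 1]

Raising the threshold generally trades recall for precision (see those entries below).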

Clustering: a grouping of data points with similar numerical values / grouping similar data into buckets; an unsupervised machine learning algorithm where the target is not known
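
A minimal clustering sketch, assuming scikit-learn's KMeans and made-up two-dimensional points; notice that no target labels are provided:

  from sklearn.cluster import KMeans

  # Hypothetical data points (e.g., [average call length, calls per month])
  points = [[1, 2], [1, 3], [2, 2], [9, 10], [10, 9], [10, 11]]

  # Group the points into two clusters; no labels are given (unsupervised)
  kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
  print(kmeans.labels_)  # cluster assignment for each point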

Confusion matrix: a table used to describe the performance of a classification model, or a classifier, on a set of test data for which the true values are known
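
A quick sketch, assuming scikit-learn and made-up true and predicted labels:

  from sklearn.metrics import confusion_matrix

  # Hypothetical true labels vs. a classifier's predictions
  y_true = [1, 0, 1, 1, 0, 1, 0, 0]
  y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

  # Rows are true classes, columns are predicted classes
  print(confusion_matrix(y_true, y_pred))
  # [[3 1]   <- true 0s: 3 correctly predicted 0, 1 wrongly predicted 1
  #  [1 3]]  <- true 1s: 1 wrongly predicted 0, 3 correctly predicted 1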

Continuous variables: variables that can take any value within a numerical range (income, lifespan, etc.)

Convergence: the integration of two or more different technologies in a single device/system

Data footprints: the breadth of space a business spans in terms of its deployed technology stack

Deduction: a top-down approach to problem-solving; begin with a theory, test that theory with observations, and then derive a conclusion

Deep learning: a subset of machine learning that uses multi-layered (deep) neural networks; it can learn from large amounts of unstructured data, even without supervision, and can identify concepts relevant to humans, like letters or faces

Digital footprints: the data footprints of experiences—such as past behavior data, customer characteristics, and survey response data; these can help predict future customer behavior

Dimension: the number of features in your data set (i.e., if you are only talking about one specific feature of your data set, that would be 1-dimensional data)

Experience data: goes beyond traditional data to combine activity, learning, behavior and performance; it enhances our ability to detect patterns in customer behavior by looking at customer comments, feedback scores, voice of the customer, and more; can suggest actions that might affect the CX and better direct your agents

Feature: a basic building block of a dataset; a feature represents an attribute and value combination

Feature selection: the process of selecting relevant features from a data set for creating an ML model
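
One common way to do this is with scikit-learn's SelectKBest; a sketch with a synthetic dataset, keeping the two most informative features out of four (the library choice and data are assumptions, not part of the glossary):

  from sklearn.datasets import make_classification
  from sklearn.feature_selection import SelectKBest, f_classif

  # Small synthetic dataset: 4 features, only 2 of them informative
  X, y = make_classification(n_samples=100, n_features=4, n_informative=2,
                             n_redundant=0, random_state=0)

  # Score each feature against the target and keep the top 2
  selector = SelectKBest(score_func=f_classif, k=2)
  X_selected = selector.fit_transform(X, y)
  print(X.shape, "->", X_selected.shape)  # (100, 4) -> (100, 2)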

Hyperparameters: the settings you choose and tweak before training to control how the AI learns its prediction guidelines (parameters); the code sketch under Parameters below shows the difference

Image segmentation: identifying parts of an image and understanding which objects they belong to

  • It involves dividing a visual input into segments to simplify the image analysis, where segments represent objects or parts of objects.
  • Deep learning can understand patterns in visual inputs to predict object classes that make up an image.

 

Induction: a bottom-up approach to problem-solving; moving from observations to theory

Instance: a single data point or sample in a dataset 

Machine learning (ML): an application of AI that gives computer systems the ability to learn and improve with each experience, relying heavily on data and natural language processing; ML accomplishes specific tasks by processing large amounts of data, recognizing patterns, and adjusting the response

  • A common use case is the implementation of smart bots.
  • Machine learning models require constant feedback to keep improving.
  • The more data and input humans can provide as feedback, the more human-like machine learning agents will be.

 

Machine learning bias / AI bias: a phenomenon that occurs when an algorithm produces results that are systematically prejudiced due to erroneous assumptions in the ML process

Model: a data structure or simulation used to reproduce the behavior of a system

Neural networks: mathematical algorithms designed to recognize patterns and relationships in data

Noise: irrelevant information in a dataset that obscures the underlying pattern

Observation: a data point or sample in a dataset (the same as instance)

Outlier: a data point that differs significantly from other observations / falls outside the range of probability for a data set
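
One simple way to flag outliers is a z-score check; a sketch with NumPy and made-up values (the cutoff of 2 standard deviations here is just an illustrative convention):

  import numpy as np

  # Hypothetical call durations in minutes; 95 sits far outside the usual range
  values = np.array([4, 5, 6, 5, 4, 7, 5, 95], dtype=float)

  # Flag points more than 2 standard deviations from the mean
  z_scores = (values - values.mean()) / values.std()
  print(values[np.abs(z_scores) > 2])  # [95.]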

Overfitting: with machine learning, this is essentially over-training the AI, where the model relies too heavily on the training data you gave it to make predictions; it occurs when your model learns the training data too well and incorporates details and noise specific to your dataset, so it shows low error on training data but a high error rate on new data (a code sketch follows the bullet below)

  • You can tell a model is overfitting if it performs well on your training/validation set but performs poorly on your test set.
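
A minimal sketch of that gap, assuming scikit-learn: an unconstrained decision tree memorizes a noisy training set and typically scores worse on held-out data than a smaller, simpler tree:

  from sklearn.datasets import make_classification
  from sklearn.model_selection import train_test_split
  from sklearn.tree import DecisionTreeClassifier

  # Synthetic, deliberately noisy dataset (some labels are flipped)
  X, y = make_classification(n_samples=300, n_features=10, flip_y=0.2, random_state=0)
  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

  # Deep tree: free to memorize the training data (prone to overfitting)
  deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
  # Shallow tree: forced to generalize
  shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

  print("deep    train/test accuracy:", deep.score(X_train, y_train), deep.score(X_test, y_test))
  print("shallow train/test accuracy:", shallow.score(X_train, y_train), shallow.score(X_test, y_test))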

 

Parameters: what the AI 'learns' to use to make its predictions after you train it; as AI is learning, it develops its own markers, or parameters, to use when making predictions
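
To make the parameter/hyperparameter distinction concrete, here's a small sketch assuming scikit-learn: the regularization strength C is a hyperparameter you choose, while the learned coefficients are parameters:

  from sklearn.datasets import make_classification
  from sklearn.linear_model import LogisticRegression

  X, y = make_classification(n_samples=200, n_features=5, random_state=0)

  # C is a hyperparameter: you set it before training to control regularization
  model = LogisticRegression(C=0.5)
  model.fit(X, y)

  # coef_ and intercept_ are parameters: the model learned them from the data
  print(model.coef_)
  print(model.intercept_)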

Recall vs. Precision: recall refers to the percentage of total relevant results correctly classified by your algorithm (a measure of completeness or quantity), while precision refers to the percentage of your results which are relevant (a measure of exactness or quality)
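
A small sketch computing both from the same made-up predictions, assuming scikit-learn:

  from sklearn.metrics import precision_score, recall_score

  # Hypothetical true labels and predictions (1 = relevant/positive)
  y_true = [1, 1, 1, 1, 0, 0]
  y_pred = [1, 1, 0, 0, 1, 0]

  # Precision: of everything predicted positive, how much was actually positive?
  print(precision_score(y_true, y_pred))  # 2 / 3 ≈ 0.67
  # Recall: of everything actually positive, how much did we find?
  print(recall_score(y_true, y_pred))     # 2 / 4 = 0.50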

Regression: in machine learning, predicting a continuous numerical output (contrast with classification, which predicts a categorical output); in software more broadly, a regression is when software begins lacking in performance; it still functions, but performs more slowly or uses up more resources than it previously did

Reinforcement learning: a programming area of ML that trains chatbots on how to take actions using a system of reward and punishment; the software agent learns by interacting with its environment; one of three learning models (reinforcement learning, supervised learning, and unsupervised learning)

Segmentation: clustering customer observations to decide how to proceed based on user needs and wants

Supervised learning: a programming area of ML that trains chatbots on how to take actions using a labeled dataset; one of three learning models (reinforcement learning, supervised learning, and unsupervised learning)

Test set: a set of observations or examples to validate the predictive power of your model; used only to assess and evaluate the performance of a fully-specified classifier

Text analytics: examining text written by customers and intelligently grouping feedback into topics and associated sentiments (powered by ML); it automatically determines patterns and topics of interest and empowers agents to decide which insights to take action on

Training set: a set of examples used to fit the parameters; a set of observations used to generate ML models

Underfitting: with machine learning, this is essentially under-training the AI, when your model over-generalizes and fails to incorporate relevant variations in your data to gain more predictive power; underfitting models will perform poorly on training and test sets

Unsupervised learning: a programming area of ML that trains chatbots on how to find patterns in an unlabeled dataset (e.g., through clustering); one of three learning models (reinforcement learning, supervised learning, and unsupervised learning)

Validation set: a set of examples used to select and tune the final AI/ML model; used during training to provide feedback on how well the current hyperparameters work
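
A sketch of how the training, validation, and test sets fit together, assuming scikit-learn's train_test_split and a synthetic dataset; the exact split ratios are arbitrary:

  from sklearn.datasets import make_classification
  from sklearn.model_selection import train_test_split

  X, y = make_classification(n_samples=1000, random_state=0)

  # First carve out a test set (held back until the very end)
  X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
  # Then split the remainder into training and validation sets
  X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

  print(len(X_train), len(X_val), len(X_test))  # 600 200 200

Train on the training set, tune hyperparameters against the validation set, and report final performance on the test set only once.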


Written by Kendal Rodgers

Kendal is the Marketing Manager at Edify and has been writing and curating content most of her professional career. She’s passionate about working with start-ups and sharing life experiences through storytelling. Kendal earned her B.S. in Marketing and International Studies from the IU Kelley School of Business.