Data Science Fundamentals: Classification and Class Probability Estimation (Scoring)

Over the next 3 months, I will be focusing on Data Science and my next few posts will cover some fundamental topics of Data Science.

The essential purpose of Data Science, like Business Intelligence, is to gain knowledge and insights from data. This knowledge can then be used for a variety of purposes, such as driving more sales, retaining more employees, reducing marketing costs, and saving lives.

In this post, I will cover Classification and include examples to make it more meaningful. Upcoming posts over the next few days will cover Clustering, Regression, Matching, and other fundamental data science concepts.

Classification is the process of using the characteristics, features, and attributes of a data entity (such as a person, company, or thing) to determine what class (group or category) it belongs to, and then assigning it to that class. As an example, demographic data usually consists of classifications – marital status (married, single, divorced), income bracket (wealthy, middle-class, poor), homeowner status (homeowner or renter), age bracket (old, middle-aged, young), etc.

Shapes are classified by characteristics such as number of sides, length of sides, etc.
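
To make the shape example concrete, here is a minimal sketch in Python. The function name and the class labels are my own choices for illustration; the only features used are the ones mentioned above (number of sides and length of sides).

```python
def classify_shape(side_lengths):
    """Assign a shape to a class using two features: side count and side lengths."""
    n = len(side_lengths)
    if n < 3:
        raise ValueError("a polygon needs at least 3 sides")
    all_equal = len(set(side_lengths)) == 1
    if n == 3:
        return "equilateral triangle" if all_equal else "triangle"
    if n == 4:
        # Equal sides alone cannot separate a square from a rhombus;
        # that would need another feature, such as the angles.
        return "square or rhombus" if all_equal else "quadrilateral"
    return "other polygon"


print(classify_shape([3, 3, 3]))
print(classify_shape([2, 4, 2, 4]))
```

Note how the quality of the classes depends on the features available: with side lengths only, some classes cannot be told apart.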

When a large amount of data needs to be analyzed, Classification needs to be an automated process. If the classes are not known ahead of time, a process called Clustering can be used on existing data to discover groups that can in some way be used to form the classes. (Clustering will be covered in an upcoming post.)

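Class Probability Estimation (Scoring) is the process of producing a score that represents the probability of the data entity being in a particular class. As an example, for the class Income Bracket – top 5%, the score would be the estimated probability that a given person belongs to that bracket.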
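
One simple way to produce such a score, sketched below, is to look at past records that share a feature value with the entity and use the fraction of them that fall in each class. This is only one of many possible approaches, and the records, the group names "A" and "B", and the function name are made up for illustration.

```python
from collections import Counter


def class_probabilities(history, feature_value):
    """Estimate P(class | feature value) as the class frequency among matching past records."""
    matching = [label for value, label in history if value == feature_value]
    if not matching:
        return {}  # no evidence for this feature value
    counts = Counter(matching)
    return {label: count / len(matching) for label, count in counts.items()}


# Made-up records: (occupation group, income bracket)
history = [
    ("A", "top 5%"), ("A", "other"), ("A", "other"), ("A", "other"),
    ("B", "top 5%"), ("B", "top 5%"), ("B", "other"), ("B", "other"),
]

scores = class_probabilities(history, "B")
print(scores["top 5%"])  # 0.5
```

Instead of a hard yes/no assignment, each person gets a number between 0 and 1 for every class.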

A few Use Cases and examples of Classification and Class Probability Estimation/Scoring are:

(1) Financial: credit risk – High-Risk, Medium-Risk, Low-Risk, Safe.
A person’s past credit history (or lack of one) determines their credit score. The credit score in turn determines which class of credit risk they fall into, and therefore whether they get the loan and how favorable the terms of the loan will be.

As an example of Class Probability Estimation (Scoring) for this use case, a person may fall in the Low-Risk class, but their credit score (sometimes called a FICO score) shows that they are at the low end of the Low-Risk class, bordering on Medium-Risk.
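
The sketch below shows how a score can carry more information than the class alone. The cutoff values and the 15-point margin are hypothetical numbers picked for illustration, not the policy of any lender or scoring agency.

```python
# Hypothetical cutoffs: (minimum score, class), highest band first
BANDS = [(800, "Safe"), (670, "Low-Risk"), (580, "Medium-Risk"), (0, "High-Risk")]


def credit_risk(score, margin=15):
    """Return (risk class, borderline flag) for a credit score.

    borderline is True when the score sits within `margin` points of the
    bottom of its band, i.e. close to dropping into the next riskier class.
    """
    for i, (minimum, label) in enumerate(BANDS):
        if score >= minimum:
            is_lowest_band = i == len(BANDS) - 1
            borderline = (not is_lowest_band) and (score - minimum < margin)
            return label, borderline
    raise ValueError("score must be non-negative")


print(credit_risk(740))  # ('Low-Risk', False)
print(credit_risk(675))  # ('Low-Risk', True)
```

Both people are Low-Risk, but the second is only a few points away from Medium-Risk.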

(2) Marketing: Marketing offer/promotion interest – Highly likely, Likely, Unlikely
Based on past promotions and those who responded to them, classification can be used to determine the likelihood of a person being interested in a specific marketing offer/promotion. This is known as targeted marketing: specific promotions are sent only to those who are likely to be interested, and therefore different classes/groups may receive different marketing messages from the same company.

As an example of Class Probability Estimation (Scoring) for this use case, a customer or prospect could be scored as 70% Unlikely, or 90% Highly Likely.
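
As a rough sketch of how scores turn into targeting decisions, assume each customer already has an estimated probability of responding to the promotion. The thresholds (0.7 and 0.4), the customer IDs, and the scores below are all made up for illustration.

```python
def interest_class(p_respond):
    """Map an estimated response probability to an interest class."""
    if not 0.0 <= p_respond <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if p_respond >= 0.7:   # hypothetical threshold
        return "Highly likely"
    if p_respond >= 0.4:   # hypothetical threshold
        return "Likely"
    return "Unlikely"


# Made-up scores: customer id -> estimated probability of responding
scores = {"cust_1": 0.9, "cust_2": 0.55, "cust_3": 0.3}

# Targeted marketing: send the promotion only to those not classed as Unlikely
targets = [cust for cust, p in scores.items() if interest_class(p) != "Unlikely"]
print(targets)  # ['cust_1', 'cust_2']
```

Where the thresholds are set is a business decision: lowering them reaches more people but raises the cost per response.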

(3) Customer Base: Top Customer, Seasonal Customer, Loyal Customer, High-Chance-of-Losing Customer, …
A company may use some set of criteria to classify customers into various categories. These categories can be used for various customer-focused efforts, such as marketing, special offers, rewards, and more.

(4) Fraud detection & security:  Transaction or Activity occurrence – Highly Unusual, Unusual, Normal
Based on past activity and all other activities as a whole, a person’s activity/transaction can be classified as unusual or normal, and the appropriate actions taken to protect their accounts.
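
One simple way to do this, sketched below, is to measure how many standard deviations a new transaction amount is from the person's own past amounts. This sketch uses only the individual's history (not all other activities as a whole), and the cutoffs of 2 and 3 standard deviations are assumptions for illustration.

```python
from statistics import mean, stdev


def classify_transaction(past_amounts, amount, unusual_z=2.0, highly_unusual_z=3.0):
    """Classify a transaction by its distance from the account's past amounts."""
    if len(past_amounts) < 2:
        raise ValueError("need at least two past transactions")
    mu = mean(past_amounts)
    sigma = stdev(past_amounts)
    if sigma == 0:
        # Every past amount was identical, so any different amount stands out
        return "Normal" if amount == mu else "Highly Unusual"
    z = abs(amount - mu) / sigma
    if z >= highly_unusual_z:
        return "Highly Unusual"
    if z >= unusual_z:
        return "Unusual"
    return "Normal"


past = [20, 25, 22, 30, 28]  # made-up purchase amounts
print(classify_transaction(past, 27))   # Normal
print(classify_transaction(past, 35))   # Unusual
print(classify_transaction(past, 500))  # Highly Unusual
```

A real fraud system would look at many more features (location, merchant, time of day), but the idea of comparing new activity against past activity is the same.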

(5) Healthcare:
Data from past health analyses and treatments can be used to classify the level of a patient’s illness and to assign the patient to a treatment class. This will then drive the recommended treatment.

(6) Human behavior/Workforce:
Today’s workforce consists of multiple generations (Baby Boomers, GenX, GenY/Millennials, etc.) of workers. Generational classification of people, based on the period in which they were born, is used for marketing purposes, but it is also used to help educate a diverse workforce on understanding team members of different generations and how to work with them.
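
Generational classification is a simple rule-based example, since it depends on a single feature: birth year. In the sketch below, the year ranges are assumptions based on commonly cited definitions; sources differ on the exact boundaries by a year or two.

```python
# Assumed birth-year ranges (inclusive); exact boundaries vary by source
GENERATIONS = [
    (1946, 1964, "Baby Boomer"),
    (1965, 1980, "GenX"),
    (1981, 1996, "GenY/Millennial"),
]


def generation(birth_year):
    """Return the generation class for a birth year, or "Other" if none matches."""
    for first, last, label in GENERATIONS:
        if first <= birth_year <= last:
            return label
    return "Other"


print(generation(1972))  # GenX
```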

There are, of course, many more types of classification and use cases. Feel free to share your use cases.
