Data types are a simple but very important topic, as they form the foundation of data analysis and hypothesis testing. Work through this module and I promise you will not face any problem identifying data types in your future data analysis work.
We will cover the following items in this module:
Types of Data
There are three types of data: discrete, continuous and locational. In data analysis we mostly use continuous and discrete data. Before we apply any particular analysis to test a hypothesis, we first have to make sure that the required type of data is available, because every analysis type is tied to a type of data. For example, if our data is discrete, we cannot apply the analysis types that work only with continuous data (please refer to Fig-2).
- Data is objective information that everyone can agree on
- What we measure is not the object but some characteristic of it
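Because the choice of analysis depends on the data type, it can help to check a set of values before analysing it. The Python sketch below uses a hypothetical helper, `guess_data_type`, that applies the rule of thumb from this module: whole-number values suggest discrete (count) data, while values with a fractional part suggest continuous (measured) data. It is only a first check, not a reliable classifier: continuous measurements recorded in whole units (for example height rounded to the nearest centimetre) will be reported as discrete.

```python
def guess_data_type(values):
    """Hypothetical rule-of-thumb check, not a standard library function.

    Returns "discrete" if every value is a whole number, otherwise "continuous".
    Caveat: rounded measurements will be misreported as discrete.
    """
    if all(float(v).is_integer() for v in values):
        return "discrete"
    return "continuous"


print(guess_data_type([2, 40, 41]))         # discrete
print(guess_data_type([2.12, 3.33, -3.3]))  # continuous
```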
Discrete data is count data, so it can take only whole-number values, for example 2, 40, 41, etc. Counted data, or attribute data, answers questions like “how many”, “how often” or “how many passed/failed”.
Data with only two possible outcomes (yes/no, on time/late, OK/not OK):
- A cab is either on time or late
- An agent is either present or not present
Counts of occurrences:
- Number of computer breakdowns in a week
- Number of times agent puts client on hold during a call
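Both kinds of discrete data above are summarised by counting. The Python sketch below uses made-up sample values (the cab statuses and weekly breakdown counts are hypothetical) to show a pass/fail style summary for the two-outcome data and a simple total for the count data.

```python
from collections import Counter

# Hypothetical sample: arrival status of ten cabs (two possible outcomes)
cab_status = ["on time", "late", "on time", "on time", "late",
              "on time", "on time", "on time", "late", "on time"]

# Hypothetical sample: computer breakdowns counted in each of five weeks
weekly_breakdowns = [3, 0, 2, 5, 1]

# Two-outcome data: count each outcome, then express "late" as a proportion
status_counts = Counter(cab_status)
late_share = status_counts["late"] / len(cab_status)

print(f"Late cabs: {status_counts['late']} of {len(cab_status)} ({late_share:.0%})")
# Late cabs: 3 of 10 (30%)

# Count data: the values themselves are counts, so they can be summed
print("Total breakdowns:", sum(weekly_breakdowns))  # Total breakdowns: 11
```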
Variable data is continuous data, which means the data values can be any real number, such as 2.12, 3.33 or -3.3. This data is measured on a continuous scale, such as distance, time, weight or length. Measured data is regarded as better than counted data because it is more precise and contains more information. For example, knowing how much it rained each day is much better information than knowing the number of days it rained. However, collecting continuous data is more time-consuming and expensive than collecting counted (discrete) data.
Data that can be measured on a continuous scale, with resolution limited only by the precision of the measuring equipment:
- Time it takes to close a call
- Actual reporting time of a cab at the gate
- Temperature of the room
- Exchange rate of a currency
- Height of a person
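The rainfall comparison can be made concrete. In the Python sketch below the daily rainfall figures are hypothetical; the point is that reducing each measured value to "rained / did not rain" turns continuous data into a count and throws information away.

```python
# Hypothetical daily rainfall for one week, in millimetres (continuous, measured data)
rain_mm = [0.0, 12.4, 0.0, 0.3, 25.1, 0.0, 4.2]

# Discrete view: count the days with any rain at all
rainy_days = sum(1 for mm in rain_mm if mm > 0)

# Continuous view: keep the measured amounts
total_rain = sum(rain_mm)

print("Rainy days:", rainy_days)                # Rainy days: 4
print(f"Total rainfall: {total_rain:.1f} mm")  # Total rainfall: 42.0 mm
```

The count treats the 0.3 mm drizzle exactly like the 25.1 mm downpour, while the measured values keep that difference.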
Locational data simply answers the question “where”. Charts that use locational data are often called “measles charts” or “concentration charts”. They can also take the form of a “heat map” showing volume or concentration on a map.
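The data behind a measles or concentration chart is a tally of observations per location. The Python sketch below uses a hypothetical defect log (the board regions are invented for illustration) and prints a crude text version of such a tally, with one "x" per defect.

```python
from collections import Counter

# Hypothetical defect log: the region of a circuit board where each defect was found
defect_locations = ["top-left", "center", "top-left", "bottom-right",
                    "top-left", "center"]

# Tally defects by location: this answers "where" the problems concentrate
concentration = Counter(defect_locations)

# Print a simple text chart, most affected location first
for location, count in concentration.most_common():
    print(f"{location:<13} {'x' * count}")
```

On a real measles chart the same tallies would be marked on a drawing of the board or a map instead of printed as text.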
Primary Scales of Measurement
There are four primary scales of measurement: nominal, ordinal, interval and ratio. These scales are summarized in Fig-2.
1. Nominal Scale: This is a figurative labeling scheme in which the numbers serve only as labels or tags for identifying and classifying objects. For example, the number assigned to a runner in a race is nominal: each number is assigned to only one runner, and the numbers are unique. Another example is the Social Security Number (SSN). The numbers on a nominal scale do not reflect the amount of the characteristic possessed by the object; for example, a person with a higher SSN is not superior to a person with a lower SSN. The only mathematical operation we can perform on a nominal scale is counting.
2. Ordinal Scale: An ordinal scale is a ranking scale in which numbers (ranks) are assigned to objects to indicate the relative extent to which the objects possess some characteristic. It indicates relative position, but not the magnitude of the difference between objects. Along with counting, we can calculate percentiles, quartiles, the median, rank-order correlation and other summary statistics from ordinal data.
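3. Interval Scale: On an interval scale, numerically equal distances on the scale represent equal differences in the characteristic being measured, so the difference between any two adjacent scale values is identical to the difference between any other two adjacent values. The most important point is that on an interval scale the location of the zero point is not fixed; it is arbitrary. Temperature in degrees Celsius is a common example: 0 °C does not mean “no temperature”, so 20 °C is not twice as hot as 10 °C.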
4. Ratio Scale: A ratio scale possesses all the properties of the nominal, ordinal and interval scales and, in addition, has an absolute zero point, so ratios of values are meaningful. Common examples of ratio scales are height, weight, distance and age. All statistical techniques can be applied to ratio-scale data.
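The four scales differ in which operations make sense. The Python sketch below illustrates three of those rules with hypothetical values: counting (the mode) for nominal codes, the median for ordinal ratings, and why ratios are meaningless on an interval scale such as Celsius but meaningful on a ratio scale such as Kelvin, which has an absolute zero.

```python
import statistics

# Nominal: hypothetical department codes; the numbers are only labels,
# so counting (e.g. the most frequent code) is meaningful, an average is not
dept_codes = [10, 20, 10, 30, 10]
print(statistics.mode(dept_codes))  # 10

# Ordinal: hypothetical 1-5 satisfaction ratings; the median is meaningful,
# but the gap between ratings 2 and 3 need not equal the gap between 4 and 5
satisfaction = [1, 2, 2, 3, 5]
print(statistics.median(satisfaction))  # 2

# Interval vs ratio: the same two temperatures in Celsius and in Kelvin
c_low, c_high = 10.0, 20.0
k_low, k_high = c_low + 273.15, c_high + 273.15
print(c_high / c_low)            # 2.0 (misleading: Celsius zero is arbitrary)
print(round(k_high / k_low, 3))  # 1.035 (meaningful: Kelvin has an absolute zero)
```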
Data Types and Their Distributions
Continuous Data – Normal Distribution
Discrete Data – Binomial / Poisson Distribution
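A common textbook mapping is the normal distribution for continuous measurements, the binomial distribution for two-outcome data (such as on time/late) and the Poisson distribution for counts of occurrences (such as breakdowns per week). The Python sketch below writes the three probability formulas with the standard `math` module; the parameters (10 cabs with a 30% chance of being late, an average of 2 breakdowns per week) are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Probability density of a normal distribution (continuous data)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def binomial_pmf(k, n, p):
    """Probability of exactly k 'late' outcomes in n yes/no trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Probability of exactly k occurrences when the average rate is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


print(round(binomial_pmf(3, 10, 0.3), 4))  # 0.2668: exactly 3 of 10 cabs late
print(round(poisson_pmf(0, 2), 4))         # 0.1353: a week with no breakdowns
print(round(normal_pdf(0, 0, 1), 4))       # 0.3989: standard normal density at 0
```

`math.comb` requires Python 3.8 or later.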
Basic Statistics for Continuous Data
Measures of Location
Mean
- Arithmetic average of a set of values
- Equally reflects the influence of all values
- Strongly influenced by extreme values
Median
- Reflects the 50% rank
- The center number after a set of numbers has been sorted
- Is “robust” to extreme scores
Mode
- The single value that appears the maximum number of times in a data set
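The difference between the mean and the median is easiest to see with one extreme value. The Python sketch below uses the standard `statistics` module on hypothetical call-closing times, one of which (45 minutes) is an outlier.

```python
import statistics

# Hypothetical call-closing times in minutes; 45 is an extreme value
close_times = [4, 5, 5, 6, 7, 45]

print(statistics.mean(close_times))    # 12  (pulled up by the 45-minute call)
print(statistics.median(close_times))  # 5.5 (average of the two middle values, 5 and 6)
print(statistics.mode(close_times))    # 5   (the most frequent value)
```

Removing the 45-minute call would drop the mean from 12 to 5.4, while the median would only move from 5.5 to 5.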
Continuous Data – Variability
Measures of Spread
Range
- Numerical distance between the highest and the lowest values in a data set
Standard Deviation
- The square root of the variance; it is the most commonly used measure to quantify variability
The variance is the square of the standard deviation; it is usually used in capability calculations.
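The three measures of spread can be computed directly. The Python sketch below uses hypothetical call-closing times and the standard `statistics` module; `stdev` and `variance` treat the data as a sample (dividing by n - 1), while `pstdev` and `pvariance` would treat it as a whole population.

```python
import statistics

# Hypothetical call-closing times in minutes (treated as a sample)
close_times = [4, 5, 5, 6, 7, 9]

data_range = max(close_times) - min(close_times)
sample_sd = statistics.stdev(close_times)      # sample standard deviation
sample_var = statistics.variance(close_times)  # sample variance, equal to sample_sd ** 2

print("Range:", data_range)                       # Range: 5
print(f"Standard deviation: {sample_sd:.3f}")     # Standard deviation: 1.789
print(f"Variance: {sample_var:.3f}")              # Variance: 3.200
```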