data-analysis

Definition

Numerical Data

Numerical data (or quantitative data) is a data type consisting of numerical values that represent measurable quantities. Formally, numerical data is characterised by values from a set where arithmetic operations (e.g., addition, subtraction) and ordinal comparisons are well-defined. Numerical data is further categorised into two fundamental types:

  • Discrete Data: Observations that take distinct, separate values (typically integers), such as counts.
  • Continuous Data: Observations that can take any value within a given interval, such as measurements of length or time.

Discrete vs. Continuous

Discrete

These variables are restricted to a countable set of values. In the course examples, this includes the number of items (e.g., 1, 2, or 3 apples) or specific timestamps (recorded at a fixed resolution).

Continuous

These variables reside in an uncountable set, typically ℝ (the real numbers) or a sub-interval thereof. This includes physical dimensions such as height, weight, or the diameter of a mushroom cap.
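The distinction can be illustrated with a small Python sketch. The helper `looks_discrete` and the sample values are hypothetical and only meant as a rough heuristic: it flags a column as discrete when every value is a whole number, which is not a formal definition.

```python
def looks_discrete(values):
    """Rough heuristic (an assumption, not a definition):
    treat a column as discrete if every value is a whole number."""
    return all(float(v).is_integer() for v in values)


# Hypothetical sample data
apple_counts = [1, 2, 3]              # counts: discrete
cap_diameters_cm = [4.2, 5.75, 6.0]   # measurements: continuous

print(looks_discrete(apple_counts))      # True
print(looks_discrete(cap_diameters_cm))  # False
```

Note that the heuristic misclassifies continuous measurements that happen to be recorded as whole numbers (e.g. heights rounded to the nearest centimetre), so the nature of the variable should decide, not only the stored values.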

Preprocessing

  • Handle missing values (delete the affected records or impute the missing entries)
  • Discretise where necessary/applicable (e.g. put age values into age groups, such as 15-25)
  • Scale your data (e.g. min-max scaling, mean normalisation, …)
    • Scale at the feature level, not at the data set level
    • Each feature/attribute is scaled individually, using only its own statistics (e.g. its own minimum, maximum, or mean), so that no statistics from other features influence the scaling
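
The preprocessing steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a prescribed pipeline: the toy data, the function names, the choice of mean imputation, and the bin edges (15, 25, 35, 45) are all hypothetical. Min-max scaling is taken as (x - min) / (max - min) and mean normalisation as (x - mean) / (max - min), each computed per feature.

```python
from statistics import mean

# Hypothetical toy data set: one list per feature, None marks a missing value.
data = {
    "age": [18, 22, None, 40, 30],
    "height_cm": [160.0, 175.0, 180.0, None, 170.0],
}


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def discretise(values, edges):
    """Map each value to the index of the half-open bin
    [edges[i], edges[i + 1]) that contains it (None if outside all bins)."""
    bins = []
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                bins.append(i)
                break
        else:
            bins.append(None)
    return bins


def min_max_scale(values):
    """Rescale to [0, 1] using only this feature's own minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def mean_normalise(values):
    """Centre on this feature's own mean and divide by its own range."""
    lo, hi, mu = min(values), max(values), mean(values)
    if hi == lo:  # constant feature: avoid division by zero
        return [0.0 for _ in values]
    return [(v - mu) / (hi - lo) for v in values]


# 1. Missing values: impute each feature separately.
imputed = {name: impute_mean(col) for name, col in data.items()}

# 2. Discretise age into groups 15-25, 25-35, 35-45 (assumed bin edges).
age_groups = discretise(imputed["age"], edges=[15, 25, 35, 45])

# 3. Scale feature by feature, never with statistics of the whole data set.
scaled = {name: min_max_scale(col) for name, col in imputed.items()}
centred = {name: mean_normalise(col) for name, col in imputed.items()}

print(age_groups)  # [0, 0, 1, 2, 1]
print(scaled["height_cm"])
```

Because each scaling function receives a single column, the large values of height_cm cannot distort the scaling of age, which is the point of scaling at the feature level.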