Data Organization and Management
Statistics Toolbox provides two specialized arrays for storing and managing statistical data: dataset arrays and categorical arrays.
Dataset arrays let you organize and analyze heterogeneous statistical data together with its metadata. Each column represents a measured variable and each row represents an observation. With dataset arrays, you can:
- Store different types of data in a single container.
- Label rows and columns of data and reference that data using recognizable names.
- Display and edit data in an intuitive tabular format.
- Use metadata to define units, describe data, and store information.
Dataset array displayed in the MATLAB Variable Editor. This dataset array includes a mixture of cell strings and numeric information, with selected columns available in the Plot Selector Tool.
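The dataset array capabilities listed above can be sketched as follows. This is a minimal illustration, not a definitive workflow: the patient records, variable names, and units are hypothetical sample values, and the sketch assumes the `dataset` constructor and its `Properties` metadata fields from Statistics Toolbox.

```matlab
% Hypothetical sample data: three patients with mixed-type measurements.
Name   = {'Smith'; 'Jones'; 'Brown'};
Sex    = {'M'; 'F'; 'F'};          % cell array of strings
Age    = [38; 43; 29];             % numeric
Weight = [176; 163; 131];          % numeric
Smoker = [true; false; false];     % logical

% Store the different types in a single container. Each {data, name} pair
% becomes one named column; the 'ObsNames' option labels the rows.
patients = dataset({Sex, 'Sex'}, {Age, 'Age'}, {Weight, 'Weight'}, ...
                   {Smoker, 'Smoker'}, 'ObsNames', Name);

% Attach metadata: a description of the array and units for each variable.
patients.Properties.Description = 'Hypothetical patient records';
patients.Properties.Units = {'', 'yr', 'lb', ''};

% Reference data by recognizable names instead of numeric positions.
jonesWeight = double(patients('Jones', 'Weight'));        % row and column by name
olderThan30 = patients(patients.Age > 30, {'Age', 'Weight'});

% Print a per-variable summary, including the units set above.
summary(patients)
```

Indexing with parentheses, as in `patients('Jones', 'Weight')`, returns a smaller dataset array, while dot indexing, as in `patients.Age`, returns the underlying variable.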
Statistics Toolbox provides specialized functions that operate on dataset arrays. With these functions, you can:
- Merge datasets by combining fields using common keys.
- Export data to standard file formats, including Microsoft® Excel® and comma-separated values (CSV).
- Calculate summary statistics on grouped data.
- Convert data between tall and wide representations.
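The four operations above can be sketched as follows. The text does not name the functions involved, so this example assumes the Statistics Toolbox functions `join`, `grpstats`, `stack`, `unstack`, and `export`. The score data, variable names, and output file name are hypothetical.

```matlab
% Hypothetical data: exam scores per student, plus a lookup table of classes.
scores = dataset({[1; 2; 3; 4], 'StudentID'}, ...
                 {[1; 1; 2; 2], 'ClassID'}, ...
                 {[70; 80; 90; 100], 'Midterm'}, ...
                 {[75; 85; 95; 90], 'Final'});
classes = dataset({[1; 2], 'ClassID'}, ...
                  {{'Algebra'; 'Biology'}, 'ClassName'});

% Merge: add each student's class name by matching on the common key.
merged = join(scores, classes, 'Keys', 'ClassID');

% Summary statistics on grouped data: mean score per class.
classStats = grpstats(merged, 'ClassName', 'mean', ...
                      'DataVars', {'Midterm', 'Final'});

% Wide to tall: stack the two exam columns into one Score column, with an
% Exam indicator variable recording which column each value came from.
tall = stack(scores, {'Midterm', 'Final'}, ...
             'NewDataVarName', 'Score', 'IndVarName', 'Exam');

% Tall to wide: spread Score back into one column per exam.
wide = unstack(tall, 'Score', 'Exam');

% Export to a comma-separated file (written to the temp folder here).
outFile = fullfile(tempdir, 'scores.csv');
export(merged, 'file', outFile, 'Delimiter', ',');
% For Excel output, the 'XLSfile' option can be used instead:
% export(merged, 'XLSfile', fullfile(tempdir, 'scores.xls'));
```

With these inputs, `merged` has one row per student and gains a `ClassName` column, and `tall` has two rows per student, one for each exam.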
Categorical arrays let you organize and process nominal and ordinal data, whose values come from a finite set of discrete levels or categories. With categorical arrays, you can:
- Decrease memory footprint by replacing repetitive text strings with categorical labels.
- Store nominal data using descriptive labels, such as red, green, and blue for an unordered set of colors.
- Store ordinal data using descriptive labels, such as cold, warm, and hot for an ordered set of temperature measurements.
- Manipulate categorical data using familiar array operations and indexing methods.
- Create logical indexes based on categorical data.
- Group observations by category.
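The categorical array operations above can be sketched as follows, reusing the color and temperature labels from the list. This example assumes the Statistics Toolbox `nominal` and `ordinal` classes along with `getlabels`, `levelcounts`, and `grpstats`. The observations and the numeric readings are hypothetical.

```matlab
% Nominal data: descriptive labels with no inherent order.
colors = nominal({'red'; 'green'; 'blue'; 'red'; 'green'; 'red'});

% Ordinal data: numeric codes 1, 2, 3 mapped to ordered labels.
temps = ordinal([1; 3; 2; 3; 1; 2], {'cold', 'warm', 'hot'});

% Create logical indexes from categorical data.
isRed   = (colors == 'red');     % equality test against a label
notCold = (temps > 'cold');      % order comparison, valid for ordinal data

% Use familiar indexing to select elements.
redTemps = temps(isRed);

% Group hypothetical numeric readings by temperature category.
readings   = [10; 31; 22; 33; 12; 20];
meanByTemp = grpstats(readings, temps);   % one mean per level

% Inspect the levels and how often each one occurs.
tempLabels  = getlabels(temps);
colorCounts = levelcounts(colors);
```

Order comparisons such as `temps > 'cold'` are meaningful only for ordinal arrays. Nominal arrays support equality tests but define no ordering among their levels.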