Loading and saving data
Orange.data.Table supports loading from several file formats:
Comma-separated values (*.csv) file,
Tab-separated values (*.tab, *.tsv) file,
Excel spreadsheet (*.xls, *.xlsx).
In addition, the text-based files (CSV, TSV) can be compressed with gzip, bzip2 or xz (e.g. *.csv.gz).
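As a rough illustration of what a compressed text file looks like in practice, the sketch below writes ordinary CSV text through a gzip stream and reads it back using only the Python standard library. The file name people.csv.gz and its rows are made up for this example, and Orange's own reader is not involved.

```python
import csv
import gzip
import os
import tempfile

# Hypothetical rows; the first row is a plain header with column names.
rows = [["name", "age"], ["Ana", "31"], ["Berta", "27"]]

# Hypothetical file name using the .csv.gz extension mentioned above.
path = os.path.join(tempfile.mkdtemp(), "people.csv.gz")

# Write ordinary CSV text through a gzip stream.
with gzip.open(path, "wt", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)

# Read it back: once decompressed, the content is ordinary CSV again.
with gzip.open(path, "rt", newline="", encoding="utf-8") as f:
    loaded = list(csv.reader(f))

print(loaded)
```

The standard bz2 and lzma modules expose a similar open() function, so the same pattern should carry over to bzip2 and xz files.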
The data in CSV, TSV, and Excel files can be described in an extended three-line header format, or a condensed single-line header format.
Three-line header format
A three-line header consists of:
Feature names on the first line. Feature names can include any combination of characters.
Feature types on the second line. The type is determined automatically, or, if set, can be any of the following:
discrete (or d): imported as Orange.data.DiscreteVariable,
a space-separated list of discrete values, like "male female", which results in an Orange.data.DiscreteVariable with those values, in that order. If an individual value contains a space character, the space must be escaped (prefixed) with a backslash ('\') character,
continuous (or c): imported as Orange.data.ContinuousVariable,
string (or s, or text): imported as Orange.data.StringVariable.
Flags (optional) on the third header line. A feature's flag field can be empty, or it can contain a space-separated, consistent combination of:
class (or c): the feature is imported as a class variable. Most algorithms expect a single class variable.
meta (or m): the feature is imported as a meta-attribute, which describes the data instance but is not used for learning,
weight (or w): the feature holds the weight of examples (for algorithms that support weighted examples),
ignore (or i): the feature is not imported,
<key>=<value> pairs, which are custom attributes recognized in specific contexts, for instance color, which defines the color palette when the variable is visualized, or type=image, which signals that the variable contains a path to an image.
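The backslash-escaping rule for discrete value lists can be sketched as follows. This is only an illustration of the rule as described above, not Orange's actual parser: the function name split_discrete_values is hypothetical, and the sketch does not handle escaped backslashes.

```python
import re


def split_discrete_values(cell):
    """Split a type cell on spaces that are not preceded by a backslash."""
    parts = re.split(r"(?<!\\) ", cell)
    # Drop empty pieces caused by repeated spaces, then turn "\ " back into " ".
    return [p.replace("\\ ", " ") for p in parts if p]


print(split_discrete_values("male female"))
print(split_discrete_values(r"New\ York Boston"))
```

The first call yields two values, while in the second the escaped space keeps "New York" together as a single value.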
Example of the iris dataset in Orange's three-line format (columns are aligned here for readability):
sepal length    sepal width    petal length    petal width    iris
c               c              c               c              d
                                                              class
5.1             3.5            1.4             0.2            Iris-setosa
4.9             3.0            1.4             0.2            Iris-setosa
4.7             3.2            1.3             0.2            Iris-setosa
4.6             3.1            1.5             0.2            Iris-setosa
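To make the role of the three header lines concrete, here is a minimal sketch that reads the first rows of the iris example with Python's csv module and pairs each column name with its type and flags. It assumes the columns are tab-separated and checks only for the literal flag class; it is not how Orange itself parses the file.

```python
import csv
import io

# The first rows of the iris example, assumed here to be tab-separated.
text = (
    "sepal length\tsepal width\tpetal length\tpetal width\tiris\n"
    "c\tc\tc\tc\td\n"
    "\t\t\t\tclass\n"
    "5.1\t3.5\t1.4\t0.2\tIris-setosa\n"
    "4.9\t3.0\t1.4\t0.2\tIris-setosa\n"
)

reader = csv.reader(io.StringIO(text), delimiter="\t")
# The three header lines: names, types, flags.
names, types, flags = next(reader), next(reader), next(reader)
data = list(reader)

# Pair each column name with its declared type and its space-separated flags.
columns = [
    {"name": n, "type": t, "flags": f.split()}
    for n, t, f in zip(names, types, flags)
]

class_columns = [c["name"] for c in columns if "class" in c["flags"]]
print(class_columns)
print(len(data))
```

Note that the flags line is mostly empty: only the last column carries a flag, which is why the first four cells of that line are blank.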
Single-line header format
A single-line header consists of feature names prefixed by an optional "<flags>#" string, i.e. flags followed by a hash ('#') sign. The flags can be a consistent combination of:
c for class feature (also known as a target variable or dependent variable),
i for feature to be ignored,
m for meta attributes (not used in learning),
C for features that are continuous (numeric),
D for features that are discrete (categorical),
T for features that represent date and/or time in one of the ISO 8601 formats,
S for string features.
If some or all names or flags are omitted, the names, types, and flags are inferred automatically, which gives the correct result most of the time.
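The flag prefix described above can be illustrated with a small sketch that splits a header cell into its flags and its feature name. The function parse_header_cell, the FLAG_MEANINGS table, and the sample cells are hypothetical. Treating a cell as unflagged when the text before '#' contains anything other than the listed letters is an assumption of this sketch, not documented Orange behavior.

```python
# Flag letters as listed above, mapped to descriptive labels for this sketch.
FLAG_MEANINGS = {
    "c": "class", "i": "ignore", "m": "meta",
    "C": "continuous", "D": "discrete", "T": "time", "S": "string",
}


def parse_header_cell(cell):
    """Split a '<flags>#<name>' cell into (list of flag labels, name)."""
    prefix, sep, name = cell.partition("#")
    # Accept the prefix as flags only if '#' is present and every
    # character before it is one of the known flag letters.
    if sep and prefix and all(ch in FLAG_MEANINGS for ch in prefix):
        return [FLAG_MEANINGS[ch] for ch in prefix], name
    return [], cell


print(parse_header_cell("cD#iris"))
print(parse_header_cell("C#sepal length"))
print(parse_header_cell("petal width"))
```

A cell without a prefix comes back with an empty flag list, which corresponds to the case where the type and role are left to automatic detection.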