Hello, Vitor!
DataFrame is a tabular data structure that works like a spreadsheet or a database table. Each column is a feature and each row is an observation.
DataSeries is an array-like data structure for working with one specific feature (column) or observation (row).
To put it simply:
- DataFrame is 2-dimensional (table)
- DataSeries is 1-dimensional (row or column)
Consider the following data frame:

    df := DataFrame fromRows: #(
        ('Barcelona' 1.609 true)
        ('Dubai' 2.789 true)
        ('London' 8.788 false)).

    df columnNames: #(City Population SomeBool).
    df rowNames: #(A B C).
The pretty-printed table (Ctrl+P) would look like this:

       | City       Population  SomeBool
    ---+-----------------------------------
    A  | Barcelona  1.609       true
    B  | Dubai      2.789       true
    C  | London     8.788       false

If we ask this DataFrame for the #Population column, we will get a DataSeries:
    series := df column: #Population.

All elements in this series are instances of BoxedFloat64, which means that the series itself reflects the behavior of this class: you can multiply it by 2, or compare it to some number (this reflection was described in the blog post). The series stores its name #Population and keys #(A B C), so each column or row preserves the indexing of the data frame.
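As a minimal sketch of that reflected behavior, assuming df is the data frame defined above and that arithmetic and comparison are forwarded to each element as described (the variable names are only for illustration):

```smalltalk
"Sketch only: relies on the series forwarding these messages
 to its BoxedFloat64 elements, as described above."
population := df column: #Population.
doubled := population * 2.      "every value multiplied by 2"
aboveTwo := population > 2.     "every value compared to 2"
```

The exact form of the results may differ between DataFrame versions, so try it in a Playground and inspect what you get back.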
The printed version (Ctrl+P) may look like a data frame, but in fact it’s just one column (a one-dimensional series):

       | Population
    ---+------------
    A  | 1.609
    B  | 2.789
    C  | 8.788

If we ask for row #C, it will also be a DataSeries:
    series := df row: #C.

The elements of this series belong to different classes: ByteString, BoxedFloat64, and Boolean. They have no common behavior for the series to reflect, so it can’t be multiplied or converted to uppercase as a whole.
                | C
    ------------+--------
    City        | London
    Population  | 8.788
    SomeBool    | false

I would suggest that you read the DataFrame tutorial on GitHub: https://github.com/PolyMathOrg/DataFrame. It’s a bit outdated, and I will update it as soon as I have a decent internet connection.
Still, give it a try.
Hope this helps,
Oleks
