In daru, Daru::Vector is a one-dimensional array with axis labels.
Labels should be unique. The object supports both integer- and label-based indexing and provides a host of methods for performing operations involving the index. Statistical methods automatically exclude missing data (by default, missing data is represented as nil).
Operations between Vectors (+, -, *, /, **) align values based on their associated index labels. The Vectors need not be of the same length. The result index will be the sorted union of the two indexes.
Daru::Vector is similar to pandas.Series.
The examples below demonstrate how a simple Daru::Vector can be created and its data viewed.
This first set of examples shows basic creation of a Vector, including a Vector with missing data (represented by nil).
require 'daru'  # => true
The most basic way to create a Vector is to pass an Array of values to the constructor.
Index labels can be specified using the :index option, and the Vector can be given a name using the :name option. If :index isn't specified, the Vector is assigned an integer index starting from 0.
a = Daru::Vector.new([1,2,3,4,5], index: [:a, :b, :c, :d, :e], name: :bazinga)
Daru::Vector:30420060 size: 5

| | bazinga |
|---|---|
| a | 1 |
| b | 2 |
| c | 3 |
| d | 4 |
| e | 5 |
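Conceptually, a labelled index maps each label to a position in the underlying data. The plain-Ruby sketch below uses a hypothetical `LabeledVector` class (not part of daru, and not how daru implements Daru::Vector) to illustrate the constructor behaviour described above: an optional index that defaults to 0...n, unique labels, and lookup and assignment by label.

```ruby
# Hypothetical minimal labelled vector: NOT daru's implementation.
class LabeledVector
  attr_reader :data, :index, :name

  def initialize(data, index: nil, name: nil)
    @data  = data.dup
    # Default index: positions 0, 1, ..., n-1
    @index = index || (0...data.size).to_a
    raise ArgumentError, "index size must match data size" unless @index.size == @data.size
    raise ArgumentError, "labels should be unique" unless @index.uniq.size == @index.size
    @name = name
    # Map each label to its position for constant-time lookup
    @position = @index.each_with_index.to_h
  end

  # Look up a value by its label
  def [](label)
    @data[position_of(label)]
  end

  # Assign a value by its label
  def []=(label, value)
    @data[position_of(label)] = value
  end

  private

  def position_of(label)
    @position.fetch(label) { raise IndexError, "no such label: #{label.inspect}" }
  end
end

a = LabeledVector.new([1, 2, 3, 4, 5], index: [:a, :b, :c, :d, :e], name: :bazinga)
a[:b]        # => 2
a[:b] = 999
a[:b]        # => 999
LabeledVector.new([10, 20]).index  # => [0, 1]
```

Keeping a separate label-to-position Hash is just one way to make label lookup cheap; it is an illustration of the idea, not a description of daru's internals.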
Values can be accessed by label with the #[] operator.
a[:b]  # => 2
You can also specify a range of labels:
a[:b..:d]
Daru::Vector:29850660 size: 3

| | bazinga |
|---|---|
| b | 2 |
| c | 3 |
| d | 4 |
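One way to think about a label range such as :b..:d is that it is resolved through the positions of its endpoint labels in the index, rather than by comparing the labels themselves. The sketch below assumes that behaviour; `slice_by_label_range` is a hypothetical helper, not a daru method.

```ruby
# Hypothetical helper, NOT daru's implementation: resolve a label range through
# the positions of its two endpoint labels, so it also works when the index is
# not sorted. Returns [label, value] pairs, keeping each value's label.
def slice_by_label_range(index, data, range)
  first = index.index(range.begin) or raise IndexError, "no such label: #{range.begin.inspect}"
  last  = index.index(range.end)   or raise IndexError, "no such label: #{range.end.inspect}"
  last -= 1 if range.exclude_end?
  (first..last).map { |pos| [index[pos], data[pos]] }
end

index = [:a, :b, :c, :d, :e]
data  = [1, 2, 3, 4, 5]
slice_by_label_range(index, data, :b..:d)  # => [[:b, 2], [:c, 3], [:d, 4]]
```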
Values can be assigned with the #[]= operator.
a[:b] = 999
a
Daru::Vector:30420060 size: 5

| | bazinga |
|---|---|
| a | 1 |
| b | 999 |
| c | 3 |
| d | 4 |
| e | 5 |
If you want values other than nil to be treated as missing, specify them with the :missing_values option.
The #only_valid method can then be used to obtain all the non-missing values of the Vector. Notice that only_valid preserves the indexes (labels) of the data.
a = Daru::Vector.new([1,2,3,5,5,4,6,nil,nil], missing_values: [5,nil])
a.only_valid
Daru::Vector:29284260 size: 5

| | nil |
|---|---|
| 0 | 1 |
| 1 | 2 |
| 2 | 3 |
| 5 | 4 |
| 6 | 6 |
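The effect of :missing_values plus #only_valid can be sketched in plain Ruby by pairing each value with its label and rejecting the values listed as missing. `only_valid_pairs` is a hypothetical helper for illustration only, not daru's implementation; with the same input it yields the labels and values shown in the output above.

```ruby
# Hypothetical sketch, NOT daru's implementation: drop values listed as missing
# while keeping each surviving value paired with its original label.
def only_valid_pairs(data, missing_values: [nil], index: nil)
  index ||= (0...data.size).to_a   # default 0-based labels
  index.zip(data).reject { |_label, value| missing_values.include?(value) }
end

pairs = only_valid_pairs([1, 2, 3, 5, 5, 4, 6, nil, nil], missing_values: [5, nil])
pairs.map(&:first)  # => [0, 1, 2, 5, 6]  (labels preserved)
pairs.map(&:last)   # => [1, 2, 3, 4, 6]
```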
The Vector.[] class method creates a Vector from almost any object that has a #to_a method defined on it. It is similar to R's c() function.
b = Daru::Vector[1,2,3,4,6..10]
Daru::Vector:28825140 size: 9

| | nil |
|---|---|
| 0 | 1 |
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
| 4 | 6 |
| 5 | 7 |
| 6 | 8 |
| 7 | 9 |
| 8 | 10 |
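The "expand anything with #to_a" idea behind Vector.[] can be sketched with Array#flat_map. `combine` is a hypothetical helper, not daru's implementation, and its handling of nil is an assumption of this sketch.

```ruby
# Hypothetical helper, NOT daru's actual Vector.[] code: expand every argument
# that responds to #to_a (Ranges, Arrays, ...) and keep everything else as a
# single element. nil is kept as-is because nil.to_a returns [] and would
# silently drop a missing value.
def combine(*args)
  args.flat_map do |arg|
    if arg.nil? || !arg.respond_to?(:to_a)
      [arg]
    else
      arg.to_a
    end
  end
end

combine(1, 2, 3, 4, 6..10)  # => [1, 2, 3, 4, 6, 7, 8, 9, 10]
```

Integers do not respond to #to_a in modern Ruby, which is why they pass through unchanged while the Range is expanded.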
The new_with_size class method creates a Daru::Vector of the size given as its argument. The optional block, if supplied, is called once for each element of the Vector.
The value returned by each call of the block is the value assigned to that position in the Vector.
a = Daru::Vector.new_with_size(1000, name: :new_vector) { r = rand(5); r == 4 ? nil : r }  # roughly 1 in 5 values become nil
Daru::Vector:28500640 size: 1000

| | new_vector |
|---|---|
| 0 | 2 |
| 1 | 3 |
| 2 | 0 |
| 3 | nil |
| 4 | nil |
| 5 | 2 |
| 6 | 2 |
| 7 | 2 |
| 8 | 1 |
| 9 | 1 |
| 10 | 3 |
| 11 | 0 |
| 12 | 2 |
| 13 | 0 |
| 14 | 3 |
| 15 | 1 |
| 16 | 3 |
| 17 | 1 |
| 18 | 3 |
| 19 | 0 |
| 20 | 1 |
| 21 | 1 |
| 22 | 2 |
| 23 | 2 |
| 24 | 2 |
| 25 | 3 |
| 26 | 3 |
| 27 | 2 |
| 28 | 0 |
| 29 | nil |
| 30 | 3 |
| 31 | 3 |
| ... | ... |
| 999 | 3 |
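For comparison, plain Ruby's Array.new(size) { ... } follows the same pattern of calling a block once per element. The sketch below is not daru code; it uses a seeded Random (an assumption made only so the result is reproducible) to build 1000 values with roughly one in five set to nil.

```ruby
# Plain-Ruby analogue of the new_with_size example (NOT daru code):
# Array.new(size) { ... } calls its block once per element.
rng = Random.new(42)  # seeded only for reproducibility
values = Array.new(1000) do
  r = rng.rand(5)     # integer in 0..4
  r == 4 ? nil : r    # a 4 becomes nil (missing), so about 1 in 5 entries are nil
end

values.size           # => 1000
values.count(&:nil?)  # number of missing entries; depends on the seed
```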
Use the #head method to obtain the first 10 values of the Vector.
a.head
Daru::Vector:27175540 size: 10

| | new_vector |
|---|---|
| 0 | 2 |
| 1 | 3 |
| 2 | 0 |
| 3 | nil |
| 4 | nil |
| 5 | 2 |
| 6 | 2 |
| 7 | 2 |
| 8 | 1 |
| 9 | 1 |
The Daru::Vector#sort method sorts the Vector in ascending order while preserving each value's index label.
a = Daru::Vector.new([23,144,332,11,2,5,6765,3])
Daru::Vector:25317760 size: 8

| | nil |
|---|---|
| 0 | 23 |
| 1 | 144 |
| 2 | 332 |
| 3 | 11 |
| 4 | 2 |
| 5 | 5 |
| 6 | 6765 |
| 7 | 3 |
a.sort
Daru::Vector:24840120 size: 8

| | nil |
|---|---|
| 4 | 2 |
| 7 | 3 |
| 5 | 5 |
| 3 | 11 |
| 0 | 23 |
| 1 | 144 |
| 2 | 332 |
| 6 | 6765 |
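Sorting while preserving the index amounts to sorting (label, value) pairs by value. A plain-Ruby sketch, not daru's implementation, using the same data as above:

```ruby
# Sketch, NOT daru's implementation: sort (label, value) pairs by value so
# every value keeps its original label.
data   = [23, 144, 332, 11, 2, 5, 6765, 3]
pairs  = (0...data.size).zip(data)            # default 0-based labels
sorted = pairs.sort_by { |_label, value| value }

sorted.map(&:first)  # => [4, 7, 5, 3, 0, 1, 2, 6]
sorted.map(&:last)   # => [2, 3, 5, 11, 23, 144, 332, 6765]
```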
Arithmetic operations between two Vectors are always performed on the elements that share the same index label.
The Vectors need not have the same size, or even the same index. In case of a mismatch, the sorted union of both Vectors' indexes is used as the index of the resulting Vector.
If a particular label exists in one Vector but not in the other, the resulting Vector has nil at that label.
Daru::Vector supports the +, -, *, / and ** operators.
a = Daru::Vector.new([1,2,3,4,5,6], index: [:a, :b, :c, :d, :five, :f])
b = Daru::Vector.new([1,2,3,4,5], index: [:a, :b, :c, :ff,:five])
a + b
Daru::Vector:24525720 size: 7

| | nil |
|---|---|
| a | 2 |
| b | 4 |
| c | 6 |
| d | nil |
| f | nil |
| ff | nil |
| five | 10 |
a ** b
Daru::Vector:24243560 size: 7

| | nil |
|---|---|
| a | 1 |
| b | 4 |
| c | 27 |
| d | nil |
| f | nil |
| ff | nil |
| five | 3125 |
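The alignment rule can be sketched with Hashes standing in for Vectors (label => value). `aligned_op` is a hypothetical helper, not a daru method; it assumes the labels are mutually comparable so the union can be sorted, and with the same inputs it reproduces the a + b and a ** b results shown above.

```ruby
# Sketch of index-aligned arithmetic on Hashes (label => value); NOT daru's code.
# The result index is the sorted union of both indexes; a label present in only
# one operand produces nil.
def aligned_op(left, right)
  labels = (left.keys | right.keys).sort
  labels.map do |label|
    l = left[label]
    r = right[label]
    value = (l.nil? || r.nil?) ? nil : yield(l, r)
    [label, value]
  end
end

a = { a: 1, b: 2, c: 3, d: 4, five: 5, f: 6 }
b = { a: 1, b: 2, c: 3, ff: 4, five: 5 }

aligned_op(a, b) { |x, y| x + y }
# => [[:a, 2], [:b, 4], [:c, 6], [:d, nil], [:f, nil], [:ff, nil], [:five, 10]]
aligned_op(a, b) { |x, y| x**y }.last  # => [:five, 3125]
```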
Performing arithmetic with a scalar applies the operation to each element of the Vector and returns the resulting Vector.
a * 5
Daru::Vector:23813900 size: 6

| | nil |
|---|---|
| a | 5 |
| b | 10 |
| c | 15 |
| d | 20 |
| five | 25 |
| f | 30 |
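By contrast, a scalar operation needs no alignment: every value is transformed and the original label order is kept. A plain-Ruby sketch (not daru code) with a Hash standing in for the Vector; passing nil through unchanged is an assumption of this sketch.

```ruby
# Sketch, NOT daru's implementation: apply the scalar to every value and keep
# the original label order (no union or sorting is needed).
a = { a: 1, b: 2, c: 3, d: 4, five: 5, f: 6 }
# nil passes through unchanged (assumption of this sketch)
scaled = a.transform_values { |v| v.nil? ? nil : v * 5 }

scaled.keys    # => [:a, :b, :c, :d, :five, :f]
scaled.values  # => [5, 10, 15, 20, 25, 30]
```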
Daru::Vector defines a host of statistics methods, which are useful for quickly computing descriptive statistics on numeric data. All the statistics methods ignore missing values and work only on the valid data.
For a complete list of statistics functions see the Daru::Maths::Statistics::Vector module in the docs.
v = Daru::Vector.new([1,2,3,4,5,nil,6,nil,7])
v.mean      # => 4.0
v.variance  # => 4.666666666666667
v.median    # => 4
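The statistics above can be reproduced by hand by dropping nil before computing. This plain-Ruby sketch uses hypothetical helpers (not daru's implementation) and assumes variance is the sample variance with an n - 1 denominator, which matches the 4.666... shown above.

```ruby
# Sketch, NOT daru's implementation: statistics on the non-missing values only.
def mean(values)
  v = values.compact                 # ignore nil (missing) entries
  v.sum.to_f / v.size
end

def sample_variance(values)
  v = values.compact
  m = mean(v)
  v.sum { |x| (x - m)**2 } / (v.size - 1)   # n - 1 denominator
end

def median(values)
  v = values.compact.sort
  mid = v.size / 2
  v.size.odd? ? v[mid] : (v[mid - 1] + v[mid]) / 2.0
end

v = [1, 2, 3, 4, 5, nil, 6, nil, 7]
mean(v)             # => 4.0
sample_variance(v)  # => 4.666666666666667
median(v)           # => 4
```

The squared deviations of 1..7 from 4 sum to 28, and 28 / (7 - 1) gives the 4.666... reported by v.variance.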
Daru uses nyaplot internally to generate interactive plots.
You can also use rubyvis through statsample to quickly generate scatter plots, histograms and box plots.
A simple scatter plot can be generated by calling the #plot method on a Daru::Vector. Feel free to interact with the generated plot.
v = Daru::Vector.new((0..360).step(7).map { |i| Math.sin((i*Math::PI)/180) })
v.plot
Now, let's take some dummy survey data that records how many of the people surveyed fall into each age group. We want to show the number of people in each age group as a bar graph.
For this purpose we use the #plot method again, but this time pass it the :type option with the value :bar. The plot method yields the corresponding Nyaplot::Plot object to the block, which can then be used to set different parameters of the final plot. For more configuration methods, see the Nyaplot::Plot documentation.
v = Daru::Vector.new([40,50,20,70,10], index: ['18-24', '24-30', 'Under 18', '30-40', '40-50'], name: "Age Range")
v.plot(type: :bar) do |plt|
  plt.x_label "Age Groups"
  plt.y_label "Number of People Surveyed"
end
The third kind of plot that Daru::Vector can easily generate from nyaplot is the histogram.
To demonstrate, we'll prepare some sample data using the rnorm function from the statsample Ruby gem. rnorm generates normally distributed random numbers (1000 in this case) and returns a Daru::Vector containing them, which is stored in the variable a.
The histogram of this normally distributed sample is generated below.
require 'statsample'
include Statsample::Shorthand
a = rnorm(1000)
a.plot type: :histogram do |p|
  p.yrange [0,200]
  p.y_label "Frequency"
  p.x_label "Bins"
end
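A histogram counts how many values fall into each of a set of equal-width bins, which is what the "Frequency" and "Bins" axes above refer to. The sketch below is a hypothetical plain-Ruby helper, not nyaplot's binning algorithm; the number of bins is an arbitrary parameter here.

```ruby
# Hypothetical sketch, NOT nyaplot's algorithm: count values into equal-width bins.
def histogram_counts(values, bins: 10)
  v = values.compact                        # ignore missing values
  min, max = v.minmax
  width = (max - min).to_f / bins
  counts = Array.new(bins, 0)
  v.each do |x|
    i = width.zero? ? 0 : ((x - min) / width).floor
    i = bins - 1 if i >= bins               # the maximum value goes into the last bin
    counts[i] += 1
  end
  counts
end

histogram_counts([1, 2, 2, 3, 3, 3, 4, 4, 5], bins: 5)  # => [1, 2, 3, 2, 1]
```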
Apart from interfacing with nyaplot, Daru::Vector also works out of the box with rubyvis through statsample. To see plots generated with statsample and rubyvis in action, check out the following notebooks: