5 Manipulating Data

5.1 Computing Mean/Min/Max, etc. of an Array

Many base R generic functions, such as mean(), min(), and max(), have been mapped to their Arrow equivalents, so you can call them on Arrow Array objects just as you would on R vectors. They return Arrow objects (for example, a Scalar) rather than base R values.

my_values <- Array$create(c(1:5, NA))
mean(my_values, na.rm = TRUE)
## Scalar
## 3

If you want to use an R function which does not have an Arrow mapping, you can use as.vector() to convert Arrow objects to base R vectors.

fivenum(as.vector(my_values))
## [1] 1 2 3 4 5
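The same pattern should carry over to the other mapped summary functions. The sketch below assumes the arrow package is attached, reuses the values from the example above, and introduces the variable names lowest and highest purely for illustration. It calls min() and max() on the Array and then uses as.vector() to turn each resulting Scalar back into a plain R value.

```r
library(arrow)

# Same values as in the example above, including one missing value
my_values <- Array$create(c(1:5, NA))

# min() and max() dispatch to Arrow and return Scalar objects;
# na.rm = TRUE skips the missing value
lowest <- min(my_values, na.rm = TRUE)
highest <- max(my_values, na.rm = TRUE)

# as.vector() converts each Arrow Scalar to a base R value
as.vector(lowest)
as.vector(highest)
```

Converting only the final Scalar, rather than the whole Array, keeps the computation in Arrow and moves just the result into R.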

5.2 Counting occurrences of elements in an Array

Some functions in the Arrow R package do not have base R equivalents. In other cases, the base R equivalents are not generic functions so they cannot be called directly on Arrow Array objects.

For example, the value_counts() function in the Arrow R package is loosely equivalent to the base R function table(), which is not a generic function. To count the elements in an R vector, you can use table(); to count the elements in an Arrow Array, you can use value_counts().

repeated_vals <- Array$create(c(1, 1, 2, 3, 3, 3, 3, 3))
value_counts(repeated_vals)
## StructArray
## <struct<values: double, counts: int64>>
## -- is_valid: all not null
## -- child 0 type: double
##   [
##     1,
##     2,
##     3
##   ]
## -- child 1 type: int64
##   [
##     2,
##     1,
##     5
##   ]
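To make the comparison with table() concrete, the following sketch counts the same values both ways. It assumes that as.vector() converts the StructArray returned by value_counts() into a data-frame-like object with values and counts columns; treat that as an assumption to verify against your installed version of the package. The names r_vals and counts_df are illustrative only.

```r
library(arrow)

# The same values as a plain R vector
r_vals <- c(1, 1, 2, 3, 3, 3, 3, 3)

# Base R: table() works on the R vector
table(r_vals)

# Arrow: value_counts() works on the Array and returns a StructArray
counts <- value_counts(Array$create(r_vals))

# Assumption: as.vector() on a StructArray yields a data frame
# with one column per struct field (values, counts)
counts_df <- as.vector(counts)
counts_df
```

Pulling the result into R this way is convenient for inspection, but for large Arrays it may be preferable to keep the counts in Arrow.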

5.3 Applying arithmetic functions to Arrays

You can use the various arithmetic operators on Array objects.

num_array <- Array$create(1:10)
num_array + 10
## Array
## <double>
## [
##   11,
##   12,
##   13,
##   14,
##   15,
##   16,
##   17,
##   18,
##   19,
##   20
## ]

You will get the same result if you pass in the value you’re adding as an Arrow object.

num_array + Scalar$create(10)
## Array
## <double>
## [
##   11,
##   12,
##   13,
##   14,
##   15,
##   16,
##   17,
##   18,
##   19,
##   20
## ]
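Other operators should behave in the same way as addition. The sketch below assumes the arrow package is attached and uses the illustrative names doubled, shifted, and is_large. It applies multiplication, subtraction, and a comparison to the Array, then converts two of the results with as.vector() so they can be inspected as ordinary R vectors.

```r
library(arrow)

num_array <- Array$create(1:10)

# Multiplication and subtraction are applied element by element
doubled <- num_array * 2
shifted <- num_array - 1

# Comparison operators return a boolean Array
is_large <- num_array > 5

# Convert to base R vectors to inspect the results
as.vector(doubled)
as.vector(is_large)
```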