# purescript-dataframe

[Releases](https://github.com/gabysbrain/purescript-dataframe/releases)

A data structure designed to be used with queries, together with a type for
those queries and semantics for combining them.

# Example

```purescript
main = do
  let df = init [1, 2, 3, 4, 5, 6, 7]
      q = filter (\x -> x > 3) `chain`
          mutate show `chain`
          trim 3
  putStrLn $ runQuery q df
```

# Getting started

## Installation

```
bower install purescript-dataframe
```

## Queries

The `Query` type provides a type-safe way to chain operations on dataframes
while keeping the original dataset intact throughout the query. In other data
processing languages, losing track of the original data is a common source of
error, especially when mutating rows.

The set of dataframe operations is based on what's offered by the
[dplyr](https://github.com/tidyverse/dplyr) R package.

* `filter :: forall r. (r -> Boolean) -> Query (DataFrame r) (DataFrame r)`
  filter the rows of the dataframe with a predicate
* `group :: forall r g. Ord g => (r -> g) -> Query (DataFrame r) (DataFrame {group :: g, data :: DataFrame r})`
  group the rows of the dataframe by a grouping function
* `count :: forall r g. Ord g => (r -> g) -> Query (DataFrame r) (DataFrame {group :: g, count :: Int})`
  group the rows of the dataframe and count the size of each group
* `summarize :: forall r x. (r -> x) -> Query (DataFrame r) (Array x)`
  convert each row of the dataframe to some type and return an array
* `mutate :: forall r s. (r -> s) -> Query (DataFrame r) (DataFrame s)`
  change each row of the dataframe to some other type
* `sort :: forall r. (r -> r -> Ordering) -> Query (DataFrame r) (DataFrame r)`
  sort the rows of the dataframe using the given ordering function
* `trim :: forall r. Int -> Query (DataFrame r) (DataFrame r)`
  keep only the first n rows of the dataframe
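
To make the row-level behaviour of these operations concrete, here is a minimal sketch that models a dataframe as a wrapper around an array of rows. The module name, the `DataFrame` newtype defined here, and the `...Rows` function names are assumptions made for illustration, not the library's actual definitions, and the functions act on the dataframe directly instead of returning a `Query`.

```purescript
module Sketch.Operations where

import Prelude

import Data.Array as Array

-- Hypothetical stand-in for the library's DataFrame:
-- a thin wrapper around an array of rows.
newtype DataFrame r = DataFrame (Array r)

-- Keep only the rows that satisfy the predicate (models `filter`).
filterRows :: forall r. (r -> Boolean) -> DataFrame r -> DataFrame r
filterRows p (DataFrame rows) = DataFrame (Array.filter p rows)

-- Convert every row to another type (models `mutate`).
mutateRows :: forall r s. (r -> s) -> DataFrame r -> DataFrame s
mutateRows f (DataFrame rows) = DataFrame (map f rows)

-- Order the rows with a comparison function (models `sort`).
sortRows :: forall r. (r -> r -> Ordering) -> DataFrame r -> DataFrame r
sortRows cmp (DataFrame rows) = DataFrame (Array.sortBy cmp rows)

-- Keep the first n rows (models `trim`).
trimRows :: forall r. Int -> DataFrame r -> DataFrame r
trimRows n (DataFrame rows) = DataFrame (Array.take n rows)

-- Convert every row and return a plain array (models `summarize`).
summarizeRows :: forall r x. (r -> x) -> DataFrame r -> Array x
summarizeRows f (DataFrame rows) = map f rows

-- Filter, convert to strings, keep three rows, then unwrap.
-- Evaluates to ["4", "5", "6"].
example :: Array String
example =
  summarizeRows identity
    (trimRows 3
      (mutateRows show
        (filterRows (_ > 3) (DataFrame [1, 2, 3, 4, 5, 6, 7]))))
```

Each function returns a new value and leaves its input untouched, which is the property the `Query` type is meant to preserve across a whole pipeline.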

The `chain :: forall r s t. Query r s -> Query s t -> Query r t` function
lets us chain queries while keeping the original context.
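
The signature of `chain` is that of left-to-right composition, which the following sketch illustrates. It assumes, purely for illustration, that a query is a plain function wrapped in a newtype; the library's real `Query` may be defined differently, and this model does not show how the original context is tracked. The helper queries `keepOver`, `render`, and `firstN` are hypothetical names and work on plain arrays.

```purescript
module Sketch.Query where

import Prelude

import Data.Array as Array

-- Assumed model: a query is a function from input to output.
newtype Query a b = Query (a -> b)

-- Apply a query to its input.
runQuery :: forall a b. Query a b -> a -> b
runQuery (Query f) = f

-- Run the first query, then feed its result to the second.
-- The intermediate type `s` must match, so a mismatched
-- pipeline is rejected at compile time.
chain :: forall r s t. Query r s -> Query s t -> Query r t
chain (Query f) (Query g) = Query (f >>> g)

-- Hypothetical queries over plain arrays.
keepOver :: Int -> Query (Array Int) (Array Int)
keepOver n = Query (Array.filter (_ > n))

render :: Query (Array Int) (Array String)
render = Query (map show)

firstN :: Int -> Query (Array String) (Array String)
firstN n = Query (Array.take n)

-- Backtick operators associate to the left, so this reads
-- ((keepOver 3 `chain` render) `chain` firstN 3).
pipeline :: Query (Array Int) (Array String)
pipeline = keepOver 3 `chain` render `chain` firstN 3

-- Evaluates to ["4", "5", "6"].
result :: Array String
result = runQuery pipeline [1, 2, 3, 4, 5, 6, 7]
```

Swapping `render` and `firstN 3` in `pipeline` would fail to type-check, because `firstN` expects an `Array String` while `keepOver` produces an `Array Int`.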

# API Docs

API documentation is [published on Pursuit](http://pursuit.purescript.org/packages/purescript-dataframe).

# Todos

* DataFrames should be able to operate either as a set of columns or as a set of rows.
* Queries should cache their results.
