Splits
Randall O'Reilly edited this page Aug 8, 2021 · 2 revisions
The split package provides the key functions for creating and manipulating Splits, a list of IdxView index views into a Table.
GroupBy creates splits by grouping together all rows that have the same value in a given column or set of columns. Here's an example from the dataproc code:
byMethod := split.GroupBy(PlanetsAll, []string{"method"})
split.Agg(byMethod, "orbital_period", agg.AggMedian)
GpMethodOrbit = byMethod.AggsToTable(etable.AddAggName) // etable.AddAggName or etable.ColNameOnly for naming cols
This creates the splits for the "method" column (PlanetsAll is an IdxView of the full set of data), and then Agg computes the median of the "orbital_period" column for each group. The last line writes the aggregated data out to a new table that can then be viewed or otherwise used.
See Agg
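To make the group-then-aggregate idea concrete, here is a minimal standard-library sketch of the same pattern. It is not how the split or agg packages are implemented: the `row` type, the `groupBy` and `median` helpers, and the sample values are all hypothetical, chosen only to show that each group is a list of row indexes into shared data, with the aggregate computed per group.

```go
package main

import (
	"fmt"
	"sort"
)

// row is a hypothetical stand-in for one table row (not an etable type).
type row struct {
	method string
	period float64
}

// groupBy returns, for each distinct method value, the indexes of the
// rows that have that value. The rows themselves are never copied.
func groupBy(rows []row) map[string][]int {
	groups := make(map[string][]int)
	for i, r := range rows {
		groups[r.method] = append(groups[r.method], i)
	}
	return groups
}

// median returns the median period over the rows selected by idxs,
// or 0 if idxs is empty.
func median(rows []row, idxs []int) float64 {
	vals := make([]float64, len(idxs))
	for i, idx := range idxs {
		vals[i] = rows[idx].period
	}
	sort.Float64s(vals)
	n := len(vals)
	if n == 0 {
		return 0
	}
	if n%2 == 1 {
		return vals[n/2]
	}
	return (vals[n/2-1] + vals[n/2]) / 2
}

func main() {
	// Made-up sample data, for illustration only.
	rows := []row{
		{"Transit", 3.0},
		{"Radial Velocity", 10.0},
		{"Transit", 5.0},
		{"Transit", 4.0},
		{"Radial Velocity", 20.0},
	}
	groups := groupBy(rows)

	// Sort group names so the output order is deterministic.
	names := make([]string, 0, len(groups))
	for name := range groups {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s: median orbital_period = %g\n", name, median(rows, groups[name]))
	}
}
```

Keeping only indexes per group mirrors the role an index view plays: several groupings can share one underlying table without duplicating its data.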
Permuted creates splits of rows in a random (shuffled, permuted) order, with the number of rows in each split set by the given proportions. For example, here's code that creates a random train / test split of a set of patterns:
all := etable.NewIdxView(ss.Pats)
splits, _ := split.Permuted(all, []float64{.8, .2}, []string{"Train", "Test"})
ss.TrainEnv.Table = splits.Splits[0] // IdxView of "Train", 80% of rows
ss.TestEnv.Table = splits.Splits[1] // IdxView of "Test", 20% of rows