{ "contents" : "#' Get ensemble forecast from historical data.\n#' \n#' getHisEnsem use historical data as the forecasting input time series.\n#' \n#' @param TS A time series dataframe, with first column Date, and second column value.\n#' @param example A vector containing two strings showing the start and end date, which represent the \n#' forecasting period. Check details for more information.\n#'\n#' the program will extract every possible period in TS you provided to generate the ensemble. Check details for \n#' more information.\n#' @param interval A number representing the interval of each ensemble member. NOTE: \"interval\" takes\n#' 365 as a year, and 30 as a month, regardless of leap year and months with 31 days. So if you want the interval \n#' to be 2 years, set \\code{interval = 730}, which equals 2 * 365 ; if two months, set \\code{interval = 60}; \n#' 2 days, \\code{interval = 2}, for other numbers that cannot be divided by 365 or 30 without remainder, it will treat the \n#' number as days.By defualt interval is set to be 365, a year.\n#' @param buffer A number showing how many days are used as buffer period for models. Check details for more\n#' information.\n#' \n#' @param plot A string showing whether the plot will be shown, e.g., 'norm' means normal plot (without any process), \n#' 'cum' means cummulative plot, default is 'norm'. For other words there will be no plot.\n#' @param output A string showing which type of output you want. Default is \"data\", if \"ggplot\", the \n#' data that can be directly plotted by ggplot2 will be returned, which is easier for you to make series\n#' plots afterwards. NOTE: If \\code{output = 'ggplot'}, the missing value in the data will\n#' be replaced by \\code{mv}, if assigned, default mv is 0.\n#' \n#' @param name If \\code{output = 'ggplot'}, name has to be assigned to your output, in order to differentiate\n#' different outputs in the later multiplot using \\code{getEnsem_comb}.\n#' \n#' @param mv A number showing representing the missing value. When calculating the cumulative value, \n#' missing value will be replaced by mv, default is 0.\n#' @param ... \\code{title, x, y} showing the title and x and y axis of the plot. e.g. \\code{title = 'aaa'}\n#' \n#' @details \n#' \n#' \\code{example} E.g., if you have a time series from 2000 to 2010. Assuming you are in 2003,\n#' you want to forecast the period from 2003-2-1 to 2003-4-1. Then for each year in your input\n#' time series, every year from 1st Feb to 1st Apr will be extracted to generate the ensemble\n#' forecasts. In this case your input example should be \\code{example = c('2003-2-1', '2003-4-1')}\n#' \n#' \\code{interval} doesn't care about leap year and the months with 31 days, it will take 365 as a year, and 30 as a month.\n#' e.g., if the interval is from 1999-2-1 to 1999-3-1, you should just set interval to 30, although the real interval is 28\n#' days.\n#' \n#' \\code{example} and \\code{interval} controls how the ensemble will be generated. e.g. if the time series is from \n#' 1990-1-1 to 2001-1-1.\n#' \n#' if \\code{example = c('1992-3-1', '1994-1-1')} and \\code{interval = 1095}, note, 1095 = 365 * 3, so the program treat\n#' this as 3 years.\n#' \n#' Then you are supposed to get the ensemble consisting of following part:\n#' \n#' 1. 1992-3-1 to 1994-1-1 first one is the example, and it's NOT start from 1990-3-1.\n#' 2. 1995-3-1 to 1997-1-1 second one starts from 1993, because \"interval\" is 3 years.\n#' 3. 1998-3-1 to 2000-1-1\n#' \n#' because the last one \"2000-3-1 to 2002-1-1\", 2002 exceeds the original TS range, so it will not be included.\n#' \n#' Sometimes, there are leap years and months with 31 days included in some ensemble part, in which case the length of the data will\n#' be different, e.g., 1999-1-1 to 1999-3-1 is 1 day less than 2000-1-1 to 2000-3-1. In this situation,\n#' the data will use example as a standard. If the example is 1999-1-1 to 1999-3-1, then the latter one\n#' will be changed to 2001-1-1 to 2000-2-29, which keeps the start Date and change the end Date.\n#' \n#' If the end date is so important that cannot be changed, try to solve this problem by resetting\n#' the example period, to make the event included in the example.\n#' \n#' Good set of example and interval can generate good ensemble.\n#' \n#' \\code{buffer}\n#' Sometimes the model needs to run for a few days to warm up, before the forecast. E.g., if a forecast starts at\n#' '1990-1-20', for some model like MIKE NAM model, the run needs to be started about 14 days. So the input timeseries\n#' should start from '1990-1-6'.\n#' \n#' Buffer is mainly used for the model hotstart. Sometimes the hot start file cannot contain all the parameters needed,\n#' only some important parameters. In this case, the model needs to run for some time, to make other parameters ready\n#' for the simulation.\n#' \n#' \n#' \\code{name}\n#' Assuming you have two ggplot outputs, you want to plot them together. In this situation, you\n#' need a name column to differentiate one ggplot output from the other. You can assigne this name\n#' by the argument directly, name has to be assigned if \\code{output = 'ggplot'} is selected,\n#' @return A ensemble time series using historical data as forecast.\n#' \n#' @examples\n#' \n#' data(testdl)\n#' \n#' a <- testdl[[1]]\n#' \n#' # Choose example from \"1994-2-4\" to \"1996-1-4\"\n#' b <- getHisEnsem(a, example = c('1994-2-4', '1996-1-4'))\n#' \n#' # Default interval is one year, can be set to other values, check help for information.\n#' \n#' # Take 7 months as interval\n#' b <- getHisEnsem(a, example = c('1994-2-4', '1996-1-4'), interval = 210, plot = 'cum') \n#' # Take 30 days as buffer\n#' b <- getHisEnsem(a, example = c('1994-2-4', '1996-1-4'), interval = 210, buffer = 30)\n#' @importFrom reshape2 melt \n#' @importFrom grDevices rainbow\n#' @import ggplot2\n#' @references \n#' Hadley Wickham (2007). Reshaping Data with the reshape Package. Journal of Statistical Software,\n#' 21(12), 1-20. URL http://www.jstatsoft.org/v21/i12/.\n#' \n#' H. Wickham. ggplot2: elegant graphics for data analysis. Springer New York, 2009.\n#' @export\n\ngetHisEnsem <- function (TS, example, interval = 365, buffer = 0, plot = 'norm', output = 'data', \n name = NULL, mv = 0, ...) {\n if (!grepl('-|/', TS[1, 1])) {\n stop('First column is not date or Wrong Date formate, check the format in ?as.Date{base} \n and use as.Date to convert.')\n } else if (!grepl('-|/', example[1]) | !grepl('-|/', example[1])) {\n stop('Wrong date format in the example, check the format in ?as.Date{base} \n and use as.Date to convert.')\n } else {\n \n \n \n TS[, 1] <- as.Date(TS[, 1])\n example <- as.Date(example ,tz = '')\n exL <- example[2] - example[1]\n # Test if example is in the range of the TS\n a <- which(TS[, 1] == example[1] | TS[, 1] == example[2])\n if (length(a) < 2) stop('Example is out of the time series, reset example.')\n \n \n \n if (interval %% 365 == 0) {\n d <- interval / 365\n \n # Get sequence of start and end date.\n \n startDate <- rev(seq(from = example[1], to = min(TS[, 1]), by = paste(-d, 'years')))\n endDate <- seq(from = example[2], to = max(TS[, 1]), by = paste(d, 'years'))\n\n n <- length(startDate) + length(endDate) - 1 # example is counted twice, should be subtracted. \n \n # Generate full start date series.\n startDate <- seq(min(startDate), length = n, by = paste(d, 'years'))\n endDate <- startDate + exL\n \n } else if (interval %% 30) {\n d <- interval / 30\n \n # Get sequence of start and end date.\n \n startDate <- rev(seq(from = example[1], to = min(TS[, 1]), by = paste(-d, 'months')))\n endDate <- seq(from = example[2], to = max(TS[, 1]), by = paste(d, 'months'))\n \n n <- length(startDate) + length(endDate) - 1\n \n startDate <- seq(min(startDate), length = n, by = paste(d, 'months'))\n endDate <- startDate + exL\n \n } else {\n d <- interval\n \n # Get sequence of start and end date.\n \n startDate <- rev(seq(from = example[1], to = min(TS[, 1]), by = paste(-d, 'days')))\n endDate <- seq(from = example[2], to = max(TS[, 1]), by = paste(d, 'days'))\n \n n <- length(startDate) + length(endDate) - 1\n \n startDate <- seq(min(startDate), length = n, by = paste(d, 'days'))\n endDate <- startDate + exL\n }\n \n data <- mapply(FUN = function(x, y) extractPeriod_dataframe(dataframe = TS, startDate = x, endDate = y),\n x = startDate, y = endDate)\n \n data <- lapply(1:n, function(x) data.frame(data[, x]))\n \n if (buffer > 0) {\n bufferStart <- example[1] - buffer\n bufferEnd <- example[1] - 1\n bufferTS <- extractPeriod_dataframe(TS, bufferStart, bufferEnd)\n \n data <- lapply(data, function(x) rbind(bufferTS, x))\n \n } else if (buffer < 0) {\n stop ('Buffer should be positive, or reset example.')\n }\n \n \n data_output <- list2Dataframe(data)\n colnames(data_output) <- c('Date', as.character(startDate))\n \n # Rearrange dataframe to make example the first column.\n ind <- match(c('Date', as.character(example[1])), colnames(data_output))\n # when use cbind, to ensure the output is also a dataframe, one inside cbind should be dataframe\n # Even output is alread a dataframe, but when ind is a single number, then output[ind] will\n # not be a dataframe, but an array.\n data_output <- cbind(data.frame(data_output[ind]), data_output[-ind])\n ex_date <- seq(from = example[1] - buffer, to = example[2], by = 1)\n data_output$Date <- ex_date\n colnames(data_output)[2] <- 'Observation'\n \n meanV <- apply(data_output[, 2:ncol(data_output)], MARGIN = 1, FUN = mean, na.rm = TRUE)\n \n data_output <- cbind(data.frame(Date = data_output[, 1]), Mean = meanV, \n data_output[, 2:ncol(data_output)])\n \n data_ggplot <- melt(data_output, id.var = 'Date')\n NAIndex <- is.na(data_ggplot$value)\n data_ggplot$nav <- rep(0, nrow(data_ggplot))\n data_ggplot$nav[NAIndex] <- 1\n \n if (plot == 'norm') {\n data_ggplot$value[NAIndex] <- mv\n \n } else if (plot == 'cum') {\n data_output[is.na(data_output)] <- mv\n cum <- cbind(data.frame(Date = data_output$Date), cumsum(data_output[2:ncol(data_output)]))\n \n data_ggplot <- melt(cum, id.var = 'Date')\n } else {\n stop('plot can only be \"norm\" or \"cum\", do not assign other words')\n }\n \n #generate different colors \n colors = c('brown1', 'dodgerblue3', rainbow(n = length(unique(data_ggplot$variable)) - 2,\n start = 0.1))\n \n theme_set(theme_bw())\n mainLayer <- with(data_ggplot, {\n ggplot(data = data_ggplot) +\n aes(x = Date, y = value, color = variable, group = variable) +\n geom_line(size = 0.5) +\n geom_line(data = data_ggplot[data_ggplot$variable == 'Observation', ], size = 1.6) +\n geom_line(data = data_ggplot[data_ggplot$variable == 'Mean', ], size = 1.6) +\n geom_point(data = data_ggplot[NAIndex, ], size = 3, shape = 4, color = 'black') +\n scale_color_manual(values = colors) +\n labs(empty = NULL, ...) +#in order to pass \"...\", arguments shouldn't be empty.\n theme(axis.text.x = element_text(size = rel(1.8)),\n axis.text.y = element_text(size = rel(1.8)),\n axis.title.x = element_text(size = rel(1.8)),\n axis.title.y = element_text(size = rel(1.8)))\n })\n print(mainLayer)\n \n if (output == 'ggplot') {\n if (is.null(name)) stop('\"name\" argument not found, \n If you choose \"ggplot\" as output, please assign a name.')\n data_ggplot$name <- rep(name, nrow(data_ggplot)) \n data_ggplot$nav <- rep(0, nrow(data_ggplot))\n data_ggplot$nav[NAIndex] <- 1\n\n return(data_ggplot)\n } else {\n return(data_output)\n }\n }\n}\n\n\n\n\n\n\n#' Extract time series from forecasting data.\n#' \n#' getFrcEnsem extract timeseries from forecasting data, if forecasting data has a member session\n#' an ensemble time sereis will be returned, if forecasting data doesn't have a member session, a singe time\n#' series will be returned.\n#' \n#' @param dataset A list containing different information, should be the result of reading netcdf file using\n#' \\code{library(ecomsUDG.Raccess)}.\n#' @param cell A vector containing the locaton of the cell, e.g. c(2, 3), default is \"mean\", representing\n#' the spatially averaged value. Check details for more information.\n#' @param plot A string showing whether the plot will be shown, e.g., 'norm' means normal plot (without any process), \n#' 'cum' means cummulative plot, default is 'norm'. For other words there will be no plot.\n#' @param output A string showing which type of output you want. Default is \"data\", if \"ggplot\", the \n#' data that can be directly plotted by ggplot2 will be returned, which is easier for you to make series\n#' plots afterwards. NOTE: If \\code{output = 'ggplot'}, the missing value in the data will\n#' be replaced by \\code{mv}, if assigned, default mv is 0.\n#' @param name If \\code{output = 'ggplot'}, name has to be assigned to your output, in order to differentiate\n#' different outputs in the later multiplot using \\code{getEnsem_comb}.\n#' @param mv A number showing representing the missing value. When calculating the cumulative value, \n#' missing value will be replaced by mv, default is 0.\n#' @param coord A coordinate of longitude and latitude. e.g. corrd = c(lon, lat). If coord is assigned,\n#' cell argument will no longer be used.\n#' @param ... \\code{title, x, y} showing the title and x and y axis of the plot. e.g. \\code{title = 'aaa'}\n#' \n#' @details \n#' \n#' \\code{cell} representing the location of the cell, NOTE: this location means the index of the cell,\n#' IT IS NOT THE LONGITUDE AND LATITUDE. e.g., \\code{cell = c(2, 3)}, the program will take the 2nd longitude\n#' and 3rd latitude, by the increasing order. Longitude comes first.\n#' \n#' \\code{name}\n#' Assuming you have two ggplot outputs, you want to plot them together. In this situation, you\n#' need a name column to differentiate one ggplot output from the other. You can assigne this name\n#' by the argument directly, If name is not assigned and \\code{output = 'ggplot'} is selected, then\n#' the system time will be selected as name column.\n#' \n#' @examples \n#' \n#' filePath <- system.file(\"extdata\", \"tnc.nc\", package = \"hyfo\")\n\n#' # Then if you don't know the variable name, you can use \\code{getNcdfVar} to get variable name\n#' varname <- getNcdfVar(filePath)\n#' nc <- loadNcdf(filePath, varname)\n#' a <- getFrcEnsem(nc)\n#' \n#' # If there is no member session in the dataset, a single time sereis will be extracted.\n#' a1 <- getFrcEnsem(tgridData)\n#' \n#' \n#' # The default output is spatially averaged, if there are more than one cells in the dataset, \n#' # the mean value of the cells will be calculated. While if you are interested in special cell, \n#' # you can assign the cell value. You can also directly use longitude and latitude to extract \n#' # time series.\n#' \n#' getSpatialMap(nc, 'mean')\n#' a <- getFrcEnsem(nc, cell = c(6,2))\n#' \n#' # From the map, cell = c(6, 2) means lon = -1.4, lat = 43.2, so you can use corrd to locate\n#' # your research area and extract time series.\n#' b <- getFrcEnsem(nc, coord = c(-1.4, 43.2))\n#' \n#' \n#' @return A ensemble time series extracted from forecating data.\n#' \n#' @import ggplot2\n#' @importFrom reshape2 melt\n#' @references \n#' H. Wickham. ggplot2: elegant graphics for data analysis. Springer New York, 2009.\n#' \n#' Hadley Wickham (2007). Reshaping Data with the reshape Package. Journal of Statistical Software,\n#' 21(12), 1-20. URL http://www.jstatsoft.org/v21/i12/.\n#' \n#' @export\ngetFrcEnsem <- function(dataset, cell = 'mean', plot = 'norm', output = 'data', name = NULL,\n mv = 0, coord = NULL, ...) {\n # cell should be a vector showing the location, or mean representing the loacation averaged.\n \n checkWord <- c('Data', 'xyCoords', 'Dates')\n if (any(is.na(match(checkWord, attributes(dataset)$names)))) {\n stop('Input dataset is incorrect, it should contain \"Data\", \"xyCoords\", and \"Dates\", \n check help for details.')\n }\n \n Date <- as.Date(dataset$Dates$start)\n data <- dataset$Data\n \n # Dimension needs to be arranged. Make sure first and second dimension is lat and lon.\n att <- attributes(data)$dimensions\n dimIndex <- seq(1, length(att))\n dimIndex1 <- match(c('lon', 'lat', 'time'), att)# match can apply to simple cases\n dimIndex2 <- dimIndex[-dimIndex1]# choose nomatch\n \n data <- aperm(data, c(dimIndex1, dimIndex2))\n attributes(data)$dimensions <- att[c(dimIndex1, dimIndex2)]\n \n if (!is.null(coord)) {\n if (length(coord) == 2) {\n # corrdinates\n lonC <- coord[1]\n latC <- coord[2]\n \n lon <- dataset$xyCoords$x\n lat <- dataset$xyCoords$y\n \n # Index\n lonI <- which(abs(lon - lonC) == min(abs(lon - lonC), na.rm = TRUE)) \n latI <- which(abs(lat - latC) == min(abs(lat - latC), na.rm = TRUE))\n \n cell <- c(max(lonI), max(latI))\n \n } else stop('Wrong coord input, should be c(lon, lat). Lon and lat should be within the dataset range.')\n } \n \n \n \n \n if (!any(attributes(data)$dimensions == 'member')){\n message('There is no member part in the dataset, there will be only one column of value\n returned.')\n \n if (length(cell) == 2) {\n data_ensem <- data[cell[1], cell[2], ]\n \n } else if (cell == 'mean') {\n data_ensem <- apply(data, MARGIN = 3, FUN = mean, na.rm = TRUE)\n # colnames <- 1:ncol(data_ensem)\n \n } else {\n stop('Wrong cell input, check help for information.')\n }\n \n } else {\n \n if (length(cell) == 2) {\n data_ensem <- data[cell[1], cell[2], , ]\n meanV <- apply(data_ensem, MARGIN = 1, FUN = mean, na.rm = TRUE)\n data_ensem <- data.frame('Mean' = meanV, data_ensem) \n \n } else if (cell == 'mean') {\n data_ensem <- apply(data, MARGIN = c(3, 4), FUN = mean, na.rm = TRUE)\n # colnames <- 1:ncol(data_ensem)\n meanV <- apply(data_ensem, MARGIN = 1, FUN = mean, na.rm = TRUE)\n data_ensem <- data.frame('Mean' = meanV, data_ensem)\n \n } else {\n stop('Wrong cell input, check help for information.')\n }\n }\n\n \n data_output <- data.frame(Date, data_ensem)\n data_ggplot <- melt(data_output, id.var = 'Date')\n NAIndex <- is.na(data_ggplot$value)\n \n \n if (plot == 'norm') {\n data_ggplot$value[NAIndex] <- mv\n } else if (plot == 'cum') {\n data_output[is.na(data_output)] <- mv\n cum <- cbind(data.frame(Date = data_output$Date), cumsum(data_output[2:ncol(data_output)]))\n \n data_ggplot <- melt(cum, id.var = 'Date')\n \n }\n \n colors = c('brown1', rainbow(n = length(unique(data_ggplot$variable)) - 1,\n start = 0.1))\n \n theme_set(theme_bw())\n mainLayer <- with(data_ggplot, {\n ggplot(data = data_ggplot) +\n aes(x = Date, y = value, color = variable) +\n geom_line(size = 0.5) +\n geom_line(data = data_ggplot[data_ggplot$variable == 'Mean', ], size = 1.6, color = 'red') +\n geom_point(data = data_ggplot[NAIndex, ], size = 2, shape = 4, color = 'black') +\n scale_color_manual(values = colors) +\n theme(axis.text.x = element_text(size = rel(1.8)),\n axis.text.y = element_text(size = rel(1.8)),\n axis.title.x = element_text(size = rel(1.8)),\n axis.title.y = element_text(size = rel(1.8))) +\n labs(empty = NULL, ...)#in order to pass \"...\", arguments shouldn't be empty.\n \n })\n print(mainLayer)\n \n if (output == 'ggplot') {\n if (is.null(name)) stop('\"name\" argument not found, \n If you choose \"ggplot\" as output, please assign a name.')\n \n data_ggplot$name <- rep(name, nrow(data_ggplot)) \n data_ggplot$nav <- rep(0, nrow(data_ggplot))\n data_ggplot$nav[NAIndex] <- 1\n return(data_ggplot)\n } else {\n return(data_output)\n }\n}\n\n\n\n#' Combine ensembles together\n#' @param ... different ensembles generated by \\code{getHisEnsem(, output = 'ggplot')} \n#' or \\code{getFrcEnsem(, output = 'ggplot')}, see details.\n#' @param nrow A number showing the number of rows.\n#' @param list If input is a list containing different ggplot data, use \\code{list = inputlist}.\n#' @param legend A boolean representing whether you want the legend. Sometimes when you combine\n#' plots, there will be a lot of legends, if you don't like it, you can turn it off by setting\n#' \\code{legend = FALSE}, default is TRUE.\n#' @param x A string of x axis name.\n#' @param y A string of y axis name.\n#' @param title A string of the title.\n#' @param output A boolean, if chosen TRUE, the output will be given.\n#' @return A combined ensemble plot.\n#' @export\n#' @import ggplot2\n#' @references \n#' H. Wickham. ggplot2: elegant graphics for data analysis. Springer New York, 2009.\n\ngetEnsem_comb <- function(..., list = NULL, nrow = 1, legend = TRUE, x = '', y = '', title = '', \n output = FALSE) {\n \n if (!is.null(list)) {\n data_ggplot <- do.call('rbind', list)\n } else {\n plots <- list(...)\n checkBind(plots, 'rbind')\n data_ggplot <- do.call('rbind', plots)\n } \n #data_ggplot$name <- factor(data_ggplot$name, levels = data_ggplot$name, ordered = TRUE)\n \n if (!class(data_ggplot) == 'data.frame') {\n warning('Your input is probably a list, but you forget to add \"list = \" before it.\n Try again, or check help for more information.')\n } else if (is.null(data_ggplot$name)) {\n stop('No \"Name\" column in the input data, check the arguments in getFreEnsem() or getHisEnsem(), if \n output = \"ggplot\" is assigned, more info please check ?getFreEnsem() or ?getHisEnsem().')\n }\n \n colors = c('brown1', 'dodgerblue3', rainbow(n = length(unique(data_ggplot$variable)) - 2,\n start = 0.1))\n \n theme_set(theme_bw())\n mainLayer <- with(data_ggplot, {\n ggplot(data = data_ggplot) +\n aes(x = Date, y = value, color = variable) +\n geom_line(size = 0.5) +\n geom_line(data = data_ggplot[data_ggplot$variable == 'Mean', ], size = 1.6) +\n geom_line(data = data_ggplot[data_ggplot$variable == 'Observation', ], size = 1.6) +\n geom_point(data = data_ggplot[data_ggplot$nav == 1, ], size = 2, shape = 4, color = 'black') +\n scale_color_manual(values = colors) +\n theme(axis.text.x = element_text(size = rel(1.8)),\n axis.text.y = element_text(size = rel(1.8)),\n axis.title.x = element_text(size = rel(1.8)),\n axis.title.y = element_text(size = rel(1.8))) +\n facet_wrap( ~ name, nrow = nrow) +\n labs(x = x, y = y, title = title)\n \n })\n if (legend == FALSE) {\n mainLayer <- mainLayer + \n theme(legend.position = 'none')\n# following ones are to add label, may be added in future.\n# geom_text(data = data_ggplot[data_ggplot$Date == '2003-12-10', ], aes(label = variable), hjust = 0.7, vjust = 1)\n# geom_text(data = data_ggplot[data_ggplot$variable == 'Mean', ], aes(label = variable), hjust = 0.7, vjust = 1)\n }\n \n \n print(mainLayer)\n \n if (output == TRUE) return(data_ggplot)\n \n}", "created" : 1441271740923.000, "dirty" : false, "encoding" : "ASCII", "folds" : "", "hash" : "1081962038", "id" : "51993C6D", "lastKnownWriteTime" : 1441615395, "path" : "~/hyfo/R/getEnsemble.R", "project_path" : "R/getEnsemble.R", "properties" : { "tempName" : "Untitled1" }, "relative_order" : 5, "source_on_save" : false, "type" : "r_source" }