Statistics Conference Proceedings, Presentations and Posters
Images from series: Statistics Conference Proceedings, Presentations and Posters
Compressed Distributed Gradient Descent: Communication-Efficient Consensus over Networks
<p>Network consensus optimization has received increasing attention in recent years and has found important applications in many scientific and engineering fields. To solve network consensus optimization problems, one of the most well-known approaches is the distributed gradient descent method (DGD). However, in networks with slow communication rates, DGD's performance is unsatisfactory for solving high-dimensional network consensus problems due to the communication bottleneck. This motivates us to design a communication-efficient DGD-type algorithm based on compressed information exchanges. Our contributions in this paper are three-fold: i) We develop a communication-efficient algorithm called amplified-differential compression DGD (ADC-DGD) and show that it converges under any unbiased compression operator; ii) We rigorously prove the convergence performance of ADC-DGD and show that it matches that of DGD without compression; iii) We reveal an interesting phase transition phenomenon in the convergence speed of ADC-DGD. Collectively, our findings advance the state of the art in network consensus optimization theory.</p>

https://lib.dr.iastate.edu/stat_las_conf/13
Regression-Enhanced Random Forests
<p>Random forest (RF) methodology is one of the most popular machine learning techniques for prediction problems. In this article, we discuss some cases where random forests may suffer and propose a novel generalized RF method, namely regression-enhanced random forests (RERFs), that can improve on RFs by borrowing the strength of penalized parametric regression. The algorithm for constructing RERFs and selecting their tuning parameters is described. Both a simulation study and real data examples show that RERFs have better predictive performance than RFs in important situations often encountered in practice. Moreover, RERFs may incorporate known relationships between the response and the predictors, and may give reliable predictions in extrapolation problems where predictions are required at points outside the domain of the training dataset. Strategies analogous to those described here can be used to improve other machine learning methods via combination with penalized parametric regression techniques.</p>

https://lib.dr.iastate.edu/stat_las_conf/9
Identifying precipitation regimes in China using model-based clustering of spatial functional data
<p>The identification of precipitation regimes is important for many purposes such as agricultural planning, water resource management, and return period estimation. Since precipitation and other related meteorological data typically exhibit spatial dependency and different characteristics at different time scales, clustering such data presents unique challenges. In this short paper, we develop a flexible model-based approach to identify precipitation regimes in China by clustering spatial functional data. Though the focus of this study is on precipitation data, this methodology is generally applicable to other environmental data with similar structure.</p>

https://lib.dr.iastate.edu/stat_las_conf/12
Should blocks be fixed or random?
<p>Many studies include some form of blocking in the study design. Block effects are rarely of intrinsic interest; instead they are included in a model so that the model reflects the study design. I consider the question of how these block effects should be modeled: as fixed effects or as random effects. I discuss the consequences of the choice, including the recovery of inter-block information when available, give a simple example to illustrate the connection between recovery of inter-block information and pooling two estimators of a treatment effect, and give an example where fitting a model with random block effects can lead to the wrong answer. I suggest that block effects should be modeled as fixed effects unless there are compelling reasons to do otherwise.</p>

https://lib.dr.iastate.edu/stat_las_conf/6
Requests Prediction in Cloud with a Cyclic Window Learning Algorithm
<p>Automatic resource scaling is one advantage of cloud systems. Cloud systems are able to scale the number of physical machines depending on user requests. Therefore, accurate request prediction brings a great improvement in cloud systems' performance. With accurate request prediction, the number of physical machines needed to accommodate the predicted amount of requests can be activated, and cloud systems save energy by avoiding excessive activation of physical machines. Cloud systems can also implement advanced load distribution with accurate request prediction. We propose a prediction model that predicts probability distribution parameters of requests for each time interval. Maximum Likelihood Estimation (MLE) and Local Linear Regression (LLR) are used to implement this algorithm. The proposed algorithm is evaluated with the Google cluster-trace data. Predictions are made for the number of task arrivals, CPU requests, and memory resource requests. Prediction accuracy is then measured with Mean Absolute Percentage Error (MAPE) and Normalized Mean Squared Error (NMSE).</p>

https://lib.dr.iastate.edu/stat_las_conf/11
Case-Specific Random Forests for Big Data Prediction
<p>Some training datasets may be too large for storage on a single computer. Such datasets may be partitioned and stored on separate computers connected in a parallel computing environment. To predict the response associated with a specific target case when training data are partitioned, we propose a method for finding the training cases within each partition that are most relevant for predicting the response of a target case of interest. These most relevant training cases from each partition can be combined into a single dataset, which can be a subset of the entire training dataset that is small enough for storage and analysis in memory on a single computer. To generate a prediction from this selected subset, we use Case-Specific Random Forests, a variation of random forests that replaces the uniform bootstrap sampling used to build a tree in a random forest with unequal weighted bootstrap sampling, where training cases more similar to the target case are given greater weight. We demonstrate our method with an example concrete dataset. Our results show that predictions generated from a small selected subset of a partitioned training dataset can be as accurate as predictions generated in a traditional manner from the entire training dataset.</p>

https://lib.dr.iastate.edu/stat_las_conf/8
Bayesian Inference for a Covariance Matrix
<p>Covariance matrix estimation arises in multivariate problems including multivariate normal sampling models and regression models where random effects are jointly modeled, e.g., random-intercept, random-slope models. A Bayesian analysis of these problems requires a prior on the covariance matrix. Here we compare an inverse Wishart, scaled inverse Wishart, hierarchical inverse Wishart, and a separation strategy as possible priors for the covariance matrix. We evaluate these priors through a simulation study and application to a real data set. Generally all priors work well, with the exception of the inverse Wishart when the true variance is small relative to the prior mean. In this case, the posterior for the variance is biased toward larger values and the correlation is biased toward zero. This bias persists even for large sample sizes, and therefore caution should be used when using the inverse Wishart prior.</p>

https://lib.dr.iastate.edu/stat_las_conf/10
Evaluating the Impact of Spatial Ability in Virtual and Real World Environments
<p>Survey agencies in the United States continue to move many map-based surveys from paper to handheld computers. With large, highly diverse workforces, it is necessary to test software with a diverse population. The present work examines the performance of participants grouped by their level of spatial visualization. The participants were tested either in the field or in a fully immersive virtual environment. The methodology of the study is explained. The performance of the participants in the two environments is modeled with least squares regression. Results of the study are presented and discussed.</p>

https://lib.dr.iastate.edu/stat_las_conf/3
A Structural Equation Model Correlating Success in Engineering with Academic Variables for Community College Transfer Students
<p>Student Enrollment and Engagement through Connections is a collaboration between a large Midwestern university and in-state community colleges (CCs) to increase the success of transfers into engineering. This study explores predictors of completing a BS in engineering for CC transfers through a structural equation model. The model was estimated using academic variables from both institutions. The dataset includes 472 in-state CC transfer students admitted to the College of Engineering between 2002 and 2005. The model fits the data well (χ² = 74.254, df = 30, p < 0.0001; RMSE = 0.056, Comparative Fit Index = 0.984, chi-square/df ratio = 2.475). First spring University GPA and credit hours, CC transfer credits toward core engineering courses, first fall credit hours after transfer, first fall University GPA, and University core course GPA are significantly related to graduation in engineering. This research may help increase the success of CC transfers to engineering, emphasizing the importance of core engineering courses.</p>

https://lib.dr.iastate.edu/stat_las_conf/4
Exploring a Map Survey Task's Sensitivity to Cognitive Ability
<p>The present work discusses an exploratory study aimed at understanding how users’ cognitive abilities influence performance and method during a series of address verification tasks. College students were given a paper map and asked to verify seven residential addresses scattered throughout a neighborhood. This approach, as opposed to using a mobile device as the verification medium, allotted participants more freedom with respect to address verification style and map interaction. The study methodology and results are discussed. The key contribution of the work described in the paper has been the identification of map usage behaviors that are sensitive to visualization and perspective taking.</p>

https://lib.dr.iastate.edu/stat_las_conf/1
Computational Integration of Structural and Functional Genomics Data across Species to Develop Information on the Porcine Inflammatory Gene Regulatory Pathway
<p>We are investigating the porcine gut immune response to infection through gene expression profiling. Porcine Affymetrix GeneChip data were obtained from RNA prepared from mesenteric lymph node of swine infected with either Salmonella enterica serovar Typhimurium (ST) or S. Choleraesuis (SC) for 0, 8, 24, 48 or 504 hours post-inoculation (hpi). In total, 2,365 genes with statistical evidence for differential expression (DE; p < 0.01, q < 0.26, fold-change > 2) between at least two time-points were identified. Comparative Gene Ontology analyses revealed that a high proportion of annotated DE genes in both infections are involved in immune and defence responses. Hierarchical clustering of expression patterns and annotations showed that 22 of the 83 genes upregulated from 8-24 hpi in the SC infection are known NF-κB targets. The promoter sequences of human genes orthologous to the DE genes were collected and TFMExplorer was used to identify a set of 72 gene promoters with significant over-representation of NF-κB DNA-binding motifs. All 22 known NF-κB target genes are in this list; we hypothesize that the remaining 51 genes are unrecognized NF-κB targets. Integration of these results and verification of putative target genes will increase our understanding of the porcine response pathways responding to bacterial infection.</p>

https://lib.dr.iastate.edu/stat_las_conf/7
Quadratic model to estimate the doses causing the highest cholesterol concentration and the same cholesterol concentration as control group
<p>High plasma cholesterol (particularly high LDL-cholesterol) is a major risk factor for coronary heart disease (CHD), which causes high CHD morbidity and mortality. Besides clinical drugs, more and more interest is focused on finding natural components in the diet that may have hypocholesterolemic effects. Plant sterols are natural components in human diets and have been found to have cholesterol-lowering effects in humans. Sheanut oil has a relatively high amount of plant sterols. Therefore, two experiments were designed to investigate the hypocholesterolemic effect of sheanut oil in hamsters. The response was not monotonic. Low doses increased plasma cholesterol, but high doses decreased plasma cholesterol. Because there was partial dose repetition between the two experiments, the two were combined to estimate the dose leading to the highest cholesterol concentration and the dose leading to the same cholesterol concentration as the control group. A quadratic model was selected to fit the combined data after appropriate transformation of the explanatory and response variables. A nonparametric smoothing method was used to justify the quadratic model. Point estimates and confidence intervals from the delta, Fieller's, and bootstrap methods were compared.</p>

https://lib.dr.iastate.edu/stat_las_conf/16
Web-based Survey Tools
<p>The World Wide Web provides an effective means of supporting survey projects, particularly when data collectors are geographically dispersed. Wireless and wireline communications can be used to integrate the survey team by providing current and consistent supporting materials to all members of the project team. Interactive tutorials, survey instructions and updates, technical support, computer-assisted survey instrument software and updates, text-based and graphic survey management reports, data views for monitoring and editing, and summary reports are some of the tools that can be delivered via Web browsers to data collection staff, survey managers, clients, and the public. We will describe Web-based tools that have been developed to support a national survey of natural resources, and discuss possible extensions of this work.</p>

https://lib.dr.iastate.edu/stat_las_conf/2