Optimization Studies in SE (including Search-Based Software Engineering)

This document is a proposed ACM SIGSOFT standard for research studies that focus on the formulation of software engineering problems as search problems and apply optimization techniques to solve such problems1.

Application

This standard applies to empirical studies that meet the following criteria:

  • Formulates a software engineering task2 as an optimization problem, with one or more specified fitness functions3 used to judge success in this task.
  • Applies one or more approaches that generate solutions to the problem in an attempt to maximize or minimize the specified fitness functions.

Specific Attributes

We stress that the use of optimization in SE is still a rapidly evolving field. Hence, the following criteria are approximate and there may exist many exceptions to them.

Essential

  • Describe the search space (e.g., constraints, independent variable choices).
  • Explain why the problem cannot be optimized manually or by brute force within a reasonable timeframe4.
  • Use realistic and limited simplifications and constraints for the optimization problem. Simplifications and constraints must not reduce the search to one where all solutions could be enumerated through brute force.
  • EITHER include a description of prior state of the art in this area, OR carefully motivate and define the problem tackled and the solution proposed.
  • Justify the choice of algorithm5 underlying an approach6.
  • Compare approaches to a justified and appropriate baseline7.
  • Explicitly define the solution formulation, including a description of what a solution represents8, how it is represented9, and how it is manipulated (see the sketch after this list).
  • Explicitly define all fitness functions, including the type of goals that are optimized and the equations for calculating fitness values.
  • Explicitly define evaluated approaches, including the techniques, specific heuristics, and the parameters and their values10.
  • EITHER follow and clearly describe a sound process to collect and prepare the datasets used to run and evaluate the optimization approach, and make the data publicly available or explain why this is not possible11; OR, if the subjects are taken from previous work, fully reference the original source and explain whether any transformation or cleaning was applied to the datasets.
  • Identify and explain all possible sources of stochasticity12.
  • EITHER execute stochastic approaches or elements multiple times OR explain why this is not possible13.
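
To make several of these items concrete, here is a minimal, hypothetical sketch of an explicit problem formulation: a bit-vector solution representation, a stated fitness function, a random-search baseline, and multiple seeded trials. The problem (test-suite minimization), the placeholder coverage data, and all names (`N_TESTS`, `COVERAGE`, `fitness`, `random_search`) are illustrative assumptions rather than part of the standard.

```python
# Hypothetical sketch (not part of the standard): test-suite minimization.
# Solution representation: a bit vector over candidate test cases (1 = keep the test).
# Fitness: reward covered branches, penalize suite size (higher is better).
# Baseline: random search, executed over multiple explicit seeds.
import random

N_TESTS = 50  # assumed number of candidate test cases
# Placeholder coverage data: each test covers 20 of 200 hypothetical branches.
COVERAGE = [set(random.Random(i).sample(range(200), 20)) for i in range(N_TESTS)]

def fitness(solution):
    """Higher is better: branches covered minus a penalty for suite size."""
    selected = [COVERAGE[i] for i, bit in enumerate(solution) if bit]
    covered = set().union(*selected)
    return len(covered) - 0.5 * len(selected)

def random_search(budget, rng):
    """Baseline: sample `budget` random bit vectors and keep the fittest."""
    best = max(([rng.randint(0, 1) for _ in range(N_TESTS)] for _ in range(budget)), key=fitness)
    return best, fitness(best)

# Multiple independent trials with explicit seeds, so stochastic variation can be reported.
scores = [random_search(budget=1000, rng=random.Random(seed))[1] for seed in range(30)]
print(min(scores), sorted(scores)[len(scores) // 2], max(scores))
```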

Desirable

  • Motivate the novelty and soundness of the proposed approach14.
  • Explain whether the study explores a new problem type (or a new area within an existing problem space), or how it reproduces, replicates, or improves upon prior work.
  • Explain in detail how subjects or datasets were collected/chosen to mitigate selection bias and improve the generalization of findings.
  • Describe the main features of the subjects used to run and evaluate the optimization approach(es) and discuss what characterizes the different instances in terms of "hardness".
  • Justify the use of synthetic data (if any); explain why real-world data cannot be used; discuss the extent to which the proposed approach and the findings can apply to the real world.
  • Make available a replication package that conforms to SIGSOFT standards for artifacts15.
  • If data cannot be shared, create a sample dataset that can be shared to illustrate the approach.
  • Select a realistic option space for formulating a solution. Any values set for attributes should reflect those that might be chosen in a "real-world" solution, not values generated from an arbitrary distribution.
  • Justify the parameter values used when executing the evaluated approaches (and note that experiments trying a wide range of different parameter values would be extraordinary, see below).
  • Sample from data multiple times in a controlled manner (where appropriate and possible).
  • Perform multiple trials either as a cross-validation (multiple independent executions) or temporally (multiple applications as part of a timed sequence), depending on the problem at hand.
  • Make available random data splits (e.g., those used in data-driven approaches) or, at least, ensure splits are reproducible (see the sketch after this list).
  • Compare distributions (rather than means) of results using appropriate statistics.
  • Compare solutions using an appropriate meta-evaluation criterion16. Justify the chosen criterion.
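
To illustrate the point about reproducible splits, the following is a minimal, hypothetical sketch that derives k-fold splits from explicit, reported seeds; the function name (`kfold_indices`), the seed list, and the dataset size are illustrative assumptions only.

```python
# Hypothetical sketch (not part of the standard): reproducible k-fold splits from explicit seeds.
import numpy as np

def kfold_indices(n_items, k, seed):
    """Return k disjoint folds of item indices; the same seed always yields the same folds."""
    rng = np.random.default_rng(seed)  # the seed is reported alongside the results
    order = rng.permutation(n_items)
    return [order[fold::k] for fold in range(k)]

# Example: 10 repetitions of 5-fold cross-validation, each with its own recorded seed.
SEEDS = list(range(10))  # published as part of the replication package
all_splits = {seed: kfold_indices(n_items=500, k=5, seed=seed) for seed in SEEDS}
```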

Extraordinary

  • Analyze different parameter choices for the algorithm, indicating how the final parameters were selected17.
  • Analyze the fitness landscape for one or more of the chosen fitness functions.
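
One common way to approach the second item is to estimate landscape ruggedness from the autocorrelation of fitness values along a random walk. The sketch below is hypothetical: it assumes a bit-vector representation with a single bit-flip neighbourhood and uses a toy placeholder fitness function.

```python
# Hypothetical sketch (not part of the standard): landscape ruggedness via the lag-1
# autocorrelation of fitness values sampled along a random walk (bit-flip neighbourhood).
import random
import statistics

def random_walk_fitness(fitness, start, steps, rng):
    """Flip one random bit per step and record the fitness of each visited solution."""
    current, trace = list(start), []
    for _ in range(steps):
        i = rng.randrange(len(current))
        current[i] = 1 - current[i]
        trace.append(fitness(current))
    return trace

def lag1_autocorrelation(values):
    """Values near 1 suggest a smooth landscape; values near 0 suggest a rugged one."""
    mean, var = statistics.fmean(values), statistics.pvariance(values)
    return sum((a - mean) * (b - mean) for a, b in zip(values, values[1:])) / (var * len(values))

rng = random.Random(0)
trace = random_walk_fitness(fitness=sum, start=[0] * 50, steps=2000, rng=rng)  # `sum` is a toy fitness
print(lag1_autocorrelation(trace))
```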

General Quality Criteria

The most valuable quality criteria for optimization studies in SE include reliability, replicability, reproducibility, rigor, and usefulness (see Glossary).

Examples of Acceptable Deviations

  • The number of trials can be constrained by available time or experimental resources (e.g., where experiments are time-consuming to repeat or have human elements). In such cases, multiple trials are still ideal, but a limited number of trials can be justified as long as the limitations are disclosed and the possible effects of stochasticity are discussed.
  • The use of industrial case studies is important in demonstrating the real-world application of a proposed technique, but industrial data generally cannot be shared. In such cases, it is recommended that a small open-source example be prepared and distributed as part of a replication package to demonstrate how the approach can be applied.

Antipatterns

  • Reporting significance tests (e.g., the Mann-Whitney U/Wilcoxon rank-sum test) without effect size tests (see Notes).
  • Conducting multiple trials but failing to disclose or discuss the variation between trials; for instance, reporting a measure of central tendency (e.g., the median) without any indication of variance (e.g., a boxplot).

Invalid Criticisms

  • The paper is unimportant. Be cautious of rejecting papers that seem “unimportant” (in the eyes of a reviewer). Research is exploratory and is about taking risks. Clearly motivated research and speculative exploration are both important and should be rewarded.
  • The paper just uses older algorithms with no reference to recent work. Using older (and widely understood) algorithms may be valid when they are used, e.g., (1) as part of a larger set that compares many approaches; (2) to offer a “straw man” method that defines the “floor” of the performance (that everything else needs to beat); or (3) as a workbench within which one thing is changed (e.g., the fitness function) while everything else remains constant.
  • That an approach is not benchmarked against an inappropriate or unavailable baseline. If a state-of-the-art approach lacks an available and functional implementation, it is not reasonable to expect the author to recreate that approach for benchmarking purposes.
  • That a multi-objective approach is not compared to a single-objective approach by evaluating each objective separately. This is not a meaningful comparison because, in a multi-objective problem, the trade-off between the objectives is a major factor in result quality. It is more important to consider the Pareto frontiers and quality indicators.
  • That one or very few subjects are used, as long as the paper offers a reasonable justification for why this was the case.

Suggested Readings

  • Shaukat Ali, Lionel C. Briand, Hadi Hemmati, and Rajwinder Kaur Panesar-Walawege. 2010. A Systematic Review of the Application and Empirical Investigation of Search-Based Test Case Generation. IEEE Transactions on Software Engineering, vol. 36, no. 6, pp. 742-762. DOI: https://doi.org/10.1109/TSE.2009.52
  • Andrea Arcuri and Lionel Briand. 2014. A Hitchhiker's guide to statistical tests for assessing randomized algorithms in software engineering. Softw. Test. Verif. Reliab. 24, 3, pp. 219–250. DOI: https://doi.org/10.1002/stvr.1486
  • Amritanshu Agrawal, Tim Menzies, Leandro L. Minku, Markus Wagner, and Zhe Yu. 2020. Better software analytics via DUO: Data mining algorithms using/used-by optimizers. Empirical Software Engineering 25, no. 3, pp. 2099-2136. DOI: https://doi.org/10.1007/s10664-020-09808-9
  • Bradley Efron and Robert J. Tibshirani. 1994. An Introduction to the Bootstrap. CRC Press.
  • Mark Harman, Phil McMinn, Jerffeson Teixeira Souza, and Shin Yoo. 2011. Search-Based Software Engineering: Techniques, Taxonomy, Tutorial. Empirical Software Engineering and Verification. Lecture Notes in Computer Science, vol. 7007, pp. 1–59. DOI: https://doi.org/10.1007/978-3-642-25231-0_1
  • Vigdis By Kampenes, Tore Dybå, Jo E. Hannay, and Dag I. K. Sjøberg. 2007. A systematic review of effect size in software engineering experiments. Inf. Softw. Technol. 49, 11-12 (November 2007), 1073-1086. DOI: https://doi.org/10.1016/j.infsof.2007.02.015
  • M. Li, T. Chen and X. Yao. 2020. How to Evaluate Solutions in Pareto-based Search-Based Software Engineering? A Critical Review and Methodological Guidance. In IEEE Transactions on Software Engineering. DOI: https://doi.org/10.1109/TSE.2020.3036108.
  • Nikolaos Mittas and Lefteris Angelis. 2013. Ranking and Clustering Software Cost Estimation Models through a Multiple Comparisons Algorithm. IEEE Transactions on Software Engineering, vol. 39, no. 4, pp. 537-551.
  • Guenther Ruhe. 2020. Optimization in Software Engineering - A Pragmatic Approach. In Felderer, M. and Travassos, G.H. eds., Contemporary Empirical Methods in Software Engineering, Springer. DOI: https://doi.org/10.1007/978-3-030-32489-6_9

Exemplars

  • Hussein Almulla, Gregory Gay. 2020. Learning How to Search: Generating Exception-Triggering Tests Through Adaptive Fitness Function Selection. In Proceedings of 13th IEEE International Conference on Software Testing (ICST’20). IEEE, 63-73. DOI: https://doi.org/10.1109/ICST46399.2020.00017
  • Jianfeng Chen, Vivek Nair, Rahul Krishna, and Tim Menzies. 2019. “Sampling” as a Baseline Optimizer for Search-Based Software Engineering. IEEE Transactions on Software Engineering, vol. 45, no. 6. DOI: https://doi.org/10.1109/TSE.2018.279092
  • José Campos, Yan Ge, Nasser Albunian, Gordon Fraser, Marcelo Eler and Andrea Arcuri. 2018. An empirical evaluation of evolutionary algorithms for unit test suite generation. Information and Software Technology. vol. 104, pp. 207–235. DOI: https://doi.org/10.1016/j.infsof.2018.08.010
  • Martin S. Feather and Tim Menzies. 2002. Converging on the Optimal Attainment of Requirements. In Proceedings of the IEEE Joint International Conference on Requirements Engineering. IEEE.
  • G. Mathew, T. Menzies, N. Ernst, and J. Klein. 2017. "SHORT"er Reasoning About Larger Requirements Models. In 2017 IEEE 25th International Requirements Engineering Conference (RE), Lisbon, Portugal, pp. 154-163. DOI: https://doi.org/10.1109/RE.2017.3
  • Annibale Panichella, Fitsum Meshesha Kifetew and Paolo Tonella. 2018. Automated Test Case Generation as a Many-Objective Optimisation Problem with Dynamic Selection of the Targets. IEEE Transactions on Software Engineering. vol. 44, no. 2, pp. 122–158. DOI: https://doi.org/10.1109/TSE.2017.2663435
  • Federica Sarro, Filomena Ferrucci, Mark Harman, Alessandra Manna and Jen Ren. 2017. Adaptive Multi-Objective Evolutionary Algorithms for Overtime Planning in Software Projects. IEEE Transactions on Software Engineering, vol. 43, no. 10, pp. 898-917. DOI: https://doi.org/10.1109/TSE.2017.2650914
  • Federica Sarro, Alessio Petrozziello, and Mark Harman. 2016. Multi-objective software effort estimation. In Proceedings of the 38th International Conference on Software Engineering (ICSE'16). Association for Computing Machinery, New York, NY, USA, 619–630. DOI: https://doi.org/10.1145/2884781.2884830
  • Norbert Siegmund, Stefan Sobernig, and Sven Apel. 2017. Attributed variability models: outside the comfort zone. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering (ESEC/FSE). Association for Computing Machinery, New York, NY, USA, 268-278. DOI: https://doi.org/10.1145/3106237.3106251

Notes

Regarding the difference between "significance" and "effect size" tests: "significance" checks whether distributions can be distinguished from each other, while "effect size" tests are required to check whether the difference between distributions is "interesting" (and not just a trivially "small" effect). These tests can be parametric or non-parametric. For example, code for the parametric t-test/Hedges significance/effect tests endorsed by Kampenes et al. can be found at https://tinyurl.com/y4o7ucnx. Code for a parametric Scott-Knott/Cohen test of the kind endorsed by Mittas et al. is available at https://tinyurl.com/y5tg37fp. Code for the non-parametric bootstrap/Cliff's Delta significance/effect tests of the kind endorsed by Efron et al. and Arcuri et al. can be found at https://tinyurl.com/y2ufofgu.
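
As a hedged illustration of pairing a significance test with an effect size measure, the sketch below uses SciPy's Mann-Whitney U test together with a direct computation of Cliff's delta. The result lists are placeholders, and the interpretation thresholds for a "small" or "large" delta are left to the references above.

```python
# Hypothetical sketch (not part of the standard): significance test plus effect size.
from scipy.stats import mannwhitneyu

def cliffs_delta(xs, ys):
    """Cliff's delta in [-1, 1]: (pairs with x > y minus pairs with x < y) / total pairs."""
    gt = sum(1 for x in xs for y in ys if x > y)
    lt = sum(1 for x in xs for y in ys if x < y)
    return (gt - lt) / (len(xs) * len(ys))

# Placeholder per-trial results for two approaches (e.g., coverage scores over 10 seeds).
approach_a = [0.81, 0.84, 0.79, 0.88, 0.83, 0.85, 0.80, 0.86, 0.82, 0.87]
approach_b = [0.74, 0.78, 0.73, 0.79, 0.76, 0.75, 0.77, 0.72, 0.78, 0.74]

stat, p_value = mannwhitneyu(approach_a, approach_b, alternative="two-sided")
print(f"U={stat:.1f}, p={p_value:.4f}, Cliff's delta={cliffs_delta(approach_a, approach_b):.2f}")
```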

Footnotes

1: Note that there are many such optimization techniques (metaheuristics; numerical optimizers; constraint-solving theorem provers such as SAT, SMT, and CSP solvers; and others), some of which are stochastic.

2: E.g., test input creation, design refactoring, effort prediction.

3: A "fitness function", or "objective function", is a numerical scoring function used to indicate the quality of a solution to a defined problem. Optimization approaches attempt to maximize or minimize such functions, depending on whether lower or higher scores indicate success.

4: E.g., if the cross-product of the space of options is very large, or if performing the task manually would be very slow.

5: E.g., the numerical optimizer, the specific metaheuristic, the constraint solving method, etc.

6: For example, do not use an algorithm such as Simulated Annealing, or even a specific approach such as NSGA-II, to solve an optimization problem unless it is actually appropriate for that problem. While one rarely knows the best approach for a new problem, one should at least consider the algorithms applied to address similar problems and make an informed judgement.

7: If the approach addresses a problem never tackled before, then it should be compared - at least - to random search. Otherwise, compare the proposed approach to the existing state of the art.

8: E.g., a test suite or test case in test generation.

9: E.g., a tree or vector structure.

10: Example techniques - Simulated Annealing, Genetic Algorithm. Example heuristic - single-point crossover. Example parameters - crossover and mutation rates.
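
For illustration, such a report might be captured as a simple, machine-readable structure like the hypothetical one below; the specific operators and values are placeholders, not recommendations.

```python
# Hypothetical sketch (not part of the standard): explicit reporting of technique,
# heuristics, and parameter values for one evaluated approach.
GA_CONFIG = {
    "technique": "Genetic Algorithm",
    "crossover": {"operator": "single-point", "rate": 0.75},  # illustrative values only
    "mutation": {"operator": "bit-flip", "rate": 1 / 50},
    "selection": "binary tournament",
    "population_size": 100,
    "stopping_criterion": {"max_generations": 200},
}
```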

11: E.g., proprietary data, ethics issues, or a Non-Disclosure Agreement.

12: For example, stochasticity may arise from the use of randomized algorithms, from the use of a fitness function that measures a random variable from the environment (e.g., a fitness function based on execution time may return different results across different executions), or from the use of data sampling or cross-validation approaches.

13: E.g., the approach is too slow, human-in-the-loop.

14: Reviewers should reward sound and novel work and, where possible, support a diverse range of studies.

15: Including, for example, source code (of approach, solution representation, and fitness calculations), datasets used as experiment input, and collected experiment data (e.g., output logs, generated solutions).

16: For example, if applying a multi-objective optimization approach, then use a criterion that can analyze the Pareto frontier of solutions (e.g., generational distance and inverse generational distance).
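
The sketch below gives a minimal, averaged formulation of generational distance (GD) for two-objective minimization, with inverse generational distance (IGD) obtained by swapping the roles of the two fronts. The fronts shown are placeholders, and published variants of GD/IGD differ in normalization, so the chosen formulation should be stated explicitly.

```python
# Hypothetical sketch (not part of the standard): averaged generational distance.
import numpy as np

def generational_distance(front, reference):
    """Mean Euclidean distance from each obtained solution to its nearest reference point."""
    front, reference = np.asarray(front, float), np.asarray(reference, float)
    dists = np.linalg.norm(front[:, None, :] - reference[None, :, :], axis=-1)
    return dists.min(axis=1).mean()

# Placeholder two-objective fronts (both objectives minimized).
obtained = [[0.2, 0.9], [0.5, 0.5], [0.9, 0.2]]
reference = [[0.1, 0.8], [0.4, 0.4], [0.8, 0.1], [0.2, 0.6]]
print(generational_distance(obtained, reference))   # GD
print(generational_distance(reference, obtained))   # IGD
```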

17: E.g., applying hyperparameter optimization.
