Journal Description
Data
Data is a peer-reviewed, open access journal on data in science, with the aim of enhancing data transparency and reusability. The journal publishes in two sections: one on methods for the collection, treatment and analysis of data in science, and one publishing descriptions of scientific and scholarly datasets (one dataset per paper). The journal is published monthly online by MDPI.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within Scopus, ESCI (Web of Science), Ei Compendex, dblp, Inspec, RePEc, and other databases.
- Journal Rank: JCR - Q2 (Multidisciplinary Sciences) / CiteScore - Q2 (Information Systems and Management)
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 27.7 days after submission; acceptance to publication is undertaken in 3.5 days (median values for papers published in this journal in the first half of 2024).
- Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.
Impact Factor: 2.2 (2023); 5-Year Impact Factor: 2.4 (2023)
Latest Articles
Thermal Transmittance Limits Dataset for New and Existing Buildings Across EU Regulations
Data 2024, 9(11), 127; https://doi.org/10.3390/data9110127 (registering DOI) - 31 Oct 2024
Abstract
Building energy regulations are essential for reducing energy consumption in the European Union (EU) and achieving climate neutrality goals. This data article supplements the “Overview of EU Building Envelope Energy Requirement for Climate Neutrality” by presenting a detailed dataset on building regulations across all 27 EU member states, with a focus on building envelope efficiency. The data include thermal transmittance limits for windows, walls, floors, and roofs, offering insights into regulatory differences and potential opportunities for harmonization. Information was sourced from the Energy Performance of Buildings Directive (EPBD) database, national reports, and scientific literature to ensure comprehensive coverage. Key aspects of each country’s regulations are summarized in tables, covering both new constructions and renovations. The inclusion of Köppen–Geiger climate classifications allows for climate-specific analyses, providing valuable context for researchers, policymakers, and construction professionals. This dataset enables comparative studies, helping to identify best practices and inform policy interventions aimed at enhancing energy efficiency across Europe. It also supports the development of tailored strategies to improve building performance in different environmental conditions, ultimately contributing to the EU’s energy and climate targets.
Open Access Data Descriptor
Long-Term Outdoor Cultivation of Nannochloropsis in California, Hawaii, and New Mexico
by
Alina A. Corcoran, Marcela Saracco Alvarez, Taryn Cornell, Isidora Echenique-Subiabre, Julia Gerber, Stephanie Getto, Ahlem Jebali, Heather Martinez, Jakob O. Nalley, Charles J. O’Kelly, Aidan Ryan, Jonathan B. Shurin and Shawn R. Starkenburg
Data 2024, 9(11), 126; https://doi.org/10.3390/data9110126 - 29 Oct 2024
Abstract
The project “Optimizing Selection Pressures and Pest Management to Maximize Cultivation Yield” (OSPREY, award #DE-EE08902) was undertaken to enhance the annual productivity, stability, and quality of algal production strains for biofuels and bioproducts. The foundation of this project was the year-round cultivation of a Nannochloropsis strain across three outdoor systems in California, Hawaii, and New Mexico. We aimed to leverage environmental selection pressures to drive strain improvement and use metagenomic techniques to inform pest management tools. The resulting dataset includes environmental and biological parameters from these cultivation campaigns, captured in a single CSV file. This dataset aims to serve a wide range of end users, from biologists to algal farmers, addressing the scarcity of publicly available data on algae cultivation. Further data releases will include 16S rRNA amplicon sequencing and shotgun sequencing datasets.
Open Access Article
Enhancing Access Across Europe for Documents Published According to Freedom of Information Act: Applying Woogle Design and Technique to Estonian Public Information Act Document
by
Gerda Viira and Maarten Marx
Data 2024, 9(11), 125; https://doi.org/10.3390/data9110125 - 29 Oct 2024
Abstract
In the Netherlands, the Open Government Act (Wet openbare overheid or Woo/Wob in Dutch) is in effect, with the primary objective of ensuring a more transparent government. In line with this legislation, a search engine named Woogle has been designed and developed to centralize documents published under the Open Government Act. The Estonian Public Information Act serves a similar purpose and requires all public institutions to publish information generated during official duties, fostering transparency and public oversight. Currently, Estonia’s document repositories are decentralized and do not support content search, which hinders people’s ability to locate information efficiently. This study aims to assess public information accessibility in Estonia and to apply Woogle’s design and techniques to Estonia’s document repositories, thereby evaluating its potential for broader European implementation. The methodology involved web scraping data and documents from the document repositories of 57 Estonian public institutions. The results indicate that Woogle’s design and techniques can be implemented in Estonia. From a technical perspective, the fields were aligned successfully; content-wise, however, the Estonian data present challenges due to inconsistencies and a lack of comprehensive categorization. The findings suggest potential scalability across European countries, pointing to a broader applicability of the Woogle model for creating a corpus of Freedom of Information Act documents in Europe. The collected data are available as a dataset.
(This article belongs to the Section Information Systems and Data Management)
Open Access Data Descriptor
Curated Polyoxometalate Formula Dataset
by
Aleksandar Kondinski, Nadiia Gumerova and Annette Rompel
Data 2024, 9(11), 124; https://doi.org/10.3390/data9110124 - 29 Oct 2024
Abstract
Reticular and cluster materials often feature complex formulas, making a comprehensive overview challenging due to the need to consult various resources. While datasets have been collected for metal-organic frameworks (MOFs), covalent organic frameworks (COFs), and zeolites, among others, there remains a gap in systematically organized information for polyoxometalates. This paper introduces a carefully curated dataset of 1984 polyoxometalate (POM) and related cluster metal oxide formula instances, currently connecting over 2500 POM material instances. These POM instances incorporate 75 different chemical elements, with compositions ranging from binary to octonary element clusters. This dataset not only enhances accessibility to polyoxometalate data but also aims to facilitate further research and development in the study of these complex inorganic compounds.
(This article belongs to the Section Chemoinformatics)
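The abstract groups formula instances by how many distinct chemical elements they contain, from binary (two) to octonary (eight). A minimal sketch of that counting step follows, assuming plain-text formulas such as `[PW12O40]3-`; the function names, the naming table and the regex-based parsing are illustrative assumptions rather than the dataset's actual tooling, and the parser does not check that each symbol is a real element.

```python
import re

# Hypothetical naming table for element counts 2..8 ("binary" to "octonary").
ARITY_NAMES = {
    2: "binary", 3: "ternary", 4: "quaternary", 5: "quinary",
    6: "senary", 7: "septenary", 8: "octonary",
}

# An element symbol: one uppercase letter, optionally followed by one lowercase letter.
ELEMENT_SYMBOL = re.compile(r"[A-Z][a-z]?")


def distinct_elements(formula: str) -> set[str]:
    """Return the set of element symbols in a formula string.

    Brackets, stoichiometric counts and charge suffixes such as '3-' are skipped
    because they contain no uppercase letters.
    """
    return set(ELEMENT_SYMBOL.findall(formula))


def arity(formula: str) -> str:
    """Name the composition class of a formula by its number of distinct elements."""
    n = len(distinct_elements(formula))
    return ARITY_NAMES.get(n, f"{n}-element")


if __name__ == "__main__":
    for f in ["[Mo6O19]2-", "[PW12O40]3-", "H3PW12O40"]:
        print(f, sorted(distinct_elements(f)), arity(f))
```

For example, the Lindqvist-type `[Mo6O19]2-` comes out as binary (Mo, O), while the Keggin acid `H3PW12O40` is quaternary (H, P, W, O).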
Open Access Data Descriptor
Sustainable Transportation Characteristics Diary—Example of Older (50+) Cyclists
by
Sreten Jevremović, Carol Kachadoorian, Filip Arnaut, Aleksandra Kolarski and Vladimir A. Srećković
Data 2024, 9(11), 123; https://doi.org/10.3390/data9110123 - 25 Oct 2024
Abstract
Cycling is a sustainable and healthy form of transportation that is gradually becoming the primary means of transportation over short distances in many countries. This paper describes the dataset used to determine the cycling characteristics of seniors in the USA and Canada. For this purpose, a purpose-built questionnaire was used in a survey conducted from August 2021 to July 2022. The questionnaire contained sections on the general socio-demographic characteristics of the respondents, general characteristics of cycling (type of bicycle, cycling time, mileage, etc.), and specific characteristics of cycling (riding at night, termination of cycling, motivating and demotivating factors for cycling, etc.). The total sample consisted of 5096 respondents aged 50 and over. This database is particularly significant because it is the first publicly available dataset on the cycling characteristics of older adults. It can be used by researchers working on this topic, and also by decision-makers who want to design sustainable and accessible cycling infrastructure that meets the requirements of this group of users. Finally, the dataset can serve as a basis for identifying the specific characteristics and understanding the needs of older cyclists in traffic.
Open Access Data Descriptor
Towards a Taxonomy Machine: A Training Set of 5.6 Million Arthropod Images
by
Dirk Steinke, Sujeevan Ratnasingham, Jireh Agda, Hamzah Ait Boutou, Isaiah C. H. Box, Mary Boyle, Dean Chan, Corey Feng, Scott C. Lowe, Jaclyn T. A. McKeown, Joschka McLeod, Alan Sanchez, Ian Smith, Spencer Walker, Catherine Y.-Y. Wei and Paul D. N. Hebert
Data 2024, 9(11), 122; https://doi.org/10.3390/data9110122 - 25 Oct 2024
Abstract
The taxonomic identification of organisms from images is an active research area within the machine learning community. Current algorithms are very effective for object recognition and discrimination, but they require extensive training datasets to generate reliable assignments. This study releases 5.6 million images representing 10 arthropod classes and 26 insect orders. All images were taken with a Keyence VHX-7000 Digital Microscope system equipped with an automatic stage to permit high-resolution (4K) microphotography. Providing phenotypic data for 324,000 species from 48 countries, this release is by far the largest dataset of standardized arthropod images. As such, it is well suited for testing the efficacy of machine learning algorithms in assigning specimens to higher taxonomic categories.
(This article belongs to the Section Computational Biology, Bioinformatics, and Biomedical Data Science)
Open Access Article
Computing the Commonalities of Clusters in Resource Description Framework: Computational Aspects
by
Simona Colucci, Francesco Maria Donini and Eugenio Di Sciascio
Data 2024, 9(10), 121; https://doi.org/10.3390/data9100121 - 20 Oct 2024
Abstract
Clustering is a very common means of analyzing the data in large datasets, with aims such as understanding and summarizing the data and discovering similarities. However, despite the current success of subsymbolic methods for data clustering, a description of the obtained clusters cannot rely on the intricacies of subsymbolic processing. For clusters of data expressed in the Resource Description Framework (RDF), we extend, and provide an optimized implementation of, a previously proposed logic-based methodology that computes an RDF structure, called a Common Subsumer, describing the commonalities among all resources. We tested our implementation on two open and very different RDF datasets: one devoted to public procurement, and the other to drugs in pharmacology. For both datasets, we were able to provide reasonably concise and readable descriptions of clusters with up to 1800 resources. Our analysis shows the viability of our methodology and computation, and paves the way for general cluster explanations to be provided to lay users.
(This article belongs to the Section Information Systems and Data Management)
Open Access Data Descriptor
Rainfall Erosivity over Brazil: A Large National Database
by
Mariza P. Oliveira-Roza, Roberto A. Cecílio, David B. S. Teixeira, Michel C. Moreira, André Q. Almeida, Alexandre C. Xavier and Sidney S. Zanetti
Data 2024, 9(10), 120; https://doi.org/10.3390/data9100120 - 14 Oct 2024
Abstract
Rainfall erosivity (RE) represents the potential of rainfall to cause soil erosion, and understanding it is essential for adopting soil and water conservation practices. Although several studies have estimated RE for Brazil, no single reliable and easily accessible database currently exists for the country. To fill this gap, this work reviewed the research and generated a rainfall erosivity database for Brazil. Data were gathered from studies that determined rainfall erosivity from observed rainfall records and from synthetic rainfall series. Monthly and annual rainfall erosivity values were organized in a spreadsheet and in shapefile format. In total, 54 studies from 1990 to 2023 were analyzed, resulting in the compilation of 5516 erosivity values for Brazil, of which 6.3% were pluviographic and 93.7% were synthetic. By region, the share of available values was: Northeast (35.6%), Southeast (30.1%), South (19.9%), Central-West (7.7%), and North (6.7%). The database, which can be accessed on the Mendeley Data platform, can help professionals and researchers formulate public policies and carry out studies aimed at environmental conservation and watershed management.
(This article belongs to the Section Spatial Data Science and Digital Earth)
Open Access Article
Data Mining Approach for Evil Twin Attack Identification in Wi-Fi Networks
by
Roman Banakh, Elena Nyemkova, Connie Justice, Andrian Piskozub and Yuriy Lakh
Data 2024, 9(10), 119; https://doi.org/10.3390/data9100119 - 14 Oct 2024
Abstract
Cyber security solutions for wireless networks, particularly for open-access internet connections, have become critically important for personal data security. The newest network security protocol, WPA3, is used to maximize this protection; however, attackers can use an Evil Twin attack to replace a legitimate access point. This article addresses intrusion detection at the physical layer of the OSI model. To this end, a hardware–software complex was developed to collect signal strength information from Wi-Fi access points using wireless sensor networks. The collected data were supplemented by a generative algorithm covering all possible combinations of signal strength. A k-nearest neighbor model was trained on the resulting data to distinguish the signal strength of legitimate access points from that of illegitimate ones. To verify the authenticity of the data, an Evil Twin attack was physically simulated, and a machine learning model analyzed the data from the sensors. As a result, the Evil Twin attack was successfully identified based on signal strength in the radio spectrum. The proposed model can be used at open access points as well as in large corporate and home Wi-Fi networks to detect intrusions that substitute devices in the radio spectrum in which IEEE 802.11 networking equipment operates.
(This article belongs to the Section Information Systems and Data Management)
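The abstract describes training a k-nearest neighbor model on signal-strength readings from several sensors to tell a legitimate access point from an impostor. A minimal sketch of that classification step follows, assuming each sample is a vector of RSSI values (in dBm) reported by a fixed set of three sensors; the readings are invented for illustration, and k = 3 with Euclidean distance is an assumed configuration, not the authors' reported one.

```python
import math
from collections import Counter

# Hypothetical training samples: RSSI (dBm) seen by three fixed sensors,
# labelled by whether the beacon came from the legitimate AP or a rogue one.
TRAINING = [
    ((-42.0, -67.0, -71.0), "legitimate"),
    ((-44.0, -65.0, -73.0), "legitimate"),
    ((-41.0, -68.0, -70.0), "legitimate"),
    ((-70.0, -45.0, -60.0), "evil_twin"),
    ((-72.0, -43.0, -62.0), "evil_twin"),
    ((-69.0, -47.0, -59.0), "evil_twin"),
]


def euclidean(a, b):
    """Euclidean distance between two equal-length RSSI vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(sample, training=TRAINING, k=3):
    """Label an RSSI vector by majority vote among its k nearest training samples."""
    nearest = sorted(training, key=lambda item: euclidean(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # A beacon advertising the legitimate SSID, but with a rogue-like RSSI pattern.
    print(knn_predict((-71.0, -44.0, -61.0)))
```

The idea is that an impostor broadcasting the same network name from a different physical position produces a different pattern of signal strengths across the sensors, which the neighbor vote picks up. In a real deployment the features would likely be normalized and k tuned by validation; the abstract does not state these choices.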
Open Access Data Descriptor
A Dataset of Two-Dimensional XBeach Model Set-Up Files for Northern California
by
Andrea C. O’Neill, Kees Nederhoff, Li H. Erikson, Jennifer A. Thomas and Patrick L. Barnard
Data 2024, 9(10), 118; https://doi.org/10.3390/data9100118 - 11 Oct 2024
Abstract
Here, we describe a dataset of two-dimensional (2D) XBeach model files that were developed for the Coastal Storm Modeling System (CoSMoS) in northern California as an update to an earlier CoSMoS implementation that relied on one-dimensional (1D) modeling methods. We provide details on the data and their application, such that they might be useful to end-users for other coastal studies. Modeling methods and outputs are presented for Humboldt Bay, California, in which we compare output from a nested 1D modeling approach to 2D model results, demonstrating that the 2D method, while more computationally expensive, results in a more cohesive and directly mappable flood hazard result.
Open Access Article
Perception and Reuse of Open Data in the Spanish University Teaching and Research Community
by
Christian Vidal-Cabo, Enrique Alfonso Sánchez-Pérez and Antonia Ferrer-Sapena
Data 2024, 9(10), 117; https://doi.org/10.3390/data9100117 - 11 Oct 2024
Abstract
Introduction. Open Government is a form of public policy based on the pillars of collaboration and citizen participation, transparency, and the right of access to public information. With the help of information and communication technologies, governments and administrations carry out open data initiatives, making reusable datasets available to all citizens. The academic community, as highly qualified personnel, is a potential reuser of these data, using them for scientific research and knowledge generation, and for teaching, improving the training of university students and promoting the future reuse of open data. Method. This study used a quantitative research methodology: a survey distributed by email, structured in one context block and six technical blocks, with a total of 30 questions. Data were collected between 15 March and 10 May 2021. Analysis. The data obtained were processed, normalised, and analysed. Results. A total of 783 responses were obtained from 34 Spanish provinces. The respondents come from 47 Spanish universities and 21 research centres, and 19 research areas of the State Research Agency are represented. In addition, a platform was developed to visualise the survey results. Conclusions. The sample obtained is representative, and the conclusions can be extrapolated to the rest of the Spanish university teaching staff. In terms of gender, the sample is balanced between women and men (41.76% women vs. 56.58% men). In general, responding researchers know what open data are (79.31%), but only 50.57% reuse open data. The main conclusion is that open government data prove to be useful sources of information for science, especially in areas such as Social Sciences, Industrial Production, Engineering and Engineering for Society, Information and Communication Technologies, Economics and Environmental Sciences.
(This article belongs to the Section Information Systems and Data Management)
Open Access Data Descriptor
Data Descriptor for “Understanding and Perception of Automated Text Generation among the Public: Two Surveys with Representative Samples in Germany”
by
Angelica Lermann Henestrosa and Joachim Kimmerle
Data 2024, 9(10), 116; https://doi.org/10.3390/data9100116 - 11 Oct 2024
Abstract
With the release of ChatGPT, text-generating AI became accessible to the general public virtually overnight, and automated text generation (ATG) became the focus of public debate. Previously, however, little attention had been paid to this area of AI, resulting in a gap in research on people’s attitudes toward and perceptions of this technology. Therefore, two representative surveys of the German population were conducted, before (March 2022) and after (July 2023) the release of ChatGPT, to investigate people’s attitudes, concepts, and knowledge of ATG in detail. This data descriptor describes the structure of the two datasets, the measures collected, and potential analysis approaches beyond the existing research paper. Other researchers are encouraged to take up these datasets and explore them further, as suggested or as they deem appropriate.
Open Access Article
Characterization and Dataset Compilation of Torque–Angle Curve Behavior for M2/M3 Screws
by
Iván Juan Carlos Pérez-Olguín, Consuelo Catalina Fernández-Gaxiola, Luis Alberto Rodríguez-Picón and Luis Carlos Méndez-González
Data 2024, 9(10), 115; https://doi.org/10.3390/data9100115 - 6 Oct 2024
Abstract
This research explores the torque–angle behavior of M2/M3 screws in automotive applications, focusing on ensuring component reliability and manufacturing precision within the recommended assembly specification limits. M2/M3 screws, often used in tight spaces, are susceptible to issues such as stripped threads and inconsistent torque, which can compromise safety and performance. The study’s primary objective is to develop a comprehensive dataset of torque–angle measurements for these screws, facilitating the analysis of key parameters such as torque-to-seat, torque-to-fail, and process windows. By applying Gaussian curve fitting and Gaussian process regression, the research models and simulates torque behavior to understand torque dynamics in small fasteners, and it highlights the potential of statistical methods in torque analysis, offering insights for improving manufacturing practices. The results indicate that the proposed stochastic methodologies can improve the fail-to-seat ratio, allow inference, reduce the sample size needed in incoming test studies, and minimize the number of destructive test samples required.
(This article belongs to the Special Issue Cutting-Edge Datasets and Algorithms for Enhancing Industrial Processes and Supply Chain Optimization)
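The abstract mentions Gaussian curve fitting of torque behavior. One simple way to fit a Gaussian peak without a non-linear optimizer is Caruana's method: take the logarithm of the samples, fit a parabola ln y = a + b·x + c·x² by least squares, and recover μ = −b/(2c), σ = √(−1/(2c)) and A = exp(a − b²/(4c)). This is a swapped-in technique shown for illustration only; the paper may use a different fitting procedure, and the example data are synthetic rather than measured torques.

```python
import math


def solve_linear(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(map(float, row)) + [float(v)] for row, v in zip(matrix, rhs)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = aug[r][n] - sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = acc / aug[r][r]
    return solution


def fit_gaussian(xs, ys):
    """Fit y = A * exp(-(x - mu)^2 / (2 sigma^2)) via a quadratic fit to ln(y).

    Requires every y > 0. Returns (A, mu, sigma).
    """
    logs = [math.log(y) for y in ys]
    # Normal equations for ln y = a + b x + c x^2: entries are sums of powers of x.
    powers = [sum(x ** k for x in xs) for k in range(5)]
    matrix = [[powers[i + j] for j in range(3)] for i in range(3)]
    rhs = [sum((x ** i) * ly for x, ly in zip(xs, logs)) for i in range(3)]
    a, b, c = solve_linear(matrix, rhs)
    if c >= 0:
        raise ValueError("data are not peaked; a Gaussian does not fit")
    sigma = math.sqrt(-1.0 / (2.0 * c))
    mu = -b / (2.0 * c)
    amplitude = math.exp(a - b * b / (4.0 * c))
    return amplitude, mu, sigma


if __name__ == "__main__":
    # Synthetic, noise-free peak with A = 2.0, mu = 1.5, sigma = 0.5.
    xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
    ys = [2.0 * math.exp(-((x - 1.5) ** 2) / (2 * 0.5 ** 2)) for x in xs]
    print(fit_gaussian(xs, ys))
```

Because the logarithm magnifies noise in low-amplitude samples, this shortcut works best on data near the peak; with noisy tails, a weighted or fully non-linear least-squares fit is usually preferred.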
Open Access Data Descriptor
Open and Collaborative Dataset for the Classification of Operational Transconductance Amplifiers for Switched-Capacitor Applications
by
Francesco Gagliardi and Michele Dei
Data 2024, 9(10), 114; https://doi.org/10.3390/data9100114 - 3 Oct 2024
Abstract
This study introduces a collaborative and open dataset designed to classify operational transconductance amplifiers (OTAs) in switched-capacitor applications. The dataset comprises a diverse collection of OTA designs sourced from the literature, facilitating benchmarking, analysis, and innovation in analog and mixed-signal integrated circuit design. Various evaluation methodologies, implemented in a companion Python notebook, are discussed to assess OTA performance across different operating conditions and specifications. Several Figures of Merit (FoMs) are used as performance metrics to obtain a meaningful performance classification. The study also uncovers intriguing behaviors and correlations among FoMs, providing valuable insights into OTA design considerations. By making the dataset openly available on platforms such as GitHub, this work encourages collaboration and knowledge sharing within the integrated circuit design community, thereby enhancing transparency, reproducibility, and innovation in OTA design research.
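OTA figures of merit in the literature commonly normalize speed by load and power: a small-signal FoM_S = GBW·C_L/P and a large-signal FoM_L = SR·C_L/P, where GBW is the unity-gain bandwidth, SR the slew rate, C_L the load capacitance and P the power (some works divide by supply current instead). A minimal sketch of computing and ranking by such FoMs follows; whether these are exactly the FoMs used in the companion notebook is an assumption, and the `OtaDesign` record, its field names and the two designs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OtaDesign:
    """Hypothetical record for one OTA entry (field names are illustrative)."""
    name: str
    gbw_mhz: float        # unity-gain bandwidth, MHz
    slew_v_per_us: float  # slew rate, V/us
    load_pf: float        # load capacitance, pF
    power_mw: float       # power consumption, mW


def fom_small_signal(d: OtaDesign) -> float:
    """FoM_S = GBW * C_L / P, in MHz*pF/mW."""
    return d.gbw_mhz * d.load_pf / d.power_mw


def fom_large_signal(d: OtaDesign) -> float:
    """FoM_L = SR * C_L / P, in (V/us)*pF/mW."""
    return d.slew_v_per_us * d.load_pf / d.power_mw


def rank_by(designs, fom):
    """Return designs sorted from highest to lowest value of the given FoM."""
    return sorted(designs, key=fom, reverse=True)


if __name__ == "__main__":
    designs = [
        OtaDesign("A", gbw_mhz=100.0, slew_v_per_us=50.0, load_pf=2.0, power_mw=1.0),
        OtaDesign("B", gbw_mhz=300.0, slew_v_per_us=120.0, load_pf=1.0, power_mw=4.0),
    ]
    for d in rank_by(designs, fom_small_signal):
        print(d.name, fom_small_signal(d), fom_large_signal(d))
```

Design B has three times the bandwidth of A, yet A ranks first (FoM_S of 200 vs. 75 MHz·pF/mW) because it drives twice the load at a quarter of the power; this normalization is what lets designs from different papers be compared on one scale.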
Open Access Data Descriptor
Dataset for Machine Learning: Explicit All-Sky Image Features to Enhance Solar Irradiance Prediction
by
Joylan Nunes Maciel, Jorge Javier Gimenez Ledesma and Oswaldo Hideo Ando Junior
Data 2024, 9(10), 113; https://doi.org/10.3390/data9100113 - 29 Sep 2024
Abstract
Prediction of solar irradiance is crucial for photovoltaic energy generation, as it helps mitigate intermittencies caused by atmospheric fluctuations such as clouds, wind, and temperature. Numerous studies have applied machine learning and deep learning techniques to address this challenge. Based on the recently proposed Hybrid Prediction Method (HPM), this paper presents an original and comprehensive dataset of nine attributes extracted from all-sky images using image processing techniques. This dataset and the analysis of its attributes offer new avenues for research into solar irradiance forecasting. To ensure reproducibility, the data processing workflow and the standardized dataset have been documented in detail and made available to the scientific community to promote further research into prediction methods for photovoltaic energy generation.
(This article belongs to the Topic Smart Energy Systems, 2nd Edition)
Open Access Article
Fundamentals of Analysis of Health Data for Non-Physicians
by
Carlos Hernández-Nava, Miguel-Félix Mata-Rivera and Sergio Flores-Hernández
Data 2024, 9(10), 112; https://doi.org/10.3390/data9100112 - 27 Sep 2024
Abstract
The increasing prevalence of diabetes worldwide, including in Mexico, presents significant challenges to healthcare systems. It has a notable impact on hospital admissions: diabetes is considered an ambulatory care-sensitive condition, meaning that such hospitalizations could be avoided. This is just one example of the many challenges faced in medicine and public health. Traditional healthcare methods have been effective in managing diabetes and preventing complications; however, they often encounter limitations when analyzing large amounts of health data to identify and address diseases effectively. This paper aims to bridge this gap by outlining a comprehensive methodology for non-physicians, particularly data scientists, working in healthcare. As a case study, it uses hospital diabetes discharge records from 2010 to 2023, totaling 36,665,793 records from medical units under the Ministry of Health of Mexico. We aim to highlight the importance of data scientists understanding the problem and its implications; by doing so, they can generate insights that inform policy decisions and reduce the burden of avoidable hospitalizations. The approach relies primarily on stratification and standardization to compute rates by sex and age group. This study provides a foundation for data scientists to approach health data in a new way.
(This article belongs to the Section Computational Biology, Bioinformatics, and Biomedical Data Science)
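The abstract relies on stratification and standardization to compare rates by sex and age group. A standard form of this is direct age standardization: compute the rate in each age stratum, then weight those rates by the age distribution of a chosen standard population, so populations with different age structures become comparable. The sketch below assumes per-age-group counts of discharges and population; the three age bands, the standard population and all numbers are hypothetical, and the abstract does not say which standard population the authors used. Stratifying by sex amounts to calling the same function once per sex.

```python
def crude_rate(cases, population, per=100_000):
    """Overall rate per `per` people, ignoring age structure."""
    return sum(cases) / sum(population) * per


def age_standardized_rate(cases, population, standard_population, per=100_000):
    """Direct standardization: weight each age-specific rate by that age group's
    share of a standard population, then sum."""
    if not (len(cases) == len(population) == len(standard_population)):
        raise ValueError("all inputs need one entry per age group")
    total_std = sum(standard_population)
    rate = 0.0
    for c, p, s in zip(cases, population, standard_population):
        rate += (c / p) * (s / total_std)
    return rate * per


if __name__ == "__main__":
    standard = [500, 300, 200]  # hypothetical standard population, bands 0-39, 40-64, 65+
    # Two hypothetical regions with identical age-specific rates (10, 120, 500 per 100,000)
    # but different age structures.
    younger = ([10, 60, 100], [100_000, 50_000, 20_000])
    older = ([5, 60, 250], [50_000, 50_000, 50_000])
    for name, (cases, pop) in [("younger", younger), ("older", older)]:
        print(name, crude_rate(cases, pop), age_standardized_rate(cases, pop, standard))
```

Here the crude rates differ (about 100 vs. 210 per 100,000) only because the second region is older, while both age-standardized rates come out at about 141 per 100,000, which is the comparison standardization is meant to enable.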
Open Access Article
Non-Linear Relationship between MiRNA Regulatory Activity and Binding Site Counts on Target mRNAs
by
Shuangmei Tian, Ziyu Zhao, Beibei Ren and Degeng Wang
Data 2024, 9(10), 111; https://doi.org/10.3390/data9100111 - 25 Sep 2024
Abstract
MicroRNAs (miRNAs) exert regulatory actions via base pairing with their binding sites on target mRNAs. Cooperative binding, i.e., synergism, among binding sites on an mRNA is biochemically well characterized. We studied whether this synergism is reflected in the global relationship between miRNA-mediated regulatory activity and miRNA binding site count on the target mRNAs, which would lead to a non-linear relationship between the two. Recently, using our own and public datasets, we have investigated miRNA regulatory actions in four studies: first, we analyzed the power-law distribution pattern of miRNA binding sites; second, we found that, strikingly, mRNAs for core miRNA regulatory apparatus proteins have extraordinarily high binding site counts, forming self-feedback-control loops; third, we revealed that tumor suppressor mRNAs generally have more sites than oncogene mRNAs; and fourth, we characterized the enrichment of miRNA-targeted mRNAs in translationally less active polysomes relative to more active polysomes. In these four studies, we qualitatively observed an obvious positive correlation between the extent to which an mRNA is miRNA-regulated and its binding site count. This paper summarizes the datasets used. We also quantitatively analyzed the correlation through comparative linear and non-linear regression analyses. Non-linear relationships, i.e., an accelerating rise in regulatory activity as binding site count increases, fit the data much better, conceivably a transcriptome-level reflection of cooperative binding among miRNA binding sites on a target mRNA. This observation could guide integrative quantitative modeling of the miRNA regulatory system.
(This article belongs to the Section Computational Biology, Bioinformatics, and Biomedical Data Science)
Open Access Article
Comprehensive Overview of Long-Term Ecosystem Research Datasets at LTER Site Oberes Stubachtal
by
Bernhard Zagel, Hans Wiesenegger, Robert R. Junker and Gerhard Ehgartner
Data 2024, 9(10), 110; https://doi.org/10.3390/data9100110 - 25 Sep 2024
Abstract
This article provides a comprehensive overview of all currently available datasets of the Long-term Ecosystem Research (LTER) site Oberes Stubachtal. The site is located in the Hohe Tauern mountain range (Eastern Alps, Austria) and includes both protected areas (Hohe Tauern National Park) and unprotected areas (Stubach valley). Although the site's main research focus is on high mountains, glaciology, glacial hydrology, and biodiversity, data were selected using the eLTER Whole-System Approach (WAILS). This approach involves systematically screening all available data to assess their suitability as eLTER Standard Observations (SOs), which cover the geosphere, atmosphere, hydrosphere, biosphere, and sociosphere. These SOs are fundamental to developing a comprehensive long-term ecosystem research framework. In total, more than 40 datasets have been collated for the LTER site Oberes Stubachtal and included in the Dynamic Ecological Information Management System—Site and Data Registry (DEIMS-SDR), eLTER’s data platform. This paper provides a detailed inventory of the datasets and their primary attributes, evaluates them against the observation data required by WAILS, and offers insights into strategies for future initiatives. All datasets are available through dedicated repositories for FAIR (findable, accessible, interoperable, reusable) use.
(This article belongs to the Section Spatial Data Science and Digital Earth)
Open Access Data Descriptor
Data on Economic Analysis: 2017 Social Accounting Matrices (SAMs) for South Africa
by
Ramigo Pfunzo, Yonas T. Bahta and Henry Jordaan
Data 2024, 9(9), 109; https://doi.org/10.3390/data9090109 - 20 Sep 2024
Abstract
The purpose of the Social Accounting Matrix (SAM) is to improve the quality of the database used for modelling, including, but not limited to, policy analysis, multiplier analysis, price analysis, and Computable General Equilibrium (CGE) modelling. This article contributes to constructing the 2017 national SAM for South Africa, incorporating regional accounts. Agricultural industries, labour, and households are captured at the district level only in Limpopo Province; the agricultural industry, labour, and household accounts of the other provinces are not disaggregated further. The main data sources for constructing the SAM include the Supply and Use Tables, National Accounts, Census of Commercial Agriculture, Quarterly Labour Force Survey, South African Revenue Service, Global Insight (regional explorer), and South African Reserve Bank. The dataset shows that the share of land returns for irrigated agriculture was highest in the Northern Cape Province (18.2%), whereas rainfed agriculture in the Free State Province had the largest share (22%) of payments to land. Regarding intermediate inputs, rainfed agriculture in the Western Cape, Free State, and KwaZulu-Natal Provinces paid approximately 0.4% for intermediate inputs. Among districts, land returns for irrigation were highest in the Vhembe District of Limpopo Province (0.3%). Although Mopani District in Limpopo Province has the lowest land returns for irrigated agriculture, it has the highest share (1.6%) of payments to land from rainfed agriculture. The manufacturing and community service sectors had a trade deficit, whereas the other sectors had a trade surplus.
The main challenges in developing the SAM were the scarcity of data needed to disaggregate the sub-matrices and insufficient information in the various data sources for estimating missing values, which is required to ensure that the row and column totals of the SAM are consistent and complete.
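In a SAM, each account's row total (receipts) must equal its column total (expenditures). The abstract does not state how the authors reconciled their totals; one widely used technique for adjusting a prior matrix to given row and column control totals is the RAS (biproportional scaling) method. The Python code below is a minimal sketch of RAS, assuming a non-negative prior matrix and known control totals with equal grand totals. The function name `ras_balance` and all example figures are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def ras_balance(m: Matrix, row_targets: List[float], col_targets: List[float],
                tol: float = 1e-9, max_iter: int = 1000) -> Matrix:
    """Biproportionally scale a non-negative matrix so that its row and column
    sums match the targets (RAS / iterative proportional fitting).
    Zero cells in the prior stay zero."""
    grand = sum(row_targets)
    if abs(grand - sum(col_targets)) > tol * max(1.0, grand):
        raise ValueError("row and column targets must have the same grand total")
    x = [row[:] for row in m]  # work on a copy of the prior
    n_rows, n_cols = len(x), len(x[0])
    for _ in range(max_iter):
        # R step: scale each row to hit its target.
        for i in range(n_rows):
            s = sum(x[i])
            if s > 0:
                f = row_targets[i] / s
                x[i] = [v * f for v in x[i]]
        # S step: scale each column to hit its target.
        for j in range(n_cols):
            s = sum(x[i][j] for i in range(n_rows))
            if s > 0:
                f = col_targets[j] / s
                for i in range(n_rows):
                    x[i][j] *= f
        # Columns now match; stop once rows match too.
        gap = max(abs(sum(x[i]) - row_targets[i]) for i in range(n_rows))
        if gap < tol:
            return x
    raise RuntimeError("RAS did not converge")


if __name__ == "__main__":
    # Hypothetical 3-account prior whose rows and columns do not yet balance.
    prior = [[10.0, 5.0, 3.0], [4.0, 8.0, 6.0], [2.0, 7.0, 9.0]]
    totals = [20.0, 18.0, 16.0]  # hypothetical account control totals
    balanced = ras_balance(prior, totals, totals)  # SAM: same totals for rows and columns
    for row in balanced:
        print([round(v, 3) for v in row])
```

Passing the same list as row and column targets enforces the SAM identity that each account's receipts equal its expenditures. Other balancing methods, such as cross-entropy estimation, are also used in practice.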
Open Access Data Descriptor
Dataset on the Validation and Standardization of the Questionnaire for the Self-Assessment of Service-Learning Experiences in Higher Education (QaSLu)
by
Roberto Sánchez-Cabrero, Elena López-de-Arana Prado, Pilar Aramburuzabala and Rosario Cerrillo
Data 2024, 9(9), 108; https://doi.org/10.3390/data9090108 - 19 Sep 2024
Abstract
This dataset documents the original validation and standardization of the Questionnaire for the Self-Assessment of Service-Learning Experiences in Higher Education (QaSLu). The QaSLu is the first instrument to measure university service-learning (USL); it was validated by a sample of USL experts through a rigorous qualitative and quantitative process, and rating scales were generated for different professor profiles. Qualitative validation used the Delphi method with 16 academic experts, who evaluated the relevance and clarity of the items. After two consultation rounds, 45 items were qualitatively validated, yielding the QaSLu-45. Then, 118 instructors from 43 universities formed the sample for the quantitative validation. Quantitative validation assessed goodness of fit using confirmatory factor analysis, and the final configuration was optimized using one-factor robust exploratory factor analysis. Following the principle of parsimony, this identified the optimal version of the questionnaire, the QaSLu-27, which has only 27 items and better psychometric properties. Finally, rating scales were calculated to compare different profiles of USL professors. These findings provide a valid, robust, and reliable instrument. The QaSLu-27 may help in designing USL experiences and in assessing such programs to enhance teaching and learning.
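The abstract does not describe how the rating scales were computed. In psychometrics, such norms are commonly expressed as percentile ranks of total scores within a reference group, so the Python code below is a minimal sketch under that assumption: each respondent's item ratings are summed, totals are grouped by profile, and a score is placed within its group using a mid-rank percentile. The profile labels, function names, and responses are hypothetical, and the example uses three items rather than the 27 of the QaSLu-27.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def build_norms(responses: List[Tuple[str, List[int]]]) -> Dict[str, List[float]]:
    """Sum each respondent's item ratings and collect sorted totals per profile."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for profile, items in responses:
        groups[profile].append(float(sum(items)))
    return {profile: sorted(totals) for profile, totals in groups.items()}


def percentile_rank(norm: Sequence[float], score: float) -> float:
    """Mid-rank percentile of `score` within a sorted norm group:
    percentage of totals below it, counting ties as half."""
    below = bisect_left(norm, score)
    ties = bisect_right(norm, score) - below
    return 100.0 * (below + 0.5 * ties) / len(norm)


if __name__ == "__main__":
    # Hypothetical responses: (profile label, Likert ratings).
    data = [
        ("novice", [3, 4, 3]), ("novice", [2, 3, 3]), ("novice", [4, 4, 4]),
        ("experienced", [5, 4, 5]), ("experienced", [4, 5, 5]),
    ]
    norms = build_norms(data)
    print(percentile_rank(norms["novice"], 10.0))  # prints 50.0
```

Keeping a separate sorted norm list per profile is what allows the same total score to be interpreted differently for different professor profiles.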
Topics
Topic in Algorithms, Data, Information, Mathematics, Symmetry
Decision-Making and Data Mining for Sustainable Computing
Topic Editors: Sunil Jha, Malgorzata Rataj, Xiaorui Zhang; Deadline: 30 November 2024
Topic in BDCC, Data, MAKE, Mathematics
Big Data Intelligence: Methodologies and Applications
Topic Editors: Liang Zhao, Liang Zou, Boxiang Dong; Deadline: 31 December 2024
Topic in BDCC, Data, Environments, Geosciences, Remote Sensing
Database, Mechanism and Risk Assessment of Slope Geologic Hazards
Topic Editors: Chong Xu, Yingying Tian, Xiaoyi Shao, Zikang Xiao, Yulong Cui; Deadline: 28 February 2025
Topic in Data, Energies, Sensors, Sustainability, Water
Water and Energy Monitoring and Their Nexus
Topic Editors: Lucas Pereira, Hugo Morais, Wolf-Gerrit Früh; Deadline: 31 March 2025
Special Issues
Special Issue in Data
Benchmarking Datasets in Bioinformatics, 2nd Volume
Guest Editor: Pufeng Du; Deadline: 20 November 2024
Special Issue in Data
Data in Astrophysics and Geophysics: Research and Applications, 3rd Volume
Guest Editors: Vladimir Sreckovic, Milan S. Dimitrijević, Zoran Mijic; Deadline: 30 November 2024
Special Issue in Data
Navigating Emerging Advancements and Challenges in AI and Big Data Technologies for Business and Society
Guest Editor: Michael Gerlich; Deadline: 30 March 2025
Special Issue in Data
New Progress in Big Earth Data
Guest Editors: Aditya Chakravarty, Juanle Wang; Deadline: 30 March 2025
Topical Collections
Topical Collection in Data
Modern Geophysical and Climate Data Analysis: Tools and Methods
Collection Editors: Vladimir Sreckovic, Zoran Mijic