5/6/2022 Can Alondra Nelson Remake the Government’s Approach to Science and Tech?
Alondra Nelson, a sociologist, is the new director of the White House’s Office of Science and Technology Policy. Our meeting today was based on an article about her and her new position, in which she may be able to change the executive branch’s relationship with science. Using her as a model, we discussed how we, as scientists and scientists-in-training, can communicate our research to those who may not interface with our research questions very often or who use different jargon. We also talked about the changes in science and the ethical implications of inappropriate behavior in the workplace, and how we can mitigate those effects through positive treatment of those we work with. Our lab hopes to increase outreach toward different groups in order to educate others about what we research and why.
4/22/2022 Day 8: Data transformation - Skewness, normalization and much more
In our research, we may transform our data, but we must understand why data transformation is important and under what circumstances it is appropriate. Our article this week focuses on how data might be transformed and why. When columns span very different ranges of values, the values in each column can end up weighted differently. Transforming the data can help scale it appropriately and mitigate issues that may arise from skewness and attribute aggregation. For normalization, we can use the min-max or Z score techniques: min-max rescales values into a fixed range (typically 0 to 1), whereas the Z score expresses each value as its distance from the mean in standard deviations, which leaves values unbounded and so maintains the impact of outliers. To handle skewness, there are multiple approaches: cube root, square root, and logarithmic transformations reduce positive skew, while a square transformation is used on negatively skewed data. Due to mathematical limitations, the square root can only be applied to non-negative values and the logarithm only to positive values, whereas the cube root also works on zero and negative values. Each of us works with different datasets, so we all recognize the importance of understanding the different reasons for data transformation and the types of data transformation we may use.
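To make the transformations above concrete, here is a minimal sketch using only the Python standard library. The two small datasets and all function names are made up for illustration and do not come from the article or from our own pipelines.

```python
import math
import statistics


def min_max(values):
    """Rescale values into the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Express each value as standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    return statistics.fmean(z ** 3 for z in z_score(values))


# Hypothetical positively (right) skewed data with one large outlier.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 5, 8, 40]
log_t = [math.log(v) for v in right_skewed]    # needs values > 0
sqrt_t = [math.sqrt(v) for v in right_skewed]  # needs values >= 0

# Hypothetical negatively (left) skewed data: the mirror image of the above.
left_skewed = [41 - v for v in right_skewed]
square_t = [v ** 2 for v in left_skewed]

print("raw right skew:", round(skewness(right_skewed), 2))
print("after log:     ", round(skewness(log_t), 2))
print("after sqrt:    ", round(skewness(sqrt_t), 2))
print("raw left skew: ", round(skewness(left_skewed), 2))
print("after square:  ", round(skewness(square_t), 2))
```

In this toy example each transformation moves the skewness closer to zero, but none removes it entirely, so it is worth checking the result rather than assuming the transformation worked.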
4/8/2022 Data Science Mistakes to Avoid: Data Leakage
Data leakage can happen when your training data contains information about your testing data, or when your model has been trained using information that will not be available once it is pushed to production, according to an article in Towards Data Science. This article focuses on making a machine learning model as accurate as possible. To prevent data leakage, the article suggests practices like “using a sliding window to split time series data” and not splitting groups at random, because the model might then learn from data that does not reflect the overall pattern. Outside of academia, a deployed model might need to be retrained when presented with new information. A way to test this is to create a “challenger model” and evaluate both the previous and challenger models on the challenger model’s test set. However, we must ensure the previous model’s training set does not contain data in the challenger’s test set. Some typical best practices for preventing data leakage are: splitting your data immediately, using cross-validation, being skeptical of high performance, using pipelines such as those in scikit-learn, and checking how strongly features correlate with the target (what you want to predict), since a near-perfect correlation can signal leakage. We have our ML_Pipeline, where we do our best to follow best practices that avoid data leakage, but we also like to remind ourselves that there is always room to improve our systems.
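As a rough illustration of the sliding-window idea, the sketch below splits an ordered series so that every test window comes strictly after its training window, and it computes scaling statistics from the training window only. The function name, window sizes, and measurements are hypothetical; this is not the code in our ML_Pipeline.

```python
import statistics


def sliding_window_splits(n_samples, train_size, test_size):
    """Yield (train_idx, test_idx) pairs; each test window follows its training window."""
    start = 0
    while start + train_size + test_size <= n_samples:
        split = start + train_size
        yield list(range(start, split)), list(range(split, split + test_size))
        start += test_size  # slide forward by one test window


# Hypothetical measurements, ordered oldest to newest.
series = [3.0, 4.0, 5.0, 7.0, 6.0, 8.0, 9.0, 11.0, 10.0, 12.0]
splits = list(sliding_window_splits(len(series), train_size=4, test_size=2))

scaled_tests = []
for train_idx, test_idx in splits:
    train = [series[i] for i in train_idx]
    test = [series[i] for i in test_idx]
    # Fit the scaling statistics on the training window only, then reuse
    # them on the test window, so no test information reaches training.
    mean = statistics.fmean(train)
    sd = statistics.pstdev(train)
    scaled_tests.append([(v - mean) / sd for v in test])
    print(train_idx, "->", test_idx)
```

Computing the mean and standard deviation on the full series before splitting would leak information from the test windows into training, which is the kind of mistake a scikit-learn pipeline is meant to prevent.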
3/25/2022 Broken bread - avert global wheat crisis caused by invasion of Ukraine
This week we recognize a major global event, the invasion of Ukraine by Russia. We read an article published in Nature about the impact this invasion has on the global wheat supply. Since Ukraine and Russia contribute about 11% of the world’s calories and one-third of the world’s wheat supply, as stated in the article, the invasion has disrupted the global food supply, especially for wheat. As plant scientists and scientists-in-training, we understand it is important for us to educate ourselves about the scientific landscape. This example shows that we must take steps to prevent such crises. To do so, the article mentioned that we could expand world wheat production, help farmers gain access to best growing practices, create new flour blends, use monitoring systems to analyze crop production and optimize crop fields, study genomics to track pests and plant pathogens, and further invest in agricultural policy and science to support women farmers in rural areas. This article brings attention to potential paths for our research and to how we can increase global food security.
3/11/2022 Scientists want to create a library of every sound in the ocean
We are two months into the spring semester and our CATSUP meetings have focused on improving our research or coding skills, but we must remember why we are all in this lab in the first place: we love science! In this week’s article from Science, we learned how a group of scientists wants to create a library, named “GLUBS”, which could eventually collect all underwater sounds. Their goal is to track changing marine ecosystems using this sound library. This article highlights something we love about AI: the diverse and creative ways that different AI approaches can be used to solve problems. One of their goals is to create an app that citizens can use to upload and identify the sounds they collect. Through teamwork, this library will be used to teach an AI to “learn” what the sounds are and to identify unknown sound sources in the ocean.
2/25/2022 Python Code Quality: Tools & Best Practices
In the Shiu Lab, we have both experienced and novice coders, but we know that everyone can always benefit from reviewing best coding practices. So, this Friday, we talked about the article Python Code Quality: Tools & Best Practices. While we talked about how to make code easier to read, for example by adding docstrings, and easier to maintain and extend, we also recognize the importance of utilizing tools that already exist. This led us to a conversation about whether we should use linters. Some linters, such as Pylint, will tell you what is wrong, whereas other tools, like the formatter Black, will automatically reformat your Python file. There are two main categories of linters: logical linters and stylistic linters. Logical linters bring attention to coding errors, while stylistic linters point out code that does not follow typical stylistic patterns. All in all, how a person sets up their coding environment is a personal preference, but the Shiu Lab does want to ensure we are all doing our best to follow the same docstring format and general stylistic formatting.
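As one example of what a consistently documented function could look like, here is a small sketch. The function is hypothetical, and the docstring layout (Google style, with Args, Returns, and Raises sections) is only an assumption for illustration; it is not necessarily the format our lab has settled on.

```python
def gc_content(sequence):
    """Return the fraction of G and C bases in a DNA sequence.

    Args:
        sequence (str): DNA sequence; upper and lower case are both accepted.

    Returns:
        float: Fraction of bases that are G or C, between 0.0 and 1.0.

    Raises:
        ValueError: If the sequence is empty.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    sequence = sequence.upper()
    gc_count = sequence.count("G") + sequence.count("C")
    return gc_count / len(sequence)


print(gc_content("ATGC"))
```

A docstring like this is also what `help(gc_content)` displays, which is part of why agreeing on one format pays off across a shared codebase.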
1/28/2022 Logical Fallacies
Logical fallacies are an issue that scientists need to look out for, as we discussed this week during CATSUP, using the article Logical Fallacies from Purdue University’s Online Writing Lab. As scientists-in-training, the graduate and undergraduate students in our lab work to avoid illegitimate arguments and faulty reasoning. Those in the lab with doctorates also recognized that, throughout our careers, we will always need to consciously look for fallacies in our own and others’ arguments to ensure rigorous scientific research. Many of the fallacies are tied to oversimplifying an argument, which we worry will happen when we attempt to make our arguments as clear and concise as possible. We need to balance conveying the nuances of our research with making our inherently multi-disciplinary research accessible to all groups who may look to access it.
1/13/2022 How scientists fool themselves – and how they can stop
In CATSUP this week, we discussed the article How scientists fool themselves – and how they can stop by Regina Nuzzo and how our lab can best combat our own cognitive biases to conduct more robust and ethical science. We addressed “The Texas sharpshooter” cognitive trap, where someone produces multiple results and finds a pattern that may not be accurate. Sometimes a person has a tendency to pick the data that most agrees with their hypothesis or that is most interesting. One suggestion from the article that we thought would be a good way to combat this issue is having “rivals”. In our lab, we try to create a collaborative environment where we work together to address problems that are related to multiple areas of our research. However, this collaborative environment includes questioning the reasoning of our collaborators and respectfully raising counterpoints to address holes in our research results and methods.