Machine Learning applied to biodiversity data

Description

Artificial intelligence, and machine learning in particular, has grown rapidly in recent years. It offers a wide range of techniques that can be used individually or in combination to analyze data, extract patterns, and generate new data, and it has driven significant advances across many application areas.

This course was developed in response to the training needs in different areas of data science identified by the Mesoamerican Data Science Network for Biodiversity Conservation (Redbioma). It is aimed at professionals working in activities related to biodiversity conservation, so it is focused on problem-solving and developing knowledge and skills to design and implement simple machine learning models (using the Python programming language) applied to datasets relevant to the professional areas of the participants.

Schedule and start date
The course will be offered in two groups with the following schedules:

Group #1: Starts on Wednesday, July 10, and will be held every Wednesday from 3:00 PM to 5:00 PM (GMT-6) for 8 weeks.
Group #2: Starts on Thursday, July 11, and will be held every Thursday from 6:00 PM to 8:00 PM (GMT-6) for 8 weeks.

Course type

Modality: Virtual.
Theoretical/Practical: To complete the program, participants must attend more than 75% of virtual synchronous classes and achieve an average greater than or equal to 70 in evaluations.
Cost: Free.

Requirements

Availability of at least 16 hours over the entire program to attend the eight virtual synchronous sessions (2 hours per class).
Availability of at least 24 hours over the entire program to complete short assignments, labs, and a final project (3 hours per week).
Basic knowledge of Python programming, the NumPy and pandas libraries for data handling, and general knowledge of geospatial data representation. Preferably, participants should have completed the Introduction to Python for Data Science course offered as part of this training program.
Fill out the form to Participate in Redbioma activities (previously circulated, please fill out only once).

Registration form

Link: Registration for Machine Learning course

Objectives

General

Develop problem-solving skills for simple cases related to biodiversity conservation using machine learning techniques, and understand this branch of data science and artificial intelligence along with its current challenges.

Specific

Build a solid understanding of basic concepts, the machine learning development cycle, visualization techniques, and the role of data in the process, to help students identify the machine learning skills required for their present and future professional development.
Identify the appropriate type of machine learning for each problem type.
Build simple models based on supervised, unsupervised, semi-supervised, and deep learning techniques relevant to biodiversity conservation.

Course methodology

The course methodology is based on active and collaborative learning through problem-solving in labs, research work, and the flipped classroom, among other techniques. The purpose is to strengthen students' ability to research, use public datasets, critically analyze scientific articles, and apply new concepts based on previously acquired knowledge and course content.

The program is theoretical/practical, allowing participants to apply theoretical knowledge through case studies, group discussions, labs, and research projects.

Important:

All synchronous sessions will be recorded and published on the Redbioma website.
Final research projects will be published on the Redbioma website.

Program content

Fundamentals of Machine Learning

Brief historical overview (timeline, past and future).
Basic definitions.
Theoretical foundations. [1]
Machine Learning Cycle: design, implementation, evaluation, result interpretation, and deployment. [2]
Role of data: types of data, preparation.
No/Low-code AI, Generative AI.

Machine Learning Approaches

Types of problems.
Types of machine learning.
Criteria for technique selection and result comparison.
Considerations for dataset design based on the type of machine learning.

Supervised Learning

Description of supervised learning techniques.
Building simple models based on supervised learning techniques relevant to biodiversity conservation.
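
To give a sense of what a "simple model" can look like in this unit, here is a minimal sketch of a k-nearest-neighbours classifier. It is only an illustration, not the course's lab material: the leaf measurements, the species labels, and the function name `predict` are all hypothetical, and the sketch uses only the Python standard library, whereas the course itself may rely on other libraries.

```python
import math
from collections import Counter

# Hypothetical labelled specimens (illustrative values, not real data):
# (leaf length in cm, leaf width in cm) -> species label.
TRAINING = [
    ((4.0, 1.5), "species_a"),
    ((4.5, 1.7), "species_a"),
    ((5.0, 1.6), "species_a"),
    ((9.0, 4.0), "species_b"),
    ((8.5, 4.2), "species_b"),
    ((9.5, 3.8), "species_b"),
]


def predict(features, k=3):
    """Label a new specimen by majority vote among its k nearest neighbours."""
    # Sort the labelled examples by Euclidean distance to the new specimen.
    by_distance = sorted(TRAINING, key=lambda row: math.dist(row[0], features))
    # Count the labels of the k closest examples and return the most common.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


print(predict((4.2, 1.6)))  # a small leaf, close to the species_a examples
print(predict((9.1, 4.1)))  # a large leaf, close to the species_b examples
```

The defining trait of supervised learning is visible in `TRAINING`: every example comes with a known label, and the model uses those labels to classify new observations.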

Unsupervised Learning

Description of unsupervised learning techniques.
Building simple models based on unsupervised learning techniques relevant to biodiversity conservation.
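
As an illustration of the kind of simple model this unit refers to, the following sketch groups unlabelled measurements with a basic k-means procedure. It is a teaching sketch under stated assumptions: the measurements are hypothetical, the function name `kmeans` is made up for this example, and the initial centroids are simply the first `k` points so that the result is deterministic (real implementations usually pick them more carefully).

```python
import math

# Hypothetical unlabelled measurements (leaf length cm, leaf width cm).
POINTS = [(4.0, 1.5), (4.5, 1.7), (5.0, 1.6), (9.0, 4.0), (8.5, 4.2), (9.5, 3.8)]


def kmeans(points, k=2, iterations=10):
    """Group points into k clusters by alternating assignment and averaging."""
    # Assumption for this sketch: start from the first k points.
    centroids = list(points[:k])
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return assignment, centroids


assignment, centroids = kmeans(POINTS)
print(assignment)  # cluster index for each point
print(centroids)   # one centroid per cluster
```

Unlike the supervised case, no labels are provided here: the algorithm finds the two groups from the measurements alone, and interpreting what those groups mean is left to the analyst.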

Semi-Supervised Learning

Description of semi-supervised learning techniques.
Building simple models based on semi-supervised learning techniques relevant to biodiversity conservation.

Deep Learning

Description of deep learning techniques.
Building simple models based on deep learning techniques relevant to biodiversity conservation. [3,4]

Final Reflections

Challenges and risks of machine learning. [5]
Opportunities for application in biodiversity conservation.

Evaluation

Students will complete short assignments, labs, and a final project. Evaluation items are as follows:

Item	Value (%)
Short assignments	30
Labs	40
Final project	30
Total	100

The final project involves identifying a machine learning use case relevant to the professional context of the participant.

Each evaluation will have a previously established due date. Submissions must be made by 11:45 PM (GMT-6). Late submissions will not be accepted. Google Classroom will be used for submissions.
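
The weights above combine with the completion criteria stated under Course type (attendance above 75% and an average of at least 70). The following sketch works through that arithmetic. It assumes that the "average" is the weighted total of the three items and that attendance is counted over the eight synchronous sessions; the function names and the example scores are hypothetical.

```python
# Weights from the evaluation table, as percentages.
WEIGHTS = {"short_assignments": 30, "labs": 40, "final_project": 30}
TOTAL_SESSIONS = 8      # eight virtual synchronous sessions
MIN_ATTENDANCE = 0.75   # participants must attend MORE than 75%
MIN_AVERAGE = 70        # average must be greater than or equal to 70


def final_grade(scores):
    """Weighted average of the item scores, each on a 0-100 scale."""
    return sum(scores[item] * weight for item, weight in WEIGHTS.items()) / 100


def completes_program(scores, sessions_attended):
    """Check both completion criteria: attendance and weighted average."""
    attendance = sessions_attended / TOTAL_SESSIONS
    return attendance > MIN_ATTENDANCE and final_grade(scores) >= MIN_AVERAGE


# Hypothetical participant scores.
example = {"short_assignments": 80, "labs": 65, "final_project": 70}
print(final_grade(example))           # 80*0.30 + 65*0.40 + 70*0.30 = 71.0
print(completes_program(example, 7))  # True: 7 of 8 sessions is 87.5%
print(completes_program(example, 6))  # False: 6 of 8 is exactly 75%, not more
```

Under these assumptions, a participant needs to attend at least seven of the eight sessions, since six sessions is exactly 75% and the requirement is more than 75%.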

Class schedule

Class	Week
Fundamentals of Machine Learning	1
Machine Learning Approaches	2
Supervised Learning	3
Unsupervised Learning	4
Semi-Supervised Learning	5
Reinforcement Learning	5
Deep Learning	6-8
Final reflections	8
Project presentations	8

Materials

We recommend studying the following tutorials prior to the course:

References

[1] Greener, J. G., Kandathil, S. M., & Jones, D. T. (2022). A guide to machine learning for biologists. Nature Reviews Molecular Cell Biology, 23(1), 40–55.
[2] Guralnick, R. P., LaFrance, R., Allen, J. M., & Denslow, M. W. (2024). Ensemble automated approaches for producing high‐quality herbarium digital records. Applications in Plant Sciences, 13(1), e11623.
[3] Triki, A., Bouaziz, B., & Mahdi, W. (2022). A deep learning-based approach for detecting plant organs from digitized herbarium specimen images. Ecological Informatics, 69, 101590.
[4] Weaver, W. N., & Smith, S. A. (2023). From leaves to labels: Building modular machine learning networks for rapid herbarium specimen analysis with LeafMachine2. Applications in Plant Sciences, 11(5), e11548.
[5] Sworna, Z. T., Urzedo, D., Hoskins, A. J., & Robinson, C. J. (2024). The ethical implications of Chatbot developments for conservation expertise. AI and Ethics, 4, 917–926.

Contacts

Professors	Email
Instructor: Emilia Zeledón Lostalo	emilia.zeledon@gmail.com
María Auxiliadora Mora	maria.mora@itcr.ac.cr